[Image: construction workers causing problems]

What If Your Site Is Down? Availability

Clearly, your Web site doesn’t help your business if it’s down. Anyone who visits it gets an error message of some sort. What kind of impression does that make? That your business can’t even be bothered to keep its Web site working! Who would want to do business with such an organization?

This post deals with the issue of Web site availability for small businesses. I’ve been forced to confront this issue because my hosting provider just experienced a 24-hour outage, the first such outage I’ve had with them in more than ten years of using their service.

Everyone wants their Web site to be up, all the time. More precisely, they want hosting that keeps their site available all the time and is also inexpensive. However, there’s a tradeoff: today, barring a widespread Internet failure, you can have whatever level of availability you want, if you’re willing to pay for it. Here are some of the issues and tradeoffs to consider.

Consider Your Needs

What’s the purpose of your site? If it’s to attract business, sign up newsletter subscriptions, and take the occasional online order, then perhaps the business can actually continue to run quite well if the site is down for, say, 24 hours. You don’t want it to be down very much, but you can tolerate an occasional outage.

Of course, if e-commerce or some other online function is your main business, then you need to be up all the time, so that customers can spend money, or do whatever else they do with you, whenever they like. You don’t want a customer who has decided to do business with you to go elsewhere because your site is down.

There are other important hosting issues as well. For my hosting service, I use the Wordfence security plugin, which has valuable security features. One of those features is a scan of all the code on the site, comparing it with the originally released code for WordPress itself and for its plugins. I’ve seen this scan find penetrations of a site, so that I could fix them, before there was any other evidence of a problem. However, the scan takes a significant amount of CPU time to run. I’ve learned that most hosting services limit the CPU time per site so tightly that this scan can’t run to completion. If you installed Wordfence on such a site, a penetration could go undetected until the site started to malfunction.
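
To see why such a scan is CPU-hungry, here is a minimal sketch of the general idea. It is not Wordfence’s actual code: it hashes every file on the site and compares each hash with a manifest of hashes for the original release. The manifest format and the names `scan` and `sha256_of` are assumptions made for illustration; a real scanner also has to know which files (uploads, caches, configuration) are expected to differ from the release.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare every file under root with a manifest of known-good hashes.

    manifest (hypothetical input) maps a relative path such as
    "wp-includes/version.php" to the SHA-256 hex digest of the file
    as originally released.
    """
    report: dict[str, list[str]] = {"modified": [], "missing": [], "unexpected": []}
    seen = set()
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        seen.add(rel)
        expected = manifest.get(rel)
        if expected is None:
            report["unexpected"].append(rel)  # not part of the original release
        elif sha256_of(path) != expected:
            report["modified"].append(rel)    # contents differ from the original
    report["missing"] = sorted(set(manifest) - seen)
    return report
```

Every file is read and hashed on every run, so the work grows with the size of the site, which is exactly what a tight per-site CPU limit cuts short.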

Most small businesses that don’t depend on their site for daily operations can withstand an outage that’s hours long. However, even a low-traffic site needs hosting with generous CPU time limits, so that sophisticated security tools can run properly.

Recent Outage

The outage that my hosting service experienced last week is instructive. The service is provided by a large data center outside New York. It has backup power with on-site fuel for several days. It has multiple Internet connections from different vendors. Each site is monitored continuously, and if there’s a failure the people who manage the service get the site up and running again quickly. This should be the model of great availability, shouldn’t it? There shouldn’t be an outage longer than minutes.

Near the data center are four railroad tracks. Crossing the tracks is a bridge. Near that bridge, a crew was digging last week, and they realized they had “hit something”. That something was a major cable carrying a lot of Internet fiber. As it turns out, the data center’s independent Internet connections go to various upstream providers, and the lines of all those upstream providers crossed the railroad tracks through the same cable! So the entire center was off the Internet, and all of my hosting service was down.

Because every upstream crossed the tracks at the same place, no Internet connection from the center remained usable. Happily, fiber cable repair techniques have advanced, so the repair was completed in about 12 hours.
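
The trap here is that connections that looked independent were not physically independent. Assuming you could learn the physical route of each circuit (carriers don’t always disclose it), the check itself is simple: list the segments each connection passes through and find the segments they all share. Any shared segment is a single point of failure. The carrier and segment names below are made up for illustration.

```python
def shared_points_of_failure(routes: dict[str, list[str]]) -> set[str]:
    """Return the physical segments that every connection passes through.

    routes maps a connection name to the segments (conduits, bridges,
    cables) its fiber travels through. If every connection shares a
    segment, cutting that one segment disconnects the site.
    """
    paths = [set(segments) for segments in routes.values()]
    if not paths:
        return set()
    return set.intersection(*paths)


# Hypothetical routes: three carriers, but one shared cable at the rail bridge.
routes = {
    "carrier_a": ["building_entrance_north", "rail_bridge_cable", "pop_a"],
    "carrier_b": ["building_entrance_south", "rail_bridge_cable", "pop_b"],
    "carrier_c": ["building_entrance_south", "rail_bridge_cable", "pop_c"],
}
print(shared_points_of_failure(routes))  # {'rail_bridge_cable'}
```

Three carriers and two building entrances look redundant on paper, but one cable at the bridge still takes all of them down.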

I spent 30 years working in the IT office of a large government organization. We had multiple Internet connections, and we knew about disasters and disaster preparedness. However, one day a backhoe working near an overpass on the Washington Beltway cut a major fiber cable. And guess what? Although our campus had multiple Internet connections from different companies, upstream the providers all crossed the Beltway at the same overpass, in the same cable. Needless to say, the repair got considerable attention.

What’s the lesson to be drawn from this experience? That in spite of everything that’s done at a single location, Web availability is still subject to occasional outages that may be noticeably long. We all know that Amazon Web Services, that darling of availability and failover, occasionally experiences major outages.

Failover

The key question for availability is failover. Usually, failover has three elements:

  • installing copies of the site on two different IP addresses
  • monitoring the site to detect a failure
  • changing the IP address used by the Internet to access the site to a backup when a failure is detected
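
The three elements can be sketched as a small monitoring loop. This is a hypothetical illustration, not any provider’s product: `http_check` requests the site from one specific server IP over plain HTTP (HTTPS would need extra certificate handling), and `update_dns` stands in for whatever API your DNS provider offers to repoint the domain’s A record; no particular provider is assumed. Requiring several consecutive failures avoids switching over because of a single dropped request.

```python
import http.client
import urllib.request
from dataclasses import dataclass, field
from typing import Callable


def http_check(ip: str, host: str, timeout: float = 5.0) -> bool:
    """Ask one specific server (by IP) for the site's home page over plain HTTP."""
    req = urllib.request.Request(f"http://{ip}/", headers={"Host": host})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all OSErrors.
        return False


@dataclass
class FailoverMonitor:
    primary_ip: str
    backup_ip: str
    is_healthy: Callable[[str], bool]   # e.g. lambda ip: http_check(ip, "www.example.com")
    update_dns: Callable[[str], None]   # hypothetical hook: repoint the domain's A record
    failures_needed: int = 3            # consecutive failures before switching
    current_ip: str = field(default="", init=False)
    _failures: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.current_ip = self.primary_ip  # start by serving from the primary site

    def tick(self) -> str:
        """Run one check; switch DNS if the active site has failed repeatedly."""
        if self.is_healthy(self.current_ip):
            self._failures = 0
            return self.current_ip
        self._failures += 1
        if self._failures >= self.failures_needed:
            other = self.backup_ip if self.current_ip == self.primary_ip else self.primary_ip
            if self.is_healthy(other):  # only switch to a copy that is actually up
                self.update_dns(other)
                self.current_ip = other
                self._failures = 0
        return self.current_ip
```

In use, you would call `tick()` every 30 seconds or so from a scheduler. The monitor itself must not run in the same data center as the primary site, or the same cut cable would blind it.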

I’ve recently looked at a lot of hosting offerings that mention the word “cloud”. Of course we all assume that the “cloud” will be highly available. Similarly, we assume that big companies that offer hosting are highly available.

I found that most of the “cloud” offerings have backup and failover, but entirely within a single data center. This means that a fire that took down the whole center would take down all the sites. A power loss, if the backup power didn’t work (that happens!), could also take down all the sites. And, of course, losing the data center’s Internet connection would take every hosted site off the air.

The best approach to providing uninterrupted availability is to place the two IP addresses at two different physical locations, some distance apart. In addition, the monitoring and switchover service must itself be highly available; it must be built to span several locations. What’s needed is a quality distributed DNS service (the service that tells the Internet which IP address to use for a domain name), along with two instances of quality hosting.
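
Even with good monitoring, switchover isn’t instant, because DNS resolvers cache the answer for a domain for the record’s TTL (time to live). A rough worst-case estimate adds the time to detect the failure, the time for the DNS provider to publish the change, and the TTL. The function name and the numbers in this sketch are illustrative assumptions, not measurements from any real service, and some clients may hold on to cached answers even longer.

```python
def worst_case_switchover_s(check_interval_s: int, failures_needed: int,
                            dns_update_s: int, ttl_s: int) -> int:
    """Rough upper bound on how long visitors may still be sent to a dead site.

    detection: the monitor needs `failures_needed` consecutive failed checks,
    spaced `check_interval_s` apart; dns_update_s: time for the DNS provider
    to publish the new record; ttl_s: how long resolvers may keep the old answer.
    """
    detection_s = check_interval_s * failures_needed
    return detection_s + dns_update_s + ttl_s


# Illustrative numbers only: checks every 30 s, 3 failures, 60 s to publish.
for ttl in (60, 300, 3600):
    minutes = worst_case_switchover_s(30, 3, 60, ttl) / 60
    print(f"TTL {ttl:>5} s -> up to about {minutes:.1f} min")
```

The TTL usually dominates, which is why failover setups tend to use short TTLs on the records they plan to switch.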

I’ve found offerings of true 100% availability hosting, but they are prohibitively expensive for small businesses. If you’re Home Depot, these services are just what you need, but for small businesses, nothing removes the risk of a whole data center being disabled or cut off from the Internet. At present, we’re all vulnerable to that enemy of Internet connectivity, the backhoe!

I’m staying with my present hosting provider, in spite of the outage, because of more than a decade of highly reliable service from them. They’ve provided excellent availability, and their service is free from the CPU limits imposed by the major hosting providers, limits that would prevent complete security checks of my clients’ sites.

A 100% Solution

For clients who want 100% availability, I’ve begun work on a hosting method that won’t go down, even if a whole data center has a catastrophic failure or is cut off from the Internet. It will use the three failover ingredients listed above. The backup sites will be hosted at a location geographically separate from the primary site, and a highly available DDNS service will handle the switchover.

Backups will be stored at a third location, independent of the first two, so the site can be restored even if both online copies are lost. In addition, if the site meets certain conditions, special security features will be provided so that the site can’t be hacked.
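
A backup at a third location is only useful if it arrives complete and intact. As a minimal sketch, assuming the site’s files sit in one directory (a WordPress site’s database would need a separate dump, not shown), the code below packs the files into a timestamped archive and records a SHA-256 checksum that can be rechecked against the copy stored offsite. The transfer itself is left out because it depends on the storage used; `make_backup` and `verify_copy` are hypothetical names.

```python
import hashlib
import tarfile
import time
from pathlib import Path


def make_backup(site_dir: Path, out_dir: Path) -> tuple[Path, str]:
    """Pack site_dir into a timestamped .tar.gz and return (archive path, sha256)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = out_dir / f"site-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(site_dir, arcname=site_dir.name)  # recursive by default
    # Reading the whole archive is fine for a sketch; hash in chunks for huge sites.
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    return archive, digest


def verify_copy(copy: Path, expected_sha256: str) -> bool:
    """Check that a copy (e.g. one fetched back from the third location) is intact."""
    if hashlib.sha256(copy.read_bytes()).hexdigest() != expected_sha256:
        return False
    try:
        with tarfile.open(copy, "r:gz") as tar:
            tar.getmembers()  # walks the whole archive; raises if it is corrupt
    except (tarfile.TarError, OSError, EOFError):
        return False
    return True
```

Keeping the checksum separately from the archive lets you confirm, at restore time, that the offsite copy is the same one that was made.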

This is an approach that’s intended for small businesses. If you’re the Washington Post, for example, I won’t be able to host you for $200 a year and provide 100% availability. But for small businesses, I have a way to provide very high availability.

I’ll offer this 100% availability service to my clients. The approach I’m planning will let me offer it at or near my standard price for hosting service. If the price turns out to be the same, I’ll convert all my clients to 100% availability at no additional charge.

The Bottom Line

Today, the best way to handle this issue is to make it clear to business management that you don’t have 100% availability: there will be outages that may last hours, but they won’t be frequent, and a year or more may pass without one. Or, if you really need 100%, simply pay for it. My offering of 100% Web site availability for small businesses is on the way; watch for it.
