Yes, that is correct: even reliable infrastructure fails. This has never been more obvious than in the past two weeks. The recent celebrity deaths caused massive spikes in traffic to several sites, and, unsurprisingly, some of those sites could not handle the load. However, this is only a small part of what I am talking about.

In the past week, we saw the ever-reliable Rackspace go down, taking a whole bunch of sites with it. Just a few days later, a payment gateway service provider went down due to a fire. TechCrunch relates the seriousness of this outage:

Talk about a serious outage. Payment gateway service provider has been down and out for several hours, a number of tipsters inform us. That has big implications: since the service is used by tens of thousands of e-commerce vendors to accept credit card and electronic checks payments on their websites…

Now, imagine you are an ecommerce website like Toys R Us. Your host, or even your CDN, goes down. If this is during your peak season (Nov. and Dec.), you could lose millions of dollars … in an hour. If your payment processing goes down, the situation becomes more interesting. You cannot make transactions, but your site still looks live. This is potentially more frustrating for customers, who might have been willing to wait for an obvious outage to clear. The “odd errors” customers are likely to see when part of the infrastructure goes down could cause them to go to another site for their purchases. So, what should you do?

First, if you are a major ecommerce site, you should ensure that your main hosting services are properly redundant. If your hosting provider loses a data center, your site should not be impacted. This is not necessary, or affordable, for smaller sites, but for ecommerce you need to ensure reliability. The main idea with an ecommerce site is that you need to “keep the lights on”. What if your favorite provider does not have redundancy across data centers? In this case, you should look into a smaller and cheaper hosting provider as a backup. If you always have your site deployed to a backup server, you can quickly redirect services to the backup site. Granted, there are a lot of pieces in this idea, like the DNS and database servers, but if you could lose significant revenue from a one-hour outage, you have to take precautions. In some cases, you could even use these backup servers as external beta servers, so that you do not feel like you are throwing away money.
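
To make the "redirect to the backup site" idea a little more concrete, here is a minimal Python sketch of the decision itself: walk a priority-ordered list of sites and send traffic to the first one that passes a health check. The function name, the hostnames in the usage note, and the health check are all my own placeholders, and the hard parts mentioned above (updating DNS, keeping the database in sync) are deliberately left out.

```python
from typing import Callable, Optional, Sequence


def pick_live_site(
    sites: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first site, in priority order, that passes the health check.

    `sites` is ordered from most to least preferred (primary first, backup after).
    `is_healthy` is a hypothetical check supplied by the caller, for example an
    HTTP request against a status page. Returns None if every site is down.
    """
    for site in sites:
        if is_healthy(site):
            return site
    return None
```

For example, `pick_live_site(["primary.example.com", "backup.example.com"], check)` returns the backup hostname only when `check` reports the primary as unhealthy. Keeping the health check as a parameter means the same selection logic can be exercised in a test without touching the network.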

In the case of the payment gateway’s outage, you can take an example from Twitter. Whenever Twitter has had high loads or general database problems, they turn off a feature like searching. As annoying as that feels from the user perspective, they manage to keep the lights on, but with a limited feature set. Going back to the ecommerce example, if your check processing provider goes down, it would be nice to be able to turn that payment method off but still accept other forms of payment. Even more impressive would be the ability to quickly switch from one provider to another. Wouldn’t you rather accept payments for a limited number of payment methods than not accept payments at all? Maybe you can still generate 50% of your normal revenue during that time.
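
Both ideas from this paragraph, switching a payment method off and falling back from one provider to another, can be sketched in a few lines of Python. Everything here is hypothetical: `PaymentRouter`, `ProviderDown`, and the provider callables are names I made up for illustration, not any real gateway's API, and a real checkout would also need logging, timeouts, and protection against double charges.

```python
from typing import Callable, Dict, List, Set


class ProviderDown(Exception):
    """Raised by a (hypothetical) provider callable when it cannot take payments."""


class PaymentRouter:
    """Routes a charge to the first working provider for a payment method."""

    def __init__(self, providers: Dict[str, List[Callable[[float], str]]]) -> None:
        # Maps a payment method ("card", "check", ...) to its providers,
        # ordered from most to least preferred.
        self.providers = providers
        self.disabled: Set[str] = set()

    def disable(self, method: str) -> None:
        """Kill switch: stop offering a payment method during an outage."""
        self.disabled.add(method)

    def enable(self, method: str) -> None:
        """Turn a payment method back on once the outage has cleared."""
        self.disabled.discard(method)

    def available_methods(self) -> List[str]:
        """Methods the checkout page should still show to customers."""
        return [m for m in self.providers if m not in self.disabled]

    def charge(self, method: str, amount: float) -> str:
        """Try each provider for the method in order; return the first receipt."""
        if method not in self.providers or method in self.disabled:
            raise ValueError(f"payment method not available: {method}")
        for provider in self.providers[method]:
            try:
                return provider(amount)
            except ProviderDown:
                continue  # fall through to the next provider for this method
        raise ProviderDown(f"all providers are down for: {method}")
```

The checkout page would build its list of payment options from `available_methods()`, so a disabled method simply disappears instead of showing customers an odd error halfway through a purchase.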

I am assuming most of my readers do not run ecommerce sites, but there is a lot we can learn from these issues. Even a social media application like Twitter wants to maintain as much uptime as possible. So, they turn off search capabilities for a little while. Almost any application can benefit from the ability to turn off a specific feature at any given time. What are you doing for your site to keep the lights on?
