Disaster Recovery - What can happen and what would you do "if"?

Because my roots are in systems administration, Disaster Recovery ("DR") is a subject near and dear to my heart. However, I find that most people discount the importance of having a solid plan for what to do if the unexpected occurs.

When people own a valuable asset, they never hesitate to insure it "just in case". Yet a website, which has inherent value and revenue potential, rarely gets the same treatment: the idea of insuring it "just in case" never comes to mind. Unfortunately, it only takes a single mistake to make you wish you had a DR plan. Many people will say, "We have a DR plan! What do you think those backups are for?" But having backups is not a DR plan. Backups are simply one of the countermeasures that may come in handy when recovering from a disaster.

Let's take a look at several scenarios when planning for DR:

Data Center

When choosing a hosting provider, always review their data center locations. Ask about security, the age of the facility, power redundancy, bandwidth providers, and the facility's address. It also never hurts to ask for a tour of the facility. It is not so much that you are going to hop on a plane and visit, but if they deny the request outright, you may want to keep shopping, because a refusal could be a sign of issues they are trying to hide. Usually, citing SOX or PCI compliance requirements and providing umbrella insurance documentation is enough to get a tour. It is also not uncommon for customers to request pictures of the location that physically houses their server.

Mother Nature

So, Hurricane Jane just passed over your data center, leaving a rather sizeable hole in the ceiling.

Never underestimate the power of a storm. While this is more of a concern for your service provider, the end result for you is the same. If the hole is right above your server rack and water has gotten into your rack and hardware, your server will most likely be out of commission. If no one has a contingency plan for getting the site up and running again, you may be out of business until services can be moved to another server or another data center. Hint: If your data center sits at the foot of an active volcano, in the most active segment of "Tornado Alley", or below sea level, you may want to put a little extra effort into your DR plan (and consider moving).

Theft

John Doe, systems administrator extraordinaire, is doing routine maintenance in your rack. When he heads out for his lunch break, he leaves the rack door open.

What would you do if someone stole a hard drive containing your entire customer database? While this does not happen every day...nothing in a Disaster Recovery plan does. Competition can be fierce...are YOU located in the same data center as your competitor? Physical theft of components means you not only need to replace the component (hard drive, SAN, backup device), but also need to restore the data and find out who stole it in the first place.

Another reason to plan for theft is the location of most data centers. Some of the best Tier 1 provider data centers are located in the "bad parts of town", so to speak. Hosting requires three things: space, bandwidth, and power, and the cheaper the space, the higher the provider's profit margin. It is always worthwhile to look for facilities that strike a good balance between the area's crime rate and the facility's level of security.

Also remember: if you are going to manage your own rack, do you want to get a call at 3 a.m. and have to head into a high-crime area? You are then responsible not only for the safety of your equipment, but for your own personal safety as well.

Power Outages

After a thunderstorm, the main transformer feeding your data center is 'fried'.

How redundant are the power sources in your data center? The best choice is a data center that provides redundant generators, so that during a power outage your site stays online while the data center's power is being repaired.

Bandwidth Outages

During some routine upgrades, the main router for bandwidth control into your data center loses its configuration.

This is one that many people don't think of. Does your data center provide redundant bandwidth sources? If the main router goes down and there is no redundant path, your site could be offline for hours while they restore the configuration.

Systems Administrators

John Doe, the systems administrator, runs updates on your server and performs a restart without authorization.

Did John discuss the update with your application team? Could one of these updates break the site? Perhaps your site needs manual steps after a restart, such as mounting a shared folder or starting a service you do not want started automatically. Whatever the situation, John most likely took down your site. A proper DR plan accounts for situations like this, which may not be apparent at first glance.

Application Support Team

Jane, one of your application developers, pushes an update to your site before heading out for the day. Having had a rough day, Jane turns off her phone and goes to bed early.

If Jane did not verify that the update worked properly, the site could be down without anyone knowing it, least of all Jane, whose phone is off. The outage may persist until she gets to work the next day.

This can usually be handled with an escalation system of some sort. In its simplest form, an escalation system is a list of people to call in hierarchical order. A more complex version is a system where a user logs a ticket and the system escalates the request automatically. Having proper procedures in place for application deployment can certainly help alleviate this situation, but accidents do happen.
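The simplest form of escalation, a call list worked through in order when nobody responds, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a real paging system: the contact roles, the placeholder phone numbers, the 15-minute acknowledgement window, and the `contacts_to_notify` helper are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contact:
    name: str
    phone: str  # placeholder numbers; a real list would hold actual on-call numbers


# Hypothetical escalation chain, ordered from first responder to last resort.
ESCALATION_CHAIN: List[Contact] = [
    Contact("Developer who deployed", "555-0101"),
    Contact("Application team lead", "555-0102"),
    Contact("Systems administrator", "555-0103"),
    Contact("CIO", "555-0104"),
]


def contacts_to_notify(chain: List[Contact], minutes_unacknowledged: int,
                       minutes_per_level: int = 15) -> List[Contact]:
    """Return everyone who should have been contacted about a ticket that has
    gone unacknowledged for `minutes_unacknowledged` minutes.

    Each level gets `minutes_per_level` minutes to respond before the ticket
    escalates to the next person. Earlier contacts stay on the notification
    list, and the ticket never escalates past the last person in the chain.
    """
    if minutes_unacknowledged < 0:
        raise ValueError("minutes_unacknowledged must be non-negative")
    if minutes_per_level <= 0:
        raise ValueError("minutes_per_level must be positive")
    if not chain:
        raise ValueError("escalation chain is empty")
    level = min(minutes_unacknowledged // minutes_per_level, len(chain) - 1)
    return chain[: level + 1]


if __name__ == "__main__":
    # Simulate an outage ticket that nobody has acknowledged yet.
    for minutes in (0, 20, 50, 120):
        names = [c.name for c in contacts_to_notify(ESCALATION_CHAIN, minutes)]
        print(f"{minutes:>3} min unacknowledged -> notify: {', '.join(names)}")
```

Keeping earlier contacts on the list means Jane is still paged once her phone is back on, while the team lead and the sysadmin get pulled in if she doesn't respond. A ticketing system would typically drive the same logic from ticket timestamps and send the pages itself.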

Novice User

Jake, a new user to your Content Management system, is asked to perform a cleanup task on some of the old articles.

This could end poorly for Jake and his boss. If Jake accidentally deletes articles or pages he was not supposed to, how will this content be retrieved? Having appropriate backups of the database will certainly help in a situation like this. With a proper DR plan, Jake can simply call the appropriate person and the backups may even be restored before his boss finds out! This is good news for Jake and for the site as well.

Equipment Age

In an attempt to save money, John’s Widgets puts its website on a server with 5-year-old hard drives. Shortly after the successful launch of its brand-new website, the main hard drive fails, bringing online sales to a halt.

For anyone who doesn’t know, a hard drive’s reliability is commonly rated by its “Mean Time To Failure”, or MTTF. It isn’t a matter of “IF”, it is a matter of “WHEN” the drive will fail. Failures tend to cluster either within the first 3-4 months of service or at around 6-7 years. Early failures are usually caused by design or manufacturing flaws, while the 6-7 year mark reflects the drive’s life expectancy. Hard drives are devices that wear out and should be part of a cyclic hardware refresh program. Every year, a company that purchases and manages its own hardware should set aside a portion of the budget to phase out old hard drives, upgrade RAM, and even upgrade servers over time. By replacing this equipment in a timely manner, a company can reduce the risk of server failures caused by aging equipment.
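A refresh program can be driven by the kind of asset documentation a complete DR plan keeps anyway: serial numbers and purchase dates. The sketch below is a minimal example under stated assumptions: the `Drive` record, the `refresh_report` helper, the 120-day early-failure window, and the 5-year replacement age are all hypothetical choices, picked to retire drives ahead of the 6-7 year wear-out range rather than taken from any vendor specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Drive:
    serial: str      # serial number, as recorded in the asset inventory
    server: str      # server the drive is installed in
    purchased: date  # date of purchase


def refresh_report(drives: List[Drive], today: date,
                   early_window_days: int = 120,
                   replace_after_years: int = 5) -> Dict[str, List[str]]:
    """Sort drive serial numbers into three buckets by age:

    - "watch":   still inside the early-failure window (roughly 3-4 months)
    - "replace": old enough that replacement should be budgeted this cycle
    - "ok":      everything in between
    """
    replace_after_days = replace_after_years * 365  # ignores leap days; fine for planning
    report: Dict[str, List[str]] = {"watch": [], "replace": [], "ok": []}
    for drive in drives:
        age_days = (today - drive.purchased).days
        if age_days < early_window_days:
            report["watch"].append(drive.serial)
        elif age_days >= replace_after_days:
            report["replace"].append(drive.serial)
        else:
            report["ok"].append(drive.serial)
    return report


if __name__ == "__main__":
    # Hypothetical inventory entries for illustration only.
    inventory = [
        Drive("SN-0001", "web01", date(2004, 3, 1)),
        Drive("SN-0002", "web01", date(2009, 1, 15)),
        Drive("SN-0003", "db01", date(2007, 6, 30)),
    ]
    print(refresh_report(inventory, today=date(2009, 3, 1)))
```

Drives in the "watch" bucket are not replaced, only monitored more closely, since early failures tend to show up in the first few months; the "replace" bucket becomes that year's hardware budget line.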

Facilities

After a business in an adjacent office leaves a candle burning overnight, a fire spreads through the office complex, burning all documents and equipment and leaving the offices of John’s Widgets in shambles.

While a DR plan should cover your servers, it should also cover disasters that may occur at your office. If your office building burned, flooded, or was closed due to an environmental contaminant...what happens? Does everyone get the rest of the month off while the office is repaired? A good DR plan includes clauses covering “continuing operations”, so that every employee knows what happens if disaster strikes the office building.

If something were to happen to an office building, the DR plan should lead the CEO / CIO to everything needed to restore operations as soon as possible. Situations such as unexpected equipment damage are why a “complete DR plan” also includes copies of all insurance policies, documentation of assets (all serial numbers with dates of purchase), and a plan to obtain replacement assets as soon as possible. It may also include a rendezvous point for any emergency meeting a facility disaster may require.

Is that it?

Depending on your business domain, you may have more issues to worry about than these. The best way to put together a DR plan is to hold a brainstorming session with a diverse group from your company, with at least one person from each department chiming in on the issues they can foresee. Once you’ve compiled issues like the examples above, you have to plan appropriate countermeasures. This is where the actual plan takes shape, often as a step-by-step script or flowchart for handling each issue in the appropriate manner.

Once a DR plan is made, it should be distributed to all employees or hosted in a location where everyone has access. An offsite copy should also be kept in case the digital version is destroyed.

Conclusions

Now you may be thinking, “Isn’t this a little excessive? This sounds like a large investment of time and money!” That is, in fact, true: a DR plan is a huge investment in protecting your company from the unforeseen. However, if your website goes down and all you have is ‘backups’, how long will it take you to get the site back up and running? Do you know what schedule the backups run on? Do you have hardware to replace the dead server or stolen equipment? Is your sole sysadmin on vacation in Orlando for the next two weeks? If your site goes down for even a few hours, people lose faith in your brand, particularly if you are in the IT industry in any shape or form. If you can’t keep your own equipment running, many people may doubt your ability to keep theirs running as well.

So the answer to excessiveness: How much is your brand worth to you?

Chris Burdick

While I am officially a PHP Developer and Systems Administrator for Purple, Rock, Scissors, I consider myself to be a jack-of-all-trades. I have done a little bit of everything and have a storehouse of useful (and some useless) knowledge to share with the world. It is part of being what I like to call an "NBG" or "Natural Born Geek". While I don’t look like a “stereotypical” geek, I have had a fascination with anything technical since the age of 7, got into software development in high school, and later got into artificial intelligence/robotics in college. After 24 years of living in a town of 7,000 people (that's 7K, not 70K) in Connecticut, it was time to move on to Orlando, my new home. It was a big change, but certainly worth it. When I'm not at work, I like to hit up theme parks, watch a little TV, play some video games, hang out with friends, or cook. Cooking is a side passion of mine, and I love to whip up a fresh dish instead of eating out all the time. Nothing beats a relaxing night in with friends, a good bottle of wine, a newly discovered recipe, and good music to bring it all together. What defines good music? Everyone is different, but I like a little of everything: techno, indie, country, alternative rock, classical, and jazz. No genre goes untouched in my collection.
