What do you document? Some people document everything. I have seen Disaster Recovery (DR) plans that are 250 pages long. Just reading such a plan, let alone recovering with it, would take 3 or 4 hours before the implementation even starts. Documentation does not have to be huge and hard to do. The most critical thing is to start with a basic plan and then update it as you work through the process.
First and foremost, the most important part of the documentation for a DR plan is step-by-step instructions on what to do and how to do it. In the best of circumstances, you will have your top database administrator (or your only DBA) there to run the recovery. In the worst of circumstances, you will have “Sam the Security Guard” doing it. Take this into account when documenting the plan. If you have nothing else, get this done! I visited an insurance company in Connecticut to talk about DR. Their base-level documentation fit on one page.
All of the information needed to recover the database and application should be explained in such a way that anyone in the company could perform the recovery. Writing to that standard makes the documentation much more challenging.
Next, you need to document which applications are critical to keeping the business running and which ones can wait. I will go into more detail on this in a future blog.
Your documentation should be printed out and stored safely AWAY from the computer room and the production building. I have visited customers who had well-documented, thorough, “pretty” plans that never made it off the PC in the computer room.
Other things that you should document in your plan include:
- Internal contact information (including cell and home phone numbers) for everyone that could, would, or should be involved in the recovery effort.
- Copies of contracts with all of your 1st, 2nd, and 3rd level vendors. For example, what is your relationship with your telephone company? What is their guarantee to you for availability (this is your Service Level Agreement or SLA with them)? What is your hardware DR plan? Are your software support and DR plan the same as those for your hardware?
- External contact information – is there someone specific at your DR site that you need to contact? Do you just call the front desk and ask for Joe? What if Joe is not there? What is the process to get the ball rolling?
- Specifications for all the critical resources – If you are failing over to a data center that is shared with others, the last thing you want is to find a generic system waiting for you that does not meet the OS/Java/HW requirements the application needs to run.
- Who pushes the button? Who makes the decision to go to backup or fail over to the DR site? Document this and have a chain of command for who is in charge if all of the executives are out of town. The Incident Command System (ICS) from FEMA is an excellent example of how to design your DR organization. More importantly, get the executive team to commit – in writing – that the people who have been identified have the authority to make the decision to fail over.
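One way to keep track of the items above is to treat the plan as a checklist and flag whatever is still empty. The following is only a minimal sketch in Python, not a prescribed format: the `DRPlan` class, its field names, and the `missing_sections` helper are all hypothetical names chosen to mirror the list above.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DRPlan:
    """Hypothetical outline of a DR plan; field names are illustrative only."""
    recovery_steps: List[str] = field(default_factory=list)     # step-by-step instructions
    internal_contacts: List[str] = field(default_factory=list)  # cell and home numbers
    vendor_contracts: List[str] = field(default_factory=list)   # SLAs and support contracts
    external_contacts: List[str] = field(default_factory=list)  # who to call at the DR site
    resource_specs: List[str] = field(default_factory=list)     # OS/Java/HW requirements
    decision_chain: List[str] = field(default_factory=list)     # who can declare a disaster
    authority_signed_off: bool = False                          # executives committed in writing


def missing_sections(plan: DRPlan) -> List[str]:
    """Return the names of sections that are still empty or not signed off."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


if __name__ == "__main__":
    plan = DRPlan(
        recovery_steps=["Restore the last good backup"],
        decision_chain=["CIO", "IT Director"],
    )
    for name in missing_sections(plan):
        print("Still to document:", name)
```

A check like this says nothing about the quality of what is written, only that each section exists. It is a reminder list, not a replacement for walking through the plan.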
I once participated in a half-day DR drill at a DR conference. If you ever get the chance to do this, you should. If it is run well, it is very insightful and fun. You will also learn a lot about people’s ability to NOT make a decision. The room was filled with 70 Business Continuity planners. There were 7 tables representing different functions within the fictitious company (in this case a chemical manufacturing plant). Each function had 10 people who had to act as one. The hardest part of the exercise was getting the Executive table to declare a disaster and start the process. Their inability to make the decision cost the company almost a day. These were people who should have known better! When they were given the responsibility of declaring a disaster, they were dumbstruck with fear. The moral of the story: put someone in charge who can pull the trigger if needed.
There are many other things that could go into a plan. This is by no means the complete list. For more information and ideas on what to document, Google the topic. Here are some links I have found insightful (or that, at a minimum, get you thinking) about what you should have in your plan.
- Welcome - Bureau of Emergency Management, Division of Emergency Services, Communication and Management (You will have a local version of this)
- Data center disaster recovery planning software
- National and State Disaster Preparedness Resources and Tools
- The BCI Good Practice Guidelines
When do you update your documentation? Every time something changes in your systems. This sounds daunting, but if your application can’t recover because the plan is missing one small, new part of the application, then you don’t have a DR plan.
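The rule above is simple enough to check mechanically. Here is a small sketch in Python, assuming you record the date the plan was last updated and the dates of system changes somewhere; the function name `plan_is_stale` and the sample dates are made up for illustration.

```python
from datetime import date
from typing import List


def plan_is_stale(last_plan_update: date, system_changes: List[date]) -> bool:
    """Return True if any recorded system change is newer than the plan."""
    return any(change > last_plan_update for change in system_changes)


if __name__ == "__main__":
    # Hypothetical dates: the plan was updated in January, a change landed in February.
    changes = [date(2024, 1, 5), date(2024, 2, 1)]
    if plan_is_stale(date(2024, 1, 10), changes):
        print("The DR plan is older than the latest system change. Update it.")
```

How you collect the change dates (a change log, a ticketing system, a spreadsheet) matters less than checking them against the plan on a regular basis.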
I look forward to hearing what your documentation for your DR plan looks like! I’m always looking for plans and different ideas. I will share what I receive from customers as well.
In my next blog I will talk about who is in charge.
Until then, if you can’t document it, you can’t measure it. If you can’t measure it, then you don’t have it!
Brian B
Brian Bowman
Brian Bowman has been working for Progress for over 20 years. He has performed database tuning and disaster planning for customers of all sizes around the world. Brian started in technical support, has also worked in product development and in pre-sales for direct and indirect customers, and is currently a Senior Principal Product Manager for OpenEdge.