In the final post in the BCP/DR Primer series, we are wrapping up with the final task in a BCP program: testing the BCP plan. One point of debate is the semantics of the word “test”. Many BCP programs refer to these as “exercises” rather than “tests”, but regardless of the label we apply, this is the ultimate result that we must be able to get to.
Test the plan, do not plan the test
It’s a simple statement, but it could not be more important. The concept of the DR test is to apply the plan you have in place to test the recovery of your systems and prove the effectiveness of your plan. Much like Test Driven Development, you may find that you do not hit the mark in the first pass. This is important in illustrating that the BCP plan is an organic document that must continue to grow and develop over time.
The real heart of the phrase “test the plan, don’t plan the test” is that you should not be designing a test to be successful. What I mean by this is that you should be performing the recovery in as natural an order as possible. Many organizations go further and simulate a true recovery scenario by involving vendors and internal support teams with limited notice. This adds realism to the test and helps confirm that following the plan will produce the result you desire.
Failure is not always failure
Failure may be a strong word, but when you are performing a recovery test and one of the components you have planned for fails in its recovery, you have not necessarily failed, but you have learned that the plan requires additional data. One of my colleagues likes to refer to these as “challenges” and not failures. We use the word “learnings” as well (which isn’t really a word, but we use it anyways).
The long and the short of it is that you must take the issue that caused either a delay or failure of a piece of your recovery and then adjust the documentation and the plan accordingly to work around it.
The Deadly Embrace – a recovery nightmare realized
One thing that will stop any plan in its tracks, regardless of its effectiveness, is the deadly embrace. This is a database term for a situation where two or more processes each hold an item that the other needs, and neither will continue until the other releases what it holds. In other words, a stalemate or a deadlock. The end result: nobody wins.
If a deadly embrace arises during your BCP recovery, whether in a test or in a real situation, it will stop your plan in its tracks and require you to effectively restart the entire process. During a recovery test, you most likely will not have enough time to re-run the entire test within your test window, which means that you will have to close out the exercise and plan a second attempt when resources are available.
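One way a deadly embrace can show up in a recovery plan is as a circular dependency between systems, where each one needs the other to be running before it can start. The following is a minimal sketch of checking for that ahead of the test window. It assumes the plan's dependencies can be written as a simple mapping, and the function name and system names are hypothetical, not taken from any real plan. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def recovery_order(depends_on):
    """Return (order, cycle) for a recovery dependency map.

    depends_on maps each system to the systems that must be
    recovered before it. Exactly one of the returned values is None.
    """
    try:
        # static_order() yields systems so that every dependency
        # comes before the system that needs it.
        order = list(TopologicalSorter(depends_on).static_order())
        return order, None
    except CycleError as err:
        # The second argument of CycleError is the list of systems
        # that form the circular wait (the deadly embrace).
        return None, err.args[1]


# Hypothetical plan with a clean recovery order.
clean_plan = {"app": ["database"], "database": ["storage"], "storage": []}

# Hypothetical plan where two systems each wait on the other.
stuck_plan = {"auth": ["dns"], "dns": ["auth"]}

clean_order, clean_cycle = recovery_order(clean_plan)
stuck_order, stuck_cycle = recovery_order(stuck_plan)
```

Running a check like this against the documented dependencies before the exercise will not catch every stalemate, but it can surface the obvious ones while there is still time to adjust the plan.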
The Post Mortem
It isn’t just for Quincy, M.E. anymore. The Post Mortem meeting is a necessity to evaluate the recovery test and take our learnings from the process so that we can apply them to the documents and the plan to alleviate those issues for future recovery tests or, more importantly, a real recovery scenario.
Be prepared to have detailed discussions in this meeting, and be prepared to apply as much time as necessary to the process. If we do not open our minds and our plan up to what really took place during the recovery, we cannot be as effective as we need to be in maintaining the best possible recovery plan for the organization.
And you must remember that there are people processes involved here, not just IT processes. While our focus in the IT organization is the technical documentation and the bits and bytes portion of the recovery, there are many processes where warm bodies are the key. Even with the most perfect technical recovery plan, without a person to implement it, the plan may as well be written in Sanskrit.
You’re never done
The interesting challenge with BCP/DR programs is that they are never completed. Because your IT and business are dynamic, so should be your BCP/DR program. Once you have your environment documented and tested, you can consider this to be complete only as of a point in time. The next step is to continue to engage the business in updating, managing and revisiting your BCP plans with regularity to keep the recovery plan as current as your production environment.
The last task in your Post Mortem meeting is setting the next planning meeting. No, seriously, you must maintain the momentum and focus to continue to keep the program active. Once you have your baseline work done it will be simpler.
Change Management and BCP
The best way to keep focus on your BCP program is to fully integrate it with your Change Management process. For each application and infrastructure change there should be a checkbox in the process which asks “does this change affect the BCP recovery plan?” If you involve your business sponsors and application owners with BCP as part of their day-to-day processes for design and implementation, then it will ease the pain and raise the awareness of the importance of the recovery planning and BCP program all around.
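To make the checkbox idea concrete, here is a small sketch of how a change record might carry the BCP question so that a change cannot be approved until it is answered. This is an illustration only: the `ChangeRequest` fields and the `approval_blockers` function are hypothetical and do not reflect any particular Change Management tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangeRequest:
    summary: str
    # None means the checkbox question has not been answered yet.
    affects_bcp_plan: Optional[bool] = None
    # Set to True once the recovery documentation has been revised.
    bcp_plan_updated: bool = False


def approval_blockers(change: ChangeRequest) -> List[str]:
    """List the BCP-related reasons a change is not ready for approval."""
    blockers = []
    if change.affects_bcp_plan is None:
        blockers.append("BCP impact question not answered")
    elif change.affects_bcp_plan and not change.bcp_plan_updated:
        blockers.append("BCP recovery plan not updated for this change")
    return blockers


# Hypothetical changes at three different stages.
unanswered = ChangeRequest("Move file shares to new storage array")
pending = ChangeRequest("Replace payroll database server", affects_bcp_plan=True)
ready = ChangeRequest("Patch intranet web server", affects_bcp_plan=False)
```

The point of the three-state field is that "not answered" is treated differently from "no impact", so the question cannot be skipped silently.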
Want to talk?
I’ve been working with BCP programs for years, and the one thing that I have learned is that an outside opinion can be very helpful. Feel free to drop a comment to me or Tweet me and I would be happy to offer anything that I can to help you along the way.
Truthfully, this could be a never-ending set of posts, but my goal was to help those who have little or no BCP experience to take the first steps, or to formalize their process. The more we do, the better it is for all of us in our respective organizations.