BCP/DR Primer – Part 3 – The BCP Recoverability Matrix

In our last post we laid out the RPO and RTO of a couple of systems to illustrate how we can map our systems into BCP tiers. Now that we have defined the current state of our core IT applications and services, we will examine the requirements and limitations and define how we can maintain or increase the recoverability and redundancy of those systems.

Defining your IT BCP capabilities: The BCP Recoverability Matrix

The focus of this series is on the IT systems and how you can define and increase their availability and recoverability to meet your business requirements for alternate site recovery. I’ve created a template that I’ve used for many years which provides a simplified view of your environment. The spreadsheet is called the BCP Recoverability Matrix. It is a Microsoft Excel 2010 XLSX file, so you may find that it appears as a ZIP file after download. If so, just rename the extension to .XLSX. The matrix can be found HERE.

The matrix spreadsheet has three tabs: Current State, Desired State and Application List. The Application List holds the information about each application from your BIA. This is a snapshot of the information you need in order to place each application into the matrix and view its overall recoverability level.

Here is what the matrix looks like. Again, remember that we use the lowest common denominator of RPO and RTO to place an application on the matrix, so the weaker of the two factors defines the color-coded BCP Tier in which the application appears.
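To make the lowest common denominator rule concrete, here is a minimal Python sketch. The tier boundaries (4, 24 and 72 hours) and the function names are assumptions made up for illustration; they are not taken from the spreadsheet, so substitute the cut-offs from your own matrix.

```python
# Hypothetical tier boundaries in hours: (tier number, maximum hours).
# These cut-offs are assumptions for illustration, not values from the matrix.
TIER_LIMITS_HOURS = [(1, 4), (2, 24), (3, 72)]


def tier_for_hours(hours):
    """Return the best (lowest-numbered) tier whose limit covers the value."""
    for tier, limit in TIER_LIMITS_HOURS:
        if hours <= limit:
            return tier
    # Anything slower than the last limit lands in one extra, lowest tier.
    return len(TIER_LIMITS_HOURS) + 1


def bcp_tier(rpo_hours, rto_hours):
    """Place an application using the weaker of its RPO and RTO."""
    return max(tier_for_hours(rpo_hours), tier_for_hours(rto_hours))


# An application with a 1 hour RPO but a 30 hour RTO is placed by its RTO.
print(bcp_tier(rpo_hours=1, rto_hours=30))  # prints 3
```

The point of the sketch is only that a strong RPO cannot lift an application above the tier its RTO allows, and vice versa.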

Oxygen Services

The first thing that we have to do is define all of the core IT services, or as we call them, Oxygen Services, which will be required to begin recovery of the business application systems. Without the Oxygen Services we would have no way to begin recovery. Many of these services are already Tier 1 and fully automated, but it is still of absolute importance that we document each system in the matrix.

Business Applications and Dependency Applications

Once we have our core infrastructure and recovery infrastructure mapped out, we are tasked with putting the business applications onto the matrix. The key part of understanding a business application is understanding the underlying dependencies that are required to bring it online or keep it online.

Take a web application. While the business sponsor may understand it to simply be an application server, there may be multiple dependency applications, database connections, third-party connections, firewall and VLAN considerations and much more. The goal of the BIA is to have the business define their application needs, and from that we then fill in the blanks to build the dependency diagrams and document any downstream services.

This series is about the process of defining BCP, so you may already have your own methods, or you may want to search out software solutions to fully and effectively document the application environment. There are lots of great solutions available to assist with the process, and you may find that you wish to build your own depending on how “original” your configuration is.

The most important part of any BCP program is clear and effective documentation of your systems. You also have to be able to access these documents and information at your recovery sites, so consider that requirement when you are looking at document management and information storage for your BCP program.

Once you are able to map out the dependencies, you will meet with your business representatives to confirm that the resulting recovery timelines meet their expectations and requirements.
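As a rough illustration of how a dependency map turns into a recovery timeline, here is a small Python sketch. The component names, the hours, and the assumption that independent components can be recovered in parallel are all hypothetical; your own dependency diagrams and staffing will dictate the real numbers.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical components: name -> (own recovery hours, dependencies).
components = {
    "network": (1, []),
    "dns": (1, ["network"]),
    "database": (6, ["network"]),
    "app_server": (2, ["dns", "database"]),
    "web_app": (1, ["app_server"]),
}


def recovery_timeline(components):
    """Return the hour at which each component is back online.

    Assumes a component starts recovery only once all of its dependencies
    are online, and that independent components recover in parallel.
    """
    graph = {name: deps for name, (_, deps) in components.items()}
    finish = {}
    # static_order() yields each component after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        hours, deps = components[name]
        start = max((finish[dep] for dep in deps), default=0)
        finish[name] = start + hours
    return finish


timeline = recovery_timeline(components)
print(timeline["web_app"])  # prints 10
```

In this made-up example the web application itself takes one hour to recover, but the business sees it back after ten hours because the database sits on the critical path. That gap is the number to bring to the meeting.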

Understanding People and Prioritization

There is one thing that you will learn very quickly during this process: people do not want to pay for, wait for, or invest time in recovery. That’s a bold statement, but you have to understand that a business sponsor has one responsibility: conduct business. Their focus is on the business and people process, and they look to us as IT SMEs (Subject Matter Experts) to provide technical solutions for business problems.

Along with providing daily operational support for their environment, there is a need, and sometimes an assumption, that BCP is “built-in” to their application environment. During the BIA you discovered their business needs, and now that we have laid out the technical dependencies we will present them with what may be unwelcome news about how quickly their particular system can be recovered.

Also, when you tell someone that it will take between 4 and 24 hours to recover a system, guess when they will expect it to be available? Your phone will begin ringing at 4 hours and 1 minute with someone asking “is it online yet?”. A key phrase you will learn from this, if you don’t already know it, is “management of expectations”. This is where you as the IT organization must raise the business’s awareness and understanding of what will take place during a recovery.

So now that the BIA has collected the information and you have documented the current state of recoverability of each system, you will meet with the business to evaluate their comfort with what can be done in the event that the BCP is initiated. This may be a rude awakening for some as to where there are cracks in the armor of a particular system, and it will potentially introduce additional cost to the business.

One more thing that you may learn about many business representatives is that they do not wish to participate in this process. It’s not that they don’t want to be able to recover their systems, but as mentioned earlier, there may be an assumption that this should just be part of normal operations and thus shouldn’t require more interaction from the business to make it happen. Remember that while we are here to enable business through technology, they have one single goal, which is to run the business.

Up and Over

When we look at the BCP Recoverability Matrix, we see that the categories of RPO and RTO are set up so that they descend downwards for RPO and to the right for RTO. What we would like to do with systems on this matrix is move them higher in both RPO and RTO to increase the recoverability and reduce the manual interaction required to make this happen.

The goal of the up and over process is to reduce the effort required by IT to meet the business requirement of recoverability. IT’s goal is also to do this with as little expense to the business organization as possible. Ultimately we want to reduce the work required of our resources.

Under-Promise, Over-Deliver

Have you ever heard that phrase? BCP isn’t the only place where you want to apply that tactic, but it is most certainly one of the most important. If the business requires only 24-hour-old data (aka “tape”), then we should replicate the data asynchronously. We will meet and exceed the requirement, but it may not be necessary to set the business expectation at near-zero data loss.

Why would we not attach the new ability to the plan? Great question, but as we discussed when mapping the recovery Tiers, you do not want to suddenly make the requirement a sub-4 hour RPO, which could place undue stress on your team and your infrastructure. If you can deliver that when the business doesn’t absolutely need it, you have just saved yourself and your business sponsor some grief in a recovery situation.

What’s Next?

We have a pretty good view of the overall requirements at this point. The next post will introduce the Application Recovery Document and define the recovery schedule for our overall matrix.