The next section of incident management is all about business continuity and disaster recovery planning and procedures. So take a second, pause this audio, go grab yourself a cup of tea, maybe a couple of cookies, because we've got about 20 slides in this module.

If you look at this, there are a number of steps that we go through to do contingency planning. We start with a policy. We do the business impact analysis. We identify the things that can be prevented, like single points of failure. We develop a strategy for how we're going to address the things that can't be prevented. Once we've defined that strategy, we put together our contingency plan, and then we test the plan and train our people. And throughout its life, that plan is going to have to be maintained.

When we do recovery planning or business recovery processes, we have to look to see if there is a risk-based asset classification system in place. The assets that are most critical are the ones that will get risk analysis and risk assessment. And the ones that have a risk assessment done are the ones that will get recovery plans and procedures, business continuity or disaster recovery. Those plans will be documented, the people will be trained, and all of those plans will be tested and audited.

When we do the recovery operations, the idea is to get the business up and running as quickly as possible. It might be that we restore back to the primary site; it might be that we have to go to an alternate site to bring the systems up. What is the strategy? What do we bring up first? There are a number of things that go into that consideration. How critical is the business process? What's the cost? How long does it take to recover? Are there any additional security considerations from being at an alternate site rather than at home? And how reliable is that backup system?
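To make the "what do we bring up first?" question a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration and not something from the slides: the field names, the 1-to-5 criticality scale, the sample processes and numbers, and the rule of sorting by criticality first, then recovery time, then cost.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    criticality: int       # assumed scale: 5 = most critical, 1 = least
    recovery_hours: float  # assumed estimate of how long recovery takes
    recovery_cost: float   # assumed estimate of what recovery costs


def recovery_order(processes):
    """Most critical first; ties go to the faster, then the cheaper, recovery."""
    return sorted(
        processes,
        key=lambda p: (-p.criticality, p.recovery_hours, p.recovery_cost),
    )


# Hypothetical sample data, made up for this sketch.
processes = [
    BusinessProcess("payroll", 4, 24, 10_000),
    BusinessProcess("order entry", 5, 8, 25_000),
    BusinessProcess("intranet wiki", 1, 4, 1_000),
    BusinessProcess("accounts receivable", 5, 12, 15_000),
]

for process in recovery_order(processes):
    print(process.name)
```

A real plan would also weigh the security and reliability questions above, which do not reduce neatly to a sort key; the sketch only shows that the ordering should be decided and written down before the event.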
One of the things we have to look at when we determine whether or not the strategy is appropriate is, of course, cost, as well as how long it takes to recover. And what are the impact and the likelihood of the event that would cause us to implement the recovery strategy? When we are looking at threats, the idea is that we want to minimize that threat. We want to minimize the likelihood that the threat is going to occur, and even if it does, we want to minimize its effects.

When we talk about recovery sites, we have a couple of options. If you're leasing from a third party like SunGard, then you can have hot sites, warm sites, and cold sites. And I've listed some time parameters on slide 68. If your maximum tolerable outage, for example, is 48 hours, then your cost-effective recommendation would be a warm site. There are a couple of other options. They may be owned or leased, like a mobile site, or you may have a fully redundant data processing center, and that one you own.

So what is the basis for selecting that alternate site? We look at a number of things. How long does it take for IT to come up? That's the RTO, the recovery time objective. How much data can you lose? That's the recovery point objective. What is the maximum tolerable outage the business unit is willing to accept? How close are we? What's our proximity? Where is it located? What's the nature of the probable disruptions?

When we put together our strategy for response and recovery, we have to think pre-incident: presuppose the event and be ready for it before it happens. We have to have evacuation and accountability procedures in place for our people. We have to have plan activation criteria. How do you declare a disaster? Who declares it? What processes and IT resources are necessary to recover? Who are the people that are necessary to do that? How do we get them to that alternate location? What are the recovery steps, one by one, that we have to go through?
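Going back to the alternate site choice for a moment, the link between maximum tolerable outage and site type can be sketched in a few lines of Python. The only data point from this module is that a 48-hour maximum tolerable outage points to a warm site. The two cutoffs below, 24 and 72 hours, are placeholder assumptions chosen to be consistent with that example; the real time parameters are the ones on slide 68, so treat these defaults as values to overwrite.

```python
def recommend_site(mto_hours, hot_limit_hours=24.0, warm_limit_hours=72.0):
    """Suggest a leased recovery site type from the maximum tolerable outage.

    hot_limit_hours and warm_limit_hours are hypothetical cutoffs, not
    figures from the course; replace them with your own time parameters.
    """
    if mto_hours <= 0:
        raise ValueError("maximum tolerable outage must be positive")
    if mto_hours <= hot_limit_hours:
        return "hot"   # little tolerance for downtime, highest cost
    if mto_hours <= warm_limit_hours:
        return "warm"  # middle ground on time and cost
    return "cold"      # long tolerance, lowest cost


print(recommend_site(48))  # the 48-hour example from this module
```

The cutoffs are parameters rather than constants because they are a business decision: cost goes up as the tolerated outage goes down, and each business unit will draw those lines differently.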
And what are the various resources? For example, there are some companies that still print payroll checks on custom check stock. That would be a resource that would be required if we had to recover payroll and go to an alternate site.

That response and recovery plan has to include mission, strategies, goals, and objectives. It has to be approved by senior management, and we need metrics in place to measure how well we're doing when it comes to recovering from a system malfunction. It needs to be integrated with the business. All of those variables, recovery time, recovery point, and maximum tolerable outage, have to be developed in conjunction with the business unit.

This is a graphic that explains how this works. RPO, recovery point objective, is lost data. RTO, recovery time objective, is how long it takes IT to put that equipment back together and restore the last backup. WRT, work recovery time, is how much time it takes the business unit to find that lost data and put it back in. If you take IT's time, RTO, and the business unit's time, WRT, and add those two together, that gives you the maximum tolerable outage, or maximum tolerable downtime.

From a notification perspective, that plan, business continuity or disaster recovery, should have a Call Tree. And that includes not just your employees, but vendors that you're doing business with, vendors that are providing you with equipment or supplies. The off-site storage location, the alternate site location, human resources, your insurance agent, and even law enforcement need to be in that Call Tree.

From a supply perspective, are there any unique supplies, like custom check stock, that we need to have available so that we can run at that alternate site?

How about communication? If you go to an alternate site, and that alternate site is, let's say, 500 miles away, your externally facing IP address is most likely going to change. What do you have to change in order to make the connection?
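Stepping back to the timing graphic for a moment, the arithmetic is simple enough to write down as a small Python sketch: RTO plus WRT gives the maximum tolerable outage. The function names and the sample figures (36 hours for IT, 12 hours for the business unit) are hypothetical, chosen only to show the addition.

```python
def maximum_tolerable_outage(rto_hours, wrt_hours):
    """MTO = RTO (IT's time) + WRT (the business unit's time)."""
    if rto_hours < 0 or wrt_hours < 0:
        raise ValueError("recovery times cannot be negative")
    return rto_hours + wrt_hours


def within_tolerance(rto_hours, wrt_hours, accepted_outage_hours):
    """Check the combined time against what the business unit will accept."""
    return maximum_tolerable_outage(rto_hours, wrt_hours) <= accepted_outage_hours


# Hypothetical figures: IT needs 36 hours, the business unit needs 12 more.
print(maximum_tolerable_outage(36, 12))
print(within_tolerance(36, 12, accepted_outage_hours=48))
```

The point of the second function is that IT's number alone is not enough. If the business unit accepts 48 hours and its own rework takes 12, IT has to be back up within 36.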
And who do you have to let know that you now have a different externally facing IP? It might be your internet service provider; it might be your extranet customers. In general, it might be anybody that has cached records in DNS and is now trying to connect to you and can't.

So what are the things that we have to look at when we're providing for network recoverability? Redundancy. Alternate routing, which is a different way of getting to the data center. Diverse routing, and you can read the explanation for this in the fine print at the bottom of slide 77. Last mile protection: that one is critical to understand for purposes of the exam. As a company, when we talk about last mile protection, you are responsible for making the connection from where the internet service provider stops, and that could very well be the telephone pole in your yard, into your data center. Now it might come in to the patch panel inside the data center, but then it's your responsibility to go from the patch panel to your network racks. That is critical.

We look at disaster recovery and we talk about having something called high availability: multiple devices, so that if one device fails, we don't lose the system because we have others that can take over. That might be a cluster of servers, or what we call a server farm. It might be having multiple RAID arrays. It might be having your storage out in a storage area network. But all of those are high availability considerations.

We need insurance. One of the things that I find quite interesting is, if you've got a disruptive event, one of the systems that goes down is probably accounts receivable, which means you don't have any money coming in. So we buy insurance for money, because we need money to continue to run that business, and you can see the list of things that we buy in addition to that.
As for actually reading the insurance policy that covers these things, somebody, probably in legal or in finance, or maybe you have an insurance department, needs to go through that policy in detail to make sure that every possible condition is covered.

Recovery plans, like anything else, have to be maintained. Things change, people change, applications change. And anytime there is a change, anytime there is a test, we need to make sure that we update those recovery plans.

What I'd like you to do now is take a moment and identify one instance that would drive updates to those plans, for business continuity and disaster recovery. After you've identified that instance, document why the change would be appropriate.