In this lesson, I'll discuss what happens when something goes wrong. Something is always going to go wrong; it's how you react to it that makes the difference. So I'll discuss Incident Response Plans, how organizations may react to a disaster or an outage, and a little bit about what to do during those incidents - what we should be looking at.

Going back to the planning for a disaster lesson: something is always going to happen. It may not be a natural disaster, but your website could get attacked, files may go missing, theft may happen. I was working at a company several - oh, gosh, it's been about 12 years now - where we had two data thefts. They broke through the doors and stole laptops, and they stole one server. I don't know why they stole just one, but stuff like that happens. We need to make sure that we plan for this kind of stuff, just like I talked about in the planning for disaster lesson. Disasters are inevitable, so how we deal with those incidents makes a huge difference.

What is an incident? An incident is whenever a user is not getting the level of service they expect from an IT service. The expected level of service could be based on a service level agreement, for example, and we're not meeting that service level. A major incident or outage could be defined as a significant event which demands a response beyond the normal routine, resulting in an uncontrolled development in the course of the operation of any establishment or transient work activity. That's how we've defined it here at UCCS. We developed a major incident response plan several years ago, and that's how we defined major incidents or outages. If you're coming from an ITIL world, an incident means something completely different. An outage may mean something completely different as well. Let me go to incident response plans.
According to SANS, an Incident Response Plan should include preparation, identification, containment, eradication, recovery, and lessons learned. That's what a good response plan should contain. The goal is to minimize damage, and part of that could be communication. If we tell users that we had an outage, that we made a mistake, it's going to go much better than if we don't tell people. Not only may we have damage to systems, we may actually have damage to our reputation because of an incident. So the more we communicate what is going on, the less the damage could be.

We need to understand our critical systems. We need to identify mission critical systems, and to do that, we look at what somebody is using on a day-to-day basis. Can they do their job well? We need to identify the support structure for those services. For example, in my Incident Response Plan, I may have network, which is defined as firewalls, switches, routers, connections to data centers, and connections to mission critical buildings. For any disruption of those services, I'm going to develop a response plan for that service. How do I get that service up and running as soon as possible? What about power disruptions as well? Power disruptions could impact data centers, for example. And what is a data center running? Well, it could be running authentication. Sure, your data may be out on the Cloud, but you're still relying on onsite systems to get to that information.

An Incident Response Plan also defines roles. Roles are critical to make sure that people know their job during an incident. Your role may be the same as the job that you currently have in system administration, or it may be something different: if you're a system administrator, you may be managing communications, for example. Understanding how those roles interact and testing those interactions will go a long way in an incident.
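To make the pieces above a bit more concrete, here is a minimal sketch of how a plan's phases, a critical service, and its roles could be written down as data. This is only an illustration under my own assumptions, not a standard format: the class names, the role names in `REQUIRED_ROLES`, and the people assigned are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Phase(Enum):
    """The six phases named above, in the order they were listed."""
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6


def next_phase(phase: Phase) -> Optional[Phase]:
    """Return the phase that follows `phase`, or None after the last one."""
    phases = list(Phase)  # Enum members iterate in definition order
    idx = phases.index(phase)
    return phases[idx + 1] if idx + 1 < len(phases) else None


@dataclass
class CriticalService:
    """One mission critical service and the support structure behind it."""
    name: str
    components: List[str]
    # Hypothetical mapping of role name -> person responsible for it.
    roles: Dict[str, str] = field(default_factory=dict)

    def unassigned_roles(self, required: List[str]) -> List[str]:
        """List the required roles that nobody has been assigned to yet."""
        return [role for role in required if role not in self.roles]


# Assumed role names for illustration; a real plan defines its own.
REQUIRED_ROLES = ["incident lead", "technical lead", "communications"]

network = CriticalService(
    name="network",
    components=[
        "firewalls",
        "switches",
        "routers",
        "connections to data centers",
        "connections to mission critical buildings",
    ],
    roles={"incident lead": "alice", "technical lead": "bob"},
)
```

Calling `network.unassigned_roles(REQUIRED_ROLES)` on this example returns `["communications"]`, which is the kind of gap you want to find while you are testing the plan and not in the middle of an outage.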
Communication - and I can't stress this enough - during outages is critical. People want to know what's going on. That's why in the news, the newscasters and anchors like to say over and over again what is going on: to inform the public as to what's happening. Okay? The communications officer role within an Incident Response Plan should be defined. Also, if your organization has a communications officer or public relations representative, their involvement is important as well. We may not want to say something to the media that may get us in trouble.

We also need to identify logistics during an incident. Logistics could be things like, well, who's going to get sleep? What if I have a disaster on campus and I need to work on the network for over 24 hours straight? You're going to have to get sleep somewhere in there, so let's identify who's on call at what time. Who takes over for whom? What about purchasing? How about food? Identifying all of those in a major Incident Response Plan, or even in a procedure that is not public, will get you far.

So in conclusion, an Incident Response Plan is critical for your success during an incident. It may not prevent the damage, but it may lessen it.