Hello. Welcome to the cybersecurity leadership and management course. Today we'll be discussing cybersecurity management responsibility in business continuity and disaster recovery programs. My name is Cicero Chimamanda, and I am your instructor for this course.
We will discuss how to develop effective cybersecurity management of the business continuity plan, which is BCP, and the disaster recovery plan, which is DRP. We will talk about why it's important for these programs to yield security, trust, and stability for your organization. Let's begin.
In this course, we will start with an overview of cybersecurity management, business continuity, and disaster recovery. We will then discuss business continuity planning in detail. Lastly, we'll delve into disaster recovery.
Cybersecurity management, business continuity, and DR, which is disaster recovery. Let's look at some attacks and events, because it is not a matter of if, but when, the organization will be hit by what we call an interruptive event. These events can come as a cyber attack, as a physical or natural event, or even, as we have seen, as a health pandemic. Again, it's not if, but when. As a cybersecurity leader, it is up to you to be prepared, to have a proactive mindset, to anticipate and plan for these events, and more importantly to have a course of action that mitigates them, so your company sees minimal impact or interruption when they hit. Business continuity and disaster recovery events are actually the number one cause of companies going bankrupt.
Let's look at the top five cyber attacks in dollars. We start off with Epsilon, which is a company that does data collection. In 2011, they lost $4 billion in a cybersecurity event.
Epsilon is a company that makes about $1.9 billion per year and services around 160 million clients. They were hit by what was called an email attack, and it affected about 75 of their biggest clients. Again, as we see, that cyber attack cost $4 billion.
Next we have the Veterans Administration, where the cost was about $500 million in 2006. It was a database containing about 26.5 million records. Unfortunately, an unencrypted laptop was lost, and this cost the Veterans Administration about $500 million.
Hannaford Brothers is a grocery retailer and distributor with about 25,000 employees. They make about $2 billion per year in revenue. They had malware that spread to all 300 of their stores, and to the independent stores that sold their products, and $252 million was the cost of that data breach.
Sony PlayStation: we all know about Sony. About $171 million. Hackers broke into the Sony digital data room and made off with about 100 million customer records from the PlayStation online services.
Lastly, Target, $162 million. In 2013 a hacker took the credit card data of about 110 million Target shoppers. It cost them about $162 million, they lost sales, and the public lost faith in their business.
As we can see, there is a high price to pay for an event. Business continuity and disaster recovery, as we will learn, are methods of mitigating and preparing in advance so that you can minimize the impact.
What about natural events? Actually, before we go to natural events, look at the previous slide on 2020 breaches: financial firms were about 24 percent of those hit, the public sector about 12 percent, healthcare about 15 percent, retail about 15 percent, and other industries about 34 percent. We see that breaches are out there, and it's very evident. Again, it's not a matter of if, but when. We need to prepare in advance.
Now, natural disasters. Here we see the top five natural disasters in the US.
We see here, in dollars of cost: Hurricane Katrina, 2005, about $125 billion, with about 1,800 fatalities in this particular event. Hurricane Harvey, 2017, again about $125 billion in estimated losses overall, with about 100 fatalities. Hurricane Maria, 2017, $92 billion and about 3,100 fatalities. Hurricane Sandy, 2012, $70 billion and about 250 fatalities. Hurricane Irma, $50 billion and about 300 fatalities.
As we look at these particular events, we can see that in dollars and cents, and in fatalities, a natural event will generally have a greater impact on an organization. That's not always the case. With the types of nation-state attacks that are happening, especially now that they target infrastructure, or even specific defense or military systems where a war could start, that type of event can also reach the billions of dollars of a natural event.
Nevertheless, it is important to keep in mind that, according to FEMA, 40 percent of businesses in a disaster area fail to reopen after a disaster. Another 25 percent reopen but fail within the first year, according to FEMA. So you're looking at about 65 percent of the businesses in that area being gone within one year of a natural disaster. Again, 90 percent of companies fail within two years of being struck by a disaster, and this figure is for US small businesses. You can see it depends on the size of the business and how protected they are.
It's of the utmost importance that the cybersecurity leader, along with the compliance professionals, IT, the CIO, the CTO, the back-office COO, and the Chief Financial Officer, creates a task force that reports to the board and keeps the business continuity and disaster recovery program in mind. That is the case for why this is important.
Let's look at the overview of BCP and DR. Business Continuity Plan, BCP.
NIST defines a Business Continuity Plan as the documentation of a predetermined set of instructions or procedures that describe how an organization's mission or business processes and systems will be sustained during a significant disruption. That's the NIST definition. For a Disaster Recovery Plan, the NIST definition says it's a written plan for the recovery of one or more information systems at an alternate facility in response to a major hardware or software failure or destruction of facilities.
Let's dig in a little bit more on business continuity planning. One of the strategies for mitigating an event and having successful business continuity is what's called high availability, or redundancy. That's where you duplicate your mission-critical systems, so that if something happens, you are protected. There are two types of high availability models.
You can have what's called Active-Active. That's where two data centers, or a data center and the Cloud, or two facilities, are both active. They're running their nodes at the same time. It's very important to configure that correctly, and it's obviously more expensive. But that's one design.
The other one is Active-Passive. Active-Passive is where data center or facility A is running all the nodes and node B is in passive mode. Node B has everything in place and stays synchronized, so it has live data, but it only goes active when there is an interruption. When something happens to node A, or data center A, or facility A, then the passive side automatically becomes active. This is what we call high availability and redundancy, used to mitigate an event and maintain business continuity.
In disaster recovery, the most important terms are what we call RTO and RPO. Well, what are RPO and RTO? RPO is recovery point objective. The recovery point objective is the point in time to which data must be recovered after an outage.
It's the point in time to which data must be recovered. The RTO is the overall length of time a system can be down before it negatively impacts the organization in terms of business continuity. Let's look at that in more depth, so it makes sense.
RPO, the recovery point objective, is how far back data is allowed to be lost. You have a disaster, and you look back from that point: your recovery point objective is how far back you can afford to lose data. For example, say you can live with four hours of data being lost. Everything older than those four hours, you need to have. That's your service level agreement, that's your recovery point objective. It's how far back data is allowed to be lost in terms of recovery.
Now, recovery time objective is the maximum time allowed for recovery. This is how quickly you can get back to your normal state, or how long you can be down. A disaster happens, and you can only be down, let's say, four hours, or two hours, or one hour. Your maximum time of recovery is your recovery time objective. So one is backward looking, that's your RPO, and the other one is forward looking, that's your RTO. Hopefully that helps a little bit. These are your main components when you're looking at disaster recovery plan objectives.
Now, BCP/DRP: a lot of companies will couple both into one program. With business continuity, an event happens, and this is how we continue to operate. Disaster recovery is how quickly we can get back up and running. There are some components that are very important to think about when looking at BCP and DR at a high level. First, identify commonalities rather than assuming the details. For example, whether it's a pandemic or a natural disaster, you need to think about the disruption itself: suppliers, third-party vendors, the supply chain.
You think generally in terms of the event, so you're not putting yourself into a box. You prepare a high-level plan for each type of disruption in your disaster recovery approach.
The other thing to think about is Cloud, and leveraging the benefits of Cloud. Cloud is now widely adopted by many enterprises. We've looked at it in specific courses where we talked about SaaS, which is software as a service, PaaS, which is platform as a service, and IaaS, which is infrastructure as a service. Again, leveraging Cloud benefits will help in business continuity and disaster recovery.
Then the last thing, for leadership, is to educate. Keeping people informed, maintaining notification systems to reach your stakeholders, running annual, semi-annual, or quarterly tests, and educating your stakeholders up front is so important in maintaining your business continuity and disaster recovery plan.
Let's look at business continuity planning specifically and delve deeper into it. The first exercise in business continuity planning is to focus on the objectives of the organization. What is the business, and what must be kept running for it to be successful during an event? You think of mission-critical systems.
The first group is your front office. These are the applications that are business or client interface systems. For healthcare, for example, it would be your hospital patient equipment, whatever actually keeps the hospital running. Those are your mission-critical business applications. For financial services and investment banking, it could be your order management systems, your ATMs, SWIFT, or ACH. Those systems must come first. Those are your front office. If you're in retail, it's point of sale and inventory systems. For the public sector or government, it's safety systems, security systems, military systems. For a not-for-profit, it would be your fundraising and donation systems.
Again, depending on your industry, you have different front-office mission-critical systems. They need to come first: if there's an event, these systems need to be up and running.
Secondly, we look at our legal office. There are regulatory obligations that must be kept up. Depending on what you're regulated under, whether it's financial, healthcare, or nation-state, you need to look at those systems to make sure they also have business continuity.
Then lastly, your back office. This is your email systems, your file shares, your print services. These are your back office, and they too need to be up and running, for stability. Now, some back-office systems will actually go first. For example, you need your Active Directory and your firewalls up and running before anything else. So it's not necessarily in that order, but you need to identify these systems, and then your plan, which we'll talk about a little more, sets the order in which you bring them up. These are the mission-critical systems that one needs to think about.
It's not just about technology. Redundancy and duplication really apply to all three components that make up your whole system's life cycle: people, processes, and technology. There needs to be redundancy on all those fronts.
When we look at people and process, we think about the BIA, which is your business impact analysis. This is determining the critical business activities and the associated resources required to operate during an event, during the disruption. In your business impact analysis you ask: if this particular process is down, what is the impact on the business? You need your business partners to be involved in your BIA exercise.
The other one is looking at essential workers. There needs to be redundancy, job rotation, and segregation of job duties.
Again, if there's an event, you need to ask the question: what is the minimum set of employees that we need to have? These are the mission-critical or essential employees that we need up and running first during an event, because we can't necessarily have everybody working. We need to identify those mission-essential workers.
Then we need to define and prioritize the standard operating procedures. These are your processes. This is the secret sauce: step-by-step instructions compiled by the organization to help workers carry out their routine operations. That needs to be documented. The processes might look different from when you're operating normally. You have your standard operating procedures, and you duplicate some of them for a business continuity event. Some will overlap and some will be newly created, because you operate a little differently during a disaster. It might be a CliffsNotes version, or a shortcut, of the way you normally operate.
Lastly, there's communication. Absolutely the biggest component in a disaster is how you communicate, how quickly you communicate, and how effectively you communicate. That means setting up a communication program for a crisis, having templates and simple messages ready to send to suppliers, partners, and clients, and deciding who should communicate. Not everyone should be the voice of the company. You need to designate PR, or legal, or specific groups that will do the communication. Again, people and process are just as essential as the technology, if not sometimes more important. But they're all important: people, process, technology.
Looking a little further, at internal high availability configuration, here is an example of a topology. This topology shows internal high availability. You can see the enterprise firewall, and behind it you have two LDAP applications. These are applications that run queries against your Active Directory.
You have a load balancer that rotates between the two. If one is down, it makes sure that the other one is operating. Again, it could be active-active or active-passive. You have two directory proxy servers. You have two wired LANs, local area networks, as you see, so your routers are redundant. You have two of every node, duplicated for each system, M1, M2, M3, M4. These are servers, and from there you go out to the clients. Again, this is internal high availability configuration. We have two routes internally.
But you also need to make sure that your Internet service providers are redundant. A lot of times you can have internal high availability, but if your Internet service goes over only one route, you still have a single point of failure. For redundancy here, as you see in this topology, you have BGP, which is a routing protocol. For routing you can also have HSRP, but BGP is what you use with Internet service providers. You have the red line, which is your primary outbound route, and you also have a secondary route. You have your web servers out there, and you have two routers and two ISPs, Internet service providers, providing you with Internet. It goes out to your Internet hosting provider.
Again, this is very important, and you can even get as granular as geographical redundancy, making sure your building has two points of entry and two POPs. You can really get granular in your redundancy. The main thing is not to have a single point of failure. This is what business continuity needs when it comes to technology. It is costly and it takes thought, but a leader and a manager needs to know how to manage business continuity.
What are the four phases of business continuity planning? One, you need to respond, with an incident response model.
There needs to be a response: an event happens, you need to respond, and you need to know how to keep that response time down. Number 2, you need to relocate if the event warrants it. If there's a natural disaster and the building is down, where are we going to relocate? We'll talk a little more about hot, warm, and cold sites. Recovery is number 3: how do we recover from the disaster? Then number 4 is restoration, restoring back to normal operations. These are the four phases. Numbers 3 and 4 are where we are in disaster recovery mode.
Let's talk about disaster recovery and the disaster recovery plan. It starts with having a successful mindset, and we consolidate it into five tips for success.
Number 1 is that the disaster recovery plan must be current. No matter how good your plan is, if it's outdated, then it is not a good plan. You have to update it constantly. If you upgrade your systems, if you have new software, new hardware, new employees, new business activities, or new processes, you want to make sure that is fed into your disaster recovery plan.
Number 2 is that your disaster recovery plan must be tested. Obviously you need unit tests, where you test the individual plans, individual phases, and individual systems. But you also need holistic testing, where you run a disaster recovery test for a whole program. That can be done as a tabletop exercise, where you're in a board room or a war room with the disaster recovery team, talking through the different disasters that come in and what you are going to do. Or it can mean actually going to an offsite location and running a full disaster recovery plan test.
Number 3, there must be recovery objectives and also responsibilities. What do we mean?
Well, that's what we talked about: the RPO, recovery point objective, and the RTO, recovery time objective. We also need to define roles and responsibilities, and they need to be defined clearly. People need to know who the mission-critical employees are and what their roles are, and those employees need to be available in case there's a disaster. Whenever there's a disaster, obviously, there's family you have to attend to and there are personal matters, so you need communication and redundancy: if somebody is affected, you need to have planned for that ahead of time.
Number 4, there must be relevant and reliable backups. Your recovery is only as good as your backups.
Number 5, there must be alternative recovery sites and services, and we'll look more at that. Have a plan in advance for where people are going to go physically, where your systems are going to be physically, and where you're going to stand them up. That has to be planned ahead of time.
Looking at recovery replication specifically, here are two data centers, and we talked about high availability. For example, New York backs up London and London backs up New York. You have what's called recovery replication between two data centers. Literally, all the systems are exactly the same in the two data centers. They could be active-active or active-passive. What do we mean by that? Well, let's look at that.
When you're looking at recovery sites, you can have what's called a cold site. A cold site is when you have a secondary location and that's it: you have a designated secondary location. If there's an outage, you're measuring in weeks how quickly you can get systems up and running in that second location. You might have to get servers there, you might have to get workstations there, you might have to move equipment there. Again, you're looking at weeks in terms of recovery at a cold site.
The second type of site is a warm site. This is where you have a secondary location that's already allocated.
You have equipment at that location, you have connectivity at that location, and that's about it. It's not active. It's just there, and you pay for it. That's a warm site. The recovery time is really days to hours. Once there's an outage or an event, with a warm site you already have the secondary location, the equipment, and the connectivity. Now it's just a matter of activating the software and the data, and bringing your backups up at that second location. You can be up and running in days or hours.
The other one is a hot site. That's what we looked at with recovery replication: you have a secondary location, you have equipment already at the location, you have connectivity at the location, and the systems are active before failover. They're already active and running. It's just a matter of hours or minutes to come up if there's an event. Site Recovery Manager for VMware is one of the systems that can bring you back up and running at a secondary site.
Again, these are the cold, warm, and hot disaster recovery models. One needs to decide, based on your mission-critical systems and how you operate, which model you will use.
Planning a DR test. As we said under number two, the disaster recovery plan must be tested. Running a disaster recovery test begins with number one: you choose an event and call an alert. A lot of times you don't want to announce it in advance. That's really the best test. You choose an event, you declare a disaster, whether it's cyber, a natural disaster, a health pandemic, or, if you're a nation-state, a war, and you call an alert. Then the teams execute the procedures. They obtain the BCP and the DR procedures and documentation, and they work from those procedures.
It's always good to have those procedures on paper, because your systems may be down, so you need a hard copy of your procedures as well. Hardware and network is next. You decide where your hardware and network will be, whether it's a cold, a warm, or a hot site. That is the second step. Then you bring in the software and data. You execute your software installs and you execute your data recovery from your backups. Then you institute your business processes: you start testing your business processes, and you're in recovery mode. Then lastly, you roll back to production. These are the high-level disaster recovery test phases that one must go through in order to test your disaster recovery plans.
Lastly, as we talked about, it's all about people and documentation. You need a plan. Who are the ones that are going to do the work, what is the documentation, and where is it going to reside?
As we close, the four phases of BCP and DR: you respond; you relocate if the event warrants it; in terms of recovery, you execute your plan; and number four, you restore. That's your course on business continuity and disaster recovery. We'll see you at the next course.
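As a recap of the recovery objectives and the cold, warm, and hot site models covered in this lecture, here is a minimal Python sketch. It is only an illustration under stated assumptions: the lecture gives rough ranges (weeks for cold, days to hours for warm, hours to minutes for hot), so the exact recovery windows below are made-up numbers, and the names `RecoveryObjectives`, `meets_rpo`, and `cheapest_site_for` are hypothetical. It also assumes cost rises from cold to warm to hot.

```python
from dataclasses import dataclass
from datetime import timedelta

# Assumed worst-case recovery window per site type. The lecture only gives
# rough ranges, so these exact values are illustrative, not authoritative.
SITE_RECOVERY_WINDOW = {
    "cold": timedelta(weeks=2),
    "warm": timedelta(days=3),
    "hot": timedelta(hours=4),
}


@dataclass
class RecoveryObjectives:
    rpo: timedelta  # backward looking: how much data loss is tolerable
    rto: timedelta  # forward looking: how long the business can be down


def meets_rpo(objectives: RecoveryObjectives, backup_interval: timedelta) -> bool:
    """Worst-case data loss is the gap between two backups."""
    return backup_interval <= objectives.rpo


def cheapest_site_for(objectives: RecoveryObjectives):
    """Return the least costly site type whose recovery window fits the RTO."""
    for site in ("cold", "warm", "hot"):  # assumed order: cheapest first
        if SITE_RECOVERY_WINDOW[site] <= objectives.rto:
            return site
    return None  # even a hot site is too slow under these assumptions


if __name__ == "__main__":
    # The lecture's example: four hours of tolerable loss, four hours down.
    objectives = RecoveryObjectives(rpo=timedelta(hours=4), rto=timedelta(hours=4))
    print(meets_rpo(objectives, backup_interval=timedelta(hours=1)))
    print(cheapest_site_for(objectives))
```

With the four-hour example from the lecture, hourly backups stay inside the RPO, and under the assumed windows only a hot site fits the RTO. If no site type fits, that is a signal to look at an active-active design instead.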