- [Russ] We have spoken about storage options, like EBS, EFS, and S3, and database options, like RDS and DynamoDB. What are our options for protecting the data in these services and maximizing our uptime? We want the ability to recover, or even keep operating, when a failure occurs. Let's think through some scenarios where we could experience data loss, and how we can plan for each.

I have a hard disk in my gaming PC at home. Infrastructure, like hard disks, can and does fail. What types of protection can we apply so that I don't lose my saved games? We want to be sure that data is stored durably. Usually this takes two forms: backups and replication.

I can copy my data to another medium as a backup. For added confidence, I can store the backup offsite from my house. The more copies of my data I create, the better its durability. If a failure occurs, I can recover from a copy of my data. How long does it take me to discover the failure and restore from backup? This is called the Recovery Time Objective, or RTO. If I'm creating a backup every Sunday, what happens if my hard drive fails on Saturday? Oh no, we've lost six days of data. We can call this the Recovery Point Objective, or RPO: the maximum window of data we could potentially lose between backups.

I can also replicate my data. Every byte I write to the hard disk can be replicated to additional disks. If a failure occurs, I have a copy of the data on another disk, which can avoid an outage altogether. Replication doesn't protect me against accidental deletes or data corruption, though. Let's say I accidentally copy over some game files. The overwritten files will be replicated just like any other write. To mitigate this, a recovery strategy will generally combine backups and replication.

Let's apply this to the storage and database services we have learned about in this course. In RDS, we can improve durability and availability with a Multi-AZ deployment. If I enable Multi-AZ, RDS will launch a primary database in one AZ and a standby replica in a different AZ. Every insert, update, and delete I perform on my primary database is replicated to the other AZ. I sleep a little better at night knowing that my data is being stored in two different Availability Zones.

An RDS Multi-AZ deployment will fail over to the standby in a few scenarios, for example, OS security patches during the maintenance window. When an unhealthy primary is detected, or the primary's storage volume fails, RDS promotes the standby to become the new primary. RDS directs clients to the new primary with a DNS change, so it's important to make sure clients aren't caching DNS records for long periods.

Remember, I said all my inserts, updates, and deletes will be replicated? What if that delete was a script I ran by accident, or a newly deployed version of my application overwrote all the records in a table? Just like with my home PC, I need a plan for data corruption. I'm thinking backups. RDS can perform automated backups during a backup window you specify. An automated backup is a snapshot of the storage attached to your database instance, so it will contain all the databases on the instance. Automated backups are kept available for restore for the retention period that you specify. You can also manually create snapshots at any time; manual snapshots are not subject to the retention period. I can restore from backup snapshots.
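If you'd like to see what these RDS options look like in code, here is a minimal boto3 sketch. The instance and snapshot names (games-db and so on) are hypothetical placeholders; the MultiAZ parameter and the snapshot and restore calls are standard RDS API operations.

```python
import boto3

rds = boto3.client("rds")

# Enable Multi-AZ on a hypothetical existing instance; RDS provisions
# a standby replica in a second AZ and keeps it in sync.
rds.modify_db_instance(
    DBInstanceIdentifier="games-db",  # hypothetical instance name
    MultiAZ=True,
    ApplyImmediately=True,
)

# Take a manual snapshot at any time. Unlike automated backups, it is
# kept until you delete it, regardless of the retention period.
rds.create_db_snapshot(
    DBInstanceIdentifier="games-db",
    DBSnapshotIdentifier="games-db-before-migration",
)

# Restore the snapshot into a brand-new instance.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="games-db-restored",
    DBSnapshotIdentifier="games-db-before-migration",
)
```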
Snapshots aren't my only option; I can also restore from a point in time. Remember the bad script that I ran that corrupted my database? I'd want to restore to a time just before I ran the script. RDS isn't only storing snapshots of your database; the transaction log, a record of all the activity on your database, is also being uploaded to S3. When you restore to a point in time, RDS starts from the prior snapshot and then replays the logs to reconstruct the database up to the moment you chose.

Now, let's talk about DynamoDB. In RDS, we talked about creating Multi-AZ instances. In DynamoDB, your data is automatically stored in multiple Availability Zones, without you having to do anything. DynamoDB supports on-demand and continuous backups. Continuous backups are enabled at the table level, and they are what make point-in-time restores possible. When you need to restore a DynamoDB table, you can restore from an on-demand backup or specify a time for a point-in-time restore. You can also do a point-in-time export of a table to an S3 bucket. I'll sketch these calls in code at the end of this section.

Now, let's switch over to object storage with S3. Like DynamoDB, S3 keeps copies of your objects in multiple facilities and on multiple storage devices. S3 is designed for 11 9s (99.999999999%) of durability. I can also turn on object versioning, and S3 will then maintain a version history of every object in my bucket. This protects me against accidental deletes and overwrites: deleting an object is now just another entry in the list of versions. You can still explicitly delete an object version.

If you want further protection, you can enable S3 Object Lock, which lets you store your objects with a write-once-read-many, or WORM, model. A lock can have a retention period, the number of days or years you want the lock in place, or, using a legal hold, you can keep an object locked indefinitely. S3 Object Lock requires that you have versioning enabled on your bucket. A lock has two retention modes. In governance mode, only users with special permissions can overwrite or delete an object version or change the lock settings. In compliance mode, not even the root user can overwrite or delete an object version. These settings are also sketched below.

Now, for the data we are storing on our EBS volumes. To create a durable, point-in-time copy of an EBS volume, I can create a snapshot. To restore data, I create a new EBS volume from the snapshot; the new volume will be an exact replica of the original volume at the moment the snapshot was taken. Snapshots are stored regionally, so I can restore a volume into any AZ in the Region, and I can also copy snapshots to other Regions. Having a copy of your data in another Region can be part of a disaster recovery plan to rebuild your environments. Snapshots created from an instance with multiple volumes are crash-consistent, meaning the snapshots contain data from the same point in time.
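Before we move on to EFS, let me sketch a few of these operations with boto3, starting with DynamoDB. The table names and timestamp are hypothetical; update_continuous_backups, create_backup, and restore_table_to_point_in_time are the actual DynamoDB API operations.

```python
import boto3
from datetime import datetime, timezone

ddb = boto3.client("dynamodb")

# Enable continuous backups (point-in-time recovery) on a hypothetical table.
ddb.update_continuous_backups(
    TableName="customers",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Take an on-demand backup whenever you like.
ddb.create_backup(TableName="customers", BackupName="customers-pre-deploy")

# Restore to just before the bad script ran. Point-in-time restores
# always create a new table rather than overwriting the source table.
ddb.restore_table_to_point_in_time(
    SourceTableName="customers",
    TargetTableName="customers-restored",
    RestoreDateTime=datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc),  # hypothetical time
)
```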
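Next, the S3 versioning and Object Lock settings we just discussed. Again a minimal sketch with a hypothetical bucket and object key; note that Object Lock can only be enabled when the bucket is created.

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")

# Object Lock must be enabled at bucket creation time.
s3.create_bucket(Bucket="saved-games-archive", ObjectLockEnabledForBucket=True)

# Object Lock requires versioning. Enabling the lock at creation turns
# versioning on, but this is the call you'd use on an ordinary bucket.
s3.put_bucket_versioning(
    Bucket="saved-games-archive",
    VersioningConfiguration={"Status": "Enabled"},
)

# Lock one object version in compliance mode until a retain-until date.
# During that window, not even the root user can delete or overwrite it.
s3.put_object_retention(
    Bucket="saved-games-archive",
    Key="saves/world1.dat",  # hypothetical object key
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime(2026, 1, 1, tzinfo=timezone.utc),
    },
)
```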
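And finally, the EBS snapshot workflow, with hypothetical volume and instance IDs. create_snapshots (plural) is the EC2 call that produces crash-consistent snapshots across all of an instance's volumes.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Snapshot a single volume (hypothetical volume ID).
snap = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly backup",
)

# Crash-consistent snapshots of every volume attached to an instance.
ec2.create_snapshots(
    InstanceSpecification={"InstanceId": "i-0123456789abcdef0"},
    Description="point-in-time copy of all volumes",
)

# Copy a snapshot to another Region for disaster recovery. The copy is
# issued from the destination Region; in practice you would wait for
# the snapshot to reach the "completed" state first.
ec2_west = boto3.client("ec2", region_name="us-west-2")
ec2_west.copy_snapshot(
    SourceSnapshotId=snap["SnapshotId"],
    SourceRegion="us-east-1",
    Description="DR copy",
)

# Restore by creating a new volume from the snapshot, in any AZ.
ec2.create_volume(SnapshotId=snap["SnapshotId"], AvailabilityZone="us-east-1a")
```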
And now, for EFS. To explain how to back up and restore EFS, we need to talk about AWS Backup. AWS Backup is a place to centralize and automate your data protection across supported AWS services. At the time of writing, the supported services are the ones you see on screen now, and they include the services we have already spoken about: DynamoDB, EBS, RDS, and EFS. AWS Backup automates a lot of the features we have just covered.

Let's go over the resources you'll create in AWS Backup. First, you'll create a vault. A vault is just a container where backups are stored. Backup plans contain your schedule, for example, "run this every 24 hours," and the vault where you'll be sending recovery points. A backup selection is how you choose the resources that will be included in your backup plan. A selection can contain a list of tags to find resources and a list of ARNs to specify resources directly. For example, a selection could be: back up everything tagged with application=project X, plus the DynamoDB table customers. I'll sketch these resources in code at the end.

Backup jobs will be created on the schedule specified in the backup plan. When a job completes, I'll see recovery points for each resource delivered to the vault. For a service like EBS, this means a snapshot has been created for you; for DynamoDB, a backup has been created. To restore, I find a recovery point, either in the vault or by resource, and then I just restore. If the backup was an EBS snapshot, I'll see a new volume being created from the snapshot.

How does this apply to EFS? When you create an EFS file system, automatic backups are enabled by default. This feature creates an AWS Backup plan and a vault with default settings for EFS. I'll find my EFS backups as recovery points sent to the EFS automatic backup vault. At restore time, I can choose a full restore or an item-level restore, and the files can be restored to a directory in the original file system or to a new file system.

As you can tell, there are many options for designing your storage and database solutions for durability and protection from failure. We all hope disaster doesn't strike our production systems, but planning and preparation are the best way to be ahead of any disaster when it does happen.
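To wrap up, here is roughly what creating those AWS Backup resources could look like with boto3. A minimal sketch: the names, cron schedule, and IAM role and table ARNs are hypothetical placeholders, while create_backup_vault, create_backup_plan, and create_backup_selection are the real AWS Backup API operations.

```python
import boto3

backup = boto3.client("backup")

# 1. A vault: the container where recovery points are stored.
backup.create_backup_vault(BackupVaultName="project-x-vault")

# 2. A plan: the schedule, plus the vault recovery points are sent to.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "project-x-daily",
        "Rules": [{
            "RuleName": "every-24-hours",
            "TargetBackupVaultName": "project-x-vault",
            "ScheduleExpression": "cron(0 5 * * ? *)",  # daily at 05:00 UTC
            "Lifecycle": {"DeleteAfterDays": 35},
        }],
    }
)

# 3. A selection: resources chosen by tag, plus explicit ARNs.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "project-x-resources",
        "IamRoleArn": "arn:aws:iam::123456789012:role/BackupRole",  # hypothetical
        "ListOfTags": [{
            "ConditionType": "STRINGEQUALS",
            "ConditionKey": "application",
            "ConditionValue": "project X",
        }],
        "Resources": [
            "arn:aws:dynamodb:us-east-1:123456789012:table/customers",  # hypothetical
        ],
    },
)
```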