Let's get technical. In this lesson, we'll explore specific features of Delta Lake. We'll also discuss a related data model that balances flexibility and reliability. By the end of this lesson, you'll be all set for me to hand you over to Brooke, who will show you how to implement these ideas in code on your own data.

To recap, lakehouses add reliability, quality, and performance to Data Lakes. The lakehouse is backed by Delta Lake, an open-source technology developed initially at Databricks before we donated it to the Linux Foundation. It's built on top of Apache Parquet, the scalable file format you saw in earlier lessons. Parquet is great, but its features alone won't quite get us to the lakehouse vision; that's where Delta comes in.

Delta adds a transaction log on top of Parquet files. This means we can update and delete rows in our files. The specific term for this is ACID transactions. ACID is an acronym. The A stands for atomicity: the guarantee that if you're adding data to one table and subtracting it from another, both steps can be a single transaction that either succeeds or fails as a whole. The C stands for consistency, meaning the database is always in a valid state; if I start writing to a table while somebody else is reading from it, they still get a consistent view. Isolation is our I, and it means we can run concurrent queries against our data. For our D, that's durability: if the lights go out, we won't lose our data. In our case, it also means that if we take down our Spark cluster, the data will persist. Those are our ACID guarantees, and that's why we sometimes joke that Delta is Spark on ACID.

Just as ACID transactions don't come with the Parquet file format, neither does schema enforcement. With Delta, you can enforce and evolve your schema as needed. You can also write to Delta tables from both batch and streaming operations. Finally, since you have a log of every transaction against the table, you can always time travel back to previous versions of your data. In summary, you get data versioning, reliable and fault-tolerant transactions, and a fast query engine, all while maintaining open standards. That ensures your teams can access timely, reliable, high-quality data.

Now let's talk about the architecture. On the left-hand side here, we have a number of different possible data sources, including streaming queues like Kafka, Kinesis, and Firehose, as well as a Data Lake and Spark. You can land the data however you need to. The data is then put into a bronze table, also called a raw or ingestion table. Basically, we just need a way to land our data in some raw format. We want it raw so that we can go back and see the source of our data in case something goes wrong. This bronze table would be a Delta table backed by S3 or Azure Blob Storage. On this table, you'd want to do some schema enforcement so that the data has the schema you expect. If data fails this check, you can quarantine it so you're not propagating bad records through your system; you can then go to the quarantine to figure out what went wrong. To keep organized, it's helpful to have bronze or raw in the prefix of the table or database name.

Next up are the silver tables, also known as filtered or augmented data. These would be their own Delta table or tables with higher-quality data. You might parse out timestamps or pull out specific values you're expecting to see.
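As a rough illustration, here's a minimal sketch of what that bronze-to-silver refinement step might look like in PySpark with Delta. The paths and column names (event_ts_raw, event_id, user_id, value) are hypothetical placeholders, and the setup assumes the open-source delta-spark package; on Databricks, a Delta-enabled spark session is already provided for you.

```python
# A minimal sketch of a bronze-to-silver refinement step; paths and column
# names are hypothetical placeholders.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_timestamp

builder = (SparkSession.builder
           .appName("bronze-to-silver")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

bronze_path = "/tmp/delta/bronze/events"   # would be S3 or Azure Blob in production
silver_path = "/tmp/delta/silver/events"

# Read the raw bronze Delta table exactly as it was landed.
bronze_df = spark.read.format("delta").load(bronze_path)

# Refine: parse the raw timestamp string and keep only well-formed rows.
silver_df = (bronze_df
             .withColumn("event_ts", to_timestamp(col("event_ts_raw")))
             .filter(col("event_ts").isNotNull())
             .select("event_id", "user_id", "event_ts", "value"))

# Append to the silver table; Delta enforces the table's existing schema on write.
silver_df.write.format("delta").mode("append").save(silver_path)
```

Rows that fail the parse could just as easily be routed to a quarantine table instead of being dropped, in keeping with the bronze-table checks described above.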
Then finally, you'd have another set of tables known as gold tables. These are your high-level aggregates. You might have one table per report or dashboard, or a table of features for some machine learning model. The core idea is that you're incrementally improving the quality of your data until it's ready for a specific application.

Now, with ACID transactions, you can always delete parts of your data if need be. You couldn't really do this with Parquet, CSV, or other file formats, because you'd have to read back in all of the data and write it all back out without the particular rows you wanted to delete. You can also easily unify streaming and batch workloads: you can run a nightly batch process to propagate data through this architecture, or you can set up a streaming job between each of these tables. While streaming is outside the scope of this course, it's good to know that you can use it to keep your data constantly up to date. Finally, you can follow standard practices for retention and corrections, including inserting, updating, merging, and so on. This is really important when it comes to GDPR and other data protection regulations that address issues of privacy and data ownership.

In summary, Delta allows you to do advanced, database-like operations in a Data Lake so that you can have scalable, reliable, and optimized queries. This is the so-called medallion architecture, moving from bronze to silver to gold. In the next videos, Brooke is going to demonstrate how to bring this lakehouse vision into reality using Spark and Delta.
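As a small, hedged preview of the kinds of operations Brooke will walk through, here's a sketch of a GDPR-style delete, an upsert with MERGE, and a time-travel read against a Delta table. The table paths, column names, and user id are hypothetical placeholders, and the setup again assumes the open-source delta-spark package.

```python
# A preview sketch of database-like operations on a Delta table: a GDPR-style
# delete, an upsert with MERGE, and a time-travel read. Paths, columns, and
# the user id are hypothetical placeholders.
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

builder = (SparkSession.builder
           .appName("delta-operations-preview")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

silver_path = "/tmp/delta/silver/events"   # would be S3 or Azure Blob in production
events = DeltaTable.forPath(spark, silver_path)

# GDPR-style correction: remove every row belonging to a specific user.
events.delete("user_id = 'user-123'")

# Upsert late-arriving corrections with MERGE.
corrections_df = spark.read.format("delta").load("/tmp/delta/staging/corrections")
(events.alias("t")
 .merge(corrections_df.alias("c"), "t.event_id = c.event_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# Time travel: read the table as it looked before these changes.
previous = spark.read.format("delta").option("versionAsOf", 0).load(silver_path)
```

Because every one of these writes is recorded in the Delta transaction log, the versionAsOf read at the end can still see the table as it existed before the delete and merge.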