Welcome back. If you recall, in the last module we learned about the hierarchy of data, and we've already spent time practicing the skills at the base of that hierarchy. In this section, we'll practice aggregating and labeling data, which is the natural progression after you've explored and transformed your data. Let's get started.

Remember how much work it was to count how many users were added each day? Let's automate some of that process and add in labels that we'll use to segment our users. The kind of table we're building is called a snapshot table. It gives a picture of what the users table looked like each day. I have all this information about what users have done: has their account been deleted, have they ever ordered anything. So here are the columns that I'm going to be adding. All of these things could be useful for segmenting users into meaningful groups during analysis, or for generating dashboards. For prediction problems in machine learning, it can be really helpful to look at what information would have been available before an outcome of interest, like an account deletion or an order being placed. A snapshot table is a great tool for gathering pre-event data.

If you want some additional practice with complex SQL, this would be a good exercise to build on. There are lots of other features we could add here that are more complicated to build because they would require windowing functions. I imagine it could be useful to include features like the most recently used platform and the most recently viewed item. Don't worry, we'll spend some time in the third module talking about how to compute "most recent" values using windowing functions. Just keep this in mind; you can come back to this one later. The point of this section is mainly to practice creating a table that exists at this level of the hierarchy.

Since we've already spent so much time thinking about dependencies, let's think about them for the table we want to build. The upstream dependencies are the users table and the orders table. The downstream tables, well, we haven't built them yet because they're at a different level of the data hierarchy, but some examples would be a dashboard-specific table, or a table of prediction scores computed from these features.

Now let's think about how we build this table if it's going to be updated daily. When we build it, we want a query that can add just the latest updates from a nearly continuous stream of new data. We can build a query that adds data for a single day of the snapshot. When we use this table, we'll need to make sure it works for all of the days in the past. We use the word backfilling to describe that process. Backfilling can be done by running the single-day query over an entire date range, one day at a time, or it can be done with a slightly altered query.

If you're working with a very big dataset and scheduling a query to run every day, it makes sense to conserve computational resources. This stage, where you create the table that updates daily and the query that maintains it, is where it can be worth spending some time making sure your query is efficient. That can be done by using an appropriate table structure, by being clever about which rows actually need to be updated, or by setting parameters like memory usage appropriately. Optimizing queries isn't the focus of this class, and many of the techniques are specific to the database setup you're working with.
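To make the single-day query idea concrete, here is a minimal sketch of what it could look like. The table names (users, orders, user_snapshots), the columns (user_id, created_at, deleted_at), and the dialect are all assumptions for illustration, not the exact schema from the example; adjust them to whatever your database actually contains.

```sql
-- Minimal sketch of a single-day snapshot query (table and column names are hypothetical).
-- Adds one day's worth of rows to the snapshot table.
INSERT INTO user_snapshots
SELECT
    DATE '2020-01-01'                                                   AS snapshot_date,
    u.user_id,
    MAX(CASE WHEN u.deleted_at <= DATE '2020-01-01' THEN 1 ELSE 0 END)  AS account_deleted,
    MAX(CASE WHEN o.user_id IS NOT NULL THEN 1 ELSE 0 END)              AS has_ordered
FROM users u
LEFT JOIN orders o
    ON o.user_id = u.user_id
   AND o.created_at <= DATE '2020-01-01'   -- only orders placed on or before the snapshot day
WHERE u.created_at <= DATE '2020-01-01'    -- only users who existed on the snapshot day
GROUP BY u.user_id;
```

And here is a sketch of the slightly altered backfill variant, assuming a hypothetical calendar table with one row per snapshot_date, so a whole date range can be filled in a single pass instead of one day at a time:

```sql
-- Backfill sketch: same logic, joined against a calendar of snapshot dates (hypothetical names).
INSERT INTO user_snapshots
SELECT
    c.snapshot_date,
    u.user_id,
    MAX(CASE WHEN u.deleted_at <= c.snapshot_date THEN 1 ELSE 0 END)  AS account_deleted,
    MAX(CASE WHEN o.user_id IS NOT NULL THEN 1 ELSE 0 END)            AS has_ordered
FROM calendar c
JOIN users u
    ON u.created_at <= c.snapshot_date
LEFT JOIN orders o
    ON o.user_id = u.user_id
   AND o.created_at <= c.snapshot_date
WHERE c.snapshot_date BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
GROUP BY c.snapshot_date, u.user_id;
```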
If you're interested in this specific area, you should check out classes on Spark. In the next concept video, I'll talk about one of the data warehousing tools that's specific to Hive. Now let's dive into our example.