Person: Derive tables are a great feature of Looker that allows you to create new tables that do not yet exist in a database. In this section, we'll explore how to use derive tables to define new custom tables to address complex business user questions. Recall that when you work with existing tables in your connected database, you reference the necessary table names in the SQL_table_name parameter of the view files. You can also see the table names used by a view by reviewing the generated SQL for Explore queries. However, it is not unusual for existing database tables to be insufficient in assisting you to answer more complex questions. Sometimes, you need to structure your data in other ways through SQL objects like temporary tables, materialized views, common table expressions, or CTEs, and subqueries. In LookML terms, if you need to produce a CTE or temp table, you need to define a derive table. The most common use case for derive tables is to overcome structural limitations for patterns that would require a subquery in raw SQL. A nested or multistep aggregation is a common pattern where we need to do an aggregate of an aggregate. Consider the example of trying to find which department has the highest total sales. You may think to try something like select max(sum(sales)) from orders, group by department. However, this query won't work for two reasons. First, we cannot take the max of a sum. Any database dialect would complain about these nested aggregates. Second, the group by logic needs to be applied separately. To resolve the issue in SQL, you can isolate the first aggregation and grouping in a subquery, then find the max from those results using select max(subquery.total_sales) from (select sum(sales) as total_sales from orders, group by department) as subquery. Here is another scenario. Consider an order items that can have many items per order like a shopping cart. Now maybe you want to do some shopping-cart analysis like, are there any patterns to or drivers behind orders that contain multiple items as opposed to just one, or what is the average number of items people tend to combine in a single order? For the first question, you would need to separate orders with one item versus orders with multiple items while the second question would be like an average of account. Or maybe you want to classify high versus low-value orders or determine the average total order cost using a table that is granular to the item level. For the first question, this would be like doing a case-when with each order ID's sum of sale price while the second question is another inquiry about average of sum, specifically average customer lifetime spending. To understand how derive tables can help answer these questions, let's take a look at an example data set. Imagine that to the left is the actual structure of the data in the table. To the right is the table structure that you need to answer those various questions related to number of items per order and total order cost. Specifically, you need to calculate each order ID's count of item IDs and sum of cost. Using derive tables, you can create a new table of these two aggregations and then use this new derive table to answer questions about orders based on the number of items or total revenue. So far, we have explored a few use cases where we need to preaggregate or dimensionalize metrics using derive tables. Of course, there are many more situations in which derive tables would be useful such as analyzing behavioral patterns using raw event data that needs to be partitioned or organized into user sessions based on specific activity timeframes. With a derive table, you can ask questions such as, how often does someone who started on page A end up converting or doing some specific desired action, or how much more likely is a customer to purchase product B given that they have purchased product A? Another example might be order information that needs to be rolled up by customer and month so that you can determine things like average amount of time that passes between one order and the next, or maybe you are working with multiple tables for various marketing channels. You may want to combine these tables into one large table using a union so that you can calculate and compare results across all channels. So now that we know why derive tables are useful, how exactly do we define derive tables to create new aggregations or tables that we need? Derive tables are either manually written select statements or Looker-generated select statements that produce results that can be queried just like a real database table. Derive tables are integrated in LookML projects as views. You can explore or join those views in the same way as other views that point to actual database tables. Derive tables have two types of existence: first, ephemeral tables which compile at runtime and are generated as CTEs or temp tables. Second, persistent tables, or PDTs, which are stored in the underlying database. The benefit in persisting derive tables is that they are ready to go when business users need them and therefore reduce query runtimes. The downsides are that they take up storage space in your database which may correlate to cost, and they are more rigid. They cannot accept dynamic logic, so you may choose to persist only some of your derive tables such as the most frequently used ones. In summary, derive tables are a useful feature for defining new custom tables that do not yet exist in your database. Derive tables are used just like other views in your LookML project and produce results that can be queried just like a real database table. Depending on your use case, derive tables can be ephemeral or written back to the database as a persistent derive table. With derive tables, you can create custom tables to aggregate your data in a variety of ways to help answer business questions. Hope you have fun experimenting with derive tables in your organization's Looker instance.