Welcome to "Querying the Data." After watching this video, you will be able to: Interpret an entity-relationship diagram, or ERD, for a star schema and use the relations between tables to set up queries. Create a materialized view by denormalizing, or joining tables from, a star schema. Apply the CUBE and ROLLUP options in a GROUP BY clause to generate commonly requested total and subtotal summaries. CUBE and ROLLUP operations generate the kinds of summaries that management often requests. These summaries are much easier to implement than the multiple SQL queries that are otherwise required. Materialized views conveniently enable you to create a stored table you so can refresh on a schedule or on-demand. When the a view is complex, requested frequently, or is run on large data sets, consider materializing the view to help reduce the load on the database. Because the data is precomputed, querying materialized views can be much faster than querying the underlying tables. Combining cubes or rollups with materialized views can enhance performance. You can even follow up by materializing the cube or rollup. Consider the following scenario: You have the task of creating some live summary tables for reporting January sales by salesperson and automobile type for ShinyAutoSales. Begin by understanding the existing star schema in their data warehouse, called "sasDW," based on PostgreSQL. Then explore relevant ShinyAutoSales data by querying the tables from the "sales" star schema in the sasDW warehouse. After exploring the schema, you decide to create a materialized view as a staging table. Creating the view as a staging table provides you with the data you need while minimizing your impact on the database. You can incrementally refresh the data at will during off-peak hours. You start a PostgreSQL session and generate an entity relationship diagram, or ERD, which represents the "Sales" star schema implemented within the ShinyAutoSales data warehouse, "sasDW." Then, you locate the central fact table named "fact auto sales." This table contains the "amount" column, which is the measure you need. You also spot the three foreign keys in the sales fact table: "sales date key," "auto class ID," and "salesperson ID." These keys link respectively to: The "Date dimension table," which contains dates and related values such as the day of the week, month name, and quarter. The "Auto category dimension table," which includes the "auto class name," and the Boolean "is new" column, and finally, the "Salesperson dimension table," which contains the "salesperson's name." In this example, you are using PostgreSQL. Let's assume you already started up the terminal-based front-end to PostgreSQL, "P S Q L," and connected to the "S A S D W" data warehouse. Notice the command prompt contains the name of the data warehouse you are connected to, "S A S D W." Starting with the auto sales fact table, you'll enter the SQL statement "select star from sales dot fact auto sales limit 10" to display its first 10 rows. Here, you see the dollar amounts for individual auto sales, but the remaining columns are primary and foreign keys, which don't have any direct meaning for you yet. However, you notice that the sales ID values are sequential, but the numbering starts at 1,629 instead of 1. That's because ShinyAutoSales has provided you with access to a windowed subset of their data. Next, you query the auto category dimension table. Now, you can see meaningful names for various automobile classes, such as truck and compact SUVs. You notice duplicate entries for the truck class and wonder why they exist. When you look more closely, you realize the duplicate entries exist because of the distinct subclasses for new and used trucks. Similarly, you generate a view for the salesperson dimension table and find eight distinct salesperson names, including "Gocart Joe" and "Jane Honda." So far, so good! Finally, you view the date dimension table. You notice the dates only go back to January 1, 2021. Your contact at Shiny Auto Sales informs you that she will provide you with more data later and that for now, you can work with a smaller data set while you develop your queries. The date table contains potentially useful date elements such as the day of the week, month name, and quarter name. At this stage, it would be more convenient to have a table of data that contains the dimensions you need with human interpretable columns, rather than just keys. Essentially, you want to create a denormalized view of the data by joining the dimensions back to the fact of interest. You proceed by selecting the "date," "auto class name," "is new," "salesperson name," and "amount" columns from their tables, and joining each dimension onto the "amount" fact using an inner join on the corresponding keys. Next, why not capture the view as a materialized view called "Denormalized sales" or "D N sales" for short? Then you can reuse the materialized view for different queries without having to recreate your work. You accomplish this task using the clause "CREATE MATERIALIZED VIEW D N sales AS," followed by the same query you used to generate the denormalized view. Type "Select star from D N sales, LIMIT 10" to display your resulting materialized view. Now you have a tidy, human-readable, time-series of sales data available for further analysis. For example, you can see that "Cadillac Jack” sold a new midsize SUV on January 5 for $26,500. Next, you want to apply CUBE and ROLLUP operations to your denormalized, materialized view. Let's see the CUBE results. Here, you select the "auto class name," "salesperson name," and the "sum of the sales amounts" from "D N sales," where "is new" is set to "true." Finally, group the generated cube by the "auto class name" and "salesperson name." The output looks like this: The first row has no entries in the dimensions columns, which means 'all.' Thus, the value of $366,076 represents the total sales for all new cars. The next block of records has both dimension columns populated. So, for instance, you can read the total sales of new midsize SUVs by "Gocart Joe,” which is $32,099. Similarly, the last two blocks summarize "new auto sales" by class, and by salesperson. Next, you apply a ROLLUP instead of a CUBE operation. You decide to keep the query the same as the previous query, except that you replace CUBE with ROLLUP. Here's what the resulting view looks like now. You have five fewer rows with the ROLLUP result than CUBE, resulting in 13 rows instead of 18 rows. The only difference in this result is that you don't have the "total sale amounts by salesperson" summary. While CUBE generates all possible permutations of the "GROUP BY" columns, ROLLUP only looks at the single permutation defined by the columns' order listed in the ROLLUP call. In this video, you learned that: CUBE and ROLLUP summaries on materialized views provide powerful capabilities for quickly querying and analyzing data in data warehouses. CUBE and ROLLUP operations generate the kinds of summaries grouped by dimensions that management often requests. You can denormalize star schemas using joins to bring together human-interpretable facts and dimensions in a single materialized view. You can create staging tables from materialized views, which you can incrementally refresh during off-peak hours.