To recap the lab: we're going to take data that's aggregated within BigQuery and then analyze those aggregates within AI Platform Notebooks. Like in the demo, let's start a new notebook environment. Go down to "AI Platform" and click "Notebooks". I've already got a notebook server running, but recall how to create one: click "New Instance", choose TensorFlow, go without a GPU this time, and create the new instance. Remember, this takes about 2-3 minutes.
While that's starting up, let's explore our dataset from within BigQuery. Click the navigation menu again, go down to "Big Data", and click "BigQuery". Now we have our BigQuery environment with a query editor. We're going to be using a public dataset, in this case a dataset on arrival and departure times of various airline flights. I'm going to paste this query in. I had some Python syntax here, but this is SQL syntax.
Let's explain this query. We are selecting the departure delay, the number of flights at that departure delay, and then the deciles of the arrival_delay. Say your flight leaves 10 minutes late. Does it arrive 10 minutes late? What is the range of values that it takes? We're splitting it into the zeroth percentile, the 10th percentile, all the way up to the 100th percentile. We're getting this from a public dataset of flights. We're grouping by departure_delay, we're only looking at those departure_delays that have more than 100 flights, and then we're ordering by departure_delay.
We have 70 million rows in this dataset, and I'm going to click "Run". BigQuery does the aggregation and shows us the result within seconds: 6.2 seconds, with a gigabyte of data processed. If we look at the results, we see a departure_delay of negative 37.
We had 175 flights that left 37 minutes early. These are the deciles of when those flights arrived. At the zeroth percentile, negative 66; at the 10th, negative 44, so those flights actually arrived 44 minutes early; at the 20th, 41 minutes early; and so on, all the way to the 100th. At the 100th percentile, a flight that left 37 minutes early actually arrived 33 minutes late. That's an extreme outlier.
Now that we have all these values generated and aggregated in BigQuery, we can do some analysis on those aggregates in AI Platform Notebooks. Let's go back: go down and click on "Notebooks". By now our new instance should have been instantiated. I'm going to click "Open JupyterLab" and get my new notebook environment. I want to create a new notebook here. I can change the name by clicking "Rename"; let's call this one airline delays.
Before I actually add a query, I need to import the Google Cloud Python library so I can work with BigQuery seamlessly. I'm importing the BigQuery library from the Google Cloud Python SDK, and then I'm creating a variable, bq, that holds a BigQuery client, passing my Qwiklabs project ID as a parameter. Then I'm going to define my query. It's the same query that I used in the BigQuery window, defined as a string. Then, to put the result into a pandas DataFrame, I'm going to write df = bq.query(query).to_dataframe(), which converts it to a pandas DataFrame. Now, if I type df.head(), that gives me the first five rows of that DataFrame. Each of these rows is a departure delay in minutes (negative 37, negative 36, negative 35, and so on), followed by the number of flights and then the deciles for those flights.
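Here is a minimal sketch of what those notebook cells might look like. The transcript doesn't show the query text itself, so the table name (`bigquery-samples.airline_ontime_data.flights`), the column aliases, and the use of `APPROX_QUANTILES` are assumptions based on the description of the query; the project ID is a hypothetical placeholder you would swap for your own.

```python
# Assumed query: departure delay, flight count, and arrival-delay deciles.
# The table name and aliases are assumptions, not taken from the transcript.
QUERY = """
SELECT
  departure_delay,
  COUNT(1) AS num_flights,
  APPROX_QUANTILES(arrival_delay, 10) AS arrival_delay_deciles
FROM
  `bigquery-samples.airline_ontime_data.flights`
GROUP BY
  departure_delay
HAVING
  num_flights > 100
ORDER BY
  departure_delay ASC
"""


def run_query(client, query=QUERY):
    """Run the query and return the result as a pandas DataFrame."""
    # client.query() starts the job; to_dataframe() waits for it to finish
    # and downloads the (small) aggregated result.
    return client.query(query).to_dataframe()


def main(project_id):
    # Imported here so the rest of the module loads without the library.
    from google.cloud import bigquery

    bq = bigquery.Client(project=project_id)
    df = run_query(bq)
    print(df.head())  # first five rows, one per departure_delay value


if __name__ == "__main__":
    main("your-project-id")  # hypothetical placeholder project ID
```

`APPROX_QUANTILES(arrival_delay, 10)` returns an array of 11 values per group (the minimum, nine interior deciles, and the maximum), which is why each row carries the zeroth through 100th percentile in a single column.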
Next, I'm going to take a piece of code that generates a DataFrame with the departure_delay in one column and, in each of the other columns, a decile (0 to 100 percent) of the arrival delay for flights with that departure delay. The way this looks is, I have departure_delay and then each decile in its own column, all the way to 100. Just to give you a sense of the size of this DataFrame, I'm going to type df.shape. That gives us first the number of rows in this DataFrame, 508, and then the number of columns, 12. That makes sense, since it's departure_delay plus each of the 11 deciles.
The last thing I'm going to do is plot the arrival delay deciles against the departure_delay. On the x-axis I'm going to have the departure_delay, and on the y-axis I'm going to have the arrival delay at each of these decile values. In this piece of code, I first remove the zeroth percentile and the 100th percentile because those are extremes, and then I plot against the departure_delay. I'm only going to look at departure_delays between 30 minutes early and 15 minutes late. When we plot this in the notebook, what we see is that the deciles all follow generally the same trend: the later a flight departs, the later it arrives, which is what you'd probably expect.
What I really want to highlight in this lab is this pattern: doing large-scale data analysis over 70 million rows in BigQuery, generating the aggregates with BigQuery's serverless technology, then taking the aggregation (in this case, 508 rows of aggregate statistics) and doing the analysis on this virtual machine. We're going to be using this pattern over and over again in this specialization, so make sure you get familiar with it.
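The reshaping and plotting steps described above can be sketched roughly as follows. This is not necessarily the exact code from the lab: the column names `departure_delay` and `arrival_delay_deciles`, the "0%" to "100%" labels, and the helper names `expand_deciles` and `plot_deciles` are assumptions for illustration.

```python
import pandas as pd


def expand_deciles(df):
    """Spread the array column of deciles into one column per decile."""
    # Each cell of arrival_delay_deciles is assumed to hold 11 values:
    # the 0th, 10th, ..., 100th percentile of arrival delay.
    percentiles = pd.DataFrame(
        df["arrival_delay_deciles"].tolist(), index=df.index
    )
    # Rename positions 0..10 to "0%", "10%", ..., "100%".
    percentiles = percentiles.rename(columns=lambda i: f"{i * 10}%")
    return pd.concat([df["departure_delay"], percentiles], axis=1)


def plot_deciles(wide):
    """Plot arrival-delay deciles against departure delay (needs matplotlib)."""
    # Drop the extremes, then restrict the x-axis to departures between
    # 30 minutes early and 15 minutes late.
    without_extremes = wide.drop(columns=["0%", "100%"])
    return without_extremes.plot(x="departure_delay", xlim=(-30, 15))
```

In the notebook this would be used as `wide = expand_deciles(df)` followed by `plot_deciles(wide)`. For the query result described in the lab, `wide.shape` should come out as 508 rows by 12 columns: one column for departure_delay plus 11 decile columns.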