Okay, all right, here's the really fun part. We get to introduce a new tool to a lot of you called Cloud Dataprep, where you can take a lot of the best practices you've learned for cleaning up your data and execute them using a fun drag-and-drop user interface. Okay, we're going to explore with some tools. So Cloud Dataprep is a web UI tool for building preprocessing data pipelines, and it's part of Google Cloud Platform. Behind the scenes, what it actually does is kick off a Cloud Dataflow job, you remember that data engineering tool we talked about, without you having to write any of the Java code that data engineers would normally write. So it's a UI tool that gives you access to a lot of these functions, much like you would do deduplications inside of BigQuery, and then behind the scenes it creates that flow for you.

And that's the first key term to know: these visual maps of data being ingested, as you can see here. Then there are the scripts that are run, which you create inside of the UI and which are called recipes. And this entire picture here is called a flow. So data flows together as you're transforming it. And how do you transform the data? Through transformation steps, which the tool itself calls wranglers. The wranglers can include very common operations that map to SQL, like aggregations, or deduplications, or removing rows that satisfy a condition, or creating derived fields, much like you would in SQL, but without writing any SQL. And you can chain multiple of these transformation steps together into what's called a recipe. So three key terms, right? You have the flow itself, which is that very pretty visual flow. Then you have the wranglers, which are those transformation steps. And then you have the recipes, which can be multiple wranglers that form a repeatable set of steps that preprocess your data.

Okay, so we covered recipes, and at the end of the day, if you want to actually run a recipe, you submit it as a job. The job then looks up the recipe and goes and fetches your entire dataset, because when you're operating inside of Cloud Dataprep, you're only working on a sample of your dataset, up to about ten megabytes, and you're building that recipe there for the sake of speed. The actual job runs against all of your data and fires off that Cloud Dataflow job behind the scenes, which you can actually access and look at. And after that job has run, you can look at some critical statistics, like how many rows of data were processed. In this particular screenshot it's over 300,000 rows. And what we haven't shown you yet are the really fun histograms that are built into the tool's visuals, where you can look at the frequency of values as well, right? And here's where we can track those jobs.

So, just a quick tour of Cloud Dataprep. Here is one of the flows that you're actually going to be creating as part of the second lab. And let's just find a dataset so we can show you how cool it is to explore this data. So here are your organizational details, and here is your 2014, 2015 filing information. I'm going to click on one of these datasets, and then a picture is worth 1,000 words here.
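To make that wrangler-to-SQL comparison a bit more concrete before we look at the data, here's a rough sketch of what a few of those transformation steps, a deduplication, a row filter, and a derived field, might look like if you wrote them by hand in BigQuery Standard SQL. The table and column names (irs_990.filings, total_revenue, total_expenses) are placeholders for illustration, not necessarily the lab dataset's actual schema, and this is not the SQL the tool itself generates.

```sql
-- A rough sketch (not the tool's actual output) of what a few common
-- wranglers map to in BigQuery Standard SQL. Table and column names
-- (irs_990.filings, total_revenue, total_expenses) are placeholders.
WITH deduplicated AS (
  -- Deduplication wrangler: keep only distinct rows
  SELECT DISTINCT *
  FROM `my-project.irs_990.filings`
)
SELECT
  ein,
  tax_year,
  total_revenue,
  -- Derived-field wrangler: compute a new column from existing ones
  total_revenue - total_expenses AS net_income
FROM deduplicated
-- Row-filter wrangler: remove rows that satisfy a condition
WHERE total_revenue IS NOT NULL
```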
So as soon as this loads this ten-megabyte sample of data into what's called the Transformer in Cloud Dataprep, this is where you can spend just a ton of your time, and we'll cover more of this in the labs to come. But here you can see things like how many electronic filers there are. So imagine BigQuery, where you have that Preview we saw before of some of the rows and columns, complete with this histogram on top, which shows you how frequent those values are, right? It's just like running a little SQL query with a GROUP BY, and it shows you that for every single one of these fields as well. And also at the top, it'll show you a data quality horizontal bar graph, where it'll say, all right, out of this dataset you have 7 valid values and 24 missing values, or values that don't match the data type you said the column should be. So this exploring view inside of the Transformer is very powerful for exploratory analysis and for looking for those anomalies in your data. And you'll get a lot more into that in the second part of this lab. But first, you're going to have to get your data into Cloud Dataprep.
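And just to tie that histogram back to the GROUP BY comparison, here's a minimal sketch of the kind of query one of those column histograms is effectively running for you, counting how often each value appears in a single column. Again, the table and column names here are placeholders for illustration, not the tool's actual internals.

```sql
-- Roughly what a single column's histogram corresponds to:
-- a count of how often each value appears.
-- Table and column names are placeholders for illustration.
SELECT
  elf AS filer_type,      -- e.g. an electronic-vs-paper filing flag
  COUNT(*) AS value_count
FROM `my-project.irs_990.filings`
GROUP BY elf
ORDER BY value_count DESC
```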