Alright, so let's dive into projections a little bit more, and for this I'd like to use the aggregation framework. As is the case with much of the way MongoDB works, particularly with respect to reading and writing data, nearly anything you can do in the MongoDB query language you can also do in the aggregation framework. And with the aggregation framework we have even more power in terms of projections, which are important for reshaping data, as we're going to want to do in cleaning up this movies data set.
Now, for a number of examples here, I'm going to experiment a little to show you the ways in which we can use the aggregation framework for ETL work. As part of these examples, we'll use the movies_scratch collection as the place where we write output so we can look at it. For this type of exploratory work there's no reason to run against the entire collection, so in all of these examples I limit the documents I'm working on to the first 100 in the collection, using a $limit stage as the first stage in my pipeline. And as I mentioned, we'll always be dumping the results out to the movies_scratch collection. Again, we're doing this just to build up the pipeline that's going to reshape these documents.
In considering reshaping, we're going to be looking at a stage we've not yet looked at, called $project. $project allows us to specify a projection on all documents that pass through this stage. So what we're going to do here is ignore the fact that we've already looked at splitting several of these fields into array-valued fields using an $addFields stage. Where we are at this point is that we've done enough work on the fundamentals of the aggregation pipeline, filtering and projecting, that we can really begin to talk about how to use $project to do a lot of what we need to do to clean up data.
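To make the shape of that experiment concrete, here's a minimal sketch in Python of a pipeline that starts with $limit and ends by writing to movies_scratch. The $out stage is one way to dump results into a collection; the lecture doesn't name the stage it uses, so treat that as an assumption. The helper names, the database name mflix, the source collection name movies_initial, and the connection string are also assumptions for illustration.

```python
# Sketch: wrap any reshaping stages between a leading $limit and a trailing $out.
# "mflix" and "movies_initial" below are assumed names, not confirmed by the lecture.

def scratch_pipeline(reshape_stages, limit=100, out="movies_scratch"):
    """Return a pipeline that caps the input, reshapes it, and writes to `out`."""
    return [{"$limit": limit}, *reshape_stages, {"$out": out}]


# A placeholder reshaping stage; the real $project spec is built up later.
pipeline = scratch_pipeline([{"$project": {"title": 1, "year": 1}}])


def run(pipeline, uri="mongodb://localhost:27017", db="mflix", source="movies_initial"):
    """Run the pipeline against a live server (hypothetical connection details)."""
    # Imported lazily so the pipeline can be built and inspected without pymongo.
    from pymongo import MongoClient

    client = MongoClient(uri)
    # With $out as the last stage, the results land in the target collection.
    client[db][source].aggregate(pipeline)
```

Keeping $limit first means every later stage only ever sees 100 documents, so experiments stay fast while the pipeline is still changing.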
So the best way to think about a $project stage is as a tool for reshaping documents. As we saw in a prior example, I can stipulate that I want to include a field by simply specifying the field name and a one, and I can explicitly exclude a field by using a zero. In this case, what we're going to do is pass through all fields of every document, but for some of them we're going to do a little bit of work, either to clean them up or to reshape them in some way so that they meet our needs for the type of analysis and application building we want to do down the road. So any place in this $project where you see a one, it simply means that the field contains data that's good enough, or that we're willing to work with. It's only the fields that have something else that you really need to focus on.
There are a couple of silly fixes that we're making here as well, and by silly I really mean just tidying things up as opposed to fixing a real problem. This is one example. In the movies_initial data set there is a field called fullplot. The only real problem with that field, from my perspective, is that its name doesn't use camel case. So what I'm doing here is effectively renaming that field to fullPlot. As we saw in some other lessons, this is a field path specifier, and its semantics are: find the value of the fullplot key in each document that passes through here. What we're going to do then is create a new key with this name, fullPlot, and for every document that passes through this stage give it the value of the key with the slightly different name, fullplot with no capital P.
Here we're doing something similar in renaming the rating field to rated. I've found with this data set that people get confused about rating versus rated. The rating field is actually the MPAA rating: PG, R, PG-13. With the name rating, people tend to think that it's a critic's review or a viewer's review.
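Here's a small sketch of the pass-through and rename parts of that projection, written as a Python dictionary the way you would hand it to a driver. To show the semantics without a server, it also includes a toy stand-in for $project that handles only the two cases discussed so far: a one, and a top-level field path like "$fullplot". The apply_project helper and the sample document are made up for illustration; real $project does far more than this.

```python
# The renames from the lecture: fullplot -> fullPlot and rating -> rated.
project_stage = {
    "$project": {
        "title": 1,               # pass through unchanged
        "year": 1,                # pass through unchanged
        "fullPlot": "$fullplot",  # new key, value taken from the old key
        "rated": "$rating",       # new key, value taken from the old key
    }
}


def apply_project(doc, spec):
    """Toy stand-in for $project: supports only 1 and top-level "$field" paths."""
    out = {}
    if "_id" in doc:
        out["_id"] = doc["_id"]  # $project keeps _id unless it is excluded
    for key, expr in spec.items():
        if expr == 1:
            if key in doc:
                out[key] = doc[key]
        elif isinstance(expr, str) and expr.startswith("$"):
            source = expr[1:]
            if source in doc:  # a missing source field yields no output field
                out[key] = doc[source]
    return out


# A made-up document, only to show the reshaping.
sample = {
    "_id": 1,
    "title": "Example Movie",
    "year": 1999,
    "fullplot": "A long plot.",
    "rating": "PG-13",
    "runtime": 100,
}
result = apply_project(sample, project_stage["$project"])
```

Notice that runtime is dropped from result because the spec never mentions it, and that the old names fullplot and rating are gone for the same reason. That is what makes a field path in $project behave like a rename.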
Renaming the field to rated avoids that confusion. And then you'll notice here that we're doing that same split, only I've accidentally named the field actors instead of cast. So, fixing that, you'll note that we have that same split for each of these six fields, to convert the string value of each field to an array. But the cool thing about $project is this: because nowhere in here do we specify that the field director or writer or genre should be passed along out of this stage, by specifying director colon one for example, we are not only splitting the string values into arrays, we're also renaming these fields to their plural form. So instead of director we'll have directors, and instead of genre we'll have genres, for example. So we get both the splitting and a renaming of the same kind that we did when we renamed those two fields.
And lastly, what I'd like to point out is that we're actually going to create an embedded document here for IMDb. The movies_initial data set, if we take a look at it in Compass, currently has fields for IMDb rating and IMDb votes. In fact, there are three keys that all lead off with the name IMDb. So rather than do that in this data set, what I'd like to do instead is have a single IMDb key, and as its value I want not a scalar value, not a string, and not an array, but an embedded document or, in Python parlance, a dictionary. In this case the dictionary will have the keys id, rating, and votes, and all we're going to do is take the values of the existing IMDb keys, such as the IMDb rating key and the IMDb votes key, and use them as the values of the new fields I'm creating in this embedded document. This is not strictly necessary. I'm providing it merely as an example of some of the flexibility that the MongoDB document model gives us in terms of how we structure data.
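As a rough sketch, the split-and-rename part and the embedded document could be written like this in Python. The singular source field names, the ", " delimiter, and the names imdbID, imdbRating, imdbVotes, and imdb are assumptions, since the lecture only describes them loosely; check them against your own documents before relying on them.

```python
# Output field name -> assumed source field name in the original documents.
SPLIT_FIELDS = {
    "directors": "director",
    "cast": "cast",
    "writers": "writer",
    "genres": "genre",
    "languages": "language",
    "countries": "country",
}
DELIMITER = ", "  # assumed separator inside the original strings

# Each plural key gets the array produced by $split on the singular field.
projection = {
    new: {"$split": [f"${old}", DELIMITER]} for new, old in SPLIT_FIELDS.items()
}

# One embedded document in place of three separate IMDb keys (assumed names).
projection["imdb"] = {
    "id": "$imdbID",
    "rating": "$imdbRating",
    "votes": "$imdbVotes",
}

reshape_stage = {"$project": projection}
```

Because the projection never lists director, writer, or genre with a one, those singular fields don't appear in the output, and only the plural array fields survive.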
The query language and the aggregation framework fully support this type of embedded document, and there are lots of reasons why you might want to do this in a data model. We're looking at one very simple reason here, and that is simply to save a little space. In other lessons you'll get plenty of experience querying on these types of embedded document fields.
Okay, so let's take a look at this as a whole. We're passing through every field that we find in the movies documents. Some of them pass straight through, like title, year, released, and runtime. Others we're splitting into arrays and renaming to their plural form, and for others we're doing some simple renaming. And finally we're doing some reshaping, creating an embedded document. So $project allows us to define the shape of the documents that are output from this stage, using field values from the documents that are input to it.
Now let's run this and take a look at what we get as output. Again, we're sending the output to the movies_scratch collection. I also want to point out that we should only have 100 documents, because we're limiting the documents that we look at from the very beginning. Okay, and if we go into our movies_scratch collection, note that there are just the 100 documents that we expect. Now let's take a look at the shape of these documents. We're doing this as a test of whether our strategy worked as intended. So here we see title, year, runtime, and released being passed through as they were found in the movies_initial collection. Note that we have a directors, cast, writers, genres, languages, and countries array, one for each of those fields. But note also that we don't have the redundant director or writer or genre field that we had when we used the $addFields stage in an earlier lesson. Lastly, we see that we have an embedded document for IMDb with the expected values. And of course fullPlot and rated were renamed as expected.
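The same eyeball check can be sketched as a small Python function. This is only an illustration: the shape_problems helper, the field lists, the imdb key name, and the sample documents are all made up, and with a live server you would feed it documents read from movies_scratch instead.

```python
# Fields we expect to be arrays, and singular leftovers we expect to be gone.
ARRAY_FIELDS = ["directors", "cast", "writers", "genres", "languages", "countries"]
SINGULAR_FIELDS = ["director", "writer", "genre", "language", "country"]


def shape_problems(doc):
    """Return a list of ways `doc` differs from the shape we intended."""
    problems = []
    for name in ARRAY_FIELDS:
        # Anything present but not a list (for example null) is reported.
        if name in doc and not isinstance(doc[name], list):
            problems.append(f"{name} is not an array")
    for name in SINGULAR_FIELDS:
        if name in doc:
            problems.append(f"unexpected field {name}")
    if "imdb" in doc and not isinstance(doc["imdb"], dict):
        problems.append("imdb is not an embedded document")
    return problems


# Made-up documents: one with the intended shape, one without.
good = {"title": "Example Movie", "directors": ["A"], "cast": ["B", "C"],
        "imdb": {"id": "x", "rating": 7.0, "votes": 10}}
bad = {"director": "A, B", "genres": "Drama", "imdb": 7.1}

good_report = shape_problems(good)
bad_report = shape_problems(bad)
```

An empty report for every document, together with a count of 100, would be a reasonable sign that the pipeline did what we intended.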
One thing I didn't mention is that we also renamed the last updated field to lastUpdated, again introducing camel case. You can see that if you look at the very last field in our $project stage. Okay, so that's an initial example of using $project to do some fairly simple reshaping of our documents, and I would say it leans toward extremely useful reshaping when it comes to splitting strings into arrays.