All right, so now let's talk about how this script works. Let's begin at the top of the for loop. Here we're using find as we did previously, but the filter we're passing is an empty dictionary, that is, an empty document. To MongoDB, an empty filter means the query should return every document in the collection. What we're doing here is limiting the results to the first 100 documents retrieved, using the limit method. As we know, find returns a cursor, which allows us to iterate over the documents that are retrieved. Limit is a cursor method that restricts the number of results we see, in this case to the first 100. I could pass 1,000 here. I could pass 365. It just depends on what you need. Limit also returns a cursor. Using it as part of a for loop means that Python will iterate over that cursor, and each time through, movie will be the next document in the result set.
Suppose that we hadn't set a limit here, or that it was quite large, thousands and thousands of documents. PyMongo and MongoDB work together to do the right thing here, which is to return the documents in batches (by default, the first batch holds 101 documents). With each batch, we have that many documents on the client side and are processing them. Once a batch is exhausted, PyMongo will go fetch more documents from the server in another batch. So rather than pull documents across from the server one at a time, we're pulling them across in batches. And that's how we end up looping through each of the documents in the collection.
Now the next thing to look at is these fields_to_set and fields_to_unset dictionaries. The body of this loop is where we build those dictionaries, which we see as the values for $set and $unset in the update document. So the first thing that I do here is loop through all of the fields, pulling out the individual keys and values for each document as we pass through the loop.
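Going back to the top of the loop for a moment, the find and limit pattern can be sketched roughly like this. This is not the script from the lesson, just a minimal illustration: the helper name preview_titles is made up, and it accepts any object that behaves like a PyMongo collection, so the connection details in the trailing comment are placeholders.

```python
def preview_titles(collection, max_docs=100):
    """Return the titles of at most max_docs documents (hypothetical helper)."""
    # An empty filter document matches every document in the collection.
    cursor = collection.find({})
    # limit() is a cursor method; it returns a cursor, so the calls chain.
    cursor = cursor.limit(max_docs)
    # Iterating the cursor yields one document (a dict) at a time; the driver
    # fetches documents from the server in batches behind the scenes.
    return [movie.get("title") for movie in cursor]


# Usage against a real deployment (URI, database, and collection names are
# placeholders, not taken from the lesson):
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   print(preview_titles(client["mydb"]["movies"], max_docs=5))
```

Passing a different max_docs value corresponds to passing 1,000 or 365 to limit, as mentioned above.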
If the value is some version of an empty string, then I remove it from the movie dictionary. Note that I'm iterating over a copy here so that I can do that without getting the "dictionary changed size during iteration" error. I remove it from the movie dictionary, and I create an entry in the fields_to_unset dictionary for that key. That's how I end up with, for example, this metacritic entry in the unset dictionary for this document, because there simply wasn't a metacritic value for this particular document that I'm going to update. It was just the empty string. So I can save space and make my data a little bit more readable by getting rid of all that empty data.
And then we dive into a block of code where I'm splitting string values into arrays. Note that for each one of these, I'm checking to see whether it actually exists in the movie dictionary, because if it was the empty string, I would have deleted it here. And note that for director, writer, genre, and so on, I'm specifying that that field should actually be eliminated as part of the update operation, if it wasn't already eliminated here. And I'm saying let's add a field, called directors in this case, that is the array I get by splitting the string value of director. Likewise for writers, genre, language, and country. Note that for cast, I do the split, but I don't unset the cast field, because I'm using exactly the same name for the array version of that field, rather than some plural form of cast.
Here I'm simply changing the name of fullPlot so that it is camel case, and making the same kind of change here to use rated rather than rating. This is the chunk of code that creates the embedded document. And here I'm converting string representations of dates to datetimes using the datetime class. Finally, this is one of the more interesting bits of this particular script. It's something we actually can't yet do in the aggregation framework. Here, I am changing the data type of runtime. More specifically, I'm really doing two things.
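The empty-string cleanup and the string-to-array splitting described above might look something like the following. Treat it as a sketch under a few assumptions that the lesson doesn't spell out: that "some version of an empty string" means empty or whitespace-only, that multiple values are separated by a comma and a space, and that the plural field names other than directors are as shown. The function name clean_fields is made up for illustration.

```python
# Assumed mapping from the original string field to the new array field.
PLURAL_FIELDS = {
    "director": "directors",
    "writer": "writers",
    "genre": "genres",
    "language": "languages",
    "country": "countries",
}


def clean_fields(movie):
    """Return (fields_to_set, fields_to_unset) for one movie dict (hypothetical helper)."""
    fields_to_set = {}
    fields_to_unset = {}

    # Iterate over a copy so entries can be deleted from `movie` without
    # raising "RuntimeError: dictionary changed size during iteration".
    for key, value in movie.copy().items():
        # Assumption: empty or whitespace-only strings count as empty.
        if isinstance(value, str) and value.strip() == "":
            del movie[key]
            fields_to_unset[key] = ""

    # Replace each string field with an array under a plural name.
    for old_name, new_name in PLURAL_FIELDS.items():
        if old_name in movie:  # skipped if it was deleted above
            fields_to_unset[old_name] = ""
            fields_to_set[new_name] = movie[old_name].split(", ")

    # cast keeps its name, so it is set to the array but never unset.
    if "cast" in movie:
        fields_to_set["cast"] = movie["cast"].split(", ")

    return fields_to_set, fields_to_unset
```

For $unset the value stored under each key doesn't matter, which is why an empty string is used as a placeholder.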
I'm first running a match on runtime, because if you take a look at any runtime you'll see that it's this ridiculous string: "1 min", "6 min", "190 min". It's not an integer, so we can't do range queries. We can't do any of the types of analysis we'd really like to be able to do, because it's a string, and a string that's in an inconvenient form at that. So here at the top I've compiled this regular expression that will do a match and will group the different components of the match in such a way that I can pull out the integer component of that value. And then I simply convert it to an int here, and make sure that runtime gets added to fields_to_set, so that when I construct the update doc here, that's one of the changes that will be made as we call update_one to make the necessary updates to this document.
Now something I want to point out here is that we're not doing anything with title, nothing with year, nothing with plot, nothing with any of these fields where we've just got a 1 here. We're essentially just passing the value through. As long as there is a value other than the empty string, we're doing nothing with it. That's because to update, we only need to identify the fields that we want to change in some way, whether it's setting them to a new value or eliminating them from the document.
Now in this example, we're using just two of the update operators, $set and $unset. There are a number of other update operators. See the lecture notes for this lesson for a link to the relevant MongoDB documentation. I can increment values. I can take the min and max, multiply, perform a number of operations on arrays, and do a whole host of other types of updates that reflect the variety of different things we have to do when we're transforming our data from messy and somewhat ill-formatted to data that we can actually use. So take a look at the docs to see what else is possible in update operations.
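To make the runtime conversion concrete, here is one way it could be written. The exact pattern used in the script isn't shown in this transcript, so the regular expression and the names RUNTIME_PATTERN and parse_runtime are assumptions; the sketch only relies on the values looking like "190 min".

```python
import re

# Assumed pattern: one or more digits, a space, then "min".
# The parentheses capture the digits as group 1.
RUNTIME_PATTERN = re.compile(r"([0-9]+) min")


def parse_runtime(value):
    """Return the runtime in minutes as an int, or None if the string doesn't match."""
    match = RUNTIME_PATTERN.match(value)
    if match is None:
        return None
    return int(match.group(1))


# Inside the loop, the result would be added to fields_to_set, for example:
#
#   if "runtime" in movie:
#       minutes = parse_runtime(movie["runtime"])
#       if minutes is not None:
#           fields_to_set["runtime"] = minutes
```

Once runtime is stored as an integer, range queries such as "runtime greater than 120" become possible.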
Lastly, I build my update_doc from fields_to_set and fields_to_unset, those two dictionaries I've been building all along. Note that each one of these creates a new entry in either the fields_to_set dictionary or the fields_to_unset dictionary. That's how I end up with output that looks like this. Then if I were to run this, eliminating the limit and uncommenting my call to update_one, I would update each document in the collection, one at a time. update_one is great when I'm updating a few individual documents. But when I'm trying to reshape an entire collection, this is not a terribly efficient strategy. I'm going to change this script so that I'm actually doing bulk writes, rather than writing one document at a time back to the server. I'll show you how to do that in the next lesson.
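As a rough illustration of that last step, building the update document from the two dictionaries could look like this. The helper name build_update_doc is made up, and leaving out an empty $set or $unset is a choice made for this sketch (some server versions reject an empty operator document), not necessarily what the lesson's script does.

```python
def build_update_doc(fields_to_set, fields_to_unset):
    """Combine the two dictionaries into one update document (hypothetical helper)."""
    update_doc = {}
    # Only include an operator when it has at least one field to act on.
    if fields_to_set:
        update_doc["$set"] = fields_to_set
    if fields_to_unset:
        update_doc["$unset"] = fields_to_unset
    return update_doc


# Inside the loop, one document at a time (collection is a PyMongo collection):
#
#   update_doc = build_update_doc(fields_to_set, fields_to_unset)
#   if update_doc:
#       collection.update_one({"_id": movie["_id"]}, update_doc)
```

Filtering on _id makes sure each update_one call touches exactly the document that was just processed.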