In this lesson, we're going to talk about how we can graph data with MongoDB. Throughout this course, we've spent time discussing the power of MongoDB's flexible data model and expressive query language. This lesson is going to focus on how to use MongoDB with a tool commonly used by data scientists and analysts: matplotlib. Specifically, we're going to look at how to use matplotlib with MongoDB to create scatter plots, 3D plots, and box plots, also commonly referred to as box-and-whisker plots.

Before we get started talking about how to create each of these plots, we first need to talk about how we pass data to matplotlib. When graphing with matplotlib, it almost always expects the data to be passed in via one or more arrays. So, for this example, I've created an array with the values 1 through 5. Then I take each of those values and, using a list comprehension, square it. We can go ahead and print this to see what it looks like and, as you'd expect, we see 1, 2, 3, 4, 5 and 1, 4, 9, 16, 25. And how do you pass these arrays to matplotlib? It's pretty easy if we want to create a scatter plot. It's a common convention to first clear the figure; we then set up our subplots so we can actually access our axes. Then, with our ax, we're able to use the scatter method, passing in our two arrays, and finally we go ahead and show the plot. And when I scroll down here, you can see that matplotlib figures out the ranges of values on both axes and plots our coordinates. As you can see, it forms a nice little x-squared curve.

And we can do this exact same thing for data stored in MongoDB. We go ahead and import pymongo, create our cluster string, connect to our cluster, and then we can write a query. Now, in this example, I want to plot the length of a movie, its runtime, against its rating. So, let's create a query where we return only documents that have both of these values.
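Before moving on to the MongoDB query, here is a minimal sketch of the array-based scatter plot described above. It assumes a non-interactive setup, so it selects the Agg backend and renders to an in-memory buffer; in a notebook you would call plt.show() instead. The variable names are illustrative, not the ones used in the lesson.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so use a non-interactive backend
import matplotlib.pyplot as plt

# One array of inputs, and a second array built with a list comprehension.
x = [1, 2, 3, 4, 5]
y = [value ** 2 for value in x]
print(x)
print(y)

plt.clf()                 # clear the current figure (the convention mentioned above)
fig, ax = plt.subplots()  # set up the figure and get access to the axes
ax.scatter(x, y)          # one array per axis

# Render the figure; in a notebook, plt.show() would display it instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```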
And then I'm going to project out just those two fields. We'll go ahead and create these variables, run the query, and turn the result into a list. When we print this, you can see that we get both our metacritic and runtime values. However, like I demonstrated earlier, matplotlib expects a separate array for each value, not a single array of pairs of values. To do this, we're just going to iterate over each value in our RM array, pulling out the runtime. As you can see, this is now just an array of runtimes, and we're going to do the same thing for the ratings. So now that we have two arrays, one for each of the values that we're comparing, we can go ahead and pass them to matplotlib. You can see that I'm using some additional parameters now. I'm using an alpha so that each point on the plot isn't fully opaque; this way we can see where lots of points overlap. I'm also giving the plot a title and labeling my axes. And when we plot this, as you can see here, there is a bit of correlation. Movies that are very long in runtime seem to do pretty well on ratings. There are very few points where we have very long movies that have low ratings, which is kind of an interesting correlation. But I don't want to spend too much time analyzing this; I want to show you more of the functionality that matplotlib offers.

The same process can be applied to 3D plots, where we want to compare three values. In this case, we're going to continue to plot the same variables as before, but we're also going to see if there's a correlation between year and either of these variables.
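Returning to the runtime-versus-rating scatter plot for a moment, the steps can be sketched roughly as follows. This is a sketch under assumptions: the field names runtime and metacritic, the use of $exists in the filter, and the label text are guesses, and the rm list holds made-up placeholder documents standing in for the result of a real pymongo query so that the example runs without a cluster.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# Assumed filter and projection: keep only documents that have both fields.
query = {"runtime": {"$exists": True}, "metacritic": {"$exists": True}}
projection = {"_id": 0, "runtime": 1, "metacritic": 1}

# With a live connection this would be something like:
#   rm = list(movies.find(query, projection))
# Placeholder documents (invented values) stand in for the query result here.
rm = [
    {"runtime": 90, "metacritic": 55},
    {"runtime": 120, "metacritic": 70},
    {"runtime": 180, "metacritic": 85},
]

# matplotlib wants one array per variable, not one array of pairs.
runtimes = [doc["runtime"] for doc in rm]
ratings = [doc["metacritic"] for doc in rm]

plt.clf()
fig, ax = plt.subplots()
ax.scatter(runtimes, ratings, alpha=0.5)  # alpha < 1 makes overlapping points visible
ax.set_title("Movie Ratings vs. Runtime")
ax.set_xlabel("Runtime (minutes)")
ax.set_ylabel("Metacritic rating")
# plt.show() would display the figure in an interactive session.
```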
So first, we're going to go ahead and import Axes3D and create our query. This time I'm making sure that not only does year exist but also that it's an integer, and then, of course, I'm going to project out those three values. I'm going to go ahead and execute my query, turning it into a list of values. And just like before, we're going to use list comprehensions to pull out each variable. The syntax is a little bit different now because we're creating a 3D plot but, as you can see, we're really just doing the same things: we're clearing our figure, getting our figure, getting our ax, using the scatter function, and then setting our labels. The key difference here is that we have to use add_subplot and pass in a 3D projection parameter. And when we go ahead and plot this, as you can see, we now have a cooler 3D plot, and what we're really seeing here is that there aren't very many years of data. We have movies that go all the way down to 1940, but as you can see, we don't really have any runtime data or any rating data for those earlier years.

Finally, let's talk about one more plot, the box plot, also commonly referred to as the box-and-whisker plot because you have this little box and these lines that come off it that kind of look like cat whiskers. Box plots are used to visualize the minimum, maximum, median, and first and third quartiles for a given set of values. In our case, we're going to plot several of these box plots, one for each day of the week. Specifically, we're going to visualize the distribution of the hour of the day at which people watch movies on mflix, across each day of the week. To do this, we're going to use this watching pings dataset. So, imagine for a moment that for every 15 seconds that someone is watching a movie on mflix, we record that. And that's what each of these documents kind of looks like.
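Stepping back briefly to the 3D scatter plot described above, here is a minimal sketch of it. The field names, the $exists and $type conditions, and the axis labels are assumptions, and the docs list contains invented placeholder values in place of a real query result.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (older matplotlib needs this to register "3d")

# Assumed filter: all three fields exist, and year is stored as an integer.
query = {
    "runtime": {"$exists": True},
    "metacritic": {"$exists": True},
    "year": {"$exists": True, "$type": "int"},
}
projection = {"_id": 0, "runtime": 1, "metacritic": 1, "year": 1}

# Placeholder documents (invented values) standing in for the query result.
docs = [
    {"runtime": 95, "metacritic": 60, "year": 1995},
    {"runtime": 130, "metacritic": 75, "year": 2005},
    {"runtime": 170, "metacritic": 88, "year": 2012},
]

# One array per variable, pulled out with list comprehensions.
runtimes = [doc["runtime"] for doc in docs]
ratings = [doc["metacritic"] for doc in docs]
years = [doc["year"] for doc in docs]

plt.clf()
fig = plt.gcf()
ax = fig.add_subplot(111, projection="3d")  # the 3D projection is the key difference
ax.scatter(runtimes, ratings, years, alpha=0.5)
ax.set_xlabel("Runtime (minutes)")
ax.set_ylabel("Metacritic rating")
ax.set_zlabel("Year")
# plt.show() would display the figure in an interactive session.
```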
It's a reference to which video they were watching, the user ID of the person who was watching, and a timestamp for when they were watching. And so, since we send this every 15 seconds that someone is watching a movie, we can now use this data to visualize the distribution of what hour of the day people are watching movies.

But first, we're going to transform our data a little bit. So, here we use the aggregation framework. We're first going to take a sample of 50,000 documents; then we're going to add some fields so that we know which day of the week and which hour of the day they were watching; and then we're going to group by day of the week, pushing each document into an array field called pings. And finally, we're going to sort by _id, which is the day of the week, so that Sunday is first and Saturday is last. What this means is that this cursor is going to return seven documents, and each document is going to have an array field called pings that contains a sample of the pings for that day of the week. But what we really want is an array of arrays, not an array of dictionaries of arrays of dictionaries. So, let's go ahead and run this query, and then we can fix our data a little bit by pulling out our array of pings. Now that we have an array of arrays of documents, we want to get to an array of arrays of values, so we can just pull out the hour of the day for each ping. And now we have that matrix: an array where each entry is another array containing the hourOfDay for each ping. So, the outer array is the day of the week, and the inner array is the hour of the day. We can now take this matrix and very simply pass it to the box plot function. And as you can see, we now have seven box plots, each representing a day of the week. And it kind of makes sense: there are more people watching movies on the weekends on average, but people are still watching a lot of movies during the week.
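The aggregation and reshaping steps above can be sketched roughly like this. The timestamp field name ts and the dayOfWeek field name are assumptions (pings and hourOfDay are the names mentioned in the lesson), and the grouped list is built from invented placeholder hours so the example runs without a cluster.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt

# Assumed pipeline; "$ts" is a hypothetical name for the ping's timestamp field.
pipeline = [
    {"$sample": {"size": 50000}},
    {"$addFields": {
        "dayOfWeek": {"$dayOfWeek": "$ts"},  # 1 = Sunday ... 7 = Saturday
        "hourOfDay": {"$hour": "$ts"},
    }},
    {"$group": {"_id": "$dayOfWeek", "pings": {"$push": "$$ROOT"}}},
    {"$sort": {"_id": 1}},
]

# With a live connection: grouped = list(watching_pings.aggregate(pipeline))
# Placeholder hours (invented) stand in for the seven grouped documents.
sample_hours = [
    [14, 18, 20, 22, 23],  # Sunday
    [18, 19, 20, 21],      # Monday
    [17, 19, 20, 21],      # Tuesday
    [18, 19, 21, 22],      # Wednesday
    [17, 18, 20, 21],      # Thursday
    [18, 20, 22, 23],      # Friday
    [13, 19, 21, 22, 23],  # Saturday
]
grouped = [
    {"_id": day, "pings": [{"hourOfDay": hour} for hour in hours]}
    for day, hours in enumerate(sample_hours, start=1)
]

# Array of dictionaries -> array of arrays of ping documents -> array of arrays of hours.
ping_docs = [doc["pings"] for doc in grouped]
hours_by_day = [[ping["hourOfDay"] for ping in pings] for pings in ping_docs]

plt.clf()
fig, ax = plt.subplots()
boxes = ax.boxplot(hours_by_day)  # one box per inner array
ax.set_xticklabels(["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"])
ax.set_ylabel("Hour of day")
# plt.show() would display the figure in an interactive session.
```

The outer list has one entry per day of the week, which is why the sort on _id matters: it keeps the boxes in Sunday-to-Saturday order.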
And as you can see, people watch movies later in the day on the weekends. There are lots of people watching movies late at night. These little circles are outliers in our data, and we don't really see any of these during the week. And moreover, people seem to be watching movies later in the day on average on the weekends relative to during the week.

So to recap, we saw how we can create each of these different kinds of plots using matplotlib with MongoDB. It's very easy to get the information we want out of MongoDB; it just requires a little bit of transformation in order to make it work well with matplotlib to get the..