In the previous lecture, I introduced the plot function, which generates a series of points that get rendered against an axes object. Matplotlib actually has a number of useful plotting methods in the scripting layer, and they correspond to the different kinds of plots we might want to use. We're not going to go through all of them here, but we're going to touch on a few of the major ones. Before we do, there are a few things to keep in mind from the previous lecture. First, pyplot retrieves the current figure with the function gcf, Get Current Figure, and the current axes with the function gca, Get Current Axes. Pyplot keeps track of all of these axes objects for you, but don't forget that they're there; we can get them whenever we want to do something with them. Second, pyplot just mirrors the API of the axes objects. You can call the plot function on the pyplot module, and underneath this just calls the plot function of the current axes. Finally, remember that the declarations of most of the functions in matplotlib end with an open set of keyword arguments. There are a lot of different properties you can control through these keyword arguments, and you're not going to find them described for each function in the docs. This can be a bit frustrating because you end up having to learn from examples. I'm going to start peppering in some of these arguments this week to demonstrate the matplotlib documentation and its value. In a Jupyter notebook, you can look at the documentation for each function by appending a question mark to it, for example plt.scatter?. With that in mind, let's move on to talk first about scatter plots. A scatter plot is a two-dimensional plot similar to the line plots that I've shown previously. The scatter function takes the x-axis values as its first argument and the y-axis values as its second. If the two arguments are the same, we get a nice diagonal alignment of points, just as before.
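To make the gcf and gca bookkeeping concrete, here is a minimal sketch, assuming a recent matplotlib. It shows that pyplot hands back the figure and axes it is tracking, and that calling plot on the pyplot module ends up adding a line to the current axes. Selecting the non-interactive Agg backend is my own choice so the sketch runs without a display; it is not part of the lecture.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt

plt.figure()        # pyplot makes this new figure the "current" one
fig = plt.gcf()     # Get Current Figure
ax = plt.gca()      # Get Current Axes (created on demand if none exists yet)

# Calling plot on the pyplot module forwards to the current axes' plot method
lines = plt.plot([1, 2, 3], [1, 2, 3])

print(ax in fig.axes)        # True: the axes belongs to the current figure
print(lines[0] in ax.lines)  # True: the new line was added to that axes
```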
I'm going to import matplotlib.pyplot as plt to bring in that scripting interface. I'm also going to import NumPy, numerical Python, which we'll use later. Matplotlib is really built on these NumPy arrays, so you'll see them in a lot of examples. The first thing I'm going to do is create some fake data: a list of eight data points, one through eight, which I turn into an array. Then I make it symmetrical by creating a y value that is the same as x. Next, I create a brand new figure for this cell and ask matplotlib to produce a scatter plot. This is similar to the plt.plot function, but the underlying objects in this case are not Line2D objects. We can see a nice symmetrical arrangement of points along the diagonal: one, two, three, and so on. The x-axis and the y-axis have been adjusted automatically by matplotlib to fit our data. Now, scatter doesn't represent items as a series the way plot does. Instead, we can pass it a list of colors that correspond to the individual points. Let's use some basic list arithmetic to create a list one element short of the number of data points we need, set all of those values to green, and then make the final point red. We use the same data, one through eight. I could have just hardcoded this list of colors, and I wrote it out in the docs so you can see it, but you can also take a list with a single element, the string 'green', and multiply it by the length of our data minus one. That creates seven green elements. Then I can just append 'red' to it. I just want to show you that you can use the normal Python list operators here. Then we create a new figure. When we call scatter this time, we pass in the x and the y values, and I'm going to show a new parameter here: s, for size.
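The steps above can be sketched as follows, under the same assumptions as the lecture's data (eight points, y equal to x). The color list is built with list multiplication and append, s sets the marker size, and c takes one color per point. The final print checks that scatter returns a PathCollection rather than Line2D objects; the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = x  # same values on both axes, so the points fall on a diagonal

# One color per point: seven 'green' strings, then 'red' for the last point
colors = ['green'] * (len(x) - 1)
colors.append('red')

plt.figure()
# s sets the marker size (in points squared); c gives one color per point
points = plt.scatter(x, y, s=100, c=colors)

# Unlike plt.plot, scatter returns a single PathCollection, not Line2D objects
print(type(points).__name__)  # PathCollection
```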
Then I pass in our list of colors for those points. We see that all of the points are there, a little bit bigger, and they're all green except for the last one, which is red. Separating the data points into lists like this is actually a pretty common pattern in matplotlib. You have some set of items, but each property of them is described as the same slice across many different data sources. Those who are familiar with object-oriented programming might expect each data point to be represented by its own instance that encapsulates all of its properties; for instance, a point would have an x value, a y value, a color, and a size. That's not the case here. This is where it's useful to have some knowledge of list comprehensions and lambdas, as we talked about in the first course. Also important are the zip function and unpacking with the asterisk. Because this is so common in the Python data science world, I'm going to take a brief aside and talk about it here. Recall that the zip function takes a number of iterables and creates tuples out of them, matching elements based on their index. If we have two lists of numbers, zip will take the first element from each and create a tuple, then take the second element from each and create a tuple, and so on. Also, remember that zip uses lazy evaluation, because in Python 3 it returns an iterator, much like a generator, rather than a list. That means we need to use the list function if we want to see the results of iterating over a zip. We call this realizing the generator. Let me demonstrate that. Here we're going to create a new zip generator with the zip function, passing in two lists of numbers. When we convert this generator to a list, we'll see a list of pairwise tuples. I've got two lists here, one through five and six through ten, and I'm passing them into zip. What comes back is actually a generator. Then I print the list of that generator.
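A minimal sketch of the zip demonstration just described. Strictly speaking, zip returns a zip object, which is a lazy iterator; the variable keeps the lecture's name zip_generator. Calling list realizes it, and a second pass shows that the iterator is exhausted once consumed.

```python
# zip pairs up elements by index and returns a lazy iterator, not a list
zip_generator = zip([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(zip_generator)  # something like <zip object at 0x...>

# list() realizes the iterator: it runs through it and collects the tuples
pairs = list(zip_generator)
print(pairs)  # [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]

# The iterator is now exhausted, so realizing it again gives an empty list
print(list(zip_generator))  # []
```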
This realizes the generator: it runs through the generator and creates a new list, which produces this output. Now we can do that again, but this time using unpacking instead of list. I create exactly the same generator, but this time I call print(*zip_generator). In this case, the asterisk unpacks that collection into positional arguments. You see that we get all of those same tuples, (1, 6), (2, 7), (3, 8), and so on, but they're not a single list; they're individual arguments. Storing point data as tuples is common, and you should be comfortable converting data to and from this form. If we want to turn the data back into two sequences, one with the x components and one with the y components, we can use parameter unpacking with zip. When you pass a list or other iterable to a function and prefix it with an asterisk, each item is taken out of the iterable and passed as a separate argument. That's a brief aside on a valuable tool, which you'll see a lot when manipulating and cleaning data in the wild in Python. Now, let's get back to plotting. Let's take these two lists and plot them in a new figure using scatter. Instead of plotting them as one data series, we'll slice the lists and plot them as two data series. Instead of setting a color per data point like I did previously, we can color each series with a single value if we want to. We can also change the color or transparency of an entire series, and of course we can add descriptive labels as well. In this example, I'm going to create a new figure, then plot a data series of tall students in red using the first two elements of x and y. I just take our x and y variables from before; remember, this is slicing notation, and it takes out the first two elements. I'm going to set the size to 100 and the color to red.
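Here is a small sketch of the unpacking idioms described in this part of the lecture, using the same two lists of numbers. One detail worth knowing: zip(*points) gives back tuples rather than lists, so wrap each in list() if you need actual lists. The last two lines preview the slices used for the two data series.

```python
zip_generator = zip([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
# The asterisk unpacks the iterator: each tuple becomes a separate argument
print(*zip_generator)  # (1, 6) (2, 7) (3, 8) (4, 9) (5, 10)

# Going the other way: unpack the list of point tuples into zip, which
# regroups them into one tuple of x values and one tuple of y values
points = [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]
x, y = zip(*points)
print(x)  # (1, 2, 3, 4, 5)
print(y)  # (6, 7, 8, 9, 10)

# Slicing splits the data into two series: the first two points, then the rest
print(x[:2], y[:2])  # (1, 2) (6, 7)
print(x[2:], y[2:])  # (3, 4, 5) (8, 9, 10)
```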
Passing a single color like this colors all of the points in the series red. Then I set a label, 'Tall students'. Then I do the same thing for the short students, but using the elements from index 2 onwards, which is actually the third position because Python indexes from zero, and I make them blue. Let's execute that. We can see in the image that we have two tall students in red, and then all of the short students in blue. You'll notice that the labels don't show at the moment; they're just held internally for each series, and I'll get to that in a minute. But first, let's talk a bit more about the properties of the axes. Axes generally have labels that explain what they represent or the units they describe. Charts tend to have titles as well, so let's put these in place. Since pyplot mirrors much of the API of the axes object, we can make these calls directly on pyplot. I'm going to do exactly the same thing: create a new figure and create our two scatter series. Now, to add a label to the x-axis, I call xlabel, and the label will appear at the bottom. Here I wrote "The number of times a child kicked the ball". Then we can add a label to the y-axis as well. We just use plt.xlabel, plt.ylabel, and plt.title for these. Underneath, pyplot calls gca, Get Current Axes, and then calls set_xlabel, set_ylabel, and set_title on that axes. Let's add a legend as well. You'll see that here matplotlib places our legend in the upper left-hand corner and puts a little bit of visual information there about our students. This is exactly the same code, just copied and pasted, except that now I add the legend. To do that, all we do is call plt.legend. This tells matplotlib to add a legend using the labels that already exist on the data underneath.
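A sketch of the labelled two-series plot described above. The data and the x-axis label follow the lecture; the y-axis label and title strings are placeholders of my own, since the transcript doesn't give them. The final print reads the label text back from the legend, showing that plt.legend picked up the labels stored on each series.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running outside a notebook
import matplotlib.pyplot as plt

x, y = zip(*[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)])

plt.figure()
# A single color string applies to every point in that series
plt.scatter(x[:2], y[:2], s=100, c='red', label='Tall students')
plt.scatter(x[2:], y[2:], s=100, c='blue', label='Short students')

# pyplot forwards these to the current axes (set_xlabel, set_ylabel, set_title)
plt.xlabel('The number of times a child kicked the ball')
plt.ylabel('Placeholder y-axis label')  # placeholder text, not from the lecture
plt.title('Placeholder chart title')    # placeholder text, not from the lecture

# legend() uses the label already stored on each series
legend = plt.legend()
print([t.get_text() for t in legend.get_texts()])  # ['Tall students', 'Short students']
```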
If those labels are there, matplotlib uses them. If you go to the docs for legend, you'll see that there are a number of different parameters, and one of them is loc, for location. A value of 4 puts the legend in the lower right-hand corner of the axes. We can also get rid of the frame, add a title, and apply any number of other visual transformations. I've taken exactly the same code, but now, instead of letting matplotlib place the legend wherever it likes, I set loc to 4 in the call to plt.legend. I also say I want the frame off, so frameon=False, and I give the legend a title. Now, all of the things I've been showing you are renderable objects associated with the axes. The legend, for instance, is an artist, which means it can contain children. Let's take advantage of this and write a little routine that recursively goes through the list of children in an artist. This is a fun activity I like to do to demonstrate how matplotlib works underneath. We're going to import the Artist class from matplotlib. Again, you're not going to do this in your regular data science work, but sometimes you want to debug a chart or a figure and really understand how it's being built underneath, and I think this is really powerful. We're going to make a recursive function called rec_gc. It takes in an artist and a depth parameter. It checks whether the object that's been passed in is an artist, and if so, it prints out its name and then calls itself recursively on each of the artist's children, which it gets with get_children. To produce a nicely indented list, it prints a number of spaces proportional to the depth before the artist's name. Finally, let's call this function on the legend artist and see what the legend is actually made up of.
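A sketch of the customized legend and the recursive walk described above, assuming a recent matplotlib. The legend title 'Legend' is my own placeholder, and the exact indentation scheme in rec_gc (two spaces per unit of depth, depth growing by 2 per level) is one reasonable choice rather than the definitive version. The printed tree will vary between matplotlib versions, so no expected output is shown.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running outside a notebook
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

x, y = zip(*[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)])
plt.figure()
plt.scatter(x[:2], y[:2], s=100, c='red', label='Tall students')
plt.scatter(x[2:], y[2:], s=100, c='blue', label='Short students')

# loc=4 is the numeric code for 'lower right'; frameon=False hides the box
legend = plt.legend(loc=4, frameon=False, title='Legend')  # title is a placeholder

def rec_gc(art, depth=0):
    """Print an artist, then recursively print its children, indented by depth."""
    if isinstance(art, Artist):
        print("  " * depth + str(art))
        for child in art.get_children():
            rec_gc(child, depth + 2)

rec_gc(legend)
```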
You can see that that simple little legend, with a title, a box, and a couple of dots, is actually made up of a number of different offset boxes for drawing, as well as text areas and path collections. What I want you to take away from this is that there's nothing magical about what matplotlib is doing. Calls to the scripting interface just create figures, subplots, and axes. Then they load those axes up with various artists, which the back end renders to the screen or to some other medium, such as a file, as we looked at earlier. While you're going to spend 95 percent of your time at the scripting layer, happily creating charts and graphs, it's important to understand how the library works underneath for the other five percent of the time. That's when you want really fine-grained control over a chart, or when you want to create your own charting functions. You're going to have a chance to do that by the end of this course. In the next lecture, we're going to go back to line plots, create a few more of them, and then move on to bar charts.