So for our last exercise this week, we're going to look at mapping some basketball data. In particular, we're going to look at where on the court shots were taken, look at the different success rates depending on location, and then look at the statistics for some individual players. Okay, so, as usual, we start off by loading the packages that we need and then loading the data itself. We've got a shot log from the 2016/2017 season, and you can see a description of the variables here. We've got x and y locations for over 200,000 shots, so that's going to give us a pretty good picture of where shots are taken from. And then, we're going to look at some subsets of that data. So let's just draw a simple plot of the coordinates, location x against location y. We need our markers to be pretty small, since we're going to have an awful lot of data here when we draw all those shots. So here, you can see clearly that the shots are predominantly taken either around the basket or around the three-point line, either just over it or within it. And, of course, not many shots are taken outside of those areas. One problem you can see immediately with this plot is that the scale is not quite right. A basketball court has specific dimensions, 94 ft by 50 ft, and this plot is not drawn to that scale. But we can actually impose that scale by using figsize: we can use plt.figure to specify the dimensions of the x-axis and the y-axis. And that's what we do in the next line of code. At the same time, we also put a grid onto the plot, so we can get a better sense of where the shots were taken from. And now, you can see a somewhat more accurately scaled picture of where shots were taken. Now here, we have a full court, and what we're really interested in is a single plot of all the shots based around a common location of the basket.
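As a rough sketch of the scaled scatter plot just described, the code below draws small markers on a figure whose width-to-height ratio matches the 94 ft by 50 ft court and adds a grid. The column names `location_x` and `location_y`, the 0 to 500 range for y, the helper name `plot_full_court`, and the randomly generated stand-in data are all assumptions made for illustration; the actual shot log may be organized differently.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_full_court(shots, x_col="location_x", y_col="location_y"):
    """Scatter every shot on a figure shaped like a 94 ft by 50 ft court."""
    # figsize is in inches; 9.4 x 5.0 keeps the court's 94:50 proportions
    # (approximately, since the axes leave some margin inside the figure).
    fig = plt.figure(figsize=(9.4, 5.0))
    ax = fig.add_subplot(111)
    # Very small markers, because a full season has a huge number of shots.
    ax.scatter(shots[x_col], shots[y_col], s=1, marker=".")
    ax.grid(True)
    return fig, ax


# Synthetic stand-in data (NOT the real shot log). The x range of 0 to 933
# comes from the lecture; the y range of 0 to 500 is an assumption.
rng = np.random.default_rng(0)
shots = pd.DataFrame({
    "location_x": rng.uniform(0, 933, size=5000),
    "location_y": rng.uniform(0, 500, size=5000),
})

fig, ax = plot_full_court(shots)
```

In a notebook the figure renders when the cell finishes; in a script you would call `plt.show()` afterwards.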
It shouldn't really matter to us which end of the court the shots were taken from. So the first thing we could do is just say, well, let's look at the right-hand half of the court by restricting the range of the x-axis to start from the midpoint of the data. The x values run from 0 to 933, so 933 is the maximum distance on the x-axis. If you divide that in two, then, as you see in the bottom line of the code, plt.xlim sets the range from halfway along the x-axis to the end of the x-axis. Then, we can see a plot of the right-hand half of the court. Now that is in some sense more useful if we're going to analyze the distribution of shots. But then, we've also left out half the data, because we've left out the left-hand side of the court. What we can do instead is redefine shots in the left-hand half of the court as if they were taken in the right-hand half. So we just create a mirror image, and we do that in the following line of code. We define the half-court x-axis coordinates and the half-court y-axis coordinates depending on where the shot was taken: each is either the mirror image, if the shot is on the left-hand side, or just the original value, if it's on the right-hand side of the court. And so we create those mirrored values in this line. You can see the description of the data here, so we still have all the same data we had before. And now if we run the plot using the half-court x and y values, we can see what this looks like. So now we have all the data on this plot, but it's all around one basket, which makes our analysis easier. Now, let's break down the data by looking at different kinds of shots, okay? So let's look at three different types. There are shots that are scored, there are shots that are missed, and there are shots that are blocked. So let's look at a plot for each of those.
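The mirroring step described above can be sketched roughly as follows. The x maximum of 933 comes from the lecture, but the y maximum of 500, the column names `location_x`, `location_y`, `halfcourt_x`, and `halfcourt_y`, and the helper `to_half_court` are assumptions for illustration. The sketch reflects both coordinates through the center of the court, which is one reasonable reading of "mirror image" here, not necessarily the exact line used in the course notebook.

```python
import numpy as np
import pandas as pd

X_MAX = 933  # maximum x value, as stated in the lecture
Y_MAX = 500  # ASSUMED maximum y value (court is 50 ft wide)


def to_half_court(shots, x_col="location_x", y_col="location_y"):
    """Fold left-half shots onto the right half so all shots share one basket."""
    left = shots[x_col] < X_MAX / 2
    out = shots.copy()
    # Left-half shots are reflected through the court center;
    # right-half shots keep their original coordinates.
    out["halfcourt_x"] = np.where(left, X_MAX - shots[x_col], shots[x_col])
    out["halfcourt_y"] = np.where(left, Y_MAX - shots[y_col], shots[y_col])
    return out


# Three made-up shots: one on the left half, one on the right,
# and one just left of the midpoint (466.5).
shots = pd.DataFrame({
    "location_x": [100.0, 800.0, 466.0],
    "location_y": [50.0, 300.0, 250.0],
})
half = to_half_court(shots)
print(half[["halfcourt_x", "halfcourt_y"]])
```

Plotting `halfcourt_x` against `halfcourt_y` with `plt.xlim(X_MAX / 2, X_MAX)` then shows every shot around a single basket without dropping any rows.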
So first we've got the scoring shots, and here we show those in green. And you can see again that the scoring shots are located most heavily around the basket and then around the three-point line, with a relatively modest dusting inside the D. And then, if we look at the missed shots, we actually see a rather similar pattern, again around the three-point line and around the basket itself. Now this is a point that people have been making about basketball quite a lot recently, which is the change towards shooting more from the three-point line and less from within it. And you can actually see a sort of perimeter just inside the three-point line which is almost completely empty. Players are very conscious of shooting from the edge of the three-point line, and this represents part of the change in the way that basketball is played nowadays. Then finally, let's look at blocked shots, and blocked shots look slightly different. Perhaps not surprisingly, most of the blocked shots are shots around the basket itself or shots thrown inside the three-point line. There are relatively few blocked shots at the three-point line, since it's relatively hard to block a shot from that location. Okay, so that's looking at the three different types of shots. Now, let's do perhaps the most obvious thing you'd want to do here, which is to compare players. First we can generate a table to show who all the players in our data are. You can see here the number of shots for each player in this season, and there are many names. At the high end of the counts, the names are very recognizable, since by and large the best players are the ones who are going to be taking the most shots. So let's compare probably the two most famous names in basketball right now, LeBron James and Steph Curry, and see what they look like in terms of their shot distribution. So we're going to create two subsets of the data, one for LeBron and one for Steph Curry.
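To make the splitting and counting steps concrete, here is a minimal sketch that breaks a shot log into scored, missed, and blocked subsets and builds the shots-per-player table. The column names `shoot_player` and `outcome`, the outcome labels, and the tiny made-up table with placeholder player names are all assumptions for illustration, not the real data. The success-rate line is an extra illustration of the "success rates" mentioned at the start of the lecture.

```python
import pandas as pd

# Tiny made-up shot log with placeholder names (NOT real players or data).
shots = pd.DataFrame({
    "shoot_player": ["Player A", "Player A", "Player B",
                     "Player A", "Player C", "Player B"],
    "outcome": ["SCORED", "MISSED", "BLOCKED",
                "SCORED", "MISSED", "SCORED"],
})

# One subset per shot type, keyed by the (assumed) outcome label.
by_outcome = {name: group for name, group in shots.groupby("outcome")}

# Table of the number of shots per player, largest count first.
shots_per_player = shots["shoot_player"].value_counts()

# Share of each player's shots that were scored.
success_rate = (shots["outcome"] == "SCORED").groupby(shots["shoot_player"]).mean()

print(shots_per_player)
print(success_rate)
```

Each entry of `by_outcome` can then be passed to a scatter plot with its own color, and a single player's subset is just `shots[shots["shoot_player"] == name]`.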
And so let's first create the LeBron subset, based on the shooting player's name, and look at the distribution of shots for LeBron James. What we're going to do is put all the shots onto the same plot. Of course, there are going to be many fewer, since we're focusing on just one player. And we'll show the different types of shots in different colors: red for scored, blue for missed shots, and green for blocked shots. So you can see here the LeBron James scatter, again with the most intense colors around the basket itself, but obviously a lot from the three-point line. But when we do the same plot for Steph Curry, we get, interestingly, a slightly different distribution of the dots. We can see here that there are slightly fewer under the basket and more from the three-point line. And if we plot them alongside each other, these differences become relatively clear. So a couple of things you can say about this. Firstly, perhaps not surprisingly, LeBron is shooting more from close to the basket, driving to the basket. But you can also see that Steph Curry is shooting more from the three-point line. And perhaps more subtly, you can see that on the three-point line LeBron tends to shoot somewhat more from the left-hand side of the court, and Steph Curry tends to shoot far more from the corners than LeBron does. So you can see differences in the style of play of the two players from plotting these charts. And, of course, you can now go on and compare any players that you're interested in, and those plots could be quite revealing in terms of understanding strategy. And indeed, coaches would be interested, from the point of view of defense, in trying to figure out where LeBron's most likely scoring threats are going to come from. So finally, what we're going to do is zoom into the data a little bit and see what happens when we look at a much smaller grid in order to think about where shots are coming from.
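A side-by-side comparison like the one described above might be sketched as follows, with one panel per player and the red, blue, and green color scheme from the lecture. The column names, the outcome labels, the helper `plot_player`, and the random stand-in data for two placeholder players are assumptions for illustration only; with the real shot log you would pass the actual player names.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Colors as described in the lecture: red scored, blue missed, green blocked.
COLORS = {"SCORED": "r", "MISSED": "b", "BLOCKED": "g"}


def plot_player(ax, shots, player):
    """Draw one player's half-court shots on ax, colored by outcome."""
    subset = shots[shots["shoot_player"] == player]
    for outcome, color in COLORS.items():
        part = subset[subset["outcome"] == outcome]
        ax.scatter(part["halfcourt_x"], part["halfcourt_y"],
                   s=5, c=color, label=outcome.lower())
    ax.set_xlim(933 / 2, 933)  # right-hand half of the court only
    ax.set_title(player)
    ax.grid(True)
    return len(subset)


# Random stand-in data for two placeholder players (NOT real shot data).
rng = np.random.default_rng(1)
n = 400
shots = pd.DataFrame({
    "shoot_player": rng.choice(["Player A", "Player B"], size=n),
    "outcome": rng.choice(list(COLORS), size=n),
    "halfcourt_x": rng.uniform(933 / 2, 933, size=n),
    "halfcourt_y": rng.uniform(0, 500, size=n),
})

# Two half courts (about 47 ft by 50 ft each) placed next to each other.
fig, axes = plt.subplots(1, 2, figsize=(9.4, 5.0), sharey=True)
counts = [plot_player(ax, shots, p)
          for ax, p in zip(axes, ["Player A", "Player B"])]
axes[0].legend(loc="upper left")
```

Sharing the y-axis and fixing the same x-limits on both panels keeps the two distributions directly comparable by eye.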
So here we're going to define a grid which is really close to the basket. Again, we can define our coordinates to take any values we want, so we're going to take a little rectangle on our grid around the basket and see what that looks like. And what you can see here is that the basket is on the right-hand side, in the center, so of course there are an awful lot of shots around there. But when we zoom in to this scale, we hit some problems: you can see that the data is really organized in vertical lines. And you can ask yourself, well, what's the difference between each vertical line? What's the gap? That gap between each line is actually about 1 inch, or 2.5 centimeters, so it's really a very small difference. And that relates to the resolution of the data: when the data is tracked, locations are not recorded any more precisely than to about the nearest inch. There's nothing in between, just a 1-inch separation. And that means that we get a lot of dots superimposed on each other, which means that when we're interested in looking at intensity, it becomes relatively difficult to see at this level of zoom. If we focus in even more closely, it becomes really impossible to say very much about the data, as we can see here at this level of focus, for example. Again, remember that the gap between each vertical line here is 1 inch, so there's virtually no distance at all between the lines. But the resolution of the data is not fine enough to fit any dots in between these gaps. And therefore, what's happening is that lots and lots of dots are superimposed on top of each other, which makes it relatively hard to see what's really going on. So this is a limitation of the data, depending on the level of resolution that you use. However, this data is still very useful for looking at the broad picture of performance.
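One way to check how much superimposition there is in a zoomed-in window, which the lecture itself does not do, is to count how many shots share each recorded coordinate. The sketch below assumes integer-valued coordinates (suggested by the evenly spaced vertical lines), hypothetical column names `halfcourt_x` and `halfcourt_y`, an arbitrary window near where the basket might be, and random stand-in data.

```python
import numpy as np
import pandas as pd

# Random stand-in data with integer coordinates (NOT real shot data);
# integer resolution is an assumption based on the evenly spaced lines.
rng = np.random.default_rng(2)
n = 2000
shots = pd.DataFrame({
    "halfcourt_x": rng.integers(860, 900, size=n),
    "halfcourt_y": rng.integers(230, 270, size=n),
})

# A small rectangle to zoom into; these bounds are arbitrary placeholders.
X_LO, X_HI, Y_LO, Y_HI = 870, 890, 240, 260
in_window = (shots["halfcourt_x"].between(X_LO, X_HI)
             & shots["halfcourt_y"].between(Y_LO, Y_HI))
window = shots[in_window]

# Number of shots recorded at each distinct (x, y) location in the window.
per_point = window.groupby(["halfcourt_x", "halfcourt_y"]).size()

n_shots = len(window)      # shots that fall inside the window
n_visible = len(per_point)  # distinct dots a scatter plot can show
print(f"{n_shots} shots drawn as {n_visible} distinct dots")
```

If `n_visible` is much smaller than `n_shots`, a plain scatter plot is hiding most of the intensity. Passing the counts to the `s` argument of `scatter` is one possible way to make it visible again.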
We've seen that we can make interesting comparisons among players. And this shows how presenting our data in the form of plots, graphs, and charts is always a very useful exercise, as long as we remain aware of the limitations of that exercise. However, having thought about how to visualize our data and having shown the different things we can do, we now want to move on and do some more precise statistical analysis. And so the next topic we're going to move on to is regressions, which we're going to work with a lot in these MOOCs. So in the next two weeks, we're going to focus on regression techniques and how they can be used, starting off with a relatively simple introduction and then expanding that to think about specific problems that arise in regression analysis.