Another way to test the hot hand is to look at the autocorrelation of the outcomes of the shots. With autocorrelation, in particular the first-order autocorrelation, we test the linear relationship between the outcomes of adjacent time periods, such as the relationship between performance this season and performance last season, or the relationship between scoring the current goal and scoring the previous goal.

Suppose we have a variable x, and we have n time periods in the sample. We can then list the values of x in time order as a sequence from x1 to xn. Autocorrelation measures the correlation between adjacent time periods: x1 and x2, x2 and x3, et cetera. So, for example, if we have five seasons of performance data, that is, n equals five, then the autocorrelation coefficient is the correlation coefficient between x1 through x4 and x2 through x5. That is, we define one variable that captures the values of the first four seasons, and define a second variable that captures the values from the second to the last season. We then calculate the correlation coefficient between these two variables. Specifically, the autocorrelation coefficient is defined as the ratio between the covariance of xt and xt minus one and the product of the standard deviations of xt and xt minus one.

Similar to the correlation coefficient, the autocorrelation coefficient is always between negative one and one, and it is independent of the unit of measurement. The interpretation of the autocorrelation coefficient, however, is slightly different from the interpretation of the correlation coefficient. If the autocorrelation coefficient is positive, that means there is a positive relationship between the values of adjacent time periods. In other words, xt and xt minus one tend to move in the same direction. So if the previous performance is above average, then the current period performance tends to be above average as well. And if the previous performance is below average, then the current period performance tends to be below average as well.
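To make the definition concrete, here is a minimal sketch for the five-season case. The performance values are made up for illustration, and the variable names are not from the lecture notebook.

```python
import numpy as np

# Hypothetical performance values for five seasons (n = 5), in time order.
x = np.array([10.0, 12.0, 13.0, 15.0, 16.0])

# First variable: seasons 1 through 4. Second variable: seasons 2 through 5.
previous = x[:-1]
current = x[1:]

# Covariance of the two variables divided by the product of their
# standard deviations (both use the same 1/n convention).
cov = np.mean((current - current.mean()) * (previous - previous.mean()))
autocorr = cov / (current.std() * previous.std())

# The same number from NumPy's built-in correlation coefficient.
autocorr_check = np.corrcoef(previous, current)[0, 1]

print(autocorr, autocorr_check)
```

Because these made-up values rise steadily from season to season, the coefficient comes out positive.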
Visually, a positive autocorrelation shows up as a smooth line. If the autocorrelation coefficient is negative, it means that there is an inverse relationship between the values of adjacent time periods. In other words, xt and xt minus one tend to move in opposite directions. If there is a good performance in the previous time period, the current period performance tends to be poor. If the previous period performance is poor, then the current period performance tends to be good. So in the graph we will observe a zigzag line.

So now let's return to our Jupyter notebook and calculate the autocorrelation coefficient of the outcomes of the shots. We would like to calculate the correlation coefficient between the variables current shot hit and lag shot hit. Note that in Python we could actually use the autocorr function to calculate the autocorrelation coefficient. We will not use this function in our analysis, since we want to look at the autocorrelation coefficient within each individual game for each player. The autocorr function would treat the last shot of one game and the first shot of the subsequent game as a pair. Instead, we calculate the autocorrelation coefficient of the outcome by calculating the correlation coefficient between current shot hit and lag shot hit. As we can see, although the autocorrelation coefficient is positive, its magnitude is very, very small, close to zero.

It may not make much sense to combine all the players together to calculate the autocorrelation coefficient, since some players may exhibit the hot hand while others may not. So let's calculate the autocorrelation coefficient by individual player using the groupby function. In this table, for each player, the autocorrelation coefficient is shown in both the top-right and the bottom-left cells. We can see that some of the autocorrelation coefficients are much bigger in magnitude. We may not want to print out a two-by-two matrix for every player.
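The calculation described here might look roughly like the following sketch. The data are made up, and the column names (shoot_player, game_id, current_shot_hit, lag_shot_hit) are assumptions for illustration; the actual notebook may use different names.

```python
import pandas as pd

# Hypothetical shot log: one row per shot, in time order within each game.
shotlog = pd.DataFrame({
    "shoot_player": ["A"] * 6 + ["B"] * 6,
    "game_id": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    "current_shot_hit": [1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0],
})

# Outcome of the previous shot by the same player in the same game.
# The first shot of each game has no previous shot, so it is left missing
# and is never paired with the last shot of the previous game.
shotlog["lag_shot_hit"] = (
    shotlog.groupby(["shoot_player", "game_id"])["current_shot_hit"].shift(1)
)

# Pooled autocorrelation across all players (missing pairs are dropped).
pooled = shotlog["current_shot_hit"].corr(shotlog["lag_shot_hit"])

# Autocorrelation by player: a two-by-two correlation matrix per player.
by_player = (
    shotlog.groupby("shoot_player")[["current_shot_hit", "lag_shot_hit"]].corr()
)

print(pooled)
print(by_player)
```

In this toy data player B alternates hit and miss within every game, which is the zigzag pattern of a negative autocorrelation.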
We can use the unstack function in pandas to reshape the data. So let's create a new data frame called autocorr_hit to store the autocorrelation coefficient of the outcomes of the shots for each player. In this newly created data frame, each row represents a single player. However, we still have some duplicate information in the columns. We can use the iloc function to select the columns that we need. In the bracket after the iloc command, we first specify the rows we would like to select, then the columns. In this autocorr_hit data frame, we want to select all the rows, as each row represents only one player. For the columns, we only want to keep the second column, which is index one. Recall that in Python the first row or the first column is index zero, and the second row or second column is index one. So in the bracket after the iloc function, after the comma, we just write one to indicate the second column.

Lastly, we would also like to reset the index so that the player names become a variable rather than an index. Notice that we still have two levels of variable names, current shot hit and lag shot hit. We can use the get_level_values function to reset the variable names to the first level, which is index zero. Then let's also rename the variable capturing the autocorrelation coefficient to autocorr.

How informative the autocorrelation coefficient is also depends on the average number of shots per game for each player. In the last lecture, we calculated a game-level shot count for each player. Let's take the average of this shot count using the groupby function. We create a data frame called player game shot and call this variable avg_shot_game. We will add this average number of shots per game variable to the autocorr_hit data frame. We can also sort the data by the size of the autocorrelation coefficient.
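The reshaping steps above can be sketched as follows. This is only an illustration on made-up data: the column names shoot_player, game_id, current_shot_hit, lag_shot_hit and shot_count, and the data frame name player_game_shot, are assumptions rather than the notebook's confirmed names.

```python
import pandas as pd

# Hypothetical shot log with the lagged outcome computed within each game.
shotlog = pd.DataFrame({
    "shoot_player": ["A"] * 6 + ["B"] * 6,
    "game_id": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
    "current_shot_hit": [1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0],
})
shotlog["lag_shot_hit"] = (
    shotlog.groupby(["shoot_player", "game_id"])["current_shot_hit"].shift(1)
)

# One row per player; the two-by-two matrix is spread across four columns.
autocorr_hit = (
    shotlog.groupby("shoot_player")[["current_shot_hit", "lag_shot_hit"]]
    .corr()
    .unstack()
)

# Keep all rows and only the second column (index one). Writing [1] rather
# than 1 keeps the result a data frame. Then turn the player index into a column.
autocorr_hit = autocorr_hit.iloc[:, [1]].reset_index()

# Collapse the two-level column names to the first level, then rename.
autocorr_hit.columns = autocorr_hit.columns.get_level_values(0)
autocorr_hit = autocorr_hit.rename(columns={"current_shot_hit": "autocorr"})

# Game-level shot count, then the average number of shots per game per player.
game_shots = (
    shotlog.groupby(["shoot_player", "game_id"]).size().reset_index(name="shot_count")
)
player_game_shot = (
    game_shots.groupby("shoot_player")["shot_count"].mean()
    .reset_index(name="avg_shot_game")
)

# Add the average to the autocorrelation table and sort by the coefficient.
autocorr_hit = autocorr_hit.merge(player_game_shot, on="shoot_player")
autocorr_hit = autocorr_hit.sort_values("autocorr", ascending=False)

print(autocorr_hit)
```

Keeping avg_shot_game next to autocorr makes it easy to see which coefficients rest on only a few shots per game.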
As can be seen from this table, Kyle Wiltjer has an autocorrelation coefficient above 0.5, and he is the only player with an autocorrelation coefficient above 0.5. However, the average number of shots per game for Kyle Wiltjer is only 1.75, so we are using a very small sample to calculate his autocorrelation coefficient. Therefore, from the autocorrelation analysis, it is again hard to conclude that there is any strong evidence to support the hot hand.

Finally, we merge the player game shot data frame we created in this lecture into the player shots data frame we created in the previous lecture, because they are both player-level data on the shots. We will also save the updated data frames and export them as csv files. We'll call the shotlog file shotlog2.csv, and similarly for the player stats and player shots data frames.
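A minimal sketch of the final merge and export, under stated assumptions: the two small data frames below are made-up stand-ins, the column names shoot_player, total_shots and avg_shot_game are illustrative, and the file is written to a temporary folder rather than the course's working directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical player-level tables standing in for the lecture's data frames.
player_shots = pd.DataFrame({"shoot_player": ["A", "B"], "total_shots": [6, 6]})
player_game_shot = pd.DataFrame({"shoot_player": ["A", "B"], "avg_shot_game": [3.0, 3.0]})

# Both tables have one row per player, so they can be merged on the player name.
player_shots = player_shots.merge(player_game_shot, on="shoot_player")

# Hypothetical shot-level table to export under the name used in the lecture.
shotlog = pd.DataFrame({"shoot_player": ["A", "B"], "current_shot_hit": [1, 0]})

# Export without the row index so the csv file reloads with the same columns.
out_dir = tempfile.mkdtemp()
shotlog_path = os.path.join(out_dir, "shotlog2.csv")
shotlog.to_csv(shotlog_path, index=False)

print(player_shots)
print(pd.read_csv(shotlog_path))
```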