Hi, everyone. In the previous two lectures, we gained a good understanding of the NBA shot data set and used conditional probabilities and the autocorrelation coefficient to test the hot hand. We have yet to find any strong evidence to support the hot hand. In this lecture we'll use regression analysis to further test the hot hand.

Before we dive into our data in Jupyter Notebook, let's talk conceptually about how we could use regression analysis to test the hot hand. In the regression analysis, we will define a prediction error to be our dependent variable. For each player, let's use mu to denote the probability of making a shot in the given season, so mu is the statistical expectation of the player's success rate. The prediction error is the difference between the actual shot outcome and the average success rate; for shot t, it equals the shot outcome minus mu. If the prediction error is positive, the player performed better than expected. If the prediction error is negative, the player performed worse than expected. If the hot hand exists, then when the previous shot is better than expected, the current shot should also be better than expected. Therefore, we can run a regression where the prediction error of the current shot is the dependent variable and the prediction error of the previous shot is the explanatory variable. If there is no relationship between the two prediction errors, the estimated coefficient, which we denote rho, should be zero. If there is a relationship between the two prediction errors, the estimated coefficient rho will be statistically significantly different from zero. In particular, if the estimated coefficient rho is positive and statistically significant, then we have found evidence to support the hot hand.

So now let's return to Jupyter Notebook and analyze our shot log data set. Please open the Jupyter Notebook "Using Regression Analysis to Test the Hot Hand." We'll begin by importing a number of useful libraries. These include pandas, numpy, datetime, statsmodels.formula.api, which we'll use to run regressions, and matplotlib.pyplot as well as seaborn, which we'll use to make some statistical graphs. We also import the data sets we last saved: shot_log2.csv and player_stats2.csv, as well as player_shots2.csv.

As we discussed earlier, we use the prediction error, the difference between the actual outcome of each individual shot and the average success rate, as our dependent variable. Let's create this prediction error variable, called error. It equals current_shot_hit, the dummy variable that indicates whether the current shot was a hit or a miss, minus the variable average_hit. Our independent variable is the prediction error of the previous shot, so let's also create a variable lag_error, which equals the difference between lag_shot_hit and average_hit.

Before we proceed to the regression analysis, it is always useful to visualize our data to see if there exists any pattern. We can graph the outcomes of the shots to see if there is any pattern over time. Before we make any graphs, let's also make sure our date and time variables are stored in the correct format; we'll use the to_timedelta function in pandas. As an example, we could take a look at LeBron James' shot outcomes during the regular season by graphing a scatter plot of the current_shot_hit variable for LeBron James.
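A minimal sketch of these setup steps might look like the following. The file names (shot_log2.csv, player_stats2.csv, player_shots2.csv) and column names (current_shot_hit, lag_shot_hit, average_hit, time, date) follow the lecture's wording but are assumptions, so adjust them to match your own data set.

```python
# Sketch of the setup described above; file and column names follow the
# lecture's wording and may need to be adjusted to your own data set.
import pandas as pd
import numpy as np
import datetime
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns

# Data sets saved at the end of the previous lecture
shot = pd.read_csv("shot_log2.csv")
player_stats = pd.read_csv("player_stats2.csv")
player_shots = pd.read_csv("player_shots2.csv")

# Prediction error: actual shot outcome minus the player's average success rate
shot["error"] = shot["current_shot_hit"] - shot["average_hit"]

# Prediction error of the previous shot
shot["lag_error"] = shot["lag_shot_hit"] - shot["average_hit"]

# Store the game-clock variable as a timedelta and the game date as a datetime
# ("time" and "date" are assumed column names)
shot["time"] = pd.to_timedelta(shot["time"])
shot["date"] = pd.to_datetime(shot["date"])
```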
Let's start by graphing LeBron James' shot outcomes during the game on April 9, 2017. We can use the plot function and specify time to be on the x-axis and current_shot_hit to be on the y-axis. Indeed, in this line of code we're using a small trick: instead of asking Jupyter Notebook to produce a scatter plot using the plot.scatter command, we ask it to graph a line plot but specify the linewidth of the line to be zero. This essentially produces a scatter plot. The reason we do it this way is that the scatter plot requires the axes to be numeric; it does not allow a scatter plot where the x-axis is a date or time variable. In this graph, we can see that there are several pairs of consecutive successes. We do not, however, observe a long streak of successes.

Let's create a graph of the outcomes of individual shots for LeBron James for the entire regular season, with a subgraph for each game that he played. We first need to subset the data to include only LeBron James' shots; we can then graph his shot outcomes separately for each game. We'll use the FacetGrid function from the seaborn library to separately graph the shots in each individual game, and the map function together with the plt.plot function to produce the graph. We can set the axis labels as well. As can be seen from the graph, there appear to be some cases where LeBron James has consecutive successes, but there is not necessarily an obvious pattern of a streak of successes. We can do a similar exercise for the shots of Cheick Diallo. Cheick Diallo did not attempt as many shots as LeBron James, but there are several games in which he did have consecutive successful shots, for example on April 11, 2017.

We will now proceed to the regression analysis. Let's regress the prediction error of the current period on the prediction error of the previous period. First, we'll run a simple regression where lag_error is the only explanatory variable. Recall that to run linear regressions, we use the ols function from the statsmodels library. We'll use the fit function at the end to estimate the regression, and we use the print function to show the result in a table. In this regression result, notice that although the estimate on the lag_error variable is statistically significant with a p-value close to zero, the R-squared is also essentially zero. This means that our specified linear model is not a good fit for our data at all.
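For reference, here is a hedged sketch of these graphing and regression steps. The column names (shot_player, date, time, current_shot_hit, error, lag_error) and the exact date filter are assumptions based on the lecture's description; in the FacetGrid step the game clock is converted to seconds, since plt.plot receives raw arrays rather than going through pandas' timedelta-aware plotting.

```python
# One game for LeBron James: a line plot with linewidth=0 acts as a scatter
# plot, because plot.scatter would require a numeric x-axis.
# "shot_player" and "date" are assumed column names.
lebron_game = shot[(shot["shot_player"] == "LeBron James") &
                   (shot["date"] == "2017-04-09")]
lebron_game.plot(x="time", y="current_shot_hit", marker="o", linewidth=0)
plt.show()

# One subplot per game for the entire regular season, using seaborn's FacetGrid.
# The game clock is converted to seconds so plt.plot gets numeric values.
lebron = shot[shot["shot_player"] == "LeBron James"].copy()
lebron["time_sec"] = lebron["time"].dt.total_seconds()
g = sns.FacetGrid(lebron, col="date", col_wrap=5)
g.map(plt.plot, "time_sec", "current_shot_hit", marker="o", linewidth=0)
g.set_axis_labels("time (seconds)", "shot outcome")
plt.show()

# Simple regression of the current prediction error on the lagged prediction error
reg1 = smf.ols(formula="error ~ lag_error", data=shot).fit()
print(reg1.summary())
```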