There are many other factors that may influence the success of a shot: for example, the player's own skill as a shooter, the type of shot, the atmosphere of the stadium, whether it is a home game or an away game, and whether the shot occurs at the beginning of the game or towards the end. Let's add these control variables to our regression and see whether the overall goodness of fit improves. We can see that the R-squared has now increased to 0.014, which is still very small. The estimate on the lag error variable is statistically significant, but its magnitude, at negative 0.0082, is still extremely small. Since it is negative, it means that success on the previous shot slightly reduces the chance of success on the subsequent shot. This is contrary to what the hot hand predicts. As we have seen, some players take many shots per game, while others take just a few, and different players have different variation in their shot success rates. This is likely to create a heteroscedasticity problem in the econometrics. We can use weighted least squares to address this issue. Weighted least squares is an estimation technique that weights each observation in proportion to the reciprocal of its error variance, and therefore overcomes the issue of non-constant variance. We can use the sm.wls command to run the weighted least squares regression. Since we want to weight by the number of shots per game, we specify weights equal to one divided by shots per game. Compared to the simple linear regression, there is a small improvement in the R-squared, but the R-squared is still extremely small, and the estimate on the lag error variable remains negative and very small in magnitude. What is the problem here? Does it mean that we can confidently reject the hot hand?
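The weighted least squares step described above can be sketched as follows. This is a minimal illustration, not the course notebook: the column names (error, lag_error, shots_per_game) and the simulated data are assumptions, and the real analysis would use the players' prediction-error data instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the shot-level data (assumed column names).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "lag_error": rng.normal(size=n),                # previous shot's prediction error
    "shots_per_game": rng.integers(1, 30, size=n),  # sample size proxy per game
})
df["error"] = -0.008 * df["lag_error"] + rng.normal(size=n)

# Weighted least squares: weight each observation by the reciprocal of the
# number of shots per game, to address heteroscedasticity across players.
wls_model = smf.wls("error ~ lag_error",
                    data=df,
                    weights=1.0 / df["shots_per_game"]).fit()

print(wls_model.params["lag_error"])  # coefficient on the lagged error
print(wls_model.rsquared)             # goodness of fit
```

With the real data, the coefficient printed here corresponds to the small negative lag-error estimate discussed in the lecture.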
Note from our summary statistics that some players exhibit streaks of success while others don't. In the regressions we have performed so far, we grouped all the players together. Let's see whether we can find any evidence of the hot hand when we look at individual players. Let's try running a similar regression on LeBron James' shot data. In this regression, we can leave out the player's position variable, since there is only one position for LeBron. The R-squared increases to 0.052, but the estimate on the lag error variable is negative 0.011, and it is not statistically significant, with a p-value of 0.694. Again, we find no evidence of a hot hand for LeBron James. Similarly, we can run a weighted least squares estimation on LeBron James' prediction errors, weighted by the number of shots in each game that he played. The weighted least squares estimation shows a slight improvement in R-squared, but the estimated coefficient on the lag error is negative and not statistically significant. We can now look back at LeBron James' autocorrelation coefficient: it is 0.02, which is very small and very close to the weighted least squares estimate. Let's run a similar regression on James Jones, who, according to our autocorrelation coefficient analysis from the last lecture, is the player most likely to have a hot hand. We start with a linear regression on the prediction error. Notice that the estimated coefficient on the lag error is much bigger, but it is still not statistically significant, with a p-value of 0.186. The R-squared is 0.114; compared to the regression on LeBron James' statistics, this is a slightly better fit. Next, we run a weighted least squares estimation on James Jones' statistics. Interestingly, while the R-squared in the weighted least squares estimation improves more substantially, to 0.173, the lag error estimate is not statistically significant, and it is substantially smaller than in the linear regression.
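A single-player regression of this kind can be sketched by filtering the data to one player before fitting. Again, the player, error, and lag_error column names and the simulated rows are assumptions for illustration; the actual notebook would subset the real shot data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated shot-level data with a player identifier (assumed schema).
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "player": rng.choice(["LeBron James", "James Jones"], size=n),
    "lag_error": rng.normal(size=n),
})
df["error"] = 0.01 * df["lag_error"] + rng.normal(size=n)

# Restrict to one player, then run OLS of the prediction error on its lag.
# The position variable is omitted: with one player there is only one position.
player_df = df[df["player"] == "LeBron James"]
ols = smf.ols("error ~ lag_error", data=player_df).fit()

print(ols.params["lag_error"])   # lag-error coefficient for this player
print(ols.pvalues["lag_error"])  # its p-value
```

The same pattern, with `smf.wls` and per-game shot counts as weights, gives the player-level weighted least squares estimate mentioned above.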
Therefore, we still do not have evidence to support a hot hand for James Jones. More generally, we can define functions to run these regressions, and then use them to run regressions on individual players. After we define this function, we can run a regression on a player, say Russell Westbrook. Similarly, we can define a function to run a weighted least squares regression; once we have defined it, we can also easily run this regression on a player. Let's try Russell Westbrook again. Lastly, we can extract the estimated coefficient on the lag error variable for all the players. We first need to create a list of unique player names. Now that we have created the list, the first player on it, at index 0, is A. J. Hammons. Next, we run the regression for each player by setting the shooting player equal to the element of the player list at the corresponding index. We can then extract the coefficient on the lag error variable, along with the p-value and t-statistic of the estimate. In the next step, we write a loop to extract the regression output for each player, and then another loop to build a DataFrame that stores the regression output for all the players. Lastly, let's merge the total number of shots, captured in the player shots DataFrame, into the regression results DataFrame. This total number of shots indicates the sample size of each regression. We display the players that have a statistically significant estimate on the lag error variable, defining statistical significance as a p-value less than or equal to 0.05. In our data set, there are a total of 37 players with a statistically significant estimate on the lagged error variable; that is, the success of their previous shot impacts the success rate of their current shot. Interestingly, more than half of these estimates are negative, which means that a success on the previous shot actually hurts the chance of scoring on the current shot. This is the opposite of the hot hand.
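The define-a-function-then-loop workflow can be sketched as below. This is a simplified stand-in, assuming hypothetical column names (player, error, lag_error), simulated data, and a toy player list; the real analysis loops over the full list of NBA players and also merges in each player's total shot count.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data (assumed schema).
rng = np.random.default_rng(2)
n = 600
df = pd.DataFrame({
    "player": rng.choice(["Player A", "Player B", "Player C"], size=n),
    "lag_error": rng.normal(size=n),
})
df["error"] = rng.normal(size=n)

def reg_player(data, player):
    """Run OLS of the prediction error on its lag for one player (sketch)."""
    sub = data[data["player"] == player]
    return smf.ols("error ~ lag_error", data=sub).fit()

# List of unique player names; index 0 picks out the first player.
playerlist = df["player"].unique().tolist()

# Loop over players, collecting the lag-error coefficient, p-value,
# and t-statistic for each regression into a results DataFrame.
rows = []
for name in playerlist:
    fit = reg_player(df, name)
    rows.append({"player": name,
                 "coef": fit.params["lag_error"],
                 "pvalue": fit.pvalues["lag_error"],
                 "tstat": fit.tvalues["lag_error"]})
results = pd.DataFrame(rows)

# Keep only players with a statistically significant lag-error estimate.
sig = results[results["pvalue"] <= 0.05]
print(results)
```

A parallel `reg_player_wls` function using `smf.wls` with per-game shot counts as weights would complete the weighted version of this loop.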
However, this could also mean that when a player scores, more defensive pressure may be put on him, which hurts his chance of scoring on the next attempt.