We've looked at the NBA and the English Premier League, and at the relationship between win percentage and relative salary spending. Now let's take a look at Major League Baseball. We're going to do exactly the same thing as we did with the previous two: posit the same models, test for the same possibilities of omitted variable bias and heterogeneity, and then see what our results imply.

As usual, we load the packages that we need and then import the data. Let's describe the data. Here we have 918 team-seasons of data running from 1985 to 2016, so that's a much longer run of data than we've had for the other leagues. Then we look at the info to see what types of objects we have in our data, so we have a sense of what we've got.

As before, let's look at the total salary spending in each of these seasons. You can see here that we've got 32 seasons of data, and that salaries rose dramatically over this period. In 1985, total salary spending in Major League Baseball was $262 million, and by 2016 that had risen to $3,750 million. So as with the other leagues, we see significant salary inflation, which is a good argument for thinking in terms of relative salaries in each season, rather than the absolute level, as determining performance: it's how much more you spend than your rivals that determines whether you're likely to win.

So we merge Sumsal, the total salaries in each season, back into our original dataset, and then create the relsal variable, which is a team's salaries divided by the total salaries for the whole season. We can look at the data here.

Now again, let's run a regplot and see what it looks like. This looks much more like the data that we saw for the NBA: quite a wide scatter of dots, but also a significant positive relationship between salary and performance. One of the problems with this plot is that the dots are very large.
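The relative-salary construction described above can be sketched as follows. This is a minimal illustration on made-up numbers, not the lecture's actual notebook: the column names `season`, `team`, `salaries` and `allsal` are assumptions, and only `relsal` is a name taken from the lecture.

```python
import pandas as pd

# Toy stand-in for the team-season data; names and values are assumptions.
teams = pd.DataFrame({
    "season":   [1985, 1985, 1985, 2016, 2016, 2016],
    "team":     ["A", "B", "C", "A", "B", "C"],
    "salaries": [10.0, 15.0, 25.0, 100.0, 150.0, 250.0],  # $ millions
})

# Total salary spending in each season (the "Sumsal" step).
sumsal = (
    teams.groupby("season")["salaries"]
    .sum()
    .rename("allsal")
    .reset_index()
)

# Merge the season totals back onto the team-season rows.
teams = teams.merge(sumsal, on="season", how="left")

# Relative salary: a team's share of that season's total spending.
teams["relsal"] = teams["salaries"] / teams["allsal"]

print(teams)
```

In this toy data every salary is ten times higher in 2016 than in 1985, yet each team's `relsal` is unchanged, which is the point of working with shares rather than dollar amounts.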
It's hard to tell one dot from the next, because they run into each other in the middle of the data. But we can fix that quite easily by changing the size of the dots in the scatter diagram. With smaller dots, we can see much more clearly what the relationship is going to be.

With that done, let's run our regression and see what we get. This is a little bit more like the NBA. As in both previous cases, though, we find relsal is highly significant: the coefficient is much larger than the standard error, and the p-value is reported as 0.000. But you can see up here that the R-squared is relatively small: only about 13 percent of the variation in performance is accounted for. Again, it's important to point out that if you want to explain all of the variation in the performance of teams in Major League Baseball, an R-squared of 0.134 doesn't sound very promising. On the other hand, if you're asking what's an important variable in determining outcomes, something that accounts for 13 percent of the total is actually quite substantial. It's hard to think of other variables that would add more to your explanation than that just on their own.

Anyway, let's go on and do what we did before, which is to add the lagged dependent variable. We sort the data, show it here, use the .shift() command to create the lagged dependent variable, and then run our regression again with it included. You can see that adding the lagged dependent variable, as in all the previous cases, has led to a fall in the coefficient on relsal: it was around 2 before, and now it's about half that, just over 1. But it's still statistically significant: the t-statistic is 5.6 and the p-value is 0.000. In that sense, omitting the lagged dependent variable led to an overstatement of the impact of relsal, but it is still a significant variable even when we include the omitted variable.
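The sort, shift and re-estimate steps can be sketched like this. It is an illustration under stated assumptions rather than the lecture's own code: the data are synthetic, the column names `team`, `season`, `wpc` and `wpc_lag` are hypothetical, and shifting within each team via `groupby` is a choice made here so that a team's first season does not inherit another team's value. The regressions use the statsmodels formula interface.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic team-season panel; names and numbers are illustrative only.
rng = np.random.default_rng(0)
rows = []
for team in ["A", "B", "C", "D", "E", "F"]:
    for season in range(2000, 2016):
        relsal = rng.uniform(0.01, 0.06)
        wpc = 0.45 + 1.5 * relsal + rng.normal(0, 0.03)
        rows.append({"team": team, "season": season,
                     "relsal": relsal, "wpc": wpc})
panel = pd.DataFrame(rows)

# Sort so each team's seasons are in order, then shift within team:
# a team's first season gets NaN for the lagged win percentage.
panel = panel.sort_values(["team", "season"]).reset_index(drop=True)
panel["wpc_lag"] = panel.groupby("team")["wpc"].shift(1)

# Simple model, then the model with the lagged dependent variable.
# The formula interface drops rows with missing values by default.
simple = smf.ols("wpc ~ relsal", data=panel).fit()
lagged = smf.ols("wpc ~ wpc_lag + relsal", data=panel).fit()

print(simple.params["relsal"], simple.rsquared)
print(lagged.params["relsal"], lagged.pvalues["relsal"])
```

The synthetic data has no season-to-season persistence built in, so the coefficients here will not reproduce the pattern in the lecture; the sketch only shows the mechanics of building the lag and comparing the two specifications.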
So in that sense, we still have a plausible relationship between salary spending and league performance. But let's again do what we did in the previous cases and add in fixed effects for each of the teams and see what that does.

When we add in the fixed effects, we get this regression here, and what we can see is that relsal has hardly changed at all with the addition of the fixed effects. It has fallen slightly: it was just above 1 and now it's just below 0.9, but it's still statistically significant, and roughly speaking, its value has not changed very much. Notice that many of these fixed effects are insignificant, so individually they don't seem to be adding very much to the analysis for each team, but the R-squared has gone up just slightly. So it's debatable whether adding the fixed effects is useful here. Perhaps there isn't as much heterogeneity in baseball as there is in the other sports. But what it certainly doesn't do is change our perception of the significance of relsal very much at all.

Let's conclude, then, by seeing how relsal affects the win percentage of teams in our data. One way to work it out is to solve the underlying equation that we've estimated. Ignoring the fixed effects for a minute, our regression model suggested that win percentage in any one season t is equal to 0.284 plus 0.361 times the win percentage in the previous season, plus 1.026 times relsal in year t. That's got win percentage on the left-hand side and, on the right-hand side, lagged win percentage and relsal. You can now rearrange this regression to work out the win percentage implied by different values of relsal, taking the win percentage in t minus 1 out of the equation. One way to think about that is that in the long run, win percentage in t and t minus 1 should be roughly the same; in some sense the team should be in a steady state. If we impose that restriction, we get a new equation, which we write down here.
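Written out, the rearrangement just described goes as follows. The coefficients are the ones quoted above; the symbols $w_t$ (win percentage in season $t$), $r_t$ (relsal in season $t$), and the unsubscripted steady-state values $w$ and $r$ are shorthand introduced here for illustration.

```latex
\begin{align*}
w_t &= 0.284 + 0.361\, w_{t-1} + 1.026\, r_t \\
\intertext{Imposing the steady state $w_t = w_{t-1} = w$ and $r_t = r$:}
w\,(1 - 0.361) &= 0.284 + 1.026\, r \\
w &= \frac{0.284 + 1.026\, r}{0.639} \;\approx\; 0.444 + 1.606\, r
\end{align*}
```

Dividing through by 0.639 is what raises the long-run effect of relsal above the short-run coefficient of 1.026.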
So we just say that win percentage is equal to some function of relsal, and we can calculate those numbers. Here's an example taking values of relsal going from 0.01, which is close to the lowest, to 0.06, which is close to the highest. What would that mean for win percentage? Well, for the very low relative salary spending, you get a win percentage of 0.46. For the middle value of relsal, you get almost exactly a .500 performance, and for the high value of relsal you get a 0.54 win percentage. So salary matters: it's going to improve your performance. You might say that it's not going to guarantee performance, and that it's much less powerful than, say, what we saw in the case of the English Premier League, but nonetheless you can see the significant effect of relsal in the data.

So we've looked at three leagues now. We're going to conclude by looking at one more set of data, the data for the NHL, and use the same approach and methodology to see what the story is for that league.
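The steady-state figures quoted above for Major League Baseball can be checked with a few lines of Python. This is a sketch using the coefficients from the lecture's regression without fixed effects; the function name `steady_state_wpc` is hypothetical, and the midpoint value of 0.035 is an assumption, since the lecture does not state which middle value of relsal it used.

```python
# Coefficients as quoted in the lecture (fixed effects ignored).
INTERCEPT = 0.284
LAG_COEF = 0.361
RELSAL_COEF = 1.026


def steady_state_wpc(relsal: float) -> float:
    """Long-run win percentage when this season's equals last season's."""
    return (INTERCEPT + RELSAL_COEF * relsal) / (1 - LAG_COEF)


# 0.01 and 0.06 come from the lecture; 0.035 is an assumed midpoint.
for relsal in (0.01, 0.035, 0.06):
    print(f"relsal={relsal:.3f} -> win pct {steady_state_wpc(relsal):.3f}")
```

Rounded to two decimals, the three values come out at 0.46, 0.50 and 0.54, in line with the numbers given in the lecture.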