So one important thing to know is that inferences on the model parameters, the beta slopes, depend on getting the rest of the model right. And in some cases, this really includes things that you should have included in the model but didn't. So the interpretation of the parameters, the slopes, is always model-dependent. So here's a tricky example. I hope you'll agree that there's a positive relationship here between predictor and outcome. Looks good, right? However, what I didn't tell you is that this data has subgroups. And now when I look within subgroups, green and red, there's a negative relationship within each subgroup. This is something that really does happen in practice. So, which one is right? Well, they're both right in a sense: across the whole population, there's a positive relationship between X1 and Y. However, that doesn't mean that if I manipulated X1, changed X1 somehow, I could change Y in that direction. In fact, that might have the opposite effect. And that's because the overall increase in Y with X1 is due to one group, green, which is high on both. So here X1, my predictor, is collinear with the group, green versus red, and that's what's causing this problem. And so it might be more correct to infer that within a group the relationship is negative. So one way to look at collinearity is by looking at variance inflation factors. And variance inflation factors, among other metrics, have emerged as my favourite really simple way of looking at a design matrix and understanding what some of the problems might be. So the idea is you can calculate variance inflation factors for each regressor in your design matrix, and this is the increase in the error variance that's due to design multicollinearity. So for example, if you have a VIF of 2, that means that the error variance will be doubled. With a VIF of 5, the error variance is five times higher than it would be otherwise, which obviously you don't want. So how do we calculate it?
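The subgroup reversal described above (a Simpson's-paradox pattern) is easy to reproduce in a quick simulation. Here's a minimal sketch with made-up numbers: two groups whose within-group slope is negative, but whose group means push the pooled slope positive. The group locations and noise levels are my own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # points per subgroup (arbitrary choice)

# "Red" group: centered at (0, 0), slope of Y on X1 is -0.5 within the group.
x_red = rng.normal(0, 1, n)
y_red = -0.5 * x_red + rng.normal(0, 0.5, n)

# "Green" group: shifted up on BOTH X1 and Y, same negative within-group slope.
x_green = rng.normal(4, 1, n)
y_green = 4 - 0.5 * (x_green - 4) + rng.normal(0, 0.5, n)

x = np.concatenate([x_red, x_green])
y = np.concatenate([y_red, y_green])

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta[1]

print(slope(x, y))              # pooled slope: positive
print(slope(x_red, y_red))      # within red: negative
print(slope(x_green, y_green))  # within green: negative
```

The pooled slope is positive only because the predictor is collinear with group membership, exactly the situation in the lecture's figure.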
Well, for a design matrix X with columns i = 1 to I, the variance inflation factor for each column is one over one minus the variance explained by the other predictors: VIF_i = 1 / (1 − R²_i). So this corresponds to a series of regressions: for each column, that column becomes the outcome, and the predictors are the other regressors. And that's going to give me the VIF. So let's look at an example. Here's an event-related design with four event types. We can see the four model regressors there. And we can look at this in matrix form: this is the heat map of the four regressors, with time going down, and it's a plot of the same design matrix. So let's look at the variance inflation factors in this random event-related design. And you can see them here; they're the orange dots, and they're all pretty close to 1. What I've done here is mark off a level of 2 in blue, 4 in green, and 8 in red. And they're pretty close to 1, which means that the regressors are essentially orthogonal, or very, very close to orthogonal. And that's optimal in this sense. So here are some properties of variance inflation factors. They're estimated for each column in the design matrix, so some columns may have high variance inflation factors, others low variance inflation factors. So I can have a multicollinearity problem in only a sub-space of the design. Adding nuisance regressors, for example head movement parameters and other physiological noise parameters, to an fMRI design matrix might increase the VIFs for some regressors more than others. And we'd like to know what the increase in variance inflation factor is when I include those nuisance regressors in the model. That can give me clues about task-correlated head movement and physiological artifacts. An important point is that pairwise correlations between the predictors are not enough to assess multicollinearity.
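The definition above — regress each column on the remaining columns and take 1 / (1 − R²) — can be sketched directly. Here random noise stands in for a near-orthogonal design (so the VIFs come out close to 1, as in the lecture's event-related example); the `vif` function itself is generic:

```python
import numpy as np

def vif(X):
    """Variance inflation factors for each column of a design matrix X.

    VIF_i = 1 / (1 - R^2_i), where R^2_i is the variance in column i
    explained by a regression on all the other columns (plus an intercept).
    """
    X = np.asarray(X, dtype=float)
    vifs = []
    for i in range(X.shape[1]):
        y = X[:, i]                        # column i becomes the outcome
        others = np.delete(X, i, axis=1)   # the rest are the predictors
        A = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Nearly orthogonal columns -> all VIFs close to 1
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
print(vif(X))
```

A VIF of exactly 1 means the column is orthogonal to all the others; values grow without bound as a column becomes predictable from the rest.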
I'll show you an example of that. And that's because the multicollinearity problem doesn't depend on whether a regressor is correlated with any single other regressor, but on whether it's correlated with any combination of the other regressors. So it may not be obvious. So here's a design matrix. I've taken the exact same design as before, the four regressors, and I've added a new regressor. Now if you look at that, it looks reasonable, I think, right? And here are the pairwise correlations between them. So the fifth column shows some correlations with the other columns, but none of them is perfect, so you might think you have a reasonable shot at estimating this design matrix. However, you would be wrong. And the reason that you'd be wrong is that the new regressor that I created is a perfect linear combination of two of the original regressors: it's just the first one minus the second one. So in this model, there's no unique solution for those betas at all. So now, think about this. Which variance inflation factors will be affected by this? And which of these model parameters are not uniquely estimable? [LAUGH] Well, here's the answer. Here's a plot of the variance inflation factors again in the full design. And I've used red bars where the variance inflation factors are essentially infinite; they go up to infinity. So those parameters are not estimable at all. But look at model parameters three and four: they're just as they were before. There's no effect on those. Why is that? Because I've taken predictors one and two and combined them to create predictor five. So one, two, and five are all collinear; I can't estimate any of them uniquely, but three and four are just fine. So how do variance inflation factors relate to correlations in the design matrix, p-values, and power? Here's a plot of a simulation that shows some of these relationships. As you can see on the left, for sample sizes of 50, 100, and 500, they're all the same.
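The point that pairwise correlations can look unremarkable while the design is completely non-estimable can be checked numerically. In this sketch (random columns standing in for the four original regressors, which is my own simplification), the fifth column is exactly column one minus column two: no pairwise correlation is perfect, yet the matrix loses a rank:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))   # four stand-in regressors
x5 = X[:, 0] - X[:, 1]          # new regressor: perfect linear combination
X5 = np.column_stack([X, x5])

# Pairwise correlations: the largest is around |r| ~ 0.7, nothing screams "singular"
print(np.corrcoef(X5, rowvar=False).round(2))

# But the design is rank-deficient: columns 1, 2, and 5 span only 2 dimensions
print(np.linalg.matrix_rank(X5))   # 4, not 5 -> betas for 1, 2, 5 not unique
```

This is exactly why VIFs (which regress each column on all the others jointly) catch the problem while the correlation matrix does not.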
As the predictor correlation goes up above 0.8 or 0.9, the variance inflation factor climbs to 5 and above. So a correlation of 0.9 corresponds to a variance inflation factor of close to 10, and a correlation between predictors of 0.8 corresponds to a variance inflation factor of about 5. And now on the right, we'll look at power. This is our ability to detect a true effect if it exists, at p < 0.05 uncorrected, for the different sample sizes. So obviously here power depends on the sample size as well. And for 50 subjects, it's relatively low, and it drops to 0 as the variance inflation factor increases. As we move up to 500 subjects, power starts out very high, but even then, as you can see, with 500 subjects, and this is an effect size of Cohen's d = 1, a strong effect size, power drops to quite low levels as the variance inflation factor goes up. So these are smooth curves; there's no hard and fast rule for how high is too high. How high is too high depends on your study design, your sample size, and also the goals. In some cases, the whole goal of the study is to disentangle some correlated predictors, and so some degree of correlation is inevitable. But if the correlations are very strong, then you're not going to be able to get the right answer no matter how large the sample. And here again, the p-values can be misleading: with multicollinearity and high variance inflation factors, the effects can flip-flop from significantly positive to significantly negative. So here are some take-homes on regression. First, on multicollinearity. It's important to check for multicollinearity, and it's easy to do. Look at your design matrix visually. Look at the pairwise correlations. But also look at the variance inflation factors, because they're giving you some unique information. And some take-homes on interpreting p-values and making inferences.
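The connection between predictor correlation and the instability of the estimates can be seen in a small simulation of my own design (the sample size, effect sizes, and number of simulations are arbitrary choices). With two predictors correlated at r, the VIF for each is 1 / (1 − r²), and the standard error of each slope estimate inflates by the square root of that factor, which is what erodes power and lets significant effects flip sign:

```python
import numpy as np

def simulate_se(r, n=100, n_sims=2000, seed=0):
    """Empirical standard deviation of the beta_1 estimate when the two
    predictors have correlation r. True model: y = 1.0*x1 + 0.0*x2 + noise."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.array([[1.0, r], [r, 1.0]]))
    betas = []
    for _ in range(n_sims):
        X = rng.normal(size=(n, 2)) @ L.T      # predictors with correlation r
        y = 1.0 * X[:, 0] + rng.normal(size=n)
        A = np.column_stack([np.ones(n), X])
        beta = np.linalg.lstsq(A, y, rcond=None)[0]
        betas.append(beta[1])                  # estimate of the x1 slope
    return np.std(betas)

# The SE of beta_1 scales with sqrt(VIF) = sqrt(1 / (1 - r^2)):
print(simulate_se(0.0))   # baseline, uncorrelated predictors
print(simulate_se(0.9))   # roughly sqrt(1 / (1 - 0.81)) ~ 2.3x larger
```

Note this is the simplest two-predictor case; the lecture's plot comes from its own simulation, so the exact correlation-to-VIF mapping there may differ.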
So the p-values and the corresponding test statistics, the t-values and z-values, are only valid if the GLM assumptions hold. And we went through a number of those. And secondly, a predictor with a significant fit doesn't mean that the predictor is the right model. Just because the predictor explains some of the variance, it doesn't mean it explains more variance than all the other possible models out there, which is a really important point. So just because it fits, it doesn't mean it's the right model; it just means it explains some of the variance. And third, variables that you haven't modeled may actually be causing effects in your data and confounding the effects that you're observing. So this is something to keep in mind and to think through whenever you think through the specifics of your study. That's the end of this module. Thanks for tuning in.