Hi everybody. This lecture will cover unexplained correlation structures, which is something we have discussed a lot in examples throughout the course already. We will review the concepts we've talked about in previous lectures. We'll talk about standard deviation and variance, and we'll go into correlation structure and how it influences power and sample size analysis, which is really what we care about for this course. We'll talk about something called compound symmetry, and how clustering results in compound symmetric correlation structures in our designs. Finally, we'll talk about how multivariate designs can influence correlation structure, and describe the different types of correlation that multivariate designs can induce.

Remember that, to do power and sample size analysis accurately, the correlation structure that we expect in the study has to be well summarized and well characterized. We will use the term correlation structure to summarize the correlation that exists between pairs of observations in the study. Sticking with the school example we have discussed many times, we are looking to understand the correlation between test scores of students in the same school. If we take measurements from children in the same school, is the correlation zero? If not, what is it? What if we pick schools in the same district versus another school district: is there a correlation there, and what is it? Correlation tends to get weaker as you go up the hierarchy, but we need to be able to characterize all the different correlations that might have an impact on a study, based on the design we are using. So we're going to take a look at correlation structures that result from both clustering and repeated measures, which we have discussed as features of multilevel and longitudinal studies.

It's critical to specify and characterize the correlation structure as well as we can in order to get a good power and sample size estimate. If we have the correlation structure well specified, we can get an accurate power and sample size analysis. If we don't have a good specification for the correlation structure, however, we're going to get wrong sample sizes. They could be too large or too small, which raises ethical dilemmas and issues we would like to avoid.

Correlation is simply a measure of association, a relationship between two variables. The definition from Rosner tells us that correlation indicates two things about the association between variables: the strength and the direction. Correlations can range from negative one to one. Values close to negative one represent very strong negative correlation, and those close to positive one represent very strong positive correlation. As you get closer to zero, you have weaker correlations. Positive correlations, between zero and positive one, mean that two variables change in the same direction. For example, as height increases, on average, weight increases as well. Negative correlations, between zero and negative one, mean that two variables change in opposite directions. For example, as the number of hours studying increases, on average, the number of hours sleeping may decrease. A zero correlation indicates that the two variables are uncorrelated or unrelated. This brings us to the idea of independence in statistics. We have talked about two variables being independent; if they are independent, their correlation will always be zero.
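To make this concrete, here is a minimal sketch in Python. The numbers are a hypothetical illustration echoing the studying-versus-sleeping example above; nothing here comes from the lecture's slides.

```python
import numpy as np

# Hypothetical data echoing the lecture's example: as hours spent
# studying increase, hours spent sleeping tend to decrease.
hours_studying = np.array([2, 4, 5, 6, 7, 8, 9, 10])
hours_sleeping = np.array([9.0, 8.0, 8.5, 7.0, 7.5, 6.0, 6.5, 5.0])

# np.corrcoef returns a 2 x 2 correlation matrix; the diagonal is 1
# (each variable with itself) and the off-diagonal entry is the
# Pearson correlation between the two variables.
r = np.corrcoef(hours_studying, hours_sleeping)[0, 1]
print(f"correlation: {r:.2f}")  # negative: the variables move in opposite directions
```

With these made-up numbers the correlation comes out strongly negative, matching the intuition that the two variables move in opposite directions.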
You can technically have zero correlation and not necessarily have completely independent variables. But for the purposes of this course, we're going to assume that a zero correlation implies independence. The value of r squared, which is the correlation squared, tells us how much of the variability in the dependent variable is explained by the statistical model at hand. If you have a perfect positive or negative correlation, such as positive one or negative one, then r squared is one, telling us the model is a perfect predictor of the dependent variable and all of the variability is explained by the model. If you have a correlation of zero, r squared would be zero, telling us that none of the variability in the dependent variable is explained by the model.

This table is called a correlation matrix. It's a really concise, shorthand way of summarizing the patterns of correlation we might see in a set of data. In this matrix, we have four variables; you can see these represented by the column and row headings. The way it works is, if you follow the matrix across and down, you can see the correlation between any two variables. For example, if you want to know the correlation between variables two and three, start at the row labeled variable two and follow it over to column three. You can see there's a weak positive correlation of 0.1. So what do you notice about this matrix? Along the diagonal, from the top left to the bottom right, we see positive one. This is because the correlation between a variable and itself is always positive one. What else do you notice? Do you notice anything off the diagonal line we have highlighted? You may see that the matrix is symmetrical about the diagonal, with various values: 0.8, positive and pretty strong; 0.5, positive but moderate; and 0.1, positive and on the weaker side. This symmetry is a characteristic of correlation matrices. It results from the same variables being in the same order in the rows and the columns: the correlation between variable two and variable three is the same as the correlation between variable three and variable two.

Here, you can see covariance and correlation are related to each other through the standard deviations of the independent and dependent variables: the covariance between two variables is the correlation multiplied by the product of their standard deviations, Cov(X, Y) = r × SD(X) × SD(Y). Although we try not to use too many equations in this course, this particular relationship is important to understand. Knowing it allows you to convert correlations and standard deviations to covariances and variances.

We are now going to talk about correlation structures as they result from clustering and repeated measures, which are features of multilevel and longitudinal studies. Let's start with clustering. Clustering at a level results in shared experiences, which induces a correlation structure called compound symmetry. This means that if we choose any two observations within a cluster, they have the same correlation. Going into more detail: in a compound symmetric correlation structure, all of the independent sampling units have the same variance. So cluster 1 has to have the same variability within it as any other cluster. Along with this, the correlation between any two units within a cluster is the same no matter which two are chosen. Any and all randomly chosen pairs are going to have the same correlation.

Here's what a compound symmetric correlation structure looks like in a correlation matrix. Again, we have positive one along the diagonal, but then the same value of 0.5 can be found between all other pairs. So there is an equal, moderately strong positive correlation between every pair of variables. Also, in the table on the right, you can see that the variance is constant across the clusters: there are four clusters, or independent sampling units, and each has a variance of 0.3. Note that variance is not limited to values between zero and one. Variance can be much bigger than one, but it will always be zero or greater. So, as a synopsis: compound symmetry results in equal correlation between any pair of units within a cluster and equal variance among the independent sampling units.
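Here is a minimal sketch of this structure in Python, using the lecture's illustrative numbers (within-cluster correlation 0.5, variance 0.3); the cluster size of four and the variable names are our own assumptions.

```python
import numpy as np

rho = 0.5       # common correlation between any two units in a cluster
variance = 0.3  # same variance for every independent sampling unit
n_units = 4     # hypothetical number of units measured within a cluster

# Compound symmetric correlation matrix: ones on the diagonal,
# the same value rho everywhere off the diagonal.
corr = np.full((n_units, n_units), rho)
np.fill_diagonal(corr, 1.0)

# Convert to a covariance matrix. Because every unit shares one
# standard deviation, Cov(X, Y) = r * SD(X) * SD(Y) reduces to
# variance * corr.
cov = variance * corr
print(corr)
print(cov)
```

If the standard deviations differed across units, you would instead scale the correlation matrix by the outer product of the standard deviations, but then the structure would no longer be compound symmetric.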
Now let's review correlation structures that are induced by repeated measures. In longitudinal studies, it's very rare to find compound symmetric structures that are a good fit to data collected over time. In this way, the correlation structures of repeated measures are typically different from those within clusters. So what kinds of patterns do they have? Well, we know measurements taken from the same person two or more times are going to be correlated. By the way, when we mention repeated measures, that's really a general term, because repeated measures don't have to involve time at all, while longitudinal measures are a specific design that does involve time.

One thing to note is that the variability of observations can change over time. For example, if you're testing patients who have a disease that progresses, the variability in an outcome observed at time one, or baseline, may be lower than later on as the disease progresses. It would make sense that, in that case, measurements might get more erratic, which results in more variability over time.

Similarly, the correlation between observations on the same independent sampling unit will vary over time. It's usually the case that observations that are very close in time are going to be more strongly correlated than observations that are further apart in time. In other words, correlation tends to decrease as measurements get further apart in time. It may even stay constant for a while and then decay; it can have a lot of different patterns. We would like to use correlation structures that reflect this decrease over time. One way to reflect these correlation values decreasing over time is the LEAR model. It has a base correlation and a decay parameter to account for the correlation decreasing over time. Here's an example of a correlation matrix using LEAR with a base correlation of 0.7 and a decay parameter of 0.4 (we'll sketch this structure in code at the end of this section).

We can also have spatial repeated measures. In this case, we're talking about spatial structures rather than time-oriented structures. For example, if you study cancer incidence or mortality and you look at incidence by zip code or county, you might see a lot of similarity within particular areas because of shared environmental exposures. But then as you move away from one area, things start to look quite different, as there is less shared environmental influence. These spatial repeated measures can have some complex correlation structures. Because of this, if you're going to be looking at spatial repeated measures, it's good to have previously published data, or experts in the field to talk to, as sources to help you predict the correlation structure. Just like with time-oriented repeated measures, as measurements move further away from each other (more distance now, instead of time), correlation tends to decrease.
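The sketch below builds a LEAR-type correlation matrix with the lecture's numbers: a base correlation of 0.7 and a decay parameter of 0.4. The parameterization follows the published LEAR (linear exponent autoregressive) family of Simpson, Edwards, and Muller; the exact form on the lecture slide may differ, and the four equally spaced measurement times are our own assumption. The same idea applies if the separations are spatial rather than temporal.

```python
import numpy as np

def lear_corr(positions, base_rho, delta):
    """Sketch of a LEAR (linear exponent autoregressive) correlation matrix.

    Assumes corr(j, k) = base_rho ** (d_min + delta * (d_jk - d_min)
    / (d_max - d_min)), where d_jk is the separation between
    measurements j and k, in time or in space.
    """
    positions = np.asarray(positions, dtype=float)
    d = np.abs(positions[:, None] - positions[None, :])  # pairwise separations
    off = d[d > 0]
    d_min, d_max = off.min(), off.max()
    corr = base_rho ** (d_min + delta * (d - d_min) / (d_max - d_min))
    np.fill_diagonal(corr, 1.0)  # a measurement correlates 1 with itself
    return corr

# Lecture's numbers: base correlation 0.7, decay parameter 0.4;
# four equally spaced measurement times are our own assumption.
print(np.round(lear_corr([0, 1, 2, 3], base_rho=0.7, delta=0.4), 3))
```

With these inputs, adjacent measurements correlate at 0.7 and the pair furthest apart correlates at about 0.61, which is exactly the gradual decrease with separation described above.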
If we are dealing with multivariate repeated measures, then we also have a potentially complex correlation structure. Multivariate refers to having multiple response variables, so this would be repeated measures on more than one dependent variable. For example, if we were measuring height, weight, head size, and waist size of people in the study, we would take repeated measures of each of those over time. These could have correlation structures that are potentially complex. Once again, looking at the literature that is already out there is a good way to pin down these potentially complex correlation structures for multivariate repeated measures. We can also do a very good job of capturing complex correlation structures with software that's available. This is something you will learn about later in this course. But just know that there is software that simplifies the process of coming up with these complex structures by taking in information about a specific design.

Let's do a quick review of the correlation concepts we have talked about. We need to understand and describe the correlation structure of the observations to conduct power and sample size analysis. Remember, a correlation structure summarizes the correlation between pairs of observations. With the levels we see in multilevel studies, clustering results in a correlation structure called compound symmetry. This occurs when all independent sampling units have the same variance and when the correlation between any two units within a cluster is the same no matter which two are chosen. In longitudinal studies, there's correlation induced by repeated measurements on the same independent sampling unit. This is a different correlation structure than we see associated with clustering, as we do not typically see compound symmetric patterns associated with longitudinal measures. That wraps up our lecture on correlation. Thank you for your time.