To demonstrate how to request a correlation coefficient in SAS, let's go back to the scatter plots we created for some of the Gapminder variables. We use scatter plots to visualize the association between two quantitative variables. The first scatter plot shows the rate of Internet users by the rate of the country's population living in urban settings. The second shows the rate of Internet users by income per person. From looking at the scatter plots, we can guess that the associations are positive. That is, a higher Internet use rate is associated with both higher urban rates and greater income.
Now let's find the correlation coefficient. To do this in SAS, we only need to add one line of syntax to the program: PROC CORR and a semicolon, then the VAR statement, followed by the variables we would like to examine in terms of their correlation with one another. Even though we're going to be looking at two different correlations, one between Internet use and urban rate and one between Internet use and income per person, we can list all of the variables together. After ending the list with a semicolon, hit Save and Run, and SAS will give us the following output.
To locate the correlation coefficients of interest and the associated p-values, we need to examine the Pearson Correlation Coefficients table and find the row and column where our two variables of interest intersect. For the association between urbanrate and internetuserate, the correlation coefficient is approximately 0.61 with a p-value of 0.0001. This tells us that the relationship is statistically significant. For the association between incomeperperson and internetuserate, the correlation coefficient is approximately 0.75, and it also has a significant p-value.
Now we can interpret the scatter plots and the coefficients together. The association between internetuserate and income is fairly strong, and it's also positive, as the scatter plot had already shown us.
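As a minimal sketch, the PROC CORR step described above might look like the following. The variable names are the ones that appear in the output discussed here. The data set name gapminder is an assumption. The DATA= option can be left off if the data set was created earlier in the same program, because SAS then uses the most recently created data set.

```sas
/* Assumption: the Gapminder data are in a data set named gapminder */
proc corr data=gapminder;
    /* Every pair of variables listed here gets a Pearson r and a p-value */
    var urbanrate incomeperperson internetuserate;
run;
```

Because all three variables are listed together, the output table also includes the correlation between urbanrate and incomeperperson, even though that pair is not one of the two we set out to examine.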
The association between internetuserate and urbanrate is also positive but slightly more modest at 0.61. Both are statistically significant. That is, for both associations, it's highly unlikely that a relationship of this magnitude would be due to chance alone.
Here's some good news. Post hoc tests are not necessary when conducting a Pearson correlation. Post hoc tests are needed only when your research question includes a categorical explanatory variable with more than two levels. Because our explanatory variable in the context of a correlation coefficient is quantitative, there's never a need to perform a post hoc test.
Another interesting and useful aspect of the correlation coefficient is what happens when we square it. That is, if we multiply it by itself, we get a value that also helps our understanding of the association between the two quantitative variables. Lowercase r squared is the fraction of the variability of one variable that can be predicted by the other. For example, when looking at the relationship between urban rate and Internet use rate, if we square our correlation coefficient of 0.61, we get 0.37. This could be interpreted the following way: if we know the urban rate, we can predict 37% of the variability we will see in the rate of Internet use. Of course, that also means that 63% of the variability is unaccounted for. If we square the correlation coefficient for income per person and Internet use rate, 0.75, we get a value of 0.56. This suggests that if we know income per person, we can predict 56% of the variability we'll see in the rate of Internet use. This is a little bit more impressive, because we can predict over half the variability. Again, correlation coefficients are commonly denoted with a lowercase r, and they're squared to determine the amount of variability that can be predicted.
You might be wondering how much variability in Internet use rates can be predicted if we consider both urban rate and income per person.
A multivariate inferential tool called multiple regression can be used to answer this question, and we'll discuss it in the future.
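The squaring step described earlier can be checked with a short DATA step. This is only a sketch of the arithmetic: the r values are the rounded coefficients quoted above, and the variable names (r_urban, r_income, r2_urban, r2_income) are made up for illustration.

```sas
data _null_;
    /* Rounded correlation coefficients quoted in the text */
    r_urban  = 0.61;   /* urbanrate with internetuserate */
    r_income = 0.75;   /* incomeperperson with internetuserate */

    /* r squared: fraction of variability in one variable
       that can be predicted from the other */
    r2_urban  = r_urban**2;
    r2_income = r_income**2;

    /* Write both values to the SAS log */
    put r2_urban= r2_income=;
run;
```

The two values written to the log round to the 0.37 and 0.56 used in the interpretation above.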