In the previous topics, we only talked about a single variable, but in real life, we may be interested in the association between two or more random variables. For example, is there any association or pattern between tomorrow's stock price change and the number of days the price fell in the last five days? This question is interesting and valuable because stock traders can use such a pattern to decide when to buy and sell stocks. In this video, we will discuss how to measure the strength of association between two random variables.

Here's an example using housing prices. This data concerns housing values in the suburbs of Boston. LSTAT is the percentage of the population classified as low status. INDUS is the proportion of non-retail business acres per town. NOX is the nitric oxide concentration. RM is the average number of rooms per dwelling. MEDV is the median value of owner-occupied homes, in thousands of dollars.

In statistics, we use covariance to measure the association between two variables. Here are the formulas to calculate covariance. Similar to sample variance, sample covariance is also divided by the degrees of freedom, n - 1. You may be interested in the association between each pair of variables. We can check that by using the cov method of the DataFrame. From these values alone, however, we cannot tell which pair has the stronger association. Indeed, the covariance is also affected by the variance of the two random variables.

We need to factor that out in order to get a measure of the strength of association only, which is the coefficient of correlation. As you can see in the formula for correlation, the covariance is divided by the standard deviations of both variables. Now the correlation will only take values between negative one and one, no matter how much the two variables vary.

This is a case where there is no correlation. The (X, Y) pairs on the scatter plot look like a purely random pattern. This is the case where the correlation is positive and very close to one.
In this case, X and Y have a strong positive correlation. As X increases, Y is more likely to increase. This is the case where the correlation is almost equal to negative one. Here, X and Y again have a strong association, but unlike the positive case, as X increases, Y is more likely to decrease.

We can also apply the corr method of the DataFrame. You can see that the diagonal elements are one. This is because every variable is perfectly correlated with itself.

In general, there might exist a non-linear pattern between variables. Covariance and correlation can only capture linear patterns. There are quite a lot of quantitative measures for non-linear association. In this course, we will only use a scatter plot to detect a possible non-linear pattern. This can be done using the scatter_matrix function, which is imported from the plotting module of pandas (pandas.tools.plotting in older versions, pandas.plotting in current ones). A scatter matrix is a matrix of scatter plots, one for each pair of random variables. The histograms in the diagonal positions show the distribution of each individual variable. From this scatter matrix, we can see that RM seems to have a very strong linear pattern with MEDV, the median value of the house.

A scatter plot or a correlation can only reveal the association between two variables. It cannot reveal the association between one variable and several other variables together. To achieve that, we need multiple linear regression. To illustrate linear regression models more clearly, we will start with simple linear regression, which is an equation built between two variables. We will look for this relationship between RM and MEDV, which appear to have a strong association in this plot. We hope we can use RM to predict MEDV, so that we can make use of historical data and enhance its value by applying the model in practice.
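To make the covariance and correlation calculations described earlier more concrete, here is a minimal sketch in plain Python. The function names and the numbers are made up for illustration and are not taken from the Boston housing data. It computes the sample covariance by dividing by n - 1, and then divides by both standard deviations to get the correlation.

```python
import math


def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Sample covariance divided by both sample standard deviations."""
    std_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    std_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (std_x * std_y)


# Hypothetical illustrative values, NOT real observations.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price_thousands = [12.0, 18.0, 21.0, 30.0, 34.0]
price_dollars = [p * 1000 for p in price_thousands]  # same prices, different unit

# Covariance changes with the unit of measurement...
cov_thousands = sample_covariance(rooms, price_thousands)
cov_dollars = sample_covariance(rooms, price_dollars)

# ...but correlation does not, and it always stays between -1 and 1.
corr_thousands = correlation(rooms, price_thousands)
corr_dollars = correlation(rooms, price_dollars)

print("covariance:", cov_thousands, cov_dollars)
print("correlation:", corr_thousands, corr_dollars)
```

Rescaling the price multiplies the covariance by 1000 but leaves the correlation unchanged, which is why covariance values alone cannot tell us which pair is more strongly associated. On a pandas DataFrame, the cov and corr methods return these same quantities for every pair of columns at once.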