0:00

Principal Components Analysis and the Singular Value Decomposition are really important techniques, both in the exploratory data analysis phase and in the more formal modeling phase. They can be used very easily in both stages. But I'm going to talk a little bit about how they're used in the exploratory phase, and a little bit about what goes on underneath, what their underlying basis is.

0:24

So, suppose we have some matrix data here. I just generated some random normal data using this code right here, and you can see that the matrix I plotted with the image function is not particularly interesting. It looks pretty noisy, and there's no real pattern, as we would expect. Now, I can run a hierarchical cluster analysis on this data set, on both the rows and the columns of the matrix, and I can do this very easily in R using the heatmap function. When I run the heatmap function, the cluster analysis is done, and I get the dendrograms printed on both the columns and the rows. But again, no real interesting pattern emerges, and that's because there's no real interesting pattern underlying the data.
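A minimal sketch of the kind of code being described here; the seed and the exact plotting calls are my own assumptions, not necessarily the lecture's:

```r
# Simulate noisy data: a 40 x 10 matrix of independent standard normals
set.seed(12345)
dataMatrix <- matrix(rnorm(400), nrow = 40)

# Plot the raw matrix; transposing and reversing the rows puts
# row 1 at the top, matching the usual orientation
image(1:10, 1:40, t(dataMatrix)[, nrow(dataMatrix):1])

# Hierarchical clustering on both rows and columns, with dendrograms
heatmap(dataMatrix)
```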

1:10

So that's fine. But now, what if we add a pattern to the data set? I do that with this for loop here. I loop through all the rows, and on each row I flip a coin. If the coin comes up heads, I add a pattern to that row, so that five of the columns have a mean of zero and the other five have a mean of three. So I just add a little shift across the columns. Now if I plot the data, you can see that the right-hand five columns are a little bit more yellow, which means they have higher values, and the left-hand five columns are a little bit more red, which means they have lower values. That's because some of the rows have a mean of three in the right-hand columns.
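The coin-flip loop being described could look roughly like this (rebuilt from scratch so the sketch stands alone; the seed is my own choice):

```r
# Start from a noisy 40 x 10 matrix of standard normals
set.seed(678910)
dataMatrix <- matrix(rnorm(400), nrow = 40)

# For each row, flip a coin; on "heads" (a one) add a shift so that
# the first five columns keep mean 0 and the last five get mean 3
for (i in 1:40) {
  coinFlip <- rbinom(1, size = 1, prob = 0.5)
  if (coinFlip == 1) {
    dataMatrix[i, ] <- dataMatrix[i, ] + rep(c(0, 3), each = 5)
  }
}

# The right-hand five columns now show up with higher values
image(1:10, 1:40, t(dataMatrix)[, nrow(dataMatrix):1])

# Re-run the clustered heat map on the patterned data
heatmap(dataMatrix)
```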

2:04

So now, if I run a hierarchical cluster analysis on the data, you can see

2:12

that the two sets of columns are easily separated out. The dendrogram on the top of the matrix, which is drawn on the columns, clearly splits into two clusters: there's five columns on the left, and there's five on the right. On the rows, it's not so obvious, because there's no real pattern that goes along the rows, so they just get reorganized into a random ordering, and that's the picture that emerges from the heat map.

2:39

Now, we can take a closer look at the patterns in the rows and columns by looking at the marginal means of the rows and columns. So, for example, I can plot the ten different column means, or I can plot the 40 row means, of this matrix. With this code, that's exactly what I've done: on the left-hand plot I've got the original matrix data.

3:10

And in the middle plot here I've plotted the mean of each row. On the y-axis I've got the row number, which goes from 1 to 40, so it's roughly parallel with the image on the left, and on the x-axis I've got the mean of that row. So, for example, you can see that for row 10 the mean is roughly, you know, minus 0.25 or something like that, and for row 30 the mean is roughly 1.5.

3:37

And so we see that there's a clear shift in the mean as you go across the rows. Similarly, if you go across the ten columns, there's a clear shift in the mean of each column. The first five columns have a mean of roughly zero, or close to it, and the last five columns have a clearly higher mean, because of the shift we added there. So using the plots in the middle and on the right, you can see a clear pattern in both the rows and the columns.
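A sketch of the three-panel marginal-means figure being described, rebuilt so it is self-contained (seed and plotting parameters are my assumptions):

```r
# Rebuild the patterned 40 x 10 matrix
set.seed(678910)
dataMatrix <- matrix(rnorm(400), nrow = 40)
for (i in 1:40) {
  if (rbinom(1, size = 1, prob = 0.5) == 1) {
    dataMatrix[i, ] <- dataMatrix[i, ] + rep(c(0, 3), each = 5)
  }
}

# Data on the left, row means in the middle, column means on the right
par(mfrow = c(1, 3))
image(1:10, 1:40, t(dataMatrix)[, nrow(dataMatrix):1])
plot(rowMeans(dataMatrix), 40:1, xlab = "Row mean", ylab = "Row", pch = 19)
plot(colMeans(dataMatrix), xlab = "Column", ylab = "Column mean", pch = 19)
```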

4:10

So, cluster analysis is useful for identifying these types of patterns, but we can take a slightly more formal approach that takes advantage of the matrix structure of the data. There are two related kinds of problems you might want to look at. The first is: if you have a lot of variables, you want to create a new set of variables that are uncorrelated and explain as much variance as possible. The idea is that if we have hundreds, or maybe thousands or tens of thousands, of variables in our data set, they're not all independent measurements of something, right? A lot of them will be related to each other; they will be correlated with each other. For example, you'll have two measurements like height and weight, and those will obviously be related to each other, so they're not all independent factors. So the idea is that we want to create a set of variables that is smaller than the original set of variables we have, and that are all uncorrelated with each other, so that they represent different types of variation in your data set. And similarly, we want this reduced set of variables to explain as much of the variability in your data set as possible.

5:25

Another related problem is that, if you put all the variables together in one matrix, like the matrix we showed in the image, you want to find the best matrix created with fewer variables that still explains the original data. The more technical way to say this is that you want to find a lower-rank matrix that somehow explains the original data reasonably well. The first goal here is a statistical one, and it's a common problem that's solved by the method of principal components analysis. The second goal is more of a data compression problem, where you want to find a smaller representation of the original data. And one way to think about that problem is with the singular value decomposition.
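One way to make the lower-rank idea concrete is to truncate the SVD of the matrix; this is a sketch under my own assumptions (same simulated matrix as above), not code from the lecture:

```r
# Rebuild the patterned 40 x 10 matrix
set.seed(678910)
X <- matrix(rnorm(400), nrow = 40)
for (i in 1:40) {
  if (rbinom(1, size = 1, prob = 0.5) == 1) {
    X[i, ] <- X[i, ] + rep(c(0, 3), each = 5)
  }
}

# Truncating the SVD after one component gives the best rank-1
# approximation of X in the least-squares sense
s <- svd(X)
approx1 <- s$d[1] * s$u[, 1] %*% t(s$v[, 1])

# Fraction of the total variation captured by the first component
prop1 <- s$d[1]^2 / sum(s$d^2)
```

Storing only the first column of U, the first singular value, and the first column of V takes 51 numbers instead of the original 400, which is the data compression idea in miniature.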

6:10

So the singular value decomposition can be written in matrix terms as follows. Suppose we have a matrix X, where we can think of each column of the matrix as a variable, or a measurement, and each row as an observation, so you might have many, many observations of a given measurement. For example, the rows of your matrix might represent individual people, and each column would represent a measurement on those people: the first column might be the height, and the second column might be the weight.

6:41

So then the idea is that, if you have a matrix X formatted in this way, the singular value decomposition, or SVD, is a matrix decomposition which decomposes the original matrix into the product of three separate matrices, X = U D V^T: one is called U, one is called D, and the other is called V.

6:59

And so the columns of U are orthogonal, so they're kind of independent of each other; they're called the left singular vectors. The columns of V are also orthogonal, and they're called the right singular vectors. And then D is a diagonal matrix which contains the singular values. So that's the basic idea of the singular value decomposition; we'll talk about these components a little bit later on.

Principal components analysis, also usually known as PCA, is a related technique that uses the singular value decomposition. And the basic idea is that if you take the original data matrix, subtract the column mean from each column, divide each column by its standard deviation, and then run an SVD on that re-normalized matrix, the principal components will be equal to the right singular vectors.
Â