the column means rather than subtracting the row means.

Then I have a data set that's centered by column instead of by row, and so I can calculate the singular value decomposition on that. When I do that and plot the first principal component versus the first singular vector from the column-centered data, I get that they're identical to each other.

So basically, what's going on is that if you column center the data and then do the SVD, you get exactly the principal components. That's because the principal components are built from the covariance between the columns, and that covariance is computed after subtracting each column's mean. So PCs and SVDs actually compute the same thing if you do the centering right, up to the sign of each vector, which either method is free to flip.
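Here is a minimal sketch of this equivalence in Python with NumPy, on made-up random data rather than the expression data from the lecture. It assumes that PCA treats the columns as the variables. It computes the principal components once from the eigendecomposition of the covariance matrix and once from the SVD of the column-centered matrix, then checks that the first loading vector agrees up to sign.

```python
import numpy as np

# Made-up data for illustration: 20 observations (rows) x 5 variables (columns).
# Assumption: PCA treats the columns as the variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5)) @ rng.normal(size=(5, 5))
n = X.shape[0]

# Column-center: subtract each column's mean.
Xc = X - X.mean(axis=0)

# Route 1: SVD of the column-centered data. Rows of Vt are right singular vectors.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Route 2: PCA via eigendecomposition of the sample covariance matrix.
cov = np.cov(X, rowvar=False)            # np.cov centers each column itself
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]   # first principal component loadings
v1 = Vt[0]            # first right singular vector of the column-centered data

# Eigenvectors and singular vectors are only defined up to sign, so align signs first.
sign = np.sign(pc1 @ v1)
print(np.allclose(pc1, sign * v1))           # same direction
print(np.allclose(eigvals, s**2 / (n - 1)))  # PC variances equal s^2 / (n - 1)
```

The second check shows the link between the two outputs: the variance explained by each PC is the corresponding squared singular value divided by n - 1.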

One thing to keep in mind is that outliers can really drive these decompositions. To illustrate that, I'm going to take our edata centered and assign it to a new variable, edata outlier. Then I'm going to make one of the genes a strong outlier: I'm going to take the sixth gene and multiply it by 10,000. So this is now an outlying gene with very high values.

Now I'm going to apply the SVD to the outlier dataset. If I plot the first singular vector from the original decomposition, where I did the SVD on the dataset without the outlier, against the one from the dataset with the outlier, I can see that they don't match each other anymore. So the two datasets no longer give matching singular value decompositions, but you can definitely see that the singular vector for the decomposition with the outlier reflects that outlier quite accurately.

If you plot the first singular vector from this new decomposition with the outlier versus the outlying gene's values themselves, you can see that they're very highly correlated with each other.

What's happening is that the decomposition is looking for patterns of variation. If one gene has far higher values than the other ones, it's going to drive most of the variation in the data set, and so the first singular vector will be very highly correlated with it.
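A small simulation of the outlier experiment, again on made-up random numbers rather than the lecture's data. The names edata_centered and edata_outlier only mirror the variables mentioned above; the 50 × 10 shape, the random seed, and the use of NumPy are assumptions for illustration. The sketch row-centers the data, multiplies the sixth gene (index 5) by 10,000, and measures how strongly the first right singular vector tracks that gene.

```python
import numpy as np

# Made-up expression-like matrix: 50 genes (rows) x 10 samples (columns).
rng = np.random.default_rng(1)
edata = rng.normal(size=(50, 10))

# Row-center each gene, then take the SVD of the clean data.
edata_centered = edata - edata.mean(axis=1, keepdims=True)
_, _, vt_clean = np.linalg.svd(edata_centered, full_matrices=False)

# Make the sixth gene (index 5) an extreme outlier by multiplying it by 10,000.
edata_outlier = edata_centered.copy()
edata_outlier[5, :] *= 10_000
_, s_out, vt_out = np.linalg.svd(edata_outlier, full_matrices=False)

# Correlation of the first right singular vector with the outlying gene's values.
r_outlier = np.corrcoef(vt_out[0], edata_outlier[5])[0, 1]
# Correlation between the first singular vectors with and without the outlier.
r_clean_vs_out = np.corrcoef(vt_clean[0], vt_out[0])[0, 1]

print(f"|corr(v1 with outlier, outlier gene)| = {abs(r_outlier):.4f}")
print(f"|corr(v1 clean, v1 with outlier)|    = {abs(r_clean_vs_out):.4f}")
print(f"s1 / s2 with outlier: {s_out[0] / s_out[1]:.1f}")
```

On data like this the first correlation comes out extremely close to 1 in absolute value, and the first singular value dwarfs the rest, while the clean and outlier versions of the first singular vector have no particular reason to agree.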

So when you use these decompositions, you have to be careful to choose the centering and scaling so that the measurements for all of the different features are on a common scale.
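One common way to put the genes on a common scale is to standardize each row: center it and divide by its standard deviation. Below is a sketch under the same made-up-data assumptions (NumPy, a random 50 × 10 matrix, and a hypothetical helper named standardize_rows). After standardizing, multiplying a gene by 10,000 no longer changes the decomposition, because the constant cancels out in the scaling.

```python
import numpy as np

def standardize_rows(m):
    """Hypothetical helper: center each row (gene) and scale it to unit standard deviation."""
    centered = m - m.mean(axis=1, keepdims=True)
    return centered / centered.std(axis=1, keepdims=True)

# Made-up data: 50 genes (rows) x 10 samples (columns).
rng = np.random.default_rng(2)
edata = rng.normal(size=(50, 10))

# Same outlier as before: the sixth gene (index 5) multiplied by 10,000.
edata_outlier = edata.copy()
edata_outlier[5, :] *= 10_000

_, _, vt_scaled = np.linalg.svd(standardize_rows(edata), full_matrices=False)
_, _, vt_scaled_out = np.linalg.svd(standardize_rows(edata_outlier), full_matrices=False)

# Up to sign, the first singular vector is the same with or without the outlier.
sign = np.sign(vt_scaled[0] @ vt_scaled_out[0])
print(np.allclose(vt_scaled[0], sign * vt_scaled_out[0]))
```

This is one choice among several. Standardizing also gives low-variance, mostly-noise genes the same weight as strongly varying ones, so which centering and scaling is appropriate depends on the data.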