The reason we cut it off at three principal components is, again, that the eigenvalues tell us the variance in the dataset. By keeping three principal components, we have already captured over 97% of the variance in the dataset, so we decided that was good enough.
Of course, if that's not good enough, you can include higher-order principal components. So if you go to four principal components instead of three, you would keep about 97.8% of the variance, and so on and so forth.
So one gets this additional information from PCA, which is usually called a scree plot, and this plot gives us objective guidance on where to truncate the principal components.
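As a rough illustration of how such a truncation might be automated, here is a minimal sketch using scikit-learn's PCA; the input array and the 97% threshold are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# stats: one row per microstructure, one column per feature
# (e.g., flattened two-point statistics); random stand-in data here
stats = np.random.rand(100, 6400)

pca = PCA().fit(stats)

# cumulative fraction of variance captured by the first k components
cum_var = np.cumsum(pca.explained_variance_ratio_)

# smallest k that captures at least 97% of the variance
k = int(np.searchsorted(cum_var, 0.97)) + 1
print(f"keep {k} components ({cum_var[k - 1]:.1%} of variance)")
```

Plotting cum_var, or the eigenvalues themselves, against the component index gives exactly the scree plot mentioned above.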
Now let's look at another set of examples.
In the previous set of examples,
the microstructures were obtained from actual experiments.
But one can also think of generating a very large set of synthetic
microstructures where you are just digitally making them up.
So for example, one can think of a matrix-precipitate system with two phases; in this case, the matrix is shown in black and the precipitates are shown in white. One can think of making many, many classes of distributions. In this particular one, we're only focused on four classes. And one can also think of many shapes of inclusions and many volume fractions of interest.
In this particular case study, we generated about 900 structures. Of course, you can generate a lot more, but this was one particular case study, and it is described in this paper for further information.
So, nevertheless, in these 900 microstructures there is already a rich distribution of inclusion shapes, placement of inclusions, as well as volume fractions.
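To make this concrete, here is a minimal sketch of how such synthetic binary microstructures might be generated; the circular inclusion shape, sizes, and counts are illustrative assumptions, not the exact recipe used for the 900 structures.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_microstructure(n_inclusions=10, radius=4, size=80):
    """Place circular precipitates (1) at random in a matrix (0)."""
    img = np.zeros((size, size), dtype=int)
    yy, xx = np.mgrid[0:size, 0:size]
    for _ in range(n_inclusions):
        cy, cx = rng.integers(0, size, size=2)
        img[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 1
    return img

# a small ensemble with varying inclusion counts (and hence volume fractions)
ensemble = np.stack([make_microstructure(n) for n in rng.integers(5, 25, size=32)])
```

Varying the placement rule (random, horizontal bands, vertical bands, clustered) would give the different classes of distributions discussed here.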
You take all these 900 microstructures and throw them into the protocol that we have been learning in this class: first compute the two-point statistics, and then do the principal component analysis. So the principal component analysis we discussed in the previous lessons, applied to these 900 microstructures, yields these plots.
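A compact sketch of that two-step protocol is shown below; the FFT-based periodic autocorrelation is one common way to compute the two-point statistics, and the code reuses the hypothetical ensemble from the previous sketch.

```python
import numpy as np
from sklearn.decomposition import PCA

def autocorrelation(img):
    """Periodic two-point autocorrelation of a binary microstructure."""
    F = np.fft.fft2(img)
    corr = np.fft.ifft2(F * np.conj(F)).real / img.size
    return np.fft.fftshift(corr)  # put the zero vector at the center

# ensemble: shape (n_samples, 80, 80), as generated above
stats = np.stack([autocorrelation(m) for m in ensemble])

# flatten each 80x80 map into a 6400-dim vector, then reduce to 3 PC scores
scores = PCA(n_components=3).fit_transform(stats.reshape(len(stats), -1))
print(scores.shape)  # (n_samples, 3)
```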
The plots in the top row are projections of the principal component scores onto two sets of axes at a time. So the first plot is showing you PC1 versus PC2, the second plot is showing you PC1 versus PC3, and the third plot is showing you PC2 versus PC3. So it is actually the same data: the original plot is a 3D plot that contains all three scores, PC1, PC2, and PC3, but what you are seeing are selected projections of this three-dimensional plot.
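Those three panels can be reproduced from the scores computed above with a few lines of matplotlib; a sketch, with the axis labels assumed:

```python
import matplotlib.pyplot as plt

pairs = [(0, 1), (0, 2), (1, 2)]  # (PC1, PC2), (PC1, PC3), (PC2, PC3)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (i, j) in zip(axes, pairs):
    ax.scatter(scores[:, i], scores[:, j], s=10)
    ax.set_xlabel(f"PC{i + 1}")
    ax.set_ylabel(f"PC{j + 1}")
plt.tight_layout()
plt.show()
```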
And right away, you can see that the five classes of placement of the precipitates naturally lead to clustering: five different clusters in the principal component score plots. Again, this comes out naturally from the PC analysis. We did not tag the microstructures to indicate that some of them were random, horizontal, vertical, or clustered, or whatever it is; this information was not provided to the principal component analysis. In spite of not having that information, the data gets automatically clustered.
One of the benefits of doing the principal component analysis is simply that we get three principal components, whereas in the original dataset, any one of these microstructures has 80 x 80 pixels, which means the original dimensionality of the microstructure is 6400. So from 6400 you went down to 3 principal components, and yet the microstructures have clustered as expected, even though we did not provide that information.
Now, what are we actually capturing in the principal components? Here are some plots of what the average looks like: this is the average of the statistics over all 900 microstructures. Next to it are maps of the first principal component, the second principal component, and the third principal component. Each one of these maps captures a particular spatial pattern, some sort of a signature pattern. And the PC score associated with each component then tells you how strong that feature is in the given microstructure.
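Written out, the truncated representation being described takes the following form, where the symbols are our own shorthand: f is the two-point statistic of a given microstructure, f-bar the ensemble average, the phi_i are the principal components (the signature patterns), and the alpha_i are the PC scores.

```latex
f(\mathbf{r}) \approx \bar{f}(\mathbf{r}) + \sum_{i=1}^{3} \alpha_i \, \varphi_i(\mathbf{r})
```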
As an example, if one looks at the autocorrelation of one of the 900 microstructures, this is what you get. This is the autocorrelation, denoted here by this symbol. In the truncated principal component representation, we are approximating it using these terms. Of course, there are other terms in the principal component expansion, but we are ignoring those. What this says is that the pattern represented by φ1(r) has this much strength in this particular micrograph. And likewise, the pattern represented by φ2(r) has this much strength in this micrograph, and so on and so forth.
So the advantage of this principal component representation is that the microstructure is represented by these three numbers. These three numbers are the weights of the different principal components. A different microstructure in the same ensemble would have three other numbers, but every microstructure in that ensemble of 900 microstructures now has a distinct set of three numbers that points to it.
So from there on, the hypothesis, and our hope, is that this representation is what we need to make connections to the properties and the process.
In summary, we have learned in this lesson that the application of PCA on spatial statistics offers unsupervised classification of material structure. Although we didn't explicitly state it, all the algorithms used in the analysis of the examples in this lesson are very broadly available. As a specific example, they are easily accessible through the pymks.org code repository. There are also other open-access, open-source repositories that provide similar functionality, of course.
The PCA analysis also allows objective quantification of the variance within microstructure ensembles. Because the calculations are very cheap and computationally very efficient, they can be attached to almost any in-line analytics. This is especially useful in both expensive experiments as well as expensive simulations.
Thank you.
[MUSIC]