0:00

[SOUND] Welcome back to Part II of the lesson.

In this lesson, we're going to look at an application of the data science approach

to extracting structure-property linkages.

So now, on to Step 3.

[COUGH] In this step, we're trying to connect the principal components

we established in Step 2 to the properties we established in Step 1.

We're going to take a very simple approach: we're going to try to

connect these variables through a very simple polynomial expression.

In this expression, p1 is principal component score 1,

and p2 is principal component score 2.

The table here shows the expressions we actually used.

The property of interest, let's say,

is the effective yield point, and for

this property it turns out that we only need to use three terms:

a constant term, a linear term in principal component 1, and

a term that is principal component 1 times principal component 2, so

it's something like this term.
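To make the model form concrete, here is a minimal sketch of fitting such a three-term expression, y ≈ a0 + a1·p1 + a2·(p1·p2), by ordinary least squares. The scores and property values below are synthetic stand-ins, not the actual data from the lesson.

```python
import numpy as np

# Synthetic principal component scores and a property generated from the
# same three-term form described above (hypothetical coefficients).
rng = np.random.default_rng(0)
p1 = rng.normal(size=50)          # principal component score 1
p2 = rng.normal(size=50)          # principal component score 2
y = 2.0 + 0.5 * p1 - 0.3 * p1 * p2 + rng.normal(scale=0.01, size=50)

# Design matrix with the three retained terms: constant, p1, and p1*p2.
X = np.column_stack([np.ones_like(p1), p1, p1 * p2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)  # recovers roughly [2.0, 0.5, -0.3]
```

The same least-squares machinery handles any choice of polynomial terms; only the columns of the design matrix change.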

Â 1:09

Whereas if you look at another property,

like the localization propensity, it turns out we need to use a lot more terms.

The decision on how many terms to use in

each case really depends on the error in the linkage,

and more importantly, on the cross-validation error.

Again, we discussed leave-one-out cross-validation in a previous lesson.

Â 1:34

And to remind you, what it does is tell us when we have an overfit.

In other words, if we use more terms here, we'll always get a lower fit error, but

there is a danger that we actually get a much higher leave-one-out

cross-validation error, which essentially tells us that we have overfit.
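The overfit detection described here can be sketched as follows, assuming an ordinary least-squares polynomial fit. Each point is held out once, the model is refit on the rest, and the squared error on the held-out point is averaged; the data are synthetic stand-ins for the lesson's.

```python
import numpy as np

def loocv_error(X, y):
    """Leave-one-out cross-validation error for a least-squares fit."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        c, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        errs[i] = (X[i] @ c - y[i]) ** 2
    return errs.mean()

def fit_error(X, y):
    """Mean squared error of the fit on the full data set."""
    c, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ c - y) ** 2)

# A nearly linear relationship with a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=20)

X_small = np.vander(x, 3)    # quadratic model: 3 terms
X_big = np.vander(x, 16)     # degree-15 model: 16 terms

# The bigger model always fits the training data at least as closely,
# but its LOOCV error is typically far larger, flagging the overfit.
print(fit_error(X_small, y), fit_error(X_big, y))
print(loocv_error(X_small, y), loocv_error(X_big, y))
```

The fit error of the larger model can never exceed that of the smaller nested model, which is exactly why fit error alone cannot guide term selection.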

As a particular example in this case, we notice that this particular choice,

five principal components and polynomial degree 3,

gives us the best leave-one-out cross-validation error.

Â 2:11

In general, as the number of components and

the polynomial degree go up, the fit error always goes down.

So you might be tempted to use as many terms as possible.

But in this case, you can see that with five principal components,

if I choose a fourth-degree polynomial instead of a third-degree polynomial,

the leave-one-out cross-validation error goes up.

That means that the model is now very sensitive to the data,

and we want to avoid that.

So that's how we make the decision on which terms to keep and

which terms to drop.
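The selection procedure just described can be sketched as a small grid search: sweep the number of principal components and the total polynomial degree, and keep the pair with the smallest leave-one-out cross-validation error. The helper names and the synthetic data below are illustrative assumptions, not the lesson's actual code or data.

```python
import numpy as np
from itertools import combinations_with_replacement

def build_design(scores, n_comp, degree):
    """Polynomial design matrix of total degree <= `degree` in the
    first `n_comp` principal component scores."""
    cols = [np.ones(scores.shape[0])]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n_comp), d):
            cols.append(np.prod(scores[:, list(idx)], axis=1))
    return np.column_stack(cols)

def loocv_error(X, y):
    """Leave-one-out cross-validation error for a least-squares fit."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        c, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        errs[i] = (X[i] @ c - y[i]) ** 2
    return errs.mean()

# Synthetic scores and property values; the true model uses two
# components and a p1*p2 interaction (a degree-2 term).
rng = np.random.default_rng(1)
scores = rng.normal(size=(40, 6))
y = (1.0 + 0.8 * scores[:, 0] + 0.4 * scores[:, 0] * scores[:, 1]
     + rng.normal(scale=0.05, size=40))

# Pick the (n_components, degree) pair with the smallest LOOCV error.
best = min(
    ((k, d) for k in range(1, 6) for d in range(1, 4)),
    key=lambda kd: loocv_error(build_design(scores, *kd), y),
)
print(best)
```

With a p1·p2 interaction in the underlying data, the search should settle on at least two components and degree two, since smaller models cannot represent that term.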

And this table now describes

the models we established using the data science approach.

Now let's see how well they performed.

To see how well they performed, we make this plot,

where we cross-plot the predictions from the simple model against the actual

data, which came from the simulation results.

So in some sense, we treat the simulation results as the ground truth, and

we're looking at how well the predictions capture the simulations.

If everything is done well, in other words, if our models are really good,

all the data should lie along this line, the black line.

If, in fact, all the data lies exactly on the black line,

then we have a really good model.
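The parity check described here can also be summarized numerically: a perfect model puts every point on the line y_pred = y_actual, and the R² value below measures how close the points come to it. The data are synthetic, used only to illustrate the computation.

```python
import numpy as np

# Synthetic "ground truth" property values and a fitted linear model.
rng = np.random.default_rng(2)
p1 = rng.normal(size=60)
y_actual = 2.0 + 0.5 * p1 + rng.normal(scale=0.05, size=60)

X = np.column_stack([np.ones_like(p1), p1])
coeffs, *_ = np.linalg.lstsq(X, y_actual, rcond=None)
y_pred = X @ coeffs

# R^2 close to 1 means the points lie near the parity (black) line.
ss_res = np.sum((y_actual - y_pred) ** 2)
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))
```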

So in this particular case, we're comparing the predictive ability

of the data science model, which is shown in red on these plots,

versus the predictive capability of some of the traditionally used approaches.

So for example, in this case, when we're looking at the effective yield point,

3:52

we compare against a power law that connects the effective yield point to the particle size,

or another power law that connects it to the volume fraction of the inclusions.

And you'll see that both of these conventional approaches do poorly

compared to the data science approach.

Â 4:10

On the other hand, if you look at the localization propensity as a property,

you'll notice that both the data science method and the power law

based on volume fraction do a pretty decent job.

Only the power law based on particle size does not do a good job.

Â 4:34

Now, here are two more properties we captured using the same approach, and

once again, you'll notice that the data science approach consistently

does a much better job at capturing the structure-property linkage.

And the real benefit [COUGH]

of using the data science approach is that it's pretty much the same approach,

no matter what property we are interested in, and

no matter what physical material phenomenon we are interested in.

So in some sense, we can template this entire process, and we can automate it.

As a summary for this class, the main steps involved in the data

science approach were demonstrated and validated

using a simple case of a non-metallic inclusion/steel composite system.

Â 5:22

We noticed that the data science approach provided a practical

tool to extract robust and reliable structure-property linkages of interest.

The data science approach can also be generalized

to a very broad range of applications.

As I said, it can be templated and used in multiscale materials modeling and design.

Thank you.

[SOUND]
