0:02

In this video, we demonstrate how the response surface strategy changes as we reach the optimum.

Issues of curvature and non-linearity become important at the peak of the mountain.

One advantage of response surface methods is that we learn about the region around us as we go. Remember that analogy of walking with a ski pole in your hand? Well, we never really know the region around us. So when we use that ski pole to figure out what the terrain looks like, we need to have a way to know when we've reached the top.

Let's just quickly contrast the response surface approach with the OFAT approach. The COST approach, or OFAT approach, makes you think that you're at the optimum, but you can never really be sure. In the case we saw earlier with two factors, you would alternate between optimizing factor A, then factor B, then factor A again, then B again. You'll eventually get to an optimum, but will you be sure you're at the peak? How do you know you don't need to do another round of optimizing A and B again?

Also, if I'd optimized B first and then A, I would have arrived at the optimum faster. This seems like a lottery! Sometimes you get to the peak quickly, and sometimes slower. Not surprisingly, statisticians don't like this sort of thing.

Furthermore, this approach doesn't scale well. If you had five factors, for example A, B, C, D, and E, then this haphazard searching across the five factors leads to inefficient experimentation.

By using the COST approach you will not learn about the interactions in your system. Recall from an earlier video in this module that learning more about our systems was the first way we can use data to improve our processes.

So let's resume and continue with the model built on points 11, 12, 13, 14, and the baseline at point 10. We pointed out that the contour plots exhibit curvature: the lines are not parallel. These curved lines come from the interaction term, indicating that the interaction coefficient is important relative to the main effects.

In prior models, the interaction term was small. Notice, though, that the steepest ascent method will still send us in the correct direction if we ignore the interaction term. The interaction term, had we accounted for it, would send us at a slightly different angle.

But in this example, the discrepancy is not so bad. Had the interaction term sent us in a different direction, we would definitely follow that direction instead of the steepest ascent determined only from the linear terms. But more on that to come with this topic of "curvature".

Let's quickly go take a step in the direction of run number 15. And because you are good at this now, I am going to take a step of "Delta x_T" equal to two, and the corresponding "Delta x_P" is equal to minus two-thirds. You can do the rest of the calculations yourself and show that the predicted value of profit at this location is $742, and that corresponds to these real-world values and these coded values. When we run the actual experiment, we record a profit of $735. That's an overestimate of $7. This overestimate is comparable to the main effect.
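As a small sketch of how a steepest-ascent step like this is chosen: the step in each coded factor is proportional to that factor's main-effect coefficient. The coefficients below are hypothetical (only their ratio of minus one-third matches the step quoted above), not the ones fitted in the video.

```python
# Steepest-ascent step sketch. b_T and b_P are hypothetical main-effect
# coefficients in coded units; only their ratio (-1/3) matches the transcript.
b_T = 15.0   # assumed coefficient for factor T
b_P = -5.0   # assumed coefficient for factor P

delta_x_T = 2.0                      # chosen step size in coded factor T
delta_x_P = delta_x_T * b_P / b_T    # proportional step in coded factor P

print(delta_x_P)  # minus two-thirds, about -0.667
```

The direction is fixed by the coefficient ratio; only the overall step size is ours to choose.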

And we also have visual evidence now of curvature. This is starting to tell me that I should change my strategy. When we start to enter a region of curvature in response surface methods, the change in the surface's linearity becomes apparent. We're becoming more nonlinear, and likely approaching an optimum. It is desirable to know when this is happening. One indication of that already is that our interaction terms are large; they cannot be ignored. And visually, we see that as these non-parallel lines in the contour plot.

The second indication that an optimum is close by is that we are levelling out. Levelling out means that my outcome values, in the neighbourhood, are getting closer and closer, even when I'm taking reasonable step changes.

Let's see this. The spread in profit values in the first factorial was around a $300 difference. In the second factorial over here, that spread was around $150. And now in this third factorial, my spreads are only $15 to $20.

We're not making the gains we had made earlier. And if we're not careful, we can be affected by noise. If we don't know the level of noise around us, we might be misled. How do I know whether that spread of $15 to $20 is any different to the noise in the system? Another way to ask that: if we repeated those corner experiments, would we get similar values or different values?

So let's go calculate what the noise level is. Run at least three or four repeated experiments at the same condition; we typically use the baseline, so here at the base of the factorial. I previously had an outcome of $732, and two more runs give me outcomes of $733 and $737. So there's a spread of about $5. That spread is very different to the spread over the corner points of the factorial, indicating I'm still seeing signal above the noise.
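The comparison just described is simple enough to sketch in a few lines, using the replicate values quoted above:

```python
# Replicate runs at the baseline condition, as quoted in the transcript.
replicates = [732, 733, 737]
corner_spread = 20  # upper end of the $15-$20 spread over the factorial corners

# Noise level: the spread (range) of the repeated baseline runs.
noise_spread = max(replicates) - min(replicates)
print(noise_spread)                   # 5, i.e. the $5 spread quoted above

# The corner spread is several times the replicate spread, so the
# factorial is still detecting signal above the noise.
print(corner_spread / noise_spread)   # 4.0
```

With more replicates you would normally use the standard deviation rather than the range, but the range is fine for a quick check like this.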

The third indication of an optimum is whether our predictions are too high or too low. We saw here at point 15 that we had a prediction error of $7, just over our level of noise. This indicates the model can be improved.

We often observe strong changes in the model's surface near the optimum. For example, if you're making a product, you want to cook it long enough to bring out the beautiful colours and caramelization flavours that occur. But go just a little bit too far and it becomes burnt.

We also see this in engineering systems. Often, our optimal point of operation is right at the edge of a cliff, and if we go just a little bit further, we fall over the edge and see our outcome value drop down rapidly. That's another good reason to take small steps near the optimum.

A fourth way to detect curvature is that our model does not fit the surface very well. A linear model cannot fit a curved surface well, and we use the terminology "lack of fit" to quantify that. Let me show you. In our first factorial, the center point was $407, but the predicted center point was $390. That's a difference of $17.

Now that might seem large, but it really isn't when we compare it to the main effects of 55 and 134. Recall what the interpretation of that number 55 is again? So a $17 difference really is small, indicating a small lack of fit.

In the second factorial, the actual center was $657 while the predicted center was $645, a difference of $12. That again is small when compared to the neighbourhood we're in.

In this third factorial though, the actual center is at the average of these three baseline values, $734. Compare that to the predicted center value of $724. That's a difference of $10, which, when compared to the largest effect of 7.5 and to the level of noise of about $5, indicates an important deviation of the model from the actual surface we're on, at least at the center.

So if we're getting large deviations at the center, we cannot hope to get good predictions outside the range of the model. And good predictions are essential to optimize in the correct direction.

So there are four ways that we've shown to check for inadequacy in the model. Those of you with a statistical background can go calculate the confidence intervals on the model coefficients, and observe that they're very wide. None of the terms in the model are statistically significant.

Well, as we saw in the single-variable popcorn example, when faced with a poorly predicting model in a region that has curvature, we can add terms to account for the nonlinearity: "quadratic terms". So let's go add these now.

There are two options: adding points on the faces of the cube, or adding points a little bit further out, called "axial points" or "star points". These points are at a distance denoted as alpha from the center. Alpha is a value greater than 1, to ensure they are outside the cube.

The design on the left works well if you run into a constraint, or cannot leave the factorial space. The design on the right comes from a class of designs called central composite designs, or CCDs, and it is preferred for a statistical property called rotatability.

Just a quick aside: rotatability simply means that the prediction error is equal for any two points that are the same distance from the center. It's a desirable statistical property.

Now, there are various choices for the distance alpha and the number of center points to use, but that's a messy discussion that you can research quite easily. The general advice is this though: run the factorials first; then run the star points afterwards at a distance of alpha = (2^k)^0.25, where k is the number of factors.

So, if you have two factors, alpha = 1.41, and if you had three factors, you would have alpha = 1.68. Also, add three to four center points to assess lack of fit. And run these center points at different times, not all after each other.
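That axial-distance rule is a one-liner to compute; this sketch reproduces the two values quoted above:

```python
# Axial (star) point distance for a rotatable central composite design:
# alpha = (2^k)^0.25, where k is the number of factors.
def ccd_alpha(k: int) -> float:
    return (2 ** k) ** 0.25

print(round(ccd_alpha(2), 2))  # 1.41 for two factors
print(round(ccd_alpha(3), 2))  # 1.68 for three factors
```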

Notice this though: from the individual perspective of factor T and of factor P, each of these has runs at five distinct levels, and that's what helps us accurately fit that quadratic model.

Let's go do this! The first star point is run number 18, at a value of +alpha for factor T in coded units, and a value of zero for factor P. Let's add that to the table, and also calculate the real-world units for it in the usual way. So that's 343 parts per hour and a sales price of $1.63. You can go practice reproducing the other three star points, and let's add one final center point experiment, number 22, so that we have a total of four center points.

Now we go run these experiments, in random order of course, and report the values here in standard order. Notice firstly that center point 22 is similar to the prior values, indicating that the system is still stable and reproducible.

Well, we've got quite the collection of data here. A central composite design (CCD) always has the factorial points, center points, and star points. I've arranged them in that order in the R code.
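The original analysis is in R, but the point layout of this two-factor CCD can be sketched in a few lines of Python, in the same order just described (factorial corners, then center points, then star points):

```python
from itertools import product

# Two-factor central composite design: factorial corners, center points,
# then star points, in the order the transcript uses.
alpha = (2 ** 2) ** 0.25   # about 1.414 for k = 2 factors
n_center = 4               # four center points, as in the video

factorial_pts = list(product([-1, 1], repeat=2))        # the 4 corners
center_pts = [(0, 0)] * n_center                        # the 4 center runs
star_pts = [(alpha, 0), (-alpha, 0), (0, alpha), (0, -alpha)]

design = factorial_pts + center_pts + star_pts
print(len(design))  # 12 runs in total: four plus four plus four
```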

When we run that code, we get the quadratic model from them all. I will leave it as a small challenge to you to go prove the following two things.

Firstly, the model's prediction of the center point, when compared to the average of the four center points, has a very small deviation. So this model fits well, at least at the center.

Secondly, this quadratic model's prediction of the other points, for example one of the corner points, or one of the star points, or even experiment 15 over here, is a very good prediction. There is little prediction error. So we have confidence in this model's predictions.

Now let's go visualize those as contour plots. And right away, we can see we are in fact near the optimum. Visually, the axial point is pretty close to the predicted optimum region from the model. That's good enough to stop here and use as our optimum.

But let's say the quadratic model had looked like this one instead. Then you would go run your next experiment over here, based on the model, at that predicted optimum. And then you would go verify the model's prediction ability at that point to check that you've reached the optimum.

Now we can be a bit more precise, for those of you who don't like to trust visual judgement. We can take this quadratic equation, differentiate it with respect to the coded variables, and set the derivatives equal to zero. You will get a set of two linear equations in two unknowns, which you can then solve using your favourite linear algebra software, or by hand.

When you go do that, you get the predicted optimum at 343 parts per hour and a selling price of $1.59. The quadratic model tells us to expect a profit of $740 at this point. Running that 23rd experiment gives an actual profit of $739; that's very close agreement. This is definitely the largest value we've observed over the entire approach.
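Here is a sketch of that differentiate-and-solve step. The coefficients are hypothetical, not the ones fitted in the video; the point is the mechanics of setting both partial derivatives of the quadratic to zero and solving the resulting 2x2 linear system.

```python
# Stationary point of a fitted quadratic model
#   y = b0 + b1*x1 + b2*x2 + b11*x1**2 + b22*x2**2 + b12*x1*x2
# Setting the partial derivatives to zero gives two linear equations:
#   dy/dx1 = b1 + 2*b11*x1 + b12*x2 = 0
#   dy/dx2 = b2 + b12*x1 + 2*b22*x2 = 0
# Hypothetical coefficients (b11, b22 < 0, so this stationary point is a maximum):
b1, b2 = 4.0, 2.0
b11, b22, b12 = -10.0, -6.0, 3.0

# Solve the 2x2 system with Cramer's rule.
det = (2 * b11) * (2 * b22) - b12 * b12
x1 = (-b1 * (2 * b22) - b12 * (-b2)) / det
x2 = ((2 * b11) * (-b2) - (-b1) * b12) / det

print(x1, x2)  # coded coordinates of the predicted optimum
```

You would then convert these coded coordinates back to real-world units, just as we did for the star points.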

So this video has answered the last question we had from an earlier video in this module: "How do we know when to stop?" We know that we can stop when our model matches the surface well and the model predicts an optimum. Using the model, we know that we've reached the peak of the mountain, even though we cannot see the actual mountain around us.

So let's recap our entire approach. Start by building successive linear models, shown here in blue, green, and orange, respectively. I'm showing you the prediction contours in those colours for the local region around each model. Each of those local models had their baseline, or 0-0, value.

These past videos have also shown that we should incorporate the baseline points, as well as other points in the neighbourhood, in our models, to help improve their estimates.

We use our models as long as we have confidence in their predictions. We rebuild the model once we demonstrate those predictions are poor, judged by comparing the predictions to the actual values and taking noise into account.

As we approach the optimum, issues regarding curvature, which we studied in four points, become apparent. We have to change our strategy. If we pick up that we have curvature, based on these criteria, we have to start decreasing our step size and start fitting quadratic models.

The defining principle of an optimum is that it's nonlinear: points around us must be lower. And so our last prediction model that we built, shown here in red, illustrates that quite nicely.

To end off with though, let me show you the true surface in a grey colour. This is obviously something you would never see in practice. But seeing it here gives you good confidence that we were doing the right thing all along.

You can see how the models in blue, green, and orange approximated the non-linear surface very well in their local regions. Outside their local neighbourhoods, they start to deviate. The non-linear model fits the surface over a wider region. That isn't too surprising: the information to build that non-linear model required four plus four plus four, or 12, experiments. And we used that non-linear model to place our final experiment(s) very close to the true optimum.

To end this video, I will add one point: the real optimum may move. Our system could deteriorate and change, so that the optimum you found won't stay there. There are experimental tools that continually keep searching and moving towards the optimum. We won't have time to cover them in this course, but the topic of Evolutionary Operation (EVOP) is what you should search for if that interests you. It is particularly applicable to manufacturing systems that are never stable. That mountain is moving, and you have to move as well in order to remain at the peak.
