Okay, so in today's class we're going to look at how we can build a predictive model for our experimental system, but using computer software. In prior videos I showed you how we did it all by hand; now it's time to make the computer do the work for us. There are a variety of software options that we might use, and there are forum discussions about this. However, for this course we have chosen to use R. The R language is free software and is fairly user-friendly but, most importantly, it is heavily used by a wide variety of companies and statistical researchers.

You will need 2 downloads: first, download R itself from this link; secondly, download RStudio from this link. Install both software packages. We won't actually use R directly; instead, we will use RStudio, which will call and run R in the background for us. After installing RStudio, please open it on your computer and you will have a screen that appears similar to this one.

Just to show you how flexible R is, you can even run it from a website. If you don't want to install it, or if you cannot install it because you are on a work computer, you can always go to this link to run it interactively: http://yint.org/Rweb

OK, so open RStudio and start by creating a new R script under the "File" menu. This is where you will write your commands. I want to emphasize 2 things. First, there is a common trap we have to tell you about: commands in R are case-sensitive. For example, this command c(1, 2, 3, 4) will work to create a list with 4 entries (R calls this a vector), but if you use a capital C(1,2,3,4), it will not. For this reason, whenever we show code in the videos, we will also give a link for you to download the code. Go to that website link and copy and paste the code instead. Initially, don't type it in yourself. Later on, of course, when you have confidence, you are free to type in the commands. For example, all the code used in this video is available at the link shown.
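The case-sensitivity trap just described can be tried out with a minimal sketch like the one below. The variable names lower_result and upper_result are only illustrative, and tryCatch() is an addition that is not from the video; it is used here so that the script keeps running if the capital-C call fails.

```r
# Lowercase c() combines its arguments into one vector of 4 numbers.
lower_result <- c(1, 2, 3, 4)
print(lower_result)

# Uppercase C() is a different function in R (it sets contrasts on factors),
# so this call is not expected to build the same vector of 4 numbers.
# tryCatch() captures any error object instead of stopping the script.
upper_result <- tryCatch(C(1, 2, 3, 4), error = function(e) e)
print(class(upper_result))
```

Running this should show the 4 numbers for the lowercase call, and something other than those 4 numbers for the uppercase call.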
Secondly, if you ever need help, use the help command in R, as shown here. As you can see now on the screen, the lowercase and uppercase "c" commands are different.

OK, are you ready to get started? We're actually going to work backwards. First create the prediction model called "popped_corn" by saying "popped_corn" backwards arrow, "lm"; open brackets; "y"; is predicted by A + B + A*B; close brackets. Written out, that is: popped_corn <- lm(y ~ A + B + A*B) Notice how this is similar to the prediction model we wrote by hand in the prior video.

Now if this is your first time with R this can be a little bit intimidating. There are a few things to consider. First is the backwards arrow (<-). It is actually a less than symbol (<) next to a dash (-), making it look like a backwards arrow. In R that represents the assignment operation. In other words, we're going to create a variable called "popped_corn", and assign it whatever is on the right hand side; in this case a linear model. The "lm" over there on the right stands for "linear model", indicating we want a least squares model, which is really just a type of linear model. And lastly, the symbol here in the middle, the tilde (~), can be interpreted as "is predicted by" or "is described by".

Now let's try running this R command. Highlight the line you've just typed, and then click that "Run" button over there. What you'll see is an error message, "object 'y' not found", which indicates that the software does not know what the variable "y" is. We haven't defined "y" just yet. In fact, we have also not defined variable A and variable B. So let's go do that now. Once again, use the assignment operator, the backwards arrow (<-), to do this. Take a look at the prior video: we saw that variable A for the 4 experiments was -1, +1, -1 and +1 from the standard order table. Let's go put these 4 numbers in a list, using the lowercase "c" command, which stands for "combine".
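As a minimal sketch, the error described above can be reproduced like this, assuming a fresh R session in which "y", A and B have not been defined yet. The name msg and the use of tryCatch() are illustrative additions, not part of the video's code; tryCatch() simply stores the error message as text instead of stopping the script.

```r
# In a fresh R session, y, A and B do not exist yet, so fitting the model fails.
# tryCatch() returns the error message as text instead of halting the script.
msg <- tryCatch(
  {
    popped_corn <- lm(y ~ A + B + A*B)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Show what R complained about.
print(msg)
```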
So write capital A, the backwards arrow, then "c", and then put those 4 numbers in between brackets, separated by commas: A <- c(-1, +1, -1, +1) In a similar way, factor B from the table had -1, -1, +1, +1, so let's type that in: B <- c(-1, -1, +1, +1) Now highlight those two commands, and click "Run" to see what R does with those commands. We didn't get an error message. If we go to the console region here, and type the letter capital A and capital B, we see those two lists repeated back to us. Actually, we can also see them up here, in the "Environment" tab.

We still need to create the variable called "y". The variable "y" contains a list of the numbers that represent the outcomes of the experiments. Once again, we get that from our standard order table. Note that the variables A, B and "y" follow the same order as the rows of that table. It's very easy to create these models, because we can just go ahead and copy-and-paste directly from that standard order table.

Finally, we are now ready to go run all the commands. Another nice shortcut in RStudio is to click the "Source" button. That will run all our commands in one go. In fact, there is "Source" and "Source with Echo". If you are new to R, please use the second option, which will echo (in other words "rewrite") all the commands into the console for you. That way, if there is a mistake in one of your lines, you will see exactly where the problem is.

Now let's go inspect the result, particularly what that "popped_corn" variable is. Go down to the console window and type "popped_corn", and we will see our least-squares model over there. The output from that command shows us the prediction model. It has an intercept of 67; a main effect for A of 10 units; a main effect for B with a value of 4; and then the two factor interaction effect AB, with a value of -1. Notice that these numbers match exactly what we calculated by hand earlier. So there you have it: a really quick way to get the model with computer software. We can also use the "summary(...)" command, to get more information.
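Putting the steps above together, the whole script might look like the sketch below. One loud assumption: the outcome values for "y" are not stated in this transcript, so the numbers used here are back-calculated from the reported model (67, 10, 4 and -1). Please replace them with the values from your own standard order table.

```r
# Factor settings in standard order, as given in the video.
A <- c(-1, +1, -1, +1)
B <- c(-1, -1, +1, +1)

# ASSUMPTION: these outcomes are back-calculated from the reported model,
# y = 67 + 10*A + 4*B - 1*A*B, and are not quoted in the transcript.
y <- c(52, 74, 62, 80)

# Fit the least-squares model; R adds the intercept automatically.
popped_corn <- lm(y ~ A + B + A*B)

# The 4 estimated parameters: intercept, A, B and the A:B interaction.
print(coef(popped_corn))

# More detailed output: the formula, the residuals and the coefficient table.
print(summary(popped_corn))
```

With 4 experiments and 4 parameters the model fits the data exactly, so R reports that there are no residual degrees of freedom in the summary.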
It shows us the original formula we used when we built the model; it shows us the residuals, which we'll talk about later on in the course. And there we see the same 4 parameters again: 67, 10, 4, and -1. And there's a bit more statistical output down here, for those of you who know what that is about. A key result I want to point out right now is that there are 4 experiments, and we estimated 4 parameters: the intercept, the A effect, the B effect, and the AB interaction.

I would also like to point out one thing here, where we specify the linear model. There is a term for A, for B, and for the AB interaction. But you will notice there isn't actually a term for the intercept. R will automatically add it for you. So even if you see 3 terms here in your input, you will get estimates for 4 parameters from R. Another nice shortcut that you can try, and we will explain this in later videos, is that you can create your model by saying: lm; open brackets; y; is predicted by (~); and then only write A*B; then close brackets. Try that out and see what you get.

I want to end this video by stating that you could have used other computer software to build the least-squares model. For example Excel, Python, Minitab, MATLAB, SAS, JMP, or any of the other design of experiments software packages that are commercially available. You should get exactly these same parameters from that software. That's a good test if you're trying out one of those other software packages.

So we have learned here how to get the basic results from a two factor experiment. Before we end this video, I want to challenge you though. Use the R software to repeat the numeric analysis for the ginger biscuits example that we had in a prior class (video 2C). Here are the raw experimental data and recall, this was a predictive model for taste. Are you able to reproduce the results we calculated by hand: 5.25, 1.75, 1.25, and 0.75 for the four parameters? Make sure you can reproduce this before continuing to the next video.
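Coming back to the A*B shortcut mentioned in this video, a minimal sketch for comparing the two ways of writing the model is shown below. The names model_long, model_short and same_fit are illustrative only. As a loud assumption, the "y" values are back-calculated from the reported parameters (67, 10, 4, -1), because the transcript does not quote them; use the values from your own standard order table.

```r
# Factor settings in standard order, as given in the video.
A <- c(-1, +1, -1, +1)
B <- c(-1, -1, +1, +1)

# ASSUMPTION: outcomes back-calculated from y = 67 + 10*A + 4*B - 1*A*B.
y <- c(52, 74, 62, 80)

# The long form used in the video, and the shorter A*B form.
model_long  <- lm(y ~ A + B + A*B)
model_short <- lm(y ~ A*B)

# Both should report an intercept, A, B and the A:B interaction.
print(coef(model_long))
print(coef(model_short))

# TRUE if the two models have the same named coefficients.
same_fit <- isTRUE(all.equal(coef(model_long), coef(model_short)))
print(same_fit)
```

The same pattern can be reused for the ginger biscuits challenge: swap in the factor settings and taste values from that example and compare against the hand-calculated parameters.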