All right, so the first step we're going to look at is deciding how many factors we need to include in our analysis. There are a couple of different criteria that can be used. One criterion is to say we want to retain at least a given percentage of the original variation in the survey. So we might say, okay, I want to retain at least 50% of the variation in the survey. Another criterion that we could use is to say, well, let's include as many factors as are necessary such that each factor that we include is doing its fair share of explaining variation. Mathematically, what this maps onto is saying that all of the eigenvalues of the factors we keep have to be greater than 1. Or, equivalently, saying that the share of variation a given factor explains has to be greater than 1 over j, where j is the number of survey items that we have. So if I have 20 survey items, we're going to keep including factors until a factor falls below the 5% threshold, or the 1 over 20 threshold. And that's the criterion that's often used; it's automatically built into software. Another way that we could do this is to look at what's referred to as a scree plot. What a scree plot does is essentially plot out the variation explained by each factor. We can look at it in terms of the eigenvalues or as the percentage of variation. And we look for the point where there's a plateau, or there's a kink in the curve and it kind of flattens out. So that's more of a visual way of assessing this. So these are different criteria that we can use. It's ultimately the analyst's choice which one we're going to be using. But one of the more common ones is this: 1 over the number of survey items, or eigenvalue greater than 1, right? So the software package I'm going to show you is XLSTAT. It's an add-on for Microsoft Excel that adds multivariate statistics capabilities to the platform. It does allow for a 30-day free trial.
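The stopping rules described above (eigenvalue greater than 1, a fair share of 1 over j, and a target percentage of variation) can be sketched in a few lines of Python. This is only an illustration: the eigenvalues are made-up numbers rather than results from the survey in this lecture, and the function names are hypothetical, not part of XLSTAT or any other package.

```python
# Illustrative eigenvalues only (hypothetical, not from the lecture's survey).
# For an analysis based on the correlation matrix, the eigenvalues sum to the
# number of survey items j, here 8.
eigenvalues = [3.0, 1.5, 1.25, 0.75, 0.5, 0.5, 0.25, 0.25]


def factors_by_eigenvalue(eigenvalues, threshold=1.0):
    """Count the factors whose eigenvalue is greater than the threshold."""
    return sum(1 for ev in eigenvalues if ev > threshold)


def factors_by_fair_share(eigenvalues):
    """Count the factors that explain more than 1/j of the total variation."""
    total = sum(eigenvalues)
    j = len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev / total > 1.0 / j)


def factors_for_target(eigenvalues, target=0.50):
    """Smallest number of factors whose cumulative share reaches the target."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= target:
            return k
    return len(eigenvalues)


print(factors_by_eigenvalue(eigenvalues))     # eigenvalue > 1 rule
print(factors_by_fair_share(eigenvalues))     # 1/j rule
print(factors_for_target(eigenvalues, 0.50))  # retain at least 50%
```

Because these eigenvalues sum to j, the eigenvalue-greater-than-1 rule and the 1-over-j rule pick the same number of factors, which is the equivalence described above. The percentage-of-variation rule can give a different answer depending on the target you choose.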
I believe there are discounted rates available for students who want to purchase the license for, I believe, a one-year term. And so we'll move over to the platform in a second to show you that. But what you'll see on the screen when we begin to do the factor analysis is that it's going to ask where the data is contained. So in Excel, we highlight that region. We're going to leave it as the default principal components method for identifying those underlying factors. And then on the next screen, under the Options screen, we can specify whether we want a rotation to be conducted and, if so, how many factors we want to include. All right, so let's move over to the Excel document itself. And I've zoomed in so that we can see what's going on. And I'll do the same for us as we're looking at the raw data. Now, I've relabeled the header row. So rather than looking at Q1, Q2, Q3 and so forth, we can just look at a summary of what each question contains. So I've renamed it based on the questions themselves. So column B corresponds to the responses to the question of am I in good physical condition, and then do I wear fashionable clothing, are my clothes more stylish than most of my friends'? Do I like to take gambles, I'm not concerned with the ozone, the government's too involved, and so forth. All right, so this is our raw data: each row corresponds to a different respondent, each column corresponds to a different survey item. Now, after you install XLSTAT, you'll see a tab built for that. And what we're interested in is going to be analyzing data; if we click on that, you'll see the drop-down menu of what we're going to be using. In this course, we're going to be using factor analysis right now. We'll move on to looking at k-means and agglomerative clustering as methods of conducting market segmentation. We'll also look at using multidimensional scaling to construct perceptual maps. But you can also use XLSTAT for regression and ANOVA analysis.
That's built into Excel, but techniques such as logistic regression, which we've looked at in this specialization, are not built into the standard Excel data analysis package. So this add-on really does expand the capabilities of what you can do within the Microsoft Excel environment. And so to conduct our factor analysis, we're going to launch that tab. The first thing it's going to ask us for is, what's the data that we want to use? Now, we're not going to use the purchase intention right now; we're going to put that aside. We're just going to use the questions about consumer preferences and consumer beliefs. All right, so that's B1 through AE401. We're going to indicate the structure of our data: it's observations, all right, in a table, and we're going to leave principal components as is. Under Options, we're going to automatically determine the number of factors that are necessary. And I'm not going to turn on the rotation right now. We'll come back to look at what the rotated results look like in a little bit. And the reason for that is that I want to see how many factors we actually need. Then once I know how many factors we need, we can tell it how many factors to include in that rotation, right? Missing data: we don't have that problem here, but XLSTAT allows you to determine how you want to handle missing data. The output: you can decide what comes as part of your output. The important one that you want to make sure is checked off, no matter what package you're using, is the factor scores. And that's what's going to allow us to conduct the subsequent analysis of effectively replacing the raw survey responses with the summarized results of the factor analysis. And so you'll see a summary of what your selection is. And we'll click on Continue, and we're going to just click through the first couple of charts, which have to do with the display of information.
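As a quick check on the data range selected above, B1 through AE401, the arithmetic can be sketched in Python. The helper name `column_number` is hypothetical, and the sketch assumes that row 1 is the relabeled header row, so that every remaining row is one respondent.

```python
def column_number(letters):
    """Convert an Excel column label such as 'B' or 'AE' to its 1-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


# Range used in the walkthrough: B1 through AE401.
first_col, last_col = "B", "AE"
first_row, last_row = 1, 401

n_items = column_number(last_col) - column_number(first_col) + 1
# Assumption: row 1 holds the header labels, so it is not a respondent.
n_respondents = (last_row - first_row + 1) - 1

print(n_items, n_respondents)
```

Under that assumption, the selection holds 30 survey items for 400 respondents.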
If you're using XLSTAT, you'll get this popup window asking you to add it as a trusted source. And again, XLSTAT is one tool that's out there; there's a free package called Real Statistics that's a nice package. The limitation there with factor analysis is that it doesn't allow you to save those factor scores. For those of you who are teaching yourselves statistical languages such as R, factor analysis is built into R, and it's built into environments such as MATLAB, JMP, and SAS. So you can conduct this really using whatever software you're most comfortable with, right? And that's all there is to conducting the analysis, so let's just take a look at the output. We have a summary of the range of each of the survey items, along with the mean and standard deviation. Notice we get this lovely correlation matrix, and we can try to eyeball it. We can try to look for patterns here ourselves, but that's going to get difficult, especially since it doesn't all fit on one screen, right? We're going to move down in terms of looking at the output. We do see what the eigenvalues are, and notice that this analysis has been run out to 18 factors. And you'll see that the eigenvalues continue to decline; that's by design. The first factor is going to have the largest eigenvalue, the second factor will have the second largest, and so forth. And that's directly related to the variation that's going to be explained, and that continues to decline with smaller eigenvalues. So that's the variation being explained by each incremental factor. And then the row below that is going to give us the cumulative amount of variation that's explained. And what we're looking at here, notice that when we get up to 9 factors, we're capturing almost 72% of the variation in the original survey. So we've gone from about 30 questions down to about a third as many factors. And we still have more than 70% of the information contained in the survey.
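The eigenvalue rows just described, the variation explained by each factor and the cumulative variation, can be reproduced from a list of eigenvalues. Here is a minimal sketch in Python. The eigenvalues are made up for illustration and are not the ones on screen, and `variance_table` is a hypothetical helper, not an XLSTAT function.

```python
from itertools import accumulate


def variance_table(eigenvalues):
    """Return (factor, eigenvalue, % variation, cumulative %) for each factor."""
    total = sum(eigenvalues)
    shares = [100.0 * ev / total for ev in eigenvalues]
    cumulative = list(accumulate(shares))
    factors = range(1, len(eigenvalues) + 1)
    return list(zip(factors, eigenvalues, shares, cumulative))


# Illustrative eigenvalues only (hypothetical), sorted largest first.
eigenvalues = [3.0, 1.5, 1.25, 0.75, 0.5, 0.5, 0.25, 0.25]
table = variance_table(eigenvalues)

for factor, ev, share, cum in table:
    print(f"F{factor}: eigenvalue={ev:.2f}  "
          f"variability={share:5.1f}%  cumulative={cum:5.1f}%")
```

Each factor's share is its eigenvalue divided by the sum of all the eigenvalues, so the shares decline along with the eigenvalues and the cumulative column climbs toward 100%.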
We could keep on adding more and more factors to capture more and more information. But notice that we see very little gained in terms of the amount of information being explained as we add more factors. That's mirrored in the scree plot that we see. Notice that early on, the red line giving us the cumulative variation that we're capturing does a pretty good job, and then it plateaus. And so that plateau, or if we were to invert this, it would look like an elbow, that's what we're looking for as a means of deciding when we want to stop. So it looks like in this case, we're going to stop after the 9 factors. And that's what's been done automatically for us. All right, so we'll come back to this factor pattern in a second, because that's really going to help us in terms of our interpretation. All right, and yeah, as I mentioned, in terms of running your analysis, different packages may or may not give you this option. But with XLSTAT, we said get rid of the observations wherever there's missing data. And this is what we were, again, just taking a look at: we want to make sure we're doing a good job in terms of capturing the variation in the data.
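The missing-data choice mentioned above, getting rid of an observation wherever it has missing data, is often called listwise deletion. A minimal sketch in Python follows. It assumes missing answers are stored as `None`, and both the responses and the helper name `drop_incomplete` are hypothetical. It shows the general idea only, not how XLSTAT implements it.

```python
def drop_incomplete(rows):
    """Keep only the respondents who answered every survey item."""
    return [row for row in rows if all(value is not None for value in row)]


# Hypothetical responses: one row per respondent, one column per survey item.
responses = [
    [5, 3, 4],
    [2, None, 1],   # skipped the second item, so this row is dropped
    [4, 4, 5],
    [None, 2, 3],   # skipped the first item, so this row is dropped
]

complete = drop_incomplete(responses)
print(len(responses), "respondents before,", len(complete), "after")
```

The trade-off is that every dropped row shrinks the sample, which is not an issue for the survey in this walkthrough since it has no missing data.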