So what does factor analysis do for us? Well, we're going to start with our original set of survey items. Those are the original variables that we're measuring. And what we ultimately want to do is group together those survey items that are highly correlated with each other, the ones that tend to move together. Now, that movement may be in the same direction, or it may be in opposite directions. But the assumption that we're going to make is that when items tend to move together, there's some underlying construct behind them. There's some higher-order belief that consumers have, or some set of preferences that they have, that causes all of those survey items to move together. And if we can identify those underlying beliefs, those constructs, those are what we're going to put into our regression analysis, as well as any subsequent analyses that we might conduct.
Now, while we're doing that, we want to make sure that we retain as much information as possible. So in our survey, let's say we've got 30 survey items that we're looking at. We want to cut that down to a more manageable number and identify what's really driving those responses, and maybe it's ultimately five constructs that are driving those 30 responses. Well, five constructs is a lot fewer than the 30 survey items that we began with. And any time we engage in dimension reduction, we are going to be throwing away some information. Our goal is to retain as much information as possible when we're conducting our analysis.
And so here's a visual illustration of what we might be finding. Suppose that there are K underlying constructs, and those K constructs ultimately drive all of the responses that we're collecting on the survey, all of the survey items. Well, what we might see is that survey item 1 is related to constructs 1 and 2. Survey item 2 might be related to construct 1. Survey item 4 might be related to construct 2. And so what we're going to ask factor analysis to do for us is two things. First, reveal to us how many constructs are appropriate: what is the appropriate number K? Second, reveal which survey items are related to which constructs.
So one of the ways that factor analysis is commonly used when it comes to analyzing survey data, as I had mentioned, is to group similar items together. And by similar I mean items that tend to move together. Now, in addition to facilitating subsequent analysis by grouping similar items together and reducing the number of predictors that go into that analysis, this can also be used in the course of designing your survey. Your initial survey might have 100 or 150 individual items on it. And what we'd like to do is pare that down so that respondents find the survey a little more manageable. So maybe I can go from 150 survey items down to 50 survey items after the first pass. Well, factor analysis will help us identify which items tend to move together and, as such, which ones are potentially redundant. I can eliminate those redundancies, administer my survey in a second wave, and continue to refine it until I have a number of survey items that I'm comfortable with.
The other way that factor analysis gets used is to produce measures that are uncorrelated with each other. Multicollinearity is a big problem when it comes to regression analysis. The outputs that we get from factor analysis, as long as we stick with an orthogonal solution, are by design uncorrelated with each other. So if we first conduct factor analysis and then use the output from factor analysis as the inputs to our regression analysis, we're not going to have to worry about multicollinearity. And so that's one of the reasons that this is such a popular statistical technique.
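To make the multicollinearity point concrete, here is a minimal sketch on simulated data. Note that it uses principal-components extraction with NumPy as a simple stand-in for a full factor analysis, so it illustrates the idea rather than reproducing the software used in this course. The simulated items, the seed, and the function name `component_scores` are all assumptions made up for illustration.

```python
import numpy as np

def component_scores(X, k):
    """Return k unit-variance component scores for the rows of X.

    Assumption: principal-components extraction stands in for
    factor analysis in this sketch.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each item
    R = np.corrcoef(Z, rowvar=False)           # item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    scores = Z @ eigvecs[:, order]             # project items onto components
    return scores / scores.std(axis=0)         # rescale to unit variance

# Hypothetical survey: five items driven by two underlying constructs.
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.normal(size=(2, n))
noise = 0.5 * rng.normal(size=(n, 5))
X = np.column_stack([f1, f1, f2, f2, f2]) + noise

item_corr = np.corrcoef(X, rowvar=False)        # items within a block correlate
scores = component_scores(X, k=2)
score_corr = np.corrcoef(scores, rowvar=False)  # scores do not

print(np.round(item_corr, 2))
print(np.round(score_corr, 2))
```

The first matrix printed shows strongly correlated items within each block, which is what would cause trouble in a regression. The second shows that the two derived scores have an off-diagonal correlation of essentially zero, so they can sit side by side as predictors.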
Right, and so before we get into the mechanics, which I'll illustrate for you using a particular software package, what's the basic idea? Let's take our original survey data. In this particular automotive example, we have 30 different variables, or 30 different survey items. We want to see which of those items get grouped together to form these super variables. The first factor that we identify is going to capture as much information as possible. That's by design. The second factor that we construct is going to capture as much as possible of the information that remains after having taken F1 into account. And we're going to keep on adding factors, a third factor, a fourth factor, a fifth factor and so forth, until it's no longer worthwhile for us to add more.
So, mathematically, what might this look like? Well, our goal is to represent each survey item, each of our Xs, as a linear combination of the factors. That is, if we were to write this out going back to our original retail example, we're interested in modeling each of the Xs as a function of the underlying factors. Now notice that the factors F1 and F2 are common across all of the survey items. We're going to use the same factors F1 and F2 across all five survey items. What we need to estimate are the factor loadings, which in this case are effectively the coefficients.
Now, this looks an awful lot like linear regression. But one of the differences is that in linear regression we have a set of predictor variables that are known and an outcome variable that is known, and all that we're trying to estimate are the coefficients, the betas. In this case, we have a set of outcomes that we're interested in modeling, the Xs, but we don't know what our inputs are. We don't know the independent variables F1 and F2. We actually have to infer those through our analysis.
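Written out for the five survey items and two factors of the retail example, the model just described might look like the following. The symbols $\lambda_{ij}$ for the loading of item $i$ on factor $j$ and $\varepsilon_i$ for the part of item $i$ that the common factors don't explain are notation assumed here for illustration. The discussion itself only names the Xs and the factors F1 and F2.

```latex
\begin{aligned}
X_1 &= \lambda_{11} F_1 + \lambda_{12} F_2 + \varepsilon_1 \\
X_2 &= \lambda_{21} F_1 + \lambda_{22} F_2 + \varepsilon_2 \\
X_3 &= \lambda_{31} F_1 + \lambda_{32} F_2 + \varepsilon_3 \\
X_4 &= \lambda_{41} F_1 + \lambda_{42} F_2 + \varepsilon_4 \\
X_5 &= \lambda_{51} F_1 + \lambda_{52} F_2 + \varepsilon_5
\end{aligned}
```

Each row has the shape of a regression equation, with the same $F_1$ and $F_2$ appearing in every row. Only the Xs on the left-hand side are observed.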
We also don't know what the factor loadings, the coefficients, are. So we're trying to get both the factor loadings and the common set of predictors. But the idea behind it is in the same spirit as regression: take a smaller number of factors and use them to model the individual survey items. So if we were to return to that retail example, where we saw two blocks of items, remember that items one and two were correlated with each other, and items three, four, and five were correlated with each other. The way that factor analysis would handle that would be to say that items one and two load onto one construct, and items three, four, and five load onto another construct.
All right, so let's return to the automotive example where we had looked at our survey. Our objective in analyzing this survey is to take the original survey responses as an input, eliminate any redundancies, and produce a new set of predictors that can be used in our subsequent analysis. The first output that we're going to get is the factor loadings. Those tell us which of the original survey items tend to move together. When someone says that they're optimistic about their income being higher, well, what else do they tend to believe? That's what the factor loadings give us. The other piece that we're going to be interested in is producing a set of factor scores. Think of these as a new set of predictors. They summarize the information that was contained in the original survey, and we're going to be able to use this smaller set, these new X variables, in our subsequent analysis.
All right, so just to outline the steps that we're going to be conducting, and the software that we use is going to help us with most of these decisions. We're going to decide how many factors are necessary. We're going to conduct the analysis and derive that solution. An optional step that can aid with interpretation is rotating that solution, and we'll take a look at what exactly that means in a little bit. The part that the computer can't help us with is step 4, interpreting or naming the factors. This is where a person needs to be involved. We're going to look at how good a job we're doing at capturing the original survey data, and then we'll take a look at what our next steps would be. After I run this analysis, what can I do with the responses? And so really we're going to be thinking about factor analysis as an input for, and setting the stage for, subsequent analysis.
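The steps outlined above can be sketched end to end on simulated data. This is a rough illustration under several assumptions, not the procedure of any particular software package: it uses principal-components extraction with NumPy in place of a full factor analysis, it picks the number of factors with the eigenvalue-greater-than-one rule of thumb (a rule not discussed here), it skips rotation, and the data, the outcome `y`, and all variable names are hypothetical. The step numbers other than step 4 are inferred from the order in which the steps were described.

```python
import numpy as np

# Simulated stand-in for the five-item retail example (hypothetical data):
# items 1-2 share one construct, items 3-5 share another.
rng = np.random.default_rng(1)
n = 500
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack([f1, f1, f2, f2, f2]) + 0.5 * rng.normal(size=(n, 5))
y = 2.0 * f1 - 1.0 * f2 + rng.normal(size=n)   # hypothetical outcome

# Step 1: decide how many factors to keep.
# Assumption: keep components whose eigenvalue exceeds one.
Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized items
R = np.corrcoef(Z, rowvar=False)               # item correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)           # ascending order
order = np.argsort(eigvals)[::-1]              # re-sort, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = int(np.sum(eigvals > 1.0))

# Step 2: derive the solution (principal-components loadings).
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# Step 3 (rotation) is optional and skipped in this sketch.

# Step 4: an aid to interpretation, not a substitute for a person:
# which factor does each item load on most strongly?
assignment = np.argmax(np.abs(loadings), axis=1)

# Step 5: what share of the original variance do the k factors retain?
retained = eigvals[:k].sum() / eigvals.sum()

# Step 6: use the unit-variance scores as predictors in a regression.
scores = Z @ eigvecs[:, :k] / np.sqrt(eigvals[:k])
design = np.column_stack([np.ones(n), scores])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)

print("factors kept:", k)
print("loadings:\n", np.round(loadings, 2))
print("strongest factor per item:", assignment)
print("share of variance retained:", round(float(retained), 2))
print("regression coefficients:", np.round(coefs, 2))
```

The loadings table is the piece a person still has to read and name. The code can say that items 1 and 2 sit on one factor and items 3, 4, and 5 on the other, but it can't say what those factors mean.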