0:05

All right, so the first step that we're going to be looking at is

deciding how many factors we need to include in our analysis.

There are a couple of different criteria that can be used.

One criterion is to say, we want to retain at least

a given percentage of the original variation in the survey.

So we might say, okay, I want to retain at least 50% of the variation in the survey.

Another criterion that we could use is to say,

well, let's include as many factors as are necessary such that

each factor that we include is doing its fair share of explaining variation.

Well, mathematically, what this maps onto is saying that all of the eigenvalues

of the factors we retain have to be greater than 1.

Or saying that the amount of variation a given factor explains has to

be greater than 1 over j, where j is the number of survey items that we have.

So if I have 20 survey items, we're going to keep including factors

until a factor falls below the 5% threshold, or the 1 over 20 threshold.
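
To see why those two ways of stating the criterion line up, here is a short derivation. It assumes the analysis is run on standardized survey items (that is, on the correlation matrix), which is not stated explicitly above; the symbols \lambda_k (the eigenvalue of factor k) and J (the number of survey items, the "j" above) are notation introduced just for this sketch.

```latex
% Each standardized item has variance 1, so the total variation is J,
% and the eigenvalues of the correlation matrix sum to J.
\[
\sum_{m=1}^{J} \lambda_m = J
\quad\Longrightarrow\quad
\text{share of factor } k = \frac{\lambda_k}{\sum_{m=1}^{J} \lambda_m} = \frac{\lambda_k}{J}
\]
% A factor does its "fair share" when its share exceeds 1/J:
\[
\frac{\lambda_k}{J} > \frac{1}{J} \iff \lambda_k > 1
\]
% With J = 20 survey items the threshold is
\[
\frac{1}{J} = \frac{1}{20} = 0.05 = 5\%
\]
```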

1:21

Another way that we could do this is to look at what's

referred to as a scree plot.

What a scree plot does is essentially plot out the variation explained by each factor.

We can look at it in terms of the eigenvalues or

as the percentage of variation.

And we look for the point where there's a plateau, or

there's a kink in the curve and it kind of flattens out.

So that's more of a visual way of assessing this.

So these are different criteria that we can use.

It's ultimately the analyst's choice which one we're going to be using.

But one of the more common ones is to say this: 1 over the number

of survey items, or eigenvalue greater than 1, right?
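
As a rough sketch of the numbers that sit behind a scree plot, the Python below computes the eigenvalues of the item correlation matrix and draws a crude text version of the plot. The data are simulated purely for illustration (6 hypothetical items driven by 2 underlying factors), and the function names `scree_values` and `text_scree` are made up for this example; none of this comes from the survey used in the lecture.

```python
import numpy as np

def scree_values(data):
    """Return eigenvalues (largest first) of the item correlation matrix
    and the share of total variation each one explains."""
    corr = np.corrcoef(data, rowvar=False)        # items are columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    shares = eigenvalues / eigenvalues.sum()
    return eigenvalues, shares

def text_scree(eigenvalues, width=40):
    """Render a crude text scree plot, one bar per factor."""
    top = eigenvalues.max()
    lines = []
    for k, ev in enumerate(eigenvalues, start=1):
        bar = "#" * int(round(width * ev / top))
        lines.append(f"F{k:<2} {ev:5.2f} {bar}")
    return "\n".join(lines)

# Hypothetical data: 200 respondents, 6 items driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
data = latent @ loadings + 0.5 * rng.normal(size=(200, 6))

eigenvalues, shares = scree_values(data)
print(text_scree(eigenvalues))
```

With data built this way, the first two bars should come out long and the rest short, which is the kink we would be looking for.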

So the software package I'm going to show you is XLSTAT.

It's an add-on for Microsoft Excel that

adds multivariate statistics capabilities to the platform.

It does allow for a 30-day free trial.

I believe there are discounted rates available for

students who want to purchase the license for, I believe, a one-year term.

And so we'll move over to the platform in a second to show you that.

But what you'll see on the screen when we begin to do the factor analysis is that it's

going to ask where the data is contained.

So in Excel, we highlight that region.

We're going to leave it as the default principal components method for

identifying those underlying factors.

And then on the next screen,

we can specify under the Options screen whether we want a rotation to be conducted,

and if so, how many factors we want to include.

All right, so let's move over to the Excel document itself.

And I've zoomed in so that we can see what's going on.

And I'll do the same for us as we're looking at the raw data.

3:24

Now I've relabeled the header row.

So rather than looking at Q1, Q2, Q3 and so forth,

we just look at a summary of what that question contains.

So I've renamed it based on the questions themselves.

So column B corresponds to the responses to that question of am I in good

physical condition? Then, do I wear fashionable clothing?

Are my clothes more stylish than most of my friends'?

Do I like to take gambles? I'm not concerned with the ozone,

the government's too involved, and so forth.

All right, so this is our raw data: each row corresponds to a different respondent,

each column corresponds to a different survey item.

Now, after you install XLSTAT,

you'll see a tab built for that.

And what we're interested in is going to be analyzing data. If we click on that,

you'll see the drop-down menu of what we're going to be using

throughout this course. We're going to be using factor analysis right now.

We'll move on to looking at k-means and agglomerative clustering as methods of

conducting market segmentation.

We'll also look at using multidimensional scaling to construct perceptual maps.

But you can also use XLSTAT for regression and ANOVA analysis.

That's built into Excel, but techniques such as logistic regression,

which we've looked at in this specialization,

are techniques that are not built into the standard

Excel data analysis package.

So this add-on really does expand the capabilities of what you can do

within the Microsoft Excel environment.

Â 5:38

All right, so that's B1 through AE401.

We're going to indicate the structure of our data: it's observations

in a table, and we're going to leave principal components as is.

Under Options,

we're going to automatically determine the number of factors that are necessary.

And I'm not going to turn on the rotation right now.

We'll come back to look at what the rotated results look like in a little bit.

And the reason for

that is that I want to see how many factors we actually need.

Then once I know how many factors we need,

we can tell it how many factors to include in that rotation, right?

Missing data: we don't have that problem here, but

XLSTAT allows you to determine how you want to handle missing data.

For the output, you can decide what comes as part of your output.

The important one that you want to make sure is checked off, no matter

what package you're using, is the factor scores.

And that's what's going to allow us to conduct the subsequent analysis by

effectively replacing the raw survey responses

with the summarized results of the factor analysis.
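
To make the idea of factor scores a bit more concrete, here is a minimal Python sketch under the principal components approach: standardize the items, take the leading eigenvectors of the correlation matrix, and project each respondent onto them. The function name `component_scores` and the simulated data are assumptions for illustration only, and a package such as XLSTAT may scale, rotate, or sign its scores differently from what is shown here.

```python
import numpy as np

def component_scores(data, n_factors):
    """Replace each respondent's raw item responses with scores on the
    first n_factors principal components of the correlation matrix."""
    # Standardize each item (column) to mean 0 and standard deviation 1.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)      # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_factors]     # largest first
    return z @ eigenvectors[:, order]

# Hypothetical data: 200 respondents, 6 items driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
data = latent @ loadings + 0.5 * rng.normal(size=(200, 6))

scores = component_scores(data, n_factors=2)
print(scores.shape)  # one row per respondent, one column per retained factor
```

The point of the sketch is the shape of the result: six columns of raw responses per respondent have been replaced by two columns of scores, and those are what the later segmentation steps would work from.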

Â 6:48

And so you'll see a summary of what your selection is.

And we'll click on Continue, and we're going to just click through

the first couple of charts, which have to do with the display of information.

If you're using XLSTAT, you'll get this popup window

asking you to add it as a trusted source.

And again, XLSTAT is one tool that's out there.

There's a free package called Real Statistics that's a nice package.

The limitation there with factor analysis is that

it doesn't allow you to save those factor scores.

For those of you who are teaching yourselves statistical languages such as R,

factor analysis is built into R, and

it's built into environments such as MATLAB, JMP, and SAS.

So you can conduct this really using whatever

software you're most comfortable with, right?

And that's all there is to conducting the analysis, so

let's just take a look at the output.

We have a summary of the range of

each of the survey items, along with the mean and standard deviation.

Notice we get this lovely correlation matrix, and

then we can try to eyeball it.

We can try to look for patterns here ourselves, but that's going to get

difficult, especially since it doesn't all fit on one screen, right?
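
If you did want to look for patterns in a correlation matrix without eyeballing the whole thing, one option is to list just the strongest pairwise correlations. The Python below is only a sketch of that idea: the helper `strongest_pairs`, the item labels Q1 through Q6, and the simulated data are all hypothetical and are not taken from the lecture's survey.

```python
import numpy as np

def strongest_pairs(data, names, top=3):
    """Return the `top` item pairs with the largest absolute correlation
    as (item_a, item_b, correlation) tuples, strongest first."""
    corr = np.corrcoef(data, rowvar=False)           # items are columns
    rows, cols = np.triu_indices_from(corr, k=1)     # each pair once, no diagonal
    order = np.argsort(-np.abs(corr[rows, cols]))[:top]
    return [(names[rows[i]], names[cols[i]], float(corr[rows[i], cols[i]]))
            for i in order]

# Hypothetical data: 200 respondents, 6 items driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
data = latent @ loadings + 0.5 * rng.normal(size=(200, 6))
names = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]

pairs = strongest_pairs(data, names)
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

This only surfaces pairs of items, though; factor analysis is what groups many correlated items together at once.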

We're going to move down in terms of looking at the output.

We do see what the eigenvalues are, and

notice that this analysis has been run out to 18 factors.

And you'll see that the eigenvalues continue to decline; that's by design.

The first factor is going to have the largest eigenvalue,

the second factor will have the second largest, and so forth.

And that's directly related to the variation that's going to be explained,

and that continues to decline with smaller eigenvalues.

So that's the variation being explained by each incremental factor.

And then the row below that is going to give us

the cumulative amount of variation that's explained.

And what we're looking at here, notice that when we get up to 9 factors,

we're capturing almost 72% of the variation in the original survey.

So we've gone from about 30 questions down to about a third as many factors,

and we still have more than 70% of the information contained in the survey.

We could keep on adding more and more factors to capture more and

more information.

But notice that we see very little gained in terms of the amount of

information being explained as we add more factors.

That's mimicked in the scree plot that we see.

Notice that early on, the red line giving us the cumulative variation that we're

capturing does a pretty good job, and then it plateaus.

And so that plateau, or if we were to invert this, it would look like an elbow,

is what we're looking for as a means of deciding when we want to stop.

So it looks like in this case, we're going to stop after the 9 factors.

And that's what's been done automatically for us.
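
For anyone working outside of XLSTAT, the two numeric stopping rules discussed in this session can be sketched in a few lines of Python. The function names and the six eigenvalues below are hypothetical, chosen only so that they sum to the number of items as they would for a correlation matrix; they are not the eigenvalues from the lecture's output, and this is not a description of how XLSTAT makes its automatic choice.

```python
import numpy as np

def factors_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Count the factors whose eigenvalue exceeds the cutoff
    (the 'eigenvalue greater than 1' rule when cutoff is 1)."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > cutoff))

def factors_by_cumulative_share(eigenvalues, target):
    """Smallest number of factors whose cumulative share of variation
    reaches the target (for example 0.5 for 'at least 50%')."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(ev) / ev.sum()
    needed = int(np.searchsorted(cumulative, target)) + 1
    return min(needed, len(ev))   # guard against rounding at a target of 1.0

# Hypothetical eigenvalues for a 6-item survey (they sum to 6).
eigenvalues = [2.4, 1.5, 1.2, 0.4, 0.3, 0.2]

print(factors_by_eigenvalue(eigenvalues))             # eigenvalue > 1 rule
print(factors_by_cumulative_share(eigenvalues, 0.5))  # retain at least 50%
print(factors_by_cumulative_share(eigenvalues, 0.7))  # retain at least 70%
```

The rules need not agree with one another, which is why the choice of criterion ultimately sits with the analyst.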
