0:15

This session is about data and the normal distribution.

We're going to be introduced to some concepts of the normal distribution and

see how we can apply it in different phases of a Six Sigma project.

But before we get there,

let's see what the measure phase of a Six Sigma project is all about.

So what happens in the measure phase?

The first thing is that you identify variables.

You identify the critical-to-quality characteristics and

you think about how you are going to measure these.

Then you assess the measurement systems.

The idea there is to make sure that your measurement systems are valid and

that they are reliable.

They're valid in the sense that they are measuring what they are supposed to be

measuring.

They're reliable in the sense that when you use them over and

over again they give you consistent results.

They should also be sensitive to changes, which is what a measurement should be,

and accessible, in the sense that they can be understood by the people who are going to be

seeing those measurements on a day-to-day basis, so

that they know what's going on in the process.

So we go from critical-to-quality characteristics to

measurement systems in the measure phase.

In the measure phase, we also go on to establish the current performance on

the critical-to-quality characteristics.

So once we've figured out what those critical-to-quality characteristics

are and how they will be measured, we need to establish current performance.

Now to establish current performance,

we use something called statistical process control.

These are control charts that you can have for different types of data: for

discrete data, for continuous data.

And there are many different types of control charts that you can use to

establish the inherent capability of a process.
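
As one illustration of a control chart for discrete data, here is a minimal Python sketch of the limits for a p-chart, which tracks the proportion of defective items per sample. The lecture does not single out this chart; the three-sigma formula, the function name `p_chart_limits`, and the late-order counts are assumptions chosen only for illustration.

```python
from math import sqrt


def p_chart_limits(defective_counts, sample_size):
    """Return (lower limit, center line, upper limit) for a p-chart.

    Assumes every sample has the same size and uses three-sigma limits.
    """
    total_inspected = len(defective_counts) * sample_size
    p_bar = sum(defective_counts) / total_inspected  # overall proportion defective
    margin = 3 * sqrt(p_bar * (1 - p_bar) / sample_size)
    # A proportion cannot fall below 0 or rise above 1, so clip the limits.
    return max(0.0, p_bar - margin), p_bar, min(1.0, p_bar + margin)


# Hypothetical data: late orders found in 8 daily samples of 100 orders each.
late_orders = [4, 6, 5, 3, 7, 5, 4, 6]
lcl, center, ucl = p_chart_limits(late_orders, sample_size=100)
print(f"LCL = {lcl:.4f}, center = {center:.4f}, UCL = {ucl:.4f}")
```

A daily proportion outside these limits would be a signal to investigate; points inside them reflect the inherent variation of the process.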

Next, within the measure phase of the Six Sigma project, you can also

establish the targets for improvement and what those targets should be.

There you would be looking at things like the sigma level of the process,

so you establish the sigma level of the process.

But before that you do a process capability analysis.

A process capability analysis is to see how well the process is performing in

relation to customer expectations,

that is, in relation to the voice of the customer.

You are comparing the voice of the customer with the voice of the process,

the VOC with the VOP in that sense.
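
One common way to put a number on the comparison of the voice of the customer with the voice of the process is the pair of capability indices Cp and Cpk. The sketch below is an illustration under assumptions, not something specified in this session: the specification limits stand in for the VOC, the sample mean and standard deviation stand in for the VOP, and the service times are made up.

```python
from statistics import mean, stdev


def process_capability(measurements, lsl, usl):
    """Return (Cp, Cpk) for measurements against lower and upper spec limits."""
    mu = mean(measurements)
    sigma = stdev(measurements)  # sample standard deviation
    cp = (usl - lsl) / (6 * sigma)               # spec width vs. process spread
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # also penalizes an off-center mean
    return cp, cpk


# Hypothetical service times in minutes, with assumed spec limits of 4 and 6 minutes.
service_times = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
cp, cpk = process_capability(service_times, lsl=4.0, usl=6.0)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Because the made-up data are centered between the limits, Cp and Cpk come out essentially equal; an off-center process would show a Cpk lower than its Cp.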

So those are the things that happen in the measure phase.

Now let's take a look at the different types of data that can be used in

the measure phase.

And then we'll get to distributions of data next.

So what are the different types of data that we can use in a Six Sigma project and

that we need to start thinking about in the measure phase?

The first is simply verbal data.

This could be open-ended comments from people.

If you're doing a customer survey, they're telling you something about the product or

the service.

If you're doing an employee survey, they're telling you something about

the experience that they have with their supervisor or working in that company.

The example that you see here is a statement that says,

"My supervisor respects my opinions."

So these are open-ended comments that you would have coming out of any

kind of interview or

survey that you do of the audience that you're interested in getting data from.

Next we get into data

in the sense of numeric data.

First we have discrete variables.

The way you can think about discrete variables is that they are variables where decimal

points do not make sense.

Not that they don't matter; they don't make sense.

Think about anything that has two values.

Say something is available or not available.

We think of it as a zero-one situation.

Something is on time or not on time.

It's a zero-one situation.

There's no 0.5, there's no 0.75.

So that's the first type of discrete variable.

The data that we're talking about there is attribute data of

a binary characteristic.

It is binary in the sense that there are only two possible values for it.

And if you think about what the underlying distribution is for that kind of data,

you may be familiar with this already: it's a binomial distribution.

Binary data, binomial distribution: two kinds of options, yes or

no, good or not good.

Those are the kinds of data we're talking about there.
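
To make the link between binary data and the binomial distribution concrete, here is a small Python sketch. It assumes each delivery is independently on time with the same probability; the values `n = 10` and `p = 0.9` and the function name `binomial_pmf` are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent yes/no trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical process: 10 deliveries, each on time with probability 0.9.
n, p = 10, 0.9
for k in range(n + 1):
    print(f"P({k} of {n} on time) = {binomial_pmf(k, n, p):.4f}")
```

Each individual delivery is a zero-one outcome; the binomial distribution describes how many of the ten come out as a one.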

Next, within categorical data, within attribute data, we have the nominal ones.

Here we don't really have numbers for the different categories,

but we are considering them as distinct categories.

So for example, here we have: how do employees commute to work?

They either walk, come by bike, take the train, or

drive their own car.

Those are four different ways of commuting to work for the employees.

Now, you can give these numbers such as 1, 2, 3, and 4.

You can call them 1, 2, 3, and 4.

But they don't really have any natural ordering.

We can't say that one is higher than the other.

So you can label them in some way, but

the labels are not going to mean anything in terms of a natural ordering.

The next category in our types of data is ordinal data.

5:37

Ordinal data has meaning in the sense that one value is higher than another.

Think of any kind of customer satisfaction survey that you may be

familiar with.

Those are the things that we get in the mail, or when you go to a restaurant,

they put it on the table saying, could you fill this out for us?

You may also be getting these as employee satisfaction surveys.

Now these surveys have scales that go from extremely dissatisfied

to extremely satisfied,

or from extremely happy to extremely unhappy, whichever way it's ordered.

The point there is that there's going to be some meaning to that ordering.

Either one is going to mean very good and five is going to mean very bad,

or one is going to mean very bad and five is going to mean very good.

So there's going to be some kind of ordering,

some kind of natural ordering to these categories.

But remember, we're still talking about discrete categories.

And think about these three types of data: the binary, the nominal

without natural ordering, and the ordinal with natural ordering.

The concept here is that you are taking information that is subjective and

you're converting it into objective data,

using either a binary scale, a nominal scale, or an ordinal scale,

so that you can express it in terms of numbers.
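
Here is a short Python sketch of why the scale type matters once the information is expressed as labels or numbers. The responses are made up. With nominal data only the most frequent category (the mode) is a meaningful summary; with ordinal data the ordering also makes the median meaningful. The 1-to-5 coding, with 5 as extremely satisfied, is an assumption.

```python
from statistics import median, mode

# Nominal: commute categories have no natural ordering, so we only count them.
commute = ["train", "car", "walk", "train", "bike", "car", "train"]
most_common_commute = mode(commute)

# Ordinal: assumed coding, 1 = extremely dissatisfied ... 5 = extremely satisfied.
satisfaction = [4, 2, 5, 3, 4, 1, 4]
median_satisfaction = median(satisfaction)

print("Most common commute:", most_common_commute)
print("Median satisfaction:", median_satisfaction)
```

Averaging numeric codes assigned to the commute categories would give a number, but it would change with the arbitrary choice of codes, which is why it is not used here.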

Â 7:02

Within discrete variables, we also have something called count data.

And what is count data? It's as the name suggests.

It's counting, for example, the number of defects in a product.

If I'm looking at this clicker that I'm holding

and I'm asking how many defects there are in this clicker,

I can count the number of defects.

If I'm looking at defects in an application form that I get,

I'm counting the number of defects.

And again, it's going to be discrete.

I cannot find 2.5 defects.

It's going to be either 2 defects or 3 defects.

That's why it's still a discrete distribution, but here I'm looking at a

different type of data within discrete data, and it is count data.
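
A minimal Python sketch of count data, using made-up defect counts for ten application forms. Every individual observation is a whole number, even though a summary such as the average number of defects per form can be fractional. The variable names are hypothetical.

```python
from collections import Counter

# Hypothetical defect counts, one per application form inspected.
defects_per_form = [0, 2, 1, 0, 3, 1, 0, 0, 2, 1]

# How many forms had 0 defects, 1 defect, 2 defects, and so on.
frequency = Counter(defects_per_form)
for defects in sorted(frequency):
    print(f"{defects} defects: {frequency[defects]} forms")

total_defects = sum(defects_per_form)
defects_per_unit = total_defects / len(defects_per_form)
print("Total defects:", total_defects)
print("Average defects per form:", defects_per_unit)
```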

Now what are the implications of these different types of data?

The underlying frequencies of the data will be different.

The underlying statistical distributions that you can use for

these types of data are going to be different.

And that is going to have implications for how you're going to do

the analysis.

The other implication of these types of data is that some will give you more

information than others.

Some will, in that sense, be more valuable in terms of the data

collected than others.

And some will also be harder to collect than others.

So there might be some trade-offs that you're thinking about

as to which type of data you should collect.

You might be trading off like this: one type is simple to collect, because we're simply

asking a yes/no question, if you're talking about the binary type of data.

But we're not getting much more information than simply that somebody

was happy or unhappy about something.

We can get more in-depth information if we move to

more of an ordinal kind of scale, with a survey, a battery of questions,

many questions that are scaled from one to five, or one to seven.

Typically we have odd numbers in those scales.

There you are capturing a little more information.

It's going to take more effort, it's going to cost you more, but

you're going to get more information.

You can do something with that information.

So when you are thinking about types of data, you

should be thinking about the costs and benefits of the different types of data.

Now let's take a look at the other kind of data.

The opposite of discrete is continuous data.

Continuous data is any kind of measurement data.

There we're basically saying that it can theoretically take an infinite

number of values.

For example, if you're talking about temperature,

depending on the level of granularity that you want to go into,

you can go to many, many decimal places when you're talking about it

in terms of Fahrenheit or Celsius.

And when you're talking about the weight of something,

depending on the level of granularity that you want to go into,

you can be talking about 2.5 pounds, 2.68 pounds, 2.697 pounds.

And then you can think about it in terms of ounces if you want it to

be more specific.

That's the idea of continuous data, of measurement data.

So that's the kind of data that we normally think about when we're thinking

about numerical data.

It's very useful in that it's a very specific measurement of something,

but nevertheless it's a measurement of only one kind of characteristic.

So, if I know that a critical-to-quality characteristic of

a service in a restaurant is time, I can measure time.

But it's only going to give me information about time.

If I know that a critical-to-quality characteristic in a restaurant is the

temperature of the food,

then I can think about measuring the temperature of the food.

That is going to be very specific, but

it's going to be only about the temperature of the food.

So measurement data gives you much more information, but

it's about a specific aspect of a product.

Now, within measurement data you can collect data that is cross-sectional,

or that is more of a time series.

What we mean here is simply that we could be looking at things

as they are at a point in time, or we can be looking at them over time and

asking whether there is a trend, which is time series data.

When we look at time series data,

there are some implications for what kind of analysis we can do.

There may be specific things that we have to account for

when we're analyzing time series data,

for instance when we're taking measurements from the same process over a period of time.

Or if we're looking at sales over time, over different months or

over different weeks, there are ways, in fact,

of adjusting for the autocorrelation, the obvious relationship

that is going to be there when you have many weeks of sales data or

many weeks of any kind of process data.

There's going to be some relationship between the previous week and

the next week, so you need to account for that.

And that's why you need to think about time series data a little bit

differently than when you're looking at cross-sectional data.
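
The relationship between one week and the next can be measured as a lag-1 autocorrelation. The Python sketch below is one simple way to compute it, shown as an illustration rather than as the method used in this course; the weekly sales figures and the function name `lag1_autocorrelation` are made up.

```python
def lag1_autocorrelation(series):
    """Correlation between each observation and the one that follows it."""
    n = len(series)
    mean_value = sum(series) / n
    # Sum of products of deviations for consecutive pairs (week t, week t+1).
    numerator = sum(
        (series[t] - mean_value) * (series[t + 1] - mean_value)
        for t in range(n - 1)
    )
    # Total squared deviation from the mean.
    denominator = sum((x - mean_value) ** 2 for x in series)
    return numerator / denominator


# Hypothetical weekly sales with an upward trend.
weekly_sales = [100, 104, 109, 113, 118, 121, 127, 130]
r1 = lag1_autocorrelation(weekly_sales)
print(f"Lag-1 autocorrelation: {r1:.2f}")
```

A value well above zero says that a high week tends to be followed by another high week, so the observations are not independent the way cross-sectional measurements are usually assumed to be.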

Now, let's take this categorization and

apply it to some different types of data that we have over here.

Here you have different things that are being measured.

What I'd like you to do is apply the categorization

that we just saw: is it discrete or is it continuous, and

if it is discrete, which of the types that we saw is it:

the ordinal, the nominal,

the binary, or the count data.

So you have paint viscosity, service at a drive-through,

on-time arrival or not, number of customer calls abandoned,

humidity in a paint shop, and source country for outsourced parts.

So apply those categorizations and we'll come back and

see if you were able to apply them correctly.

Â 13:08

So, we're back to the data types that we saw before the question.

Paint viscosity is something that would be a continuous measurement,

so it would be measurement data.

It's something that you might measure in units that can have decimal points.

So it's continuous measurement data.

Service at a drive-through, going from very unsatisfactory to very satisfactory,

is categorical data, but it's ordinal.

There is meaning to one rating being better than another.

So there is an implied hierarchy in those numbers.

On-time arrival or not, whether something was on time or not, is obviously binary.

There are only two options there.

Number of customer calls abandoned should give you a hint just from the term,

from the fact that it's a number of calls.

It's count data.

You're counting the number of calls that were abandoned.

Humidity in a paint shop:

again, it's going to be like the viscosity that you saw earlier.

It's going to be measurement data.

Source country for outsourced parts is going to be categorical,

and specifically it's going to be nominal.

You're going to code the different countries, and

you're going to say that if it's a one, it indicates that it's from the US.

If it's a two, it indicates it's from Canada.

If it's a three, it indicates it's from Mexico.

If it's a four, it indicates that it's from China.

And there's going to be no implied hierarchy in the numbers that

you're using; in fact, you could use any numbers for any of those countries.

That's what we mean by it being categorical but nominal data.

So, here you've seen the application of the different data types.
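
The answers to the exercise can be written down as a small lookup table. This Python sketch only restates the classification given above; the `DataType` enum, the `is_discrete` helper, and the dictionary layout are hypothetical names chosen for illustration.

```python
from enum import Enum


class DataType(Enum):
    BINARY = "binary"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    COUNT = "count"
    CONTINUOUS = "continuous"


def is_discrete(data_type):
    """Everything except continuous measurement data is discrete."""
    return data_type is not DataType.CONTINUOUS


# The six measurements from the exercise and the types given in the answers.
classification = {
    "Paint viscosity": DataType.CONTINUOUS,
    "Service at a drive-through": DataType.ORDINAL,
    "On-time arrival or not": DataType.BINARY,
    "Number of customer calls abandoned": DataType.COUNT,
    "Humidity in a paint shop": DataType.CONTINUOUS,
    "Source country for outsourced parts": DataType.NOMINAL,
}

for measurement, data_type in classification.items():
    kind = "discrete" if is_discrete(data_type) else "continuous"
    print(f"{measurement}: {data_type.value} ({kind})")
```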
