[MUSIC] Hello again. Now for the last group of analysis tools, we're going to go through inferential tools. These go beyond describing or characterizing the full population; now we're going to use the data to draw inferences, build models, and project trends into the future, or into additional interventions and factors in our data set and process.

If you remember back to a time you were in a science fair, you were probably introduced to the scientific method, so let's do a quick recap. Each one of those scientific method exercises starts with a question. From that question, you collect information and form a hypothesis. Your hypothesis doesn't have to turn out to be true; you just have to either support or disprove that idea, that concept, with data. Sometimes it is easier to disprove it, but the result is just as important either way. Then, after you've tested it with your experiment, you analyze the data to draw the conclusion. That analysis step is what we're going to focus on now.

Each one of those hypotheses is a yes-or-no question; there's no maybe. It is either the null hypothesis, which says there is nothing going on, there is no interaction, there isn't a factor having any impact or influence on the outcome of your process; or the alternative hypothesis, which says there is something going on, there is a connection, there is a relationship between that factor and your outcomes. If you think about the judicial system, the verdict is not guilty or guilty, with no middle ground, and the presumption is innocent until proven guilty. Hypothesis testing, and every tool we're going to show you, carries that same burden: nothing is going on until the data show there is an influence, there is an impact.

Many of these tests also report what's known as the p-value. That's the probability of seeing data like yours if the null hypothesis, the not-guilty, nothing-is-going-on assumption, were actually true. Once that p-value drops to 5%, 0.05, or below, the evidence is beyond a reasonable doubt: you have to give up the presumption of innocence, reject the null hypothesis, and accept that there is an influence, a factor acting on that outcome and process. There is always a chance of error, because you still have roughly that 5% chance of being wrong, but it is small enough that it is no longer reasonable to stay with the null hypothesis.

All right, this two-by-two table that we see right here goes back to our earlier module, where we discussed variables versus attribute data, and how your choices of tools going forward depend on the type of data you're dealing with. The two columns, variables and attributes, are the ways you would characterize the data sets you have on your process and your systems. Each of the rows makes the same distinction, the same decision, for the input factor, based on what that factor is and how it is subgrouped. So depending on what you have for your inputs and your outcomes, only a certain set of tools will fit those data items. For example, if I have a variables outcome and a variables input, think back to a scatter plot, where it's just a bunch of dots and it looks like the Milky Way.
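To make that p-value decision rule concrete before we name the individual tools, here is a minimal sketch in Python. The numbers are made up purely for illustration; the test used here, a Pearson correlation on paired input-and-outcome measurements (the scatter-plot situation just described), is one of the tools discussed next, and 0.05 is the conventional cutoff from the lecture.

```python
from scipy import stats

# Hypothetical paired measurements: an input factor (x) and an outcome (y).
x = [2, 4, 5, 7, 9, 11, 13, 15, 16, 18]
y = [3, 5, 4, 8, 10, 12, 11, 16, 15, 19]

# Null hypothesis: nothing is going on -- no relationship between x and y.
r, p_value = stats.pearsonr(x, y)

alpha = 0.05  # the "beyond a reasonable doubt" threshold
print(f"correlation r = {r:.2f}, p-value = {p_value:.4f}")

if p_value < alpha:
    print("Give up the null hypothesis: the data show a relationship.")
else:
    print("Stay with the null hypothesis: no convincing evidence of a relationship.")
```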
Now, correlation and regression are the perfect tools for analyzing that type of data. Each of these tools has its own concept of hypothesis testing; there is always a null hypothesis, an assumption. For correlation and regression, the null hypothesis, again, nothing is going on, nothing is happening, is that even though it may look like there is a picture there, like the belt of a huntsman up in the night sky, there is actually no relationship. Or, in the lower left-hand corner, the t-test and F-test: those are individual tools that ask whether there is a difference between the means of subgroups, or in the spread or variation of subgroups. Some of them report p-values, and others have test statistics that tell you what to conclude about the data. We're going to go through one of them right now.

You'll notice that control charts show up in both of the top two boxes: whether you have variable or attribute data, there is a set of control charts that look for trends and behaviors in that type of data. What a control chart does is plot each data item over time. Going left to right, the points have to be time ordered; it's not like a Pareto chart, where you can sort from tallest to smallest. It's very important that the points are not only time ordered but also spaced properly, uniformly. If you look closely at this chart, after all the points are plotted, typically by software, though you might do it by hand, the center line is the average of all of those data points: the central behavior of that data set. Then you have upper and lower bounds that ask how far out a new data point would have to be, compared to all the others, for it to be too unique, so unique that it violates the null hypothesis that nothing is going on. For a control chart, the null hypothesis is that the process is stable and will not go beyond plus or minus three standard deviations. Plus or minus three standard deviations covers more than 99% of the expected behavior within those lines and boundaries. So anything that goes beyond the top or bottom limit doesn't mean it was particularly good or bad; all it really says is, this is unique, this is different, this is something special, and it violates my concept of the null hypothesis that nothing particularly different is going on.

Now, these control charts, as I said, can be built for variables or attribute data. Each one has its own set of calculations, which we could cover in a totally different module, but just know that both variables and attribute data can be analyzed with control charts going forward. The one on the left, the individuals chart, is probably the most common one we've seen in healthcare, and the p chart in the lower right, for attribute data, is the other one I'd direct you to first. There are other options we could go into later.

Whenever you see something unique on one of those control charts, that is the concept of a special cause rather than a common cause. A special cause means something different and very unique happened for that particular event. Common cause variation may be something that happens routinely, or a bias that lives in the system itself.
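Before we dig further into special causes, here is a minimal sketch of how those center-line and control-limit calculations might look for an individuals chart. The time-ordered series is hypothetical, and estimating sigma from the average moving range is a common convention for this chart type rather than the course's specific formula.

```python
import numpy as np

# Hypothetical time-ordered measurements (e.g., daily turnaround times).
data = np.array([32, 35, 31, 36, 34, 33, 30, 37, 35, 52, 33, 34, 36, 32, 35])

center_line = data.mean()

# Estimate sigma from the average moving range (absolute difference between
# consecutive points), a common convention for individuals charts.
moving_range = np.abs(np.diff(data))
sigma_hat = moving_range.mean() / 1.128  # d2 constant for subgroups of size 2

upper_control_limit = center_line + 3 * sigma_hat
lower_control_limit = center_line - 3 * sigma_hat

print(f"center line = {center_line:.1f}")
print(f"control limits = ({lower_control_limit:.1f}, {upper_control_limit:.1f})")

# Any point outside the limits is "unique" -- a potential special cause.
for i, value in enumerate(data):
    if value > upper_control_limit or value < lower_control_limit:
        print(f"point {i} ({value}) is outside the limits -- investigate")
```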
So being able to distinguish the two matters. Behavior that is just the normal variation of the system, even if it looks unusual compared with other data sets, is common cause. A special cause is an event that is unique and different even within its own system, like fetal heart rate distress, something unusual that is happening right now and should be investigated immediately. The definition of a stable process, one that is fulfilling the null hypothesis that nothing unique is going on, is a process with no special causes: there is nothing unique going on. The three most common ways to judge that there are no special causes are that there has been no shift of the mean over time, there is no ascending or descending trend, and there are no outliers going outside those two red lines, the control limits. There can be many more tests; eight are the most common. The three we just described for special causes, again, are the outlier, a point exceeding the upper or lower control limit; the trend, where you have multiple points all moving in the same direction, which is statistically very unusual, rather rare; and the shift, where, once I have identified and locked down where that center line, that average, is, I suddenly get a run of data points that all sit on one side of the center line. Any of these rarities is a signal that I should investigate a special cause.

So again, each one of these tests has its own hypothesis behind it. We've just gone through the control chart and briefly discussed correlation. But the ones in the lower half of the table, the t-test, F-test, chi-square, and ANOVA, each have their own p-value. Think about the null hypothesis for each. For a t-test, which looks at averages, the null hypothesis is that nothing is going on, there is no difference between the subgroups. For the F-test, which looks at the variance between subgroups, the null is that nothing is going on, nothing is unique, nothing is different between those subgroups. Chi-square compares the proportions between groups, and again the null hypothesis, and the inference you would draw from that analysis, starts from nothing is going on. Whenever the p-value on any of those tests comes out very, very low, that indicates there is a special cause, there is something going on. Then, going forward, you can show your customers not only the graphical analysis but also that p-value, the statistical calculation that confirms what your eyes are telling you: that there is something unique going on.
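To show how those p-values might be computed, here is a minimal sketch using SciPy, with made-up subgroup data purely for illustration. Bartlett's test is used here as one common way to compare spreads, in the spirit of the F-test, and the same read applies to every result: a very small p-value means you give up the null hypothesis that nothing is going on.

```python
from scipy import stats

# Hypothetical outcome measurements from two subgroups (e.g., two units).
unit_a = [4.1, 3.8, 4.4, 4.0, 4.2, 3.9]
unit_b = [4.8, 4.6, 5.0, 4.7, 4.9, 4.5]

# t-test: is there a difference between the subgroup averages?
t_stat, p_means = stats.ttest_ind(unit_a, unit_b)

# Bartlett's test: is the spread (variance) different between the subgroups?
stat_spread, p_spread = stats.bartlett(unit_a, unit_b)

# Chi-square: are the proportions different between groups?
counts = [[30, 70],   # group 1: events vs. non-events
          [45, 55]]   # group 2: events vs. non-events
chi2, p_counts, dof, expected = stats.chi2_contingency(counts)

# ANOVA: are the averages different across more than two subgroups?
unit_c = [4.3, 4.1, 4.5, 4.2, 4.4, 4.0]
f_stat, p_anova = stats.f_oneway(unit_a, unit_b, unit_c)

for name, p in [("t-test", p_means), ("variance test", p_spread),
                ("chi-square", p_counts), ("ANOVA", p_anova)]:
    verdict = "something is going on" if p < 0.05 else "nothing detected"
    print(f"{name}: p = {p:.4f} -> {verdict}")
```

And going back to the three special-cause checks on the control chart, here is a minimal sketch of how they might be coded against the center line and limits calculated earlier. The run lengths of eight points for a shift and six for a trend are common rule-of-thumb values; published rule sets vary on the exact counts.

```python
import numpy as np

def special_cause_signals(data, center, ucl, lcl, shift_run=8, trend_run=6):
    """Flag the three most common special-cause patterns.

    shift_run and trend_run are rule-of-thumb lengths; published rule sets
    (Western Electric, Nelson) use slightly different counts.
    """
    data = np.asarray(data, dtype=float)
    signals = []

    # 1. Outlier: any single point beyond the control limits.
    for i, value in enumerate(data):
        if value > ucl or value < lcl:
            signals.append(f"outlier at point {i}")

    # 2. Shift: a run of points all on one side of the center line.
    for i in range(len(data) - shift_run + 1):
        window = data[i:i + shift_run]
        if np.all(window > center) or np.all(window < center):
            signals.append(f"shift starting at point {i}")

    # 3. Trend: a run of points steadily increasing or decreasing.
    for i in range(len(data) - trend_run + 1):
        diffs = np.diff(data[i:i + trend_run])
        if np.all(diffs > 0) or np.all(diffs < 0):
            signals.append(f"trend starting at point {i}")

    return signals
```

In practice, you would pass in the same data, center line, and control limits from the chart sketch above and investigate each signal the function returns.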