In this video, you will learn to perform a logistic regression and to interpret its output. When you have a categorical Y variable and a numerical X variable, the suitable method is logistic regression. Logistic regression consists of three steps. In the first step, you may need to convert your data to the correct format. In the second step, you make a fitted line plot and evaluate the quality of the fit. In the final step, you analyze the p-value and see whether you have found a significant relationship.

Let's take a look at an example, and use this example to execute the steps. Suppose you work at a call center. When a customer calls, they get an automated message that tells them the estimated waiting time before a call center agent will be available. And you wonder: will this influence whether a customer hangs up or not? So we have measured two variables: first, whether or not the customer has hung up, and second, the announced hold time, measured in minutes. The variable that records whether a customer has hung up or not is your CTQ, and this is a categorical, binary variable. The hold time is your numerical influence factor on the CTQ.

This is what your data looks like. We measure hold time and hung up. Hold time is the number of minutes that the customer has to wait. For hung up, a one means that the customer has hung up, and a zero means that the customer did not hang up.

The first step in logistic regression is to enter your data in an event/trial format. An event/trial format looks like this. In the first column, you put all the values of your X variable, which is the hold time measured in minutes. In the third column, you note the number of trials, which is the number of calls that got the expected hold time shown in the first column. The second column is the events column, which notes how many times, out of the total number of trials, an event happened.
In this case, an event is that somebody has hung up. So it is the number of people who, with this particular waiting time, have hung up. So this is the event/trial format, but our original data looks different. To go from the original data to the event/trial format, you can use a cross tabulation in Minitab, and let me show you how to do that.

So I copy-pasted my data into Minitab. I have HoldTime here, and in the second column whether or not the person who received that hold time hung up. To get to the cross tabulation, go to Stat and then Tables, and you'll find Cross Tabulation here. Now, what do we want in the rows? In the rows we want the hold time values, and in the columns the number of people that hung up for that hold time. Minitab now pastes the cross tabulation into your session window. If we have a look, our data shows us this: for a hold time of 1 minute, 18 people decided to hold and 1 person hung up, so 19 people heard that they had a 1-minute hold time.

Now we want to have this table in our worksheet, so you select everything up to the 20 and copy it. Then you go to your worksheet again, go to the first data row, and paste the data. Minitab asks whether you want to paste into one column or multiple columns, and we want multiple columns. So the first column here is the hold time that somebody was given. Then, this is the number of people that did not hang up, this is the number of people that hung up, and this is the total number of people, so that is the total number of calls.

We have now completed step one, so let's move on to the second step. You can use the data that we have already put into the event/trial format for you. You can find it in the data files. Now, pause the video and load this data into Minitab before you continue. This is what your data should look like, with HoldTime in the first column, HungUp in the second column, and Calls in the third column.
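The same reshaping from raw records to the event/trial format can also be sketched outside Minitab. The snippet below is a minimal illustration in plain Python, not the Minitab procedure itself; the records in `raw` are made up for the example, and the names `raw` and `to_event_trial` are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records: (hold_time_minutes, hung_up), where 1 = hung up.
# Illustrative values only, not the course data set.
raw = [(1, 0), (1, 0), (1, 1), (5, 0), (5, 1), (5, 1), (10, 1), (10, 1)]


def to_event_trial(records):
    """Return rows of (hold_time, events, trials), sorted by hold time."""
    counts = defaultdict(lambda: [0, 0])  # hold_time -> [events, trials]
    for hold_time, hung_up in records:
        counts[hold_time][0] += hung_up  # an event is a customer hanging up
        counts[hold_time][1] += 1        # every call counts as one trial
    return [(t, e, n) for t, (e, n) in sorted(counts.items())]


table = to_event_trial(raw)
for hold_time, events, trials in table:
    print(hold_time, events, trials)
```

Each printed row corresponds to one worksheet row: the hold time, the number of customers who hung up, and the total number of calls with that hold time.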
To perform a logistic regression, we go to the Stat menu, and under Regression you will find the Binary Fitted Line Plot, which is the logistic regression analysis. First, you have to choose your data format, and we have an event/trial format. Next, you can give your event a name of your own, so let's call it HungUp. What is the number of events? That is the column HungUp. The number of trials is the column Calls, and the predictor is HoldTime. That's it, so click on OK. Minitab then gives you the output: a binary fitted line plot and a lot of output in your session window, so let's study this.

For step two, we have to take a look at the fitted line plot, which looks like this. We see that the chance of hanging up increases when the hold time increases. When the expected hold time is around five minutes, the probability of hanging up is 0.2, which means that 20% of the people who hear that they have to wait up to five minutes will hang up. When the waiting time increases to around 15 minutes, the probability increases to approximately 0.75, which means that 75% of the people who call and hear a waiting time of up to 15 minutes will hang up. We also see that Minitab did not fit a straight line but a slightly bent line, which is a sort of S-curve. This is because the probability of hanging up can never have a value below zero or above one, as it is a probability.

Now we have to check whether the results are significant. This is step three. To do this, we have to take a look at Minitab's session window. The output in the session window gives the summary of the analysis. At the top you see how often a hang-up did occur and how often it did not. The p-value indicates that the hold time is a significant influence factor on whether a customer hangs up or not. The formula associated with the S-shaped curve is not as simple as the linear regression formula. It contains exponents, which give it its S shape.
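As a rough illustration of such a formula, the sketch below uses the standard logistic curve p = 1 / (1 + exp(-(b0 + b1 * x))). The intercept and slope are not Minitab's fitted coefficients; they are back-calculated, as an assumption, from the two approximate readings of the plot mentioned above (0.2 at 5 minutes and 0.75 at 15 minutes). The function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Two approximate readings from the fitted line plot described in the video.
x1, p1 = 5.0, 0.20
x2, p2 = 15.0, 0.75

# On the log-odds scale the curve is a straight line, so two points
# are enough to pin down an illustrative slope and intercept.
slope = (logit(p2) - logit(p1)) / (x2 - x1)
intercept = logit(p1) - slope * x1


def hang_up_probability(hold_time):
    """Illustrative S-curve: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * hold_time)))


for minutes in (0, 5, 10, 15, 20):
    print(minutes, round(hang_up_probability(minutes), 3))
```

Because the exponential term is always positive, the result can never drop below zero or rise above one, which is what bends the line into an S shape.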
The exponent is the "a to the power x" button on your calculator.

Summarizing, logistic regression consists of three steps. In the first step, you organize your data in the event/trial format. In the second step, you make a fitted line plot. And in the third step, you check whether the results are significant and can be generalized to the population.
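For readers who want to see steps two and three without Minitab, here is a minimal sketch in plain Python. It fits the logistic curve to event/trial data with Newton-Raphson and computes a Wald p-value for the slope. Everything here is an assumption made for illustration: the rows are hypothetical numbers rather than the course data set, the function names are made up, and the Wald test may differ from the test Minitab reports.

```python
import math

# Hypothetical event/trial rows: (hold_time_minutes, hung_up, calls).
# Illustrative numbers only, not the course data set.
rows = [(1, 1, 19), (5, 4, 20), (10, 9, 20), (15, 15, 20), (20, 18, 20)]


def fit_logistic(data, iterations=25):
    """Fit p = 1 / (1 + exp(-(b0 + b1 * x))) to grouped data.

    Returns (b0, b1, standard error of b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, events, trials in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = trials * p * (1.0 - p)   # weight of this row
            r = events - trials * p      # observed minus expected events
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 from the inverse information matrix of the last pass.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


b0, b1, se_b1 = fit_logistic(rows)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value

print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("significant at 0.05:", p_value < 0.05)
```

A positive slope corresponds to the rising S-curve in the fitted line plot, and a p-value below the chosen significance level corresponds to concluding that hold time is a significant influence factor.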