All right, hi everyone. We're going to talk about an introduction to logistic regression.

Okay, so first a bird's-eye view of machine learning problems. In machine learning we have supervised learning, which uses labels; unsupervised learning, which doesn't have labels; and reinforcement learning, which learns from feedback signals. We're going to focus on supervised learning. It largely has two tasks, regression and classification, and within classification there are binary class and multi-class classification. Previously we talked about linear regression, which can do the regression task. Logistic regression, which we're going to talk about in this video, is actually for classification despite its name, and it's especially useful for binary classification. There are ways to do multi-class classification with the logistic regression method, but that requires some engineering. Other models, some of which we'll talk about later and some of which we won't cover in this course, can do different things. For example, a support vector machine can do both regression and classification, but similar to logistic regression it's usually better for binary than multi-class problems, although it can work on multi-class if you engineer the labels and some algorithms inside the model correctly. Decision trees can do everything: regression, binary class, and multi-class without any problem, and it's also nice that they can take categorical variables very efficiently. Neural networks, same thing, can do everything. And many other models that we may not introduce in this course can do different things.

So with this high-level introduction, let's dive in. What is binary classification? It is essentially a yes-or-no problem, so the label is binary. For example, credit card default: whether a customer who uses a credit card is likely to default, given some historical data. Or insurance claims: some claims can be fraudulent, so that can be a binary classification. Spam filtering: given this email text, is it spam or not? Medical diagnosis: given this patient's information and lab test data, does this person have a disease or not? Survival prediction: given a patient's information and history, will this patient survive the next five years or not? How about customer retention: given this customer's behavior, are they likely to churn or not, so that a marketing action can be taken? Various kinds of image recognition can also be binary classification, for example, is this animal a dog or a cat? And sentiment analysis: given these texts or tweets or sentences, what is the sentiment, is it negative or positive, et cetera. So as you can see, binary classification can take a variety of different types of data input. It could be tabular data, images, text, or even speech. What determines whether a problem is binary classification is entirely the label, not the data or the features themselves.

As a brief example, consider a breast cancer diagnosis problem. Say this is one of the features that can help determine whether a tumor is malignant or not, so it can be framed as a binary classification problem. We want some kind of threshold or decision value such that above this value we are more confident the tumor is malignant.
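To make the breast-cancer example concrete, here is a minimal sketch using scikit-learn's bundled breast-cancer dataset. Restricting it to a single feature, and the choice of "mean radius" as that feature, are my own choices for illustration, not something fixed by the lecture:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# Load the dataset; the label y is binary
# (in sklearn's encoding, 0 = malignant, 1 = benign).
data = load_breast_cancer()
feature_idx = list(data.feature_names).index("mean radius")
X = data.data[:, [feature_idx]]   # a single feature, shaped (n_samples, 1)
y = data.target

# Fit a one-feature logistic regression model.
model = LogisticRegression()
model.fit(X, y)

# Predicted probability of the benign class for a few tumor radii.
sizes = np.array([[10.0], [15.0], [20.0]])
print(model.predict_proba(sizes)[:, 1])
```

The point is that nothing about the features makes this a binary classification problem; it is binary because the label takes only two values.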
And below this value, maybe it's less likely to be malignant. Building a logistic regression model will help us find this threshold value, which is called the decision boundary, by the way. And if you have more than one feature, say two features, the data can be shown in a 2D diagram like here, and our decision boundary will likely be a line instead of a single value. So maybe this side is malignant and this side is likely benign.

All right, so the logistic function provides a convenient way to construct a model like this. The function looks like this: it stays between 0 and 1, it smoothly connects those two levels, and there is a sharp transition around a certain threshold value. Let's say that's at zero here, but it could be any other value. Because it's between 0 and 1, the logistic function can serve as a probability function. The logistic function has another name, the sigmoid, and it takes this form: z is the linear combination of the features with the weights and bias, like we had in linear regression, and then this z goes through a nonlinear function, 1 / (1 + e^(-z)). This entire function, as a function of z, is called the sigmoid, and it takes the shape of this curve. By the way, z is called the logit, and it is related to the decision boundary: when z is set to zero, that is our threshold value, and the probability there, with z equal to zero, is one half. So with a 0.5 threshold, above 0.5 probability we can say the tumor is malignant, and below 0.5 we can say it's benign.

Now, some people might ask, why don't we use linear regression instead? Maybe we can fit a line like here, find some threshold, and have it cross the probability of 0.5. We can try to do that, but it's not ideal. First of all, we would have to figure out where that threshold is: fit the line first, then find which feature value gives the 0.5 level. But if you do that, it gives a different threshold value from logistic regression. The problem with fitting a linear regression model and then finding where its output becomes 0.5 is that the result is not very interpretable, whereas logistic regression with the sigmoid function is a well-defined probability model. It's very interpretable: we can find where the probability becomes 0.5, and that gives us the right threshold.

Let's talk about this decision boundary a bit more. In the univariate case where we have only one feature, the decision boundary is the point where the probability equals 0.5. The equation looks like this, and you can solve it for that value. If we have two features, the data lie in a two-dimensional space and the decision boundary becomes a line, so we can find the line equation here that draws this line. If it's multivariate, with higher dimensions, the decision boundary will be a hyperplane.

Okay, let's talk about what happens if we have multiple categories. Instead of a yes-or-no problem, maybe we have multiple categories, such as predicting whether this animal is a cat or a dog or maybe a cow. For logistic regression, the logit, which gives the decision boundary, takes this form.
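Here is a small numeric illustration of the sigmoid and the one-feature decision boundary. With z = w·x + b, the boundary is where the probability is 0.5, which happens exactly where z = 0, so x* = -b / w. The weight and bias values below are made up for the example:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative one-feature model z = w * x + b (values are made up).
w, b = 1.2, -18.0

# Decision boundary: probability 0.5 occurs where z = 0, i.e. x* = -b / w.
x_star = -b / w
print(x_star)                    # 15.0
print(sigmoid(w * x_star + b))   # 0.5, exactly at the boundary
```

The same logic extends to more features: the boundary is the set where w·x + b = 0, which is a line in 2D and a hyperplane in higher dimensions.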
And then for softmax, which is also called multinomial, the multinomial logistic regression logit has this form. They are very similar, except that there is now an index k for the category. So for example, for category number one we can construct this model, and there will be different weights assigned to each category and each feature, giving this logit. Logistic regression uses the sigmoid function as the probability, which we showed in this form, but it can be rewritten in this form as well, and that is very similar to softmax. The softmax function takes the same form, except that it now has an index for the category, and instead of this term, it has a sum over the exponentials of the logits for all the possible categories. All right, so softmax is called multinomial logistic regression.

However, there is another way we can use the original binary logistic regression for multiple categories. For categories A, B, and C, we can construct a binary classification for A versus not-A, and then combine such classifiers. So this is logistic regression model one; model two does B versus not-B; and then we construct a third model that does C versus not-C. This approach is called one-vs-rest, or the OvR approach.

So there are different ways to get multi-category classification done. One, as we mentioned, is the multinomial approach, which is softmax, and another is using OvR. You can find that the scikit-learn library lets you use either of these two, but I think softmax, or multinomial, is more common. And you will see later that other classification models, such as SVMs and decision trees, each have a preferred way of handling multiple classes: some models are more convenient with the multinomial approach and some with one-vs-rest. We'll talk about that later.

By the way, for both OvR (after the per-class scores are normalized, as scikit-learn does) and softmax, the probabilities across categories sum to one. For example, for sample number one, the probability of A plus the probability of B plus the probability of C sums to one. That's the same for logistic and softmax regression. However, there could be a problem where we have categories A, B, and C but we don't necessarily need to pick one of them; maybe none of the categories applies at all, neither cat nor dog nor cow but something else, so the label should be 0, 0, 0. That case is called a multi-label problem. I know it sounds strange, because label versus category, what's the difference? But this type of problem, where we don't have to pick one of the categories, is called a multi-label problem, whereas if we have to pick exactly one of the categories, it's a multi-class problem, and both the logistic and softmax models here are for multi-class classification. There are ways to treat the multi-label problem with the same models, but we would have to construct the labels differently and change the training process a little. So that's a bit of a difference. You will see multi-class classification problems more often than multi-label problems, but just keep in mind that they exist.

All right, but anyway, softmax regression can give this kind of visualization. Let's say we have only two features in the data set, so the data lie in the 2D plane.
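As a sketch of the two approaches just described, here is the softmax function itself plus a side-by-side fit of multinomial and one-vs-rest logistic regression in scikit-learn. The synthetic three-class data from make_blobs is purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def softmax(z):
    """Softmax over the last axis: exponentiate each logit z_k and
    normalize by the sum of exponentials over all categories."""
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))  # three probabilities, sum to 1

# Synthetic three-class data with two features (illustration only).
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Multinomial (softmax) logistic regression: one model, K sets of weights.
softmax_clf = LogisticRegression().fit(X, y)

# One-vs-rest: three binary "k vs not-k" models, combined by the wrapper.
ovr_clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)

# Both yield per-class probabilities that sum to one
# (the OvR wrapper normalizes its per-class scores).
print(softmax_clf.predict_proba(X[:1]).sum(), ovr_clf.predict_proba(X[:1]).sum())
```

With two features, plotting the predicted class over the plane is what produces the decision-boundary pictures mentioned next.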
And these are the decision boundaries that softmax will give us; you can see more examples here. All right, this ends our video. In the next video we're going to talk about how optimization works in logistic regression and how the coefficients are determined.