
Hello, and welcome to this lesson, which will introduce the Support Vector Machine algorithm. SVM is, historically, a very popular machine learning algorithm because it's both fairly simple to understand and very powerful. It can be applied to classification problems, where it is sometimes called SVC, as well as to regression problems.

This particular lesson has two components that you need to do. First, you're going to look at a Support Vector Machine notebook that's online. It's part of a book, the Python Data Science Handbook, by Jake VanderPlas. I'm going to show you this very quickly and then indicate where you can stop; there's an example towards the end which is optional, and you don't have to go through it. Then there's our notebook, which is the introduction to Support Vector Machine.

So, first, this notebook is part of the book that Jake VanderPlas has written on Python data science. You can, of course, buy the book to support the author, as well as have your own copy of the material, but he also makes the material available as Python notebooks online.

Here, he's talking about motivating SVM, how you fit an SVM and maximize the margin, how to use SVC, et cetera. These are all things that we'll talk about as well. Keep going until you get down towards the bottom, where he works through a specific example. I don't ask you to go through that example, Face Recognition, because it's a little more complex. So, go through everything up to about there. If you want to, you can go farther, but that section is a bit more complex because it's actually working with images.

Our notebook, on the other hand, is going to focus on the Iris dataset, as well as the Adult and Auto MPG datasets that we've seen before. The content is going to cover the concept of hyperplanes and how to use them to split data, and also introduce the idea of a non-linear kernel.

Then we're going to get into classification with Support Vector Machines. We'll look at the Iris dataset, the decision surface with the SVC, and the impact of hyperparameters on the decision surface. We'll then get into classification on the Adult data and talk about SVC on unbalanced classes and how you can try to handle that effect. I'm also going to introduce the receiver operating characteristic curve, or ROC curve, and the area under the curve, or AUC. And we're going to look at gain and lift charts as well. These concepts are important for classification in general, not just for SVC but for any classifier; we're just going to introduce them here with the results from SVC. Lastly, we're going to look at Support Vector Machine regression and use it to make predictions on the Auto MPG data.

So, we have our standard setup code, and then we're going to talk about SVC and the importance of hyperplanes. A Support Vector Machine basically works by dividing the data with hyperplanes into the resulting classes for classification, whereas the hyperplanes themselves form the predictive model for regression. So, it's important to understand what these hyperplanes are.

I like to look at this visually. So, we can make a plot of the Iris dataset, and that's what these code cells are doing. They're making a plot of the Iris dataset and showing planes that we can split the data with. We've actually calculated the optimal planes to split these three classes apart from each other.

The way you read this is: the first line, up here, is splitting these two classes, the red and the green. The green line is the actual split, and the red and the blue are your plus and minus margins, if you will. And the same thing down here, where we have the plus and minus. I've also circled, in light yellow, what are known as the support vectors. These are the data points that determine the boundaries. So, we're going to split the data such that these support vectors are uniformly away from that split.

Here, it's a much tighter split, and we actually have some data that are misclassified because they're on the wrong side. That's okay; that's one of the features of SVC. It allows for some misclassifications if, in general, the split works very well. Now, technically, there should be three sets of hyperplanes here because there are three classes. I've removed one of them, showing only two, just to make it easier to visualize. That's in the code; you can actually take that out and show all three if you want.
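If you want to poke at this yourself, here's a minimal sketch, assuming scikit-learn is available, of how a fitted linear SVC exposes those circled support vectors directly; the two-feature, two-class setup is my own illustration, not the notebook's exact plotting code.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Two features and two classes, so a single hyperplane does the split
X, y = load_iris(return_X_y=True)
mask = y < 2                   # keep just the first two Iris classes
X2, y2 = X[mask, :2], y[mask]  # sepal length and sepal width only

# Fit a linear SVC; the fitted model stores the support vectors
model = SVC(kernel='linear', C=10.0)
model.fit(X2, y2)

# Each row is one of the points that determines the margin boundaries
print(model.support_vectors_)
```

Those rows are exactly the points you would circle in the plot above.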

I also want to introduce non-linear kernels. The idea here is that SVC works very well for linear splits, but if your data is curved or has non-linearities, SVC is going to have trouble. The way you handle that data is to apply what's known as the kernel trick, where we transform the data into a space where the data is linearly separable. Now, this may be confusing, so I think an example is going to be helpful.

What we've done is start with some data. If you look at this data, it's actually radially distributed: there's effectively a circle around it, with class one inside and class two out here. So, what we can do is perform a polar transformation, so that we now plot the radius versus the angle, and you can see that there's a very nice, simple linear split. This is an example of the kernel trick: transform the data into a space where there is a simple linear split.

We're not asking you in this course to figure out all the best kernel tricks that you might apply to the data. Scikit-learn has some capability of doing that on its own. But I want you to understand what the kernel trick actually means. Here we went from a non-linear split, where we would actually have to use a circle, which is definitely non-linear, transformed the data into a different coordinate system where there is a linear split, and then it's easy for SVC to make the classification.
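As a rough sketch of that polar-transform idea, using scikit-learn's `make_circles` as a stand-in for the radially distributed data (so the numbers here are illustrative, not the notebook's):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: class one inside, class two outside
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The "kernel trick" by hand: move to polar coordinates (radius, angle)
radius = np.hypot(X[:, 0], X[:, 1])
angle = np.arctan2(X[:, 1], X[:, 0])
X_polar = np.column_stack([radius, angle])

# In polar space, a plain linear SVC separates the rings easily
linear = SVC(kernel='linear').fit(X_polar, y)
print(linear.score(X_polar, y))
```

The same linear SVC fit directly on the original (x, y) coordinates would struggle, because no straight line separates the two rings there.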

So, that brings us to support vector classification. We'll use many of the same techniques here as we did for logistic regression or decision tree classification. We have some hyperparameters, and one of the most important is C. This is a penalty term for regularization, and we have not discussed regularization yet, so we're going to set this high to minimize its effect.

Then there are different kernels that we can apply; this is how the kernel trick is implemented. There is linear, which is no kernel trick. There's RBF, or radial basis function, which is similar to what we just showed. There are polynomials, sigmoid, and even user-defined functions.
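In scikit-learn terms, those two hyperparameters look roughly like this; the C value and train/test split below are my own illustrative choices, not the notebook's exact settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=23)

# Try each built-in kernel with a high C (weak regularization)
results = {}
for kernel in ['linear', 'rbf', 'poly', 'sigmoid']:
    model = SVC(kernel=kernel, C=100.0).fit(X_train, y_train)
    results[kernel] = model.score(X_test, y_test)
    print(f'{kernel:8s} accuracy = {results[kernel]:.3f}')
```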

So, we're going to first use the Iris dataset to demonstrate SVC. If you look right away, you see that it is performing very accurately, better than k-nearest neighbors or decision trees, and the classification report and confusion matrix back that up. There's only one misclassification. Now, if you change the random seed, these results would change. That's because there is a bit of randomness here, in the test/train split as well as, in some cases, in the different classification algorithms.

We can also look at the decision surface for SVC. Here, you notice the nice hyperplanes; these are the lines coming through here, showing you the SVM split. We can then vary the hyperparameters and see what happens to our decision surface. Here, I'm changing the kernels, and you can see how the accuracy changes. In this case, linear works really well, and RBF, the radial basis function, and sigmoid work pretty well too. So, here's the nice linear split. Then, when we change to a radial basis function or a polynomial, notice that you start getting curvature. This is because we've transformed the data into a different space, and when we come back to the original features, the boundary is now curved. This shows you how you can handle non-linear decision surfaces, or boundaries, with these kernel tricks.

We can also look at how more complex data can be classified with this particular technique. The notebook goes much faster through this because we've already introduced the data. We can go straight to creating our features and our labels, do a test/train split, and then apply an SVM. In this case, we don't use the standard SVC. The reason is that this is a big dataset, and standard SVC can use a lot of memory and be slow. If we use LinearSVC instead, it's going to apply SVC but with a linear kernel, so no kernel trick. It'll be faster because it uses a different underlying implementation.

When we run it, we can see that we get a pretty reasonable accuracy. Remember, our zero model was around 75%.
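A sketch of that swap to `LinearSVC` follows; since the Adult data needs a download, I'm substituting a synthetic dataset here purely to show the API.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for a large, mildly unbalanced dataset like Adult
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.75, 0.25], random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=23)

# LinearSVC: linear kernel only, but a faster underlying implementation
model = LinearSVC(max_iter=5000).fit(X_train, y_train)
print(f'Accuracy = {model.score(X_test, y_test):.3f}')
```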

Then we can look at the general classification metrics. Again, overall, pretty good, but not very good right here. So, that's something that we would want to look at improving, maybe dropping or changing some features. In this case, we can look at it and say, "Well, you know what? Our classes were unbalanced. Maybe we need to do something about that."

So, this is showing again our zero model performance, and we can then change our hyperparameters so that our class weight is balanced. When we run it, we notice that our SVC has gotten much better. In particular, the recall here for the high-income class has gotten much better. Remember, it was about 0.2 before, so it's almost doubled.
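That class-weight change is a one-argument tweak. Here's a hedged sketch on synthetic unbalanced data, showing how the minority-class recall responds:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Unbalanced data: roughly 90% of samples in the majority class
X, y = make_classification(n_samples=4000, weights=[0.9, 0.1],
                           random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=23)

recalls = {}
for weight in [None, 'balanced']:
    model = LinearSVC(class_weight=weight, max_iter=5000)
    model.fit(X_train, y_train)
    # Recall on the minority (positive) class
    recalls[weight] = recall_score(y_test, model.predict(X_test))
    print(f'class_weight={weight}: minority recall = {recalls[weight]:.2f}')
```

With `class_weight='balanced'`, errors on the rare class are penalized more heavily, which typically trades a little overall accuracy for better minority recall.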

Now, standard performance metrics, such as precision, recall, et cetera, are single numbers. A lot of classification algorithms now produce probabilistic classifications, which we can use to figure out the optimal threshold in that probability space to get the best classification. To do this, we use something called the receiver operating characteristic curve, or ROC curve.

The ROC curve is easier to explain if you look at an example. So, here, we compute a logistic regression, a decision tree, and an SVC model on our data and plot the ROC curves for all three. This line here is random: if you just guess, you're going to get the random curve. Obviously, you want to be to the left of this; that's better. The perfect curve is this yellow line here. You have the true positive rate versus the false positive rate on the ROC curve, and you want to be as close as possible to that yellow curve. You can see here that both the logistic regression and the SVC work very well; the decision tree, not quite as well.

The other thing you can do is compute the area under this curve, or the AUC, and that's what we're displaying here. You can see these two models are very similar. You'll see the ROC curve a lot. It's a great way to compare different models, or the tuning of different models and different hyperparameter combinations, to try to figure out which works best.
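The mechanics of an ROC curve can be sketched like this; I'm using synthetic data and a logistic regression stand-in, since any model that produces probability scores works the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=23)

# Any classifier with probabilistic outputs can produce an ROC curve
model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

# True positive rate vs false positive rate at every score threshold
fpr, tpr, thresholds = roc_curve(y_test, scores)
print(f'AUC = {auc(fpr, tpr):.3f}')
```

Plotting `fpr` against `tpr` gives the curve itself; a random guesser lands on the diagonal with AUC of 0.5, and a perfect classifier has AUC of 1.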

Other things that you might want to look at, however, are the gain and the lift charts. The reason we look at these is that sometimes you want to do more than just make a classification; sometimes you want to optimize the results of your classification for a particular outcome. So, here, we actually compute this. This function is a little complex, admittedly so, so you don't have to understand it. Simply understand that it's going to compute the gain for a classification. The gain is used to make the gain chart, which is then also used to create the lift chart.

This plot makes that, and here's our gain chart. What this shows you, against your baseline prediction and very similarly to your ROC curve, is how much gain we are getting as we use more and more of the test data. The idea is that, at some point, you may stop getting benefits by adding more data into your analysis. In other words, are there certain customers that you want to hit first when you're targeting your advertising budget? Say you only have $1,000, and it costs $10 per customer to target them. You obviously don't want to just randomly grab people. You want to find those customers that are going to give the best return on value and hit those with your budget first. That's the idea behind the gain and lift charts.

That was the gain chart. The lift chart is simply the lift above the random line that you saw on the gain chart. In other words, you want to be above one, and as we add more and more test data, we're going to approach one. The idea is that this lets you figure out which customers you would get the best performance from. And here, these lines are just the decision tree, logistic regression, and SVM.
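The gain and lift computation itself can be sketched in a few lines; this is a simplified version of the idea, not the notebook's own function, and the data here is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.75, 0.25],
                           random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=23)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

# Gain: rank customers by score, then ask what fraction of all positives
# we capture as we work down the ranked list
order = np.argsort(scores)[::-1]
cum_positives = np.cumsum(y_test[order]) / y_test.sum()
fraction_contacted = np.arange(1, len(y_test) + 1) / len(y_test)

# Lift: gain relative to contacting customers at random
lift = cum_positives / fraction_contacted

# Targeting only the top 20% of customers by score:
k = int(0.2 * len(y_test))
print(f'Gain at 20% = {cum_positives[k - 1]:.2f}, '
      f'lift = {lift[k - 1]:.2f}')
```

A lift above one at 20% means the model finds positives faster than random targeting would, which is exactly the budget argument above.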

We can also apply SVM to regression tasks. The difference is that now we're making a continuous prediction. We use our Auto MPG dataset again: we grab the data if we don't already have it, we make our test/train split, and then we perform SVR. Our score is pretty low, and if we look at these metrics, you see that the error is higher than it was before. We might want to play around with changing which features we use. The mean absolute error is pretty high: we're only predicting within five to six miles per gallon. So, we would probably want to do better than that if we could.
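The SVR call mirrors the SVC one. Here's a hedged sketch on synthetic regression data (the Auto MPG file needs a download, so these numbers are not the notebook's):

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for a small regression task like Auto MPG
X, y = make_regression(n_samples=400, n_features=5, noise=10.0,
                       random_state=23)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=23)

# Support Vector Regression shares the C and kernel hyperparameters
model = SVR(kernel='linear', C=100.0).fit(X_train, y_train)
pred = model.predict(X_test)
print(f'R^2 = {model.score(X_test, y_test):.3f}')
print(f'MAE = {mean_absolute_error(y_test, pred):.1f}')
```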

In the student exercise, I'll actually give you some ideas for how you might improve that. So, I'm going to stop here. The Support Vector Machine is a powerful algorithm, something you definitely want to be familiar with. Along with the other algorithms we've talked about in this course, it helps build that toolkit for applying machine learning to the datasets that you may encounter. If you have any questions, let us know. And good luck.
