Now that you've got the basics of SVMs down, let's take a look at how to tweak this model a bit. In this lecture, I want to start by tackling the question of the linear hyperplane. Up until now, we've tried to separate these two pitches, fastballs and curveballs, using just two features, spin and effective speed, and to do so with a straight line. But sometimes a linear hyperplane doesn't make any sense. SVMs really have two ways of dealing with this. The first puts the effort on you as the data scientist and has you modify your features using some non-linear pre-processing steps. You've probably seen this before, when people do things like take the natural log of an axis and plot it again, and all of a sudden a linear trend is revealed. This works really well, and it's applicable to most if not all machine learning methods (there's a small sketch of this kind of transform below). But we can also directly tell the sklearn SVC that we want to use a non-linear kernel.

Let's see how this works with the pitching data we were looking at previously. I'm going to bring in the data science imports just like we did previously, build our dataset, create our pitch colors for rendering, drop our missing values, and intersperse our classes for rendering a little bit later. Let's just take a look at this. It looks basically the same as we saw before: we've got a curveball, a fastball, then a fastball and a curveball. That's fine, as long as we've got some diversity of pitches at the beginning.

Now, it's actually pretty straightforward to fit a polynomial classifier. The most important concern is overfitting it to your data. The issue is that a very high-degree polynomial hyperplane, a really wavy line, is likely to work poorly for new data that lands near that street. Sklearn does let us choose the degree of the polynomial function we want to fit, though. Here I want to create a polynomial classifier with a degree of five. We do this just by setting kernel='poly' and degree=5, and here I've also set the random state. You can see underneath that I have the code for what the linear classifier would look like, so it's very similar. We'll use a third-party function to show the decision boundary instead of plotting the hyperplane itself. Unfortunately, that library requires our target ys to be numeric, so I'm going to convert everything from the pitch type using the pandas factorize function. This just changes everything to a 0 or a 1: instead of FT and CU, we'll have 0 and 1. Again, we're just going to look at our first 1,000 data points.

Now, before we fit or train this model, we need to make our train/test splits. Sklearn actually has a handy helper function for this. You saw in the previous lecture that we did this manually ourselves, but we can use the train_test_split function to do it for us. This is a great example of the kind of thing sklearn does to help you build your machine learning models more effectively and faster. Here we just tell it which features we're interested in, which column is the target we're predicting, what we want our test size to be, and what random state to use for reproducibility. It returns four values: our X_train, X_test, y_train, and y_test. A sketch of this whole workflow follows below. First, let's just run this model and see what it looks like. The accuracy is 90 percent, which sounds nice, but wasn't that linear SVM almost perfect? In fact, I think it was perfect.
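To make the non-linear pre-processing idea concrete, here is a minimal, hypothetical sketch of a log transform on a feature. The DataFrame and column name are made up for illustration and are not from the course dataset.

```python
import numpy as np
import pandas as pd

# If a feature grows exponentially, taking its natural log can turn its
# relationship with the target into a roughly linear one that a linear
# SVM can separate. The values and column name here are invented.
df_example = pd.DataFrame({"spin_rate": [1200.0, 1800.0, 2400.0, 3000.0]})
df_example["log_spin_rate"] = np.log(df_example["spin_rate"])
print(df_example)
```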
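Here is a sketch of the workflow just described, from factorizing the pitch labels through splitting the data and fitting the degree-five polynomial SVC. Since the course notebook isn't reproduced here, it runs on synthetic stand-in data; the column names (effective_speed, release_spin_rate, pitch_type) and the random_state value are assumptions based on the narration, not the notebook's actual values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the Statcast pitching data used in the lecture:
# two roughly separable clusters of fastballs (FT) and curveballs (CU).
rng = np.random.default_rng(1337)
n = 1000
df = pd.DataFrame({
    "effective_speed": np.concatenate([rng.normal(94, 2, n // 2),     # fastballs
                                       rng.normal(80, 2, n // 2)]),   # curveballs
    "release_spin_rate": np.concatenate([rng.normal(2200, 150, n // 2),
                                         rng.normal(2600, 150, n // 2)]),
    "pitch_type": ["FT"] * (n // 2) + ["CU"] * (n // 2),
}).dropna()

# The decision-boundary plotting library needs numeric targets, so
# factorize the pitch labels into 0/1 codes.
df["pitch_code"] = pd.factorize(df["pitch_type"])[0]

X = df[["effective_speed", "release_spin_rate"]]
y = df["pitch_code"]

# Hold out 20% as the test set; random_state makes the split reproducible.
# The helper returns the four arrays in this order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1337
)

# Degree-five polynomial kernel; the linear version differs only in
# passing kernel="linear" with no degree argument.
clf = SVC(kernel="poly", degree=5, random_state=1337)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out pitches
```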
Let's quickly build that model too, with this same partitioned data, just in case the data partitioning had something to do with it. Here we took a test size of 0.2, so a 20 percent test set; maybe the model got confused based on that particular split. The linear one is quite a big improvement. It seems like this polynomial one is actually hurting our ability to predict.

I think it's useful in these cases to look at things side by side. I'm going to create two subplots side by side, then parameterize and fit our two models, the linear one first and the poly one second. Remember, this is a polynomial with a degree of five. Now we're going to use mlxtend to plot the decision boundaries; this was great work coming out of the University of Wisconsin, Madison. To do that, we just pull in the plotting module, call the function plot_decision_regions, pass it our X_test and our y_test, and give it our classifier. I'm going to point it at the two different matplotlib axes that come out of our subplots, so the plots are side by side, and we'll set those titles (this comparison is sketched below).

I think this is really interesting. We've got the linear SVM on one side, and it seems to have classified things perfectly, although we know that's not quite a perfect classification based on the actual accuracy metric that was reported. Then the polynomial one here is actually really poor; it's just a line that goes across the screen, it seems. Now, if you think about it, it's actually not the world's worst line. The majority of the blue ones are on the blue side, and the majority of the orange ones are on the orange side. In fact, it's actually pretty good with the orange ones; it's really bad with these blue ones, and there's an obvious straight line that's better. We can see that the polynomial SVM we fit is really poor. It does have some accuracy greater than chance, and there are only a few triangles above the decision boundary, but it's making a poor choice for what appears to be a straight hyperplane.

The polynomial kernel is actually further parameterized, though, and in this example we only set the degree of the curve. Two other parameters that are pretty impactful exist here: C, which is the regularization parameter, and coef0, which is the independent term in the kernel function. There are actually lots of other parameters, but you can consider these two a good place to start. The C value controls regularization, which penalizes the model for creating highly specific and thus not very generalizable decision boundaries, while the coef0 parameter works differently and controls how much the model is influenced by high-degree polynomial terms like this one.

Let's try a few different values. I'm going to create eight different plots here and build a bunch of different polynomial models (these are also sketched below). I'm going to have our linear one and our poly-five one, just like we did a moment ago, but I'm also going to have a poly five where I give it a C parameter so you can see what that looks like; we'll set C to five. By default, C is one, and it's always a positive number. Then I'm going to have the poly-five model with coef0 set to five. Then I'm going to have different combinations of them: C of five with coef0 of one, C of 15 with coef0 of one, and C of 15 with coef0 of five. Finally, I'm going to try a three-degree polynomial with C set to 25 and coef0 set to two.
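Here is roughly what the side-by-side comparison looks like in code, reusing X_train, X_test, y_train, and y_test from the sketch above. The figure size and random_state are arbitrary choices, not values from the lecture notebook.

```python
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC

# Two subplots, one per model; plot_decision_regions shades the region
# each classifier assigns to each class and overlays the test points.
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

models = {
    "linear": SVC(kernel="linear", random_state=1337),
    "poly (degree=5)": SVC(kernel="poly", degree=5, random_state=1337),
}

for ax, (title, clf) in zip(axes, models.items()):
    clf.fit(X_train, y_train)
    # mlxtend expects NumPy arrays with integer class labels.
    plot_decision_regions(X_test.to_numpy(), y_test.to_numpy(), clf=clf, ax=ax)
    ax.set_title(title)

plt.show()
```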
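And here is a sketch of the eight-model comparison, again reusing the earlier split. The C and coef0 values follow the narration; the grid layout and random_state are assumptions, and these settings are a starting point for exploration rather than tuned values.

```python
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC

# Eight parameterizations: the linear baseline, the plain degree-five
# polynomial, and several C / coef0 combinations.
models = {
    "linear": SVC(kernel="linear", random_state=1337),
    "poly d=5": SVC(kernel="poly", degree=5, random_state=1337),
    "poly d=5, C=5": SVC(kernel="poly", degree=5, C=5, random_state=1337),
    "poly d=5, coef0=5": SVC(kernel="poly", degree=5, coef0=5, random_state=1337),
    "poly d=5, C=5, coef0=1": SVC(kernel="poly", degree=5, C=5, coef0=1, random_state=1337),
    "poly d=5, C=15, coef0=1": SVC(kernel="poly", degree=5, C=15, coef0=1, random_state=1337),
    "poly d=5, C=15, coef0=5": SVC(kernel="poly", degree=5, C=15, coef0=5, random_state=1337),
    "poly d=3, C=25, coef0=2": SVC(kernel="poly", degree=3, C=25, coef0=2, random_state=1337),
}

fig, axes = plt.subplots(2, 4, figsize=(20, 10))

for ax, (title, clf) in zip(axes.flatten(), models.items()):
    clf.fit(X_train, y_train)
    plot_decision_regions(X_test.to_numpy(), y_test.to_numpy(), clf=clf, ax=ax)
    ax.set_title(title)

plt.show()
```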
I'm just playing with a bunch of different things here. This is a great way to explore and start to get an intuition for how to parameterize these models, and then go to the docs to understand better how the models might work. Here are our results. This is actually really interesting, and we can see there's quite a bit going on. The linear one looks pretty clear, and we see that poly-five one right next to it, too. When we set the C parameter, we start to get a little bit better curve; we're penalizing the model a little bit, and that's allowing us to learn a space, a hyperplane, that fits a lot better. We can also see that setting the coef0 value seems to have a lot of power in this model. I think it's interesting to look through all of these different parameterizations and start to get a sense for how they change what the hyperplane actually looks like when using polynomial SVMs.

We tried a bunch of different models, and we have two that fit really well: the linear SVM, and the polynomial degree-five SVM with a C of 15 and a coef0 of five. Is this the best approach, to just guess and test a bunch of parameters and determine which models work the best? The answer is actually pretty complex. At a high level, no. You want to leverage your knowledge of the problem, the data, and the algorithms to make an informed decision about how you're going to parameterize your model. Ideally, you want a simple model, and it seems pretty clear that in this case the linear SVM is it. You also need to keep in mind the high-level machine learning workflow we spoke of at the beginning of this course: if you're constantly looking at the results on your test set, then you're effectively letting your model learn information about that set, and the model isn't going to be very predictive on future data.

There are some really interesting methods, though, for tuning your hyperparameter values and then evaluating on a final holdout set. The most common of these that I see used is called grid search. This is an exhaustive search of the parameter space; it's really just guessing and testing all the different parameter combinations and the different ways you can build your solution. It's exhaustive only in the sense that you have enough time, or money, to let the CPU chug away building your models, and techniques like that are one of the reasons machine learning can be very computationally intensive. The results, though, are pretty nice, and as long as you've got that validation set, you have a strong understanding of how your model will actually work in the real world (a minimal sketch of grid search with sklearn follows below). I don't want to go there yet, though. Let's look a little bit more at this hyperplane instead in the next lecture.
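For reference, here is a minimal sketch of what a grid search can look like with sklearn's GridSearchCV, reusing the train/test split from the earlier sketch. The parameter grid and cv setting are illustrative assumptions, not the values this course will eventually use.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# GridSearchCV exhaustively fits every combination in param_grid using
# cross-validation on the training data, so the held-out test set stays
# untouched until a single final evaluation.
param_grid = {
    "kernel": ["linear", "poly"],
    "degree": [3, 5],
    "C": [1, 5, 15],
    "coef0": [0, 1, 5],
}

search = GridSearchCV(SVC(random_state=1337), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)            # mean cross-validated accuracy
print(search.score(X_test, y_test))  # one final check on the holdout set
```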