To this point, we've really looked at some of our own work using SVMs for prediction in the major leagues. But now I want to pick up the research thread and talk about wearables in particular, which are a really hot area right now. A lot of our features have been nice and clean and easy to understand, but wearables produce huge, noisy data streams that can be really difficult to interpret. For instance, this little IMU device can capture acceleration in three dimensions, gyroscope data, temperature, light, pressure, and more. It costs about 100 bucks and it lasts for a day or two. Even more, it can capture some of these measurements at rates of almost 1,000 times per second. That's an incredible amount of data. What happens when we strap a few of these to an athlete? What can we do with that? A great place to start this investigation is with some replication work. Matthew Worsey described an approach to automatic classification of boxing punches through motion sensors and machine learning. In this work, they trained a model on a single athlete who wore sensors on each wrist, as well as one in the middle of his upper back at the T3 vertebra. Each sensor captured three dimensions of spatial position as well as rotation and acceleration. To collect training data, they classified 250 punches the athlete threw: left jab, left hook, left uppercut, right cross, and right uppercut. Following good practice, they captured equal numbers of each punch type, 50 of each. After training a model, they then evaluated it on a hold-out dataset of 82 punches across the five categories, all thrown by the same athlete. They built several different predictive models to see if they could predict, from the sensor data, the kind of punch that was being thrown, and one of these models was an SVM. They've done the hard work of collecting the data for us, so why don't we dig in and see if we can build the models ourselves? The data the authors provided has over 100 different features set up for us. Each feature represents a summary statistic of the underlying sensor data. Matthew was kind enough to send me this processed data so we could use it for this lecture, but he also sent me the raw data, and I wanted to take a moment to explain what it looks like and how data scientists do this work. It's all in this one file, the gust boxing classification data zip file, and we're going to look first at the raw sensor data, so let's read that in. It's an Excel file, so we can read it right into pandas directly, and it's quite large, so this will take a moment. If we look here, we see the sampling rate was 250 hertz, so that's 250 times a second. We've got acceleration in the X, Y, and Z axes, and we've got the gyroscope values as well, along with the pitch, yaw, and roll. They determined when a punch connected based on the impact registered by a glove sensor, then partitioned the data at 0.6 seconds before and 0.6 seconds after that point and looked at the data inside that window. At 250 hertz, that means there are 300 sensor observations for each punch they detect. Let's take a look at just the first 300 observations. All right, so this is just the acceleration data. We see that it's pretty constant, spikes up a little bit and then drops off, and this was just at the beginning of the data collection. How would you actually take this and feed it into an SVM?
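If you want to follow along, here's a minimal sketch of that read-and-plot step in pandas. I'm making up the file and column names for illustration, so treat raw_sensor_data.xlsx and accel_x, accel_y, accel_z as placeholders for whatever is actually inside the zip file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Read the raw IMU export into a DataFrame. The filename is a placeholder;
# swap in whatever the unzipped raw sensor file is actually called.
raw = pd.read_excel("raw_sensor_data.xlsx")

# At a 250 Hz sampling rate, a 1.2 second window (0.6 s before and 0.6 s
# after the glove impact) is 300 rows of sensor readings per punch.
window = raw.iloc[:300]

# Plot just the three acceleration channels for that first window.
window[["accel_x", "accel_y", "accel_z"]].plot(figsize=(10, 4))
plt.xlabel("sample (250 Hz)")
plt.ylabel("acceleration")
plt.show()
```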
Previously, each observation had a single value for each feature, like the single speed of a pitch. Here we've got 300 values underlying a single observation. Well, the approach Matthew and his colleagues took was to generate summary statistics from this distribution. For instance, you'll see in the data file the mean value, the standard deviation, the minimum, the kurtosis, the skew, and so on. We can generate these for that data; we can take this curve, essentially this set of observations, and turn it into a set of five or six different features. That's the approach Worsey and team took to get this very rapid, fine-grained sensor data and turn it into summary features for classifying boxing punches. But there are lots of different ways one could do this, so here's a bit of a thought experiment. Imagine you were doing this study and you could put these little sensors anywhere on the boxer's body. Where would you put them, and how might you generate features from the underlying sensor data? Give this a moment and share your thoughts with me; I'm curious to see what you would come up with. With a bit of discussion of the data done, let's give it a go at building some of those SVMs. Let's bring in our typical data processing libraries. Now we're going to read in the boxing data. I'm going to start with just one sensor, the one placed on the T3 vertebra on the upper back. The authors have separated the training and testing datasets for us, which is wonderful. This is great science, and it makes it really easy for us to replicate their work and read the data in. In the data files, the column called class holds our labels, the y values we're trying to predict. We're going to build just a linear kernel for right now, though that's something fun to play with later, and we're going to use some cross-validation. This is a good approach, and we'll still use accuracy as our metric. All right, so that looks like a pretty good classifier. It's sitting around 90 percent or so. The scores we see in our cross-validation are 86, 84, 96, 96, and 96, with a pretty tight standard deviation of 0.05. Actually, you'll notice here that I didn't set the random seed value to 1337 like I did previously, so you might have slightly different values. I think there's something really quite notable here: look at how easy that was with clean data. We had data that was nicely set up into our train and test sets. We were able to learn a model on the training data and evaluate it nicely on that test set. This is really just a great way to do science, so let's take advantage of it. We're going to fit the model, and we're not going to use cross-validation for this, because we don't need it; all cross-validation does is give us a better sense of how accurate we should expect the model to be. We're going to fit the model on all of our training data, and now we're going to look at the accuracy it has on the test set. Okay, 67 percent, so that's a bit deflating. The accuracy is significantly lower than the accuracy we saw previously. But don't get too worried yet; we need to talk about what accuracy really means. In this context, accuracy is whether the exact label is predicted as intended. As you increase the number of classes, and in this case we have five different classes for the different punches, you would expect it to be harder to get the predicted value right just by chance.
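Here's a rough sketch of both of those steps: collapsing a raw window into summary-statistic features, and then fitting the linear SVM with cross-validation. I don't know the exact file or column layout of the processed data the authors shared, so boxing_T3_train.csv, boxing_T3_test.csv, and the class column used below are assumptions based on the description above; the scikit-learn calls themselves are standard.

```python
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# --- Turning one raw 300-sample window into summary-statistic features ----
def summarize_window(window, channel):
    """Collapse one 1.2 s sensor window into a handful of summary features."""
    s = window[channel]
    return {
        f"{channel}_mean": s.mean(),
        f"{channel}_std": s.std(),
        f"{channel}_min": s.min(),
        f"{channel}_skew": s.skew(),
        f"{channel}_kurtosis": s.kurtosis(),
    }

# --- Fitting a linear SVM on the processed T3 features --------------------
# File names are placeholders for the author-provided train/test splits.
train = pd.read_csv("boxing_T3_train.csv")
test = pd.read_csv("boxing_T3_test.csv")

X_train = train.drop(columns="class")
y_train = train["class"]
X_test = test.drop(columns="class")
y_test = test["class"]

clf = SVC(kernel="linear")

# Five-fold cross-validation on the training data gives us a sense of how
# accurate we should expect the model to be.
scores = cross_val_score(clf, X_train, y_train, cv=5)
print("CV accuracy:", scores, "mean:", scores.mean(), "std:", scores.std())

# Fit on all of the training data, then check accuracy on the hold-out set.
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```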
For instance, if you guess randomly at the outcome of two teams playing against one another, you would expect to be right roughly half the time, 50 percent, assuming ties don't happen or are very infrequent. But now that we've got five classes to guess from, your chance of being correct, assuming each class happens with equal frequency, is only 20 percent. Accuracy on its own is actually a misleading metric, and it's one which is rarely used by itself for decision-making with machine learning models. Instead, let's look at a new method to understand our model's performance: the confusion matrix. Scikit-learn has a handy option to plot the confusion matrix. In this plot, which is a heatmap, the true labels, the correct values, are plotted against the predicted labels, those generated by our classifier. There are a lot of different parameters we can pass to this function. The first three are core: the classifier we want to evaluate, followed by the data we want to run through it, in this case our hold-out test set, first the features X_test and then the class labels y_test. Then I tell scikit-learn what the class labels mean and how to display them, and what color map I want for the heatmap. Finally, I just tweak a few of the dimensions so that we can look at it. Take a moment to study the confusion matrix. The y-axis on the left holds the labels that come from our data; these are the true labels. Along the bottom axis are the labels our classifier predicted. The color of each cell corresponds to the number of instances where the labeled data and our classifier's prediction intersect, so the cells along the diagonal are the things we predicted correctly. Each cell is color-coded: dark blue means there were no instances in the intersection, and bright yellow means there were a lot. A perfect classifier with an accuracy of 100 percent would show a bright yellow diagonal line on the confusion matrix, showing perfect alignment between the observed and predicted classes. We can use that knowledge to quickly make sense of this confusion matrix. Several of the bright yellow squares fall along the diagonal, and we can see that our classifier is good at predicting right crosses as well as left hooks and left uppercuts. But if we look at the bottom right-hand corner, we see that the right uppercut is not well predicted by this model. Reading across that bottom row and summing the numbers, we can see there were 13 instances of a right uppercut, but only seven of them were correctly classified; our classifier predicted a number of them to be left jabs, left uppercuts, or even a right cross. Now, let's talk about that left jab for a moment. If you find the row for the left jab, you'll notice that our model made no correct predictions for this class. Instead, the model predicted that the vast majority of these instances were right crosses, and a few of them even left hooks. This is a great example of why the confusion matrix is necessary to understand where your model is making errors.
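As a sketch, here's how that plot can be produced. Older lecture material may use scikit-learn's plot_confusion_matrix helper; in current versions the equivalent is ConfusionMatrixDisplay.from_estimator. The clf, X_test, and y_test names carry over from the training sketch above, and the punch label strings are my assumption about how the classes are stored, so adjust them to match the actual data.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Assumed display names for the five punch classes described in the lecture.
labels = ["left jab", "left hook", "left uppercut", "right cross", "right uppercut"]

fig, ax = plt.subplots(figsize=(8, 8))
ConfusionMatrixDisplay.from_estimator(
    clf,                      # the fitted classifier
    X_test, y_test,           # hold-out features and true labels
    display_labels=labels,    # how to label the axes (assumed to match the class order)
    cmap="viridis",           # dark blue for empty cells, bright yellow for full ones
    xticks_rotation=45,       # keep the long punch names readable
    ax=ax,
)
plt.show()
```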
This model is pretty good at correctly predicting many of the different punches, but it's unable to make even a chance-level prediction on the left jab, and it severely overpredicts the right cross. I think this data does a great job of demonstrating why you can't just take accuracy as a metric and think that it tells the whole story. Instead, the confusion matrix gives you a much better understanding of where things are going wrong. That can be helpful both in estimating the performance of the model in the real world, where in this case the left jab is just never going to show up in the predictions, and in iterating on your features. Of course, in this data I've intentionally looked at just this one sensor, the one on the back at the T3 vertebra. It seems natural to me that including data from the gloves themselves would help clean up this classification problem. That's another great example, I think, of where you can prototype something, see deficiencies, and start to think about how you might add new sensors in the environment to collect new data which will improve your predictions. But let's end this lecture with a little more discussion of metrics. I think inspection of the confusion matrix is important for you as the sports data scientist, but sometimes you want to describe that data to people more simply. The first metric we're going to look at is called precision. In a multi-class predictor like this, it's the ratio of the true positives for a class divided by the combined number of true and false positives for that class. For instance, in this model we never actually predict a right uppercut and turn out to be wrong, so that would be 7 divided by 7 plus 0: perfect precision. However, we regularly predict a right cross and are wrong, so there the precision is quite low: 15 divided by 15 plus 3 plus 15 plus 2, which works out to about 43 percent. Recall, on the other hand, is the number of true positives divided by the number of true positives plus false negatives. This score gets smaller when instances of a given class are incorrectly predicted to be something else; those are the false negatives. In this case, the recall for the right cross is much better, about 93 percent, since we very rarely see a right cross punch and predict it to be something else; there's only one example here where we predicted it to be a left uppercut. So take a look at both the precision and the recall. These are two commonly used metrics, and they're different from accuracy. They come from the confusion matrix and describe a little bit about how the model behaves, and whether you want to use precision, recall, accuracy, or any of the other evaluation metrics really depends on why you might use the model. What are you going to do with it in the future? How sensitive are you to false positives or false negatives? This wraps up our introduction to support vector machines. We saw how, when you're doing a binary class prediction like fastball or changeup and there are only two features like speed and spin, we can visualize the decision boundary, the street, as a straight line with a linear SVM. The modeling technique gets its name from the data points which constrain this line; those are called the support vectors. We don't have to cast this as a linear problem though: we can build up a polynomial kernel and constrain it in different ways to learn a better-fitting decision boundary. More generally, we call these boundaries hyperplanes, and we can apply them in n dimensions and use many different features.
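If you'd rather have those numbers computed for you than read them off the matrix by hand, scikit-learn will report precision and recall per class. This little sketch reuses the clf, X_test, and y_test assumed in the earlier examples.

```python
from sklearn.metrics import classification_report, precision_score, recall_score

y_pred = clf.predict(X_test)

# Per-class precision and recall (average=None returns one score per class).
print("precision:", precision_score(y_test, y_pred, average=None, zero_division=0))
print("recall:   ", recall_score(y_test, y_pred, average=None, zero_division=0))

# Or the whole summary at once, including per-class support counts.
print(classification_report(y_test, y_pred, zero_division=0))
```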
We also don't have to constrain our model to binary classification, and you saw that in this lecture, where we tackled a five-class prediction problem. Finally, we dove in a bit more on how to evaluate how good a model actually is, including understanding the confusion matrix as a sense-making device and the precision and recall statistics as different ways to summarize aspects of the confusion matrix. These measures, along with the confusion matrix, don't just work for the SVM; they work for lots of other machine learning models as well.