Now that we have a working definition for supervised learning, let's take a look at what this process actually looks like. We'll start with a bit of terminology. From this point on, we'll refer to inputs as features and outputs as labels. We'll refer to a single input-output pair as an example, as in an example to show our model. And then we'll refer to a collection of examples as a dataset. We'll be using these terms interchangeably. This isn't to confuse you, but to get you comfortable with the fact that they are used interchangeably in everyday practice, and it can be much more confusing later on if you aren't comfortable with these terms and concepts from a few different angles. Sound good? Okay, let's cover a few more terms then.

Commonly, a dataset is split into what's called a training set, a validation set, and a test set. The training set is a set of examples, or input-output pairs, that the model is given in order to learn the function that links the inputs to the outputs. The difference between the many types of machine learning models that exist today is the process through which the right function is found.

Now, what makes a function the right function? Well, a common misconception is that a function is good if it can accurately link inputs to outputs on the training set. However, we need to remember our real goal: we want to find a model that can process new inputs and produce new and accurate outputs. This means that we can't evaluate the model on inputs it's already seen. We want to ensure that our model can generalize what it's learned to new inputs from real-world applications, which means evaluating it on examples that it was not exposed to during function learning; anything else would essentially be cheating. In other words, we want to evaluate our model on unseen examples. If a model can produce accurate outputs for these unseen examples, then we can say that the model generalizes well to new inputs.

This brings us to the validation set. The validation set is a set of examples, or input-output pairs, that we hold out and do not expose the model to during training. Instead, we use it to periodically assess, or validate, the generalization performance of our model as we develop it. We use the validation set in order to get a sense of when the model has found a function that generalizes well. In practice, we can run the model on the validation set intermittently during training. Remember, we can turn off the learning mode of the program while doing so, so that the model doesn't actually update its function using these particular examples; since this is a program, we can design it to do whatever we want. We can then save the version of the model that gets the best validation set performance. We can also compare performance on the validation set to make better meta-level design choices, both about the program that trains the model and about the function types that we consider during training. We call these meta-level aspects hyperparameters.

Now there's one final thing that we need to consider, which is that even though we do not let our model learn using the validation set examples, we do still compare performance on the validation set to tweak and tune the meta-level hyperparameters of the learning program that we just mentioned. In the process, we may be inadvertently helping the model to do better on these particular examples. So to address this, we have one final set of data that we call the test set.
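Before we pin down the test set's role, here's a minimal sketch, in Python, of how a dataset might be partitioned into these three sets. The 70/15/15 proportions and the fabricated examples are illustrative assumptions, not a prescribed recipe.

```python
import random

# A dataset is a collection of examples: (features, label) pairs.
# Here we fabricate 100 toy examples purely for illustration.
dataset = [([random.random(), random.random()], random.choice([0, 1]))
           for _ in range(100)]

random.shuffle(dataset)  # shuffle so each split is representative

# One common (but not universal) split: 70% train, 15% validation, 15% test.
n = len(dataset)
train_set = dataset[: int(0.70 * n)]
val_set   = dataset[int(0.70 * n): int(0.85 * n)]
test_set  = dataset[int(0.85 * n):]

print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```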
The test set is a set of examples that we hold out until the very end of the model development process, to double-check the model's generalization performance on examples that are completely unseen during any aspect of model development. We can evaluate our final learned model on the test set to get the most accurate assessment of expected generalization performance.

To summarize, we've defined: one, the training set, which we expose the model to in order to learn function parameters. Two, the validation set, which we do not allow the model to learn from, but which we use periodically to estimate generalization performance and to choose meta-level hyperparameters of the learning program. And finally, three, the test set, which we hold out until the very end of model development and run our final model on once, to get an accurate test of expected model performance on completely unseen data.

Now, with these three datasets in mind, let's see how they're used in practice. Typically, model training takes the form of a training loop. The training loop refers to a repeated training procedure that gives the model several chances to learn good, generalizable functions from the training set. The training loop is pretty simple. We start with the program. The program sets up the training environment with a selection of hyperparameters, and initializes the model with a random function. At this point, we set the model's best performance to be zero. We can then expose the model to examples from the training set, in order to learn a function from inputs to outputs. We evaluate how the function does on the validation set. If the model gets better performance than it ever has before, we save this version of the model. And then we can repeat the second and third steps, training the model and assessing the validation performance, until the performance on the validation set no longer goes up. Typically, we'll repeat the above for various hyperparameter settings. This is known as hyperparameter tuning, and different hyperparameters can produce different models. Now, once we're satisfied with the model's performance on the validation set, we can run this final model once on the test set, in order to doubly confirm the model's performance on completely unseen examples. (We'll see a toy sketch of this loop in code shortly.)

This training loop is at the heart of the machine learning process. A lot of the machine learning principles that we'll continue to talk about in this course will focus on, one, different kinds of machine learning models that are common for this training procedure, and two, high-level design decisions that we can make to produce high-performing models.

>> Let's pause here for an important note. Don't worry at all if some of this seems unfamiliar, or even a bit intimidating, depending on your prior background. We will be covering these concepts many, many times in the coming lessons, with plenty of medical examples. Our not-so-secret plan (now that I'm telling you, it's really not secret anymore) is that by introducing many terms and concepts up front, then revisiting them over subsequent lessons with richer examples and longer discussions, all of this will start to come together for you and become more and more familiar, and hopefully, ideally, even intuitive. It can definitely seem like a lot depending on your background, but we are confident that we've done this in a way that, by the end, you'll feel like a total pro. And yes, you'll have to trust us on this, but since we've spent all this time together, we aren't strangers anymore, right?
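As promised, here's that toy sketch of the training loop. The "model" here is just a single threshold on a one-dimensional feature, and "training" is a crude hill-climbing step; these are deliberate simplifications, fabricated purely so the shape of the loop stays visible, not a real learning algorithm.

```python
import random
random.seed(0)

# Toy dataset: one feature per example; the true rule is "label is 1 if x > 0.6".
data = [(x, int(x > 0.6)) for x in (random.random() for _ in range(300))]
random.shuffle(data)
train_set, val_set, test_set = data[:200], data[200:250], data[250:]

def accuracy(threshold, examples):
    """Fraction of examples where 'predict 1 if x > threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in examples) / len(examples)

threshold = random.random()              # initialize the model with a random function
best_score, best_model = 0.0, threshold  # set the model's best performance to zero

for step in range(100):                  # the repeated training procedure
    candidate = threshold + random.uniform(-0.05, 0.05)
    if accuracy(candidate, train_set) >= accuracy(threshold, train_set):
        threshold = candidate            # "learn" from the training set
    score = accuracy(threshold, val_set) # validate; no learning happens here
    if score > best_score:
        best_score, best_model = score, threshold  # save the best version so far

# Only once, at the very end, do we look at the test set.
print("best validation accuracy:", best_score)
print("test accuracy:", accuracy(best_model, test_set))
```

In a real system, hyperparameter tuning would wrap this whole loop: repeat it for different hyperparameter settings, compare validation performance across runs, and only touch the test set once at the very end.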
So let's move on. We've talked before about examples in a dataset, and how datasets are split up for the sake of training machine learning models. But what do these examples really look like? Recall that an example is simply an input-output pair. Alternatively, we also said that the input can be described as a set of features, and the output as the label. When the model has learned from many examples, or input-output pairs, it can then take a new input and give a new output. That output by the model is typically called the prediction.

And as we covered earlier, features are the pieces of information given to the model in order to make a prediction. It's the data, the stuff that we give the model so that it can give us the information we want as output back to us. Again, the input features could be structured data, for example a patient's lab values or diagnosis codes, etc. These sorts of inputs, structured data, are also commonly used with the more traditional statistical models you may be familiar with. Structured data is commonly input into the model as a list, or vector, of numbers. The input features can also be unstructured data. Unstructured data are things like images, or natural language in text reports, for example.

While it may be intuitive how structured data can be mathematically used by a model or program (after all, they're numbers), how can unstructured data or features be mathematically used by a model? That's a great question, I'm glad you asked [LAUGH]. In the case of medical images, they're typically represented as grids of numbers, where each number represents the intensity at a given pixel location. In grayscale images, there is only one grid. In color images, there are actually three grids of numbers overlaid on top of each other, the red, the green, and the blue grids, each of them representing the intensity of its respective color at those pixel locations.

In the case of free text, blocks of text are typically represented with what are called word embeddings. Word embeddings are geometric, numerical vector representations of words. We can think about this intuitively in a two-dimensional geometric space. If you were to imagine placing words as objects in front of you on a flat surface, you might see that certain words are related and clustered together. In medical texts, those might be cough and fever, while a word like cholesterol would sit somewhat further away on this little grid, perhaps near triglycerides, for example. The corresponding numerical coordinates of each word, in other words its vector representation, are what is known as its embedding. This is what mathematically represents and relates these words to one another with numbers, similar to the way that pixels must be represented as numbers. Of course, we can also use word embeddings in spaces of higher dimension than two, but the two-dimensional case is easiest to understand from an intuitive perspective. Word embedding representations allow a model to interpret words as numbers, and therefore as features, so they can be used as input for a model. (We'll see a small numerical sketch of both of these ideas below.)

And so, returning to the concept of labels, which we will cover more throughout this course: the label is, again, the output or prediction, and could be a diagnosis, or an outcome, or really anything you want the model to give you back for a given type of input, or features.
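To make the pixel-grid and word-embedding ideas concrete, here's a small Python sketch. The pixel values and the two-dimensional embedding coordinates are invented purely for illustration; real embeddings are learned from large text corpora and typically have hundreds of dimensions.

```python
import math

# A tiny 3x3 grayscale "image": a single grid of pixel intensities (0-255).
# A color image would be three such grids (red, green, blue) stacked together.
grayscale_image = [
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 128],
]

# Toy 2-D word embeddings: each word becomes a pair of coordinates on a "flat surface".
embeddings = {
    "cough":         (1.0, 1.2),
    "fever":         (1.1, 1.0),
    "cholesterol":   (5.0, 4.8),
    "triglycerides": (5.2, 5.1),
}

def distance(word_a, word_b):
    """Euclidean distance between two word vectors; smaller means more related."""
    (ax, ay), (bx, by) = embeddings[word_a], embeddings[word_b]
    return math.hypot(ax - bx, ay - by)

print(distance("cough", "fever"))                 # small: clustered together
print(distance("cough", "cholesterol"))           # large: far apart on the grid
print(distance("cholesterol", "triglycerides"))   # small: clustered together
```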
The features, or input, could be structured data like I've mentioned, such as lab values, or unstructured data like X-ray images or free-text reports. With many examples of the input features and output label, a model can eventually learn a function to reliably produce the label when given new, unlabeled input features. Said another way, when a model is given a set of features, a trained model should ideally produce an output that matches the features' corresponding label. (There's a minimal end-to-end sketch of this just below.) All right, so that's a lot, and I want you to feel free to review these first few videos a couple of times to get the hang of all this new vocabulary, if it still seems unfamiliar to you. But again, don't worry, we're going to cover a lot more of this next.
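And here is that minimal end-to-end sketch. It assumes scikit-learn is installed; the lab values and labels are fabricated purely for illustration, and logistic regression is an arbitrary choice of model here, not the course's prescribed method.

```python
# A minimal fit-then-predict sketch on fabricated structured data.
from sklearn.linear_model import LogisticRegression

# Features: structured data as vectors of numbers (say, two lab values each).
X_train = [[90, 1.0], [150, 2.3], [95, 1.1], [160, 2.8], [88, 0.9], [170, 3.0]]
y_train = [0, 1, 0, 1, 0, 1]     # labels: 0 = healthy, 1 = disease (made up)

model = LogisticRegression()
model.fit(X_train, y_train)      # learn a function from many labeled examples

# Given new, unlabeled features, the trained model produces predictions
# that should ideally match the (unknown) true labels.
print(model.predict([[92, 1.05], [165, 2.9]]))   # likely [0 1] on this toy data
```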