We've said before that the data you have can determine the machine learning problem you work on. But how far does that go? In this video, we're going to introduce the No Free Lunch Theorem and explain its real-world consequences for your machine learning process. Machine learning is all about performing a task: classification, regression, whatever. The idea is that with more experience, the machine will become better at that task with respect to a given evaluation criterion. For a machine, experience means data, specifically, examples of events describing the relevant task. For instance, if you want your machine to classify patients as sick or not based on their clinical health reports, then experiences are simply instances of sick or healthy patients along with their health records. As we saw earlier, a QuAM that predicts sick versus healthy patients uses such examples to become better at the prediction task. A large set of these examples makes a dataset, which can be used to train such QuAMs. These examples are generated through some form of data generating process. The data generating process in our example is the real world, recording information about real patients and their outcomes. But in general, data is generated by some process, and those processes can be described by a probability distribution, whether we know what the distribution is or not. If we do know the probability distribution that's generating our data, perfect. That's our QuAM. We can ask questions about the probabilities of events and get answers directly from that model. But most of the time, and definitely in all interesting cases, we don't have an absolute reference for what the best probability distribution actually is.
However, if we have enough data coming from this distribution, we can estimate it. Depending on the data quality and volume, as well as how well behaved the underlying probability distribution truly is, we can use the data to train a QuAM that can reasonably accurately predict new instances generated from the same distribution. As long as our training data and new examples are generated from the same distribution, the more data we have, the more accurate a picture we can construct. So why am I repeating the whole "data is the basis of prediction" idea in this way? Because the critical piece is that the machine learning prediction task we're trying to solve depends on the specific data generating distribution that underlies the task. That's what gives us the training data. That's what the learning algorithm is trying to model. Now remember, we have many different types of learning algorithms in our selection of ML techniques. So the question we should be asking is whether we can expect the same results from any old learning algorithm when we're training a QuAM for a particular task, which is itself tied to some specific data generating distribution, or whether there is a learning algorithm that works uniquely well for that particular task. In 1996, computer scientist David Wolpert introduced a result called the No Free Lunch Theorem, which answers this question. What the No Free Lunch Theorem states is that there is no universal learning algorithm which works better than every other on all machine learning prediction tasks or application domains. Remember, prediction tasks and domains are in principle defined by the data generating distribution or distributions underlying them. So the No Free Lunch Theorem says that, when averaged over all possible data generating distributions, every learning algorithm has the same performance, including the stupidest ones: the ones that always say no or yes, or the ones that give completely arbitrary answers.
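The idea that more data gives a more accurate picture of the underlying distribution can be sketched in a few lines of code. This is a minimal illustration, not from the video: it assumes a made-up Bernoulli data generating process where each patient is sick with some fixed true probability, and estimates that probability from samples alone. The function name and the 0.3 "true rate" are purely hypothetical.

```python
import random

def estimate_sick_rate(n_samples, true_rate=0.3, seed=0):
    """Draw n_samples patients from a Bernoulli 'data generating process'
    (hypothetical true_rate chosen for illustration), then estimate that
    rate from the observed data alone."""
    rng = random.Random(seed)
    samples = [1 if rng.random() < true_rate else 0 for _ in range(n_samples)]
    return sum(samples) / n_samples

# With more samples, the estimate tends to sit closer to the true rate.
small = estimate_sick_rate(50)
large = estimate_sick_rate(50_000)
print(f"estimate from 50 samples:     {small:.3f}")
print(f"estimate from 50,000 samples: {large:.3f}")
```

Of course, real data generating processes are far more complex than a single coin flip, but the same principle holds: given enough representative data from the distribution, we can recover a usable approximation of it.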
But wait a second here. We know that some learning algorithms perform better than others. You've seen it yourself in Course 2. So what's going on? The key is in the word universal. All possible data generating distributions is a much larger category than all the problems we care about, because it includes arbitrary, constant, and useless distributions, along with those elusive distributions that actually describe the phenomena we find value in, the ones we actually want to model. According to the No Free Lunch Theorem, there's no universal solution that we can promise performs best even on everything we care about, because we don't really know what the underlying distributions are for everything we care about. So the upshot is, a learning algorithm which works well on one task may not work as well on another one compared to a different learning algorithm. There's no universally best model or algorithm. Now that we know about the No Free Lunch Theorem, how does that change the way we train QuAMs? Well, there are two key lessons. First, since there are no universal learners, we have to try different learning algorithms to identify the best one for a given task. The second key takeaway is that for us to make sure the selected algorithm is the best for the task at hand, we need a large enough dataset to represent the data generating distribution of that task. If we don't have enough data to represent the task, there's a good chance we could pick the wrong learning algorithm. Not only that, it has to be quality data. You'll learn more about some of these aspects in the coming videos. So now you're ready to dazzle your friends with the mathematical reality that there truly is no free lunch. In the context of machine learning, there will always exist tasks which cause problems for any given learning algorithm. But fairly reliably, we can make good choices, because we can choose the algorithms that perform best on the tasks we care about.
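The first lesson, trying different learning algorithms and picking the best for the task, can be sketched concretely. This toy example is an assumption-laden illustration, not the course's method: it invents a data generating process with one clinical measurement (readings above 0.6 mean sick, plus a little label noise), then compares three candidate "learners", including a stupid always-no learner, on held-out data.

```python
import random

rng = random.Random(42)

# Hypothetical data generating process: one clinical measurement in [0, 1);
# patients with a reading above 0.6 are sick, with 5% label noise.
def draw_dataset(n):
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.6 else 0
        if rng.random() < 0.05:
            y = 1 - y
        data.append((x, y))
    return data

train, test = draw_dataset(500), draw_dataset(500)

# Three candidate "learning algorithms": each takes training data
# and returns a prediction function.
def learn_always_no(train):
    return lambda x: 0                      # one of the "stupidest" learners

def learn_majority(train):
    majority = round(sum(y for _, y in train) / len(train))
    return lambda x: majority               # always predict the common class

def learn_threshold(train):
    # Pick the cutoff that best separates the training data.
    best_t, best_acc = 0.0, -1.0
    for t in [i / 100 for i in range(101)]:
        acc = sum((1 if x > t else 0) == y for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x, t=best_t: 1 if x > t else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# Compare all candidates on held-out data and pick the winner for THIS task.
results = {name: accuracy(learner(train), test)
           for name, learner in [("always-no", learn_always_no),
                                 ("majority", learn_majority),
                                 ("threshold", learn_threshold)]}
for name, acc in results.items():
    print(f"{name:>9}: {acc:.2f}")
```

On this particular distribution the threshold learner wins, but that is a fact about this task, not about the algorithm in general; on a distribution where the label is unrelated to the measurement, the simpler learners would do just as well, which is the No Free Lunch point.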
Since you've been taking these courses, you have a nice, specific, and well-defined task that you're going to solve. So next week we're going to dive into the data: how to get to know it and use it to best effect.