Reinforcement learning is actually not that old. When I started back in the mid-2000s, it was a lot different than it is now. There were only a handful of labs around the world that focused on reinforcement learning. Almost nobody was using RL in industry, and the practical successes of RL were limited to a handful of applications. Reinforcement learning could fly a helicopter upside down better than any human could. RL methods could beat the world's best backgammon players, and schedule elevators pretty well. But RL was not the best way to get robots to do useful things, and the idea that a Q-learning agent could play video games better than people was almost unimaginable.

Yet we held out hope, because we thought reinforcement learning was the best way to make progress towards AI. The promise of RL was almost intoxicating: that a learning agent could figure out how the world works simply by trying things and seeing what happens. After all, that's what we think people and animals do. So why not simulate that in a computer? If we could do it, then it would be useful in almost any application, and as a nice side effect, we might learn a good deal about how our own minds work too.

You might wonder what the difference is between supervised learning, unsupervised learning, and reinforcement learning. The differences are quite simple. In supervised learning, we assume the learner has access to labeled examples giving the correct answer. In RL, the reward gives the agent some idea of how good or bad its recent actions were. You can think of supervised learning as requiring a teacher who helps you by telling you the correct answer. A reward, on the other hand, is like having someone who can identify what good behavior looks like but can't tell you exactly how to do it. Unsupervised learning sounds like it could be related, but really has a very different goal. Unsupervised learning is about extracting underlying structure in data. It's about the data representation. It can be used to construct representations that make a supervised or RL system better. In fact, as you'll see later in this course, techniques from both supervised learning and unsupervised learning can be used within RL to aid generalization.

In RL, we focus on the problem of learning while interacting with an ever-changing world. We don't expect our agents to simply compute a good behavior and then execute that behavior in an open-loop fashion. We expect our agents to get things wrong and to refine their understanding as they go. The world is not a static place: we get injured, the weather changes, we encounter new situations, and our goals change. An agent that immediately integrates its most recent experience should do well, especially compared with ones that attempt to perfectly memorize how the world works.

The idea of learning online is extremely powerful and is a defining feature of RL. The way we introduce concepts is dictated by this fact. For example, we introduce the ideas of bandits and exploration first, and bring in ideas from supervised learning later. This might feel backwards to you if you have a background in machine learning. But as you'll see, the hard part is learning online rather than just learning from data. Getting comfortable with the online setting requires a new perspective.

Today, the field of reinforcement learning feels like it's changing at a breakneck pace. There are almost weekly posts about new algorithmic developments and improvements in state-of-the-art performance on benchmark domains.
Search companies, online retailers, and hardware manufacturers are exploring RL solutions for their day-to-day operations. They are convinced that online learning will make them more efficient, save them money, and keep humans out of dangerous situations. I would never have guessed that there would be companies dedicated to the study and application of reinforcement learning. It's really quite amazing.

However, the more quickly things change, the more important it is to focus on the fundamentals. Some of the ideas in RL can be traced as far back as Pavlov and his drooling dogs. Take almost any modern RL system and look closer: it's built from ideas and algorithms that pre-date this specialization by a decade or two. For example, at its heart DQN combines Q-learning, neural networks, and experience replay. Getting good performance out of these systems, however, requires significant innovation. We would never downplay the importance of that work. In fact, we want you to come to understand these complex modern learning systems. To do so, you first need to understand the basics. That's where this course starts: with the fundamentals.

In this specialization, we will cover most of the main ideas used in modern RL systems. By the end, you'll implement a neural network learning system to solve an infinite-state control task. But we will start with small problems. We'll spend time discussing the fundamental challenges in reinforcement learning and our best ideas for solving them. The problem of sequential decision making represents one of the greatest prizes of our generation. It seems appropriate to take our time and get the details right.

We begin our study with a special case of the reinforcement learning problem called multi-armed bandits. The agent must decide which choice generates the best outcome or reward on average. This simpler instance of the full RL problem is perhaps the best setting to understand and solve challenges fundamental to RL. This module provides an introduction to estimating values, incremental learning, exploration, non-stationarity, and parameter tuning. We continue to develop and combine these ideas in different ways over the next four courses. Let's get started.
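To give a concrete taste of those ideas before the module begins, here is a minimal sketch of a multi-armed bandit agent. It is an illustration under assumed settings (the arm means, epsilon, and step size below are made-up parameters, not the course's reference implementation): the agent keeps one value estimate per arm, updates that estimate incrementally after each pull, and explores with a simple epsilon-greedy rule. A constant step size is used so the estimates keep adapting if the rewards drift, which is one simple way to handle non-stationarity.

```python
import random

# Illustrative epsilon-greedy bandit agent (a sketch, not the course's code).
# One value estimate per arm, updated incrementally after every pull.

def run_bandit(true_means, num_steps=1000, epsilon=0.1, step_size=0.1, seed=0):
    rng = random.Random(seed)
    k = len(true_means)
    q_estimates = [0.0] * k      # incremental estimate of each arm's average reward
    total_reward = 0.0

    for _ in range(num_steps):
        # Explore with probability epsilon, otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(k)
        else:
            action = max(range(k), key=lambda a: q_estimates[a])

        # Reward is noisy around the chosen arm's (unknown to the agent) true mean.
        reward = rng.gauss(true_means[action], 1.0)
        total_reward += reward

        # Incremental update: move the estimate toward the newly observed reward.
        # A constant step size keeps tracking if the true means change over time;
        # a 1/n step size would instead converge to the exact sample average.
        q_estimates[action] += step_size * (reward - q_estimates[action])

    return q_estimates, total_reward


if __name__ == "__main__":
    estimates, total = run_bandit([0.2, 0.5, 1.0])
    print("estimated values:", [round(q, 2) for q in estimates])
    print("total reward:", round(total, 1))
```

Running this with different values of epsilon and the step size is exactly the kind of parameter tuning the module discusses: too little exploration and the agent can lock onto a mediocre arm; too much and it wastes pulls on arms it already knows are poor.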