Reinforcement learning is a hard topic. It seems every week I hear about a new algorithm or new application of reinforcement learning. On the other hand, the fundamentals of RL are much the same. State-of-the-art learning systems are often constructed from a few well-known ideas, which are actually not that complicated. Take DQN, for example: this learning system combines Q-learning, epsilon-greedy action selection, neural network function approximation, and a few other ideas to achieve superhuman scores in Atari games. In this one learning system, we see many of the most common building blocks of RL systems.

This course will closely follow Sutton and Barto's new edition of Reinforcement Learning: An Introduction. The first edition trained two generations of RL researchers, including Martha and myself. The RL book and this specialization adhere to a simple principle: introduce each idea in the simplest setting in which it arises.

In this spirit, we begin our study with multi-armed bandit problems. Here, we get our first taste of the complexities of incremental learning, exploration, and exploitation. After that, we move on to Markov decision processes to broaden the class of problems we can solve with reinforcement learning methods. Here we will learn about balancing short-term and long-term reward. We will introduce key ideas like policies and value functions, used in almost all RL systems. We conclude Course 1 with classic planning methods called dynamic programming. These methods have been used in large industrial control problems and can compute optimal policies given a complete model of the world.

In Course 2, we build on these ideas and design algorithms for learning without a model of the world. We study three classes of methods designed for learning from trial-and-error interaction. We start with Monte Carlo methods and then move on to temporal difference learning, including Q-learning.
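To make the bandit setting mentioned above concrete, here is a minimal sketch of epsilon-greedy action selection with incremental sample-average value estimates. The arm means, the value of epsilon, and the step count are illustrative assumptions for this sketch, not values from the course:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Run an epsilon-greedy agent on a k-armed bandit.

    true_means are hypothetical expected rewards for each arm; rewards
    are drawn with Gaussian noise around them. Value estimates are kept
    with the incremental sample-average update.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated value of each arm
    n = [0] * k     # number of times each arm was pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit: best estimate
        reward = rng.gauss(true_means[a], 1.0)      # noisy observed reward
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]              # incremental mean update
        total_reward += reward
    return q, total_reward

q, total = epsilon_greedy_bandit([0.2, 0.5, 1.0])
```

The incremental update `q[a] += (reward - q[a]) / n[a]` keeps a running average without storing past rewards, which is the same "learn as you go" flavor the bandit portion of the course is about.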
We conclude Course 2 with an investigation of methods for planning with learned models.

In Course 3, we leave the relative comfort of small finite MDPs and investigate RL with function approximation. Here we will see that the main concepts from Courses 1 and 2 transfer to problems with large or infinite state spaces. We will cover feature construction, neural network learning, policy gradient methods, and other particularities of the function approximation setting. You'll notice that this specialization starts out with basic material, but by the end of Course 3, you'll see how much we build on simple concepts introduced earlier on. You'll finish Course 3 and wonder how it is you learned so much, when it all seemed pretty simple at the time.

The final course in this specialization brings everything together in a Capstone project. Throughout this specialization, as in Rich and Andy's book, we stress a rigorous and scientific approach to RL. We conduct numerous experiments designed to carefully compare algorithms. It takes careful planning and a lot of hard work to produce meaningful empirical results. In the Capstone, we will walk you through each step of this process so that you can conduct your own scientific experiment. We will explore all the stages, from problem specification all the way to publication-quality plots. This is not just academic: in real problems, it's important to verify and understand your system. After that, you should be ready to test your own new ideas or tackle an exciting new application of RL in your job. We hope you enjoy the show half as much as we enjoyed making it for you.