Mastering Classic Reinforcement Learning Algorithms

This course is part of the "Foundations of Reinforcement Learning" Specialization

Instructor: Ashutosh Trivedi

5 modules

Get an overview of a subject and learn the fundamentals.

Intermediate level

Recommended experience

1 week to complete at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Formulate sequential decision-making problems as deterministic decision processes, Markov chains, and finite Markov decision processes.
Explain and apply core reinforcement-learning concepts, including discounting, value functions, policies, Bellman equations, and optimality.
Implement planning algorithms for finite Markov decision processes, including value iteration, policy iteration, and linear programming formulations.
Compare tabular reinforcement-learning algorithms, including bandits, Monte Carlo methods, temporal-difference learning, SARSA, and Q-learning.

Skills you'll gain

Reinforcement Learning
Machine Learning
Algorithms
Markov Model
Machine Learning Algorithms
Statistical Machine Learning
Model Optimization
Probability Distribution
Probability & Statistics
Artificial Intelligence and Machine Learning (AI/ML)
Sampling (Statistics)
Decision Intelligence
Applied Mathematics

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

June 2026

Assessments

6 assignments

Taught in English

Build your subject-matter expertise

When you enroll in this course, you are also enrolled in the "Foundations of Reinforcement Learning" Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 5 modules in this course

How can an agent learn to make good decisions through repeated interaction with an uncertain environment? This course introduces the mathematical and algorithmic foundations of classical reinforcement learning, with an emphasis on finite Markov decision processes and tabular methods.

The course begins with the simplest settings in which the central ideas are clearest: deterministic decision processes, discounted rewards, and Bellman optimality equations. It then introduces stochasticity through Markov chains and Markov decision processes, where learners study policies, value functions, expected discounted reward, and dynamic programming. With this foundation in place, the course turns to planning methods for known models, including value iteration, policy iteration, and linear programming formulations. The second half of the course studies reinforcement learning when the model is unknown and the agent must learn from sampled experience. Topics include multi-armed bandits, exploration and exploitation, Monte Carlo methods, temporal-difference learning, SARSA, Q-learning, and convergence principles. The course ends with a final assessment in which learners solve the same finite MDP from both model-based planning and model-free learning perspectives. By the end of the course, learners will be able to formulate finite decision-making problems as Markov decision processes, solve them using classical planning algorithms, and implement tabular reinforcement-learning algorithms from sampled data. This course provides the foundation for later study of deep reinforcement learning, reward programming, and trustworthy AI systems.

This course can be taken for academic credit as part of CU Boulder’s Master of Science in Computer Science (MS-CS) and Master of Science in Artificial Intelligence (MS-AI) degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals.
Learn more:
MS in Artificial Intelligence: https://www.coursera.org/degrees/ms-artificial-intelligence-boulder
MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder

This module introduces the modeling and optimization foundations for sequential decision-making in their simplest form: deterministic decision processes with discounted rewards. We begin with states, actions, transitions, and rewards as a language for representing decision problems over time. We then develop value functions and Bellman equations as tools for optimizing long-term return. The goal is to build intuition for why dynamic programming is correct in the simpler setting of deterministic decision processes before introducing stochastic transitions, learning from sampled experience, and bootstrapping in later modules.
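
To make the value-iteration idea concrete, here is a minimal Python sketch on a two-state deterministic decision process. The states (`prepared`, `unprepared`), actions (`study`, `relax`), reward numbers, and discount factor γ = 0.9 are hypothetical choices loosely inspired by the surprise-quiz example the course description mentions; they are not the course's own model. The loop repeatedly applies the Bellman optimality update V(s) ← max_a [r(s, a) + γ V(δ(s, a))] until the values stop changing, then extracts a greedy policy from the converged values.

```python
# Hypothetical two-state deterministic decision process (illustrative numbers only)
GAMMA = 0.9
STATES = ["prepared", "unprepared"]
ACTIONS = ["study", "relax"]


def next_state(state, action):
    # Deterministic transition: studying always leaves you prepared, relaxing never does
    return "prepared" if action == "study" else "unprepared"


def reward(state, action):
    # Assumed reward: +2 for being prepared, -1 effort cost for studying
    return (2.0 if state == "prepared" else 0.0) - (1.0 if action == "study" else 0.0)


def q_value(V, state, action):
    # One-step lookahead: immediate reward plus discounted value of the successor
    return reward(state, action) + GAMMA * V[next_state(state, action)]


def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in STATES}
    while True:
        # Bellman optimality update applied to every state at once
        new_V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    # Extract a greedy policy from the converged values
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}
    return V, policy


V, policy = value_iteration()
print({s: round(v, 4) for s, v in V.items()})  # {'prepared': 10.0, 'unprepared': 8.0}
print(policy)  # {'prepared': 'study', 'unprepared': 'study'}
```

With these made-up numbers, studying in both states is optimal: the one-time bonus from relaxing while prepared (reward 2 instead of 1) is outweighed by the discounted cost of then facing later steps unprepared (9.2 versus 10 from the `prepared` state).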

Includes

11 videos, 12 readings, 2 assignments

11 videos (69 minutes total)

Course Introduction (7 minutes)
Decision-Making over Time (3 minutes)
States, Actions, Transitions, and Rewards (2 minutes)
From Unfolded Decisions to State-Based Models (2 minutes)
Formal Definition of a Deterministic Decision Process (4 minutes)
Discounting Infinite Reward Streams (9 minutes)
Runs, Histories, Policies, and Values (9 minutes)
Discounted Optimality Equations (5 minutes)
Checking Values and Extracting Policies (5 minutes)
Why Bellman Equations Characterize Optimal Behavior (10 minutes)
Existence, Uniqueness, and Value Iteration (13 minutes)

12 readings (110 minutes total)

Earn Academic Credit for your Work! (10 minutes)
Course Support (10 minutes)
Assessment Expectations (5 minutes)
Sequential Decision-Making as Optimization (10 minutes)
States, Actions, Transitions, and Rewards (10 minutes)
Deterministic Decision Processes (10 minutes)
Discounting Infinite Reward Streams (10 minutes)
Policies, Runs, and Values (10 minutes)
Bellman Equations and Dynamic Programming (10 minutes)
Why Bellman Equations Characterize Optimal Behavior (10 minutes)
Existence, Uniqueness, and Value Iteration (10 minutes)
Module Summary (5 minutes)

2 assignments (50 minutes total)

AI Policy Quiz (5 minutes)
Deterministic Decision Processes (45 minutes)

This module adds stochasticity to the deterministic picture developed in the previous module. Learners continue with the surprise-quiz example, now with uncertain outcomes: studying usually helps but may not always help, and relaxing may reduce preparation but may not always do so. The module first introduces stochastic transitions as probability distributions over next states, then studies Markov chains as stochastic systems without choices, and finally adds actions to obtain Markov decision processes. The goal is to make expected discounted reward, policies, and Bellman equations feel like natural extensions of the deterministic setting.
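
Expected discounted reward in a Markov chain satisfies v = r + γPv, so it can be computed either by iterating that equation or by solving the linear system (I - γP)v = r directly. The sketch below does both for a hypothetical two-state chain (think of a student who always studies, where studying usually but not always leads to being prepared). All probabilities, per-state rewards, and γ = 0.9 are illustrative assumptions, not values from the course.

```python
# Hypothetical two-state Markov chain: a student who always studies (illustrative numbers)
GAMMA = 0.9
STATES = ["prepared", "unprepared"]

# P[s][t]: probability of moving from state s to state t
P = {
    "prepared": {"prepared": 0.8, "unprepared": 0.2},
    "unprepared": {"prepared": 0.6, "unprepared": 0.4},
}
# Assumed reward collected on each step spent in a state
R = {"prepared": 1.0, "unprepared": -1.0}


def evaluate_by_iteration(tol=1e-10):
    """Repeat v <- R + GAMMA * P v until the largest change drops below tol."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {s: R[s] + GAMMA * sum(p * v[t] for t, p in P[s].items()) for s in STATES}
        delta = max(abs(new_v[s] - v[s]) for s in STATES)
        v = new_v
        if delta < tol:
            return v


def evaluate_exactly():
    """Solve (I - GAMMA * P) v = R with Cramer's rule (valid for two states only)."""
    a = 1.0 - GAMMA * P["prepared"]["prepared"]
    b = -GAMMA * P["prepared"]["unprepared"]
    c = -GAMMA * P["unprepared"]["prepared"]
    d = 1.0 - GAMMA * P["unprepared"]["unprepared"]
    det = a * d - b * c
    return {
        "prepared": (R["prepared"] * d - b * R["unprepared"]) / det,
        "unprepared": (a * R["unprepared"] - c * R["prepared"]) / det,
    }


v_iter = evaluate_by_iteration()
v_exact = evaluate_exactly()
print({s: round(x, 4) for s, x in v_exact.items()})  # {'prepared': 5.6098, 'unprepared': 3.1707}
```

Both approaches should agree to within the iteration tolerance. The closed-form solve is only this short because the chain has two states; the iterative version is the one that mirrors the Bellman-equation view and carries over to larger chains.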

Includes

8 videos, 8 readings, 1 assignment

8 videos (70 minutes total)

Module Introduction (2 minutes)
From Deterministic to Stochastic Transitions (10 minutes)
Markov Chains (23 minutes)
Markov Decision Processes (7 minutes)
Policies and Values (8 minutes)
Checking Values and Extracting Policies (3 minutes)
Bellman Optimality Equations (7 minutes)
Why Bellman Optimality Equations Are Correct (9 minutes)

8 readings (70 minutes total)

From Deterministic to Stochastic Transitions (10 minutes)
Markov Chains (10 minutes)
Expected Discounted Reward in Markov Chains (10 minutes)
Markov Decision Processes (10 minutes)
Policies, Value Functions, and Expected Return (10 minutes)
Bellman Equations for MDPs (10 minutes)
Comparing DDPs, Markov Chains, and MDPs (5 minutes)
Module Summary (5 minutes)

1 assignment (45 minutes total)

Markov Chains and Markov Decision Processes (45 minutes)

This module focuses on known-model optimization. Learners use Bellman equations as computational tools for policy evaluation, policy improvement, value iteration, policy iteration, and linear programming formulations of discounted MDPs.
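
Policy iteration alternates two of the Bellman-based steps named above: evaluate the current policy, then improve it greedily with respect to the resulting values. The following is a minimal sketch, assuming a hypothetical two-state, two-action MDP whose probabilities, rewards, and γ = 0.9 are invented for illustration. Policy evaluation is done iteratively here rather than by solving a linear system, and the linear programming formulation is not shown.

```python
# Hypothetical two-state, two-action MDP (illustrative numbers only)
GAMMA = 0.9
STATES = ["prepared", "unprepared"]
ACTIONS = ["study", "relax"]


def transition(state, action):
    # Assumed next-state distribution; in this toy model it depends only on the action
    if action == "study":
        return {"prepared": 0.8, "unprepared": 0.2}
    return {"prepared": 0.1, "unprepared": 0.9}


def reward(state, action):
    # Assumed reward: +2 for being prepared, -1 effort cost for studying
    return (2.0 if state == "prepared" else 0.0) - (1.0 if action == "study" else 0.0)


def q_value(V, state, action):
    # Expected one-step backup: reward plus discounted expected successor value
    return reward(state, action) + GAMMA * sum(
        p * V[nxt] for nxt, p in transition(state, action).items()
    )


def evaluate(policy, tol=1e-10):
    # Iterative policy evaluation: apply the Bellman expectation update until stable
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: q_value(V, s, policy[s]) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            return V


def policy_iteration():
    policy = {s: "relax" for s in STATES}  # arbitrary starting policy
    while True:
        V = evaluate(policy)
        stable = True
        for s in STATES:
            best = max(ACTIONS, key=lambda a: q_value(V, s, a))
            # Greedy improvement; switch only on strict improvement to avoid cycling
            if q_value(V, s, best) > q_value(V, s, policy[s]) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration()
print(policy)  # {'prepared': 'study', 'unprepared': 'study'}
print({s: round(v, 4) for s, v in V.items()})  # {'prepared': 6.4, 'unprepared': 4.4}
```

Starting from the all-`relax` policy, one improvement step switches both states to `study`, and the next evaluation confirms that the policy is stable. Switching only on strict improvement is a common safeguard against cycling between equally good actions.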

Includes

9 videos, 8 readings, 1 assignment

9 videos (41 minutes total)

Module Introduction (2 minutes)
Planning Setup (4 minutes)
Policy Evaluation (6 minutes)
From Values to Better Policies (8 minutes)
The Bellman Optimality Operator (5 minutes)
Value Iteration as Fixed-Point Computation (6 minutes)
Alternating Evaluation and Improvement (5 minutes)
The Linear Programming View of Optimality (3 minutes)
Module Summary (2 minutes)

8 readings (75 minutes total)

Planning with a Known Model (10 minutes)
Policy Evaluation (10 minutes)
Policy Improvement (10 minutes)
The Bellman Optimality Operator (10 minutes)
Value Iteration (10 minutes)
Policy Iteration (10 minutes)
Linear Programming for Discounted MDPs (10 minutes)
Module Summary (5 minutes)

1 assignment (45 minutes total)

Dynamic Programming in MDPs (45 minutes)

This module begins the transition from planning to reinforcement learning. In planning, the MDP model is known and Bellman backups compute expectations exactly. In reinforcement learning, the model is replaced by sampled experience. Learners first view reinforcement learning as sample-based dynamic programming, then study rewards, uncertainty, agent–environment interaction, bandit estimation, exploration versus exploitation, Monte Carlo policy evaluation, and Monte Carlo control.
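
Bandit estimation replaces an exact expectation with a running sample average, and an exploration rule keeps every arm being sampled. The sketch below uses ε-greedy action selection, one common exploration rule (the course may present others), on hypothetical Bernoulli arms with success probabilities 0.2, 0.5, and 0.8. The function name `run_bandit` and all parameter values are illustrative assumptions.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on Bernoulli arms with incremental sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k  # estimated mean reward of each arm
    N = [0] * k    # number of pulls of each arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                    # explore: pick an arm uniformly
        else:
            arm = max(range(k), key=lambda i: Q[i])   # exploit: current best estimate
        # Sample a reward of 1 with probability true_means[arm], else 0
        r = 1.0 if rng.random() < true_means[arm] else 0.0
        N[arm] += 1
        Q[arm] += (r - Q[arm]) / N[arm]               # incremental sample average
        total_reward += r
    return Q, N, total_reward / steps


Q, N, avg_reward = run_bandit([0.2, 0.5, 0.8])
print([round(q, 2) for q in Q], N, round(avg_reward, 3))
```

The incremental update Q ← Q + (r - Q)/N is the same sample-average idea behind Monte Carlo policy evaluation, where the sampled quantity is the return of a whole episode rather than a single reward.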

Includes

9 videos, 11 readings, 1 assignment

9 videos (37 minutes total)

Module Introduction (2 minutes)
From Planning to Reinforcement Learning (3 minutes)
Rewards, Uncertainty, and Exploration (3 minutes)
The Agent–Environment Interface (3 minutes)
One-Armed Bandits (5 minutes)
Multi-Armed Bandits (5 minutes)
Monte Carlo Policy Evaluation (8 minutes)
Monte Carlo Control (6 minutes)
Module Summary (3 minutes)

11 readings (74 minutes total)

From Planning to Learning (10 minutes)
From Planning to Reinforcement Learning (10 minutes)
Rewards, Uncertainty, and Behavior (5 minutes)
The Agent–Environment Interaction Loop (5 minutes)
One-Armed Bandits (10 minutes)
Multi-Armed Bandits (10 minutes)
Monte Carlo Estimation (2 minutes)
Returns as Random Variables (5 minutes)
Monte Carlo Policy Evaluation (5 minutes)
Monte Carlo Control (10 minutes)
Module Summary (2 minutes)

1 assignment (45 minutes total)

Learning from Sampled Experience (45 minutes)

This module completes the tabular reinforcement-learning part of Course 1. Module 4 introduced sample-based learning through bandits and Monte Carlo methods. Module 5 introduces temporal-difference learning: updating after one sampled transition by combining an observed reward with a bootstrapped value estimate. The module ends by summarizing tabular reinforcement learning and motivating the transition to function approximation and deep RL.
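
The one-step bootstrapped update can be sketched with tabular Q-learning, the off-policy TD control method listed in this module. This is a minimal sketch under stated assumptions: the learner interacts with a hypothetical simulator of a two-state MDP (the `step` function, its probabilities, its rewards, and γ = 0.9 are invented for illustration), follows a uniformly random behavior policy, and uses a decaying step size α = 1/N(s, a)^0.7. The learner never reads the transition probabilities; it only sees sampled rewards and next states.

```python
import random

# Hypothetical two-state, two-action MDP used only as a simulator (illustrative numbers)
GAMMA = 0.9
STATES = ["prepared", "unprepared"]
ACTIONS = ["study", "relax"]


def step(rng, state, action):
    """Sample (reward, next_state); the learning agent never reads these probabilities."""
    r = (2.0 if state == "prepared" else 0.0) - (1.0 if action == "study" else 0.0)
    p_prepared = 0.8 if action == "study" else 0.1
    next_state = "prepared" if rng.random() < p_prepared else "unprepared"
    return r, next_state


def q_learning(steps=200_000, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    visits = {(s, a): 0 for s in STATES for a in ACTIONS}
    state = "prepared"
    for _ in range(steps):
        action = rng.choice(ACTIONS)  # uniformly random behavior policy (off-policy learning)
        r, next_state = step(rng, state, action)
        visits[(state, action)] += 1
        alpha = 1.0 / visits[(state, action)] ** 0.7  # decaying step size
        # TD target: observed reward plus bootstrapped value of the best next action
        target = r + GAMMA * max(Q[(next_state, b)] for b in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
    greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
    return Q, greedy


Q, greedy = q_learning()
print(greedy)
print({k: round(v, 2) for k, v in Q.items()})
```

For this particular made-up MDP, solving the Bellman optimality equations with the model gives optimal action values of 6.4 for (`prepared`, `study`) and 4.4 for (`unprepared`, `study`), with `study` best in both states; the learned table should approach those values from samples alone. Replacing the `max` in the target with the value of the next action the agent actually takes would turn this into SARSA, an on-policy method.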

Includes

8 videos, 9 readings, 1 assignment

8 videos (33 minutes total)

Learning before the Episode Ends (4 minutes)
One-Step Bootstrapped Prediction (5 minutes)
On-Policy Temporal-Difference Control (4 minutes)
Off-Policy Temporal-Difference Control (5 minutes)
What Policy Is Being Learned? (4 minutes)
Smoother Targets and Overestimation (3 minutes)
Reducing Maximization Bias (3 minutes)
Between Monte Carlo and One-Step TD (4 minutes)

9 readings (39 minutes total)

Why Temporal-Difference Learning? (5 minutes)
TD(0) Policy Evaluation (5 minutes)
On-Policy TD Control (5 minutes)
Q-Learning: Off-Policy TD Control (5 minutes)
On-Policy and Off-Policy Learning (2 minutes)
Expected SARSA and Maximization Bias (5 minutes)
Double Q-Learning (5 minutes)
n-Step TD (2 minutes)
Why Move Beyond Tabular Methods? (5 minutes)

1 assignment (45 minutes total)

Control, Exploration, and Tabular RL Algorithms (45 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Ashutosh Trivedi

University of Colorado Boulder

3 courses, 60 learners

Offered by

University of Colorado Boulder

Explore more from Algorithms

Deep Reinforcement Learning: From Theory to Practice
University of Colorado Boulder, course, credit offered

Reward Programming: Optimizing RL Efficiency and Safety
University of Colorado Boulder, course, credit offered

Frequently Asked Questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.