Deep Reinforcement Learning: From Theory to Practice

This course is part of the "Foundations of Reinforcement Learning" Specialization

Instructor: Ashutosh Trivedi

6 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

1 week to complete, at under 10 hours per week

Flexible schedule

Learn at your own pace

What you'll learn

Explain how neural-network-based function approximation extends reinforcement learning beyond finite tabular settings.
Implement and evaluate value-based deep reinforcement learning algorithms, including Deep Q-Networks and stabilizing techniques.
Derive and implement policy-gradient methods, including REINFORCE, baselines, and advantage-based updates.
Explain and analyze actor–critic methods that combine policy optimization with value estimation.

Skills you'll gain

Deep Learning
Machine Learning Methods
Agentic Systems
Model Optimization
Machine Learning Algorithms
Artificial Intelligence
Model Training
System Design and Implementation
Applied Machine Learning
Machine Learning
Artificial Neural Networks
Algorithms
Reinforcement Learning
Model Evaluation

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

June 2026

Assessments

7 assignments

Taught in English

Build your subject-matter expertise

This course is part of the "Foundations of Reinforcement Learning" Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 6 modules in this course

How can reinforcement learning scale beyond small tabular problems to high-dimensional environments such as games, robotics, and autonomous decision-making? This course introduces deep reinforcement learning, where reinforcement-learning algorithms are combined with neural-network-based function approximation.

Learners begin by studying why tabular methods break down in large or continuous state spaces and how value functions, action-value functions, and policies can be represented by parameterized models. The course then develops value-based deep reinforcement learning methods, including fitted value iteration, Deep Q-Networks, replay buffers, target networks, Double DQN, dueling networks, and prioritized experience replay. Learners also study direct policy optimization through policy-gradient methods such as REINFORCE, as well as actor–critic methods that combine policy optimization with value estimation. The course introduces selected modern deep RL algorithms, such as PPO, DDPG, and SAC, with emphasis on implementation, stability, diagnosis, and empirical evaluation. By the end of the course, learners will be able to implement deep reinforcement-learning agents, diagnose common sources of instability, evaluate learned behavior using suitable experimental protocols, and report results in a reproducible way.

This course can be taken for academic credit as part of CU Boulder’s Master of Science in Computer Science (MS-CS) and Master of Science in Artificial Intelligence (MS-AI) degrees offered on the Coursera platform. These fully accredited graduate degrees offer targeted courses, short 8-week sessions, and pay-as-you-go tuition. Admission is based on performance in three preliminary courses, not academic history. CU degrees on Coursera are ideal for recent graduates or working professionals. Learn more:

MS in Artificial Intelligence: https://www.coursera.org/degrees/ms-artificial-intelligence-boulder
MS in Computer Science: https://coursera.org/degrees/ms-computer-science-boulder

This module introduces function approximation as the transition point from tabular reinforcement learning to deep reinforcement learning. In Course 1, we represented values explicitly using tables: V(s), Q(s, a). This works when the state and action spaces are small enough to enumerate. But many reinforcement-learning problems have large, continuous, high-dimensional, or image-like observations. In such settings, tables are not enough. Course 2 replaces tables with parameterized functions: Vθ(s), Qθ(s, a), πθ(a | s). The parameter vector θ may represent a linear model, a neural network, or another differentiable function class. The central question of this module is: How do we learn value functions when tables are too large? The module also explains why deep RL is not merely supervised learning applied to RL data. The targets are noisy, bootstrapped, policy-dependent, and often moving as the parameters change. These difficulties come to a head in the deadly triad: the combination of function approximation, bootstrapping, and off-policy learning. The module ends with fitted value iteration as a bridge from tabular value iteration to deep Q-learning.
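To make the step from a table to Vθ(s) concrete, the following is a minimal sketch of semi-gradient TD(0) with a linear value function. It is not taken from the course materials: the five-state random walk, the two-component feature map, and all function names and hyperparameters are assumptions chosen only for illustration.

```python
import random

def features(state, n_states):
    """Hypothetical feature map: a bias term plus the normalized state index."""
    return [1.0, state / (n_states - 1)]

def value(theta, phi):
    """Linear value estimate V_theta(s) = theta . phi(s)."""
    return sum(t * x for t, x in zip(theta, phi))

def semi_gradient_td0(n_states=5, episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Semi-gradient TD(0) on an assumed random walk.

    The agent starts in the middle state and moves left or right with equal
    probability. Stepping off the right end yields reward 1, stepping off the
    left end yields reward 0, and both end the episode.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            phi = features(s, n_states)
            # Bootstrapped target: the next-state estimate is treated as a
            # constant, so no gradient flows through it (hence "semi-gradient").
            bootstrap = 0.0 if done else value(theta, features(s_next, n_states))
            td_error = reward + gamma * bootstrap - value(theta, phi)
            theta = [t + alpha * td_error * x for t, x in zip(theta, phi)]
            if done:
                break
            s = s_next
    return theta

theta = semi_gradient_td0()
print([round(value(theta, features(s, 5)), 2) for s in range(5)])
```

In this toy problem the true values happen to be linear in the state index, so two parameters are enough to represent them. With richer function classes the same update can become unstable, which is the concern the module raises.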

What's included

8 videos, 11 readings, 2 assignments

8 videos (38 minutes total)

Course Introduction (4 minutes)
When Tabular RL Breaks (6 minutes)
From Tables to Vθ and Qθ (6 minutes)
Losses, Gradients, and Semi-Gradient Updates (8 minutes)
Bootstrapping, Changing Targets, and Correlated Data (4 minutes)
Function Approximation, Bootstrapping, and Off-Policy Learning (4 minutes)
A Bridge from Value Iteration to Deep Q-Learning (5 minutes)
Module Summary (1 minute)

11 readings (70 minutes total)

Earn Academic Credit for your Work! (10 minutes)
Course Support (10 minutes)
Assessment Expectations (5 minutes)
Why Function Approximation (5 minutes)
Parameterized Value Functions (10 minutes)
Learning from Targets (10 minutes)
Monte Carlo TD Targets (5 minutes)
Why RL is Harder Than Supervised Learning (5 minutes)
The Deadly Triad (3 minutes)
Fitted Value Iteration (5 minutes)
Module Summary (2 minutes)

2 assignments (50 minutes total)

AI Policy Quiz (5 minutes)
Function Approximation for RL (45 minutes)

This module develops value-based deep reinforcement learning as bootstrapped regression. In the previous module, we replaced tabular value functions by parameterized functions: Vθ(s), Qθ(s, a), πθ(a | s). We also saw that function approximation changes the learning problem: values are no longer stored independently, targets can move as parameters change, and bootstrapped updates can become unstable. This module applies these ideas to deep action-value learning. We begin with fitted value iteration, which turns Bellman updates into regression problems. We then study Deep Q-Networks, or DQN, where a neural network represents Qθ(s, a). DQN combines Q-learning targets with two important stabilizers: replay buffers and target networks. Finally, we study common DQN variants: Double DQN, dueling networks, and prioritized replay. The goal is to understand DQN not as a mysterious deep-learning recipe, but as Q-learning plus function approximation, bootstrapped targets, replay, and stabilization.
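As a rough illustration of the bootstrapped targets described here, the sketch below computes the regression target for one transition under standard DQN and under Double DQN. The function names, argument layout, and plain-list Q-values are assumptions for readability. A real implementation would operate on batches of network outputs.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """DQN target: r + gamma * max_a Q_target(s', a), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network picks the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]

# Hypothetical Q-values for the next state under the two networks.
q_online_next = [1.0, 2.0, 0.5]
q_target_next = [1.5, 1.0, 3.0]

print(dqn_target(1.0, False, q_target_next, gamma=0.9))
print(double_dqn_target(1.0, False, q_online_next, q_target_next, gamma=0.9))
```

Here the two targets differ because the online network prefers action 1 while the target network's largest value sits on action 2. Separating action selection from action evaluation in this way is the usual motivation for Double DQN as a remedy for overestimated values.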

What's included

7 videos, 7 readings, 1 assignment

7 videos (30 minutes total)

Replacing the Table by a Neural Network (3 minutes)
Action Values as Neural Network Outputs (4 minutes)
Bellman Targets and Gradient Updates (4 minutes)
Learning From Stored Transitions (4 minutes)
Stabilizing Bootstrapped Targets (4 minutes)
Putting the Pieces Together (5 minutes)
Double DQN, Dueling Networks, and Practical Refinements (6 minutes)

7 readings (42 minutes total)

From Q-Learning to DQN (10 minutes)
Deep Q-Networks (5 minutes)
Replay Buffers (5 minutes)
Target Networks (5 minutes)
The DQN Algorithm (5 minutes)
DQN Improvements (10 minutes)
Module Summary (2 minutes)

1 assignment (45 minutes total)

Deep Q-Learning and Value-Based Deep RL (45 minutes)

This module introduces policy-gradient methods, a family of reinforcement-learning algorithms that optimize a parameterized policy directly rather than deriving behavior from a learned value function. Starting from the motivation for direct policy learning, the module develops the policy-gradient objective, the score-function trick that makes this objective differentiable from sampled experience, and REINFORCE, the foundational Monte Carlo policy-gradient algorithm. The module then introduces baselines as a practical variance-reduction technique and closes by motivating actor-critic methods as the natural next step once a learned baseline is introduced.
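The score-function update with a baseline can be sketched on the smallest possible problem, a two-armed bandit where every episode is a single action. This is an illustrative assumption rather than the course's own example: the arm rewards, the softmax parameterization over logits, the running-mean baseline, and all names and hyperparameters are hypothetical.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score(probs, action):
    """Gradient of log pi(action) with respect to the softmax logits."""
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

def reinforce_bandit(arm_rewards=(0.0, 1.0), steps=500, lr=0.1, seed=0):
    """REINFORCE-style updates on a bandit with a running-mean baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(arm_rewards)
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        ret = arm_rewards[action]          # one-step episode: return = reward
        advantage = ret - baseline         # baseline reduces variance
        grad = score(probs, action)
        # Gradient ascent on expected return: theta += lr * advantage * score.
        logits = [th + lr * advantage * g for th, g in zip(logits, grad)]
        baseline += (ret - baseline) / t   # running mean of observed returns
    return softmax(logits)

print(reinforce_bandit())
```

The score vector sums to zero for a softmax policy, so raising the probability of one action necessarily lowers the others. In multi-step problems the same update is applied at every time step, with the return from that step onward in place of the single reward.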

What's included

9 videos, 6 readings, 1 assignment

9 videos (75 minutes total)

Module Introduction (2 minutes)
Learning Policies Directly (7 minutes)
From Action Values to πθ(a | s) (9 minutes)
Expected Return as an Optimization Problem (8 minutes)
How To Differentiate Through Sampled Actions (13 minutes)
Monte Carlo Policy-Gradient Learning (13 minutes)
Making Policy Gradients Practical (11 minutes)
Why We Need Learned Critics (10 minutes)
Module Summary (1 minute)

6 readings (50 minutes total)

Direct Policy Optimization (5 minutes)
The Score-Function Trick (5 minutes)
Trajectory Distribution and Causality (10 minutes)
REINFORCE (10 minutes)
Baselines and Advantages (10 minutes)
From REINFORCE to Actor-Critic (10 minutes)

1 assignment (45 minutes total)

Policy Gradients and REINFORCE (45 minutes)

REINFORCE updates the policy directly from sampled Monte Carlo returns, but those returns are noisy — the same policy can produce wildly different outcomes from episode to episode. This module introduces actor–critic methods, which tame that variance by learning a second component, the critic, that estimates how good a state or action is and feeds that estimate back into the policy update as a baseline. Learners will see how subtracting a learned value function from the return produces an advantage signal, how that signal generalizes from the one-step TD error to the multi-step Generalized Advantage Estimator, and how actor and critic are jointly trained via separate policy and value losses. The module closes by tracing the conceptual line from basic actor–critic methods to PPO, motivating why controlling the size of policy updates matters for stable learning.
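The step from one-step TD errors to the Generalized Advantage Estimator can be sketched in a few lines. The version below assumes a single trajectory with no terminations in the middle, and the function name and argument conventions are illustrative choices, not the course's reference code.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    values[t] is the critic's estimate V(s_t). last_value is V(s_T) for the
    state after the final step (use 0.0 if the episode terminated there).
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better the outcome was than expected.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of current and future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages

rewards = [1.0, 1.0]
values = [0.5, 0.5]
print(gae_advantages(rewards, values, 0.0, gamma=1.0, lam=0.0))  # TD errors
print(gae_advantages(rewards, values, 0.0, gamma=1.0, lam=1.0))  # return minus value
```

Setting lam to 0 recovers the one-step TD error (lower variance, more bias from the critic), while lam equal to 1 recovers the Monte Carlo return minus the value estimate (less bias, higher variance). Intermediate values trade the two off.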

What's included

8 videos, 1 assignment

8 videos (69 minutes total)

Module Introduction (3 minutes)
From REINFORCE to Learned Critics (8 minutes)
Policies and Value Functions Together (8 minutes)
Learning from Better-than-Expected Outcomes (9 minutes)
Using TD Errors to Update the Policy (9 minutes)
Training the Two Networks (11 minutes)
Balancing Bias and Variance (11 minutes)
Why Stable Policy Updates Matter (10 minutes)

1 assignment (45 minutes total)

Actor-Critic Methods (45 minutes)

This module surveys modern deep reinforcement learning algorithms through the lens of stability, exploration, and continuous control. In the previous module, we studied policy-gradient and actor–critic methods. Vanilla policy-gradient updates can be brittle. If the policy changes too much after one update, the new policy may perform much worse than the old one, and the data collected under the old policy may no longer be reliable for updating the new one. This module studies three major algorithmic ideas. First, we study conservative policy updates through TRPO and PPO. The main idea is to improve the policy while preventing overly large policy changes. PPO implements this idea using a simple clipped surrogate objective. Second, we study DDPG, a deterministic actor–critic method for continuous-control problems. Third, we study SAC, an entropy-regularized actor–critic method that encourages exploration and often improves robustness.
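The clipped surrogate objective mentioned for PPO can be illustrated as follows. This is a minimal sketch over plain Python lists. The function name, the log-probability inputs, and the default clip range of 0.2 are assumptions for illustration, and a practical implementation would compute this on batched tensors and maximize it with an optimizer.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    new_logp and old_logp hold log pi(a_t | s_t) under the current policy and
    under the policy that collected the data, respectively.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - eps, 1 + eps] in the direction the advantage favors.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# A ratio of 1.5 with positive advantage is capped at 1.2 * advantage.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
# With negative advantage the same ratio is not clipped, so it is penalized fully.
print(ppo_clip_objective([math.log(1.5)], [0.0], [-1.0]))
```

The asymmetry in the two calls is the point of the construction: the objective stops rewarding further movement once the policy has changed enough in a beneficial direction, but it still penalizes changes that make a bad action more likely.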

What's included

8 videos, 4 readings, 1 assignment

8 videos (69 minutes total)

Module Introduction (3 minutes)
Stability, Continuous Actions, and Exploration (9 minutes)
Actor–Critic with Controlled Updates (9 minutes)
Probability Ratios and Clipping (9 minutes)
Continuous Control: Why Discrete-Action Methods Are Not Enough (7 minutes)
Deterministic Actor-Critic for Continuous Actions (12 minutes)
Learning Policies that Explore (10 minutes)
Soft Actor–Critic and Modern Deep RL Design (11 minutes)

4 readings (42 minutes total)

PPO: Proximal Policy Optimization (15 minutes)
DDPG: Deep Deterministic Policy Gradient (10 minutes)
SAC and Algorithm Comparison (15 minutes)
Module Summary (2 minutes)

1 assignment (45 minutes total)

Modern Deep RL: PPO, DDPG, and SAC (45 minutes)

This module turns deep reinforcement learning algorithms into implementation patterns. Earlier modules introduced the main algorithmic ideas: function approximation, DQN, policy gradients, actor–critic methods, PPO, DDPG, and SAC. This module asks how those ideas become working code. A deep RL implementation is not just a neural-network training loop. In supervised learning, the data are usually given in a fixed dataset. In reinforcement learning, the data are generated by an agent interacting with an environment. This means the implementation must manage environment interaction, exploration, neural-network models, optimizers, replay buffers or trajectory buffers, target networks, logging, evaluation, and reproducibility.
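One of the components listed above, the replay buffer, might look like the sketch below. The class name, the transition layout, and the seeded sampler are assumptions made for illustration. Seeding the buffer's own random generator is one simple way to support the reproducibility goal the module mentions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=0):
        # A deque with maxlen silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)
        # A private, seeded generator keeps sampling reproducible across runs.
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a batch as five parallel tuples, sampled without replacement."""
        batch = self._rng.sample(list(self._storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self._storage)

buffer = ReplayBuffer(capacity=3, seed=1)
for i in range(5):
    buffer.add(i, 0, float(i), i + 1, False)
print(len(buffer))       # only the three most recent transitions remain
print(buffer.sample(2))
```

Sampling uniformly from stored transitions breaks up the temporal correlation of consecutive environment steps. On-policy methods such as PPO would instead use a trajectory buffer that is cleared after each update.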

What's included

7 videos, 2 readings, 1 assignment

7 videos (47 minutes total)

Module Introduction (3 minutes)
How Algorithms Become Code (8 minutes)
Organizing Experience (9 minutes)
Training Actors and Critics (9 minutes)
Knowing Whether Learning Is Working (6 minutes)
Measuring Learned Behavior Reliably (8 minutes)
Course Summary (4 minutes)

2 readings (30 minutes total)

Theory Notes (10 minutes)
Course Synthesis: Deep Reinforcement Learning in Practice (20 minutes)

1 assignment (45 minutes total)

Practical Deep RL Implementation (45 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Ashutosh Trivedi

University of Colorado Boulder

2 courses, 47 learners

Offered by

University of Colorado Boulder

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. It also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.