About this Course

Learner Career Outcomes

33%

started a new career after completing these courses

56%

got a tangible career benefit from this course

33%

got a pay increase or promotion
Shareable Certificate
Earn a Certificate upon completion
100% online
Start instantly and learn on your own schedule.
Flexible deadlines
Reset deadlines in accordance with your schedule.
Advanced Level
Approx. 26 hours to complete
English

Offered by

National Research University Higher School of Economics

Syllabus - What you will learn from this course

Content rating: 81% thumbs up (2,251 ratings)

Week 1

Intro: why should I care?

5 hours to complete
14 videos (Total 85 min), 6 readings, 3 quizzes
14 videos
Why should you care (9 min)
Reinforcement learning vs all (3 min)
Multi-armed bandit (4 min)
Decision process & applications (6 min)
Markov Decision Process (5 min)
Crossentropy method (9 min)
Approximate crossentropy method (5 min)
More on approximate crossentropy method (6 min)
Evolution strategies: core idea (6 min)
Evolution strategies: math problems (5 min)
Evolution strategies: log-derivative trick (8 min)
Evolution strategies: duct tape (6 min)
Blackbox optimization: drawbacks (4 min)
6 readings
About the University (10 min)
Rules on the academic integrity in the course (10 min)
FAQ (10 min)
Primers (1 h)
About honors track (1 min)
Extras (10 min)

Week 2

At the heart of RL: Dynamic Programming

3 hours to complete
5 videos (Total 54 min), 3 readings, 4 quizzes
5 videos
State and Action Value Functions (13 min)
Measuring Policy Optimality (6 min)
Policy: evaluation & improvement (10 min)
Policy and value iteration (8 min)
3 readings
Optional: Reward discounting from a mathematical perspective (10 min)
External links: Reward Design (10 min)
Discrete Stochastic Dynamic Programming (10 min)
3 practice exercises
Reward design (8 min)
Optimality in RL (30 min)
Policy Iteration (30 min)

Week 3

Model-free methods

3 hours to complete
6 videos (Total 47 min), 1 reading, 4 quizzes
6 videos
Monte-Carlo & Temporal Difference; Q-learning (8 min)
Exploration vs Exploitation (8 min)
Footnote: Monte-Carlo vs Temporal Difference (2 min)
Accounting for exploration. Expected Value SARSA (11 min)
On-policy vs off-policy; Experience replay (7 min)
1 reading
Extras (10 min)
1 practice exercise
Model-free reinforcement learning (30 min)

Week 4

Approximate Value Based Methods

3 hours to complete
9 videos (Total 104 min), 3 readings, 5 quizzes
9 videos
Loss functions in value based RL (11 min)
Difficulties with Approximate Methods (15 min)
DQN – bird's eye view (9 min)
DQN – the internals (9 min)
DQN: statistical issues (6 min)
Double Q-learning (6 min)
More DQN tricks (10 min)
Partial observability (17 min)
3 readings
TD vs MC (10 min)
Extras (10 min)
DQN follow-ups (10 min)
3 practice exercises
MC & TD (10 min)
SARSA and Q-learning (10 min)
DQN (30 min)

Reviews

TOP REVIEWS FROM PRACTICAL REINFORCEMENT LEARNING

About the Advanced Machine Learning Specialization
