Sample-based Learning Methods

This course is part of Reinforcement Learning Specialization

Instructors: Martha White, Adam White

38,715 already enrolled

5 modules

Gain insight into a topic and learn the fundamentals.

1,256 reviews

Intermediate level

Recommended experience

Flexible schedule

2 weeks at 10 hours a week

Learn at your own pace

90%

Most learners liked this course

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Reinforcement Learning Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

There are 5 modules in this course

In this course, you will learn about several algorithms that can learn near-optimal policies through trial-and-error interaction with the environment, that is, from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that combine model-based planning (similar to dynamic programming) with temporal difference updates to radically accelerate learning.

By the end of this course you will be able to:

- Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration, when using sampled experience rather than dynamic programming sweeps within a model
- Understand the connections between Monte Carlo and Dynamic Programming and TD
- Implement and apply the TD algorithm, for estimating value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classic planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna

Welcome to the second course in the Reinforcement Learning Specialization: Sample-Based Learning Methods, brought to you by the University of Alberta, Onlea, and Coursera. In this pre-course module, you'll be introduced to your instructors, and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

What's included

2 videos, 2 readings, 1 discussion prompt

This week you will learn how to estimate value functions and optimal policies, using only sampled experience from the environment. This module represents our first step toward incremental learning methods that learn from the agent’s own interaction with the world, rather than a model of the world. You will learn about on-policy and off-policy methods for prediction and control, using Monte Carlo methods, which use sampled returns. You will also be reintroduced to the exploration problem, but more generally in RL, beyond bandits.
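As a rough illustration of Monte Carlo prediction from sampled returns, the sketch below implements first-visit Monte Carlo in tabular form. It is not taken from the course materials: the function name `first_visit_mc_prediction`, the episode format (a list of `(state, reward)` pairs), and the two toy episodes are all assumptions made for this example.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the first visit to s.

    Each episode is a list of (state, reward) pairs, where `reward` is the
    reward received on the transition out of `state` (an assumed format).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Record the first time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)

        # Walk backwards so the return G can be accumulated incrementally.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = reward + gamma * g
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Two hypothetical episodes over states "A" and "B".
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("B", 0.0), ("A", 0.0), ("B", 2.0)],
]
values = first_visit_mc_prediction(episodes, gamma=1.0)
```

Note that no model of the environment appears anywhere: the estimate is built purely from complete sampled episodes, which is also why the update has to wait until an episode has ended.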

What's included

11 videos, 3 readings, 1 assignment, 1 programming assignment, 1 discussion prompt

11 videos (total 58 minutes)

What is Monte Carlo? (7 minutes)
Using Monte Carlo for Prediction (6 minutes)
Using Monte Carlo for Action Values (3 minutes)
Using Monte Carlo methods for generalized policy iteration (3 minutes)
Solving the Blackjack Example (4 minutes)
Epsilon-soft policies (5 minutes)
Why does off-policy learning matter? (5 minutes)
Importance Sampling (4 minutes)
Off-Policy Monte Carlo Prediction (5 minutes)
Emma Brunskill: Batch Reinforcement Learning (12 minutes)
Week 1 Summary (4 minutes)

3 readings (total 90 minutes)

Module 1 Learning Objectives (10 minutes)
Weekly Reading (40 minutes)
Chapter Summary (40 minutes)

1 assignment (total 30 minutes)

Graded Quiz (30 minutes)

1 programming assignment (total 5 minutes)

Blackjack (5 minutes)

1 discussion prompt (total 10 minutes)

Comparing on-policy and off-policy learning (10 minutes)

This week, you will learn about one of the most fundamental concepts in reinforcement learning: temporal difference (TD) learning. TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods. TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world, and do not require knowledge of the model. TD methods are similar to DP methods in that they bootstrap, and thus can learn online, with no waiting until the end of an episode. You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping. For this module, we first focus on TD for prediction, and discuss TD for control in the next module. This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
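To make the bootstrapping idea concrete, here is a minimal sketch of tabular TD(0) prediction. The environment is an assumption chosen for illustration (a five-state random walk with a +1 reward at the right end), not the simulated domain used in the course assignment, and the function name and default parameters are likewise hypothetical.

```python
import random


def td0_random_walk(num_episodes=1000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) prediction on an assumed 5-state random walk.

    States 1..5 are non-terminal; states 0 and 6 are terminal. The fixed
    policy moves left or right with equal probability. The reward is +1 on
    reaching state 6 and 0 otherwise.
    """
    rng = random.Random(seed)
    values = [0.0] * 7  # terminal entries stay at 0

    for _ in range(num_episodes):
        state = 3  # every episode starts in the middle
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            # Bootstrapped target: one real reward plus the current estimate
            # of the next state's value. The update happens immediately,
            # without waiting for the episode to end.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state

    return values[1:6]


estimates = td0_random_walk()
true_values = [i / 6 for i in range(1, 6)]
```

For this particular walk the true values are 1/6 through 5/6, so `estimates` can be compared against `true_values`; with a constant step size the estimates keep fluctuating around those values rather than converging exactly.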

What's included

6 videos, 2 readings, 1 assignment, 1 programming assignment, 1 discussion prompt

6 videos (total 37 minutes)

What is Temporal Difference (TD) learning? (5 minutes)
Rich Sutton: The Importance of TD Learning (6 minutes)
The advantages of temporal difference learning (5 minutes)
Comparing TD and Monte Carlo (6 minutes)
Andy Barto and Rich Sutton: More on the History of RL (12 minutes)
Week 2 Summary (2 minutes)

2 readings (total 50 minutes)

Module 2 Learning Objectives (10 minutes)
Weekly Reading (40 minutes)

1 assignment (total 30 minutes)

Practice Quiz (30 minutes)

1 programming assignment (total 180 minutes)

Policy Evaluation with Temporal Difference Learning (180 minutes)

1 discussion prompt (total 10 minutes)

Should we care about TD in the brain? (10 minutes)

This week, you will learn about using temporal difference learning for control, as a generalized policy iteration strategy. You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa, Q-learning and Expected Sarsa. You will see some of the differences between the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both. You will implement Expected Sarsa and Q-learning, on Cliff World.
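The three control algorithms differ mainly in the target they bootstrap from. The sketch below computes each target for a single transition. The helper names, the dictionary-of-lists Q-table, and the choice of an epsilon-greedy target policy for Expected Sarsa are assumptions made for this example rather than the course's own code.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action actually taken in the next state."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: uses the greedy (maximum) next action value."""
    return reward + gamma * max(q[next_state])


def epsilon_greedy_probs(action_values, epsilon):
    """Action probabilities under an epsilon-greedy policy (ties share mass)."""
    n = len(action_values)
    best = max(action_values)
    greedy = [a for a, v in enumerate(action_values) if v == best]
    probs = [epsilon / n] * n
    for a in greedy:
        probs[a] += (1.0 - epsilon) / len(greedy)
    return probs


def expected_sarsa_target(q, reward, next_state, gamma, epsilon):
    """Target that averages next action values under the target policy."""
    probs = epsilon_greedy_probs(q[next_state], epsilon)
    expected = sum(p * v for p, v in zip(probs, q[next_state]))
    return reward + gamma * expected


# Hypothetical Q-table with a single next state and two actions.
q = {"s_next": [1.0, 3.0]}
sarsa = sarsa_target(q, reward=-1.0, next_state="s_next", next_action=0, gamma=0.5)
q_learning = q_learning_target(q, reward=-1.0, next_state="s_next", gamma=0.5)
exp_sarsa = expected_sarsa_target(q, -1.0, "s_next", gamma=0.5, epsilon=0.5)
exp_sarsa_greedy = expected_sarsa_target(q, -1.0, "s_next", gamma=0.5, epsilon=0.0)
```

Setting `epsilon=0.0` makes the target policy greedy, in which case the Expected Sarsa target coincides with the Q-learning target. That is one way to read the claim that Expected Sarsa covers both the on-policy and off-policy cases.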

What's included

9 videos, 3 readings, 1 assignment, 1 programming assignment, 1 discussion prompt

9 videos (total 30 minutes)

Sarsa: GPI with TD (4 minutes)
Sarsa in the Windy Grid World (3 minutes)
What is Q-learning? (3 minutes)
Q-learning in the Windy Grid World (4 minutes)
How is Q-learning off-policy? (5 minutes)
Expected Sarsa (4 minutes)
Expected Sarsa in the Cliff World (3 minutes)
Generality of Expected Sarsa (2 minutes)
Week 3 Summary (2 minutes)

3 readings (total 90 minutes)

Module 3 Learning Objectives (10 minutes)
Weekly Reading (40 minutes)
Chapter summary (40 minutes)

1 assignment (total 30 minutes)

Practice Quiz (30 minutes)

1 programming assignment (total 180 minutes)

Q-Learning and Expected SARSA (180 minutes)

1 discussion prompt (total 10 minutes)

How can we use off-policy for learning multiple goals? (10 minutes)

Up until now, you might think that learning with and without a model are two distinct and, in some ways, competing strategies: planning with Dynamic Programming versus sample-based learning via TD methods. This week we unify these two strategies with the Dyna architecture. You will learn how to estimate the model from data and then use this model to generate hypothetical experience (a bit like dreaming) to dramatically improve sample efficiency compared to sample-based methods like Q-learning. In addition, you will learn how to design learning systems that are robust to inaccurate models.
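The following is a minimal sketch of the Dyna idea in tabular form: each real step performs a Q-learning update, records the observed transition in a model, and then replays several simulated transitions drawn from that model. The corridor environment, the function names, and all hyperparameters are assumptions for illustration; the course assignment uses its own environment and code structure.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # assumed encoding: 0 = left, 1 = right
GOAL = 5


def env_step(state, action):
    """Hypothetical deterministic corridor with states 0..5; reward 1 at the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_update(q, state, action, reward, next_state, done, alpha, gamma):
    """One Q-learning update, shared by real and simulated experience."""
    target = reward + (0.0 if done else gamma * max(q[next_state]))
    q[state][action] += alpha * (target - q[state][action])


def dyna_q(num_episodes=30, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])
    model = {}  # (state, action) -> (next_state, reward, done)

    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behaviour with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])

            next_state, reward, done = env_step(state, action)

            # Direct RL: learn from the real transition.
            q_update(q, state, action, reward, next_state, done, alpha, gamma)
            # Model learning: remember what this state-action pair led to.
            model[(state, action)] = (next_state, reward, done)
            # Planning: replay previously seen pairs using the model.
            for _ in range(planning_steps):
                (s, a), (s2, r, d) = rng.choice(list(model.items()))
                q_update(q, s, a, r, s2, d, alpha, gamma)

            state = next_state
    return q


q = dyna_q()
```

Setting `planning_steps=0` reduces this sketch to plain Q-learning, which gives a simple baseline for comparing how many real environment steps each variant needs. Because the model here stores only the most recent outcome for each pair, it assumes a deterministic environment and would become inaccurate if the environment changed.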

What's included

11 videos, 4 readings, 2 assignments, 1 programming assignment, 1 discussion prompt

11 videos (total 47 minutes)

What is a Model? (5 minutes)
Comparing Sample and Distribution Models (2 minutes)
Random Tabular Q-planning (3 minutes)
The Dyna Architecture (5 minutes)
The Dyna Algorithm (5 minutes)
Dyna & Q-learning in a Simple Maze (5 minutes)
What if the model is inaccurate? (4 minutes)
In-depth with changing environments (6 minutes)
Drew Bagnell: self-driving, robotics, and Model Based RL (7 minutes)
Week 4 Summary (2 minutes)
Congratulations! (2 minutes)

4 readings (total 130 minutes)

Module 4 Learning Objectives (10 minutes)
Weekly Reading (40 minutes)
Chapter Summary (40 minutes)
Text Book Part 1 Summary (40 minutes)

2 assignments (total 90 minutes)

Practice Assessment (45 minutes)
Replacement Practice Assignment (45 minutes)

1 programming assignment (total 180 minutes)

Dyna-Q and Dyna-Q+ (180 minutes)

1 discussion prompt (total 10 minutes)

Compare Planning and Reasoning (10 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Instructor ratings

(223 ratings)

Martha White

University of Alberta

4 Courses, 116,860 learners

Adam White

University of Alberta

4 Courses, 116,860 learners

Offered by

University of Alberta

Alberta Machine Intelligence Institute

Learner reviews

5 stars
82.27%
4 stars
13.27%
3 stars
2.78%
2 stars
0.63%
1 star
1.03%

Showing 3 of 1256

Reviewed on Jul 15, 2023

It was a good course, but I was expecting more explanation on the subjects in the book. For example Prioritized Sweeping was missing and the videos are not instructive enough.

Reviewed on Aug 1, 2023

Excellent material, excellent didactic, and the programming exercises provide the completion needed for the methods understanding, beautiful course.

Reviewed on May 20, 2020

Overall a very nice course, well explained and presented. Sometimes, it would be nice to see the slides 'full screen' rather than the small version in the corner.
