Decision Making and Reinforcement Learning

Instructor: Tony Dear

2,917 already enrolled

8 modules

Gain insight into a topic and learn the fundamentals.

4.2 (17 reviews)

Intermediate level

Recommended experience

47 hours to complete

3 weeks at 15 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Map between qualitative preferences and appropriate quantitative utilities.
Model non-associative and associative sequential decision problems with multi-armed bandit problems and Markov decision processes, respectively.
Implement dynamic programming algorithms to find optimal policies.
Implement basic reinforcement learning algorithms using Monte Carlo and temporal difference methods.

Details to know

Earn a career certificate

Add to your LinkedIn profile

Assessments: 8 assignments

Taught in English

There are 8 modules in this course

This course is an introduction to sequential decision making and reinforcement learning. We start with a discussion of utility theory to learn how preferences can be represented and modeled for decision making. Next, we model simple decision problems as multi-armed bandit problems and discuss several approaches to evaluating feedback. We then model decision problems as finite Markov decision processes (MDPs) and discuss their solutions via dynamic programming algorithms. We touch on the notion of partial observability in real problems, modeled by POMDPs and solved by online planning methods. Finally, we introduce the reinforcement learning problem and discuss two paradigms: Monte Carlo methods and temporal difference learning. We conclude the course by noting how the two paradigms lie on a spectrum of n-step temporal difference methods. Algorithms and worked examples are a key emphasis throughout the course.
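To make the utility-theory idea concrete, here is a minimal sketch of choosing between a sure payoff and a lottery by expected utility. The square-root utility function, the payoffs, and the helper name `expected_utility` are illustrative assumptions, not material from the course.

```python
import math

def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, outcome) pairs."""
    total_prob = sum(p for p, _ in lottery)
    if not math.isclose(total_prob, 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * utility(x) for p, x in lottery)

# Hypothetical risk-averse agent: a concave utility over money.
utility = math.sqrt

sure_thing = [(1.0, 40.0)]            # receive 40 for certain
gamble = [(0.5, 0.0), (0.5, 100.0)]   # fair coin flip between 0 and 100

eu_sure = expected_utility(sure_thing, utility)
eu_gamble = expected_utility(gamble, utility)
best = "sure thing" if eu_sure > eu_gamble else "gamble"
print(f"EU(sure)={eu_sure:.2f}, EU(gamble)={eu_gamble:.2f} -> choose {best}")
```

Because the square root is concave, this agent prefers the certain 40 even though the gamble's expected monetary value is 50; with a linear utility the choice would flip.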

Welcome to Decision Making and Reinforcement Learning! During this week, Professor Tony Dear provides an overview of the course. You will also view guidelines to support your learning journey towards modeling sequential decision problems and implementing reinforcement learning algorithms.

What's included

6 videos, 6 readings, 1 assignment, 1 programming assignment, 3 discussion prompts, 1 plugin

6 videos (total 39 minutes)

Introduction to Decision Making and Reinforcement Learning (1 minute)
Course Logistics (3 minutes)
1.1 Rational Agents and Utility Theory (9 minutes)
1.2 Preferences and Axioms of Utility Theory (9 minutes)
1.3 Uncertain and Multi-Attribute Utilities (9 minutes)
1.4 Value of Perfect Information (6 minutes)

6 readings (total 60 minutes)

Course Syllabus (10 minutes)
About the Instructor (10 minutes)
Academic Honesty Policy (10 minutes)
Discussion Forum Etiquette (10 minutes)
Pre-Course Survey (10 minutes)
Week 1 Lesson Materials (10 minutes)

1 assignment (total 30 minutes)

Utility Theory (30 minutes)

1 programming assignment (total 180 minutes)

Utility Theory (180 minutes)

3 discussion prompts (total 30 minutes)

Introduce Yourself! (10 minutes)
Discussion on Utility Theory (10 minutes)
Week 1 Questions and Feedback (10 minutes)

1 plugin (total 15 minutes)

Pre-Course Survey (15 minutes)

Welcome to week 2! This week, we will learn about multi-armed bandit problems, a type of optimization problem in which the algorithm balances exploration and exploitation to maximize rewards. Topics include action values and sample-average estimation, ε-greedy action selection, and the upper confidence bound (UCB). You could post in the discussion forum if you need assistance on the quiz and assignment.
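The sketch below contrasts ε-greedy and UCB action selection on a simulated bandit, using incremental sample-average estimates of the action values. The Gaussian reward model, the arm means, the parameters (ε = 0.1, c = 2), and all function names are assumptions chosen for illustration; the course's own assignments may structure this differently.

```python
import math
import random

def run_bandit(true_means, steps, select, seed=0):
    """Simulate a bandit with unit-variance Gaussian rewards, tracking sample-average values."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # action-value estimates Q(a)
    n = [0] * k     # pull counts N(a)
    for t in range(1, steps + 1):
        a = select(q, n, t, rng)
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]   # incremental sample average
    return q, n

def epsilon_greedy(epsilon):
    """With probability epsilon explore uniformly; otherwise exploit the best estimate."""
    def select(q, n, t, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(q))
        best = max(q)
        return rng.choice([a for a, v in enumerate(q) if v == best])  # random tie-break
    return select

def ucb(c):
    """Pick argmax of Q(a) + c * sqrt(ln t / N(a)), trying every untried arm first."""
    def select(q, n, t, rng):
        for a, count in enumerate(n):
            if count == 0:
                return a
        return max(range(len(q)), key=lambda a: q[a] + c * math.sqrt(math.log(t) / n[a]))
    return select

means = [0.1, 0.9, 0.4]   # hypothetical true arm means; arm 1 is best
results = {}
for name, policy in [("epsilon-greedy", epsilon_greedy(0.1)), ("UCB", ucb(2.0))]:
    q, n = run_bandit(means, 5000, policy)
    results[name] = (q, n)
    print(name, [round(v, 2) for v in q], n)
```

Both strategies should end up pulling the best arm most often. The difference is in how they explore: ε-greedy samples uniformly at random, while UCB directs exploration toward arms whose estimates are still uncertain.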

What's included

3 videos, 1 reading, 1 assignment, 1 programming assignment, 2 discussion prompts

3 videos (total 36 minutes)

2.1 Multi-Armed Bandits and Action Values (9 minutes)
2.2 ε-Greedy Action Selection (12 minutes)
2.3 Upper Confidence Bound (14 minutes)

1 reading (total 10 minutes)

Week 2 Lesson Materials (10 minutes)

1 assignment (total 30 minutes)

Multi-Armed Bandit Problems (30 minutes)

1 programming assignment (total 180 minutes)

Multi-Armed Bandit Problems (180 minutes)

2 discussion prompts (total 20 minutes)

Discussion on Multi-Armed Bandits (10 minutes)
Week 2 Questions and Feedback (10 minutes)

Welcome to week 3! This week, we will focus on the basics of the Markov decision process, including rewards, utilities, discounting, policies, value functions, and Bellman equations. You will model sequential decision problems, understand the impact of rewards and discount factors on outcomes, define policies and value functions, and write Bellman equations for optimal solutions. You could post in the discussion forum if you need assistance on the quiz and assignment.
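For reference, the Bellman equations mentioned above are often written as follows for a finite MDP with transition probabilities T(s, a, s'), rewards R(s, a, s'), discount factor γ, and policy π. This notation is an assumption and may differ from the symbols used in the lectures. The first line is the expectation equation for a fixed policy; the other two are the optimality equations for state values and state-action values.

```latex
\begin{aligned}
V^{\pi}(s) &= \sum_{a} \pi(a \mid s) \sum_{s'} T(s, a, s') \bigl[ R(s, a, s') + \gamma V^{\pi}(s') \bigr] \\
V^{*}(s) &= \max_{a} \sum_{s'} T(s, a, s') \bigl[ R(s, a, s') + \gamma V^{*}(s') \bigr] \\
Q^{*}(s, a) &= \sum_{s'} T(s, a, s') \bigl[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \bigr]
\end{aligned}
```

The max inside the optimality equations is what makes them nonlinear, which is why they are usually solved iteratively rather than as a single linear system.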

What's included

6 videos, 1 reading, 1 assignment, 1 programming assignment, 3 discussion prompts

6 videos (total 35 minutes)

3.1 Markov Decision Process Framework (4 minutes)
3.2 Gridworld Example (8 minutes)
3.3 Rewards, Utilities, and Discounting (7 minutes)
3.4 Policies and Value Functions (6 minutes)
3.5 Example: Mini-Gridworld (5 minutes)
3.6 Bellman Optimality Equations (3 minutes)

1 reading (total 10 minutes)

Week 3 Lesson Materials (10 minutes)

1 assignment (total 30 minutes)

Sequential Decision Problems (30 minutes)

1 programming assignment (total 180 minutes)

Bellman Equations (180 minutes)

3 discussion prompts (total 30 minutes)

Discussion on Sequential Decision Problem - Part 1 (10 minutes)
Discussion on Sequential Decision Problem - Part 2 (10 minutes)
Week 3 Questions and Feedback (10 minutes)

Welcome to week 4! This week, we will cover dynamic programming algorithms for solving Markov decision processes (MDPs). Topics include value iteration and policy iteration, nonlinear Bellman equations, complexity and convergence, and a comparison of the two approaches. You could post in the discussion forum if you need assistance on the quiz and assignment.
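As a rough illustration of value iteration, the sketch below repeatedly applies the Bellman optimality backup until the largest change in any state value falls below a tolerance, then extracts a greedy policy. The four-state corridor MDP (reward +1 for reaching state 3, γ = 0.9), the dictionary-based transition format, and the function names are hypothetical.

```python
def q_value(V, transitions, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[(s, a)])

def value_iteration(states, actions, transitions, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.

    transitions[(s, a)] is a list of (probability, next_state, reward) triples;
    actions(s) returns the available actions (an empty list for terminal states).
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            acts = actions(s)
            if not acts:
                continue  # terminal states keep value 0
            best = max(q_value(V, transitions, s, a, gamma) for a in acts)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update: later states in this sweep see the new value
        if delta < tol:
            break
    # Extract a greedy policy with respect to the converged values.
    policy = {s: max(actions(s), key=lambda a, s=s: q_value(V, transitions, s, a, gamma))
              for s in states if actions(s)}
    return V, policy

# Hypothetical corridor: states 0..3, state 3 is terminal, +1 for entering it.
GOAL = 3
states = list(range(GOAL + 1))

def corridor_actions(s):
    return [] if s == GOAL else ["left", "right"]

transitions = {}
for s in range(GOAL):
    left, right = max(s - 1, 0), s + 1
    transitions[(s, "left")] = [(1.0, left, 0.0)]
    transitions[(s, "right")] = [(1.0, right, 1.0 if right == GOAL else 0.0)]

V, policy = value_iteration(states, corridor_actions, transitions)
print({s: round(v, 3) for s, v in V.items()}, policy)
```

In this deterministic corridor the values settle at 0.81, 0.9, and 1.0 for states 0 to 2, and the greedy policy moves right everywhere. Policy iteration would reach the same policy by alternating full policy evaluation with greedy improvement.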

What's included

6 videos, 1 reading, 1 assignment, 2 programming assignments, 3 discussion prompts

6 videos (total 41 minutes)

4.1 Time-Limited Values (7 minutes)
4.2 Value Iteration (6 minutes)
4.3 Value Iteration Implementation (8 minutes)
4.4 Policy Iteration (8 minutes)
4.5 Example: Mini-Gridworld (3 minutes)
4.6 Algorithm Complexity (7 minutes)

1 reading (total 10 minutes)

Week 4 Lesson Materials (10 minutes)

1 assignment (total 30 minutes)

Markov Decision Processes (30 minutes)

2 programming assignments (total 360 minutes)

Value Iteration (180 minutes)
Policy Iteration (180 minutes)

3 discussion prompts (total 35 minutes)

Discussion on Markov Decision Processes (15 minutes)
Discussion on Policy Iteration vs. Value Iteration (10 minutes)
Week 4 Questions and Feedback (10 minutes)

Welcome to week 5! This week, we will go through topics on partial observability and POMDPs, belief states, representation as belief MDPs, and online planning in MDPs and POMDPs. You will also apply your knowledge to update the belief state and employ a belief transition function to calculate state values. You could post in the discussion forum if you need assistance on the quiz and assignment.
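A belief-state update is a Bayes filter: predict with the transition model, weight by the likelihood of the observation, and renormalize. The sketch assumes discrete states, an observation model that depends only on the resulting state, and a hypothetical two-state "listen" example with 85% accurate observations; none of these names come from the course.

```python
def update_belief(belief, action, observation, T, O):
    """Bayes-filter belief update for a discrete POMDP.

    belief: dict state -> probability
    T[(s, a)]: dict next_state -> probability of reaching it
    O[s2]: dict observation -> probability of that observation in s2
    """
    # Prediction step: push the current belief through the transition model.
    predicted = {}
    for s, b in belief.items():
        for s2, p in T[(s, action)].items():
            predicted[s2] = predicted.get(s2, 0.0) + p * b
    # Correction step: weight each next state by the observation likelihood.
    unnormalized = {s2: O[s2].get(observation, 0.0) * p for s2, p in predicted.items()}
    norm = sum(unnormalized.values())  # equals P(observation | belief, action)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: v / norm for s2, v in unnormalized.items()}

# Hypothetical example: listening does not change the state; hearing is 85% accurate.
T = {("left", "listen"): {"left": 1.0}, ("right", "listen"): {"right": 1.0}}
O = {"left": {"hear_left": 0.85, "hear_right": 0.15},
     "right": {"hear_left": 0.15, "hear_right": 0.85}}

b0 = {"left": 0.5, "right": 0.5}
b1 = update_belief(b0, "listen", "hear_left", T, O)
b2 = update_belief(b1, "listen", "hear_left", T, O)
print(b1, b2)
```

Two consistent observations push the belief in "left" from about 0.85 to about 0.97, which is the kind of sharpening a belief MDP planner reasons about when deciding whether to gather more information.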

What's included

5 videos, 2 readings, 1 assignment, 1 programming assignment, 3 discussion prompts

5 videos (total 35 minutes)

5.1 Partial Observability and POMDP (4 minutes)
5.2 Belief States (8 minutes)
5.3 Belief Transition Model (6 minutes)
5.4 Policies and Value Functions (10 minutes)
5.5 Example: Mini-Gridworld (5 minutes)

2 readings (total 20 minutes)

Week 5 Lesson Materials (10 minutes)
Summary of Weeks 3, 4, and 5 (10 minutes)

1 assignment (total 30 minutes)

POMDPs (30 minutes)

1 programming assignment (total 180 minutes)

POMDPs (180 minutes)

3 discussion prompts (total 35 minutes)

Discussion on POMDPs - Part 1 (15 minutes)
Discussion on POMDPs - Part 2 (10 minutes)
Week 5 Questions and Feedback (10 minutes)

Welcome to week 6! This week, we will introduce Monte Carlo methods and cover state value estimation using sample averaging and Monte Carlo prediction, state-action values and epsilon-greedy policies, and importance sampling for off-policy vs. on-policy Monte Carlo control. You will learn to estimate state values and state-action values, use importance sampling, and implement off-policy Monte Carlo control for optimal policy learning. You could post in the discussion forum if you need assistance on the quiz and assignment.
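First-visit Monte Carlo prediction averages, for each state, the returns that follow its first occurrence in each episode. The sketch below assumes episodes are already recorded as lists of (state, reward) pairs, where each reward is the one received after leaving that state; the episode data and the function name are hypothetical.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo prediction from complete episodes.

    Each episode is a list of (state, reward) pairs, where reward is R_{t+1},
    the reward received after leaving the state visited at step t.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the time step of the first visit to each state.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        # Accumulate the discounted return backward from the end of the episode.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = r + gamma * G
            if first_visit[s] == t:  # only the first visit contributes
                returns_sum[s] += G
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

episodes = [
    [("A", 0.0), ("B", 0.0), ("A", 1.0)],
    [("B", 1.0)],
]
V = first_visit_mc(episodes, gamma=0.9)
print(V)
```

With γ = 0.9 this gives V(A) ≈ 0.81 and V(B) ≈ 0.95; the second occurrence of A in the first episode is ignored because only first visits count.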

What's included

6 videos, 2 readings, 1 assignment, 1 programming assignment, 2 discussion prompts

6 videos (total 41 minutes)

6.1 Monte Carlo Methods (5 minutes)
6.2 First-Visit MC Prediction (7 minutes)
6.3 State-Action Values (5 minutes)
6.4 ε-Greedy On-Policy MC Control (7 minutes)
6.5 On- and Off-Policy MC Control (7 minutes)
6.6 Example: Mini-Gridworld (8 minutes)

2 readings (total 20 minutes)

Week 6 Lesson Materials (10 minutes)
Post-Lecture Reading (10 minutes)

1 assignment (total 30 minutes)

Monte Carlo RL (30 minutes)

1 programming assignment (total 180 minutes)

Monte Carlo (180 minutes)

2 discussion prompts (total 20 minutes)

Discussion on Monte Carlo RL (10 minutes)
Week 6 Questions and Feedback (10 minutes)

Welcome to week 7! This week, we will cover temporal difference learning for prediction, TD batch methods, SARSA for on-policy control, and Q-learning for off-policy control. You will learn to implement TD prediction, TD batch and offline methods, SARSA, and Q-learning, and to compare on-policy vs. off-policy TD learning. You will then apply your knowledge to a Tic-Tac-Toe programming assignment. You could post in the discussion forum if you need assistance on the quiz and assignment.
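The two TD control updates differ only in their target: SARSA bootstraps from the action actually taken next, while Q-learning bootstraps from the greedy action. The sketch below shows both update rules and trains Q-learning with an ε-greedy behavior policy on a hypothetical four-state corridor; the environment, hyperparameters, and function names are assumptions, not the course's assignment code.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]
GOAL = 3

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma, done):
    """SARSA (on-policy): the target uses Q of the next action a2 actually selected."""
    target = r if done else r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, alpha, gamma, done):
    """Q-learning (off-policy): the target uses the greedy value max_a' Q(s2, a')."""
    target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def corridor_step(s, a):
    """Hypothetical deterministic corridor: +1 on reaching GOAL, which ends the episode."""
    s2 = max(s - 1, 0) if a == "left" else s + 1
    done = s2 == GOAL
    return s2, (1.0 if done else 0.0), done

def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(s, b)] for b in ACTIONS)
    return rng.choice([b for b in ACTIONS if Q[(s, b)] == best])  # random tie-break

def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q, s, epsilon, rng)  # behavior policy
            s2, r, done = corridor_step(s, a)
            q_learning_update(Q, s, a, r, s2, alpha, gamma, done)
            s = s2
    return Q

Q = train_q_learning()
greedy = {s: max(ACTIONS, key=lambda a, s=s: Q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

After training, the greedy policy moves right in every non-terminal state. Using `sarsa_update` instead requires selecting the next action with the behavior policy before performing the update, which is exactly what makes SARSA on-policy.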

What's included

5 videos, 2 readings, 1 assignment, 3 programming assignments, 2 discussion prompts

5 videos (total 35 minutes)

7.1 Temporal Difference Learning (6 minutes)
7.2 Temporal Difference Prediction (5 minutes)
7.3 Batch Updating (5 minutes)
7.4 TD Learning for Control (8 minutes)
7.5 SARSA vs. Q-Learning (9 minutes)

2 readings (total 20 minutes)

Week 7 Lesson Materials (10 minutes)
Post-Lecture Readings (10 minutes)

1 assignment (total 30 minutes)

Temporal Difference Learning (30 minutes)

3 programming assignments (total 420 minutes)

Tic-Tac-Toe (60 minutes)
Q-Learning (180 minutes)
SARSA (180 minutes)

2 discussion prompts (total 20 minutes)

Discussion on Temporal Difference RL (10 minutes)
Week 7 Questions and Feedback (10 minutes)

Welcome to week 8! This module covers n-step temporal difference prediction, n-step SARSA (on-policy and off-policy), model-based RL with Dyna-Q, and function approximation. You will be prepared to implement n-step TD learning, n-step SARSA, and Dyna-Q for model-based learning, and to use function approximation for reinforcement learning. You will apply your knowledge in the Frozen Lake programming environment. You could post in the discussion forum if you need assistance on the quiz and assignment.
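An n-step TD update uses the next n rewards plus a bootstrapped value n steps ahead, so it interpolates between one-step TD (n = 1) and constant-step-size Monte Carlo (n at least the episode length). The sketch replays recorded episodes, each a list of (state, reward) pairs whose reward is received after leaving that state; the data format, step size, and function name are assumptions.

```python
from collections import defaultdict

def n_step_td(episodes, n, alpha=0.1, gamma=1.0, V=None):
    """Tabular n-step TD prediction, replaying recorded episodes in time order.

    Each episode is a list of (state, reward) pairs with reward = R_{t+1};
    the episode terminates after its last pair (terminal value is 0).
    """
    V = defaultdict(float) if V is None else V
    for episode in episodes:
        T = len(episode)
        for tau in range(T):
            # n-step return: up to n discounted rewards starting at step tau ...
            end = min(tau + n, T)
            G = sum(gamma ** (k - tau) * episode[k][1] for k in range(tau, end))
            # ... plus a bootstrapped value if the episode has not ended by then.
            if tau + n < T:
                G += gamma ** n * V[episode[tau + n][0]]
            s = episode[tau][0]
            V[s] += alpha * (G - V[s])
    return V

episode = [("A", 0.0), ("B", 1.0)]
V1 = n_step_td([episode], n=1, alpha=1.0)  # one-step TD
V2 = n_step_td([episode], n=2, alpha=1.0)  # reaches the end: Monte Carlo target
print(dict(V1), dict(V2))
```

On this two-step episode, n = 1 leaves V(A) at 0 because it bootstraps from the still-zero V(B), while n = 2 sees the final reward directly and sets V(A) to 1.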

What's included

4 videos, 3 readings, 1 assignment, 1 programming assignment, 2 discussion prompts, 1 plugin

4 videos (total 39 minutes)

8.1 n-step Temporal Difference Prediction (10 minutes)
8.2 n-step SARSA (8 minutes)
8.3 Model-Based Methods (8 minutes)
8.4 Function Approximation (11 minutes)

3 readings (total 30 minutes)

Week 8 Lesson Materials (10 minutes)
Post-Lecture Readings (10 minutes)
Post-Course Survey (10 minutes)

1 assignment (total 30 minutes)

Generalization of Tabular Methods (30 minutes)

1 programming assignment (total 180 minutes)

Frozen Lake (180 minutes)

2 discussion prompts (total 25 minutes)

Reinforcement Learning in Daily Lives (15 minutes)
Week 8 Questions and Feedback (10 minutes)

1 plugin (total 15 minutes)

Post-Course Survey (15 minutes)

Instructor

Tony Dear

Columbia University

Instructor rating: 4.3 (6 ratings)

1 course, 2,917 learners

Offered by

Columbia University

Recommended if you're interested in Algorithms

Modelos de Atribuição (Fundação Instituto de Administração, course)
Autoscaling TensorFlow Model Deployments with TF Serving and Kubernetes (Google Cloud, project)
Transformer Models and BERT Model - Deutsch (Google Cloud, course)
Reinforcement Learning from Human Feedback (DeepLearning.AI, project)

Learner reviews

4.2 (17 reviews)

5 stars: 58.82%
4 stars: 23.52%
3 stars: 0%
2 stars: 11.76%
1 star: 5.88%

Frequently asked questions

When will I have access to the lectures and assignments?

Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:

The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I purchase the Certificate?

When you purchase a Certificate, you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

You will be eligible for a full refund until two weeks after your payment date, or (for courses that have just launched) until two weeks after the first session of the course begins, whichever is later. You cannot receive a refund once you’ve earned a Course Certificate, even if you complete the course within the two-week refund period. See our full refund policy.

Modules in this course

Module 1: Decision Making and Utility Theory
Module 2: Bandit Problems
Module 3: Markov Decision Processes
Module 4: Dynamic Programming
Module 5: Partially Observable Markov Decision Processes
Module 6: Monte Carlo Methods
Module 7: Temporal-Difference Learning
Module 8: Reinforcement Learning - Generalization
