Columbia University
Decision Making and Reinforcement Learning

Taught in English

Some content may not be translated

2,028 already enrolled

Instructor: Tony Dear

Intermediate level

47 hours to complete
3 weeks at 15 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Map between qualitative preferences and appropriate quantitative utilities.

  • Model non-associative and associative sequential decision problems with multi-armed bandit problems and Markov decision processes, respectively.

  • Implement dynamic programming algorithms to find optimal policies.

  • Implement basic reinforcement learning algorithms using Monte Carlo and temporal difference methods.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

8 quizzes

There are 8 modules in this course

Welcome to Decision Making and Reinforcement Learning! During this week, Professor Tony Dear provides an overview of the course. You will also view guidelines to support your learning journey towards modeling sequential decision problems and implementing reinforcement learning algorithms.

What's included

6 videos, 6 readings, 1 quiz, 1 programming assignment, 3 discussion prompts, 1 plugin

Welcome to week 2! This week, we will learn about multi-armed bandit problems, a type of optimization problem in which the algorithm balances exploration and exploitation to maximize rewards. Topics include action values and sample-average estimation, 𝜀-greedy action selection, and the upper confidence bound. You can post in the discussion forum if you need help with the quiz or assignment.
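The week 2 topics can be illustrated with a minimal sketch of epsilon-greedy action selection with incremental sample-average estimates. This is not the course's assignment code: the function name `run_epsilon_greedy`, the Gaussian reward model with unit variance, and the arm means below are all assumptions made for illustration.

```python
import random

def run_epsilon_greedy(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a hypothetical Gaussian bandit (unit-variance rewards)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated action values
    n = [0] * k     # number of times each arm was pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore: uniform random arm
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit: current best estimate
        r = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                   # incremental sample average
        total += r
    return q, n, total / steps

# Illustrative arm means; not taken from the course.
estimates, counts, average_reward = run_epsilon_greedy([0.2, 0.5, 1.0])
```

The incremental update `q[a] += (r - q[a]) / n[a]` gives the same result as averaging all rewards seen for that arm, without storing them.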

What's included

3 videos, 1 reading, 1 quiz, 1 programming assignment, 2 discussion prompts

Welcome to week 3! This week, we will focus on the basics of Markov decision processes (MDPs), including rewards, utilities, discounting, policies, value functions, and Bellman equations. You will model sequential decision problems, understand the impact of rewards and discount factors on outcomes, define policies and value functions, and write Bellman equations for optimal solutions. You can post in the discussion forum if you need help with the quiz or assignment.
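For reference, the quantities named above are commonly written as follows. The notation is an assumption (the course may use different symbols): $\gamma$ is the discount factor, $P$ the transition function, $R$ the reward function, and $\pi$ a deterministic policy.

```latex
\begin{align}
U &= \sum_{t=0}^{\infty} \gamma^{t} r_{t+1}
  && \text{(discounted utility of a reward sequence)} \\
V^{\pi}(s) &= \sum_{s'} P(s' \mid s, \pi(s)) \left[ R(s, \pi(s), s') + \gamma V^{\pi}(s') \right]
  && \text{(Bellman equation for a fixed policy)} \\
V^{*}(s) &= \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{*}(s') \right]
  && \text{(Bellman optimality equation)}
\end{align}
```

The fixed-policy equation is linear in $V^{\pi}$, while the $\max$ in the optimality equation makes it nonlinear in $V^{*}$.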

What's included

6 videos, 1 reading, 1 quiz, 1 programming assignment, 3 discussion prompts

Welcome to week 4! This week, we will cover dynamic programming algorithms for solving Markov decision processes (MDPs). Topics include value iteration and policy iteration, nonlinear Bellman equations, complexity and convergence, and a comparison of the two approaches. You can post in the discussion forum if you need help with the quiz or assignment.
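Value iteration, one of the two algorithms named above, can be sketched in a few lines. The MDP encoding (`P[s][a]` as a list of `(probability, next_state, reward)` triples), the helper names, and the three-state toy problem are assumptions for illustration, not the course's code.

```python
def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until values change by less than tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:            # terminal state: no actions, value stays 0
                continue
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Extract a greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
              for s in P if P[s]}
    return V, policy

# Hypothetical toy MDP: A -> B -> T, with a reward of 10 for exiting from B.
toy_mdp = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 0.0)], "exit": [(1.0, "T", 10.0)]},
    "T": {},
}
values, greedy_policy = value_iteration(toy_mdp)
```

In this toy problem the values converge to 10 for `B` and 9 for `A` (one discounted step from the reward), and the greedy policy is `go` in `A` and `exit` in `B`.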

What's included

6 videos, 1 reading, 1 quiz, 2 programming assignments, 3 discussion prompts

Welcome to week 5! This week, we will go through topics on partial observability and POMDPs, belief states, representation as belief MDPs, and online planning in MDPs and POMDPs. You will also apply your knowledge to update the belief state and employ a belief transition function to calculate state values. You can post in the discussion forum if you need help with the quiz or assignment.
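The belief-state update mentioned above is a Bayes filter: predict with the transition model, weight by the observation likelihood, then normalize. The sketch below assumes nested dictionaries `T[s][a][s2]` and `O[s2][a][o]` for the two models; the function name and the two-state example (modeled on the well-known tiger problem, with an illustrative listening accuracy of 0.85) are not from the course.

```python
def update_belief(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`.

    T[s][action][s2] = P(s2 | s, action)
    O[s2][action][observation] = P(observation | s2, action)
    """
    unnormalized = {}
    for s2 in belief:
        # Prediction step: probability of landing in s2 under the current belief.
        predicted = sum(T[s][action][s2] * belief[s] for s in belief)
        # Correction step: weight by how likely the observation is in s2.
        unnormalized[s2] = O[s2][action][observation] * predicted
    norm = sum(unnormalized.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: v / norm for s, v in unnormalized.items()}

# Hypothetical two-state example: listening does not move the hidden state.
T = {"left": {"listen": {"left": 1.0, "right": 0.0}},
     "right": {"listen": {"left": 0.0, "right": 1.0}}}
O = {"left": {"listen": {"hear_left": 0.85, "hear_right": 0.15}},
     "right": {"listen": {"hear_left": 0.15, "hear_right": 0.85}}}
posterior = update_belief({"left": 0.5, "right": 0.5}, "listen", "hear_left", T, O)
```

Starting from a uniform belief, one `hear_left` observation shifts the belief in `left` to 0.85 in this example.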

What's included

5 videos, 2 readings, 1 quiz, 1 programming assignment, 3 discussion prompts

Welcome to week 6! This week, we will introduce Monte Carlo methods and cover state value estimation using sample averaging and Monte Carlo prediction, state-action values and epsilon-greedy policies, and importance sampling for off-policy vs on-policy Monte Carlo control. You will learn to estimate state values and state-action values, use importance sampling, and implement off-policy Monte Carlo control for optimal policy learning. You can post in the discussion forum if you need help with the quiz or assignment.
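Monte Carlo prediction by sample averaging can be sketched as below. This is the first-visit variant, chosen here for illustration (the listing does not say which variant the course uses). The episode format, a list of `(state, reward)` pairs where each reward is the one received after leaving that state, and the function name are assumptions.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate state values as the average return following each first visit."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        G = 0.0
        first_return = {}
        # Walk backward so G is the discounted return from each time step.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            # Later overwrites come from earlier time steps, so the value
            # left in the dict is the return from the first visit.
            first_return[state] = G
        for state, G_first in first_return.items():
            returns_sum[state] += G_first
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

# Two hand-written episodes, purely illustrative.
sample_episodes = [[("A", 1), ("B", 0), ("A", 2)], [("B", 3)]]
state_values = first_visit_mc(sample_episodes)
```

With `gamma=1.0`, state `A` is first visited once with return 3, and state `B` twice with returns 2 and 3, so the estimates are 3.0 and 2.5.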

What's included

6 videos, 2 readings, 1 quiz, 1 programming assignment, 2 discussion prompts

Welcome to week 7! This week, we will cover topics related to temporal difference learning for prediction, TD batch methods, SARSA for on-policy control, and Q-learning for off-policy control. You will learn to implement TD prediction, TD batch and offline methods, SARSA and Q-learning, and compare on-policy vs off-policy TD learning. You will then apply your knowledge in solving a tic-tac-toe programming assignment. You can post in the discussion forum if you need help with the quiz or assignment.
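The on-policy vs off-policy distinction comes down to the bootstrap target in a single update. The sketch below shows the two standard tabular update rules side by side; the function names, the `defaultdict` table keyed by `(state, action)`, and the numbers are assumptions for illustration.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a2 the agent actually takes next."""
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the greedy action in s2, whatever is taken next."""
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Same transition, two update rules. Unseen entries (such as terminal
# states) default to 0.0.
actions = ["left", "right"]
Q_sarsa = defaultdict(float, {("s2", "left"): 1.0, ("s2", "right"): 2.0})
Q_qlearn = defaultdict(float, Q_sarsa)

sarsa_update(Q_sarsa, "s1", "right", 0.0, "s2", "left")
q_learning_update(Q_qlearn, "s1", "right", 0.0, "s2", actions)
```

Because the next action chosen here (`left`) is not the greedy one, SARSA moves `Q[("s1", "right")]` toward 0.9 while Q-learning moves it toward 1.8.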

What's included

5 videos, 2 readings, 1 quiz, 3 programming assignments, 2 discussion prompts

Welcome to week 8! This module covers n-step temporal difference prediction, n-step SARSA (on-policy and off-policy), model-based RL with Dyna-Q, and function approximation. You will be prepared to implement n-step TD learning, n-step SARSA, and Dyna-Q for model-based learning, and to use function approximation for reinforcement learning. You will apply your knowledge in the Frozen Lake programming environment. You can post in the discussion forum if you need help with the quiz or assignment.
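The n-step methods listed above share one building block: the n-step return, which sums n discounted rewards and then bootstraps from a value estimate. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.9):
    """Compute r_1 + gamma*r_2 + ... + gamma^(n-1)*r_n + gamma^n * bootstrap_value.

    `rewards` holds the n rewards observed after the current step, and
    `bootstrap_value` is the current estimate V(S_{t+n}), or Q(S_{t+n}, A_{t+n})
    for n-step SARSA.
    """
    G = bootstrap_value
    for r in reversed(rewards):
        G = r + gamma * G
    return G

# Three observed rewards, then bootstrap from an estimated value of 10.
G = n_step_return([1.0, 0.0, 2.0], bootstrap_value=10.0, gamma=0.5)
```

Here `G` is 1 + 0.5·0 + 0.25·2 + 0.125·10 = 2.75. With a single reward this reduces to the one-step TD target, and with rewards running to the end of the episode and a bootstrap value of 0 it becomes the Monte Carlo return.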

What's included

4 videos, 3 readings, 1 quiz, 1 programming assignment, 2 discussion prompts, 1 plugin

Instructor

Instructor ratings
4.2 (5 ratings)
Tony Dear
Columbia University
1 course, 2,028 learners
