Will I earn university credit for completing the Specialization?

Learners who complete the Specialization will earn a Coursera Specialization certificate signed by the professors of record, not University of Alberta credit.

Is this course really 100% online? Do I need to attend any classes in person?

This course is completely online, so there’s no need to show up to a classroom in person. You can access your lectures, readings and assignments anytime and anywhere via the web or your mobile device.

Can I just enroll in a single course?

Yes! To get started, click the course card that interests you and enroll. You can enroll and complete the course to earn a shareable certificate. When you subscribe to a course that is part of a Specialization, you’re automatically subscribed to the full Specialization. Visit your learner dashboard to track your progress.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.

Can I take the course for free?

No, you cannot take this course for free. When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. If you cannot afford the fee, you can apply for financial aid.

Reinforcement Learning Specialization

Master the Concepts of Reinforcement Learning.

Implement a complete RL solution and understand how to apply AI tools to solve real-world problems.

Instructors: Adam White

66,350 already enrolled

4 course series

Get in-depth knowledge of a subject

from 3,588 reviews of courses in this program

Intermediate level

Recommended experience

2 months to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Build a Reinforcement Learning system for sequential decision making.
Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).
Understand how to formalize your task as a Reinforcement Learning problem, and how to begin implementing a solution.
Understand how RL fits under the broader umbrella of machine learning, and how it complements deep learning, supervised and unsupervised learning.

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English

Advance your subject-matter expertise

Learn in-demand skills from university and industry experts
Master a subject or tool with hands-on projects
Develop a deep understanding of key concepts
Earn a career certificate from University of Alberta

Specialization - 4 course series

The Reinforcement Learning Specialization consists of 4 courses exploring the power of adaptive learning systems and artificial intelligence (AI).

Harnessing the full potential of artificial intelligence requires adaptive learning systems. Learn how Reinforcement Learning (RL) solutions help solve real-world problems through trial-and-error interaction by implementing a complete RL solution from beginning to end.

By the end of this Specialization, learners will understand the foundations of much of modern probabilistic artificial intelligence (AI) and be prepared to take more advanced courses or to apply AI tools and ideas to real-world problems. This content will focus on “small-scale” problems in order to understand the foundations of Reinforcement Learning, as taught by world-renowned experts at the University of Alberta, Faculty of Science.

The tools learned in this Specialization can be applied to game development (AI), customer interaction (how a website interacts with customers), smart assistants, recommender systems, supply chain, industrial control systems, finance, oil & gas pipelines, and more.

Applied Learning Project

Through programming assignments and quizzes, students will:

Build a Reinforcement Learning system that knows how to make automated decisions.

Understand how RL relates to and fits under the broader umbrella of machine learning, deep learning, supervised and unsupervised learning.

Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).

Understand how to formalize your task as an RL problem, and how to begin implementing a solution.

Fundamentals of Reinforcement Learning

Course 1, 15 hours

What you'll learn

Formalize problems as Markov Decision Processes
Understand basic exploration methods and the exploration / exploitation tradeoff
Understand value functions, as a general-purpose tool for optimal decision-making
Know how to implement dynamic programming as an efficient solution approach to an industrial control problem
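
As a rough illustration of the dynamic programming idea in the list above, the following sketch runs value iteration on a tiny, made-up MDP. The three-state chain, the action names `stay` and `go`, the rewards, and the discount factor of 0.9 are all assumptions chosen for illustration; none of them come from the course materials.

```python
# Hypothetical 3-state MDP: states 0 and 1 are non-terminal, state 2 is terminal.
# P[state][action] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
}
TERMINAL = 2
GAMMA = 0.9  # assumed discount factor


def action_value(P, V, gamma, state, action):
    """Expected one-step return of taking `action` in `state` under values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[state][action])


def value_iteration(P, gamma, terminal, tol=1e-10):
    """Sweep the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    V[terminal] = 0.0  # the terminal state has value zero by definition
    while True:
        delta = 0.0
        for s in P:
            best = max(action_value(P, V, gamma, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(P, V, gamma):
    """Pick, in every non-terminal state, the action with the highest value."""
    return {s: max(P[s], key=lambda a: action_value(P, V, gamma, s, a)) for s in P}


V = value_iteration(P, GAMMA, TERMINAL)
policy = greedy_policy(P, V, GAMMA)
# V[1] is 1.0 (one step from the reward) and V[0] is about 0.9 (two steps away);
# the greedy policy chooses "go" in both non-terminal states.
```

The same backup generalizes to stochastic transitions by adding more `(probability, next_state, reward)` triples per action.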

Skills you'll gain

Category: Reinforcement Learning

Category: Markov Model

Category: Machine Learning Algorithms

Category: Decision Intelligence

Category: Agentic systems

Category: Machine Learning

Category: Algorithms

Category: Artificial Intelligence

Sample-based Learning Methods

Course 2, 22 hours

What you'll learn

In this course, you will learn about several algorithms that can learn near-optimal policies based on trial-and-error interaction with the environment, that is, learning from the agent’s own experience. Learning from actual experience is striking because it requires no prior knowledge of the environment’s dynamics, yet can still attain optimal behavior. We will cover intuitively simple but powerful Monte Carlo methods, and temporal difference learning methods including Q-learning. We will wrap up this course by investigating how we can get the best of both worlds: algorithms that can combine model-based planning (similar to dynamic programming) and temporal difference updates to radically accelerate learning.
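
To make the idea of learning from trial-and-error experience concrete, here is a minimal sketch of tabular Q-learning. The five-state corridor environment, the reward of 1 at the goal, and the step size, discount, and exploration values are hypothetical choices for illustration, not the course's assignments. The agent never sees the `step` function's internals; it only observes sampled transitions.

```python
import random

N_STATES = 5                 # hypothetical corridor: states 0..4, state 4 is the goal
LEFT, RIGHT = 0, 1
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1  # assumed hyperparameters


def step(state, action):
    """Deterministic corridor dynamics: reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, rng):
    """Explore with probability EPSILON, otherwise act greedily (random ties)."""
    if rng.random() < EPSILON:
        return rng.choice([LEFT, RIGHT])
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q[s], rng)
            s2, r, done = step(s, a)
            # Off-policy TD target: bootstrap from the best next action.
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
greedy_policy = [max((LEFT, RIGHT), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
# In this corridor the learned greedy policy moves RIGHT from every state.
```

Ties are broken at random in `epsilon_greedy` so that the first episode is an unbiased random walk rather than being stuck on whichever action happens to be listed first.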

By the end of this course you will be able to:

- Understand Temporal-Difference learning and Monte Carlo as two strategies for estimating value functions from sampled experience
- Understand the importance of exploration, when using sampled experience rather than dynamic programming sweeps within a model
- Understand the connections between Monte Carlo and Dynamic Programming and TD
- Implement and apply the TD algorithm, for estimating value functions
- Implement and apply Expected Sarsa and Q-learning (two TD methods for control)
- Understand the difference between on-policy and off-policy control
- Understand planning with simulated experience (as opposed to classic planning strategies)
- Implement a model-based approach to RL, called Dyna, which uses simulated experience
- Conduct an empirical study to see the improvements in sample efficiency when using Dyna
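
The difference between Expected Sarsa and Q-learning mentioned above comes down to the bootstrapping target. The sketch below compares the two targets for a single transition, assuming an epsilon-greedy behaviour policy; the function names and the example numbers are illustrative assumptions.

```python
def q_learning_target(reward, gamma, next_q):
    """Q-learning bootstraps from the greedy (maximum) next action value."""
    return reward + gamma * max(next_q)


def expected_sarsa_target(reward, gamma, next_q, epsilon):
    """Expected Sarsa averages next action values under an epsilon-greedy policy."""
    n = len(next_q)
    greedy = max(range(n), key=lambda a: next_q[a])
    probs = [epsilon / n + (1.0 - epsilon if a == greedy else 0.0) for a in range(n)]
    return reward + gamma * sum(p * q for p, q in zip(probs, next_q))


next_q = [1.0, 3.0]  # hypothetical action values in the next state
q_target = q_learning_target(0.0, 1.0, next_q)              # 3.0
es_target = expected_sarsa_target(0.0, 1.0, next_q, 0.2)    # about 2.8
es_greedy = expected_sarsa_target(0.0, 1.0, next_q, 0.0)    # 3.0, same as Q-learning
```

When the policy being averaged over is fully greedy (epsilon of 0), the two targets coincide, which is one way to see Q-learning as a special case of Expected Sarsa.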

Skills you'll gain

Category: Reinforcement Learning

Category: Sampling (Statistics)

Category: Applied Machine Learning

Category: Probability Distribution

Category: Algorithms

Category: Simulations

Category: Machine Learning

Category: Machine Learning Algorithms

Category: Machine Learning Methods

Category: Statistical Methods

Prediction and Control with Function Approximation

Course 3, 22 hours

What you'll learn

In this course, you will learn how to solve problems with large, high-dimensional, and potentially infinite state spaces. You will see that estimating value functions can be cast as a supervised learning problem (function approximation), allowing you to build agents that carefully balance generalization and discrimination in order to maximize reward. We will begin this journey by investigating how our policy evaluation or prediction methods like Monte Carlo and TD can be extended to the function approximation setting. You will learn about feature construction techniques for RL, and representation learning via neural networks and backprop. We conclude this course with a deep dive into policy gradient methods, a way to learn policies directly without learning a value function. In this course you will solve two continuous-state control tasks and investigate the benefits of policy gradient methods in a continuous-action environment.
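
As a small illustration of learning a policy directly, the sketch below performs one REINFORCE-style update on a softmax policy over two discrete actions. This is a simplified stand-in, not the Actor-Critic method used in the course; the function names, step size, and return value are assumptions.

```python
import math


def softmax(prefs):
    """Convert action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(prefs, action):
    """Gradient of log pi(action) w.r.t. the preferences: one-hot minus pi."""
    probs = softmax(prefs)
    return [(1.0 if a == action else 0.0) - p for a, p in enumerate(probs)]


def reinforce_step(prefs, action, ret, alpha):
    """Move preferences along ret * grad log pi(action), scaled by alpha."""
    grad = grad_log_softmax(prefs, action)
    return [th + alpha * ret * g for th, g in zip(prefs, grad)]


# One update after taking action 1 and observing a (hypothetical) return of 2.0.
theta = reinforce_step([0.0, 0.0], action=1, ret=2.0, alpha=0.1)
# theta is approximately [-0.1, 0.1]; action 1 is now more probable than action 0.
```

No value function appears anywhere in the update, which is the sense in which the policy is learned directly.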

Prerequisites: This course strongly builds on the fundamentals of Courses 1 and 2, and learners should have completed these before starting this course. Learners should also be comfortable with probabilities & expectations, basic linear algebra, basic calculus, Python 3.0 (at least 1 year), and implementing algorithms from pseudocode.

By the end of this course, you will be able to:

- Understand how to use supervised learning approaches to approximate value functions
- Understand objectives for prediction (value estimation) under function approximation
- Implement TD with function approximation (state aggregation), on an environment with an infinite state space (continuous state space)
- Understand fixed basis and neural network approaches to feature construction
- Implement TD with neural network function approximation in a continuous state environment
- Understand new difficulties in exploration when moving to function approximation
- Contrast discounted problem formulations for control versus an average reward problem formulation
- Implement Expected Sarsa and Q-learning with function approximation on a continuous state control task
- Understand objectives for directly estimating policies (policy gradient objectives)
- Implement a policy gradient method (called Actor-Critic) on a discrete state environment
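
To sketch what TD with state aggregation might look like, the example below applies semi-gradient TD(0) updates where a continuous state in [0, 1) is mapped to one of a few groups, each with a single shared weight. The state range, the number of groups, and the step size and discount are assumptions for illustration only.

```python
def aggregate(state, n_groups):
    """Map a continuous state in [0, 1) to the index of its group."""
    return min(int(state * n_groups), n_groups - 1)


def td0_update(w, s, r, s2, done, alpha, gamma):
    """One semi-gradient TD(0) update; returns the TD error.

    With state aggregation the value estimate is w[group(s)], so its gradient
    with respect to w is one-hot and only that group's weight changes.
    """
    g = aggregate(s, len(w))
    v_next = 0.0 if done else w[aggregate(s2, len(w))]
    td_error = r + gamma * v_next - w[g]
    w[g] += alpha * td_error
    return td_error


w = [0.0, 0.0, 0.0, 0.0]  # one weight per group (4 hypothetical groups)
delta1 = td0_update(w, s=0.10, r=1.0, s2=0.30, done=False, alpha=0.5, gamma=0.9)
# delta1 is 1.0 and w[0] becomes 0.5; the other weights are untouched.
delta2 = td0_update(w, s=0.20, r=0.0, s2=0.90, done=True, alpha=0.5, gamma=0.9)
# States 0.10 and 0.20 share group 0, so this update also moves w[0] (to 0.25).
```

The second update shows the generalization that aggregation buys: experience at one state changes the estimate for every state in the same group.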

Skills you'll gain

Category: Reinforcement Learning

Category: Supervised Learning

Category: Artificial Neural Networks

Category: Deep Learning

Category: Machine Learning

Category: Algorithms

Category: Linear Algebra

Category: Feature Engineering

Category: Machine Learning Algorithms

A Complete Reinforcement Learning System (Capstone)

Course 4, 16 hours

What you'll learn

In this final course, you will put together your knowledge from Courses 1, 2 and 3 to implement a complete RL solution to a problem. This capstone will let you see how each component (problem formulation, algorithm selection, parameter selection and representation design) fits together into a complete solution, and how to make appropriate choices when deploying RL in the real world. This project will require you to implement both the environment to simulate your problem, and a control agent with Neural Network function approximation. In addition, you will conduct a scientific study of your learning system to develop your ability to assess the robustness of RL agents. To use RL in the real world, it is critical to (a) appropriately formalize the problem as an MDP, (b) select appropriate algorithms, (c) identify what choices in your implementation will have large impacts on performance and (d) validate the expected behaviour of your algorithms. This capstone is valuable for anyone who is planning on using RL to solve real problems.

Skills you'll gain

Category: Reinforcement Learning

Category: Performance Tuning

Category: Machine Learning Algorithms

Category: Artificial Neural Networks

Category: Algorithms

Category: Model Training

Category: Feature Engineering

Category: Model Evaluation

Category: Machine Learning Methods

Category: Solution Architecture

Category: Machine Learning

Category: Agentic systems

Category: Model Optimization

Category: Markov Model

Category: Systems Development

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Adam White

University of Alberta

4 Courses, 116,934 learners

Offered by

University of Alberta

Alberta Machine Intelligence Institute

Frequently asked questions

It is recommended that learners take 4-6 months to complete the Specialization.

It is recommended that learners have at least one year of undergraduate computer science or 2-3 years of professional experience in software development. Experience and comfort with programming in Python are required, and learners must be comfortable converting algorithms and pseudocode into Python. A basic understanding of concepts from statistics (distributions, sampling, expected values), linear algebra (vectors and matrices), and calculus (computing derivatives) is also recommended.

It is recommended that the courses be taken sequentially.

By the end of this Specialization, you will be able to:

Build a Reinforcement Learning system for sequential decision making.
Understand the space of RL algorithms (Temporal-Difference learning, Monte Carlo, Sarsa, Q-learning, Policy Gradients, Dyna, and more).
Understand how to formalize your task as a Reinforcement Learning problem, and how to begin implementing a solution.
Understand how RL fits under the broader umbrella of machine learning, and how it complements deep learning, supervised and unsupervised learning.