Reinforcement Learning Algorithms and Use Cases

Written by Coursera Staff • Updated on

Reinforcement learning algorithms allow artificial intelligence agents to learn the optimal way to perform a task through trial and error without human intervention. Explore reinforcement learning algorithms, such as Q-learning and actor-critic.

[Featured Image] A diverse group of professionals collaborating in a modern workspace, discussing reinforcement learning algorithms while engaging with a computer.

Key takeaways

Reinforcement learning algorithms acquire feedback through trial and error to improve their performance over time.

  • The purpose of reinforcement learning is to train AI models to learn the ideal way to accomplish a task.

  • Types of reinforcement learning algorithms include Q-learning, SARSA, and proximal policy optimization.

  • You can use reinforcement learning algorithms in real-world applications, such as game testing, self-driving cars, and health care treatment plans.

Explore the various types of reinforcement learning algorithms and how they work. If you’re interested in learning more about reinforcement learning, the Reinforcement Learning Specialization from the University of Alberta can help you learn to apply AI tools to solve problems and allow you to practice building a reinforcement learning system.

What is reinforcement learning?

Reinforcement learning is a machine learning method in which an AI agent, such as a software program or a robot, finds the best way to accomplish a goal through trial and error, without a computer scientist or other person showing it what to do. The agent evaluates how much each of its decisions contributes toward the goal and ultimately settles on the behavior with the highest value.

Reinforcement learning algorithms differ mainly in how they estimate the value of actions within the decision-making process and how they learn from trial and error.

How do reinforcement learning algorithms work? 

Reinforcement learning algorithms learn in a way that might remind you of how humans learn through trying things and determining whether that attempt was good. If you want to learn how to do something, such as how to play chess, one way to learn is to sit down in front of a chessboard and start playing. As a beginner, you’re guaranteed to make a lot of mistakes. Every time you make a mistake, you can learn from what went wrong and become a stronger player for the next game. After playing many, many games of chess, you’ll start to understand the best way to dominate your opponents, no matter what maneuvers they attempt. 

Reinforcement learning works similarly. The algorithm attempts to accomplish a goal, receives feedback on its actions in the form of rewards and penalties, and adjusts its decision-making process accordingly. Over many attempts, it learns the most effective way to achieve the goal, much like how humans learn through trial and error. Chess also offers a real-world example of this process: Google DeepMind developed AlphaZero, an artificial intelligence model that learned to play chess, shogi, and Go by playing against itself.

Reinforcement learning is typically formalized as a Markov decision process, a mathematical framework for sequential decision-making, which lets the algorithm evaluate both the immediate and the cumulative rewards of specific actions. The AI model first explores its environment by trying different actions and observing whether they move it toward the final goal.

By looking at the immediate and long-term rewards of certain decisions, the AI model can choose the solution with the most value.
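
To make the trade-off between immediate and long-term rewards concrete, here is a minimal Python sketch of a discounted return, a common way to score a sequence of rewards. The discount factor `gamma`, the function name, and both reward sequences are assumptions chosen for illustration; they do not come from any particular system.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum a reward sequence, discounting each later reward by gamma per step."""
    total = 0.0
    # Work backward so each step adds its reward to the discounted future value.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences for two candidate plans.
greedy_plan = [5, 0, 0, 0]    # small reward right away
patient_plan = [0, 0, 0, 10]  # larger reward three steps later

print("greedy:", discounted_return(greedy_plan))
print("patient:", discounted_return(patient_plan))
```

With these assumed numbers, the patient plan scores higher even after discounting, which is the kind of comparison an agent makes when it weighs long-term value against an immediate payoff.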

Model-based vs. model-free

Reinforcement learning algorithms can be differentiated as model-based or model-free, which describes whether the AI model builds an internal model of its environment or not. In a controlled, unchanging environment, an AI model may build a map or model of its environment to determine the optimum way to navigate the space. For example, a robot that serves drinks at a restaurant may create a map of the area to choose the best path to each table. With the model, the AI can predict the best action without physically navigating the space first. This is a model-based reinforcement learning algorithm. 

In more complex or dynamic environments, building an accurate internal model may not be practical, so a model-free agent learns directly through trial and error. For example, a self-driving car can’t fully map out the space it will move through because of the variability of other drivers, pedestrians, road conditions, and other factors. The AI must instead learn by trying different actions and seeing what works. In cases like this, learning typically happens within a virtual environment, so the agent can experiment freely without endangering anyone. 
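
The model-based idea can be sketched with the restaurant robot example. The snippet below assumes a hypothetical one-dimensional corridor of five cells with the table at the last cell, and it uses value iteration, one standard planning technique, as a stand-in for whatever planner a real robot would use. Every name and number here is an illustrative assumption.

```python
# Hypothetical corridor: cells 0..4, with the table at cell 4.
N_CELLS = 5
GOAL = 4
ACTIONS = {"left": -1, "right": +1}
GAMMA = 0.9  # assumed discount factor


def model(state, action):
    """The robot's internal model: predicts (next_state, reward) without moving."""
    next_state = min(max(state + ACTIONS[action], 0), N_CELLS - 1)
    reward = 0.0 if next_state == GOAL else -1.0  # each extra step costs 1
    return next_state, reward


def value_iteration(sweeps=50):
    """Plan using only the model's predictions."""
    values = [0.0] * N_CELLS
    for _ in range(sweeps):
        for state in range(N_CELLS):
            if state == GOAL:
                continue  # the goal is terminal and keeps value 0
            outcomes = [model(state, action) for action in ACTIONS]
            values[state] = max(r + GAMMA * values[s2] for s2, r in outcomes)
    return values


def greedy_policy(values):
    """For each non-goal cell, choose the action with the best predicted outcome."""
    policy = {}
    for state in range(N_CELLS):
        if state == GOAL:
            continue

        def predicted_value(action, state=state):
            next_state, reward = model(state, action)
            return reward + GAMMA * values[next_state]

        policy[state] = max(ACTIONS, key=predicted_value)
    return policy


values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

A model-free agent would have no `model` function to query. It would have to step through the corridor itself and learn from the rewards it actually receives.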

Types of reinforcement learning algorithms

Common reinforcement learning algorithms include Q-learning, SARSA, REINFORCE, PPO, TRPO, A2C, A3C, and DDPG. These algorithms differ in how they allow the main components of reinforcement learning (the agent, environment, policy, and reward) to interact. The agent is the AI model; the environment is everything the AI model interacts with; the policy is the strategy the AI model uses to choose an action in a given situation; and the reward is a score representing the value of an action.
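
The way these four components fit together can be sketched as a simple interaction loop. The toy environment below, including the `CoinFlipEnv` class, its `reset` and `step` methods, and the fixed policy, is hypothetical and exists only to show where each component sits. It is not the API of any specific library.

```python
import random


class CoinFlipEnv:
    """Hypothetical environment: the agent guesses the outcome of a biased coin."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.steps = 0
        return 0  # this toy problem has a single dummy state

    def step(self, action):
        """Apply the agent's action and return (next_state, reward, done)."""
        outcome = 1 if self.rng.random() < 0.8 else 0  # heads about 80% of the time
        reward = 1.0 if action == outcome else -1.0    # reward: score for the action
        self.steps += 1
        return 0, reward, self.steps >= self.episode_length


def always_heads_policy(state):
    """Policy: maps the current state to an action (here, always guess heads)."""
    return 1


def run_episode(env, policy):
    """Agent loop: observe the state, act, receive a reward, and repeat."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


env = CoinFlipEnv()
print("episode reward:", run_episode(env, always_heads_policy))
```

In this sketch the policy never changes. A learning algorithm would sit inside the loop and use each reward to update the policy.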

Each reinforcement learning algorithm has a different approach to implementing these four primary components. 

  • Q-learning or Deep Q-Networks (DQNs): Q-learning is a model-free, off-policy algorithm, which means the AI model can learn without any prior knowledge of its environment and can learn about the best policy even while it deviates from that policy to explore. A Q-learning algorithm builds its own set of rules for choosing actions by estimating the Q-value, the expected cumulative reward, of each action in a given state. Because of this, you can use Q-learning in uncontrolled or unpredictable environments. A DQN combines Q-learning with a neural network that estimates the Q-values. 

  • SARSA (State, Action, Reward, State, Action): SARSA is a model-free algorithm like Q-learning, but it is on-policy: it updates its value estimates based on the actions it actually takes rather than on the best action available.

  • REINFORCE: The REINFORCE algorithm is a type of policy-gradient algorithm, which means that it adjusts its policy directly as it learns, making actions that led to higher returns more likely. Because REINFORCE learns from episodes generated by the same policy it is improving, it's considered an on-policy algorithm. 

  • Actor-critic and A2C: Actor-critic algorithms use two neural networks, one as the actor to select actions and the other as the critic to evaluate the actions. The actor follows the current policy, and the critic's evaluation is used to adjust that policy after each iteration. Advantage actor-critic (A2C) is one widely used variant of this approach. This architecture can help you get the best of both value-based and policy-based algorithms. 

  • Trust-region policy optimization (TRPO): TRPO helps solve a common problem with policy-gradient algorithms. If a policy update is too large, performance can collapse; if it is too small, learning is slow. TRPO prevents policy changes from being too drastic by constraining how far the policy can move in each update. 

  • Proximal policy optimization (PPO): PPO is an on-policy algorithm developed as a simpler alternative that addresses the same problems as TRPO with comparable effectiveness. Instead of a hard constraint, it uses a clipped objective function that limits how much the policy can change, and it applies its updates in minibatches of collected experience. 

  • Deep deterministic policy gradient (DDPG): DDPG combines many of the qualities of the other algorithms mentioned. It is an off-policy, actor-critic algorithm that uses a value-based critic to learn a deterministic policy, meaning a policy that always selects the same action for a given input. 
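
The difference between Q-learning and SARSA described in the list above shows up directly in their update rules. The following is a minimal tabular sketch. The learning rate `ALPHA`, the discount factor `GAMMA`, the state and action names, and the starting Q-values are all assumptions chosen for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # assumed learning rate
GAMMA = 0.9  # assumed discount factor
ACTIONS = ["left", "right"]


def q_learning_update(q, state, action, reward, next_state):
    """Off-policy update: bootstrap from the best action available in next_state."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """On-policy update: bootstrap from the action the agent actually takes next."""
    target = reward + GAMMA * q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def make_table():
    """Hypothetical Q-table in which 'right' looks much better than 'left' in s1."""
    q = defaultdict(float)
    q[("s1", "left")] = 0.0
    q[("s1", "right")] = 10.0
    return q


# Same experience for both agents: from s0, take 'right', get reward 1, land in s1.
q_table = make_table()
q_learning_update(q_table, "s0", "right", reward=1.0, next_state="s1")

# The SARSA agent happens to explore 'left' next, so it learns from that choice.
sarsa_table = make_table()
sarsa_update(sarsa_table, "s0", "right", reward=1.0,
             next_state="s1", next_action="left")

print("Q-learning:", q_table[("s0", "right")])
print("SARSA:", sarsa_table[("s0", "right")])
```

In this sketch, Q-learning ends up with the higher estimate because it assumes the best follow-up action, while SARSA's estimate reflects the exploratory action that was actually taken.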

What are reinforcement learning algorithm use cases in machine learning?

You can use reinforcement learning in many different industries for various applications. A few examples of the many ways you could use reinforcement learning include gaming, health care, and self-driving vehicles: 

  • Gaming: Reinforcement learning algorithms can learn to play games, allowing you to play against an opponent who can adapt to your moves. You can also use reinforcement learning algorithms for game testing. 

  • Self-driving cars: You can use reinforcement learning to control a self-driving car that can learn to maneuver in a complex and unpredictable environment. Reinforcement learning allows the AI model to manage complex variables like speed, multiple lanes, and other drivers. 

  • Health care: Health care professionals use reinforcement learning to help guide patient treatment decisions. This approach is known as a dynamic treatment regime. 




This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.