Decision Trees in Machine Learning: Two Types (+ Examples)

Written by Coursera Staff • Updated on

Decision trees, which have a flowchart-like structure, are a type of supervised learning algorithm commonly used in machine learning. They model and predict outcomes based on input data. Read on to learn more.


Trees are a common analogy in everyday life. Shaped by a combination of roots, trunk, branches, and leaves, trees often symbolise growth. In machine learning, a decision tree is an algorithm that can create both classification and regression models. 

The decision tree is so named because it starts at the root, like an upside-down tree, and branches off to demonstrate various outcomes. Because problem-solving is the basis of machine learning, decision trees help people visualise these models and adjust how they train them. 

Discover what you need to know about decision trees in machine learning.

What is a decision tree? 

A decision tree is a supervised learning algorithm used for classification and regression modelling. Regression serves as a method for predictive modelling, so these trees either classify data or predict what will happen next.

Decision trees look like flowcharts. They start at the root node with a specific question about the data, which leads to branches that hold potential answers. The branches then lead to decision (internal) nodes, which ask more questions that lead to more outcomes. This continues until the data reaches a terminal (or “leaf”) node and ends.

Four main methods of training algorithms are used in machine learning: supervised learning, unsupervised learning, reinforcement learning, and semi-supervised learning. A decision tree helps you visualise how a supervised learning algorithm leads to specific outcomes.

Introduction to supervised learning

To deepen your knowledge of supervised learning, consider the course Introduction to Supervised Learning: Regression and Classification from DeepLearning.AI and Stanford University. In 33 hours or less, you’ll get an introduction to modern machine learning, including supervised learning and algorithms such as decision trees, multiple linear regression, neural networks, and logistic regression.


Why is a decision tree important in machine learning?

Decision trees in machine learning provide an effective method for making decisions because they lay out the problem and all the possible outcomes, enabling developers to analyse the possible consequences of a decision. As an algorithm accesses more data, it can predict outcomes for future data. 

In this simple text decision tree, the analysis focuses on whether to go to the shops to buy toilet roll.

Basic Decision Tree Example

Problem: “Do I need to get toilet roll?”

  • Yes > Is it raining?

      • Yes > Go to the shops when it stops raining

      • No > Go to the shops and buy toilet roll

  • No > There’s no need to go to the shops
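The branching above can be sketched as ordinary if/else logic. This is a hand-written illustration, not a trained model; the function name and return messages are invented for the example:

```python
def toilet_roll_decision(need_roll, raining):
    """Walk the toy decision tree from the root question down to a leaf."""
    if not need_roll:              # root node: "Do I need to get toilet roll?"
        return "There's no need to go to the shops"
    if raining:                    # internal node: "Is it raining?"
        return "Go to the shops when it stops raining"
    return "Go to the shops and buy toilet roll"

print(toilet_roll_decision(need_roll=True, raining=False))
```

Each return statement corresponds to a leaf node. The key difference in machine learning is that a trained decision tree learns these questions, and the order to ask them in, from data rather than having them written by hand.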

In machine learning, decision trees offer simplicity and a visual representation of the possibilities when formulating outcomes. Read on to see how the two types of decision trees work. 

Types of decision trees in machine learning

Decision trees in machine learning can be either classification trees or regression trees. Together, the two types fall into the category of “classification and regression trees”, often abbreviated CART. Their respective roles are to “classify” and to “predict.”

1. Classification trees

Classification trees determine whether an event happened or didn’t happen. Usually, this involves a “yes” or “no” outcome. 

This type of decision-making is often used in the real world. Consider the below examples to help contextualise how decision trees work for classification:

Example 1: How to spend your free time after work

What you do after work in your free time can often be weather-dependent. If it is sunny, you might choose between having a picnic with a friend, grabbing a drink with a colleague, or getting a few messages. If it’s raining, you might opt to stay home and watch a film instead. There is a clear outcome. In this case, the classification is whether you should “go out” or “stay in.”

Example 2: Homeownership based on age and income

In a classification tree, the data set splits according to its variables. Two variables, age and income, determine whether or not someone buys a house. If training data tells us that 80 per cent of people over age 30 bought a house, the data gets split there, with age becoming the first node in the tree. This split makes the data 80 per cent “pure.” The second node then addresses income from that point.
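The idea of a split making data “pure” can be made concrete with Gini impurity, one common purity measure used when growing classification trees. Below is a minimal pure-Python sketch; the ages and labels are invented for illustration, not real housing data:

```python
def gini(labels):
    """Gini impurity for binary labels: 0.0 is perfectly pure, 0.5 is a 50/50 mix."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)   # fraction of people who bought a house
    return 2 * p * (1 - p)

# Invented training data: 1 = bought a house, 0 = did not
ages   = [22, 25, 28, 33, 35, 42, 45, 50]
bought = [0,  0,  0,  0,  1,  1,  1,  1]

# Candidate split: "is age over 30?"
left  = [b for a, b in zip(ages, bought) if a <= 30]
right = [b for a, b in zip(ages, bought) if a > 30]

print(round(gini(bought), 2))   # 0.5: a 50/50 mix before the split
print(round(gini(left), 2))     # 0.0: everyone aged 30 or under did not buy
print(round(gini(right), 2))    # 0.32: 80 per cent of the over-30s bought
```

A tree-building algorithm tries many candidate splits like this one and keeps whichever split lowers the impurity of the resulting groups the most.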

Gain hands-on experience with classification trees

To understand how decision trees work in machine learning, consider registering for these Guided Projects to apply your skills to real-world projects. You can complete them in two hours or less:

Decision Tree and Random Forest Classification using Julia

Predicting Salaries with Decision Trees


2. Regression trees

Regression trees, on the other hand, predict continuous values based on previous data or information sources. For example, they can predict the price of petrol or how much a customer is likely to spend on eggs at a particular shop.

This type of decision-making is more about programming algorithms to predict what is likely to happen, given previous behaviour or trends. 

Example 1: Housing prices in Manchester

Consider using regression analysis to predict the price of a house in Manchester. Using data points of what prices have been in previous years, a regression model can project housing prices in the coming years. Because prices have risen steadily, the relationship is roughly linear, and machine learning lets you predict specific prices based on a series of variables that have held true in the past.

Example 2: Bachelor’s degree graduates in 2025

A regression tree can help a university predict the number of bachelor’s degree students in 2025. On a graph, one can plot the number of degree-holding students between 2013 and 2023. If the number of university graduates increases linearly each year, then regression analysis can be used to build an algorithm that predicts the number of students in 2025. 
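A regression tree’s leaf predicts the average of the training values that land in it. The sketch below shows that mechanic with invented graduate counts and a hand-picked split year; a real tree would choose the split that minimises prediction error:

```python
# Invented data: graduate counts rising linearly from 2013 to 2023
years = list(range(2013, 2024))
grads = [1000 + 50 * (y - 2013) for y in years]

SPLIT_YEAR = 2018   # hand-picked split point, for illustration only

# Each side of the split is a leaf; a leaf predicts the mean of its targets
left  = [g for y, g in zip(years, grads) if y <= SPLIT_YEAR]
right = [g for y, g in zip(years, grads) if y > SPLIT_YEAR]

def predict(year):
    leaf = left if year <= SPLIT_YEAR else right
    return sum(leaf) / len(leaf)

print(predict(2025))   # 2025 falls in the right-hand leaf
```

Note the limitation this exposes: a tree predicts a constant value per leaf, so it cannot extrapolate the upward trend past 2023 the way a fitted straight line would. For steadily rising data like this, a linear regression model is often the better fit.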

Classification and regression tree (CART) is a predictive algorithm used in machine learning that generates future predictions based on previous values. These decision trees are at the core of machine learning and serve as a basis for other machine learning algorithms such as random forest, bagged decision trees, and boosted decision trees.

Gain hands-on experience with regression trees

To see how decision tree algorithms work in predictive machine learning models, take a look at these Guided Projects. Each takes less than two hours and is built around real-world examples, helping you elevate your skills:

XG-Boost 101: Used Cars Price Prediction

Decision Tree Classifier for Beginners in R


Decision tree terminology

These terms come up frequently in machine learning and are helpful to know as you embark on your machine-learning journey:

  • Root node: The topmost node of a decision tree, representing the entire data set and the first decision

  • Decision (or internal) node: A node within the tree where a prior node branches into two or more variables

  • Leaf (or terminal) node: The last node in a branch and the farthest from the root; also called an external node, it has no child nodes

  • Splitting: The process of dividing a node into two or more child nodes; the point at which a decision branches into separate variables

  • Pruning: The opposite of splitting; the process of reducing the tree to only its most important nodes or outcomes
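These terms map directly onto what scikit-learn (the library used in the course recommended below) reports for a fitted tree. A minimal sketch with invented toy data; `export_text` is a real scikit-learn helper, while the feature names are made up:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy data: the label is 1 when either feature is 1 (an OR pattern)
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# The first line printed is the root node's split; indented "class:" lines
# are leaf (terminal) nodes, and the splits between them are internal nodes
print(export_text(clf, feature_names=["feature_a", "feature_b"]))
print("total nodes:", clf.tree_.node_count)
```

For this data the tree needs one root split plus one internal split, giving five nodes in total (two decision nodes and three leaves).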

Learn machine learning with Coursera

Decision trees are helpful tools that provide structure to machine learning. Now that you can tell your root nodes from your leaf nodes, consider starting your machine learning journey with Coursera’s top-rated Specialisation Supervised Machine Learning: Regression and Classification, offered by DeepLearning.AI. 

Taught by AI visionary Andrew Ng, the programme shows you how to build machine learning models in Python using the popular libraries NumPy and scikit-learn, and how to train supervised machine learning models for prediction, including decision trees.


This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.