What Is a Confusion Matrix?

Written by Coursera Staff • Updated on Feb 20, 2026

Learn what a confusion matrix is and why professionals across industries value this tool. Plus, discover how to calculate and interpret key performance indicators from your confusion matrix.

Key takeaways

For binary classification, a confusion matrix is a two-by-two table that shows the number of correct and incorrect predictions your classification algorithm produces.

According to Glassdoor, a data scientist earns a median total salary of $154,000 per year [1].

Researchers use a confusion matrix to determine key performance metrics such as the accuracy, precision, and F1 score of their models.

You can use a confusion matrix to assess the accuracy and precision of your models as a data scientist.

Explore the basics of classification in machine learning, how to interpret a confusion matrix, advantages and limitations, and what type of career might use this tool. If you’re ready to begin enhancing your machine learning skill set, enroll in the Machine Learning Specialization from Stanford University and DeepLearning.AI, where in as little as two months, you can learn about model evaluation, unsupervised learning, predictive modeling, data ethics, and more.

What is classification in machine learning?

Classification in machine learning is like sorting things into different groups based on their features. For example, imagine you have a selection of photos that include either cats or dogs. Classification algorithms help the machine learn the differences between cat and dog images based on characteristics like color, size, or shape. In machine learning, you might use this concept for more complex tasks, such as recognizing spam emails, diagnosing diseases from medical images, or categorizing products.

Classification algorithms consider several independent variables before generating the probability of something being in each possible category. For example, let’s say you are trying to provide a medical diagnosis for a patient. In this case, the patient’s characteristics and symptoms might be the independent variables. If the patient is over 60 years old and is experiencing joint pain and stiffness, your classification algorithm might give a high percentage likelihood that the patient has arthritis.

Depending on your algorithm and potential categories, you might find likelihoods associated with other conditions, such as joint fractures, cancer, or infections.
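As a rough illustration of how a classifier might turn evidence into a probability for each category, the sketch below applies a softmax function to a set of raw scores. The scores and category names are hypothetical values chosen for illustration, not output from a real diagnostic model, and softmax is only one of several ways a model can produce probabilities.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum score for numerical stability.
    highest = max(scores.values())
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw scores a model might assign to each diagnosis.
raw_scores = {"arthritis": 2.0, "fracture": 0.5, "infection": 0.1}
probabilities = softmax(raw_scores)

# The category with the highest probability becomes the prediction.
predicted = max(probabilities, key=probabilities.get)
print(predicted, probabilities)
```

The predicted label is what later gets compared with the true label to fill in a confusion matrix.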

Confusion matrix: Python

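To create a confusion matrix in Python with scikit-learn, you need two required parameters and can pass three optional ones:

• y_true: The correct target values (required)

• y_pred: The values returned by the algorithm (required)

• labels: An optional list of labels that indexes the matrix; you can use it to reorder the labels or select a subset of them

• sample_weight: Optional weights for each sample

• normalize: An optional setting that normalizes the counts over the true rows ("true"), the predicted columns ("pred"), or the whole matrix ("all"); with the default value None, the function returns raw counts without normalization.

To create the matrix, call the function with the following signature: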

sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
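A minimal usage sketch of this function follows, assuming scikit-learn is installed. The label lists are made-up values for illustration, with 1 standing for the positive class and 0 for the negative class.

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels only: 1 = positive, 0 = negative.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# labels=[0, 1] fixes the row and column order: negative class first.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# For binary labels, ravel() flattens the matrix in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

# normalize="true" divides each row by the number of actual samples in that class.
cm_rates = confusion_matrix(y_true, y_pred, labels=[0, 1], normalize="true")

print(cm)
print(cm_rates)
```

Rows correspond to the actual classes and columns to the predicted classes, so passing labels explicitly is a simple way to keep the layout predictable.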

Detecting errors in classification

After you develop your classification algorithm, you will want to measure how accurate your model is. When the algorithm labels something incorrectly, the mistake is either a false positive or a false negative. For instance, if spam is the positive class, marking a safe email as spam is a false positive. On the other hand, a false negative occurs when the algorithm fails to identify what it’s supposed to find, such as missing a spam email and letting it into the inbox. Ideally, you want your classification algorithm to have much higher rates of true positives and true negatives than false positives and false negatives.
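The four outcomes can be sketched in a few lines of Python. This example treats spam as the positive class; the function name and the five email results are hypothetical and exist only to illustrate the labeling.

```python
from collections import Counter

def outcome(actual_is_spam, predicted_is_spam):
    """Name the outcome of one prediction, treating spam as the positive class."""
    if predicted_is_spam:
        return "TP" if actual_is_spam else "FP"
    return "FN" if actual_is_spam else "TN"

# Hypothetical (actual, predicted) pairs for five emails.
emails = [(True, True), (False, True), (True, False), (False, False), (False, False)]

# Tally how often each outcome occurs.
counts = Counter(outcome(actual, predicted) for actual, predicted in emails)
print(counts)
```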

What is a confusion matrix in machine learning?

A confusion matrix is a convenient way of representing your true positives, true negatives, false positives, and false negatives. For binary classification, a confusion matrix is typically shown as a two-by-two table:

	Predicted negative	Predicted positive
Actual negative	A	B
Actual positive	C	D

In this confusion matrix, you have four cells:

“D” or True positives (TP): The model correctly predicts the positive class in these cases.

“A” or True negatives (TN): The model correctly predicts the negative class in these cases.

“B” or False positives (FP): In these instances, the model indicated “positive,” but the true value was “negative.” In statistics, you might refer to this as a Type I error.

“C” or False negatives (FN): These are the cases in which the model indicated “negative,” but the true value was “positive.” In statistics, you might refer to this as a Type II error.
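To make the cell layout concrete, here is a small sketch that builds the same two-by-two table from lists of labels, with rows for the actual class and columns for the predicted class. The function name and sample labels are assumptions made for this example.

```python
def build_confusion_matrix(actual, predicted):
    """Return [[A, B], [C, D]]: rows = actual class, columns = predicted class.

    Labels must be 0 (negative) or 1 (positive).
    """
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Illustrative labels only.
actual = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 0]

# A = true negatives, B = false positives, C = false negatives, D = true positives.
(A, B), (C, D) = build_confusion_matrix(actual, predicted)
print(A, B, C, D)
```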

What is a confusion matrix used for?

From the four basic elements of a confusion matrix (true positives, true negatives, false positives, false negatives), you can calculate several key performance metrics:

Accuracy

Accuracy is the overall correctness of the model, calculated as (TP + TN) / (TP + TN + FP + FN).
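As a quick worked example of this formula, the sketch below uses hypothetical counts; the zero-total guard is a convention chosen here, not part of the definition.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN, so 85 of 100 are correct.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
print(acc)
```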

Precision

Precision is the accuracy of positive predictions, calculated as TP / (TP + FP). If your precision is 0.4, then the model is correct in its positive predictions 40 percent of the time.
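The 0.4 case above can be reproduced with a short sketch. The counts are hypothetical, and returning 0.0 when the model makes no positive predictions is an assumption of this example.

```python
def precision(tp, fp):
    """Fraction of positive predictions that were actually positive."""
    predicted_positive = tp + fp
    # Convention assumed here: report 0.0 if nothing was predicted positive.
    return tp / predicted_positive if predicted_positive else 0.0

# Hypothetical counts: 10 positive predictions, 4 of them correct.
p = precision(tp=4, fp=6)
print(p)
```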

Sensitivity or recall

Sensitivity, also called recall, represents the model’s ability to find all the positive cases. When this measure is high, the model misses few positive cases. A model tuned for high sensitivity also tends to produce more false positives. You can calculate sensitivity as TP / (TP + FN).
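One way to see the trade-off between sensitivity and false positives is to lower the decision threshold of a model that outputs scores. The following is a minimal sketch with hypothetical scores and labels; the function name and threshold values are assumptions for illustration.

```python
def recall_and_false_positives(scores, labels, threshold):
    """Return (recall, false positive count) when predicting positive at or above threshold."""
    tp = fn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fp

# Hypothetical model scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

strict = recall_and_false_positives(scores, labels, threshold=0.7)
lenient = recall_and_false_positives(scores, labels, threshold=0.35)
print(strict, lenient)
```

With these made-up values, the lower threshold finds every positive case but also lets one negative case through as a false positive.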

Specificity

Specificity represents the model’s ability to classify negative instances correctly. It is the counterpart of sensitivity: a higher value means the model is more likely to classify negative cases correctly. However, a model tuned for high specificity also tends to produce more false negatives. You can calculate specificity as TN / (TN + FP).
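A brief sketch of this calculation follows, using hypothetical counts. It also shows the false positive rate, which equals one minus the specificity.

```python
def specificity(tn, fp):
    """Fraction of actual negatives that the model classified as negative."""
    actual_negative = tn + fp
    # Convention assumed here: report 0.0 if there are no actual negatives.
    return tn / actual_negative if actual_negative else 0.0

# Hypothetical counts: 100 actual negatives, 90 classified correctly.
spec = specificity(tn=90, fp=10)
false_positive_rate = 1 - spec
print(spec, false_positive_rate)
```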

F1 score

The F1 score, or F-measure, is the harmonic mean of precision and recall, so it is high only when both values are high. You can calculate it as 2 * (Precision * Recall) / (Precision + Recall).
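Substituting the definitions of precision and recall into this formula gives an equivalent form written directly in terms of the matrix cells. The derivation below is a sketch; the numbers in the last line are hypothetical counts (TP = 8, FP = 8, FN = 2) used only as a worked example.

```latex
\begin{align*}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \\[4pt]
F_1 &= \frac{2 \, P \, R}{P + R}
     = \frac{2 \, TP^2}{TP \, (TP + FN) + TP \, (TP + FP)}
     = \frac{2 \, TP}{2 \, TP + FP + FN} \\[4pt]
F_1 &= \frac{2 \cdot 8}{2 \cdot 8 + 8 + 2} = \frac{16}{26} \approx 0.615
\end{align*}
```

In this example, precision is 0.5 and recall is 0.8, and the F1 score of about 0.615 sits closer to the lower of the two values, which is typical of a harmonic mean.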

Interpreting a confusion matrix involves more than just looking at the numbers. It’s about understanding the context of the problem you’re solving. If you are developing a screening algorithm for a medical diagnosis, you might want to avoid a false negative. For example, imagine predicting no disease when someone actually has one. Avoiding false negatives in this context would be a top priority.

In contrast, in email spam detection, a false positive (marking a good email as spam) might be more problematic. By examining the matrix, you can identify if your model needs to be more sensitive (increasing true positives) or more specific (reducing false positives) and adjust your approach accordingly.

Advantages and disadvantages of using a confusion matrix

When choosing to use a confusion matrix, you should consider whether it is appropriate for your type of data and what performance measures are important to you. The primary advantages and limitations you might experience include the following.

Advantages of a confusion matrix

By using a confusion matrix with binary data, you can determine several different performance measures. With binary classification, you can calculate the model’s accuracy, precision, recall, and Matthews correlation coefficient directly from the matrix. A single matrix also gives one point on the ROC curve; tracing the full curve and the area under it requires a matrix at each of several decision thresholds. Each of these measures represents a different aspect of your model’s performance, and you can use these measures to determine how your model needs to be altered. When designing for a specific data type, you might find that having control over what your model prioritizes is important.
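The Matthews correlation coefficient is not defined earlier in this article, so here is a minimal sketch of its standard formula, (TP * TN - FP * FN) divided by the square root of (TP + FP)(TP + FN)(TN + FP)(TN + FN). The function name and counts are hypothetical, and returning 0.0 for a zero denominator is a convention assumed here.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion matrix cells."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0.0 when any row or column total is zero.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for a perfect classifier and for one with a few errors.
perfect = mcc_from_counts(tp=5, tn=5, fp=0, fn=0)
mixed = mcc_from_counts(tp=3, tn=3, fp=1, fn=1)
print(perfect, mixed)
```

The value ranges from -1 to 1, where 1 indicates perfect agreement between predictions and true labels and 0 indicates performance no better than chance.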

Disadvantages of a confusion matrix

When using a confusion matrix, you should consider the data type you are using. Confusion matrices can become complex when you have multiclass classification because you have more than two classes to predict. As this number increases, interpreting the accuracy and performance of your model from the matrix becomes increasingly complex. Many binary performance measures also don’t apply directly to multiclass problems; you typically need to compute them for each class and then average them.
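To illustrate how a multiclass matrix grows and how a per-class measure can be read from it, the sketch below builds a three-class matrix and computes recall for each class. The class names, labels, and function names are hypothetical.

```python
def multiclass_confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class, in the order given by classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def per_class_recall(matrix, classes):
    """Recall for each class: diagonal count divided by the row total."""
    result = {}
    for i, label in enumerate(classes):
        row_total = sum(matrix[i])
        result[label] = matrix[i][i] / row_total if row_total else 0.0
    return result

# Illustrative labels only.
classes = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]

matrix = multiclass_confusion_matrix(actual, predicted, classes)
recalls = per_class_recall(matrix, classes)
print(matrix, recalls)
```

With three classes the matrix already has nine cells, and each class has its own recall, which is why a single summary usually involves averaging.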

Accuracy computed from a confusion matrix can also be misleading if you have class imbalances. For example, suppose you had a data set with 1,000 values and only three positive values. In that case, a classification system that predicts everything as negative would reach 99.7 percent accuracy even though it cannot detect a single positive case.
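The imbalance example can be checked with a few lines of Python. This sketch assumes a model that always predicts the negative class, with 1 marking a positive value and 0 a negative one.

```python
# 1,000 values: 3 positives (1) and 997 negatives (0).
actual = [1] * 3 + [0] * 997

# A model that predicts everything as negative.
predicted = [0] * 1000

# Accuracy looks excellent because almost every value is negative.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall exposes the problem: none of the positives were found.
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(actual)

print(accuracy, recall)
```

Checking recall or precision alongside accuracy is one way to catch this kind of failure.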

Professions that use confusion matrices

Many careers involve classification tasks and therefore use confusion matrices. Classification might be central to your job role, such as designing fraud detection software, or it might support your work indirectly, such as when you rely on an email spam filter.

As a data scientist, you might use a confusion matrix to understand the accuracy and precision of your models. This type of data science applies to many fields you can specialize in. For example, as an environmental data scientist, you might use a confusion matrix when studying how accurately your classification model detected a genetic variant in your sample. As a data scientist, you can expect to earn a median total salary of $154,000 per year as of February 2026 [1]. This figure includes base salary and additional pay, which may represent profit-sharing, commissions, bonuses, or other compensation.

Article sources

1. Glassdoor. “Salary: Data Scientist in the United States,” https://www.glassdoor.com/Salaries/us-data-scientist-salary-SRCH_IL.0,2_IN1_KO3,17.htm. Accessed February 14, 2026.
