What Is Anomaly Detection in Machine Learning?

Written by Coursera Staff • Updated on

Learn about anomaly detection in machine learning, including types of anomalies, various anomaly detection techniques, and industry applications.

[Featured image] A group of doctors meets in a conference room to discuss how anomaly detection in machine learning can help them detect diseases in patients.

Anomaly detection is a data-related task in which algorithms identify outliers: data points that deviate from the rest of the data. These outliers matter because inconsistencies within the data may be a cause for concern. Identifying them helps you avoid misrepresenting a data set and address any problems they may represent. Successful anomaly detection can help businesses in several ways, including minimizing costs, retaining customers, and managing time. Machine learning is increasing the speed at which anomalies can be identified, and in this article, you will learn how various machine learning algorithms are making an impact in this space.

Anomaly detection in machine learning

Anomaly detection in machine learning is the process of using machine learning models to identify anomalies rapidly. This serves several purposes, from keeping the data you process clean and high quality to supporting specific business goals. When organizations can rely on the quality of their data, they can trust their analysis, which leads to better decision-making.

Anomaly detection techniques existed before machine learning, but modern approaches that use machine learning can detect outliers automatically. Their main advantages include the ability to handle large volumes of data and high-dimensional data from various sources, a high rate of success in identifying anomalies, and support for real-time detection.
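
To make real-time detection more concrete, here is a minimal Python sketch of one simple streaming approach, assuming the data arrives as a single stream of numbers. The hypothetical StreamingZScoreDetector class keeps a running mean and variance (Welford's algorithm) and flags any value more than `threshold` standard deviations from the values seen so far. It is a statistical baseline rather than a trained machine learning model, but it shows how a detector can score each point as it arrives without storing the whole history.

```python
import math


class StreamingZScoreDetector:
    """Flags each new value that lies far from the running mean of earlier values.

    Uses Welford's online algorithm, so every update takes constant time and
    memory no matter how long the stream runs.
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 10):
        self.threshold = threshold    # standard deviations that count as anomalous
        self.warmup = max(2, warmup)  # observations to collect before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                 # running sum of squared deviations

    def update(self, x: float) -> bool:
        """Score x against the history so far, then add it to the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update of the running mean and sum of squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Hypothetical sensor readings with one sudden spike at index 10.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 10.2, 25.0, 10.1]
detector = StreamingZScoreDetector(threshold=3.0, warmup=10)
flags = [detector.update(r) for r in readings]
print([i for i, flagged in enumerate(flags) if flagged])  # [10]
```

In this sketch, flagged values are still folded into the running statistics, which widens the baseline after a spike; skipping flagged values is a common alternative.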

Types of anomalies

Anomalies can present themselves in different ways. While some anomalies are merely outliers that happen suddenly, other anomalies are less obvious. In some cases, anomalies can appear as a gradual change over time, slowly altering the data. 
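
Gradual changes are hard to catch point by point because no single value looks extreme. One simple way to surface them, sketched below under the assumption that the start of the series reflects normal behavior, is to compare the mean of a recent window with the mean of a baseline window. The function drift_score, its window sizes, and the example series are illustrative choices, not a standard API.

```python
from statistics import mean, stdev


def drift_score(values, baseline_size=50, recent_size=20):
    """Shift of the recent window's mean, measured in baseline standard deviations.

    The first `baseline_size` values are assumed to reflect normal behavior.
    """
    if len(values) < baseline_size + recent_size:
        raise ValueError("not enough data for both windows")
    baseline = values[:baseline_size]
    recent = values[-recent_size:]
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline window has no variation")
    return (mean(recent) - mean(baseline)) / spread


# Hypothetical series: small repeating noise, with and without a slow upward trend.
noise = [0.5, -0.5, 0.3, -0.3]
stable = [20 + noise[i % 4] for i in range(100)]
drifting = [20 + noise[i % 4] + 0.05 * i for i in range(100)]
print(round(drift_score(stable), 2))    # 0.0
print(round(drift_score(drifting), 2))  # 3.92
```

No individual value in the drifting series is far from its neighbors, yet the recent window sits almost four baseline standard deviations above where the series started.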

One way to classify anomalies is as intentional or unintentional:

  • Intentional anomalies occur because of a specific event, such as a cyber attack on a company's network.

  • Unintentional anomalies instead arise from an error at some point during data collection, such as a miscalculation caused by human error, which damages the quality of a data set.

When you look more closely at the relationships within data, you can identify three more types of anomalies:

  • Contextual anomalies are data points that are abnormal only in a specific context. For example, a level of website traffic that is normal during the day might be anomalous in the middle of the night. Like intentional anomalies, they can be tied to specific events.

  • Point anomalies are individual data points that stand out sharply from the rest of a data set.

  • Collective anomalies are groups of related data points, often occurring in sequence, that are anomalous together even if each point looks normal on its own. They may need closer investigation to determine whether they are a cause for concern.
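
The difference between point and contextual anomalies can be sketched in a few lines of Python. The hypothetical helpers below use a leave-one-out z-score: each value is compared with the mean and standard deviation of the other values, so an outlier cannot inflate the spread it is measured against. point_anomalies compares each value with the whole data set, while contextual_anomalies compares it only with values from the same context (here, an assumed "day" or "night" label on website visits). The threshold of 3 and the data are assumptions.

```python
from statistics import mean, stdev


def loo_zscore(value, others):
    """How many standard deviations `value` sits from the mean of `others`."""
    spread = stdev(others)
    return abs(value - mean(others)) / spread if spread > 0 else 0.0


def point_anomalies(values, threshold=3.0):
    """Indices of values that stand out from the whole data set."""
    return [i for i, v in enumerate(values)
            if loo_zscore(v, values[:i] + values[i + 1:]) > threshold]


def contextual_anomalies(records, threshold=3.0):
    """Indices of (context, value) records that are unusual for their own context."""
    by_context = {}
    for i, (ctx, v) in enumerate(records):
        by_context.setdefault(ctx, []).append((i, v))
    flagged = []
    for members in by_context.values():
        for i, v in members:
            others = [w for j, w in members if j != i]
            if len(others) >= 2 and loo_zscore(v, others) > threshold:
                flagged.append(i)
    return sorted(flagged)


# Hypothetical hourly website visits labeled with their context.
traffic = [("day", 1000), ("day", 1020), ("day", 980), ("day", 1010),
           ("day", 990), ("day", 1005), ("night", 100), ("night", 110),
           ("night", 95), ("night", 105), ("night", 90), ("night", 900)]
values = [v for _, v in traffic]

print(point_anomalies([10, 11, 9, 10, 12, 10, 95]))  # [6]
print(point_anomalies(values))                       # []
print(contextual_anomalies(traffic))                 # [11]
```

The 900-visit night reading looks ordinary next to daytime traffic, so the global check misses it, but it is extreme compared with other nighttime hours.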

Anomaly detection techniques and algorithms

The right anomaly detection technique depends greatly on the type of data you're working with and how much of it is labeled. Anomaly detection techniques fall into three categories: unsupervised, supervised, and semi-supervised anomaly detection.

1. Unsupervised anomaly detection

Unsupervised anomaly detection is a popular approach in machine learning because unlabeled data is far more common than labeled data, and an unsupervised algorithm can find anomalies on its own without needing labels. Common algorithms include isolation forests and one-class support vector machines, as well as deep learning methods based on artificial neural networks. You can see unsupervised anomaly detection used in areas such as fraud detection and detecting medical anomalies.
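
Isolation forests rest on the idea that anomalies are few and different, so random splits isolate them quickly. The following is a simplified, from-scratch Python sketch of that idea, not a production implementation: each tree repeatedly picks a random feature and a random split value, and a point's score comes from its average depth across trees, normalized with the usual isolation forest scoring formula. The function names, parameters, and data are hypothetical; in practice you would likely use a library implementation such as scikit-learn's IsolationForest. Note that no labels are used anywhere.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path(n):
    """Expected path length of an unsuccessful search in a random binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus a correction for points left unsplit there."""
    if node[0] == "leaf":
        return depth + average_path(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(data, n_trees=100, sample_size=256, seed=0):
    """Score each point between 0 and 1; values near 1 mean easy to isolate (anomalous)."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
            for x in data]


# Hypothetical 2-D data: 200 points around the origin plus one far-away point.
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
scores = isolation_scores(data)
print(scores.index(max(scores)))  # 200, the far-away point
```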

2. Supervised anomaly detection

Supervised anomaly detection, unlike unsupervised methods, requires labeled data. The downside is that the algorithm can generally detect only the kinds of anomalies it has seen in its training data, so you need to provide enough labeled examples of both normal and anomalous data. Examples of supervised anomaly detection algorithms include random forests and k-nearest neighbors. Industry applications include detecting fraudulent transactions and spotting defects during manufacturing.
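
A supervised detector can be as simple as k-nearest neighbors over labeled examples. In this hypothetical Python sketch, each transaction is described by two assumed features (amount and number of transactions in the past hour), and a new point takes the majority label of its k closest training points by Euclidean distance. The training data is made up.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest labeled neighbors.

    `train` is a list of (features, label) pairs with label "normal" or "anomaly".
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled transactions: (amount, transactions in the past hour).
train = [((20, 1), "normal"), ((35, 2), "normal"), ((15, 1), "normal"),
         ((50, 3), "normal"), ((900, 12), "anomaly"), ((1200, 15), "anomaly"),
         ((950, 10), "anomaly")]

print(knn_predict(train, (40, 2)))     # normal
print(knn_predict(train, (1000, 11)))  # anomaly
print(knn_predict(train, (30, 40)))    # normal
```

The last query, a burst of 40 small transactions in an hour, is labeled normal because nothing like it appears in the training data, which is the limitation described above. Real features would also need scaling, since raw Euclidean distance here is dominated by the amount.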

3. Semi-supervised anomaly detection

Semi-supervised anomaly detection combines elements of supervised and unsupervised methods: it works with a small amount of labeled data alongside large amounts of unlabeled data. Using labeled data gives you more control over the training process, potentially leading to better outcomes. A common semi-supervised setup trains a model only on data labeled as normal and flags new data that deviates from it; for example, a linear regression fitted to normal data can flag points whose prediction errors are unusually large. Use cases for these algorithms include highly complex, industry-specific systems, as well as fraud detection.
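
Here is a minimal Python sketch of how linear regression can serve semi-supervised anomaly detection, assuming you have a set of points labeled as normal. The hypothetical make_residual_detector fits a least-squares line to the normal data only and flags new points whose residual exceeds k times the standard deviation of the training residuals. The choice of k = 3 and the example data are assumptions.

```python
from statistics import stdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_residual_detector(normal_xs, normal_ys, k=3.0):
    """Train only on data labeled normal; flag points whose residual exceeds k std devs."""
    slope, intercept = fit_line(normal_xs, normal_ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(normal_xs, normal_ys)]
    limit = k * stdev(residuals)

    def is_anomaly(x, y):
        return abs(y - (slope * x + intercept)) > limit

    return is_anomaly


# Hypothetical data labeled as normal: requests per minute vs. CPU load (%).
requests = [100, 200, 300, 400, 500, 600]
cpu_load = [12, 21, 33, 41, 52, 60]
is_anomaly = make_residual_detector(requests, cpu_load)
print(is_anomaly(350, 36))  # False: close to the fitted line
print(is_anomaly(350, 80))  # True: far more load than the traffic explains
```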

Challenges of anomaly detection

Anomaly detection in machine learning does come with certain challenges. Unsupervised and supervised approaches can sometimes return too many false positives, meaning normal data that gets flagged as anomalous. Teams then have to spend extra effort identifying those false positives and developing a better model. The results of anomaly detection also aren't always simple to interpret, so the people reviewing them need the right skills to understand what they're looking at.
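
False positives are closely tied to the threshold at which a model's anomaly score becomes an alert. The hypothetical helper below counts true and false positives and negatives at a given threshold, using made-up scores and labels, to show the trade-off: raising the threshold removes false positives but can start missing real anomalies.

```python
def confusion_at(scores, labels, threshold):
    """Count outcomes when every score at or above `threshold` is flagged.

    `labels` holds True for points known to be anomalies and False otherwise.
    """
    counts = {"true_pos": 0, "false_pos": 0, "false_neg": 0, "true_neg": 0}
    for score, is_anomaly in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_anomaly:
            counts["true_pos"] += 1
        elif flagged:
            counts["false_pos"] += 1
        elif is_anomaly:
            counts["false_neg"] += 1
        else:
            counts["true_neg"] += 1
    return counts


# Hypothetical anomaly scores for eight points, two of which are real anomalies.
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90]
labels = [False, False, False, False, False, True, False, True]
print(confusion_at(scores, labels, 0.30))
# {'true_pos': 2, 'false_pos': 4, 'false_neg': 0, 'true_neg': 2}
print(confusion_at(scores, labels, 0.65))
# {'true_pos': 1, 'false_pos': 1, 'false_neg': 1, 'true_neg': 5}
```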

Anomaly detection also places specific demands on data. The data you use to train a model needs to be clean, with no duplicate records or incomplete entries. The size of the training set matters, too: if it is too small, the model can't learn an accurate picture of the data. One way to make up for a lack of data is to generate synthetic data sets.
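
The two data-preparation steps named above can be sketched in Python as follows. The field names amount and duration_s are hypothetical. clean drops records with missing required fields and then exact duplicates; synthesize is a deliberately naive augmentation that adds small Gaussian noise to real records. It only reproduces patterns already present in the data, so treat it as an illustration rather than a substitute for dedicated synthetic data tools.

```python
import random


def clean(records, required=("amount", "duration_s")):
    """Drop records with missing required fields, then exact duplicates (keeping the first)."""
    seen = set()
    cleaned = []
    for record in records:
        if any(record.get(field) is None for field in required):
            continue  # incomplete entry
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # duplicate entry
        seen.add(key)
        cleaned.append(record)
    return cleaned


def synthesize(records, n, noise=0.05, seed=0):
    """Create n synthetic records by scaling each field of a real record by 1 + N(0, noise)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        base = rng.choice(records)
        synthetic.append({k: v * (1 + rng.gauss(0, noise)) for k, v in base.items()})
    return synthetic


# Hypothetical raw records: one duplicate and one incomplete entry.
raw = [{"amount": 20.0, "duration_s": 30.0},
       {"amount": 20.0, "duration_s": 30.0},
       {"amount": None, "duration_s": 12.0},
       {"amount": 55.0, "duration_s": 45.0}]
cleaned = clean(raw)
print(len(cleaned))  # 2
extra = synthesize(cleaned, 5)
print(len(extra))    # 5
```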

Anomaly detection use cases

Implementing anomaly detection is useful across a wide range of industries. Here’s a look at specific use cases:

  • In e-commerce, anomaly detection allows businesses to monitor changes in conversion rate, so they can quickly spot the causes of those changes, whether seasonal shifts, website issues, or other technical problems.

  • When businesses partner with content creators on social media, it's important to ensure they're working with real people and not fraudulent accounts. Anomaly detection algorithms can flag suspicious account behavior in this context.

  • The cybersecurity industry benefits greatly from anomaly detection, which can distinguish potentially malicious activity that could damage a system from normal online activity.

  • Anomaly detection helps with the monitoring of information technology systems, observing metrics that provide context to the overall performance of a system.
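
Monitoring a business or system metric, such as an e-commerce conversion rate or an IT performance metric, often starts with a robust statistical check before any trained model. This Python sketch flags days whose modified z-score exceeds 3.5, using the median and the median absolute deviation (MAD) instead of the mean and standard deviation so that the anomaly itself barely shifts the baseline. The 0.6745 constant and 3.5 cutoff are the conventional values for this score; the function name and conversion data are hypothetical.

```python
from statistics import median


def modified_z_flags(series, threshold=3.5):
    """Flag points whose MAD-based modified z-score exceeds `threshold`."""
    med = median(series)
    mad = median([abs(x - med) for x in series])
    if mad == 0:
        return [False] * len(series)  # no spread to measure against
    return [abs(0.6745 * (x - med) / mad) > threshold for x in series]


# Hypothetical daily conversion rates (%) with a sudden drop on day 6.
conversion = [3.1, 3.0, 3.2, 2.9, 3.1, 3.0, 1.2, 3.1]
flags = modified_z_flags(conversion)
print([day for day, flagged in enumerate(flags) if flagged])  # [6]
```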

Getting started with Coursera

On Coursera, you can explore highly rated courses to learn more about machine learning and anomaly detection. Introduction to Machine Learning from Duke University can help you learn more about foundational machine learning topics, including logistic regression and neural networks.

Unsupervised Learning, Recommenders, Reinforcement Learning from DeepLearning.AI is another great option to gain skills in anomaly detection and expand your knowledge of unsupervised and supervised learning techniques.

This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.