What Is Adversarial Machine Learning?

Written by Coursera Staff • Updated on

Adversarial machine learning studies how machine learning models can be manipulated by deceptive inputs designed to make them work improperly. Learn more about what adversarial machine learning is and why it’s important.


Adversarial machine learning is a technique used to study attacks on machine learning (ML) models in order to identify their weak points and build defenses against malicious attacks. Adversarial attacks seek to manipulate machine learning models by feeding them deceptive inputs. These inputs can cause a model to malfunction, potentially exposing data or disrupting the function the model performs.

Despite its name, adversarial machine learning is not a type of ML; it describes methods you can use to assess the vulnerabilities of ML models. Attackers target machine learning models for many reasons and use many methods to manipulate them. The results of manipulating AI or machine learning systems range from incorrect outputs rendered by generative AI applications to data breaches. Read on to learn about adversarial machine learning, its different types, examples, and more.

How does adversarial machine learning work?

Machine learning models learn to perform tasks by training on large data sets. A classification model, for example, learns to sort data points into categories, and its decisions depend on the patterns it picked up during training.

Adversarial machine learning refers to disrupting this workflow by introducing inputs meant to deceive the model into malfunctioning. Attackers can also use such inputs to learn how to manipulate data so that the model produces the outcomes they want.


One example that underlines the potential danger of adversarial attacks is the manipulation of autonomous vehicles. Self-driving vehicles use machine learning models to interpret road signs. According to a study by researchers from Princeton, UC Berkeley, and Purdue, slight modifications to these signs, such as placing a sticker on a yield sign, can cause the model to misread them.


Types of adversarial machine learning

Adversaries use two main tactics to manipulate machine learning models: black-box and white-box attacks. In a white-box attack, the attacker can access the model’s code and inner workings, such as its architecture and parameters. In a black-box attack, the attacker can see only the model’s outputs for the inputs it submits. Even without access to the model’s code, an adversary can still manipulate a machine learning model.
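To make the difference concrete, here is a minimal sketch built on a hypothetical toy logistic-regression model. A white-box attacker reads the weights and computes the exact gradient of the model’s score with respect to the input; a black-box attacker can only call `query` and estimates the same gradient from pairs of queries. The class and function names are illustrative assumptions, not part of any library.

```python
import math

class HiddenModel:
    """Hypothetical logistic-regression model used for illustration."""

    def __init__(self, weights, bias):
        self._weights = weights   # visible only to a white-box attacker
        self._bias = bias

    def query(self, x):
        """The only interface a black-box attacker gets: input in, score out."""
        z = sum(w * xi for w, xi in zip(self._weights, x)) + self._bias
        return 1.0 / (1.0 + math.exp(-z))

def white_box_gradient(model, x):
    # White-box: read the parameters directly. For p = sigmoid(w . x + b),
    # the exact gradient with respect to x is p * (1 - p) * w.
    p = model.query(x)
    return [p * (1 - p) * w for w in model._weights]

def black_box_gradient(query, x, h=1e-4):
    # Black-box: no parameters, only queries. Estimate each partial
    # derivative with a central finite difference (two queries per feature).
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((query(up) - query(down)) / (2 * h))
    return grad

model = HiddenModel(weights=[1.5, -2.0, 0.5], bias=0.1)
x = [0.2, 0.4, -1.0]
exact = white_box_gradient(model, x)
estimated = black_box_gradient(model.query, x)
print([round(g, 4) for g in exact])
print([round(g, 4) for g in estimated])
```

The two gradients agree closely, which illustrates why a black-box attacker with enough queries can often approximate what a white-box attacker computes directly.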

Adversarial machine learning attacks typically share the same overarching goal but use different tactics to accomplish it. Most fall into one of three categories.

1. Poisoning attack

In a poisoning attack, the adversary “poisons” a machine learning model by contaminating its training data, causing the model to underperform or creating a vulnerability the attacker can exploit. Google’s spam email filter offers an example of data poisoning: adversaries have repeatedly tried to trick it by reporting massive numbers of spam emails as not spam.
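A minimal sketch of the idea, assuming a toy Naive Bayes spam filter (not Google’s actual filter): the attacker floods the training data with spam-like messages labeled “ham” (not spam), and a message the clean filter would block gets through. The `train` and `is_spam` functions and the tiny data set are hypothetical.

```python
import math
from collections import Counter

def train(emails):
    """Train a tiny multinomial Naive Bayes filter on (text, label) pairs,
    where label is "spam" or "ham"."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in emails:
        counts[label].update(text.lower().split())
        docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, docs, vocab

def is_spam(model, text):
    counts, docs, vocab = model
    total_docs = docs["spam"] + docs["ham"]
    scores = {}
    for label in ("spam", "ham"):
        # Log prior plus the log likelihood of each word, with add-one smoothing.
        score = math.log(docs[label] / total_docs)
        denom = sum(counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / denom)
        scores[label] = score
    return scores["spam"] > scores["ham"]

clean_data = [
    ("win a free prize now", "spam"),
    ("claim your free prize", "spam"),
    ("cheap pills free shipping", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
    ("project update attached", "ham"),
]
# Poisoning: the attacker reports many spam-like emails as "not spam".
poison = [("free prize claim now", "ham")] * 20

target = "claim your free prize now"
print(is_spam(train(clean_data), target))           # filter trained on clean data
print(is_spam(train(clean_data + poison), target))  # filter trained on poisoned data
```

In this toy run the first call prints `True` and the second prints `False`: the poisoned labels shift the word statistics enough that the same spam message is now treated as legitimate.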


2. Evasion attack

Evasion attacks target ML models that have already been trained. The adversary changes an input slightly to manipulate how the model classifies it. For example, an attacker might place a sticker on a stop sign to trick a self-driving car’s machine learning model into classifying it as a yield sign.
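One well-known white-box evasion technique is the fast gradient sign method (FGSM), which nudges every input feature by a small amount ε in the direction that increases the model’s loss: x′ = x + ε · sign(∇ₓL). The sketch below applies it to a hypothetical logistic-regression “stop sign vs. yield sign” classifier over three made-up features; the weights, features, and ε are assumptions chosen for illustration, whereas real attacks perturb image pixels.

```python
import math

# Hypothetical trained classifier: P(stop sign | x) = sigmoid(w . x + b).
WEIGHTS = [2.0, -1.5, 1.0]
BIAS = -0.5

def prob_stop(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, eps):
    """FGSM for logistic regression. With cross-entropy loss, the gradient
    with respect to x_i is (p - y) * w_i, where y = 1 means "stop sign"."""
    p = prob_stop(x)
    grad = [(p - y) * w for w in WEIGHTS]
    # Shift each feature by eps in the direction that increases the loss.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

x = [0.6, 0.2, 0.3]              # features of a clean stop-sign image
x_adv = fgsm(x, y=1, eps=0.3)    # no feature moves by more than 0.3
print(round(prob_stop(x), 3), round(prob_stop(x_adv), 3))
```

Here the stop-sign probability drops from about 0.67 to about 0.34, so the perturbed input is classified as a yield sign even though each feature changed only slightly.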

Evasion attacks often aim to get past spam or malware filters: adversaries word emails or disguise software in misleading ways so that the filter classifies them as harmless. Attackers use trial-and-error tactics to gain insight into how a machine learning model operates. For example, an adversary may send a large volume of emails with variations in language and observe which ones make it through a spam filter.
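A minimal sketch of that trial-and-error probing, assuming a hypothetical keyword-based filter whose rules the attacker cannot see: the attacker generates wording variations, submits each one, and keeps those that get delivered. The filter, the word lists, and the function names are invented for illustration.

```python
import itertools

# Hypothetical filter rules, hidden from the attacker.
BLOCKED_WORDS = {"free", "prize", "winner"}

def spam_filter(text):
    """Black-box filter: returns True if the email would be delivered."""
    hits = sum(word in BLOCKED_WORDS for word in text.lower().split())
    return hits < 2   # block emails containing two or more flagged words

# Candidate rewordings the attacker tries for each suspicious word.
SUBSTITUTES = {
    "free": ["free", "fr3e", "no-cost"],
    "prize": ["prize", "pr1ze", "reward"],
}

def variants(template):
    """Yield every combination of substitutions for the template's words."""
    options = [SUBSTITUTES.get(word, [word]) for word in template.split()]
    for combo in itertools.product(*options):
        yield " ".join(combo)

template = "claim your free prize today"
tried = list(variants(template))
delivered = [v for v in tried if spam_filter(v)]   # only outcomes are observed
print(spam_filter(template), len(delivered), "of", len(tried), "delivered")
```

The original wording is blocked, but 8 of the 9 variations get through, and the attacker learns this without ever seeing `BLOCKED_WORDS`.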

3. Extraction attack

In an extraction attack, the attacker queries a machine learning model to build a replica of it or to steal data from it. Like evasion attacks, extraction attacks often rely on inferences the attacker can draw from the model’s outputs.
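When a model returns confidence scores, extraction can be surprisingly cheap. The sketch below assumes a hypothetical prediction API backed by a logistic-regression model: because the log-odds of its output is linear in the input, the attacker can solve for every weight with one query per feature plus one. This equation-solving approach only works this neatly for simple models, and the API and names here are illustrative.

```python
import math

def make_api(weights, bias):
    """Hypothetical prediction API: returns only a confidence score."""
    def predict_proba(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba

def logit(p):
    return math.log(p / (1.0 - p))

def extract_linear_model(query, n_features):
    """logit(p) = w . x + b is linear in x, so querying the zero vector
    gives b, and querying each unit vector e_i gives w_i + b."""
    bias = logit(query([0.0] * n_features))
    weights = []
    for i in range(n_features):
        e_i = [0.0] * n_features
        e_i[i] = 1.0
        weights.append(logit(query(e_i)) - bias)
    return weights, bias

api = make_api(weights=[0.8, -1.2, 2.5], bias=-0.3)   # secret to the attacker
stolen_w, stolen_b = extract_linear_model(api, n_features=3)
replica = make_api(stolen_w, stolen_b)                # attacker's copy
print([round(w, 6) for w in stolen_w], round(stolen_b, 6))
```

Four queries are enough to recover the secret parameters up to floating-point error, after which the replica answers exactly like the original model.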

What is adversarial machine learning used for?

Adversarial machine learning attacks undermine the effectiveness of machine learning models. Manipulations cause damage, erode faith in the model’s application, or circumvent security measures such as filters for harmful content.

What is an adversarial example?

An adversarial example is an input fed to a machine learning model to manipulate it into malfunctioning. These inputs may seem innocuous to a human, but they exploit weaknesses in the model’s ability to classify inputs correctly.
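One common way to formalize this, assuming f is the model’s predicted label, x is the original input, and the attacker may change each input feature by at most a small budget ε (the L∞ norm):

```latex
\text{find } \delta \text{ such that } \|\delta\|_\infty \le \varepsilon
\quad \text{and} \quad f(x + \delta) \ne f(x)
```

The perturbed input x + δ is the adversarial example; keeping ε small is what makes the change hard for a person to notice.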

For example, some security cameras are operated by machine learning models. Suppose a building’s security system has learned that a maintenance worker enters the building at a certain time each night. The model may then allow any entry at that time without raising a security alert. An adversary who notices this pattern could time an intrusion to match it, so that the model misses the threat.

How do organizations combat adversarial machine learning?

Businesses have several ways to combat adversarial machine learning, and devising strategies for this purpose is becoming increasingly essential. As businesses rely more heavily on AI applications, adversarial attacks remain a threat, and the damage they can cause is likely to grow. Microsoft’s security business now brings in more than $20 billion per year. [1] According to research and advisory firm Gartner, the information security market will reach $215 billion in 2024, a 14.3 percent increase from 2023’s estimate. [2]

Many companies, including Microsoft and Google, release code as open source to strengthen cybersecurity, especially in a rapidly advancing field such as artificial intelligence. You might expect open-source code to invite more manipulation because so many people can access it. However, it also lets experts worldwide contribute their knowledge to cybersecurity measures. With programmers, researchers, and other professionals examining the code, companies can more readily identify and fix vulnerabilities.


How do organizations counteract adversarial attacks?

Existing methods of defending against adversarial machine learning attacks face an unresolved problem: defenses designed for anticipated or already-detected attacks are not adaptive, so they may not hold up against new ones. Machine learning models must therefore be maintained routinely to keep them protected from emerging vulnerabilities.

Programmers use two main techniques to adapt machine learning models to threats and protect them from adversarial attacks.

Adversarial training

Adversarial training is a technique in which developers generate adversarial examples (the corrupted inputs that prompt machine learning models to malfunction) and add them to the model’s training data, paired with their correct labels. Just as the model learns to classify data as part of its normal training process, it learns to classify these perturbed inputs correctly instead of being fooled by them. Because attempts to manipulate machine learning models keep evolving, this method requires ongoing maintenance and oversight.
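A minimal sketch of the idea, assuming a toy two-feature logistic-regression model and using FGSM (one common way to generate adversarial examples, chosen here as an assumption) to create the perturbed inputs at every training step. Each adversarial example keeps its correct label. The data, ε, learning rate, and epoch count are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Toy model with exactly two features.
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # Perturb x by eps in the direction that increases the logistic loss.
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def adversarial_training(data, eps=0.5, lr=0.1, epochs=200):
    """Gradient descent on each clean example AND its FGSM perturbation,
    both paired with the example's correct label."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        batch = []
        for x, y in data:
            batch.append((x, y))
            batch.append((fgsm(w, b, x, y, eps), y))   # correct label kept
        gw, gb = [0.0, 0.0], 0.0
        for x, y in batch:
            err = predict(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(batch) for wi, g in zip(w, gw)]
        b -= lr * gb / len(batch)
    return w, b

data = [([2, 2], 1), ([3, 2], 1), ([2, 3], 1),
        ([-2, -2], 0), ([-3, -2], 0), ([-2, -3], 0)]
w, b = adversarial_training(data)

# Attack the trained model again and measure accuracy on the perturbed inputs.
attacked = [(fgsm(w, b, x, y, 0.5), y) for x, y in data]
robust_acc = sum((predict(w, b, x) > 0.5) == (y == 1) for x, y in attacked) / len(attacked)
print(robust_acc)
```

On this small separable data set, the trained model still classifies every FGSM-perturbed point correctly. Real systems apply the same loop to neural networks and usually use stronger attacks to generate the training examples.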

Defensive distillation

Defensive distillation is another method of training machine learning models to resist adversarial attacks. A “teacher network” first trains on a data set. A second network, the “learner network,” then trains using the teacher network’s class probabilities (how likely the teacher considers each possible class for a given input) as soft targets instead of hard labels. Because these soft targets carry more nuanced information than a single correct label, the learner network tends to be less sensitive to the small input changes that adversarial examples rely on.
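The core ingredient of defensive distillation is a temperature-scaled softmax. This sketch uses made-up teacher logits to show how a high temperature T turns a near-certain prediction into soft targets that still record how the teacher ranks the other classes. It also shows the cross-entropy loss a learner network would minimize against those targets, assuming teacher and learner use the same temperature. Function names and numbers are hypothetical; a full implementation would train two neural networks with this loss.

```python
import math

def softmax_with_temperature(logits, T):
    """Convert logits to class probabilities, softened by temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, T):
    """Cross-entropy between the teacher's soft targets and the learner's
    predictions, both computed at the same temperature T."""
    targets = softmax_with_temperature(teacher_logits, T)
    preds = softmax_with_temperature(student_logits, T)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))

teacher_logits = [8.0, 2.0, 1.0]   # hypothetical teacher output for one input
for T in (1, 20):
    probs = softmax_with_temperature(teacher_logits, T)
    print(T, [round(p, 3) for p in probs])

print(soft_target_loss([6.0, 3.0, 0.5], teacher_logits, T=20))
```

At T = 1 the teacher puts almost all probability on the first class, while at T = 20 the targets spread out (roughly 0.41, 0.30, 0.29) and still preserve the teacher’s ordering. The loss is smallest when the learner’s outputs match those soft targets.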

Getting started with Coursera

You can dive further into machine learning concepts with CertNexus’ Build Decision Trees, SVMs, Artificial Neural Networks course, offered on Coursera, which can help you explore key concepts artificial intelligence practitioners should know. You can train and evaluate several components of neural networks and optimize the performance of AI applications for tasks such as language processing.

Article sources


1. Yahoo News. “Microsoft’s security arm is now a $20 billion per year business,” https://finance.yahoo.com/news/microsofts-security-arm-is-now-a-20-billion-per-year-business-213419664.html. Accessed March 27, 2024.

