What Is Adversarial Machine Learning?

Written by Coursera Staff • Updated on Jan 22, 2026

Adversarial machine learning studies how deceptive inputs can manipulate a machine learning model and cause the application that relies on it to work improperly. Learn more about what adversarial machine learning is and why it’s important.

[Featured Image] A group of employees meet to discuss how to protect their organization using adversarial machine learning.

Key takeaways

Adversarial machine learning examines malicious attacks on machine learning (ML) models to identify vulnerabilities and develop defenses against them.

The information security industry is expected to reach $323 billion by 2029, up from $213 billion in 2025 [1].

Three common types of adversarial machine learning attacks are poisoning attacks, evasion attacks, and extraction attacks.

You can help organizations combat adversarial attacks in a career as a programmer by adjusting machine learning models using adversarial training and defensive distillation.

Discover more about adversarial machine learning, the different types, examples, and how to counteract adversarial attacks. If you're ready to build essential job skills in this industry right away, consider enrolling in the Machine Learning Specialization. You'll have the opportunity to master fundamental AI concepts and develop practical machine learning skills in the beginner-friendly, three-course program by AI visionary Andrew Ng.

What is adversarial machine learning?

Adversarial machine learning is the study of attacks on machine learning models, with the goal of identifying weak points and building defenses against malicious attacks. Adversarial attacks seek to manipulate machine learning models by providing deceptive inputs. These inputs cause a model to malfunction, potentially exposing data or disrupting the function that depends on the model.

Despite its name, adversarial machine learning is a description of methods that you can use to assess vulnerabilities of ML models; it is not a type of ML. Attackers target machine learning models for many reasons and with many methods to manipulate them. The results of manipulating AI or machine learning systems range from incorrect outputs rendered by generative AI applications to data breaches.

How does adversarial machine learning work?

Machine learning models learn to perform tasks by processing large data sets. They classify data points into categories and determine actions based on the patterns they picked up during training.

An adversarial attack disrupts this workflow by introducing an input meant to deceive the model into malfunctioning. Studying these inputs can also offer insight into how manipulating data changes a model’s outcomes.

What is an example of an adversarial attack in machine learning?

An example that underlines the potential danger involved in adversarial machine learning is the manipulation of autonomous vehicles. Self-driving vehicles use machine learning models to interpret road signs. Slight modifications to these street signs, such as the placement of a sticker on a yield sign, can cause the machine learning model to malfunction, according to a study conducted by researchers from Princeton, UC Berkeley, and Purdue [2].

Types of adversarial machine learning

Adversaries approach machine learning models in two main ways: white-box and black-box attacks. In a white-box attack, the attacker has access to the model’s code and inner workings. In a black-box attack, by contrast, the attacker knows only the ML model’s outputs. An adversary can still manipulate a machine learning model even without access to the model’s code.

Adversarial machine learning attacks typically share the same overarching goal but use different tactics to accomplish it. Most attacks fall into one of three categories.

1. Poisoning attack

In a poisoning attack, the adversary “poisons” the machine learning model by contaminating its training data, which causes the model to underperform or opens up a vulnerability. Google’s spam email filter offers an example of data poisoning. Adversaries have repeatedly tried to trick the filter by reporting massive amounts of spam emails as not spam.
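
The effect of mislabeled training data can be sketched with a toy example. The word-count filter below is a hypothetical stand-in, not how any real spam filter works, and the messages and labels are invented for illustration. It shows how a handful of spam-like messages labeled as legitimate ("ham") can flip the filter's decision.

```python
from collections import Counter

def train(examples):
    """Count how often each word appears in spam and in legitimate (ham) messages."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Label a message by whichever class has seen its words more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][word] for word in words)
    ham_score = sum(counts["ham"][word] for word in words)
    return "spam" if spam_score > ham_score else "ham"

# Invented training data for the toy filter.
clean_data = [
    ("win free prize now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

# Poisoning: the attacker reports spam-like messages as "not spam".
poison = [("free prize", "ham")] * 5

message = "claim your free prize"
clean_model = train(clean_data)
poisoned_model = train(clean_data + poison)

print(classify(clean_model, message))     # spam
print(classify(poisoned_model, message))  # ham
```

The attacker never touches the filter's code here. Changing what the filter learns from is enough to change its behavior.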

2. Evasion attack

Evasion attacks take place on trained ML models. The adversary changes an input slightly to manipulate how the machine learning model classifies it. For example, an adversary might place a sticker on a stop sign to trick a self-driving car’s machine learning model into thinking it is a yield sign.

Evasion attacks often aim to circumvent spam or malware filters, with adversaries wording emails or labeling software in misleading ways. Attackers use trial-and-error tactics to gain insight into a machine learning model’s operations. For example, an adversary may send a large volume of emails with variations in language and observe which ones make it through a spam filter.
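
As a rough illustration of how a small change can flip a classification, the sketch below perturbs an input to a tiny linear classifier. It is written in the spirit of the fast gradient sign method, a technique this article does not name, and it assumes a white-box setting in which the attacker can read the weights. The weights, bias, and feature values are hypothetical.

```python
def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def score(weights, bias, features):
    """Linear score: positive means the input is flagged (class 1)."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def predict(weights, bias, features):
    return 1 if score(weights, bias, features) > 0 else 0

def evade(weights, features, epsilon):
    """Shift every feature by at most epsilon in the direction that lowers the score."""
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]

# Hypothetical trained model and input (assumed values, for illustration only).
weights = [2.0, -1.0, 0.5]
bias = -0.5
original = [0.6, 0.2, 0.4]

adversarial = evade(weights, original, epsilon=0.25)

print(predict(weights, bias, original))     # 1 (flagged)
print(predict(weights, bias, adversarial))  # 0 (slips through)
```

No feature moves by more than 0.25, yet the predicted class changes. In a black-box setting the attacker would have to discover a working perturbation by trial and error instead of reading the weights.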

3. Extraction attack

An extraction attack involves an attacker probing a machine learning model to create a replica of the model or to steal data from it. Like evasion attacks, extraction attacks often rely on inferences the attacker can make from the outputs the model generates.
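
A minimal sketch of the idea, assuming a deliberately simple victim: a model that flags any input at or above a secret threshold. The names `make_threshold_model` and `extract_threshold` are hypothetical, and real models are far more complex, but the pattern of querying a black box and inferring its internals from the outputs is the same.

```python
def make_threshold_model(threshold):
    """Build a model that returns 1 when the input reaches the threshold, else 0."""
    def model(x):
        return 1 if x >= threshold else 0
    return model

def extract_threshold(query, low, high, steps):
    """Recover a hidden threshold using only the model's outputs (binary search)."""
    for _ in range(steps):
        middle = (low + high) / 2
        if query(middle) == 1:
            high = middle   # threshold is at or below the midpoint
        else:
            low = middle    # threshold is above the midpoint
    return (low + high) / 2

# The attacker can call victim(x) but cannot see the value 0.37.
victim = make_threshold_model(0.37)

estimate = extract_threshold(victim, low=0.0, high=1.0, steps=20)
replica = make_threshold_model(estimate)

# Compare the stolen replica with the victim on a few probe inputs.
probes = [i / 10 for i in range(11)]
print(all(replica(x) == victim(x) for x in probes))  # True
```

Twenty queries are enough to pin down the secret value to within about one millionth, which is why limiting or monitoring query volume is a common concern for deployed models.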

A Taxonomy and Terminology of Adversarial Machine Learning

The National Institute of Standards and Technology publishes a taxonomy of concepts and terms related to the field of adversarial machine learning titled “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations.” Reading this report can help you understand basic terms, explore the scope of adversarial machine learning, and assess current practices with AI technology.

What is adversarial machine learning used for?

Adversarial machine learning attacks undermine the effectiveness of machine learning models. Manipulations cause damage, erode faith in the model’s application, or circumvent security measures such as filters for harmful content.

What is an adversarial example?

An adversarial example is an input fed to a machine learning model to manipulate it into malfunctioning. These inputs may seem innocuous to a human being, but they target the model’s ability to classify the input correctly.

For example, security cameras are sometimes operated by machine learning models. If a maintenance person enters a building at the same time each night, the machine learning model may learn to permit any entry at that time without raising a security alert. An adversary who notices this pattern could exploit it by entering at that time, causing the model to miss a threat.

How do organizations combat adversarial machine learning?

Businesses have several means of combating adversarial machine learning, and it is becoming increasingly essential for them to devise strategies for this purpose. Adversarial attacks remain a danger, and the damage they can cause is likely to grow as businesses increase their reliance on AI applications. Microsoft earns $20 billion in revenue annually on cybersecurity alone, making it the largest cybersecurity vendor in the world [3]. According to the management consulting firm Gartner, the information security industry is expected to reach $323 billion by 2029, up from $213 billion in 2025 [1].

Many companies, including Microsoft and Google, provide their code as open source to enhance cybersecurity, especially in a rapidly advancing field such as artificial intelligence. You might think open-source code would instead lead to increased manipulation because it’s accessible to many people. However, open-source code also allows experts worldwide to contribute their knowledge to cybersecurity measures. With programming experts, researchers, and top professionals reviewing the code, companies can readily identify and fix vulnerabilities.

How do organizations counteract adversarial attacks?

Existing methods of defending against adversarial machine learning attacks face a yet-unresolved problem: defenses built against anticipated or detected attacks are not adaptive. As a result, maintenance must be performed routinely to keep machine learning models protected from vulnerabilities.

Programmers adjust machine learning models to adapt to threats and protect them from adversarial attacks using two main techniques: adversarial training and defensive distillation.

Adversarial training

Adversarial training is a protocol in which developers deliberately feed the model adversarial examples, the corrupted inputs that prompt machine learning models to malfunction. The model then learns to treat these known malicious inputs as threats rather than being fooled by them. In the same way that the machine learning model learns to classify data as part of its normal process, the model also trains itself to withstand perturbations. This method involves ongoing maintenance efforts and oversight as attempts to manipulate the machine learning model evolve.
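
One way to picture this process is the small sketch below, which trains a perceptron (a simple linear classifier chosen here for brevity, not a method the article prescribes) on both clean points and attacked versions of those points. The data set, the `epsilon` attack budget, and all function names are assumptions made for illustration.

```python
def sign(value):
    return (value > 0) - (value < 0)

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def attack(w, x, y, epsilon):
    """Move each feature by up to epsilon in the direction that hurts label y most."""
    return [xi - epsilon * y * sign(wi) for wi, xi in zip(w, x)]

def train(data, epsilon, epochs=500):
    """Perceptron training. With epsilon > 0, each point is also shown in attacked form."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            for sample in (x, attack(w, x, y, epsilon)):
                if y * score(w, b, sample) <= 0:  # misclassified: adjust the boundary
                    w = [wi + y * si for wi, si in zip(w, sample)]
                    b += y
    return w, b

def robust_accuracy(w, b, data, epsilon):
    """Fraction of points still classified correctly after the attack."""
    hits = sum(1 for x, y in data if y * score(w, b, attack(w, x, y, epsilon)) > 0)
    return hits / len(data)

# Invented two-class data; labels are +1 and -1.
data = [([2.0, 1.0], 1), ([1.0, 2.0], 1), ([2.0, 2.0], 1), ([3.0, 1.5], 1),
        ([-2.0, -1.0], -1), ([-1.0, -2.0], -1), ([-2.0, -2.0], -1), ([-1.5, -3.0], -1)]
epsilon = 1.2  # assumed attack budget per feature

standard = train(data, epsilon=0.0)
hardened = train(data, epsilon=epsilon)

print(robust_accuracy(*standard, data, epsilon))  # below 1.0: some attacks succeed
print(robust_accuracy(*hardened, data, epsilon))  # 1.0 on this toy data
```

The hardened model holds up only against the attack and budget it was trained on, which matches the point above that adversarial training needs ongoing maintenance as attacks evolve.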

Defensive distillation

Defensive distillation is a highly adaptable method of training machine learning models to resist adversarial attacks. A “teacher network” trains on a data set. Another network, the “learner network,” uses the teacher network’s class probabilities, or the probabilities the teacher assigns to each possible classification of an input, as soft targets in its own training. Because soft targets carry more nuanced information than a single hard label, the learner network can learn smoother decisions, which can make it harder for small input changes to flip its predictions.
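
The soft targets described above can be sketched as follows. The example assumes the teacher's raw scores (logits) are converted to probabilities with a temperature-scaled softmax, a common choice in distillation that this article does not spell out. The logit values and the temperature of 5.0 are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature gives softer output."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, predictions):
    """Loss the learner would minimize when trained against the soft targets."""
    return -sum(t * math.log(p) for t, p in zip(targets, predictions))

# Hypothetical teacher scores for one input across three classes.
teacher_logits = [4.0, 1.0, 0.5]

hard_label = [1.0, 0.0, 0.0]                               # what plain training uses
sharp = softmax(teacher_logits, temperature=1.0)           # nearly one-hot
soft_targets = softmax(teacher_logits, temperature=5.0)    # keeps class similarity

# An untrained learner that assigns equal probability to every class.
learner_guess = softmax([0.0, 0.0, 0.0])

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in soft_targets])
print(cross_entropy(soft_targets, learner_guess))
```

Compared with the hard label, the soft targets still rank the first class highest but also tell the learner how plausible the teacher found the other classes. The learner is then trained to minimize the cross-entropy against those soft targets.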

Learn more about adversarial machine learning and cybersecurity

Get insights into in-demand skills and career trends by subscribing to our LinkedIn newsletter, Career Chat, and continue learning more about cybersecurity, machine learning, and related topics by exploring these free resources:

Watch on YouTube: Why Cybersecurity Professionals Need to Understand AI

Earn a certificate: 6 machine learning certificates + how to choose the right one for you

Take a quiz: Cybersecurity Career Quiz: Is It Right for You? Find Your Role

Whether you want to develop a new skill, get comfortable with an in-demand technology, or advance your abilities, keep growing with a Coursera Plus subscription. You’ll get access to over 10,000 flexible courses.

Article sources

1. Gartner. “Forecast: Information Security, Worldwide, 2023-2029, 2Q25 Update,” https://www.gartner.com/en/documents/6640834. Accessed January 9, 2026.

This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.