What Is Semantic Segmentation and How Does It Work?

Written by Coursera Staff • Updated on

Semantic segmentation is defined, explained, and compared to other image segmentation techniques in this article.

[Featured image] A person in a yellow shirt scrolls through social media filters on their phone.

If you’ve ever used a filter on Instagram or TikTok, you’ve employed semantic segmentation from the palm of your hand. But this computer vision technique goes far beyond digital makeup and mustaches. You’ll find it hard at work in hospitals, farms, and even Teslas. In the following article, you’ll learn more about how semantic segmentation works, its importance, and how to do it yourself. 

What is semantic segmentation?

Semantic segmentation identifies, classifies, and labels each pixel within a digital image. Pixels are labeled according to the semantic features they have in common, such as color or placement. Semantic segmentation helps computer systems distinguish between objects in an image and understand their relationships. It’s one of three subcategories of image segmentation, alongside instance segmentation and panoptic segmentation. 
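To make "a label for each pixel" concrete, here is a minimal sketch in plain Python. The tiny grid, the class IDs, and the class names are illustrative assumptions, not the output of any real model.

```python
# Toy 4x6 "image": each entry is the class ID assigned to that pixel.
# The IDs and names below are made up for illustration.
CLASS_NAMES = {0: "sky", 1: "grass", 2: "dog"}

label_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 2, 2, 0, 0],
    [1, 2, 2, 2, 2, 1],
    [1, 1, 1, 1, 1, 1],
]


def pixels_per_class(label_map):
    """Count how many pixels carry each class label."""
    counts = {}
    for row in label_map:
        for class_id in row:
            name = CLASS_NAMES[class_id]
            counts[name] = counts.get(name, 0) + 1
    return counts


counts = pixels_per_class(label_map)
print(counts)
```

Every pixel belongs to exactly one class, so the counts always add up to the total number of pixels in the image.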

Semantic segmentation vs. instance segmentation

Instance segmentation expands upon semantic segmentation by assigning class labels and differentiating between individual objects within those classes.


Semantic segmentation | Instance segmentation
Dogs | Yellow dog, brown dog
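The difference can be sketched in a few lines of Python. In this toy example, a semantic map only says "dog" or "not dog," while an instance map gives each dog its own ID. Grouping connected pixels, as done here, is a simple stand-in chosen for illustration; real instance segmentation models learn to separate objects and can handle objects that touch or overlap. The grid and the `DOG` class ID are assumptions.

```python
from collections import deque

DOG = 1  # illustrative class ID; 0 means background

# Semantic map: both dogs share the same label.
semantic_map = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]


def label_instances(semantic_map, class_id):
    """Give each 4-connected blob of `class_id` pixels its own instance ID."""
    height, width = len(semantic_map), len(semantic_map[0])
    instance_map = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            # Skip pixels of other classes and pixels already assigned.
            if semantic_map[r][c] != class_id or instance_map[r][c] != 0:
                continue
            next_id += 1
            instance_map[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over the blob
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic_map[ny][nx] == class_id
                            and instance_map[ny][nx] == 0):
                        instance_map[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance_map, next_id


instance_map, dog_count = label_instances(semantic_map, DOG)
print(dog_count)
for row in instance_map:
    print(row)
```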

Semantic segmentation vs. panoptic segmentation

Panoptic segmentation is a hybrid technique combining semantic and instance segmentation for a unified, interpreted view; hence, the prefix pan, meaning “all.” The panoptic segmentation process places objects into the following two categories:

  1. Things. In the context of computer vision, “things” are quantifiable objects with defined shapes, for example, vehicles, people, animals, and trees. 

  2. Stuff. “Stuff” describes objects lacking defined shapes that computer vision can identify by material or texture. Examples include bodies of water, mountain ranges, and the sky. 
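As a rough sketch of how the two categories combine, the Python below merges a per-pixel class map with a per-pixel instance map. "Things" keep their instance ID, while "stuff" gets no instance ID at all. The class names, the two input grids, and the `(class, instance)` tuple encoding are assumptions made for this example; real data sets define their own encodings.

```python
THING_CLASSES = {"car", "person"}  # countable objects with defined shapes
STUFF_CLASSES = {"sky", "road"}    # regions identified by material or texture

# Hypothetical per-pixel outputs of a semantic and an instance model.
semantic = [
    ["sky",  "sky",  "sky",  "sky"],
    ["car",  "car",  "road", "car"],
    ["road", "road", "road", "road"],
]
instances = [
    [0, 0, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
]


def to_panoptic(semantic, instances):
    """Pair every pixel's class with an instance ID (None for stuff)."""
    panoptic = []
    for sem_row, inst_row in zip(semantic, instances):
        row = []
        for name, inst in zip(sem_row, inst_row):
            if name in THING_CLASSES:
                row.append((name, inst))  # e.g. ("car", 2)
            else:
                row.append((name, None))  # stuff is never split into instances
        panoptic.append(row)
    return panoptic


panoptic = to_panoptic(semantic, instances)
# Distinct segments in the scene: two cars, one sky region, one road region.
segments = sorted({seg for row in panoptic for seg in row}, key=str)
print(segments)
```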

How does semantic segmentation work?

Semantic segmentation builds on image classification, which is often a form of supervised machine learning: models are trained to recognize objects in images using labeled example photos. Early approaches depended on raw pixel data. However, raw pixel values fluctuate with camera focus, lighting, and angle, which makes them unreliable on their own. Introducing a convolutional neural network (CNN) to this process made it possible for models to extract individual features and deduce what objects they represent. 

Semantic segmentation models take this approach a step further. After passing an input image through the neural network, they produce a color-coded map in which each color represents a different class label. This map helps computers identify boundaries between objects and distinguish foreground items from the background. 
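The last step, turning network output into a color-coded map, can be sketched as follows. This assumes the network has already produced one score per class for every pixel; the scores, the three classes, and the color palette are invented for illustration.

```python
# Arbitrary palette: class ID -> RGB color used in the output map.
PALETTE = {
    0: (0, 0, 0),      # background
    1: (0, 128, 0),    # grass
    2: (255, 255, 0),  # dog
}

# scores[row][col] holds one score per class for that pixel
# (a hypothetical network output for a 2x2 image).
scores = [
    [[0.90, 0.05, 0.05], [0.20, 0.10, 0.70]],
    [[0.10, 0.80, 0.10], [0.30, 0.60, 0.10]],
]


def predict_labels(scores):
    """Pick the highest-scoring class for every pixel (an argmax)."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def colorize(label_map):
    """Replace each class ID with its palette color."""
    return [[PALETTE[class_id] for class_id in row] for row in label_map]


labels = predict_labels(scores)
color_map = colorize(labels)
print(labels)
print(color_map)
```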

Semantic segmentation process

1. Classification. Pixels in an image are assigned a class label representing particular objects. 

2. Localization. Objects are outlined with a bounding box. A bounding box is a rectangle drawn around the outer edges of an object. 

3. Segmentation. In the localized image, pixels are grouped using a segmentation mask. A segmentation mask reduces noise by separating one portion of an image from the rest. One way to visualize segmentation masking is to imagine sliding a piece of black construction paper with a hole cut out over an image to isolate specific portions.
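The steps above can be sketched on a toy grayscale image in plain Python. Starting from a per-pixel label map, the code builds a binary mask for one class, finds the bounding box that encloses it, and uses the mask to blank out everything else, much like the construction paper with a hole in it. The image values, the label map, and the function names are assumptions for illustration.

```python
# Toy 3x3 grayscale image and a matching per-pixel label map
# (class 1 is the object of interest, class 0 is everything else).
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
label_map = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]


def binary_mask(label_map, class_id):
    """1 where the pixel belongs to `class_id`, 0 elsewhere."""
    return [[1 if value == class_id else 0 for value in row] for row in label_map]


def bounding_box(mask):
    """Smallest rectangle (top, left, bottom, right) containing all mask pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, keep in enumerate(row) if keep]
    return min(rows), min(cols), max(rows), max(cols)


def apply_mask(image, mask, fill=0):
    """Keep pixels under the mask and replace the rest with `fill`."""
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


mask = binary_mask(label_map, class_id=1)
box = bounding_box(mask)
masked_image = apply_mask(image, mask)
print(box)
print(masked_image)
```

Note that the bounding box also covers a pixel that is not part of the object, which is why the mask gives a more precise outline than the box alone.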


Semantic segmentation use cases

  • Photography and social media filters. Many camera effects and filters on social media applications like Instagram and TikTok rely on semantic segmentation. For example, it identifies the placement of eyes to apply sunglasses. Semantic segmentation also allows cameras to switch between landscape and portrait formats. 

  • Medical imaging analyses. AI segmentation models trained on medical imagery can perform automated analysis to measure and detect anomalies on a pixel level. By highlighting and mapping anatomical features, segmentation enhances visualization for more precise identification of tumors and other irregularities.

  • Agriculture. Farmers employ AI and semantic segmentation to automate maintenance and manage the health of their crops. Computer vision technology helps farmers quickly detect at-risk portions of their fields to eradicate pests or contain infections. 

  • Self-driving cars. Autonomous vehicles rely heavily on semantic segmentation to identify obstacles, analyze road conditions, and map surroundings. 

How to do semantic segmentation

Many different tools and models exist that you can use to perform semantic segmentation. If you’d like step-by-step guidance throughout your project, consider the Semantic Segmentation with Amazon Sagemaker Guided Project on Coursera. You’ll visualize and prepare data for model training via a split-screen web browser environment. To complete this advanced-level project, experience with Python programming, deep learning concepts, and AWS is required. Consider the resources in the following sections if you want to start a semantic segmentation project independently. 

Semantic segmentation data sets

Data sets for semantic segmentation are typically huge and complex. The more diverse the labels in the data set, the better the model can learn. Here are a few commonly used segmentation data sets:

  • Microsoft Common Objects in Context (MS COCO). MS COCO is a large-scale data set used for captioning, key-point detection, object detection, and segmentation. It includes over 320,000 images with a wide variety of annotations that have been refined through community feedback. 

  • Cityscapes Dataset. The central focus of this data set is the semantic understanding of city and street scenes. It includes 30 different classes, 25,000 annotated images, dense semantic segmentation, and instance segmentation for people and vehicles. 

  • ScanNet. ScanNet is an RGB-D video data set with 2D and 3D data. It comprises 2.5 million indoor views in 1,513 scenes with semantic segmentation annotations and surface reconstructions. 

Semantic segmentation models

Semantic segmentation models are used to classify every pixel in an image. The list below includes a few popular segmentation models:

  • Pyramid Scene Parsing Network (PSPNet). PSPNet uses a pyramid parsing module to discern multi-level features for a more comprehensive context of an image. It’s capable of processing global and local information. 

  • SegNet. SegNet is a semantic segmentation model comprising an encoder network, a decoder network, and a classification layer. 
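To give a feel for how a pyramid-style module mixes global and local information, here is a heavily simplified sketch in plain Python. It average-pools a small feature map at two grid sizes, stretches the pooled values back to the original size, and attaches them to each pixel's own value. This is an illustration of the idea only, not PSPNet itself: the real model works on learned multi-channel features, uses more pooling levels, and adds convolution layers. The function names, bin sizes, and feature values are assumptions.

```python
def avg_pool_to_bins(fmap, bins):
    """Average a square feature map down to a bins x bins grid.

    Assumes the map size is evenly divisible by `bins`.
    """
    step = len(fmap) // bins
    pooled = []
    for br in range(bins):
        row = []
        for bc in range(bins):
            cells = [
                fmap[r][c]
                for r in range(br * step, (br + 1) * step)
                for c in range(bc * step, (bc + 1) * step)
            ]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, size):
    """Stretch a pooled grid back to size x size by repeating values."""
    step = size // len(pooled)
    return [[pooled[r // step][c // step] for c in range(size)] for r in range(size)]


def pyramid_features(fmap, bin_sizes=(1, 2)):
    """For each pixel, return [own value, context at each pyramid level]."""
    size = len(fmap)
    levels = [upsample_nearest(avg_pool_to_bins(fmap, b), size) for b in bin_sizes]
    return [
        [[fmap[r][c]] + [level[r][c] for level in levels] for c in range(size)]
        for r in range(size)
    ]


feature_map = [
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 5.0, 5.0],
    [3.0, 3.0, 7.0, 7.0],
    [3.0, 3.0, 7.0, 7.0],
]
features = pyramid_features(feature_map)
print(features[0][0])  # local value, whole-image average, quadrant average
print(features[3][3])
```

Each pixel ends up with its own value plus summaries of progressively smaller regions around it, so a later classifier can weigh what the whole scene looks like alongside local detail.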

Keep learning about semantic segmentation with Coursera. 

If you’re new to the field of computer vision, consider enrolling in an online course like Image Processing for Engineering and Science Specialization from MathWorks. You’ll gain a foundational understanding of image processing and analyzing techniques. 

DeepLearning.AI offers an intermediate-level course, Advanced Computer Vision with TensorFlow, to build upon your existing knowledge of image segmentation using TensorFlow.

If you’re ready to dive straight into a semantic segmentation project, the Guided Project Semantic Segmentation with Amazon Sagemaker walks you through the entire process. 

