What Is Hierarchical Clustering?

Written by Coursera Staff • Updated on

Explore hierarchical clustering, an exciting statistical analysis technique designed to effectively group data. Plus, learn simple steps you can take to build your background so you can start applying clustering algorithms yourself.


Hierarchical clustering is a commonly used clustering technique that groups data in a hierarchical structure, making it easier for you to sort and understand your data. Read on to examine what hierarchical clustering is, how the algorithm works, different methods you can choose, and common uses of this type of clustering analysis. 

What is hierarchical clustering?

Hierarchical clustering is a data analysis technique that sorts data points into clusters, or groups, based on shared characteristics. It works by building a cluster “tree,” called a dendrogram, in which larger clusters break down into smaller groups at each branching point.

For example, you might be trying to categorize a group of animals. Your top (largest) cluster might be the entire set of animals. Below this, your data might branch into smaller clusters by type of animal (mammal, reptile, and so on), and those clusters keep breaking down into more specific groups as you descend the tree (mammals might split into deer, dogs, and cats, for instance). Hierarchical clustering is typically used for hierarchical data, meaning data that is easily subcategorized into different levels of specificity.

Types of hierarchical clustering

You can choose your hierarchical clustering algorithm based on how you want your data split. The two main choices are divisive methods and agglomerative methods. Understanding each method can help you make a more informed decision about how best to structure the algorithm you’re using for your data.

1. Agglomerative clustering

Agglomerative clustering is a “bottom-up” approach. First, every data point is treated as its own cluster. Then, at each step, the two most similar clusters merge into one, so clusters gradually combine up the “tree” until all data points belong to a single cluster. If you had a set of pictures, you could think of this as starting with each picture as its own “cluster.” Then, bit by bit, you would combine the most similar pictures and piles until you have one big stack.
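
To make the bottom-up idea concrete, here is a minimal Python sketch. It assumes small sets of 2D points stored as tuples and uses single linkage (the distance between the closest members of two clusters) as the similarity rule; the function names are hypothetical, and this naive version is meant for illustration rather than as a production implementation.

```python
import math

def single_linkage(c1, c2, points):
    """Cluster distance = distance between the closest pair of members (single linkage)."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points):
    """Naive bottom-up clustering that records every merge.

    Returns a list of (cluster_a, cluster_b, distance) tuples, where each
    cluster is a tuple of point indices. The final merge joins everything.
    """
    # Start with every point in its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters (the first pair found wins ties).
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = single_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(tuple(sorted(a + b)))
    return merges

# Five 2D points: two tight pairs and one more distant point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agglomerative(points):
    print(a, "+", b, "at distance", round(d, 2))
# (0,) + (1,) at distance 1.0
# (2,) + (3,) at distance 1.0
# (0, 1) + (2, 3) at distance 6.4
# (4,) + (0, 1, 2, 3) at distance 7.07
```

The recorded merge history is exactly the tree: reading it from top to bottom climbs from individual points to one cluster containing everything.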

2. Divisive clustering

This method is a “top-down” clustering approach that works in the opposite direction from agglomerative clustering. Continuing with the photo analogy, you would start with one big pile of photos and gradually divide it into smaller piles based on differences until each photo is in a pile by itself or until the piles are distinct enough for your purpose.
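
A divisive method needs a rule for splitting a pile. The Python sketch below uses one simple rule, not a specific published algorithm such as DIANA: seed each split with the cluster's two farthest-apart points and send every other point to the nearer seed. The `max_diameter` parameter is a hypothetical stand-in for “distinct enough for your purpose,” and the function names are illustrative.

```python
import math

def farthest_pair(cluster, points):
    """Return (i, j, distance) for the two members of the cluster that are farthest apart."""
    best = (cluster[0], cluster[0], 0.0)
    for a in range(len(cluster)):
        for b in range(a + 1, len(cluster)):
            d = math.dist(points[cluster[a]], points[cluster[b]])
            if d > best[2]:
                best = (cluster[a], cluster[b], d)
    return best

def divisive(cluster, points, max_diameter):
    """Recursively split a cluster (a list of point indices) from the top down.

    A cluster stops splitting once its diameter (largest internal distance)
    is at most max_diameter. Returns the final clusters as a list of lists.
    """
    seed_a, seed_b, diameter = farthest_pair(cluster, points)
    # Stop if the cluster is tight enough, or if all its members coincide.
    if diameter <= max_diameter or diameter == 0.0:
        return [cluster]
    # Split: each member joins whichever seed it is closer to.
    left = [i for i in cluster
            if math.dist(points[i], points[seed_a]) <= math.dist(points[i], points[seed_b])]
    right = [i for i in cluster if i not in left]
    return divisive(left, points, max_diameter) + divisive(right, points, max_diameter)

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(divisive(list(range(len(points))), points, max_diameter=2.0))
# [[0, 1], [2, 3], [4]]
```

Setting `max_diameter` very low keeps splitting until every point is in a pile by itself, which mirrors the first stopping condition described above.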

How does the hierarchical clustering algorithm work?

In a broad sense, hierarchical clustering algorithms work in three main steps:

  • Step 1: You'll need to assess how similar or different each pair of data points in your data set is. This is usually done by calculating the distance between each pair with a distance function, such as Euclidean (straight-line) or Manhattan distance; which function makes the most sense depends on your data and the insights you’re trying to gain. An agglomerative approach starts here from the smallest clusters (each data point on its own), while a divisive approach starts from the largest (the entire data set).

  • Step 2: Once you have all these distance measurements, the next step is to group or divide data points based on their proximity. This is where a linkage criterion comes into play: it defines the distance between two clusters in terms of the distances between their points, such as the closest pair (single linkage), the farthest pair (complete linkage), or the average over all pairs (average linkage). In agglomerative methods, the closest clusters merge step by step, gradually building a hierarchical tree that represents your data at different levels of relatedness. In divisive methods, distance measures inform how to split larger clusters further.

  • Step 3: The final step is deciding where to cut the hierarchical tree into meaningful clusters. Cutting prunes the tree at specific points, which determines the final grouping of data points. Each branch cut from the tree represents a cluster, and all data points beneath that branch belong to the same group. How broad or specific your final groups should be depends on your subject area and purpose.
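
The three steps can be sketched together in Python. This minimal example assumes small 2D data sets stored as tuples; the helper names (`distance_matrix`, `cut_clusters`, and so on) are hypothetical, and the cut is expressed as a distance threshold, which is only one of several ways to choose where to prune the tree.

```python
import math

# Step 1: distance functions and the full pairwise distance matrix.
def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(points, metric):
    n = len(points)
    return [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]

# Step 2: linkage criteria turn point-to-point distances into a cluster-to-cluster distance.
LINKAGES = {
    "single": min,                               # closest pair
    "complete": max,                             # farthest pair
    "average": lambda ds: sum(ds) / len(ds),     # mean over all pairs
}

def cluster_distance(c1, c2, dist, linkage):
    return LINKAGES[linkage]([dist[i][j] for i in c1 for j in c2])

# Step 3: merge bottom-up, but stop once the closest clusters are farther apart
# than `threshold`. For these three linkages, merge distances never decrease,
# so this gives the same clusters as cutting the finished tree at that height.
def cut_clusters(points, metric, linkage, threshold):
    dist = distance_matrix(points, metric)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        d, x, y = min(
            (cluster_distance(clusters[x], clusters[y], dist, linkage), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for linkage in ("single", "complete", "average"):
    print(linkage, cut_clusters(points, euclidean, linkage, threshold=7.5))
# single [[0, 1, 2, 3, 4]]
# complete [[0, 1], [2, 3], [4]]
# average [[0, 1, 2, 3], [4]]
```

At the same cut height, the linkage choice alone changes the answer: single linkage chains everything into one group, while complete linkage keeps three groups apart. For real data, libraries such as SciPy (`scipy.cluster.hierarchy`) provide far more efficient implementations of these steps.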

What is hierarchical clustering used for?

Professionals use hierarchical clustering in many applications. For example, hierarchical clustering has been used in the following contexts:

  • Customer segmentation: Organizations rely on hierarchical clustering to group customers based on location, behavior, and demographics. This helps them target marketing campaigns more effectively.

  • Crime analysis: Law enforcement agencies can apply hierarchical clustering to categorize different types of crimes or to find patterns in criminal activity, which assists in resource allocation and preventive strategies.

  • Health care and medical diagnosis: This technique can group patients with similar symptoms or medical histories, supporting diagnosis and improving care quality.

  • Natural language processing (NLP): In NLP, you can use hierarchical clustering to group words or phrases with similar meanings. It is a common machine learning and information retrieval technique for analyzing text and identifying trends.

  • Recommendation systems: Hierarchical clustering can group similar products based on consumer characteristics, improving the accuracy of recommendations.

Who uses hierarchical clustering?

Any professional who needs to cluster data can find applications for this method, so its use is not limited to one industry. Professionals who use the algorithm include the following examples:

  • Researchers for uncovering patterns in data

  • Marketers for segmenting customers and tailoring strategies

  • Retailers for organizing products and understanding customer preferences

  • Law enforcement agencies for analyzing crime data and patterns

  • Biologists for analyzing patterns of gene expression

Pros and cons of hierarchical clustering

While hierarchical clustering applies to many fields, knowing its advantages and disadvantages can help you determine whether it suits your application. Pros and cons vary, but common ones include the following:

Pros

  • Straightforward method for understanding data structure and relationships

  • The hierarchy reveals natural groupings, meaning you don’t have to predefine the number of clusters

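  • Easy to visualize results: the branches of the dendrogram provide a visual summary of how the data clusters, offering insights into data relationships

Cons

  • Merges or splits are greedy and can’t be undone later, so early decisions, measurement errors, and outliers can skew results

  • Results depend on the choice of distance measure and linkage method, which can introduce bias if they don’t suit the type of data

  • Computationally expensive on large data sets: standard algorithms work from the full matrix of pairwise distances, so memory and run time grow at least with the square of the number of data points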
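
To get a feel for that cost, here is a rough worked example, assuming a hypothetical data set of n = 100,000 points and one 8-byte floating-point value per stored distance. The number of distinct pairs whose distance must be computed in Step 1, and the memory needed just to store them, are:

```latex
\[
\frac{n(n-1)}{2} = \frac{100{,}000 \times 99{,}999}{2} = 4{,}999{,}950{,}000 \approx 5 \times 10^{9} \text{ distances}
\]
\[
5 \times 10^{9} \text{ distances} \times 8 \text{ bytes} \approx 4 \times 10^{10} \text{ bytes} \approx 40 \text{ GB}
\]
```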

How to start learning hierarchical clustering

To begin your journey in hierarchical clustering, focus on strengthening your background in the subjects that underpin artificial intelligence and data analysis, such as statistics, computer science, and mathematics. These areas lay the groundwork for understanding data analysis techniques. One way to do this is by pursuing courses or degrees in areas related to data analysis and machine learning, such as computer science or information technology. Look for modules that cover hierarchical clustering along with other data analysis methods.

As with any technical skill, working on projects that involve relevant data is a great way to build proficiency and gain hands-on experience. You can choose project types based on your interests, and practicing with real data sets will familiarize you with different data types and clustering challenges.

Getting started with Coursera

Hierarchical clustering is one way to group data so it's easier to sort and analyze. On Coursera, you can take courses designed to strengthen your foundational skills and prepare you to perform cluster analyses. As a beginner, consider taking the Data Science: Foundations using R Specialization, designed to help you learn skills in statistical programming, exploratory data analysis, and more, including clustering techniques.

Written by:

Editorial Team

Coursera’s editorial team is comprised of highly experienced professional editors, writers, and fact...

This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.