Hierarchical Clustering using Euclidean Distance

Offered By
Coursera Project Network
In this Guided Project, you will:

Understand the importance and use of hierarchical clustering based on skew profiles.

Locate and process the viral cDNA genome files to calculate the skew profiles.

Understand the theory behind using the Pythagorean theorem to calculate Euclidean distance, and apply it in Python to build a linkage matrix.

Understand how errors occur, how to avoid them, and how to resolve their sources.
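Euclidean distance generalizes the Pythagorean theorem to n dimensions: for two profiles x and y of equal length n, d(x, y) = sqrt((x1 - y1)^2 + ... + (xn - yn)^2). The following is a minimal sketch in plain Python (standard library only) that computes this distance between skew profiles and runs a naive agglomerative clustering. It records each merge as [cluster_a, cluster_b, distance, size], the same four-column row layout that SciPy's scipy.cluster.hierarchy.linkage returns. The function names euclidean and naive_linkage, the toy profiles, and the virus labels are hypothetical illustrations, not the project's code. The assumption that profiles have already been cut to a common length is also illustrative; real genomes differ in length, and Euclidean distance is only defined for vectors of equal length.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance: the Pythagorean theorem extended to n dimensions."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def naive_linkage(profiles, method="single"):
    """Hypothetical helper: agglomerative clustering returning merges as
    [cluster_a, cluster_b, distance, size] rows (SciPy's linkage layout).

    Clusters 0..n-1 are the original profiles; the i-th merge creates
    cluster n + i. Supports 'single', 'complete' and 'average' linkage.
    """
    n = len(profiles)
    # Precompute all pairwise distances between the original observations.
    d = {}
    for i, j in combinations(range(n), 2):
        d[(i, j)] = d[(j, i)] = euclidean(profiles[i], profiles[j])

    # Each active cluster id maps to the original indices it contains.
    clusters = {i: [i] for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pair_d = [d[(p, q)] for p in clusters[a] for q in clusters[b]]
            if method == "single":      # closest pair of members
                dist = min(pair_d)
            elif method == "complete":  # farthest pair of members
                dist = max(pair_d)
            elif method == "average":   # mean over all member pairs
                dist = sum(pair_d) / len(pair_d)
            else:
                raise ValueError(f"unknown method: {method}")
            if best is None or dist < best[2]:
                best = (a, b, dist)
        a, b, dist = best
        # Merge the two closest clusters into a new cluster id.
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        merges.append([a, b, dist, len(members)])
        next_id += 1
    return merges


# Hypothetical toy skew profiles, assumed already cut to a common length.
profiles = {
    "virus_A": [0, 1, 2, 3],
    "virus_B": [0, 1, 2, 4],
    "virus_C": [0, -1, -3, -5],
}
labels = list(profiles)
Z = naive_linkage([profiles[k] for k in labels], method="single")
for row in Z:
    print(row)
```

Here virus_A and virus_B merge first at distance 1.0, and virus_C joins the resulting cluster last. Changing method to "complete" or "average" changes only how the distance between two multi-member clusters is measured, which is the choice that the ordering and methods step of a hierarchical clustering workflow revolves around.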

About 75 minutes required for the project and 45 minutes for the other materials (reading and assignment).
Intermediate
No download needed
Split-screen video
English
Desktop only

By the end of this project, you will have created a Python program in a Jupyter interface that analyzes a group of viruses and plots a dendrogram based on the similarities among them. The dendrogram depends on the cumulative skew profile, which in turn depends on the nucleotide composition. You will use complete genome sequences for many viruses, including coronavirus, SARS, HIV, Zika, dengue, enterovirus, and West Nile virus.
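The following is a minimal sketch of how a cumulative skew profile follows from nucleotide composition. It assumes the common GC-skew convention, which may differ from the project's exact definition: each G adds 1, each C subtracts 1, and every other base leaves the running total unchanged. The profile is the list of partial sums along the genome, starting at 0. The function names cumulative_skew and truncate_to_common_length are hypothetical. Truncating to the shortest profile is only one possible way to make profiles comparable by Euclidean distance.

```python
def cumulative_skew(sequence):
    """Return the running G-minus-C count along a nucleotide sequence.

    The profile starts at 0 (before the first base), so a sequence of
    length n yields n + 1 values. Bases other than G/C (A, T, U, N, ...)
    leave the running total unchanged. Input case is ignored.
    """
    step = {"G": 1, "C": -1}
    profile = [0]
    for base in sequence.upper():
        profile.append(profile[-1] + step.get(base, 0))
    return profile


def truncate_to_common_length(profiles):
    """Cut every profile to the length of the shortest one so that
    Euclidean distances between them are defined (one possible choice)."""
    shortest = min(len(p) for p in profiles)
    return [p[:shortest] for p in profiles]


print(cumulative_skew("GGCATC"))  # [0, 1, 2, 1, 1, 1, 0]
```

Because every genome yields a profile one value longer than the genome itself, viruses of different lengths produce profiles of different lengths. That is why some alignment step, such as truncation, resampling, or another approach, is needed before distances can be computed.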

Skills you will develop

Python Programming, Genomics, Plotting

Learn step-by-step

In a video that plays in a split-screen with your work area, your instructor will walk you through these steps:

  1. Task 1: Getting Started with Hierarchical Clustering

  2. Task 2: Locate and Process the Data Files

  3. Task 3: Understand the Result Dataset

  4. Task 4: Hierarchical Clustering - Metric

  5. Task 5: Hierarchical Clustering - Ordering & Methods

  6. Task 6: Dendrogram Plotting

  7. Task 7: Dendrogram - Analysis

  8. Task 8: Errors to Avoid

How Guided Projects work

Your workspace is a cloud desktop right in your browser; no download is required.

In a split-screen video, your instructor guides you step-by-step
