Foundations of Data Science: K-Means Clustering in Python

Instructors: Professor Matthew Yee-King

76,979 already enrolled

5 modules

Gain insight into a topic and learn the fundamentals.

735 reviews

Beginner level

Recommended experience

Flexible schedule

3 weeks at 10 hours a week

Learn at your own pace

95%

Most learners liked this course

What you'll learn

Define and explain the key concepts of data clustering
Demonstrate understanding of the key constructs and features of the Python language.
Implement in Python the principal steps of the K-means algorithm.
Design and execute a whole data clustering workflow and interpret the outputs.
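
As a rough illustration of the third outcome above, the two steps usually described as the core of K-means (assign each point to its nearest centroid, then move each centroid to the mean of its points) might be sketched as follows. This is a minimal sketch using only the Python standard library, not the course's own code; the function names, the fixed starting centroids and the toy points are all made up for illustration.

```python
import math


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def k_means(points, initial_centroids, max_iter=100):
    """Alternate the two steps until the assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy 2D data (invented): two visibly separate groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The starting centroids are passed in explicitly so that the run is repeatable; library implementations typically choose them randomly or with a seeding heuristic.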

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

39 assignments

Taught in English

There are 5 modules in this course

Organisations all around the world are using data to predict behaviours and extract valuable real-world insights to inform decisions. Managing and analysing big data has become an essential part of modern finance, retail, marketing, social science, development and research, medicine and government.

This MOOC, designed by an academic team from Goldsmiths, University of London, will quickly introduce you to the core concepts of Data Science to prepare you for intermediate and advanced Data Science courses. It focuses on the basic mathematics, statistics and programming skills needed for typical data analysis tasks. You will explore these fundamental concepts through an example data clustering task, and use that example to learn the basic programming skills needed to master Data Science techniques. During the course, you will complete a series of mathematical and programming exercises and a small data clustering project on a given dataset.

This week we will introduce you to the course and to the team who will be guiding you through the course over the next 5 weeks. The aim of this week's material is to gently introduce you to Data Science through some real-world examples of where Data Science is used, and also by highlighting some of the main concepts involved.

What's included

9 videos, 4 assignments, 3 discussion prompts

9 videos (total 22 minutes)

Welcome and Introduction (3 minutes)
Introduction to Data Science (3 minutes)
What is Data? (2 minutes)
Types of Data (1 minute)
Machine Learning (4 minutes)
Supervised vs Unsupervised Learning (3 minutes)
K-Means Clustering (4 minutes)
Preparing your Data (2 minutes)
A Real World Dataset (1 minute)

4 assignments (total 100 minutes)

Types of Data – Review Information (15 minutes)
Supervised vs Unsupervised – Review Information (15 minutes)
K-Means Clustering – Review Information (30 minutes)
Week 1 Summative Assessment (40 minutes)

3 discussion prompts (total 270 minutes)

Welcome! (30 minutes)
Examples of Data (120 minutes)
Machine Learning in the News (120 minutes)

What's included

11 videos, 4 readings, 10 assignments, 1 peer review, 1 ungraded lab

11 videos (total 37 minutes)

2.0: Week 2 Introduction (1 minute)
2.1 – Introduction to Mathematical Concepts of Data Clustering (2 minutes)
2.2 – Mean of One Dimensional Lists (2 minutes)
2.3 – Variance and Standard Deviation (4 minutes)
2.4 Jupyter Notebooks (6 minutes)
2.5 Variables (4 minutes)
2.6 Lists (5 minutes)
2.7 Computing the Mean (3 minutes)
2.8 Better Lists: NumPy (4 minutes)
2.9 Computing the Standard Deviation (6 minutes)
Week 2 Conclusion (1 minute)

4 readings (total 50 minutes)

Population vs Sample, Bias (10 minutes)
Variability, Standard Deviation and Bias (10 minutes)
Python Style Guide (10 minutes)
Numpy and Array Creation (20 minutes)

10 assignments (total 122 minutes)

Population vs Sample – Review Information (5 minutes)
Mean of One Dimensional Lists – Review Information (3 minutes)
Variance and Standard Deviation – Review Information (4 minutes)
Jupyter Notebooks – Review Information (20 minutes)
Variables – Review Information (10 minutes)
Lists – Review Information (10 minutes)
Computing the Mean – Review Information (10 minutes)
Better Lists – Review Information (10 minutes)
Computing the Standard Deviation – Review Information (10 minutes)
Week 2 Summative Assessment (40 minutes)

1 peer review (total 30 minutes)

Use Jupyter Notebooks (30 minutes)

1 ungraded lab (total 15 minutes)

Jupyter Notebook Environment (15 minutes)
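
The mean, variance and standard deviation covered in this module can be computed by hand from a plain Python list. The following is a minimal sketch under that assumption, not the course's notebook code; the helper names and the data values are invented. It shows both the population form (divide by n) and the sample form (divide by n - 1), since the readings distinguish population from sample.

```python
import math
import statistics


def mean(values):
    """Arithmetic mean of a one-dimensional list."""
    return sum(values) / len(values)


def population_variance(values):
    """Average squared deviation from the mean (divide by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def sample_variance(values):
    """Unbiased sample variance (divide by n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Invented example data.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_std = math.sqrt(population_variance(data))
sample_std = math.sqrt(sample_variance(data))

print(mean(data), population_std, sample_std)

# Cross-check against the standard library's own implementations.
print(statistics.pstdev(data), statistics.stdev(data))
```

NumPy's `np.mean` and `np.std` cover the same ground for arrays; note that `np.std` uses the population form unless `ddof=1` is passed.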

What's included

16 videos, 10 readings, 15 assignments

16 videos (total 53 minutes)

Week 3 Introduction (1 minute)
3.1 Multidimensional Data Points and Features (2 minutes)
3.2 Multidimensional Mean (3 minutes)
3.3 Dispersion: Multidimensional Variables (3 minutes)
3.4 Distance Metrics (5 minutes)
3.5 Normalisation (1 minute)
3.6 Outliers (1 minute)
3.7 Basic Plotting (3 minutes)
3.7a Storing 2D Coordinates in a Single Data Structure (6 minutes)
3.8 Multidimensional Mean (5 minutes)
3.9 Adding Graphical Overlays (6 minutes)
3.10 Calculating the Distance to the Mean (4 minutes)
3.11 List Comprehension (4 minutes)
3.12 Normalisation in Python (6 minutes)
3.13 Outliers and Plotting Normalised Data (3 minutes)
Week 3 Conclusion (1 minute)

10 readings (total 120 minutes)

Multidimensional Data Points and Features Recap (10 minutes)
Multidimensional Mean Recap (10 minutes)
Multidimensional Variables Recap (10 minutes)
Distance Metrics Recap (10 minutes)
Normalisation Recap (10 minutes)
Note on Matplotlib (10 minutes)
Matplotlib Scatter Plot Documentation (20 minutes)
Matplotlib Patches Documentation (10 minutes)
List Comprehension Documentation (20 minutes)
3.12 Errata (10 minutes)

15 assignments (total 290 minutes)

Multidimensional Data Points and Features – Review Information (3 minutes)
Multidimensional Mean – Review Information (3 minutes)
Dispersion: Multidimensional Variables – Review Information (5 minutes)
Distance Metrics – Review Information (6 minutes)
Normalisation – Review Information (3 minutes)
Outliers – Review Information (30 minutes)
Basic Plotting – Review Information (5 minutes)
Storing 2D Coordinates – Review Information (30 minutes)
Multidimensional Mean – Review Information (30 minutes)
Adding Graphical Overlays – Review Information (30 minutes)
Calculating Distance – Review Information (30 minutes)
List Comprehension – Review Information (30 minutes)
Normalisation in Python – Review Information (30 minutes)
Outliers – Review Information (30 minutes)
Week 3 Summative Assessment (25 minutes)
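
Several of this module's topics (the multidimensional mean, the distance to the mean, list comprehensions and normalisation) fit together in a few lines of Python. The sketch below is illustrative only: it assumes Euclidean distance as the metric and min-max rescaling as the normalisation, which may differ from the choices made in the videos, and the points and the `min_max_normalise` helper are invented.

```python
import math

# Invented 2D data points: each tuple is (feature_1, feature_2).
points = [(2.0, 10.0), (4.0, 30.0), (6.0, 20.0)]

# Multidimensional mean: average each feature (column) separately.
mean_point = tuple(sum(column) / len(column) for column in zip(*points))

# Euclidean distance from every point to the mean, via a list comprehension.
distances = [math.dist(p, mean_point) for p in points]


def min_max_normalise(points):
    """Rescale each feature to the range [0, 1].

    Assumes no feature is constant, otherwise the division would fail.
    """
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


normalised = min_max_normalise(points)

print(mean_point)
print(distances)
print(normalised)
```

Without normalisation the second feature, which spans a range of 20 against a range of 4 for the first, dominates the distances; after rescaling both features contribute on the same scale.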

What's included

8 videos, 6 readings, 7 assignments, 1 peer review

8 videos (total 37 minutes)

Week 4 Introduction (1 minute)
4.1: Using the Pandas Library to Read csv Files (5 minutes)
4.1a: Sorting and Filtering Data Using Pandas (8 minutes)
4.1b: Labelling Points on a Graph (4 minutes)
4.1c: Labelling all the Points on a Graph (3 minutes)
4.2: Eyeballing the Data (6 minutes)
4.3: Using K-Means to Interpret the Data (9 minutes)
Week 4: Conclusion (1 minute)

6 readings (total 60 minutes)

Week 4 Code Resources (5 minutes)
Pandas Read_CSV Function (15 minutes)
More Pandas Library Documentation (10 minutes)
The Pyplot Text Function (10 minutes)
For Loops in Python (10 minutes)
Documentation for sklearn.cluster.KMeans (10 minutes)

7 assignments (total 75 minutes)

Using the Pandas Library to Read csv Files – Review Information (5 minutes)
Sorting and Filtering Data Using Pandas – Review Information (10 minutes)
Labelling Points on a Graph – Review Information (5 minutes)
Labelling all the Points on a Graph – Review Information (5 minutes)
Eyeballing the Data – Review Information (5 minutes)
Using K-Means to Interpret the Data – Review Information (5 minutes)
Week 4 Summative Assessment (40 minutes)

1 peer review (total 60 minutes)

Create a Labelled Plot of the Happiness Data (60 minutes)
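
A workflow along the lines of this module (read a CSV with pandas, sort and filter it, then cluster it with `sklearn.cluster.KMeans`) might look like the sketch below. It assumes pandas and scikit-learn are installed. The inline CSV text, the column names and the threshold are all invented stand-ins; they are not the happiness dataset used in the course.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans

# Made-up stand-in for a CSV file; the column names are hypothetical.
csv_text = """country,score_a,score_b
A,1.0,1.2
B,1.1,0.9
C,0.9,1.0
D,5.0,5.2
E,5.1,4.9
F,4.9,5.0
"""

# read_csv accepts a file path in the same way as this in-memory buffer.
df = pd.read_csv(io.StringIO(csv_text))

# Sort by one column (highest first), then keep only rows above a threshold.
ranked = df.sort_values("score_a", ascending=False)
high = df[df["score_a"] > 3.0]

# Cluster on the two numeric columns; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = model.fit_predict(df[["score_a", "score_b"]])

print(ranked)
print(high)
print(df)
```

The cluster numbers themselves are arbitrary labels: which group is called 0 and which is called 1 carries no meaning, only which rows share a label.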

What's included

9 videos, 3 readings, 3 assignments, 3 peer reviews, 5 discussion prompts

9 videos (total 30 minutes)

Introduction to Week 5 (1 minute)
5.1 Can a Machine Detect Fake Notes? (2 minutes)
5.2 Working for a Client (5 minutes)
5.3 How to Organize Work on Your Project (4 minutes)
5.4 Dealing With Difficulties (3 minutes)
5.5 No Data no Data Science: Introduction of the Dataset (5 minutes)
5.6 Modelling (5 minutes)
5.7 Presenting the Project Results (3 minutes)
5.8 Concluding Remarks (1 minute)

3 readings (total 25 minutes)

Week 5 Code Resource – the Dataset for our Project (10 minutes)
Saving plt.scatter Outputs as Figures (10 minutes)
Additional Recommended Reading for Week 5 (5 minutes)

3 assignments (total 44 minutes)

How Would You Help? – Review Information (10 minutes)
Python – Review Information (4 minutes)
Week 5 Summative Assessment (30 minutes)

3 peer reviews (total 180 minutes)

Exploratory Data Analysis (60 minutes)
Clustering (60 minutes)
Your Report (60 minutes)

5 discussion prompts (total 130 minutes)

What Is Required to Train a Machine to Detect Fake Notes? (40 minutes)
Your Project Plan (60 minutes)
Self-reflection (10 minutes)
Tips for Other Learners (10 minutes)
Do You have Data Science Plans? (10 minutes)

Instructors

Instructor ratings

(311 ratings)

Professor Matthew Yee-King

University of London

24 Courses, 437,958 learners

Dr Betty Fyn-Sydney

University of London

1 Course, 76,979 learners

Offered by

University of London

Goldsmiths, University of London

Learner reviews

5 stars: 72.65%
4 stars: 20%
3 stars: 4.35%
2 stars: 1.08%
1 star: 1.90%

Reviewed on Dec 19, 2022

Overall, a great experience but labs could have been better, and few instructors were not very detailed in their approach.

Reviewed on Jun 28, 2020

Very interesting course! The lecturers explain concepts thoroughly which makes the concepts easy to understand even for people without much knowledge in Data Science

Reviewed on Jun 3, 2019

This course is at right level for a beginner (python and analytics) while going into details around K means clustering

Frequently asked questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your chosen learning program, you’ll find a link to apply on the description page.