Created by:  University of Illinois at Urbana-Champaign

  • Jiawei Han

    Taught by:  Jiawei Han, Abel Bliss Professor

    Department of Computer Science
Basic Info
Course 5 of 6 in the Data Mining Specialization
Language
English
How To Pass
Pass all graded assignments to complete the course.
User Ratings
4.1 stars
How It Works
Coursework

Each course is like an interactive textbook, featuring pre-recorded videos, quizzes and projects.

Help from Your Peers

Connect with thousands of other learners and debate ideas, discuss course material, and get help mastering concepts.

Certificates

Earn official recognition for your work, and share your success with friends, colleagues, and employers.

Creators
University of Illinois at Urbana-Champaign
The University of Illinois at Urbana-Champaign is a world leader in research, teaching and public engagement, distinguished by the breadth of its programs, broad academic excellence, and internationally renowned faculty and alumni. Illinois serves the world by creating knowledge, preparing students for lives of impact, and finding solutions to critical societal needs.
Pricing
Feature                               Audit          Purchase Course
Access to course materials            Available      Available
Access to graded materials            Not available  Available
Receive a final grade                 Not available  Available
Earn a shareable Course Certificate   Not available  Available

Ratings and Reviews
Rated 4.1 out of 5 based on 62 ratings

Very informative lectures, wonderful assignments. This course isn't so easy but it gives you real knowledge and useful experience.

Very detailed introduction to clustering techniques.

Good, thorough coverage of how to cluster, especially for a 4-week course. I particularly liked the topic of clustering evaluation. Very few other instructors seem to discuss the vitally important evaluation of clustering results in any depth when they teach clustering. Dr. Han explained a comprehensive framework for understanding the effectiveness of any clustering system. I had never seen some of this material before, even though clustering appeared in a couple of other data science and machine learning courses I have taken. Ideally I would like to see this course extended to 6 or 8 weeks, so that it could include case studies on clustering difficult real datasets. For example, last year, before I took this course, I had a terribly difficult time trying to cluster the dataset from the BOSCH competition on Kaggle.com. It has about 90% missing data in every row, 2 million rows in total, and about 4500 columns! Kaggle's BOSCH is a SUPER tough dataset to work with! I hope to come back and try the BOSCH dataset again soon using my new knowledge of clustering. The reason I chose to run unsupervised clustering on this dataset, which is ostensibly intended for supervised learning, is that grouping examples cleverly beforehand keeps significant amounts of the missing data from being exposed to the individual supervised learning models. To this day I still believe that clustering, and then building a separate supervised learning model for each cluster, is the most important step toward eliminating missing data in this particular problem.

This course, along with the recommended reading material, will give you a big picture of how clustering algorithms work, as well as of clustering validation methodologies. It is really useful if you are thinking about applying such algorithms and want to understand the state of the art.