Machine Learning: Clustering & Retrieval

4.6 stars • 1,966 ratings • 334 reviews

Case Studies: Finding Similar Documents
A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?
In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. In this course, you will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusterings, and see how to scale the methods using MapReduce.
Learning Outcomes: By the end of this course, you will be able to:
-Create a document retrieval system using k-nearest neighbors.
-Identify various similarity metrics for text data.
-Reduce computations in k-nearest neighbor search by using KD-trees.
-Produce approximate nearest neighbors using locality sensitive hashing.
-Compare and contrast supervised and unsupervised learning tasks.
-Cluster documents by topic using k-means.
-Describe how to parallelize k-means using MapReduce.
-Examine probabilistic clustering approaches using mixture models.
-Fit a mixture of Gaussians model using expectation maximization (EM).
-Perform mixed membership modeling using latent Dirichlet allocation (LDA).
-Describe the steps of a Gibbs sampler and how to use its output to draw inferences.
-Compare and contrast initialization techniques for non-convex optimization objectives.
-Implement these techniques in Python....
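
As a rough illustration of the first two outcomes above (document retrieval with nearest neighbors and a similarity metric for text), here is a minimal sketch in plain Python. It is not taken from the course materials: the function names, the whitespace tokenizer, the TF-IDF weighting, and the toy documents are all assumptions chosen for brevity. It uses brute-force search, which scores every other document for each query.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: word -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({w: c * math.log(n_docs / df[w]) for w, c in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(query_index, vectors, k=1):
    """Brute-force k-NN: score every other document against the query."""
    scores = [
        (cosine_similarity(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]


# Toy corpus (hypothetical data for illustration only).
docs = [
    "the team won the football match",
    "the football team lost the match",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
vectors = tf_idf_vectors(docs)
print(nearest_neighbors(0, vectors, k=1))
```

Because each query compares against every document, the cost grows linearly with corpus size; that is the motivation for the KD-tree and locality sensitive hashing outcomes listed above.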

Aug 25, 2016

Excellent material! It would be nice, however, to mention some reading material, books or articles, for those interested in the details and the theories behind the concepts presented in the course.

Jan 17, 2017

Excellent course, well thought out lectures and problem sets. The programming assignments offer an appropriate amount of guidance that allows the students to work through the material on their own.


By Dongliang Z

•Mar 22, 2018

I enjoyed this course. This specialization is very good for machine learning beginners. Looking forward to the next course.

By Целых А Н

•Jun 07, 2020

I found the course useful. The authors presented a simple and clear visualization of the meaning of the algorithms. Excellent!

Thanks!

By Robert C

•Feb 16, 2018

Emily was fantastic at explaining difficult to understand concepts. Thoroughly enjoyed the course, and learned quite a lot.

By Kuntal G

•Nov 03, 2016

Very good in-depth explanations and a hands-on machine learning lab course. Very focused on real-world analytics and algorithms.

By Arun K P

•Oct 27, 2018

Very useful and informative. It helps and provides confidence to the job more effectively. Thanks for the help and good cour

By José M T

•Apr 14, 2017

The teachers are really amazing. They do not just explain it as if they read a book. They explain the concepts very well

By Vikash S N

•Feb 03, 2019

It was great but I was also interested to implement the solutions with pyspark...though I did it eventually. Thank you!

By MARC G

•Oct 21, 2017

Clear and well-designed course. The assignments are quite thorough. Sometimes, quiz questions are not so clear though.

By Andrey N

•Mar 12, 2017

Some topics are covered very superficially; it would be great to go deeper. Despite this, the course is great!

Thanks.

By Rohan K

•Mar 22, 2018

Good introduction to very complicated concepts. I now have the tools to learn more about HMMs and anomaly detection.

By Justin K

•Aug 17, 2016

An interesting topic, presented well by the instructor and reinforced by intermediate-level programming assignments.

By Somu P

•Nov 17, 2018

Excellent course, which gives you all you need to learn about machine learning. Concepts and hands on practical ex

By Freeze F

•Oct 26, 2016

From LDA onwards the pace ramped up! Please slow down during advanced topics. But altogether it was a great course.

By Fahad S

•Nov 03, 2018

Emily ross is an amazing instructor. The course introduces many complex topics and presents them intuitively.

By Patrick M

•Aug 09, 2016

Excellent course. Nice selection of algorithms reviewed - all clearly explained with sample implementations.

By Jorge L

•May 26, 2017

I'm a grad student and I can notice the instructor makes a difference in this course. I fully recommend it.

By ADONGDONG

•May 07, 2017

Very good! Strongly recommended to people who want to start a career in data science or are interested in it.

By Prasant K S

•Dec 21, 2016

It is explained in simple and lucid language by expert Emily and codes illustrated by Carlos. Go for it.

By João F A d S

•Aug 07, 2016

Great course. Well packed, well explained, nice practical examples, good all around MOOC with of info.

By Geoff B

•Jul 14, 2016

Another great introduction. The assignments are notably a little bit harder than the previous courses.

By Susree S M

•Nov 14, 2018

This course is very useful to know about the concepts of machine learning and do hands-on activities.

By gaston F

•Oct 11, 2016

This course was awesome, like all the previous courses. I'm waiting for the next course and the capstone.

By Sayan B

•Dec 05, 2019

This is actually a tremendous course. Assignments are not so good, but the materials are wonderful.

By Suresh K P

•Dec 21, 2017

Interesting, lots of algorithms and methods to use in upcoming projects and real-time applications.

By Gillian P

•Jul 23, 2017

A very good course with two engaging and sympathetic teachers. Would love to see the next courses
