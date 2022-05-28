One of the most useful areas in machine learning is discovering hidden patterns from unlabeled data. Add the fundamentals of this in-demand skill to your Data Science toolkit. In this course, we will learn selected unsupervised learning methods for dimensionality reduction, clustering, and learning latent features. We will also focus on real-world applications such as recommender systems with hands-on examples of product recommendation algorithms.
About this Course
Recommended background: calculus, linear algebra, Python, NumPy, Pandas, Matplotlib, and Scikit-learn.
What you will learn
Explain what unsupervised learning is, and list methods used in unsupervised learning.
List and explain the algorithms behind various matrix factorization methods, and describe what each is used for.
Skills you will gain
- Dimensionality Reduction
- Unsupervised Learning
- Cluster Analysis
- Recommender Systems
- Matrix Factorization
University of Colorado Boulder
CU-Boulder is a dynamic community of scholars and learners on one of the most spectacular college campuses in the country. As one of 34 U.S. public institutions in the prestigious Association of American Universities (AAU), we have a proud tradition of academic excellence, with five Nobel laureates and more than 50 members of prestigious academic academies.
Syllabus - What you will learn from this course
Unsupervised Learning Intro
Now that you have a solid foundation in Supervised Learning, we shift our attention to uncovering the hidden structure in unlabeled data. We will start with an introduction to Unsupervised Learning. In this course, the models no longer have labels to learn from; they need to make sense of the data from the observations themselves. This week we are diving into Principal Component Analysis (PCA), a foundational dimensionality reduction technique. When you first start learning this topic, it might not seem easy, and there is undoubtedly some math involved in this section. However, PCA can be grasped conceptually, perhaps more readily than anticipated. In the Supervised Learning course, we struggled with the Curse of Dimensionality. This week, we will see how PCA can reduce the number of dimensions and improve performance on classification and regression tasks. You will have reading, a quiz, and a Jupyter notebook lab/Peer Review to implement the PCA algorithm.
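As a rough illustration of the idea, and not the course's lab solution, PCA can be sketched in a few lines of NumPy: center the data, take a singular value decomposition, and project onto the leading directions. The function name `pca` and the toy data are assumptions made up for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each kept component.
    explained_variance_ratio = (S ** 2)[:n_components] / np.sum(S ** 2)
    # Coordinates of every sample in the reduced space.
    return X_centered @ components.T, components, explained_variance_ratio

# Toy data (hypothetical): 200 points lying close to a line in 3-D.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))

Z, components, ratio = pca(X, n_components=1)
print(Z.shape)  # (200, 1)
print(ratio)    # share of variance kept by the single component
```

Because the toy points sit near a line, one component keeps almost all of the variance; on real data the explained-variance ratio is one common guide for choosing how many components to keep.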
Clustering
This week, we are working with clustering, one of the most popular unsupervised learning methods. Last week, we used PCA to find a low-dimensional representation of data. Clustering, on the other hand, finds subgroups among observations. We can use it to build a meaningful intuition of the data structure or as part of a procedure like Cluster-then-predict. Clustering has applications ranging from customer segmentation in marketing and advertising, to identifying similar movies or music, to genomics research and disease subtype discovery. We will focus mainly on K-means clustering and hierarchical clustering, weighing the benefits and disadvantages of each and the choice of distance metric and linkage. We have reading, a quiz, and a Jupyter notebook lab/Peer Review this week.
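To make the K-means idea concrete, here is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) in NumPy. It is an illustration under simple assumptions (Euclidean distance, random initialization from the data points, made-up toy blobs), not the course's reference implementation.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate between assigning points and updating centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: squared Euclidean distance to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return labels, centroids

# Toy data (hypothetical): two well-separated blobs in 2-D.
rng = np.random.default_rng(42)
blob_a = rng.normal(0.0, 0.5, size=(100, 2))
blob_b = rng.normal(10.0, 0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(X, k=2)
print(centroids)
```

On data this cleanly separated the result does not depend much on the starting points; on real data K-means is usually run from several initializations because it can settle in a poor local optimum.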
Recommender System
This week we are working with Recommender Systems. Websites like Netflix, Amazon, and YouTube surface personalized recommendations for movies, items, or videos. This week, we explore the strategies Recommendation Engines use to predict what users will like. We will consider popularity-based, content-based, and collaborative filtering approaches, and which similarity metrics to use. Working with Recommendation Systems brings challenges, such as the time complexity of operations and sparse data. This week is relatively math-dense. You will have a quiz in which you will work through different similarity metric calculations. Give yourself time for this week's Jupyter notebook lab and consider performant implementations. The Peer Review section this week is short.
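As a small illustration of the similarity calculations mentioned above, the sketch below computes cosine similarity between rating vectors and Jaccard similarity between sets of liked items. The users, ratings, and item indices are invented for the example, and treating a 0 as "not rated" is a simplifying assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A user with no ratings has no direction; report zero similarity.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(items_a, items_b):
    """Size of the intersection over size of the union of two item sets."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical ratings for four items (0 means "not rated").
alice = [5, 4, 0, 1]
bob = [4, 5, 0, 2]
carol = [1, 0, 5, 4]

print(cosine_similarity(alice, bob))    # similar tastes, close to 1
print(cosine_similarity(alice, carol))  # different tastes, much lower

# Items each user rated 4 or higher, as sets of item indices.
liked = {"alice": {0, 1}, "bob": {0, 1}, "carol": {2, 3}}
print(jaccard_similarity(liked["alice"], liked["bob"]))
print(jaccard_similarity(liked["alice"], liked["carol"]))
```

A neighborhood-based collaborative filter would use scores like these to find the users (or items) most similar to a target and weight their ratings accordingly.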
Matrix Factorization
We are already at the last week of course material! Get ready for another math-dense week. Last week, we learned about Recommendation Systems. We used a Neighborhood Method of Collaborative Filtering, utilizing similarity measures. Latent Factor Models, including the popular Matrix Factorization (MF), can also be used for Collaborative Filtering. A 1999 publication in Nature made Non-negative Matrix Factorization extremely popular. MF has many applications, including image analysis, text mining/topic modeling, recommender systems, audio signal separation, analytical chemistry, and gene expression analysis. This week, we focus on Singular Value Decomposition, Non-negative Matrix Factorization, and approximation methods. We have reading, a quiz, and a Kaggle mini-project that uses matrix factorization to categorize news articles.
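The two factorizations named above can be sketched briefly in NumPy. The first function keeps the top singular values of an SVD to get a low-rank approximation; the second runs multiplicative updates for Non-negative Matrix Factorization under a squared-error objective. The function names, the small matrix `V`, and the iteration count are assumptions chosen for illustration, not material from the course.

```python
import numpy as np

def truncated_svd(V, rank):
    """Best rank-`rank` approximation of V in the Frobenius norm."""
    U, S, Vt = np.linalg.svd(V, full_matrices=False)
    # Keep only the leading singular triplets.
    return U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank], S

def nmf(V, rank, n_iters=500, seed=0, eps=1e-10):
    """Factor a non-negative V into W @ H with non-negative W and H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Start from strictly positive random factors.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iters):
        # Multiplicative updates keep every entry non-negative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical user-by-item matrix with two visible taste groups.
V = np.array([
    [5, 4, 0, 0, 0],
    [4, 5, 0, 0, 0],
    [5, 5, 1, 0, 0],
    [0, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
    [0, 0, 5, 5, 4],
], dtype=float)

V_svd, S = truncated_svd(V, rank=2)
W, H = nmf(V, rank=2)

print(np.linalg.norm(V - V_svd) / np.linalg.norm(V))  # relative SVD error
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative NMF error
```

The SVD factors can contain negative entries, while the NMF factors cannot, which is one reason NMF factors are often easier to read as additive "parts" or topics.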
About the Machine Learning: Theory and Hands-on Practice with Python Specialization
In the Machine Learning specialization, we will cover Supervised Learning, Unsupervised Learning, and the basics of Deep Learning. You will apply ML algorithms to real-world data, learn when to use which model and why, and improve the performance of your models. Starting with supervised learning, we will cover linear and logistic regression, KNN, Decision trees, ensembling methods such as Random Forest and Boosting, and kernel methods such as SVM. Then we turn our attention to unsupervised methods, including dimensionality reduction techniques (e.g., PCA), clustering, and recommender systems. We finish with an introduction to deep learning basics, including choosing model architectures, building/training neural networks with libraries like Keras, and hands-on examples of CNNs and RNNs.
