Welcome to Supervised vs Unsupervised Learning. After watching this video, you will be able to: Define supervised and unsupervised learning. List examples of supervised and unsupervised machine learning use cases. Define clustering. And, differentiate between supervised and unsupervised learning.
An easy way to begin grasping the concept of supervised learning is by looking directly at the words that make it up. Supervise means to observe and direct the execution of a task, project, or activity. We aren’t going to be supervising a person. Instead, we’ll be supervising a machine learning model that might be able to classify the points of a data set into their respective regions, as we see here.
So, how do we supervise a machine learning model? We do this by “teaching” the model. We load the model with the knowledge it needs to predict future instances. But this leads to the next question: “How exactly do we teach a model?” We teach the model by training it with some data from a labeled data set. In a labeled data set, each record is tagged with the class it belongs to. For example, if we had data about cancer, labels could include ‘benign’ or ‘malignant.’ As you can see, we have some historical data for patients, and we know the class label of each row.
Let’s start by introducing some components of this table. The names up here, clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, and so on, are the names of the attributes. The columns themselves, including the data they hold, are called features or input variables. If you plot this data and look at a single data point on a plot, it will have all these attributes. These attributes form a row in this table, also referred to as an observation.
Looking at the data, you may notice two types. The first is numerical. When dealing with machine learning, this is the most commonly used data type. The second is categorical, that is, non-numeric data that contains characters rather than numbers.
In this case, the class label is categorical because this data set is made for classification.
There are two types of supervised learning techniques: classification and regression. Classification is the process of predicting a class label or category. For example, “Will I pass or fail my biology test?” In this case, there are only two outcomes, and I can only be placed in one at a time: either I will pass, or I will fail the exam. Regression is the process of predicting a continuous value by modeling the relationship between a dependent variable and one or more independent variables. The dependent variable is the continuous variable that we want to predict, and the independent variables are the variables that, we believe, influence the value of the dependent variable. Unlike classification, the outcome that we want to predict is a continuous value rather than a categorical one.
Look at this data set. Assume we want to predict the CO2 emissions of a new car that is not listed in this data set. CO2 emission is the dependent variable because it is what we want to predict, and it is a continuous variable. We know the values of the car’s engine size, cylinders, and fuel consumption. These are the independent variables because, we believe, they influence the CO2 emission. Given this data set and the values of the independent variables for the new car, you can fit a regression line to predict its CO2 emission.
Since we know the meaning of supervised learning, what do you think unsupervised learning means? Unsupervised learning is precisely as it sounds. We do not supervise the model, but we allow it to work independently to discover patterns and structures in the data that may not be visible to the human eye. This means that the unsupervised algorithm trains on the data set and draws conclusions from the unlabeled data on its own. Unsupervised learning is generally considered more difficult than supervised learning because we know little to nothing about the data or the outcomes that are to be expected.
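
Returning briefly to the supervised CO2 example, a regression line can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the numbers are invented for the sketch (they are not taken from the data set shown in the video), only one independent variable (engine size) is used, and the names `fit_line` and `predict_co2` are hypothetical helpers, not part of any library.

```python
# Hypothetical labeled data, invented for illustration only.
engine_sizes = [1.0, 2.0, 3.0, 4.0]            # independent variable
co2_emissions = [140.0, 180.0, 220.0, 260.0]   # dependent variable (the known labels)


def fit_line(xs, ys):
    """Ordinary least squares with one independent variable.

    Returns (slope, intercept) of the line that best fits the points.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the line from the labeled examples.
slope, intercept = fit_line(engine_sizes, co2_emissions)


def predict_co2(engine_size):
    """Predict the continuous outcome for a car that is not in the data."""
    return slope * engine_size + intercept


print(predict_co2(2.5))
```

The labels (the known CO2 values) are what make this supervised: the line is fitted so that its predictions match them as closely as possible. A fuller version would use all three independent variables from the example, which calls for multiple linear regression rather than a single line.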
Dimensionality reduction, density estimation, market basket analysis, and clustering are the most widely used unsupervised machine learning techniques. Dimensionality reduction and/or feature selection plays a significant role in unsupervised learning by reducing redundant features, which makes later tasks such as classification easier. Market basket analysis is a modeling technique based on the theory that if you buy a particular group of items, you are more likely to buy another group of items. Density estimation is a straightforward concept mainly used to explore the data and find some structure. And finally, clustering. Since clustering is one of the most popular unsupervised machine learning techniques, let us look at it in more depth.
Clustering works by grouping data points or objects that are somehow similar, based on the characteristics of the data. Clustering is mainly used for discovering the structure of the data, summarizing the data, and detecting anomalies within a data set. Cluster analysis has many applications in different domains, whether helping a bank segment its customers based on specific characteristics, or helping an individual group their favorite types of music!
To recap, the most significant difference between supervised and unsupervised learning is that supervised learning deals with labeled data while unsupervised learning deals with unlabeled data. In supervised learning, we have machine learning algorithms for classification and regression. In unsupervised learning, we have methods such as clustering. In comparison to supervised learning, unsupervised learning has fewer models and fewer evaluation methods that can be used to ensure that the outcome of the model is accurate. As such, unsupervised learning creates a less controllable environment, as the machine creates outcomes for us.
In this video, you learned that: In supervised machine learning, we use labeled data to train our model.
In unsupervised machine learning, we use unlabeled data to train our model and allow the model to work independently to discover patterns and structures in the data that may not be visible to the human eye. There are two types of supervised learning techniques: classification and regression. And clustering is a grouping of data points or objects that are somehow similar.
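
As a rough sketch of how clustering groups similar data points without any labels, the following uses k-means, one common clustering algorithm. The video does not name a specific algorithm, so k-means is an assumption made for this illustration, as are the made-up points, the choice of two clusters, and the hypothetical helper name `k_means`.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabeled observations, invented for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Start from two of the points as initial centroids (an arbitrary choice).
labels, centroids = k_means(points, [points[0], points[3]])
print(labels)
```

No class labels are supplied anywhere: the algorithm only sees the coordinates and discovers the two groups from how close the points are to each other. The cluster numbers it returns are arbitrary identifiers, not meaningful categories.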