Now that we know what machine learning is and have seen some examples of it, let's talk about how we do machine learning. In this lecture we will get an overview of the main categories of machine learning techniques. After this video you will be able to describe the main categories of machine learning techniques and summarize how supervised learning differs from unsupervised learning.

There are different categories of machine learning techniques for different types of problems. The main categories are listed here. They are classification, regression, cluster analysis, and association analysis. We will cover each one in detail in the following slides.

In classification, the goal is to predict the category of the input data. An example of this is predicting the weather as being sunny, rainy, windy, or cloudy. The input data in this case would be sensor data specifying the temperature, relative humidity, atmospheric pressure, wind speed, wind direction, etc. The target, or what you're trying to predict, would be the different weather categories: sunny, windy, rainy, and cloudy. Another example is to classify a tumor as either benign or malignant. In this case, the classification is referred to as binary classification, since there are only two categories. But you can have many categories as well, as in the weather prediction problem shown here. Another example is to identify handwritten digits as being in one of ten categories, zero to nine.

Some more examples of classification are classifying a tumor from a medical image as being benign or malignant; predicting whether it will rain the next day; determining if a loan application is high-risk, medium-risk, or low-risk; and identifying the sentiment of a tweet or review as being positive, negative, or neutral.

When your model has to predict a numeric value instead of a category, the task becomes a regression problem. An example of regression is to predict the price of a stock.
The stock price is a numeric value, not a category, so this is a regression task instead of a classification task. If you were to predict whether the stock price will rise or fall, then that would be a classification problem. But if you're predicting the actual price of the stock, then that is a regression problem. That is the main difference between classification and regression: in classification, you're predicting a category, and in regression, you're predicting a numeric value.

Some other examples of regression are estimating the demand for a product based on the time or season of the year, predicting a score on a test, determining how effective a drug is likely to be for a particular patient, and predicting the amount of rain for a region.

In cluster analysis, the goal is to organize similar items in your data set into groups. A very common application of cluster analysis is referred to as customer segmentation. This means that you're separating your customer base into different groups or segments based on customer types. For example, it would be very beneficial to segment your customers into seniors, adults, and teenagers. These groups have different likes and dislikes and have different purchasing behaviors. By segmenting your customers into different groups, you can more effectively provide marketing ads targeted to each group's particular interests. Note that cluster analysis is also referred to as clustering.

Some other examples of cluster analysis are identifying areas of similar topography, such as desert regions, grassy areas, mountains, etc.; categorizing different types of tissues from medical images; determining different groups of weather patterns, such as snowy, dry, and monsoon; and discovering hot spots for different types of crime from police reports.

The goal in association analysis is to come up with a set of rules to capture associations between items or events. The rules are used to determine when items or events occur together.
A common application of association analysis is known as market basket analysis, which is used to understand customer purchasing behavior. For example, association analysis can reveal that banking customers who have CDs, or certificates of deposit, also tend to be interested in other investment vehicles such as money market accounts. This information can be used for cross-selling. If you advertise money market accounts to your customers with CDs, they are likely to open such an account.

According to data mining folklore, a supermarket chain used association analysis to discover a connection between two seemingly unrelated products. They discovered that many customers who go to the store late on Sunday night to buy diapers also tend to buy beer. This information was then used to place beer and diapers close together, and they saw a jump in sales of both items. This is the famous diaper-beer connection.

Some other applications of association analysis are recommending similar items based on the purchasing behavior or browsing histories of customers; finding items that are often purchased together, such as garden hoses and potting soil, and offering sales on these related items at the same time to drive sales of both; and identifying web pages that are often accessed together so that you can more efficiently offer up these related web pages at the same time.

We have now looked at the different categories of machine learning techniques. They are classification, regression, cluster analysis, and association analysis. We have also seen some examples of each category.

There is also another categorization of machine learning techniques, and that is supervised versus unsupervised approaches. In supervised approaches, the target, which is what the model is predicting, is provided. This is referred to as having labeled data, because the target is labeled for every sample that you have in your data set.
Referring back to our example of predicting a weather category of sunny, windy, rainy, or cloudy, every sample in the data set is labeled as being one of these four categories. So the data is labeled, and predicting the weather categories is a supervised task. In general, classification and regression are supervised approaches.

In unsupervised approaches, on the other hand, the target that the model is predicting is unknown or unavailable. This means that you have unlabeled data. Going back to our cluster analysis example of segmenting customers into different groups, the samples in your data are not labeled with the correct group. Instead, the segmentation is performed using a clustering technique to group items based on characteristics that they have in common. Thus, the data is unlabeled, and the task of grouping customers into different segments is an unsupervised one. In general, cluster analysis and association analysis are unsupervised approaches.

In summary, in this lecture we looked at the different categories of machine learning techniques. We discussed classification, regression, cluster analysis, and association analysis. We also defined what supervised and unsupervised approaches are in machine learning.
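
To make the supervised versus unsupervised distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the lecture does not name any specific algorithm, so the use of 1-nearest-neighbor for the supervised tasks and basic k-means for clustering is a choice made for this example. The tiny data sets and the function names `nearest_neighbor_predict` and `kmeans_1d` are likewise hypothetical.

```python
# Illustrative sketch only: the toy data, the function names, and the choice of
# 1-nearest-neighbor and k-means are assumptions, not taken from the lecture.
from statistics import mean


def nearest_neighbor_predict(samples, targets, query):
    """Supervised: return the target of the labeled sample closest to `query`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best_index = min(range(len(samples)),
                     key=lambda i: squared_distance(samples[i], query))
    return targets[best_index]


def kmeans_1d(values, initial_centers, iterations=10):
    """Unsupervised: group unlabeled numbers around k centers (basic k-means)."""
    centers = list(initial_centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assign every value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Labeled data: (temperature in degrees Celsius, relative humidity in %).
weather_samples = [(30, 20), (28, 25), (18, 90), (16, 95)]

# Classification: the provided target is a category.
weather_labels = ["sunny", "sunny", "rainy", "rainy"]
predicted_weather = nearest_neighbor_predict(weather_samples, weather_labels, (17, 88))

# Regression: the provided target is a numeric value (rainfall in mm).
rainfall_mm = [0.0, 0.0, 12.0, 15.0]
predicted_rainfall = nearest_neighbor_predict(weather_samples, rainfall_mm, (17, 88))

# Clustering: customer ages with no segment label provided.
customer_ages = [15, 17, 16, 35, 40, 38, 70, 72, 68]
centers, segments = kmeans_1d(customer_ages, initial_centers=[10, 40, 80])

print(predicted_weather)   # a category
print(predicted_rainfall)  # a number
print(segments)            # groups discovered without any labels
```

Notice that the two supervised calls are handed the targets (`weather_labels` or `rainfall_mm`) along with the samples, while the clustering call receives only the ages and has to discover the groups on its own. The only difference between the classification call and the regression call is whether the target is a category or a number.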