Hi, everyone. Welcome to the class. This video is an introduction to machine learning. Before we talk about machine learning, let's talk about a few related buzzwords. You might have heard about data science, and some of you might be taking other courses in data science. Surely you have heard about machine learning, because this course is going to be about machine learning. Many of you may have heard about artificial intelligence, which is another buzzword along with machine learning and data science these days. Maybe you have also heard about deep learning, which is another hot word. Let's briefly talk about what these are. Data science is a really big interdisciplinary field about data. You can think of it as anything to do with data, including data pipelining, data collection, data munging and cleaning, and data analysis. The analysis may be manual, such as simple checks or exploratory data analysis, or it can use machine learning techniques to analyze the data. Since data science covers such a big spectrum, it is oftentimes divided into soft and hard data science. Soft data science means the techniques that don't require a lot of software engineering skills or a lot of math skills. Things like data visualization, reporting, and dashboards, as well as simple data analysis, fall into this category. Hard data science, on the other hand, involves more mathematical, more technical skills, such as analyzing data or building systems using machine learning. Also, data science can deal with data that is small enough to fit into an Excel file or something like that, or with big data that sits in a big data warehouse. In industry, the job description looks like this. Data scientists' jobs tend to vary. They can do data collection, cleaning, and munging, or prepare data for whatever the company needs.
Or they can build machine learning models and test them on those data to build a system. They can also do visualization and so on. Usually data scientists have diverse backgrounds, and the job requires interdisciplinary knowledge. Machine learning. We mentioned machine learning several times while talking about data science. Machine learning is part of data science, and it is also a subfield of artificial intelligence. It focuses on learning algorithms, building models, and training them on data. Machine learning consists of different types of learning, such as supervised learning, unsupervised learning, and reinforcement learning. Many machine learning models come from statistical learning, so machine learning extends statistical learning with more complex and more efficient algorithms that deal with more complex and bigger data. In industry, machine learning engineers develop and test machine learning models, design machine learning experiments, and build machine learning systems. Artificial intelligence has a long history in CS, and it is about problem-solving with intelligence. That means an AI agent, whether or not it has a learning component, will make an optimal decision according to its algorithm to maximize its goal in response to its environment. You might think that the AI field is very practical because you're seeing a lot of applications these days. However, AI also has a lot of theoretical components in it. In industry, AI engineers and experts are more or less similar to ML engineers. They have broad skill sets, including math and programming skills as well as machine learning, and they work on building AI systems, building machine learning models, natural language processing, robotics, computer vision, and so on. Deep learning focuses on neural network models: building neural network models and training them on data.
It also deals with a lot of optimization algorithms and training techniques needed to train complex neural network models. It is very suitable for complex data such as images, text, voice, graphs, and hybrid types of data. It is a subfield of artificial intelligence and a subfield of machine learning. In industry, deep learning engineers work on machine learning problems that deal with complex data such as images and text, or on high-performance computing. We're going to show a summary diagram. There's data science, which is a big interdisciplinary field, and there's AI, also a very big field. Data science is about anything to do with data, including data analysis, whereas artificial intelligence is about solving problems using intelligent algorithms. In the intersection, when the AI algorithm is learning from data, it is called machine learning, and in particular, if it deals with complex data using a neural network architecture, it is called deep learning. Here's the Google Trends chart over time for machine learning and software engineering, just to compare how popular machine learning has become in recent years. The term machine learning has been around for a long time. However, it became much more popular during the last five or more years. This graph also indicates that jobs in machine learning have grown a lot, about 350 percent during the past few years. As you can see, machine learning is a top skill in jobs that involve AI skills. That already sounds like ML is very cool. Let's talk about what ML can do. Machine learning is applied everywhere these days. For example, when you do online shopping, you often see product recommendations based on your browsing and shopping history. Those use machine learning algorithms to predict the products that are more likely to be purchased by the customers.
Same goes for movie recommendations and music recommendations. Sentiment analysis is a very popular application these days. It is standard by now for data scientists to analyze text, such as news articles and social media posts, to figure out citizens' sentiment on political events. Similarly, product review scores or restaurant review scores can be predicted by machine learning algorithms, and this information can be important for businesses. Machine learning is also used a lot in the financial industry. For example, we can forecast a stock price using machine learning, and machine learning is also used for algorithmic trading as well as robo-advisors, which give people advice on how to allocate their assets. It can also be used for forecasting housing prices. As you can imagine, machine learning can also be useful in the medical industry. By applying machine learning models to images or tables, you can help doctors make a medical diagnosis or medical decisions. It is also used in many science disciplines, such as bioscience. For example, machine learning techniques can be applied to this graph data to inspect protein interactions, or to this type of data for the study of genetics. Machine learning is also a key component in the Internet of Things. Smart sensors and smart devices produce a lot of data, and machine learning plays a key role in analyzing those data. Machine learning is also used in self-driving cars. Self-driving cars can use machine learning and deep learning to recognize images and make good decisions. Let's talk about what we will learn in this course. Here is the data science project life cycle. Data should be collected and pipelined into a data warehouse, there is data governance that the data warehouse has to implement, and there will also be data pooling, cleaning, and maintenance. That part is mostly called data engineering.
Data science, on the other hand, focuses on using and analyzing the data that were prepared by the data engineering process. It can include selecting data from the warehouse, then cleaning the data, exploratory data analysis, and data preprocessing, which means preparing the data for the model to consume. After that, data scientists build models and do the model training, and there is a result. Depending on the result, they have to go back to building models, or if the result is really strange, they have to collect more data or select more data and do this cycle again. Among the steps that we mentioned in the data science project cycle, we're going to talk about a few things, and we don't cover everything. For example, this course is not about data collection and pipelining. We'll talk a little bit about data cleaning, EDA, and data preprocessing, but the main focus will be how to build a model, how to select models, and how to do the training and the testing and analyze the results. Let's talk about what learning is. When children learn the alphabet, they learn to generalize. For example, they can recognize this letter whether it is small or large, in a different font or a different color, tilted in the image, or inside a word. It seems so obvious for people. However, making a machine learn this is not trivial. Here is an example of supervised learning. There are images and labels. The labels are the names of these animals, and a supervised learning model learns to predict the label given the data. Unsupervised learning actually closely resembles how humans learn. At the baby stage, they don't know about geometric shapes and colors. But over time, they learn to recognize these visual properties and also recognize the similarities and dissimilarities between them, even before they learn the names of these colors and shapes.
This type of learning without labels is called unsupervised learning. Unsupervised learning is about learning underlying features, extracting information, recognizing patterns in the data, or clustering similar data points. Another type of learning is reinforcement learning, in which the AI agent learns how to act from experience. The experience is either reward or punishment. It is very similar to animal training: we give treats or punishment to make the animal behave in the desired way. Reinforcement learning is used a lot in AI and robotics. Let's change gears. Now I'll talk about some definitions that frequently show up in machine learning. Data in machine learning can take any form, such as tables, images, text, sound, and graphs. It can be any format. However, we're going to talk mostly about the tabular data format. Let's take this table as an example. Let's say the supervised learning task is to predict a house price. This column will be our labels. The labels are also called targets. All the columns except the labels are called the features. They are also called predictors, which means that they are used in machine learning models to predict the labels. The rows of the table are called observations or samples, which are the instances of data. Here are a few examples of machine learning tasks. Prediction includes classification and regression, and these are in supervised learning. Clustering, which groups similar data points together, anomaly detection, and dimensionality reduction are in the category of unsupervised learning. There are other machine learning tasks, such as data generation and feature selection, which are typically not categorized as supervised learning or unsupervised learning. However, these types of machine learning tasks can be used to enhance the performance of supervised learning tasks. Let's talk about prediction tasks in supervised learning.
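As a brief aside before the prediction tasks, the table vocabulary above (features, target, observations) can be sketched in a few lines of Python. This is only an illustration under assumptions: the column names (`sqft`, `bedrooms`, `age_years`, `price`), the values, and the helper `split_features_target` are all made up here and do not come from the table shown in the lecture.

```python
# Hypothetical toy table: each dict is one observation (a row, or sample).
houses = [
    {"sqft": 1400, "bedrooms": 3, "age_years": 20, "price": 310000},
    {"sqft": 2100, "bedrooms": 4, "age_years": 5, "price": 520000},
    {"sqft": 900, "bedrooms": 2, "age_years": 45, "price": 185000},
]

TARGET = "price"  # the label (target) column we want to predict


def split_features_target(rows, target):
    """Return (X, y): the feature columns of each row, and the target values."""
    X = [{name: value for name, value in row.items() if name != target}
         for row in rows]
    y = [row[target] for row in rows]
    return X, y


X, y = split_features_target(houses, TARGET)
feature_names = sorted(X[0])  # every column except the target is a feature

print("features:", feature_names)
print("targets:", y)
print("number of observations:", len(X))
```

Here `X` holds the features (predictors) and `y` holds the labels (targets), which matches the x and y naming commonly used for supervised learning.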
A prediction task can be either classification or regression, depending on the label's data type. When the label is a categorical variable, which means values like zero or one, or 1, 2, 3, or a, b, c, it becomes classification. If the categories are binary, it is called binary classification. If there are multiple categories, we call it multiclass classification. On the other hand, if the label is a real-valued variable, something like 0.1, 0.999, or 3.4, then predicting those labels given the data is called a regression problem. With that in mind, let's briefly talk about how supervised learning works. Here's the data. The data consist of features and a target. The features we usually call x, and the target is y. Then we have a model, and the features are the input to the model. The model might have parameters inside, or hyperparameters, which are settings the user gets to choose. After the features are fed into the model, the model makes a prediction. Initially, the model doesn't predict very well, so there will be some error between the prediction and the target variable. This error can be used to tweak the model so that it makes a better prediction in the next iteration, and over these iterations, the model becomes more accurate. That's how supervised learning works. Here is a brief taxonomy of supervised learning models. Models that have internal parameters are called parametric models. Without explaining what they are individually, these are examples of parametric models. Non-parametric models, by contrast, don't have internal parameters. These are examples of non-parametric models. Although we are not going to talk about every single model here, we'll talk about most of these models in this class.
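The supervised learning loop described above (feed features in, predict, measure the error against the target, adjust the model, repeat) can be sketched in plain Python. This is a minimal illustration, not the method used in the course: it assumes a one-feature linear model trained with gradient descent on mean squared error, and the data, the function names, and the hyperparameter values (`learning_rate`, `iterations`) are all hypothetical.

```python
# Hypothetical data: one feature x and a real-valued target y (a regression task).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1


def predict(w, b, x):
    """Parametric model with two internal parameters: weight w and bias b."""
    return w * x + b


def mean_squared_error(w, b, xs, ys):
    """Average squared difference between predictions and targets."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.05, iterations=2000):
    """Gradient descent; learning_rate and iterations are hyperparameters."""
    w, b = 0.0, 0.0  # the model starts out predicting poorly
    n = len(xs)
    history = []
    for _ in range(iterations):
        history.append(mean_squared_error(w, b, xs, ys))
        # Error between the prediction and the target for each observation.
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Use the error to adjust the parameters for the next iteration.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history


w, b, history = train(xs, ys)
print("learned w and b:", round(w, 3), round(b, 3))
print("error at first and last iteration:", history[0], history[-1])
```

The parameters `w` and `b` are learned from the data, while `learning_rate` and `iterations` are chosen by the user, which is the parameter versus hyperparameter distinction made above.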