Hello everyone. Welcome to the introductory video of Amazon SageMaker. My name is Fan Li and I'm the senior product manager on the Amazon SageMaker team. Amazon SageMaker is a fully managed service that helps data scientists and developers build, train, and deploy machine learning models quickly and easily. It has three major components: a hosted notebook instance, a distributed on-demand training environment, and a model hosting environment that is elastic, scalable, secure, and reliable. Although using all the components end-to-end brings you a seamless experience on Amazon SageMaker, you have the flexibility to use any combination of those components in order to fit your own workflow.

Amazon SageMaker provides hosted Jupyter notebooks that require no setup. With a few clicks on the Amazon SageMaker console or through APIs, you can create a fully managed notebook instance, which comes with preloaded data science packages such as popular Python libraries, deep learning frameworks, Apache Spark, and so on. You can then start processing your datasets and developing your algorithms immediately. If you want to use extra packages that are not preloaded, you can simply pip install or conda install them, and they will be persisted in that notebook instance.

Although you can certainly run training jobs in the hosted notebook instance, many times you want to get access to more compute capacity, especially for large datasets. In that case, you simply select the type and quantity of Amazon EC2 instances you need and kick off a training job. Amazon SageMaker then sets up the compute cluster, performs the training job, and tears down the cluster when the training job is finished. So you only pay for what you use and never worry about the underlying infrastructure.

In Amazon SageMaker, model training is flexible. You can certainly bring arbitrary algorithms, either open source or developed by yourself, in the form of Docker images. Amazon SageMaker also offers a range of built-in, high-performance machine learning algorithms that have been optimized for distributed training, making them highly effective for training models against large datasets. For those who want to train your own neural networks, Amazon SageMaker makes it super easy to directly submit your TensorFlow or Apache MXNet scripts for distributed training, and you can use alternative deep learning frameworks as well by packaging your own Docker images and bringing them to Amazon SageMaker.

When you are ready to deploy a model to production, you can simply indicate the compute resource requirements for hosting the model and deploy it with just one click. An HTTPS endpoint will then be created to achieve low-latency, high-throughput inferences. With Amazon SageMaker, you can swap the model behind an endpoint without any downtime, or even put multiple models behind an endpoint for the purpose of A/B testing.

Next, I'm going to show you a quick demo. In this demo, I will create a notebook instance from the Amazon SageMaker console, build my workflow in the notebook instance to train a simple classification model, and then deploy that model so that I can make inferences against it. Please remember, you can always use the Amazon SageMaker console or APIs to build your workflows, if you prefer to do that.

Now, I'm on the Amazon SageMaker console and I am creating a notebook instance. I will give it a name, for example, my-notebook-instance-1. As you can see, I can pick the type of the notebook instance here.
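(As an aside, for those who prefer the APIs over the console, the same notebook instance could be created programmatically. Below is a minimal sketch using boto3; the instance name and role ARN are placeholders, not values from this demo.)

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder name and role ARN -- substitute your own values.
sm.create_notebook_instance(
    NotebookInstanceName="my-notebook-instance-1",
    InstanceType="ml.t2.medium",
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerExecutionRole",
)

# The instance moves from "Pending" to "InService"; check its status before opening it.
status = sm.describe_notebook_instance(
    NotebookInstanceName="my-notebook-instance-1"
)["NotebookInstanceStatus"]
print(status)
```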
Since I only plan to use the notebook instance as my development environment and rely on the on-demand training environment to execute the heavy-lifting training jobs for me, I just pick the smallest instance, which is ml.t2.medium. I'm also granting permissions to the notebook instance through an IAM role so that I can access the necessary AWS resources from my notebook instance without the need to provide my AWS credentials. If you don't have an IAM role in place, Amazon SageMaker will automatically create a role for you with your permission. For those who want to access resources in your VPCs, you can specify which VPC you want the notebook instance to be able to connect to. You can also secure your data in the notebook instance by leveraging KMS encryption. I will go ahead, hit the Create Notebook Instance button, and this notebook instance will be created and automatically started.

Now that my notebook instance is up and running, I will just click on the "Open" button on the right, which brings me to the Jupyter notebook dashboard. For those who are not familiar with Jupyter notebooks, it is an open-source web application that allows users to author and execute code interactively. It is very widely used by the data scientist community. From the Jupyter notebook dashboard, I can see a list of pre-populated example notebooks showing me how to use Amazon SageMaker to build all kinds of machine learning solutions. I can easily make my own version based on one of them. These example notebooks are developed by subject matter experts across Amazon and we will continue adding more examples over time.

Let's go through one of the example notebooks here, which uses the XGBoost implementation of the boosted trees algorithm to build a direct marketing model. XGBoost is an extremely popular open-source package for gradient boosted trees which is widely used in building classification models. Amazon SageMaker offers XGBoost as a built-in algorithm so that customers can access it more easily. In this notebook, we will build a model to predict if a customer will enroll in a term deposit at a bank after one or more outreach phone calls. We will use a public dataset published by UC Irvine which contains information about historical customer outreach and whether customers have subscribed to the term deposit offered by a bank in Europe. We will first set up some variables that are being used in this notebook and download the dataset from the internet. After that, we will perform some data exploration and transformation to prepare the data for training. We will then kick off a training job in the Amazon SageMaker training environment. Last but not least, after the training is done, we will deploy the model to Amazon SageMaker hosting and make inferences against the generated HTTPS endpoint.

To start, let's set up some variables such as the IAM role that Amazon SageMaker can use during training and hosting. Depending on your preference, it can be the same or different from the role that you have passed to the notebook instance. I'm also setting up the S3 bucket in my account which I am using to store the dataset as well as the model artifacts that will be written by Amazon SageMaker once the training job is finished. I'm also importing some Python libraries to be used in this notebook. Next, I'm going to download the dataset from UC Irvine's Machine Learning Data Repository and take a look at a sample of it.
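(A rough sketch of what these setup and download steps might look like in the notebook is shown below; the bucket name, prefix, and dataset URL are illustrative assumptions rather than values taken from the demo.)

```python
import urllib.request
import zipfile

import pandas as pd
import sagemaker

# Placeholder bucket and prefix -- substitute the S3 location you want to use.
bucket = "my-sagemaker-bucket"
prefix = "xgboost-direct-marketing"

# IAM role that SageMaker will assume for training and hosting; here I reuse
# the role attached to the notebook instance.
role = sagemaker.get_execution_role()

# Download the UCI "Bank Marketing" dataset (URL shown for illustration --
# verify the current location in the UCI repository before running).
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip"
urllib.request.urlretrieve(url, "bank-additional.zip")
with zipfile.ZipFile("bank-additional.zip") as z:
    z.extractall(".")

# Load the full dataset and peek at a few records.
data = pd.read_csv("./bank-additional/bank-additional-full.csv", sep=";")
print(data.shape)   # roughly 41,000 rows: 20 features plus the label column
data.head()
```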
As you can see here, there are over 40,000 customer records and 20 features associated with each customer, including age, marital status, education level, number of contacts performed, and external environmental factors such as the consumer confidence index. The last column of this dataset is what we call the label. It tells us whether or not a customer has subscribed to the term deposit, which will be our inference or prediction target for new data points as well. I then go ahead with some data cleansing and transformations. I won't go into the details here, but the goal is to better prepare the dataset so as to generate a better model. As a common practice, after I have completed the data exploration and transformation, I split the dataset into training data, validation data, and test data. The training data and validation data will be used in the training process. I am going to use the test data to evaluate model performance after it is deployed. I then upload the datasets to my S3 bucket and move on to the training step.

Creating your training job in Amazon SageMaker is pretty straightforward. This cell contains the parameters I need to set up, such as the IAM role, the training image, which is provided and managed by Amazon SageMaker since I am using the built-in XGBoost algorithm, and the compute resources needed to run the training job. In this case, I'm running the job on two ml.c4.2xlarge instances. The input data config specifies where the training dataset is, while the output config specifies where the training result, which we call model artifacts, should be written to. I'm also setting up a bunch of hyperparameters, which are algorithm specific, and a stopping condition to avoid endless training job execution. After all the parameters are set, I initiate an Amazon SageMaker client and call the create training job API to kick off the training job. For this particular training job, it will just take a few minutes. As you can see here, the training job has been completed and the model artifacts have been written to my S3 bucket in the back end.

I'm now ready to deploy it into production. I will first create a model in Amazon SageMaker hosting by specifying the model artifacts location as well as the inference image which contains the inference code. Again, since I use the built-in XGBoost algorithm, the inference image here is provided and managed by Amazon SageMaker as well. If you were to bring your own model to hosting, you would need to provide your own inference image here. I will then create an endpoint, but before that, I need to set up an endpoint configuration first. This is to specify how many models I'm going to put behind an endpoint and the compute resources I need for each of the models. In this demo, I'm only putting one model behind the endpoint, which is the one I just trained, and I am using one ml.c4.xlarge instance to host that model. After that, I just call the create endpoint API to create the endpoint.

Now, the endpoint is created and I can make inferences against it in real time. What I'm doing here is taking the test data I prepared during the data preparation step, making an inference for each single record in that dataset, and comparing the results with the actual labels. As you can see here, I managed to get an error rate which is not bad, although I may need more iterations to improve it.

That is all I have for this introductory video. I hope you enjoyed learning about Amazon SageMaker. I'm Fan Li and thank you for watching.
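(For readers who want to see the training, deployment, and inference steps from this demo end to end in code, here is a condensed sketch against the low-level boto3 APIs. The bucket, prefix, role, hyperparameter values, XGBoost container version, and sample payload are illustrative assumptions; they are not the exact values used in the demo notebook.)

```python
import time

import boto3
from sagemaker import image_uris

# Assumes `bucket`, `prefix`, and `role` were set up earlier in the notebook, and that
# the prepared train/validation CSVs were uploaded under s3://{bucket}/{prefix}/train
# and s3://{bucket}/{prefix}/validation.
region = boto3.Session().region_name
sm = boto3.client("sagemaker")

# Region-specific image for the built-in XGBoost algorithm (version is illustrative).
container = image_uris.retrieve("xgboost", region, version="1.5-1")

job_name = "xgboost-direct-marketing-" + time.strftime("%Y-%m-%d-%H-%M-%S")

def s3_channel(channel):
    # Helper building one input channel definition pointing at an S3 prefix.
    return {
        "ChannelName": channel,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/{prefix}/{channel}",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": "text/csv",
    }

# Kick off the training job on two ml.c4.2xlarge instances.
sm.create_training_job(
    TrainingJobName=job_name,
    AlgorithmSpecification={"TrainingImage": container, "TrainingInputMode": "File"},
    RoleArn=role,
    InputDataConfig=[s3_channel("train"), s3_channel("validation")],
    OutputDataConfig={"S3OutputPath": f"s3://{bucket}/{prefix}/output"},
    ResourceConfig={"InstanceType": "ml.c4.2xlarge", "InstanceCount": 2, "VolumeSizeInGB": 10},
    HyperParameters={"objective": "binary:logistic", "num_round": "100"},  # values must be strings
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
sm.get_waiter("training_job_completed_or_stopped").wait(TrainingJobName=job_name)

# Register the resulting model artifacts with SageMaker hosting.
artifacts = sm.describe_training_job(TrainingJobName=job_name)["ModelArtifacts"]["S3ModelArtifacts"]
sm.create_model(
    ModelName=job_name,
    ExecutionRoleArn=role,
    PrimaryContainer={"Image": container, "ModelDataUrl": artifacts},
)

# One model on one ml.c4.xlarge instance behind the endpoint, as in the demo.
sm.create_endpoint_config(
    EndpointConfigName=job_name,
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": job_name,
        "InstanceType": "ml.c4.xlarge",
        "InitialInstanceCount": 1,
    }],
)
sm.create_endpoint(EndpointName=job_name, EndpointConfigName=job_name)
sm.get_waiter("endpoint_in_service").wait(EndpointName=job_name)

# Real-time inference: send one CSV row of features, read back a prediction score.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=job_name,
    ContentType="text/csv",
    Body="56,1,0,...",  # placeholder feature row; use a row from the prepared test set
)
print(response["Body"].read().decode())
```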