- [Morgan] Though we won't dive deep into this topic, this video will highlight what is possible with data lakes on AWS. Machine learning can be used for a wide variety of purposes, and one use case is predictive analytics. For example, one common type of predictive modeling is forecasting, where you analyze historical data to try to predict future outcomes. Let's say you're a large online retailer selling apparel and you want to use machine learning to help drive your inventory levels. Based on the amount of clothing you predict you will sell, you will stock that much inventory. Or maybe you're a company that offers online content streaming and you want to use machine learning for a recommendations engine, where an algorithm suggests new content for the user to consume based on their previous preferences. Beyond the use case, you might be wondering, how does machine learning relate to data lakes? Well, in order to train a machine learning model to do prediction, classification, or other types of machine learning, you need lots and lots of data. It's common to store training data in a data lake and integrate with AWS services to use the data for machine learning purposes. Data lakes help you collect, store, prepare, and categorize the data. And because a data lake uses a schema-on-read model, where structure is applied when the data is read rather than when it is written, it gives machine learning developers and data scientists a great deal of flexibility. Depending on your skillset and your goals, you might be looking at integrating machine learning into your applications in different ways. You have options where the machine learning services are managed and the hard work is done for you entirely. There are machine learning platform services where a lot of the hard work is done, but you still need some deep knowledge of machine learning to get the outcome that you want. And then finally, there's the do-it-yourself option where you do all of the hard work.
If you are a machine learning expert, you can use the data in your data lake to train your own models and use your own algorithms. You will likely be using EC2 instances for your compute power. AWS provides AWS Deep Learning AMIs, or Amazon Machine Images, that make it easier to build deep learning models and to build clusters with machine learning and data lake-optimized GPU instances. If you want to dive deep into machine learning, but you don't want to manage your own environment, check out Amazon SageMaker. SageMaker is a platform service that makes the entire process of building, training, and deploying machine learning models easier by providing everything you need to connect to your training data, select and optimize the best algorithm and framework, and deploy your model on auto-scaling clusters of Amazon EC2 instances. And finally, if you want to use machine learning algorithms, but you don't want to develop and train your own models, you can plug pre-built AI functionality directly into your apps via APIs. AWS provides solution-oriented APIs for computer vision and natural language processing, like the services Amazon Rekognition for image and video classification and Amazon Comprehend for natural language processing. For example, maybe you run a photo hosting website and you want to use machine learning for classification purposes, like identifying objects in photos that get uploaded to your site. The objects that get detected could be used for accessibility, to generate the alt text for the images. This alt text could then be used by screen readers. Amazon Rekognition would be great for this use case, as it provides tags for images or videos based on what it recognizes in them. Those tags can then be used to create the alt text. When integrating machine learning into your applications or using the data in your data lake alongside machine learning, the service or approach that you choose depends on your use case.
It also depends on how much control you want over the entire process. Some services are more convenient to use and require less expertise, but give you less control. Other services require more expertise and more work on your side, so they are less convenient, but they also give you the most control. On the control side of the spectrum, you have Amazon EC2 and the Deep Learning AMIs. This is the do-it-yourself option, and it gives you the ultimate control, but also requires that you have the most knowledge. In the middle is Amazon SageMaker. This allows you to focus on the algorithms that you want to use without really needing to manage the environment. So you get a good amount of control and you also get a good amount of convenience. Then on the convenience side of the spectrum, you have services like Amazon Rekognition and Amazon Comprehend, which are really plug-and-play services. They give you access to pre-trained machine learning models through APIs. So it's super convenient and it requires the least amount of work, but it also doesn't give you a ton of control over the algorithm itself.
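To make the convenience end of the spectrum concrete, here is a minimal Python sketch of the photo hosting example from earlier, where detected labels become alt text. It is an illustration under assumptions, not a reference implementation: the helper names `labels_to_alt_text` and `describe_s3_image`, the 80 percent confidence threshold, the five-label limit, and the alt text wording are all hypothetical choices. The only AWS call used is the boto3 Rekognition client's `detect_labels`, and it assumes the image already sits in an S3 bucket and that credentials are configured.

```python
from typing import Any, Dict


def labels_to_alt_text(
    response: Dict[str, Any],
    min_confidence: float = 80.0,  # assumed threshold, tune for your content
    max_labels: int = 5,           # assumed cap so alt text stays short
) -> str:
    """Turn a DetectLabels-style response into a short alt text string.

    Hypothetical helper: it only reads the "Labels" list and each label's
    "Name" and "Confidence" fields.
    """
    confident = [
        label
        for label in response.get("Labels", [])
        if label.get("Confidence", 0.0) >= min_confidence
    ]
    # Most confident labels first, so the cap drops the weakest ones.
    confident.sort(key=lambda label: label.get("Confidence", 0.0), reverse=True)
    names = [label["Name"].lower() for label in confident[:max_labels]]
    if not names:
        return "Image"  # generic fallback when nothing is detected confidently
    return "Image containing: " + ", ".join(names)


def describe_s3_image(bucket: str, key: str) -> str:
    """Ask Amazon Rekognition to label an image stored in S3, then build alt text."""
    import boto3  # imported here so the helper above works without the AWS SDK

    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=80.0,
    )
    return labels_to_alt_text(response)


if __name__ == "__main__":
    # Hand-written sample in the shape of a DetectLabels response, not real output.
    sample_response = {
        "Labels": [
            {"Name": "Grass", "Confidence": 91.5},
            {"Name": "Dog", "Confidence": 98.2},
            {"Name": "Toy", "Confidence": 62.0},
        ]
    }
    print(labels_to_alt_text(sample_response))
```

Notice that nothing here involves choosing an algorithm or training a model. The application only picks a confidence threshold and decides how to phrase the result, which is the trade-off described above: very little work, but little control over the model itself.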