As I mentioned at the beginning of this week, training an NLP model from scratch can be very time-consuming and expensive. For example, training the BERT models with 110 or 340 million parameters from scratch could take multiple days, depending on the CPU or GPU resources you have available. Luckily, there are many pretrained models available that you can simply adapt to your use case and your data set. Let's discuss the concept of pretrained models a bit further. Before I discuss model pretraining and fine-tuning, I want to highlight the difference between built-in algorithms and pretrained models. In course one, you learned how to use built-in algorithms, for example the BlazingText algorithm, to quickly train a model. The built-in algorithm provided all the code required to train the text classifier; you just pointed the algorithm to the prepared training data. This week, you will work with pretrained models. The main difference here is that the model has already been trained on large collections of text data. You will provide specific text data, the product reviews data, to adapt the model to your text domain, and you will also provide your task and model training code, telling the pretrained model to perform a text classification task with the three sentiment classes. Now, let's dive deeper into the concept of model pretraining and fine-tuning. I already mentioned a few times that pretrained NLP models have been trained on large text corpora, such as large book collections or Wikipedia. In this unsupervised learning step, the model builds a vocabulary of tokens from the training data and learns the vector representations. You can also pretrain NLP models on language-specific data. A BERT model trained on the German language is available as GermanBERT, another one trained on the French language is available under the name CamemBERT, and BERTje is trained on the Dutch language.
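To make the vocabulary-building step of pretraining concrete, here is a minimal, word-level sketch in plain Python. It is an illustration only: real BERT and RoBERTa tokenizers learn subword units (WordPiece or byte-level BPE) rather than whole words, and the tiny corpus and `[UNK]` token here are hypothetical.

```python
from collections import Counter

def build_vocabulary(corpus, max_size=10):
    """Toy, word-level sketch of building a token vocabulary from text.

    Real pretraining uses subword tokenizers (WordPiece, byte-level BPE);
    this simplified version just keeps the most frequent whitespace-split
    tokens and maps them to integer ids, reserving id 0 for unknowns.
    """
    counts = Counter(token for line in corpus for token in line.lower().split())
    vocab = {"[UNK]": 0}
    for token, _ in counts.most_common(max_size - 1):
        vocab[token] = len(vocab)
    return vocab

# A hypothetical miniature "corpus" of review-like sentences
corpus = [
    "the product works great",
    "the product stopped working",
    "great value for the price",
]
vocab = build_vocabulary(corpus)

# Encode a new sentence; words outside the vocabulary map to [UNK]
ids = [vocab.get(t, vocab["[UNK]"]) for t in "the price is great".split()]
```

After this step, each token in the training data has an integer id, and the pretraining objective then learns a vector representation for each id.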
There are also many more pretrained models available that focus on specific text domains and use cases, such as PatentBERT for classifying patents, SciBERT, which is trained on scientific text, or ClinicalBERT, which is trained on healthcare text data. You can take those pretrained models and simply adapt them to your specific data set. This process is called fine-tuning. You might be familiar with the concept of transfer learning, which has become popular in computer vision. It is a machine learning technique where a model trained on one task is repurposed for a second, related task. Now, think of fine-tuning as transfer learning in NLP. If you work with English product reviews as your training data, you can use an English language model pretrained, for example, on Wikipedia, and then fine-tune it to the English product reviews. The assumption here is that the majority of the words used in the product reviews have already been learned from the English Wikipedia. As part of the fine-tuning step, you also train the model on your specific NLP task; in the product reviews example, you add a text classifier layer to the pretrained model that classifies the reviews into positive, neutral, and negative sentiment classes. Fine-tuning is generally faster than pretraining, because the model doesn't have to learn millions or billions of vector representations from scratch. Also note that fine-tuning is a supervised learning step, as you fit the model using labeled training data. Now, where can you find pretrained models to get started? Many of the popular machine learning frameworks, such as PyTorch, TensorFlow, and Apache MXNet, have dedicated model hubs, or model zoos, where you can find pretrained models. The open source NLP project Hugging Face also provides an extensive model hub with over 8,000 pretrained NLP models.
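The "add a classifier layer on top of a pretrained model" idea can be sketched with the Hugging Face Transformers library. To keep this runnable offline, the snippet below builds a tiny, randomly initialized RoBERTa with a three-class head from a toy configuration; in real fine-tuning you would instead load pretrained weights, for example with `RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=3)`, and then train on the labeled reviews.

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

# Toy configuration so the sketch runs quickly without downloading
# weights; the sizes here are illustrative, not roberta-base's.
config = RobertaConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=3,  # negative, neutral, positive sentiment classes
)
# RobertaForSequenceClassification = RoBERTa encoder + classification head
model = RobertaForSequenceClassification(config)

# A batch of two "tokenized" reviews (random ids, for illustration only)
input_ids = torch.randint(2, 1000, (2, 12))
with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # one score per class
```

During fine-tuning, both the pretrained encoder weights and the new classification head are updated using the labeled sentiment data, which is what makes this a supervised learning step.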
If you want to deploy pretrained models straight into your AWS account, you can use SageMaker JumpStart to get easy access to pretrained text and vision models. JumpStart works with PyTorch Hub and TensorFlow Hub and lets you deploy supported models in one click into the SageMaker model hosting environment. JumpStart provides access to over 100 pretrained vision models, such as Inception V3, ResNet 18, and many more. JumpStart also lists over 30 pretrained text models from PyTorch Hub and TensorFlow Hub, including a variety of BERT models. In one click, you can deploy the pretrained model in your AWS account, or you can select the model and fine-tune it to your data set. JumpStart also provides a collection of solutions for popular machine learning use cases, such as fraud detection in financial transactions, predictive maintenance, demand forecasting, churn prediction, and more. When you choose a solution, JumpStart shows a description of the solution and a launch button; there is no extra configuration needed. Solutions launch all of the resources necessary to run them, including training and model hosting instances. After launching a solution, JumpStart provides a link to a notebook that you can use to explore the solution's features. If you don't find a suitable model via JumpStart, you can also pull in other pretrained models via custom code. This week, you will work with a pretrained RoBERTa model from the Hugging Face model hub, specifically RoBERTa for sequence classification, which is a pretrained RoBERTa model that comes preconfigured for text classification tasks.
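Once a fine-tuned sequence-classification model produces logits for a review, turning them into a sentiment prediction is a small post-processing step. The snippet below uses hypothetical logits and assumes a (negative, neutral, positive) class order for illustration; the actual order depends on how the labels were encoded during fine-tuning.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits a fine-tuned RoBERTa sequence-classification head
# might return for a single review (one score per sentiment class)
logits = torch.tensor([[-0.8, 0.3, 2.1]])

# Softmax converts the raw scores into class probabilities that sum to 1
probs = F.softmax(logits, dim=-1)

# Assumed class order; in practice this comes from your label encoding
labels = ["negative", "neutral", "positive"]
predicted = labels[int(probs.argmax(dim=-1))]
```

The highest-probability class is the predicted sentiment, and the probabilities themselves can serve as a confidence score for the prediction.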