Good to see you again. This week, you will learn about transfer learning, which is a new concept in this course. Transfer learning allows you to get better results and speeds up training. You'll also be looking at question answering. Let's dive in.

In Week 3 of Course 4, you're going to cover many different applications of NLP. One thing you are going to look at is question answering: given a question and some context, can you tell what the answer is going to be inside that context? Another thing you're going to cover is transfer learning. For example, after learning some information by training on a specific task, how can you make use of that information and apply it to a different task? You're going to look at BERT, which stands for Bidirectional Encoder Representations from Transformers. You'll see how you can use bidirectionality to improve performance. Then you're going to look at the T5 model. Basically, what this model does, as you can see here, is take several possible inputs. It could be a question, and you get an answer. It could be a review, and you get the rating over here. It's all being fed into one model.

Let's look at question answering. Over here you have context-based question answering, meaning you take in a question and the context, and it tells you where the answer is inside that context; the highlighted part is the answer. Then you have closed-book question answering, which only takes the question and returns the answer without having access to a context, so it comes up with its own answer.

Previously we've seen how innovations in model architecture improve performance, and we've also seen how data preparation can help. But over here, you're going to see that innovations in the way the training is done also improve performance. Specifically, you will see how transfer learning improves performance.

This is the classical training that you're used to seeing. You have a course review, it goes through a model, and let's say you predict the rating. You just predict the rating the same way as you've always been doing. Nothing changed here; this is just an overview of the classical training that you're used to.

Now, in transfer learning, let's look at this example. Let's say that you have movie reviews, you feed them into your model, and you predict a rating. Over here you have the pre-training task, which is on movie reviews. Then, in training, you're going to take the existing model for movie reviews, and you're going to fine-tune it, or train it again, on course reviews, and you'll predict the rating for that review. As you can see over here, instead of initializing the weights from scratch, you start with the weights that you got from the movie reviews, and you use them as the starting point when training on the course reviews. At the end, you do some inference over here, and you do the inference the same way you're used to: you just take the course review, feed it into your model, and get your prediction.

You can also use transfer learning across different tasks. This is another example, where you feed in the ratings and some reviews, and this gives you sentiment classification. Then you can train it on a downstream task like question answering, where you take the initial weights over here and train them on question answering. You ask, "When is Pi Day?" and the model answers "March 14th." Then you can ask the model another question, "When is my birthday?", over here, and it does not know the answer.
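To make the idea of reusing weights concrete, here is a minimal sketch of that pre-train-then-fine-tune loop in plain PyTorch. This is not the course's own code: the tiny RatingClassifier, the random stand-in batches, and all hyperparameters are made-up placeholders, chosen only to show that fine-tuning means running the same training loop again on the new data without re-initializing the model.

```python
import torch
import torch.nn as nn

# Toy stand-ins for real datasets: lists of (token_ids, rating) batches.
# In practice these would come from tokenized movie and course reviews.
def make_fake_batches(num_batches, batch_size=32, seq_len=30,
                      vocab_size=10_000, num_ratings=5):
    return [(torch.randint(0, vocab_size, (batch_size, seq_len)),
             torch.randint(0, num_ratings, (batch_size,)))
            for _ in range(num_batches)]

movie_review_batches = make_fake_batches(50)   # "large" pre-training set
course_review_batches = make_fake_batches(5)   # small fine-tuning set

# A tiny bag-of-embeddings classifier that maps a review to a rating.
class RatingClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, num_ratings=5):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, num_ratings)

    def forward(self, token_ids):
        return self.head(self.embed(token_ids))

def train(model, batches, epochs=2, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, ratings in batches:
            optimizer.zero_grad()
            loss_fn(model(token_ids), ratings).backward()
            optimizer.step()

# Pre-training: learn the weights on the movie-review task.
model = RatingClassifier()
train(model, movie_review_batches)

# Transfer learning: do NOT re-initialize; keep the pre-trained weights
# and fine-tune the same model on the smaller course-review task.
train(model, course_review_batches, lr=1e-4)

# Inference works exactly as before: feed in a course review, get a rating.
some_course_review = torch.randint(0, 10_000, (1, 30))
predicted_rating = model(some_course_review).argmax(dim=-1)
```

The only difference between the two calls to train is that the second one starts from the movie-review weights instead of random ones, which is exactly the point made on the slides.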
But this is just another example of how you can use transfer learning on different tasks.

Now we're going to look at BERT, which makes use of bidirectional context. In this case, you have the sentence "Learning from deeplearning.ai is like watching the sunset with my best friend." Over here, the context is everything that comes before, and let's say you're trying to predict the next word, "deeplearning.ai". Now, when building bidirectional representations, you'll be looking at the context from this side and from this side to predict the word in the middle (you'll find a small code sketch of this at the end of this section). This is one of the main takeaways for bidirectionality.

Now let's look at single-task versus multi-task. Over here you have a single model which takes in a review and then predicts a rating. Over here you have another model which takes in a question and predicts an answer. That is a single task each: one model per task. Now, what you can do with T5 is use the same model to take the review and predict the rating, and to take the question and predict the answer. Instead of having two independent models, you end up having one model.

Let's look at T5. Over here, the main takeaway is that the more data you have, generally the better the performance. For example, the English Wikipedia dataset is around 13 gigabytes, compared to the C4, the Colossal Clean Crawled Corpus, which is about 800 gigabytes and is what T5 was trained on. This is just to give you a sense of how much larger the C4 dataset is compared to English Wikipedia.

What are the desirable goals for transfer learning? First of all, you want to reduce training time, because you already have a pre-trained model; hopefully, once you use transfer learning, you'll get faster convergence. It will also improve predictions, because you'll learn a few things from different tasks that might be helpful for your current predictions on the task you're training on. Finally, you might need less data, because your model has already learned a lot from other tasks. If you have a smaller dataset, then transfer learning might help you.

You now know what you're about to learn. I'm very excited to show you all these new concepts. In the next video, we'll start by exploring transfer learning.
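To see the bidirectional context mentioned above in action, here is a small sketch using the Hugging Face transformers library (an assumption on my part; the course itself does not depend on it, and bert-base-uncased is just one publicly available checkpoint). It masks a word in the middle of the lecture's example sentence and lets BERT fill it in using the words on both sides.

```python
# Assumes: pip install transformers torch
from transformers import pipeline

# A pre-trained BERT checkpoint used through the fill-mask pipeline.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The [MASK] token sits in the middle of the sentence, so the model has to
# use the context on its left AND on its right to guess the missing word.
sentence = ("Learning from deeplearning.ai is like watching the [MASK] "
            "with my best friend.")

for prediction in unmasker(sentence, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```

A purely left-to-right language model could only use "Learning from deeplearning.ai is like watching the" to make this guess; BERT's bidirectional encoder also sees "with my best friend", which is the point of predicting a masked word in the middle rather than the next word at the end.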