In this lesson, you will learn about medical question answering. We will cover one of the most popular developments in natural language processing, called BERT, and see how it can be applied to answer medical questions. You will also learn about label extraction. In Course 1, you built your chest X-ray classification model from labeled data. In this lesson, you will learn how you can automatically create those labels by extracting mentions of diseases from radiology reports.

So let's say a patient or a doctor wants to know more about a medical diagnosis or a treatment. One way they might learn is to ask a question in natural language and get an answer to that question. This is the task of question answering, an important task in natural language processing. Question answering systems, also called QA systems, are used in search engines like Google and in conversational phone interfaces like Siri.

For many questions that are entered into search engines, the search engine is often able to find the passage of text containing the answer. The challenge is the last step, answer extraction, which is to find the shortest segment of the passage that answers the question. Here, for the question "What is the drug Forxiga used for?", the answer is "reduce blood glucose levels."

Our model will thus take in a user question, called Q, and a passage that contains the answer to the question. This might be a passage returned by a Google search. The model will produce an answer that is extracted from this passage, here from this part of the passage over here.

There have been many recent advances on the question answering task in natural language processing, including recent models called ELMo, BERT, and XLNet. We will look at the BERT model in particular.

The BERT model consists of several layers called transformer blocks. Let's look at the input to the BERT model first. The two inputs are the question and the passage. We've seen how we can input images into a model, but how do we input text? We can break up the question and the passage into tokens, or words. We separate the inputs from the question and from the passage using a special token called the separator token. In reality, BERT further splits words into word pieces and also adds a start token at the beginning, but we can work with this simplification without loss of generality.

Now these inputs pass into the model, where they pass through several transformer blocks and are ultimately transformed into a list of vectors. There is one 768-dimensional vector for each of the words. This is called the word representation of a word. Word representations represent words in a way that captures meaning-related relationships between words: distances between word vectors capture how related the words are, or how often they are used in similar contexts. So words that are unrelated, like 15 and force, are far apart, such that their vectors are far away from each other, while words that are similar are close, such that the distances between their vectors are small.

We can attempt to visualize these word representations by reducing the vectors to two dimensions using methods such as t-SNE, so we can see them graphically. We can see that the distance between 15 and force is large, while the distance between force and military is small. The distance between 15 and words similar to 15, like 30, is very small.
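To make this concrete, here is a minimal sketch of how this kind of extractive question answering could be run in code using the Hugging Face transformers library. This is not the exact model or code from the lesson: the checkpoint name (a public BERT model fine-tuned on SQuAD) and the example passage about Forxiga are assumptions chosen for illustration.

```python
# Minimal extractive question answering sketch with a BERT-style model.
# Assumption: the Hugging Face `transformers` library and a public BERT
# checkpoint fine-tuned on SQuAD; this is not the lesson's exact setup.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

question = "What is the drug Forxiga used for?"
# Example passage (illustrative), as might be returned by a search engine.
passage = (
    "Forxiga (dapagliflozin) is a medicine used together with diet and "
    "exercise to reduce blood glucose levels in adults with type 2 diabetes."
)

# The tokenizer joins the two inputs with special tokens:
# [CLS] question tokens [SEP] passage tokens [SEP]
input_ids = qa.tokenizer(question, passage)["input_ids"]
print(qa.tokenizer.convert_ids_to_tokens(input_ids)[:12])

# The model predicts start and end positions within the passage and
# returns the span it believes best answers the question.
result = qa(question=question, context=passage)
print(result["answer"], result["score"])
```

The printed tokens show the start token and the separator token that mark where the question ends and the passage begins, and the returned answer is the extracted segment of the passage along with a confidence score.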
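Similarly, here is a small sketch of how 768-dimensional word representations could be pulled out of a BERT model and reduced to two dimensions with t-SNE for plotting. The checkpoint name and the handful of example words are illustrative assumptions, and with only a few words the projection is purely for demonstration.

```python
# Sketch: reduce 768-dimensional BERT word representations to 2-D with t-SNE.
# Assumptions: the `bert-base-uncased` checkpoint and this small word list
# are illustrative choices, not the lesson's exact setup.
import numpy as np
import torch
from sklearn.manifold import TSNE
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

words = ["15", "30", "force", "military"]
vectors = []
for word in words:
    inputs = tokenizer(word, return_tensors="pt")  # adds [CLS] ... [SEP]
    with torch.no_grad():
        outputs = model(**inputs)
    # Keep the 768-dimensional vector of the word's first token,
    # skipping the [CLS] token at position 0.
    vectors.append(outputs.last_hidden_state[0, 1].numpy())

# t-SNE projects the vectors down to 2 dimensions so they can be plotted;
# perplexity must be smaller than the number of words.
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(
    np.array(vectors)
)
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```

Note that these representations are contextual: inside a real question answering model, the same word appearing in different sentences gets different vectors, which is part of what makes BERT effective at this task.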