In this module, you focused on solving one problem, which is often the first challenge in building a custom NLP project: how can you represent text in a numeric format while retaining its meaning? You started with tokenization, which divides text into smaller language units, such as words. You then learned pre-processing, which prepares words for model training. For example, you can use stemming to reduce a word to its root form, and stop-word removal to drop low-information words such as "a".

After that, you advanced to the major topic of this module: text representation. Your goal was to solve two major problems in text representation: how can you turn text into numbers while retaining meaning, and how can you feed those numbers into an ML model? You walked through the three major categories of modern text representation techniques, from basic vectorization to word embeddings to transfer learning. Basic vectorization is a simple but fundamental technique for encoding text as vectors. You explored two major methods: one-hot encoding, which encodes a word as a vector with a one in the position corresponding to that word in the vocabulary and zeros everywhere else, and bag-of-words, which encodes a word by the frequency with which it occurs in a sentence.

You then explored word embeddings, which are an improvement over basic vectorization. Word embeddings represent words in a vector space where the distance between them indicates semantic similarity and difference. In doing so, this technique encodes text into dense, lower-dimensional vectors that convey meaning. Specifically, you learned word2vec, the most popular word-embedding model, in depth. Word2vec includes two different techniques: continuous bag-of-words (CBOW) and skip-gram. Continuous bag-of-words uses context words to predict the center word. For example, what word fills the blank, given surrounding words such as dog, is, a, and person? The other method, skip-gram, does the opposite of CBOW: it uses the center word to predict the context words. For example, given chase, what are the probabilities that other words, such as a, dog, and person, occur in the surrounding context?

You walked through the neural network training of CBOW step by step, starting with the input layer, which uses one-hot encoding to represent each context word. You used the embedding matrix E to embed the input layer, then summed the vectors to get the hidden layer H. You multiplied H by another embedding matrix, E′, and fed the result to a softmax function to get the predicted probability vector, ŷ. You compared the output vector with the actual result and used backpropagation to adjust the weights in the embedding matrices E and E′. You iterated this process until the difference between the predicted result and the actual result reached a minimum. At that point, you have the optimal E, the word2vec embedding matrix.

Although word embeddings were a breakthrough in using neural networks to learn text representations, they are expensive to train. The best practice in current NLP is to use transfer learning, which relies on embeddings pre-trained on a massive corpus and then fine-tunes them for specific tasks. You practiced how to use transfer learning and reusable embeddings from TF Hub in the hands-on practice. We hope you now have a comprehensive understanding of the different techniques for representing text in NLP. In the next module, you'll feed text to different NLP models and train them to accomplish various tasks. See you soon.
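
To make the tokenization and pre-processing steps above concrete, here is a minimal sketch using NLTK. The library choice, the sample sentence, and the resource downloads are illustrative assumptions, not the module's exact lab setup.

```python
# Minimal pre-processing sketch (assumes NLTK is installed: pip install nltk).
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# One-time resource downloads (tokenizer model and stop-word list).
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

text = "A dog is chasing a person"               # illustrative sentence
tokens = word_tokenize(text.lower())             # tokenization: split into word units
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t not in stop_words]   # stop-word removal
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in filtered]              # stemming to root form

print(tokens)    # ['a', 'dog', 'is', 'chasing', 'a', 'person']
print(filtered)  # ['dog', 'chasing', 'person']
print(stems)     # ['dog', 'chase', 'person']
```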
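
Likewise, a short sketch of the two basic vectorization methods, one-hot encoding and bag-of-words; the tiny vocabulary and sentence are made up for illustration.

```python
# Basic vectorization sketch: one-hot encoding and bag-of-words counts.
from collections import Counter

sentence = "a dog is chasing a dog".split()      # illustrative tokens
vocab = sorted(set(sentence))                    # ['a', 'chasing', 'dog', 'is']
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Vector with a 1 at the word's vocabulary position and 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

def bag_of_words(tokens):
    """Vector of word frequencies over the vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

print(one_hot("dog"))          # [0, 0, 1, 0]
print(bag_of_words(sentence))  # [2, 1, 2, 1]
```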
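
The CBOW training walkthrough can also be written out as a few lines of NumPy. This is a minimal sketch of a single training step under assumed toy settings (a five-word vocabulary, embedding dimension 4, one context window), not the exact network from the lecture.

```python
# One CBOW training step with NumPy: E embeds the context, E_prime produces scores.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["a", "dog", "is", "chasing", "person"]   # toy vocabulary
V, d = len(vocab), 4                              # vocabulary size, embedding dimension
E = rng.normal(scale=0.1, size=(V, d))            # input embedding matrix E
E_prime = rng.normal(scale=0.1, size=(d, V))      # output embedding matrix E'

context_ids = [0, 1, 2, 4]    # "a", "dog", "is", "person" (context words)
center_id = 3                 # "chasing" (center word to predict)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Forward pass: one-hot context -> embed with E -> sum -> hidden layer H.
H = E[context_ids].sum(axis=0)                    # shape (d,)
y_hat = softmax(H @ E_prime)                      # predicted probabilities, shape (V,)
loss = -np.log(y_hat[center_id])                  # cross-entropy against the true center word

# Backward pass: gradients of the loss with respect to E' and E.
d_scores = y_hat.copy()
d_scores[center_id] -= 1.0                        # y_hat minus the one-hot target
grad_E_prime = np.outer(H, d_scores)              # shape (d, V)
grad_H = E_prime @ d_scores                       # shape (d,)

lr = 0.1
E_prime -= lr * grad_E_prime
for i in context_ids:                             # H is a sum, so each context row gets grad_H
    E[i] -= lr * grad_H

print(f"loss before update: {loss:.4f}")
# Repeating this step over many (context, center) pairs drives E toward useful word vectors.
```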
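
Finally, for the transfer-learning part, here is a sketch of wrapping a reusable TF Hub text embedding in a Keras model, in the spirit of the hands-on practice. The specific module URL (nnlm-en-dim50), the binary-classification head, and the layer sizes are assumptions for illustration rather than the lab's exact setup.

```python
# Transfer learning sketch: reuse a pre-trained TF Hub text embedding and fine-tune it.
import tensorflow as tf
import tensorflow_hub as hub

# Pre-trained 50-dimensional text embedding; trainable=True allows fine-tuning.
hub_layer = hub.KerasLayer(
    "https://tfhub.dev/google/nnlm-en-dim50/2",
    input_shape=[],            # each example is a single string
    dtype=tf.string,
    trainable=True,
)

model = tf.keras.Sequential([
    hub_layer,                                # string -> 50-dim embedding
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),                 # logit for a binary task (e.g., sentiment)
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# model.fit(train_sentences, train_labels, epochs=5)   # fine-tune on your task's data
model.summary()
```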