In the last lesson you explored basic vectorization techniques such as one-hot encoding and bag-of-words. You also discovered the two primary limitations of this approach to representing text: first, the high-dimensional and sparse vectors, and second, the lack of relationships between words. To solve these problems, you'll explore a new approach called word embeddings in this lesson. You'll learn how word embeddings encode text into numbers that convey meaning.

Let's start with intuition. Let me ask you a question: how would you describe a dog? You might mention its breed, age, size, color, owner, and friendliness. You can easily think of at least 10 different dimensions. Another question: how would you describe a person? This is even more interesting. How about their ID, such as a driver's license; physical stats, such as gender, height, and weight; their relationship to you, such as family or friend; social network status, such as the number of followers; sense of humor; sports and hobbies; even the dog this person has. You get the idea. You can again easily think of at least 20 different dimensions to describe a person.

You might now guess the direction. How would you describe a word, then? For example, the word "queen." How would you convey its nature, its origin, and the sense of belonging, excitement, and trust associated with this word using quantitative measurements? You can use dimensions, and in math, a vector space. You can now form an idea: how about representing a word in a vector space, with dimensions that describe its properties? Not only that, you want the distance between words to indicate the similarity between them. For example, "queen" and "king" are close to each other but far from "apple." Additionally, you want this representation to capture analogies between words. For example, the distance between "king" and "queen" is similar to the distance between "man" and "woman." Now you have king minus man plus woman equals queen. Isn't it amazing to play with words the same way you play with numbers?

Word embedding is a technique for encoding text into meaningful vectors. The technique lets you represent text with low-dimensional, dense vectors. You don't need a vector size as big as 20,000 as in one-hot encoding; instead, you have single-digit dimensions for small datasets and up to four-digit dimensions for large datasets. Each dimension is supposed to capture a feature of a word. A higher-dimensional embedding captures more detailed relationships between words; however, it takes more data and resources to train. Additionally, you don't have sparse vectors anymore: the cells of a vector in word embeddings normally contain nonzero values. More importantly, the vectors capture the relationships between words, where similar words have a similar encoding. Word embedding is sometimes called a distributed representation, indicating that the meaning of a word is distributed across dimensions. Even better, instead of specifying the values of the embedding manually, you train a neural network to learn those numbers.

This might sound too good to be true. How do word embeddings do the magic of converting words such as "king," "queen," "man," and "woman" into vectors that convey semantic similarities? Word embedding is an abstract term for a family of concrete algorithms or models, such as the famous Word2vec by Google, GloVe by Stanford, and fastText by Facebook.
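To make the idea of "playing with words like numbers" concrete, here is a minimal sketch in Python with NumPy. The embedding values below are toy numbers invented for illustration, not vectors learned by any real model, and the helper function names are my own. The sketch only shows how cosine similarity places "king" close to "queen" but far from "apple," and how the arithmetic king minus man plus woman lands nearest to "queen."

```python
import numpy as np

# Toy 4-dimensional embeddings. The values are made up for illustration;
# a real model such as Word2vec would learn them from data.
emb = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.78, 0.68, 0.90, 0.04]),
    "man":   np.array([0.10, 0.20, 0.08, 0.30]),
    "woman": np.array([0.09, 0.22, 0.89, 0.31]),
    "apple": np.array([0.02, 0.01, 0.05, 0.95]),
}

def cosine(u, v):
    """Cosine similarity: close to 1.0 for similar directions, near 0 for unrelated."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similar words sit close together; unrelated words sit far apart.
print(cosine(emb["king"], emb["queen"]))   # high (~0.81 with these toy values)
print(cosine(emb["king"], emb["apple"]))   # low  (~0.08 with these toy values)

# Analogy arithmetic: king - man + woman should land near queen.
target = emb["king"] - emb["man"] + emb["woman"]
closest = max(emb, key=lambda w: cosine(emb[w], target))
print(closest)  # "queen"
```

The same two operations, distance for similarity and vector arithmetic for analogies, are what real embedding libraries expose once the vectors have been trained.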
In the next lesson we will explore Word2vec, which is considered a breakthrough in applying neural networks to text representation.