Welcome. Last week, you learned how to classify tweets using logistic regression. This week, we'll solve the same problem using a method called Naive Bayes. It's a very good quick-and-dirty baseline for many text classification tasks, and the concepts you learn here will be used later throughout the specialization. Naive Bayes is an example of supervised machine learning, and it shares many similarities with the logistic regression method you used in the previous assignment. It's called naive because this method makes the assumption that the features you're using for classification are all independent, which in reality is rarely the case. As you will see, however, it still works nicely as a simple method for sentiment analysis.

So as before, you will begin with two corpora, one for the positive tweets and one for the negative tweets. You need to extract the vocabulary, that is, all the different words that appear in your corpus, along with their counts. You count each occurrence of every word in the positive corpus, then do the same for the negative corpus, just like you did before. Then you're going to get a total count of all the words in your positive corpus and do the same again for your negative corpus. That is, you're just summing over the rows of this table. For positive tweets, there's a total of 13 words, and for negative tweets, a total of 12 words. This is the first new step for Naive Bayes, and it's very important because it allows you to compute the conditional probabilities of each word given the class, as you're about to see.

Now divide the frequency of each word in a class by the corresponding total count of words in that class. So for the word I, the conditional probability for the positive class would be 3 over 13. You store that value in a new table with the corresponding value, 0.24. For the word I in the negative class, you get 3 over 12, so you store that in your new table with the corresponding value, 0.25. Apply the same procedure to each word in your vocabulary to complete the table of conditional probabilities. One key property of this table is that if you sum over all the probabilities for each class, you'll get one.

Let's investigate this table further to see what these numbers mean. First, note how many words have a nearly identical conditional probability in both classes, like I, am, learning, and NLP. The interesting thing is that words that are equally probable in both classes don't add anything to the sentiment. In contrast to these neutral words, look at some of the other words, like happy, sad, and not. They have a significant difference between their probabilities. These are your power words, tending to express one sentiment or the other, and they carry a lot of weight in determining your tweet's sentiment.

Now let's take a look at because. As you can see, it only appears in the positive corpus, so its conditional probability for the negative class is zero. When this happens, you have no way of comparing the two corpora, which will become a problem for your calculations. To avoid this, you will smooth your probability function.
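To make these two steps concrete, here is a minimal sketch in Python of how you might build the word counts and the table of conditional probabilities. The toy corpora and the simple lowercase, whitespace tokenizer are made-up stand-ins for the lecture's example, so the exact counts and probabilities you get may not match the table on the slides.

```python
from collections import Counter

# Made-up toy corpora standing in for the lecture's positive and negative tweet sets.
positive_tweets = ["I am happy because I am learning NLP", "I am happy, not sad"]
negative_tweets = ["I am sad, I am not learning NLP", "I am sad, not happy"]

def word_counts(tweets):
    """Count every word occurrence in a corpus (naive lowercase, whitespace tokenization)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().replace(",", "").split())
    return counts

pos_counts = word_counts(positive_tweets)   # frequency of each word in the positive class
neg_counts = word_counts(negative_tweets)   # frequency of each word in the negative class

# Total number of words in each class: the sum over the rows of the table.
n_pos = sum(pos_counts.values())
n_neg = sum(neg_counts.values())

# Conditional probability of each word given the class:
#   P(w | pos) = freq(w, pos) / n_pos   and   P(w | neg) = freq(w, neg) / n_neg
vocabulary = sorted(set(pos_counts) | set(neg_counts))
p_w_pos = {w: pos_counts[w] / n_pos for w in vocabulary}
p_w_neg = {w: neg_counts[w] / n_neg for w in vocabulary}

for w in vocabulary:
    print(f"{w:10s}  P(w|pos) = {p_w_pos[w]:.2f}   P(w|neg) = {p_w_neg[w]:.2f}")

# "because" never appears in the negative toy corpus, so P(because|neg) = 0 here:
# this is the zero-probability problem that smoothing (covered later) fixes.
```

Summing p_w_pos.values() or p_w_neg.values() gives one, which is the key property of the table mentioned above.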
Say you get a new tweet from one of your friends, and the tweet says, "I'm happy today, I'm learning." You want to use the table of probabilities to predict the sentiment of the whole tweet. This expression is called the Naive Bayes inference condition rule for binary classification. It says that you're going to take the product, across all the words in your tweet, of the probability of each word in the positive class divided by its probability in the negative class.

Let's calculate this product for the tweet. For each word, select its probabilities from the table. So for I, you get a positive probability of 0.2 and a negative probability of 0.2, so the ratio that goes into the product is just 0.2 over 0.2. For am, you also get 0.2 over 0.2. For happy, you get 0.14 over 0.10. For today, you don't find anything in the table, meaning this word is not in your vocabulary, so you won't include any term for it in the score. For the second occurrence of I, you again have 0.2 over 0.2. For the second occurrence of am, you have 0.2 over 0.2, and learning gets 0.10 over 0.10. Now note that all the neutral words in the tweet, like I and am, just cancel out in the expression. What you end up with is 0.14 over 0.10, which is equal to 1.4. This value is higher than one, which means that overall, the words in the tweet are more likely to correspond to a positive sentiment, so you conclude that the tweet is positive.

So far, you've created a table to store the conditional probabilities of the words in your vocabulary and applied the Naive Bayes inference condition rule for the binary classification of a tweet. Great. Next, you'll look into some issues with this implementation and how to fix them. You have now seen how Naive Bayes can be used to classify the sentiment of a tweet. In the next video, we will simplify the calculations before implementing it.
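To see the inference rule in code, here is a minimal sketch of that product of ratios. The small probability table is hypothetical, filled in with only the rounded values quoted above for the words in this example tweet, and any word outside it, like today, is simply skipped.

```python
# Rounded conditional probabilities quoted in the walkthrough above;
# this small table is only a stand-in for a full vocabulary table.
p_w_pos = {"i": 0.20, "am": 0.20, "happy": 0.14, "learning": 0.10}
p_w_neg = {"i": 0.20, "am": 0.20, "happy": 0.10, "learning": 0.10}

def naive_bayes_score(tweet, p_pos, p_neg):
    """Product over the tweet's words of P(w | pos) / P(w | neg)."""
    score = 1.0
    # Naive tokenization: lowercase, expand "'m" to " am", drop commas, split on spaces.
    for word in tweet.lower().replace("'m", " am").replace(",", "").split():
        if word in p_pos and word in p_neg:       # skip out-of-vocabulary words such as "today"
            score *= p_pos[word] / p_neg[word]
    return score

tweet = "I'm happy today, I'm learning"
score = naive_bayes_score(tweet, p_w_pos, p_w_neg)
print(f"score = {score:.2f}")                     # neutral ratios cancel, leaving 0.14 / 0.10 = 1.4
print("positive" if score > 1 else "negative")    # 1.4 > 1, so the tweet is classified as positive
```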