In this hands-on lecture, we'll discuss a dictionary-based sentiment classifier built on a dictionary called SentiWordNet. We're going to use the same data set that we used for the Stanford CoreNLP and LingPipe logistic regression classifiers, the New York Times data set, and as before we'll use just ten simple news articles. This SentiWordNet-based approach is unsupervised learning, which means there is no need for training data or a training phase. Because of that characteristic, as you can see from line 34 down to line 46, what you do is simply parse your data — the reading and parsing happens over here — and then, in lines 33 to 46, each tokenized word is matched against a SentiWordNet entry. If a given token or word has a matching entry in SentiWordNet, that particular word gets a score, either negative or positive. So, let's look at the logic of how scoring works with SentiWordNet by going into getSentimentWordNet — let's open that. Now, the SentiWordNet dictionary file is very large, so when you create a SentiWordNet model, you basically serialize it. It's not a classifier: you simply take the 3.0 version of SentiWordNet and serialize it. Because SentiWordNet has a huge number of sentiment tokens in there, what we want is an effective, efficient search — a data structure like a trie, so that it has fast lookup — and this serialized file suits that need. If you have a serialized model, then you simply instantiate the SentiWordNet object from it; if not, you build it from the dictionary file and serialize it, okay? So let's go into the SentiWordNet object. The first constructor is based on the assumption that you have a serialized model. The second constructor takes the SentiWordNet dictionary as input and then serializes it, okay? Here it calls initializeDictionary.
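The "serialize once, then reuse" pattern described above can be sketched as follows. This is a minimal illustration only; the class name, file names, and the tiny in-memory dictionary are my own assumptions, not the lecture's actual identifiers.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch: load a serialized dictionary if one exists,
// otherwise build it and serialize it for the next run.
public class SerializedDictionaryDemo {
    @SuppressWarnings("unchecked")
    static HashMap<String, Double> loadOrBuild(File cache) throws Exception {
        if (cache.exists()) {
            // Fast path: the dictionary was already parsed and serialized.
            try (ObjectInputStream in =
                     new ObjectInputStream(new FileInputStream(cache))) {
                return (HashMap<String, Double>) in.readObject();
            }
        }
        // Slow path: build the dictionary (in the real code, by parsing the
        // SentiWordNet 3.0 file), then serialize it so later runs are fast.
        HashMap<String, Double> dict = new HashMap<>();
        dict.put("good#a", 0.75);     // made-up sample entries
        dict.put("bad#a", -0.625);
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(cache))) {
            out.writeObject(dict);
        }
        return dict;
    }

    public static void main(String[] args) throws Exception {
        File cache = File.createTempFile("swn", ".ser");
        cache.delete();                                  // start with no cached model
        Map<String, Double> first = loadOrBuild(cache);  // slow path: builds + serializes
        Map<String, Double> second = loadOrBuild(cache); // fast path: deserializes
        System.out.println(first.equals(second));        // both paths agree
        cache.delete();
    }
}
```

The design point is simply that parsing the full dictionary is the expensive step, so you pay it once and deserialize afterwards.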
initializeDictionary simply builds the lookup structure. It's based on a HashMap — let's double-check whether it's a HashMap or, yeah, it's declared as a Java Map. Java's map container is a built-in data structure, and this code relies on HashMap. As I explained before, you could use a trie data structure instead; a trie in fact performs better in terms of speed, but at the size of SentiWordNet a HashMap handles it properly. All initializeDictionary does is take the SentiWordNet file as input and parse each WordNet synset according to the SentiWordNet file structure. Then, for each entry — whether the term has a positive, negative, or objective value — it assigns that information to the particular term and stores it in the HashMap. Okay, that's what's happening inside initializeDictionary. Given this, after construction, the getSentiment function takes a sentence. Each sentence has been preprocessed by Stanford CoreNLP, so it carries the lemma and the POS tag for each term. The function then has several if/else-if statements: if the POS tag is an adjective, noun, or verb, it maps the CoreNLP tag to SentiWordNet's own POS tag. If the lemma form, together with its POS tag, matches an entry in SentiWordNet, the score assigned to that entry is retrieved and added to the sentence's sentiment score. That's how it works. So, let's execute this — I'm in the Java file, run it. It should take less time than the other two because there is no training phase: it takes the ten examples, applies the Stanford CoreNLP preprocessing modules, and instead of running a classifier we just feed the pipeline's annotations into the scoring function.
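The two steps just described — parsing SentiWordNet entries into a HashMap keyed by lemma and POS, then matching each lemma/POS pair from CoreNLP against it — can be sketched like this. The SentiWordNet 3.0 column layout (POS, synset id, PosScore, NegScore, synset terms, gloss) is real; the method names, the combined score, and the tiny in-memory sample are illustrative assumptions.

```java
import java.util.*;

// Sketch of initializeDictionary + getSentiment over a plain HashMap
// keyed by "lemma#pos". Not the lecture's actual code.
public class SwnScoringDemo {
    // Parse SentiWordNet-style lines into lemma#pos -> (PosScore - NegScore).
    static Map<String, Double> initializeDictionary(List<String> lines) {
        Map<String, Double> dict = new HashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            String pos = cols[0];
            double score = Double.parseDouble(cols[2]) - Double.parseDouble(cols[3]);
            for (String term : cols[4].split(" ")) {
                String lemma = term.substring(0, term.indexOf('#')); // drop sense number
                dict.put(lemma + "#" + pos, score);
            }
        }
        return dict;
    }

    // Map a Penn Treebank tag (what CoreNLP emits) to SentiWordNet's POS letters.
    static String swnPos(String ptbTag) {
        if (ptbTag.startsWith("JJ")) return "a"; // adjective
        if (ptbTag.startsWith("NN")) return "n"; // noun
        if (ptbTag.startsWith("VB")) return "v"; // verb
        if (ptbTag.startsWith("RB")) return "r"; // adverb
        return null;
    }

    // Sum the scores of all lemma/POS pairs that match a SentiWordNet entry.
    static double getSentiment(String[][] lemmaTagPairs, Map<String, Double> dict) {
        double total = 0.0;
        for (String[] pair : lemmaTagPairs) {
            String pos = swnPos(pair[1]);
            if (pos == null) continue;                // tag type not in SentiWordNet
            Double s = dict.get(pair[0] + "#" + pos); // lemma AND POS must both match
            if (s != null) total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "a\t00001740\t0.75\t0\tgood#1\tgloss...",
            "a\t00002098\t0\t0.625\tbad#1\tgloss...");
        Map<String, Double> dict = initializeDictionary(sample);
        // "a good movie" as lemma/POS pairs after CoreNLP-style preprocessing
        String[][] sent = { {"a", "DT"}, {"good", "JJ"}, {"movie", "NN"} };
        System.out.println(getSentiment(sent, dict));
    }
}
```

Only "good"/JJ finds a dictionary entry here; the determiner is skipped outright and "movie"/NN has no entry, which is exactly the match-or-nothing behavior the lecture describes.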
As you see, each sentence has an aggregated sentiment score. That means if the sentence contains several sentiment words — each one negative, positive, or objective — then the sum of the scores of those words gives the total sentiment score for that sentence. Let's stop here. Some of these results make sense, and some don't. What I can say here is that a SentiWordNet-based approach in some cases achieves really high performance in terms of accuracy, but in many cases it performs really poorly, because many of the tokenized terms from the test data set don't match entries in SentiWordNet — this one is jargon, that one is a new word, and so on and so forth — or the lemmatized form is not correct. There are many possible reasons. Because of that, SentiWordNet, or any dictionary-based sentiment classification, tends to perform worse in large-scale experiments or real-world application settings.
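The coverage problem just mentioned is easy to demonstrate. In this made-up sketch (all names and values are mine, not from the lecture), a sentence whose sentiment is carried entirely by jargon matches nothing in the dictionary, so its aggregate score comes out neutral.

```java
import java.util.*;

// Illustration of the out-of-vocabulary limitation: jargon and new words
// have no dictionary entry, so the sentence scores as neutral.
public class CoverageDemo {
    public static void main(String[] args) {
        Map<String, Double> dict = new HashMap<>();
        dict.put("terrible#a", -0.625);  // an entry the dictionary does cover

        // "this rollout was a dumpster-fire" -- sentiment carried by jargon
        String[] lemmas = { "this", "rollout", "be", "a", "dumpster-fire" };
        double total = 0.0;
        int matched = 0;
        for (String lemma : lemmas) {
            Double s = dict.get(lemma + "#a");   // lookup, adjective slot only
            if (s != null) { total += s; matched++; }
        }
        // No token matched an entry, so the classifier sees a neutral sentence.
        System.out.println(matched + " " + total);
    }
}
```

No training phase means no way to learn such words from data, which is why supervised classifiers tend to overtake dictionary lookup at scale.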