[SOUND] This lecture is about, Opinion Mining and Sentiment Analysis, covering, Motivation. In this lecture, we're going to start, talking about, mining a different kind of knowledge. Namely, knowledge about the observer or humans that have generated the text data. In particular, we're going to talk about the opinion mining and sentiment analysis. As we discussed earlier, text data can be regarded as data generated from humans as subjective sensors. In contrast, we have other devices such as video recorder that can report what's happening in the real world objective to generate the viewer data for example. Now the main difference between test data and other data, like video data, is that it has rich opinions, and the content tends to be subjective because it's generated from humans. Now, this is actually a unique advantaged of text data, as compared with other data, because the office is a great opportunity to understand the observers. We can mine text data to understand their opinions. Understand people's preferences, how people think about something. So this lecture and the following lectures will be mainly about how we can mine and analyze opinions buried in a lot of text data. So let's start with the concept of opinion. It's not that easy to formally define opinion, but mostly we would define opinion as a subjective statement describing what a person believes or thinks about something. Now, I highlighted quite a few words here. And that's because it's worth thinking a little bit more about these words. And that will help us better understand what's in an opinion. And this further helps us to define opinion more formally. Which is always needed to computation to resolve the problem of opinion mining. So let's first look at the key word of subjective here. This is in contrast with objective statement or factual statement. Those statements can be proved right or wrong. And this is a key differentiating factor from opinions which tends to be not easy to prove wrong or right, because it reflects what the person thinks about something. So in contrast, objective statement can usually be proved wrong or correct. For example, you might say this computer has a screen and a battery. Now that's something you can check. It's either having a battery or not. But in contrast with this, think about the sentence such as, this laptop has the best battery or this laptop has a nice screen. Now these statements are more subjective and it's very hard to prove whether it's wrong or correct. So opinion, is a subjective statement. And next lets look at the keyword person here. And that indicates that is an opinion holder. Because when we talk about opinion, it's about an opinion held by someone. And then we notice that there is something here. So that is the target of the opinion. The opinion is expressed on this something. And now, of course, believes or thinks implies that an opinion will depend on the culture or background and the context in general. Because a person might think different in a different context. People from different background may also think in different ways. So this analysis shows that there are multiple elements that we need to include in order to characterize opinion. So, what's a basic opinion representation like? Well, it should include at least three elements, right? Firstly, it has to specify what's the opinion holder. So whose opinion is this? Second, it must also specify the target, what's this opinion about? And third, of course, we want opinion content. And so what exactly is opinion? If you can identify these, we get a basic understanding of opinion and can already be useful sometimes. You want to understand further, we want enriched opinion representation. And that means we also want to understand that, for example, the context of the opinion and what situation was the opinion expressed. For example, what time was it expressed? We, also, would like to, people understand the opinion sentiment, and this is to understand that what the opinion tells us about the opinion holder's feeling. For example, is this opinion positive, or negative? Or perhaps the opinion holder was happy or was sad, and so such understanding obvious to those beyond just Extracting the opinion content, it needs some analysis. So let's take a simple example of a product review. In this case, this actually expressed the opinion holder, and expressed the target. So its obviously whats opinion holder and that's just reviewer and its also often very clear whats the opinion target and that's the product review for example iPhone 6. When the review is posted usually you can't such information easier. Now the content, of course, is a review text that's, in general, also easy to obtain. So you can see product reviews are fairly easy to analyze in terms of obtaining a basic opinion of representation. But of course, if you want to get more information, you might know the Context, for example. The review was written in 2015. Or, we want to know that the sentiment of this review is positive. So, this additional understanding of course adds value to mining the opinions. Now, you can see in this case the task is relatively easy and that's because the opinion holder and the opinion target have already been identified. Now let's take a look at the sentence in the news. In this case, we have a implicit holder and a implicit target. And the tasker is in general harder. So, we can identify opinion holder here, and that's the governor of Connecticut. We can also identify the target. So one target is Hurricane Sandy, but there is also another target mentioned which is hurricane of 1938. So what's the opinion? Well, there's a negative sentiment here that's indicated by words like bad and worst. And we can also, then, identify context, New England in this case. Now, unlike in the playoff review, all these elements must be extracted by using natural RAM processing techniques. So, the task Is much harder. And we need a deeper natural language processing. And these examples also suggest that a lot of work can be easy to done for product reviews. That's indeed what has happened. Analyzing and assembling news is still quite difficult, it's more difficult than the analysis of opinions in product reviews. Now there are also some other interesting variations. In fact, here we're going to examine the variations of opinions, more systematically. First, let's think about the opinion holder. The holder could be an individual or it could be group of people. Sometimes, the opinion was from a committee. Or from a whole country of people. Opinion target accounts will vary a lot. It can be about one entity, a particular person, a particular product, a particular policy, ect. But it could be about a group of products. Could be about the products from a company in general. Could also be very specific about one attribute, though. An attribute of the entity. For example, it's just about the battery of iPhone. It could be someone else's opinion. And one person might comment on another person's Opinion, etc. So, you can see there is a lot of variation here that will cause the problem to vary a lot. Now, opinion content, of course, can also vary a lot on the surface, you can identify one-sentence opinion or one-phrase opinion. But you can also have longer text to express an opinion, like the whole article. And furthermore we identify the variation in the sentiment or emotion damage that's above the feeding of the opinion holder. So, we can distinguish a positive versus negative or mutual or happy versus sad, separate. Finally, the opinion context can also vary. We can have a simple context, like different time or different locations. But there could be also complex contexts, such as some background of topic being discussed. So when opinion is expressed in particular discourse context, it has to be interpreted in different ways than when it's expressed in another context. So the context can be very [INAUDIBLE] to entire discourse context of the opinion. From computational perspective, we're mostly interested in what opinions can be extracted from text data. So, it turns out that we can also differentiate, distinguish, different kinds of opinions in text data from computation perspective. First, the observer might make a comment about opinion targeting, observe the word So in case we have the author's opinion. For example, I don't like this phone at all. And that's an opinion of this author. In contrast, the text might also report opinions about others. So the person could also Make observation about another person's opinion and reported this opinion. So for example, I believe he loves the painting. And that opinion is really about the It is really expressed by another person here. So, it doesn't mean this author loves that painting. So clearly, the two kinds of opinions need to be analyzed in different ways, and sometimes in product reviews, you can see, although mostly the opinions are false from this reviewer. Sometimes, a reviewer might mention opinions of his friend or her friend. Another complication is that there may be indirect opinions or inferred opinions that can be obtained. By making inferences on what's expressed in the text that might not necessarily look like opinion. For example, one statement that might be, this phone ran out of battery in just one hour. Now, this is in a way a factual statement because It's either true or false, right? You can even verify that, but from this statement, one can also infer some negative opinions about the quality of the battery of this phone, or the feeling of the opinion holder about the battery. The opinion holder clearly wished that the battery do last longer. So these are interesting variations that we need to pay attention to when we extract opinions. Also, for this reason about indirect opinions, it's often also very useful to extract whatever the person has said about the product, and sometimes factual sentences like these are also very useful. So, from a practical viewpoint, sometimes we don't necessarily extract the subject of sentences. Instead, again, all the sentences that are about the opinions are useful for understanding the person or understanding the product that we commend. So the task of opinion mining can be defined as taking textualized input to generate a set of opinion representations. Each representation we should identify opinion holder, target, content, and the context. Ideally we can also infer opinion sentiment from the comment and the context to better understand. The opinion. Now often, some elements of the representation are already known. I just gave a good example in the case of product we'd use where the opinion holder and the opinion target are often expressly identified. And that's not why this turns out to be one of the simplest opinion mining tasks. Now, it's interesting to think about the other tasks that might be also simple. Because those are the cases where you can easily build applications by using opinion mining techniques. So now that we have talked about what is opinion mining, we have defined the task. Let's also just talk a little bit about why opinion mining is very important and why it's very useful. So here, I identify three major reasons, three broad reasons. The first is it can help decision support. It can help us optimize our decisions. We often look at other people's opinions, look at read the reviews in order to make a decisions like buying a product or using a service. We also would be interested in others opinions when we decide whom to vote for example. And policy makers, may also want to know people's opinions when designing a new policy. So that's one general, kind of, applications. And it's very broad, of course. The second application is to understand people, and this is also very important. For example, it could help understand people's preferences. And this could help us better serve people. For example, we optimize a product search engine or optimize a recommender system if we know what people are interested in, what people think about product. It can also help with advertising, of course, and we can have targeted advertising if we know what kind of people tend to like what kind of plot. Now the third kind of application can be called voluntary survey. Now this is most important research that used to be done by doing surveys, doing manual surveys. Question, answer it. People need to feel informs to answer their questions. Now this is directly related to humans as sensors, and we can usually aggregate opinions from a lot of humans through kind of assess the general opinion. Now this would be very useful for business intelligence where manufacturers want to know where their products have advantages over others. What are the winning features of their products, winning features of competitive products. Market research has to do with understanding consumers oppinions. And this create very useful directive for that. Data-driven social science research can benefit from this because they can do text mining to understand the people's opinions. And if you can aggregate a lot of opinions from social media, from a lot of, popular information then you can actually do some study of some questions. For example, we can study the behavior of people on social media on social networks. And these can be regarded as voluntary survey done by those people. In general, we can gain a lot of advantage in any prediction task because we can leverage the text data as extra data above any problem. And so we can use text based prediction techniques to help you make predictions or improve the accuracy of prediction. [MUSIC]