So here are some specific examples of what we can't do today. Part-of-speech tagging is still not easy to do 100% correctly. In the example "He turned off the highway" versus "He turned off the fan," the two "off"s actually have somewhat different syntactic categories. It's also very difficult to get complete parsing correct. Again, the example "A man saw a boy with a telescope" can actually be very difficult to parse, depending on the context (short code sketches of both ambiguities appear at the end of this section). Precise, deep semantic analysis is also very hard. For example, defining the meaning of "own" precisely is very difficult in a sentence like "John owns a restaurant."

So the state of the art can be summarized as follows: robust and general NLP tends to be shallow, while deep understanding does not scale up. For this reason, the techniques we cover in this course are, in general, shallow techniques for analyzing and mining text data, and they are generally based on statistical analysis. So they are robust and general, and they fall into the category of shallow analysis. Such techniques have the advantage of being applicable to any text data, in any natural language, about any topic. The downside is that they don't give us a deeper understanding of the text. For that, we have to rely on deeper natural language analysis. That typically requires human effort to annotate many examples of the analysis we would like done, and computers can then use machine learning techniques to learn from these training examples to do the task.

So in practical applications, we generally combine the two kinds of techniques, with the general statistical methods as the backbone, the basis. These can be applied to any text data. On top of that, we use humans to annotate more data and apply supervised machine learning to do some tasks as well as we can, especially for those important tasks, bringing humans into the loop to analyze text data more precisely. But this course will cover the general statistical approaches, which generally don't require much human effort, so they are practically more useful than some of the deeper analysis techniques that require a lot of human effort to annotate the text data.

So to summarize, the main points to take away are these. First, NLP is the foundation for text mining: obviously, the better we can understand the text data, the better we can do text mining. Computers today are far from being able to understand natural language. Deep NLP requires common-sense knowledge and inference, and thus only works for very limited domains; it is not feasible for large-scale text mining. Shallow NLP based on statistical methods can be done at large scale and is the main topic of this course; such methods are applicable to a lot of applications, and in some sense they are also the more useful techniques. In practice, we use statistical NLP as the basis and bring in humans for help, as needed, in various ways.
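To make the first ambiguity concrete, here is a minimal sketch in Python using NLTK. NLTK is my choice for illustration, not something the lecture prescribes, and the tag names mentioned in the comments (IN, RP) come from the Penn Treebank tagset rather than from the lecture.

```python
# A minimal sketch of the "turned off" ambiguity, assuming Python with NLTK
# (an illustrative toolkit choice; the lecture does not prescribe one).
import nltk

# Tagger models; the resource name differs across NLTK versions, so fetch both.
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)

sentences = [
    "He turned off the highway".split(),  # "off" begins a locative phrase
    "He turned off the fan".split(),      # "off" is a particle of "turn off"
]

for tokens in sentences:
    # A perfect tagger would give the two "off"s different tags (e.g. IN vs. RP);
    # an off-the-shelf tagger may well tag them identically, which is the point.
    print(nltk.pos_tag(tokens))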
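The parsing example can be made concrete in the same way. The toy grammar below is my own illustrative assumption, not from the lecture; it licenses both attachments of the prepositional phrase "with a telescope," so an exhaustive parser returns two trees for the same sentence.

```python
# A minimal sketch of the telescope ambiguity using a toy grammar (my own
# illustrative assumption) in which the PP can attach to the NP or the VP.
import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> Det N | Det N PP
    VP -> V NP | V NP PP
    PP -> P NP
    Det -> 'a'
    N  -> 'man' | 'boy' | 'telescope'
    V  -> 'saw'
    P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("a man saw a boy with a telescope".split()):
    print(tree)  # prints two trees, one per attachment of the PP
```

A statistical parser would pick the more probable attachment, but as the lecture notes, which attachment is actually correct depends on context that the sentence alone does not supply.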