Text analytics projects can be classified into two types based on the data used and the goals of the analysis: exploration and prediction. Text exploration involves methods to extract and summarize relevant information from a body of text. Exploration is analogous to unsupervised learning in traditional machine learning in that there is no target or goal to predict. The results are text summaries, clusters, or graphics that describe the original data. In many cases, text exploration can be used to extract information from unstructured data and then organize that information into tables for use in subsequent projects. For example, information retrieval involves finding documents with relevant content, such as finding drug dosages in a collection of medical documents or finding addresses in a county survey.

The other type of text project uses text data as an input for predictive modeling with traditional machine learning algorithms. The text must be analyzed using natural language processing to create numeric representations that can be fed into predictive models. One example of predictive modeling with text is the use of insurance adjusters' notes to model fraud. Account information can be combined with the adjusters' notes to predict fraudulent claims, and ideally, the addition of the text information improves model performance, allowing more fraud to be detected.

In text analytics, the data are referred to as documents. Each document could be a complete work of literature or something short like a doctor's note on a patient's chart. The collection of documents is called the corpus and represents all the text to be used in a project. The language of the corpus matters for natural language processing, because things like syntax and parts of speech are language specific. Dictionaries are used to exploit human knowledge about the documents.
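The step of turning a corpus of documents into numeric representations can be illustrated with a minimal bag-of-words sketch. This is not a specific library's method, just one common way to produce count vectors that a predictive model could accept; the corpus, vocabulary, and function names here are illustrative.

```python
from collections import Counter

# A small illustrative corpus: each string is one document.
corpus = [
    "claim filed after minor accident",
    "suspicious claim filed twice",
]

# Build the vocabulary: every unique word across the corpus, in sorted order.
vocabulary = sorted({word for doc in corpus for word in doc.split()})

def to_vector(doc):
    """Represent one document as word counts over the shared vocabulary."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

# Each row of the matrix is one document's numeric representation.
matrix = [to_vector(doc) for doc in corpus]
```

A matrix like this, with one row per document and one column per vocabulary word, is the kind of numeric input that can be combined with structured account information and passed to a traditional machine learning algorithm.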
In the context of text analytics, dictionaries are lists of words that have a specific role or purpose in the analysis. For example, a stop-list is a dictionary of words that should be ignored by the natural language processing algorithms because they don't contain useful information about the text. Words like "is," "and," and "the" generally don't provide any predictive value in text analytics.

Visual exploration of the text is useful for seeing what types of topics and concepts can be extracted, and it can provide a useful starting point for more detailed analysis using natural language processing. A word cloud presents words from the corpus, with the size of each word corresponding to its frequency in the corpus. A bar chart can highlight the same information. Metrics other than frequency can also be used to create these graphics. The metrics can be external, like a word cloud of nations with the size of each name determined by the population of the nation, or they can be created as part of the natural language processing. Word similarity scores can be generated from the corpus to compare its terms to a word of interest, and the size of items in the word cloud is then determined by the similarity to that base word.
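The combination of a stop-list and frequency counts can be sketched in a few lines. This is a hypothetical example, not a particular tool's implementation; the stop-list and corpus are made up, and the resulting counts are exactly the values that would size the words in a word cloud or the bars in a bar chart.

```python
from collections import Counter

# A small stop-list: words to ignore because they carry little information.
stop_list = {"is", "and", "the", "a", "of"}

# An illustrative corpus of short documents.
corpus = [
    "the dosage of the drug is low",
    "the drug dosage and the schedule",
]

# Tokenize, lowercase, and drop stop words before counting.
tokens = [
    word
    for doc in corpus
    for word in doc.lower().split()
    if word not in stop_list
]
frequencies = Counter(tokens)

# frequencies.most_common() orders the words for a bar chart,
# and the counts themselves would determine word sizes in a word cloud.
```

Swapping the frequency counts for an external metric, such as population figures, or for similarity scores to a base word would change the sizes in the graphic without changing this basic structure.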