In this hands-on lecture, we'll discuss an extension of LDA that adds feature functionality to each document. It's called DMR (Dirichlet-Multinomial Regression). Let's create a Java file called DMRmain.java and put it in the main package of our test project, which I already created for you.

Let me explain this main function briefly. Here, we open two files and read them in. One is the New York Times articles; the other one is the raw data. The second file is needed because we want to extract metadata, the observed third variable that we are interested in. We want to condition on this particular variable, so that topics are generated based upon that particular piece of metadata. In this case we are going to use the publication date of the news articles, which is available in nytimes_news_json.txt.

We are going to have 5,000 documents. Once the 5,000 documents are collected from the news article corpus, we create a collection object. The collection object has a method called createDMR, which takes three parameters: the first is the number of topics, the second is the number of iterations, and the third is the interval at which intermediate topic model results are reported by DMR.

So let's talk about DMR a little bit more. If you recall the lecture notes that I provided for DMR, DMR is not based on parallel processing, which means it's slower than LDA. Why is it slower? I'll explain that in a moment.

So let's take a look, again, at createDMR (a minimal code sketch follows at the end of this walkthrough). Same as for LDA, we need to create a series of pipes. Once you create the series of pipes, then, same as for LDA, we instantiate an instance list and store the parsed documents into it. Then we instantiate the DMR topic model. DMRTopicModel extends LDAHyper, and LDAHyper is not based on parallel processing in Mallet; that's why the DMR topic model is much slower than Mallet's parallel LDA.

We need to set several parameters for running DMR. One of them is the number of topics. Another is the topic display interval: while the DMR topic model runs, intermediate topic results appear in the console in red font, and this interval controls how often you see them. Then there are the number of iterations and the optimize interval, which have an impact on the quality of the outcome. A rule of thumb that tends to give the best performance is to set the optimize interval to the number of iterations divided by 10, or you can play with the number until it gives you the best or ideal result.

Then, same as for LDA, you add the instance list and call estimate(), which runs the Gibbs sampling algorithm inside the estimate function. After that, same as before, you save the DMR model and save the instance list for DMR for later use. After that you just print out some information.

Why is the class name crossed out here? That is because LDAHyper is deprecated, but you can still use a deprecated API, and that's the only API you can use for DMR.

So let's go back to DMRmain. Here the core function is createDMR; once this line is executed, all you need to do is print out the parameters and print out the topics. The parameters are the results of the topic model conditioned on, in my case, the publication date. Say the publication date is month-based: January to December gives you 12 months, so 12 different feature values, and for each month the topics have trends that are influenced by conditioning on that month.
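Before we run it, here is a minimal sketch of what the createDMR flow described above might look like, assuming Mallet's (deprecated) cc.mallet.topics.DMRTopicModel API. The pipe list mirrors Mallet's own DMR loader; the helper names (buildInstances, createDMR), the token pattern, the metadata string "month=jan", and the output file name dmr.parameters are illustrative assumptions, not the exact code from the course project.

import java.io.File;
import java.util.regex.Pattern;

import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TargetStringToFeatures;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.pipe.TokenSequenceLowercase;
import cc.mallet.topics.DMRTopicModel;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

public class DMRSketch {

    // Pipes: the target string (e.g. "month=jan") becomes a feature vector,
    // and the document text becomes a feature sequence of tokens.
    static InstanceList buildInstances() {
        Pipe pipe = new SerialPipes(new Pipe[] {
            new TargetStringToFeatures(),
            new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")),
            new TokenSequenceLowercase(),
            new TokenSequence2FeatureSequence()
        });
        return new InstanceList(pipe);
    }

    // Hypothetical helper mirroring the createDMR method described above:
    // numTopics topics, numIterations Gibbs sampling iterations, and a
    // display interval controlling how often top words are printed.
    static DMRTopicModel createDMR(InstanceList instances, int numTopics,
                                   int numIterations, int displayInterval)
            throws java.io.IOException {
        DMRTopicModel dmr = new DMRTopicModel(numTopics);
        dmr.addInstances(instances);                     // parsed documents plus metadata targets
        dmr.setTopicDisplay(displayInterval, 10);        // print 10 top words per topic at each interval
        dmr.setNumIterations(numIterations);
        dmr.setOptimizeInterval(numIterations / 10);     // rule of thumb: iterations / 10
        dmr.estimate();                                  // Gibbs sampling happens inside estimate()
        dmr.writeParameters(new File("dmr.parameters")); // per-feature topic parameters for later inspection
        return dmr;
    }

    public static void main(String[] args) throws Exception {
        InstanceList instances = buildInstances();
        // Each instance carries the article text as data and the metadata as the target string.
        instances.addThruPipe(new Instance("Some article text ...", "month=jan", "doc1", null));
        createDMR(instances, 10, 1000, 100);
    }
}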
Coming back to the parameters: suppose topic 1 generally has a higher probability in January than in the other eleven months, and, with ten topics, topic 1 in January also has a higher probability than the other nine topics. That means topic 1 was particularly discussed and mentioned in January. We then need to take a closer look at the result and ask why that happened; that may need further analysis and other factors to be considered.

Let's execute DMRmain.java. It may take a while. Okay, it parsed 5,000 news articles. Then it shows ten topics, and what it does is apply the Gibbs sampling inference algorithm: initially it takes random samples and tries to identify the topics, that is, the words that have a high impact on forming the ten topics. It's not that slow, at least compared to a run with a much bigger iteration count. Here are the ten topics. Each topic lists a number of top words with their probabilities, and the sum of the top words' frequencies.

Now let's go to our project and open the parameters output. As I explained before, for each topic we have n timestamps, and the topic has a probability for each particular date, all the way down to the last time period. The same goes for the other nine topics.

So what you can do with this result is export it to an Excel spreadsheet or something like that and make a graph. The x-axis is the time series, the y-axis is the probability at each timestamp, and then you plot the ten topics as a line graph, a bar graph, or whatever format you want to use. By doing so, you can keep track of the trends of particular topics: falling trends, rising trends, mixed trends, and so on.
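If you'd rather script that export than copy numbers by hand, here is a small sketch that writes a topics-over-time matrix to a CSV file you can chart in Excel. The layout (rows are timestamps, columns are topics), the method name writeCsv, and the sample values are assumptions for illustration; in practice you would fill the matrix from the printed DMR parameters.

import java.io.FileWriter;
import java.io.IOException;

public class DmrCsvExport {

    // Writes one row per timestamp and one column per topic, so the CSV can be
    // charted directly (e.g. a line graph with time on the x-axis and
    // probability on the y-axis).
    static void writeCsv(String[] timestamps, double[][] probs, String path) throws IOException {
        try (FileWriter out = new FileWriter(path)) {
            out.write("timestamp");
            for (int t = 0; t < probs[0].length; t++) {
                out.write(",topic" + t);
            }
            out.write("\n");
            for (int i = 0; i < timestamps.length; i++) {
                out.write(timestamps[i]);
                for (int t = 0; t < probs[i].length; t++) {
                    out.write("," + probs[i][t]);
                }
                out.write("\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny illustrative matrix: two months, three topics.
        String[] months = { "2020-01", "2020-02" };
        double[][] probs = { { 0.4, 0.3, 0.3 }, { 0.2, 0.5, 0.3 } };
        writeCsv(months, probs, "dmr_trends.csv");
    }
}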