All right, so you've played around with evaluating different parameters for your topic model, you've inspected the candidate models those parameters produced, and you've finally adopted the topic model that you think best fits the data, both qualitatively and from a statistical-fit perspective. Now it's time to classify data, right? The whole point of topic modeling is, one, to understand what's in your data. We now have a sense of what the topics are inside this Nike review data. But next, we can actually cluster these documents and print them out by cluster, so that we can segment the data and inspect one topic at a time. If there's one topic in this Nike data that we're particularly interested in, we can print out all of the reviews assigned to that topic and read them qualitatively, to further understand what people are trying to say inside of that topic.

So: lots of imports that you've already seen, and it all looks good here. (I had to restart my terminal about three times to get this working.) Once it's all working, we can load in our document-term matrices, pick our final model however we want to do it, print it out like we did before, and then classify our documents.

Before we do that, we want to give our topic labels something descriptive. What we can do here is replace "topic one" with a short description of the words most prominent in that topic. Topic one is "last long", so we could say that corresponds to durability. Topic two corresponds to the look, and people liking it. Topic three is people loving how comfortable the shoes are. Topic four is people saying the shoes arrived looking exactly as they expected. Topic five is "would buy again". You can play with the lambda parameter here; as you lower lambda, I believe the rankings shift toward words that are more exclusive to each topic. I think two to three words is good. And as these topic descriptions are presented, I can actually interpret most of them qualitatively quite easily. I'm not saying this is the right fit for these topics; I think you're going to be able to do far better than I did. But this is starting to look like something we can actually interpret and make sense of. (A labeling sketch follows below.)

Now that we've got our labels, we can go through and rank each document in our corpus by which topic it most belongs to. Document zero belongs to topic 17 with a high score, and then there's a far distant second rank for topic 18, so it really belongs to topic 17. We want to see stark differences here: we want our machine to be pretty sure that a document belongs to one topic, and at most somewhat sure that it belongs to another, because in Amazon reviews we don't expect many different topics to be assigned to a given document. Now, if this were a book, then we might expect up to five topics to be associated with it, and we would expect those scores to be more similar across topics, right? But this is an example of a sparse dataset, where we just don't have a lot of features, or terms, in any given document, so we probably expect one or two topics per document. We can then take this and iterate through the classifications. (A ranking sketch also follows below.)
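Here's a minimal sketch of the labeling step, assuming the fitted model's topic-word distribution is in a `topic_word_distrib` array and the vocabulary in a `vocab` array (both hypothetical names, not tmtoolkit's exact outputs). You pull the top words per topic, read them, and attach your own human-readable labels:

```python
import numpy as np

# Hypothetical inputs: topic_word_distrib is a (K x V) array from the fitted
# LDA model, vocab is a length-V array of terms. These names are assumptions,
# not tmtoolkit's exact API.
def top_words(topic_word_distrib, vocab, n=3):
    """Return the n highest-probability words for each topic."""
    return [
        [vocab[i] for i in np.argsort(row)[::-1][:n]]
        for row in topic_word_distrib
    ]

# Manually chosen labels like the ones described above, keyed by topic index.
topic_labels = {
    0: "durability (last long)",
    1: "looks / style",
    2: "comfort",
    3: "arrived as expected",
    4: "would buy again",
}
```

tmtoolkit also ships helpers along these lines (e.g. `ldamodel_top_topic_words` in `tmtoolkit.topicmod.model_io`, which takes a relevance lambda), so this hand-rolled version is just to make the idea concrete.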
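And for the ranking step, a sketch assuming the per-document topic distribution sits in a hypothetical (D x K) array `doc_topic_distrib`:

```python
import numpy as np

# Hypothetical input: doc_topic_distrib is a (D x K) array where row d gives
# document d's probability mass across the K topics.
def rank_topics_for_doc(doc_topic_distrib, doc_idx, top_n=2):
    """Return (topic, score) pairs for one document, best topic first."""
    row = doc_topic_distrib[doc_idx]
    order = np.argsort(row)[::-1][:top_n]
    return [(int(t), float(row[t])) for t in order]

# e.g. rank_topics_for_doc(doc_topic_distrib, 0) might return something like
# [(17, 0.81), (18, 0.06)] -- the kind of stark gap discussed above.
```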
And we can actually print out the ranks and the locations, and then see these visualized. You can also go through and search the text. Say you wanted to know which topics correspond to the word "battery": you can search through the text of the document classifications, look for the term "battery", and see which topics tend to have it associated with them. And then, of course, you can save these document classifications to an Excel file, so you can send it to someone on the qualitative side to make sense of the topics and the documents inside them. (A sketch of both steps follows below.)

Then we've got pyLDAvis, which is a really nice way of visualizing topic space in two dimensions. Remember that these documents actually have 200 dimensions, so we can't really visualize them in the document-term-matrix vector-space format that we saw in our examples from lecture. But the way this is conceptualized, the more overlap we see between two topics, the more we're violating the assumption that topics should have distance from each other. Remember the two properties we're looking for: one, documents within a topic should be similar; and two, which is what we're looking at here, the distance between topics should be maximized. Does that look like the case here? Not really, right? There's a ton of overlap, and these two topics look pretty much the same. If you look at the keywords associated with them, they're nearly identical: "comfortable" appears in both, "great" appears in both. So what we'd love to do is get these merged into one topic, probably the same here, and definitely the same here. We can try to get the computer to merge these by keeping an eye on this view as we play with K as well.

So this is a really nice tool: you can see how the relevance metrics change, and you can play with the most relevant terms for any given topic. You can slide the control and see how that changes the way the topic is represented, and that's nice as well. But again, we're looking for separation, and we definitely see some good here: there is real distance between the majority of topics, but there's also some overlap. Will you ever be able to create a topic model where distance is guaranteed between all topics? No. Unless the documents are drastically different, you're going to have some overlap, because in reality topics do overlap with each other, and concepts do overlap with each other. So really, expect to see some overlap. This is a really nice visualization that you can give a client, and they can play around with it and see what the topics were and the top words associated with those topics. (A pyLDAvis sketch follows below.)

There are some other really fun little things you can do with tmtoolkit that I think are great. One is printing out little word clouds for any given topic. This is really nice, and I'm not a huge fan of word clouds, but if you want to represent a topic in a visual way, I think either pyLDAvis or printing out these topic clouds is probably the best way to go. (A word-cloud sketch follows below.)
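First, backing up to the classification output: here's a sketch of the search-and-export idea, assuming the raw review texts are in a hypothetical list `docs` aligned with the rows of `doc_topic_distrib`, and reusing the `topic_labels` dict from the labeling sketch above. (Note that pandas' `to_excel` needs an engine like openpyxl installed.)

```python
import collections
import numpy as np
import pandas as pd

# Which topics do documents mentioning "battery" tend to land in?
# docs is a hypothetical list of raw document strings, row-aligned with
# doc_topic_distrib.
hits = [i for i, text in enumerate(docs) if "battery" in text.lower()]
top_topic_counts = collections.Counter(
    int(np.argmax(doc_topic_distrib[i])) for i in hits
)
print(top_topic_counts)

# Save per-document classifications for qualitative review.
classifications = pd.DataFrame({
    "document": docs,
    "top_topic": np.argmax(doc_topic_distrib, axis=1),
    "top_score": doc_topic_distrib.max(axis=1),
})
classifications["label"] = classifications["top_topic"].map(topic_labels)
classifications.to_excel("doc_classifications.xlsx", index=False)
```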
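For the visualization itself, here's a sketch using pyLDAvis's model-agnostic `prepare` entry point, again with the same hypothetical arrays plus a dense (D x V) document-term matrix `dtm` (if yours is scipy-sparse, convert with `.toarray()` first):

```python
import pyLDAvis

# Feed the fitted distributions into pyLDAvis. dtm is assumed to be the dense
# document-term matrix the model was fit on.
vis = pyLDAvis.prepare(
    topic_term_dists=topic_word_distrib,  # (K x V)
    doc_topic_dists=doc_topic_distrib,    # (D x K)
    doc_lengths=dtm.sum(axis=1),          # tokens per document
    vocab=vocab,
    term_frequency=dtm.sum(axis=0),       # corpus-wide term counts
)
pyLDAvis.save_html(vis, "lda_vis.html")  # shareable, client-friendly page
```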
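And for the word clouds, tmtoolkit has its own word-cloud helpers, but here's a standalone sketch with the `wordcloud` package, under the same assumed variable names:

```python
import numpy as np
from wordcloud import WordCloud

# Render one topic's top words, sized by their probability in the topic.
topic = 2  # e.g. the "comfort" topic from the labels above
top_idx = np.argsort(topic_word_distrib[topic])[::-1][:30]
freqs = {vocab[i]: float(topic_word_distrib[topic, i]) for i in top_idx}
cloud = WordCloud(width=600, height=400).generate_from_frequencies(freqs)
cloud.to_file(f"topic_{topic}_cloud.png")
```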
So you have a lot of tools here in your toolkit using tmtoolkit. What did we do with tmtoolkit? Well, we loaded in a corpus of documents. We processed those documents down to key features, and we did so in different ways. We ran different models in parallel, using different formats of document-term matrices and different specifications for the parameters those models were initialized with, and we compared all of their fit statistics. We then visualized topics, we classified documents, and we looked at how this whole topic picture coalesced and how it all played out. There's so much you can do inside the tmtoolkit workbench that I think it's a really great tool for natural language processing in general. That being said, it's not the only way to do topic modeling; there are other ways to do it. And so in our last lecture, we're actually going to use pre-trained models, like we did in our supervised deep learning exercise, and we're going to see how easy it is to build a topic model using BERT.