As we said, neither one of those topic models was perfect, and we know that through our intuition. However, there are also some statistics that can help guide the parameters we set for our topic models. Again, I'm importing everything that we've done so far, including all of our DTMs, our vocabs, and our document labels, just as before. Now I'm going to use the evaluation function of tmtoolkit to see how various performance statistics change as we manipulate some of the parameters we can specify in the topic model. There are some constant parameters that tmtoolkit does not want you to change, and the number one thing is beta. It really wants you to be deliberate about beta up front, and that makes sense to me. But it will play with alpha, and it's going to allow us to play with k, which is the number of topics in a model, and that is a really lovely thing.

The first step in an evaluation is simply to specify the parameters we're going to iterate through for all of the topic models we want to test. What we're doing in this process is essentially creating many different topic models and looking at how the statistics change as we specify those models differently. The number of iterations isn't something we should change unless we see that it's taking forever to get through all of the topic models we have to build. If it's taking forever, this is a parameter we can lower to increase speed, but remember, if we don't give the sampler enough iterations to get those topic centroids right, if we don't have enough darts to throw at the board, then we might get lucky with one topic model and not get lucky with another. We want to keep it at least in the 500 range, but you can lower it a little if you absolutely have to.

Beta is static for the evaluation, and I think that's mainly because beta represents topic-word density, and that really doesn't change as you manipulate the number of topics in a model, so we're going to keep it static. However, you can see here that alpha is actually a dynamic reference, and it's always 1/k. If we have one topic in our model, our alpha would of course be 1; if we have 2, our alpha would be 1/2, or 0.5; if we have 3, it would be 0.33, and so on. We have to remember that alpha represents the document-topic density: the lower the alpha, the fewer topics documents contain. Think about it like this: if you have just a handful of topics in a large corpus, most of those topics are going to be present in most of those documents. If you have a very large number of topics, then most topics won't be present in most documents. As the corpus grows, you're going to have more topics and more documents, so you're going to have more clusters; that's why we set alpha up as a relationship with k. As you get more topics, the idea is that those topics won't be present in every single document. If I have five topics and a million documents, then most of the topics are going to be present in most of the documents. If I've got 500 topics, most of the topics are not going to be present in most of the documents; any one document will contain only a few of them.
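To make that setup concrete, here is a minimal sketch of how the constant and varying parameters might be laid out for tmtoolkit's lda backend. The variable names, the k range, and the prior values below are illustrative assumptions, not the exact settings from this notebook; note also that the lda backend names the topic-word prior (what we're calling beta) `eta`.

```python
# Sketch of the evaluation parameter grid (assumed values, not the course's exact ones).

# Constant parameters: held fixed for every candidate model.
const_params = {
    'n_iter': 500,             # Gibbs sampling iterations; lower only if runtime is painful
    'eta': 0.01,               # topic-word prior -- what the lecture calls "beta"
    'random_state': 20191122,  # for reproducibility
}

# Varying parameters: one dict per candidate model, with alpha tied to k as 1/k.
ks = list(range(6, 51, 5))     # k = 6, 11, 16, ..., 46
var_params = [{'n_topics': k, 'alpha': 1.0 / k} for k in ks]

print(var_params[0])   # {'n_topics': 6, 'alpha': 0.166...}
print(var_params[1])   # {'n_topics': 11, 'alpha': 0.0909...}
```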
What I'm trying to say here is that as we increase the number of topics, we just don't expect every single document to be classified as having every single topic; we expect the number of topics assigned to any given document to be a fraction of the total number of topics. Again, if you think there is something wrong with your document-topic density, you can always play with this ratio. You could do 10/k or 100/k to increase the alpha, or you could lower it to 0.5/k to decrease the alpha. But it all goes back to how many topics you expect to be in a document. Remember, the higher the alpha, the more topics documents are made up of, so any given document has more topics classified to it. The lower the alpha, the fewer topics you expect any given document to have, and that's really something you have to think about qualitatively. With tweets, there aren't a lot of topics in any given tweet, maybe one or two at most, but in a book there could be many topics belonging to that one document. So think about it like that as well.

Now we have all of our parameters printed out, and we're able to go through these parameters one combination at a time and build a topic model for each one. The first one isn't going to be a very good one: it's going to have six topics and an alpha of about 0.16. The next one we build is going to have 11 topics and an alpha of approximately 0.1, and so on. Now that we have these different parameter combinations defined, we've got to go through, actually build those models, and then look at some fit statistics to see if there's any intuition to be had: as we play with these parameters, do the fit statistics actually correspond to something interesting? We're also going to pull in some plotting functionality from tmtoolkit to plot these statistics as we try different topic models and see how they change, and then we pull those final pieces into place so we can run this and see how it works. If we print out the results by number of topics, we get these statistics. I'm going to include the papers for these statistics so you can see how the measures were constructed and what their optimal values are, but I'm also going to give you an intuitive way to interpret them as we go.

When we print out this chart, we're looking at two different things. We want to look for an elbow in the first two statistics: the Arun and Cao Juan metrics are both statistics we want to minimize, knowing that at some point that minimization really starts to decay. We can see a huge jump in performance at the earlier topic counts, and the bottom axis here represents k; the x-axis is k. You can see a huge difference between six topics and 11 topics, and not nearly as much of a difference between 11 and 16. It starts to diminish, and that's what we want to see. You can see that this really starts to level out here, but I can tell already that we probably haven't specified a large enough range of k. At some point the Arun and Cao Juan metrics are actually going to turn around and start to increase in value. We see the decay of the elbow.
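Roughly, the evaluation and plotting step looks like the sketch below. It assumes `dtm` holds the document-term matrix imported earlier and reuses the `const_params`/`var_params` from the previous sketch; the metric names are the ones tmtoolkit provides for its lda backend, though the notebook may request a different set.

```python
import matplotlib.pyplot as plt

from tmtoolkit.topicmod.tm_lda import evaluate_topic_models
from tmtoolkit.topicmod.evaluate import results_by_parameter
from tmtoolkit.topicmod.visualize import plot_eval_results

# Fit one LDA model per parameter combination and score each with several metrics.
eval_results = evaluate_topic_models(
    dtm,                                   # document-term matrix (assumed name)
    varying_parameters=var_params,
    constant_parameters=const_params,
    metric=['arun_2010', 'cao_juan_2009', 'coherence_mimno_2011'],
)

# Reorder the results by number of topics and plot each metric against k.
results_by_k = results_by_parameter(eval_results, 'n_topics')
plot_eval_results(results_by_k)
plt.show()
```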
We don't see this value turning around and getting higher again. When I see something like this, it tells me that I should probably go back up to my constant parameters and change the max range of k to something like 100, to see if I can actually witness the curve begin to climb again. That being said, from what I'm seeing here, this is approximately minimized somewhere around a k of 46. The other challenge is that we also want to maximize the coherence score; the higher the coherence, the better. We just don't see that: as we add more topics into the model, the coherence gets worse, which indicates to me that alpha or beta is improperly specified, because coherence should not elbow, it should actually be maximized at some point. It should increase, peak, and then start to go back down. We don't see that; we see the inverse of that shape, so we don't want to accept coherence here as a valid metric, because it's not performing like it's supposed to. We're supposed to see it peak and crest and then fall off, and instead it just gets worse as we introduce more topics. Again, that suggests to me that we're probably off on our alpha and beta.

In reality, you will probably not be able to fit a model that optimizes all three of these metrics. You'll be lucky to get one that optimizes two of the three. I'm not exactly sure why, but my guess is that it's just really hard to get agreement across all of these different statistics. These statistics are a little less empirical in the sense that they're inferred from model fit and not from gold-standard data. Remember that when we do validation in supervised machine learning, we have actual answers to compare our models to, so our measures of loss are pretty objective. In this case, these fit statistics are all inferred from the data structure itself, from distances between documents and topics, and none of that empirically maps onto whether this is a good topic fit or not. So I'm not particularly sold that these are the metrics you need in order to say, well, my topic model is awesome. The best check is your intuition.

But I can see here that clearly a k of 16, which is what I specified in the previous lecture, was far too few topics. There should be more topics in this model. These metrics continue to improve pretty much all the way up until I cut off my range of k, and the coherence continues to get worse, so I've got a lot of work to do. This is where the challenge of the project is. I've given you the code, but I have not tuned these parameters to this data set to be optimal at all. The first two metrics are performing like they should, but I think we could increase k to see what happens once we get north of 50 topics. Coherence is just not behaving at all like it should, so I'll show you the papers so you can see how the authors interpret these metrics and what their graphs look like as k increases; those will be in the supplementary files for topic modeling. If you have any questions on that, start a discussion in the Coursera forums about what you were able to do to get these fit statistics to work appropriately, because trust me, you will be hard-pressed to get all three to work the way they should. Two out of three ain't bad, as Meat Loaf once said. Here I'm playing around with the beta; you can see that this is another evaluation that I did where I varied the beta.
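In case it helps to see what "playing around with the beta" looks like in code, here is a hedged sketch of re-running the same grid with a larger `eta` (beta) and a smaller alpha of 1/(10k). It reuses the imports and the `dtm`, `const_params`, and `ks` names from the earlier sketches, and the specific values are just examples of the kinds of variations described here and in the next paragraph.

```python
# (Reuses the imports and dtm / const_params / ks variables from the earlier sketches.)

# Variation 1: raise the topic-word prior ("beta", i.e. eta) while keeping alpha = 1/k.
const_params_big_beta = dict(const_params, eta=0.5)

# Variation 2: keep that beta, but shrink the document-topic prior to 1/(10k).
var_params_small_alpha = [{'n_topics': k, 'alpha': 1.0 / (10 * k)} for k in ks]

eval_results_v2 = evaluate_topic_models(
    dtm,
    varying_parameters=var_params_small_alpha,
    constant_parameters=const_params_big_beta,
    metric=['arun_2010', 'cao_juan_2009', 'coherence_mimno_2011'],
)
plot_eval_results(results_by_parameter(eval_results_v2, 'n_topics'))
plt.show()
```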
I left the alpha at 1/k and ran k from 1 to 50 again (and again, I should probably increase that range), and you can see similar curves. Then I increased the beta; this is now 0.5, leaving the k range as is, and I'm still seeing the same type of thing. Then I lowered the alpha, so this is 1/(10k), keeping that same beta, and all of a sudden this starts to behave differently, right? This is actually suggesting something we might be able to converge on: we're seeing this value drop, the decay starts to settle in around 30, and all of a sudden the coherence really comes back up around 32. There's some evidence here that we might be getting onto something, and these metrics really do change as we adjust the parameters.

What have we learned from this process? Well, we need to keep going to really get all of our learnings out of it. But we did learn that k is probably somewhere around 40 to 50 plus, because it definitely hasn't converged at 16 like I had hoped. We also see that something may be happening north of 30 topics, where coherence is maximized, and we definitely see that as we change alpha and beta, these curves change a lot. What I usually do in this process is jot down combinations I want to try for k, and also for beta and alpha. I'm really intrigued about using a k north of 30 here with this beta and alpha. I'm going to build a model and print out the topic words to see how that looks and whether it's clear, and I know I need to play around with a higher k value as well, so I'm probably going to re-run these evaluations with a higher k. This is where we have to be patient with topic modeling. I know it can be frustrating, but you're going to get higher-quality topics as a result of spending a little time here.
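And here is one way that follow-up model build and topic-word printout might look, assuming the lda package and the `vocab` and `dtm` objects imported at the top of the notebook. The choice of k = 32, alpha = 1/(10k), and eta = 0.5 below is just a stand-in for whichever combination you jot down from your own evaluation runs.

```python
import lda  # the LDA implementation that tmtoolkit's tm_lda module wraps

from tmtoolkit.topicmod.model_io import print_ldamodel_topic_words

# Fit a single candidate model with one of the jotted-down combinations
# (the k, alpha, and eta values below are illustrative assumptions).
k = 32
model = lda.LDA(n_topics=k, n_iter=1000,
                alpha=1.0 / (10 * k), eta=0.5,
                random_state=20191122)
model.fit(dtm)  # dtm: the document-term matrix imported earlier

# Print the top words per topic so we can judge the topics qualitatively.
print_ldamodel_topic_words(model.topic_word_, vocab, top_n=10)
```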