Hey.

You know the basic topic model which is called PLSA,

and now you know how to train it.

Now, what are some other topic models in this world?

What are some other applications that we can solve with topic modeling?

I want to start with a nice application.

It is about the diary of Martha Ballard.

So, this is a big diary: she kept it for 27 years.

This is why it's rather complicated for people to read this diary and to analyze it.

So, researchers decided to apply topic modeling to this diary and see what topics are revealed in it.

These are some examples of the topics, and you can see the top most probable words for each.

So, you remember you have your Phi matrix, which stores the probabilities of words in topics.

And these are exactly the words with the highest probabilities.

And actually you can see that the topics are rather intuitively interpretable.

So, there is something about the gardens,

and potatoes, and work in these gardens.

There is something about shopping like sugar,

or flour, or something else.

So, you can look through these top words,

and you can name the topics, and that's nice.

What's nicer, you can look into how these topics change over time.

So, for example, the gardening topic in her diary is very popular during summer and not very popular during winter, and it makes perfect sense.

Right? Another topic, which is about emotions, has high probability during those periods of her life when she experienced emotional events.

For example, one moment of high probability there corresponds to the period when her husband was taken to prison, somebody close to her died, and other things happened.

So, the historians can say,

"OK, this is interpretable.

We understand why this topic has high probability there."

Now, to be flexible and to apply topic models in many applications, we need to do a little bit more math.

So, first, this is the model called Latent Dirichlet Allocation,

and I guess this is the most popular topic model ever.

So, it was proposed in 2003 by David Blei, and actually any paper about topic models now cites this work.

But, you know, this is not very different from the PLSA model.

So, everything that it says is that,

''OK we will still have Phi and Theta parameters,

but we are going to have Dirichlet priors for them.''

So, the Dirichlet distribution has a rather ugly form, and you do not need to memorize it; you can always Google it.
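For reference, here is a sketch of that form, assuming a K-dimensional probability vector theta and positive parameters alpha_1, ..., alpha_K:

```latex
\mathrm{Dir}(\theta \mid \alpha)
  = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}
         {\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
  \qquad \theta_k \ge 0, \quad \sum_{k=1}^{K} \theta_k = 1
```

The normalizing constant with the Gamma functions is the "ugly" part; the essential shape is just the product of each theta_k raised to the power alpha_k minus one.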

But, the important thing here is that we say our parameters are not just fixed values; they have some distribution.

That's why, as the output of our model, we also get some distribution over the parameters.

So, not just two matrices of values, but a distribution over them.

This will be called the posterior distribution, and it will also be Dirichlet, but with some other hyperparameters.
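To make "parameters as distributions" concrete, here is a minimal sketch, assuming a toy model with 6 words, 3 topics, and 4 documents; the prior values 0.1 and 0.5 are made-up illustrations, and this only samples from the priors rather than doing any inference:

```python
import numpy as np

rng = np.random.default_rng(0)

n_words, n_topics, n_docs = 6, 3, 4

# Each topic's word distribution (a column of Phi) is a draw
# from a Dirichlet prior over the vocabulary.
phi = rng.dirichlet(alpha=[0.1] * n_words, size=n_topics).T    # words x topics

# Each document's topic distribution (a column of Theta) is a draw
# from a Dirichlet prior over the topics.
theta = rng.dirichlet(alpha=[0.5] * n_topics, size=n_docs).T   # topics x docs
```

Every column of phi and theta drawn this way is a valid probability distribution (non-negative, summing to one), and smaller alpha values give sparser draws.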

In another course of our specialization, devoted to Bayesian methods, you can learn about many ways to estimate this model and to train it.

So, here I just name a few ways.

One way would be Variational Bayes.

Another way would be Gibbs Sampling.

Both involve lots of complicated math, so we are not going into those details right now.

Instead, I'm just going to show you the main path for developing new topic models.

So, usually people use probabilistic graphical models and Bayesian inference to propose new topic models, and they say,

"OK, we will have more parameters,

we will have more priors,

and they will be connected in this and that way."

So people draw these nice pictures of what happens in the models.

And again, let us not go into the math details; instead, let us look at how these models can be applied.

Well, one extension of the LDA model would be hierarchical topic models.

So, you can imagine that you want your topics to build some hierarchy.

For example, the topic about speech recognition would be

a subtopic for the topic about algorithms.

And you see that the root topic has some very general lexis, and this is actually not surprising.

So, unfortunately, general lexis is always something that we see with high probabilities, especially for root topics.

And in some models, you can try to distill your topics: maybe we should have some separate topics for stop words, so that we don't see them in our main topics.

So we can also play with that.

Now, another important extension of topic models is Dynamic topic models.

So, these are models that say that topics can evolve over time.

So, you have some keywords for a topic in one year, and they change in another year.

Or you can see how the probability of the topics changes.

For example, you have some news flow and you know that some topic about

bank-related stuff is super popular in this month but not that popular later.

OK? One more extension, multilingual topic models.

So, a topic is something that does not really depend on the language, because mathematics, for example, exists everywhere, right?

So, we can just express it with different terms in English,

in Italian, in Russian,

and in any other language.

And this model captures this intuition.

So, we have some topics that are just the same

for every language but they are expressed with different terms.

You usually train this model on parallel data: you have two Wikipedia articles on the same topic, or, let's better say, on the same particular concept, and you know that the topics of these articles should be similar, but expressed with different terms, and that's okay.

So, we have covered some extensions of topic models, and believe me, there are many more in the literature.

So, one natural question that you might have now is whether there is a way to combine all those requirements into one topic model.

And there might be different approaches here; one approach, which we develop in our NLP Lab, is called Additive Regularization for Topic Models.

The idea is super simple.

So, we have some likelihood for the PLSA model.

Now, let us have some additional regularizers.

Let us add them to the likelihood with some coefficients.

So, all we need is to formalize our requirements as regularizers, and then tune those tau coefficients to say that, for example, we need better hierarchy rather than better dynamics in the model.

So, just to provide one example of what those regularizers can look like, imagine that we want diverse topics in our model, so it would be great to have topics that are as different as possible.

To do this, we can try to maximize the negative pairwise correlations between the topics.

So, this is exactly what is written down in the bottom formula: you take pairs of topics and try to make them as different as possible.
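As a hedged numeric sketch of such a decorrelation regularizer (the tiny phi matrices are made-up toy examples, and the formula on the slide may differ in normalization), the penalty sums the pairwise inner products of topic columns, so adding its negative to the likelihood pushes topics apart:

```python
import numpy as np

def decorrelation_regularizer(phi, tau=1.0):
    """R = -tau * sum over distinct topic pairs (t, s) of <phi[:, t], phi[:, s]>.

    phi: words x topics matrix whose columns are topic distributions.
    Values closer to zero mean more dissimilar (less correlated) topics.
    """
    gram = phi.T @ phi                      # topics x topics inner products
    off_diag = gram.sum() - np.trace(gram)  # drop the t == s terms
    return -tau * off_diag

# Two toy models: fully overlapping topics vs. completely separated ones.
phi_overlap = np.array([[0.5, 0.5],
                        [0.5, 0.5]])
phi_distinct = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
```

For `phi_distinct` the regularizer is zero (no penalty), while for `phi_overlap` it is negative, so maximizing likelihood plus regularizer favors the distinct topics.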

Now, how can you train this model?

Well, you still can use EM algorithm.

So, the E-step stays exactly the same as it was for the PLSA topic model.

The M-step changes, but very slightly.

So, the only thing that is new here is shown in green: the derivatives of the regularizers with respect to your parameters.

You need to add these terms to get the maximum likelihood estimates of the parameters at the M-step.

And this is pretty straightforward: you formalize your criteria, take the derivatives, and build them into your model.
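Here is a minimal sketch of that regularized M-step update for a single topic column, under stated assumptions: n_wt holds the expected word-topic counts from the E-step, dR_dphi is the regularizer's derivative with respect to this column, and negative values are truncated to zero before renormalizing; all the numbers are made up.

```python
import numpy as np

def m_step_column(n_wt, phi_wt, dR_dphi):
    """Regularized M-step for one topic: phi_wt proportional to
    max(0, n_wt + phi_wt * dR/dphi_wt), renormalized to sum to 1."""
    unnorm = np.maximum(0.0, n_wt + phi_wt * dR_dphi)
    total = unnorm.sum()
    if total == 0.0:
        return np.zeros_like(unnorm)
    return unnorm / total

# Toy inputs: E-step counts, current phi column, assumed regularizer gradient.
n_wt = np.array([4.0, 3.0, 1.0])
phi_wt = np.array([0.5, 0.4, 0.1])
dR_dphi = np.array([-2.0, 0.0, 1.0])

new_phi = m_step_column(n_wt, phi_wt, dR_dphi)
```

With a zero gradient this reduces to the plain PLSA frequency estimate, which is one way to sanity-check the update.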

Now, I will just show you one more example for this.

So, in many applications we need to model not only the words in the texts but also some additional modalities.

What I mean is some metadata: users, maybe authors of the papers, time stamps, categories, and many other things that can go with the documents but that are not just words.

Can we somehow build them into our model?

We can actually use absolutely the same intuition.

So, instead of one likelihood, let us have a weighted sum of likelihoods: one likelihood for every modality, weighted with some modality coefficients.

Now, what do we have for every modality?

Actually, we have a different vocabulary for each.

So, we treat the tokens of the authors modality as a separate vocabulary, and every topic will now be not only a distribution over words but a distribution over authors as well.

Or if we have five modalities,

every topic will be represented by five distinct distributions.
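A toy sketch of the weighted-likelihood idea, assuming two modalities (words and authors) that share Theta but have separate vocabularies and Phi matrices; the modality weights 1.0 and 0.5 and all counts are made-up illustrations:

```python
import numpy as np

def modality_log_likelihood(counts, phi, theta):
    """Sum over (token w, doc d) of counts[w, d] * log p(w | d), p = phi @ theta."""
    p = phi @ theta
    mask = counts > 0
    return float((counts[mask] * np.log(p[mask])).sum())

rng = np.random.default_rng(1)
n_topics, n_docs = 2, 3
theta = rng.dirichlet([1.0] * n_topics, size=n_docs).T          # topics x docs, shared

# Separate vocabularies per modality: 5 word tokens, 4 author tokens.
phi_words = rng.dirichlet([1.0] * 5, size=n_topics).T           # words x topics
phi_authors = rng.dirichlet([1.0] * 4, size=n_topics).T         # authors x topics
counts_words = rng.integers(0, 4, size=(5, n_docs)).astype(float)
counts_authors = rng.integers(0, 2, size=(4, n_docs)).astype(float)

weights = {"words": 1.0, "authors": 0.5}   # assumed modality coefficients
total = (weights["words"] * modality_log_likelihood(counts_words, phi_words, theta)
         + weights["authors"] * modality_log_likelihood(counts_authors, phi_authors, theta))
```

Training then maximizes this weighted sum instead of the single-modality likelihood; tuning the weights trades off how much each modality shapes the topics.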

One cool thing about multimodal topic models is that you can represent all these entities in the same hidden space of topics.

So, this is a way to unify all the information in your model.

For example, you can find the most probable topics for words, and the most probable topics for time stamps, let's say.

And then you can compare some time stamps and words and ask,

"What are the most similar words for this day?"

And this is an example that does exactly this.

So, we had a corpus with time stamps for the documents, and we modeled the topics both for words and for time stamps.

And we found that the closest words for the time stamp that corresponds to the Oscars date are Oscar, Birdman, and some other words that are really related to this date.

So, once again, this is a way to embed all your different modalities into one space and to build similarities between them.
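To sketch that kind of similarity search, assume each token, whether a word or a time stamp, gets a vector of topic probabilities; all the tokens and numbers below are hypothetical:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two topic-probability vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical p(topic | token) vectors in a 3-topic model.
word_topics = {
    "oscar":   np.array([0.90, 0.05, 0.05]),
    "birdman": np.array([0.80, 0.10, 0.10]),
    "potato":  np.array([0.05, 0.90, 0.05]),
}
timestamp_topics = {"oscars_day": np.array([0.85, 0.10, 0.05])}

# Rank words by similarity to the time stamp in the shared topic space.
day = timestamp_topics["oscars_day"]
ranked = sorted(word_topics, key=lambda w: cosine(word_topics[w], day), reverse=True)
```

With these toy vectors, the awards-related words rank above the gardening word, which is exactly the kind of cross-modality neighbor search described above.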

OK. Now, what would be your actions if you want to build your topic models?

Well, probably you need some libraries.

So, the BigARTM library is the implementation of the last approach that I mentioned.

Gensim and MALLET implement the online LDA topic model.

Gensim is built for Python, and MALLET is built for Java.

And Vowpal Wabbit is an implementation of the same online LDA topic model, but it is known to be super fast.

So, maybe it's also a good idea to check it out.

Now, finally, just a few words about visualization of topic models.

So, you will never read through large collections yourself, and it is not so easy to represent the output of your model, those probability distributions, in such a way that people can understand them.

So, this is an example of how to visualize the Phi matrix.

We have a words-by-topics matrix here, and you can see that we group together the words that correspond to each topic, so that we can see that this blue topic is about these terms, another one is about social networks, and so on.

But actually, the visualization of topic models is a whole world of its own.

So this website contains 380 ways to visualize your topic models.

So, I want to end this video by asking you to explore them, maybe just for a few moments,

and you will get to know that topic models can build

very different and colorful representations of your data.