One simple yet extraordinarily useful class of probabilistic temporal models is the class of hidden Markov models. Although these models can be viewed as a subclass of dynamic Bayesian networks, we'll see that they have their own type of structure that makes them particularly useful for a broad range of applications.

A hidden Markov model in its simplest form can be viewed as a probabilistic model that has a single state variable S and a single observation variable O. So the model really has only two probabilistic pieces: there is the transition model, which tells us how the state transitions from one time point to the next, and there is the observation model, which tells us how likely we are to see different observations in a given state. Now, you can unroll this simple 2TBN to produce an unrolled network with the same repeated structure: the state at time zero transitions to the state at time one, and so on, and at each state we make an appropriate observation.

But what's interesting about hidden Markov models is that they often have a lot of internal structure that manifests most notably in the transition model, but sometimes also in the observation model. Here is an example of what a structured transition model would look like. This entire diagram is actually peering into the CPD of the next state given the current state. Each of the nodes in it is not a random variable; rather, it is a particular assignment to the state variable, a particular state that the model might be in. What you see is the structure of the transition matrix, which tells us, for example, that from S1 the model is likely to transition to S2 with probability 0.7 or stay in S1 with probability 0.3. These two outgoing probabilities have to sum to one, because this is a probability distribution over the next state given that, at the current time point, the model is in state S1. We similarly have that for all other states; from S4, for example, there is a probability of 0.9 of going to S2 and 0.1 of staying at S4. So here the structure is really sparsity in the transition model, as opposed to something that manifests at the level of the 2TBN structure, which is actually fairly simple (a small code sketch of this transition matrix appears just below).

It turns out that this kind of structure is useful for a broad range of applications: robot localization; speech recognition, where HMMs are really the method of choice for all current speech recognition systems; biological sequence analysis, for example annotating a DNA sequence with elements that are functionally important and other elements that are not; and similarly annotating a text sequence, for example annotating words with their role in the sentence. All of these are applications where hidden Markov models have been used with great success.
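To make the sparsity of that transition model concrete, here is a minimal Python sketch. Only the S1 and S4 rows (0.3/0.7 and 0.1/0.9) come from the example above; the S2 and S3 rows are invented placeholders so that every row is a valid distribution, and the sampling loop just illustrates what unrolling the 2TBN over time corresponds to.

```python
import numpy as np

# Sparse transition model from the example above. Only the S1 and S4 rows
# are given in the lecture; the S2 and S3 rows are invented placeholders.
states = ["S1", "S2", "S3", "S4"]
transition = np.array([
    [0.3, 0.7, 0.0, 0.0],   # S1: stay with 0.3, move to S2 with 0.7
    [0.0, 0.2, 0.8, 0.0],   # S2: placeholder values
    [0.0, 0.0, 0.5, 0.5],   # S3: placeholder values
    [0.0, 0.9, 0.0, 0.1],   # S4: move to S2 with 0.9, stay with 0.1
])
assert np.allclose(transition.sum(axis=1), 1.0)  # each row is a distribution over the next state

# Unrolling the 2TBN over time corresponds to repeatedly sampling S' ~ P(S' | S).
rng = np.random.default_rng(0)
s = 0  # start in S1
trajectory = [states[s]]
for _ in range(10):
    s = rng.choice(len(states), p=transition[s])
    trajectory.append(states[s])
print(trajectory)
```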
So let's look, for example, at what the HMM for robot localization would look like. This might not look exactly like an HMM to begin with, because it has some additional variables, so let's talk about what each of these variables means. Here, what we have is a state variable that represents the robot pose, that is, the position and potentially the orientation of the robot within a map at each point in time.

We also have an external control signal u, which is the guidance the robot is given: move left, move right, turn around. Since these variables are observed and externally imposed, they're not really stochastic random variables; they are just inputs to the system, if you will. And then we have, in addition, the observation variables, which are what the robot observes at each point in the process. These depend both on the robot's position and, of course, on the map, so the robot's observations depend on the overall architecture of the space it's in as well as the state it's in. Now, since in many of the applications we'll be considering the map is observed, which is why I've grayed it out, you can effectively think of the basic structure over which we're reasoning as just the set of variables that represent the S's and the O's, which is why this is really a slight elaboration of the standard HMM model. And here also we're going to have sparsity in the transition model, because the robot can't jump around from one state to another within a single step, and so there is only a limited set of positions it can be in at time t+1 given where it is at time t.
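Here is a minimal sketch of what such a control-conditioned, sparse transition model might look like for a robot moving on a small grid. The grid size, the success probability, the control names, and the helper function are all assumptions made purely for illustration; the point is just that P(S_{t+1} | S_t, U_t) puts mass on only a handful of neighboring poses.

```python
# A hypothetical control-conditioned transition model for localization on a grid.
WIDTH, HEIGHT = 5, 4
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

def transition_probs(pose, u, p_success=0.8):
    """P(S_{t+1} | S_t = pose, U_t = u): the robot either executes the commanded
    move or stays put, so only two poses can have nonzero probability."""
    x, y = pose
    dx, dy = MOVES[u]
    nx = min(max(x + dx, 0), WIDTH - 1)    # clip at the map boundary
    ny = min(max(y + dy, 0), HEIGHT - 1)
    probs = {}
    probs[(nx, ny)] = probs.get((nx, ny), 0.0) + p_success
    probs[(x, y)] = probs.get((x, y), 0.0) + (1.0 - p_success)
    return probs  # sparse: every other pose has probability zero

print(transition_probs((2, 1), "right"))   # mass on (3, 1) and on staying at (2, 1)
print(transition_probs((4, 1), "right"))   # bumping the wall: all mass stays at (4, 1)
```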
Speech recognition, as I mentioned, is perhaps the prototypical HMM success story, because it's in such common use. The speech recognition problem is to take a sentence, such as "Dorothy lived in...", and imagine that somebody is saying it; what is given as input to the probabilistic reasoning system is a very noisy acoustic signal that represents the values of the different acoustic frequencies at each point in time. What we would like to do is take that input and put it into a decoder that is going to evaluate different possible sentences and hopefully guess what the source sentence is, predicting it with reasonable accuracy.

So how does speech recognition work? This is what an acoustic signal looks like: at each point in time we have these frequencies, and we can convert that to a frequency spectrum using a Fourier transform. What we would like to do is break up this continuous signal into pieces that correspond to words and recognize, for each piece, which word it corresponds to. This is a twofold problem, because in general, in continuous speech, we have to identify the boundaries between words at the same time as we are trying to identify the words themselves. It turns out that the way to do this is not to think about words as the basic unit, but rather to think about smaller entities called phones, or phonemes, as the basic units from which words are composed; then, as we recognize phones, we can put them together to figure out what the words are. So here is a phonetic alphabet, one of several, that is used to define how words are broken up into these little pieces. You can see that these phones don't line up exactly with characters in the alphabet, because the same character can be pronounced in different ways, giving rise to different phones, and vice versa, sometimes the same phone can come from different letters. For example, there is a phone called ER, which is pronounced the same way as the "ur" in "hurt". So this is a phonetic alphabet, and as I said, there are many of them that one can consider.

So once we have the phonetic alphabet, we can look at a word; in this case, this is the phonetic breakdown of the word "nine": n, ay, n. You can see that this is an HMM structure (this is not the DBN, this is the HMM) that tells us, at each point in time, which phone we're in: n, ay, or n. There is a self-transition loop, because you can stay in the same phone for more than one time unit, and then eventually you transition to the next phone, and the next, and this is a typical HMM for a word. Now, a phone also lasts a certain amount of time, and so within a given phone there is also a transition model. In this case, the phone has three states: the beginning, the middle, and the end. This is a fairly standard breakdown for most phones, and each of these states typically corresponds to a different distribution over the acoustic signal that you see at that stage of the phone.

So if you put all of these together (and this is, for the moment, speech recognition for isolated words), there is a start state, and there is a transition model from the start state that tells us how likely we are to go from the current state into one of many words; that would be a language model, which tells us how likely different words are to occur at this point. Then, given that the model has transitioned to, say, the word "one", we can see that across different points in time the model transitions to the w, possibly staying there for a while, then on through the remaining phones, and finally to the n, where again it may stay for several time steps. At the end of the word, it transitions to an end state, at which point the process circles back and we move on to the next word. This self-transition, and the transition back to the start state, is what allows us to do continuous speech recognition. So this is a probabilistic model that tells us how speech might be constructed: first words, then phones within words, and then finally the little pieces of the phone that we saw in the phone HMM previously. This is a generative model of speech, but as you feed in evidence about the observed acoustic signal and run probabilistic inference over this model, what you get out is the most likely set of words that gave rise to the observed speech signal; a small sketch of that decoding step is given at the end of this section.

To summarize, HMMs can, in some simplistic way, be viewed as a subclass of the framework of dynamic Bayesian networks. And while they seem unstructured when we view them at that level, that is, when we think of structure at the level of random variables, there is a lot of structure that manifests in the sparsity of the conditional probabilities and also in terms of repeated elements. As we saw in the previous slide, the phone model can occur in multiple words, and we replicate that structure across the different places where the same phone is used in the language. So that gives a lot of structure in the transition model that really doesn't manifest in any way at the level of the random variables. And we've seen that HMMs are used in a wide variety of applications for sequence modeling, and they're probably one of the most used forms of probabilistic graphical models out there.
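Finally, here is the promised sketch of the decoding step, that is, recovering the most likely hidden state sequence from the observations using the Viterbi (max-product) recursion. The two-state, two-symbol toy model and its numbers are invented for the example and are not the phone or word HMMs from the lecture; only the form of the recursion is the point.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """pi: initial state distribution (K,); A: transition matrix (K, K);
    B: observation model (K, num_symbols); obs: list of observed symbol ids.
    Returns the most likely hidden state sequence."""
    K, T = len(pi), len(obs)
    log_delta = np.log(pi) + np.log(B[:, obs[0]])   # best log-prob of paths ending in each state
    back = np.zeros((T, K), dtype=int)              # backpointers
    for t in range(1, T):
        scores = log_delta[:, None] + np.log(A)     # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Trace the backpointers from the best final state.
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Invented two-state, two-symbol toy model (not the speech HMM discussed above).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(viterbi(pi, A, B, obs=[0, 0, 1, 1, 0]))   # -> [0, 0, 1, 1, 0]
```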