So, good day. We are the 3rd group. Today, Iâll introduce some examples of Hidden Markov Model. So letâs start from a very simple example. It is adapted from Wikipedia. Imagine you have a girlfriend. You meet her every night, And when you meet her, she kisses you or beats you or does nothing. What she does depends on her feeling during the daytime. But you do not know how she is feeling during the daytime because you cannot meet her in the daytime. So every night you meet her, she does something And you would like to get the answer. So how is she feeling during the daytime? Happy or unhappy? So we build a Hidden Markov Model for this example because you do not know how she is feeling, happy or unhappy. It is a hidden state. But she kisses you or beats you or does nothing when you meet her. So you can know this information. It is observation. So, what we are going to do is using the hidden information, using the observation to infer the hidden state. Your girlfriend has some parameters. The 1st day, she has the probability of 0.6 happy and probability of 0.4 unhappy. During the following days, she is happy or unhappy depends on the previous day. If she is happy in the previous day, she will be happy in the probability of 0.7 and unhappy in the probability of 0.3. And unhappy is the similar. So this is the transition probability. And um, if she is happy, she has the probability of 0.5 to kiss you, and probability of 0.4 to beat you and 0.1 probability to do nothing. And unhappy is the similar. So this is the initiation probability. Now you have met you girlfriend in the continuous 3 days. In the first day, she will kiss you, second day beat you and the third day she will do nothing. So how is she during the three days? Happy or unhappy? So we will use dynamic programming algorithm, called the Viterbi algorithm, to get the answer. I think we are already familiar with it In the first day, there is a probability 0.6 if she is happy. And if she is happy, the probability is 0.5 when she kiss you And so the probability that she is happy in the first day and kiss you at the first night is 0.3 And similarly, as the same as the unhappy state that she beats you at the second night. So during the second night ,she beats you. So if she is happy in the first day, it is the probability of 0.7 that she is still happy in the second day. If she is happy in the second day the probability that she beats you is 0.4. and so, if she is happy and kiss you in the first day while she is happy in the second day but beat you, the probability should be 0.084. And the other three situations should be similar We get the higher happy probability 0.084. The higher unhappy probability is 0.0072. So we get this path and also this path for the unhappy statistic. This is similar in the third day. So now by the high probability we get this path. So it is mostly likely that she is happy in the first day, still happy in the second day and unhappy in the third day. So we use the Hidden Markov Models to solve this question. So now letâs talk about DNA sequences. um So imagine the 5' splice site recognition problem. Assume we are given DNA sequence that begins with exon that contains one 5' splice site and ends in the intron. So we are going to identify where the exon switches to the intron. So the information we can get is based on the sequence statistics of bases. So for every base pair it has 1 of these 3 hidden states. It may be exon, 5 splice site and intron. And we do not know the hidden states. We can only know the observations A, C, G or T. So we have the start probability of 100% to start at the exon. And this is the transition probability. If one base pair is exon, the following base pair has 0.9 probability of exon and 0.1 probability to become 5 splice site. And this is similar in 5 splice site and intron. So this is transition probability. And the emission probability is based on the base pair composition difference between the exon, 5 splice site and intron. So we assume in the exon all base pairs have equal frequencies. For the 5 splice site, it should always be G. The intron should be AT rich. So this is Hidden Markov Model. And then we solve the problem using the just talked dynamic programming algorithm. And we find the most likely path. So it has highest logP(P=probability) and has the 5 splice site here. So then we talk about gene modifying, we will start with very simplified model, So it only consider about the coding region, for every base pair it may have nine hidden states: the first base pair of the start codon, the second and the third base pair o of the start codon, the first, second, third base pair of the common codon and the stop codon. And the operation is also ACTG. The transition possibility, so we know if the first base pair is the first base of the start codon, so on what we do not know is, when the base pair is the third base pair of a codon,it may return to the first base pair of a codon or go to the stop codon So we also get the transition possibility based on the different base pair composition on the first, second and third position of a codon also on the start and stop codon. So now⦠this is the simplified codon region in the Hidden Markov model. It is very simple to add some states, We add one state in intragenic states and eight states before the start codon and three base pairs following the start codon. And also three base pairs before the stop codon and eight base pairs after the stop codon. So it became a more complex Hidden Markov model, it faces a prokaryotic unsliced gene. So for every base pair, it may have so many hidden states. The operation is the still four base pair, ACGT. but the observation is still the four basepair And the transition possibility and the emission possibility we do not shown here, it is also based on the DNA sequence states. And the transition possibility and the emission possibility we do not shown here, it is also based on the DNA sequence states. At the the first line...we now look at the first line,there are introns If the intron start just after the third basepair of the codon,it is a first line If the intron start after first basepair of condon.It is the second line This is the coding region so the start codon should be here ,it enters the coding region, and it may have three paths to enter the introns,go back to exons when the model in the exons, there are some probability to the stop codon Then we just add several states for promoter, several states for 5â UTR, and also several states for 3â UTR and poly A and inter-genic regionï¼and we will get the normal eukaryotic gene model. Having this model, we can predict genes in a sequence of anonymous DNA. We also just talk about the dynamic reprogramming algorithm, and we go through the Hidden Markov Chain. When the chain goes through a promoter state, a promoter is detected. When the chain goes through a stop codon state, the stop codon is detected, and so on. and so on So it is obviously there are so many possible variations of the model. We can add more states as we need. Maybe the model cannot detect every DNA (sequence) correctly, but it can detect DNA fit this model. So I finished, Thank you very much