To develop a better approach to the parameter learning problem, we need to develop a new version of the Viterbi algorithm. As we discussed, the Viterbi algorithm gives a hard, yes-or-no answer to the question: was the HMM in state k at time i given that it emitted the string x? We now want to give a soft answer to this question. To give the soft answer, we need to define the notion of the probability Pr(π_i = k, x), which is the unconditional probability that the hidden path passes through state k at time i and the HMM emits x. For the Crooked Casino, this translates into the following question: what is the probability that the dealer was using the fair coin at the 5th flip and emitted the given string of heads and tails? After we define Pr(π_i = k, x), we notice that it is simply the total probability of all paths passing through the node corresponding to state k in the i-th column of the HMM Viterbi graph. Note that this probability can be computed as the sum, over all possible paths π such that π_i = k, of the probabilities Pr(x, π). Also note that the sum of the probabilities Pr(π_i = k, x) over all possible states k and all possible emitted strings x equals 1. We now introduce a different probability, Pr(π_i = k | x), which is the conditional probability that the HMM was in state k at time i given that it emitted the string x. In the case of the Crooked Casino, it corresponds to the following question: what is the probability that the dealer was using the fair coin at the 5th flip, given that it emitted the given string? To compute Pr(π_i = k | x), we notice that this probability is simply the fraction of the total product weight of paths visiting the node for state k at time i over the weight of all paths in the Viterbi graph. In other words, Pr(π_i = k | x) is simply the ratio Pr(π_i = k, x) / Pr(x).
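As a sanity check on these definitions, Pr(π_i = k, x) and Pr(π_i = k | x) can be computed by brute force, summing Pr(x, π) over all hidden paths. The sketch below uses a toy Crooked Casino HMM; the transition and emission probabilities are illustrative assumptions, not necessarily the lecture's exact numbers.

```python
from itertools import product

# Toy Crooked Casino HMM (illustrative parameters): F = fair coin, B = biased coin.
states = ["F", "B"]
transition = {("F", "F"): 0.9, ("F", "B"): 0.1,
              ("B", "F"): 0.1, ("B", "B"): 0.9}
emission = {("F", "H"): 0.5,  ("F", "T"): 0.5,
            ("B", "H"): 0.75, ("B", "T"): 0.25}
initial = {"F": 0.5, "B": 0.5}  # assume equiprobable starting states

def path_prob(pi, x):
    """Pr(x, pi): joint probability of hidden path pi and emitted string x."""
    p = initial[pi[0]] * emission[(pi[0], x[0])]
    for i in range(1, len(x)):
        p *= transition[(pi[i - 1], pi[i])] * emission[(pi[i], x[i])]
    return p

def joint_prob(i, k, x):
    """Pr(pi_i = k, x): sum of Pr(x, pi) over all paths pi with pi_i = k."""
    return sum(path_prob(pi, x)
               for pi in product(states, repeat=len(x))
               if pi[i] == k)

x = "HHHHT"
# Pr(x): total probability over all hidden paths.
pr_x = sum(path_prob(pi, x) for pi in product(states, repeat=len(x)))
# Conditional probability the dealer used the fair coin at the 5th flip (i = 4):
print(joint_prob(4, "F", x) / pr_x)
```

Summing the joint probability over both states k recovers Pr(x), matching the observation that the Pr(π_i = k, x) values sum to 1 over all k and all strings x.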
Therefore, to find Pr(π_i = k | x), we need to solve the following Soft Decoding Problem: find the probability that an HMM was in a particular state at a particular moment given its output. To solve the soft decoding problem, we first need to compute the probability Pr(π_i = k, x), which is the total product weight of all paths through the Viterbi graph for x that pass through the highlighted node in the graph below. Each such path is formed by a blue subpath from the source to the highlighted node and a red subpath from the highlighted node to the sink. Therefore, Pr(π_i = k, x) is equal to the sum of the product weights of the blue paths multiplied by the sum of the product weights of the red paths. The first part of this equation, the sum of the product weights of the blue paths, is simply the expression forward(k, i) that we already know how to compute. But what is the red part of this equation? Note that the red part can be computed if we simply take the HMM Viterbi graph with all of its edges reversed. We denote the total product weight of the red paths as backward(k, i), and of course computing backward(k, i) is very similar to computing forward(k, i). Since the reversed edge connecting node (l, i + 1) to node (k, i) in the reversed graph has weight weight(k, l, i), we have the recurrence backward(k, i) = sum over all states l of backward(l, i + 1) · weight(k, l, i). Therefore, combining the forward-backward algorithm with the solution of the Outcome Likelihood Problem, we have Pr(π_i = k | x) = Pr(π_i = k, x) / Pr(x), which is simply forward(k, i) · backward(k, i) divided by forward(sink). We now know how to compute each element in this formula. Now that we have computed the values forward(k, i) and backward(k, i), let's ask a slightly different question: what is the conditional probability Pr(π_i = l, π_{i+1} = k | x)?
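The forward and backward recurrences above can be sketched in code. The HMM parameters below are the same illustrative assumptions as before (not necessarily the lecture's exact numbers), and the weight of the edge from node (k, i) to node (l, i + 1) is taken to be transition(k, l) · emission(l, x_{i+1}).

```python
# Toy Crooked Casino HMM (illustrative parameters): F = fair coin, B = biased coin.
states = ["F", "B"]
transition = {("F", "F"): 0.9, ("F", "B"): 0.1,
              ("B", "F"): 0.1, ("B", "B"): 0.9}
emission = {("F", "H"): 0.5,  ("F", "T"): 0.5,
            ("B", "H"): 0.75, ("B", "T"): 0.25}
initial = {"F": 0.5, "B": 0.5}  # assume equiprobable starting states

def forward_backward(x):
    n = len(x)
    # forward(k, i): total product weight of all paths from source to node (k, i).
    fwd = [{} for _ in range(n)]
    for k in states:
        fwd[0][k] = initial[k] * emission[(k, x[0])]
    for i in range(1, n):
        for k in states:
            fwd[i][k] = emission[(k, x[i])] * sum(
                fwd[i - 1][l] * transition[(l, k)] for l in states)
    # backward(k, i): total product weight of all paths from node (k, i) to sink,
    # i.e. forward computation in the Viterbi graph with all edges reversed.
    bwd = [{} for _ in range(n)]
    for k in states:
        bwd[n - 1][k] = 1.0
    for i in range(n - 2, -1, -1):
        for k in states:
            bwd[i][k] = sum(transition[(k, l)] * emission[(l, x[i + 1])] * bwd[i + 1][l]
                            for l in states)
    pr_x = sum(fwd[n - 1][k] for k in states)  # forward(sink) = Pr(x)
    # Pr(pi_i = k | x) = forward(k, i) * backward(k, i) / forward(sink)
    posterior = [{k: fwd[i][k] * bwd[i][k] / pr_x for k in states} for i in range(n)]
    return fwd, bwd, pr_x, posterior

fwd, bwd, pr_x, posterior = forward_backward("HHHHT")
print(posterior[4]["F"])  # Pr(dealer used the fair coin at the 5th flip | x)
```

For every column i, the products forward(k, i) · backward(k, i) sum over k to forward(sink), so the conditional probabilities in each column sum to 1.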
In other words, what is the probability that the HMM passes through a given edge in the Viterbi graph? This probability is equal to the sum of the weights of all blue subpaths in the graph, multiplied by the weight of the black edge, multiplied by the sum of the weights of all red subpaths. As a result, this expression is given by forward(l, i) · weight(l, k, i) · backward(k, i + 1) divided by forward(sink).
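The edge probability can be sketched the same way. This self-contained version recomputes forward and backward for the same toy HMM (illustrative parameters, as before) and evaluates forward(l, i) · weight(l, k, i) · backward(k, i + 1) / forward(sink).

```python
# Toy Crooked Casino HMM (illustrative parameters): F = fair coin, B = biased coin.
states = ["F", "B"]
transition = {("F", "F"): 0.9, ("F", "B"): 0.1,
              ("B", "F"): 0.1, ("B", "B"): 0.9}
emission = {("F", "H"): 0.5,  ("F", "T"): 0.5,
            ("B", "H"): 0.75, ("B", "T"): 0.25}
initial = {"F": 0.5, "B": 0.5}  # assume equiprobable starting states

def edge_posterior(x, i, l, k):
    """Pr(pi_i = l, pi_{i+1} = k | x): probability the hidden path uses
    the edge from node (l, i) to node (k, i + 1) in the Viterbi graph."""
    n = len(x)
    fwd = [{} for _ in range(n)]
    for s in states:
        fwd[0][s] = initial[s] * emission[(s, x[0])]
    for t in range(1, n):
        for s in states:
            fwd[t][s] = emission[(s, x[t])] * sum(
                fwd[t - 1][r] * transition[(r, s)] for r in states)
    bwd = [{} for _ in range(n)]
    for s in states:
        bwd[n - 1][s] = 1.0
    for t in range(n - 2, -1, -1):
        for s in states:
            bwd[t][s] = sum(transition[(s, r)] * emission[(r, x[t + 1])] * bwd[t + 1][r]
                            for r in states)
    pr_x = sum(fwd[n - 1][s] for s in states)  # forward(sink)
    # weight(l, k, i): weight of the edge from (l, i) to (k, i + 1).
    weight = transition[(l, k)] * emission[(k, x[i + 1])]
    return fwd[i][l] * weight * bwd[i + 1][k] / pr_x

x = "HHHHT"
print(edge_posterior(x, 0, "F", "B"))  # Pr(fair at flip 1, biased at flip 2 | x)
```

Since every path uses exactly one edge between columns i and i + 1, these edge probabilities sum to 1 over all pairs (l, k).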