In Section 7.3, we prove the converse of the channel coding theorem. In the communication system that we are studying, the random variables W, X_1, Y_1, X_2, Y_2, all the way to X_n, Y_n, and W hat are generated in this order. The memorylessness of the DMC imposes the following Markov constraint for each i: given X_i, Y_i is independent of all the random variables generated before X_i, namely W, X_1, Y_1, all the way to X_{i-1}, Y_{i-1}. The dependency graph illustrating these Markov constraints can be constructed accordingly.

We now see how the dependency graph can be constructed. First, as we have mentioned, the random variables involved in this problem are generated in the following order: W, X_1, Y_1, X_2, Y_2, X_3, Y_3, all the way to X_n, Y_n, and W hat. Let q denote the joint distribution of all the random variables. Such a general distribution q can be factorized as q(w) times q(x_1|w) times q(y_1|w,x_1) times q(x_2|w,x_1,y_1) times q(y_2|w,x_1,y_1,x_2), and so on. This factorization is valid as long as the conditioning events all have nonzero probability; otherwise, the joint probability is equal to 0.

Now consider the conditional distributions one by one. For q(y_1|w,x_1), by the definition of the DMC, once X_1 is given, Y_1 is independent of W. For q(x_2|w,x_1,y_1), because X_2 is a function of W, once W is given, X_2 is independent of X_1 and Y_1. For q(y_2|w,x_1,y_1,x_2), again by the definition of the DMC, once X_2 is given, Y_2 is independent of W, X_1, and Y_1. Accordingly, we have q(w) times q(x_1|w), where q(x_1|w) corresponds to this edge in the dependency graph, so one can think of passing the random variable W through a channel to obtain the random variable X_1.
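As a numerical illustration of this Markov constraint, the following sketch builds the joint distribution of a small system in generation order and checks that, given X_2, the variable Y_2 is independent of (W, X_1, Y_1). All concrete choices here, namely a uniform binary message, a length-2 repetition encoder, and a BSC with crossover probability 0.1, are illustrative assumptions and are not taken from the lecture.

```python
import itertools

EPS = 0.1                                    # BSC crossover probability (assumed)
CODE = {0: (0, 0), 1: (1, 1)}                # hypothetical encoder x(w): repetition code
p = lambda y, x: 1 - EPS if y == x else EPS  # channel transition probability p(y|x)

# Joint distribution q(w, x1, y1, x2, y2), built in generation order:
# W uniform, X deterministic in W, each Y_i drawn from p(.|x_i).
q = {}
for w in (0, 1):
    x1, x2 = CODE[w]
    for y1, y2 in itertools.product((0, 1), repeat=2):
        q[(w, x1, y1, x2, y2)] = 0.5 * p(y1, x1) * p(y2, x2)

def marg(pred):
    """Sum q over all outcomes satisfying the predicate."""
    return sum(v for k, v in q.items() if pred(*k))

# Markov constraint for i = 2: q(y2 | w, x1, y1, x2) = p(y2 | x2),
# i.e. given X2, Y2 is independent of everything generated before X2.
for w, x1, y1, x2, y2 in q:
    denom = marg(lambda W, X1, Y1, X2, Y2: (W, X1, Y1, X2) == (w, x1, y1, x2))
    assert abs(q[(w, x1, y1, x2, y2)] / denom - p(y2, x2)) < 1e-12
print("q(y2 | w, x1, y1, x2) = p(y2 | x2) verified")
```

Any other deterministic encoder and DMC transition matrix would pass the same check; memorylessness is all that is used.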
Now, the conditional distribution q(y_1|x_1) is just p(y_1|x_1), the transition probability of the channel, and this corresponds to this edge in the dependency graph. So we can think of passing X_1 through the channel p(y|x) to obtain the random variable Y_1. Likewise, we have q(x_2|w), which corresponds to this edge, and p(y_2|x_2), which corresponds to this edge. Continuing in this fashion, we can construct all the other edges, and this completes the construction of the dependency graph. Thus, there is a one-to-one correspondence between the dependency graph and the Markov structure of the random variables in the problem.

We continue to use q to denote the joint distribution and marginal distributions of all the random variables. In the last slide, we worked out a factorization of the joint distribution q. More precisely, for all quadruples (w, x, y, w hat) such that q(x) is greater than 0 and q(y) is greater than 0, we have the factorization q(w,x,y,w hat) equals q(w), times the product over i from 1 up to n of q(x_i|w), times the product over i from 1 up to n of p(y_i|x_i), times q(w hat|y). We have seen how to construct the dependency graph from this factorization of the joint distribution q; conversely, this factorization can also be obtained directly from the dependency graph. Note that q(w) is greater than 0 for all w, because the message is chosen uniformly, so that q(x_i|w) is well defined. Also note that q(x_i|w) and q(w hat|y) are deterministic. The dependency graph suggests the Markov chain W, X, Y, W hat, and this can be formally justified by invoking Proposition 2.9: all we need to note is that the first term q(w) depends only on w, the second term depends only on w and x, the third term depends only on x and y, and the last term depends only on y and w hat. We are going to prove two propositions that arise from the Markov structure of the problem.
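As a small sanity check of the Markov structure suggested by the dependency graph, the sketch below computes the conditional mutual information I(W; Y | X) from the joint distribution of a toy system and confirms that it vanishes, which is exactly the statement that W, X, Y form a Markov chain. The message distribution, encoder, and channel here are hypothetical toy choices, not from the lecture.

```python
import itertools
from math import log2

EPS = 0.1                                    # BSC crossover probability (assumed)
CODE = {0: (0, 0), 1: (1, 1)}                # hypothetical encoder x(w)
p = lambda y, x: 1 - EPS if y == x else EPS  # channel transition probability p(y|x)

# Joint q(w, x, y) with W uniform, X = CODE[W], and a memoryless channel.
q = {}
for w in (0, 1):
    x = CODE[w]
    for y in itertools.product((0, 1), repeat=2):
        q[(w, x, y)] = 0.5 * p(y[0], x[0]) * p(y[1], x[1])

def marg(keep):
    """Marginal distribution over the coordinates named in `keep` ('w', 'x', 'y')."""
    out = {}
    for (w, x, y), v in q.items():
        key = tuple({'w': w, 'x': x, 'y': y}[c] for c in keep)
        out[key] = out.get(key, 0.0) + v
    return out

# I(W; Y | X) = sum_{w,x,y} q(w,x,y) log [ q(x) q(w,x,y) / (q(w,x) q(x,y)) ]
qx, qwx, qxy = marg('x'), marg('wx'), marg('xy')
i_wy_given_x = sum(
    v * log2(qx[(x,)] * v / (qwx[(w, x)] * qxy[(x, y)]))
    for (w, x, y), v in q.items() if v > 0)
assert abs(i_wy_given_x) < 1e-12
print("I(W; Y | X) = 0, so W -> X -> Y is a Markov chain")
```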
The first proposition says that for a sequence x such that q(x) is greater than 0, q(y|x) is equal to the product over i from 1 up to n of p(y_i|x_i). This means that the probability of receiving a particular sequence y, given that a particular sequence x is transmitted, is equal to the product of the conditional probabilities for the individual transmissions.

The proof goes as follows. First, for all sequences x and y such that q(x) is greater than 0 and q(y) is greater than 0, we have q(x,y) equals the sum over w and the sum over w hat of q(w,x,y,w hat). Applying our factorization of q(w,x,y,w hat), we obtain this expression. Here, note that in the summation the first three factors do not depend on w hat, so we can move them outside the summation over w hat. Now the sum over w hat of q(w hat|y) is equal to 1, because q(w hat|y) is a conditional distribution. Furthermore, the product over all i of p(y_i|x_i) does not depend on w, so we can move it outside the summation over w. Next, q(x) can be obtained from q(x,y) by summing over all y, and applying the formula we have obtained for q(x,y), we have this. Now, the expression in the first square bracket does not depend on y, so we can move it outside the summation over y, which we write out explicitly as the sum over y_1, the sum over y_2, all the way to the sum over y_n. We now claim that the sum over y_1, y_2, all the way to y_n of the product over all i of p(y_i|x_i) is equal to the product over all i of the sum over y_i of p(y_i|x_i). We illustrate this for the case when n is equal to 2. For this case, we have the sum over y_1 and the sum over y_2 of the product over i from 1 up to 2 of p(y_i|x_i). Writing out the product, we have the sum over y_1 and y_2 of p(y_1|x_1) times p(y_2|x_2). This can be factorized as the product of two summations, because p(y_1|x_1) does not depend on y_2 and p(y_2|x_2) does not depend on y_1. So we can write this as the product over i from 1 up to 2 of the sum over y_i of p(y_i|x_i).
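The first proposition can also be verified numerically. The sketch below builds the joint distribution q(w, x, y) for a toy system (a uniform binary message, a hypothetical length-2 repetition encoder, and a BSC with crossover probability 0.1, all assumed for illustration), computes q(y|x) from the marginals, and checks that it equals the product of the per-letter transition probabilities:

```python
import itertools

EPS = 0.1                                    # BSC crossover probability (assumed)
CODE = {0: (0, 0), 1: (1, 1)}                # hypothetical encoder x(w)
p = lambda y, x: 1 - EPS if y == x else EPS  # channel transition probability p(y|x)

# Joint q(w, x, y) for the toy code over a memoryless channel.
q = {}
for w in (0, 1):
    x = CODE[w]
    for y in itertools.product((0, 1), repeat=2):
        q[(w, x, y)] = 0.5 * p(y[0], x[0]) * p(y[1], x[1])

# Marginals q(x, y) and q(x), obtained by summing out w (and then y).
qxy, qx = {}, {}
for (w, x, y), v in q.items():
    qxy[(x, y)] = qxy.get((x, y), 0.0) + v
    qx[x] = qx.get(x, 0.0) + v

# Proposition: for every x with q(x) > 0, q(y|x) = prod_i p(y_i|x_i).
for (x, y), v in qxy.items():
    assert abs(v / qx[x] - p(y[0], x[0]) * p(y[1], x[1])) < 1e-12
print("q(y|x) = prod_i p(y_i|x_i) verified")
```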
Now, for each i, the sum over y_i of p(y_i|x_i) is equal to 1, because p(y_i|x_i) is a conditional distribution. Therefore, q(x) equals the sum over w of q(w) times the product over i of q(x_i|w). Hence, for x such that q(x) is greater than 0, q(y|x) is equal to q(x,y) divided by q(x). By comparing the two expressions, we find that this factor cancels with this one, and what is left is the product over all i of p(y_i|x_i). This proves the proposition.

The next proposition we are going to prove is a consequence of the previous proposition. It says that the conditional entropy of the received sequence Y, given the transmitted sequence X, is equal to the sum over i from 1 up to n of the conditional entropy of Y_i given X_i. For the proof, we first observe that for any (x,y), if q(x,y) is greater than 0, then q(x) is greater than 0. Thus, by the above proposition, Equation (1) holds. Therefore, by Equation (1), minus the expectation of log q(Y|X) is equal to minus the expectation of the log of the product over i from 1 up to n of p(Y_i|X_i), which is equal to the sum over i from 1 up to n of minus the expectation of log p(Y_i|X_i). Now, minus the expectation of log q(Y|X) is simply the entropy of Y given X, and minus the expectation of log p(Y_i|X_i) is simply the entropy of Y_i given X_i. So we have proved the assertion that the entropy of Y given X is equal to the sum over i from 1 up to n of the entropy of Y_i given X_i.
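The identity H(Y|X) = sum over i of H(Y_i|X_i) can likewise be checked numerically. The sketch below computes the block conditional entropy directly from the joint distribution and compares it with the sum of the per-letter conditional entropies; as before, the uniform binary message, the length-2 repetition encoder, and the BSC with crossover 0.1 are assumed toy choices, not from the lecture.

```python
import itertools
from math import log2

EPS = 0.1                                    # BSC crossover probability (assumed)
CODE = {0: (0, 0), 1: (1, 1)}                # hypothetical encoder x(w)
p = lambda y, x: 1 - EPS if y == x else EPS  # channel transition probability p(y|x)

# Marginal q(x, y) for the toy code over a memoryless channel (w summed out).
qxy = {}
for w in (0, 1):
    x = CODE[w]
    for y in itertools.product((0, 1), repeat=2):
        qxy[(x, y)] = qxy.get((x, y), 0.0) + 0.5 * p(y[0], x[0]) * p(y[1], x[1])

qx = {}
for (x, y), v in qxy.items():
    qx[x] = qx.get(x, 0.0) + v

# Block conditional entropy H(Y|X) = -sum q(x,y) log q(y|x), in bits.
h_y_given_x = -sum(v * log2(v / qx[x]) for (x, y), v in qxy.items() if v > 0)

# Sum of per-letter conditional entropies H(Y_i|X_i).
h_per_letter = 0.0
for i in range(2):
    qxiyi, qxi = {}, {}
    for (x, y), v in qxy.items():
        qxiyi[(x[i], y[i])] = qxiyi.get((x[i], y[i]), 0.0) + v
        qxi[x[i]] = qxi.get(x[i], 0.0) + v
    h_per_letter += -sum(v * log2(v / qxi[xi])
                         for (xi, yi), v in qxiyi.items() if v > 0)

assert abs(h_y_given_x - h_per_letter) < 1e-12
print("H(Y|X) = sum_i H(Yi|Xi) verified")
```

For a BSC both sides reduce to n times the binary entropy of the crossover probability, here 2 times h_b(0.1).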