In Section 2.4, we talk about chain rules for information measures. First we have the chain rule for entropy, which says that the entropy of X_1, X_2, up to X_n is equal to the summation from i = 1 to n of the entropy of X_i conditioned on X_1, X_2, up to X_{i-1}.

As an illustration, we look at some examples. For n = 2, the chain rule says that the entropy of X_1, X_2 is equal to the entropy of X_1 plus the entropy of X_2 given X_1; this is just Proposition 2.6, which we have seen before. For n = 3, it says that the entropy of X_1, X_2, X_3 is equal to the entropy of X_1, plus the entropy of X_2 conditioned on X_1, plus the entropy of X_3 conditioned on X_1 and X_2.

Next we have the chain rule for conditional entropy, which is exactly the same as the chain rule for entropy, except that every term is now further conditioned on a random variable Y.

We also have a chain rule for mutual information, which says that the mutual information between (X_1, X_2, ..., X_n) and Y is equal to the summation from i = 1 to n of the mutual information between X_i and Y conditioned on X_1, X_2, up to X_{i-1}.

Again, we look at some examples. For n = 2, the chain rule says that the mutual information between (X_1, X_2) and Y is equal to the mutual information between X_1 and Y, plus the mutual information between X_2 and Y given X_1. For n = 3, it says that the mutual information between (X_1, X_2, X_3) and Y is equal to the mutual information between X_1 and Y, plus the mutual information between X_2 and Y given X_1, plus the mutual information between X_3 and Y given X_1 and X_2.

Then we also have the chain rule for conditional mutual information, which is exactly the same as the chain rule for mutual information, except that every term is now further conditioned on a random variable Z.

We will now prove the chain rule for conditional entropy. First, the conditional entropy of X_1, X_2, up to X_n given Y is equal to the entropy of X_1, X_2, up to X_n and Y, minus the entropy of Y.
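The chain rule for entropy stated above can be checked numerically. The sketch below (not part of the lecture; the joint pmf and helper names are my own choices for illustration) verifies H(X_1, X_2, X_3) = H(X_1) + H(X_2 | X_1) + H(X_3 | X_1, X_2) on a small distribution, computing each conditional entropy directly from the conditional distributions rather than by telescoping:

```python
import math

def H(p):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marginal(joint, idx):
    """Marginalize {outcome_tuple: prob} onto the coordinates listed in idx."""
    m = {}
    for outcome, q in joint.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + q
    return m

def cond_entropy(joint, target, given):
    """H(X_target | X_given), computed as sum_v p(v) * H(target | given = v)."""
    pv = marginal(joint, given)
    total = 0.0
    for v, q in pv.items():
        cond = {}  # conditional pmf of the target coordinates given v
        for outcome, w in joint.items():
            if tuple(outcome[i] for i in given) == v:
                key = tuple(outcome[i] for i in target)
                cond[key] = cond.get(key, 0.0) + w / q
        total += q * H(cond)
    return total

# An arbitrary joint pmf on (X1, X2, X3), for illustration only.
joint = {
    (0, 0, 0): 0.125, (0, 0, 1): 0.0625, (0, 1, 0): 0.0625, (0, 1, 1): 0.25,
    (1, 0, 0): 0.25,  (1, 0, 1): 0.125,  (1, 1, 0): 0.0625, (1, 1, 1): 0.0625,
}

# Chain rule: H(X1, X2, X3) = H(X1) + H(X2 | X1) + H(X3 | X1, X2).
lhs = H(joint)
rhs = sum(cond_entropy(joint, (i,), tuple(range(i))) for i in range(3))
print(abs(lhs - rhs) < 1e-9)  # True
```

The two sides agree up to floating-point error, as the chain rule guarantees for any joint pmf.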
This is just applying the chain rule in reverse. Next we group the random variables X_1 and Y into a pair, obtaining the entropy of (X_1, Y), X_2, all the way up to X_n, minus the entropy of Y. Then we apply the chain rule to obtain the entropy of X_1, Y, plus the summation from i = 2 to n of the entropy of X_i conditioned on (X_1, Y), X_2, all the way up to X_{i-1}, minus the entropy of Y. We then move Y to the end of each conditioning. Now the entropy of X_1, Y minus the entropy of Y is equal to the entropy of X_1 given Y, so the entropy of X_1 given Y is actually the first term of the summation. By combining these two terms, we can finally write the summation from i = 1 to n of the entropy of X_i conditioned on X_1, X_2, up to X_{i-1}, and Y.

We now offer an alternative proof of Proposition 2.25, which gives different insights. First, the conditional entropy of X_1, X_2, up to X_n given Y can be written as the summation over y of p(y) times the entropy of X_1, X_2, up to X_n conditioned on a particular y. Because we are conditioning on a particular y, we can apply the unconditional version of the chain rule to obtain the summation over y of p(y) times the summation from i = 1 to n of the entropy of X_i given X_1, X_2, up to X_{i-1} and a particular y. Now we switch the order of the two summations, which can be justified, to obtain the summation from i = 1 to n of the summation over y of p(y) times the entropy of X_i given X_1, X_2, up to X_{i-1} and a particular y. The inner sum is just the conditional entropy of X_i given X_1 up to X_{i-1} and Y, and so we have proved Proposition 2.25.

This alternative proof explains why Proposition 2.25, namely the conditional version of the chain rule, can be obtained from Proposition 2.24, namely the unconditional version of the chain rule, by conditioning on Y throughout the formula.
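The key step of the alternative proof, applying the unconditional chain rule inside each slice Y = y and then averaging over y with weights p(y), can be traced numerically. The sketch below (my own illustration, with an arbitrary joint pmf; not from the lecture) does this for n = 2: it checks that H(X_1, X_2 | Y) equals the p(y)-weighted average of H(X_1 | Y = y) + H(X_2 | X_1, Y = y):

```python
import math

def H(p):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

# An arbitrary joint pmf on (X1, X2, Y); keys are (x1, x2, y). Illustration only.
joint = {
    (0, 0, 0): 0.10, (0, 1, 0): 0.15, (1, 0, 0): 0.05, (1, 1, 0): 0.20,
    (0, 0, 1): 0.20, (0, 1, 1): 0.05, (1, 0, 1): 0.15, (1, 1, 1): 0.10,
}

ys = sorted({y for (_, _, y) in joint})
py = {y: sum(q for (x1, x2, yy), q in joint.items() if yy == y) for y in ys}

def given_y(y):
    """Conditional pmf p(x1, x2 | y)."""
    return {(x1, x2): q / py[y] for (x1, x2, yy), q in joint.items() if yy == y}

# Left-hand side: H(X1, X2 | Y) = sum_y p(y) H(X1, X2 | Y = y).
lhs = sum(py[y] * H(given_y(y)) for y in ys)

# Right-hand side: inside each slice Y = y, apply the *unconditional* chain
# rule H(X1, X2 | y) = H(X1 | y) + H(X2 | X1, y), then average over y.
rhs = 0.0
for y in ys:
    cond = given_y(y)                       # p(x1, x2 | y)
    p_x1 = {}
    for (x1, x2), q in cond.items():
        p_x1[x1] = p_x1.get(x1, 0.0) + q    # p(x1 | y)
    term = H(p_x1)                          # H(X1 | Y = y)
    for x1, q1 in p_x1.items():             # + H(X2 | X1, Y = y)
        p_x2 = {x2: q / q1 for (u, x2), q in cond.items() if u == x1}
        term += q1 * H(p_x2)
    rhs += py[y] * term
print(abs(lhs - rhs) < 1e-9)  # True
```

This mirrors the proof exactly: the inner loop is the unconditional chain rule applied to p(x1, x2 | y), and the outer weighted sum over y is what turns each "given a particular y" term into a conditional entropy given the random variable Y.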