Section 2.3 is about the continuity of Shannon's information measures for fixed finite alphabets. This topic is actually beyond the scope of this course, but we can nevertheless develop some appreciation for the issues involved. It turns out that when it comes to continuity, there is a huge difference between finite alphabets and countable alphabets. First of all, all Shannon's information measures are continuous when the alphabets are fixed and finite, precisely because all the summations involved are finite. For countable alphabets, namely alphabets containing a countably infinite number of elements, Shannon's information measures turn out to be everywhere discontinuous, which is very surprising. To probe further, I would refer you to the problems at the end of the chapter, namely Problems 28, 29, 30, and 31, and you may also want to look into this paper published in 2009.

The next definition explains what is meant by the continuity of the entropy function. Let p and q be two probability distributions on a common alphabet script X. The variational distance, also called the L_1 distance, between p and q is defined as V(p, q) = the summation over all x of |p(x) - q(x)|. The entropy function is continuous at a distribution p if the limit of H(p') as p' tends to p in variational distance is equal to the entropy of the limit of p' as p' tends to p, that is, H(p). In other words, the order of taking the entropy function and taking the limit can be exchanged. Equivalently, for any epsilon bigger than 0, there exists a delta bigger than 0 such that |H(p) - H(q)| is less than epsilon for all q satisfying V(p, q) less than delta. In other words, as long as q is sufficiently close to p in variational distance, the difference between H(p) and H(q) is less than epsilon.

Let us look at an example. Let X be the set of all positive integers.
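The two quantities in the definition above can be illustrated with a small numerical sketch. The function names below are my own; the code simply evaluates V(p, q) and the Shannon entropy H (in bits) for finite distributions given as lists of probability masses:

```python
import math

def variational_distance(p, q):
    """V(p, q) = sum over all x of |p(x) - q(x)|."""
    return sum(abs(px - qx) for px, qx in zip(p, q))

def entropy(p):
    """Shannon entropy in bits; terms with p(x) = 0 contribute 0 by convention."""
    return -sum(px * math.log2(px) for px in p if px > 0)

# Two close distributions on a common alphabet of size 3.
p = [0.5, 0.5, 0.0]
q = [0.5, 0.4, 0.1]
print(variational_distance(p, q))   # small: the distributions are close
print(abs(entropy(p) - entropy(q))) # correspondingly small entropy difference
```

For a fixed finite alphabet, shrinking V(p, q) forces |H(p) - H(q)| toward 0, which is the continuity the definition formalizes.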
This is a countably infinite alphabet, or simply a countable alphabet. Let P_X be the deterministic probability distribution, which consists of one probability mass equal to 1 and the rest equal to 0. Let P_{X_n} be the probability distribution consisting of the probability mass 1 - 1/sqrt(log n), followed by n copies of 1/(n sqrt(log n)), followed by all zeros. It is easy to see that as n tends to infinity, P_{X_n} tends to P_X, because the first probability mass tends to 1 and the n following probability masses all tend to 0. So as n tends to infinity, the variational distance between P_X and P_{X_n}, which can be evaluated to 2/sqrt(log n), tends to 0. However, the entropy of the limit of P_{X_n} as n tends to infinity, which is equal to the entropy of P_X, namely the entropy of the deterministic distribution, is equal to 0, while the limit as n tends to infinity of the entropy of P_{X_n} can be shown to be equal to infinity. This is left as an exercise. So the order of taking the entropy function and taking the limit cannot be exchanged here, which shows that the entropy function is not continuous at P_X.
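The divergence in this example can be checked numerically. The sketch below (my own construction, using natural log inside the distribution as written in the example, and entropy in bits) builds P_{X_n} for increasing n and shows that the variational distance to P_X shrinks while the entropy grows without bound:

```python
import math

def P_Xn(n):
    """[1 - 1/sqrt(log n)] followed by n copies of 1/(n sqrt(log n))."""
    s = math.sqrt(math.log(n))
    return [1 - 1 / s] + [1 / (n * s)] * n

def entropy(p):
    """Shannon entropy in bits; zero masses contribute 0."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def dist_to_deterministic(p):
    """V(P_X, P_{X_n}), where P_X puts all its mass on the first outcome.

    Equals (1 - p[0]) + sum of the remaining masses = 2 / sqrt(log n)."""
    return (1 - p[0]) + sum(p[1:])

for n in (10**2, 10**4, 10**8):
    p = P_Xn(n)
    # V decreases toward 0 while H increases without bound.
    print(n, dist_to_deterministic(p), entropy(p))
```

The convergence of V is very slow (like 2/sqrt(log n)), but the entropy grows roughly like sqrt(log n), so no matter how close P_{X_n} gets to the deterministic distribution, its entropy keeps increasing.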