We are now ready to discuss Theorem 6.10, the conditional strong AEP. For any δ-typical sequence x, define the conditional typical set T_{Y|X δ}^n(x) to be the set of all typical sequences y such that the pair (x, y) is jointly δ-typical. If |T_{Y|X δ}^n(x)| ≥ 1, that is, if there exists at least one typical y such that (x, y) is jointly typical, then

2^{n(H(Y|X) - ν)} ≤ |T_{Y|X δ}^n(x)| ≤ 2^{n(H(Y|X) + ν)},

where ν → 0 as n → ∞ and δ → 0.

Here are some remarks. Dividing the size of the jointly typical set T_{XY} by the size of the typical set T_X gives approximately 2^{nH(X,Y)} / 2^{nH(X)} = 2^{n(H(X,Y) - H(X))} = 2^{nH(Y|X)}. In other words, on average, the number of typical y sequences that are jointly typical with a typical x sequence is approximately 2^{nH(Y|X)}. Theorem 6.10 guarantees that this holds for each typical x sequence, not just on average, as long as there exists at least one y sequence that is jointly typical with that x sequence.

We will divide the proof of Theorem 6.10 into two parts. First, we prove the upper bound: if the conditional typical set of a particular typical sequence x is nonempty, then |T_{Y|X δ}^n(x)| ≤ 2^{n(H(Y|X) + ν)}, where ν → 0 as n → ∞ and δ → 0.
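The averaging remark rests on the chain rule H(X,Y) - H(X) = H(Y|X). As a minimal numerical check, the sketch below assumes a hypothetical joint distribution on {0,1} × {a,b,c} (chosen only for illustration) and computes H(Y|X) two ways: from the chain rule, and directly as the average of H(Y|X = x) over x. It checks the identity behind the remark, not the typical-set sizes themselves.

```python
from fractions import Fraction
from math import log2

# Hypothetical joint distribution p(x, y) on {0,1} x {a,b,c}, for illustration only.
P_XY = {
    (0, "a"): Fraction(1, 4), (0, "b"): Fraction(1, 8), (0, "c"): Fraction(1, 8),
    (1, "a"): Fraction(1, 12), (1, "b"): Fraction(1, 6), (1, "c"): Fraction(1, 4),
}


def entropy(probs):
    """Entropy in bits; zero-probability terms contribute nothing."""
    return -sum(float(p) * log2(float(p)) for p in probs if p > 0)


def marginal(p_xy, axis):
    """Marginal distribution of the component at index `axis` (0 for X, 1 for Y)."""
    out = {}
    for pair, prob in p_xy.items():
        out[pair[axis]] = out.get(pair[axis], 0) + prob
    return out


def conditional_entropy_y_given_x(p_xy):
    """H(Y|X) computed directly as the sum over x of p(x) * H(Y | X = x)."""
    total = 0.0
    for x, px in marginal(p_xy, 0).items():
        conditional = [prob / px for (x2, _), prob in p_xy.items() if x2 == x]
        total += float(px) * entropy(conditional)
    return total


h_xy = entropy(P_XY.values())
h_x = entropy(marginal(P_XY, 0).values())
h_y_given_x = conditional_entropy_y_given_x(P_XY)

n = 100
# log2(|T_XY| / |T_X|) is about n(H(X,Y) - H(X)), which equals n H(Y|X).
print(f"n(H(X,Y) - H(X)) = {n * (h_xy - h_x):.2f}, n H(Y|X) = {n * h_y_given_x:.2f}")
```

For this distribution both exponents agree, with H(Y|X) ≈ 1.4796 bits.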
For any ν > 0, the strong AEP gives, for δ sufficiently small,

2^{-n(H(X) - ν/2)} ≥ p(x) = Σ_y p(x, y).

This sum is lower bounded by restricting it to the y sequences in T_{Y|X δ}^n(x), that is, the y sequences that are jointly δ-typical with x. Since each such (x, y) is jointly typical, the strong joint AEP gives p(x, y) ≥ 2^{-n(H(X,Y) + ν/2)}, again for δ sufficiently small. Because this lower bound does not depend on y, the restricted sum is at least |T_{Y|X δ}^n(x)| · 2^{-n(H(X,Y) + ν/2)}.

The remaining steps are very similar to the proof of the upper bound on the typical set T_X in Theorem 6.2, the strong AEP. Moving 2^{-n(H(X,Y) + ν/2)} to the other side of the inequality, we get

|T_{Y|X δ}^n(x)| ≤ 2^{-n(H(X) - ν/2) + n(H(X,Y) + ν/2)} = 2^{n(H(X,Y) - H(X) + ν)}.

Since H(X,Y) - H(X) = H(Y|X), the size of the conditional typical set is at most 2^{n(H(Y|X) + ν)}.

The formal proof of the lower bound in Theorem 6.10 is given in the textbook and involves many fine details. Here, we outline the proof by means of a very simple example. Let the alphabet X be {0, 1} and the alphabet Y be {a, b, c}. Consider a particular pair of jointly typical sequences (x, y) as shown. In the x sequence, all the 0s occur first, followed by all the 1s. Among the components of y corresponding to x_k = 0, all the a's occur first, then all the b's, and then all the c's.
The same holds for the components of y corresponding to x_k = 1. There are approximately np(0,a) pairs (0,a), np(0,b) pairs (0,b), np(0,c) pairs (0,c), np(1,a) pairs (1,a), np(1,b) pairs (1,b), and np(1,c) pairs (1,c). Now observe that rearranging the components of y corresponding to x_k = 0 preserves the joint typicality of the pair of sequences, because there are still approximately np(0,a) pairs (0,a), np(0,b) pairs (0,b), and np(0,c) pairs (0,c). Likewise, rearranging the components of y corresponding to x_k = 1 also preserves joint typicality.

The number of possible arrangements is therefore the product of two multinomial coefficients: the first is (np(0) choose np(0,a), np(0,b), np(0,c)), and the second is (np(1) choose np(1,a), np(1,b), np(1,c)). Using our approximation for the multinomial coefficient, the first is approximately 2^{np(0) H(Y|X=0)}: the proportions np(0,a)/np(0), np(0,b)/np(0), np(0,c)/np(0) are the conditional probabilities p(a|0), p(b|0), p(c|0), and the entropy of this conditional distribution is the conditional entropy H(Y|X=0). Likewise, the second is approximately 2^{np(1) H(Y|X=1)}. Their product is approximately

2^{n[p(0) H(Y|X=0) + p(1) H(Y|X=1)]} = 2^{nH(Y|X)}.

In other words, starting from any particular pair of jointly typical sequences (x, y), we can generate approximately 2^{nH(Y|X)} sequences y' such that (x, y') is jointly typical.
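The counting step can be checked exactly for a hypothetical distribution. The sketch below assumes the joint counts are exactly n·p(x, y) rather than approximately so (which, for the distribution chosen here, requires n to be a multiple of 24), multiplies the two multinomial coefficients, and compares (1/n) log2 of the result with H(Y|X). The helper names are illustrative only.

```python
from fractions import Fraction
from math import factorial, log2

# Hypothetical joint distribution p(x, y); n must be a multiple of 24 so that
# every n * p(x, y) is an integer.
P_XY = {
    (0, "a"): Fraction(1, 4), (0, "b"): Fraction(1, 8), (0, "c"): Fraction(1, 8),
    (1, "a"): Fraction(1, 12), (1, "b"): Fraction(1, 6), (1, "c"): Fraction(1, 4),
}


def multinomial(counts):
    """Number of arrangements of sum(counts) items with the given multiplicities."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # each partial quotient is still an integer
    return result


def rearrangement_count(p_xy, n):
    """Number of y sequences obtained by permuting y within each block {k : x_k = a},
    starting from a pair whose joint counts are exactly n * p(a, b): the product
    over a of multinomial(n p(a); n p(a, b) for b in Y)."""
    total = 1
    for a in {x for x, _ in p_xy}:
        counts = []
        for (x, _), prob in p_xy.items():
            if x == a:
                count = n * prob
                if count.denominator != 1:
                    raise ValueError("choose n so that every n * p(x, y) is an integer")
                counts.append(count.numerator)
        total *= multinomial(counts)
    return total


def conditional_entropy_y_given_x(p_xy):
    """H(Y|X) = -sum over (a, b) of p(a, b) log2 p(b | a)."""
    p_x = {}
    for (x, _), prob in p_xy.items():
        p_x[x] = p_x.get(x, 0) + prob
    return -sum(float(prob) * log2(float(prob / p_x[x]))
                for (x, _), prob in p_xy.items() if prob > 0)


h = conditional_entropy_y_given_x(P_XY)
rates = []
for n in (120, 1200, 12000):
    rate = log2(rearrangement_count(P_XY, n)) / n
    rates.append(rate)
    print(f"n = {n:5d}: (1/n) log2(count) = {rate:.4f}, H(Y|X) = {h:.4f}")
```

The printed rates increase toward H(Y|X) ≈ 1.4796 from below; the gap comes from the polynomial factors dropped in the multinomial approximation and shrinks roughly like (log n)/n.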
Therefore, for a typical sequence x, as long as there exists a sequence y that is jointly typical with x, the number of typical y sequences jointly typical with x is at least 2^{n(H(Y|X) - ν)}, where ν is a small quantity that tends to 0 as n → ∞ and δ → 0. This establishes Theorem 6.10, the conditional strong AEP.

Here is an illustration of the conditional strong AEP. At the top is the δ-typical set T_X, and at the bottom is the δ-typical set T_Y. For each δ-typical x sequence, as long as there exists at least one y sequence that is jointly δ-typical with it, there are approximately 2^{nH(Y|X)} such y sequences. Likewise, for each δ-typical y sequence, as long as there exists at least one x sequence that is jointly δ-typical with it, there are approximately 2^{nH(X|Y)} such x sequences.

The next result, Corollary 6.12, follows from Theorem 6.10. For a joint distribution p(x, y) on X × Y, let S_{X δ}^n be the set of all δ-typical x sequences for which the conditional typical set T_{Y|X δ}^n(x) is nonempty. In other words, S_{X δ}^n is the set of all δ-typical x sequences for which there exists at least one y sequence that is jointly δ-typical with x. Then

|S_{X δ}^n| ≥ (1 - δ) 2^{n(H(X) - ψ)},

where ψ → 0 as n → ∞ and δ → 0. Basically, this corollary says that although S_{X δ}^n is a subset of the typical set T_X, the two sets grow at the same asymptotic rate.
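To see that S_{X δ}^n can be a proper subset of T_X, the sketch below enumerates both sets by brute force for a hypothetical distribution with tiny, arbitrarily chosen n and δ. It assumes the usual definition of strong typicality (no zero-probability symbol occurs and the sum over symbols of |N(s)/n - p(s)| is at most δ). At this size the corollary's exponential bound says nothing useful; the point is only the set relationship.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution p(x, y) on {0,1} x {a,b,c}, for illustration only.
P_XY = {
    (0, "a"): Fraction(1, 4), (0, "b"): Fraction(1, 8), (0, "c"): Fraction(1, 8),
    (1, "a"): Fraction(1, 12), (1, "b"): Fraction(1, 6), (1, "c"): Fraction(1, 4),
}


def is_strongly_typical(seq, p, delta):
    """Strong typicality (assumed definition): no symbol of probability 0 occurs,
    and the total absolute deviation of empirical frequencies from p is <= delta."""
    n = len(seq)
    counts = {}
    for s in seq:
        counts[s] = counts.get(s, 0) + 1
    if any(p.get(s, 0) == 0 for s in counts):
        return False
    return sum(abs(Fraction(counts.get(s, 0), n) - prob) for s, prob in p.items()) <= delta


def typical_and_s_sets(p_xy, n, delta):
    """Return (T_X, S_X): all delta-typical x, and those x with at least one
    jointly delta-typical y. Joint typicality already implies y is typical."""
    p_x = {}
    for (x, _), prob in p_xy.items():
        p_x[x] = p_x.get(x, 0) + prob
    y_alphabet = sorted({y for _, y in p_xy})
    t_x = [x for x in product(sorted(p_x), repeat=n) if is_strongly_typical(x, p_x, delta)]
    all_y = list(product(y_alphabet, repeat=n))
    s_x = [x for x in t_x
           if any(is_strongly_typical(tuple(zip(x, y)), p_xy, delta) for y in all_y)]
    return t_x, s_x


t_x, s_x = typical_and_s_sets(P_XY, n=6, delta=Fraction(2, 5))
print(len(t_x), len(s_x))  # 50 35
```

With these hypothetical parameters, the 15 typical x sequences that are excluded from S_{X δ}^n are exactly those with two 0s: for them, the smallest achievable joint deviation is 5/12, which exceeds δ = 2/5.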
The next proposition, Proposition 6.13, says that with respect to a joint distribution p(x, y) on X × Y, for any δ > 0,

Pr{X ∈ S_{X δ}^n} > 1 - δ

for n sufficiently large. That is, with very high probability, the random sequence X is an x sequence for which there exists a y sequence jointly typical with it. The proof of Proposition 6.13 is straightforward and is omitted here. From these two results, we see that the sets S_{X δ}^n and T_X have almost the same asymptotic properties.

We now prove Corollary 6.12. By the consistency of strong typicality, if the pair of sequences (x, y) is jointly δ-typical, then x is δ-typical with respect to the marginal p(x). In particular, x is in S_{X δ}^n, because there exists at least one y that is jointly typical with x. Therefore, the jointly typical set T_{XY} can be written as the disjoint union, over all x in S_{X δ}^n, of the pairs (x, y) with y in T_{Y|X δ}^n(x). Using the lower bound on the size of T_{XY} from the strong joint AEP and the upper bound on the size of each conditional typical set from Theorem 6.10, we have

(1 - δ) 2^{n(H(X,Y) - λ)} ≤ |T_{XY}| ≤ |S_{X δ}^n| · 2^{n(H(Y|X) + ν)}.

Combining the two bounds gives

|S_{X δ}^n| ≥ (1 - δ) 2^{n(H(X,Y) - H(Y|X) - λ - ν)} = (1 - δ) 2^{n(H(X) - (λ + ν))},

since H(X,Y) - H(Y|X) = H(X). The corollary is proved upon letting ψ = λ + ν.

We now show that strong typicality exhibits a very special structure, called the asymptotic quasi-uniform structure.
For two jointly distributed random variables X and Y, consider the two-dimensional strong joint typicality array, whose rows are labeled by the approximately 2^{nH(X)} typical x sequences and whose columns are labeled by the approximately 2^{nH(Y)} typical y sequences. For each typical x and typical y, we put a dot at entry (x, y) if the pair (x, y) is jointly typical. By the strong joint AEP, the total number of dots in the array is approximately 2^{nH(X,Y)}. If we fix a typical x sequence, then by the conditional strong AEP, the number of dots in that row is approximately 2^{nH(Y|X)}, provided the row contains at least one dot. In other words, every nonempty row of a strong joint typicality array has approximately the same number of dots. Likewise, for a fixed typical y sequence, the number of dots in that column is approximately 2^{nH(X|Y)}, provided the column contains at least one dot, so every nonempty column also has approximately the same number of dots.

Now consider a three-dimensional strong joint typicality array. In this array, there are approximately 2^{nH(X)} typical x sequences, approximately 2^{nH(Y)} typical y sequences, and approximately 2^{nH(Z)} typical z sequences. On the plane corresponding to a typical z sequence z_0, if there is at least one dot on the plane, then there are approximately 2^{nH(X,Y|Z)} dots on it, approximately the same number as on every other nonempty plane parallel to it. Likewise, for a fixed typical pair (x_0, y_0), if there is at least one dot in the cylinder corresponding to that pair, then there are approximately 2^{nH(Z|X,Y)} dots in it, approximately the same number as in every other nonempty cylinder in the same direction.
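A brute-force sketch of the two-dimensional array for a tiny hypothetical example is below. At such small n the "approximately 2^{nH(Y|X)}" statement is not yet visible, so the sketch checks an exact finite-n symmetry that underlies the quasi-uniform structure: permuting the positions of x and y together preserves joint typicality, so the number of dots in a row depends only on the type of x, and the number of dots in a column depends only on the type of y. The distribution, n, δ, and the definition of strong typicality used (no zero-probability symbol occurs and the sum over symbols of |N(s)/n - p(s)| is at most δ) are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical example (illustration only): p(x, y) on {0,1} x {a,b,c},
# with n and delta tiny enough for brute-force enumeration.
P_XY = {
    (0, "a"): Fraction(1, 4), (0, "b"): Fraction(1, 8), (0, "c"): Fraction(1, 8),
    (1, "a"): Fraction(1, 12), (1, "b"): Fraction(1, 6), (1, "c"): Fraction(1, 4),
}
N = 6
DELTA = Fraction(2, 5)


def marginal(p_xy, axis):
    """Marginal distribution of component `axis` (0 for X, 1 for Y)."""
    out = defaultdict(Fraction)
    for pair, prob in p_xy.items():
        out[pair[axis]] += prob
    return dict(out)


def is_strongly_typical(seq, p, delta):
    """No zero-probability symbol occurs and sum |N(s)/n - p(s)| <= delta."""
    n = len(seq)
    counts = defaultdict(int)
    for s in seq:
        counts[s] += 1
    if any(p.get(s, 0) == 0 for s in counts):
        return False
    return sum(abs(Fraction(counts[s], n) - prob) for s, prob in p.items()) <= delta


def type_of(seq):
    """Type of a sequence, encoded as sorted (symbol, count) pairs."""
    return tuple(sorted((s, seq.count(s)) for s in set(seq)))


def sizes_by_type(lines):
    """Map each label type to the set of sizes of nonempty rows (or columns) of that type."""
    groups = defaultdict(set)
    for label, dots in lines.items():
        if dots:
            groups[type_of(label)].add(len(dots))
    return dict(groups)


p_x, p_y = marginal(P_XY, 0), marginal(P_XY, 1)
t_x = [x for x in product(sorted(p_x), repeat=N) if is_strongly_typical(x, p_x, DELTA)]
t_y = [y for y in product(sorted(p_y), repeat=N) if is_strongly_typical(y, p_y, DELTA)]

# rows[x] is the set of columns y carrying a dot, i.e. (x, y) jointly typical.
rows = {x: {y for y in t_y if is_strongly_typical(tuple(zip(x, y)), P_XY, DELTA)}
        for x in t_x}
columns = {y: {x for x in t_x if y in rows[x]} for y in t_y}

row_sizes = sizes_by_type(rows)
column_sizes = sizes_by_type(columns)
print("dots per nonempty row, by type of x:", row_sizes)
print("dots per nonempty column, by type of y:", column_sizes)
```

Each type maps to a single size, for rows and for columns alike. With these hypothetical parameters, the rows of the 15 typical x sequences with exactly two 0s are empty, which is why only nonempty rows are compared.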