In Section 8.5, we discuss the achievability of the information rate-distortion function R_I(D). First, recall that the rate-distortion theorem says that R(D) is equal to R_I(D). Here is the strategy for proving the achievability of R_I(D). An i.i.d. source X_k, k greater than or equal to 1, with generic random variable X with distribution p(x) is given. For every random variable X hat, taking values in the alphabet X hat, with expected distortion between X and X hat less than or equal to D, where D is between 0 and D_max, we will prove that the rate-distortion pair (I(X;X hat), D) is achievable by showing, for large n, the existence of a rate-distortion code such that the rate of the code is not more than I(X;X hat) plus epsilon, and the distortion between the source sequence X and the reproduction sequence X hat is less than or equal to D plus epsilon with probability almost equal to 1. Then minimize the mutual information between X and X hat over all such X hat, to conclude that the pair (R_I(D), D) is achievable. By the definition of R(D), this implies that R_I(D) is greater than or equal to R(D). We first describe the parameter settings for the random coding scheme. First, fix epsilon greater than 0 and X hat such that the expectation of the distortion between X and X hat is less than or equal to D, where D is between 0 and D_max. Let delta greater than 0 be specified later. Let n be a sufficiently large integer satisfying 1 over n times log M greater than or equal to I(X;X hat) plus epsilon over 2, and less than I(X;X hat) plus epsilon. The idea here is that the rate of the code exceeds I(X;X hat), but not by too much. Here is the random coding scheme. First, construct the codebook C of an (n,M) code by randomly generating M codewords in the alphabet set X hat to the power n, independently and identically according to p(x hat) to the power n. Denote these codewords by {X hat}(1), {X hat}(2), up to {X hat}(M).
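The random codebook construction described above can be sketched in a few lines of code. This is only an illustration: the binary reproduction alphabet, the distribution p_x_hat, and the values of n and M are all hypothetical choices, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

alphabet = [0, 1]      # reproduction alphabet X hat (assumed binary here)
p_x_hat = [0.5, 0.5]   # a hypothetical distribution p(x hat)
n = 20                 # block length
M = 2 ** 8             # number of codewords; in the proof, M is chosen so
                       # that (1/n) log M lies just above I(X; X hat)

# Generate M codewords i.i.d. according to p(x hat)^n: each codeword is a
# length-n sequence whose symbols are drawn independently from p(x hat).
codebook = rng.choice(alphabet, size=(M, n), p=p_x_hat)
print(codebook.shape)
```

In the proof, this codebook is generated once and then revealed to both the encoder and the decoder.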
This is illustrated in the following figure. Then reveal the codebook C to both the encoder and the decoder. The source sequence X is then generated according to p(x) to the power n. The encoder encodes the source sequence X into an index K in the index set I = {1, 2, ..., M}. The index K takes the value i if (a) X and {X hat}(i), that is the i-th codeword, are jointly typical, and (b) for all i', if X and {X hat}(i') are jointly typical, then i' is less than or equal to i. That is, if there exists more than one i satisfying (a), let K be the largest such index. Otherwise, K takes the constant value 1. The index K is delivered to the decoder. The decoder outputs {X hat}(K) as the reproduction sequence X hat. Here are some remarks. First, the event K equals 1 occurs in one of the following two scenarios: {X hat}(1), that is the first codeword, is the only codeword in C which is jointly typical with the source sequence X; or no codeword in the codebook is jointly typical with the source sequence X. Second, if K is not equal to 1, then the reproduction sequence {X hat}(K) is always jointly typical with the source sequence X. As we will see, K not equal to 1 corresponds to successful encoding, and K equal to 1 corresponds to a possible encoding error. The performance analysis is done in a number of steps. First, recall that the event K equals 1 occurs in one of the following two scenarios: {X hat}(1) is the only codeword in the codebook which is jointly typical with the source sequence X, or no codeword in the codebook is jointly typical with the source sequence X. In other words, if K equals 1, then the source sequence X is jointly typical with none of the codewords {X hat}(2), {X hat}(3), all the way to {X hat}(M). We will show that the probability that K equals 1 can be made arbitrarily small. Define the event E_i to be the event that the source sequence and the i-th codeword are jointly typical.
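The encoding rule above, "pick the largest index whose codeword is jointly typical with the source sequence," can be sketched as follows. The joint typicality test here is a simple empirical-distribution check in the spirit of strong typicality; the names `jointly_typical` and `encode`, the joint distribution `p_joint`, and the toy data are all illustrative assumptions. Note the sketch uses 0-based indices, with 0 playing the role of the lecture's constant value 1.

```python
from collections import Counter

def jointly_typical(x, x_hat, p_joint, delta):
    """A strong-typicality-style test: the empirical joint distribution of
    the pair sequence (x, x_hat) must deviate from p_joint by at most delta
    in total absolute difference. p_joint maps symbol pairs to probabilities
    (a hypothetical joint distribution p(x, x hat))."""
    n = len(x)
    counts = Counter(zip(x, x_hat))
    pairs = set(p_joint) | set(counts)
    return sum(abs(counts.get(ab, 0) / n - p_joint.get(ab, 0.0))
               for ab in pairs) <= delta

def encode(x, codebook, p_joint, delta):
    """Return the LARGEST index i such that (x, codebook[i]) are jointly
    typical; if no codeword qualifies, fall back to index 0."""
    k = 0
    for i, cw in enumerate(codebook):
        if jointly_typical(x, cw, p_joint, delta):
            k = i
    return k

# A toy usage with a hypothetical joint distribution concentrated on
# matching pairs: only the second codeword matches x symbol-by-symbol.
p_joint = {(0, 0): 0.5, (1, 1): 0.5}
x = [0, 1, 0, 1]
codebook = [[1, 1, 1, 1], [0, 1, 0, 1]]
print(encode(x, codebook, p_joint, delta=0.1))  # → 1
```

The decoder then simply outputs the codeword at the received index.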
From the discussion in step one, we see that the event K equals 1 is a subset of E_2 complement intersect E_3 complement intersect all the way to E_M complement. Since the codewords are generated i.i.d., conditioning on a particular source sequence x, the events E_i are mutually independent and have the same probability. Then for any particular source sequence x, the probability that K equals 1, given X equals x, is upper bounded by the probability of E_2 complement intersect E_3 complement intersect all the way to E_M complement, conditioning on X equals x. Because the events E_2 up to E_M are conditionally independent, this probability is equal to the product, from i equals 2 up to M, of the probability of E_i complement given X equals x. Furthermore, because all these events have the same probability, this is equal to the probability of E_1 complement given X equals x, to the power M minus 1, or 1 minus the probability of E_1 given X equals x, to the power M minus 1. We will focus on those source sequences x in the set S_{X delta}^n, where this set is equal to the set of all typical x sequences such that there exists at least one typical x hat sequence which is jointly typical with that x sequence, because from Proposition 6.13, we know that the probability that the source sequence is in the S set is approximately equal to 1 for large n. For an x sequence in the S set, we obtain the following lower bound on the probability of E_1 given X equals x. First, the probability of E_1 given X equals x is equal to the probability that x and {X hat}(1), that is the first codeword, are jointly typical. This is equal to the summation of p(x hat) over all x hat in the conditional typical set T_{X hat|X delta}^n of x. By the consistency of strong typicality, if x and x hat are jointly typical, then x hat is also typical. And so, by the strong AEP, p(x hat) is lower bounded by 2 to the power minus n times the quantity entropy of X hat plus eta.
Now, by the conditional strong AEP, if the conditional typical set is non-empty, which is the case, then its size is lower bounded by 2 to the power n times the quantity entropy of X hat given X minus xi. Combining the two bounds, we have the probability of E_1 given X equals x lower bounded by 2 to the power minus n times the quantity entropy of X hat minus entropy of X hat given X plus xi plus eta, where H(X hat) minus H(X hat|X) is equal to I(X;X hat). And we define xi plus eta to be zeta, where zeta tends to 0 as n tends to infinity and delta tends to 0. Now from step five, we have the probability that K equals 1, given X equals x, is less than or equal to 1 minus the probability of E_1 given X equals x, to the power M minus 1. Applying the above lower bound on the probability of E_1 given X equals x, we obtain the upper bound 1 minus 2 to the power minus n times the quantity I(X;X hat) plus zeta, all to the power M minus 1. Now, recall that we chose M such that 1 over n log M is greater than or equal to I(X;X hat) plus epsilon over 2. This implies that M is greater than or equal to 2 to the power n times the quantity I(X;X hat) plus epsilon over 2. In step 8, we obtained an upper bound on the probability of K equals 1 given X equals x. Taking the logarithm, we obtain that log of the probability that K equals 1, given X equals x, is upper bounded by M minus 1 times log of 1 minus 2 to the power minus n times the quantity I(X;X hat) plus zeta. Now, observe that this logarithm is negative, and so, by applying the lower bound on M, we obtain the upper bound 2 to the power n times the quantity I(X;X hat) plus epsilon over 2, minus 1, times log of 1 minus 2 to the power minus n times the quantity I(X;X hat) plus zeta. For this logarithm, we apply the fundamental inequality to obtain the upper bound minus 2 to the power minus n times the quantity I(X;X hat) plus zeta. When 2 to the power n times the quantity I(X;X hat) plus epsilon over 2 is multiplied by this bound, the I(X;X hat) in the two exponents cancels, and so we obtain the term minus 2 to the power n times the quantity epsilon over 2 minus zeta.
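The decay of this bound can be illustrated numerically. With hypothetical values of I(X;X hat), epsilon, and zeta satisfying epsilon over 2 greater than zeta, the logarithm of the bound on the conditional probability that K equals 1, namely minus (M minus 1) times 2 to the power minus n times (I plus zeta), plunges toward minus infinity as n grows. All numbers below are illustrative, not from the lecture.

```python
# Hypothetical parameters (must satisfy eps/2 - zeta > 0 for the bound
# to collapse, exactly as required in the lecture):
I = 0.5        # I(X; X hat), illustrative
eps = 0.2
zeta = 0.05

log_bounds = []
for n in (50, 100, 200, 400):
    M = 2 ** (n * (I + eps / 2))           # M >= 2^{n(I + eps/2)}
    # log(1 - b) <= -b (the fundamental inequality), so
    # log Pr{K = 1 | X = x} <= -(M - 1) * 2^{-n(I + zeta)},
    # which is approximately -2^{n(eps/2 - zeta)} + 2^{-n(I + zeta)}.
    log_bound = -(M - 1) * 2 ** (-n * (I + zeta))
    log_bounds.append(log_bound)
    print(n, log_bound)
```

Running this shows the log-bound decreasing roughly like minus 2 to the power n times (epsilon over 2 minus zeta), so the conditional probability itself vanishes for large n.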
The other term, plus 2 to the power minus n times the quantity I(X;X hat) plus zeta, is obtained by multiplying minus 1 by the bound minus 2 to the power minus n times the quantity I(X;X hat) plus zeta. Now, let n be sufficiently large and delta be sufficiently small, so that epsilon over 2 minus zeta is greater than 0. Then as n goes to infinity, 2 to the power n times the quantity epsilon over 2 minus zeta tends to infinity, while 2 to the power minus n times the quantity I(X;X hat) plus zeta tends to 0. So the above upper bound on log of the probability of K equals 1, given X equals x, tends to minus infinity as n tends to infinity. That is, the probability of K equals 1 given X equals x tends to 0 as n tends to infinity. This implies that for a sequence x in the S set, for sufficiently large n, the probability of K equals 1 given X equals x is less than or equal to epsilon over 2. Now it follows that the probability of K equals 1 is equal to the summation, over all x in the S set, of the probability of K equals 1 given X equals x times the probability that X equals x, plus the same summation over all x which are not in the S set. Now, the probability that K equals 1 given X equals x, where x is in the S set, from step eleven, is upper bounded by epsilon over 2. For x not in the S set, this conditional probability is upper bounded by 1. The summation of the probability that X equals x over all x in the S set is simply the probability that the random source sequence is in the S set, and the summation of the probability that X equals x over all x not in the S set is simply the probability that the random source sequence is not in the S set. Now, the probability that the source sequence is in the S set is upper bounded by 1, and the probability that the source sequence is not in the S set, by Proposition 6.13, is upper bounded by delta. And so, we have obtained that the probability that K equals 1 is upper bounded by epsilon over 2 plus delta, by letting n be sufficiently large and delta be sufficiently small so that epsilon over 2 minus zeta is greater than 0.
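The total-probability decomposition above is simple arithmetic, and a toy check makes the final bound concrete. The numbers here are hypothetical placeholders for Pr{X in S} and the parameters epsilon and delta.

```python
# Toy check of:
#   Pr{K = 1} <= (eps/2) * Pr{X in S} + 1 * Pr{X not in S}
#             <= eps/2 + delta,
# where Pr{X not in S} <= delta by Proposition 6.13 for large n.
eps = 0.2
delta = 0.05
p_in_S = 0.97          # hypothetical; Proposition 6.13 needs 1 - p_in_S <= delta

bound = (eps / 2) * p_in_S + 1.0 * (1 - p_in_S)
assert bound <= eps / 2 + delta
print(bound)
```

With delta also chosen smaller than epsilon over 2, the bound falls below epsilon, which is exactly the conclusion drawn in the next step.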
Recall that we need this condition to ensure that for sufficiently large n, for a sequence x in the S set, the probability that K equals 1, given X equals x, is less than or equal to epsilon over 2, and this contributes the term epsilon over 2 in the above upper bound on the probability of K equals 1. At the same time, we let delta be sufficiently small, so that delta is less than epsilon over 2. And so, we obtain that the probability that K equals 1 is less than epsilon.