We have already finished more than half of the proof, so let us summarize the main idea. First, we randomly generate $M$ codewords in $\hat{\mathcal{X}}^n$ according to $p(\hat{x})^n$, where $n$ is large. By the strong AEP, the source sequence $\mathbf{X}$ is in the set $S$ with high probability. For a particular source sequence $\mathbf{x}$ in $S$, by the conditional strong AEP, the probability that it is jointly typical with a particular codeword is approximately $2^{-nI(X;\hat{X})}$. If $M$ grows with $n$ at a rate higher than $I(X;\hat{X})$, then with high probability there exists at least one codeword that is jointly typical with the source sequence $\mathbf{X}$ with respect to $p(x,\hat{x})$. Such a codeword, if it exists, has distortion approximately equal to $Ed(X,\hat{X})$, because the joint relative frequencies of the source sequence and this codeword are approximately equal to the joint probabilities $p(x,\hat{x})$. In other words, if the source sequence and the codeword are jointly typical, then the distortion of the codeword is approximately equal to $Ed(X,\hat{X})$; this will be formally justified in the next proposition. We then use this codeword to represent the source sequence $\mathbf{X}$, and the distortion constraint is satisfied because $Ed(X,\hat{X}) \le D$.

The next proposition says that for $\hat{X}$ such that $Ed(X,\hat{X}) \le D$, if a pair of sequences $(\mathbf{x},\hat{\mathbf{x}})$ is jointly typical, then $d(\mathbf{x},\hat{\mathbf{x}}) \le D + d_{\max}\delta$. The proof goes as follows. For $(\mathbf{x},\hat{\mathbf{x}})$ that are jointly typical, consider
$$d(\mathbf{x},\hat{\mathbf{x}}) = \frac{1}{n}\sum_{k=1}^{n} d(x_k,\hat{x}_k).$$
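The key claim here, that a jointly typical pair has distortion close to the expected distortion, can be illustrated numerically. Below is a minimal Python sketch; the joint distribution (a fair bit with flip probability 0.1), the Hamming distortion measure, and the sequence length are illustrative assumptions, not from the lecture. Sampling pairs i.i.d. from $p(x,\hat{x})$, the empirical distortion concentrates around $Ed(X,\hat{X})$.

```python
import random

random.seed(0)

# Illustrative joint distribution (an assumption for this sketch):
# X is a fair bit and X_hat equals X except with flip probability 0.1,
# so under Hamming distortion E d(X, X_hat) = 0.1.
FLIP = 0.1
n = 100_000

def sample_pair():
    x = random.randint(0, 1)
    x_hat = x ^ (random.random() < FLIP)
    return x, x_hat

pairs = [sample_pair() for _ in range(n)]
empirical_distortion = sum(x != x_hat for x, x_hat in pairs) / n
expected_distortion = FLIP

# By the law of large numbers (the effect the strong AEP quantifies),
# the empirical distortion is close to the expected distortion.
print(abs(empirical_distortion - expected_distortion))
```

For large $n$ the printed gap is small, which is exactly why a jointly typical codeword inherits the expected distortion of the test channel.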
Now this summation can be written as
$$\frac{1}{n}\sum_{x,\hat{x}} d(x,\hat{x})\, N(x,\hat{x}\,|\,\mathbf{x},\hat{\mathbf{x}}),$$
where $N(x,\hat{x}\,|\,\mathbf{x},\hat{\mathbf{x}})$ is the number of occurrences of the pair $(x,\hat{x})$ in $(\mathbf{x},\hat{\mathbf{x}})$. In the next line, we add and subtract $n\,p(x,\hat{x})$ inside the summation:
$$d(\mathbf{x},\hat{\mathbf{x}}) = \frac{1}{n}\sum_{x,\hat{x}} d(x,\hat{x})\bigl(n\,p(x,\hat{x}) + N(x,\hat{x}\,|\,\mathbf{x},\hat{\mathbf{x}}) - n\,p(x,\hat{x})\bigr).$$
Multiplying $d(x,\hat{x})$ by $n\,p(x,\hat{x})$ and cancelling the factor $n$ with the $\frac{1}{n}$ in front, the first term becomes $\sum_{x,\hat{x}} p(x,\hat{x})\,d(x,\hat{x})$, which is simply $Ed(X,\hat{X})$. Distributing the $\frac{1}{n}$ over the second term likewise leaves $\frac{1}{n}N(x,\hat{x}\,|\,\mathbf{x},\hat{\mathbf{x}}) - p(x,\hat{x})$ inside the parentheses. We upper bound the second summation by means of the triangle inequality: $d(x,\hat{x})$ is upper bounded by $d_{\max}$, and
$$\sum_{x,\hat{x}} \left|\frac{1}{n}N(x,\hat{x}\,|\,\mathbf{x},\hat{\mathbf{x}}) - p(x,\hat{x})\right| \le \delta,$$
because the sequences $\mathbf{x}$ and $\hat{\mathbf{x}}$ are jointly typical. Finally, $Ed(X,\hat{X}) \le D$. Altogether,
$$d(\mathbf{x},\hat{\mathbf{x}}) \le Ed(X,\hat{X}) + d_{\max}\delta \le D + d_{\max}\delta.$$
This completes the proof of the proposition.

We only need to fill in the remaining details of the proof. For sufficiently large $n$, consider the probability that $d(\mathbf{X},\hat{\mathbf{X}}) > D + \epsilon$. This can be written as
$$\Pr\{d(\mathbf{X},\hat{\mathbf{X}}) > D+\epsilon \mid K=1\}\Pr\{K=1\} + \Pr\{d(\mathbf{X},\hat{\mathbf{X}}) > D+\epsilon \mid K\neq 1\}\Pr\{K\neq 1\}.$$
Now $\Pr\{d(\mathbf{X},\hat{\mathbf{X}}) > D+\epsilon \mid K=1\}$ is at most $1$, $\Pr\{K=1\}$ is at most $\epsilon$, as we have shown, and $\Pr\{K\neq 1\}$ is at most $1$. So we obtain the upper bound
$$\epsilon + \Pr\{d(\mathbf{X},\hat{\mathbf{X}}) > D+\epsilon \mid K\neq 1\}.$$
Recall that conditioning on $K\neq 1$, the source sequence $\mathbf{X}$ and the reproduction sequence $\hat{\mathbf{X}}$ are jointly typical.
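The chain of steps in the proposition can be checked mechanically. The following Python sketch uses an illustrative setup (binary alphabets, Hamming distortion, and an arbitrary joint pmf, all assumptions for the example): it computes $d(\mathbf{x},\hat{\mathbf{x}})$ via the joint type $N(x,\hat{x}\,|\,\mathbf{x},\hat{\mathbf{x}})$, verifies the exact decomposition into $Ed(X,\hat{X})$ plus a correction term, and confirms that the correction is bounded by $d_{\max}\delta$.

```python
from collections import Counter
from itertools import product

# Illustrative setup (assumptions for this sketch): binary alphabets,
# Hamming distortion, and an arbitrary joint pmf p(x, x_hat).
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.15, (1, 1): 0.35}
d = lambda a, b: float(a != b)
d_max = 1.0

x     = (0, 0, 1, 1, 1, 0, 1, 0, 0, 1)
x_hat = (0, 1, 1, 1, 0, 0, 1, 0, 1, 1)
n = len(x)

# Joint type: N(a, b | x, x_hat) counts occurrences of (a, b) in the pair.
N = Counter(zip(x, x_hat))

# Direct computation of the per-letter distortion d(x, x_hat).
direct = sum(d(a, b) for a, b in zip(x, x_hat)) / n

# Decomposition: d(x, x_hat) = E d(X, X_hat) + sum_{a,b} d(a,b) (N/n - p).
expected = sum(p[a, b] * d(a, b) for a, b in product([0, 1], repeat=2))
correction = sum(d(a, b) * (N[a, b] / n - p[a, b])
                 for a, b in product([0, 1], repeat=2))

# Triangle-inequality bound: |correction| <= d_max * delta, where delta is
# the total deviation of the joint type from p (small when jointly typical).
delta = sum(abs(N[a, b] / n - p[a, b]) for a, b in product([0, 1], repeat=2))
```

Here `direct` equals `expected + correction` exactly, and `abs(correction) <= d_max * delta`, mirroring the two steps of the proof.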
Then from the proposition that we have proved, $d(\mathbf{X},\hat{\mathbf{X}}) \le D + d_{\max}\delta$. By taking $\delta \le \epsilon/d_{\max}$, we obtain
$$d(\mathbf{X},\hat{\mathbf{X}}) \le D + d_{\max}\cdot\frac{\epsilon}{d_{\max}} = D + \epsilon,$$
where the two factors $d_{\max}$ cancel each other. So, conditioning on $K\neq 1$, $d(\mathbf{X},\hat{\mathbf{X}}) \le D + \epsilon$. Therefore $\Pr\{d(\mathbf{X},\hat{\mathbf{X}}) > D+\epsilon \mid K\neq 1\} = 0$, which implies
$$\Pr\{d(\mathbf{X},\hat{\mathbf{X}}) > D+\epsilon\} \le \epsilon.$$
It is then apparent that the event $K\neq 1$ corresponds to successful encoding and the event $K=1$ corresponds to an encoding error. Thus we have shown the existence of a code whose rate $\frac{1}{n}\log M$ is at most $I(X;\hat{X}) + \epsilon$ by construction, and for which $\Pr\{d(\mathbf{X},\hat{\mathbf{X}}) > D+\epsilon\} \le \epsilon$ by the above analysis. Hence, for any $\hat{X}$ such that $Ed(X,\hat{X}) \le D$, the pair $(I(X;\hat{X}), D)$ is achievable. Finally, minimize $I(X;\hat{X})$ over all such $\hat{X}$ to conclude that the pair $(R_I(D), D)$ is achievable. By the definition of the rate-distortion function $R(D)$, this implies that $R_I(D) \ge R(D)$.

This picture explains intuitively why the rate of a rate-distortion code that achieves a distortion $D$ must be at least $R_I(D)$. First of all, the number of typical $\mathbf{x}$ sequences is approximately $2^{nH(X)}$. Each codeword, which is a typical $\hat{\mathbf{x}}$ sequence, is jointly typical with approximately $2^{nH(X|\hat{X})}$ typical $\mathbf{x}$ sequences.
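The random-coding argument can also be explored by simulation. The Python sketch below is a toy experiment whose parameters (fair-bit source, Hamming distortion, $D = 0.1$, $\epsilon = 0.05$, blocklength $n = 20$, codebook size $M$) are all illustrative assumptions; for simplicity the encoder picks the minimum-distortion codeword rather than testing joint typicality, and an encoding failure plays the role of the event $K = 1$. The estimated fraction of source sequences with distortion exceeding $D + \epsilon$ comes out small, as the analysis predicts.

```python
import random

random.seed(1)

# Illustrative parameters (assumptions for this sketch): fair-bit source,
# Hamming distortion, target distortion D = 0.1, slack eps = 0.05.
n, D, eps = 20, 0.1, 0.05
M = 4000        # roughly 2^{n(I(X;X_hat) + eps)} for this toy setup
trials = 100

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v)) / len(u)

# Random codebook: M codewords drawn i.i.d. from the marginal p(x_hat),
# which is uniform on {0,1}^n in this toy setup.
codebook = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(M)]

failures = 0
for _ in range(trials):
    x = tuple(random.randint(0, 1) for _ in range(n))
    # Minimum-distortion encoding, a simplification of the lecture's
    # joint-typicality encoding; failure corresponds to the event K = 1.
    best = min(hamming(x, c) for c in codebook)
    if best > D + eps:
        failures += 1

failure_rate = failures / trials
```

With a codebook of size roughly $2^{n(I(X;\hat{X})+\epsilon)}$, almost every source sequence finds a codeword within distortion $D+\epsilon$; shrinking $M$ well below $2^{nI(X;\hat{X})}$ makes the failure rate climb sharply.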
Therefore, in order for most of the typical $\mathbf{x}$ sequences to be covered by at least one codeword, the number of codewords must be at least
$$\frac{2^{nH(X)}}{2^{nH(X|\hat{X})}} = 2^{nI(X;\hat{X})},$$
where $I(X;\hat{X}) \ge R_I(D)$. Therefore, the rate of such a code must be at least $R_I(D)$.
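The counting behind this covering argument is simple arithmetic, sketched below in Python; the joint pmf is an illustrative assumption. It computes $H(X)$, $H(X|\hat{X})$, and $I(X;\hat{X})$ from a joint distribution and confirms that the exponent of $2^{nH(X)}/2^{nH(X|\hat{X})}$ equals $nI(X;\hat{X})$.

```python
from math import log2

# Illustrative joint pmf p(x, x_hat) (an assumption for this sketch).
p = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

p_x = {a: sum(v for (x, _), v in p.items() if x == a) for a in (0, 1)}
p_xhat = {b: sum(v for (_, xh), v in p.items() if xh == b) for b in (0, 1)}

H_X = -sum(v * log2(v) for v in p_x.values())
H_X_given_Xhat = -sum(v * log2(v / p_xhat[b]) for (a, b), v in p.items())
I = sum(v * log2(v / (p_x[a] * p_xhat[b])) for (a, b), v in p.items())

# Each codeword covers about 2^{n H(X|X_hat)} of the roughly 2^{n H(X)}
# typical x sequences, so at least
#   2^{n H(X)} / 2^{n H(X|X_hat)} = 2^{n I(X;X_hat)}
# codewords are needed.
n = 100
exponent_needed = n * H_X - n * H_X_given_Xhat  # log2 of the codeword count
```

The identity `exponent_needed == n * I` is just $H(X) - H(X|\hat{X}) = I(X;\hat{X})$ scaled by $n$, which is the heart of the picture.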