In Section 4.2 we discuss prefix codes, which are a very important class of uniquely decodable codes. By Definition 4.9, a code is called a prefix-free code if no codeword is a prefix of any other codeword. For brevity, a prefix-free code will be referred to as a prefix code. Here is an example of a prefix code. Consider the code C prime, which maps A to 0, B to 10, C to 110, and D to 1111. It is easy to check that no codeword is a prefix of another codeword, and therefore C prime is a prefix code. A D-ary tree is a graphical representation of a collection of finite sequences of D-ary symbols. For such a tree, a node is either an internal node or a leaf. The tree representation of a prefix code is called a code tree. Now let us discuss instantaneous decoding of a prefix code. Using the code C prime in Example 4.10, consider encoding the source sequence B, C, D, A, C: B is mapped to 10, C to 110, D to 1111, A to 0, and C again to 110. Upon concatenating the codewords, their boundaries are no longer explicit. That is, from the stream of symbols, we cannot tell where the codewords end. The stream of coded symbols is then transmitted to the receiver. Since C prime is a prefix code, the codewords can be represented by a code tree, which is shown here. In labeling the nodes of the tree, we adopt the convention that going up is 0 and going down is 1. So we have the codeword 0 for A, the codeword 10 for B, the codeword 110 for C, and the codeword 1111 for D. Now we are going to show that instantaneous decoding at the receiver can be done by tracing the code tree from the root. Consider the stream of symbols received at the receiver. First we receive the symbol 1, so starting from the root, we trace down. Then we receive the symbol 0, so we go up. We have reached the codeword 10, so we put a comma, denoting that this is the end of the codeword.
Now the next symbol received is 1, so we go down. The next symbol is 1, so we go down. The next symbol is 0, so we go up, and we reach another codeword, 110, and we put a comma there. Now the next received symbol is 1, so we go down. 1, go down. 1, go down. And 1, go down. Here we reach another codeword, 1111, and we put a comma. The next received symbol is 0, so we go up and reach the codeword 0, and put a comma. Now, the next symbol is 1, so we go down, 1, go down, 0, go up. So we reach the codeword 110, and we put a comma. So on and so forth. The boundaries of the codewords can thus be recovered. In this sense, a prefix code is said to be self-punctuating. Theorem 4.11 is a characterization of the existence of a D-ary prefix code. It says that there exists a D-ary prefix code with codeword lengths l 1, l 2, up to l m if and only if the Kraft inequality, summation over k from 1 to m of D to the power minus l k less than or equal to 1, is satisfied. The direct part of the theorem, that is, the only-if part, follows because a prefix code is uniquely decodable and hence satisfies the Kraft inequality. We now prove the converse, that is, the if part of the theorem. We need to prove the existence of a D-ary prefix code with codeword lengths l 1, l 2, up to l m, provided these lengths satisfy the Kraft inequality. Without loss of generality, assume that the codeword lengths are indexed in ascending order. Consider all the D-ary sequences of length less than or equal to l m and regard them as the nodes of the full D-ary tree of depth l m. This is illustrated in the figure. We will refer to a sequence of length l as a node of order l. Now, there are D to the power l 1 nodes of order l 1 which can be chosen as the first codeword, and since we assume that l 1 is greater than or equal to 1, this number is at least D, which is greater than 1. This is illustrated in the figure. Thus, choosing the first codeword is always possible.
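The tree-tracing decoder just described can be sketched in code. Here is a minimal Python sketch, assuming the code C prime above; the function and variable names are my own, and checking membership in the codeword table plays the role of reaching a leaf of the code tree:

```python
# The prefix code C' from the lecture: A -> 0, B -> 10, C -> 110, D -> 1111.
CODE = {"A": "0", "B": "10", "C": "110", "D": "1111"}
DECODE = {w: s for s, w in CODE.items()}  # codeword -> source symbol

def decode(bits):
    """Trace the code tree from the root: extend the current path one
    received bit at a time; as soon as the path is a codeword (a leaf),
    emit the symbol and restart from the root. Because no codeword is a
    prefix of another, this decoding is instantaneous."""
    out, path = [], ""
    for b in bits:
        path += b
        if path in DECODE:        # reached a leaf of the code tree
            out.append(DECODE[path])
            path = ""             # a comma: back to the root
    return out

# The received stream from the lecture, 10 110 1111 0 110, concatenated:
print(decode("1011011110110"))    # prints ['B', 'C', 'D', 'A', 'C']
```

The decoder recovers the codeword boundaries, the commas of the lecture, without any lookahead, which is exactly the self-punctuating property.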
Assume that the first i codewords have been chosen successfully, where i is between 1 and m minus 1, and we want to choose a node of order l i plus 1 as the i plus 1st codeword, such that it is not prefixed by any of the previously chosen codewords. Note that a node is prefixed by a codeword if and only if it is a descendant of that codeword in the tree. Since none of the previously chosen codewords is a prefix of any other, their descendants of order l i plus 1 do not overlap. The node to be chosen as the i plus 1st codeword cannot be a descendant of any of the previously chosen codewords. This is illustrated in the figure. Consider the full tree of depth l i plus 1. The number of nodes of order l i plus 1 is equal to D to the power l i plus 1. For a node of order l 1, the number of descendants of order l i plus 1 is equal to D to the power l i plus 1 minus l 1, because the depth of the subtree extending from that node is equal to l i plus 1 minus l 1. Likewise, for a node of order l 2, the number of descendants of order l i plus 1 is equal to D to the power l i plus 1 minus l 2. And for a node of order l i, the number of descendants of order l i plus 1 is equal to D to the power l i plus 1 minus l i. Therefore, the number of nodes which can be chosen as the i plus 1st codeword is precisely equal to D to the power l i plus 1, minus D to the power l i plus 1 minus l 1, minus D to the power l i plus 1 minus l 2, minus, all the way to D to the power l i plus 1 minus l i. If l 1, l 2, up to l m satisfy the Kraft inequality, we have summation over k from 1 to m of D to the power minus l k less than or equal to 1. Therefore, the sum of the first i plus 1 terms is also less than or equal to 1. That is, D to the power minus l 1, plus, all the way to D to the power minus l i, plus D to the power minus l i plus 1, is less than or equal to 1.
Multiplying the above inequality by D to the power l i plus 1, we have D to the power l i plus 1 minus l 1, plus, all the way to D to the power l i plus 1 minus l i, plus D to the power l i plus 1 minus l i plus 1, less than or equal to D to the power l i plus 1. Now, the term D to the power l i plus 1 minus l i plus 1 is D to the power 0, which is just equal to 1. Moving the remaining terms on the left-hand side to the other side and rearranging, we have D to the power l i plus 1, minus D to the power l i plus 1 minus l 1, minus, all the way to D to the power l i plus 1 minus l i, greater than or equal to 1. Now we recognize that the left-hand side of this inequality is exactly the number of nodes which can be chosen as the i plus 1st codeword, and we have shown that this is greater than or equal to 1. Therefore, we have shown by induction the existence of a prefix code with codeword lengths l 1, l 2, up to l m, completing the proof. We now introduce a special class of probability distributions. A distribution is called a D-adic distribution if p i is equal to D to the power minus t i for all i, where t i is a positive integer. When D is equal to 2, it is called a dyadic distribution. A corollary of the previous theorem is the following. There exists a D-ary prefix code which achieves the entropy bound for a distribution p i if and only if p i is D-adic. Here is the proof of the corollary. We first prove the only-if part. Consider a D-ary prefix code which achieves the entropy bound for a distribution p i. Let l i be the length of the codeword assigned to the probability p i. By Theorem 4.6, for all i, l i is equal to minus log D p i, or, p i is equal to D to the power minus l i. Thus p i is D-adic. The if part goes as follows. Suppose p i is D-adic. Let p i equal D to the power minus t i for all i, where t i is a positive integer; then t i is equal to minus log D p i. Let l i equal t i for all i.
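The inductive construction in the proof can be mirrored in code. The sketch below is my own, not from the lecture, and uses a canonical numbering of the tree nodes: processing the lengths in ascending order, each new codeword is the smallest order-l node that is not a descendant of a previously chosen codeword, which is what incrementing a base-D counter achieves.

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality."""
    return sum(D ** -l for l in lengths)

def to_base_D(n, l, D):
    """Write n as a D-ary string of exactly l digits (a node of order l)."""
    digits = []
    for _ in range(l):
        digits.append(str(n % D))
        n //= D
    return "".join(reversed(digits))

def build_prefix_code(lengths, D=2):
    """Choose codewords in ascending order of length, as in the proof.
    The counter c names the smallest order-l node that is not a
    descendant of any previously chosen codeword; the Kraft inequality
    guarantees c never runs past D to the power l."""
    if kraft_sum(lengths, D) > 1:
        raise ValueError("the Kraft inequality is violated")
    code, c, prev = [], 0, 0
    for l in sorted(lengths):
        c *= D ** (l - prev)   # descend from order prev to order l
        code.append(to_base_D(c, l, D))
        c += 1                 # skip the chosen node and its descendants
        prev = l
    return code

print(build_prefix_code([1, 2, 3, 4]))   # prints ['0', '10', '110', '1110']
```

For the lengths 1, 2, 3, 4 with D equal to 2, the Kraft sum is 15/16 and the construction returns the code 0, 10, 110, 1110, a prefix code with the same lengths as the code C prime of the lecture.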
We now verify that l i defined as such satisfies the Kraft inequality. Consider summation over i of D to the power minus l i, which equals summation over i of D to the power minus t i, because l i is equal to t i. Now, D to the power minus t i is equal to p i, and summation over i of p i is simply equal to 1, which is of course less than or equal to 1. Therefore the Kraft inequality is satisfied. Then by Theorem 4.11 there exists a prefix code with codeword lengths l i. Assign the codeword with length l i to the probability p i for all i. Since, for all i, l i is equal to t i, and t i is equal to minus log D p i, we have l i equal to minus log D p i. We see from Theorem 4.6 that this code achieves the entropy bound.
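As a quick numerical check of the corollary, not from the lecture itself, take the dyadic distribution one half, one quarter, one eighth, one eighth. The lengths l i equal to minus log 2 of p i give an expected codeword length exactly equal to the entropy:

```python
from math import log2

# A dyadic (D = 2) distribution: every p_i is a power of 1/2.
p = [1 / 2, 1 / 4, 1 / 8, 1 / 8]

# l_i = t_i = -log_2 p_i, as in the if part of the corollary.
lengths = [round(-log2(pi)) for pi in p]

H = -sum(pi * log2(pi) for pi in p)              # entropy in bits
L = sum(pi * li for pi, li in zip(p, lengths))   # expected codeword length

print(lengths, H, L)   # prints [1, 2, 3, 3] 1.75 1.75
```

These lengths satisfy the Kraft inequality with equality, so by Theorem 4.11 a prefix code with these lengths exists, for instance 0, 10, 110, 111, and its expected length equals the entropy.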