0:09

We see that it is not easy to measure the entropy or the uncertainty of a source.

There is typically structure or there are dependencies in the data, which means that there is redundant information, and this needs to be accounted for when measuring the entropy of the source.

1:21

The answer is that estimating the entropy of a source is a very difficult task.

Shannon himself actually looked into this question, trying to estimate the entropy of the English language, in his seminal 1948 paper and in other papers. So if we assume a discrete memoryless source (DMS) model for the language, and we assume that all the letters are equally represented, that is, have equal probabilities, then this is the estimate of the entropy we obtain: it's actually about 4.76 bits per letter, if we include the space as the 27th letter.
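This equal-probability estimate is simply log base 2 of the alphabet size. A minimal sketch:

```python
import math

# 26 letters plus the space as a 27th symbol
alphabet_size = 27

# For a DMS with equally likely symbols, the entropy reduces to
# log2 of the alphabet size.
entropy = math.log2(alphabet_size)

print(f"{entropy:.2f} bits per letter")  # about 4.75
```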

Â 2:00

We can do better than that.

We still use a DMS model, but now we go and measure the probabilities of appearance of each individual letter. In this case, the estimate of the entropy becomes smaller: 4.14 bits per letter.

We still know, however, that this is not a very accurate model, because there are dependencies among letters. There are combinations of letters that are more probable than others, and combinations that do not exist in the English language.

So therefore, if one accounts for these dependencies by considering blocks of up to eight letters, then this is the estimate of the entropy of the source that one can obtain.

So this is one of the most accurate estimates of the entropy of the English language, and therefore this is the number one should try to come close to when designing efficient codes for encoding the English language. In general, the more we know about the structure of the data, the better our estimate of the entropy is going to be.

We're going to demonstrate this with two simple toy examples. So assume we have this stream of symbols here. This source has six symbols in its alphabet. In this stream we go and measure the appearances of symbol 2: it appears two times out of ten, therefore the probability of 2 is 0.2. The same holds for the probability of 4, and so on and so forth.

We use the formula of the entropy for a DMS and we find that in this particular case the entropy is 2.44 bits per symbol. By the way, since there are six symbols, if I were to use a fixed-length code I would need three bits per symbol, because two to the third is eight. So with three bits I can express eight different codewords. Here I have only six symbols, but using a variable-length code I could, in principle, get closer to the entropy of 2.44 bits per symbol.
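The exact stream appears on the slide; as an illustration, here is a hypothetical ten-symbol stream over a six-symbol alphabet, chosen so that symbols 2 and 4 each appear twice, with the DMS entropy measured from the observed frequencies (it comes out close to the 2.44 bits per symbol quoted):

```python
from collections import Counter
from math import log2

# Hypothetical stream consistent with the lecture's measurements:
# symbols 2 and 4 each appear 2 times out of 10.
stream = [1, 2, 3, 4, 2, 1, 5, 1, 6, 4]

counts = Counter(stream)
n = len(stream)

# DMS entropy: H = -sum_i p_i * log2(p_i), with p_i measured from the stream
entropy = -sum((c / n) * log2(c / n) for c in counts.values())

print(f"{entropy:.2f} bits per symbol")
```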

Now, if we stare at this particular stream for a little while, we can see some structure in it, and to express that structure we propose a prediction model.

Â 4:32

So, first of all, x(n) is the value in this stream at location n. If this is location n, then the one to the left is at location n minus 1, and so on. This first-order linear prediction model tells us that I can find the value of the signal at location n by assuming it is equal to the value at the previous location n minus 1, plus the residual, or the correction term, or the error.

Â 5:12

So the residual consists only of plus and minus 1. I measure the probability of plus 1 and the probability of minus 1, and I use the DMS formula for the entropy. We see that the entropy is about 0.7 bits per symbol, so this estimate is more than three times lower than the entropy we first obtained.
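As a hedged sketch (the exact residual probabilities are read off the slide; here I assume p(+1) = 0.8 and p(-1) = 0.2, which reproduces roughly the figure quoted), the binary entropy of the residual is:

```python
from math import log2

# Assumed residual probabilities (illustrative, not from the slide itself)
p_plus, p_minus = 0.8, 0.2

# Binary DMS entropy of the +/-1 residual
entropy = -(p_plus * log2(p_plus) + p_minus * log2(p_minus))

print(f"{entropy:.2f} bits per symbol")  # 0.72, i.e. roughly 0.7
```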

We're going to look at predictors of this nature multiple times in this segment of the course, where we'll talk about compression. The idea here is that we build a predictor to predict, as here, the value of x at location n, and then all we have to encode is the unpredictable part, which is the residual.

Â 5:59

Of course, in decoding, I need to have the predictor model; it's a static predictor in this particular case. So given the predictor model and the residual, I am able to reconstruct the original sequence.

A similar example is presented here. This is now the generated message, the stream of symbols. I see three different symbols here, so I measure their probabilities. The source generates only symbol 3, symbol 4, and symbol 5, and I see that, according to the entropy, I need 1.5 bits per symbol to represent this.

Now if we again stare at it for a second, we see that, in essence, if I look at two symbols at a time, so it's like I'm extending the source, I now use a block code, you might say, then I only have the pair 33 and the pair 45 (the slide should read 45 here). And in this case, since they are equally probable, the entropy is one bit per new symbol, or, therefore, half a bit per original symbol. So again, we can reduce the entropy estimate considerably by looking into the structure of the data, exploiting it, and capitalizing on it.
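These two measurements can be sketched as follows, using a hypothetical stream built only from the pairs 33 and 45 (the actual stream is on the slide), with each pair assumed equally likely:

```python
from collections import Counter
from math import log2

def dms_entropy(seq):
    """Entropy in bits per element, probabilities measured from seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Hypothetical stream made only of the pairs (3,3) and (4,5), equally likely
stream = [3, 3, 4, 5, 4, 5, 3, 3, 4, 5, 3, 3]

# Single-symbol model: p(3) = 0.5, p(4) = p(5) = 0.25
h1 = dms_entropy(stream)

# Block model: look at two symbols at a time (the extended source)
pairs = list(zip(stream[0::2], stream[1::2]))
h2 = dms_entropy(pairs)

print(f"{h1:.2f} bits/symbol vs {h2 / 2:.2f} bits/symbol")  # 1.50 vs 0.50
```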

Let us now look into a simple example of codes, to get a better understanding of them and also to introduce and discuss some other properties of codes. We are given this toy source. It emits one of four symbols, and here are the associated probabilities of the symbols.

Â 8:04

And we can find the entropy of this particular source: it is 1.75 bits per symbol.

You are given this code, Code 1, and we are asked to figure out whether this is a good code, a useful code. It takes a quick second to look at it and realize that if we are given, for example, the codeword 0, it's not clear whether we should decipher it as symbol S1 or as symbol S2. So this code is not uniquely decodable, or uniquely decipherable. We can find the average codeword length, or the rate, for this code: it's 1.125 bits per symbol, and since this is smaller than the entropy, this is already a red flag that there's probably something wrong with this code.

We are also presented with Code 2, and we're asked the same question. It probably takes an additional second to realize that this code is not uniquely decodable either. If we are given, for example, the sequence of codewords 00, we could decipher it as S1, S1, or we could decipher it as the symbol S3. So this code is not uniquely decodable. Again, the rate of the code is below the entropy, the Source Coding Theorem is violated, and it's not a useful code.

Â 9:41

We're given Code 3. One observation about this code is that it is a prefix code. The name might be a bit misleading: it's actually a prefix(-free) code, which means no codeword is a prefix of another codeword. As such, it's intuitively clear that this code is uniquely decodable, and, more than that, the decoding is instantaneous.

In other words, if we look at the bit stream, let's say 010110, and start deciphering it: as soon as we see 0, we know that this should be decoded as S1, because 0 is a codeword in Code 3. Then I see that 10 is a codeword that exists there, so this is deciphered as S2. Then 110 is deciphered as S3. So this is a prefix code, and any prefix code is uniquely decodable; it is also instantaneous.

We see, actually, that its rate, the average codeword length, is exactly the entropy, so this is, you might say, an optimal code: it achieves the entropy. We cannot do better than that.
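Code 3's codewords, as read off above, are 0, 10, 110, and, by implication, 111 for S4. Here is a sketch of instantaneous decoding and of the rate computation, assuming the symbol probabilities 1/2, 1/4, 1/8, 1/8 that yield the 1.75-bit entropy:

```python
from math import log2

# Code 3: a prefix(-free) code; S4 = '111' is implied by the lecture
code = {'0': 'S1', '10': 'S2', '110': 'S3', '111': 'S4'}

# Assumed symbol probabilities consistent with H = 1.75 bits/symbol
probs = {'S1': 0.5, 'S2': 0.25, 'S3': 0.125, 'S4': 0.125}

# Instantaneous decoding: emit a symbol as soon as the buffer matches
# a codeword; a prefix code never needs lookahead.
def decode(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in code:
            out.append(code[buf])
            buf = ''
    return out

print(decode('010110'))  # ['S1', 'S2', 'S3']

# Entropy and average codeword length (rate) coincide for this code
entropy = -sum(p * log2(p) for p in probs.values())
rate = sum(probs[s] * len(w) for w, s in code.items())
print(entropy, rate)  # both 1.75 bits per symbol
```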

We are also given Code 4.

Â 11:08

It takes a bit more thinking now to see whether this one is uniquely decodable or not. Actually, there is a specific algorithmic test with which you can check whether a given code is uniquely decodable, and this is indeed a uniquely decodable code. It's not a prefix code, however, because 0, for example, appears as a prefix of other codewords. So it is not a prefix code.

Â 11:40

And it is not instantaneous either, since it is not a prefix code. So we see that the relationship between unique decodability and prefix codes goes one way: prefix implies uniquely decodable, but uniquely decodable does not mean that the code is a prefix code, as indicated by Code 4. We see that the rate of this last code is a bit higher than the entropy. So if one were to choose one out of the four codes, Code 3 would be the one of choice, because being prefix and instantaneous is an advantage, and its rate is also the lowest among the four (although, as we've already discussed, Codes 1 and 2 are not useful or acceptable codes in the first place).

Â 13:21

So clearly, to get to the codeword for S1, I go through one 0, so S1 is represented by 0. S3 is represented by 00, and S4 is represented by 11. These ending nodes are called leaf nodes, and the root is actually also a branch node, because the tree branches out at that particular location. So let's do the same thing for Code 3.

Â 14:05

So to obtain the codeword for symbol S3, we see that we have to go through a 1 here, a 1 here, and a 0 here. So the codeword for S3 is 110.

Now, an important observation here is that Code 3, if you remember, is a prefix code, which you can observe right away by looking at the tree representation: in order to get to any of the codewords, we don't need to go through another codeword. That was the case, for example, with Code 2: to get to S3 there, we had to go through S1, so S1 is a prefix of S3. But here this is not the case. None of the codewords is a prefix of another codeword, and therefore this is a prefix code. And, equally importantly, the leaf nodes here are represented by symbols, or rather, the symbols are represented by leaf nodes.
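The leaf-node property is equivalent to being prefix-free, which is easy to check programmatically. A sketch using Code 3's codewords and the two overlapping codewords of Code 2 mentioned earlier:

```python
# Prefix-free check: no codeword may be a prefix of another, which is
# equivalent to every symbol sitting at a leaf of the code tree.
def is_prefix_free(codewords):
    for u in codewords:
        for v in codewords:
            if u != v and v.startswith(u):
                return False
    return True

code3 = ['0', '10', '110', '111']  # symbols at leaf nodes
code2_like = ['0', '00']           # '0' is a prefix of '00', as in Code 2

print(is_prefix_free(code3))       # True
print(is_prefix_free(code2_like))  # False
```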

Â 15:44

So I have also put here the various terms that I introduced. This is an important point that we see here for prefix codes, and we'll see, when we talk pretty soon about the Huffman code, that due to its construction we end up with a tree exactly like this. In other words, the symbols are represented by leaf nodes, and therefore we know that it is a prefix code.

Â 16:27

So, as mentioned, it's hard to tell if a code is uniquely decodable. However, there is a systematic procedure based on which one can determine, in a finite number of steps, whether a given code is indeed uniquely decodable.
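This procedure is the Sardinas-Patterson test: repeatedly form the set of "dangling suffixes" and check whether any of them is itself a codeword. A minimal sketch (the specific codewords of Codes 2 and 4 are on the slide, so the test cases below use illustrative codes):

```python
def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test for unique decodability."""
    C = set(codewords)

    # Initial dangling suffixes: w such that some codeword u = v + w
    # for two distinct codewords u, v
    suffixes = {u[len(v):] for u in C for v in C if u != v and u.startswith(v)}

    seen = set()
    while suffixes:
        new = set()
        for s in suffixes:
            if s in C:           # a dangling suffix equals a codeword:
                return False     # some bit string has two parsings
            for c in C:
                if c.startswith(s):
                    new.add(c[len(s):])
                if s.startswith(c):
                    new.add(s[len(c):])
        seen |= suffixes
        suffixes = new - seen    # finitely many suffixes, so this terminates
    return True

print(is_uniquely_decodable(['0', '10', '110', '111']))  # prefix code: True
print(is_uniquely_decodable(['0', '1', '00']))           # '0' vs '00': False
print(is_uniquely_decodable(['1', '10', '100']))         # UD but not prefix: True
```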
