And finally, 100 will take us to the symbol C.

So as you can see,

the binary sequence is uniquely decoded into a sequence of symbols.
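The decoding rule just described can be sketched in a few lines of Python. The codebook below is a hypothetical example chosen to be consistent with the lecture (the shortest codeword is the single digit 0, and 100 maps to C); the actual table depends on the code in use.

```python
# Hypothetical prefix-free codebook: no codeword is a prefix of another.
codebook = {"0": "A", "11": "B", "100": "C", "101": "D"}

def decode(bits, codebook):
    """Read bits left to right; emit a symbol as soon as the buffer matches a codeword."""
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in codebook:  # prefix-free: the first match is the only possible match
            symbols.append(codebook[buffer])
            buffer = ""
    return symbols

print(decode("100011", codebook))  # ['C', 'A', 'B']
```

Because the code is prefix-free, the decoder never needs to look ahead or backtrack: each codeword boundary is unambiguous.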

Now that we know how prefix-free codes work,

the goal becomes minimizing the total bit stream length.

And the intuition is rather clear.

For the symbols that appear the most often in the input sequence,

we should use short binary sequences.

For instance, in the previous example, we had a symbol that was associated to

the shortest possible binary sequence, just the digit 0.

So it makes sense to choose the most probable symbol for that position.

There is a famous algorithm by Huffman that allows you to build the optimal

prefix-free binary code for a given set of symbol probabilities.

The JPEG algorithm provides you with a pre-canned Huffman code,

which is of course not optimal for every image that you encode,

but which performs reasonably well under most circumstances.

Alternatively, you can run the encoder,

compute the probabilities of each r, s pair, and build your own

Huffman code using the Huffman algorithm that we will see in just a second.

Of course, in this case, you will have to send the code along with an image to

inform the decoder on how to parse the bit stream.

And so there will be a side-information penalty that you will have to pay.

But sometimes,

when the image is very large, this could be advantageous nonetheless.

Let's see how the Huffman algorithm works.

Suppose, once again, we have four symbols, and suppose

that we found that the probability table for the four symbols is the following.

The first symbol occurs with 38% probability, the second with 32%,

the third with 10%, and the fourth one with 20%.

The Huffman algorithm proceeds,

by selecting at each step the two symbols which are least probable.

So, in this case, it would be C and D.

And we build a subtree, where the two symbols are the leaves and

they are connected to an intermediate node.

Now, the idea is that the probability of going through this node, at this point,

will be the combined probability of the leaves.

So at the next step of the algorithm we will have three probabilities:

the two symbols that are left in the pool, A and B,

and a composite symbol with a probability of 30%.

So, at the following step we select the two symbols from the set,

including the composite symbol, with the lowest probabilities.

And in this case it will be the symbol B, at 32%, and the composite symbol, at 30%.

And we repeat the process: the subtree we built at the first step becomes a single node,

and now we're combining that node with B into another subtree.

Now here, B is a leaf, because it's a symbol.

And we get an intermediate node,

whose probability is the sum of the two children.

Again, we're left with one remaining symbol from the pool, A, with probability 38%,

and a composite symbol with a total probability of 62%.

So at the last step, we finally combine the two remaining items.

And we have the final tree, where of course,

the probability at the root of the tree is 1.

This binary tree represents the optimum prefix-free code for the set of

probabilities that we found experimentally by analyzing the symbol frequencies.
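The merging procedure described above can be sketched in Python with a min-heap, using the probabilities from the walkthrough. The tie-break index and the choice of which child gets the 0 branch are arbitrary, so other equally optimal codes exist.

```python
import heapq

# Probabilities from the walkthrough: A 38%, B 32%, C 10%, D 20%.
probs = {"A": 0.38, "B": 0.32, "C": 0.10, "D": 0.20}

def huffman_codes(probs):
    """Repeatedly merge the two least probable nodes into a subtree."""
    # Each heap entry: (probability, unique tie-break index, {symbol: partial code}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)   # least probable node
        p1, _, right = heapq.heappop(heap)  # second least probable node
        # Prepend the branch bit for every symbol under each child.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        # The merged node's probability is the sum of its children's.
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes(probs)
print(codes)  # e.g. {'A': '0', 'B': '11', 'C': '100', 'D': '101'}
```

With these probabilities the code lengths come out as 1, 2, 3, and 3 bits, for an expected length of 0.38·1 + 0.32·2 + 0.10·3 + 0.20·3 = 1.92 bits per symbol, slightly better than the 2 bits a fixed-length code would need for four symbols.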

And a JPEG encoder would use a similar tree to encode the r,

s pairs that were found in the run-length pass of the algorithm.