it turns out that the DFT looks like this.

Well, what happened was that the signal was a sinusoid that corresponded to one

of the basis vectors for that signal length.

But with this representation, in theory, instead of needing N·R bits to

code the signal (R bits for each of the N samples),

we just need to code two DFT coefficients, possibly together with their positions.

And so the number of bits that we spend to code the signal is definitely

less than N·R.
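As a quick sketch of this situation (assuming NumPy; the length N = 64 and frequency index k0 = 5 are hypothetical choices for illustration), we can verify that such a sinusoid has only two nonzero DFT coefficients:

```python
import numpy as np

N = 64                      # signal length (hypothetical)
k0 = 5                      # frequency index: a multiple of the fundamental 2*pi/N
n = np.arange(N)
x = np.cos(2 * np.pi * k0 * n / N)   # sinusoid aligned with a DFT basis vector

X = np.fft.fft(x)
nonzero = np.where(np.abs(X) > 1e-9)[0]
print(nonzero)              # the only nonzero bins are k0 and N - k0
```

All the remaining bins are zero up to floating-point noise, which is exactly why coding two coefficients plus their position suffices.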

So the lesson that we get from this simple example is that we would like a transform

that when transforming an image block captures the important features of

the block in just a few coefficients.

So that we can just encode those coefficients and discard all the rest.

Also, we would like the transform to be efficient to compute

because we're interested in an efficient compression algorithm.

The answer to these two requirements is a transform

called the discrete cosine transform, or DCT for short.

Mathematically the DCT can be expressed as such.

It is very similar to the DFT.

Except that the basis functions are based on

cosines rather than on complex exponentials.

The advantage of this formulation is that the transform is purely a real

transform when applied to real signals such as images.

Instead of studying the structure of this double summation,

we can just exploit our intuition about basis expansions.

And realize that each coefficient of the DCT is just the inner product,

namely a measure of similarity between the image and each of the basis functions.
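To make the inner-product view concrete, here is a sketch assuming SciPy's `dctn` (which computes the 2-D DCT-II, the variant used in JPEG, with orthonormal scaling when `norm='ortho'`); the block contents and the indices k1 = 3, k2 = 5 are arbitrary choices for illustration:

```python
import numpy as np
from scipy.fft import dctn

N = 8
block = np.random.default_rng(1).uniform(0, 255, size=(N, N))  # a real "image" block

def dct_basis(k1, k2, N=8):
    """Orthonormal 2-D DCT-II basis function with frequency indices (k1, k2)."""
    c = lambda k: np.sqrt(1 / N) if k == 0 else np.sqrt(2 / N)
    n = np.arange(N)
    b1 = c(k1) * np.cos(np.pi * (2 * n + 1) * k1 / (2 * N))
    b2 = c(k2) * np.cos(np.pi * (2 * n + 1) * k2 / (2 * N))
    return np.outer(b1, b2)

# each DCT coefficient is the inner product of the block with one basis function
k1, k2 = 3, 5
inner = np.sum(block * dct_basis(k1, k2))
assert np.isclose(inner, dctn(block, norm='ortho')[k1, k2])
```

Note also that the result is purely real, since both the block and the cosine basis functions are real.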

Now remember we're operating at block level.

So if, for instance, we're going to split our image,

as is customary in JPEG, into eight-by-eight pixel blocks.

It means that N here is equal to 8.

And the same in the other direction.

And so we will have 64 basis functions for the space of eight-by-eight images.

We can actually plot these basis functions like so, and

here we have increasing values of the index K1 and of the index K2.

And we can see that by computing the DCT, we're actually correlating, namely

measuring the similarity between an image block and each one of these patterns here.

Some patterns will capture edge-like behavior.

Some patterns will capture texture-like behavior.

But in general, there will be a pattern that will be able to capture most

of the information contained in the small block.
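For instance, a block made of a constant plus a single cosine pattern is captured entirely by two DCT coefficients. A small sketch, assuming SciPy and a hypothetical block:

```python
import numpy as np
from scipy.fft import dctn

n = np.arange(8)
# hypothetical block: a constant plus one vertical cosine pattern
pattern = np.cos(np.pi * (2 * n + 1) * 2 / 16)        # matches basis index k1 = 2
block = 128 + 40 * np.outer(pattern, np.ones(8))

coeffs = dctn(block, norm='ortho')
print(np.sum(np.abs(coeffs) > 1e-9))                  # → 2
```

Only the DC coefficient and the coefficient at (k1, k2) = (2, 0) are nonzero; the other 62 can be discarded at no cost.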

And this is really the philosophy behind transform coding in JPEG.

Now, when it comes to smart quantization,

what we need to do is to modify slightly the definition of quantization that we

have seen in module 7.

The idea is that we want to be able to discard as many values

of the transform as possible.

And by discarding them, we mean that we set them to zero.

And therefore, we don't need to specify their amplitude.

This is achieved by using the so-called deadzone quantizers that we will

see in just a second.

The second part of smart quantization is being able to change the quantization step

according to the importance of the coefficients that we are quantizing.

So key coefficients will require a fine-grained resolution.

Therefore more bits will be allocated to them,

whereas less important coefficients can be quantized rather coarsely.
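A minimal sketch of this idea (the coefficient values and step sizes below are made up for illustration; in JPEG the per-coefficient steps are stored in an 8×8 quantization table):

```python
import numpy as np

# toy 2x2 "DCT" block: one large low-frequency value, smaller high-frequency ones
coeffs = np.array([[920.0, -45.0], [30.0, 4.0]])
# hypothetical step sizes: coarser steps for less important coefficients
steps = np.array([[8.0, 16.0], [16.0, 32.0]])

quantized = np.round(coeffs / steps)     # integer indices sent to the encoder
reconstructed = quantized * steps        # decoder's approximation
print(quantized)                         # small high-frequency values collapse to 0
```

Dividing by a larger step before rounding is what allocates fewer bits (and more distortion) to the less important coefficients.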

So let's go back to the quantization model that we saw in module 7.

Suppose that our input is bounded between -2 and 2,

and that we're using two bits to quantize the input.

Then we're dividing the input range into four intervals,

because we have two bits per input sample.

And we are associating a representative value to each interval,

which happens to be the midpoint.

Mathematically in this case we can say that the quantized

value is simply the floor of the input value plus 0.5.
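In code, this 2-bit mid-rise quantizer could be sketched as follows (assuming NumPy; the clipping is just to keep inputs inside the [-2, 2) range):

```python
import numpy as np

def midrise_quantize(x):
    # 2-bit mid-rise quantizer for inputs in [-2, 2): four unit-width intervals,
    # each represented by its midpoint; note there is no zero output level
    return np.floor(np.clip(x, -2, 1.999)) + 0.5

print(midrise_quantize(np.array([-1.7, -0.01, 0.01, 1.2])))   # → [-1.5 -0.5  0.5  1.5]
```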

You can see however that in this model, there is no zero value in the output.

If the input is even slightly positive, it will get associated to 0.5,

and if it's slightly negative to -0.5, but there is no zero.

A deadzone quantizer, on the other hand, uses intervals that have the same width as

the standard quantizer,

but centers an interval around 0.

So all values in this case, from -0.5 to 0.5, will be mapped to 0.

So small values will be quantized to 0.

And then the rest proceeds with the usual staircase, so

everything is shifted by 0.5.

From 0.5 to 1.5, we will have a representative value of one.

Mathematically, this is equivalent to saying that the quantization scheme

simply rounds the input value to the nearest integer.
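As a sketch, the deadzone quantizer (with unit step) is then just rounding:

```python
import numpy as np

def deadzone_quantize(x):
    # deadzone quantizer: round to the nearest integer, so everything
    # in (-0.5, 0.5) collapses to zero and small values are discarded
    return np.round(x)

print(deadzone_quantize(np.array([-0.4, 0.3, 0.49, 0.7, 1.4])))   # → [-0.  0.  0.  1.  1.]
```

Compared with the mid-rise quantizer above, the whole staircase is shifted by half an interval so that a zero output level exists.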

You can notice now that we have an asymmetric characteristic for

the quantizer.

So in a sense, we're wasting half a bit; we can add a level on the right side or

on the left side if need be.

But the advantage of having a deadzone largely offsets the loss of half a bit.

And finally, let's consider the last ingredient in the success of JPEG

which is entropy coding.

The idea behind entropy coding is to minimize the effort to encode a certain

amount of information.