
Take, for instance, a grayscale image: a black-and-white picture.

In digital format, we can encode the appearance of each pixel in the image

by associating the grayscale level to a scalar value.

The appearance will vary from a lowest value, which we associate with black,

to a highest possible value, which we associate with white, and

all gray levels in between.

For color images, we need something a little bit more complicated.

We need to use the concept of a color space.

Now the theory of color spaces is a very fascinating subject.

But it's way too complicated for even a cursory introduction here.

So we will just rely on your everyday experience.

You are all familiar with the RGB color model for instance,

which is used in monitors for your computer.

The RGB coding associates with each color pixel three scalar values,

R, G, and B, which represent the amounts of red, green, and

blue that we need to mix to obtain the desired color.

It's a fascinating fact about the human visual system that we can actually

represent such a wide array of colors using just three components.

Now, when we use a color model, it means that each pixel has a multidimensional

value.

But since these values are independent,

we can split the original color image into three fundamental components

that are associated with the components of the color space.

So in the case of RGB for instance, we will have three independent components

that encode the red part, the green part and the blue part.

Each of these images is now a scalar image.

And so in the following, we will simply concentrate on such scalar images for

processing.
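To make the channel split concrete, here is a minimal NumPy sketch (the 4-by-4 random image is just an illustration; real images would be loaded from a file):

```python
import numpy as np

# Hypothetical 4x4 RGB image: shape (height, width, 3), 8-bit values.
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Splitting along the last axis gives three independent scalar images,
# one per component of the color space.
r = rgb[:, :, 0]
g = rgb[:, :, 1]
b = rgb[:, :, 2]

print(r.shape, g.shape, b.shape)  # three 4x4 scalar images
```

Each of `r`, `g`, `b` can now be processed exactly like a grayscale image.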

So in image processing, we're moving from one to two dimensions.

And we know quite a bit about one-dimensional signal processing already.


The things that work, and

work rather well, are the concepts of linearity and convolution.

The Fourier transform in two dimensions is a simple extension of the one dimensional

Fourier transform.

And interpolation and sampling work exactly in the same way in two dimensions.

What works less well in image processing is that, for instance,

Fourier analysis, which algorithmically is just an extension of the 1D case,

becomes much less relevant in the case of images.

Filter design is much harder as well, and IIR filters are rare.

And linear operators are only mildly useful, and

the reason is that images are very diverse signals.

Imagine a photograph of a landscape where you have all sorts of objects and

textures.

Now a linear operator (in the 2D case, a linear space-invariant

operator) would apply the same

kind of transformation to all the different parts of an image,

regardless of what they represent.

And of course it's kind of difficult to imagine that the same filter,

unless it's a very, very simple operation,

will yield the desired result when applied to very heterogeneous parts of an image.

New concepts also appear in image processing. One is that we can

introduce a new class of manipulations called affine transforms.

These include rotation, scaling, and skewing of images,

things you do in Photoshop all the time.

Another new element is the fact that images are finite-support signals.

By definition you take an image with a camera, and the CCD, the sensor of

the camera, has a finite surface, so images are intrinsically finite-support.

And because of that,

an image signal is available in its entirety from the beginning of processing.

So while in 1D you can imagine a system that works online, with

the samples coming in and you never knowing when the samples are going to stop,

When it comes to images, you sort of assume that you have the whole image

already in memory before you start processing.

So causality is less of an issue in image processing.

However, all this applies to images not to general 2D signals.

Images are a very specialized class of signals.

And they are designed for a very specific type of receiver, the human visual system.

So images are a very small subset of 2D signals.

And a subset that is imbued with semantics.

Now, semantics are very, very hard to deal with in our linear,

space-invariant paradigm.


So how do we represent this 2D signal?

Well, from a standard mathematical point of view, we could represent it with

a Cartesian plot where we have one axis that indicates the first index.

The second axis indicates the second index and

the value of the signal is represented as a third coordinate.

So we have a 3D plot where the scalar values form a 3D surface.

Sometimes, especially in conjunction with the description of filters,

we are interested in what we call the support representation of a 2D signal.

In this representation, we take a bird's-eye view of the signal and we only

represent the nonzero values of the signal as dots in a two-dimensional plane.

Since the height of a pixel,

namely its scalar value, cannot be inferred from the dot representation alone,

we often write the value of the signal at that particular location next to the dot.

So this plot, for instance, represents the two-dimensional delta signal,

which is a signal that is 0 everywhere except at the origin, where it is 1.

And so you have just one red dot here at the origin with value 1.
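The 2D delta can be written down directly; here is a small sketch that samples it on a grid centered at the origin:

```python
import numpy as np

def delta2d(n1, n2):
    """Two-dimensional impulse: 1 at the origin, 0 everywhere else."""
    return 1 if (n1 == 0 and n2 == 0) else 0

# Sample the signal on a 5x5 grid centered at the origin.
grid = np.array([[delta2d(n1, n2) for n2 in range(-2, 3)]
                 for n1 in range(-2, 3)])
print(grid)  # a single 1 in the center, 0 elsewhere
```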

Of course, the most common representation for

a 2D signal that is also an image is the image representation itself.

In this case, we exploit the dynamic range of the medium;

here we have a computer monitor, and

we know that each pixel can be driven to display a different shade of gray.

And since the pixel values are packed very closely together in space

(here, for instance, we have 512 by 512 pixel values),

the density will be high and the eye will create the illusion of a continuous image.

So one question that could come up naturally at this point is,

why do we go through the trouble of defining

a whole new two-dimensional signal processing paradigm?

Can we just convert images into 1D signals and

use the standard things that we've used so far?

And of course, sometimes that's exactly what we do: think of

a printer that prints one line at a time as the paper rolls out, or a fax machine.

That's exactly what happens.

However, if we do that, we miss out on the spatial correlation between pixels, and

therefore the properties of an image will be more difficult to understand.

Let's look at an example.

Here we have a 41 by 41 pixel image and

the content of this image is simply a straight line.

We will see that the angle of the straight line will change later.

What you have in the bottom panel is what we call a raster scan representation of

the image.

In other words, we go through the lines of the image one by one,

and we plot the corresponding pixels on this axis.

Now, for a horizontal line that coincides with the n1 axis, the resulting

unrolled representation is just a series of 0 pixels,

except when we scan that line, at which point we will have

41 pixels equal to 1, and then we will go back to 0.

So this is rather simple to understand, but

if we change the angle of the line, we see that the representation

in an unrolled fashion changes in ways that are not very intuitive.

When the angle is small, we have clusters of pixels interspersed with 0s.

As the angle increases, the spacing of the clusters changes, and

so does the number of pixels per cluster.

It's very hard to understand the visual characteristics of the line from

the position of the clusters and the number of pixels.

After we pass the 45-degree angle, we will have

collections of single-pixel clusters.

And the spacing of these clusters will change in even more

subtle ways according to the angle of the line.

Finally, when the line coincides with the n2 axis,

we will have single pixels that are separated by 40 zeros,

because as we scan the image, we will hit a nonzero pixel, and

then we will have to go through 40 pixels before we hit another one.
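The raster-scan behavior is easy to reproduce; this sketch builds a 41-by-41 image (the choice of row 20 for the horizontal line is an arbitrary illustration) and unrolls it row by row:

```python
import numpy as np

N = 41
img = np.zeros((N, N))

# Horizontal line along the n1 axis (here: row 20 of the image).
img[20, :] = 1

# Raster scan: unroll the image row by row into a 1D signal.
unrolled = img.reshape(-1)

# All 41 nonzero samples form one contiguous cluster.
nonzero = np.flatnonzero(unrolled)
print(nonzero[0], nonzero[-1])  # 820 860

# A 45-degree diagonal line instead yields 41 single-pixel clusters,
# evenly spaced 42 samples apart -- far less intuitive to read.
diag = np.eye(N).reshape(-1)
print(np.diff(np.flatnonzero(diag)))  # all gaps equal 42
```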


Just like we did for the 1D case, here are some basic signals.

The first one, which we have already seen in passing, is the delta, or impulse,

which is 0 everywhere except at the origin, where it is equal to 1.

And the support representation is like so.

The two dimensional rect signal is defined by two parameters,

which we may call the width and the height of the rect.

And it is 0 everywhere except in a rectangular region, which is

defined by those values of the n1 index

that are smaller than capital N1 in magnitude, and

those values of the n2 index that are smaller in magnitude than capital N2.

And of course it looks like this.
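The 2D rect can be sketched as follows (whether the bound on the indices is strict is a convention; a non-strict bound is assumed here):

```python
import numpy as np

def rect2d(n1, n2, N1, N2):
    """2D rect: 1 inside the rectangle |n1| <= N1, |n2| <= N2, 0 outside.
    (A non-strict bound is assumed; the exact convention may differ.)"""
    return 1 if (abs(n1) <= N1 and abs(n2) <= N2) else 0

# Sample on a grid wide enough to contain the whole support.
N1, N2 = 3, 1
grid = np.array([[rect2d(n1, n2, N1, N2) for n2 in range(-4, 5)]
                 for n1 in range(-4, 5)])
print(grid.sum())  # (2*N1 + 1) * (2*N2 + 1) = 21 nonzero samples
```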


Separability is a fragile property in the sense that

we can have simple transformations or

linear combinations of separable objects which are no longer separable.

In this picture here you have a square, for instance,

which is simply a rect function with equal width and height,

but which is rotated by 45 degrees.

This signal is no longer separable.

It is expressed as 1 when the sum

of the magnitudes of the indices is smaller than capital N, and 0 otherwise.

But there is no way that we can express this signal as the product

of two elementary 1D signals.

Similarly, the difference of two rect signals which are separable to begin with,

gives rise to a non-separable frame-like pattern as in the picture.

Separability does not represent a natural property of images.

But it is very important from the computational point of view.

We can understand that fully if we consider the two dimensional convolution,

which is really a simple straightforward extension of the one dimensional case.

The convolution of two sequences in discrete space is simply the sum,

for the first index going from minus infinity to plus infinity,

of the sum, for the second index going from minus infinity to plus infinity,

of the first sequence times the second sequence,

space-reversed and centered at n1 and n2.

So we just have a doubling of indices with respect to the 1D case.
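Written out explicitly, the double sum just described is (in the standard notation, consistent with the 1D definition):

```latex
(x * h)[n_1, n_2] \;=\; \sum_{k_1=-\infty}^{\infty} \; \sum_{k_2=-\infty}^{\infty} x[k_1, k_2]\, h[n_1 - k_1,\, n_2 - k_2]
```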

If one of the signals is separable, then we could show that the convolution can be

split into two one dimensional convolutions as shown here.

So if we assume, for instance, that h is separable in the convolution of h of n1 n2

with x of n1 and n2, then the convolution can be performed in two steps.

First we can convolve x with h2 of n2, using only n2 as the free variable,

and then we can convolve the result with h1 of n1.
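The two-step separable convolution can be checked against a direct 2D convolution; here is a minimal NumPy sketch (the kernel values for h1 and h2 are arbitrary illustrations):

```python
import numpy as np

def conv2d_full(x, h):
    """Direct 2D convolution (full output), used here as a reference."""
    M1, M2 = h.shape
    P1, P2 = x.shape
    y = np.zeros((P1 + M1 - 1, P2 + M2 - 1))
    for k1 in range(M1):
        for k2 in range(M2):
            # Each kernel tap shifts and scales a copy of the input.
            y[k1:k1 + P1, k2:k2 + P2] += h[k1, k2] * x
    return y

# A separable filter: h[n1, n2] = h1[n1] * h2[n2].
h1 = np.array([1.0, 2.0, 1.0])
h2 = np.array([1.0, -1.0])
h = np.outer(h1, h2)

x = np.arange(12.0).reshape(3, 4)

# Step 1: 1D-convolve every column with h1 (n1 as the free variable).
step1 = np.apply_along_axis(lambda c: np.convolve(c, h1), 0, x)
# Step 2: 1D-convolve every row of the result with h2.
y_sep = np.apply_along_axis(lambda r: np.convolve(r, h2), 1, step1)

print(np.allclose(y_sep, conv2d_full(x, h)))  # True
```

For an M1-by-M2 kernel, the two-step version costs on the order of M1 + M2 multiplies per output sample instead of M1 times M2, which is the computational advantage discussed next.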

The consequences of this split are extremely important when we consider

the computational requirements of the convolution operator.

If h[n1, n2] is a finite-support signal, and

the support is of size capital M1 times capital M2.

And this by the way is going to be the standard setup for

FIR filtering operations in 2D.

A non-separable convolution will require M1 x M2 operations per output sample.

Whereas a separable convolution will require M1 +

M2 operations per output sample, which is generally much,

much smaller than the number of operations required in the previous case.