In this module, we will look at images as a particular class of multidimensional digital signals. We'll look at different ways to represent these two-dimensional signals, and we will explore some basic instances of images and of operators that we can apply to them. Before we start, let's make the acquaintance of this cute little dog. We decided to boost the ratings of our class by using the picture of a puppy for all the image examples that we'll use in the following. But even behind the puppy there's a hard mathematical reality: digital images can be expressed as a two-dimensional signal x[n1, n2], where n1 and n2 are integer indices and each combination of n1 and n2 indicates a point on a grid. We know already from everyday life that we call these points pixels, short for picture elements. Usually the grid is regularly spaced, so we have a regular arrangement of points, and the value of the signal at coordinates n1 and n2 encodes the pixel's appearance.

Now what do we mean by that? Take for instance a grayscale image, a black and white picture. In digital format, we can encode the appearance of each pixel by associating its gray level to a scalar value. The appearance will vary between a lowest value, which we associate to black, and a highest possible value, which we associate to white, with all gray levels in between. For color images, we need something a little bit more complicated: we need the concept of a color space. Now, the theory of color spaces is a fascinating subject, but it's way too complicated for even a cursory introduction here, so we will just rely on your everyday experience. You are all familiar with the RGB color model, for instance, which is used in computer monitors. The RGB coding associates to each color pixel three scalar values, R, G, and B, which represent the amounts of red, green, and blue that we need to mix to obtain the desired color. It's a fascinating fact about the human visual system that we can represent such a wide array of colors using just three components. Now, when we use a color model, it means that each pixel has a multidimensional value. But since these values are independent, we can split the original color image into three fundamental components, one for each coordinate of the color space. In the case of RGB, for instance, we will have three independent components that encode the red part, the green part, and the blue part. Each of these components is now a scalar image, and so in the following we will simply concentrate on scalar, grayscale images for processing.
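As a quick illustration of this channel splitting, here is a minimal NumPy sketch; the image here is a random placeholder standing in for an actual photo, and the file name in the comment is purely hypothetical:

```python
import numpy as np

# A placeholder 512-by-512 RGB image with values in [0, 1];
# in practice it could come from e.g. matplotlib.pyplot.imread("puppy.png").
rgb = np.random.rand(512, 512, 3)

# Splitting the color image into its three scalar components:
# each channel is itself a grayscale image x[n1, n2].
red, green, blue = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]

print(red.shape, green.shape, blue.shape)   # (512, 512) for each channel
```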
So in image processing, we're moving from one to two dimensions, and we know quite a bit about one-dimensional signal processing already. When we move to two dimensions, some things still work: we can still use concepts that we used in one dimension. Some things unfortunately break down, and new things appear. So let's look in turn at these three scenarios.

The things that work, and work rather well, are the concepts of linearity and convolution. The Fourier transform in two dimensions is a simple extension of the one-dimensional Fourier transform, and interpolation and sampling work exactly the same way in two dimensions.

What works less well is that, for instance, Fourier analysis, which algorithmically is just an extension of the 1D case, becomes much less relevant in the case of images. Filter design is much harder as well, and IIR filters are rare. And linear operators are only mildly useful; the reason is that images are very diverse signals. Imagine a photograph of a landscape where you have all sorts of objects and textures. Now, a linear operator, which in the 2D case would be a linear space-invariant operator, would apply the same kind of transformation to all the different parts of an image, regardless of what they represent. And of course it's kind of difficult to imagine that the same filter, unless it's a very, very simple operation, will yield the desired result when applied to very heterogeneous parts of an image.

Among the new concepts that appear in image processing is a new class of manipulations called affine transforms; these include rotation, scaling, and skewing of images, things you do in Photoshop all the time. Another new aspect is that images are finite-support signals: by definition you take an image with a camera, and the CCD, the sensor of the camera, has a finite surface, so images are intrinsically finite support. And because of that, an image is available in its entirety from the beginning of processing. So while in 1D you can imagine a system that works online, with samples coming in and you never knowing when the samples are going to stop, when it comes to images you assume that you have the whole image already in memory before you start processing. So causality is less of an issue in image processing. However, all of this applies to images, not to general 2D signals. Images are a very specialized signal, designed for a very specific type of receiver: the human visual system. So images are a very small subset of 2D signals, and a subset that is imbued with semantics. Now, semantics are very, very hard to deal with in our linear space-invariant paradigm.

Let's now look in more detail at the extension of our digital signal processing paradigm to two dimensions. A discrete-space signal is a signal that we indicate with the notation x[n1, n2], where n1 and n2 are two discrete-valued indices. The signal could be complex valued, but of course in the case of scalar images the values are going to be real. So how do we represent this 2D signal? From a standard mathematical point of view, we could represent it with a Cartesian plot where one axis indicates the first index, another axis indicates the second index, and the value of the signal is represented as a third coordinate; so we have a 3D plot where the scalar values form a surface. Sometimes, especially in conjunction with the description of filters, we are interested in what we call the support representation of a 2D signal. In this representation, we take a bird's-eye view of the signal and we only represent the nonzero values of the signal as dots in a two-dimensional plane. Since the height of a pixel, namely its scalar value, cannot be inferred from the dot alone, we often write the value of the signal at that particular location next to the dot. This plot, for instance, represents the two-dimensional delta signal, which is 0 everywhere except at the origin, where it is 1; so you have just one red dot here at the origin, with value 1. Of course, the most common representation for a 2D signal which is also an image is the image representation itself. In this case, we exploit the dynamic range of the medium, here a computer monitor, where we know that each pixel can be driven to represent a different shade of gray. And since the pixel values are packed very closely together in space, here for instance we have 512 by 512 pixel values, the density will be high and the eye will create the illusion of a continuous image.
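To make the delta signal and these display conventions concrete, here is a minimal NumPy/matplotlib sketch; the grid size and the plotting details are our own choices for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# A small 2D delta signal: 1 at the origin, 0 everywhere else.
# We place the origin at the centre of an 11-by-11 grid.
N = 11
delta = np.zeros((N, N))
delta[N // 2, N // 2] = 1.0

# Support representation: only the nonzero samples, with their values.
for n1, n2 in zip(*np.nonzero(delta)):
    print(f"nonzero sample at n1={n1 - N//2}, n2={n2 - N//2}, value={delta[n1, n2]}")

# Image representation: map the scalar values to gray levels
# (0 -> black, 1 -> white) and let the monitor do the rest.
plt.imshow(delta, cmap="gray", vmin=0.0, vmax=1.0)
plt.show()
```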
So one question that could come up naturally at this point is, why do we go through the trouble of defining a whole new two-dimensional signal processing paradigm? Can't we just convert images into 1D signals and use the standard tools that we've used so far? And sometimes that's exactly what we do: if you think of a printer that prints one line at a time as the paper rolls out, or of a fax machine, that's exactly what happens. However, if we do that, we miss out on the spatial correlation between pixels, and therefore the properties of an image become more difficult to understand.

Let's look at an example. Here we have a 41 by 41 pixel image, and the content of this image is simply a straight line; the angle of the line will change later. What you have in the bottom panel is what we call a raster-scan representation of the image: in other words, we go through the lines of the image one by one, and we plot the corresponding pixels on a single axis. Now, for a horizontal line that coincides with the n1 axis, the resulting unrolled representation is just a series of 0 pixels, except when we scan this line, at which point we will have 41 pixels equal to 1, and then we go back to 0. This is rather simple to understand, but if we change the angle of the line, we see that the unrolled representation changes in ways that are not very intuitive. When the angle is small, we have clusters of pixels interspersed with 0s. As the angle increases, the spacing of the clusters changes, and so does the number of pixels per cluster. It's very hard to understand the visual characteristics of the line from the position of the clusters and the number of pixels. After we pass the 45-degree angle, we have collections of single-pixel clusters, and the spacing of these clusters changes in even more subtle ways according to the angle of the line. Finally, when the line coincides with the n2 axis, we have single pixels separated by forty 0s, because as we scan the image, we hit one nonzero pixel and then we have to go through 40 pixels before we hit another one. This simple example should convince you that a full 2D representation is necessary to best describe and interpret an image signal.
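The two extreme cases of this unrolling are easy to reproduce in a few lines of NumPy; the image size matches the example above, while the exact row and column positions are our own illustrative choices:

```python
import numpy as np

M = 41

# A line along the raster-scan direction (the n1 axis in the example above):
# the unrolled signal contains a single run of 41 ones surrounded by zeros.
line_n1 = np.zeros((M, M))
line_n1[20, :] = 1.0                                # one full row of ones
print(np.nonzero(line_n1.flatten())[0])             # indices 820..860, contiguous

# A line perpendicular to the scanning direction (the n2 axis):
# after unrolling, the ones are isolated and separated by forty zeros.
line_n2 = np.zeros((M, M))
line_n2[:, 20] = 1.0                                 # one full column of ones
print(np.diff(np.nonzero(line_n2.flatten())[0]))     # all gaps equal to 41
```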
Just like we did in the 1D case, here are some basic signals. The first one, which we have already seen in passing, is the delta, the impulse, which is 0 everywhere except at the origin, where it is equal to 1; its support representation looks like so. The two-dimensional rect signal is defined by two parameters, which we may call the width and the height of the rect. It is 0 everywhere except in a rectangular region, namely for those values of the n1 index that are smaller than N1 in magnitude and those values of the n2 index that are smaller than N2 in magnitude; and of course it looks like this.

One fundamental property of two-dimensional signals that has no equivalent in one dimension is that of separability. Separability simply means that we can write a two-dimensional signal as the product of two independent 1D signals defined on the indices n1 and n2. The delta signal, for instance, is fully separable because it is just the product of two delta functions, one for each index. And similarly, the rectangular function is the product of two one-dimensional rect functions defined over n1 and n2. Separability is a fragile property, in the sense that simple transformations or linear combinations of separable objects may no longer be separable. In this picture you have a square, for instance, which is simply a rect function with equal width and height, but rotated by 45 degrees. This signal is no longer separable: it is equal to 1 when the sum of the magnitudes of the two indices is smaller than N, and 0 otherwise, and there is no way we can express it as the product of two elementary 1D signals. Similarly, the difference of two rect signals, which are separable to begin with, gives rise to a non-separable frame-like pattern as in the picture.

Separability is not a natural property of images, but it is very important from the computational point of view. We can understand that fully if we consider the two-dimensional convolution, which is a simple, straightforward extension of the one-dimensional case. The convolution of two sequences in discrete space is simply the sum, for the first index going from minus infinity to plus infinity, of the sum, for the second index going from minus infinity to plus infinity, of the first sequence times the second sequence, space-reversed and centered at n1 and n2. So we just have a doubling of indices with respect to the 1D case. If one of the signals is separable, then we can show that the convolution splits into two one-dimensional convolutions, as shown here. If we assume, for instance, that h is separable in the convolution of h[n1, n2] with x[n1, n2], then the convolution can be performed in two steps: first we convolve x with h2[n2], using only n2 as the free variable, and then we convolve the result with h1[n1]. The consequences of this split are extremely important when we consider the computational requirements of the convolution operator. If h[n1, n2] is a finite-support signal whose support is of size M1 by M2 (and this, by the way, is going to be the standard setup for FIR filtering operations in 2D), a non-separable convolution will require M1 x M2 operations per output sample, whereas a separable convolution will require M1 + M2 operations per output sample, which is generally much, much smaller than the number of operations required in the non-separable case.
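To see this saving in practice, here is a minimal sketch, assuming NumPy and SciPy, with an averaging filter chosen purely for illustration; it computes the same output once with the full 2D kernel and once as two cascaded 1D convolutions:

```python
import numpy as np
from scipy.signal import convolve2d

# A test image and a separable filter h[n1, n2] = h1[n1] * h2[n2];
# the sizes and the averaging taps are arbitrary choices for illustration.
x = np.random.rand(64, 64)
M1, M2 = 7, 5
h1 = np.ones(M1) / M1                      # 1D averaging filter along n1
h2 = np.ones(M2) / M2                      # 1D averaging filter along n2
h = np.outer(h1, h2)                       # the full M1-by-M2 2D filter

# Direct 2D convolution: about M1 * M2 = 35 operations per output sample.
y_direct = convolve2d(x, h, mode="full")

# Separable implementation: first convolve along n2, then along n1,
# for about M1 + M2 = 12 operations per output sample.
y_tmp = convolve2d(x, h2[np.newaxis, :], mode="full")      # 1D convolution in n2
y_sep = convolve2d(y_tmp, h1[:, np.newaxis], mode="full")  # 1D convolution in n1

print(np.allclose(y_direct, y_sep))        # True, up to numerical precision
```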