The spectrogram is a clever way of showing this time-varying spectral information in one single plot. If you think about it, the short-time Fourier transform is a complex-valued function of two variables, m and k. And so to plot it properly we would need a four-dimensional plot, which of course is not possible. We can restrict ourselves to the magnitude of the DFT, at which point the STFT becomes a real-valued function of two variables, which requires a three-dimensional plot. Now, this is not only quite hard to do, but also rather difficult to interpret. To make it easier to understand, what we do instead is color-code the magnitude of the Fourier transform: we use dark hues for small values and whitish or brilliant hues for large values. We also take the logarithm of the magnitude in order to compress the range of values and to better map them onto a color scale. And we put the spectral slices one after another to obtain an image-like picture of the time-varying spectrum. So this is the spectrogram of the DTMF signal. On the horizontal axis we put the variable m, so the starting point for each spectral slice, and on the vertical axis we put the index of the DFT coefficient. We have a real signal, so we just go from zero to L over two, where L is the size of our DFT window. You can see here that the black pixels in the picture correspond to very small values of the DFT. So these black areas indicate the silence regions in the DTMF signal. At the same time, the bright bands here correspond to high values of the DFT coefficients. So these are actually the frequencies in each digit being dialed. And so in the plot we have shown, at the same time, both the time information, since we have a good estimate of where each digit begins and ends, and the frequency content that is associated with each digit. So we can read this picture and find out that the digits were 1-5-9 in sequence.
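To make the construction concrete, here is a minimal sketch of it in Python with NumPy. It is not the code behind the plot in the lecture: the test signal is a synthetic DTMF sequence for the digits 1-5-9, the tone and silence durations are made up, and the names `dtmf_signal` and `spectrogram` are just illustrative. The slices here do not overlap and no tapering is applied.

```python
import numpy as np

FS = 8000          # sampling rate in Hz (the lecture's DTMF signal uses 8 kHz)
L = 256            # analysis window length in samples

# Standard DTMF (row, column) frequency pairs in Hz for the digits used here.
DTMF = {"1": (697, 1209), "5": (770, 1336), "9": (852, 1477)}

def dtmf_signal(digits, tone_s=0.4, gap_s=0.2, fs=FS):
    """Build silence, tone, silence, ... (durations are assumptions)."""
    gap = np.zeros(int(gap_s * fs))
    t = np.arange(int(tone_s * fs)) / fs
    parts = [gap]
    for d in digits:
        lo, hi = DTMF[d]
        tone = np.sin(2 * np.pi * lo * t) + np.sin(2 * np.pi * hi * t)
        parts += [tone, gap]
    return np.concatenate(parts)

def spectrogram(x, L):
    """Log-magnitude DFT of consecutive, non-overlapping L-sample chunks.

    Returns an array of shape (L // 2 + 1, n_slices): rows are the DFT
    bins 0..L/2, columns are the time slices placed one after another.
    """
    n_slices = len(x) // L
    frames = x[: n_slices * L].reshape(n_slices, L)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mag + 1e-10).T   # small offset avoids log(0)

x = dtmf_signal("159")
S = spectrogram(x, L)
print(S.shape)   # (number of frequency bins, number of time slices)
```

Displaying `S` as an image with a dark-to-bright color map (for instance with a plotting library's image function, lowest bin at the bottom) should give a picture of the kind described above: dark columns for the silences and two bright horizontal bands for each digit.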
If we know the system clock for the signal, or the sampling rate, we can label the axes just in the same way we did for the DFT. So remember, the highest positive frequency is Fs over 2, where Fs is the sampling rate of the signal. The frequency resolution, how fine a frequency we can resolve in a DFT, will be given by Fs over L. And the width of the time slice, so the time resolution, is L times Ts seconds, where Ts is one over Fs. So if we apply this to the DTMF signal, which was sampled at eight kilohertz, we can label the axes like so. We have a maximum frequency of 4 kHz and a total duration of the signal of 2.1 s. The natural questions that should come to mind at this point are: what about the width of the analysis window? We chose 256 points. Why? Is it the optimal size? What happens if we take a larger window? What happens if we take a smaller window? How do we position these windows along the signal? Do they overlap, and if so, by how much? And what is the shape of the window that we should use? By shape here, I mean the following. Here we're taking chunks of L samples and just taking a DFT of the raw data. Now suppose my signal is a very smooth signal over this window that goes like this. So this is a smooth signal and my DFT should have basically just low-frequency coefficients. But now remember that to the DFT, everything is a periodic sequence. So what the DFT really sees is something like this. Now here, we have all of a sudden a big jump at this discontinuity. And this will create spurious high-frequency content in the DFT coefficients. We can counteract this side effect by taking the raw data in a chunk and using a tapering window. So for instance, suppose we have a tapering window shaped like a triangle: we multiply each sample by the value of the window. And we will have a signal that is pretty much identical to the original data in the middle of the window, but then it tapers to zero at the extremities.
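The effect of the triangular taper can be checked numerically with a small sketch. The chunk below is an assumed example, a slow ramp, chosen only because it is smooth inside the window but does not end where it starts; `np.bartlett` is NumPy's triangular window, and `high_freq_fraction` is an illustrative helper, not something from the lecture.

```python
import numpy as np

L = 256
n = np.arange(L)

# A smooth chunk that does not end at the value where it starts: a slow ramp.
# Its periodic extension jumps from almost 1 back to 0 at every boundary.
x = n / L

# Triangular (Bartlett) taper: 0 at both edges, close to 1 in the middle.
w = np.bartlett(L)

def high_freq_fraction(chunk):
    """Fraction of the DFT energy that falls in the upper half of bins 0..L/2."""
    energy = np.abs(np.fft.rfft(chunk)) ** 2
    return energy[L // 4:].sum() / energy.sum()

raw = high_freq_fraction(x)          # raw data, jump at the boundary
tapered = high_freq_fraction(x * w)  # tapered data, no jump
print(f"raw: {raw:.2e}  tapered: {tapered:.2e}")
```

The tapered chunk should show a much smaller share of high-frequency energy than the raw one, because its periodized version no longer contains the jump.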
And so at the end you will get something without jumps in the periodized version, like so. So, the whole story is that we could really spend weeks talking about all the tricks and tweaks that we can apply to a spectrogram in order to extract some kind of information from a real-world signal. But since we don't have that kind of time here, we will just talk about the main trade-off, which is related to the size of the analysis window. Spectrograms can be either wideband or narrowband according to the frequency resolution of the associated DFT. So if we choose a long window, if our L is big, in that case we have a narrowband spectrogram. Why is that? Because a long window will give us more DFT points and therefore more frequency resolution. Remember, the frequency resolution in the end is equal to the sampling frequency divided by L, or 2 pi divided by L if we remain in abstract discrete time. However, in a long window, more things can happen. And so we have less precision in the time resolution. In the limit, a window as long as the signal gives us the DFT of the entire signal. And we have seen that this completely obliterates the time information. Conversely, if we choose a short window, then we have a wideband spectrogram. A short window will create many time slices, because you will divide the whole support of the signal into more chunks. And so we have a much more precise location of the transitions. But a short window will give us fewer DFT points, so the frequency resolution will be poor. So let's use our DTMF signal once again and look at the difference between a wideband and a narrowband spectrogram. Here is an example of a wideband spectrogram where the analysis window is just 32 samples. With such a short analysis window, we have a very good localization in time of the start and stop point for each burst of sound. But you can see that the frequency bands are extremely wide.
Also, having a very short window creates artifacts in the high-frequency range because, as we sweep the window over the signal, we will be encompassing uneven numbers of periods of the underlying sinusoids. This is the spectrogram that we saw in the beginning. The window now is 256 points. So let's say that this spectrogram is in between an extremely narrowband and an extremely wideband one. This is a very good compromise to interpret what's going on in the signal. If we increase the size of the window to 1024, so four times larger, then we have an extremely narrowband spectrogram. You see now that the frequencies are localized extremely precisely, but on the other hand, the time resolution is very poor. We're completely missing the silence here in the beginning, for instance, and the silence between these two digits.
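To put numbers on the three cases, here is a small sketch that just applies the two formulas from before, Fs over L for the frequency resolution and L times Ts for the time resolution, to the three window sizes at a sampling rate of 8 kHz. The function name `resolutions` is only illustrative.

```python
FS = 8000  # sampling rate of the DTMF signal in Hz

def resolutions(L, fs=FS):
    """Return (frequency resolution in Hz, time resolution in seconds)."""
    return fs / L, L / fs

for L in (32, 256, 1024):
    df, dt = resolutions(L)
    print(f"L = {L:4d}   Fs/L = {df:8.4f} Hz   L*Ts = {dt * 1000:5.1f} ms")
```

With 32 samples each DFT bin is 250 Hz wide, which is more than the 73 Hz separating the two lowest DTMF row frequencies (697 Hz and 770 Hz), but each slice lasts only 4 ms. With 1024 samples the bins are under 8 Hz wide, but each slice spans 128 ms, which is consistent with short silences getting smeared out in the narrowband picture.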