Welcome back to the course on Audio Signal Processing for Music Applications. This week we're talking about the short-time Fourier transform, and the spectrogram is basically the output of the STFT: it's the visualization of the time-varying spectra that we compute. So in this demonstration class I want to use the spectrogram, the short-time Fourier transform, to analyze a voice sound, so that we can better understand the STFT and, at the same time, better understand a given sound.
So we'll start with Sonic Visualiser. This is the sound I want to analyze. It's a note sung by a soprano, so let's hear it. [SOUND] Okay, that's quite a high-pitched sound, and it has quite a bit of vibrato, the frequency oscillation that is characteristic of operatic singing. To understand the sound a little better, let's open a pane with the spectrogram. Here it is, and now we clearly see some information about this voice sound: we see these horizontal lines that correspond to the harmonics, and we see this oscillation, which is basically the vibrato that is present.
To go a little deeper, let's open another pane with a single spectrum of this time-varying spectrogram. Let's first show the horizontal axis on a linear scale, and let's have lines as the interpolation. For the window, let's use the same window size as for the STFT, so we'll use 1024. Okay, and this is it. Let's zoom in and stretch it so that we see the same things that we see in the spectrogram, okay? So this is one slice of the spectrogram, and all these horizontal lines correspond to the peaks that we see here.
And now let's change things, for example the window size. Let's change it to 256 in both analyses, okay? Now what we're seeing is a much smoother shape. We are not seeing the individual harmonics; we're just seeing an overall shape, which basically corresponds to what we call the formants.
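As a rough check of why the harmonics disappear with the shorter window, the following sketch compares the main-lobe width of a Blackman window with the spacing between harmonics. The 6-bin main-lobe width is the usual figure quoted for the Blackman window. The 44100 Hz sampling rate and the 330 Hz fundamental (roughly the note E4) are assumptions for illustration, not values measured from the recording, and `mainlobe_width_hz` is a hypothetical helper name.

```python
def mainlobe_width_hz(window_size, fs, mainlobe_bins=6):
    """Main-lobe width in Hz for a window of `window_size` samples.

    One bin of a window that is `window_size` samples long spans
    fs / window_size Hz; the Blackman main lobe covers about 6 such bins.
    """
    return mainlobe_bins * fs / window_size


fs = 44100   # assumed sampling rate in Hz
f0 = 330.0   # assumed fundamental in Hz, so harmonics are 330 Hz apart

results = {}
for M in (1024, 256):
    width = mainlobe_width_hz(M, fs)
    # Harmonics show up as separate peaks only if each main lobe
    # fits inside the gap between neighbouring harmonics.
    resolved = width <= f0
    results[M] = (width, resolved)
    print(M, round(width, 1), resolved)
```

Under these assumptions the main lobe is about 258 Hz wide for 1024 samples, narrower than the harmonic spacing, and about 1034 Hz wide for 256 samples, so neighbouring harmonics merge and only the overall envelope remains visible.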
Formants are the resonances of the vocal tract, which are what allow us to distinguish between vowels, for example: each vowel has a characteristic formant structure. Of course, if we move along the sound, things will change a little bit, but the vowel remains the same, so it will not change that much. If we go back to the analysis size that allows us to visualize the harmonics and we move, well, we see a few more changes, because the harmonics are changing more than the formants, okay?
Now, if we want to change the type of window, we can go to the Preferences and, in the Analysis tab (let's move this out of the way), here in the last option we see the analysis window to be used. Okay, currently it is the Blackman window. Let's change it, for example, to the rectangular window, which is the square window that cuts the signal very abruptly. Let's apply that. Okay, so clearly it doesn't look so nice. The single spectrum looks quite noisy: the noise floor is very high, so we see very few of the harmonics. And in the spectrogram we see these vertical lines, which are a kind of distortion. So that's clearly not that good, and for this sound we would definitely prefer the Blackman window.
Okay, now let's do the same kind of analysis using the sms-tools. With the sms-tools, we have the single spectrum using the DFT option, or the time-varying spectrogram using the STFT option. Let's first start with a single spectrum. Let's go to the soprano sound, soprano E4, okay? Let's analyze in the middle of the sound, say 0.5, and let's use the same values that we used before: 1024 for both the FFT size and the window size, with the Blackman window. And we can compute. Okay, and this is basically what we saw before. These are the 1024 samples we started with, this is the magnitude and phase spectrum, and we clearly see the peaks corresponding to the harmonics. Here we also see the phase spectrum, which we didn't see before, and we also see the inverse of this.
So, this is the windowed signal that we generate back by taking the inverse FFT of this spectrum. And, of course, we can do the same thing as before. We can change the window: for example, if we change the window size to 256, and also the FFT size to 256, and compute at the same location, well, we see that we are, of course, taking many fewer samples, and the spectra are much smoother, with less information there, okay? One advantage of this interface is that we can control the window size independently from the FFT size. So we can put an FFT size of 1024 and a window size that is not that large, maybe 801, and compute, okay? We are taking fewer samples than before, but the frequency resolution is still quite good. And the spectrum is quite smooth because we have been doing zero padding, so its shape is quite nice.
Okay, let's look at all this from the short-time Fourier transform perspective, from the spectrogram. So we will get the same sound. Again, let's put 1024 and 1024, and the hop size has to be sufficiently smaller than the window size, so that the frames overlap-add correctly. For 1024 with the Blackman window, we need a hop size of at most one-fourth of the window size, so let's put 256 and compute. Okay, and this is the result. We have the input signal, the magnitude spectrogram, the time-varying phases, and the output sound. And the output sound [SOUND], well, it's very much the same as the original, because we have done a good reconstruction with a good overlap.
However, if this overlap is not correctly set, for example if we make the hop size the same as the window size, 1024, and compute, well, clearly now there is something wrong in the output signal. Let's listen to it. [SOUND] Okay? We see this modulation at the frame rate because we are not overlapping correctly: every frame we see a burst of sound, and the frames don't balance out by the overlap factor. So we definitely need a much smaller hop size.
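The effect of the hop size on the reconstruction can be sketched by summing shifted copies of the window and checking how flat the result is. This is only an illustration with NumPy, assuming the output is formed by overlap-adding the windowed frames. The periodic Blackman formula is written out explicitly, and the helper names `overlap_add_envelope` and `ripple` are hypothetical, not part of the sms-tools.

```python
import numpy as np


def overlap_add_envelope(window, hop, n_frames=32):
    """Sum `n_frames` copies of `window`, each shifted by `hop` samples."""
    M = len(window)
    total = np.zeros(hop * (n_frames - 1) + M)
    for k in range(n_frames):
        total[k * hop : k * hop + M] += window
    # Drop one window length at each end, where fewer frames overlap.
    return total[M:-M]


def ripple(window, hop):
    """Peak-to-peak variation of the envelope relative to its mean."""
    env = overlap_add_envelope(window, hop)
    return (env.max() - env.min()) / env.mean()


M = 1024
n = np.arange(M)
# Periodic Blackman window (assumed form; period M, no repeated end sample).
w = 0.42 - 0.5 * np.cos(2 * np.pi * n / M) + 0.08 * np.cos(4 * np.pi * n / M)

ripple_quarter = ripple(w, M // 4)  # hop = 256, the setting used in the demo
ripple_full = ripple(w, M)          # hop = 1024, no overlap at all

print("hop 256 :", ripple_quarter)
print("hop 1024:", ripple_full)
```

With a hop of 256 the shifted windows add up to a constant, so the ripple is at the level of floating-point error and each sample is scaled by the same gain. With a hop of 1024 the envelope is just the window repeated frame after frame, falling to zero at every frame boundary, which corresponds to the burst once per frame heard in the second example.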
And anyway, that's all I wanted to say. Basically, I encourage you to play around with these parameters. You can change the window size, the FFT size, the hop size, and the type of window. Between the DFT and the STFT interfaces, I believe you can get a good grasp of all these different parameters and the effect they have on the visualization of the spectrogram and also on the reconstruction of the signal.
So, let's finish. Today we have used Sonic Visualiser to analyze a voice sound and to visualize its spectrogram and individual spectra. We have also done the same thing with the interface of the sms-tools. And the sound we have used is a soprano sound from Freesound. So this has hopefully given you a more practical view of the STFT and has allowed you to understand how useful it can be to use these types of techniques to visualize a particular sound, in this case a soprano sound.
Of course, this is just the beginning of a more complex analysis. In the next demonstration class, we're going to analyze a more complex sound. We will see how we can analyze time-varying sounds that have much more structure, and how we can use this spectrogram analysis to get some insight into them. So thank you very much for your attention, and I hope to see you next class.