Hello, welcome to the fourth week of the course on Audio Signal Processing for Music Applications. In this demonstration class I want to continue what we started in the last one, which was to analyze a sound, in that case the sound of a soprano, using the short-time Fourier transform, the topic of this week. So in this lecture I want to analyze another sound, one that can give us another view of the short-time Fourier transform. So let's open Sonic Visualiser, and this is the sound we're going to analyze today. This is the sound of a piano, so let's listen to it. [MUSIC] Okay, so this is a very simple piano phrase, quite clear, five notes.

So let's go directly to the sms-tools and to the short-time Fourier transform module. Let's select the piano sound that is here, piano.wav. Okay, now let's decide on the parameters. In the last class we mentioned that the Blackman window was quite a good choice for what we are doing, so let's keep it. Now the window size. This sound is not as high-pitched as the voice, so we need quite a bigger window size. So, I don't know, let's start with, for example, 1,501. This is an odd-size window, and this is something that we will do whenever possible. If we take windows with an odd size, they can be centered around zero, and especially for the phase analysis that's going to be very convenient. So let's use that, and let's make a habit of always using odd-size windows. The FFT size has to be bigger than that, and of course normally we will be using a power of 2, which is efficient if you use the FFT algorithm. The power of 2 bigger than 1,501 is 2,048. Okay, that's a good size. And the hop size, for the Blackman window, has to be such that the windows overlap correctly. So the hop can be at most one-fourth of the window size. One-fourth of 1,501 would be around 375, so let's say 325, okay, which is a bit less than one-fourth. And let's compute. Okay, this is the input sound, and the magnitude and phase spectrograms.
And this is the reconstructed output. So let's first just listen to the reconstruction. [SOUND] Okay, that's pretty good. So I guess we haven't lost any information in the analysis. That means that the hop size and the window size were chosen correctly, so that the windows overlap correctly.

The phase spectrogram looks very minimalist, but it's quite interesting. We see these very clear vertical lines, and these correspond to the attacks. Basically this means that during these attacks the phase is quite disrupted, it changes quite a lot. There is quite a big transition there, and that's something that we see very clearly in the phase information. And in the steadier parts of the notes we see more of this horizontal structure. That means that the harmonics maintain a kind of phase continuity that can be identified in the phase spectrogram.

In the magnitude spectrogram we see the harmonics very clearly. These are the red lines. And we see that as the sound evolves, the piano being a percussive instrument, in the attack there is more energy and so there are more harmonics. As time evolves, the harmonics decay, and especially the high harmonics decay, while the low harmonics stay longer. We also see quite clearly the attacks of the sound and what is going on during the attacks, so that's quite interesting.

Okay, now let's zoom in and go into some detail. Let's use the option of this figure to zoom into a rectangle, and let's just take this middle note, the fourth note, from a little bit before the attack to around where the note ends. Okay, and that's what we're getting, and what we are seeing, in fact, is the discretization of the analysis. We have zoomed in enough that we can see this vertical kind of quantization, these vertical bars. These correspond to every frame, every spectrum computed. So the width of every bar corresponds to the number of samples of the hop size.
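The parameter choices made so far can be sketched in a few lines of Python. This is a minimal sketch and not the sms-tools code: the helper names are made up for illustration, the Blackman window is the standard symmetric one written out by hand, and the "ripple" value is just one simple way to check whether the overlapping windows add up to a nearly constant value, which is what "overlap correctly" is taken to mean here.

```python
import math

def next_power_of_two(m):
    """Smallest power of 2 that is >= m (hypothetical helper)."""
    n = 1
    while n < m:
        n *= 2
    return n

def blackman(M):
    """Standard symmetric Blackman window of length M."""
    return [0.42
            - 0.5 * math.cos(2 * math.pi * n / (M - 1))
            + 0.08 * math.cos(4 * math.pi * n / (M - 1))
            for n in range(M)]

def overlap_ripple(M, H, n_frames=20):
    """Overlap-add n_frames windows spaced H samples apart and return
    (max - min) / mean of the sum in the fully overlapped region.
    A value near 0 means the windows add up to a constant."""
    w = blackman(M)
    total = [0.0] * ((n_frames - 1) * H + M)
    for k in range(n_frames):
        start = k * H
        for n in range(M):
            total[start + n] += w[n]
    steady = total[M:(n_frames - 1) * H]  # skip the fade-in and fade-out
    mean = sum(steady) / len(steady)
    return (max(steady) - min(steady)) / mean

M = 1501                   # odd window size, as in the lecture
N = next_power_of_two(M)   # 2048
H = 325                    # a bit less than M / 4

print("odd window:", M % 2 == 1, "| FFT size:", N, "| overlap factor:", M / H)
print("ripple with H = 325:", overlap_ripple(M, H))
print("ripple with H = 750:", overlap_ripple(M, 750))  # half the window: too large a hop
```

With a hop of about half the window, the sum of the windows is no longer close to constant, which is why the hop is kept at or below one-fourth of the window size for the Blackman window.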
So these are the 325 samples that we hop from one frame to the next. And vertically we also see this kind of discretization, these horizontal lines, which are much narrower because we have taken quite a lot of samples in the FFT. We have taken 2,048 samples, so we have pretty good frequency resolution.

Let's compute with a different set of parameters. For example, let's use a smaller window size, say 201 samples. And let's use a correspondingly smaller FFT size, it doesn't have to be that big, so let's say 256. And of course the hop size has to match the window size, at most one-fourth of it, so let's use 50. And now let's compute. It takes a little while because, of course, the hop size being smaller, it has to compute more FFTs, and this is what we get. Basically we are visualizing a similar thing, the analysis and then the synthesis, and the synthesis is going to be pretty good. Let's listen to it. [MUSIC] Since we have maintained the same relationship between the hop size and the window size, the identity is preserved, so the output sound is identical to the original.

But now let's zoom into the same region that we zoomed into before to try to understand the differences. Let's take a little bit before the attack and a little bit of the steady state, okay. And let's compare it with the previous one. Okay, this was the previous one and this is the new one. Well, quite different. If we go back to what we were talking about before, the vertical and horizontal lines, we see that the vertical bars are narrower. There are more frames per second here, so the time resolution is better. Okay, so we see more of how things evolve in time. In exchange, on the vertical axis, the frequency resolution is worse because the window and the FFT size were much smaller, and therefore these boxes are larger along the vertical axis. So we see less information in the frequency domain.
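The difference between the two analyses can be put into numbers. The sketch below assumes a sampling rate of 44,100 Hz, which the lecture does not state for piano.wav, and uses the usual rule of thumb that the main lobe of a Blackman window is 6 bins wide at the window length. The function name and the dictionary keys are made up for illustration.

```python
def stft_resolution(fs, M, N, H):
    """Rough time and frequency resolution figures for one STFT setting.

    fs: sampling rate in Hz, M: window size, N: FFT size, H: hop size.
    """
    return {
        "frame_step_ms": 1000.0 * H / fs,  # width of one vertical bar
        "frames_per_second": fs / H,
        "bin_spacing_hz": fs / N,          # height of one box
        "main_lobe_hz": 6.0 * fs / M,      # Blackman main lobe, 6 bins at length M
    }

FS = 44100  # assumed sampling rate, not given in the lecture

settings = {
    "first analysis  (M=1501, N=2048, H=325)": (1501, 2048, 325),
    "second analysis (M=201,  N=256,  H=50)": (201, 256, 50),
}

for label, (M, N, H) in settings.items():
    r = stft_resolution(FS, M, N, H)
    print(label)
    for name, value in r.items():
        print("   %-18s %10.2f" % (name, value))
```

Under that assumption the second setting has much shorter frame steps, so it follows the attack more finely in time, but its bins and its main lobe are several times wider, so neighboring harmonics are harder to separate.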
And this is the core of one of the things that are fundamental to the short-time Fourier transform, what we call the time-frequency compromise. In the first case we had good frequency resolution and not so good time resolution. In exchange, in this second example we have pretty good time resolution but not so good frequency resolution. And that's quite an important consideration to take into account when we analyze a sound and decide what the best set of parameters is for a particular sound.

Okay, now let's look at one aspect, the attack, and try to understand some aspects of the sound by looking at this kind of spectrum analysis that we have started to do. For that, let's use the DFT, okay, and we will just compute the DFT at one location in the attack. The attack, more or less, was around, let's see, 1.54 seconds. That's about where the attack is. And let's keep the same resolution that we had, so let's keep the window size of 1,501 and the FFT size of 2,048. And let's use the piano sound. Okay, now we'll compute it. Okay, this is the beginning. We see here the attack of the piano, so we see quite a bit of things going on here, then the phase and the reconstruction. Let's zoom into the beginning. So let's just get the magnitude spectrum up to, let's say, well, up to 10,000 hertz. Okay, so we see quite a bit of things.

Let's now recompute with the same parameters but a little bit beyond the attack, where it is more of a steady state. So let's say 100 milliseconds after, at 1.64 seconds, with the same parameters, okay, and this is another analysis. And again, let's zoom into the same region, the one that goes up to 10,000 hertz, so that we get all the information. Okay, and let's compare them, and let's see if we can understand what is going on at the sound level. The top is the attack, the bottom is the more steady state.
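Picking the frame for a single DFT like this comes down to some index arithmetic. The sketch below is an illustration and not the sms-tools implementation: it assumes a sampling rate of 44,100 Hz (not stated in the lecture), that the frame is centered on the chosen time, and helper names that are made up here. It also shows, on a tiny example, why an odd-size window is convenient: placed around index zero of the FFT buffer, a symmetric frame gives a purely real spectrum, so the window adds no phase of its own.

```python
import cmath

def frame_bounds(t, fs, M):
    """Start and end (exclusive) sample indices of an odd-size frame
    of M samples centered at time t seconds."""
    center = int(round(t * fs))
    half = M // 2
    return center - half, center + half + 1

def bins_up_to(f_max, fs, N):
    """Number of FFT bins whose frequency k * fs / N is <= f_max."""
    return int(f_max * N / fs) + 1

def zero_phase_buffer(frame, N):
    """Place a frame in an FFT buffer of size N so that its center
    sample sits at index 0 and its first half wraps to the end."""
    M = len(frame)
    h1 = (M + 1) // 2   # center sample plus second half
    h2 = M // 2         # first half
    buf = [0.0] * N
    buf[:h1] = frame[h2:]
    buf[N - h2:] = frame[:h2]
    return buf

def dft(x):
    """Direct DFT, fine for tiny inputs (an FFT would be used in practice)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

FS = 44100  # assumed sampling rate, not given in the lecture

print("attack frame:      ", frame_bounds(1.54, FS, 1501))
print("steady-state frame:", frame_bounds(1.64, FS, 1501))
print("bins up to 10 kHz: ", bins_up_to(10000, FS, 2048))

# Tiny symmetric, odd-size frame standing in for a windowed sound frame.
buf = zero_phase_buffer([1.0, 2.0, 3.0, 2.0, 1.0], 8)
print("zero-phase buffer: ", buf)
print("largest imaginary part:", max(abs(c.imag) for c in dft(buf)))
```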
In the time domain we clearly see the difference, and in the frequency domain I believe we can also see a significant difference. For example, in the top the harmonics are not so well defined, because it's the beginning of the sound and the harmonics have not fully started yet. Instead, in the steady state these peaks are much clearer, much better resolved, okay? And then another thing is that in the attack the noise floor, or basically the energy of the high frequencies, is higher. So the high frequencies are much louder, or at least substantially louder, than during the steady state, in which it is the lower harmonics that are clearly louder. Okay, so this is a good way to try to understand a particular sound, a particular fragment of a sound, and to do some analysis using the short-time Fourier transform that gives us some insight into the sound.

Okay, so that's basically all I wanted to say. We have been looking at a sound, in this case the piano sound, using the sms-tools. And of course the sound is available on Freesound. Hopefully this has given you another insight into the tool we are building, in this case the short-time Fourier transform, but at the same time it has given you some insight into piano sounds. I believe it's quite an interesting instrument and sound, and using these tools we can appreciate quite a bit of it. So anyway, that's it for the demonstrations of this week. I hope to see you next class. Thank you.