0:01

Welcome back to the course on audio signal processing for music applications.

This week we are talking about the harmonic model.

And in these demonstration classes,

we are trying to understand this model by actually using it,

by analyzing some sounds and synthesizing them.

In this lecture, I want to go a little bit beyond what we did and

analyze a fragment of a sound,

and see if we can take it to the limit and

see what its potential and its limitations are.

So, in particular, we will be analyzing a few notes of a cello that I played.

And the cello is, of course, a great instrument.

It's a very traditional instrument that you can do a lot of things with.

So a good way to get a grasp of the types of sound that a cellist can produce is to look

at Freesound and just search for violoncello.

Okay, and that will give you a few samples of different types of cello sounds.

So, for example, the first one is kind of an extended technique.

1:09

So let's listen to that.

[NOISE] Okay, that's what is called a seagull effect.

It's kind of an interesting sound.

Of course you can also get some more traditional notes, playing what

is called tenuto.

[SOUND] Or also with the cello you can play pizzicato notes,

and this is a pizzicato note.

[SOUND] But, of course, you can find many types of sounds and little fragments.

The sound that we will be using is a short

cello phrase from a very traditional Catalan song,

The Song of the Birds.

And in fact, it's the one that I use in the teaser of the course.

Let's listen to that one.

[MUSIC]

Okay, so let's analyze this sound.

And let's open the SMS Tools GUI, and

we'll first start from the short-time Fourier transform.

This is a time-varying sound,

so we need the short-time Fourier transform to get a grasp of it.

2:37

Okay, and okay, we have to choose the parameters.

And okay, instead of a Hamming window, let's choose the Blackman window.

With the Blackman window, the main lobe is wider than with the Hamming,

but the side lobes are lower, and that may be good for this sound.

So, let's choose a Blackman window.

And then, okay.

Window size, we don't have much to go on yet,

so let's just, for example, use 1001 samples.

Just leave the FFT size at 1024, and the hop size

has to be at most one-fourth of the window size, so let's use 250.
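
As a rough illustration of these first settings, here is a minimal sketch that checks them. The helper `check_stft_params` is hypothetical and not part of SMS Tools, the 44,100 Hz sampling rate is the one mentioned later in the lecture, and the quarter-window hop limit is the usual rule of thumb for a Blackman window.

```python
def check_stft_params(window_size, fft_size, hop_size, fs=44100):
    """Summarize a set of STFT parameters (illustrative helper, not an SMS Tools function)."""
    if fft_size < window_size:
        raise ValueError("FFT size must be at least the window size")
    return {
        # Samples of zero padding added to each windowed frame
        "zero_padding": fft_size - window_size,
        # Time between consecutive frames, in milliseconds
        "hop_ms": 1000.0 * hop_size / fs,
        # Rule of thumb assumed here: hop no larger than a quarter of the window
        "hop_ok": hop_size <= window_size / 4,
    }


# Values chosen in the demonstration: window 1001, FFT 1024, hop 250
params = check_stft_params(window_size=1001, fft_size=1024, hop_size=250)
print(params)
```

With these values there are only 23 samples of zero padding, and a new frame starts roughly every 5.7 milliseconds.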

Now we compute that.

This is a longer fragment so it's going to take a little longer.

3:28

And from it we will be able to visualize the magnitude spectrum and

the phase spectrum.

Okay. Now what we are interested in is

deciding what parameters

are needed in order to be able to distinguish the harmonics of this phrase.

So, in fact, an important thing

is to identify what is the lowest fundamental that is being played.

The lowest fundamental will be the one that will determine

the minimum distance between two harmonics.

So let's zoom in to the very bottom of this spectrogram.

Okay.

4:09

Okay, so this is the first and a little bit of the second harmonics.

So it's a very clear harmonic sound.

And we can see that it starts a little bit low, goes up, and then it goes down.

Clearly the lowest frequencies are going to be the first and the last.

But here we see that the resolution is not so good.

In fact, we see these boxes, this kind of quantization along the frequency axis.

The time axis is pretty good.

We have a hop size of 250 samples, so there are quite a lot of frames.

But there are not that many frequency samples to be able to

visualize and then, further on, analyze the peak of this harmonic.

So let's increase, well, the window size should not be increased,

because then we would lose time resolution, but

let's increase the FFT size, so we get a smoother spectrum.

For example let's say 4096, so quite a bit bigger FFT size.

So this will give us quite a bit of zero padding and

therefore will give us many more samples

in the frequency domain, even though the actual data points will be the same.
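
To put numbers on this, here is a small sketch of the arithmetic. The function names are made up for illustration, the sampling rate is assumed to be 44,100 Hz, and the main lobe of a Blackman window is taken to be six bins wide (in bins of size fs divided by the window length).

```python
def bin_spacing_hz(fft_size, fs=44100):
    """Distance in Hz between consecutive frequency samples of the FFT."""
    return fs / fft_size


def blackman_main_lobe_hz(window_size, fs=44100):
    """Approximate main-lobe width in Hz, assuming a six-bin-wide Blackman main lobe."""
    return 6 * fs / window_size


# Same 1001-sample window, two FFT sizes: only the sampling of the spectrum changes
for fft_size in (1024, 4096):
    print(fft_size,
          round(bin_spacing_hz(fft_size), 2),
          round(blackman_main_lobe_hz(1001), 1))
```

The bin spacing drops from about 43 Hz to about 11 Hz, while the main-lobe width stays near 264 Hz because it depends only on the window size. Zero padding interpolates the spectrum; it does not make close harmonics easier to separate.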

So again, this takes a little bit to compute.

5:34

Ok, so this is the spectrum.

Clearly there is a finer resolution than before.

Let's do the same thing.

Let's zoom into the very bottom of the spectrum.

Okay.

And okay, yeah.

Now, we definitely have many more frequency samples, and of course it looks

similar, but now we are able to see the center of each harmonic much better.

So if we look at the last note, at the center of this line,

now looking at the y-axis, it's around 348 hertz, okay?

So that would be the lowest note.

And the highest note is around 456 hertz, okay?

So this is good information for deciding, now, the window size that we should use.

So in fact, okay, let's do the sinusoidal model, and let's use this information.

So in order to decide what the period length is,

we have to take 44,100 and

divide it by that frequency, which was around 340 Hz.

Okay, so let's make it a little bit lower.

So this gives about 129 samples, and then if we use, for example,

the Blackman window, well, we will need six times that.

So, if we take six times this.

7:18

That should be enough to discriminate the harmonics.
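
The calculation just described can be sketched as follows. The helper `min_window_size` is hypothetical, and rounding up to an odd length is an assumption based on the odd-size window that is asked for later in the demonstration.

```python
import math


def min_window_size(f0_min, fs=44100, main_lobe_bins=6):
    """Smallest odd window size that spans `main_lobe_bins` periods of the lowest fundamental."""
    size = math.ceil(main_lobe_bins * fs / f0_min)
    # Prefer an odd length so the window has a well-defined center sample
    return size if size % 2 == 1 else size + 1


period = 44100 / 340           # period of the lowest note, in samples
window = min_window_size(340)  # six periods for a Blackman window
print(int(period), window)
```

For a lowest fundamental of 340 Hz this gives a period of about 129 samples and a window of 779 samples.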

And the FFT size, I think it was good to have this big FFT size, so

4096 was a good choice,

because it gave us a good resolution, at least visually.

And now, of course, in the sinusoidal model, we can choose the threshold,

the minimum duration of the sinusoids, and how many sinusoids we want to track.

The maximum frequency deviation here we should, I think,

make a little bit bigger, because there is quite a bit of variation.

Now let's, of course, choose the cello sound,

the cello phrase, and we'll compute it.

So this is the sinusoidal analysis.

8:26

Okay, we see well, definitely the harmonics,

but we also see some lines in between them.

And we see some trajectories stopping and continuing.

Anyway, so this is the sinusoidal model.

If we listen to the result.

[MUSIC]

Well, it's pretty good.

Maybe we are losing a little bit of the attacks of the notes, but

it's pretty good.

So let's go directly now to the harmonic model, and

let's use the same cello phrase.

9:08

And let's use the Blackman window.

Let's use the same 700.

We should have an odd-size window,

so let's use 779. In terms of the FFT size, again,

okay, I think it was a good choice, 4096, a lot of zero padding.

The magnitude threshold, minus 90, that's okay.

The duration of the tracks.

Okay, so this requires them to be at least 0.1 seconds.

I think we can even make it bigger.

So let's say 0.2 seconds.

And the maximum number of harmonics, there is clearly no need for 100.

In fact, a way to check how many harmonics are needed is

to divide 44,100 by the lowest frequency.

Okay, no. We have to divide half of that, so

22,050 divided by the lowest frequency.

So 64.

That would be the maximum number of harmonics that we would have if we really

had all the harmonics in the lowest note.

So no need for 100.

Let's say 60 would be plenty.
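
As a quick check of that number, here is a minimal sketch of the same division. The function name is made up for illustration, and the sampling rate is the 44,100 Hz used in the lecture.

```python
def max_harmonics(f0, fs=44100):
    """Number of harmonics of f0 that fit below the Nyquist frequency (fs / 2)."""
    return int((fs / 2) // f0)


print(max_harmonics(340))  # lowest note, rounded down to 340 Hz as in the lecture
print(max_harmonics(300))  # the more conservative lower bound of the f0 range
```

The first call gives 64, as in the lecture. A lower fundamental leaves room for more harmonics, so the 300 Hz bound gives 73.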

And here, now is where we have to choose a range that includes all this melody.

So we said that the lowest frequency was around 340,

let's make it safer, so let's make it 300.

And the highest was above 450 or something, so

let's make it quite a bit higher.

Just in case let's make it 500.

Okay?

10:54

And this is an error threshold that will be

quite relevant now for identifying the fundamental frequency, but

let's just leave it as it is now, and see if we have to change it later.

Okay. So, now we'll compute it.

So again, this will take a little bit of time.

11:13

Okay, so this is the harmonics it has obtained and that looks pretty good.

It found quite a lot of harmonics. Of course, the transitions

are where the problems, or at least the little deviations, occur.

If we just zoom into just one transition, let's say.

So this is where the harmonics of course get lost and then are picked up again.

If we listen, well, let's play the original again.

And if we listen to the synthesized sound.

[MUSIC]

Okay, that's pretty good.

Now in terms of this error threshold of the algorithm.

If we make it more restrictive,

that means that unless the error is below a certain value, the fundamental will not be accepted.

We might see, then, that in some of these areas it does not find the fundamental.

So instead of seven, let's put, for example, two, and let's see what it finds.

12:26

Okay, so now we see the result and

we clearly see that in the transitions there are gaps, and

this is because, in the transitions, the fundamental is not very clear.

We are in a kind of attack, a noisy attack.

So it has lost a little bit of the transitions, and if we listen to that,

in fact, we're going to hear these gaps.

[MUSIC]

Okay, so there are gaps in the transitions because those are the areas where it

didn't find the fundamental and therefore it didn't find any harmonics.
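
The effect of the error threshold can be sketched like this. This is only an illustration of the gating idea, not the actual detection algorithm in SMS Tools; the function `gate_f0` and the per-frame candidate and error values are invented for the example.

```python
def gate_f0(candidates, errors, max_error, min_f0=300.0, max_f0=500.0):
    """Keep an f0 candidate only if it lies in range and its error is below max_error.

    Rejected frames get 0.0, meaning "no fundamental found", so no harmonics
    are tracked or synthesized there.
    """
    track = []
    for f0, err in zip(candidates, errors):
        accepted = min_f0 <= f0 <= max_f0 and err < max_error
        track.append(f0 if accepted else 0.0)
    return track


# Hypothetical frames: two stable notes with a noisy transition in between
candidates = [348.0, 350.0, 410.0, 456.0]
errors = [1.2, 4.5, 6.0, 0.8]

loose = gate_f0(candidates, errors, max_error=7)   # threshold used first
strict = gate_f0(candidates, errors, max_error=2)  # more restrictive threshold
print(loose)
print(strict)
```

With the threshold at 7 every frame keeps its fundamental. With the threshold at 2 the two transition frames are rejected, which is what shows up as gaps in the analysis and in the synthesized sound.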

And that's basically all I wanted to say.

So let's go back to the slides and, well,

we have used the SMS Tools GUI in order

to analyze this cello fragment.

And we have used the short-time Fourier transform, the sinusoidal model,

and now the harmonic model, to see this phrase and to analyze the harmonics.

And we can see that by tweaking the parameters, we can get quite

a bit of difference in the way that these harmonics are analyzed.

So, that's all, and this is all for the demo classes of this harmonic model week.

So hopefully this has given you a view of

how the harmonic model can actually be used in practice.

And still, it's not ideal.

There are some parts of the sound,

especially the attacks in the sound that we just heard,

where we lose a little bit of the sound that is present there.

So next week, we will extend the idea of the sinusoidal and

harmonic model to include that aspect,

to include what we will call the residual or the stochastic component.

Hopefully, that will allow us to generalize our models and

to be able to handle many more types of sound.

So I will see you in the next class.

Thank you very much.
