0:00

Welcome to the course on audio signal processing for musical applications.

This week, we are talking about the harmonic model.

And in this programming lecture, I want to talk about the implementation of it; in particular, the first part of it:

the part that requires detecting the fundamental frequency, so that we can then identify the harmonics of a given sound.

So we will be talking about one particular algorithm, the two-way mismatch algorithm.

That's an algorithm that we presented in the theory lecture, and it's a frequency-domain algorithm that basically tries to identify

possible harmonic series that match the peaks of the spectrum.

So in this plot we see the measured peaks that we have identified.

And then we keep trying different predicted fundamental frequencies and their harmonics.

And we measure the error, the distance between these two lists of values.

And we do that by measuring two errors: the predicted-to-measured error,

that is, the distance between the predicted and the measured values,

and also another measure, which is the measured-to-predicted error.

But let's go directly to the actual code.

1:25

Okay, in the sms-tools package, in the utilFunctions file, there is the code for the two-way mismatch algorithm.

The core of it is a function called two-way mismatch; in fact, there is a C version and a Python version.

Now we will go through the Python version.

When we run it, we normally use the C version because it's more efficient.

So this algorithm, what it does is it receives the peaks, the frequencies and magnitudes of the peaks.

It receives a list of candidates, of candidate frequencies for the fundamental frequency.

And it basically identifies which of the candidates has the smallest error.

2:15

So it does that by measuring the two errors: the predicted-to-measured error, which is this part, and then the measured-to-predicted error, which is this part.

Within it, it keeps computing all the distances between the values of the harmonic series and the peaks.

And it has different ways of comparing those.

We're not going to go into the detail of that but, of course, feel free to go into it.

And then finally, it just creates an error array, which is the list of errors of all the candidates, okay?

So in the array we have the errors for every single candidate.

And then what we do is we choose the minimum of those errors, and the fundamental frequency is going to be the candidate that has the minimum error.
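That final selection is just picking the argmin of the error array; a tiny sketch with hypothetical candidate and error values:

```python
import numpy as np

f0c = np.array([166.0, 440.0, 637.0])     # hypothetical f0 candidates (Hz)
f0error = np.array([4.8, -0.13, 2.1])     # hypothetical TWM error per candidate
f0 = f0c[np.argmin(f0error)]              # the candidate with the minimum error wins
```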

3:19

And then this function is wrapped by another one that is responsible for generating the candidates and calling the function.

So this function, f0Twm, receives again the peaks of the spectrum, and then it receives the control parameters:

the maximum error allowed, which is the error that will be allowed for the fundamental frequency to be accepted as such, and then the range

of the fundamental frequencies, from a minimum to a maximum f0.

And then there is one value which is kind of a memory, a tracking value, that is basically the fundamental frequency of the previous frame.

And this will allow us to refine the fundamental frequency by requiring that the fundamental frequency be as smooth as possible.

But the algorithm here is very simple.

In fact, it just takes the list of peaks that are within the minimum and maximum frequency values.

And for the rest, it just makes a few more comparisons on those.

There's a lot of room for improvement in this algorithm, in the sense of generating more candidates so that we do a more exhaustive trial of different frequencies,

but for efficiency reasons we made this simple implementation, which allows us to compute this quite efficiently.
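The candidate generation described here can be sketched as a simple range filter over the detected peak frequencies (a simplified stand-in, not the actual sms-tools code; the function name is mine):

```python
import numpy as np

def f0_candidates(pfreq, minf0, maxf0):
    # Keep only the spectral peaks whose frequency lies inside (minf0, maxf0);
    # those peaks become the fundamental-frequency candidates.
    return pfreq[(pfreq > minf0) & (pfreq < maxf0)]
```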

4:56

Okay, so I wrote a little script that basically does an analysis of a single spectrum and then computes the errors of all the candidates.

Okay, so in here, I have this little script that, from a sound, the sawtooth sound, just computes one DFT.

So here it computes one DFT of that particular sound.

It finds the peaks.

It interpolates the peaks, and then it generates possible candidates for the fundamental frequency,

in a similar way to what we just saw, but even more simply,

in the sense that we are taking as candidates all the peaks that lie within the range that we specified.

And then it calls the two-way mismatch algorithm, but I modified the function here so that,

instead of returning just one value, the fundamental frequency with the minimum error, it returns all the errors.

So it returns the array of all the errors for all the candidates, so that we can look at them and see how they behave.

Okay, and then it plots; here it just plots the spectrum and the peaks.

So you can understand a little bit of what's going on.

Okay, so let's run this.

So let's run test.
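A single-spectrum test script along these lines could look roughly like this. The synthetic 440 Hz harmonic signal, the 20 dB threshold, and the use of scipy's find_peaks in place of the sms-tools peak detection are all assumptions of this sketch:

```python
import numpy as np
from scipy.signal import find_peaks

fs, N = 44100, 2048
t = np.arange(N) / fs
# synthetic harmonic sound at 440 Hz: five harmonics with 1/k amplitudes
x = np.sum([0.8 / k * np.sin(2 * np.pi * 440 * k * t) for k in range(1, 6)], axis=0)

# one Hamming-windowed DFT, magnitude spectrum in dB
mX = 20 * np.log10(np.abs(np.fft.rfft(x * np.hamming(N))) + np.finfo(float).eps)
locs, _ = find_peaks(mX, height=20)      # peaks above an assumed 20 dB threshold
pfreq = locs * fs / N                    # convert bin indices to Hz
```

On this synthetic sound the detected peaks land at (roughly) the five harmonics of 440 Hz, up to the bin resolution fs/N.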

6:36

Okay, and this is the spectrum: the magnitude spectrum and the peaks that we found.

So let's maybe zoom in a little bit, so that we can see a little bit of what is going on, okay.

So these are clearly the harmonics of the sound.

It has also found some peaks before the fundamental frequency and one after the fundamental frequency.

But basically, in terms of what it has identified, it's just the harmonics.

So now let's plot, or let's print, some of the intermediate values of all this.

So clearly the first thing is the candidates.

So let's print the f0c.

7:26

These are the candidates, and they are going to be the peaks that lie within the frequency range we specified, which was between 50 and 2000 hertz.

So the candidates are the first five peaks, and if we print their frequencies, ipfreq at the f0 candidate indices,

those are the frequencies that lie within the frequency range, and that we're going to test in the algorithm.

So we're going to test 166 hertz, 440, 637, etc., etc.

8:05

So now let's print the errors that it returns.

So f0Errors, which is the output of this algorithm, will have the errors for every one of these values.

So for 166 we have an error of 4.8, and 440 has minus 0.13, so clearly this is the smallest of all the errors,

and this is indeed the fundamental frequency; the candidate that is the best one as a fundamental frequency.

Now, these error values are really misleading because they are not bounded within a particular range.

They can even be negative, like in this case.

But clearly, the larger the error, the less likely the candidate is to be the fundamental frequency.

9:07

Okay, so this works quite well.

Now we can go into another file that basically does this for the whole sound.

So we will be iterating over the whole sound, doing the exact same thing.

We're taking the sawtooth sound and we are trying a particular window.

We can keep changing it and see if we get a different type of result.

We take the FFT size, we define the minimum and maximum fundamental frequencies, and we call a function, f0Detection,

which in fact is in the harmonic model file.

9:50

In the harmonic model file there is this function called f0Detection that does all that we talked about.

It basically accepts as input the sound, the sampling rate, the window, the FFT size, and the values given by the user;

it iterates over the whole sound and it calls the DFT, the peak detection, the peak interpolation, the two-way mismatch algorithm,

and then it decides which candidate is the best fundamental frequency.

It also has some constraints to make sure that the fundamental is stable in time, related to the f0 track that we talked about.

So basically it returns just the fundamental frequency that is considered to be the best, okay?
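The loop that such a frame-by-frame detector performs can be sketched as follows. This is a simplified stand-in for the real f0Detection: crude peak picking instead of parabolic interpolation, a naive harmonic-matching error instead of the full two-way mismatch, and no smoothness constraint; none of the names below are from sms-tools.

```python
import numpy as np

def f0_track(x, fs, w, N, H, minf0, maxf0):
    # Simplified frame-by-frame f0 estimation (illustrative sketch only).
    M, f0s = len(w), []
    for start in range(0, len(x) - M, H):
        mX = np.abs(np.fft.rfft(x[start:start + M] * w, N))
        # crude peak picking: local maxima above 10% of the frame's maximum
        locs = np.where((mX[1:-1] > mX[:-2]) & (mX[1:-1] > mX[2:]) &
                        (mX[1:-1] > 0.1 * mX.max()))[0] + 1
        pfreq = locs * fs / N
        f0c = pfreq[(pfreq > minf0) & (pfreq < maxf0)]     # candidates in range
        if len(f0c) == 0:
            f0s.append(0.0)                                 # no candidate found
            continue
        # pick the candidate whose first harmonics best match the measured peaks
        errs = [sum(np.min(np.abs(pfreq - c * k)) / (c * k) for k in range(1, 6))
                for c in f0c]
        f0s.append(f0c[int(np.argmin(errs))])
    return np.array(f0s)
```

With a stable 440 Hz harmonic tone this returns a value near 440 for every frame; the deviation is bounded by the bin resolution fs/N, which is why enlarging the window and FFT size tightens the track.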

So let's now look at test1 and let's run it.

So let's run test1, okay, and now in fact we can just show the f0.

Okay, so these are the values that it has returned; since the hop size specified was quite large, 1,000,

there are not that many values, so it's easier to look at.

And okay, clearly there is not a perfect fundamental frequency identified.

It kind of varies.

It goes from 439.9 to 440-something.

So in fact, if we plot this array, we will see the variation that we have here.

11:45

Let's get rid of these and now let's plot it again.

Okay, now we can zoom in to the very top, okay.

So clearly it moves around 440, and these variations are caused by the errors of the peak detection algorithm and the interpolation,

so we are not really exactly at 440 but, of course, the error is very small.

It's less than 1 Hz of error.

So this is 440, and 440.5, and a little bit below.

So this is clearly a quite small deviation from the nominal value, which is 440.
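The interpolation mentioned here is typically parabolic interpolation over the dB magnitude spectrum, which is what pushes the estimate well inside one bin. A minimal sketch, assuming a plain 440 Hz sine and a 2048-point Hamming-windowed DFT (the function name and parameters are illustrative, not the sms-tools peak interpolation, which also refines magnitude and phase):

```python
import numpy as np

def interp_peak(mX, loc):
    # Parabolic interpolation of a spectral peak: fit a parabola through the
    # dB magnitudes at bins loc-1, loc, loc+1 and return the fractional bin.
    a, b, c = mX[loc - 1], mX[loc], mX[loc + 1]
    return loc + 0.5 * (a - c) / (a - 2 * b + c)

fs, N = 44100, 2048
x = np.sin(2 * np.pi * 440 * np.arange(N) / fs)        # a plain 440 Hz sine
mX = 20 * np.log10(np.abs(np.fft.rfft(x * np.hamming(N))) + np.finfo(float).eps)
loc = int(np.argmax(mX))                               # coarse peak: nearest bin only
f_est = interp_peak(mX, loc) * fs / N                  # refined estimate, near 440 Hz
```

The coarse bin frequency is off by up to half a bin (about 10 Hz here), while the interpolated estimate lands within a small fraction of a bin of the true 440 Hz.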

If we change these values, we might get better results.

For example, instead of having the window be 1,001, let's make it twice as large.

12:53

And the FFT size, let's make it twice as large.

Okay, so, times 2, okay.

And now, let's see: before we had 439.95, 440-something; let's see if it is any different by looking now at these values, okay?

So this is what we get now, and it's a little bit better.

So we can see the difference between these two values.

Now the error is smaller than before.

In fact, if we plot now this f0, well, the axis scale is 10 to the minus 2, so clearly this is a smaller error range than what we had before;

the lowest now is 439.9 and the highest value is 440.042.

So that means that as the window gets larger and the FFT size increases, we will have better values.

Okay, now let's look at a real sound, and let's finish by running it on this oboe sound, basically doing exactly the same thing.

So I can just run this, test2, okay; this will compute the fundamental frequency of this other sound.

And now if I plot this f0,

14:44

Okay, and now we will have to zoom into the meaningful range, and well, there is definitely also a variation,

but here there is both the variation that may be caused by errors, and the variation that is clearly natural to the playing of the oboe sound.

So for example, this sound is clearly higher than 440, so the oboe sound was played a little bit higher in frequency than 440, around 442.

And there is kind of a periodic oscillation that makes sense to be present in the sound;

of course, some of this oscillation may be caused by errors, but this is a very interesting way to try to understand what is going on,

both in terms of the algorithm and in terms of the sound, in terms of this oboe sound

and the natural oscillations that may be caused by either the acoustics or the performer that is playing this note.

15:59

Okay, and that's all I wanted to say.

So basically, we have talked about the implementation of the two-way mismatch algorithm,

and I think that has given us a view of the issues involved in detecting the fundamental frequency.

Of course, we have used Python and a number of packages, and the implementations that we have in the sms-tools package.

16:27

So that's all.

So this was the first programming class of this harmonic model week, and in the next lecture we will then do the whole model,

and include this fundamental frequency into a harmonic analysis, and we'll be able to do both analysis and synthesis of sounds.

So thank you very much, and I'll see you in the next lecture.
