0:00

Hello, welcome back to the course on Audio Signal Processing for Music Applications.

This week we're talking about sound transformations, and in the demonstration lectures we have been exemplifying the different models that we have covered during the course, and some transformations that can be done with them. For example, in the first class we talked about how the short-time Fourier transform can be used for morphing. In the second one, we talked about time scaling using the sinusoidal model. And in the last one, we talked about how to do pitch changes using the harmonic plus stochastic model. Now I want to go back to the idea of morphing, but using a different model.

1:05

So in order to do the morphing, we first have to have a good analysis of each of the sounds that we want to morph. So let's start with the models GUI of the SMS tools, and let's go to the harmonic plus stochastic model. We're going to morph a violin sound with a soprano sound, so let's start with the violin sound, okay? This is the sound.

[SOUND] So we have to choose quite a few parameters. The blackman window is a good choice for this stable note, and its side lobes are quite low, so that's good. We have to choose the window size, and that always requires some computing. A B3 is around 246 hertz, so in order to decide the window size, we take the number of bins of the main lobe of the window, which is 6, times the sampling rate, 44100, divided by the frequency of this note. So we need around 1,075 samples; let's put it here, 1075. The FFT size has to be larger than the window, so let's make it quite large, so that we get a lot of zero-padding: 4096, for example.
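The window-size rule of thumb above can be sketched in a few lines of Python; the function names here are just illustrative, not part of sms-tools:

```python
import math

def window_size(f0, fs=44100, main_lobe_bins=6):
    """Samples needed so the window's main lobe (6 bins for a
    blackman window) is no wider than the harmonic spacing f0."""
    return int(main_lobe_bins * fs / f0)

def fft_size(M, extra_zero_padding=1):
    """Smallest power of two >= the window size M, optionally
    doubled once more for extra zero-padding."""
    return 2 ** (math.ceil(math.log2(M)) + extra_zero_padding)

M = window_size(246.0)  # B3 is around 246 Hz -> 1075 samples
N = fft_size(M)         # 2048 would already fit; one doubling gives 4096
print(M, N)             # 1075 4096
```

The same computation with 330 Hz for the E4 gives 801 samples, the value used for the soprano below.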

2:50

In terms of the maximum number of harmonics, again, we want quite a few, as many as we can, so 100 is okay. Now, in terms of the range of the fundamental frequency, we said the fundamental was 246, so the minimum definitely has to be below that; if we go from 200 to 300, that should be okay. For the f0 detection error threshold, well, 7 should be fine. The deviation, yeah, 0.1 is quite a bit, and it's fine. And for the stochastic approximation, let's not approximate too much, so that we get a good quality residual. The maximum would be 1, which would keep the whole magnitude spectrum, so 0.8 is okay. So let's compute it.
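One way to picture the stochastic approximation factor: the magnitude-spectrum envelope of the residual is downsampled to that fraction of its points, so 1 keeps full resolution and smaller values smooth more detail away. Below is a simplified stand-in in Python; sms-tools does this per frame on the residual spectrum, and this helper is only illustrative:

```python
def approximate_envelope(mag_env, stocf):
    """Downsample a magnitude envelope (in dB) to stocf * len(mag_env)
    points by linear interpolation -- a simplified stand-in for the
    stochastic approximation step (stocf=1 keeps full resolution)."""
    n_out = max(1, int(stocf * len(mag_env)))
    if n_out == len(mag_env):
        return list(mag_env)
    out = []
    for i in range(n_out):
        # position of output point i in the input envelope
        pos = i * (len(mag_env) - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, len(mag_env) - 1)
        frac = pos - lo
        out.append(mag_env[lo] * (1 - frac) + mag_env[hi] * frac)
    return out

env = [-20.0, -30.0, -25.0, -40.0, -35.0, -50.0, -45.0, -60.0, -55.0, -70.0]
print(len(approximate_envelope(env, 0.8)))  # 8 points instead of 10
```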

Now let's listen to the sinusoids. [SOUND] Okay, this is fine. Now the stochastic component. [NOISE] Okay, that's quite noisy, but it's soft, so it's okay; I think we can manage that. And here we can see the resulting representation. We could try other parameters, but let's leave it like that. Now let's analyze the other sound, the soprano sound, okay? This is an E4, so again, let's listen to it.

[SOUND] Let's keep the blackman window, but the window doesn't have to be that large. So let's compute the window size: it's 6 times 44100 divided by the frequency, and an E4 is around 330 hertz, so divided by 330.0. Okay, so the window doesn't need to be that large; let's leave it at 801. And the FFT size, well, we can leave it as it is, so that's good. The magnitude threshold of -100 and the minimum duration of 0.5 are fine. Let's keep the maximum number of harmonics: it has to be the same number of harmonics for both sounds, because we're going to be interpolating the two of them, so 100 is fine. Now, since the fundamental frequency is 330, and the voice has a vibrato so it will change quite a bit, let's be safe and set the range from 250, for example, to 400. Okay, and we can just leave the rest of the parameters the same: the same error threshold for the f0 detection, the same deviation, and the stochastic factor of 0.8. So let's compute that. Okay, it's a little more difficult to analyze this sound because of the formants of the voice; in some areas of the spectrum there is not much energy.

5:46

But let's listen to the result. [SOUND] The sinusoids sound good. [NOISE] The stochastic component sounds okay, good. And of course, the resynthesized sound is fine. Okay, now we are ready to do the actual morph between these two representations. So let's close this, go to the transformations directory, and type python transformations_GUI.py. This is the interface for the transformations, so now we can go directly to the HPS morph option. In fact, the sounds that we are going to morph are already the default ones, so we will use those. Now let's change the parameters to the ones we decided to use. For the violin, we decided a window size of 1075 and a big FFT size, 4096. The threshold is -100, and the minimum duration of a harmonic trajectory, we decided, is 0.5. Given that the fundamental frequency is around 246, we decided to use a range from 200 to 300. The error threshold we set to 7, and here maybe we can be a little bit more open with the deviation and just say 0.05, okay?

And for sound two, the soprano: we are going to use the same window type, blackman, but a smaller window size because it's a higher pitch, so 801 is fine. We decided to use the same FFT size, and similar values for the rest. For the fundamental frequency range, this is a higher pitch, 300 and something, so 250 to 400 should be enough; there is no need to go any lower. And the error threshold of 7 and the same deviation, okay? So we can now analyze.

8:28

Okay, so these are the two sounds. Clearly, on the violin it found more harmonics than on the voice, so that means we will only be able to interpolate the harmonics that are present in the voice. Now, what the transformation will do is interpolate these two sets of values, and there are three sets of values we can interpolate: the frequencies of the harmonics, the magnitudes of the harmonics, and the stochastic component. So, for example, let's just take the frequencies of the first sound, which we'll refer to as 0. We'll put that at time 0 we have 0, the first sound, and at time 1 we also have the first sound, so the frequencies are those of the violin. And the magnitudes, let's say, are those of the voice: we'll put that at time 0 we have 1, which means the magnitudes of the voice, and at time 1 we'll also have that. And for the stochastic, well, we can just stay in between, so at time 0 we'll put 0.5, and at time 1 we'll also put 0.5.
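At each analysis frame, this interpolation can be sketched as a simple weighted average of the two parameter sets. The helper below is illustrative, not the actual sms-tools morphing code, and the frequency values are made-up idealized harmonics:

```python
def morph_frame(p1, p2, factor):
    """Interpolate one frame of parameters (harmonic frequencies,
    magnitudes, or the stochastic envelope) between two sounds.
    factor=0 gives sound 1, factor=1 gives sound 2, 0.5 is halfway."""
    n = min(len(p1), len(p2))  # only harmonics present in both can be morphed
    return [(1 - factor) * p1[i] + factor * p2[i] for i in range(n)]

violin_freqs = [246.0, 492.0, 738.0, 984.0]   # idealized B3 harmonics
soprano_freqs = [330.0, 660.0, 990.0]          # idealized E4 harmonics

print(morph_frame(violin_freqs, soprano_freqs, 0.0))  # [246.0, 492.0, 738.0]
print(morph_frame(violin_freqs, soprano_freqs, 0.5))  # [288.0, 576.0, 864.0]
```

Note how the shorter harmonic list limits the result, mirroring the point above that only the harmonics found in the voice can be interpolated.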

Okay, let's see what happens. Okay, so this is the result. The frequencies clearly have the spacing of the violin harmonics; the magnitudes we don't see here, because we don't see the magnitudes of the harmonic lines. But let's listen to it. [NOISE] Yeah, so clearly it sounds like what it is: it sounds a little bit like the magnitudes of the voice, but at the pitch of the violin.

Now, let's go from one sound to the other. If we go from all the values of the violin to all the values of the voice, we can do it by putting 0, 0, 1, 1, that is, value 0 at time 0 and value 1 at time 1, and again here 0, 0, 1, 1, and here 0, 0, 1, 1, okay? And let's apply it. Okay, and here clearly we see that it's going from one sound to the other; in the frequencies we see a kind of jump, which is because the pitch of the voice is higher than the pitch of the violin. So let's listen to it. [NOISE] Okay, so we clearly hear this evolution. And of course, in these envelopes we can specify any interpolation, in any time-varying fashion, so we could have quite sophisticated interpolation envelopes.
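Values like 0, 0, 1, 1 can be read as interleaved (time, value) breakpoints of a piecewise-linear envelope that is then evaluated at every frame. Assuming that breakpoint format, a minimal sketch (this function is illustrative, not the sms-tools code):

```python
def eval_envelope(breakpoints, t):
    """Evaluate a piecewise-linear envelope given as interleaved
    (time, value) pairs, e.g. [0, 0, 1, 1] for a 0->1 crossfade."""
    times = breakpoints[0::2]
    values = breakpoints[1::2]
    if t <= times[0]:
        return values[0]
    # walk consecutive breakpoint pairs and interpolate linearly
    for (t0, v0), (t1, v1) in zip(zip(times, values),
                                  zip(times[1:], values[1:])):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return values[-1]

crossfade = [0, 0, 1, 1]     # sound 1 at time 0, sound 2 at time 1
print(eval_envelope(crossfade, 0.25))  # 0.25
halfway = [0, 0.5, 1, 0.5]   # stay in between the two sounds
print(eval_envelope(halfway, 0.7))     # 0.5
```

With more breakpoints, for example 0, 0, 0.5, 1, 1, 0, one could morph to the voice halfway through the sound and back to the violin at the end.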

Clearly, this is very different from the morphing we did with the short-time Fourier transform. So okay, let's finish here. Basically, we have talked about a transformation, morphing, using the harmonic plus stochastic model within the SMS tools. And clearly it's a different type of morphing: it has different possibilities than the STFT one. We can now interpolate basically every set of parameters and obtain any sound in between. So even though we are using the same term, morphing, the model has a big impact on the possibilities that the technique offers and on what we can do with this idea of interpolating between two sounds.
