13:47

And therefore, it has to be clearly distinguished from what we would consider a normal spectrogram; it is quite different.

So here, for example, we see the first 12 coefficients. We can choose the number of coefficients; 12 coefficients is quite a standard number to use.

And in fact, the zeroth coefficient is not shown. Normally we do not display the zeroth coefficient because it relates to the loudness, or the energy, of the sound, and we have other measures for that. So we normally show starting from the first coefficient, up to, as I said, 12 coefficients.

The first coefficient is the one that describes the bigger picture of the spectrum, the overall shape, and as we go higher up, the coefficients describe more details, smaller changes in the spectrum.

And so this is normally used as a vector including all these coefficients at every frame. So we have a very compact representation, just 11 or 12 values, that can capture different aspects of the spectral shape.
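The steps just described can be sketched with plain NumPy: take a log spectrum, apply a DCT-II, discard the zeroth coefficient (which carries the energy), and keep coefficients 1 through 12. This is only a simplified illustration; Essentia's actual MFCC algorithm also applies a mel filterbank first, and the function names here (`dct_ii`, `cepstral_coeffs`) are hypothetical.

```python
import numpy as np

def dct_ii(x):
    """Naive DCT-II of a 1-D signal (illustrative, not optimized)."""
    N = len(x)
    n = np.arange(N)
    # each coefficient k is a cosine-weighted sum of the input bands
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

def cepstral_coeffs(log_spectrum, n_coeffs=12):
    """Keep coefficients 1..n_coeffs, discarding the 0th (energy) one."""
    c = dct_ii(log_spectrum)
    return c[1:n_coeffs + 1]

# toy 40-band log-spectrum standing in for a log-mel spectrum
log_mel = np.log(1.0 + np.abs(np.sin(np.linspace(0, 3, 40))))
coeffs = cepstral_coeffs(log_mel)
print(len(coeffs))  # 12 values per frame: a compact spectral-shape description
```

The first coefficient captures the broad tilt of the spectrum; the higher ones add progressively finer detail, which is why truncating at 12 still describes the overall shape well.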

Let's now talk about some features, some descriptors, that relate to the pitch information of a sound. And the first one is the idea of pitch salience.

Â 15:27

Pitch salience is a measure of the presence of pitched sounds in a signal. The particular implementation of pitch salience that is available in Essentia starts from the spectral peaks, which we already know about, and from those it computes the salience of all possible pitches present. It does this by summing the weighted energies found at the multiples of every particular peak. So it tries to find the possible harmonics that are present for a particular peak, sums them all, and computes this pitch salience value for every peak.

Here we see the magnitude spectrum and how we find the peaks. We get the amplitudes and frequencies of every peak, and then we have this pitch salience function, which is quite a complex equation. Here we just see a very general picture of the equation for this computation.

But basically, for every peak, and for the amplitude of every peak, we apply a weighting function that measures the energy at all the multiples of the fundamental frequency, our peak being considered as a fundamental frequency. And then it sums them all together into S[b], the salience at every bin frequency that we start from. So we are basically computing the salience of all possible frequencies considered as candidate fundamental frequencies.

And this is the result that we obtain if we only take the maximum salience at every particular frame. So at every particular frame we have many salience values, but this idea of pitch salience normally relates to how much of a pitch is present at a particular frame. So taking the maximum of it is a good measure of how probable it is, let's say, that there is a clear pitched sound at every particular frame.
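A rough sketch of this idea follows. This is not Essentia's actual PitchSalience algorithm, whose weighting function is more elaborate; here we simply sum harmonically weighted spectral magnitudes for each candidate fundamental bin, with higher harmonics contributing less, and then the per-frame maximum of this function would serve as the salience value.

```python
import numpy as np

def pitch_salience(mag_spectrum, n_harmonics=5, decay=0.8):
    """Toy salience: for each candidate fundamental bin b, sum the
    spectral magnitudes found at its integer multiples, weighted so
    that higher harmonics contribute less (decay**(h-1))."""
    N = len(mag_spectrum)
    salience = np.zeros(N)
    for b in range(1, N):
        for h in range(1, n_harmonics + 1):
            m = b * h                     # bin of the h-th multiple
            if m >= N:
                break
            salience[b] += (decay ** (h - 1)) * mag_spectrum[m]
    return salience

# toy spectrum: a harmonic sound with partials at bins 10, 20, 30, 40
spec = np.zeros(128)
for k, a in zip([10, 20, 30, 40], [1.0, 0.7, 0.5, 0.3]):
    spec[k] = a

S = pitch_salience(spec)
print(int(np.argmax(S)))  # 10: the bin whose multiples collect the most energy
```

The candidate whose harmonic series best matches the observed peaks accumulates the largest sum, which is why the true fundamental (bin 10 here) wins over its subharmonics and harmonics.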

So this is an orchestral sound; it is the Chinese orchestra that we have heard before. There are many instruments playing together: some are pitched sounds, some are percussive sounds. So by looking at this function, this pitch salience, we can sort of visualize and estimate the presence of the pitched sounds in every frame. And that can be quite useful for characterizing quite a number of sounds.

And then let me talk about another type of feature that is also related to pitch information, and this is the chroma feature. In particular, we'll talk about the harmonic pitch class profile. Chroma, which is a concept used in music perception and music theory, represents the inherent circularity of pitch organization. The same pitch notes in different octaves have the same chroma. So when we talk about pitch classes, we refer to all the pitches that have the same chroma.

And the HPCP, the harmonic pitch class profile, is a particular implementation of this idea of chroma features. It is a distribution of the signal's energy across a predefined set of pitch classes. So the idea, as this equation shows, again starts from the spectral peaks, A sub p.

Then, by applying a function to those peaks and summing over all of them, we can get a measure of the different pitches that are present within a particular octave. So the idea of chroma is that we fold everything into one octave, and we can divide the octave into 12 semitones, or into any other type of frequency quantization. And this equation, and this implementation, basically finds the pitches that have that particular chroma, that have that particular, let's say, note name.
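A minimal sketch of this folding idea can be written as follows. This is not Essentia's HPCP, which additionally uses a cosine-shaped weighting window, contributions from harmonics, and more refined normalization; the function names here are hypothetical. Each spectral peak frequency is mapped to one of 12 pitch-class bins relative to a reference frequency, and its magnitude is accumulated there.

```python
import numpy as np

A4 = 440.0  # reference frequency, so bin 0 corresponds to the A pitch class

def fold_to_pitch_class(freq, n_bins=12, ref=A4):
    """Fold a frequency into one of n_bins pitch-class bins (bin 0 = ref)."""
    semitones = 12.0 * np.log2(freq / ref)   # distance from ref in semitones
    return int(round(semitones)) % n_bins    # modulo folds all octaves together

def simple_hpcp(peak_freqs, peak_mags, n_bins=12):
    """Accumulate peak magnitudes into pitch-class bins and normalize."""
    hpcp = np.zeros(n_bins)
    for f, a in zip(peak_freqs, peak_mags):
        hpcp[fold_to_pitch_class(f, n_bins)] += a
    if hpcp.max() > 0:
        hpcp /= hpcp.max()                   # normalize to the strongest class
    return hpcp

# peaks of an A in two octaves (220, 440 Hz) plus a D (293.66 Hz)
hpcp = simple_hpcp([220.0, 440.0, 293.66], [1.0, 0.8, 0.9])
print(int(np.argmax(hpcp)))  # 0: both A peaks fold into the same bin
```

Note how the two A peaks an octave apart land in the same bin: that is exactly the octave folding that distinguishes chroma features from the raw spectrum.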

So this is an example of analyzing a sound with the HPCP implementation available in Essentia. This is a cello sound in which I played two notes; in fact, let's listen to that.

Â [MUSIC]

So what we see here are basically the pitches, the pitch classes, that are present in this fragment. This is a fragment in which I play two strings, a double stop, in which one note, the low note, is very stable. And in fact the values at bin zero that we see here, the reddest horizontal line, relate to one of these very stable pitches, which basically is the A sound that is always present.

And then what we see is the other pitches: there is a very strong D sound that is also present throughout. So we see it all throughout, and we also see the other notes a little bit, though it's not very clear. But it gives us an idea that there are some clear pitches. And by hearing it a little bit, we could get quite a decent view of the pitch classes: not the absolute frequencies of the pitches, but the pitch classes, or the notes, that are present in this recording.

Okay, now let's go to multiple frames, so features that require being analyzed over multiple frames. And let me give you just three examples of things that we could do with multiple frames. One is the idea of segmenting an audio recording and identifying onsets, for example. Another is finding the prominent pitch; for the prominent pitch, we need to see the continuation of the pitch over time. And finally, the idea that we can compute the statistics of the single-frame features on a larger scale, on a fragment of a sound.

So the segmentation of a recording, and for example the identification of onsets, can be obtained by calculating spectral features that measure the change in frequency content. For example, the spectral flux, which is a very common feature used in segmentation, compares two consecutive spectra and then sums all these differences; this is basically the L1 norm of the differences. This gives a measure of the spectral variation, and that can be an indication of where things are changing in the sound.

There are many implementations of this idea of spectral flux, and we can develop variations that focus on particular aspects.
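The basic computation just described can be sketched as follows, taking the L1 norm of the difference between consecutive magnitude spectra (common variants instead use the L2 norm, or keep only positive differences via half-wave rectification):

```python
import numpy as np

def spectral_flux(frames):
    """L1 spectral flux between consecutive magnitude spectra.
    frames: 2-D array with one magnitude spectrum per row."""
    diffs = np.abs(np.diff(frames, axis=0))   # bin-wise change between frames
    return diffs.sum(axis=1)                  # L1 norm per frame transition

# three toy spectra: the last one changes abruptly, signalling an event
frames = np.array([
    [1.0, 0.5, 0.2, 0.1],
    [1.0, 0.5, 0.2, 0.1],   # no change -> flux 0
    [0.2, 1.0, 0.8, 0.6],   # big change -> large flux
])
flux = spectral_flux(frames)
print(flux)  # [0.  2.4]
```

Peaks in this flux curve mark frames where the spectrum changes a lot, which is exactly where segment boundaries or onsets tend to be.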

For the particular case of identifying onsets, of segmenting the sound by finding where an event or a note starts, there are a number of features that we can use; in fact, the spectral flux could be used for that.

But here I have put another feature, which is the high-frequency content. What this descriptor does is find the content at high frequencies, so how much of the high frequencies is present, and then we compare it with the previous frame.

So in the case of identifying onsets, clearly an onset is a part of the sound in which there is an increase of high frequencies; most attacks show a higher presence of high frequencies. So if we identify where we have an increasing presence of high frequencies, we can detect where the onsets are.
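As a sketch, a common definition of high-frequency content weights each spectral magnitude by its bin index, so high bins count more, and an onset then shows up as a jump in this value from one frame to the next. This is just the linear-weight variant as an illustration, and `onset_candidates` with its `threshold` parameter is a hypothetical helper, not Essentia's onset detection pipeline.

```python
import numpy as np

def hfc(mag_spectrum):
    """High-frequency content: bin-index-weighted sum of magnitudes."""
    bins = np.arange(len(mag_spectrum))
    return np.sum(bins * mag_spectrum)

def onset_candidates(frames, threshold):
    """Flag frames where HFC rises by more than threshold vs the previous frame."""
    values = np.array([hfc(f) for f in frames])
    rise = np.diff(values)                 # frame-to-frame increase of HFC
    return np.where(rise > threshold)[0] + 1

# toy frames: the third frame suddenly gains high-frequency energy (an attack)
frames = np.array([
    [1.0, 0.2, 0.0, 0.0],
    [1.0, 0.2, 0.0, 0.0],
    [0.8, 0.5, 0.9, 0.7],   # attack: energy spreads to the high bins
])
print(onset_candidates(frames, threshold=1.0))  # [2]
```

Because the weighting grows with bin index, a noisy attack that spreads energy upward raises the HFC sharply even if the total energy barely changes, which is what makes it a good onset indicator.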
