0:36

So we'll first present the idea of the music information plane. And then we will distinguish between sounds and music, with the aim of developing methodologies that are either relevant to the more generic concept of sound, or more specific to the characteristics of music. So we'll first talk about sounds and sound recordings, and then collections of sound recordings. And then we'll continue with the idea of music recordings and collections of music recordings.

1:24

So we can distinguish different abstraction levels, and the left column shows these different abstraction levels. We can go from the physical level, which is basically the lowest level that we're dealing with, up to the cognitive level, the highest level that we can see there, with some steps in between. At the physical level, when we talk about sounds and music, we can talk about concepts like the frequency or the duration of the sound, or the spectrum and some clear characteristics of the spectrum, like the centroid. And we can also talk about the intensity of the sound. If we go a level higher, to the sensorial level, then instead of frequency we can talk about the pitch of the sound. Instead of duration, we can talk about the sensorial time of the duration. Instead of talking about spectrum, we can talk about timbre, which is a sensorial concept. And finally, instead of intensity, we now talk about loudness, which again is a sensorial concept.

And we can go a level higher and talk about perceptual topics, perceptual concepts that are more musical, that relate to musical concepts. So here we talk about successive and simultaneous intervals of pitches, what would be called notes. And then we talk about time: we talk about the structuring of time, and about things like the beat. For timbre, we talk about aspects of the timbre that we can identify and characterize with some aspect of a musical sound; for example, the spectral envelope would be one. And finally, instead of loudness, when we talk about musical loudness we normally refer to dynamics, and we have a vocabulary that talks about the dynamics of musical sounds.

3:47

And we can still go a level higher, towards a more formalized way of talking about musical concepts. When we talk about pitch-related concepts, we talk about things like melody, key, or tonality. When we talk about time-related concepts, we talk about rhythmic patterns, tempo, and meter. When we talk about spectral timbre characteristics, we identify musical instruments or voices, entities that have a characteristic timbre. And finally, when we talk about dynamics or loudness, we are interested in the articulations of the sounds and how sounds change from one to another.

And finally we can reach the highest level, the cognitive level, the level that relates to us as humans in a very subjective way: how we listen to music and what issues are relevant for us in the interaction with music.

4:53

At this level the columns are not valid anymore; there is interaction between all these different concepts. Here we can talk about emotion, or musical style, or semantic concepts that clearly integrate all these other levels of description. These ways of describing music are clearly more generic, and definitely subjective or cultural. That would be a level that is definitely hard to reach, and in this class we are not going to talk much about it.

5:55

If we want to describe sounds in a generic way, sounds like the ones we find in Freesound, we can group audio features, the audio features that we talked about in the last lecture, into different categories. So we can talk about the timbre-related features, and we mentioned quite a few of them, like the spectral centroid, the MFCCs, or the high-frequency content. Then we can talk about another group of features that relate to dynamics, and that's basically the loudness and the level of a particular recording. And then we can talk about the pitch-related features.

6:43

Here is where we can talk about the pitch or the pitch salience. And finally we also have to describe the time-varying aspects of a sound, aspects that relate to the evolution of the sound, to the texture of the sound, and these we can group under the term morphological features. Here we can talk about things like the envelope of a sound, or the onset rate, or many other types of descriptions that we could include under this.
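As a small illustration of one of these timbre-related features, here is a minimal sketch, in NumPy, of how the spectral centroid of a single magnitude spectrum can be computed. The function name and the synthetic input are ours for illustration only; in practice these features are extracted frame by frame with an analysis library such as Essentia.

```python
import numpy as np

def spectral_centroid(mag_spectrum, sample_rate):
    """Spectral centroid: the magnitude-weighted mean of the bin frequencies."""
    freqs = np.linspace(0, sample_rate / 2, len(mag_spectrum))
    return np.sum(freqs * mag_spectrum) / np.sum(mag_spectrum)

# A flat spectrum has its centroid in the middle of the frequency range,
# close to 11025 Hz for a 44100 Hz sample rate.
flat = np.ones(512)
print(spectral_centroid(flat, 44100))
```

A spectrum with all of its energy concentrated at one frequency would have its centroid at exactly that frequency, which matches the intuition of the centroid as the "center of mass" of the spectrum.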

7:18

We have already seen quite a few of these descriptors, so what is interesting now is that from these descriptors, these features that we can analyze, we can talk about collections of sounds. So let's talk about how to describe collections of sounds. Clearly there are many ways that we can analyze a collection of sounds and describe it, and we'll focus on three basic concepts. The first one, and the most important concept that we need to develop, is the idea of similarity. If we want to talk about collections of sounds, we have to talk about the similarity between these sounds, so we can form the idea of collections and group them. Once we can talk about similarity, then we can cluster sounds: we can group sounds according to some criteria. And finally, if we know some classes, some existing labels that we use to describe a particular group of sounds, then we can classify sounds: we can assign classes to particular sounds.

8:40

In order to properly describe a sound, we have to use many features, but for simplicity we will be taking only two features in this case. So if we consider a sound as represented by two features, we can display the sound as a point in a two-dimensional space, and that's what we're seeing here. Every feature is one dimension. Here we're showing two audio features. The horizontal axis is the mean of the spectral centroid: we have analyzed notes of three instruments, a violin, a flute, and a trumpet, we have computed the spectral centroid, and we have taken the mean of it, so this is a multi-frame feature. And we have also taken the mean of one of the MFCC coefficients, the second coefficient.

9:37

So the vertical axis is the mean of that second coefficient. We can see that the violin has quite a high value for this coefficient, for the MFCC value, and it has a centroid that covers quite a bit of space. The trumpet has this MFCC coefficient quite a bit lower, so these blue dots are more on the lower side. And the flute sounds are kind of in between, and their MFCCs are in between too. So we can see that these types of sounds are distinct according to these two features.
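The idea of placing each sound in a feature space can be sketched like this; the feature values below are made up for illustration, not real analysis results.

```python
import numpy as np

# Each sound is represented by two features:
# [mean spectral centroid (Hz), mean of the 2nd MFCC coefficient].
# These numbers are invented, just to show the representation.
violin  = np.array([2500.0, 120.0])
flute   = np.array([1800.0,  60.0])
trumpet = np.array([2100.0,  10.0])

# Stacked together, the collection becomes a matrix of points in space:
# one row per sound, one column per feature.
sounds = np.vstack([violin, flute, trumpet])
print(sounds.shape)  # (3, 2)
```

This row-per-sound matrix is the representation that the distance, clustering, and classification steps below all operate on.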

10:31

Now, in order to work with this space, the most fundamental thing is to measure the distance between sounds, between points. We have to find a way, in a multidimensional space, not just in this simple 2D space, to compare two sounds. How do we find the similarity between the two? The Euclidean distance is one of the simplest ways to measure the distance between two points in a multidimensional space. In this case, p would correspond to one sound, the collection of feature values of one sound, and q would correspond to another sound, the collection of feature values of the other sound. Then, for every dimension i, we take the difference between those two values on that particular feature, we square it, we sum over all the features, the dimensions, and then we take the square root. That's the Euclidean distance. In the case of a 2D space, of just two features, this becomes much simpler. Here, the red and the blue are two sounds with two features, and we can just measure the Euclidean distance: it's basically the length of the line that connects these two points.
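The steps just described can be written directly in NumPy; the function name is ours, and `np.linalg.norm(p - q)` computes the same thing.

```python
import numpy as np

def euclidean_distance(p, q):
    """Euclidean distance between feature vectors p and q:
    per-dimension differences, squared, summed, then square-rooted."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((p - q) ** 2))

# In 2D this is just the length of the line between the two points.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

The same function works unchanged for any number of features, which is exactly why it is a convenient first choice for comparing sounds described by many descriptors.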

12:12

Now that we know how to measure distance, we can cluster sounds. K-means is a clustering algorithm: if we give the algorithm the desired number of clusters, it will create the clusters and return the mean value of each of them. K-means clustering aims to partition n observations, where the observations would correspond to the sounds, into k clusters, that is, into k categories or groups of sounds.

13:24

This equation expresses the minimization process that we go through in the K-means algorithm. The goal is to find the mean μ of every cluster, for each of the k clusters, that minimizes the overall sum of squared distances from each point to the mean of its cluster; we have to do it holistically, attaining this overall minimization result. In the plot we see three steps in the process of obtaining these clusters. On the left, we start from a collection of points. In fact, these are not sound features, just random points in space. And the goal is to cluster them into two clusters, so we're going to find two clusters.

14:18

We initialize the algorithm by choosing two points that will be used as the initial means of the two clusters. In the middle diagram, the red and the blue are the two initial means, and with these two initial means the collection of samples, of sounds, gets clustered the way we see here, into the red cluster and the cyan cluster. Then, with K-means, we iterate over this minimization, this equation that we have here, and after a certain number of iterations it converges to the clustering that we have on the right. It has clustered the red dots in the lower-left corner and the cyan dots in the upper-right corner, and clearly this is a much better clustering than the initial random clustering that the algorithm started with. So now, with that, we can take collections of sounds and automatically find classes that group sounds that have similar audio features.
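The two alternating steps, assigning each point to its nearest mean and then recomputing the means, can be sketched in a few lines of NumPy. This is a toy version for illustration (random initialization, no handling of empty clusters), not how a production library such as scikit-learn implements it.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Toy K-means: alternate assignment and mean-update steps until stable."""
    rng = np.random.default_rng(seed)
    # Initialize the k means with k randomly chosen points.
    means = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each mean becomes the centroid of its assigned points.
        new_means = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_means, means):  # converged
            break
        means = new_means
    return labels, means

# Two well-separated groups of points end up in two different clusters.
pts = np.array([[0, 0], [0, 1], [1, 0],
                [10, 10], [10, 11], [11, 10]], dtype=float)
labels, means = kmeans(pts, k=2)
```

Whichever random points the initialization picks here, iterating the two steps pulls each mean toward one of the two groups, which mirrors the left-to-right progression of the plots described above.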

The last thing we'll talk about for describing sound collections is the classification of sounds. That means that we know some classes, we have identified certain categories of sounds, and, given a new sound, we want to classify it into one of these known classes. The K-nearest neighbors classifier, KNN, is an algorithm used for this type of classification. The rule that we implement with KNN classifies a sound by assigning to it the class that is most frequent among its neighbors. So we find the K nearest neighbors, and whatever the majority vote of those neighbors is then becomes the class of this query, of this new sound.

This block diagram exemplifies the process, the set of rules implemented in the KNN algorithm. We start from a query, which would correspond to a new sound, and we also start with target examples, a collection of samples, of sounds, that have a label. For example, in the diagram below we have two such labeled collections, the blue and the red ones, and the cyan dots are our queries. We have to label, or assign, these query samples to one of the two collections. So what we do is measure the distance, with the Euclidean distance, from every query sample to all the neighbors, and we take the K top results: we only look at the K nearest neighbors.

17:51

From those, we take a majority vote based on the classes they belong to. So in the last box, we know the classes that the neighbors belong to, we take the majority vote, and we assign the class that has the majority. In the right diagram we see the result: the cyan dots have been assigned a color, so some have been assigned to the blue class and the rest to the red class. This is a very simple but quite effective way to classify sounds, or of course any other type of data, into classes.
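Under the same toy assumptions as before (2D feature points, Euclidean distance), the KNN rule can be sketched as follows. The names are illustrative; a library such as scikit-learn provides a production version of this classifier.

```python
from collections import Counter
import numpy as np

def knn_classify(query, examples, labels, k=3):
    """Assign to the query the class that is most frequent among
    its k nearest labeled examples (Euclidean distance, majority vote)."""
    dists = np.linalg.norm(np.asarray(examples, dtype=float)
                           - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]           # indices of the k nearest neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]         # the majority class

# Two labeled collections ("blue" and "red") and a query near the blue one.
examples = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["blue", "blue", "blue", "red", "red", "red"]
print(knn_classify([0.5, 0.5], examples, labels, k=3))  # blue
```

Note that this is exactly the pipeline in the block diagram: measure distances from the query to all labeled examples, keep the K nearest, then take the majority vote over their classes.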

If we now go to musical sounds, recordings of pieces of music, the features to be analyzed should be more specific, more related to musically meaningful concepts. So let's start by defining some categories of features, or descriptors, that are musically relevant. We can talk about timbre-related descriptors, things we mentioned like instrument characterization, or instrumentation characterization, or even the remixing of musical recordings, which is also an important feature of music.

19:21

Another category would be related to melody and harmony, and that includes things like the phrase, the motive, or the tonic of a piece of music. If we talk about non-Western music traditions, like the Indian music tradition, we talk about the raga, or in the Turkish music tradition, we talk about the makam. These are melodic concepts that can be described and that are important for characterizing a particular piece of music. Then we can talk about rhythm, and there again we talk about patterns, or tempo, or beat.

20:24

These descriptions cannot be obtained by just performing audio analysis. We normally start from audio features, but then we have to develop models, from a combination of features, that can capture the essence of each concept, and clearly this is beyond the aim of this class. It is very much an open and very active research area that hopefully will keep evolving through the years, so that we will eventually be able to do things like this.

21:37

The concepts that we talked about for sounds also apply here, but they have to be adapted. Similarity is still the fundamental concept, but we can find different facets of similarity: we can talk about rhythmic similarity, about similarity of the instrumentation, of the melodic or harmonic aspects, or about structural similarity. And then, of course, we could combine them in order to find similar songs. These types of similarity are clearly not Euclidean distances; we have to develop similarity measures that are much more sophisticated. Then we can classify and cluster pieces of music according to different criteria: for example, we can classify according to genre, or style, or artist, or the school or music tradition the music comes from. Again, this is much beyond what we can cover in this class, but it is a fascinating topic that is a natural continuation of the kinds of things we talked about.

23:17

For the more specific things that we have talked about, you can look at the Wikipedia entries for the Euclidean distance, for K-means clustering, which has a good entry, and for classification based on K-nearest neighbors. Of course, these are just two examples of clustering and classification strategies; there are many other strategies coming from the field of machine learning that have brought new possibilities for these types of tasks.

And that's all. In this lecture, we have opened the door into a huge research field that aims at automatically describing and organizing large collections of sounds and music recordings. We just introduced some of the basic concepts and specific methodologies that can be used to start working on this topic. In the programming lectures, we will show some examples of how to actually do some of this. Clearly we cannot do justice to this field of research, but I hope you got a taste of it. I will see you next class, where we will present some more demonstrations and practical examples of all this. See you next time, bye bye.
