0:47

In order to take any sound that we might want to look at and represent it as a series of sine waves. And we'll talk about some implications of this algorithm, particularly in terms of two parameters, the frame size and the bin width, that we need to think about very carefully as we're configuring it, because they have some serious implications in terms of what we get as results.

So now that we know how sound is represented digitally on a computer, it's pretty obvious how a waveform representation like this comes about. We simply take the successive amplitude values and plot them over time along the x axis, and then we have our waveform. We can connect the dots if we want to, to make it look a little nicer.

But how we get from this to that is not obvious, because when we represent sound digitally we're encoding a series of amplitude values over time. We're not including any information about frequency at all. So that's why we need to think about this a little more carefully and think about how we get to this. So we're going to revisit the Fourier Theorem, which we looked at in the timbre video earlier. I want to look at it in a little more depth now.

Just to recap, we said the Fourier Theorem says that any periodic waveform can be represented as a sum of sine waves at frequencies that are integer multiples of a fundamental frequency. And we looked at examples of this with a sawtooth wave, and we looked at examples of the trombone sound, of how we could combine these sine waves together. We wouldn't hear them anymore as individual sine waves; we'd hear them coming together as a composite to create this single sound for us, because of this special relationship they had to each other in terms of being integer multiples of this base frequency, and because of the way that they were linked.
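As a rough illustration of that recap (this is not code from the lecture), the sketch below adds up sine waves at integer multiples of a fundamental to approximate a sawtooth wave. The function name, the 110 Hz fundamental, and the choice of a rising sawtooth between -1 and 1 are all assumptions made for the example; the 1/k amplitudes with alternating signs are the standard Fourier series for that shape.

```python
import math

def sawtooth_partial_sum(t, fundamental_hz, num_harmonics):
    """Approximate a rising sawtooth (-1 to 1) at time t, in seconds.

    Harmonic k is a sine wave at k * fundamental_hz with amplitude 1/k
    and alternating sign. More harmonics give a closer approximation.
    """
    total = 0.0
    for k in range(1, num_harmonics + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * math.sin(2 * math.pi * k * fundamental_hz * t) / k
    return (2 / math.pi) * total

# Hypothetical example values: a 110 Hz fundamental built from 10 harmonics,
# sampled at 44,100 Hz for roughly one period.
SAMPLE_RATE = 44100
approximation = [sawtooth_partial_sum(n / SAMPLE_RATE, 110.0, 10)
                 for n in range(401)]
```

With only 10 harmonics the result is a wobbly sawtooth; raising num_harmonics sharpens it, which is the "potentially infinite number of sine waves" issue in miniature.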

At the time I also mentioned a really important limitation here, this periodic limitation. This only works for periodic waveforms, like a perfect sine wave or a perfect square wave or something like that. And that isn't really how sounds work in the real world. They're not perfectly periodic. They don't repeat a cycle infinitely, over and over and over again, without any variation. So that's one problem: we've got this spectral aspect of timbre here, but not the envelope of timbre, not the changing-in-time aspect of it.

The other problem here, and this is actually something I didn't write into this text for the Fourier Theorem at the time, is that when we say "sum of sine waves" there's an important caveat. A potentially infinite number of sine waves may be required to do this. And computers don't tend to like infinity very much. They're not continuous beings. They're discrete; they do things as sets of zeros and ones. So if we need a potentially infinite number of sine waves to do this, that's also going to be really problematic for us.

And so what we do instead is we use this basic idea of the Fourier Theorem, but we tweak it a little bit. We kind of fake it out, if you will, to pretend that we're working with periodic waves, and we use a process that doesn't do things perfectly but doesn't use an infinite number of sine waves either to make this happen. And so there are three stages to this that I'm going to talk about in detail. Windowing is when we take a waveform and split it up into tiny little bits. Then we take each of those tiny little bits and we do this thing called periodicization. There's really nothing to this: we just pretend that that little bit repeats infinitely, so that it is a periodic sample. And then on each of those little windows we apply a method called the Fast Fourier Transform, which you'll often see abbreviated as FFT. We apply this process in order to convert our time-domain set of amplitude values into information about frequency. So I'm going to go through each of these steps in more detail now.

The first step is windowing. What we do is we divide the audio into equal-size, overlapping frames. So let me show you what I mean. We pick a number of samples that will be included in each frame. Our frame size might be 1,024 samples, for instance. So these are tiny frames: 1,024 samples, if our sampling rate were 44,100 hertz, is about a fortieth of a second. So tiny fractions of a second. And so if we were taking this waveform and splitting it up, we might have that be one. And then we're going to overlap them with each other. So that might be another, that might be another, that might be another, that might be another, and so on and so forth, all the way through our file.

But it's more complicated than this, actually, because these are overlapping and we want smooth transitions from one to the next. As we're doing this, each of them fades in and fades out. So the first one I'm going to fade in and fade out with an amplitude envelope like that.

This one will fade in and fade out too, this one will fade in and fade out too, and so on. So there's always one that's fading in and always one that's fading out, with an overlap like that, and so on and so forth. So that's what windowing is: we end up with these windows that fade in and fade out and are each a tiny fraction of a second long.

Then we take each of those windows and, this is the easy part, we pretend that it's a periodic function. So we take this tiny little window here, and we repeat it and we repeat it and we repeat it, again and again. We just pretend that this goes on forever. So now we've met the periodic requirement of the Fourier Theorem.
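The windowing and periodicization steps described above can be sketched roughly as follows. The lecture does not name the shape of the fade-in/fade-out envelope or the amount of overlap, so the Hann-shaped envelope and the 50% overlap (a hop of half a frame) are assumptions, as are all the function names and the 440 Hz stand-in signal.

```python
import math

def hann_window(frame_size):
    """Fade-in/fade-out envelope: 0 at the start of the frame, 1 in the middle."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
            for n in range(frame_size)]

def windowed_frames(samples, frame_size=1024, hop_size=512):
    """Split samples into equal-size, overlapping frames and fade each one."""
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = samples[start:start + frame_size]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

def periodicize(frame, repeats):
    """Pretend the frame repeats; here only a finite number of copies."""
    return frame * repeats

SAMPLE_RATE = 44100
FRAME_SIZE = 1024
frame_seconds = FRAME_SIZE / SAMPLE_RATE  # roughly 0.023 s per frame

# One second of a 440 Hz sine wave as stand-in audio (hypothetical input).
audio = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
         for n in range(SAMPLE_RATE)]
frames = windowed_frames(audio, FRAME_SIZE, hop_size=FRAME_SIZE // 2)
```

With this particular envelope and a half-frame hop, the fading-out part of one frame and the fading-in part of the next always add up to 1, which is one way to get the smooth hand-off from frame to frame. In practice nobody builds the infinite repetition explicitly; the transform simply treats each frame as if it were one period.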

Â 7:13

The details of how this algorithm works are a little bit beyond the scope of this course. I encourage you to look up some more details if you're interested, and I'll point you towards some references. But right now I just want to pretend that it's a black box and explain what goes in and what comes out. What goes in are these amplitude samples over time in the frame. So if our frame size is 1,024, we'd have 1,024 amplitude values that would go in. And what would come out are a set of amplitudes and phases for each frequency bin. In other words, I'm going to divide up my frequency space into a series of linearly spaced bins. I'll get into more of how this works in a second. And then I'm going to look at what's going on in each of those. How much energy is there in each of those bins?

And also, what is the phase of the sine wave that's represented by each of those bins? There are some simple ways to calculate how the algorithm does this. My number of frequency bins is half of my frame size. And then the width between each of these bins, from one to the next to the next, is my Nyquist frequency, the highest frequency I can represent at my sampling rate, divided by my number of bins.

So let's work through an example here just to make sure this is totally clear. If my frame size is 1,024 samples and my sampling rate is 44,100 hertz, then my Nyquist frequency would be 44,100 divided by two, so 22,050. Then my number of bins is the frame size, 1,024, divided by two, so that's 512. And my bin width is going to be my Nyquist frequency, that's 22,050, and that's hertz just to be clear, divided by my number of bins, 512. This comes out to about 43 hertz. It's a little bit more than 43 hertz. So that means that my frequency bins are going to be spaced at zero, 43, 86, 129, and so on and so forth, all the way up to 22,050 hertz. So that's how this stuff is divided up.
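The arithmetic in that worked example can be checked with a few lines of Python. This is only a sketch of the two formulas as stated in the lecture (number of bins is half the frame size, bin width is the Nyquist frequency divided by the number of bins); the function and variable names are made up for the example.

```python
def bin_layout(frame_size, sample_rate):
    """Return (nyquist_hz, num_bins, bin_width_hz) using the lecture's formulas."""
    nyquist_hz = sample_rate / 2
    num_bins = frame_size // 2
    bin_width_hz = nyquist_hz / num_bins
    return nyquist_hz, num_bins, bin_width_hz

nyquist_hz, num_bins, bin_width_hz = bin_layout(1024, 44100)
print(nyquist_hz)              # 22050.0
print(num_bins)                # 512
print(round(bin_width_hz, 2))  # 43.07

# Starting frequencies of the first few bins, in hertz.
first_bins = [round(k * bin_width_hz, 1) for k in range(4)]
print(first_bins)              # [0.0, 43.1, 86.1, 129.2]
```

The lecture rounds the bin width down to 43, which is why it quotes 43, 86, 129; the unrounded spacing drifts slightly above those values.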

And then I have information at that point about what's going on in each of those frequency areas. And so you can see how I could generate a sonogram from there. I could take each of these frames and generate one vertical strip of the frequency view in my sonogram, based on that data that's coming back, and I'm going to show you how that works in a second. But before I do that I want to talk about some of the issues with this process, because it is not a perfect process.
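Before getting to those issues, here is one way to picture the black box described above: amplitude samples for one frame go in, and an amplitude and a phase for each frequency bin come out. This sketch uses a plain, slow discrete Fourier transform rather than an actual FFT (an FFT produces the same numbers, only much faster), and it skips the fade-in/fade-out envelope to keep the result easy to read. The function name, the tiny frame size of 64, and the test tone are assumptions chosen for the example.

```python
import cmath
import math

def analyze_frame(frame):
    """Slow discrete Fourier transform of one frame.

    Returns a list of (amplitude, phase) pairs, one per frequency bin,
    for bins 0 up to len(frame) // 2 - 1.
    """
    n = len(frame)
    column = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        amplitude = abs(total) * 2 / n  # scaled so a unit sine reads as 1.0
        column.append((amplitude, cmath.phase(total)))
    return column

SAMPLE_RATE = 44100
FRAME_SIZE = 64  # deliberately small so the slow loop stays quick
bin_width_hz = (SAMPLE_RATE / 2) / (FRAME_SIZE // 2)

# A test tone placed exactly on bin 5 (hypothetical input).
tone_hz = 5 * bin_width_hz
frame = [math.sin(2 * math.pi * tone_hz * t / SAMPLE_RATE)
         for t in range(FRAME_SIZE)]

column = analyze_frame(frame)
loudest_bin = max(range(len(column)), key=lambda k: column[k][0])
```

Each call to analyze_frame yields the data for one vertical strip of a sonogram: the amplitudes decide how bright each bin is drawn, and here all the energy lands in bin 5.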

First of all, it's a lossy process; I lose data in this process. If I do this Fast Fourier Transform and then I go back to my waveform, I've lost something in the process, because I've split these things up into these linear frequency bins. So I only know what's happening at a very low resolution as I move up in frequency. And I also only know things at a fairly low resolution in terms of time, because I only know what's happening frame by frame by frame, so 1,024 samples at a time in the example we've been using.

And so there's actually a big trade-off here when I pick my frame size, in terms of how much resolution I want in the time domain versus how much I want in the frequency domain. If I want to know exactly when things are happening in time along my x axis, I can pick a very small frame size. So my frames are really tiny, and I get a lot of time resolution, horizontally. But then my bin width gets huge, and so I know very little about what's happening vertically in my frequency dimension. If I want to know a lot vertically in my frequency dimension, I can pick a really large frame size. But then there's a lot of time that passes from one frame to the next to the next, and so I lose a lot of resolution in the horizontal, in the time domain. I'll show you this in a demo in a second.

The one point I wanted to make first, before I go there, is this word here: these bins, the frequency space, is divided linearly. But if you remember from our video on psychoacoustics, we actually hear pitch not linearly but logarithmically. And so a lot of these frequency bins are kind of wasted, if you will, on things very high up in frequency space. Half of the bins are for what we would hear as just the final octave of our frequency space.
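To see where that "half of the bins" figure comes from, the sketch below counts how many linearly spaced bins fall inside each octave, working down from the Nyquist frequency. It reuses the lecture's example values of 1,024 samples and 44,100 hertz; the function name and the choice to look at four octaves are assumptions for illustration.

```python
def bins_per_octave(frame_size, sample_rate, num_octaves=4):
    """Count bins in each octave below Nyquist, highest octave first."""
    nyquist_hz = sample_rate / 2
    num_bins = frame_size // 2
    bin_width_hz = nyquist_hz / num_bins
    bin_freqs = [k * bin_width_hz for k in range(num_bins)]

    counts = []
    upper_hz = nyquist_hz
    for _ in range(num_octaves):
        lower_hz = upper_hz / 2  # one octave down is half the frequency
        counts.append(sum(1 for f in bin_freqs if lower_hz <= f < upper_hz))
        upper_hz = lower_hz
    return counts

counts = bins_per_octave(1024, 44100)
print(counts)  # [256, 128, 64, 32]
```

The top octave (11,025 to 22,050 hertz) takes 256 of the 512 bins, and every octave below it gets half as many as the one above.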

So this isn't a great match either, but that's how this particular algorithm works. So let me go ahead and open this up in Reaper for you. I'm going to play a sound here, and I want to show you the sonogram for it. And we have an option here to pick our frame size. So I'm going to show you how this is going to start looking different, that time versus frequency resolution trade-off, as I pick different frame sizes.

Â 13:03

[MUSIC] So I'm at 1,024 samples right now, which is a good compromise. But what if I want really, really, really good resolution in the frequency dimension? I may go over to 32,768 [MUSIC], and you can see how much clearer things are in the vertical dimension. It's a lot less grainy. I can tell exactly where things are happening vertically in the frequency dimension [MUSIC], but now it's kind of blurred in the horizontal dimension. I don't get a very good sense of the rhythm at all anymore. This is a very rhythmic sample. So if I went down to something really low, 16, now I see the rhythm very precisely. You can see all those peaks representing every single note that's playing, but I see almost nothing to represent what's happening [MUSIC] in the vertical dimension. It's all just these bars [MUSIC] that are going up to more or less the same height as each other. I'm not seeing very much at all about specifically where things are happening in frequency space.

So I just wanted to show that to you to illustrate that trade-off, that decision you have to make when you pick the frame size. And that's why pretty much anything that's doing frequency-domain analysis is going to give you the option of choosing the frame size.
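The trade-off in the demo can also be put into numbers. This sketch applies the lecture's bin-width formula to the three frame sizes tried above (16, 1,024 and 32,768 samples), assuming the same 44,100 hertz sampling rate as the earlier worked example; the function name is made up for illustration.

```python
def resolution(frame_size, sample_rate=44100):
    """Return (frame length in milliseconds, bin width in hertz)."""
    frame_ms = 1000 * frame_size / sample_rate
    bin_width_hz = (sample_rate / 2) / (frame_size // 2)
    return frame_ms, bin_width_hz

for frame_size in (16, 1024, 32768):
    frame_ms, bin_width_hz = resolution(frame_size)
    print(frame_size, "samples:",
          round(frame_ms, 2), "ms per frame,",
          round(bin_width_hz, 2), "Hz per bin")

# Roughly:
#    16 samples:   0.36 ms per frame, 2756.25 Hz per bin
#  1024 samples:  23.22 ms per frame,   43.07 Hz per bin
# 32768 samples: 743.04 ms per frame,    1.35 Hz per bin
```

Under these formulas the frame length in seconds multiplied by the bin width in hertz always comes out to 1, so any gain in one dimension is paid for in the other.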
