[MUSIC] All right, so now we know what sound is. We know how we measure sound, and we also have a good understanding of how we perceive sound. This is really useful, specifically if we want to use audio in virtual reality environments. And we've thought a lot about direction, and frequency, and amplitude, and how these different things can combine with reverberation to create an audio scene. But how do we store digital audio in a computer? How do we take audio and make it into something that we can manipulate in a virtual reality environment? We do this through a process called sampling.

Now, in order to sample an acoustic signal, we can use a microphone. A microphone has a thin piece of metal suspended in a magnetic field. When pressure waves pass over this metal, they make it vibrate, and that vibration produces a varying electrical signal, which is passed down a cable into a computer. At the computer there's something called an analog-to-digital converter, which takes samples of that signal at a regular rate. Each sample is a recording of the amplitude of that electrical signal at that moment. So in this way, you end up with a series of amplitude measurements, and we store them as a list of numbers, just a list of numbers on a computer.

Now, the number of samples that we take every second is what we call the sampling rate, or the sampling frequency, and it's quite an important number. In most cases, digital audio systems store 44,100 amplitudes every second. Each one is a separate record of the amplitude at that moment, and there are quite a lot of them. This allows digital audio systems to represent sounds across a range from 20 Hertz to 20,000 Hertz. And if you remember from what I was saying earlier, that means it can record any frequency that the human ear can hear.

Take a look at this diagram. You can see that there's a red line and a black dotted line. The red line is the audio waveform itself. The black dotted line is the waveform we get back when we sample that audio waveform. Now, you can see that the dotted line represents a wave that is slower than the red line. What I mean by slower is that it's lower in frequency: there's a greater distance between the peaks. However, it's been produced by taking points along the red line. This is a problem known as aliasing, and it happens when your sampling rate is not high enough to record the frequency that you need to record. In order to avoid this problem, you need to take at least two samples per cycle of the very highest frequency that you want to record. This is why the sampling rate is normally 44,100 or thereabouts: it's roughly twice the top end of the human hearing range of 20,000 Hertz.

Each sample contains only an amplitude value, and we derive all the frequency content from these amplitude values alone. But how do we record these amplitude values? How big are they? What sort of range are they in? In general, this number varies, but most of the time it's in 16-bit resolution. What this means is that there are 65,536 potential amplitude values for every single sample, and these are all separate, discrete steps. So in order to stop these discrete steps from sounding distorted, what we need to do is interpolate. That means when we reproduce the sound, we smooth over the differences between these individual steps to create a continuous signal on the way out. Generally speaking, there's quite a lot of fancy engineering and a lot of expertise that's gone into doing what we call digital-to-analog conversion. But for the moment, all you need to know is that it's taken care of by your sound card.
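To make the aliasing point concrete, here's a minimal sketch in Python with NumPy (the frequencies chosen are just illustrative, not part of any particular audio toolchain). A 25,000 Hertz tone sits above half the 44,100 Hertz sampling rate, and once it has been sampled it produces the same values as a 19,100 Hertz tone, so the recording can no longer tell the two apart.

```python
# Minimal aliasing sketch: a tone above half the sampling rate "folds down"
# to a lower frequency. NumPy only; the frequencies are illustrative choices.
import numpy as np

SAMPLE_RATE = 44_100                  # samples per second
t = np.arange(45) / SAMPLE_RATE       # about one millisecond of sample times

true_freq = 25_000                    # above the Nyquist limit of 22,050 Hz
alias_freq = SAMPLE_RATE - true_freq  # 19,100 Hz, where the tone folds to

high_tone = np.sin(2 * np.pi * true_freq * t)
folded_tone = np.sin(2 * np.pi * alias_freq * t)

# Sample for sample, the 25 kHz tone matches the 19.1 kHz tone (apart from an
# inverted phase), so the two are indistinguishable once sampled.
print(np.allclose(high_tone, -folded_tone))   # True
```

Real recording chains avoid this by filtering out everything above half the sampling rate before the analog-to-digital converter takes its samples.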
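And to put a number on that 16-bit resolution, here's another small sketch, again plain NumPy rather than anything a sound card actually runs, that maps floating-point samples onto the 65,536 discrete integer steps a 16-bit recording stores.

```python
# 16-bit quantisation sketch: each sample becomes one of 2**16 discrete steps.
import numpy as np

BIT_DEPTH = 16
NUM_LEVELS = 2 ** BIT_DEPTH            # 65,536 possible amplitude values
MAX_INT = 2 ** (BIT_DEPTH - 1) - 1     # 32,767, the largest positive 16-bit value

def quantise(samples):
    """Map floating-point samples in [-1.0, 1.0] onto signed 16-bit integers."""
    clipped = np.clip(samples, -1.0, 1.0)
    return np.round(clipped * MAX_INT).astype(np.int16)

# One cycle of a quiet 440 Hz sine wave sampled at 44,100 Hz.
t = np.arange(0, 1 / 440, 1 / 44_100)
wave = 0.25 * np.sin(2 * np.pi * 440 * t)

print(NUM_LEVELS)            # 65536
print(quantise(wave)[:5])    # the first few samples, now discrete integer steps
```

On the way back out, the digital-to-analog converter interpolates between those steps, which is what keeps the playback smooth.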
So in this diagram you can see a series of discrete samples that represent a waveform. The blue dots are the samples themselves, and the red line is the waveform. You'll notice on the left-hand side of the diagram that we have 16 independent levels. Each one is a step at which we can record the amplitude; it's not an amplitude value in itself, but the slot where we place the value. Those 16 steps are just for illustration, and the real resolution is, as I've said, much, much higher.

We've explored frequency and how that's represented on a computer, and we've also looked at amplitude and how that's represented on a computer, so we're all good. But what about direction? How do we represent direction on the computer? Well, normally we do this by using multiple channels. Most commonly, two separate channels are used, and these two channels have different amplitudes. Now, these don't actually contain true interaural amplitude differences. They just contain a synthetic version of something like that, which helps fool the ear into thinking that the sound is either moving or in a particular place. This system has been in use for decades, and we call it stereo. Now, you may have heard of stereo; you probably use it all the time. It's becoming less and less common in this context, though: we still use it for music listening, but in VR it's hardly used at all. It may be that we use stereo sound files in VR, but generally speaking, people will use individual mono sounds and then pan or position them using a system that is now more common in VR, called the head-related transfer function.

So stereo doesn't really work that well in VR. We need something that recreates the interaural time differences and the interaural amplitude differences, and these days it's more common to use something called the head-related transfer function, or HRTF. Almost all VR platforms now use this; it doesn't matter whether it's PlayStation VR, Oculus Rift or HTC Vive. Positioning is calculated by trying to work out what the interaural time and amplitude differences would be for a dummy head in a particular position. Now, there's a lot of complex maths behind how this happens, and I'm not going to try and wave it away, but it would take more time to explain than we have here. What I can say is, you don't really need to understand it in detail any more, because it comes as standard in almost every VR authoring platform, including Unity. So if you position a sound in Unity, it's going to use the HRTF to allow you to position a sound anywhere in space relative to a particular head, and we'll talk more about that later on.

It's also very important that we understand how digital audio is stored. Even in Unity, we might find that we want to manipulate the digital audio to change it in ways that aren't possible in the interface. In order to do that, you have to be able to access it and then make specific edits and program particular routines to do the processing. So it's good to know how this works. Digital audio is just a huge list of numbers, as I've said, normally held in something called an array. It takes up space either way, whether in RAM or on the hard disk, and it's very important to know which one it's in. If it's in RAM, you can play it back and manipulate it very fast, because it's right there and available to the central processing unit. If it's on the hard drive, it's going to take much longer to process, and it's going to slow you down.
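Since digital audio really is just an array of numbers, even a short routine can access and manipulate it directly. Here's a hedged sketch, plain NumPy with an invented helper name, that copies a mono array into two channels with different gains, a crude version of the synthetic amplitude difference stereo relies on; a real VR engine would hand this job to its HRTF spatializer instead.

```python
# Sketch of amplitude panning: one mono signal, two channels, different gains.
# NumPy only; constant-power gains are just one common choice of panning law.
import numpy as np

def constant_power_pan(mono, pan):
    """Pan a mono signal into two channels.

    pan runs from -1.0 (fully left) through 0.0 (centre) to 1.0 (fully right).
    The cosine/sine gains keep the overall loudness roughly steady while the
    level difference between the channels shifts the apparent position.
    """
    angle = (pan + 1.0) * np.pi / 4.0        # map [-1, 1] onto [0, pi/2]
    left = mono * np.cos(angle)
    right = mono * np.sin(angle)
    return np.stack([left, right], axis=0)   # shape: (2 channels, n samples)

# Half a second of a 440 Hz mono tone, panned partway to the right.
t = np.arange(22_050) / 44_100
mono_tone = 0.5 * np.sin(2 * np.pi * 440 * t)
stereo = constant_power_pan(mono_tone, pan=0.5)
print(stereo.shape)   # (2, 22050): two parallel lists of amplitude values
```

An HRTF spatializer goes further than this: as well as level differences, it models the time delays and frequency filtering a listener's head and ears would introduce, which is why it works so much better for positioning sounds in VR.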
Manipulating data stored on the hard drive is much slower than manipulating data stored in RAM. Also, audio takes up space, and you need to be aware of this. A minute of stereo, 16-bit, 44,100 Hertz audio takes up around 10.1 megabytes (the quick calculation below spells out the arithmetic). That's not much, but it can add up: if you load up a whole bunch of sound beds and sound effects, you might well find you're running out of memory. With HRTF audio, sounds are usually mono, so they take up less space, and they get spatialized by the HRTF algorithms, with lots of that processing happening in real time. So in general, because there's a lot more real-time processing, the CPU is doing more of its work in RAM. [MUSIC]
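That "around 10.1 megabytes" figure is easy to check. Here's a quick back-of-the-envelope sketch of the arithmetic, assuming "megabytes" here means mebibytes (1024 × 1024 bytes), with a mono source shown alongside for comparison.

```python
# Memory footprint of one minute of uncompressed 16-bit, 44,100 Hz audio.
SAMPLE_RATE = 44_100        # samples per second
BYTES_PER_SAMPLE = 2        # 16 bits = 2 bytes
SECONDS = 60                # one minute

def audio_size_mib(channels):
    """Uncompressed size in mebibytes for one minute of audio."""
    total_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * channels * SECONDS
    return total_bytes / (1024 * 1024)

print(round(audio_size_mib(channels=2), 2))   # 10.09 -- the "around 10.1 megabytes" above
print(round(audio_size_mib(channels=1), 2))   # 5.05  -- a mono HRTF source is half that
```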