0:00

In this section we'll be addressing how information theoretic ideas can help us to understand how the neural code may be specially adapted to the structure of natural signals. We'll first look briefly at some of the special properties of natural inputs, and then at some theories of how the code should behave. Finally, we'll sum up with some suggestions about the principles that may be at work in shaping the neural code.

So I'm going to show you some photos that were taken by one of our post-docs, Fred Sue, as he was sitting in his apartment on one of our typical sunny Seattle afternoons, looking out at the view. He tried to take a picture that encompassed both his beautifully furnished apartment and the grand view outside. You can see that he had to change his f-stop over a wide range in order to be able to capture information both about the scene inside and about the world outside. Now, this is something that our eye does effortlessly. If you were sitting here at this table, you would be able to see both the inside and the outside with perfect fidelity.

So looking even at this familiar example, we can see two properties that are characteristic of natural inputs. One is that there's a huge dynamic range: there are variations in light level and contrast that range over orders of magnitude. We can see signs of another property by comparing these two boxes. Because of the effects of depth and perspective, there's similar structure, similarly well defined shapes and objects, at very different length scales. This is reflected in the power spectrum of natural images. If one computes the power in different spatial frequency components, this function has a power-law form: it scales like the frequency to the power minus two. This reflects the lack of any characteristic scale; similar structure appears at every scale.
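As a rough illustration of this scale invariance, here is a sketch that builds a synthetic, hypothetical scale-free image (rather than using a real photograph) and recovers the power-law exponent from its radially averaged power spectrum:

```python
import numpy as np

# Sketch: synthesize an image whose amplitude spectrum falls off as 1/f
# (so power falls off as 1/f^2), then recover the exponent from a radially
# averaged power spectrum. For real photographs one would load an image
# array here instead of synthesizing one.
rng = np.random.default_rng(1)
n = 256
fx = np.fft.fftfreq(n)
f = np.sqrt(fx[:, None] ** 2 + fx[None, :] ** 2)   # radial spatial frequency
f[0, 0] = 1.0                                      # avoid dividing by zero at DC
image = np.real(np.fft.ifft2(rng.standard_normal((n, n)) / f))

power = np.abs(np.fft.fft2(image)) ** 2            # 2D power spectrum
radius = np.rint(f * n).astype(int)                # integer radial frequency bin
ring_power = np.bincount(radius.ravel(), power.ravel())
ring_count = np.bincount(radius.ravel())
radial = ring_power / np.maximum(ring_count, 1)    # radially averaged power

freqs = np.arange(2, n // 2)
slope, _ = np.polyfit(np.log(freqs), np.log(radial[freqs]), 1)
print(slope)                                       # close to -2, the 1/f^2 law
```

The fitted log-log slope comes out near minus two, the scaling the lecture describes for natural images.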

Despite these scale differences and the very large variations in light and contrast across the image, we'd like to be able to distinguish detail at every point in it, unlike this camera. These basic issues arise for almost all of our senses. Here's an audio track of a chunk of speech. The signal is full of complex fluctuations that carry detailed information about pitch and nuance. However, these fast variations are modulated by the relatively huge variations in amplitude that make up the envelope of speech. We're perfectly capable of understanding all of these signal components regardless of the overall amplitude, even when there are multiple speakers, or they're far away.

So how can a neural system, with a limited range of responses, manage to convey the relevant information about details in the face of these huge variations of scale? Recall that we found that the entropy reached its maximum when the two symbols were used equally often. Now, if we're thinking about maximizing the mutual information, we also have to take into account this noise term. But generally, the amount of noise for a given stimulus may not be something that's easily controlled, while the total response entropy is something that's in the hands of the coder. Let's see how.
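That earlier result is easy to check numerically; this small sketch (an illustration, not anything from the lecture's experiments) scans a two-symbol code and confirms that entropy peaks when both symbols are used equally often:

```python
import numpy as np

def entropy_bits(p):
    """Entropy, in bits, of a two-symbol code using symbol 1 with probability p."""
    return -sum(x * np.log2(x) for x in (p, 1.0 - p) if x > 0)

# Scan the probability of using symbol 1: the maximum is at p = 0.5,
# i.e. both symbols used equally often, giving exactly 1 bit.
ps = np.linspace(0.01, 0.99, 99)
best_p = ps[np.argmax([entropy_bits(p) for p in ps])]
print(best_p, entropy_bits(best_p))   # 0.5, 1.0
```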

Let's imagine that the stimulus that a system needs to encode is varying in time; this is s of t, and it has some distribution, p of s, over here. Our job as an encoder is to map the stimulus onto the symbols that we have at our disposal. Let's imagine that we're constrained to use some maximal firing rate, so we have some limited range of possible symbols at our disposal, say zero to 20 hertz. How should we organize that mapping so that we end up with the most efficient code? We'll get the most information by maximizing our output entropy, that is, by using all of our symbols about equally often. So what does that imply for the shape of this curve? What we should do is move along our stimulus distribution and encode equal shares of that distribution with each symbol. If we have 20 symbols, let's count up one twentieth of the total area under this curve and assign that to symbol one. What this amounts to is a response curve that's given by the cumulative integral of the stimulus distribution. Another name for this is histogram equalization.
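Here is a small sketch of histogram equalization as an encoding rule (the 0-20 Hz ceiling is just the example range from above): the firing rate is set proportional to the empirical cumulative distribution of the stimulus, so every rate "symbol" ends up used equally often.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.normal(0.0, 2.0, size=100_000)   # samples from the stimulus distribution p(s)

def encode(stim, samples, r_max=20.0):
    """Map stimulus values through the empirical CDF, scaled to a 0-r_max Hz rate."""
    cdf = np.searchsorted(np.sort(samples), stim) / len(samples)
    return r_max * cdf

rates = encode(s, s)
# With 20 rate symbols, each should now be used about 1/20th of the time.
counts, _ = np.histogram(rates, bins=20, range=(0.0, 20.0))
print(counts / len(s))
```

The output histogram is flat even though the input distribution was Gaussian, which is exactly the maximum-entropy use of the limited response range.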

Â 4:26

So this implies that for a good coding system, its input-output function, this function here, should be determined by the distribution of natural inputs. Here's a classic study in which this idea was tested directly. In the early 1980s, Simon Laughlin went out into the fields with a camera and measured the typical contrasts, that is, deviations in the light level divided by the mean light level, that would be experienced in the natural world, for example, by a fly. So that's this distribution here. If the response does indeed follow the distribution of natural inputs, then the response curve, here, should look like the cumulative probability determined by integrating p of c. And in fact, that's a very good match to what he actually observed in the response properties of the fly's large monopolar cells, the neurons that integrate signals from the fly's photoreceptors. Now, a study like this poses a challenge.

While it makes sense that our sensory systems would, over evolution or development, set up response codes that are adjusted to natural input statistics, it seems that much more work is needed to handle the problems posed by the huge natural variation that stimuli take on as one moves from indoors to outdoors, or even moves one's eyes around a room; the contrast distribution is varying widely. Might sensory systems rather adjust themselves on much shorter timescales to take these statistical variations into account? So let's take a patch of the image and look at the variations in contrast in that patch. Here, for example, the contrast distribution might be narrow like this, whereas over here, it might be much broader. What our code should do is take the widths of these distributions into account in setting up a local input-output curve that accommodates the currently measured statistics of the input. So that's the question that we tested

here, in the H1 neuron. In this experiment, we took a white-noise input of the type that you used in the problem sets, some s of t; it looks like that. And we multiplied it by some slowly varying envelope; call that sigma of t. And that's what you see here. This is a 90-second-long chunk of stimulus. We repeated the same sigma of t in every trial, but we changed the specific white-noise stimulus. That allowed us to pick out spikes that occurred at different time points throughout this presentation of sigma of t, where in every trial the cell would have seen a different specific stimulus, and to calculate the input-output function described by those spikes in those different windows of time. Now, when one analyzes spikes across these different windows and pulls out their input-output function using the methods that we talked about in week two, one finds that, for example, here in this window one gets a very broad input-output curve, whereas when the stimulus is varying very little, one finds a very sharp input-output curve. Now, it turns out that if one normalizes the stimulus by its standard deviation, or by this envelope sigma of t, all of these curves collapse onto the same curve.
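The analysis can be sketched with a hypothetical model neuron whose rate depends only on the stimulus rescaled by the current envelope, rate = g(s/sigma). Input-output curves measured in a low- and a high-variance epoch then collapse once the stimulus axis is normalized by each epoch's standard deviation:

```python
import numpy as np

rng = np.random.default_rng(2)
g = lambda x: 20.0 / (1.0 + np.exp(-2.0 * x))   # saturating nonlinearity (Hz)

def io_curve(stim, rates, edges):
    """Measured input-output curve: mean firing rate in each stimulus bin."""
    idx = np.digitize(stim, edges)
    return np.array([rates[idx == i].mean() for i in range(1, len(edges))])

edges = np.linspace(-2.5, 2.5, 21)              # bins in units of the envelope
curves = {}
for sigma in (1.0, 4.0):                        # low- and high-variance epochs
    s = rng.normal(0.0, sigma, size=200_000)    # white-noise stimulus in this epoch
    rates = g(s / sigma)                        # the adapting model response
    curves[sigma] = io_curve(s / sigma, rates, edges)

# After rescaling the stimulus axis by sigma, the two curves nearly coincide.
print(np.max(np.abs(curves[1.0] - curves[4.0])))
```

On the raw stimulus axis these two curves would look very different (broad versus sharp); the collapse only appears after the normalization, mirroring the H1 result.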

What that says is that the code has the freedom to stretch its input axis such that it accommodates these variations in the overall scale of the stimulus. And it's able to do that in real time as this envelope is varying.

This has been seen in several other systems, including the retina and the auditory system. But here's an example from rat barrel cortex, the somatosensory cortex of the rat, in particular the part that encodes the vibrations of whiskers. In extracellular in vivo recordings of responses to whisker motion, whiskers were stimulated with a velocity signal, again an s of t, that looked like this. This is a slightly simpler experiment: the standard deviation was varied between two different values. Now one can pull out spikes that were generated in these two epochs of the presentation, the high-variance case and the low-variance case, and compute input-output curves for spikes that occurred under these two different conditions.

Â 9:00

So in the low-variance case, one sees this input-output curve, and in the high-variance case, one sees this input-output curve. And hopefully you won't be surprised that if I now divide the stimulus by its standard deviation, we see a common curve. So again, this input-output curve has the freedom to stretch itself such that it's able to encode stimuli in their natural dynamic range.

So what I've shown you is that as one changes the characteristics of the stimulus, in the cases we've talked about by changing its overall amplitude, changes can occur in the input-output function. Here we found that if a stimulus, say, took on this dynamic range, it might be encoded with an input-output curve like that. Now you should be able to see that if one increased the range of the stimulus and stayed with that same input-output curve, most of the time your stimuli would be giving responses that were either zero or at the saturation point. Similarly, if you decreased the range of the stimulus, you'd be hovering at the central part of the curve. Ideally, one would like to use one's entire dynamic range, as defined by this input-output curve, and so one would like to match it to the range of the stimulus. And that's exactly what we saw in the experiments.

Now, this adaptive representation of information is not confined to changes in the input-output function. It's also been seen that changes can happen in the feature itself as the statistics of the inputs are changed: the feature that's selected by a neural system can also adapt to changes in the stimulus statistics.

And information theory has also been used to explain the way in which this occurs. For example, it's been used to explain how the spatial filtering properties of neurons in the retina and in the LGN change with light level. Joe Atick and his colleagues posed the following question: if we consider that the retina imposes a linear transfer function, or filter, on its inputs, what's the shape of that filter that maximizes information transmission through the retina? The solution turns out to depend on two things: the power spectrum of natural images and the signal-to-noise ratio. At high light levels, or high signal-to-noise, one would predict a filter shape like the one we've seen already, the Mexican hat shape. This acts like a differentiator, looking for edges in the stimulus. But at low light levels, the predicted optimal filter is integrating, and simply averages its inputs to reduce noise. And indeed, in retinal receptive fields it's seen that the surround becomes weaker at low light levels and the center broader, which qualitatively matches these predictions.

We can also use information theory to find out what it is about a stimulus that drives a neuron to fire. We looked at this method in week two; it's called the method of maximally informative dimensions. One chooses a filter, extracting some component from the stimulus, that maximizes the Kullback-Leibler divergence between the spike-conditional and the prior distributions. This turns out to be equivalent to maximizing the information that the spike provides about the stimulus.
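A toy version of that objective (using a hypothetical threshold neuron, not data from the lecture) can be written down directly: project the stimuli onto a candidate filter and estimate the Kullback-Leibler divergence between the spike-conditional and prior distributions of that projection. The filter aligned with the cell's true feature scores far higher than an orthogonal one.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, n = 10, 200_000
feature = np.zeros(dim); feature[0] = 1.0        # the toy cell's "true" feature
stimuli = rng.standard_normal((n, dim))
spikes = stimuli @ feature > 1.0                 # spike when projection crosses threshold

def kl_score(filt):
    """D_KL( p(projection | spike) || p(projection) ), in bits, from histograms."""
    proj = stimuli @ (filt / np.linalg.norm(filt))
    edges = np.linspace(-4.0, 4.0, 41)
    prior, _ = np.histogram(proj, edges, density=True)
    cond, _ = np.histogram(proj[spikes], edges, density=True)
    ok = (cond > 0) & (prior > 0)
    width = edges[1] - edges[0]
    return np.sum(cond[ok] * np.log2(cond[ok] / prior[ok])) * width

informative, uninformative = np.zeros(dim), np.zeros(dim)
informative[0], uninformative[1] = 1.0, 1.0
print(kl_score(informative), kl_score(uninformative))
```

Maximizing this score over candidate filters is the search that the method of maximally informative dimensions performs.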

One can use this method to search for the optimal feature that explains the coding properties of a system when it's being presented with stimuli of a particular distribution. So, for example, if one initially starts with a Gaussian white-noise distribution, that's the vertical Gaussian in this representation, one might find a particular feature. But if one now changes the distribution to, say, natural images, which have a very different distribution, the filter that maximizes the information between spike and stimulus may be different. And that's been shown to be the case for cortical receptive fields, among other systems.

Â 13:01

So, let me finish up by discussing briefly an influential idea that Rajesh mentioned in the first lecture, one that might explain why cortical receptive fields have the shape that they do. Many years ago, Horace Barlow proposed that, because spikes are expensive, neural systems should be trying to encode stimuli as efficiently as possible. What does this mean for a population of neurons? If you consider the joint distribution of the responses of many neurons, here let's just take two, maximizing their entropy should imply that they code independently. That is, their joint distribution should factor into the product of the two marginal distributions.
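This factorization claim can be checked numerically: for any joint distribution over two responses, the joint entropy is at most the sum of the marginal entropies, and a factorized distribution attains that maximum. A small sketch:

```python
import numpy as np

def entropy_bits(p):
    """Entropy, in bits, of a discrete distribution given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(4)
joint = rng.random((4, 4))
joint /= joint.sum()                        # a generic (correlated) joint p(x, y)
px, py = joint.sum(axis=1), joint.sum(axis=0)

# The correlated joint falls short of the marginal-entropy sum...
print(entropy_bits(joint.ravel()), entropy_bits(px) + entropy_bits(py))
# ...while the factorized distribution p(x)p(y) attains it exactly.
independent = np.outer(px, py)
print(entropy_bits(independent.ravel()), entropy_bits(px) + entropy_bits(py))
```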

This is a strategy that would maximize their entropy. Why is that? Because the entropy of a joint distribution is always less than or equal to the sum of the entropies of the marginals. This idea is known as redundancy reduction: the neural system should be optimized to perform as independently as possible. However, in recent years it's been realized that correlations between neurons can have some advantages. For one, having many neurons that encode the same thing may allow for error correction and more robust coding. It's also been realized that correlations can actually help discrimination, and indeed, neurons in the retina have been observed to be redundant; that is, their joint distribution is very different from the product of independent distributions. More recently, Barlow proposed a new idea: that neural populations should be as sparse as possible. That is, their coding properties should be organized so that as few neurons as possible are firing at any time.

Â 14:38

This idea was developed formally by Olshausen and Field, and also by Bell and Sejnowski. Here's the idea. Let's say that one can write down a set of basis functions, phi i, with which to reconstruct a natural scene. Then any image can be expressed as a weighted sum, with coefficients a i, over these basis functions, with perhaps the addition of some noise. Now, these basis functions should be chosen so that, in general, as few coefficients a i as possible are needed to represent an image. This is carried out by minimizing a function that includes the reconstruction error, here the root mean squared difference between the reconstructed image and the image itself, so that one gets a good match to the images, but that also includes a cost term whose role is to count how many coefficients are needed. One simple choice of this cost function is just the absolute value of these coefficients.
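As a minimal sketch of this objective (with a random, hypothetical basis rather than one learned from natural images), one can infer the coefficients a i for a single image by iterative soft-thresholding (ISTA), a standard solver for this kind of L1-penalized reconstruction problem:

```python
import numpy as np

rng = np.random.default_rng(5)
n_pix, n_basis = 64, 128
Phi = rng.standard_normal((n_pix, n_basis))
Phi /= np.linalg.norm(Phi, axis=0)               # unit-norm basis functions phi_i

# A toy "image" built from just 5 active basis functions, plus a little noise.
a_true = np.zeros(n_basis)
a_true[:5] = 3.0 * rng.standard_normal(5)
image = Phi @ a_true + 0.01 * rng.standard_normal(n_pix)

# Minimize (1/2)||image - Phi a||^2 + lam * sum_i |a_i| by ISTA.
lam = 0.1
step = 1.0 / np.linalg.norm(Phi, 2) ** 2         # step size below the Lipschitz bound
a = np.zeros(n_basis)
for _ in range(500):
    z = a - step * (Phi.T @ (Phi @ a - image))   # gradient step on reconstruction error
    a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold

print(np.sum(np.abs(a) > 1e-3), np.linalg.norm(image - Phi @ a))
```

The absolute-value penalty drives most coefficients exactly to zero, so the image is represented by only a handful of active basis functions, which is the sparseness the lecture is after.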

The coefficient lambda weights the strength of that constraint. The job of this term is to penalize solutions that require too many basis functions to represent an image, that is, too many coefficients a i that are different from zero.

A Fourier basis, for instance, represents images as a sum of sines and cosines. While the Fourier basis is guaranteed to be able to represent any image, one might already be able to guess that coding with such a basis is not sparse: as you probably recall, the power spectrum is broad, which means that many coefficients are needed. When one runs an algorithm to find the best basis functions, the best values of phi i, for natural images, one finds instead a set of functions that look like this: localized, oriented features, like those that we see in V1. So this implies that when we view an image using neuronal receptive fields that look like this, it excites on average a minimal number of neurons.

This is called a sparse code. So we've touched upon several different ideas about coding principles. The idea of coding efficiency: that neural codes should represent input stimuli as efficiently as possible. We've seen that this implies adaptation to stimulus statistics; as one changes the statistics of the stimulus, one should see aspects of the coding model changing to ensure that it remains efficient. We've also brought up the idea of sparseness: that it would be ideal if the neural code needed as few neurons as possible to represent its input.

And this brings us to the end of our discussion of coding. I've shown you some classic and state-of-the-art methods for predicting how stimuli are encoded in spikes. We've seen models for decoding stimuli from neural responses. We've discussed information theory and how it's used to evaluate coding schemes, and we've taken a very quick glance at how coding strategies might be shaped by the statistics of natural inputs.

There's a lot that we've missed. In particular, let's just go through the typical cycle of behavior of an organism. Where we have invested some time is the idea that, starting from complex environments, animals extract some features from that environment to solve problems, and that's represented in neural activity. What the brain is then doing is extracting that information and synthesizing it to drive decisions. We talked about some examples of using maximum likelihood methods that might in fact have a neural implementation. These decisions then generate motor activity, which drives behavior: muscles work together to perform actions that produce behavioral output. And these actions affect subsequent sensation.

So we didn't really address any of this part of the behavioral feedback loop. Next week, we'll be moving on to a new topic. Rather than handling data analysis, we'll be moving more into the realm of modeling. We'll start with a brief introduction to the biophysics of coding: how do single neurons generate action potentials? We'll talk about neuronal excitability, and we'll close with some simplified models that capture neuronal firing, before moving on to the second part of the course, where you'll be learning about network modeling. So that's all for this week. Looking forward to seeing you next week.
