0:01

Hello again. So now we are moving on to calculating information in spike trains. In this section of the lecture, we're going to be talking about two methods: one is how to compute information in spike patterns, and the other is how to compute information in single spikes.

So let's go back to our grandma's information recipe. Remember that we're calculating the mutual information, which is the difference between the total response entropy and the mean noise entropy. So what was the strategy? We're going to take a single stimulus S, repeat it many times to obtain the probability of the responses given S, and from that response distribution compute the noise entropy. We're going to repeat that for all s, and then average over s. Finally, we'll compute the probability of response, and from that the total response entropy.

So now, let's go ahead and compute information in spike patterns. So far we've really only dealt with single spikes or firing rates, so what we'd like to ask here is: what information is carried by patterns of spikes? By these interesting sequences of 0s and 1s that occur in the code. What this allows us to do is to analyze patterns of the code and to ask how informative they are.

So the way we're going to turn our spike train into a pattern code is that we're going to chop up segments of these responses. We take our voltage trace and divide it into time bins of size delta t. If there's a spike in that time bin, we'll put a one; if there's no spike, we'll put a zero. And now we'll chunk up these zeros and ones into words of some length, big T. So now that we've defined these binary words, with letter size delta t and length T, we can walk through our data. Here's a raster plot produced by a stimulus that was randomly chosen on every trial. And if one converts such a raster plot into sequences of zeros and ones, you can look through that and pull out many, many examples of these words, again of length T and letter size delta t.
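As a concrete sketch of this discretization (the function name and the example spike times below are my own, chosen purely for illustration), one might bin a list of spike times at resolution delta t and then chunk the resulting 0/1 letters into words:

```python
import numpy as np

def spikes_to_words(spike_times, dt=0.002, letters_per_word=8, t_end=None):
    """Bin a spike train into 0/1 letters of width dt, then chunk the
    letters into words of fixed length, so each word spans
    T = letters_per_word * dt seconds."""
    if t_end is None:
        t_end = max(spike_times) + dt
    n_bins = int(np.ceil(t_end / dt))
    letters = np.zeros(n_bins, dtype=int)
    # a letter is 1 if at least one spike falls in that bin
    idx = np.minimum((np.asarray(spike_times) / dt).astype(int), n_bins - 1)
    letters[idx] = 1
    n_words = n_bins // letters_per_word
    return letters[:n_words * letters_per_word].reshape(n_words, letters_per_word)

# three spikes in 32 ms -> two 8-letter words at dt = 2 ms
words = spikes_to_words([0.003, 0.0045, 0.021], dt=0.002, t_end=0.032)
print(words.tolist())  # [[0, 1, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0]]
```

Walking this over a long recording yields the many examples of words the lecture describes.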

Â 2:18

So now, one can form a distribution over these words. Here, the most common word was silence: there was no spike in this set of eight consecutive time bins. The next most common was that one spike appeared, and of course that one spike can appear at different locations throughout the word; these are the next most common set of words. Then one starts to get combinations of spikes occurring at different locations throughout the word. So now we can walk through our data, calculate these probabilities, and then calculate the entropy of that word distribution.
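The entropy of that empirical word distribution can be sketched in a few lines (this is the naive "plug-in" estimate; the helper name is my own):

```python
import numpy as np
from collections import Counter

def word_entropy(words):
    """Plug-in entropy estimate, in bits, of the empirical distribution
    over binary words (one word per row)."""
    counts = Counter(tuple(w) for w in words)
    p = np.array(list(counts.values()), dtype=float) / len(words)
    return -np.sum(p * np.log2(p))

# half the words are silence, the rest split between two one-spike words:
# H = -(1/2)log2(1/2) - 2*(1/4)log2(1/4) = 1.5 bits
sample = [(0, 0, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0)]
print(word_entropy(sample))  # 1.5
```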

Now the information is the difference between that entropy and the variability due to noise, averaged over stimuli. So that was our total entropy; here's how we're going to compute our noise entropy. In this case, the same stimulus was given every time, and one looks at what happens over many repetitions of that stimulus. On the first trial, you see a word: zero, zero, one, zero, zero, zero, zero. On the next trial, you have the same word, but now you see that there are some times when there was no spike, and some times when that spike appeared in a different bin. What that's going to do is generate a distribution of different words. Now that distribution is going to be considerably narrower than the total distribution. And it's exactly this reduction in entropy, from knowing nothing about the stimulus to knowing something about the stimulus, that the information will be capturing.

Alright, so let's go ahead and apply Grandma's recipe. We'll take a stimulus sequence and repeat it many times. But how are we sampling this probability of stimuli? We're going to use a bit of a trick, which is that instead of averaging over all possible stimuli, we're going to take a long random stimulus and average over time.

Â 4:23

So now, time is standing in for the average over stimuli. For each time t in the repeated stimulus, we're going to get a set of words, P of w given the stimulus at time t. And our average noise entropy is now going to be averaged over those different time points.
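Putting the two entropies together gives Grandma's recipe for a pattern code. A minimal sketch, with array shapes and function names of my own choosing (`words_random` would come from the non-repeated random stimulus, `words_repeated` from the frozen one):

```python
import numpy as np
from collections import Counter

def entropy_bits(words):
    """Plug-in entropy, in bits, of an empirical word distribution."""
    counts = Counter(tuple(w) for w in words)
    p = np.array(list(counts.values()), dtype=float) / len(words)
    return -np.sum(p * np.log2(p))

def word_information(words_random, words_repeated):
    """Mutual information between stimulus and word responses.
    words_random:   (n_words, L) words from a long random stimulus,
                    giving the total entropy of P(w).
    words_repeated: (n_trials, n_times, L) words observed at each time
                    point of a frozen repeated stimulus; the entropy of
                    P(w | s(t)) is averaged over time points, with time
                    standing in for the average over stimuli."""
    total = entropy_bits(words_random)
    n_times = words_repeated.shape[1]
    noise = np.mean([entropy_bits(words_repeated[:, t, :])
                     for t in range(n_times)])
    return total - noise

# toy check: responses perfectly locked to the frozen stimulus have zero
# noise entropy, so the information equals the total entropy (1 bit here)
rand = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
rep = np.array([[[1, 0], [0, 1]]] * 4)   # 4 identical trials, 2 time points
print(word_information(rand, rep))  # 1.0
```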

So if we choose the length of the repeated sequence to be long enough, that will allow us to sample the noise entropy adequately.

So let's have a look at the application of this idea to data from the LGN, in a classic paper by Pam Reinagel and Clay Reid. They carried out this exact procedure: as you saw before, they ran a random stimulus over many trials.

Then they ran a fixed stimulus, call it frozen white noise, which has some structure; in fact, here it is. It's the stimulus as a function of time, and you can see that in response to the stimulus, spikes appeared in a time-locked sequence. And when one averages across those repeats, one finds a PSTH, that is, a post-stimulus time histogram, where these events show large modulations in the time-varying firing rate produced in response to that stimulus. Now, if one zooms in on a tiny piece of these responses, you'll see something like this.

So at very fine time scales, there's quite a bit of jitter in those responses. Now, our goal in computing the information, and what the authors examined in this paper, was to ask: on what time scale do these responses continue to convey information about the stimulus? One can see by looking at this picture that there's quite a bit of variability in the spike train, and so that defines some kind of window within which a spike can jitter and still signal the same information about the input. So the question we'd like to understand is: how finely do we have to bin our spike train, and pay attention to the individual timings of spikes, in order to extract all that the neural code has to tell us about the stimulus?

One can do that by exploring the information produced by the spike train as a function of two parameters: delta t, the binning time width, and the length of the word. As the word gets longer, our coding symbol is able to capture more and more of the correlations in the input. And so, to what extent does increasing the word length continue to capture more and more information about the stimulus? Here's what the authors found in the LGN. They varied both delta t, the temporal resolution of their words, and the total word length L. Here, drawn as a function of 1 over L, is the information that they calculated for different choices of those parameters defining the word.

Â 7:22

So, clearly, there's going to be a problem in going to this limit of very large word lengths. As the word gets longer and longer, for a finite amount of data, you're going to have very few samples of a word of that length. And so when one tries to estimate the entropy of the distribution of words of this length, it's very unlikely that you will have seen them all. Not surprisingly, if you now look at the entropy plotted against one over the word length, the entropy drops off in this limit, indicating that the information is not completely sampled. So what can be done is to compute the entropy for different lengths of words, and you can see that these form almost a line. One can simply extrapolate the tendency of this line back toward infinite word length and extract an estimated value for the entropy in that limit. That's not what was done in this figure; this was purely the information directly captured.
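The extrapolation step just described can be sketched as a straight-line fit of entropy-rate estimates against 1 over the word length, reading off the intercept at 1/L = 0. The numbers below are invented purely to illustrate the undersampling droop at large L:

```python
import numpy as np

# hypothetical plug-in entropy-rate estimates (bits/s) for words of
# length L letters; the longest words are undersampled, which pulls
# their estimates down
L = np.array([2, 4, 6, 8, 10])
entropy_rate = np.array([102.0, 96.0, 93.5, 92.2, 91.4])

# fit entropy rate as a line in 1/L and extrapolate to 1/L -> 0,
# i.e. infinite word length
slope, intercept = np.polyfit(1.0 / L, entropy_rate, 1)
print(f"extrapolated entropy rate: {intercept:.1f} bits/s")
```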

And so one can look over different delta t's and different word lengths to see how the information depended on these parameters. What you should notice is that there is some limit to delta t beyond which the information doesn't grow anymore. As one looks at the words at higher and higher temporal resolution, one takes into account finer and finer details about how those spike patterns are generated, and that's what's being quantified as we move down this axis. As the time discretization of the word, the bin size, gets smaller and smaller, it's able to capture more and more of the variability in the spike train that's actually signaling something different about the stimulus. But at some point, it seems that that information stops increasing. So here we're at about between 80 and 100 bits per second for the information rate, and you see that it stops increasing with delta t at a delta t of about 2 milliseconds.

Hopefully you'll remember, from the jitter in the spike trains that we looked at, that they seemed to be repeatable on a time scale of about 1 or 2 milliseconds. So that time scale delta t corresponds to the time scale at which the jitter in the spike train still allows one to read it off as an encoding of the same stimulus. It quantifies approximately the temporal width at which one can discretize the spike train and still extract all the information about the stimulus that distinguishes it from other stimuli.

So in this example, we've seen one case where we didn't have enough data to be able to sample, say, very long words. In general this is always true: when one is trying to calculate information-theoretic quantities, one needs to know the full distribution of responses and the full distribution of stimuli. And there's simply never enough data to come up with really reliable estimates for information, unless one has very simple experimental setups. So a lot of effort has been put into finding ways to correct the sampled distributions for the fact that there is a finite amount of data, and there has been some very interesting work by a number of groups over the last 15 years or so that has made significant advances in being able to compute information-theoretic quantities from finite amounts of data.

Now we're going to turn to a different approach, this one proposed by [UNKNOWN] Brenner and [UNKNOWN].

How much does the observation of a single spike tell us about the stimulus? This is similar to the case that we started with at the beginning of this lecture, but now we're going to address the question that we noted then: what if we don't know exactly what it is about the stimulus that triggered the spike? It turns out that, as in the case we just went through, it is straightforward to compute information without any explicit knowledge of what exactly in the input is being encoded. This is because the mutual information gives us a way to quantify the relationship between input and output without needing to make any particular model of that relationship.

So the paradigm is exactly the same as before: we're going to compute the entropy of responses when the stimulus is random, and the entropy when given a specific stimulus. Here, things are a little simpler than in the case of words. Without knowing the stimulus, the probability that a single spike occurred is given by the average firing rate times the bin size; similarly, the probability of no spike is just 1 minus that. Now, the probability of a spike at a given time during the presentation of a stimulus is r of t times the time bin, where r of t is the time-varying rate caused by the changing stimulus. We can get an estimate of that time-varying rate by repeating the input over and over again. The variability in these responses means that these events show a continuous variation and have some width, as we saw before, depending on the jitter in the spike times.

So let's go ahead and compute the entropy. We're going to define, for the moment, p equal to r bar delta t, and p of t to be r of t delta t.

The information will simply be the difference between the total entropy, which we've already computed at the beginning of the lecture for this binomial case, minus p log p minus (1 minus p) log (1 minus p), and the noise entropy that we need to subtract from it. Now, the noise entropy takes on a value at every time t, depending on the time-varying firing rate. Again, every time t represents a sample of the stimulus S, and averaging over time is equivalent to averaging over the distribution of s. This ability to swap an average over the ensemble of stimuli for an average over time is known as ergodicity: the different values of S are visited in time with a frequency that's equivalent to their probability.

So now that we have our expression for the information between response and stimulus, we can do some manipulations on it. We replace p by r bar delta t. We can take the time average of the firing rate to be equal to the mean firing rate, so the integral over time of p of t goes, in the mean, toward that mean firing rate. Getting rid of some small terms (there are a couple of extra pieces here that turn out to be small), we end up with a rather neat expression for the information per spike.
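That neat expression, due to Brenner and colleagues, works out to I per spike = (1/T) integral dt (r(t)/r bar) log2 (r(t)/r bar). A minimal sketch of evaluating it from a binned PSTH (the function name is mine; note that delta t cancels, leaving a simple time average):

```python
import numpy as np

def info_per_spike(r):
    """Single-spike information, in bits per spike:
        I = (1/T) * sum_t dt * (r(t)/rbar) * log2(r(t)/rbar)
    where r is the trial-averaged time-varying rate (the PSTH) in bins
    of equal width and rbar is its time average. Since dt cancels, this
    is just the time average of the integrand; 0*log(0) is taken as 0."""
    r = np.asarray(r, dtype=float)
    ratio = r / r.mean()
    integrand = np.where(ratio > 0,
                         ratio * np.log2(np.maximum(ratio, 1e-300)),
                         0.0)
    return integrand.mean()

# a flat rate carries no single-spike information; a rate concentrated
# in one of four bins gives log2(4) = 2 bits per spike
print(info_per_spike([10, 10, 10, 10]))  # 0.0
print(info_per_spike([0, 0, 0, 40]))     # 2.0
```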

Let's take a closer look at this expression. As we've emphasized already, this method of computing information has no explicit stimulus dependence, meaning no need for any explicit encoding or decoding model. It relies on the repeated part of the stimulus being a good sample of the distribution of possible stimuli. Note also that although we computed this for the arrival or not of a single spike, this formalism could be applied to the rate of any event, for example the occurrence of a specific symbol in the code. So this is a way to evaluate how much information might be conveyed by a particular pattern of spikes, for example a certain interspike interval. We can also examine what determines the amount of information in the spike train.

Looking again at this expression, we can see that it's going to be determined by two things. One is timing precision, which is going to blur this function r of t. If events are blurred, so that r of t increases and decreases slowly without reaching large values, this will reduce the information. At the extreme, let's imagine that the response is barely modulated at all by this particular stimulus. In that case, r of t goes toward the average firing rate, and one gets no information. The more sharply and strongly modulated r of t is, the more information it contains. The other factor is the mean firing rate. If the spike rate is very low, then the average firing rate is small, and the information per spike is likely to be large. The intuition is that a low firing rate signifies that the neuron responds to a very small number of possible stimuli, so that when it does spike, it's extremely informative about the stimulus. Note that this is the information per spike; the information transmitted as a function of time, the information rate, is going to be small for such a neuron.

So let's look at some hypothetical examples. Rat hippocampal neurons have what's known as a place field, such that when the rat runs through that region in space, the cell fires. Let's imagine the place field looks like this. As the rat runs around, it's going to pass through that place field, and what's the firing rate going to look like? As it moves through the field, the rate is going to go from zero, ramp up kind of slowly, and go down again. Because that place field is quite large, the rat is likely to pass through it fairly often. So we're going to get some r of t of that form. Now let's imagine that the place field is very small. The rat runs around and very, very rarely passes through that place field, so now we're going to get almost no firing, and then some blip of firing as it passes through the field. Now, what if the edges of the place field are very sharp? Again, the rat runs around and passes through that place field very rarely, but when it does, the firing rate increases very sharply toward its maximum. And that's going to increase the information we get from such a receptive field.
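These three hypothetical place fields can be compared directly with the information-per-spike expression. In this sketch, all field shapes and widths are invented for illustration; the rare, sharp-edged field yields the most bits per spike:

```python
import numpy as np

def info_per_spike(r):
    """(1/T) * sum_t dt (r/rbar) log2(r/rbar), in bits per spike;
    dt cancels, so this is a time average. 0*log(0) is taken as 0."""
    r = np.asarray(r, dtype=float)
    ratio = r / r.mean()
    return np.mean(np.where(ratio > 0,
                            ratio * np.log2(np.maximum(ratio, 1e-300)),
                            0.0))

t = np.linspace(0.0, 1.0, 1000)
broad  = np.exp(-((t - 0.5) / 0.20) ** 2)        # large field: slow ramp
narrow = np.exp(-((t - 0.5) / 0.02) ** 2)        # small field: rare blip
sharp  = (np.abs(t - 0.5) < 0.02).astype(float)  # small field, sharp edges

for name, r in [("broad", broad), ("narrow", narrow), ("sharp", sharp)]:
    print(f"{name}: {info_per_spike(r):.2f} bits/spike")
```

The printed values increase from the broad field to the narrow one to the sharp-edged one, matching the intuition in the lecture.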

Okay, so now we're done with computing information in spike trains. Next up, we'll be talking about information and coding efficiency. We'll be looking at natural stimuli: what are the challenges posed to our nervous systems by natural stimuli? What do information-theoretic concepts suggest that neural systems should do when they encode such stimuli? And finally, what principles seem to be at work in shaping the neural code?
