0:03

We'll finish up this week's material by considering a final couple of model upgrades. Let's go back and stare at this data again. There are a couple of issues that we haven't yet addressed. One is that we modeled only a time-varying firing rate, and of course this data is in the form of spike times. In what sense the precise patterns of these spike trains might be meaningful is something that we'll return to in a couple of weeks. But in this section we'll address directly the hidden assumptions of models like the ones we've been developing about the relationship between that time-varying firing rate, r(t), and the occurrence of single spikes. And we'll try to deal with the fact that there does appear to be some fine structure, here maybe, in the spike trains that a smooth function r(t) can miss.

0:46

But first we'll talk about the fact that this data was produced by showing the retina a natural movie, not white noise, which was the stimulus that we used in our previous discussion. In real life, neurons aren't living in a world of white noise, and it turns out that the statistics of the stimulus that you use to fit a model do affect the model that you arrive at. We chose to use white noise rather than some more natural stimulus because no matter how you filter it, it's always Gaussian, which means that there's no special structure, no special directions, in the stimulus set itself.

1:17

Since it's already come up and will be coming up again, let me just remind you what a Gaussian function is. It's defined as some coefficient multiplied by an exponential factor: p(x) = A exp(-(x - x0)^2 / (2 sigma^2)). Here, x0 is the center of this function, and sigma is a measure of its width.

1:41

So if we think of this function p(x) as a Gaussian probability distribution over x, then its mean, x-bar, the average of x, is x0.

1:55

And its variance, defined as the average of (x - x-bar)^2, is equal to sigma^2. So the standard deviation is just the square root of that, which is sigma.
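To make this concrete, here's a small numerical check of those properties; the center and width values below are arbitrary illustrative choices:

```python
import numpy as np

# Gaussian p(x) = A exp(-(x - x0)^2 / (2 sigma^2)), with A chosen to normalize it.
x0, sigma = 2.0, 0.5                      # illustrative center and width
x = np.linspace(x0 - 8 * sigma, x0 + 8 * sigma, 200_001)
dx = x[1] - x[0]
A = 1.0 / (sigma * np.sqrt(2 * np.pi))
p = A * np.exp(-((x - x0) ** 2) / (2 * sigma**2))

area = p.sum() * dx                       # total probability: ~1
mean = (x * p).sum() * dx                 # ~x0
var = ((x - mean) ** 2 * p).sum() * dx    # ~sigma^2
print(area, mean, var)
```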

2:11

Now, if you add together two or more Gaussian random numbers, the new random number also has a Gaussian distribution, and that's just what you're doing when filtering: taking linear combinations of the values of the white noise at different time points. So with white noise, when we're using geometrical techniques like PCA, we're making sure that we have a stimulus that's as symmetric as possible with respect to those coordinate transformations that filtering gives us. There are no special stimulus dimensions built into the prior, into the stimulus ensemble itself.
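A quick numerical sketch of that closure property, with arbitrary filter taps standing in for a real stimulus filter:

```python
import numpy as np

rng = np.random.default_rng(0)
white = rng.standard_normal(200_000)     # discrete-time white Gaussian noise

# A linear filter: each output is a weighted sum of recent noise values.
# These taps are arbitrary illustrative choices.
taps = np.array([0.5, 0.3, 0.2, -0.1])
filtered = np.convolve(white, taps, mode="valid")

# A linear combination of independent unit-variance Gaussians is Gaussian,
# with mean 0 and variance equal to the sum of squared weights.
print(filtered.mean(), filtered.var(), (taps**2).sum())
```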

2:46

Let's go back to the question that we posed last time. When have we found a good feature? When have we identified a good filter, f? We answered that by looking for a response function with respect to that stimulus component f, an input-output curve, that is interesting or has some structure.

3:04

So recall, I showed you these two cases. In this one, here is the Gaussian prior, the distribution of the filtered stimulus.

3:15

And here is the conditional distribution, over those values of the filtered stimulus that are conditioned on the arrival time of a spike. In this case, those two distributions are very similar, so when we take their ratio to compute the input-output function, we get just a flat curve.

3:40

So, instead of taking the average or doing PCA to find that filter, could we just go directly to these quantities, to the prior and the conditional distribution, and ask: can I find an f, a choice of f, such that when I project the stimulus onto it, the conditional distribution and the prior are as different as possible? So what would it mean to be as different as possible? There's a standard measure that we use for evaluating the difference between two probability distributions, called the Kullback-Leibler divergence. Here is the definition of the Kullback-Leibler divergence, DKL, between two distributions P(s) and Q(s). It's given by integrating over the random variable, in this case s: P(s) multiplied by the logarithm of the ratio of the two distributions, DKL(P, Q) = integral ds P(s) log[P(s)/Q(s)].
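For discrete distributions the integral becomes a sum, and we can compute DKL directly; the two example distributions below are made up for illustration:

```python
import numpy as np

def dkl(p, q):
    """Discrete Kullback-Leibler divergence D_KL(P || Q), in bits."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

prior = np.array([0.25, 0.25, 0.25, 0.25])   # broad prior over 4 stimulus bins
cond  = np.array([0.05, 0.05, 0.10, 0.80])   # concentrated spike-conditional dist.

print(dkl(prior, prior))   # identical distributions give 0
print(dkl(cond, prior))    # different distributions give a positive value
```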

4:38

So what do we get if we use this DKL between the prior and the spike-conditional distribution as a measure of the success of the choice of f, and just try to find an f that maximizes this quantity directly?

5:00

So now I'm taking some arbitrary stimulus distribution, and here I've drawn it in a, you know, pseudo-high-dimensional space. P(s) is the distribution of all possible stimuli. We're going to take some filter, again a vector in this high-dimensional space, that's f1, and project all of the stimuli onto it to compute the prior, here in gray. And now we'll project the spike-triggering stimuli, which we've pictured in yellow, to compute the spike-conditional distribution, here in yellow. And now one can vary f around. Right, so we can take different directions of this f, repeat this procedure, and compute the DKL between the prior and the spike-conditional distribution. Here's another example with a different choice of f, f2.

5:47

In that case our prior has a slightly different shape, because the stimulus distribution has a different shape in that direction, and the spike-conditional distribution also has a different shape. You can see that these two distributions are much more similar than those two are, and so we would prefer f1 as a better choice of filter than f2. So one can move around in this space, keep evaluating these two distributions, and search for an f that maximizes the difference between them.
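Here's a toy sketch of that search in two dimensions, assuming a made-up model neuron that spikes when the stimulus projection onto a hidden direction exceeds a threshold; the hidden 60-degree direction, the threshold, and the binning are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

true_theta = np.deg2rad(60.0)
u = np.array([np.cos(true_theta), np.sin(true_theta)])   # hidden direction

stim = rng.standard_normal((50_000, 2))    # Gaussian stimulus ensemble
spiking = stim[stim @ u > 1.0]             # spike-triggering stimuli

bins = np.linspace(-4, 4, 25)

def dkl_along(angle_deg):
    """D_KL between spike-conditional and prior projections onto f(angle)."""
    f = np.array([np.cos(np.deg2rad(angle_deg)), np.sin(np.deg2rad(angle_deg))])
    p, _ = np.histogram(spiking @ f, bins=bins)
    q, _ = np.histogram(stim @ f, bins=bins)
    p, q = p + 1e-9, q + 1e-9              # avoid log(0) in empty bins
    p, q = p / p.sum(), q / q.sum()
    return np.sum(p * np.log(p / q))

# Sweep candidate filter directions and keep the most informative one.
angles = np.arange(0, 180, 15)
best = angles[np.argmax([dkl_along(a) for a in angles])]
print(best)    # should recover the hidden 60-degree direction
```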

6:21

Now, this turns out to be equivalent to maximizing the mutual information between the spike and the stimulus. So we're trying to find a stimulus component that is as informative as possible: observing a spike pins down our estimate of the stimulus much better for the f1 component, in this case, than it does for the f2 component.

6:45

So notice that the stimulus here is no longer Gaussian, as we mentioned. It's no longer a nice, symmetric ball, and I've drawn it like that because there's nothing about this technique that demands that our stimulus be white noise. Since this is a stimulus with some arbitrary distribution, both the prior and the spike-conditional distributions vary with the direction of f, but that is okay.

7:06

The fact that this method can be applied to arbitrary inputs means that it has been used to derive models from natural stimuli. One can then take this to the next step and compute the input-output function from the ratio of the conditional distribution and the prior. So it's a powerful technique, and it generalizes to complex stimuli.

7:40

So to summarize: we saw how to build a model with a single filter by taking the spike-triggered average. We saw that we could generalize that to multiple filters using PCA. And finally, we introduced an information-theoretic method that uses the whole distribution of stimuli to compute an optimal filter, and this last method removed the requirement for Gaussian stimuli.

8:16

So, to go from r(t) to spikes, the assumption that we'll be making is that every spike is generated independently, with a probability that scales with that time-varying r(t). What does this mean, and how can we test it?

8:32

Let's start from the most elementary random process, the flip of a coin. It has probability 1/2 of landing heads and probability 1/2 of landing tails. Now let's take a biased coin: it has only some small probability, p, of landing heads up, and that's when the system spikes.

8:50

So now we can think of the arrival times of spikes as obeying something as simple as that. We have some time, T. We divide it into many time bins of size delta t; let's say there are n of them, so n = T / delta t.

9:16

Now we'd like to know: how many spikes will occur in the total time T? This is, of course, a random number; it will vary on every trial. This random number has what's called a binomial distribution, binomial meaning two-valued, and those two values have the probability of firing, p, and the probability of not firing, 1 - p.

9:45

How do we compute this? All we need to do is count. What's the probability that there's a spike in exactly k bins? It's the probability, bin by bin, that a spike occurred, so we need p to the power k.

Â 9:59

And then we need the probability that a spike didn't occur in the remaining bins, so 1 - p, raised to the number of bins in which a spike didn't happen, which is n - k. And we don't really care which of the bins the k spikes occurred in, so we need to count up the number of different ways that we could arrange those k spikes among the n bins. That's a quantity often called n choose k, and we can write it as n! / (k!(n - k)!), where the factorial, to give an example, three factorial, is three times two times one. So n factorial is n times (n - 1) times (n - 2), all the way down to one. Putting it together, the probability of exactly k spikes in n bins is P(k) = [n! / (k!(n - k)!)] p^k (1 - p)^(n - k).
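That counting argument can be checked directly in code; the bin count and per-bin probability below are illustrative:

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k spikes in n bins, each spiking independently with prob p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.02          # illustrative: 100 bins, 2% spike chance per bin
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(probs))         # the outcomes k = 0..n are exhaustive, so this is ~1
```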

11:11

Now, in the limit that there are many time bins and the probability of a spike in any bin becomes very small, one can show that the binomial distribution has a limit of the following form. We go from the distribution that we just arrived at, in the limit of very small time bins, and we define a parameter r, which is the probability in a time bin divided by the size of the time bin. The probability for a given time bin becomes very small as the bin size becomes very small, so what we want is a parameter r = p / delta t that stays finite as the time bin gets very small. That's the rate, or probability per unit time.

11:54

So now r becomes the parameter in this distribution. One can start with the binomial distribution above, do some calculations, and end up with an expression like this: P_T(k) = (rT)^k e^(-rT) / k!. Some of you might like to try that for yourself, or perhaps look it up on Wikipedia.
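One can also check the limit numerically: hold the rate r and the total time T fixed, shrink the bins, and watch the binomial probabilities approach the Poisson ones. The rate and duration here are illustrative:

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, rate, T):
    """Poisson limit: P_T(k) = (rT)^k e^(-rT) / k!"""
    return (rate * T) ** k * math.exp(-rate * T) / math.factorial(k)

rate, T = 20.0, 1.0               # illustrative rate (spikes/s) and duration
gaps = []
for n in (100, 1000, 10_000):     # number of bins, so delta t = T/n
    p = rate * T / n              # per-bin spike probability, r * delta t
    gaps.append(max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, rate, T))
                    for k in range(40)))
print(gaps)                       # the discrepancy shrinks as bins get smaller
```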

12:12

This new distribution is called the Poisson distribution. I've subscripted it now not by the number of bins but by the total time, T, since we again assume that we've taken the limit where delta t becomes very small.

12:26

So what are the properties of the Poisson distribution? It has a mean of r times T, which hopefully feels intuitive: the number of spikes is the rate times the total time. Slightly less intuitively, it has a variance that's also given by r times T. You might notice that that's the same as the mean. That is a very unusual property, and because of it, a quantity called the Fano factor, which is the ratio of the variance to the mean, has become a way to test whether a distribution is Poisson or not. If it has a value of one, then it's consistent with Poisson.

13:01

Finally, if spikes have been generated through a Poisson process, which fundamentally expresses the idea we started from, that they're generated in every time bin, delta t, as though they were independent with probability r times delta t,

13:16

then they'll also have the property that the intervals between successive spikes have an exponential distribution. You can get some intuition for why this is by considering the distribution above, evaluated for just one spike, as a function now of the time t: you'll see the appearance of the exponential, and the factorial goes away. Comparing the two, the interval distribution, p(t) = r e^(-rt), doesn't have the factor of t out front, because it has to be normalized over all time while the expression above doesn't.

13:46

Now, the probability of seeing some number of spikes, k, in a chunk of time, T, depends on the firing rate in this way; this is the Poisson distribution.

13:56

So these are two strong characteristics of a Poisson process: first, that the Fano factor is 1, and second, that the interval distribution should look like an exponential distribution of times.
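Both characteristics can be checked on a simulated homogeneous Poisson train, built exactly by the biased-coin recipe above; the rate, bin size, and duration are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

rate, dt, T = 30.0, 1e-3, 1000.0                 # illustrative rate, bin, duration
spikes = rng.random(int(T / dt)) < rate * dt     # one biased coin flip per bin
spike_times = np.flatnonzero(spikes) * dt

# Characteristic 1: Fano factor (variance/mean of counts in windows) should be ~1.
counts = spikes.reshape(-1, int(1.0 / dt)).sum(axis=1)   # spike counts per 1 s
fano = counts.var() / counts.mean()

# Characteristic 2: inter-spike intervals should be ~exponential with mean 1/rate.
isis = np.diff(spike_times)
print(fano, isis.mean())
```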

14:10

So here are some examples of the Poisson distribution for a few different choices of the firing rate. For a low firing rate, the distribution is almost exponential, whereas as the rate gets higher, the Poisson distribution looks more and more Gaussian. Now, in general the rate is varying as a function of time, so if we want to see whether this idea is reasonable by looking at data, we need to allow r, the rate, to vary in time. Here is data from a neuron in monkey MT cortex, which is sensitive to motion. The monkey is watching variable patterns drift across the screen, and we're going to look in more detail at this experiment next week. The same pattern is being shown over and over again.

15:08

Now, if you split the data up into little windows of time and plot the mean number of spikes in a time bin against the variance in that time bin, what would you expect to see? If in every bin the spikes are Poisson, but with a different rate, you could plot the mean against the variance. What would you expect? Remember that the slope of that plot would be the Fano factor, so we'd expect, if it were Poisson, a constant slope of about one. And in the data, you see that that is very close to being true. Here is the line of slope 1, and you see that the data lies very close to it. So even though the firing rate is changing in each short time chunk, the cell's response looks Poisson.

15:53

Where does this kind of variability come from? It's likely that while the neuron is receiving a mean input that's proportional to the stimulus, it's also receiving a barrage of background input. Remember that a cortical neuron gets inputs from around 10,000 other neurons. If that input is balanced, that is, if it varies around zero to be both positive and negative, it won't add much to the average firing rate, but it will jitter the spikes.

16:22

For example, here's the behavior of a neuron model that's driven by white noise. It also looks very close to Poisson, in the sense that the interspike interval distribution looks very close to exponential. I've emphasized that by plotting the number of intervals on a log scale against the interval itself, which, for an exponential distribution, should look like a straight line with a negative slope given by the firing rate.

17:00

At short intervals, the distribution stops looking exponential. This is for the very good reason that a neuron is unable to fire arbitrarily rapidly. There are biophysical processes that prevent a neuron from firing immediately after an action potential, and you see here that that causes a gap of maybe a minimum of 10 milliseconds, in this case, between successive spikes.

17:21

We're going to talk about those processes a few weeks from now. So we might want to improve our model yet more by taking these intrinsic limitations on firing seriously. This can be very helpful, as the intrinsic processes going on inside the neuron might add quite a bit of structure to the spike trains. For example, there may be some resonance such that the neuron likes to fire at a certain frequency, independent of the fluctuations of the stimulus.

17:48

So these intrinsic effects can be built into coding models. They're elaborations of the ones we've been looking at, called generalized linear models. Here the setup is very similar: the stimulus comes in, is filtered through some feature, and is processed through a nonlinearity. Here the nonlinearity is drawn as exponential; I'll talk about that in a minute. And then there's an explicit spike generation step, an explicit Poisson spike generation step.

18:16

If the random process generates a spike, then a so-called post-spike filter, drawn here, is injected back into the input that's going into the nonlinearity.

18:30

So, for example, if the system is refractory, what you'd want for this waveform is something that quickly moves you away from threshold and holds you away from it for some time: a big negative pulse that decays back over time.

18:57

The one that's drawn here, taken from this very nice paper, is a little more sophisticated. It first draws the neuron away from spiking, with a big initial dip, so it has the refractory property built in, but then it becomes positive, which is going to promote spiking at some time after the previous spike. So that could give a neuron that has a slight tendency to fire periodically, which is very nice. The spiking probability is now proportional to an exponential of the filtered stimulus, as before, plus the filtered spiking activity, as we've written out right here.
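Here's a minimal simulation sketch of that spike-generation loop. The stimulus drive, the operating point of the exponential nonlinearity, and the post-spike filter's amplitude and time constant are all made-up illustrative values, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

dt, n_steps = 1e-3, 100_000
# Stand-in for the filtered stimulus, in log-rate units (illustrative).
drive = 3.0 + 0.5 * rng.standard_normal(n_steps)

# Post-spike filter: a big negative pulse after each spike that decays away,
# giving the refractory behavior described above.
t_h = np.arange(1, 101) * dt
post_spike = -10.0 * np.exp(-t_h / 0.02)

spikes = np.zeros(n_steps, dtype=bool)
feedback = np.zeros(n_steps + post_spike.size)
for i in range(n_steps):
    rate = np.exp(drive[i] + feedback[i])          # conditional intensity, spikes/s
    if rng.random() < min(rate * dt, 1.0):         # Poisson spike generation
        spikes[i] = True
        feedback[i + 1 : i + 1 + post_spike.size] += post_spike

isis = np.diff(np.flatnonzero(spikes)) * dt
print(spikes.sum(), (isis < 0.01).mean())   # very few intervals shorter than 10 ms
```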

19:33

So why this exponential nonlinearity? In the models that I've shown before, we've allowed the nonlinearity to be something that we computed directly from the data, whereas here it's fixed. Liam Paninski showed that by fixing the nonlinearity to be exponential, or to be in the exponential family, you become able to find all the parameters of this model, all the values of these filters, using an optimization scheme that's globally convergent. So you've sacrificed some generality for a model that's more complete in another way: you get more power, in that you can add more filters, and the model is guaranteed to be solved reliably and repeatably. So if we're going on adding additional factors that can influence the spiking probability, why stop there?

As Emery Brown and colleagues pointed out, one can also include many other intrinsic and extrinsic factors. In this paper, the group included the influence not only of refractory effects but also of the firing of other neurons in the network, and applied this to the type of data that you saw from the retina. Including both self-firing, the output of the neuron itself, and the effects of the firing of other neurons allowed them to predict the spike patterns: they were able to capture the detailed spike interval patterns that we saw in the retinal data.

21:07

So I'll finish up with another beautiful idea from Emery Brown's group. We can use this Poisson nature of firing to test whether we have captured everything that we can about the inputs in our model. Let's say we have a model like the GLM, where the output depends on many influences: on the stimulus, on the history of firing in the neuron we're recording from, and on the history of firing in other neurons as well. Then we can take our output spike intervals and scale them by the firing rate that's predicted by the model. So we take these intervals, the times between successive spikes, and scale them by the firing rate that our model predicted, given all the interactions that we've incorporated.

21:49

If this predicted rate truly accounts for all the influences on the firing, even ones due to previous spiking, then these new scaled intervals should be distributed like a pure Poisson process with an effective rate of one, that is, as a single clean exponential. This is called the time-rescaling theorem, and it's used as a way to test how well one has done in capturing all the influences on spiking with one's model.
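A sketch of that test, using a simulated neuron whose true time-varying rate we know exactly; the rate function and all numbers are illustrative, and in a real analysis the rate would come from the fitted model:

```python
import numpy as np

rng = np.random.default_rng(4)

dt = 1e-4
t = np.arange(0, 200.0, dt)
rate = 20.0 + 15.0 * np.sin(2 * np.pi * 2.0 * t)   # known time-varying rate, spikes/s

spikes = rng.random(t.size) < rate * dt            # inhomogeneous Poisson spiking
spike_idx = np.flatnonzero(spikes)

# Rescale each inter-spike interval by the integrated rate across it.
cum = np.cumsum(rate) * dt                         # integral of r(t) up to each bin
rescaled = np.diff(cum[spike_idx])

# If the rate captures everything, rescaled intervals are exponential with mean 1.
frac_below_mean = (rescaled < 1.0).mean()          # for exp(1), ~1 - 1/e = 0.632
print(rescaled.mean(), frac_below_mean)
```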

22:18

So, we've reached the end of this stretch. We've looked at some classical and some more modern ways of thinking about what spikes represent and how one can predict them from data. I'd like to emphasize that these models and methods are a very powerful way of thinking about the neural code, but there is a lot that they ignore. These models, in particular, give the impression that neurons represent a particular thing, and that's it. In fact, neural responses are modulated by many other influences: by how the animal is using its body to deploy its senses, by what it expects to see in the environment, by the context in which the stimulus appears. We'll have a look at one example of such influences in a later lecture. But you should always keep in mind that while I'm trying to give you an overview of current approaches to understanding the brain, and while these methods have made huge progress in allowing us to make sense of a lot of data, even if under rather limited circumstances, it's likely that some of these ideas might be overturned completely by a much more general approach. So the field is still really wide open to new ideas and concepts that will provide a richer and more powerful understanding.

23:24

So to wrap up: I know this week has started to exercise some math muscles that might be rusty, so please refer to the supplementary materials online to see if there's anything that can help you, and do hit the forums. There are a lot of knowledgeable people among you, and it's great to see questions being answered and discussions developing there. And of course, our team is standing by, ready to pitch in and help as well. For next week, I hope you'll join us again as we start to learn how to use decoding to read minds. Back next week.
