We're looking at a couple of years of data here.

And then you have particulate matter levels down here at the bottom.

Now you can see there's not as much particulate matter data

as there is mortality data, because there's a lot of missing data.

So, essentially, you just want to ask: is this top series correlated with the bottom one?
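As a rough illustration of that question, you could compute a complete-case correlation between the two daily series, using only the days on which both are observed. This is a hypothetical sketch, not the actual analysis (which, as described below, uses regression models); the function name is my own:

```python
def pearson_complete_cases(x, y):
    """Pearson correlation using only the days on which both series were
    observed (None marks a missing measurement)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = sum((a - mx) ** 2 for a, _ in pairs) ** 0.5
    sy = sum((b - my) ** 2 for _, b in pairs) ** 0.5
    return cov / (sx * sy)
```

Dropping incomplete pairs mirrors the situation in the plot: the pollution series has many more missing days than the mortality series, so only a fraction of the days contribute.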

So the question is: can we encode everything that we found in the statistical and epidemiological research into a single package?

The answer is yes.

Time series studies like this don't have a huge range of variation.

And so they typically involve similar types of data.

You know, it might be hospitalization instead of mortality or whatnot, but it's often very similar.

And so can we create kind of

a deterministic statistical machine for this area?

So the basic pipeline looks like this; it's a very simple pipeline. This is not a very complicated analysis for the most part.

You want to check the data, see if there are any outliers or high-leverage points.

Pollution data is often skewed, so you want to check for that.

Look for overdispersion.
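Those two checks are straightforward to automate. Here is a minimal pure-Python sketch under my own naming: a sample skewness statistic for the pollution series (strongly positive suggests right skew), and a variance-to-mean ratio for the daily death counts (roughly 1 under a Poisson model, well above 1 suggests overdispersion):

```python
import statistics

def sample_skewness(x):
    """Adjusted Fisher-Pearson sample skewness; large positive values
    indicate a right-skewed distribution (common for pollution data)."""
    n = len(x)
    m = statistics.fmean(x)
    s = statistics.stdev(x)
    g1 = sum(((v - m) / s) ** 3 for v in x) / n
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)

def overdispersion_ratio(counts):
    """Variance-to-mean ratio of count data; approximately 1 for a Poisson
    model, substantially greater than 1 suggests overdispersion."""
    return statistics.variance(counts) / statistics.fmean(counts)
```

In practice you would run these on the pollution series and the mortality counts respectively, and let the results drive choices like log-transforming the exposure or using a quasi-Poisson model.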

Do you want to fill in the missing data? The answer is absolutely not.

There's been a lot of work on that.

It doesn't turn out well.

The big question here is really model selection. One of the things we have to worry about in these types of time series studies is what's called unmeasured confounding: there are a lot of things that vary over time that you don't measure.

So I guess this is like your batch effects.

And so there are various approaches to estimating how you adjust for these unmeasured confounders. We use semiparametric regression methods to do this.

Estimating the degrees of freedom has the most profound effect on any kind of association that you might estimate, so this is critical.

But, you know, there's lots of research on how to do this. There have been a couple of papers; one is mine.

Of the various approaches to estimating this number of degrees of freedom, you can settle on, you know, one or two that are better than others. So we can just implement those.
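To make the idea concrete, here is a hedged sketch of automated degrees-of-freedom selection. It is not the method from the papers mentioned: as a stand-in, it fits polynomial time trends of increasing df (where the real analysis would use smooth spline terms in a semiparametric model) and picks the df minimizing AIC under a Gaussian working model; all names are my own:

```python
import math
import numpy as np

def aic_for_df(time, log_rate, df):
    """Fit a polynomial time trend with `df` degrees of freedom and return
    its AIC under a Gaussian working model (a stand-in for the spline-based
    semiparametric fits used in practice)."""
    coeffs = np.polyfit(time, log_rate, deg=df)
    resid = log_rate - np.polyval(coeffs, time)
    n = len(time)
    sigma2 = float(np.mean(resid ** 2))
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    # df + 1 polynomial coefficients plus the error variance
    return 2 * (df + 2) - 2 * loglik

def select_df(time, log_rate, candidates=range(1, 8)):
    """Choose the smoothness (degrees of freedom) that minimizes AIC."""
    return min(candidates, key=lambda d: aic_for_df(time, log_rate, d))
```

The point of encoding a rule like this is that the smoothness choice, which drives the estimated association more than anything else, is made by a fixed, pre-specified criterion rather than by the analyst eyeballing each dataset.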

Other aspects of the model tend not to be that important.

Again, whether you adjust for temperature, weather, and other things, it kind of doesn't really matter how you do that.

There are other things that you're typically interested in: multiple-lag analysis and sensitivity analysis.

So you can select a model here, but you want to see: if you move the model back and forth a little bit, does your association change dramatically, right? Those are the typical things that you want to see in this kind of analysis.
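A sensitivity analysis like that is just a loop over modeling choices. In this hypothetical sketch, `fit_model` stands in for whatever refits the model and returns the estimated pollution coefficient; the grid varies the trend smoothness (df) and the exposure lag, and a small spread across the grid means the association is stable:

```python
def sensitivity_analysis(fit_model, df_grid, lag_grid):
    """Refit the model over a grid of smoothness values (df) and exposure
    lags, collecting the estimated pollution coefficient for each setting."""
    return {(df, lag): fit_model(df=df, lag=lag)
            for df in df_grid for lag in lag_grid}

def estimate_range(estimates):
    """Spread of the estimates across the grid; a small spread means the
    association is insensitive to these modeling choices."""
    vals = list(estimates.values())
    return max(vals) - min(vals)
```

Reporting the whole grid of estimates, rather than a single selected model, is exactly what a reviewer of one of these papers would ask to see.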

And when I review, you know, one time series paper a month, these are the things that I always ask for.

So. >> Roger.

>> Yeah.

>> Is there a 15-second response to why imputation is so bad in this setting?

>> Oh, because the data are missing systematically. The pollution data are very difficult to collect, so they typically measure it only once every six days.

So there are five days missing for every six: one observation for every six days.

So you can try to impute it, but you just add a lot of noise for very little gain.