An introduction to the statistics behind the most popular genomic data science projects. This is the sixth course in the Genomic Big Data Science Specialization from Johns Hopkins University.

From the course by Johns Hopkins University

Statistics for Genomic Data Science

From the lesson

Module 1

This course is structured to hit the key conceptual ideas of normalization, exploratory analysis, linear modeling, testing, and multiple testing that arise over and over in genomic studies.

- Jeff Leek, PhD, Associate Professor, Biostatistics

Bloomberg School of Public Health

So this lecture is about what statistics is. This class is about statistical

genomics, so I thought I'd give you just a bit of an overview of

my own personal view of statistics, and in particular of statistics for genomics.

So for me, statistics is the science of getting generalizable knowledge

out of a set of data.

So that's a quote that I made up just now, but it's a fairly typical view

of statistics from someone who's working in the field.

So there are a few different things that statisticians and

people who do statistics in genomics do.

One is study design: trying to decide how many people to sample or

how many organisms to sample,

which parts of those organisms to sample,

and what to measure: should we genotype them, should we measure their gene

expression, and so forth. It also covers calculating power and those sorts of things.

That's typically one thing that statistics does for genetics.

Another thing that statistics is involved in is data visualization and

exploration.

So here I'm showing you a heat map of correlations, and

then two variables, gene 74 and gene 77, and how correlated they are.

So this sort of investigation, plotting and

exploring a set of data, is something that statisticians

would do as well.
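
The correlation between two genes, like the gene 74 and gene 77 comparison mentioned in the lecture, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code used to make the lecture's plots. The expression values are made up, and `pearson_correlation` is a hypothetical helper name.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up expression values for two genes across five samples (not real data).
gene_74 = [2.1, 3.4, 1.8, 4.0, 2.9]
gene_77 = [1.9, 3.1, 2.0, 3.8, 3.0]

r = pearson_correlation(gene_74, gene_77)
print(f"correlation between gene 74 and gene 77: {r:.2f}")
```

A correlation heat map is this same calculation repeated for every pair of genes, with the results arranged in a grid and colored by value.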

The other thing they would do is help to pre-process and normalize data.

So we talked a little about the pipeline from raw data to processed data, and

so typically when you do that, to get from raw data to processed data you have

to perform statistical calculations or computations on the data to make it

more comparable across people or to remove sources of bias.

So that sort of statistical preprocessing is something that statisticians do.
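
As a rough sketch of what "making data more comparable" can mean, the example below log-transforms each sample and subtracts that sample's median. This is only one simple choice among many normalization approaches, and it is an assumption for illustration rather than the method the course teaches. The counts are made up, and `log2_median_center` is a hypothetical helper name.

```python
import math
import statistics

def log2_median_center(sample_values):
    """Log2-transform one sample (pseudocount of 1), then subtract the sample median."""
    logged = [math.log2(v + 1) for v in sample_values]
    center = statistics.median(logged)
    return [v - center for v in logged]

# Made-up raw counts for four genes in two samples (not real data).
# sample_b has roughly twice the counts of sample_a for every gene,
# the kind of sample-wide shift that normalization tries to remove.
raw = {
    "sample_a": [10, 200, 35, 80],
    "sample_b": [20, 400, 70, 160],
}

normalized = {name: log2_median_center(values) for name, values in raw.items()}
for name, values in normalized.items():
    print(name, [round(v, 2) for v in values])
```

After centering, both samples have a median of zero, so a sample-wide shift no longer dominates comparisons between them.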

Another thing that statistics does is statistical inference.

And this is probably the thing that everybody knows about statistics, so

if you think about the t-test, or about calculating a

standard deviation or estimates of error or anything like that,

those sorts of things are all tied up in statistical inference.

It's basically if we have a small set of samples, how do we say something about

the big population when we're uncertain about what we're saying.
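
To make the t-test idea concrete, here is a minimal sketch of a two-sample t-statistic (the Welch form, which does not assume equal variances) using only the Python standard library. The measurements are made up, and `welch_t_statistic` is a hypothetical helper name. Turning the statistic into a p-value would additionally require the t distribution, which is left out here.

```python
import math
import statistics

def welch_t_statistic(group_1, group_2):
    """Difference in sample means divided by its estimated standard error."""
    mean_1, mean_2 = statistics.mean(group_1), statistics.mean(group_2)
    # statistics.variance is the sample variance (divides by n - 1).
    var_1, var_2 = statistics.variance(group_1), statistics.variance(group_2)
    standard_error = math.sqrt(var_1 / len(group_1) + var_2 / len(group_2))
    return (mean_1 - mean_2) / standard_error

# Made-up expression values for one gene in two small groups (not real data).
control = [5.1, 4.8, 5.3, 5.0]
treated = [6.2, 6.0, 6.5, 5.9]

t = welch_t_statistic(control, treated)
print(f"t-statistic (control minus treated): {t:.2f}")
```

The standard error in the denominator is the "estimate of error" from the lecture: it measures how uncertain the difference in means is, given that only a small set of samples was observed.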

That's the most common thing that people think of, but

it's only one part of what statistics does in genomics.

The last thing, and

I think maybe this is the one that is most undervalued but critically important,

is that statistics is about communicating the results of an analysis to

a broader community.

And so here I’m showing you an example of a compiled R Markdown document;

we’ll talk about that later.

It’s basically how you do a reproducible analysis and how you intermix the text

describing the models you fit with the models themselves.

And so, how do you make people understand what you’ve done:

what computations,

what statistical calculations, and what inferences you've made?

And so I think statistical communication is a critical part of what

statisticians do.
