So the basic issue here is that when you read a research article in the literature, for the most part what you get is the article, and you as the reader get nothing else. But of course everyone knows that behind the scenes a lot went into this article, and that's what I call here the research pipeline. You can see here that the author is going from left to right along this research pipeline, and you, the reader, are going from right to left. So you read the article, and you want to know more about what happened: where was the data, what was used here? And so the basic idea behind reproducibility is to focus on this analytic data and this computational results box here. With reproducibility, we try to allow the author and the reader to meet in the middle, so to speak.
There's been a lot of recent discussion of reproducible research in the media and in the scientific literature. The journal Science had a special issue on reproducibility and data replication. There was a recent feature on the TV show 60 Minutes looking at an incident at Duke University where many results were found to be not reproducible. This led to a large and controversial series of events, where clinical trials had to be stopped and a lot of investigation, which is still ongoing, had to occur.
Finally, the Institute of Medicine, in response to a lot of recent events involving the reproducibility of scientific studies, issued a report laying out best practices, or things that should be done, to promote and encourage reproducibility, particularly in what's called omics-based research, such as genomics, proteomics, and other similar areas. So this was a very important report.
One of the many recommendations they had was that things like data and metadata be made available, that computer code be fully specified so that people can examine it to see what happened, and that all the steps of the computational analysis be described so that people can study it and reproduce it.
So, what do we need for reproducible research? What is the definition? I think there's a variety of ways to talk about this, but one basic definition that we've come up with is that the analytic data be available, meaning the data that were used for the analysis that was presented. This is different from the raw data, because very often in a given analysis the raw data are not all used; rather, some subset is used. Now, it may be interesting to see the raw data, but it may be impractical to actually have it. So the analytic data is key to examining the data analysis. The analytic code is also important. This is the code that was applied to the analytic data and produced the key results. It may be regression modeling code, or it may be other types of things. Documentation of that code and the data is very important, of course. And finally, there should be some standard means of distribution, so that all this data and code is easily accessible.
It's important to realize that there are multiple players when you talk about reproducibility, so there are different types of parties that have different types of interests. Roughly speaking, there are authors who produce research and want to make their research reproducible, and they need tools to make their lives easier. And there are also readers who are reading research and want to reproduce that work, and they also need tools to make their lives easier.
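
To make the four pieces of that definition a bit more concrete (analytic data, analytic code, documentation, and some standard means of distribution), here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a tool from the lecture: the folder names `analytic_data`, `analytic_code`, and `documentation`, the example files, and the helper functions are all hypothetical. The idea is that an author records a checksum for every file in the bundle, and a reader can later check that what they downloaded matches.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(bundle_dir: Path) -> dict:
    """List every file in the bundle with a checksum, grouped by role.

    The role names and folder layout are hypothetical conventions
    chosen for this sketch, not a standard.
    """
    roles = ("analytic_data", "analytic_code", "documentation")
    manifest = {}
    for role in roles:
        files = sorted((bundle_dir / role).glob("*"))
        manifest[role] = {f.name: sha256_of(f) for f in files if f.is_file()}
    return manifest


def verify_manifest(bundle_dir: Path, manifest: dict) -> list:
    """Return the files that are missing or no longer match the manifest."""
    mismatches = []
    for role, files in manifest.items():
        for name, digest in files.items():
            path = bundle_dir / role / name
            if not path.is_file() or sha256_of(path) != digest:
                mismatches.append(f"{role}/{name}")
    return mismatches


def make_example_bundle(root: Path) -> Path:
    """Create a tiny made-up bundle so the sketch can run end to end."""
    contents = {
        "analytic_data/subset.csv": "x,y\n1,2\n2,4\n",
        "analytic_code/model.py": "# regression code would go here\n",
        "documentation/README.txt": "Describes subset.csv and model.py.\n",
    }
    for rel, text in contents.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bundle = make_example_bundle(Path(tmp))
        manifest = build_manifest(bundle)
        print(json.dumps(manifest, indent=2))
        print("mismatches:", verify_manifest(bundle, manifest))
```

An author could publish the manifest alongside the bundle, and a reader could run `verify_manifest` after downloading to confirm they are looking at the same analytic data and code. This only checks that the files match; it does not by itself rerun the analysis.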
So, some of the challenges out there right now: authors of research have to undergo considerable effort to make their results available on the web. To publish data, to publish code, is not necessarily a trivial task, and although there are a number of resources available now that were not available even five years ago, it's still a bit of a challenge to get things out there on the web. Furthermore, even if things are out there, readers have to download the data, look at the results, download the code, and piece things together, usually by hand, and it's not always an easy task. Furthermore, readers may not have the same resources that the original authors did. If the original authors used an enormous computing cluster, for example, to do their analysis, the readers may not have that same computing cluster at their disposal, so it may be difficult for readers to exactly reproduce the same results. And generally the toolbox for doing reproducible research is small, although it's definitely growing, and there are still a lot of needs.
So what happens in reality is that authors just throw things up on the web. There may be journal supplementary materials, which are famously disorganized. And there are only a few central databases that authors can take advantage of to post their data and make it available. So if you're working in a field that has a central database that everyone uses, that's great. But if you're not, then you're kind of out of luck. Furthermore, the readers end up just downloading the data and putting the code together by hand. They have to piece together the software, and it can be difficult to do.