So this lecture's about finding statistical software that you can trust. This is probably one of the most frustrating things about doing statistics on genomics data. Often you get a new data type that you want to analyze, and maybe you haven't done statistics on that type of data before. And you want to go out and find, say, an R package or another piece of software that you can run on that data to do some kind of analysis. And one of the hardest things to do is to determine whether that software is trustworthy or not. So I've done a little bit of organization here of the ways in which you can find software and how trustworthy each one is. The first case is if you just find a piece of software linked to from some random person's website, that has maybe not been vetted and that you can't find any other information about. So I'm showing you an example here of a little script that I wrote and posted on my website a long time ago. This piece of software is probably not to be trusted. There's no documentation, it's quite hard to use, and for sure it will break if you use it in the future. Typically you have to be maintaining and updating software for it to keep working. So the least trustworthy place to find software is just linked off of a random person's website, especially for statistical software. A slightly better place to go look is on GitHub. GitHub is a place where you can share code. You can share, for example, R packages, but you can also share other kinds of software. And often people that use GitHub tend to be pretty savvy, so you can find very good pieces of statistical software on GitHub. But they typically haven't been run through any kind of vetting process. Now, they can be. Sometimes you find very trustworthy software on GitHub. But usually you have to judge by the person that has created it, whether you think that they might produce good software in the future, which can be a little bit dicey sometimes. 
But in general, GitHub is a little bit better than people who just have it on their own website, because at least you know they put some effort into putting it on GitHub. A little bit better than that is CRAN, for R packages. So this is now exclusively R packages that you can find there. Packages there tend to be those that have passed at least very minimal checks, so you know that they mostly work. That doesn't mean that anyone has checked that they actually do what the statistical method says they'll do, but at least it shows you that the developer cares enough to make sure that the software is functioning. The next case would be Bioconductor, and Bioconductor adds some additional checks onto software. So again this is exclusively R packages, but those R packages have to not only run, they also get a minimal level of peer review: somebody has looked it over, made sure it does something different, has maybe even tried it out to see if it works. And you also get some more data about the maintainer, whether it's running right now, and so forth from the Bioconductor website. And so that's probably the most trustworthy place that you can go for software: some place that's a curated set of software, where people have actually done some kind of review of that software and checks on that software. So what are the numbers that you can look at to see if a piece of software is trustworthy? For Bioconductor it's actually quite easy. They give you information on the download stats. Here it tells you what percentile the software is in, based on how many people have downloaded it. So if it's in the top percentiles, likely lots and lots of people have used it, and so it's probably pretty trustworthy. It also gives you some information on how often they respond to comments, so how responsive the author of the software is to questions. 
That is going to matter to you a lot if you use the software and it breaks, or something like that. You want to be able to get ahold of the person that created that software. So that's a useful piece of information. And then it does things like check every day that the software runs on Windows, Macs, and so forth. You can look and see if that says okay, and that's helpful because it tells you something about how the software is functioning right now. On GitHub you have a similar source of information, maybe a little bit less directly useful. So for example, you have a star feature, so you can see how many people have liked the software. And that can give you some indication of whether it's used or not. And then you can also see how many commits have been made, and when those commits were made, if you click on the commits. That tells you how often it's being worked on, and that is sort of a proxy for how well maintained the software is. And so you can use these numbers to help guide yourself. Another thing that you can do is go ask on mailing lists or support sites. Biostars is one support site where you can ask about all sorts of bioinformatics or genomics software, not just R packages. You can get information on how to use it, and maybe also on whether people think it's trustworthy or not. You can ask those sorts of questions. Another example is the Bioconductor support page, where you tend to ask more specific questions about Bioconductor. But again, you can ask how to do a specific analysis, and also, in general, what the right packages are to use for a particular analysis, and get some feedback from the community. So hopefully that will give you some idea about picking which software to trust when you're doing statistical analysis.
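As a rough way of writing down the checks from this lecture, here is a minimal Python sketch. It is only an illustration, not an official tool from Bioconductor or GitHub: the `PackageInfo` record, the function names, and the thresholds (one year without commits, the 50th download percentile) are all assumptions made up for this example, and you would fill in the numbers by hand from the package's web page.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Sources in the order given in the lecture, least vetted to most vetted.
SOURCE_TIERS = {
    "personal_website": 0,  # no vetting at all
    "github": 1,            # shared code, usually unvetted
    "cran": 2,              # passes minimal automated checks
    "bioconductor": 3,      # checks plus a minimal level of peer review
}


@dataclass
class PackageInfo:
    """Hypothetical record of the signals you can read off a package page."""
    name: str
    source: str
    download_percentile: Optional[float] = None  # 0-100, higher means more downloads
    builds_ok: Optional[bool] = None             # does the daily build say "ok"?
    last_commit: Optional[date] = None           # date of the most recent commit


def source_tier(source: str) -> int:
    """Return the vetting tier of a source (higher means more vetted)."""
    return SOURCE_TIERS[source]


def warning_signs(pkg: PackageInfo, today: date,
                  max_idle_days: int = 365,
                  min_percentile: float = 50.0) -> List[str]:
    """List the reasons for caution; thresholds are illustrative assumptions."""
    signs = []
    if source_tier(pkg.source) == 0:
        signs.append("unvetted source")
    if pkg.download_percentile is not None and pkg.download_percentile < min_percentile:
        signs.append("low download percentile")
    if pkg.builds_ok is False:
        signs.append("daily build failing")
    if pkg.last_commit is not None and (today - pkg.last_commit).days > max_idle_days:
        signs.append("no recent commits")
    return signs


# Example: an old script from a personal website, untouched for years.
old_script = PackageInfo(name="old_script", source="personal_website",
                         last_commit=date(2010, 1, 1))
print(warning_signs(old_script, today=date(2020, 1, 1)))
```

An empty list here does not mean the statistics inside the package are correct. It only means none of these simple warning signs showed up, so it is still worth asking on a support site when you are unsure.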