Hi, everyone. This is the beginning of the instructional videos section that we have on Genome Assembly. But before we get into how you launch a genome assembly in PATRIC, I wanted to describe briefly what genome assembly is, where you can find data in PATRIC, and what PATRIC offers for different types of assembly strategies. Let's get started. Genome assembly basically works like this: you take an organism and you isolate DNA from it. That DNA generally is in the shape of chromosomes, but when you submit the DNA to a sequencing center, those chromosomes get fragmented into smaller chunks that we call reads. Assembly is the process of stitching those reads back together into some form of order. There are two types of assembly: reference-based and de novo. Reference-based is when you take the reads that you've gotten from your sequencing center and you map them against an already known reference genome. Then you arrange the reads in the order that those genes occur on the reference. Your organism may actually have them in a different order than that, but with a reference-based assembly, you're structuring it based on that particular genome. De novo means you take the reads and you allow a computer to put them together in the best way that it can determine. At this time PATRIC only offers de novo assembly; we don't have reference-based yet. There are different types of sequencing reads that we accept at PATRIC at this time. Illumina is by far the most common because the sequencing is pretty cheap; these are short reads. We also accept Ion Torrent. Then, there are two types of longer reads: the PacBio and the Nanopore reads. All of those can be uploaded to PATRIC and submitted to the assembly service. Then you would have to choose a strategy based on your reads.
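To make the idea of de novo assembly a little more concrete, here is a minimal Python sketch of stitching reads back together by their overlaps. This is a toy greedy approach for illustration only; it is not how SPAdes, Canu, or Unicycler work internally, and the function names `overlap` and `greedy_assemble` and the `min_len` parameter are made up for this example. It also assumes error-free reads that all come from the same strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy de novo assembly: keep merging the best-overlapping pair.

    Returns the remaining pieces (contigs) once no pair overlaps anymore.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        # Join the two pieces, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three made-up reads, given out of order.
    reads = ["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]
    print(greedy_assemble(reads))
```

Notice that no reference genome is involved: the order comes entirely from the reads themselves, which is the point of de novo assembly. When reads do not overlap, the function returns more than one piece, which is the toy version of ending up with several contigs.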
The assembly strategies that we offer in PATRIC are: Unicycler, SPAdes (and there are several types of SPAdes), Canu, metaSPAdes, plasmidSPAdes, and multiple displacement amplification or MDA, which is also another type of SPAdes. Unicycler can be used for hybrid assemblies, which means taking long and short reads together. SPAdes is short-read only. Canu is for long reads like Nanopore or PacBio. metaSPAdes is for when you have a metagenomic sample, a mixed sample that you want to try to get assemblies from. These are generally pretty big, so we have more memory allocated to the metaSPAdes assembly. With plasmidSPAdes, you're giving it a whole genome and you're saying, "Please try to pull the plasmids out of that. If there are plasmids, can you just give me that particular assembly?" Then there's the MDA assembly. Right now, we're at the beginning of what may very well be a huge boom in sequencing from single-cell genomics, and this is an assembly strategy that is particularly good for that. Generally, with MDA, prior to the sequencing you use a special strategy of whole genome amplification. The assembly strategy takes this into account when it's trying to generate the assembly from that. We talked about hybrid genome assembly. Generally, when you have short reads, there may be parts of the genome that you can't sequence through or can't see the other side of, so you'll get many different contigs from that, which are pieces that would map to a chromosome. When people are doing a hybrid assembly, they're taking long reads that should be able to stretch past those problem sections, like where there's often horizontal transfer or a lot of transposons. Often with the short reads it's hard to get past those. If we have a long read, it can get past those. But generally, with the long reads, the coverage isn't as good, and they aren't as clear at the single-nucleotide level.
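As a rough summary of how the read types line up with the strategies just described, here is a small Python sketch. It is only a lookup helper written for this transcript, not part of PATRIC or any PATRIC API; the function name `suggest_strategy`, its flags, and the platform spellings are hypothetical. Grouping Ion Torrent with the short reads is also an assumption made for the example.

```python
# Platform groupings used by this sketch (assumed spellings, lowercase).
SHORT_READ_PLATFORMS = {"illumina", "iontorrent"}
LONG_READ_PLATFORMS = {"pacbio", "nanopore"}


def suggest_strategy(platforms, metagenome=False, plasmids_only=False,
                     single_cell_mda=False):
    """Suggest an assembly strategy name from the read types on hand.

    The three flags are only considered for short-read-only input,
    because metaSPAdes, plasmidSPAdes, and MDA are all SPAdes variants.
    """
    platforms = {p.lower() for p in platforms}
    if not platforms:
        raise ValueError("at least one read type is required")
    unknown = platforms - SHORT_READ_PLATFORMS - LONG_READ_PLATFORMS
    if unknown:
        raise ValueError(f"unrecognized read type(s): {sorted(unknown)}")

    has_short = bool(platforms & SHORT_READ_PLATFORMS)
    has_long = bool(platforms & LONG_READ_PLATFORMS)

    if has_short and has_long:
        return "Unicycler"      # hybrid: long and short reads together
    if has_long:
        return "Canu"           # long reads only
    if metagenome:
        return "metaSPAdes"     # mixed, metagenomic sample
    if plasmids_only:
        return "plasmidSPAdes"  # pull plasmids out of a whole genome
    if single_cell_mda:
        return "MDA (SPAdes)"   # single-cell, amplified DNA
    return "SPAdes"             # plain short-read assembly


if __name__ == "__main__":
    print(suggest_strategy(["Illumina"]))
    print(suggest_strategy(["Illumina", "Nanopore"]))
    print(suggest_strategy(["PacBio"]))
```

The helper returns one suggestion per input for simplicity; in practice more than one strategy can be reasonable for the same data, so treat the result as a starting point rather than a rule.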
A hybrid genome assembly, if you can afford it, is the best of both worlds. If you've done that sequencing, there are two ways you can do that in PATRIC. Unicycler will do it automatically. But another thing you can do is submit both long and short reads and choose Canu as the strategy, and then do polishing, using the short reads with the Pilon iterations. So you may be asking, what is polishing? When you polish the assembly, it's trying to fix errors in the assembly by going back to the reads. It looks generally at base pairs or small regions. Racon is for long reads; Pilon is for short reads. This is showing you the interface in PATRIC. Our default is to set two and two for the Racon and Pilon iterations. Even if you just submit long reads or just submit short reads, the default is still there. Over the course of this assembly section, we'll have several short instructional videos, and we'll have a question session following each video. I've also created some test data for you to use. To find the test data, you would go to the Workspaces tab at the top of any PATRIC page and click on "Public Workspaces". This is going to open a window that shows all the public workspaces in PATRIC. You would need to scroll down to find the MOOC PATRIC Course and click on that. That's where I will have the data for you, and I'll show you that before each question just so you're sure of it. I think we're ready to go. Let's start assembling in PATRIC.
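For anyone who wants to see the polishing idea from earlier in code form, here is a minimal Python sketch of fixing a draft assembly by going back to the reads. It is a toy majority vote at each base position and is not the actual algorithm used by Racon or Pilon. The function names `polish_once` and `polish`, and the assumption that each read comes with a known start position and has no insertions or deletions, are simplifications made up for this example.

```python
from collections import Counter


def polish_once(draft: str, aligned_reads: list[tuple[int, str]]) -> str:
    """One toy polishing round.

    `aligned_reads` holds (start position in the draft, read sequence).
    A draft base is replaced only when a strict majority of the reads
    covering that position agree on one base.
    """
    pileup = [Counter() for _ in draft]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(draft):
                pileup[pos][base] += 1

    polished = []
    for draft_base, counts in zip(draft, pileup):
        new_base = draft_base  # keep the draft base by default
        if counts:
            top_base, top_n = counts.most_common(1)[0]
            if top_n * 2 > sum(counts.values()):
                new_base = top_base
        polished.append(new_base)
    return "".join(polished)


def polish(draft: str, aligned_reads: list[tuple[int, str]],
           iterations: int = 2) -> str:
    """Run several rounds; 2 mirrors the default mentioned in the video.

    In this toy the alignments never change, so extra rounds are no-ops.
    Real polishers re-align the reads to the updated assembly each round.
    """
    for _ in range(iterations):
        draft = polish_once(draft, aligned_reads)
    return draft


if __name__ == "__main__":
    draft = "ACGTTCGA"  # made-up draft with one wrong base
    reads = [(0, "ACGTA"), (2, "GTACGA"), (3, "TACG")]
    print(polish(draft, reads))
```

Positions that no read covers are left exactly as they were in the draft, which is why coverage matters so much for how well polishing can work.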