Hi, everyone. As part of this comparative genomics section, we're going to talk about taking a genomic region, BLASTing it, and confirming what you see. Let me give you the setup. Using the Protein Family Sorter and the anchoring function, we found a region in Brucella microbians and some other genomes that was uniquely shared among them, and some genomes are missing it, most notably the Brucella abortus genomes. These are just a subset of 39 Brucella abortus genomes; abortus probably has hundreds of genomes in PATRIC. But one particular genome, abortus bv. 5 strain 3196, which you can see in the pop-up window there, has some of the proteins in this region. For the most part, though, it looks like Brucella abortus is missing it.
We drew a box with our cursor over the region that corresponded to that genomic island, and that generated a feature table showing what those genes were. We chose the middle gene, which is this TraI gene, and looked at it. Then, in the last video, we went to the Genome Browser and learned how to adjust it to show the whole region: we went back into the table, took the start and stop, and adjusted the coordinates here. Then we saw that if you click on "Reference sequence", "Save track data", "Visible region", and "View", you can capture the nucleotide sequence that corresponds to that genomic island.
So let's copy that. Let's go to a different PATRIC page. I'm just going to override this page, go up into Services, and click "BLAST." Let's paste the sequence here. PATRIC immediately recognizes what it is and asks whether you want to search against a nucleotide database or a protein database. We're looking for this genomic island, so we want nucleotide. The default is set to reference and representative genomes. But remember, we're asking: is it true that Brucella abortus is missing this, with the exception of some pieces here?
Let's go back here and click the down-arrow box. There are a number of things we could BLAST against, but at the very bottom it says "Search within a taxon." Let's click on that. Then it wants you to tell it what the taxon is. The taxon I want to check, to see if the heatmap is correct, is Brucella abortus. As soon as I start typing, it starts trying to find something, and up here it says "Species Brucella abortus". So let's click on that. We're looking for genomic sequences, or contigs. We're going to set the E-value threshold very loosely. Let's click "Search".
So it's BLASTing against all the Brucella abortus genomes, and at first you look at this and think, wow, it's hitting a lot of Brucella abortus; the region must be there. Let's dig into this a little more closely. Here's the genome, here's the accession, here's the contig, so this is contig 20. It's 100 percent identity. Of the query that I gave it, which was this sequence, 84 percent was covered at 100 percent identity. And what does this mean, subject 7 percent? Well, remember it's looking at the whole contig here. The query length, that whole genomic island, is 18,964. 84 percent of that is covered in the BLAST hit. The subject, the whole contig, is 226,340, so the hit covers only 7 percent of it. And there's the score and the E-value. That looks pretty good.
Then we have this particular genome that has 79 percent query coverage, and it's obviously a smaller contig. You can see that it's smaller here compared to that one. It's 82 percent there. So we've got less. And look, we're getting the same genome here, again and again and again: this bv. 5 strain 3196. It's still getting 100 percent identity, but these are smaller pieces of the query, and the coverage is actually not so great: 7 percent, 6 percent, 6 percent, 6 percent. These are tiny little contigs. They're very small. The hits are just little pieces.
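The coverage numbers for that first hit can be checked by hand. What follows is a minimal Python sketch, not anything PATRIC itself runs. The function name `coverage_percent` is made up for illustration, and the sketch assumes the aligned stretch is the same length on the query and on the contig (no gaps), which the BLAST table does not guarantee.

```python
def coverage_percent(aligned_length: int, total_length: int) -> float:
    """Return the percentage of a sequence that is covered by an alignment."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return 100.0 * aligned_length / total_length


# Lengths read off the BLAST result in the walkthrough.
query_length = 18964      # the genomic island used as the query
subject_length = 226340   # the whole contig that was hit (contig 20)

# Assumption: the aligned stretch is about 84 percent of the query,
# and it spans the same number of bases on the subject.
aligned_length = round(0.84 * query_length)

query_cov = coverage_percent(aligned_length, query_length)
subject_cov = coverage_percent(aligned_length, subject_length)

print(f"aligned bases:    {aligned_length}")
print(f"query coverage:   {query_cov:.0f}%")
print(f"subject coverage: {subject_cov:.0f}%")
```

The same aligned stretch is most of the island but only a small slice of a 226,340-base contig, which is why a high query coverage and a low subject coverage can describe one good hit.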
If we go back to the heatmap view and look at this, you can see that it matches what we're seeing here: just a few of these genes are being hit. Then we go down here and things get even worse. It's hitting an even smaller portion. If you want to see what's being hit, click this down arrow, and you can see how much of the query is actually covered by the BLAST hit. It's a small little piece, compared to this one, which was a good one. Let's click that open, and you can see that it's a much larger piece, starting at this part of the query sequence. That's a much better hit. That's a good way to open up the results of a BLAST search.
I would still argue that I would trust this and this, and even these little pieces, even though they're small. But when it gets to this, this is just terrible. It's probably just pieces of the chromosome that are conserved across genomes. It's nothing like what we would expect to see with good, strong identity, especially if it's only 5 percent. To be honest, this one isn't so great either. I would argue that, out of all the Brucella abortus genomes, these two have the best hits, and everybody else just has tiny little pieces. These guys have got most of it.
If you want to know how many Brucella abortus genomes there are in PATRIC, let's go into Organisms and scroll down to Brucella. This summarizes the information for all the Brucella genomes. There are 1,053 total genomes in Brucella. How many of those are abortus? We can find that out by clicking on Taxonomy. You see abortus is here at the top; we click on that. Then the vertical green bar tells us we can go to the taxon landing page. Let's click on that. There are 408 Brucella abortus genomes in PATRIC. When we did BLAST, we got really good hits to only two of them, plus a partial hit here, and the rest are not so great. These are the two best ones.
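The eyeballing done above (trust the long, high-identity hits, treat the short ones as pieces, discount the rest) can be written down as a rough rule. This is only a sketch: the cutoffs `min_identity` and `strong_coverage` are hypothetical values chosen for illustration, not PATRIC defaults, and the genome labels and the low-identity example hit are placeholders rather than rows copied from the results table.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    genome: str            # placeholder label, not a real PATRIC genome name
    identity: float        # percent identity of the alignment
    query_coverage: float  # percent of the query covered by the hit


def classify(hit: Hit, min_identity: float = 95.0,
             strong_coverage: float = 70.0) -> str:
    """Label a hit as 'strong', 'partial', or 'weak' (hypothetical cutoffs)."""
    if hit.identity < min_identity:
        return "weak"
    if hit.query_coverage >= strong_coverage:
        return "strong"
    return "partial"


# Illustrative hits loosely modeled on the walkthrough.
hits = [
    Hit("genome_A", 100.0, 84.0),
    Hit("genome_B", 100.0, 79.0),
    Hit("genome_C", 100.0, 7.0),
    Hit("genome_C", 100.0, 6.0),
    Hit("genome_D", 88.0, 5.0),   # assumed low identity, for illustration
]

labels = [classify(h) for h in hits]
strong_genomes = {h.genome for h, lab in zip(hits, labels) if lab == "strong"}

# 408 Brucella abortus genomes were listed on the taxon page.
total_abortus_genomes = 408
strong_fraction = 100.0 * len(strong_genomes) / total_abortus_genomes

for h, lab in zip(hits, labels):
    print(f"{h.genome}: {h.identity:.0f}% id, {h.query_coverage:.0f}% cov -> {lab}")
print(f"{len(strong_genomes)} of {total_abortus_genomes} genomes "
      f"({strong_fraction:.2f}%) have a strong hit")
```

With two strong genomes out of 408, well under one percent of the Brucella abortus genomes carry most of the island, which is consistent with what the heatmap suggested.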
That's how you can use BLAST to go from a genomic region to seeing how strongly it's conserved, or whether you can find it somewhere else. Just for fun, let's click "Edit and resubmit" and scroll down to the databases. I don't want to search within a taxon this time. Let's see if we have any hits to known plasmids, and search that. It comes up with some not-great hits, like 6 percent. So unfortunately, we're not seeing any plasmid hits for this. But that's one of the ways you can use the different databases in PATRIC to explore sequences that you're interested in. Thank you for listening, and thank you for using PATRIC.
I've changed my background picture to black and white because we're going to BLAST. BLAST is where the rubber meets the road. Many people, when they've annotated a genome, come to me and say, "I didn't see this gene in my genome. It wasn't in the Protein Family Sorter," or, "I couldn't find it in the annotation. That means it's gone." But you should be getting a good sense by now that just because you don't see something at this point doesn't mean it's gone; sometimes that has to do with the assembly.
Here's what I want you to do. In the last assignment, you got the sequence of the gene that was in the Unicycler assembly and missing in the Canu assembly. You've got the sequence for that gene, and you've also got the sequence for the region. BLAST both of those against the Canu-assembled genomes. Was there a hit? What does it mean if there's not a hit? Does it mean it's really gone? I don't think we're there yet. If there wasn't a hit, I'm starting to feel good that it's really gone. This is going to lead us into our next tool, which does an even more fine-grained analysis of presence and absence and helps in interpreting those things. That's the Proteome Comparison. Try the assignment, and then I'll see you at the summary and then the next tool. Bye.
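For the assignment, one thing to watch is that a region can be split across several small contigs, so no single hit looks impressive even though the pieces together cover much of the query. The sketch below assumes you have BLAST results in the standard 12-column tabular format (what standalone BLAST+ writes with `-outfmt 6`); the PATRIC service shows the same kind of information in its results table. The function name, the contig names, the coordinates, and the `min_identity` cutoff are all made up for illustration.

```python
def merged_query_coverage(tabular_text: str, query_length: int,
                          min_identity: float = 90.0) -> float:
    """Percent of the query covered by the union of all hits.

    Expects BLAST tabular columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    intervals = []
    for line in tabular_text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed or comment lines
        if float(fields[2]) < min_identity:
            continue  # ignore low-identity hits
        qstart, qend = int(fields[6]), int(fields[7])
        intervals.append((min(qstart, qend), max(qstart, qend)))

    covered = 0
    last_end = 0  # query coordinates are 1-based
    for start, end in sorted(intervals):
        if end <= last_end:
            continue  # fully inside what is already counted
        covered += end - max(start, last_end + 1) + 1
        last_end = end
    return 100.0 * covered / query_length


# Made-up example rows: three good pieces on different contigs, one poor hit.
rows = [
    ("island", "contig_7", "100.0", "500", "0", "0", "1", "500", "10", "509", "0.0", "900"),
    ("island", "contig_9", "99.5", "501", "2", "0", "400", "900", "1", "501", "0.0", "890"),
    ("island", "contig_3", "100.0", "101", "0", "0", "2000", "2100", "5", "105", "1e-40", "180"),
    ("island", "contig_1", "80.0", "300", "60", "0", "5000", "5299", "1", "300", "1e-5", "60"),
]
example_text = "\n".join("\t".join(row) for row in rows)

coverage = merged_query_coverage(example_text, query_length=18964)
print(f"merged query coverage: {coverage:.1f}%")
```

If the merged coverage is still close to zero for both the gene and the region, that is one more piece of evidence that the sequence is really absent from the Canu assembly, though as noted above it is not yet the final word.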