Earlier, we talked about getting data from the heat map view by dragging the cursor over a specific region, which gave us a table of the genes in that region. Now we're going to talk about drilling down into a gene of interest. Let's click on this one. There's no particular reason to pick it, other than that it's named. Why this gene and not the others? I wanted something in the middle, which is this particular gene ID, 358. That's pretty much in the middle of this group. When I click on that, it populates the vertical green bar with possible downstream actions. This time, we're going to click on "Feature". That's going to open a new tab that shows all the information we have about this particular gene.
In PATRIC, the unique identifier is this, which we've talked about. This is the unique identifier for the genome, which is this Brucella microti genome, and this is the breadcrumb across the top that tells us exactly where we are. In RefSeq, this particular gene is known as this. This is the gene symbol, this is what it's called in PATRIC, and it's a coding sequence gene. This is the protein ID. If I were to click on this and open it in a new tab, it should take me to that gene in RefSeq, and they've updated it recently. You could follow the links there. This is the gene ID, this is the protein family, this is the global family. Here, you have more information about the genome itself and more information about the exact location of this gene, and you can see the sequence here: they have the gene, the protein, and the length. It also shows you where the gene is, in bright red, and who the neighbors are around it, which you can see by scrolling over them.
You can also go to different database repositories and look for information about this gene. One of my favorites is the CDD, so let's click on Conserved Domain Database, and that's showing what information there is about this gene. I really like this particular resource; it's really good.
You could also look at STRING to see if there are any known protein-protein interactions for it. None are found, which is not surprising because it's an obscure little gene. But pay attention to these particular links to external tools in PATRIC. I've also talked to you recently about how you can add a genome to a group; in the same way, you could add this to a feature group.
If I wanted to see how many of the PATRIC Brucella genomes had this particular local family, I could click here and it would open a new tab. We would wait a few minutes, and it tells me that 155 of the PATRIC Brucella genomes have that particular protein family. You could do the global family, which goes beyond the genus boundary, so let's click on that. It should be bigger, but let's see how big it is. Yeah, 1,720. This gives you an idea of how many different genera have this particular protein in them. You could also see how many genomes have the exact same sequence. I'm going to click on this and open it in a new tab, and that's 99. The local protein family was 155, so ninety-nine of those have the exact same sequence. I really like the feature page for this.
We also have different tabs, the genome browser and the compare region viewer. What I want to talk to you about today is the compare region viewer. Usually, when you're doing bioinformatic analyses, you have to go to extra lengths to get beyond the gene. You can do the genome comparison and you can do the gene comparison, but doing the gene neighborhood can feel a little bit difficult. We have this function, the compare region viewer, that shows that, so let's click on it. What this is showing us is the gene that we came in on, which is this IncP-type DNA relaxase TraI gene. It's here in red, and you may ask, why are there two different bars? Well, when you launch this tool and it starts to draw the gene neighborhood, for every gene it wants to see how strong the BLAST hit is to that particular gene.
Because we started with microti, that's a perfect hit. It's really strong. This bar represents the BLAST hit, which you can see as you mouse over it. This one represents the gene, telling you the location. As we go down the Brucella, suddenly we get to this one. The BLAST hit is less strong, and so it's lighter. Look here at this one. It only had a partial BLAST hit to that gene.
What else are we seeing here in the compare region viewer? What is this doing? It's taking the upstream region and the downstream region. That equals 10,000 base pairs. It takes that entire region and it looks across, in this instance, the reference and representative genomes in PATRIC to find the closest matches, not for the gene but for the neighborhood, and to show how tightly conserved it is. It's looking for this entire neighborhood. The reference and representative genomes are something that GenBank has and assigns. They're the gold standard of high-quality genomes. It's showing the ones that match in Brucella. You can see in the upstream region that the colors are all similar. These genes, if I mouse over them, are hitting the same gene, and these are all annotated the same way. Some of them, like this little hypothetical gene, are missing in neotomae but present in most of the others. Then you can see in the downstream region that these are all hypothetical genes, though it's coloring them all the same color. When you get out of Brucella, you notice that while this gene stays the same, that's as far as it could get. The rest of the neighborhood is totally shot. That's looking at the reference and representative genomes.
Let's look at this a bit more closely. We have Brucella microti, which is the genome we were looking at, this NVSL strain, neotomae, canis, suis, ceti. Well, I know that there are other Brucella reference and representative genomes. Let's go back to the heat map view that we launched a while earlier.
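As a rough illustration of the region window described above, here is a minimal Python sketch. It assumes the 10,000 base pairs are the total window, split evenly on either side of the gene's midpoint and clipped to the ends of the contig. That is an assumption made for illustration, not a description of how PATRIC computes it, and the function name `neighborhood_window` and its parameters are hypothetical.

```python
def neighborhood_window(gene_start, gene_end, region_size=10_000, contig_length=None):
    """Return (start, end) of a window centered on the gene's midpoint.

    Coordinates are 1-based and inclusive. The window is clipped so it
    never starts before position 1 or, if contig_length is given,
    ends past the end of the contig.
    """
    midpoint = (gene_start + gene_end) // 2
    half = region_size // 2              # assumed: half upstream, half downstream
    start = max(1, midpoint - half)      # clip at the start of the contig
    end = midpoint + half
    if contig_length is not None:
        end = min(end, contig_length)    # clip at the end of the contig
    return start, end


# Hypothetical coordinates, chosen only to show the arithmetic.
print(neighborhood_window(20_000, 21_000))
print(neighborhood_window(1_000, 2_000, contig_length=5_000))
```

Changing `region_size` in this sketch corresponds to the idea of changing the region size in the viewer: a larger window pulls more neighboring genes into the comparison.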
Remember, when we marched through this, there were abortus genomes. There are reference and representative abortus genomes, and those appear to be missing. The melitensis genomes are also missing. When we first launched this tool, we were just using a subset of all the Brucella genomes that are available in PATRIC. Let's see if this holds if we extend it. One of the things we want to do is go from reference and representative to all public genomes. Notice we're looking across genome families. That would be important here because it shows how strongly this is conserved across genus boundaries. Remember this section of Brucella; let me go back to the heat map view. This is a known genomic island, so we're trying to see which other genomes in PATRIC might have features of this island. We'd want to keep it across genome families. You could change the size if you wanted to and then click "Update".
Now, that always takes a long time. I launched it earlier; I cheated. These are across all the Brucella genomes, and I wanted to see how many abortus genomes had this. There's one, two, three. Only three of these genomes, which we'll explore later in a different part of the analysis. It's confirming that this is largely found outside of abortus. I just want to scroll all the way down to the bottom. It's a very long page. We get to Brucella suis, suis, and then we get to Serratia. Look here: on the five-prime end of this region, we're seeing a lot of similarity. Once we got out of the reference and representative genomes, we can see that this is conserved. We can see this five-prime region that goes all the way up to the protein we came in on, but after that, it falls apart. So for this particular genomic island in Brucella, it looks like these guys are unique. Remember, when we were looking at the reference and representative genomes, they were missing this in the five-prime region too, but these particular genomes have a piece of it.
I think that there's something going on with horizontal transfer, and this five-prime part is pretty conserved across these genomes, of which there aren't a lot. PATRIC has over 300,000 genomes. Now, we're talking about five or six. But they seem to have the same genes in this area. Once you get here, though, things change up quite a bit. So it's an interesting way to look at things. In the next instructional video, we're going to talk about using the genome browser, capturing the sequence that corresponds to this area, and trying to demonstrate how you can verify that something is really missing in a different genome. Thanks for listening and thank you for using PATRIC.
I had you do part of this exercise in assignment eight. But actually, repetition makes you a stronger, better person, able to do tasks downstream more quickly after you've done them several times. So don't thank me, it's all for the good of the PATRIC user. Here's what we did. We had the Canu and Unicycler groups. We ran the PGfams and the Protein Family Sorter. We went to the heat map. I had you, at one point, anchoring the genome on the Canu assembly. I think it was zero Racon, one Pilon iteration. Then suddenly, you were able to see these differences in genes at the very beginning of the Canu assembly. I want you to go with that. Select one of the genes that was yellow in the Unicycler assembly and orange or tangerine in the Canu assembly, and draw a box over it. When the pop-up window comes up, it says "Show Proteins", and you can open the Feature tab from there, or you can draw a bigger box around it. Click on the gene from the same protein family in each. So what you might want to do, within the heat map where Canu drops off and Unicycler picks up, is choose the gene that's orange in Canu and yellow in Unicycler. Draw a box around both of them, and then when the feature table comes up, it'll have both of those in there.
Click on the Canu one and say show the feature, and then click on the Unicycler one and say show the feature. So you'll have two tabs side by side. Just as a hint, I told you what I use, but you can use whatever you want. You've opened the Feature tab. Now, I want you to drill down and open the Compare Region Viewer for each. What does that look like between the two? Remember, it's the same gene, but it was a different color in the Canu assembly than in the Unicycler assembly, indicating that there are more members or that something's wrong with that gene. So looking at the differences in the Compare Region Viewer, what do you think? Does it appear to you that one of those genes is pseudogenized? Which assembly strategy gave a complete gene? Maybe they both gave complete genes. Keep in mind during all of this, because I have to keep reminding myself, that both of these genomes came from the same set of reads. So what does this tell me about the organism that was used to generate the assembly? What's the truth? We're trying to get to truth. You didn't realize that PATRIC was going to help you in your search for truth, but that's where we're going. Good luck. Not too many more assignments to go.