Hi everyone. In this episode, we're going to learn how to cluster the view in the Heatmap. There are two buttons here that allow you to cluster the data. One is the standard Cluster and the other is Advanced. Let's just click on Cluster and see what happens. What this is doing is taking all the data on the protein families in the genomes, and it will show visual clusters on the Heatmap. It'll rearrange both the genomes and the protein families based on presence and absence, and try to create patterns. Sometimes we have to wait a bit for it, but this looks pretty nice. You can see, going across this way, all the protein families that are broadly shared. As you scroll down more, you'll see these other patterns. You see areas where protein families aren't broadly shared. If I look there, I notice these genomes are from the Brucella that have been isolated from marine mammals. You can continue on down until you get to the ones that are singletons for particular genomes, like these, which are all from Brucella inopinata. You can scroll all the way and see all the patterns within them. Generally, I'm looking for a nice picture and an easy way to see the data.
There are different clustering algorithms that we have in PATRIC. To see those, you can click on Advanced, and you can choose to cluster just the protein families and leave the genomes in the same place. This would be especially important if you had gone ahead and arranged the genomes in a specific order, like I described in the last video. Let's say you arranged the genomes in a phylogenetic order and you didn't want that to be messed up; then you just click on Protein Families. Or you might want to keep the protein families in place. Let's say you filtered down to a specific set of protein families, and you changed the orientation of those to the way they appear in the genome, and you didn't want that messed up; then you just choose Genomes to cluster.
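If it helps to see the idea outside the interface, here is a minimal sketch of what "cluster just the protein families" versus "cluster both" could look like on a small presence/absence matrix. This is not PATRIC's implementation. The function names (`cityblock`, `cluster_order`, `cluster_heatmap`), the tiny matrix, and the choice of city-block distance with average linkage are all assumptions made for illustration.

```python
from itertools import combinations


def cityblock(a, b):
    """City-block (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_order(vectors):
    """Return indices ordered by a naive average-linkage clustering.

    Hypothetical helper: repeatedly merges the two closest clusters and
    keeps the members of each merged cluster next to each other.
    """
    if not vectors:
        return []
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(cityblock(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


def cluster_heatmap(matrix, rows=True, cols=True):
    """Reorder a matrix whose rows are protein families and columns are genomes.

    Set rows=False or cols=False to leave that axis in its current order,
    similar in spirit to clustering only one axis in the Advanced dialog.
    """
    row_order = cluster_order(matrix) if rows else list(range(len(matrix)))
    columns = [list(col) for col in zip(*matrix)]
    col_order = cluster_order(columns) if cols else list(range(len(columns)))
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return reordered, row_order, col_order


# Made-up example: 1 = family present in genome, 0 = absent.
matrix = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]
reordered, row_order, col_order = cluster_heatmap(matrix)
for row in reordered:
    print(row)
```

Calling `cluster_heatmap(matrix, cols=False)` leaves the genome columns where they were and only regroups the protein family rows, which is the situation described above where you have already arranged the genomes in an order you want to keep.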
PATRIC provides a number of algorithms that can be used for clustering. Just for fun, let's click "City-block distance". I don't know anything about this algorithm, except I like the name. We click on that. We have four different clustering types: average linkage, which is the default, and three others. Let's click "centroid linkage" just to see, and let's submit that and see how that job changes. Sometimes you have to wait a bit; especially when you have a lot of genomes that you're trying to cluster, it can take a long time. We've recently improved the Heatmap view, so it works a lot more quickly than it used to. This would be the city-block distance with pairwise centroid linkage clustering. You can see it's changed not only the order of the genomes but also the protein families, and it just gives you a different picture. It's just something you would play with. I don't know that anybody would be judged by the clustering algorithm that they used; they're just different ways to look at the data. Once again, you could look at where particular genes were after they had been clustered. It's a pretty fun little way to look at the data. I like the colors, so that we can see strong patterns. In the next instructional video, we'll be talking about anchoring the genome. Also, let me point out that after you have gone to the work of creating the image, you can take a snapshot of it and download it. Those are all things that you can do with the Heatmap view in PATRIC. See you next time for anchoring, and thanks for using PATRIC. Bye.
For people who are in the United States, there is, or used to be, this commercial for Monday Night Football where they'd say, "Are you ready for some football?" I counter, in this assignment, "Are you ready for some clustering of the Heatmap?" Granted, probably not the same thrill as football. But actually, it's pretty thrilling to me, and I do love this functionality, so I hope you didn't close your tab. Go ahead and hit Cluster on the default.
When I compared my genomes (and granted, things changed somewhat between each assembly job), there were some differences that I could see in protein families for the genomes that were assembled using Canu with zero Racon iterations and 1-4 Pilon iterations. If you see those, figure out how many protein families there are, and take note of them. Now, are there any protein families that are missing across all the genomes assembled by Canu but present in all of the genomes assembled by Unicycler? How many protein families do you see like that? If you are a whiz kid with that filter down the left, figure out how you can use it to answer this particular question.
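If you want to check your reasoning away from the interface, the question boils down to a set comparison. Here is a minimal sketch of that logic, assuming you have somehow exported, for each protein family, the set of genomes it appears in. The function name `families_matching`, the family IDs, and the genome labels are all made up for illustration; this is not how the PATRIC filter itself works.

```python
def families_matching(presence, absent_from, present_in):
    """Return family IDs absent from every genome in `absent_from`
    and present in every genome in `present_in`.

    `presence` maps a family ID to the set of genome labels that contain it.
    """
    absent_from = set(absent_from)
    present_in = set(present_in)
    return sorted(
        family
        for family, genomes in presence.items()
        # No overlap with the "absent" group, full coverage of the "present" group.
        if not (genomes & absent_from) and present_in <= genomes
    )


# Hypothetical data: which genomes each protein family was found in.
presence = {
    "FAM_A": {"canu_1", "canu_2", "unicycler_1", "unicycler_2"},
    "FAM_B": {"unicycler_1", "unicycler_2"},
    "FAM_C": {"unicycler_1"},
    "FAM_D": {"canu_1", "unicycler_1", "unicycler_2"},
}

hits = families_matching(
    presence,
    absent_from=["canu_1", "canu_2"],
    present_in=["unicycler_1", "unicycler_2"],
)
print(len(hits), hits)
```

In this made-up data only one family is missing from every Canu genome and present in every Unicycler genome. Your own count will depend on your assemblies.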