In this instructional video, we're going to show how to use the filtering mechanisms in the Protein Family Sorter to narrow down and examine a specific group of features. Where we left off, we had launched the Protein Family Sorter tool and it had given us this table that has a filter on the right and the tabular results on the left. We talked about how you could get the pan-genome here, the core genome here, and identify protein families unique to individual genomes by clicking here. What I want to show you now is how to use this filter to narrow down the results page to be more specific. Let's say you wanted to find all the protein families that had the word flagella in them; type it in, click "Filter", and you notice now we have 34. These are all 34 protein families. However, if I typed flagellum, let's see if they have something different. I have to click "Filter" here. Sometimes you have to know what things are named to be able to narrow down on these protein families.
We talked earlier about genome unique identifiers; there are also protein family unique identifiers. Let me reset this to show all 5,000 results. A really good example of this is taken from this paper by this group of researchers who were looking at a Brucella [inaudible] from a particular frog. They listed all the genes within the paper that were involved in creating the O antigen, which is an important defense mechanism of the Brucella. I went through and captured all the protein families that match those genes. So let's go into PATRIC, click on "Workspaces", then "Public Workspaces", go to "PATRIC Workshop", click on "Protein Family Sorter", and where it says "Brucella LPS genes", click on that row, and then click on "Download". When that pops up, it gives us a list of the protein families that are involved in creating the O antigen in Brucella. I think there are 72 of them. I'm going to copy all of those. Let's go back to the Protein Family Sorter.
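
The difference between filtering on flagella and on flagellum, and filtering on a copied list of family IDs, can be sketched in a few lines of Python. This is only an illustration: the rows, IDs, and descriptions are made up, the function names are hypothetical, and the matching rule (case-insensitive substring for keywords, exact match for IDs) is an assumption rather than a description of how PATRIC implements its filter box.

```python
# Hypothetical rows standing in for the protein family table.
# The IDs and descriptions are invented for illustration only.
families = [
    {"family_id": "FAM_0001", "description": "Flagellar motor protein"},
    {"family_id": "FAM_0002", "description": "Flagellum-specific ATP synthase"},
    {"family_id": "FAM_0003", "description": "Flagellar hook protein"},
    {"family_id": "FAM_0004", "description": "Histidine kinase"},
]


def filter_by_keyword(rows, keyword):
    """Keep rows whose description contains the keyword (assumed: case-insensitive substring)."""
    needle = keyword.lower()
    return [row for row in rows if needle in row["description"].lower()]


def filter_by_ids(rows, pasted_text):
    """Keep rows whose family ID appears in a pasted, comma- or space-separated list."""
    wanted = set(pasted_text.replace(",", " ").split())
    return [row for row in rows if row["family_id"] in wanted]


if __name__ == "__main__":
    # "flagella" is a substring of "Flagellar" but not of "Flagellum".
    print([r["family_id"] for r in filter_by_keyword(families, "flagella")])
    print([r["family_id"] for r in filter_by_keyword(families, "flagellum")])
    print([r["family_id"] for r in filter_by_ids(families, "FAM_0004, FAM_0001")])
```

Under this assumed rule the two keywords return different rows, which is why it helps to know how things are named before you filter.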
This is an awesome way to use this filter box. You can click there and paste those protein family IDs. Then I click "Filter" and it shows me just those 72 results.
I wanted to show you something that you can do from the protein families. For example, let's choose one of these protein families. How about this one? I click on that protein family and it populates the vertical green bar with downstream actions. Let's click on "Multiple Sequence Alignment". This will open up a new tab and present an alignment and a gene tree. What we have across the top is something called a web logo that summarizes the data down below. This letter is taller because it reflects the number of genes that are aligning here, so it's biggest when the position is totally conserved. There, where only some of them have it, it's a smaller size. You can scroll down the alignment. You can see areas where, for example, all of the melitensis have a T here instead of an I. This is a really nice and fun functionality in PATRIC, to be able to see these different protein families this way.
Oh, look here. Here we have a protein family where it looks like there was a frameshift, because everything was going well in this canis genome. Then there's a different sequence here, which is an indication that this gene is broken. In Brucella canis, this is a very important gene. When the Brucella have a functioning O antigen, they have a smooth phenotype when they're plated out. So they're identified as smooth. This particular mutation gives all the canis genomes a rough phenotype. It makes them different from all the other Brucella.
Let's unclick this now. Let's reset the filter. There are a couple more things to discuss. In PATRIC, you can filter on perfect families, which have one protein per genome. Note that right now, there are 5,097 protein families. If I click on "Perfect Families" and "Filter", it goes down to 3,999, so a significant portion of the Brucella genes are very tightly conserved.
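
The idea of a perfect family can be written down as a small check. The following is a minimal Python sketch, assuming that "one protein per genome" means every genome in the comparison has exactly one member of the family; the table, genome names, and function names are hypothetical and are not PATRIC's own data or code.

```python
# Hypothetical protein counts per family and genome (invented for illustration).
genomes = ["genome_1", "genome_2", "genome_3"]
table = {
    "FAM_A": {"genome_1": 1, "genome_2": 1, "genome_3": 1},  # one copy everywhere
    "FAM_B": {"genome_1": 2, "genome_2": 1, "genome_3": 1},  # extra copy in one genome
    "FAM_C": {"genome_1": 1},                                # present in one genome only
}


def is_perfect(counts, all_genomes):
    """Assumed rule: exactly one protein in every genome being compared."""
    return all(counts.get(genome, 0) == 1 for genome in all_genomes)


def split_families(family_table, all_genomes):
    """Return (perfect, non_perfect) lists of family IDs."""
    perfect = [f for f, c in family_table.items() if is_perfect(c, all_genomes)]
    non_perfect = [f for f in family_table if f not in perfect]
    return perfect, non_perfect


if __name__ == "__main__":
    perfect, non_perfect = split_families(table, genomes)
    print("perfect:", perfect)
    print("non-perfect:", non_perfect)
```

With the numbers from the table in the video, 5,097 families in total and 3,999 perfect ones leave 5,097 - 3,999 = 1,098 families that are not perfect.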
You can click on "Non-perfect Families" and filter on that. So there are 1,098, and the details of the number of proteins and the number of genomes that have them are here now. A non-perfect family may just mean that a genome has two copies of that gene, but it could be that the family is truly unique to certain genomes. You could click on this and filter down, and see which families only a single genome or a couple of genomes have. Let's reset that filter. Another thing you could do is say, I want to see the number of proteins per family. I want to see the ones that have 10-28. I don't know why I'd want to see that, but let's see. You can do that. I can also say I want to see 1-37 or any other number, filter on that, and see all the protein families that are not shared across everyone. That's the way to use the tabular results to filter the information. We can also launch a multiple sequence alignment from this. In the next instructional video, we'll be talking about looking at the heat map. You're going to want to watch that. Thanks for listening and thank you for using PATRIC.
Remember, we're looking at the two genome groups: one with the hybrid assemblies that used the Unicycler strategy, and the other with the ones that used Canu. Within those groups we have different levels of Racon or Pilon iterations. In the last assignment, you were supposed to launch the Protein Family Sorter and find the core, pan, and accessory genome. Here are some more questions. You need to be using the filters in the table. How many perfect families are there? How many non-perfect families are there? Then you should use the text box and filter on the word kinase. How many protein families across all of these groups have kinase in their name? How many protein families have fewer than 25 genomes per family? How many protein families have six proteins in them? What are the genomes that have that particular protein family? I want you to look at two protein families from that whole table.
Choose one that has a high standard deviation and one that has a low standard deviation. I want you to generate a multiple sequence alignment for both of those. Which has the more conserved alignment? This one should be fun for you, and it's going to help you learn to use those filters better and make you into a Protein Family Sorter expert so that you can just whip these things out when you're looking at your own data. Let me know if you have any problems.
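
If you want a number to go with "which alignment is more conserved", one common option is the average Shannon entropy per alignment column, where 0 means every sequence has the same residue at that position. The Python sketch below is one possible way to do this and is not something PATRIC reports; the function names and the toy alignments are hypothetical, and gaps are simply treated as one more symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_entropy(alignment):
    """Average column entropy for a list of equal-length aligned sequences."""
    columns = list(zip(*alignment))
    return sum(column_entropy(col) for col in columns) / len(columns)


if __name__ == "__main__":
    # Toy alignments, invented for illustration only.
    conserved = ["MKT", "MKT", "MKT", "MKT"]
    variable = ["MKT", "MKT", "MKI", "MKI"]
    print(mean_entropy(conserved))
    print(mean_entropy(variable))
```

A lower average means a more conserved alignment, which roughly corresponds to taller letters in the web logo at the top of the alignment view.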