So you've launched your comprehensive genome analysis job and we've looked at the upper-level files, so let's go back and retrace where we've been. I click on my Jobs page. I want to go to an earlier job that I did. I could filter here by clicking on Comprehensive Genome Analysis, and let's look at this job. So I highlight the row and then I click on the View button in the vertical green bar to the right.
In an earlier video, we discussed in detail the full genome report and what all of these downloadable files are that come with a comprehensive genome analysis. This one, when we submitted it, we used reads, so we have an individual assembly job, and we also have an individual annotation job. Earlier we talked about what you see when you click on the checkered flag here to go see the assembly job. In this video, we're going to talk about what you see when you click on the annotation job. So we click on it here and it opens up.
There are a boatload of files here. If you've looked at any of the annotation videos that I presented for just the straight-up annotation service, I go into a lot of detail describing these different files and the formats that are there, so I'll just do it briefly here. The contigs are here, the FASTA files, and you can download those. If I click on it, you see I can download it. There are a number of things I can do with that. This is an EMBL format of the annotation. This one is the feature DNA FASTA, so if you open that up, it's going to have each of the genes and the DNA sequence for them. This one is the same thing, but this time it's the protein sequence. This is just a text list of the features. This is a GenBank format. This is the genome format. This is another one, a GFF file. This is the merged GenBank format, and then you have it all wrapped up in a tar.gz. This one I sort of skipped over, the annotation genome: it's a JSON file that shows all the information associated with the annotation. The text file is a summary of that.
You can get an Excel file, which I would encourage you to download and look at. It shows you all the information about each of the genes that are called, along with the sequence, and then there's the quality JSON file.
Now you notice I skipped over this one. All of the files are important, but this is a really important file within the annotation job because it tells you a lot of information about the quality of your genome. So I've highlighted it and you can click the View button. And this is what it tells me. It tells me the genome ID that it has in PATRIC, what the genome is, and what the reference genome was. When it's looking to see how good the quality of this genome is, it's comparing it to this particular genome, and you could click on that and it would open up a new tab and show me exactly what that genome is. Sometimes we have to wait a little bit for PATRIC to show that. It's showing me this was the reference genome that it used.
It says the coarse consistency is 98.7, the fine consistency is 98.4, and it gives me estimates of completeness and contamination, so I feel pretty good about this genome. Here's the evaluation group. It tells me how many contigs, and then a whole bunch of different parameters for estimating the quality of the genome: the N50, the L50. And this is the interesting one, where it's looking at the stats across this group and that reference genome, and it's asking: does it have any genes that are unique to it, and is it missing some genes that it should have? So we have over-present roles, under-present roles, the roles that are predicted, the completeness roles, the total distinct roles, and in PATRIC-speak, roles are the genes. And then it tells you the genes without a functional assignment, meaning the genes that are called hypothetical, and the genes with a functional assignment. How many of the features are protein encoding? How many are hypothetical?
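A quick aside on the N50 and L50 that show up in that quality report. Going from the longest contig to the shortest, N50 is the length of the contig at which the running total first reaches half of the assembly length, and L50 is how many contigs it took to get there. Here is a minimal sketch of that usual definition. The function name and the contig lengths are made up for illustration, and this is not taken from how PATRIC itself computes the numbers.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths, using the usual definition."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50; 'count' contigs were needed, which is the L50.
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Made-up contig lengths, only to exercise the function.
example_lengths = [100, 80, 50, 30, 20, 10, 10]
n50, l50 = n50_l50(example_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

A larger N50 together with a smaller L50 generally means the assembly is in fewer, longer pieces, which is why those two numbers are worth tracking when you compare assemblies.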
Actually, that's pretty good: most of them have an actual function. And then there are the features that are in the local protein families. I've talked exhaustively about global families and local families; local families are the very specific ones, from genus on down.
So, the potentially problematic roles. PATRIC has scripts that run to look at your genome, and it's asking, what do we expect and what don't we expect? And so it looks like, for this particular gene, it predicted that this genome would have one, but it has five, so it got flagged: five of these. Immediately, that seems to be a problem, and then it tells you where those are and what the problem is. This is actually so valuable. Look at this comment that it has. This gene, number 8: it has no good roles and is short, meaning that it's probably not a good gene. And right next to it, number 9: no good roles and is short. Number 10: no good roles and is short. This is an indication to me that these are all on the same contig. There's probably either some assembly problem with this, or this is truly a gene that has issues and it's becoming pseudogenized. And it may be pseudogenized because there is a perfectly good gene somewhere else in the genome, like here. This one looks like it's pretty good.
And here's another one, this one, and actually this is the one that it's calling, and it's looking at that reference genome with this number, 127.6. Well, if we go up here to the 127.6, it looked at those two genomes and it says, you know what? It's really close, it's got 869 out of 1000 matching k-mers. We think that this is the one that did it, and the rest of these are probably no-good junk.
You can dig into that if you want to by clicking here, and that will take you to the page for that particular feature. Actually, it's going to the compare region view. I'd be more interested in looking at the genome browser when it comes up.
But this is the overview for that gene, and the genome browser should show me all those little pieces when it pops up, those individual genes. So it's just taking its time to load, but that just shows you what you can take from this, and how valuable this report is. It's showing you all this information and what you should expect. I think by far it's one of the best reports that we have in PATRIC, one that really allows you to dig into what's going on with the genome that you used in the service.
Now, in our subsequent videos we're going to talk about how you view this genome in PATRIC. So you have the report and everything, but what you really want to do is see your genome in the context of PATRIC and what you can do with it once you get there. So we're going to show you that, and then we'll also show you where you can go when you need help. So see you next time.
Okay, everybody, we've got another kind of complicated one. Remember earlier, with the assignments with the hybrid assemblies, how I had you first go into the full genome report and copy the length of time, the long-read coverage and the short-read coverage, and then open the TSV file and get the cross statistics and put those in that Excel sheet? Well, now we're going to add in the annotation results so that you can compare those things. So we're going to need to navigate to the annotation folder for each of those hybrid comprehensive genome analysis jobs, open the genome report within the annotation folder, and then copy and paste the results table into Excel for each job. I'll show you how to do that. You're going to need to do it 27 different times. And then I want you to go through and just look at some of the differences in those things.
So I'm just going to take any job. I'll take this Unicycler 00, click on that, click on View, and then I need to navigate to this annotation folder. So I click on that, and then I open the genomereport.html. The annotation job doesn't give you a TSV file.
But luckily enough, you see that? This is structured in a way that I can put into Excel, not as cleanly as I'd like, but it's okay. So I would just take the whole table, copy it, and then I need to go into my Excel sheet. And is this the right job? Let me see here. Yes, it's the correct job.
And okay, I apologize, this is going to be a little complicated, and some of you who are better with scripting and stuff can do it better than I'm going to do it, but I'm just going to show you the way I do it. So I paste and I say match destination formatting, and then the genome ID, I need to copy that over, and then I take these IDs and I just copy them and then I transpose them. So it should have all of those filled in, and then I just go through and delete that, and do that step by step. I know it's a little tedious.
But here's what you'll find that's really interesting: pay attention to the consistency and completeness with each of these. We already have the contig numbers, but you should also be paying attention to over-present roles and under-present roles, things like that. And so you'll start to get a sense of how, as you mess with the assembly, it has downstream effects in the annotation. And also interesting to me are the features that are hypothetical and the features that are protein encoding.
Now, we have this 1% of features that are in local protein families. We chose bacteria; well, you didn't choose it, I chose it for you when we annotated this. If you recall back from the other discussions, like in the annotation tutorials that were earlier, to get the local families, which are the genus-specific families, you need to at least specify a genus. I didn't do that with this because I wanted you to have some unknowns, because sometimes, I'm assuming, you aren't exactly sure what it is that you're working with, so we're just following along that little path. Had we chosen what this organism was, then we would see that thing filled in.
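For those of you who would rather script the copy, paste, and transpose routine described above than do it by hand 27 times, here is a rough sketch using only the Python standard library. It assumes the report's results table is plain HTML with one label cell and one value cell per row, which you should check against your own genomereport.html. The class name, function names, sample HTML, and the job label are all made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def report_to_record(html_text):
    """Turn two-cell label/value rows into a {label: value} dict."""
    parser = TableRows()
    parser.feed(html_text)
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


def records_to_csv(records_by_job):
    """Write one transposed row per job; columns are the labels in first-seen order."""
    columns = []
    for record in records_by_job.values():
        for label in record:
            if label not in columns:
                columns.append(label)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["job"] + columns, lineterminator="\n")
    writer.writeheader()
    for job, record in records_by_job.items():
        writer.writerow({"job": job, **record})
    return out.getvalue()


# Made-up sample standing in for one job's report table (labels are assumptions).
sample = """
<table>
<tr><td>Coarse consistency</td><td>98.7</td></tr>
<tr><td>Fine consistency</td><td>98.4</td></tr>
</table>
"""
record = report_to_record(sample)
print(records_to_csv({"example_job": record}))
```

In practice you would read each job's genomereport.html from disk, call `report_to_record` on its text, collect the results in a dictionary keyed by job name, and open the resulting CSV in Excel. If the real report has several tables or extra columns, the two-cell filter will need adjusting.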
Good luck, it'll take a while, but I think you'll really like seeing the patterns that pop up, bye.