[MUSIC] Hi everyone, my name is Pimlapas Leekitcharoenphon. I'm a researcher in the Research Group for Genomic Epidemiology at the National Food Institute, Technical University of Denmark. Today, I'm going to talk about MLST typing: a description of the MLST tool and its application.

For many species, multilocus sequence typing, or MLST, is considered the gold standard of typing. Traditionally, it has been performed in an expensive and time-consuming manner. For example, you have to PCR-amplify the seven housekeeping genes, Sanger sequence them, and then analyze the sequences to work out the sequence type (ST) of your bacterial isolate. But the cost of whole genome sequencing continues to decline, and it is becoming increasingly available to scientists and routine diagnostic laboratories. Currently, the cost of whole genome sequencing is actually below the cost of traditional MLST.

Now you might wonder why we have to do MLST and why it became the gold standard of typing. Let me give you one example from one of my papers, on the genomics of an emerging clone of Salmonella Typhimurium ST313 from Nigeria and Congo. The story is that Salmonella Typhimurium is everywhere in the world, and it has many clones and many substrains. But there is one particular strain, sequence type ST313, that is more invasive and causes more deaths than other Typhimurium. So if you work in public health or in surveillance and you find a Typhimurium with ST313, you need to act very quickly, because it is more invasive than the normal Typhimurium that you usually find in your country or your city. This is why we need to know the sequence type.

Okay, let's go back to the MLST tool that we're going to talk about today, which uses whole genome sequencing. In our group, we developed a web-based method for MLST typing based on whole genome sequencing data.
And, of course, the MLST tool has a database, which we download monthly from PubMLST. You can also go to PubMLST and get the MLST database yourself. So what does that database look like? It is composed of two things. The first is the ST profiles: a table of the seven housekeeping genes and their allele numbers. From a combination of allele numbers you get the ST, so this is the ST profile. The second is the sequences of all the allele variants of each of the seven genes.

So let me give you an example of how you actually get the ST. Every species has its own scheme, and this is an example for one species. Normally a scheme has seven genes, and for each gene you see different numbers. For example, this one, aroE, has allele number 1 and allele number 4. They might differ by only one or two nucleotides, but each variant at each housekeeping locus has its own number. When you search your genome, you get the allele number at each of the loci, and the unique combination of these seven numbers identifies the ST number. This is what is already in the PubMLST database and in the database of our tool. To identify the allele numbers we use BLAST, so the tool only looks at the best-matching MLST allele for each locus and reports its number here.

The tool has been tested with multiple datasets. The first is assembled genomes of 336 isolates covering 56 MLST schemes. The second is raw read data from 387 isolates covering ten schemes. And we have a small test set of raw read data from 29 isolates, which we also tested in the lab to identify the STs.
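
The two-part database and the lookup described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the locus names other than aroE, the sequences, the allele numbers, and the ST numbers are all made up, the scheme has three loci instead of seven to keep it short, and exact substring matching stands in for the BLAST search.

```python
from typing import Dict, Optional, Tuple

# Toy scheme: every name, sequence, and number below is hypothetical.
LOCI: Tuple[str, ...] = ("aroE", "locusB", "locusC")

# Part 2 of the database: the sequence of every known allele at each locus.
ALLELES: Dict[str, Dict[int, str]] = {
    "aroE":   {1: "ATGGCTAAAGGT", 4: "ATGGCTAAGGGT"},
    "locusB": {1: "TTGACCGATCCA", 2: "TTGACCGTTCCA"},
    "locusC": {1: "GGCATTCAGGAA", 3: "GGCATACAGGAA"},
}

# Part 1 of the database: ST profiles, one combination of allele numbers per ST.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (4, 2, 1): 7,
    (4, 2, 3): 42,
}


def call_alleles(genome: str) -> Dict[str, Optional[int]]:
    """Return the allele number found at each locus, or None if no allele matches.

    Exact substring search is a simplification; the real tool aligns with BLAST.
    """
    calls: Dict[str, Optional[int]] = {}
    for locus in LOCI:
        calls[locus] = next(
            (number for number, seq in ALLELES[locus].items() if seq in genome),
            None,
        )
    return calls


def lookup_st(calls: Dict[str, Optional[int]]) -> Optional[int]:
    """Map the combination of allele numbers to an ST; None if it is not in the profiles."""
    if any(calls[locus] is None for locus in LOCI):
        return None
    return PROFILES.get(tuple(calls[locus] for locus in LOCI))


# A made-up "genome" carrying aroE allele 4, locusB allele 2, and locusC allele 1.
genome = "CCCC" + "ATGGCTAAGGGT" + "AAAA" + "TTGACCGTTCCA" + "GGGG" + "GGCATTCAGGAA" + "TT"
calls = call_alleles(genome)
st = lookup_st(calls)
print(calls, st)
```

A combination of allele numbers that is not in the profile table gives no ST, which is why `lookup_st` returns `None` in that case.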
So we have the lab results to confirm the whole genome sequencing results, and the results show that the MLST tool can determine the sequence types of all isolates in the test set.

The idea of the MLST tool is that, as a scientist or researcher with whole genome sequencing data, you submit it to the MLST tool at CGE and it determines the ST for you. The database, as I told you, contains the different allele sequences from a publicly available database such as PubMLST, and as a user you submit your unknown sequences either as an assembled genome or as raw reads. If you submit an assembled genome, the tool uses BLAST for the alignment. If you submit raw reads, the tool does not do any assembly; it maps your raw reads directly to the allele sequences using KMA, the k-mer alignment tool. What you get back is the sequence type of your submitted genome.

This is a link to the MLST tool, and this is the front page of the MLST website. First things first: when you open the tool, the first thing you have to choose is the MLST configuration, which is the species of your strain. That means that before using MLST, you have to know the species of your unknown strain. You can see there is a long list of species. It is only available for species that have an MLST scheme. Very rare species may not have a scheme, but most species are already here. For some of them you can see scheme number one and scheme number two. For example, Escherichia coli has scheme number one and number two. The difference between the two might be that number one has seven housekeeping genes and number two has eight. And of course they give different STs, okay? But most people are likely to use the scheme with seven housekeeping genes. The next thing you have to choose is the type of sequence data you are submitting.
For example, if you choose assembled genome or contig data, then your input needs to be in FASTA format. This is an example of FASTA format: two lines, first the ID and then the sequence. If your data is what we call raw reads or raw sequencing data, it needs to be in FASTQ format, like this. When you have chosen everything here, you can upload your data. You click here to upload your data, and again, this tool only accepts one strain per submission. You cannot submit multiple strains in one submission.

Okay, so once you have the strain here, you click to upload the data. Once the data has been uploaded completely, you get this page. It is the same for every tool that we have. If you want to wait for the result, just stay here, but you also have the option to enter your email here and click notify me via email. When the tool is done, it sends the output link to your email.

Let's have a look at the output of MLST. This is an example. It tells you which MLST scheme you chose and what the organism is, and then here is the output that you need from the MLST tool: the sequence type of your input genome. You also see more detail about your result. You see the seven housekeeping genes, the seven loci, and then the allele number for each. The combination of these allele numbers determines your sequence type. Moreover, you see how good the alignment of your data is, for example the coverage. What does the coverage mean? The coverage is the length of the alignment between the best-matching allele and the corresponding sequence in the genome, relative to the length of the allele. What does 100% mean? Let's say this is the gene and this is your sequence: if your sequence aligns completely, from the first position of the gene to the last position, this means 100% coverage.
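
As a small worked example of the coverage idea above, here is a sketch in Python. It assumes coverage is the alignment length divided by the allele length, and it also computes percent identity as the share of aligned positions that match exactly. The function name, its arguments, and the sequences are hypothetical, the alignment is assumed to be gap-free, and the tool's own calculation may differ in detail.

```python
from typing import Tuple


def coverage_and_identity(allele: str, hit: str, allele_start: int = 0) -> Tuple[float, float]:
    """Return (coverage %, identity %) for a gap-free alignment.

    allele       -- the best-matching allele sequence from the database
    hit          -- the stretch of the genome aligned to the allele
    allele_start -- 0-based position in the allele where the alignment begins
    """
    if not hit or allele_start < 0 or allele_start + len(hit) > len(allele):
        raise ValueError("hit must be non-empty and fit inside the allele")
    aligned_allele = allele[allele_start:allele_start + len(hit)]
    matches = sum(a == b for a, b in zip(aligned_allele, hit))
    coverage = 100.0 * len(hit) / len(allele)   # alignment length vs. allele length
    identity = 100.0 * matches / len(hit)       # matching positions vs. alignment length
    return coverage, identity


allele = "ATGGCTAAGGGT"  # made-up 12-base allele

perfect = coverage_and_identity(allele, "ATGGCTAAGGGT")   # aligned end to end, no mismatch
partial = coverage_and_identity(allele, "ATGGCTAAG")      # only the first 9 of 12 bases aligned
mismatch = coverage_and_identity(allele, "ATGGCTAAAGGT")  # full length, one mismatch
print(perfect, partial, mismatch)
```

The first case is the perfect hit: full coverage and no mismatches. The second covers only part of the allele, and the third covers all of it but differs at one position.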
You can see the alignment length here. If the alignment length is equal to the size of the allele, that means it aligned from the first position of the gene to the last position. And if every position is identical, with no mismatch, you get 100% identity. So if you get 100% identity and 100% coverage, that means a perfect hit, and we can trust the result completely.

If you want to see more, we have more options over here. For example, you can get all of this output as a text file or a TSV file, and you can get the hits themselves. Let's say you want the hit for this gene, that is, the part of your input genome that is similar to this gene, or to all the genes here. You can get that from Hit in genome sequences, and it looks like this: the parts of your genome that are similar to the genes identified here. You can also get the allele sequences of these seven housekeeping genes if you click over here, and you can click extended output if you want to see the alignment in detail. For example, you might want to see whether there are any mismatches in the alignment, and you can see that with this option.

If you want to read more about the tool, you can go to this paper, and if you use the tool, please cite this paper as well. If you want to know more about the databases behind MLST, here on the front page you can see the link to the software and the link to the database version. And by the way, for most of the CGE tools we have both the web-based version and a stand-alone version that you can download and install on your own server or your own computer. You need a little bit of UNIX or bioinformatics skill to be able to install and run the program. If you want to download and install the tool yourself, you can go to this website: we have a Bitbucket for Genomic Epidemiology here. That's all for the MLST tool, and thank you very much. [MUSIC]