[MUSIC] In this lecture, we'll discuss a tool we developed called L1000CDS², which queries a subset of the L1000 gene expression data to identify small molecules that can either reverse or mimic the input gene expression changes, given an input of differentially expressed genes. L1000CDS² is a search engine for gene expression signatures: it finds the best matches for an input of differentially expressed genes. We query only a subset of the LINCS L1000 small molecule expression profiles, because we kept only those perturbations that induced a significant change. We also use the characteristic direction method, which we developed, to identify the differentially expressed genes. This is why L1000CDS² is different from the tool provided with the Connectivity Map database from the Broad Institute. The user input can be either up and down gene lists or a differential expression vector. The tool also supports searching for drug combinations, with the ability to predict combinations that may act synergistically. It also provides canned analyses for many diseases, perturbation types, and small molecules that could potentially inhibit the growth of cancer cell lines. Here are some numbers that describe what's behind the scenes of this tool. There are 22,926 signatures that we deemed significant, and they come from the CPC and CPD perturbagen batches from lincscloud.org. We also added a new set of perturbations recently published on the Gene Expression Omnibus (GEO). In total, we have 33,197 gene expression signatures, covering almost 4,000 unique small molecules applied to 63 cell lines. L1000CDS² uses two separate search algorithms to identify signatures that match the input, depending on whether the input is up and down differentially expressed genes or an expression vector. If you input up and down gene lists, the tool simply computes two intersections between your two gene sets and the gene sets that were determined to be differentially expressed by each drug. If you want to reverse the expression, we look for input up genes that intersect with the signature's down genes, and input down genes that intersect with the signature's up genes. If the input is a differential expression vector, we use cosine distance for the search. The cosine distances between the input vector and all gene expression signatures in the database are calculated, and the top-ranked signatures are those with either the smallest or the largest cosine distance from the input vector, depending on whether the search mode is mimic or reverse. The search for combinations produces a predicted synergy score. To compute it, we compare every possible pair among the top 50 signatures and compute the orthogonality of each pair. When the input is a gene set, we instead combine the overlaps of the two signatures' differentially expressed genes with the input differentially expressed genes.
Now let's go to the site and examine the various parts of the app. The input page consists of five sections: input text boxes, examples and signatures, configuration, metadata, and recent searches. The entry point into the L1000CDS² app is to paste up and down gene lists into the up and down gene text boxes, or to paste a signature into the up gene text box. A signature is a list of genes, each followed by its differential expression value, separated from the gene ID by a comma. The up gene text box automatically detects whether the input is a gene list or a signature. The search button becomes enabled only when both the up and down gene text boxes are filled with gene lists, or when the up gene text box is filled with a signature. Clicking the search button displays the information for the top 50 signatures that match the input in a table on a new page. The app comes with examples in the signatures section, which includes pre-computed signatures that the user can use as input.
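To make the vector-based search concrete, here is a minimal Python sketch; it is not the L1000CDS² implementation. It assumes that pasted text counts as a signature when every non-empty line has a `GENE,value` form, represents signatures as `{gene: value}` dictionaries keyed by hypothetical signature IDs, ranks by cosine distance (smallest first for mimic, largest first for reverse), and scores candidate pairs with an assumed orthogonality score of `1 - |cosine similarity|`. The tool's actual detection rule and synergy formula may differ.

```python
import math
from itertools import combinations


def parse_input(text):
    """Classify pasted text as a gene list or a 'GENE,value' signature.

    Assumed rule: if every non-empty line contains a comma, treat it as a signature.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines and all("," in line for line in lines):
        signature = {}
        for line in lines:
            gene, value = line.split(",", 1)
            signature[gene.strip()] = float(value)
        return "signature", signature
    return "gene_list", lines


def cosine_distance(u, v):
    """1 - cosine similarity of two {gene: value} vectors; absent genes count as 0.

    Ranges from 0 (same direction) to 2 (opposite direction); vectors must be non-zero.
    """
    dot = sum(value * v.get(gene, 0.0) for gene, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / (norm_u * norm_v)


def rank_signatures(query, signatures, mode="reverse", top_n=50):
    """Mimic: smallest distances first. Reverse: largest distances first."""
    scored = [(sig_id, cosine_distance(query, vec)) for sig_id, vec in signatures.items()]
    scored.sort(key=lambda item: item[1], reverse=(mode == "reverse"))
    return scored[:top_n]


def rank_pairs_by_orthogonality(top_ids, signatures):
    """Score every pair of top hits with an assumed orthogonality score, 1 - |cos|."""
    pairs = []
    for a, b in combinations(top_ids, 2):
        cos_sim = 1.0 - cosine_distance(signatures[a], signatures[b])
        pairs.append(((a, b), 1.0 - abs(cos_sim)))
    pairs.sort(key=lambda item: item[1], reverse=True)
    return pairs
```

For example, you could parse the up gene box with `parse_input`; if it returns a signature, pass that dictionary as `query` to `rank_signatures`, then feed the IDs of the returned hits to `rank_pairs_by_orthogonality` to imitate the top-50 pairing step.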
Clicking the Gene-Set Example button fills the text boxes with example up and down gene lists for a demo search using the gene-set method. Clicking the Signature Example button fills in an example signature. The EBOV signatures are gene expression signatures measured at three time points after Ebola virus infection; this is unpublished data from our collaborators. The Disease signatures comprise 670 disease signatures computed from GEO, some of which were extracted by students in our previous course. The Ligand signatures are consensus ligand perturbations calculated from another LINCS L1000 subset called LGP4. The CCLE signatures were computed from the Cancer Cell Line Encyclopedia (CCLE) gene expression data by comparing the expression profile of each cell line to the rest, using the characteristic direction method. The goal here is to find small molecules that can potentially inhibit the growth of those cell lines in a specific manner. In the configuration section, the mimic/reverse slider lets you choose between searching for small molecules that would mimic the input signature and searching for ones that would reverse it. The default search mode is reverse. Small molecule combination search is supported, but you have to check the "search for small molecule combinations" check box. By checking the share check box, users can also share their input signatures and metadata, so that others can query the signatures and gene sets they submitted. In the metadata section, any metadata associated with the input signature can be entered. Clicking the plus sign lets you add more fields. Most importantly, users are encouraged to type in at least one tag for future reference. The twenty most recent searches are stored in the Recent Searches section. Clicking an entry opens its search results in a new page.
Note that these recent searches are stored in the browser's local storage, so clearing the browser data deletes those records. On the results page, the search results are rendered as a paginated table with 14 entries per page. Each entry provides seven pieces of information about the signature: rank, score, perturbation, cell line, dose, time point, and overlap with the input. For gene-set search, the score is the overlap between the input differentially expressed genes and the signature's differentially expressed genes, divided by the effective input. The effective input is the size of the intersection between the input genes and the L1000 genes, since some input lists contain genes that are not present in the L1000 dataset. The L1000 genes here include all of the roughly 22,000 L1000 genes, not just the roughly 1,000 measured ones. Clicking the overlap button shows the overlapping genes and their values in two text boxes. If the user entered up and down gene lists, the first box shows the overlapping genes between the input up genes and the signature up genes. These text boxes also provide a link to run enrichment analysis on those gene lists with Enrichr. Clicking the download button lets you download all the information about the signatures in JavaScript Object Notation (JSON) format. On top of the table are buttons and icons that provide various useful services. For example, you can reanalyze the input: this redirects you back to the input page, where you can change some input settings, such as switching between mimic and reverse. You can also share your results with others by giving them a permanent link. The tag button displays the tags and the search mode; clicking it shows the input metadata. The cloud download icon downloads the results table as a CSV or Excel file. The diamond icon performs chemical substructure enrichment analysis on the top-ranked small molecules.
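The gene-set score described above can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not the tool's code: gene sets are plain Python sets, a signature is a hypothetical `(up_genes, down_genes)` tuple, mimic mode is assumed to use same-direction intersections (the lecture spells out only the reverse rule), and the pair score for the combination search is assumed to be the union of both signatures' overlaps divided by the same effective input.

```python
def overlapping_genes(input_up, input_down, sig_up, sig_down, mode="reverse"):
    """Input genes matched by a signature in the direction the search asks for."""
    if mode == "reverse":
        # Reverse: input up genes should be down-regulated by the drug, and vice versa.
        return (input_up & sig_down) | (input_down & sig_up)
    # Mimic (assumed): genes should move in the same direction as in the input.
    return (input_up & sig_up) | (input_down & sig_down)


def effective_input(input_up, input_down, l1000_genes):
    """Number of input genes present in the L1000 gene universe (measured + inferred)."""
    return len((input_up | input_down) & l1000_genes)


def gene_set_score(input_up, input_down, signature, l1000_genes, mode="reverse"):
    """Overlap with one (up, down) signature divided by the effective input."""
    sig_up, sig_down = signature
    denom = effective_input(input_up, input_down, l1000_genes)
    if denom == 0:
        return 0.0
    return len(overlapping_genes(input_up, input_down, sig_up, sig_down, mode)) / denom


def combined_overlap_score(input_up, input_down, sig_a, sig_b, l1000_genes, mode="reverse"):
    """Assumed pair score: union of both signatures' overlaps over the effective input."""
    denom = effective_input(input_up, input_down, l1000_genes)
    if denom == 0:
        return 0.0
    genes = overlapping_genes(input_up, input_down, *sig_a, mode=mode) | overlapping_genes(
        input_up, input_down, *sig_b, mode=mode
    )
    return len(genes) / denom
```

Dividing by the effective input rather than the raw input length keeps input genes absent from the L1000 universe from lowering every score equally.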
This can potentially enable the design of small molecules with desired properties. The result of the substructure enrichment analysis is displayed in a table where each row is a significantly enriched chemical substructure. Each row provides three pieces of information: substructure, p-value, and perturbation count. The substructure is represented as a string in the SMARTS format. The p-value is computed using Fisher's exact test. The perturbation count is the number of perturbations that contain this substructure. Clicking the plus sign shows a visualization of the substructure and a table of the top perturbations that contain it. The rank in that table is the rank of the perturbation in the top 50 signatures table. If you chose to search for small molecule combinations, a table of signature combinations appears below the single-perturbation results. This table is also paginated with 14 entries per page. Each entry provides three pieces of information about the identified combination: rank, synergy score, and combination. When searching for combinations, L1000CDS² compares every possible pair among the top 50 signatures from the single search. If the input is two sets of up and down genes, the synergy score is calculated as the combined overlap of the two signatures' differentially expressed genes with the input. In the cosine distance search, synergy is calculated from the orthogonality of the two characteristic direction signatures. The rationale is that two orthogonal perturbations may act on two independent pathways while pushing the cells in the same direction. The rank is based on the synergy score. The number before each chemical perturbation in the combination column is the rank of that perturbation on the single-signature results page. Clicking a perturbation highlights it in the single-signature results table, which lets the user learn more about that specific perturbation.
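The substructure enrichment can be sketched with a one-sided Fisher's exact test. In this hypothetical Python example, each molecule ID maps to a set of substructure keys (for instance SMARTS strings, which in practice would come from a cheminformatics toolkit not shown here), the top-ranked molecules are compared against the remaining molecules in the library, and the p-value is the upper tail of the hypergeometric distribution. The choice of background set and of a one-sided alternative are assumptions, not details given in the lecture.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: top-ranked molecules vs. remaining molecules.
    Columns: with vs. without the substructure.
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n_top = a + b
    n_with = a + c
    total = a + b + c + d
    tail = sum(
        comb(n_with, k) * comb(total - n_with, n_top - k)
        for k in range(a, min(n_top, n_with) + 1)
    )
    return tail / comb(total, n_top)


def substructure_enrichment(top_molecules, library, substructures_of):
    """Rank substructures found in the top hits by enrichment p-value.

    substructures_of maps a (hypothetical) molecule ID to a set of substructure keys.
    Returns (substructure, p_value, count_in_top) tuples, most significant first.
    """
    top = set(top_molecules)
    rest = set(library) - top
    candidates = set().union(*(substructures_of[m] for m in top))
    results = []
    for key in candidates:
        a = sum(key in substructures_of[m] for m in top)
        c = sum(key in substructures_of[m] for m in rest)
        p_value = fisher_exact_greater(a, len(top) - a, c, len(rest) - c)
        results.append((key, p_value, a))
    results.sort(key=lambda row: row[1])
    return results
```

If SciPy is available, the p-value from `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` should match `fisher_exact_greater`; the standard-library version is used here only to keep the sketch self-contained.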
Clicking the cloud download button in the upper corner downloads the combination table as an Excel file. So this tool, developed by the DCIC using data generated at the Broad Institute LINCS Transcriptomics Center, is a powerful tool for drug discovery and systems pharmacology. In this course, you will have the opportunity to use it to predict drugs for many diseases by combining it with another tool, called GEO2Enrichr, which will be demonstrated later in the course. [MUSIC]