Chapter 4: Designing a COVID-19 diagnostic test. In this chapter, we will discuss how we can use the information contained within our genome annotation to design a COVID-19 diagnostic test. Let's first explore some challenges that were faced in COVID-19 testing towards the beginning of the pandemic. In any rapidly spreading outbreak, a key strategy for minimizing spread is wide-scale testing. If people infected with the pathogen test positive, they can isolate until they can no longer transmit it. As such, a critical first step in fighting a novel pathogen is to develop a diagnostic test that is accurate, fast, and inexpensive. Throughout the COVID-19 pandemic, however, inaccurate tests, coupled with a lack of understanding among the general public of the concepts of false positives and false negatives, have resulted in widespread problems. There are two possible outcomes of a COVID-19 diagnostic test. Either the test will say that the patient has COVID-19, or the test will say that the patient does not have COVID-19. Similarly, there are two possible realities for the patient. Either the patient has COVID-19, or the patient does not have COVID-19. If our test says that the patient has COVID-19, in other words, our test is positive, and the patient actually does have COVID-19, we would call that a true positive. If our test says that the patient does not have COVID-19, in other words, our test is negative, and the patient actually does not have COVID-19, we would call that a true negative. However, if our test says that the patient does have COVID-19, in other words, our test is positive, but the patient in fact does not have COVID-19, we would call that a false positive. Similarly, if our test says that the patient does not have COVID-19, in other words, our test is negative, but the patient actually does have COVID-19, we would call that a false negative.
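To make these four outcomes concrete, here is a minimal Python sketch that labels a single test result and tallies a batch of them. The function names and the example data are hypothetical and made up purely for illustration; they are not real patient results.

```python
from collections import Counter


def classify_outcome(test_positive: bool, has_covid: bool) -> str:
    """Label one test result given what the test said and the patient's true status."""
    if test_positive and has_covid:
        return "true positive"
    if test_positive and not has_covid:
        return "false positive"
    if not test_positive and has_covid:
        return "false negative"
    return "true negative"


def tally_outcomes(results):
    """Count each outcome type over (test_positive, has_covid) pairs."""
    return Counter(classify_outcome(test, truth) for test, truth in results)


# Made-up (test result, true status) pairs, for illustration only.
example_results = [
    (True, True),    # test says yes, patient is infected
    (False, False),  # test says no, patient is not infected
    (True, False),   # test says yes, patient is not infected
    (False, True),   # test says no, patient is infected
    (False, False),  # test says no, patient is not infected
]

counts = tally_outcomes(example_results)
for label in ("true positive", "true negative", "false positive", "false negative"):
    print(label, counts[label])
```

A tally like this is what we would look at when judging a test: the larger the true positive and true negative counts are relative to the false ones, the better the test is doing.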
As you can imagine, true positives and true negatives are desired, whereas false positives and false negatives are undesired. In general, when designing a diagnostic test, we want to maximize the true positives and true negatives while simultaneously minimizing the false positives and false negatives. Now that we have a better understanding of true positives, false positives, true negatives, and false negatives, let's explore how we can utilize this knowledge to design an effective COVID-19 diagnostic test by learning about a lab technique known as PCR. Our diagnostic test will require collecting samples of the viral genome from a COVID-19 patient, but the levels of viral genomic material that can be collected by nasal swab are typically too low to be detected directly by a test. To combat this, we will utilize a common laboratory technique known as polymerase chain reaction, or PCR, which essentially allows us to generate millions of copies of a given fragment of DNA. This may look complex at a glance, but let's break it down. Before starting a PCR experiment, we need a few items. First, we need a template: the DNA sequence of interest that we wish to copy millions of times. Next, we need a pair of primers, which are two short single-stranded DNA fragments that match the left and right ends of the DNA sequence we wish to copy. We also need to add DNA nucleotides that will be used as the building blocks of the copying process. The first step of PCR is denaturation. We heat the sample to around 94 degrees Celsius, which causes our double-stranded template DNA to denature, or split apart. The second step of PCR is annealing. We bring the temperature down to around 68 degrees Celsius, and the primers can anneal, or bind, to the separated strands of our template DNA. The third step of PCR is elongation.
We bring the temperature up to 72 degrees Celsius and we use a protein called polymerase to build a new complementary strand of DNA, nucleotide by nucleotide, starting from our primers. In other words, we elongate our primers to build new strands of DNA. This process repeats over and over again to keep creating copies of our original template DNA. Now that we've learned about PCR, let's use it to design a COVID-19 diagnostic test. Recall that upon assembling the SARS-CoV-2 genome, researchers worked together to curate a high-quality genome annotation. We can make use of this genome annotation to design our PCR diagnostic test. Specifically, we can design PCR primers to match specific portions of the SARS-CoV-2 genome. Then, given a sample collected from the sinus of an individual who might have COVID-19, we can extract RNA from the sample, convert the RNA to DNA using a process called reverse transcription, and then perform PCR using the primers we have chosen. If the PCR experiment is successful, that would imply that viral RNA was present in the original sample, which would mean that the person did indeed have COVID-19. Because of the biochemistry behind a PCR experiment, there are certain properties that are desirable in PCR primers. The specifics of these properties are out of the scope of this course. But fortunately for us, we can use tools like Primer-BLAST to automatically select primers that have these desirable properties. Experts are free to tweak the Primer-BLAST parameters as they desire. But for beginners like us, the default parameters are generally okay to use, at least initially. We now have the initial idea behind the PCR COVID-19 diagnostic test. A successful PCR experiment using our COVID-19 primers on a patient sample would translate to a positive, and a failed PCR experiment would translate to a negative. However, let's revisit what we discussed earlier about true positives, true negatives, false positives, and false negatives.
As mentioned before, true positives and true negatives are desirable. We want our test to say "Yes" if the patient actually does have COVID-19, and we want our test to say "No" if the patient actually does not have COVID-19. How exactly can the undesirable outcomes, false positives and false negatives, come about? First, let's discuss false negatives, which were by far the most frequent type of erroneous diagnostic test result during the COVID-19 pandemic. Recall that our diagnostic test will output "No" if the PCR experiment fails. If the patient does not have COVID-19, the sample will not have RNA matching the SARS-CoV-2 primers we designed. The PCR experiment will fail, which is a good thing. However, is this the only potential cause of a failed PCR experiment? Unfortunately, that is not the case. What if the patient had COVID-19, but the sample collection was performed incorrectly, so no SARS-CoV-2 RNA appeared in the sample? What if the sample was collected properly, but the PCR experiment was performed incorrectly? What if the sample was collected properly and the PCR experiment was performed correctly, but the patient simply had low amounts of SARS-CoV-2 RNA in their sinus, and the amount that was collected was insufficient for the PCR experiment to succeed? Lastly, just like all genomic sequences, the SARS-CoV-2 genome evolves over time. If our PCR primers happen to target a portion of the SARS-CoV-2 genome that evolves rapidly, future mutations could prevent our primers from functioning properly, which would yield a failed PCR experiment. There are unfortunately numerous ways for our PCR diagnostic test to output "No" even though the patient truly has COVID-19. Fortunately for us as bioinformaticians, we have the ability to prevent false negatives that could result from mutations occurring in the primer regions over time. Specifically, we can take advantage of the fact that protein-coding sequences are under high selection pressure.
Because proteins are critical to the functionality of an organism, we generally don't observe very many mutations in protein-coding regions of the organism's genome, because such mutations can critically damage the organism. We wouldn't be able to observe the resulting organism; it would just die out. Thus, when we design PCR primers for our diagnostic test, we can select primers that appear in a part of the SARS-CoV-2 genome that codes for a functionally important protein, such as the spike protein. As an epidemic or pandemic progresses, we can continue sequencing viral samples from patients and monitor what parts of the viral genome are relatively conserved, meaning parts with relatively few observed mutations. What about false positives? Recall that our test will output "Yes" if the PCR experiment succeeds. How is it possible for the PCR experiment to succeed even though the patient did not truly have COVID-19? One simple possibility is contamination. A testing facility could potentially mishandle the samples they receive. SARS-CoV-2 RNA from a positive sample could accidentally contaminate a negative sample. However, what if the testing facility was perfect and the PCR experiment was conducted perfectly? Could we still have false positives? Recall that SARS-CoV-2 is one of many coronaviruses, and as such, much of its genome is identical to, or at least very similar to, the genomes of other coronaviruses. If we accidentally chose primers from the SARS-CoV-2 genome that happened to also appear in other coronavirus genomes, a sample from a patient with a sickness other than COVID-19, such as the common cold, could yield a successful PCR experiment simply due to our negligence. Fortunately for us as bioinformaticians, we have the ability to prevent false positives that could potentially result from PCR primers that match other viruses.
Specifically, we can use Primer-BLAST to get a set of candidate primers, and we can then BLAST each primer to see if it also appears in any sequences other than the SARS-CoV-2 genome. If we have any non-SARS-CoV-2 matches, then that primer would be a bad choice that could result in a false positive, so we would throw it out.
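The filtering idea we just described can be sketched in a few lines of Python. This is only a toy illustration under some loud assumptions: the sequences below are made up, the function names are hypothetical, and we check for exact matches of a primer (or its reverse complement) in a small in-memory dictionary of off-target sequences. A real BLAST search also reports approximate matches and searches large sequence databases, which this sketch does not attempt to reproduce.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def matches(primer: str, genome: str) -> bool:
    """Exact-match check on either strand (an assumption; BLAST is more permissive)."""
    return primer in genome or reverse_complement(primer) in genome


def filter_specific_primers(candidates, off_target_genomes):
    """Keep only the primers that match none of the off-target sequences."""
    kept = []
    for primer in candidates:
        hits = [name for name, genome in off_target_genomes.items()
                if matches(primer, genome)]
        if not hits:
            kept.append(primer)
    return kept


# Toy, made-up sequences for illustration only (not real viral genomes).
off_targets = {"toy_common_cold_virus": "ATGGCGTACGTTAGCCGATAGC"}
candidates = [
    "GCGTACGTTA",  # appears directly in the toy off-target sequence
    "TTTTGGGGCC",  # appears on neither strand
    "GCTATCGGCT",  # its reverse complement appears in the toy sequence
]

specific = filter_specific_primers(candidates, off_targets)
print(specific)  # ['TTTTGGGGCC']
```

Checking the reverse complement matters because DNA is double-stranded, so a primer that matches the opposite strand of another virus could still bind to it. In this toy run, two of the three candidates would be thrown out, and only the primer with no off-target match is kept.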