[MUSIC] So now we'll talk in depth about the types of inputs for syndromic surveillance systems and how these systems actually work. Broadly, the data sources for syndromic surveillance systems are clinical data sources, and then what are broadly called alternate data sources. Clinical data sources can also be separated into two main types: those focused on volume and those focused on specific diagnoses. In terms of volume, these could represent the total numbers of emergency department visits, or the total patient volume in a particular clinic or a set of clinics. It can look at the total admissions to a hospital or to intensive care units within that hospital. It can also look at the total numbers of calls to provider hotlines or 911. It can look at total medical examiner or coroner case volumes, especially in the context of unexplained deaths. It can look at insurance claims or billing data. It can focus on clinical laboratory or radiology ordering volume, so for example the total number of X-rays ordered, or the total number of blood tests ordered. It can look at the volume of poison control center calls. Importantly, this is just looking at the volume of these calls, these admissions, these tests being ordered, and not what those calls or tests were actually about. But alternatively, we can look at diagnoses. So we can, for example, look at the triage log of chief complaints in an emergency department. We can look at the outcome or the diagnosis of emergency department visits. We can look at diagnoses that are made in the context of outpatient or ambulatory care clinics. We can look at the chief complaints in provider hotline or 911 calls, and we can look at diagnoses made by medical examiners. Alternate data sources are broad. We can look at absenteeism: are people missing school or work more than would be expected? We can look at over-the-counter medication sales, health care provider database searches, the volume of Internet-based health inquiries by the public,
Internet-based illness reporting, and animal illnesses or deaths. Now overall, these data sources really rely on detecting features of a clinical case prior to the confirmation of that disease. In other words, what does somebody do when they are sick? Well, we know that they often seek information via the Internet. We know that they call 911. We know that they may visit outpatient services. They'll go to the emergency department. They'll phone provider hotlines and be triaged appropriately. They will miss school or work. They will buy medications. They will stock up on groceries. And a series of other behaviors. But it's really about trying to understand what people do, and developing surveillance systems to match those behaviors. When we look at how the public seeks information, these days it's exclusively or nearly exclusively via the Internet. And we can measure that using web clicks or search patterns. There are examples, like the Global Public Health Intelligence Network, which is really focused on CBRN, or chemical, biological, radiological, and nuclear threats. And then Google Flu Trends. Now Google Flu Trends has been around for more than a decade, and it famously started evaluating and predicting flu outbreaks based on the types of things that people were searching for. Here's a schematic comparing the influenza-like illness surveillance network that we discussed in earlier talks, to Google Flu Trends, and finally to a transformed Google Flu Trends, where they use the Google Flu Trends data but statistically manipulate it to control for some outliers. On the y-axis, we're looking at the percent of visits that are related to influenza-like illnesses, and on the x-axis, we're looking at dates from January 2011 through to January 2014. And what you can see here, marked with arrows, are some places where Google Flu Trends was earlier to predict a coming outbreak of influenza than the more traditional case-based system.
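One way to quantify the advantage those arrows represent is as the gap between the dates when each signal first crosses an epidemic threshold. Here's a minimal sketch, with entirely made-up weekly values and an assumed threshold of 1.2% of visits; a real comparison would use the actual ILINet and Google Flu Trends feeds:

```python
from datetime import date, timedelta

# Hypothetical weekly percent-of-visits values (made-up numbers), one per week.
search_signal = [0.8, 0.9, 1.4, 2.1, 3.0, 3.2, 2.5]  # search-based signal
clinic_signal = [0.7, 0.8, 0.9, 1.3, 2.2, 3.1, 2.8]  # case-based signal
week_starts = [date(2013, 11, 4) + timedelta(weeks=i) for i in range(7)]

def onset_week(signal, threshold=1.2):
    """Return the first week whose value crosses the (assumed) epidemic threshold."""
    for wk, value in zip(week_starts, signal):
        if value >= threshold:
            return wk
    return None

# Lead time: how much earlier the search-based signal crossed the threshold.
lead = onset_week(clinic_signal) - onset_week(search_signal)
print(f"Search-based signal led the case-based signal by {lead.days} days")
```

With these toy numbers the search-based series crosses a week earlier, which is the kind of lead time the arrows in the schematic illustrate.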
And it is that time difference, T, between when Google Flu Trends would have predicted that an influenza outbreak was coming, and when traditional systems like the influenza-like illness surveillance system would have detected it, that matters. Importantly, one can also see an example where Google Flu Trends over-predicted, or where it was a false positive in terms of the severity of the outbreak. This highlights the importance of interpreting syndromic surveillance in the context of confirmed case definitions as well, achieving a balance between those two systems. Another example where we looked at search patterns as predicting outbreaks was the listeriosis outbreak in 2008 in Canada. This was an outbreak that was ultimately determined to be due to contaminated machinery at a single facility in Canada. But given the volume of meat that facility was producing, it caused an outbreak across the country. If we look at this graph, we can see the relative search volume from 0 to 12 on the y-axis, and on the x-axis the months of June through September 2008. Now, retrospectively they confirmed cases and saw that those cases really picked up in July. But it wasn't until late August that an official announcement was made related to that outbreak. What we can see is that looking at search volume for listeriosis, and also for Listeria, would have provided about a month's notice that there was an outbreak happening before the official announcement. And that's really due to people individually going to their doctor, having symptoms associated with Listeria, being diagnosed accordingly, being sent home with their diagnosis, and immediately Googling it to try to understand what is happening. But the system as a whole didn't recognize that there was an outbreak for another month, and picking up that search signal would have allowed for an earlier intervention.
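Detecting such a rise in search volume before any official announcement is essentially an aberration-detection problem. A minimal sketch, loosely in the spirit of the CDC's EARS C2 rule and using entirely made-up daily counts, flags any day whose count exceeds the mean plus three standard deviations of the preceding week:

```python
import statistics

# Hypothetical daily search counts for "listeriosis" (made-up numbers):
# a flat baseline, then a sustained rise well before any announcement.
counts = [3, 4, 2, 5, 3, 4, 3,        # baseline week
          4, 3, 5, 4, 3, 4, 5,        # baseline week
          9, 12, 15, 14, 18, 21, 25]  # search volume picking up

def c2_alerts(series, baseline=7, z=3.0):
    """Flag day indices where the count exceeds mean + z*stdev of the
    prior `baseline` days (a simplified, EARS-C2-style rule)."""
    alerts = []
    for i in range(baseline, len(series)):
        window = series[i - baseline:i]
        mu = statistics.mean(window)
        sd = statistics.stdev(window)
        # Floor the stdev so a perfectly flat baseline can't alert on noise.
        if series[i] > mu + z * max(sd, 0.5):
            alerts.append(i)
    return alerts

print("First aberration flagged on day index", c2_alerts(counts)[0])
```

With these toy counts the rule fires on the first day of the rise, which is the sort of automated flag that could have surfaced the Listeria search signal weeks before the official recognition.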
And ideally it would have avoided those Listeria cases, with the recognition that Listeria, especially in the context of pregnancy, can be really dangerous for both the mother and the fetus. Now another example, as we talked about, is that one can look at visits to the emergency department. And one can do that using the measurement of chief complaints, or the measurement of billing codes using ICD-10, the International Classification of Diseases 10 standard for diagnoses. But it's important to note that this is also really complicated. Chief complaints, often in the context of EMR, or electronic medical record, systems, are entered as free text, and there's significant variation. This is a review from the New York City Department of Health and Mental Hygiene of their chief complaint database. You can see that there are 92 different variations of spelling related to the word abscess. There are 349 variations related to spelling of the word diarrhea. And for the word abscess, it's actually spelled incorrectly 45% of the time. The third most commonly misspelled word is vomiting, and if we look at some of the examples of those 379 ways that vomiting is spelled, we can see entries like andvomiting, bomiting, cvomiting, and v0mitting. Some of these things are related to shorthand that clinicians use, but it highlights the complexity of trying to analyze across these entries and really assess whether there's an outbreak happening. Similarly, for billing codes, the International Classification of Diseases 10 has far too many categories to track individually, so it requires grouping of categories. Here, you can see an example where vomiting and diarrhea are combined into gastrointestinal symptoms, and cold, respiratory, and asthma are combined into respiratory symptoms. But it highlights the aggregation that needs to happen to support surveillance. Another example, and one that really builds on an earlier talk, is the use of over-the-counter medications.
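The spelling variation described above is often handled by fuzzy-matching free-text complaints against a canonical vocabulary before counting them, and then aggregating the canonical terms into syndrome groups. A minimal sketch, where both the vocabulary and the grouping are illustrative assumptions rather than the actual NYC dictionary:

```python
from difflib import get_close_matches

# A small, hypothetical vocabulary of canonical chief-complaint terms;
# a real system would use a full syndrome dictionary.
canonical = ["vomiting", "diarrhea", "abscess", "cough", "fever"]

def normalize(raw_complaint):
    """Map a free-text, possibly misspelled term to the closest canonical
    term, or None if nothing is similar enough."""
    match = get_close_matches(raw_complaint.lower(), canonical, n=1, cutoff=0.6)
    return match[0] if match else None

# Misspellings drawn from the NYC chief-complaint review discussed above.
for raw in ["andvomiting", "bomiting", "cvomiting", "v0mitting"]:
    print(raw, "->", normalize(raw))

# Aggregation into broader syndrome groups, mirroring the kind of
# grouping described for ICD-10 categories (illustrative mapping).
syndrome_groups = {
    "vomiting": "gastrointestinal",
    "diarrhea": "gastrointestinal",
    "cough": "respiratory",
    "fever": "fever",
}
```

All four misspellings normalize to "vomiting" here, so they would all be counted in the gastrointestinal syndrome group rather than scattered across hundreds of distinct strings.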
As we know, and as we've earlier discussed, there's a strong correlation between respiratory diseases and people going out and buying flu remedies. In this graph, what you're looking at on the y-axis is smoothed data ratio residuals, but really we're looking at overall numbers. And on the x-axis, again, we're looking at time from September 29, 2001, through to January of 2003. And what I've done here is show some examples of where the use of over-the-counter flu remedy sales would have predicted an outbreak happening. It's also important to note that I've highlighted some examples where it was a false negative, in the sense that we didn't see that increase in purchases, yet there were increased respiratory diagnoses. And I've also highlighted examples where it looked like it was going to predict an outbreak, but actually no outbreak occurred. So it highlights that these methods are clearly valuable, but there's still a lot of imprecision compared to using confirmed cases. [MUSIC]
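The true positives, false positives, and false negatives highlighted on that graph are exactly what one would tabulate when validating a sales-based signal against confirmed cases. A toy sketch with made-up week-by-week flags:

```python
# Hypothetical week-by-week flags (made-up data): did over-the-counter
# flu-remedy sales spike, and was there a confirmed rise in respiratory
# diagnoses that week?
sales_alert        = [False, True, True, False, True,  False, True]
confirmed_outbreak = [False, True, True, True,  False, False, True]

true_pos  = sum(a and c for a, c in zip(sales_alert, confirmed_outbreak))
false_pos = sum(a and not c for a, c in zip(sales_alert, confirmed_outbreak))  # alerted, no outbreak
false_neg = sum(c and not a for a, c in zip(sales_alert, confirmed_outbreak))  # outbreak missed

sensitivity = true_pos / (true_pos + false_neg)  # share of outbreaks caught
ppv = true_pos / (true_pos + false_pos)          # share of alerts that were real
print(f"sensitivity={sensitivity:.2f}, positive predictive value={ppv:.2f}")
```

In practice it is this kind of tabulation, repeated over long time series, that quantifies the imprecision of the syndromic signal relative to confirmed cases.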