In this session, we will discuss some basic concepts in evidence synthesis and parameter estimation. “All models are wrong, but some are useful.” This quote is from George Box, an eminent statistician of the 20th century. It is useful advice to heed when we evaluate the validity of the assumptions and results of infectious disease models. For example, in the SIR model, we assume that the force of infection is the same for all susceptibles, which is the same as saying that everyone in the population makes contact with everyone else with equal probability. This so-called homogeneous mixing assumption is clearly unrealistic; for example, people of similar age are more likely to have contact with each other. Although the SIR model is built on this and other unrealistic assumptions, it robustly demonstrates the general principle of how herd immunity limits the spread of disease. In this sense, the SIR model is useful for generating insights into epidemic dynamics, and indeed it has been an instrumental tool in infectious disease epidemiology. However, as Albert Einstein said, “Everything should be made as simple as possible, but not simpler.” This means that ideally, the model should be (i) complex enough to provide a robust answer to the question we are trying to address but (ii) simple enough that we don’t get distracted by unnecessary details. We learn from the epidemiologic triangle that epidemics arise from the interplay among the pathogen, the host and the environment. As such, when infectious disease models are used to inform public health policies, their assumptions, structures and parameters often need to be based on the best available data from all related disciplines. For example, when evaluating vaccine recommendation or allocation strategies, we need estimates of transmissibility and severity in order to assess the reduction in morbidity and mortality associated with each vaccine strategy.
These estimates are often based on multiple sources of data, including clinical surveillance, clinical trials, outbreak investigations, hospital-based data, laboratory surveillance, serologic data, etc. In this so-called evidence synthesis approach, the model is an interpretive tool for integrating multiple sources of data to form the evidence base for decision making. This is why in practice, infectious disease modeling is often a multidisciplinary effort among modellers, microbiologists, public health practitioners, frontline clinicians, ecologists, statisticians, and scientists from many other related fields. The most difficult step in evidence synthesis is parameter estimation. Parameters are the numerical constants in the model that determine the epidemic dynamics. For example, the SIR model has two parameters: beta, the rate of making contacts that lead to infection when the contact is between a susceptible and an infected person, and D, the mean infectious duration. Some model parameters are directly measurable, while others need to be inferred indirectly from other observable data such as epidemiologic data. For example, in the SIR model, the mean infectious duration may be estimated by longitudinally following a cohort of infected people. However, beta cannot be directly measured because it is an abstract parameter; we often don’t have a precise physical definition of the contact that beta represents, let alone a way to measure its rate of occurrence. For example, while we agree that influenza can spread via droplet and airborne transmission, we don’t know precisely how close contacts have to be, and for how long, in order to result in influenza transmission. One solution is to infer beta from the epi-curve, which means finding the value of beta that makes the model epi-curve as similar as possible to the observed epi-curve. This procedure is dubbed “fitting the model to the data.” Let us now illustrate how this works using a simple model.
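For reference, one common way to write the SIR model with these two parameters is sketched below. The notation is an assumption made for illustration: S, I and R stand for the numbers of susceptible, infectious and recovered people, beta multiplies the product of S and I, and people leave the infectious state at rate 1/D.

```latex
\begin{align}
\frac{dS}{dt} &= -\beta S I, \\
\frac{dI}{dt} &= \beta S I - \frac{I}{D}, \\
\frac{dR}{dt} &= \frac{I}{D}.
\end{align}
```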
Suppose this is the epi-curve for a novel pathogen in a population of 100,000 and outbreak investigation suggests that the mean infectious period is 3 days. To fit the SIR model to this epi-curve, we can try different values of beta in the model, measure the similarity between the model epi-curve and the observed epi-curve, and choose the beta value that gives the closest match. Because we are only inferring one parameter, we can find the best-fit value by simply plotting the similarity measure against beta. Commonly used measures from introductory statistics include the likelihood, where a higher value means a closer fit, and the sum of squares, where a lower value means a closer fit. So how does the model serve as an interpretive tool for the epi-curve data? In other words, now that we have estimated beta from the epi-curve, how does the estimated value of beta help us understand the dynamics of this epidemic? Recall that the basic reproductive number R0 and the mean generation time Tg are the two main determinants of epidemic dynamics. However, R0 and Tg do not explicitly appear in our formulation of the SIR model. Instead, we use beta and D to parameterize the SIR model. This is because when constructing the SIR model from first principles, it is more natural and intuitive to describe the underlying processes using the contact rate and the recovery rate. We can deduce R0 in the SIR model as follows. When one infected person is seeded into a population of N susceptibles, that person is equally likely to make infectious contact with each of the N susceptibles, and does so at rate beta per susceptible for an expected duration of D, which is the mean infectious period. Therefore, R0, the expected number of secondary cases generated by this infected seed, is simply beta times N times D. With further analysis, which we won’t go over here, we can show that the mean generation time Tg is simply the mean infectious duration D.
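The fitting procedure just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used in any particular study. Because the observed epi-curve from the slide is not available here, the "observed" data are synthetic, generated from the model itself with a hypothetical R0 of 1.8. The function names, the simple Euler integration, the single initial case and the grid of candidate values are all choices made for this example. The similarity measure is the sum of squares, so the best-fit beta is the one with the smallest value.

```python
def simulate_sir(beta, D, N, days, i0=1.0, steps_per_day=20):
    """Return daily new infections from an SIR model (simple Euler steps).

    Assumed form: dS/dt = -beta*S*I, dI/dt = beta*S*I - I/D.
    """
    dt = 1.0 / steps_per_day
    S, I = N - i0, i0
    daily_incidence = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            infections = beta * S * I * dt
            recoveries = I / D * dt
            S -= infections
            I += infections - recoveries
            new_cases += infections
        daily_incidence.append(new_cases)
    return daily_incidence


def sum_of_squares(observed, modelled):
    """Sum of squared differences; smaller means a closer fit."""
    return sum((o - m) ** 2 for o, m in zip(observed, modelled))


def fit_beta(observed, D, N, candidates):
    """Grid search: return the candidate beta with the smallest sum of squares."""
    return min(
        candidates,
        key=lambda b: sum_of_squares(observed, simulate_sir(b, D, N, len(observed))),
    )


N = 100_000   # population size from the example
D = 3.0       # mean infectious period in days from the example

# Synthetic stand-in for the observed epi-curve (hypothetical R0 = 1.8).
true_beta = 1.8 / (N * D)
observed = simulate_sir(true_beta, D, N, days=60)

# Candidate betas corresponding to R0 = 1.05, 1.10, ..., 3.00.
candidates = [(k / 100) / (N * D) for k in range(105, 301, 5)]

beta_hat = fit_beta(observed, D, N, candidates)
r0_hat = beta_hat * N * D   # R0 = beta * N * D
print("estimated beta:", beta_hat)
print("implied R0:", r0_hat)
```

With real surveillance data, the list `observed` would hold the reported daily case counts instead, and a likelihood-based measure could replace the sum of squares.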
Despite the simplicity of the method in this example, a landmark study by Professor Marc Lipsitch and colleagues in 2004 used a slightly modified version of it to provide robust estimates of the reproductive number of the 1918 pandemic influenza A/H1N1 strain by fitting an epidemic model to pneumonia and influenza death epi-curves from 45 US cities. The conclusion was that R was about 2 to 3. When studying epidemic dynamics from epi-curves, we often have multiple parameters to estimate, in which case fitting models to data requires sophisticated techniques in numerical optimization and statistical inference. For example, to characterize the epidemic dynamics of the 2009 influenza pandemic in Hong Kong, we fit a model with more than 20 parameters to the pandemic H1N1 hospitalization epi-curves and seroprevalence curves. The list of parameters includes the initial reproductive number, the mean generation time, the effect of proactive school closure and summer holidays on reducing disease transmission, the effect of age on susceptibility, and the effect of age on the probability of hospitalization and seropositivity among infected people. One of our findings was that summer holidays reduced the transmission of pandemic H1N1 among school-aged children by around 60 percent for children in kindergartens and primary schools and by around 20 percent for teenagers in high schools. To summarize, in this session, we have discussed some basic concepts in evidence synthesis and parameter estimation.