[MUSIC] I've told you conceptually about health intelligence, we've looked at the health intelligence cycle, and I've told you that there are lots of data just out there. So let's have a look at that in a bit more detail. There are two buckets of data: what we term routinely collected data and non-routinely collected data. Let's talk about routinely collected data first, as they're the majority of what you're likely to work with in the future, and they're also potentially available to you right now via the Internet. Higher income settings are normally awash with routinely collected data. In public health intelligence we think about four types of routinely collected data. Number one, demographic data. In many countries a census is taken every ten or so years, although increasingly more agile approaches are being used. And in the UK, because everyone is meant to be registered with a doctor, these registrations too can be used for health service planning. Number two, health event data. Births and deaths are normally well recorded and definitive. Depending on your health system, there may be hospital or other process statistics that can also be generated. Number three, circumstantial data. This area is increasing massively at the moment. While these data often depend on government records, there are an increasing number of non-governmental resources that may also be drawn upon. This is particularly the case in lower income settings, where less stable government means NGOs and other organizations may be collecting housing or education data. And number four, reference data. This group is perhaps the least intuitive because it consists of data that in some way support the other groups. In the UK we have a series of what we call postcodes, similar to US zip codes, that indicate a relatively small geographical area. When using health service data we'll often have the postcode of the patient, and this is what is used to allocate them to a particular geography.
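As a rough illustration of how reference data support the other groups, the sketch below allocates patient records to a geography through a postcode lookup. It is only a minimal sketch: the postcodes, area names, and the helper names `normalise_postcode`, `allocate`, and `count_by_area` are all made up for illustration and are not taken from any real dataset or library.

```python
from collections import Counter

# Hypothetical reference data: postcode -> geography (illustrative values only).
POSTCODE_LOOKUP = {
    "AB1 2CD": "Area North",
    "AB1 3EF": "Area North",
    "XY9 8ZW": "Area South",
}

# Hypothetical patient records, with the messy formatting real extracts often have.
PATIENTS = [
    {"patient_id": 1, "postcode": "ab1 2cd"},
    {"patient_id": 2, "postcode": "XY98ZW"},
    {"patient_id": 3, "postcode": "AB1 3EF"},
    {"patient_id": 4, "postcode": None},       # missing postcode
    {"patient_id": 5, "postcode": "QQ1 1QQ"},  # not present in the reference data
]


def normalise_postcode(raw):
    """Upper-case, drop whitespace, and put one space before the last three characters."""
    if not raw:
        return None
    compact = "".join(raw.split()).upper()
    if len(compact) < 5:
        return None
    return compact[:-3] + " " + compact[-3:]


def allocate(patients, lookup):
    """Add an 'area' field to each record; records that cannot be matched are kept and flagged."""
    allocated = []
    for patient in patients:
        area = lookup.get(normalise_postcode(patient["postcode"]), "Unallocated")
        allocated.append({**patient, "area": area})
    return allocated


def count_by_area(allocated):
    """Count records per area, including the 'Unallocated' group."""
    return Counter(record["area"] for record in allocated)


if __name__ == "__main__":
    for area, n in sorted(count_by_area(allocate(PATIENTS, POSTCODE_LOOKUP)).items()):
        print(area, n)
```

Keeping the unmatched records in an explicit "Unallocated" group, rather than dropping them, makes it visible how many patients the linkage could not place.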
By linking a list of patients to a postcode dataset you can figure out roughly where they live, or at least where they claim to. Routine data are really valuable because they're so readily available and so inexpensive to use, and they allow you to compare one area with another or the same area over time. But they're limited too. They're unlikely to fully answer the question we're posing; they're nonspecific. They mask variations within subgroups of interest, they're sometimes quite dated, and they're subject to a number of biases too. For example, if you're trying to look into vulnerable populations, those at the margins of society are least likely to engage in registration. Take those who are street homeless, for example: if you don't have a home then even a census may not capture you. Before we move on I want to highlight the difference between census-based and sample-based approaches to data collection. When the term census is used in public health, this means by definition an approach that is comprehensive and includes all people. As in many countries, failing to submit a census return in the United Kingdom and the United States is a criminal offense. Both countries undertake a countrywide census every ten years. They're a rich and vital source of information that informs government planning. On the other hand, sample-based approaches are just that: they assume that the sample is representative of a population. Now, in theory a census should not be prone to selection or non-response bias, whereas a sample-based approach typically is. In reality, as I've mentioned before, census-based approaches remain subject to some bias. A census is also hugely expensive, and several countries are increasingly looking to use more agile and intelligent ways to record demographic information more frequently, using sample-based approaches.
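To make the point about non-response bias concrete, here is a small sketch with an entirely invented population. The group sizes, prevalences, and response rates are assumptions chosen only to show the mechanism, and `build_population` and `prevalence` are hypothetical helper names.

```python
def build_population():
    """Build a made-up population of 1,000 people (all figures are illustrative assumptions)."""
    people = []
    # 900 housed people: 10% have the condition, and all of them respond.
    for i in range(900):
        people.append({"group": "housed", "has_condition": i < 90, "responds": True})
    # 100 people at the margins: 40% have the condition, but only 1 in 5 responds.
    for i in range(100):
        people.append({"group": "marginalised", "has_condition": i < 40, "responds": i % 5 == 0})
    return people


def prevalence(people):
    """Proportion of the given people who have the condition."""
    return sum(person["has_condition"] for person in people) / len(people)


if __name__ == "__main__":
    population = build_population()
    responders = [person for person in population if person["responds"]]
    print("True prevalence:", round(prevalence(population), 3))
    print("Prevalence among responders:", round(prevalence(responders), 3))
```

Because the group with the higher prevalence is the group least likely to respond, the figure among responders comes out lower than the true figure. The same mechanism applies whether the gap arises in a survey sample or in a census that fails to reach some people.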
You'll cover many, many types of routinely collected data over the course of your public health practice, and many of these data will be highly context specific. So it's not hugely valuable to learn a list or for me to give you too much detail. But there is one last thing I want you to know, and it applies to both routinely and non-routinely collected data: it's about metadata. Metadata are data that pertain to another set of data. In general use the term metadata can mean different things. But in public health intelligence, metadata is the word used to describe information that enables the user to understand where, how, and when the data they're using have come about. Earlier on I mentioned that on receiving a dataset from an external partner, often my first email back to them will be to clarify what different variables mean. For example, whether one means yes or no, or whether the data cover a calendar versus a financial year. If you were to have received the metadata, then all these questions should typically be answered. Metadata are important for two reasons in public health intelligence. Number one, they enable you to understand and interpret the data. From this you'll be able to appraise the strengths and limitations of the data in a particular context. You might be able to infer whether the numerator is an underestimate or an overestimate of the variable of interest. And number two, metadata provide an audit trail and substantiate your judgment. This is absolutely critical in public health. You'll know from whatever you've done before that it's dead easy to make a mistake on a spreadsheet. Now, think about a spreadsheet or a statistic that has passed through numerous hands, over many emails and document iterations. Metadata as simple as recording version control or change logs allow you to quality assure what's happened and reduce the risk of error. If an error is found, then metadata also enable you to establish what happened and where things went wrong.
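As one way of picturing what a metadata record might hold, the sketch below stores variable codings, the reporting period, and a simple change log alongside a dataset. It is an assumption-laden illustration, not a standard: the class names, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class VariableInfo:
    name: str
    description: str
    coding: dict = field(default_factory=dict)  # e.g. {1: "yes", 0: "no"}


@dataclass
class ChangeLogEntry:
    version: str
    changed_on: date
    changed_by: str
    note: str


@dataclass
class DatasetMetadata:
    title: str
    source: str             # where the data came from
    collection_method: str  # how they were collected
    period_start: date      # when the data cover, start
    period_end: date        # when the data cover, end
    year_type: str          # "calendar" or "financial"
    variables: list = field(default_factory=list)
    change_log: list = field(default_factory=list)

    def decode(self, variable, value):
        """Translate a coded value into its documented meaning."""
        for info in self.variables:
            if info.name == variable:
                return info.coding.get(value, "unknown code")
        raise KeyError(variable)

    def record_change(self, version, changed_on, changed_by, note):
        """Append an entry to the audit trail."""
        self.change_log.append(ChangeLogEntry(version, changed_on, changed_by, note))

    def current_version(self):
        return self.change_log[-1].version if self.change_log else "unversioned"


# Hypothetical example record (all values invented for illustration).
example = DatasetMetadata(
    title="Smoking status extract",
    source="External partner (hypothetical)",
    collection_method="Routine registration records",
    period_start=date(2022, 4, 1),
    period_end=date(2023, 3, 31),
    year_type="financial",
    variables=[VariableInfo("smoker", "Current smoker", {1: "yes", 0: "no"})],
)
example.record_change("1.0", date(2023, 5, 2), "analyst_a", "Initial extract received")
example.record_change("1.1", date(2023, 5, 9), "analyst_b", "Removed duplicate rows")
```

With a record like this, `example.decode("smoker", 1)` answers the "does one mean yes or no" question directly, and the change log shows who altered the data and when.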
So we've talked about routinely collected data, and I've also explained to you what metadata are and their value. Over the last several years I've noticed that students tend to worry that there aren't perfect data to describe their problem or quantify their impact. I always seek to reassure them that perfect data in public health are rare, if not unheard of. All the data we use are subject to limitations. And our value as public health professionals is to apply our specialist expertise and judgment to say that, while this may not be perfect, this is what I think is most useful. You can always give your caveats and explain why it's going to be imprecise. But decision makers out there recognize that nothing in the world is perfect and that our judgment is better than nothing. So get used to dealing with routinely collected data, and get used to using multiple lines of inquiry to triangulate on a position or an estimate. It's our judgment that they're after, and so long as we're more often right than we are wrong then we're doing pretty well. [MUSIC]