Hi. In this module, I'm going to talk about administrative and archival microdata, which play a growing role in contemporary social science research. When I talk about administrative and archival records, I'm talking about records that were created by governments, organizations, nonprofits, and other institutions for other reasons, that is, reasons other than conducting social science research. These can include census forms, since in many cases censuses were originally compiled for reasons other than social science research, for example, apportionment. They also include tax records, probate records (that is, records of assets at the time of death), deeds, student records, health records from insurers or providers, and employment records. Technological progress has enabled the construction of databases from these records and the analysis of the resulting databases. For past populations, they may be the only source of microdata that we have.
Let's compare survey data with administrative and archival data. Survey data typically come from samples, that is, samples drawn from some larger population about which we would like to generalize. In many cases, administrative or archival data actually cover entire populations because, again, the data were compiled by governments or other organizations to record the population of a country, a region, a city, or a district in its entirety. Survey data typically have fewer observations because of the logistical complexity of launching surveys. A typical survey may have anywhere between a few thousand and ten or twenty thousand observations. Administrative and archival data will typically have more observations because, again, they typically record entire populations. Survey data, on the other hand, have more variables because surveys can be designed for the purpose of research. So researchers, at least if they have the budget for it, can ask respondents a wide variety of questions.
Administrative or archival data will typically have fewer variables because any particular dataset was probably constructed for a very specific purpose by a government or other organization, and will only have variables that are relevant to that purpose. Survey data tend to be contemporary: the modern era of survey research began in the 1950s. Historical data will often be available for the 18th and 19th centuries, and, again, administrative and archival data are often all that we have for the historical past. Survey data are designed, that is, we get to design a survey from scratch. Administrative and archival data are discovered. Survey data are distinguished by geographic breadth: we can often have a nationally representative sample. Administrative and archival data are often distinguished by their longitudinal depth. So, as we'll see in a later module, we often see families that are covered for multiple generations.
Let's talk about the process of constructing a database. The process of constructing a database of administrative or archival records begins with locating data, that is, finding it in the archives or in the government agency that's compiling it. Then, there's the process of acquiring the data, which may be difficult. For contemporary data, that is, data on living people compiled by a government agency, protocols may have to be developed to make sure that the data are not misused and to protect the confidentiality of the data. For historical data, arrangements may have to be made with archives or libraries to provide access. This may be difficult, especially if the original documents are fragile and require special handling or are proprietary. Then, there's the process of transcribing the data. Especially for historical data, it's typically necessary to hire somebody to transcribe the data into a database. Developing the procedures for this transcription is a major effort in its own right. Then, there's the process of linkage.
In many cases, especially with historical data, the same people may be recorded in multiple databases or multiple original sources. As the databases corresponding to these different sources are constructed, the same individual has to be located in these different sources and their records connected together. This is often a labor-intensive process because the names may not be written in the same way in different sources, and there may be other problems making these links. Next, there's the process of cleaning the data to make sure that errors are addressed. Errors can creep in during the entry of the data, or there may be errors in the original data. Finally, if we're lucky, we get to the stage where we can analyze the data. Typically, as we conduct the analysis, we realize that we need even more data, and we go on to try to locate additional data.
Now we'll talk about some of the challenges of working with administrative or archival data. One is access. The most interesting sources may be proprietary, or the information may be sensitive, especially for contemporary populations. Access may be difficult because the owners of the data may require various protocols to be put in place, again, to protect the confidentiality of the individuals recorded. Transcription can be difficult, especially for historical documents written by hand. In some cases, the original cursive script may be very difficult to read. I know of examples of Japanese documents from the 18th and 19th centuries that were written in a cursive script that very few people in contemporary Japan can still read, so the people doing the transcription have to have special training. Categorization and standardization is a major challenge.
In historical documents, entries for occupations, health problems, and disabilities may be open-ended, and they may not fall into categories that we're used to. They may not resemble occupations, disabilities, or illnesses as we understand them now. A lot of work may be required to turn them into variables that we can actually use in an analysis. Linkage can be very difficult. Connecting records of the same person in different sources may be a challenge because the same person's name may be spelled differently in different sources. Or, in different sources, you may have multiple people with the same name, and it may be difficult to distinguish which one of them should connect to which record.
Now I'll mention some of the major contemporary population databases that are out there, as examples of the sorts of things that you might be able to work with if you pursue research in this direction. The Scandinavian countries, Sweden, Finland, Denmark, and Norway, have extensive contemporary population databases. These are population registers that, at their core, record births, deaths, movements, and family status. They include, or can be used to produce, complete life histories with remarkable detail and context, following people from the time they were born until the time they died and linking them to their parents. These population data can be linked to health records, employment records, education records, and many other databases, allowing, again, for complete life histories of individuals to be studied and for examination of connections, for example, between education, income, employment, and health.
Now, there are a lot of historical databases as well. Historical records of individuals and families survive in archives. Examples include records of baptisms, burials, and marriages in parish registers. Another example is catechetical records.
In some countries in the past, children had to take exams to prove that they could read the Bible at a particular age, and these records provide important insight into literacy in past times. They may have other details as well. Population registers, constructed for administrative purposes by different governments in the past, record families, households, and individuals over time. Tax records have been used to study wealth inequality and income inequality in the past, at least among the elites. Genealogies compiled by families offer information about kin groups in the past in different countries. And censuses, of course, provide records of entire populations.
For many locations, these sorts of sources are being transcribed into databases. In some cases, it's possible to find records of the same individual in multiple sources, and then we can link them together to provide life histories, or even family histories, as I described a moment ago. Censuses are now being linked across time as well. That's a little more challenging, but it can also provide life histories, at least for large samples from national populations.
One example of a historical database that's relatively easy to access is the Historical Sample of the Netherlands. This provides life histories for a sample of 77,000 persons born in the Netherlands between 1812 and 1922. Sources include birth, marriage, and death certificates, and population registers. The information provided includes age at marriage, religious affiliation, number of children born, occupation, birthplace, literacy, migration, and social networks.
These data are being linked to other sources as well, including sources from later periods, so researchers can follow at least a sample of people in the Netherlands from the middle of the 19th century, or even the beginning of the 19th century, through the industrial revolution in the Netherlands and up into the 20th century, providing a great deal of insight into the social, family, and population history of this period.
Another major source of data is censuses. I'd like to mention a few of the websites where you can access census data. IPUMS is a major project to make available the decennial U.S. censuses from 1850 to the present, that is, the individual-level data from the censuses. In fact, they are now making available even earlier censuses, which are household censuses. They even have a facility that lets you analyze data online without any special training or expertise. IPUMS International provides census data for other countries from 1960 to the present, and it includes countries from Asia, Europe, and all over the world. The North Atlantic Population Project provides census data from 1703 to 1911 for eight Northern European and North American countries. And you see the URL here.
So, overall, there's a lot of work going on to produce large databases from archival and administrative data. These data can be contemporary, especially in the case of the Scandinavian population registers. And now there are increasingly accessible historical data from different populations. Probably the easiest place to get started is the IPUMS website.
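As a rough illustration of the linkage step discussed earlier in this module, here is a minimal sketch in Python of how records from two sources might be connected when names are spelled differently. Everything in it is an assumption made for illustration: the record layout, the field names (`id`, `name`, `birth_year`), the sample people, the similarity threshold, and the function names are all hypothetical, and real linkage projects generally use more careful methods than this. The sketch compares names with the standard library's `difflib.SequenceMatcher`, only considers candidates with a similar birth year, and leaves a record unlinked when two candidates tie, which mirrors the problem of multiple people sharing the same name.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def link_records(source_a, source_b, threshold=0.85, max_year_gap=1):
    """Pair each record in source_a with its best candidate in source_b.

    A candidate must have a birth year within max_year_gap years and a
    name similarity of at least threshold (both values are illustrative
    assumptions). If the two best candidates tie, the record is left
    unlinked rather than guessed at.
    """
    links = []
    for rec_a in source_a:
        candidates = []
        for rec_b in source_b:
            # Skip anyone whose recorded birth year is too far away.
            if abs(rec_a["birth_year"] - rec_b["birth_year"]) > max_year_gap:
                continue
            score = name_similarity(rec_a["name"], rec_b["name"])
            if score >= threshold:
                candidates.append((score, rec_b["id"]))
        if not candidates:
            continue
        candidates.sort(reverse=True)  # highest score first
        if len(candidates) > 1 and candidates[0][0] == candidates[1][0]:
            continue  # ambiguous: same score for two different people
        links.append((rec_a["id"], candidates[0][1], candidates[0][0]))
    return links


# Hypothetical records from two sources, with spelling variants.
parish = [
    {"id": "p1", "name": "Johannes van der Berg", "birth_year": 1840},
    {"id": "p2", "name": "Maria Jansen", "birth_year": 1852},
]
census = [
    {"id": "c1", "name": "Johannes van den Berg", "birth_year": 1841},
    {"id": "c2", "name": "Maria Janssen", "birth_year": 1852},
    {"id": "c3", "name": "Maria Jansen", "birth_year": 1870},
]

links = link_records(parish, census)
for parish_id, census_id, score in links:
    print(parish_id, "->", census_id, f"(similarity {score:.2f})")
```

In this sketch, the census record with an identical name but a birth year eighteen years later is never considered, which shows why a second piece of information besides the name is usually needed to tell people apart.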