Then the last part, which you might have heard me talk about before, is the idea that Big Data cannot tell us anything about changing futures. That goes back to a big discussion from a few years ago, when some people claimed that big data, and machine learning especially, leads to the end of theory. How did it traditionally work when we did scientific theory and tried to discover something? Here we have one of the greatest scientific theoreticians we ever had: Isaac Newton. Newton answered a question such as "how often is there a full Moon?" by setting up a comprehensive theory, with his differential equations, and that is the equation that predicts how often there is a full Moon: the Moon goes around the Earth, the Earth goes around the Sun, and from that you can calculate how often there actually is a full Moon. So he had a theory behind it.

What would a Big Data analyst do? A Big Data analyst does not care about theory at all, just like the Google Translate people, who did not ask linguists about the theory behind language; they basically just correlated data. A Big Data scientist would simply go to Google search and look up when people google "full moon". You can see almost perfect periods of about 29 days between the peaks. With the digital footprint you do not need Newton and all his differential equations; you just look it up, and the digital footprint gives you near-perfect predictions of when the next full Moon will be, or rather, of when people will google "full moon". So there does not need to be any theory.

Claims like these led people like Chris Anderson, then the editor of the big tech magazine Wired, to say: "Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising. It just assumed that better data, with better analytical tools, would win the day. And Google was right. Google's founding philosophy is that we don't know why this page is better than that one: if the statistics of incoming links say it is, that's good enough." So Google, without any theory, is able to make predictions. The same holds for Google Translate, and the same, for example, for your personality: Google can know your personality, your psychology, better than any psychologist, without any psychological training or psychological theory, just from your behavioral data, as long as, and this is a very important "as long as", your behavior stays the same, as long as your personality stays the same. If you fall in love, change your job, or move to another country, your behavior changes abruptly, and past data can no longer predict your future behavior, because your past behavior and your future behavior are now different. A psychologist who has a theory can still make predictions, just as Newton's theory can, because the theory says which variables are actually involved, what causes what, and how things hang together. The psychologist can say, "well, if you change to this job, or if you move to this country, this is how your behavior is likely to change", because of that theoretical understanding of the background. This is how Big Data leads us to the question of stationarity, to use the technical term.
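To make this idea of prediction without theory concrete, here is a minimal, hypothetical sketch in Python. It uses synthetic numbers as a stand-in for daily search interest in "full moon" (it is not the actual Google data from the lecture) and recovers the roughly 29-day rhythm purely from the autocorrelation of the series, with no astronomy involved:

```python
import numpy as np

# Synthetic stand-in for two years of daily search interest in "full moon":
# peaks roughly every 29.5 days, plus noise. (Illustrative numbers only.)
rng = np.random.default_rng(0)
days = np.arange(730)
interest = 50 + 40 * np.cos(2 * np.pi * days / 29.5) + rng.normal(0, 5, days.size)

# Theory-free step: estimate the period as the lag with the strongest autocorrelation.
x = interest - interest.mean()
acf = np.correlate(x, x, mode="full")[x.size - 1:]   # autocorrelation at lags 0..N-1
acf /= acf[0]
lags = np.arange(20, 40)                             # search a plausible range of lags
period = lags[np.argmax(acf[lags])]
print(f"Period estimated from the data alone: {period} days")

# Theory-free prediction: the next peak is simply the last observed peak plus one period.
last_peak = 670 + int(np.argmax(interest[-60:]))     # day index of the most recent peak
print(f"Predicted next 'full moon' search peak: around day {last_peak + period}")
```

Note that the forecast is nothing more than "the past pattern will repeat": it works only as long as the series stays stationary, which is exactly the catch discussed next.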
So, does behavior stay stationary? If the overall statistics of your behavior stay the same, then you can predict the future; if not, then you cannot. This has some pitfalls. There is a very famous example, Google Flu Trends. Google Flu Trends used Google searches to predict the flu in the United States. Basically, they took the 50 million most common search terms and used them to predict the spread of the seasonal flu. The terms did not even have to be related to the flu: they did not handpick them with any theory, they just looked statistically at which of those 50 million most common search terms were correlated with the flu, ran a bunch of models, and identified 45 search terms that together could predict the outbreak of the flu. Extremely useful, and much better than the government data, because the government gets its data from hospitals, when people actually go to hospitals because they already have the flu, and by the time that is all processed, three months later, you have already lost. This, in contrast, could give you flu predictions in real time, and it worked perfectly well. As you can see, it was also applied to another case, Dengue, a very serious disease, so it helped a lot to understand that better.

Now, a few years later, some other scientists wanted to replicate it. They took the flu algorithm and tried to predict the flu, and it did not work at all anymore. What happened? Why did the algorithm not work anymore? What happened is that people's behavior changed: people simply googled different things, and these 45 search terms did not work anymore, because there was no theory behind those 45 terms. It is not as if those search terms necessarily had anything to do with the flu; they might have been spurious, there might have been confounding variables behind them. Over the years people started to search for different things, other things became correlated, and as a result the model could not make predictions anymore, because the pattern is non-stationary. Behavior simply changed.

To make it a little more abstract: if you have, for example, a data series that looks like this, and this data series is all you have, what is your prediction? How will this pattern continue? What do you think? There is nothing, nothing in the data, that allows you to predict anything other than that it continues as it was. Everything in the data forces you to predict that it continues like this; there is nothing in the data that would allow you to do otherwise. Data from the past will not allow you to make a prediction like this one, for example; there is nothing in this past that would predict that. With a theory, yes, you could do that; in theory you could say, "well, something changes there", but not with the data, because data is necessarily always from the past: as soon as you have recorded it, it is the past. Theory, if you have a theoretical framework, lets you make predictions about futures that did not exist before, and we do that as well.
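Before moving on, here is a minimal, hypothetical Python sketch of the flu-trends logic just described, built on synthetic data rather than the real Google Flu Trends queries or algorithm. It selects the 45 search terms that correlate best with past flu counts, fits a simple regression, and then shows the predictions collapsing once search behavior drifts; every number in it is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
n_weeks, n_terms = 200, 1000          # small stand-in for "millions of search terms"

# Synthetic weekly flu activity.
flu = 100 + 50 * np.sin(np.arange(n_weeks) / 8.0) + rng.normal(0, 5, n_weeks)

# Synthetic search volumes: 45 terms happen to track the flu, the rest are noise,
# and after week 160 search behavior drifts and the old signal disappears.
terms = rng.normal(0, 1, (n_weeks, n_terms))
tracking = rng.choice(n_terms, 45, replace=False)
terms[:160, tracking] += 0.02 * flu[:160, None]

train = slice(0, 120)

# "Theory-free" selection: keep the 45 terms most correlated with flu in the training period.
corrs = np.array([np.corrcoef(terms[train, j], flu[train])[0, 1] for j in range(n_terms)])
selected = np.argsort(-np.abs(corrs))[:45]

# Simple linear model on the selected terms.
X = np.column_stack([terms[:, selected], np.ones(n_weeks)])
beta, *_ = np.linalg.lstsq(X[train], flu[train], rcond=None)

def rmse(weeks):
    return np.sqrt(np.mean((X[weeks] @ beta - flu[weeks]) ** 2))

print("error while behavior is stationary (weeks 120-159):", round(rmse(slice(120, 160)), 1))
print("error after search behavior drifts (weeks 160-199):", round(rmse(slice(160, 200)), 1))
```

The point is exactly the stationarity problem from above: nothing ties the selected terms causally to the flu, so once the correlation pattern changes, the model has nothing left to stand on.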
That is then the complement: this final limitation of Big Data is complemented by another computational social science technique, computer simulation. It is just like playing SimCity. Here is a SimCity simulation, and what the simulation does is create a future that did not exist before. This is a sustainable city, which can be calibrated with big data: in creating the city, you calibrate it with real cities that exist, but eventually you grow cities that do not exist in the data, because you want to make the world a better place, a world without pollution, without poverty. In order to make the world a better place, we can simulate futures that never existed.

Here, for example, are real-world simulations. This is a simulation from the United States military in Afghanistan, simulating where people are walking and where insurgent or terrorist attacks might happen. This is a chemical attack in Los Angeles that never happened: we have data to calibrate the model, we have data on Los Angeles, we have data about how people move, we have data about chemical attacks, but to see what would happen in Los Angeles, we have to simulate something that never existed. And this here is traffic in Chicago.

So in the social sciences we are actually always changing what is happening. Social systems are necessarily non-stationary, because in the social sciences we have this desire to make the world a better place. This is actually the critique of a Nobel Prize winner in economics, Robert Lucas; it is called the Lucas critique. Lucas aimed it at econometrics, which is something like the data science of economics, and his critique went like this: "So you guys are studying some economic dynamic, you run your correlations and so forth, and then you discover that something is wrong. So you make a policy that is supposed to change the system, but then you still try to predict the future with the past that existed before the intervention, and that just does not work. Once you intervene and change the system, it will be a completely different system, and you have no data about it. So econometrics alone cannot help you there." He literally said that any change in policy will systematically alter the structure of the econometric model. In the social sciences we always want to improve the world, so we constantly destroy stationarity, and with it we destroy the very insights we just gained. That makes social science extremely complex to actually do, and it is why one method is not enough: empirical work alone is not enough, we need theoretical work to complement it. That is the ultimate limitation of Big Data.

All right, that brings us back to the three questions we had today. What is Big Data? I gave you five characteristics of Big Data. Then I walked you through a bunch of opportunities, with hopefully more or less entertaining case studies from governments and phone companies. And lastly I walked you through six limitations of this Big Data paradigm. It is very important to understand these limitations so that we do not get carried away by the excitement and turn it into hype. It is a very powerful way of doing social science: the digital footprint gives us unprecedented opportunities, but as always, there are some limitations. I hope you enjoyed this lecture as much as I did. See you next time.