That brings us to another limitation of big data that often creates a lot of confusion, but it actually has a pretty straightforward data science solution. So if you do your data science well, you can deal with it. But it's a limitation that is often not considered in practice, and it leads to a lot of damage when big data analytics are applied. It is that correlation is not causation, and confusing the two is very common. Correlation basically means that two things go together, and causation means that one thing causes the other. So correlation might be that both things go up together, or one thing goes up and the other goes down, or both things go down. Causation means that if this goes up, then it makes that go up as well. So there is a relation, but correlation by itself doesn't show that there is causation between them. This confusion really is omnipresent. I just turn on the TV, and sometimes I cannot stand it anymore. I mean, these TV reports, with all these graphs nowadays, draw conclusions that are completely unsubstantiated because they are based on a correlation. It happens even in the highest ranks of government. For example, this is an internal White House document of the Trump administration from 2017. It alleges that manufacturing decline, which means that fewer people in the United States are working in manufacturing jobs such as manufacturing cars, increases abortions, infertility, and spousal abuse. So the claim is that there are fewer people working in the manufacturing sector. I mean, they don't even have to be unemployed, they might now be working in another sector, like the service industry or something. And that supposedly leads to more abortions and more infertility. There's absolutely nothing there that could hold up. This slide is actually from the Trump administration, and it was shown in the White House.
There's absolutely no evidence for that. It might be that in some parts of the country they go together, but that doesn't mean that a manufacturing decline increases the other ones. It might also be the other way around, or it might be what we call a spurious correlation, caused by a confounding variable, and I want to talk a little bit more about that. So basically, you can find a lot of correlations among all kinds of things. For example, the correlation I found here is a very strong one, a correlation of 97 out of 100. It shows that the number of civil engineering doctorates correlates almost perfectly with the consumption of cheese, and that's fine as long as all you want to do is make predictions. So as long as this holds, as long as this stays the same, if you want to predict the number of civil engineering doctorates, you can use the consumption of cheese. Because if that's a stable correlation, you can make predictions. What you cannot do is make claims of causality. If you intervened and said, "Well, how can we get more civil engineering doctorates? Well guys, we've got to eat more cheese," then no, nothing would happen. Obviously, nothing would happen there because there is no causation. There's a correlation, but it might be a spurious correlation. Here, for example, is a very nice one: the number of movies that the actor Nicolas Cage appeared in almost perfectly correlates with deaths of people being hit by sports equipment. Yeah. Well, somebody tell this guy to stop making movies. I mean, that's not a question of taste anymore, it's a public health issue. But again, that's a correlation versus causation issue. Where does that actually come from? Well, it comes from here, and I want to walk you through that in a little more detail. Imagine this kind of correlation, which is a correlation that really holds. I mean, statistically, this absolutely holds.
That's why Mark Twain used to say, "There are three kinds of lies: lies, damned lies, and statistics." So you can tell this kind of lie with statistics, because this is a really strong correlation. It says that the shoe size of children between 2 and 18 years almost perfectly correlates with the time they spend on the internet. So kids between 2 and 18 years with bigger shoes spend more time on the internet. Which means, guys, which obviously means that if we want to increase the digital literacy of our children, we've got to make their feet grow, or at least we've got to buy them bigger shoes, that might be a cheaper solution. Honestly, as ridiculous as that sounds, I turn on the TV and I hear these kinds of things all the time, on every kind of news channel. That's the kind of conclusion that people draw all the time because they confuse correlation with causation. So why do we see this correlation here? Well, there is a confounding variable, and this confounding variable leads to a spurious correlation. What is the confounding variable? What variable have we not considered in this example? It is age. We just threw all the children together, from 2 to 18. Obviously, the ones who are 18 have bigger shoes, and they use the internet more than two-year-old toddlers with tiny little feet, who don't use the internet. If you control for age, like taking only the two-year-olds or only the 18-year-olds, you don't find any correlation between shoe size and internet usage. So that's the spurious correlation and the confounding variable. Now, you can control for that statistically, there are some techniques. It also depends on how you collect the data, and the digital footprint in particular gives you a lot of time-series data. So you can test for causality much better than we ever could before.
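To make the shoe-size example concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the data is simulated, not real, and the function names (`pearson`, `simulate_children`), the coefficients, and the noise levels are hypothetical. The only thing built into the simulation is that both shoe size and internet hours depend on age, and not on each other. Stratifying by age is also just one simple way of controlling for a confounder, not the only technique.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_children(n=5000, seed=0):
    """Simulated (age, shoe_size, internet_hours) rows.

    Hypothetical numbers: shoe size and internet hours are each driven
    by age plus independent noise. Neither one influences the other.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.randint(2, 18)
        shoe_size = 15 + 1.5 * age + rng.gauss(0, 1.5)
        internet_hours = max(0.0, 0.4 * age - 0.8 + rng.gauss(0, 1.0))
        rows.append((age, shoe_size, internet_hours))
    return rows


rows = simulate_children()

# All children thrown together: age is ignored, so it acts as a confounder.
pooled_r = pearson([r[1] for r in rows], [r[2] for r in rows])

# Controlling for age: compute the correlation separately for each age.
within_age_r = {}
for age in range(2, 19):
    group = [r for r in rows if r[0] == age]
    within_age_r[age] = pearson([r[1] for r in group], [r[2] for r in group])

print(f"pooled correlation, ages 2 to 18: {pooled_r:.2f}")
for age, r in within_age_r.items():
    print(f"age {age:2d}: correlation {r:+.2f}")
```

In this simulation the pooled correlation should come out strong, while the correlations within a single age should hover around zero. That is the same pattern as in the example: the relation between shoe size and internet use disappears once you only compare children of the same age.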
We still often don't do it, because it is a little bit more labor-intensive: you need slightly more sophisticated techniques, and the way you collect the data has to be a little more sophisticated too. But you can actually detect causality with more and better data, and that's what digital footprint big data is all about. There are actually a lot of opportunities there. People just don't really use them. So that's what we really have to keep in mind: correlation and causation are different things, and we have to distinguish between them.