I'm going to walk you through six limitations. That doesn't mean these are the only limitations; there are a lot of limitations and things we have to be aware of and conscious of if we work with the digital footprint. But I'm going to walk you through some of them, and you've already seen some of them in the examples before, for example when we talked about political campaigns, right? There are some downsides there. But I want to focus in particular on the limitations of using the digital footprint for social science purposes, for research purposes. So let me go through these limitations. First, the digital footprint is not necessarily representative, and I already said that, right? We said we try to work with the entire population of Facebook users, but Facebook users are Facebook users; they are not all people on planet Earth. Second, the footprint is not the foot. It's a footprint, and we often confuse the two, and that confusion can have very dangerous consequences. Third, there are patterns we can detect in this data through artificial intelligence, through machine learning, but are those patterns always meaningful? Is that really what we want? We have to talk about that. Fourth, discrimination and personalization: as some of the examples I gave show, results can be personalized, but personalizing often also means discriminating. So we have to think about that a little bit. Fifth, correlation is not equal to causation; you've probably heard that. It's a very important aspect. And sixth, the past is not equal to the future. As intuitive and silly as that sounds, it's very important, because data is always from the past. All right, so first: the digital footprint and representativeness. If you work with the digital footprint, remember the Twitter example I showed you, where somebody says good morning on Twitter.
We can only see that where there is connectivity, where there is a footprint. Where there is no digital footprint, it's not necessarily because there is no person; it might be because there's no digital technology, and therefore no digital footprint. This is called the digital divide: the divide between those who already have access to and take advantage of digital technology, and those who are still excluded from it. Here, for example, we have a study by Josh Blumenstock from UC Berkeley. What Josh studied was mobile phone penetration in Rwanda in 2005. We compare the subscriber data, meaning the people who have a mobile phone, with the survey data, which is our ground truth for how many people are actually there. For gender, for example, we can see that Rwanda is 50/50, like most countries: 50% men and 50% women. Among mobile phone subscribers, however, there are many more men than women. That means our digital footprint is biased. The same holds for age: middle-aged people are over-represented here. And with education as well, people with higher education are over-represented relative to illiterate people and people with very low education, who at that time did not have access to a cell phone in Rwanda. So the digital footprint in Rwanda is biased in terms of who has access to technology, and that's what Josh showed in his study. A few years later, another researcher, Frias-Martinez, did a study of a Latin American economy in 2009, where mobile phone penetration was much higher. In Rwanda, mobile phone penetration was below 20%; here it was between 60% and 80%, so up to eight out of 10 people had a mobile phone. And you can see that the representativeness is now extremely good: 50% men and 50% women in the census data, and also 50% men and 50% women among mobile phone owners. The same goes for age and for income.
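The comparison both studies make, each group's share of the footprint against its share in the ground truth from a survey or census, can be sketched as a simple over-representation ratio. This is a minimal Python sketch under our own assumptions: the function name `representation_ratios` and all the numbers are hypothetical placeholders, not data from the Rwanda or Latin America studies.

```python
"""Compare who appears in a digital footprint with who is in the population.

All numbers below are hypothetical placeholders, not figures from the
Rwanda or Latin America studies.
"""


def representation_ratios(footprint_counts, population_shares):
    """Return (share in footprint) / (share in population) for each group.

    A ratio above 1 means the group is over-represented in the footprint,
    below 1 means under-represented, and exactly 1 means representative.
    """
    total = sum(footprint_counts.values())
    return {
        group: (footprint_counts[group] / total) / population_shares[group]
        for group in population_shares
    }


# Hypothetical example: a population that is 50% men and 50% women,
# but whose phone subscribers are mostly men.
population = {"men": 0.5, "women": 0.5}
subscribers = {"men": 700, "women": 300}

ratios = representation_ratios(subscribers, population)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")  # men: 1.40, women: 0.60
```

Ratios close to 1 for every group are what a representative footprint looks like; ratios well away from 1, as in this made-up example, indicate a biased footprint.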
So with income, the two distributions now look very similar, right? The different income groups are well represented in the mobile phone footprint, in contrast to Rwanda, where we have a very biased digital footprint. That might lead you to conclude: all right, once everybody is connected, the footprint will be representative, and that's just a question of time until everybody has a mobile phone. Actually, in this world we already have more mobile phones than people. Some people have two mobile phones, so not everybody, but almost everybody, even in the poorest countries, has a mobile phone. Even in places where there is no landline or electricity, people often have mobile phones and charge them with solar panels and so forth. So you might conclude: all right, once everybody is connected, the digital footprint is representative. Not so fast; there's still one problem, and that's that the digital divide has evolved over time. This evolution is due to the fact that the technologies we use to access the digital realm have evolved. Back in the 80s, we basically had only one technology for digital communication: the fixed-line phone, which was often already digital back then. And each fixed-line phone had the same bandwidth. You could talk through it, and that's all you could do with it. So if a country wanted more access to the digital realm, it would simply buy more fixed-line phones, right? That's what you see here. On the horizontal x-axis is the number of fixed-line phone subscriptions per capita, on the vertical axis is the bandwidth, and you see a one-to-one relationship. Some countries have more bandwidth simply because they have more phones.
Over time, this has become a two-dimensional challenge, because you can have more subscriptions, but they are not only fixed-line phones: there are mobile phones, narrowband and broadband Internet, different levels of connectivity. And we see some countries that have a lot of bandwidth available, for example some Asian countries; South Korea, up here, has a lot of connectivity. The number of subscriptions per capita, that is, how many devices we have, increases until we all have about two subscriptions. Once we all have a fixed-line and a mobile device, it's almost like we hit a wall, right? Then we go straight up, and the divide keeps on evolving, not because some people have more devices but because some people have more bandwidth. So even when we all have a phone, some people will be over-represented in the digital realm because they have more bandwidth, right? Even if everybody has a smartphone, some people will have little holograms on their hands, and once everybody has holograms, some people will have brain-computer interfaces. What do I know? But it keeps on evolving, right? Some people are over-represented simply because they have more bandwidth, so the digital divide is not closed when everybody has access to a technology. Now, the problem is that the bandwidth divide is incredibly persistent. It's not easy to get rid of, because it's related to income, and income inequality is notoriously persistent. So here is the same graph, and now I've added a third dimension, income, which you see coming outwards: the income available in a country. And look at what happens with the poor countries, the very poor countries with almost no income at all.
Even they reach that wall, right? They have very little income, but they can still move up along the subscription axis: they can buy mobile phones. They cannot buy a lot of bandwidth, because they're poor, but they can buy mobile phones. So they have a lot of mobile phones without a lot of connectivity, and you can see that they move along this axis. Then, once they hit about 1.5 to 2 subscriptions per capita, they start going up, because once we all have two devices, a fixed-line and a mobile one, we don't need more; instead we keep going up by getting more bandwidth. And you can see that countries with more income have a lot more bandwidth. So the bandwidth divide is closely related to the income divide, and income inequality is notoriously persistent. We're not going to change that anytime soon: there are richer countries and poorer countries, richer people and poorer people, and those with more income will be over-represented in the digital footprint because they have more bandwidth. I've walked you through all of this because the common assumption is that making the digital footprint representative is just a matter of waiting until everybody is connected. But no: at least as far as we can see, people will never be equally connected. Some people will have more access and some people will have less. For those of you who are familiar with the debate over net neutrality, it fits in here, because it's also a question of who is over-represented and who is under-represented in the digital realm. So the digital footprint will always be biased.
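The bandwidth argument can be illustrated with a toy calculation. This sketch assumes, purely for illustration, that the volume of traces a group leaves is proportional to its population share times its bandwidth per person; the function `footprint_shares` and all the numbers are hypothetical, not measurements from the lecture's data.

```python
"""Why universal access does not close the divide: a toy model.

Assumption (hypothetical, for illustration only): each group's volume of
digital traces is proportional to its population share times the
bandwidth per person it can afford.
"""


def footprint_shares(groups):
    """Return the share of all digital traces produced by each group.

    `groups` maps a group name to a tuple
    (population share, bandwidth per person).
    """
    volume = {name: pop * bw for name, (pop, bw) in groups.items()}
    total = sum(volume.values())
    return {name: v / total for name, v in volume.items()}


# Everyone is connected (same number of devices), but the richer half of
# the population has four times the bandwidth. Numbers are made up.
groups = {"lower income": (0.5, 1.0), "higher income": (0.5, 4.0)}
shares = footprint_shares(groups)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of traces vs 50% of people")
```

Both groups are fully connected, yet with four times the bandwidth the higher-income half leaves 80% of the traces in this toy model, which is the sense in which access alone does not make the footprint representative.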