One issue for fairness is where we have correct results, but they're misleading or unfair in some way. Here's an obvious example: there's a question of how you visualize results, or how you present them. So here's a company, and we're looking at its sales year over year. It looks like a fantastic growth curve, a nice hockey stick with a little upward tick. Looks really great, doesn't it? Now, that's the visual effect. But if you look at the scale on the left-hand side, the scale along the y-axis, we see that what's really happened is that over a six-year period, sales have gone from 100 to 105. If you plot this data in a normal, unscaled way, it looks more like the graph on the right: a company that's had a perfectly solid, decent performance, consistent year over year. That's certainly not explosive growth. It's a small, very small in fact, 5% growth over a six-year period, nothing to write home about. (The first sketch below shows how the same numbers produce both pictures.)

Visualization is perhaps the most obvious place where one can have a misleading representation of data, but there are many other places where one sees this. Consider a reputation system, say a travel site where we look at user reviews to choose a hotel for a vacation. A lot of these systems exist, and they typically give you a rating between 1 and 5. So we have two hotels. Hotel A gets an average rating of 3.2, and it turns out that this 3.2 comprises mostly 3's and 4's. Hotel B also gets an average of 3.2, but its 3.2 is a mix of mostly 1's and 5's. Which hotel would you prefer? Many reputation websites just focus on the average: that's what they rank hotels by, that's what your search results are ordered by, and so on. This important difference between the two hotels is going to be obscured unless you really dig into it. My point is that a hotel like B, which people either love or hate, could be exactly the perfect hotel for you, or a hotel you should avoid by a mile, depending on whether you're more like the people who rated it a 5 or the people who rated it a 1. Calling it a 3.2 is not right, because it's either a 5 or a 1 for you depending on who you are. And figuring out which of the two you are is not easy without extra work; it's certainly not in the single score that we see.

Single scores are a problem for another reason. Here's a different example: Hotel A gets an average of 4.5 based on 2 user reviews, and Hotel B gets an average of 4.4 based on 200 user reviews. Which would you prefer? If you're like me, you'd probably prefer Hotel B, and by a lot. 4.4 is less than 4.5, so in terms of sort order, 4.5 might come first. However, we all know that on most sites it's too easy to place a few fake positive reviews. So we're worried about that 4.5, whereas the 4.4 seems like a much more solid number we can rely on. That's why I would pick B over A even though its single score is lower.

Let's push this example a little further. Say now that Hotel A gets an average of 4.5 with 10 reviews, and Hotel B gets an average of 4.4 with 500 reviews. If this were all you knew, you'd prefer Hotel B, as we were just discussing. But if you also know that Hotel A has just 5 rooms, while Hotel B has 500 rooms, does that change your decision?
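On the axis-truncation point: here is a minimal plotting sketch. The years and sales figures are illustrative stand-ins matching the example; the only difference between the two panels is the y-axis range.

```python
import matplotlib.pyplot as plt

# Hypothetical sales figures from the example: 100 to 105 over six years.
years = [2018, 2019, 2020, 2021, 2022, 2023]
sales = [100, 101, 102, 103, 104, 105]

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Left: a truncated y-axis makes 5% growth look like a hockey stick.
ax_left.plot(years, sales, marker="o")
ax_left.set_ylim(99.5, 105.5)
ax_left.set_title("Truncated axis: looks explosive")

# Right: a zero-based y-axis shows the growth is actually modest.
ax_right.plot(years, sales, marker="o")
ax_right.set_ylim(0, 120)
ax_right.set_title("Zero-based axis: modest 5% growth")

plt.tight_layout()
plt.show()
```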
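On the two hotels with identical averages: a few lines of arithmetic show how the mean hides the shape of the distribution. The individual ratings below are made up to match the 3.2 averages in the example; the spread is what distinguishes the two hotels.

```python
from statistics import mean, stdev

# Illustrative ratings: both hotels average 3.2, but the shapes differ.
hotel_a = [3, 3, 3, 3, 4, 3, 3, 4, 3, 3]  # mostly 3's and 4's
hotel_b = [1, 5, 5, 1, 5, 1, 5, 1, 5, 3]  # mostly 1's and 5's

for name, ratings in [("A", hotel_a), ("B", hotel_b)]:
    print(f"Hotel {name}: mean={mean(ratings):.1f}, stdev={stdev(ratings):.2f}")
```

Reporting the spread, or a histogram of ratings, alongside the mean is one simple way to surface the love-it-or-hate-it pattern that the single score hides.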
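On weighing an average by how many reviews stand behind it: one common remedy, not something the lecture itself prescribes, is a Bayesian average that shrinks a small-sample mean toward a global prior. The prior values below are illustrative choices, not canonical ones.

```python
def bayesian_average(ratings_mean, n_reviews, prior_mean=3.0, prior_weight=10):
    """Shrink a hotel's mean toward a global prior; fewer reviews => more shrinkage.

    prior_mean and prior_weight are illustrative assumptions, not standard values.
    """
    return (prior_weight * prior_mean + n_reviews * ratings_mean) / (prior_weight + n_reviews)

# Hotel A: 4.5 from 2 reviews; Hotel B: 4.4 from 200 reviews.
print(bayesian_average(4.5, 2))    # ~3.25 -- two reviews barely move the prior
print(bayesian_average(4.4, 200))  # ~4.33 -- many reviews dominate the prior
```

For the room-count twist, one could normalize the review count by hotel capacity (reviews per room) before shrinking, which would favor the small Hotel A; that is a design choice the example only hints at.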
My sense is that at this point, knowing the room counts, you'd probably prefer Hotel A, and by a lot. Since Hotel A has fewer customers, you should expect it to have fewer reviews. So the fact that it has fewer reviews is not something you should hold against it, given that it's so much smaller than Hotel B.

Where does all of this lead us? When you report results, you often have to boil them down to a single number, because that's what it takes to complete the analysis. This is particularly the case where the consumer of whatever you produce is a downstream algorithm: yet another algorithm is going to do something more with it. However, boiling things down often suppresses a lot of richness in the data that might be important for decision making.

Here's another place where important data may get suppressed. We know that the world has right-handed people and left-handed people. Just to keep things simple, let's say the world is divided between exactly these two types, leaving out people who are missing a hand, people who are ambidextrous, and so on. So the world is divided between right- and left-handed people, and there are far more right-handed people than left-handed people. Given that that's what the world looks like, designers of products have to make a choice, and they will design their products to support right-handed people, even if that means that knobs, levers, and other things that need to be manipulated are less conveniently placed for left-handers. Where it's feasible, manufacturers may make both right-handed and left-handed versions of a product. Now, handedness is a simple scenario: we all know what it means, and we can actually observe how people make these choices. Where the relevant properties are less clearly understood, these things are much harder to deal with.

So let's look at a world where we have a red majority and a green minority, and we're trying to separate the good from the bad: diamonds are good and squares are bad. If we look at the majority alone, we end up with a nice separating line, with all the diamonds above the line and all the squares below it. Unfortunately, this division is exactly the wrong division for the green group. In my example here, there are only a few red dots and a few green dots, so the effect of the majority is not that obvious. But suppose you had many times as many red dots as green dots. This is the kind of line your algorithm would draw, and green would just lose.

So you have a hiring algorithm that looks at some attributes and chooses, on the basis of those attributes, whom to hire. The criteria this algorithm uses have been tuned to fit the majority. If the optimal criteria for some minority happen to be different, the algorithm is going to perform poorly on that minority, and that creates a few different problems. First, the best minority applicants may not be hired. As we saw, the green dots are being classified the wrong way. We may have a very qualified minority applicant who would actually do very well, but because our algorithm is badly tuned, this person gets rejected. A small sketch of this majority-tuning effect follows below.
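Here is a sketch of the majority-tuned classifier using synthetic data and scikit-learn. The groups, the features, and the rule that the label depends on the features differently per group are all fabricated for illustration; the point is only that a single model fit to pooled data tracks the majority's pattern.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: "good/bad" depends on the features differently per group.
# Red (majority): good when x1 + x2 > 0. Green (minority): good when x1 - x2 > 0.
def make_group(n, flip_x2):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + (-X[:, 1] if flip_x2 else X[:, 1]) > 0).astype(int)
    return X, y

X_red, y_red = make_group(1000, flip_x2=False)   # majority
X_green, y_green = make_group(50, flip_x2=True)  # minority

# One classifier fit to the pooled data, dominated by the majority.
clf = LogisticRegression().fit(
    np.vstack([X_red, X_green]), np.concatenate([y_red, y_green])
)

print("accuracy on majority (red):  ", clf.score(X_red, y_red))
print("accuracy on minority (green):", clf.score(X_green, y_green))
```

With many times more red points than green, the learned boundary is essentially the red boundary, so the green group is classified near chance, which is the "green would just lose" line from above.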
There is a further problem, which is that the minority employees who are hired are now not the best, because of how poorly our algorithm was trained. And because they're not the best, they end up not performing as well, and in consequence they unfairly besmirch others in the minority, because now the algorithm is going to learn that minorities don't do as well. So we began by suppressing diversity and ended up discriminating.

It isn't just hiring. Let's look at this in a medical scenario. We have a clinical trial of a new drug to treat diabetes, and it turns out there are actually two quite distinct groups of patients who have been pooled together, because in terms of what we know about medicine today, both groups have similar glucose-regulation issues. Given what we know, we put them all in one pool. But because the disease mechanisms are different in the two groups, the drug is very effective for patients in group A but worthless for patients in group B.

So let's see what happens in this clinical trial. If group A is in the majority, the drug will be found effective, we'll report some significance level, and we'll say this drug is effective. The drug will then be prescribed for anybody with that kind of diabetes, and patients in group B, for whom it is worthless, will also be given it. On the other hand, if group B is in the majority, the drug is not found effective with sufficient significance over the whole population, and we simply end up not approving it. Even though there is a minority of patients, those in group A, who could have benefited, this life-saving drug is never going to be marketed. It's never going to be approved.

In short, having the right results, but arriving at them in an overly simplistic manner, can lead to very significant societal consequences.
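To see the clinical-trial effect numerically, here is a hedged simulation sketch: group A responds to the drug, group B does not, and a pooled significance test flips its verdict depending on which group is in the majority. The effect sizes, sample sizes, and the use of a one-sample t-test are all illustrative assumptions, not a model of any real trial.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def trial(n_a, n_b, effect_a=1.0, effect_b=0.0, noise=1.0):
    """Simulated improvement under the drug; group A responds, B does not.

    Returns p-values for the pooled population and for each group separately
    (one-sample t-tests of "mean improvement differs from 0").
    """
    a = rng.normal(effect_a, noise, n_a)
    b = rng.normal(effect_b, noise, n_b)
    pooled = np.concatenate([a, b])
    return (stats.ttest_1samp(pooled, 0).pvalue,
            stats.ttest_1samp(a, 0).pvalue,
            stats.ttest_1samp(b, 0).pvalue)

# Group A in the majority: the pooled test looks significant.
print("A majority (pooled, A, B):", trial(n_a=180, n_b=20))
# Group B in the majority: the pooled effect washes out.
print("B majority (pooled, A, B):", trial(n_a=20, n_b=180))
```

Note that the per-group p-values find the group-A effect either way; that per-group signal is exactly the richness the single pooled number suppresses.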