In this video, we'll look at a few examples of how hastily developed and deployed machine learning systems have impacted the real world. Specifically, we'll see some ways a machine learning system trained entirely on real data can create destructive feedback patterns.

Machine learning algorithms take training data and find patterns. Our machine learning systems find patterns in data that don't necessarily make sense to humans, or patterns that we wouldn't have been able to find on our own. This is not the same as identifying causal relationships; the old saying "correlation is not causation" still applies. Sometimes the patterns the machine finds are useful and consistent for understanding the data. Sometimes the patterns are coincidental. Sometimes the patterns are really there but don't necessarily mean what we think they mean. Machine learning finds patterns in data, but finding a pattern doesn't guarantee it's useful.

Let's look at a practical example: a system that reads resumes and recommends the best candidates for interviews. Using machine learning as a first-pass filter to speed up resume vetting is a natural use case, and since the system isn't making final decisions, it seems like a safe partnership: let an automated system get rid of the obvious duds and rely on humans for the more nuanced choices. But a system like this is vulnerable to bias. Why? Think of the data you have to train the system. Naturally, you would use your own historical data: which resumes resulted in hires at your company and which were rejected. But any bad practices or unconscious biases from your hiring history create patterns that machine learning algorithms pick up on. By definition, they'll recommend candidates similar to the ones you already have, for the machine's particular understanding of "similar." If you have, say, a manager who tends to hire students from their old university, that creates a pattern. It might be so subtle that you haven't actually realized it's happening, but the machine will pick up on it. If your company has been dominated by male employees, that's a pattern our system will find, and because of interdependencies between features, it's something machine learning can pick up on whether or not you explicitly provide gender as a feature. A machine learning system has no way to find the features that truly determined employee success; it only identifies features correlated with the labels you give it. This illustrates a danger in using these patterns, or systems based on them, to make decisions.

You can create runaway feedback loops, where the decision creates data that skews the system even further. Take the example of using machine learning to determine the interest rate for someone getting a loan. Loan companies need to balance their risk, and one way they do that is by charging higher interest rates on high-risk loans. So our automated system identifies certain people as high risk, which leads to their loans having higher interest rates, and higher interest rates increase the chance of defaulting on the loan. The very act of labeling this kind of profile as high-risk increases the risk. More data is generated by the system, which naturally includes more defaults from the profiles labeled high-risk, and that reinforces the correlations our machine learning system found. It becomes a self-fulfilling prophecy.
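To make that loan dynamic concrete, here's a minimal, purely illustrative Python sketch. It isn't from the course, and the base default rate, rate sensitivity, feedback multiplier, and group labels are all made-up numbers chosen only to make the loop visible: both groups start with identical underlying risk, but the group already labeled high-risk is charged more, defaults more, and so gets charged even more on the next round.

import random

random.seed(0)

BASE_DEFAULT_RATE = 0.05  # default probability at the baseline interest rate (assumed)
RATE_SENSITIVITY = 0.03   # extra default probability per point of added interest (assumed)

def simulate_round(extra_interest_by_group, n_loans=1000):
    """Issue loans to each group and return the observed default rate per group."""
    observed = {}
    for group, extra_interest in extra_interest_by_group.items():
        p_default = BASE_DEFAULT_RATE + RATE_SENSITIVITY * extra_interest
        defaults = sum(random.random() < p_default for _ in range(n_loans))
        observed[group] = defaults / n_loans
    return observed

# Both groups have the same true baseline risk, but group "B" has already
# been labeled high-risk, so it starts out paying 3 extra points of interest.
extra_interest = {"A": 0.0, "B": 3.0}

for round_num in range(3):
    rates = simulate_round(extra_interest)
    # "Retraining" step: the observed default gap is fed back into pricing,
    # pushing group B's interest, and therefore its true risk, even higher.
    extra_interest["B"] += 10 * (rates["B"] - rates["A"])
    print(f"round {round_num}: observed default rates {rates}, extra interest {extra_interest}")

Each round, group B's observed defaults pull its interest rate a little higher, even though nothing about the group's underlying behavior has changed; the label itself is doing the work.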
Another example we mentioned back in course one is predictive policing. Depending on how they're used, predictive policing models create dangerous positive feedback loops. It's natural enough to use arrest data to identify areas that are higher risk and might require more policing. But if police officers are deployed according to this model, especially primed with the information that they're going to a high-risk area, officers will inevitably make more arrests. More police presence and more arrests support the model's prediction that this is a high-crime area, and the cycle continues. If that sounds familiar, it should, because this is a real-world cause and effect of bias in our data. When this data is sent back to the system, it's positive reinforcement that the system is doing the right thing, which reinforces the bias in the training data and predicts a greater need for policing in the same neighborhoods. The feedback creates a self-reinforcing bias, and the cumulative results can be disastrous.

A related danger comes from using patterns identified by machine learning systems without understanding the reasons why those patterns exist. Let's look at a study of deaths from pneumonia. Researchers were building models to identify which hospital patients were at risk, and these models found that patients with asthma had a lower chance of dying from pneumonia. Yes, that was a true pattern: having asthma was good news, at least as far as deaths from pneumonia go. But it's also clearly wrong; people with respiratory disorders are in greater danger from pneumonia than people with healthy, fully functioning lungs. In this case, the researchers went to the doctors to discuss the findings, and some explanations emerged. Doctors are extra careful with patients suffering from asthma; they're more likely to run extra tests and take more preventative measures. Patients with asthma know they're at risk, so they're more likely to alert the doctors to problems early. So yes, having asthma is correlated with lower death rates from pneumonia, and the model only looks at those outcomes to make its associations. But if we use our model to change the treatment procedures, we'll interfere with that pattern: our asthmatic patients will be labeled low-risk, making it harder for doctors to run those extra tests and take those preventative measures, and preventable deaths result. Predicting that patients with asthma are lower risk is fine, until you use that prediction to decide on treatment.

It's impossible to totally remove bias from our machine learning models, since we rely on imperfect data from the start. As much as we try, it's not possible to completely identify and remove bias from our data before training. You have to use human judgment to identify the potential issues your system will create and aim to address them proactively. Seriously consider potential feedback patterns. Look for runaway feedback loops, where the predictions made by the system create data that directly reinforces those predictions, and watch for cases where prediction and decision-making need to be separated, especially when your system changes lives.
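To make that last point about separating prediction from decision concrete, here's a small, purely illustrative Python sketch of the asthma example. The mortality rates and the share of asthma patients are invented numbers, not figures from the study; the point is only to show that a correlation which is real in historical data can vanish once the model's label is used to change treatment.

import random

random.seed(1)

# Assumed (made-up) mortality rates: aggressive care helps a lot, and asthma
# on its own makes pneumonia more dangerous, not less.
MORTALITY = {
    ("asthma", "aggressive"): 0.02,
    ("asthma", "standard"): 0.15,
    ("healthy", "standard"): 0.08,
}

def observe(n=10_000, asthma_gets_aggressive_care=True):
    """Simulate patient outcomes under a given treatment policy."""
    deaths = {"asthma": 0, "healthy": 0}
    counts = {"asthma": 0, "healthy": 0}
    for _ in range(n):
        group = "asthma" if random.random() < 0.2 else "healthy"  # assumed 20% asthma share
        care = "aggressive" if (group == "asthma" and asthma_gets_aggressive_care) else "standard"
        counts[group] += 1
        deaths[group] += random.random() < MORTALITY[(group, care)]
    return {g: deaths[g] / counts[g] for g in counts}

# Historical data: doctors treat asthma patients aggressively, so a model
# trained on these outcomes "correctly" learns that asthma means lower risk.
print("historical data:", observe(asthma_gets_aggressive_care=True))

# Deployment: asthma patients are labeled low-risk and get standard care,
# and the pattern the model relied on disappears.
print("after deployment:", observe(asthma_gets_aggressive_care=False))

In the simulated historical data, asthma patients die less often; once the prediction changes the treatment decision, they die more often than everyone else. The prediction was fine, but the decision built on it was not.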