Hi, I'm Brian Silverman. Together we will be exploring the topics of quantitative data collection and analysis. I'm a professor of strategic management at the University of Toronto's Rotman School of Management. I frequently teach courses on probability and statistics for managers, and my research applies such techniques to uncover meaningful patterns in data. Patterns that can help managers and policymakers make informed decisions. One of my current research projects focuses on the different benefits that women and men get from the ostensibly similar social networks that they create. Another project explores the extent to which people with different political views avoid working with each other on teams. Both projects imply consequences for the individuals and also for the organizations to which they belong. Both generate recommendations for steps that organizations can take to address these challenges. Sometimes when people hear the words probability and statistics, they freeze up. These topics often come with unhappy memories or other baggage. Don't worry. We absolutely need to understand some basic principles of probability and statistics in order to conduct effective data analysis. But I pledge two things. First, I will cover exactly what we need in order to do good work. No less and no more. Second, I will cover these in a user-friendly way that relies on clear examples and maybe even some fun with Excel, which happens to be the way that I most easily understand statistics too. Let me first motivate our time together with two examples. Example 1: you wonder whether women and men in your organization are paid differently. You collect information on the salaries of everybody in the organization. Here are the numbers. On average, men earn $50,000 while women earn $45,000. Is this difference significant? Is it a coincidence? Is it meaningful? Is it due to gender or to factors other than gender? If it's due to gender, then how might you address this?
Another example: your organization has designed a prototype of a new product to help people manage their finances. You want to know whether it is appealing to potential customers. You collect information from focus groups and other sources about how people interact with the product. On average, good news, people find it useful, but not everybody. Roughly 25 percent of the people seem unimpressed. Is there something systematic about these 25 percent of people? Could a change in the product or additional options address their concerns? By the end of your two weeks with me, you should be able to determine what data you need to answer an important question, know how to collect it effectively, identify whether or not there are indeed meaningful, significant differences, and interpret the results to support evidence-based action. We will come back to these two examples throughout the next several sessions as we gain experience with each new step of quantitative data analysis. But at a more general level, by the end of our two weeks together, we will have covered three main ideas. Number 1, the world is a messy place. There are a lot of underlying patterns or tendencies in the world, including, among other things, some real direct relationships between the actions that people take and the outcomes that arise. But it can be difficult to see these relationships because of the messiness. For example, if we just look at the compensation of people in the organization from that earlier example, it can be difficult to see a pattern. Number 2, quantitative data analysis is one extremely effective way to infer what the true underlying relationship is, in spite of the fact that we only get to glimpse a small amount of the messy world, and we know that our glimpse is imperfect. Quantitative data analysis enables us to make sense out of the messiness primarily by focusing our attention on averages. In other words, some of the data are messy.
Not everything lines up perfectly, but on average, 75 percent of potential customers like the product, or on average, women earn $5,000 less than men in our organization. Because of this focus on averages, it's crucial for us to think about who or what should be included in the average. Here's an example from the author Caroline Criado Perez. Think of many mass transit systems. They are hub-and-spoke, so it's easy to get folks from the suburbs to downtown. They're not so good for getting from one neighborhood in a suburb to another. Imagine, for example, that a transit system experiments with a new, more hub-and-spoke type of route. A survey of riders might indicate that the average ridership time overall is unchanged, and the experiment is deemed a success. But if we divide into different groups, say men and women, for example, we may see that although the average change for men is a tiny decrease in time, the average change for women is a large increase. When we combine everybody, the magnitude of this increase is hidden, but it's revealed when we look at the subsets of the data. By the way, it doesn't have to be men and women; it could be people living in the suburbs versus people living downtown. Alternatively, one reason that women in our organization earn $5,000 less than men might be that they are in different occupations. We can instead compare subsets of managers who are men and subsets who are women, and also compare subsets of frontline workers who are women to frontline workers who are men. We may find that the difference in earnings is reduced or even eliminated once we compare these subsets. The bottom line is that for our purposes, thinking hard about subsets will be a key issue. Third, because we're dealing with people, and because people make decisions based on their expectations, we often need to think about why we find the results that we find. For example, imagine that you find that women are less likely than men to apply for a stretch promotion.
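To make the subset idea concrete, here is a minimal sketch in Python using entirely made-up salary numbers (they are not the course's $50,000/$45,000 figures, just an illustration of how an overall gap can shrink once we compare like with like):

```python
# Toy data (invented for illustration): each row is one employee.
salaries = [
    {"gender": "W", "role": "manager",   "salary": 72000},
    {"gender": "M", "role": "manager",   "salary": 73000},
    {"gender": "W", "role": "frontline", "salary": 41000},
    {"gender": "M", "role": "frontline", "salary": 42000},
    {"gender": "W", "role": "frontline", "salary": 40000},
    {"gender": "M", "role": "manager",   "salary": 71000},
]

def average(rows, **filters):
    """Mean salary over rows matching every filter key/value pair."""
    matching = [r["salary"] for r in rows
                if all(r[k] == v for k, v in filters.items())]
    return sum(matching) / len(matching)

# Overall, a sizable gap appears...
overall_gap = average(salaries, gender="M") - average(salaries, gender="W")
print(overall_gap)  # 11000.0

# ...but within each occupation the gap is small or zero, because more
# of the women in this toy data hold lower-paid frontline roles.
for role in ("manager", "frontline"):
    gap = (average(salaries, gender="M", role=role)
           - average(salaries, gender="W", role=role))
    print(role, gap)  # manager 0.0, then frontline 1500.0
```

Here the headline $11,000 gap is driven almost entirely by the mix of occupations rather than by pay differences within an occupation, which is exactly why comparing subsets matters.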
One possibility is that women tend to be more risk averse than men. This is a common belief, and it could explain the different behavior, but there are other possibilities. Imagine that in recent years, those women who have pursued stretch promotions have been less successful on average than those men who have pursued stretch promotions, for whatever reason. In that case, each woman who is considering whether to pursue such a promotion today, quite reasonably, may have a lower expectation of being successful. If two people are equally risk averse, but one has a lower expectation of success than the other, then we might well see the high-expectation person pursue the promotion while the low-expectation person does not. In sum, the goal of the next two weeks is to equip you with basic tools for quantitative analysis and to build your intuition for how you should best look at subsets of your data to really see what's going on. There's a fourth main idea here. If your quantitative analysis always yields results that confirm your prior expectations, then you're probably doing it wrong. Since the world is a messy and very complicated place, we have to assume that we're going to be surprised sometimes. Sometimes we expect to find differences across people or groups and we don't find any. Sometimes the differences move in the opposite direction from what we expected. If you look for evidence to confirm or refute your expectation and the evidence refutes it, congratulations, that's a success. As long as you then think through the "why might this be the result?" question, and you're comfortable with the analysis, this is useful information. With that background, here's the plan for your two weeks with me. First, we're going to cover the basic idea of probability. We'll start by talking about flipping a coin. Imagine that we know for certain that this coin is fair, meaning that it should come up heads 50 percent of the time and tails 50 percent of the time.
How likely is it that we should see two tails in a row, or three heads in a row? This simply tells us the probability of seeing an event given that we know that the coin is fair. Next, we will cover the basic idea of statistical inference. In most cases, we have no idea whether the coin is truly fair. It's hard to tell just by looking at it. But what we can do is flip it several times and see how closely it comes to landing on tails 50 percent of the time and heads 50 percent of the time. Let's say that we toss the coin 10 times and it comes up heads all 10 times. We can ask ourselves, how likely is it that we would get 10 straight heads if this really were a fair coin? If the answer is a very tiny likelihood, then we should reject the notion that this is indeed a fair coin. We can also get a sense of just how unfair the coin really is. Just a little bit unfair, or a lot? That's basically all there is to probability and statistics for our purposes. Now, coin flipping might seem to be completely divorced from exploring real-world management and policy issues, but it turns out that the intuition applies very well to such issues. In the subsequent sessions, we will apply these concepts to a range of examples from the business world, policy world, and beyond. Then in the second part of our time together, we will discuss how to collect data that is useful to study your question. We will get some practice analyzing these data and interpreting the results. To foreshadow, the main point for data collection is: try to have a truly random sample, or at the very least, recognize what biases may creep in given the data that you are able to get. The main point for interpreting the analysis is: don't forget to ask yourself why the results are what they are. This can often lead you to the next question and the next analysis as you gain more insight into the issue.
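The coin-flip arithmetic described above is short enough to sketch in a few lines of Python. This is just one way to write it, not part of the course materials; the `prob_k_heads` helper is a name invented here for the standard binomial formula:

```python
from math import comb

# If the coin is fair, each toss lands heads with probability 0.5,
# and independent tosses multiply: 10 heads in a row is 0.5 ** 10.
p_ten_heads = 0.5 ** 10
print(p_ten_heads)  # 0.0009765625, i.e. about 1 chance in 1,024

# More generally, the binomial formula gives the chance of exactly
# k heads in n tosses of a coin whose heads-probability is p.
def prob_k_heads(k, n, p=0.5):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(prob_k_heads(10, 10))  # 0.0009765625, same as above
print(prob_k_heads(5, 10))   # 0.24609375, the most likely single outcome
```

Seeing 10 straight heads is so unlikely under the fair-coin assumption (about 1 in 1,024) that we would reject the notion that the coin is fair, which is the core logic of statistical inference described above.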