In this lecture, you're going to do an active data collection activity. It will be most effective for you to have the lecture with you. So that means you should either have the video with you on your phone or a laptop or something, so that you can pause and unpause the video as you go through the required steps, or you could print out the slides that I give you as lecture resources. Either way, you need to have the lecture with you so that you know what steps to follow. You should go to some populated place to do this, not just a dorm room or something, because you need to collect, say, 20 pieces of data about people. So you need to go somewhere where there are at least 20 people wandering around. It can be a coffee shop, or a park, or wherever you choose. Here's your job. The data you need to collect are the following. You need to, for each person, answer: are they wearing something blue or not? So that's a yes or no question. You'll also need to estimate their age. Something within a decade is fine, you can round. So between zero and 10, between 11 and 20, between 21 and 30, and so on. We don't have to be precise here. We're just putting people into age bins, rather than worrying about true precision. So at this point, you should pause the lecture and go collect that data. When you're done, unpause the lecture to continue. Using the data that you've collected, answer the following questions. What percentage of people were wearing something blue? What percentage of people weren't? Go ahead and pause the lecture while you do that calculation. Once you have it done, unpause the lecture to continue. Now, using the data that you've collected, answer the following question. How many people were wearing red shirts? You could be furious. "You didn't ask me about red shirts. I have no idea. I wasn't paying attention. Those people have left the coffee shop." There's an important lesson here.
The important lesson is you need to know how the data is going to be used before you race off to collect it. If somebody says, "Go collect data," you need to know, all right, what questions are you trying to answer, and how will this data help you answer them? Or what are you trying to build, and how will this data help you build it? If you don't know that, you might not collect data that you needed. For example, in this case, you didn't collect data about red shirts. You might also waste time collecting data that you didn't need. Like the age data, we didn't really care. We just said, "Gee, I don't know. Maybe we'll need to know about age." So you need a good understanding of how the data that you're collecting will be used before you race off to collect it. Okay. So now, you should collect data about people wearing red shirts, and you can pause the video to do that. Once you're done doing that, unpause the video to continue again. Now, you should answer the following question. What is the average weight of the people in the area? There's a lesson here too. I'm going to say that data is not available. You could go ask people what their weight is, but that's probably not a socially acceptable way to collect this data. And if you try to estimate it by eye, you're probably not going to be very accurate unless you're a carnival performer who guesses people's weights. So sometimes the data that we'd like is not available even through direct observation, and this is a direct observation exercise. So that's a lesson as well. Sometimes we just can't get the data that we want. Even if I let you go try to collect it again, you just can't get it. Now, how did you store the data that you collected? We know there are certain characteristics that we would like our data to have so that we can explore the data to answer our questions.
I will say that as we collect the data, we often don't put it into the final form in which we're going to store it. You might just have it on a piece of paper. You're counting things up and marking stuff down, but eventually, you're going to want it to meet the common characteristics. Is your data searchable? Can you look for people who are wearing something blue or not? Or could you look for people who are a certain age, even though we didn't use that? Or could you look for the people who are wearing red shirts, and so on? Is it sortable? Can you, in fact, group things together even if you've collected all the data one at a time? So say you had your laptop, and you're entering them in a spreadsheet or something like that. Once you're done, it would be nice to just sort them, so all the people wearing blue are at the top and then you can just count them. Now, it turns out that for this particular data, the questions I'm asking, are they wearing something blue and are they wearing a red shirt, are not exclusive categories. You could have somebody wearing something blue and a red shirt. So the sorting part can be tricky, but often, if we've stored it electronically, we can sort based on one criterion to answer one question, and then by another criterion to answer another question. Is it modifiable? If I say, "Okay. Go do this for 20 more people," are you going to easily be able to integrate those data into the data that you've already stored? Finally, is it summarizable? Because the questions I actually asked, what percentage of the people are wearing blue and what percentage are not wearing blue, that's summary data. You do a calculation based on your dataset. The same is true of how many people are wearing red shirts. That's a counting problem. So the count of something in your dataset is also a summary of that data.
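To make those four characteristics a little more concrete, here is a minimal sketch in Python of what the data might look like once it's stored electronically. This isn't the only way to do it, and everything in it is an assumption for illustration: the field names (age_bin, wearing_blue, red_shirt) and the handful of sample people are made up, not data from the exercise.

```python
# Hypothetical observations; the field names and values are illustrative only.
observations = [
    {"age_bin": "21-30", "wearing_blue": True, "red_shirt": False},
    {"age_bin": "31-40", "wearing_blue": False, "red_shirt": True},
    {"age_bin": "21-30", "wearing_blue": True, "red_shirt": True},
    {"age_bin": "11-20", "wearing_blue": False, "red_shirt": False},
]

# Searchable: look for the people who are wearing something blue.
blue = [p for p in observations if p["wearing_blue"]]

# Sortable: put the people wearing blue at the top, then re-sort by a
# different criterion (red shirt) to answer a different question.
# The categories are not exclusive, so each sort answers one question.
by_blue = sorted(observations, key=lambda p: p["wearing_blue"], reverse=True)
by_red = sorted(observations, key=lambda p: p["red_shirt"], reverse=True)

# Modifiable: integrate newly collected people into the stored data.
more_people = [{"age_bin": "41-50", "wearing_blue": True, "red_shirt": False}]
observations.extend(more_people)

# Summarizable: a percentage and a count are both summaries of the dataset.
total = len(observations)
percent_blue = 100 * sum(p["wearing_blue"] for p in observations) / total
percent_not_blue = 100 - percent_blue
red_count = sum(p["red_shirt"] for p in observations)

print(f"{percent_blue:.0f}% wearing blue, {percent_not_blue:.0f}% not")
print(f"{red_count} people wearing red shirts")
```

A spreadsheet would give you the same four abilities; the list of records here is just one simple way to show them.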
So even if you didn't store data in those ways as you collected it, which is fairly common, eventually, you would want to put that data into a form that makes answering the questions easier. To recap, in this lecture, we did some active direct observation data collection and we learned some important lessons along the way.