Last time, we talked about how we decide what data we need to collect to solve a particular problem. This time, we'll talk about how we actually find the data that we've decided to collect. One technique we can use is direct observation. If we can just go collect the data ourselves, then we do that. If we need to count the number of people wearing red shirts over certain periods of time in a certain location, or measure the average size of different classrooms on our campus, or a variety of other things, we can just go check ourselves. We can go collect the data through direct observation.
Another way we can collect the data is through online sources, and this may be the most common approach. Back to our cell phone plan: if you were picking your cell phone plan, you'd probably go online, find out the details, and collect the data you need to answer the questions that are important to you by accessing online resources. You do have to be concerned about credibility. I hate to tell you this, but sometimes things on the internet are wrong. So you actually have to use a filter: pay attention to whether what you're reading actually makes sense, whether you know anything about the person saying it, and those kinds of things.
If neither one of those works, if direct observation doesn't work and you can't find the data you need online, you can contact authoritative sources directly. So for example, say you wanted some information about the spread of disease. You could contact the Centers for Disease Control and Prevention in Atlanta, here in the US, and you could do that by actually writing a letter, or maybe sending an email. But it's going to be slower than just grabbing the data off the internet. On the other hand, you tend to worry less about the credibility of the response. That data should be good if you've picked a reasonable organization to ask.
So, what if you can't find the data, because it's just not available? What can you do? Well, one of the things you can do is find data from a related domain and make inferences. I happen to really care about how computer science students learn. Now, there is lots of data about that; there's a very vibrant computer science education community. But if there weren't, then I might be able to go gather data about how math students learn, and because math is a very detail-oriented, step-by-step process, and so is computer programming, I might be able to draw inferences from the math dataset to the computer science domain.
You can also collect some data yourself, and this is direct observation again. That's one way to do it if no one has provided the data that you want. So let's say I cared about how computer science students learn computational thinking. I promise I'm not collecting any data about how you learn computational thinking, but in a classroom, for example, with the appropriate institutional review board approvals and so on, I could collect data about how students are learning computational thinking, and I'd be doing that through direct observation.
Probably the least good way to do it (great English, "least good") would be to change your problem. You might be able to rephrase your problem as something for which data is available, and then you can answer that question instead. A lot of times we can't really do this, but sometimes we have the flexibility. In particular, if we're trying to build a particular application and we can't get the data that we need for the application we envisioned, we might be able to pivot to a slightly different application that we can, in fact, get the data for, so we can build it.
To recap, in this lecture, we talked about various techniques we can use to find the data that we've decided to collect to solve a particular problem.