All right. Evaluating the methods of data access. So, we've talked about the importance of data in our data communication story, right? We've talked about the explosion and growth of data. So, what are the ways we can access that data? Let's look at a few of them and give each a little bit of an evaluation. There really are three primary ways that we as analysts will access the data we use in our data communication story. We'll call those bulk downloads, APIs (application programming interfaces), and web scraping. Let's look at each of them individually. Bulk downloads is the idea that someone, a data owner, has provided a place for you as an analyst to go and collect that data. These are very controlled releases of data. Some of these I would even think of as your internal company data: someone is the owner of that data, and you are accessing it for analysis purposes. That is what I would classify as a bulk download, just as surely as any third-party download site. These bulk download areas are typically data tables accessed through some graphical user interface, or GUI, built for the purpose of giving you that data. The people who own this data want you to have it, and they have set up a bridge for you to get to it. This gives the data owner the ability to determine what they give out and what they hold back, such as sensitive information. It also gives the data owner the ability to limit how much of the data you can get and to differentiate access based on who you are: once you've declared who you are through a login, they decide what data you get and what data you don't. It is the easiest way to access data, bar none. A good example of this is the census data here in the US. They've constructed a site called American FactFinder. You can go to American FactFinder, there's a great GUI, and you can very simply and easily access all or most of the US Census's data, at least the data they want you to have access to.
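Part of why bulk downloads are the easiest route is that they usually arrive as flat files, often CSVs, that load in a few lines of code. As a minimal sketch (the state names and counts here are made up for illustration, standing in for a file you'd save from a data owner's download page):

```python
import csv
import io

# Hypothetical contents of a bulk-downloaded CSV file -- in practice you
# would open the file saved from the data owner's download page.
sample = io.StringIO(
    "state,population\n"
    "Utah,3205958\n"
    "Idaho,1839106\n"
)

# csv.DictReader maps each data row to a dict keyed by the header line.
rows = list(csv.DictReader(sample))
for row in rows:
    print(row["state"], row["population"])
```

Once the file is on disk, that's the whole job: no keys, no scripting against a remote service, just reading a table.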
Now, these things don't always look great. IMDb has what I would call a bulk download source as well; it's a little bit hidden, a little harder to find. But all the same, they have set up a place for you to go access data as an individual. We live in an era of such plentiful data that we really need to do little more than sign up for a newsletter to get tremendous amounts of information, interesting, vital datasets sent our way. Kaggle does this weekly: they'll send you a list of interesting new datasets that you can access through a prescribed, dedicated access link. That is, again, this idea of bulk downloads. A second way that we can access data is what we call APIs, application programming interfaces. Truly, the idea of an API is to let a computer talk to a computer; it is a machine-to-machine connection. Accessing and building an API connection for yourself requires some understanding of a scripting language and the set of commands available to you, and there are tokens and keys that handle security. Again, the design is meant to be machine to machine and taken care of automatically. Still, you can, as an analyst, build your own API access tool to go out and tap this data. Typically, data that we want access to would be better found through a bulk download, but APIs can work. This is an example of some of that scripting; this is the API for Google Maps. Rather than going out and running your own API calls, you can utilize sites that are really just fancy, user-friendly wrappers around APIs. Let me show you an example: Tweet Binder is one of those. Tweet Binder will give you great access and insight into Twitter data. Basically, what Tweet Binder is, is a connection to the Twitter API: it takes in, through its nice graphical interface, some information that it then pops into its API query, goes off and pings the Twitter API, grabs data, and brings it back all cleaned up for you.
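To make the tokens-and-keys idea concrete, here is a minimal sketch of what assembling an API request looks like in Python. The endpoint URL, parameter names, and key are all hypothetical stand-ins for illustration, not a real service:

```python
from urllib.parse import urlencode

def build_api_request(endpoint, params, api_key):
    """Assemble a GET request URL: the query string carries both the
    search parameters and the key that identifies who is asking."""
    query = dict(params)
    query["key"] = api_key  # many APIs authenticate via a key parameter
    return endpoint + "?" + urlencode(query)

# Hypothetical geocoding-style request (illustrative endpoint, not real).
url = build_api_request(
    "https://api.example.com/geocode",
    {"address": "1600 Pennsylvania Ave"},
    api_key="YOUR_KEY_HERE",
)
print(url)
```

A real script would then fetch that URL and parse the machine-readable response, which is exactly the scripting, commands, and key-handling work that GUI wrappers like Tweet Binder do on your behalf.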
So, in this case it's cheating, because you're using an API but you're accessing it through someone else's GUI. That would be, in my estimation, not as effective as finding a bulk download source, but it is still a better alternative to writing your own API scripts, which take a little more time and a little more expertise. A third way that we can access data is through a practice called web scraping. Now, in the early days of the web, this was the black-hat way to access data, the sneaky way to go out and get whatever data you need. It's frankly fallen out of practice, and that's because we live with so much information around us, and so many organizations and data owners are willing to give you that data, that there's very little reason for you to go out and scrape data you're not intended to have. In fact, an interesting analysis was done by Sophie Chou, who put together a decision tree for whether you should or should not build a web scraper, and almost every single path you go down says, "Don't build a web scraper." Why would you scrape this data? There really are a number of good reasons why you would not. If you need to scrape data from an HTML web page or any other source, that's only because it hasn't been packaged for you to collect, which means that, unlike with an API or bulk download site, the data owner really probably doesn't want you to have that data. If they did, they would make it available. There are usually copyrights and other licensing issues that govern much of that data. And while scraping can at times be a very easy case of simply copying and pasting data from an HTML page, it can also require pretty sophisticated Python programming or other language use to go out and grab the data that you want; again, data that probably was never intended for you to get anyway. So, those are the three ways that we as analysts can access all this great data that's being created.
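Before leaving scraping behind entirely, it's worth seeing why it takes more work than the other two routes. Here is a tiny scraper using only Python's standard-library HTML parser, run against an inline snippet (the HTML fragment is invented for illustration; a real scraper would fetch a live page and face all the licensing questions above):

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text inside every <td> cell of an HTML table."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data.strip())

# Made-up page fragment standing in for a real web page.
page = "<table><tr><td>IMDb</td><td>9.3</td></tr></table>"
parser = CellCollector()
parser.feed(page)
print(parser.cells)
```

Notice that you have to reverse-engineer the page's structure yourself; compare that to a bulk download, where the owner has already laid the data out in a table for you.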
Again, in my opinion, the bulk download site is the route that should supply the vast majority of the data we use in our analysis, whether that's publicly available data or our own company's internal data accessed through that means. APIs can in some cases work well, and web scraping is something that I think we as analysts are better off just leaving behind.