First, let's talk about where data comes from. What are the common sources of data that you may come across and want to analyze? The most obvious is the data that the organization uses every day. This includes customer data, transaction data, processing and production data, inventory and sales data, data about employees, partners, suppliers, and so forth. This is the data that organizations primarily strive to integrate and make sense of.

Then there's what's called dark data: data that may be used for a single purpose and then forgotten about or archived. This can include some of the above, but more often it's data like emails, documents, reports, or system log files that track the operation and performance of devices, computer software, and other equipment. As we'll discuss in the next module, clever organizations find endless ways to spin this underutilized and overlooked data into gold.

Then there's what's called open data or public data. This is data that's made freely available by governments or other organizations. It also includes public financial reports. Many governments, from national governments to local governments, have a mandate to share data with the public, from safety, to weather, to housing, to population and employment data, and everything in between. It's been estimated that governments around the world publish over 10 million distinct datasets. Check out data.gov for an example of a government data portal, a library of available data published by the US federal government. Others from the national governments of India, the UK, Germany, and Nigeria are also particularly impressive.

Then there's social media content. Of course, we're all aware of the amount of data that companies like Facebook, Twitter, LinkedIn (now owned by Microsoft), Yahoo, and Google capture about us. Yes, they use this data for various economic benefits, mostly presenting targeted ads.
But this data is also available to you and your employer to identify trends or instances of information. Again, clever companies use this data to tremendous economic advantage.

Then there's what's called syndicated data or commercial data. It's data collected and sold by data aggregators or data brokers like AC Nielsen, Experian, Neustar, Dun & Bradstreet, Bloomberg, Reuters, Standard & Poor's, et cetera. Credit bureaus have been around in the US since the late 1800s. They collected data on who paid their bills on time, as reported by local merchants and neighbors. Today, thousands of such companies, big and small, collect, harvest, and sell data on everything from credit data to name and address information, to patent data, stock trade data, restaurant menu data, to what content is being trafficked over peer-to-peer networks like BitTorrent, and of course news content. Sometimes the lines are blurred as to whether a company is in the data business, the news business, or the social media business. In addition, a handful of what are called data marketplaces have emerged to connect sellers and buyers of data. Some are general marketplaces, while others focus on a particular industry like healthcare or financial data.

Then there's web content. Many organizations are just coming to realize, as many syndicated data brokers already have, that websites themselves are a goldmine of information. Technologies have emerged that can crawl websites and harvest content, from competitor product pricing to executives who have joined or left a company.

Then there's partner data. All companies are part of a larger ecosystem of partners, suppliers, and even suppliers' suppliers. Data exchange arrangements to gather data from partners are increasingly commonplace.

There's reference data. There are a couple of main kinds of reference data you'll frequently hear mentioned: master data and metadata. They're easily confused with one another.
Master data is reference data, typically about customers or products. It's data like a customer code or a product code that helps link customer or product data wherever it exists throughout the organization, and sometimes between organizations. There are some data brokers that specialize in selling reference data, like the Dun & Bradstreet D-U-N-S Number for US businesses, or product reference data from Indix. Metadata is often called data about data. That is, what is the business meaning of a particular dataset or data field? What is its provenance, or where did it originate? How old is it? How has it been computed or integrated? What are its quality characteristics, if they've even been measured? And who's responsible for maintaining that data? Metadata is helpful, if not critical, for business people and business analysts to find, identify, and access the right data to use for the right purpose. Often, metadata is maintained in what's called a data catalog.

Then there's the Internet of Things. Increasingly, data is streaming over the internet from equipment and machines, from cars and planes to manufacturing and mining equipment, traffic signals, even your connected coffee machine, to the companies that make these products or the customers that installed them. This collection of data is loosely referred to as the Internet of Things, or IoT, even though these data streams are completely independent of one another.

Finally, there's what people call alternative data. Alternative data is a term worth mentioning, as it's become a lively topic of discussion in data circles. It basically refers to external or exogenous data that can provide some unique or supplemental competitive advantage over the data you already generate or collect. Data brokers and data marketplaces specialize in selling alternative data in the form of data products or data services, but any of the above can be considered alternative data as well.
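
Since master data and metadata are so easily confused, here's a minimal Python sketch that may help keep them apart. Everything in it is made up for illustration: the record contents, the `customer_code` field, the `link_by_master_key` helper, and the `CatalogEntry` fields are assumptions, not any particular product's schema. The customer code acts as master data, linking records from two systems, while the catalog entry holds metadata, describing the resulting dataset rather than containing it.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical records from two separate systems. The shared
# "customer_code" plays the role of the master data key.
crm_records = [
    {"customer_code": "C-001", "name": "Acme Corp"},
    {"customer_code": "C-002", "name": "Globex"},
]
billing_records = [
    {"customer_code": "C-001", "amount_due": 1200.0},
    {"customer_code": "C-002", "amount_due": 0.0},
]


def link_by_master_key(left, right, key):
    """Join two lists of dict records on a shared master data key."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


@dataclass
class CatalogEntry:
    """Metadata: data about a dataset, not the dataset itself."""
    dataset_name: str
    business_meaning: str
    provenance: str
    last_updated: date
    steward: str  # who is responsible for maintaining the data


# Master data in action: the customer code links the two systems.
linked = link_by_master_key(crm_records, billing_records, "customer_code")

# Metadata in action: a catalog entry describing the linked dataset.
# All values here are illustrative placeholders.
entry = CatalogEntry(
    dataset_name="customer_balances",
    business_meaning="Outstanding balance per customer",
    provenance="Joined from CRM and billing on customer_code",
    last_updated=date(2024, 1, 1),
    steward="finance-data-team",
)
```

A real data catalog would track much more, such as quality measurements and how each field was computed, but the split is the same: the key links the data, and the catalog entry describes it.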