Welcome to “Data Warehouse Architecture Overview.” After watching this video, you will be able to: List use cases that drive data warehouse design considerations. Describe a general data warehousing architecture and list its component layers. Distinguish between general and reference enterprise data warehouse architecture and Describe reference architectures for two enterprise data warehouse platforms. The details of the architecture of a data warehouse depend on the intended usage of the platform. Requirements can include report generation and dashboarding, exploratory data analysis, automation and machine learning, and self-serve analytics. Let’s start by considering a general architectural model for an Enterprise Data Warehouse, or EDW, platform, which companies can adapt for their analytics requirements. In this architecture, you can have various layers or components, including: Data sources, such as flat files, databases, and existing operational systems, an ETL layer for extracting, transforming, and loading data, optional staging and sandbox areas for holding data and developing workflows, an enterprise data warehouse repository, sometimes, data marts, which are known as a “hub and spoke” architecture when multiple data marts are involved, and an analytics layer and business intelligence tools. Data warehouses also enforce security for incoming data and data passing through to further stages and users throughout the network. Enterprise data warehouse vendors often create proprietary reference architecture and implement template data warehousing solutions that are variations on this general architectural model. A data warehousing platform is a complex environment with lots of moving parts. Thus, interoperability among components is vital. Vendor-specific reference architecture typically incorporates tools and products from the vendor’s ecosystem that work well together. Next, let’s check out IBM-specific reference data warehouse architecture. Each layer of the architecture performs a specific function: The data acquisition layer consists of components to acquire raw data from source systems, such as human resources, finance, and billing departments. The data integration layer, essentially a staging area, has components for extracting the data, transforming it, and loading it into the data repository layer. It also houses administration tools and central metadata. The data repository layer stores the integrated data, typically employing a relational model. The analytics layer often stores data in a cube format to make it easier for users to analyze it. And, the final presentation layer incorporates applications that provide access for different sets of users, such as marketing analysts, users, and agents. Applications consume the data through web pages and portals defined in the reporting tool or through web services. IBM reference architecture is supported and extended using several products from the IBM InfoSphere suite. IBM InfoSphere DataStage is a scalable ETL platform that delivers near real-time integration of all data types, on-premises, and in cloud environments. IBM InfoSphere MetaData Workbench provides end-to-end data flow reporting and impacts analysis of information assets in an environment that allows organizations to share easily, locate, and retrieve information from these systems. Use the built-in data flow reporting capabilities to monitor how IBM InfoSphere DataStage moves and transforms your data. IBM InfoSphere QualityStage, designed to support your data quality and information governance initiatives, enables you to investigate, cleanse, and manage your data. This solution helps you create and maintain consistent views of key entities, including customers, vendors, locations, and products. IBM Db2 Warehouse is a family of highly performant, scalable, and reliable data management products that manage both structured and unstructured data across on-premises and cloud environments. And finally, IBM Cognos Analytics is an advanced business intelligence platform that generates reports, scoreboards, and dashboards, performs exploratory data analysis, and even curates and joins your data using multiple sources. In this video, you learned that: An architectural model for a general data warehousing platform includes data sources, ETL pipelines, optional staging and sandbox areas, an enterprise data warehouse repository, optional data marts, and analytics and business intelligence tools. Companies can modify general enterprise data warehouse architecture to suit their analytics requirements. Vendors offer proprietary reference architecture based on the general model, which they test for interoperability among components. An IBM enterprise data warehouse solution combines InfoSphere with Db2 Warehouse and Cognos Analytics.