This course covers data engineering topics that underpin decision support systems, data analytics, data mining, machine learning, and artificial intelligence. You will study on-premises data warehouse architecture, dimensional modeling of data warehouses, Extract-Transform-Load (ETL) integration from source systems into the data warehouse, Online Analytical Processing (OLAP) systems, and the evolving field of data quality and data governance. You will also have the opportunity to design, develop, and maintain cloud-based data pipelines. Both on-premises and cloud-based platforms are used to illustrate and implement data engineering techniques with operational and analytical data warehouses.

Data Warehousing and Integration Part 1

Instructor: Venkat Krishnamurthy
13 assignments

There are 7 modules in this course
This module introduces data warehousing and business intelligence, emphasizing their role in enhancing organizational decision-making. Data warehouses transform raw data into actionable insights using processes like ETL (Extract, Transform, and Load), supported by tools such as OLAP for querying and data mining. While operational databases (OLTP) are suited for daily transactions, OLAP databases are optimized for complex analytics.
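The ETL flow described above can be sketched in a few lines. This is a minimal illustration, not course material: the `RAW_ORDERS` rows, the `sales` table, and the three function names are hypothetical, and an in-memory SQLite database stands in for the analytical store.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from an operational (OLTP) source.
RAW_ORDERS = [
    {"order_id": "1", "customer": " alice ", "amount": "19.90", "date": "2024-01-05"},
    {"order_id": "2", "customer": "BOB", "amount": "5.10", "date": "2024-01-05"},
    {"order_id": "3", "customer": "alice", "amount": "n/a", "date": "2024-01-06"},
]


def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: clean names, cast types, and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount, row["date"]))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the analytical store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# An OLAP-style question: total sales per day, rather than a single-row lookup.
daily_totals = conn.execute(
    "SELECT sale_date, ROUND(SUM(amount), 2) FROM sales "
    "GROUP BY sale_date ORDER BY sale_date"
).fetchall()
print(daily_totals)
```

The third source row is rejected during the transform step, so only the two valid orders reach the `sales` table.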
What's included
3 videos•6 readings•1 assignment
3 videos•Total 7 minutes
- Course Overview•2 minutes
- Meet Your Instructor: Venkat Krishnamurthy•2 minutes
- Introduction to Data Warehouses•4 minutes
6 readings•Total 178 minutes
- Welcome to Data Warehousing & Integration Part 1•2 minutes
- Syllabus - Data Warehousing & Integration Part 1•10 minutes
- Academic Integrity•1 minute
- Module 1 Overview•5 minutes
- Introduction to Data Warehouses•5 minutes
- Conceptual Database Design•155 minutes
1 assignment•Total 15 minutes
- Assess Your Learning: Conceptual Database Modeling•15 minutes
This module builds on the foundations of database design from the previous module, focusing on relational database modeling, normalization, and SQL. The readings will guide you in translating a conceptual EER diagram into a relational model, ensuring adherence to normalization principles and aiming for Third Normal Form (3NF). We’ll also emphasize understanding primary keys and foreign keys for maintaining data integrity and establishing table relationships. You will also have the opportunity to create and critique relational models. We’ll then explore SQL basics, covering syntax (SELECT, INSERT, UPDATE, DELETE), querying techniques (WHERE, ORDER BY, JOIN), and operations involving functions and aggregates (COUNT, SUM, AVG, MIN, MAX), which are fundamental in database querying and management.
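The SQL statements and key concepts named above can be tried out with Python's built-in `sqlite3` module. The sketch below is only illustrative: the `customer` and `orders` tables and their rows are hypothetical, and SQLite is assumed as the engine (it enforces foreign keys only after `PRAGMA foreign_keys = ON`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Two hypothetical tables linked by a primary key / foreign key pair.
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
    """
)

# INSERT
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0)],
)

# UPDATE and DELETE, each restricted with WHERE
conn.execute("UPDATE orders SET amount = 35.0 WHERE order_id = 11")
conn.execute("DELETE FROM orders WHERE order_id = 12")

# SELECT with JOIN, aggregates, and ORDER BY.
# LEFT JOIN keeps customers that have no remaining orders.
summary = conn.execute(
    """
    SELECT c.name, COUNT(o.order_id), COALESCE(SUM(o.amount), 0)
    FROM customer AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
    """
).fetchall()
print(summary)

# The foreign key rejects an order that points at a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 1.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```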
What's included
3 readings•2 assignments•1 app item
3 readings•Total 339 minutes
- Module 2 Overview•5 minutes
- Logical Database Design•165 minutes
- SQL•169 minutes
2 assignments•Total 40 minutes
- Assess Your Learning: Logical Database Design•20 minutes
- Assess Your Learning: SQL•20 minutes
1 app item•Total 10 minutes
- Normalization•10 minutes
This module provides an introduction to data warehouse concepts. Data warehouses are based on a multidimensional model. We will look closely into the multidimensional model and its representation as data cubes (also known as hypercubes). We’ll examine how different aspects of data are categorized into facts, measures, and dimensions. Dimensions such as Product, Time, and Customer are organized hierarchically within a cube, allowing data to be analyzed at various levels of detail. Measures such as Quantity and Sales Amount are stored within these cubes, and analysts can navigate through different levels of detail using "rolling up" and "drilling down" techniques. We will also explore key concepts such as granularity, dimension schema, and member hierarchies, which are essential in understanding how data is structured and analyzed in multidimensional models. Finally, we will learn the conditions of disjointness, completeness, and correctness, collectively known as summarizability, which ensure that information aggregated in data cubes remains accurate.
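Rolling up and two of the summarizability conditions can be illustrated with plain dictionaries. This is a rough sketch under stated assumptions: the `FACTS` cells, the Product to Category hierarchy, and both function names are hypothetical. The third condition, correctness (using an aggregation function that suits the measure), is not checked here; `SUM` is simply assumed to be appropriate for a quantity.

```python
from collections import defaultdict

# Hypothetical fact cells at the finest granularity: (product, month) -> quantity.
FACTS = {
    ("Espresso", "2024-01"): 10,
    ("Espresso", "2024-02"): 12,
    ("Latte", "2024-01"): 7,
    ("Green Tea", "2024-02"): 4,
}

# Product -> Category hierarchy: each product maps to its set of parent categories.
PRODUCT_TO_CATEGORY = {
    "Espresso": {"Coffee"},
    "Latte": {"Coffee"},
    "Green Tea": {"Tea"},
}


def check_summarizable(members, child_to_parents):
    """Return a list of problems; an empty list means each member has exactly one parent."""
    problems = []
    for member in sorted(members):
        parents = child_to_parents.get(member, set())
        if len(parents) == 0:
            problems.append(f"{member}: no parent (completeness violated)")
        elif len(parents) > 1:
            problems.append(f"{member}: several parents (disjointness violated)")
    return problems


def roll_up(facts, child_to_parents):
    """Aggregate quantity from the Product level to the Category level, keeping Month."""
    totals = defaultdict(int)
    for (product, month), quantity in facts.items():
        (category,) = child_to_parents[product]  # exactly one parent is expected
        totals[(category, month)] += quantity
    return dict(totals)


products = {product for product, _ in FACTS}
problems = check_summarizable(products, PRODUCT_TO_CATEGORY)
by_category = roll_up(FACTS, PRODUCT_TO_CATEGORY)
print(problems)
print(by_category)
```

Drilling down is the reverse navigation: moving from the `by_category` totals back to the product-level cells in `FACTS`. When the hierarchy passes both checks, the grand total is the same at either level.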
What's included
2 videos•5 readings•2 assignments•1 app item
2 videos•Total 6 minutes
- Mental Image of Multidimensional Cube•3 minutes
- Summarizability•3 minutes
5 readings•Total 93 minutes
- Module 3 Overview•5 minutes
- Multidimensional Model•12 minutes
- Measures and Summarizability•46 minutes
- OLAP Operations on a Multidimensional Model•10 minutes
- Data Warehouse and Architecture•20 minutes
2 assignments•Total 50 minutes
- Assess Your Learning: Measures & Summarizability•25 minutes
- Assess Your Learning: OLAP Operations•25 minutes
1 app item•Total 15 minutes
- The Multidimensional Model•15 minutes
In this module we’ll explore conceptual modeling with multidimensional models, visualized using MultiDim. This approach helps us organize data into facts and dimensions and understand the relationships between them, which is essential for designing data warehouses. We’ll explore topics such as dimensions (e.g., date, customer) and measures (e.g., quantity, total sales) in more detail. We’ll also explore the difference between primary events and secondary events and learn how they are used. Finally, we will look at another categorization of measures into flow, level, and unit measures.
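The practical consequence of the flow, level, and unit categories is which aggregations make sense for each. The sketch below assumes the usual reading of these terms (a flow measure accumulates over a period, a level measure is a snapshot at a point in time, a unit measure is a relative value such as a price); the `DAYS` data and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical daily records for one product in one store.
DAYS = [
    {"day": "Mon", "units_sold": 5, "stock_on_hand": 40, "unit_price": 2.50},
    {"day": "Tue", "units_sold": 8, "stock_on_hand": 32, "unit_price": 2.50},
    {"day": "Wed", "units_sold": 2, "stock_on_hand": 30, "unit_price": 3.00},
]

# Flow measure: accumulates over the period, so summing across days is meaningful.
total_units_sold = sum(d["units_sold"] for d in DAYS)

# Level measure: a snapshot, so summing across days is not meaningful.
# Taking the last value or an average over time is used instead.
closing_stock = DAYS[-1]["stock_on_hand"]
average_stock = mean(d["stock_on_hand"] for d in DAYS)
misleading_stock_sum = sum(d["stock_on_hand"] for d in DAYS)  # 102 items never existed

# Unit measure: a relative value, so it is averaged (or min/max), never summed.
average_price = mean(d["unit_price"] for d in DAYS)

print(total_units_sold, closing_stock, average_stock, round(average_price, 2))
```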
What's included
2 videos•4 readings•3 assignments
2 videos•Total 9 minutes
- Primary and Secondary Events•4 minutes
- Additivity of Measures•5 minutes
4 readings•Total 56 minutes
- Module 4 Overview•5 minutes
- Design Conceptual Multidimensional Models•36 minutes
- Primary and Secondary Events•5 minutes
- Additivity of Measures•10 minutes
3 assignments•Total 31 minutes
- Assess Your Learning: Conceptual Modeling 1•15 minutes
- Assess Your Learning: Primary and Secondary Events•8 minutes
- Assess Your Learning: Additivity of Measures•8 minutes
In this module, we’ll dive into conceptual modeling of hierarchies within data warehouses, exploring their definitions, characteristics, and significance. Balanced hierarchies have a uniform structure where each child has one parent and all branches are of the same length, making data analysis consistent and efficient. In contrast, unbalanced hierarchies have varying branch lengths and missing aggregation levels, offering flexibility to model real-world scenarios like product categories and geographical hierarchies. You’ll also be introduced to generalized hierarchies, which involve "is-a" relationships between supertypes and subtypes, allowing for detailed data representation but requiring careful management of aggregation and specialization. We’ll also explore alternative hierarchies, showcasing different ways to organize the same dimension, such as calendar vs. fiscal views of time. Finally, we’ll look at parallel hierarchies, both independent and dependent, as tools for analyzing data from multiple perspectives, representing complex organizational structures. Understanding these hierarchy types is crucial for effective data management and analysis in data warehousing.
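The difference between balanced and unbalanced hierarchies can be tested mechanically by comparing the depth of every leaf member. This is a minimal sketch: the child-to-parent dictionaries and the two function names are hypothetical, and storing one parent per child means the "each child has one parent" condition holds by construction.

```python
def leaf_depths(parent_of):
    """Return {leaf: depth}, where depth counts the steps from a leaf up to its root."""
    parents = set(parent_of.values())
    leaves = [member for member in parent_of if member not in parents]
    depths = {}
    for leaf in leaves:
        depth, node = 0, leaf
        while node in parent_of:  # climb until a member with no parent (the root)
            node = parent_of[node]
            depth += 1
        depths[leaf] = depth
    return depths


def is_balanced(parent_of):
    """A hierarchy is treated as balanced when all leaves sit at the same depth."""
    return len(set(leaf_depths(parent_of).values())) <= 1


# Hypothetical Product -> Category -> Department hierarchy: every branch has length 2.
BALANCED = {
    "Espresso": "Coffee",
    "Latte": "Coffee",
    "Green Tea": "Tea",
    "Coffee": "Beverages",
    "Tea": "Beverages",
}

# Hypothetical organization where one branch has no lower levels below it.
UNBALANCED = {
    "ATM 1": "Agency A",
    "Agency A": "Branch 1",
    "Branch 1": "Bank",
    "Branch 2": "Bank",
}

print(is_balanced(BALANCED), leaf_depths(BALANCED))
print(is_balanced(UNBALANCED), leaf_depths(UNBALANCED))
```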
What's included
4 videos•3 readings•2 assignments
4 videos•Total 14 minutes
- Balanced and Unbalanced Hierarchies•5 minutes
- Generalized Hierarchies•4 minutes
- Alternative Hierarchies•3 minutes
- Parallel Hierarchies•2 minutes
3 readings•Total 140 minutes
- Module 5 Overview•5 minutes
- Balanced and Unbalanced Hierarchies•60 minutes
- Advanced Modeling Concepts•75 minutes
2 assignments•Total 23 minutes
- Assess Your Learning: Conceptual Modeling of Hierarchies•15 minutes
- Assess Your Learning: Advanced Modeling Concepts•8 minutes
In this module, you’ll explore logical modeling in data warehousing, which is the process of designing a structured, abstract representation of data to be stored, focusing on how data is organized, related, and optimized for efficient querying and analysis. Building on what you learned in the previous modules, you'll take the next step in data warehouse design: translating a conceptual model into a logical model for implementation. The module will focus on the relational representation of data warehouses, including the study of various schema implementations: star, snowflake, starflake, and constellation. You'll also examine the rules for mapping a multidimensional conceptual model to a relational model, highlighting the role and importance of different types of keys in this process. We'll also discuss strategies for maintaining consistency in a data warehouse. Finally, you'll explore how to pre-populate certain dimensions, like time, to streamline operations and improve query performance.
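A star schema with surrogate keys and a prepopulated time dimension might look like the sketch below. It is an illustration only: the table and column names are hypothetical, SQLite stands in for the warehouse, and the integer `YYYYMMDD` date key is one common convention rather than something the course prescribes.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,  -- integer key in YYYYMMDD form (assumed convention)
        full_date TEXT NOT NULL UNIQUE,
        year      INTEGER NOT NULL,
        quarter   INTEGER NOT NULL,
        month     INTEGER NOT NULL
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        product_code TEXT NOT NULL UNIQUE,               -- business key from the source
        name         TEXT NOT NULL,
        category     TEXT NOT NULL                       -- kept in the same table (star)
    );
    CREATE TABLE fact_sales (
        date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity     INTEGER NOT NULL,
        sales_amount REAL NOT NULL
    );
    """
)


def date_rows(start, end):
    """Yield one dim_date row per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        quarter = (day.month - 1) // 3 + 1
        yield (int(day.strftime("%Y%m%d")), day.isoformat(), day.year, quarter, day.month)
        day += timedelta(days=1)


# Prepopulate the time dimension before any facts arrive.
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)",
    date_rows(date(2024, 1, 1), date(2024, 12, 31)),
)
date_count = conn.execute("SELECT COUNT(*) FROM dim_date").fetchone()[0]

# Load one product, look up its surrogate key, and record a sale against it.
conn.execute(
    "INSERT INTO dim_product (product_code, name, category) VALUES (?, ?, ?)",
    ("P-001", "Espresso", "Coffee"),
)
product_key = conn.execute(
    "SELECT product_key FROM dim_product WHERE product_code = ?", ("P-001",)
).fetchone()[0]
conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", (20240215, product_key, 3, 7.5))

# A typical star join: the fact table joined to each dimension, then grouped.
sales_by_quarter = conn.execute(
    """
    SELECT d.quarter, p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.quarter, p.category
    """
).fetchall()
print(date_count, sales_by_quarter)
```

Because every calendar day already exists in `dim_date`, loading a fact only needs a key lookup, never a check for a missing date row. In a snowflake variant, `category` would move to its own table referenced from `dim_product`.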
What's included
6 videos•11 readings•2 assignments•1 app item
6 videos•Total 9 minutes
- Introduction to Logical Modeling in Data Warehousing•2 minutes
- Different ROLAP Schemas Conclusion•2 minutes
- Surrogate Keys•1 minute
- Importance of Data Consistency•1 minute
- Consistency in a Data Warehouse Example•2 minutes
- Prepopulating Dimensional Data Example•1 minute
11 readings•Total 122 minutes
- Module 6 Overview•5 minutes
- Logical Modeling of Data Warehouse•32 minutes
- Introduction to Surrogate Keys•10 minutes
- Benefits of Surrogate Keys•10 minutes
- Implementation of Surrogate Keys in a Data Warehouse•10 minutes
- Importance of Data Consistency•5 minutes
- Challenges & Best Practices for Maintaining and Ensuring Data Consistency•10 minutes
- Understanding Prepopulating Dimensions•5 minutes
- The Process of Prepopulating Time and Geography Dimensions•5 minutes
- Benefits of Prepopulating Time and Geography Dimensions•5 minutes
- Prepopulating Dimensions•25 minutes
2 assignments•Total 35 minutes
- Assess Your Learning: Logical Modeling•20 minutes
- Assess Your Learning: Keys, Consistency and Prepopulating Dimensions•15 minutes
1 app item•Total 20 minutes
- Types of ROLAP Schemas•20 minutes
Designing a data warehouse is a complex process that requires transitioning from high-level conceptual models to detailed logical models. This transition is critical because it bridges the gap between understanding business needs and translating them into a technical framework that effectively supports those needs. In this module, you’ll expand on the logical modeling process covered in the previous module, with a particular focus on dimensional model design and the intricacies of hierarchy modeling. As you delve deeper, you’ll encounter logical modeling for advanced concepts such as many-to-many dimensions, links between facts, and facts with multiple granularities. We’ll also explore the concept of Slowly Changing Dimensions (SCDs), which are essential for managing historical data in your warehouse. You’ll learn how to implement different SCD types to accurately track and manage changes in dimension data over time. Finally, we’ll touch on SQL for OLAP, focusing on advanced concepts like aggregation and window functions, and you’ll learn how to use SQL to query and analyze data warehouses.
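To make the idea of tracking dimension changes concrete, the sketch below contrasts two widely used SCD types: Type 1 (overwrite the attribute, keep no history) and Type 2 (close the current row and add a new version). The type definitions follow common usage and may differ in detail from the course readings; the column names, `HIGH_DATE` sentinel, and function names are hypothetical.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed sentinel meaning "still valid"


def current_row(dimension, customer_id):
    """Return the current version of a customer, or None if the customer is new."""
    for row in dimension:
        if row["customer_id"] == customer_id and row["is_current"]:
            return row
    return None


def apply_scd1(dimension, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    row = current_row(dimension, customer_id)
    if row is not None:
        row["city"] = new_city


def apply_scd2(dimension, customer_id, new_city, change_date):
    """Type 2: expire the current row and append a new version with a new surrogate key."""
    row = current_row(dimension, customer_id)
    if row is not None:
        if row["city"] == new_city:
            return  # nothing changed, so no new version is needed
        row["valid_to"] = change_date
        row["is_current"] = False
    next_key = max((r["customer_key"] for r in dimension), default=0) + 1
    dimension.append(
        {
            "customer_key": next_key,    # surrogate key, unique per version
            "customer_id": customer_id,  # business key, shared by all versions
            "city": new_city,
            "valid_from": change_date,
            "valid_to": HIGH_DATE,
            "is_current": True,
        }
    )


dim_customer = []
apply_scd2(dim_customer, "C1", "Boston", date(2023, 1, 1))   # first load
apply_scd2(dim_customer, "C1", "Seattle", date(2024, 6, 1))  # customer moves
apply_scd2(dim_customer, "C1", "Seattle", date(2024, 7, 1))  # no change, no new row

for row in dim_customer:
    print(row["customer_key"], row["city"], row["valid_from"], row["valid_to"], row["is_current"])
```

Facts recorded before the move keep pointing at surrogate key 1 (Boston), while later facts use key 2 (Seattle), which is how Type 2 preserves history in query results.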
What's included
5 videos•11 readings•1 assignment
5 videos•Total 13 minutes
- Modeling Various Types of Hierarchies•5 minutes
- SCD Best Practices•2 minutes
- Translating between SCDs•3 minutes
- Examples of Translating Between SCD Types•2 minutes
- Conclusion•1 minute
11 readings•Total 137 minutes
- Module 7 Overview•5 minutes
- Introduction to Conceptual & Logical Models•15 minutes
- Mapping Process•10 minutes
- Conclusion•1 minute
- Advanced Modeling Concepts•36 minutes
- Understanding Slowly Changing Dimensions•5 minutes
- Types of Slowly Changing Dimensions•10 minutes
- Benefits of Managing Slowly Changing Dimensions•5 minutes
- Steps for Translating Between SCD Types•10 minutes
- Performing OLAP queries with SQL•38 minutes
- Congratulations!•2 minutes
1 assignment•Total 25 minutes
- Assess Your Learning: Logical Representation of Hierarchies and Advanced Concepts•25 minutes
Offered by

Founded in 1898, Northeastern is a global research university with a distinctive, experience-driven approach to education and discovery. The university is a leader in experiential learning, powered by the world’s most far-reaching cooperative education program. The spirit of collaboration guides a use-inspired research enterprise focused on solving global challenges in health, security, and sustainability.