Coursera

Pipeline Architects: Data Engineering to Lakehouse Specialization

Build Data Pipelines That Scale to Production.

Master ingestion, transformation, orchestration, and lakehouse architecture at scale.

Instructor: Hurix Digital

Get in-depth knowledge of a subject
Intermediate level

Recommended experience

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Design data flow diagrams and configure Airbyte connectors for relational databases, streaming platforms, and REST APIs to unify diverse sources.

  • Build modular ETL pipelines using Python, dbt, and Airflow, and evaluate columnar versus row-oriented storage formats for analytical workloads.

  • Implement incremental warehouse loading, SCD2 historical tracking, and data lake transactions with versioning and schema evolution support.

  • Architect and build lakehouse platforms using Delta Lake, Iceberg, and Hudi, registering external tables and automating ingestion pipelines.

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

April 2026

See how employees at top companies are mastering in-demand skills

[Company logos: Petrobras, TATA, Danone, Capgemini, P&G, and L'Oreal]

Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Coursera

Specialization - 10 course series

Map Data Flows Fast

Course 1, 1 hour

What you'll learn

  • Visual data flow docs are key for system clarity and form the base for good pipeline design and team communication.

  • Complete data flow diagrams must show the full journey from sources through transforms to final destinations.

  • Structured diagram creation follows steps: find sources, map processes, set destinations, and check connections.

  • Good data flow visuals connect technical work with business needs, enabling stakeholder alignment and decisions.

Skills you'll gain

Category: Data Processing
Category: Diagram Design
Category: Data Flow Diagrams (DFDs)
Category: Data Presentation
Category: Data Store
Category: Data Transformation
Category: Data Visualization
Category: Data Literacy
Category: Data Pipelines
Category: Technical Communication
Category: Dataflow
Category: Data Mapping

Unify Diverse Data Sources

Course 2, 1 hour

What you'll learn

  • Standardized connector configuration patterns apply across different data source types, making integration skills transferable.

  • Authentication and security considerations must be built into every connector setup to ensure enterprise-grade data protection.

  • Proper offset and parameter management in streaming and API connections prevents data loss and ensures complete data capture.

  • Unified staging approaches enable downstream analytics and business intelligence regardless of source system complexity.
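
The offset-management idea in these bullets can be sketched in plain Python: persist a committed offset after each page so a restarted ingestion job resumes where it left off instead of losing or re-reading data. This is a toy illustration, not course material; the in-memory `RECORDS` source, `fetch_page`, and the checkpoint file name are all hypothetical stand-ins for a real API connector.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical source: 10 records served in pages of 3, keyed by offset.
RECORDS = [{"id": i, "value": f"row-{i}"} for i in range(10)]

def fetch_page(offset: int, limit: int = 3) -> list[dict]:
    """Stand-in for a REST API call returning one page of records."""
    return RECORDS[offset : offset + limit]

def load_offset(checkpoint: Path) -> int:
    """Resume from the last committed offset, or start at 0."""
    return json.loads(checkpoint.read_text())["offset"] if checkpoint.exists() else 0

def ingest(checkpoint: Path, staging: list[dict]) -> None:
    """Pull pages until exhausted, committing the offset after each page."""
    offset = load_offset(checkpoint)
    while page := fetch_page(offset):
        staging.extend(page)  # land records in the staging area
        offset += len(page)
        checkpoint.write_text(json.dumps({"offset": offset}))  # commit progress

staging: list[dict] = []
ckpt = Path(tempfile.mkdtemp()) / "offset.json"
ingest(ckpt, staging)
print(len(staging), load_offset(ckpt))  # → 10 10
```

Because the offset is committed only after a page lands in staging, a crash mid-run replays at most one page rather than skipping records.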

Skills you'll gain

Category: Databases
Category: Real Time Data
Category: Relational Databases
Category: Enterprise Security
Category: Data Integration
Category: Database Systems
Category: Data Pipelines
Category: Authentications
Category: Restful API
Category: Data Infrastructure
Category: Apache Kafka

Evaluate Storage for Data Warehousing Success

Course 3, 2 hours

What you'll learn

  • Storage format choice strongly affects query performance and should match workload needs, not general assumptions.

  • Column storage suits read-heavy analytics, while row storage performs better for transactional and write-focused workloads.

  • Benchmarking with real datasets and queries offers the best basis for sound storage architecture decisions.

  • Compression and ingestion speed must be balanced carefully to align performance with business priorities.
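
The column-versus-row trade-off above can be illustrated with a toy sketch (not a benchmark, and not course material): the same table stored as a list of row records versus a dict of column arrays. An analytical aggregate reads only the needed column in the columnar layout, while a transactional insert is a single append in the row layout but one append per column otherwise.

```python
# Same table in two physical layouts (toy illustration, not a benchmark).
rows = [{"id": i, "amount": i * 2.0, "region": "EU"} for i in range(1000)]

# Columnar: each column stored contiguously.
columns = {
    "id": [r["id"] for r in rows],
    "amount": [r["amount"] for r in rows],
    "region": [r["region"] for r in rows],
}

# Analytical query: SUM(amount).
row_sum = sum(r["amount"] for r in rows)  # touches every field of every row
col_sum = sum(columns["amount"])          # touches only the needed column

# Transactional write: append one full record.
rows.append({"id": 1000, "amount": 2000.0, "region": "US"})  # one append
for name, value in [("id", 1000), ("amount", 2000.0), ("region", "US")]:
    columns[name].append(value)           # one append per column

print(row_sum == col_sum)  # → True
```

Real engines (Parquet vs. CSV, Redshift vs. an OLTP store) add compression and vectorized scans on top, which is why benchmarking with real datasets, as the bullets recommend, is the deciding step.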

Skills you'll gain

Category: Scalability
Category: Data Processing
Category: Query Languages
Category: Star Schema
Category: Data Storage Technologies
Category: Performance Testing
Category: Data Storage
Category: Amazon Redshift
Category: Apache Hive
Category: Analysis
Category: Data Architecture
Category: Technical Communication
Category: Data Warehousing
Category: Data-Driven Decision-Making
Category: Snowflake Schema

Build & Transform Data Pipelines

Course 4, 2 hours

What you'll learn

  • Modular pipeline design enables maintainable, scalable data systems that can adapt to changing business requirements.

  • Integration of complementary tools (Spark, dbt, Airflow) creates more robust and efficient data processing workflows than single-tool approaches.

  • Proper separation of concerns between ingestion, transformation, and loading stages reduces complexity and improves debugging capabilities.

  • Automation and orchestration are essential for reliable, production-grade data systems that minimize manual intervention and human error.
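
The separation-of-concerns principle above can be sketched as three small, independently testable functions composed into a pipeline. This is a minimal illustration, not course material; the function names, the CSV-like input, and the dict "warehouse" are all hypothetical.

```python
def extract(source: list[str]) -> list[dict]:
    """Ingestion: parse raw CSV-like lines into records, no business logic."""
    return [dict(zip(("id", "amount"), line.split(","))) for line in source]

def transform(records: list[dict]) -> list[dict]:
    """Transformation: type casting and filtering, isolated and testable."""
    typed = [{"id": int(r["id"]), "amount": float(r["amount"])} for r in records]
    return [r for r in typed if r["amount"] > 0]

def load(records: list[dict], warehouse: dict) -> None:
    """Loading: idempotent upsert by key into the target store."""
    for r in records:
        warehouse[r["id"]] = r

warehouse: dict = {}
raw = ["1,10.5", "2,-3.0", "3,7.25"]
load(transform(extract(raw)), warehouse)
print(sorted(warehouse))  # → [1, 3]
```

Tools like dbt and Airflow apply the same idea at scale: each stage can be rerun, tested, or swapped without touching the others.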

Skills you'll gain

Category: Data Pipelines
Category: Data Processing
Category: Extract, Transform, Load
Category: Data Warehousing
Category: Cloud Computing
Category: Apache Airflow
Category: Data Integration
Category: Cloud Deployment
Category: Maintainability
Category: Data Cleansing

Update Your Data Warehouse Incrementally

Course 5, 2 hours

What you'll learn

  • Standardized connector configuration patterns apply across different data source types, making integration skills transferable.

  • Authentication and security considerations must be built into every connector setup to ensure enterprise-grade data protection.

  • Proper offset and parameter management in streaming and API connections prevents data loss and ensures complete data capture.

  • Unified staging approaches enable downstream analytics and business intelligence regardless of source system complexity.
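
The incremental-loading pattern this course's title refers to can be sketched with a high-water-mark query: each run copies only rows newer than the last committed watermark. A toy SQLite illustration, not course material; table names, columns, and timestamps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, updated_at INTEGER)")
conn.execute("CREATE TABLE warehouse (id INTEGER, updated_at INTEGER)")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(1, 100), (2, 150), (3, 200)])

def incremental_load(conn, watermark: int) -> int:
    """Copy only rows newer than the watermark; return the new watermark."""
    rows = conn.execute(
        "SELECT id, updated_at FROM source WHERE updated_at > ?", (watermark,)
    ).fetchall()
    conn.executemany("INSERT INTO warehouse VALUES (?, ?)", rows)
    return max((ts for _, ts in rows), default=watermark)

wm = incremental_load(conn, 0)        # first run loads all 3 rows
conn.execute("INSERT INTO source VALUES (4, 250)")
wm = incremental_load(conn, wm)       # second run loads only id=4
count = conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]
print(count, wm)  # → 4 250
```

The watermark makes reruns cheap and safe: rows already loaded are never scanned again.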

Apply SCD2 to Build Dynamic Data Models

Course 6, 2 hours

What you'll learn

  • Historical data preservation is essential for accurate business analytics and regulatory compliance: once overwritten, critical context is lost.

  • SCD2 patterns create sustainable data architecture by maintaining complete audit trails through automated versioning rather than destructive updates.

  • Effective dimensional modeling requires systematic change detection logic that identifies modifications and creates new historical records.

  • Modern data tools like dbt democratize complex SCD2 implementation, making enterprise-grade historical tracking accessible through declarative SQL.
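
The change-detection logic behind SCD2 can be sketched in a few lines: compare incoming attributes with the current open record, expire it on change, and append a new version. A toy sketch, not course material; the dimension-as-list structure and attribute names are hypothetical, and real implementations (e.g., dbt snapshots) generate equivalent SQL.

```python
from datetime import date

def scd2_upsert(dim: list[dict], key: int, attrs: dict, today: date) -> None:
    """Close the current record if attributes changed, then open a new version."""
    current = next((r for r in dim
                    if r["key"] == key and r["end_date"] is None), None)
    if current is not None:
        if all(current[k] == v for k, v in attrs.items()):
            return                      # no change: nothing to do
        current["end_date"] = today     # expire the old version
    dim.append({"key": key, **attrs, "start_date": today, "end_date": None})

dim: list[dict] = []
scd2_upsert(dim, 1, {"city": "Berlin"}, date(2024, 1, 1))
scd2_upsert(dim, 1, {"city": "Berlin"}, date(2024, 6, 1))   # unchanged: ignored
scd2_upsert(dim, 1, {"city": "Munich"}, date(2024, 6, 1))   # change: new version
print(len(dim), dim[0]["end_date"])  # → 2 2024-06-01
```

Queries then filter on `end_date IS NULL` for the current view, or on date ranges for point-in-time history.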

Skills you'll gain

Category: Data Integrity
Category: Data Modeling
Category: Business Intelligence
Category: Trend Analysis
Category: Audit Planning
Category: Data Warehousing
Category: Scalability
Category: Data Management
Category: Data Quality
Category: Data Pipelines
Category: Database Design
Category: SQL

Apply Data Lake Transactions & Versioning

Course 7, 2 hours

What you'll learn

  • Transactional storage layers ensure data lake reliability, supporting concurrent operations and maintaining integrity.

  • Version control in data lakes enables auditing, compliance, time-travel queries, and error recovery for production systems.

  • Schema evolution strategies help data systems adapt to business changes while maintaining backward compatibility.

  • Converting raw files to transactional formats is a key pattern supporting both analytics and operational reliability.
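
The versioning and time-travel ideas above can be sketched with a toy table where every commit produces a new immutable snapshot, so readers never observe a half-written batch and any historical version remains queryable. This is an illustration only, not course material; real formats like Delta Lake or Iceberg track versions via metadata and transaction logs rather than full copies.

```python
import copy

class VersionedTable:
    """Toy transactional table: each commit snapshots a new immutable version."""
    def __init__(self):
        self.versions: list[list[dict]] = [[]]   # version 0 is empty

    def commit(self, new_rows: list[dict]) -> int:
        """Atomically append rows as a new version; readers never see partials."""
        snapshot = copy.deepcopy(self.versions[-1]) + new_rows
        self.versions.append(snapshot)
        return len(self.versions) - 1            # new version number

    def read(self, version: int = -1) -> list[dict]:
        """Time travel: read the latest (-1) or any historical version."""
        return self.versions[version]

t = VersionedTable()
v1 = t.commit([{"id": 1}])
v2 = t.commit([{"id": 2}])
print(len(t.read()), len(t.read(v1)))  # → 2 1
```

Reading an old version number is the essence of a time-travel query; rolling back is just committing an old snapshot as the new head.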

Skills you'll gain

Category: Database Architecture and Administration
Category: Data Pipelines
Category: Data Lakes
Category: SQL

Build & Analyze Your Data Lakehouse

Course 8, 2 hours

What you'll learn

  • External tables let query engines access distributed files without duplication, reshaping large-scale analytics design.

  • Choosing Delta, Iceberg, or Hudi requires evaluating schema changes, time travel needs, and performance goals.

  • Lakehouse architecture merges data lake flexibility with warehouse reliability using metadata and ACID support.

  • Automated ingestion with staging and transformation layers ensures consistent, high-quality data across analytics systems.

Automate Data Workflows with Airflow Excellence

Course 9, 1 hour

What you'll learn

  • Production-grade workflows require proactive failure handling strategies, not reactive troubleshooting approaches.

  • Parameterization and configuration management are essential for workflow reusability across different environments and datasets.

  • Task dependency design and SLA monitoring form the foundation of reliable data pipeline operations.

  • Robust workflow architecture prevents downstream business disruptions and reduces operational overhead.
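
The proactive failure handling these bullets describe can be sketched as retry-with-backoff, the pattern Airflow exposes through task-level retry settings. A minimal stand-alone sketch, not course material; `with_retries` and the `flaky_extract` task are hypothetical names, not Airflow APIs.

```python
import time

def with_retries(task, max_attempts: int = 3, base_delay: float = 0.01):
    """Run a task, retrying on failure with exponential backoff: a proactive
    failure-handling pattern rather than reactive troubleshooting."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise                    # retries exhausted: surface the failure
            time.sleep(base_delay * 2 ** (attempt - 1))

calls = {"n": 0}
def flaky_extract():
    """Hypothetical task that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient source outage")
    return "loaded"

result = with_retries(flaky_extract)
print(result, calls["n"])  # → loaded 3
```

In production the same idea is declared, not hand-rolled: a retry count, a delay, and an SLA on the task, so transient failures self-heal and persistent ones alert on time.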

Skills you'll gain

Category: Scalability
Category: Data Pipelines
Category: Apache Airflow
Category: Workflow Management
Category: Incident Response
Category: Service Level Agreement
Category: Extract, Transform, Load
Category: DevOps
Category: MLOps (Machine Learning Operations)
Category: System Monitoring

Unify, Reconcile, and Tune Data Systems

Course 10, 3 hours

What you'll learn

  • SQL MERGE offers atomic sync that maintains consistency in CDC pipelines with minimal overhead.

  • Field-level conflict analysis needs clear business rules and source-of-truth hierarchies for reliable reconciliation.

  • Integration performance improves through measurement, bottleneck detection, and targeted tuning, not large redesigns.

  • Sustainable data systems balance quality, speed, and reliability through ongoing monitoring and iterative improvement.
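
The atomic-sync idea behind SQL MERGE can be sketched with SQLite's upsert (`INSERT ... ON CONFLICT`), a close stand-in since SQLite lacks MERGE itself. A toy illustration, not course material; the table, the CDC batch, and the conflict rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "keep")])

# CDC batch: one update (id=1) and one insert (id=3), applied atomically.
changes = [(1, "new"), (3, "fresh")]
with conn:  # single transaction: all changes commit together or not at all
    conn.executemany(
        """INSERT INTO target (id, value) VALUES (?, ?)
           ON CONFLICT(id) DO UPDATE SET value = excluded.value""",
        changes,
    )

rows = dict(conn.execute("SELECT id, value FROM target ORDER BY id"))
print(rows)  # → {1: 'new', 2: 'keep', 3: 'fresh'}
```

Untouched rows survive, matched rows update, and new rows insert, all in one statement under one transaction, which is exactly the consistency guarantee MERGE gives CDC pipelines.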

Skills you'll gain

Category: Data Validation
Category: Performance Improvement
Category: SQL
Category: Data Integrity
Category: Data Quality
Category: Data Cleansing
Category: Consolidation
Category: Stored Procedure
Category: Operational Databases
Category: Database Design
Category: Performance Testing
Category: Data Pipelines
Category: Application Performance Management
Category: Performance Measurement
Category: Data Integration
Category: Data Governance
Category: Data Manipulation
Category: Systems Integration
Category: Performance Tuning
Category: Performance Metric

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Hurix Digital
Coursera
387 Courses · 33,948 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."