Coursera

Open source Data Engineering with Spark, dbt & Airflow Professional Certificate

Build Production Data Pipelines at Scale.

Explore Spark, dbt, and Airflow to design, automate, and deploy enterprise-grade data pipelines.

Earn a career credential that demonstrates your expertise
Intermediate level

Recommended experience

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Build modular, production-grade data pipelines using Apache Spark, dbt, and Airflow to ingest, transform, and load data at scale.

  • Design and implement dimensional data models including star schemas, SCD Type 2, and incremental load strategies for data warehouses.

  • Optimize distributed data processing by resolving Spark shuffle, skew, and partitioning issues to improve pipeline performance.

  • Automate deployments and enforce data quality using CI/CD pipelines, Docker containers, and automated testing frameworks like Great Expectations.
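To make the ingest-transform-load pattern above concrete, here is a minimal, hypothetical sketch in plain Python. The course itself uses Spark, dbt, and Airflow for these stages; every function and record here is invented for illustration only:

```python
# Minimal ETL sketch: each stage is a pure function, composed like a pipeline DAG.
# All names and data are illustrative, not taken from the course materials.

def extract():
    # Stand-in for reading from a database, API, or stream.
    return [
        {"order_id": 1, "amount": "120.50", "country": "BR"},
        {"order_id": 2, "amount": "80.00",  "country": "FR"},
        {"order_id": 3, "amount": "oops",   "country": "BR"},  # bad record
    ]

def transform(rows):
    # Cast types and drop rows that fail validation (a crude quality gate).
    clean = []
    for row in rows:
        try:
            clean.append({**row, "amount": float(row["amount"])})
        except ValueError:
            pass  # in production this would be routed to a dead-letter table
    return clean

def load(rows, warehouse):
    # Append-only load into an in-memory stand-in for a warehouse table.
    warehouse.setdefault("orders", []).extend(rows)
    return len(rows)

warehouse = {}
loaded = load(transform(extract()), warehouse)
print(loaded)  # 2 -- the malformed row is filtered out by the quality gate
```

In a real deployment each function would be an Airflow task (or a dbt model), but the shape — small composable stages with validation between them — is the same.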

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

March 2026

See how employees at top companies are mastering in-demand skills

[Logos of Petrobras, TATA, Danone, Capgemini, P&G, and L'Oréal]

Advance your career with in-demand skills

  • Receive professional-level training from Coursera
  • Demonstrate your technical proficiency
  • Earn an employer-recognized certificate from Coursera

Professional Certificate - 6 course series

What you'll learn

  • Build end-to-end data pipelines that automatically ingest from databases, APIs, and streams using Spark, dbt, and Airflow.

  • Design data models with historical tracking using SCD Type 2 patterns to preserve complete change history for analytics.

  • Create automated workflows with intelligent retry logic, SLA monitoring, and parameterization for production reliability.

  • Optimize Spark job performance using partitioning and caching strategies to achieve 30%+ runtime improvements.
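The SCD Type 2 pattern mentioned above preserves history by closing out the current version of a dimension row and inserting a new one, rather than overwriting. A minimal sketch in plain Python — column names (`valid_from`, `valid_to`, `is_current`) are conventional but the function and data are invented; in the course this would be done with SQL or dbt snapshots:

```python
from datetime import date

def scd2_upsert(dim, key, new_attrs, as_of):
    """Apply an SCD Type 2 change: expire the current row, insert a new version."""
    for row in dim:
        if row["customer_id"] == key and row["is_current"]:
            if all(row.get(k) == v for k, v in new_attrs.items()):
                return dim  # no attribute changed, nothing to do
            row["valid_to"] = as_of       # close out the old version
            row["is_current"] = False
    dim.append({"customer_id": key, **new_attrs,
                "valid_from": as_of, "valid_to": None, "is_current": True})
    return dim

dim_customer = [{"customer_id": 42, "city": "Lyon",
                 "valid_from": date(2024, 1, 1), "valid_to": None,
                 "is_current": True}]

# Customer moves: history is preserved, not overwritten.
scd2_upsert(dim_customer, 42, {"city": "Paris"}, date(2025, 6, 1))
print(len(dim_customer))  # 2 -- one expired row, one current row
```

Analytics queries can then join facts to the dimension version that was valid at the time of the event, instead of only the latest state.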

Skills you'll gain

Category: Data Architecture
Category: Data Modeling
Category: Data Processing
Category: Data Quality
Category: Data Pipelines
Category: Data Integration
Category: Data Transformation
Category: Apache Airflow
Category: Apache Spark
Category: Extract, Transform, Load
Category: Data Warehousing
Category: Enterprise Security
Category: Data Flow Diagrams (DFDs)
Category: Data Validation
Category: Database Development
Category: Configuration Management

What you'll learn

  • Optimize Spark job performance through strategic partitioning and caching, achieving 30%+ runtime improvements using data access analysis.

  • Implement transactional data lakes with Delta format, enabling versioning, ACID operations, and schema evolution for reliable datasets.

  • Provision secure cloud data infrastructure using IAM policies, private networks, and encrypted storage following security best practices.

  • Evaluate and benchmark storage formats (Parquet, ORC, Avro) to select optimal solutions for analytical workloads and cost efficiency.
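One of the tuning ideas above — mitigating key skew — can be illustrated without a Spark cluster. A hot key sends all of its rows to a single hash partition (one straggler task); "salting" the key with a small suffix spreads that load. This toy sketch uses a deterministic CRC32 hash as a stand-in for Spark's partitioner; the dataset and partition counts are invented:

```python
import zlib
from collections import Counter

N_PARTS = 8

def part(key):
    # Deterministic stand-in for a hash partitioner (Spark uses murmur3).
    return zlib.crc32(str(key).encode()) % N_PARTS

# Skewed dataset: one hot key ("BR") dominates.
keys = ["BR"] * 90 + ["FR", "DE", "JP", "IN", "US"] * 2

plain = Counter(part(k) for k in keys)
# All 90 "BR" rows hash identically, so they land in one partition.

SALTS = 8
salted = Counter(part((k, i % SALTS)) for i, k in enumerate(keys))
# Salting appends a round-robin suffix, splitting the hot key into
# 8 distinct salted keys that typically spread across partitions.

print(max(plain.values()), max(salted.values()))
```

The trade-off: salted aggregations need a second pass to re-combine the partial results per original key, which is why salting is reserved for genuinely skewed joins and group-bys.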

Skills you'll gain

Category: Cloud Deployment
Category: Data Storage Technologies
Category: Amazon S3
Category: Infrastructure Architecture
Category: PySpark
Category: Apache Spark
Category: Data Integrity
Category: Performance Tuning
Category: Cloud Security
Category: Infrastructure as Code (IaC)
Category: Cloud Storage
Category: Data Warehousing
Category: Transaction Processing
Category: Data Management
Category: Data Lakes
Category: Cloud Computing Architecture
Category: Data Security
Category: Data Storage
Category: Data Infrastructure
Category: Cloud Computing

What you'll learn

  • Design star schema data models with fact and dimension tables that enable intuitive self-service business intelligence reporting.

  • Apply third normal form normalization to optimize database structure while maintaining query performance through indexing strategies.

  • Use advanced SQL window functions to calculate rolling metrics, rankings, and time-series analytics for complex data analysis.

  • Implement database replication and incremental loading techniques to ensure high availability and efficient data warehouse updates.
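The first and third bullets can be sketched together: a tiny star schema (one fact table joined to one dimension) queried with a window function for a running total. This uses Python's built-in `sqlite3` (window functions require SQLite 3.25+, bundled with recent Python builds); table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A tiny star schema: one fact table keyed to one dimension table.
cur.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    sale_date   TEXT,
    amount      REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO fact_sales VALUES
    (1, '2025-01-01', 10), (1, '2025-01-02', 20),
    (1, '2025-01-03', 30), (2, '2025-01-01', 5);
""")

# Window function: running total of sales per category, ordered by date.
rows = cur.execute("""
    SELECT d.category, f.sale_date, f.amount,
           SUM(f.amount) OVER (
               PARTITION BY d.category ORDER BY f.sale_date
           ) AS running_total
    FROM fact_sales f
    JOIN dim_product d USING (product_key)
    ORDER BY d.category, f.sale_date
""").fetchall()

for r in rows:
    print(r)
```

The `PARTITION BY` clause restarts the running total per category, which is exactly the rolling-metric pattern the course applies to time-series analytics.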

Skills you'll gain

Category: Data Modeling
Category: Star Schema
Category: Performance Tuning
Category: Data Warehousing
Category: Business Intelligence
Category: Relational Databases
Category: Database Development
Category: Data Integration
Category: Data Pipelines
Category: Database Software
Category: Database Design
Category: Database Architecture and Administration
Category: Extract, Transform, Load
Category: SQL
Category: Data Quality

What you'll learn

  • Resolve merge conflicts and trace bugs using Git history tools, keeping collaborative codebases stable and production-ready.

  • Design branching strategies and automate deployments with CI/CD pipelines to safely promote data pipeline artifacts across environments.

  • Build and publish versioned Docker images and automate server configuration with Ansible for consistent, reproducible environments.

  • Analyze query execution metrics and optimize resource allocation to maintain performance targets in production data systems.
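The last bullet — analyzing query execution metrics — largely comes down to summarizing latency distributions and flagging outliers. A hypothetical sketch using only the standard library; the runtimes and the 3×-median outlier threshold are invented for illustration:

```python
import statistics

# Hypothetical per-query runtimes in seconds, e.g. parsed from scheduler logs.
runtimes = [1.2, 1.4, 1.1, 1.3, 9.8, 1.2, 1.5, 1.3, 1.4, 1.2]

mean = statistics.mean(runtimes)
p95 = statistics.quantiles(runtimes, n=20)[-1]   # 95th-percentile cut point
# Flag queries slower than 3x the median as candidates for investigation.
slow = [t for t in runtimes if t > 3 * statistics.median(runtimes)]

print(f"mean={mean:.2f}s p95={p95:.2f}s outliers={slow}")
```

Percentiles matter more than the mean here: one straggler query (the 9.8 s run) inflates the average while the median stays representative of typical performance.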

Skills you'll gain

Category: Continuous Deployment
Category: Continuous Integration
Category: Application Deployment
Category: Root Cause Analysis
Category: Data Infrastructure
Category: Development Environment
Category: Docker (Software)
Category: Git (Version Control System)
Category: DevOps
Category: Data Pipelines
Category: Ansible
Category: CI/CD
Category: Configuration Management
Category: Containerization
Category: Infrastructure as Code (IaC)
Category: Version Control
Category: Performance Tuning

What you'll learn

  • Define and automate data quality tests using YAML to validate row counts, null thresholds, and uniqueness across pipeline datasets.

  • Trace data anomalies through pipeline stages by analyzing logs and dashboards to identify and fix the exact source of failure.

  • Apply advanced Python debugging tools — including conditional breakpoints, watchpoints, and pdb — to diagnose and resolve pipeline issues.

  • Resolve complex concurrency bugs by reading stack traces and correlating thread logs to identify deadlocks and race conditions in code.
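In tools like dbt, the row-count, null-threshold, and uniqueness tests from the first bullet are declared in YAML. This pure-Python sketch mirrors the same three rules so the logic is visible; the function name, column names, and sample data are all invented:

```python
# Mirrors three common declarative data quality tests:
# minimum row count, key uniqueness, and a null-fraction threshold.

def run_quality_checks(rows, key, not_null_cols, min_rows, max_null_frac=0.0):
    failures = []
    if len(rows) < min_rows:
        failures.append(f"row_count: {len(rows)} < {min_rows}")
    keys = [r[key] for r in rows]
    if len(keys) != len(set(keys)):
        failures.append(f"uniqueness: duplicate values in '{key}'")
    for col in not_null_cols:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if rows and nulls / len(rows) > max_null_frac:
            failures.append(f"not_null: '{col}' has {nulls} null(s)")
    return failures

data = [
    {"id": 1, "email": "a@x.io"},
    {"id": 2, "email": None},      # violates the not_null rule
    {"id": 2, "email": "c@x.io"},  # violates the uniqueness rule
]
print(run_quality_checks(data, key="id", not_null_cols=["email"], min_rows=2))
```

Running checks like these as a gated pipeline step — and failing the run when the list is non-empty — is what keeps bad data from propagating downstream.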

Skills you'll gain

Category: Performance Tuning
Category: Test Automation
Category: YAML
Category: Debugging
Category: Python Programming
Category: Data Validation
Category: Reliability
Category: Development Testing
Category: Anomaly Detection
Category: Generative AI
Category: Data Integrity
Category: Dashboard
Category: DevOps
Category: Data Quality
Category: Root Cause Analysis
Category: Data Pipelines

What you'll learn

  • Build a data engineering portfolio with end-to-end pipeline projects that prove your ability to design, build, and deploy production-style systems.

  • Create a resume, LinkedIn profile, and GitHub presence that position you as a hands-on data engineer ready to contribute from day one.

  • Practice real data engineering interview scenarios and develop structured responses to technical, design, and behavioral questions.

  • Execute a 30-day career launch plan covering portfolio completion, job applications, and networking in the data engineering community.

Skills you'll gain

Category: Software Development
Category: SQL
Category: Apache
Category: Python Programming
Category: Professional Networking
Category: Data Quality
Category: Portfolio Management
Category: Collaboration
Category: Communication
Category: Data Infrastructure
Category: Professional Development
Category: Apache Spark
Category: Data Pipelines
Category: Apache Airflow
Category: Interviewing Skills
Category: GitHub

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Professionals from the Industry
321 Courses · 45,807 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."
