Is this course really 100% online? Do I need to attend any classes in person?

This course is completely online, so there’s no need to show up to a classroom in person. You can access your lectures, readings and assignments anytime and anywhere via the web or your mobile device.

Can I just enroll in a single course?

Yes! To get started, click the course card that interests you and enroll. You can enroll and complete the course to earn a shareable certificate. When you subscribe to a course that is part of a Certificate, you’re automatically subscribed to the full Certificate. Visit your learner dashboard to track your progress.

Open source Data Engineering with Spark, dbt & Airflow Professional Certificate

Build Production Data Pipelines at Scale.

Explore Spark, dbt, and Airflow to design, automate, and deploy enterprise-grade data pipelines.

Instructor: Professionals from the Industry

6 course series

Earn a career credential that demonstrates your expertise

Intermediate level

Recommended experience

4 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Build modular, production-grade data pipelines using Apache Spark, dbt, and Airflow to ingest, transform, and load data at scale.
Design and implement dimensional data models including star schemas, SCD Type 2, and incremental load strategies for data warehouses.
Optimize distributed data processing by resolving Spark shuffle, skew, and partitioning issues to improve pipeline performance.
Automate deployments and enforce data quality using CI/CD pipelines, Docker containers, and automated testing frameworks like Great Expectations.

Skills you'll gain

Database Design
Diagram Design
Snowflake Schema
Cloud Security
Star Schema
Data Flow Diagrams (DFDs)
Data Validation
Interviewing Skills
SQL
Workflow Management
CI/CD
Data Pipelines
Data Modeling
Data Warehousing

Tools you'll learn

Git (Version Control System)
Docker (Software)
PySpark
Apache Airflow
Ansible
Apache Spark

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English

Recently updated!

March 2026

Advance your career with in-demand skills

Receive professional-level training from Coursera
Demonstrate your technical proficiency
Earn an employer-recognized certificate from Coursera

Professional Certificate - 6 course series

This program equips you with the open-source tools and architectural thinking used by professional data engineers to build scalable, reliable data systems from the ground up. You will work hands-on with Apache Spark for distributed data processing, dbt for modular SQL-based transformation, and Apache Airflow for workflow orchestration — the same stack powering data infrastructure at leading technology and data-driven organizations worldwide.

Across the courses, you will gain practical expertise in designing dimensional data models, implementing incremental load strategies, optimizing Spark job performance, enforcing data quality with automated testing frameworks, and deploying pipelines through CI/CD workflows. You will also develop foundational skills in cloud storage provisioning, containerization with Docker, and version control best practices that mirror real production environments.

By the end of this Program, you will be able to design and deploy end-to-end data pipelines that ingest from diverse sources, transform data through well-tested models, and deliver analytics-ready datasets to downstream consumers — demonstrating job-ready engineering skills valued across analytics engineering, data platform, and data infrastructure roles.

Applied Learning Project

Throughout this Program, you will complete hands-on projects that mirror real production data engineering challenges — from building modular ETL pipelines that ingest CRM and streaming data into a cloud data warehouse, to authoring Airflow DAGs with retry logic and SLA monitoring, to diagnosing Spark performance bottlenecks and implementing Delta Lake versioning. Each project asks you to work in your own development environment, producing portfolio-ready artifacts that demonstrate your ability to design, optimize, and deploy reliable data infrastructure using open-source tools.
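As a rough illustration of the retry logic mentioned above, here is a minimal, library-free Python sketch. It is not Airflow code, and the helper name `run_with_retries` and its parameters are hypothetical; orchestrators typically expose retries as task-level settings rather than a wrapper like this. The sketch only shows the underlying idea of retrying a failing task with exponentially growing delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponential backoff.

    Hypothetical helper for illustration only. `sleep` is injectable so the
    waiting behavior can be observed without actually pausing.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing a recording function as `sleep` makes it easy to check the delay schedule in a unit test without slowing the test down.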

Building Automated Data Pipelines with Spark, dbt, and Airflow

COURSE 1, 9 hours

What you'll learn

Build end-to-end data pipelines that automatically ingest from databases, APIs, and streams using Spark, dbt, and Airflow tools.
Design data models with historical tracking using SCD Type 2 patterns to preserve complete change history for analytics.
Create automated workflows with intelligent retry logic, SLA monitoring, and parameterization for production reliability.
Optimize Spark job performance using partitioning and caching strategies to achieve 30%+ runtime improvements.
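The SCD Type 2 pattern named in the list above can be sketched in plain Python. This is an illustrative sketch only, not the course's implementation: the function name `apply_scd2` and the column names `valid_from`, `valid_to`, and `is_current` are assumptions. The idea is that a changed record closes the current row and appends a new version, so history is preserved rather than overwritten.

```python
from datetime import date


def apply_scd2(dimension: list[dict], incoming: dict, key: str, as_of: date) -> list[dict]:
    """Return a new dimension table with `incoming` merged in SCD Type 2 style.

    Hypothetical helper: rows carry `valid_from`, `valid_to`, and `is_current`.
    The input list and its rows are not mutated.
    """
    bookkeeping = ("valid_from", "valid_to", "is_current")
    result = []
    needs_new_version = True
    for row in dimension:
        if row[key] == incoming[key] and row["is_current"]:
            attrs = {k: v for k, v in row.items() if k not in bookkeeping}
            if attrs == incoming:
                needs_new_version = False  # nothing changed: keep the row as-is
                result.append(dict(row))
            else:
                # Close the old version instead of overwriting it.
                result.append({**row, "valid_to": as_of, "is_current": False})
        else:
            result.append(dict(row))
    if needs_new_version:
        result.append(
            {**incoming, "valid_from": as_of, "valid_to": None, "is_current": True}
        )
    return result
```

For example, merging `{"customer_id": 1, "city": "Lyon"}` into a table whose current row for customer 1 has `city` set to `"Paris"` yields two rows: the closed Paris version and a new current Lyon version.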

Skills you'll gain

Data Pipelines
Apache Airflow
Data Flow Diagrams (DFDs)
Apache Spark
Data Transformation
Data Modeling
Diagram Design
Data Architecture
Data Integration
Data Mapping
Extract, Transform, Load
Enterprise Security
Service Level
Database Development
Data Warehousing
Dataflow
Data Processing

Optimizing Spark and Cloud Data Storage for Analytics

COURSE 2, 10 hours

What you'll learn

Optimize Spark job performance through strategic partitioning and caching, achieving 30%+ runtime improvements using data access analysis.
Implement transactional data lakes with Delta format, enabling versioning, ACID operations, and schema evolution for reliable datasets.
Provision secure cloud data infrastructure using IAM policies, private networks, and encrypted storage following security best practices.
Evaluate and benchmark storage formats (Parquet, ORC, Avro) to select optimal solutions for analytical workloads and cost efficiency.

Skills you'll gain

Apache Spark
Cloud Security
Performance Tuning
Data Warehousing
Data Storage
Transaction Processing
Data Management
Cloud Infrastructure
Cloud Deployment
Cloud Computing
Infrastructure Architecture
Infrastructure as Code (IaC)
Cloud Storage
Data Security
Data Integrity
Security Controls
PySpark
Data Lakes
Cloud Computing Architecture
Data Storage Technologies

Data Modeling & Warehousing Fundamentals in Data Engineering

COURSE 3, 9 hours

What you'll learn

Design star schema data models with fact and dimension tables that enable intuitive self-service business intelligence reporting.
Apply third normal form normalization to optimize database structure while maintaining query performance through indexing strategies.
Use advanced SQL window functions to calculate rolling metrics, rankings, and time-series analytics for complex data analysis.
Implement database replication and incremental loading techniques to ensure high availability and efficient data warehouse updates.
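To illustrate the window-function item in the list above, the following sketch computes a rolling average and a ranking using Python's built-in `sqlite3` module. The table name `daily_sales` and its data are invented for the example, and it assumes the bundled SQLite is version 3.25 or later, which is when window functions were introduced.

```python
import sqlite3

# In-memory database with a small, made-up daily sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [
        ("2024-01-01", 10.0),
        ("2024-01-02", 20.0),
        ("2024-01-03", 30.0),
        ("2024-01-04", 40.0),
    ],
)

# rolling_avg: mean of the current row and up to two preceding rows by day.
# amount_rank: position of each day when ordered by amount, highest first.
rows = conn.execute(
    """
    SELECT day,
           AVG(amount) OVER (
               ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_avg,
           RANK() OVER (ORDER BY amount DESC) AS amount_rank
    FROM daily_sales
    ORDER BY day
    """
).fetchall()

for day, rolling_avg, amount_rank in rows:
    print(day, rolling_avg, amount_rank)
```

The first two rows average over fewer than three days because the frame only includes rows that exist, which is worth keeping in mind when reading the start of any rolling series.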

Skills you'll gain

Database Design
Star Schema
Performance Tuning
SQL
Database Management
Extract, Transform, Load
Data Warehousing
Database Development
Database Theory
Database Architecture and Administration
Data Infrastructure
Data Architecture
Relational Databases
Business Intelligence
Data Modeling
Data Integration
Database Software

DevOps and CI/CD for Data Engineering Performance

COURSE 4, 12 hours

What you'll learn

Resolve merge conflicts and trace bugs using Git history tools, keeping collaborative codebases stable and production-ready.
Design branching strategies and automate deployments with CI/CD pipelines to safely promote data pipeline artifacts across environments.
Build and publish versioned Docker images and automate server configuration with Ansible for consistent, reproducible environments.
Analyze query execution metrics and optimize resource allocation to maintain performance targets in production data systems.

Skills you'll gain

DevOps
CI/CD
Git (Version Control System)
Containerization
Performance Tuning
Data Pipelines
Ansible
Root Cause Analysis
Infrastructure as Code (IaC)
Docker (Software)
Development Environment
Application Deployment
Version Control
Data Infrastructure
Continuous Deployment
Software Versioning
Configuration Management
Continuous Integration
DevOps Tools

Data Quality and Debugging for Reliable Pipelines

COURSE 5, 7 hours

What you'll learn

Define and automate data quality tests using YAML to validate row counts, null thresholds, and uniqueness across pipeline datasets.
Trace data anomalies through pipeline stages by analyzing logs and dashboards to identify and fix the exact source of failure.
Apply advanced Python debugging tools — including conditional breakpoints, watchpoints, and pdb — to diagnose and resolve pipeline issues.
Resolve complex concurrency bugs by reading stack traces and correlating thread logs to identify deadlocks and race conditions in code.
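The row-count, null-threshold, and uniqueness checks mentioned in the list above might look roughly like the sketch below. The page does not say which framework or YAML layout the course uses, so everything here is an assumption: the function `run_quality_checks` is hypothetical, and the `spec` dictionary with its keys `min_rows`, `max_null_fraction`, and `unique` simply stands in for whatever a parsed YAML test file would contain.

```python
def run_quality_checks(rows: list[dict], spec: dict) -> dict[str, bool]:
    """Evaluate row-count, null-threshold, and uniqueness checks.

    Hypothetical helper: `spec` mirrors what a YAML test definition might
    parse into. Returns one pass/fail flag per check.
    """
    # Row count: the dataset must have at least `min_rows` rows.
    results = {"min_rows": len(rows) >= spec.get("min_rows", 0)}

    # Null threshold: the fraction of missing values per column must not
    # exceed the configured limit.
    for column, limit in spec.get("max_null_fraction", {}).items():
        nulls = sum(1 for row in rows if row.get(column) is None)
        fraction = nulls / len(rows) if rows else 0.0
        results[f"nulls:{column}"] = fraction <= limit

    # Uniqueness: no value may appear twice in the column.
    for column in spec.get("unique", []):
        values = [row.get(column) for row in rows]
        results[f"unique:{column}"] = len(values) == len(set(values))

    return results


sample_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
]
sample_spec = {"min_rows": 2, "max_null_fraction": {"email": 0.5}, "unique": ["id"]}
report = run_quality_checks(sample_rows, sample_spec)
```

In this sample, the row-count and null checks pass while the uniqueness check on `id` fails because the value 2 appears twice.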

Skills you'll gain

Data Quality
Data Validation
Debugging
Anomaly Detection
YAML
Data Integrity
Test Automation
Test Tools
Reliability
Test Script Development
Root Cause Analysis
AI Integrations
Data Pipelines
Python Programming
Performance Tuning
Generative AI
Problem Management

Career Development For Open Source Data Engineering

COURSE 6, 2 hours

What you'll learn

Build a data engineering portfolio with end-to-end pipeline projects that prove your ability to design, build, and deploy production-style systems.
Create a resume, LinkedIn profile, and GitHub presence that position you as a hands-on data engineer ready to contribute from day one.
Practice real data engineering interview scenarios and develop structured responses to technical, design, and behavioral questions.
Execute a 30-day career launch plan covering portfolio completion, job applications, and networking in the data engineering community.

Skills you'll gain

Apache Spark
Apache Airflow
Portfolio Management
SQL
Apache
Professional Networking
Data Quality
Data Presentation
Python Programming
Web Presence
GitHub
Data Pipelines
Interviewing Skills

Earn a professional certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Professionals from the Industry

489 Courses, 112,906 learners

Offered by

Coursera

Frequently Asked Questions

This Program is designed for intermediate learners. You should be comfortable writing Python scripts and SQL queries before starting. Prior experience with data engineering tools like Spark or Airflow is not required — you will build that knowledge through the courses.

You will work in your own local or cloud-based development environment using open-source tools including Apache Spark, dbt Core, Apache Airflow, Docker, and Git. Specific setup instructions are provided at the start of each course.

This program is designed for aspiring data engineers and technically curious professionals who want to build a career working with data infrastructure and pipelines. It is well-suited for software developers transitioning into data engineering, analysts looking to move beyond spreadsheets and SQL into pipeline development, and recent graduates seeking job-ready, hands-on data engineering skills.

Basic Python familiarity and foundational SQL knowledge — such as writing simple SELECT and JOIN queries — are recommended before starting. General comfort working in a command-line environment will also be helpful. No prior experience with Spark, dbt, Airflow, Docker, or cloud platforms is required. The program builds all data engineering skills from the ground up.

Financial aid available.