When will I receive my Course Certificate?

If you complete the course successfully, your electronic Course Certificate will be added to your Accomplishments page - from there, you can print your Course Certificate or add it to your LinkedIn profile.

Why can’t I audit this course?

This course is currently available only to learners who have paid or received financial aid, when available.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your selected learning program, you’ll find a link to apply on the description page.

Data Engineering with Databricks Cookbook

Instructor: Packt - Course Instructors

11 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

1 week to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Use Apache Spark for efficient data ingestion and transformation.
Optimize performance of Spark and Delta Lake for scalable data solutions.
Build and orchestrate data pipelines using Databricks workflows and Delta Live Tables.

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

June 2026

Assessments

11 assignments

Taught in English

There are 11 modules in this course

This course offers a hands-on approach to mastering data engineering using Apache Spark, Delta Lake, and Databricks. By combining these technologies, you will learn how to build robust, scalable data pipelines and implement effective data management strategies in real-world applications. With a focus on performance optimization, data orchestration, and modern data engineering practices, this course provides essential skills for professionals working in the data engineering space.

You’ll start by exploring data ingestion techniques using Apache Spark, followed by methods for transforming and managing data within a data lakehouse. Each section builds on the last, providing learners with actionable insights that can be directly applied to their workflows. The course also covers DataOps and DevOps practices to help you streamline and automate your data processes.

What sets this course apart is its emphasis on practical, real-world applications. You’ll work through concrete examples and recipes for managing data, from ingestion to transformation, ensuring that you can tackle data engineering challenges with confidence.

Ideal for data engineers, data scientists, and IT professionals with a background in SQL and Python, this course will help you enhance your skills in data pipeline orchestration and optimization.

Module 1: Data Ingestion and Data Extraction with Apache Spark

This module introduces practical techniques for ingesting and extracting data from various formats such as CSV, JSON, and XML using Apache Spark. Learners will explore common challenges, data transformation functions, and methods for handling nested and complex data structures. By the end, participants will be equipped to efficiently process and manipulate diverse data sources in Spark.

Included: 1 video, 8 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

8 readings (40 minutes total)

Introduction (4 minutes)
Common Issues Faced While Working with CSV Data (4 minutes)
Reading JSON Data with Apache Spark (5 minutes)
The Flatten() and Collect_list() Functions (6 minutes)
Parsing XML Data with Apache Spark (4 minutes)
Working with Nested Data Structures in Apache Spark (5 minutes)
The Map Keys and Map Values Functions (6 minutes)
Using the regexp_extract() Function (6 minutes)

1 assignment (16 minutes total)

Data Ingestion and Extraction with Apache Spark (16 minutes)
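
To give a feel for the nested-data and text-extraction topics in this module, here is a minimal sketch in plain Python. It does not call Spark: the two helpers only imitate, on a single value, what Spark's flatten() and regexp_extract() functions are documented to do, and the sample record and helper bodies are assumptions made for illustration.

```python
import json
import re


def flatten(nested):
    """Collapse a list of lists into one list (one level of nesting only)."""
    return [item for inner in nested for item in inner]


def regexp_extract(text, pattern, group):
    """Return the requested capture group, or "" when nothing matches."""
    match = re.search(pattern, text)
    if match is None:
        return ""
    return match.group(group) or ""


# Hypothetical JSON record with an array of arrays and a free-text field.
record = json.loads(
    '{"order": "A-1042", "items": [["pen", "ink"], ["paper"]], '
    '"note": "shipped to zip 94105 on time"}'
)

flat_items = flatten(record["items"])
zip_code = regexp_extract(record["note"], r"zip (\d{5})", 1)
missing = regexp_extract(record["note"], r"phone (\d+)", 1)

print(flat_items, zip_code, repr(missing))
```

In Spark the same operations would run column-wise across a distributed DataFrame rather than on one Python value at a time.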

Module 2: Data Transformation and Data Manipulation with Apache Spark

This module introduces learners to essential data manipulation techniques using Apache Spark and PySpark, including filtering, joining, aggregating, and handling null values in large datasets. Learners will explore both standard and advanced operations such as approximate aggregations and nested window functions to efficiently process and analyze data. By the end, participants will be equipped to transform and manage data at scale using Spark's distributed computing capabilities.

Included: 1 video, 7 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

7 readings (34 minutes total)

Introduction (6 minutes)
Filtering Data with Apache Spark (5 minutes)
Performing Joins with Apache Spark (5 minutes)
Performing Aggregations with Apache Spark (4 minutes)
Approximate Aggregations (6 minutes)
Nested Window Functions (5 minutes)
Handling Null Values with Apache Spark (3 minutes)

1 assignment (16 minutes total)

Mastering Data Processing in Apache Spark (16 minutes)

Module 3: Data Management with Delta Lake

This module introduces the core concepts and practical skills needed to manage data using Delta Lake, an open-source storage layer for lakehouse architectures. Learners will explore reading and merging data, implementing change data capture, optimizing tables, and leveraging versioning and time travel features to ensure data integrity and performance. Hands-on exercises will reinforce best practices for handling big data workloads with Delta Lake in Python.

Included: 1 video, 6 readings, 1 assignment

Module 4: Ingesting Streaming Data

This module introduces the fundamentals of processing real-time data streams using Apache Spark Structured Streaming. Learners will explore how to ingest data from sources like Apache Kafka, apply transformations and filters, configure checkpoints and triggers, and perform windowed aggregations for robust stream processing applications.

Included: 1 video, 6 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

6 readings (42 minutes total)

Introduction (9 minutes)
Reading Data from Real-Time Sources, Such as Apache Kafka, with Apache Spark Structured Streaming (7 minutes)
Defining Transformations and Filters on a Streaming DataFrame (4 minutes)
Configuring Checkpoints for Structured Streaming in Apache Spark (6 minutes)
Configuring Triggers for Structured Streaming in Apache Spark (6 minutes)
Applying Window Aggregations to Streaming Data with Apache Spark Structured Streaming (10 minutes)

1 assignment (16 minutes total)

Exploring Streaming Data Processing with Apache Spark (16 minutes)
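
The windowed aggregations mentioned in this module can be pictured with a small plain-Python sketch. It assigns each event to a fixed, non-overlapping (tumbling) window by event time and counts events per window. The 10-minute window length, the sample events, and the window_start helper are all assumptions for illustration; this is not the Structured Streaming API, and it ignores late data, watermarks, and checkpoints.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # assumed window length
EPOCH = datetime(1970, 1, 1)


def window_start(event_time):
    """Round an event time down to the start of its tumbling window."""
    offset = (event_time - EPOCH) % WINDOW
    return event_time - offset


# Hypothetical (event_time, event_type) pairs.
events = [
    (datetime(2024, 1, 1, 12, 1), "click"),
    (datetime(2024, 1, 1, 12, 9), "click"),
    (datetime(2024, 1, 1, 12, 10), "view"),
    (datetime(2024, 1, 1, 12, 14), "click"),
]

# Count events per (window, event_type) pair.
counts = Counter((window_start(ts), kind) for ts, kind in events)

for (start, kind), n in sorted(counts.items()):
    print(start.isoformat(), kind, n)
```

A streaming engine performs the same bucketing incrementally as data arrives, keeping per-window state between micro-batches instead of holding every event in a list.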

Module 5: Processing Streaming Data

This module explores real-time data processing using Apache Spark Structured Streaming and Delta Lake. Learners will discover techniques for idempotent stream writing, merging change data capture events, joining streaming and static datasets, and monitoring streaming queries. Practical recipes and examples will help you build robust, scalable streaming data pipelines.

Included: 1 video, 6 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

6 readings (38 minutes total)

Introduction (8 minutes)
Idempotent Stream Writing with Delta Lake and Apache Spark Structured Streaming (6 minutes)
Merging or Applying Change Data Capture on Apache Spark Structured Streaming and Delta Lake (6 minutes)
Joining Streaming Data with Static Data in Apache Spark Structured Streaming and Delta Lake (5 minutes)
Joining Streaming Data with Streaming Data in Apache Spark Structured Streaming and Delta Lake (6 minutes)
Monitoring Real-Time Data Processing with Apache Spark Structured Streaming (7 minutes)

1 assignment (16 minutes total)

Streaming Data Processing Fundamentals (16 minutes)
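
Two ideas from this module, idempotent writes and applying change data capture (CDC) events, can be sketched together in plain Python. The toy sink below remembers the highest batch id it has applied, so a replayed batch is ignored, and it applies each batch as a series of upserts and deletes. The class, the batch format, and the sample data are hypothetical; real pipelines would rely on Delta Lake and Structured Streaming rather than an in-memory dictionary.

```python
class IdempotentSink:
    """Toy key-value table that applies each micro-batch at most once."""

    def __init__(self):
        self.rows = {}           # primary key -> current value
        self.last_batch_id = -1  # highest batch id already applied

    def apply_batch(self, batch_id, changes):
        # A replayed batch (same or lower id) is skipped, so retries are safe.
        if batch_id <= self.last_batch_id:
            return False
        for op, key, value in changes:
            if op == "delete":
                self.rows.pop(key, None)
            else:  # "upsert": insert the key or overwrite its current value
                self.rows[key] = value
        self.last_batch_id = batch_id
        return True


sink = IdempotentSink()
batch0 = [("upsert", 1, "alice"), ("upsert", 2, "bob")]
batch1 = [("upsert", 2, "bobby"), ("delete", 1, None)]

applied = [
    sink.apply_batch(0, batch0),
    sink.apply_batch(1, batch1),
    sink.apply_batch(0, batch0),  # replay after a simulated failure
]

print(applied, sink.rows)
```

Tracking the batch id next to the data is what makes the retry harmless: without it, replaying batch 0 would bring back the deleted row and the stale value.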

Module 6: Performance Tuning with Apache Spark

This module explores advanced techniques for optimizing Apache Spark applications, focusing on improving performance and resource efficiency. Learners will discover strategies such as minimizing data shuffling, handling data skew, leveraging broadcast variables, and optimizing partitioning and join operations. Practical guidance on caching and persistence will also be provided to help accelerate data processing workflows.

Included: 1 video, 7 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

7 readings (46 minutes total)

Introduction (5 minutes)
Using Broadcast Variables (5 minutes)
Optimizing Spark Jobs by Minimizing Data Shuffling (6 minutes)
Avoiding Data Skew (8 minutes)
Caching and Persistence (5 minutes)
Partitioning and Repartitioning (8 minutes)
Optimizing Join Strategies (9 minutes)

1 assignment (16 minutes total)

Mastering Spark Performance Tuning (16 minutes)
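
One join strategy this module points toward is broadcasting a small table so the large one never has to be shuffled. The plain-Python sketch below shows the underlying idea as a hash join: build a lookup from the small side once, then stream through the large side. The function name, tables, and column names are made up for illustration, and nothing here is distributed.

```python
def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join two lists of dicts by hashing the small side on `key`."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined = []
    for row in large_rows:  # single pass; the large side is never regrouped
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical small dimension table and larger fact table.
countries = [{"code": "FR", "name": "France"}, {"code": "BR", "name": "Brazil"}]
orders = [
    {"order_id": 1, "code": "FR"},
    {"order_id": 2, "code": "BR"},
    {"order_id": 3, "code": "FR"},
    {"order_id": 4, "code": "XX"},  # no matching country, dropped by the inner join
]

result = broadcast_hash_join(orders, countries, "code")
print(result)
```

In a cluster, each executor would receive its own copy of the lookup, which is why the approach only pays off when the small side comfortably fits in memory.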

Module 7: Performance Tuning in Delta Lake

This module explores advanced techniques to enhance query performance in Delta Lake, including data partitioning, Z-ordering, data skipping, and compression strategies. Learners will gain practical skills to optimize storage and reduce I/O costs for large-scale data processing.

Included: 1 video, 4 readings, 1 assignment
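
Data skipping, named in the description above, rests on keeping minimum and maximum column values per file so that files which cannot match a filter are never read. A minimal plain-Python sketch of that pruning step follows. The FileStats record, file names, and value ranges are hypothetical, and the real statistics live in Delta Lake's table metadata rather than in a Python list.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str       # hypothetical file name
    min_value: int  # smallest value of the filtered column in this file
    max_value: int  # largest value of the filtered column in this file


def files_to_scan(files, low, high):
    """Keep only files whose [min, max] range can overlap [low, high]."""
    return [f.path for f in files if f.max_value >= low and f.min_value <= high]


files = [
    FileStats("part-0", 1, 100),
    FileStats("part-1", 101, 200),
    FileStats("part-2", 201, 300),
]

# A filter such as "value BETWEEN 150 AND 210" only needs two of the three files.
selected = files_to_scan(files, 150, 210)
print(selected)
```

Pruning works best when each file covers a narrow range of values, which is the motivation for clustering techniques such as Z-ordering.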

Module 8: Orchestration and Scheduling Data Pipeline with Databricks Workflows

This module introduces learners to automating and managing data pipelines using Databricks Workflows. You will explore how to configure, monitor, and parameterize workflows, implement conditional branching, and trigger jobs based on external events such as file arrivals. By the end, you'll be equipped to orchestrate robust data processing tasks on the Databricks platform.

Included: 1 video, 5 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

5 readings (30 minutes total)

Introduction (8 minutes)
Running and Managing Databricks Workflows (3 minutes)
Passing Task and Job Parameters Within a Databricks Workflow (5 minutes)
Conditional Branching in Databricks Workflows (6 minutes)
Triggering Jobs Based on File Arrival (8 minutes)

1 assignment (16 minutes total)

Mastering Databricks Workflow Orchestration (16 minutes)

Module 9: Building Data Pipelines with Delta Live Tables

This module guides learners through building robust data pipelines using Delta Live Tables on Databricks. You will explore techniques for ingesting and transforming streaming data, enforcing data quality, quarantining invalid records, monitoring pipeline health, deploying with asset bundles, and implementing change data capture (CDC). By the end, you'll be equipped to create scalable, reliable pipelines for real-time analytics.

Included: 1 video, 7 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

7 readings (39 minutes total)

Introduction (6 minutes)
Building a Data Pipeline with Delta Live Tables on Databricks (4 minutes)
Implementing Data Quality and Validation Rules with Delta Live Tables in Databricks (6 minutes)
Quarantining Bad Data with Delta Live Tables in Databricks (4 minutes)
Monitoring Delta Live Tables Pipelines (4 minutes)
Deploying Delta Live Tables Pipelines with Databricks Asset Bundles (9 minutes)
Applying Changes (CDC) to Delta Tables with Delta Live Tables (6 minutes)

1 assignment (16 minutes total)

Data Pipeline Fundamentals with Delta Live Tables (16 minutes)
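
The data quality and quarantine topics in this module follow a simple pattern: check each record against named rules, pass clean records on, and route failing records to a separate place together with the reason. The sketch below shows that pattern in plain Python. The rule names, the sample rows, and the split_by_quality helper are assumptions for illustration, and the code does not use the Delta Live Tables API.

```python
# Hypothetical rules: a name mapped to a predicate every valid row must satisfy.
RULES = {
    "valid_id": lambda row: row.get("id") is not None,
    "positive_amount": lambda row: isinstance(row.get("amount"), (int, float))
    and row["amount"] > 0,
}


def split_by_quality(rows):
    """Return (valid_rows, quarantined_rows); quarantined rows list failed rules."""
    valid, quarantined = [], []
    for row in rows:
        failed = [name for name, check in RULES.items() if not check(row)]
        if failed:
            quarantined.append({**row, "failed_rules": failed})
        else:
            valid.append(row)
    return valid, quarantined


rows = [
    {"id": 1, "amount": 25.0},
    {"id": None, "amount": 10.0},
    {"id": 3, "amount": -4.0},
]

valid, quarantined = split_by_quality(rows)
print(valid)
print(quarantined)
```

Recording which rules failed alongside each quarantined row makes it easier to repair and replay bad records later instead of silently dropping them.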

Module 10: Data Governance with Unity Catalog

This module introduces the core features of Databricks Unity Catalog for managing data governance in a lakehouse environment. Learners will explore catalog creation, fine-grained access controls, metadata management, data lineage, and system table querying to ensure secure and compliant data operations. Practical exercises demonstrate how to implement row filters, column masks, and leverage the Unity Catalog UI for effective data stewardship.

Included: 1 video, 9 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

9 readings (40 minutes total)

Introduction (7 minutes)
Creating a Catalog (4 minutes)
Defining and Applying Fine-Grained Access Control Policies Using Unity Catalog (5 minutes)
Tagging, Commenting, and Capturing Metadata About Data and AI Assets Using Databricks Unity Catalog (5 minutes)
Using the Unity Catalog UI (4 minutes)
Apply Row Filters (4 minutes)
Apply Column Masks (3 minutes)
Using Unity Catalogs Lineage Data for Debugging, Root Cause Analysis, and Impact Assessment (4 minutes)
Accessing and Querying System Tables Using Unity Catalog (4 minutes)

1 assignment (16 minutes total)

Data Governance with Unity Catalog (16 minutes)
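
Row filters and column masks, both covered in this module, come down to two small functions evaluated for whoever runs the query: one decides whether a row is visible, the other decides what a sensitive value looks like. The plain-Python sketch below imitates that behavior. The user model, the region rule, the email masking format, and the sample table are all hypothetical, and Unity Catalog itself defines such policies on tables rather than in application code.

```python
def row_filter(row, user):
    """Admins see every row; other users see only rows for their own region."""
    return user["is_admin"] or row["region"] == user["region"]


def mask_email(email, user):
    """Admins see the full address; other users see only the domain."""
    if user["is_admin"]:
        return email
    return "***@" + email.split("@", 1)[1]


def query(table, user):
    """Apply the row filter first, then mask the email column of visible rows."""
    return [
        {**row, "email": mask_email(row["email"], user)}
        for row in table
        if row_filter(row, user)
    ]


customers = [
    {"name": "Ana", "region": "EU", "email": "ana@example.com"},
    {"name": "Raj", "region": "APAC", "email": "raj@example.org"},
]

analyst = {"is_admin": False, "region": "EU"}
admin = {"is_admin": True, "region": "EU"}

analyst_view = query(customers, analyst)
admin_view = query(customers, admin)
print(analyst_view)
print(admin_view)
```

Because the policy is evaluated per user at query time, the same table can serve both audiences without maintaining separate copies of the data.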

Module 11: Implementing DataOps and DevOps on Databricks

This module explores practical strategies for implementing DataOps and DevOps workflows on the Databricks platform. Learners will discover how to automate tasks using the Databricks CLI, streamline development with the VSCode extension, manage infrastructure with Databricks Asset Bundles, and integrate CI/CD pipelines using GitHub Actions. By the end, participants will be equipped to enhance data and software development efficiency through automation and best practices.

Included: 1 video, 5 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

5 readings (35 minutes total)

Introduction (9 minutes)
Automating Tasks by Using the Databricks CLI (6 minutes)
Using the Databricks VSCode Extension for Local Development and Testing (4 minutes)
Using Databricks Asset Bundles (DABs) (8 minutes)
Leveraging GitHub Actions with Databricks Asset Bundles (DABs) (8 minutes)

1 assignment (16 minutes total)

DataOps and DevOps Implementation on Databricks (16 minutes)

Instructor

Packt - Course Instructors

Packt

1,946 courses, 569,983 learners

Offered by

Packt

Frequently Asked Questions

Can I preview a course before enrolling?

Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.

When will I have access to the lectures and assignments?

If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You’ll be able to submit assignments once the session starts.

What will I get when I enroll?

Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.

The 11 modules in this course, in order:

Data Ingestion and Data Extraction with Apache Spark
Data Transformation and Data Manipulation with Apache Spark
Data Management with Delta Lake
Ingesting Streaming Data
Processing Streaming Data
Performance Tuning with Apache Spark
Performance Tuning in Delta Lake
Orchestration and Scheduling Data Pipeline with Databricks Workflows
Building Data Pipelines with Delta Live Tables
Data Governance with Unity Catalog
Implementing DataOps and DevOps on Databricks
