When will I receive my Course Certificate?

If you complete the course successfully, your electronic Course Certificate will be added to your Accomplishments page. From there, you can print your Course Certificate or add it to your LinkedIn profile.

Why can’t I audit this course?

This course is currently available only to learners who have paid or who have received financial aid, where available.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your selected learning program, you’ll find a link to apply on the description page.

Data Engineering with Databricks Cookbook


Instructor: Packt - Course Instructors


11 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

1 week to complete

Under 10 hours a week

Flexible schedule

Learn at your own pace


What you’ll learn

Use Apache Spark for efficient data ingestion and transformation.
Optimize the performance of Spark and Delta Lake for scalable data solutions.
Build and orchestrate data pipelines using Databricks Workflows and Delta Live Tables.

Skills you’ll gain

Data Access
Data Processing
Data Capture
Data Engineering
Data Transformation
Data Pipelines
Apache
Data Governance
DevOps
Data Manipulation
DevOps Tools
Real Time Data
Performance Tuning

Tools you’ll learn

Data Lakes
Databricks
Apache Spark
PySpark

Details to know

Shareable certificate: add to your LinkedIn profile

Recently updated: June 2026

Assessments: 11 assignments

Taught in English


There are 11 modules in this course

This course offers a hands-on approach to mastering data engineering using Apache Spark, Delta Lake, and Databricks. By combining these technologies, you will learn how to build robust, scalable data pipelines and implement effective data management strategies in real-world applications. With a focus on performance optimization, data orchestration, and modern data engineering practices, this course provides essential skills for professionals working in the data engineering space.

You’ll start by exploring data ingestion techniques using Apache Spark, followed by methods for transforming and managing data within a data lakehouse. Each section builds on the last, providing learners with actionable insights that can be directly applied to their workflows. The course also covers DataOps and DevOps practices to help you streamline and automate your data processes. What sets this course apart is its emphasis on practical, real-world applications. You’ll work through concrete examples and recipes for managing data, from ingestion to transformation, ensuring that you can tackle data engineering challenges with confidence. Ideal for data engineers, data scientists, and IT professionals with a background in SQL and Python, this course will help you enhance your skills in data pipeline orchestration and optimization.

Data Ingestion and Data Extraction with Apache Spark

This module introduces practical techniques for ingesting and extracting data from various formats such as CSV, JSON, and XML using Apache Spark. Learners will explore common challenges, data transformation functions, and methods for handling nested and complex data structures. By the end, participants will be equipped to efficiently process and manipulate diverse data sources in Spark.

What’s included

1 video, 8 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

8 readings (40 minutes total)

Introduction (4 minutes)
Common Issues Faced While Working with CSV Data (4 minutes)
Reading JSON Data with Apache Spark (5 minutes)
The flatten() and collect_list() Functions (6 minutes)
Parsing XML Data with Apache Spark (4 minutes)
Working with Nested Data Structures in Apache Spark (5 minutes)
The map_keys() and map_values() Functions (6 minutes)
Using the regexp_extract() Function (6 minutes)

1 assignment (16 minutes total)

Data Ingestion and Extraction with Apache Spark (16 minutes)
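To make the flattening and extraction ideas in this module concrete without a Spark cluster, here is a minimal plain-Python sketch (standard library only, not PySpark). It flattens a nested JSON record into dot-separated column names, roughly what selecting nested struct fields such as `user.address.city` does in Spark, and it mimics the documented behavior of Spark’s `regexp_extract`, which returns an empty string when the pattern or group does not match. The sample record and its order-reference format are hypothetical.

```python
import json
import re


def flatten(record, prefix=""):
    """Flatten nested dicts into one level with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested structs, extending the column-name prefix.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def regexp_extract(text, pattern, group):
    """Mimic Spark's regexp_extract: return the capture group, or '' if there is no match."""
    match = re.search(pattern, text)
    if match is None or match.group(group) is None:
        return ""
    return match.group(group)


# Hypothetical input record with two levels of nesting.
raw = '{"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}, "ref": "ORD-2024-0042"}'
row = flatten(json.loads(raw))
row["order_year"] = regexp_extract(row["ref"], r"ORD-(\d{4})-(\d+)", 1)
print(row)
# {'id': 1, 'user.name': 'Ada', 'user.address.city': 'London', 'ref': 'ORD-2024-0042', 'order_year': '2024'}
```

In PySpark itself, the counterparts are `pyspark.sql.functions.regexp_extract(col, pattern, idx)` and dotted column references such as `df.select("user.address.city")`.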

Data Transformation and Data Manipulation with Apache Spark

This module introduces learners to essential data manipulation techniques using Apache Spark and PySpark, including filtering, joining, aggregating, and handling null values in large datasets. Learners will explore both standard and advanced operations such as approximate aggregations and nested window functions to efficiently process and analyze data. By the end, participants will be equipped to transform and manage data at scale using Spark's distributed computing capabilities.

What’s included

1 video, 7 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

7 readings (34 minutes total)

Introduction (6 minutes)
Filtering Data with Apache Spark (5 minutes)
Performing Joins with Apache Spark (5 minutes)
Performing Aggregations with Apache Spark (4 minutes)
Approximate Aggregations (6 minutes)
Nested Window Functions (5 minutes)
Handling Null Values with Apache Spark (3 minutes)

1 assignment (16 minutes total)

Mastering Data Processing in Apache Spark (16 minutes)
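The window-function and null-handling ideas can be illustrated with a small plain-Python model (standard library only; it ignores Spark’s distributed execution). The sketch fills missing salaries with 0, similar to `DataFrame.fillna(0, subset=["salary"])`, and then numbers rows within each department by descending salary, which is what `row_number().over(Window.partitionBy("dept").orderBy(desc("salary")))` computes in PySpark. The employee rows and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical employee rows; one salary is missing (null).
rows = [
    {"dept": "eng", "name": "a", "salary": 120},
    {"dept": "eng", "name": "b", "salary": None},
    {"dept": "eng", "name": "c", "salary": 150},
    {"dept": "ops", "name": "d", "salary": 90},
    {"dept": "ops", "name": "e", "salary": 95},
]


def fill_nulls(rows, column, default):
    """Replace None in one column with a default value (like fillna with a subset)."""
    return [{**r, column: default if r[column] is None else r[column]} for r in rows]


def row_number(rows, partition_by, order_by):
    """Number rows 1..n within each partition, ordered by `order_by` descending."""
    partitions = defaultdict(list)
    for r in rows:
        partitions[r[partition_by]].append(r)
    numbered = []
    for part in partitions.values():
        ranked = sorted(part, key=lambda r: r[order_by], reverse=True)
        numbered.extend({**r, "row_number": i} for i, r in enumerate(ranked, start=1))
    return numbered


clean = fill_nulls(rows, "salary", 0)
top_per_dept = [r for r in row_number(clean, "dept", "salary") if r["row_number"] == 1]
print([(r["dept"], r["name"]) for r in top_per_dept])
# [('eng', 'c'), ('ops', 'e')]
```

Filling nulls before ranking keeps the ordering well defined; in Spark you could instead control null placement with `desc_nulls_last`.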

Data Management with Delta Lake

This module introduces the core concepts and practical skills needed to manage data using Delta Lake, an open-source storage layer for lakehouse architectures. Learners will explore reading and merging data, implementing change data capture, optimizing tables, and leveraging versioning and time travel features to ensure data integrity and performance. Hands-on exercises will reinforce best practices for handling big data workloads with Delta Lake in Python.

What’s included

1 video, 6 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

6 readings (37 minutes total)

Introduction (7 minutes)
Reading a Delta Lake Table (5 minutes)
Merging Data into Delta Tables (7 minutes)
Change Data Capture in Delta Lake (5 minutes)
Optimizing Delta Lake Tables (6 minutes)
Versioning and Time Travel for Delta Lake Tables (7 minutes)

1 assignment (16 minutes total)

Mastering Delta Lake Data Management (16 minutes)
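Merging and time travel can be pictured as a table that keeps one snapshot per committed version. The plain-Python toy below is not Delta Lake’s real transaction-log format: for simplicity it copies the whole table on each commit, whereas Delta records added and removed files. It performs an upsert (update rows whose key matches, insert the rest), mirroring a Delta merge with `whenMatchedUpdateAll()` and `whenNotMatchedInsertAll()`, and lets you read an earlier version, as `spark.read.format("delta").option("versionAsOf", n)` does. The `VersionedTable` class and the order rows are hypothetical.

```python
import copy


class VersionedTable:
    """Toy table that stores a full snapshot per version (Delta logs file changes instead)."""

    def __init__(self, key):
        self.key = key
        self.versions = [{}]  # version 0 is the empty table; rows are indexed by key

    def merge(self, updates):
        """Upsert: replace rows whose key matches, insert the others; return the new version."""
        snapshot = copy.deepcopy(self.versions[-1])
        for row in updates:
            snapshot[row[self.key]] = dict(row)
        self.versions.append(snapshot)
        return len(self.versions) - 1

    def read(self, version=None):
        """Read the latest version, or an older one for time travel."""
        snapshot = self.versions[-1] if version is None else self.versions[version]
        return sorted(snapshot.values(), key=lambda r: r[self.key])


orders = VersionedTable(key="id")
orders.merge([{"id": 1, "status": "new"}, {"id": 2, "status": "new"}])      # version 1
orders.merge([{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}])  # version 2
print(orders.read())           # latest: id 2 updated, id 3 inserted
print(orders.read(version=1))  # the table as it was before the second merge
```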

Ingesting Streaming Data

This module introduces the fundamentals of processing real-time data streams using Apache Spark Structured Streaming. Learners will explore how to ingest data from sources like Apache Kafka, apply transformations and filters, configure checkpoints and triggers, and perform windowed aggregations for robust stream processing applications.

What’s included

1 video, 6 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

6 readings (42 minutes total)

Introduction (9 minutes)
Reading Data from Real-Time Sources, Such as Apache Kafka, with Apache Spark Structured Streaming (7 minutes)
Defining Transformations and Filters on a Streaming DataFrame (4 minutes)
Configuring Checkpoints for Structured Streaming in Apache Spark (6 minutes)
Configuring Triggers for Structured Streaming in Apache Spark (6 minutes)
Applying Window Aggregations to Streaming Data with Apache Spark Structured Streaming (10 minutes)

1 assignment (16 minutes total)

Exploring Streaming Data Processing with Apache Spark (16 minutes)
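Windowed aggregation with late data is easier to reason about with a small model. The plain-Python sketch below counts events per 10-minute tumbling window and drops events that arrive behind a watermark, the idea behind `withWatermark("timestamp", "5 minutes")` followed by `groupBy(window("timestamp", "10 minutes"))` in PySpark. It is a simplification under stated assumptions: event times are plain integers (minutes), the watermark advances after every event rather than between micro-batches as in Spark, and the exact lateness boundary is illustrative. The window and delay values are hypothetical.

```python
from collections import Counter

WINDOW = 10  # tumbling window length in minutes (hypothetical)
DELAY = 5    # watermark delay in minutes (hypothetical)


def window_start(ts):
    """Start of the tumbling window that contains event time `ts` (in minutes)."""
    return ts - ts % WINDOW


def windowed_counts(arrivals):
    """Count events per tumbling window, dropping events that arrive behind the watermark.

    Simplified model: the watermark is (max event time seen so far - DELAY), updated
    after every event, and an event is dropped once its whole window ends at or before it.
    """
    counts = Counter()
    max_seen = float("-inf")
    for ts in arrivals:  # event times, listed in arrival order
        watermark = max_seen - DELAY
        start = window_start(ts)
        if start + WINDOW <= watermark:
            continue  # too late: this window's result is treated as final
        counts[start] += 1
        max_seen = max(max_seen, ts)
    return dict(sorted(counts.items()))


print(windowed_counts([1, 4, 12, 15, 3, 27, 8]))
# {0: 2, 10: 2, 20: 1}  (the late events at minutes 3 and 8 are dropped)
```

The watermark bounds how much window state must be kept: once a window falls behind it, its result can be emitted and its state discarded.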

Processing Streaming Data

This module explores real-time data processing using Apache Spark Structured Streaming and Delta Lake. Learners will discover techniques for idempotent stream writing, merging change data capture events, joining streaming data with static datasets and with other streams, and monitoring streaming queries. Practical recipes and examples will help you build robust, scalable streaming data pipelines.

What’s included

1 video, 6 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

6 readings (38 minutes total)

Introduction (8 minutes)
Idempotent Stream Writing with Delta Lake and Apache Spark Structured Streaming (6 minutes)
Merging or Applying Change Data Capture on Apache Spark Structured Streaming and Delta Lake (6 minutes)
Joining Streaming Data with Static Data in Apache Spark Structured Streaming and Delta Lake (5 minutes)
Joining Streaming Data with Streaming Data in Apache Spark Structured Streaming and Delta Lake (6 minutes)
Monitoring Real-Time Data Processing with Apache Spark Structured Streaming (7 minutes)

1 assignment (16 minutes total)

Streaming Data Processing Fundamentals (16 minutes)
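Idempotent stream writing means that replaying a micro-batch after a failure does not duplicate data. Delta Lake supports this inside `foreachBatch` through the `txnAppId` and `txnVersion` writer options: each write carries an application id and a batch version, and a version already committed for that id is skipped. The plain-Python toy below models only that bookkeeping; the `IdempotentSink` class and the stream name are hypothetical.

```python
class IdempotentSink:
    """Toy sink that ignores a micro-batch it has already committed for the same app id."""

    def __init__(self):
        self.rows = []
        self.committed = {}  # app id -> highest batch version committed so far

    def write(self, app_id, batch_id, batch_rows):
        """Append the batch unless this (app_id, batch_id) was already applied."""
        if batch_id <= self.committed.get(app_id, -1):
            return False  # replay after a failure or restart: no-op
        self.rows.extend(batch_rows)
        self.committed[app_id] = batch_id
        return True


sink = IdempotentSink()
sink.write("orders-stream", 0, ["a", "b"])
sink.write("orders-stream", 1, ["c"])
replayed = sink.write("orders-stream", 1, ["c"])  # e.g. a restart re-runs batch 1
print(sink.rows, replayed)
# ['a', 'b', 'c'] False
```

In a real pipeline, the `batch_id` passed to the `foreachBatch` function is the natural value to use as the transaction version.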

Performance Tuning with Apache Spark

This module explores advanced techniques for optimizing Apache Spark applications, focusing on improving performance and resource efficiency. Learners will discover strategies such as minimizing data shuffling, handling data skew, leveraging broadcast variables, and optimizing partitioning and join operations. Practical guidance on caching and persistence will also be provided to help accelerate data processing workflows.

What’s included

1 video, 7 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

7 readings (46 minutes total)

Introduction (5 minutes)
Using Broadcast Variables (5 minutes)
Optimizing Spark Jobs by Minimizing Data Shuffling (6 minutes)
Avoiding Data Skew (8 minutes)
Caching and Persistence (5 minutes)
Partitioning and Repartitioning (8 minutes)
Optimizing Join Strategies (9 minutes)

1 assignment (16 minutes total)

Mastering Spark Performance Tuning (16 minutes)
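A broadcast hash join avoids shuffling the large side of a join: the small table is copied to every executor as a hash map, and each partition of the large table is joined locally. The plain-Python sketch below models that idea with two in-memory "partitions"; in PySpark you would request it with the `broadcast()` hint from `pyspark.sql.functions`, or let Spark choose it automatically for tables below `spark.sql.autoBroadcastJoinThreshold`. The order and country data are hypothetical.

```python
from collections import defaultdict


def build_broadcast(small_rows, key):
    """Build a hash map of the small table once; Spark would ship this copy to every executor."""
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)
    return lookup


def join_partition(partition, lookup, key):
    """Inner-join one partition of the large table locally, with no shuffle of the large side."""
    return [{**row, **match} for row in partition for match in lookup.get(row[key], [])]


# Hypothetical large fact table, already split across two partitions.
partitions = [
    [{"order": 1, "country": "DE"}, {"order": 2, "country": "FR"}],
    [{"order": 3, "country": "DE"}, {"order": 4, "country": "XX"}],
]
# Hypothetical small dimension table that is cheap to copy everywhere.
countries = [{"country": "DE", "name": "Germany"}, {"country": "FR", "name": "France"}]

lookup = build_broadcast(countries, "country")
joined = [row for part in partitions for row in join_partition(part, lookup, "country")]
print([(r["order"], r["name"]) for r in joined])
# [(1, 'Germany'), (2, 'France'), (3, 'Germany')]
```

The trade-off is memory: every executor holds the whole small table, so broadcasting only pays off when that side is genuinely small.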

Performance Tuning in Delta Lake

This module explores advanced techniques to enhance query performance in Delta Lake, including data partitioning, Z-ordering, data skipping, and compression strategies. Learners will gain practical skills to optimize storage and reduce I/O costs for large-scale data processing.

What’s included

1 video, 4 readings, 1 assignment
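Data skipping works because Delta Lake records per-file statistics, such as the minimum and maximum value of a column, in its transaction log, so a query can ignore files whose value range cannot match the filter. The plain-Python sketch below models that pruning step for a single integer column; the file names and ranges are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file min/max statistics for one column, like those Delta keeps in its log."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lo, hi):
    """Keep only files whose [min, max] range can overlap the predicate lo <= x <= hi."""
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]


files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 50, 250),  # wide range: poorly clustered data
]
print(files_to_scan(files, 120, 130))
# ['part-1.parquet', 'part-2.parquet']
```

The wide range of `part-2.parquet` is what clustering techniques such as Z-ordering aim to fix: by co-locating similar values, each file covers a narrower range and more files can be skipped.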

Orchestration and Scheduling Data Pipelines with Databricks Workflows

This module introduces learners to automating and managing data pipelines using Databricks Workflows. You will explore how to configure, monitor, and parameterize workflows, implement conditional branching, and trigger jobs based on external events such as file arrivals. By the end, you'll be equipped to orchestrate robust data processing tasks on the Databricks platform.

What’s included

1 video, 5 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

5 readings (30 minutes total)

Introduction (8 minutes)
Running and Managing Databricks Workflows (3 minutes)
Passing Task and Job Parameters Within a Databricks Workflow (5 minutes)
Conditional Branching in Databricks Workflows (6 minutes)
Triggering Jobs Based on File Arrival (8 minutes)

1 assignment (16 minutes total)

Mastering Databricks Workflow Orchestration (16 minutes)
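The core ideas of workflow orchestration (task dependencies, job parameters, and conditional branching) can be sketched as a tiny task runner. This is a plain-Python toy, not the Databricks Jobs API or its configuration format: tasks run in dependency order using the standard library’s `graphlib.TopologicalSorter` (Python 3.9+), a condition task returns `True` or `False`, and downstream tasks run only on the branch they require. All task names and the `row_count` parameter are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, params):
    """Run tasks in dependency order and return the names of the tasks that ran.

    `tasks` maps a task name to (function, {upstream_task: required_outcome or None}).
    A task runs only if every upstream task ran and produced the required outcome.
    """
    deps = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    results, ran = {}, []
    for name in TopologicalSorter(deps).static_order():
        func, upstream = tasks[name]
        if all(u in results and (want is None or results[u] == want) for u, want in upstream.items()):
            results[name] = func(params)
            ran.append(name)
    return ran


tasks = {
    "ingest": (lambda p: p["row_count"], {}),
    "check_rows": (lambda p: p["row_count"] > 0, {"ingest": None}),  # condition task
    "transform": (lambda p: "transformed", {"check_rows": True}),    # "true" branch
    "alert_empty": (lambda p: "alerted", {"check_rows": False}),     # "false" branch
}
print(run_workflow(tasks, {"row_count": 42}))  # ['ingest', 'check_rows', 'transform']
print(run_workflow(tasks, {"row_count": 0}))   # ['ingest', 'check_rows', 'alert_empty']
```

Skipped tasks never enter `results`, so anything downstream of a skipped branch is skipped too, which keeps the two branches mutually exclusive.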

Building Data Pipelines with Delta Live Tables

This module guides learners through building robust data pipelines using Delta Live Tables on Databricks. You will explore techniques for ingesting and transforming streaming data, enforcing data quality, quarantining invalid records, monitoring pipeline health, deploying with asset bundles, and implementing change data capture (CDC). By the end, you'll be equipped to create scalable, reliable pipelines for real-time analytics.

What’s included

1 video, 7 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

7 readings (39 minutes total)

Introduction (6 minutes)
Building a Data Pipeline with Delta Live Tables on Databricks (4 minutes)
Implementing Data Quality and Validation Rules with Delta Live Tables in Databricks (6 minutes)
Quarantining Bad Data with Delta Live Tables in Databricks (4 minutes)
Monitoring Delta Live Tables Pipelines (4 minutes)
Deploying Delta Live Tables Pipelines with Databricks Asset Bundles (9 minutes)
Applying Changes (CDC) to Delta Tables with Delta Live Tables (6 minutes)

1 assignment (16 minutes total)

Data Pipeline Fundamentals with Delta Live Tables (16 minutes)
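In Delta Live Tables, data quality rules are declared as expectations (for example with the `@dlt.expect`, `@dlt.expect_or_drop`, and `@dlt.expect_or_fail` decorators), and a common quarantine pattern routes rows that violate any rule into a separate table instead of silently dropping them. The plain-Python sketch below models that routing and counts violations per rule, similar in spirit to the data-quality metrics a pipeline surfaces for monitoring. The rule names, predicates, and rows are hypothetical.

```python
# Hypothetical expectations: rule name -> predicate that a valid row must satisfy.
RULES = {
    "valid_id": lambda r: r.get("id") is not None,
    "positive_amount": lambda r: (r.get("amount") or 0) > 0,
}


def apply_expectations(rows, rules):
    """Split rows into (valid, quarantined) and count violations per rule."""
    valid, quarantined = [], []
    violations = {name: 0 for name in rules}
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        for name in failed:
            violations[name] += 1
        (quarantined if failed else valid).append(row)
    return valid, quarantined, violations


rows = [{"id": 1, "amount": 10}, {"id": None, "amount": 5}, {"id": 3, "amount": -2}]
valid, bad, stats = apply_expectations(rows, RULES)
print(len(valid), len(bad), stats)
# 1 2 {'valid_id': 1, 'positive_amount': 1}
```

Keeping quarantined rows, rather than dropping them, lets you inspect and reprocess bad records once the upstream issue is fixed.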

Data Governance with Unity Catalog

This module introduces the core features of Databricks Unity Catalog for managing data governance in a lakehouse environment. Learners will explore catalog creation, fine-grained access controls, metadata management, data lineage, and system table querying to ensure secure and compliant data operations. Practical exercises demonstrate how to implement row filters and column masks and how to use the Unity Catalog UI for effective data stewardship.

What’s included

1 video, 9 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

9 readings (40 minutes total)

Introduction (7 minutes)
Creating a Catalog (4 minutes)
Defining and Applying Fine-Grained Access Control Policies Using Unity Catalog (5 minutes)
Tagging, Commenting, and Capturing Metadata About Data and AI Assets Using Databricks Unity Catalog (5 minutes)
Using the Unity Catalog UI (4 minutes)
Apply Row Filters (4 minutes)
Apply Column Masks (3 minutes)
Using Unity Catalog’s Lineage Data for Debugging, Root Cause Analysis, and Impact Assessment (4 minutes)
Accessing and Querying System Tables Using Unity Catalog (4 minutes)

1 assignment (16 minutes total)

Data Governance with Unity Catalog (16 minutes)
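In Unity Catalog, row filters and column masks are SQL functions attached to a table, and they commonly decide what to return based on the querying user’s group membership (for example via `is_account_group_member()`). The plain-Python sketch below models only the evaluation logic, not Unity Catalog itself: a row filter that limits non-admins to one region, and a column mask that hides email addresses from users outside a PII group. The group names, rules, and customer rows are hypothetical.

```python
def row_filter(row, user_groups):
    """Hypothetical rule: admins see every row; everyone else sees only the 'EU' region."""
    return "admins" in user_groups or row["region"] == "EU"


def mask_email(value, user_groups):
    """Hypothetical column mask: reveal the email only to members of 'pii_readers'."""
    if "pii_readers" in user_groups:
        return value
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}"


def query(table, user_groups):
    """Apply the row filter, then the column mask, as the engine would on every read."""
    return [
        {**row, "email": mask_email(row["email"], user_groups)}
        for row in table
        if row_filter(row, user_groups)
    ]


customers = [
    {"id": 1, "region": "EU", "email": "ada@example.com"},
    {"id": 2, "region": "US", "email": "bob@example.com"},
]
print(query(customers, {"analysts"}))
# [{'id': 1, 'region': 'EU', 'email': 'a***@example.com'}]
print(query(customers, {"admins", "pii_readers"}))
```

Because both rules are evaluated at query time, the same table serves different users without maintaining separate filtered copies.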

Implementing DataOps and DevOps on Databricks

This module explores practical strategies for implementing DataOps and DevOps workflows on the Databricks platform. Learners will discover how to automate tasks using the Databricks CLI, streamline development with the VSCode extension, manage infrastructure with Databricks Asset Bundles, and integrate CI/CD pipelines using GitHub Actions. By the end, participants will be equipped to enhance data and software development efficiency through automation and best practices.

What’s included

1 video, 5 readings, 1 assignment

1 video (1 minute total)

Overview (1 minute)

5 readings (35 minutes total)

Introduction (9 minutes)
Automating Tasks by Using the Databricks CLI (6 minutes)
Using the Databricks VSCode Extension for Local Development and Testing (4 minutes)
Using Databricks Asset Bundles (DABs) (8 minutes)
Leveraging GitHub Actions with Databricks Asset Bundles (DABs) (8 minutes)

1 assignment (16 minutes total)

DataOps and DevOps Implementation on Databricks (16 minutes)
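A CI job, such as a GitHub Actions workflow, typically validates a bundle and then deploys it with the Databricks CLI. The Python sketch below only assembles and prints the `databricks bundle validate` and `databricks bundle deploy` commands for one target; it executes them only when `dry_run=False`, which assumes the Databricks CLI is installed and authenticated (for example through the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables). The target name `dev` is hypothetical and would have to be defined in the bundle’s `databricks.yml`.

```python
import shlex
import subprocess


def bundle_commands(target):
    """CLI steps a CI job might run for one bundle target: validate first, then deploy."""
    return [
        ["databricks", "bundle", "validate", "--target", target],
        ["databricks", "bundle", "deploy", "--target", target],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only if dry_run is False, stopping at the first failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


run(bundle_commands("dev"))
# $ databricks bundle validate --target dev
# $ databricks bundle deploy --target dev
```

Running validation as a separate, earlier step means a broken bundle configuration fails the pipeline before anything is deployed.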

Instructor

Packt - Course Instructors

Packt

1,946 courses, 571,338 learners

Offered by

Packt

Explore more from Data Analysis

Mastering Azure Databricks for Data Engineers (Specialization, Packt; free trial)
Data Engineering with Scala and Spark (Course, Packt)
Data Engineering with Delta Lake on Databricks (Course, Pragmatic AI Labs; free trial)
Spark, Hadoop, and Snowflake for Data Engineering (Course, Duke University; free trial)

Why people choose Coursera for their career

Felipe M.

Learner since 2018

“It’s a great experience to learn at my own pace. I can learn whenever I have the time and the energy for it.”

Jennifer J.

Learner since 2020

“I was able to apply the new knowledge and skills from my courses directly at work on an exciting new project.”

Larry W.

Learner since 2021

“When I need courses on topics that my university doesn’t offer, Coursera is one of the best alternatives.”

Chaitanya A.

“Learning isn’t just about getting better at your job. It’s about so much more. Coursera allows me to learn without limits.”

Frequently asked questions

Can I preview a course before enrolling?

Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.

When will I have access to the lectures and assignments?

If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You’ll be able to submit assignments once the session starts.

What will I get when I enroll?

Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.

More questions

Visit the learner help center.
