When will I receive my Course Certificate?

If you complete the course successfully, your electronic Course Certificate will be added to your Accomplishments page. From there, you can print your Course Certificate or add it to your LinkedIn profile.

Why can’t I audit this course?

This course is currently available only to learners who have paid for it or who have received financial aid, where financial aid is offered.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for the learning program you selected, you’ll find a link to apply on the description page.

Data Engineering with Databricks Cookbook

Instructor: Packt - Course Instructors

11 modules: gain insight into a topic and learn the fundamentals.

Intermediate level

1 week to complete, at 10 hours a week

Flexible schedule: learn at your own pace

What you'll learn

Use Apache Spark for efficient data ingestion and transformation.
Optimize the performance of Spark and Delta Lake for scalable data solutions.
Build and orchestrate data pipelines using Databricks Workflows and Delta Live Tables.

Details to know

Shareable certificate: add it to your LinkedIn profile

There are 11 modules in this course

This course offers a hands-on approach to mastering data engineering using Apache Spark, Delta Lake, and Databricks. By combining these technologies, you will learn how to build robust, scalable data pipelines and implement effective data management strategies in real-world applications. With a focus on performance optimization, data orchestration, and modern data engineering practices, this course provides essential skills for professionals working in the data engineering space.

You’ll start by exploring data ingestion techniques using Apache Spark, followed by methods for transforming and managing data within a data lakehouse. Each section builds on the last, providing learners with actionable insights that can be directly applied to their workflows. The course also covers DataOps and DevOps practices to help you streamline and automate your data processes.

What sets this course apart is its emphasis on practical, real-world applications. You’ll work through concrete examples and recipes for managing data, from ingestion to transformation, ensuring that you can tackle data engineering challenges with confidence. Ideal for data engineers, data scientists, and IT professionals with a background in SQL and Python, this course will help you enhance your skills in data pipeline orchestration and optimization.

Data Ingestion and Data Extraction with Apache Spark

This module introduces practical techniques for ingesting and extracting data from various formats such as CSV, JSON, and XML using Apache Spark. Learners will explore common challenges, data transformation functions, and methods for handling nested and complex data structures. By the end, participants will be equipped to efficiently process and manipulate diverse data sources in Spark.

What's included

1 video, 8 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

8 readings (total 40 minutes)

Introduction (4 minutes)
Common Issues Faced While Working with CSV Data (4 minutes)
Reading JSON Data with Apache Spark (5 minutes)
The flatten() and collect_list() Functions (6 minutes)
Parsing XML Data with Apache Spark (4 minutes)
Working with Nested Data Structures in Apache Spark (5 minutes)
The Map Keys and Map Values Functions (6 minutes)
Using the regexp_extract() Function (6 minutes)

1 assignment (total 16 minutes)

Data Ingestion and Extraction with Apache Spark (16 minutes)

Data Transformation and Data Manipulation with Apache Spark

This module introduces learners to essential data manipulation techniques using Apache Spark and PySpark, including filtering, joining, aggregating, and handling null values in large datasets. Learners will explore both standard and advanced operations such as approximate aggregations and nested window functions to efficiently process and analyze data. By the end, participants will be equipped to transform and manage data at scale using Spark's distributed computing capabilities.

What's included

1 video, 7 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

7 readings (total 34 minutes)

Introduction (6 minutes)
Filtering Data with Apache Spark (5 minutes)
Performing Joins with Apache Spark (5 minutes)
Performing Aggregations with Apache Spark (4 minutes)
Approximate Aggregations (6 minutes)
Nested Window Functions (5 minutes)
Handling Null Values with Apache Spark (3 minutes)

1 assignment (total 16 minutes)

Mastering Data Processing in Apache Spark (16 minutes)

Data Management with Delta Lake

This module introduces the core concepts and practical skills needed to manage data using Delta Lake, an open-source storage layer for lakehouse architectures. Learners will explore reading and merging data, implementing change data capture, optimizing tables, and leveraging versioning and time travel features to ensure data integrity and performance. Hands-on exercises will reinforce best practices for handling big data workloads with Delta Lake in Python.

What's included

1 video, 6 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

6 readings (total 37 minutes)

Introduction (7 minutes)
Reading a Delta Lake Table (5 minutes)
Merging Data into Delta Tables (7 minutes)
Change Data Capture in Delta Lake (5 minutes)
Optimizing Delta Lake Tables (6 minutes)
Versioning and Time Travel for Delta Lake Tables (7 minutes)

1 assignment (total 16 minutes)

Mastering Delta Lake Data Management (16 minutes)

Ingesting Streaming Data

This module introduces the fundamentals of processing real-time data streams using Apache Spark Structured Streaming. Learners will explore how to ingest data from sources like Apache Kafka, apply transformations and filters, configure checkpoints and triggers, and perform windowed aggregations for robust stream processing applications.

What's included

1 video, 6 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

6 readings (total 42 minutes)

Introduction (9 minutes)
Reading Data from Real-Time Sources, Such as Apache Kafka, with Apache Spark Structured Streaming (7 minutes)
Defining Transformations and Filters on a Streaming DataFrame (4 minutes)
Configuring Checkpoints for Structured Streaming in Apache Spark (6 minutes)
Configuring Triggers for Structured Streaming in Apache Spark (6 minutes)
Applying Window Aggregations to Streaming Data with Apache Spark Structured Streaming (10 minutes)

1 assignment (total 16 minutes)

Exploring Streaming Data Processing with Apache Spark (16 minutes)

Processing Streaming Data

This module explores real-time data processing using Apache Spark Structured Streaming and Delta Lake. Learners will discover techniques for idempotent stream writing, merging change data capture events, joining streaming and static datasets, and monitoring streaming queries. Practical recipes and examples will help you build robust, scalable streaming data pipelines.

What's included

1 video, 6 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

6 readings (total 38 minutes)

Introduction (8 minutes)
Idempotent Stream Writing with Delta Lake and Apache Spark Structured Streaming (6 minutes)
Merging or Applying Change Data Capture on Apache Spark Structured Streaming and Delta Lake (6 minutes)
Joining Streaming Data with Static Data in Apache Spark Structured Streaming and Delta Lake (5 minutes)
Joining Streaming Data with Streaming Data in Apache Spark Structured Streaming and Delta Lake (6 minutes)
Monitoring Real-Time Data Processing with Apache Spark Structured Streaming (7 minutes)

1 assignment (total 16 minutes)

Streaming Data Processing Fundamentals (16 minutes)

Performance Tuning with Apache Spark

This module explores advanced techniques for optimizing Apache Spark applications, focusing on improving performance and resource efficiency. Learners will discover strategies such as minimizing data shuffling, handling data skew, leveraging broadcast variables, and optimizing partitioning and join operations. Practical guidance on caching and persistence will also be provided to help accelerate data processing workflows.

What's included

1 video, 7 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

7 readings (total 46 minutes)

Introduction (5 minutes)
Using Broadcast Variables (5 minutes)
Optimizing Spark Jobs by Minimizing Data Shuffling (6 minutes)
Avoiding Data Skew (8 minutes)
Caching and Persistence (5 minutes)
Partitioning and Repartitioning (8 minutes)
Optimizing Join Strategies (9 minutes)

1 assignment (total 16 minutes)

Mastering Spark Performance Tuning (16 minutes)

Performance Tuning in Delta Lake

This module explores advanced techniques to enhance query performance in Delta Lake, including data partitioning, Z-ordering, data skipping, and compression strategies. Learners will gain practical skills to optimize storage and reduce I/O costs for large-scale data processing.

What's included

1 video, 4 readings, 1 assignment

Orchestration and Scheduling Data Pipeline with Databricks Workflows

This module introduces learners to automating and managing data pipelines using Databricks Workflows. You will explore how to configure, monitor, and parameterize workflows, implement conditional branching, and trigger jobs based on external events such as file arrivals. By the end, you'll be equipped to orchestrate robust data processing tasks on the Databricks platform.

What's included

1 video, 5 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

5 readings (total 30 minutes)

Introduction (8 minutes)
Running and Managing Databricks Workflows (3 minutes)
Passing Task and Job Parameters Within a Databricks Workflow (5 minutes)
Conditional Branching in Databricks Workflows (6 minutes)
Triggering Jobs Based on File Arrival (8 minutes)

1 assignment (total 16 minutes)

Mastering Databricks Workflow Orchestration (16 minutes)

Building Data Pipelines with Delta Live Tables

This module guides learners through building robust data pipelines using Delta Live Tables on Databricks. You will explore techniques for ingesting and transforming streaming data, enforcing data quality, quarantining invalid records, monitoring pipeline health, deploying with asset bundles, and implementing change data capture (CDC). By the end, you'll be equipped to create scalable, reliable pipelines for real-time analytics.

What's included

1 video, 7 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

7 readings (total 39 minutes)

Introduction (6 minutes)
Building a Data Pipeline with Delta Live Tables on Databricks (4 minutes)
Implementing Data Quality and Validation Rules with Delta Live Tables in Databricks (6 minutes)
Quarantining Bad Data with Delta Live Tables in Databricks (4 minutes)
Monitoring Delta Live Tables Pipelines (4 minutes)
Deploying Delta Live Tables Pipelines with Databricks Asset Bundles (9 minutes)
Applying Changes (CDC) to Delta Tables with Delta Live Tables (6 minutes)

1 assignment (total 16 minutes)

Data Pipeline Fundamentals with Delta Live Tables (16 minutes)

Data Governance with Unity Catalog

This module introduces the core features of Databricks Unity Catalog for managing data governance in a lakehouse environment. Learners will explore catalog creation, fine-grained access controls, metadata management, data lineage, and system table querying to ensure secure and compliant data operations. Practical exercises demonstrate how to implement row filters, column masks, and leverage the Unity Catalog UI for effective data stewardship.

What's included

1 video, 9 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

9 readings (total 40 minutes)

Introduction (7 minutes)
Creating a Catalog (4 minutes)
Defining and Applying Fine-Grained Access Control Policies Using Unity Catalog (5 minutes)
Tagging, Commenting, and Capturing Metadata About Data and AI Assets Using Databricks Unity Catalog (5 minutes)
Using the Unity Catalog UI (4 minutes)
Apply Row Filters (4 minutes)
Apply Column Masks (3 minutes)
Using Unity Catalog's Lineage Data for Debugging, Root Cause Analysis, and Impact Assessment (4 minutes)
Accessing and Querying System Tables Using Unity Catalog (4 minutes)

1 assignment (total 16 minutes)

Data Governance with Unity Catalog (16 minutes)

Implementing DataOps and DevOps on Databricks

This module explores practical strategies for implementing DataOps and DevOps workflows on the Databricks platform. Learners will discover how to automate tasks using the Databricks CLI, streamline development with the VSCode extension, manage infrastructure with Databricks Asset Bundles, and integrate CI/CD pipelines using GitHub Actions. By the end, participants will be equipped to enhance data and software development efficiency through automation and best practices.

What's included

1 video, 5 readings, 1 assignment

1 video (total 1 minute)

Overview (1 minute)

5 readings (total 35 minutes)

Introduction (9 minutes)
Automating Tasks by Using the Databricks CLI (6 minutes)
Using the Databricks VSCode Extension for Local Development and Testing (4 minutes)
Using Databricks Asset Bundles (DABs) (8 minutes)
Leveraging GitHub Actions with Databricks Asset Bundles (DABs) (8 minutes)

1 assignment (total 16 minutes)

DataOps and DevOps Implementation on Databricks (16 minutes)

Instructor

Packt - Course Instructors

Packt

1,946 courses, 568,385 learners

Offered by

Packt

Explore more from Data Analysis

Mastering Azure Databricks for Data Engineers (Specialization, Packt)
Data Engineering with Scala and Spark (Course, Packt)
Data Engineering with Delta Lake on Databricks (Course, Pragmatic AI Labs)
Spark, Hadoop, and Snowflake for Data Engineering (Course, Duke University)

Frequently asked questions

Can I preview a course before enrolling?

Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.

When will I have access to the lectures and assignments?

If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You’ll be able to submit assignments once the session starts.

What will I get when I enroll?

Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.
