Learn to build data pipelines on the Databricks Lakehouse Platform — from architecture concepts to hands-on Spark and Delta Lake. This beginner course starts with why the lakehouse pattern replaced separate data warehouses and data lakes, then moves directly into the Databricks workspace where you'll configure compute, write PySpark and SQL queries, and manage data with Unity Catalog's three-level namespace.

Databricks Lakehouse Fundamentals

This course is part of the Enterprise AI and Data Engineering with Databricks Specialization.

Instructor: Noah Gift
What you'll learn
Write PySpark and Spark SQL queries that exploit lazy evaluation, the Catalyst optimizer, and broadcast join hints
Schedule end-to-end data pipelines as multi-task Databricks Jobs with dashboards and alerting
Build and query Delta Lake tables with ACID transactions, schema enforcement, time travel, and MERGE-based incremental ETL
Details to know

4 assignments
March 2026

Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate

There are 4 modules in this course
This module introduces the lakehouse paradigm and the Databricks platform. You'll learn about the structure of lakehouse architecture, explore the Databricks workspace and its core tools, and understand how compute and storage work together.
What's included
6 videos, 7 readings, 1 assignment
This module covers notebooks and hands-on data manipulation using PySpark. You'll create and organize notebooks, load data from the Catalog, and write PySpark transformations to select, filter, aggregate, and join datasets.
What's included
6 videos, 4 readings, 1 assignment
This module introduces Delta Lake, where you'll create Delta tables, perform transactional operations like updates, deletes, and merges, use time travel to query previous versions, and see how Delta Lake connects to governance and automation features.
What's included
6 videos, 4 readings, 1 assignment
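The transactional operations and time travel described in the module above can be sketched in Spark SQL. The table names `customers` and `customer_updates` are hypothetical; on Databricks, `CREATE TABLE` produces a Delta table by default:

```sql
-- Delta is the default table format on Databricks
CREATE TABLE customers (id INT, email STRING, updated_at TIMESTAMP);

-- Upsert incoming changes in one ACID transaction:
-- update matching rows, insert new ones
MERGE INTO customers AS t
USING customer_updates AS s
  ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.email = s.email, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (id, email, updated_at) VALUES (s.id, s.email, s.updated_at);

-- Time travel: query an earlier version of the table
SELECT * FROM customers VERSION AS OF 1;

-- Inspect the transaction log that makes time travel possible
DESCRIBE HISTORY customers;
```

Every write appends a new version to the Delta transaction log, which is what `VERSION AS OF` and `DESCRIBE HISTORY` read from.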
Build an end-to-end lakehouse data pipeline integrating every concept from the course. Starting from raw data files, you will construct a complete medallion architecture (bronze → silver → gold) with Delta Lake, implement incremental MERGE logic, and orchestrate the pipeline as a scheduled Databricks Job. Six hands-on lab notebooks guide you through the project using the course GitHub repository.
What's included
1 reading, 1 assignment
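The capstone's orchestration step — running bronze, silver, and gold notebooks in order on a schedule — can be expressed as a multi-task Job definition in the style of the Databricks Jobs API 2.1. The job name, notebook paths, and cron expression below are illustrative assumptions, not the course's actual configuration:

```json
{
  "name": "medallion-pipeline",
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC"
  },
  "tasks": [
    {
      "task_key": "bronze_ingest",
      "notebook_task": { "notebook_path": "/Repos/course/bronze" }
    },
    {
      "task_key": "silver_clean",
      "depends_on": [ { "task_key": "bronze_ingest" } ],
      "notebook_task": { "notebook_path": "/Repos/course/silver" }
    },
    {
      "task_key": "gold_aggregate",
      "depends_on": [ { "task_key": "silver_clean" } ],
      "notebook_task": { "notebook_path": "/Repos/course/gold" }
    }
  ]
}
```

The `depends_on` entries encode the bronze → silver → gold dependency graph, so each layer runs only after the previous one succeeds.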
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.