What is an Apache Iceberg lakehouse?

An Apache Iceberg lakehouse is a storage architecture that brings database-like reliability to data lakes. By using Snowflake's platform, organizations can manage Iceberg tables with high performance, near-instant elasticity, and universal governance.

Does this course cover migrating data to Snowflake?

Yes. You will learn to migrate existing data from Parquet, CSV, and legacy databases into Iceberg tables that integrate with the Snowflake AI Data Cloud.

When will I have access to the lectures and assignments?

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but you will not be able to purchase a Certificate experience.

What will I get if I purchase the Certificate?

When you purchase a Certificate, you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page. From there, you can print your Certificate or add it to your LinkedIn profile.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.

Apache Iceberg: From Zero to Production Data Lakehouse

Instructor: Snowflake Northstar

3 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

6 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Build production-ready Iceberg lakehouses with optimized partitioning and schema design for maximum query performance
Migrate existing data to Iceberg and manage schema evolution, partitioning changes, and Git-like workflows without downtime
Maintain Iceberg tables at scale through compaction, snapshot management, and write strategy optimization for concurrent workloads

Skills you'll gain

Data Infrastructure
Data Pipelines
Interoperability
Metadata Management
Transaction Processing
Data Maintenance
Data Validation
Data Management
Database Design
Data Migration
Performance Tuning
Database Management
Data Integrity
Data Architecture

Tools you'll learn

Apache Spark
Query Languages
Data Lakes
Apache Hive

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

March 2026

Assessments

3 assignments

Taught in English

91% of learners achieved a positive career outcome

There are 3 modules in this course

This course is designed for data engineers, analytics engineers, data platform engineers, and data architects who work with data lakes and want to modernize their data infrastructure. It's also valuable for software engineers transitioning into data roles and technical leads evaluating Apache Iceberg for their data.

By the end of this course, you will be able to:
- Build and configure an Apache Iceberg lakehouse using catalogs, object storage, and query engines like Spark and Trino
- Design optimal table structures using hidden partitioning, sort orders, and column metrics to maximize query performance
- Migrate existing data from Hive tables, Parquet files, CSV, and databases into Iceberg using snapshot, migrate, and reserialization approaches
- Implement production workflows using Write-Audit-Publish for validation, branching for testing, and rollback for recovery
- Evolve table schemas and partition specifications without downtime or rewriting data
- Execute maintenance operations including data file compaction, metadata compaction, and snapshot expiration
- Configure write strategies (merge-on-read vs copy-on-write) and distribution modes for different workload requirements
- Manage concurrent operations and avoid conflicts in multi-writer scenarios

To be successful in this course, you should have:
- Working knowledge of SQL and relational database concepts (tables, schemas, queries)
- Basic understanding of data engineering concepts including ETL/ELT, data warehouses, and data lakes
- Familiarity with command-line interfaces and Docker for running the course environment
- Comfort reading and understanding code examples in Python/PySpark (code is provided; you don't need to write from scratch)
- Experience with Apache Spark or distributed computing is helpful but not required; core concepts are explained throughout the course

Apache Iceberg, Iceberg, Apache, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Module details

Learn what Apache Iceberg is and how its metadata architecture enables powerful query optimizations. Build your first Iceberg lakehouse environment and explore how hidden partitioning and column metrics work together to skip unnecessary data during queries. Work with real NYC Taxi data to compare different partitioning strategies and measure their performance impact.

What's included

6 videos, 3 readings, 1 assignment

6 videos (total 23 minutes)

Course Introduction (3 minutes)
What does it mean that Apache Iceberg is an Open Table Format? (3 minutes)
The Open Lakehouse (5 minutes)
Modeling Data into an Apache Iceberg Table (5 minutes)
Hidden Partitioning in Apache Iceberg Tables (6 minutes)
Summary of Module 1 (2 minutes)

3 readings (total 65 minutes)

Getting Started: Setting Up Your Apache Iceberg Learning Environment (45 minutes)
[IMPORTANT] Have Questions? Join the Q+A Forum for this course (10 minutes)
Introduction to Data Modeling Exercise (10 minutes)

1 assignment (total 30 minutes)

Module 1 Quiz: Apache Iceberg Fundamentals (30 minutes)

Move existing data into Iceberg using migration strategies for Parquet, Hive, CSV, and database sources. Master Git-like features including Write-Audit-Publish for validation, branching for safe experimentation, and tagging for marking milestones. Learn how to evolve both table schemas and partition specifications without downtime or rewriting data.

What's included

5 videos, 3 readings, 1 assignment

5 videos (total 32 minutes)

Moving existing data to Iceberg (8 minutes)
Git-like features with Write-Audit-Publish and Branching and Tagging (8 minutes)
Schema Evolution for Iceberg Tables (6 minutes)
Partition Evolution for Iceberg Tables (7 minutes)
Summary of Module 2 (3 minutes)

3 readings (total 30 minutes)

Moving Existing Tables to Iceberg (10 minutes)
Safe Experimentation in Apache Iceberg Exercise (10 minutes)
Schema and Partition Evolution Exercise (10 minutes)

1 assignment (total 30 minutes)

Module 2 Quiz: Taking Advantage of Iceberg Tables (30 minutes)

Optimize write performance and manage production Iceberg tables at scale. Understand streaming versus batch ingestion patterns, merge-on-read versus copy-on-write strategies, and how to handle concurrent operations safely. Execute essential maintenance operations including compaction and snapshot expiration to keep tables performant as they grow.