This course is designed for data engineers, analytics engineers, data platform engineers, and data architects who work with data lakes and want to modernize their data infrastructure. It's also valuable for software engineers transitioning into data roles and for technical leads evaluating Apache Iceberg for their data platforms.
By the end of this course, you will be able to:
- Build and configure an Apache Iceberg lakehouse using catalogs, object storage, and query engines like Spark and Trino
- Design optimal table structures using hidden partitioning, sort orders, and column metrics to maximize query performance
- Migrate existing data from Hive tables, Parquet files, CSV, and databases into Iceberg using snapshot, migrate, and reserialization approaches
- Implement production workflows using Write-Audit-Publish for validation, branching for testing, and rollback for recovery
- Evolve table schemas and partition specifications without downtime or rewriting data
- Execute maintenance operations including data file compaction, metadata compaction, and snapshot expiration
- Configure write strategies (merge-on-read vs copy-on-write) and distribution modes for different workload requirements
- Manage concurrent operations and avoid conflicts in multi-writer scenarios
To be successful in this course, you should have:
- Working knowledge of SQL and relational database concepts (tables, schemas, queries)
- Basic understanding of data engineering concepts including ETL/ELT, data warehouses, and data lakes
- Familiarity with command-line interfaces and Docker for running the course environment
- Comfort reading and understanding code examples in Python/PySpark (code is provided; you don't need to write it from scratch)
- Experience with Apache Spark or distributed computing is helpful but not required; core concepts are explained throughout the course
Apache Iceberg, Iceberg, Apache, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Learn what Apache Iceberg is and how its metadata architecture enables powerful query optimizations. Build your first Iceberg lakehouse environment and explore how hidden partitioning and column metrics work together to skip unnecessary data during queries. Work with real NYC Taxi data to compare different partitioning strategies and measure their performance impact.
Includes: 6 videos, 3 readings, 1 assignment
6 videos•Total 23 minutes
Course Introduction•3 minutes
What does it mean that Apache Iceberg is an Open Table Format? •3 minutes
The Open Lakehouse•5 minutes
Modeling Data into an Apache Iceberg Table•5 minutes
Hidden Partitioning in Apache Iceberg Tables•6 minutes
Summary of Module 1•2 minutes
3 readings•Total 65 minutes
Getting Started: Setting Up Your Apache Iceberg Learning Environment•45 minutes
[IMPORTANT] Have Questions? Join the Q+A Forum for this course•10 minutes
Move existing data into Iceberg using migration strategies for Parquet, Hive, CSV, and database sources. Master Git-like features including Write-Audit-Publish for validation, branching for safe experimentation, and tagging for marking milestones. Learn how to evolve both table schemas and partition specifications without downtime or rewriting data.
Includes: 5 videos, 3 readings, 1 assignment
5 videos•Total 32 minutes
Moving existing data to Iceberg•8 minutes
Git-like features with Write-Audit-Publish and Branching and Tagging•8 minutes
Schema Evolution for Iceberg Tables•6 minutes
Partition Evolution for Iceberg Tables•7 minutes
Summary of Module 2•3 minutes
3 readings•Total 30 minutes
Moving Existing Tables to Iceberg•10 minutes
Safe Experimentation in Apache Iceberg Exercise•10 minutes
Schema and Partition Evolution Exercise•10 minutes
1 assignment•Total 30 minutes
Module 2 Quiz: Taking Advantage of Iceberg Tables•30 minutes
Operating and Optimizing Apache Iceberg
Module 3•2 hours to complete
Optimize write performance and manage production Iceberg tables at scale. Understand streaming versus batch ingestion patterns, merge-on-read versus copy-on-write strategies, and how to handle concurrent operations safely. Execute essential maintenance operations including compaction and snapshot expiration to keep tables performant as they grow.
Includes: 8 videos, 6 readings, 1 assignment
8 videos•Total 45 minutes
Ingesting Data into Apache Iceberg•8 minutes
Copy on Write and Merge on Read•4 minutes
Handling Concurrency in Apache Iceberg•3 minutes
Table Maintenance for Iceberg - The Basics•5 minutes
Table Maintenance for Iceberg - Compaction and Abandoned File Cleanup•7 minutes
Writing efficiently to Iceberg Tables•6 minutes
Sort Orders•10 minutes
Summary of Module 3•3 minutes
6 readings•Total 60 minutes
Ingestion Exercise•10 minutes
Maintenance Exercise•10 minutes
Advanced Modeling and Ingestion Exercise•10 minutes
Additional Resources•10 minutes
Apache Iceberg Course Glossary•10 minutes
Course Acknowledgements•10 minutes
1 assignment•Total 30 minutes
Module 3 Quiz: Operating and Optimizing Apache Iceberg•30 minutes
A single, global platform that powers the Data Cloud. Snowflake is uniquely designed to connect businesses globally, across any type or scale of data and many different workloads, and unlock seamless data collaboration.
An Apache Iceberg lakehouse is a storage architecture that brings database-like reliability to data lakes. By using Snowflake's platform, organizations can manage Iceberg tables with high performance, near-instant elasticity, and universal governance.
Does this course cover migrating data to Snowflake?
Yes. You will learn to migrate existing data from Parquet, CSV, and legacy databases into Iceberg tables that integrate with the Snowflake AI Data Cloud.
When will I have access to the lectures and assignments?
To access the course materials and assignments, and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but it also means that you will not be able to purchase a Certificate experience.
What will I get if I purchase the Certificate?
When you purchase a Certificate, you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page; from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.