When will I have access to the lectures and assignments?

To access the course materials and assignments and to earn a Certificate, you need to purchase the Certificate experience when you enroll in a course. Alternatively, you can try a Free Trial or apply for Financial Aid. Some courses instead offer 'Full Course, No Certificate', which lets you see all course materials, submit required assessments, and get a final grade, but does not let you purchase a Certificate experience.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, where you can print it or add it to your LinkedIn profile.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for the learning program you selected, you'll find a link to apply on its description page.

Advanced Data Engineering

This course is part of the Large Language Model Operations (LLMOps) Specialization

Instructors: Noah Gift

5,631 already enrolled

4 modules

Gain insight into a topic and learn the fundamentals.

22 reviews

Intermediate level

2 weeks to complete at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Create and manage data pipelines and their lifecycle
Connect and work with message queues to manage data processing
Use vector, graph, and key/value databases for data storage at scale

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

14 assignments

Taught in English

Build your subject-matter expertise

This course is part of the Large Language Model Operations (LLMOps) Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

In this advanced course, you will gain practical expertise in scaling data engineering systems using cutting-edge tools and techniques. The course is designed for data scientists, data engineers, and anyone with a foundational understanding of data handling who wants to build the skills to handle larger, more complex datasets efficiently.

Throughout the course, you'll apply technologies such as Celery with RabbitMQ for scalable data consumption, Apache Airflow for optimized workflow management, and vector and graph databases for managing data at scale. The course culminates in hands-on projects that offer real-world experience, where you'll put your skills to the test on data engineering challenges. You will learn not only to build scalable data systems but also to analyze their performance and tune them for better results. This experience with advanced data engineering techniques will prepare you to handle massive datasets, streamline complex workflows, and optimize data operations for businesses of any scale.
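In Apache Airflow, a workflow is a directed acyclic graph (DAG) of tasks, and by default a task runs once its upstream tasks have succeeded. The sketch below is not Airflow code: it uses Python's standard-library `graphlib.TopologicalSorter` to show how such a dependency graph yields a valid run order and how a cycle is rejected. The task names (`retrieve_remote_data`, `clean_data`, and so on) and the `execution_order` helper are hypothetical, loosely echoing the course's pipeline lessons.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of upstream tasks
# that must finish before it can start.
PIPELINE: dict[str, set[str]] = {
    "retrieve_remote_data": set(),
    "clean_data": {"retrieve_remote_data"},
    "normalize_data": {"clean_data"},
    "load_results": {"normalize_data"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises CycleError if the graph is not acyclic."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    try:
        execution_order({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        # The second argument lists the nodes that form the cycle.
        print("not a DAG:", err.args[1])
```

Because the graph must be acyclic, a scheduler can always find such an order; a cycle means no task in it could ever start, which is why it is rejected up front.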

Module details

In this module, you will learn about databases and queues. You will learn the purpose and components of RabbitMQ, including its use of queues and its integration with Celery. Through hands-on exercises, you will gain experience connecting Celery to RabbitMQ within a Flask application and implementing task patterns such as fire-and-forget and result retrieval. The module also covers core MySQL skills, such as working from the command-line interface, manipulating databases, and integrating MySQL with Python web apps. By the end, you will have a foundational understanding of RabbitMQ, Celery, and MySQL that lets you start building modern, asynchronous applications backed by a database.
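The two task patterns differ only in whether the caller keeps a handle on the result. In Celery, calling a task's `.delay()` method sends a message to the broker (RabbitMQ in this course) and returns an `AsyncResult`; fire-and-forget code ignores that object, while result retrieval calls `.get()` on it later, which requires a result backend to be configured. To keep the example runnable without a broker, this minimal sketch swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for the worker pool; `record_event`, `add`, and `handle_request` are hypothetical names.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Side effects of fire-and-forget tasks land here (list.append is thread-safe in CPython).
audit_log: list[str] = []


def record_event(event: str) -> None:
    """Fire-and-forget task: nobody waits for it or reads a return value."""
    time.sleep(0.05)  # simulate slow I/O, e.g. writing to an external log
    audit_log.append(event)


def add(x: int, y: int) -> int:
    """Task whose return value the caller retrieves later."""
    time.sleep(0.05)  # simulate slow work
    return x + y


def handle_request(pool: ThreadPoolExecutor, x: int, y: int) -> Future:
    """Hypothetical request handler that queues both kinds of task and returns at once."""
    # Fire and forget: submit the task and discard the returned Future.
    pool.submit(record_event, f"add({x}, {y}) requested")
    # Result retrieval: keep the Future so the caller can fetch the value later.
    return pool.submit(add, x, y)


if __name__ == "__main__":
    # The pool stands in for Celery workers consuming from a RabbitMQ queue.
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = handle_request(pool, 2, 3)
        print("result:", pending.result(timeout=5))  # blocks until the sum is ready
    # Leaving the with-block waits for all submitted tasks, including the fire-and-forget one.
    print("audit log:", audit_log)
```

`handle_request` returns immediately in both cases; only code that needs the value pays the cost of waiting, which is the point of pushing work onto a queue in a web app.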

What's included

22 videos, 15 readings, 4 assignments, 1 discussion prompt, 1 ungraded lab

22 videos (79 minutes total)

Meet your instructor: Alfredo Deza (2 minutes)
About this course (3 minutes)
Introduction (1 minute)
Overview of Queues (6 minutes)
What is Celery? (3 minutes)
Use cases for RabbitMQ (3 minutes)
Overview of a Flask and Celery application (4 minutes)
Summary (2 minutes)
Introduction (1 minute)
Configuring Celery with Flask (5 minutes)
Connecting Celery with RabbitMQ (6 minutes)
Defining a Celery task in Flask (3 minutes)
Fire and forget task in Flask (3 minutes)
Retrieve values from asynchronous tasks (4 minutes)
Summary (2 minutes)
MySQL Overview (3 minutes)
MySQL from Terminal (3 minutes)
Archive and Drop Database (5 minutes)
Import external database Sakila (7 minutes)
Modify database Sakila (5 minutes)
Bash pipelines with MySQL (5 minutes)
MySQL to Python Standard Library Web Server (4 minutes)

15 readings (145 minutes total)

Connect with your instructor (10 minutes)
Meet your instructor: Noah Gift (10 minutes)
Course structure and discussion etiquette (10 minutes)
Report a Problem with the Course (5 minutes)
Key Terms (10 minutes)
Introduction to Celery (10 minutes)
Using RabbitMQ with Docker (10 minutes)
External lab: Start RabbitMQ in a development environment (10 minutes)
Key Terms (10 minutes)
Build a web app by using Python and Flask (10 minutes)
Background tasks with Celery (10 minutes)
External lab: Add a new Celery task for RabbitMQ (10 minutes)
Key Terms (10 minutes)
Getting Started with MySQL (10 minutes)
Lesson Reflection (10 minutes)

4 assignments (120 minutes total)

Queues and Databases - Final week quiz (30 minutes)
Introduction to RabbitMQ and Flask (30 minutes)
RabbitMQ with Celery and Flask (30 minutes)
Quiz-MySQL for Data Engineering (30 minutes)

1 discussion prompt (10 minutes total)

Meet and greet (optional; 10 minutes)

1 ungraded lab (60 minutes total)

Linux Hacking with MySQL (60 minutes)
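The module's last lessons connect MySQL to a Python standard-library web server. MySQL drivers are not part of the standard library, so this sketch substitutes an in-memory `sqlite3` database to stay self-contained; the `film` table, its columns, and the sample rows are made up, and `make_demo_db`, `films_as_json`, `make_handler`, and `serve` are hypothetical helpers. The pattern (parameterized query, rows serialized to JSON, JSON returned by an `http.server` handler) would look much the same with a MySQL connection, apart from the driver's placeholder style (`%s` in common MySQL drivers versus `?` in `sqlite3`).

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_demo_db() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the course's MySQL database; rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, rental_rate REAL)"
    )
    conn.executemany(
        "INSERT INTO film (title, rental_rate) VALUES (?, ?)",
        [("EXAMPLE ONE", 0.99), ("EXAMPLE TWO", 2.99), ("EXAMPLE THREE", 4.99)],
    )
    conn.commit()
    return conn


def films_as_json(conn: sqlite3.Connection, max_rate: float) -> str:
    """Run a parameterized query and serialize the matching rows as JSON."""
    rows = conn.execute(
        "SELECT film_id, title, rental_rate FROM film "
        "WHERE rental_rate <= ? ORDER BY film_id",
        (max_rate,),  # placeholder, never string formatting, to avoid SQL injection
    ).fetchall()
    return json.dumps([{"id": r[0], "title": r[1], "rate": r[2]} for r in rows])


def make_handler(conn: sqlite3.Connection) -> type[BaseHTTPRequestHandler]:
    """Build a request handler class bound to one database connection."""

    class FilmHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            body = films_as_json(conn, max_rate=3.0).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return FilmHandler


def serve(port: int = 8000) -> None:
    """Start a blocking server on localhost; stop it with Ctrl+C."""
    HTTPServer(("127.0.0.1", port), make_handler(make_demo_db())).serve_forever()


if __name__ == "__main__":
    print(films_as_json(make_demo_db(), max_rate=3.0))
```

Calling `serve()` and requesting http://127.0.0.1:8000/ would return the sample films priced at 3.0 or less as JSON; keeping the query in its own function makes it testable without starting the server.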

What's included

17 videos, 13 readings, 4 assignments

17 videos (79 minutes total)

Introduction (1 minute)
What is Apache Airflow? (7 minutes)
Installing Apache Airflow from PyPI (5 minutes)
Using Apache Airflow with Docker (6 minutes)
Exploring the Airflow UI (7 minutes)
Introduction (1 minute)
Exploring directed acyclic graphs (DAG) (11 minutes)
Creating a DAG (8 minutes)
Running a backfill (5 minutes)
Testing and validation (7 minutes)
Summary (1 minute)
Introduction (1 minute)
Identifying a task to build a DAG (5 minutes)
Retrieving remote data (5 minutes)
Cleaning and normalizing data (4 minutes)
Inspecting the UI for results (5 minutes)
Summary (1 minute)

13 readings (130 minutes total)

Key Terms (10 minutes)
What is Apache Airflow (10 minutes)
Exploring the Airflow User Interface (10 minutes)
External lab: Install Apache Airflow (10 minutes)
Lesson Reflection (10 minutes)
Key Terms (10 minutes)
External lab: Create a DAG (10 minutes)
Architecture overview (10 minutes)
Lesson Reflection (10 minutes)
Key Terms (10 minutes)
External Lab: Build a data pipeline for census data (10 minutes)
Build Data Pipelines with Apache Airflow (10 minutes)
Lesson Reflection (10 minutes)

4 assignments (120 minutes total)

Quiz-Optimizing Workflow Management at Scale with Apache Airflow (30 minutes)
Quiz-Installing Apache Airflow (30 minutes)
Quiz-Apache Airflow Fundamentals (30 minutes)
Quiz-Creating a pipeline (30 minutes)

In this module, you will explore vector and graph databases, powerful tools for managing and extracting insights from large, complex datasets. As data volumes continue to grow, scalability is crucial. You'll learn how vector and graph databases store data efficiently while preserving the relationships within it, enabling more advanced analytics. Through real-world examples, you'll see how these databases support scalable applications in machine learning, fraud detection, social networks, and more.
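At its core, the semantic search covered in this module is a nearest-neighbor lookup: documents and queries are turned into embedding vectors, and the stored vectors most similar to the query are returned. A vector database such as Qdrant adds indexing, filtering, and persistence on top of that step. This minimal sketch shows only the core ranking, a brute-force cosine-similarity scan over toy 3-dimensional vectors that are made up for illustration; real embeddings come from a model and have many more dimensions, and production systems typically use approximate indexes (Qdrant uses HNSW) rather than scanning every point. The names `cosine_similarity`, `top_k`, and `POINTS` are hypothetical.

```python
import math

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, points: dict[str, Vector], k: int = 2) -> list[tuple[str, float]]:
    """Exact (brute-force) search: score every stored point and keep the k best."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in points.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy "embeddings", made up for illustration; real ones come from an embedding model.
POINTS: dict[str, Vector] = {
    "fraud alert": [0.9, 0.1, 0.0],
    "suspicious payment": [0.8, 0.2, 0.1],
    "friend request": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for pid, score in top_k([1.0, 0.0, 0.0], POINTS, k=2):
        print(f"{pid}: {score:.3f}")
```

The brute-force scan costs one similarity computation per stored point per query, which is exactly the cost an approximate index is designed to avoid at scale.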

What's included

14 videos, 11 readings, 3 assignments, 1 ungraded lab

14 videos (43 minutes total)

Picking the proper database (3 minutes)
What are vector databases and how they work (2 minutes)
Implementing Semantic search (5 minutes)
Quickstart Qdrant (3 minutes)
Qdrant Rust Client (3 minutes)
Vector Database Architectures (2 minutes)
Hands-on lab: Enhance Semantic Search (3 minutes)
Graph data models and database concepts (2 minutes)
Introduction to Amazon Neptune (3 minutes)
Graph algorithms: UFC graph centrality in Rust (4 minutes)
Kosaraju Community Detection in Graphs (4 minutes)
Shortest Path with Graphs (3 minutes)
Key Components of Rust CLI Tool (2 minutes)
Lab Walkthrough: Building a Rust Graph CLI Tool (3 minutes)

11 readings (110 minutes total)

Key Terms (10 minutes)
What is a Vector Database? (10 minutes)
External Lab: Run Quickstart of Qdrant (10 minutes)
External Lab: Extend Semantic Search (10 minutes)
Jaccard index (10 minutes)
Lesson Reflection (10 minutes)
Key Terms (10 minutes)
Rust CLI with Clap (10 minutes)
External Lab: Rust Graph CLI Tool (10 minutes)
Amazon Neptune (10 minutes)
Lesson Reflection (10 minutes)

3 assignments (90 minutes total)

Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases (30 minutes)
Quiz-Introduction to Vector Databases (30 minutes)
Quiz-Introduction to Graph Databases (30 minutes)

1 ungraded lab (60 minutes total)

Social Media Recommender (60 minutes)
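Two of this module's graph topics, shortest paths and the Jaccard index, are small enough to sketch directly. The course's graph lessons use Rust; this is a Python sketch instead, built on a hypothetical friendship graph stored as adjacency sets, with hypothetical helper names `shortest_path` and `jaccard`. Breadth-first search finds a fewest-hop path in an unweighted graph, and the Jaccard index |A ∩ B| / |A ∪ B| of two users' neighbor sets is one common way to score how similar their connections are, for example as a signal in a recommender.

```python
from collections import deque
from typing import Optional

Graph = dict[str, set[str]]


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: a fewest-hop path in an unweighted graph, or None."""
    previous: dict[str, Optional[str]] = {start: None}  # node -> node we reached it from
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path: list[str] = []
            step: Optional[str] = node
            while step is not None:  # walk back to the start
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbor in sorted(graph.get(node, set())):  # sorted for deterministic ties
            if neighbor not in previous:
                previous[neighbor] = node
                frontier.append(neighbor)
    return None


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; treated here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical undirected friendship graph stored as adjacency sets.
FRIENDS: Graph = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "dee"},
    "dee": {"ben", "cy", "eve"},
    "eve": {"dee"},
}

if __name__ == "__main__":
    print(shortest_path(FRIENDS, "ana", "eve"))
    print(jaccard(FRIENDS["ana"], FRIENDS["dee"]))
```

BFS visits nodes in order of hop count, so the first time it reaches the goal it has found a shortest path; weighted graphs would need a different algorithm such as Dijkstra's.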

In this final module, you will work on advanced real-world data engineering projects, applying everything you've learned. You'll encounter complex data challenges and devise solutions using the latest tools and techniques. This is an opportunity to bring together data engineering concepts covered throughout the course and implement them holistically to deliver impactful outcomes.