When you enroll in this course, you'll also be enrolled in this Specialization.
Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate
There are 4 modules in this course
In this advanced course, you will gain practical expertise in scaling data engineering systems using current tools and techniques. This course is designed for data scientists, data engineers, and anyone with a foundational understanding of data handling who wants to advance their skills to handle larger, more complex datasets efficiently.
Throughout the course, you'll master the application of technologies such as Celery with RabbitMQ for scalable data consumption, Apache Airflow for optimized workflow management, and vector and graph databases for robust data management at scale.
The course culminates in hands-on projects that offer real-world experience, where you'll put your acquired skills to the test by solving data engineering challenges. You will learn not only to create scalable data systems but also to analyze their performance and make the adjustments needed for optimal results.
This invaluable experience in advanced data engineering techniques will prepare you for the demanding tasks of handling massive datasets, streamlining complex workflows, and optimizing data operations for businesses of any scale.
In this module, you will learn about databases and queues. You will explore the purpose and components of RabbitMQ, including its use of queues and its integration with Celery. Through hands-on exercises, you will gain experience connecting Celery to RabbitMQ within a Flask application and implementing task patterns such as fire-and-forget and result retrieval. The module also covers core MySQL skills such as working with the command-line interface, manipulating databases, and integrating with Python web apps. By the end, you will have a foundational understanding of RabbitMQ, Celery, and MySQL that allows you to start building modern, asynchronous applications backed by a database.
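The two task patterns named in this module description can be illustrated with a minimal sketch. It is not taken from the course materials and does not use Celery or RabbitMQ: Python's standard-library thread pool stands in for the worker and broker so the example runs on its own. The `add` task and the `run_patterns` helper are hypothetical names. In Celery, the rough equivalents are calling a task's `delay()` and, when the result is needed, calling `get()` on the returned result object.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    """Hypothetical task; with Celery this would be a registered task function."""
    return x + y


def run_patterns():
    # The thread pool is an assumption: a stand-in for a worker fed by a queue.
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fire and forget: submit the work and discard the returned handle.
        pool.submit(add, 1, 2)

        # Result retrieval: keep the handle and block until the value is ready.
        future = pool.submit(add, 3, 4)
        return future.result(timeout=5)


print(run_patterns())  # prints 7
```

The design difference is only in what the caller keeps: fire-and-forget suits work whose outcome the caller never inspects, while result retrieval requires somewhere to store the result until it is fetched.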
External Lab: Build a data pipeline for census data•10 minutes
Build Data Pipelines with Apache Airflow•10 minutes
Lesson Reflection•10 minutes
4 assignments•Total 120 minutes
Quiz-Installing Apache Airflow•30 minutes
Quiz-Apache Airflow Fundamentals•30 minutes
Quiz-Creating a pipeline•30 minutes
Quiz-Optimizing Workflow Management at Scale with Apache Airflow•30 minutes
Achieving Scalability with Vector, Graph, and Key/Value Databases
Module 3•5 hours to complete
Module details
In this module, we explore vector and graph databases, powerful tools for managing and extracting insights from large, complex datasets. As data volumes continue to grow, scalability is crucial. We'll learn how vector and graph databases can efficiently store data while maintaining relationships, enabling more advanced analytics. Through real-world examples, you'll see how these databases unlock scalability for machine learning, fraud detection, social networks, and more.
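As a rough illustration of the idea behind vector search, the sketch below ranks stored vectors by cosine similarity to a query vector. It is a plain-Python assumption for teaching purposes, not the API of Qdrant or any other database covered in the module. The three-dimensional embeddings and the `nearest` helper are hypothetical, and a real vector database would use an index rather than scanning every item.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, items):
    """Return the key whose vector is most similar to the query (linear scan)."""
    return max(items, key=lambda key: cosine_similarity(query, items[key]))


# Hypothetical embeddings, invented for this example.
items = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.0],
    "car": [0.0, 0.1, 0.9],
}

print(nearest([1.0, 0.0, 0.0], items))  # prints cat
```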
What's included
14 videos • 11 readings • 3 assignments • 1 ungraded lab
14 videos•Total 43 minutes
Picking the proper database•3 minutes
What are vector databases and how they work•2 minutes
Implementing Semantic search•5 minutes
Quickstart Qdrant•3 minutes
Qdrant Rust Client•3 minutes
Vector Database Architectures•2 minutes
Hands-on lab: Enhance Semantic Search•3 minutes
Graph data models and database concepts•2 minutes
Introduction to Amazon Neptune•3 minutes
Graph algorithms: UFC graph centrality in Rust•4 minutes
Kosaraju Community Detection in Graphs•4 minutes
Shortest Path with Graphs•3 minutes
Key Components of Rust CLI Tool•2 minutes
Lab Walkthrough: Building a Rust Graph CLI Tool•3 minutes
11 readings•Total 110 minutes
Key Terms•10 minutes
What is a Vector Database?•10 minutes
External Lab: Run Quickstart of Qdrant•10 minutes
External Lab: Extend Semantic Search•10 minutes
Jaccard index•10 minutes
Lesson Reflection•10 minutes
Key Terms•10 minutes
Rust CLI with Clap•10 minutes
External Lab: Rust Graph CLI Tool•10 minutes
Amazon Neptune•10 minutes
Lesson Reflection•10 minutes
3 assignments•Total 90 minutes
Quiz-Introduction to Vector Databases•30 minutes
Quiz-Introduction to Graph Databases•30 minutes
Final Quiz-Achieving Scalability with Vector, Graph, and Key/Value Databases•30 minutes
1 ungraded lab•Total 60 minutes
Social Media Recommender•60 minutes
Real-world Advanced Data Engineering Projects
Module 4•6 hours to complete
Module details
In this final module, you will work on advanced real-world data engineering projects, applying everything you've learned. You'll encounter complex data challenges and devise solutions using the latest tools and techniques. This is an opportunity to bring together data engineering concepts covered throughout the course and implement them holistically to deliver impactful outcomes.
What's included
13 videos • 10 readings • 3 assignments • 2 ungraded labs
13 videos•Total 44 minutes
Learn AWS CloudShell for Dynamo Development•4 minutes
Learn AWS CodeCatalyst for Dynamo Development•5 minutes
Leveraging AWS CodeWhisperer for Dynamo Development•4 minutes
Create a Table with CLI•1 minute
Populate a Table With Batching Records•1 minute
Query a Table with Records•2 minutes
Project Walkthrough•2 minutes
Introduction•1 minute
Overview of a pipeline requirements•4 minutes
Using SQLAlchemy with Pandas•6 minutes
Persisting data in a task•6 minutes
Reviewing the results•5 minutes
Summary•2 minutes
10 readings•Total 100 minutes
Key Terms•10 minutes
Amazon CodeCatalyst•10 minutes
External Lab: Extended DynamoDB•10 minutes
Lesson Reflection•10 minutes
Key Terms•10 minutes
Quick start for SQLAlchemy•10 minutes
Explore and analyze data with Python•10 minutes
Lesson Reflection•10 minutes
Recommended Next Steps•10 minutes
Share your learning experience•10 minutes
3 assignments•Total 90 minutes
Quiz-Building a solution with DynamoDB with the AWS CLI•30 minutes
Quiz-Persisting data through a multi-task DAG with Pandas•30 minutes
Final Quiz-Advanced Data Engineering•30 minutes
2 ungraded labs•Total 120 minutes
Jupyter Sandbox•60 minutes
VS Code Sandbox•60 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Instructors
Instructor ratings
We asked all learners to give feedback on our instructors based on the quality of their teaching style.
Duke University has about 13,000 undergraduate and graduate students and a world-class faculty helping to expand the frontiers of knowledge. The university has a strong commitment to applying knowledge in service to society, both near its North Carolina campus and around the world.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but it also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page. From there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.