When you enroll in this course, you are also enrolled in this Specialization.
Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate
There are 3 modules in this course
“Design Real-Time Architectures with Apache Spark & Kafka” is an intermediate-level course crafted for learners aiming to build modern, scalable streaming systems. Across engaging, scenario-driven lessons, the course offers a comprehensive introduction to designing and implementing real-time data pipelines. Participants explore the foundations of streaming concepts, event-driven patterns, and the unique demands of low-latency processing. They gain practical experience working with Apache Kafka for event ingestion and Apache Spark Structured Streaming for real-time computation, learning to transform raw streams into actionable insights. The curriculum emphasizes reliable pipeline design, covering fault tolerance, checkpointing, and performance tuning to ensure systems can operate at scale. Through hands-on practice, guided dialogues, and real-world financial data scenarios, learners develop the confidence to architect, optimize, and deploy production-ready streaming solutions. By the end of the course, they are equipped with the technical and strategic skills needed to excel in today’s data-driven, real-time environments.
Learners should know basic Python or Scala, be comfortable with the command line, understand distributed systems at a high level, and have introductory familiarity with Kafka and Spark.
This course is ideal for aspiring data engineers, analysts or data scientists shifting into real-time systems, and software engineers exploring event-driven architecture. It also suits anyone working with large-scale data or financial and AI/ML pipelines who wants to understand how real-time data powers modern systems.
This module introduces the core principles behind real-time data systems and how they differ from traditional batch processing. Learners explore key patterns such as event-driven design, streaming workflows, and the roles Kafka and Spark play in a modern data ecosystem. By the end, learners understand the foundational components required to build low-latency, scalable streaming architectures.
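To make the event-driven pattern concrete, here is a minimal pure-Python sketch of the idea behind Kafka's role in the ecosystem: producers append events to an ordered log, and each consumer tracks its own read offset, so several downstream systems can read the same events independently. `EventLog` and `Consumer` are hypothetical classes for illustration, not Kafka's client API.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical in-memory stand-in for one topic partition:
    an append-only list of events addressed by offset."""
    events: list = field(default_factory=list)

    def append(self, event):
        # Producers only ever append; existing events are never modified.
        self.events.append(event)
        return len(self.events) - 1  # offset assigned to the new event

    def read_from(self, offset, max_events=10):
        # Consumers read a slice starting at their own position.
        return self.events[offset:offset + max_events]


class Consumer:
    """Each consumer keeps its own offset into the shared log."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.offset = 0

    def poll(self, max_events=10):
        batch = self.log.read_from(self.offset, max_events)
        self.offset += len(batch)  # advance only past what was actually read
        return batch


log = EventLog()
for amount in (120, 75, 3000):
    log.append({"type": "payment", "amount": amount})

fraud_checker = Consumer("fraud-checker", log)
dashboard = Consumer("dashboard", log)

print([e["amount"] for e in fraud_checker.poll(2)])  # [120, 75]
print([e["amount"] for e in dashboard.poll()])       # [120, 75, 3000]
print(fraud_checker.offset, dashboard.offset)        # 2 3
```

Because reading does not remove events, a slow consumer (here the fraud checker) never holds back a fast one, which is the decoupling that event-driven designs rely on.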
Includes
4 videos, 2 readings, 1 peer review
4 videos • Total 18 minutes
Welcome to the Real-Time Architectures with Apache Spark & Kafka • 2 minutes
Streaming Data vs. Stream Processing vs. Real-Time Analytics • 5 minutes
1 peer review • Total 20 minutes
Hands-On-Learning: Mapping a Real-Time Architecture for Live Transaction Monitoring • 20 minutes
Building Real-Time Pipelines with Kafka & Spark
Module 2 • 1 hour to complete
Module details
In this module, learners dive into the practical construction of streaming pipelines using Kafka and Spark Structured Streaming. They design Kafka topics, configure producers and consumers, and connect Spark to process incoming data streams. The module emphasizes transformations, windowing, and stateful operations essential for building functional real-world pipelines.
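Windowing and stateful aggregation are where streaming logic differs most from batch code. In PySpark this is typically expressed with `withWatermark(...)` and `groupBy(window(...))`; the pure-Python sketch below only approximates those semantics so the mechanics are visible. It assumes 60-second tumbling windows, a 30-second watermark, and `(account, event_time_seconds)` tuples, and its function names are hypothetical. Unlike Spark, which advances the watermark once per micro-batch and makes only loose guarantees about exactly when late rows are dropped, this sketch advances the watermark after every event.

```python
from collections import defaultdict

WINDOW_SECONDS = 60     # tumbling window size (assumed for illustration)
WATERMARK_SECONDS = 30  # maximum tolerated lateness (assumed for illustration)


def window_start(ts):
    # Align an event-time timestamp (seconds) to the start of its tumbling window.
    return ts - ts % WINDOW_SECONDS


def windowed_counts(events):
    """Count events per (account, window start), yielding each window once the
    watermark has passed its end. Simplified: the watermark advances per event."""
    counts = defaultdict(int)  # state kept between events
    max_ts = 0
    for account, ts in events:
        max_ts = max(max_ts, ts)
        watermark = max_ts - WATERMARK_SECONDS
        if ts < watermark:
            continue  # late event behind the watermark: dropped in this sketch
        counts[(account, window_start(ts))] += 1
        # Finalize (emit and evict) every window that ends at or before the watermark.
        for key in sorted(k for k in counts if k[1] + WINDOW_SECONDS <= watermark):
            yield key, counts.pop(key)
    # The demo stream is finite, so flush whatever state is left at the end.
    for key in sorted(counts):
        yield key, counts[key]


# (account, event_time_seconds) in arrival order; ("A", 10) arrives late.
events = [("A", 5), ("A", 20), ("B", 50), ("A", 70), ("A", 130), ("A", 10), ("B", 150)]
results = list(windowed_counts(events))
print(results)
# [(('A', 0), 2), (('B', 0), 1), (('A', 60), 1), (('A', 120), 1), (('B', 120), 1)]
```

The late `("A", 10)` event is discarded because the watermark has already reached 100 seconds when it arrives. That is the trade-off a watermark makes: state stays bounded because finalized windows are evicted, at the cost of ignoring sufficiently late data.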
Includes
3 videos, 1 reading, 1 peer review
This module focuses on preparing real-time systems for production environments. Learners explore fault tolerance, scalability strategies, and performance tuning for Kafka and Spark. They also learn how to monitor streaming workloads, implement checkpoints, and ensure reliability. The module concludes with best practices for deploying and maintaining robust, enterprise-ready real-time architectures.
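In Spark Structured Streaming, recovery is driven by the `checkpointLocation` option set on a streaming query, where Spark records its progress and state. The sketch below imitates only the core idea in plain Python: after each micro-batch, commit the next offset to read, and on restart resume from the committed offset. The JSON file layout, `run_batches`, and the `crash_after` flag are hypothetical and exist only to simulate a failure.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    # Resume from the last committed offset, or start at 0 on the first run.
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_checkpoint(path, next_offset):
    # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp, path)


def run_batches(source, path, sink, batch_size=2, crash_after=None):
    """Process `source` in micro-batches, committing progress after each one.
    `crash_after` simulates a failure after that many batches."""
    offset = load_checkpoint(path)
    batches_done = 0
    while offset < len(source):
        batch = source[offset:offset + batch_size]
        sink.extend(x * 2 for x in batch)  # the "transformation"
        offset += len(batch)
        save_checkpoint(path, offset)
        batches_done += 1
        if crash_after is not None and batches_done == crash_after:
            raise RuntimeError("simulated crash")


source = [1, 2, 3, 4, 5]
sink = []
with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "offsets.json")
    try:
        run_batches(source, ckpt, sink, crash_after=1)
    except RuntimeError:
        pass  # the "process" dies after its first batch
    run_batches(source, ckpt, sink)  # restart resumes from the checkpoint

print(sink)  # [2, 4, 6, 8, 10]
```

Because progress is committed after the sink write, a crash between the two steps would replay that batch. That gives at-least-once delivery; end-to-end exactly-once additionally requires a sink that tolerates replays, for example through idempotent writes.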
Includes
4 videos, 1 reading, 1 assignment, 2 peer reviews
4 videos • Total 21 minutes
Ensuring Reliability with Checkpointing & Fault Tolerance • 5 minutes
Performance Tuning Kafka & Spark for Real-Time Workloads • 5 minutes
10× Pipeline Performance: Kafka and Spark Tuning in Practice • 5 minutes
1 assignment • Total 20 minutes
Design Real-Time Architectures with Spark & Kafka • 20 minutes
2 peer reviews • Total 80 minutes
Hands-On-Learning: Optimizing and Monitoring a Production-Ready Streaming System • 20 minutes
Project: Real-Time Streaming Alert System for Money-Laundering Detection • 60 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Coursera brings together a diverse network of subject matter experts who have demonstrated their expertise through professional industry experience or strong academic backgrounds. These instructors design and teach courses that make practical, career-relevant skills accessible to learners worldwide.
Why people choose Coursera for their career
Felipe M.
Learner since 2018
“Being able to take courses at my own pace has been an extraordinary experience. I can learn whenever my schedule allows and according to my mood.”
Jennifer J.
Learner since 2020
“I directly applied the concepts and skills I learned from my courses to an exciting new project at work.”
Larry W.
Learner since 2021
“When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go.”
Chaitanya A.
“Learning isn't just about getting better at your job: it's so much more than that. Coursera allows me to learn without limits.”
What is a real-time streaming pipeline in this course?
In this course, a real-time streaming pipeline is a connected flow that ingests events as they arrive, processes them continuously, and produces updated outputs without waiting for a scheduled batch run. The emphasis is on designing that flow so it stays low-latency, scalable, and reliable as data keeps moving.
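The ingest, process, and output flow described above can be sketched with chained Python generators: each stage handles one event at a time, and the output is refreshed after every event rather than after a scheduled run. This is a hypothetical single-process illustration; the stage names and the 400 threshold are assumptions, and in the course these roles are played by Kafka and Spark Structured Streaming.

```python
def ingest(amounts):
    # Stand-in for a live source such as a Kafka topic: yields one event at a time.
    for i, amount in enumerate(amounts):
        yield {"id": i, "amount": amount}


def flag_large(events, threshold=400):
    # Continuous transformation: enrich each event as soon as it arrives.
    for event in events:
        yield {**event, "flagged": event["amount"] > threshold}


def running_summary(events):
    # Output stage: an always-current summary instead of a scheduled batch result.
    total = flagged = 0
    for event in events:
        total += 1
        flagged += int(event["flagged"])
        yield total, flagged


pipeline = running_summary(flag_large(ingest([120, 450, 80, 900, 30])))
for total, flagged in pipeline:
    print(f"seen={total} flagged={flagged}")
```

Because generators are lazy, each event flows through all three stages before the next one is read, which mirrors continuous processing on a small scale.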
When would you use this kind of real-time pipeline?
You would use this kind of pipeline when the value of the data depends on handling it as it happens rather than much later. The course frames it for ongoing event streams where timely processing, continuous analysis, and immediate outputs matter.
How does a streaming pipeline fit into a broader workflow?
A streaming pipeline sits between event sources and the systems that use processed results, turning raw event flow into structured, ongoing outputs. In the course, it is treated as the repeatable middle layer that connects ingestion, transformation, and operational monitoring.
How is a streaming pipeline different from batch processing?
A streaming pipeline works on events continuously, while batch processing collects data first and runs later on a schedule. The course uses this contrast to show why streaming is better suited to low-latency work, but also why it requires added attention to state, late data, and fault tolerance.
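A small contrast makes the difference concrete: a batch job computes an average once over data it has already collected, while a streaming job keeps a little state (a count and a running mean) and refreshes the answer on every event. The names below are illustrative and not part of any library.

```python
def batch_mean(values):
    # Batch: wait until all data has been collected, then compute once.
    return sum(values) / len(values)


class StreamingMean:
    """Streaming: keep a small piece of state and update it per event."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count  # incremental mean update
        return self.mean


values = [10.0, 20.0, 30.0, 40.0]
stream = StreamingMean()
snapshots = [stream.update(v) for v in values]

print(snapshots)           # [10.0, 15.0, 20.0, 25.0]
print(batch_mean(values))  # 25.0
```

Both approaches end at the same value, but only the streaming version has an answer after each event, and that answer lives in state (`count`, `mean`) that must survive failures, which is where checkpointing comes in.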
Do you need any prerequisites before learning to build streaming pipelines?
A basic background in Python or Scala, comfort with the command line, and a high-level understanding of distributed systems are helpful before you start. The course also assumes simple introductory familiarity with Kafka and Spark rather than deep experience building streaming systems.
What tools, platforms, or methods are used in this course?
The course centers on Apache Kafka for event ingestion and Apache Spark Structured Streaming for continuous processing. It also introduces event-driven design and reliability practices such as checkpointing and monitoring.
What specific tasks will you practice or complete in this course?
You practice designing Kafka topics and event flow, connecting live streams to Spark, and applying transformations, windowing, and stateful processing to incoming data. You also work on checkpointing, monitoring, and tuning so the pipeline can run reliably in real time.
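Much of topic design comes down to choosing a record key: Kafka assigns keyed records to partitions by hashing the key, so all events for one key land in the same partition and keep their relative order. The sketch below mimics that with a hypothetical `partition_for` helper and an assumed three-partition topic. It uses `zlib.crc32` as a stand-in hash; real Kafka clients use their own hash function, so actual partition numbers will differ.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for the illustrative topic


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Deterministic hash of the key, so the same key always maps to the same partition.
    # zlib.crc32 is a stand-in for the client's real partitioner hash.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# (account key, amount) in the order a producer would send them.
events = [("acct-1", 50), ("acct-2", 900), ("acct-1", 75), ("acct-3", 20), ("acct-2", 10)]

partitions = defaultdict(list)
for key, amount in events:
    partitions[partition_for(key)].append((key, amount))

for p in sorted(partitions):
    print(p, partitions[p])
```

A deterministic hash matters here: Python's built-in `hash()` on strings is randomized per process, so it would send the same account to different partitions across producer restarts and break per-key ordering.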