In this hands-on, 1-hour project-based course, you will learn real-time data processing with Apache Spark Structured Streaming. The course is designed for data engineers and developers who want practical experience building streaming data pipelines. You will begin by setting up the Spark environment and learning how to configure micro-batch triggers and fault tolerance through checkpointing. Next, you'll transform streaming data by applying filters, maps, and aggregations to extract meaningful insights. You'll also handle out-of-order data with watermarks, so that late-arriving events do not distort your real-time analytics. The course then introduces querying streaming data with SQL, so you can run transformations and aggregations on live data. Finally, you will learn to deploy your streaming pipeline to production by writing results to an external sink such as Parquet files. This is an intermediate-level project. To succeed in this course, you should have a basic understanding of Apache Spark and the PySpark API, proficiency in programming, familiarity with big data concepts, and some basic knowledge of writing SQL queries. This is a good opportunity for anyone looking to dive into real-time data processing and Spark Structured Streaming!



Real-time analytics with Spark: User Activity Monitoring

Instructor: Imen Kerkeni
What you'll learn
- Set up and configure a real-time data processing pipeline 
- Perform transformations, aggregations, and SQL queries on streaming data 
- Implement fault-tolerance mechanisms and ensure the pipeline remains resilient under high workloads and data inconsistencies 

About this Guided Project
Learn step-by-step
In a video that plays in a split-screen with your work area, your instructor will walk you through these steps:
- Task 1: Setting Up the Environment for Real-Time Data Streaming 
- Task 2: Managing Triggers and Checkpoints 
- Task 3: Transforming Streaming Data 
- Practice Activity 
- Task 4: Performing Transformations, Aggregations, and Advanced SQL Queries 
- Task 5: Writing and Deploying the Pipeline 
- Cumulative Challenge 
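
The description mentions handling out-of-order data with watermarks. As a rough illustration of that idea, here is a minimal plain-Python sketch, not Spark's implementation and not the course's code. It assumes a watermark defined as the largest event time seen so far minus an allowed delay, and it checks each event individually, whereas Structured Streaming advances the watermark between micro-batches. The function name `windowed_counts` and its parameters are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def windowed_counts(events, window_s=60, watermark_delay_s=120):
    """Count events per tumbling window, dropping events behind the watermark.

    events: iterable of (event_time_seconds, user_id) tuples in ARRIVAL order.
    Returns (counts_by_window_start, dropped_events).
    Simplified illustration only; names and defaults are assumptions.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None  # largest event time observed so far

    for event_time, user_id in events:
        # Watermark = max event time seen so far minus the allowed delay.
        if max_event_time is not None:
            watermark = max_event_time - watermark_delay_s
            if event_time < watermark:
                # Too late: the event is older than the watermark, so skip it.
                dropped.append((event_time, user_id))
                continue

        # Assign the event to its tumbling window by event time.
        window_start = (event_time // window_s) * window_s
        counts[window_start] += 1

        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time

    return dict(counts), dropped


# Hypothetical user-activity events: (event time in seconds, user id).
sample_events = [(0, "a"), (65, "b"), (30, "c"), (200, "d"), (70, "e"), (10, "f")]
counts, dropped = windowed_counts(sample_events)
print(counts)
print(dropped)
```

In this sketch the event at time 30 arrives out of order but is still counted, because it is within the allowed delay. The events at times 70 and 10 arrive after an event at time 200 has pushed the watermark to 80, so they are dropped. In PySpark the corresponding DataFrame method is `withWatermark`, which takes an event-time column name and a delay threshold.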
Recommended experience
Experience with Apache Spark and the PySpark API. Python coding skills. SQL query proficiency. Familiarity with big data concepts and Kafka basics.
How you'll learn
- Skill-based, hands-on learning: Practice new skills by completing job-related tasks.
- Expert guidance: Follow along with pre-recorded videos from experts using a unique side-by-side interface.
- No downloads or installation required: Access the tools and resources you need in a pre-configured cloud workspace.
- Available only on desktop: This Guided Project is designed for laptops or desktop computers with a reliable Internet connection, not mobile devices.




