Use the Apache Spark Structured Streaming API with MongoDB

Offered By
Coursera Project Network
In this Guided Project, you will:

Use the Apache Spark Structured Streaming API with Python to stream data from two different sources

Use the Apache Spark Structured Streaming API with Python to store a dataset in the MongoDB database and join two datasets

2 hours
Intermediate
No download needed
Split-screen video
English
Desktop only

By the end of this project, you will be able to use the Apache Spark Structured Streaming API with Python to stream data from two different sources, store a dataset in the MongoDB database, and join two datasets. The Structured Streaming API continuously reads data from sources such as the file system (new files arriving in a directory) or a TCP/IP socket. One application is continuously capturing readings from weather stations for historical analysis.

Skills you will develop

  • Apache Spark SQL
  • MongoDB
  • Apache Spark Structured Streaming API
  • Apache Spark Schema
  • Apache Spark

Learn step-by-step

In a video that plays in a split screen alongside your work area, your instructor will walk you through these steps:

  1. Create a PySpark (Python) program that reads structured streaming data.

  2. Persist Apache Spark data to MongoDB.

  3. Use Spark SQL to query the data.

  4. Use Spark to stream from two different structured data sources.

  5. Use the Spark Structured Streaming API to join two streaming datasets.

How Guided Projects work

Your workspace is a cloud desktop right in your browser; no download is required.

In a split-screen video, your instructor guides you step by step.
