By the end of this project, you will use the Apache Spark Structured Streaming API with Python to stream data from two different sources, store a dataset in a MongoDB database, and join two datasets. The Structured Streaming API lets you continuously process data as it arrives from sources such as a directory on the file system or a TCP/IP socket. One application is continuously capturing readings from weather stations to build a historical record.
Apache Spark SQL
Apache Spark Structured Streaming API
Apache Spark Schema
In a video that plays in a split-screen with your work area, your instructor will walk you through each step.
Your workspace is a cloud desktop that runs in your browser, so no download is required.