Working with Big Data

Offered By
Coursera Project Network
In this Guided Project, you will:

Process a large NOAA dataset of hourly precipitation rates recorded in the state of Wisconsin over a ten-year period

  • 2 hours
  • Intermediate
  • No download needed
  • Split-screen video
  • English
  • Desktop only

By the end of this project, you will have set up an environment for Big Data development using Visual Studio Code, MongoDB, and Apache Spark. You will then use that environment to process a large NOAA dataset of hourly precipitation rates recorded in the state of Wisconsin over a ten-year period. MongoDB is a widely used NoSQL database well suited to very large datasets; it is also highly scalable and adaptable. Apache Spark provides efficient in-memory processing of Big Data.
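To make the setup concrete, an environment like the one described above might be initialized as follows. This is a minimal sketch, not the project's exact configuration: the connector version, connection URI, and database/collection names are assumptions you would adjust for your own installation.

```python
# Minimal sketch: a SparkSession wired to MongoDB via the MongoDB
# Spark connector. The package coordinates, URI, and database/
# collection names below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("noaa-precipitation")
    # Connector version must match your Spark/Scala build.
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:10.2.1")
    # Default locations for reads and writes (hypothetical names).
    .config("spark.mongodb.read.connection.uri",
            "mongodb://localhost:27017/weather.precipitation")
    .config("spark.mongodb.write.connection.uri",
            "mongodb://localhost:27017/weather.precipitation")
    .getOrCreate()
)
```

Because this snippet only builds a session configuration and needs a running JVM and MongoDB instance to do anything, treat it as a configuration fragment rather than a standalone script.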

Skills you will develop

  • PySpark Queries
  • MongoDB
  • Python Programming
  • Big Data
  • PySpark

Learn step-by-step

In a video that plays in a split-screen with your work area, your instructor will walk you through these steps:

  1. Set up Apache Spark and MongoDB Environment.

  2. Create a Python PySpark program to read CSV data.

  3. Use Spark SQL to query in-memory data.

  4. Configure Apache Spark to connect to MongoDB.

  5. Persist data using Spark and MongoDB.
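Taken together, steps 2 through 5 can be sketched as a single PySpark program. This is a hedged illustration, not the instructor's exact code: the file path, the column names (STATION, DATE, HPCP follow NOAA's hourly precipitation format but should be checked against the actual file), and the database and collection names are all assumptions. It presumes a SparkSession already configured with the MongoDB connector, as in step 1.

```python
# Sketch of steps 2-5: read the NOAA CSV, query it in memory with
# Spark SQL, then persist the result to MongoDB. Assumes `spark` is
# a SparkSession already configured with the MongoDB Spark connector.

# Step 2: read the CSV into a DataFrame, inferring column types.
df = spark.read.csv("precipitation.csv", header=True, inferSchema=True)

# Step 3: register the DataFrame as a temporary view and query it.
df.createOrReplaceTempView("precip")
totals = spark.sql("""
    SELECT STATION, SUM(HPCP) AS total_precip
    FROM precip
    GROUP BY STATION
    ORDER BY total_precip DESC
""")

# Steps 4-5: write the aggregated result out to MongoDB.
(totals.write
    .format("mongodb")
    .mode("append")
    .option("database", "weather")           # hypothetical database
    .option("collection", "station_totals")  # hypothetical collection
    .save())
```

Running this end to end requires a live Spark runtime and MongoDB server, so it is offered here as an environment-dependent sketch of the workflow rather than a turnkey script.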

How Guided Projects work

Your workspace is a cloud desktop right in your browser; no download is required.

In a split-screen video, your instructor guides you step-by-step

Frequently asked questions


More questions? Visit the Learner Help Center.