Explore stock prices with Spark SQL
Create an application that runs on a Spark cluster
Derive knowledge from data using Spark RDDs and DataFrames
Store results in Parquet tables
In this 1-hour, project-based course, you will learn how to interact with a Spark cluster from a Jupyter notebook and how to start a Spark application. You will learn how to use Spark Resilient Distributed Datasets (RDDs) and Spark DataFrames to explore a dataset. We will load a dataset into our Spark program and analyze it using actions, transformations, the Spark DataFrame API, and Spark SQL. You will learn how to choose the best tool for each scenario. Finally, you will learn to save your results in Parquet tables.
Spark SQL
Data Analysis
Big Data
Apache Spark
Distributed Computing
In a video that plays in a split-screen with your work area, your instructor will walk you through these steps:
By the end of Task 1, you will become familiar with the Jupyter notebook environment
By the end of Task 2, you will be able to initialize a Spark application
By the end of Task 3, you will be able to create Spark Resilient Distributed Datasets
By the end of Task 4, you will be able to create Spark DataFrames in several ways
By the end of Task 5, you will be able to explore datasets with Spark SQL
By the end of Task 6, you will be able to write statistical queries and compare Spark DataFrames
By the end of Task 7, you will be able to store DataFrames in Parquet tables
