Coursera

Data Analysis Using Pyspark

Distributed data processing is a topic every data analyst should be familiar with. As a data analyst, you should be able to run a variety of queries against a dataset to extract useful information from it. But what if the data is so large that working with it on your local machine is impractical? That is where distributed data processing and Spark come in handy. In this project, we work with the pyspark module in Python, in the Google Colab environment, and run queries against a dataset from Last.fm, an online music service where users can listen to songs. The dataset consists of two CSV files, listening.csv and genre.csv. We also learn how to visualize query results using matplotlib.
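The workflow described above (load a CSV with pyspark, run an aggregation query, plot the result with matplotlib) might look roughly like the following sketch. It is not the course's own code: the function names, the app name, and the `artist` column used in the usage note are assumptions for illustration, since the description names only the two files and not their columns.

```python
def to_plot_series(rows):
    """Split (label, count) pairs, e.g. collected Spark rows, into two lists."""
    labels = [row[0] for row in rows]
    counts = [row[1] for row in rows]
    return labels, counts


def top_values(csv_path, column, n=10):
    """Return the n most frequent values of `column` as (value, count) pairs."""
    # Imported lazily so this file can be loaded where Spark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # "lastfm-sketch" is an arbitrary, hypothetical application name.
    spark = SparkSession.builder.appName("lastfm-sketch").getOrCreate()
    df = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Count rows per distinct value and keep the n largest groups.
    counts = (
        df.groupBy(column)
        .agg(F.count("*").alias("count"))
        .orderBy(F.desc("count"))
        .limit(n)
    )
    # Only the small aggregated result is brought back to the driver.
    return [(row[column], row["count"]) for row in counts.collect()]


def plot_counts(pairs, title):
    """Draw a bar chart of (label, count) pairs with matplotlib."""
    import matplotlib.pyplot as plt

    labels, counts = to_plot_series(pairs)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(labels, counts)
    ax.set_title(title)
    ax.tick_params(axis="x", rotation=45)
    plt.show()
```

In a Colab notebook, after installing pyspark and uploading the file, a call such as `plot_counts(top_values("listening.csv", "artist"), "Most played artists")` would chart the most frequent values, assuming listening.csv has a column named `artist`. Aggregating in Spark and collecting only the top rows keeps the data handed to matplotlib small, whatever the size of the input file.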

Status: Python Programming
Status: Query Languages
Intermediate · Guided Project · 2 hours

Featured reviews

SA

4.0
Reviewed Jul 2, 2023

Overall good course to kick start. More basics could be covered.

DM

5.0
Reviewed Nov 14, 2020

Best guided project for an introduction to PySpark

VB

5.0
Reviewed Dec 10, 2021

Quick start for a newbie, with most basic information covered here. Worth it.

SA

4.0
Reviewed Aug 20, 2020

Ok, but needs a longer explanation of the functions that are used and their range of possibilities.

AK

5.0
Reviewed Oct 31, 2023

Great course, appropriate for beginning the journey with PySpark

AA

4.0
Reviewed Jan 22, 2022

Quick and good course on PySpark for someone who is familiar with Pandas, although more depth could be added by giving a short brief on Spark and how it works.

SC

5.0
Reviewed Jan 7, 2022

This project is really good. It gives you hands-on experience with PySpark. I was able to write complex queries on my own after a while in this project. Very useful.

AM

4.0
Reviewed Jan 29, 2021

It would have been better if more foundations of the Spark framework had been provided.

AB

5.0
Reviewed Feb 4, 2025

Great introductory course in PySpark. Instructor gives time for you to formulate your response. Focuses on practice via repetition to help you build a strong grasp of basic syntax and functions.

AS

5.0
Reviewed Aug 4, 2020

All is great except the idea of the struct function, which is a little bit confusing. However, it is all awesome hands-on practice. Please do more courses.

DE

5.0
Reviewed Nov 1, 2020

This course has helped equip me with a lot of experience in data analysis and I really love it. Thank you, Ahmad Varasteh. A big thanks to Coursera for creating such a wonderful opportunity.

All reviews

Showing: 20 of 55

Longlong Feng
1.0
Reviewed Aug 25, 2020
upendra madam
1.0
Reviewed Oct 18, 2020
Иван Темиров
4.0
Reviewed Aug 18, 2020
Gabriel Mendoza
3.0
Reviewed Nov 26, 2020
Agbaeze Henry
3.0
Reviewed Aug 6, 2020
Ahmed Salam
5.0
Reviewed Aug 5, 2020
Silvia Gloria Tamburini
5.0
Reviewed Nov 24, 2022
Antoine CARRE
5.0
Reviewed Apr 1, 2022
Papoj Thamjaroenporn
5.0
Reviewed Apr 13, 2025
Adib Behjat
5.0
Reviewed Feb 5, 2025
Derek Edwin Essiaw
5.0
Reviewed Nov 1, 2020
Shreeya Chafekar
5.0
Reviewed Jan 8, 2022
Oliver Emmanuel Argote Brito
5.0
Reviewed Jul 5, 2021
vignesh babu bm
5.0
Reviewed Dec 11, 2021
Akshay Kalia
5.0
Reviewed Nov 1, 2023
Devvrat Mungekar
5.0
Reviewed Nov 14, 2020
Francesco Angeli
5.0
Reviewed Feb 23, 2022
Batchu Varun
5.0
Reviewed Sep 6, 2023
Shivam Ghodke
5.0
Reviewed Feb 10, 2022
Kishore Kumar V
5.0
Reviewed Jun 26, 2021