Introduction to Big Data with Spark and Hadoop

This self-paced IBM course introduces big data: its characteristics and its application in big data analytics. You will also gain hands-on experience with big data processing tools like Apache Hadoop and Apache Spark.

Bernard Marr defines big data as the digital trace that we are generating in this digital era. You will start the course by understanding what big data is and exploring how insights from big data can be harnessed for a variety of use cases. You’ll also explore how big data uses technologies like parallel processing, scaling, and data parallelism.

Next, you will learn about Hadoop, an open-source framework that allows for the distributed processing of large data sets, and its ecosystem. You will discover important applications that go hand in hand with Hadoop, like the Hadoop Distributed File System (HDFS), MapReduce, and HBase. You will become familiar with Hive, data warehouse software that provides an SQL-like interface to efficiently query and manipulate large data sets.

You’ll then gain insights into Apache Spark, an open-source processing engine that provides users with new ways to store and use big data. In this course, you will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the components that make up Apache Spark. You’ll learn about DataFrames, perform basic DataFrame operations, and work with SparkSQL. You’ll also explore how Spark processes and monitors the requests your application submits and how you can track work using the Spark Application UI.

This course has several hands-on labs to help you apply and practice the concepts you learn. You will complete Hadoop and Spark labs using various tools and technologies, including Docker, Kubernetes, Python, and Jupyter Notebooks.

Status: Apache Spark

Status: Development Environment

Intermediate · Course · 20 hours

Featured reviews

5.0

Reviewed Jan 30, 2024

This is a well-packaged course that lets you create big data applications. You can download the hands-on practice applications as PDF files, follow them, and update them to suit your own application.

4.0

Reviewed Jan 10, 2025

I found the course to be a great foundation for understanding how to work with large datasets using Hadoop and Spark, with clear explanations and practical examples.

5.0

Reviewed Jun 7, 2024

A very, very in-depth course by IBM. As someone who studies most courses on Coursera, I think IBM offers the most in-depth course so far.

4.0

Reviewed May 1, 2022

The hands-on labs and quizzes at the end of each session were very helpful.

4.0

Reviewed Jun 11, 2023

Synth voice narration quality is truly annoying. I'd expect better from IBM. Course materials are quite superficial, which I guess is acceptable for an introductory course.

5.0

Reviewed Mar 7, 2022

The lecture was clearly understandable, and I feel very grateful to have had it. Thank you, it was phenomenal 😊

5.0

Reviewed Nov 8, 2022

All the things I need to know about Big Data, Spark, Hadoop and Hive, explained in detail.

5.0

Reviewed May 7, 2022

Fantastic blend of theory and practical (labs). The labs are short and have concise material.

5.0

Reviewed Jul 15, 2023

The course was full of information and details for a beginner in big data technology.

4.0

Reviewed Nov 11, 2022

This is really helpful for me to understand Big Data and Apache Spark!

5.0

Reviewed Jan 15, 2024

Great program to explore more about AI and Big Data

5.0

Reviewed Jan 17, 2025

I have learned a lot from this course, and hopefully it will help me throughout my career. A very well-designed course; I like the way of teaching and the structured modules.

All reviews

Showing: 20 of 105

Prateek Pandey

1.0

Reviewed Jun 2, 2022

It's very hard to follow a bot's voice. Not at all interactive. A machine just keeps speaking at one speed. That's no way to learn something.

Peter Franek

2.0

Reviewed Feb 10, 2022

To me, this seems like general BS full of buzzwords, but you won't learn anything practical about how to use these tools.

Long Nguyễn Thanh

4.0

Reviewed Oct 25, 2021

Personally, I think the way the material is delivered should be redone in a more engaging manner. Concretely, putting so much text and theory in the slides easily bores students and saps their motivation to keep listening. For this kind of hands-on course, we need more practical exercises instead of so many theory lectures that show neither demos nor coding in the slides.

Arnaud H

1.0

Reviewed Oct 31, 2021

IBM stopped maintaining/supporting its platform, where the Jupyter exercises are hosted, more than a year ago!

It's a real shame: we discover at the end of the course that it is impossible to finish it...

sohil gandhi

2.0

Reviewed Jun 21, 2022

It is a presentation deck with an AI bot giving voiceover narration. No one is really teaching anything.

Mohd Shah

3.0

Reviewed Feb 16, 2022

Too dry and technical, plus I couldn't get some of the code in the notebook to work. There is a better video, especially for Hadoop, on Udacity that gives a much better explanation, and I do hope you try to make the material more digestible. Each video tries so hard to cram in as much information as possible without giving the students much time to understand what is going on.

Omar Hegazy

4.0

Reviewed Jan 30, 2022

It is a great course on the theory part, but there is no practical part (I guess they were depending on the next course in the specialization). It would also be better to deliver the lectures in a less robotic way.

LUIS ACERO MORATA

1.0

Reviewed Feb 19, 2023

Too many concepts and too much theory; I need more practical examples and fewer theoretical concepts.

Daniel Alejandro Lavin Vizcaino

2.0

Reviewed Apr 1, 2025

As with other courses created by IBM, this one is just a hard-to-follow, AI-narrated, too-wide-and-yet-too-shallow information dump using a set of slides. You will NOT learn anything with sufficient depth to consider yourself "ready" to start working with Spark in a real-world project. My biggest itch was the quizzes, though. Not only are some of the questions badly/confusingly phrased or contain typos, but also, and perhaps most annoyingly, the answers are heavily opinion based, or flat out wrong. Some of the questions are also entirely irrelevant and out of place when it comes to practical Data Engineering.

Wanderson Martins

2.0

Reviewed Jan 22, 2024

The worst course of this specialization by far. The lectures demanded a strong dev background and threw tech concepts in our faces without caring whether non-developers would understand. The project was focused on data manipulation, joins/selects, without exploring the true potential of Spark. Suggestions: applied case studies; a real distributed dataset so we could test different resource allocations; designing and running a real Spark environment with real data, not only CSVs that fit in the computer's memory.

Gorana Bosic

3.0

Reviewed Dec 7, 2024

There are three major concerns I have with this course:

1) Content Depth and Structure: The course content feels overly basic, even for an introductory level. The lab exercises are too simplistic and fail to provide meaningful hands-on experience. There is no technical final assessment; the concluding quiz is entirely theoretical. Questions about IBM products or statistics like "the projected growth of data" seem irrelevant and out of place.

2) Lack of Conceptual Clarity and Practical Application: Key concepts like shuffling, grouping, and filtering are not explained or demonstrated in sufficient depth. Including practical examples to showcase their impact would significantly enhance understanding. The course neglects to explain the execution plan in Spark, particularly how it operates and its implications for application performance. This is a critical topic that deserves proper attention. The explanation of differences between RDDs and DataFrames is confusing, even for someone with basic knowledge. Similarly, the coverage of Spark SQL and functions lacks clarity and structure. A more straightforward approach (e.g., showing three ways to accomplish a task, comparing them, and contextualizing their usage in real-world scenarios) would be far more effective. The inclusion of Pandas is unexpected. While it’s noted that Spark RDDs/DataFrames can be created from Pandas DataFrames, there was too much emphasis on it, and at the same stage of the course, foundational functions like read.csv are not even mentioned. This omission contributes to a sense of disorganization and a lack of a coherent teaching strategy.

3) Presentation and Delivery: The video materials are AI-generated, and this detracts from the learning experience. Personally, I found the videos too short, with unnecessary and repetitive intro/outro segments that quickly became irritating. This format undermines engagement and suggests a lack of thoughtful design. A human instructor narrating the content could provide a more engaging and dynamic learning experience. A human presenter might also recognize and address the lack of substance in the explanations, resulting in clearer and more effective teaching.

Overall, the course feels like a collection of loosely connected topics rather than a carefully designed curriculum. A greater focus on depth, practical application, and a more personalized delivery would significantly improve the learning experience.

Noel David

5.0

Reviewed Nov 9, 2022

All the things I need to know about Big Data, Spark, Hadoop and Hive, explained in detail.

Rorisang Sitoboli

5.0

Reviewed May 8, 2022

Fantastic blend of theory and practical (labs). The labs are short and have concise material.

David Arango Sampayo

4.0

Reviewed Jun 22, 2022

Since I already had a background in Spark, this course refreshed the theoretical foundations of its main topics: RDDs and parallelism, among others. It was good to be reminded of how Spark works. On the other hand, I must say there are problems with some lab environments, where it is not possible to work on the practical exercises.

Natale Foata

4.0

Reviewed Nov 21, 2021

Good course on the theoretical concepts needed to understand these tools. However, there is not much practice.

Santiago Zuluaga Ayala

3.0

Reviewed Sep 29, 2022

No project. Labs are only copy-paste commands.

A lot of content is explained using text only (it would be nice if diagrams, images, and examples were used more!)

In general, it's an OK course; together with the following course, it could be a good overall start to your Big Data journey!

kalpana Gelli

3.0

Reviewed May 17, 2023

I feel it would be good if they added more hands-on work for learning Spark and Hadoop.

Fran Moreno

2.0

Reviewed Dec 14, 2023

Very few practical examples to help understand how things work. The theory could easily be applied through many practical cases in the hands-on labs, as in other courses I have completed previously.

Shailendra Paliwal

5.0

Reviewed Dec 18, 2025

The course provided a clear and practical introduction to Big Data concepts using Spark and Hadoop, making complex topics easy to understand. Hands-on examples and real-world scenarios helped build strong confidence in working with distributed data systems.

Antonio Guadagno

5.0

Reviewed Nov 15, 2022

I found this course very interesting, great for an aspiring data engineer! It starts by introducing the concept of big data in general and then goes into more and more detail, analyzing Hadoop and Spark in depth. I highly recommend it!