Learner Reviews & Feedback for Introduction to Big Data with Spark and Hadoop by IBM

4.4 stars (461 ratings)

About the Course

This self-paced IBM course will teach you all about big data! You will become familiar with the characteristics of big data and its application in big data analytics. You will also gain hands-on experience with big data processing tools like Apache Hadoop and Apache Spark.

Bernard Marr defines big data as the digital trace that we are generating in this digital era. You will start the course by understanding what big data is and exploring how insights from big data can be harnessed for a variety of use cases. You’ll also explore how big data uses technologies like parallel processing, scaling, and data parallelism.

Next, you will learn about Hadoop, an open-source framework for the distributed processing of large data sets, and about its ecosystem. You will discover important applications that go hand in hand with Hadoop, like the Hadoop Distributed File System (HDFS), MapReduce, and HBase. You will become familiar with Hive, data warehouse software that provides an SQL-like interface to efficiently query and manipulate large data sets.

You’ll then gain insights into Apache Spark, an open-source processing engine that provides users with new ways to store and use big data. In this course, you will discover how to leverage Spark to deliver reliable insights. The course provides an overview of the platform, going into the components that make up Apache Spark. You’ll learn about DataFrames, perform basic DataFrame operations, and work with Spark SQL. You’ll also explore how Spark processes and monitors the requests your application submits and how you can track work using the Spark Application UI.

This course has several hands-on labs to help you apply and practice the concepts you learn. You will complete Hadoop and Spark labs using various tools and technologies, including Docker, Kubernetes, Python, and Jupyter Notebooks....

Top reviews

JS • May 1, 2022

The hands-on labs and quizzes at the end of each session were very helpful.

1 - 25 of 100 Reviews for Introduction to Big Data with Spark and Hadoop

By Prateek P • Jun 2, 2022

It's very hard to follow a bot's voice. Not at all interactive. A machine just keeps speaking at a fixed speed. That's not a way to learn something.

By Peter F • Feb 10, 2022

To me, this seems like general BS full of buzzwords; you won't learn anything practical about how to use these tools.

By Long N T • Oct 25, 2021

Personally, I think the way the material is delivered should be redone in a more engaging manner. Concretely, putting too much text and theory in the slides easily makes students bored and unmotivated to keep listening. Hands-on courses like this need more practical exercises rather than so many theory lectures with no demos or coding shown in the slides.

By Arnaud H • Oct 31, 2021

IBM stopped maintaining and supporting the platform where the Jupyter exercises are hosted more than a year ago!

It's a true shame: we discover at the end of the course that it is impossible to finish it...

By sohil g • Jun 21, 2022

It is a presentation deck with an AI bot giving voiceover narration. No one is really teaching anything.

By Mohd S B S H • Feb 16, 2022

Too dry and technical, plus I couldn't get some of the code in the notebook to work. There is a better video, especially for Hadoop, on Udacity that gives a much better explanation, and I do hope you try to make the material more digestible. Each video tries so hard to cram in as much information as possible without giving students much time to understand what is going on.

By Omar H • Jan 30, 2022

It is a great course on the theory side, but there is no practical part (I guess they were depending on the next course in the specialization). It would also be better to deliver the lectures in a less robotic way.

By LUIS A M • Feb 19, 2023

Too many concepts and too much theory; I need more practical examples and less theory.

By Wanderson M • Jan 22, 2024

The worst course of this specialization by far. The lectures demanded a strong dev background and threw tech concepts in our faces without caring whether non-dev people would understand. The project was focused on data manipulation (joins/selects) without exploring the true potential of Spark. Suggestions: applied case studies, a real distributed dataset so we could test different resource allocations, and designing and running a real Spark environment with real data, not only CSVs that fit in the computer's memory.

By Noel D • Nov 9, 2022

All the things I need to know about Big Data, Spark, Hadoop and Hive, explained in detail.

By Rorisang S • May 8, 2022

Fantastic blend of theory and practical (labs). The labs are short and have concise material.

By David A S • Jun 22, 2022

Since I already had a background in Spark, this course refreshed the theoretical foundations of its main topics: RDDs and parallelism, among others. It was good to be reminded of how Spark works. On the other hand, I must say there are problems with some lab environments, where it is not possible to work on the practical exercises.

By Natale F • Nov 21, 2021

Good course on the theoretical concepts needed to understand these tools. However, there is not much practice.

By Gorana B • Dec 7, 2024

There are three major concerns I have with this course:

1) Content Depth and Structure: The course content feels overly basic, even for an introductory level. The lab exercises are too simplistic and fail to provide meaningful hands-on experience. There is no technical final assessment; the concluding quiz is entirely theoretical. Questions about IBM products or statistics like "the projected growth of data" seem irrelevant and out of place.

2) Lack of Conceptual Clarity and Practical Application: Key concepts like shuffling, grouping, and filtering are not explained or demonstrated in sufficient depth. Including practical examples to showcase their impact would significantly enhance understanding. The course neglects to explain the execution plan in Spark, particularly how it operates and its implications for application performance. This is a critical topic that deserves proper attention. The explanation of differences between RDDs and DataFrames is confusing, even for someone with basic knowledge. Similarly, the coverage of Spark SQL and functions lacks clarity and structure. A more straightforward approach (e.g., showing three ways to accomplish a task, comparing them, and contextualizing their usage in real-world scenarios) would be far more effective. The inclusion of Pandas is unexpected. While it’s noted that Spark RDDs/DataFrames can be created from Pandas DataFrames, there was too much emphasis on it, while at the same stage of the course foundational functions like read.csv are not even mentioned. This omission contributes to a sense of disorganization and a lack of a coherent teaching strategy.

3) Presentation and Delivery: The video materials are AI-generated, and this detracts from the learning experience. Personally, I found the videos too short, with unnecessary and repetitive intro/outro segments that quickly became irritating. This format undermines engagement and suggests a lack of thoughtful design. A human instructor narrating the content could provide a more engaging and dynamic learning experience. A human presenter might also recognize and address the lack of substance in the explanations, resulting in clearer and more effective teaching.

Overall, the course feels like a collection of loosely connected topics rather than a carefully designed curriculum. A greater focus on depth, practical application, and a more personalized delivery would significantly improve the learning experience.

By Santiago Z A • Sep 29, 2022

No project. Labs are only copy-paste commands.

A lot of the content is explained using only text (it would be nice if more diagrams, images, and examples were used!)

In general, it's an OK course; together with the following course, it could be a good overall start to your Big Data journey!

By kalpana G • May 17, 2023

I feel it would be good if they added more hands-on work for learning Spark and Hadoop.

By Fran M • Dec 14, 2023

Very few practical examples to help understand how things work. The theory could easily be applied through many practical cases in the hands-on labs, as in other courses I have completed previously.

By Antonio G • Nov 15, 2022

I found this course very interesting, great for an aspiring data engineer! It starts by introducing the concept of big data in general and then goes into more and more detail, analyzing Hadoop and Spark in depth. I highly recommend it!

By Kenan • Jan 30, 2024

This is a well-packaged course that allows you to create big data applications. You can download the hands-on practice applications as PDF files, follow them, and update them for your own application.

By TEERTH K • Jan 18, 2025

I have learned a lot from this course, and hopefully it will help me throughout my career ahead. A very well designed course; I like the way of teaching and the structured modules.

By Marcin S • Mar 26, 2024

Solid and in-depth introduction to the core elements of Spark. All you need to know about RDDs, architecture, drivers and the UI in one place!

By Joseph O • Jun 8, 2024

A very, very in-depth course by IBM. As someone who takes most of my courses on Coursera, I think IBM offers the most in-depth course so far.

By Aditya P S P • Mar 7, 2022

The lecture was clearly understandable and I feel very grateful to have had this lecture.

Thank you, it was phenomenal😊

By Mahmoud G • Jul 16, 2023

The course was full of information and details for a beginner in big data technology.

By CHIU W S • Oct 28, 2022

Well-structured course with comprehensive content and practical skills.