
Big Data Essentials: HDFS, MapReduce and Spark RDD, Yandex

224 ratings
65 reviews

About this Course

Have you ever heard about such technologies as HDFS, MapReduce and Spark? Always wanted to learn these new tools but missed concise starting material? Don’t miss this course either! In this 6-week course you will:
- learn some basic technologies of the modern Big Data landscape, namely HDFS, MapReduce and Spark;
- be guided through both system internals and their applications;
- learn about distributed file systems, why they exist and what function they serve;
- grasp the MapReduce framework, a workhorse for many modern Big Data applications;
- apply the framework to process texts and solve sample business cases;
- learn about Spark, the next-generation computational framework;
- build a strong understanding of Spark's basic concepts;
- develop skills to apply these tools to creating solutions in finance, social networks, telecommunications and many other fields.

Your learning experience will be as close to real life as possible, with the chance to evaluate your practical assignments on a real cluster. No mocking, a friendly, considerate atmosphere to make your learning smooth and enjoyable. Get ready to work with real datasets alongside real masters!

Special thanks to:
- Prof. Mikhail Roytberg, APT dept., MIPT, who was the initial reviewer of the project and the supervisor and mentor of half of the BigData team. He was the one who helped get this show on the road.
- Oleg Sukhoroslov (PhD, Senior Researcher at IITP RAS), who has been teaching MapReduce, Hadoop and friends since 2008. He now leads the infrastructure team.
- Oleg Ivchenko (PhD student, APT dept., MIPT), Pavel Akhtyamov (MSc student, APT dept., MIPT) and Vladimir Kuznetsov (Assistant at P.G. Demidov Yaroslavl State University), superbrains who developed and now maintain the infrastructure used for the practical assignments in this course.
- Asya Roitberg, Eugene Baulin, Marina Sudarikova. These people never sleep, babysitting this course day and night to make your learning experience productive, smooth and exciting....

Top reviews




Jun 28, 2018

Absolutely essential for everyone who wants a proper introduction to HDFS, MapReduce and Spark. Brought to you by a great team of geniuses of their time ;)


62 Reviews

By Leonid Moiseev

Dec 12, 2018

The course is advertised as a practical one, but most of the time is spent on outdated technologies like Map/Reduce. It would be more productive to go deeper into Spark. The assignments are not difficult, but it takes a lot of time and attempts to figure out what exactly the authors wanted. The worst part is the grader and how it is organized. Nevertheless, you can learn a few things even if you are working in this industry.

By Павел Сорокин

Dec 11, 2018

I think students could be allowed to choose between MapReduce and Spark. As for the shortest path task: the code provided by the authors runs out of memory when checked on the cluster. After a lot of time playing with Spark parameters and cache/persist, I found a solution that avoids calculating all distances, but... Also, there was no information about Spark executor parameters in the course...

A simple hint could have saved a lot of stupidly wasted time.

But it's not a major issue. Anyway, thanks!

By Marco Gorelli

Dec 05, 2018

Interesting, useful, informative, accessible (and sometimes funny!) lectures.

Stimulating assignments.

Fast responses from instructors/mentors.

Unfortunately, I often spent more time trying to get my assignments to pass the automatic grader than on solving them. This made the course a bit frustrating at times.

By navneet kamboj

Nov 28, 2018

Awesome content... great learning... :)

By Yiming Huang

Nov 22, 2018

Everything in this course is new to me, but it provides a lot of practice, so I can gradually get familiar with all this new stuff. I find it a bit challenging, but overall it's quite good.

By Bingnan Li

Nov 15, 2018

There are a lot of unclear things about the homework assignments. So even when your homework runs successfully in the Docker image, you still can't pass the online tests. Besides, the error messages shown by the online test system (not the logs) are also unclear; they don't tell you the real reason for the failure.

By Chinmaya barik

Nov 08, 2018

It was absolutely a great learning experience with Yandex. Many of my doubts got cleared after watching these awesome videos. The lectures give in-depth knowledge and understanding of Big Data and Spark. Looking forward to the next course.

By Harish Sakthinehru

Oct 26, 2018

Too advanced for beginners. Some working knowledge of the big data field is needed before starting this course.

By Maryna Donskikh

Oct 19, 2018

The accent is horrible and hard to listen to, with a lot of mispronounced words. But the idea of the course is good.

By Dmitry Pyryeskin

Oct 19, 2018

Quizzes in this course ask questions that are not covered in lectures. Subtitles are full of mistakes and typos. Other than that, the material of the course is very interesting.