1.1 Introduction to Map-Reduce

Course video 2 of 29

In this module, we will learn about the MapReduce paradigm, and how it can be used to write distributed programs that analyze data represented as key-value pairs. A MapReduce program is defined via user-specified map and reduce functions, and we will learn how to write such programs in the Apache Hadoop and Spark projects. TheMapReduce paradigm can be used to express a wide range of parallel algorithms. One example that we will study is computation of the TermFrequency – Inverse Document Frequency (TF-IDF) statistic used in document mining; this algorithm uses a fixed (non-iterative) number of map and reduce operations. Another MapReduce example that we will study is parallelization of the PageRank algorithm. This algorithm is an example of iterative MapReduce computations, and is also the focus of the mini-project associated with this module.

About Coursera

Courses, Specializations, and Online Degrees taught by top instructors from the world's best universities and educational institutions.

Join a community of 40 million learners from around the world
Earn a skill-based course certificate to apply your knowledge
Gain confidence in your skills and further your career