This course is for novice programmers or business people who would like to understand the core tools used to wrangle and analyze big data. With no prior experience required, you will walk through hands-on examples with the Hadoop and Spark frameworks, two of the most common in the industry. You will become comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments, you will be guided through how data scientists apply important concepts and techniques, such as MapReduce, to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.



Hadoop Platform and Application Framework



Instructors: Natasha Balac, Ph.D.
150,778 already enrolled
(3,325 reviews)
Details to know

11 assignments

There are 5 modules in this course
Welcome to the first module of the Big Data Platform course. This first module will provide insight into the Big Data hype, its technologies, opportunities, and challenges. We will take a deeper look into the Hadoop stack and the tools and technologies associated with Big Data solutions.
What's included
7 videos, 4 readings, 1 assignment
In this module we will take a detailed look at the Hadoop stack, ranging from the basic HDFS components to application execution frameworks, languages, and services.
What's included
10 videos, 6 readings, 3 assignments
In this module we will take a detailed look at the Hadoop Distributed File System (HDFS). We will cover the main design goals of HDFS, the process of reading from and writing to HDFS, and the main configuration parameters that can be tuned to control HDFS performance and robustness. We will also give an overview of the different ways you can access data on HDFS.
What's included
9 videos, 5 readings, 3 assignments
This module will introduce Map/Reduce concepts and practice. You will learn about the big idea of Map/Reduce and how to design, implement, and execute tasks in the Map/Reduce framework. You will also learn about the trade-offs in Map/Reduce and how they motivate other tools.
What's included
9 videos, 3 readings, 1 assignment, 2 programming assignments
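To make the Map/Reduce idea concrete, here is a minimal sketch of a word count in plain Python. It is an illustration only: it is not taken from the course assignments, it does not use Hadoop, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. In a real framework the shuffle step is performed by the system and the map and reduce steps run in parallel across machines.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big ideas"]))
```

The mapper and reducer each see only one line or one key at a time, which is what lets a framework distribute them; the sequential driver here is only a stand-in for that.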
Welcome to module 5, Introduction to Spark. This week we will focus on the Apache Spark cluster computing framework, an important competitor to Hadoop MapReduce in the Big Data arena. Spark provides great performance advantages over Hadoop MapReduce, especially for iterative algorithms, thanks to in-memory caching. It also gives data scientists an easier way to write their analysis pipelines in Python and Scala, and even provides interactive shells for working with data live.
What's included
10 videos, 4 readings, 3 assignments, 2 programming assignments




Learner reviews
- 5 stars: 45.33%
- 4 stars: 28.08%
- 3 stars: 12.37%
- 2 stars: 6.77%
- 1 star: 7.43%
Reviewed on Mar 29, 2016
Very good overview course. I didn't like the sample data for shows/channels, but it worked still. Perhaps there's a better example we can use for the assignments.
Reviewed on Jan 18, 2017
This is a great introductory course for entry level Hadoop learner. I hope more content can be added into this course. This course overlaps with other big data courses offered by USDC.
Reviewed on Nov 15, 2015
I really enjoyed this course. It helped me a lot in making my first steps through Big Data. I'm really looking forward for the next courses in Big Data specialization.


