This course is for novice programmers or business people who would like to understand the core tools used to wrangle and analyze big data. With no prior experience, you will have the opportunity to walk through hands-on examples with Hadoop and Spark frameworks, two of the most common in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments you will be guided in how data scientists apply the important concepts and techniques such as Map-Reduce that are used to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.
Welcome to the first module of the Big Data Platform course. This first module will provide insight into the Big Data hype and its technologies, opportunities, and challenges. We will take a deeper look at the Hadoop stack and the tools and technologies associated with Big Data solutions.
What's included
7 videos, 4 readings, 1 assignment
7 videos•Total 53 minutes
Hadoop Stack Basics•4 minutes
The Apache Framework: Basic Modules•4 minutes
Hadoop Distributed File System (HDFS)•6 minutes
The Hadoop "Zoo"•5 minutes
Hadoop Ecosystem Major Components•11 minutes
Exploring the Cloudera VM: Hands-On Part 1•16 minutes
Exploring the Cloudera VM: Hands-On Part 2•6 minutes
4 readings•Total 40 minutes
Apache Hadoop Ecosystem•10 minutes
Lesson 1 Slides (PDF)•10 minutes
Hardware & Software Requirements•10 minutes
Lesson 2 Slides - Cloudera VM Tour•10 minutes
1 assignment•Total 30 minutes
Basic Hadoop Stack•30 minutes
Introduction to the Hadoop Stack
Module 2•3 hours to complete
Module details
In this module we will take a detailed look at the Hadoop stack, ranging from the basic HDFS components to application execution frameworks, languages, and services.
What's included
10 videos, 6 readings, 3 assignments
10 videos•Total 70 minutes
Overview of the Hadoop Stack•4 minutes
The Hadoop Distributed File System (HDFS) and HDFS2•9 minutes
Lesson 3: Hadoop-Based Applications Overview - All Slides•10 minutes
Command list for Applications Slides•10 minutes
Tips to handle service connection errors•10 minutes
References for Applications•10 minutes
3 assignments•Total 74 minutes
Overview of Hadoop Stack•30 minutes
Hadoop Execution Environment•14 minutes
Hadoop Applications•30 minutes
Introduction to Hadoop Distributed File System (HDFS)
Module 3•3 hours to complete
Module details
In this module we will take a detailed look at the Hadoop Distributed File System (HDFS). We will cover the main design goals of HDFS, the read/write process, the main configuration parameters that can be tuned to control HDFS performance and robustness, and the different ways you can access data on HDFS.
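As a rough illustration of how two of those tunable parameters interact, the sketch below estimates how a file is split into blocks and how much raw disk space its replicas use. The function name `hdfs_footprint` is hypothetical, and the defaults (128 MB block size, replication factor 3) are assumed here as the common defaults in Hadoop 2.x and later; a real cluster may be configured differently.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate block count, replica count, and raw storage for one file.

    Assumed defaults: 128 MB blocks and 3 replicas. The last block of a
    file is not padded, so raw storage is the file size times replication.
    """
    if file_size_mb < 0 or block_size_mb <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    blocks = math.ceil(file_size_mb / block_size_mb)  # logical blocks
    block_replicas = blocks * replication             # copies stored on DataNodes
    raw_storage_mb = file_size_mb * replication       # disk space across the cluster
    return blocks, block_replicas, raw_storage_mb


if __name__ == "__main__":
    # A 1000 MB file under the assumed defaults.
    print(hdfs_footprint(1000))
```

Raising the replication factor improves robustness at the cost of raw storage, while a larger block size reduces the number of blocks the NameNode has to track.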
What's included
9 videos, 5 readings, 3 assignments
9 videos•Total 58 minutes
Overview of HDFS Architecture•5 minutes
The HDFS Performance Envelope•6 minutes
Read/Write Processes in HDFS•4 minutes
HDFS Tuning Parameters•6 minutes
HDFS Performance and Robustness•10 minutes
Overview of HDFS Access, APIs, and Applications•5 minutes
HDFS Commands•9 minutes
Native Java API for HDFS•5 minutes
REST API for HDFS•9 minutes
5 readings•Total 50 minutes
Lesson 1: Introduction to HDFS - Slides•10 minutes
HDFS references•10 minutes
Lesson 2: HDFS Performance and Tuning - Slides•10 minutes
HDFS performance, tuning, and robustness•30 minutes
Accessing HDFS•30 minutes
Introduction to Map/Reduce
Module 4•7 hours to complete
Module details
This module will introduce Map/Reduce concepts and practice. You will learn the big idea behind Map/Reduce and how to design, implement, and execute tasks in the Map/Reduce framework. You will also learn the trade-offs in Map/Reduce and how they motivate other tools.
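To make the idea concrete, here is a minimal single-process sketch of a word count expressed as map, shuffle, and reduce steps. It is not the course's assignment code and does not use Hadoop; the function names are hypothetical and the whole pipeline runs in memory, whereas a real framework distributes each phase across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all values that share the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each key's list of values into a single count."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases together."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data is big", "data about data"]
    print(word_count(sample))
```

The map and reduce functions only ever see one record or one key at a time, which is what lets a framework run many copies of them in parallel.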
Computational Costs of Vector Multiplication•4 minutes
MapReduce Summary•2 minutes
3 readings•Total 30 minutes
Lesson 1: Introduction to MapReduce - Slides•10 minutes
A note on debugging map/reduce programs.•10 minutes
Lesson 2: MapReduce Examples and Principles - Slides•10 minutes
1 assignment•Total 30 minutes
Lesson 1 Review•30 minutes
2 programming assignments•Total 360 minutes
Running Wordcount with Hadoop streaming, using Python code•180 minutes
Joining Data•180 minutes
Spark
Module 5•9 hours to complete
Module details
Welcome to Module 5, Introduction to Spark. This week we will focus on the Apache Spark cluster computing framework, an important contender to Hadoop MapReduce in the Big Data arena.
Spark provides great performance advantages over Hadoop MapReduce, especially for iterative algorithms, thanks to in-memory caching. It also gives data scientists an easier way to write their analysis pipelines in Python and Scala, and even provides interactive shells for exploring data live.
UC San Diego is an academic powerhouse and economic engine, recognized as one of the top 10 public universities by U.S. News and World Report. Innovation is central to who we are and what we do. Here, students learn that knowledge isn't just acquired in the classroom—life is their laboratory.
Learner reviews
4.0
3,325 reviews
5 stars: 45.33%
4 stars: 28.08%
3 stars: 12.37%
2 stars: 6.77%
1 star: 7.43%
SS•5 stars•Reviewed on Jan 18, 2017
This is a great introductory course for entry level Hadoop learner. I hope more content can be added into this course. This course overlaps with other big data courses offered by USDC.
CE•5 stars•Reviewed on Jan 12, 2016
Excellent material, teachers and all in place to head studies in the right direction to learn Hadoop and related tools..!Thanks very much for the great time shared with us..!
DZ•4 stars•Reviewed on Dec 21, 2015
I don't think the answers of quiz are accurate. Some answers may be true in some cases, but not in others. Difficult to choose, even review the videos and do some google.
When will I have access to the lectures and assignments?
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.