Johns Hopkins University
Big Data Processing Using Hadoop Specialization

Master Big Data Processing with Hadoop. Gain hands-on experience with Hadoop tools and techniques to efficiently process, analyze, and manage big data in real-world applications.

Instructor: Karthik Shyamsunder

Get in-depth knowledge of a subject
Intermediate level
12 weeks to complete at 5 hours a week
Flexible schedule: learn at your own pace

What you'll learn

  • Gain expertise in Hadoop ecosystem components like HDFS, YARN, and MapReduce for big data processing and management across various tasks.

  • Learn to set up, configure, and utilize tools like Hive, Pig, HBase, and Spark for efficient data analysis, processing, and real-time management.

  • Develop advanced programming techniques for MapReduce, optimization methods, and parallelism strategies to handle large-scale data sets effectively.

  • Understand the architecture and functionality of Hadoop and its components, applying them to solve complex data challenges in real-world scenarios.

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English

Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Johns Hopkins University

Specialization - 4 course series

What you'll learn

  • Define Big Data, explore its relevance in analytics and data science, and understand trends shaping modern data processing technologies.

  • Examine Hadoop architecture, its ecosystem, and subprojects, distinguishing distributions and their roles in Big Data solutions.

  • Acquire practical skills to install, configure, and run Hadoop on a Linux virtual machine, enabling effective Big Data processing.

Skills you'll gain

  • Apache Hadoop
  • Distributed Computing
  • Big Data
  • Linux
  • System Configuration
  • Data Processing
  • Data Infrastructure
  • Software Installation
  • Scalability
  • Analytics
  • Data Science

What you'll learn

  • Understand HDFS architecture, components, and how it ensures scalability and availability for big data processing.

  • Learn to configure Hadoop for Java programming and perform file CRUD operations using HDFS APIs.

  • Master advanced HDFS programming concepts like compression, serialization, and working with specialized file structures like Sequence and Map files.

Skills you'll gain

  • File Systems
  • Data Storage
  • Apache Hadoop
  • Distributed Computing
  • Java
  • Scalability
  • Development Environment
  • File Management
  • Big Data
  • Systems Architecture
  • Data Structures
  • Data Processing
  • Infrastructure Architecture

What you'll learn

  • Learn the fundamentals of YARN and MapReduce architectures, including how they work together to process large-scale data efficiently.

  • Understand and implement Mapper and Reducer parallelism in MapReduce jobs to improve data processing efficiency and scalability.

  • Apply optimization techniques such as combiners, partitioners, and compression to enhance the performance and I/O operations of MapReduce jobs.

  • Explore advanced concepts like multithreading, speculative execution, input/output formats, and how to avoid common MapReduce anti-patterns.
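
As a rough illustration of the combiner and partitioner ideas named above, the following is a minimal pure-Python simulation of a word-count job. It is a sketch only, not Hadoop code and not course material: the function names (`map_phase`, `combine`, `partition`, `run_job`) and the sample input are hypothetical, and the byte-sum partitioner is a simple deterministic stand-in chosen for this example. It shows how pre-aggregating on the mapper side reduces the number of records that reach the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit one (word, 1) pair per word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum counts per key locally, before the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: choose a reducer index for a key.

    Assumption for this sketch: a byte sum is used so the result is
    deterministic across runs (Python's built-in hash() is salted).
    """
    return sum(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2, use_combiner=True):
    """Run the simulated job; return (word counts, shuffled record count)."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    shuffled = 0
    for split in splits:  # each split stands for one mapper's input
        pairs = []
        for line in split:
            pairs.extend(map_phase(line))
        if use_combiner:
            pairs = combine(pairs)
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
            shuffled += 1
    # Reducer: sum the values gathered for each key in each bucket.
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = sum(values)
    return result, shuffled


if __name__ == "__main__":
    sample = [["big data big"], ["data big hadoop"]]
    print(run_job(sample, use_combiner=False))
    print(run_job(sample, use_combiner=True))
```

With the two sample splits, the job shuffles 6 records without the combiner and 5 with it, while the final counts are identical. The saving grows as keys repeat more often within a single mapper's input.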

Skills you'll gain

  • Distributed Computing
  • Apache Hadoop
  • Data Processing
  • Software Architecture
  • Performance Tuning
  • Java
  • Scalability
  • Big Data
  • System Configuration

What you'll learn

  • Learn to set up and configure Hive, Pig, HBase, and Spark for efficient big data analysis and processing within the Hadoop ecosystem.

  • Master Hive’s SQL-like queries for data retrieval, management, and optimization using partitions and joins to enhance query performance.

  • Understand Pig Latin for scripting data transformations, including the use of operators like join and debug to process large datasets effectively.

  • Gain expertise in NoSQL databases with HBase for real-time read/write operations, and use Spark’s core programming model for fast data processing.

Skills you'll gain

  • Query Languages
  • Data Processing
  • Apache Spark
  • Apache Hadoop
  • Data Transformation
  • Big Data
  • NoSQL
  • Apache Hive
  • Data Manipulation
  • Data Management
  • Scripting Languages
  • SQL

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Karthik Shyamsunder
Johns Hopkins University
4 courses, 1,018 learners

Offered by Johns Hopkins University
