Big data is the area of informatics focusing on datasets whose size is beyond the ability of typical database and other software tools to capture, store, analyze and manage. This course provides a rapid immersion into the area of big data and the technologies which have recently emerged to manage it.

Big Data Technologies

This course is part of multiple programs.

Instructor: Yousef Elmehdwi
Recommended experience
Intermediate level
Familiarity with Linux Shell (Bash)/Operating Systems, Familiarity with Relational Database (SQL)/Management Systems
What you'll learn
Understand and identify use cases and domains of Big Data problems
Select and implement technical solutions involving Big Data systems
Develop and use various open source software systems (Apache) in the Big Data tech stack
Operate and run various cloud computing software services (AWS) in the Big Data infrastructure space
Details to know

54 assignments

There are 9 modules in this course
Welcome to Big Data Technologies! In Module 1, students will develop a foundational understanding of analytic data, its inherent value, and the methods to transform raw data into valuable insights. This module covers the challenges of handling large datasets, including their collection, processing, and analysis, while providing a comprehensive overview of Big Data's origins, properties, and real-world applications. Additionally, students will explore the economic, logistical, and ethical concerns associated with Big Data, alongside the professional advantages for data scientists proficient in Big Data analysis.
What's included
16 videos, 10 readings, 8 assignments, 1 discussion prompt
16 videos•Total 104 minutes
- Course Overview•4 minutes
- Instructor Introduction•2 minutes
- Module 1 Introduction•2 minutes
- From Data to Value - Part 1•10 minutes
- From Data to Value - Part 2•7 minutes
- Big Data Overview - Part 1•8 minutes
- Big Data Overview - Part 2•6 minutes
- Confounding Factors - Part 1•8 minutes
- Confounding Factors - Part 2•7 minutes
- Confounding Factors - Part 3•6 minutes
- Big Data Challenges•6 minutes
- Big Data Benefits - Part 1•6 minutes
- Big Data Benefits - Part 2•5 minutes
- Big Data Technology - Part 1•10 minutes
- Big Data Technology - Part 2•8 minutes
- Generic Distributed Storage Systems and Execution Engines•11 minutes
10 readings•Total 500 minutes
- Syllabus•10 minutes
- Module 1 Introduction Reading•60 minutes
- From Data to Value•60 minutes
- Big Data Overview•60 minutes
- Confounding Factors•60 minutes
- Big Data Challenges•60 minutes
- Big Data Benefits•60 minutes
- Big Data Technology•60 minutes
- Generic Distributed Storage Systems and Execution Engines•60 minutes
- Module 1 Summary•10 minutes
8 assignments•Total 330 minutes
- From Data to Value Quiz•15 minutes
- Big Data Overview Quiz•15 minutes
- Confounding Factors Quiz•15 minutes
- Big Data Challenges Quiz•15 minutes
- Big Data Benefits Quiz•15 minutes
- Big Data Technology Quiz•15 minutes
- Creating an AWS Account Assignment•120 minutes
- Module 1 Summative Assessment•120 minutes
1 discussion prompt•Total 10 minutes
- Meet and Greet Discussion•10 minutes
Module 2 introduces students to the challenges of building and managing distributed systems for big data storage and processing. It covers Hadoop’s origins, concepts, core components, and key characteristics, while exploring the Hadoop ecosystem's tools and services. Students will gain an understanding of distributed file systems, specifically HDFS, YARN's resource management, and various technologies for effective big data storage and organization.
What's included
13 videos, 7 readings, 6 assignments
13 videos•Total 91 minutes
- Module 2 Introduction•2 minutes
- Hadoop - Part 1•9 minutes
- Hadoop - Part 2•6 minutes
- Hadoop - Part 3•7 minutes
- Hadoop Distributed File System Overview - Part 1•7 minutes
- Hadoop Distributed File System Overview - Part 2•8 minutes
- Hadoop Distributed File System Overview - Part 3•6 minutes
- Using the Hadoop Distributed File System - Part 1•9 minutes
- Using the Hadoop Distributed File System - Part 2•5 minutes
- Cloud Object Storage for Big Data - Part 1•9 minutes
- Cloud Object Storage for Big Data - Part 2•8 minutes
- Yet Another Resource Negotiator - Part 1•9 minutes
- Yet Another Resource Negotiator - Part 2•6 minutes
7 readings•Total 370 minutes
- Module 2 Introduction Reading•60 minutes
- Hadoop•60 minutes
- Hadoop Distributed File System Overview•60 minutes
- Using the Hadoop Distributed File System•60 minutes
- Cloud Object Storage for Big Data•60 minutes
- Yet Another Resource Negotiator•60 minutes
- Module 2 Summary•10 minutes
6 assignments•Total 195 minutes
- Hadoop Quiz•15 minutes
- Hadoop Distributed File System (HDFS) Overview Quiz•15 minutes
- Using HDFS Quiz•15 minutes
- Cloud Object Storage Quiz•15 minutes
- Yet Another Resource Negotiator (YARN) Quiz•15 minutes
- Module 2 Summative Assessment•120 minutes
In Module 3, students will explore the differences between processing small to moderate versus massive data volumes through distributed computing. This module covers the key concepts of the MapReduce framework, including how it breaks down large data processing tasks into smaller, parallel tasks for efficient execution. Students will also learn about the phases of MapReduce, the role of map and reduce functions, optimization patterns, and the benefits and limitations of various development approaches, including Java-based MapReduce and Hadoop Streaming.
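As a rough illustration of the map and reduce phases mentioned above, here is a minimal single-process sketch of a word count in plain Python. It is not the Hadoop API and is not taken from the course materials; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the
    # map step easy to parallelize across machines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big tools", "data tools data"]
    print(word_count(sample))  # {'big': 2, 'data': 3, 'tools': 2}
```

Because each input line is mapped on its own and each key is reduced on its own, both steps can be split into smaller tasks, which is the property the module description refers to.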
What's included
18 videos, 8 readings, 7 assignments
18 videos•Total 120 minutes
- Module 3 Introduction•2 minutes
- The Path to MapReduce - Part 1•8 minutes
- The Path to MapReduce - Part 2•7 minutes
- MapReduce Overview - Part 1•6 minutes
- MapReduce Overview - Part 2•5 minutes
- MapReduce Overview - Part 3•7 minutes
- MapReduce Concepts - Part 1•6 minutes
- MapReduce Concepts - Part 2•5 minutes
- MapReduce Concepts - Part 3•6 minutes
- MapReduce Concepts - Part 4•10 minutes
- MapReduce Examples - Part 1•9 minutes
- MapReduce Examples - Part 2•5 minutes
- MapReduce Programming - Part 1•8 minutes
- MapReduce Programming - Part 2•10 minutes
- MapReduce Programming - Part 3•6 minutes
- MapReduce Optimization - Part 1•8 minutes
- MapReduce Optimization - Part 2•4 minutes
- MapReduce Optimization - Part 3•8 minutes
8 readings•Total 430 minutes
- Module 3 Introduction Reading•60 minutes
- The Path to MapReduce•60 minutes
- MapReduce Overview•60 minutes
- MapReduce Concepts•60 minutes
- MapReduce Examples•60 minutes
- MapReduce Programming•60 minutes
- MapReduce Optimization•60 minutes
- Module 3 Summary•10 minutes
7 assignments•Total 210 minutes
- The Path to MapReduce Quiz•15 minutes
- MapReduce Overview Quiz•15 minutes
- MapReduce Concepts Quiz•15 minutes
- MapReduce Examples Quiz•15 minutes
- MapReduce Programming•15 minutes
- MapReduce Optimization•15 minutes
- Module 3 Summative Assessment•120 minutes
In Module 4, students will explore Apache Spark as a powerful distributed processing framework for interactive, batch, and streaming tasks. This module covers Spark's core functionalities, including machine learning, graph processing, and handling structured and unstructured data, while highlighting its in-memory processing potential and unified nature. Students will compare Spark with MapReduce, learn about Spark's primary components, execution architecture, Resilient Distributed Datasets (RDDs), DataFrames, Datasets, and the various methods for creating and optimizing DataFrames for efficient data processing.
What's included
25 videos, 7 readings, 6 assignments
25 videos•Total 143 minutes
- Module 4 Introduction•2 minutes
- Spark Overview - Part 1•9 minutes
- Spark Overview - Part 2•9 minutes
- Spark Components - Part 1•7 minutes
- Spark Components - Part 2•6 minutes
- Spark Components - Part 3•6 minutes
- Spark Components - Part 4•7 minutes
- Spark Components - Part 5•3 minutes
- Spark Concepts - Part 1•7 minutes
- Spark Concepts - Part 2•6 minutes
- Spark Concepts - Part 3•5 minutes
- Spark Concepts - Part 4•7 minutes
- Spark Concepts - Part 5•6 minutes
- Spark Concepts - Part 6•4 minutes
- Spark Concepts - Part 7•7 minutes
- Spark Concepts - Part 8•3 minutes
- Spark Concepts - Part 9•5 minutes
- Spark Concepts - Part 10•5 minutes
- Creating Spark DataFrames - Part 1•6 minutes
- Creating Spark DataFrames - Part 2•9 minutes
- Creating Spark DataFrames - Part 3•6 minutes
- Creating Spark DataFrames - Part 4•4 minutes
- Defining Spark Schemas - Part 1•6 minutes
- Defining Spark Schemas - Part 2•5 minutes
- Defining Spark Schemas - Part 3•2 minutes
7 readings•Total 370 minutes
- Module 4 Introduction Reading•60 minutes
- Spark Overview•60 minutes
- Spark Components•60 minutes
- Spark Concepts•60 minutes
- Creating Spark DataFrames•60 minutes
- Defining Spark Schemas•60 minutes
- Module 4 Summary•10 minutes
6 assignments•Total 195 minutes
- Spark Overview Quiz•15 minutes
- Spark Components Quiz•15 minutes
- Concepts Quiz•15 minutes
- Creating Spark DataFrames Quiz•15 minutes
- Defining Spark Schemas Quiz•15 minutes
- Module 4 Summative Assessment•120 minutes
In Module 5, students will delve deeper into Spark's capabilities for data manipulation and transformation. The module covers essential operations such as selecting, filtering, and sorting data, as well as joining DataFrames and performing aggregations. Students will also learn about handling null values, using Spark SQL for data queries, and optimizing performance with caching. Practical applications include creating and manipulating DataFrames, executing transformations and actions, and efficiently writing data to various formats.
What's included
19 videos, 11 readings, 10 assignments
19 videos•Total 103 minutes
- Module 5 Introduction•2 minutes
- Transformation - Rows - Part 1•10 minutes
- Transformation - Rows - Part 2•5 minutes
- Transformation - Rows - Part 3•4 minutes
- Transformations Columns - Part 1•9 minutes
- Transformations Columns - Part 2•4 minutes
- Transformations Join - Part 1•4 minutes
- Transformations Join - Part 2•4 minutes
- Transformations - Aggregations - Part 1•7 minutes
- Transformations - Aggregations - Part 2•5 minutes
- Transformations - Working with Null Values - Part 1•5 minutes
- Transformations - Working with Null Values - Part 2•5 minutes
- Transformations - Spark SQL - Part 1•6 minutes
- Transformations - Spark SQL - Part 2•4 minutes
- Transformations - Caching - Part 1•4 minutes
- Transformations - Caching - Part 2•5 minutes
- Actions•10 minutes
- Actions - Writing Data - Part 1•5 minutes
- Actions - Writing Data - Part 2•5 minutes
11 readings•Total 610 minutes
- Module 5 Introduction Reading•60 minutes
- Transformation - Rows•60 minutes
- Transformations - Columns•60 minutes
- Transformations - Join•60 minutes
- Transformations - Aggregations•60 minutes
- Transformations - Working with Null Values•60 minutes
- Transformations - Spark SQL•60 minutes
- Transformations - Caching•60 minutes
- Actions•60 minutes
- Actions - Writing Data•60 minutes
- Module 5 Summary•10 minutes
10 assignments•Total 255 minutes
- Transformations - Rows Quiz•15 minutes
- Transformations - Columns Quiz•15 minutes
- Transformations - Join Quiz•15 minutes
- Transformations/Actions - Aggregations Quiz•15 minutes
- Transformations - Working with Null Values Quiz•15 minutes
- Transformations - Spark SQL Quiz•15 minutes
- Transformations - Caching Quiz•15 minutes
- Transformations - Actions Quiz•15 minutes
- Actions - Writing Data Quiz•15 minutes
- Module 5 Summative Assessment•120 minutes
Module 6 introduces students to the limitations of batch processing and the significance of real-time data processing. It covers essential aspects of stream processing, including data ingestion and analysis, with a focus on tools like Apache Kafka for stream ingestion and Spark Structured Streaming for scalable and fault-tolerant data processing. Students will also explore various design patterns for organizing big data clusters, the concept of data lakes, and the Lambda Architecture for unifying real-time and batch data processing in modern data environments.
What's included
16 videos, 6 readings, 6 assignments
16 videos•Total 106 minutes
- Module 6 Introduction•3 minutes
- Stream Ingestion and Processing I - Part 1•9 minutes
- Stream Ingestion and Processing I - Part 2•8 minutes
- Stream Ingestion and Processing I - Part 3•8 minutes
- Stream Ingestion and Processing II - Part 1•6 minutes
- Stream Ingestion and Processing II - Part 2•3 minutes
- Stream Ingestion and Processing II - Part 3•5 minutes
- Stream Ingestion and Processing II - Part 4•7 minutes
- Analytic Cluster Pattern - Part 1•7 minutes
- Analytic Cluster Pattern - Part 2•7 minutes
- Data Lake Pattern - Part 1•6 minutes
- Data Lake Pattern - Part 2•6 minutes
- Data Lake Pattern - Part 3•6 minutes
- Lambda Architecture - Part 1•10 minutes
- Lambda Architecture - Part 2•8 minutes
- Lambda Architecture - Part 3•8 minutes
6 readings•Total 310 minutes
- Stream Ingestion and Processing (Part 1)•60 minutes
- Stream Ingestion and Processing (Part 2)•60 minutes
- Analytic Cluster Pattern•60 minutes
- Data Lake Pattern•60 minutes
- Lambda Architecture•60 minutes
- Module 6 Summary•10 minutes
6 assignments•Total 195 minutes
- Stream Ingestion and Processing (Part 1) Quiz•15 minutes
- Stream Ingestion and Processing (Part 2) Quiz•15 minutes
- Analytic Cluster Pattern Quiz•15 minutes
- Data Lake Pattern Quiz•15 minutes
- Lambda Architecture Quiz•15 minutes
- Module 6 Summative Assessment•120 minutes
In Module 7, students will explore the benefits and limitations of relational databases in big data contexts and the concept of distributed database systems. This module covers NoSQL databases, their diverse data models, and their scalability and flexibility advantages. Students will also learn about real-world use cases, data partitioning, consistency models, and the CAP Theorem, gaining a comprehensive understanding of how NoSQL databases manage large datasets across clusters while ensuring scalability and availability.
What's included
18 videos, 6 readings, 6 assignments
18 videos•Total 121 minutes
- Module 7 Introduction•3 minutes
- Using Databases for Big Data Storage - Part 1•10 minutes
- Using Databases for Big Data Storage - Part 2•6 minutes
- Using Databases for Big Data Storage - Part 3•3 minutes
- Using Databases for Big Data Storage - Part 4•9 minutes
- Using Databases for Big Data Storage - Part 5•6 minutes
- NoSQL Database Concepts I - Part 1•7 minutes
- NoSQL Database Concepts I - Part 2•7 minutes
- NoSQL Database Concepts I - Part 3•4 minutes
- NoSQL Database Concepts II - Part 1•5 minutes
- NoSQL Database Concepts II - Part 2•3 minutes
- NoSQL Database Concepts II - Part 3•7 minutes
- NoSQL Database Classifications I - Part 1•11 minutes
- NoSQL Database Classifications I - Part 2•7 minutes
- NoSQL Database Classifications I - Part 3•9 minutes
- NoSQL Database Classifications II - Part 1•12 minutes
- NoSQL Database Classifications II - Part 2•5 minutes
- NoSQL Database Classifications II - Part 3•6 minutes
6 readings•Total 310 minutes
- Using Databases for Big Data Storage•60 minutes
- NoSQL Database Concepts (Part 1)•60 minutes
- NoSQL Database Concepts (Part 2)•60 minutes
- NoSQL Database Classifications (Part 1)•60 minutes
- NoSQL Database Classifications (Part 2)•60 minutes
- Module 7 Summary•10 minutes
6 assignments•Total 195 minutes
- Using Databases for Big Data Storage Quiz•15 minutes
- NoSQL Database Concepts (Part 1) Quiz•15 minutes
- NoSQL Database Concepts (Part 2) Quiz•15 minutes
- NoSQL Database Classifications (Part 1) Quiz•15 minutes
- NoSQL Database Classifications (Part 2) Quiz•15 minutes
- Module 7 Summative Assessment•120 minutes
In Module 8, students will explore specific NoSQL database types, namely Key-Value, Wide-Column, and Document databases. Two similar systems, HBase and Cassandra, will be studied and contrasted in the context of the CAP theorem and the associated CP/AP trade-offs. Consistency and availability will be discussed in specific usage scenarios for both HBase and Cassandra, and the general application domains of both systems will be highlighted. Finally, the document database MongoDB will be reviewed in the context of natural language and text processing use cases, and MongoDB's usage and architecture will be analyzed relative to a traditional RDBMS.
What's included
9 videos, 4 readings, 4 assignments
9 videos•Total 71 minutes
- Module 8 Introduction•0 minutes
- HBase Pt. 1•10 minutes
- HBase Pt. 2•7 minutes
- HBase Pt. 3•6 minutes
- Cassandra Pt. 1•11 minutes
- Cassandra Pt. 2•11 minutes
- MongoDB Pt. 1•9 minutes
- MongoDB Pt. 2•7 minutes
- MongoDB Pt. 3•10 minutes
4 readings•Total 190 minutes
- HBase•60 minutes
- Dynamo and Cassandra•60 minutes
- MongoDB•60 minutes
- Module 8 Summary•10 minutes
4 assignments•Total 165 minutes
- HBase Quiz•15 minutes
- Cassandra Quiz•15 minutes
- MongoDB Quiz•15 minutes
- Module 8 Summative Assessment•120 minutes
This module contains the summative course assessment that has been designed to evaluate your understanding of the course material and assess your ability to apply the knowledge you have acquired throughout the course.
What's included
1 assignment
1 assignment•Total 180 minutes
- Summative Course Assessment•180 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Build toward a degree
This course is part of the following degree program(s) offered by Illinois Tech. If you are admitted and enroll, your completed coursework may count toward your degree learning and your progress can transfer with you.¹
Illinois Tech
Master of Data Science
Degree · 12-15 months
¹Successful application and enrollment are required. Eligibility requirements apply. Each institution determines the number of credits recognized by completing this content that may count towards degree requirements, considering any existing credits you may have. Click on a specific course for more information.
Offered by

Illinois Tech is a top-tier, nationally ranked, private research university with programs in engineering, computer science, architecture, design, science, business, human sciences, and law. The university offers bachelor of science, master of science, professional master’s, and Ph.D. degrees—as well as certificates for in-demand STEM fields and other areas of innovation. Talented students from around the world choose to study at Illinois Tech because of the access to real-world opportunities, renowned academic programs, high value, and career prospects of graduates.