Every time you use Google to search for something, every time you use Facebook, Twitter, Instagram, or any other SNS (social networking service), and every time you buy from a list of recommended products on Amazon.com, you are using a big data system. Big data technology also supports your smartphone, smartwatch, Alexa, Siri, and automobile (if it is a newer model) every day. The world's top companies already use big data technology, and every company needs advanced big data technology support. Simply put, big data technology is not an option for your company; it is a necessity for survival and growth. So now is the right time to learn what big data is and how to use it to your company's advantage. This six-module course first covers the world's industry market-share rankings for big data hardware, software, and professional services, and then surveys the top big data product lines and service types of the major big data companies. The lectures then explain how big data analysis is made possible by three of the world's most popular big data technologies: Hadoop, Spark, and Storm. The last part provides hands-on experience with IBM SPSS Statistics, one of the most famous and widely used big data statistical analysis systems in the world. This course is designed to help you succeed in business strategic planning in the upcoming big data era. Welcome to the amazing Big Data world!
Big Data Emerging Technologies
This course is part of the Emerging Technologies: From Smartphones to IoT to Big Data Specialization
22,895 already enrolled
Details to know
5 quizzes, 5 assessments
There are 6 modules in this course
The first module “Big Data Rankings & Products” focuses on the relationships among, and market shares of, big data hardware, software, and professional services. This information provides insight into how future industries, products, services, schools, and government organizations will be influenced by big data technology. To give a deeper view of the world's top big data product lines and service types, the lecture provides an overview of the major big data companies, including IBM, SAP, Oracle, HPE, Splunk, Dell, Teradata, Microsoft, Cisco, and AWS. To show the power of big data technology, the module explains how big data analysis differs from traditional data analysis. This is followed by a lecture on the four big challenges of big data technology, the 4 Vs, which concern the volume, variety, velocity, and veracity of massive data. Building on this introduction, the module shows how Wal-Mart, Amazon, and Citibank use big data technology to add global insight into investments, help locate new stores and factories, and run real-time recommendation systems.
6 videos, 1 quiz
The second module “Big Data & Hadoop” focuses on the characteristics and operations of Hadoop, the foundational open-source big data system, which was modeled on Google's published MapReduce and Google File System (GFS) designs. The lectures explain the functionality of MapReduce, HDFS (Hadoop Distributed File System), and the processing of data blocks. These functions run on a cluster of nodes, each assigned the role of NameNode or DataNode, and data processing is carried out by the JobTracker and TaskTrackers, which the lectures also explain. In addition, the module covers the characteristics of metadata types and the differences between the data analysis processes of Hadoop and SQL (Structured Query Language). Finally, it introduces the Hadoop release series, including Hadoop YARN (Yet Another Resource Negotiator), HDFS Federation, and HDFS HA (High Availability).
8 videos, 1 quiz
The third module “Spark” focuses on the operations and characteristics of Spark, which is currently the most popular big data technology in the world. The lecture first covers how the data analysis characteristics of Spark differ from those of Hadoop, then describes the features of Spark big data processing based on RDDs (Resilient Distributed Datasets) and the Spark Core, Spark SQL, Spark Streaming, MLlib (Machine Learning Library), and GraphX components. Next, it explains how Spark transformations and actions form DAG (Directed Acyclic Graph) stages and pipelines. In particular, it describes the definition and advantages of lazy transformations and DAG operations, along with the characteristics of Spark variables and serialization. Finally, it introduces how Spark clusters operate under the Mesos, Standalone, and YARN cluster managers.
11 videos, 1 quiz
The fourth module “Spark ML & Streaming” focuses on how Spark ML (Machine Learning) works and how Spark Streaming operations are conducted. Alongside its learning algorithms, Spark ML provides tools for featurization, pipelines, persistence, and utilities that help extract information from massive datasets. The lectures explain the characteristics of the DataFrame-based API, which is the primary ML API in the spark.ml package. Spark ML basic statistics based on correlation and hypothesis testing (p-values) are introduced first, followed by Spark ML classification and regression algorithms based on linear models, naive Bayes, and decision trees. Then the characteristics of Spark Streaming, streaming input and output, and streaming receiver types (basic, custom, and advanced) are explained, followed by how the Spark Streaming process and DStreams (Discretized Streams) enable big data streaming for real-time and near-real-time applications.
4 videos, 1 quiz
The fifth module “Storm” focuses on the characteristics and operations of Storm big data systems. The lecture first covers how the data analysis characteristics of Storm differ from those of Spark and Hadoop. It then describes the features of Storm big data processing based on Nimbus, spouts, and bolts, followed by details on Storm streams, supervisors, and ZooKeeper. Further details are provided on reliable and unreliable spouts and bolts, followed by the advantages of the Storm DAG (Directed Acyclic Graph) and data stream queue management. Finally, the module introduces the advantages of using Storm for fast real-time applications, including real-time analytics, online ML (Machine Learning), continuous computation, DRPC (Distributed Remote Procedure Call), and ETL (Extract, Transform, Load).
5 videos, 1 quiz
The sixth and last module “IBM SPSS Statistics Project” provides hands-on experience with one of the most famous and widely used big data statistical analysis systems in the world. The lecture starts with how to set up and use IBM SPSS Statistics, and then describes how IBM SPSS Statistics can be used to gain corporate data analysis experience. Students then carry out two projects that process data and produce statistical results with the IBM SPSS Statistics big data system. Through these projects, students discover new ways to use and analyze datasets, chart the relationships between them, and compare statistical results using IBM SPSS Statistics.