Date: 2022-05-01
Bernard Marr defines Big Data as the digital trace that we generate in this digital era. In this course, you will learn about the characteristics of Big Data and its application in Big Data Analytics. You will gain an understanding of the features, benefits, limitations, and applications of some of the Big Data processing tools. You’ll explore how Hadoop and Hive help leverage the benefits of Big Data while overcoming some of the challenges it poses.
7 weeks of study, 1-2 hours / week
What you will learn
Explain the impact of Big Data, including use cases, tools, and processing methods.
Explain Apache Hadoop architecture, ecosystem, and practices, and use related applications including HDFS, HBase, Spark, and MapReduce.
Apply Spark programming basics, including parallel programming basics for DataFrames, data sets, and Spark SQL.
Use Spark’s RDDs and data sets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.
Skills you will gain
- Apache Hadoop
- SparkSQL
- SparkML
- Big Data
- Apache Spark
Offered by
IBM
IBM is the global leader in business transformation through an open hybrid cloud platform and AI, serving clients in more than 170 countries around the world. Today 47 of the Fortune 50 Companies rely on the IBM Cloud to run their business, and IBM Watson enterprise AI is hard at work in more than 30,000 engagements. IBM is also one of the world’s most vital corporate research organizations, with 28 consecutive years of patent leadership. Above all, guided by principles for trust and transparency and support for a more inclusive society, IBM is committed to being a responsible technology innovator and a force for good in the world.
Syllabus - What you will learn from this course
What is Big Data?
Begin your acquisition of Big Data knowledge with the most up-to-date definition of Big Data. You’ll explore the impact of Big Data on everyday personal tasks and business transactions with Big Data Use Cases. Learn how Big Data uses Parallel Processing, Scaling, and Data Parallelism. Learn about commonly used Big Data tools. Then, go beyond the hype and explore additional Big Data viewpoints.
Introduction to the Hadoop Ecosystem
In this module, you'll gain a fundamental understanding of the Apache Hadoop architecture, ecosystem, practices, and commonly used applications, including the Hadoop Distributed File System (HDFS), MapReduce, Hive, and HBase. Gain practical skills in this module's lab, where you launch a single-node Hadoop cluster using Docker and run MapReduce jobs.
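To give a feel for the MapReduce model mentioned above, here is a minimal sketch of a word count written in plain Python. It does not use Hadoop or any course material; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made purely for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict

# Illustrative only: these names are hypothetical and not part of any Hadoop API.

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Made-up sample input for the sketch.
sample_lines = ["big data big tools", "data tools"]
counts = word_count(sample_lines)
print(counts)
```

In a real cluster, the map and reduce steps would run in parallel on different nodes and the framework would handle the grouping; the sketch only shows how the three stages fit together.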
Apache Spark
Build your skills as you turn your attention to the popular Apache Spark platform. Explore the attributes and benefits of Apache Spark and distributed computing. You'll gain key insights about functional programming and lambda functions. Explore Resilient Distributed Datasets (RDDs), parallel programming, and resilience in Apache Spark, and relate RDDs and parallel programming to Apache Spark. Dive into additional Apache Spark components and learn how Apache Spark scales with Big Data. Working with Big Data calls for working with queries, including structured queries using SQL. Learn about the functions, parts, and benefits of Spark SQL and DataFrame queries, and discover how DataFrames work with SparkSQL.
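The functional style with lambda functions mentioned above can be sketched in plain Python, without Spark. The example below is only an analogy: Python's built-in `map` and `filter` return lazy iterators, loosely resembling how Spark defers RDD transformations until an action is called, and `reduce` plays the role of the action that forces the computation. The variable names and the input range are assumptions chosen for illustration.

```python
from functools import reduce

# Illustrative input: the integers 1 through 10.
numbers = range(1, 11)

# "Transformations": map and filter only describe the work; nothing is computed yet.
squares = map(lambda x: x * x, numbers)
even_squares = filter(lambda x: x % 2 == 0, squares)

# "Action": reduce consumes the lazy pipeline and produces a single result.
total = reduce(lambda a, b: a + b, even_squares, 0)
print(total)
```

Each lambda is a small, side-effect-free function, which is what lets a distributed engine apply it to separate partitions of the data independently.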
DataFrames and SparkSQL
Learn about Resilient Distributed Datasets (RDDs), their uses in Apache Spark, and RDD transformations and actions. You'll compare the use of datasets with Spark's latest data abstraction, DataFrames. You'll learn to identify and apply basic DataFrame operations. Explore Apache Spark SQL optimization. Learn how Spark SQL and memory optimization benefit from using Catalyst and Tungsten. Learn how to create a table view and apply data aggregation techniques. Fortify your skills in the guided hands-on lab.
Fantastic blend of theory and practical (labs). The labs are short and have concise material.
hands on lab and quizzes at the end of each session was very helpful
