IBM

Machine Learning with Apache Spark

This course is part of multiple programs.

Instructors: IBM Skills Network Team, Ramesh Sannareddy

17,969 already enrolled

Gain insight into a topic and learn the fundamentals.
4.5 (111 reviews)

Intermediate level

2 weeks to complete at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence.

  • Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines.

  • Construct data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML.

  • Connect to Spark clusters, build ML pipelines, perform feature extraction and transformation, and persist models.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

7 assignments

Taught in English

Build your subject-matter expertise

This course is available as part of
When you enroll in this course, you'll also be asked to select a specific program.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 4 modules in this course

In this module, you will learn about machine learning techniques that enable computers to perform tasks without explicit programming. You will explore the lifecycle of machine learning models and the role data engineering plays in machine learning projects. The module covers supervised and unsupervised learning techniques, including classification, regression, and clustering. You will also be introduced to generative AI and its potential to transform multiple industries, improve people's lives, and generate new and previously unimaginable data and experiences.

What's included

11 videos, 5 readings, 2 assignments, 5 app items, 1 plugin

This module introduces Spark and gives an overview of its key features and applications in data engineering. You will learn how to connect to a Spark cluster using SN Labs and work through topics such as regression, mileage prediction, classification, diabetic classification, clustering, and clustering load data, constructing each of these models with SparkML. The module also covers GraphFrames on Apache Spark and guides you through hands-on labs.

What's included

5 videos, 2 readings, 2 assignments, 5 app items

This module begins with Apache Spark Structured Streaming and its role in processing streaming data with Spark SQL, including the key terms associated with Structured Streaming. It then covers the Extract-Transform-Load (ETL) process and provides hands-on experience in moving data from a source to a destination that uses a different data format or structure. You will also gain a practical understanding of feature extraction and transformation using Spark's feature extractors and transformers. The module then covers machine learning pipelines in Spark, demonstrating how they are built and what benefits they offer. Lastly, you will learn the concept of model persistence and its role in machine learning.

What's included

6 videos, 2 readings, 2 assignments, 6 app items, 2 plugins

In this module, you will apply the data engineering skills and techniques you have acquired throughout the course. The course concludes with a final project and assignments that allow you to demonstrate your proficiency in these areas. You will step into the role of a data engineer at a renowned aeronautics consulting company known for handling large datasets. The company's data scientists have expertise in machine learning, but they rely on you to carry out ETL (Extract, Transform, Load) tasks, establish machine learning pipelines, and handle various algorithms and data formats so that their work runs smoothly.

What's included

4 readings, 1 assignment, 2 app items

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Instructor ratings
4.7 (27 ratings)
IBM Skills Network Team
84 courses, 1,569,948 learners

Offered by

IBM

Learner reviews

4.5 (111 reviews)

  • 5 stars: 76.57%
  • 4 stars: 8.10%
  • 3 stars: 9%
  • 2 stars: 0.90%
  • 1 star: 5.40%

Showing 3 of 111

  • AS, rated 5, reviewed on Feb 2, 2024
  • SS, rated 4, reviewed on Mar 22, 2024
  • RR, rated 5, reviewed on Jun 23, 2025
