IBM
ETL and Data Pipelines with Shell, Airflow and Kafka
IBM

ETL and Data Pipelines with Shell, Airflow and Kafka

This course is part of multiple programs.

Taught in English

Some content may not be translated

Yan Luo
Jeff Grossman
Sabrina Spillner

Instructors: Yan Luo

38,927 already enrolled

Course

Gain insight into a topic and learn the fundamentals

4.5

(285 reviews)

|

87%

Intermediate level

Recommended experience

16 hours (approximately)
Flexible schedule
Learn at your own pace

What you'll learn

  • Describe and contrast Extract, Transform, Load (ETL) processes and Extract, Load, Transform (ELT) processes.

  • Explain batch vs concurrent modes of execution.

  • Implement an ETL pipeline through shell scripting.

  • Describe data pipeline components, processes, tools, and technologies.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

11 quizzes

Course

Gain insight into a topic and learn the fundamentals

4.5

(285 reviews)

|

87%

Intermediate level

Recommended experience

16 hours (approximately)
Flexible schedule
Learn at your own pace

See how employees at top companies are mastering in-demand skills

Placeholder

Build your subject-matter expertise

This course is available as part of
When you enroll in this course, you'll also be asked to select a specific program.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate
Placeholder
Placeholder

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV

Share it on social media and in your performance review

Placeholder

There are 5 modules in this course

ETL or Extract, Transform, and Load processes are used for cases where flexibility, speed, and scalability of data are important. You will explore some key differences been similar processes, ETL and ELT, which include the place of transformation, flexibility, Big Data support, and time-to-insight. You will learn that there is an increasing demand for access to raw data that drives the evolution from ETL to ELT. Data extraction involves advanced technologies including database querying, web scraping, and APIs. You will also learn that data transformation is about formatting data to suit the application and that data is loaded in batches or streamed continuously.

What's included

7 videos2 readings2 quizzes1 plugin

Extract, transform and load (ETL) pipelines are created with Bash scripts that can be run on a schedule using cron. Data pipelines move data from one place, or form, to another. Data pipeline processes include scheduling or triggering, monitoring, maintenance, and optimization. Furthermore, Batch pipelines extract and operate on batches of data. Whereas streaming data pipelines ingest data packets one-by-one in rapid succession. In this module, you will learn that streaming pipelines apply when the most current data is needed. You will explore that parallelization and I/O buffers help mitigate bottlenecks. You will also learn how to describe data pipeline performance in terms of latency and throughput.

What's included

5 videos4 readings4 quizzes1 app item1 plugin

The key advantage of Apache Airflow's approach to representing data pipelines as DAGs is that they are expressed as code, which makes your data pipelines more maintainable, testable, and collaborative. Tasks, the nodes in a DAG, are created by implementing Airflow's built-in operators.​ In this module, you will learn about Apache Airflow having a rich UI that simplifies working with data pipelines. You will explore how to visualize your DAG in graph or tree mode. You will also learn about the key components of a DAG definition file, and you will learn that Airflow logs are saved into local file systems and then sent to cloud storage, search engines, and log analyzers.

What's included

5 videos2 readings2 quizzes4 app items

Apache Kafka is a very popular open source event streaming pipeline. An event is a type of data that describes the entity’s observable state updates over time. Popular Kafka service providers include Confluent Cloud, IBM Event Stream, and Amazon MSK. Additionally, Kafka Streams API is a client library supporting you with data processing in event streaming pipelines. In this module, you will learn that the core components of Kafka are brokers, topics, partitions, replications, producers, and consumers. You will explore two special types of processors in the Kafka Stream API stream-processing topology: The source processor and the sink processor. You will also learn about building event streaming pipelines using Kafka.

What's included

4 videos1 reading2 quizzes3 app items1 plugin

In this final assignment module, you will apply your newly gained knowledge to explore two very exciting hands-on labs. “Creating ETL Data Pipelines using Apache Airflow” and “Creating Streaming Data Pipelines using Kafka”. You will explore building these ETL pipelines using real-world scenarios. You will extract, transform, and load data into a CSV file. You will also create a topic named “toll” in Apache Kafka, download and customize a streaming data consumer, as well as verifying that streaming data has been collected in the database table.

What's included

5 readings1 quiz1 peer review3 app items

Instructors

Instructor ratings
4.7 (74 ratings)
Yan Luo
IBM
7 Courses267,684 learners
Jeff Grossman
IBM
2 Courses48,888 learners

Offered by

IBM

Recommended if you're interested in Data Management

Why people choose Coursera for their career

Felipe M.
Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."
Jennifer J.
Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."
Larry W.
Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."
Chaitanya A.
"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."

Learner reviews

Showing 3 of 285

4.5

285 reviews

  • 5 stars

    69.44%

  • 4 stars

    18.05%

  • 3 stars

    6.25%

  • 2 stars

    3.47%

  • 1 star

    2.77%

ED
5

Reviewed on Sep 28, 2021

JJ
5

Reviewed on Jul 22, 2023

DL
5

Reviewed on Sep 6, 2022

New to Data Management? Start here.

Placeholder

Open new doors with Coursera Plus

Unlimited access to 7,000+ world-class courses, hands-on projects, and job-ready certificate programs - all included in your subscription

Advance your career with an online degree

Earn a degree from world-class universities - 100% online

Join over 3,400 global companies that choose Coursera for Business

Upskill your employees to excel in the digital economy

Frequently asked questions