When will I receive my Course Certificate?

If you complete the course successfully, your electronic Course Certificate will be added to your Accomplishments page - from there, you can print your Course Certificate or add it to your LinkedIn profile.

Why can’t I audit this course?

This course is currently available only to learners who have paid or received financial aid, when available.

Build Batch Data Pipelines on Google Cloud

This course is part of multiple programs.

Instructor: Google Cloud Training

50,820 already enrolled

4 modules

Gain insight into a topic and learn the fundamentals.

1,712 reviews

Intermediate level

Some related experience required

Flexible schedule

1 week at 10 hours a week

Learn at your own pace

84%

Most learners liked this course

What you'll learn

Determine whether batch data pipelines are the correct choice for your business use case.
Design and build scalable batch data pipelines for high-volume ingestion and transformation.
Implement data quality controls within batch pipelines to ensure data integrity.
Orchestrate, manage, and monitor batch data pipeline workflows, implementing error handling and observability using logging and monitoring tools.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments

Taught in English

Build your subject-matter expertise

This course is available as part of

When you enroll in this course, you'll also be asked to select a specific program.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

In this intermediate course, you will learn to design, build, and optimize robust batch data pipelines on Google Cloud. Moving beyond fundamental data handling, you will explore large-scale data transformations and efficient workflow orchestration, essential for timely business intelligence and critical reporting.

You will learn the critical role of a data engineer in developing and maintaining batch data pipelines, understand their core components and lifecycle, and analyze common challenges in batch data processing. You'll also identify key Google Cloud services that address these challenges.

What's included

1 assignment, 2 plugins

You will design scalable batch data pipelines for high-volume data ingestion and transformation. You'll also optimize batch jobs for high throughput and cost-efficiency using various resource management and performance tuning techniques.

What's included

1 assignment, 2 app items, 7 plugins

1 assignment (total 15 minutes)

Module 2 Quiz: Design and transformations (15 minutes)

2 app items (total 120 minutes)

Lab: Build a Simple Batch Data Pipeline with Serverless for Apache Spark (60 minutes)
Lab: Build a Simple Batch Data Pipeline with Dataflow Job Builder UI (60 minutes)

7 plugins (total 105 minutes)

Design batch pipelines (15 minutes)
Large scale data transformations (15 minutes)
Dataflow and Serverless for Apache Spark (15 minutes)
Data connections and orchestration (15 minutes)
Execute an Apache Spark pipeline (15 minutes)
Optimize batch pipeline performance (15 minutes)
Accessing and completing labs (15 minutes)

You will develop data validation rules and cleansing logic to ensure data quality within batch pipelines. You'll also implement strategies for managing schema evolution and performing data deduplication in large datasets.
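As a rough illustration of what validation rules and deduplication can look like, here is a minimal sketch in plain Python. It is not taken from the course, whose labs use Serverless for Apache Spark and Dataflow. The record fields (`order_id`, `amount`, `updated_at`) and all function names are hypothetical, chosen only for this example.

```python
# Hypothetical schema: every record is expected to carry these fields.
REQUIRED_FIELDS = ("order_id", "amount", "updated_at")


def validate(record):
    """Return a list of rule violations; an empty list means the record is valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            errors.append(f"missing {field}")
    amount = record.get("amount")
    if amount is not None and (not isinstance(amount, (int, float)) or amount < 0):
        errors.append("amount must be a non-negative number")
    return errors


def split_valid_invalid(records):
    """Route each record to the valid list or to a rejected list with its errors."""
    valid, rejected = [], []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            valid.append(record)
    return valid, rejected


def deduplicate(records):
    """Keep only the most recent record per order_id, compared by updated_at."""
    latest = {}
    for record in records:
        key = record["order_id"]
        if key not in latest or record["updated_at"] > latest[key]["updated_at"]:
            latest[key] = record
    return list(latest.values())
```

Keeping rejected records together with their error messages, rather than silently dropping them, is one way to make failures available for later logging and analysis. The "latest record wins" rule is only one possible deduplication policy.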

What's included

1 assignment, 1 app item, 6 plugins

1 assignment (total 8 minutes)

Module 3 Quiz: Data validation and schema evolution (8 minutes)

1 app item (total 60 minutes)

Lab: Validate Data Quality in a Batch Pipeline with Serverless for Apache Spark (60 minutes)

6 plugins (total 90 minutes)

Batch data validation and cleansing (15 minutes)
Log and analyze errors (15 minutes)
Schema evolution for batch pipelines (15 minutes)
Data integrity and duplication (15 minutes)
Deduplication with Serverless for Apache Spark (15 minutes)
Deduplication with Dataflow (15 minutes)

You will orchestrate complex batch data pipeline workflows for efficient scheduling and lineage tracking. You'll also implement robust error handling, monitoring, and observability for batch data pipelines.
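To make the idea of orchestration with error handling concrete, the sketch below runs tasks in dependency order (a DAG), retries failures, and logs each attempt. It uses only the Python standard library and is not Cloud Composer or Apache Airflow code. The function `run_pipeline`, its parameters, and the retry policy are assumptions made for illustration.

```python
import logging
from graphlib import TopologicalSorter  # standard library, Python 3.9+

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of task names it depends on
    Returns the names of the tasks that completed, in execution order.
    Stops at the first task that fails on every attempt.
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                log.info("task %s succeeded on attempt %d", name, attempt)
                completed.append(name)
                break
            except Exception:
                log.exception("task %s failed on attempt %d", name, attempt)
        else:
            # Every attempt failed: do not run anything scheduled after this task.
            log.error("task %s exhausted retries; halting the pipeline", name)
            return completed
    return completed
```

A call such as `run_pipeline(tasks, {"extract": set(), "transform": {"extract"}, "load": {"transform"}})` would run the three hypothetical tasks in that order. A managed orchestrator adds scheduling, lineage tracking, and alerting on top of this basic pattern.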

What's included

2 assignments, 1 app item, 6 plugins

2 assignments (total 30 minutes)

Module 4 Quiz: Orchestrations and DAGs (15 minutes)
Module 4 Quiz: Observability (15 minutes)

1 app item (total 90 minutes)

Lab: Building Batch Pipelines in Cloud Data Fusion (90 minutes)

6 plugins (total 90 minutes)

Orchestration for batch processing (15 minutes)
Cloud Composer (15 minutes)
Unified observability (15 minutes)
Alerts and troubleshooting (15 minutes)
Visual pipeline management (15 minutes)
Congratulations: Course summary (15 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Instructor ratings

(252 ratings)

Google Cloud Training

Google Cloud

2,281 courses, 4,462,864 learners

Offered by

Google Cloud

Learner reviews

5 stars: 65.90%
4 stars: 25.74%
3 stars: 6.01%
2 stars: 1.45%
1 star: 0.87%

Showing 3 of 1712

Reviewed on May 19, 2020

Informative on various features. But cloud fusion and dataflow are not very clearly explained in detail.. expecting more on this. Want to learn more on the pipeline topic please.

Reviewed on Jun 18, 2020

Excellent course with appropriate explanation on cloud data fusion, data composer, data proc and cloud data-flow. Must learn course for all aspiring Big Data Engineers.

Reviewed on Jul 9, 2020

This course really teaches me in-depth about data engineering than the cloud or any other products offered by GCP which is the most important part.

Frequently asked questions

Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.

If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You’ll be able to submit assignments once the session starts.

Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.