Build Batch Data Pipelines on Google Cloud

This course is part of multiple programs.

Instructor: Google Cloud Training

50,379 already enrolled

4 modules

Gain insight into a topic and learn the fundamentals.

1,712 reviews

Intermediate level

Some related experience required

Flexible schedule

1 week at 10 hours a week

Learn at your own pace

84%

Most learners liked this course

What you'll learn

Determine whether batch data pipelines are the correct choice for your business use case.
Design and build scalable batch data pipelines for high-volume ingestion and transformation.
Implement data quality controls within batch pipelines to ensure data integrity.
Orchestrate, manage, and monitor batch data pipeline workflows, implementing error handling and observability using logging and monitoring tools.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

5 assignments

Taught in English

Build your subject-matter expertise

When you enroll in this course, you'll also be asked to select a specific program.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 4 modules in this course

In this intermediate course, you will learn to design, build, and optimize robust batch data pipelines on Google Cloud. Moving beyond fundamental data handling, you will explore large-scale data transformations and efficient workflow orchestration, essential for timely business intelligence and critical reporting.

You will learn the critical role of a data engineer in developing and maintaining batch data pipelines, understand their core components and lifecycle, and analyze common challenges in batch data processing. You'll also identify key Google Cloud services that address these challenges.

What's included

1 assignment, 2 plugins

You will design scalable batch data pipelines for high-volume data ingestion and transformation. You'll also optimize batch jobs for high throughput and cost-efficiency using various resource management and performance tuning techniques.

What's included

1 assignment, 2 app items, 7 plugins

1 assignment (total 15 minutes)

Module 2 Quiz: Design and transformations (15 minutes)

2 app items (total 120 minutes)

Lab: Build a Simple Batch Data Pipeline with Serverless for Apache Spark (60 minutes)
Lab: Build a Simple Batch Data Pipeline with Dataflow Job Builder UI (60 minutes)

7 plugins (total 105 minutes)

Design batch pipelines (15 minutes)
Large scale data transformations (15 minutes)
Dataflow and Serverless for Apache Spark (15 minutes)
Data connections and orchestration (15 minutes)
Execute an Apache Spark pipeline (15 minutes)
Optimize batch pipeline performance (15 minutes)
Accessing and completing labs (15 minutes)

You will develop data validation rules and cleansing logic to ensure data quality within batch pipelines. You'll also implement strategies for managing schema evolution and performing data deduplication in large datasets.

What's included

1 assignment, 1 app item, 6 plugins

1 assignment (total 8 minutes)

Module 3 Quiz: Data validation and schema evolution (8 minutes)

1 app item (total 60 minutes)

Lab: Validate Data Quality in a Batch Pipeline with Serverless for Apache Spark (60 minutes)

6 plugins (total 90 minutes)

Batch data validation and cleansing (15 minutes)
Log and analyze errors (15 minutes)
Schema evolution for batch pipelines (15 minutes)
Data integrity and duplication (15 minutes)
Deduplication with Serverless for Apache Spark (15 minutes)
Deduplication with Dataflow (15 minutes)
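
The validation and deduplication topics in this module can be illustrated with a small sketch in plain Python. It is not taken from the course: the labs use Serverless for Apache Spark and Dataflow, which this sketch does not touch. The record fields (`order_id`, `amount`, `updated_at`), the two rules, and the function names are all hypothetical, chosen only to show the idea of routing failing records aside and keeping the latest version of each key.

```python
# Illustrative only: field names, rules, and function names are assumptions,
# not part of the course material or of any Google Cloud API.
RULES = {
    "order_id present": lambda r: bool(r.get("order_id")),
    "amount is non-negative number": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
}


def validate(records):
    """Split records into valid ones and (record, failed_rule_names) pairs."""
    valid, rejected = [], []
    for record in records:
        failures = [name for name, check in RULES.items() if not check(record)]
        if failures:
            rejected.append((record, failures))
        else:
            valid.append(record)
    return valid, rejected


def deduplicate(records, key="order_id", version="updated_at"):
    """Keep one record per key: the one with the greatest version value."""
    latest = {}
    for record in records:
        current = latest.get(record[key])
        if current is None or record[version] > current[version]:
            latest[record[key]] = record
    return list(latest.values())


batch = [
    {"order_id": "A1", "amount": 10.0, "updated_at": "2024-01-01"},
    {"order_id": "A1", "amount": 12.5, "updated_at": "2024-01-02"},
    {"order_id": "", "amount": 5.0, "updated_at": "2024-01-01"},
    {"order_id": "B2", "amount": -3, "updated_at": "2024-01-01"},
]

valid, rejected = validate(batch)
clean = deduplicate(valid)
print(clean)
print(rejected)
```

Rejected records are kept alongside the names of the rules they failed, so that they could be logged and analyzed rather than silently dropped. A real batch pipeline would typically write them to a separate error output.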

You will orchestrate complex batch data pipeline workflows for efficient scheduling and lineage tracking. You'll also implement robust error handling, monitoring, and observability for batch data pipelines.

What's included

2 assignments, 1 app item, 6 plugins

2 assignments (total 30 minutes)

Module 4 Quiz: Orchestrations and DAGs (15 minutes)
Module 4 Quiz: Observability (15 minutes)

1 app item (total 90 minutes)

Lab: Building Batch Pipelines in Cloud Data Fusion (90 minutes)

6 plugins (total 90 minutes)

Orchestration for batch processing (15 minutes)
Cloud Composer (15 minutes)
Unified observability (15 minutes)
Alerts and troubleshooting (15 minutes)
Visual pipeline management (15 minutes)
Congratulations: Course summary (15 minutes)
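
The orchestration and error-handling topics in this module can be sketched without any cloud service. The example below is an assumption-laden illustration, not course material: it uses only Python's standard library (`graphlib`, available from Python 3.9, and `logging`) rather than Cloud Composer or Cloud Data Fusion. The task names, the `run_pipeline` helper, and the retry count are hypothetical. It shows tasks run in dependency order as a DAG, with each attempt logged and a failing task retried before the run is aborted.

```python
import logging
from graphlib import TopologicalSorter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("batch_pipeline")


def run_pipeline(tasks, dependencies, max_attempts=2):
    """Run callables in DAG order; retry each up to max_attempts times.

    `dependencies` maps a task name to the set of tasks it depends on.
    Raises RuntimeError as soon as one task exhausts its attempts.
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                log.info("task %s succeeded on attempt %d", name, attempt)
                completed.append(name)
                break
            except Exception:
                log.exception("task %s failed on attempt %d", name, attempt)
        else:
            # No attempt succeeded: stop so downstream tasks never run.
            raise RuntimeError(f"task {name} failed after {max_attempts} attempts")
    return completed


# Hypothetical four-step pipeline; "transform" fails once, then succeeds.
attempts = {"transform": 0}


def flaky_transform():
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise ValueError("simulated transient failure")


tasks = {
    "extract": lambda: None,
    "transform": flaky_transform,
    "validate": lambda: None,
    "load": lambda: None,
}
dependencies = {
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
}

order = run_pipeline(tasks, dependencies)
print(order)
```

A managed orchestrator adds scheduling, backfills, and alerting on top of this basic pattern. The sketch covers only dependency ordering, retries, and logging.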

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Instructor ratings

(252 ratings)

Google Cloud Training

Google Cloud

2,162 courses, 4,167,754 learners

Offered by

Google Cloud

Learner reviews

5 stars: 65.94%
4 stars: 25.70%
3 stars: 6.01%
2 stars: 1.46%
1 star: 0.87%

Showing 3 of 1,712 reviews

Reviewed on May 27, 2020

A great course to help understand the various wonderful options Google Cloud has to offer to move on-premise Hadoop workload to Google Cloud Platform to leverage scalability of clusters.

Reviewed on May 19, 2020

Great course teaching how to build batch pipelines through GCP technologies, and showing cool tools for data wrangling and analysis

Reviewed on Jul 18, 2020

Good, I think pipelines need to have more labs related to some necessities in the industry, such as connect them to other external sources outside GCP
