University of California, Davis
Distributed Computing with Spark SQL

This course is part of the Learn SQL Basics for Data Science Specialization

Taught in English

Some content may not be translated

Instructors: Brooke Wenig, Conor Murphy

46,323 already enrolled

Included with Coursera Plus

Course

Gain insight into a topic and learn the fundamentals

4.5 (665 reviews) | 86%

Intermediate level
Some related experience required
13 hours (approximately)
Flexible schedule
Learn at your own pace

What you'll learn

  • Use the collaborative Databricks workspace to write scalable Spark SQL code that executes against a cluster of machines

  • Inspect the Spark UI to analyze query performance and identify bottlenecks

  • Create an end-to-end pipeline that reads data, transforms it, and saves the result

  • Build a medallion (bronze, silver, gold) lakehouse architecture with Delta Lake to ensure the reliability, scalability, and performance of your data

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

8 quizzes

Build your subject-matter expertise

This course is part of the Learn SQL Basics for Data Science Specialization
When you enroll in this course, you'll also be enrolled in this Specialization.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 4 modules in this course

In this module, you will discuss the core concepts of distributed computing and learn to recognize when and where to apply them. You'll identify the basic data structure of Apache Spark™, known as a DataFrame. Additionally, you will use the collaborative Databricks workspace and write SQL code that executes against a cluster of machines.

What's included

6 videos, 3 readings, 2 quizzes, 1 discussion prompt
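
The core idea behind this module, splitting data across workers, aggregating each piece independently, and merging the partial results, can be sketched in plain Python. This is only an illustration under stated assumptions: it uses threads on one machine rather than a Spark cluster, and the function names (`partition`, `count_by_key`, `distributed_count`) and the `city` field are hypothetical, not part of the course material or of any Spark API. It roughly mirrors what a SQL `GROUP BY city` with `COUNT(*)` does when run over partitioned data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    """Split rows into roughly equal chunks, one per (hypothetical) worker."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def count_by_key(rows):
    """Partial aggregation that runs independently on a single partition."""
    return Counter(row["city"] for row in rows)


def distributed_count(rows, num_partitions=4):
    """Aggregate each partition in parallel, then merge the partial counts."""
    parts = partition(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_by_key, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step: add the per-partition counts
    return dict(total)


# Illustrative data only.
rows = [{"city": c} for c in
        ["Davis", "Oakland", "Davis", "Fresno", "Davis", "Oakland"]]
result = distributed_count(rows, num_partitions=3)
print(result)
```

The merge step works because counting is associative: partial counts can be combined in any order, which is what lets the per-partition work run in parallel.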

In this module, you will be able to explain the core concepts of Spark. You will learn common ways to increase query performance by caching data and modifying Spark configurations. You will also use the Spark UI to analyze performance and identify bottlenecks, as well as optimize queries with Adaptive Query Execution.

What's included

6 videos, 2 readings, 2 quizzes

In this module, you will be able to identify and discuss the general demands of data applications. You'll be able to access data in a variety of formats and compare and contrast the tradeoffs between these formats. You will explore and examine semi-structured JSON data (common in big data environments) as well as schemas and parallel data writes. You will be able to create an end-to-end pipeline that reads data, transforms it, and saves the result.

What's included

7 videos, 2 readings, 2 quizzes
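
The read, transform, save pipeline this module describes can be sketched with the Python standard library alone. This is a minimal sketch, not the course's method: it assumes line-delimited JSON input with an optional nested `address` object and writes CSV output, and every file name, field name, and function name here is hypothetical. In the course itself this work is done with Spark SQL on Databricks.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_json_lines(path):
    """Read one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def transform(records):
    """Flatten the nested city field and drop records that lack it."""
    out = []
    for rec in records:
        address = rec.get("address") or {}
        if "city" not in address:
            continue  # semi-structured data: not every record has every field
        out.append({"name": rec["name"], "city": address["city"].upper()})
    return out


def write_csv(rows, path):
    """Save the flattened rows with an explicit, fixed column order."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "city"])
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(src, dst):
    """Read, transform, and save; return the transformed rows."""
    rows = transform(read_json_lines(src))
    write_csv(rows, dst)
    return rows


# Illustrative input written to a temporary directory.
workdir = Path(tempfile.mkdtemp())
src = workdir / "people.json"
dst = workdir / "people.csv"
src.write_text(
    '{"name": "Ada", "address": {"city": "Davis"}}\n'
    '{"name": "Lin"}\n'
    '{"name": "Sam", "address": {"city": "Fresno"}}\n',
    encoding="utf-8",
)
rows = run_pipeline(src, dst)
print(dst.read_text(encoding="utf-8"))
```

Fixing the output columns in `write_csv` plays the role of a schema: records missing the nested field are filtered out in `transform` rather than producing ragged rows.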

In this module, you will identify the key characteristics of data lakes, data warehouses, and lakehouses. Lakehouses combine the scalability and low-cost storage of data lakes with the speed and ACID transactional guarantees of data warehouses. You will build a production-grade lakehouse by combining Spark with the open-source project Delta Lake. Whoever said time travel isn't possible hasn't been to a lakehouse!

What's included

8 videos, 2 readings, 2 quizzes, 1 peer review, 1 discussion prompt

Instructors

Instructor ratings
4.6 (146 ratings)
Brooke Wenig
University of California, Davis
1 course, 46,323 learners

Learner reviews

4.5 (665 reviews)

  • 5 stars: 65.31%

  • 4 stars: 23.27%

  • 3 stars: 6.60%

  • 2 stars: 2.25%

  • 1 star: 2.55%

  • CG: 4 stars, reviewed on May 30, 2022

  • SK: 5 stars, reviewed on Jun 12, 2022

  • ET: 5 stars, reviewed on Jul 12, 2020
