About this Course

100% online

Start instantly and learn at your own schedule.

Flexible deadlines

Reset deadlines in accordance with your schedule.

Intermediate Level

Approx. 16 hours to complete

Suggested: 4 weeks of study, 2-5 hours/week...

English

Subtitles: English

What you will learn

  • Use the collaborative Databricks workspace and write SQL code that executes against a cluster of machines

  • Use Spark UI to analyze performance and identify bottlenecks

  • Create an end-to-end pipeline that reads data, transforms it, and saves the result

  • Build a linear regression model and make predictions using SparkSQL

Skills you will gain

Data Science, Apache Spark, SQL

Syllabus - What you will learn from this course

Week 1
3 hours to complete

Introduction to Spark

6 videos (Total 32 min), 3 readings, 2 quizzes
6 videos
  • Why Distributed Computing? (7 min)
  • Spark DataFrames (6 min)
  • The Databricks Environment (8 min)
  • SQL in Notebooks (3 min)
  • Import Data (2 min)
3 readings
  • A Note From UC Davis (10 min)
  • Readings and Resources (40 min)
  • Assignment #1 - Queries in Spark SQL (30 min)
2 practice exercises
  • Assignment #1 Quiz - Queries in Spark SQL (30 min)
  • Module 1 Quiz (30 min)
Week 2
2 hours to complete

Spark Core Concepts

6 videos (Total 25 min), 2 readings, 2 quizzes
6 videos
  • Spark Terminology (3 min)
  • Caching (5 min)
  • Shuffle Partitions (7 min)
  • Spark UI (3 min)
  • Broadcast Joins (3 min)
2 readings
  • Readings (30 min)
  • Assignment #2 - Spark Internals (30 min)
2 practice exercises
  • Assignment #2 Quiz - Spark Internals (30 min)
  • Module 2 Quiz (30 min)
Week 3
3 hours to complete

Engineering Data Pipelines

7 videos (Total 43 min), 2 readings, 2 quizzes
7 videos
  • Spark as a Connector (6 min)
  • Accessing Data (10 min)
  • File Formats (8 min)
  • Schemas and Types (4 min)
  • Writing Data (6 min)
  • Managed and Unmanaged Tables (4 min)
2 readings
  • Readings (20 min)
  • Assignment #3 - Engineering Data Pipelines (30 min)
2 practice exercises
  • Assignment #3 Quiz - Engineering Data Pipelines (30 min)
  • Module 3 Quiz (30 min)
Week 4
4 hours to complete

Machine Learning Applications of Spark

7 videos (Total 35 min), 2 readings, 3 quizzes
7 videos
  • Applications of Machine Learning (4 min)
  • Machine Learning Fundamentals (6 min)
  • Linear Regression (6 min)
  • Training Linear Regression Model (8 min)
  • Applying Machine Learning with UDFs (4 min)
  • Course Summary (3 min)
2 readings
  • Readings (20 min)
  • Assignment #4 - Logistic Regression Classifier (10 min)
2 practice exercises
  • Assignment #4 Quiz - Logistic Regression Classifier (30 min)
  • Module 4 Quiz (30 min)

Instructors

Brooke Wenig

Machine Learning Practice Lead at Databricks
Continuing and Professional Education

Conor Murphy

Data Scientist at Databricks
Continuing and Professional Education

About University of California, Davis

UC Davis, one of the nation’s top-ranked research universities, is a global leader in agriculture, veterinary medicine, sustainability, environmental and biological sciences, and technology. With four colleges and six professional schools, UC Davis and its students and alumni are known for their academic excellence, meaningful public service and profound international impact....

Frequently Asked Questions

  • Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.