Shuffling: What it is and why it's important

Course video 12 of 20

This week we'll look at the performance implications of operations like joins, which can force Spark to shuffle data, that is, move it between nodes over the network. Is it possible to get the same result without paying for that overhead? We'll answer this question by looking at how we can partition our data to achieve better data locality, which in turn lets us optimize some of our Spark jobs.
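
To make the idea of partitioning for data locality concrete, here is a minimal sketch in plain Python. It is a simulation, not Spark code: the names `partition_of`, `hash_partition`, `local_join`, and `copartitioned_join`, the partition count of 4, and the sample data are all assumptions made for illustration. It shows that when two keyed datasets are split with the same partitioning function, records with equal keys always land in the same partition index, so each partition can be joined on its own without looking at any other partition.

```python
NUM_PARTITIONS = 4  # hypothetical number of partitions (assumption)


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Deterministic partitioner for integer keys (illustrative only)."""
    return key % num_partitions


def hash_partition(pairs, num_partitions=NUM_PARTITIONS):
    """Split (key, value) pairs into a list of partitions by key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[partition_of(key, num_partitions)].append((key, value))
    return parts


def local_join(left_part, right_part):
    """Inner-join two partitions using only the data they already hold."""
    index = {}
    for key, value in left_part:
        index.setdefault(key, []).append(value)
    return [
        (key, (left_value, right_value))
        for key, right_value in right_part
        for left_value in index.get(key, [])
    ]


def copartitioned_join(left, right, num_partitions=NUM_PARTITIONS):
    """Join two datasets partition by partition.

    Both sides use the same partitioner, so matching keys share a
    partition index and no record has to cross partitions for the join.
    """
    left_parts = hash_partition(left, num_partitions)
    right_parts = hash_partition(right, num_partitions)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(local_join(left_part, right_part))
    return result


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob"), (5, "eve")]
    orders = [(1, "book"), (5, "pen"), (5, "ink"), (7, "cup")]
    for row in copartitioned_join(users, orders):
        print(row)
```

In this simulation the cost of moving data is paid once, when each dataset is partitioned; every later join against data partitioned the same way stays local to its partition. That is the intuition behind partitioning data ahead of repeated joins.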
