Efficient Programming

This course is part of the High-Performance and Parallel Computing Specialization

Instructors: Shelley Knuth

5 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

8 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Describe the computing and memory architecture of a supercomputing node or cloud computing instance
Use compilers and libraries to increase the performance of your program
Understand how to utilize vector operations of a modern microprocessor to maximize performance
Use OpenMP directives to improve vectorization of your programs

Details to know

Shareable certificate

Assessments

5 assignments

Taught in English

There are 5 modules in this course

This course is aimed at scientists, engineers, scholars, or anyone seeking to solve problems efficiently in high-performance computing environments or in the cloud. Students completing this course will have a basic understanding of how to find bottlenecks in their programs and how to address them. The course provides a high-level introduction to the architecture of modern compute nodes in high-performance computing systems and cloud computing instances.

This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.

In this module, we cover approaches to analyzing and optimizing program performance: profiling, using optimized libraries, and choosing compiler options that increase efficiency.

What's included

5 videos, 2 readings, 1 assignment, 1 programming assignment

5 videos Total 39 minutes

Course Overview 3 minutes
Profiling with gprof 9 minutes
Profiling For Python 6 minutes
Numerical Libraries 12 minutes
Compiler Options for Performance 8 minutes

2 readings Total 20 minutes

Earn Academic Credit for your Work! 10 minutes
Course Support 10 minutes

1 assignment Total 15 minutes

Module Quiz 15 minutes

1 programming assignment Total 45 minutes

Matrix Multiplication Profiling 45 minutes
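
The profiling step described in this module can be sketched in Python with the standard-library cProfile and pstats modules. This is a minimal illustration, not the course's assignment code: the function names `matmul_naive` and `profile_call` are hypothetical, and choosing cProfile as the Python profiler is an assumption, since the page names only gprof explicitly.

```python
import cProfile
import io
import pstats


def matmul_naive(a, b):
    """Triple-loop matrix multiply on lists of lists (deliberately slow)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report).

    The report lists up to five entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    size = 40  # small, hypothetical problem size so the run stays quick
    a = [[float(i + j) for j in range(size)] for i in range(size)]
    b = [[float(i - j) for j in range(size)] for i in range(size)]
    _, report = profile_call(matmul_naive, a, b)
    print(report)
```

The report shows where time is spent per function, which is the information needed before deciding whether an optimized library or different compiler options are worth trying.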

In this module, we examine simple techniques that improve program performance. We look at scalar and loop optimization methods that can have a large impact on a program's floating-point performance.

What's included

5 videos, 1 assignment, 1 programming assignment

5 videos Total 28 minutes

Dependency Analysis 6 minutes
Scalar Optimization 6 minutes
Loop Optimizations - Part 1 5 minutes
Loop Optimizations - Part 2 6 minutes
Python Optimization with NumPy 6 minutes

1 assignment Total 15 minutes

Module Quiz 15 minutes

1 programming assignment Total 30 minutes

Matrix Multiplication Optimizations: Loop Transformations and Parallelization 30 minutes
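
A small sketch of two ideas named in this module, scalar optimization and Python optimization with NumPy, might look like the following. The function names and the particular computation (`x * cos(angle) + 1`) are hypothetical examples chosen for illustration, and the sketch assumes NumPy is installed.

```python
import math

import numpy as np


def scale_shift_loop(x, angle):
    """Baseline: recomputes the loop-invariant math.cos(angle) every iteration."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = x[i] * math.cos(angle) + 1.0
    return out


def scale_shift_hoisted(x, angle):
    """Scalar optimization: compute the invariant once, outside the loop."""
    c = math.cos(angle)
    return [xi * c + 1.0 for xi in x]


def scale_shift_numpy(x, angle):
    """NumPy version: one whole-array expression; the loop runs in compiled code."""
    return np.asarray(x, dtype=float) * math.cos(angle) + 1.0


if __name__ == "__main__":
    data = [0.5 * i for i in range(8)]
    print(scale_shift_loop(data, 0.3))
    print(scale_shift_hoisted(data, 0.3))
    print(scale_shift_numpy(data, 0.3))
```

All three functions compute the same values; they differ only in how much work is repeated inside the loop and in whether the loop runs in the Python interpreter or in NumPy's compiled routines.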

In this module, we introduce the basic architecture of modern computers, focusing on how the architecture influences program performance. We look at processor-level data parallelism and at how code optimized for this parallelism can achieve much higher floating-point performance.

What's included

4 videos, 1 assignment, 1 programming assignment

Memory performance is generally the main performance bottleneck, since the speed of main memory has not kept up with the rate at which processors can perform floating-point operations. We introduce how layers of fast memory, called cache memory, can speed up computations, and we provide an example of how to optimize algorithms for better memory performance.

What's included

4 videos, 1 assignment, 1 programming assignment
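
One common way to optimize an algorithm for cache memory is loop blocking (tiling), sketched below for matrix multiplication. The page does not say which example the course uses, so this choice is an assumption, and `matmul_reference`, `matmul_blocked`, and the `block` parameter are hypothetical names. In pure Python the blocked version is not expected to run faster; the sketch only shows the loop structure that a compiled language would use to keep a small tile of each matrix in cache.

```python
def matmul_reference(a, b):
    """Plain triple loop, used here only to check the blocked version."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=32):
    """Blocked (tiled) matrix multiply.

    The three outer loops step over tiles of size `block`; the three
    inner loops do the usual multiply-accumulate within one tile.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c


if __name__ == "__main__":
    a = [[float(i + k) for k in range(7)] for i in range(5)]
    b = [[float(k - j) for j in range(3)] for k in range(7)]
    print(matmul_blocked(a, b, block=2) == matmul_reference(a, b))
```

The `min(...)` bounds handle matrix dimensions that are not multiples of the block size, so the function works for any shapes that are compatible for multiplication.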

This module provides an introduction to parallel and high-throughput computing. It also demonstrates Slurm job arrays, a mechanism for submitting and managing many similar jobs quickly and easily. Finally, it looks at running many jobs concurrently with GNU Parallel.
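
Inside a Slurm job array, each task can find its own index in the `SLURM_ARRAY_TASK_ID` environment variable and use it to pick its share of the work. The sketch below is one possible way to do that from Python, under the assumption that the array was submitted with zero-based indices (for example `--array=0-3`); the helper names `task_from_env` and `select_inputs` and the input file names are hypothetical.

```python
import os


def task_from_env(environ=os.environ):
    """Read this task's index and the array size from Slurm's environment.

    Falls back to a single task (index 0 of 1) when run outside Slurm.
    Assumes the job array uses zero-based, contiguous indices.
    """
    task_id = int(environ.get("SLURM_ARRAY_TASK_ID", "0"))
    task_count = int(environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    return task_id, task_count


def select_inputs(all_inputs, task_id, task_count):
    """Give each task every task_count-th input, starting at its own index."""
    return all_inputs[task_id::task_count]


if __name__ == "__main__":
    # Hypothetical input names; a real job would list its actual files.
    inputs = [f"sample_{i:02d}.dat" for i in range(10)]
    task_id, task_count = task_from_env()
    for name in select_inputs(inputs, task_id, task_count):
        print(f"task {task_id} of {task_count} would process {name}")
```

Splitting the inputs by stride means every input is handled by exactly one task, and the same script runs unchanged on a laptop, where it simply processes everything as a single task.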