This course is aimed at scientists, engineers, scholars, and anyone else seeking to solve problems efficiently in high-performance computing environments or in the cloud. Students who complete the course will have a basic understanding of how to find bottlenecks in their programs and how to address them. The course also provides a high-level introduction to the compute node architectures of modern high-performance and cloud computing instances.



Efficient Programming
This course is part of the High-Performance and Parallel Computing Specialization


Instructors: Shelley Knuth
What you'll learn
Describe the computing and memory architecture of a supercomputing node or cloud computing instance
Use compilers and libraries to increase the performance of your program
Understand how to use the vector operations of a modern microprocessor to maximize performance
Use OpenMP directives to improve vectorization of your programs
Details to know
5 assignments
August 2025
There are 5 modules in this course
In this module, we cover approaches to analyzing and optimizing program performance, including profiling, using optimized libraries, and choosing compiler options that increase efficiency.
What's included
5 videos, 2 readings, 1 assignment, 1 programming assignment
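As a rough illustration of the profiling step mentioned for this module, the sketch below uses Python's standard `cProfile` and `pstats` modules to find where a small program spends its time. The functions `workload`, `cheap_setup`, and `slow_sum_of_squares` are hypothetical stand-ins and are not taken from the course materials, which may use different tools and languages.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive hot spot: builds a full list before summing it."""
    squares = [i * i for i in range(n)]
    return sum(squares)


def cheap_setup(n):
    """Inexpensive helper that should barely register in the profile."""
    return n // 2


def workload(n):
    """Hypothetical program under study: one cheap step, one expensive step."""
    half = cheap_setup(n)
    return slow_sum_of_squares(n) + half


def profile_report(func, *args, top=5):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_report(workload, 200_000)
    print(value)
    print(report)
```

In the report, the expensive function should appear near the top, which is the usual signal for where optimization effort is best spent.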
In this module, we examine simple techniques that help with program performance. We look at scalar and loop optimization methods that can have a large impact on a program’s floating-point performance.
What's included
5 videos, 1 assignment, 1 programming assignment
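One common loop optimization of the kind this module describes is hoisting loop-invariant work out of the loop. The sketch below is an assumption about what such an example might look like, not code from the course; the function names are hypothetical. Note that regrouping the arithmetic can change the last bits of a floating-point result, which is one reason compilers often will not do this automatically.

```python
import math


def scale_naive(values, angle):
    """Recomputes the same cosine and division on every iteration."""
    out = []
    for v in values:
        out.append(v * math.cos(angle) / len(values))
    return out


def scale_hoisted(values, angle):
    """Same computation with the loop-invariant factor computed once."""
    factor = math.cos(angle) / len(values)  # invariant: moved out of the loop
    return [v * factor for v in values]


if __name__ == "__main__":
    data = [float(i) for i in range(1, 11)]
    naive = scale_naive(data, 0.3)
    hoisted = scale_hoisted(data, 0.3)
    # Results agree to within rounding, not necessarily bit for bit.
    print(all(math.isclose(x, y, rel_tol=1e-12) for x, y in zip(naive, hoisted)))
```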
In this module, we introduce the basic architecture of modern computers, focusing on how the architecture influences program performance. We look at processor-level data parallelism and how code optimized for it can achieve much higher floating-point performance.
What's included
4 videos, 1 assignment, 1 programming assignment
Memory performance is generally the main performance bottleneck, since the speed of main memory has not kept up with the rate at which processors can operate on floating-point numbers. We introduce how layers of fast memory, called cache memory, can speed up computations, and we provide an example of how to optimize algorithms for better memory performance.
What's included
4 videos, 1 assignment, 1 programming assignment
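A typical way to optimize an algorithm for cache memory is loop blocking (tiling). The sketch below shows the idea on a matrix transpose; it is an illustrative assumption rather than the course's own example, and the function names and the block size of 32 are hypothetical. The loop structure is the same one would write in a compiled language; in pure Python, interpreter overhead largely hides the cache benefit, so the code is meant to show the access pattern rather than a measurable speedup.

```python
def transpose_naive(a, n):
    """Transpose an n x n matrix stored row-major in a flat list."""
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            # Reads are contiguous, but writes jump through memory by n.
            out[j * n + i] = a[i * n + j]
    return out


def transpose_blocked(a, n, block=32):
    """Same transpose, processed one block x block tile at a time.

    Within a tile, both the rows being read and the rows being written
    cover a small region that can stay in cache while it is reused.
    """
    out = [0.0] * (n * n)
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, n)):
                    out[j * n + i] = a[i * n + j]
    return out


if __name__ == "__main__":
    n = 6
    matrix = [float(k) for k in range(n * n)]
    print(transpose_blocked(matrix, n, block=4) == transpose_naive(matrix, n))
```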
This module introduces parallel and high-throughput computing. It also demonstrates Slurm job arrays, a mechanism for working with many similar jobs quickly and easily. Finally, it looks at running many jobs concurrently with GNU Parallel.
What's included
4 videos, 1 assignment, 1 programming assignment
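To make the job-array idea concrete, the sketch below shows how a single script might pick its own piece of work from the `SLURM_ARRAY_TASK_ID` environment variable that Slurm sets for each array task. It assumes the job was submitted with something like `sbatch --array=0-3`; the input file names and the `select_input` helper are hypothetical and not taken from the course.

```python
import os


def select_input(inputs, environ=None):
    """Return the input assigned to this array task.

    Reads SLURM_ARRAY_TASK_ID, falling back to task 0 when the variable
    is not set (for example, when testing outside of Slurm).
    """
    env = os.environ if environ is None else environ
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < len(inputs):
        raise IndexError(f"array task {task_id} has no matching input")
    return inputs[task_id]


if __name__ == "__main__":
    # Hypothetical input files, one per array task.
    inputs = ["run_0.dat", "run_1.dat", "run_2.dat", "run_3.dat"]
    print("this task processes", select_input(inputs))
```

Each array task runs the same script but receives a different task ID, so many similar jobs can share one submission.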