This course introduces concepts, languages, techniques, and patterns for programming heterogeneous, massively parallel processors. Its contents and structure have been significantly revised based on the experience gained from its initial offering in 2012. It covers heterogeneous computing architectures, data-parallel programming models, techniques for memory bandwidth management, and parallel algorithm patterns.
All computing systems, from mobile devices to supercomputers, are becoming heterogeneous, massively parallel computers in pursuit of higher power efficiency and computation throughput. While the computing community is racing to build tools and libraries that make these systems easier to use, using them effectively and confidently will always require knowledge of low-level programming. This course is designed for students to learn the essence of low-level programming interfaces and how to use these interfaces to achieve application goals. CUDA C, with its good balance between user control and verbosity, will serve as the teaching vehicle for the first half of the course. Students will then extend their learning to closely related programming interfaces such as OpenCL, OpenACC, and C++ AMP.
The course is unique in that it is application oriented and introduces only the underlying computer science and computer engineering knowledge needed to understand the material. It covers data-parallel execution models, memory models for managing locality, tiling techniques for reducing bandwidth consumption, parallel algorithm patterns, overlapping computation with communication, and a variety of heterogeneous parallel programming interfaces. The concepts learned in this course form a strong foundation for learning other types of parallel programming systems.
Programming experience in C/C++.
Although the class is designed to be self-contained, students who want to expand their knowledge beyond what we can cover in a one-quarter class can find much more extensive coverage of this topic in the book Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series), 2nd Edition, by David Kirk and Wen-mei Hwu, published by Morgan Kaufmann (Elsevier), ISBN 0123814723.
The class will consist of weekly lecture videos, which are between 15 and 20 minutes in length. There will also be weekly quizzes and programming assignments.
A laptop or desktop computer. GPU-enabled hardware can be helpful but is not required.
While OpenCL is an industry standard and widely supported by many CPU and GPU vendors, it is much more complex and tedious to use than CUDA. The complexity and tedious details distract from the concepts and techniques that one should master. From our experience, it is much more productive to use CUDA to teach the concepts and techniques. We will then teach the additional complexities of OpenCL so that students can comfortably apply all the concepts to OpenCL.
Out of the 9,908 students who did quizzes and programming assignments, 2,811 received a Certificate of Achievement or a Certificate of Distinction.
You will learn how to unleash the massive computing power of systems ranging from mobile processors to supercomputers for your applications.