The course "Multicore and GPGPU Programming" provides a foundational understanding of parallel programming, focusing on developing high-performance, multi-threaded applications in both CPU and GPU environments. Beginning with a review of multicore processor architectures, caching mechanisms, and Non-Uniform Memory Access (NUMA) systems, students will learn the essentials of shared memory programming, synchronisation techniques, and the use of locks to ensure data integrity across threads.

Multicore and GPGPU Programming

Recommended experience
Intermediate level
Basic knowledge of C/C++ and computer architecture is recommended.
What you will learn
Understand the fundamentals of multi-threaded programming and its applications in multicore systems.
Develop shared memory programs in OpenMP and distributed programming using MPI.
Gain a foundational understanding of GPGPU architecture and the CUDA programming model.
Skills you will gain
- Computer Hardware
- Algorithms
- Program Development
- Distributed Computing
- Performance Testing
Tools you will learn
- C (Programming Language)
Key details

124 assignments

There are 12 modules in this course
In this module, the learners will be introduced to the course and its syllabus, setting the foundation for their learning journey. The course's introductory video will provide them with insights into the valuable skills and knowledge they can expect to gain throughout the duration of this course. Additionally, the syllabus reading will comprehensively outline essential course components, including course values, assessment criteria, grading system, schedule, details of live sessions, and a recommended reading list that will enhance the learner’s understanding of the course concepts. Moreover, this module offers the learners the opportunity to connect with fellow learners as they participate in a discussion prompt designed to facilitate introductions and exchanges within the course community.
What's included
4 videos • 1 reading • 1 discussion prompt
4 videos • 51 minutes total
- Course Introductory Video • 2 minutes
- Meet Your Instructor - Dr. Gargi Prabhu • 1 minute
- Meet Your Instructor - Dr. Kunal Korgaonkar • 1 minute
- Recording of Multicore and GPGPU Programming: Week 1 - Live Session on 25-05-23 18:32:50 [47:25] • 47 minutes
1 reading • 10 minutes total
- Course Overview • 10 minutes
1 discussion prompt • 10 minutes total
- Meet Your Peers • 10 minutes
In this module, students will gain foundational knowledge of parallel and multi-threaded programming, exploring the core principles that underlie the efficient utilisation of modern multi-core and many-core processors. Beginning with an overview of parallel programming concepts, this module covers different types of parallelism, including data parallelism, task parallelism, and pipeline parallelism. Students will also examine critical performance metrics like speedup, efficiency, and scalability, which help in evaluating the benefits and trade-offs of parallel approaches.
What's included
12 videos • 2 readings • 12 assignments • 1 discussion prompt
12 videos • 73 minutes total
- Need for Ever-Increasing Performance • 8 minutes
- Parallel Systems and Parallel Programs • 8 minutes
- Concurrent, Parallel, Distributed Systems • 5 minutes
- Types of Parallelism: Data, Task and Pipeline Parallelism • 8 minutes
- Speedup and Efficiency • 5 minutes
- Amdahl’s Law • 5 minutes
- Gustafson’s Law • 5 minutes
- Scalability in Parallel Systems • 5 minutes
- Cost of Parallelisation • 7 minutes
- Sources of Overhead in Parallel Programs • 5 minutes
- Timing Parallel Programs: Methods and Best Practices • 7 minutes
- GPU Performance • 5 minutes
2 readings • 120 minutes total
- Recommended Reading: Fundamentals of Parallel Computing • 60 minutes
- Recommended Reading: Introduction to Performance Metrics in Parallel Computing • 60 minutes
12 assignments • 36 minutes total
- Need for Ever-Increasing Performance • 3 minutes
- Parallel Systems and Parallel Programs • 3 minutes
- Concurrent, Parallel, Distributed Systems • 3 minutes
- Types of Parallelism: Data, Task and Pipeline Parallelism • 3 minutes
- Speedup and Efficiency • 3 minutes
- Amdahl’s Law • 3 minutes
- Gustafson’s Law • 3 minutes
- Scalability in MIMD Systems • 3 minutes
- Cost of Parallelisation • 3 minutes
- Sources of Overhead in Parallel Programs • 3 minutes
- Taking Timings of Parallel Programs • 3 minutes
- GPU Performance • 3 minutes
1 discussion prompt • 30 minutes total
- Why Parallelism? Revisiting the Roots of Multicore Programming • 30 minutes
This module provides an in-depth exploration of multicore processor architectures, examining the design principles, performance considerations, and challenges involved in building efficient multicore systems. Students will study how multiple cores interact within a processor, focusing on memory hierarchies, caching mechanisms, and the role of parallelism in improving computational performance.
What's included
15 videos • 2 readings • 15 assignments • 1 discussion prompt
15 videos • 160 minutes total
- The Von Neumann Architecture • 7 minutes
- Processes, Multitasking, and Threads • 5 minutes
- The Basics of Caching • 7 minutes
- Virtual Memory • 7 minutes
- Instruction-Level Parallelism • 9 minutes
- Hardware Multithreading • 6 minutes
- Classifications of Parallel Computers • 6 minutes
- SIMD and MIMD Systems • 7 minutes
- Interconnection Networks: Shared Memory Systems • 6 minutes
- Interconnection Networks: Distributed Memory Systems • 8 minutes
- Cache Coherence • 8 minutes
- Shared-Memory vs. Distributed-Memory • 4 minutes
- Parallel Software: Coordinating Processes and Threads • 11 minutes
- Distributed Memory Software • 7 minutes
- Recording of Multicore and GPGPU Programming: Week 2 - Live Session on 25-05-30 18:35:08 [02:05] • 62 minutes
2 readings • 100 minutes total
- Recommended Reading: Architecture Background • 40 minutes
- Recommended Reading: Parallel Hardware and Software • 60 minutes
15 assignments • 114 minutes total
- The Von Neumann Architecture • 3 minutes
- Processes, Multitasking, and Threads • 3 minutes
- The Basics of Caching • 3 minutes
- Virtual Memory • 3 minutes
- Instruction-Level Parallelism • 3 minutes
- Hardware Multithreading • 3 minutes
- Classifications of Parallel Computers • 3 minutes
- SIMD and MIMD Systems • 3 minutes
- Interconnection Networks: Shared Memory Systems • 3 minutes
- Interconnection Networks: Distributed Memory Systems • 6 minutes
- Cache Coherence • 3 minutes
- Shared-Memory vs. Distributed-Memory • 3 minutes
- Parallel Software: Coordinating Processes and Threads • 12 minutes
- Distributed Memory Software • 3 minutes
- Graded Quiz - Modules 1 and 2 • 60 minutes
1 discussion prompt • 30 minutes total
- From Von Neumann to Multicore: Evolving Architectures and Memory Realities • 30 minutes
This module introduces students to the architectural principles of General-Purpose GPU (GPGPU) systems and the CUDA programming model. It explores the hardware components, including Streaming Multiprocessors (SMs), CUDA cores, and memory hierarchy, which form the foundation of GPU computing. The module also provides an overview of the CUDA programming model, emphasising its thread hierarchy, grid, and block organisation. By understanding these fundamental concepts, students will develop the ability to harness GPU architecture for high-performance parallel computing.
What's included
15 videos • 2 readings • 14 assignments • 1 discussion prompt
15 videos • 127 minutes total
- GPUs and GPGPU • 5 minutes
- GPU Architecture • 5 minutes
- Heterogeneous Computing • 4 minutes
- Paradigm of Heterogeneous Computing • 5 minutes
- Introduction to CUDA • 5 minutes
- Structure of a CUDA Program • 8 minutes
- Threads, Blocks, and Grid • 9 minutes
- Managing Memory • 7 minutes
- Writing and Verifying Your Kernel • 6 minutes
- Compiling and Running CUDA Program • 4 minutes
- Nvidia Compute Capabilities and Device Architecture • 6 minutes
- Timing Your Kernel • 7 minutes
- Organising Parallel Threads • 5 minutes
- Managing Devices • 4 minutes
- Recording of Multicore and GPGPU Programming: Week 3 - Live Session on 25-06-06 18:31:21 [44:50] • 45 minutes
2 readings • 75 minutes total
- Recommended Reading: GPGPU Architecture and CUDA • 15 minutes
- Recommended Reading: Programming Model Overview • 60 minutes
14 assignments • 48 minutes total
- GPUs and GPGPU • 6 minutes
- GPU Architecture • 3 minutes
- Heterogeneous Computing • 3 minutes
- Paradigm of Heterogeneous Computing • 3 minutes
- Introduction to CUDA • 3 minutes
- Structure of a CUDA Program • 3 minutes
- Threads, Blocks, and Grid • 6 minutes
- Managing Memory • 3 minutes
- Writing and Verifying Your Kernel • 3 minutes
- Compiling and Running CUDA Program • 3 minutes
- Nvidia Compute Capabilities and Device Architecture • 3 minutes
- Timing Your Kernel • 3 minutes
- Organising Parallel Threads • 3 minutes
- Managing Devices • 3 minutes
1 discussion prompt • 30 minutes total
- Harnessing GPU Power: Exploring CUDA and the Architecture of Parallelism • 30 minutes
This module provides a comprehensive understanding of how CUDA executes programs on GPUs. It covers key concepts such as warps, warp scheduling, and resource partitioning, which are critical for understanding GPU hardware behaviour. The module delves into branch divergence and its impact on performance, offering strategies to minimise its effects. It also emphasises exposing parallelism effectively by leveraging CUDA’s hierarchical execution model. Students will learn how to design and optimise GPU programs by aligning with the underlying execution model to maximise efficiency and throughput.
What's included
15 videos • 2 readings • 15 assignments • 1 discussion prompt
15 videos • 135 minutes total
- Introduction to CUDA Execution Model • 7 minutes
- Warps and Thread Blocks • 4 minutes
- Warp Divergence • 9 minutes
- Resource Partitioning • 6 minutes
- Latency Hiding • 10 minutes
- Occupancy • 5 minutes
- Synchronization • 4 minutes
- Scalability • 5 minutes
- Exposing Parallelism • 10 minutes
- Checking Active Warps with Nvprof • 6 minutes
- Checking Memory Operations with Nvprof • 7 minutes
- Avoiding Branch Divergence • 3 minutes
- The Parallel Reduction Problem and Thread Divergence • 7 minutes
- Improving Divergence in Parallel Reduction • 6 minutes
- Recording of Multicore and GPGPU Programming: Week 4 - Live Session on 25-06-13 18:32:39 [49:37] • 45 minutes
2 readings • 120 minutes total
- Recommended Reading: Structure of a CUDA Program • 60 minutes
- Recommended Reading: Exposing Parallelism and Avoiding Branch Divergence • 60 minutes
15 assignments • 105 minutes total
- Introduction to CUDA Execution Model • 3 minutes
- Warps and Thread Blocks • 3 minutes
- Warp Divergence • 3 minutes
- Resource Partitioning • 6 minutes
- Latency Hiding • 3 minutes
- Occupancy • 3 minutes
- Synchronization • 3 minutes
- Scalability • 3 minutes
- Exposing Parallelism • 3 minutes
- Checking Active Warps with Nvprof • 3 minutes
- Checking Memory Operations with Nvprof • 3 minutes
- Avoiding Branch Divergence • 3 minutes
- The Parallel Reduction Problem and Thread Divergence • 3 minutes
- Improving Divergence in Parallel Reduction • 3 minutes
- Graded Quiz - Modules 3 and 4 • 60 minutes
1 discussion prompt • 30 minutes total
- Under the Hood: Warps, Divergence, and CUDA Execution Dynamics • 30 minutes
The CUDA Memory Model & Streams and Concurrency module introduces students to the intricacies of memory hierarchy in CUDA, including global, shared, and local memory. It emphasises the importance of memory coalescing and efficient memory access patterns to optimise performance on GPUs. The module also covers CUDA streams, explaining how concurrent kernel execution and memory operations can be managed to enhance parallelism. By understanding these concepts, students will gain the ability to design GPU programs that maximise throughput and minimise latency.
What's included
14 videos • 2 readings • 14 assignments • 1 discussion prompt • 1 ungraded lab
14 videos • 126 minutes total
- Introduction to CUDA Memory Model • 8 minutes
- Memory Allocation and Deallocation • 6 minutes
- Zero Copy Memory • 4 minutes
- Unified Virtual Addressing and Unified Memory • 3 minutes
- Aligned and Coalesced Access • 6 minutes
- CUDA Shared Memory • 6 minutes
- Shared Memory Banks and Access Mode • 7 minutes
- Configuring the Amount of Shared Memory • 5 minutes
- Synchronisation • 9 minutes
- CUDA Streams • 7 minutes
- Stream Scheduling and Priorities • 6 minutes
- CUDA Events • 6 minutes
- Concurrent Kernel Execution • 6 minutes
- Recording of Multicore and GPGPU Programming: Week 5 - Live Session on 25-06-20 18:31:59 [47:36] • 48 minutes
2 readings • 120 minutes total
- Recommended Reading: CUDA Memory Model • 60 minutes
- Recommended Reading: Streams and Concurrency • 60 minutes
14 assignments • 342 minutes total
- Introduction to CUDA Memory Model • 3 minutes
- Memory Allocation and Deallocation • 3 minutes
- Zero Copy Memory • 3 minutes
- Unified Virtual Addressing and Unified Memory • 3 minutes
- Aligned and Coalesced Access • 3 minutes
- CUDA Shared Memory • 6 minutes
- Shared Memory Banks and Access Mode • 3 minutes
- Configuring the Amount of Shared Memory • 3 minutes
- Synchronisation • 3 minutes
- CUDA Streams • 3 minutes
- Stream Scheduling and Priorities • 3 minutes
- CUDA Events • 3 minutes
- Concurrent Kernel Execution • 3 minutes
- SGA-1: CUDA Programming and Performance Optimisation • 300 minutes
1 discussion prompt • 30 minutes total
- Smart Memory and Seamless Concurrency: CUDA Memory and Streams • 30 minutes
1 ungraded lab • 60 minutes total
- Hands-on Lab: Parallel Matrix Addition Using CUDA • 60 minutes
This module explains in depth the difference between processes and threads and introduces multithreaded programming with the Pthreads library. Students will learn the principal functions of the Pthreads API and apply them to solve real-world problems using a multithreaded approach. The module also discusses the precautions to take when designing multithreaded algorithms.
What's included
10 videos • 11 readings • 10 assignments • 1 discussion prompt
10 videos • 116 minutes total
- Processes, Threads and Pthreads • 4 minutes
- Hello World!! • 9 minutes
- Matrix-Vector Multiplication • 13 minutes
- Critical Sections • 5 minutes
- Busy Waiting • 6 minutes
- Mutexes • 5 minutes
- Semaphores • 7 minutes
- Barriers and Condition Variables • 13 minutes
- Caches, Cache-Coherence and False Sharing • 9 minutes
- Recording of Multicore and GPGPU Programming: Week 6 - Live Session on 25-06-27 18:38:36 [43:53] • 44 minutes
11 readings • 295 minutes total
- Recommended Reading: Processes, Threads and Pthreads • 10 minutes
- Recommended Reading: Hello World!! • 60 minutes
- Recommended Reading: Matrix-Vector Multiplication • 15 minutes
- Recommended Reading: Critical Sections • 30 minutes
- Recommended Reading: Busy Waiting • 20 minutes
- Recommended Reading: Mutexes • 15 minutes
- Recommended Reading: Semaphores • 30 minutes
- Recommended Reading: Barriers and Condition Variables • 30 minutes
- Recommended Reading: Read-Write Locks • 60 minutes
- Recommended Reading: Caches, Cache-Coherence and False Sharing • 15 minutes
- Lab Instruction Document • 10 minutes
10 assignments • 135 minutes total
- Processes, Threads and Pthreads • 9 minutes
- Hello World!! • 9 minutes
- Matrix-Vector Multiplication • 9 minutes
- Critical Sections • 9 minutes
- Busy Waiting • 9 minutes
- Mutexes • 9 minutes
- Semaphores • 6 minutes
- Barriers and Condition Variables • 6 minutes
- Caches, Cache-Coherence and False Sharing • 9 minutes
- Graded Quiz - Modules 5 and 6 • 60 minutes
1 discussion prompt • 10 minutes total
- Thread Synchronization and Shared Memory: Building Reliable Parallel Programs with Pthreads • 10 minutes
This module introduces distributed-memory programming using the Message Passing Interface (MPI). Students will learn the functions provided by the MPI library and how to use them, enabling them to develop parallel programs and to convert serial code into parallel code with the help of MPI.
What's included
7 videos • 9 readings • 7 assignments • 1 discussion prompt
7 videos • 70 minutes total
- Introduction to MPI • 4 minutes
- MPI Setup and Communicator Functions • 6 minutes
- SPMD and Communication • 10 minutes
- Potential Pitfalls • 4 minutes
- Simple Serial Sorting Algorithm • 20 minutes
- Parallel Odd-Even Transposition Sort • 19 minutes
- Safety in MPI Programs • 7 minutes
9 readings • 125 minutes total
- Recommended Reading: Introduction to MPI • 15 minutes
- Recommended Reading: MPI Setup and Communicator Functions • 15 minutes
- Recommended Reading: SPMD and Communication • 15 minutes
- Recommended Reading: Potential Pitfalls • 15 minutes
- Recommended Reading: Simple Serial Sorting Algorithm • 15 minutes
- Recommended Reading: Parallel Odd-Even Transposition Sort • 15 minutes
- Recommended Reading: Safety in MPI Programs • 15 minutes
- Lab: Practice Code • 10 minutes
- Lab: Practice Solution • 10 minutes
7 assignments • 63 minutes total
- Introduction to MPI • 9 minutes
- MPI Setup and Communicator Functions • 9 minutes
- SPMD and Communication • 9 minutes
- Potential Pitfalls • 9 minutes
- Simple Serial Sorting Algorithm • 9 minutes
- Parallel Odd-Even Transposition Sort • 9 minutes
- Safety in MPI Programs • 9 minutes
1 discussion prompt • 30 minutes total
- MPI in Action: Understanding Setup, Communication, and Parallel Sorting • 30 minutes
This module aims to introduce the shared memory programming model with the help of the OpenMP library. Students will gain exposure to the functions in the OpenMP library and methods to implement those in code to implement parallelism using shared memory. Students will explore the foundational concepts of OpenMP through videos and readings, starting with the basics of the library and progressing to more advanced topics such as reduction clauses, variable scoping, and mutual exclusion. Through worked examples like the Trapezoidal Rule and sorting functions, learners will understand how to parallelise loops, manage scheduling, and apply critical sections and locks for safe concurrent execution. The module also covers tasking in OpenMP and classic concurrency problems like producers and consumers.
What's included
12 videos • 12 readings • 13 assignments • 1 discussion prompt
12 videos • 94 minutes total
- Introduction to OpenMP • 5 minutes
- Programming in OpenMP • 10 minutes
- Trapezoidal Rule • 10 minutes
- Scope of Variables • 4 minutes
- Reduction Clause • 7 minutes
- Parallel-For Directive and Caveats in Them • 8 minutes
- Sorting Functions • 20 minutes
- Scheduling • 6 minutes
- Producers and Consumers • 6 minutes
- Termination, Startup and Atomic Directive • 7 minutes
- Critical Sections and Locks • 6 minutes
- Tasking • 5 minutes
12 readings • 152 minutes total
- Recommended Reading: Introduction to OpenMP • 15 minutes
- Recommended Reading: Programming in OpenMP • 15 minutes
- Recommended Reading: Trapezoidal Rule • 15 minutes
- Recommended Reading: Scope of Variables • 15 minutes
- Recommended Reading: Reduction Clause • 15 minutes
- Recommended Reading: Parallel-For Directive and Caveats in Them • 15 minutes
- Recommended Reading: Sorting Functions • 15 minutes
- Recommended Reading: Scheduling • 15 minutes
- Recommended Reading: Producers and Consumers • 15 minutes
- Recommended Reading: Termination, Startup and Atomic Directive • 1 minute
- Recommended Reading: Critical Sections and Locks • 1 minute
- Recommended Reading: Tasking • 15 minutes
13 assignments • 168 minutes total
- Introduction to OpenMP • 9 minutes
- Programming in OpenMP • 9 minutes
- Trapezoidal Rule • 9 minutes
- Scope of Variables • 9 minutes
- Reduction Clause • 9 minutes
- Parallel-For Directive and Caveats in Them • 9 minutes
- Sorting Functions • 9 minutes
- Scheduling • 9 minutes
- Producers and Consumers • 9 minutes
- Termination, Startup and Atomic Directive • 9 minutes
- Critical Sections and Locks • 9 minutes
- Tasking • 9 minutes
- Graded Quiz - Modules 7 and 8 • 60 minutes
1 discussion prompt • 30 minutes total
- Mastering OpenMP: From Parallel Patterns to Synchronisation • 30 minutes
This module will introduce the n-body problem in physics, examining its significance in simulating gravitational interactions among multiple particles. It will explore classical and modern algorithmic approaches to solving the n-body problem, followed by a discussion on their computational complexity. Emphasis will be placed on identifying opportunities for parallelisation, and students will analyse and implement efficient parallel solutions using the programming languages and parallel computing directives covered in the course.
What's included
13 videos • 13 readings • 13 assignments • 1 discussion prompt
13 videos • 107 minutes total
- Introduction to N-body Problem • 8 minutes
- Serial Solutions to the N-body Problem • 16 minutes
- Parallelising Strategy • 13 minutes
- Parallelising Basic Solver Using OpenMP • 9 minutes
- Parallelising Reduced Solver Using OpenMP • 11 minutes
- Evaluating OpenMP Performance • 5 minutes
- Parallelising Basic Solver Using Pthreads • 4 minutes
- Parallelising Basic Solver Using MPI • 9 minutes
- Parallelising Reduced Solver Using MPI • 9 minutes
- Evaluating MPI Performance • 6 minutes
- Parallelising Basic Solver Using CUDA • 7 minutes
- Evaluating CUDA Solver and Improving Performance • 4 minutes
- Using Shared Memory for Solvers • 7 minutes
13 readings • 195 minutes total
- Recommended Reading: Introduction to N-body Problem • 15 minutes
- Recommended Reading: Serial Solutions to the N-body Problem • 15 minutes
- Recommended Reading: Parallelising Strategy • 15 minutes
- Recommended Reading: Parallelising Basic Solver Using OpenMP • 15 minutes
- Recommended Reading: Parallelising Reduced Solver Using OpenMP • 15 minutes
- Recommended Reading: Evaluating OpenMP Performance • 15 minutes
- Recommended Reading: Parallelising Basic Solver Using Pthreads • 15 minutes
- Recommended Reading: Parallelising Basic Solver Using MPI • 15 minutes
- Recommended Reading: Parallelising Reduced Solver Using MPI • 15 minutes
- Recommended Reading: Evaluating MPI Performance • 15 minutes
- Recommended Reading: Parallelising Basic Solver Using CUDA • 15 minutes
- Recommended Reading: Evaluating CUDA Solver and Improving Performance • 15 minutes
- Recommended Reading: Using Shared Memory for Solvers • 15 minutes
13 assignments • 138 minutes total
- Introduction to N-body Problem • 9 minutes
- Serial Solutions to the N-body Problem • 9 minutes
- Parallelising Strategy • 9 minutes
- Parallelising Basic Solver Using OpenMP • 9 minutes
- Parallelising Reduced Solver Using OpenMP • 9 minutes
- Evaluating OpenMP Performance • 9 minutes
- Parallelising Basic Solver Using Pthreads • 9 minutes
- Parallelising Basic Solver Using MPI • 30 minutes
- Parallelising Reduced Solver Using MPI • 9 minutes
- Evaluating MPI Performance • 9 minutes
- Parallelising Basic Solver Using CUDA • 9 minutes
- Evaluating CUDA Solver and Improving Performance • 9 minutes
- Using Shared Memory for Solvers • 9 minutes
1 discussion prompt • 30 minutes total
- The N-Body Solver: Exploring Parallelism Across Models • 30 minutes
This module focuses on hands-on implementations of the Sample Sort algorithm using OpenMP, Pthreads, MPI, and CUDA. Students will explore the strengths and limitations of each parallel programming model through practical coding exercises. The module includes performance benchmarking and comparative analysis of the implementations to highlight trade-offs in scalability, efficiency, and suitability for different architectures. By the end of the module, students will have a strong grasp of each API and be equipped to make informed decisions about the most appropriate tool for a given parallel computing task.
What's included
8 videos • 9 readings • 10 assignments • 1 discussion prompt
8 videos • 61 minutes total
- Sample Sort and Bucket Sort • 10 minutes
- Map • 17 minutes
- Implementing Sample Sort Using OpenMP: First Implementation • 5 minutes
- Implementing Sample Sort Using OpenMP: Second Implementation • 7 minutes
- Implementing Sample Sort Using Pthreads • 4 minutes
- Implementing Sample Sort Using MPI • 6 minutes
- Implementing Sample Sort Using MPI: Example • 5 minutes
- Implementing Sample Sort Using CUDA • 7 minutes
9 readings • 115 minutes total
- Recommended Reading: Sample Sort and Bucket Sort • 15 minutes
- Recommended Reading: Map • 10 minutes
- Recommended Reading: Implementing Sample Sort Using OpenMP: First Implementation • 15 minutes
- Recommended Reading: Implementing Sample Sort Using OpenMP: Second Implementation • 15 minutes
- Recommended Reading: Implementing Sample Sort Using Pthreads • 10 minutes
- Recommended Reading: Implementing Sample Sort Using MPI • 15 minutes
- Recommended Reading: Implementing Sample Sort Using MPI: Example • 15 minutes
- Recommended Reading: Implementing Sample Sort Using CUDA • 10 minutes
- Recommended Reading: Which API? • 10 minutes
10 assignments • 432 minutes total
- Sample Sort and Bucket Sort • 9 minutes
- Map (Quiz) • 9 minutes
- Implementing Sample Sort Using OpenMP: First Implementation • 9 minutes
- Implementing Sample Sort Using OpenMP: Second Implementation • 9 minutes
- Implementing Sample Sort Using Pthreads • 9 minutes
- Implementing Sample Sort Using MPI • 9 minutes
- Implementing Sample Sort Using MPI: Example • 9 minutes
- Implementing Sample Sort Using CUDA • 9 minutes
- Graded Quiz - Modules 9 and 10 • 60 minutes
- SGA-2: Odd-Even Transposition Sort Parallelisation • 300 minutes
1 discussion prompt • 30 minutes total
- Parallel Sample Sort Across Platforms • 30 minutes
Final Comprehensive Examination
What's included
1 assignment
1 assignment • 30 minutes total
- Final Comprehensive Examination • 30 minutes
Instructors

Offered by

Birla Institute of Technology & Science, Pilani (BITS Pilani) is one of only ten private universities in India to be recognised as an Institute of Eminence by the Ministry of Human Resource Development, Government of India. It has been consistently ranked highly by both governmental and private ranking agencies for the innovative processes and capabilities that have enabled it to impart quality education and emerge as the best private science and engineering institute in India. BITS Pilani has four campuses, in Pilani, Goa, Hyderabad, and Dubai, and has been offering bachelor's, master's, and certificate programmes for over 58 years, helping to launch the careers of over 100,000 professionals.