Multicore and GPGPU Programming

Instructors: Kunal Kishore Korgaonkar

12 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

8 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Understand the fundamentals of multi-threaded programming and its applications in multicore systems.
Develop shared memory programs in OpenMP and distributed programming using MPI.
Gain a foundational understanding of GPGPU architecture and the CUDA programming model.

Skills you'll gain

Category: Program Development
Category: Memory Management
Category: Performance Testing
Category: Algorithms
Category: Distributed Computing
Category: Microarchitecture

Tools you'll learn

Category: C (Programming Language)

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

124 assignments

Taught in English

There are 12 modules in this course

The course "Multicore and GPGPU Programming" provides a foundational understanding of parallel programming, focusing on developing high-performance, multi-threaded applications in both CPU and GPU environments. Beginning with a review of multicore processor architectures, caching mechanisms, and Non-Uniform Memory Access (NUMA) systems, students will learn the essentials of shared memory programming, synchronisation techniques, and the use of locks to ensure data integrity across threads.

The course covers the design of shared memory data structures and introduces advanced synchronisation concepts, including lazy synchronisation, which are crucial for scalable and efficient concurrent applications. Students will also explore the architecture and programming model of General-Purpose Graphics Processing Units (GPGPUs) and learn CUDA programming to leverage GPU parallelism for compute-intensive tasks. By the end of the course, students will be adept at optimising multi-threaded and many-core applications, balancing workload across CPUs and GPUs to achieve high throughput and efficient resource utilisation. The course is aimed at those who want to develop expertise in high-performance computing and parallel programming for modern multi-core and GPU-based systems.

In this module, learners are introduced to the course and its syllabus. The introductory video outlines the skills and knowledge they can expect to gain. The syllabus reading covers the course values, assessment criteria, grading system, schedule, details of live sessions, and a recommended reading list that supports the course concepts. Learners can also connect with their peers through a discussion prompt designed for introductions and exchanges within the course community.

Included

4 videos, 1 reading, 1 discussion prompt

4 videos (total 51 minutes)

Course Introductory Video (2 minutes)
Meet Your Instructor - Dr. Gargi Prabhu (1 minute)
Meet Your Instructor - Dr. Kunal Korgaonkar (1 minute)
Recording of Multicore and GPGPU Programming: Week 1 - Live Session on 25-05-23 18:32:50 [47:25] (47 minutes)

1 reading (total 10 minutes)

Course Overview (10 minutes)

1 discussion prompt (total 10 minutes)

Meet Your Peers (10 minutes)

In this module, students will gain foundational knowledge of parallel and multi-threaded programming, exploring the core principles that underlie the efficient utilisation of modern multi-core and many-core processors. Beginning with an overview of parallel programming concepts, this module covers different types of parallelism, including data parallelism, task parallelism, and pipeline parallelism. Students will also examine critical performance metrics like speedup, efficiency, and scalability, which help in evaluating the benefits and trade-offs of parallel approaches.
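The speedup and efficiency metrics named above can be made concrete with a short calculation. The following is a minimal sketch in C, assuming the standard textbook forms of Amdahl's and Gustafson's laws (both appear in this module's video list); the function names are illustrative and are not taken from the course materials.

```c
#include <assert.h>

/* Amdahl's law: speedup on n workers when a fraction p of the serial
   run time parallelises perfectly and the remaining (1 - p) stays serial. */
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}

/* Gustafson's law: scaled speedup when the problem size grows with n.
   Here p is the parallel fraction measured on the parallel run. */
double gustafson_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return (1.0 - p) + p * n;
}

/* Efficiency: speedup achieved per worker (1.0 is ideal). */
double efficiency(double speedup, int n) {
    assert(n >= 1);
    return speedup / n;
}
```

For example, with half the work parallelisable (p = 0.5), amdahl_speedup never exceeds 2 regardless of n, which is why the serial fraction tends to dominate scalability discussions.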

Included

12 videos, 2 readings, 12 assignments, 1 discussion prompt

12 videos (total 73 minutes)

Need for Ever-Increasing Performance (8 minutes)
Parallel Systems and Parallel Programs (8 minutes)
Concurrent, Parallel, Distributed Systems (5 minutes)
Types of Parallelism: Data, Task and Pipeline Parallelism (8 minutes)
Speedup and Efficiency (5 minutes)
Amdahl’s Law (5 minutes)
Gustafson’s Law (5 minutes)
Scalability in Parallel Systems (5 minutes)
Cost of Parallelisation (7 minutes)
Sources of Overhead in Parallel Programs (5 minutes)
Timing Parallel Programs: Methods and Best Practices (7 minutes)
GPU Performance (5 minutes)

2 readings (total 120 minutes)

Recommended Reading: Fundamentals of Parallel Computing (60 minutes)
Recommended Reading: Introduction to Performance Metrics in Parallel Computing (60 minutes)

12 assignments (total 36 minutes)

Need for Ever-Increasing Performance (3 minutes)
Parallel Systems and Parallel Programs (3 minutes)
Concurrent, Parallel, Distributed Systems (3 minutes)
Types of Parallelism: Data, Task and Pipeline Parallelism (3 minutes)
Speedup and Efficiency (3 minutes)
Amdahl’s Law (3 minutes)
Gustafson’s Law (3 minutes)
Scalability in MIMD Systems (3 minutes)
Cost of Parallelisation (3 minutes)
Sources of Overhead in Parallel Programs (3 minutes)
Taking Timings of Parallel Programs (3 minutes)
GPU Performance (3 minutes)

1 discussion prompt (total 30 minutes)

Why Parallelism? Revisiting the Roots of Multicore Programming (30 minutes)

This module provides an in-depth exploration of multicore processor architectures, examining the design principles, performance considerations, and challenges involved in building efficient multicore systems. Students will study how multiple cores interact within a processor, focusing on memory hierarchies, caching mechanisms, and the role of parallelism in improving computational performance.

Included

15 videos, 2 readings, 15 assignments, 1 discussion prompt

15 videos (total 160 minutes)

The Von Neumann Architecture (7 minutes)
Processes, Multitasking, and Threads (5 minutes)
The Basics of Caching (7 minutes)
Virtual Memory (7 minutes)
Instruction-Level Parallelism (9 minutes)
Hardware Multithreading (6 minutes)
Classifications of Parallel Computers (6 minutes)
SIMD and MIMD Systems (7 minutes)
Interconnection Networks: Shared Memory Systems (6 minutes)
Interconnection Networks: Distributed Memory Systems (8 minutes)
Cache Coherence (8 minutes)
Shared-Memory vs. Distributed-Memory (4 minutes)
Parallel Software: Coordinating Process and Threads (11 minutes)
Distributed Memory Software (7 minutes)
Recording of Multicore and GPGPU Programming: Week 2 - Live Session on 25-05-30 18:35:08 [02:05] (62 minutes)

2 readings (total 100 minutes)

Recommended Reading: Architecture Background (40 minutes)
Recommended Reading: Parallel Hardware and Software (60 minutes)

15 assignments (total 114 minutes)

The Von Neumann Architecture (3 minutes)
Processes, Multitasking, and Threads (3 minutes)
The Basics of Caching (3 minutes)
Virtual Memory (3 minutes)
Instruction-Level Parallelism (3 minutes)
Hardware Multithreading (3 minutes)
Classifications of Parallel Computers (3 minutes)
SIMD and MIMD Systems (3 minutes)
Interconnection Networks: Shared Memory Systems (3 minutes)
Interconnection Networks: Distributed Memory Systems (6 minutes)
Cache Coherence (3 minutes)
Shared-Memory vs. Distributed-Memory (3 minutes)
Parallel Software: Coordinating Process and Threads (12 minutes)
Distributed Memory Software (3 minutes)
Graded Quiz - Modules 1 and 2 (60 minutes)

1 discussion prompt (total 30 minutes)

From Von Neumann to Multicore: Evolving Architectures and Memory Realities (30 minutes)

This module introduces students to the architectural principles of General-Purpose GPU (GPGPU) systems and the CUDA programming model. It explores the hardware components, including Streaming Multiprocessors (SMs), CUDA cores, and memory hierarchy, which form the foundation of GPU computing. The module also provides an overview of the CUDA programming model, emphasising its thread hierarchy, grid, and block organisation. By understanding these fundamental concepts, students will develop the ability to harness GPU architecture for high-performance parallel computing.

Included

15 videos, 2 readings, 14 assignments, 1 discussion prompt

15 videos (total 127 minutes)

GPUs and GPGPU (5 minutes)
GPU Architecture (5 minutes)
Heterogeneous Computing (4 minutes)
Paradigm of Heterogeneous Computing (5 minutes)
Introduction to CUDA (5 minutes)
Structure of a CUDA Program (8 minutes)
Threads, Blocks, and Grid (9 minutes)
Managing Memory (7 minutes)
Writing and Verifying Your Kernel (6 minutes)
Compiling and Running CUDA Program (4 minutes)
Nvidia Compute Capabilities and Device Architecture (6 minutes)
Timing Your Kernel (7 minutes)
Organising Parallel Threads (5 minutes)
Managing Devices (4 minutes)
Recording of Multicore and GPGPU Programming: Week 3 - Live Session on 25-06-06 18:31:21 [44:50] (45 minutes)

2 readings (total 75 minutes)

Recommended Reading: GPGPU Architecture and CUDA (15 minutes)
Recommended Reading: Programming Model Overview (60 minutes)

14 assignments (total 48 minutes)

GPUs and GPGPU (6 minutes)
GPU Architecture (3 minutes)
Heterogeneous Computing (3 minutes)
Paradigm of Heterogeneous Computing (3 minutes)
Introduction to CUDA (3 minutes)
Structure of a CUDA Program (3 minutes)
Threads, Blocks, and Grid (6 minutes)
Managing Memory (3 minutes)
Writing and Verifying Your Kernel (3 minutes)
Compiling and Running CUDA Program (3 minutes)
Nvidia Compute Capabilities and Device Architecture (3 minutes)
Timing Your Kernel (3 minutes)
Organising Parallel Threads (3 minutes)
Managing Devices (3 minutes)

1 discussion prompt (total 30 minutes)

Harnessing GPU Power: Exploring CUDA and the Architecture of Parallelism (30 minutes)

This module provides a comprehensive understanding of how CUDA executes programs on GPUs. It covers key concepts such as warps, warp scheduling, and resource partitioning, which are critical for understanding GPU hardware behaviour. The module delves into branch divergence and its impact on performance, offering strategies to minimise its effects. It also emphasises exposing parallelism effectively by leveraging CUDA’s hierarchical execution model. Students will learn how to design and optimise GPU programs by aligning with the underlying execution model to maximise efficiency and throughput.

Included

15 videos, 2 readings, 15 assignments, 1 discussion prompt

15 videos (total 135 minutes)

Introduction to CUDA Execution Model (7 minutes)
Warps and Thread Blocks (4 minutes)
Warp Divergence (9 minutes)
Resource Partitioning (6 minutes)
Latency Hiding (10 minutes)
Occupancy (5 minutes)
Synchronization (4 minutes)
Scalability (5 minutes)
Exposing Parallelism (10 minutes)
Checking Active Warps with Nvprof (6 minutes)
Checking Memory Operations with Nvprof (7 minutes)
Avoiding Branch Divergence (3 minutes)
The Parallel Reduction Problem and Thread Divergence (7 minutes)
Improving Divergence in Parallel Reduction (6 minutes)
Recording of Multicore and GPGPU Programming: Week 4 - Live Session on 25-06-13 18:32:39 [49:37] (45 minutes)

2 readings (total 120 minutes)

Recommended Reading: Structure of a CUDA Program (60 minutes)
Recommended Reading: Exposing Parallelism and Avoiding Branch Divergence (60 minutes)

15 assignments (total 105 minutes)

Introduction to CUDA Execution Model (3 minutes)
Warps and Thread Blocks (3 minutes)
Warp Divergence (3 minutes)
Resource Partitioning (6 minutes)
Latency Hiding (3 minutes)
Occupancy (3 minutes)
Synchronization (3 minutes)
Scalability (3 minutes)
Exposing Parallelism (3 minutes)
Checking Active Warps with Nvprof (3 minutes)
Checking Memory Operations with Nvprof (3 minutes)
Avoiding Branch Divergence (3 minutes)
The Parallel Reduction Problem and Thread Divergence (3 minutes)
Improving Divergence in Parallel Reduction (3 minutes)
Graded Quiz - Modules 3 and 4 (60 minutes)

1 discussion prompt (total 30 minutes)

Under the Hood: Warps, Divergence, and CUDA Execution Dynamics (30 minutes)

The CUDA Memory Model & Streams and Concurrency module introduces students to the intricacies of memory hierarchy in CUDA, including global, shared, and local memory. It emphasises the importance of memory coalescing and efficient memory access patterns to optimise performance on GPUs. The module also covers CUDA streams, explaining how concurrent kernel execution and memory operations can be managed to enhance parallelism. By understanding these concepts, students will gain the ability to design GPU programs that maximise throughput and minimise latency.

Included

14 videos, 2 readings, 14 assignments, 1 discussion prompt, 1 ungraded lab

14 videos (total 126 minutes)

Introduction to CUDA Memory Model (8 minutes)
Memory Allocation and Deallocation (6 minutes)
Zero Copy Memory (4 minutes)
Unified Virtual Addressing and Unified Memory (3 minutes)
Aligned and Coalesced Access (6 minutes)
CUDA Shared Memory (6 minutes)
Shared Memory Banks and Access Mode (7 minutes)
Configuring the Amount of Shared Memory (5 minutes)
Synchronisation (9 minutes)
CUDA Streams (7 minutes)
Stream Scheduling and Priorities (6 minutes)
CUDA Events (6 minutes)
Concurrent Kernel Execution (6 minutes)
Recording of Multicore and GPGPU Programming: Week 5 - Live Session on 25-06-20 18:31:59 [47:36] (48 minutes)

2 readings (total 120 minutes)

Recommended Reading: CUDA Memory Model (60 minutes)
Recommended Reading: Streams and Concurrency (60 minutes)

14 assignments (total 342 minutes)

Introduction to CUDA Memory Model (3 minutes)
Memory Allocation and Deallocation (3 minutes)
Zero Copy Memory (3 minutes)
Unified Virtual Addressing and Unified Memory (3 minutes)
Aligned and Coalesced Access (3 minutes)
CUDA Shared Memory (6 minutes)
Shared Memory Banks and Access Mode (3 minutes)
Configuring the Amount of Shared Memory (3 minutes)
Synchronisation (3 minutes)
CUDA Streams (3 minutes)
Stream Scheduling and Priorities (3 minutes)
CUDA Events (3 minutes)
Concurrent Kernel Execution (3 minutes)
SGA-1: CUDA Programming and Performance Optimisation (300 minutes)

1 discussion prompt (total 30 minutes)

Smart Memory and Seamless Concurrency: CUDA Memory and Streams (30 minutes)

1 ungraded lab (total 60 minutes)

Hands-on lab: Parallel Matrix Addition Using CUDA (60 minutes)

This module explains in depth the difference between processes and threads and introduces multithreaded programming using the Pthreads library. Students will learn the library's main functions and apply them to solve real-world problems with a multithreaded approach. The module also discusses precautions to take when developing algorithms that use multithreading.

Included

10 videos, 11 readings, 10 assignments, 1 discussion prompt

10 videos (total 116 minutes)

Processes, Threads and Pthreads (4 minutes)
Hello World!! (9 minutes)
Matrix-Vector Multiplication (13 minutes)
Critical Sections (5 minutes)
Busy Waiting (6 minutes)
Mutexes (5 minutes)
Semaphores (7 minutes)
Barriers and Condition Variables (13 minutes)
Caches, Cache-Coherence and False Sharing (9 minutes)
Recording of Multicore and GPGPU Programming: Week 6 - Live Session on 25-06-27 18:38:36 [43:53] (44 minutes)

11 readings (total 295 minutes)

Recommended Reading: Processes, Threads and Pthreads (10 minutes)
Recommended Reading: Hello World!! (60 minutes)
Recommended Reading: Matrix-Vector Multiplication (15 minutes)
Recommended Reading: Critical Sections (30 minutes)
Recommended Reading: Busy Waiting (20 minutes)
Recommended Reading: Mutexes (15 minutes)
Recommended Reading: Semaphores (30 minutes)
Recommended Reading: Barriers and Condition Variables (30 minutes)
Recommended Reading: Read-Write Locks (60 minutes)
Recommended Reading: Caches, Cache-Coherence and False Sharing (15 minutes)
Lab Instruction Document (10 minutes)

10 assignments (total 135 minutes)

Processes, Threads and Pthreads (9 minutes)
Hello World!! (9 minutes)
Matrix-Vector Multiplication (9 minutes)
Critical Sections (9 minutes)
Busy Waiting (9 minutes)
Mutexes (9 minutes)
Semaphores (6 minutes)
Barriers and Condition Variables (6 minutes)
Caches, Cache-Coherence and False Sharing (9 minutes)
Graded Quiz - Modules 5 and 6 (60 minutes)

1 discussion prompt (total 10 minutes)

Thread Synchronization and Shared Memory: Building Reliable Parallel Programs with Pthreads (10 minutes)

This module introduces distributed-memory programming using the Message Passing Interface (MPI). Students will learn the functions provided by the MPI library and what each one does. This will enable them to write parallel programs and to convert serial code into parallel code using MPI functions.
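As an illustration of the kind of serial code that serves as a starting point for conversion, here is a minimal sketch of odd-even transposition sort in plain C (the algorithm is named in this module's video list). It is a serial version only and contains no MPI calls; the function name odd_even_sort is an assumption made for this example and does not come from the course materials.

```c
#include <assert.h>
#include <stddef.h>

/* Serial odd-even transposition sort (ascending, in place).
   Runs n phases; after n phases the array is guaranteed sorted. */
void odd_even_sort(int *a, size_t n) {
    assert(a != NULL || n == 0);
    for (size_t phase = 0; phase < n; phase++) {
        /* Even phases compare pairs (0,1), (2,3), ...
           Odd phases compare pairs (1,2), (3,4), ... */
        for (size_t i = phase % 2; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}
```

Within a single phase the compared pairs do not overlap, so every compare-and-swap in that phase is independent of the others. That independence is what makes the algorithm a natural candidate for a message-passing version, where each process would typically hold a block of keys and exchange data with a neighbouring process in each phase.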

Included

7 videos, 9 readings, 7 assignments, 1 discussion prompt

7 videos (total 70 minutes)

Introduction to MPI (4 minutes)
MPI Setup and Communicator Functions (6 minutes)
SPMD and Communication (10 minutes)
Potential Pitfalls (4 minutes)
Simple Serial Sorting Algorithm (20 minutes)
Parallel Odd-Even Transposition Sort (19 minutes)
Safety in MPI Programs (7 minutes)

9 readings (total 125 minutes)

Recommended Reading: Introduction to MPI (15 minutes)
Recommended Reading: MPI Setup and Communicator Functions (15 minutes)
Recommended Reading: SPMD and Communication (15 minutes)
Recommended Reading: Potential Pitfalls (15 minutes)
Recommended Reading: Simple Serial Sorting Algorithm (15 minutes)
Recommended Reading: Parallel Odd-Even Transposition Sort (15 minutes)
Recommended Reading: Safety in MPI Programs (15 minutes)
Lab: Practice Code (10 minutes)
Lab: Practice Solution (10 minutes)

7 assignments (total 63 minutes)

Introduction to MPI (9 minutes)
MPI Setup and Communicator Functions (9 minutes)
SPMD and Communication (9 minutes)
Potential Pitfalls (9 minutes)
Simple Serial Sorting Algorithm (9 minutes)
Parallel Odd-Even Transposition Sort (9 minutes)
Safety in MPI Programs (9 minutes)

1 discussion prompt (total 30 minutes)

MPI in Action: Understanding Setup, Communication, and Parallel Sorting (30 minutes)

This module introduces the shared-memory programming model using OpenMP. Students will gain exposure to OpenMP's directives and library functions and learn how to use them in code to express shared-memory parallelism. Students will explore the foundational concepts of OpenMP through videos and readings, starting with the basics of the library and progressing to more advanced topics such as reduction clauses, variable scoping, and mutual exclusion. Through worked examples like the Trapezoidal Rule and sorting functions, learners will understand how to parallelise loops, manage scheduling, and apply critical sections and locks for safe concurrent execution. The module also covers tasking in OpenMP and classic concurrency problems like producers and consumers.
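The Trapezoidal Rule and the reduction clause mentioned above combine naturally in one loop. The following is a minimal sketch in C, not the course's own worked example: the integrand f(x) = x * x and the function names are assumptions chosen for illustration. When compiled without OpenMP support the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>

/* Hypothetical integrand used only for this illustration. */
static double integrand(double x) { return x * x; }

/* Approximates the integral of integrand over [a, b] using n trapezoids.
   The reduction clause gives each thread a private partial sum and
   combines the partial sums when the loop ends. */
double trapezoid(double a, double b, int n) {
    assert(n >= 1 && b >= a);
    double h = (b - a) / n;
    double sum = (integrand(a) + integrand(b)) / 2.0;

    #pragma omp parallel for reduction(+ : sum)
    for (int i = 1; i < n; i++) {
        sum += integrand(a + i * h);
    }
    return h * sum;
}
```

Calling trapezoid(0.0, 1.0, 1000) should return a value close to 1/3. With GCC or Clang, OpenMP is usually enabled with the -fopenmp flag. Without the reduction clause, concurrent updates to sum would be a data race, which is the situation that critical sections and atomic directives also address.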

Included

12 videos, 12 readings, 13 assignments, 1 discussion prompt

12 videos (total 94 minutes)

Introduction to OpenMP (5 minutes)
Programming in OpenMP (10 minutes)
Trapezoidal Rule (10 minutes)
Scope of Variables (4 minutes)
Reduction Clause (7 minutes)
Parallel-For Directive and Caveats in Them (8 minutes)
Sorting Functions (20 minutes)
Scheduling (6 minutes)
Producers and Consumers (6 minutes)
Termination, Startup and Atomic Directive (7 minutes)
Critical Sections and Locks (6 minutes)
Tasking (5 minutes)

12 readings (total 152 minutes)

Recommended Reading: Introduction to OpenMP (15 minutes)
Recommended Reading: Programming in OpenMP (15 minutes)
Recommended Reading: Trapezoidal Rule (15 minutes)
Recommended Reading: Scope of Variables (15 minutes)
Recommended Reading: Reduction Clause (15 minutes)
Recommended Reading: Parallel-For Directive and Caveats in Them (15 minutes)
Recommended Reading: Sorting Functions (15 minutes)
Recommended Reading: Scheduling (15 minutes)
Recommended Reading: Producers and Consumers (15 minutes)
Recommended Reading: Termination, Startup and Atomic Directive (1 minute)
Recommended Reading: Critical Sections and Locks (1 minute)
Recommended Reading: Tasking (15 minutes)

13 assignments (total 168 minutes)

Introduction to OpenMP (9 minutes)
Programming in OpenMP (9 minutes)
Trapezoidal Rule (9 minutes)
Scope of Variables (9 minutes)
Reduction Clause (9 minutes)
Parallel-For Directive and Caveats in Them (9 minutes)
Sorting Functions (9 minutes)
Scheduling (9 minutes)
Producers and Consumers (9 minutes)
Termination, Startup and Atomic Directive (9 minutes)
Critical Sections and Locks (9 minutes)
Tasking (9 minutes)
Graded Quiz - Modules 7 and 8 (60 minutes)

1 discussion prompt (total 30 minutes)

Mastering OpenMP: From Parallel Patterns to Synchronisation (30 minutes)

This module will introduce the n-body problem in physics, examining its significance in simulating gravitational interactions among multiple particles. It will explore classical and modern algorithmic approaches to solving the n-body problem, followed by a discussion on their computational complexity. Emphasis will be placed on identifying opportunities for parallelisation, and students will analyse and implement efficient parallel solutions using the programming languages and parallel computing directives covered in the course.

Included

13 videos, 13 readings, 13 assignments, 1 discussion prompt

13 videos (total 107 minutes)

Introduction to N-body Problem (8 minutes)
Serial Solutions to the N-body Problem (16 minutes)
Parallelising Strategy (13 minutes)
Parallelising Basic Solver Using OpenMP (9 minutes)
Parallelising Reduced Solver Using OpenMP (11 minutes)
Evaluating OpenMP Performance (5 minutes)
Parallelising Basic Solver Using Pthreads (4 minutes)
Parallelising Basic Solver Using MPI (9 minutes)
Parallelising Reduced Solver Using MPI (9 minutes)
Evaluating MPI Performance (6 minutes)
Parallelising Basic Solver Using CUDA (7 minutes)
Evaluating CUDA Solver and Improving Performance (4 minutes)
Using Shared Memory for Solvers (7 minutes)

13 readings (total 195 minutes)

Recommended Reading: Introduction to N-body Problem (15 minutes)
Recommended Reading: Serial Solutions to the N-body Problem (15 minutes)
Recommended Reading: Parallelising Strategy (15 minutes)
Recommended Reading: Parallelising Basic Solver Using OpenMP (15 minutes)
Recommended Reading: Parallelising Reduced Solver Using OpenMP (15 minutes)
Recommended Reading: Evaluating OpenMP Performance (15 minutes)
Recommended Reading: Parallelising Basic Solver Using Pthreads (15 minutes)
Recommended Reading: Parallelising Basic Solver Using MPI (15 minutes)
Recommended Reading: Parallelising Reduced Solver Using MPI (15 minutes)
Recommended Reading: Evaluating MPI Performance (15 minutes)
Recommended Reading: Parallelising Basic Solver Using CUDA (15 minutes)
Recommended Reading: Evaluating CUDA Solver and Improving Performance (15 minutes)
Recommended Reading: Using Shared Memory for Solvers (15 minutes)

13 assignments (total 138 minutes)

Introduction to N-body Problem (9 minutes)
Serial Solutions to the N-body Problem (9 minutes)
Parallelising Strategy (9 minutes)
Parallelising Basic Solver Using OpenMP (9 minutes)
Parallelising Reduced Solver Using OpenMP (9 minutes)
Evaluating OpenMP Performance (9 minutes)
Parallelising Basic Solver Using Pthreads (9 minutes)
Parallelising Basic Solver Using MPI (30 minutes)
Parallelising Reduced Solver Using MPI (9 minutes)
Evaluating MPI Performance (9 minutes)
Parallelising Basic Solver Using CUDA (9 minutes)
Evaluating CUDA Solver and Improving Performance (9 minutes)
Using Shared Memory for Solvers (9 minutes)

1 discussion prompt (total 30 minutes)

The N-Body Solver: Exploring Parallelism Across Models (30 minutes)

This module focuses on hands-on implementations of the Sample Sort algorithm using OpenMP, Pthreads, MPI, and CUDA. Students will explore the strengths and limitations of each parallel programming model through practical coding exercises. The module includes performance benchmarking and comparative analysis of the implementations to highlight trade-offs in scalability, efficiency, and suitability for different architectures. By the end of the module, students will have a strong grasp of each API and be equipped to make informed decisions about the most appropriate tool for a given parallel computing task.
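To show the structure that each of these implementations parallelises, here is a minimal serial sketch of sample sort in C. It is an illustration under stated assumptions, not the course's implementation: the names sample_sort, bucket_of and cmp_int are hypothetical, and splitters are chosen by simple regular sampling, which is only one of several possible strategies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Three-way comparison for qsort; avoids overflow from subtraction. */
static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Bucket b receives keys k with splitters[b-1] <= k < splitters[b]. */
static int bucket_of(int key, const int *splitters, int nbuckets) {
    int b = 0;
    while (b < nbuckets - 1 && key >= splitters[b]) b++;
    return b;
}

/* Sorts a[0..n-1] in ascending order using nbuckets buckets. */
void sample_sort(int *a, int n, int nbuckets) {
    assert(n >= 0 && nbuckets >= 1);
    if (nbuckets == 1 || n < 2 * nbuckets) {   /* too small to sample */
        if (n > 1) qsort(a, (size_t)n, sizeof *a, cmp_int);
        return;
    }
    int *sample = malloc((size_t)nbuckets * sizeof *sample);
    int *start = calloc((size_t)nbuckets + 1, sizeof *start);
    int *next = malloc((size_t)nbuckets * sizeof *next);
    int *tmp = malloc((size_t)n * sizeof *tmp);
    assert(sample && start && next && tmp);

    /* 1. Take an evenly spaced sample and sort it; the sorted sample
          minus its smallest key gives nbuckets - 1 splitters. */
    int stride = n / nbuckets;
    for (int i = 0; i < nbuckets; i++) sample[i] = a[i * stride];
    qsort(sample, (size_t)nbuckets, sizeof *sample, cmp_int);
    const int *splitters = sample + 1;

    /* 2. Count keys per bucket, then prefix-sum into start offsets. */
    for (int i = 0; i < n; i++) start[bucket_of(a[i], splitters, nbuckets) + 1]++;
    for (int b = 0; b < nbuckets; b++) start[b + 1] += start[b];

    /* 3. Scatter each key into its bucket's region of tmp. */
    memcpy(next, start, (size_t)nbuckets * sizeof *next);
    for (int i = 0; i < n; i++) tmp[next[bucket_of(a[i], splitters, nbuckets)]++] = a[i];

    /* 4. Sort every bucket independently; this loop is the step a
          parallel version would distribute across threads or processes. */
    for (int b = 0; b < nbuckets; b++)
        qsort(tmp + start[b], (size_t)(start[b + 1] - start[b]), sizeof *tmp, cmp_int);

    memcpy(a, tmp, (size_t)n * sizeof *a);
    free(sample); free(start); free(next); free(tmp);
}
```

Because every key in bucket b is smaller than every key in bucket b + 1, the buckets can be sorted with no communication between them. How evenly the splitters divide the keys largely determines load balance, which is one of the trade-offs the benchmarking in this module is likely to expose.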

Included

8 videos, 9 readings, 10 assignments, 1 discussion prompt

8 videos (total 61 minutes)

Sample Sort and Bucket Sort (10 minutes)
Map (17 minutes)
Implementing Sample Sort Using OpenMP: First Implementation (5 minutes)
Implementing Sample Sort Using OpenMP: Second Implementation (7 minutes)
Implementing Sample Sort Using Pthreads (4 minutes)
Implementing Sample Sort Using MPI (6 minutes)
Implementing Sample Sort Using MPI: Example (5 minutes)
Implementing Sample Sort Using CUDA (7 minutes)

9 readings (total 115 minutes)

Recommended Reading: Sample Sort and Bucket Sort (15 minutes)
Recommended Reading: Map (10 minutes)
Recommended Reading: Implementing Sample Sort Using OpenMP: First Implementation (15 minutes)
Recommended Reading: Implementing Sample Sort Using OpenMP: Second Implementation (15 minutes)
Recommended Reading: Implementing Sample Sort Using Pthreads (10 minutes)
Recommended Reading: Implementing Sample Sort Using MPI (15 minutes)
Recommended Reading: Implementing Sample Sort Using MPI: Example (15 minutes)
Recommended Reading: Implementing Sample Sort Using CUDA (10 minutes)
Recommended Reading: Which API? (10 minutes)

10 assignments (total 432 minutes)

Sample Sort and Bucket Sort (9 minutes)
Map (Quiz) (9 minutes)
Implementing Sample Sort Using OpenMP: First Implementation (9 minutes)
Implementing Sample Sort Using OpenMP: Second Implementation (9 minutes)
Implementing Sample Sort Using Pthreads (9 minutes)
Implementing Sample Sort Using MPI (9 minutes)
Implementing Sample Sort Using MPI: Example (9 minutes)
Implementing Sample Sort Using CUDA (9 minutes)
Graded Quiz - Modules 9 and 10 (60 minutes)
SGA-2: Odd-Even Transposition Sort Parallelisation (300 minutes)

1 discussion prompt (total 30 minutes)

Parallel Sample Sort Across Platforms (30 minutes)

Final Comprehensive Examination

Included

1 assignment

Instructors

Kunal Kishore Korgaonkar

Birla Institute of Technology & Science, Pilani

2 courses, 1,945 learners

Prof. Gargi Prabhu

Birla Institute of Technology & Science, Pilani

1 course, 62 learners

Offered by

Birla Institute of Technology & Science, Pilani

Frequently Asked Questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your chosen learning program, you’ll find a link to apply on the description page.
