The course "Multicore and GPGPU Programming" provides a foundational understanding of parallel programming, focusing on developing high-performance, multi-threaded applications in both CPU and GPU environments. Beginning with a review of multicore processor architectures, caching mechanisms, and Non-Uniform Memory Access (NUMA) systems, students will learn the essentials of shared memory programming, synchronisation techniques, and the use of locks to ensure data integrity across threads.

Multicore and GPGPU Programming

Recommended experience
Intermediate level
Basic knowledge of C/C++ and computer architecture is recommended.
What you'll learn
Understand the fundamentals of multi-threaded programming and its applications in multicore systems.
Develop shared memory programs in OpenMP and distributed programming using MPI.
Gain a foundational understanding of GPGPU architecture and the CUDA programming model.
Skills you'll gain
- Program Development
- Distributed Computing
- Algorithms
- Performance Testing
- Computer Hardware
Tools you'll learn
- C (Programming Language)
Details to know

124 assignments

There are 12 modules in this course
In this module, the learners will be introduced to the course and its syllabus, setting the foundation for their learning journey. The course's introductory video will provide them with insights into the valuable skills and knowledge they can expect to gain throughout the duration of this course. Additionally, the syllabus reading will comprehensively outline essential course components, including course values, assessment criteria, grading system, schedule, details of live sessions, and a recommended reading list that will enhance the learner’s understanding of the course concepts. Moreover, this module offers the learners the opportunity to connect with fellow learners as they participate in a discussion prompt designed to facilitate introductions and exchanges within the course community.
Included
4 videos, 1 reading, 1 discussion prompt
4 videos•Total 51 minutes
- Course Introductory Video•2 minutes
- Meet Your Instructor - Dr. Gargi Prabhu •1 minute
- Meet Your Instructor - Dr. Kunal Korgaonkar•1 minute
- Recording of Multicore and GPGPU Programming: Week 1 - Live Session on 25-05-23 18:32:50 [47:25]•47 minutes
1 reading•Total 10 minutes
- Course Overview•10 minutes
1 discussion prompt•Total 10 minutes
- Meet Your Peers•10 minutes
In this module, students will gain foundational knowledge of parallel and multi-threaded programming, exploring the core principles that underlie the efficient utilisation of modern multi-core and many-core processors. Beginning with an overview of parallel programming concepts, this module covers different types of parallelism, including data parallelism, task parallelism, and pipeline parallelism. Students will also examine critical performance metrics like speedup, efficiency, and scalability, which help in evaluating the benefits and trade-offs of parallel approaches.
Included
12 videos, 2 readings, 12 assignments, 1 discussion prompt
12 videos•Total 73 minutes
- Need for Ever-Increasing Performance•8 minutes
- Parallel Systems and Parallel Programs•8 minutes
- Concurrent, Parallel, Distributed Systems•5 minutes
- Types of Parallelism: Data, Task and Pipeline Parallelism•8 minutes
- Speedup and Efficiency•5 minutes
- Amdahl’s Law •5 minutes
- Gustafson’s Law •5 minutes
- Scalability in Parallel Systems•5 minutes
- Cost of Parallelisation•7 minutes
- Sources of Overhead in Parallel Programs •5 minutes
- Timing Parallel Programs: Methods and Best Practices•7 minutes
- GPU Performance•5 minutes
2 readings•Total 120 minutes
- Recommended Reading: Fundamentals of Parallel Computing•60 minutes
- Recommended Reading: Introduction to Performance Metrics in Parallel Computing•60 minutes
12 assignments•Total 36 minutes
- Need for Ever-Increasing Performance•3 minutes
- Parallel Systems and Parallel Programs•3 minutes
- Concurrent, Parallel, Distributed Systems•3 minutes
- Types of Parallelism: Data, Task and Pipeline Parallelism•3 minutes
- Speedup and Efficiency•3 minutes
- Amdahl’s Law •3 minutes
- Gustafson’s Law •3 minutes
- Scalability in MIMD Systems•3 minutes
- Cost of Parallelisation•3 minutes
- Sources of Overhead in Parallel Programs•3 minutes
- Taking Timings of Parallel Programs•3 minutes
- GPU Performance•3 minutes
1 discussion prompt•Total 30 minutes
- Why Parallelism? Revisiting the Roots of Multicore Programming•30 minutes
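The speedup, efficiency, Amdahl's Law and Gustafson's Law topics listed in this module can be illustrated with a short calculation. The following is a minimal sketch in Python, not course material; the function and parameter names are hypothetical, and `parallel_fraction` is assumed to be the share of the work that can run in parallel.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: fixed problem size, so the serial part never shrinks."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Gustafson's Law: the parallel part of the work grows with the worker count."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers


def efficiency(speedup: float, workers: int) -> float:
    """Speedup per worker; 1.0 corresponds to ideal linear scaling."""
    return speedup / workers


if __name__ == "__main__":
    for p in (2, 8, 64):
        s = amdahl_speedup(0.9, p)
        print(p, round(s, 3), round(efficiency(s, p), 3), gustafson_speedup(0.9, p))
```

With 90% of the work parallelisable, the Amdahl speedup stays below 10 however many workers are added, which is why the serial fraction matters so much when evaluating scalability.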
This module provides an in-depth exploration of multicore processor architectures, examining the design principles, performance considerations, and challenges involved in building efficient multicore systems. Students will study how multiple cores interact within a processor, focusing on memory hierarchies, caching mechanisms, and the role of parallelism in improving computational performance.
Included
15 videos, 2 readings, 15 assignments, 1 discussion prompt
15 videos•Total 160 minutes
- The Von Neumann Architecture•7 minutes
- Processes, Multitasking, and Threads•5 minutes
- The Basics of Caching•7 minutes
- Virtual Memory•7 minutes
- Instruction-Level Parallelism•9 minutes
- Hardware Multithreading•6 minutes
- Classifications of Parallel Computers•6 minutes
- SIMD and MIMD Systems•7 minutes
- Interconnection Networks: Shared Memory Systems•6 minutes
- Interconnection Networks: Distributed Memory Systems•8 minutes
- Cache Coherence•8 minutes
- Shared-Memory vs. Distributed-Memory•4 minutes
- Parallel Software: Coordinating Process and Threads•11 minutes
- Distributed Memory Software•7 minutes
- Recording of Multicore and GPGPU Programming: Week 2 - Live Session on 25-05-30 18:35:08 [02:05]•62 minutes
2 readings•Total 100 minutes
- Recommended Reading: Architecture Background•40 minutes
- Recommended Reading: Parallel Hardware and Software•60 minutes
15 assignments•Total 114 minutes
- Graded Quiz - Modules 1 and 2 •60 minutes
- The Von Neumann Architecture•3 minutes
- Processes, Multitasking, and Threads•3 minutes
- The Basics of Caching•3 minutes
- Virtual Memory•3 minutes
- Instruction-Level Parallelism•3 minutes
- Hardware Multithreading•3 minutes
- Classifications of Parallel Computers•3 minutes
- SIMD and MIMD Systems•3 minutes
- Interconnection Networks: Shared Memory Systems•3 minutes
- Interconnection Networks: Distributed Memory Systems•6 minutes
- Cache Coherence•3 minutes
- Shared-Memory vs. Distributed-Memory•3 minutes
- Parallel Software: Coordinating Process and Threads•12 minutes
- Distributed Memory Software•3 minutes
1 discussion prompt•Total 30 minutes
- From Von Neumann to Multicore: Evolving Architectures and Memory Realities•30 minutes
This module introduces students to the architectural principles of General-Purpose GPU (GPGPU) systems and the CUDA programming model. It explores the hardware components, including Streaming Multiprocessors (SMs), CUDA cores, and memory hierarchy, which form the foundation of GPU computing. The module also provides an overview of the CUDA programming model, emphasising its thread hierarchy, grid, and block organisation. By understanding these fundamental concepts, students will develop the ability to harness GPU architecture for high-performance parallel computing.
Included
15 videos, 2 readings, 14 assignments, 1 discussion prompt
15 videos•Total 127 minutes
- GPUs and GPGPU•5 minutes
- GPU Architecture•5 minutes
- Heterogeneous Computing•4 minutes
- Paradigm of Heterogeneous Computing•5 minutes
- Introduction to CUDA•5 minutes
- Structure of a CUDA Program•8 minutes
- Threads, Blocks, and Grid•9 minutes
- Managing Memory•7 minutes
- Writing and Verifying Your Kernel•6 minutes
- Compiling and Running CUDA Program•4 minutes
- Nvidia Compute Capabilities and Device Architecture•6 minutes
- Timing Your Kernel•7 minutes
- Organising Parallel Threads•5 minutes
- Managing Devices•4 minutes
- Recording of Multicore and GPGPU Programming: Week 3 - Live Session on 25-06-06 18:31:21 [44:50]•45 minutes
2 readings•Total 75 minutes
- Recommended Reading: GPGPU Architecture and CUDA•15 minutes
- Recommended Reading: Programming Model Overview•60 minutes
14 assignments•Total 48 minutes
- GPUs and GPGPU•6 minutes
- GPU Architecture•3 minutes
- Heterogeneous Computing•3 minutes
- Paradigm of Heterogeneous Computing•3 minutes
- Introduction to CUDA•3 minutes
- Structure of a CUDA Program•3 minutes
- Threads, Blocks, and Grid•6 minutes
- Managing Memory•3 minutes
- Writing and Verifying Your Kernel•3 minutes
- Compiling and Running CUDA Program•3 minutes
- Nvidia Compute Capabilities and Device Architecture•3 minutes
- Timing Your Kernel•3 minutes
- Organising Parallel Threads•3 minutes
- Managing Devices•3 minutes
1 discussion prompt•Total 30 minutes
- Harnessing GPU Power: Exploring CUDA and the Architecture of Parallelism•30 minutes
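The grid, block and thread organisation described for this module comes down to a small amount of index arithmetic. The sketch below emulates it in plain Python rather than CUDA, assuming a one-dimensional launch; `grid_size` and `global_indices` are hypothetical helper names, and the loop variables mirror CUDA's `blockIdx.x`, `blockDim.x` and `threadIdx.x`.

```python
def grid_size(n: int, block_dim: int) -> int:
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block_dim - 1) // block_dim


def global_indices(n: int, block_dim: int) -> list:
    """Global element index handled by each thread of a 1-D launch."""
    indices = []
    for block_idx in range(grid_size(n, block_dim)):    # plays the role of blockIdx.x
        for thread_idx in range(block_dim):              # plays the role of threadIdx.x
            i = block_idx * block_dim + thread_idx       # blockIdx.x * blockDim.x + threadIdx.x
            if i < n:                                    # bounds guard for the last, partial block
                indices.append(i)
    return indices


if __name__ == "__main__":
    print(grid_size(1000, 256))
    print(global_indices(10, 4))
```

The bounds guard matters whenever the element count is not a multiple of the block size: the last block is launched in full, and its surplus threads must not touch memory.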
This module provides a comprehensive understanding of how CUDA executes programs on GPUs. It covers key concepts such as warps, warp scheduling, and resource partitioning, which are critical for understanding GPU hardware behaviour. The module delves into branch divergence and its impact on performance, offering strategies to minimise its effects. It also emphasises exposing parallelism effectively by leveraging CUDA’s hierarchical execution model. Students will learn how to design and optimise GPU programs by aligning with the underlying execution model to maximise efficiency and throughput.
Included
15 videos, 2 readings, 15 assignments, 1 discussion prompt
15 videos•Total 135 minutes
- Introduction to CUDA Execution Model•7 minutes
- Warps and Thread Blocks•4 minutes
- Warp Divergence•9 minutes
- Resource Partitioning•6 minutes
- Latency Hiding•10 minutes
- Occupancy•5 minutes
- Synchronization•4 minutes
- Scalability•5 minutes
- Exposing Parallelism•10 minutes
- Checking Active Warps with Nvprof•6 minutes
- Checking Memory Operations with Nvprof•7 minutes
- Avoiding Branch Divergence•3 minutes
- The Parallel Reduction Problem and Thread Divergence•7 minutes
- Improving Divergence in Parallel Reduction•6 minutes
- Recording of Multicore and GPGPU Programming: Week 4 - Live Session on 25-06-13 18:32:39 [49:37]•45 minutes
2 readings•Total 120 minutes
- Recommended Reading: Structure of a CUDA Program•60 minutes
- Recommended Reading: Exposing Parallelism and Avoiding Branch Divergence•60 minutes
15 assignments•Total 105 minutes
- Graded Quiz - Modules 3 and 4 •60 minutes
- Introduction to CUDA Execution Model•3 minutes
- Warps and Thread Blocks •3 minutes
- Warp Divergence•3 minutes
- Resource Partitioning•6 minutes
- Latency Hiding•3 minutes
- Occupancy•3 minutes
- Synchronization•3 minutes
- Scalability•3 minutes
- Exposing Parallelism•3 minutes
- Checking Active Warps with Nvprof•3 minutes
- Checking Memory Operations with Nvprof•3 minutes
- Avoiding Branch Divergence•3 minutes
- The Parallel Reduction Problem and Thread Divergence•3 minutes
- Improving Divergence in Parallel Reduction•3 minutes
1 discussion prompt•Total 30 minutes
- Under the Hood: Warps, Divergence, and CUDA Execution Dynamics•30 minutes
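The parallel reduction and divergence topics in this module can be made concrete by counting how many warps have at least one active thread at each reduction step. This is a simplified Python model, not CUDA code: it assumes one thread per element, a power-of-two input length and a warp size of 32, and all function names are hypothetical.

```python
WARP_SIZE = 32  # assumed warp width


def _active_warps(active_threads) -> int:
    """Number of distinct warps containing at least one active thread."""
    return len({tid // WARP_SIZE for tid in active_threads})


def reduce_neighbored(values):
    """Neighbouring-pair reduction: active threads are spread across every warp."""
    data = list(values)
    n = len(data)  # assumed to be a power of two
    warps_issued = 0
    stride = 1
    while stride < n:
        active = [tid for tid in range(n) if tid % (2 * stride) == 0]
        for tid in active:
            data[tid] += data[tid + stride]
        warps_issued += _active_warps(active)
        stride *= 2
    return data[0], warps_issued


def reduce_sequential(values):
    """Sequential addressing: active threads are packed into the lowest warps."""
    data = list(values)
    n = len(data)  # assumed to be a power of two
    warps_issued = 0
    stride = n // 2
    while stride > 0:
        active = range(stride)
        for tid in active:
            data[tid] += data[tid + stride]
        warps_issued += _active_warps(active)
        stride //= 2
    return data[0], warps_issued


if __name__ == "__main__":
    print(reduce_neighbored(range(1024)))
    print(reduce_sequential(range(1024)))
```

Both variants compute the same sum, but in this model the neighbouring-pair scheme keeps far more warps partially active, which is the divergence cost that the improved reduction aims to remove.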
The CUDA Memory Model & Streams and Concurrency module introduces students to the intricacies of memory hierarchy in CUDA, including global, shared, and local memory. It emphasises the importance of memory coalescing and efficient memory access patterns to optimise performance on GPUs. The module also covers CUDA streams, explaining how concurrent kernel execution and memory operations can be managed to enhance parallelism. By understanding these concepts, students will gain the ability to design GPU programs that maximise throughput and minimise latency.
Included
14 videos, 2 readings, 14 assignments, 1 discussion prompt, 1 ungraded lab
14 videos•Total 126 minutes
- Introduction to CUDA Memory Model•8 minutes
- Memory Allocation and Deallocation•6 minutes
- Zero Copy Memory•4 minutes
- Unified Virtual Addressing and Unified Memory •3 minutes
- Aligned and Coalesced Access•6 minutes
- CUDA Shared Memory•6 minutes
- Shared Memory Banks and Access Mode •7 minutes
- Configuring the Amount of Shared Memory•5 minutes
- Synchronisation•9 minutes
- CUDA Streams•7 minutes
- Stream Scheduling and Priorities•6 minutes
- CUDA Events•6 minutes
- Concurrent Kernel Execution•6 minutes
- Recording of Multicore and GPGPU Programming: Week 5 - Live Session on 25-06-20 18:31:59 [47:36]•48 minutes
2 readings•Total 120 minutes
- Recommended Reading: CUDA Memory Model•60 minutes
- Recommended Reading: Streams and Concurrency•60 minutes
14 assignments•Total 342 minutes
- SGA-1: CUDA Programming and Performance Optimisation•300 minutes
- Introduction to CUDA Memory Model•3 minutes
- Memory Allocation and Deallocation•3 minutes
- Zero Copy Memory•3 minutes
- Unified Virtual Addressing and Unified Memory •3 minutes
- Aligned and Coalesced Access•3 minutes
- CUDA Shared Memory•6 minutes
- Shared Memory Banks and Access Mode •3 minutes
- Configuring the Amount of Shared Memory•3 minutes
- Synchronisation•3 minutes
- CUDA Streams•3 minutes
- Stream Scheduling and Priorities•3 minutes
- CUDA Events•3 minutes
- Concurrent Kernel Execution•3 minutes
1 discussion prompt•Total 30 minutes
- Smart Memory and Seamless Concurrency: CUDA Memory and Streams•30 minutes
1 ungraded lab•Total 60 minutes
- Hands-on lab: Parallel Matrix Addition Using CUDA•60 minutes
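Memory coalescing, which this module emphasises, can be approximated by counting how many aligned memory segments a single warp touches for a given access stride. The Python sketch below is a deliberately simplified model: the 128-byte segment size, 4-byte elements and 32-thread warp are assumptions, actual transaction granularity depends on the GPU architecture and cache configuration, and `segments_touched` is a hypothetical helper.

```python
def segments_touched(stride_elems: int, elem_bytes: int = 4, warp_size: int = 32,
                     segment_bytes: int = 128, base: int = 0) -> int:
    """Count the aligned segments covered by one warp-wide load.

    Lane k of the warp reads the element at byte address
    base + k * stride_elems * elem_bytes.
    """
    addresses = [base + lane * stride_elems * elem_bytes for lane in range(warp_size)]
    return len({addr // segment_bytes for addr in addresses})


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print(stride, segments_touched(stride))
    print("misaligned:", segments_touched(1, base=4))
```

In this model a unit-stride, aligned access touches a single segment, while a stride of 32 elements touches 32 segments, one per thread; shifting the base address by one element makes an otherwise coalesced access straddle two segments.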
This module explains in depth the difference between processes and threads and introduces multithreaded programming with the Pthreads library. Students will learn the main functions of the Pthreads library and use them to solve real-world problems with a multithreaded approach. The module also discusses the precautions to take when designing an algorithm that uses multithreading.
Included
10 videos, 11 readings, 10 assignments, 1 discussion prompt
10 videos•Total 116 minutes
- Processes, Threads and Pthreads•4 minutes
- Hello World!!•9 minutes
- Matrix-Vector Multiplication•13 minutes
- Critical Sections•5 minutes
- Busy Waiting•6 minutes
- Mutexes•5 minutes
- Semaphores•7 minutes
- Barriers and Condition Variables•13 minutes
- Caches, Cache-Coherence and False Sharing•9 minutes
- Recording of Multicore and GPGPU Programming: Week 6 - Live Session on 25-06-27 18:38:36 [43:53]•44 minutes
11 readings•Total 295 minutes
- Recommended Reading: Processes, Threads and Pthreads•10 minutes
- Recommended Reading: Hello World!!•60 minutes
- Recommended Reading: Matrix-Vector Multiplication•15 minutes
- Recommended Reading: Critical Sections•30 minutes
- Recommended Reading: Busy Waiting•20 minutes
- Recommended Reading: Mutexes•15 minutes
- Recommended Reading: Semaphores•30 minutes
- Recommended Reading: Barriers and Condition Variables•30 minutes
- Recommended Reading: Read-Write Locks•60 minutes
- Recommended Reading: Caches, Cache-Coherence and False Sharing•15 minutes
- Lab Instruction Document•10 minutes
10 assignments•Total 135 minutes
- Graded Quiz - Modules 5 and 6 •60 minutes
- Processes, Threads and Pthreads•9 minutes
- Hello World!!•9 minutes
- Matrix-Vector Multiplication•9 minutes
- Critical Sections•9 minutes
- Busy Waiting•9 minutes
- Mutexes•9 minutes
- Semaphores•6 minutes
- Barriers and Condition Variables•6 minutes
- Caches, Cache-Coherence and False Sharing•9 minutes
1 discussion prompt•Total 10 minutes
- Thread Synchronization and Shared Memory: Building Reliable Parallel Programs with Pthreads•10 minutes
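The critical-section and mutex topics in this module follow a pattern that can be sketched without Pthreads itself. The example below uses Python's standard `threading` module as a stand-in: `threading.Lock` plays the role a `pthread_mutex_t` would play in C, and `count_with_mutex` is a hypothetical function written only for illustration.

```python
import threading


def count_with_mutex(num_threads: int, increments: int) -> int:
    """Have several threads increment one shared counter inside a critical section."""
    counter = 0
    mutex = threading.Lock()  # stand-in for a pthread mutex

    def worker():
        nonlocal counter
        for _ in range(increments):
            with mutex:       # lock on entry, unlock on exit
                counter += 1  # the critical section: a read-modify-write on shared data

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()              # wait for every worker before reading the result
    return counter


if __name__ == "__main__":
    print(count_with_mutex(4, 10000))
```

Holding the lock only around the shared update keeps the critical section short, which limits the serialisation that the lock introduces.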
This module introduces students to distributed-memory programming using the Message Passing Interface (MPI). Students will learn about the functions provided by the MPI library and what each one does. The module will enable students to write parallel programs and to convert serial code into parallel code using MPI functions.
Included
7 videos, 9 readings, 7 assignments, 1 discussion prompt
7 videos•Total 70 minutes
- Introduction to MPI•4 minutes
- MPI Setup and Communicator Functions•6 minutes
- SPMD and Communication•10 minutes
- Potential Pitfalls•4 minutes
- Simple Serial Sorting Algorithm•20 minutes
- Parallel Odd-Even Transposition Sort•19 minutes
- Safety in MPI Programs•7 minutes
9 readings•Total 125 minutes
- Recommended Reading: Introduction to MPI•15 minutes
- Recommended Reading: MPI Setup and Communicator Functions•15 minutes
- Recommended Reading: SPMD and Communication•15 minutes
- Recommended Reading: Potential Pitfalls•15 minutes
- Recommended Reading: Simple Serial Sorting Algorithm•15 minutes
- Recommended Reading: Parallel Odd-Even Transposition Sort•15 minutes
- Recommended Reading: Safety in MPI Programs •15 minutes
- Lab: Practice Code•10 minutes
- Lab: Practice Solution•10 minutes
7 assignments•Total 63 minutes
- Introduction to MPI•9 minutes
- MPI Setup and Communicator Functions•9 minutes
- SPMD and Communication•9 minutes
- Potential Pitfalls•9 minutes
- Simple Serial Sorting Algorithm•9 minutes
- Parallel Odd-Even Transposition Sort•9 minutes
- Safety in MPI Programs•9 minutes
1 discussion prompt•Total 30 minutes
- MPI in Action: Understanding Setup, Communication, and Parallel Sorting•30 minutes
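The odd-even transposition sort covered in this module is easier to parallelise once the serial phase structure is clear. The following is a serial Python sketch with a hypothetical function name, not the course's MPI code; in an MPI version each process would typically hold a block of keys and exchange it with a neighbouring process in each phase.

```python
def odd_even_transposition_sort(values):
    """Sort by alternating even and odd compare-exchange phases.

    Even phases compare the pairs (0,1), (2,3), ...; odd phases compare
    (1,2), (3,4), ... . Running n phases is enough to sort n keys.
    """
    a = list(values)
    n = len(a)
    for phase in range(n):
        start = 0 if phase % 2 == 0 else 1
        # All compare-exchanges within one phase touch disjoint pairs,
        # so they are independent and could run in parallel.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


if __name__ == "__main__":
    print(odd_even_transposition_sort([9, 7, 8, 6]))
```

The independence of the pairs inside a phase is what makes the algorithm a natural fit for message passing: only neighbours communicate, and every phase has the same communication pattern.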
This module introduces the shared-memory programming model using OpenMP. Students will gain exposure to the directives and functions that OpenMP provides and learn how to apply them in code to achieve shared-memory parallelism. Students will explore the foundational concepts of OpenMP through videos and readings, starting with the basics of the library and progressing to more advanced topics such as reduction clauses, variable scoping, and mutual exclusion. Through worked examples like the Trapezoidal Rule and sorting functions, learners will understand how to parallelise loops, manage scheduling, and apply critical sections and locks for safe concurrent execution. The module also covers tasking in OpenMP and classic concurrency problems like producers and consumers.
Included
12 videos, 12 readings, 13 assignments, 1 discussion prompt
12 videos•Total 94 minutes
- Introduction to OpenMP•5 minutes
- Programming in OpenMP•10 minutes
- Trapezoidal Rule•10 minutes
- Scope of Variables•4 minutes
- Reduction Clause•7 minutes
- Parallel-For Directive and Caveats in Them•8 minutes
- Sorting Functions•20 minutes
- Scheduling•6 minutes
- Producers and Consumers•6 minutes
- Termination, Startup and Atomic Directive•7 minutes
- Critical Sections and Locks•6 minutes
- Tasking•5 minutes
12 readings•Total 152 minutes
- Recommended Reading: Introduction to OpenMP•15 minutes
- Recommended Reading: Programming in OpenMP•15 minutes
- Recommended Reading: Trapezoidal Rule•15 minutes
- Recommended Reading: Scope of Variables•15 minutes
- Recommended Reading: Reduction Clause•15 minutes
- Recommended Reading: Parallel-For Directive and Caveats in Them•15 minutes
- Recommended Reading: Sorting Functions•15 minutes
- Recommended Reading: Scheduling •15 minutes
- Recommended Reading: Producers and Consumers•15 minutes
- Recommended Reading: Termination, Startup and Atomic Directive•1 minute
- Recommended Reading: Critical Sections and Locks•1 minute
- Recommended Reading: Tasking•15 minutes
13 assignments•Total 168 minutes
- Graded Quiz - Modules 7 and 8•60 minutes
- Introduction to OpenMP•9 minutes
- Programming in OpenMP•9 minutes
- Trapezoidal Rule•9 minutes
- Scope of Variables•9 minutes
- Reduction Clause•9 minutes
- Parallel-For Directive and Caveats in Them•9 minutes
- Sorting Functions•9 minutes
- Scheduling•9 minutes
- Producers and Consumers•9 minutes
- Termination, Startup and Atomic Directive•9 minutes
- Critical Sections and Locks•9 minutes
- Tasking•9 minutes
1 discussion prompt•Total 30 minutes
- Mastering OpenMP: From Parallel Patterns to Synchronisation•30 minutes
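The Trapezoidal Rule example and the reduction clause mentioned in this module share one idea: each thread integrates its own sub-interval and the partial results are then combined. The sketch below shows that decomposition serially in Python, with hypothetical function names; it assumes the number of trapezoids is divisible by the number of threads and does not use OpenMP itself.

```python
def local_trapezoid(f, a: float, b: float, n: int) -> float:
    """Trapezoidal-rule estimate of the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def parallel_style_trapezoid(f, a: float, b: float, n: int, num_threads: int) -> float:
    """Split [a, b] into one sub-interval per thread, then combine the partial sums.

    Assumes n is divisible by num_threads. The final sum() plays the role
    that a reduction over per-thread results would play in OpenMP.
    """
    h = (b - a) / n
    local_n = n // num_threads
    partials = []
    for rank in range(num_threads):  # each iteration stands for one thread's work
        local_a = a + rank * local_n * h
        local_b = local_a + local_n * h
        partials.append(local_trapezoid(f, local_a, local_b, local_n))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_style_trapezoid(lambda x: x * x, 0.0, 3.0, 1200, 4))
```

Because every thread writes only to its own partial result, no lock is needed until the partial sums are combined, which is the situation a reduction is designed for.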
This module will introduce the n-body problem in physics, examining its significance in simulating gravitational interactions among multiple particles. It will explore classical and modern algorithmic approaches to solving the n-body problem, followed by a discussion on their computational complexity. Emphasis will be placed on identifying opportunities for parallelisation, and students will analyse and implement efficient parallel solutions using the programming languages and parallel computing directives covered in the course.
Included
13 videos, 13 readings, 13 assignments, 1 discussion prompt
13 videos•Total 107 minutes
- Introduction to N-body Problem•8 minutes
- Serial Solutions to the N-body Problem•16 minutes
- Parallelising Strategy•13 minutes
- Parallelising Basic Solver Using OpenMP•9 minutes
- Parallelising Reduced Solver Using OpenMP •11 minutes
- Evaluating OpenMP Performance•5 minutes
- Parallelising Basic Solver Using Pthreads •4 minutes
- Parallelising Basic Solver Using MPI •9 minutes
- Parallelising Reduced Solver Using MPI•9 minutes
- Evaluating MPI Performance•6 minutes
- Parallelising Basic Solver Using CUDA•7 minutes
- Evaluating CUDA Solver and Improving Performance•4 minutes
- Using Shared Memory for Solvers•7 minutes
13 readings•Total 195 minutes
- Recommended Reading: Introduction to N-body Problem•15 minutes
- Recommended Reading: Serial Solutions to the N-body Problem•15 minutes
- Recommended Reading: Parallelising Strategy•15 minutes
- Recommended Reading: Parallelising Basic Solver Using OpenMP•15 minutes
- Recommended Reading: Parallelising Reduced Solver Using OpenMP•15 minutes
- Recommended Reading: Evaluating OpenMP Performance•15 minutes
- Recommended Reading: Parallelising Basic Solver Using Pthreads•15 minutes
- Recommended Reading: Parallelising Basic Solver Using MPI•15 minutes
- Recommended Reading: Parallelising Reduced Solver Using MPI•15 minutes
- Recommended Reading: Evaluating MPI Performance•15 minutes
- Recommended Reading: Parallelising Basic Solver Using CUDA•15 minutes
- Recommended Reading: Evaluating CUDA Solver and Improving Performance•15 minutes
- Recommended Reading: Using Shared Memory for Solvers•15 minutes
13 assignments•Total 138 minutes
- Introduction to N-body Problem•9 minutes
- Serial Solutions to the N-body Problem•9 minutes
- Parallelising Strategy•9 minutes
- Parallelising Basic Solver Using OpenMP•9 minutes
- Parallelising Reduced Solver Using OpenMP•9 minutes
- Evaluating OpenMP Performance•9 minutes
- Parallelising Basic Solver Using Pthreads•9 minutes
- Parallelising Basic Solver Using MPI•30 minutes
- Parallelising Reduced Solver Using MPI•9 minutes
- Evaluating MPI Performance•9 minutes
- Parallelising Basic Solver Using CUDA•9 minutes
- Evaluating CUDA Solver and Improving Performance•9 minutes
- Using Shared Memory for Solvers•9 minutes
1 discussion prompt•Total 30 minutes
- The N-Body Solver: Exploring Parallelism Across Models•30 minutes
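The "basic" and "reduced" solvers named in this module's lessons are assumed here to differ in how pairwise forces are accumulated: the basic form computes every ordered pair, while the reduced form computes each unordered pair once and applies Newton's third law. The Python sketch below illustrates that assumption in two dimensions; the function names are hypothetical, and the gravitational constant is only approximate.

```python
G = 6.673e-11  # gravitational constant in SI units (approximate)


def pair_force(pos, mass, q, k):
    """Gravitational force exerted on particle q by particle k (2-D)."""
    dx = pos[q][0] - pos[k][0]
    dy = pos[q][1] - pos[k][1]
    dist = (dx * dx + dy * dy) ** 0.5
    scale = -G * mass[q] * mass[k] / dist ** 3
    return scale * dx, scale * dy


def forces_basic(pos, mass):
    """Basic form: every particle loops over all others (n * (n - 1) pair evaluations)."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if k != q:
                fx, fy = pair_force(pos, mass, q, k)
                forces[q][0] += fx
                forces[q][1] += fy
    return forces


def forces_reduced(pos, mass):
    """Reduced form: each pair is evaluated once; the force on k is the negative of that on q."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for q in range(n):
        for k in range(q + 1, n):
            fx, fy = pair_force(pos, mass, q, k)
            forces[q][0] += fx
            forces[q][1] += fy
            forces[k][0] -= fx  # this write to another particle's entry is what
            forces[k][1] -= fy  # needs care when the outer loop is parallelised
    return forces


if __name__ == "__main__":
    positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    masses = [5.0, 3.0, 2.0]
    print(forces_basic(positions, masses))
    print(forces_reduced(positions, masses))
```

The reduced form halves the arithmetic, but iteration `q` now updates `forces[k]` for other particles, so a parallel version has to protect or privatise those updates.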
This module focuses on hands-on implementations of the Sample Sort algorithm using OpenMP, Pthreads, MPI, and CUDA. Students will explore the strengths and limitations of each parallel programming model through practical coding exercises. The module includes performance benchmarking and comparative analysis of the implementations to highlight trade-offs in scalability, efficiency, and suitability for different architectures. By the end of the module, students will have a strong grasp of each API and be equipped to make informed decisions about the most appropriate tool for a given parallel computing task.
Included
8 videos, 9 readings, 10 assignments, 1 discussion prompt
8 videos•Total 61 minutes
- Sample Sort and Bucket Sort•10 minutes
- Map•17 minutes
- Implementing Sample Sort Using OpenMP: First Implementation•5 minutes
- Implementing Sample Sort Using OpenMP: Second Implementation•7 minutes
- Implementing Sample Sort Using Pthreads •4 minutes
- Implementing Sample Sort Using MPI•6 minutes
- Implementing Sample Sort Using MPI: Example•5 minutes
- Implementing Sample Sort Using CUDA •7 minutes
9 readings•Total 115 minutes
- Recommended Reading: Sample Sort and Bucket Sort•15 minutes
- Recommended Reading: Map•10 minutes
- Recommended Reading: Implementing Sample Sort Using OpenMP: First Implementation•15 minutes
- Recommended Reading: Implementing Sample Sort Using OpenMP: Second Implementation•15 minutes
- Recommended Reading: Implementing Sample Sort Using Pthreads•10 minutes
- Recommended Reading: Implementing Sample Sort Using MPI•15 minutes
- Recommended Reading: Implementing Sample Sort Using MPI: Example•15 minutes
- Recommended Reading: Implementing Sample Sort Using CUDA•10 minutes
- Recommended Reading: Which API?•10 minutes
10 assignments•Total 432 minutes
- Graded Quiz - Modules 9 and 10•60 minutes
- SGA-2: Odd-Even Transposition Sort Parallelisation •300 minutes
- Sample Sort and Bucket Sort•9 minutes
- Map (Quiz)•9 minutes
- Implementing Sample Sort Using OpenMP: First Implementation•9 minutes
- Implementing Sample Sort Using OpenMP: Second Implementation•9 minutes
- Implementing Sample Sort Using Pthreads•9 minutes
- Implementing Sample Sort Using MPI•9 minutes
- Implementing Sample Sort Using MPI: Example•9 minutes
- Implementing Sample Sort Using CUDA•9 minutes
1 discussion prompt•Total 30 minutes
- Parallel Sample Sort Across Platforms•30 minutes
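The Sample Sort algorithm that this module implements across four APIs has a common serial skeleton: choose splitters from a sample, distribute keys into buckets, then sort each bucket. The Python sketch below shows that skeleton under stated assumptions; the function name, the `oversample` factor and the fixed random seed are illustrative choices, not the course's implementation.

```python
import bisect
import random


def sample_sort(values, num_buckets: int = 4, oversample: int = 8, seed: int = 0):
    """Sort by sampling splitters, bucketing the keys, and sorting each bucket."""
    data = list(values)
    if len(data) <= num_buckets:
        return sorted(data)  # too few keys to be worth bucketing

    # 1. Draw a random sample and sort it.
    rng = random.Random(seed)
    sample_size = min(len(data), num_buckets * oversample)
    sample = sorted(rng.sample(data, sample_size))

    # 2. Pick num_buckets - 1 evenly spaced splitters from the sorted sample.
    step = sample_size // num_buckets
    splitters = [sample[i * step] for i in range(1, num_buckets)]

    # 3. Route every key to the bucket whose splitter range contains it.
    buckets = [[] for _ in range(num_buckets)]
    for v in data:
        buckets[bisect.bisect_right(splitters, v)].append(v)

    # 4. Sort buckets independently (the step a parallel version would
    #    hand to separate threads or processes) and concatenate.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result


if __name__ == "__main__":
    print(sample_sort([(i * 37) % 101 for i in range(20)]))
```

Because bucket ranges do not overlap, the buckets can be sorted with no communication between workers; the quality of the splitters determines how evenly the work is balanced.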
Final Comprehensive Examination
Included
1 assignment
1 assignment•Total 30 minutes
- Final Comprehensive Examination •30 minutes
Offered by

Birla Institute of Technology & Science, Pilani (BITS Pilani) is one of only ten private universities in India to be recognised as an Institute of Eminence by the Ministry of Human Resource Development, Government of India. It has been consistently ranked high by both governmental and private ranking agencies for its innovative processes and capabilities that have enabled it to impart quality education and emerge as the best private science and engineering institute in India. BITS Pilani has four campuses, in Pilani, Goa, Hyderabad, and Dubai, and has been offering bachelor's, master's, and certificate programmes for over 58 years, helping to launch the careers of over 100,000 professionals.
Frequently asked questions
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.