Name: Workshop: Accelerating CUDA C++ Applications with Multiple GPUs
Start: 2024-09-11T10:00:00+02:00
End: 2024-09-11T18:00:00+02:00
Location: MS Teams

Workshop: Accelerating CUDA C++ Applications with Multiple GPUs

Wednesday, 11 September 2024 - 10:00

10:00 - 10:30 Introduction
- Meet the instructor.
- Get familiar with your GPU-accelerated interactive JupyterLab environment.

10:30 - 10:45 Application Overview
- Orient yourself with a single-GPU CUDA C++ application that will be the starting point for the course.
- Observe the current performance of the single-GPU CUDA C++ application using Nsight Systems.

10:45 - 12:45 Introduction to CUDA Streams
- Learn the rules that govern concurrent CUDA stream behavior.
- Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers.
- Use multiple CUDA streams to launch GPU kernels.
- Observe multiple streams in the Nsight Systems visual profiler timeline view.

12:45 - 13:45 Lunch break

13:45 - 15:15 Copy/Compute Overlap with CUDA Streams
- Learn the key concepts for effectively performing copy/compute overlap.
- Explore robust indexing strategies for the flexible use of copy/compute overlap in applications.
- Refactor the single-GPU CUDA C++ application to perform copy/compute overlap.
- See copy/compute overlap in the Nsight Systems visual profiler timeline.

15:15 - 16:15 Multiple GPUs with CUDA C++
- Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++.
- Explore robust indexing strategies for the flexible use of multiple GPUs in applications.
- Refactor the single-GPU CUDA C++ application to utilize multiple GPUs.
- See multiple GPU utilization in the Nsight Systems visual profiler timeline.

16:15 - 16:30 Coffee break

16:30 - 17:30 Copy/Compute Overlap with Multiple GPUs
- Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs.
- Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs.
- Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs.
- Observe the performance benefits of copy/compute overlap on multiple GPUs.
- See copy/compute overlap on multiple GPUs in the Nsight Systems visual profiler timeline.

17:30 - 18:00 Final Review
- Complete the assessment and earn a certificate.
- Review key learnings and wrap-up questions.
- Learn to build your own training environment from the DLI base environment container.
- Take the workshop survey.

Conveners: Domen Verber, Jani Dugonik