Workshop: Accelerating CUDA C++ Applications with Multiple GPUs
Wednesday, 11 September 2024
10:00
Introduction (– Meet the instructor. – Get familiar with your GPU-accelerated interactive JupyterLab environment.)
10:00 - 10:30
10:30
Application Overview (– Orient yourself with a single-GPU CUDA C++ application that will be the starting point for the course. – Observe the current performance of the single-GPU CUDA C++ application using Nsight Systems.)
10:30 - 10:45
10:45
Introduction to CUDA Streams (– Learn the rules that govern concurrent CUDA stream behavior. – Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers. – Utilize multiple CUDA streams for launching GPU kernels. – Observe multiple streams in the Nsight Systems visual profiler timeline view.)
10:45 - 12:45
12:45
Lunch break
12:45 - 13:45
13:45
Copy/Compute Overlap with CUDA Streams (– Learn the key concepts for effectively performing copy/compute overlap. – Explore robust indexing strategies for the flexible use of copy/compute overlap in applications. – Refactor the single-GPU CUDA C++ application to perform copy/compute overlap. – See copy/compute overlap in the Nsight Systems visual profiler timeline.)
13:45 - 15:15
15:15
Multiple GPUs with CUDA C++ (– Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++. – Explore robust indexing strategies for the flexible use of multiple GPUs in applications. – Refactor the single-GPU CUDA C++ application to utilize multiple GPUs. – See multiple GPU utilization in the Nsight Systems visual profiler timeline.)
15:15 - 16:15
16:15
Coffee break
16:15 - 16:30
16:30
Copy/Compute Overlap with Multiple GPUs (– Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs. – Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs. – Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs. – Observe the performance benefits of copy/compute overlap on multiple GPUs. – See copy/compute overlap on multiple GPUs in the Nsight Systems visual profiler timeline.)
16:30 - 17:30
17:30
Final Review (– Complete the assessment and earn a certificate. – Review key learnings and wrap up questions. – Learn to build your own training environment from the DLI base environment container. – Take the workshop survey.)
Conveners: Domen Verber, Jani Dugonik
17:30 - 18:00