Description: Present-day high-performance computing (HPC) and deep learning applications benefit from, and even require, cluster-scale GPU compute power. Writing CUDA® applications that can correctly and efficiently utilize GPUs across a cluster requires a distinct set of skills. In this workshop, you will learn the tools and techniques needed to write CUDA C++ applications that can scale efficiently to clusters of NVIDIA GPUs.
You’ll do this by working on code from several CUDA C++ applications in an interactive cloud environment backed by several NVIDIA GPUs. You’ll gain exposure to a handful of multi-GPU programming methods, including CUDA-aware Message Passing Interface (MPI), before proceeding to the main focus of this course, NVSHMEM™.
NVSHMEM is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams. NVSHMEM's asynchronous, GPU-initiated data transfers eliminate synchronization overheads between the CPU and the GPU. They also enable long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling.
At the end of the workshop, participants can obtain an official certificate from the NVIDIA Deep Learning Institute.
Workflow: The workshop takes place remotely, in a browser, on AWS cloud infrastructure.
Difficulty: Basic
Language: English
Target audience: HPC developers using CUDA on networked systems or in the cloud.
Prerequisite knowledge: Intermediate experience writing CUDA C/C++ applications.
Skills to be gained:
By participating in this workshop, you’ll learn how to:
– Use concurrent CUDA Streams to overlap memory transfers with GPU computation.
– Use all available GPUs on a single node to scale workloads across them.
– Combine the use of copy/compute overlap with multiple GPUs.
– Use the NVIDIA® Nsight™ Systems Visual Profiler timeline to observe improvement opportunities and the impact of the techniques covered in the workshop.
Maximum number of participants: 30
Virtual location: MS Teams
Organizer:
Lecturers:
Name: Domen Verber
Domen Verber is an assistant professor at the Faculty of Electrical Engineering and Computer Science of the University of Maribor (UM FERI). He is an ambassador of the NVIDIA Deep Learning Institute for the University of Maribor and serves as its HPC specialist. He has worked on HPC and artificial intelligence for more than 25 years.
domen.verber@um.si, deep.learning@um.si
Name: Jani Dugonik
Jani Dugonik is an academic researcher at the Faculty of Electrical Engineering, Computer Science and Informatics of the University of Maribor (UM FERI). He has been working in the field of natural language processing and evolutionary algorithms for more than 10 years.
jani.dugonik@um.si