Workshop: Accelerating CUDA C++ Applications with Multiple GPUs

Name: Workshop: Accelerating CUDA C++ Applications with Multiple GPUs
Start: 2024-09-11T10:00:00+02:00
End: 2024-09-11T18:00:00+02:00
Location: MS Teams
Timezone: Europe/Ljubljana

Description: This workshop covers how to write CUDA C++ applications that efficiently and correctly use all available GPUs in a single node, improving application performance and making the most cost-effective use of multi-GPU systems.

Computationally intensive CUDA® C++ applications in high-performance computing, data science, bioinformatics, and deep learning can be accelerated by using multiple GPUs, which can increase throughput and/or decrease total runtime. When this is combined with overlapping computation and memory transfers, computation can be scaled across multiple GPUs while much of the memory-transfer cost stays hidden behind computation. For organizations with multi-GPU servers, whether in the cloud or on NVIDIA DGX™ systems, these techniques make it possible to achieve peak performance from GPU-accelerated applications. It is also important to implement these single-node, multi-GPU techniques before scaling applications across multiple nodes.
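
As a rough illustration of scaling work across the GPUs of one node, here is a minimal sketch (not workshop material). It assumes the data can be split into equal, contiguous chunks, one per GPU, and uses a hypothetical `scale` kernel and a hypothetical `scale_on_all_gpus` helper. The first loop only enqueues asynchronous work, calling `cudaSetDevice` before each GPU's allocation, copies, and kernel launch, so all GPUs compute at the same time; the second loop waits for each GPU and frees its resources.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with file/line information if a CUDA runtime call fails.
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Hypothetical example kernel: multiply every element by `factor`
// using a grid-stride loop.
__global__ void scale(float* data, size_t n, float factor) {
  for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
       i += (size_t)gridDim.x * blockDim.x) {
    data[i] *= factor;
  }
}

// Hypothetical helper: splits `n` host floats into one contiguous chunk per
// GPU, processes the chunks concurrently and copies results back in place.
// Returns the number of GPUs used.
int scale_on_all_gpus(float* host, size_t n, float factor) {
  int num_gpus = 0;
  CHECK(cudaGetDeviceCount(&num_gpus));  // exits if no GPU is present
  const size_t chunk = (n + num_gpus - 1) / num_gpus;  // ceiling division

  std::vector<float*> dev(num_gpus, nullptr);
  std::vector<cudaStream_t> streams(num_gpus);

  // Phase 1: enqueue all work without waiting, so the GPUs run in parallel.
  for (int g = 0; g < num_gpus; ++g) {
    CHECK(cudaSetDevice(g));  // subsequent calls target GPU g
    CHECK(cudaStreamCreate(&streams[g]));
    const size_t begin = g * chunk;
    if (begin >= n) continue;  // more GPUs than elements
    const size_t count = std::min(chunk, n - begin);
    const size_t bytes = count * sizeof(float);
    CHECK(cudaMalloc(&dev[g], bytes));
    CHECK(cudaMemcpyAsync(dev[g], host + begin, bytes,
                          cudaMemcpyHostToDevice, streams[g]));
    scale<<<256, 256, 0, streams[g]>>>(dev[g], count, factor);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(host + begin, dev[g], bytes,
                          cudaMemcpyDeviceToHost, streams[g]));
  }

  // Phase 2: wait for each GPU, then release its resources.
  for (int g = 0; g < num_gpus; ++g) {
    CHECK(cudaSetDevice(g));
    CHECK(cudaStreamSynchronize(streams[g]));
    CHECK(cudaFree(dev[g]));  // cudaFree(nullptr) is a no-op
    CHECK(cudaStreamDestroy(streams[g]));
  }
  return num_gpus;
}

int main() {
  const size_t n = size_t(1) << 24;
  float* host = nullptr;
  // Pinned, portable host memory: lets cudaMemcpyAsync return without
  // blocking the host, and is treated as pinned by every GPU.
  CHECK(cudaHostAlloc(reinterpret_cast<void**>(&host), n * sizeof(float),
                      cudaHostAllocPortable));
  std::fill(host, host + n, 1.0f);

  const int gpus = scale_on_all_gpus(host, n, 2.0f);
  assert(gpus >= 1);
  const bool ok =
      std::all_of(host, host + n, [](float x) { return x == 2.0f; });
  std::printf("%d GPU(s) used, result %s\n", gpus, ok ? "correct" : "WRONG");

  CHECK(cudaFreeHost(host));
  return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The sketch can be built with, for example, `nvcc -o multi_gpu multi_gpu.cu` (the file name is arbitrary). With one stream per GPU, each GPU still copies and computes one step after another; hiding the transfers as well requires several streams per GPU.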

At the end of the workshop, participants can obtain an official certificate from the NVIDIA Deep Learning Institute (DLI).

Workflow: The workshop takes place remotely, through a web browser, on AWS cloud infrastructure.

Difficulty: Basic

Language: English

Target audience: HPC developers using CUDA on networked systems or in the cloud.

Prerequisite knowledge: Professional experience programming CUDA C/C++ applications (including use of the nvcc compiler, kernel launches, grid-stride loops, host-to-device and device-to-host memory transfers, and CUDA error handling); familiarity with the Linux command line; and experience using Makefiles to compile C/C++ code.

Skills to be gained:

By participating in this workshop, you’ll learn how to:

– Use concurrent CUDA streams to overlap memory transfers with GPU computation.
– Scale workloads across all available GPUs on a single node.
– Combine copy/compute overlap with multiple GPUs.
– Use the NVIDIA® Nsight™ Systems timeline to identify improvement opportunities and observe the impact of the techniques covered in the workshop.
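
A minimal sketch of how the first and third items can fit together, assuming a hypothetical `scale` kernel and a hypothetical `overlap_on_all_gpus` helper (neither is taken from the workshop): each GPU's share of the data is cut into segments, and each segment's host-to-device copy, kernel, and device-to-host copy are issued in their own stream. While one segment's kernel runs, another segment on the same GPU can be copying, and all GPUs work at the same time. Four streams per GPU is an arbitrary choice; whether copies really overlap depends on the GPU's copy engines and on the host buffer being pinned.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with file/line information if a CUDA runtime call fails.
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Hypothetical example kernel (grid-stride loop).
__global__ void scale(float* data, size_t n, float factor) {
  for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
       i += (size_t)gridDim.x * blockDim.x) {
    data[i] *= factor;
  }
}

// Hypothetical helper: cuts the host array into num_gpus * streams_per_gpu
// segments; segment k runs on GPU k / streams_per_gpu in its own stream.
// streams_per_gpu must be >= 1. Returns the number of GPUs used.
int overlap_on_all_gpus(float* host, size_t n, float factor,
                        int streams_per_gpu) {
  int num_gpus = 0;
  CHECK(cudaGetDeviceCount(&num_gpus));
  const int num_segs = num_gpus * streams_per_gpu;
  const size_t seg = (n + num_segs - 1) / num_segs;  // elements per segment

  std::vector<float*> dev(num_gpus, nullptr);
  std::vector<cudaStream_t> streams(num_segs);  // one stream per segment

  // One device buffer per GPU, large enough for all of its segments.
  for (int g = 0; g < num_gpus; ++g) {
    CHECK(cudaSetDevice(g));
    CHECK(cudaMalloc(&dev[g], seg * streams_per_gpu * sizeof(float)));
    for (int s = 0; s < streams_per_gpu; ++s)
      CHECK(cudaStreamCreate(&streams[g * streams_per_gpu + s]));
  }

  // Per segment: H2D copy, kernel, D2H copy, all in the segment's stream.
  for (int k = 0; k < num_segs; ++k) {
    const size_t begin = k * seg;
    if (begin >= n) break;  // trailing segments may be empty
    const size_t count = std::min(seg, n - begin);
    const int g = k / streams_per_gpu;
    float* d = dev[g] + (k % streams_per_gpu) * seg;
    CHECK(cudaSetDevice(g));
    CHECK(cudaMemcpyAsync(d, host + begin, count * sizeof(float),
                          cudaMemcpyHostToDevice, streams[k]));
    scale<<<256, 256, 0, streams[k]>>>(d, count, factor);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(host + begin, d, count * sizeof(float),
                          cudaMemcpyDeviceToHost, streams[k]));
  }

  // Wait for every stream on every GPU, then clean up.
  for (int g = 0; g < num_gpus; ++g) {
    CHECK(cudaSetDevice(g));
    for (int s = 0; s < streams_per_gpu; ++s) {
      CHECK(cudaStreamSynchronize(streams[g * streams_per_gpu + s]));
      CHECK(cudaStreamDestroy(streams[g * streams_per_gpu + s]));
    }
    CHECK(cudaFree(dev[g]));
  }
  return num_gpus;
}

int main() {
  const size_t n = size_t(1) << 24;
  float* host = nullptr;
  // Copy/compute overlap needs pinned host memory; "portable" makes the
  // allocation pinned for every GPU, not only the current one.
  CHECK(cudaHostAlloc(reinterpret_cast<void**>(&host), n * sizeof(float),
                      cudaHostAllocPortable));
  std::fill(host, host + n, 1.0f);

  const int gpus = overlap_on_all_gpus(host, n, 2.0f, 4);  // 4 is arbitrary
  assert(gpus >= 1);
  const bool ok =
      std::all_of(host, host + n, [](float x) { return x == 2.0f; });
  std::printf("%d GPU(s) used, result %s\n", gpus, ok ? "correct" : "WRONG");

  CHECK(cudaFreeHost(host));
  return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

To check whether the overlap actually happens, the program can be run under Nsight Systems (for example `nsys profile -o overlap_report ./overlap`) and the resulting timeline inspected for memory copies running concurrently with kernels on each GPU.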

Maximum number of participants: 30

Lecturers:

Name:	Domen Verber
	Domen Verber is an assistant professor at the Faculty of Electrical Engineering and Computer Science of the University of Maribor (UM FERI), as well as the NVIDIA Deep Learning Institute ambassador and HPC specialist for the University of Maribor. He has worked on HPC and artificial intelligence for more than 25 years.
	domen.verber@um.si, deep.learning@um.si

Name:	Jani Dugonik
	Jani Dugonik is an academic researcher at the Faculty of Electrical Engineering and Computer Science of the University of Maribor (UM FERI). He has worked in natural language processing and evolutionary algorithms for more than 10 years.
	jani.dugonik@um.si