Course provider: Faculty of Information Studies in Novo mesto (FIŠ)
Instructors: Biljana Mileva Boshkoska (FIŠ), Srdjan Šrbić (FIŠ), Robi Podtržnik (FIŠ), Pavle Boškoski (FIŠ)
This intensive course is designed for AI developers and researchers who need to move beyond the limits of a single processing unit in demanding computational projects. The course focuses on the practical implementation of massively parallel processing across multiple graphics cards (Multi-GPU). Participants will learn to apply advanced distributed computing strategies to drastically accelerate real-world algorithms that are central to modern artificial intelligence and probabilistic modeling.
Learning objectives: The primary goal of the course is to equip participants with the skills to implement the Distributed Data Parallel (DDP) strategy as a fundamental tool for parallelizing complex computational tasks. Rather than focusing solely on the theory behind the models, we concentrate on how to use DDP effectively to scale Monte Carlo simulations and Variational Bayes inference across multiple nodes. Participants will learn to optimize data synchronization and manage distributed tensors on high-performance infrastructure, enabling them to transfer deep learning methodologies to the field of advanced statistical analysis.
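As a concrete illustration of the strategy named above, the sketch below shows the minimal DDP workflow, assuming PyTorch's DistributedDataParallel implementation (the course description does not name a framework, so this is an assumption). The model, data, and rendezvous settings are placeholders; the script runs as a single CPU process (gloo backend, world_size=1) so the structure can be tried without a GPU cluster. Under a launcher such as torchrun, each GPU would run one such process with its own rank.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Rendezvous settings; a launcher like torchrun normally provides these per process.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)  # gloo backend runs on CPU

model = torch.nn.Linear(4, 1)            # placeholder model
ddp_model = DDP(model)                   # wraps the model for gradient averaging
opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

x = torch.randn(8, 4)                    # each rank would see its own data shard
target = torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(ddp_model(x), target)
loss.backward()                          # DDP all-reduces gradients across ranks here
opt.step()

dist.destroy_process_group()
```

The key design point is that DDP keeps one full model replica per process and synchronizes only gradients during the backward pass, which is why the same pattern scales from one GPU to a whole cluster without code changes.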
Course content: The course begins with the technical configuration of a DDP environment for Multi-GPU systems, focusing on process orchestration and the communication protocols between GPUs. In the core part of the course, participants implement parallel versions of Monte Carlo algorithms, using DDP to distribute massive sampling tasks across an entire HPC cluster. This is followed by a practical module on parallelizing Variational Bayes inference, where DDP is used to accelerate the optimization steps in approximating distributions over large datasets. The course concludes with code optimization for GPU acceleration, allowing participants to apply these techniques directly to their own AI projects.
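To illustrate the distributed-sampling idea in the Monte Carlo module, the sketch below estimates π using the collective operations that DDP is built on (again assuming PyTorch's torch.distributed, which the description does not name explicitly): each rank draws an independent sample stream, and an all-reduce sums the partial hit counts into a global estimate. It runs as a single CPU process (world_size=1, gloo backend); the sample count and port number are arbitrary illustrative choices.

```python
import os
import torch
import torch.distributed as dist

# Rendezvous settings; a launcher like torchrun normally provides these per process.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

rank, world_size = dist.get_rank(), dist.get_world_size()
torch.manual_seed(rank)                  # distinct seed -> independent stream per rank
n_per_rank = 400_000
pts = torch.rand(n_per_rank, 2)          # uniform points in the unit square
inside = (pts.pow(2).sum(dim=1) <= 1.0).sum()  # hits inside the quarter-circle

# Collective reduction: sum the partial hit counts from all ranks.
hits = inside.to(torch.float64).reshape(1)
dist.all_reduce(hits, op=dist.ReduceOp.SUM)
pi_est = 4.0 * hits.item() / (n_per_rank * world_size)

dist.destroy_process_group()
print(pi_est)
```

Because the samples are independent across ranks, the same script scales to many GPUs simply by launching more processes; only the single all-reduce at the end involves communication.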
Learning outcomes: Upon completion of the course, participants will be able to independently configure and use DDP for the parallel execution of AI and statistical algorithms on Multi-GPU infrastructure. They will gain practical experience in converting serial Monte Carlo simulations into highly scalable distributed processes, and they will be able to apply DDP to variational methods, enabling significantly faster training of complex probabilistic models. With these skills, researchers will be prepared to tackle the most demanding computational challenges, where HPC resources are necessary to obtain results within a reasonable timeframe.
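As a pointer to the kind of variational method meant here, the following is a minimal single-process sketch (not course material): a Gaussian q(θ) is fitted by stochastic gradient ascent on the ELBO, via the reparameterization trick, to the posterior of a Gaussian mean. The toy model, sample sizes, and learning rate are illustrative assumptions; in a Multi-GPU DDP run, each rank would compute this gradient on its own data shard and DDP would average the gradients during the backward pass.

```python
import math
import torch

torch.manual_seed(0)

# Toy model: y_i ~ N(theta, 1) with prior theta ~ N(0, 1).
# The conjugate posterior mean, sum(y) / (n + 1), serves as a reference.
n = 50
y = 2.0 + torch.randn(n)          # synthetic data around theta = 2
post_mean = y.sum() / (n + 1)

# Variational family q(theta) = N(mu, sigma^2), log-sigma for unconstrained optimization.
mu = torch.tensor(0.0, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    eps = torch.randn(64)                        # reparameterization noise
    theta = mu + torch.exp(log_sigma) * eps      # 64 samples from q
    log_lik = (-0.5 * (y.unsqueeze(1) - theta) ** 2).sum(dim=0)  # up to a constant
    log_prior = -0.5 * theta ** 2                                # up to a constant
    entropy = log_sigma + 0.5 * math.log(2 * math.pi) + 0.5      # H[q]
    elbo = (log_lik + log_prior).mean() + entropy
    (-elbo).backward()
    # In a DDP run, gradients would be averaged across ranks at this point;
    # DDP performs that all-reduce automatically inside backward().
    opt.step()
```

After training, mu should sit close to the analytic posterior mean, which is the sanity check one would also use when validating the distributed version against the serial one.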