Thomas van Osch
As a member of the high-performance machine learning team at SURF, Thomas focuses on streamlining AI model architectures in terms of scaling, speed and model expressivity. Within machine learning, Thomas has notable experience in LLMs (from training to inference), computer vision (computational photography, satellite imagery, video processing, image generation), data formats, and more. Ideally and ambitiously, he aims to do all of this in a responsible way, with attention to privacy, energy awareness and openness.
Session
GPT-NL is a publicly funded initiative set to build a sovereign, transparent, and ethically driven Dutch Large Language Model (LLM). Its commitment to a transparent and ethically driven development process requires assessing and choosing training frameworks and training architectures that are efficient and energy-aware. Over the last decade, the basic approach to training language models has remained relatively consistent, while the size of the models has grown exponentially. Therefore, increasing engineering effort is dedicated to scaling the model and training process over a large compute pool, as well as implementing an architecture that facilitates close monitoring of such a costly and energy-intensive process.

In this session, we will share insights into the training process of the GPT-NL model and the design decisions that help exploit the state-of-the-art NVIDIA H100-enabled nodes in the Snellius supercomputer. We will present intermediate results of our effort in designing an architecture that implements a training pipeline while supporting experiment management, traceability and energy monitoring. We will discuss our choices of software stacks for model building (i.e., native PyTorch versus Hugging Face) and distributed training (i.e., PyTorch's FSDP versus DeepSpeed's ZeRO), supported by experimental results, with a focus on optimizing for (energy-)efficient training and effective hardware utilization.
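To give a flavour of one of the options compared in the session, the sketch below shows how a plain PyTorch model can be wrapped with PyTorch's FSDP so that parameters, gradients and optimizer state are sharded across GPUs (ZeRO-3-style), with bf16 mixed precision as supported natively on H100 hardware. This is a minimal, illustrative example, not the GPT-NL training code: the toy model, hyperparameters and the assumption of a torchrun launch are placeholders.

```python
# Minimal sketch (NOT the GPT-NL code): sharded training with PyTorch FSDP,
# one of the two distributed-training options discussed in the session.
import os

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision


def main():
    # Assumes a torchrun launch, so RANK/WORLD_SIZE/LOCAL_RANK are set.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model; a real LLM would be a transformer built in native
    # PyTorch or via Hugging Face, as compared in the talk.
    model = nn.Sequential(
        nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)
    ).cuda()

    # Shard parameters, gradients and optimizer state across ranks
    # (default FULL_SHARD strategy) and train in bf16 mixed precision.
    fsdp_model = FSDP(
        model,
        mixed_precision=MixedPrecision(
            param_dtype=torch.bfloat16,
            reduce_dtype=torch.bfloat16,
        ),
        device_id=local_rank,
    )

    optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)

    # One dummy training step to show the loop structure.
    x = torch.randn(8, 4096, device="cuda")
    loss = fsdp_model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

DeepSpeed's ZeRO offers a comparable sharding scheme configured through a JSON config and its own engine; the session compares the two empirically on Snellius with respect to throughput, hardware utilization and energy use.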