2024-12-12 – Quest
GPT-NL is a publicly funded initiative set to build a sovereign, transparent, and ethically driven Dutch Large Language Model (LLM). Its commitment to a transparent and ethically driven development process requires assessing and choosing training frameworks and architectures that are efficient and energy-aware. Over the last decade, the basic approach to training language models has remained relatively consistent, while model sizes have grown exponentially. Increasing engineering effort is therefore dedicated to scaling the model and training process over a large compute pool, and to implementing an architecture that facilitates close monitoring of such a costly and energy-intensive process. In this session, we will share insights into the training process of the GPT-NL model and the design decisions that help exploit the state-of-the-art NVIDIA H100 nodes in the Snellius supercomputer. We will present intermediate results of our effort to design an architecture that implements a training pipeline while supporting experiment management, traceability, and energy monitoring. We will discuss our choices of software stacks for model building (i.e., native PyTorch versus Hugging Face) and distributed training (i.e., PyTorch’s FSDP versus DeepSpeed’s ZeRO), supported by experimental results, with a focus on optimizing for (energy-)efficient training and effective hardware utilization.
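To make the FSDP-versus-ZeRO comparison concrete, the sketch below shows how the two sharding stacks are typically set up. It is an illustrative assumption on our part, not GPT-NL's actual training code: the stand-in model and the configuration values are hypothetical.

```python
# Illustrative sketch only: minimal setup for the two sharding approaches
# discussed above. The stand-in model and config values are hypothetical.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")        # assumes a torchrun launch
model = torch.nn.Linear(4096, 4096)    # stand-in for a transformer block

# PyTorch-native FSDP: wrap the module; parameters, gradients and optimizer
# state are sharded across ranks.
fsdp_model = FSDP(model.cuda())

# DeepSpeed ZeRO stage 3 targets the same memory savings but is configured
# declaratively and returns a training engine:
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 3},
    "bf16": {"enabled": True},
}
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```

Both approaches shard parameters, gradients, and optimizer state across ranks; the practical differences lie in API surface, integration effort, and performance on a given cluster, which is what the experimental comparison addresses.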
In a rapidly evolving AI landscape, Large Language Models (LLMs) are becoming increasingly ubiquitous. LLMs are deep neural networks capable of understanding and generating text, trained on vast datasets. The training process involves feeding the model immense volumes of text, allowing it to capture intricate patterns and contextual nuances in language. Remarkable applications of LLMs have appeared in recent years: from creative writing to computer program synthesis to autonomously acting agents.
GPT-NL is a project, funded by the Ministry of Economic Affairs, to build a sovereign Dutch/English LLM. GPT-NL will be a transparent language model trained from scratch: we are open about the choices made during the data curation and training process. In doing so, we also explicitly consider challenges around bias and ethical frameworks. GPT-NL thus contributes to more openness, transparency, and protection of users' data privacy.
We take sustainability and carbon emissions into account and must be responsible in our use of the resources needed to develop GPT-NL, such as energy and (cooling) water. We are working towards an efficient language model grounded in scientific research. In doing so, we look critically both at the scope the model should have and at how to optimize the training and implementation of GPT-NL.
In this session, we will focus on the implementation of the model on the Snellius supercomputer, presenting the intermediate results on pipeline architecture, experiment management, traceability, and energy monitoring, as well as the software-stack comparisons outlined in the abstract above.
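As one example of what per-GPU energy monitoring can look like, power draw can be sampled through NVIDIA's NVML bindings and integrated over time. The following is a minimal sketch under our own assumptions (the nvidia-ml-py package and a fixed sampling loop), not the project's actual monitoring stack.

```python
# Minimal sketch, assuming the nvidia-ml-py package; not GPT-NL's actual
# monitoring code. Samples GPU power draw and integrates it into an
# approximate energy figure.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

samples = []
start = time.time()
for _ in range(10):  # sample while a training step runs
    # nvmlDeviceGetPowerUsage reports milliwatts; convert to watts.
    samples.append(sum(pynvml.nvmlDeviceGetPowerUsage(h) for h in handles) / 1000.0)
    time.sleep(0.1)
elapsed = time.time() - start

avg_watts = sum(samples) / len(samples)
print(f"avg draw {avg_watts:.0f} W, ~{avg_watts * elapsed:.0f} J over {elapsed:.1f} s")
pynvml.nvmlShutdown()
```

Logging such traces alongside training metrics is one way to relate hardware utilization to energy cost per training step.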
Claartje Barkhof is a scientist and integrator in the Advanced Computing Engineering group at TNO. Her work focuses on machine learning research, its integration into computational systems, and its application to societal use cases. She earned a master’s degree in Artificial Intelligence from the University of Amsterdam in 2021, where she also worked as a research assistant at the Institute for Logic, Language, and Computation, focusing on deep generative latent variable modeling.
As a member of the high-performance machine learning team at SURF, Thomas focuses on streamlining AI model architectures in terms of scaling, speed, and model expressivity. Within machine learning, Thomas has notable experience in LLMs (from training to inference), computer vision (computational photography, satellite imagery, video processing, image generation), data formats, and more. Ideally and ambitiously, he does all of this in a responsible way, with attention to privacy, energy awareness, and openness.