We start the day with our Plenary Welcome and Introduction, where we’ll outline the day’s agenda, introduce key speakers, and provide important logistical information. This session will give attendees a clear overview of the event’s goals and expectations, helping everyone make the most of the sessions ahead.
Multimodal foundation models are a revolutionary class of AI models with impressive abilities to generate multimedia content, guided by interactive prompts, in a seemingly creative manner. These foundation models are typically self-supervised transformer-based models pre-trained on large volumes of data, usually collected from the web. They already form the basis of all state-of-the-art systems in computer vision and natural language processing across a wide range of tasks and have shown impressive transfer learning abilities. Despite their immense potential, these foundation models face challenges in fundamental perception tasks such as spatial grounding and temporal reasoning, have difficulty operating in low-resource scenarios, and neglect the human alignment needed for ethical, legal, and societal acceptance.
Join Peter Michielse as he shares how the advanced computing community made dynamic use of SURF's infrastructure in 2024. Who are the key players leveraging the technology? He also takes a moment to honor Wim Nieuwpoort, a pioneering figure in high-performance computing, reflecting on his lasting impact and the legacy that continues to inspire the field today.
The exponential growth in data volume and variety across various domains has led to significant challenges in efficiently processing, storing, and managing large-scale datasets. Remote sensing, with its increasingly diverse and complex datasets and demanding computational requirements, is a prominent example of these big data processing needs. Cloud computing offers a promising solution for addressing the scalability and resource allocation needs of big data processing by providing a distributed environment where resources can be dynamically managed. A typical cloud-based big data processing platform encompasses infrastructure orchestration, distributed processing frameworks, data access mechanisms, and user interfaces. While this approach enables efficient handling of large-scale data, it also raises concerns regarding energy consumption and carbon footprint.

This presentation will delve into proposed methods and tools for optimizing energy consumption in big data processing within a cloud environment, using remote sensing big data as a representative example. The discussion will be organized around three interrelated topics: establishing an energy-aware benchmarking framework, optimizing infrastructure orchestration for energy efficiency, and implementing energy-efficient task scheduling. Firstly, the benchmarking framework includes applications, data, and monitoring toolkits for collecting and analyzing performance, resource utilization, and, most importantly, energy metrics within distributed big data systems. By benchmarking these metrics, we can identify key areas for improvement in energy efficiency. Secondly, optimizing infrastructure orchestration involves proposing resource allocation strategies such as automatic scaling of clusters, container consolidation, and workload prioritization, with energy efficiency as the main criterion. These strategies aim to reduce energy consumption while compromising performance as little as possible, allowing the benefits to be applied across various big data applications without requiring changes to the existing codebase. Thirdly, a multi-objective task scheduling strategy is introduced to minimize energy consumption while maintaining acceptable execution times at the level of individual computing tasks.

The output of this research includes software components specifically designed to be integrated into widely used remote sensing big data platforms to measure and improve the energy efficiency of distributed big data processing. Additionally, the research will engage the broader community through workshops and mini-symposia to disseminate the findings and methodologies developed. By focusing on these strategies, we aim to advance the field of big data processing by providing tools that can be adapted across various domains and promote sustainable practices in cloud-based big data applications.
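As a minimal illustration of the trade-off that such multi-objective scheduling navigates, the following Python sketch scores candidate task placements by a weighted combination of estimated energy and runtime. All node characteristics, weights, and numbers are hypothetical, not measurements from the project.

    # Toy multi-objective (energy vs. runtime) task placement.
    # Node power/speedup figures and weights are hypothetical examples.
    nodes = {
        "cpu-node": {"power_w": 300, "speedup": 1.0},
        "gpu-node": {"power_w": 700, "speedup": 4.0},
    }

    def placement_cost(task_seconds, node, w_energy=0.7, w_time=0.3):
        runtime = task_seconds / node["speedup"]       # estimated runtime (s)
        energy = runtime * node["power_w"] / 3600.0    # estimated energy (Wh)
        return w_energy * energy + w_time * runtime    # weighted scalarization

    def schedule(task_seconds):
        # Pick the node that minimizes the weighted energy/runtime cost.
        return min(nodes, key=lambda name: placement_cost(task_seconds, nodes[name]))

    print(schedule(3600))  # node chosen for a nominal 1-hour task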
The discovery of new materials has historically been a driving force for advancements in technology. Through progress in theoretical understanding and experimental control, new materials can now be designed at the atomic level. This will lead to a new class of materials, Quantum Materials, where quantum mechanical effects manifest on macroscopic scales [1]. There is a national commitment to study the fundamental science of quantum materials, as recognized by the recent NWO gravitation programme “Materials for the Quantum Age – QuMat” [2].
Quantum Materials require a quantum description. To this end, we use first-principles simulations that capture material-specific details at the quantum level. In this talk we focus on two state-of-the-art applications: first-principles superconductivity and theoretical spectroscopy. Our group is proactively developing theory and implementing it in the open-source codes SIESTA [3-5] and YAMBO [6-8], respectively. We then use these codes, among others, to study realistic systems involving interfaces, defects, and heterostructures consisting of tens to hundreds of atoms. To scale the codes to these system sizes, the availability of High Performance Computing (HPC) facilities is essential.
We discuss our approaches to running these codes efficiently on HPC resources. In particular, we show how the codes exploit parallelism and distributed memory, and the need for good built-in heuristics to distribute the workload. We also discuss the trend of modular code design, which allows offloading the most expensive operations to highly optimized libraries like OpenBLAS, ELPA, and ELSI [9,10]. In addition to code optimization, we discuss the importance of good tooling and information resources, from proper documentation and code maintenance to reproducible build tools like EasyBuild and Spack and workflow managers like AiiDA [11]. Finally, we highlight the recent effort of porting routines to GPUs to further accelerate simulations [5, 7, 12].
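As a hedged illustration of the offloading idea (not the SIESTA or YAMBO internals), the snippet below hands a dense symmetric eigenproblem to SciPy, which dispatches it to an optimized LAPACK/BLAS backend such as OpenBLAS instead of a hand-written solver:

    # Illustrative only: delegate the expensive kernel (a dense eigensolve)
    # to an optimized LAPACK/BLAS backend via SciPy, mirroring the
    # modular-design idea described above.
    import numpy as np
    from scipy.linalg import eigh

    n = 1000
    a = np.random.rand(n, n)
    h = (a + a.T) / 2            # a random symmetric stand-in "Hamiltonian"
    eigvals, eigvecs = eigh(h)   # runs in the optimized library, not in Python
    print(eigvals[:5])           # five lowest eigenvalues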
References
[1] B. Keimer and J. E. Moore, The physics of quantum materials, Nature Physics 13, 1045 (2017)
[2] Materials for the Quantum Age – QuMat, https://qumat.org/
[3] R. Reho, N. Wittemeier, A. H. Kole, P. Ordejón, Z. Zanolli, Density functional Bogoliubov-de Gennes theory for superconductors implemented in the SIESTA code, to appear in Phys. Rev. B (2024), https://doi.org/10.48550/arXiv.2406.02022
[4] J. M. Soler et al., The SIESTA method for ab initio order-N materials simulation, J. Phys.: Condens. Matter 14(11), 2745–2779 (2002)
[5] A. García et al., Siesta: Recent developments and applications, J. Chem. Phys. 152(20), 204108 (2020)
[6] A. Marini et al., yambo: An ab initio tool for excited state calculations, Computer Physics Communications 180, 1392 (2009)
[7] D. Sangalli et al., Many-body perturbation theory calculations using the yambo code, Journal of Physics: Condensed Matter 31, 325902 (2019)
[8] R. Reho, A. R. Botello-Méndez, D. Sangalli, M. J. Verstraete, Z. Zanolli, Excitonic response in transition metal dichalcogenide heterostructures from first principles: Impact of stacking, twisting, and interlayer distance, Phys. Rev. B 110, 035118 (2024)
[9] T. Auckenthaler et al., Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations, Parallel Computing 37(12), 783-794 (2011)
[10] V. W.-z. Yu et al., ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers, Computer Physics Communications 222, 267-285 (2018)
[11] S. P. Huber et al., AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance, Sci. Data 7, 300 (2020)
[12] Ivan Carnimeo et al., Quantum ESPRESSO: One Further Step toward the Exascale, Journal of Chemical Theory and Computation 19(20), 6992-7006 (2023)
Authors: A. H. Kole, A. R. Botello-Méndez, Zeila Zanolli
We demonstrate the feasibility of using generative AI for processing large quantities of heterogeneous, unstructured data. In an exploratory study on the application of this new technology in research data processing, we identified tasks for which rule-based or traditional machine learning approaches were difficult to apply, and then performed these tasks using generative AI.
We used generative AI in three research projects from three different domains, involving the following complex data processing tasks:
1) Information extraction: Extraction of plant species names from historical seedlists (catalogues of seeds) published by botanical gardens.
2) Natural language understanding: Extraction of certain data points (name of drug, name of health indication, relative effectiveness, cost-effectiveness, etc.) from documents published by different Health Technology Assessment organisations in the EU.
3) Text classification: Assignment of industry codes to projects on the crowdfunding website Kickstarter.
We share the lessons we learnt from these use cases: How to determine if generative AI is an appropriate tool for a given data processing task, and if so, how to maximise the accuracy and consistency of the results obtained using generative AI.
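As a concrete, hedged illustration of the first task above, the sketch below asks a generic chat-completion API to extract species names from a seedlist fragment; the client library, model name, and prompt are illustrative assumptions, not the setup used in the study.

    # Hedged sketch of LLM-based information extraction (task 1).
    # Model name and prompt are illustrative, not the study's actual setup.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    page_text = "Semina selecta: 12. Acer campestre L. 13. Quercus robur L."

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": "Extract all plant species names from the text. "
                        "Return one name per line and nothing else."},
            {"role": "user", "content": page_text},
        ],
    )
    print(response.choices[0].message.content)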
The increasing effects of global climate change urge a global energy transition, in which large-scale storage of renewable energy is expected to play a primary role. Redox flow batteries (RFBs) have emerged as a promising technology for grid-scale energy storage. During RFB operation, the liquid electrolytes, stored in external tanks, are actively circulated through the electrochemical stack, consisting of flow fields, porous electrodes, and membranes, where electrochemical reactions take place on the surface of the porous electrodes. The porous electrodes used in RFBs play a critical role in determining the battery's performance by affecting the thermodynamics, kinetics, and transport phenomena [1].
Despite its advantages, RFB technology has seen limited adoption due to economic and technical challenges. One effective strategy to boost cost-effectiveness is to increase the stack power density by improving the efficiency of the electrodes, thereby increasing overall system performance [2]. Project TopeSmash aims to develop novel computational models for accelerating the design of porous electrodes and understanding the role of structure in their performance. The project proposes using topology optimization (TO) to design microarchitected, variable-porosity 3D porous electrodes for RFBs that decrease power losses across various operating conditions. This approach requires integrating two different types of models: 1) multi-physics models that build a theoretical framework adequately relating local electrode properties to overall RFB performance, and 2) TO models to inversely design the microarchitected, variable-porosity 3D porous electrodes.
In the TopeSmash project, we have developed a high-performance TO framework for 3D porous electrodes in next-generation RFBs. The models were developed across various length scales using finite element/volume/difference methods, implemented with the open-source codes Firedrake, OpenFOAM, and PETSc along with in-house CUDA codes. The resulting TO designs were transformed into cellular architectures based on triply periodic minimal surface (TPMS) structures using the open-source codes ASLI and CGAL, the output of which was additively manufactured using stereolithography 3D printing to assess the performance of the inversely designed electrodes in a real setup. We extensively use the CPU and GPU nodes of Snellius for this computational workflow.
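As a hedged sketch of the kind of finite element building block such a framework composes (not the TopeSmash electrode model itself), the following Firedrake snippet solves a steady diffusion problem on a unit square:

    # Minimal Firedrake example: steady diffusion (Poisson) on a unit square.
    # Illustrative building block only, not the TopeSmash multi-physics model.
    from firedrake import (UnitSquareMesh, FunctionSpace, TrialFunction,
                           TestFunction, Function, Constant, DirichletBC,
                           inner, grad, dx, solve)

    mesh = UnitSquareMesh(32, 32)
    V = FunctionSpace(mesh, "CG", 1)
    u, v = TrialFunction(V), TestFunction(V)
    f = Constant(1.0)                       # uniform source term
    a = inner(grad(u), grad(v)) * dx        # stiffness (bilinear) form
    L = f * v * dx                          # load (linear) form
    u_h = Function(V)
    solve(a == L, u_h, bcs=[DirichletBC(V, 0.0, "on_boundary")])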
In this presentation, I will first discuss the structure of the multi-physics modeling approach. Second, I will present the integration of these models into the TO framework. Finally, I will show the transformation of the TO results into cellular infills. I will highlight how we employ Snellius in each of the mentioned steps.
References
1. Alotto et al., Renew. Sustain. Energy Rev. 29, 325–335 (2014).
2. B. K. Chakrabarti et al., Sustainable Energy & Fuels 4, 5433–5468 (2020).
An overview of the data visualization options that are available on Snellius, including support from SURF.
We present a recently developed numerical method for the solution of the quantum mechanical three-body problem, in the challenging regime where interparticle interactions are strong. This method has shown unprecedented accuracy in comparison with state-of-the-art experiments, yet the underlying code has limited scalability due to its sequential and memory-bound nature. To solve this problem, we discuss a new, fully parallelized implementation of the method based on MPI, suitable for running on high-performance computing clusters.
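The generic MPI pattern behind such a parallelization can be sketched with mpi4py as follows: each rank owns a slice of the problem and partial results are combined with a reduction. This is a schematic sketch under that assumption, not the actual three-body code.

    # Schematic mpi4py pattern: domain decomposition plus a global reduction.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_total = 1_000_000                      # global problem size (toy value)
    counts = [n_total // size + (r < n_total % size) for r in range(size)]
    start = sum(counts[:rank])
    local = np.arange(start, start + counts[rank], dtype=np.float64)

    local_sum = np.square(local).sum()       # stand-in for the local physics work
    total = comm.allreduce(local_sum, op=MPI.SUM)
    if rank == 0:
        print(total)                         # run with: mpirun -n 4 python script.py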
During atmospheric entry, the hypersonic flow environment around capsules or space debris is characterized by strong shock waves, complex fluid thermochemistry, and gas-surface interactions (GSI). Such interactions can induce material decomposition and mass loss, referred to as ablation, which alters the shape of the flying object as its surface recedes. Understanding the influence of these phenomena is crucial in the design of future spacecraft. Ground testing is inadequate for simultaneously replicating all aspects of these types of flows. Hence, proper computational modeling and simulations are essential, yet usually resource-intensive and methodologically demanding. This talk will give a brief overview of some of the recent developments at the Aerodynamics Group of the TU Delft Aerospace Engineering Faculty regarding computational fluid dynamics (CFD) simulations, enabled by high-performance computing (HPC), of high-speed, high-temperature flows with gas and surface reactions, with an emphasis on the prediction and analysis of laminar-to-turbulent transition.
Integrating SURF Research Cloud with Research Data Management (RDM) services powered by Yoda (and iRODS) enables RDM-driven research: data handling is taken care of by the collaboration between the data management platform and the compute platform. The data management platform, powered by Yoda (and iRODS), provides advanced features like data provenance, metadata management, policy enforcement, and secure access control. This ensures that all data is traceable, reproducible, and managed according to the specific needs of the users and the policies of their research projects.
The Research Cloud platform facilitates the creation and management of Virtual Research Environments (VREs). When users create a VRE and log in, they are automatically authenticated to the Yoda server, so their data is immediately ready for their applications to interact with, simplifying the user experience and enhancing data accessibility. By bringing the data platform closer to the compute platform, this integration supports adherence to FAIR best practices throughout the entire research lifecycle.
This integration work required close collaboration between the SURF Research Cloud (SRC), SURF Research Access Management (SRAM), and Data Management Services (DMS) teams. SRAM introduced a new device token flow to enhance secure authentication. The SRC platform uses this flow to authenticate the right users to specific servers, based on information retrieved from a registry server maintained by DMS. To maintain uninterrupted access, SRC takes care of token renewal, ensuring continuous user authentication on the VREs.
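The device flow itself follows the standard OAuth 2.0 device authorization grant (RFC 8628). A hedged client-side sketch is shown below; the endpoints and client identifier are placeholders, not SRAM's actual values.

    # Hedged sketch of an OAuth 2.0 device authorization grant (RFC 8628).
    # Endpoints and client_id are placeholders, not SRAM's actual values.
    import time
    import requests

    AUTH_SERVER = "https://sram.example.org"   # placeholder
    CLIENT_ID = "research-cloud-vre"           # placeholder

    # Step 1: request a device code and a user verification URL.
    grant = requests.post(f"{AUTH_SERVER}/device_authorization",
                          data={"client_id": CLIENT_ID}).json()
    print("Visit", grant["verification_uri"], "and enter", grant["user_code"])

    # Step 2: poll the token endpoint until the user approves the device.
    while True:
        token = requests.post(f"{AUTH_SERVER}/token", data={
            "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
            "device_code": grant["device_code"],
            "client_id": CLIENT_ID,
        }).json()
        if "access_token" in token:
            break                              # token can now authenticate to Yoda
        time.sleep(grant.get("interval", 5))   # respect the advertised poll interval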
This functionality is already in use by Erasmus University, integrating their Yoda server with their Research Cloud environment.
LUMI is a pre-exascale supercomputer and currently the most powerful system in Europe. The Netherlands is a member of the LUMI consortium and SURF provides Dutch researchers access to LUMI, allowing them to run their scientific applications at unprecedented scale.
Cardiac output (CO) is an essential indicator of patient hemodynamic status. Monitoring of CO in the intensive care unit has been shown to improve perioperative outcomes by supporting patient fluid management. Arterial blood pressure-based cardiac output (APCO) estimation devices are minimally invasive compared to CO estimation using (transpulmonary) thermodilution, like the PiCCO system, or the highly invasive gold-standard method using the pulmonary artery catheter (PAC). However, inaccuracy in APCO device estimations during hemodynamically unstable periods, especially in vasodilatory situations, hampers their application in the critical care setting. An approach to improve APCO estimation involves utilizing a one-dimensional convolutional neural network (1D-CNN) to predict stroke volume (SV) from arterial blood pressure (ABP) and patient demographics. Previously published work demonstrated that by pre-training models on SV data from commercial APCO devices and adjusting them with transfer learning using SV data from the PAC, 1D-CNNs achieve superior performance over the in-use FloTrac APCO device. Preliminary results in the current study showed that altering the model training hyperparameters improved model performance further, significantly lowering the absolute error in PAC SV predictions from 13.9 (SD 11.6) mL with the original settings to 11.7 (SD 11.0) mL with the new settings (p < 0.001). This result shows promise for further improvement of deep learning-based APCO algorithms and the estimation of CO from ABP in the critical care setting.
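For readers unfamiliar with the model class, a minimal PyTorch sketch of a 1D-CNN mapping an ABP window plus demographics to an SV estimate is given below; the layer sizes and the 10-second window at 100 Hz are illustrative assumptions, not the published architecture.

    # Hedged 1D-CNN sketch: ABP waveform + demographics -> stroke volume (mL).
    # Window length, sampling rate, and layer sizes are assumptions.
    import torch
    import torch.nn as nn

    class SVNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
                nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),        # pool over the time axis
            )
            self.head = nn.Linear(32 + 2, 1)    # +2 demographic features

        def forward(self, abp, demo):
            x = self.features(abp).squeeze(-1)              # (batch, 32)
            return self.head(torch.cat([x, demo], dim=1))   # SV estimate

    model = SVNet()
    abp = torch.randn(8, 1, 1000)    # 8 windows of 10 s ABP at 100 Hz (assumed)
    demo = torch.randn(8, 2)         # e.g., standardized age and height
    print(model(abp, demo).shape)    # torch.Size([8, 1])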
Interactive design-through-analysis workflows in XR facilitate computational steering, made possible by physics-informed ML, a low-barrier frontend, and a powerful compute system in the backend.
Computational steering has seen regular incarnations in the Computational Science and Engineering domain with every leap forward in computing and visualisation technologies. While often associated with the ability to interact with large-scale simulations running on remote high-performance compute clusters, this presentation will introduce novel interactive design-through-analysis techniques through visual demonstration with different modalities including XR devices.
The design-through-analysis paradigm means the seamless integration of computer-aided design and simulation-based analysis tools so that scientists, engineers, and researchers can go back and forth between product design, analysis, and optimisation.
The novelty of the proposed approach lies in replacing traditional simulation-based analysis, which often hinders rapid design-through-analysis workflows, with our recently developed IgANets: an embedding of physics-informed machine learning into the Isogeometric Analysis paradigm. More precisely, we train parametrized deep networks in a compute-intensive offline stage to predict the solution coefficients of B-spline/NURBS representations. Problem configurations and geometries are encoded as B-spline/NURBS objects and passed to the network as inputs, providing a mechanism for user interaction. Evaluation of IgANets is instantaneous, thereby enabling interactive feedback loops.
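The core idea can be sketched in a few lines of PyTorch: a network maps an encoded problem configuration to spline coefficients, so that online evaluation is a single matrix product. Sizes and the random basis below are placeholders, not the actual IgANets implementation.

    # Hedged sketch of the IgANets idea: configuration -> B-spline coefficients.
    import torch
    import torch.nn as nn

    n_coeffs = 64    # number of B-spline coefficients (assumption)
    n_params = 4     # size of the encoded problem configuration (assumption)

    coeff_net = nn.Sequential(
        nn.Linear(n_params, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, n_coeffs),               # predicted spline coefficients
    )

    # Offline stage: train coeff_net with a physics-informed loss, i.e. the PDE
    # residual evaluated through the spline representation (omitted here).

    # Online stage: instantaneous evaluation for interactive feedback.
    params = torch.tensor([[0.5, 1.0, 0.0, 2.0]])    # user-chosen configuration
    basis = torch.rand(200, n_coeffs)                # stand-in for a precomputed basis
    solution = basis @ coeff_net(params).squeeze(0)  # field at 200 evaluation points
    print(solution.shape)                            # torch.Size([200])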
In this presentation, we will show a first-of-its-kind demonstrator that couples IgANets, developed at TU Delft, with a novel user frontend in XR, developed at SURF. With this presentation we hope to initiate a new trend in computational steering: interactive design-through-analysis.
The KM3NeT collaboration is building a neutrino telescope in the Mediterranean Sea to study both the intrinsic properties of neutrinos and cosmic high-energy neutrino sources. Once the telescope is fully constructed, our computing needs will rise to an eventual data volume of ~500 TB per year and 1000-2000 cores on average. This will require a transition towards distributed computing and data storage. This talk will cover our plans for the required infrastructure and software, based on DIRAC and Rucio, and the status of our current tests with 15% of the detector constructed.
In this duo presentation, we will explore the promising impact of quantum computing on chemistry through QC2, a software package that connects quantum chemistry with quantum computers. The first part will provide an overview of the need for quantum computing in chemistry, using real-world applications to illustrate its potential. We will also touch on recent advancements in drug and materials discovery, along with the opportunities and challenges posed by current NISQ quantum hardware. Finally, we will discuss the progress toward fault-tolerant systems, with the expectation of reaching over 100 logical qubits in the near future.
In the second part, we will present QC2’s modular framework, which seamlessly integrates quantum chemistry tools like PySCF and PSI4 with quantum computing platforms. This innovation streamlines quantum-enhanced simulations in the near term, enabling chemists and researchers to use cutting-edge quantum algorithms without the need to manage complex pre- and post-processing steps. The software is versatile, compatible with any quantum chemistry software, and integrates seamlessly with any quantum software development kit. Join us as we explore how these advancements are reshaping the future of quantum chemistry on quantum computers, making it accessible and user-friendly for computational chemists in both academia and industry, regardless of their expertise in quantum computing.
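As a taste of the quantum chemistry side that such a toolchain starts from (QC2's own API is not shown here), a minimal PySCF calculation looks like this:

    # Minimal PySCF example: a classical mean-field reference for H2, the kind
    # of starting point a quantum algorithm such as VQE would then refine.
    from pyscf import gto, scf

    mol = gto.M(atom="H 0 0 0; H 0 0 0.74", basis="sto-3g")  # H2 near equilibrium
    mf = scf.RHF(mol).run()       # restricted Hartree-Fock
    print(mf.e_tot)               # total energy in Hartree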
The portability of High Performance Computing (HPC) codes can be a daunting task due to their complexity and the widespread propagation of low-level, architecture-dependent memory access operations throughout the code. As a result, portability most of the time implies a massive rewrite of nearly the entire code, which is highly unsustainable. The root of this localized memory access is that algorithms are traditionally developed and implemented from a local, admittedly intuitive, perspective (e.g., nodes, elements, stencils), which in turn requires local memory access.
In the context of continuum mechanics (e.g., fluid and solid mechanics, heat and mass transfer, plasma physics), I propose to remove this bottleneck by adopting a fully algebraic approach, as was already proposed by the BLAS standard more than 40 years ago. Since the governing equations are widely formulated in terms of vector calculus (the de facto lingua franca of all these problems), it is entirely possible to formulate the discrete governing equations in terms of a discrete vector calculus. Indeed, there is no fundamental reason to develop codes for continuum physics from a local perspective. Discrete vector calculus operators (e.g., div, grad, curl) are then represented by matrices, and the operations reduce to basic matrix-vector and vector-vector products. This algebraic framework not only presents a fantastic opportunity for encapsulating memory access operations, but also provides a much more sustainable structure by resting on a much smaller set of computational kernels.
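A minimal sketch of this algebraic viewpoint, using a 1D periodic grid for brevity: the discrete gradient and divergence are sparse matrices, the Laplacian is obtained purely by composition, and the solver touches memory only through matrix-vector products.

    # Discrete vector calculus operators as sparse matrices (1D, periodic).
    import numpy as np
    import scipy.sparse as sp

    n = 100
    h = 1.0 / n
    G = ((sp.eye(n, k=1) - sp.eye(n)) / h).tolil()  # forward-difference gradient
    G[-1, 0] = 1.0 / h                              # periodic wrap-around
    G = G.tocsr()
    D = (-G.T).tocsr()             # discrete divergence = minus the adjoint
    Lap = D @ G                    # Laplacian composed purely algebraically

    x = np.arange(n) * h
    u = np.sin(2 * np.pi * x)
    residual = Lap @ u + (2 * np.pi) ** 2 * u   # Lap u ~ -(2*pi)^2 u
    print(np.abs(residual).max())               # small, O(h^2)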
While the adoption of an algebraic formulation is nothing new, I propose to extend it to the design and implementation of the computational aspects of HPC libraries. By sticking to rigorous mathematical definitions, we leverage object-oriented programming to define the mathematical objects that will replace common computational objects that are usually implemented ad hoc, most of the time with a particular application in mind (e.g., mesh, halo, communicators).
A paradoxical example is the concept of a "mesh": while common in most continuum mechanics software, there is no strict mathematical definition of what a mesh is, leading to a myriad of different implementations that are virtually impossible to reuse among different codes. I propose to replace it with a manifold object, which is well defined in mathematical terms. I will show how the manifold concept naturally accommodates an interface for storing data in visualization software, as well as topological concepts related to the partition of the domain in a parallel environment.
In this talk, I will present a tentative mathematical structure to accommodate common computational concepts found in continuum mechanics applications, and discuss the possibility and interest of turning it into a standard.
GPT-NL is a publicly funded initiative set to build a sovereign, transparent, and ethically driven Dutch Large Language Model (LLM). Its commitment to a transparent and ethically driven development process requires assessing and choosing training frameworks and architectures that are efficient and energy-aware. Over the last decade, the basic approach to training language models has remained relatively consistent, while the size of the models has grown exponentially. As a result, increasing engineering effort is dedicated to scaling the model and the training process over a large compute pool, and to implementing an architecture that facilitates close monitoring of such a costly and energy-intensive process. In this session, we will share insights into the training process of the GPT-NL model and the design decisions that help exploit the state-of-the-art NVIDIA H100 nodes in the Snellius supercomputer. We will present intermediate results of our effort to design an architecture that implements a training pipeline while supporting experiment management, traceability, and energy monitoring. We will discuss our choices of software stacks for model building (native PyTorch versus Hugging Face) and distributed training (PyTorch's FSDP versus DeepSpeed's ZeRO), supported by experimental results, with a focus on optimizing for (energy-)efficient training and effective hardware utilization.
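Of the two distributed-training options mentioned, a minimal PyTorch FSDP setup looks roughly like the sketch below; the model, sizes, and loss are toy placeholders, not the GPT-NL training code.

    # Hedged sketch of sharded data-parallel training with PyTorch FSDP.
    # Launch with torchrun (one process per GPU); all sizes are toy values.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def main():
        dist.init_process_group("nccl")
        torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

        model = torch.nn.Transformer(d_model=512, num_encoder_layers=6).cuda()
        model = FSDP(model)        # shards parameters, gradients, optimizer state
        optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

        src = torch.randn(10, 8, 512, device="cuda")  # (seq, batch, d_model)
        tgt = torch.randn(10, 8, 512, device="cuda")
        loss = model(src, tgt).pow(2).mean()          # placeholder loss
        loss.backward()
        optim.step()

    if __name__ == "__main__":
        main()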
The preservation of biodiversity is critical for maintaining ecological balance and ensuring the sustainability of ecosystems. However, biodiversity faces numerous threats, including habitat loss, climate change, and the proliferation of invasive species. Addressing these challenges requires comprehensive monitoring, prediction, and conservation planning capabilities that currently do not exist [1].
Deep learning Foundation Models (FMs) [2] have revolutionized numerous scientific domains by leveraging vast datasets to learn general-purpose representations adaptable to various downstream tasks. This paradigm holds immense promise for biodiversity conservation.
In this talk, we introduce the concept of a Biodiversity Foundation Model (BFM), a large-scale, multimodal AI model pre-trained on diverse biodiversity data modalities. These modalities include imagery, audio recordings, genomic data, environmental DNA (eDNA), satellite and remote sensing data, geospatial data, climate data, textual data, and sensor data. The BFM aims to enhance biodiversity monitoring, prediction, and conservation efforts, while being flexible and robust across downstream tasks, from classification to prediction.
Drawing parallels from models like Aurora [3] and Prov-GigaPath [4], we hypothesize that the BFM can significantly outperform traditional methods in biodiversity-related tasks. For example, using pre-trained weights from a vast dataset of environmental DNA, the BFM could rapidly identify and monitor species presence in various habitats, providing critical data for conservation efforts.
The BFM can transform biodiversity conservation in several ways:
- Enhanced Monitoring: By integrating diverse data sources, the BFM provides comprehensive and real-time monitoring of ecosystems.
- Predictive Analytics: The BFM can predict future changes in biodiversity due to various factors, enabling proactive conservation measures.
- Invasive Species Management: Early detection and monitoring of invasive species through BFM can help mitigate their impact on native ecosystems.
- Climate Change Adaptation: The BFM can identify potential climate refugia and assist in developing strategies to protect vulnerable species.
Still, like any other advanced AI model, the BFM comes with a series of challenges, from downloading, storing, and pre-processing vast and diverse data, to architecture development, training, testing, evaluation, and finally safe deployment. Each of these topics requires careful handling and a multi-disciplinary team of ecologists, computer scientists, and AI and HPC experts to ensure its success.
The session will feature both presentations and an interactive open discussion segment. We warmly invite you to participate in this engaging dialogue, where we will collectively delve into the significant advancements and key challenges associated with the development and utilisation of Foundation Models in the dynamic field of Earth Sciences.
Energy efficiency in computing is becoming increasingly important as climate change impacts intensify, energy costs increase, and efforts to reach the Sustainable Development Goals to tackle environmental, social, and economic issues gain momentum. This is especially true for large-scale computing for research, including cloud computing, which underpins advancements and breakthroughs in many scientific domains. Achieving significant improvements in energy efficiency for such resource-intensive computing tasks, especially artificial intelligence methods including machine learning, deep learning, and the new generation of large language models, is a complex challenge that requires a coordinated effort across hardware and software domains, as well as a large group of actors including infrastructure providers, system administrators, software developers, and researchers. Despite the rapid pace of technological development, many existing tools and platforms still lack essential features, such as detailed energy consumption metrics and granular task-level monitoring capabilities. Optimization of research computing workflows with respect to energy efficiency is also mostly overlooked. This panel discussion seeks to unite a diverse array of experts from the Netherlands, and potentially from the international community, to shed light on the current practices and challenges of energy-efficient computing, with a special focus on research. We aim to explore how energy efficiency is currently addressed within advanced computing, identify critical gaps, and discuss actionable steps to enhance collaboration and drive forward more energy-efficient practices. By fostering this exchange of ideas and experiences among different actors, we hope to contribute to the development of computing infrastructures and workflows that are not only powerful but also environmentally sustainable.
Panelists:
- Prof. Dr. Rob van Nieuwpoort (Leiden University)
- Gilles Tourpe (Amazon Web Services)
- Gijs van den Oord (Netherlands eScience Centre)
- Sagar Dolas (SURF)
- Adhitya Bhawiyuga (University of Twente)
A primary objective of the coalition is to unify the currently fragmented Dutch community and become more visible to all stakeholders. The coalition will synthesize a clear HPC agenda underpinned by use cases that highlight both academic and economic value. The coalition will act as an advisory and sounding board representing the Dutch HPC community.
This year's meeting will develop the discussion based on a set of prominent HPC projects and the development and user experience around them. These presentations and the related discussions should directly contribute to the Community Position paper draft.
Speakers:
Dr. Richard Stevens (UTwente)
Prof. Zeila Zanolli (UtrechtU)
Prof. Alexander Bonvin (UtrechtU)
Simon Bijdevier (ClusterVision)
Research today is undeniably data-driven, with some of the most compelling insights emerging from the analysis of sensitive data. Examples include personal information, health records, and commercial data, all of which hold significant potential when properly leveraged. Combining these types of data with advanced analytical techniques can open up entirely new avenues of discovery, leading to breakthroughs that would otherwise remain out of reach.
In this panel discussion I want to go through my experiences of working with sensitive data, as a former researcher at Statistics Netherlands (CBS) and at TU Delft, and show how SURF is offering and continuously developing state-of-the-art software and infrastructure solutions to enable this type of research. I want to share the possibilities of working with data of a sensitive nature and open up the discussion surrounding this matter.
Quantum science and technologies are rapidly developing and hold great potential to revolutionize numerous fields in academia and industry. As we progress, it is crucial to support education, awareness, and scientific curiosity in the topic. In this panel, we will discuss the current state of quantum computing and what this means for researchers interested in exploring and exploiting quantum technology.
This talk presents a pipeline that creates synthetic data to train computer vision models for use in extended reality (XR) applications. By generating over 100,000 images using Blender, we trained three distinct models. These models, trained on Snellius, demonstrate how synthetic data can effectively address real-world challenges in complex XR environments. The talk will focus on the benefits and potential of synthetic data for computer vision, showcasing how visualization tools like Blender can be integral in creating robust, adaptable models for practical use.
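A hedged sketch of what the Blender side of such a pipeline can look like is shown below; the object name, output path, and randomization are placeholders, and a real pipeline would also vary lighting, materials, and camera pose and export the corresponding labels.

    # Hedged sketch of synthetic-image generation with Blender's Python API (bpy).
    # Object/scene names and paths are placeholders, not the actual pipeline.
    import math
    import random
    import bpy

    obj = bpy.data.objects["Target"]     # placeholder object name
    scene = bpy.context.scene

    for i in range(10):                  # the real pipeline renders >100,000 images
        obj.rotation_euler = [random.uniform(0, 2 * math.pi) for _ in range(3)]
        scene.render.filepath = f"/tmp/synthetic_{i:06d}.png"
        bpy.ops.render.render(write_still=True)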
In Precision Livestock Farming (PLF), deep learning-based approaches are increasingly employed to study animal behavior on farms. These behavioral studies enable animal phenotyping, which can be used for genetic selection and social network analysis in large animal groups. While considerable attention is often given to models in the deep learning field, the models themselves do not function in isolation. Efficient deep-learning workflows require systems that bridge research and prototyping with production operations.
We introduce the IMAGEN Data Analytics Platform under the IMAGEN program, which brings research into animal behavior together with computer science to improve the health and welfare of pigs and laying hens and to reduce the ecological footprint of food production. This platform, hosted on the SURF HPC cluster, supports the development of deep learning models for animal phenotyping and addresses various data challenges through a DataOps approach. Although initially designed for animal phenotype detection, the platform is domain-neutral and can be applied to similar cases in other application domains.
The era of exascale computing presents both exciting opportunities and unique challenges for quantum mechanical simulations. While the transition from petaflops to exascale computing has been marked by a steady increase in computational power, the shift towards heterogeneous architectures, particularly the dominant role of graphical processing units (GPUs), demands a fundamental shift in software development strategies. In this talk, I present a review of the changing landscape of hardware and software for exascale computing, highlighting the limitations of traditional algorithms and software implementations in light of the increasing use of heterogeneous architectures in high-end systems. I will also discuss the challenges of adapting quantum chemistry software to these new architectures, including the fragmentation of the software stack, the need for more efficient algorithms (including reduced precision versions) tailored for GPUs, and the importance of developing standardized libraries and programming models.
EESSI, the European Environment for Scientific Software Installations, is a collaborative project to build a shared scientific software stack for HPC systems and beyond. In this session we will give a brief overview of EESSI and highlight recent developments for HPC users and system administrators. On top of the ever increasing number of supported applications in the software stack, there is a lot of exciting news that makes EESSI even more versatile and simpler to use. Extending the existing stack and customizing it for the needs of local HPC sites is now a breeze with the new EESSI-extend functionality. Similarly, scientific software developers can use the new dev.eessi.io repository to build, test, and deploy their development builds across a wide range of CPU architectures. Last but not least, the new EESSI CI workflows take care of many CI setup headaches by making the entire software stack available from the start in GitHub and GitLab.
In response to future challenges in scientific research and computing, SURF introduces the Experimental Technologies Platform: an open, collaborative environment where the sector can experiment with cutting-edge ICT technologies and methodologies in advanced computing and data-driven science.
An introduction to SURF's future EuroHPC quantum computer: why SURF decided to host a quantum computer, what type of hardware we are going for, which challenges we aim to tackle, and when you will be able to get your hands on it. (And all other questions you might have about it.)
The Lattice-Boltzmann / Very Large Eddy Simulation solver (LB/VLES) SIMULIA PowerFLOW™ is extensively used to predict aeroacoustic sources from scale-resolved turbulent flows, and to design quieter air/ground vehicles and devices, such as hairdryers and ventilation systems. The present talk will focus on the emerging application of eVTOL and small Unmanned Aerial Systems (sUAS) aeroacoustics, by covering fundamental aspects related to the simulation of transitional flow for broadband noise prediction, and software technological aspects related to the complexity of the digital model, the required computational resources, and the usage of new hardware capabilities to reduce the simulation cost.
In the framework of ESiWACE3, the European Centre of Excellence in simulations for weather and climate, one of the primary goals is to support modelers across Europe in making optimal use of HPC infrastructures. High-fidelity weather and climate simulations are computationally expensive and produce a deluge of data that is often challenging to analyze. The high throughput of GPUs is ideally suited to accelerate both the simulations and the distributed parallel data processing needed to perform statistical reductions on long time series of high-resolution model output. Meanwhile, careful consideration must be given to software sustainability and community engagement to strike a balance between performance and maintainability. In this presentation, we focus on two examples: 1) GPU acceleration of the Dutch Atmospheric Large Eddy Simulation (DALES) using OpenACC, which led to a 5- to 10-fold speedup of the solver on multiple platforms (NVIDIA and AMD) while maintaining a Fortran 90 code base familiar to researchers; 2) performance optimization of ESMValTool, a community-driven package for evaluation and analysis of Earth System Model output, using distributed parallelism with Dask, achieving a 10-fold speedup for certain analysis workloads. These two examples showcase the benefits that expert support can bring to the community in preparing the existing software infrastructure for the advent of exascale.
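The Dask pattern behind the second example can be sketched as follows; the random array is a stand-in for model output, and the chunking is an illustrative choice rather than ESMValTool's actual configuration.

    # Minimal Dask sketch: a lazy statistical reduction over a long time series,
    # executed chunk-by-chunk in parallel. Data and chunking are illustrative.
    import dask.array as da

    field = da.random.random((3650, 720, 1440),        # ~10 years of daily fields
                             chunks=(365, 720, 1440))  # one chunk per year
    climatology = field.mean(axis=0)                   # lazy time mean
    result = climatology.compute()                     # triggers parallel execution
    print(result.shape)                                # (720, 1440)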
Unstructured data and entity extraction for financial services using next-gen technology
In the coming decade, frontier research will generate massive data volumes, driving a need for advanced data processing, simulation, and analysis. High-Energy Physics and Radio Astronomy communities, launching new instruments, will require infrastructures far beyond today’s capabilities, entering the Exascale era. Meeting these needs demands innovative, data-intensive architectures and federated resource models, integrating HPC, HTC, cloud, and quantum computing. The SPECTRUM project (www.spectrumproject.eu) is developing a Strategic Research, Innovation, and Deployment Agenda (SRIDA) and a Technical Blueprint for a European compute and data continuum. This session will highlight SPECTRUM’s approach, achievements, and roadmap toward enabling federated Exabyte-scale scientific collaborations.
Thomas Wolf is the co-founder and Chief Science Officer (CSO) of Hugging Face, where he has been a pivotal figure in driving the company’s open-source, educational, and research initiatives. A prominent advocate for open science, Thomas has played a crucial role in making cutting-edge AI research and technologies widely accessible. He spearheaded the development of the Hugging Face Transformers and Datasets libraries, which have become foundational tools for researchers and developers in the machine learning community.
As the pace of digital transformation accelerates, advanced computing is at the forefront of groundbreaking innovation across research, industry, and society. In this plenary session, we delve into the latest technical developments shaping the field, from emerging high-performance computing (HPC) architectures and quantum technologies to AI-driven optimization and sustainable computing practices.
We will highlight key trends, share insights into the evolving landscape of computational tools, and discuss how these advancements empower researchers and institutions to tackle increasingly complex problems. Gain an understanding of how the cutting-edge capabilities of advanced computing are driving progress in domains like climate modelling, genomics, and artificial intelligence, and explore how SURF and EuroCC are supporting users to harness these developments effectively.
Join us for the closing session as we recap an inspiring day. Our host Valeriu will summarize the main insights, and we’ll conclude with a few final words to leave you motivated. Don’t miss this final session of the day.