2025-12-04 – Mission 2
In this presentation, we discuss the PartitionedArrays programming model as an alternative to the Message Passing Interface (MPI). We present the key features of this model and illustrate how it can help users of Snellius and other supercomputers reduce the burden of implementing complex distributed-memory parallel applications. We illustrate the capabilities of the model by implementing key kernels in scientific computing, such as the distributed sparse matrix-vector product (SpMV), the distributed sparse matrix-matrix product (SpMM), and the high-performance conjugate gradient (HPCG) benchmark used alongside the TOP500 list of supercomputers. We also compare the performance of the resulting codes against state-of-the-art implementations, showing that the proposed model improves the user experience without compromising performance, and in some cases even improves it.
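To make the kernels concrete, below is a minimal sketch of a distributed SpMV with PartitionedArrays.jl. The names used (DebugArray, uniform_partition, own_to_global, tuple_of_arrays, psparse, pones, partition) follow the package documentation as we recall it, but exact signatures may differ between versions, so treat this as an illustration rather than a definitive implementation.

using PartitionedArrays

np = 4                                    # number of parts (independent of process count)
ranks = DebugArray(LinearIndices((np,)))  # debug backend: all parts in one process
n = 8                                     # global number of rows/columns
row_partition = uniform_partition(ranks, n)

# Each part contributes the COO entries of its own rows of a 1D Laplacian.
I, J, V = map(row_partition) do rows
    I, J, V = Int[], Int[], Float64[]
    for gi in own_to_global(rows)
        push!(I, gi); push!(J, gi); push!(V, 2.0)
        gi > 1 && (push!(I, gi); push!(J, gi - 1); push!(V, -1.0))
        gi < n && (push!(I, gi); push!(J, gi + 1); push!(V, -1.0))
    end
    I, J, V
end |> tuple_of_arrays

A = psparse(I, J, V, row_partition, row_partition) |> fetch
x = pones(partition(axes(A, 2)))
b = A * x    # distributed SpMV; the ghost-value exchange is handled internally

The same sketch runs unchanged with an MPI backend, where each part maps to one MPI rank; the SpMM kernel and the HPCG benchmark build on the same PSparseMatrix/PVector building blocks.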
MPI is the gold standard for programming distributed-memory parallel computers, but it comes with well-known challenges. The programmer explicitly controls data distribution and communication, making the logic of MPI-enabled algorithms significantly more complex than that of their sequential versions. Debugging this additional logic at large scales is cumbersome or even impractical: execution order might affect results, and inspecting local variables can be tedious and time-consuming, even for a moderate number of processes. Partitioned Global Address Space (PGAS) systems and other alternatives to MPI have been introduced to address these challenges. They often aim at freeing users from communication-related details, but they offer less control over performance and face a strong adoption barrier, as the programming model of MPI is deeply rooted in the high-performance computing (HPC) community.

The PartitionedArrays programming model addresses the challenges of MPI without the limitations of PGAS. It provides an effective way of expressing and debugging the logic of distributed applications instead of trying to hide these details from the user. To this end, PartitionedArrays decouples the number of parts used for data partitioning from the number of processes that run the code. Hence, the logic of data distribution and communication can be debugged on a single process using conventional tools. Moreover, computation and communication are written as a sequence of logically collective phases, which (unlike many MPI calls) have deterministic semantics independent of process execution order. This makes it possible to implement safety checks and rule out the possibility of deadlocks. These benefits come at virtually no cost in performance, since MPI can still be used to run algorithms implemented with PartitionedArrays by setting the number of parts equal to the number of processes. In addition, the logic of many MPI codes can be expressed directly in PartitionedArrays, allowing applications developed with MPI in mind to be ported readily and minimizing the adoption barrier in the HPC community.
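The decoupling of parts from processes can be illustrated with a short sketch following the usage pattern in the package README. Here with_debug, with_mpi, gather, and MAIN are taken from the PartitionedArrays.jl API as documented; details may vary across versions.

using PartitionedArrays

# The same driver runs on every backend; only the `distribute` function changes.
function driver(distribute)
    np = 4
    ranks = distribute(LinearIndices((np,)))
    # A logically collective phase: every part executes this map.
    a = map(rank -> 10 * rank, ranks)
    # Another collective phase: gather all part-local values on the main part.
    b = gather(a)
    map(ranks, b) do rank, vals
        rank == MAIN && println("gathered: ", vals)
    end
end

with_debug(driver)   # all 4 parts in a single process: use conventional debuggers
# with_mpi(driver)   # one part per MPI rank: mpiexec -n 4 julia driver.jl

Because every phase is collective and deterministic, the debug run and the MPI run execute the same logic, which is what makes single-process debugging of the distributed code meaningful.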
PartitionedArrays is FAIR software available at https://github.com/PartitionedArrays/PartitionedArrays.jl
Assistant Professor in the Department of Computer Science at VU Amsterdam.