May 27, 2022 – A team led by Prof. Thomas D. Kühne and Prof. Christian Plessl from the University of Paderborn has broken the exascale barrier for mixed-precision computing in a computational science application, achieving an application-level performance of 1.1 EFLOP/s with the quantum chemistry code CP2K on the NERSC Perlmutter system. The results are reported in a newly published preprint.
Currently, we are at the dawn of the exascale era, and it is widely expected that the first supercomputer to cross the exascale threshold for double-precision floating-point calculations will be announced publicly at the ISC High Performance conference (ISC) in Hamburg at the end of May. This milestone will mark the end of a race for exascale that has intensified dramatically with increased global competition for scientific leadership and can rightly be called the “Space Race of the 21st Century”.
The merits and shortcomings of the HPL benchmark for ranking the most powerful supercomputers have been widely debated. Any HPC practitioner knows that extracting anything close to HPL performance and efficiency is very ambitious and that many scientific codes exploit only a small fraction of the theoretical peak performance due to limited parallelism, insufficient vectorization opportunities, communication overheads, load imbalance, and so on. Thus, practically exploiting the capabilities of an exascale computer for computational science requires adapting algorithms, numerical libraries, and application codes, an effort that was initiated several years ago.
The Paderborn team took up the challenge of exascale computing for the field of quantum chemistry and developed a new method, a first variant of which was presented at the SC’20 conference (Lass et al.: Submatrix Method for the Approximate Calculation of Matrix Functions, https://doi.org/10.5555/3433701.3433807). The authors advanced the method in 2021 into a highly scalable and efficient variant with effective GPU acceleration (Schade et al.: Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms, https://doi.org/10.1016/j.parco.2022.102920).
At its core, the method evaluates an approximate matrix function of a very large sparse matrix, a key operation in linear-scaling electronic structure calculations in quantum mechanics. To this end, the method divides the huge sparse matrix into many much smaller but dense submatrices, evaluates the matrix function on each of them, and assembles these intermediate results into the global solution, the density matrix. Since all evaluations of matrix functions on submatrices are independent, the method avoids communication and is extremely scalable. And since the submatrices are small and dense (on the order of a few thousand rows/columns), the linear algebra performed on them achieves near-peak GPU performance. The submatrix method preserves, that is, enforces, the sparsity pattern of the original matrix. The method therefore introduces an approximation error, whose magnitude has been shown in the publications to be acceptable for this application. Additionally, the application can be made tolerant of low-precision computation by compensating for the introduced errors with a Langevin-like equation for the atomic motion. This paves the way for mixed-precision computation using tensor cores, which yields an order of magnitude higher performance than double- or single-precision arithmetic.
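To make the idea concrete, here is a minimal, illustrative sketch of the submatrix approach in dense NumPy (for clarity only; the actual CP2K implementation works on distributed sparse matrices with GPU-accelerated kernels). For each column i, the dense principal submatrix induced by that column's sparsity pattern is extracted, the matrix function is evaluated exactly on it, and only the entries belonging to column i are copied back, which enforces the original sparsity pattern. The function name and structure are assumptions for illustration, not the authors' code.

```python
import numpy as np

def submatrix_apply(A, f):
    """Approximate f(A) for a symmetric matrix A via the submatrix method:
    for each column i, build the dense principal submatrix induced by the
    nonzero pattern of that column, evaluate f on it exactly, and copy back
    only the entries of column i. Assumes A[i, i] != 0 so that i belongs
    to its own sparsity pattern."""
    n = A.shape[0]
    R = np.zeros_like(A)
    for i in range(n):
        idx = np.nonzero(A[:, i])[0]       # sparsity pattern of column i (sorted)
        sub = A[np.ix_(idx, idx)]          # small, dense principal submatrix
        w, V = np.linalg.eigh(sub)         # evaluate f via eigendecomposition
        fsub = V @ np.diag(f(w)) @ V.T
        pos = np.searchsorted(idx, i)      # position of column i within idx
        R[idx, i] = fsub[:, pos]           # keep only column i -> pattern preserved
    return R
```

Because each column is processed independently, the loop parallelizes trivially across nodes and GPUs, which is the source of the method's scalability; for a block-diagonal matrix the result is even exact.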
Record-sized simulation on the JUWELS Booster supercomputer
To evaluate the submatrix method, it was integrated into the popular open-source quantum chemistry program CP2K (https://cp2k.org/). There it is used in the xTB method to solve the electronic structure problem, which by far dominates the computation time of ab initio molecular dynamics simulations. In 2021, the Paderborn scientists performed simulations of the HIV virus with up to 102 million atoms on what was then Europe’s fastest supercomputer (now ranked 8th in the world), the “JUWELS Booster” at Jülich Supercomputing Centre, setting a record for the largest electronic structure-based ab initio molecular dynamics simulation. They achieved a computational performance of 324 petaflop/s in mixed-precision floating-point arithmetic and an efficiency of 67.7% of the theoretically available computational power, which is exceptional for this field of application (https://doi.org/10.1016/j.parco.2022.102920).
World record performance on the Perlmutter supercomputer at NERSC
Since the record-breaking simulation in Jülich, the method has been further optimized to use the GPU hardware accelerators more efficiently, in particular by combining submatrices into GPU-friendly sizes, so that the GPUs operate even closer to their maximum performance. To put the method’s exascale capability to a practical test, the team secured early access to NERSC’s “Perlmutter” supercomputer, currently ranked number five on the TOP500 list, which has sufficient computing resources to break the exascale barrier for mixed-precision arithmetic.
In April 2022, the team was able to report success: in a state-of-the-art simulation of a Covid-19 protein, the exaflop barrier was broken for the first time in a real-world scientific computing application. Using 4,400 GPU accelerators, 1.1 exaflop/s in mixed-precision arithmetic was reached in the time-critical part of the calculation (https://doi.org/10.48550/arXiv.2205.12182).
To put this breakthrough into perspective, consider that a single simulation step for 83 million atoms takes 42 seconds, performing about 42 × 1.127×10^18 ≈ 47×10^18 floating-point operations. Disregarding memory requirements, such a computational step would have taken about 47,000 seconds, roughly 13 hours, on the first petaflop-class system, Roadrunner from 2008, and about 1.5 years on the first teraflop-class system, ASCI Red from 1997.
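The comparison above can be verified with a quick back-of-the-envelope calculation (the 1 PFLOP/s and 1 TFLOP/s rates for Roadrunner and ASCI Red are rounded nominal figures, used here only for scale):

```python
step_time_s = 42            # seconds per simulation step (from the text)
rate_flops = 1.127e18       # sustained mixed-precision FLOP/s on Perlmutter

flops_per_step = step_time_s * rate_flops   # ≈ 4.7e19 floating-point operations

roadrunner_flops = 1.0e15   # first petaflop-class system (2008), rounded
asci_red_flops = 1.0e12     # first teraflop-class system (1997), rounded

hours_on_roadrunner = flops_per_step / roadrunner_flops / 3600        # ≈ 13 hours
years_on_asci_red = flops_per_step / asci_red_flops / (86400 * 365)   # ≈ 1.5 years
print(hours_on_roadrunner, years_on_asci_red)
```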
With this success, the topic is far from exhausted for the groups involved, and the team is already working on the next steps. The gold standard for atomistic simulations in chemistry and solid-state physics is density functional theory, and the team is confident that the submatrix method can be applied to it as well.
Breaking the Exascale Barrier for the Electronic Structure Problem in Ab-Initio Molecular Dynamics Preprint – link
Authors: Robert Schade, Tobias Kenter, Hossam Elgabarty, Michael Lass, Thomas D. Kühne, Christian Plessl
Source: University of Paderborn