HPC Users Adopt BlueField DPUs at ISC 2022

By Daniel G. Barnes On May 30, 2022

Across Europe and the United States, HPC developers are supercharging supercomputers with the power of Arm cores and accelerators inside NVIDIA BlueField-2 DPUs.

At Los Alamos National Laboratory (LANL), this work is part of a large, multi-year collaboration with NVIDIA that aims for 30x speedups in multi-physics computing applications.

LANL researchers predict significant performance gains by using data processing units (DPUs) running on NVIDIA Quantum InfiniBand networks. They will pioneer techniques in computational storage, pattern matching and more using BlueField and its NVIDIA DOCA software framework.

An open API for DPUs

These efforts will also help to better define OpenSNAPI, an application programming interface that anyone can use to operate DPUs. It is a project of the Unified Communication Framework, a consortium enabling heterogeneous computing for HPC applications whose members include Arm, IBM, NVIDIA, U.S. national laboratories and U.S. universities.

LANL is already feeling the power of in-network computing, thanks to a DPU-powered storage system it created.

The Accelerated Box of Flash (ABoF) combines solid-state storage with DPU and InfiniBand accelerators to speed up the performance-critical parts of a Linux file system. It is up to 30 times faster than similar storage systems and is becoming a key part of LANL’s infrastructure.

ABoF places compute near storage to minimize data movement and improve the efficiency of simulation and data analysis pipelines, a researcher said in a recent LANL blog.

Texas rides a cloud-native super

The Texas Advanced Computing Center (TACC) is the latest to adopt BlueField-2 in Dell PowerEdge servers. It will use DPUs on an InfiniBand network to make its Lonestar6 system a development platform for cloud-native supercomputing.

TACC’s Lonestar6 serves a wide range of HPC developers at Texas A&M University, Texas Tech University, and the University of North Texas, as well as numerous research centers and faculties.

MPI gets accelerated

Eight hundred miles to the northeast, researchers at Ohio State University have shown how DPUs can run one of the most popular HPC programming models up to 26% faster.

By offloading critical parts of the Message Passing Interface (MPI), they accelerated P3DFFT, a library used in many large-scale HPC simulations.

“DPUs are like assistants managing the work of busy executives, and they will become mainstream because they can speed up any workload,” said Dhabaleswar K. (DK) Panda, a professor of computer science and engineering at Ohio State who led the DPU work using his team’s open-source MVAPICH software.

DPUs in HPC centers, clouds

Double-digit speedups are huge for supercomputers running HPC simulations such as drug discovery or aircraft design. Cloud services can use those gains to boost their customers’ productivity, said Panda, who has received requests from multiple HPC centers for his code.

Quantum InfiniBand networks with features such as NVIDIA SHARP help make his work possible.

“Others talk about in-network computing, but InfiniBand supports it today,” he said.

Durham does load balancing

Several research teams in Europe are accelerating MPI and other HPC workloads with BlueField DPUs.

For example, Durham University in northern England is developing software for load balancing MPI jobs using BlueField DPUs on a 16-node Dell PowerEdge cluster. Its work will pave the way for more efficient processing of better algorithms at HPC facilities around the world, said Tobias Weinzierl, the project’s principal investigator.

DPUs in Cambridge, Munich

Researchers in Cambridge, London and Munich are also using DPUs.

For its part, University College London is studying how to schedule tasks for a host system on BlueField-2 DPUs. It’s a capability that could be used, for example, to move data between host processors so it’s there when they need it.

BlueField DPUs inside Dell PowerEdge servers in the Cambridge Service for Data Driven Discovery offload security policies, storage frameworks and other tasks from host processors, optimizing system performance.

Meanwhile, researchers from the Computer Architecture and Parallel Systems group at the Technical University of Munich are investigating ways to offload both MPI and operating system tasks with DPUs as part of a EuroHPC project.

Back in the United States, Georgia Tech researchers are collaborating with Sandia National Laboratories to accelerate molecular dynamics work using BlueField-2 DPUs. A paper describing their work so far shows that the algorithms can be sped up by up to 20% without loss of simulation accuracy.

An expanding network

Earlier this month, researchers in Japan announced a system using the latest NVIDIA H100 Tensor Core GPUs on NVIDIA's fastest, smartest network yet, the NVIDIA Quantum-2 InfiniBand platform.

NEC will build an approximately 6 PFLOPS, H100-based supercomputer for the Center for Computational Sciences at the University of Tsukuba. Researchers will use it for climatology, astrophysics, big data, AI and more.

Meanwhile, researchers like Panda are already thinking about how they will use the cores of BlueField-3 DPUs.

“It will be like hiring executive assistants with college degrees instead of those with high school degrees, so hopefully more offloads will be done,” he joked.