Aurora Rising HPC 2 Exaflops

erek

That’s over 2,000 petaflops, if I’m not mistaken.

“That is the Aurora 2 x 6 node in the lower left, and the HPE 200 Gb/sec Slingshot switch, based on the “Rosetta” ASIC, right next to it. Intel has delivered all of the blades, but it is not clear if all of the CPUs and GPUs are on them as yet since it did not say the machine was fully manufactured.


“We are very pleased to announce we have delivered over 10,000 blades,” McVeigh said on the conference call. “We have much more work to do for full optimization, delivering on the codes, and acceptance. But this is a critical milestone that we’re very, very happy to have accomplished.”

At 2,007 petaflops (rounding up to 2.01 exaflops), the Aurora 2023 machine is 11.2X more powerful than the Aurora 2018 machine would have been, which is, by the way, two times faster than Moore’s Law at a two-year doubling would provide over that same time. The combined memory capacity across the CPUs and GPUs in Aurora 2023 is 20.4 PB, which is 2.9X that of the original machine, and the aggregate memory bandwidth at 245.4 PB/sec is 8.2X higher. At 2.18 PB/sec of injection bandwidth, the Cray network is about 15 percent lower than was expected with the Omni-Path 200 network at 2.5 PB/sec (and across a much larger machine, mind you), and the bi-section bandwidth was about 38 percent higher at 0.69 PB/sec. As for storage, Aurora has 1,024 nodes running the DAOS file system, which has 230 PB of capacity and 31 TB/sec of bandwidth. That’s 53 percent more capacity, but around 30X the bandwidth on the file system for the Aurora 2023 machine.

On the briefing, McVeigh said that Intel and Argonne were working with the technical community to create AuroraGPT, which is a generative large language model that has been pumped full of a corpus of scientific data. McVeigh also showed a bunch of benchmarks for Ponte Vecchio GPUs, but given how far the Aurora GPUs are scaled back – no doubt for thermal reasons – we are not sure if these numbers are all that valuable when it comes to Aurora. But this one running the OpenMC Monte Carlo simulation on Aurora testbed machines against other testbed machines using AMD Instinct MI250X and Nvidia A100 GPUs might be useful:”

[Image attachment: 1684891394812.png]

Source: https://www.nextplatform.com/2023/05/23/aurora-rising-a-massive-machine-for-hpc-and-ai/
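
For anyone who wants to sanity-check the scaling claims in the quote, here is a rough sketch that redoes the arithmetic from the quoted figures only. The five-year span (2018 to 2023) is my assumption taken from the machine names, and all the variable and function names are just illustrative.

```python
# Figures as quoted in the article excerpt above.
PEAK_PFLOPS_2023 = 2007.0     # peak petaflops for Aurora 2023
SPEEDUP_VS_2018 = 11.2        # quoted ratio over the Aurora 2018 design
INJECTION_PBPS_2023 = 2.18    # Slingshot injection bandwidth, PB/sec
INJECTION_PBPS_2018 = 2.5     # expected Omni-Path 200 figure, PB/sec

# Assumption (not stated in the quote): the comparison spans the five
# years between the two machine names, with a two-year doubling period.
YEARS = 2023 - 2018
DOUBLING_PERIOD_YEARS = 2.0


def moores_law_factor(years, doubling_period):
    """Growth factor if performance doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


# Petaflops to exaflops is a factor of 1,000.
exaflops = PEAK_PFLOPS_2023 / 1000.0

# Peak of the 2018 design implied by the quoted 11.2X ratio.
implied_2018_pflops = PEAK_PFLOPS_2023 / SPEEDUP_VS_2018

# How the quoted speedup compares with a Moore's Law pace.
moore = moores_law_factor(YEARS, DOUBLING_PERIOD_YEARS)
vs_moore = SPEEDUP_VS_2018 / moore

# Percentage drop in injection bandwidth relative to the 2018 plan.
injection_drop_pct = (1 - INJECTION_PBPS_2023 / INJECTION_PBPS_2018) * 100

print(f"Peak: {exaflops:.2f} exaflops")
print(f"Implied Aurora 2018 peak: {implied_2018_pflops:.0f} petaflops")
print(f"Moore's Law factor over {YEARS} years: {moore:.2f}X")
print(f"Quoted speedup vs Moore's Law: {vs_moore:.2f}X")
print(f"Injection bandwidth drop: {injection_drop_pct:.1f} percent")
```

Under those assumptions a two-year doubling gives a bit under 6X over five years, so 11.2X is roughly twice that, which lines up with the "two times faster than Moore’s Law" remark. The injection bandwidth figures as quoted work out closer to 13 percent lower than the article's "about 15 percent", so the quoted numbers are presumably rounded.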
 