NVIDIA's Next-Generation "Ampere" GPUs Could Have 18 TeraFLOPs of Compute Performance

erek

Remember when RV770 was the first TeraFlop performer?

"With Big Red 200 supercomputer being based on Cray's Shasta supercomputer building block, it is being deployed in two phases. The first phase is the deployment of 672 dual-socket nodes powered by AMD's EPYC 7742 "Rome" processors. These CPUs provide 3.15 PetaFLOPs of combined FP64 performance. With a total of 8 PetaFLOPs planned to be achieved by the Big Red 200, that leaves just a bit under 5 PetaFLOPs to be had using GPU+CPU enabled system. Considering the configuration of a node that contains one next-generation AMD "Milan" 64 core CPU, and four of NVIDIA's "Ampere" GPUs alongside it. If we take for a fact that Milan boosts FP64 performance by 25% compared to Rome, then the math shows that the 256 GPUs that will be delivered in the second phase of Big Red 200 deployment will feature up to 18 TeraFLOPs of FP64 compute performance. Even if "Milan" doubles the FP64 compute power of "Rome", there will be around 17.6 TeraFLOPs of FP64 performance for the GPU."

https://www.techpowerup.com/263489/...ould-have-18-teraflops-of-compute-performance
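
The arithmetic behind the headline number is easy to reproduce. A minimal sketch of the article's estimate (the node counts and the 25% "Milan" uplift are the article's assumptions, not confirmed specs):

```python
# Rough sketch of the article's estimate; all figures are the article's, not official specs.
TOTAL_SYSTEM_PFLOPS = 8.0        # Big Red 200 target, FP64
PHASE1_CPU_PFLOPS   = 3.15       # 672 dual-socket EPYC 7742 "Rome" nodes
GPU_COUNT           = 256        # phase-two "Ampere" GPUs (64 nodes x 4 GPUs)
MILAN_NODE_COUNT    = 64

# Per-socket Rome FP64, then the hypothetical 25% "Milan" uplift the article assumes.
rome_per_socket_tflops  = PHASE1_CPU_PFLOPS * 1000 / (672 * 2)   # ~2.34 TFLOPS
milan_per_socket_tflops = rome_per_socket_tflops * 1.25          # ~2.93 TFLOPS

remaining_pflops     = TOTAL_SYSTEM_PFLOPS - PHASE1_CPU_PFLOPS   # ~4.85 PFLOPs for phase two
milan_contrib_pflops = MILAN_NODE_COUNT * milan_per_socket_tflops / 1000

per_gpu_tflops = (remaining_pflops - milan_contrib_pflops) * 1000 / GPU_COUNT
print(f"Estimated FP64 per GPU: {per_gpu_tflops:.1f} TFLOPS")    # ~18.2
```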
 
Yeah just a smidge expensive. Might be cheaper to try and get a solar/wind/battery farm in place instead. Way less administrative red tape too.

What they don't know won't hurt them...the way it should be.
 
And need at least an equally priced system to take advantage of it.

Question: how does the RTX Titan/2080 Ti compare to the Titan V?

Roughly the same (if you meant for gaming; GeForce/RTX cards aren't built for FP64 work). With the stock cooler the Titan V suffers from reduced clocks though, so with stock cards it might look a bit slower.
 
18 TFLOPS would be 35% faster than a 2080 Ti. It’s within the realm of possibility...
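
Quick sanity check on that ratio, assuming the comparison is against the 2080 Ti's ~13.4 TFLOPS FP32 figure (its FP64 rate is a tiny fraction of that, as pointed out below):

```python
# Assumed reference-spec boost figure for the RTX 2080 Ti (FP32), not an official Ampere spec.
rtx_2080ti_fp32_tflops = 13.45
ampere_rumored_tflops  = 18.0
print(f"{ampere_rumored_tflops / rtx_2080ti_fp32_tflops - 1:.0%} faster")  # ~34%
```

It comes out to roughly 34%, so the comparison only works if you line the rumored FP64 number up against the 2080 Ti's FP32 throughput.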
 
I'm in for two.

Are personal nuclear power plants a thing, yet?

Soon; the SMRs are starting prototype testing now. There are a few designs that are very tiny. They're calling them 'nuclear batteries'.

You might want to build a cement vault of some sort for it though. Just get a bunch of Sakrete from the hardware store :p
 
I'll just dig a big hole in the back yard. Done and done.
 
Total crap.

RTX Titan FP64 = 0.51 TFLOPS
RTX 2080Ti FP64 = 0.44 TFLOPS


But comparing the GV100 versus the Titan RTX, the Titan RTX is faster on FP32/FP16 and has more powerful second-gen Tensor cores.

It's a hell of a deal for $1500 if you want to do deep learning. Luckily, most people don't need the accuracy of FP64 in mass-throughput form!
 
GV100 is a 32GB HBM2 $9000 part that attains 16.7 TFLOPS on FP32, which is slightly faster than the $2500 Titan RTX 24GB GDDR6 hitting 16.3 TFLOPS on FP32.

Even taking the 8.33 TFLOPS FP64 figure from the GV100, this new Ampere is claimed to be 2.1 times faster. That's...one hell of an impressive leap.
Let's just keep our fingers crossed that the consumer line will see some serious jumps from Turing to Ampere, similar to what we saw going from Maxwell/Maxwell2 to Pascal.
 
Blah. This rumorsville stuff is a waste of time. It's basically the same thing as announcing that "AMD is set to disrupt the 4K PC gaming market."
Regardless of whether these statements are true, it's all just hype until we have a card we can actually buy and use. It's also all just speculation until third parties can actually test this stuff.
 
I'm skeptical of the math. The article says an AMD server CPU with X amount of FLOPS will be paired with four Ampere GPUs... then they add up the CPU's FLOPS, subtract that from the final known FLOPS, and come up with an estimated GPU FLOPS as the remainder.

But in this arrangement, isn't the CPU working to feed the GPUs? Why are the CPU's FLOPS added to the total? Maybe a portion of the CPU's excess compute can be used alongside the GPUs? But what percentage?

Seems like poorly thought-out math. If the CPU is just working to feed the GPUs, then all of the FLOPS would be coming from the GPUs, wouldn't they? Which means the individual FLOPS guesstimated for the GPUs are lower than actual.
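
To put a number on that concern, here's a rough sketch of how much the per-GPU estimate moves depending on whether the Milan sockets' FLOPS are counted toward the total (same assumed figures as the article; none of this is an official spec):

```python
# Phase-two FP64 budget after subtracting the 672 Rome nodes from the 8 PFLOPs target.
remaining_pflops = 8.0 - 3.15
gpu_count        = 256
milan_pflops     = 0.19          # ~64 Milan sockets at the article's assumed 25% uplift over Rome

with_cpu    = (remaining_pflops - milan_pflops) * 1000 / gpu_count   # ~18.2 TFLOPS per GPU
without_cpu = remaining_pflops * 1000 / gpu_count                    # ~18.9 TFLOPS per GPU
print(f"Counting CPU FLOPS: {with_cpu:.1f} TFLOPS/GPU; ignoring them: {without_cpu:.1f} TFLOPS/GPU")
```

So the two interpretations only differ by well under a TFLOP per GPU either way.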
 

It is standard practice in the scientific community to count CPU FLOPS in hybrid systems. Just look at any supercomputer.
 
But in this arrangement, isn't the CPU working to feed the GPUs? Why are the CPU's FLOPS added to the total? Maybe a portion of the CPU's excess compute can be used alongside the GPUs? But what percentage?
For this claim to hold, I expect that feeding the GPUs is mostly a data-shuttling task, so the CPU's compute is mostly left free to contribute.

I do like where you're going, and agree that the number of variables unaccounted for really weakens the case.
 