NVIDIA's Next-Generation "Ampere" GPUs Could Have 18 TeraFLOPs of Compute Performance

erek

Remember when RV770 was the first TeraFlop performer?

"With Big Red 200 supercomputer being based on Cray's Shasta supercomputer building block, it is being deployed in two phases. The first phase is the deployment of 672 dual-socket nodes powered by AMD's EPYC 7742 "Rome" processors. These CPUs provide 3.15 PetaFLOPs of combined FP64 performance. With a total of 8 PetaFLOPs planned to be achieved by the Big Red 200, that leaves just a bit under 5 PetaFLOPs to be had using GPU+CPU enabled system. Considering the configuration of a node that contains one next-generation AMD "Milan" 64 core CPU, and four of NVIDIA's "Ampere" GPUs alongside it. If we take for a fact that Milan boosts FP64 performance by 25% compared to Rome, then the math shows that the 256 GPUs that will be delivered in the second phase of Big Red 200 deployment will feature up to 18 TeraFLOPs of FP64 compute performance. Even if "Milan" doubles the FP64 compute power of "Rome", there will be around 17.6 TeraFLOPs of FP64 performance for the GPU."

https://www.techpowerup.com/263489/...ould-have-18-teraflops-of-compute-performance
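
The arithmetic behind the headline number is easy to reproduce. A minimal sketch of the article's estimate (the node counts and the 25% "Milan" uplift are the article's assumptions, not confirmed specs):

```python
# Rough sketch of the article's estimate; all figures are the article's, not official specs.
TOTAL_SYSTEM_PFLOPS = 8.0        # Big Red 200 target, FP64
PHASE1_CPU_PFLOPS   = 3.15       # 672 dual-socket EPYC 7742 "Rome" nodes
GPU_COUNT           = 256        # phase-two "Ampere" GPUs (64 nodes x 4 GPUs)
MILAN_NODE_COUNT    = 64

# Per-socket Rome FP64, then the hypothetical 25% "Milan" uplift the article assumes.
rome_per_socket_tflops  = PHASE1_CPU_PFLOPS * 1000 / (672 * 2)   # ~2.34 TFLOPS
milan_per_socket_tflops = rome_per_socket_tflops * 1.25          # ~2.93 TFLOPS

remaining_pflops     = TOTAL_SYSTEM_PFLOPS - PHASE1_CPU_PFLOPS   # ~4.85 PFLOPs for phase two
milan_contrib_pflops = MILAN_NODE_COUNT * milan_per_socket_tflops / 1000

per_gpu_tflops = (remaining_pflops - milan_contrib_pflops) * 1000 / GPU_COUNT
print(f"Estimated FP64 per GPU: {per_gpu_tflops:.1f} TFLOPS")    # ~18.2
```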
 
Yeah just a smidge expensive. Might be cheaper to try and get a solar/wind/battery farm in place instead. Way less administrative red tape too.

What they don't know won't hurt them...the way it should be.
 
And need at least an equally priced system to take advantage of it.

Question: how does the RTX Titan/2080 Ti compare to the Titan V?

Roughly the same (if you meant for gaming; GeForce/RTX cards aren't built for FP64 work). With the stock cooler the Titan V suffers from reduced clocks though, so with stock cards it might look a bit slower.
 
18 TFLOPS would be 35% faster than a 2080 Ti. It’s within the realm of possibility...
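
Quick sanity check on that ratio, assuming the comparison is against the 2080 Ti's ~13.4 TFLOPS FP32 figure (its FP64 rate is a tiny fraction of that, as pointed out below):

```python
# Assumed reference-spec boost figure for the RTX 2080 Ti (FP32), not an official Ampere spec.
rtx_2080ti_fp32_tflops = 13.45
ampere_rumored_tflops  = 18.0
print(f"{ampere_rumored_tflops / rtx_2080ti_fp32_tflops - 1:.0%} faster")  # ~34%
```

It comes out to roughly 34%, so the comparison only works if you line the rumored FP64 number up against the 2080 Ti's FP32 throughput.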
 
I'm in for two.

Are personal nuclear power plants a thing, yet?

Soon; the SMRs are starting prototype testing now. There are a few designs that are very tiny. They're calling them 'nuclear batteries'.

You might want to build a cement vault of some sort for it though. Just get a bunch of Sakrete from the hardware store :p
 
I'll just dig a big hole in the back yard. Done and done.
 
Total crap.

RTX Titan FP64 = 0.51 TFLOPS
RTX 2080Ti FP64 = 0.44 TFLOPS


But comparing the GV100 versus the Titan RTX, the Titan RTX is faster on FP32/FP16 and has more powerful second-gen Tensor cores.

It's a hell of a deal for $1500 if you want to do deep learning. Luckily, most people don't need the accuracy of FP64 in mass-throughput form!
 
GV100 is a 32GB HBM2 $9000 part that attains 16.7 TFLOPS on FP32, which is slightly faster than the $2500 Titan RTX 24GB GDDR6 hitting 16.3 TFLOPS on FP32.

Even taking the 8.33 TFLOPS FP64 figure from the GV100, this new Ampere is claimed to be 2.1 times faster. That's...one hell of an impressive leap.
Let's just keep our fingers crossed that the consumer line will see some serious jumps from Turing to Ampere, similar to what we saw going from Maxwell/Maxwell2 to Pascal.
 
Blah. This rumorsville stuff is a waste of time. It's basically the same thing as announcing that "AMD is set to disrupt the 4K PC gaming market."
Regardless of whether these statements are true, it's all just hype until we have a card we can actually buy and use. It's also all just speculation until third parties can actually test this stuff.
 
I'm skeptical of the math. The article says an AMD server CPU with X amount of FLOPS will be paired with four Ampere GPUs... then they add up the CPU's FLOPS, subtract that from the final known FLOPS, and come up with an estimated GPU FLOPS as the remainder.

But in this arrangement, isn't the CPU working to feed the GPUs? Why are the CPU's FLOPS added to the total? Maybe a portion of the CPU's excess compute can be used alongside the GPUs? But what percentage?

Seems like poorly thought-out math. If the CPU is just working to feed the GPUs, then all of the FLOPS would be coming from the GPUs, wouldn't they? Which means the individual FLOPS guesstimated for the GPUs are lower than actual.
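
To put a number on that concern, here's a rough sketch of how much the per-GPU estimate moves depending on whether the Milan sockets' FLOPS are counted toward the total (same assumed figures as the article; none of this is an official spec):

```python
# Phase-two FP64 budget after subtracting the 672 Rome nodes from the 8 PFLOPs target.
remaining_pflops = 8.0 - 3.15
gpu_count        = 256
milan_pflops     = 0.19          # ~64 Milan sockets at the article's assumed 25% uplift over Rome

with_cpu    = (remaining_pflops - milan_pflops) * 1000 / gpu_count   # ~18.2 TFLOPS per GPU
without_cpu = remaining_pflops * 1000 / gpu_count                    # ~18.9 TFLOPS per GPU
print(f"Counting CPU FLOPS: {with_cpu:.1f} TFLOPS/GPU; ignoring them: {without_cpu:.1f} TFLOPS/GPU")
```

So the two interpretations only differ by well under a TFLOP per GPU either way.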
 

It is standard practice in the scientific community to count CPU FLOPS in hybrid systems. Just look at any supercomputer.
 
But in this arrangement, isn't the CPU working to feed the GPUs? Why are the CPU's FLOPS added to the total? Maybe a portion of the CPU's excess compute can be used alongside the GPUs? But what percentage?
For this claim to hold, I expect that feeding the GPUs is mostly a data-shuttling task, so the CPU's compute is mostly left free to contribute.

I do like where you're going, and agree that the number of variables unaccounted for really weakens the case.
 