Ampere vs Turing - Power consumption rise due to more work being done simultaneously?

Factum

I was looking over the Ampere slides when this caught my eye:
[Slide: NVIDIA frame-time breakdown comparing Turing and Ampere — raster, raytracing, and DLSS workloads per frame]


One frame on Turing takes 13 milliseconds: raster + raytracing + DLSS, kinda running concurrently, kinda not. Raster runs for the whole 13 milliseconds, raytracing runs for about 1/3 of the frame, and DLSS has a quick sprint near the end.
One frame on Ampere takes 6.7 milliseconds: raster + raytracing + DLSS running fully concurrently. Raster starts, raytracing starts about 1/4 into the frame, followed almost immediately by DLSS, and the last half-ish of the frame is just raster.

On Turing the CUDA cores run the whole frame, the raytracing is done concurrently while raster keeps going, and then DLSS engages and ends. A maximum of two unit types firing concurrently at any given time.

But on Ampere all three unit types run concurrently, meaning more parts of the chip are "on" at the same time.
So a frame on Ampere takes about 50% of the time in the slide, meaning it should put out double the frames compared to Turing, while keeping more of the chip "on" to do so. It should also draw more power in the process...due to more work being done concurrently.
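Roughly, in numbers (a minimal sketch; the overlap fractions are my own guesses read off the slide, not published figures):

```python
# Back-of-the-envelope numbers from the slide. The per-unit "active
# fraction" values are my own reading of the graphic, not published figures.

TURING_FRAME_MS = 13.0   # raster whole frame, RT ~1/3 of it, DLSS burst at the end
AMPERE_FRAME_MS = 6.7    # raster + RT + DLSS overlapping inside the frame

fps_turing = 1000.0 / TURING_FRAME_MS   # ~77 fps
fps_ampere = 1000.0 / AMPERE_FRAME_MS   # ~149 fps, roughly double

# Average number of unit types lit up at once = sum of active fractions.
turing_units = 1.0 + 1 / 3 + 0.05       # raster + RT + short DLSS sprint
ampere_units = 1.0 + 0.5 + 0.4          # guessed overlap from the slide

print(f"Turing: {fps_turing:.0f} fps, ~{turing_units:.2f} units active on average")
print(f"Ampere: {fps_ampere:.0f} fps, ~{ampere_units:.2f} units active on average")
```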

Then again I could just be too tired and not thinking straight :D
 
The Y axis could be power, but it could also be processing level or % utilization of different parts of the architecture (shown in different colours).
 
Would be interesting to see a 'watts per frame' metric to compare GPU efficiency. Lots of variables to control for though!
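Something like joules per frame, since watts are just joules per second (the wattage and fps figures below are made up purely for illustration):

```python
# 'Watts per frame' is really energy per frame: J/frame = W / fps.
# The numbers here are hypothetical, purely to show the metric.

def joules_per_frame(board_power_w: float, fps: float) -> float:
    """Energy consumed per frame in joules."""
    return board_power_w / fps

print(joules_per_frame(320, 120))  # ~2.67 J/frame for a 320 W card at 120 fps
print(joules_per_frame(250, 70))   # ~3.57 J/frame for a 250 W card at 70 fps
```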
 
How can you do DLSS concurrently? You need to finish the frame before you can scale it.
 
Well, it's a graphics card that uses more power. At the end of the day it's as simple as that. Does it give you more performance? Yeah, sure it does lol.
 
How can you do DLSS concurrently? You need to finish the frame before you can scale it.
Probably working on the previous frame's output while shaders are working on the next one? They used to not be able to do that with Turing; specifically, the Tensor cores ate up all of the register bandwidth. This meant that while Tensor ops could technically run concurrently with FP/INT, it wasn't practical. Apparently, Ampere resolved this.
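If so, the win is classic pipelining: the steady-state cost per frame collapses to the longest stage. A toy model with made-up stage times:

```python
# Toy pipelining model: tensor cores run DLSS on frame N's output while
# the shaders work on frame N+1. Stage times below are invented.

RASTER_MS = 5.0   # hypothetical shader/RT work per frame
DLSS_MS = 1.7     # hypothetical tensor work per frame

def serial_ms(frames: int) -> float:
    # Turing-style: DLSS can't usefully overlap, stages run back to back.
    return frames * (RASTER_MS + DLSS_MS)

def pipelined_ms(frames: int) -> float:
    # Ampere-style: after the first frame fills the pipe, each extra frame
    # costs only the longer of the two stages.
    return (RASTER_MS + DLSS_MS) + (frames - 1) * max(RASTER_MS, DLSS_MS)

for n in (1, 10, 100):
    print(f"{n:>3} frames: serial {serial_ms(n):7.1f} ms, pipelined {pipelined_ms(n):7.1f} ms")
```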
 
Probably working on the previous frame's output while shaders are working on the next one? They used to not be able to do that with Turing; specifically, the Tensor cores ate up all of the register bandwidth. This meant that while Tensor ops could technically run concurrently with FP/INT, it wasn't practical. Apparently, Ampere resolved this.

A slide without the explanation that should go with it is lacking critical info IMO. I think it's just an incorrect graphic on a marketing slide, not an actual representation.

If you are halfway into the next frame while still working on the previous frame, that means you still aren't displaying the previous frame, so the frame is delayed. A little burst of DLSS at the end of the previous frame gets the frame on screen faster than doing it halfway into the next frame.

If you imagine doing the frame in 7.5 ms and displaying it immediately (as shown), versus doing frames in 6.7 ms (also shown) but having to wait at least halfway through the next frame to display each one, then the former is the superior option.
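In numbers (using the frame times above; the "halfway through the next frame" display point is the assumption at issue):

```python
# Display latency under the two schedules. 7.5 ms and 6.7 ms are the
# figures quoted above; the display point is the assumption being tested.

finish_then_display_ms = 7.5          # DLSS burst at frame end, shown immediately
pipelined_frame_ms = 6.7              # next frame starts before this one's DLSS

# If frame N's DLSS only finishes halfway through frame N+1, frame N
# can't be displayed until then:
pipelined_latency_ms = pipelined_frame_ms * 1.5

print(f"finish-then-display: {finish_then_display_ms} ms to screen")
print(f"pipelined:           {pipelined_latency_ms:.2f} ms to screen")  # ~10 ms, worse
```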

So this graphic makes no sense. There really is no case where doing DLSS in the middle of the frame makes any sense.

Left to guess from the nonsensical slide, my guess is: they are working on greater concurrency that can save another 10%, this will be 2nd-generation concurrency, and someone drew some overzealous graphics to go with it incorrectly.
 
How can you do DLSS concurrently? You need to finish the frame before you can scale it.
You can do DLSS on a completed frame while rendering another frame; it looks like Nvidia greatly increased the data paths in Ampere.
 
You can do DLSS on a completed frame while rendering another frame; it looks like Nvidia greatly increased the data paths in Ampere.

The post right above yours points out why that doesn't make sense.
 
The post right above yours points out why that doesn't make sense.
Looks like you would need an additional frame buffer plus data for the motion vectors; should be possible and much faster. Either have a longer rendering time or have an additional frame buffer. If the overall time to render is less, then the additional frame buffer will pay for itself. Is there a white paper on Ampere yet? I have not checked.
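For scale, a sketch of what that extra buffer costs in VRAM (the pixel formats are assumptions, not what Nvidia actually uses):

```python
# Rough VRAM cost of keeping an extra frame plus motion vectors for DLSS.
# Buffer formats below are guesses for illustration only.

def buffer_mib(width: int, height: int, bytes_per_pixel: int) -> float:
    return width * height * bytes_per_pixel / 2**20

W, H = 3840, 2160                       # 4K output
color_mib = buffer_mib(W, H, 8)         # assume FP16 RGBA: 8 bytes/pixel
motion_mib = buffer_mib(W, H, 4)        # assume 2x FP16 motion vector components

print(f"extra color buffer: {color_mib:.0f} MiB")   # ~63 MiB
print(f"motion vectors:     {motion_mib:.0f} MiB")  # ~32 MiB
```

Tens of MiB against a 10+ GB card, so the memory side of the trade looks cheap if the overlap actually pays off.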
 
We need to see independent reviews where, case by case, performance + noise + power/heat are graphed.

I've been doing that "new GPU soon" mental open-loop build. I end up searching for a giant case for a single 60mm-thick 480 rad. The alternative is driving intake air into multi-rad setups (2x280, 360+280, 2x360, etc.). That's never a good scenario when your benchrace build is so cluttered you have to disassemble the loop to get at components.

All of that excruciating mental noise is just unnecessary until approximate figures are published.
 