Ampere vs Turing - Power consumption rise due to more work being done simultaneously?

Factum

I was looking over the Ampere slides when this caught my eye:
[Slide: NVIDIA frame-time breakdown comparing Turing and Ampere — raster, raytracing, and DLSS workloads per frame]


One frame on Turing takes 13 milliseconds: raster + raytracing + DLSS, kinda running concurrently, kinda not. Raster runs for the whole 13 milliseconds, raytracing runs for about 1/3 of the frame, and DLSS has a quick sprint near the end.
One frame on Ampere takes 6.7 milliseconds: raster + raytracing + DLSS running fully concurrently. Raster starts, raytracing starts about 1/4 into the frame, followed almost immediately by DLSS, and the last half-ish of the frame is just raster.

On Turing the CUDA cores run the whole frame, the raytracing is done concurrently while raster keeps going, and then DLSS engages and ends. A maximum of two unit types firing concurrently at any given time.

But on Ampere all three unit types run concurrently, meaning more parts of the chip are "on" at the same time.
So a frame on Ampere takes about 50% of the time in the slide, meaning it should put out double the frames compared to Turing, while keeping more of the chip "on" to do so. It should also draw more power in the process...due to more work being done concurrently.
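Roughly, in numbers (a minimal sketch; the overlap fractions are my own guesses read off the slide, not published figures):

```python
# Back-of-the-envelope numbers from the slide. The per-unit "active
# fraction" values are my own reading of the graphic, not published figures.

TURING_FRAME_MS = 13.0   # raster whole frame, RT ~1/3 of it, DLSS burst at the end
AMPERE_FRAME_MS = 6.7    # raster + RT + DLSS overlapping inside the frame

fps_turing = 1000.0 / TURING_FRAME_MS   # ~77 fps
fps_ampere = 1000.0 / AMPERE_FRAME_MS   # ~149 fps, roughly double

# Average number of unit types lit up at once = sum of active fractions.
turing_units = 1.0 + 1 / 3 + 0.05       # raster + RT + short DLSS sprint
ampere_units = 1.0 + 0.5 + 0.4          # guessed overlap from the slide

print(f"Turing: {fps_turing:.0f} fps, ~{turing_units:.2f} units active on average")
print(f"Ampere: {fps_ampere:.0f} fps, ~{ampere_units:.2f} units active on average")
```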

Then again I could just be too tired and not thinking straight :D
 
The Y axis could be power, but it could also be processing level or % utilization of different parts of the architecture (shown in different colours).
 
Would be interesting to see a 'watts per frame' metric to compare GPU efficiency. Lots of variables to control for though!
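Something like joules per frame, since watts are just joules per second (the wattage and fps figures below are made up purely for illustration):

```python
# 'Watts per frame' is really energy per frame: J/frame = W / fps.
# The numbers here are hypothetical, purely to show the metric.

def joules_per_frame(board_power_w: float, fps: float) -> float:
    """Energy consumed per frame in joules."""
    return board_power_w / fps

print(joules_per_frame(320, 120))  # ~2.67 J/frame for a 320 W card at 120 fps
print(joules_per_frame(250, 70))   # ~3.57 J/frame for a 250 W card at 70 fps
```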
 
How can you do DLSS concurrently? You need to finish the frame before you can scale it.
 
Well, it's a graphics card that uses more power. At the end of the day it's as simple as that. Does it give you more performance? Yeah, sure it does lol.
 
How can you do DLSS concurrently? You need to finish the frame before you can scale it.
Probably working on the previous frame's output while shaders are working on the next one? They used to not be able to do that with Turing; specifically, the Tensor cores ate up all of the register bandwidth. This meant that while Tensor ops could technically run concurrently with FP/INT, it wasn't practical. Apparently, Ampere resolved this.
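If so, the win is classic pipelining: the steady-state cost per frame collapses to the longest stage. A toy model with made-up stage times:

```python
# Toy pipelining model: tensor cores run DLSS on frame N's output while
# the shaders work on frame N+1. Stage times below are invented.

RASTER_MS = 5.0   # hypothetical shader/RT work per frame
DLSS_MS = 1.7     # hypothetical tensor work per frame

def serial_ms(frames: int) -> float:
    # Turing-style: DLSS can't usefully overlap, stages run back to back.
    return frames * (RASTER_MS + DLSS_MS)

def pipelined_ms(frames: int) -> float:
    # Ampere-style: after the first frame fills the pipe, each extra frame
    # costs only the longer of the two stages.
    return (RASTER_MS + DLSS_MS) + (frames - 1) * max(RASTER_MS, DLSS_MS)

for n in (1, 10, 100):
    print(f"{n:>3} frames: serial {serial_ms(n):7.1f} ms, pipelined {pipelined_ms(n):7.1f} ms")
```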
 
Probably working on the previous frame's output while shaders are working on the next one? They used to not be able to do that with Turing; specifically, the Tensor cores ate up all of the register bandwidth. This meant that while Tensor ops could technically run concurrently with FP/INT, it wasn't practical. Apparently, Ampere resolved this.

A slide without the explanation that should go with it is lacking critical info IMO. I think it's just an incorrect graphic on a marketing slide, not an actual representation.

If you are halfway into the next frame while still working on the previous frame, that means you still aren't displaying the previous frame, so the frame is delayed. A little burst of DLSS at the end of the previous frame gets the frame on screen faster than doing it halfway into the next frame.

If you imagine doing the frame in 7.5 ms and displaying it immediately (as shown), versus doing frames in 6.7 ms (also shown) but having to wait at least halfway through the next frame to display each one, then the former is the superior option.
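In numbers (using the frame times above; the "halfway through the next frame" display point is the assumption at issue):

```python
# Display latency under the two schedules. 7.5 ms and 6.7 ms are the
# figures quoted above; the display point is the assumption being tested.

finish_then_display_ms = 7.5          # DLSS burst at frame end, shown immediately
pipelined_frame_ms = 6.7              # next frame starts before this one's DLSS

# If frame N's DLSS only finishes halfway through frame N+1, frame N
# can't be displayed until then:
pipelined_latency_ms = pipelined_frame_ms * 1.5

print(f"finish-then-display: {finish_then_display_ms} ms to screen")
print(f"pipelined:           {pipelined_latency_ms:.2f} ms to screen")  # ~10 ms, worse
```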

So this graphic makes no sense. There really is no case where doing DLSS in the middle of the frame makes any sense.

Left to guess from the nonsensical slide, my guess is: they are working on greater concurrency that can save another 10%, this will be 2nd-generation concurrency, and someone drew some overzealous graphics to go with it incorrectly.
 
How can you do DLSS concurrently? You need to finish the frame before you can scale it.
You can do DLSS on a completed frame while rendering another frame; it looks like Nvidia greatly increased the data paths in Ampere.
 
You can do DLSS on a completed frame while rendering another frame; it looks like Nvidia greatly increased the data paths in Ampere.

The post right above yours points out why that doesn't make sense.
 
The post right above yours points out why that doesn't make sense.
Looks like you would need an additional frame buffer plus data for the motion vectors; should be possible and much faster. Either have a longer rendering time or have an additional frame buffer. If the overall time to render is less, then the additional frame buffer will pay for itself. Is there a white paper on Ampere yet? I have not checked.
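For scale, a sketch of what that extra buffer costs in VRAM (the pixel formats are assumptions, not what Nvidia actually uses):

```python
# Rough VRAM cost of keeping an extra frame plus motion vectors for DLSS.
# Buffer formats below are guesses for illustration only.

def buffer_mib(width: int, height: int, bytes_per_pixel: int) -> float:
    return width * height * bytes_per_pixel / 2**20

W, H = 3840, 2160                       # 4K output
color_mib = buffer_mib(W, H, 8)         # assume FP16 RGBA: 8 bytes/pixel
motion_mib = buffer_mib(W, H, 4)        # assume 2x FP16 motion vector components

print(f"extra color buffer: {color_mib:.0f} MiB")   # ~63 MiB
print(f"motion vectors:     {motion_mib:.0f} MiB")  # ~32 MiB
```

Tens of MiB against a 10+ GB card, so the memory side of the trade looks cheap if the overlap actually pays off.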
 
We need to see independent reviews where, case by case, performance + noise + power/heat are graphed.

I've been doing that "new GPU soon" mental open-loop build. I end up searching for a giant case for a single 60mm-thick 480 rad. The alternative is driving intake air into multi-rad setups (2x280, 360+280, 2x360, etc.). That's never a good scenario when your benchrace build is so cluttered you have to disassemble the loop to get at components.

All of that excruciating mental noise is just unnecessary until approximate figures are published.
 