NVIDIA Ampere GPUs to feature 2x performance than the RTX 20 series, PCIe 4.0, up to 826 mm² dies and 20 GB of VRAM

erek · Feb 28, 2020

https://wccftech.com/nvidia-next-ge...2-cores-benchmarked-40-faster-than-titan-rtx/

defaultluser · Feb 28, 2020

erek said:
https://wccftech.com/nvidia-next-ge...2-cores-benchmarked-40-faster-than-titan-rtx/

That's more along the range of performance bump I was expecting, especially for a large GPU on a first-gen process node.

Remember, the GTX 1080 Ti was fairly tiny in size compared to the Maxwell Titan or RTX Titan. They will whip out the > 700 mm² dies for whatever replaces Ampere.

Pascal just found a huge clock speed bump to go along with the shader increase, but I wouldn't expect such an amazing bump this time round.

elvn · Feb 29, 2020

Some replies from members in these threads say they are fine with 4k performance as it is but as we get into 120hz - 144hz 4k (and 120hz hdmi 2.1 4k TVs) we need a lot of gpu performance to stay in the 100 - 120fps range to fill those higher hz. Otherwise you really aren't getting much out of the higher hz at all.

( https://blurbusters.com/blur-buster...000hz-displays-with-blurfree-sample-and-hold/ )

As VR progresses we will need much more powerful gpus too since VR renders a whole scene per eye. The valve index's native resolution is 1440x1600 per eye. The best resolution VR headset on the market today is capable of rendering 4k per eye and in the next few years that kind of VR resolution could become more standard.
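To put rough numbers on that, here is a minimal Python sketch comparing raw pixel throughput. The 90 Hz refresh rate for the Index and the 3840x2160 reading of "4k per eye" are assumptions for illustration, and `pixels_per_second` is just a made-up helper name.

```python
def pixels_per_second(width, height, fps, views=1):
    """Raw pixels that must be shaded each second (ignores supersampling and overdraw)."""
    return width * height * views * fps

# Valve Index: 1440x1600 per eye, two eyes, assumed 90 Hz refresh
index_90 = pixels_per_second(1440, 1600, 90, views=2)

# Flat 4k monitor at 120 fps
uhd_120 = pixels_per_second(3840, 2160, 120)

# Hypothetical "4k per eye" headset, assumed 3840x2160 per eye at 90 Hz
vr_4k_90 = pixels_per_second(3840, 2160, 90, views=2)

for name, value in [("Index @ 90", index_90), ("4k @ 120", uhd_120), ("4k per eye @ 90", vr_4k_90)]:
    print(f"{name}: {value / 1e6:.0f} Mpixels/s")
```

Under those assumptions a 4k-per-eye headset needs roughly 3.6x the raw pixel rate of the Index, before any supersampling.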

I wish they would also concentrate on making much better interpolation technology. Some VR at sub 90fps cuts to 45 then doubles to 90 to keep filling frames to feed a 90Hz headset. They could probably make it sub 120 cutting to 60 x 2 too. Really it could be doubled or tripled by default if interpolation tech was advanced enough, so your 90 or 100fps gameplay at 4k would be up to 180 to 200 or more. The thing is "your frame rate" is an average and most of us are using variable hz to ride a roller coaster of frame rates. So if you could get 90fps in a game at 4k resolution you might be on a graph from 60 - 90 - 120. Doubled that would be 120 - 180 - 240, and tripled 180 - 270 - 360. I still would say 90, 100, or 120fps as a base of raw fps in order to see enough higher motion definition ... the # of pages in an animation flip book per se, before you start duplicating pages, so you still wouldn't be able to completely rely on interpolation in regard to your average and lows in your fps graph.
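The arithmetic for a 60 - 90 - 120 graph, plus the "cut to half rate then double" behavior, can be sketched in Python like this. Both function names and the `max_factor` limit are hypothetical; this is not how any particular headset runtime is implemented.

```python
def interpolated_range(raw_fps_points, factor):
    """Multiply each point on the raw frame rate graph by the interpolation factor."""
    return [fps * factor for fps in raw_fps_points]

def reprojection_base(raw_fps, refresh_hz, max_factor=3):
    """Pick the highest integer fraction of the refresh rate the GPU can sustain.

    Returns (base_fps, factor): render at base_fps, then multiply by factor
    with generated frames to fill the display's refresh rate.
    """
    for factor in range(1, max_factor + 1):
        base = refresh_hz / factor
        if raw_fps >= base:
            return base, factor
    return refresh_hz / max_factor, max_factor

raw = [60, 90, 120]                      # lows, average, highs
print(interpolated_range(raw, 2))        # doubled
print(interpolated_range(raw, 3))        # tripled
print(reprojection_base(70, 90))         # sub-90 on a 90Hz headset
print(reprojection_base(100, 120))       # sub-120 on a 120Hz display
```

The raw list is still what sets your motion definition; the multiplied numbers only describe how many frames reach the display.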

With 4k 120hz consoles, pc gaming demands, extreme resolutions on more advanced VR headsets, and ever increasing graphics ceilings (though those are somewhat arbitrary caps set by devs in the first place), we are probably going to need some tricks and workarounds like consoles and VR headsets have already started doing in the last gen, along with hopefully some very advanced interpolation without bad artifacting and lag.

diceman2037 · Mar 1, 2020

Armenius said:
GTX 480 -> GTX 580 (Fermi -> Fermi 2) = +11% (die shrink)

incorrect, both are 40nm

reaper12 said:
Pretty good table, the only thing I would change is that the Maxwell to Pascal was a double Die Shrink as well as a new arch. Remember 20nm was cancelled due to problems. This skip really messed up AMD as they didn't have the resources to switch up.

Kepler was also a uarch+shrink from fermi

defaultluser said:
Turing was built on a mature 14nm process node. Ampere should be even cheaper to build, once they get yields up.

Turing is built on 16nm+++ ala 14nm

Lakados said:
Yeah MS is working on bringing Ray Tracing to the DX12 spec and it is currently available in the insider preview builds for windows10 developers. NVidia has the instructions on their site how to install the tools to get it going.

I think you're 12 months behind, 1.1 is a revisional update to the already available DXR spec.

Stoly said:
Actually tensor cores can be used for raytracing.

No, actually, they can't.

The tensor cores are purely FP16 matrix multiplication.
Cards that can perform raytracing without RT cores are doing so purely on the general purpose shaders, Volta included. Tensor is good at artifact removal and does exactly that.

Stoly said:
Again, tensor cores CAN be used for raytracing. That's how it all started with DXR and Volta. Remember the SW reflections demo? It was done on volta initially.

You keep saying this, but I do not think you know what you're talking about from a technical standpoint.

From Nvidia
"Meanwhile NVIDIA also mentioned that they have the ability to leverage Volta's tensor cores in an indirect manner, accelerating ray tracing by doing AI denoising, where a neural network could be trained to reconstruct an image using fewer rays, a technology the company has shown off in the past at GTC."

DukenukemX said:
You can tell that Nvidia threw their Ray-Tracing technology together when the frame rate of games dramatically drop with the feature on.

You have no idea what you're talking about

Stoly · Mar 1, 2020

diceman2037 said:
incorrect, both are 40nm

Kepler was also a uarch+shrink from fermi

Turing is built on 16nm+++ ala 14nm

I think you're 12 months behind, 1.1 is a revisional update to the already available DXR spec.

No, actually, they can't.

The tensor cores are purely FP16 matrix multiplication.
Cards that can perform raytracing without RT cores are doing so purely on the general purpose shaders, Volta included. Tensor is good at artifact removal and does exactly that.

You keep saying this, but I do not think you know what you're talking about from a technical standpoint.

From Nvidia
"Meanwhile NVIDIA also mentioned that they have the ability to leverage Volta's tensor cores in an indirect manner, accelerating ray tracing by doing AI denoising, where a neural network could be trained to reconstruct an image using fewer rays, a technology the company has shown off in the past at GTC."

You have no idea what you're talking about

Except they do.

You can do DXR with all 3. The fact that RT cores are specifically designed for it, and hence much more efficient, doesn't mean it can't be done any other way.

IdiotInCharge · Mar 1, 2020

Stoly said:
Except they do.

You can do dxr with all 3, the fact that RT cores are specifically designed for it hence much more efficient doesn't mean it can't be done any other way.

I could do raytracing with the computer that flew on the Apollo 11...

I'd just never live to see the results

sabrewolf732 · Mar 1, 2020

Thunderdolt said:
Think a little harder here. Why did the mining craze benefit Nvidia but not AMD? Because AMD's cards are too slow. Simple as that. Nvidia is charging more for an inherently superior product. The other bit which I've mentioned a thousand times is that your baseline company, AMD, sells every single GPU for a loss. If not for their CPU business being able to subsidize their GPU losses, AMD would either be out of business entirely or would have raised their prices.

AMD had the superior mining cards, wut.

edit: oof, just realized this was an old post. nvm.

Stoly · Mar 2, 2020

IdiotInCharge said:
I could do raytracing with the computer that flew on the Apollo 11...

I'd just never live to see the results

I loved doing raytracing on the Amiga. Great way to spend the weekend rendering 1 frame

diceman2037 · Mar 2, 2020

Stoly said:
Except they do.

You are wrong and there are no sources available, past, present or future, that you could provide to show otherwise.
The Tensor cores are not exposed at all to any of the graphics APIs, being only exposed to CUDA with specific limitations to their utility in the SDK.

Volta's BVH calculations are done purely in the Integer units, which are allowed to function independently of the FP units in parallel and without penalty.
On Volta, mixed Tensor and SM usage is a performance DECREASE because Volta's tensor capabilities lack the cache management engine that Turing's includes; switching between contexts loses an entire cycle to flush the cache.

As RT is just for BVH occlusion detection, the integer units are up to the task of doing this on Volta, with the FP units then performing the rest of the work. On RTX, by contrast, the RT cores perform BVH, the Integer units are left available for other tasks and FP32 does the grunt work.
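For anyone wondering what the BVH step actually repeats, below is a minimal Python sketch of a ray vs. axis-aligned bounding box "slab" test, which is the standard textbook check run at each node of a BVH. It is an illustration only: the function name is made up, and it says nothing about which hardware units execute the test on Volta or Turing.

```python
import math

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: does a ray starting at origin intersect the box?

    Each argument is an (x, y, z) tuple. Returns True on a hit.
    """
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval where the ray is inside all slabs so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True

unit_box = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
print(ray_hits_aabb((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), *unit_box))  # aimed at the box
print(ray_hits_aabb((0.0, 0.0, -5.0), (0.0, 1.0, 0.0), *unit_box))  # aimed past it
```

A traversal runs this kind of test against many boxes per ray and only tests triangles inside the boxes that were hit, which is why dedicated hardware for it pays off.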

Scaling up the number of RT cores per SM won't have quite as much of an effect as increasing the SMs and in turn the Shader count.

Additional note: GTX Turing does BVH on the integer units too, the 16xx cards are just so much more cut down in capabilities that they don't perform much better (if at all differently) than 10 series cards with more shaders.

reaper12 · Mar 3, 2020

Stoly said:
Except they do.

I think his point is that you can't do Ray Tracing on Tensor cores. You can denoise the Ray Traced image using Tensor cores.
