Nvidia's RT performance gains not as expected?

I guess I was expecting something like this: the RTX 3090 turns on RT and takes a -50% fps penalty for doing so; the RTX 4090, being "way way faster at RT" than the 3090, turns on the same RT effects and "only" takes a -25% fps penalty. But clearly this isn't how things work, as it seems like no matter how much faster Lovelace is than Ampere at RT, they both suffer similar performance penalties for turning RT on. Or maybe games just aren't optimized for Lovelace RT yet.
 
I guess I was expecting something like this: the RTX 3090 turns on RT and takes a -50% fps penalty for doing so; the RTX 4090, being "way way faster at RT" than the 3090, turns on the same RT effects and "only" takes a -25% fps penalty. But clearly this isn't how things work, as it seems like no matter how much faster Lovelace is than Ampere at RT, they both suffer similar performance penalties for turning RT on. Or maybe games just aren't optimized for Lovelace RT yet.
When games start using SER there should be a greater uplift in RT performance. But yeah, the penalty is about the same as previous generations in older games.
 
One obvious thing here: even if the 4090 is way, way faster at RT, almost twice as fast, we would still see the same relative drop in games when RT is enabled, because it is also that much faster in raster.
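A quick back-of-envelope sketch of that point (the frame times below are made up purely for illustration, not measured numbers): if raster and RT throughput scale by the same factor, the percentage hit from enabling RT stays the same; the penalty only shrinks if RT scales more than raster.

```python
# Toy model: a frame costs raster_ms of raster work plus rt_ms of RT work
# when RT is enabled. All numbers are invented for illustration only.

def fps(raster_ms, rt_ms=0.0):
    return 1000.0 / (raster_ms + rt_ms)

def rt_penalty(raster_ms, rt_ms):
    """Percentage of fps lost by turning RT on."""
    return 100.0 * (1.0 - fps(raster_ms, rt_ms) / fps(raster_ms))

# Hypothetical "3090": 10 ms of raster work + 10 ms of RT work per frame.
print(rt_penalty(10.0, 10.0))          # 50.0  -> -50% fps with RT on

# Hypothetical "4090" that is 2x faster at BOTH raster and RT:
print(rt_penalty(10.0 / 2, 10.0 / 2))  # 50.0  -> same relative drop

# The penalty only shrinks if RT speeds up MORE than raster,
# e.g. 2x raster but 4x RT:
print(rt_penalty(10.0 / 2, 10.0 / 4))  # ~33.3 -> smaller relative drop
```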

The 4080 may be a better test case if one wants to compare relative penalties, because it is not that much faster in regular raster while it seems able to beat a 3090 Ti by almost 40% in pure RT:

[chart: pure RT benchmark, RTX 4080 vs other GPUs]


+38-40% over a 3090 Ti; it rarely (never?) does that in a non-pure-RT game:
the RTX 4080 still easily outclasses any other graphics card currently available. It's 38% faster than the 3090 Ti, 57% faster than the 3080 Ti, and 79% faster than the 3080 it replaces in name if not price.
 
I think the thread title needs to be changed to "RT performance gains not as expected IN GAMES". Yes, those synthetic benchmarks show the massive uplift in RT workloads that Nvidia is claiming. Actual games, however, are a completely different story. I don't really care what the synthetics are showing if it doesn't translate into actual games, sorta like how the 13900K completely stomps the 5800X3D in Cinebench single-threaded, then ends up losing to it in certain games, or wins by a far smaller margin than the difference in Cinebench scores would suggest. Although I guess fully path-traced games will show the same huge performance uplifts as the synthetics do, but how many fully path-traced games are there?
 
https://www.techspot.com/review/2569-nvidia-geforce-rtx-4080/

You can scroll down to the RT section. Plenty of real-world results there along with the RT / raster only on/off differentials. Yes, the raytracing gains are generally higher. Also note, they are comparing to a 3090ti in those results, when in reality they should be comparing to a 3080ti given the $1200 price point of the 4080. I'm sure that widens the RT gains margin if you were to compare it to a 3080ti.
 
I guess I was expecting something like this: the RTX 3090 turns on RT and takes a -50% fps penalty for doing so; the RTX 4090, being "way way faster at RT" than the 3090, turns on the same RT effects and "only" takes a -25% fps penalty. But clearly this isn't how things work, as it seems like no matter how much faster Lovelace is than Ampere at RT, they both suffer similar performance penalties for turning RT on. Or maybe games just aren't optimized for Lovelace RT yet.
That was spot on with the point of the video I think.

Also, for all the RTX users who want to play Metro Exodus or Cyberpunk at 4K, or even at 1440p, with RT turned on: is their gaming experience really better? What degradation in settings and resolution will most have to accept for playable framerates? Is the IQ in the end worse than before? I think for most it would be in those games.

Of course some games limit the use of RT to keep performance up, but then what is the point of having RT when it has to be so limited?

So now we have to render at lower resolutions and upsample, and now use frame generation on top of that, which at this point has some crappy results. Is that really making games better? Nvidia went down this rabbit hole, with huge price increases for features that for many are not usable in games.

As a side note, in professional apps it is very useful.
 
That was spot on with the point of the video I think.

Also, for all the RTX users who want to play Metro Exodus or Cyberpunk at 4K, or even at 1440p, with RT turned on: is their gaming experience really better? What degradation in settings and resolution will most have to accept for playable framerates? Is the IQ in the end worse than before? I think for most it would be in those games.

Of course some games limit the use of RT to keep performance up, but then what is the point of having RT when it has to be so limited?

So now we have to render at lower resolutions and upsample, and now use frame generation on top of that, which at this point has some crappy results. Is that really making games better? Nvidia went down this rabbit hole, with huge price increases for features that for some are not usable in games.

As a side note, in professional apps it is very useful.
DLSS Quality for a game like Cyberpunk or DL2 is perfectly fine, and you get the bonus of free AA on top of it. I don't like it for certain games where I need better clarity/accuracy (Warthunder, COD, etc) but for singleplayer games it's a perfect way of gaining some performance back.

And yes, raytracing makes a big difference in the games that actually use it a lot. DL2 in particular looks substantially better, as you can tell they didn't invest a lot of effort in the traditional shader-baked lighting. Cyberpunk is interesting because some areas look substantially better with raytracing, but others you can tell were built completely with shader/raster in mind and really don't benefit much from the raytracing. However, on the whole, 2077 looks great with raytracing enabled.

Once UE5 becomes ubiquitous in most games being released over the next few years, you'd be a fool to not want raytracing. It won't require special effort from developers to implement anymore, and I'm willing to bet most UE5 games that you run without raytracing will look like shit, because developers aren't going to spend any time working out lightmaps / shader-based lighting hacks; the quality of those has always been down to the quality of the dev team building out the environments. Basically, raytracing is quickly going to become a requirement because it removes a time/money sink from the development of a game.
 
DLSS Quality for a game like Cyberpunk or DL2 is perfectly fine, and you get the bonus of free AA on top of it. I don't like it for certain games where I need better clarity/accuracy (Warthunder, COD, etc) but for singleplayer games it's a perfect way of gaining some performance back.

And yes, raytracing makes a big difference in the games that actually use it a lot. DL2 in particular looks substantially better, as you can tell they didn't invest a lot of effort in the traditional shader-baked lighting. Cyberpunk is interesting because some areas look substantially better with raytracing, but others you can tell were built completely with shader/raster in mind and really don't benefit much from the raytracing. However, on the whole, 2077 looks great with raytracing enabled.

Once UE5 becomes ubiquitous in most games being released over the next few years, you'd be a fool to not want raytracing. It won't require special effort from developers to implement anymore, and I'm willing to bet most UE5 games that you run without raytracing will look like shit, because developers aren't going to spend any time working out lightmaps / shader-based lighting hacks; the quality of those has always been down to the quality of the dev team building out the environments. Basically, raytracing is quickly going to become a requirement because it removes a time/money sink from the development of a game.
Lol, if all your lighting was from RT in a modern complex game, the 4090 would be a slide show. HENCE, RT has made development even harder. You still have to bake, set up multiple raster lights, etc., since most users, even 4090 owners, do not have the hardware to run RT in real time outside of very simple games.
 
Lol, if all your lighting was from RT in a modern complex game, the 4090 would be a slide show.
Isn't that what the Metro Exodus RT edition does? Some compositing for SFX, but pretty much all lighting via RT?



But yes, I would imagine that as long as it is not like that "RT only" Metro Exodus edition, that would be quite true.
 
Isn't that what the Metro Exodus RT edition does? Some compositing for SFX, but pretty much all lighting via RT?



But yes, I would imagine that as long as it is not like that "RT only" Metro Exodus edition, that would be quite true.

No, doing all lighting that way would mean V-Ray-like speeds. It's still a hybrid approach.

RTX Quake 2, RT lighting, 4090, 4k native, 76fps.
https://www.dsogaming.com/pc-perfor...x-4090-benchmarks-30-most-demanding-pc-games/
Add 100x-1000x more objects, shaders, textures . . . a modern game, in other words, and that 4090 will be a slide show even with DLSS and extra non-rendered frames thrown in. Of course you would not need lightmaps, normal game lighting, cubemaps, shadow maps, a bunch of DX12 lights, ambient occlusion maps, etc. Much easier to develop, as promised by Jensen. Except for one problem -> no one would play it.
 
No, doing all lighting that way would mean V-Ray-like speeds. It's still a hybrid approach.
It is hybrid rendering and uses raster power to handle the first bunch of rays from the camera, but V-Ray goes for a level of detail way higher than any real-time game aims for: Hollywood can use 3,000 samples per pixel for a giant-budget render job, while a video game will use, say, 1 to 4. The main point was no more pre-baking; only real-time light calculation seems to be going on, in the sense of making the game easier to make (as shown in the video at the timestamp):
With the introduction of raytraced reflections, we no longer need to rely on static, Image-Based Lights (IBLs), also known as reflective textures or cube-maps, to provide additional reflection data. We are automatically generating an analogue of them for each pixel for the diffuse data now, but they were also used to pre-render images containing high-detail reflection data (albeit reflections that rarely managed to coincide with the actual location of the reflective surface). By removing these, we were finally able to eliminate one other encumbrance present in more traditional forms of rendering: no longer does any part of our lighting system consist of “baked”, pre-generated data. It can now all be generated in real time.



Before denoising and compositing, Metro looks like this:
https://www.4a-games.com.mt/4a-dna/in-depth-technical-dive-into-metro-exodus-pc-enhanced-edition
[image: raw ray-traced frame before denoising and compositing]


With denoisers becoming so much better and faster than in the past, you really do not need V-Ray-level rendering; you can put Blender in Eevee mode to see how fast denoised path-traced RT (with cheats, obviously, versus a full-blown pure RT renderer, but with no pre-baked light mapping, fully dynamic) can be done.
 
From my admittedly limited understanding, not really. With extreme RT you throw, say, 4 samples per monitor pixel, and whether the scene has 14,000-triangle humans or 30,000-triangle humans does not change much for it: cost grows more like the log of scene complexity versus the linear scaling of the traditional approach, to the point that with enough complexity RT could actually be faster. You do not need to make sure every triangle gets hit by rays; you naturally get what you need. You still throw just 4 samples per camera pixel, and they still all bounce, say, a maximum of 2 times in the scene regardless of the triangle count, at least as I read it. The nature of the triangles matters more (mirror-like versus complex light bounces versus semi-transparent); maybe that makes the intersection work a bit more complex, but I am not sure you need more rays: the rays you have will just hit a smaller, better representation of the world's triangles.
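A tiny sketch of that scaling argument (a deliberately simplified cost model, not any real engine's numbers): the ray budget is fixed by resolution, samples per pixel and bounce count, while triangle count mostly shows up as roughly log2(N) BVH traversal steps per ray.

```python
import math

# Simplified cost model for the argument above (illustrative only):
# a path tracer casts rays based on resolution, samples per pixel and
# bounce depth -- not on how many triangles the scene contains.

def rays_per_frame(width, height, spp, max_bounces):
    # one primary ray per sample, plus one ray per bounce
    return width * height * spp * (1 + max_bounces)

def bvh_steps_per_ray(triangles):
    # idealized BVH: per-ray traversal cost grows roughly with tree depth
    return math.log2(triangles)

rays = rays_per_frame(3840, 2160, spp=4, max_bounces=2)
for tris in (14_000, 30_000, 3_000_000):
    print(f"{tris:>9} triangles: {rays / 1e6:.0f}M rays/frame, "
          f"~{bvh_steps_per_ray(tris):.1f} BVH steps per ray")

# Ray count stays at ~100M per frame in every case; only the per-ray
# traversal cost creeps up logarithmically with scene complexity,
# unlike raster work, which grows more directly with geometry.
```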
What!?!
 
It is hybrid rendering and uses raster power to handle the first bunch of rays from the camera, but V-Ray goes for a level of detail way higher than any real-time game aims for: Hollywood can use 3,000 samples per pixel for a giant-budget render job, while a video game will use, say, 1 to 4. The main point was no more pre-baking; only real-time light calculation seems to be going on, in the sense of making the game easier to make (as shown in the video at the timestamp):
With the introduction of raytraced reflections, we no longer need to rely on static, Image-Based Lights (IBLs), also known as reflective textures or cube-maps, to provide additional reflection data. We are automatically generating an analogue of them for each pixel for the diffuse data now, but they were also used to pre-render images containing high-detail reflection data (albeit reflections that rarely managed to coincide with the actual location of the reflective surface). By removing these, we were finally able to eliminate one other encumbrance present in more traditional forms of rendering: no longer does any part of our lighting system consist of “baked”, pre-generated data. It can now all be generated in real time.



Before denoising and compositing, Metro looks like this:
https://www.4a-games.com.mt/4a-dna/in-depth-technical-dive-into-metro-exodus-pc-enhanced-edition

With denoisers becoming so much better and faster than in the past, you really do not need V-Ray-level rendering; you can put Blender in Eevee mode to see how fast denoised path-traced RT (with cheats, obviously, versus a full-blown pure RT renderer, but with no pre-baked light mapping, fully dynamic) can be done.

Good read, it is a very smart hybrid approach. Taken from that link:

Yes, we are still using rasterization to determine the primary rays cast outward from the camera. If a technique is shown to work well within certain limits, then we will continue to use it within those limits. Incidentally, systems which use a combination of raytraced and rasterized rays like this are known as hybrid renderers. The rasterized positions and normals would have originally been fed into the lighting equation to generate a result for the direct illumination of that surface. We also still do that part.

No light maps, you are correct, rasterize direct lighting and diffuse lighting, ray trace indirect lighting.

They use screen-space and raytraced reflections, so no time is wasted setting up cube maps with static textures.

So having a game that outright requires RT hardware can indeed make things easier.
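A minimal structural sketch of that hybrid split (hypothetical stage stubs, not 4A's actual pipeline): raster handles primary visibility and direct lighting, rays handle the indirect part, and a denoiser cleans up the low sample count.

```python
import numpy as np

W, H = 320, 180  # tiny "framebuffer" purely for illustration

def rasterize_gbuffer():
    # raster pass: per-pixel albedo / normal / depth (random stand-ins here)
    rng = np.random.default_rng(0)
    return {"albedo": rng.random((H, W, 3)),
            "normal": rng.random((H, W, 3)),
            "depth":  rng.random((H, W))}

def direct_lighting(gbuf):
    # analytic lights evaluated from the G-buffer, as in a normal raster engine
    return gbuf["albedo"] * 0.6

def raytraced_indirect(gbuf, spp=1):
    # stand-in for the RT pass: bounce lighting / reflections, noisy at low spp
    noise = np.random.default_rng(1).random((H, W, 3))
    return gbuf["albedo"] * 0.3 * noise

def denoise(noisy):
    # real engines use SVGF / ML denoisers; a crude blur stands in for the idea
    return (noisy + np.roll(noisy, 1, axis=0) + np.roll(noisy, 1, axis=1)) / 3.0

def render_frame():
    gbuf = rasterize_gbuffer()                   # raster: primary visibility
    frame = direct_lighting(gbuf)                # raster-style direct light
    frame += denoise(raytraced_indirect(gbuf))   # RT only for indirect light
    return np.clip(frame, 0.0, 1.0)

print(render_frame().shape)  # (180, 320, 3)
```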
 
Good read, it is a very smart hybrid approach. Taken from that link:

Yes, we are still using rasterization to determine the primary rays cast outward from the camera. If a technique is shown to work well within certain limits, then we will continue to use it within those limits. Incidentally, systems which use a combination of raytraced and rasterized rays like this are known as hybrid renderers. The rasterized positions and normals would have originally been fed into the lighting equation to generate a result for the direct illumination of that surface. We also still do that part.

No light maps, you are correct, rasterize direct lighting and diffuse lighting, ray trace indirect lighting.

They use screen-space and raytraced reflections, so no time is wasted setting up cube maps with static textures.

So having a game that outright requires RT hardware can indeed make things easier.

[image: Portal with RTX system requirements]


If it really takes an RTX 3080 with DLSS just to get 1080p 60fps in Portal RTX (And it's a game that probably runs at over 1000fps without RT), then I can't imagine what kind of hardware it's going to take to be able to run a game like Metro Exodus fully path traced at 4K 120fps. Don't forget that any game coming out still has to be able to run on consoles as well so it's not like devs can go completely ham with RT, otherwise the PS5 and XSX have no chance of running it.
 

If it really takes an RTX 3080 with DLSS just to get 1080p 60fps in Portal RTX (And it's a game that probably runs at over 1000fps without RT), then I can't imagine what kind of hardware it's going to take to be able to run a game like Metro Exodus fully path traced at 4K 120fps. Don't forget that any game coming out still has to be able to run on consoles as well so it's not like devs can go completely ham with RT, otherwise the PS5 and XSX have no chance of running it.
Portal RTX is fully redone as a 2022 title with this version... a full remaster including textures, models, and raytracing. Not really fair to use it as a direct comparison to the original.
 
Portal RTX is fully redone as a 2022 title with this version... a full remaster including textures, models, and raytracing. Not really fair to use it as a direct comparison to the original.

Well I guess we could use Quake RTX, but I believe that's more of a mod so it's probably not as optimized as it could be. The case is pretty much the same though, where a game from the 90s can bring these RTX GPUs to their knees, so just imagine how much more power it's going to take to run fully path-traced newer games.
 
Well I guess we could use Quake RTX, but I believe that's more of a mod so it's probably not as optimized as it could be. The case is pretty much the same though, where a game from the 90s can bring these RTX GPUs to their knees, so just imagine how much more power it's going to take to run fully path-traced newer games.
Portal with RTX is a patch that upgrades the assets and adds ray tracing. Quake II RTX is a modification of the engine code to use Vulkan RT and full path tracing. The former is a mod while the latter is not.
 
Portal with RTX is a patch that upgrades the assets and adds ray tracing. Quake II RTX is a modification of the engine code to use Vulkan RT and full path tracing. The former is a mod while the latter is not.

Ah ok. Well, look, the point I'm trying to make is that in the end it's going to take WAY more power, well beyond what a 4090 can do, before we can have the kind of RT effects found in Portal/Quake in games like Metro Exodus. Pretty sure that game, even though it requires an RT-capable GPU and whatnot, still isn't pushing RT anywhere near the levels that Portal and Quake do.
 
https://www.techspot.com/review/2569-nvidia-geforce-rtx-4080/

You can scroll down to the RT section. Plenty of real-world results there along with the RT / raster only on/off differentials. Yes, the raytracing gains are generally higher. Also note, they are comparing to a 3090ti in those results, when in reality they should be comparing to a 3080ti given the $1200 price point of the 4080. I'm sure that widens the RT gains margin if you were to compare it to a 3080ti.
I would disagree. Price points and their associated naming schemes aren't relevant considering what has gone down over the last few.
 
So what is Nvidia going to do next round to improve RT, a 600mm2+ 3nm die? MCP? 600W+? At what cost, and RT performance would still not be there for the majority of cards Nvidia produces. The video laid it out: three generations of RT performance progression. RT is not making a radical improvement over traditional raster improvements, the performance cost when used is very high, and at times it does not pay off in much better visuals over other methods. The 4080 is already priced at $1200 -> what will the 5080 be priced at? $1495? Just to support this RT path, Nvidia may be over-emphasizing it and taking up valuable die space.

What I've found interesting is that AMD found a way to do effective upscaling without the need for extra hardware on die, which can be used on all relatively modern cards -> that to me is progression and pushing the envelope. Now, AMD did mention FSR 2.3, which is to address the ghosting issue or reduce it over current versions; will it get closer to DLSS 2.x quality, or better? I have no idea what Fluid Motion from AMD will be; I am hoping it is not just frame generation but a smarter Radeon Boost using upscaling, meaning in motion/action your lag will actually decrease, with the lower-resolution rendering not so noticeable when in motion. Not sold on frame generation or whether that is the right course to take; have to see how it improves.
 
So what is Nvidia going to do next round to improve RT, a 600mm2+ 3nm die? MCP? 600W+? At what cost, and RT performance would still not be there for the majority of cards Nvidia produces. The video laid it out: three generations of RT performance progression. RT is not making a radical improvement over traditional raster improvements, the performance cost when used is very high, and at times it does not pay off in much better visuals over other methods. The 4080 is already priced at $1200 -> what will the 5080 be priced at? $1495? Just to support this RT path, Nvidia may be over-emphasizing it and taking up valuable die space.

What I've found interesting is that AMD found a way to do effective upscaling without the need for extra hardware on die, which can be used on all relatively modern cards -> that to me is progression and pushing the envelope. Now, AMD did mention FSR 2.3, which is to address the ghosting issue or reduce it over current versions; will it get closer to DLSS 2.x quality, or better? I have no idea what Fluid Motion from AMD will be; I am hoping it is not just frame generation but a smarter Radeon Boost using upscaling, meaning in motion/action your lag will actually decrease, with the lower-resolution rendering not so noticeable when in motion. Not sold on frame generation or whether that is the right course to take; have to see how it improves.
Rumor from known leaker kopite7kimi is that Blackwell will still be monolithic. I wonder if NVIDIA discovered that the increased latency inherent in an MCP design is too much for real-time rendering in gaming?

https://www.hardwaretimes.com/nvidi...s-blackwell-to-leverage-monolithic-die-rumor/

Let's not forget that the 4090 is twice as fast as the 3090 while using less power despite the higher TDP specification. I think the doom and gloom thinking of 600+ Watt video cards is unfounded.
 
Rumor from known leaker kopite7kimi is that Blackwell will still be monolithic. I wonder if NVIDIA discovered that the increased latency inherent in an MCP design is too much for real-time rendering in gaming?

https://www.hardwaretimes.com/nvidi...s-blackwell-to-leverage-monolithic-die-rumor/

Let's not forget that the 4090 is twice as fast as the 3090 while using less power despite the higher TDP specification. I think the doom and gloom thinking of 600+ Watt video cards is unfounded.
Twice as fast? I suppose in a few things, when not totally CPU-limited, which also means CPUs will need to be fast enough for next-generation GPUs. As for power, the trend has been going up; whether the 4090's power is the ceiling remains to be seen. The 4080's power is actually reasonable, so yes, probably not doom and gloom if one is not a tree hugger.

How much more AMD can split apart from the GPU die in their MCP design remains to be seen. As for latency, optical interconnects are probably the only way to get multi-chip high speed, high bandwidth, and low latency. I would expect AMD to be heavily investigating this, not just for GPUs but for supercomputers in general, where you have hundreds or thousands of processors of different kinds/purposes being put together.

Optical Interconnects Finally Seeing the Light in Silicon Photonics: Past the Hype:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8840221/

AMD secret project so to speak:
https://www.tomshardware.com/news/amd-photonics-patent-reveals-a-hybrid-future
 
Rumor from known leaker kopite7kimi is that Blackwell will still be monolithic. I wonder if NVIDIA discovered that the increased latency inherent in an MCP design is too much for real-time rendering in gaming?

https://www.hardwaretimes.com/nvidi...s-blackwell-to-leverage-monolithic-die-rumor/

Let's not forget that the 4090 is twice as fast as the 3090 while using less power despite the higher TDP specification. I think the doom and gloom thinking of 600+ Watt video cards is unfounded.

I'd say a big factor in why they were able to deliver such a massive uplift in one generation is the move from Samsung 8nm to TSMC 4nm. Going from TSMC 4nm to TSMC 3nm shouldn't be as big of an improvement on the node front (maybe I'm wrong though), so they're gonna have to pull some other levers to deliver the same kind of performance uplift with the RTX 5000 series. If it takes an 800mm2 die built on TSMC 3nm pulling 600 watts to deliver another 2x uplift over the 4090, I can see that happening, but then what's really next after that? They can't make dies larger than 8xx mm2 IIRC, they can't (shouldn't) be pushing stock power draw beyond 600 watts, and they would already be using the latest cutting-edge node. Seems like there aren't many levers left to pull after an RTX 5090.
 
I'd say a big factor in why they were able to deliver such a massive uplift in one generation is the move from Samsung 8nm to TSMC 4nm. Going from TSMC 4nm to TSMC 3nm shouldn't be as big of an improvement on the node front (maybe I'm wrong though)
Nvidia is still on TSMC "5nm"; "Nvidia 4nm" (4N) is a bit of branding, and I'm not sure how "5nm" was ever really 5nm anyway. But judging from the price increase for 3nm:

TSMC Will Reportedly Charge $20,000 Per 3nm Wafer


By the time of the next gen it could be N3E, which is again about 70% more transistors versus regular N5.
 
Nvidia is still on TSMC "5nm"; "Nvidia 4nm" (4N) is a bit of branding, and I'm not sure how "5nm" was ever really 5nm anyway. But judging from the price increase for 3nm:

TSMC Will Reportedly Charge $20,000 Per 3nm Wafer

By the time of the next gen it could be N3E, which is again about 70% more transistors versus regular N5.

Ok, and what's the density advantage of N5 over Samsung 8nm? Isn't it more than 70%? I thought I read somewhere it was over double, which is the point I was making: going from TSMC 5nm to 3nm probably won't be as big of a leap as going from Samsung 8nm to TSMC 5nm/4nm. And then after 3nm, can we still expect another 70% with 2nm? I don't think Nvidia can keep relying on node shrinks alone to double performance every gen, and they are pretty close to maxing out the die size.
 
Ok, and what's the density advantage of N5 over Samsung 8nm? Isn't it more than 70%? I thought I read somewhere it was over double,
Closer to 3x than 2x, something like 2.6-2.8x. I doubt we will see a similar ridiculous jump again: the 4080 die is 40% smaller than the 3080's and has an estimated 62% more transistors, and the 4090 is a bit smaller than the 3090 and has 2.7 times the transistors.

If they do not massively reduce the die size, maybe the 5080 can get a similar transistor jump over the 4080 as the 4080 got over the 3080; a 5090-over-4090 jump would require going up in size and would be less obvious, but the 5080 should be as much more powerful than the 4080 as it needs to be without much issue.

they are pretty close to maxing out the die size.
I have no idea, and it would be very expensive. AD103: 379 mm², AD102: 608 mm², while TU102 was 754 mm² and GH100 is 814 mm².
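For reference, a quick sanity check of those ratios using publicly listed (approximate) die sizes and transistor counts:

```python
# Approximate public figures: (transistor count, die area in mm^2).
dies = {
    "GA102 (3080/3090)": (28.3e9, 628.4),
    "AD102 (4090)":      (76.3e9, 608.5),
    "AD103 (4080)":      (45.9e9, 378.6),
}

density = {name: t / area for name, (t, area) in dies.items()}

for name, d in density.items():
    print(f"{name}: {d / 1e6:.1f}M transistors per mm^2")

print("AD102 vs GA102 density:",
      round(density["AD102 (4090)"] / density["GA102 (3080/3090)"], 2))
# -> roughly 2.8x, in line with the "closer to 3x than 2x" figure above
```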
 
Closer to 3x than 2x, something like 2.6-2.8x. I doubt we will see a similar ridiculous jump again: the 4080 die is 40% smaller than the 3080's and has an estimated 62% more transistors, and the 4090 is a bit smaller than the 3090 and has 2.7 times the transistors.

If they do not massively reduce the die size, maybe the 5080 can get a similar transistor jump over the 4080 as the 4080 got over the 3080; a 5090-over-4090 jump would require going up in size and would be less obvious, but the 5080 should be as much more powerful than the 4080 as it needs to be without much issue.


I have no idea, and it would be very expensive. AD103: 379 mm², AD102: 608 mm², while TU102 was 754 mm² and GH100 is 814 mm².

Whatever the case is I'm pretty sure nvidia can deliver another huge uplift with the 5090, but they're really gonna have to go all out with it and I feel like at that point they will have exhausted all their options for pushing performance (at least while being on a monolithic design).
 
So what is Nvidia going to do next round to improve RT, a 600mm2+ 3nm die? MCP? 600W+? At what cost, and RT performance would still not be there for the majority of cards Nvidia produces. The video laid it out: three generations of RT performance progression. RT is not making a radical improvement over traditional raster improvements, the performance cost when used is very high, and at times it does not pay off in much better visuals over other methods. The 4080 is already priced at $1200 -> what will the 5080 be priced at? $1495? Just to support this RT path, Nvidia may be over-emphasizing it and taking up valuable die space.

What I've found interesting is that AMD found a way to do effective upscaling without the need for extra hardware on die, which can be used on all relatively modern cards -> that to me is progression and pushing the envelope. Now, AMD did mention FSR 2.3, which is to address the ghosting issue or reduce it over current versions; will it get closer to DLSS 2.x quality, or better? I have no idea what Fluid Motion from AMD will be; I am hoping it is not just frame generation but a smarter Radeon Boost using upscaling, meaning in motion/action your lag will actually decrease, with the lower-resolution rendering not so noticeable when in motion. Not sold on frame generation or whether that is the right course to take; have to see how it improves.

A huge amount of die area this gen was dedicated to cache because there was no improvement in VRAM bandwidth. Just like AMD did with RDNA3, I think next gen Nvidia will reduce cache significantly, because GDDR6 speeds will finally go up significantly. This should clear up quite a bit of die space on a monolithic design.
 
I'm just gonna come out and say it... this guy is an AMD fanboy doofus. Nvidia's RT performance has gone through the roof this generation. How do I know this?

The Ray Tracing feature test in 3DMark is an excellent measure of RT performance.

My previous RTX 3080 got 48 FPS when overclocked as far as possible. My RTX 4090 gets...

[screenshot: 3DMark DirectX Raytracing feature test result]


That's right friends... a roughly 3x improvement in RT performance going from a cut down GA102 to a cut down AD102. Obviously this is not a perfect apples-to-apples comparison (anyone have an RTX 3090 they can test?), but this should put an appropriate amount of egg on Mr. AdoredTV's face.


Also, if anyone here has a 6800/XT, 6900 XT or 6950 XT, could you do us a solid and run the DirectX Raytracing feature test? I'm having some difficulty finding results for those GPUs.


54.18. I am running an older Intel CPU which might have some small impact. This was at stock, no OC on the GPU. I only noticed Afterburner wasn't running until after I looked at the GPU-Z sensors... I think I turned it off months ago to test something and forgot to turn it back on, lol. And I haven't even noticed in the games I play.

57.33 with Afterburner on with +100 on the gpu, +150 on the vram, 114% power limit

5.8% gain from +4.6% gpu clock, +1.5% mem clock, +49W
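Quick check of that scaling (numbers copied from the two runs above):

```python
# 54.18 fps stock vs 57.33 fps with the overclock applied.
stock_fps, oc_fps = 54.18, 57.33
print(f"fps gain: {oc_fps / stock_fps - 1:.1%}")  # ~5.8%, close to the ~4.6% core clock bump
```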
 
A huge amount of die area this gen was dedicated to cache because there was no improvement in VRAM bandwidth. Just like AMD did with RDNA3, I think next gen Nvidia will reduce cache significantly, because GDDR6 speeds will finally go up significantly. This should clear up quite a bit of die space on a monolithic design.

Provided Nvidia knew about this (or anticipated) a couple years ago when they designed the new chips.
 
A huge amount of die area this gen was dedicated to cache because there was no improvement in VRAM bandwidth. Just like AMD did with RDNA3, I think next gen Nvidia will reduce cache significantly, because GDDR6 speeds will finally go up significantly. This should clear up quite a bit of die space on a monolithic design.
I really don't see them giving up too much cache. Since they're on TSMC now, they could also use V-Cache, since that is TSMC tech.
 
I just tried Fortnite Chapter 4, which is the Unreal Engine 5.1 build featuring Lumen and Nanite. It was dropping below 60fps at native 4K with all the settings cranked. Never thought Fortnite would be a game to bring a 4090 to its knees lol; can't imagine future, more demanding games using UE 5.1, especially if they have even more intense RT effects going on. Although, to be fair, I suppose DLSS 3 would greatly increase the fps, but supposedly you kinda need a decent frame rate to begin with. Trying to use DLSS 3 when you are running 30fps without it doesn't turn up good results, based on what DF said.
 
I guess I was expecting something like this: the RTX 3090 turns on RT and takes a -50% fps penalty for doing so; the RTX 4090, being "way way faster at RT" than the 3090, turns on the same RT effects and "only" takes a -25% fps penalty. But clearly this isn't how things work, as it seems like no matter how much faster Lovelace is than Ampere at RT, they both suffer similar performance penalties for turning RT on. Or maybe games just aren't optimized for Lovelace RT yet.
Take a look at these two GPU-Z captures. If you compare the pixel and texture fill rates between the 4090 and the 3090 you get a 2.3 to 2.5x improvement, similar to jumping from a 7800gtx to somewhere between a GTX280 and GTX480. Pretty staggering jump in pixel and texturing throughput for one generation. You also have similar jumps on the tflops of the rt units.

The point I am trying to make is that last gen was already quite a bit faster without ray tracing on. Lovelace received huge bumps to raster performance, so by that logic, similar scaling in raytracing performance should still lead to large drops in fps with it enabled, similar to how last gen dropped.
From my own testing, my 4090 is double or more the performance of my 3080 in Cyberpunk at the same settings. With raytracing set to Psycho, the game is unplayable on the 3080. Chinatown at night is in the 20s on the 3080. On the 4090 the worst case is 60fps, and often 70-80fps. If that is a failure, I don't know what that guy considers a success.

[GPU-Z captures: RTX 3090 and RTX 4090]
 
Take a look at these two GPU-Z captures. If you compare the pixel and texture fill rates between the 4090 and the 3090 you get a 2.3 to 2.5x improvement, similar to jumping from a 7800gtx to somewhere between a GTX280 and GTX480. Pretty staggering jump in pixel and texturing throughput for one generation. You also have similar jumps on the tflops of the rt units.

The point I am trying to make is that last gen was already quite a bit faster without ray tracing on. Lovelace received huge bumps to raster performance, so by that logic, similar scaling in raytracing performance should still lead to large drops in fps with it enabled, similar to how last gen dropped.
From my own testing, my 4090 is double or more the performance of my 3080 in Cyberpunk at the same settings. With raytracing set to Psycho, the game is unplayable on the 3080. Chinatown at night is in the 20s on the 3080. On the 4090 the worst case is 60fps, and often 70-80fps. If that is a failure, I don't know what that guy considers a success.


Oh yeah, the raw uplift of the 4090 over Ampere in general is pretty insane. Don't think anyone ever doubted that. I'm enjoying the hell out of my 4090, but I do feel RT might hit a dead end in the future if nothing can be done to mitigate the performance loss. Imagine when games get so demanding to render with raster alone that nobody is gonna turn on RT to kill performance even more. Maybe a change in game development is what we do need.
 
Take a look at these two GPU-Z captures. If you compare the pixel and texture fill rates between the 4090 and the 3090 you get a 2.3 to 2.5x improvement, similar to jumping from a 7800gtx to somewhere between a GTX280 and GTX480. Pretty staggering jump in pixel and texturing throughput for one generation. You also have similar jumps on the tflops of the rt units.

The point I am trying to make is that last gen was already quite a bit faster without ray tracing on. Lovelace received huge bumps to raster performance, so by that logic, similar scaling in raytracing performance should still lead to large drops in fps with it enabled, similar to how last gen dropped.
From my own testing, my 4090 is double or more the performance of my 3080 in Cyberpunk at the same settings. With raytracing set to Psycho, the game is unplayable on the 3080. Chinatown at night is in the 20s on the 3080. On the 4090 the worst case is 60fps, and often 70-80fps. If that is a failure, I don't know what that guy considers a success.

So in your scenario, every other RT card fails and the 4090 gets a stunning 60fps. That really does not indicate success in that limited case, where every other card has subpar performance. An extremely small number of gamers own a 4090 or will have access to that level of performance in the next 5 years.

RT does have some advantages when used in certain cases, corridors, indoors, tight spaces, up close etc. But can be extremely limited when taken too far on performance.

We just have to see how hardware advances, whether RT co-processors connected by photonics tech come about, and so on. Whether the next gen, particularly the cards the majority of people buy, can outperform the current 4090 in RT remains to be seen.
 
So in your scenario, every other RT card fails and the 4090 gets a stunning 60fps. That really does not indicate success in that limited case, where every other card has subpar performance. An extremely small number of gamers own a 4090 or will have access to that level of performance in the next 5 years.

RT does have some advantages when used in certain cases, corridors, indoors, tight spaces, up close etc. But can be extremely limited when taken too far on performance.

We just have to see how hardware advances, whether RT co-processors connected by photonics tech come about, and so on. Whether the next gen, particularly the cards the majority of people buy, can outperform the current 4090 in RT remains to be seen.

I think RT won't get anywhere if the only people who can use it with decent framerates are those willing to spend 4 figures on a GPU purchase so you're right 4090 performance will eventually need to trickle down.
 
I think RT won't get anywhere if the only people who can use it with decent framerates are those willing to spend 4 figures on a GPU purchase so you're right 4090 performance will eventually need to trickle down.
It really goes back to the video: how Nvidia used tessellation to make ATI look bad in games where the extreme amount of tessellation had zero visual impact, even when it also hurt their own cards.

Callisto on the PS5 looks amazing: subsurface scattering, RT shadows, etc. Smart RT. That is with AMD hardware. Albeit maybe 30fps, but OK for a console.

In many of the RTX games, RT does very little to improve visual quality, or the game looks so bad without it that you see an improvement. It kills performance but is somewhat usable on Nvidia hardware if you have the higher-end GPUs. Is it more Nvidia marketing-oriented crap, or really useful stuff?

Personally I want smart use of any technology to improve the experience and not leverage it in stupid ways to look better than a competitor.
 
With raytracing set to Psycho, the game is unplayable on the 3080. Chinatown at night is in the 20s on the 3080. On the 4090 the worst case is 60fps, and often 70-80fps. If that is a failure, I don't know what that guy considers a success.

I think the point is that per ray tracing core (or whatever Nvidia calls them) there hasn't been much of an improvement. Basically all of the improvement is that Nvidia decided to put a whole lot of RT cores on there, maybe relatively more than they did the previous two gens. It's another aspect of how this generation had to brute-force the hardware to get performance up to a level that is comfortable with modern games at 4K.
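A rough back-of-envelope way to frame that, using public SM/RT-core counts and reference boost clocks plus the ~3x 3DMark DXR jump quoted earlier in the thread (obviously only a ballpark, and it ignores memory, cache and real clock behaviour under load):

```python
# RTX 3080: 68 RT cores @ ~1.71 GHz boost; RTX 4090: 128 RT cores @ ~2.52 GHz boost.
cores_3080, clock_3080 = 68, 1.71
cores_4090, clock_4090 = 128, 2.52
observed_speedup = 3.0  # the rough 3DMark DXR jump claimed earlier

throughput_ratio = (cores_4090 * clock_4090) / (cores_3080 * clock_3080)
per_core_per_clock = observed_speedup / throughput_ratio

print(f"cores x clock ratio:             {throughput_ratio:.2f}x")   # ~2.77x
print(f"implied per-core, per-clock gain: {per_core_per_clock:.2f}x") # ~1.08x
```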
 
It really goes back to the video: how Nvidia used tessellation to make ATI look bad in games where the extreme amount of tessellation had zero visual impact, even when it also hurt their own cards.

Callisto on the PS5 looks amazing: subsurface scattering, RT shadows, etc. Smart RT. That is with AMD hardware. Albeit maybe 30fps, but OK for a console.

In many of the RTX games, RT does very little to improve visual quality, or the game looks so bad without it that you see an improvement. It kills performance but is somewhat usable on Nvidia hardware if you have the higher-end GPUs. Is it more Nvidia marketing-oriented crap, or really useful stuff?

Personally I want smart use of any technology to improve the experience and not leverage it in stupid ways to look better than a competitor.

You should really stop watching Adored TV videos.
 
AdoredTV is a fanboy.
In some games it's hard to see if raytracing is doing anything; look at examples like Spider-Man, Cyberpunk, Control. Then there are the Quake, Quake 2 and Doom RT ports, and Minecraft.

Hell, raytracing in Minecraft is so effective. I always feel compelled to dig out bases on the coastline and build a glass wall in the water to have light shine through the water and the glass.

Also, I was rebutting the claim that the RT gains are subpar. Going from Cyberpunk running like ass with compromised settings to completely maxed at 60fps minimum is a decent accomplishment. It's like 60 minimum, around ~70 average in the areas I tested with a 7950X on Psycho RT. Keep in mind Psycho RT ran horribly before. Ultra is what last-gen cards typically run, and that is over 100fps.
 