RTX 4xxx / RX 7xxx speculation

Based on leaked specs, I feel the 4090 will be ~30-40% faster than a 3090 Ti in terms of FPS.
Going from TSMC 7 to TSMC 4 on the A100 -> H100 pro line gave about 50% more transistors on a same-size die and a little more than triple the performance for a 75% jump in power, which works out to roughly a 75% boost in efficiency (power-wise). I would assume Samsung 8 to TSMC 4 is a significantly bigger boost, but I would also blindly assume it is a bit harder to get a linear performance upgrade for a diverse gaming workload than for the benchmarks and raw TFLOPS figures they used for Hopper.

How do you roughly arrive at 30-40% faster: the 50% CUDA core jump minus a bit of loss from it not being a perfectly linear increase? (I am really ignorant of estimating the performance increase from specs; it's not something I did generation after generation to develop rules of thumb.)
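As a rough back-of-the-envelope check of that A100 -> H100 efficiency figure (using the numbers quoted above, ~3x performance for ~1.75x power, so the thread's figures rather than measurements):

Code:
# Perf/W gain = performance ratio / power ratio (figures from the post above)
perf_ratio = 3.0      # "a little more than triple the performance"
power_ratio = 1.75    # "a 75% jump in power"
gain = perf_ratio / power_ratio
print(f"Perf/W improvement: ~{(gain - 1) * 100:.0f}%")  # ~71%, in the ballpark of the ~75% quoted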
 
Going from TSMC 7 to TSMC 4 on the A100 -> H100 pro line gave about 50% more transistors on a same-size die and a little more than triple the performance for a 75% jump in power, which works out to roughly a 75% boost in efficiency (power-wise).

How do you roughly arrive at 30-40% faster: the 50% CUDA core jump minus a bit of loss from it not being a perfectly linear increase? (I am really ignorant of estimating the performance increase from specs; it's not something I did generation after generation to develop rules of thumb.)
Just my gut feeling based on the past 15 years. Again, it is just my gut, and I will probably be wrong if we go by what others are saying (which could also be targeted leaks by Nvidia; leaks give them free marketing and make people wait rather than buy from a competitor).
 
Plus, cases back then had all the hard drive cages basically blocking the airflow from the front fans, in addition to thick cables everywhere due to no concern for cable management behind the motherboard. I can't possibly see a case from that era being competitive with properly designed modern cases and the hot, power-hungry parts we have now.

My Antec 1100 had side intake fans, which eliminated the issue with the hard drive cages and blew cool air directly onto the GPU and motherboard chipset. No one does that these days though, because tempered glass and all that.
 
Word on the street is that we might get some sort of announcement in the next week. I was about to buy a 3090 Ti (I still might), depending on what is announced. Based on leaked specs, I feel the 4090 will be ~30-40% faster than a 3090 Ti in terms of FPS. Obviously certain games will be lower or higher than that. Price-wise it will be similar, $1800-2000, and use around the same power, ~480 W. I don't see what the big fuss is over power. I have a 1200 W PSU that I bought a long time ago to run Tri-Fire, and my system would pull around 900-1000 watts. 650-700 watts and running at 65-75°C doesn't seem bad to me.
My only issue with power is heat. I'm not concerned with needing to upgrade my PSU (which I will need to do, but I'm gonna wait until ATX 3.0 for that), nor am I concerned with power usage. Heat is a factor though. And while I don't mind running the AC either, it just presents a situation where one room will get very cold while my office is just starting to get comfortable. I'll just have to buy my wife a nice blanket, I suppose.
 
My Antec 1100 had side intake fans, which eliminated the issue with the hard drive cages and blew cool air directly onto the GPU and motherboard chipset. No one does that these days though, because tempered glass and all that.
That's true. I had some cases that did that as well and I completely forgot about that.
 
Was never a huge fan of the side fans. They almost never have any filtration to speak of, and while they may cool the component they're blowing directly against (until dust becomes an issue, which happens pretty quickly), they actually do a great job of completely screwing up the airflow in the case.
 
Was never a huge fan of the side fans. They almost never have any filtration to speak of, and while they may cool the component they're blowing directly against (until dust becomes an issue, which happens pretty quickly), they actually do a great job of completely screwing up the airflow in the case.
Yeah, I like the 3xxx series cooler design, but for cooling I definitely prefer the blower-style models. Except in weird ITX case configurations and such, they exhaust all the hot air out of the case. For noise I guess the open-style coolers have an advantage at least.
 
So everyone who did that, or something like it, now needs a complete new rig. There's no more upgrading headroom, and that goes for all gaming PCs made without support for resizable BAR, and realistically at this point, no PCIe 5.0.
I agree, except for PCIe 5. Yes, if you want to be extremely future-proof, sure. But I don't see a make-or-break use case for quite a while.
Why do you think it is a must?
Is the future of Resizable BAR coming SOON with PCIe 5? Another GPU leap? I'm in!
 
Just my gut feeling based on the past 15 years. Again, it is just my gut, and I will probably be wrong if we go by what others are saying (which could also be targeted leaks by Nvidia; leaks give them free marketing and make people wait rather than buy from a competitor).

Sorry, but how are you basing it on the last 15 years and coming up with a 30 to 40% improvement in performance between the 3090 and the 4090?

Ignore all the hype and waffle. Just look at the improvements you get from moving from the Samsung 8nm node to the TSMC 4nm node. That alone will bring roughly a 30% increase in performance. If you think there is only going to be a 30 to 40% improvement, then what you are basically saying is that Nvidia are going to get little to no improvement in performance from changing architectures. However, this goes against basing things on the last 15 or more years, where it can be seen that a large portion of their performance improvements from generation to generation comes from architectural changes.
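To illustrate how those two factors compound (the ~30% node gain is from the post above; the 25% architectural gain is purely a hypothetical placeholder, not a prediction):

Code:
# Node and architecture improvements multiply rather than add.
node_gain = 1.30   # ~30% from Samsung 8nm -> TSMC 4nm (figure from the post)
arch_gain = 1.25   # hypothetical architectural uplift, for illustration only
combined = node_gain * arch_gain
print(f"Combined uplift: ~{(combined - 1) * 100:.0f}%")  # ~62-63%, well above 30-40%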
 
Was never a huge fan of the side fans. They almost never have any filtration to speak of, and while they may cool the component they're blowing directly against (until dust becomes an issue, which happens pretty quickly), they actually do a great job of completely screwing up the airflow in the case.

You know you can just go to Amazon and buy magnetic dust filters for almost nothing, right?

https://www.amazon.com/s?k=case+fan+magnetic+filter

You just put them on the outside (on any black case they'll blend in), take them off for a quick wipedown, and put them right back on (assuming the side intake grille is flush; sometimes it isn't, so you need to screw them in from the other side). Not that I've seen dust make a gigantic difference in airflow in most of my builds anyway...

As far as "screwing up the airflow in the case", kind of depends on the layout. I do think that the much more effective location for GPU cooling fans is right below the GPU, not to the side of it (unless the one to the side is a bit on the low side so it intakes air below the gpu rather than to the side).

For instance, this is the layout inside of my Lian Li Mesh II, which is a modern case:
[Attached image: interior layout of the Lian Li Mesh II case]


Putting fans on the bottom promotes cool airflow going directly onto the GPU fans, and as a side effect the aftermarket cooler's exhaust also gets blown upwards toward the exhaust section. I would imagine that if the side intake fans were positioned such that they were facing the GPU's heatsink, it would be somewhat counterproductive. You would blow cool air into the GPU intake, but the hot air would also get blown back into the heatsink, which would probably cause the back half of the GPU to run hotter than the front half... it might still equalize out to be cooler overall, but it would be awkward. If anything, maybe having exhaust in the midsection would be better, because it would divert the heat from the GPU outwards before it reaches the CPU and/or VRMs.

I think an issue with these sorts of aftermarket GPU setups, though, is M.2 drive cooling. It's usually not an issue, but my Samsung 970 EVO Plus (2TB) runs really hot (afaik they all do at that size)... and GPU exhaust doesn't help lol. During gaming it's usually around 55°C, even with the heatsinks the motherboard provides.

Frankly I'm also not a fan of the stupid tempered glass/acrylic era we're in now, though, and I actually don't like RGB much either. Idk what or if I would put anything in the midsection, but I sure as hell know that I can't when the case literally makes it impossible.
 
Sorry, but how are you basing it on the last 15 years and coming up with a 30 to 40% improvement in performance between the 3090 and the 4090?

Ignore all the hype and waffle. Just look at the improvements you get from moving from the Samsung 8nm node to the TSMC 4nm node. That alone will bring roughly a 30% increase in performance. If you think there is only going to be a 30 to 40% improvement, then what you are basically saying is that Nvidia are going to get little to no improvement in performance from changing architectures. However, this goes against basing things on the last 15 or more years, where it can be seen that a large portion of their performance improvements from generation to generation comes from architectural changes.

I would have to agree; not sure where he's pulling this 1.5 decades of experience from. There have been a couple of generations where Nvidia or AMD dropped the ball, but those are exceptions, not the rule.
 
I was anticipating 50-70% on the original 600W specs.
Maybe 40-50% now. More curious about the rest of the 40 series, tbh. The 4090 is going to be $2k+; I find it hard to care.
 
SSD caching/Direct Storage.

For something like DirectStorage, would the ~7,880 MB/s of PCIe 4.0 x4 become a limitation soon, or is the difference between the two about more than bandwidth?

The PS5, for example, has a 5,500 MB/s drive, and putting in a 3,200 MB/s drive instead changed absolutely nothing in these series of tests:
https://www.theverge.com/2021/8/4/22608153/ps5-ssd-speed-test-storage-expansion-m2-playstation-5
https://www.eurogamer.net/digitalfoundry-2021-the-worst-and-best-nvme-ssds-tested-on-playstation-5

And an Xbox has less than half that speed.

You could absolutely be right even just bandwidth-wise; maybe 10,000 MB/s drives will become buyable, and with DDR5 bandwidth the new GPUs will be able to eat that much heavily compressed data at those speeds, but at the moment:

[Image: Forspoken SSD speed / load-time comparison chart]


Something 50% faster than the PS5's original drive could take a long time to become the bottleneck. The numbers above are after decompression; the actual read from the drive would be 1.5 to 7 times smaller depending on the compression/asset type.

We'll see once GPU decompression comes online and gets optimised by a major engine, I guess (or maybe people already have a good idea?), but I feel most of the time could end up being spent in how fast you can decompress/initialize/use the asset rather than in reading from a 7,000 MB/s drive, so going much faster than that may not bring a vast improvement.
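As a rough sketch of that point, using the compression ratios quoted above (1.5:1 to 7:1) and a ~7 GB/s PCIe 4.0 x4 drive (thread figures, not measurements):

Code:
# Effective asset throughput = raw drive read speed x compression ratio.
raw_read_gbs = 7.0  # ~PCIe 4.0 x4 NVMe drive, GB/s
for ratio in (1.5, 3.0, 5.0, 7.0):
    effective = raw_read_gbs * ratio
    print(f"{ratio:>3}:1 compression -> ~{effective:.0f} GB/s of decompressed assets")

Which is why the decompression stage, rather than the raw read, looks like the more likely bottleneck.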
 
For something like DirectStorage, would the ~7,880 MB/s of PCIe 4.0 x4 become a limitation soon, or is the difference between the two about more than bandwidth?

I completely agree that right now it's basically a non-issue, and we might not even see too many games where it gets used heavily for a long while, but I think it will be a big bottleneck in a few years, especially for applications that are specialized to use it, probably content creation like with everything else.

We're talking about future proofing standards, after all, not today affirming standards.

By the way, game load times don't really have much to do with how this will be used going forward. Think frame caching like on the PS5.
 
By the way, game load times don't really have much to do with how this will be used going forward. Think frame caching like on the PS5.
It is a good scenario for how fast the disk read can go (which I imagine is peak read performance, with everything dedicated to it and perfectly packaged data). I must admit I have no idea what frame caching on the PS5 is (not sure what it would be in the context of a video game) and I don't seem to be googling with the right keywords.

But the PS5 has very little RAM versus a computer; I feel simply having 32-64 GB of system RAM and using that to cache things instead of a drive would be 2 to 10 times better and simpler.
 
But the PS5 has very little RAM versus a computer; I feel simply having 32-64 GB of system RAM and using that to cache things instead of a drive would be 2 to 10 times better and simpler.

I'm sure in a whole lot of situations that's true, but Sony is making it a thing, Microsoft is making it a thing, and AMD is making it a thing. It's already a thing, even if no one's really using it.

And when people start using it, it's going to be weird. Sure, it can be used to cache frames for increasing video game frame rates, but also, who knows what else. We're getting to a point where the line between a page file and more RAM is starting to blur.

So even if it's not a big thing now, it will be very shortly, at least, within this current next generation of parts. And I'm betting that it's the one bit of tech that will define a clear generational gap over a typical incremental improvement.

Storage has been a major part of generational differences, now that CPU architectures are interchangeable. But we're on the cusp of another CPU architecture, too, with heterogeneous main processors.

This will be a break from the last generation.
 
:D
I was anticipating 50-70% on the original 600W specs.
Maybe 40-50% now. More curious about the rest of the 40 series, tbh. The 4090 is going to be $2k+; I find it hard to care.
It has been said, "You should never go full Fermi." Well, Nvidia is going full Fermi!
They know the rumored spec of AMD's top 7000-series card is 100%+, hence they are going full Fermi! If they don't hit 80%+ with the rumored power draw, then it is a huge fail, at least on paper. Maybe it will have 10X RT or Super DLSS that more than equalizes, but it had better bring it @ 600W+.
 
I'm sure in a whole lot of situations that's true, but Sony is making it a thing, Microsoft is making it a thing, and AMD is making it a thing. It's already a thing, even if no one's really using it.

I cannot find anything on Google about it and I am still not sure what we are talking about; do you have a link explaining it?
 
I cannot find anything on Google about it and I am still not sure what we are talking about; do you have a link explaining it?

RedGamingTech on YouTube has done a bunch of videos on it over the past year or so, but trying to search for them using YouTube is a real PITA.

The first of the two main ways DirectStorage is expected to be used is to allow the GPU to access game assets directly from the SSD and decompress them on the GPU instead of on the CPU. This is faster and reduces CPU load. This isn't really in use yet, since it's so fundamental to game engines that they will have to be developed around it.

The other way is already in use with the PS5, and that's frame caching. Basically, the PS5 can save whole or parts of rendered frames that the GPU portion of the APU compresses and stores on the SSD directly, and if a future frame calls for a previous frame, or part of a previous frame, instead of rendering it again, it just reads it off the SSD, decompresses it, and stitches it together with any needed changes. This spares a lot of GPU time which can go do other stuff.
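For what it's worth, here is a purely conceptual sketch of the asset-path contrast being described; none of these function names are the actual DirectStorage or RTX IO API, they are hypothetical stand-ins to show where the decompression work lands:

Code:
# Hypothetical illustration only -- not real DirectStorage / RTX IO calls.

def load_asset_cpu_path(path):
    """Traditional path: CPU reads, CPU decompresses, full-size copy to VRAM."""
    compressed = read_from_ssd(path)   # IO request submitted by the CPU
    raw = cpu_decompress(compressed)   # CPU burns cycles here
    upload_to_vram(raw)                # large transfer over PCIe

def load_asset_gpu_path(path):
    """GPU-decompression path: compressed data crosses the bus, GPU expands it."""
    compressed = read_from_ssd(path)   # IO request still submitted by the CPU
    upload_to_vram(compressed)         # smaller transfer over PCIe
    gpu_decompress_in_vram()           # GPU does the decompression in VRAM

# Stubs so the sketch runs; real code would use the actual APIs.
def read_from_ssd(path): return b"\x00" * 1024
def cpu_decompress(data): return data * 4
def upload_to_vram(data): pass
def gpu_decompress_in_vram(): pass

load_asset_cpu_path("texture.bin")
load_asset_gpu_path("texture.bin")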
 
RedGamingTech on YouTube has done a bunch of videos on it over the past year or so, but trying to search for them using YouTube is a real PITA.

The first of the two main ways DirectStorage is expected to be used is to allow the GPU to access game assets directly from the SSD and decompress them on the GPU instead of on the CPU. This is faster and reduces CPU load. This isn't really in use yet, since it's so fundamental to game engines that they will have to be developed around it.

The other way is already in use with the PS5, and that's frame caching. Basically, the PS5 can save whole or parts of rendered frames that the GPU portion of the APU compresses and stores on the SSD directly, and if a future frame calls for a previous frame, or part of a previous frame, instead of rendering it again, it just reads it off the SSD, decompresses it, and stitches it together with any needed changes. This spares a lot of GPU time which can go do other stuff.

I'm still not seeing why this is a thing. Why not just use RAM, which is faster both in terms of bandwidth and latency?
 
Why not just use RAM, which is faster both in terms of bandwidth and latency?

Intel is already teasing NVMe drives capable of 28 GB/s. That's faster than fast DDR4. And the GPU will have direct access to it, no offloading to the CPU. No need to use other resources.

And most systems don't have terabytes of RAM.
 
Intel is already teasing NVMe drives capable of 28 GB/s. That's faster than fast DDR4. And the GPU will have direct access to it, no offloading to the CPU. No need to use other resources.

And most systems don't have terabytes of RAM.
But we are already at DDR5, meanwhile this NVMe is still being tested. Why not give the GPU direct access to the RAM? The only benefit I see to this is load times, which are hardly an issue with today's NVMe drives.

I suppose progress is still progress, I’m just not as excited as some folks seem to be about it.
 
Intel is already teasing NVMe drives capable of 28 GB/s. That's faster than fast DDR4. And the GPU will have direct access to it, no offloading to the CPU. No need to use other resources.

And most systems don't have terabytes of RAM.
Will they?

At least as things stand right now, it still goes through RAM and is handled in part by the CPU:
[Image: diagram of the current DirectStorage data path through system RAM]


WHAT DIRECTSTORAGE DOES DIFFERENTLY

DirectStorage for Windows replaces the Win32 FileIO API with a new API designed for very high numbers of small file requests. This allows modern games to get their assets out of storage much quicker and to saturate the high bandwidth of NVMe SSDs. The IO requests are still submitted by the CPU. The compressed assets are loaded into system memory, just like before. The assets are decompressed by the CPU, just like before, and then copied over to VRAM, just like before.

Again, in its current state, DirectStorage for Windows does not bypass the CPU or system memory for graphics file IO.


GPU DECOMPRESSION AND RTX IO

Decompressing assets on the GPU is still being worked on by Microsoft and graphics card vendors. Nvidia calls their GPU-based decompression API “RTX IO”. This is not currently available and has no confirmed release date as of today. Once this feature is released and implemented into games, assets will be able to be copied from system memory to VRAM in a compressed state, where they will then be decompressed by the GPU. However, the compressed assets must still be loaded from storage into system memory via DirectStorage first. The CPU will still handle these IO requests. The only change is that the CPU will no longer have to handle decompressing the assets.


Something cached on the NVMe would still go through regular RAM (under CPU control) before reaching the GPU under DirectStorage for Windows 11, and by that time RAM will be well over 100 GB/s in regular dual channel.

I searched and searched and could not find any mention of frame caching online (for video games, not video editing software), either in RAM or on SSD, and I am really not sure how it would work in practice (how does it know, pre-render, how similar the new frame is to a previous one, what to take from it, and so on, in significantly less time than a render?).
 
So according to rather accurate leaks, 4070 is on par with 3090. That will be interesting. About €700 for a €2000+ card.
Excited to see if there will be any GeForce info on Computex tomorrow (there probably will).
I can see Nvidia doing this, but limiting the 4070 to 8 GB or 10 GB. If the 4080 is at 450 W and is twice the speed of the 3080, I'm all over that and will be happy for years to come.
Worst case scenario, RTX 3xxx/RX 6xxx prices are going to PLUMMET.
 
The first of the two main ways DirectStorage is expected to be used is to allow the GPU to access game assets directly from the SSD and decompress them on the GPU instead of on the CPU. This is faster and reduces CPU load. This isn't really in use yet, since it's so fundamental to game engines that they will have to be developed around it.
Not convinced this will be faster, given most games stress the GPU rather than the CPU. If the CPU is doing less work (comparatively) than the GPU, wouldn't you want the fairly expensive task of decompression on the CPU instead? And while GPUs do tend to be faster than CPUs for that type of workload, you have to consider the high load GPUs will typically be under.

So I question if this is "faster" in practice. Which means developers will "assume" it will be, and people will start complaining about stuttering and slow to load game assets.
 
Worst case scenario, RTX 3xxx/RX 6xxx prices are going to PLUMMET.
Yeah, it will be interesting to wait and see. I was going to pull the trigger, but now with the Zen 4 leaks it's not looking like I should bother waiting to build a DDR5/PCIe 5 rig at this point, because whatever I build to be future-proof now will be out of date by the time it's relevant, and standards/support levels will change as usual. The extra cost isn't worth it. I'd definitely wait on GPUs though, especially if Nvidia is doing the Prescott treatment, IF AMD's chiplet GPUs deliver.
 
So I question if this is "faster" in practice. Which means developers will "assume" it will be, and people will start complaining about stuttering and slow to load game assets.

It already is faster. It's already in limited use.
 
It already is faster. It's already in limited use.

Do you have a realistic example (of GPU decompression done while rendering is occurring)? NVIDIA's example, looking at the code, seems to do it only while loading the scene (when GPU compute would be low anyway, so obviously not an issue to use it for decompression); the few actual game demos I saw seem to be using direct access but not GPU decompression.
 
Do you have a realistic example (of GPU decompression done while rendering is occurring)?

I don't know if that is what is in use with Forspoken or not, just that it uses Direct Storage. Makes sense, since it's Square Enix.
 
I don't know if that is what is in use with Forspoken or not, just that it uses Direct Storage. Makes sense, since it's Square Enix.
Forspoken just uses some form of DirectStorage; it is really just taking more advantage of faster drive speed.

In a world where the GPU often sits near 100% usage and the CPU pretty much always has low-usage cores during gameplay, if we ever transition to a heavier load-assets-as-needed-while-gaming model, it is not obvious that doing the decompression on the GPU instead of on the CPU (or dedicated hardware on a console) will be a good idea. The GPU could be so much better at it, and the compression so effective (say 5:1 or 7:1, saving PCIe bandwidth and ending up with the data already uncompressed in VRAM), that it is obviously worth it, but it is not obvious that this will be the case.

For traditional asset loading, it is way more obvious to be worth it.


Interestingly enough, Luminous Productions confirms that DirectStorage does not currently allow GPU decompression. So Forspoken is hitting 1 second loading times while still utilizing the CPU.

"Because the current version of DirectStorage does not support GPU decompression, GPU data is decompressed on the CPU. This implementation, however, still outperforms existing Win32 APIs."



Read more: https://www.tweaktown.com/news/8526...-1-second-on-pc-with-directstorage/index.html
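As a rough illustration of the PCIe-bandwidth side of that trade-off, using the 5:1 and 7:1 ratios mentioned above (illustrative numbers, not benchmarks):

Code:
# PCIe traffic saved by moving assets compressed and expanding them on the GPU.
uncompressed_gb = 1.0  # a hypothetical 1 GB batch of assets (uncompressed size)
for ratio in (5.0, 7.0):
    compressed_gb = uncompressed_gb / ratio
    print(f"{ratio:.0f}:1 -> move {compressed_gb:.2f} GB over PCIe instead of "
          f"{uncompressed_gb:.2f} GB, saving {uncompressed_gb - compressed_gb:.2f} GB")

Whether that saving is worth stealing shader time from a GPU that is already near 100% usage is exactly the open question.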
 
In a world where the GPU often sits near 100% usage and the CPU pretty much always has low-usage cores during gameplay, if we ever transition to a heavier load-assets-as-needed-while-gaming model, it is not obvious that doing the decompression on the GPU instead of on the CPU (or dedicated hardware on a console) will be a good idea. The GPU could be so much better at it, and the compression so effective (say 5:1 or 7:1, saving PCIe bandwidth and ending up with the data already uncompressed in VRAM), that it is obviously worth it, but it is not obvious that this will be the case.
Which is my argument. Yes, GPUs are better at compression/decompression, but you have to consider that GPUs tend to be more heavily loaded. If developers aren't careful, they could make a lot of assumptions that may not hold true across different configurations, or may break things years down the line. And as someone who plays a lot of older games, I can attest it sucks when the flaws of a developer's design manifest a decade down the line with no real way to fix things.
 
Forspoken just uses some form of DirectStorage; it is really just taking more advantage of faster drive speed.

In a world where the GPU often sits near 100% usage and the CPU pretty much always has low-usage cores during gameplay, if we ever transition to a heavier load-assets-as-needed-while-gaming model, it is not obvious that doing the decompression on the GPU instead of on the CPU (or dedicated hardware on a console) will be a good idea. The GPU could be so much better at it, and the compression so effective (say 5:1 or 7:1, saving PCIe bandwidth and ending up with the data already uncompressed in VRAM), that it is obviously worth it, but it is not obvious that this will be the case.

For traditional asset loading, it is way more obvious to be worth it.


Interestingly enough, Luminous Productions confirms that DirectStorage does not currently allow GPU decompression. So Forspoken is hitting 1 second loading times while still utilizing the CPU.

"Because the current version of DirectStorage does not support GPU decompression, GPU data is decompressed on the CPU. This implementation, however, still outperforms existing Win32 APIs."



Read more: https://www.tweaktown.com/news/8526...-1-second-on-pc-with-directstorage/index.html

Freeing up PCIe bandwidth doesn't seem like a huge deal considering it's never even close to being saturated. The more I'm reading about DirectStorage, the more it seems like it's going to be marginally faster at load times and not much else.
 
I don't get the push for a proprietary solution with DirectStorage... There is already CXL for cache coherency.
 
Was never a huge fan of the side fans. They almost never have any filtration to speak of, and while they may cool the component they're blowing directly against (until dust becomes an issue, which happens pretty quickly), they actually do a great job of completely screwing up the airflow in the case.

It depends a lot on system configuration and components used. For a period of time, they were actually one of the most beneficial things you could have done to your case, particularly if you were using a blower-style GPU cooler.

https://www.pugetsystems.com/labs/articles/Side-Panel-Fans-Are-They-Worth-It-102/

Obviously this configuration is less common now, but we are talking about older case designs here, where bypassing the HD cage was necessary on the intake.

It's also easy to filter out that dust. You can use a relatively cheap dust filter and solve that problem. Worst case scenario, blow the dust out of the case every few months as a part of regular maintenance.
 
Freeing up PCIe bandwidth doesn't seem like a huge deal considering it's never even close to being saturated. The more I'm reading about DirectStorage, the more it seems like it's going to be marginally faster at load times and not much else.
Looking a bit more, GPUs have for years been using compressed textures that they keep in VRAM and decompress themselves, which makes me wonder how much buzzword marketing is going on here; the multithreaded IO taking advantage of the bandwidth of faster PCIe 3.0-and-up drives is quite obvious and will be huge by itself too.

For example:
https://github.com/BinomialLLC/basis_universal

or
http://www.radgametools.com/oodlete...
Oodle Texture RDO optimizes for the compressed size of the texture after packing with any general purpose lossless data compressor. This means it works great with hardware decompression. On consoles with hardware decompression, RDO textures can be loaded directly to GPU memory, Oodle Texture doesn't require any runtime unpacking or CPU interaction at all. Of course it works great with Kraken and all the compressors in Oodle Data Compression, but you don't need Oodle Data to use Oodle Texture, they are independent.

I would imagine in-house engines and Unity/Unreal have had this for a while.
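For context on that point, block-compressed formats (which GPUs sample natively in hardware) already shrink what has to sit in VRAM and cross the bus; a quick size comparison for a 4096x4096 texture using standard BC block sizes (not specific to Basis or Oodle):

Code:
# RGBA8 is 4 bytes/pixel; BC7 packs 16 bytes per 4x4 block (1 byte/pixel);
# BC1 packs 8 bytes per 4x4 block (0.5 bytes/pixel).
width = height = 4096
sizes = {
    "RGBA8 (uncompressed)": width * height * 4,
    "BC7": width * height * 1,
    "BC1": width * height // 2,
}
for name, size in sizes.items():
    print(f"{name}: {size / 2**20:.0f} MiB")  # 64, 16, and 8 MiB respectively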
 