NVIDIA RTX IO Detailed: GPU-assisted Storage Stack Here to Stay Until CPU Core-counts Rise

erek ([H]F Junkie; joined Dec 19, 2005; 10,875 messages)
"NVIDIA RTX IO is a concentric outer layer of DirectStorage, which is optimized further for gaming, and NVIDIA's GPU architecture. RTX IO brings to the table GPU-accelerated lossless data decompression, which means data remains compressed and bunched up with fewer IO headers, as it's being moved from the disk to the GPU, leveraging DirectStorage. NVIDIA claims that this improves IO performance by a factor of 2. NVIDIA further claims that GeForce RTX GPUs, thanks to their high CUDA core counts, are capable of offloading "dozens" of CPU cores, driving decompression performance beyond even what compressed data loads PCIe Gen 4 SSDs can throw at them.

There is, however, a tiny wrinkle. Games need to be optimized for DirectStorage. Since the API has already been deployed on the Xbox Series X, most AAA games for Xbox that have PC versions already have some awareness of the tech; however, the PC versions will need to be patched to use it. Games will further need NVIDIA RTX IO awareness, and NVIDIA needs to add support on a per-game basis via GeForce driver updates. NVIDIA didn't detail which GPUs will support the tech, but given its wording, and the use of "RTX" in the branding of the feature, NVIDIA could release the feature to RTX 20-series "Turing" and RTX 30-series "Ampere." The GTX 16-series probably misses out because what NVIDIA hopes to accomplish with RTX IO is probably too heavy for the 16-series, and this may have purely been a performance-impact based decision for NVIDIA."
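One rough way to read the "factor of 2" claim: if assets stay compressed on the way from the SSD, every byte read off the drive expands into more than one byte of usable data, as long as something can decompress fast enough. A minimal sketch, where the 7 GB/s read speed and the 2:1 ratio are illustrative assumptions rather than NVIDIA's published figures:

```python
def effective_bandwidth_gbps(raw_read_gbps: float, compression_ratio: float) -> float:
    """Uncompressed GB/s delivered when the drive reads compressed data.

    raw_read_gbps:     sequential read speed of the drive in GB/s (assumed input)
    compression_ratio: uncompressed size / compressed size (2.0 means 2:1)
    Assumes decompression keeps up with the drive, which is the job
    RTX IO moves from the CPU to the GPU.
    """
    if raw_read_gbps <= 0 or compression_ratio < 1:
        raise ValueError("need a positive read speed and a ratio >= 1")
    return raw_read_gbps * compression_ratio


# Hypothetical PCIe Gen 4 drive reading assets compressed at 2:1.
print(effective_bandwidth_gbps(7.0, 2.0))  # 14.0
```

The multiplication only holds if the decompressor is never the bottleneck; that caveat is the whole argument for offloading it.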


https://www.techpowerup.com/271705/...stack-here-to-stay-until-cpu-core-counts-rise
 
I have 24 cores; how many more do I need? My games load in seconds. I think I personally would derive zero benefit from this tech.

Any PCIe 3.0+ x4 NVMe drive on a 10-series Intel or Ryzen platform already gets the full bandwidth of the NVMe interface. This sounds more applicable to people running slower, older stuff.

It honestly sounds like hype drivel: a feature that is somewhat useless to the everyday user. Maybe older CPUs and earlier storage tech would see a bump. Or people running quad cores.
 
Go play GTA5 and tell me it loads in seconds. 60 seconds, maybe. I don't know of any games that will use that many cores, but maybe I am wrong. Look at some of the PS5 loading time comparisons; I don't think this will be a bad thing for anyone.
 
Yeah, a lot of games take a long time to load, like ARK.
It seems like ARK takes so long to load because it extracts the assets to RAM/VRAM on a single thread: I only got a marginal improvement going from a 1700 to a 3900X, and I only see one core pegged. However, 7-Zip decompression and compression basically doubled in speed.
This is more of a software limitation than a hardware one...

Having the CPU/Thread limit removed entirely would be dang nice to have.

My Sabrent Rocket 4.0 needs something to do.....
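ARK's behaviour (one core pegged, more cores barely helping, while 7-Zip scales) is what you'd expect if asset chunks are inflated one after another. A minimal Python sketch of the difference, assuming the assets are stored as independent zlib-compressed chunks; that layout is a hypothetical stand-in, since real engines use their own formats:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_chunks(count: int, size: int) -> list[bytes]:
    """Build `count` compressed chunks standing in for packed game assets (hypothetical data)."""
    # Repetitive payloads compress well; the content itself is irrelevant here.
    return [zlib.compress(bytes([i % 256]) * size) for i in range(count)]


def decompress_serial(chunks: list[bytes]) -> list[bytes]:
    """One thread does all the work, like the single pegged core described above."""
    return [zlib.decompress(c) for c in chunks]


def decompress_parallel(chunks: list[bytes], workers: int = 8) -> list[bytes]:
    """Spread independent chunks across a thread pool.

    CPython's zlib releases the GIL while inflating, so the threads can
    run on separate cores; pool.map() keeps results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.decompress, chunks))


if __name__ == "__main__":
    chunks = make_chunks(count=32, size=1 << 20)
    assert decompress_serial(chunks) == decompress_parallel(chunks)
    print("serial and parallel outputs match")
```

The parallel version only helps because the chunks are independent; a format that has to be decoded as one long stream stays single-threaded no matter how many cores you have.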
 

Adding DirectStorage to Windows means that Microsoft will have to ensure that there is hardware that supports the new API. The API itself is meant for NVMe SSDs, and there are plenty of NVMe-compliant drives around. Meanwhile, the software giant does not say that all of them will support DirectStorage, but claims that it will be supported by ‘certain systems with NVMe drives’ that are ‘properly configured.’

Since Microsoft has yet to disclose all the peculiarities of its new API, it is unclear whether its support will mandate a particular subset of NVMe instructions (and therefore particular drives with particular firmware will be required) or whether certain things beyond the SSD are needed.
 
...It does load in seconds? The longest bit of GTA loading is the forced logo/EULA screens during bootup. Actually getting into the game takes <10 seconds from my NVMe drive.

I also have dozens of games that utilize all 24 threads of my system, both for loading and most of the rest of the time. It's not uncommon now.

My longest game load times are my biggest Cities: Skylines cities, but those are loading thousands of ridiculous assets thanks to mods and such.

I'm curious how this technology is gonna work with my cache setup, but I guess I'll worry about that complication when it releases.
 
Ark is ready for me in 15 seconds or less.

I even have mods.

And as for GTA 5... you can't cherry-pick one game and apply its slow loading to everything else.

Once I moved to NVMe, no more load stutter, no more slow access times; things really zipped along. Yes, there are benchmarks showing SATA and NVMe are almost identical in game loading, but they are not identical when loading on-demand assets during peak gameplay.

I'm sure this NV tech will be nice, but I'm not sure how nice. It sounds a little like those Killer gaming NICs to me: more hype than it's worth.
 
It was just an example.

Also, try another map like Tunguska with 40 mods... It takes like 3-4 minutes to load on my 3900X with a PCIe 4.0 NVMe drive, with like 6% CPU activity and hardly any disk activity (less than 10%).

The Island map loads fine, yes, but it could be soooooooo much faster.
A map is like 4 GB (compressed), give or take, and my SSD can read that in 1 second easily, so why does it take minutes? Software needs to catch up.

We have so much power and speed, but we're not using it to its full potential.

DirectStorage and RTX IO can only help.
 
If the CPU is utilized at 6% with the map taking 6 minutes to load, how is RTX IO going to help? If only 6% of the CPU is being used, isn't the game already under-optimized? Imagine if the game makers bumped CPU usage up to 20% while doing IO.
 
It’ll also require game developers to be on board, to some degree. This stuff works in the upcoming PlayStation 5 and Xbox Series X because developers can target the hardware and APIs in those systems directly. They know what hardware will be available and what that hardware is capable of. A game developer can’t develop a game for a person with an RTX card and a high-end NVMe SSD when there are people out there with AMD cards and games stored on rotational media. AMD will have to present its own IO solution, too, then. That also seems likely considering that AMD is behind the SoCs in the upcoming consoles.
From here:
https://techreport.com/news/3473104/what-is-nvidia-rtx-io/
 
I think that the biggest speedup won't be RTX IO itself, but that when developers start coding for accelerated load times, they'll actually be putting some effort toward optimizing them in the first place.
 
I read somewhere that storage calls are sequential today.

Microsoft's DirectStorage allows the game to bypass the OS and talk directly to I/O.

This will allow the game to make more I/O calls than it does today. Making more I/O calls will help utilize the SSD better.

The CPU doesn't come into the picture with DirectStorage.
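The "more I/O calls in flight" idea can be illustrated with plain POSIX `pread` from Python. This is not the DirectStorage API (a Windows/C++ interface); it only shows the queue-depth difference the post describes. The 64 KiB request size and 32 in-flight requests are made-up values, and `os.pread` is Unix-only:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 64 * 1024  # hypothetical size of one asset read


def read_sequential(fd: int, offsets: list[int]) -> list[bytes]:
    """One request in flight at a time: the drive idles between calls."""
    return [os.pread(fd, BLOCK, off) for off in offsets]


def read_concurrent(fd: int, offsets: list[int], in_flight: int = 32) -> list[bytes]:
    """Keep up to `in_flight` requests outstanding so an NVMe drive can service them in parallel.

    pread() takes an explicit offset, so threads can share one file descriptor safely.
    """
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        return list(pool.map(lambda off: os.pread(fd, BLOCK, off), offsets))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(os.urandom(BLOCK * 256))
        f.flush()
        offsets = [i * BLOCK for i in range(256)]
        assert read_sequential(f.fileno(), offsets) == read_concurrent(f.fileno(), offsets)
        print("same bytes, different number of requests in flight")
```

Both functions return identical data; the only difference is how many requests the drive sees at once, which is what lets an SSD's internal parallelism actually get used.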
 
If the CPU is sitting at 6% utilization and it takes 6 minutes to load game assets, the disk itself is the bottleneck and/or there are zillions of tiny assets (not just a single WAD/CAB file) that simply take that long to pull from the disk into system memory.
If the CPU is also doing that little, it probably means the assets are not compressed in any way, so pulling that much more raw data from the disk, in small chunks, will take that much longer.

If the CPU is doing much more, it probably means the assets are compressed: the CPU has to work harder, but it pulls less data, compressed in larger chunks, from the disk.
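The trade-off above can be put into a crude model: disk time, plus a fixed cost per request, plus decompression time. Every parameter here (request counts, 1 ms per request, 5 GB/s reads, a 1 GB/s single-core decompressor) is a made-up illustration, not a measurement of ARK or any other game:

```python
def load_time_s(total_gb: float, n_requests: int, read_gbps: float,
                per_request_ms: float, ratio: float = 1.0,
                decomp_gbps: float = float("inf")) -> float:
    """Crude load-time model that adds each stage serially.

    total_gb:       uncompressed size of the assets being loaded
    n_requests:     number of separate reads issued one after another
    read_gbps:      drive read speed in GB/s
    per_request_ms: fixed cost per read (syscall, file lookup, small-file latency)
    ratio:          compression ratio; bytes read from disk = total_gb / ratio
    decomp_gbps:    CPU decompression speed in uncompressed GB/s (inf = stored raw)
    """
    disk_s = (total_gb / ratio) / read_gbps
    overhead_s = n_requests * per_request_ms / 1000.0
    decomp_s = total_gb / decomp_gbps  # 0.0 when decomp_gbps is inf
    return disk_s + overhead_s + decomp_s


# Every input below is a made-up illustration, not a measurement.
tiny_files = load_time_s(4, n_requests=200_000, read_gbps=5, per_request_ms=1.0)
packed = load_time_s(4, n_requests=100, read_gbps=5, per_request_ms=1.0)
packed_compressed = load_time_s(4, n_requests=100, read_gbps=5, per_request_ms=1.0,
                                ratio=2.0, decomp_gbps=1.0)
print(round(tiny_files, 1), round(packed, 1), round(packed_compressed, 1))  # 200.8 0.9 4.5
```

With those invented numbers, the tiny-file case spends about 200 s on per-request overhead while the bytes themselves take under a second, which is one way a load can crawl at low CPU and low disk utilization. The compressed case reads half as much but becomes bound by the single decompression core, the cost RTX IO proposes moving to the GPU. A real loader overlaps these stages rather than adding them.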

That's great for you, but for many others who only have 4 cores to work with, this technology will be a massive boon, giving them the benefit without having to upgrade their entire system and drop over $1000 on a CPU alone.
You're right, you probably won't benefit from this tech in your use-case scenario; that doesn't negate the benefits for many others, or for the technology and tech-industry advancements that will gain from it in a massive way.
 
Yeah, for compatibility, storage calls are sequential. What I want to know is whether NVIDIA's I/O is something proprietary, or is that just what they are branding their implementation of Microsoft DirectStorage? MDS is being rolled into DX12, so it should be vendor-agnostic, and AMD is already using it to some degree on the Xbox. The question is whether they will enable it in their PC drivers.
 
Good answer, and thanks; I was asking a serious question. I'm not a software coder and never claimed to be. It's just that 6% seems highly underutilized for a AAA game's asset load, is what I was thinking.

Maybe this tech will be nice, or maybe it's just fluff. We shall see.

I truly hope it blows the doors off of long loading times. We shall see indeed.
 
Thanks for re-answering this. It makes more sense to me. I just do not understand in-depth software programming. And from the consumer point of view, outside looking in, I just want to know that what I am being sold is legit compared to my very expensive and very capable Threadripper machine as it stands.
 
GTA5 isn't really a good example. Single player isn't too bad for me on a SATA SSD and FX-8350 CPU: probably 1-2 mins on average. What kills it is GTA Online. That takes another 3-5 minutes due to their networking setup. Red Dead 2 loads far faster on the same system for single player and multiplayer: 2-3 mins tops for multiplayer.
 
I think people are overlooking the fact that this will allow far faster streaming of textures and better utilization of VRAM. With how UE5 works, they basically keep the full-res texture on the drive, and the engine will LOD in a far more dynamic fashion. So with more VRAM and a faster drive, you can have a much higher level of detail with less texture pop-in in open-world games, along with eliminating the CPU cost that would normally come with moving that much data between the drive and the GPU.

This isn’t really about initial load time.
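The streaming idea in this post (full-resolution textures on the drive, the engine deciding how much detail to keep in VRAM) can be sketched as mip selection under a memory budget. This is a hypothetical greedy policy for illustration only, not how UE5 or RTX IO actually schedule streaming; the `Texture` fields and the distance-based priority are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Texture:
    name: str
    full_res_bytes: int  # size of mip 0 (full resolution)
    distance: float      # distance from the camera; closer = higher priority
    mip_count: int       # number of mip levels available (0 = finest)


def mip_bytes(tex: Texture, level: int) -> int:
    """Each mip level halves width and height, so it needs a quarter of the bytes."""
    return tex.full_res_bytes >> (2 * level)


def choose_mips(textures: list[Texture], budget: int) -> dict[str, int]:
    """Start every texture at its coarsest mip, then refine the closest ones while the budget allows.

    If even the coarsest mips exceed the budget, they are returned unchanged.
    """
    levels = {t.name: t.mip_count - 1 for t in textures}
    used = sum(mip_bytes(t, levels[t.name]) for t in textures)
    for t in sorted(textures, key=lambda tex: tex.distance):
        while levels[t.name] > 0:
            finer = levels[t.name] - 1
            extra = mip_bytes(t, finer) - mip_bytes(t, levels[t.name])
            if used + extra > budget:
                break
            levels[t.name] = finer
            used += extra
    return levels


MiB = 1 << 20
scene = [Texture("near", 64 * MiB, 5.0, 4), Texture("far", 64 * MiB, 200.0, 4)]
print(choose_mips(scene, 80 * MiB))  # {'near': 0, 'far': 1}
```

With more VRAM (a bigger `budget`) the far texture also gets its full-resolution mip; with a faster drive, refinements like these can be fetched as the camera moves instead of being preloaded, which is the pop-in point the post makes.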
 
Question... would this tech work at all with a traditional HDD? Lots of upcoming titles are now stating an SSD as a requirement, and lots of people are blowing that off. But if games stream textures using DX12U DirectStorage, would that cause crashes or other forms of instability?
 
Likely of little benefit even with a standard SSD. This will only be useful on NVMe drives, where the bandwidth is so high it would normally hammer the CPU.
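The "hammer the CPU" point is simple arithmetic: the faster the drive feeds compressed data, the more cores it takes to decompress it in real time. A sketch with assumed inputs: roughly 0.55 GB/s for a SATA SSD, 7 GB/s for a fast PCIe Gen 4 drive, 2:1 compression, and a hypothetical 0.5 GB/s of output per CPU core (real codecs vary a lot):

```python
import math


def cores_to_keep_up(read_gbps: float, ratio: float, per_core_gbps: float) -> int:
    """CPU cores needed to decompress as fast as the drive delivers compressed data.

    read_gbps:     drive read speed in GB/s of compressed data
    ratio:         compression ratio (uncompressed / compressed)
    per_core_gbps: uncompressed GB/s one core can produce (codec-dependent assumption)
    """
    output_gbps = read_gbps * ratio
    return math.ceil(output_gbps / per_core_gbps)


print(cores_to_keep_up(0.55, 2.0, 0.5))  # 3  (SATA-class drive)
print(cores_to_keep_up(7.0, 2.0, 0.5))   # 28 (Gen 4 NVMe-class drive)
```

With the same made-up per-core rate, the SATA case needs a few cores and the Gen 4 case needs closer to thirty, which is the gap between "of little benefit" and "worth offloading to the GPU".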
 
Yes and no. Your system is still pulling those textures in sequentially, one at a time, using the operating system's IO API. This would bypass that and pull them directly in parallel. Traditional HDDs, being mechanical storage, can only read from one location at a time; SSDs don't have that limitation. So while it may already be really fast for you, this could make it faster, and it has the benefit of requiring the GPU to store less in memory, which improves efficiency across the board. So DirectStorage may not improve your load times overly much, but you would likely see small improvements elsewhere. The best part, though, is that it lets programmers and developers do things that weren't feasible without it, so it raises the bar for environmental effects, map sizes, detail, the works.
 
Did you just answer your own question 🤔

My guess: Microsoft will mandate minimum specs for DirectStorage, like we had for the Windows Vista Aero effects.
 
Thanks for that answer. Clarifies it more.
 
This wouldn’t help much, if at all, with SATA drives due to how AHCI works. You need NVMe and the ability to access data in parallel across the bus.
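The AHCI point comes down to queue limits: AHCI exposes one command queue with 32 slots, while the NVMe specification allows up to 65,535 I/O queues with up to 65,536 commands each. Little's law turns "requests in flight" into a throughput ceiling. The 100 µs per-read latency below is a placeholder, not a measured figure:

```python
# Command-queue limits from the AHCI and NVMe specifications.
AHCI_QUEUES, AHCI_DEPTH = 1, 32           # one queue with 32 command slots
NVME_QUEUES, NVME_DEPTH = 65_535, 65_536  # maximum I/O queues and entries per queue


def max_iops(outstanding: int, latency_us: float) -> float:
    """Little's law: completions per second = requests in flight / seconds per request."""
    return outstanding * 1_000_000 / latency_us


# 100 us per read is a placeholder latency, not a measured figure.
print(max_iops(AHCI_QUEUES * AHCI_DEPTH, 100))  # 320000.0
```

This is only an upper bound: a real drive saturates long before NVMe's queue limits matter, and a SATA SSD is also capped by its roughly 600 MB/s link. The point is that AHCI's single 32-deep queue caps how much parallelism software can even ask for.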
 
Is there a lossy decompression?
Yes, lossy compression is a huge thing in images and videos. I don't imagine all of your 4K textures are stored losslessly. That said, for loading data like vertices, normals, binormals, etc. into a GPU (or imagine loading a scientific workload), it wouldn't make sense to be anything but lossless :p.
 
DEcompression
 
P.S. This can't be all that special if they already announced support for Turing ;).

"RTX IO is supported on all GeForce RTX Turing and NVIDIA Ampere-architecture GPUs."

Basically they implemented DirectStorage. Yay, let's amp this up like it's NVIDIA coming up with a new way to game, lol. No, they are implementing a feature of an existing API that M$ and AMD designed for the Xbox. PS5 has similar technology as well. This is more "NVIDIA jumps on the bandwagon and supports DirectStorage" than NVIDIA's newest, shiniest tech that you can't live without!
 
I mean, if it's compressed lossy... one would assume what you decompress is lossy ;). I don't imagine many algorithms allow for compressing lossy and decompressing lossless.
That's an awkward way to put it. Lossless decompression means you get 1:1 of whatever is in the compressed container. Lossy decompression would imply that the resulting data is different from what is in the container.

And the article doesn't imply that this would change the way data is compressed in the first place, just how it would be loaded.
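The point these posts converge on, that lossiness is decided when the data is compressed and the decompressor just reproduces whatever was stored, fits in a few lines. zlib is Python's standard lossless codec; the quantizer is a toy stand-in for a lossy encoder (real texture formats such as BCn work differently):

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """zlib is lossless: decompressing returns exactly the bytes that went in."""
    return zlib.decompress(zlib.compress(data))


def lossy_quantize(samples: list[int], step: int) -> list[int]:
    """A toy lossy 'encoder': snap each value down to a multiple of `step`.

    The discarded remainder is gone at encode time; no decoder can bring it back.
    """
    return [s // step * step for s in samples]


if __name__ == "__main__":
    original = bytes(range(256)) * 4
    assert lossless_roundtrip(original) == original
    print(lossy_quantize([3, 17, 130, 255], 16))  # [0, 16, 128, 240]
```

Decoding the quantized values gives back exactly `[0, 16, 128, 240]`; the loss happened in `lossy_quantize`, not in any later decompression step.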
 
Yeah, it's an odd way to state it for sure. Most people would call it lossless compression, not decompression, as whether it's lossy or not is based on how it was compressed :).
 
I did, sort of. I still have questions, but they would be a development issue. I’ll just tell my friends that if they aren’t building with NVMe at this stage, they are being idiots. Problem solved.
 
I wound up with NVMe because the board I'm using has three M.2 slots and only one of those supports SATA. While this is a higher-end board, I kind of expect that to be what manufacturers do more for enthusiast boards. Especially as 4TB and larger SSDs start to fall to reasonable prices.
 
Yeah there's no such thing as lossy decompression.
 