AMD support for Microsoft DirectStorage 1.1

kac77


AMD is pleased to support Microsoft®’s recently released DirectStorage 1.1 with GPU Decompression. AMD has worked closely with Microsoft to ensure the best possible experience on AMD devices and platforms. DirectStorage is a feature that must be enabled by (game) application developers to realize the benefits. AMD is ready to support our ISV partners to enable DirectStorage in future game title releases.
It's nice of them to announce it; Nvidia just sort of snuck it in on the 27th with update 526.74.
I want to know, though, how AMD is helping it get implemented, or if they even have to.
Nvidia has stated they are adding support for DirectStorage into the RTX IO libraries.
https://developer.nvidia.com/blog/a...mes-and-apps-with-gdeflate-for-directstorage/
But AMD is just saying they are supporting ISV partners, so are they adding it to the FidelityFX kit, or are they just going to let developers figure it out?
 
Remember: nVidia always (ALWAYS) approaches anything from the standpoint of locking you in to their way of doing things. It is therefore natural and expected that they will try to tie this support to some custom APIs that they control, and that they will work like Hell to make sure game engine developers use those custom APIs as the One True Method. It's "The Way it's Meant to be Played." (See PhysX, CUDA, RTX, etc.)

To be fair, were AMD the market domineering bag of dicks that nVidia is, they would probably do some of that too, but since they are not, they tend to use more open APIs - like maybe the actual DirectStorage API from Microsoft, thereby rendering the need for custom APIs pointless.
 
Just tested it on a 980 Pro vs. a 5400 RPM HDD, with a 3900X and an RTX 3070

980 pro:
Uncompressed:
16 MiB staging buffer: .......... 4.90982 GB/s mean cycle time: 7896088
32 MiB staging buffer: .......... 6.44736 GB/s mean cycle time: 8199583
64 MiB staging buffer: .......... 6.56287 GB/s mean cycle time: 8842102
128 MiB staging buffer: .......... 6.54174 GB/s mean cycle time: 10393338
256 MiB staging buffer: .......... 6.22032 GB/s mean cycle time: 15157299
512 MiB staging buffer: .......... 5.89944 GB/s mean cycle time: 26585047
1024 MiB staging buffer: .......... 5.59646 GB/s mean cycle time: 55861041
ZLib:
16 MiB staging buffer: .. 0.503189 GB/s mean cycle time: 1287480489
32 MiB staging buffer: .. 0.905197 GB/s mean cycle time: 1305462716
64 MiB staging buffer: .. 1.58616 GB/s mean cycle time: 1323209400
128 MiB staging buffer: .. 2.13534 GB/s mean cycle time: 1431793526
256 MiB staging buffer: .. 2.18412 GB/s mean cycle time: 1781843332
512 MiB staging buffer: .. 1.98077 GB/s mean cycle time: 1784772790
1024 MiB staging buffer: .. 1.09736 GB/s mean cycle time: 1982930157
CPU GDEFLATE:
16 MiB staging buffer: .. 0.410827 GB/s mean cycle time: 1584944318
32 MiB staging buffer: .. 0.740777 GB/s mean cycle time: 1590561345
64 MiB staging buffer: .. 1.307 GB/s mean cycle time: 1635158088
128 MiB staging buffer: .. 1.89679 GB/s mean cycle time: 1691128289
256 MiB staging buffer: .. 2.60383 GB/s mean cycle time: 1855841834
512 MiB staging buffer: .. 2.1975 GB/s mean cycle time: 1967137204
1024 MiB staging buffer: .. 0.89858 GB/s mean cycle time: 2376362167
GPU GDEFLATE:
16 MiB staging buffer: .......... 5.32005 GB/s mean cycle time: 10716133
32 MiB staging buffer: .......... 8.81922 GB/s mean cycle time: 7166169
64 MiB staging buffer: .......... 9.90347 GB/s mean cycle time: 6714064
128 MiB staging buffer: .......... 10.49 GB/s mean cycle time: 8019200
256 MiB staging buffer: .......... 10.1158 GB/s mean cycle time: 11316506
512 MiB staging buffer: .......... 9.90797 GB/s mean cycle time: 17380972
1024 MiB staging buffer: .......... 9.48362 GB/s mean cycle time: 45833692

HDD:
Uncompressed:
16 MiB staging buffer: .......... 0.159683 GB/s mean cycle time: 20995684
32 MiB staging buffer: .......... 0.156098 GB/s mean cycle time: 17703052
64 MiB staging buffer: .......... 0.164099 GB/s mean cycle time: 18796658
128 MiB staging buffer: .......... 0.165134 GB/s mean cycle time: 26064507
256 MiB staging buffer: .......... 0.16493 GB/s mean cycle time: 22280483
512 MiB staging buffer: .......... 0.164644 GB/s mean cycle time: 28202691
1024 MiB staging buffer: .......... 0.163604 GB/s mean cycle time: 61723316
ZLib:
16 MiB staging buffer: .. 0.320242 GB/s mean cycle time: 1303216973
32 MiB staging buffer: .. 0.535054 GB/s mean cycle time: 1267760921
64 MiB staging buffer: .. 0.682731 GB/s mean cycle time: 1265948359
128 MiB staging buffer: .. 0.723447 GB/s mean cycle time: 1288554198
256 MiB staging buffer: .. 0.711416 GB/s mean cycle time: 1296578734
512 MiB staging buffer: .. 0.575066 GB/s mean cycle time: 1509217595
1024 MiB staging buffer: .. 0.580245 GB/s mean cycle time: 1468605779
CPU GDEFLATE:
16 MiB staging buffer: .. 0.328042 GB/s mean cycle time: 1544190610
32 MiB staging buffer: .. 0.547998 GB/s mean cycle time: 1559154326
64 MiB staging buffer: .. 0.813284 GB/s mean cycle time: 1585499802
128 MiB staging buffer: .. 1.03616 GB/s mean cycle time: 1594934689
256 MiB staging buffer: .. 1.01267 GB/s mean cycle time: 1610388225
512 MiB staging buffer: .. 0.988506 GB/s mean cycle time: 1627397063
1024 MiB staging buffer: .. 0.696 GB/s mean cycle time: 1957698250
GPU GDEFLATE:
16 MiB staging buffer: .......... 1.28291 GB/s mean cycle time: 14761761
32 MiB staging buffer: .......... 1.44536 GB/s mean cycle time: 14846223
64 MiB staging buffer: .......... 1.4672 GB/s mean cycle time: 14844437
128 MiB staging buffer: .......... 1.44645 GB/s mean cycle time: 16924652
256 MiB staging buffer: .......... 1.39075 GB/s mean cycle time: 20431270
512 MiB staging buffer: .......... 1.35973 GB/s mean cycle time: 26305549
1024 MiB staging buffer: .......... 1.38741 GB/s mean cycle time: 51508186

Compression ratio:
ZLib: 0.18
GDeflate: 0.16
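
For anyone who wants to sanity-check these numbers, here is a rough sketch (not part of the benchmark itself). It assumes the ratios above mean compressed size divided by uncompressed size, and that the drive's best uncompressed read rate is the only limit on compressed loads. The function name is made up for illustration.

```python
# Rough ceiling on "effective" (uncompressed) bandwidth when reading
# compressed data, ASSUMING ratio = compressed size / uncompressed size
# and that the drive's raw read rate is the only bottleneck.

def effective_ceiling(raw_gbps, ratio):
    """Uncompressed GB/s delivered if the drive streams compressed data at raw_gbps."""
    return raw_gbps / ratio

GDEFLATE_RATIO = 0.16  # from the post

# Best uncompressed and best GPU GDeflate results from the post, in GB/s.
best_uncompressed = {"980 Pro": 6.56287, "HDD": 0.165134}
best_gpu_gdeflate = {"980 Pro": 10.49, "HDD": 1.4672}

for drive, raw in best_uncompressed.items():
    ceiling = effective_ceiling(raw, GDEFLATE_RATIO)
    measured = best_gpu_gdeflate[drive]
    print(f"{drive}: ceiling {ceiling:.2f} GB/s, measured {measured:.2f} GB/s")
```

On these figures the 980 Pro result sits well under the ceiling, which would point at decompression rather than the drive as the limit. The HDD result lands above the ceiling, so this simple model is clearly missing something there (rounding of the ratio or caching are guesses, not measurements).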
 
Those are some serious gains for HDD lol
 
Remember: nVidia always (ALWAYS) approaches anything from the stand point of locking you in to their way of doing anything - therefore it is natural and expected that they will try and tie this support to some custom APIs that they control that they will work like Hell to make sure game engine developers use those custom APIs as the One True Method. It's "The Way it's Meant to be Played." (See PhysX, CUDA, RTX, etc.)

To be fair, were AMD the market domineering bag of dicks that nVidia is, they would probably do some of that too, but since they are not, they tend to use more open APIs - like maybe the actual DirectStorage API from Microsoft, thereby rendering the need for custom APIs pointless.
PhysX is still in wide use and works on any CPU.

CUDA is specific to NVIDIA's hardware, so of course it's "locked in." NVIDIA still supports OpenCL for GPGPU.

RTX is not an API.

NVIDIA is a major contributing member in the Khronos Group, which develops the open standard Vulkan API.
 
Those are some serious gains for HDD lol


I did more tests with a much heavier scene (over 4 GB of VRAM used):
https://hardforum.com/threads/directstorage-pc.2022585/#post-1045498174

Regular HDD:
With GDeflate: 3.5-3.61 s on the first run, 1.17-1.25 GB/s effective bandwidth (measured in uncompressed data)
Without GDeflate: 3.66 s, 2.4% CPU usage

SATA SSD:
With GDeflate: 2.7 s, 1.59 GB/s effective bandwidth
Without GDeflate: 2.2 s, 20% CPU usage

NVMe (980 Pro):
With GDeflate: 0.38 s, 11.4 GB/s effective bandwidth, 1.2% max CPU usage
Without GDeflate: 0.9 s, 4.9 GB/s effective bandwidth, 92-100% max CPU usage (24 threads)
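
A quick cross-check of these figures, as a sketch only: if effective bandwidth is uncompressed bytes divided by load time, then time multiplied by bandwidth should give roughly the same scene size for every run. Pairing the fastest HDD time with the highest HDD bandwidth (and the slowest with the lowest) is my assumption, as is reading the SATA time as 2.7 s and the NVMe bandwidths as GB/s.

```python
# Implied uncompressed scene size = load time (s) * effective bandwidth (GB/s).
# The pairing of the HDD range endpoints is an assumption, not stated in the post.

def implied_size_gb(seconds, gbps):
    return seconds * gbps

runs = [
    ("HDD + GDeflate (fast end)", 3.5, 1.25),
    ("HDD + GDeflate (slow end)", 3.61, 1.17),
    ("SATA SSD + GDeflate", 2.7, 1.59),
    ("NVMe + GDeflate", 0.38, 11.4),
    ("NVMe, no GDeflate", 0.9, 4.9),
]

sizes = [implied_size_gb(seconds, gbps) for _, seconds, gbps in runs]

for (name, _, _), size in zip(runs, sizes):
    print(f"{name}: about {size:.2f} GB")

# Load-time ratio on NVMe with GPU decompression vs. without.
nvme_speedup = 0.9 / 0.38
print(f"NVMe speedup from GDeflate: about {nvme_speedup:.1f}x")
```

Every run comes out between 4.2 and 4.5 GB, which lines up with the "over 4 GB of VRAM" scene, so the reported times and bandwidths at least look internally consistent.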

One "issue" with a slow HDD seems to be that, if you have a huge amount of multithreaded CPU performance available, the CPU is so good at decompressing that even without using the GPU to decompress assets it never reaches much usage, whereas with a fast NVMe drive it can use 100% of a 24-thread Ryzen 3900X if you let it.

Loading a scene in a game will usually involve more work that is not asset decompression than this very simple example of just showing some floating models in an empty 3D view, so I would not expect such a large difference in practice, but it could help. In terms of feeding assets to the GPU without taxing the CPU, it seems to work really well. GPUs are obviously parallel monsters with really fast RAM when things are set up well for them, so decompressing thousands of individual 64 KB sub-parts seems to go really fast.
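
To illustrate why independent 64 KB pieces suit parallel hardware, here is a minimal CPU-side sketch. It uses Python's zlib as a stand-in (this is not GDeflate and not the DirectStorage API), and the helper names are invented for the example. Each chunk is compressed on its own, so any chunk can be decompressed without waiting for the others.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # 64 KiB pieces, mirroring the sub-part size mentioned above

def compress_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress each fixed-size chunk independently (zlib as a stand-in codec)."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]

def decompress_chunks(chunks, workers=8):
    """Decompress chunks on a thread pool; pool.map keeps them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))

# 1 MiB of sample data splits into 16 independent chunks.
sample = bytes(range(256)) * 4096
packed = compress_chunks(sample)
restored = decompress_chunks(packed)
print(len(packed), "chunks, round trip ok:", restored == sample)
```

The point is only the structure: because no chunk depends on another, the work can be spread over as many workers as are available, which is the property a GPU would exploit at a much larger scale.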

Remember: nVidia always (ALWAYS) approaches anything from the stand point of locking you in to their way of doing anything - therefore it is natural and expected that they will try and tie this support to some custom APIs that they control that they will work like Hell to make sure game engine developers use those custom APIs as the One True Method

Here NVIDIA worked with Microsoft to make GDeflate open to all vendors, and AMD and Intel already support it. They will obviously have some paths that run faster with CUDA, and I am sure they will have their own ways as well.

Any DX12 card that supports Shader Model 6 should work, which makes it quite "old", as everyone has been seriously developing this for at least two years. The new drivers are, I think, optimizations for it rather than a requirement; maybe someone could try.

You can see Intel's similar test:
https://www.intel.com/content/www/us/en/developer/articles/news/directstorage-on-intel-gpus.html

It shows much more impressive numbers, using a 12900K platform and a 16GB Arc A770 and, I would imagine, a very fast drive setup.

Intel also just updated their demo, which streams new data while playing instead of doing simple traditional scene loading, to take advantage of it:
https://github.com/GameTechDev/SamplerFeedbackStreaming
 
Okay, all this is cool, but how do I as a consumer use this?
Game studios have to implement it. Nvidia has added it to their RTX-IO library, so if developers update that and then roll the update out as a game update, it should cover Nvidia users. No word on how AMD will push the update for their support down the channel, or if they will just piggyback off the RTX-IO update.
 
Okay, all this is cool, but how do I as a consumer use this?
As a pure consumer, you wait for the games and applications you use to take advantage of it; GPU decompression is used for non-game assets as well.

It will probably be only new games. I am not sure it will be common for people to go back and change something as relatively complicated as their asset management system.
 
Game studios have to implement it, Nvidia has added it to their RTX-IO library so if the developers update that then roll that update down as a game update it should cover Nvidia users. No word on how AMD will push the update for their support down channel or if they will just piggyback off the RTX-IO update.
OK, I have to ask; perhaps I am missing something obvious... which is always possible.
Is this not DIRECTstorage? It's a Microsoft DirectX API feature, is it not? This already works on AMD hardware... this was an MS/AMD-powered console thing first, right? It works on AMD-powered consoles... why would they need to use Nvidia's marketing-branded driver bits?

Driver enablement isn't anything a developer needs to worry about, is it? I might be missing something here. Who cares what Nvidia calls parts of their driver... are developers not just coding to Microsoft's API here?
 
Ok I have to ask perhaps I am missing something obvious... which is always possible.
Is this not DIRECTstorage. Its a Microsoft DirectX API feature is it not ? This already works on AMD hardware... this was an MS/AMD powered console first thing right ? It works on AMD powered consoles... why would they need to use Nvidia marketing branding driver bits ?

Driver enablement isn't anything a developer needs to worry about is it ? I might be missing something here. Who cares what Nvidia calls parts of their driver... are developers not just coding to Microsofts API here ?
It is a new feature that until recently was either an AMD proprietary or NVidia proprietary feature that has been standardized and added to DX12 for both.
Developers now have the choice to continue to use the brand specific versions or the standard one.

Nvidia has added the DX12 code functions to their RTX-IO libraries and added translation functions so developers can add the needed code to their game engines as easily as possible.

AMD has simply gone with a “The dev studios will figure it out” approach.
 
Is this not DIRECTstorage. Its a Microsoft DirectX API feature is it not ? This already works on AMD hardware... this was an MS/AMD powered console first thing right ? It works on AMD powered consoles... why would they need to use Nvidia marketing branding driver bits ?
Unlike PCs, consoles have specialized hardware for decompression, I think; the "Xbox Velocity" implementation could differ somewhat from the PC version even if both use the DirectStorage name.

I am not sure what AMD Smart Access Storage or NVIDIA RTX IO add to the concept, other than some marketing on top of a Windows/DX12 feature that works on Intel, AMD, and Nvidia.

If CUDA-exclusive or other vendor paths are used to beat the more generic zlib performance, that would be possible, but the great bulk of it would be quite similar.

Maybe the goal from Nvidia's and AMD's point of view is to provide developers with a Windows-independent interface to DirectStorage that switches to something else on PS5, Switch, Linux, etc.
 
Nvidia has added the DX12 code functions to their RTX-IO libraries and added translation functions so developers can add the needed code to their game engines as easily as possible.

AMD has simply gone with a “The dev studios will figure it out” approach.
APIs from Microsoft tend to be both extremely easy to use and extremely well documented, and a quick look suggests that is the case again here; it should be one of the easiest things for a game studio to figure out. That said, AMD seems to have the exact same strategy via their AMD Smart Access Memory platform brand. With AMD Smart Access Storage, they seem to be trying to make Radeon GPUs and Ryzen CPUs together utilise DirectStorage better (I imagine via Resizable BAR, a large L3, or something), while if you are not fully compatible the API will still work and fall back.

They talked about it in May:


I could see a lot of people simply going with regular DirectStorage without using either, because I don't see them bringing much more of an advantage.
 
It is a new feature that until recently was either an AMD proprietary or NVidia proprietary feature that has been standardized and added to DX12 for both.
Developers now have the choice to continue to use the brand specific versions or the standard one.

Nvidia has added the DX12 code functions to their RTX-IO libraries and added translation functions so developers can add the needed code to their game engines as easily as possible.

AMD has simply gone with a “The dev studios will figure it out” approach.
OK, I think I understand now. So you're saying AMD and MS came up with this for the Xbox, and MS has added it to DirectX, but Nvidia as always thinks their proprietary way is better. So now gamers will probably have to deal with paid-off "The Way It's Meant to Be Played" developers ignoring the DirectX standard and using Nvidia's standard instead?

I guess I should go do some reading. I'll be honest I haven't been following direct storage much at all.
 
Unlike PC, consoles had a specialized hardware for uncompression I think, "Xbox Velocity" could have some difference with the PC version even if both use the DirectStorage name.

I am not sure what AMD smartaccessstorage or NVIDIA RTX IO add to the concept, other than some marketing to a Windows-DX 12 feature that work on Intel-AMD-Nvidia.

If CUDA exclusive or others are use to beat more generic zlib performance it would be possible, but the giant bulk of it would be quite similar.

Maybe the goal from Nvidia-AMD point of view is to provide developper a Windows less interface to directstorage that shift to something else on PS5-Switch-Linux, etc...
That is logical... I will go read up. With Linux and other non-Windows platforms, though, the open-source developers tend to take care of that stuff. Having said that, yeah, perhaps this will be a pain for Valve to translate; I don't know.
 
Ok I think I understand now then. So your saying AMD and MS came up with this for the Xbox. MS has added it to DirectX. But Nvidia as always thinks their proprietary way is better. So now gamers will have to probably deal with paid off "way its meant to be played" developers ignoring the DirectX standard and using Nvidias standard instead ?

I guess I should go do some reading. I'll be honest I haven't been following direct storage much at all.
Not quite, but close enough. Nvidia came up with it for their AI learning environments, then AMD and Sony created a version of it for the PlayStation. Microsoft, not wanting to be left out, then worked with AMD to get it working on the Xbox and now the PC.

Nvidia introduced their version with the 3000 series on the consumer side, and they packaged it with their dev environment, so the act of adding DLSS and the other Nvidia goodies added the RTX-IO extensions as well.
 
That is logical... I will go read up. With Linux and non windows though the open source developers tend to take care of that stuff. Having said that ya perhaps this will be a pain for valve to translate I don't know.
Currently, there isn't anything for this on Linux that isn't proprietary; GPUDirect is about the only option out there, and it is strictly a CUDA library. Khronos hasn't announced any moves to implement this feature so far, but that doesn't mean it's not being worked on; Khronos isn't exactly forthcoming about their project goals or timelines.

https://docs.nvidia.com/cuda/gpudirect-rdma/index.html
GPUDirect was implemented along with their CUDA release for Kepler (2012).
 
Currently, there isn't anything for this in Linux that isn't proprietary, GPUDirect is about the only option out there and it is strictly a CUDA library. Kronos hasn't made any moves to implement this feature that they have announced so far, but that doesn't mean it's not being worked on, Kronos isn't exactly forthcoming on their project goals or timelines.

https://docs.nvidia.com/cuda/gpudirect-rdma/index.html
GPUDirect was implemented along with their CUDA release for Kelper. (2012)
I could see Valve wanting to get Khronos to get something done before they get to the second Deck. Sounds like it would be a great feature for the Deck.
 
Remember: nVidia always (ALWAYS) approaches anything from the stand point of locking you in to their way of doing anything - therefore it is natural and expected that they will try and tie this support to some custom APIs that they control that they will work like Hell to make sure game engine developers use those custom APIs as the One True Method. It's "The Way it's Meant to be Played." (See PhysX, CUDA, RTX, etc.)

To be fair, were AMD the market domineering bag of dicks that nVidia is, they would probably do some of that too, but since they are not, they tend to use more open APIs - like maybe the actual DirectStorage API from Microsoft, thereby rendering the need for custom APIs pointless.

You realize you could change "Nvidia" to "Microsoft" in that first sentence and it would read accurately. Microsoft's entire interest in DirectX has just been lock-in. It's the reason they continued to maintain it even during the decade+ they otherwise wrote off PC gaming.

Like it or not, financial incentive drives innovation. Companies invest in R&D to develop intellectual property, give themselves a competitive advantage, and earn a return on that investment. But there seems to be this common attitude that the output of Nvidia's R&D should all be open-source and public domain, and that competitors should be entitled to copy Nvidia's homework. And AMD "embraces open standards", never mind that it's usually while not being in a strong enough position to begin with in terms of market share and leverage. Yet we've seen how AMD can behave the moment they're on top for two seconds, like abandoning the DIY/enthusiast HEDT/workstation market as soon as Threadripper had some success, and artificially locking Zen 3 TR Pro CPUs to Lenovo systems so they were bricked when installed in any other motherboard, which also left AIB partners that had built WRX80 boards to twist in the wind. One example.

FWIW I actually understand AMD's business rationale for the TR Pro segmentation. It's just another large monolithic corporation legally obligated to do whatever they can get away with to please their shareholders. And some are worse than others with anti-competitive practices. But none of them are The Good Guys that care anything for "the community". They only tolerate us, because shareholders are their actual customers.
 