Anything preventing Nvidia from implementing SAM with its cards?

VirtualMirage

Limp Gawd
Joined
Nov 29, 2011
Messages
384
So, I have been doing some reading about the SAM (Smart Access Memory) feature that AMD is touting with their latest video cards when paired with a Zen3 processor. This feature, when enabled, stands to gain an extra 4-13% in performance in games on average. Seeing that SAM is essentially just taking advantage of new features in the latest Windows 10 that allow hardware to access more than the 256MB limit of the BAR (Base Address Register), would it be possible for Nvidia to leverage this capability as well and also see a potential performance gain? It should be doable with just some driver updates, I would think, provided the rest of the hardware ecosystem supports the capability.

Also, from what I have read, this could even be possible with Zen2 and other processors as well. There is a possibility that AMD may try to leverage the feature on their Zen2 lineup after the newness of Zen3 dies down.

Thoughts?
 

jeremyshaw

[H]F Junkie
Joined
Aug 26, 2009
Messages
12,466
Is the limit even 256MB? Like, where is that limitation codified and written down? In a recent Raspberry Pi CM4 review, the reviewer found AMD cards needed at least 1GB, but Nvidia cards were okay with just 256MB, and the Pi4's SoC defaulted to a lower allocation for its PCIe slot.

Should be noted he's using very old GPUs, so these may not apply to newer cards. He did get a newer AMD card to test, but results pending.

https://www.jeffgeerling.com/blog/2020/external-gpus-and-raspberry-pi-compute-module-4

BAR space woes

PCI express devices require BAR ('Base Address Register') space to be able to initialize and map memory to the computer's own memory space, and the Pi currently gives devices 64 MB of RAM for this purpose.

That's fine for a simple device like the VL805 USB 3.0 controller used in the regular Pi 4, and for things like NVMe drives and SATA adapters. But GPUs require a lot more BAR space, so after a bit of research, I documented the process for expanding the BAR space on the Pi to 1 GB after some help from a couple engineers on the Pi Forums.

The Nvidia GPU was happy with just 256 MB of RAM available, but the AMD Radeon wanted 1 GB. This also means that the lowest-end 1 GB Compute Module 4 might not be adequate for certain applications when you want to pair them with PCIe devices.
 

chameleoneel

2[H]4U
Joined
Aug 15, 2005
Messages
3,733
So, I have been doing some reading about the SAM (Smart Access Memory) feature that AMD is touting with their latest video cards when paired with a Zen3 processor. This feature, when enabled, stands to gain an extra 4-13% in performance in games on average. Seeing that SAM is essentially just taking advantage of new features in the latest Windows 10 that allow hardware to access more than the 256MB limit of the BAR (Base Address Register)
It's very likely features like this are being brought to Windows because MS was seemingly more involved this time 'round in AMD's hardware R&D for the new consoles. And it just so happens that both the CPU and GPU technology the consoles are based on is also the same base tech used for AMD's next round of PC hardware. It's never been like this before, where console R&D was for the same hardware going into PCs. In the past, it's always been highly custom stuff, half of which wasn't even compatible with a PC environment.

Anyway, I assume Nvidia could feasibly come up with an analogous solution. But if it requires firmware updates (BIOS updates), would they be able to get everyone to do it? The cool thing happening right now with AMD is that they are getting to show the power of a company that makes CPUs, GPUs, and the motherboard chipsets to make them work. And this feature is essentially back-patched to an existing motherboard chipset platform. I have a feeling this is just scratching the surface, and their next platform could have even more synergistic features, especially now that MS seems interested in actually expanding Windows and its relevant APIs.
 

VirtualMirage

Limp Gawd
Joined
Nov 29, 2011
Messages
384
That isn't the same.

RTX IO is relying on the new DirectStorage features that give the video card direct access to the storage drive (primarily M.2 type SSDs) to help speed up streaming data by cutting out the middle man to the drive. To take advantage of this, it needs to be written into the games by the developers.

SAM concerns the path between the CPU and the GPU's memory: it lets the BAR be sized beyond the 256MB that Windows previously allowed to be reserved for the GPU. The latest version of Windows allows the BAR for the GPU to be much larger, and the size can be adjusted on the fly (if I am understanding it correctly). Unlike RTX IO, this can produce performance gains in games that weren't specifically written to take advantage of it, as seen in AMD's demonstration.
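
To put rough numbers on that, here is a minimal Python sketch of the BAR-size arithmetic. It assumes a hypothetical 16 GiB card and relies on the fact that BAR sizes are powers of two; the Resizable BAR capability encodes them as 2^(n+20) bytes, as far as I understand the spec. The function names are my own, not from any driver or API.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def rebar_size_bytes(encoded: int) -> int:
    """Decode a Resizable BAR size field: 0 -> 1 MiB, 1 -> 2 MiB, and so on."""
    return 1 << (encoded + 20)


def smallest_bar_covering(vram_bytes: int) -> int:
    """Smallest power-of-two BAR size (at least 1 MiB) that spans all of VRAM."""
    size = MIB
    while size < vram_bytes:
        size *= 2
    return size


def visible_fraction(bar_bytes: int, vram_bytes: int) -> float:
    """Fraction of VRAM the CPU can address through the BAR window at once."""
    return min(bar_bytes, vram_bytes) / vram_bytes


vram = 16 * GIB  # assumed card size, purely for illustration

print(visible_fraction(256 * MIB, vram))   # 0.015625, i.e. about 1.6% of VRAM
print(smallest_bar_covering(vram) // GIB)  # 16
```

So with the classic 256MB window, the CPU only sees a small slice of the card's memory at any moment, which is the gap a larger BAR is meant to close.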

TL;DR
One method involves access to storage while the other involves access to memory.
 

VirtualMirage

Limp Gawd
Joined
Nov 29, 2011
Messages
384
Is the limit even 256MB? Like, where is that limitation codified and written down? In a recent Raspberry Pi CM4 review, the reviewer found AMD cards needed at least 1GB, but Nvidia cards were okay with just 256MB, and the Pi4's SoC defaulted to a lower allocation for its PCIe slot.

Should be noted he's using very old GPUs, so these may not apply to newer cards. He did get a newer AMD card to test, but results pending.

https://www.jeffgeerling.com/blog/2020/external-gpus-and-raspberry-pi-compute-module-4
Linux has had this feature for a while too, and the GPU manufacturers have been taking advantage of it on that platform. That might also be why you can see the larger BAR sizes on the Raspberry Pi. But it wasn't until the most recent versions of Windows that going beyond 256MB became possible, and it appears it hasn't been fully taken advantage of yet.

And I believe it does require a level of hardware support as well. So the older GPUs used in the article you linked may or may not be able to fully take advantage of the larger BAR sizes. The amount of performance improvement will also hinge on the application that is running and whether it can leverage the larger size.
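
If anyone wants to check what BAR sizes their card actually gets on Linux, here is a rough Python sketch. It reads the per-device `resource` file that sysfs exposes, where each line is a start address, end address, and flags in hex. The PCI address you pass in and the sample data below are made up for illustration, so treat this as a starting point rather than a tested tool.

```python
from pathlib import Path


def parse_resource(text: str) -> list[int]:
    """Return the size in bytes of each region listed in a sysfs 'resource' file.

    Each line holds three hex fields: start, end, flags.
    Unused regions are all zeros and are reported as size 0.
    """
    sizes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end, _flags = (int(f, 16) for f in fields)
        sizes.append(0 if start == 0 and end == 0 else end - start + 1)
    return sizes


def bar_sizes(pci_address: str) -> list[int]:
    """Read region sizes for a device, e.g. '0000:01:00.0' (address is an example)."""
    path = Path("/sys/bus/pci/devices") / pci_address / "resource"
    return parse_resource(path.read_text())


# Made-up sample contents, not taken from a real card.
SAMPLE = """\
0x00000000e0000000 0x00000000efffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000f0000000 0x00000000f01fffff 0x000000000014220c
"""

print([s // (1024 * 1024) for s in parse_resource(SAMPLE)])  # [256, 0, 2]
```

A 256 MiB region on a card with several GB of VRAM would suggest the BAR has not been resized.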
 