AMD Ryzen 3000 and Older Zen Chips Don't Support SAM Due to Hardware Limitation, Intel Chips Since Haswell Support It

erek · [H]F Junkie
Joined Dec 19, 2005 · Messages: 10,875
"It gets more interesting—Intel processors have been supporting this feature since the company's 4th Gen Core "Haswell," which introduced it with its 20-lane PCI-Express gen 3.0 root-complex. This means that every Intel processor dating back to 2014 can technically support Resizable-BAR, and it's just a matter of motherboard vendors releasing UEFI firmware updates for their products (i.e. Intel 8-series chipsets and later). AMD extensively advertises SAM as adding a 1-2% performance boost to Radeon RX 6800 series graphics cards. Since this is a PCI-SIG feature, NVIDIA plans to add support for it on some of its GPUs, too. Meanwhile, in addition to AMD 500-series chipsets, even certain Intel 400-series chipset motherboards started receiving Resizable BAR support through firmware updates."

https://www.techpowerup.com/275565/...mitation-intel-chips-since-haswell-support-it
 
If it's a free 1-2%, they might as well.

It was only a matter of time before system RAM amounts made it more feasible to make use of a larger BAR.
 
It's just like overclocking a video card :ROFLMAO: You either do it because you can, or ignore the possibility.
 
For 1-2%........nah, pass.

AMD Radeon RX 6800 XT gets up to 16% Performance Boost w/ SAM on an Intel Platform

https://www.hardwaretimes.com/amd-r...performance-boost-w-sam-on-an-intel-platform/

Japanese hardware site ASCII.jp (via VideoCardz) recently tested an ASUS ROG Maximus XII Extreme/Intel Core i9-10900K/AMD Radeon RX 6800 XT system with the feature enabled and discovered a massive 182 percent increase in minimum frame rate performance for Red Dead Redemption 2 in 1080p.

https://www.thefpsreview.com/2020/1...ng-amount-in-amd-radeon-rx-6800-xt-benchmark/
 
Even more performance on the Intel chipset? lol

But 182% is pretty crazy even if it's only at 1080p.

DirectX 12's new API, DirectStorage, will bypass the BAR altogether. Wonder what that will do for these numbers. It's (maybe) years out, though.
https://www.techspot.com/article/2137-next-gen-directx-12/
It was originally developed for the Xbox Series X/S, so it's been around. Who knows, maybe it's coming sooner rather than later.
 
How does DirectStorage remove the need to transfer data to the GPU over the PCIe bus?
 
We WANT it to transfer only over the PCIe bus. "Only" is the key word. It will still go over the PCIe bus, but it will use an entirely new mechanism.

There will be no arbitrary 256 MB transfer window, and no system RAM used for staging. Since the PCIe bus is very fast, and getting faster, this portion of the performance equation (or potential bottleneck) isn't changing much, other than that more efficient transfer sizes could be used: even 2 or 3 GB at once, instead of the 8 to 12 separate transfers through a 256 MB BAR needed to move that same 2 to 3 GB of data to VRAM. That alone could actually be a big boost.

On top of all that, one copy step to system RAM is eliminated. There has to be some positive benefit to that.
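
For fun, here's that transfer-count arithmetic as a compilable sketch. The 3 GB figure is just the example from this post, and real drivers batch and schedule these copies however they like; this only shows the ceiling division:

```c
/* Sketch: how many aperture repositions a classic 256 MB BAR needs to
 * stream a given upload, versus a fully resized BAR. Illustrative
 * arithmetic only. */
#include <stdio.h>

int main(void) {
    const unsigned long long MiB = 1024ULL * 1024ULL;
    const unsigned long long aperture = 256 * MiB;         /* legacy BAR window   */
    const unsigned long long upload   = 3ULL * 1024 * MiB; /* e.g. 3 GB of assets */

    /* ceil(upload / aperture) separate windowed copies */
    unsigned long long chunks = (upload + aperture - 1) / aperture;
    printf("256 MB aperture: %llu windowed copies\n", chunks);
    printf("resized BAR:     1 direct mapping, no windowing\n");
    return 0;
}
```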
 
That 1-2% performance gain in the GPU isn't going to make up for the 20-60% performance decreases caused by the 70+ hardware security patches going back to Sandy Bridge.
Nice try, Intel. :meh:
 
I haven't been following this BAR thing closely, but the way it is described, I don't see how it helps with framerate at all. At least if you aren't starved for VRAM.

It seems to me it would have more to do with load times than anything else.

Maybe the framerate improvements have to do with worse performance during the first few frames when stuff is still loading?
 
Here is an excerpt from the WDDM 2.0 specs, where Microsoft began supporting resizable BAR back in 2017:

"It is typical today for a discrete graphics processing unit (GPU) to have only a small portion of its frame buffer exposed over the PCI bus. For compatibility with 32bit OSes, discrete GPUs typically claim a 256MB I/O region for their frame buffers and this is how typical firmware configures them.
For Windows Display Driver Model (WDDM) v2, Windows will renegotiate the size of a GPU BAR post firmware initialization on GPUs supporting resizable BAR, see Resizable BAR Capability in the PCI SIG Specifications Library.
A GPU, supporting resizable BAR, must ensure that it can keep the display up and showing a static image during the reprogramming of the BAR. In particular, we don't want to see the display go blank and back up during this process. It is important to have smooth transition between the firmware displayed image, the boot loader image and the first kernel mode driver generated image. It is guaranteed that no PCI transaction will occur toward the GPU while the renegotiation is taking place.
For the most part this renegotiation will be invisible to the kernel mode driver. When the renegotiation is successful, the kernel mode driver will observe that the GPU BAR has been resized to its maximum size to expose the entire VRAM of the discrete GPU.
Upon successful resizing, the kernel mode driver should expose a single, CPUVisible, memory segment to the video memory manager. The video memory manager will map CPU virtual addresses directly to this range when the CPU need to access the content of the memory segment."


And as a fun point of interest, Xeons have had this feature since the X470s, dating back to 2017. NVIDIA has had it working on their professional line of cards for almost as long.
 
That's nice, but do we have any credible information about older GPU support?

I'm interested in seeing what it might do for the minimum framerates of Vega, RX 5000, Pascal, and Turing. If the architectures support it...
 
My understanding is that in CPU-limited scenarios, 1080p or 720p with RTX + DLSS, BAR should have a significant effect.
 
On gaming cards there hasn't really been a need to support it until recently. So I can see both teams using it as a selling point for the new generation, but it's been working on the Quadros for a few years when paired with a Xeon, so in theory they could. It's more to do with driver support than anything.
 
Yes, it allows the CPU to spend less time dealing with paging and data referencing and more time doing actual work. It also decreases latency in feeding VRAM.
 
Why though, if it really only deals with the transfer of assets to GPU VRAM before rendering starts?

Unless it also impacts per-frame instructions from the game engine. Doesn't seem like it would, though.

Either way, it is irrelevant to me, as I run at 4K, so I am never going to hit those CPU-limited conditions.
 
It also assists in non-rendering workloads, so as games start having the GPU do more than just render, it will help with those processes too. And it does decrease latency in feeding VRAM, so in situations where the GPU is constantly swapping assets it provides a small benefit, usually in that often-touted 1-2% range.
 
All data is transferred over the PCIe bus unless you are using NVLink or SLI to move data between cards; resizable BAR and DirectStorage are two entirely different things. One grants the CPU direct access to the VRAM; the other allows data to be accessed directly, without making an OS function call for it, and then pulled using methods more efficient for a game than for general data.
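
A toy sketch of that distinction, with plain allocations standing in for the real disk, staging, and BAR mappings. This is illustration only, not any actual API:

```c
/* Resizable BAR changes how the CPU writes into VRAM; DirectStorage
 * changes how data gets from disk to VRAM. The buffers below are
 * stand-ins so the two paths can be compared side by side. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ASSET_SIZE (4u * 1024u * 1024u)

int main(void) {
    char *fake_disk = calloc(1, ASSET_SIZE); /* stand-in for a file on NVMe */
    char *staging   = malloc(ASSET_SIZE);    /* system RAM staging buffer   */
    char *fake_vram = malloc(ASSET_SIZE);    /* stand-in for the mapped BAR */
    if (!fake_disk || !staging || !fake_vram) return 1;

    /* Traditional path: disk -> system RAM -> (through the BAR window) VRAM. */
    memcpy(staging, fake_disk, ASSET_SIZE);  /* OS read into system RAM  */
    memcpy(fake_vram, staging, ASSET_SIZE);  /* CPU copy through the BAR */

    /* A DirectStorage-style path would skip both steps: the DMA engine
     * pulls straight from NVMe to VRAM over PCIe. Resizable BAR only
     * widens the window the second memcpy goes through; it does not
     * remove the first copy. */
    puts("traditional path: 2 CPU copies; direct path: 0 CPU copies");

    free(fake_disk); free(staging); free(fake_vram);
    return 0;
}
```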
 
I can only assume that this also applies to my Threadripper 3960X, but who knows.

This was previously considered somewhat of a professional feature, so maybe it made it in?
 
I find it's the opposite for me. I specifically game on my 4K 60 Hz monitor because I am so CPU-limited I cannot go past 60. If I can, I use the 1440p 144 Hz panel instead.
 
Here is a post from Alex Deucher, one of the AMD GPU maintainers:

"Smart Access Technology works just fine on Linux. It is resizeable BAR support which Linux has supported for years (AMD actually added support for this), but which is relatively new on windows. You just need a platform with enough MMIO space. On older systems this is enabled via sbios options with names like ">4GB MMIO". I don't know what windows does exactly, but on Linux at least, it will work on any platform with enough MMIO space. I suspect windows would behave the same way (although I think windows has stricter requirements about BAR resizing compared to Linux so you may need a sbios update for your platform to make windows happy)."

So there is a real chance that, at the hardware level, the technology exists to implement it on older chipsets should they choose to. Right now, though, it is a selling point for their new hardware. I suspect that once the marketing campaign has run its course they may throw older systems a bone and enable the feature, or they may not, and instead keep it as a selling point for newer hardware.

Both AMD and NVIDIA have supported WDDM 2 on their platforms for years, and resizable BAR support was one of the key features added, so the functionality should exist on any hardware they sold that kept to spec. But I do not know whether it was an optional part of said spec, so AMD, in an attempt to cut costs, may have axed it on their older parts.

Edit:
After further reading, AMD may not be inclined to activate the feature set on older hardware for more than just driver reasons. Supposedly the configuration and validation of the PCIe bridges was somewhat costly and completely at the discretion of the board partners; they may not have done this on the 300 and 400 series boards to keep costs lower, while with the 500 series AMD greatly stepped up their requirements on board quality and validation. So while there may be a good number of boards in the 300 and 400 ranges that can meet the requirements, they would likely be in the minority, and it is again completely up to the manufacturers to test, validate, and implement. Intel, however, has had it as part of their validation the whole time, which is why they can implement the feature going back far further.
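
As an aside, on Linux you can eyeball this yourself, in the spirit of the Deucher quote above: sysfs exposes each PCI device's regions in a `resource` file as start/end/flags lines. A minimal sketch (the device address below is a placeholder; substitute your GPU's address from lspci):

```c
/* Sketch: compute a GPU's BAR sizes from Linux sysfs and flag anything
 * larger than the classic 256 MiB aperture. Each line of the resource
 * file is "start end flags" in hex; empty regions are all zeros. */
#include <stdio.h>

int main(void) {
    /* hypothetical PCI address of a discrete GPU; check yours with lspci */
    const char *path = "/sys/bus/pci/devices/0000:03:00.0/resource";
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    unsigned long long start, end, flags;
    int idx = 0;
    while (fscanf(f, "%llx %llx %llx", &start, &end, &flags) == 3) {
        if (end > start) {
            unsigned long long size = end - start + 1;
            printf("region %d: %llu MiB%s\n", idx, size >> 20,
                   size > (256ULL << 20) ? "  <-- bigger than 256 MiB" : "");
        }
        idx++;
    }
    fclose(f);
    return 0;
}
```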
 
Understood.

From the validation, capability and quality perspective, TRX40 is a completely different animal than AM4, though.
 
My 4K monitor can hit 120 Hz with VRR, but if I had to choose, I'm picking the resolution over the higher framerate.
 
Completely different animal, so it simply comes down to whether AMD will add the feature to AGESA for the manufacturers to consider implementing. I am thinking the answer may be no as long as they feel there is an economic advantage to keeping it exclusive. But I would not at all mind being wrong about that.
 
...or has resizeable BAR been there all along like with Intel Xeon products, because it was positioned as more of a workstation product?
 
The giant improvement in the minimum frame rate could be due to that (new things to load during gameplay), but if the average fps moves a lot there is probably more than that going on, especially when we see a good improvement in the maximum fps achieved.

https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/03-Memory.pdf
https://gpuopen.com/learn/vulkan-device-memory/

Someone who knows better will probably be able to explain the theory, but in general, memory pools on a video card can be much slower depending on whether they are read-write/write-only/read-only and whether or not they are visible from the CPU's point of view.

There is a small 256 MB pool that is DEVICE_LOCAL and HOST_VISIBLE, with direct access by both CPU and GPU, where the CPU can write directly into GPU memory; that is used for the game's constant data flow. If a game ever feels constrained by that 256 MB limit during gameplay, it starts using much slower access paths to communicate with the GPU.

What got bigger is the memory used during gameplay by the CPU to update GPU data, more than the memory used for, say, storing textures, from what I understand.

I think this explains it:
https://www.fasterthan.life/blog/2017/7/13/i-am-graphics-and-so-can-you-part-4-

If you lack fast device memory for the vertex/uniform/other buffers you interact with, you start using way slower memory transfers/communication.

Now that games go over 5 million triangles in a frame (and a lot more in a scene), we can imagine that 256 MB can feel limiting, especially if the game runs OpenCL/CUDA-type physics at the same time; maybe that's why a game like Forza seems to be one of the biggest winners here.
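
If anyone wants to see that small pool for themselves, here's a minimal Vulkan sketch. These are standard Vulkan calls, but it naively grabs the first physical device and trims all error handling: it enumerates the memory types and prints the heap behind any DEVICE_LOCAL + HOST_VISIBLE type. Without resizable BAR it should report roughly 256 MB; with it, the whole VRAM.

```c
/* Sketch: find the CPU-writable VRAM pool via the Vulkan API. */
#include <stdio.h>
#include <vulkan/vulkan.h>

int main(void) {
    VkApplicationInfo app = { .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
                              .apiVersion = VK_API_VERSION_1_0 };
    VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
                                 .pApplicationInfo = &app };
    VkInstance inst;
    if (vkCreateInstance(&ici, NULL, &inst) != VK_SUCCESS) return 1;

    uint32_t count = 1;
    VkPhysicalDevice dev;
    vkEnumeratePhysicalDevices(inst, &count, &dev); /* first device only */

    VkPhysicalDeviceMemoryProperties mp;
    vkGetPhysicalDeviceMemoryProperties(dev, &mp);

    for (uint32_t i = 0; i < mp.memoryTypeCount; i++) {
        VkMemoryPropertyFlags f = mp.memoryTypes[i].propertyFlags;
        if ((f & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT) &&
            (f & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)) {
            VkDeviceSize heap = mp.memoryHeaps[mp.memoryTypes[i].heapIndex].size;
            printf("CPU-writable VRAM heap: %llu MiB\n",
                   (unsigned long long)(heap >> 20));
        }
    }
    vkDestroyInstance(inst, NULL);
    return 0;
}
```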
 
My biggest question then is: if it's been supported for this long, why is it just now AMD coming out with it and pushing it? Why haven't Intel and AMD/NVIDIA had support for this for years? Seems really odd that it's been a supported feature that nobody used, leaving "free" performance on the table for no real reason.
 
From what I gather it already works in Linux and has for years, so the capability must be there to some extent for AMD to support it in Windows. They need to add two flags to make it work for Windows: as mentioned above, they need a >4G option in the BIOS before they can enable it in Windows, due to how Windows deals with the PCI bus differently than Linux does.

I was just searching AMD's site for some added information on SAM and I came across this nugget at the bottom of their official page; the image was entitled "How to Enable"

[Image: "How to Enable": the Above 4G Decoding and Re-Size BAR Support BIOS settings]
 
Yeah, I saw this too.

I feel fairly certain I've seen the Above 4G Decoding BIOS setting.

I am going to check the BIOS once I get the build reassembled again to see if the "Re-Size BAR Support" option is there.
 
I have an EPYC that is being rebuilt early next week, so while I am doing my BIOS and firmware updates I'll take a gander as well.
 
Now that you mention it, I just updated a Gigabyte H360 mobo BIOS last night and saw that option and wondered what it was for. Well, the Above 4G Decoding one; I don't think the other setting is there.
 
I mean, that 182% claim is fucking stupid. The game probably hiccuped for a second and recorded that low.

It's clearly an anomaly.
 
Unless I'm given a technical reason, I don't believe that older Ryzens can't support SAM. AMD lately seems kind of greedy with the way they're handling older motherboards as it is. Just seems like another ploy to get people to upgrade their systems for a feature that gives a 1-2% increase in performance, if supported.
 
The other part is an AMD thing; for Intel, all you need is Above 4G support to be enabled. NVIDIA should have the Intel compatibility added to their consumer cards soon. It's already active on their enterprise cards (and has been since 2017), so it's more about integration and testing than actually building new functionality.
 