Memory Bandwidth and Stacked Memory

Liger88 · Jul 9, 2014

One of the things I'm weak in is understanding how GPU memory bandwidth affects performance. I know that at higher resolutions bandwidth is critical because more data needs to be transferred, but one thing I don't get is how it would affect current systems.

For instance, on the Nvidia side we know Pascal/Volta is expected to have stacked memory, with claims of 1 TB/s being possible. My question to the experts would be: how does this affect performance if these cards come out before mainstream 4K adoption? I mean, would you see a huge increase in performance at 1080p from a 500 GB/s or 1 TB/s card at this point? I'm assuming bandwidth is useless if you don't have the power to drive it, as we currently see with RAM sizes.

I'm just trying to see if it's snake oil for most people still on 1080p, or if it'll actually offer huge performance gains before the world moves to 4K, which will obviously need far more bandwidth. Any educational knowledge is welcome, as I'm still trying to fully understand all aspects of modern-day GPUs.

PcZac · Jul 9, 2014

The most basic thing is, if you have a large amount of memory and low bandwidth, it restricts your access to that extra memory and makes it useless. It's part of the reason the PS4 is doing a lot better in graphics performance than the Xbox One.

If you can find a series of cards that came in both DDR3 and GDDR5 versions, the ones with the higher-bandwidth RAM would perform 1.5x-2x better in games.

Odellus · Jul 9, 2014

Well, looking at Watch Dogs, people with 290/290Xs (512-bit, 320 GB/s) don't get stutter, whereas people with Titans (384-bit, 288 GB/s) do. That's the only game I know of where memory bandwidth matters.

PcZac · Jul 9, 2014

Memory bandwidth matters before the point of stuttering. Bandwidth is probably the main reason 290/290Xs get better performance than the Titans in some games; in other games the higher bandwidth doesn't matter, so the 290/290Xs fall behind the Titans.

Unknown-One · Jul 9, 2014

Odellus said:
Well, looking at Watch Dogs, people with 290/290Xs (512-bit, 320 GB/s) don't get stutter, whereas people with Titans (384-bit, 288 GB/s) do. That's the only game I know of where memory bandwidth matters.

What about people with Titan Blacks (384-bit, 336 GB/s)?

Curious if bus width vs. clock speed has an impact (even if two cards have equal total bandwidth).

Armenius · Jul 9, 2014

Right, but what will games be like in 2016 when Pascal is released? Game engines today probably would not see any benefit from these technologies, but future game engines could very well be designed around these features that could take full advantage of the increased bandwidth and shorter pipeline between CPU and GPU.

Liger88 · Jul 9, 2014

Unknown-One said:
What about people with Titan Blacks (384-bit, 336 GB/s)?

Curious if bus width vs. clock speed has an impact (even if two cards have equal total bandwidth).

See, that's what I was thinking too. Perhaps bus width and other things in general come into play that "blur" the lines, so to speak. Technically, if I understand correctly, you could easily increase bandwidth artificially right now by using faster memory, a wider bus, or a combination of both.
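As a rough illustration of the "faster memory or wider bus" point, peak memory bandwidth is just bus width times per-pin data rate. The sketch below reproduces the figures quoted in this thread; the per-pin data rates (5, 6 and 7 Gbit/s) are assumed round values chosen to match those figures, and the function name is made up for the example.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * per-pin data rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, assumed effective per-pin data rate in Gbit/s)
cards = {
    "R9 290X": (512, 5.0),
    "Titan": (384, 6.0),
    "Titan Black": (384, 7.0),
}

for name, (bus_bits, rate) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(bus_bits, rate):.0f} GB/s")
```

Note that a wider bus at a lower clock and a narrower bus at a higher clock can land on the same peak number; this arithmetic says nothing about whether they behave identically in practice.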

Seems like only game developers would be the ones to properly answer some of these questions regarding how much the specs matter and when.

LordEC911 · Jul 9, 2014

It is all about ASIC design.

The reason that 2.5D/3D stacked DRAM is so interesting for GPUs is that the #1 efficiency loss comes from having to go off die. The combined benefits of more bandwidth and lower power consumption mean everything in HPC and also matter for gaming, though the gaming benefits won't necessarily be as large.

KazeoHin · Jul 10, 2014

Memory bandwidth is a big player in static data. In order to calculate the value of a given pixel in screen-space, the device has to run through a shader. Shaders use math and memory to calculate values. If you have a shader that uses few textures but a ton of math (think something like a water or volumetric surface), the memory bandwidth won't really affect its render times. If instead you have a shader that requires ten separate textures and uses tricks like screen-space effects, it will require a ton of bandwidth to render quickly. Most modern engines try to use shaders which go half-half. If an effect is too costly to do in math in real time, the artist can 'bake' the effect into a texture using powerful software (things like subsurface scattering or flow maps). With the texture baked, the system can simply read the values from the texture instead of calculating them per-pixel.
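The math-heavy vs. texture-heavy distinction can be sketched with a very crude model: a pass takes as long as the slower of its compute time and its memory time. Everything here is a made-up illustration (the GPU throughput, the per-pixel costs, and the function name are all assumptions), and it ignores caches, compression and overlap, but it shows why doubling bandwidth only helps the texture-heavy shader.

```python
def pass_time_ms(pixels, flops_per_pixel, bytes_per_pixel, peak_gflops, bandwidth_gbs):
    """Return (time in ms, limiting resource) for one full-screen shader pass."""
    compute_ms = pixels * flops_per_pixel / (peak_gflops * 1e9) * 1e3
    memory_ms = pixels * bytes_per_pixel / (bandwidth_gbs * 1e9) * 1e3
    bound = "memory" if memory_ms > compute_ms else "compute"
    return max(compute_ms, memory_ms), bound


PIXELS_1080P = 1920 * 1080
PEAK_GFLOPS = 5000.0  # hypothetical GPU, not a real card

# Hypothetical per-pixel costs: (flops, bytes fetched from memory)
shaders = {
    "math-heavy (water)": (2000, 16),
    "texture-heavy (10 textures)": (200, 160),
}

for bandwidth in (300.0, 600.0):
    for name, (flops, nbytes) in shaders.items():
        ms, bound = pass_time_ms(PIXELS_1080P, flops, nbytes, PEAK_GFLOPS, bandwidth)
        print(f"{bandwidth:.0f} GB/s | {name}: {ms:.2f} ms ({bound}-bound)")
```

Under these assumed numbers the texture-heavy pass roughly halves its time when bandwidth doubles, while the math-heavy pass does not move at all.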

Liger88 · Jul 10, 2014

KazeoHin said:
Memory bandwidth is a big player in static data. In order to calculate the value of a given pixel in screen-space, the device has to run through a shader. Shaders use math and memory to calculate values. If you have a shader that uses few textures but a ton of math (think something like a water or volumetric surface), the memory bandwidth won't really affect its render times. If instead you have a shader that requires ten separate textures and uses tricks like screen-space effects, it will require a ton of bandwidth to render quickly. Most modern engines try to use shaders which go half-half. If an effect is too costly to do in math in real time, the artist can 'bake' the effect into a texture using powerful software (things like subsurface scattering or flow maps). With the texture baked, the system can simply read the values from the texture instead of calculating them per-pixel.

That's exactly what I was looking for. Thanks!

Red Falcon · Jul 10, 2014

Excellent thread with good answers, subbed.

KazeoHin · Jul 10, 2014

Liger88 said:
That's exactly what I was looking for. Thanks!

No probs! I should also elaborate a bit: your resolution affects the importance of the buffer, as do AA and AF. Obviously texture resolution plays a role, but things like AF multiply the impact of texture scale, and AA multiplies the impact of resolution settings.

Anisotropic filtering will essentially save a ton of smaller, more screen-efficient versions of a given texture. Generally, with standard mipmapping (the effect AF enhances), 33% of the texture's file size is reserved for these smaller textures. With 2x AF, that percentage goes to 50%. When you crank the AF up to 16x, the mips take up vastly more space in memory than the original-sized texture. When you do this with 2k, 4k or even 8k textures, you get HUGE amounts of data that take little time to actually calculate and draw, but that clog up the memory lines and may require a ton of clock cycles to fully read from memory.
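The 33% figure for standard mipmapping can be checked with a few lines of arithmetic: each mip level has a quarter of the texels of the one above it, so the whole chain adds roughly one third on top of the base texture. This sketch only models a plain mip chain for an uncompressed texture at an assumed 4 bytes per texel; it does not model anisotropic filtering or any particular driver's storage, and the function name is made up for the example.

```python
def mip_chain_bytes(width, height, bytes_per_texel=4):
    """Return the size in bytes of every mip level, from the base level down to 1x1."""
    sizes = []
    w, h = width, height
    while True:
        sizes.append(w * h * bytes_per_texel)
        if w == 1 and h == 1:
            break
        # Each level halves both dimensions, never going below one texel.
        w, h = max(1, w // 2), max(1, h // 2)
    return sizes


for side in (2048, 4096, 8192):
    sizes = mip_chain_bytes(side, side)
    base, extra = sizes[0], sum(sizes[1:])
    print(f"{side}x{side}: base {base / 2**20:.0f} MiB, "
          f"mips add {extra / 2**20:.1f} MiB ({extra / base:.1%})")
```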

AA is generally better known for how it affects memory. But what many people don't realise is that some AA methods (like MLAA or FXAA) don't hit the memory much, but rather add instructions to the math side of things. So there are alternatives to clogging your memory with typical AA.
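To put rough numbers on how resolution and multisample AA scale the buffers the GPU has to read and write, here is a back-of-the-envelope sketch. It assumes an uncompressed render target with 4 bytes of colour and 4 bytes of depth per sample, which is a simplification (real hardware compresses these buffers), and the function name is made up for the example.

```python
def framebuffer_bytes(width, height, samples=1, color_bytes=4, depth_bytes=4):
    """Uncompressed colour + depth storage for one render target, in bytes."""
    return width * height * samples * (color_bytes + depth_bytes)


configs = [
    ("1080p, no AA", 1920, 1080, 1),
    ("1080p, 4x MSAA", 1920, 1080, 4),
    ("4K, no AA", 3840, 2160, 1),
    ("4K, 4x MSAA", 3840, 2160, 4),
]

for label, w, h, samples in configs:
    mib = framebuffer_bytes(w, h, samples) / 2**20
    print(f"{label}: {mib:.1f} MiB per frame touched at least once")
```

Under these assumptions 4x MSAA at 1080p stores as many samples as 4K with no AA, which is one way to see why the two settings stack so heavily on memory.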
