Please explain GPU internal bus bottlenecking

Quartz-1 · Nov 5, 2014

Over the years I've seen a lot of claims that video card X is bottlenecked because it only has a Y bit bus.

How do you determine that? How do you demonstrate it?

Note that I'm NOT asking about CPU vs GPU or PCI Express, but the elements wholly internal to the video card.

cyclone3d · Nov 5, 2014

Look at any card that was offered in multiple versions, such as the Nvidia GTX 460, which came in both a 192-bit and a 256-bit version.

You can tell the difference by benchmarking both at the same settings and seeing that the 256-bit version is faster.

You just have to make sure you are running a game that is not being slowed down by the lower amount of RAM on the 192-bit version of the card.

FnordMan · Nov 5, 2014

Memory bus, video cards need memory bandwidth and a LOT of it, that's why a 192-bit bus card will run slower than one with a 256-bit bus if everything else is the same.

Quartz-1 · Nov 5, 2014

FnordMan said:
Memory bus, video cards need memory bandwidth and a LOT of it, that's why a 192-bit bus card will run slower than one with a 256-bit bus if everything else is the same.

People say that, but can you demonstrate it? How do you determine it's an actual bottleneck in the first place? It's easy to say that a 256 bit bus is better than a 192 bit bus but how do you know that it's an actual bottleneck? How do you demonstrate it? Maybe the 192 bit bus reflects a limitation elsewhere?

Parja · Nov 5, 2014

So to try to generally answer the question of the OP, you've basically got three factors affecting a video card's performance: pixel fillrate, texture fillrate, and memory bandwidth. Pixel and texture fillrate are a result of the GPU itself, while memory bandwidth is (very generally) the product of the memory bus width and the memory speed.
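As a rough sketch of that last relationship: peak theoretical bandwidth is the bus width in bytes times the per-pin data rate. The function name is made up for illustration, and the 3.6 Gbps data rate used for both GTX 460 versions is an assumption taken from the reference specs, not a number from this thread.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface, e.g. 192 or 256
    data_rate_gbps: effective data rate per pin in Gbit/s
    """
    return bus_width_bits / 8 * data_rate_gbps


# Assumed reference data rate for both GTX 460 versions (not from the thread).
GTX460_DATA_RATE_GBPS = 3.6

narrow = memory_bandwidth_gbs(192, GTX460_DATA_RATE_GBPS)
wide = memory_bandwidth_gbs(256, GTX460_DATA_RATE_GBPS)

print(f"192-bit: {narrow:.1f} GB/s")
print(f"256-bit: {wide:.1f} GB/s")
print(f"wide/narrow: {wide / narrow:.2f}x")
```

This only gives the ceiling. Whether a given game ever reaches that ceiling is exactly what the benchmarks have to show.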

The only way to really determine the effect it has is to benchmark cards. There's really no way to look at the specs and say card A will be faster than card B simply due to memory bandwidth, especially if they're cards of different generations and definitely if you're comparing red vs. green.

Araxie · Nov 5, 2014

Hmm.. need to explain fast because I'm on my phone. GPU bus width is just like a highway: a wider bus means more data can travel at once, like a 4-lane highway vs. a 2-lane highway. More lanes, more flow, and that's directly expressed as bandwidth.

In the GPU world, the bus width, more than the bandwidth itself, affects a lot of things, like high levels of antialiasing and high resolutions. That's why you can see a 780 Ti trading blows with the more powerful 980 at high resolutions or in surround setups (especially in multi-GPU situations), and why in those situations the AMD 290(X)s can still be better at 4K and in multi-monitor setups, thanks to the 512-bit bus.

Also, the way Nvidia sets up the bus on some cards, the 660 Ti for example, can create a bottleneck. A 192-bit bus would normally get 6 memory chips, or 1.5GB of VRAM. On that card, 2 more memory chips are soldered on the back for a total of 2GB, using an asymmetric memory configuration; a 2GB setup would really need a 256-bit bus to run all 8 memory chips at 32 bits each. That easily explains why the 660 Ti performs horribly with even medium levels of antialiasing and high resolutions, and why it also hits a bandwidth limit once more than 1.5GB of VRAM is in use: the memory past that point all hangs off the last memory controller, which limits its bandwidth to 48 GB/s. So that's a GPU bus width bottleneck.
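The 48 GB/s figure can be sanity-checked with a small sketch. It assumes three 64-bit controllers and a data rate of about 6.0 Gbps per pin, which are my recollection of the reference specs and not stated in the thread. The "blended" number is a deliberately naive model of streaming through both regions once, not a measurement.

```python
# Assumed figures (not from the thread): three 64-bit memory controllers
# and roughly 6.0 Gbps per pin on the GTX 660 Ti.
CONTROLLER_WIDTH_BITS = 64
DATA_RATE_GBPS = 6.0


def bandwidth_gbs(controllers: int) -> float:
    """Peak bandwidth in GB/s when `controllers` controllers serve a region."""
    return controllers * CONTROLLER_WIDTH_BITS / 8 * DATA_RATE_GBPS


fast_gbs = bandwidth_gbs(3)  # first 1.5 GB, striped across all controllers
slow_gbs = bandwidth_gbs(1)  # last 0.5 GB, served by one controller only


def blended_bandwidth_gbs(fast_gb: float, slow_gb: float) -> float:
    """Average rate when reading both regions once, back to back (naive model)."""
    seconds = fast_gb / fast_gbs + slow_gb / slow_gbs
    return (fast_gb + slow_gb) / seconds


print(f"full 192-bit bus: {fast_gbs:.0f} GB/s")
print(f"last 512 MB:      {slow_gbs:.0f} GB/s")
print(f"2 GB blended:     {blended_bandwidth_gbs(1.5, 0.5):.0f} GB/s")
```

Under these assumptions the full bus works out to 144 GB/s, the last 512MB to 48 GB/s, and an even sweep over all 2GB to 96 GB/s. Real access patterns are not an even sweep, so treat the blended number as a worst-case illustration.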

Quartz-1 · Nov 5, 2014

Araxie said:
Hmm.. need to explain fast because I'm on my phone. GPU bus width is just like a highway: a wider bus means more data can travel at once, like a 4-lane highway vs. a 2-lane highway. More lanes, more flow, and that's directly expressed as bandwidth.

In the GPU world, the bus width, more than the bandwidth itself, affects a lot of things, like high levels of antialiasing and high resolutions. That's why you can see a 780 Ti trading blows with the more powerful 980 at high resolutions or in surround setups (especially in multi-GPU situations), and why in those situations the AMD 290(X)s can still be better at 4K and in multi-monitor setups, thanks to the 512-bit bus.

Yes, I can see that it's limiting, but I'm asking about bottlenecking - limiting below the capability of the GPU itself.

Also, the way Nvidia sets up the bus on some cards, the 660 Ti for example, can create a bottleneck. A 192-bit bus would normally get 6 memory chips, or 1.5GB of VRAM. On that card, 2 more memory chips are soldered on the back for a total of 2GB, using an asymmetric memory configuration; a 2GB setup would really need a 256-bit bus to run all 8 memory chips at 32 bits each. That easily explains why the 660 Ti performs horribly with even medium levels of antialiasing and high resolutions, and why it also hits a bandwidth limit once more than 1.5GB of VRAM is in use: the memory past that point all hangs off the last memory controller, which limits its bandwidth to 48 GB/s. So that's a GPU bus width bottleneck.

Hmm.. my 660 Ti had 3 GB so I never saw this, but that sounds reasonable.

Araxie · Nov 5, 2014

Quartz-1 said:
Yes, I can see that it's limiting, but I'm asking about bottlenecking - limiting below the capability of the GPU itself.

Hmm.. my 660 Ti had 3 GB so I never saw this, but that sounds reasonable.

That's because the 3GB version of the 660 Ti doesn't suffer from the asymmetric memory configuration: it has 6 modules on each side of the card, so all of the memory can be addressed and used correctly. That's why you see 256-bit or 512-bit buses paired with 2GB or 4GB (or 4GB and 8GB) but not 3GB or 6GB, and why older cards like the GTX 580 came in 1.5GB and 3GB versions, thanks to the 384-bit bus. It's also why the 192-bit 660 Ti only comes in 2GB and 3GB versions, not 2GB and 4GB as a 256-bit card would.

Also, about limiting: the 660 Ti case I mentioned is a good example of a bottleneck. A clearer example? Look at the AMD R9 285 vs. the R9 280. That's the perfect example of how bus width bottlenecks a card. The R9 285, being Tonga, should be a better-performing card than the R9 280 it's replacing, but it has a problem: its 256-bit bus vs. the 384-bit bus in the R9 280. And what does that mean? Simply that as soon as you push the resolution past 1080p and raise the antialiasing level, the 256-bit bus starts to be a problem. This should work as a good example:

The 285 is supposed to be a better-performing card than the 280, but the 280 has the big advantage of a wider 384-bit bus, which automatically gives it more bandwidth.

The same should apply to the recent Nvidia 970 and 980 cards. With a 384-bit bus those cards would be monsters, since they've been shown to be extremely sensitive to bandwidth increases from just overclocking the VRAM. So yes, I think those cards are very limited by the tiny 256-bit bus width.

BroHamBone · Nov 5, 2014

Turn the faucet on half way - smaller bus
Turn the faucet on all the way - larger bus.

Same object, bus size allows more power to flow through

I'm just guessing with this metaphor, but it sounds good to me.

cageymaru · Nov 5, 2014

Wonder if you could overclock the ram until the point of diminishing returns? If the ram crashes first then it's a good design. If the card runs out of performance then the design sucks. But seriously I think that most GPU designs are pretty good and you shouldn't worry about it.
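One way to put a number on that idea, as a minimal sketch: hold the core clock fixed, change only the memory clock, and compare the relative FPS gain to the relative clock gain. The function name and the sample figures below are hypothetical placeholders, not measurements from any real card.

```python
def bandwidth_scaling(fps_base: float, fps_oc: float,
                      mem_clock_base: float, mem_clock_oc: float) -> float:
    """Fraction of a memory clock increase that shows up as extra FPS.

    Near 1.0: performance tracks memory bandwidth, so the card is likely
    bandwidth-bound in this test. Near 0.0: something else is the limit.
    Only meaningful if the core clock and all settings are unchanged.
    """
    clock_gain = mem_clock_oc / mem_clock_base - 1.0
    if clock_gain <= 0:
        raise ValueError("mem_clock_oc must be higher than mem_clock_base")
    fps_gain = fps_oc / fps_base - 1.0
    return fps_gain / clock_gain


# Hypothetical placeholder numbers, for illustration only:
# a 10% memory overclock that yields a 7.5% FPS gain.
ratio = bandwidth_scaling(fps_base=60.0, fps_oc=64.5,
                          mem_clock_base=1500.0, mem_clock_oc=1650.0)
print(f"scaling ratio: {ratio:.2f}")
```

Running the same test across several memory clocks shows where the ratio starts falling off, which is the point of diminishing returns. Underclocking the memory works just as well for this and doesn't risk instability.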

wonderfield · Nov 6, 2014

Quartz-1 said:
People say that, but can you demonstrate it? How do you determine it's an actual bottleneck in the first place?

The most practical method is to actually compare two cards which differ only in memory bus width (though, as a corollary, memory amount will tend to differ as well). Testing at multiple resolutions, AA settings and game settings will eventually reveal a point at which the performance of the card with the narrower bus drops sharply while the card with the wider bus sees more typical performance degradation.
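A minimal sketch of how that comparison could be automated, assuming you already have FPS results for both cards at the same settings, ordered from lightest to heaviest load. The function name, the 10% threshold, and every number in the usage example are hypothetical placeholders, not real benchmark data.

```python
from typing import Optional, Sequence


def find_divergence(settings: Sequence[str],
                    fps_narrow: Sequence[float],
                    fps_wide: Sequence[float],
                    drop: float = 0.10) -> Optional[str]:
    """Return the first setting where the narrow-bus card falls behind.

    The narrow/wide FPS ratio at the lightest setting is the baseline.
    A setting counts as divergent when the ratio there is more than
    `drop` (as a fraction) below that baseline.
    """
    if not (len(settings) == len(fps_narrow) == len(fps_wide)):
        raise ValueError("all three sequences must have the same length")
    if not settings:
        return None
    baseline = fps_narrow[0] / fps_wide[0]
    for name, narrow, wide in zip(settings, fps_narrow, fps_wide):
        if narrow / wide < baseline * (1.0 - drop):
            return name
    return None


# Hypothetical placeholder results, for illustration only.
settings = ["1080p", "1080p 4xAA", "1440p", "1440p 4xAA"]
fps_narrow = [90.0, 70.0, 60.0, 38.0]
fps_wide = [100.0, 78.0, 67.0, 52.0]

print(find_divergence(settings, fps_narrow, fps_wide))
```

With these placeholder numbers the ratio stays near 0.9 for the first three settings and then falls to about 0.73, so the last setting is flagged. Since the memory amount usually differs between the two cards as well, a flagged setting still needs a check that VRAM capacity isn't the real cause.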

NVIDIA and AMD use sophisticated simulators to determine how memory bus bottlenecks will manifest in new architectures, but they're beyond the reach of reviewers.

rennyf77 · Nov 6, 2014

This site examined this bus issue previously in an article that compared the GTX 670 with the GTX 660 Ti 2GB card. Essentially they found a performance discrepancy, but not one so large that it couldn't be overcome with a bit of overclocking. I think that may be why Kyle and the gang didn't refer to memory bus width as the factor for why GTX 980 SLI trailed behind 290X CF at 4K. They pinned it entirely on Nvidia's bridge-based SLI method. I'm going off memory, so anyone please correct me if I'm wrong.

Drep · Nov 6, 2014

BroHamBone said:
Turn the faucet on half way - smaller bus
Turn the faucet on all the way - larger bus.

Same object, bus size allows more power to flow through

I'm just guessing with this metaphor, but it sounds good to me.

Almost but not quite....

Turn that faucet on low or high...the bus is still the same. The bus in this scenario being the pipes. No matter if the faucet is turned on low or high, the pipe can only handle so much. Increase the pipe size = making your bus larger allowing more water to flow through the pipe

Unknown-One · Nov 6, 2014

FnordMan said:
Memory bus, video cards need memory bandwidth and a LOT of it, that's why a 192-bit bus card will run slower than one with a 256-bit bus if everything else is the same.

That depends on the workload. Some workloads won't stress either memory bus configuration, even while fully loading the GPU core itself...

Hulk · Nov 6, 2014

To help you better understand larger vs smaller bus I will use the following photos:

Small bus

Large bus

Any questions?

KazeoHin · Nov 6, 2014

Here is the main thing that people may need to understand, though: the Bus is not the bottleneck, the overall memory bandwidth is the bottleneck. The 980 and 780Ti are able to keep up with the 290x in high resolutions/AA levels despite having drastically smaller memory bus widths, but their memory is clocked so high, that the overall bandwidth is similar. So you need to look at the bandwidth, not just the bus or the memory clock.

Quartz-1 · Nov 6, 2014

Hulk said:
To help you better understand larger vs smaller bus I will use the following photos:

You, sir, are an evil man.

misterbobby · Nov 6, 2014

KazeoHin said:
Here is the main thing that people may need to understand, though: the Bus is not the bottleneck, the overall memory bandwidth is the bottleneck. The 980 and 780Ti are able to keep up with the 290x in high resolutions/AA levels despite having drastically smaller memory bus widths, but their memory is clocked so high, that the overall bandwidth is similar. So you need to look at the bandwidth, not just the bus or the memory clock.

Overall bandwidth does not always mean everything and is just one metric to look at. The 780 Ti has a wee bit higher bandwidth than the 290X. The 980 has WAY less bandwidth than the 780 Ti, though; some of that is made up for by new color compression techniques.
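For reference, a quick sketch of the raw numbers behind that comparison. The bus widths and per-pin data rates below are the reference specs as I recall them, not figures from this thread, and the result is the theoretical peak before any color compression.

```python
# Assumed reference specs (not from the thread):
# card name -> (bus width in bits, data rate per pin in Gbit/s)
cards = {
    "GTX 780 Ti": (384, 7.0),
    "R9 290X": (512, 5.0),
    "GTX 980": (256, 7.0),
}


def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak theoretical memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps


# Rank the same cards two ways to show the orderings differ.
by_bus_width = sorted(cards, key=lambda c: cards[c][0], reverse=True)
by_bandwidth = sorted(cards, key=lambda c: bandwidth_gbs(*cards[c]), reverse=True)

for name in by_bandwidth:
    width, rate = cards[name]
    print(f"{name:<11} {width:>3}-bit  {bandwidth_gbs(width, rate):.0f} GB/s")

print("by bus width:", by_bus_width)
print("by bandwidth:", by_bandwidth)
```

Under these assumed specs the widest bus (290X) is not the highest bandwidth (780 Ti), and the 980 sits well below both, so neither bus width nor raw bandwidth alone predicts where a card lands.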

xLegendary · Nov 7, 2014

And now there will be a new internal bandwidth, with the stacked memory in the GPUs (already in the next Xbox) and soon coming to the next gen AMD cards, 300 series

LordEC911 · Nov 7, 2014

xLegendary said:
And now there will be a new internal bandwidth, with the stacked memory in the GPUs (already in the next Xbox) and soon coming to the next gen AMD cards, 300 series

Xbox doesn't have stacked memory... Nor does it have any next-gen memory tech, it is ESRAM.

defaultluser · Nov 7, 2014

It's become a lot harder to explain in this age, because games are a lot more complex with highly variable loads, and the same goes for graphics cards.

In ten years, midrange 128-bit performance (6600gt to 750ti) has increased around 12x, while the raw memory bandwidth has increased around 6x. This happened due to improvements in caching, lossless compression and a shift from texture-heavy games to shader-heavy.

Unfortunately, this makes it much harder to characterize whether a card is bandwidth limited. But the good thing is, both Nvidia and AMD have gotten so good at making scalable architectures that they no longer rely as heavily on bandwidth castration to make unique SKUs to fill product line-up gaps. Instead of cutting bandwidth in half (e.g. 6600 DDR, or 7600GS DDR2 cards), they either just cut shader blocks or make small cuts, such as from a 256-bit to a 192-bit memory interface.
