GTX 480 SLI PCIe Bandwidth Perf. - x16/x16 vs. x4/x4 @ [H]

I'm wondering just how much the SLI bridge comes into play with this as well. Time to research that little piece of hardware to see what it actually does.


Hehe, the SLI and CFX bridges are "free" though, and yes, we have seen cases where not using them causes issues, but not all the time. Take yours off and see if you notice a difference, or whether the tech will even work. I have seen different results. But generally, yes, use the bridges for SLI and CFX.
 
So what I'm reading here is that at PCIe x4, a single card as powerful as the GTX 480 still does not consume enough bandwidth to fully tap out an AGP slot, with the exception of the most demanding cases. And even at that point, the difference is hardly noticeable during gameplay, and really only shows up in benchmarks and test graphs. (AGP 8x is 2133MB/s, whereas one PCIe 2.0 lane is 500MB/s, therefore PCIe 2.0 x4 is 2000MB/s.) As such, the difference is about 6.65% in lane bandwidth.

I show a difference between x16/x16 and x4/x4 of 12.37% in minimum framerate, 1.8% in maximum, and 3.21% on average. The extra 133MB/s afforded by AGP 8x very nearly makes up the difference (and it just might totally make up the difference in a canned benchmark). You're looking at ~5% difference at the very low end of the most taxing games, on the best hardware. I doubt you would notice.
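
For anyone who wants to check my arithmetic, a quick Python sketch (using the commonly quoted effective rates; exact figures vary a bit with encoding overhead):

Code:
# Commonly quoted effective bandwidths, in MB/s, per direction.
AGP_8X = 2133       # AGP 3.0 "8x"
PCIE2_LANE = 500    # PCIe 2.0, per lane

pcie2_x4 = 4 * PCIE2_LANE                      # 2000 MB/s
gap = (AGP_8X - pcie2_x4) / pcie2_x4 * 100
print(f"AGP 8x over PCIe 2.0 x4: {gap:.2f}%")  # ~6.65%, as above

pcie2_x16 = 16 * PCIE2_LANE                    # 8000 MB/s
print(f"x16 has {pcie2_x16 / pcie2_x4:.0f}x the bandwidth of x4")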

My point: we have been pushed these newer motherboards and graphics cards, forced to upgrade to "keep up" because they said that AGP was tapped out, and (in this community) we've spent thousands of dollars per person doing so. Now, granted, we would have bought newer motherboards anyway as sockets changed, but it would have saved us some coin in that the older standard would not need any R&D; the interface, chipsets, and code were well honed. And even HP has servers with multiple AGP slots (one of them had 64 slots...), so it's not like it wasn't possible to make such a beast as an SLI'd AGP gaming rig.
 
I'm curious how that affected electrical usage, and how the cards made do if power delivery was reduced. Or does PCIe deliver full power through the first set of pins? If not, did the cards scale back due to lack of juice during peak load, etc.?
 
H55 and H57 maybe, but many P55 boards just split their main PCIe x16 into two PCIe x8 slots (lane switching?).

Well, no.

For example - the Gigabyte P55-UD5; we can't call this an entry-level board, right? http://www.gigabyte.com/products/product-page.aspx?pid=3159
Now take a look at the specs:
Code:
1 x PCI Express x16 slot, running at x16 (PCIEX16_1)
1 x PCI Express x16 slot, running at x8 (PCIEX8_1) (The PCIEX16_1 and PCIEX8_1 slots conform to PCI Express 2.0 standard.)
1 x PCI Express x16 slot, running at x4 (PCIEX4_1) (Note 4)
See the 3rd slot? An x16 slot, running at x4, but it's only x4 1.1, not 2.0!

Now take a look at an entry-level board - the Gigabyte P55-UD3:
http://www.gigabyte.com/products/product-page.aspx?pid=3164#ov

Code:
1 x PCI Express x16 slot, running at x16 (PCIEX16) (The PCIEX16 slot conforms to PCI Express 2.0 standard.)
1 x PCI Express x16 slot, running at x4 (PCIEX4)

In this case, it's questionable whether it is x4 2.0 or x4 1.1, but considering the wording for the PCIEX16 slot, the x4 on this board is likely 1.1 too.

Long story short, the 3rd x16 slot on P55 boards is always x4 1.1 unless you have an NF200 or Hydra chip on board.
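
To put numbers on why 1.1 vs. 2.0 matters here, a quick Python sketch (per-direction effective rates; the ~1000MB/s DMI figure is my rough assumption for that era's chipset link):

Code:
# Effective bandwidth per lane, in MB/s, per direction.
PER_LANE = {"PCIe 1.1": 250, "PCIe 2.0": 500}

for gen, mb in PER_LANE.items():
    for lanes in (4, 8, 16):
        print(f"{gen} x{lanes}: {mb * lanes} MB/s")

# PCIe 1.1 x4 = 1000 MB/s -- half of PCIe 2.0 x4, an eighth of 2.0 x16.
# On P55 those chipset lanes also share the roughly 1000 MB/s DMI link to
# the CPU with SATA, USB, etc. (assumed figure), adding another bottleneck.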
 

Lol, I misinterpreted your original message.

There are H55 mobos that have x16/x4 as the ONLY configuration.

So I thought you mentioned X4 due to that, and set off to prove you "wrong" :p

But you are right, and the fact that the x4 PCIe 1.1 lanes are further linked via the DMI bus adds a further bottleneck (and latency).
 
Now what I want to see is a 5850 on x16/x4, which is what my mobo supports!
 
I would say do it like this, but when adding content you could add it to one article by adding pages. That way it's easier to find it all in one place later. Definitely surprised by the results; I saw x4 and thought that would be the breaking point. I guess not. Just goes to show that you don't always need the best and newest.
 
I want to see a PCIe x1 comparison now. This is really interesting stuff. A lot of this data can be extrapolated... for example, if CrossFire and SLI work fine on fewer lanes, then you should have zero problems running single graphics cards in slots that are only x1 and x4 electrically. This is good info for those running mITX boards that may or may not have full x16 slots. Heck, if you can run a graphics card in an x1 slot with minimal slowdown, then you can run the x1-to-x16 ribbon cable adapters, or even a Mini PCIe-to-x16 adapter for an external GPU solution for laptops. I believe someone did that on the forums not too long ago.
 
I have a P35 board, and having two 4850s in CF (PCIe 1.1 x16/x4) would give me lower framerates than just a single 4850 in newer games.
 
Great to know that my PCIe 1.x motherboard isn't holding me back. I was mulling over X48 and P45 motherboards before this article. Thanks Kyle and Brent for showing me I don't need a new motherboard just yet!
 
One more vote for the short and sweet tests like this. Of course, keep the full-length format for full reviews, but it seems a lot of us really like experimental testing like this.

So, just for the hell of it, x1/x1?
 
Could you do a quick test at 5760x1200 with x4 bandwidth? I'd like to see where the hard limits are; so far you've only shown relatively mild performance hits.
 
Wow, interesting. I wonder how a 5870 would perform on my budget board, which supports x16/x4 CrossFire. Then again, CrossFire scales differently than SLI, so we'd have to see. But still, this actually opens up the possibility of running CrossFire on my board.
 
I too would like to see higher resolution tests, with x4/x4 or even x16/x4.

Interesting results though - goes to show people that they don't need $800 motherboards to run tri-SLI/CFX.
 
I'm curious whether the bandwidth limitation might become apparent when using a pair of 512MB video cards at each bandwidth configuration, also including x16/x4 (rather than x4/x4), since there are some motherboards out there with x16/x8 and x16/x4.

Using the cards with under 1GB on board may be where the PCIe bus bandwidth begins to matter.
 

IMO, if the card has to start swapping textures in and out due to limited memory, performance is going to suffer from stuttering and other slowdowns, and not even x16 can prevent that. Ideally, we would still want to ensure everything fits into the graphics card's memory.
 

That's what I was thinking. If you reach a point where you're swapping in crap from system memory all the time, your FPS is probably already in the toilet, and the performance hit from the PCIe bus is irrelevant.

I mean it would be interesting from a purely academic standpoint to see if that is true at all, but from a practical usability perspective it's not very relevant since no one is going to be playing at settings that give them 15-20fps anyway.
 
I have the original Maximus, and for some reason NVIDIA cards run x16 in PCIe 1.1 mode rather than the full 2.0 spec. So essentially, I'm running the equivalent of x8 2.0. It's great to know I've got some life left in this board for the next GPU refresh.
 
Wow, thanks for the article. That's very interesting.
Like most other replies, I like more numerous, shorter articles (as long as you guys keep doing the in-depth stuff as needed).

I love these short-form articles for things like this. I would love to see more of them, especially if you guys do some more testing like this. I'd love to see the real-world impact (or lack thereof) that memory would have on gaming. Like 2GB vs. 4GB, or different speeds. Even how much or how little effect triple-channel DDR3 would have over dual-channel DDR3.

Last I checked, dual vs. triple channel is maybe a 1% overall performance increase.

I just googled again to see, here's an example:
http://www.insidehw.com/Reviews/Mem...el-vs.-Triple-Channel-Memory-Mode/Page-4.html

This has better, more detailed info (make sure to compare the 920 [dual] with triple channel at the same memory frequency and timings; the bottom 2 results in the first 3 graphs):
http://techreport.com/articles.x/15967/6

So really, putting together a 1366 system vs. an 1156 system costs significantly more money for very little overall performance increase, even in gaming, in terms of the chipset, when using similarly performing processors.
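
The theoretical peaks show why: the third channel adds 50% more bandwidth on paper, but games rarely saturate even dual channel. A quick Python sketch, assuming DDR3-1333 on 64-bit (8-byte) channels:

Code:
def peak_gb_s(channels, mt_per_s, bytes_per_transfer=8):
    """Theoretical peak = channels * bytes per transfer * MT/s."""
    return channels * bytes_per_transfer * mt_per_s / 1000

print(peak_gb_s(2, 1333))  # dual channel DDR3-1333:   ~21.3 GB/s
print(peak_gb_s(3, 1333))  # triple channel DDR3-1333: ~32.0 GB/s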
 
Nice to know that all I need to upgrade in the next 2-3 years is the GPU. Maybe upgrade to a hex-core if the price is right. Already got over a year out of this build, and it's good to know that it will still be a fairly high-end computer for a bit longer. Best $1600 I've ever spent.
 

IMHO, for single-monitor gaming, hardware (both GPU and CPU) has outpaced the software (games) by a significant amount. Even a $50 NVIDIA 240 will get playable framerates (without AA) on games like L4D2 at high graphics settings and fairly high resolution.

I'd really like to see more games take advantage of DX11, but I think a big thing holding games back is consoles and the development for them. Consoles have essentially obsolete hardware, and yet everyone is still developing for them and then porting over to PC.

Many games are still using DX9 (Fallout: New Vegas is just one of the upcoming games to do so), not even using DX10, let alone DX11.

And yet, even with the most impressive hardware, we still see x4/x4 setups are not the main bottleneck. Fascinating stuff.
 

Try playing Just Cause 2 or BFBC2 on a 240 at high settings and see what happens. Let alone games like Metro 2033.
 
The amount of data sent to the card is independent of the resolution the frames are rendered at. You send the data to render the scene, then it is rendered at the chosen resolution. Cranking the res or the AA does not tax the bus. Seriously, Brent, I thought you would know better.

Actually, on second thought, I would be interested to see the results at a very low resolution: the more frames being rendered, the more data needs to be sent over the bus, so you would be relying more on the bus speed.
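
A crude model of that idea (the per-frame figure is invented purely for illustration; the real number depends entirely on the engine):

Code:
# Bus traffic scales with frames submitted, not pixels rendered: each frame
# carries its own draw calls, constants, and dynamic vertex data.
def bus_mb_per_s(mb_per_frame, fps):
    return mb_per_frame * fps

for fps in (60, 120, 300):
    print(fps, "fps ->", bus_mb_per_s(5.0, fps), "MB/s")
# At an assumed 5 MB/frame, 300 fps pushes 1500 MB/s, approaching the
# ~2000 MB/s ceiling of PCIe 2.0 x4, while 60 fps barely touches it.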
 
Unbelievable!

I like the short format.

And I know this horse is close to dead, but this does leave me curious about ATI cards. It seems apparent that the 5970, in particular, uses PCIe bandwidth much differently than the GTX 480 does.

Bottom line, I would read another bandwidth follow-up covering ATI cards.
 
I'd like to see this done with cards that have low amounts of RAM on them, like the GTX 460s with 768MB. I think you won't come close to saturating the PCIe bus (even at x4) unless you're pushing texture data across it; that's when you could see a big difference. That's why you only see a difference on triple monitor with AA enabled: the cards are running out of RAM and have to push textures across the bus. x4/x4 with 1.5GB GTX 480s is a mismatch in the real world, as I doubt you'd ever see that. x4/x4 with two 768MB GTX 460s might actually happen, and that's where I'd expect there to be a difference.
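
A rough sketch of the ceiling that texture spill imposes (the spill size is an assumption for illustration):

Code:
# If every frame must pull `spill_mb` of textures that didn't fit in VRAM,
# the bus transfer alone caps the achievable framerate.
def fps_ceiling(spill_mb, bus_mb_per_s):
    return bus_mb_per_s / spill_mb

for name, bw in (("PCIe 2.0 x16", 8000), ("PCIe 2.0 x4", 2000)):
    print(name, "->", fps_ceiling(100, bw), "fps max")  # assumed 100 MB spill
# x16 still allows an ~80 fps ceiling; x4 drops it to ~20 fps, which is why
# small-VRAM cards should show the bus difference long before 1.5GB cards do.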

Try the 1GB 460s that you used in the first article vs. the 480s on the x4/x4 setup, and see if the PCIe bus lanes start to show a difference at a lower resolution than with the 480s; then you have your answer.

I do like seeing these articles, though.
 
Oh, interesting. That's a good point.

A test with 768MB GTX 460s would be very useful; that is, if Brent isn't tired of taping up PCIe interfaces yet :)
 
Wow... just wow. I didn't see that one coming. I expected the framerate to drop at x8/x8, so... yeah. Goes to show how we're getting stiffed with all the changes of slots/mobos over the years.
 
Good to see motherboards lasting longer than before, especially as far as performance goes.
 
OK, so why do 3D cards have between 512MB and an amazing 2GB of RAM onboard? A: It is to remove, as much as possible, the bandwidth bottleneck from the PCIe bus and place it squarely where it belongs: in the much, much faster local RAM bus on the 3D card. As to whether the local RAM on a 3D card equals 512MB or 2GB, that's purely a matter of economics: how much money does the consumer wish to spend on each of his 3D cards? 1GB is probably the sweet spot, 2GB the ultimate, and 512MB is all the practical man can afford. But they all, to some degree, serve to eliminate the PCIe bus as a system bottleneck.

First, while the 3D card is displaying a series of frames already loaded into its onboard memory, the card is also actively querying the PCIe bus for the next segment of frames in system RAM, frames that it wants to load into local memory so that when the first series is exhausted, the second can seamlessly and immediately run at full speed, the user noticing no demarcation whatever between the two sets. Running the latest series of frames while the card is dumping previously rendered content from memory and loading frames which have yet to be seen is an ongoing process which the game repeats endlessly until it is finished. Obviously, the more onboard RAM a card has, the fewer repetitions of this nature it will have to do.

This is *why* the stress and bottleneck are not put on the PCIe bus! Even at x16, the PCIe bus is still far too slow to allow a smooth transition between frame series stored in the card's memory. If the game had to drop out of an 80fps stream of frames, go to the PCIe x16 bus to fetch the next set, and then wait however long the bus needed to align and synchronize them, that wait would be the price of presenting the video as a single, unflawed, unbroken piece.

To have to stop and wait every time data had to be mined from system RAM via the PCIe bus would chop up the presentation terribly; the distraction it caused might be the only memorable thing about the game, if it was remembered at all.

The more pauses a game makes digging up data through the PCIe bus, or otherwise just plain stalling on the PCIe bus, the less memorable it is, the less cohesive a yarn the storyteller conveys, and of course, the less believable the story becomes. Even the slowest PCIe bus seems "lightning fast" by everyday standards, but for this job it really isn't fast at all. ;)
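
For a sense of scale, a quick sketch of why the card's local memory hides the bus (the ~150 GB/s figure is an assumed ballpark for a high-end card's GDDR5 of that era):

Code:
# Time to move a 512 MB working set over different links.
def ms_to_move(mb, bw_mb_per_s):
    return mb / bw_mb_per_s * 1000

print(ms_to_move(512, 8000))    # PCIe 2.0 x16 (~8 GB/s): 64 ms, i.e. several
                                # whole frames at 60 fps
print(ms_to_move(512, 150000))  # on-card GDDR5 (~150 GB/s, assumed): ~3.4 ms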
 
I liked the way you spread this series of articles out; for me it made the information far easier to comprehend and digest than if you had waited and published all the test results at once.

I, like many others, was surprised at these results, and it will certainly affect how I look at motherboard choices for my next build.
 
Could someone tell me if this motherboard has a PCIe 1.1 or PCIe 2.0 x4 slot?

I want to know if I would be limited if I CrossFired my 4850 1GB card.
 
Sorry, have not read through this entire thread...

(btw thanks for doing this testing - it seems to reinforce what I've been seeing with my non NF200 boards so far!)

What about the effects of SLI bridges? I know that if you don't use 'em, performance drops. A three-way bridge on two cards does NOT connect BOTH goldfingers together; you can verify this with an ohmmeter.

I have tried with one and two bridges on dual 480s, and it does seem to be slightly better with both goldfingers connected. Still nothing conclusive, as the differences were minor, well within standard benchmark deviation.
 
1st: Personally, I like the new format.

2nd: Sincere thanks for going back and being willing to check out new tech on older architectures. This isn't the first time we've been duped/misled. In fact, the last time pissed me off so much that I held off for a long damn time before updating from the 8800 GTX to the new GTX 460 I've had for about 3 weeks now. Things like this article really help! Thank you.

Will probably get another GTX 460 (and do SLI on my ASUS P5Q Pro) and hopefully be good for another 3 years (barring finally converting to a 64-bit OS and upping to 4GB of RAM).

That depends on the content of the article. Brand new *architectures* require an article with more content.

Fixed ;)

IMO, new revs (unless there's a serious change à la the 460) don't require nearly as much background detail. Just a 'guided update', so to speak.
 