GTX 480 SLI PCIe Bandwidth Perf. - x16/x16 vs. x4/x4 @ [H]

I'm wondering just how much the SLI bridge comes into play with this as well. Time to research that little piece of hardware to see what it actually does.


Hehe, the SLI and CFX bridges are "free" though, and yes, we have seen cases where not using them causes issues, but not all the time. Take yours off and see if you notice a difference, or whether the tech will even work. I have seen different results. But generally, yes, use the bridges for SLI and CFX.
 
So what I'm reading here is that at PCIe x4, a single card as powerful as the GTX 480 still does not consume enough bandwidth to fully tap out an AGP slot, with the exception of the most demanding cases. And even at that point, the difference is hardly noticeable during gameplay, and really only shows up in benchmarks and test graphs. (AGP 8x is 2133MB/s, whereas one PCIe 2.0 lane is 500MB/s, therefore PCIe 2.0 x4 is 2000MB/s.) As such, the difference is about 6.65% in lane bandwidth.

I show a difference between x16/x16 and x4/x4 of 12.37% in minimum framerate, 1.8% in maximum, and 3.21% on average. The extra 133MB/s afforded by AGP 8x very nearly makes up the difference (and it just might totally make up the difference in a canned benchmark). You're looking at ~5% difference at the very low end of the most taxing games, on the best hardware. I doubt you would notice.
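
For anyone who wants to check my arithmetic, a quick Python sketch (using the commonly quoted effective rates; exact figures vary a bit with encoding overhead):

Code:
# Commonly quoted effective bandwidths, in MB/s, per direction.
AGP_8X = 2133       # AGP 3.0 "8x"
PCIE2_LANE = 500    # PCIe 2.0, per lane

pcie2_x4 = 4 * PCIE2_LANE                      # 2000 MB/s
gap = (AGP_8X - pcie2_x4) / pcie2_x4 * 100
print(f"AGP 8x over PCIe 2.0 x4: {gap:.2f}%")  # ~6.65%, as above

pcie2_x16 = 16 * PCIE2_LANE                    # 8000 MB/s
print(f"x16 has {pcie2_x16 / pcie2_x4:.0f}x the bandwidth of x4")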

My point: we have been pushed these newer motherboards and graphics cards, forced to upgrade to "keep up" because they said that AGP was tapped out, and (in this community) we've spent thousands of dollars per person doing so. Now, granted, we would have bought newer motherboards anyway as sockets changed, but it would have saved us some coin in that the older standard would not need any R&D; the interface, chipsets, and code were well honed. And even HP has servers with multiple AGP slots (one of them had 64 slots...), so it's not like it wasn't possible to make such a beast as an SLI'd AGP gaming rig.
 
I'm curious how that affected electrical usage, and how the cards made do if power delivery was reduced. Or does PCIe deliver full power through the first set of pins? If not, did the cards scale back due to lack of juice during peak load, etc.?
 
H55 and H57 maybe, but many P55 boards just split their main PCIe x16 into two PCIe x8 slots (lane switching?).

Well, no.

For example - the Gigabyte P55-UD5; we can't call this an entry-level board, right? http://www.gigabyte.com/products/product-page.aspx?pid=3159
Now take a look at the specs:
Code:
1 x PCI Express x16 slot, running at x16 (PCIEX16_1)
1 x PCI Express x16 slot, running at x8 (PCIEX8_1) (The PCIEX16_1 and PCIEX8_1 slots conform to PCI Express 2.0 standard.)
1 x PCI Express x16 slot, running at x4 (PCIEX4_1) (Note 4)
See the 3rd slot? An x16 slot, running at x4, but it's only x4 1.1, not 2.0!

Now take a look at an entry-level board - the Gigabyte P55-UD3:
http://www.gigabyte.com/products/product-page.aspx?pid=3164#ov

Code:
1 x PCI Express x16 slot, running at x16 (PCIEX16) (The PCIEX16 slot conforms to PCI Express 2.0 standard.)
1 x PCI Express x16 slot, running at x4 (PCIEX4)

In this case, it's questionable whether it is x4 2.0 or x4 1.1, but considering the wording for the PCIEX16 slot, the x4 on this board is likely 1.1 too.

Long story short, the 3rd x16 slot on P55 boards is always x4 1.1 unless you have an NF200 or Hydra chip on board.
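
To put numbers on why 1.1 vs. 2.0 matters here, a quick Python sketch (per-direction effective rates; the ~1000MB/s DMI figure is my rough assumption for that era's chipset link):

Code:
# Effective bandwidth per lane, in MB/s, per direction.
PER_LANE = {"PCIe 1.1": 250, "PCIe 2.0": 500}

for gen, mb in PER_LANE.items():
    for lanes in (4, 8, 16):
        print(f"{gen} x{lanes}: {mb * lanes} MB/s")

# PCIe 1.1 x4 = 1000 MB/s -- half of PCIe 2.0 x4, an eighth of 2.0 x16.
# On P55 those chipset lanes also share the roughly 1000 MB/s DMI link to
# the CPU with SATA, USB, etc. (assumed figure), adding another bottleneck.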
 

Lol, I misinterpreted your original message.

There are H55 mobos that have x16/x4 as the ONLY configuration.

So I thought you mentioned X4 due to that, and set off to prove you "wrong" :p

But you are right, and the fact that the x4 PCIe 1.1 lanes are further linked via the DMI bus adds a further bottleneck (and latency).
 
Now what I want to see is a 5850 on x16/x4, which is what my mobo supports!
 
I would say do it like this, but when adding content you could add it to one article by adding pages. That way it's easier to find it all in one place later. Definitely surprised by the results; I saw x4 and thought that would be the breaking point. I guess not. Just goes to show that you don't always need the best and newest.
 
I want to see a PCIe x1 comparison now. This is really interesting stuff. A lot of this data can be extrapolated... for example, if CrossFire and SLI work fine on fewer lanes, then you should have zero problems running single graphics cards in slots that are only x1 and x4 electrically. This is good info for those running mITX boards that may or may not have full x16 slots. Heck, if you can run a graphics card in an x1 slot with minimal slowdown, then you can run the x1-to-x16 ribbon cable adapters, or even a Mini PCIe-to-x16 adapter for an external GPU solution for laptops. I believe someone did that on the forums not too long ago.
 
I have a P35 board, and having two 4850s in CF (PCIe 1.1 x16/x4) would give me lower framerates than just a single 4850 in newer games.
 
Great to know that my PCIe 1.x motherboard isn't holding me back. I was mulling over X48 and P45 motherboards before this article. Thanks Kyle and Brent for showing me I don't need a new motherboard just yet!
 
One more vote for the short and sweet tests like this. Of course, keep the full-length format for full reviews, but it seems a lot of us really like experimental testing like this.

So, just for the hell of it, x1/x1?
 
Could you do a quick test at 5760x1200 with x4 bandwidth? I'd like to see where the hard limits are; so far you've only shown relatively mild performance hits.
 
Wow, interesting. I wonder how a 5870 would perform on my budget board, which supports x16/x4 CrossFire. Then again, CrossFire scales differently than SLI, so we'd have to see. But still, this actually opens up the possibility of running CrossFire on my board.
 
I too would like to see higher resolution tests, with x4/x4 or even x16/x4.

Interesting results though - goes to show people that they don't need $800 motherboards to run tri-SLI/CFX.
 
I'm curious whether the bandwidth limitation might become apparent when using a pair of 512MB video cards at each bandwidth configuration, also including x16/x4 (rather than x4/x4), since there are some motherboards out there with x16/x8 and x16/x4.

Using the cards with under 1GB on board may be where the PCIe bus bandwidth begins to matter.
 

IMO, if the card has to start swapping textures in and out due to limited memory, performance is going to suffer from stuttering and other slowdowns, and not even x16 can prevent that. Ideally, we would still want to ensure everything fits into the graphics card's memory.
 

That's what I was thinking. If you reach a point where you're swapping in crap from system memory all the time, your FPS is probably already in the toilet, and the performance hit from the PCIe bus is irrelevant.

I mean it would be interesting from a purely academic standpoint to see if that is true at all, but from a practical usability perspective it's not very relevant since no one is going to be playing at settings that give them 15-20fps anyway.
 
I have the original Maximus, and for some reason NVIDIA cards run x16 in PCIe 1.1 mode rather than the full 2.0 spec. So essentially, I'm running the equivalent of x8 2.0. It's great to know I've got some life left in this board for the next GPU refresh.
 
Wow, thanks for the article. That's very interesting.
Like most other replies, I like more numerous, shorter articles (as long as you guys keep doing the in-depth stuff as needed).

I love these short-form articles for things like this. I would love to see more of them, especially if you guys do some more testing like this. I'd love to see the real-world impact (or lack thereof) that memory would have on gaming. Like 2GB vs. 4GB, or different speeds. Even how much or how little effect triple-channel DDR3 would have over dual-channel DDR3.

Last I checked, dual vs. triple channel is maybe a 1% overall performance increase.

I just googled again to see, here's an example:
http://www.insidehw.com/Reviews/Mem...el-vs.-Triple-Channel-Memory-Mode/Page-4.html

This has better, more detailed info (make sure to compare the 920 [dual] with triple channel at the same memory frequency and timings; the bottom 2 results in the first 3 graphs):
http://techreport.com/articles.x/15967/6

So really, putting together a 1366 system vs. an 1156 system costs significantly more money for very little overall performance increase, even in gaming, in terms of the chipset, when using similarly performing processors.
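
The theoretical peaks show why: the third channel adds 50% more bandwidth on paper, but games rarely saturate even dual channel. A quick Python sketch, assuming DDR3-1333 on 64-bit (8-byte) channels:

Code:
def peak_gb_s(channels, mt_per_s, bytes_per_transfer=8):
    """Theoretical peak = channels * bytes per transfer * MT/s."""
    return channels * bytes_per_transfer * mt_per_s / 1000

print(peak_gb_s(2, 1333))  # dual channel DDR3-1333:   ~21.3 GB/s
print(peak_gb_s(3, 1333))  # triple channel DDR3-1333: ~32.0 GB/s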
 
Nice to know that all I need to upgrade in the next 2-3 years is the GPU. Maybe upgrade to a hex-core if the price is right. Already got over a year out of this build, and it's good to know that it will still be a fairly high-end computer for a bit longer. Best $1600 I've ever spent.
 

IMHO, for single-monitor gaming, hardware (both GPU and CPU) has outpaced the software (games) by a significant amount. Even a $50 NVIDIA 240 will get playable framerates (without AA) on games like L4D2 at high graphics settings and fairly high resolution.

I'd really like to see more games take advantage of DX11, but I think a big thing holding games back is consoles and the development for them. Consoles have essentially obsolete hardware, and yet everyone is still developing for them and then porting over to PC.

Many games are still using DX9 (Fallout: New Vegas is just one of the upcoming games to do so), not even using DX10, let alone DX11.

And yet, even with the most impressive hardware, we still see x4/x4 setups are not the main bottleneck. Fascinating stuff.
 

Try playing Just Cause 2 or BFBC2 on a 240 at high settings and see what happens. Let alone games like Metro 2033.
 
The amount of data sent to the card is independent of the resolution the frames are rendered at. You send the data to render the scene, then it is rendered at the chosen resolution. Cranking the res or the AA does not tax the bus. Seriously, Brent, I thought you would know better.

Actually, on second thought, I would be interested to see the results at a very low resolution: the more frames being rendered, the more data needs to be sent over the bus, so you would be relying more on the bus speed.
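
A crude model of that idea (the per-frame figure is invented purely for illustration; the real number depends entirely on the engine):

Code:
# Bus traffic scales with frames submitted, not pixels rendered: each frame
# carries its own draw calls, constants, and dynamic vertex data.
def bus_mb_per_s(mb_per_frame, fps):
    return mb_per_frame * fps

for fps in (60, 120, 300):
    print(fps, "fps ->", bus_mb_per_s(5.0, fps), "MB/s")
# At an assumed 5 MB/frame, 300 fps pushes 1500 MB/s, approaching the
# ~2000 MB/s ceiling of PCIe 2.0 x4, while 60 fps barely touches it.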
 
Unbelievable!

I like the short format.

And I know this horse is close to dead, but this does leave me curious about ATI cards. It seems apparent that the 5970, in particular, uses PCIe bandwidth much differently than the GTX 480 does.

Bottom line, I would read another bandwidth follow-up covering ATI cards.
 
I'd like to see this done with cards that have low amounts of RAM on them, like the GTX 460s with 768MB. I think you won't come close to saturating the PCIe bus (even at x4) unless you're pushing texture data across it; that's when you could see a big difference. That's why you only see a difference on triple monitor with AA enabled: the cards are running out of RAM and have to push textures across the bus. x4/x4 with 1.5GB GTX 480s is a mismatch in the real world, as I doubt you'd ever see that. x4/x4 with two 768MB GTX 460s might actually happen, and that's where I'd expect there to be a difference.
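
A rough sketch of the ceiling that texture spill imposes (the spill size is an assumption for illustration):

Code:
# If every frame must pull `spill_mb` of textures that didn't fit in VRAM,
# the bus transfer alone caps the achievable framerate.
def fps_ceiling(spill_mb, bus_mb_per_s):
    return bus_mb_per_s / spill_mb

for name, bw in (("PCIe 2.0 x16", 8000), ("PCIe 2.0 x4", 2000)):
    print(name, "->", fps_ceiling(100, bw), "fps max")  # assumed 100 MB spill
# x16 still allows an ~80 fps ceiling; x4 drops it to ~20 fps, which is why
# small-VRAM cards should show the bus difference long before 1.5GB cards do.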

Try the 1GB 460s that you used in the first article vs. the 480s on the x4/x4 setup, and see if the PCIe bus lanes start to show a difference at a lower resolution than with the 480s; then you have your answer.

I do like seeing these articles, though.
 
Oh, interesting. That's a good point.

A test with 768MB GTX 460s would be very useful; that is, if Brent isn't tired of taping up PCIe interfaces yet :)
 
Wow... just wow. I didn't see that one coming. I expected the framerate to drop at x8/x8, so... yeah. Goes to show how we're getting stiffed with all the changes of slots/mobos over the years.
 
Good to see motherboards lasting longer than before, especially as far as performance goes.
 
OK, so why do 3D cards have between 512MB and an amazing 2GB of RAM onboard? A: It is to remove, as much as possible, the bandwidth bottleneck from the PCIe bus and place it squarely where it belongs: in the much, much faster local RAM bus on the 3D card. As to whether the local RAM on a 3D card equals 512MB or 2GB, that's purely a matter of economics: how much money does the consumer wish to spend on each of his 3D cards? 1GB is probably the sweet spot, 2GB the ultimate, and 512MB is all the practical man can afford. But they all, to some degree, serve to eliminate the PCIe bus as a system bottleneck.

First, while the 3D card is displaying a series of frames already loaded into its onboard memory, the card is also actively querying the PCIe bus for the next segment of frames in system RAM, frames that it wants to load into local memory so that when the first series is exhausted, the second can seamlessly and immediately run at full speed, the user noticing no demarcation whatever between the two sets. Running the latest series of frames while the card is dumping previously rendered content from memory and loading frames which have yet to be seen is an ongoing process which the game repeats endlessly until it is finished. Obviously, the more onboard RAM a card has, the fewer repetitions of this nature it will have to do.

This is *why* the stress and bottleneck are not put on the PCIe bus! Even at x16, the PCIe bus is still far too slow to allow a smooth transition between frame series stored in the card's memory. If the game had to drop out of an 80fps stream of frames, go to the PCIe x16 bus to fetch the next set, and then wait however long the bus needed to align and synchronize them, that wait would be the price of presenting the video as a single, unflawed, unbroken piece.

To have to stop and wait every time data had to be mined from system RAM via the PCIe bus would chop up the presentation terribly; the distraction it caused might be the only memorable thing about the game, if it was remembered at all.

The more pauses a game makes digging up data through the PCIe bus, or otherwise just plain stalling on the PCIe bus, the less memorable it is, the less cohesive a yarn the storyteller conveys, and of course, the less believable the story becomes. Even the slowest PCIe bus seems "lightning fast" by everyday standards, but for this job it really isn't fast at all. ;)
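
For a sense of scale, a quick sketch of why the card's local memory hides the bus (the ~150 GB/s figure is an assumed ballpark for a high-end card's GDDR5 of that era):

Code:
# Time to move a 512 MB working set over different links.
def ms_to_move(mb, bw_mb_per_s):
    return mb / bw_mb_per_s * 1000

print(ms_to_move(512, 8000))    # PCIe 2.0 x16 (~8 GB/s): 64 ms, i.e. several
                                # whole frames at 60 fps
print(ms_to_move(512, 150000))  # on-card GDDR5 (~150 GB/s, assumed): ~3.4 ms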
 
I liked the way you spread this series of articles out; for me it made the information far easier to comprehend and digest than if you had waited and published all the test results at once.

I, like many others, was surprised at these results, and it will certainly affect how I look at motherboard choices for my next build.
 
Could someone tell me if this motherboard has a PCIe 1.1 or PCIe 2.0 x4 slot?

I want to know if I would be limited if I CrossFired my 4850 1GB card.
 
Sorry, have not read through this entire thread...

(btw thanks for doing this testing - it seems to reinforce what I've been seeing with my non NF200 boards so far!)

What about the effects of SLI bridges? I know that if you don't use 'em, performance drops. A three-way bridge on two cards does NOT connect BOTH goldfingers together; you can verify this with an ohmmeter.

I have tried with one and two bridges on dual 480s, and it does seem to be slightly better with both goldfingers connected. Still nothing conclusive, as the differences were minor, well within standard benchmark deviation.
 
1st: Personally, I like the new format.

2nd: Sincere thanks for going back and being willing to check out new tech on older architectures. This isn't the first time we've been duped/misled. In fact, the last time pissed me off so much that I held off for a long damn time before updating from the 8800 GTX to the new GTX 460 I've had for about 3 weeks now. Things like this article really help! Thank you.

Will probably get another GTX 460 (and do SLI on my ASUS P5Q Pro) and hopefully be good for another 3 years (barring finally converting to a 64-bit OS and upping to 4GB of RAM).

That depends on the content of the article. Brand new *architectures* require an article with more content.

Fixed ;)

IMO, new revs (unless there's a serious change à la the 460) don't require nearly as much background detail. Just a 'guided update', so to speak.
 