No good 3x GPU SandyBridge Motherboards?

Vega

Supreme [H]ardness
Joined
Oct 12, 2004
Messages
7,143
I cannot find any 16x / 16x / 16x PCI-E SandyBridge P67 motherboards. The most I can find is the 16x / 16x / 8x GIGABYTE GA-P67A-UD7.

So if I wanted 3x SLI 580's, one of the cards would be seriously limited. If a mundane 5870 is already capped by PCI-E x8, I am sure an overclocked 580 pushing 30-40% more bandwidth would be hurt even more.

[Chart: perfrel.gif]


So that effectively means P67 can't do top of the line 3x GPU solutions without a significant penalty. So I guess I have to wait for GTX 595's for Quad SLI or 6990's for Quad Crossfire if I want a 2600K system.

Either that or get an X58 system + an Intel 990X, which can support 3x x16 PCI-E, or wait for Ivy Bridge in Q4.

What a disappointment; I was ready to pull the trigger on a Sandy Bridge 2600K + 3x 580's.
 
I don't know about tri-SLI, but the [H] testing found no noticeable difference in SLI using x8x8 compared to x16x16. There was hardly a difference even at x4x4 if I remember the testing.

If you want full x16x16x16 you'll have to either go X58 or wait for socket 2011.
 
Having full x16 slots is useless when the CPU and chipset only support 16 lanes of PCI Express 2.0 total. nForce 200 chips don't add bandwidth; they add card capacity, mostly for chipsets that don't support SLI at all. All that traffic still has to be crunched through the 16 lanes the CPU/chipset provide. If that bothers you, you'll have to wait for socket 2011, which will support many more lanes of PCI Express 3.0. At this point the difference between x4 and x16 isn't much, so you could even run 3 cards on this chipset; they'd each effectively get about 5.33 lanes' worth of PCI Express bandwidth given the capacity of the chipset/CPU. If you want to run 3 cards, any of the motherboards such as the Asus Maximus or Gigabyte UD7 which support 16/8/8 or 16/16/8 should work. Again, 16/16/16 is pointless with this chipset.
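If anyone wants to sanity-check that 5.33 figure, here's a quick back-of-the-envelope sketch. It just splits the 16 CPU lanes evenly across the cards and multiplies by the roughly 500 MB/s of payload a PCIe 2.0 lane carries each way; an idealized split, not how any real board actually wires its slots.

```python
# Idealized split of the 16 PCIe 2.0 lanes on the LGA 1155 CPU across N cards.
# PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, i.e. roughly
# 500 MB/s of payload per lane, per direction.
PCIE2_MB_PER_LANE = 500   # MB/s per lane, one direction
CPU_LANES = 16            # total PCIe 2.0 lanes off the CPU

def per_card_share(num_cards):
    """Effective lanes and one-way MB/s per card if the 16 lanes were
    shared perfectly evenly (ignores how real boards wire their slots)."""
    lanes = CPU_LANES / num_cards
    return lanes, lanes * PCIE2_MB_PER_LANE

for n in (1, 2, 3):
    lanes, mb_s = per_card_share(n)
    print(f"{n} card(s): ~{lanes:.2f} lanes each, ~{mb_s:.0f} MB/s each way")
# 3 card(s): ~5.33 lanes each, ~2667 MB/s each way
```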
 
I don't know about tri-SLI, but the [H] testing found no noticeable difference in SLI using x8x8 compared to x16x16. There was hardly a difference even at x4x4 if I remember the testing.

If you want full x16x16x16 you'll have to either go X58 or wait for socket 2011.

That is correct: there was no difference on a single display. But on a multi-screen setup, the difference started to appear. Not only that, those are stock 480's. OC'd 580's are going to demand 20-30% more bandwidth over stock 480's, which are already clearly limited by x8 PCI-e in multi-screen setups.

[Chart: 1282534990Cnhf3iYXfv_2_1.gif]

[Chart: 1282534990Cnhf3iYXfv_2_2.gif]



So my question would be: if I have 2x 580's operating at full x16 but the third at x8, do all of the cards perform as if they are at x8? I would think the bottlenecked 580 on the x8 slot would negatively affect the tri-SLI setup in some way.
 
Sandy Bridge socket 1155/P67 is not the chipset or CPU to use for tri-SLI. nForce 200 chips are useless here, as all that bandwidth still has to go through the 16 lanes of PCI Express the chipset/CPU support. That nForce chip was all well and good back in the day just to add SLI support to chipsets that didn't support it at all, but now that cards are coming closer to exhausting PCI Express bandwidth, the bottleneck of the system is simply going to be the 16 lanes the chipset/CPU support, like trying to pour water through a funnel. You'll want to wait for socket 2011 for tri-SLI. As I said above, those nForce chips don't add bandwidth; the capacity of socket 1155 P67 is 16 lanes of PCI Express 2.0, period. Alright for 2 cards, not really for 3.

That's exactly what I was afraid of. So I guess my options would be to wait until Q4 for socket 2011, buy an X58 that can do x16/x16/x16 with the soon-to-be-released 990X, or get a P67 + 2600K and go with quad SLI or quad CrossFire with future GTX 595's or 6990's and drop those into two x16 slots.
 
That's exactly what I was afraid of. So I guess my options would be to wait until Q4 for socket 2011, buy an X58 that can do x16/x16/x16 with the soon-to-be-released 990X, or get a P67 + 2600K and go with quad SLI or quad CrossFire with future GTX 595's or 6990's and drop those into two x16 slots.

Except the two x16 2.0 slots are really x8 2.0 when they are both filled on a P67 board, regardless. The number of lanes doesn't increase out of nowhere. The HardOCP bench you listed shows a minuscule difference with 2-card SLI @ x8; however, if you start cramming the equivalent of four cards/GPU's into that same space instead of 2, you'll be killing the performance relative to even a full x8 per GPU (32 lanes total). You're going to be running into bandwidth limitations no matter how you play it.

Long story short: go for two-card SLI and don't worry about it, as you're giving up a whopping 2% performance or so going from x16 2.0 to x8 2.0 at this time, or choose to wait for S2011 toward the end of this year to get full lanes for all of your cards.
 
You sure on that? There are P67 boards out now that use the NF200 chip to get true x16/x16/x8. The problem is I'd like three x16 slots like you can get on the X58 chipset.
Since you apparently disregarded the third post, let me quote it for you:

Having full x16 slots is useless when the CPU and chipset only support 16 lanes of PCI Express 2.0 total. nForce 200 chips don't add bandwidth; they add card capacity, mostly for chipsets that don't support SLI at all. All that traffic still has to be crunched through the 16 lanes the CPU/chipset provide. If that bothers you, you'll have to wait for socket 2011, which will support many more lanes of PCI Express 3.0.... If you want to run 3 cards, any of the motherboards such as the Asus Maximus or Gigabyte UD7 which support 16/8/8 or 16/16/8 should work. Again, 16/16/16 is pointless with this chipset.
Furthermore, the additional latency added on by the NF200 chipset in many cases results in reduced performance over simply not having it there at all.
 
You sure on that? There are P67 boards out now that use the NF200 chip to get true x16/x16/x8. The problem is I'd like three x16 slots like you can get on the X58 chipset.

Oh god, not again.

The nForce 200MCP does have 32 PCI-Express lanes, but it uses 16 PCI-Express lanes to connect to the chipset and the rest of the system. So the chipset's native PCI-Express lane configuration is still your choke point. This is why there is added latency in nForce 200MCP-equipped systems. X58 chipset-based boards have a maximum of 36 PCI-Express lanes and tend to use dual nForce 200MCPs, which means that 32 lanes are replaced by 64 lanes on the nForce 200MCPs. Still, your choke points are the PCI-Express lanes built into the chipset.

In other words, you won't actually get "true" PCI-Express x16/x16/x16 or x16/x16/x8 worth of bandwidth until Intel increases the number of lanes provided in their PCI-Express controllers. Unfortunately, this means that P55 and P67 chipset-based boards are at best limited to a 16x8 configuration or an 8x8x8 configuration. X58 boards are really limited to x16/x16/x4, x16/x8/x8, or x8/x8/x8/x8.

It's been theorized that the nForce 200MCP uses some kind of compression algorithm to send data through the chipset's PCI-Express lanes. However, I have no way to verify that. NVIDIA does not share the secrets of the nForce 200MCP and what it actually does. One thing is certain: any "extra bandwidth" it offers comes at a price. The price is latency, which hinders performance compared to not having the chip present at all.

HardOCP's own testing shows that either the difference between x16/x16 and x8/x8 is minimal (and may actually be totally negligible, only presenting itself in the form of explainable testing variance) or that nForce 200MCP-equipped boards actually run slower than boards without it. While I haven't done this with NVIDIA GeForce GTX 580's, I've tested this before myself and found no appreciable difference between PCI-Express x16 and x8 slots with any video card.
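To put numbers on why a "true" x16/x16/x16 doesn't happen, here's a trivial sketch that just totals the lanes each slot layout needs and checks it against the 36 lanes native to X58; the budget and the candidate layouts are the ones from the post above, nothing more.

```python
# Which slot layouts fit within X58's 36 native PCI-Express lanes?
X58_LANE_BUDGET = 36

layouts = [
    (16, 16, 16),    # the "true" tri-x16 everyone wants
    (16, 16, 4),
    (16, 8, 8),
    (8, 8, 8, 8),
]

for layout in layouts:
    needed = sum(layout)
    verdict = "fits" if needed <= X58_LANE_BUDGET else "does NOT fit"
    slots = "/".join(f"x{w}" for w in layout)
    print(f"{slots:<16} needs {needed:2d} lanes -> {verdict}")
# Only x16/x16/x16 (48 lanes) exceeds the 36-lane budget; the rest fit.
```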
 
One of the trade-offs of the DMI architecture is less bandwidth for stuff like this. It's the QPI architecture that excels at this kind of thing, and for that you're going to have to wait until LGA 2011.
 
True, you will have to wait till Ivy Bridge is released later this year. P67 was never designed to offer 3x or 4x 16x PCI express slots.

So you will have to use an X58 mobo till then. I believe the 990X CPU should be released in the next couple of months or sooner.
 
Having full x16 slots is useless when the CPU and chipset only support 16 lanes of PCI Express 2.0 total. nForce 200 chips don't add bandwidth; they add card capacity, mostly for chipsets that don't support SLI at all. All that traffic still has to be crunched through the 16 lanes the CPU/chipset provide. If that bothers you, you'll have to wait for socket 2011, which will support many more lanes of PCI Express 3.0. At this point the difference between x4 and x16 isn't much, so you could even run 3 cards on this chipset; they'd each effectively get about 5.33 lanes' worth of PCI Express bandwidth given the capacity of the chipset/CPU. If you want to run 3 cards, any of the motherboards such as the Asus Maximus or Gigabyte UD7 which support 16/8/8 or 16/16/8 should work. Again, 16/16/16 is pointless with this chipset.

Oh god, not again.

The nForce 200MCP does have 32 PCI-Express lanes, but it uses 16 PCI-Express lanes to connect to the chipset and the rest of the system. So the chipset's native PCI-Express lane configuration is still your choke point. This is why there is added latency in nForce 200MCP-equipped systems. X58 chipset-based boards have a maximum of 36 PCI-Express lanes and tend to use dual nForce 200MCPs, which means that 32 lanes are replaced by 64 lanes on the nForce 200MCPs. Still, your choke points are the PCI-Express lanes built into the chipset.

In other words, you won't actually get "true" PCI-Express x16/x16/x16 or x16/x16/x8 worth of bandwidth until Intel increases the number of lanes provided in their PCI-Express controllers. Unfortunately, this means that P55 and P67 chipset-based boards are at best limited to a 16x8 configuration or an 8x8x8 configuration. X58 boards are really limited to x16/x16/x4, x16/x8/x8, or x8/x8/x8/x8.

It's been theorized that the nForce 200MCP uses some kind of compression algorithm to send data through the chipset's PCI-Express lanes. However, I have no way to verify that. NVIDIA does not share the secrets of the nForce 200MCP and what it actually does. One thing is certain: any "extra bandwidth" it offers comes at a price. The price is latency, which hinders performance compared to not having the chip present at all.

HardOCP's own testing shows that either the difference between x16/x16 and x8/x8 is minimal (and may actually be totally negligible, only presenting itself in the form of explainable testing variance) or that nForce 200MCP-equipped boards actually run slower than boards without it. While I haven't done this with NVIDIA GeForce GTX 580's, I've tested this before myself and found no appreciable difference between PCI-Express x16 and x8 slots with any video card.

That makes a lot of sense. So basically, no matter which bridge chips are used, P67 can only handle a combined maximum of 16 PCI-e lanes. That makes the P67 effectively a 2-GPU-max chipset, or otherwise you're dealing with serious bottlenecks with 3x high-end GPU's?

On X58 motherboards you get 36 PCI-e lanes, but the nForce 200's add latency and reduce scaling performance, right? Still, the X58 would make a much better 3-GPU chipset due to having more than double the PCI-e lanes.

Even if I went with, say, 2x GTX 595's for quad SLI, they would still be seriously hindered by the P67's 16-lane total limitation versus the X58's 36-lane total, correct? Effectively making the P67 a poor multi-GPU chipset beyond 2 GPU's.

Does anyone know how the following motherboard will play into all of this using a Lucid Hydra chip?

[Image: MSI_BB_Marshal_1.jpg]

http://www.semiaccurate.com/2010/12/28/msi-shows-its-big-bang-marshal-board/

Basically what you are saying is that even though this board has four PCI-e x16 slots, that all gets funneled down to x16 to talk to the CPU, and it will once again be seriously hampered with more than 2x GPU's? That would make designing such a board completely pointless.
 
True, you will have to wait till Ivy Bridge is released later this year. P67 was never designed to offer 3x or 4x 16x PCI express slots.

So you will have to use an X58 mobo till then. I believe the 990X CPU should be released in the next couple of months or sooner.

If I do go with the soon-to-be-released 990X for proper tri-SLI, thanks to X58's native 36 PCI-e lanes, which board would you recommend? I understand that a native x8/x8/x8 would be a better solution than going with nForce 200 motherboards at x16/x16/x16, correct?

Does anyone know which motherboards would best accomplish what I want: a 990X + 3x GTX 580's in SLI without bridge chips?
 
I think it's all relative; just get a higher-end X58 (EVGA Classified, Asus R3E, or Gigabyte UD7) and enjoy the 990X along with x16/x8/x8 tri-SLI, as that in itself is awesome.
 
I'd upgrade to Sandy Bridge no matter what and just upgrade again to socket 2011 if it's a big deal to you. Sandy stuff will have good resale. I plan to upgrade when the Ivy Bridge 22nm stuff hits; just the mobo and CPU will need replacing, which is nice. Get the 2500K and an Asus P67 Pro board and that's only 400 bucks or so. A tri-SLI-capable motherboard will be another hundred, and will certainly be adequate for 6 months until socket 2011. Much cheaper than spending 1000 on a hex-core processor. I'd never spend 1000 just on a processor, but that's me. :)

If you need tri-SLI now, go 1366.
 
Ya, after reviewing some benchmarks, getting a 990X for gaming would be a waste. It's only really useful for video encoding and all that other crap that I do not do. ;) There really isn't anything worth upgrading to from my i7 920 D0 @ 4.55GHz for gaming.
 
That makes a lot of sense. So basically, no matter which bridge chips are used, P67 can only handle a combined maximum of 16 PCI-e lanes. That makes the P67 effectively a 2-GPU-max chipset, or otherwise you're dealing with serious bottlenecks with 3x high-end GPU's?

On X58 motherboards you get 36 PCI-e lanes, but the nForce 200's add latency and reduce scaling performance, right? Still, the X58 would make a much better 3-GPU chipset due to having more than double the PCI-e lanes.

Even if I went with, say, 2x GTX 595's for quad SLI, they would still be seriously hindered by the P67's 16-lane total limitation versus the X58's 36-lane total, correct? Effectively making the P67 a poor multi-GPU chipset beyond 2 GPU's.

Does anyone know how the following motherboard will play into all of this using a Lucid Hydra chip?

[Image: MSI_BB_Marshal_1.jpg]

http://www.semiaccurate.com/2010/12/28/msi-shows-its-big-bang-marshal-board/

Basically what you are saying is that even though this board has four PCI-e x16 slots, that all gets funneled down to x16 to talk to the CPU, and it will once again be seriously hampered with more than 2x GPU's? That would make designing such a board completely pointless.

Well, there is more to it than that. In P55/H67/P67 there is one benefit to the nForce 200MCP: the onboard PCIe controller in those processors is limited to two devices, and the nForce 200MCP shows up as one device since the devices connected to it aren't presented to the controller directly. That's an oversimplification of how it works, but basically the nForce 200MCP is a workaround, a multiplexor of sorts, which gets around that two-device limit.

It also provides dynamic allocation of PCI-Express lanes to individual slots, removing the need for PCI-Express switch cards and/or jumpers to configure the allocation of lanes to specific slots. So I'm not saying the nForce 200MCP is useless. It isn't, but its exact use is often misunderstood, as people think it adds PCI-Express lanes. In a way it does, but those lanes are bottlenecked by the chipset's PCI-Express lanes, which the nForce 200MCP uses to link itself into the rest of the system.

X58 supports more lanes than P55/P67/H67 do. However, not all of these boards use the nForce 200MCP, so you can enjoy a latency-free SLI rig. But the fact of the matter is that you won't get a true x16/x16/x16 configuration until the chipsets change and more lanes are allocated. Given that lanes are physical pathways, we may not ever see an x16/x16/x16 configuration in the truest sense. What we may see instead is PCI-Express 3.0 simply offering far more bandwidth than PCI-Express 1.0/1.0a/2.0 do now. All indications are that we will see exactly that going forward. And once again, it will take time before graphics cards saturate the bandwidth provided. Even today we don't really see saturation of PCI-Express 2.0 x8 slots in most circumstances. Even dual-GPU cards typically show little improvement going from x8 to x16 slots. Even when you do see a difference, it's only in multi-monitor setups at high resolution.
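For a rough idea of how much extra headroom PCI-Express 3.0 should bring per lane, here's a quick comparison using the published signalling rates and encoding overheads; the Gen 3 numbers are from the spec (8 GT/s, 128b/130b), not from any shipping board.

```python
# Approximate usable payload bandwidth per PCIe lane, per direction.
# Gen 1.x/2.0 use 8b/10b encoding (20% overhead); Gen 3 moves to 8 GT/s
# with 128b/130b encoding (~1.5% overhead).
generations = {
    "PCIe 1.x": (2.5, 8 / 10),      # (GT/s, encoding efficiency)
    "PCIe 2.0": (5.0, 8 / 10),
    "PCIe 3.0": (8.0, 128 / 130),
}

for name, (gtps, eff) in generations.items():
    per_lane_gb = gtps * eff / 8    # one transfer carries one bit
    print(f"{name}: ~{per_lane_gb:.2f} GB/s per lane each way, "
          f"x16 slot ~{16 * per_lane_gb:.1f} GB/s each way")
# ~0.25, ~0.50 and ~0.98 GB/s per lane respectively
```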
 
Have to keep in mind SB is the low-end stuff; I'm sure socket 2011 will have this.
 
Well, there is more to it than that. In P55/H67/P67 there is one benefit to the nForce 200MCP: the onboard PCIe controller in those processors is limited to two devices, and the nForce 200MCP shows up as one device since the devices connected to it aren't presented to the controller directly. That's an oversimplification of how it works, but basically the nForce 200MCP is a workaround, a multiplexor of sorts, which gets around that two-device limit.

It also provides dynamic allocation of PCI-Express lanes to individual slots, removing the need for PCI-Express switch cards and/or jumpers to configure the allocation of lanes to specific slots. So I'm not saying the nForce 200MCP is useless. It isn't, but its exact use is often misunderstood, as people think it adds PCI-Express lanes. In a way it does, but those lanes are bottlenecked by the chipset's PCI-Express lanes, which the nForce 200MCP uses to link itself into the rest of the system.

X58 supports more lanes than P55/P67/H67 do. However, not all of these boards use the nForce 200MCP, so you can enjoy a latency-free SLI rig. But the fact of the matter is that you won't get a true x16/x16/x16 configuration until the chipsets change and more lanes are allocated. Given that lanes are physical pathways, we may not ever see an x16/x16/x16 configuration in the truest sense. What we may see instead is PCI-Express 3.0 simply offering far more bandwidth than PCI-Express 1.0/1.0a/2.0 do now. All indications are that we will see exactly that going forward. And once again, it will take time before graphics cards saturate the bandwidth provided. Even today we don't really see saturation of PCI-Express 2.0 x8 slots in most circumstances. Even dual-GPU cards typically show little improvement going from x8 to x16 slots. Even when you do see a difference, it's only in multi-monitor setups at high resolution.

That last sentence is what has me concerned. Like you, I run 3x 30" surround. It sounds like these PCI-e limitations might affect us more than the mainstream users.

Granted, these charts are based on 5870's and not 580's, but for tri-CrossFire the x16 PCI-e link (P55, same as P67) from the NF200 to the CPU doesn't appear to be hampering performance.

[Chart: image013.png]

About a 4% loss from x16 to x8 and a 12% loss from x16 to x4.

[Chart: image046.png]

Here there is a slight loss going from X58's 36 PCI-e lanes down to P55's 16 lanes with the NF200.

[Chart: image051.png]

Here the performance between X58 and P55 with the NF200 is negligible with 3x 5870's. Maybe with SLI/CrossFire bridges in 3+ GPU setups, the PCI-e slots for each card aren't getting saturated talking to the CPU.

If you were to base your assumptions on these charts, 3x 580's would work fine on P67.
 
X58 supports more lanes than P55/P67/H67 do. However, not all of these boards use the nForce 200MCP, so you can enjoy a latency-free SLI rig. But the fact of the matter is that you won't get a true x16/x16/x16 configuration.
There is actually a board out there with four true x16 slots without using PCIe bridge chips, though it's a dual-socket Xeon board and I don't think it's SLI-certified, which makes me wonder who it is intended for. Most server cards I've seen are x8 or lower.

http://www.supermicro.com/products/motherboard/QPI/5500/X8DTG-QF.cfm
 
I cannot find any 16x / 16x / 16x PCI-E SandyBridge P67 motherboards. The most I can find is the 16x / 16x / 8x GIGABYTE GA-P67A-UD7.

So if I wanted 3x SLI 580's, one of the cards would be seriously limited. If a mundane 5870 is already capped by PCI-E x8, I am sure an overclocked 580 pushing 30-40% more bandwidth would be hurt even more.

[Chart: perfrel.gif]

So that effectively means P67 can't do top of the line 3x GPU solutions without a significant penalty. So I guess I have to wait for GTX 595's for Quad SLI or 6990's for Quad Crossfire if I want a 2600K system.

Either that or get an X58 system + an Intel 990X, which can support 3x x16 PCI-E, or wait for Ivy Bridge in Q4.

What a disappointment; I was ready to pull the trigger on a Sandy Bridge 2600K + 3x 580's.

EVGA is releasing a Classified version of their P67 lineup soon. I currently have 570 SC SLI and might go for a third down the road, so we are in a similar situation.
 
That last sentence is what has me concerned. Like you, I run 3x 30" surround. It sounds like these PCI-e limitations might affect us more than the mainstream users.

Granted, these charts are based on 5870's and not 580's, but for tri-CrossFire the x16 PCI-e link (P55, same as P67) from the NF200 to the CPU doesn't appear to be hampering performance.

[Chart: image013.png]

About a 4% loss from x16 to x8 and a 12% loss from x16 to x4.

[Chart: image046.png]

Here there is a slight loss going from X58's 36 PCI-e lanes down to P55's 16 lanes with the NF200.

[Chart: image051.png]

Here the performance between X58 and P55 with the NF200 is negligible with 3x 5870's. Maybe with SLI/CrossFire bridges in 3+ GPU setups, the PCI-e slots for each card aren't getting saturated talking to the CPU.

If you were to base your assumptions on these charts, 3x 580's would work fine on P67.

You keep showing P55 data. P67 is not P55. The PCIe lanes are very different. They are bi-directional vs uni-directional and have increased in bandwidth. I believe the DMI has even gone from 5 GT/s to 20 GT/s.

P67 should perform better @ x8 than P55, and should be on par with X58.

Just to give you another tidbit: I currently have a Rampage 2 Extreme, which has x16/x8/x8 (I have 2 GTX 480's and 1 GTX 280 on it). When I first looked into getting that motherboard, EVGA was touting an NF200 tri-SLI motherboard which had x16/x16/x16, but reviews showed that the Rampage 2 Extreme was indeed faster due to the added latency of the NF200. I think even HardOCP had a review where it showed exactly that.

So the point is moot. I don't think there is any issue with tri-SLI, as it's the same situation as a Rampage 2 Extreme with an NF200. It may be a bit slower, but with the extra CPU power from overclocking it should be faster anyway. All the data on the internet for x16/x16 vs x8/x8 is old data using P55 and not P67.

I am moving to P67 with 2 GTX 580s only.
 
The P55 PCI-E 2.0 implementation was only 2.5 GT/s versus the nominal 5 GT/s. The X58 uses the 5 GT/s rate and so does the P67. Thus, an x16 P55 slot only has as much bandwidth as an x8 slot on X58 or P67. Intel gimped the PCI-E on purpose to push people like you towards the X58.

If you are going to purchase that many GPUs, why not just buy an X58 or wait for LGA2011? Clearly money isn't the biggest issue if you are talking about 3-4x $500 GPU cards, so why go cheap on the MB/CPU setup? ;)

If the differences between X58 and P55 were small (as your graphs mostly show... remember that x% faster only matters if it actually gets you a better experience, which is why HardOCP has the best GPU reviews), they should be even smaller on P67, since the bandwidth to each slot is technically double that of P55. I.e., an x4 P67 slot is equal to an x8 P55 slot, and an x8 P67 slot is equal to an x16 P55 slot.
 
Hasn't the PCIe bandwidth increased from 2.5 to 5 GT/s, negating the difference between x8 and x16?
 
This is why I stick with one video card and one high-quality 1680x1050 monitor!
 
I've read an interesting article that says the NF200 can add a small bit of latency, but the bandwidth for multiple GPU's talking to each other is worth the trade-off. Quote: "While it is true that there will always be a 16X bottleneck between the NF200 and the CPU, the NF200 can allow each of the cards occupying the four slots to communicate directly with each other via the NF200. This means that card-to-card traffic never needs to traverse that 16X bottleneck, only traffic destined directly to the CPU. So there are effectively 32 PCI-E lanes available for cards to talk to each other with, and 16 PCI-E lanes that are used exclusively for card-to-CPU communication. Nvidia also packs some other intelligence into the NF200 that keeps the flow as efficient as possible."

http://www.hardwarecanucks.com/foru...a-p55a-ud7-lga-1156-motherboard-review-3.html


Interesting graph:

[Chart: 1231268597VdOL4b1qrP_1_1_l.gif]


At 2560x1600 with 4x AA the NF200 is just as fast. Seeing this, the NF200 would be pretty much a necessity on P67 multi-GPU boards, as it allows the graphics cards to talk to each other fairly quickly without saturating the CPU's x16 link directly.
 
^
Yeah, I remember reading that chart. As for P67 offerings, I am patiently awaiting the release of the MSI P67A-GD80. It should offer tri-SLI.

Just google P67A-GD80 HD MOD. It should be the very first website listed. It shows a picture of the (hopefully) soon-to-be-released MSI board.
 
I'm not sure that I agree with that assessment. Truly, though, we don't know exactly what the nForce 200MCP really does, but I don't think that graphics cards talk to each other any differently when the nForce 200MCP is present. Certainly AMD cards will not. I think that at 2560x1600+ resolutions, the latency simply becomes a non-issue as other bottlenecks become a factor, GPU performance characteristics being among the most likely.

In any case I don't think the nForce 200MCP helps or hurts you much either way. In P55 and P67 systems it's a good thing due to limitations of the onboard PCIe controller found in those CPUs. X58 doesn't really need it, and at resolutions under 2560x1600 it seems it actually can hurt performance to a very small, almost minuscule degree. You'd really need to be a benchmarking whore either way, because the differences would be undetectable if you took the "Pepsi Challenge" between systems with and without nForce 200MCPs.
 
You keep showing P55 data. P67 is not P55. The PCIe lanes are very different. They are bi-directional vs uni-directional and have increased in bandwidth. I believe the DMI has even gone from 5 GT/s to 20 GT/s.
Oh my.

Firstly, the 16 PCIe 2.0 lanes are in the CPU, not the chipset, for both Lynnfield and SB. PCIe lanes come as transmit/receive pairs, for a total of 4 "wires" per lane. Transmit is unidirectional and receive is unidirectional. There is no such thing as bi-directional vs uni-directional. Refer to page 90 of the data sheet, volume 1: http://www.intel.com/design/corei5/documentation.htm As expected, it has 16 PCIe lanes organized as transmit, transmit return, receive and receive return right on the CPU. The Lynnfield PCIe controller supports standard 2.5GT/s (PCIe 1.1) and 5.0GT/s (PCIe 2.0) data rates, just like the SB PCIe controller. See page 24 of the same document. :p

DMI on Lynnfield & SB are 20Gb/s (little b is bits), which is approximately 2GB/s (big B is bytes) of bandwidth. SB block diagram: http://images.bit-tech.net/content_images/2011/01/intel-sandy-bridge-review/p67-blockdiagram.jpg Lynnfield block diagram: http://images.anandtech.com/reviews/motherboards/2009/gigap55u/p55block.jpg DMI bandwidth was unchanged between Lynnfield and SB.
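If anyone wants to double-check the little-b/big-B arithmetic, this is the conversion being used; the only assumption is that the link carries the same 8b/10b encoding overhead as a PCIe 1.x/2.0 lane.

```python
# Convert a raw link rate in Gb/s (gigaBITS per second) into usable GB/s
# (gigaBYTES per second), assuming 8b/10b encoding like PCIe 1.x/2.0.
def usable_gbytes(raw_gbits, encoding_efficiency=8 / 10):
    return raw_gbits * encoding_efficiency / 8

print(f"DMI @ 20 Gb/s raw                -> ~{usable_gbytes(20):.1f} GB/s")  # ~2.0
print(f"x16 PCIe 2.0 @ 80 Gb/s (one way) -> ~{usable_gbytes(80):.1f} GB/s")  # ~8.0
```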

Using Lynnfield as a model of how SB would perform with the NVIDIA bridge is a reasonable assumption, and it looks like it would be very much worth it if 3 PCIe slots are needed.
 
Oh my.

Firstly, the 16 PCIe 2.0 lanes are in the CPU, not the chipset, for both Lynnfield and SB. PCIe lanes come as transmit/receive pairs, for a total of 4 "wires" per lane. Transmit is unidirectional and receive is unidirectional. There is no such thing as bi-directional vs uni-directional. Refer to page 90 of the data sheet, volume 1: http://www.intel.com/design/corei5/documentation.htm As expected, it has 16 PCIe lanes organized as transmit, transmit return, receive and receive return right on the CPU. The Lynnfield PCIe controller supports standard 2.5GT/s (PCIe 1.1) and 5.0GT/s (PCIe 2.0) data rates, just like the SB PCIe controller. See page 24 of the same document. :p

So does this quote only apply to the PCIe lanes coming off the southbridge?

Intel has been decidedly sneaky in its marketing of the PCI-E capabilities of P55, though. Intel says that P55 supports 'Generation 2' PCI Express, which usually operates at 5GT/sec for an 8x slot. Intel disputes this, claiming "The PCI-E ports on P55 support the PCI Express 2.0 spec at the 2.5 GT/sec speed. The 5GT/sec speed (which is often referred to as Gen 2) is an optional component of the PCI-E 2.0 spec and is not supported on P55.
 
Vega, with your setup you strike me as more of a Socket 2011 guy anyways. Are you not already on X58?
 
So does this quote only apply to the PCIe lanes coming off the southbridge?
Yes. The 5 series chipsets were limited for whatever reason. It's documented in the chipset data sheet. http://www.intel.com/Products/Desktop/Chipsets/P55/P55-technicaldocuments.htm (page 37)

IMO, that was unimportant to most users, since graphics card(s) only run off the integrated PCIe controller, and the add-on cards that mainstream and low-end LGA 1156 users generally had (sound card, TV card, etc.) were not bandwidth-constrained. There certainly are some PCIe 2.0 cards that could be choked on that 2.5 GT/s bus, such as x1 USB 3.0 controllers and x1 RAID controllers (especially with SSDs).
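To illustrate the choke point: a single lane at 2.5 GT/s only carries about 250 MB/s of payload after 8b/10b overhead, which is roughly half of what a 5 GT/s lane (or the raw USB 3.0 SuperSpeed signalling rate) can move. A rough comparison, using nominal figures only:

```python
# Payload of a single PCIe lane at 2.5 GT/s vs 5 GT/s (8b/10b encoding),
# compared against USB 3.0 SuperSpeed's nominal 5 Gb/s signalling rate,
# which also uses 8b/10b. Nominal figures only, no protocol overhead.
def lane_mb_s(gtps):
    return gtps * (8 / 10) / 8 * 1000   # GT/s -> MB/s of payload, one way

print(f"x1 lane @ 2.5 GT/s : ~{lane_mb_s(2.5):.0f} MB/s")   # ~250 MB/s
print(f"x1 lane @ 5.0 GT/s : ~{lane_mb_s(5.0):.0f} MB/s")   # ~500 MB/s
print(f"USB 3.0 SuperSpeed : ~{lane_mb_s(5.0):.0f} MB/s of wire bandwidth")
# A USB 3.0 controller behind a 2.5 GT/s x1 lane tops out around half of
# what the USB side could otherwise deliver.
```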
 
Vega, with your setup you strike me as more of a Socket 2011 guy anyways. Are you not already on X58?

Ya, I run an X58 with a 920 D0 @ 4.55GHz, but I only have the slots to run 2x GPU's. I want to get a third, so I'd need another motherboard anyways; hence my interest in a 5+GHz 2600K tri-6970 system. At the resolutions I play at, the CPU and the GPU's are both bottlenecks. :D

I will definitely be upgrading to 2011 when that comes out, but since the SB stuff is so cheap I am considering a (side-grade) 2600K system to get that extra GPU in the meantime. I know, two new computer builds in one year is a bit excessive, but it's an addiction. :eek:

Probably the smart thing to do is just wait and get dual 6990's, as I could fit two of those on my current X58 with a full x16/x16 of PCI-e to the CPU. It will be interesting to see if 2x 6990's will end up being faster than 3x 6970's.
 
I've read an interesting article that says the NF200 can add a small bit of latency, but the bandwidth for multiple GPU's talking to each other is worth the trade-off. Quote: "While it is true that there will always be a 16X bottleneck between the NF200 and the CPU, the NF200 can allow each of the cards occupying the four slots to communicate directly with each other via the NF200. This means that card-to-card traffic never needs to traverse that 16X bottleneck, only traffic destined directly to the CPU. So there are effectively 32 PCI-E lanes available for cards to talk to each other with, and 16 PCI-E lanes that are used exclusively for card-to-CPU communication. Nvidia also packs some other intelligence into the NF200 that keeps the flow as efficient as possible."

http://www.hardwarecanucks.com/foru...a-p55a-ud7-lga-1156-motherboard-review-3.html

Interesting graph:

[Chart: 1231268597VdOL4b1qrP_1_1_l.gif]

At 2560x1600 with 4x AA the NF200 is just as fast. Seeing this, the NF200 would be pretty much a necessity on P67 multi-GPU boards, as it allows the graphics cards to talk to each other fairly quickly without saturating the CPU's x16 link directly.

Am I reading the chart wrong, or is it faster without the NF200?
 
Yes, with the NF200 it is slightly slower. Although I think this information is useful insofar as an NF200 would be needed on P67 to run x16/x8/x8 for tri-GPU without too much ill effect.
 
What the NF200 does to benefit multi-GPU configurations is not being addressed here. As we're all aware, SLI/CrossFire doesn't provide pooling of the frame buffer. Because of this, a majority of the data sent between the CPU and 2-4 GPUs is redundant.

True, the x16 interface between the CPU and the GPUs (technically I believe the P67 chipset features 8 lanes of PCI-E 3.0 subdivided into 16 lanes of PCI-E 2.0 bandwidth, though I may be mistaken) is not allotted any extra bandwidth. There's no mystical implementation that increases inherent CPU bandwidth. Focusing on that simply, utterly, misses the point.

What the NF200 controller does is dramatically increase the inter-GPU bandwidth. It uses two specific functions while operating as a communication hub between the CPU, memory controller, and PCI-E bus: PW Short and Broadcast.

Broadcast is a mechanism for replicating a single signal that would normally occupy 2-4x the PCI-E bandwidth on the x16 highway if routed using the normal protocol. Instead, the CPU only needs to send a single packet, which is then replicated by the NF200 switch and routed to the other 2-4 GPUs. Because these GPUs are able to communicate with each other AND the NF200 hub with 32 *real* lanes of bandwidth, the 16 lanes of CPU-only bandwidth are left freer. Dramatically less congested. And in practice it works phenomenally, especially considering the absolute limitation of these "mainstream" chipsets.

PW Short is another, similar benefit of the NF200. In traditional SLI/CrossFire rigs, any inter-GPU communication has to travel up the PCI-E bus, get stored in memory, and be identified and rerouted to the target GPU (taking up precious CPU cycles), then go back down the bus (taking up precious PCI-E bandwidth). I'm not informed enough on the details to attempt an in-depth explanation without making myself look ridiculous. The gist of it, though, is that because only one of the GPUs behaves as a master, the NF200's logic can step into that round trip and route inter-GPU traffic more efficiently and faster, again saving the limited x16 of CPU bandwidth.
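Purely as a toy model of the Broadcast idea described above, and not a claim about what the NF200 silicon actually does: the accounting difference is that without a broadcast-capable switch every GPU gets its own copy of the shared data across the CPU's x16 link, while with one the data crosses that link once and is replicated downstream.

```python
# Toy model: traffic crossing the single CPU <-> switch x16 link when shared
# data is sent once per GPU versus sent once and replicated by the switch.
# Illustrative only; the per-frame figure below is made up for the example.
def upstream_traffic_mb(shared_mb, num_gpus, broadcast):
    copies = 1 if broadcast else num_gpus
    return shared_mb * copies

SHARED_PER_FRAME_MB = 50   # hypothetical amount of duplicated per-frame data

for gpus in (2, 3, 4):
    plain = upstream_traffic_mb(SHARED_PER_FRAME_MB, gpus, broadcast=False)
    bcast = upstream_traffic_mb(SHARED_PER_FRAME_MB, gpus, broadcast=True)
    print(f"{gpus} GPUs: {plain:.0f} MB/frame over the x16 link without "
          f"broadcast vs {bcast:.0f} MB/frame with it")
```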

And y'all, it really works. With a triple CrossFire config, I've managed to get some blistering-fast results with no apparent PCI-E bandwidth limitation. The following are at 1920x1080 (Eyefinity results are equally impressive, just nothing I can quote atm). Just Cause 2, desert sunrise, all maximized: 186 FPS. Heaven 2.1, normal tessellation, otherwise default: 146 FPS. 3DMark Vantage: P41065, GPU: 46640. BFBC2, all maximum settings, first level: 182 FPS. SF IV, again maximized: 332 FPS. Resident Evil 5, maximum: 208 FPS. Crysis Warhead, Ambush benchmark, enthusiast + 8xAA: 97 FPS; + 0xAA: 109 FPS.

This is my first post, btw. I adore this website & am often lurking about the forums. Nice to meet ya'll!
 
True, the x16 interface between the CPU and the GPUs (technically I believe the P67 chipset features 8 lanes of PCI-E 3.0 subdivided into 16 lanes of PCI-E 2.0 bandwidth, though I may be mistaken) is not allotted any extra bandwidth.
Welcome!

But no on your assumption there. :p Lynnfield and SB have an integrated PCIe 2.0 x16 interface right on the CPU die. The 5 and 6 series chipsets have a separate PCIe 2.0 8 lane interface that can be divided up in different ways (x1, x2, x4).

Probably no PCIe 3.0 until Ivy Bridge or the 7 series chipsets, unless of course the 6 series chipsets have the capability now and it's just disabled.
 
What the NF200 does to benefit multi-GPU configurations is not being addressed here. As we're all aware, SLI/CrossFire doesn't provide pooling of the frame buffer. Because of this, a majority of the data sent between the CPU and 2-4 GPUs is redundant.

True, the x16 interface between the CPU and the GPUs (technically I believe the P67 chipset features 8 lanes of PCI-E 3.0 subdivided into 16 lanes of PCI-E 2.0 bandwidth, though I may be mistaken) is not allotted any extra bandwidth. There's no mystical implementation that increases inherent CPU bandwidth. Focusing on that simply, utterly, misses the point.

What the NF200 controller does is dramatically increase the inter-GPU bandwidth. It uses two specific functions while operating as a communication hub between the CPU, memory controller, and PCI-E bus: PW Short and Broadcast.

Broadcast is a mechanism for replicating a single signal that would normally occupy 2-4x the PCI-E bandwidth on the x16 highway if routed using the normal protocol. Instead, the CPU only needs to send a single packet, which is then replicated by the NF200 switch and routed to the other 2-4 GPUs. Because these GPUs are able to communicate with each other AND the NF200 hub with 32 *real* lanes of bandwidth, the 16 lanes of CPU-only bandwidth are left freer. Dramatically less congested. And in practice it works phenomenally, especially considering the absolute limitation of these "mainstream" chipsets.

PW Short is another, similar benefit of the NF200. In traditional SLI/CrossFire rigs, any inter-GPU communication has to travel up the PCI-E bus, get stored in memory, and be identified and rerouted to the target GPU (taking up precious CPU cycles), then go back down the bus (taking up precious PCI-E bandwidth). I'm not informed enough on the details to attempt an in-depth explanation without making myself look ridiculous. The gist of it, though, is that because only one of the GPUs behaves as a master, the NF200's logic can step into that round trip and route inter-GPU traffic more efficiently and faster, again saving the limited x16 of CPU bandwidth.

And y'all, it really works. With a triple CrossFire config, I've managed to get some blistering-fast results with no apparent PCI-E bandwidth limitation. The following are at 1920x1080 (Eyefinity results are equally impressive, just nothing I can quote atm). Just Cause 2, desert sunrise, all maximized: 186 FPS. Heaven 2.1, normal tessellation, otherwise default: 146 FPS. 3DMark Vantage: P41065, GPU: 46640. BFBC2, all maximum settings, first level: 182 FPS. SF IV, again maximized: 332 FPS. Resident Evil 5, maximum: 208 FPS. Crysis Warhead, Ambush benchmark, enthusiast + 8xAA: 97 FPS; + 0xAA: 109 FPS.

This is my first post, btw. I adore this website & am often lurking about the forums. Nice to meet ya'll!

I've never heard that information concerning the nForce 200MCP. I'm not saying you're wrong, but I've never found any data on what it does in so much detail. Even NVIDIA hasn't provided that much information when I've asked them. Neither could Intel when asked about nForce 100MCP and nForce 200MCP chipsets being added to boards. Do you have a link to information concerning the operation of the nForce 200MCP?

As for the chip actually improving inter-GPU bandwidth, I have doubts. AMD and NVIDIA have both stated previously that no inter-GPU communication goes over the PCI-Express bus on cards which use SLI or CrossFire bridges, so that seems unlikely. Benchmarks with and without nForce 200MCP's on X58 chipset-based boards seem to confirm that there is little to no value in having the nForce 200MCP onboard. If anything, the nForce 200MCP tends to hurt performance more than it helps, at least on the X58 chipset. P55 and P67 are a different matter, and that's been covered many times in the thread thus far.
 