I can see what you mean. ATI doesn't exactly have the best bandwidth with their products, and using more than 2 GPUs can quickly eat away at that. NVIDIA scales better because they have a massive amount of bandwidth with their nForce 200 chipset, but I don't know what ATI uses for their PCI-E controller.
I wonder if NVIDIA's scaling is better because they only need to focus on 3-way and same-card SLI instead of 4-way and multiple-card Crossfire.
I thought the bandwidth "thing" was debunked last month by Kyle? The way I took it was that even the bandwidth at x4 was enough for each card.
I can see what you mean. ATI doesn't exactly have the best bandwidth with their products, and using more than 2 GPUs can quickly eat away at that. NVIDIA scales better because they have a massive amount of bandwidth with their nForce 200 chipset, but I don't know what ATI uses for their PCI-E controller.
This isn't a bandwidth issue. Neither NVIDIA's nor AMD's cards need more than 4 PCI-Express 2.0 lanes. (That's equivalent to PCI-Express 1.0/1.0a with 8 lanes.) Both of the recent articles on this here and here refute that as a possibility. Now, the 5970 wasn't specifically tested, and given that it is a dual GPU card it may need more bandwidth than a PCI-Express x4 connection can deliver. However, 8 lanes is more than sufficient.
Also, it is a myth that the nForce 200 MCPs provide more bandwidth. They multiplex the connection. The choke point is the PCI-Express lanes into the chipset, which the nForce 200 MCPs take all of. So if you've got 4 PCI-Express x16 cards installed, then you are eating 64 lanes' worth of bandwidth. Two nForce 200 MCPs supposedly provide this, but the fact is they have a 36-lane bridge into the X58 chipset. So again it turns into a choke point. We don't see this have any negative impact because PCI-Express cards have yet to saturate the bandwidth. When they do, the nForce 200 MCP equipped boards may be in trouble. Then again, they'll be no worse off than regular non-nForce 200 boards, which won't have the necessary bandwidth either. By that time PCI-Express 3.0 will be out and we'll be using that.
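The lane arithmetic behind that choke-point argument is easy to sketch. A rough back-of-the-envelope, assuming the usual PCIe 2.0 figure of ~500 MB/s per lane each way (5 GT/s with 8b/10b encoding) and the 36-lane uplink mentioned above:

```python
# Rough arithmetic behind the nForce 200 choke-point argument.
# Assumes PCIe 2.0: 5 GT/s per lane with 8b/10b encoding, ~500 MB/s
# per lane in each direction.
PCIE2_MB_PER_LANE = 500

def bandwidth_mb_s(lanes: int) -> int:
    """Peak one-way bandwidth of a PCIe 2.0 link with the given lane count."""
    return lanes * PCIE2_MB_PER_LANE

demanded_lanes = 4 * 16   # four x16 slots hung off the nForce 200 MCPs
uplink_lanes = 36         # lanes actually going into the X58 chipset

demanded = bandwidth_mb_s(demanded_lanes)    # 32000 MB/s
available = bandwidth_mb_s(uplink_lanes)     # 18000 MB/s
print(f"slots want {demanded} MB/s, uplink offers {available} MB/s "
      f"({demanded / available:.2f}x oversubscribed)")
```

So even if every slot could be saturated at once, the bridge into the chipset would be oversubscribed by nearly 2x; the only reason it doesn't matter today is that no current card comes close to filling its slot.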
Boards with the nForce 200 MCP have been tested again and again, and in each case I've seen, the nForce 200 MCP did nothing for performance and in fact actually hurt it to a very slight degree. All these chips do is add latency, heat and complexity to the motherboard. In fact, the only functionality they provide that is of any value is "load balancing" of the PCI-Express bus. By that I mean that if you've got nForce 200 MCPs on your board, you can have 4 or more PCI-Express x16 (physical) slots and each slot can have the same amount of bandwidth. With non-nForce 200 boards you can't really do that. Each slot is going to have a finite amount of bandwidth, so a 16x16x4 configuration is about the most you are going to get. With nForce 200 equipped boards you could do 16x16x16x16. However, again you've got a maximum of 36 lanes going into the chipset, so you don't really get any more performance out of the setup. You will however get the same performance (in theory) out of all three or four of your graphics cards.
And largely I think it's a marketing thing. People were hung up on having "true PCI-Express x16 slots" back in the NVIDIA chipset days. This allows manufacturers to say that again on their printed spec sheets, because it is technically true: each slot will have 16 lanes or whatever. What the motherboard manufacturers don't tell you is that it really doesn't mean anything, as the current boards are still constrained by the chipsets themselves and the nForce 200 MCPs do nothing for actual performance.
I doubt it. I think NVIDIA simply puts more focus on performance. It may also be that their GPU designs tend to benefit more from SLI than AMD's benefit from Crossfire, which may be due to their different approaches in design. It's really hard to say, but it seems like a focus issue to me. NVIDIA works hard at making sure multiple card setups work properly so customers will buy more than one card per generation. AMD has had problems with Crossfire scaling forever and has only started to address them in any meaningful way recently. This is just my perception of the situation, but it seems to make sense.
It was. Generally speaking PCI-Express 2.0 x4 slots are enough for today's graphics cards. This is no surprise to me because PCI-Express 1.0/1.0a x8 slots were enough for SLI configurations in the past including the 7950GX2. PCI-Express 2.0 x4 slots provide the same amount of bandwidth. We are still quite a ways from saturating PCI-Express in general. I expect this to continue for several more years.
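That equivalence is just per-lane arithmetic. A quick sketch, with per-lane figures assumed from the PCIe specs (250 MB/s for 1.0/1.0a, 500 MB/s for 2.0, both one-way):

```python
# Approximate one-way bandwidth per lane, in MB/s, by PCIe generation
# (2.5 GT/s and 5 GT/s respectively, both with 8b/10b encoding).
MB_PER_LANE = {"1.0": 250, "2.0": 500}

def link_bandwidth(gen: str, lanes: int) -> int:
    """Peak one-way bandwidth of a link of the given generation and width."""
    return MB_PER_LANE[gen] * lanes

# A PCIe 2.0 x4 slot delivers the same ~2 GB/s a PCIe 1.0/1.0a x8 slot did,
# which was already enough for SLI setups like the 7950GX2.
assert link_bandwidth("2.0", 4) == link_bandwidth("1.0", 8) == 2000
print("PCIe 2.0 x4 ==", link_bandwidth("2.0", 4), "MB/s == PCIe 1.0 x8")
```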
You've missed the point of having an NF200. The point is to provide bandwidth between the GPUs, not from the chipset to the CPU. There is no need for that. So the bottleneck argument doesn't really make sense.
If you don't have enough bandwidth between GPUs, the resource transfers that happen in AFR mode (when NVIDIA either can't get rid of them with an SLI profile, or chooses not to get rid of certain transfers because removing them would cause corruption due to how the game relies on data persisting between frames) will not transfer as fast as they could, leading to somewhat reduced performance. Of course, that's assuming there is enough data being transferred that bandwidth makes a difference. The transfers between the GPUs are usually direct, so all they do is go through the NF200 to the other board in another PCIe slot.
You would notice a huge difference if a game is not profiled for SLI (or you renamed the exe) and you set it to use "AFR1" mode in the NVIDIA control panel; the bandwidth between GPUs can actually help then. If you rename an app like GTA4 and set it to AFR1, you might see a difference in performance there because there is so much data being transferred between GPUs each frame.
But since NVIDIA is on top of providing profiles for games, this benefit is nullified: they have already killed most of the transfers that would happen anyway, so the extra bandwidth goes to waste.
My point is simply that in theory it isn't pointless. It should help for games that are not profiled for SLI where you just set AFR mode (though even then you could just make your own profile through trial and error, or rename to another EXE and hope performance is good and you don't get corruption). But obviously that's not much of a selling point given the amount of SLI profiling that NVIDIA does. I'm just making a technical point.
And that theory is completely debunked by:
http://hardocp.com/article/2010/08/25/gtx_480_sli_pcie_bandwidth_perf_x16x16_vs_x4x4/
http://hardocp.com/article/2010/08/23/gtx_480_sli_pcie_bandwidth_perf_x16x16_vs_x8x8/
For x8/x8, they even did an NVIDIA Surround (ATI Eyefinity equivalent) resolution test, where massive amounts of data should be output to the monitors and transmitted between the cards.
And once you get to dual NF200s, the two bridges are still sharing the same connection between each other (through the X58 chipset, normally). So 3+ card setups just take on the added latency of a PCIe bridge (which is effectively what the NF200 is) without any actual increase in bandwidth between the cards, meaning there isn't a benefit.
I was reading the article about CFX improvements with the 10.8a drivers and was wondering if this driver affects the 5970 as well. Even though it is still a single card, shouldn't it also receive a performance boost?
It would benefit from the Crossfire application profiles. Unfortunately, I don't think it does too much for CrossfireX scaling (dual Radeon HD 5970s). ATI is still boning us on 3 and 4 GPU scaling. When you get down to it, the 5970 is really a pair of underclocked Radeon HD 5870s on a single PCB. Overclocked to 5870 speeds, the thing performs the same as dual 5870s. So if Radeon HD 5870 Crossfire scaling is improved, the Radeon HD 5970's will be too.
I'm using two Sapphire 4GB 5970s and I still don't see any gains worth writing home about with 10.8a.