PCIe speed and folding performance

Nathan_P

There have been lots of threads on various forums recently about whether varying PCIe bus speeds affects folding performance. Gaming performance does take a hit at lower speeds due to the reduced bandwidth, but does the same hold true for F@H? A previous attempt to test this on my Z9PE rig failed: the WU kept dumping when I moved the GPU to a different slot, and I had no access to a PCIe 3.0 x16 slot because of the CPUs fitted. With my new rig I can now do at least some tests.

Speeds tested will be PCIe 3.0 x16, PCIe 2.0 x16 and PCIe 1.1 x16. This will give us numbers for 3.0 x16, 2.0 x16, 1.1 x16 and, since its bandwidth is the same as 2.0 x16, 3.0 x8 as well. All tests will be run using FAHBench 2.2.5 and a single WU that was downloaded and run for 15% at each bus speed.

Results were as follows:

OK, first test on an actual WU. I haven't had a chance to do any proper testing as Win7 won't install properly, and Windows is a lot easier to run a batch of WU tests on than Linux. This is taken from my HFM logs on my main folding machine, which runs the Zorin 9 distro.

PCIe 2.0 x16, project 10490, TPF 2:40 to 2:42 on a GTX 970. PCIe 3.0 x16, project 10490, TPF 2:40. No change between PCIe 2.0 and 3.0 x16 at this point, which bears out the gaming performance results found elsewhere.

PCIe 2.0 x8, project 9151, TPF 1:13 to 1:26 on a GTX 980 Ti. PCIe 3.0 x16, project 9151, TPF 1:07, a 6-second improvement over the previous recorded minimum time, which takes the estimate from 422k PPD to 479k PPD. Again, the gaming performance results do show a drop-off at PCIe 2.0 x8.

As projects run through over the next couple of days I'll post more results and try some PCIe 1.1 tests, but early indications are that PCIe 2.0 x8 and below do affect performance, and the faster the card the bigger the hit.
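
For anyone who wants to sanity-check these TPF-to-PPD conversions, here is a rough Python sketch of the quick-return-bonus maths as I understand it. The base credit, k-factor and deadline below are made-up placeholders rather than the real constants for project 9151, so treat the absolute numbers as illustrative; the interesting part is that PPD scales roughly with TPF^-1.5 once the bonus kicks in, which is why a ~8% TPF gain shows up as a ~13% PPD gain.

```python
import math

# Rough sketch of the published quick-return-bonus (QRB) maths, as I
# understand it.  base_credit, k_factor and deadline_days are made-up
# placeholders, NOT the real constants for project 9151 -- pull the real
# values from the project summary if you want an exact figure.
def ppd(tpf_seconds, base_credit, k_factor, deadline_days, frames=100):
    wu_days = tpf_seconds * frames / 86400.0             # days to finish one WU
    bonus = max(1.0, math.sqrt(k_factor * deadline_days / wu_days))
    return base_credit * bonus / wu_days                  # points/WU * WUs/day

# The 6-second TPF drop reported above (1:13 -> 1:07), with placeholder
# constants chosen only to land in a plausible 980 Ti ballpark.
for tpf in (73, 67):
    print(f"TPF {tpf}s -> ~{ppd(tpf, base_credit=7600, k_factor=0.75, deadline_days=3):,.0f} PPD")
```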
 
Thanks for the update Nathan. I plan on getting a 1080 Ti if/when benchmarks show a considerable improvement over 980 Ti performance. I wonder if that (or whatever new Titan comes out) will be the first card to exceed PCI-E 3.0 X8 bandwidth? Do you plan on getting one for testing? If not, maybe someone here can let you borrow one for a week to put it through its paces. I think that kind of information might have a large impact on GPU F@H purchasing decisions. Until then I'm going to have to seriously consider true PCI-E 3.0 X16 bandwidth in a simple 2-card SLI configuration. At least then I know the CPU has enough PCI-E lanes to handle it natively on a higher-end board and I'm safe for a few generations at least. I've been doing some reading on PLX chips as well. I don't know how those affect F@H, maybe I can find an article on that as well.
 
I'm looking at either a 1070 or a 1080 depending on finances, as my own testing now means I either have to update my Z9PE with Ivy Bridge-based chips or suffer the associated PPD hit. My plan to sell off a pair of Ivy Bridge Xeon ES chips before prices crash is now dead in the water.
 
Shouldn't any 2011 CPU have enough PCI-E lanes to run 4 cards at X16 on that board with two CPUs installed? Or was there not enough spice in your ES chips? Or were they clocked too low for F@H? What are the specs on the ES? I've got two dual-2011 Supermicro boards coming my way eventually, and although they're a bit anemic on PCI-E slots, I've been thinking about BOINC CPU projects as well. Right now I've got Gen 1 2620s, and although they're clocked low, they're pretty good bang for the buck for some game servers I was running a while back.
 
It's currently got Sandy Bridge-based Xeons in it, and to pull back on power I've only got one CPU installed, which means either jamming the cards together in slots 1 & 3 or using slot 4 at x8. All the CPUs have plenty of spice, I just wanted to keep the cards cooler. Given the results over the weekend I'll be lobbing both Ivy Bridge Xeons in at some point.

Specs on the v1s are E5-2665 ES 8c/16t running at 2.3, B0-stepping CPUs so no market for selling them, or E5-2692 v2 running at 2.0 base / 2.4 turbo. There is a market for those, so decisions, decisions.

Now, question of the day: 980 Ti or 1070 FE? They are the same price at one of my favourite etailers.
 
Why run HT on F@H? Doesn't it decrease the PPD when you're limited to half a core? And are the ES able to run in any board, or just boards like Asus that allow tweaking? If they can run in a Supermicro board I'd have to say that there is a market, maybe just not through official channels. I can see what you mean by pulling back on power, TDP is 115 W per chip. I'm lucky in that I have access to lots of free power, as long as I provide the cooling and tech. So are you deciding between two pairs of CPUs that you already own, or thinking about buying the v2s? And why am I not seeing the E5-2692 v2 on Intel Ark spec sheets or for sale? And the 8 core vs. 12 core debate only matters if you're using those cores for something. Death of Bigadv means what, BOINC projects with spare cores? Two cores and two GPUs flat out, you're probably looking at getting close to the 900 watt mark. I guess the 1070 might have the power advantage, but it's early in F@H, and I'm waiting for the big daddy 1080 Ti specs to come out. I know there have been some early issues with fan noise intervals on the 1080, which might affect the 1070 too, if that matters. And if you're looking for a 980 Ti for a bit less, I've currently got a "friends" sale going on. Let me know if you're interested. I'm keeping my 690, but I have a 970 and several 980 Ti cards up for grabs, plus a couple of 750s if you're looking for high PPD/Watt/Dollar. I almost did a 7-way 750 Ti setup, but decided against it because I didn't want to water cool or make a custom case with risers.
 
HT helps when running a CPU WU, but it's not much use when the rig basically idles, so I usually turn it off. Some ES chips run in any board, some do not. I'm not sure about Supermicro as I've never had one of their boards; Asus have always played nice with ES chips so I haven't switched to another brand. I have both pairs of CPUs and wanted to sell the v2s to free up funds for another GPU or two. The 2692 v2 is an OEM chip, not retail, and it's an odd one: base clock is 2.0 on my ES, yet it will all-core turbo up to 2.4.

Thanks for the offer on the cards. I may take you up on it but am going to ponder for a couple of days, and since noise is an issue on blower cards I'll probably wait until non-FE cards are for sale.
 
Update:

Project 10478 on a 980 Ti at PCIe 2.0 x8: TPF 2:34. On PCIe 3.0 x16: TPF 2:32.
Project 9157 on PCIe 2.0 x8: TPF min 1:11 to avg 1:28. On PCIe 3.0 x16: TPF min 1:10 to avg 1:15.

Not as big a gain, but still worth around 10k extra PPD.

On the 970 I've been through the HFM benchmark logs and have noticed a consistent 2-3 second decrease in WU TPF. Again, not worth much, but still an average of 2-3k PPD.
 
A 2-3k sacrifice would be worth it to me if it meant not buying another system to use another GPU, or getting cheaper system parts. Now if only I had that many GPUs for that to be an issue.
 
I have that many, and it's a pain. I think I'd rather consolidate, if only for the reduction in battery backup maintenance and heat, not to mention the space required. I'm running 10 cards on and off in 8 machines. Talk about real estate. I bought a rack with shelves just to stack them up, but since the rack isn't designed that way, I pretty much have to leave the front door open or off. I plan on consolidating when I find a suitable board with 3 true PCI-E 3.0 X16 slots in a 1-4-7 slot configuration. To me it's worth the extra money for the board and chips to reduce the maintenance on that much hardware. Although, thankfully, I have yet to have a single component failure since picking up the project again full steam last year.

Oh, and Nathan_P, the 980 Tis I have are EVGA SC+ with backplate, ACX 2.0 design, linkage: EVGA GeForce GTX 980 Ti 06G-P4-4995-KR 6GB SC+ GAMING w/ACX 2.0+, Whisper Silent Cooling w/ Free Installed Backplate Graphics Card - Newegg.com. Assuming you have enough exhaust fans they're REALLY quiet. I don't OC them more than stock, because they tend to do that automatically on their own. And I haven't had a single board fail after 24/7 operation, and no coil whine either (in case you're a gamer). It's a solid model IMO, and I did quite a bit of research before making the purchase. Since the early 970s and 980s had coil whine issues, I'm waiting to see what problems crop up first in the 1080 and 1080 Ti before I put my money down. It may mean I'm a late adopter, but it also means no buyer's remorse. Five of mine are that model; the last one and the 970 are the MSI versions with the cool red and black dragon scheme. Love the look, and they're solid, too, but I have a good history with EVGA cards since I started working on rigs in the late 90s. I just wish they still had a lifetime warranty. I got so many free card replacements at the 4-5 year mark because many of my customers never upgraded until something failed. Oh, the good times....

Thanks for the numbers. Looks like what you're saying, for the layman, is that there's almost no percentage loss between PCI-E 3.0 X16 and PCI-E 2.0 X8 for a 980 Ti?
 
I think it's fair to say it's project dependent. I've seen anywhere from nothing to a 40-50k rise in PPD so far, but it's still early days as I only have a few data points. Going to run my daily rig 24/7 for the next few days to capture lots of data.
 
Well, I wasn't intentionally testing, but I noticed some weird behavior and large PPD drops on my old Q6600 HTPC running a GTX 750. This was after installing a new battery on the mobo (geez, the old one was from 2007, who'da thunk). All of a sudden the PPD was in the 7-8k range instead of 50k+. It turns out that after I reset the BIOS to factory defaults I left PCIe Spread Spectrum enabled, which allowed the slot to drop down to PCIe x16 1.1 mode, and it wouldn't speed back up to PCIe x16 2.0 mode (the max on the old X38 chipset) no matter which app/game was running. Took me a while to nail it down, but 1.1 mode is definitely a PPD killer. I was getting similar PPD in x16 3.0 mode when I was running the 750 on another system, so x16 2.0 mode doesn't seem to differ much from it.
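
If anyone wants to catch that kind of silent fallback without digging through the BIOS, nvidia-smi can report the link the card is actually negotiating. Here is a small Python wrapper as a sketch; the query field names are the ones recent drivers expose (check nvidia-smi --help-query-gpu if yours differ), and bear in mind NVIDIA cards drop to a lower link gen at idle, so read it while the GPU is under load.

```python
# Sketch: ask nvidia-smi for the currently negotiated PCIe link vs. the
# maximum the card/slot supports.  Field names per recent drivers; run it
# while the GPU is loaded, since cards downshift the link gen at idle.
import subprocess

FIELDS = ("name,pcie.link.gen.current,pcie.link.gen.max,"
          "pcie.link.width.current,pcie.link.width.max")

out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)

for line in out.stdout.strip().splitlines():
    name, gen_cur, gen_max, w_cur, w_max = [f.strip() for f in line.split(",")]
    flag = "" if (gen_cur == gen_max and w_cur == w_max) else "  <-- running below max"
    print(f"{name}: gen {gen_cur}/{gen_max}, x{w_cur}/x{w_max}{flag}")
```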
 
Thanks for the updates guys. I wouldn't think the 750 would have been affected by 1.1 speeds, given its performance envelope. I think I'm still the most curious about how 980 Ti and above perform on PCI-E 2.0 X8/X16 vs. 3.0 X8/X16. I just can't get comfortable with buying a 3 or 4-way board with PCI-E 3.0 X8 for folding until I see some concrete tests with the higher-end cards. I think going forward I'm going to try a different route instead of trying to consolidate so much. In theory, if I get cheap cases, 650 watt PSUs instead of 900+, and 2-way motherboards I could save on some hardware expense, but I'd have to get more CPUs total. Of course it could also improve the resale value if I didn't make monster computers every time.... who knows. The 980 Ti cards have already depreciated in value about $200 each given the performance vs. the 1080 series and time. Of course, that's the game we play with this amount of hardware. I think I'm going to limit myself to 4-6 cards in the future, 10 is just too many to manage. I've already removed my 750s and 690 from the folding herd. Which reminds me, I should change my sig....
 
So far it's looking like a consistent 3-4 second boost on the 980 Ti moving from 2.0 x8 to 3.0 x16; still collecting data though....
 
What has that equated to in PPD or percentage difference? Brain hurts, no math skills, dumb it down for me... hehe.
 
OK, huge update.

GTX 970: PCIe 3.0 x16 is consistently 1 to 4 seconds quicker than it was on PCIe 2.0 x16, ranging from 4 seconds on project 11707 to 1 second on 10476. This is worth between 1.8k and 9k PPD depending on the project. The most common gain is 1 second, which is worth 2-3k on most projects.

GTX 980 Ti: PCIe 3.0 x16 is consistently 1 to 4 seconds quicker than it was on PCIe 2.0 x8, ranging from 4 seconds on project 11705 to 1 second on 9442. This is worth between 3.8k and 16k PPD depending on the project. The most common gain is 2 seconds, which is worth 8-9k on most projects.

Here's the outlier: projects 9211-9213 and 9158-9162 are slower on PCIe 3.0 x16 than they were at PCIe 2.0 x8. This may be either a slow WU in the run, which does happen from time to time, or the low-power CPU, an E3-1230L v3, slowing data transfers. More info needed on this, as the second goal of the build was to reduce power consumption.

I'm off Thursday so I'll whack everything into a Google Docs spreadsheet for all to view.
 
Here we go: Google Docs sheet with the data so far. I've now switched the PCIe speed to gen 1 x16, so it will be interesting to see what happens. The next set of tests will be: does CPU speed have an impact on GPU PPD?

 
Thanks for the update. So even 2.0 X16 vs. 3.0 X16 makes a difference. I guess I've made my mind up then. From now on I'm getting mobos with two well-spaced PCI-E 3.0 X16 slots and a CPU that supports 40 PCI-E lanes minimum natively. That brings the PSU down to the 750 Watt category and I won't need any fancy cases or anything like that. I will tell you, though, I have been drooling over the dual-2011 rack-mount systems with 8 double-spaced PCI-E 3.0 X16 slots across the entire back. I can't even imagine what 8 1080 Tis are going to look like in a case like that, but I guess it doesn't matter much because those rigs are noisy as hell and really only designed for the passive flow-through workstation cards or the like. I can't stand that much noise, so I think the two-per-box is where it's going to be at for folding rigs. I just don't trust the PLX chips to work some magic spell to divide the bandwidth properly.

Oh, and you're in the lead to change your title again. ;)
 
OK, so another week and not much more data to add; a lot of the projects run under PCIe 2 have finished, so I can't get a lot of comparison data. What I will do is run the rig at 2.0 speeds for a week to see if I can get any. After that it's going back to full speed.

The conclusion so far is: yes, there is a difference, but not a consistent one. Some projects don't vary, some vary by a second, and a few vary by 2 or more seconds. This may not make much of a difference on lower-end cards, but it does as you get further up the GPU tree; for a 970 or bigger you do need PCIe 3 to eke out every last point.

One thing I did find is that some projects have very varied run times depending on the run/clone/gen. I clocked one project on my 980 Ti as having 7 different frame times across 7 different WUs. What I haven't been able to do is get to the really low PCIe speeds: gen 1 x8, gen 1 & gen 2 x4, and any x1.
 
Do you think that the findings from the lower speeds that you mentioned would be of interest to any folders on the cards that you're using now? It seems that we're already in the realm of "You need PCI-E 2.0 X16 or 3.0 X8 at a minimum for modern cards, but PCI-E 3.0 X16 to get the absolute best for folding".

Looking at your spreadsheets, though, I'd venture to say that one could even find an old PCI-E 1.0 X16 slot and still get the majority of the performance out of even a 980 Ti. It looks like the greatest change from V1 to V3 is less than 6%? I guess the question is, does the variance between PCI-E 2.0 X16 and 3.0 X16 fall within a standard margin of error, or does it represent a calculated average performance difference that could be expressed as a percentage?
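
For what it's worth, here is one crude way to put that margin-of-error question to numbers once the spreadsheet has a handful of TPF samples per PCIe mode for the same project: compare the gap between the means with the scatter inside each set. The TPF values below are invented placeholders, not data from the spreadsheet.

```python
# Crude margin-of-error check: compare the mean TPF gap between two PCIe
# modes against the scatter within each set of samples.  The TPF values
# here are invented placeholders, not data from the spreadsheet.
from statistics import mean, stdev

tpf_pcie2_x16 = [146, 145, 147, 146, 145]   # seconds per frame, same project/run
tpf_pcie3_x16 = [143, 144, 143, 142, 143]

gap = mean(tpf_pcie2_x16) - mean(tpf_pcie3_x16)
pct = 100.0 * gap / mean(tpf_pcie2_x16)

# Standard error of the difference between the two means.
se = (stdev(tpf_pcie2_x16) ** 2 / len(tpf_pcie2_x16)
      + stdev(tpf_pcie3_x16) ** 2 / len(tpf_pcie3_x16)) ** 0.5

print(f"Mean TPF gain: {gap:.1f} s ({pct:.1f}%)")
print("Probably a real difference" if gap > 2 * se else "Within the noise",
      f"(gap {gap:.1f} s vs ~2x standard error {2 * se:.1f} s)")
```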

Loving the report, by the way. I'd love to be able to add 1080 and 1080 Ti (when it arrives) to the mix. Have you thought about asking NVidia to sponsor your research? It seems a lot of people are folding these days and would love to get these full stats in order to make educated purchasing decisions. I know I'm one of them.
 
Well, PCIe 2 didn't work out so well: as soon as I set it in the BIOS and rebooted, the machine wouldn't POST with either card in any slot. Cleared the CMOS and set it back to PCIe 3 and everything worked OK. May have to see if it's a BIOS problem.

I was on EVGA's forums Wednesday night and they'd done some tests with a 1070 and seen a large jump between 2.0 and 3.0, so I'm wondering if CPU speed does play a part. FWIW my Xeon will only turbo up to 2.4 GHz.
 
Interesting. I have a server with similar Xeons, 2620s, that max out turbo at 2.5 and not much overclocking without modding the SM mobo bios. Of course right now they're on CPU for WCG, but I did notice that the 980 Ti often performed under par in one of those servers that I had a card installed for F@H. And my 5820 does better than my 4690, but that might be heat or generational differences, given they have similar clock speeds. Do you have any overclock potential for the CPU? I wonder what a comparison of CPU speed, all other factors equal, would result in?
 
It's an E3-1230L v3 Xeon so no overclocking potential; besides, I'm useless at overclocking unless the CPU has an unlocked multiplier. I'm going to try and score a cheapish LGA1150 CPU on fleabay but it may take a couple of weeks.
 
Historically I've been able to squeeze about 10% OC out of a simple FSB tweak with no voltage modifications if you want to give that a shot. But you'd probably need to refer to an OC guide for your mobo or a similar one. I usually only overclock gaming systems anymore, but I've heard of quite a few people here OCing Xeons and Opterons with the right BIOS and tweaks. Of course that might skew your results.
 
It should run the 2 GPUs just fine; what we are currently debating is whether CPU speed has an impact on PPD. It shouldn't, but given the numbers being posted on the EVGA forums I'm not so sure anymore. I'm still looking for a cheap CPU.
 
Does anyone have real stats for how gimped a 1x slot becomes?
 
By request of the poster above:

I quickly downloaded FAHBench, ran the GUI, and ran 180-second tests, one at PCIe 2.0 16x and the other at 1x using this contraption (link).

My test bed is a BIOSTAR H81S2 with a 3.0 GHz Pentium dual core (G3220, 22 nm). The GPU is a lowly Radeon R9 280X (a rebadged HD 7970).

At 1x, FAHBench reported this result (Tahiti, OpenCL, accuracy check enabled, 180 seconds, dhfr task, single precision, NaN check disabled): 38.5408, 23558 atoms.

At 16x/2.0: 42.3566, 23558 atoms.

1x was only 91% of 16x, losing 9% of its potential. The test was not duplicated, and this GPU hardly stresses the PCIe bus the way a more modern card can. Still a substantial difference, though.

I will likely try a 980Ti in the coming days. That should be far more interesting, but for now I've got to get off my butt and get some things done. :)

Edit #1: And I'm impatient, 980Ti results coming soon...

Edit #2: GTX 980Ti = 81.4012 at 16x/2.0; at 1x it was, wow, 41.777.

Huge hit: only 51.3% of the 980 Ti's potential was extracted over a 1x interface. I did notice that MSI Afterburner was reporting bus usage at 12-13% at 16x, but only 6-7% at 1x. That seems to scale with the results from FAHBench.
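
The percentages above are just the ratio of the two FAHBench scores; here is that arithmetic as a tiny Python sketch using the numbers from this post, in case anyone wants to drop in their own results.

```python
# Percent-of-potential from the FAHBench scores quoted above
# (FAHBench reports a single throughput score; higher is better).
results = {
    "R9 280X":    {"x16": 42.3566, "x1": 38.5408},
    "GTX 980 Ti": {"x16": 81.4012, "x1": 41.7770},
}

for card, s in results.items():
    pct = 100.0 * s["x1"] / s["x16"]
    print(f"{card}: x1 delivers {pct:.1f}% of the x16 score "
          f"({100 - pct:.1f}% left on the table)")
```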

10esseeTony.

PCIe Bandwidth, important for Distributed Computing?
 