SR-2 Optimization Thread

Umm....HOLY $%#^!!!

General path to the setting (names aren't exact, but I am sure you can find it)

A47 modded bios - Power management options - ACPI - Advance ACPI - NUMA (last setting)

Setting this to Disable on my dual L5640 machine @ 3.564

P2685
Previous frame times: 15:22
Current frame time (1 frame): 13:48

The older (A41?) bios does not seem to have this effect, although it could be due to bigadv versus non-bigadv. I am going to hang out for another frame to see if this was an anomaly...

Edit: second frame: 13:46
Gonna let her ride this afternoon, will play around with the other one once I get back from work
 
Too bad the SR-2 doesn't have IPMI :(

I agree! My Asus KGPE-D16 with the Opterons does, and what a great tool it is.

162,850 ppd :eek: |V| G!
Can't wait to see how much of a gain with L5640s.

Btw, what ram kits are you guys using with sr-2? I'm shopping around for kits, 6x1gb would be enough, unless 2gbs are near that price.

I'm actually using 3 dual-channel G.Skill F3-12800CL9D-4GBRL kits, so one is split up where one DIMM is in each memory bank. So much for requiring triple channel kits.

Marketecture at its best.

Umm....HOLY $%#^!!!

General path to the setting (names aren't exact, but I am sure you can find it)

A47 modded bios - Power management options - ACPI - Advance ACPI - NUMA (last setting)

Setting this to Disable on my dual L5640 machine @ 3.564

P2685
Previous frame times: 15:22
Current frame time (1 frame): 13:48

The older (A41?) bios does not seem to have this effect, although it could be due to bigadv versus non-bigadv. I am going to hang out for another frame to see if this was an anomaly...

I'm on A49, but I don't think this will be an anomaly musky. It had similar effect on my twin Opteron box in terms of performance.
 
Update: 5 frames in, frame times holding at 13:46, ppd increase from 103K to 115K, and this is on a unit that is 70% complete at the old speed. At least at these speeds, we are looking at pushing a 15K ppd increase for a simple setting.

10e, you are definitely a hero!
 
7 is a nice cas latency. I think somebody here reported a 5% improvement dropping from 9 to 8....doublecheck that

If you don't mind jumping through some MIR hoops you could save a few bucks buying three of these 9-9-9 Corsairs for $75
http://www.newegg.com/Product/Product.aspx?Item=N82E16820145260

Or these 7-8-7 corsairs for $95 http://www.newegg.com/Product/Product.aspx?Item=N82E16820145285

Actually there are quite a few options for Corsair RAM. Take your pick depending on your budget
http://www.newegg.com/Product/Produ...050&IsNodeId=1&name=240-Pin+DDR3+SDRAM&Page=2
 
With L5640s @3.6GHz, I'm seeing the TPF on a P2686 drop from 13:45 with NUMA on to 12:50 with NUMA off. My current WU was already 2/3 complete, but HFM.net is showing a jump from 120K to 130K PPD.

Great find 10e!
 
We need a frame time thread for dual L5640 rigs. How in the heck were you getting 13:45 frame times at 3.6 on a P2686? The best I ever saw were around 14:30. 45 seconds per frame is huge! 55 seconds is 10K ppd at this speed. We need to share secrets on setup.
 
Different builds on the P2686, maybe? I've been running straight A3's on mine for the past couple days, since I got it back on the A47 BIOS. Any insight you guys can share amongst us fellow SR2 owners is much appreciated. :cool: I'm wondering now just how fast word will get out on the NUMA option to other forums...

Just set up the -bigadv flag on mine again...we'll see just how far we can get on one of those soon. Last attempt I would get up between 12-20% and then get an UNSTABLE MACHINE crash. Will probably try to up the core voltage a bit more if that happens on this attempt.

Ax
 
7 is a nice cas latency. I think somebody here reported a 5% improvement dropping from 9 to 8....doublecheck that

If you don't mind jumping through some MIR hoops you could save a few bucks buying three of these 9-9-9 Corsairs for $75
http://www.newegg.com/Product/Product.aspx?Item=N82E16820145260

Or these 7-8-7 corsairs for $95 http://www.newegg.com/Product/Product.aspx?Item=N82E16820145285

Actually there are quite a few options for Corsair RAM. Take your pick depending on your budget
http://www.newegg.com/Product/Produ...050&IsNodeId=1&name=240-Pin+DDR3+SDRAM&Page=2
thanks for the help, the 2x2gb corsair low timing has a few reviewers unable to achieve the timings (weird) so I can't go for that one.

We need a frame time thread for dual L5640 rigs. How in the heck were you getting 13:45 frame times at 3.6 on a P2686? The best I ever saw were around 14:30. 45 seconds per frame is huge! 55 seconds is 10K ppd at this speed. We need to share secrets on setup.
*cough* *cough* Sazan, please share what RAM kits you're using too :D I'd go for the exact same setup and settings :p
 
Wow, I did that NUMA setting....

Dropped my TPF from 15:56 to 14:05 on my L5640 box, currently running at 3.5.

Not bad at all. :cool:
 
My hardware and memory config is earlier in the thread here, and the BIOS settings for the L5640s are a few pages back, basically copied from nitrobass24 to hit 200 BCLK. RAM is Corsair Dominator 12GB kit at 8-8-8-24 and auto everything else. I saw very little performance change going from the X5650s at 3.85GHz (175x22) to the L5640s at 3.6GHz (200x18). The faster BCLK certainly helps, and maybe this is also related to other people's discoveries that a higher clock doesn't always equal higher benchmarks. I do know my particular L5640s seem to be good overclockers while my X5650s are relatively poor.

The specific unit is a P2686 (R4, C13, G31). I've apparently run three P2686s on the L5640s with an average frame time of 13:42 until now.

Oh, one other thing. I'm running Windows Server 2008 R2. That may have more efficient memory usage or something.
 
Something is fishy here. You have nothing drastically different than i have, yet you are smoking me in performance...

Once we hit 4 billion, I am going to figure this out.
 
Are you sure your actual performance is not as good?
Or is it just your PPD is not as high?

PPD is not a good measure of performance, especially bigadvs on these SR-2's. The slightest fluctuation in TPF has a dramatic effect on PPD because of the bonus structure.
Additionally, all WUs are not equal. I have had one 2686 that ran 29min TPFs (39,425 PPD) and another 2686 that ran almost 33min TPFs (32,487 PPD).
This is on a system with E5530's @ Stock. You put this on an SR-2 and the difference will be much more drastic.
Also if you use yours as a workstation it will affect you as well.

If we truly want to be able to compare performance of SR-2s we need to use a benchmarking tool. Or at the very least the EXACT SAME WU (R,G,C).
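
To put a number on how touchy PPD is to TPF, here is a rough sketch. It assumes the bonus makes credit per unit scale with the square root of how quickly the unit comes back, so PPD goes roughly as TPF to the power -1.5. That exponent is an assumption about the bonus structure, not something stated in this thread, and `scaled_ppd` is a made-up helper name.

```python
def tpf_seconds(tpf):
    """Convert an 'MM:SS' time-per-frame string to seconds."""
    minutes, seconds = tpf.split(":")
    return int(minutes) * 60 + int(seconds)


def scaled_ppd(known_ppd, known_tpf, new_tpf, exponent=1.5):
    """Estimate PPD at new_tpf from a known (PPD, TPF) pair on the same WU.

    Assumption: PPD is proportional to TPF ** -exponent. The default of 1.5
    models a bonus proportional to sqrt(1 / elapsed time); use 1.0 for
    units that carry no bonus.
    """
    ratio = tpf_seconds(known_tpf) / tpf_seconds(new_tpf)
    return known_ppd * ratio ** exponent


if __name__ == "__main__":
    # 29:00 TPF at 39,425 PPD, slowed down to 33:00 TPF
    print(round(scaled_ppd(39425, "29:00", "33:00")))
    # 13:45 TPF at 120K PPD, sped up to 12:50 TPF
    print(round(scaled_ppd(120000, "13:45", "12:50")))
```

The first call lands within a few dozen points of the 32,487 PPD quoted for the slower E5530 unit, so the -1.5 exponent looks reasonable for bigadv. A 14% change in TPF turns into roughly a 21% change in PPD, which is why comparing rigs by PPD alone is misleading.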
 
My TPF is an average of the last 300 frames of 2686 units. I realize what you are saying, but I have a hard time believing that normal fluctuation would cause that kind of difference. Maybe so, but this thing has run very consistently TPF and ppd wise on 2686s since it was converted to L5640s, which has been quite a while ago now.

As for F@H performance benchmarking, I am working on that now. I don't want to take any machines off-line until we cross 4 billion, but I do think this would be a valuable project. We have enough more-or-less identical machines floating around here to get quite a bit of data.
 
I wonder why they can't make a tool that you can install and has a portion of one of every WU, then it could run 5-10 frames, 3 for bigadv, and give you a TPF and rough PPD value. That wouldn't be hard to make, would it?
 
That is what I am working on...sort of.
 
Or we could just run LinX to see how many GFlops our setup does. Would probably be faster anyway
 
All SR-2 testing needs to stop until 4 billion! Turn off the NUMA thing and let 'em rip....we'll worry about optimizing afterwards.
 
NUMA update - make sure you have stable memory. On my second SR-2, disabling NUMA more than doubled my frame times...

On the first SR-2, NUMA disabled is showing a 7K ppd increase on a 2684.
 
Interesting discovery about NUMA. I had experimented with that on my dual Opteron years ago and found that enabling it had negatively affected SMP performance. That was back when only the A1 core WUs were available. It appears it's still the case even with newer WUs and architectures.
 
The morning after what will become known as The 10e NUMA Incident...

Oh so tired. :( Oh so happy :D

I think 6th October 2010 shall now be known as 1st January 0000. (After 10e / Before 10e etc)

As if a free 12Kppd boost was not enough, disabling NUMA fixed my priority problem!

(no matter what I did with priorities, my 3D renderer would not use over 50% of CPU if anything was running in the background - running FAH meant constant messing with affinities and manual managing of priorities to get my 3D app to render near full speed. Which meant FAH had to go when on a deadline.)

What does this mean? Even more spare cycles for the [H]orde on my main rig SR2#1.
:cool:
 
Yeah, I saw this in LinX. With NUMA I was getting 55 GFlops rate and with UMA it went up to 77 GFlops.

This is how I thank you for all your efforts! More work :p

Good problems to have. But yes, I am looking at a lot of obsolete data. With a big bloody grin on my face.

It's prevalent in SMP systems. I had to share it with the team. Imagine every SR-2 on the team getting a 10 - 15K PPD upgrade just from this.

Imagine? It is happening! And it helps us more than any other team. Tip of the decade bro. :cool:

After this I'm going to try and tweak it to get to a 14:00 TPF, so it will drop a BigAdv unit at least once a day, even 2684s!

Just got 2 2684's going now - better than nothing - and we are looking at 15:45 a frame = 26 hours. Heartbreakingly close to being a guaranteed 100,000ppd bigadv machine, but 98,500ppd it is. :eek:
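
The frame-time-to-completion arithmetic here can be sketched as follows, assuming each WU is 100 frames (one frame per 1%). `hours_per_wu` is just an illustrative helper name.

```python
FRAMES_PER_WU = 100  # assumption: one frame is 1% of a work unit


def hours_per_wu(tpf, frames=FRAMES_PER_WU):
    """Return the hours needed to finish a WU at a given 'MM:SS' time per frame."""
    minutes, seconds = tpf.split(":")
    total_seconds = (int(minutes) * 60 + int(seconds)) * frames
    return total_seconds / 3600


if __name__ == "__main__":
    for tpf in ("15:45", "14:00"):
        print(tpf, "->", round(hours_per_wu(tpf), 2), "hours")
```

At 15:45 a frame that works out to 26.25 hours, and anything under 14:24 a frame gets a unit done inside 24 hours, which is why a 14:00 TPF is the target for one bigadv per day.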

Interestingly nocpulock at 0 or 1 gives same averages. I reckon 24 hours would need 4.76 GHz to do it. Go for it :p

7 is a nice cas latency. I think somebody here reported a 5% improvement dropping from 9 to 8....doublecheck that

From my testing that is pretty doubtful. I found DDR strap made a big difference, 1T/2T a big diff, and other timings pretty small. But we need better benching tools, which are being worked on soon, after the dust settles.

But Mr Musky has banned further benching until 4 billion, so I don't want to stay back after class :p
 
Bad news.

Turned off numa and now my pc won't boot. I even switched to a different bios chip. No matter what I do the readout gets stuck on FF
 
But Mr Musky has banned further benching until 4 billion, so I don't want to stay back after class :p

Since when does anyone listen to me?? :)

For an update on the whole NUMA thing, I have one system that loves it being disabled (1+ minute lower TPF on all bigadv units), and one system that can't complete a frame with it disabled. I attribute this to one or more bad memory modules in one of the banks. With NUMA enabled, the system can "route around" the bad memory module and still keep going. With NUMA disabled, it forces one of the processors to try to access this bad module, which kills performance. Note that this is speculation based on my understanding of what NUMA actually does and some input from punchy over at EVGA. I should have 6 x 1GB of new memory waiting for me when I get home today and a bigadv that is due to send some time this evening on this machine, so I should be able to at least partially verify this tonight. My advice for now is that if you do not see a significant performance gain on F@H by disabling NUMA, you probably have a memory issue on one of your CPU banks.
 
Bad news.

Turned off numa and now my pc won't boot. I even switched to a different bios chip. No matter what I do the readout gets stuck on FF

A bios reset should re-enable NUMA and allow you to boot. See my previous post on potential memory issues. You may want to start removing modules to get it to boot.
 
[H]UMAGate!

I'm really happy it increased your productivity too. Nothing like getting a boost in folding and rendering.

How's about AN and BN > Before NUMA/After NUMA :)

I really hope you got some sleep MIBW. You are truly a [H]ardTr00per

I'm going to stop tweaking for the time being and just let'er run. 16:25 TPF on a P2684 is good enough for me. 27.3 hours is good enough for now to keep the "D" up.

I am hoping to get a P2685/2692 and watch the wings sprout.

BTW thanks for that upload ;)

EDIT: I'm seeing an 8K PPD upgrade on a P6061. My original numbers were 72,000 PPD on it, and now 80,000 since UMA came into my life. TPF only went down by 10 seconds, but a big difference even on regular A3 SMP.

The morning after what will become known as The 10e NUMA Incident...

Oh so tired. :( Oh so happy :D

I think 6th October 2010 shall now be known as 1st January 0000. (After 10e / Before 10e etc)

As if a free 12Kppd boost was not enough, disabling NUMA fixed my priority problem!

(no matter what I did with priorities, my 3D renderer would not use over 50% of CPU if anything was running in the background - running FAH meant constant messing with affinities and manual managing of priorities to get my 3D app to render near full speed. Which meant FAH had to go when on a deadline.)

What does this mean? Even more spare cycles for the [H]orde on my main rig SR2#1.
:cool:

NUMA update - make sure you have stable memory. On my second SR-2, disabling NUMA more than doubled my frame times...

On the first SR-2, NUMA disabled is showing a 7K ppd increase on a 2684.

With UMA each CPU will basically use its own RAM, and that means that a marginal set of RAM or a marginal BCLK/RAM Divider OC may cause issues.

Unfortunately it will be tougher on each CPU's IMC and the RAM bank that it uses, instead of having the first RAM bank tested more than the other during folding. The reason it's killing your folding is that you may effectively have one CPU at triple channel (with the good RAM) and the other at dual channel, in which case NUMA might have issues.

I'd say lower your RAM divider and try again, but I remember you already tried that. But you may want to try both CPUs at dual-channel and see if that helps. Of course that may lower your performance even with NUMA off.
 
The bottom line is you need your memory to be fully symmetric between sockets before disabling NUMA in the BIOS. In other words, you need the same speed and capacity memory in the same slots for each socket, and you need them all to be working ;-)

Disabling NUMA in the BIOS doesn't make the system UMA. The hardware is NUMA no matter what you do - the local memory is always faster than the remote memory, and each processor can access all memory. All disabling NUMA in the BIOS does is not tell the OS you have NUMA hardware (and it may also enable node-node interleave depending on the BIOS and architecture).
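
One way to see what the BIOS switch actually changes is to check how many NUMA nodes the OS reports before and after flipping it. A minimal sketch, assuming a Linux box that exposes nodes under `/sys/devices/system/node` (most people in this thread are on Windows, where you'd need a different tool); `count_numa_nodes` is an illustrative name, not part of any library.

```python
import os
import re

SYSFS_NODE_DIR = "/sys/devices/system/node"  # Linux-only location (assumption)


def count_numa_nodes(node_dir=SYSFS_NODE_DIR):
    """Count the nodeN entries the kernel exposes; 0 if the directory is missing."""
    try:
        entries = os.listdir(node_dir)
    except OSError:
        return 0
    return sum(1 for name in entries if re.fullmatch(r"node\d+", name))


if __name__ == "__main__":
    nodes = count_numa_nodes()
    if nodes > 1:
        print(f"OS sees {nodes} NUMA nodes")
    else:
        print("OS sees a single memory domain (or NUMA info is unavailable)")
```

On a dual-socket board I'd expect this to report 2 nodes with NUMA enabled in the BIOS and 1 with it disabled, even though the hardware underneath is the same either way.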
 
Good explanation. And I can confirm on my Opteron 6168 duallie that enabling node interleaving disables reporting of NUMA by the BIOS/hardware.

But in the case of folding@home, if it's not a NUMA-aware app, I guess it's the O/S that is trying to allocate RAM asymmetrically via the bus between CPUs?
 
Applications do not allocate RAM; that is an OS-layer operation.
 
Good explanation. And I can confirm on my Opteron 6168 duallie that enabling node interleaving disables reporting of NUMA by the BIOS/hardware.
Same with my old Opteron 280.

But in the case of folding@home, if it's not a NUMA-aware app, I guess it's the O/S that is trying to allocate RAM asymmetrically via the bus between CPUs?
IDK, but one thing I have known for half a decade is if your apps don't have support for NUMA, chances are they will perform better without it.
 
A bios reset should re-enable NUMA and allow you to boot. See my previous post on potential memory issues. You may want to start removing modules to get it to boot.
no luck. this sucks
 
I just scored 6 sticks of Corsair 1gb 1800MHz 7-7-7-20 for $300. Now I'm just waiting for SR-2 to come back in stock somewhere in Canada and I'll be joining you guys :D
 
thanks for looking.

i'm gonna try some other ram when i can. this happened at the worst time, when we have out of town friends staying with us :(
I am still running it with the mismatched CPUs. I am not 100% sure it was ever 100% stable in this configuration.

After fiddling with RAM, cmos, etc a bit more, I'm gonna pull out the hex and put the matched quad back in.

I am also suspicious that is the source of my NUMA problem to begin with.
 
It likely is. With NUMA, things get allocated differently than with non-NUMA (UMA or S-UMA), and this can cause issues.

I'd remove the second CPU, reset the CMOS and re-boot. Profiles stay when you reset CMOS, so as long as you saved your OC profile you should be ok.

Good luck. Sorry for the trouble!
 
As I posted in the other thread, you don't have to remove a CPU, just disable it with the jumper.
 