Guide: Dual Windows SMP Clients on a Quad

Killer[MoB]

As more and more folders get curious about running multiple Windows SMP clients on a Core2Quad using Affinity Changer in a straight-up Windows environment, there is a need for a better understanding of what not to do and what measures you can take to make recovery easier if there is a problem. For most people, using Affinity Changer brings the frame times on both Windows SMP clients on the quad very close to what you would see running a single Windows SMP client on a Core2Duo of the same speed. In other words, a quad at 3.4GHz will give you almost the same points as two 3.4GHz dual cores.
Update 1: After some serious behind-the-scenes testing by relic, BillR and myself, we have determined that running Affinity Changer on Vista SP1 gives little to no improvement over running two instances without Affinity Changer. SP1 includes an update to CPU scheduling that seems to make some machines less responsive and, without a doubt, erases the huge benefit of Affinity Changer on a dual WinSMP folding setup running Vista.

There are a couple of advantages to doing this:
  1. The machine remains responsive. I judge it to be the same as a single SMP client running.
  2. The points increase is impressive over a single client.
  3. It's really easy to set up. Install the second instance in its own folder (C:\Program Files\Folding SMP2 for example) and set it up with Machine ID 2. Use -local as the first thing in your shortcut for starting the clients plus whatever other flags you normally use.
  4. RAM use is low compared to running 2 Linux VMs. Basically just 2x that of a single SMP instance plus whatever the system is using.
  5. Affinity Changer does all the hard work of setting affinity auto-magically for you.
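For item 3, a launcher bat file is one way to get the same effect as two shortcuts. This is only a sketch: the two folder paths are the examples from this guide, fah.exe is the executable name that shows up in a log later in this thread, and the window titles are made up. The /D switch sets each client's starting directory to its own folder, which matters because -local tells the client to use the config in the directory it was launched from.

```bat
@echo off
:: Sketch only: start both WinSMP clients, each from its own folder.
:: Paths are the guide's examples; window titles are arbitrary.
set smp1=C:\Program Files\Folding SMP
set smp2=C:\Program Files\Folding SMP2

:: /D sets the starting directory so -local picks up that folder's config
start "WinSMP 1" /D "%smp1%" "%smp1%\fah.exe" -local
start "WinSMP 2" /D "%smp2%" "%smp2%\fah.exe" -local
```

If you use shortcuts instead, the "Start in" field does the same job as /D here.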

There are several potential problems as well:
  1. Just as with running two instances on a dual quad in Windows, you risk the clients crashing and losing work when shutting them down. There are workarounds, which I will cover in more detail in a moment.
  2. When the client reports to Stanford, it will report 4 cores. So if they have some tougher WUs that they only want to assign to quads, you may get one and will be folding it on only 2 cores. There is nothing you can do to fix this one.
  3. This is not a standard way of running the client, just as running VMs is not. So if you have problems, you can't really complain. We are just trying to get as much work out of our equipment as we can, which is what we have been doing for 7 years now.

Now, the main thing to cover, to make sure this goes as pain-free as possible, is the biggest issue: "Just as with running two instances on a dual quad in Windows, you risk the clients crashing and losing work when shutting them down." What happens is that when you CTRL-C one client, the other one immediately crashes, and it crashes hard, taking the work with it. The client just isn't written to handle being run side by side within the same OS. I'm sure it has to do with it being multi-threaded because, as we all know, you can run the regular console client multiple times without any problems.

Here are the current solutions to this problem:
  1. Hit the reset button on your computer. This one is a bit harsh, but it is reported to work. ( I'm not trying it. ;) )
  2. Unplug your network cable or disable the connection in XP. I don't think this will work in Vista, as network problems don't stop the client from working the way they do in XP. This causes the clients to stop working by killing off the 8 FahCore_a1.exe processes. You can then CTRL-C to kill off the client windows. You can start folding back up at any time without a reboot or any other action.
  3. You can manually or automatically back up your folding directories and restore the files if you have a problem. This is more of a safety net than anything, not a way to keep it from crashing. I put together a bat file that does the process of copying the directories to a backup location that I can manually run. I also have it set up as a scheduled task. If for any reason you stop folding using just CTRL-C and one unit does crash, you can restore it to the latest backup. One thing to note is that if you do this, you will need to manually kill off any FahCore_a1.exe processes or fah.exe processes that are still running before you start either of the WinSMP clients back up. Your best bet is to go ahead and reboot the machine at this point just to be safe. I assume you were going to reboot anyway....why else would you stop folding. ;)
  4. I have tested this a few times lately and it seems to be working. Doing CTRL-C on the console windows in very quick succession seems to keep them from crashing out. Make sure you have no other windows maximized on your screen. This way, when you shut down the first one, the second console window is immediately in focus. A couple of very quick CTRL-C keystrokes (maybe throw in an extra one for good measure) and you should be good. If you are not planning on rebooting, you will need to manually kill off any FahCore_a1.exe or fah.exe processes that are still running before you start either of the WinSMP clients back up. Your best bet is to go ahead and reboot the machine at this point just to make sure they get a clean start. Also, backing up is a smart thing to do just in case.
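
For the "manually kill off" step in items 3 and 4, something like the following could stand in for Task Manager. Treat it as a sketch, and only run it after the client windows are already closed, since force-killing a working client is exactly what loses work. taskkill and tasklist ship with XP Professional and Vista; as far as I know XP Home does not include them.

```bat
@echo off
:: Sketch: clear out leftover folding processes before restarting the clients.
:: /F forces termination, /IM selects processes by image name.
taskkill /F /IM FahCore_a1.exe
taskkill /F /IM fah.exe

:: list whatever is still running; an INFO line means nothing matched
tasklist /FI "IMAGENAME eq FahCore_a1.exe"
tasklist /FI "IMAGENAME eq fah.exe"
```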
 
All of that brings me to the bat files. I will provide them here for anyone who wants to modify and use them. To use the code, just copy and paste it into Notepad and save it as whateveryouwant.bat. Copy the code between the dashed lines. Do not copy the lines themselves. You will need to change the folding directories (C:\Program Files\Folding SMP and C:\Program Files\Folding SMP2 here, marked in red in the original post) to your own, and the backup locations (F:\FAHBackup\SMP and F:\FAHBackup\SMP2, marked in yellow) to your preferred backup location. You can run a backup manually by double-clicking the bat file, or you can run it as a scheduled task if you would like to do it every few hours. If you need to restore, just copy the files in the backup directory back to the corresponding F@H directory, allowing it to overwrite the files. You won't be able to overwrite any files that are in use, but that's really OK because the work folder is the main thing anyway.

This first one backs up both directories, each to its own backup folder. Every run makes a new folder for that particular backup, named with the date and time for easy sorting, like this example: 2008_03_02_18_05 (that's 3/2/2008 @ 6:05 PM). If you use this one, be aware that each run makes a new copy of the directories. If you don't manually delete them, they will begin to take up some serious hard drive space if you are doing an automated backup. This backup takes about 1.5 seconds each time.
-----------------------------------------------------------------------------------------
@echo off
:: Timestamped backup of both WinSMP folders.
:: Edit the two source paths and the two backup locations below.
:: NOTE: the date offsets assume the US format "Sun 03/02/2008".

:: xcopy switches: subfolders including empty ones (/s /e), continue on
:: errors (/c), only files newer than the destination copy (/d), include
:: hidden files (/h), treat the destination as a folder (/i), overwrite
:: read-only files (/r), keep attributes (/k), no overwrite prompt (/y)
set backupcmd=xcopy /s /c /d /e /h /i /r /k /y

:: build a folder name like 2008_03_02_18_05; pad single-digit hours with 0
set hour=%time:~0,2%
if "%hour:~0,1%"==" " set hour=0%time:~1,1%
set folder=%date:~10,4%_%date:~4,2%_%date:~7,2%_%hour%_%time:~3,2%

:: first client
set drive=F:\FAHBackup\SMP
echo ### Backing up directory...
%backupcmd% "C:\Program Files\Folding SMP" "%drive%\%folder%"

:: second client
set drive=F:\FAHBackup\SMP2
echo ### Backing up directory...
%backupcmd% "C:\Program Files\Folding SMP2" "%drive%\%folder%"

:: to back up other directories, use this syntax:
:: %backupcmd% "...source dir..." "%drive%\%folder%\...destination dir..."

echo You have done a Killer Backup of your folding files. Congrats!
----------------------------------------------------------------------------------------
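
If you would rather not click through the Scheduled Tasks wizard, the built-in schtasks command can create the task from a command prompt. A sketch, assuming the bat file was saved as C:\FAHBackup\backup.bat (a made-up path; one without spaces keeps the quoting simple) and that a two-hour interval suits you. The task name is arbitrary, and Windows may prompt for your account password when creating the task.

```bat
:: create a task that runs the backup every 2 hours
schtasks /create /tn "FAH Backup" /sc hourly /mo 2 /tr "C:\FAHBackup\backup.bat"

:: check that it was registered
schtasks /query

:: remove it again later
:: schtasks /delete /tn "FAH Backup"
```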

This second one backs up to the specified directories without creating a sub-directory. It only writes files that have changed since the last backup. This backup takes about 1.5 seconds on the first go and then about 0.25 seconds each time after that.
----------------------------------------------------------------------------------------
@echo off
:: Incremental backup of both WinSMP folders (no dated sub-folder).
:: /d makes xcopy copy only files newer than the copy already in the backup.
set backupcmd=xcopy /s /c /d /e /h /i /r /k /y

:: first client
set drive=F:\FAHBackup\SMP
echo ### Backing up directory...
%backupcmd% "C:\Program Files\Folding SMP" "%drive%"

:: second client
set drive=F:\FAHBackup\SMP2
echo ### Backing up directory...
%backupcmd% "C:\Program Files\Folding SMP2" "%drive%"

:: to back up other directories, use this syntax:
:: %backupcmd% "...source dir..." "%drive%\...destination dir..."

echo You have done a Killer Backup of your folding files. Congrats!
---------------------------------------------------------------------------------------
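
For the restore step, a bat file along these lines would copy a backup over the live folder. It's a sketch under a few assumptions: it restores the first client from the non-dated backup made by the second script, the paths are the same examples used above, and both clients are already stopped. It drops /d on purpose so that the older backup files overwrite the newer, damaged ones; /c lets xcopy carry on past any file that is still in use.

```bat
@echo off
:: Sketch: restore the first client's folder from its backup.
:: Stop both clients (and any leftover FahCore_a1.exe) before running this.
set backup=F:\FAHBackup\SMP
set target=C:\Program Files\Folding SMP

:: same switches as the backup scripts, minus /d
xcopy /s /c /e /h /i /r /k /y "%backup%" "%target%"
echo Restore finished. Check the work folder before restarting the client.
```

To restore from one of the dated backups instead, point backup at that folder, for example F:\FAHBackup\SMP\2008_03_02_18_05.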
 
Very nice guide for dual SMP client setup. This should be in the sticky.

I have only one question. Why do you recommend the -local flag? Is it necessary? I don't have it included on my machines with my multiple SMP clients.

 
The -local flag is only a safety backup for the SMP clients.

If you're only running two or more console-type clients in Windoze, then each client keeps its own config data in its own folding folder, so it's not really needed.

If you're running a mix of one graphical client and one or more console clients, then it is needed.
The graphical client keeps its config data in the registry; without the -local flag, all the clients will try to use this data.
Or if you have used a graphical client and are switching to two console clients, then again use it just to be safe.

So most of the time you won't run into any problems by not using it in Windoze.
Linux does not need it, as again the config data is always stored locally.
But Macs do, as the config data tends to be stored in a discrete file.
Hope this clears up any confusion.

Luck .............. :D
 
So far running great on my Q6600, increased my processing time from 9.5min/frame to 14.5 min/frame. I will see how it goes. (and report back)




 
One question for you: got any idea how to have them up and running as a service? I know running one SMP client as a service is doable, but I don't know about two. I really hate seeing console windows up in my face... It makes me angry.
 
This is a dedicated folding box in a spare bedroom, I don't mess with it much...and the windows are open.




 
No, I haven't even tried one instance as a service. I prefer the console being visible. I've been so used to them for nearly 7 years, that I would be sad to hide it from my view. ;) Just a wild guess, but it seems like you would just set up the service for the second instance just as you did for the first one.
 
Ok..just FYI.
Do not load Vista SP1 if you want to run dual SMP instances.
You will lose two minutes per step.

Confirmed with testing with BillR and KillerMOB.
I was a victim of the accidental 2/22 critical update to Vista 64 SP1. It blows.
 
So far running great on my Q6600, increased my processing time from 9.5min/frame to 14.5 min/frame. I will see how it goes. (and report back)

Nice results.
That's going from 2667 PPD to 3495 PPD.
Also, that's getting 2 units done every 24:10 instead of 2 units every 31:40 running one at a time.:D
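
The arithmetic behind those numbers can be checked with a few lines of batch. This is a sketch that assumes 100 frames per WU (the logs in this thread step 1 percent at a time) and the 1760-point value quoted elsewhere in the thread for project 2653. set /a only does integer math, so results are rounded down.

```bat
@echo off
:: PPD = points * seconds per day / (frames per WU * seconds per frame)
:: Assumed: 1760 points per WU, 100 frames per WU.
set points=1760

:: one client at 9.5 min/frame = 570 seconds
set /a single=points*86400/(100*570)

:: two clients at 14.5 min/frame = 870 seconds each
set /a dual=2*points*86400/(100*870)

echo One client at 9:30 per frame: %single% PPD
echo Two clients at 14:30 per frame: %dual% PPD
```

Run as written, it prints 2667 and 3495, matching the figures in the post above.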
 
Thanks for informing us relic. Not that I was intending to install any version of Vista because I hate it with undying passion, but this just gives me one more reason to delay upgrading my OS indefinitely. I can do everything I want with XP, so I'm going to stick with it for the foreseeable future.
 
I'm glad to run the SMP clients under VMWare so I don't have to deal with this crap.

 
Nice thread, set up the dual clients last night, will get the batch files set up tonight. I should start noticing a nice PPD gain in the next few days.

 
What is the production like on a Q6600? Please state your GHZ when quoting PPD.

Thanks!
 
Killer, I can confirm that hitting the reset or pressing the power button and letting Vista shut down does not crash the WU or lose work.
 
Killer[MoB];1032169302 said:
No, I haven't even tried one instance as a service. I prefer the console being visible. I've been so used to them for nearly 7 years, that I would be sad to hide it from my view. ;) Just a wild guess, but it seems like you would just set up the service for the second instance just as you did for the first one.

Tried that, the clients freaked out and duked it out against each other. They're now running as a console... I'm watching the process going by... slower but at a higher PPD :D
 
Q6600 2.88ghz (320fsb) 2gb adata, p5b-plus, ram set to 512mb/client/vista basic
core 1 - 2653 13:58/frame/min
core 2 - 2653 13.40/frame/min




 
So about 3620, that's not too shabby for windows SMP
 
Q6600 3.4ghz 4gb ram - specs in sig
SMP1 - 2653 12:37/frame/min
SMP2 - 2653 12:07/frame/min

E6700 3ghz
SMP 1 2653 13:07/frame/min

6048 PPD Total

edit: added current FAHMON times and PPD
 
[email protected] (goddamn motherboard freaks out and restarts if you have the FSB set to manual. Yes, even if you set the FSB to the stock 266 speed. Ordering an Abit IP35-e to replace it and selling this board to someone who doesn't overclock for what I paid.)

16:05 for the first.
16:09 for the second.

FahSpy shows a combined 3147 PPD for both.

Oh yeah, this is on XP64.

(Goddamn piece of shit non-overclocking motherboard.):mad::mad::mad:

 
Q6600 3.52ghz 440 FSB - 2gb ram - idle time
SMP1 - 2653 11:13/frame/min - 2259.5
SMP2 - 2653 11:21/frame/min - 2233.0

4492.5 Total PPD

Updating with logs from while I was working today. 11:05 and 11:06 Even better than my OP:
[20:29:11] Writing local files
[20:29:11] Completed 285000 out of 500000 steps (57 percent)
[20:40:16] Writing local files
[20:40:16] Completed 290000 out of 500000 steps (58 percent)
[20:51:21] Writing local files
[20:51:21] Completed 295000 out of 500000 steps (59 percent)
[21:02:25] Writing local files
[21:02:26] Completed 300000 out of 500000 steps (60 percent)

[20:31:21] Completed 365000 out of 500000 steps (73 percent)
[20:42:28] Writing local files
[20:42:29] Completed 370000 out of 500000 steps (74 percent)
[20:53:35] Writing local files
[20:53:35] Completed 375000 out of 500000 steps (75 percent)
[21:04:42] Writing local files
[21:04:42] Completed 380000 out of 500000 steps (76 percent)
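
Frame times like these can be worked out from two consecutive "Completed" timestamps. A sketch in batch, using the first two timestamps from the first log above; the 1760 points and 100 frames per WU are assumptions carried over from the rest of the thread, and it does not handle a frame that spans midnight.

```bat
@echo off
:: Sketch: frame time from two consecutive "Completed" timestamps in the log.
set t1=20:29:11
set t2=20:40:16

:: the leading 1 and -100 stop set /a from reading 08 or 09 as bad octal
set /a s1=(1%t1:~0,2%-100)*3600+(1%t1:~3,2%-100)*60+(1%t1:~6,2%-100)
set /a s2=(1%t2:~0,2%-100)*3600+(1%t2:~3,2%-100)*60+(1%t2:~6,2%-100)
set /a frame=s2-s1
set /a mm=frame/60, ss=frame%%60

:: PPD for a 1760-point, 100-frame WU at this pace (integer math)
set /a ppd=1760*86400/(100*frame)
echo Frame time: %mm% min %ss% sec, about %ppd% PPD
```

With these two timestamps the frame comes out at 11 min 5 sec, which lines up with the 11:05 quoted above.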


 
C2Q @ 3.6Ghz
SMP 1 - 2150 PPD
[04:00:42] Completed 275000 out of 500000 steps (55 percent)
[04:12:09] Writing local files
[04:12:09] Completed 280000 out of 500000 steps (56 percent)
[04:23:35] Writing local files
[04:23:35] Completed 285000 out of 500000 steps (57 percent)
[04:36:04] Writing local files
[04:36:04] Completed 290000 out of 500000 steps (58 percent)

SMP 2 - 2106 PPD
[04:11:02] Completed 285000 out of 500000 steps (57 percent)
[04:22:30] Writing local files
[04:22:30] Completed 290000 out of 500000 steps (58 percent)
[04:34:40] Writing local files
[04:34:40] Completed 295000 out of 500000 steps (59 percent)
[04:47:09] Writing local files
[04:47:09] Completed 300000 out of 500000 steps (60 percent)

TOTAL PPD - 4256

 
Hmm I might have to try this dual windows smp deal on my next box! 4.2-4.4 is impressive and even performance wise with my Q6600 at 3.1 doing 4.1kPPD Very nice!! I bet it is simpler than doing VM's as well.
 
Since I had the client downloaded and had affinity client installed, setting up the second client took 30 seconds. Extract the smp client to Folder 2, run install-run client. Done!!



 
Yeah, I can't believe how easy this was. Getting about 4500 PPD out of my main rig now.
Looking forward to tricking out my second quad on Monday.
 
Anyone notice a difference between 64bit and 32 bit XP in performance?

Just curious if that will matter much.
 
The -local flag might cause problems in 64 bit Vista with Dual SMP clients. When I initially set up dual SMP clients on my 64 bit box, this guide was not available. I used APOLLO's advice about setting everything up and it worked great for about 2 weeks. I left NOD32 running by accident, and when the WU on one of the consoles completed NOD32 caused it to hang while starting the new WU. I figured a fresh reboot was in order. However after booting up I tried the -local flag which I had not used before and I could not get 2 SMP clients to run at all. One would start and the other would error out like below. The thing is that the folding console thought that it was machine ID1 even though it was launching from the folder that was configured as machine ID 8. It didn't matter which Machine ID 1 or 8 that I started with I got the same kind of error. Giving up I just tried to launch each SMP client by running the executable with no -local flag and it worked. Anybody have any ideas about what might be going on? Thanks for putting the dual SMP guide together....I would not have backed up my work folder otherwise and would have lost another WU.

Launch directory: C:\Other Progarms\Folding
Executable: C:\Other Progarms\Folding2\fah.exe
Arguments: -local

[14:10:31] - Ask before connecting: No
[14:10:31] - User name: HighYield (Team 33)
[14:10:31] - User ID: 1D8BA4B430345BC4
[14:10:31] - Machine ID: 1
[14:10:31]

A potential conflict was detected:

Process 936 is currently running and may also be a client with Mach. ID 1
Program will now exit. Upon restart, this check will not be done --
you may wish to check that no client is currently running in
C:\Other Progarms\Folding before restarting.

 
So this is a weird question. I'm using this and it is working GREAT!!! However, I have a question about the modified date on the folders. They are all showing the same time. Is this normal behavior? The create date is consistent with the date of the backup. Not that I care, but just curious

 
The time in the backup folder's name should be the time it was backed up, in military time.
Mine look like this:
2008_03_11_18_05
2008_03_11_20_05
2008_03_11_22_05

The date is 3/11 and the times are 6:05, 8:05 and 10:05PM. I have mine scheduled every 2 hours @ 5 after.

Can you explain in more detail how yours are appearing?
 
The folder names are correct (much like yours, but mine are scheduled every hour throughout the day).

However, Vista in its infinite wisdom shows Modified Date (this is what originally freaked me out) because they all had the same date/time. THEN I realized it was Modified Date and not Create Date. When I added that to the Explorer view all was well because the Create Date was consistent with the time that the backup was created.

Just a weird anomaly. I'm sure it's just Vista being plain ole weird.
 
I've been running dual SMPs for a couple of days.

I'm running Q6600 @ 3.42
SMP1 - 1500 PPD (roughly 17 mins per step)
SMP2 - 1478 PPD (roughly 17.25 mins per step)

I don't understand how others are getting 4000+ ppd. Rig in Sig (obviously not at 3.6 due to temp issues).

My SMPs are configured for Low Priority (I believe this is the value you're supposed to use) and I've got 1024 assigned for memory.

Do I need to make a change to my configuration to get the higher PPD?


 
A lot of it depends on what work units you get. The 2653 and 2605 projects usually net the most PPD. They are the 1760 point WUs. Those WUs are very abundant right now so a lot of people have very high PPD because of them. Most of the other WUs don't get the same PPD as these do.

One of the instances on one of my quads has been getting project 3050 WUs which are worth 1440 points. The point value for them is lower and they take longer to fold than the other two projects I mentioned. My PPD on that one machine has taken a pretty big hit because of it.

The amount of RAM you allocate to each client may also make a difference. I noticed that at one point, when I had more than 512 MB of RAM allocated to a client, I wouldn't get the 2605 and 2653 project WUs. Once I set the allocation down to 512, that's about all they got. Well, until the project 3050 WUs.

Mostly, the work units you get will be random except that once a client is assigned work units from a server, it will usually keep getting assigned to that same server. I'm pretty sure that's the reason why the one client keeps getting the project 3050 work units.

 
Are you running the affinity changer? I was getting 1500PPD per client at stock on my Q6600, so something is definitely wrong.
 
Depends on what protein you're folding. The 3065 project is a slow, low-point dog... When my Q6600 @ 3.6 folded the 2653, it would do it at a speed good for 2250 PPD per instance, so 4500 PPD total.

That same computer is now at 1750 PPD per instance, so 3500 PPD. So it can vary by what protein you're folding. If you're folding the earlier one (2653), something is wrong for sure.

 
Since you are running vista, is there any chance you have SP1 installed? That'll kill your times running two instances. Also, check to see that you see "Extra SSE boost ok" in your log files just after the protein starts up. If SSE isn't kicking in, that would also kill your frame times.

As others have already pointed out, not all WUs make the bigger PPD numbers. The P2653 is the one most windows users refer to when stating PPD. It is the most common WU at the moment and is worth 1760 points.
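
One way to run that SSE check without scrolling through the logs is findstr. A sketch, assuming the log is called FAHlog.txt and sits in each client's folder (adjust the name and paths if yours differ); the folder paths are the guide's examples.

```bat
@echo off
:: Sketch: look for the SSE boost line in each client's log.
set log1=C:\Program Files\Folding SMP\FAHlog.txt
set log2=C:\Program Files\Folding SMP2\FAHlog.txt

:: /I ignores case, /C: searches for the literal phrase;
:: the part after || only runs when findstr reports no match
findstr /I /C:"Extra SSE boost" "%log1%" || echo No SSE boost line in %log1%
findstr /I /C:"Extra SSE boost" "%log2%" || echo No SSE boost line in %log2%
```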
 
I've got 2 2653's running right now. And yes, unfortunately Vista x64 SP1 got installed that fateful date. SSE Boost is on.

But I've heard of 2 minute differences per step due to SP1. With that in mind, I'm still getting 15 minute steps and that is still well below others :(



 
Is that a new install of Vista? Seems to me that maybe you need to thin the Vista install out some... when it installed on my gaming comp it took up 1.1 gig by default... When I trimmed it down I got it to 600 meg; that may help production.

P.S. the 2.1v on the memory seems high... 1.85v to 1.9v for DDR2 6400 memory is normally all that is needed, but you may have found that you needed that to be stable.

For comparison, my Q6600 at 3.6GHz with the 2653 proteins was hitting 4,500 PPD, and that is what you should be getting. Do you have Affinity Changer installed?

Other than that, CTRL-ALT-DELETE and see what is eating compute cycles besides FAH... Also make sure EIST and Thermal Controls in the BIOS are OFF. I'm wondering what your temps are on your quad? The BIOS may be throttling your CPU. Get Core Temp 97.01 or whatever is the newest version from somewhere like Major Geeks and see what temps your cores are at. Rule of thumb is never over 70C; I shoot for never hitting 60C so that on hot, hot days they won't ever be over 70C.

Good Luck!

 
I think my box is loopy :)

Now fahSpy is showing 4173 PPD :) It is still working on the same 2x2653 that it was working on this morning. The average step time went from 17 mins to 12 mins. Well at this point, I don't know what the heck is up. It's running fine so I'll just leave it alone. "If it ain't broke, I'm sure I can find something that will break it" LOL

 
Updated with #4 on the solutions list. Very quick CTRL-C shutdowns of the consoles seems to prevent them from crashing. See #4 in the solutions list in the OP for more details and instructions.

 