MPICH SMP client permission issues?

uzor

Reference files:

http://uzor.us/client.cfg
http://uzor.us/FAHlog-Prev.txt
http://uzor.us/FAHlog.txt

So, I'd been waiting on a new power supply before I started folding on my new rig, and I finally got it not too long ago. I downloaded the GPU2 client and got it running on my GTX260 with no problem. I then downloaded the MPICH SMP client to run as a service on my Vista x64 Ultimate install. I went through all the configuration steps as per the walkthrough (at least to the best of my knowledge) and started the service. When I opened the log (FAHlog-Prev.txt, linked above) to check that it was running right, I noticed a few odd things.

Launch directory: C:\Users\Jeff\FAH
Service: C:\Users\Jeff\FAH\fah6
Arguments: -svcstart -d C:\Users\Jeff\FAH -smp -verbosity 9 -d "C:\Users\Jeff\FAH"

Launched as a service.
Could not enter C:\Users\Jeff\FAH! Working in launch directory.
Here you can see the arguments being called along with the service. First off, I'm not sure where the first instance of the start path is coming from, as I only entered that into the config once, and I entered it with quotes, as directed. I also entered my login credentials into the service properties before starting the service (Jeff, as seen above).
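For anyone who wants to check an arguments line like the one above for repeated flags, here is a minimal Python sketch. It only tokenizes the line the way a Windows-style command line is usually split and reports flags that appear more than once. The function name `repeated_flags` is made up for this example, and the sketch does not reproduce how the FAH client itself parses its arguments.

```python
import shlex

def repeated_flags(arg_line):
    """Return {flag: [values]} for every flag that occurs more than once.

    Hypothetical helper for illustration only; it is not part of FAH.
    """
    # posix=False keeps backslashes intact and leaves the quotes on tokens,
    # which suits Windows paths such as C:\Users\Jeff\FAH.
    tokens = shlex.split(arg_line, posix=False)
    seen = {}
    for i, tok in enumerate(tokens):
        if not tok.startswith("-"):
            continue
        # Treat the next token as this flag's value unless it is another flag.
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        value = nxt.strip('"') if nxt is not None and not nxt.startswith("-") else None
        seen.setdefault(tok, []).append(value)
    return {flag: vals for flag, vals in seen.items() if len(vals) > 1}

# The arguments line copied from the log excerpt above.
ARGS = r'-svcstart -d C:\Users\Jeff\FAH -smp -verbosity 9 -d "C:\Users\Jeff\FAH"'
repeated = repeated_flags(ARGS)
print(repeated)
```

Run against the line from the log, this reports `-d` twice with the same path once the quotes are stripped, so the two instances differ only in quoting.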

It proceeded to run just fine until I got to 17% of the first WU, when I got the following errors/issues:

[06:47:55] Completed 850000 out of 5000000 steps (17 percent)
[07:00:33] CoreStatus = 7B (123)
[07:00:33] Sending work to server
[07:00:33] Project: 3064 (Run 0, Clone 163, Gen 44)
[07:00:33] - Error: Could not get length of results file work/wuresults_01.dat
[07:00:33] - Error: Could not read unit 01 file. Removing from queue.
[07:00:33] Trying to send all finished work units
[07:00:33] + No unsent completed units remaining.
[07:00:33] - Preparing to get new work unit...
[07:00:40] Working on queue slot 02 [March 22 07:00:40 UTC]
[07:00:40] + Working ...
[07:00:40] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 02 -checkpoint 15 -service -verbose -lifeline 4640 -version 623'

[07:00:44] CoreStatus = 63 (99)
[07:00:44] + Error starting Folding@Home core.
[07:00:49]
[07:00:49] + Processing work unit
[07:00:49] Work type a1 not eligible for variable processors
[07:00:49] Core required: FahCore_a1.exe
[07:00:49] Core found.
[07:00:49] Using generic mpiexec calls
[07:00:49] Working on queue slot 02 [March 22 07:00:49 UTC]
[07:00:49] + Working ...
[07:00:49] - Calling 'mpiexec -np 4 -channel auto -host 127.0.0.1 FahCore_a1.exe -dir work/ -suffix 02 -checkpoint 15 -service -verbose -lifeline 4640 -version 623'
Now, the above two excerpts coincide very closely with my shutting down the computer for the evening. I thought that FAH was supposed to save its progress when the machine shut down.

After I turned the computer on the next morning, FAH started working on the new WU just fine, after returning the "Could not enter directory" error message listed above. It proceeded until 94%, at which point I think my daughter (age 2) restarted my computer on her own (bad girl!). It restarted and gave me the following after the opening sequence seen above.
[15:09:08] Preparing to commence simulation
[15:09:08] - Ensuring status. Please wait.
[15:09:25] - Looking at optimizations...
[15:09:25] - Working with standard loops on this execution.
[15:09:25] - Previous termination of core was improper.
[15:09:25] - Fileg to tse standard loops.
[15:09:25] - Files status OK
[15:09:26] ecompressed 534.7 percent)
[15:09:26] 1 (decompressed 534.7 percent)
[15:09:26] - Checksums don't match (work/wudata_02.xtc)
[15:09:26] kup...
[15:09:26] - Checksums - Failed to delete work/wudata_02.bed
[15:09:26] - Failed to delete- Starting from initial work packet
All that work, wasted.... :(
Restarted, and actually completed the WU that time. Got the following.
[17:57:23] Finished Work Unit:
[17:57:23] - Reading up to 516624 from "work/wudata_02.arc": Read 516624
[17:57:23] - Reading up to 971324 from "work/wudata_02.xtc": Read 971324
[17:57:23] goefile size: 0
[17:57:23] logfile size: 275943
[17:57:23] Leaving Run
[17:57:28] - Writing 1964935 bytes of core data to disk...
[17:57:28] ... Done.
[17:57:28] - Failed to delete work/wudata_02.sas
[17:57:28] - Failed to delete work/wudata_02.goe
[17:57:28] Warning: check for stray files
[17:57:28] - Shutting down core
[17:59:28]
[17:59:28] Folding@home Core Shutdown: FINISHED_UNIT
[17:59:28]
[17:59:28] Folding@home Core Shutdown: FINISHED_UNIT
[17:59:32] CoreStatus = 64 (100)
[17:59:32] Unit 2 finished with 32 percent of time to deadline remaining.
[17:59:32] Updated performance fraction: 0.317332
[17:59:32] Sending work to server
...
[17:59:43] - Warning: Could not delete all work unit files (2): Core returned invalid code
Shortly after that I restarted to update my AV software, and when I checked its status, it had restarted with the same startup errors above, but had also dumped the existing log file (now called FAHlog-Prev.txt), and started a new one (called FAHlog.txt here).
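Since the interesting lines are scattered across two log files, a small Python sketch like the one below can pull them out. It assumes only the line shapes visible in the excerpts above (`CoreStatus = <hex> (<decimal>)` and messages containing "Error", "Failed to", "Could not", or "don't match"). The names `scan_log` and `PROBLEM_MARKERS` are made up for this example, and the sketch does not try to interpret what any status code means.

```python
import re

# Matches lines such as "[07:00:33] CoreStatus = 7B (123)".
CORE_STATUS = re.compile(r"CoreStatus = ([0-9A-Fa-f]+) \((\d+)\)")

# Substrings taken from the messages quoted in this post (an assumption,
# not an official list of FAH error strings).
PROBLEM_MARKERS = ("Error", "Failed to", "Could not", "don't match")

def scan_log(lines):
    """Return (statuses, problems) found in an iterable of log lines."""
    statuses = []   # (value parsed from hex, value parsed from decimal)
    problems = []   # raw lines that contain one of the markers
    for line in lines:
        line = line.rstrip("\n")
        match = CORE_STATUS.search(line)
        if match:
            statuses.append((int(match.group(1), 16), int(match.group(2))))
        elif any(marker in line for marker in PROBLEM_MARKERS):
            problems.append(line)
    return statuses, problems

# A few lines copied from the excerpts above.
SAMPLE = [
    "[07:00:33] CoreStatus = 7B (123)",
    "[07:00:33] - Error: Could not get length of results file work/wuresults_01.dat",
    "[07:00:44] CoreStatus = 63 (99)",
    "[07:00:44] + Error starting Folding@Home core.",
    "[17:59:32] CoreStatus = 64 (100)",
]
statuses, problems = scan_log(SAMPLE)
for line in problems:
    print(line)
```

To run it on a real log, pass an open file instead of `SAMPLE`, for example `scan_log(open("FAHlog.txt", errors="replace"))`. The two numbers in each status pair are the same value written in hex and in decimal.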

As it stands now, I am at about 32% on this WU and plugging along just fine. Now, I realize that this is probably information overload for what seems to be a single permission issue, but I'm not sure what I missed. Ran the install.bat and config steps in a command prompt as administrator. Added the work path in the additional flags field of the config file (linked above), and entered my user info into the service logon tab of the services snap-in. BTW, linked files above have the full text of this install of FAH.

What did I miss?

Thanks!

 
One thing I usually have to do with Vista machines is turn off UAC temporarily to get things running okay. I am not sure if this would help you. You can turn UAC back on once things are running okay.

Do you normally turn off your computers? I do not ever turn mine off.

Did you close the folding program before turning the machine off? You need to wait about 120 seconds after closing the folding program. I just ctrl-C to turn off the folding. But you must wait before shutting the boxen off or rebooting.
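For a client installed as a service, the same "stop, then wait" routine could be scripted. This is only a rough Python sketch under stated assumptions: it uses the standard Windows `sc stop` command, the service name is a placeholder you would have to replace with whatever the services snap-in shows, and the function name `stop_and_wait` is made up. Whether the SMP service actually shuts down cleanly when asked this way is not something this thread establishes.

```python
import subprocess
import time

GRACE_SECONDS = 120  # the wait suggested above after closing the client

def stop_and_wait(service_name, run=subprocess.run, sleep=time.sleep,
                  grace=GRACE_SECONDS):
    """Ask Windows to stop a service, then pause before any reboot.

    Hypothetical helper: `run` and `sleep` are injectable so the logic
    can be exercised without touching a real service.
    """
    # "sc stop <name>" sends a stop request to the named Windows service.
    result = run(["sc", "stop", service_name], check=False)
    # Give the cores time to finish writing their files.
    sleep(grace)
    return result.returncode

if __name__ == "__main__":
    # Placeholder name; replace it with the real service name.
    code = stop_and_wait("YourFahServiceName")
    print("sc exit code:", code)
```

Run it from an elevated prompt before shutting down or rebooting; the script returns only after the wait has elapsed.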
 
UAC has been off while I was doing this (interferes with a custom web app I need for work). When it's folding, I like to leave it on for obvious reasons, but since I do use it as my main PC at home, it does get restarted and whatnot from time to time. Thanks for the tip about stopping it manually beforehand - Didn't realize that this client was that touchy about it. Since it is set to run as a service, I always figured that the OS would give it a graceful shutdown command and wait for it to finish before proceeding with the reboot. Maybe once my current WU is done, I'll just try a reinstall of FAH in a directory at the top level of the drive. See if that makes a difference...
 
I am pretty sure that there are problems with installing the Windows SMP as a service. I think it is best to start it and stop it manually after boot up. I remember reading something here at the DC sub-forum about waiting until the Windows boot up process was all done before initiating the SMP start.
 
And I'd be fine with that, except that when I initially set it up and went in to start the service manually after changing the logon credentials, I got the same "could not enter folder" error message mentioned above. And that was well after the machine had been started up. If I could find some other way to hide the app from the taskbar, I wouldn't even mind launching it manually. However, I found and tried a few "minimize to tray" and "hide application" utilities, and they don't seem to work with the FAH window.

I'm wondering if the issue is arising from the fact that the "-d C:\Users\Jeff\FAH" flag is listed twice, and the first time is without quotes. Is the first instance of that flag created by saying yes to the "run as a service" dialog option? I looked in the config file and only saw the 2nd one listed that had the quotes - couldn't find where the first instance of it was coming from.
 
I don't know. I cannot answer that question.

I have, at times, nuked the whole install and started over to get it working the way I want.

Best of luck getting it working!
 