SMP hung on FINISHED_UNIT

Wheresatom

I have been having some trouble with my SMP client hanging when it completes a work unit. The workaround I found is to press Ctrl-C in the console window, open Task Manager and kill any remaining A1 core processes (FahCore_a1.exe). Restarting the client then seems to kick things back into shape nicely. I'd rather not have to do that, because I can lose some serious folding time if I don't remember to come back and check on it. What might be causing this?

Code:
[21:13:40] Completed 247500 out of 250000 steps  (99 percent)
[21:34:00] Writing local files
[21:34:00] Completed 250000 out of 250000 steps  (100 percent)
[21:34:00] Writing final coordinates.
[21:34:02] Past main M.D. loop
[21:34:02] Will end MPI now
[21:35:02] 
[21:35:02] Finished Work Unit:
[21:35:02] - Reading up to 21310704 from "work/wudata_02.arc": Read 21310704
[21:35:02] - Reading up to 559352 from "work/wudata_02.xtc": Read 559352
[21:35:02] goefile size: 0
[21:35:02] logfile size: 212427
[21:35:02] Leaving Run
[21:35:04] - Writing 22088855 bytes of core data to disk...
[21:35:06]   ... Done.
[21:35:06] - Failed to delete work/wudata_02.sas
[21:35:06] - Failed to delete work/wudata_02.goe
[21:35:06] Warning:  check for stray files
[21:35:06] - Shutting down core
[21:37:06] 
[21:37:06] Folding@home Core Shutdown: FINISHED_UNIT
[21:37:06] 
[21:37:06] Folding@home Core Shutdown: FINISHED_UNIT

Folding@Home Client Shutdown at user request.

Folding@Home Client Shutdown.


--- Opening Log file [January 31 23:13:27 UTC] 


--- Opening Log file [January 31 23:13:27 UTC] 


# Windows SMP Console Edition #################################################
###############################################################################

                       Folding@Home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\FAH
Executable: C:\FAH\FAH.exe
Arguments: -smp -advmethods 

[23:13:27] - Ask before connecting: No
[23:13:27] - User name: Wheresatom (Team 33)
[23:13:27] - User ID: 527BB9385EEE6437
[23:13:27] - Machine ID: 1
[23:13:27] 
[23:13:27] Loaded queue successfully.
[23:13:27] 
[23:13:27] + Processing work unit
[23:13:27] Work type a1 not eligible for variable processors
[23:13:27] Core required: FahCore_a1.exe
[23:13:27] Core found.
[23:13:27] Working on queue slot 02 [January 31 23:13:27 UTC]
[23:13:27] + Working ...
[23:13:27] 
[23:13:27] *------------------------------*
[23:13:27] Folding@Home Gromacs SMP Core
[23:13:27] Version 1.74 (March 10, 2007)
[23:13:27] 
[23:13:27] Preparing to commence simulation
[23:13:27] - Ensuring status. Please wait.
[23:13:44] - Looking at optimizations...
[23:13:44] - Working with standard loops on this execution.
[23:13:44] - Previous termination of core was improper.
[23:13:44] - Going to use standard loops.
[23:13:44] - Files status OK
[23:13:44] 
[23:13:44] Folding@home Core Shutdown: MISSING_WORK_FILES
[23:13:44] Finalizing output
[23:15:47] CoreStatus = 1 (1)
[23:15:47] Sending work to server
[23:15:47] Project: 2665 (Run 3, Clone 486, Gen 175)


[23:15:47] + Attempting to send results [January 31 23:15:47 UTC]
[23:16:27] + Results successfully sent
[23:16:27] Thank you for your contribution to Folding@Home.
[23:17:14] - Preparing to get new work unit...
[23:17:14] Cleaning up work directory
[23:17:14] + Attempting to get work packet
[23:17:14] Passkey found
[23:17:14] - Connecting to assignment server
[23:17:15] - Successful: assigned to (171.64.65.54).
[23:17:15] + News From Folding@Home: Welcome to Folding@Home
[23:17:15] Loaded queue successfully.
[23:17:19] + Closed connections
[23:17:24] 
[23:17:24] + Processing work unit
[23:17:24] Core required: FahCore_a3.exe
[23:17:24] Core found.
[23:17:24] Working on queue slot 03 [January 31 23:17:24 UTC]
[23:17:24] + Working ...
[23:17:24] 
[23:17:24] *------------------------------*
[23:17:24] Folding@Home Gromacs SMP Core
[23:17:24] Version 2.15 (Jan 15, 2010)
[23:17:24] 
[23:17:24] Preparing to commence simulation
[23:17:24] - Ensuring status. Please wait.
[23:17:33] - Looking at optimizations...
[23:17:33] - Working with standard loops on this execution.
[23:17:33] - Created dyn
[23:17:33] - Files status OK
[23:17:34] - Expanded 1411259 -> 2059833 (decompressed 145.9 percent)
[23:17:34] Called DecompressByteArray: compressed_data_size=1411259 data_size=2059833, decompressed_data_size=2059833 diff=0
[23:17:34] - Digital signature verified
[23:17:34] 
[23:17:34] Project: 6024 (Run 0, Clone 12, Gen 0)
[23:17:34] 
[23:17:34] Entering M.D.
[23:17:43] Completed 0 out of 500000 steps  (0%)
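
To see where the time actually goes at the end of a unit, the timestamps in a log like the one above can be scanned for long gaps between consecutive entries. This is a minimal Python sketch that assumes only the `[HH:MM:SS] message` line format visible in the posted log; the function names and the 60-second default are made up for this example and are not part of the Folding@home client.

```python
import re

# "[HH:MM:SS] message", the line format used throughout the posted log.
STAMP = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s?(.*)$")


def parse(lines):
    """Yield (seconds_since_midnight, message) for timestamped, non-blank lines.

    Timestamp-only lines are skipped; in the posted log they share a
    timestamp with the line that follows them.
    """
    for line in lines:
        m = STAMP.match(line.strip())
        if not m:
            continue  # e.g. "--- Opening Log file" banners have no stamp
        h, mi, s, msg = m.groups()
        msg = msg.strip()
        if msg:
            yield int(h) * 3600 + int(mi) * 60 + int(s), msg


def gaps(lines, min_gap=60):
    """Return (gap_seconds, before, after) for every pause of at least min_gap seconds.

    The log lines carry no date, so a negative difference is taken to mean
    the clock passed midnight, and 24 hours are added back.
    """
    found = []
    prev = None
    for t, msg in parse(lines):
        if prev is not None:
            dt = t - prev[0]
            if dt < 0:
                dt += 24 * 3600
            if dt >= min_gap:
                found.append((dt, prev[1], msg))
        prev = (t, msg)
    return found


if __name__ == "__main__":
    # Lines copied from the end-of-unit part of the log above.
    excerpt = [
        "[21:34:02] Will end MPI now",
        "[21:35:02] ",
        "[21:35:02] Finished Work Unit:",
        "[21:35:06] - Shutting down core",
        "[21:37:06] ",
        "[21:37:06] Folding@home Core Shutdown: FINISHED_UNIT",
    ]
    for dt, before, after in gaps(excerpt):
        print(f"{dt:>5} s after '{before}'")
```

On that excerpt it reports a 60-second gap after "Will end MPI now" and a 120-second gap after "- Shutting down core". Note that normal progress lines can also be many minutes apart (about 20 minutes between the 99% and 100% lines above), so a large `min_gap` is needed to single out only the end-of-unit stall.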

 
A long pause at the end is quite normal on A1 and A2 units. When -bigadv first came out, it wasn't uncommon to see a pause of 20 to 30 minutes or more when a unit finished. I bet if you let it sit, you'd find it continues on just fine. I recently had an A1 unit finish, and it sat there for 8 minutes before continuing on.
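
If the goal is to avoid checking on the machine by hand without killing a unit that is merely pausing, one option is a small watchdog that only raises a flag when the silence clearly outlasts the 20 to 30 minutes described above. This Python sketch is an illustration under stated assumptions, not part of the Folding@home client: the 45-minute threshold, the function names, the use of the log file's modification time as "last activity", and the log path (assumed to be FAHlog.txt in the launch directory, C:\FAH in the log above) are all choices made for this example. It only decides whether the client looks stuck; what to do next (an alert, or the manual kill-and-restart from the first post) is left open.

```python
import os
import re
import time

# Hypothetical threshold, chosen to sit well above the 20 to 30 minute
# end-of-unit pauses reported in this thread. Adjust to taste.
HUNG_AFTER_SECONDS = 45 * 60

# Text of the line the core writes when a unit is done (see the log above).
FINISHED_MARKER = "Core Shutdown: FINISHED_UNIT"


def last_message(path):
    """Return the last line that has text after its [HH:MM:SS] stamp, or ""."""
    last = ""
    with open(path, encoding="utf-8", errors="replace") as f:
        for ln in f:
            if re.sub(r"^\[\d{2}:\d{2}:\d{2}\]", "", ln.strip()).strip():
                last = ln.rstrip()
    return last


def seconds_idle(path):
    """Seconds since the file was last modified (assumed proxy for client activity)."""
    return time.time() - os.path.getmtime(path)


def looks_hung(last_line, idle_seconds, threshold=HUNG_AFTER_SECONDS):
    """True only if the newest line is FINISHED_UNIT and the silence exceeds threshold.

    Keying on FINISHED_UNIT avoids flagging the long but normal gaps
    between progress lines while a unit is still folding.
    """
    return FINISHED_MARKER in last_line and idle_seconds > threshold


def check(log_path):
    """Hypothetical one-shot check of a client log file."""
    return looks_hung(last_message(log_path), seconds_idle(log_path))
```

For example, `check(r"C:\FAH\FAHlog.txt")` could be run every few minutes from Windows Task Scheduler. Keying on the FINISHED_UNIT line matters because, as the log shows, ordinary progress updates can already be about 20 minutes apart on these units.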
 
Wow... this just answered the question I was about to post. I thought something was wrong with my bigadv VM, since it pauses for about 20 minutes before it starts uploading.

Is there a workaround?
 