Bad work unit? Project 8004 (Run 31, Clone 43, Gen 308)

Carbon_Rod

I had one of my SMP boxen offline for the day while I was at work, and when I got home I started it up and a while later noticed this failed work unit in HFM.

Code:
*********************** Log Started 2012-11-06T01:48:47Z ***********************
01:48:47:************************* Folding@home Client *************************
01:48:47:    Website: http://folding.stanford.edu/
01:48:47:  Copyright: (c) 2009-2012 Stanford University
01:48:47:     Author: Joseph Coffland <[email protected]>
01:48:47:       Args: --child --lifeline 925 /etc/fahclient/config.xml --run-as fahclient
01:48:47:             --pid-file=/var/run/fahclient.pid --daemon
01:48:47:     Config: /etc/fahclient/config.xml
01:48:47:******************************** Build ********************************
01:48:47:    Version: 7.1.52
01:48:47:       Date: Mar 20 2012
01:48:47:       Time: 13:19:11
01:48:47:    SVN Rev: 3515
01:48:47:     Branch: fah/trunk/client
01:48:47:   Compiler: GNU 4.6.2
01:48:47:    Options: -std=gnu++98 -O3 -funroll-loops -mfpmath=sse -ffast-math
01:48:47:             -fno-unsafe-math-optimizations -msse2
01:48:47:   Platform: linux2 3.2.0-1-amd64
01:48:47:       Bits: 64
01:48:47:       Mode: Release
01:48:47:******************************* System ********************************
01:48:47:        CPU: Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz
01:48:47:     CPU ID: GenuineIntel Family 6 Model 42 Stepping 7
01:48:47:       CPUs: 8
01:48:47:     Memory: 7.77GiB
01:48:47:Free Memory: 7.29GiB
01:48:47:    Threads: POSIX_THREADS
01:48:47: On Battery: false
01:48:47: UTC offset: -7
01:48:47:        PID: 938
01:48:47:        CWD: /var/lib/fahclient
01:48:47:         OS: Linux 3.2.0-30-generic x86_64
01:48:47:    OS Arch: AMD64
01:48:47:       GPUs: 1
01:48:47:      GPU 0: ATI:4 NI Whistler [AMD Radeon HD 6600 Series]
01:48:47:       CUDA: Not detected
01:48:47:***********************************************************************
01:48:47:<config>
01:48:47:  <!-- Network -->
01:48:47:  <proxy v=':8080'/>
01:48:47:
01:48:47:  <!-- Remote Command Server -->
01:48:47:  <command-allow v='127.0.0.1 10.0.0.0/24'/>
01:48:47:  <command-allow-no-pass v='127.0.0.1 10.0.0.0/24'/>
01:48:47:
01:48:47:  <!-- User Information -->
01:48:47:  <passkey v='********************************'/>
01:48:47:  <team v='33'/>
01:48:47:  <user v='Carbon_Rod'/>
01:48:47:
01:48:47:  <!-- Folding Slots -->
01:48:47:  <slot id='0' type='SMP'>
01:48:47:    <cpus v='8'/>
01:48:47:  </slot>
01:48:47:</config>
01:48:47:Switching to user fahclient
01:48:47:Trying to access database...
01:48:47:Successfully acquired database lock
01:48:47:Enabled folding slot 00: READY smp:8
01:48:47:WU01:FS00:Starting
01:48:47:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 938 -checkpoint 15 -np 8
01:48:47:WU01:FS00:Started FahCore on PID 981
01:48:47:WU01:FS00:Core PID:985
01:48:47:WU01:FS00:FahCore 0xa4 started
01:48:48:WU01:FS00:0xa4:
01:48:48:WU01:FS00:0xa4:*------------------------------*
01:48:48:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
01:48:48:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
01:48:48:WU01:FS00:0xa4:
01:48:48:WU01:FS00:0xa4:Preparing to commence simulation
01:48:48:WU01:FS00:0xa4:- Ensuring status. Please wait.
01:48:57:WU01:FS00:0xa4:- Looking at optimizations...
01:48:57:WU01:FS00:0xa4:- Working with standard loops on this execution.
01:48:57:WU01:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
01:48:57:WU01:FS00:0xa4:- Expanded 545180 -> 1306608 (decompressed 239.6 percent)
01:48:57:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=545180 data_size=1306608, decompressed_data_size=1306608 diff=0
01:48:57:WU01:FS00:0xa4:- Digital signature verified
01:48:57:WU01:FS00:0xa4:
01:48:57:WU01:FS00:0xa4:Project: 8004 (Run 31, Clone 43, Gen 308)
01:48:57:WU01:FS00:0xa4:
01:48:57:WU01:FS00:0xa4:Entering M.D.
01:49:03:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
01:49:04:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
01:49:04:WU01:FS00:Starting
01:49:04:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 938 -checkpoint 15 -np 8
01:49:04:WU01:FS00:Started FahCore on PID 1596
01:49:04:WU01:FS00:Core PID:1600
01:49:04:WU01:FS00:FahCore 0xa4 started
01:49:04:WU01:FS00:0xa4:
01:49:04:WU01:FS00:0xa4:*------------------------------*
01:49:04:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
01:49:04:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
01:49:04:WU01:FS00:0xa4:
01:49:04:WU01:FS00:0xa4:Preparing to commence simulation
01:49:04:WU01:FS00:0xa4:- Ensuring status. Please wait.
01:49:13:WU01:FS00:0xa4:- Looking at optimizations...
01:49:13:WU01:FS00:0xa4:- Working with standard loops on this execution.
01:49:13:WU01:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
01:49:14:WU01:FS00:0xa4:- Expanded 545180 -> 1306608 (decompressed 239.6 percent)
01:49:14:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=545180 data_size=1306608, decompressed_data_size=1306608 diff=0
01:49:14:WU01:FS00:0xa4:- Digital signature verified
01:49:14:WU01:FS00:0xa4:
01:49:14:WU01:FS00:0xa4:Project: 8004 (Run 31, Clone 43, Gen 308)
01:49:14:WU01:FS00:0xa4:
01:49:14:WU01:FS00:0xa4:Entering M.D.
01:49:20:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
01:49:20:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
01:50:02:Server connection id=1 on 0.0.0.0:36330 from 10.0.0.53
01:50:04:WU01:FS00:Starting
01:50:04:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/www.stanford.edu/~pande/Linux/AMD64/Core_a4.fah/FahCore_a4 -dir 01 -suffix 01 -version 701 -lifeline 938 -checkpoint 15 -np 8
01:50:04:WU01:FS00:Started FahCore on PID 1614
01:50:04:WU01:FS00:Core PID:1618
01:50:04:WU01:FS00:FahCore 0xa4 started
01:50:04:WU01:FS00:0xa4:
01:50:04:WU01:FS00:0xa4:*------------------------------*
01:50:04:WU01:FS00:0xa4:Folding@Home Gromacs GB Core
01:50:04:WU01:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
01:50:04:WU01:FS00:0xa4:
01:50:04:WU01:FS00:0xa4:Preparing to commence simulation
01:50:04:WU01:FS00:0xa4:- Ensuring status. Please wait.
01:50:14:WU01:FS00:0xa4:- Looking at optimizations...
01:50:14:WU01:FS00:0xa4:- Working with standard loops on this execution.
01:50:14:WU01:FS00:0xa4:Examination of work files indicates 8 consecutive improper terminations of core.
01:50:14:WU01:FS00:0xa4:- Expanded 545180 -> 1306608 (decompressed 239.6 percent)
01:50:14:WU01:FS00:0xa4:Called DecompressByteArray: compressed_data_size=545180 data_size=1306608, decompressed_data_size=1306608 diff=0
01:50:14:WU01:FS00:0xa4:- Digital signature verified
01:50:14:WU01:FS00:0xa4:
01:50:14:WU01:FS00:0xa4:Project: 8004 (Run 31, Clone 43, Gen 308)
01:50:14:WU01:FS00:0xa4:
01:50:14:WU01:FS00:0xa4:Entering M.D.
01:50:20:WU01:FS00:0xa4:Completed 0 out of 250000 steps  (0%)
01:50:20:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)

The weird part is that I've completed literally hundreds of units from project 8004, and this is the first one that has failed on me. I've tried restarting the machine and it just comes back with the same message, so for all intents and purposes this particular unit is stalled/broken. Before I blow that unit away so I can at least continue to fold on that box, is there anything else I can or should try to get it to fold?

EDIT: I guess I just wanted to clarify, that's not me interrupting the core... it just appears to start folding but comes back with the "INTERRUPTED" message immediately afterwards.
 
In the official folding forum, there are some reports of UNSTABLE_MACHINE / BAD_WORK_UNIT / EARLY_UNIT_END, but none for the same Run/Clone/Gen as yours.

I would suggest you post your log in this forum: http://foldingforum.org/viewforum.php?f=19&sid=adbdaf156bdce991db3d266a63487217, so the Stanford mods can check whether another user got the same RCG as you, and whether that person got the same error as you.

If the WU is bad, Stanford will remove it. If the WU is OK... we will need to take a closer look at your rig.
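
If it helps when putting the report together, here is a rough Python sketch that counts how often each Project/Run/Clone/Gen in a log ends with a given FahCore return status. The regular expressions are assumptions based only on the log lines quoted in this thread, and `summarize_log` is a hypothetical helper, not part of FAHClient or HFM.

```python
import re
from collections import Counter

# Assumed line formats, taken from the log excerpt posted in this thread.
PRCG_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
RETURN_RE = re.compile(r"FahCore returned: (\w+) \((\d+) = (0x[0-9a-fA-F]+)\)")


def summarize_log(text):
    """Count FahCore return statuses per (project, run, clone, gen)."""
    counts = Counter()
    current = None  # most recent PRCG seen before a "FahCore returned" line
    for line in text.splitlines():
        match = PRCG_RE.search(line)
        if match:
            current = tuple(int(g) for g in match.groups())
            continue
        match = RETURN_RE.search(line)
        if match and current is not None:
            counts[(current, match.group(1))] += 1
    return counts


# Two of the attempts from the posted log, trimmed to the relevant lines.
sample = """\
01:48:57:WU01:FS00:0xa4:Project: 8004 (Run 31, Clone 43, Gen 308)
01:49:04:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
01:49:14:WU01:FS00:0xa4:Project: 8004 (Run 31, Clone 43, Gen 308)
01:49:20:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
"""

for (prcg, status), n in summarize_log(sample).items():
    print(prcg, status, n)
```

Pasting the full log text in place of `sample` gives one line per PRCG and status, which makes it easy to show that the same unit fails the same way on every restart.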
 