Lost 6903

Core32 · Apr 2, 2012

While at work today I noticed on the stats site that a 6903 I was expecting to drop had not.

When I got home the rig was still folding, now on a 6901 at about 65% complete.
I looked through the log file and found this at the end of the 6903:

[14:39:46] Completed 247500 out of 250000 steps (99%)
[14:53:55] Completed 250000 out of 250000 steps (100%)
[14:54:18] DynamicWrapper: Finished Work Unit: sleep=10000
[14:54:28]
[14:54:28] Finished Work Unit:
[14:54:28] - Reading up to 121622496 from "work/wudata_03.trr": Read 121622496
[14:54:30] trr file hash check passed.
[14:54:30] - Reading up to 108764264 from "work/wudata_03.xtc": Read 108764264
[14:54:31] xtc file hash check passed.
[14:54:31] edr file hash check passed.
[14:54:31] logfile size: 209834
[14:54:31] Leaving Run
[14:54:35] - Writing 230769586 bytes of core data to disk...
[14:55:49] Done: 230769074 -> 222427144 (compressed to 3.3 percent)
[14:55:50] ... Done.
[14:56:11] - Shutting down core
[14:56:11]
[14:56:11] Folding@home Core Shutdown: FINISHED_UNIT
[14:56:14] CoreStatus = 64 (100)
[14:56:14] Sending work to server
[14:56:14] Project: 6903 (Run 6, Clone 13, Gen 66)

[14:56:14] + Attempting to send results [April 2 14:56:14 UTC]
[15:09:51] - Server reports problem with unit.
[15:09:51] - Preparing to get new work unit...
[15:09:51] Cleaning up work directory
[15:09:52] + Attempting to get work packet
[15:09:52] Passkey found
[15:09:52] - Connecting to assignment server
[15:09:52] - Successful: assigned to (130.237.232.237).
[15:09:52] + News From Folding@Home: Welcome to Folding@Home
[15:09:53] Loaded queue successfully.
[15:10:15] + Closed connections
[15:10:15]
[15:10:15] + Processing work unit
[15:10:15] Core required: FahCore_a5.exe
[15:10:15] Core found.
[15:10:15] Working on queue slot 04 [April 2 15:10:15 UTC]
[15:10:15] + Working ...

This is the new 6166HE 4P rig, running the kraken and OC'd to 241, vcore at 1.000V.
Any chance to salvage that WU?
I don't remember exiting while this one was running or making any adjustment after the last WU.
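
For anyone who wants to check their own logs for the same thing, here is a minimal sketch that scans log lines for the "Server reports problem with unit" message and pairs it with the most recent "Project:" line. It assumes the line format shown in the paste above; the function name `find_rejected_units` is made up for illustration and is not part of the client.

```python
import re

# Assumed line shape, taken from the pasted log: "[HH:MM:SS] message"
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s?(.*)$")
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")

def find_rejected_units(lines):
    """Return (time, unit) pairs for each rejected upload (hypothetical helper)."""
    rejected = []
    last_project = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # blank lines or lines without a timestamp
        stamp, text = m.groups()
        p = PROJECT_RE.search(text)
        if p:
            # Remember the unit most recently announced in the log
            last_project = "P{} R{} C{} G{}".format(*p.groups())
        elif "Server reports problem with unit" in text:
            rejected.append((stamp, last_project))
    return rejected

sample = [
    "[14:56:14] Sending work to server",
    "[14:56:14] Project: 6903 (Run 6, Clone 13, Gen 66)",
    "[14:56:14] + Attempting to send results [April 2 14:56:14 UTC]",
    "[15:09:51] - Server reports problem with unit.",
]
print(find_rejected_units(sample))
```

To run it against a real log, pass the lines of the log file instead of `sample`.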

jojo69 · Apr 2, 2012

Don't know,

but I will commiserate...that is a bummer dude, I hate lost work

sirmonkey1985 · Apr 2, 2012

No. If the WU was sent and the server reported an error, then the WU is gone, and even if you rerun it you still won't get points for it. It's probably already been sent out to someone else at this point.

Core32 · Apr 2, 2012

Losing one sux. Not knowing why sux the big one.
Any way to determine what went wrong?

theGryphon · Apr 2, 2012

If I had to guess, I'd say the high OC caused some instability which resulted in a miscalculation of something and the WU got corrupted. Idk, just a guess... It's more likely a fluke. But, I'd watch for similar problems...

Core32 · Apr 2, 2012

Yep. Watching the 6901 finish tonight before 10 EST. Hopefully that goes through.
Thanks.

ChelseaOilman · Apr 2, 2012

Somebody else already completed the WU and received credit for it.

Your WU (P6903 R6 C13 G66) was added to the stats database on 2012-04-02 08:07:48 for 22706 points of credit.

It could be a corrupted WU from OCing, but a lot of times this message appears because the person changed something like the user name or passkey after downloading the WU, or sneakernetted the WU.

Core32 · Apr 2, 2012

Very interesting. Is that time UTC, EST, or something else?

ChelseaOilman · Apr 2, 2012

UTC I believe.

WU assigned to donor at: 2012-03-27 06:54:50
Days taken to complete WU: 6.05
Credit Time: 2012-04-02 08:07:48
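
The "days taken" figure can be cross-checked from the two timestamps above. This is just a quick sketch of the arithmetic, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the stats entry above (same time zone assumed)
assigned = datetime.strptime("2012-03-27 06:54:50", FMT)
credited = datetime.strptime("2012-04-02 08:07:48", FMT)

elapsed = credited - assigned
days_taken = elapsed.total_seconds() / 86400  # seconds per day

print(elapsed)               # 6 days, 1:12:58
print(round(days_taken, 2))  # 6.05
```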

Core32 · Apr 3, 2012

How abnormal is it that I received the same WU about 4 days after the other donor?
Looks like I finished it about 7 hours after it was credited. Guess I need to push the OC some more.

I have seen two of my GPUs get the same WU within minutes of each other.
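
The "about 7 hours" estimate can be checked against the log. A rough sketch, assuming the credit time from the stats entry is UTC like the upload attempt in the log:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Credit time of the other donor's result (assumed UTC)
credited = datetime.strptime("2012-04-02 08:07:48", FMT)
# Upload attempt from the log: "[April 2 14:56:14 UTC]"
upload_attempt = datetime.strptime("2012-04-02 14:56:14", FMT)

gap = upload_attempt - credited
gap_hours = gap.total_seconds() / 3600

print(gap)                  # 6:48:26
print(round(gap_hours, 1))  # 6.8
```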

Kendrak · Apr 3, 2012

It is normal.

The work gets sent out a few times to compare results and to validate.

Core32 · Apr 3, 2012

Then does everyone who folds that WU get credit, or only the first to complete it?
I'm sure my DL of this WU came at least 4 days after the other donor's.
There must have been a difference between the results to get the WU thrown out, I suppose.

Kendrak · Apr 3, 2012

Everyone who finishes the WU in time gets credit.

It is part of the scientific process to validate results.

theGryphon · Apr 3, 2012

I remember reading that PG intentionally sends out the same WU multiple times, though probably not all WUs. It may be to make sure they're successfully returned, by adding sort of an intentional redundancy.

Core32 · Apr 3, 2012

So I should have gotten credit unless there was actually something wrong with my results.

ChelseaOilman · Apr 3, 2012

There was something wrong with your results. That's why you got that message in your log and didn't get any points. The reason why the WU got reissued and you received it is the same reason why PG doesn't want people with slow computers running bigadv. The 1st person that was issued the WU went past the preferred deadline of 5 days. People that can't finish WUs within the preferred deadlines are just wasting resources. The guy that got the WU 1st shouldn't be doing bigadv.

Core32 · Apr 3, 2012

ChelseaOilman said:
UTC I believe.

WU assigned to donor at: 2012-03-27 06:54:50
Days taken to complete WU: 6.05
Credit Time: 2012-04-02 08:07:48

Ok. I'll quit whining now

I thought from your first post that the "problem" was that someone else completed the WU first, not that the results were bad.
Thanks.
