Is there something up at Stanford? Rejected results.

Carbon_Rod

I've been having some real stability issues with my 780s lately. After numerous driver updates and rollbacks, underclocking, adjusting voltages, memtests on both the system and the GPUs, reformatting, etc., I just can't get my 780s to fold without crashing anymore.

I started testing my system by folding on each card individually and began seeing some weird things. Here's what I found in my log after I started folding on a single 780:
Code:
23:25:38:WU00:FS00:0x17:Completed 0 out of 2500000 steps (0%)
23:25:38:WU00:FS00:0x17:Temperature control disabled. Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900
23:28:43:WU00:FS00:0x17:Completed 25000 out of 2500000 steps (1%)
23:31:39:WU00:FS00:0x17:Completed 50000 out of 2500000 steps (2%)
23:34:47:WU00:FS00:0x17:Completed 75000 out of 2500000 steps (3%)
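Read literally, the temperature-control message lists three conditions that must all hold. The sketch below just encodes that log text as a predicate so it is clear which condition can switch the feature off. `temp_control_enabled` is a hypothetical name (not part of core 17), and interpreting "single Nvidia GPU" as a count of Nvidia GPUs, as well as the units of `tmax` and `twait`, are assumptions since the log does not explain them.

```python
def temp_control_enabled(nvidia_gpu_count: int, tmax: float, twait: float) -> bool:
    """Return True if the conditions quoted in the core 17 log message all hold.

    Hypothetical helper: it encodes the log text
    "Requirements: single Nvidia GPU, tmax must be < 110 and twait >= 900"
    literally. The units of tmax and twait are not stated in the log.
    """
    single_nvidia_gpu = nvidia_gpu_count == 1  # assumed reading of "single Nvidia GPU"
    return single_nvidia_gpu and tmax < 110 and twait >= 900
```

For example, `temp_control_enabled(2, 100, 900)` is `False` because the first condition fails, even though the other two pass.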
I hadn't seen that temperature-control message before, but I didn't know what to do about it anyway, so I let it go and carried on. That's when I noticed this at the end of every unit I completed:
Code:
04:28:36:WU00:FS00:0x17:Completed 2500000 out of 2500000 steps (100%)
04:28:47:WU00:FS00:0x17:Saving result file logfile_01.txt
04:28:47:WU00:FS00:0x17:Saving result file checkpointState.xml
04:28:49:WU00:FS00:0x17:Saving result file checkpt.crc
04:28:49:WU00:FS00:0x17:Saving result file log.txt
04:28:49:WU00:FS00:0x17:Saving result file positions.xtc
04:28:51:WU00:FS00:0x17:Folding@home Core Shutdown: FINISHED_UNIT
04:28:51:WU00:FS00:FahCore returned: FINISHED_UNIT (100 = 0x64)
04:28:51:WU00:FS00:Sending unit results: id:00 state:SEND error:NO_ERROR project:8900 run:746 clone:0 gen:137 core:0x17 unit:0x000000a5028c126651a6cc8e3b9c572b
04:28:51:WU00:FS00:Uploading 12.96MiB to 171.64.65.69
04:28:51:WU00:FS00:Connecting to 171.64.65.69:8080
04:28:57:WU00:FS00:Upload 6.27%
04:29:03:WU00:FS00:Upload 13.51%
04:29:09:WU00:FS00:Upload 20.74%
04:29:15:WU00:FS00:Upload 27.98%
04:29:21:WU00:FS00:Upload 35.22%
04:29:27:WU00:FS00:Upload 41.97%
04:29:33:WU00:FS00:Upload 48.72%
04:29:39:WU00:FS00:Upload 55.48%
04:29:45:WU00:FS00:Upload 62.23%
04:29:51:WU00:FS00:Upload 69.47%
04:29:57:WU00:FS00:Upload 76.22%
04:30:03:WU00:FS00:Upload 82.97%
04:30:09:WU00:FS00:Upload 89.73%
04:30:15:WU00:FS00:Upload 96.96%
04:30:18:WU00:FS00:Upload complete
04:30:18:WU00:FS00:Server responded WORK_QUIT (404)
04:30:18:WARNING:WU00:FS00:Server did not like results, dumping
Apparently the dumping only happens with beta units. The good news is that I completed 3 more units without any crashes; the bad news is that the results of all 3 of those beta units were dumped even though the units completed successfully.
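To check how many finished units were dumped rather than credited, one could scan the client log for the server's response after each upload. A minimal sketch, assuming log lines look like the excerpts in this thread (`Server responded WORK_QUIT (404)` plus a `Server did not like results, dumping` warning for rejected results, `Server responded WORK_ACK (400)` for accepted ones). `summarize_uploads` is a hypothetical helper, not part of any Folding@home tool, and the line format is inferred from these excerpts rather than from a documented log specification.

```python
import re
from collections import Counter

# Matches e.g. "04:30:18:WU00:FS00:Server responded WORK_QUIT (404)".
# search() is used instead of match() so prefixed lines (e.g. IRC pastes) also count.
RESPONSE_RE = re.compile(r"Server responded (\w+) \((\d+)\)")

# Warning text the client printed when it discarded a result in the logs above.
DUMP_MARKER = "Server did not like results, dumping"


def summarize_uploads(lines):
    """Count server responses and dumped results in FAH client log lines.

    Hypothetical helper; the log format is assumed from the thread's excerpts.
    """
    counts = Counter()
    for line in lines:
        m = RESPONSE_RE.search(line)
        if m:
            counts[m.group(1)] += 1  # e.g. WORK_ACK or WORK_QUIT
        elif DUMP_MARKER in line:
            counts["dumped"] += 1
    return counts


log = [
    "04:29:57:WU00:FS00:Upload 76.22%",
    "04:30:18:WU00:FS00:Server responded WORK_QUIT (404)",
    "04:30:18:WARNING:WU00:FS00:Server did not like results, dumping",
    "00:00:29:WU00:FS01:Server responded WORK_ACK (400)",
]
summary = summarize_uploads(log)
print(summary["WORK_QUIT"], summary["WORK_ACK"], summary["dumped"])
```

A `Counter` is used so response types that never appear simply read as zero instead of raising a `KeyError`.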

I changed the client type to advanced and successfully completed and sent at least one advanced unit, though it earned far fewer points than any of the beta units I've done recently.

Someone else who was having this issue hypothesized that it was because Stanford had updated core 17 from version 0.0.46 to 0.0.47.

Has anyone else encountered anything like this recently? Has there been any news on the FF beta forum about any core updates/changes? I don't have beta forum access so I can't look myself.
 
If you do indeed have 0.0.47, then that is why you are seeing this: the CS is rejecting WUs done with 47. Did you pick up 47 with the beta flag? If so, something is wrong, because it is in internal testing and should not be going out to the general public (at least, it is not supposed to be). Delete the core and replace it with 46 and you should not have any more problems.
 
Yeah, 47 is very buggy at the moment. As you may have noticed, it adds some experimental temp monitoring stuff.
 
Ya, I had the client type set to beta. Has the errant core version been withdrawn from Stanford's download servers? What I'm getting at is: if I delete the version 47 core, will the client download the correct one (version 46) from the Stanford server, or do I have to find version 46 and replace it manually?
 
You shouldn't be able to download version 47 without a key anyway. There is some sort of bug in their system.

Just manually replace the core: take the one from your advanced folder and put it in your beta folder.
 
I'm not sure I have a version 46 core 17 handy; I'll have to do some searching. The advanced unit I completed was a core 15 unit. Still, I might just set the client type to advanced and leave it there until I get a version 46. Version 47 was the first one it downloaded after I freshly reformatted that system. All I know is that I never entered a key of any sort, and I ended up stuck with it.

Update: I just deleted core 17 from the beta folder and let the client download a new copy, partly for kicks and partly out of curiosity, and I now have version 48. I figure it can't hurt to try it. ;)
 
<Spazturtle_> v48 should work right?
<Spazturtle_> the server was rejecting wu run on v47
<PantherX> I do hope that it is fixed, we are testing it now
<PantherX> so far, no one has completed it.
<PantherX> I will finish mine in about 45 minutes and will report back

Let's see if this fixes the issue.
 
Yep, issue fixed.

<PantherX> The Server is now accepting the WUs from v0.0.48:
<PantherX> 00:00:29:WU00:FS01:Upload complete
<PantherX> 00:00:29:WU00:FS01:Server responded WORK_ACK (400)
<PantherX> 00:00:29:WU00:FS01:Final credit estimate, 7.00 points
<PantherX> 00:00:29:WU00:FS01:Cleaning up
 
No problem.

One last thing: if you find bugs with core 17, please post them on the official Folding Forum. If you don't have access to the beta section, just post them in another section and the thread will be moved to the correct location.
 
I recommend you delete core 17 and get the new v49; it fixes a "<proteneer> bad shit crazy bug".
 
Hrm... I thought the core update might help with my 780 stability issues when the cards run in tandem. Alas, it doesn't. Guess I have some more troubleshooting to do. :(
 
So what did the manual say that you didn't have set up right?
 
I was using a P8Z77 WS workstation board with 4 PCIe x16 slots. I had put my 780s in slots 1 and 3 to allow maximum airflow between the GPUs (since I don't yet have a 3rd or 4th 780). But according to the manual, with 2 GPUs you still populate the slots in order, i.e. slots 1 and 2. Moving the cards there fixed the issue.
 