Server reports problem with unit.

tkam · Sep 30, 2012

I've had my last two bigadv units get this "Server reports problem with unit." error message after hitting 100% completion and then doing the normal upload procedure. It's happened on my 4P AMD server and my 2P E5 server. One 8101 and one 8102 - only thing I can find in common is they both had the same assignment server: 128.143.231.201

Neither box has ever had a failed WU before that I'm aware of, and neither is OC'd.

tkam · Sep 30, 2012

And now a 3rd bigadv WU is getting the exact same error - this one is also on a different box.

Looks like grandpa is on top of things: http://foldingforum.org/viewtopic.php?f=18&t=22566

402blownstroker · Sep 30, 2012

I have 3 BA WUs with this problem in langouste. Can anyone confirm whether they have been sent or not? If not, what needs to be done to send them? Our wonderful weekend breakdown.....

It at least looks like the WU results are still in the work directory. Hopefully they can be resent once Stanford gets their shit together...

Jeanjean · Sep 30, 2012

I had this problem recently with two P8101.

I thought that it was due to bad overclocking.

musky · Sep 30, 2012

The servers appear to be accepting results again. I think we are all going to be screwed on results that tried to upload during the server issues.

Core32 · Sep 30, 2012

What a crock. Nothing but these shit 8101s and now can't even upload the completed ones.
I have 4 total that now are giving this error.
Talk about piling on......

wings2004 · Sep 30, 2012

I thought it was just me the other day, well crap.

402blownstroker · Sep 30, 2012

The next time a WU completes will langouste try sending all queued WUs?

tear · Sep 30, 2012

These WUs appear to be permanently un-returnable (already tried a few things) :-(

Jeanjean · Sep 30, 2012

Stanford can do nothing?

sbinh · Sep 30, 2012

They can "screw" [H] in favor of eVga ..

..

tear · Sep 30, 2012

Jeanjean, I'm sure asking won't hurt.

At least not right away

402blownstroker · Sep 30, 2012

Jeanjean said:
Stanford can do nothing?

[pissandmoanmode]
Of course not. There is never anyone available on the weekends to fix these things. They have millions upon millions of dollars of resources donated to their cause and they cannot get a couple of undergraduate students to babysit servers over the weekend.
[/pissandmoanmode]

Deleted member 12106 · Sep 30, 2012

sbinh said:
They can "screw" [H] in favor of eVga .. ..

This is my tin foil hat conspiracy.

texinga · Sep 30, 2012

Yep, it took a lot of EVGA-bucks to entice PG to pull this one on [H] this particular weekend, but we finally got them to do it. How ya like it?

Deleted member 12106 · Sep 30, 2012

texinga said:
Yep, it took a lot of EVGA-bucks to entice PG to pull this one on [H] this particular weekend, but we finally got them to do it. How ya like it?

That must be the PR-friendly way of saying evga put their boy friend boots on

-alias- · Sep 30, 2012

Server reports problem with unit.
The same happened to 4 of my rigs, I lost 2 x 8102 and 2 x 8101.

1455p all together down the drain!

402blownstroker · Sep 30, 2012

I am down 4 WU and about 1.2M

tear · Sep 30, 2012

Nothing we can't deal with, right?

Nathan_P · Sep 30, 2012

It's got me again. I had the same problem earlier in the week; this time it's Project: 8101 (Run 20, Clone 1, Gen 45) that's affected. I have a different WU processing now, and if that fails I'm going back to SMP for a while. No point spending 2 days and 14 kWh of power for no reward.

tear · Sep 30, 2012

There were no known outages earlier this week, I suspect something else may be going on.

Grandpa_01 · Sep 30, 2012

Well, the way I see it, we just move on. It is just a little bump in the road; we just need to grab another gear, put the pedal to the metal, and smoke them thar fellars over at evga. After all, they just got a little false hope adorned upon them; now let's crush it.

Nathan_P · Sep 30, 2012

It happened last weekend. At the time it was blamed on an unstable overclocked machine, which is difficult to achieve on an Asus server mobo.

The rig is fine, as it's completed 2 BA WUs this week without issue (both 8101). We are either getting duff WUs to fold or a borked server is dealing with the uploads.

tear · Sep 30, 2012

Were the symptoms exactly the same last weekend? (Server reports problem with unit.)
Do you have a log?

tkam · Sep 30, 2012

I'm still seeing every BA WU fail to upload correctly. Then it gets the same WU and starts processing again.

-alias- · Sep 30, 2012

I got this just a few minutes ago on my SR-2:

Code:

[19:58:22] - Autosending finished units... [September 30 19:58:22 UTC]
[19:58:22] + Processing work unit
[19:58:22] Trying to send all finished work units
[19:58:22] Project: 8101 (Run 3, Clone 12, Gen 19)
[19:58:22] Core required: FahCore_a5.exe
[19:58:22] Core found.


[19:58:22] + Attempting to send results [September 30 19:58:22 UTC]
[19:58:22] - Reading file work/wuresults_04.dat from core
[19:58:23] Working on queue slot 08 [September 30 19:58:23 UTC]
[19:58:23] + Working ...
[19:58:23] - Calling './FahCore_a5.exe -dir work/ -nice 19 -suffix 08 -np 24 -checkpoint 3 -verbose -lifeline 1830 -version 634'

[19:58:23]   (Read 91631428 bytes from disk)
[19:58:23] Connecting to http://128.143.231.201:8080/
[19:58:23] 
[19:58:23] *------------------------------*
[19:58:23] Folding@Home Gromacs SMP Core
[19:58:23] Version 2.27 (Thu Feb 10 09:46:40 PST 2011)
[19:58:23] 
[19:58:23] Preparing to commence simulation
[19:58:23] - Ensuring status. Please wait.
[19:58:33] - Looking at optimizations...
[19:58:33] - Working with standard loops on this execution.
[19:58:33] - Previous termination of core was improper.
[19:58:33] - Files status OK
[19:58:35] - Expanded 24863922 -> 30796292 (decompressed 123.8 percent)
[19:58:35] Called DecompressByteArray: compressed_data_size=24863922 data_size=30796292, decompressed_data_size=30796292 diff=0
[19:58:35] - Digital signature verified
[19:58:35] 
[19:58:35] Project: 6901 (Run 14, Clone 0, Gen 317)
[19:58:35] 
[19:58:35] Entering M.D.
[19:58:42] Mapping NT from 24 to 24 
[19:58:44] Completed 0 out of 250000 steps  (0%)
[20:00:52] Posted data.
[20:00:52] Initial: 0000; - Uploaded at ~600 kB/s
[20:00:52] - Averaged speed for that direction ~564 kB/s

[20:00:52] - Server reports problem with unit.

[20:00:52] + Sent 0 of 1 completed units to the server
[20:00:52] - Autosend completed
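For anyone who wants to scan their own client logs for the same failure, here is a minimal Python sketch. It is not part of the FAH client; the helper name `rejected_units` and the regular expression are assumptions based only on the log format pasted above. Because the upload and the newly started core write to the same log, the last "Project:" line before the error can belong to the new WU, so the sketch remembers the project printed right after "Trying to send all finished work units" instead. It also assumes one unit per send attempt, as in this log.

```python
import re

# Matches lines like: "Project: 8101 (Run 3, Clone 12, Gen 19)"
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def rejected_units(log_text):
    """Return (project, run, clone, gen) tuples for uploads the server rejected."""
    rejected = []
    pending = None        # unit the client is currently trying to upload
    expect_unit = False   # True right after the "Trying to send" line
    for line in log_text.splitlines():
        if "Trying to send all finished work units" in line:
            expect_unit = True
            continue
        match = PROJECT_RE.search(line)
        if match and expect_unit:
            # First Project line after the send banner is the unit being sent.
            pending = tuple(int(g) for g in match.groups())
            expect_unit = False
        elif "Server reports problem with unit." in line and pending is not None:
            rejected.append(pending)
            pending = None
    return rejected
```

Run against the log above, this should attribute the rejection to Project 8101 (Run 3, Clone 12, Gen 19) rather than to the 6901 unit that had just started folding.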

Patonb · Sep 30, 2012

I uploaded about 1 1/2 hrs ago

tear · Sep 30, 2012

These may be stale units -- if you check slot numbers they all should carry the same slot number.
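A rough way to check this from a log is sketched below in Python. It assumes the number in the `wuresults_NN.dat` file name (as in the "Reading file work/wuresults_04.dat" line of the log above) identifies the queue slot being uploaded; that is an assumption taken from the pasted log, not documented client behaviour, and `upload_slots` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the results file named in lines like:
# "- Reading file work/wuresults_04.dat from core"
RESULTS_RE = re.compile(r"Reading file work/wuresults_(\d+)\.dat")


def upload_slots(log_text):
    """Count how many times each slot's results file was read for upload."""
    return Counter(int(m.group(1)) for m in RESULTS_RE.finditer(log_text))
```

If the same slot number shows up on every failed autosend, the client is most likely retrying one stale unit rather than losing a fresh unit each time.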

sbinh · Sep 30, 2012

I have had at least 3 bigadv units stuck since yesterday's 6 PM (EST) update.

this is sketchy to me ...

kasson said:
» Sun Sep 30, 2012 12:29 pm
Yes--I see something weird going on. Nothing has changed with the work server, but I think some of the people at Stanford may have changed the assignment server without telling me. I'm investigating.

tear · Sep 30, 2012

They won't be returned. If you look at queue entries, they have been marked as "finished" (uploaded).

I tried re-marking them for upload, but the server seems to have lost context of all outstanding WUs and is not accepting any of the 'old' units.

Just leave the clients running, they will eventually recover.

sbinh · Sep 30, 2012

Agree. It happened in the past and no matter what I tried, I couldn't send WUs manually.
Just leave the client running and hope SF fixes the issue.

tear · Sep 30, 2012

Aye, that is sketchy, sbinh... at best

But I gave up hoping for full disclosures... just glad it's fixed.

402blownstroker · Sep 30, 2012

tear said:
just glad it's fixed.

A turd wrapped in gold is still a turd.

tear · Sep 30, 2012

It is. But that's a topic for a whole 'nother discussion

Core32 · Sep 30, 2012

This is crap. I'm burning hundreds of watts for nothing.
This is not even getting the science done at this point, even if it's ok to screw the points.
At my electric bill rate I will have to consider shutting these rigs down until someone at Stanford shows a bit of concern.

tear · Sep 30, 2012

Stuff's now fixed.
At my end -- ~1 day of lost work.

Rants in the FF won't help much as customer support is just absent there; gotta go straight to the boss man.

BTW, wanna vent? Hop on IRC!

EDIT: here's my take on this -- http://foldingforum.org/viewtopic.php?f=18&t=22566&p=225295#p225295

Nathan_P · Oct 1, 2012

tear said:
Were the symptoms exactly the same last weekend? (Server reports problem with unit.)
Do you have a log?

Exactly the same. I'll dig the log out when I get back from work.

Tobit · Oct 2, 2012

From Dr. Kasson this morning...

"We have identified and fixed a WS-CS communication issue. This problem should be taken care of going forward; we are continuing to review the logs to analyze the impact of the problem on rejected work units."

Deleted member 12106 · Oct 2, 2012

Tobit said:
From Dr. Kasson this morning...

"We have identified and fixed a WS-CS communication issue. This problem should be taken care of going forward; we are continuing to review the logs to analyze the impact of the problem on rejected work units."

Translated to "hey we fucked up, but we think it *might* be fixed, it is now coffee break"

Speaking of which, I need more coffee...

freeloader1969 · Oct 2, 2012

I've had two 8101s go bad for half a million points. I'll let this one finish and if it fails, I'll be shutting down my folding rigs until Stanford fixes their "problem". My latest one just failed this morning.

Server reports problem with unit.
