
HELP: Early Unit Ends

APOLLO
[H]ard|DCer of the Month - March 2009
Joined: Sep 17, 2000
Messages: 9,089
I'm getting a lot of early unit ends on one of my machines, a dual Barton @ 2200. I wasn't getting any WU interruptions until this month. The only thing I've changed recently is swapping in an ATI AIW 9600 video card. Besides the usual suspects of OC and temperature, what other possible causes are there?

The system is one of the most stable I've ever had (credit to Tyan) and shows no other signs of instability. The processors are only slightly warm to the touch, so I'm inclined to conclude that it's not a hardware issue. Also, one of the two instances is far more prone to these interruptions than the other.

Any ideas??
 
I've had that happen. It seems CPUs can change a bit as they age, and face it, we abuse the hell out of them. My cure was a slight voltage increase; I'm not sure whether your Tyan supports that. Once I bumped the core voltage, no problems.

Give it a shot


BillR
 
I've had some early unit ends on a Duron and just had one (at frame 5) on a P4 2.4. All were running with the "-advmethods" flag. I take it as part of the price to pay for the faster Gromacs.
 
I noticed this with a unit yesterday on my Folding TV, which isn't overclocked and has run at stock since day one. It's a very stable system and never has boxen issues. The three units since that one have processed fine. The early termination ended at frame 19 and the unit type was a p917_v2180pf909
 
Originally posted by CIWS:

The early termination ended at frame 19 and the unit type was a p917_v2180pf909
The p9xx series is giving me the biggest problems, but I'm also having some problems with dimers. I'm going to try the voltage increase BillR suggested and see if that helps. Since the Tyan S2462 is not voltage-adjustable, that will mean a pin mod...something I've never tried before.
 
Today, my Barton 2500 @ 3200 crapped out on a Protein: p917_v2180pf909 at frame 58. I think this protein is messed up.
 
Originally posted by Retro Rex:

Today, my Barton 2500 @ 3200 crapped out on a Protein: p917_v2180pf909 at frame 58. I think this protein is messed up.
The p917s never complete on my dual Barton.
 
APOLLO, the voltage mod isn't that bad; I did mine from the bottom of the CPU. Cut all the uncut traces. I was able to do this with an X-Acto knife, and you don't need to cut deep at all. Connect the pins for your mod with the silver paint, using a toothpick to put a small drop between the pins you need to mod. If you get contact with other pins, don't sweat it: as the paint dries, use a clean toothpick to clean off the paint where you don't want it. If you totally screw the pooch, again, after it starts drying, just scrape away what you don't want and try again.

Beats the wire mod hands down IMHO;)

BillR
 
Originally posted by BillR:

APOLLO, the voltage mod isn't that bad; I did mine from the bottom of the CPU. Cut all the uncut traces. I was able to do this with an X-Acto knife, and you don't need to cut deep at all. Connect the pins for your mod with the silver paint, using a toothpick to put a small drop between the pins you need to mod. If you get contact with other pins, don't sweat it: as the paint dries, use a clean toothpick to clean off the paint where you don't want it. If you totally screw the pooch, again, after it starts drying, just scrape away what you don't want and try again.
Thanks for the information. Is it necessary to cut bridges to perform the voltage mod? I recently fubared a Barton trying to unlock the multipliers, so I'm a little reluctant to take the X-Acto knife to another one.
 
Damn... A p917_v2180pf909 just EARLY_UNIT_END-ed on frame 63. It seems there is some kind of problem with these... Or is it my OC? SSE?

1800+ @ 2300, forcesse, advmethods...

Maybe someone should post about this on folding-community.org...
 
I've had no issues with the P917 protein. My current Barton 2500+ running at 1994 MHz (slight OC) has completed 4 and is working on #5 right now. Anyone who is overclocking and getting early work unit ends needs to first throttle back, test, and keep moving settings closer to the defaults until the system is stable. Stanford will pretty much ignore your bad results if you OC your CPU, and rightly so. They expect 'some' early work unit ends because a starting structure can contain physically impossible atom positions (atoms trying to occupy the same spot in space), but these are rarer now because they have gotten better at setting up the starting configurations. That doesn't mean you'll never see an early work unit end at stock settings. Check your other settings too. Aggressive memory timings? Back them off a bit; that can also help. Got your video card OC'd? That might cause it too.
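To make the "impossible atom positions" point concrete: if two atoms start out nearly on top of each other, the repulsive forces between them are enormous and the simulation can blow up. A quick clash check on a starting structure shows the idea. This is only an illustrative sketch, not how Stanford or Gromacs actually validate units; the function name, the toy coordinates, and the single 0.05 nm cutoff are all made up (real force fields use per-atom-type radii).

```python
import itertools
import math

# Hypothetical minimum allowed separation between two atoms, in nanometres.
# A single cutoff for every atom pair is a simplifying assumption.
MIN_SEPARATION_NM = 0.05

def overlapping_pairs(coords, min_sep=MIN_SEPARATION_NM):
    """Return index pairs of atoms closer than min_sep (brute force, O(n^2))."""
    pairs = []
    for (i, a), (j, b) in itertools.combinations(enumerate(coords), 2):
        if math.dist(a, b) < min_sep:
            pairs.append((i, j))
    return pairs

# Toy starting structure: atoms 0 and 2 sit almost on top of each other.
start = [
    (0.00, 0.00, 0.00),
    (0.15, 0.00, 0.00),
    (0.01, 0.00, 0.00),
]
print(overlapping_pairs(start))  # [(0, 2)]
```

A unit whose starting structure had a pair like (0, 2) would be a candidate for an early end no matter how stable the machine running it is.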

As pointed out above, over time CPUs might become a little unstable from the wear and tear of running them flat out 24/7. Neither the CPUs nor the stock cooling systems were designed with 100% load around the clock in mind. Boosting voltage also (usually) increases temperature. I pull all my HSFs off the CPUs at least every 3 months, clean them, apply new HS compound (Arctic Silver in my case), and slap them back on. That keeps things from running hot because of dust buildup, and because the HS compound deteriorates faster under the continuous high heat from running this client. ;)
 
I had an "early unit end" today with the 917 on a stock P4 2.8.:(

 
Originally posted by Celerator:
I had an "early unit end" today with the 917 on a stock P4 2.8.:(
This is most likely an unstable WU. I'd like to see them add a short message about what exactly happened.
 
Yes. I'm working on two of those 917s and will track completion (using EMIII of course!):D
 
Yes APOLLO, you do have to cut all the bridges; otherwise you end up with really strange results.

The last Barton I did had the soft substrate and was pretty easy to cut. If you have the harder stuff, it takes a bit more work. Just keep in mind it's only one straight cut, and you don't have to worry about reconnecting anything on top of the chip. All your paint work is down under, on the pins. Check your work with a low-voltage continuity tester to know whether you went deep enough.

Have fun:D

BillR

http://www.ocinside.de/go_e.html?/html/workshop/socketa/

All the good info is under workshop
 
I just had one on a stock 1.5 Centrino... it ended on frame 48 and posted no error in the log, it just said:

[05:01:11] Writing local files
[05:01:13] Completed 115000 out of 250000 steps (46)
[05:07:14] Writing local files
[05:07:16] Completed 117500 out of 250000 steps (47)
[05:13:17] Writing local files
[05:13:19] Completed 120000 out of 250000 steps (48)
[05:14:02] Gromacs cannot continue further.
[05:14:02] Going to send back what have done.
[05:14:02] logfile size: 13400
[05:14:02] - Writing 13936 bytes of core data to disk...
[05:14:02] ... Done.
[05:14:02]
[05:14:02] Folding@home Core Shutdown: EARLY_UNIT_END
[05:14:06] CoreStatus = 72 (114)
[05:14:06] Sending work to server


I think it is a bad WU...they should take this off of the server.....

From now on I will end any 917s I get so I don't waste my time....
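If you want to see how far a unit got before it bailed out, and whether it was an EARLY_UNIT_END rather than some other failure, the kind of log posted above can be scanned with a few lines of Python. This is a sketch under assumptions: the pattern is written against that one excerpt only, other client or core versions may format the lines differently, and `summarize` is just an illustrative name. Running it on each instance's log is one way to compare how often each instance ends early.

```python
import re

# Pattern assumed from the single log excerpt above; other client/core
# versions may word or format these lines differently. The "%" is optional
# because it is missing from the excerpt as posted.
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps\s*\((\d+)%?\)")

def summarize(log_text):
    """Return (last_step, total_steps, last_frame, ended_early) for one log."""
    last = None
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            last = tuple(int(g) for g in match.groups())
    ended_early = "EARLY_UNIT_END" in log_text
    if last is None:
        return (None, None, None, ended_early)
    step, total, frame = last
    return (step, total, frame, ended_early)

sample = """\
[05:13:17] Writing local files
[05:13:19] Completed 120000 out of 250000 steps (48)
[05:14:02] Gromacs cannot continue further.
[05:14:02] Folding@home Core Shutdown: EARLY_UNIT_END
"""
print(summarize(sample))  # (120000, 250000, 48, True)
```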
 
Just because one work unit ends early doesn't mean all the work on it is bad. Each work unit has a run, a clone, and a generation number. They are pushing some models to the edge and expect some to fail, and that information is just as important to the project as the units that complete fully.

When you delete work units, you delay the project by creating a pothole that someone else has to fill in at a later date. Since the server that gave you the work unit has no idea how long you'll take to return it, it has to wait a certain amount of time before it assigns that work unit to someone else. The delay is like a car slowing down in 5 o'clock traffic 5 miles in front of you: by the time the chain reaction reaches you, traffic has stopped, and there wasn't even an accident, just someone hitting the brakes. Deleting work units hurts the progress of the project.

And something you might not know, being new to the project: you get partial credit for partially completing a work unit. They know you spent time on it, and you get some credit... Stanford does recognize that your donation of CPU time has value. ;)

BTW, 3 of my systems finished 3 of these last night and are currently working on 3 more which should finish in about 3 hours.
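The pothole argument can be put into rough numbers. Below is a toy model of one run/clone chain where each generation can only start once the previous one comes back. The assumptions are mine, not Stanford's: the next generation is handed out the moment the previous one returns, a deleted unit is reassigned only after a fixed deadline counted from when it was handed out, and the 1-day run time and 4-day deadline are invented for illustration.

```python
# Toy model; run time, deadline, and scheduling rules are all hypothetical.
def gen_finish_times(run_time_days, deadline_days, deleted):
    """Finish time of each generation in one run/clone chain.

    deleted[k] is True if the first donor who got generation k deleted it;
    the server only reassigns it after deadline_days, then a second donor
    runs it to completion.
    """
    t = 0.0
    times = []
    for was_deleted in deleted:
        if was_deleted:
            t += deadline_days  # server waits out the deadline before reassigning
        t += run_time_days
        times.append(t)
    return times

clean = gen_finish_times(1.0, 4.0, [False, False, False])
pothole = gen_finish_times(1.0, 4.0, [False, True, False])
print(clean)    # [1.0, 2.0, 3.0]
print(pothole)  # [1.0, 6.0, 7.0]
```

One deletion in the middle pushes every later generation back by the full deadline, which is the traffic-jam effect described above.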
 
I hear ya... After looking through my stats, I noticed I have completed 9 of these to date, the most I have done of any one WU. I notice the gen is often different, so I will just let it run.


You are right about the points... it gave me 1/2 the points since I was 48% done.


I won't stop any early.
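The numbers here (48% done, about half the points) are consistent with credit that scales with how far the unit got. A minimal sketch of that arithmetic, assuming a simple linear rule: Stanford's actual partial-credit formula isn't given in this thread, and both `partial_credit` and the 100-point full value are hypothetical. The step counts come from the log posted earlier in the thread.

```python
def partial_credit(full_points, steps_done, total_steps):
    """Credit scaled by the fraction of steps completed (an assumed linear model)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so a miscounted log can't award less than zero or more than a full unit.
    fraction = min(max(steps_done, 0), total_steps) / total_steps
    return full_points * fraction

# full_points=100 is a made-up value; 120000 of 250000 steps is from the log above.
print(partial_credit(100, 120000, 250000))
```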
 