Man.....this sucks...

Crosshairs

4 of my quads all got EUEs at about 80% today... WTF???

I just want to shut them down and walk away right now.....
 
What project numbers? Some of them have heavier memory demands which, depending on your setup, may cause this to happen.
 
2653....all of them.....but these machines have been running for months without issue...all of a sudden they shit the bed today.......

It's not network related, as they are in 2 different states.....
I just find it strange they all failed at about exactly the same spot....
 
Might I suggest deleting the work directory, the log files, and queue.dat. What I have seen a lot is that the work folder becomes too large and full of junk leftovers. If you like, you could even delete the core; when you start back up, give the program a few minutes and it should come back to normal.

I haven’t had an early end since I started doing this on a regular basis.

Luck man, get back in the fold (bad pun, I know);)
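
For anyone who would rather not delete things outright, here is a minimal Python sketch of the same reset that moves the files into a backup folder instead. It is only an illustration: the names `work`, `queue.dat`, and `FAHlog*.txt` are assumed to be what the client folder contains, `reset_client_state` and the `backup-...` folder name are made up for this example, and the client should be stopped before running anything like it. Note that it sets aside the unit in progress along with everything else.

```python
import shutil
import time
from pathlib import Path


def reset_client_state(client_dir, backup_root=None):
    """Move the work folder, queue.dat and log files into a backup folder.

    Assumed layout (not verified against any particular client version):
    a "work" folder, "queue.dat", and logs matching "FAHlog*.txt".
    Returns the names of the items that were moved.
    """
    client_dir = Path(client_dir)
    if backup_root is None:
        # Hypothetical naming scheme: backup-YYYYmmdd-HHMMSS inside the client folder.
        backup_root = client_dir / ("backup-" + time.strftime("%Y%m%d-%H%M%S"))
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    targets = [client_dir / "work", client_dir / "queue.dat"]
    targets += sorted(client_dir.glob("FAHlog*.txt"))

    moved = []
    for path in targets:
        if path.exists():
            # Move rather than delete, so the files can be restored if needed.
            shutil.move(str(path), str(backup_root / path.name))
            moved.append(path.name)
    return moved
```

If the client comes back up and runs normally, the backup folder can be deleted by hand afterwards.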
 
Done... let's see what happens. Thanks for the info. :p

I'll do the other boxes on Monday when I get back there.
 
This happened to me too. Thanks for the tip Bill. I thought it might have something to do with my newly upped overclock. We will see if it happens again.

 
Yes, thanks Bill for the tip. I knew my OCs were too mild for all the EUEs I have been getting lately. My production has been suffering because of all the issues. Even the ordinarily stalwart 2653s have been very problematic.

 
This reminds me that I am going to need to figure out how to clean those directories up myself. I have not touched one of them on any of my 9 Windows boxes since I started... so are there any tools to assist with this in Windows?

 
Open your FAHlog.txt file.
You're looking for a line at the very start of the protein you're crunching.
You're after the Unit number.
One of mine looks like ........ [13:09:39] Working on Unit 03 [May 16 13:09:39] <- It tells you the files you want to save.

Now open your work folder.
Delete any .0* files that don't match your Unit number.
In my case I'd delete any .00, .01, .02, .04, .05, .06, .07, .08, .09 files.
I'd save the .03 files as I'm working with the Unit 03 files.

Not sure if you can automate this.
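
It probably can be automated. Below is a rough Python sketch of the steps above, not a tested tool: it reads the last "Working on Unit NN" line from the log text and lists the files in the work folder whose extension is .00 to .09 but does not match that unit. The function names `current_unit` and `stale_work_files` are made up for this example, and the assumption that work files end in a two-digit extension comes straight from the description above rather than from any client documentation.

```python
import re
from pathlib import Path

# Matches log lines such as "[13:09:39] Working on Unit 03 [May 16 13:09:39]".
UNIT_LINE = re.compile(r"Working on Unit (\d{2})")


def current_unit(log_text):
    """Return the unit number from the last 'Working on Unit NN' line, or None."""
    matches = UNIT_LINE.findall(log_text)
    return matches[-1] if matches else None


def stale_work_files(work_dir, unit):
    """List files whose extension is .00-.09 but is not the current unit's.

    Assumption: work files carry the unit number as a two-digit extension,
    as described in the post. Nothing is deleted here.
    """
    if unit is None:
        raise ValueError("no unit number found; refusing to guess")
    stale = []
    for path in sorted(Path(work_dir).iterdir()):
        ext = path.suffix  # e.g. ".03"
        if path.is_file() and re.fullmatch(r"\.0\d", ext) and ext != "." + unit:
            stale.append(path)
    return stale
```

To use it, read FAHlog.txt into a string, pass it to `current_unit`, and print what `stale_work_files` returns before deleting anything. Only once the list looks right would you call `unlink()` on each path, ideally with the client stopped and a backup made.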

Luck .............. :D
 
Another way is to run fah6 -queueinfo and see which queue entry it is currently crunching. Then do what Tigerbiten said and delete the rest (make a backup first in case you slip and delete a critical file by accident; if it runs fine afterwards, flush the backup).

 
I think I did it wrong because now FahMon says *Hung*. I am still getting about 12 seconds/frame but FahMon isn't recognizing it right. I am going out of town in an hour or so. Hopefully it sorts itself out while I am gone.

 
I've had clients reporting 'hung' for weeks and they haven't sorted themselves out. That's one of the reasons why I use FahSpy instead. The other reasons include a superior layout, better customization, and more data displayed.
 
Yeah, I've had the "hung" thing too, but I think it just means the client has stopped (d'oh :rolleyes:) and you just need to restart it and everything goes back to normal (whatever that is) :D

 
The hung issue is easy to overcome.

Set the clock on the computer running FahMon 5 minutes ahead of all your other computers. It is a time thing: if it checks a computer whose clock is 1 minute ahead of its own, it thinks that client is 24 hours behind! :)
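
To see how a 1 minute difference could turn into 24 hours, here is a small Python sketch of the arithmetic. It is an assumption about the general mechanism and not FahMon's actual code: it supposes the monitor reads a time-only stamp like the `[13:09:39]` in the log and assumes the stamp lies in the past. The function name `naive_age` is made up for this example.

```python
from datetime import datetime, timedelta


def naive_age(monitor_now, log_hms):
    """Age of a date-less HH:MM:SS stamp, assuming it is from the last 24 hours.

    Illustration only: if the stamp appears to be in the future relative to
    the monitor's clock, it is pushed back to the previous day.
    """
    stamp = datetime.strptime(log_hms, "%H:%M:%S").time()
    logged = datetime.combine(monitor_now.date(), stamp)
    if logged > monitor_now:
        # A stamp "from the future" is read as yesterday's.
        logged -= timedelta(days=1)
    return monitor_now - logged


now = datetime(2008, 5, 16, 13, 9, 0)
print(naive_age(now, "13:04:00"))  # client clock 5 minutes behind: 0:05:00
print(naive_age(now, "13:10:00"))  # client clock 1 minute ahead: 23:59:00
```

With the monitor's clock set a few minutes ahead of every client, the stamp never looks like it is in the future, so the wrap to the previous day never triggers.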

 
Thanks for the tip, I'm going to do this now.


 
Personally, I ignore the "hung" warnings because I can see the timing logs (usually, this happens when a client updated with its time 1-2 minutes ahead of the computer running FahMon). If it is truly hung, I can tell after 30 minutes.

 