
Box unstable only with 600 pointers

Hito Bahadur

[H]ard|DCer of the Month - December 2006
Joined: Jul 8, 2004 · Messages: 3,169
Gents,

I seem to have one box that has trouble with 600 pointers. It's an XP 2400+ (Thoroughbred) with 1 GB of DDR400 Crucial memory. It runs 300+ point Gromacs units well and has done well with 600 pointers in the past, but is now unstable with them. Any ideas? I've played with the memory and CPU clock speeds with no success, and otherwise it is stable.

OS is Windows XP.


 
Sorry about that. It terminates the current project and sends the results back. See the log excerpt below:
[22:10:34] Completed 162500 out of 250000 steps (65)
[22:54:55] Writing local files
[22:54:55] Completed 165000 out of 250000 steps (66)
[23:39:18] Writing local files
[23:39:18] Completed 167500 out of 250000 steps (67)
[00:23:40] Writing local files
[00:23:40] Completed 170000 out of 250000 steps (68)
[01:08:01] Writing local files
[01:08:02] Completed 172500 out of 250000 steps (69)
[01:52:24] Writing local files
[01:52:24] Completed 175000 out of 250000 steps (70)
[02:36:46] Writing local files
[02:36:46] Completed 177500 out of 250000 steps (71)
[02:58:26] Quit 101 - Fatal error: NaN detected: (ener[13])
[02:58:26]
[02:58:26] Simulation instability has been encountered. The run has entered a
[02:58:26] state from which no further progress can be made.
[02:58:26] This may be the correct result of the simulation, however if you
[02:58:26] often see other project units terminating early like this
[02:58:26] too, you may wish to check the stability of your computer (issues
[02:58:26] such as high temperature, overclocking, etc.).
[02:58:26] Going to send back what have done.
[02:58:26] logfile size: 17136
[02:58:26] - Writing 17699 bytes of core data to disk...
[02:58:26] ... Done.
[02:58:26]
[02:58:26] Folding@home Core Shutdown: EARLY_UNIT_END
[02:58:29] CoreStatus = 72 (114)
[02:58:29] Sending work to server
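To see how far a unit got before it died and how long each frame took, a short script can pull the progress lines and the fatal error out of a client log like the one above. This is a minimal sketch in Python; the regular expressions assume the exact line format shown in this excerpt, and `summarize` is a hypothetical helper, not part of the Folding@home client.

```python
import re
from datetime import datetime, timedelta

# Progress lines look like: "[02:36:46] Completed 177500 out of 250000 steps (71)"
PROGRESS = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps")
# The fatal line looks like: "[02:58:26] Quit 101 - Fatal error: NaN detected: (ener[13])"
FATAL = re.compile(r"Fatal error: (.+)")


def summarize(log_text):
    """Return (last_step, total_steps, percent_done, avg_frame_time, fatal_error).

    Hypothetical helper; assumes the log line format shown in the excerpt above.
    """
    stamps, last_step, total, error = [], None, None, None
    for line in log_text.splitlines():
        progress = PROGRESS.search(line)
        if progress:
            stamps.append(datetime.strptime(progress.group(1), "%H:%M:%S"))
            last_step, total = int(progress.group(2)), int(progress.group(3))
            continue
        fatal = FATAL.search(line)
        if fatal:
            error = fatal.group(1).strip()

    # Time between consecutive progress lines. The log has no dates, so add a
    # day whenever the clock wraps past midnight (e.g. 23:39:18 -> 00:23:40).
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        gap = later - earlier
        if gap < timedelta(0):
            gap += timedelta(days=1)
        gaps.append(gap)

    avg = sum(gaps, timedelta(0)) / len(gaps) if gaps else None
    percent = 100 * last_step / total if total else None
    return last_step, total, percent, avg, error
```

Fed the excerpt above, it reports step 177500 of 250000 (71%), an average of 44 minutes 22 seconds per 2,500-step frame, and the error `NaN detected: (ener[13])`. The midnight correction only handles one rollover between consecutive progress lines, which is enough when frames take well under a day.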
 
Kind of weird if it happens consistently with 600 pointers but never with other big-packet units, but maybe.
 
If it's a one-off, then you just got a bad work unit.
No hassle; I've had one too.
Mine was Project: 1135 (Run 17, Clone 4, Gen 7)

If it's more than one, then maybe run a quick loop of Memtest86 to check for memory errors.
Then run Prime95/StressCPU to check that the CPU is in order.

Luck......... :D
 
Do what Tigerbiten said: Memtest and Prime95.

Also check whether you have the -advmethods flag on; that flag will sometimes give you WUs that are less stable. You could, erm... send the system to me and I could work on it for you.

JasonLee
 
JasonLee said:
Do what Tigerbiten said: Memtest and Prime95.

Also check whether you have the -advmethods flag on; that flag will sometimes give you WUs that are less stable. You could, erm... send the system to me and I could work on it for you.

JasonLee

Appreciate the willingness to help. :p I just did a bunch of security updates also. I'm going to let it run for a bit and see if everything is stable again.

How's it looking for points tomorrow for you? I should have a decent day.
 
Hito Bahadur said:
Sorry about that. It terminates the current project and sends the results back.

I have had the exact same error with a different p1475 protein. Since that protein is in beta, the error was mainly due to the simulation itself: the folding simulation reached a structure that was unstable. Check your point statistics to see whether you were credited with roughly 71% of 600 points. If so, something is wrong with the protein. If you got only a small amount of credit, it is due to boxen instability. Hope it helps.
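The credit check described above is simple arithmetic: the unit in this thread had completed 177,500 of 250,000 steps (71%) when it crashed, and 71% of 600 points is 426 points. The sketch below encodes that rule of thumb in Python. It assumes, as the post does, that an early-ended unit is credited in proportion to the steps completed; the 10% `tolerance` is a hypothetical cutoff for "roughly", not a Folding@home rule, and both function names are made up for illustration.

```python
def expected_partial_credit(completed_steps, total_steps, full_credit):
    """Pro-rated credit, ASSUMING early-ended units are paid by fraction of steps done."""
    return full_credit * completed_steps / total_steps


def diagnose(credited, completed_steps, total_steps, full_credit, tolerance=0.1):
    """Rule of thumb from the post: credit near the pro-rated amount points at the
    protein; only a small amount of credit points at box instability.

    tolerance is a hypothetical cutoff for "roughly"; it is not a Folding@home rule.
    """
    expected = expected_partial_credit(completed_steps, total_steps, full_credit)
    if credited >= expected * (1 - tolerance):
        return "protein problem"
    return "box instability"


# Numbers from the log in this thread: 177,500 of 250,000 steps on a 600-point unit.
print(expected_partial_credit(177500, 250000, 600))  # 426.0
```

With those numbers, `diagnose(426, 177500, 250000, 600)` returns `"protein problem"`, while a credit of 50 points returns `"box instability"`.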

 
Hito Bahadur said:
Gents,

I seem to have one box that has trouble with 600 pointers. It's an XP 2400+ (Thoroughbred) with 1 GB of DDR400 Crucial memory. It runs 300+ point Gromacs units well and has done well with 600 pointers in the past, but is now unstable with them. Any ideas? I've played with the memory and CPU clock speeds with no success, and otherwise it is stable.

OS is Windows XP.



This has happened to me as well. I started a thread for posting the unit, run, clone, and gen of these units so we can tell whether there is a pattern. This is the thread.
Hope this helps!
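If failing units are reported in the same form used earlier in this thread, e.g. `Project: 1135 (Run 17, Clone 4, Gen 7)`, tallying them by project is one quick way to spot a pattern. A minimal Python sketch, assuming reports are pasted as plain text in that format; `parse_wu` and `failures_by_project` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches identifiers written like "Project: 1135 (Run 17, Clone 4, Gen 7)".
WU_ID = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def parse_wu(text):
    """Return (project, run, clone, gen) as ints, or None if no identifier is found."""
    match = WU_ID.search(text)
    return tuple(int(group) for group in match.groups()) if match else None


def failures_by_project(reports):
    """Count reported failures per project number, skipping reports with no identifier."""
    counts = Counter()
    for report in reports:
        wu = parse_wu(report)
        if wu is not None:
            counts[wu[0]] += 1
    return counts
```

For the unit Tigerbiten posted, `parse_wu` returns `(1135, 17, 4, 7)`. Keeping the full tuple, not just the project number, leaves room to check later whether failures cluster on particular runs or gens.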

 
One weird thing I've found: on the exact same type of boxen, the 600 pointers warm them up good-o-plenty (warm to the touch, metal case), and the little boys don't seem to warm them up at all.

Heat issues? Dusty heatsink, etc.

Have you tried kicking the boxen, always works for me!
 
marty9876 said:
One weird thing I've found: on the exact same type of boxen, the 600 pointers warm them up good-o-plenty (warm to the touch, metal case), and the little boys don't seem to warm them up at all.

Heat issues? Dusty heatsink, etc.

Have you tried kicking the boxen, always works for me!
Hey Marty, I do that too. I also use my favorite tool.....bigger hammer! :D

They do get them warm, though. I have a large CD wallet that sits on top of my case, and I use it as an indicator of how much heat is being held in the case. Lift the wallet (insulator) up and feel the top of the case, or the underside of the wallet for that matter! If you are using air to cool your boxen, you need to move a lot of air through there.

 
Mayhem33 said:
Hey Marty, I do that too. I also use my favorite tool.....bigger hammer! :D

They do get them warm, though. I have a large CD wallet that sits on top of my case, and I use it as an indicator of how much heat is being held in the case. Lift the wallet (insulator) up and feel the top of the case, or the underside of the wallet for that matter! If you are using air to cool your boxen, you need to move a lot of air through there.


lol, I have 6 noisy fans in the case next to me and no problems with the 600 pointers so far. I've gotten so used to the noise that I sleep through almost anything. :eek:
 