GPU problem

dwdawg

Galaxy GTX 460

Was working fine. Now I get this in the client logs:
[03:12:12] mdrun_gpu returned 52
[03:12:12] NANs detected on GPU

It then shuts down the client for 24 hours.

This card is only a month old. Anything I can do to fix it?

Seems to run fine otherwise....


Edit: ok, more info
[03:16:41] Starting GUI Server
[03:16:41] Setting checkpoint frequency: 500000
[03:16:41] Setting checkpoint frequency: 500000
[03:18:38] Completed 500000 out of 50000000 steps (1%).
[03:18:38] mdrun_gpu returned 52
[03:18:38] NANs detected on GPU
[03:18:38]
[03:18:38] Folding@home Core Shutdown: UNSTABLE_MACHINE
[03:18:40] CoreStatus = 7A (122)
[03:18:40] Sending work to server
[03:18:40] Project: 6806 (Run 2142, Clone 1, Gen 30)
[03:18:40] - Read packet limit of 540015616... Set to 524286976.
[03:18:40] - Error: Could not get length of results file work/wuresults_05.dat
[03:18:40] - Error: Could not read unit 05 file. Removing from queue.
[03:18:40] EUE limit exceeded. Pausing 24 hours.

I've also deleted the contents of the work folder and the logs and core from the client folder and tried a fresh start.
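
A quick way to pull the failure lines out of a long client log is sketched below. This is only a minimal Python sketch, assuming every log line has the "[HH:MM:SS] message" shape shown above. The function name find_failures and the marker list are illustrative and not part of the Folding@home client.

```python
import re

# Line shape assumed from the log excerpt above: "[HH:MM:SS] message".
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s*(.*)$")

# Substrings copied from the log above; this list is not exhaustive.
FAILURE_MARKERS = (
    "NANs detected",
    "UNSTABLE_MACHINE",
    "EUE limit exceeded",
)


def find_failures(lines):
    """Return (timestamp, message) pairs for lines containing a failure marker."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped log line
        timestamp, message = match.groups()
        if any(marker in message for marker in FAILURE_MARKERS):
            hits.append((timestamp, message))
    return hits


# Lines copied from the log in this post.
SAMPLE_LOG = [
    "[03:18:38] Completed 500000 out of 50000000 steps (1%).",
    "[03:18:38] mdrun_gpu returned 52",
    "[03:18:38] NANs detected on GPU",
    "[03:18:38]",
    "[03:18:38] Folding@home Core Shutdown: UNSTABLE_MACHINE",
    "[03:18:40] EUE limit exceeded. Pausing 24 hours.",
]

for timestamp, message in find_failures(SAMPLE_LOG):
    print(timestamp, message)
```

To run it against a real log, read the file into a list of lines first (for example with open(path).readlines(), where path points at the client's log file) and pass that list to find_failures.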
 
You could try the new GPU3 client and see if that helps. Typically, though, a "NANs detected" error means the card is no longer stable. If you have it overclocked, you're probably going to want to go back to stock and see if the errors continue.
 
Damn, it's never been overclocked. I'll try the new client just for grins.
Thanks.
 
If it's never been overclocked and updating the client doesn't help, then I'd run FurMark for about 20-30 minutes and see what happens. Might be time for el RMA'o.
 
Alright, so we have pretty well established that the card might be unstable. Things to look at: What are the temps of the card? Have you cleaned it lately? Is the fan spinning up OK? Were you on the edge of the temp envelope for that card? That type of thing.
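
If you would rather log temps and fan speed from a script than watch a monitoring tool, something like the following might do. It is a rough Python sketch that assumes an NVIDIA driver whose nvidia-smi supports the --query-gpu option and reports these fields for this card, which may not be true for every driver or GeForce model. The helper names parse_status and read_gpu_status are made up for illustration.

```python
import subprocess


def parse_status(csv_line):
    """Parse one 'temperature, fan' CSV line into a tuple of ints.

    Fields the driver does not report numerically (e.g. "[N/A]") become None.
    """
    fields = [field.strip() for field in csv_line.split(",")]
    return tuple(int(field) if field.isdigit() else None for field in fields)


def read_gpu_status():
    """Ask nvidia-smi for temperature (deg C) and fan speed (%) of each GPU.

    Assumes nvidia-smi is on the PATH and supports --query-gpu.
    """
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=temperature.gpu,fan.speed",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_status(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling read_gpu_status() every few seconds while the client is folding would show whether the temperature climbs right before the NANs error appears.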

Oh and as far as overclocking goes, the manufacturer of some of these cards overclocks them from the factory. So it could be overclocked when you get it. I don't have the reference clock speeds handy but you might want to compare.

Next avenue is to take a look at the power supply. Could it be starting to weaken and not giving the card enough power? Could one of the connectors to the card have corroded so that it is no longer making good contact?

Those are a few things to take a look at... keep us informed...

Oh and one other thing... the GPU3 client is much better suited for that card. That might clear up the whole thing.

Good luck.
 
Just a couple of points to consider .........

Did you check over on the folding forums whether it was a known bad work unit ??
That's the first thing I do if I suddenly get a protein that won't fold after a long run of successful ones.

When you cleared your work folder, did you get a different protein ??
Remember, if you don't send in a result, you'll download the same protein two more times before you get a chance of a different one.
I tend to change my Machine ID as well as clearing the work folder etc, etc.
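
To see whether the client keeps pulling the same work unit, you can count the "Project:" lines in the log. Below is a minimal Python sketch, assuming the "Project: N (Run N, Clone N, Gen N)" format from the log earlier in the thread. The function names are illustrative. The counts are mentions in the log, not confirmed download attempts, since the client may print the line more than once per attempt.

```python
import re
from collections import Counter

# Format assumed from the log line "Project: 6806 (Run 2142, Clone 1, Gen 30)".
PROJECT_LINE = re.compile(
    r"Project:\s*(\d+)\s*\(Run (\d+), Clone (\d+), Gen (\d+)\)"
)


def count_work_units(lines):
    """Count how often each (project, run, clone, gen) tuple is mentioned."""
    counts = Counter()
    for line in lines:
        match = PROJECT_LINE.search(line)
        if match:
            counts[tuple(int(group) for group in match.groups())] += 1
    return counts


def repeated_units(lines, minimum=2):
    """Return the work units mentioned at least `minimum` times."""
    return [unit for unit, n in count_work_units(lines).items() if n >= minimum]


SAMPLE_LOG = [
    "[03:18:40] Project: 6806 (Run 2142, Clone 1, Gen 30)",  # from the thread
    "[04:01:05] Project: 6806 (Run 2142, Clone 1, Gen 30)",  # made-up repeat
    "[05:02:11] Project: 1234 (Run 5, Clone 6, Gen 7)",      # made-up unit
]

print(repeated_units(SAMPLE_LOG))
```

If the same tuple keeps showing up next to the failures while other work units complete, that points at the work unit rather than the hardware.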

Luck ......... :D
[H]
 
Ok, sounds like I have some more work to do. I'll try these suggestions. Thanks to you all.
I'll update when I have time to open up the box this weekend.

If all else fails, fill me in on this 'BAKE-IT' solution :D
 
Also download the OpenCL GPU mem tester from the Stanford tools page. It can test your GPU for errors.
 
I don't have the specific recipe but, I kid you not, it is possible to put a gfx card in the oven, bake it for a few minutes at a low heat and coax it back into life. It's on my to-do list for my dead 8800 GTX.
 
Don't bake it yet. Come on, guys.

I know the card is stock, but give it a little more juice (volts).
 
Yeah, that.
And/or try using Galaxy's software or MSI's Afterburner software to increase the card's fan speed to keep it cooler, then try folding on it again.
 
Set fan to max, set volts both up and down. Brought the clock down. Still has the same problem.

Ran the G80 memtest from Pande. No errors. But it still has the same problem.

Just happened to be at Fry's yesterday so I picked up another card.

I'm going to swap them out and return the bad one. It's only a month old.
 
Okay, this is just nuts.

I accidentally left the gpu client active and the damn thing is folding again :rolleyes:

Now I'm getting FILE_IO_ERRORS on a different box.


I need a drink....
 
It seems like the instability gremlins always appear all at once.
 
Sounds like bad WUs then. Or you're just cursed. :D
 
It's not the card. I replaced it and I'm still getting the same thing.
Now I don't know what to do.
 
It's a software issue then.

Uninstall, then reinstall the GPU client. If that doesn't help, uninstall and reinstall the drivers.
 
Nope. She still don't fold. Looks like I'll have to try another motherboard.
 
Memory is a possibility. We'll find out when I replace the MB :D

The old K8N-DL is getting replaced by dual Mangy-Curs.

Goes well with my handle, dontcha think?
 