Constant restarts, even after several reformats.

AngryJim

I am running an older computer as a server and I've been having problems lately. It's an AMD 2800+ with a radeon 1650 card, one main hard drive through ATA and a backup drive on SATA running Windows XP.

I leave the computer running 24/7 with iTunes open to network my music to all of my other computers. It also has all of my important documents, media files, and programs on it shared across the network (close to 300gigs with all of my tv shows). Other than that, I don't touch it very often.

One day it started restarting on its own. It would happen every now and then, but gradually increased in frequency to the point that I couldn't even open Firefox before it would restart again. A lot of the time, but not all of the time, I'd get the "This system has recovered from a serious error" message. Eventually I just reformatted with a clean wipe of the main disk, since the backup disk has nothing but dormant files. I've been through this process four times. Every single time the computer is fine for about a week after reformatting and then it all happens again. I am convinced it is a hardware and not a software problem because it still happens after reformats. I did once try running the computer with the SATA drive removed, but got exactly the same results.

Any help is greatly appreciated.
AngryJim:mad:
 
Might be a power problem, or a failing motherboard, or failing drives, in order of likelihood, with the first two being far more likely than the last. The delay is the confusing part.

I think you may be getting file corruption for some reason over time, hence it starts out ok and gets worse, but it's strange this always leads to random restarts.
 
I agree, it's frustrating as hell to have everything work great for the first week after reformatting and then degrade into uselessness.

I guess I shouldn't call the restarts random, really. They begin happening every day or so, then just slowly increase in how often they occur until it gets to the 10-20 second mark before a restart.

And on the file corruption, would that be something I would notice? I haven't seen any problems with any files. Hell, I've even been able to restart old half done bit torrent downloads after a reformat with no problems.

AngryJim:mad:
 
I suppose I should specify something. This has been the way I've reformatted lately:

1. Unplug Ethernet.
2. Install XP SP2 and all drivers.
3. Install all software from a USB hard drive: latest ZoneAlarm, latest AVG anti-virus, and some other 'known-good' programs like iTunes and Firefox.
4. Plug Ethernet back in.
5. Allow the system to get all Windows updates.
6. Done.

Basically what I'm trying to say is I really, really doubt it's a virus or software problem of some kind.

NoPantsJim:mad:
 
Look over the board for caps that may be leaking. There was a bad run of Socket A boards some time ago, and you may just have a late bloomer. If you're not sure what to look for, google "bad motherboard capacitors"; you'll get pics, lots and lots of them.

If you don't see anything, then it's time for a parts swap:

RAM... then PSU...

If neither fixes it, it's the mobo or the processor, and most likely not the processor. Very little can go wrong there unless you have a chipped core (though while you're in there, a fresh application of thermal paste wouldn't hurt), but a number of things can go wrong with the mobo.

Let us know where you are in troubleshooting the hardware, because this isn't software. I'm sure of that.
 
Here's the motherboard, in case it rings a bell as one that's had problems in the past. I honestly have no clue if I have revision 1 or 2.

http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=1756

The power supply is just some old clunker that came with the Antec case I bought at the time. I may try swapping it out with the one from my roommate's old media PC.

As of now I've really made no hardware inspections. I've got some spare time on Sunday, so I'll probably take the whole PC apart, have a look inside for anything peculiar, and update on what I find. Til then, any suggestions are extremely appreciated.

AngryJim:mad:
 
Yup. I replaced 4 caps on that EXACT same board for a friend so we could recover data before we trashed it. Not all of them had the issue, though. If I remember correctly, with rev 1 about 30% of the boards developed symptoms within 9 months, and 10% more within 2 years. Rev 2 was a little better and a little worse, at 15% within 12 months and 24% within 2 years. Most of the people I know who had this generation of board have ditched it for an upgrade (my friend's was a rev 2 that showed symptoms at around the 1.5-year mark), so beyond the 2-year mark it's hard to predict the failure rate, but I wouldn't be surprised if this is your problem.

In fact, now that I'm remembering the symptoms, yours is exhibiting the exact symptoms of cap failure. You have a delay before crashing because it takes time for the caps to go critical. I believe the problem stemmed from the caps only being rated for 100°F. Well, in a perfect 70°F environment they operate at 85°F. You put a 2800+ blowing 120°F air on them and they shoot to over 130°F in about 10 to 15 minutes. The caps were fine as long as they stayed intact, but at 130°F with, let's say, 5% fluid loss they can't do their job and fall out of tolerance, the MOSFETs go 'fuckit', your CPU and memory controller say 'no.. fuck you', and there you have it.
 
So thore, does that in any way relate to the fact that right after a reformat I can usually run the computer 24/7 for about 5-7 days with no problems, but then the restarts hit and quickly become more and more common?

Additionally, would the capacitors being a problem account for the constant "This system has recovered from a serious error" messages? I just got off work and turned on the computer in question. I was bombarded with 16 (Yes, 16) of these messages in a row until finally after a minute of being at the XP desktop and constantly clicking 'do not send report' it shut off again.

I'll report back with my inspection of the motherboard on Sunday. Til then, I'll be shopping for a mobo that can take my chip and has an AGP slot, hopefully with some more SATA ports than what I have now.

AngryJim:mad:
 
Here's basically how it works.

Your power supply puts out a voltage, let's say 3.3 volts on that rail. MOSFETs step that power down to a given voltage at a certain amperage. Your CPU, for example, runs at say 1.8 V (maybe it does, maybe it doesn't; I don't remember, I'm pulling numbers for the hell of it). If you were to run 3.3 V into your CPU, it would fry in a heartbeat. Well, if you watch your voltage monitoring software over time, you will see the voltage move around some... depends on loads... and sunspots :p
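If you want to do more than eyeball those readings, here is a minimal Python sketch that flags samples outside a tolerance band. It assumes the usual ±5% band for the positive ATX rails, and the readings are made-up numbers standing in for values copied by hand from a monitoring tool. Software-reported voltages can be off, so treat a hit as a reason to get out a multimeter, not as proof.

```python
# Nominal rail voltages and an assumed +/-5% tolerance band for the positive ATX rails.
NOMINAL_RAILS = {"3.3V": 3.3, "5V": 5.0, "12V": 12.0}
TOLERANCE = 0.05


def out_of_tolerance(rail, readings):
    """Return the readings that fall outside the allowed band for the given rail."""
    nominal = NOMINAL_RAILS[rail]
    low = nominal * (1 - TOLERANCE)
    high = nominal * (1 + TOLERANCE)
    return [v for v in readings if not (low <= v <= high)]


# Hypothetical samples, typed in by hand from a monitoring tool.
samples = [3.31, 3.28, 3.35, 3.09, 3.30, 3.52]
print(out_of_tolerance("3.3V", samples))  # prints [3.09, 3.52]
```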

Anyway, when you have an inconsistent supply, for whatever reason, you can't build a perfect circuit to convert it to usable voltage, so you have to use something that acts as a buffer. Enter the capacitors. You can kind of think of them as mini batteries: they take a charge of energy and hold on to it, and when the supply voltage drops below a certain point (varies by cap), they release some of their charge to meet the demand. Now, the beauty is that it works from both ends. If the supply voltage drops, they kick in, and if the device on their back side suddenly decides it needs a ton more juice, they provide it until the rest of the circuit can catch up.

Now, if voltages are rising and falling as demand increases and decreases, the circuit will oscillate. Well, caps do this anyway; it's part of their nature to constantly oscillate at a given frequency. By adding more caps to the circuit in parallel, they will all rise and fall at different times (google 2-phase power and 3-phase power; this will give you an idea of what oscillation stacking looks like and how it works), and for every cap you add, you smooth out the frequency variance of the circuit, i.e. it cleans out the noise.
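To put rough numbers on the "mini battery" idea, here is a small Python sketch using the ideal-capacitor relation ΔV = I·t / C, with parallel capacitances simply adding together. Every value is hypothetical and picked for easy arithmetic, and the model ignores ESR and every other real-world effect, so it only shows the trend: less working capacitance means a bigger voltage sag while the caps cover a gap in supply.

```python
def droop_volts(load_amps, gap_seconds, caps_farads):
    """Voltage sag while ideal parallel caps alone feed the load for gap_seconds.

    Uses dV = I * t / C, where C is the sum of the parallel capacitances.
    """
    total_capacitance = sum(caps_farads)
    return load_amps * gap_seconds / total_capacitance


# Hypothetical case: 10 A load, 1 microsecond gap, 1000 uF caps.
healthy_bank = [1000e-6] * 4   # four good caps in parallel
degraded_bank = [1000e-6] * 2  # two caps effectively out of action

print(droop_volts(10, 1e-6, healthy_bank))   # about 0.0025 V
print(droop_volts(10, 1e-6, degraded_bank))  # about 0.005 V
```

Halving the working capacitance doubles the sag in this idealised picture, which is the general direction of the failure described above.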

Now, if you have a circuit like that used in a PC, where you have a CPU that oscillates at 2800 MHz (I know it's a 2800+; I'm being lazy and not looking up its actual working frequency), that means it cycles 2.8 billion times per second. Usually the board designers create the power circuits so that they provide power that arrives at a frequency that jibes with whatever device they are trying to power.

If you have a cap that is out of sync with the rest of the circuit, you won't just end up with slightly more noise, but rather a large standing-wave effect: moments of slight noise, followed by a spike in current, followed by some calm, and then not enough power.

Now, most circuits are designed to handle some of this, but they have their limits, and memory (and its controllers) is the most susceptible. Because RAM is volatile, voltage variance means the difference between it storing the bit it was meant to store and not. Voltage and/or amperage variance may also cause it to run out of spec, too fast or too slow; either way, out of sync. When the CPU sends data and the voltage is too low, the RAM may store the data wrong, and when it is requested back, the CPU throws a hissy fit.

Now, why does it take ~24+ hours for symptoms to show up? Well, a few things have to go wrong before bad data is stored: the CPU has to send data at a low point, the RAM has to store it in a section that isn't getting enough power, the CRC system has to catch that the data wasn't correct when it was stored, and last but not least, the CPU has to have a reason to request that data back. When you combine this with working temps (some days are hotter than others), you end up with a seemingly random rebooting machine that will reliably crash regardless of what you do.
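The "several things have to line up" argument can be sketched as simple probability. This Python snippet assumes the conditions are independent and uses invented probabilities and an invented event rate purely for illustration; nothing here is measured from a real board. It shows why a rare coincidence can take days to bite, and why the crashes come faster as each condition gets more likely.

```python
from math import prod


def expected_hours_to_first_fault(condition_probs, events_per_hour):
    """Average wait for the first event where every condition holds at once.

    Assumes the conditions are independent, so their probabilities multiply.
    """
    p_fault = prod(condition_probs)
    return 1.0 / (p_fault * events_per_hour)


# Hypothetical numbers: three conditions that must coincide, 10,000 chances per hour.
early = expected_hours_to_first_fault([1e-3, 1e-2, 1e-1], 1e4)
# Same machine later on, with the first condition ten times more likely.
later = expected_hours_to_first_fault([1e-2, 1e-2, 1e-1], 1e4)

print(early)  # roughly 100 hours
print(later)  # roughly 10 hours
```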

And like a crooked CEO, if you cook the books, then the next time you go to do something, the numbers don't come out right and the system tells you where to stick it. Because everything that goes to the HD MUST pass through the RAM, if there is bad data not being caught in the RAM, it gets stored bad on the HD, and there you have it.

This is why your problem could be bad RAM or a bad PSU as well. If you start off with bad power, well, you have bad power, and bad RAM will do the same. Seeing as how RAM and PSU are easier to replace than the mobo, that is why I suggest looking at those first. Unless you see bad caps; then it's a sure thing and there is really no need to check the other two (well, there is, as they could have been damaged by overvoltage (RAM), but you won't know that for sure until you replace the mobo).
 
Thanks for the very thorough post. This Sunday if I don't find any bad capacitors, I'll run memtest for the ram and then try to swap the power supply.

AngryJim:mad:
 
Ok, I took the computer apart and checked every inch of the motherboard. Nothing looks out of the ordinary. I checked all sides of every single capacitor and found nothing strange. Re-assembled and started it up only to find the same problem. I'll run memtest sometime this week, can't get my hands on a copy at the moment.

AngryJim:mad:
 
Just because you don't see leaky brown stuff doesn't mean they're good.

I had a board die because the /stock/ HSF (from the factory!) backplate was mounted in such a way that it crushed a surface mount resistor on the back of the board when I reinstalled it after lapping my heatspreader. Shame on you, ABIT.
 
Just ran Memtest this morning. It couldn't get through a single pass before I heard some error noises and the computer shut completely off, without restarting.

The difference was that the computer would run for about 25-30 minutes before this happened, whereas if I boot into Windows XP it restarts after less than a minute. I'm gonna try to grab some RAM from my roommate later and see if that's the problem after all.

AngryJim:mad:
 
Just tried several different sizes and speeds of ram, had the exact same error with every stick. Probably gonna try the power supply when I get a chance.

AngryJim:mad:
 
Does your Antec case have an Antec power supply?

They die too... caps

I have two dead Antec PSUs here in my cubicle...
 
yep, power supply is by Antec. I'm waiting a few days to get my hands on another PSU to test with.

AngryJim:mad:
 
Just had a coworker come by with a power-up problem: he hit the power button a dozen times until the machine started, then it ran for a bit and turned off...

Antec smart power 450...

He's RMA'ing the thing right now...
 
This Friday I'll have some spare time. I'm going to take the brand new supply out of my gaming PC and try it in the old computer that's having the problems, and I'll post here with the results.

AngryJim:mad:
 