Board dying? CPU dying? Corrupted OS?

Reality

Been noticing over the past year or so that my system will either completely shut down or restart itself while I'm gaming. When I say shutdown, I mean 'Complete loss of power' shutdown. It happens when my system is running at stock speeds, or overclocked. RAM at 1600 or 2400.

Swapped out the PSU to a brand new EVGA 1300, no change whatsoever in these symptoms.

Swapped out graphics card to a new 2080, no change.

So before suggesting it's a hardware failure with my CPU or board, is it even remotely possible that an OS reinstall could fix this?

No blown caps or any visible damage to the board or cpu. Just reapplied fresh TIM, and my cpu temps are well below average (40-50c at 4.7ghz under load)

No viruses or malware on the system.

This happens when connected to my UPS, or directly to the wall. It happened at my old apartment and at my current place.

I'm thinking of removing all hardware and reseating everything in the case, perhaps there's a short somewhere?

I'm at a loss here with where I should go next.

The CPU, chipset and VRMs on the board have been watercooled for the past 3 years.
 
What does the system log say when it shuts down?

A software issue will show something in the log, usually the same thing, just before it crashes.
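To give a rough idea of what to look for in the Windows System log: a sudden power loss usually leaves only a Kernel-Power event 41 or an EventLog event 6008 on the next boot, while a software crash tends to leave a BugCheck event 1001 as well. The sketch below assumes you have already copied the events out as (source, event ID) pairs, with source names as Event Viewer typically displays them; the function name and input format are made up for this example.

```python
# Hypothetical helper: classify the events logged around an unexpected restart.
# Input is a list of (source, event_id) tuples copied from the System log.
# Source names are assumed to match what Event Viewer shows in its Source column.

UNEXPECTED = {("Kernel-Power", 41), ("EventLog", 6008)}
BUGCHECK = {("BugCheck", 1001)}


def classify_shutdown(events):
    """Return a coarse guess at what kind of shutdown the log describes."""
    seen = set(events)
    if seen & BUGCHECK:
        return "bugcheck recorded: check the stop code and the driver it names"
    if seen & UNEXPECTED:
        return "no bugcheck: consistent with a sudden power loss or hard reset"
    return "no unexpected-shutdown events found"
```

This is only a first sort. A Kernel-Power 41 with no bugcheck does not prove a hardware fault, it just means Windows never got the chance to write anything down.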

Someone may be nuking your system remotely; logging network accesses should show a pattern, remember WinNuke? :) There are similar things.

How old is the mobo? The last one that did this for me was from the capacitor fail era, and I had to replace all the caps.

I found out recently that standard electrolytic caps will unform themselves at lower than 2.5V; at the current levels on the mobo, there should be visible damage to them if this happened.
(Bulged, leaked, etc.)

That was over 10 years ago, so if the mobo is newer than that, it's less likely; but...

I'd pull all the blocks off the mobo, and look at it very carefully for leakage on the mobo, either from caps or your water system; the water system may have leaked long ago, and left residue that causes an issue under just the right circumstances.

Good luck; that's pretty maddening. I recommend mil spec cap based mobos, like the Sabertooth or other gaming types, for the solid type caps.
 
I know the feeling... in the last couple of weeks, I have lost 2 motherboards and 3 2500K chips. Sometimes your luck is not with you. Sometimes you just have to buy an updated system and start over.
 
Immediately, power supply came to mind.

I know you swapped it, but it still reeks of power supply.

Does it only lock up in games, or does it also lock up when just the CPU and RAM are loaded up (Prime95, etc.)?

If only games, reseat your video card. Make sure PCI-E connectors are good. Swap them out if you have a modular PSU.

Run Futuremark tests looped for each type of stress test and see what fails.

Failing that,

I’d unhook everything you can and stress test it putting one component back into the mix each round of test. See what component fails.

Strip it down to 1 stick of RAM, onboard video, nothing else plugged into PCI-E, run stress test, put the next component in, stress test, repeat until reassembled.
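
The add-one-part-per-round procedure can be sketched like this. It is only an illustration: the part names are placeholders, and `passes_stress_test` stands in for you actually running the stress test and reporting whether the system stayed up.

```python
def find_failing_component(baseline, extras, passes_stress_test):
    """Add components one at a time; return the first that breaks the test.

    baseline: parts that stay installed (e.g. one RAM stick, onboard video)
    extras: parts to reinstall, in the order you want to try them
    passes_stress_test: callable taking the installed list, True if stable
    """
    installed = list(baseline)
    if not passes_stress_test(installed):
        return "baseline"  # the stripped-down system itself is unstable
    for part in extras:
        installed.append(part)
        if not passes_stress_test(installed):
            return part
    return None  # everything reassembled and still stable


# Example with a stand-in test that fails whenever the (hypothetical)
# wifi card is installed.
suspect = find_failing_component(
    ["cpu", "ram stick 1", "onboard video"],
    ["ram stick 2", "graphics card", "wifi card"],
    lambda parts: "wifi card" not in parts,
)
```

With an intermittent fault, each round has to run long enough to give the crash a fair chance to show up, so a pass only means "not caught yet".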
 
Agreed with trying this stuff. One thing I'd like to say is when you encounter a problem like this your first steps should not be to just buy new parts and hope that fixes it. It should be to try to isolate the cause.

#1 Take everything back to stock speeds and check that nothing is out of spec with something like HWiNFO64.
#2 Run stress tests and see if you can get it to BSOD/shut off.
#3 Be 100% positive you don't have a piece of software causing crashes. I've been led on wild goose chases thinking it was the PSU or something else when it was VPN software that got corrupted by an nvidia driver change.
#4 Start component troubleshooting, like removing pieces of hardware as Archaea suggested.
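
For the "out of spec" check in #1, here is a minimal sketch, assuming you type in the rail voltages you read from a monitoring tool or a multimeter. The ATX spec allows +/-5% on the +12V, +5V and +3.3V rails; the function name and the input format are just for illustration.

```python
# Nominal ATX rail voltages and the +/-5% tolerance the ATX spec allows
# on the main positive rails.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def out_of_spec(readings):
    """Return {rail: reading} for every known rail outside nominal +/-5%."""
    bad = {}
    for rail, volts in readings.items():
        nominal = NOMINAL_RAILS.get(rail)
        if nominal is None:
            continue  # unknown rail name, nothing to compare against
        if abs(volts - nominal) > nominal * TOLERANCE:
            bad[rail] = volts
    return bad


# Example: a sagging 12V rail is flagged, the other two are within tolerance.
flagged = out_of_spec({"+12V": 11.3, "+5V": 5.1, "+3.3V": 3.3})
```

Software sensor readings can be off, so treat a flagged rail as a reason to measure it properly rather than as proof.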

Hope you get the problem figured out.
 
It's definitely not the PSU, even though it would appear to be the first thing to eliminate.

It doesn't happen when running Unigine Heaven 4.0 or Time Spy on loop, and it doesn't happen when running Prime95 or IntelBurnTest.

Only happens while gaming.

I moved the graphics card to a different PCIe slot and made sure it's seated properly; there's no sag with the card.

The only other device installed is a pcie wifi card that has sketchy win10 drivers
 
Use "BlueScreenView" to look at the crash dumps.

It'll tell you what happened.

Based on the error, we can tell you if it's hardware or software. (within like 90% certainty)
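
Before opening a dump viewer, it's worth checking whether any dumps exist at all. A minimal sketch, assuming the default minidump folder `C:\Windows\Minidump` (yours may be configured differently); the function name is made up:

```python
from pathlib import Path


def recent_dumps(dump_dir, limit=5):
    """Return the newest *.dmp files in dump_dir, most recent first."""
    folder = Path(dump_dir)
    if not folder.is_dir():
        return []  # folder missing: no dumps have been written there
    dumps = sorted(
        folder.glob("*.dmp"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return dumps[:limit]


# Typical call on a Windows machine (path is the assumed default):
# recent_dumps(r"C:\Windows\Minidump")
```

An empty result after a crash suggests Windows never bluescreened, which points away from a driver fault and toward a power-off style failure.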

Also, have you tried lowering your overclock and/or upping the voltage? 4.7GHz at 1.37V is pretty high up there...
 
Well, it isn't blue screening, it's a complete power down, like if you had lost power in a storm. The system completely powers itself off without warning, as if it was unplugged.

The prime95 crash was at 4.5/1.26v :/
 
I had a similar issue with an entirely different computer setup.

Over the course of 4 months, I changed every component. Very literally, I have an entire second computer because of it.

Different OSes, different installations, etc.

Windows would log no error or fault - just like someone pulled the plug.

And yes, I did try a different power cable and outlet. I RMAed the GPU, the PSU, the motherboard, and even the CPU.

End of the day, the only thing that I could find in common with the random reboots was the use of a Corsair H110i. If I used that particular AIO, I would get sporadic reboots. Like I said, I bought an entire second set of hardware, and it happened on both sets. No OCs anywhere, temps always looked great, no apparent issues with fans or the pump. Nothing even to do with load, could restart even while idle.

No idea. Installed correctly, and it had worked fine for about a year and a half before it started giving issues. I pulled my hair and my wallet out trying to sort through it. I still run both of those computers today, just with CM212s now instead of being fancy with a nice AIO.

Not saying this is the OP's issue, just wanted to say I totally understand and empathize. It could be something completely unexpected.
 
Turning off all your overclocks and restoring default voltage is appropriate for your testing.
 
man, I had all sorts of issues with HWINFO64, I had to uninstall my asmedia sata controller drivers, i had to disable certain sensors on the board, all my drives would disappear when i loaded the program, ugh lol
 
You can narrow this down by looking at the PWR_OK signal. If you don't have an oscilloscope (assuming test points are even available) then using a known-good CPU is the only way to differentiate. Remember, even if the CPU isn't overheating that doesn't mean a crapped out thermistor won't ruin your day.

https://www.intel.com/content/dam/www/public/us/en/documents/guides/power-supply-design-guide.pdf
 
I seem to have fixed the problem. My board has dual BIOS, and switching over to the second BIOS and disabling every overclocking option under the DIGI+ section has stopped the random crashing. I'm actually running my CPU overclocked at 4.6GHz with zero issues now. System hasn't crashed in a couple of weeks. Something related to either that BIOS or those high-end overclocking settings seems to have been causing the instability.
 
BIOS code can have bugs too, it's good that you were able to sort it out!
 
Glad to hear. Hopefully doesn't happen again but if it does you will have more experience at least!
 
It may be a good idea to flash the bad BIOS to the most recent version. It's always better to have a backup than not.
 