Can't figure out - cold reboot problem

Brian_B

I built this computer in November of 2014, so I've had it a while.

About 6 months ago, it started randomly rebooting (cold reboot, like hitting the reset switch). It only occurred rarely, maybe once a week, and I just kinda wrote it off as Windows 10 weirdness (since it coincided with about the time I upgraded to that).

It got worse. Lately, it has been as frequent as every 15 minutes under heavy gaming/heavy load. Even idle it'll do it occasionally, maybe once every other day if I leave it awake and idle.

So here is what I have:

Asus Maximus VII Gene
Asus Strix 980
32GB Kingston DDR3-1333
i7 4790K
Corsair H100i
Crucial M500 SSD
Toshiba 3TB SATA3
Seasonic Platinum 450RM
Corsair Carbide 240 case
Everything running through an APC RS1500
Nothing overclocked, everything stock/auto in the BIOS

Originally, when I first built the rig, I had a Corsair AX650i. After going through 2 RMAs on bad cables, I replaced it with a smaller Seasonic (there's a thread around here somewhere; I'm not 100% sure, but I don't think it's terribly relevant).

So, once the reboots got to about the once per day stage, I started to actively troubleshoot it and figured something was up. It was getting worse and all. Since it's a cold reboot I suspect it's a power issue, but can't seem to track it down.

Here's what I've done:

First off, figured it was the PSU, so replaced the 450RM with a 750 SNOW. Computer rebooted 15 minutes after the swap, so I returned the 750
Swapped video card with wife's computer (GTX 660) - her computer ran the 980 fine for 3 days under load, mine continued to reboot
Swapped video card to other PCI slot
Updated BIOS
MEMTEST86'ed each DIMM successfully
Rolled back nVidia drivers to 2-3 different versions
Rolled back to Win8.1 on an older SSD, unplugged the Toshiba to make sure it wasn't a drive issue.
RMAed the motherboard, Asus sent it back 2 weeks later. It rebooted after 15 minutes under load once I re-installed it.

Checked all internal wiring, it appears ok. Voltages all look fine. Temps on the CPU get a bit higher than they used to, but all still well within normal ranges (30C idle, 70-80C loaded). GPU temps are fine.

Unhooked the power and reset buttons. Swapped out the power cable. Plugged into a different slot on the UPS.

Asking around on a different forum, one person said they had some luck overvolting their RAM (from 1.50 to 1.60). I tried this, it is more stable, but still not entirely stable. With the RAM overvolted (but not overclocked), it goes from every 15-20 minutes to maybe twice a day. I tried bumping up CPU voltages as well, but that seemed to reduce the stability.
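For anyone comparing "more stable" before and after a change like this, a minimal sketch of how the reboot gaps could be put into numbers. The function name and the timestamps are made up for illustration (they are not my actual logs); it just assumes you jot down the time of each unexpected reboot.

```python
from datetime import datetime
from statistics import mean


def mean_minutes_between_reboots(reboot_times):
    """Average gap, in minutes, between consecutive unexpected reboots.

    reboot_times: ISO-format timestamps noted down by hand (hypothetical input).
    """
    stamps = sorted(datetime.fromisoformat(t) for t in reboot_times)
    if len(stamps) < 2:
        raise ValueError("need at least two reboot timestamps")
    # Gap between each reboot and the next one, converted to minutes
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
    return mean(gaps)


# Invented example data, NOT real measurements from this build
before = ["2016-03-01T19:00", "2016-03-01T19:15", "2016-03-01T19:35"]
after = ["2016-03-02T08:00", "2016-03-02T20:00", "2016-03-03T08:00"]

print(mean_minutes_between_reboots(before))  # 17.5
print(mean_minutes_between_reboots(after))   # 720.0
```

With only a handful of reboots the average is noisy, so treat it as a rough comparison rather than proof that a change helped.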

So far, the only thing I haven't been able to check out is the CPU (I don't have another Haswell to stick in there to test it). It passes Prime95 checks (at least as far as it will get before a reboot kills it).

Right now, I have FAH set to Heavy and to come up on reboot and I'm burning it in a bit - hoping that whatever the issue is will either catch on fire or bake itself out.

Anyone have any suggestions? I've tried just about everything I can think of without just writing it off as a gremlin build and punting.
 
Not sure what you mean by cold reboot. If the system is running and under stress and it reboots, that's not a cold system.

Really sounds like a bad motherboard. ASUS is bad about just sending the same board, in the same condition right back to you without even touching it.
 
Well, by cold reboot I mean - no BSOD, no hang, just drops like hitting the reset switch - you are correct the system isn't cold temperature-wise.

I agree about the motherboard for the most part, just wanted to see if anyone had any other suggestions before I tried to re-RMA it, or just cut my losses and get a different mobo and hope that solves the problem.
 
Oh, I always referred to those types of reboots as hard. Anyway...

Those are normally caused by two things.

1. PSU/Power/Ground problem. I know you said you replaced the PSU with a different one with similar results. That didn't really eliminate possible grounding issues though. Only way to really eliminate that is to remove all unnecessary hardware from the motherboard and pull it out of the case. Let it run to see if it continues.

2. This is a problem I have had with Windows 7. I assume W10 has a similar feature/setting, but I haven't used it yet so I can't go find it. However, Windows 7 has a setting that tells it to automatically reboot for some Windows errors. To get to it in Windows 7:

Left click Start button
Right click Computer
Click properties
Advanced System Settings (on the left side)
Advanced tab
Startup and Recovery box
Click settings
Un-tick "Automatically restart"

Windows also has a log that will tell you if it's automatically restarting due to a critical error. I forget exactly where it is. Just Google "Windows Event Log". It's somewhere in Administrative Tools in Windows 7.
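Once you have the System log open, a rough way to read it can be sketched like this. The event IDs are the ones Windows normally logs around an unexpected restart (Kernel-Power 41, EventLog 6008, BugCheck 1001); the function name, the (source, id) tuple format, and the category labels are my own assumptions for illustration, not anything Windows produces.

```python
# (source, event id) pairs Windows typically logs around an unexpected restart
BUGCHECK = ("BugCheck", 1001)        # Windows crashed (stop error) and restarted
KERNEL_POWER = ("Kernel-Power", 41)  # rebooted without a clean shutdown
UNEXPECTED = ("EventLog", 6008)      # previous shutdown was unexpected


def classify_restart(events):
    """Guess what kind of restart a set of System log events points to.

    events: iterable of (source, event_id) tuples copied by hand from
    Event Viewer (hypothetical input format).
    """
    seen = set(events)
    if BUGCHECK in seen:
        # Windows noticed the crash, so "Automatically restart" is hiding a BSOD
        return "bugcheck"
    if KERNEL_POWER in seen or UNEXPECTED in seen:
        # Windows never saw it coming: looks like a reset or power drop
        return "power-loss-like"
    return "unknown"


print(classify_restart([("Kernel-Power", 41), ("EventLog", 6008)]))
print(classify_restart([("Kernel-Power", 41), ("BugCheck", 1001)]))
```

If all you ever get is the Kernel-Power/EventLog pair with no BugCheck entry, un-ticking "Automatically restart" probably won't change anything, and hardware or power stays the main suspect.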
 
Even though swapping out the PSU didn't fix it, I'd still go with a beefier PSU for that hardware.
At least a quality 600W PSU, 750W would be better. There's not that much price difference to
go to a higher wattage PSU. I run a FSP Hydro G 750W I got on sale for around $80.

Overkill on PSUs is a good thing since it's less stress on the PSU. They also tend to go out of
spec as they age and that will have less effect if the older PSU is not under as much stress.
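To put rough numbers on that, here is a back-of-the-envelope sketch. The 88 W and 165 W figures are the published TDPs for a Haswell i7 K-series and a reference GTX 980; the 75 W for everything else (board, RAM, drives, pump, fans) is purely my guess, and a factory-overclocked card can pull more than reference. The function name is made up for this example.

```python
def psu_load_fraction(component_watts, psu_watts):
    """Fraction of the PSU's rated output used by the summed component draw."""
    return sum(component_watts.values()) / psu_watts


# Assumed worst-case draw in watts; "rest" is a guess, not a measurement
build = {"cpu": 88, "gpu": 165, "rest": 75}

for psu in (450, 750):
    pct = round(psu_load_fraction(build, psu) * 100)
    print(f"{psu} W PSU: about {pct}% loaded")
# 450 W PSU: about 73% loaded
# 750 W PSU: about 44% loaded
```

So on paper the 450W should cope, but it's working a lot harder than a 750W would be, which is the whole point about headroom as the unit ages.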

[H] does a great job on PSU reviews, I'd pick one that is rated Gold by [H].

Since the voltage bump on the RAM did help the issue, I'd say you are on the right track there.
Try running with one stick of RAM at a time.

I know somebody else with Kingston RAM was getting BSODs even though Memtest ran clean.
He couldn't get Windows stable with that RAM though. Windows was accessing the RAM in a
way that Memtest wasn't testing.

Buy a cheap single stick of RAM to test with.

If not RAM, then I agree with other posts that it's likely a bad mobo.
 
This is an update; a lot of good ideas were given here.

Problem ended up being the CPU itself.

After much more testing, a second motherboard (Gigabyte), and having swapped out basically all components, I finally found a situation where I could get it to reliably reboot - Prime95 Small FFT test. Computer would reboot within 10 seconds of starting that test, every time I tried it. After that, I swapped out the i7 for an i5 - problem appears gone. There were a lot of red herrings here, and this is probably one of the trickiest problems I've had to troubleshoot.

Going to attempt to get the i7 replaced under warranty.

Just thought I would update now that it's pretty well solved, finally, in case anyone goes back digging for ideas on a future problem. I haven't ever seen a CPU go out like this, it's rare for a CPU to go out period unless it's physical damage, and the couple of times I've seen them kaput electrically it's an all-or-nothing and the entire computer is just dead.
 
Crazy. Glad you got it figured out. Like you said, a CPU going bad is very rare unless it was abused in some way. I've been o/cing computers since '97 and have yet to see a CPU failure. I'll be curious as to how the warranty works out for you.
 