Random reboots ~only~ when idle - Linux and Windows

lopoetve

Extremely [H]
Joined
Oct 11, 2001
Messages
33,891
So, clearly a hardware issue of ~some~ kind, but I'm torn on swapping the PSU or the motherboard first. Memcheck passes 100%.

System:
Gigabyte X570 Aorus Elite
Ryzen 3950X
Hyper-EVO 212 cooler (for now, waiting on replacement backplate so I can put an AIO on this, but crash is at idle, so... not heat/load)?
EVGA RTX 3070
2x32GB G.Skill Trident Neo (@ XMP - 3200MHz).
Oculus Rift + 2 sensors (rift on 3.2 port, sensors on 3.1 ports), basic keyboard/mouse, wired ethernet from motherboard.
1x1TB NVMe (Inland Professional)
Older HX650 Corsair PSU

Dual boots Ubuntu 20.04

System will reboot if it's idle - hard reboot, fans spin up, and it comes back fine. Does this idling in Linux or Windows. If I fire up a Time Spy stress test, it'll run for hours just fine. Same for playing VR games. Intermittently, idling in Linux (when it's not playing VR), it reboots. If I leave it sitting in Windows, it reboots (sometimes). No pattern I can tell - thought it was heat (I have a script that reboots to Linux after I finish playing VR, so it can go back to running backup jobs) and it would tend to reboot after that (waits about 2-3 minutes at the Linux desktop, then reboots)... but it'll run workloads forever. It's only if it's IDLING that it does this - I've literally played Robo Recall and Lone Echo for hours, and it's been running the Time Spy stress test for an hour now... stable as could be. But end that and let it sit? In Windows there's about a 20% chance of a reboot - if I reboot straight to Linux, it's guaranteed to reboot. If I turn it off for 5+ minutes, Linux will probably be stable.

It's WEIRD.
Oh, the one reason I'm thinking motherboard? The Rift stopped working on one of the 3.1 ports - had to move it to a 3.2 (audio just dropped), which... yeah. Weird.
 
Gear Down mode enabled/disabled? I wonder if it's a voltage drop issue where the motherboard/CPU/memory combo isn't stable at the lower voltage at idle. Other than that, maybe bump the SOC voltage slightly?
 
Yeah, a friend suggested that. Just set load line to normal from auto last night and am running tests...
 
I ran into a somewhat similar issue recently. I was getting random reboots at idle and my computer kept waking from sleep for no reason. I tried two things:

1) Manually set VDDG and VDDP voltages in the BIOS. Reddit thread
2) Turn off wake on LAN for my Intel network adapter via Device Manager. I think a recent driver update may have broken this.

So far it's been a few days and all is well.
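
For anyone who wants to check the same wake-on-LAN setting from the Linux side of a dual boot, here is a minimal sketch. It is not what was done above (that was Device Manager on Windows); it assumes `ethtool` is installed, is run with enough privileges to report the `Wake-on:` line, and the interface name you pass in is your own (`enp5s0` below is only a placeholder).

```python
import re
import subprocess
from typing import Optional


def parse_wake_on(ethtool_output: str) -> Optional[str]:
    """Return the active Wake-on flags (e.g. 'g' or 'd') from `ethtool <iface>` output.

    Only the line that starts with "Wake-on:" is matched, so the
    "Supports Wake-on:" capability line is ignored.
    """
    match = re.search(r"^\s*Wake-on:\s*(\S+)", ethtool_output, re.MULTILINE)
    return match.group(1) if match else None


def wol_enabled(iface: str) -> bool:
    """Run `ethtool <iface>` and report whether any wake-on-LAN mode is active.

    'd' means disabled; anything else (commonly 'g' for magic packet) is on.
    """
    result = subprocess.run(
        ["ethtool", iface], capture_output=True, text=True, check=True
    )
    flags = parse_wake_on(result.stdout)
    return flags is not None and flags != "d"


# Example call (placeholder interface name, adjust to your system):
#   wol_enabled("enp5s0")
```

If it reports enabled, `ethtool -s <iface> wol d` turns it off until the next boot; making that persistent depends on the distro's network setup.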
 
Huh. I wonder if the wake on LAN was the problem that hit my Z490 board. Rolled back a dozen times before finally disabling Windows driver updates, and problem gone.
 
Ok. So the load line didn’t do it. Bumped it one more to see. Testing now.
 
Nope. Not it. Will try looking up right voltages and setting manually. Then I’m going to replace both the board and the PSU. Should have remembered my number one rule of building AMD systems.
 
Ironically I don’t think it’s the AMD system. I believe it’s your power supply. I wouldn’t replace the motherboard yet.
 
I’m mostly ranting about Gigabyte AMD boards. I have horrible luck with them almost every single time 🤣
 
Given your original description it could be heat related. Crashes after doing something, maybe in Windows, immediate reboot to Linux after VR causes a crash, generally fine if you shut it down for 5... makes me think it's something like a combination of idle voltage and temperature. I wonder if it would crash if you shut it down, let it cool all the way off, booted straight to Linux, ran a stress test in Linux, then let it idle. But who knows what the hell it is. I've been around this business long enough (started doing the software engineer thing in the '90s, and I've played sysadmin and DBA too) to know that these sorts of problems often end with "WTFBBQ I can't believe that was it" or "I feel stupid for not thinking of that".

PSU is possible but I'd expect a PSU problem to manifest under load. Then again, read the last sentence of the first paragraph I wrote again. I'd mess with the voltage and any other idle parameters in the BIOS. Maybe C-states? The last build I did I made them more aggressive (lower power when snoozing) and didn't have any trouble, but that was an Intel box so it's not even worth comparing on something like that. On the other hand, it could still be the PSU. Maybe it's getting old and voltage regulation goes to shit under light load. At any rate I'd mess with the BIOS more, then try swapping the PSU before replacing the board. The one thing I wouldn't do is replace more than one part at a time unless I was worried about damaging something or there was no way around it. If you swap the PSU and board at the same time you'll never know which one it was, and dammit I'd want to know if it was an old PSU or Gigabyte sucking that caused the problem.
 
What BIOS version are you using?
I was having almost identical issues and downgrading back to F21 seems to have solved it.

I also seem to have a strange issue where one of my RAM sticks comes loose in slot A2, which was driving me crazy and causing more issues while troubleshooting. I typically like Gigabyte boards but this one has been a huge headache.
 
We should start a thread for strangest pc problems you've encountered/resolved in the past. I've had to troubleshoot some oddities I've encountered in the past 20 years or so. Might end up as a valuable resource for folks having issues currently. Maybe a symptom / resolution format to help people googling for help?
 
Amen.

On this one - bumping vdroop two levels and SO FAR we're stable (also added a 5 minute cooldown on the reboot script). My next was going to be trying a different BIOS level. So Idle voltage it looks like!
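
A rough sketch of what a cooldown in a reboot script can look like, for anyone wanting to try the same thing. This is not the actual script from this build: the function name, the 300-second default, and the Windows `shutdown /r /t 0` command are assumptions for illustration, and how the next boot ends up in Linux (boot order, boot manager) is left out entirely.

```python
import subprocess
import time


def reboot_after_cooldown(
    cooldown_s=300,
    reboot_cmd=("shutdown", "/r", "/t", "0"),  # assumed Windows reboot command
    sleep=time.sleep,
    run=subprocess.run,
):
    """Let the machine sit for `cooldown_s` seconds, then issue the reboot.

    `sleep` and `run` are injectable so the logic can be dry-run without
    actually waiting or rebooting.
    """
    sleep(cooldown_s)
    return run(list(reboot_cmd), check=True)


# Dry run that only prints what would happen:
#   reboot_after_cooldown(300, sleep=lambda s: print("wait", s),
#                         run=lambda cmd, check: print("would run", cmd))
```

On Windows, `shutdown /r /t 300` gets a similar delay without any script; the wrapper is only useful if the wait needs to sit alongside other steps.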
 
Did idle temp come up accordingly after the LLC change? Does your BIOS/board have frequency control for the VRMs? I only ask to speculate about VRM duty cycle issues that could be widespread from the vendor's BIOS.
 
You know, I haven't been monitoring, but I will check that.

I'll check on the frequency control on the next reboot, but I ~think~ it does - I tend to not overclock much (this system was not planned on being overclocked at all) so I hadn't checked when I bought it either. This was low-cost but high core for server and occasional VR work.
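
For the monitoring part, one way to see idle temps right up to a hard reboot on the Linux side is to log the kernel's hwmon sensors to a file and fsync every sample. A minimal sketch, assuming the standard `/sys/class/hwmon` layout (values in millidegrees C) and that the CPU sensor shows up there (on Ryzen that is normally the `k10temp` driver); the function names and log path are made up for illustration.

```python
import glob
import os
import time


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"<chip>/<sensor>": degrees_C} for every temp*_input found.

    Chips that share a name will overwrite each other's keys; good enough
    for a quick look, not for a permanent monitor.
    """
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)  # fall back to e.g. "hwmon0"
        try:
            with open(path) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable right now, skip it
        sensor = os.path.basename(path)[: -len("_input")]
        temps[f"{chip}/{sensor}"] = millideg / 1000.0
    return temps


def append_sample(log_path, temps, now=None):
    """Append one timestamped line and fsync so it survives a hard reboot."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    readings = " ".join(f"{k}={v:.1f}" for k, v in sorted(temps.items()))
    line = f"{stamp} {readings}\n"
    with open(log_path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())
    return line


# Example loop (hypothetical log path), one sample every 5 seconds:
#   while True:
#       append_sample("/var/tmp/idle_temps.log", read_hwmon_temps())
#       time.sleep(5)
```

After a reboot, the last few lines of the log show what the sensors read just before the box went down, which makes it easier to compare before and after an LLC change.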
 