New Build, Getting Freezing and Crashes. Advice?

cybereality

So I just built a new PC and it's not stable. I'm running Ubuntu, and I've gotten repeated freezes (total lockup: the image stays on screen, but the PC cannot be recovered without pressing the power switch) and some game crashes. The freezes have happened at different times: in Firefox, while just typing in the Terminal, or while compiling code. One game, DeadCore, crashes to desktop about 1 minute into a level (I tried like 5 times, same deal). I've also tried live sessions of both Ubuntu 17.10 and 16.04, and both froze within minutes of reaching the desktop, without even installing anything. I feel like the issue may be in the hardware, but I'm not sure what.

This is what I've tried so far:

- 24 hours of MemTest86+, completed 8 passes, no errors.
- 22 hours of Prime95, blended stress test, no errors.
- 2 and a half hours with Unigine Heaven running in loop, 1080p medium, no crashing.

I feel pretty good that the RAM and CPU are OK. So it could be something else, I'm not sure what. Any advice on how to test the rest of the system? I have a spare video card, so I'll probably try that later but over 2 hours in Heaven seems like a decent test. Any other suggestions?

Here are my specs:

AMD RYZEN 7 1700 8-Core 3.0 GHz
ASROCK AB350 Gaming-ITX/ac AM4
GIGABYTE AORUS Radeon RX 580
G.SKILL FORTIS 32GB (2 x 16GB) DDR4 2400
SILVERSTONE SST-SX650-G 650W SFX 80 PLUS GOLD
CRUCIAL MX500 2.5" 1TB SATA III
WD BLACK 2TB 7200 RPM SATA 6Gb/s

Thanks in advance.
 
Okay, I may have figured it out.

Found a StackOverflow post saying you need to enable AMD microcode firmware support in software options.

I just checked this box, and now things seem to be working (at least for the past hour or two).

AMD_Microcode.jpg


Played about 1 hour of DeadCore and no freezing. Will keep an eye out, but I really hope this was it.
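
For anyone wanting to confirm that a microcode update is actually loaded, here is a rough sketch that reads the revision the kernel reports. It assumes an x86 Linux box where /proc/cpuinfo has a `microcode` field per core; the function name is just something I made up for illustration.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revision strings found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Lines look like "microcode<TAB>: 0x...", one per logical core.
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        # More than one value here would mean cores disagree on the revision.
        print(microcode_revisions(cpuinfo.read_text()))
```

Comparing the printed revision before and after ticking the box (and rebooting) should show whether anything changed.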
 
Actually, it's not fixed. It was working perfectly for 2 days, but it's happened again. Any ideas?
 
So I think I finally found a solution.

I did a number of hardware stability tests from here: https://blog.codinghorror.com/is-your-computer-stable/ and everything passed without error.

Following this thread I set the cstate in grub: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085/comments/66

Then I went into BIOS and disabled AMD Cool n Quiet and disabled C6.
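
To double-check that a kernel parameter set in grub really made it onto the running kernel's command line, something like the sketch below could help. I'm assuming the parameter in question is `processor.max_cstate` (the exact value from the linked comment isn't reproduced here), and that it was added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub followed by update-grub and a reboot. The helper name is hypothetical.

```python
from pathlib import Path


def kernel_param(cmdline, name):
    """Look up a parameter on a kernel command line.

    Returns the value for "name=value", an empty string for a bare
    flag such as "quiet", and None when the parameter is absent.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        # Assumed parameter name; swap in whatever was actually set in grub.
        print(kernel_param(proc_cmdline.read_text(), "processor.max_cstate"))
```

If this prints None after a reboot, the grub change didn't take and the BIOS settings are the only thing doing any work.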

Was previously getting a 100% reproducible crash when loading a level in the game DeadCore. I was able to load a level and quit 10 times with no crashes. So far so good.

Hopefully this was really it.
 
Hmm.. wait. Well those changes helped but I'm not totally in the clear. Looks like I may be affected by the Ryzen Linux segfault issue. Running kill-ryzen fails within seconds.

I've now disabled opcache in BIOS and kill-ryzen has been running for 15 minutes without segfaults. Only other thing I haven't tried is bumping up core and soc voltages (which I'll try if the compile fails).

Really not looking forward to RMAing the CPU if I'm still affected, as apparently some people have RMAd and got bunk chips as replacements. But I'll see how things look.
 
So kill-ryzen still failed after running for 4 hours, but there weren't any segfaults this time. Maybe running out of memory or some other issue. I'm going to play some games and other tests but maybe I'm okay.
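
Since the question here is whether a failure was a segfault or something else, this is a minimal sketch of how to tell the two apart from a script. It is not kill-ryzen (which, as I understand it, loops parallel GCC builds); it just reruns one command sequentially and reports how the first failure ended. The function names are made up, and the command in the example is a placeholder.

```python
import signal
import subprocess
import sys


def died_from_segfault(returncode):
    """On POSIX, subprocess reports death by signal N as returncode -N."""
    return returncode == -signal.SIGSEGV


def run_until_failure(command, max_runs):
    """Run command up to max_runs times; return (runs_done, last_returncode)."""
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            return attempt, result.returncode
    return max_runs, 0


if __name__ == "__main__":
    # Placeholder workload; a real test would use a heavy compile job instead.
    runs, code = run_until_failure([sys.executable, "-c", "pass"], 3)
    print(runs, code, "segfault" if died_from_segfault(code) else "no segfault")
```

A non-zero return code that isn't a segfault would point toward something like the compiler running out of memory rather than the CPU bug.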
 
So, I had about 1 week of no issue, then it started freezing again. It happened just browsing Firefox, nothing intensive.

However, I updated to the latest BIOS on my mobo and things seem to be better. It's only been a few days, though, so I can't be sure, since last time it took a week between freezes.

If it happens again I will RMA the processor as it seems clear to me that has to be the issue now.
 
Was your 1700 from batch 26 or earlier?

If so, it's pretty damn likely you got a bad chip for Linux (segfaults etc.)

Yes, that is one thing I considered. Unfortunately, I didn't know enough about it when I built the machine, and it seems the only way to see the week number is the text printed on the chip (meaning I would have to remove the heatsink).

Well, I was aware of the segfault Linux issue when I was picking parts, but I (maybe naively) figured I would be getting a new chip at this point. But maybe not.
 
Yeah, if you've already disabled opcache and turned off XMP....

Might be time to check the batch # friendo.
 
Finally contacted AMD about getting a replacement. I hope they're easy to work with. At this point, I'm pretty sure it's the Linux segfault issue people were talking about.

While tweaking the settings did help somewhat, I paid for an 8 core chip and I want to be able to use all the features it's supposed to come with.
 
So, my chip was week 22 and everything before week 25 could potentially be affected.
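
For reference, here is a rough sketch of the week check. It assumes the date code on the heat spreader is a YYWW group followed by three letters (something like "UA 1722PGT", where the letters are a made-up example), and it uses the "before week 25 of 2017" cutoff mentioned above as the threshold. Both the layout and the function names are my assumptions, not anything official from AMD.

```python
import re


def parse_date_code(marking):
    """Extract (year, week) from a marking, assuming a YYWW + 3-letter layout."""
    match = re.search(r"(\d{2})(\d{2})[A-Z]{3}", marking)
    if match is None:
        return None
    year = 2000 + int(match.group(1))
    week = int(match.group(2))
    if not 1 <= week <= 53:
        return None
    return year, week


def possibly_affected(year, week):
    """True for chips made before week 25 of 2017 (cutoff taken from this thread)."""
    return (year, week) < (2017, 25)


if __name__ == "__main__":
    code = parse_date_code("UA 1722PGT")  # hypothetical marking, week 22 of 2017
    print(code, possibly_affected(*code))
```

A week 22 chip lands on the wrong side of that cutoff, which matches what I was seeing.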

Kind of a big pain in the butt to take the machine apart. The way my SFF case is, I just barely got the HSF and CPU removed without disassembling everything. However, there's almost no way I can put a new CPU back in without removing the motherboard from the case, so basically a rebuild. I had no choice, though.
 
So, I replaced the chip with a Ryzen 2600 and it's working great now. No more Firefox crashes, and kill-ryzen ran for 1 hour without segfault.

Did have to downgrade to an HSF with clips so I could install it without rebuilding the whole rig. The new cooler is pretty low-end, with a 55C idle temp, but I guess I can live with that.
 