New Build, Getting Freezing and Crashes. Advice?

Discussion in 'General Hardware' started by cybereality, Jan 8, 2018.

  1. cybereality

    cybereality 2[H]4U

    Messages:
    3,088
    Joined:
    Mar 22, 2008
    So I just built a new PC and it's not stable. I'm running Ubuntu now, and I've gotten repeated freezes (total lockup: the image stays on screen, but the PC cannot be recovered without pressing the power switch) and some game crashes. The freezes have happened at different times: while browsing in Firefox, while just typing in the Terminal, or while compiling code. One game, DeadCore, crashes to desktop about 1 minute into a level (I tried about 5 times, same deal). I've also tried both Ubuntu 17.10 and 16.04, and both froze within minutes of reaching the desktop, without anything even being installed. I feel like the issue may be in the hardware, but I'm not sure what.

    This is what I've tried so far:

    - 24 hours of MemTest86+, completed 8 passes, no errors.
    - 22 hours of Prime95, blended stress test, no errors.
    - 2 and a half hours with Unigine Heaven running in a loop, 1080p medium, no crashing.

    I feel pretty good that the RAM and CPU are OK. So it could be something else, I'm not sure what. Any advice on how to test the rest of the system? I have a spare video card, so I'll probably try that later but over 2 hours in Heaven seems like a decent test. Any other suggestions?

    Here are my specs:

    AMD RYZEN 7 1700 8-Core 3.0 GHz
    ASROCK AB350 Gaming-ITX/ac AM4
    GIGABYTE AORUS Radeon RX 580
    G.SKILL FORTIS 32GB (2 x 16GB) DDR4 2400
    SILVERSTONE SST-SX650-G 650W SFX 80 PLUS GOLD
    CRUCIAL MX500 2.5" 1TB SATA III
    WD BLACK 2TB 7200 RPM SATA 6Gb/s

    Thanks in advance.
     
  2. cybereality

    cybereality 2[H]4U

    Okay, I may have figured it out.

    Found a StackOverflow post saying you need to enable AMD microcode firmware support in the software options.

    I just checked this box, and now things seem to be working (at least for the past hour or two).

    AMD_Microcode.jpg

    Played about 1 hour of DeadCore and no freezing. Will keep an eye out, but I really hope this was it.
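
    One way to confirm that a microcode update actually took effect is to look at the microcode field in /proc/cpuinfo before and after the change. A minimal sketch, assuming an x86 Linux system that exposes that field; the function name and the sample revision value in the comment are illustrative, not taken from this machine:

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct values of the 'microcode' field in cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only keep lines of the form "microcode : <revision>".
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


# Typical use on a live system (the path exists on Linux only):
#   with open("/proc/cpuinfo") as f:
#       print(microcode_revisions(f.read()))
# The revision string (for example "0x8001129", a made-up sample) should
# change after the update if the new microcode was loaded.
```

    All cores should normally report the same revision, so the set is expected to hold a single value.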
     
  3. cybereality

    cybereality 2[H]4U

    Actually, it's not fixed. It was working perfectly for 2 days, but it's happened again. Any ideas?
     
  4. cybereality

    cybereality 2[H]4U

    So I think I finally found a solution.

    I did a number of hardware stability tests from here: https://blog.codinghorror.com/is-your-computer-stable/ and everything passed without error.

    Following this thread, I set the cstate in GRUB: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1690085/comments/66

    Then I went into BIOS and disabled AMD Cool'n'Quiet and disabled C6.
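
    After editing GRUB and rebooting, it's worth checking that the option really reached the kernel by reading /proc/cmdline. A rough sketch of that check follows. The post doesn't say which exact parameter the linked comment recommends, so processor.max_cstate (a real kernel parameter that caps the C-state depth) is used purely as an assumed example, and the function name is made up:

```python
from typing import Dict, Optional


def parse_kernel_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {parameter: value}.

    Bare flags such as "quiet" map to None. Quoted values containing
    spaces are not handled in this simplified version.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Typical use on a live system:
#   with open("/proc/cmdline") as f:
#       params = parse_kernel_cmdline(f.read())
#   print(params.get("processor.max_cstate"))  # assumed parameter name
```

    If the lookup returns None for the parameter you set, the GRUB config was probably not regenerated (update-grub on Ubuntu) before the reboot.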

    Was previously getting a 100% reproducible crash when loading a level in the game DeadCore. I was able to load a level and quit 10 times with no crashes. So far so good.

    Hopefully this was really it.
     
  5. cybereality

    cybereality 2[H]4U

    Hmm.. wait. Well those changes helped but I'm not totally in the clear. Looks like I may be affected by the Ryzen Linux segfault issue. Running kill-ryzen fails within seconds.

    I've now disabled opcache in BIOS and kill-ryzen has been running for 15 minutes without segfaults. Only other thing I haven't tried is bumping up core and soc voltages (which I'll try if the compile fails).

    Really not looking forward to RMAing the CPU if I'm still affected, as apparently some people have RMAd and got bunk chips as replacements. But I'll see how things look.
     
  6. cybereality

    cybereality 2[H]4U

    So kill-ryzen still failed after running for 4 hours, but there weren't any segfaults this time. Maybe it ran out of memory or hit some other issue. I'm going to play some games and run other tests, but maybe I'm okay.
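
    To tell a segfault apart from an out-of-memory failure in a long compile run, the build output can be scanned for a few telltale messages. A rough sketch only: the marker strings below are assumptions about what typically shows up in compiler output, not a documented kill-ryzen log format, and the function name is hypothetical:

```python
def classify_build_failures(log_text: str) -> dict:
    """Count log lines that look like segfaults vs. memory exhaustion."""
    counts = {"segfault": 0, "out_of_memory": 0}
    # Assumed marker strings; adjust to whatever your build actually prints.
    oom_markers = ("virtual memory exhausted", "cannot allocate memory", "out of memory")
    for line in log_text.splitlines():
        lowered = line.lower()
        if "segmentation fault" in lowered:
            counts["segfault"] += 1
        elif any(marker in lowered for marker in oom_markers):
            counts["out_of_memory"] += 1
    return counts
```

    A run that fails with zero segfault lines but some out-of-memory lines points at RAM or swap limits rather than the CPU issue.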
     
  7. cybereality

    cybereality 2[H]4U

    So, I had about 1 week with no issues, then it started freezing again. It happened while just browsing in Firefox, nothing intensive.

    I updated to the latest BIOS on my mobo, though, and things seem to be better. Seeing as it's only been a few days, I can't be sure, since last time it took a week between freezes.

    If it happens again I will RMA the processor as it seems clear to me that has to be the issue now.
     
  8. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    12,408
    Joined:
    Oct 7, 2000
    Try manually bumping both the CPU and RAM voltage a little to see if it helps stabilize it.
     
  9. cybereality

    cybereality 2[H]4U

    OK thanks. That was the last thing I hadn't tried. Just bumped CPU up 150 mV, SoC up 175 mV, and RAM to 1.3V. I'll see how that goes.
     
  10. horrorshow

    horrorshow [H]ardness Supreme

    Messages:
    4,569
    Joined:
    Dec 14, 2007
    Was your 1700 from batch 26 or earlier?

    If so, it's pretty damn likely you got a bad chip for Linux (segfaults etc.).
     
  11. pendragon1

    pendragon1 [H]ardForum Junkie

    Was not aware of this issue; I don't deal with Linux too much.
     
  12. cybereality

    cybereality 2[H]4U

    Yes, that is one thing I considered. Unfortunately, I didn't know enough about it when I built the machine, and it seems the only way to see the week number is the text printed on the chip (meaning I would have to remove the heatsink).

    Well, I was aware of the Linux segfault issue when I was picking parts, but I (maybe naively) figured I would be getting a newer chip by this point. But maybe not.
     
  13. horrorshow

    horrorshow [H]ardness Supreme

    Yeah, if you've already disabled opcache and turned off XMP....

    Might be time to check the batch # friendo.
     
  14. cybereality

    cybereality 2[H]4U

    Finally contacted AMD about getting a replacement. I hope they're easy to work with. At this point, I'm pretty sure it's the Linux segfault issue people were talking about.

    While tweaking the settings did help somewhat, I paid for an 8 core chip and I want to be able to use all the features it's supposed to come with.
     
  15. cybereality

    cybereality 2[H]4U

    So, my chip was week 22 and everything before week 25 could potentially be affected.
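
    The week check itself is just a comparison. A tiny sketch, assuming the cutoff quoted in this post (week 25); it's not an official AMD figure, and an earlier reply in the thread mentions "batch 26 or earlier" instead, so the cutoff is left as a parameter. The function name is made up, and the production year is ignored:

```python
def potentially_affected(production_week: int, first_safe_week: int = 25) -> bool:
    """True when the chip was made before the cutoff week quoted in the thread."""
    if not 1 <= production_week <= 53:
        raise ValueError("production_week must be between 1 and 53")
    # Chips from weeks strictly before the cutoff are the ones in question.
    return production_week < first_safe_week


# The week 22 chip from this thread falls before the assumed cutoff:
#   potentially_affected(22)  -> True
```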

    Kind of a big pain in the butt to take the machine apart. The way my SFF case is, I just barely got the HSF and CPU removed without disassembling everything. However, there's almost no way I can put a new CPU back in without removing the motherboard from the case, so it's basically a rebuild. I had no choice, though.
     
  16. cybereality

    cybereality 2[H]4U

    So, I replaced the chip with a Ryzen 5 2600 and it's working great now. No more Firefox crashes, and kill-ryzen ran for 1 hour without a segfault.

    Did have to downgrade to a HSF with clips so I could install it without rebuilding the whole rig. The new cooler is pretty low-end, at 55C idle temp, but I guess I can live with that.