Ryzen issues with Linux

Really this is AMD's fault. Some time ago they reset the AGESA numbers and started over.
1.0.0.4 is newer than the old 1.0.0.6b from last year.

According to various people on other forums, 1.0.0.6b is actually more stable than 1.0.0.4, at least on Gigabyte motherboards. There were a few reports of people solving most of their problems by going back to ancient BIOS versions. Unfortunately, no BIOS for my board has that AGESA revision, and none of the ones I tried made the board stable.
 
OK so the Asus board hard locked now after 5-6 (?) days of uptime, leading me back to the CPU again.

NOW all of this AMD garbage is going in the trash. Never buying an AMD CPU again.
 
I can understand your frustration with this. I am glad I waited to purchase Ryzen2 instead of Ryzen1 for my linux server / pvr.
 
Ryzen 2 suffers from the exact same issues as Ryzen 1. People have had the segfault bug on the 2x00G APUs and the normal desktop variants as well.
 
When were these claims regarding the normal desktop variants? I do remember reading something about the segfault problem but I also thought I read reports that it has been fixed since. So, when were these claims posted and have you heard/read anything about a fix?
 
There are kernel bug reports about Ryzen processors of all models and all configurations up until today. There is no single setting that will fix them, either. There are dozens of different methods involving playing musical chairs with hardware, dicking around with UEFI settings, various kernel boot parameters, compiling custom kernels with various options, etc.

You'd figure that millions of people plugging away at this problem for more than a year would be able to solve it, but that's not the case.
 
Do you have any sources that the bug is still present with Ryzen 2 processors?

Anyway, from what I read, I would try using a more recent kernel - I think the first one I'd try is 4.18 (which is stable now). I am currently using an older one but I have to upgrade my OS soon (I use Ubuntu). I have an older Intel system, though.

I found this discussion but it won't be of much help, I guess....

https://www.extremetech.com/computing/254750-amd-replaces-ryzen-cpus-users-affected-rare-linux-bug
https://community.amd.com/thread/225795

It was difficult to find anything recent about problems with the Ryzen 2 series of processors and kernel bug reports. At least, I couldn't find any.

It is disconcerting, though. Maybe, I should go with the Intel chip - to avoid this stuff? I dunno...
 

3 months ago with ongoing discussion. I can't find the bugzilla log I was reading before, but it had 450+ entries of people complaining about hard locks with all Ryzen CPUs, regardless of whether 1000 or 2000 series.

Buy Intel, Sell AMD.

If deep pockets, buy a Talos II POWER9 workstation and abandon the sinking x86 ship early. Both blue and red have severe problems with unfixable CPU bugs that date back to the mid 90s (meltdown, spectre, etc.)

At least with the POWER9, you can go all the way down to the CPU firmware to make sure there are no hidden intel management engines, cyrix control chips or AMD "TrustZone"
 
The idle freezes are almost certainly due to the C-states bug (affecting Ryzen 1000 and 2000 series) which was already mentioned: https://bugzilla.kernel.org/show_bug.cgi?id=196683
The workarounds mentioned there should avoid the problem entirely.
Note that for rcu_nocbs=... to work in particular, your kernel needs CONFIG_RCU_NOCB_CPU enabled.
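For anyone who wants to check both halves of that note, here is a rough sketch. The helper names are made up for illustration and are not from the bug report, and the config file location mentioned below is an assumption that holds on many distros but not all of them.

```shell
# Hypothetical helpers for illustration only.

# Succeeds if the given kernel config file has RCU callback offloading built in.
has_rcu_nocb() {
    grep -q '^CONFIG_RCU_NOCB_CPU=y' "$1"
}

# Succeeds if the given kernel command line string contains an rcu_nocbs= parameter.
has_rcu_nocbs_param() {
    case " $1 " in
        *" rcu_nocbs="*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a distro that ships its config as /boot/config-$(uname -r) (an assumption, some only expose /proc/config.gz), you could run: has_rcu_nocb "/boot/config-$(uname -r)" && has_rcu_nocbs_param "$(cat /proc/cmdline)" && echo "rcu_nocbs should take effect". If the first check fails, the boot parameter is most likely being ignored.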
 
This doesn't work for everyone, myself included.
 
A couple of possible solutions here: downgrade to AGESA 1.0.0.2c, or disable the PSP crypto module in the kernel (you may need to recompile for that).
 
This, perhaps?:
https://bugzilla.kernel.org/show_bug.cgi?id=196683

If it's happening to you, I would make sure the mobo is using the most current BIOS (you are using an Asus mobo, maybe?), then go into the BIOS settings and check that the 'Power Supply Idle Control' setting is at 'Typical Current Idle.' You only have to read the last few comments in that thread. I would try that and see if it makes a difference.
 
Already tried all of the "fixes" outlined in the thread and elsewhere. I've spent weeks doing this.

Both boards have had their BIOSes flashed to different versions, UEFI settings changed, etc.
 
Okay, sorry. I don't mean to annoy you - if you have tried it already and are familiar with the links I posted. That really sucks. I use Linux primarily, so I think it is worrisome to consider an AMD (Ryzen) system - I know I would be annoyed with frequent freezes or crashes stemming from some BIOS/power or cpu-related issue - the more frustrating part of it seems to be no real explanation or sign of an upcoming solution. I think on the amd reddit, some AMD employees sometimes visit so someone should post about this?

I'm not sure what to go with but perhaps, I should re-consider the Intel build - the Spectre/Meltdown thing sucks but the latest reports indicate it's a small penalty and the fixes seem to be okay. It hasn't stopped people from still buying Intel hardware - I am only buying the mobo new so not really supporting them (either) that much. :)
 
Okay, sorry. I don't mean to annoy you - if you have tried it already and are familiar with the links I posted.

I think you'd be annoyed too if you spent hundreds of dollars on hardware that was broken and manufacturers offer no definitive fix for it being broken, AND you can't return or sell it because it's defective. I keep a fairly hefty pile of spare hardware around, so I could swap everything multiple times (except the board and CPU), but it was still an aggravating process of the machine taking up room and being a constant state of discombobulation.

I'm not sure what to go with but perhaps, I should re-consider the Intel build - the Spectre/Meltdown thing sucks but the latest reports indicate it's a small penalty and the fixes seem to be okay. It hasn't stopped people from still buying Intel hardware - I am only buying the mobo new so not really supporting them (either) that much. :)

The Meltdown software fixes only minimally impact the latest hardware. You have to understand that nearly every Intel CPU since the Pentium Pro is affected by it, and the older you go, the bigger the performance impact. Beyond a certain point, though, software patches are unavailable: Microsoft isn't going to blow the dust off the Windows 98 source code and implement a fix for it. For others it's slightly more problematic. There are still plenty of late Core 2 CPUs and machines floating around running Windows 7/8/10, and those machines are going to be the most heavily impacted with perf hits.

I would recommend Intel, but not their latest gen parts because Linux support is spotty. The 6th gen is the latest I'd go right now, which is what I use for my main gaming rig.
 
I hear ya.... also, I read the other replies to you on this and I read the replies in those threads of people telling you to modify kernel lines/code etc. etc. and I think that's outrageous. Even one person said 'you shouldn't have to do this' and I agree with him. I think it was in the bugzilla thread.

I'm not familiar with Linux issues with Coffee Lake builds, but I think it can't be as 'screwy' as the Ryzen issues. Having your computer freeze because it is idle is pretty bad. It's also difficult (at least for me) to determine how much has been 'fixed', as it doesn't seem to impact everyone, and I hate trying to fix an issue where someone claims 'I don't have that issue' and then you find multiple people who say they do. I'd rather not risk having that problem because, like you said, if you have a component that has a rep for issues, it's difficult to sell. Even if the prospective buyer isn't a Linux user, if they are familiar with the topic, it's bad. However, I think I read that this issue doesn't occur in Windows, which is peculiar, isn't it?

I think the Coffee Lake processors have a performance impact but it's not substantial so it might be an issue that is easier to live with than Ryzen processors that are part of a system that crashes or freezes at times or in which you are always worried it will happen. It seems to be a widespread problem for some people or for many people, it's difficult to say now but another issue is AMD's reaction. I know that they admitted to some other (different?; same?) problem and were willing to exchange processors. However, that didn't seem to solve the problem for a lot of customers. Maybe, it's a defect or flaw in the component that can't easily be fixed or something? Or perhaps, only some lucky owners can 'find' a fix? Dunno.

P.S. I have an Intel Core 2 Quad CPU, a Q6600. I believe I'm affected. :-(
 
I'm posting this on the possible chance you haven't tried this one setting change yet...

Link:

"Deep Sleep State": I'm not familiar with it, but apparently it's disabled in the BIOS, and the author says to enable that setting. If you are still having issues (which you obviously are?) and you haven't tried that change, then I suggest you have nothing to lose by trying it. Then enable the other settings (C6 etc.) and don't use kernel parameter edits.
 
However, I think I read that this issue doesn't occur in Windows which is peculiar, isn't it?

Nope, it happens in Windows too, but it's again only certain people that have it. I never installed Windows on this box because it's not what it's going to be used for. I'm sending the Gigabyte board off for RMA and see what they do. If they end up giving me the same board back, I'm just going to wait until the Athlon 200GE is available, make a crappy Windows box with it and sell it on.

I know that they admitted to some other (different?; same?) problem and were willing to exchange processors. However, that didn't seem to solve the problem for a lot of customers. Maybe, it's a defect or flaw in the component that can't easily be fixed or something? Or perhaps, only some lucky owners can 'find' a fix? Dunno.

I don't know how AMD historically has handled RMAs for CPUs, but they could be trying to save face. That and people like me are the minority of the minority in their market, it probably doesn't cost them much to replace said defective parts.

P.S. I have an Intel Core 2 Quad CPU, a Q6600. I believe I'm affected. :-(

Like I said, nearly all Intel CPUs going back to the Pentium Pro are affected. I have a Pentium Dual-Core E6300 in service as a router, and it has both the Meltdown and Spectre CPU bug flags listed:

Code:
[root@Phobos-IV]# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Pentium(R) Dual-Core  CPU      E6300  @ 2.80GHz
stepping        : 10
microcode       : 0xa0b
cpu MHz         : 1600.024
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm pti tpr_shadow vnmi flexpriority dtherm
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 5599.98
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
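
If you only want the bugs line rather than the whole dump, something like the following should do it. This is a minimal sketch: cpu_bugs is a made-up helper name, and it assumes the file you hand it is laid out like /proc/cpuinfo on x86.

```shell
# Hypothetical helper: print the kernel-reported CPU bug flags for the
# first logical CPU, read from a file in /proc/cpuinfo format.
cpu_bugs() {
    # Split each line on the colon (plus any following spaces) and
    # stop after the first "bugs" entry.
    awk -F': *' '/^bugs[ \t]*:/ { print $2; exit }' "$1"
}
```

Running cpu_bugs /proc/cpuinfo on the router above would print the four flags from the bugs line. On kernels from roughly 4.15 onward, the files under /sys/devices/system/cpu/vulnerabilities/ also show whether each issue is mitigated, which the bugs line alone does not tell you.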
 
I'm posting this on the possible chance you haven't tried this one setting change yet...

Link:

"Deep Sleep State": I'm not familiar with it, but apparently it's disabled in the BIOS, and the author says to enable that setting. If you are still having issues (which you obviously are?) and you haven't tried that change, then I suggest you have nothing to lose by trying it. Then enable the other settings (C6 etc.) and don't use kernel parameter edits.

I'll have to try this. I don't remember seeing the option in UEFI setup, but I wasn't looking for it.
 
I thought for sure someone had already suggested that, but it looks like that's the first. If it's not in advanced cpu configuration, check AMD CBS -> zen common options (this menu is hidden on most gigabyte firmware versions, not sure about the latest as I haven't upgraded).
 
What distro and what kernel? Maybe something bleeding edge like openSUSE Tumbleweed which has a very current kernel version, that's what I use on my rma'd 1500X + Asus Prime B350.
 
ASRock calls the setting that fixes this "global C-state control" (actually affecting DF / uncore apparently). I would be uncomfortable disabling that with 2800+ RAM, because that bumps VSoC way up on OG Zen, but with 2666 that shouldn't be a problem and it has surprisingly little effect on power consumption.

(I had the same problem with an R7 1700 and that's been a solid fix.)
 
What distro and what kernel? Maybe something bleeding edge like openSUSE Tumbleweed which has a very current kernel version, that's what I use on my rma'd 1500X + Asus Prime B350.

Currently, Fedora 28 with the latest updates. But I have tried Xubuntu 18.04 and OpenSUSE 42.3. I'm reluctant to change the OS again since I've spent the last day migrating files, scripts and configs over to it from my backup server and don't really want to have to do it again. Might do more experiments when the Gigabyte board comes back from RMA.

Fedora 28 on the Asus Prime B450M-A and the RMA'd CPU has only frozen once on a screen saver, it hasn't done it again yet. I'm going out of town the next 4 days so I'll see if it's locked up when I return.

ASRock calls the setting that fixes this "global C-state control" (actually affecting DF / uncore apparently). I would be uncomfortable disabling that with 2800+ RAM, because that bumps VSoC way up on OG Zen, but with 2666 that shouldn't be a problem and it has surprisingly little effect on power consumption.

(I had the same problem with an R7 1700 and that's been a solid fix.)

I believe I remember seeing that option and I think that I turned it on, but not 100% sure. I'll have to look at it when I get back in 4 days.

I haven't run the RAM at its rated speed but once on the old motherboard and it was so unstable that I just opted to leave it at 2133. Currently the Patriot RAM isn't even in the machine because I think it's part of the problem. It's currently running on a single 4 GB stick of generic Altex RAM.
 
What happened when you tried enabling that setting? Btw, Fedora 28? What kernel are you using? I think you should be using, at least, kernel 4.18.8 or later - so might need to upgrade the kernel if you're not at that version or later. Most of the people claiming their system was working seemed to be using a really recent kernel and I think the stable one is 4.18 but people were still saying they were having lockups. I haven't read of anyone using 4.18.8 or later claiming any.
 
It's something that's enabled by default, and the fix is disabling it, not the other way around. IIRC the options are auto/enabled/disabled though, so that may not be completely clear just looking at it.
 
What happened when you tried enabling that setting? Btw, Fedora 28? What kernel are you using? I think you should be using, at least, kernel 4.18.8 or later - so might need to upgrade the kernel if you're not at that version or later. Most of the people claiming their system was working seemed to be using a really recent kernel and I think the stable one is 4.18 but people were still saying they were having lockups. I haven't read of anyone using 4.18.8 or later claiming any.

I'm out on a business trip right now, I left before I had another chance to look at the machine. I'll be back on thursday and have more time to play with it.
 
It's something that's enabled by default, and the fix is disabling it, not the other way around. IIRC the options are auto/enabled/disabled though, so that may not be completely clear just looking at it.
Did you read that post by that Reddit user? He claimed that the default was that it is disabled. The other settings are enabled. The only way you can both be right is if the mobo manufacturers were changing their BIOS defaults?
 
They use different menu names and hide different settings, different defaults would not surprise me. That said, doesn't hurt to try both.
 
Sorry, I was responding to "I believe I remember seeing that option and I think that I turned it on, but not 100% sure."

My board has a deep sleep option (in south bridge settings), but I've got to admit I'm a little skeptical of it being a fix. If you hit up AMD support about this, their fixes are VSoC and Vcore (in that order), and deep sleep appears to be something fairly far removed from that about S-states. Global C-state control (in AMD CBS -> Zen Common Options) looks like it disables some low-power DF/uncore states achievable with the system idling normally. My interpretation of the problem (could be way off-base) is that the process of bringing the DF up and out of some low-power state is bugged, and can be fixed either by not going into that low-power state or by throwing a lot more voltage at it.

My rig is an ASRock AB350 Gaming-ITX/ac w/ 1.0.0.6b and R7 1700 stock-clocked (warranty replacement golden sample for the compile segfault problem), and it'll run into the idle hang weekly or so without a fix at 912mV VSoC / 2666 RAM. Going to 1075mV / 2800 fixes it (those are default VSoCs for 2666 and 2800), but backing off the voltage at all from there at 2800 makes it happen again. At 912/2666 with global C-state control disabled, it seems rock solid (haven't run into the hang in probably close to a year of use now).
 
Don't toss it! Most ram is lifetime warranted, you could sell it as is, or RMA it and sell what you get back.

Turns out the system had two "problems". First is that the ram really doesn't make rated 15-15-15 @ 3600 or 10-10-10@2400, but bumping the second number up a notch cures that, I'm too lazy to rma for that. Second was cpu overclock which I hadn't intended to leave from some prior messing around and benchmarking runs. Which sabotaged my memory testing, fml.
 
May not help you at all, and I skimmed through the whole thread but may have missed it. Still, I've had issues with it on more than one occasion, and I tend to forget it often, so it doesn't hurt to try:

I see you updated BIOSes. Did you try resetting settings using the Clear CMOS jumper on the MoBo?
 
Just got back from my business trip from hell and the Ryzen system hasn't locked up or crashed yet, so at least it's a bit more stable. I'll have to re-test with this Patriot RAM though, it's been running on the generic 4 GB stick.

ASRock calls the setting that fixes this "global C-state control" (actually affecting DF / uncore apparently). I would be uncomfortable disabling that with 2800+ RAM, because that bumps VSoC way up on OG Zen, but with 2666 that shouldn't be a problem and it has surprisingly little effect on power consumption.

(I had the same problem with an R7 1700 and that's been a solid fix.)

Here are my settings from the "AMD CBS" page in UEFI setup:

Core performance Boost - Disable
Memory Interleaving - None
IOMMU - Auto
Global C-State Control - Enabled
Power Supply Idle Control - Typical Current Idle
Opcache Control - Disabled

What happened when you tried enabling that setting? Btw, Fedora 28? What kernel are you using? I think you should be using, at least, kernel 4.18.8 or later - so might need to upgrade the kernel if you're not at that version or later. Most of the people claiming their system was working seemed to be using a really recent kernel and I think the stable one is 4.18 but people were still saying they were having lockups. I haven't read of anyone using 4.18.8 or later claiming any.

Just checked the Fedora 28 install, it's running 4.18.8-200.fc28.x86_64.
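
For anyone else comparing their kernel against the 4.18.8 mark mentioned above, here is a rough sketch of the check. kernel_at_least is a made-up helper name, and it assumes GNU sort with the -V (version sort) option is available.

```shell
# Hypothetical helper: succeeds if kernel release $1 is at least version $2.
# Assumes GNU sort, which provides -V for version-aware ordering.
kernel_at_least() {
    have="${1%%-*}"   # strip the distro suffix, e.g. "-200.fc28.x86_64"
    want="$2"
    # The smaller of the two versions sorts first; if that is the wanted
    # version, the running kernel is the same or newer.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

Typical use would be: kernel_at_least "$(uname -r)" 4.18.8 && echo "new enough". Note that it only compares the upstream version number and ignores the distro build suffix, so it says nothing about patches a distro may have backported.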

May not help you at all, and I skimmed through the whole thread but may have missed it. Still, I've had issues with it on more than one occasion, and I tend to forget it often, so it doesn't hurt to try:

I see you updated BIOSes. Did you try resetting settings using the Clear CMOS jumper on the MoBo?

It did that automatically on the Gigabyte board when the BIOS was flashed, haven't changed the BIOS on the Asus board since it was already the latest.
 
So I re-installed the Patriot memory to do more testing and decided on memtesting it again just to be sure because of all the other ridiculous problems I've been having.

Sure enough, these modules went bad again. The initial 8 hour memtest when I first got the replacement modules a couple weeks ago passed with no errors.


This is the second set of memory modules I've gotten consecutively which were bad. They're not as bad as the memory in the OP video, but still bad nonetheless.

So far I've had a bad PSU, bad board, bad CPU and four bad memory modules, all on a new build. Maybe I should go back to buying sketchy eBay parts from Chinese import sellers; I seem to have better luck with those.
 
Is this 2x4GB? What are the speed and primary memory timings (e.g. DDR4-2400 15-15-15-35 2T)? What voltage? Got 'em in the right slots for parallel access, one on channel A and one on channel B?
 
Doesn't matter what speed (2133 or 2666) or what slots they're in, they generate errors regardless. The 2133 timings are 15-15-15-36, dunno what the 2666 timings are.

The memory error addresses move around when the sticks are swapped, so it looks like only one of the sticks is bad.

The modules don't specify they want additional voltage to run at any speed and shouldn't need it.
 
This is probably a stupid question, but have you considered trying Kingston? Or is this maybe a kit or brand known to have incompatibilities? My buddy did a build with an ASRock or something, and the board or chipset doesn't seem to like Ripjaws.
 
I can tell you I have had no issues with my Crucial DDR4-2400 ECC unbuffered DIMMs (2 x 8GB) on my X470 board, although you probably did not want ECC. I got that because the Ryzen system is going to be a server / Linux PVR device.
 
That is really bad luck you are having there.

I have had a pretty decent experience with my Ryzen build, but I only run Linux in virtualbox. I am tempted to install Ubuntu alongside Windows to see if it has any issues....
 