X470 ECC Support

IdiotInCharge

NVIDIA SHILL
Joined
Jun 13, 2003
Messages
14,710
I have zfs on several single drive installations (home and work). I make good use of the other features like snapshots and send / receive.

Understood- however, giving up trim on a modern drive is unlikely to be a winning proposition on an OS drive. From a usability perspective, and this is absolutely subjective, it might be worth investigating alternatives so that you can keep trim support.
 

drescherjm

[H]F Junkie
Joined
Nov 19, 2008
Messages
14,919
I just finished reading the entire thread. I am not sure that this issue is what I have.

What happens to me is that after some period of time the system hard locks. After pressing the reset button, sometimes (not always) the NVMe drive is not detected. When the NVMe drive disappears, I have to first power off the machine, then boot from a sysrescue USB stick, reinstall grub on the NVMe drive, and reboot.

There is one other weird symptom that has happened twice. If I was logged in via ssh (GUI already locked up) when this happened, htop showed one thread using 100% of a core in a kernel task. Several tasks were unkillable. I could still read and write to the NVMe drive, but eventually the ssh session would disconnect.

Before the kernel update (that I mention in the previous post) I was able to trigger this behavior by rebuilding a lot of large packages in gentoo (like chromium).


BTW, the NVMe drive has ZOL installed and always scrubs without errors, even after it disappears from the system.

Thankfully replacing the WDC Black G1 NVMe drive with a Samsung 960 1TB NVMe seems to have fixed the problem.
 

Jandor

Gawd
Joined
Dec 30, 2018
Messages
585
I know this is already old.
However I made 2 systems using ECC memory.
1st is using an Asus Prime B350M-K + Ryzen 1600X + 2x4GB Kingston Value DDR4 2400 ECC RAM (non Micron).
Ran flawlessly from the beginning. Windows 7 Pro OEM, used as a server. OS on SATA RAID 1 made of a Crucial MX500 250GB and a Samsung 860 Evo 250GB. Both SATA drives in RAID mode. I didn't take the time to upgrade the SSD firmwares.

2nd is an Asus Prime B350M-A + Ryzen 2600 + 2x8GB Kingston Value DDR4 2400 ECC RAM (non Micron).
Ran from the beginning, but only after a needed BIOS upgrade (used a Ryzen 1700 for that). Windows 7 Pro OEM. Running 2 SATA SSDs as RAID 1 with a Crucial MX500 500GB and a Samsung 860 Evo 500GB.

The ECC memory I grabbed on Amazon was the least expensive I could find.

So both are running fine. I stopped all Microsoft updates as of February 2017. Using MS antivirus and Windows Firewall. May swap to Avira and Comodo in the future.

Opinions needed.
 

Jandor

Gawd
Joined
Dec 30, 2018
Messages
585
Opinions on what?
Opinions on the hardware and how it's used. I like zero-problem hardware that doesn't need maintenance. The people using it will be on a user profile.

Do the SSDs need a firmware upgrade to run safely in RAID 1 mode? Also, I heard new SSDs do not need Trim.
 

drescherjm

[H]F Junkie
Joined
Nov 19, 2008
Messages
14,919
I am running my Ryzen 2700 X470 ECC server / DVR box with Gentoo Linux as the OS and MythTV as the DVR software. I originally had a 5XX GB WDC Black G1 NVMe drive but found that it had an incompatibility with my board. This system is to replace a Core 2 Quad based system that ran 24/7/365 for over 10 years with over 99% uptime. I have not finished the transfer yet (lack of free time + an injury got in the way). The system has been stable since the BIOS update, although I do have an Arduino-based watchdog that monitors it to ensure it will reboot if it crashes. With that said, it has not needed to do so, and I have yet to actually connect the reset switch for it to take action.
 

Jandor

Gawd
Joined
Dec 30, 2018
Messages
585
Well, to be precise.
I put 2 SSDs in RAID 1 mode: Crucial MX500 250GB and Samsung 860 Evo 250GB. The RAID 1 volume is 220GB, leaving 30GB for overprovisioning, if it works. The firmware is the original on both, and I didn't use any tool to extend the hardware overprovisioning. Not sure if it works. Don't know how to check (all SSD tools are made for AHCI mode). For now it's 100% functional. Windows 7 Pro.
Another one: 2 volumes, RAID 1. Same brands, 500GB. Updated firmwares (by going through AHCI mode first). Created a 400GB volume, leaving 100GB for overprovisioning (again, not hardware).

Opinions and advice needed.
Do you believe the overprovisioning is working, and that Trim is not needed? I even wonder whether the drivers use Trim by themselves. It is supposed to be very easy to implement Trim at the driver level on RAID 1 (not RAID 0).
For instance, I'm thinking that if Trim doesn't work, the continuous RAID 1 sync activity may keep the drive from using the overprovisioned space while idle, because it never idles. Also, the free space left for overprovisioning may not be seen by the SSD controller on RAID 1 volumes. I wonder about all those things.
Those computers are not going to be heavy-duty servers! But a cell in those SSDs may support only around 500 writes before failing in the worst case, so I'm trying to be on the safe side about rewrites and the things that matter. MLC SSDs are only twice as good, and SLC would be great if still produced (like 50 times better).
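
To put numbers on the two setups above, here is a minimal sketch that computes how much space is left unallocated relative to the RAID volume. The function name `spare_fraction` is made up for illustration. It only does the arithmetic; it says nothing about whether the SSD controller actually treats that space as spare, which is exactly the open question.

```python
def spare_fraction(drive_gb: float, volume_gb: float) -> float:
    """Unallocated space as a fraction of the RAID volume size."""
    if not 0 < volume_gb <= drive_gb:
        raise ValueError("volume must be positive and no larger than the drive")
    return (drive_gb - volume_gb) / volume_gb


# The two setups described above: (drive size, RAID 1 volume size) in GB.
for drive_gb, volume_gb in [(250, 220), (500, 400)]:
    share = spare_fraction(drive_gb, volume_gb)
    print(f"{drive_gb} GB drive, {volume_gb} GB volume: {share:.1%} unallocated")
```

So the first box leaves roughly 14% on top of the volume and the second leaves 25%, assuming the unallocated blocks really are free from the drive's point of view.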
 

Jandor

Gawd
Joined
Dec 30, 2018
Messages
585
What is Gigabytes cheapest 6-layer PCB socket AM4 board?
It's an old post, but I looked into it as I like Gigabyte. They were out of stock so I went with Asus.
There is the old AX370 Gaming 5 (not K5), and also the AB350n (mini-ITX) and the B450i (mini-ITX). Those are the most inexpensive AM4 boards from Gigabyte that support ECC and are also not so full of LED stuff, which I personally hate.
 

Warriorprophet

[H]ard|Gawd
Joined
May 22, 2001
Messages
1,507
DDR4 already has built in error correction that is seriously fine unless your system is literally going to be powered on for years at a time where unswapped RAM pages might actually degrade...

If you reboot your system more often than annually, and you are not compiling things like OS kernels for a living, then you'll be fine and see WAY better performance with normal DDR4 ram.
 

osrk

[H]ard|Gawd
Joined
Jan 10, 2003
Messages
1,953
DDR4 already has built in error correction that is seriously fine unless your system is literally going to be powered on for years at a time where unswapped RAM pages might actually degrade...

If you reboot your system more often than annually, and you are not compiling things like OS kernels for a living, then you'll be fine and see WAY better performance with normal DDR4 ram.
For NAS users that use ZFS it's apparently pretty instrumental in data integrity. This is the only reason I buy ECC. Other than that the standard stuff is fine.
 

Warriorprophet

[H]ard|Gawd
Joined
May 22, 2001
Messages
1,507
For NAS users that use ZFS it's apparently pretty instrumental in data integrity. This is the only reason I buy ECC. Other than that the standard stuff is fine.

Only on DDR3 platforms; on DDR4 it's unnecessary and you are just sacrificing performance. There are tons of more knowledgeable people who have discussed this topic: Level1Techs, forum posts with anecdotal and long-term testing, et al.
 

osrk

[H]ard|Gawd
Joined
Jan 10, 2003
Messages
1,953
Only on DDR3 platforms; on DDR4 it's unnecessary and you are just sacrificing performance. There are tons of more knowledgeable people who have discussed this topic: Level1Techs, forum posts with anecdotal and long-term testing, et al.
Do you mind sharing then?
 

Jandor

Gawd
Joined
Dec 30, 2018
Messages
585
Do you mind sharing then?
I think there is nothing to share. If you use DDR4 RAM without ECC you WILL experience bit errors, and newer RAM using thinner gates will become even more sensitive to this. There are 2 known reasons. The first one is the quality of the RAM. Whatever you do, there may be some bits on a chip that are prone to errors. This may happen on random occasions, like a small current surge, hotter temps in the case, or things like that. But if the chip is tested and of great quality and your system provides clean power, the risk of this happening is low and will not become much bigger with time. Circuits do wear with time, and RAM circuitry is more prone to wear than, for instance, a CPU.
Now there is another type of bit error that is unavoidable, and that is due to natural radioactivity. As far as I remember, in the best case and with the best quality hardware, you would see around 1 bit error per day for 32GB of RAM. That may not affect you, but there is a chance that it corrupts a file system in the OS or a data file of yours. And that is true only with the best hardware.
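
To get a feel for what that figure would imply, here is a rough sketch. It takes the "around 1 bit error per day for 32GB" number at face value (it is a recollection, not a measurement) and assumes errors arrive independently at a constant rate that scales linearly with RAM size, i.e. a Poisson model. The function name is made up for illustration.

```python
import math


def p_at_least_one(errors_per_day_per_32gb: float, ram_gb: float, days: float) -> float:
    """Chance of seeing one or more bit errors, under a Poisson assumption."""
    # Expected number of errors scales with RAM size and elapsed time.
    expected = errors_per_day_per_32gb * (ram_gb / 32.0) * days
    return 1.0 - math.exp(-expected)


# Assumed rate from the post: 1 error/day per 32 GB. The sizes are examples.
for ram_gb in (8, 16, 32):
    print(f"{ram_gb} GB over one day: {p_at_least_one(1.0, ram_gb, 1):.0%}")
```

The rate is the whole argument here: plug in a much lower number and the picture changes completely.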

ECC protection is very simple, costs less than 10% more to produce (it is another matter when you buy it as a basic customer), has nearly no speed penalty when in use. Every computer should run on ECC RAM.
 
Joined
Apr 6, 2019
Messages
2
@drescherjm, do you have a link or, even better, a part number for the Crucial DDR4 2400 ECC unbuffered memory? Have you seen any ECC events triggered since last year? Though Asus now advertises ECC support, I'm finding it hard to find any 16 GB ECC part that's on their official QVL.

Plus, ideally I'd like 32 GB. Not sure if it's wise to get 2 and expect them to work well together; does anybody know about that?
 

drescherjm

[H]F Junkie
Joined
Nov 19, 2008
Messages
14,919
There has not been a single ECC correction detected or corrected.

Code:
jmd1 ~ # edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
mc0: csrow0: mc#0csrow#0channel#1: 0 Corrected Errors
edac-util: No errors to report.
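
For anyone who wants to poll the same counters from a script instead of running edac-util by hand, here is a minimal sketch. It assumes the kernel EDAC driver is loaded and exposes per-controller `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc/`. The function name is made up, and the path is a parameter so it can be pointed at a test directory.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters."""
    base = Path(root)
    if not base.is_dir():
        return {}  # EDAC not loaded, or not a Linux box
    counts = {}
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable; skip this controller
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

Run from cron, this is enough to notice if the corrected count ever moves off zero.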

The part # for the set I am using is CT2K8G4WFS824A

It's interesting that the price is the exact same as I paid last May.
 

vkfu

Limp Gawd
Joined
Sep 20, 2009
Messages
262
I have been running pairs of ct16g4wfd824a 16GB DDR4-2400 ECC DIMMs in Asus Prime X370 and X470 Pro motherboards with Ryzen 3 1200 CPUs. At this point you should probably buy the Crucial DDR4-2666 ECC DIMMs instead.
 
Joined
Apr 6, 2019
Messages
2
The part # for the set I am using is CT2K8G4WFS824A
Thanks, that's very helpful! Not on their QVL, but I guess that's because this is a consumer chip and they're just leaving ECC on without official support. Presumably they want to protect their market segmentation but still get some "power user brownie points", given their still (I suppose?) underdog status.

There has not been a single ECC correction detected or corrected.
Looking at the Google paper on this, apparently that isn't unexpected: around 90% of DIMMs were unaffected in a given year (the high typical error rates usually quoted are a result of the remaining 10% seeing a large number of errors and dominating the mean error rate). Seems if you still have it in a few years or you start loading it more heavily you may well start seeing error counts. So happily it seems your lack of error counts isn't in conflict with the theory that ECC is working fine with your setup.

http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
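
To illustrate how a skewed distribution produces that effect, here is a toy example. The numbers are made up for illustration and are not taken from the paper: 90 DIMMs with zero corrected errors in a year and 10 DIMMs with 1000 each.

```python
from statistics import mean, median

# Hypothetical fleet, NOT data from the paper:
# 90 DIMMs see no corrected errors, 10 DIMMs see 1000 each.
errors_per_dimm = [0] * 90 + [1000] * 10

unaffected = sum(1 for e in errors_per_dimm if e == 0) / len(errors_per_dimm)

print(f"unaffected DIMMs: {unaffected:.0%}")
print(f"mean errors per DIMM:   {mean(errors_per_dimm)}")    # pulled up by the bad 10%
print(f"median errors per DIMM: {median(errors_per_dimm)}")  # what a typical DIMM sees
```

The mean looks alarming while the typical DIMM reports nothing, which is consistent with a clean edac-util output on a single healthy machine.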

It's interesting that the price is the exact same as I paid last May.

Right, I presume, as vkfu points out, that's because it's been superseded by the ct16g4wfd8266 2666 MHz variant, which is now about 25 USD cheaper in my (UK) market.
 

IdiotInCharge

NVIDIA SHILL
Joined
Jun 13, 2003
Messages
14,710
Not on their QVL

QVLs are almost never a full representation of the market- usually, they're not even close. Memory SKUs are updated constantly and it'd be an uphill battle for motherboard manufacturers to keep up with testing as well as updating their BIOSs when something weird happens.

Easiest thing to do is find something that should work, and then try to find someone that has already gotten it to work- and make sure you have a good return policy.
 

kirbyrj

Fully [H]
Joined
Feb 1, 2005
Messages
28,101
QVLs are almost never a full representation of the market- usually, they're not even close. Memory SKUs are updated constantly and it'd be an uphill battle for motherboard manufacturers to keep up with testing as well as updating their BIOSs when something weird happens.

Easiest thing to do is find something that should work, and then try to find someone that has already gotten it to work- and make sure you have a good return policy.

I appreciate the idea of a QVL, but he's right. At best it's a partial list of RAM that was available at the time the board was made and tested to work. It doesn't represent the different SKUs or variations of memory that are available, and the reality is anything that is similar to the QVL should work.
 

Aluminum

Gawd
Joined
Sep 18, 2015
Messages
687
I have been running pairs of ct16g4wfd824a 16GB DDR4-2400 ECC DIMMs in Asus Prime X370 and X470 Pro motherboards with Ryzen 3 1200 CPUs. At this point you should probably buy the Crucial DDR4-2666 ECC DIMMs instead.

Nope, as always if you care about unbuffered ECC on ryzen you should buy M391A2K43BB1 or M391A1K43BB1.

Memory clock is not locked like xeons, boosting the fabric is worth it and ryzen loves samsung B-die.
 

Jandor

Gawd
Joined
Dec 30, 2018
Messages
585
I think anything Crucial or other known brand for ECC (Kingston or Samsung) will go for 2400 or 2666. I'll try to avoid overclocking as this is not the goal of ECC RAM.
I couldn't find any 32GB unbuffered ECC
 

Aluminum

Gawd
Joined
Sep 18, 2015
Messages
687
I think anything Crucial or other known brand for ECC (Kingston or Samsung) will go for 2400 or 2666. I'll try to avoid overclocking as this is not the goal of ECC RAM.
I couldn't find any 32GB unbuffered ECC

I linked Samsung, specifically B-die part numbers. There is no real price difference between brands when shopping unbuffered ECC (even though standard desktop ram varies so much) so buy the best. On Zen 1 Samsung is simply the best, no contest. [insert random dude linking hynix-whatever with obviously inferior latency]

Ironically ECC is better in some ways for "overclocking": you don't have to run memtest for a week or wait for corrupt data and crashes to know whether a given speed @ given timings is stable. It either throws errors or it doesn't.

2933C16 is a no-brainer setting on any B-die (even DR & fully populated) and always improves performance on Zen 1 platforms due to the fabric being 1:1 linked with memory speed. For the adventurous with SR and fewer modules per channel there are common faster settings.
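
For a rough idea of what that buys, here is a small sketch using the usual DDR arithmetic: the memory clock is half the transfer rate, and CAS latency in nanoseconds is CL cycles divided by that clock. The 2400 CL17 and 2666 CL19 rows are assumed stock timings for typical ECC UDIMMs and are not from this thread; check the actual module's spec.

```python
def memory_clock_mhz(data_rate_mts: float) -> float:
    """DDR transfers twice per clock, so the clock is half the data rate."""
    return data_rate_mts / 2.0


def cas_latency_ns(data_rate_mts: float, cl: int) -> float:
    """CAS latency in ns: CL cycles divided by the memory clock."""
    return cl * 2000.0 / data_rate_mts


# (data rate in MT/s, CL). Only 2933 CL16 comes from the post above;
# the other two rows are assumed stock timings for comparison.
for rate, cl in [(2400, 17), (2666, 19), (2933, 16)]:
    print(f"DDR4-{rate} CL{cl}: clock {memory_clock_mhz(rate):.1f} MHz, "
          f"CAS {cas_latency_ns(rate, cl):.2f} ns")
```

On Zen 1 the fabric runs at that same memory clock, so the higher setting helps twice: lower CAS latency in absolute terms and a faster fabric.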

32GB unbuffered modules are not really out yet; they may ship this year though, with M-die or whatever it's called.
 

encore2097

Weaksauce
Joined
Mar 14, 2010
Messages
115
Thinking of picking up an Asrock B450 Pro4 and a 2600.

What's the preferred unbuffered ECC to go with this to hit 2933C16?

Is there a table of other magic speed/timings?

I'm seeing Crucial 16GB VLP 2666 readily available for around $100, but not finding the Samsung linked above. Found the Samsung at Superbiiz.
 

Jandor

Gawd
Joined
Dec 30, 2018
Messages
585
I would be very cautious with pre-X570 motherboards updated with new BIOSes to support Zen 2. For instance, MSI dropped support for older APUs, but also dropped all the RAID support. The motherboard manufacturers don't advertise what they removed to fit Ryzen 3000 support on their tiny EEPROM chips. I would not bet on the ECC feature surviving on the motherboards that used to support it.
It would be interesting to have a list of what's going on, but nothing is mentioned anywhere.
Actually it seems AMD doesn't want to support anything older than X570 for Ryzen 3000, so the manufacturers are left on their own with AGESA updates made for X570 and beta firmwares to mess with. I am sure they will abandon all the older chipsets in that state, as AMD will announce new chipsets (5 new chipsets) in September to replace all the older ones.
Most of the motherboards are now unstable and crash under certain conditions. There is no good solution yet, and there may not be one outside of X570. But X570 means Windows 10 (no support for Windows 7) and thermal problems to come on the chipset.
 
Joined
Apr 22, 2021
Messages
1
B450-PLUS seems to work as well, running now with 2x KSM26ES8/8ME. I could see no mention of 6-layers PCB on Asus or Asrock or Gigabyte web so I just went with the Asus.
DMI: System manufacturer System Product Name/PRIME B450-PLUS, BIOS 1201 04/25/2019
smpboot: CPU0: AMD Ryzen 5 PRO 3400G with Radeon Vega Graphics (family: 0x17, model: 0x18, stepping: 0x1)
EDAC amd64: Node 0: DRAM ECC enabled.
EDAC amd64: using x4 syndromes.
cat /sys/devices/system/edac/mc/mc0/rank[4-5]/dimm_edac_mode
SECDED
SECDED
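
If you want to check this from a script rather than by eye, here is a minimal sketch that scans kernel log text for the amd64 EDAC line shown above. The function name is made up, and matching on "disabled" as the opposite wording is an assumption on my part; only the "enabled" line appears in this log.

```python
import re

# Matches lines like "EDAC amd64: Node 0: DRAM ECC enabled."
# The "disabled" alternative is an assumption about the driver's wording.
_ECC_LINE = re.compile(r"EDAC amd64: Node (\d+): DRAM ECC (enabled|disabled)")


def ecc_status_by_node(kernel_log: str) -> dict:
    """Map each memory node number to True if the log reports ECC enabled."""
    return {int(node): state == "enabled"
            for node, state in _ECC_LINE.findall(kernel_log)}


sample = """\
EDAC amd64: Node 0: DRAM ECC enabled.
EDAC amd64: using x4 syndromes.
"""
print(ecc_status_by_node(sample))
```

In practice you would feed it the output of dmesg; an empty result just means no matching line was found, not that ECC is off.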
 

Jandor

Gawd
Joined
Dec 30, 2018
Messages
585
B450-PLUS seems to work as well, running now with 2x KSM26ES8/8ME. I could see no mention of 6-layers PCB on Asus or Asrock or Gigabyte web so I just went with the Asus.
DMI: System manufacturer System Product Name/PRIME B450-PLUS, BIOS 1201 04/25/2019
smpboot: CPU0: AMD Ryzen 5 PRO 3400G with Radeon Vega Graphics (family: 0x17, model: 0x18, stepping: 0x1)
EDAC amd64: Node 0: DRAM ECC enabled.
EDAC amd64: using x4 syndromes.
cat /sys/devices/system/edac/mc/mc0/rank[4-5]/dimm_edac_mode
SECDED
SECDED
Okay. So the point is, there is no low-end limitation. ECC works on the Asus A320M-K, I can tell you. I own two of them: one with 2x4GB Kingston 2400 ECC and one with 2x8GB Kingston 2400 ECC. And also on an Asus B350m-Prime with 64GB ECC RAM (4x16GB 2400).

However, the interesting point is that you confirm the Ryzen Pro APUs work with ECC enabled, which is for sure not the case for the non-Pro series.
And one of the most interesting is actually the Ryzen 5 Pro 4650G, which is quite a value. It's only a little more expensive than the 3600, and it comes with a quite good GPU that probably bests the GT 1030.
I wonder if the GPU is also using the ECC feature of the RAM, as this would make the whole thing equivalent to very expensive professional hardware with ECC on the GPU.
 