DDR5 ECC RDIMM temps, especially in a workstation chassis

compgeek89

This is more of a memory question, but chassis seems like the closest relevant category!

I have just built and am testing a Threadripper 7000 build. Specs for reference:

AMD Ryzen Threadripper 7970X 350W SP6 - 32-Core/64-Threads
Asetek 35-102-0000354 836S-M1A AMD Threadripper Cooler 360mm
ASUS Pro WS TRX50-SAGE WIFI
NEMIX RAM 256GB (4X64GB) DDR5 5600MHZ PC5-44800 2Rx4 ECC RDIMM
Samsung 990 Pro 2TB
PNY RTX 4000 Ada VCNRTX4000ADA-PB 20GB 160-bit GDDR6
Fractal Design Define 7 XL ATX Full Tower Case
Super Flower LEADEX VII XG 1300 W 80+ Gold
Windows 11 Enterprise 64 Bit
Noctua A14 PWM x5
Noctua A14 FLX x1
Noctua NF-F12 x4

So, I have the case set up for airflow with the radiator top-mounted. The four NF-F12 fans are across the top, three on the radiator, one on the extra space (with the extra space being toward the back, radiator pushed to front). Then three A14s on the front, two on the bottom, one on the back. Air pushing front to back, bottom to top.

In general, temps are good, but the RAM is a little crazy. The motherboard has two RDIMM slots above the CPU and two below. After 24hrs of Prime95 "blend" the average temp for the top stick closest to the socket is 80C, the one above that is averaging upper 70s, but both have hit 95C. The bottom two are staying in the 60s average with a max around 80C. I know the designed operating range for the RAM is 0-95C, but is this OK? If not, is there anything I can/should do about it? I am not having stability issues. I ran the Windows extended memory test for almost 24hrs with no issues, and Prime95 has also been stable for 24hr runs.

Consumer RAM doesn't come anywhere close to this. I have read a few things about RDIMMs running hotter... but I can't find anything conclusive about what is acceptable. Obviously this is unlikely to be the continuous operating state for the machine in daily use. It's supposed to be a SolidWorks FEA workstation for the long haul.

I appreciate any insight you guys might have.

P.S. Here are some reference shots at the 24hr mark:

[Two screenshots, 2024-02-18, taken at the 24-hour mark]
 
DDR5 modules can indeed throw off some heat, especially when the PCB is densely packed with DRAM ICs. The modules will also automatically throttle transactions as the chips get too hot; the system will slow down but may avoid crashing.

Also, the min/max DDR5 temperatures in HWiNFO64 may be inaccurate. I've noticed some glitchy temperature reporting in HWiNFO64 when readings cross certain thresholds: as the memory cools, the reported temperature spikes up, and vice versa. One specific threshold I observed during testing was 80C, where crossing it produced a one-sample spike to 95.8C... which exactly matches the maximums in your screenshot.

So, your RAM temps are bad, but they may not be quite as bad as the max values indicate.
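You can also log the sensors to a file instead of reading the min/max columns. In that case, a 3-sample median filter is one simple way to see what the maximum looks like once isolated one-sample spikes are removed. The sketch below is minimal. It assumes you have already pulled one DIMM's temperature column out of the log into a list of floats. The function names are made up for illustration. The sample values are invented to mimic the 80C to 95.8C jump described above; they are not taken from the screenshots.

```python
from statistics import median


def median3(samples):
    """Apply a 3-sample median filter.

    A single-sample spike (up or down) is replaced by the median of itself
    and its two neighbours, so it disappears. The first and last samples
    have only one neighbour and are passed through unchanged.
    """
    if len(samples) < 3:
        return list(samples)
    filtered = [samples[0]]
    for i in range(1, len(samples) - 1):
        filtered.append(median(samples[i - 1:i + 2]))
    filtered.append(samples[-1])
    return filtered


def max_report(samples):
    """Return (raw maximum, maximum after median filtering)."""
    return max(samples), max(median3(samples))


# Invented example: steady ~80 C with one bogus 95.8 C reading.
dimm_temps = [78.5, 79.5, 80.0, 95.8, 80.0, 79.8, 79.0]
print(max_report(dimm_temps))  # (95.8, 80.0)
```

A genuine sustained rise is not flattened, because the median of three steadily rising values is just the middle one. A spike that lands on the very first or last sample will survive the filter.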


In HWiNFO64, plot the DIMM temperatures and system memory bandwidth, then kick off your stress test. If you are blatantly thermal throttling, you'll see the impact in the bandwidth plot. The temperature plots will also expose the min/max temperature glitches, if they exist.
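If you export the log rather than eyeballing the plot, you can run the same check numerically. The rough Python sketch below assumes the bandwidth readings have already been pulled out of the log into a list, in any unit such as GB/s. The helper names, the 10-sample window, and the 5% threshold are arbitrary choices for illustration. They are not HWiNFO64 features or any standard. The example numbers are invented.

```python
from statistics import mean


def bandwidth_drop(bandwidth, window=10):
    """Fractional drop from the first `window` samples to the last `window`.

    0.0 means no change; 0.1 means the end of the run averaged 10% less
    bandwidth than the (cooler) start of the run.
    """
    if len(bandwidth) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    start = mean(bandwidth[:window])
    end = mean(bandwidth[-window:])
    return (start - end) / start


def looks_throttled(bandwidth, window=10, threshold=0.05):
    """Flag a run whose bandwidth sagged by more than `threshold` (assumed 5%)."""
    return bandwidth_drop(bandwidth, window) > threshold


# Invented example: bandwidth sags as the DIMMs heat up.
run = [100.0] * 10 + [97.0] * 10 + [90.0] * 10
print(round(bandwidth_drop(run), 3), looks_throttled(run))  # 0.1 True
```

A falling number on its own doesn't prove thermal throttling. It is worth lining it up against the DIMM temperature trace from the same run.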

Grab a spare fan, remove the case's side panel, and point the fan right at the modules while running a memory stress test. If you see DIMM temperatures plummet and memory bandwidth come back to normal, you have a line on how to fix your problem.
 
I wonder where the sensor sits on the module. The register chip? That should be the only thing making registered modules hotter than unbuffered ones, unless you have higher capacity than unbuffered modules offer.

I second the notion to verify that the temperature display is correct, and the recommendation to observe the temps change when under high airflow.

Another verification is asking the mainboard for temps in IPMI (if you have it on your board).
 
Flip the back fan so it's blowing in.
The first fan isn't over the RAM so I don't think it would change much. I can try.

Interesting. Here's a plot of the DIMM temperature and memory bandwidth graphs while running Prime95 Large FFT, and... you seem to be right. It's not actually hitting 95.8C. That said, how concerned should I be that it sits in the 80s? I'm not sure to what extent the end user will ever see these temps with their workload, but they're all high compared to what I'm used to!

[Plot: DIMM temperatures and memory bandwidth during Prime95 Large FFT]


On the sensor location: I think you're right, it's the register chip. Unfortunately, no IPMI.
 
It's nice to have some thermal headroom in a performance-oriented build. You have negative headroom.

You can see the performance degrading over time in the bandwidth plots. It will only get worse as the intake filters clog, ambient temperature goes up, etc.

This isn't something like PBO where the CPU is overclocking itself to a thermal limit to extract some bonus performance. Your memory is cooking and failing to sustain the platform's base memory performance.
 
[Two photos of the fan setup]

This setup seems to have brought things down ~8C. Still not what I'd call stellar temps, but it's better and will have to do! It's basically just an A14 FLX on 90° brackets.
 
Have you tried it? This is how your airflow is working now; the air from the front isn't really reaching the back enough.
The green path will feed fresher air to both the RAM and that last fan on the radiator.
[Diagram: current case airflow, with the suggested intake path shown in green]
 