Internal CPU errors on Asus z490-E with i9 10850K even at BIOS default settings - bad board?

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
I recently put together a new build consisting of an Asus ROG Strix z490-E motherboard and an i9 10850K cpu with G.Skill 4000MHz memory. I can boot fine and the system appears stable - until I start to mildly stress the cpu. I like to run a quick and dirty stability test of two instances of WinRar unraring two large archives at once. When I do this, I get 2-4 internal cpu errors in HWiNFO64's Windows Hardware Errors (WHEA) reporting and one or both of the running WinRar instances will frequently crash or report errors on archives I know are okay. The ram is compatible AFAIK. At first I thought it was my 32GB memory (2X16GB sticks) and the XMP profile I was using but much to my chagrin, I loaded "Optimized Defaults" in the z490-E's BIOS running the ram and cpu at default settings and I still get the same issues.

When I posted this problem on the HWiNFO64 forum, someone suggested the board's "Auto" settings aren't detecting correct values and aren't giving enough voltage to the cpu. So I played around with LLC, setting it to level 4 (which is recommended for overclocking). No joy. I'm not overclocking anything. Why can't my board run a 10850K chip at "optimized default" settings with reliable stability? I even updated the BIOS to the latest version. Do I have a bad board? My cpu temps in HWiNFO64 show it's well within acceptable limits (28c-60c) so I don't think the processor is overheating, though I did use some very old Arctic Silver that was kind of stiff when I applied it to the HSF and the chip. I'm temporarily using a Cooler Master LED 212 Air cooler until I get a 280mm liquid cooler I ordered but I haven't been really stressing the cpu outside of doing the WinRar test and it can't even pass that reliably. I have 9 days to return this board to Newegg before the 30 days are up. It seems this board should be able to run the 10850K at BIOS default/auto settings and handle a test like WinRar. Should I RMA it? Or could it be a bad cpu? I'm at my wits' end about this.

Edited: Oh, and I left out something that might be important: during the first day or two I was running the new rig I thought I noticed the smell of something burning but attributed it to "that new motherboard smell" caused by normal heat in the case and its components. Could there be damaged electronics on the board?
 
Last edited:
Joined
Jan 16, 2013
Messages
2,379
1. Get a flashlight and carefully inspect the board for melted components.
2. Remove the CPU and check for bent cpu socket pins.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
31,558
What Furious_Styles said, but also check to see if the optimized defaults STILL have various overclocking settings enabled (a lot of Asus boards do, it seems). Turn those off if they are - heck, set speeds/voltage manually to stock and see what happens. It may be a bad CPU too - they're extremely rare, but it DOES happen.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
Yeah, I noticed even when I "load optimized defaults" it still retains the level 4 LLC adjustment I made so I'll give the BIOS a good going over before deciding to RMA it. I ordered some ARCTIC MX-4 thermal compound this morning so in two days, I'm also going to take the board out of the case, inspect it for scorch marks, check pins, jumper the CMOS to defaults, and try reapplying the new thermal compound just in case the old arctic silver I used isn't up to snuff, and reseat the HSF. But if it is the cpu, how does one troubleshoot something like that and distinguish between a bad mainboard and a bad processor? I don't have an extra z490 board or i9 chip lying around. This is my first cpu/motherboard purchase since I started years ago where I encountered an issue like this. I guess it was bound to happen someday. I've been lucky.

 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
31,558
Run things like Prime95 (probably not AVX at first) and see if it errors. Then ping intel.
 

JSHamlet234

Limp Gawd
Joined
Apr 9, 2021
Messages
457
Clear CMOS, make sure XMP and Asus Multicore Enhancement are disabled, and test it again.
 

Nasgul

Limp Gawd
Joined
Jun 11, 2005
Messages
131
Seriously, that's the stupidest thing you can rely on for "stability". The only CPU error that can exist is "it doesn't work". Like when a video card doesn't work, you see artifacts.

As for "not giving" enough voltage to the CPU? If any, ASUS boards over-volt the CPUs, just like my 10700KF to reach 5.2GHz on all 8 cores, so the problem is not the CPU, otherwise, it wouldn't work/boot up.

Do what I do, the only value you change in the BIOS is to set XMP Profile to whichever the speed of the RAM is rated for, then in Windows, you use AI Suite (Dual Intelligent Processors 5 then AI Overclocking) to overclock the CPU, the software sets the safe values for all 8 cores. I know, I have the same motherboard.

But do that when you get decent cooling, that lame heat-sink fan you got there is just wrong so stop trying with that 212. I wouldn't even "consider" using that from the beginning, wouldn't even use at all. No wonder why you smelled something burning.

And ARCTIC MX-4? Might as well use toothpaste instead. Judging by that now, you sound like one of those guys that buys a $90,000 BMW but lives in a trailer park. mx-4? Crying out loud!!!!!!!!! Get the real deal: Thermal Grizzly Kryonaut Thermal Grease Paste.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
Wow, you really have a bug up your ass, but okay. Hey, if WinRar (a program I use & need) isn't working something is definitely up. Plus, the CPU internal errors.
 

kirbyrj

Fully [H]
Joined
Feb 1, 2005
Messages
28,668
You should just put him on your ignore list. He's a clown. Case in point...the difference between MX4 and Kryonaut is maybe 2C best case at stock clocks with that kind of cooling. It will stretch out further with better cooling, but I wouldn't waste your time with it until your final cooling comes in. Besides, at stock clocks and no extra MCE, etc. a 212 would be fine for a "stock" 125W TDP CPU to test with.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
Clear CMOS, make sure XMP and Asus Multicore Enhancement are disabled, and test it again.
Just tried it. Winrar crashes and I get internal cpu errors. In two days, when I get the thermal compound, I'm going to breadboard it to make sure there's nothing shorting in the case, reseat the HSF with new thermal compound, perhaps pull one of the ram sticks, play around with them in the slot, and if that doesn't work, I'm calling it a day. It's going back to Newegg. I'll try the Prime 95 test just to be sure it's not the cpu. No sense in chasing my tail sending back a good board only to have it turn out to be the cpu. Thanks to you guys that tried to help, much appreciated. And Nasal... you can go rave on lol
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
You should just put him on your ignore list. He's a clown. Case in point...the difference between MX4 and Kryonaut is maybe 2C best case at stock clocks with that kind of cooling. It will stretch out further with better cooling, but I wouldn't waste your time with it until your final cooling comes in. Besides, at stock clocks and no extra MCE, etc. a 212 would be fine for a "stock" 125W TDP CPU to test with.

Done. lol. I didn't even know that that was a feature of this board :D Yeah, I'm kind of wondering if I should even waste the MX4... something tells me it will make no difference because my cpu temps are fine even with cheap air cooling, and even if I ran a torture test like Prime 95, it will just throttle down to prevent overheating. Thanks for the tips.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
31,558
Pretty much. Breadboard test (top of the mobo box works great!) is a good test too. Shorts can cause issues like this.
 

UltraTaco

Limp Gawd
Joined
Feb 21, 2020
Messages
462
I ordered some ARCTIC MX-4 thermal compound this morning
Taco use that too!! If you dont want to wait for thermal paste and continue trouble shooting, just use some aqua fresh! That's what taco did when she received her cpu quicker than thermal paste, it works just as good! After that, just clean off(very easy) nd repaste with MX4!! Kudos!!!
 
Joined
Jan 16, 2013
Messages
2,379
Seriously, that's the stupidest thing you can rely on for "stability". The only CPU error that can exist is "it doesn't work". Like when a video card doesn't work, you see artifacts.

As for "not giving" enough voltage to the CPU? If any, ASUS boards over-volt the CPUs, just like my 10700KF to reach 5.2GHz on all 8 cores, so the problem is not the CPU, otherwise, it wouldn't work/boot up.

Do what I do, the only value you change in the BIOS is to set XMP Profile to whichever the speed of the RAM is rated for, then in Windows, you use AI Suite (Dual Intelligent Processors 5 then AI Overclocking) to overclock the CPU, the software sets the safe values for all 8 cores. I know, I have the same motherboard.

But do that when you get decent cooling, that lame heat-sink fan you got there is just wrong so stop trying with that 212. I wouldn't even "consider" using that from the beginning, wouldn't even use at all. No wonder why you smelled something burning.

And ARCTIC MX-4? Might as well use toothpaste instead. Judging by that now, you sound like one of those guys that buys a $90,000 BMW but lives in a trailer park. mx-4? Crying out loud!!!!!!!!! Get the real deal: Thermal Grizzly Kryonaut Thermal Grease Paste.
Actually a defective CPU can exhibit really odd behavior. I had to test a friend's who had weird problems with performance. It would pass the intel diagnostic test but cause artifacts and other weird stuff. Wasn't something he could test even though he RMA'd just about everything else.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
try bumping your ram voltage to 1.4v, dont worry its safe.

I actually did that automatically when I first assembled the build because I used XMP profile #1 which auto-sets the ram voltage to 1.4v and that's when I first noticed the cpu internal errors issue. But I'll play around with the BIOS some more in the coming few days before RMAing it just to be certain. I have this gut feeling that something got fried on the motherboard because of the smell that first day. Luckily, I have 8 days left to return it to NE. However, if it's the cpu that's bad, my return option with B&H Photo is long expired. Does Intel warranty bad/defective chips? Never had this issue before.
 

pendragon1

Fully [H]
Joined
Oct 7, 2000
Messages
29,806
take the panel off and give 'er the sniff test. the smell of magic blue smoke lingers forever. intel does 1 year i think...
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
31,558
Intel is 3 years for boxed procs. They're good, and fast.
 

vegan

n00b
Joined
Nov 2, 2012
Messages
8
check to see if there is a BIOS update for the motherboard, sometimes that can cure woes
 

GiGaBiTe

[H]ard|Gawd
Joined
Apr 26, 2013
Messages
1,797
You said you've tried XMP off, but what speed were you running the RAM at? 4000 MHz is far above the max 2933 MHz the i9-10850K is officially specced at.

If you haven't tried the RAM at the bog standard 2133 MHz, I'd try that. If you already have tried 2133 MHz, possibly try some different memory modules and see if anything changes before RMAing the CPU.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
Oh, I hadn't considered that. Both my G.Skill ram (in XMP profile) and motherboard officially supported 4000 MHz but I didn't know about the chip itself. Anyway, I think my problem was touched on earlier in this topic when someone said that "resetting" the BIOS in new Asus boards to "optimized defaults" doesn't reset several overclocked settings you may have fiddled with. So I did a hard reset with clearing CMOS and tried a few adjustments in the BIOS. Finally discovered that just enabling XMP #1 (overclocks the ram to 4000MHz at 1.4v) and setting the load line calibration to lvl 4 caused my system to seem stable and I could run multiple instances of WinRar at once and decompress large archives without crashing. I still got cpu internal errors in HWiNFO64 but nothing like before. However, it does temporarily push my voltages to the cpu to around 1.4v very briefly if I'm maxing things but these seem okayish since it's using adaptive power modes.

So I think that might have been my problem, that not only does Asus "overclock" the MB at default/auto settings, but it doesn't reset overclocked settings you yourself made when you "load optimized defaults" so I was kind of chasing my tail there. At any rate, as soon as I get my 280mm liquid cooler, I'm going to do more extensive testing/burn in and adjusting of BIOS (found this guide on overclocking 10850K/10900K chips in Asus boards). So, for the time being at least, I don't feel as great a need to RMA it.
 
Joined
Jan 16, 2013
Messages
2,379
Here's my advice: don't do any overclocking at all until you are positive nothing is defective. No XMP, no MCE, nothing at all. You start doing some OCing and you will have a harder time tracking down the source of the problem.
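
To make that concrete, here is a rough sketch of the "stock baseline first, then one change at a time" idea. The function names, the setting labels, and the error counts in the usage lines are all made up for illustration; the counts would be whatever you read off HWiNFO64 after each run.

```python
def isolation_plan(settings):
    """Build a test order: a stock baseline first, then one setting enabled at a time."""
    plan = [("baseline", [])]
    for setting in settings:
        plan.append((f"baseline + {setting}", [setting]))
    return plan


def first_failing_step(plan, whea_errors_by_step):
    """Return the name of the first step that logged any WHEA errors, or None."""
    for name, _enabled in plan:
        if whea_errors_by_step.get(name, 0) > 0:
            return name
    return None


# Hypothetical usage: the error counts below are invented, not measured.
plan = isolation_plan(["XMP", "MCE"])
observed = {"baseline": 0, "baseline + XMP": 3}
culprit = first_failing_step(plan, observed)
```

If the baseline step itself shows errors, that points at hardware rather than at any setting, which is the whole reason to test it first.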
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
Yeah, I'm not going to really OC things outside the XMP, just using that guide for basic info. 10850K/10900K chips don't need overclocking.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
I'm saying no XMP. You should have 0 errors in hwinfo64 before any OCs get applied.

Well, since I just ran the WinRar test with XMP enabled and it passed with no cpu internal errors in HWiNFO64, I'm starting to feel very encouraged that I don't have a defective board/cpu after all. It also passes Windows SFC test whereas before I got corrupted files. One other thing: "multicore enhancement" in BIOS is now set to "let BIOS optimize it" (or whatever that setting says) along with LLC at lvl 4. These two settings seemed to have done the trick, along with clear CMOS. Reading other similar issues on the internet it's my understanding now that "multicore enhancement" is troublesome with some systems. But I was even able to run Prime 95 yesterday for about an hour with no issues (outside of the fact that I tried to browse the web while doing it,... bad idea lol) though I don't like to do this without better cooling since it causes my cpu temps to spike to 100c for a few seconds at the beginning of the test before settling to around 65c.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
bump your system agent voltage up to 1.28v. if that doesn't work, try 1.3v.

This adjustment is part of stabilizing high RAM speeds and/or tight timings, for intel.

Right now it seems to be 100% stable so until I get my new phanteks case (which will be delivered today) & EVGA liquid cooler (tomorrow), I'm going to take a break from doing any BIOS fine tuning and/or stress testing. I did learn something new today though; that i9 chips don't officially support 4000MHz ram speed. Someone on Reddit was flabbergasted that I run my ram at 4000MHz ("You live dangerously, dude!" lol).
 

GiGaBiTe

[H]ard|Gawd
Joined
Apr 26, 2013
Messages
1,797
Reading other similar issues on the internet it's my understanding now that "multicore enhancement" is troublesome with some systems.

MCE basically tells the motherboard to run the CPU balls to the wall, so it's no real surprise it makes systems unstable. It allows all cores to run at max turbo boost and/or thermal velocity boost (on CPUs that support it like the i9-10900k) and to never time out. By default, certain cores can only boost to certain clock speeds to try and stay within the thermal limits Intel sets, so the first few cores can boost the highest, the middle group of cores boosts less, and the last cores get the least boost headroom.

This obviously pushes the CPU way out of spec and balloons the heat and power consumption.
 

chameleoneel

Supreme [H]ardness
Joined
Aug 15, 2005
Messages
4,530
Errors from overclocked RAM don't necessarily cause total instability. You mentioned you still had some errors reported, just fewer than before. Bumping the system agent voltage would take all of 5 minutes, and then you can validate whether your errors go away, stay the same, or increase.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
Thanks. I'll definitely take this advice into consideration when I do a final tune-up of the BIOS in a couple of days. But even with system agent voltage at default it's running with no HWiNFO errors whatsoever now. But maybe with a little more juice to the system agent, I can lower my LLC levels even more with resulting less power consumption and still get my 4000MHz ram cake too. Ideally, I'd like to get my system running completely stable without it shooting to slightly past 1.4v - however briefly - on the cpu core voltage because it's at lvl 4 LLC. Manually tuning things is probably the best way but I actually haven't done much overclocking or BIOS optimizations in my last 3 or 4 builds. I haven't upgraded in about 6 years and I somehow just expect things to run stably at "load optimized defaults" settings but Asus and others seem to be saying these days they're going to OC things already out of the box, like it or not, system stability or no system stability. Welcome to the age of i9s and modern chipsets.
 
Joined
Jan 16, 2013
Messages
2,379
At stock clocks you should be able to run either the lowest or next lowest LLC and be 100% stable.
 

JSHamlet234

Limp Gawd
Joined
Apr 9, 2021
Messages
457
You should probably run HCI Memtest just to make sure.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
At stock clocks you should be able to run either the lowest or next lowest LLC and be 100% stable.

Well, I can't. I get WHEA errors in HWiNFO at anything under LLC lvl 4. However, they could be memory errors and not actual problems with the chipset/cpu. I strongly suspect chameleoneel above gave me the solution. Trying to run ram at 4000MHz (even if it's officially specced to run at that speed by G.Skill) without increasing system agent voltage is likely just going to cause problems. We'll see.

 

JSHamlet234

Limp Gawd
Joined
Apr 9, 2021
Messages
457
You have to understand that the RAM and CPU don't each exist in a vacuum. If your RAM is running at 4000MHz, that means it's communicating with something else at 4000MHz. That something else is your CPU, specifically the Integrated Memory Controller (IMC). The RAM is rated for 4000, the IMC is rated at 2933, and you are asking both to talk to each other at 4000. The solution to an overstressed IMC is more System Agent and/or VCCIO voltage. Furthermore, your CPU is programmed by Intel to run at a certain core clock speed and core voltage, on the assumption that it receives adequate cooling, clean & sufficient power, and that all of its components, including the IMC are also being run within spec. Even if your IMC is OK at 4000, it is producing significantly more localized heat on the processor die when it runs at that speed, and your CPU cores are able to do more work in less time which increases their effective workload beyond their design parameters. Because of the heat and increased throughput, the CPU cores themselves may require a bit more voltage to maintain stock speed, which might be why the system runs better with increased LLC, which effectively increases the core voltage under load.
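
For a sense of scale, here is a quick back-of-the-envelope calculation of how far past the rated speed that is. It is just arithmetic on the two numbers above (4000 and 2933); the function name is made up for illustration.

```python
def percent_over_rated(actual_speed, rated_speed):
    """How far (in percent) a running speed exceeds the officially rated speed."""
    return (actual_speed / rated_speed - 1.0) * 100.0


# RAM running at 4000 against an IMC rated for 2933, as discussed in this thread.
margin = round(percent_over_rated(4000, 2933), 1)
print(f"IMC is running about {margin}% over its rated memory speed")
```

That works out to roughly a third over spec, which is why it counts as an overclock of the IMC even though the RAM itself is within its own rating.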
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
You should probably run HCI Memtest just to make sure.

I'll definitely be giving my system a full stress/burn-in test and memory check when I get the new liquid cooler (just using CM 212 air cooling now). Can the freeware version of HCI Memtest do a full memtest with not much effort from the user? I heard something about having to do calculations yourself unless you get the Pro version. I don't know what that even means. I'm used to something like the old MemTest86 where you just click and wait... and wait... and wait lol
 

JSHamlet234

Limp Gawd
Joined
Apr 9, 2021
Messages
457
You have to run multiple instances, and each instance can be a maximum of 2-3GB depending on what kind of mood Windows is in. The total amount of RAM tested should be at least 75% of your total RAM, but leave at least 1 or 2 gigs free so that you don't end up slamming your page file. You should also try to use as many threads as possible so it goes faster and works the RAM harder. I usually leave 1 or 2 threads unused so that the system isn't so unresponsive that I have to check to see if it's frozen. So, in your example, since you have 10 cores, 20 threads, and 32 gigs, I would suggest running 18 instances of 1400MB each. I let it go to 4000% on each instance. It takes about 20 hours on my machine, but yours is faster than mine, so it should take somewhat less than that.
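
If you want to redo that math for a different machine, here is a minimal sketch of the rule of thumb above. The function name, the 2048MB per-instance cap (the low end of the 2-3GB range), and the rounding up to the nearest 100MB are my own assumptions, not anything HCI MemTest itself dictates.

```python
import math


def plan_memtest(total_ram_gb, threads, spare_threads=2,
                 target_fraction=0.75, max_instance_mb=2048, round_to_mb=100):
    """Suggest instance count and size following the rule of thumb in this post."""
    # Leave a couple of threads free so the desktop stays responsive.
    instances = max(1, threads - spare_threads)
    total_mb = total_ram_gb * 1024
    # Split the target amount evenly, rounded up to a tidy number.
    raw_mb = total_mb * target_fraction / instances
    per_instance_mb = math.ceil(raw_mb / round_to_mb) * round_to_mb
    # Assumed cap: a single instance may not get more than about 2GB.
    per_instance_mb = min(per_instance_mb, max_instance_mb)
    tested_mb = instances * per_instance_mb
    return {
        "instances": instances,
        "mb_per_instance": per_instance_mb,
        "tested_mb": tested_mb,
        "free_mb": total_mb - tested_mb,
    }


# 32 gigs and 20 threads, as in the example above.
print(plan_memtest(32, 20))
```

With 32 gigs and 20 threads this gives 18 instances of 1400MB, which tests 25200MB (about 77% of RAM) and leaves a bit over 7 gigs free.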
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
You have to understand that the RAM and CPU don't each exist in a vacuum. If your RAM is running at 4000MHz, that means it's communicating with something else at 4000MHz. That something else is your CPU, specifically the Integrated Memory Controller (IMC). The RAM is rated for 4000, the IMC is rated at 2933, and you are asking both to talk to each other at 4000. The solution to an overstressed IMC is more System Agent and/or VCCIO voltage. Furthermore, your CPU is programmed by Intel to run at a certain core clock speed and core voltage, on the assumption that it receives adequate cooling and clean, sufficient power, and that all of its components, including the IMC, are also being run within spec. Even if your IMC is OK at 4000, it is producing significantly more localized heat on the processor die when it runs at that speed, and your CPU cores are able to do more work in less time, which increases their effective workload beyond their design parameters. Because of the heat and increased throughput, the CPU cores themselves may require a bit more voltage to maintain stock speed, which might be why the system runs better with increased LLC, which effectively increases the core voltage under load.

It never used to be this complicated, that's for sure. Every single motherboard I ever owned after that old ugly dinosaur FIC (where you had to do overclocking with physical jumpers on the board itself) was basically just plug and play - you popped in your processor, slapped some thermal grease on the HSF, snapped it in place, flicked a switch, and Voila! If you had compatible memory, motherboard ran completely stable at factory settings. So imagine my surprise when I discovered that "factory settings" today are overclocked settings and you need further adjustments for high-speed ram that's even rated to run in your board. I should have done more research.
 
Joined
Jan 16, 2013
Messages
2,379
It never used to be this complicated, that's for sure. Every single motherboard I ever owned after that old ugly dinosaur FIC (where you had to do overclocking with physical jumpers on the board itself) was basically just plug and play - you popped in your processor, slapped some thermal grease on the HSF, snapped it in place, flicked a switch, and Voila! If you had compatible memory, motherboard ran completely stable at factory settings. So imagine my surprise when I discovered that "factory settings" today are overclocked settings and you need further adjustments for high-speed ram that's even rated to run in your board. I should have done more research.
Well, it's still pretty easy. Typically XMP will just work, but when you start getting to the really high clocks (3600+) you can run into issues. Again, though, you should first run your memory at 2133 with LLC low, MCE off, and everything else at stock to make sure there are no errors.
 

Jack Of Owls

Weaksauce
Joined
Jan 3, 2021
Messages
90
Well, it's still pretty easy. Typically XMP will just work, but when you start getting to the really high clocks (3600+) you can run into issues. Again, though, you should first run your memory at 2133 with LLC low, MCE off, and everything else at stock to make sure there are no errors.

Well, got some bad news and some good news. Bad: I loaded Optimized Defaults (didn't clear CMOS though), set AI Tweaker to Manual, lowered my ram speed to 2133MHz and set LLC to Level 2, and it's back to square 1 - with 3 WinRar instances running, 1-2 of them crash, and I get cpu internal errors in HWiNFO64 again. Good: at least my new Phanteks case arrived 20 minutes ago lol

I'm still planning on taking the motherboard out and breadboarding it, removing ram sticks and swapping one or both between the slots, but would you guys RMA a new motherboard that can't run at stock settings at LLC Level 2 or below? One other thing: I'm using a PSU pulled from my old system, which is an Antec NeoECO C 620C 620W ATX12V (80 PLUS BRONZE). It's about 6-7 years old. The previous system was fine, but do I need more power?

Edited: btw, I'm using latest BIOS for my Asus z490-E which I flashed to when I initially started having this problem.
 