Xeon + consumer chipset = ECC support?

If you want stability, you do it right and buy server-class hardware.

If you're willing to engage in fugly hackjobs, you don't want stability and might as well ignore ECC entirely.

It's as simple as that.

PS: my toe nails had rolled up
 
The ASUS P6T does not officially support Xeon CPUs. Until recently, only the "WS" series motherboards officially supported Intel Xeon CPUs, and only motherboards that officially support Xeons actually support ECC memory. Soldering on additional contact points and using software hacks will not get you the same results as real ECC memory on a supported CPU and chipset. Memory trace signaling needs to be precise, and that would be hard to replicate with some wire and a soldering iron. Even if the traces are there, or you could get by with a simple wire connection, there is still the BIOS. Motherboards with proper Xeon QVL testing have the correct microcode support in the BIOS. Motherboards without official Xeon support do not.
 
I've already fixed the microcode issue, that part is very easy using AMI's AMIBCP tool. You can dump microcodes from a donor BIOS from boards which support the Xeons (like the EVGA Classified SR-2 or any Supermicro or whatever), patch them into the target BIOS, flash that in, problem solved.

The traces and the "magic code" in the BIOS that enables ECC are the real problem. Either they're there, or they're not. If they're not, I wouldn't bother actually soldering wires onto the DIMM and CPU socket pins. I wouldn't go that far, although it'd be fascinating to try.

Of course this would always be a dirty hack. But curiosity is hard to suppress. ;) I just want to see whether it CAN be done!
 
Please, make it stop.

This is even less stability than just ignoring ECC entirely. Why would you do such things? Why? WHY?

*sobs*

The curiosity about rusty nails in my knee is thwarted pretty rapidly. Why not this one, too?
 
According to the DDR3 ECC routing specifications, there is an optional 9th data byte lane group which is required to transmit ECC data. Impedance and lane length requirements are very strict, so I doubt it could be done with a soldering iron, as I stated before. You're talking about several lanes here, not just a wire or two. Given that this 9th byte lane group is entirely optional, it is unlikely that non-Xeon-compatible motherboards have these lanes.

Combine this requirement and the additional firmware needed to support it and you won't be getting real, hardware based ECC support on anything which doesn't already list it on their spec sheets.
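
To put a number on that 9th byte lane: a minimal sketch of why a 64-bit data bus ends up with 8 extra check-bit lines (the CB pins), assuming a standard Hamming-style SECDED code. This is only the arithmetic; the actual code used by Intel's memory controller is not specified anywhere in this thread, and the function name is made up for illustration.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed for a Hamming code plus one overall parity bit (SECDED)."""
    r = 0
    # Single-error correction needs the smallest r with 2**r >= data_bits + r + 1.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One additional overall parity bit upgrades the code to double-error detection.
    return r + 1


data_bits = 64                          # width of a non-ECC DDR3 channel
check_bits = secded_check_bits(data_bits)
total_width = data_bits + check_bits    # width of an ECC channel

print(f"{data_bits} data bits + {check_bits} check bits = {total_width} lines")
```

Under that assumption the extra lines come out to exactly one more byte lane, which is why all of them have to be routed on the board, not just a wire or two.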

So please stop. :cool:
 
Hm, I didn't expect so much, um.. What is the correct English word.. "Hostility"? Not quite, but something in that area. "Resistance" seems too weak a word.

Patching Microcodes into a BIOS is really nothing big at all. Linux and BSD UNIX kernels patch it in RAM when booting anyway, at runtime (I think Windows does it too, as there are microcode updates coming in via Windows Update). And if I can learn something about how ECC is handled on the software side by reading the Linux kernel source code, why not?
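
For anyone who wants to see which microcode revision is actually loaded, here is a minimal sketch that reads it back on Linux. It assumes an x86 kernel that exposes a "microcode" field in /proc/cpuinfo; the helper name is made up for illustration, and the script only reports the revision, it does not patch anything.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Collect the distinct 'microcode' values found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    # An empty set means this kernel/architecture does not report the field.
    print("microcode revisions:", microcode_revisions(cpuinfo.read_text()))
```

Comparing the value before and after flashing a patched BIOS (or before and after the OS loads its own update) shows which revision ended up active.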

About the signaling issues, I understand your concerns, but as I have stated, I have no intention of trying to connect the pins if the traces are missing. I have no way of properly doing this anyway, so I'll just leave it. I said it would be fascinating to try, and I really think so. But I do not have the necessary knowledge for that.

Do I understand correctly, that I better take this elsewhere? I just thought I could learn something from Xennex' progress on the matter in the meantime.

And no, I have no intention to stop learning even if it's just a dirty hack. ;) It's not like I'm slaughtering some holy cow here or something. Anyone can do to their hard- and software what they want, and publish the results as they please (whether good or not), that's how I see it...
 
There is no need to take this elsewhere or to stop learning. I didn't think I was being hostile. I was just providing information on the hardware side. I'm trying to illustrate that you aren't going to get hardware ECC support on a platform that wasn't designed for it. What confuses me is the desire to implement ECC on hardware that doesn't support it when you can have hardware that does for only a little more in most cases. That is, unless you're scraping the bottom of the barrel in terms of hardware, in which case you don't have the money for proper hardware, so at some point you have to accept that.

There is a difference between learning and trying to squeeze blood out of a turnip.
 
Ah, I see. This is not exactly a "mission-critical" machine for me, it's just my personal workstation. Actually a second one, that is meant as a replacement in case something fails in the primary machine (same hardware).

Cost is not a major issue for me, not at the level you describe at least. If it was, I would've never gone with S1366 in the first place.

I wouldn't really rely on the ECC even if I could get it to work, and even if the ECC error injection tests (that Intel's IMCs support) all yielded proper results over days of continuous testing.

My main goal is to understand what exactly differs between an ECC capable BIOS, and one that is not, and what exactly that part of code does to make the processor switch the ECC on. And ultimately, whether it can be replicated by kernel-mode code at runtime, like on AMD platforms.

That's what I want to understand. If it's as simple as writing a value to a single machine-specific register... I just want to know.
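
On the Linux side, one way to at least observe whether the kernel sees an ECC-capable memory controller is the EDAC sysfs tree. The sketch below is an assumption-laden example: it assumes an EDAC driver for the platform is loaded and exposes the usual mc0, mc1, ... directories with mc_name, ce_count and ue_count attributes. The function name is hypothetical, and an empty result only means no controller was registered, not necessarily that the hardware lacks ECC. It says nothing about which register the BIOS writes.

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_controllers(root: Path = EDAC_ROOT) -> list[dict]:
    """Return name and error counters for each memory controller EDAC registered."""
    controllers = []
    if not root.is_dir():
        return controllers  # no EDAC tree at all: driver not loaded or unsupported
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {"name": mc.name}
        for attr in ("mc_name", "ce_count", "ue_count"):
            attr_file = mc / attr
            # Attributes missing on a given kernel are reported as None.
            entry[attr] = attr_file.read_text().strip() if attr_file.is_file() else None
        controllers.append(entry)
    return controllers


for controller in read_edac_controllers():
    print(controller)
```

If error injection ever worked, the corrected-error counter (ce_count) would be the first place to look for evidence that ECC is really active.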
 
Thrawn: we don't hate you buddy :D


You just had the unfortunate luck of bumping a thread previously infiltrated by idiots named mux and Xennex81 who thought they could hack ECC HARDWARE onto normal consumer boards, with the same stability as server-grade hardware. Since Dan_D has ended up talking to a wall every time someone bumped this thread, he was expecting the same lack of reason from you!

You are simply hacking software, and trying to determine if the motherboard has BIOS-disabled features that you can hack into working by patching memory now, and possibly modifying the BIOS code in the future. THAT is actually possible, so I wish you luck :D
 
I get it!!!

You are doing it "For the Science"!!!!

I have my fingers crossed for you.
 
Well dude, why the hate? This is the same sort of hacking that uncovers things like the HD 6950 to 6970 mod:

http://www.techpowerup.com/articles/overclocking/vidcard/159

Usually things like this are locked via fuses, but not always. So it doesn't hurt to experiment :D

No it doesn't. I'm actually curious as to what differences are in the BIOS between non-ECC capable motherboards and those that have it enabled. The issue is further complicated by additional differences between motherboards designed with ECC in mind and those that are not. Going beyond CPU microcode support that is.
 
It doesn't matter. If any result is reached, and it turns out to corrupt data rather than protecting it, all I can lose is data on a Debian testing installation that I'm gonna wipe anyway.

But it's not like this is going anywhere. I believe I'm stuck as it stands (which is why I posted here, as I was hoping somebody could provide further insight).

I thought about it again, and ASUS not adding the ECC traces to their desktop boards does make sense. They save a lot of money by not needing to test that stuff or extend their QVLs with ECC memory, when the whole WS/workstation series of boards can do it and bring in 100 bucks more per unit. The only reason the traces would be there is if ASUS wasn't originally sure which markets to target with those specific boards of the very first S1366 generation.. Still unlikely?

So even if I would hit the jackpot, I'll probably never even notice it, if my mainboard doesn't have the physical connections..

Given that there are multiple uncertainties, I might be on the wrong platform for attempting this..

I do have the pinouts for the DIMM sockets and the LGA1366, but who can tell whether those pins are connected or not, given the board has so many layers the traces could sit on...
 
Wouldn't you be able to test that pretty easily with a digital multimeter? All you have to do is check whether there is continuity between the correct DIMM socket pins and the CPU socket pins, correct?
 
It came to my mind just 10 seconds ago, so I RACED back here to write just that before anyone notices, so that I won't look stupid.

I failed.. ;)

So that's the next step then!
 
Most of ASUS' current X99 motherboards do support Xeon CPUs, but they do not support ECC memory. The X99 Deluxe, X99-A and Rampage V Extreme are all examples of this: no official ECC support, and they probably don't have the traces for it. I haven't looked at the DDR4 guidelines for trace routing, but you're talking about DDR3 anyway. The P6T was an early X58 motherboard which never officially supported Xeons, so on a hardware level ECC support is probably right out.
 
I haven't done the measurements yet. You're probably right, but I still wanna check it, just to make sure. Maybe tomorrow or some other day this week.

In any case, I've made a few preparations, so these are the ECC pins, in case anyone ever needs that:


Image: ECC pins per channel on an LGA1366 socket

Image: ECC pins on a DDR3 DIMM socket

I hope I marked them all correctly according to the data sheets, I'm quite tired already... So no warranties included.
 
Just be careful with a multimeter. It tests continuity by applying a voltage and checking for current. If the meter runs on two or more batteries, you may be putting 3V+ between the test points, and if you place the probes on the wrong pins, or even bump a wrong one, you could put 3V+ across something that was only designed to handle, say, 1.5V or less. It would be silly to blow up your CPU or motherboard trying to measure the ECC traces.
 
I was thinking it'd be best to remove the processor as well as all RAM and power from the system, and then measure from open socket to socket. If the DIMM sockets are too narrow to use, I can still go to the back side of the board.
 
Sorry for the double post..

I'm a bit lost. At first I thought the traces just aren't there like Dan suggested, as I could not get a connection between DDR2_ECC[3] in the LGA1366 and CB[3] on the DIMM socket. To verify it, I tried to connect DQ[0] on the DIMM (pin #3) and DDR2_DQ[0] in the socket (pin W34). And it's the same thing, there is seemingly no connection.

But that's a DDR3 data pin, it's got to be connected?! I tried a few other circuits on the board, and continuity measurements worked there.

I checked everything multiple times, pin locations, correct memory channel and all, but I must be doing something wrong..
 
It's possible that some of the connections do exist, but not all of them and that space was provided in the design. Not for the P6T but for the P6T WS. While there are many PCB differences they may share the same basic memory trace and CPU socket design. I don't have both on hand and have never compared the two that way but it makes sense that ASUS would utilize sections of the PCB design for as many models as possible in order to reduce design times. This is especially true given how many models they have. Gone are the days when they use one high end PCB design with missing components for lower end models, but sharing aspects of the design architecture would accomplish the same basic goals.

In other words if you are designing a motherboard by computer (and I'm positive they do) then you can import sections of a given design into a new document and effectively modify the PCB layout for another model changing only what's necessary. So it makes sense to lay most of the groundwork for ECC motherboards with the WS models in mind and then just not connect one or more paths on the final design or take away one or two things to modify the base design for multiple models.
 
Hmm, but can a DDR3 data pin really be missing? I would've thought they need all 64 of them (hence making it a 64-bit data interface, in my layman's way of thinking).

Maybe the socket is actually upside down on the board?! Uhh.. I need to verify that.

Edit: Ahaha, it's upside down.. Bah! All over again...

Edit 2: Ok, it's done. I managed to confirm two data pin connections properly, so my measurement is working. I re-tested the ECC pins, and what I can state now is that Dan, your assumption was correct: the ECC traces on the P6T Deluxe are non-existent. With that proven, I will not continue, as I don't wanna solder any makeshift connections onto the board. Still, at least now I know why it can never work on my current machines. ;)
 
Thanks mux Xennex81 GrandAdmiralThrawn ! Your posts were helpful in pointing me in the right direction to getting ECC working on regular AMD APUs (Raven Ridge).

Also wondering if this is possible on LGA1151 since there are motherboards that are X99 and X79 that support ECC RDIMMs.. and DDR3 on Haswell-E
 