Intel Atom C2000 Chips Are Bricking Products

Megalith

There is no good news here aside from the fact that Intel has acknowledged the issue and "set aside a pot of cash to deal with the problem." The failure of C2000 processors is kind of a big deal because the chip family is pervasive in networking hardware. I know that some of you guys use, say, Synology NAS boxes. Well, if one goes kaput, this may be why.

Intel's Atom C2000 processor family has a fault that effectively bricks devices, costing the company a significant amount of money to correct. But the semiconductor giant won't disclose precisely how many chips are affected nor which products are at risk. Intel indicated in a January 2017 revision of its Atom C2000 family documentation that the chip line contains a clock flaw. Errata note AVR.54, titled "System May Experience Inability to Boot or May Cease Operation," explains that the Atom C2000 Low Pin Count bus clock outputs (LPC_CLKOUT0 and LPC_CLKOUT1) may stop functioning. Permanently. An Intel spokesperson in an email to The Register characterized the issue as "a degradation of a circuit element under high use conditions at a rate higher than Intel’s quality goals after multiple years of service."
 
" ...at a rate higher than Intel’s quality goals... "

That tells me some manufacturers are overclocking these chips, or Intel's management decided not to listen to the engineers and sold them at higher clock rates than designed.
 
That's not good.

I have to say that for home use I very much preferred AMD's AM1 chips over Intel's offerings in the low-power category. I just wish they had ECC memory support and more native SATA ports like the Intel Atom solutions.
 
Yikes.

I've never been a fan of Synology products, because I like to roll my own storage servers, but this was never the reason I thought people might regret their Synology investments.
 
I have other reasons I will not buy Synology products again. But this issue is an Intel thing, not a Synology thing.
 
Agree. Intel's error is the cause here, but when you have a "roll your own" system you can, if you want to, just replace the CPU or motherboard (or whatever other part may be bad) and solve the problem.

That type of flexibility (and upgradeability) is what I like about rolling my own, rather than buying neat little packaged appliances.
 
Well shit... all the networking (minus the switches; those are Cisco) in my lab runs on C2000s and I have a single DS1515+... And I consider all this equipment pretty reliable. Now I find out they're all ticking time bombs... FML
 
A little over a year ago we deployed nearly 100 Cisco ISR 4321s at remote sites that have this CPU in them and they all have to be replaced now. Going to be on the road for months visiting every single site to replace them. Our stores are completely dependent on these routers and we could start having mass failures left and right any time now. Fuck.
 
Well, that sounds like it will suck. That being said, they say "no one ever got fired for going with Cisco", so at the very least it's a defensible position to be in, especially when Intel's failures are so far-reaching, hitting numerous vendors.
 
Well shit... all the networking (minus the switches; those are Cisco) in my lab runs on C2000s and I have a single DS1515+... And I consider all this equipment pretty reliable. Now I find out they're all ticking time bombs... FML
Um, lots of Cisco gear runs C2000 processors.
From:

Breakdown list of all affected PIDs, along with versions affected and fixed:
Product ID            Possibly Affected VID  Fixed VID
NCS1K-CNTLR=          V01, V02, V03          V04
NC55-18H18F           V01                    V02
NC55-18H18F=          V01                    V02
NC55-18H18F-BA        V01                    V02
NC55-18H18F-BA=       V01                    V02
NC55-24H12F-SE        V01                    V02
NC55-24H12F-SE=       V01                    V02
NC55-24H12F-SB        V01                    V02
NC55-24H12F-SB=       V01                    V02
NC55-24X100G-SE       V01                    V02
NC55-24X100G-SE=      V01                    V02
NC55-24X100G-SB       V01                    V02
NC55-24X100G-SB=      V01                    V02
NC55-36X100G          V01, V02               V03
NC55-36X100G=         V01, V02               V03
NC55-36X100G-BA       V01, V02               V03
NC55-36X100G-BA=      V01, V02               V03
IR809G-LTE-GA-K9      V01, V02, or V03       V04
IR809G-LTE-NA-K9      V01                    V02
IR809G-LTE-VZ-K9      V01, V02, or V03       V04
IR829GW-LTE-GA-CK9    V01                    V02
IR829GW-LTE-GA-EK9    V01                    V02
IR829GW-LTE-GA-SK9    V01                    V02
IR829GW-LTE-GA-ZK9    V01                    V02
IR829GW-LTE-NA-AK9    V01                    V02
IR829GW-LTE-VZ-AK9    V01                    V02
ISR4321-AX/K9         V02 or lower           V03 or greater
ISR4321-B/K9(=)       V01 or lower           V02 or greater
ISR4321/K9(=)         V02 or lower           V03 or greater
ISR4321BR-V/K9        V02 or lower           V03 or greater
ISR4331/K9(=)         V02 or lower           V03 or greater
ISR4331B/K9(=)        V01 or lower           V02 or greater
ISR4331BR-V/K9        V01 or lower           V02 or greater
ISR4351-AX/K9         V02 or lower           V03 or greater
ISR4351/K9(=)         V02 or lower           V03 or greater
UCS-EN120E-108/K9(=)  V02 or lower           V03 or greater
UCS-EN140N-M2/K9(=)   V01 or lower           V02 or greater
ASA5506               V03 or earlier         V04 or later
ASA5506H              V03 or earlier         V04 or later
ASA5506W              V05 or earlier         V06 or later
ASA5508               V04 or earlier         V05 or later
ASA5516               V04 or earlier         V05 or later
ISA-3000-2C2F-K9      V01, V02, V03          V04
ISA-3000-4C-K9        V01, V02, V03          V04
N9K-C9504-FM-E        V01                    V02
N9K-C9508-FM-E        V01                    V02
N9K-X9732C-EX         V01                    V02
MX-84                 All
MS-350                All
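
For anyone with a pile of units to check against that list, here is a minimal Python sketch of the lookup. It is only an illustration: the dictionary holds a handful of rows copied from the list above (not the whole thing), the function names `vid_number` and `possibly_affected` are made up for this example, and it assumes a trailing "=" on a PID can be ignored, since the list gives identical versions for the PIDs it shows both with and without it. On most Cisco gear the PID and VID should be readable from `show inventory`, but treat that as something to verify on your own platform.

```python
import re

# A few rows from the list above: PID -> highest VID number listed as
# possibly affected. None means every version is listed as affected ("All").
MAX_AFFECTED_VID = {
    "ISR4321/K9": 2,
    "ISR4321-B/K9": 1,
    "ISR4331/K9": 2,
    "ASA5506": 3,
    "ASA5506W": 5,
    "ASA5508": 4,
    "ASA5516": 4,
    "N9K-C9504-FM-E": 1,
    "MX-84": None,
    "MS-350": None,
}


def vid_number(vid):
    """Parse a version ID such as 'V03' into the integer 3."""
    match = re.fullmatch(r"V(\d+)", vid.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized VID: {vid!r}")
    return int(match.group(1))


def possibly_affected(pid, vid):
    """Return True or False per the lookup, or None if the PID is not in it."""
    # Assumption: a trailing "=" does not change the affected versions.
    key = pid.strip().upper().rstrip("=")
    if key not in MAX_AFFECTED_VID:
        return None
    limit = MAX_AFFECTED_VID[key]
    if limit is None:
        return True  # listed as "All"
    return vid_number(vid) <= limit


if __name__ == "__main__":
    for pid, vid in [("ISR4321/K9", "V02"), ("ISR4321/K9", "V03"), ("MX-84", "V01")]:
        print(pid, vid, possibly_affected(pid, vid))
```

A result of None just means the PID is not in this cut-down dictionary, not that the unit is safe; the full list above (or the vendor's own notice) is still the thing to trust.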
 
That tells me some manufacturers are overclocking these chips, or Intel's management decided not to listen to the engineers and sold them at higher clock rates than designed.

I don't think it's the manufacturers clocking the chips too high. It sounds more like a QC problem on Intel's side, with a failure rate that's just higher than their usual acceptable threshold. Whether it points to inadequate testing, a straight-up faulty design, or engineers not being listened to, we probably will never know.
 
It's probably impossible for Intel to QC everything. I remember having to get a new motherboard because the Intel Sandy Bridge chipset would eventually have SATA problems. It cost Intel some $1 billion to replace all the affected equipment.
 
Doc: I'm still using a "broken" P67 7 years later, so I can attest that the rest of the product is flawless. (I was lazy and didn't bother to replace mine because I've only ever used the 2 SATA 3 ports on it, and not the other 4 SATA 2 ports.)

It will be interesting to see how this issue unfolds. Some designs, like Cisco's, might push the hardware much harder than other vendors' configurations. If you run your equipment in a stable 68°F environment with low load, it might last forever. If you run your device in an 85°F environment with heavy usage, you might want to be concerned. Other devices have a C2000 but don't use the affected pins, so they might never have an issue. From the wording I read, it sounds like some gear that hasn't failed yet might be fixable by patching, while other gear will have to be reworked. It's likely going to depend on how the chip was implemented and what options the vendor has.
 