Intel Atom C2000 Chips Are Bricking Products

Megalith · Feb 7, 2017

There is no good news here aside from the fact that Intel has acknowledged the issue and "set aside a pot of cash to deal with the problem." The failure of C2000 processors is kind of a big deal because they are pervasive in networking hardware. I know that some of you guys use, say, Synology NAS boxes. Well, if yours goes kaput, this may be why.

Intel's Atom C2000 processor family has a fault that effectively bricks devices, costing the company a significant amount of money to correct. But the semiconductor giant won't disclose precisely how many chips are affected nor which products are at risk. Intel indicated in a January 2017 revision of its Atom C2000 family documentation that the chip line contains a clock flaw. Errata note AVR.54, titled "System May Experience Inability to Boot or May Cease Operation," explains that the Atom C2000 Low Pin Count bus clock outputs (LPC_CLKOUT0 and LPC_CLKOUT1) may stop functioning. Permanently. An Intel spokesperson in an email to The Register characterized the issue as "a degradation of a circuit element under high use conditions at a rate higher than Intel’s quality goals after multiple years of service."

dgingeri · Feb 7, 2017

Megalith said:
" ...at a rate higher than Intel’s quality goals... "

That tells me some manufacturers are overclocking these chips, or Intel's management decided to not listen to the engineers and sell them for higher clock rates than designed.

DeathFromBelow · Feb 7, 2017

That's not good.

I have to say that for home use I very much preferred AMD's AM1 chips over Intel's offerings in the low-power category. I just wish they had ECC memory support and more native SATA ports like the Intel Atom solutions.

Zarathustra[H] · Feb 7, 2017

Yikes.

I've never been a fan of Synology products, because I like to roll my own storage servers, but this was never the reason I thought people might regret their Synology investments.

Zarathustra[H] · Feb 7, 2017

ServeTheHome has an interesting piece adding some detail on this. They suggest a lot of the silence on this subject has to do with Intel's NDAs with just about all of their vendors.

rgMekanic · Feb 7, 2017

And AMD Stock ryzes

necrosis · Feb 7, 2017

Zarathustra[H] said:
Yikes.

I've never been a fan of Synology products, because I like to roll my own storage servers, but this was never the reason I thought people might regret their Synology investments.

I have other reasons I will not buy Synology products again. But this issue is an Intel thing, not a Synology thing.

Zarathustra[H] · Feb 7, 2017

necrosis said:
I have other reasons I will not buy Synology products again. But this issue is an Intel thing, not a Synology thing.

Agreed. Intel's error is the cause here, but when you have a "roll your own" system you can, if you want to, just replace the CPU or motherboard (or whatever other part may be bad) and solve the problem.

That type of flexibility (and upgradeability) is what I like about rolling my own, rather than buying neat little packaged appliances.

criccio · Feb 7, 2017

Interesting. We just began rolling these out to customers...

https://netgate.com/products/sg-4860.html

EDIT

Looks like Netgate/pfSense has acknowledged it. https://blog.pfsense.org/?p=2297

FrozenSteel · Feb 7, 2017

Well shit... all the networking (minus the switches; those are Cisco) in my lab runs on C2000s and I have a single DS1515+... And I consider all this equipment pretty reliable. Now I find out they're all ticking time bombs... FML

aaronmac3232 · Feb 7, 2017

A little over a year ago we deployed nearly 100 Cisco ISR 4321s at remote sites that have this CPU in them and they all have to be replaced now. Going to be on the road for months visiting every single site to replace them. Our stores are completely dependent on these routers and we could start having mass failures left and right any time now. Fuck.

Zarathustra[H] · Feb 8, 2017

aaronmac3232 said:
A little over a year ago we deployed nearly 100 Cisco ISR 4321s at remote sites that have this CPU in them and they all have to be replaced now. Going to be on the road for months visiting every single site to replace them. Our stores are completely dependent on these routers and we could start having mass failures left and right any time now. Fuck.

Well, that sounds like it will suck. That being said, they say "no one ever got fired for going with Cisco", so at the very least it's a defensible position to be in, especially when Intel's failures are so far-reaching, hitting numerous vendors.

DocSavage · Feb 8, 2017

FrozenSteel said:
Well shit... all the networking (minus the switches; those are Cisco) in my lab runs on C2000s and I have a single DS1515+... And I consider all this equipment pretty reliable. Now I find out they're all ticking time bombs... FML

Um, lots of Cisco gear runs C2000 processors.
From:

Breakdown list of all affected PIDs, along with versions affected and fixed:
Product ID              Possibly Affected VID   Fixed VID
NCS1K-CNTLR=            V01, V02, V03           V04
NC55-18H18F             V01                     V02
NC55-18H18F=            V01                     V02
NC55-18H18F-BA          V01                     V02
NC55-18H18F-BA=         V01                     V02
NC55-24H12F-SE          V01                     V02
NC55-24H12F-SE=         V01                     V02
NC55-24H12F-SB          V01                     V02
NC55-24H12F-SB=         V01                     V02
NC55-24X100G-SE         V01                     V02
NC55-24X100G-SE=        V01                     V02
NC55-24X100G-SB         V01                     V02
NC55-24X100G-SB=        V01                     V02
NC55-36X100G            V01, V02                V03
NC55-36X100G=           V01, V02                V03
NC55-36X100G-BA         V01, V02                V03
NC55-36X100G-BA=        V01, V02                V03
IR809G-LTE-GA-K9        V01, V02, or V03        V04
IR809G-LTE-NA-K9        V01                     V02
IR809G-LTE-VZ-K9        V01, V02, or V03        V04
IR829GW-LTE-GA-CK9      V01                     V02
IR829GW-LTE-GA-EK9      V01                     V02
IR829GW-LTE-GA-SK9      V01                     V02
IR829GW-LTE-GA-ZK9      V01                     V02
IR829GW-LTE-NA-AK9      V01                     V02
IR829GW-LTE-VZ-AK9      V01                     V02
ISR4321-AX/K9           V02 or lower            V03 or greater
ISR4321-B/K9(=)         V01 or lower            V02 or greater
ISR4321/K9(=)           V02 or lower            V03 or greater
ISR4321BR-V/K9          V02 or lower            V03 or greater
ISR4331/K9(=)           V02 or lower            V03 or greater
ISR4331B/K9(=)          V01 or lower            V02 or greater
ISR4331BR-V/K9          V01 or lower            V02 or greater
ISR4351-AX/K9           V02 or lower            V03 or greater
ISR4351/K9(=)           V02 or lower            V03 or greater
UCS-EN120E-108/K9(=)    V02 or lower            V03 or greater
UCS-EN140N-M2/K9(=)     V01 or lower            V02 or greater
ASA5506                 V03 or earlier          V04 or later
ASA5506H                V03 or earlier          V04 or later
ASA5506W                V05 or earlier          V06 or later
ASA5508                 V04 or earlier          V05 or later
ASA5516                 V04 or earlier          V05 or later
ISA-3000-2C2F-K9        V01, V02, V03           V04
ISA-3000-4C-K9          V01, V02, V03           V04
N9K-C9504-FM-E          V01                     V02
N9K-C9508-FM-E          V01                     V02
N9K-X9732C-EX           V01                     V02
MX-84                   All
MS-350                  All
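
If you have a pile of these to check, here is a rough Python sketch of how the list above could be turned into a lookup. It only covers a handful of rows copied from the list, and the names (`FIRST_FIXED_VID`, `vid_number`, `possibly_affected`) are made up for illustration, not anything from Cisco. It assumes you already have each unit's PID and VID as strings, for example from `show inventory` output, with the spare-part "(=)" suffix stripped.

```python
import re

# First fixed VID per product ID, copied from a few rows of the list above.
# Hypothetical table: extend it with the PIDs you actually own.
FIRST_FIXED_VID = {
    "ISR4321/K9": 3,       # V02 or lower affected, V03 or greater fixed
    "ISR4331/K9": 3,       # V02 or lower affected, V03 or greater fixed
    "ASA5506": 4,          # V03 or earlier affected, V04 or later fixed
    "ASA5506W": 6,         # V05 or earlier affected, V06 or later fixed
    "ASA5508": 5,          # V04 or earlier affected, V05 or later fixed
    "N9K-C9504-FM-E": 2,   # V01 affected, V02 fixed
}


def vid_number(vid: str) -> int:
    """Turn a version ID string such as 'V03' into the integer 3."""
    match = re.fullmatch(r"V(\d+)", vid.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized VID: {vid!r}")
    return int(match.group(1))


def possibly_affected(pid: str, vid: str):
    """True/False for PIDs in the table, None if the PID is not listed here."""
    fixed = FIRST_FIXED_VID.get(pid)
    if fixed is None:
        return None
    return vid_number(vid) < fixed


if __name__ == "__main__":
    for pid, vid in [("ISR4321/K9", "V02"), ("ASA5506", "V04")]:
        print(pid, vid, possibly_affected(pid, vid))
```

A `None` result only means the PID is missing from this small dictionary, not that the unit is safe. The "All" rows (MX-84, MS-350) have no fixed VID listed, so they would need separate handling.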

britjh22 · Feb 8, 2017

dgingeri said:
That tells me some manufacturers are overclocking these chips, or Intel's management decided to not listen to the engineers and sell them for higher clock rates than designed.

I don't think it's the manufacturers clocking the chips too high. It sounds more like a QC problem on Intel's side, with a failure rate above their usual acceptable threshold. Whether it points to inadequate testing, a straight-up faulty design, or engineers not being listened to, we probably will never know.

DocSavage · Feb 8, 2017

britjh22 said:
I don't think it's the manufacturers clocking the chips too high. It sounds more like a QC problem on Intel's side, with a failure rate above their usual acceptable threshold. Whether it points to inadequate testing, a straight-up faulty design, or engineers not being listened to, we probably will never know.

It's probably impossible for Intel to QC everything. I remember having to get a new motherboard because the Intel Sandy Bridge chipset would eventually have SATA problems. It cost Intel some $1 billion to replace all the affected equipment.

bman212121 · Feb 8, 2017

Doc: I'm still using a "broken" P67 7 years later, so I can attest that the rest of the product is flawless. (I was lazy and didn't bother to replace mine because I've only ever used the 2 SATA 3 ports on it, and not the other 4 SATA 2 ports.)

It will be interesting to see how this issue unfolds. Some designs like Cisco's might push the hardware much harder than other vendors' configurations. If you run your equipment in a stable environment that's 68F with low load, it might last forever. If you run your device in an 85F environment with heavy usage, you might want to be concerned. Other devices have a C2000 but don't use the affected pins, so they might never have an issue. From the wording I read, it sounds like some stuff that isn't broken yet might be fixable by patching, while other stuff will have to be reworked. It's likely going to depend upon how it was implemented and what options they have.
