Cisco Says Router Bug Could Be Result Of ‘Cosmic Radiation’

HardOCP News

I've heard a lot of excuses in my time but I am certain that this is the first time I have ever heard a company blame a bug on "cosmic radiation." Let's just hope the ol' "wrap it in tinfoil" trick works. You might not get very good wireless signal but at least your router would be safe from cosmic radiation. ;)

A Cisco bug report addressing “partial data traffic loss” on the company’s ASR 9000 Series routers contends that a “possible trigger is cosmic radiation causing SEU soft errors.” Cosmic radiation? While we all know that cosmic radiation can wreak havoc on electronic devices, there’s far less agreement as to the likelihood of it being the culprit in this case. Or that Cisco could know one way or the other.
 
seriously.. the first and only time I've seen a god-given opportunity to promote ECC memory and they blame it on 'cosmic radiation'. So why do enterprises pay twice as much for the same DIMM? I've got 512GB of ECC and I'm not half as convinced to use it as my 'backbone' as I was 2 minutes ago.. (I've not read the 'embedded' article, but as an ECC owner why should I.. I thought I was 'bullet proof' with my purchase already.. ;-)
 
Actually, I listened to a talk at a semiconductor reliability workshop a few years back from a TI R&D engineer. He made the same claim, but, as many redditors point out, with the caveat that cosmic radiation is unfocused (systemic, not component-centric) and mitigated by things like ECC. And I agree with n31l, it's a good time to upsell ECC.
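For anyone who wants to see what that ECC mitigation is actually doing, here's a toy Hamming(7,4) single-bit-correct sketch in Python. Purely illustrative: real ECC DIMMs use SECDED codes over 64-bit words inside the memory controller, and none of the names below come from any real library.

Code:
# Toy illustration of why ECC helps with SEUs: a Hamming(7,4) code can
# detect and correct any single flipped bit in a codeword. Real ECC DIMMs
# use SECDED over wider words, but the principle is the same.

def hamming74_encode(d):
    """Encode 4 data bits (list of 0/1) into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Locate and fix a single-bit error, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 0 = no error, else 1-based bit position
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[5] ^= 1                          # simulate a cosmic-ray bit flip
assert hamming74_correct(code) == word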
 
They should have some router gear sent up to the space station and measure the SEU rate. If higher, then there might be something to it. That doesn't mean they get to just say, "lol cosmic radiation did it". It means they'll have some data to design against to make sure the random stray proton doesn't break stuff in future products.
 
Hmm, or the memory was simply frying due to heat in improperly climate-controlled settings... meaning the routers run hot and don't have a fan inside to save power; that's fine in air-conditioned server closets and data centers, but in home offices where there is no air circulating they cook the electronics... all cosmic radiation is, is sunlight that is not scattered by several layers of water vapor to diffuse the heat...
 
So they're admitting they didn't engineer it well enough to survive the harsh conditions of the planet Earth?
Yeah, Earth is a moving target.
Their EM rejection nets are in the wrong place.

Once they have sussed how to stop the Earth spinning they can certify products on one side of the planet.
 
There is some truth to what they say.

"Cosmic rays have sufficient energy to alter the states of circuit components in electronic integrated circuits, causing transient errors to occur, such as corrupted data in electronic memory devices, or incorrect performance of CPUs, often referred to as "soft errors" (not to be confused with software errors caused by programming mistakes/bugs). This has been a problem in electronics at extremely high-altitude, such as in satellites, but with transistors becoming smaller and smaller, this is becoming an increasing concern in ground-level electronics as well.[77] Studies by IBM in the 1990s suggest that computers typically experience about one cosmic-ray-induced error per 256 megabytes of RAM per month.[78] To alleviate this problem, the Intel Corporation has proposed a cosmic ray detector that could be integrated into future high-density microprocessors, allowing the processor to repeat the last command following a cosmic-ray event.[79]"

Cosmic ray - Wikipedia, the free encyclopedia
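A quick back-of-the-envelope on that IBM figure (roughly one cosmic-ray-induced error per 256 MB of RAM per month), scaled up to the 512GB box mentioned earlier in the thread. Treat it as order-of-magnitude only; a 1990s error rate won't map cleanly onto modern DRAM processes.

Code:
# Back-of-the-envelope using the IBM estimate quoted above
# (~1 soft error per 256 MB of RAM per month). 512 GB is just the example
# capacity from the ECC post above; this is order-of-magnitude only.

errors_per_mb_month = 1 / 256          # IBM's 1990s estimate
ram_mb = 512 * 1024                    # 512 GB expressed in MB

errors_per_month = ram_mb * errors_per_mb_month
print(f"~{errors_per_month:.0f} soft errors/month")    # ~2048
print(f"~{errors_per_month / 30:.0f} per day")          # ~68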
 
I've heard a lot of excuses in my time but I am certain that this is the first time I have ever heard a company blame a bug on "cosmic radiation." Let's just hope the ol' "wrap it in tinfoil" trick works. You might not get very good wireless signal but at least your router would be safe from cosmic radiation. ;)

I for one would NOT like to have to pull our 9910s out of the rack to wrap them in tinfoil. I think I'd just let the cosmic rays have their way with them.:wtf:
 
Who remembers when RAM and other chips had metal caps over the die to prevent cosmic rays from corrupting operation? I do, I do!

Attack of the Cosmic Rays! (Ksplice Blog) (tl;dr there are many sources of noise that can corrupt bits, and cosmic rays/radiation is only one of several)
 
Cosmic rays are a problem with in-space electronics, but not at 500 feet above sea level.
 
I vaguely recall reading about one of the differences between the E5 and E7 chips being that the E7s have safeguards against radiation like this.

edit: Found it! source
Intel said:
Soft errors mostly occur because of random events affecting electronic circuits at the molecular level, such as alpha particles or cosmic rays dislodging electrons and therefore moving charges from one part of a circuit to another. Such events usually change the logic behavior of one or more gates.
 
If only I could harness this cosmic energy and use it to wreak havoc around the world.
 
"Sunspots" is my favorite bullshit IT excuse. It works for anything. Network is slow? Those damn sunspots, we can't do anything about it.
 
Or 'alpha particles', or 'beta particles'. They used to work, too. Haven't tried them in quite a while, though.
 
SEU due to high-energy neutrons is not bullshit and there are plenty of papers on this topic. This is why the aerospace industry either sticks with 90nm (less susceptible) tech, ensures recoverable state machines in their code, or uses majority-wins decision making...

There are ways to manage it, and poor code can result in locked state machines.
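For the curious, "majority wins" is essentially triple modular redundancy: do the work three times and vote, so a single upset in one copy can't decide the outcome. A rough Python sketch below, as an illustration only; real rad-tolerant designs triplicate flip-flops and voters in hardware rather than wrapping application code like this, and the function here is hypothetical.

Code:
# Rough sketch of "majority wins" (triple modular redundancy): run the
# same computation three times and take the majority result, so a single
# SEU in one copy can't corrupt the output. Illustrative only.

from collections import Counter

def tmr(compute, *args):
    """Run compute() three times and return the majority result."""
    results = [compute(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority -- all three copies disagree")
    return value

# Usage: any deterministic step can be wrapped this way.
checksum = tmr(lambda data: sum(data) & 0xFFFF, [0x12, 0x34, 0x56])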
 
SEU due to high-energy neutrons is not bullshit and there are plenty of papers on this topic. This is why the aerospace industry either sticks with 90nm (less susceptible) tech, ensures recoverable state machines in their code, or uses majority-wins decision making...

There are ways to manage it, and poor code can result in locked state machines.

The "hot" chip for space is the RAD5500...45 nm SOI...I doubt we will se 7nm CPU is space for the forseeable future ;)

RAD5500 - Wikipedia, the free encyclopedia
 
The "hot" chip for space is the RAD5500...45 nm SOI...I doubt we will se 7nm CPU is space for the forseeable future ;)

RAD5500 - Wikipedia, the free encyclopedia
You will find that the processor will do alot of error checking at die level for potential registers flipping . Read the papers 90nm was the last th
The "hot" chip for space is the RAD5500...45 nm SOI...I doubt we will se 7nm CPU is space for the forseeable future ;)

RAD5500 - Wikipedia, the free encyclopedia
What you will find is that core will utilise some sort of error correcting or majority wins.
Re-read what I wrote , there is a nice OR there... your post just re-affirms my statement.
Likewise if you read the datasheet it states <1e-9 SEU events per day & that it is latchup immune. the latchup immune was the work put in to protect aspects of the process

90nm is the smallest geometry that can be classed as "SEU hardened" when no other additional considerations are done. This is why the ProASIC FPGA line from Microsemi still exists.
 