ESD & Electromigration Rant


Ice Czar

It's been a while since I posted my ESD rant :p
and since I'm now the mod of Power Supplies...
It is a power issue, more or less; it's certainly related.
First up:

ESD Precautions and Practices
The ATX motherboard specification keeps +5VSB (standby) power supplied to the motherboard.
Unlike the PC/XT, AT, Baby AT and LPX form factors, which used a manual switch to turn on the power to the motherboard, the ATX form factor employs a "soft power" scheme.
This allows software control of power: the OS or another app can turn the computer off, and it also enables Wake on LAN or Wake on WAN. Since a low level of power is supplied to the board at all times while the PSU has AC power, you need to address this whenever you're swapping out components.

The ideal solution is to unplug the power supply from the wall socket, then ground the case with an alligator clip to a separate ground point and use a wrist strap to ground yourself.

An alternative method is to turn off the PSU with its own switch.
This is reliable only if the board has a 5VSB LED indicator on it,
at which point you know the board is unpowered, yet the case is still grounded via the AC socket.

If there is no switch on the PSU and no power LED on the motherboard, the alternative is to unplug the PSU from the wall socket, disconnect the main power connector from the board, and then plug the PSU back into the AC socket. Once again, the case is grounded.
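All three procedures come down to two conditions: no +5VSB reaching the board, and a grounded case. Here is a minimal Python sketch of that logic. The `BenchState` fields and the function names are hypothetical. It assumes two things: the case is bonded to the PSU, and the PSU's rocker switch does not break the earth connection. A wrist strap is still needed and isn't modelled.

```python
from dataclasses import dataclass


@dataclass
class BenchState:
    """Hypothetical snapshot of the bench setup described above."""
    ac_plugged_in: bool            # PSU cord in a grounded wall socket
    psu_switch_on: bool            # PSU rocker switch (True if the PSU has none)
    main_connector_attached: bool  # main ATX power connector seated on the board
    separate_case_ground: bool     # alligator clip from case to a separate ground


def board_has_standby_power(s: BenchState) -> bool:
    # +5VSB can only reach the board if AC is present, the PSU switch is on,
    # and the main ATX connector is still plugged into the board.
    return s.ac_plugged_in and s.psu_switch_on and s.main_connector_attached


def case_is_grounded(s: BenchState) -> bool:
    # Assumption: the case is bonded to the PSU, whose earth pin stays
    # connected through the AC socket even with the PSU switch off.
    return s.ac_plugged_in or s.separate_case_ground


def safe_to_swap_parts(s: BenchState) -> bool:
    # Both conditions: board unpowered AND case grounded.
    # (A wrist strap on the person is still needed and is not modelled.)
    return not board_has_standby_power(s) and case_is_grounded(s)


procedures = {
    "unplugged + case clipped to ground": BenchState(False, True, True, True),
    "PSU switch off, still plugged in": BenchState(True, False, True, False),
    "main connector pulled, replugged": BenchState(True, True, False, False),
    "just powered down from the OS": BenchState(True, True, True, False),
}
for name, state in procedures.items():
    print(f"{name:36s} safe={safe_to_swap_parts(state)}")
```

The last case is the common mistake. Shutting down from the OS leaves +5VSB live on the board.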

Keep in mind that the PSU typically makes metal-to-metal contact with the case (at least it did in the past) and would have at least some contact via the screws. These days, with painted cases and supplies, some manufacturers augment that with a grounding strap for the PSU itself (very typical in server supplies, where they take no chances with contact).

And religiously touch a bare metal surface in the case to ground yourself when swapping components (this is the usual practice among the enthusiasts here).
But all in all, wrist straps are cheap, and so are alligator clips and a bit o' wire.
A grounded work pad is also a good investment. So is being aware of how large a role low humidity plays in ESD events, especially in winter, when hot-air heating desiccates the air.

(I live on the Rocky Mountain High Desert Plateau, where it's already dry. In winter, with central home heating, I humidify the environment when working on components.)

ESD Reference
"According to (not so) recent studies conducted by the AT & T Bell labs, 25 % of all component failures today are related to E.S.D and out of all defective components that arrive 50%are damaged by E.S.D. the annual damage due to these failures is estimated at 25 Billion dollars"

An integrated circuit (IC) consists of many transistors fabricated on one chip. Due to advances in LSI and VLSI, thousands of transistors are crowded onto a single chip. By decreasing the thickness of the gate oxides and interconnecting lines, manufacturers hope to achieve much higher speeds at very low power consumption. Under these conditions, an electrostatic discharge may pass through an IC without a suitable protective mechanism to divert or diminish the resulting current. If so, the discharge may raise the temperature of a junction inside the component to its melting point, damaging the junction or the interconnecting lines. Since surface-mount devices are smaller than conventional ICs, they are even more susceptible to ESD damage. ESD causes two main types of failures:

1. Immediate failure, where the effect can be readily seen by the equipment manufacturer.

2. Delayed failure, where the device is damaged only up to the point where it may pass quality control tests, but wears out sooner than its rated time.

Table 2
Examples of Static Generation
Typical Voltage Levels

Means of Generation            at 10-25% RH    at 65-90% RH
Walking across carpet              35,000 V         1,500 V
Walking across vinyl tile          12,000 V           250 V
Worker at bench                     6,000 V           100 V
Chair with urethane foam           18,000 V         1,500 V
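Table 2 shows how much humidity matters. To put those voltages in perspective, here is a rough Python sketch that treats the body as the standard ESD "human body model" test network. That network is an assumption brought in from outside this post: about 100 pF discharging through 1.5 kOhm. The sketch estimates stored energy (E = 1/2 C V^2) and peak current (I = V / R, ignoring inductance) for each row. Treat the numbers as order-of-magnitude only.

```python
# Assumed human-body-model parameters (standard HBM test network values).
HBM_CAPACITANCE_F = 100e-12   # 100 pF
HBM_RESISTANCE_OHM = 1500.0   # 1.5 kOhm


def hbm_energy_joules(volts: float) -> float:
    """Energy stored in the body capacitance: E = 1/2 * C * V^2."""
    return 0.5 * HBM_CAPACITANCE_F * volts ** 2


def hbm_peak_current_amps(volts: float) -> float:
    """Approximate peak discharge current: I = V / R (ignores inductance)."""
    return volts / HBM_RESISTANCE_OHM


# Table 2: (10-25% RH, 65-90% RH) voltages.
TABLE_2 = {
    "Walking across carpet": (35_000, 1_500),
    "Walking across vinyl tile": (12_000, 250),
    "Worker at bench": (6_000, 100),
    "Chair with urethane foam": (18_000, 1_500),
}

for source, (dry_v, humid_v) in TABLE_2.items():
    print(f"{source:27s} dry: {hbm_energy_joules(dry_v) * 1e3:7.2f} mJ, "
          f"{hbm_peak_current_amps(dry_v):5.1f} A | "
          f"humid: {hbm_energy_joules(humid_v) * 1e3:6.3f} mJ, "
          f"{hbm_peak_current_amps(humid_v):5.2f} A")
```

Because energy goes with V squared, the carpet case drops by a factor of roughly 500 between dry and humid air. That's the whole argument for humidifying.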

ESD Damage—How Devices Fail
Electrostatic damage to electronic devices can occur at any point from manufacture to field service. Damage results from handling the devices in uncontrolled surroundings or when poor ESD control practices are used. Generally damage is classified as either a catastrophic failure or a latent defect.

Catastrophic Failure
When an electronic device is exposed to an ESD event, it may no longer function. The ESD event may have caused a metal melt, junction breakdown, or oxide failure. The device's circuitry is permanently damaged, causing the device to fail. Such failures usually can be detected when the device is tested before shipment. If the ESD event occurs after test, the damage will go undetected until the device fails in operation.

Latent Defect
A latent defect, on the other hand, is more difficult to identify. A device that is exposed to an ESD event may be partially degraded, yet continue to perform its intended function. However, the operating life of the device may be reduced dramatically. A product or system incorporating devices with latent defects may experience premature failure after the user places it in service. Such failures are usually costly to repair and in some applications may create personnel hazards.

It is relatively easy with the proper equipment to confirm that a device has experienced catastrophic failure. Basic performance tests will substantiate device damage. However, latent defects are extremely difficult to prove or detect using current technology, especially after the device is assembled into a finished product.

Static Electricity - Electrostatic Discharge (ESD)

"Most books or articles indicate that a spark can't be seen until the voltage on your body reaches between 450 to 750 VDC. Others indicate that they are very hard to notice until it reaches 1000 VDC. For most people, to feel a shock from a static electricity discharge the voltage is between 2,000-4,000V. A 0.5mm arch of static electricity carries approximately 2850V."

Semiconductor Electromigration In-Depth

Ground that mat, wear a wrist strap and, if possible, humidify the environment.

When handling components, avoid touching any chips, circuitry and the slot contact fingers; hold PCBs by the edges wherever possible.


Originally posted by SB22
As long as you didn't feel any sort of "shock" between you and your equipment, you should be fine.

ESD Susceptibility Analysis

"ESD votages sufficient to damage semiconductor devices are often lower than the threshold of human sensory perception, making a person unaware that a static discharge has taken place"

Originally posted by Deadlierchair
Wow, good post Ice Czar...but to not be totally anal about all of those things, would it be pretty much safe to touch stuff if I touch the metal on my case while it is off, but still plugged in and grounded?

That's the basic procedure most people use. It's best to do it every other move or so, and be aware of exactly how much RH (relative humidity) influences static discharge.
Take great care never to touch any chip or lead, and handle only the PCB, preferably by the edges.

The other point of my post: with the "typical" procedure, the risk of an immediate, cause-and-effect catastrophic failure is low...

This board is filled every day with people who have developed RAM errors, data corruption problems (generally RAM), etc. Most of these can be traced to either poor power regulation (transient response) of the PSU, or ESD.

Latent defects caused by ESD in any IC (and ICs are everywhere, from HDDs to NICs, CPUs, RAM, etc.) are massively underrated as a cause of problems. Suppose you have eliminated power fluctuation problems (PSU voltage regulation and power conditioning) and still experience a component failure. Odds are it was a latent defect, either from installation or one that wasn't caught during manufacturing.
The membership displays a cavalier attitude towards this issue for two reasons. RMAs are pretty easy, and members rarely use the same component for its full rated lifespan, so they upgrade before the eventual premature failure becomes apparent.

A latent defect not only affects the lifespan; it degrades the performance of the IC as well. It is often the difference between the "Golden Chip" benchmark leader, the norm, and "why can't I get the same OC as this guy? I've got the same components."
Electromigration is the mechanism that typically first degrades and then kills ICs (integrated circuits).

Semiconductor Electromigration In-Depth < the totally understandable link (if somewhat dated in its particulars these days)

What is electromigration?

The Harris Semiconductor Lexicon of technical terms puts it this way:

"Motion of ions of a metal conductor (such as aluminum) in response to the passage of high current through it. Such motion can lead to the formation of "voids" in the conductor, which can grow to a size where the conductor is unable to pass current. Electromigration is aggravated at high temperature and high current density and therefore is a reliability "wear-out" process. Electromigration is minimized by limiting current densities and by adding metal impurities such as copper or titanium to the aluminum."

Electromigration is an effect that occurs when an extremely dense electron flow knocks off atoms within the wire and moves them, leaving a gap at one end and high stress at the other. In a chip, the formation of such a void will cause an open circuit and result in a failure. At the other end, the increase of stresses can cause fracture of the insulator around the wire and shorting.
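"Extremely dense electron flow" means current density: current divided by the wire's cross-sectional area (J = I / (width x thickness)). Here is a minimal Python sketch showing why shrinking lines matters. It pushes the same hypothetical 1 mA through a hypothetical 0.3 um thick line at the 0.18 um and 0.09 um widths mentioned later in this post. The current and thickness are illustrative, not real process figures.

```python
def current_density_a_per_cm2(current_a: float, width_um: float,
                              thickness_um: float) -> float:
    """J = I / (width * thickness), with the cross-section converted to cm^2."""
    um_to_cm = 1e-4
    area_cm2 = (width_um * um_to_cm) * (thickness_um * um_to_cm)
    return current_a / area_cm2


# Hypothetical example: the same 1 mA through a line 0.3 um thick,
# at 0.18 um and 0.09 um widths.
I_A = 1e-3
THICKNESS_UM = 0.3
j_018 = current_density_a_per_cm2(I_A, 0.18, THICKNESS_UM)
j_009 = current_density_a_per_cm2(I_A, 0.09, THICKNESS_UM)
print(f"0.18 um line: {j_018:.2e} A/cm^2")
print(f"0.09 um line: {j_009:.2e} A/cm^2 ({j_009 / j_018:.1f}x denser)")
```

Halving the width alone doubles J for the same current. If the thickness shrinks too, J quadruples.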

Electromigration and Voids @ Cornell
Electromigration Simulator @ MIT
processing temperature -- increasing the processing temperature will increase the initial tensile stress present in the line due to thermal mismatch, thereby leading to earlier failures. To neglect the thermal stress, set the process temperature the same as the test temperature.
test temperature -- increasing the test temperature will dramatically reduce the calculation time, since diffusivity follows an Arrhenius relationship with temperature.

The Arrhenius equation roughly translates into this rule of thumb:
Each 10°C (18°F) temperature rise reduces component life by 50%
Conversely, each 10°C (18°F) temperature reduction increases component life by 100%.
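For the curious, here is the Arrhenius acceleration factor behind that rule of thumb: AF = exp((Ea / k) * (1/T_from - 1/T_to)), with temperatures in kelvin and k the Boltzmann constant. The Python sketch below compares the exact formula with the "halves every 10°C" shortcut, relative to 40°C. The 0.6 eV activation energy is an illustrative assumption, not a measured value. Real parts have their own Ea, and the exact factor depends on it.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_acceleration(t_from_c: float, t_to_c: float, ea_ev: float) -> float:
    """Factor by which a thermally activated wear-out process speeds up
    going from t_from_c to t_to_c (greater than 1 when t_to_c is hotter):
    AF = exp((Ea / k) * (1/T_from - 1/T_to)), temperatures in kelvin."""
    t_from_k = t_from_c + 273.15
    t_to_k = t_to_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_from_k - 1.0 / t_to_k))


def rule_of_thumb_life(t_ref_c: float, t_c: float) -> float:
    """Life at t_c relative to life at t_ref_c, using 'halves every +10 C'."""
    return 2.0 ** ((t_ref_c - t_c) / 10.0)


EA_EV = 0.6  # illustrative activation energy, not a measured value
for t in (30, 40, 50, 60):
    arrhenius_life = 1.0 / arrhenius_acceleration(40, t, EA_EV)
    print(f"{t} C: Arrhenius {arrhenius_life:.2f}x, "
          f"rule of thumb {rule_of_thumb_life(40, t):.2f}x (vs. 40 C)")
```

With Ea around 0.6 eV, the exact factor between 40°C and 50°C comes out very close to 2. A higher or lower Ea makes the real effect steeper or gentler than the shortcut.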

However, there is more to it (caution: heavy wading):
"Electromigration mechanisms are accelerated by current density as well as temperature. The general relationship is sometimes referred to as Black's Equation. Just as with the Arrhenius equation, we can observe the electromigration effects on lifetimes using a graphical approach."

So both elevated temperature and higher current density (which rises with voltage) can cause voids to form in the circuits of any chip. The problem becomes more and more important as the number of atoms making up the width of a circuit decreases. (That first link was written when the manufacturing scale was 0.18 microns; we are now at 0.09 microns (90 nm) with some chips.) In addition, the clock rates at which power is cycled through them are that much higher as well. That makes chips all the more susceptible to ESD, voltage irregularities and temperature.

In short, I'm very serious about my ESD precautions, power conditioning, power supplies and my thermal solution (under 40°C CPU at full load, with a 30°C SYS minimum). Considering the price I've paid for my workstation, I want to squeeze every last hour out of it that I can. A gaming rig may or may not find another role in life once it's yesterday's news. Mine will be retired to a nice animation cluster to live out its full lifespan. But these cautions are just as important for anyone aspiring to a record overclock or killer benchmark ;)
I'm just documenting an expired bookmark :p

Electromigration ain't so easy, even if you're Intel.

Factory flaws yield headaches for chipmakers

April 24, 2004 Reuters

For chipmakers, problems on the factory floor are increasingly turning into big headaches in the executive suite.

Some of the world's biggest chipmakers have lost both money and time straightening out the extraordinarily complex process of turning microchip designs and discs of silicon into working electronics.

The difficulties have only worsened as the industry adopts new design features smaller than the wavelength of light, while moving to larger silicon wafers that can produce more than twice as many chips as previous wafers.

While those new technologies greatly increase the potential for churning out stacks of more powerful chips at lower costs, they have also thrown up hurdles that even the largest chipmakers have occasionally stumbled over.

The most recent slip-up comes from IBM's $3 billion fabrication plant, or fab, in East Fishkill, N.Y. IBM executives have acknowledged that manufacturing problems at the plant contributed to a $150 million loss that the company's chip business had last quarter.

The fab, which produces chips with features as small as 90 nanometers, or billionths of a meter, is one of the world's most advanced.

"It does seem that there has definitely been a bit of a bigger hurdle in the transition to 90-nanometer for the industry at large," said IBM spokesman Chris Andrews.

IBM's troubles have also drawn complaints from customers who pay Big Blue to build their chips. Apple Computer earlier this month said IBM failed to provide it enough chips for its Xserve G5 computer.

"Obviously, we were not happy with the delivery that we got," Timothy Cook, Apple's executive vice president of worldwide sales and operations, said on a conference call.

Manufacturing problems are also believed to have affected memory chipmakers. Analysts say memory companies have had troubles shrinking the design features on their chips to the 110-nanometer level, leading to shortages and price jumps that are rippling through the computing supply chain.

An analyst in Taiwan, Shawn Wang of KGI Securities, has noted that one memory chipmaker there, Nanya Technology, has been delayed in moving to high-volume production for its most advanced memory chips.

Intel, the world's largest chipmaker, has not been done in by problems in the factory. But even the world's biggest spender on chipmaking gear faced trouble around the beginning of the year when a design problem popped up in an unreleased notebook computer chip named Dothan.

The design flaw, which affected the chip's ability to be manufactured, pushed back by three months the launch date for the chip, and found its way into comments made by Intel's president during its fourth-quarter earnings conference call.

"We were disappointed that we did not begin shipping Dothan as planned," Intel President Paul Otellini told investors and analysts in January.

Adopting both smaller feature sizes and larger silicon wafers presents an especially large challenge to chipmakers, and occurs only about once a decade, said Intel spokesman Chuck Mulloy. Fabricating chips from larger wafers can yield significant cost benefits for manufacturers.

Intel, he said, managed that risk by mastering one advance at a time. "Any process generation shift is fraught with risk, unless you really focus on it," Mulloy said.

Attention to "yield"--the industry term for how successful a fab is in making defect-free microchips--has become a growing concern, if not an obsession, for chipmakers around the world.

"Yields get to be a bigger and bigger issue every time we go to a new technology node," said Mark FitzGerald, a semiconductor manufacturing equipment analyst at Banc of America Securities. "It gets to be more rocket science."

As much as manufacturing problems hurt the chipmakers, FitzGerald pointed to one company that stands to benefit from such problems. KLA-Tencor, the largest maker of equipment for inspecting silicon wafers for defects, remains one of the analyst's top stock picks.