Big Problem - CPU Failed under OC

Well...something is seriously wrong.

I got my Opty 170 and Mushkins about 3 weeks ago. Worked through the regular overclocking steps to get to 3GHz at 1.48V (300*10) with temps under load ~42C. It was stable at 3GHz under dual Prime for over 12 hours before I canceled it. I ran there for several weeks with no problems. Then, to push my mem clock up some, I switched to 3GHz@334*9, and again it was dual Prime stable.
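
For anyone checking the numbers, here is a minimal sketch of the arithmetic behind those two settings. The function name is made up for illustration, and it only assumes the usual rule that core clock is the reference clock times the multiplier:

```python
def core_clock_mhz(reference_mhz, multiplier):
    """Core clock = reference ("FSB"/HTT) clock in MHz times the CPU multiplier."""
    return reference_mhz * multiplier

first_setting = core_clock_mhz(300, 10)   # the 300*10 setting
second_setting = core_clock_mhz(334, 9)   # the 334*9 setting

# How much harder the reference clock is pushed in the second setting
reference_increase_pct = (334 - 300) / 300 * 100

print(first_setting, second_setting, round(reference_increase_pct, 1))
```

Both settings land at essentially the same core speed (3000 vs 3006 MHz), so the real change in the second one is the roughly 11% higher reference clock, which stresses the board and memory side rather than the cores.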

Then this past Sunday, I decided to try to "improve" my temps and I lapped my IHS. I did the research and followed several guides etc. I used "wet" sandpaper and did it. No problem that I noticed. Put everything back together and no real improvement to temps. Oh well.

Now, two days later (today), my system started becoming unstable with random shutdowns, reboots etc. and it just got worse. Now it won't stay up more than 2-3 minutes before it crashes. Sometimes it stays up long enough for me to run Prime in an attempt to separate a mem vs. CPU problem. It fails almost immediately using small FFT (almost CPU only).

I cleared the CMOS and went back to the default/stock everything and it just kept getting worse until it crashed within a few minutes of loading windows. Now it is not stable even at default/stock settings.

I tried underclocking the mem, but no luck. I am afraid something has happened to my CPU. I have done a lot of reading and research over the past several weeks and I have never seen any reports of CPUs failing in a short period of time except when the IHS was removed (which I did not do).

If I got water under the IHS during lapping or bent pins, I would have expected problems right away, not days later. Also, at 1.48V I am not really pushing much voltage through it. Temps have remained good, so there was no pump failure or overtemp issue.

Of course this had to happen AFTER I lapped it and not before, because now I can't RMA the CPU if it failed for some reason. Anyway, any experiences with a CPU failing (becoming unstable even at stock) in a short time? Or any ideas or experiences regarding what could have suddenly caused my 170 to fail even at stock settings?
 
Could've been a problem with condensation. You might have gotten water under the IHS and it was in a spot where it did no harm. Then the cpu heated up, the water evaporated, then condensed and caused a short.

I'd say you are in the market for a new CPU :(
 
My guess is it is something other than CPU. Try running a bare bones setup with only CPU, HDD, GPU, and one stick of RAM. You might also want to remove your HSF, pull the CPU, clean it up (look in the socket for debris) and reapply your thermal goop.
Can't prove it but I think CPUs either work or they don't. You might also try running Memtest on those Mushkins.
Good Luck
 
You might have gotten water under the IHS and it was in a spot where it did no harm.

I was going to say.... WTF to that statement, but remembered (cause I just removed the IHS on my 170 this past weekend), that there is a section where there is no seal between the IHS and cpu's pcb. This would be an area where water could enter.

It does sound like you need a new chip.

Personally, I feel safer removing the IHS than trying to "lap" it.

Sorry for your loss... sounded like a good chip. May it rest in peace.
 
I need to do some troubleshooting, but I think it is likely that it is a problem with water getting under the IHS. I did not realize there was a portion of it not sealed. I had read about others lapping wet. If I had known there was a portion of the IHS not sealed, I would not have risked it and would have done it dry. Live and learn. I am not sure the CPU is fried as in a dead short; it almost seems like a leakage issue maybe. I might be able to clean it up, but I would have to pop the top.

My biggest fear is that the wet lapping may have screwed me. If water got under the IHS it was not just water, but water with copper particles. Copper has high mobility under temp, and even if the water dries, residual copper is likely to cause a problem with leakage between pins etc. The only way to fix that would be to pop the top and rinse it/clean it off, but of course that opens another set of problems with cracking the die with the waterblock etc.

It was a great overclocker I hate to lose it so soon :( If I can convince myself it is beyond recovery otherwise, I will pop the top and see what I can do.
 
Personally, I feel safer removing the IHS than trying to "lap" it.


Most IHSs are soldered on to the chip now, so I wouldn't. Lapping is actually extremely easy, although I don't know why you would wet the sandpaper when using it on electronics.
 
The only way to fix that would be to pop the top and rinse it/clean it off, but of course that opens another set of problems with cracking the die with the waterblock etc.

It was a great overclocker I hate to lose it so soon :( If I can convince myself it is beyond recovery otherwise, I will pop the top and see what I can do.

Well... since you can't RMA it because of the obvious signs of IHS lapping, pop that sucker off! Rinse the CPU pcb (and core and surrounding circuitry) with alcohol and dry it out. As far as placing a naked core under a watercooling block... don't worry about it too much. Most retention mechanisms for watercooling blocks use bolts, not clips. Just allow the block to touch the core surface, and evenly (and slowly) tighten down the screws. Or go out and buy yourself some 6-32 2" bolts, and some brass knurled 6-32 nuts... much safer than reusing the hardware meant for applying your block to an IHS covered cpu (plus you are using your fingers to tighten down the knurled nuts rather than a screwdriver on the bolts themselves).

I hope you can resurrect that chip...
 
like has been mentioned already...pop the top on that sucker and survey the damage.
take pics for us! clean the delidded chip surface off (isopropyl alcohol) and let it dry thoroughly before trying to resurrect it again. as far as cracking the core, it's hard to do unless you're just a bumbling idiot...
i've delidded 3 DC opterons and have yet to encounter a problem. i've mounted and remounted all 3 of them SEVERAL (more than 20?) times without issues. i did cover the transistors on each processor with electrical tape though. it helps with cleanup of TIM and keeps contaminants off the other smaller chips.
 
Well, I popped the top, and while I can't say it was easy, it was not too hard. I managed not to gouge the PCB or nick any chips :) I cleaned the CPU and chips off well with 91% IPA. To help avoid crushing it, I took the rubber pads off a dead Athlon XP I had and glued those in the corners of the PCB. It works well.

I installed it and temps were better (-2C at idle and -5C at load). While that was not my main goal, it was a nice side benefit.

When I fired everything up (at stock), it actually booted into Windows and appeared to run. I overclocked it some, and after a short while, although crashes were less frequent than last night, I was still getting BSODs and a variety of stop errors. After some research, they all appeared to be memory related. I started running Memtest86 and sure enough my memory kept failing fairly quickly (pass 1 or 2). I am somewhat surprised: those DDR500 Mushkins are only 3wk old and were running great, with tight timings at up to 275+ MHz on only 2.5V. I was testing them using BIOS defaults (1:1, so they were running at DDR400 and SPD timings).

I thought it could be my memory controller, so I put in my previous set of DDR400 (2x512 Corsair TWINX) using the same BIOS settings. So far they have made it through 12 passes of test #5.

It is looking like my new mem went belly up. After all the good things I heard about Mushkins, I am surprised they failed after only 3wks. Could my overclocking them (I only used 2.5V) have killed them or is it likely they just died young so to speak?
 
Well I spoke too soon :( My old memory failed memtest also, it just took a little longer. Then I retested it and it failed in pass 1. It has to be my on chip memory controller. Could it be the motherboard? Unfortunately I don't have another MB to test with so I have to decide if I order a new CPU, CPU+ MB, or MB alone.
 
It's been my experience with Mushkin! (been using Mushkin! in my own systems since the SDRAM days) that they like being fed the maximum specified volts. I haven't had any die on me yet. I don't believe your motherboard is at fault... I think you're right in saying that the memory controller is bad.... which means.... a new chip. Since you tested them (Mushkins) at 1:1 at default DDR400 speeds and they failed.... it is possible that they have crapped out, but since your other pair also crapped out, I'd blame the memory controller.

pain.angel said:
Most IHSs are soldered on to the chip now, so I wouldn't. Lapping is actually extremely easy, although I don't know why you would wet the sandpaper when using it on electronics.

Most INTEL IHSs are soldered on these days... AMD has not gone that route (thankfully) with the A64 line... not sure about AM2, but I didn't waste my money on upgrading to that platform when the 939 still serves my needs. "WET" sanding is the preferred method for lapping heatsinks and IHSs. It keeps the paper attached to the flat surface and keeps it from sliding around. Wet sanding also eliminates the grit (sandpaper residue and whatever material you are wearing down) from becoming airborne and getting everywhere. Plus the fact that you're not lapping components inside your computer with the power still plugged in and powered on. :D
 
Are you still running 2.5V on your memory?? You might want to try increasing the Ram voltage to 2.6 or 2.7. 2.5V sounds low to me...especially if your mobo happens to undervolt a little AND if you are overclocking.
Also, aren't DFI boards generally hard to set up properly? They run like champs when everything is configured right but it usually takes some tweaking to find the sweet spot. Changing Ram and CPUs might prompt a re-visit to the BIOS settings.
I've got to believe that if the system will boot and run, even for a short while, that the CPU is ok.
 
Oops, sorry, I did say 2.5V but that should have been 2.6V. Default Vdimm was 2.6V. I tried 2.7V but saw no improvement.

I think I have to bite the bullet and get a new MB and CPU and see which fixes it. Everything was running fine until I did two things:

1) Wet lapped the CPU
2) Started running at a higher FSB. I was running stable (Prime95) for ~3wks at 300*10. This weekend I wanted to run at 334*9. Initially I had problems running anything over 300 FSB (system would not boot). I flashed to a newer BIOS and then it ran at 334.

Unfortunately I did both of these at the same time and 2 days later my system crapped out.
 
Most INTEL IHSs are soldered on these days... AMD has not gone that route (thankfully) with the A64 line... not sure about AM2, but I didn't waste my money on upgrading to that platform when the 939 still serves my needs. "WET" sanding is the preferred method for lapping heatsinks and IHSs. It keeps the paper attached to the flat surface and keeps it from sliding around. Wet sanding also eliminates the grit (sandpaper residue and whatever material you are wearing down) from becoming airborne and getting everywhere. Plus the fact that you're not lapping components inside your computer with the power still plugged in and powered on. :D

I'm pretty sure AMD went down that road too when they moved to AM2. I'm not really familiar with AMD though, as I went with Intel's LGA775 for my last pc. I used dry/wet paper and just left it dry without any troubles, just made sure to clean up everything well with some iso 91% before I put it back in.
 
It has to be memory related. At stock everything, I can run Prime95 small FFT for a little while, but as soon as I try to run a modified blend (each using 40% of my memory) both fail in less than a minute. It can't be my RAM modules, because not only does my new set fail (which has been OCed), my old set that ran in this board for a year with no OC and no problems fails too.

Unfortunately I can't tell if it is the memory controller or the MB.
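
To spell out the reasoning, here is a rough sketch of the elimination as a set intersection. The component labels are just my own names for the parts in this build, and it assumes a single faulty part and ignores anything not swapped or mentioned in this thread:

```python
# Parts involved in each failing memory test run so far (labels are illustrative).
failing_runs = {
    "run with Mushkin kit": {"CPU memory controller", "motherboard", "Mushkin DDR500 kit"},
    "run with Corsair kit": {"CPU memory controller", "motherboard", "Corsair DDR400 kit"},
}

# Assuming one bad part, it has to be present in every failing run.
suspects = set.intersection(*failing_runs.values())

print(sorted(suspects))
```

Both RAM kits drop out because neither is present in every failing run, which leaves the memory controller and the motherboard. Telling those two apart takes a run where only one of them is swapped.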
 
Give the mem 2.9 volts at least.
 
But why? I know the Mushkins will run 2.6-2.9V, but they ran fine for a while at 2.6V. Even if they were marginal there and they have shifted/burnt in, I should not have to go all the way up to 2.9V to run DDR500 mem at DDR400 rate and std timings. If I was having an issue at a high overclock, sure, I would understand it then. But neither mem pair will run at 200MHz with SPD timings at either 2.6 or 2.7V.

As far as my old modules go they ran at 2.6V for a year at least on this MB. If I need to increase the vdimm to 2.9V then something is wrong.
 