Graphics Card Autopsy - MSI 980 Ti "Golden Edition"

Glad to have written here before! Sure thing, looking forward to hearing from you!
OK, first, what exact model of 980 Ti are you working on? I assume the reference board? This does matter, since each board design is obviously different, and each one is subject to its own unique modes of failure.

Anyway, the 3ish ohms you're measuring on the 6 pin connector are most likely the DC resistance of the GPU core. You can confirm this by checking for resistance between the 6 pin connector's positive pins (those furthest from the latch) and the positive terminal on one of the big capacitors in the core VRM. On a reference board, C247 is probably a good one to use - that's the uppermost. Positive is the side with the stripe.

What I suspect you'll find is that you have almost zero resistance between the connector and the positive side of that cap. What this means is that you have a short from the 12V input directly to the core VRM output. The most likely cause for this is that one of the high side FETs melted in one or more of the power stages. You'll need to figure out which power stages are connected to the 6 pin connector (I think it's the top four), and then determine which one of them is shorted. There are a few ways to go about this; the easiest is probably to inject 1V into the 12V power plane, and look for which one gets hot (using isopropanol or freeze spray as a bellwether if you need to). Failing that, you could just start removing them until the short goes away. The downside to this approach is that they're a pain in the ass to solder back on, and you will almost certainly damage them when you get them hot enough to remove, so you'll need at least four new ones, which may be difficult to find. Use a preheater to get the board up to 150C or so before you hit them with the hot air wand. That will help immensely.
 
First, thank you very much for your follow-up and enthusiasm, I really do appreciate it a lot!

OK, first, what exact model of 980 Ti are you working on? I assume the reference board? This does matter, since each board design is obviously different, and each one is subject to its own unique modes of failure.
Yes, the unit I'm working on is the Founders Edition (a.k.a the reference board).

Anyway, the 3ish ohms you're measuring on the 6 pin connector are most likely the DC resistance of the GPU core. You can confirm this by checking for resistance between the 6 pin connector's positive pins (those furthest from the latch) and the positive terminal on one of the big capacitors in the core VRM. On a reference board, C247 is probably a good one to use - that's the uppermost. Positive is the side with the stripe.
C247 positive to 6-pin positive = 0 Ohms
C247 negative to 6-pin positive = 2.7 Ohms
C247 positive to 6-pin negative = 2.7 Ohms
C247 negative to 6-pin negative = 0 Ohms
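The four readings above can be checked mechanically. The following is only a rough sketch of that reasoning: the 0.5 ohm cutoff for "effectively a dead short" and the function names are assumptions made for illustration, not anything from the thread.

```python
# Readings from the post above, in ohms, keyed by (cap terminal, connector pin).
readings = {
    ("C247+", "6pin+"): 0.0,
    ("C247-", "6pin+"): 2.7,
    ("C247+", "6pin-"): 2.7,
    ("C247-", "6pin-"): 0.0,
}

# Assumed cutoff below which two points are treated as the same electrical node.
SHORT_THRESHOLD_OHMS = 0.5


def rails_shorted(readings, threshold=SHORT_THRESHOLD_OHMS):
    """True if the 12V input pin and the core VRM output read as one node."""
    return readings[("C247+", "6pin+")] <= threshold


def grounds_common(readings, threshold=SHORT_THRESHOLD_OHMS):
    """True if the capacitor's negative side and connector ground are tied together."""
    return readings[("C247-", "6pin-")] <= threshold


print("12V input shorted to core rail:", rails_shorted(readings))
print("Grounds common:", grounds_common(readings))
```

With these readings both checks come back true: the grounds being common is normal, while 0 ohms between the 12V pin and the core rail matches the predicted short through a high side FET. The 2.7 ohm cross readings are then what was described earlier as the DC resistance of the GPU core.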

What I suspect you'll find is that you have almost zero resistance between the connector and the positive side of that cap. What this means is that you have a short from the 12V input directly to the core VRM output. The most likely cause for this is that one of the high side FETs melted in one or more of the power stages. You'll need to figure out which power stages are connected to the 6 pin connector (I think it's the top four), and then determine which one of them is shorted. There are a few ways to go about this; the easiest is probably to inject 1V into the 12V power plane, and look for which one gets hot (using isopropanol or freeze spray as a bellwether if you need to). Failing that, you could just start removing them until the short goes away. The downside to this approach is that they're a pain in the ass to solder back on, and you will almost certainly damage them when you get them hot enough to remove, so you'll need at least four new ones, which may be difficult to find. Use a preheater to get the board up to 150C or so before you hit them with the hot air wand. That will help immensely.
Since you told me in your reply not to put any power onto the board, I couldn't help but continue with some resistance measurements (without power) on the high side FETs that you mentioned in the meantime.
As an Austrian citizen, I found a German guy on YouTube who explains quite well which pin of the FETs needs to be checked for shorts (even in my native language :) ).

Based on the information gained, to me it seems the 3rd FET from the top of the card (or 6th from the bottom) is shorted (resistance = 2.7-3 Ohms):
fbdf93c6-6f33-4430-8bc2-a75518d6b8d7.jpg


All the other MOSFETs read 30ish kOhms on the same pin (I've just uploaded the one above it for comparison, but the others measure the same value):
426912dc-89dc-4dec-85bd-32cf39d1a4af.jpg
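The comparison described above (one FET reading a few ohms while its neighbours read tens of kilohms) can be sketched as a simple outlier check. The labels `FET1` to `FET8`, the total of eight FETs (inferred from "3rd from the top or 6th from the bottom"), and the 1% ratio are assumptions made for illustration.

```python
from statistics import median


def find_suspects(readings_ohms, ratio=0.01):
    """Return the FETs whose reading is far below the typical (median) reading."""
    typical = median(readings_ohms.values())
    return [name for name, r in readings_ohms.items() if r < typical * ratio]


# Hypothetical labels, numbered from the top of the card; values from the post.
readings = {f"FET{i}": 30_000.0 for i in range(1, 9)}
readings["FET3"] = 2.7  # the reading reported as 2.7-3 ohms

print(find_suspects(readings))
```

Using the median as the baseline means a single shorted part cannot drag the "typical" value down. Keep in mind that parts sharing a shorted rail can all read low in circuit, so a low reading points at a rail or a group of parts as much as at one FET.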


Next I'd inject 1V into the 6-pin power plane (or should I use the 8-pin power plane? I don't think that makes sense, since the issue seems to be located on the 6-pin plane) and apply some isopropyl alcohol as suggested to confirm the measurements. If the fluid on the suspected FET evaporates -> remove the FET and recheck for shorts.
 
You’re on the right track. Let us know what the voltage injection test shows.
 
Great! Thanks for your feedback!

Unfortunately I'll have to wait until tomorrow because I've just found out my old lab power supply went missing. I'll definitely let you guys know how it went :)

If I may just ask one more question: how many Amps would you recommend injecting with 1V?
 
Start low - maybe 5-10A, and then work your way up to 20 or whatever the limits of your power supply are. If you're only supplying 1V, the GPU and its supporting circuitry should easily be able to withstand hundreds of amps without damage, but if you need more than 20 or so, and you're still not finding anything that gets hot enough to detect, you're wasting your time anyway.

Keep in mind that a short through the GPU itself, or damage to the board that caused a short between the PCB layers are possibilities, and without pretty sophisticated equipment, you won't be able to detect these things, so you have to diagnose them by elimination.
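As a rough illustration of the ramp suggested above, the sketch below lists current steps and the most power the supply could deliver at each step (P = V x I). The 5A step size, the function name `ramp_plan`, and the optional supply limit are assumptions; the thread only gives the 5-10A starting point and the 20A ceiling.

```python
def ramp_plan(voltage_v=1.0, start_a=5.0, stop_a=20.0, step_a=5.0, supply_limit_a=None):
    """List (current in A, upper-bound power in W) steps for a voltage injection test.

    Power is voltage_v * current, which is an upper bound: a supply that goes
    into current limiting will drop its output below the set voltage.
    """
    steps = []
    current = start_a
    while current <= stop_a:
        if supply_limit_a is not None and current > supply_limit_a:
            break  # the bench supply cannot deliver this step
        steps.append((current, voltage_v * current))
        current += step_a
    return steps


for amps, watts in ramp_plan():
    print(f"{amps:4.1f} A -> at most {watts:4.1f} W into the board")
```

Calling `ramp_plan(supply_limit_a=5.0)` returns only the first step, which shows how little heat a 5A supply can put into the board at 1V.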
 
OK, I actually wouldn't have expected 5-10A. I would have started far more conservatively if I hadn't asked, haha

Thanks for clarifying though, will get back with the results as soon as tomorrow night (afternoon US time) :)
 
I am currently working on voltage injection. Should I try with more than 1V?
 
Assuming you're troubleshooting a <5 ohm short to ground on the 12V input rails, you should not exceed the operating voltage of the GPU (so, maybe 1.2V?).

The most common cause of this type of problem is a failure of the high side FETs, which causes the 12V input rail to get connected directly to the GPU core. The GPU usually survives this because the power supply's overcurrent protection shuts it off before it has time to do any damage, but that won't be the case with your voltage injection test.
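The voltage rule above can be written down as a small guard. This is a sketch only: the 1.2V ceiling is the rough guess from this post, not a datasheet value, and the function name and parameters are made up for illustration.

```python
# Rough guess from the thread for the GPU core's operating voltage, not a datasheet value.
GPU_CORE_MAX_V = 1.2


def injection_voltage_ok(set_voltage_v, rail_shorted_to_core,
                         core_max_v=GPU_CORE_MAX_V, rail_nominal_v=12.0):
    """Check a bench-supply setting before injecting into the 12V input rail.

    If the 12V rail is shorted through to the GPU core, whatever is injected
    lands on the core, so the core's operating voltage is the ceiling.
    Otherwise the rail's own nominal voltage is the ceiling.
    """
    limit = core_max_v if rail_shorted_to_core else rail_nominal_v
    return set_voltage_v <= limit


print(injection_voltage_ok(1.0, rail_shorted_to_core=True))
print(injection_voltage_ok(3.0, rail_shorted_to_core=True))
```

The point of the guard is that the PC power supply's overcurrent protection is not in the loop during a bench test, so the voltage setting is the only thing protecting the core.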
 
Yeah, I am trying to find the right MOSFET, but I am not noticing any difference in the alcohol evaporation rate. That's why I am asking.
 
Add more amps. If that doesn't work, just start removing them until you get the short cleared. If you had one fail, you really ought to replace all of them anyway, as any attempt to replace one is likely to damage the ones next to it.
 
My power supply doesn't allow the current to be adjusted, only the voltage. I will start removing the MOSFETs once I get my new hot air station.
 
Short update as promised:

While continuing to take measurements, I just realized that GND from the 8-pin header doesn't seem to have a connection to the general board GND (e.g. the DisplayPort connector). Am I overlooking something here? The suspected "good" AON7403 is still removed from the board.

Anyway, I got myself a new lab PSU today. Unfortunately, the biggest model my local hardware store had in stock was a 1-30V, 0-5A unit, which turned out not to be sufficient to evaporate the alcohol as I had expected.
The whole area around the high side FETs did go up to hand temperature, while the rest of the board (the GPU itself and the area near the display connectors) remained at room temperature. No sign of any specific component being dead, at least nothing visible to the naked eye *sigh*

So I guess my only two options left are either getting a can of freeze spray (to see any temperature difference more easily) or removing the FETs one by one until the short is cleared (if it's actually the FETs that are faulty; maybe it's a blown capacitor instead of, or together with, the FETs).
 
I have also ordered a can of BW-100 Freeze Spray to help with the thermal test.
 
I tried desoldering the suspected FET but gave up at some point since it didn't want to come off easily (as expected, TBF), and I've been quite busy the last week anyway.

For my next try, I will put the card in the oven first for 10-15 minutes at around 80°C, and use my soldering station in addition to the hot air rework station. I guess that combination should do the trick.

Also, I've found someone selling his broken Asus GTX 980 Ti for cheap, which could serve as a donor board or vice versa.

Will get back with the results :)
 
Well any updates?
I just read this whole thing and was about to say the same. Had two of those golden MSI 980tis for a while.. they ran TW3 in 4K nicely.

Are you asking me?

As I recall, there were several cards discussed in this thread:

My original one:
I still have it, but during the pandemic, I kind of burnt out on doing computer stuff, since it meant I had to spend the weekend sitting in the same room I sit in all week. As far as I know, the GPU on my original card should still be good, so it might be possible to swap that GPU (a well-binned 980 Ti) onto the EVGA Classified 980 Ti that a fellow [H] member gave to me. I think the Classified may have a dead GPU.

Solan's card:
I got this one working by just removing the bad dual N-FETs, but I think I kept damaging the new ones when I'd solder them. I gave it back to Solan with two of the eight phases disabled, and it worked for a few months and failed again. He's since moved far away from me, so I can't just go get it from him to have another look. Probably another high side failure.

Various other folks have found the thread, and maybe a couple were able to get their boards "working" again by doing the same thing I did, but I don't think anyone has reported a fully successful repair. I have better equipment now, so I wouldn't mind taking another crack at this if a saveable card happened to make it onto my workbench, but that hasn't happened yet, and as I mentioned before, I get burned out sitting in the same room 12 hours a day during the week, so finding one hasn't been a high priority.
 
Just to add a data point, my MSI 980Ti Golden just failed in a similar way.

Pretty catastrophic with trace delamination.

20230501_022453.jpg

Gold power supply and I've never touched overclocking. Loaded a room, took a step in Prey (2017) and PC shut down.
 