Graphics Card Autopsy - MSI 980 Ti "Golden Edition"

Kumbassa

n00b
Joined
Jul 2, 2020
Messages
7
Hi RazorWind. I've got an EVGA GTX 980 Ti with the same issue reported here: as soon as I plug in the 8-pin power connector, it keeps my system from powering on. I've troubleshot it to the 12V pins being shorted, likely a MOSFET issue. I've taken the card apart and there is no visible damage anywhere. If there was, I'd try to replace that component. However, with no visible damage, my skills are not good enough to troubleshoot further. Would you want me to send this one to you to take a look at as well? Next step is the trash. It's been sitting here for a few months now and I'd like to fix it to put in my son's computer. If you can't fix it, oh well. I've read your terms and am fine with them. Given this one has no visible indicators of damage, I figure it may be a good one to show us how you troubleshoot it to figure out where the actual problem is.
Sounds good. No rush on it at all. Like I said, it’s been sitting here for months. I keep having this nagging feeling that I shouldn’t throw it away because it’s something minor, but I just can’t figure out what it is.

I’ll dig up your email from this thread and send it to you. Thanks for all you’ve posted here. It’s very informative!
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
Ok, an update.

I got Solan's card "working" again by removing the "dead" FET package. While I had the card running, I checked the duty cycle on each of the coils, and discovered that we actually have two dead phases. One is obviously the phase that's missing its FETs, and the other is the one it shares control ICs (the actual phase drivers, and the doubler) with. At this point, I'm not sure if the other phase is dead thanks to current balancing on the phase doubler or due to an actual failure.
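The duty-cycle check described above can be sketched in a few lines. This is only an illustration: the phase names and readings below are hypothetical, not measurements from Solan's card, and the 1% cutoff is an assumed threshold.

```python
# Hypothetical duty-cycle readings per phase: the fraction of each switching
# period that the phase's switch node spends at 12 V. Values are made up.
readings = {
    "phase_1": 0.083,
    "phase_2": 0.085,
    "phase_3": 0.082,
    "phase_4": 0.084,
    "phase_5": 0.000,  # FET package removed
    "phase_6": 0.000,  # shares its driver and doubler with phase_5
    "phase_7": 0.083,
    "phase_8": 0.086,
}


def find_dead_phases(duty_by_phase: dict[str, float], min_duty: float = 0.01) -> list[str]:
    """Return the phases whose duty cycle is below min_duty (assumed cutoff)."""
    return [name for name, duty in duty_by_phase.items() if duty < min_duty]


if __name__ == "__main__":
    print(find_dead_phases(readings))
```

A phase that reads zero this way is not switching, but the reading alone does not say whether its own parts failed or whether the shared controller has simply stopped driving it.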

So, at this point, I think we should ask Solan what he wants to do. I can put the card back together and return it to him with the remaining six phases working, or I can attempt to replace the dead DrMOS one more time. Every time I do this, I risk further damage to the card, but I'd be lying if I claimed I actually fixed it just by clearing the short. It'd probably work OK for a while longer, though.

So, Solan, if you'd prefer to discuss this privately, feel free to reach out, or just respond here.
 

Solan

n00b
Joined
May 30, 2020
Messages
11
Razorwind, do you have a recommendation or intuitive guess here? Would replacing the known bad FET along with the next FET and the phase doubler IC all in one move be a kind of gambit fix?

Otherwise, my gut inclination would be if the reference design 980 ti is 6+2 phases then I could probably get away with losing 2 phases from this 8+2 phase card since I'd still keep it at stock clocks/power/temp. I could even clock it down closer to 1000mhz reference base clock if it came to it.
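As a rough way to put numbers on that reasoning, the sketch below splits the core current evenly across the active phases. The 200 W core power and 1.0 V core voltage are assumed round figures, not measurements, and the even split assumes ideal current balancing, which a damaged VRM may not provide.

```python
def per_phase_current(core_power_w: float, core_voltage_v: float, active_phases: int) -> float:
    """Current each phase carries if the load is shared perfectly evenly."""
    if active_phases <= 0:
        raise ValueError("active_phases must be positive")
    if core_voltage_v <= 0:
        raise ValueError("core_voltage_v must be positive")
    total_current_a = core_power_w / core_voltage_v
    return total_current_a / active_phases


if __name__ == "__main__":
    # Assumed figures for illustration only.
    for phases in (8, 6):
        amps = per_phase_current(200.0, 1.0, phases)
        print(f"{phases} phases: {amps:.1f} A per phase")
```

With these assumed figures, dropping from eight phases to six raises the load on each remaining phase by a third.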

Thanks again for all your time checking and working on the patient card!
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
I'd probably go for it. I suspect that, if it fails again, it will fail the same way, and I can just remove the dead components, but that's not guaranteed, so I wanted to at least offer you the opportunity to get the card back "working."

I haven't tried intense gaming, but you could totally use this to surf the web and do typical work stuff, if that's all you need it for. I can test it as-is, if you want.

One thing I noticed is that the BIOS on this card doesn't spin the fans up at all until the GPU hits 60 degrees celsius. I suspect this has something to do with why this particular design seems to fail so often. The VRM seems to be somewhat fragile, and it's obviously not being cooled very well. I think I'd probably recommend a custom fan profile to owners of these cards that keeps the fans spinning even at idle.
 

Solan

n00b
Joined
May 30, 2020
Messages
11
I'd probably go for it. I suspect that, if it fails again, it will fail the same way, and I can just remove the dead components, but that's not guaranteed, so I wanted to at least offer you the opportunity to get the card back "working."

Fair enough, let's go "big" then!

I haven't tried intense gaming, but you could totally use this to surf the web and do typical work stuff, if that's all you need it for. I can test it as-is, if you want.
I was curious about whether intense testing would just burn the next components down the line, but it seems more productive to try replacing any suspect parts first. I'm driving displays with an old GTX 660 I dug out (though the GTX 560 is still kicking too). I'll ride it out for another generation or two if this 980 Ti doesn't make it. I'd say they don't make them like they used to, but I think I've read somewhere that manufacturers are either underspec'ing or pushing components too close to the limit on the highest end cards each generation. That would explain the failures that seem to be trending upward (at least anecdotally on enthusiast/owner forums), whereas lots of lower to mid range cards seem to keep chugging along.

One thing I noticed is that the BIOS on this card doesn't spin the fans up at all until the GPU hits 60 degrees celsius. I suspect this has something to do with why this particular design seems to fail so often. The VRM seems to be somewhat fragile, and it's obviously not being cooled very well. I think I'd probably recommend a custom fan profile to owners of these cards that keeps the fans spinning even at idle.

I think that was supposed to be a selling point on at least a few Maxwell cards. I did run it with a 30% instead of 0% minimum in a custom fan curve for the past 2 years, but perhaps it was left on the default profile for the first couple of years after my sister originally purchased it (I bought it from her in 2018 when she upgraded to a Pascal card). My wife has an EVGA SSC ACX 970 in her system that also defaults to "fanless" operation, so we overrode that with a profile too. They're so quiet at low speeds I'm surprised any companies wanted to pitch "fanless".
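A custom curve with a non-zero floor, like the 30% minimum mentioned above, amounts to a piecewise-linear lookup. The sketch below is illustrative only: the curve points are hypothetical, and it does not talk to any real fan-control tool.

```python
from bisect import bisect_right

# Hypothetical (temperature in C, fan duty in %) points. The first point sets
# the floor, so the fans never stop even at idle.
CURVE = [(30, 30), (50, 40), (70, 70), (85, 100)]


def fan_percent(temp_c: float, curve=CURVE) -> float:
    """Linearly interpolate fan duty for temp_c, clamped to the curve's ends."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])
    if temp_c >= temps[-1]:
        return float(curve[-1][1])
    i = bisect_right(temps, temp_c)
    (t0, p0), (t1, p1) = curve[i - 1], curve[i]
    return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for temp in (25, 45, 60, 90):
        print(f"{temp} C -> {fan_percent(temp):.0f}%")
```

The point of the floor is that the VRM gets some airflow even when the GPU core itself is cool.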
 

kalston

[H]ard|Gawd
Joined
Mar 10, 2011
Messages
1,269
Oh, how did I miss this thread until now!

I've had a 980 ti burn in my hands before (looks exactly like your first post and also prevented the PC from booting) but it was successfully RMA'd*. The replacement failed too (but without the smoke and smell) but it was in someone else's hands by then and I have no idea what he's done with it (warranty expired). I actually know of many failed 980 tis over the last few years (and not talking of heavily overclocked cards), maybe nvidia did not engineer that board very well? But I don't know the actual failure rate mind you, just a gut feeling.

Still, the same does not appear to be happening with Pascal generation. Even just the reference 1080 ti seems to be built like a rock (buildzoid said so after all and he knows his stuff).

*the one very strange thing about that failure though is that it also killed the (disabled) onboard audio of my motherboard AND killed my USB powered soundcard (both under warranty thankfully). Nothing else broke and all of the other parts from that build are still in use to this day (although in different machines).
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
With great respect for Buildzoid, he's actually wrong fairly often. He seems really credible, but a lot of what he says is just him speculating, and I suspect if you called him out on this, he'd freely admit that. I've definitely seen a few of his videos where I subsequently got one of those cards in my hands and discovered that he guessed wrong about what certain components do. The 290X and 295X2 are the ones that come to mind first.

The card in this thread is of a different design from the nvidia reference design (which is shared with the 780 Ti and contemporary Titans). The reference design appears to be a bit less prone to failure than these MSI cards are, in part, I suspect, because nvidia used better components than MSI did, and also didn't insist on totally stopping the fans. There are likely some other contributing factors when they actually fail, such as weak power supplies causing the cards to run excessively hot.

The 10 series cards are definitely better designed than earlier generations, with one of the drawbacks being that nvidia appears to have gone ham with the proprietary parts, similar to the way Apple has done this with some of their PCB components. I have a 20 series card that uses these oddball inductors where I had to figure out which factory over in China makes them and source a whole spool of them direct from the factory. I should make a thread about that card some time. It's an interesting case.
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
Ok, so I swapped another mosfet onto the card, along with the doubler and dual phase controller chip, and then tested it. The same thing happened, where the affected phase killed its FET package almost immediately, which triggered my power supply's short detection. So, I removed the dead FET package again, and tested the card, and it "works."

Also, I figured out that I can install the heatsink sideways, and still have access to all the VRM components while the card is running. Handy for testing.
hsf_sideways.jpg

So, at this point, I think this is about the best I can do with this card. I'll clean the flux off and reassemble it tomorrow, but it really does seem to work as it is.
 

Solan

n00b
Joined
May 30, 2020
Messages
11
So, I removed the dead FET package again, and tested the card, and it "works."
Well “works” sure beats how it was 😀
Also, I figured out that I can install the heatsink sideways, and still have access to all the VRM components while the card is running. Handy for testing.
Ah, good use of the square shaped mount holes. I think I’ve seen a repair video use a Morpheus II vertically the same way. But do the chokes get hot this way? If I recall, MSI used a thermal pad strip to make contact from the heatsink to the chokes on these cards.
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
But do the chokes get hot this way? If I recall, MSI used a thermal pad strip to make contact from the heatsink to the chokes on these cards.
The chokes don't usually require cooling at all. When a thermal pad is used, it's usually there mainly to support the heatsink, or keep it from rattling.

I got the card reassembled, plugged it back in, and gave it a test this afternoon. It seems to work, and even boosts up to 1530 or so, which I gather is this card's extra spicy boost clock.

I think the power measurement is a bit off, since it claims it's only pulling about 80% of its power envelope, but gives the perfcap reason as "power." Still, it survived a few minutes' worth of Heaven before I shut it down, and maxed out around 70C on the core with a very aggressive fan curve. I'll do a little more testing this evening, but it's basically ready to go back to you whenever you want it. I wish I could claim I fixed it 100%, but I have a feeling this may ultimately be a PCB issue, and not something I can fix by replacing components.
 

Solan

n00b
Joined
May 30, 2020
Messages
11
I think the power measurement is a bit off, since it claims it's only pulling about 80% of its power envelope, but gives the perfcap reason as "power." Still, it survived a few minutes' worth of Heaven before I shut it down, and maxed out around 70C on the core with a very aggressive fan curve. I'll do a little more testing this evening, but it's basically ready to go back to you whenever you want it. I wish I could claim I fixed it 100%, but I have a feeling this may ultimately be a PCB issue, and not something I can fix by replacing components.

Interesting. I wonder if it scales reporting, lowering cap to 80% maybe reports 64%? I can test it out if you’re tired of it 😀
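The arithmetic behind that guess is simple enough to write down. This sketch only restates the hypothesis: it assumes the reported figure is the true fraction of the limit multiplied by a fixed scale, with the scale inferred from the single observation of about 80% while power-limited at the stock cap.

```python
def predicted_reading(cap_fraction: float, scale: float) -> float:
    """Reading expected at the power limit if reporting is scaled by `scale`."""
    return cap_fraction * scale


if __name__ == "__main__":
    # Observation from the thread: about 80% reported while power-limited
    # at the stock (100%) cap, so the assumed scale factor is 0.80.
    scale = 0.80 / 1.00
    # Hypothesis: with the cap lowered to 80%, a power-limited card would
    # report about 0.80 * 0.80 of its envelope.
    print(round(predicted_reading(0.80, scale) * 100))  # prints 64
```

If the card instead reported close to 80% again with the cap lowered, that would argue against a simple scaling error.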

In any case, I really appreciate that you got it running again! Let me know when/where is convenient for you to meet.
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
Alright, did a little more testing. I wouldn't go overclocking it, but it seems fine otherwise. It survived a run of Firestrike, and ten minutes of Tomb Raider 2013 without issue. It was at that point that my ludicrously expensive Xbox Elite controller stopped working, so I gave up and shut it off.

The Firestrike score it gets is dragged down by my test bench's slow CPU, but it beat the R9 390 I tested last time by about 1000 points.
https://www.3dmark.com/3dm/48664766

I'll reach out to you later this week about where we can meet to hand it back to you.
 
Joined
Feb 12, 2021
Messages
2
Hey RazorWind, I was wondering if you could help me out. So here are some images for an EVGA 980Ti SC model. On the back you can see that Q502 is blown up (others shifted due to my heat gun). I attempted to remove the blown up MOSFET and ended up removing a bit of PCB with it. So now the two S and S leads will not make contact. How do I go about fixing the connection if at all?
20210211_211914.jpg 20210212_173910.jpg
I was also wondering if you could help me understand what happened. Here is the story of how I got here. I was given the GPU and told that the PC would not start when the 8-pin PCIe power was connected to it. I determined that there was a short in the V13 power phase, shown in the image below. I first removed the phase at V13, and with it removed the GPU would at least boot and stay happy in Windows until I put a load on it. Once a load was applied, it would shut off the PC and reboot. V13 was replaced and the resistance was verified against the others and all seemed good; however, the shutdown at load still happened.
20210212_174026.jpg
The GPU still shutting down made me think I had gotten as much information as I could from resistance checks and needed to test the voltages. I slapped on an AIO and started probing points on the GPU while it was powered. I found the 12V rail properly supplying voltage right before the power phases. Testing the voltages at the inductors, the top one, near V98, read 0.953 V and the one below it read 1.012 V, while the other 4 all read the voltage set in MSI Afterburner, 0.897 V.

At this point, I decided to remove V98, as it seemed the most likely to be faulty, since in my head the circuit most likely split from each phase to the two inductors. Basically, V98 and V8 both distribute power to the topmost inductor. I removed V98 as in the image below and turned the PC on. The two inductors closest to V98 now read 12 V, which I thought wasn't great, but the 330 resistor on the other side still read 0.897 V, so I thought I was safe. I went from the BIOS into Windows and for about 30 seconds it seemed fine, then the screen went dark. I rebooted the PC and poof, that first image of Q502 being blown happened.

So why did removing the phase at V98 blow up Q502? How are the two related? I am assuming it has something to do with the 12V rail power delivery, but why did removing V98 finally kill it, and not when I had V13 completely off? It just doesn't make much sense since, like you, I thought I could at least run without a phase before things went completely catastrophic. Here was me with V13 soldered on, but no luck at load.

20210212_174030.jpg
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
Ok, first things first, if you're working on this card with an actual heat gun, stop. You need a hot air rework station. The reference 980 Ti is notoriously fragile - you do not want to be applying heat to it indiscriminately. Using a heat gun for this is like trying to carve delicate bird sculptures with a chainsaw.

Next, when you're taking voltage measurements, it matters which side of the inductor you take them on. The function of the inductors is to turn the chopped up alternating 12V/0V output from the power stages into a smooth ~1.0V, so if you're taking your measurements on the side near the power stages, you need to be using AC mode. If you're taking your measurements on the output side, you should be using DC mode.
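To put rough numbers on the 12V-to-1.0V conversion described above, here is a minimal sketch of the ideal buck-converter relationship, where the output is the input multiplied by the duty cycle. It ignores losses, dead time, and ripple, so a real card will deviate from it; the function names are made up for this example.

```python
def buck_average_voltage(v_in: float, duty: float) -> float:
    """Average voltage of an ideal switch node that sits at v_in for a
    fraction `duty` of each cycle and at 0 V for the rest."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return v_in * duty


def duty_for_output(v_in: float, v_out: float) -> float:
    """Duty cycle an ideal buck stage needs to produce v_out from v_in."""
    if v_in <= 0 or not 0.0 <= v_out <= v_in:
        raise ValueError("need 0 <= v_out <= v_in and v_in > 0")
    return v_out / v_in


if __name__ == "__main__":
    duty = duty_for_output(12.0, 1.0)
    print(f"duty cycle for 12 V -> 1.0 V: {duty:.1%}")
    print(f"average at that duty: {buck_average_voltage(12.0, duty):.3f} V")
```

In this idealized picture each power stage is only "on" for roughly a twelfth of every cycle, which is why the waveform on the power-stage side of the inductor looks nothing like the smooth voltage on the output side.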

It should be noted that this board is sort of an oddball design, with two banks of four power stages and three inductors. What that means is that you can't just remove a phase and expect it to work correctly, because the phases share components.

To be honest, I don't know for sure what the purpose of Q502 and Q501 is. I suspect it has to do with balancing current draw between the three 12V inputs, and they tend to fail when one of the phases is missing because it throws that balance out of whack. It may also be exacerbated by weak power supplies, since one of the main reasons that MOSFETs fail is that the gate voltage drops too low, and causes the on resistance to be too high.
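The link between on resistance and heat can be illustrated with the basic conduction-loss formula, power equals current squared times resistance. The current and resistance values below are assumptions chosen for round numbers, not figures for Q501 or Q502, whose part numbers are not given in this thread.

```python
def conduction_loss_w(current_a: float, rds_on_ohm: float) -> float:
    """Heat dissipated in a fully-on MOSFET channel: P = I^2 * R."""
    if rds_on_ohm < 0:
        raise ValueError("rds_on_ohm must be non-negative")
    return current_a ** 2 * rds_on_ohm


if __name__ == "__main__":
    current = 30.0  # assumed current through the FET, in amps
    # Assumed on-resistance values: properly driven vs. under-driven gate.
    for label, rds_on in (("full gate drive", 0.002), ("sagging gate drive", 0.006)):
        print(f"{label}: {conduction_loss_w(current, rds_on):.1f} W")
```

Because the loss scales linearly with resistance but the package can only shed so much heat, tripling the on resistance triples the heat in a part that may already be near its thermal limit.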

Your best bet to get the card working again is to replace all eight FDMF6823As and the memory MOSFETs. The FETs that were used on these cards are very fragile, and if you got it hot enough to remove U13 and U98, you most likely damaged all the others in the process, if they weren't already degraded anyway. The proper way to replace them is to use a preheater to heat the board from the back and then use a hot air rework machine to get them just hot enough to melt the solder from above before removing and replacing them with tweezers. Use a good quality flux when you install the new ones, or you'll have all sorts of trouble with getting the QFN pads to solder right.

With respect to Q502, it wasn't pretty, but I was able to make a replacement trace out of copper tape on Solan's card. I would try that first, but I bet the card would work without it.

Note that you're missing the two 0402 resistors that go next to Q502. Also, the large black component marked 330 next to the inductor in your last picture is a capacitor, not a resistor. Its function is to help stabilize the voltage being supplied to the core so that it doesn't drop too low when the GPU suddenly starts doing work.
 
Joined
Feb 12, 2021
Messages
2
Thank you for the response. First, you are absolutely right, I was in DC voltage mode on the inductors, measuring on the side closest to the power stages. The PSU is an RM750, and it was able to shut off when the GPU went boom. Let me see if I fully understand the rest. 1) Get a hot air rework station, since the heat gun probably destroyed everything around. 2) Replace all power stages. 3) Add back the little 0402 resistors. 4) Try to run the GPU without the Q502 MOSFET. Profit? Again, I got the card for free, so at this point it's a fun experiment.
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
It's not so much that the heat gun probably destroyed everything as that the heat gun isn't precise enough. The power ICs you need to replace are very fragile, and you need to be deliberate about how you heat them up and cool them down. If you read the datasheet, it gives a temperature over time profile that the factory would have used to program their SMD soldering machines. You need to stay pretty close to that profile to avoid destroying your new ones when you try to solder them on. Note that I also mentioned a PCB preheater. Experience trying to replace these particular ICs on this particular board design tells me that you really need a preheater, in addition to the hot air rework station. There's too much copper in that part of the board to use a hot air station alone.
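One way to make "stay close to the profile" concrete is to log temperature against time with a thermocouple and check the log against limits. The sketch below is only that, a sketch: the default limits are assumed placeholder values in the range commonly quoted for lead-free reflow, not numbers from the FDMF6823A datasheet, so substitute the datasheet's figures before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReflowLimits:
    # Assumed placeholder limits; take the real ones from the part's datasheet.
    max_ramp_c_per_s: float = 3.0
    peak_max_c: float = 260.0
    liquidus_c: float = 217.0
    max_time_above_liquidus_s: float = 150.0


def check_profile(samples, limits=ReflowLimits()):
    """samples: list of (time_s, temp_c) sorted by time. Returns a list of problems."""
    problems = []
    time_above = 0.0
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("samples must be strictly increasing in time")
        ramp = (c1 - c0) / dt
        if ramp > limits.max_ramp_c_per_s:
            problems.append(f"ramp of {ramp:.1f} C/s between {t0}s and {t1}s")
        # Count an interval only if both endpoints are above liquidus.
        if c0 >= limits.liquidus_c and c1 >= limits.liquidus_c:
            time_above += dt
    peak = max(c for _, c in samples)
    if peak > limits.peak_max_c:
        problems.append(f"peak of {peak:.0f} C exceeds {limits.peak_max_c:.0f} C")
    if time_above > limits.max_time_above_liquidus_s:
        problems.append(f"{time_above:.0f}s above liquidus")
    return problems


if __name__ == "__main__":
    log = [(0, 25), (60, 150), (120, 200), (150, 245), (180, 200)]  # made-up log
    print(check_profile(log) or "within the assumed limits")
```

A heat gun with no temperature feedback tends to fail the ramp-rate check in particular, which fits the point above about precision rather than raw heat being the problem.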

Otherwise, yeah, pretty much. Replace the FETs in all ten phases (the memory ones should be easy), put the missing components back, and see if the card works without Q502. If you can replace the FETs successfully, I bet it does. If it doesn't, you may be able to fashion a replacement trace out of some copper tape, like I did with Solan's card. Be real careful about the QFN pins on the FDMF6823As. If you have too much solder on the center pad, it can be tricky to get the perimeter pins to solder securely.
 

Mr. Bluntman

Supreme [H]ardness
Joined
Jun 25, 2007
Messages
6,516
I'd say they don't make them like they used to, but I think I've read somewhere that manufacturers are either underspec'ing or pushing components too close to the limit on the highest end cards each generation, leading to the failures that seem to be trending upward (at least anecdotally on enthusiast/owner forums), whereas lots of lower to mid range cards seem to keep chugging along.
#1 reason why I don't buy anything other than reference cards right here. If the reference card has a problem (like the RX 480 series drawing too much power from the PCIe connector) I just avoid it altogether. Too much variability on partner cards.

This is an amazing thread, glad it got necroed else I'd have missed it.
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
Actually, the 980 Ti reference board is probably a contender for "worst design" after these fancy MSI ones. Every single one will eventually have the power stage closest to the slot connector fail - particularly with the EVGA cooler, which kind of sucks, apparently. When that happens, it often destroys the PCB unless the power supply has a super fast OCP, so you can't even repair it. The good and bad designs are obviously different every generation, but I'd be willing to bet that the nvidia 30 series reference design with the tiny boards will be among the more failure-prone ones in four or five years, with everything so tightly packed on there.

AMD cards are designed in a totally different way (better, IMHO), but the reference design is usually the best because AMD tends to over-build the reference board, and leaves the cost optimization up to the board manufacturers. This was one of the reasons we saw those super cool International Rectifier DirectFETs on the 290/X and 295X2. They're quite expensive, but they're tough as nails, and the way they use the source terminal as a heatsink is pretty awesome. They aren't very efficient, and they'd be hard to design as a power stage, but I wish that design had evolved further. In contrast, there is no reference 390X as far as I know, but the most common version, the "Nitro," uses a significantly weaker and cheaper power stage IC, the name of which I can't remember now.
 

Mr. Bluntman

Supreme [H]ardness
Joined
Jun 25, 2007
Messages
6,516
Actually, the 980 Ti reference board is probably a contender for "worst design" after these fancy MSI ones. Every single one will eventually have the power stage closest to the slot connector fail - particularly with the EVGA cooler, which kind of sucks, apparently. When that happens, it often destroys the PCB unless the power supply has a super fast OCP, so you can't even repair it. The good and bad designs are obviously different every generation, but I'd be willing to bet that the nvidia 30 series reference design with the tiny boards will be among the more failure-prone ones in four or five years, with everything so tightly packed on there.
Well damn, if I would have known that I'd have opted for a 1070 FE instead. Good to know I'm sitting on a ticking time bomb. Upgrading will be a priority, but in this market, unless a kind soul is willing to part with their 1080 or 1080 Ti at a sane price here on the forums I'm afraid I'm stuck with what I got.
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,808
If it's still working for you, don't worry too much about it. You can probably help prevent the failure by making sure the fans always spin, even when it's idling, and obviously venting the case well.

I have a 2080 that defaults to zero fan RPM when it's idle, and the heatsink on that thing gets wicked hot, even in an open case. I'm not sure if your card does that too, but it wouldn't really shock me. I seem to recall that was about when everyone started trying to offer a "silent" BIOS as a selling point.
 