980 ti went out in smoke :-(

kalston

So yeah, sad but true story.

I was gaming and about to take a break when suddenly boom: a puff of white smoke, a horrible smell, and the PC shuts down (it tried to reboot but I didn't let it, I cut the power at the plug). Following my nose, I couldn't find anything wrong with the PSU nor, at first, with the GPU. However, the smell was very strong around the CPU cooler area. So I took everything apart, only to find that while the CPU heatsink stinks, everything looks absolutely perfect in that area and there is in fact no unusual smell there (it's just the heatsink, a Noctua NH-U12P, that absorbed the smoke). So back to the GPU then and yep, there it is:

T5GAQh2.jpg



I monitor my temps constantly so I know my temps were fine even when it happened. Well... all my temps except the GPU VRMs since I have no way of measuring that. And while it could just be a manufacturing defect, I think I know what just happened and it might be entirely my fault.

I bought this card originally: http://www.inno3d.com/images/products/x-large/products_id_200_1.jpg. The cooling sucked (I don't overclock btw, just running everything at default): noisy, hot, core frequency fluctuating between 1000 and 1265 MHz (small factory OC). So I slapped a Prolimatech MK-26 onto it and called it a day. After that, the core frequency never dropped below 1265 MHz in game since the core temp was always so damn low. But that also means the card was sucking a lot of power and the VRMs must have been pretty damn hot (I assumed this would never be an issue at such low clocks).

My card came with some heatsink for the VRMs as you can see here : http://i.imgur.com/lLLZUTo.jpg
I left it on since installing the VRM heatsinks from Prolimatech is such a hassle. I also figured it would be absolutely fine that way (and it was, for 7 months of very heavy usage), since after all the card isn't heavily overclocked or anything. With the stock cooler, there is actually quite a bit of air being blown at the VRMs. With my Prolimatech set up, not really any. So I'm pretty positive that is the problem - VRMs got too hot for too long.

I read some stories about people running into similar issues with the Kraken G10 cooler (i.e. super cool core and super hot VRMs).

tldr: I learned about the importance of VRM cooling on big GPUs, the hard way.

*Card is still under warranty and the stock cooler had no stickers or anything so it will look just like new when I send it back to them, shouldn't be a big problem. Still this sucks and we don't really know much about Pascal yet so I'll just buy another ti (no I can't wait :p) and sell this one once it's fixed.
 
Yea lesson learned. VRM cooling is an issue with many setups, including some stock coolers!
 
This is why I never buy top of the line GPUs that suck a lot of power. I usually buy at least one model down, because that top model is usually pushing it in terms of voltage and heat anyway. It's both twice the price, and more likely to fail in short order. So for instance, I'd be more likely to consider buying a 970, or two 960 cards. It does suck when you want that last bit of performance, but it's very high risk. Especially if there's even a small overclock. It's also a reason I like MSI's Military Grade hardware. They use better components that are more resistant to this kind of thing and don't heat up as much. I've had a lot of issues with voltage killing computers, more than heat actually. If you do get another 980, it's probably worth trying a better OEM like MSI or ASUS.
 
I picked up an EVGA ACX 2.0 SC+ as an interim GPU (since I can keep it for 14 days and still get a full refund) while I'm RMAing the Inno3d. Eventually I might just keep the EVGA though. I like the company much more. Inno3d was cheap and also practically the only 980 ti I could get my hands on back then (early adopter) but yea, I guess this model was cheap for a reason.
 
:eek::eek::eek::eek:
I have Inno3D!
Although with a better cooler, but I'm switching to custom water cooling next week. I'll keep in mind to find something to cool the VRMs.
 
Sorry to hear about your problem, it does suck. However, you admit it is your fault and yet you still send it in under warranty? Then you buy another one with the explicit intent of returning it? Remind me never to sell or buy anything from you. At first I thought you would do the right thing when you admitted it was your fault. (Now, if you explain to the Inno3D support line exactly what happened and they still warranty it, that is different, but if you do not tell them...) :rolleyes:
 
That looks like a blown choke, which is not usually a heat-related thing. Possibly the winding insulation failed and it shorted out.
 
Well I believe it is my fault but I am not 100% sure of that. And it's not like the VRMs weren't being cooled at all, it's just that the heatsink was weak. Before slapping the Prolimatech onto it I did a lot of research and even saw more than one person doing similar stuff (leaving the stock cooling for the VRMs while replacing the rest) because the theory is that VRMs can get super hot and there is nothing to worry about unless you are some hardcore overclocker (and like I said I don't overclock at all, all I'm asking is for stock clocks and silence). The VRM cooling on reference 980 ti's isn't impressive at all btw.

And yes as a customer in the European Union I have the right to buy a product and return it within 14 days to get a refund. What's wrong with that? I don't plan on breaking it, in fact I will even leave the packaging intact so they can resell it to someone else easily. But I might just keep it and sell the fixed Inno3d instead (that I bought from another shop in a different country). They have no way of proving that I removed the cooler since there were no stickers or anything, just fairly loose screws. The cooler on their card sucks anyway, can't even keep up with reference clocks without the fans going mad. Doesn't stop spinning on idle either. And in the end they will know better than me where the fault lies.

That looks like a blown choke, which is not usually a heat-related thing. Possibly the winding insulation failed and it shorted out.

So maybe a plain manufacturing defect? That may be it honestly, all I know is that the core was running at about 60-70c under very heavy load, which is pretty low for an air cooled 250w GPU. My case has very good airflow as well, with the CPU cooler absorbing a lot of the GPU heat :D (which is okay, it can easily cope with it)
 
You change the airflow when you change the cooler. The heatsink is fine with the stock system.

When I installed the G10 Kraken I epoxied on heatsinks for this reason.
 
Looks like that large SMD resistor failed also (looks cracked in half). Between that and the damage to the choke, it's definitely an interesting failure.
 
kalston: actually, old choke coils with exposed magnet wire can fail due to extreme heat. The insulation is usually enamel (obsolete) or a polymer film/coating, and extreme heat within the coil can melt the insulation of the wire and short it out. I have never seen the inside of a new ferrite powder choke coil, but it is likely that they have the same construction.
 
That's the only damage I could see but I can try taking more (and better) pictures later. Haven't looked under the heatsink but I don't even know how to remove it cleanly so I think I'm not gonna try that. Also might not be visible on the pic but there is a burn mark on the heatsink itself near whatever it is that blew up.

This pic may look a bit better : http://i.imgur.com/x4ifbK1.jpg
 
A choke failure is extremely unlikely to be heat related, especially in a non-overclocked card. Nvidia reference cards don't even bother to cool the chokes or put any sort of heatsink on them at all (I've actually only ever seen one GPU where the chokes had a thermal pad on them or any sort of cooling; it was an MSI card of some sort).
 
As I said, remind me never to buy anything from you or sell anything to you. That's all.
 
They have no way of proving that I removed the cooler since there were no stickers or anything, just fairly loose screws.
:facepalm:

Does their warranty still hold if you install any aftermarket cooler? Did you mention that to them at all? Based on what I quoted, sounds like no. But not 100% sure.
 
I did not contact them yet, nor did I reinstall the stock cooler. I might try telling them the whole truth, I don't know. It's not like all companies are being all nice and honest with their customers in the first place. The dead card is still right here, with its lovely perfume. But I did not make this thread to talk about RMA and such, laws and conditions vary from country to country anyway. I just wanted to share what happened and to try and discuss the problem. There is at least one person in this thread who thinks this may not be heat related (so that would be a manufacturing defect and not my fault) and that is the kind of stuff I'm interested in reading. It's possible I messed up, but also it's possible I did nothing wrong.

Anyway I put in the new GPU and it all works fine. But somehow my USB soundcard (Xonar U7) died as well as the amp connected to it (the card is USB powered and the amp has its own PSU + a battery). That's quite puzzling. I don't see how a PSU issue could have taken out the amp for example (which is only connected to the soundcard via a line-out jack). Also the headphones are fine. The plot thickens. I enabled the onboard audio and that does work (although it sucks as expected, noisy output and all).

Edit: spoke too soon, the amp is fine.
 
FYI I can monitor VRM temps in the sensors tab of GPU-Z on my card. Not sure if this card has the sensors for that, but it's worth checking.
 
Not all cards report VRM temperatures. The 980 Ti is one of those that doesn't.
 
My MSI 980 Ti Gaming came with a strip of thermal pad on the chokes.

And yeah, I really don't think chokes put out that much heat anyway, since with the exception of Gigabyte, none of the other non-ref 980 Ti's even allow choke cooling if you throw on a waterblock.
 
More like avoid nVidia ref boards lol

(they've come a long way since the exploding VRM days of Fermi for sure, but build quality is still meh)
 
Temp guns come in handy with nvidia cards.

My last two nvidia reference pcb cards have been reliable, though I do prefer amd's ref board build quality.
 
I don't think top of the line GPUs fail any more than lower models. First thing I do with a new CPU, GPU, or motherboard is to OC the snot out of it. My current build is from summer 2011 and has been running OC'd 24/7 ever since.
 
VRMs can fail for reasons other than heat. A lot of current goes through VRM chips and they can simply fail like any other electronic component. It would be unwise to presume that the VRM died because it got too hot. VRMs are designed to, and often do, run extremely hot. The majority of PC gamers don't know a damn thing about case cooling or VRM temps. They will stuff whatever they can afford into whatever PC they have and never look at any temperature even a single time... and manufacturers still warranty the card.

My take on this is the OP modified the card and assumed all risk (unless Inno3D explicitly allows for cooler replacement, which it doesn't look like they do). By the OP's own admission the Prolimatech was blowing almost no air at the VRMs, so if heat did kill the VRMs, it's his fault. Would be a douche move to submit this card for RMA. Accept responsibility and move on.

RIP 980Ti.
 
True, VRMs can typically handle 120C, and up to 150C for really high quality MOSFETs. Most of the time VRM death is not heat related but an electrical issue.
 
I don't think top of the line GPUs fail any more than lower models. First thing I do with a new CPU, GPU, or motherboard is to OC the snot out of it. My current build is from summer 2011 and has been running OC'd 24/7 ever since.

Well, yeah, but you run watercooling setups and you're used to high-end overclocking. I'm sure they run fine if you're willing to deal with the heat issues and monitor voltage stability, etc... but when I buy a card, I want to be able to just stick it in my computer and have it work without having to pull off the stock fan or do anything that might void the warranty. The OP's situation of either paying out of pocket for a brand-new GPU or trying to deceive the manufacturer about fan replacement (although it might not be his fault the VRMs were bad) is not an enviable one.

The problem is that with a high-end card, it seems like they generally run hot and wear themselves out faster (within 2 or 3 years) unless you are willing to invest some time and money into a custom cooling solution. With a GTX 6800, I saw the GPU board had slowly warped and the fans had partly melted until they stopped working over the course of a couple years, but my cheaper model that didn't put out as much heat ran fine for several years. And integrated even better for longevity, if not performance. Granted, my experience is mostly with older cards, maybe that's changed. Not sure. I do suspect that MSI and ASUS might be able to do better than the OEMs I used back then. Admittedly, part of the reason is also price. The extra performance comes at a huge price premium, and the reduced reliability along with the potential of having to pay nearly $600 to replace a part that's really pushing itself if it fails isn't an appealing combination.
 
For what it's worth this model is advertised as being modular with a cooler that's easy to remove and take apart. It's on the box.

And yes I did buy another ti already (an EVGA one, which seems pretty nice so far although obviously audible under load compared to my Prolimatech setup). I will contact Inno3d later and see what happens. It's a 2-year warranty so I'm not even in a rush here.

Don't know why my USB powered Xonar U7 died at the same time though (maybe the abrupt shutdown killed it) but that's less than 6 months old so under warranty as well.