RTX 2080 Ti FE Escapes Testing by Dying After 8 Hours @ [H]

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
So I did put the bad RTX card (the No-POST card, not the Space Invaders card) back into the machine, just to see what happened. It did actually give me video and the ASUS splash screen... then went black. It was pretty hot too. I do not think I am going to RMA this card; instead I will see if I can get it to someone a lot smarter than me to take a look at it.
 

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
RMA#2 Samsung with Backplate.jpg
 

Nytegard

2[H]4U
Joined
Jan 8, 2004
Messages
3,535
So, where's the pool for how long this card lasts?

Anyway, good luck with it. Practically two months in, just to get a review.
 

Raghar

Limp Gawd
Joined
Jun 23, 2012
Messages
209
Are you sure your MB isn't blowing up cards?

I wonder if MBs are durable enough to survive pixel art.
 

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
I guess if it is different motherboards killing different cards in different ways, then maybe.
 

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
I'm glad you are taking my suggestion to heart. :) I think it will make for an interesting article and hopefully many hits for your ad space. Not to mention the longer-term benefit of increased (somehow) mindshare.
Your suggestion, my $1300.
 

Grimlaking

2[H]4U
Joined
May 9, 2006
Messages
3,246
This problem reminds me of the old IBM Deskstar drives, the 80GB ones that were failing left and right. In the end the business was sold off to Hitachi, but the fault was variance in the power to the drives: they were made with very tight expected thresholds that common desktops could not deliver reliably.
 

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
There are GPUs that run close to 90C on the backplate without problems, so 75C seems far from indicating a big issue. That is, if they haven't cheaped out on the components.
Never said it was an issue. Simply an observation. As per specs, the Micron card was 10C away from its maximum operating temperature under full load. This started simply as an attempt to compare the Micron and Samsung cards.
 

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
I will say this, Kyle.
I had a similar failure on an MXM GTX 1070 (MSI laptop) when I overclocked the core or RAM too high or did really stupid stuff to the curve (this is what really caused it, more than a bad overclock). It wasn't Space Invaders, but it was random small multicolored squares. And what I found is that after the initial crash, if I tried to just reboot Windows, sometimes the system would freeze on the desktop or the laptop would freeze at the MSI logo and not load Windows.

I had to actually turn off and unplug the laptop, then plug it back in to get it working.

I could also get this same failure to happen if I did weird stuff with the voltage curves in MSI Afterburner, probably by messing up the idle voltage somehow. Same effect: multicolored small squares, often followed by the computer failing to reboot properly. But never a brick.

I'm still using this card right now, hardware-modded to a 230W TDP directly through the MXM slot (Core +145, RAM +675), with no problems.
Based on the failures we have seen, we are not so sure that this is actually only one issue.
 

Deleted member 134608

Guest
FX series was a different problem. That wasn't an issue with hardware dying; they made the wrong R&D decisions and ended up with an architecture that sucked for D3D9/SM2.0.

If AMD offered something on par with the 2080 Ti, or near it, while using less power, I'd agree. But that just isn't the case here. The Vega 64 is just so far off from the 2080 Ti, and that's the unfortunate part: there isn't anything from the red team to use instead in this case.

I meant in general terms, not technical.
 

Nobu

Supreme [H]ardness
Joined
Jun 7, 2007
Messages
6,832
There are GPUs that run close to 90C on the backplate without problems, so 75C seems far from indicating a big issue. That is, if they haven't cheaped out on the components.
IIRC, there's no thermal pad on the backplate... I could be thinking of an AIB card, though.
 

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
Do you still have the 2080Ti Strix review card? I have been wondering if those are a viable purchase due to the different VRM setup.
Yes, but it has not been used beyond the review.
 

iamjanco

Limp Gawd
Joined
Jul 8, 2016
Messages
460
Kinda makes me wonder if it's not Space Invaders after all, but intentional ASCII art, an Easter egg if you will, programmed to fully display once the cards hit a certain temp. Maybe you just haven't gotten the cards hot enough yet to display the full, preprogrammed image:

ca78a92e91944c4e39138eaa78c9d2f6--ascii-art.jpg
 

noko

Supreme [H]ardness
Joined
Apr 14, 2010
Messages
6,644
Looking at Kyle's thermal images and the thermal pad layout on the backplate makes me think that installing heatsinks on the outside of the backplate in the hotter areas might be beneficial with some airflow, system configuration permitting. My first thought is that the backplate, as it heats up, may be trapping heat and exposing other components to temperatures they were not designed for.

I have just never measured backplate temperatures before, so I am not sure how far out of normal those temperatures may be. I've removed video cards right after shutting down and never fried my fingers on a hot backplate. It looks like Nvidia is using the backplate for supplementary cooling with the thermal pads, if that is what they are. Looking online, something like these, which stick on and are omnidirectional for airflow:

 

tom_ozahoski

Limp Gawd
Joined
Feb 24, 2014
Messages
330
My first thought would be to look for a common denominator, such as different components being exposed to heat or temperatures they were not designed for, any one of which may fail before the others on different cards. I'm not sure of all the symptoms of the failed cards, but what I recall are: Space Invaders screen, no display, card not recognized or some internal fault in Windows while still displaying, and random crashes.
And fire, don't forget fire!
 

Dayaks

[H]F Junkie
Joined
Feb 22, 2012
Messages
8,516
Looking at Kyle's thermal images and the thermal pad layout on the backplate makes me think that installing heatsinks on the outside of the backplate in the hotter areas might be beneficial with some airflow, system configuration permitting. My first thought is that the backplate, as it heats up, may be trapping heat and exposing other components to temperatures they were not designed for.

I have just never measured backplate temperatures before, so I am not sure how far out of normal those temperatures may be. I've removed video cards right after shutting down and never fried my fingers on a hot backplate. It looks like Nvidia is using the backplate for supplementary cooling with the thermal pads, if that is what they are. Looking online, something like these, which stick on and are omnidirectional for airflow:



I think that's pretty typical. Thermal pads on the backplate are nothing new either. The tech sheet for GDDR6 lists the same 0-95C temperature range that GDDR5 had. While high temps make everything degrade faster in general, these parts would have had to be very flawed to begin with for temperature to be a factor. We'll have to see what Kyle digs up.
 

noko

Supreme [H]ardness
Joined
Apr 14, 2010
Messages
6,644
Yes, now I would also be interested in whether those hotter, but in-spec, parts may be exposing other parts to temperatures above their spec. As a side note, backplates can also trap heat, which you cannot capture with the thermal imager when the backplate is on, since those parts are covered.
 

noko

Supreme [H]ardness
Joined
Apr 14, 2010
Messages
6,644
And fire, don't forget fire!
lol, now that means a part most likely shorted out, or nearly shorted out, and heated up pretty high. Not the GPU, since it has automatic temperature protection. RAM? I think that also has automatic protection. VRMs? Those too, I think, have automatic protection, but not much else does.
 
Joined
May 16, 2007
Messages
630
Don't tell me it's some sort of colossal design and prototype blunder: the prototype probably worked fine on their $250,000 bench supply, and no thought was given to different power supply quirks.
 

noko

Supreme [H]ardness
Joined
Apr 14, 2010
Messages
6,644
Also, the test benches with hundreds of cards being tested have power conditioners feeding them, so none of the cards are exposed to spikes, voltage drops, etc. from the refrigerator turning on and off, the air conditioner, the next-door neighbor's Tesla coil experiments, etc. I wonder if Nvidia gives out cards for testers to use in a normal user environment? I remember ATI used to do that, which led to a lot of leaks, but at least the testing was done under real-world conditions, giving the engineers real-world feedback. Today it appears neither Nvidia nor AMD is doing that (are they?).
 

Flexion

[H]ard|Gawd
Joined
Jul 20, 2004
Messages
1,607
Man... Just reading about all this makes me worried my RTX 2080 Ti FE is going to explode one night. XD
 

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
Plus, I'm betting that Kyle's test bench is running on a power conditioning UPS and not just a standard wall outlet. ;)

Yes.

That, and his power supply isn't a standard consumer-grade device but an enthusiast-class power supply that has been tested and reviewed well by HardOCP.

Currently I am using the Thermaltake Toughpower iRGB Plus 1250W Titanium PSU.

The PSU that is in my personal system that gave me the Space Invaders is a SilverStone 1KW PSU.
 

jgonz

Weaksauce
Joined
May 8, 2009
Messages
98
Man... Just reading about all this makes me worried my RTX 2080 Ti FE is going to explode one night. XD

Reading this thread made me hit the eject button. My card was working fine, but I could not have myself second-guessing every time I turned the computer on. NVIDIA received my card today; now I wait the 5-7 business days for my refund. Great card when it works, as mine did. But the cloud of fear of failure is too real to ignore.
 

Big_Dally

Zero Posts in 14 Years
Joined
Mar 27, 2004
Messages
7

Don't blame you, as I will give up some FPS any day for peace of mind and reliability.
 

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
52,348
What about heating/cooling cycles messing with the thermal pads and them getting holes punched through them and then components on the back of the card being shorted out by the backplate?

Relying on thermal pads as the only thing keeping parts from shorting out against a metal plate is a very bad idea in general.

Those cards are going to be bouncing around in shipping which I am sure will end up wiggling the backplate around at least a tiny bit or at least squishing the backplate harder against the back of the card.

And then you have the heating/cooling cycles which is going to make the backplate and components expand and shrink which is also going to end up compromising the little bit of electrical insulation that the backplate has from the components on the back of the card.

My guess is that the clearance tolerances are too close or non-existent in between the backplate and some of the components on the back of the card.

It would be very interesting to see a clearance test for a failed card.
That entire backplate is covered with a nonconductive layer where it needs to be. That is the black material.
 

Falkentyne

[H]ard|Gawd
Joined
Jul 19, 2000
Messages
1,816

Ranger101

Weaksauce
Joined
Sep 11, 2015
Messages
116
... [patiently waits for 7nm cards instead of this Turing beta test program]...
Back in 2016, I had a 980 Ti that released "the magic smoke" and subsequently died, so I can genuinely sympathise with those whose expensive graphics cards have failed. That being said, from what I have read on the internet, the consensus is that the failure rate is, at worst, not significantly above normal for a new GPU release, and I feel that some around here are just flogging a DEAD horse, if you will excuse the pun.

At the time of the Turing release, I was in the market for a new GPU, and not being able to afford a 2080 Ti, my choice was between the Asus Strix 1080 Ti and the Asus Dual RTX 2080. The Strix was actually R2000 more expensive than the Dual, and I ended up buying the Dual.

I am 100% happy that I didn't buy a 1080 Ti, because as others have noted, the 1080 Ti is as good as it's going to get, while the RTX cards are yet to come into their own. If and when they do, those who rushed out to buy a 1080 Ti before the inventory surplus evaporated might not look so smart, and if you think raytracing is not going to be a big thing in the computer graphics industry going forward, I think you are deluding yourself.

I am very much looking forward to seeing raytracing and other new features exclusive to RTX cards in titles like Battlefield V and hopefully SOTTR, and I don't care if framerates are halved. This kind of experience is standard with the introduction of new technologies. I'm just happy that I'll be able to see the new tech in action and play around with it.
 

Uvaman2

2[H]4U
Joined
Jan 4, 2016
Messages
3,143
Not sure if you have extra information, but Google searches for me (as of about 2 days ago) only bring up "information" that is parroting nVidia, and nothing else (Google searches for sites other than the H, obviously).
 

Pantalaimon

Limp Gawd
Joined
Aug 17, 2006
Messages
194
That being said, from what I have read on the internet, the consensus is that the failure rate is, at worst, not significantly above normal for a new GPU release, and I feel that some around here are just flogging a DEAD horse, if you will excuse the pun.
Seriously? It's normal for many people to have multiple cards fail like what's been described here?
 