The Moment Your RTX 2080 Ti FE Fails

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
55,510
So we are starting to play around with our failed RTX 2080 Ti Founders Edition card this evening. Below are two pictures of the card operating on an open test bench with an ambient temperature of 68F/20C. The card ran Heaven for about 45 seconds before locking up with the same artifacts we saw when it failed last Friday while gaming. We did purchase two 2080 Ti FE cards, so we will likely start dissecting the "good" card to get some better temperature readings. It is hard to test when your $1200 video card is broken.

Thermal images with Temperature Sensors.

138F/59C on the exterior of the backplate might seem a bit excessive; we will find out whether it actually is. As a side note, the FLIR ONE Pro is on sale.
 
That backplate temperature is pretty high. I was getting 122F around the GPU area and 112F at the outer edges.

I think multi-monitor mode puts power use at around 55W. Can you check whether it drops into low power with a single monitor?
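
The readings in this thread are a mix of Fahrenheit and Celsius, so a quick conversion sketch may help when comparing them. The helper names `f_to_c` and `c_to_f` are made up for illustration; the inputs are the readings quoted above.

```python
def f_to_c(deg_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def c_to_f(deg_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9.0 / 5.0 + 32.0


# Backplate readings quoted in the thread, in Fahrenheit.
readings_f = {
    "FE backplate (first post)": 138.0,
    "GPU area (reply)": 122.0,
    "outer edges (reply)": 112.0,
}

for label, deg_f in readings_f.items():
    print(f"{label}: {deg_f:.0f}F = {f_to_c(deg_f):.1f}C")
```

On these numbers the two cards differ by roughly 9C at the hottest point of the backplate.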
 
For reference, mozzarella cheese melts at 55 degrees C (130F).
Swiss cheese melts at 66 C (150F)
Parmesan cheese melts at 83C (180F).

So your backplate is hot enough to melt pizza cheese, but you're going to have to up your game to melt [H]ard cheeses.
 
But you will still get Salmonella from the chicken you bbq'ed on it.
 
Mozzarella, Parmesan - after this thing lights off, all that'll be left is de brie.
 
I think the VRM is failing.

If you can hold the core at a specific temperature and step it up over time until it fails, then the fault may be core or memory related. But if it fails at any temperature after a semi-predictable time (which would depend on how long the specific bad part takes to heat up), then part of the VRM may be at fault.
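
That test plan can be written down as a rough decision rule. This is only a sketch of the reasoning above, not a validated diagnostic: the function name `classify_failure`, the 25% spread limit, and the example runs are all hypothetical.

```python
from statistics import mean, pstdev


def classify_failure(trials, time_spread_limit=0.25):
    """Classify a set of test runs.

    trials: list of (held_core_temp_c, seconds_to_failure), where
            seconds_to_failure is None if the run survived.
    time_spread_limit: assumed tolerance for "semi-predictable" failure
            times, as standard deviation divided by mean.
    """
    failed = [(temp, secs) for temp, secs in trials if secs is not None]
    survived = [temp for temp, secs in trials if secs is None]

    if not failed:
        return "no failure observed"

    # Every surviving run was cooler than every failing run:
    # the card only fails above some temperature.
    if survived and max(survived) < min(temp for temp, _ in failed):
        return "core or memory suspected (fails above a temperature threshold)"

    # The card failed at every held temperature, after a similar time.
    times = [secs for _, secs in failed]
    spread = pstdev(times) / mean(times)
    if not survived and spread <= time_spread_limit:
        return "VRM suspected (fails after a similar time at any temperature)"

    return "inconclusive"


# Hypothetical runs: fails after ~45 s no matter where the core is held.
print(classify_failure([(50.0, 44.0), (60.0, 47.0), (70.0, 45.0)]))
```

With real data you would want more runs per temperature step before trusting either verdict.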
 
If you could measure the Vpp voltage, I think that will tell us a bunch.

I can't find a number for the current draw, but the part that caught on fire was the Vpp power supply.

There's an R005 resistor; measuring the voltage there will tell us what the heat is about, and it is likely where the two boards differ.

That's the current sense resistor for the Vpp power supply.

Anything below 1.5V is likely a problem. And I'll bet a cookie the good board is at 1.5V, and the bad one is below that.

You might look at both, and see whether the good board actually has capacitors on the pads where the other one probably has just empty pads.

There are three power sections for this chip, one for internal power, one for pin drivers, and one for internal biasing; I think the internal biasing, Vpp, is the fail.


Thermal cameras are awesome for troubleshooting. :)


I posted this in another thread, this is what I think happened to this card:

Looking at the PCB photos linked in the threads, this looks like the vias overloaded when the power supply section here failed.

There are 10 vias that look like ~10 mil vias, so they are likely good for about 1.2A each, or 12A total.
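
As a quick check of that estimate, here is the arithmetic as a sketch. It assumes the current splits evenly across the vias, which real boards only approximate; the 1.2A per via figure is the estimate from this post, and the load currents in the loop are hypothetical.

```python
VIA_COUNT = 10       # vias counted in the PCB photos (from the post)
AMPS_PER_VIA = 1.2   # poster's estimate for a ~10 mil via


def via_bank_capacity(count: int, amps_per_via: float) -> float:
    """Total current the via group can carry, assuming equal sharing."""
    return count * amps_per_via


def per_via_current(load_amps: float, count: int) -> float:
    """Current through each via for a given load, assuming equal sharing."""
    return load_amps / count


capacity = via_bank_capacity(VIA_COUNT, AMPS_PER_VIA)
print(f"Estimated capacity: {capacity:.1f} A")

# Hypothetical load currents, only to show how the margin shrinks.
for load in (8.0, 12.0, 16.0):
    each = per_via_current(load, VIA_COUNT)
    status = "over" if each > AMPS_PER_VIA else "within"
    print(f"{load:.0f} A load -> {each:.2f} A per via ({status} the estimate)")
```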

That load alone would raise the temperature to 42C, but the heat adds to everything else around it, so wow, no wonder it burned.

I'm not printing that number until I check it.

This power supply is going to be the issue: if it falls, the chips overheat, and it goes downhill fast.

This power supply is the Vboost for the memory chips, and if this voltage is too low, the chips will run hot and draw more power.


Nvidia says capacitor failures, but I don't see anywhere Near enough caps for this power level.

This is based on a similar design on the Titan, and it has 4 large caps right beside this part, and only has one PS section; this has two. And No Big Caps.

There seem to be two empty capacitor pads right beside the L64 inductor label, and those are pretty important. :)

There don't seem to be ANY large caps at all; all are 0201, or 0603 at most.


I can't believe there's no listing on how much power this memory chip draws; I've never seen a datasheet without it.

There are several references to low power, and lower voltage, but 1.25V at 10A is 12.5W, as is 1.5V at 8.33A, so lower voltage doesn't mean lower power.
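
The point about voltage and power is just P = V × I. A small sketch using the two operating points from this post (the function names are illustrative):

```python
def power_w(volts: float, amps: float) -> float:
    """Electrical power in watts for a given voltage and current."""
    return volts * amps


def current_for_power(watts: float, volts: float) -> float:
    """Current in amps needed to deliver a given power at a given voltage."""
    return watts / volts


# The two operating points mentioned in the post.
print(f"1.25 V at 10 A   -> {power_w(1.25, 10.0):.2f} W")
print(f"1.50 V at 8.33 A -> {power_w(1.50, 8.33):.2f} W")

# Same power at the higher voltage needs less current, not less power.
print(f"12.5 W at 1.5 V needs {current_for_power(12.5, 1.5):.2f} A")
```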

This board needed twice the power level of the Titan, after all; there are two power sections.


Look at your boards; if all those cap pads are empty, there will be a recall.

Here are good pix:

https://xdevs.com/guide/evga_2080tixc/

Anyone want to measure across the R005 resistor for me, while it's running? :D That's a 0.005 ohm resistor, and it is there for current sensing for the controller chip...
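
If someone does take that measurement, Ohm's law turns the voltage drop across the sense resistor into a current. A minimal sketch: the 0.005 ohm value is from this post, while the 50 mV reading is a made-up example, not a measurement from either card.

```python
SENSE_OHMS = 0.005  # R005 current sense resistor (value from the post)


def sense_current(drop_volts: float, ohms: float = SENSE_OHMS) -> float:
    """Current through the sense resistor from the measured voltage drop (I = V / R)."""
    return drop_volts / ohms


def sense_dissipation(drop_volts: float, ohms: float = SENSE_OHMS) -> float:
    """Power dissipated in the sense resistor itself (P = V^2 / R)."""
    return drop_volts ** 2 / ohms


# Hypothetical multimeter reading across the resistor: 50 mV.
drop = 0.050
print(f"{drop * 1000:.0f} mV drop -> {sense_current(drop):.1f} A, "
      f"{sense_dissipation(drop):.2f} W in the resistor")
```

Note this is the drop across the resistor, which is a separate measurement from the rail voltage itself.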
 
If the problem is power related, won't this issue do nothing but get worse when those tensor cores are switched on?

Hell, if it's heat related, those cores aren't going to make the card any cooler...
 
A few months back I put a frozen steak in a container and left it on top of my SLI rig for a few hours to thaw. Wife kept coming in and asking why it smelled so good, hilarious.
 
I would be more concerned about buying a good FIRE EXTINGUISHER........

so much for "test escapes"
 
So let me get this straight, these 2080 RTX Ti Founder's Edition cards are just being benchmarked with... stock... settings? And failing?

If that's so, then that's a fucking rip off.
 
Now imagine people overclocking and a power mod.
 
That backplate does seem pretty hot. I certainly wouldn't want to put dual GPUs in with that kind of heat.

Also, sadly that camera is not compatible with my phone.

That is why you use american cheese for TIM.

Just need to get that Government cheese, everyone knows Government cheese doesn't melt.
 
We will know soon from heatlesssun since he bought 2 I believe. Maybe he can give us some numbers on heat and how close they are on his MB.
 
It seems to level off really quick. Just checked after an hour run and temps are pretty much the same.
 
Can you get a better angle e.g. top down/side on view to see ram and power circuitry temperature directly closer up?
Or even from below - these microbolometers usually read reflected heat off glass/mirror etc so you may be able to stick some glass under the slot to get a better temperature reading off the reflected thermal image in the lower hot spot from an 'under slot' POV. Not sure how accurate/lossy it is on that rig so test and account for that.
If it's covered the way it looks, though, it isn't going to show much with the fins in the way.

Thermal can be quite disorientating so made this for you all. Not perfect but within a cm or less for the wider pic.
View attachment 121279 View attachment 121280

I'm extremely curious how hot it'll be in direct measurements. Another reviewer measured ram temps at 95°, which is why tech jesus makes me laugh when he says it's not an issue. DrBorg looked at datasheets for temperatures and had some interesting results (as well as hiding and fuckery) and has plenty of experience with hot ram. At high temp it can start to corrupt data, resulting in artifacts or total loss of memory. How surprising..
We are actually going to do an article. Just some fun shots here I wanted to share.
 
Looking forward to it, this has been a wonderful train wreck to watch so far and of great interest as I love learning the how/why and also what not to do from a business PoV.
I bet they're wishing they'd only had hardware problems, Nvidia is getting double teamed on RT and reliability now.. cheers.
P.s. let me know if you want any more overlay pics like the above. Pretty simple to do.
That overlay is awesome. Would you send me over the PSD file that you used? [email protected] Pretty please. :)
 
At least we know the vrm isn't overheating...probably. The card looks scary hot near the PCIe slot, though. Could that be an issue? The pins couldn't desolder from that amount of heat–would have to be more, right?
 