Nvidia RTX 2080 Ti graphics cards are dying on a lot of users

I should note, this was with the power limit set to 120%, which I was doing to test the system. I put it back to 80% and was able to play for around 1 hour without a crash.

From my personal experience, I do not advise going past a 110% power limit on the Founders Edition cards. Even with great case thermals the card just runs way too hot above that mark, as the cooler it comes with isn't adequate IMO. Running the max of 124% is absolutely going to kill these cards, given the heat and the frequent spikes of about 6-10% higher than what the user sets the limit at.
 
So if I have it both BIOS modded and shunt modded and am at about 200% PL, would that be too much in your opinion?

:D

It’s under water of course... they only have to live until the 7nm Ti card, right? Yolo / carpe diem!! Seriously though, some of these cards boost to 2000MHz+ out of the box. I wonder if part of the problem is that they pushed them too far.
 
Wow, what sort of clocks are you able to reach with that power limit?
 
It only hits about 400W, so I don’t hit PL anymore. I sustain 2100MHz/+1000 on the VRAM. All the cards hit about the same clocks... sustaining them is a different story. My Asus Dual would settle in the 1800s with the stock heatsink, mainly because of the power limit.
 
I'm guessing right about now Jensen is regretting his "it just works" spiel he repeated over and over and over at the insomnia cure reveal of the Turing series.
 
I see people talking about power supply issues, what cables to use and all; let me try to give some insight there.

300W at 12V is 25 A.

Across four 18 AWG wires in a single cable:

https://www.calculator.net/voltage-...ance=1&distanceunit=feet&amperes=25&x=36&y=24

This shows a 0.08V drop using one 8-pin cable with 4 power wires. That's using one cable for both connectors, which is the worst case.
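
For anyone who wants to check that arithmetic without the calculator, here is a minimal Python sketch. It assumes a 1 ft cable run (the distance in the calculator link), a round-trip path of twice that length, and roughly 6.385 ohms per 1000 ft for 18 AWG copper (the usual wire-table value at room temperature). Those figures and the function name are my assumptions, not measurements.

```python
# Rough voltage-drop estimate for a PCIe power cable.
# Assumption: 18 AWG copper is about 6.385 ohms per 1000 ft at ~20 C.
OHMS_PER_FT_18AWG = 6.385 / 1000.0


def cable_voltage_drop(watts, volts=12.0, wires=4, length_ft=1.0,
                       ohms_per_ft=OHMS_PER_FT_18AWG):
    """Return (total_amps, drop_volts) for a load fed over parallel wires."""
    total_amps = watts / volts
    amps_per_wire = total_amps / wires
    # Current flows out and back, so the resistive path is twice the run length.
    loop_ohms = 2.0 * length_ft * ohms_per_ft
    return total_amps, amps_per_wire * loop_ohms


amps, drop = cable_voltage_drop(300)
print(f"{amps:.1f} A total, {drop:.3f} V drop")  # prints: 25.0 A total, 0.080 V drop
```

A longer run scales the drop linearly, so even a 3 ft cable under these assumptions stays around a quarter of a volt.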

That is completely OK. The power fed to these chips is 3.3V at a maximum, so there's plenty of headroom for the on-board power supply.

If there's too much drop in a switching power supply, the inductors and capacitors fail first; then the transistors overheat and blow.

That said, I always use two cables back to the power supply, just because. :)

We're seeing memory or chip issues, so it should not be power supply related, unless your power supply is dropping below ~10V. And then it won't be the memory.

You should be able to see that on any monitoring program.


More likely, this has to do with the soldering of a single batch or two of boards. Nvidia has famously had soldering issues, and lead-free solder is not very forgiving; fixing it requires removing the chips.

I removed and reballed the chips in a couple of laptops with the Nvidia problem a few years ago for a friend at work. I used lead balls (really, lol) and they haven't failed again.

Look up reballing BGAs for reference.

It requires special equipment and running the PCB through a reflow machine, so it's not really for the hobbyist.
 
So I did some more testing. I'm trying to disable as much stuff as I can, but I'm still getting crashes.

I uninstalled the Nvidia USB-C driver. I disabled Surround and SLI. I disabled my 2 side monitors, so just a single monitor and a single card (the older one). Also set power limit to 80%.

At first it was looking okay, but Shadow of the Tomb Raider crashed after about 20 minutes while I was adjusting settings. There was no blue screen or anything, the game just disappeared.

Tried Dying Light. I was able to play for around 20 minutes as well, but it crashed on a menu screen. Same MO: the game just disappeared and I was back at the desktop.

I think at this point I will have to remove the older card from my system and see if the 0324 replacement by itself is okay.
 
This time I removed the second card and SLI bridge, in case they were complicating things. I still got crashes in Tomb Raider and Dying Light after about 10 minutes.

[Attached screenshots: upload_2018-11-4_19-14-56.png, XboxCrash.PNG]
 
Could that not be a driver issue or a bug in the game? These cards are brand new and so is the game.

Yeah, it definitely could (especially with Tomb Raider). Dying Light is pretty old now, though, and Far Cry 5 has been out for months.

Thinking about it, the USB problems could be related to a USB PCIe card I added recently (though that was after I got the Ti, and I was getting crashes before that point).
 
So I removed the USB card. That may have been the cause of the blue screen, but I'm still getting crashes.

Both RotTR and Dying Light. I even moved the card to the second PCIe slot and it's still crashing.

I'm not sure what else I can try. I think the card is bunk. Either that or the Nvidia driver is bunk.
 
Finally, I put in just the new replacement card (0324) and it's still crashing.

I can't believe I got 3 bad cards in a row. I know there are problems, but that seems highly unlikely.

Maybe something else is up with my system? I don't know. This is really strange.
 
3 cards in a row seems like pretty bad luck. Have you tried testing it in your other rig?
 
Tried underclocking vram?
 
For those having issues with their cards: I am sure many of you have tried most of the steps below, but they may help if you haven't.
  1. If you have access to another computer (a friend's, a relative's, etc.), put the card in that machine to rule out your current machine.
  2. If you have a spare drive, or a drive whose files you can move elsewhere, do a clean install, as bare-bones as possible. This also keeps your current configuration safe.
  3. Remove the bad card, put the previous card back in if you still own it, and test whether your rig performs like it used to. There is a remote possibility that the new card is damaging your rig.
  4. From past experience: with a Radeon Nano I would get crashes, corruption, and the machine shutting down, and changing out the power supply did nothing. It turned out the card was not seated correctly. It looked really close, but it was hitting the case, which prevented full engagement. After a little bending the card seated correctly and worked perfectly afterwards. Hours and hours, money, etc. were spent before I found that.
  5. Check that all the contacts on the card are in good shape and clean, and that the card fully seats in the socket. Are the cards different enough that they have issues in some sockets or motherboards? Misalignment in the socket, however slight, could be disastrous. If you can position your case to allow it, try putting weight on top of the card or pushing it sideways, or wiggle the card a little while it is operating to see if it is sensitive. If wiggling the card causes issues, that would indicate a socket and card contact problem. A properly seated card would be unaffected by a little wiggling.
  6. Switch PCIe from Gen 3 to Gen 2 in the BIOS, if available, for testing purposes. That would alleviate anything non-standard.
  7. Cool the sucker down: make the room as cold as possible and blow a fan on the card to see if it is temperature sensitive.
    1. Some saw results when reducing the power limit, which would decrease heat. Possible temperature issue.
    2. Some reduced memory speed. Less heat . . .
    3. The machine crashes after a period of time. Heat build-up?
    4. If it is temperature sensitive, it may be a build quality issue in the cooling. It looks like Nvidia did not standardize on a particular TIM for the FE cards and used several different vendors, which I would think would also cause a lot of variation in results.
  8. If you are spending this much time on it, especially considering the price, and you are not satisfied: GET YOUR MONEY BACK. Depending on where you live, you may have rights that make a refund mandatory if you want one. You can always rebuy later when the bugs are worked out, or when there is a better opportunity with a newer, better card.
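
For the temperature checks in step 7, logging the card's readings while a game runs makes it easier to see whether crashes line up with heat or power draw. A minimal Python sketch, assuming `nvidia-smi` is on the PATH and the card reports these query fields. The function names and the sample line are made up for illustration, not taken from anyone's actual readings.

```python
import subprocess

# Fields requested from nvidia-smi's --query-gpu option.
FIELDS = ["temperature.gpu", "power.draw", "clocks.sm", "clocks.mem"]


def parse_line(line):
    """Turn one CSV line (noheader, nounits) into a dict of floats.

    Fields the driver does not report (e.g. "[N/A]") become None.
    """
    values = {}
    for name, raw in zip(FIELDS, line.split(",")):
        try:
            values[name] = float(raw.strip())
        except ValueError:
            values[name] = None
    return values


def read_gpu(index=0):
    """Query one GPU once; requires nvidia-smi to be installed."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_line(out.strip().splitlines()[0])


# Hypothetical sample line, for illustration only (not a real reading).
print(parse_line("81, 262.41, 1935, 7000"))
```

Calling `read_gpu()` once a second in a loop and appending the result to a file gives a trace you can compare against the time of the crash.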
 
I see an issue with Xbox? Disable Windows Game Mode. Be sure anything related to that shit is disabled, even the Xbox app if you don't use it.

This also reminded me: disable Windows fullscreen optimizations on the game's .exe file. This has been common practice since the Windows 10 Creators Update.
 
Sorry for your bad luck, man that really sucks.

It is possible to get 3 bad cards in a row. I got 3 MSI GTX 680 cards that were faulty. One was DOA, and the other two were causing crashes and blue screens on my computer. The fourth card worked perfectly and is still working perfectly. If your system was fine before the new cards were installed, and the issues only started when you got the new cards, then it's more than likely faulty cards. There shouldn't be that much effort needed to get a new card working.

Also, I don't believe for one second that it's a power supply issue.
 
1, maybe 2 bad cards is a possibility. 3 cards acting up makes me think something else in the system is the problem.
 
I had 3 bad cards originally. One dead, one artifacting/blue screening, and one with a broken LED logo. It's certainly possible. My current cards seem to be working alright though.
 
I've been continuing my stress-testing with continuous runs of Time Spy and FFXV gaming. Temps average between 68 and 84 C. I have +180 on the core and +650 on the memory via Precision X1. No crashes or artifacting yet.

I still can't see any of the symptoms reported. Though I did note that my card is from the 0323 batch that some people have claimed there are problems with.
 
So I think I may have figured it out and I kind of feel like a jackass.

Turns out, my RAM was heavily overclocked and I think that may have been the problem. I reset the BIOS to default, so now it's at 2133, and things seem to be working.

Played about 2 hours of SotTR and Dying Light without any crashing. I still think the first card I got was bad, but maybe it was the RAM all along.

That said, I do recall specifically testing with no OC at one point, and there were still problems, so I think this may have been a conspiracy of settings or something.

This is what happens when you change too many things at the same time. Really hope that's the end of this ordeal.
 
I almost thought I had a bad card last night with my EVGA 2080 Ti. I couldn't get the LED colors to sync and had applied the BIOS update patch from the forum. On reboot, the video card drivers were not detecting the card. I did a clean wipe of the drivers and reinstalled, and all was well and I could play with the colors. The default color on the EVGA 2080 Ti XC was full blue.

I can report that the max temp I saw in 6 hours of gaming and benchmarking was 81C with a 125 MHz OC and a 525 MHz memory overclock. I wasn't testing for max OC, but rather for the temperature limit I'm comfortable with for noise/heat. Beyond that the fans would have to run at 2500 rpm or more, which made them audible enough to distract me while gaming.

Stats wise: 2/2 good cards, 1 Asus Dual fan, 1 EVGA XC
 
I think that's a great lesson for all of us when running into problems: reset everything to stock and move on from there. Glad you seem to have things figured out though!
 
Live and learn, glad you figured it out!
 
He doesn't even sell FE cards. These things are ticking time bombs, so let's see where the situation stands after people have them for longer than a week or a month. If I were buying today, I wouldn't touch anything using the FE PCB. Now I'm stuck playing RMA games with Zotac. Fun.
 
I've had my two 2080 Ti FEs since October 8th, with 60+ gaming hours in. No overclocking yet, but they have been solid so far, and nothing in my own experience would steer me away from them.
 
Not being able to overclock without having to worry about frying your cards is not my idea of a $1200 product that is "designed to overclock". I hope yours are fine, I really do. In a few months I hope the problem is narrowed to a few bad batches. It's still totally unacceptable we are even having these discussions to begin with.
 
+100 MHz and 115% power target. Sits at around 2 GHz. Overclocks just fine.

I had the exact same experience as you, for a solid 29 days. I hope yours is fine and you don't get a sudden black screen of death like I did. Might want to lower that power limit a tad just to play it safe.
 
Not being able to overclock is not my idea of a $1200 product that is "designed to overclock".

It's not that I can't overclock, as far as I know; I simply haven't bothered up to this point. I know this is an enthusiasts' forum, but with a number of the titles I'm playing now supporting SLI/mGPU, overclocking would be largely academic, since I'm blowing well past 60 FPS at 4K on a 60 Hz screen.
 
I’m sitting under 80c - I’m well aware of the temp issues.
 
So I put the OC back on, with the CPU at 4.7 GHz and the RAM at 4133. This time I bumped the RAM voltage to 1.375 V (the default was 1.35 V), and this may have helped.

I will do more testing, but the OC has gained me a big boost in FPS so I'd like to keep it on if I can get it stable.
 