GeForce RTX 2080 Ti FAILS After Gaming for 2 Hours @ [H]

Discussion in 'nVidia Flavor' started by FrgMstr, Nov 9, 2018.

  1. N4CR

    N4CR 2[H]4U

    Messages:
    3,615
    Joined:
    Oct 17, 2011
    This. Most hardware failures occur in the first two weeks, look at bell curves.
     
    AceGoober likes this.
  2. bill_d

    bill_d Limp Gawd

    Messages:
    193
    Joined:
    Jun 8, 2007
    Well, after a little over a week, no problems with my Strix yet.

    knock on wood and everything
     
    AceGoober and CoreStoffer like this.
  3. DrBorg

    DrBorg Gawd

    Messages:
    555
    Joined:
    Jan 22, 2005
    The internal metallization layers are susceptible to electromigration, physical traces are not, except at Extreme current densities; and 100A at 1.4V won't really electromigrate except as a cloud of copper vapor, if you're talking about interposer or PCB traces.

    The power traces should be on a whole LAYER, called a Power Plane; they're connected by at least 1 via per pin to the die, vertically thru the PCB.

    The internal metallization is, but the current density on a 100-micron deposited metal interconnect is pretty extreme. :)

    But this would be a hard fail; like poof, it's gone, and shorting a power supply rail due to metal vapor all over the chip, and volcanoing the chip. (the hole in the chip looks like a volcano; btdt) (I can't find a reference to the term, must be an Older American term.)


    WTF? Hardware failures are Not a Bell curve, it's called a Bathtub Curve, and the lower end is "Infant Mortality", the upper end is "Wearout Phenomenon", which includes electromigration as a cause.

    https://en.wikipedia.org/wiki/Bathtub_curve
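
    For anyone who wants to see the shape, here is a minimal sketch of a bathtub-style failure rate, modelled as the sum of a falling Weibull hazard (infant mortality), a constant term (random failures) and a rising Weibull hazard (wearout). Every number in it (shape values, scale values, the constant rate) is an assumption picked only to make the curve visible, not data for any real card, and the function names are made up for the example.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Illustrative bathtub curve: failures per hour at age t (hours).

    All parameters below are assumed values for illustration only.
    """
    infant = weibull_hazard(t, shape=0.5, scale=2000.0)    # shape < 1: rate falls with age
    random_failures = 1e-5                                  # constant background rate
    wearout = weibull_hazard(t, shape=5.0, scale=50000.0)  # shape > 1: rate rises with age
    return infant + random_failures + wearout


if __name__ == "__main__":
    for hours in (10, 100, 1000, 10000, 40000, 60000):
        print(f"{hours:>6} h  hazard = {bathtub_hazard(hours):.2e} per hour")
```

    The rate starts high, drops to a flat bottom, then climbs again at end of life, which is nothing like a bell.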

    If that's a Ham license as a name, you need to retake some tests... :)


    Electromigration in a new product is referred to by the Good Engineers in the group as "Lightbulb Effect", and you'll be needing a new job after one of those designs hits the field, lol.

    Disclaimer: I design Electronics...

    One of the thermal images shows it's the actual Memory parts that are overheating; 90°C is not an operating temp for a memory chip for very long, and memory errors are almost certain at those temps.

    I looked up the datasheet; there's some serious bogosity there. Micron is saying the Case temperature is ok at 95°C, but the maximum Storage temperature is 125°C.

    If the case is at 95 degrees, and the max is 125, that's BS; there's no way.

    The datasheet says there's an ON DIE temperature sensor; anyone got data from one, on a card that's failing?
     
  4. reaper12

    reaper12 2[H]4U

    Messages:
    2,224
    Joined:
    Oct 21, 2006
    Don't know about the USA, but in Europe guys who are returning their 2080 Tis are being told that the delays are because of unusually high return rates.
     
    AceGoober, jnemesh and cybereality like this.
  5. DrBorg

    DrBorg Gawd

    Messages:
    555
    Joined:
    Jan 22, 2005
    Maybe because their consumer protection laws don't permit lying to their customers?

    If you cut out the lies, there wouldn't be any commercials here. :)
     
  6. Redmud

    Redmud [H]Lite

    Messages:
    107
    Joined:
    Jan 30, 2007
    My understanding is that the EVGA XC (and XC Ultra) cards as well as the MSI Ventus cards use the reference boards. Has anyone found where these are failing?
     
    AceGoober likes this.
  7. Digital Viper-X-

    Digital Viper-X- [H]ardForum Junkie

    Messages:
    13,608
    Joined:
    Dec 9, 2000
    Doesn't electromigration happen over time, as in eventually, as a result of actually running the IC?
     
  8. N4CR

    N4CR 2[H]4U

    Messages:
    3,615
    Joined:
    Oct 17, 2011
    Whoops, similar curve just inverted aka wrong name, I don't do mathsy graph distribution stuff like that much for my area of work. Pretty clear what I meant though, most failures within first two weeks would not be a bell curve now, would it..
    Anyway, that's what I was told by a large manufacturer of laptop systems when I worked for them and also handled warranty + insurance claims, I have walked the walk over multiple launches, not just engineered it ;)


    Could storage temp be higher because the memory doesn't have to reliably store data at that temp? Is it possible that it can run hotter, but the hotter it runs the more errors it may have?
    To me the behaviour looks like what happens when you OC memory too much.
     
  9. DrBorg

    DrBorg Gawd

    Messages:
    555
    Joined:
    Jan 22, 2005
    Storage temperature is a function of the Silicon; over the storage temp (125°C) it starts rediffusing the N and P type material back into random silicon.

    I've run 2N3055 transistors at 185 degrees C, and after a while, they aren't transistors anymore; at least, they won't turn OFF anymore. :)

    That's why the really high power devices for Radio transmitters are still Toobz. :) Water Cooled Tubes, but still vacuum tubes.

    The package temp is always lower than the core temp, and I'd love to know what the interior temp is when the outside is at 90C. :D

    This video shows WTF 90C I'm talking about:


    Also realize: this is the back side of the card, and there's a heat sink on the other side of the die; this is thru the PCB. :O

    The max temp is a function of the complexity of the chip; finer architecture is more sensitive.

    I'd bet memory is at the bleeding edge of complexity, like microprocessors.


    I did find a weird effect, tho: a Flash-based Digital Potentiometer we used was suspected of a problem, so we tested it to failure in an environmental chamber.

    It would fail after ~1M write cycles, and fail to overwrite 1's. We found by accident that heatsoaking it at 160C for an hour allowed us to write to it another 1M times, and it could be fixed multiple times.

    We never expected that at all.

    The guys that made the chip were like, "that voids the warranty, and we don't advise that", lol.


    There are Silicon Carbide, Gallium Nitride, and Diamond transistors coming onto the market that are good to 1000°C+. When they make chips out of those, we won't need water blocks, we can use molten Tin cooling loops.

    :)
     
    Last edited: Nov 11, 2018
    AceGoober and N4CR like this.
  10. kamikazi

    kamikazi Limp Gawd

    Messages:
    179
    Joined:
    Jan 19, 2006
    Good start. How do we make it work?
     
  11. evilpaul

    evilpaul Limp Gawd

    Messages:
    181
    Joined:
    Dec 31, 2016

    Is this symptomatic of what other people are seeing?
     
    AceGoober likes this.
  12. cybereality

    cybereality [H]ardness Supreme

    Messages:
    4,326
    Joined:
    Mar 22, 2008
    No, I haven't seen that on mine.
     
  13. Slade

    Slade 2[H]4U

    Messages:
    2,539
    Joined:
    Jun 9, 2004
    Have not seen that on mine. My second screen is only a 60 Hz model, which I keep in portrait mode for browsing. NVIDIA has some weird bugs with multi-monitor which may or may not be killing the cards. I know multi-monitor runs ~55W at idle, which is excessive. I have also seen notes that the Edge browser had 2D issues in the past; it could also be driver related and hitting other browsers as well.
     
  14. evilpaul

    evilpaul Limp Gawd

    Messages:
    181
    Joined:
    Dec 31, 2016
    I've had mine get stuck at 1350 MHz and ~70W idle and G-SYNC wouldn't turn back on a few times. It is usually idling at 300 MHz at ~32W. Temps seem to have dropped with the driver update this week. The power consumption numbers are what HWiNFO64 reports.

    The left monitor is 4K 60 Hz. The right one is 1440p running at 144 Hz.
     
    AceGoober and Sprayingmango like this.
  15. lostin3d

    lostin3d [H]ard|Gawd

    Messages:
    1,962
    Joined:
    Oct 13, 2016
    I could easily be wrong, but that sounds like the driver issue that NV recently put a hotfix out for. If so, I wouldn't be surprised if it's still occurring in some situations. What I remember is that it had something to do with two-monitor, G-SYNC and non-G-SYNC combos.
     
  16. lostin3d

    lostin3d [H]ard|Gawd

    Messages:
    1,962
    Joined:
    Oct 13, 2016
    How about:

    "Torturing users, reviewers, vendors, and wallets alike, Turing promises you pay now and play later"
     
    AceGoober and cybereality like this.
  17. schlitzbull

    schlitzbull Limp Gawd

    Messages:
    276
    Joined:
    Feb 19, 2014
    Ugh, mine started locking up like this while playing last night. I didn't get any artifacting or crashes, but brief lockups that definitely seemed different from frame drops. Plus the game I was playing wasn't taxing for the card either. As with the other issues I've had, this was using GameStream to my TV while playing at a locked 60 Hz. I hope your video isn't a sign of things to come for me. At least I haven't sold my 1080 yet.
     
    vjhawk likes this.
  18. 3dfan

    3dfan [H]Lite

    Messages:
    84
    Joined:
    Jun 2, 2016
    to everyone:

    Please excuse me if this is a dumb, not very technical statement, since I may not be as experienced as many users here, or if this was already discussed. I haven't read the whole thread (I just searched for "1709", "1803" and "1809" with no results).

    This video, and another one I recently watched (I don't remember exactly where) in which an RTX user has similar issues in Shadow of the Tomb Raider, with a lot of in-game pauses, reminds me of a test I recently did. On a spare HDD with a clean install of Windows 10 Pro version 1809 (build 17763.55) with the latest updates, I tried NV drivers 416.16, 416.34 and even the latest 416.81 on a system with a GTX 980 Ti, an i7 4770K @ 4.2 GHz and 8 GB of RAM. I saw very similar issues in Rise of the Tomb Raider at full settings, no AA, 1920x1080, DX12 API: the game has a lot of in-game pauses like those in the videos. The interesting thing is that the issue does not occur on a clean installation of Windows 10 Pro version 1709 with the same settings, same hardware, same drivers and same game scene.

    I have the feeling that something may be broken with VRAM management in 1809, since those in-game pauses are typical of a video card running out of VRAM. It would be interesting to know whether any RTX users on 1709 are having all these issues that seem related to VRAM (visual artifacting, pauses, etc.).

    So even though my test was not performed with an RTX card, I think it is worth sharing my experience. Since 1809 is known to be a very rushed update, who knows? It could be a potential culprit in all this RTX mess. In fact, I have experienced other gaming issues with 1809, like in The Evil Within 2, which has random mouse stutters when I play it at 75 fps @ 75 Hz, an issue that also does not occur in 1709.
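
    If anyone wants to compare two Windows builds with numbers instead of by eye, a rough sketch like the one below could count the pauses in a frame-time log. It assumes you already have per-frame times in milliseconds from whatever capture tool you use; the function name, the thresholds and the sample values are all hypothetical and chosen only for illustration, not measured on any card.

```python
from statistics import median


def find_pauses(frame_times_ms, factor=5.0, min_ms=100.0):
    """Return (frame_index, frame_time_ms) for frames that look like pauses.

    A frame counts as a pause when it takes at least `min_ms` and at least
    `factor` times the median frame time. Both thresholds are assumptions.
    """
    if not frame_times_ms:
        return []
    typical = median(frame_times_ms)
    threshold = max(min_ms, factor * typical)
    return [(i, ft) for i, ft in enumerate(frame_times_ms) if ft >= threshold]


if __name__ == "__main__":
    # Made-up sample: steady ~60 fps with one long hitch in the middle.
    sample = [16.7] * 10 + [450.0] + [16.7] * 10
    for index, duration in find_pauses(sample):
        print(f"pause at frame {index}: {duration:.0f} ms")
```

    Running the same game scene on both builds and comparing the pause counts would make the 1709 vs 1809 difference easier to share.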
     
    Last edited: Nov 12, 2018
  19. DrBorg

    DrBorg Gawd

    Messages:
    555
    Joined:
    Jan 22, 2005
    Wouldn't it be funny if Win10 were killing the cards?

    That would be really funny to me; Win10 the eternal Beta OS.

    "We're Winning!!"

    :rofl:
     
    AceGoober, Dayaks and spine like this.
  20. zehoo

    zehoo Limp Gawd

    Messages:
    250
    Joined:
    Aug 22, 2004
    It's stories like this that make me glad I decided to cheap out and stick with my gtx 1080 while waiting for 7nm parts.
     
    dvsman and N4CR like this.
  21. Slade

    Slade 2[H]4U

    Messages:
    2,539
    Joined:
    Jun 9, 2004
    Interesting, running 1803 here. So far still works. Makes me hesitant to go Fall update...
     
  22. bill_d

    bill_d Limp Gawd

    Messages:
    193
    Joined:
    Jun 8, 2007
    I'm on 1809. I upgraded both systems on the first day and could not roll back by the time problems showed up,
    but my 2080 Ti Strix is running fine, and so is my 1080 Ti Strix on the other system.
     
  23. Domingo

    Domingo Skip My Posts

    Messages:
    16,922
    Joined:
    Jul 30, 2004
    Guess I'm extra glad I got an EVGA card rather than an FE.
     
    TahoeDust likes this.
  24. Nytegard

    Nytegard 2[H]4U

    Messages:
    3,087
    Joined:
    Jan 8, 2004
    I'm getting those stutters too in Call of Duty when I have SLI enabled. The game might graphically freeze for 10-20 seconds at a time before returning to normal. With SLI disabled they went away, but I'm hoping this is just a driver problem, as I haven't seen any other issues.
     
  25. Slade

    Slade 2[H]4U

    Messages:
    2,539
    Joined:
    Jun 9, 2004
    So many configurations to consider.
     
  26. mothandras

    mothandras n00b

    Messages:
    32
    Joined:
    Jan 8, 2012
    Looks like Nvidia has pulled the 2080ti FE cards from its own website, they are no longer listed at all. Not notify or anything.
     
  27. Nytegard

    Nytegard 2[H]4U

    Messages:
    3,087
    Joined:
    Jan 8, 2004
    That's not ominous or anything...
     
    AceGoober and cybereality like this.
  28. Digital Viper-X-

    Digital Viper-X- [H]ardForum Junkie

    Messages:
    13,608
    Joined:
    Dec 9, 2000
    Nothing to see here.. move along now
     
    AceGoober, N4CR, Verado and 2 others like this.
  29. Domingo

    Domingo Skip My Posts

    Messages:
    16,922
    Joined:
    Jul 30, 2004
  30. Nytegard

    Nytegard 2[H]4U

    Messages:
    3,087
    Joined:
    Jan 8, 2004
  31. cybereality

    cybereality [H]ardness Supreme

    Messages:
    4,326
    Joined:
    Mar 22, 2008
    That's really suspect. What do they know?
     
  32. FrgMstr

    FrgMstr Just Plain Mean Staff Member

    Messages:
    48,087
    Joined:
    May 18, 1997
    I ordered a thermal imaging camera to look into this a little closer.

    Also, I talked to a couple of people that actually BUILD video cards and I got the same feedback I expected: memory issues. And yes, the card does have Micron on it.
     
    AceGoober, N4CR, Aluminum and 3 others like this.
  33. Mchart

    Mchart 2[H]4U

    Messages:
    3,092
    Joined:
    Aug 7, 2004
    Multi monitor is reported as causing problems and I don't believe the latest update fixed it at all. The only issue fixed is the g-sync related BSOD currently.
     
  34. cjcox

    cjcox [H]ard|Gawd

    Messages:
    1,093
    Joined:
    Jun 7, 2004
    But it was the greatest 2 hours of gaming of my life. Looking forward to Nvidia365 where we can rent our cards.
     
    N4CR and d50man like this.
  35. Slade

    Slade 2[H]4U

    Messages:
    2,539
    Joined:
    Jun 9, 2004
    Guess a bad batch of Micron memory is high on the suspect list, or a cooling implementation failure.
     
  36. Gripen90

    Gripen90 n00b

    Messages:
    30
    Joined:
    Aug 21, 2012
    My Gainward RTX 2080Ti Phoenix Golden Sample finally got returned for RMA, it should arrive at the retailer tomorrow and I am curious if they'll just refund me. They have said that because of the many RTX 2080Ti issues they are currently not restocking those models.

    Also Nvidia are changing the prefix on the cards now.
    [attached image: 713b5d.png]
     
    Legendary Gamer, vjhawk, N4CR and 2 others like this.
  37. rgMekanic

    rgMekanic [H]ard|News Staff Member

    Messages:
    3,741
    Joined:
    May 13, 2013
    2180 Ti++ will be $3k and have flying toasters.
     
    AceGoober and lostin3d like this.
  38. jonneymendoza

    jonneymendoza [H]ardness Supreme

    Messages:
    6,250
    Joined:
    Sep 11, 2004
    so far my 2080 ti has been fine. touch wood!
     
  39. shansoft

    shansoft [H]ardness Supreme

    Messages:
    5,073
    Joined:
    Oct 20, 2008
    Holy FUCK

    My EVGA 2080 Ti XC just burst into flames....

    WHAT THE FUCK....
     

    AceGoober, Fleat, rgMekanic and 6 others like this.
  40. Verado

    Verado Limp Gawd

    Messages:
    162
    Joined:
    May 16, 2017
    Loving this launch!
    Hope my 2070 doesn't catch fire or turn into a classic arcade when I get it next week!
     
    AceGoober, lostin3d and N4CR like this.