GeForce RTX 2080 Ti FAILS After Gaming for 2 Hours @ [H]

Discussion in 'nVidia Flavor' started by FrgMstr, Nov 9, 2018.

  1. cybereality

    cybereality [H]ardness Supreme

    Messages:
    4,273
    Joined:
    Mar 22, 2008
    It took me a second to get it.
     
  2. Mchart

    Mchart 2[H]4U

    Messages:
    3,054
    Joined:
    Aug 7, 2004
    I'm on week two of my FE card still running fine. Hoping it stays that way, but who knows.
     
  3. CoreStoffer

    CoreStoffer Limp Gawd

    Messages:
    213
    Joined:
    Nov 20, 2008
    Damn. Which Asus model is that? A vanilla or one of the ROG STRIX?
     
  4. doox00

    doox00 2[H]4U

    Messages:
    2,891
    Joined:
    Aug 28, 2005
    It is the Dual-RTX2080TI-011G. That was a little over a month ago; I had a bunch of nowinstock alerts set at the time and was able to snag one of these for the minute it was in stock at Newegg.
     
    AceGoober and CoreStoffer like this.
  5. CoreStoffer

    CoreStoffer Limp Gawd

    Messages:
    213
    Joined:
    Nov 20, 2008
    Thanks for your reply. I think that card is using a reference PCB, unlike the Asus ROG STRIX series. I know it has a different cooler setup, but probably still nVidia reference PCB.
     
  6. Ranger101

    Ranger101 [H]Lite

    Messages:
    75
    Joined:
    Sep 11, 2015
    Agreed, both Kyle and Steve deserve RESPEK for their combative attitude towards corporate BS.
     
    GHRTW, Sharps97, Starrbuck and 3 others like this.
  7. Gripen90

    Gripen90 n00b

    Messages:
    30
    Joined:
    Aug 21, 2012
    Well, my Gainward RTX 2080 Ti Phoenix Golden Sample, which I got Monday this week, also decided it wanted to be RMA'ed. Today when I wanted to game, it suddenly displayed flashing star-like artifacts in all different colours, looking like something on a Christmas tree. Afterward the image would stutter to a halt, then the screen went black, and only a hard reset would get me back into Windows. The card has now been packed and I'm awaiting an RMA number....

    My wife's MSI RTX 2080 Duke OC, knock on wood, still lives.
     
    AceGoober and ltron like this.
  8. SixFootDuo

    SixFootDuo [H]ardness Supreme

    Messages:
    5,441
    Joined:
    Oct 5, 2004
    Both my friend and I are on our 2nd 2080 Ti ... I sold my first system I built a few weeks ago. I gamed the hell out of that card and have gamed the hell out of this one as well.

    I have many, many, many hours "overclocked" on both. The 2nd card, the past 2+ weeks ... 6 to 8 hours a day of Black Ops 4 Blackout.

    The good news is, you guys will get new cards. The bad news, it's gonna be a hassle or two.

    Good luck.
     
    Last edited: Nov 10, 2018
  9. Philairflow

    Philairflow n00b

    Messages:
    15
    Joined:
    Sep 12, 2017
    This is so sad... And despite the absurd pricing it is hard to get a good custom like EVGA FTW or Asus Strix here in Germany...
     
  10. Chris_B

    Chris_B [H]ardness Supreme

    Messages:
    5,048
    Joined:
    May 29, 2001
  11. ebduncan

    ebduncan [H]ard|Gawd

    Messages:
    1,995
    Joined:
    Feb 1, 2008
    I don't really see much of an issue, as long as they honor their warranties. Probably some sort of electromigration issue; hopefully a slight revision in the stepping will solve these issues. Either way, it sucks for owners who have to deal with the RMA process on their new $1000+ cards, and really sucks for the guys who are slapping water blocks on day one who are just SOL (although some places still honor those warranties).
     
    N4CR likes this.
  12. Digital Viper-X-

    Digital Viper-X- [H]ardForum Junkie

    Messages:
    13,579
    Joined:
    Dec 9, 2000
    Electromigration after a few hours?
     
  13. Aireoth

    Aireoth 2[H]4U

    Messages:
    2,305
    Joined:
    Oct 12, 2005
    If the numbers pan out in line with [H]'s poll (yes I know all the issues with the small poll) and failure is around the 10% mark, after a few hours use, then it speaks more to a manufacturing issue of some kind (GDDR, PCB, thermals).
     
  14. Comixbooks

    Comixbooks Ignore Me

    Messages:
    12,501
    Joined:
    Jun 7, 2008
    I had two lockups with Hunt Showdown. A Crytek rep from Korea on Steam told me to disable startup programs while the game was running, which is an old PC trick back from the Ultima Online or Quake days.
     
    AceGoober and geok1ng like this.
  15. ebduncan

    ebduncan [H]ard|Gawd

    Messages:
    1,995
    Joined:
    Feb 1, 2008
    Ya, it's very possible if the traces are not of the proper size in some parts. Failures like this are very common on early die revisions. I wonder what stepping they are on. A, B, C? Most A dies have issues like this, and they are fixed with a major B stepping.
     
    N4CR likes this.
  16. Digital Viper-X-

    Digital Viper-X- [H]ardForum Junkie

    Messages:
    13,579
    Joined:
    Dec 9, 2000
    I'm fairly certain you won't get electromigration after a few hours.
     
    DrBorg likes this.
  17. Fragment

    Fragment n00b

    Messages:
    6
    Joined:
    Aug 5, 2017
    I logged in after a long time just to say that this statement made me ... for the first time in my life ... spill out coffee over my keyboard.

    edit: On another note... who is "HotHardware" anyways, I scrolled through 2 years of their videos on YT to be greeted by video screenshots that basically cover everything about NVIDIA ... and when it comes to AMD they have 1-2 vids about Ryzen CPUs and some review of a workstation PRO card.
    Nothing about Vega, Polaris etc.. no gaming stuff. Are they strongly affiliated with NV or is that just a very strange coincidence??
     
    Last edited: Nov 10, 2018
    N4CR likes this.
  18. DocNo

    DocNo Gawd

    Messages:
    656
    Joined:
    Apr 23, 2012
    Ouch! Failures on new kit are never fun. I had an MSI 970 that was notorious for failures; after the 2nd RMA I just chose to write it off :(

    I nabbed an Asus 1080 Ti factory overclock for MSRP on launch last year, and I was a little wary of being in the first wave of a new generation. There are enough of these stories with the 2080s, and the performance differences are small enough, that I feel no need to replace my 1080 Ti.

    I sure wish I would have sold it to the cryptominers when they were going for over $1K - oh well...
     
  19. Patton187

    Patton187 Gawd

    Messages:
    678
    Joined:
    Feb 12, 2012
    Dilly dilly
     
  20. NKD

    NKD [H]ardness Supreme

    Messages:
    7,447
    Joined:
    Aug 26, 2007
    This dude had row MSI died lol and the comment right below it had a Gigabyte die. Maybe Nvidia just rushed the big ass die? Hopefully everyone who spent a shitload of money on these cards is taken care of.


     
    Solhokuten likes this.
  21. Brent_Justice

    Brent_Justice [H] Video Card Managing Editor Staff Member

    Messages:
    17,794
    Joined:
    Apr 17, 2000
    No, and there is a theory that maybe custom PCBs fare better in regards to these issues. It may be the reference PCB design that has issues, the PCB itself, or the memory. If it's a custom PCB like on the ASUS STRIX, issues don't seem to occur (to my knowledge), but this is all a theory at this point.
     
  22. ebduncan

    ebduncan [H]ard|Gawd

    Messages:
    1,995
    Joined:
    Feb 1, 2008
    Then you don't understand what electromigration is. You are just confused because most retail parts don't have these issues, as they have been tested and revised to avoid them. Electromigration can happen at any point of a product's lifespan. If the die is flawed from the start you can see failure on first use or hours after.
     
    AceGoober and N4CR like this.
  23. sk3tch

    sk3tch [H]ard|Gawd

    Messages:
    1,448
    Joined:
    Sep 5, 2008
    2080 Ti FE running since 10/5 with tons of multi-hour sessions of ARK: Survival Evolved under my belt. No issues.

    Been playing a ton of BFV this weekend, too - no issues.
     
    Last edited: Nov 11, 2018
  24. N4CR

    N4CR 2[H]4U

    Messages:
    3,466
    Joined:
    Oct 17, 2011
    This. Most hardware failures occur in the first two weeks, look at bell curves.
     
    AceGoober likes this.
  25. bill_d

    bill_d Limp Gawd

    Messages:
    193
    Joined:
    Jun 8, 2007
    well after a little over week no problem with my Strix yet

    knock on wood and everything
     
    AceGoober and CoreStoffer like this.
  26. DrBorg

    DrBorg Gawd

    Messages:
    563
    Joined:
    Jan 22, 2005
    The internal metallization layers are susceptible to electromigration, physical traces are not, except at Extreme current densities; and 100A at 1.4V won't really electromigrate except as a cloud of copper vapor, if you're talking about interposer or PCB traces.

    The power traces should be on a whole LAYER, called a Power Plane; they're connected by at least 1 via per pin to the die, vertically thru the PCB.

    The internal metallization is, but the current densities on a 100 micron deposited metal interconnect are pretty extreme. :)

    But this would be a hard fail; like poof, it's gone, and shorting a power supply rail due to metal vapor all over the chip, and volcanoing the chip. (the hole in the chip looks like a volcano; btdt) (I can't find a reference to the term, must be an Older American term.)


    WTF? Hardware failures are Not a Bell curve, it's called a Bathtub Curve, and the lower end is "Infant Mortality", the upper end is "Wearout Phenomenon", which includes electromigration as a cause.

    https://en.wikipedia.org/wiki/Bathtub_curve
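
    A rough sketch of what that curve looks like, for anyone who wants to play with it. This is only an illustration: the usual textbook way to get a bathtub shape is to add a falling Weibull hazard (infant mortality), a constant random-failure rate, and a rising Weibull hazard (wearout). Every shape, scale, and rate number below is made up for the example and has nothing to do with real 2080 Ti failure data.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_hours):
    """Illustrative bathtub-shaped failure rate (failures per hour) at age t_hours.

    All parameters are hypothetical placeholders, not measured values.
    """
    # shape < 1: hazard falls with age (infant mortality)
    infant = weibull_hazard(t_hours, shape=0.5, scale=1000.0)
    # constant background rate of random failures
    random_failures = 1e-4
    # shape > 1: hazard rises with age (wearout, e.g. electromigration)
    wearout = weibull_hazard(t_hours, shape=5.0, scale=50000.0)
    return infant + random_failures + wearout


if __name__ == "__main__":
    for hours in (10, 100, 5000, 50000, 100000):
        print(f"{hours:>7} h  hazard = {bathtub_hazard(hours):.6f}")
```

    With these placeholder numbers the rate starts high, drops to a flat bottom for most of the part's life, and climbs again late, which is the "tub".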

    If that's a Ham license as a name, you need to retake some tests... :)


    Electromigration in a new product is referred to by the Good Engineers in the group as "Lightbulb Effect", and you'll be needing a new job after one of those designs hit the field, lol.

    Disclaimer: I design Electronics...

    The heat that was shown in one of the thermal images shows it's the actual memory parts that are overheating. 90°C is not an operating temp for a memory chip for very long; memory errors are almost certain at those temps.

    I looked up the datasheet; there's some serious bogosity there. Micron is saying the Case temperature is ok at 95°C, but the maximum Storage temperature is 125°C.

    If the case is at 95 degrees, and the max is 125, that's BS; there's no way.

    The datasheet says there's an ON DIE temperature sensor; anyone got data from one, on a card that's failing?
     
  27. reaper12

    reaper12 2[H]4U

    Messages:
    2,206
    Joined:
    Oct 21, 2006
    Don't know about the USA, but in Europe guys who are returning their 2080 Tis are being told that the delays are because of unusually high return rates.
     
    AceGoober, jnemesh and cybereality like this.
  28. DrBorg

    DrBorg Gawd

    Messages:
    563
    Joined:
    Jan 22, 2005
    Maybe because their consumer protection laws don't permit lying to their customers?

    If you cut out the lies, there wouldn't be any commercials here. :)
     
  29. Redmud

    Redmud [H]Lite

    Messages:
    105
    Joined:
    Jan 30, 2007
    My understanding is that the EVGA XC (and XC Ultra) cards as well as the MSI Ventus cards use the reference boards. Has anyone found where these are failing?
     
    AceGoober likes this.
  30. Digital Viper-X-

    Digital Viper-X- [H]ardForum Junkie

    Messages:
    13,579
    Joined:
    Dec 9, 2000
    Doesn't electromigration happen over time, as in eventually, as a result of actually running the IC?
     
  31. N4CR

    N4CR 2[H]4U

    Messages:
    3,466
    Joined:
    Oct 17, 2011
    Whoops, similar curve just inverted aka wrong name, I don't do mathsy graph distribution stuff like that much for my area of work. Pretty clear what I meant though, most failures within first two weeks would not be a bell curve now, would it..
    Anyway, that's what I was told by a large manufacturer of laptop systems when I worked for them and also handled warranty + insurance claims, I have walked the walk over multiple launches, not just engineered it ;)


    Could storage temp be higher because the memory doesn't have to reliably store data at that temp? Is it possible that it can run hotter, but the hotter it runs the more errors it may have?
    To me the behaviour looks like what happens when you OC memory too much.
     
  32. DrBorg

    DrBorg Gawd

    Messages:
    563
    Joined:
    Jan 22, 2005
    Storage temperature is a function of the silicon; over the storage temp (125C) it starts rediffusing the N and P type material back into random silicon.

    I've run 2N3055 transistors at 185 degrees C, and after a while, they aren't transistors anymore; at least, they won't turn OFF anymore. :)

    That's why the really high power devices for Radio transmitters are still Toobz. :) Water Cooled Tubes, but still vacuum tubes.

    The package temp is always lower than the core temp, and I'd love to know what the interior temp is when the outside is at 90C. :D

    This video shows WTF 90C I'm talking about:


    Also realize: this is the back side of the card, and there's a heat sink on the other side of the die; this is thru the PCB. :O

    The max temp is a function of the complexity of the chip; finer architecture is more sensitive.

    I'd bet memory is at the bleeding edge of complexity, like microprocessors.


    I did find a weird effect, tho: a Flash-based Digital Potentiometer we used was suspected of a problem, so we tested it to failure in an environmental chamber.

    It would fail after ~1M write cycles, and fail to overwrite 1's. We found by accident that heatsoaking it at 160C for an hour allowed us to write to it another 1M times, and it could be fixed multiple times.

    We never expected that at all.

    The guys that made the chip were like, "that voids the warranty, and we don't advise that", lol.


    There are Silicon Carbide, Gallium Nitride, and Diamond transistors coming onto the market that are good to 1000°C+. When they make chips out of those, we won't need water blocks, we can use molten Tin cooling loops.

    :)
     
    Last edited: Nov 11, 2018
    AceGoober and N4CR like this.
  33. kamikazi

    kamikazi Limp Gawd

    Messages:
    166
    Joined:
    Jan 19, 2006
    Good start. How do we make it work?
     
  34. evilpaul

    evilpaul Limp Gawd

    Messages:
    181
    Joined:
    Dec 31, 2016

    Is this symptomatic of what other people are seeing?
     
    AceGoober likes this.
  35. cybereality

    cybereality [H]ardness Supreme

    Messages:
    4,273
    Joined:
    Mar 22, 2008
    No, I haven't seen that on mine.
     
  36. Slade

    Slade 2[H]4U

    Messages:
    2,543
    Joined:
    Jun 9, 2004
    Have not seen that on mine. My 2nd screen is only a 60 Hz model, which I keep in portrait mode for browsing. Nvidia has some weird bugs with multi-monitor which may or may not be killing the cards. I know multi-monitor runs ~55W idle, which is excessive. I have also seen notes that the Edge browser had 2D issues in the past; it could also be driver related and hitting other browsers as well.
     
  37. evilpaul

    evilpaul Limp Gawd

    Messages:
    181
    Joined:
    Dec 31, 2016
    I've had mine get stuck at 1350 MHz and ~70W idle and G-SYNC wouldn't turn back on a few times. It is usually idling at 300 MHz at ~32W. Temps seem to have dropped with the driver update this week. The power consumption numbers are what HWiNFO64 reports.

    The left monitor is 4K 60 Hz. The right one is 1440p running at 144 Hz.
     
    AceGoober and Sprayingmango like this.
  38. lostin3d

    lostin3d [H]ard|Gawd

    Messages:
    1,920
    Joined:
    Oct 13, 2016
    I could easily be wrong, but that sounds like the driver issue that NV recently put a hotfix out for. If so, I wouldn't be surprised if there are still some situations where it occurs. What I remember is that it had something to do with 2-monitor, G-SYNC and non-G-SYNC combos.
     
  39. lostin3d

    lostin3d [H]ard|Gawd

    Messages:
    1,920
    Joined:
    Oct 13, 2016
    How about:

    "Torturing users, reviewers, vendors, and wallets alike, Turing promises you pay now and play later"
     
    AceGoober and cybereality like this.
  40. schlitzbull

    schlitzbull Limp Gawd

    Messages:
    270
    Joined:
    Feb 19, 2014
    Ugh, mine started locking up like this while playing last night. I didn't get any artifacting or crashes, but brief lockups that definitely seemed different from frame drops. Plus the game I was playing wasn't taxing for the card either. As with the other issues I've had, this was using GameStream to my TV while playing at a locked 60 Hz. I hope your video isn't a sign of things to come for me. At least I haven't sold my 1080 yet.
     
    vjhawk likes this.