Old RTX 2080 Ti FE Meets Replacement RTX 2080 Ti FE

Discussion in '[H]ard|OCP Front Page News' started by Kyle_Bennett, Nov 20, 2018.

  1. Kyle_Bennett

    Kyle_Bennett El Chingón Staff Member

    Messages:
    54,829
    Joined:
    May 18, 1997
    After our first RTX 2080 Ti Founders Edition went into "Space Invaders" mode, and after a bit of testing, we put in for an RMA with NVIDIA, since we did purchase the card. We got our replacement card in on Monday. Our "new" RTX 2080 Ti FE card is equipped with Samsung GDDR6 instead of Micron. Luckily, we bought two 2080 Ti FE cards, and we still have one here with Micron VRAM to test side by side with the new Samsung VRAM card.

    GPU-Z side by side.


    This VRAM change has of course been reported elsewhere, but we can finally confirm it firsthand. The $64K question, of course, is whether this points to the failing 2080 Ti FE cards having a VRAM-related issue. The "Space Invaders" artifacting suggests quite possibly yes, as it looks very much like a VRAM failure, but according to NVIDIA this is just a "Test Escape." We are still digging.

    We are sure, however, that the Micron VRAM is running at a very toasty 86C under a very normal workload on an open test bench this evening.

    FLIR Micron Memory Temp.
     
    Last edited: Nov 20, 2018
  2. Brian_B

    Brian_B [H]ard|Gawd

    Messages:
    1,820
    Joined:
    Mar 23, 2012
    This may have been posted elsewhere; if so, I apologize, I haven't seen it.

    Any chance we could get a thermal shot of the Samsung vs. Micron card under similar conditions? I know you would have to pull the card and re-establish the test case; it's purely a curiosity thing.
     
  3. IKV1476

    IKV1476 Lurker

    Messages:
    257
    Joined:
    Dec 26, 2005
    I believe Kyle_Bennett said he was working on an article about the card failures and has a new thermal imaging camera that will be used to show us this.
     
  4. tunatime

    tunatime 2[H]4U

    Messages:
    3,259
    Joined:
    Sep 15, 2011
    Man, if the first batch of cards shipped with bad RAM, what a disaster that has to be for NVIDIA.
    Also, how long does it take to change the RAM on cards? Kinda makes you wonder if they knew before the release date that they had a potential problem. I imagine it takes at least a few months to find and make board-level changes and get them shipping.
     
  5. gamerk2

    gamerk2 [H]ard|Gawd

    Messages:
    1,397
    Joined:
    Jul 9, 2012
    In theory, the VRAM module should just be a drop-in replacement; NVIDIA has almost certainly standardized the on-board packaging for exactly this reason.
     
  6. focbde

    focbde Limp Gawd

    Messages:
    409
    Joined:
    Jan 31, 2008
    It's pin-identical so no issues there at all.
     
  7. Prisoner849

    Prisoner849 Gawd

    Messages:
    668
    Joined:
    May 5, 2016
    This isn't the first time Micron VRAM has been suspect on an Nvidia product.

    Anyone remember the 1070 overclocking disappointments with people who had Micron-equipped cards? Yes, it's overclocking, and there weren't any failures at stock speeds, but still, it seems to indicate a weak link; the Samsung-equipped cards consistently did much better.

    https://hardforum.com/threads/nvidia-gtx-1070-vram-lottery-micron-or-samsung.1908758/
     
  8. homernoy

    homernoy Limp Gawd

    Messages:
    393
    Joined:
    Jan 31, 2007
    I have a Gigabyte Windforce 2080 Ti. Unfortunately it has Micron VRAM. Hopefully my card doesn't take a shit. So far, no problems to report after extensive 4K gameplay and benchmarking.
     
  9. Slade

    Slade 2[H]4U

    Messages:
    2,456
    Joined:
    Jun 9, 2004
    Both of my cards have Micron. I've had them for 2 and 4 weeks. So far so good. Could be a bad batch of RAM. I know that I can't push the RAM on either card past 575 MHz or it starts showing small artifacts in testing.

    I wonder what the Samsung memory OC limit is.
     
  10. t1k

    t1k n00bie

    Messages:
    41
    Joined:
    Apr 16, 2017
    Thank you for the honest, independent investigation and analysis, Kyle. I saw a user on the NVIDIA forums singling [H] out and accusing it of fearmongering regarding this issue. That must mean you're doing something right (y). I am very curious as to whether there are any clearly distinguishable differences between the two VRAM types.
     
  11. Peter2k

    Peter2k Limp Gawd

    Messages:
    322
    Joined:
    Apr 25, 2016
    While under 90°C might be in spec (I'm guessing), I still don't think any engineer would consider it healthy long term.

    I wonder, does GDDR6 just run that hot, or is it a question of better/proper cooling?

    Also, there might be hardware faults with the Micron RAM that we are not privy to; stuff that goes deeper than just temps.
     
  12. Armenius

    Armenius [H]ardForum Junkie

    Messages:
    16,200
    Joined:
    Jan 28, 2014
    All first run cards have Micron memory because NVIDIA had an exclusive deal with them for this launch, as far as I'm aware.
     
  13. coynatha

    coynatha [H]Lite

    Messages:
    102
    Joined:
    Jun 9, 2004
    Kind of off topic, but this made me think of this thread: I just bought a GTX 1070 on eBay. I'm assuming it was used in cryptomining and has a modded BIOS, so I went looking up the BIOS versions that are out on the interwebs. There's a whole lot of chatter about cards with Micron memory being shit overclockers, and at least Asus and Zotac released BIOS updates to fix instability with Micron memory on the GTX 1070.

    Sorry if this is old news to you guys; I'm coming off a Radeon HD 6900 series card and have been on the sidelines a long time.
    *edit*
    Point being, this isn't NVIDIA's, Micron's, or Samsung's first rodeo, which IMO makes them look even worse.
     
  14. Trepidati0n

    Trepidati0n [H]ardForum Junkie

    Messages:
    12,333
    Joined:
    Oct 26, 2004
    Engineer here... thinking a number is high because it sounds high to you doesn't make it high. Even shit silicon is performance rated for a 105C junction temperature, and that temperature will result in a typical FIT rate. Most FIT rates, even for RAM, are pretty benign for a single part, but in aggregate they can indicate to a manufacturer the warranty and overall failure rates. Regardless, these "burnouts" have little to do with the junction temperature... something else is going on. The only time you need a lower junction temperature than 105C is when your die is so shit it cannot meet its performance requirements and gets binned into the "super shit class" (0-40C ambient).

    Something else is going on here besides the silicon itself... conflating silicon temp and these board failures may end up leading you down the wrong rabbit hole. I do not believe what is happening is intentional, but it could be a culmination of multiple factors resulting in "uh oh".
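    To put the FIT point in numbers: one FIT is one failure per 10^9 device-hours, so a per-part rate only means something once it is multiplied across a fleet and a service period. Temperature dependence is commonly modeled with the Arrhenius equation. This is only an illustrative sketch: the 50 FIT rating, the 0.7 eV activation energy, and the 100,000-card fleet are assumed numbers, not Micron, Samsung, or NVIDIA figures, and the function names are hypothetical.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def expected_failures(fit: float, devices: int, hours: float) -> float:
    """Expected failures for a fleet: FIT counts failures per 1e9 device-hours."""
    return fit * devices * hours / 1e9


def arrhenius_af(t_low_c: float, t_high_c: float, ea_ev: float = 0.7) -> float:
    """How many times faster parts fail at t_high_c than at t_low_c.

    Arrhenius model; ea_ev = 0.7 eV is an ASSUMED activation energy,
    not a published value for any GDDR6 part.
    """
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k))


if __name__ == "__main__":
    fit_at_105c = 50.0              # ASSUMED per-chip rating at 105C junction
    af = arrhenius_af(86.0, 105.0)  # 86C is the FLIR reading from the first post
    fit_at_86c = fit_at_105c / af
    chips = 100_000 * 11            # ASSUMED 100k cards, 11 GDDR6 chips per 2080 Ti
    hours_per_year = 24 * 365
    print(f"acceleration factor 86C -> 105C: {af:.2f}")
    print(f"chip failures/year at 105C: {expected_failures(fit_at_105c, chips, hours_per_year):.0f}")
    print(f"chip failures/year at 86C:  {expected_failures(fit_at_86c, chips, hours_per_year):.0f}")
```

    With these assumed inputs the acceleration factor between 86C and 105C comes out at roughly 3. The model only describes gradual wear-out at a given junction temperature, so real vendor FIT data would be needed before drawing any conclusion about these cards.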
     
    Last edited: Nov 20, 2018
  15. NukeDukem

    NukeDukem 2[H]4U

    Messages:
    2,215
    Joined:
    Feb 15, 2011
    My replacement for my ZOTAC BRICK EDITION 2080 Ti comes today; crossing my fingers for some Samsung memory.
     
  16. phawkins633

    phawkins633 [H]Lite

    Messages:
    71
    Joined:
    Jan 26, 2013
    I have a Zotac 2080 Ti AMP that also has Micron RAM. As these problems are primarily on the FE cards, especially the "Space Invaders" effect, I too am crossing my fingers/toes/eyes... but so far it's a shit-ripper, and BF5 at 3440x1440 ultra runs smooth as butter on my Predator G-Sync monitor. I'm still following this pretty closely, though... yikes!!
     
  17. Creig

    Creig Gawd

    Messages:
    781
    Joined:
    Sep 24, 2004
    NVIDIA sure likes to run the memory toasty warm on their cards.

    01-PCB.jpg
     
    Regrettably, a FLIR shot of the Micron memory may not reveal its weakness. It could be how the memory is patterned internally: different silicon, where a single trace or two could be causing the failure.

    So far all the major manufacturers are still reporting 1% return rates, which is normal. This might all be negativity bias: people who have problems are more likely to report them than people who don't, so the picture skews negative. I'm not a fan of NVIDIA, but I think it is too early to tell; the jury is still out. If there is a large defect return rate, they will have a legal obligation to present those losses to investors next quarter in the form of write-downs (GAAP rules).

    That said, the story should not be ignored and more data is always useful and appreciated at this early stage.
     
    Last edited by a moderator: Nov 20, 2018
  19. TheMadHatterXxX

    TheMadHatterXxX 2[H]4U

    Messages:
    2,802
    Joined:
    Sep 7, 2004
    How do you find out what RAM is on the card without taking it apart?
     
  20. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    13,485
    Joined:
    Oct 7, 2000
    GPU-Z shows it.
     
  21. lightsout

    lightsout Gawd

    Messages:
    569
    Joined:
    Mar 15, 2014
    Kyle_Bennett, any truth to the new card having a darker box?
     
    Yes and no. Resistance is a function of voltage with silicon. And modern-day memory uses LESS VOLTAGE, so you shouldn't be seeing high temps. You might fall out of the operating window with anything above 85-90C. The 105C limit is just where silicon actually starts to degrade, not just fail. That's a huge distinction. For example, my chips are spec'd to run at 16V. I can run them up to 18V at 100C. If I go over that, I will damage them.

    Cooler is always better for reliability (to a certain point). That's why hardcore overclockers use LN2.

    The only way to tell for sure is to have engineering test samples with test probes hooked up to the various lines to see where they are failing.
     
    Last edited by a moderator: Nov 20, 2018
  23. Brahmzy

    Brahmzy [H]ardness Supreme

    Messages:
    4,947
    Joined:
    Sep 9, 2004
    95C is the high side of spec.
    All the EVGA cards are running in the 80s and low 90s.
     
  24. The Mad Atheist

    The Mad Atheist Gawd

    Messages:
    639
    Joined:
    Mar 9, 2018
    Oh god, those toasty RAM chips remind me of slapping chip-sinks on video cards back in the day.
    Hopefully they still sell them and there's room on the cards.
     
  25. GoodBoy

    GoodBoy [H]ard|Gawd

    Messages:
    1,071
    Joined:
    Nov 29, 2004
    A lot of armchair experts here... "well it might be this!?!" "It might be that!?!"

    They used Micron for the first production run. It's a coincidence that the bad cards all have Micron, because they ALL have Micron, and more recent production has something else because that is what was available. Making the leap that Micron RAM is a causal factor isn't backed by any facts that I am aware of...

    This is much more plausible; however, I think it is other components failing, not the RAM. The card that caught fire had the issue at the very back end of the card, and there is no RAM there. We do not have enough evidence to know which exact parts are the cause, but the one pinpointed failure, the fire, was not a RAM module. All this speculation about the RAM is lemmings falling down a rabbit hole...

    "Hey yeah, it could be.."

    "Oh yeah, this might be.."

    blah blah bullshit
     
  26. Kyle_Bennett

    Kyle_Bennett El Chingón Staff Member

    Messages:
    54,829
    Joined:
    May 18, 1997
    I have been speaking to companies that manufacture video cards, and every single one that I have discussed this with has said they believe it to be a RAM failure. You can call that a leap if you want.
     
  27. Stoly

    Stoly [H]ardness Supreme

    Messages:
    6,016
    Joined:
    Jul 26, 2005
    I did like the "Space Invaders" screen. :D:D
     
  28. Trepidati0n

    Trepidati0n [H]ardForum Junkie

    Messages:
    12,333
    Joined:
    Oct 26, 2004
    I'm sorry, but pretty much everything you just stated sounds smart enough to be right but is wrong on so many levels. I don't even know where to start with the "resistance (sic) is a function of voltage with silicon" statement.
     
  29. https://www.allaboutcircuits.com/textbook/direct-current/chpt-12/temperature-coefficient-resistance/

    As you were saying? While the resistance of silicon can decrease with temperature, the resistance of the interconnect points, where it isn't silicon, can increase dramatically, and that creates issues. It leads to things like signal reflection. But I should have said "the die" rather than "the silicon," which would have been technically more correct.
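    The linked page covers the standard linear model, R = R_ref * (1 + alpha * (T - T_ref)). A minimal sketch of what that means for a copper interconnect, assuming the commonly tabulated copper coefficient of about 0.004041 per degree C at 20C and a hypothetical 0.1 ohm trace; the function name is illustrative only. Semiconductors such as silicon can have a negative coefficient, which the same formula expresses with alpha below zero.

```python
def resistance_at(r_ref: float, alpha: float, t_c: float, t_ref_c: float = 20.0) -> float:
    """Linear temperature-coefficient model: R = R_ref * (1 + alpha * (T - T_ref))."""
    return r_ref * (1.0 + alpha * (t_c - t_ref_c))


# Commonly tabulated coefficient for copper near 20 degC, per degC.
ALPHA_COPPER = 0.004041

if __name__ == "__main__":
    r_20c = 0.100  # HYPOTHETICAL trace resistance in ohms at 20 degC
    for temp_c in (20.0, 60.0, 86.0, 105.0):
        r = resistance_at(r_20c, ALPHA_COPPER, temp_c)
        change_pct = (r / r_20c - 1.0) * 100.0
        print(f"{temp_c:5.1f} C: {r:.4f} ohm ({change_pct:+.1f}%)")
```

    In this linear model copper gains roughly 27% resistance between 20C and 86C. Whether a change of that size causes signal-integrity problems on a given board is something the model alone cannot answer.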
     
  30. Kyle_Bennett

    Kyle_Bennett El Chingón Staff Member

    Messages:
    54,829
    Joined:
    May 18, 1997
    Not sure what that points to, but yes.

    IMG_20181120_114249.jpg
     
  31. GoodBoy

    GoodBoy [H]ard|Gawd

    Messages:
    1,071
    Joined:
    Nov 29, 2004
    This is the kind of evidence I like...

    Thanks Kyle :)
     
  32. lightsout

    lightsout Gawd

    Messages:
    569
    Joined:
    Mar 15, 2014
    Not sure either. Just saw it elsewhere. Maybe that differentiates between the revised cards?
     
  33. DNMock

    DNMock Limp Gawd

    Messages:
    278
    Joined:
    Apr 16, 2015
    https://www.quora.com/What-is-the-relationship-of-temperature-with-voltage-and-current

    edit: I always hated that crap in physics; I personally have no idea, just casually throwing gas on the fire for my viewing pleasure.
     
  34. velusip

    velusip [H]ard|Gawd

    Messages:
    1,496
    Joined:
    Jan 24, 2005
    Mike vs Sam: Reloaded.
     
  35. Kyle_Bennett

    Kyle_Bennett El Chingón Staff Member

    Messages:
    54,829
    Joined:
    May 18, 1997
    Samsung replacement card serial number: 0324518088XXX
    Micron RMA'd card serial number: 0323918014XXX

    Looking at the BIOS versions on the cards, you have to wonder if the Micron card has a beta BIOS on it: 90.02.0B.00.0E. The Samsung card's BIOS version is all numbers...
     
  36. Todd Walter

    Todd Walter Gawd

    Messages:
    528
    Joined:
    May 10, 2016
  37. nEo717

    nEo717 Limp Gawd

    Messages:
    242
    Joined:
    Jun 2, 2017
    lol, didn't read the site with NVFlash... it's habit; I rarely read any forum other than HardOCP.

    The bios on your replacement supports:

    Memory Support:
    GDDR6, Samsung
    GDDR6, Micron
    GDDR6, Hynix
     
  38. Galvin

    Galvin 2[H]4U

    Messages:
    2,497
    Joined:
    Jan 22, 2002
    What's the best way to test the memory? Got a 2080 Ti XC from EVGA, and it has Micron RAM. Played World of Warcraft yesterday, no issues.
     
  39. Kyle_Bennett

    Kyle_Bennett El Chingón Staff Member

    Messages:
    54,829
    Joined:
    May 18, 1997
    Not sure there is a way to "test" the memory. I did an 8-hour Heaven stress test with my card when I installed it, no issues. Just used it for gaming from then on. I have not seen any reports of anyone being able to diagnose or predict an impending failure.
     
  40. maxius

    maxius 2[H]4U

    Messages:
    3,279
    Joined:
    Dec 17, 2001
    Watch, you'll release the magic smoke again.
     