Old RTX 2080 Ti FE Meets Replacement RTX 2080 Ti FE

Discussion in 'HardForum Tech News' started by FrgMstr, Nov 20, 2018.

  1. FrgMstr

    FrgMstr Just Plain Mean Staff Member

    Messages:
    47,992
    Joined:
    May 18, 1997
    After our first RTX 2080 Ti Founders Edition went into Space Invaders mode, after a bit of testing, we put in with NVIDIA for an RMA since we did purchase the card. We got our replacement card in on Monday. Our "new" RTX 2080 Ti FE card has been equipped with Samsung GDDR6 instead of Micron. Luckily, we bought two 2080 Ti FE cards, and still have one here with Micron VRAM to test side by side with the new Samsung VRAM card.

    GPUz Side by Side.


    This VRAM change has of course been reported elsewhere, but we can now confirm it firsthand. The $64K question, of course, is whether this points to the failing 2080 Ti FE cards being a VRAM-related issue. Space Invaders suggests quite possibly yes, as it looks very much like a VRAM failure, but according to NVIDIA this is just a "Test Escape." We are still digging.

    We are however sure that the Micron VRAM is running at a very toasty 86C under a very normal workload on an open test bench this evening.

    FLIR Micron Memory Temp.
     
    Last edited: Nov 20, 2018
    eclypse, Frobozz, Armenius and 5 others like this.
  2. Brian_B

    Brian_B 2[H]4U

    Messages:
    2,494
    Joined:
    Mar 23, 2012
    It may be posted elsewhere; if it is, I apologize, I haven't seen it.

    Any chance we could get a thermal shot of the Samsung vs Micron under similar conditions? I know you would have to pull the card and re-establish the test case, it's purely a curiosity thing.
     
  3. tunatime

    tunatime 2[H]4U

    Messages:
    2,844
    Joined:
    Sep 15, 2011
    Man, if the first batch of cards shipped with bad RAM, what a disaster that has to be for Nvidia.
    Also, how long does it take to change the RAM out on cards? Kinda makes you wonder if they knew before the release date that they had a potential problem. I imagine it takes at least a few months to find and make board-level changes and get them shipping.
     
  4. gamerk2

    gamerk2 [H]ard|Gawd

    Messages:
    1,547
    Joined:
    Jul 9, 2012
    In theory, the VRAM module should just be a drop-in replacement; NVIDIA has almost certainly standardized the on-board packaging for exactly this reason.
     
    AceGoober and Armenius like this.
  5. focbde

    focbde Gawd

    Messages:
    546
    Joined:
    Jan 31, 2008
    It's pin-identical so no issues there at all.
     
    AceGoober and Armenius like this.
  6. Prisoner849

    Prisoner849 Gawd

    Messages:
    683
    Joined:
    May 5, 2016
    This isn't the first time Micron VRAM has been suspect on an Nvidia product.

    Anyone remember the 1070 overclocking disappointments with people who had Micron equipped cards? Yes, it's overclocking and there weren't any failures at stock speeds, but still, it seems to indicate a weak link; the Samsung equipped cards consistently did much better.

    https://hardforum.com/threads/nvidia-gtx-1070-vram-lottery-micron-or-samsung.1908758/
     
    Revdarian, SickBeast, Mav451 and 2 others like this.
  7. homernoy

    homernoy Limp Gawd

    Messages:
    440
    Joined:
    Jan 31, 2007
    I have a Gigabyte Windforce 2080ti. Unfortunately it has Micron v-ram. Hopefully my card doesn't take a shit. So far, no problems to report after extensive 4k gameplay, and benchmarking.
     
  8. Slade

    Slade 2[H]4U

    Messages:
    2,539
    Joined:
    Jun 9, 2004
    Both cards micron. Been at 2 and 4 weeks for my cards. So far so good. Could be a bad batch of ram. I know that I can't push the ram on either card past 575mhz or it starts getting small artifacts in testing.

    I wonder what the samsung mem oc limit is.
     
  9. t1k

    t1k n00b

    Messages:
    41
    Joined:
    Apr 16, 2017
    Thank you for the honest, independent investigation and analysis, Kyle. I saw a user on the NVIDIA forums singling [H] out and accusing it of fearmongering regarding this issue. That must mean you're doing something right (y). I am very curious as to whether there are any clearly distinguishable differences between the two VRAM types.
     
    Revdarian, jmilcher, N4CR and 7 others like this.
  10. Peter2k

    Peter2k Limp Gawd

    Messages:
    309
    Joined:
    Apr 25, 2016
    While under 90° might be in spec (I'm guessing), I still don't think any engineer would consider it healthy long term.

    I wonder, does GDDR6 just run that hot or is it a question of better/proper cooling?

    Also, there might be hardware faults we are not privy to with the Micron RAM; stuff that goes deeper than just temps.
     
  11. Armenius

    Armenius I Drive Myself to the [H]ospital

    Messages:
    16,824
    Joined:
    Jan 28, 2014
    All first run cards have Micron memory because NVIDIA had an exclusive deal with them for this launch, as far as I'm aware.
     
  12. coynatha

    coynatha Limp Gawd

    Messages:
    230
    Joined:
    Jun 9, 2004
    Kind of off topic but made me think of this thread - I just bought a GTX 1070 on eBay, and I'm assuming it was used in cryptomining, and I'm assuming it has a modded bios. So I went looking up the bios versions that are out on the interwebs. There's a whole lot of chatter of cards with Micron memory being shit overclockers, and at least Asus and Zotac released BIOS updates to fix instability with Micron memory on the GTX 1070.

    Sorry if this is old news to you guys, I'm coming off of a Radeon HD6900 series, been on the sidelines a long time.
    *edit*
    Point being, this isn't nVidia/Micron/Samsung's first rodeo. Which IMO gives them even worse visibility.
     
  13. Trepidati0n

    Trepidati0n [H]ardForum Junkie

    Messages:
    8,810
    Joined:
    Oct 26, 2004
    Engineer here.....thinking a number is high because it sounds high to you doesn't make it high. Even shit silicon is performance rated for a 105 deg C junction temperature, and that temperature will result in a typical FIT rate. Most FIT rates, even for RAM, are pretty benign in a singular sense, but in aggregate they can indicate to a manufacturer the warranty and overall failure rates. Regardless, these "burnouts" have little to do with the junction temperature...something else is going on. The only time you need a lower junction temperature than 105C is when your die is so shit it cannot meet its performance requirements and gets binned into "super shit class" (0-40C ambient).

    Something else is going on here besides the silicon itself.....conflating silicon temp and these board failures may end up leading you down the incorrect rabbit hole. I do not believe what is happening is intentional, but it could be a culmination of multiple factors resulting in "uh oh".
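
For readers unfamiliar with the term: FIT (failures in time) is conventionally defined as one failure per 10^9 device-hours. The sketch below only illustrates how a small per-chip rate adds up across a large fleet. The FIT value, fleet size, chips-per-card count, and hours are hypothetical placeholders, not figures from NVIDIA, Micron, or this thread.

```python
# Minimal sketch: turning a per-device FIT rate into expected fleet failures.
# 1 FIT = 1 failure per 10^9 device-hours (standard definition).
HOURS_PER_FIT_UNIT = 1e9


def expected_failures(fit, devices, hours):
    """Expected failure count for `devices` parts, each running `hours` hours."""
    return fit * devices * hours / HOURS_PER_FIT_UNIT


# Hypothetical inputs, chosen only for illustration.
assumed_fit_per_chip = 50        # assumption, not a datasheet value
assumed_chips_per_card = 11      # assumption about the memory chip count
assumed_cards_sold = 100_000     # assumption
assumed_hours_per_year = 2_000   # assumption: roughly 5.5 hours a day

chips = assumed_chips_per_card * assumed_cards_sold
print(expected_failures(assumed_fit_per_chip, chips, assumed_hours_per_year))
```

A single chip at this assumed rate is very unlikely to fail in a year, while the fleet-wide total is what a manufacturer would budget warranty costs against.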
     
    Last edited: Nov 20, 2018
  14. phawkins633

    phawkins633 [H]Lite

    Messages:
    76
    Joined:
    Jan 26, 2013
    I have a Zotac 2080ti Amp that also has Micron RAM. As these problems are primarily on the FE cards, especially the "Space Invaders" effect, I too am crossing my fingers/toes/eyes.....but so far it's a shit-ripper....and playing BF5 at 3440 x 1440 ultra works as smooth as butter on my Predator G-Sync monitor....but I'm still following this pretty closely.....yikes!!
     
    mikeo likes this.
  15. Creig

    Creig Gawd

    Messages:
    785
    Joined:
    Sep 24, 2004
    Nvidia sure like to run the memory toasty warm on their cards.

    01-PCB.jpg
     
  16. Regrettably, a FLIR shot of the Micron may not reveal the weakness of the memory. It could be how the memory is patterned internally. It's a matter of different silicon, where a single trace or two could be causing the failure.

    So far all the major manufacturers are still reporting 1% return rates, which is normal. This might all be a negative attention bias effect: people are more likely to be affected by and report negative results than positive ones, so the bias leans toward the negative. I'm not a fan of NVIDIA, but I think it is still too early to determine and the jury is still out. If there is a large defect return rate, they will have a legal obligation to present those losses to investors next quarter in the form of write-downs. (GAAP rules)

    That said, the story should not be ignored and more data is always useful and appreciated at this early stage.
     
    Last edited by a moderator: Nov 20, 2018
  17. TheMadHatterXxX

    TheMadHatterXxX 2[H]4U

    Messages:
    2,873
    Joined:
    Sep 7, 2004
    How do you find out what RAM's on the card without taking it apart?
     
  18. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    12,077
    Joined:
    Oct 7, 2000
    gpu-z shows it
     
  19. Yes and no. Resistance is a function of voltage with silicon. And modern day memory uses LESS VOLTAGE. So you shouldn't be seeing high temps. You might fall out of the operating window with anything above 85-90C. The 105C limiter is just where silicon actually starts to degrade, not just fail. That's a huge differentiation. For example: My chips are spec'd to run at 16V. I can run them up to 18V 100C. If I go over that, I will damage them.

    Cooler is always better for reliability (to a certain point). That's why hard core overclockers use LN.

    The only way to tell for sure is to have engineering test samples with test probes hooked up to the various lines to see where they are failing.
     
    Last edited by a moderator: Nov 20, 2018
  20. Brahmzy

    Brahmzy [H]ardness Supreme

    Messages:
    4,955
    Joined:
    Sep 9, 2004
    95 degrees is the high side of spec.
    All the eVGA cards are running in the 80s and low 90s.
     
  21. The Mad Atheist

    The Mad Atheist Gawd

    Messages:
    910
    Joined:
    Mar 9, 2018
    Oh god, those toasty RAM chips remind me of the days of slapping on chip-sinks on video cards back in the day.
    Hopefully they still sell them and there's room on the cards.
     
  22. GoodBoy

    GoodBoy [H]ard|Gawd

    Messages:
    1,217
    Joined:
    Nov 29, 2004
    A lot of arm chair experts here... "well it might be this!?!" "It might be that!?!"

    They use Micron for the first run production, it's coincidence that the bad cards all got "Micron" because they ALL got Micron, and more recent manufacture has something else because that is what was available. Making a leap that Micron ram is a causal factor isn't backed by any facts that I am aware of...

    This is much more plausible, however I think it is other components failing, not the RAM. The card that caught fire, had the issue at the very back end of the card, and there is no ram there. We do not have enough evidence to know what exact parts are the cause, but the one pinpointed failure, the fire, was not a ram module. All this speculation about the ram, is lemmings falling in a rabbit hole...

    "Hey yeah, it could be.."

    "Oh yeah, this might be.."

    blah blah bullshit
     
  23. FrgMstr

    FrgMstr Just Plain Mean Staff Member

    Messages:
    47,992
    Joined:
    May 18, 1997
    I have been speaking to companies that manufacture video cards, and every single one that I have discussed this with has said they believe it to be a RAM failure. You can call that a leap if you want.
     
  24. Stoly

    Stoly [H]ardness Supreme

    Messages:
    6,108
    Joined:
    Jul 26, 2005
    I did like the "Space Invaders" screen. :D:D
     
    Brian_B and AceGoober like this.
  25. Trepidati0n

    Trepidati0n [H]ardForum Junkie

    Messages:
    8,810
    Joined:
    Oct 26, 2004
    I'm sorry..but pretty much everything you just stated sounds smart enough to be right but wrong on so many levels. I don't even know where to start with the "resistance (sic) is a function of voltage with silicon" statement.
     
    Armenius, Brian_B and AceGoober like this.
  26. https://www.allaboutcircuits.com/textbook/direct-current/chpt-12/temperature-coefficient-resistance/

    As you were saying? While silicon's resistance can decrease, the interconnect points where it isn't silicon can increase dramatically, and that creates issues. It leads to things like signal reflection. But I should have stated "the die" over "the silicon," which would have been technically more correct.
     
  27. DNMock

    DNMock Limp Gawd

    Messages:
    399
    Joined:
    Apr 16, 2015
    https://www.quora.com/What-is-the-relationship-of-temperature-with-voltage-and-current

    edit: I always hated that crap in physics, I personally have no idea, just casually throwing gas on the fire for my viewing pleasure
     
    Brian_B likes this.
  28. velusip

    velusip [H]ard|Gawd

    Messages:
    1,577
    Joined:
    Jan 24, 2005
    Mike vs Sam: Reloaded.
     
  29. FrgMstr

    FrgMstr Just Plain Mean Staff Member

    Messages:
    47,992
    Joined:
    May 18, 1997
    Samsung replacement card serial number - 0324518088XXX
    Micron RMA'd card serial number - 0323918014XXX

    Looking at the BIOS versions on the cards, you have to wonder if the Micron card has a beta on it, 90.02.0B.00.0E. The Samsung card's is all numbers...
     
    Armenius, N4CR, AceGoober and 2 others like this.
  30. Todd Walter

    Todd Walter Gawd

    Messages:
    592
    Joined:
    May 10, 2016
    Armenius and AceGoober like this.
  31. Galvin

    Galvin 2[H]4U

    Messages:
    2,688
    Joined:
    Jan 22, 2002
    What's the best way to test the memory? Got a 2080ti XC from EVGA, and it has Micron RAM. Played World of Warcraft yesterday, no issues.
     
  32. FrgMstr

    FrgMstr Just Plain Mean Staff Member

    Messages:
    47,992
    Joined:
    May 18, 1997
    Not sure there is a way to "test" the memory. I did an 8-hour Heaven stress test with my card when I installed it, no issues. Just used it for gaming from then on. I have not seen any reports of being able to diagnose or predict an impending failure.
     
    Armenius likes this.
  33. maxius

    maxius 2[H]4U

    Messages:
    3,348
    Joined:
    Dec 17, 2001
    watch you release the magic smoke again
     
    The Mad Atheist and AceGoober like this.
  34. nEo717

    nEo717 Limp Gawd

    Messages:
    274
    Joined:
    Jun 2, 2017
    EVGA has artifact scanner (there's check box to check for artifacts) - EVGA OC Scanner X - Also there's a video card memory test called Check Flash (ChkFlsh)
     
  35. FrgMstr

    FrgMstr Just Plain Mean Staff Member

    Messages:
    47,992
    Joined:
    May 18, 1997
    No. That is not in the plan right now. Testing as delivered.
     
    Armenius and AltTabbins like this.
  36. FrgMstr

    FrgMstr Just Plain Mean Staff Member

    Messages:
    47,992
    Joined:
    May 18, 1997
    Honestly, I think I would push it inside of warranty limits. I would like to know if I got another "bad" card sooner rather than later. I would want to see it fail inside of warranty assuredly. My 2 cents, you may need change.
     
    Armenius, Geforcepat, Aireoth and 4 others like this.
  37. Trepidati0n

    Trepidati0n [H]ardForum Junkie

    Messages:
    8,810
    Joined:
    Oct 26, 2004
    You just admitted to conflating things that didn't match your original assertion. You said the voltage affected the resistance...and then you give me a link to how temperature affects resistance. *boggle* Again...now with the signal reflection comment on top of it....seriously...just stop. And the "die" is the die...there is no bonding or packaging at that point...the die IS the result of process on the silicon. I know you are trying really hard, but right obvious of your experience and knowledge in this field is not productive when you responded to my original post. As an FYI, the reason they went away from lead attach bonding wires wasn't heat..it was a small fraction..but a very small one.
     
    Armenius and AceGoober like this.
  38. I went off on a tangent with signal reflection. That was from the fact that interconnects are of different resistance than the silicon, and that causes heat and signal reflection issues.

    That said I am correct. The package as a whole changes as the resistance changes with heat. And my link backs that up. You said otherwise.
     
  39. nEo717

    nEo717 Limp Gawd

    Messages:
    274
    Joined:
    Jun 2, 2017

    Do you mean that the collisions from the different resistance (interconnects and silicon) are what's causing heating, to which in-turn the heat and vibrations are enhancing resistivity resulting in end result of artifacts, including from the reflections of this all happening?

    If so, I could buy into that as 1 possible theory...

    EDIT:
    This covers temperature coefficient of resistance somewhat well:
    https://www.allaboutcircuits.com/textbook/direct-current/chpt-12/temperature-coefficient-resistance/
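
As a rough illustration of what "temperature coefficient of resistance" means, here is a minimal sketch of the usual linear approximation R = R_ref * (1 + alpha * (T - T_ref)). The coefficient used for copper is an approximate textbook value, and the 20C reference and 86C target are only example inputs (86C echoing the FLIR reading mentioned earlier in the thread). It says nothing about what actually happens inside a GDDR6 package.

```python
# Minimal sketch of the linear temperature-coefficient model:
#   R(T) = R_ref * (1 + alpha * (T - T_ref))
# This is only a first-order approximation near the reference temperature.


def resistance_at(r_ref, alpha, temp_c, ref_temp_c=20.0):
    """Resistance at temp_c, given resistance r_ref measured at ref_temp_c."""
    return r_ref * (1.0 + alpha * (temp_c - ref_temp_c))


# Assumption: approximate coefficient for copper, per degree C.
ALPHA_COPPER_APPROX = 0.0039

# A copper trace normalised to 1.0 ohm at 20C, evaluated at 86C.
ratio = resistance_at(1.0, ALPHA_COPPER_APPROX, 86.0)
print(f"relative resistance at 86C: {ratio:.3f}")
```

A metal with a positive coefficient gets more resistive as it heats up; a material with a negative coefficient would move the other way in the same formula.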
     
    Last edited: Nov 20, 2018
  40. Lol. That's the link I posted a couple back