RTX 2080 Seemingly Died

Discussion in 'nVidia Flavor' started by T4rd, May 2, 2019.

  1. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    Just got the card a couple of weeks ago, a Zotac AMP 2080, and it was working great up until last night, when I was stress testing my 2700X and RAM to make sure they were stable. In the middle of running Prime95 after doing a mild OC on my CPU (4.2 GHz), my display just went out. The PC seemingly kept running with no other indication of crashing, but I assumed it had crashed anyway because it was a new OC I was trying out. Normally when an OC fails, my mobo (Asus ROG Strix X470-F) reboots back to stock/default speeds and all is well again. Now, though, every time I turn it on it seems to boot normally, except that the white LED on the board stays lit, which indicates something is wrong with the video card.

    I checked the power connections to the card. The 8-pin connector was pulled out a tad, but the issue persists after securing it in the plug. I also pressed on the card firmly to make sure it was seated well in the slot, but haven't fully reseated it yet.

    I haven't had time to swap cards with my son's box yet to see if the issue follows the card, but I will tonight. Just wondering if there's anything I can try to revive this card before I initiate an exchange with Amazon, since it's under 30 days old. From Googling the issues with the RTX cards, it seems most of them affect reference-PCB cards, which I don't think this one is, judging by its cooler. So if this one did die, I'm chalking it up to bad luck rather than a systemic issue with the cards.
     
    Last edited: May 2, 2019
  2. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,226
    Joined:
    Feb 11, 2001
    There's plenty you can do to diagnose the problem, if you can confirm there really is something wrong with the card, assuming you have the right equipment. The short version being that you would check each inductor on the board for the proper voltage. If you found an inductor with zero volts to ground, you'd know you're missing a power rail, and likely have a bad card.

    That said, if the problem follows the card to a different machine, just RMA it. This is what warranties are for.
     
    T4rd likes this.
  3. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    Yeah, my voltmeter just broke and I haven't got a new one yet, so I can't do that right now, and I'm not sure what the inductors look like or where they are on the card either.
     
  4. mnewxcv

    mnewxcv [H]ardness Supreme

    Messages:
    6,310
    Joined:
    Mar 4, 2007
    If you can try another card, or try your card in another system, that would be best.
     
  5. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,226
    Joined:
    Feb 11, 2001
    Heh, I was mostly being facetious. Even if you did diagnose the problem as a failure on a particular rail, there's some fairly elaborate equipment you need to actually repair it. You'd have to be a special kind of masochist to do this if you also have the option of just RMAing it.

    But yeah, try a different card in the affected motherboard, and test the suspect card in a known-good motherboard if you can, and make sure that the problem follows the card. If it does, RMA that shit. If it doesn't, and the problem stays with the system, reset the BIOS and then troubleshoot further. It's possible something other than the card failed, such as the motherboard or power supply.
     
    T4rd likes this.
  6. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    Haha, I thought inductor sounded odd, but didn't question it.

    I'll swap cards with my son's B450 system today and see how that goes.
     
  7. Furious_Styles

    Furious_Styles [H]ard|Gawd

    Messages:
    1,232
    Joined:
    Jan 16, 2013
    If you're in the 30-day window, I wouldn't even bother testing. I'd probably just RMA it.
     
  8. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    The plot thickens...

    My card works fine in my son's rig.

    His card didn't make a difference in mine; it still wouldn't boot, and the mobo still showed the white/VGA LED staying on while booting.

    I thought maybe PSU, so I swapped in another one I had, and it didn't make a difference at first. Then I disconnected power and jumped the pins to reset the BIOS. It booted that time and I was able to get into the BIOS. Then I reconnected my original PSU and it booted into the BIOS again. I let it boot to Windows just to make sure everything was good.

    I just tried to put my RAM back at its previously stable 3000 MHz settings, which it was running fine on before I started messing with the CPU and started all of this. Same shit happened: I couldn't get any video signal and got the white/VGA light again. No amount of jumping the pins to clear the CMOS brought it back. Then I took the battery out and jumped the pins, and got a successful boot again, but after going with an even slower RAM speed and the same voltage settings, it killed it again and all video was lost, even after clearing the CMOS again.

    Then I hooked up my son's PSU, which is better than the one I have, and it still wouldn't boot until I cleared the CMOS again. Reconnected my PSU and all was well again, and I could get back into Windows fine. But of course I can't leave my RAM at 2133 MHz, so I tried to bump it up a little and the no-VGA issue/light came back.

    I don't have enough time to troubleshoot it all night, so I'm going to bed now.

    I'm thinking it's either the PSU or mobo at this point. My PSU is an old BFG 1 kW PSU that isn't 80+ rated in any way, whereas my son's is a 650W EVGA 80+ Gold.

    Any more ideas are appreciated.
     
  9. Bawjaws

    Bawjaws Limp Gawd

    Messages:
    434
    Joined:
    Feb 20, 2017
    It sounds to me like this issue is with one of the motherboard, the PSU or the RAM. Given that your GPU works fine in your son's machine, I'd be surprised if it's the GPU that's faulty, especially as the problem originally occurred while you were overclocking your CPU rather than GPU.

    The good news here is that you have two machines to test with, so you should be able to narrow down the problem to the individual component responsible by a process of elimination. It might take a while, though...

    With your PC, I'd try booting with a single stick of RAM first, and seeing if this helps. If it does, add in the other sticks one at a time. Try different DIMM slots if it fails to boot. If you don't have any joy with any of your RAM, borrow the known good RAM from your son's machine and repeat the process. This should identify whether there's a problem either with your board or memory, and if so you could try your RAM in your son's machine to double-check if your RAM is the problem.

    Similarly, I'd try your GPU, RAM and PSU in your son's machine, and then swap out those three components one at a time for your son's stuff to see if that makes any difference. You may need to try all of the different combinations, so it could take a while.
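
    As a rough illustration of the swap-one-at-a-time approach described above, here is a minimal Python sketch. The part labels ("suspect", "known-good") and the helper names swap_plan and implicated are made up for this example, it assumes a single faulty part, and the boot results still have to come from actually testing each combination by hand.

```python
from itertools import product

# Hypothetical labels for this sketch: "suspect" parts come from the failing
# machine, "known-good" parts come from the machine that works.
COMPONENTS = ("GPU", "RAM", "PSU")
SOURCES = ("suspect", "known-good")


def swap_plan(components=COMPONENTS, sources=SOURCES):
    """List every part combination, fewest known-good swaps first."""
    combos = [
        dict(zip(components, choice))
        for choice in product(sources, repeat=len(components))
    ]
    # sorted() is stable, so combinations with the same number of swaps
    # keep the order produced by product().
    return sorted(
        combos,
        key=lambda combo: sum(src == "known-good" for src in combo.values()),
    )


def implicated(results, components=COMPONENTS):
    """Return the parts whose suspect version explains the observed results.

    `results` is a list of (combo, booted) pairs recorded by hand. A part is
    implicated when its suspect version is present in every failed boot and
    absent from every successful one.
    """
    failing = [combo for combo, booted in results if not booted]
    passing = [combo for combo, booted in results if booted]
    found = []
    for part in components:
        in_all_failures = all(combo[part] == "suspect" for combo in failing)
        in_no_passes = all(combo[part] == "known-good" for combo in passing)
        if failing and in_all_failures and in_no_passes:
            found.append(part)
    return found
```

    With three parts and two machines that is eight combinations in total, which is why working through them can take a while. If no single part explains the results, the function returns an empty list and the fault is probably elsewhere (board, CPU or BIOS settings).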
     
  10. Auer

    Auer Limp Gawd

    Messages:
    502
    Joined:
    Nov 2, 2018
    Wouldn't it be a bit risky to try the PSU in his son's setup if it's faulty?
     
  11. Auer

    Auer Limp Gawd

    Messages:
    502
    Joined:
    Nov 2, 2018
    I would replace that PSU regardless. That would be my starting point.
     
  12. Bawjaws

    Bawjaws Limp Gawd

    Messages:
    434
    Joined:
    Feb 20, 2017
    Unless it's so faulty that it's going to kill components as soon as it's switched on, then I think it should be fine to see if the system will POST. In my experience, faulty PSUs manifest in one of two ways - they operate normally and then fail under load, or they blow things up as soon as they're switched on! But yeah, I'd probably try to eliminate all of the other components as being faulty first before plugging that PSU into another working system.
     
  13. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,226
    Joined:
    Feb 11, 2001
    After all that, I doubt it's the power supply. It didn't make a difference when you swapped a known-good one in until you reverted to the non-overclocked defaults. This suggests your problem lies elsewhere. I agree with the others urging caution, but I also think you'd be seeing different symptoms if the cause of this were actually the power supply.

    The source of the problem could be any combination of the CPU, motherboard and RAM. You mentioned this happened while you were getting increasingly aggressive with your overclocking, and it's possible you managed to degrade something. Remember that the memory controller is part of the CPU.

    Do you have a known-good CPU and similar memory that you can test with this motherboard?
     
    Bawjaws likes this.
  14. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    Yeah, I forgot to mention that I removed all the RAM except for 1 stick too after I switched out PSUs the first time, then when it finally booted, I plugged back in my PSU to test with that and it booted again, then I put all the RAM back in again and it booted again. Only when I tried to OC the RAM did it take a dump again and refuse to boot with the white/VGA LED on again. It kind of seems like the PSU may be a factor still because I think it would only boot again after I cleared the CMOS and switched PSUs, then it would boot again even after I reconnected my original PSU.

    I can pull the RAM and the 2600 CPU out of my son's rig and test those in mine as well if need be. I'm going to be away all weekend though, so I won't get to do all of that until I get back Sunday evening. I'm thinking about just returning the mobo at this point and getting a Crosshair VII that I just saw a good deal on in FS/FT, which would be a slight upgrade over the mobo I have now. I've been looking for a good deal on a 650W+ or so PSU to replace this old one anyway. If you all have any recommendations or have seen any deals lately, I'd appreciate it. I got an eVGA 650W 80+ Gold fully modular PSU for $55 for my son's rig that I built him last Christmas, so I've been looking for something like that again.

    I just remembered that I did actually update the BIOS right before all this started happening too. I thought maybe it didn't flash right or something, so when I finally got it to boot again the 2nd time, I plugged the USB drive with the BIOS update back in and re-flashed it to see if that helped, but it died again after I applied any OC to the RAM.
     
    Last edited: May 3, 2019
  15. Grimlaking

    Grimlaking 2[H]4U

    Messages:
    2,767
    Joined:
    May 9, 2006
    I'd be looking at motherboard or Ram in your case.
     
  16. Bawjaws

    Bawjaws Limp Gawd

    Messages:
    434
    Joined:
    Feb 20, 2017
    I would do everything you can to isolate the faulty component(s) before you return anything, to be honest, otherwise you're just kicking the can down the road. If any of the CPU, RAM or PSU is fucked then a new motherboard is just going to lead you right back to where you are now.

    As said previously, given that you have a complete second system to hand, I'd be swapping components out one by one to try and isolate the faulty part(s). Then once you know for sure what needs replaced, proceed with RMAs.
     
    T4rd likes this.
  17. Dan_D

    Dan_D [H]ard as it Gets

    Messages:
    53,486
    Joined:
    Feb 9, 2002
    Replace that power supply. I can almost guarantee that's what the issue is.
     
    Chimpee, Auer and Grimlaking like this.
  18. jmilcher

    jmilcher [H]ardness Supreme

    Messages:
    4,150
    Joined:
    Feb 3, 2008
    Or simply swap out the video card. That rules out any other issue.
     
  19. Bawjaws

    Bawjaws Limp Gawd

    Messages:
    434
    Joined:
    Feb 20, 2017
    He already has, and it works fine in another machine. So we know it's not the GPU, and the question now becomes which other component is at fault.

    And I don't think RazorWind was exactly being 100% serious in his reply...
     
    Auer and Dan_D like this.
  20. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,226
    Joined:
    Feb 11, 2001
    But it sounds like the problem persisted even after he swapped the power supply out.

    Not that I fundamentally disagree with you or anything. A BFG power supply would have to be ten years old, at least, wouldn't it?
     
  21. Dan_D

    Dan_D [H]ard as it Gets

    Messages:
    53,486
    Joined:
    Feb 9, 2002
    If he swapped the PSU out more than once, I didn't catch that. Yes, that BFG power supply would be at least that old. That's why I said to replace it for sure. I have run power supplies for the better part of ten years without issue and then, all of a sudden, they do weird crap on some specific configurations. I've even seen PSU incompatibilities with certain motherboards for reasons you'd have to ask Paul about. I don't know the cause of that. I had a PC Power & Cooling 1 kW SR that was like that. Later on my Thermaltake Toughpower 1300 did the same thing. It was ten years old and suddenly quit working right with some video cards and motherboards.
     
    DooKey and T4rd like this.
  22. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    Ok, just got a chance to start messing with this again and before I swapped out my PSU with my son's eVGA 650W PSU to do some more testing, I decided to just try it as it sits first and it booted right up. More weird shit continues to happen though...

    I noticed my CPU fan keeps ramping up and down sporadically, so I check CPU temps and they're spiking at idle in step with the fan, going from the low 40s to the mid 50s constantly, just sitting there with nothing running and showing less than 5% utilization in Task Manager.

    Then I look at the CPU voltage in the Ryzen Master app and see it's bouncing around 1.5-1.55v! It's dangerous to go over 1.425v on these procs, so I immediately shut down and check the settings in BIOS, and all is default/auto there with the auto voltage showing at 1.2v. So I manually set it to 1.25v and boot... it stalls out at the Windows animation screen before getting into Windows, and the yellow LED on the mobo starts flashing, indicating a RAM issue. Rebooted and the same thing happens. I boot back into BIOS, make sure everything is at stock speeds again and bump CPU voltage to 1.3v. Windows boots this time, and although temps seem to be stable at 40ish, Ryzen Master is still showing CPU voltage at 1.5-1.55v. So then I open up CPU-Z to see if it shows the same, and it does not; it shows core voltage around 1.26v. So dunno wtf is up with that discrepancy:

    Weird Voltages2.PNG

    So then I notice the reset button at the top right of the Ryzen Master UI and hit that, and the CPU voltage immediately drops to 1.33ish in the app, but the voltage doesn't change in CPU-Z, so I'm still not sure which one to trust. It did seem that idle temps dropped a bit, down to the mid-to-low 30s, after I reset the Master app. Comparing against the screenshot I took before resetting everything to default values, I see that for some reason the Precision Boost Overdrive control mode was selected and all values under it (PPT, TDC, EDC) were at the absolute max (1000 each), while by default they are at 141, 95, and 140 respectively, it seems:

    Default Master.PNG
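
    For anyone wanting to sanity-check their own numbers, here is a minimal Python sketch of the comparison described above. The default figures are just the ones quoted in this post for this particular CPU (other chips will differ), the values must be read off Ryzen Master by hand, and the function name limits_above_default is made up for this example.

```python
# Defaults as reported in this post for a stock 2700X (an assumption for any
# other CPU): PPT in watts, TDC and EDC in amps.
DEFAULT_LIMITS = {"PPT": 141, "TDC": 95, "EDC": 140}


def limits_above_default(current, defaults=DEFAULT_LIMITS):
    """Return {name: (current, default)} for each limit raised above default."""
    return {
        name: (value, defaults[name])
        for name, value in current.items()
        if name in defaults and value > defaults[name]
    }


# Values typed in by hand from the Precision Boost Overdrive section.
observed = {"PPT": 1000, "TDC": 1000, "EDC": 1000}
for name, (value, default) in limits_above_default(observed).items():
    print(f"{name}: {value} (default {default})")
```

    An empty result means every limit is at or below the stock value; anything listed is worth resetting before blaming the hardware.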

    So obviously, without me knowing what those settings do or even remembering ever touching them, those values are completely ridiculous and seemingly beyond dangerous. I hope this didn't somehow damage my CPU or anything else. I've been running Prime95 while typing this post, at completely stock CPU/RAM speeds (3.8 GHz CPU / 2133 MHz RAM), and it has been passing all tests with all 16 threads loaded for the past 40 mins, and CPU temp hasn't exceeded 60C by much.

    I haven't tried to reboot again yet and change/OC anything in the BIOS, so I'm about to do that and just wanted to post this before everything most likely takes a shit again on me. I think I'm about to swap the PSU out regardless just to eliminate that piece and go from there still. I'll keep my son's 650w eVGA 80+ gold PSU in mine from now on and I just ordered him another 600w eVGA 80+ Bronze for his PC.

    I'll post back after I do that and some more testing.
     
  23. Dan_D

    Dan_D [H]ard as it Gets

    Messages:
    53,486
    Joined:
    Feb 9, 2002
    Generally I'd say to trust Ryzen Master's voltage readings over CPU-Z's. I see inaccuracy with CPU-Z and voltage nearly every time I test a motherboard and CPU combination. What's set in the BIOS and what CPU-Z shows me are often vastly different things.
     
    T4rd likes this.
  24. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    Pretty sure it's the mobo now, or at least the most recent BIOS update is garbage.

    Before I swapped PSU, I wanted to replicate the boot issue just to make sure it's still there. OC'd only the RAM to the exact settings/profile I was using before I ever touched the CPU speeds/voltages and sure enough, same boot issue with the white LED.

    Swapped my PSUs out, cleared the BIOS by removing the battery and shorting the pins for it. Didn't boot (well it seems like it's booting, there's just no display signal - same thing it's been doing still).

    Removed all RAM except for one stick. Didn't boot.

    Swapped my RAM with my son's G-Skill RAM. Didn't boot.

    Cleared CMOS again with G-Skill RAM. Finally booted again.

    Decided to flash the previous version of the BIOS (4406 - see downloads/versions here), which I was using before and which was completely stable with my RAM at 3000 MHz (it's rated for 3200 MHz, but I couldn't get it to boot past 3066 and can't make 3066 stable no matter how much voltage I throw at it, up to 1.5v).

    BIOS flashed successfully and it boots normally at stock/default speeds just fine. I go back into BIOS and set RAM back to 3000 MHz with previous settings and it boots perfectly back into Windows where I'm typing this now and Prime95 has been running on all threads for the past 10 mins now with no errors. Just to reiterate, I couldn't get anything past default RAM speeds (2133 I think) to boot on the newest BIOS (4602). So I'm not sure how I feel about this if newer BIOS updates are borking my rig. Should I still return the board and get a better one, you think? Regardless, I'm not happy with this RAM not reaching its rated speeds and I'm not sure if that's the mobo or RAM's fault at this point. The problem is that I got it from another member here on FS/FT, so I'd just have to resell it and let the buyer know that it doesn't seem to run as well with Ryzen as it probably would on Intel, but I'm not even sure about that.
     
  25. spine

    spine 2[H]4U

    Messages:
    2,484
    Joined:
    Feb 4, 2003
    Sounds to me like you've blown the IMC on that CPU.

    From my experience with modern chips, it's the IMC that goes first when overclocking aggressively. You can try bumping up the uncore/IMC voltage and see if that helps. It could be that all you need is to bump that up and you'll be stable again.

    For reference, my old 4790K now requires 1.5v to be stable at current clocks!
     
  26. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    What if you go back to stock clocks and voltage then (is it stable)? I stress tested it for 30 mins last night on all threads at stock clocks (3.8 GHz) and voltage (1.2ish volts), and it was completely stable with only the RAM OC'd at 3000 MHz/1.35v. I'll try to bump the CPU back up to 1.4v tonight and back up to the 4.1 GHz it was stable at before. I never set voltage past 1.4v in the BIOS because I didn't want to risk damaging it, but it seems that somehow the Ryzen Master app had some bogus settings in it and overrode the BIOS, pushing it up to 1.5+ volts. Fortunately I never stress tested it at that voltage, so hopefully it didn't damage it much, if at all. As you can see in that first screenshot, the CPU was also running at 4.325 GHz when the voltage was that high, which is kinda excessive on the stock cooler as well. I don't think most people push these much past 4.2 GHz typically.
     
  27. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    Just for posterity, in case anyone else has a similar issue on this mobo: it was definitely a borked BIOS issue. It's been 100% stable on the previous BIOS so far with the RAM OC'd back to 3000 MHz. I haven't touched the CPU again since, because judging by the Ryzen Master app it is already boosting to 4.1 GHz on its own across all cores, it seems. So I'm not even going to bother locking it in at that speed if I don't have to, and will just let it boost up to that clock if it can or needs to.
     
    N4CR, GoldenTiger, spine and 3 others like this.
  28. Grimlaking

    Grimlaking 2[H]4U

    Messages:
    2,767
    Joined:
    May 9, 2006
    Glad to hear that.
     
    spine likes this.
  29. funkydmunky

    funkydmunky 2[H]4U

    Messages:
    2,263
    Joined:
    Aug 28, 2008
    Didn't your problems start on the previous (now current) BIOS, and then you updated?
     
  30. T4rd

    T4rd [H]ardForum Junkie

    Messages:
    16,055
    Joined:
    Apr 8, 2009
    Nah, see my quote here:

     
  31. funkydmunky

    funkydmunky 2[H]4U

    Messages:
    2,263
    Joined:
    Aug 28, 2008
    Okay, gotcha. Was trying to follow to help (No easy task! I mean just read above. Yikes!) but it did get a little confusing. Very glad you got the ship righted though :)
     
  32. Thatguybil

    Thatguybil [H]Lite

    Messages:
    85
    Joined:
    Jan 21, 2017
    I have read about people having issues with Vcore being jacked up after BIOS updates. It's definitely one of those things I check automatically after BIOS updates.

    Quick reddit example of a similar issue.