computer hardlocks - possibly GPU

Discussion in 'General Hardware' started by killernerd, Jun 5, 2018.

  1. killernerd

    killernerd Limp Gawd

    Messages:
    206
    Joined:
    Mar 8, 2012
    My computer's been acting strange lately, and by strange I mean it's been hardlocking on me.

    hardware:

    Asus Rampage IV Black Edition
    Intel Core i7-3820
    16 gigs RAM (4 x 4 gig)
    Nvidia GTX 980 Ti (hooked to 2 24" 1080p monitors)
    Corsair HX1050 modular PSU
    5 hard drives (2 SSD, 3 data drives)
    1 disk drive I no longer use but am too lazy to remove

    Windows 10 Pro (64 bit)

    When did it start:

    Not sure of the exact moment, probably a couple of weeks ago. One day it was fine and the next it wasn't. No major Windows updates/driver updates/hardware changes to my knowledge.

    What happens:

    Monitors will freeze (including the little one on my G19 keyboard) and the computer will no longer respond to any inputs.
    Sound keeps playing like normal for approx 30 seconds to 1 minute before "corrupting" (playing the last bit of sound in a loop, like a buzzing sound, nothing recognizable).

    OR

    Computer will freeze for a couple seconds then reset itself. Nothing particularly interesting to be found in the logs.

    No BSOD or other error messages are given (event logs only indicate an unexpected Kernel-Power event due to resetting). The computer does respond to pressing the reset button, which is the only way to fix it.

    When does it happen:

    So far I've encountered 2 scenarios:

    1. I boot up a game and after a random amount of time (though never more than 10 minutes) the computer will hardlock. If it doesn't crash in the first 10-15 minutes it'll be fine for the rest of the session, though scenario 2 might still occur.
    2. I'll play a game without issues, close it down and go do something else, mostly either YouTube or Netflix. After a couple of minutes the computer will hardlock.
    Both scenarios have the GPU in common, which is why it's currently my main suspect.

    Things I've checked/tried
    • Voltages all seem fine according to Asus' AI Suite (both at idle and under load the voltages don't move a bit)
      • 12V rail sits at 12V dead (12.096V at idle, highest I've seen is 12.196V but only sporadically and only for a second)
      • 5V is at 5.04V
      • 3.3V is at 3.296V
      • Vcore is at 1.275V
    • Checked voltages in BIOS too, but since there's no load this only indicates that voltages are fine when idling
    • Temperatures are fine (according to SpeedFan):
      • GPU will get up to about 85-90°C before throttling starts
      • CPU is high-ish but nothing concerning at 65-68°C (depending on core)
      • all other temps are sitting at 30-35°C (hard drives, motherboard)
    So in short: no anomalies to be found.
    • Ran FurMark for an hour, no issues (strangely enough)
    • Ran IntelBurnTest (for about 30 minutes), no issues
    • Ran memtest (couple of hours), no issues
    • Updated GPU drivers
    • CMOS reset
    • Case is as clean as can be expected for a system that hasn't been changed in 3 years (last thing replaced was the motherboard and GPU).
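
    For reference, the rail readings in the list above can be compared against the ±5% tolerance the ATX spec allows on the +12V, +5V and +3.3V rails. A rough sketch in Python, using only the numbers from this post (the helper names are made up for illustration). Being in range only says the software readouts look fine; it can't rule out short dips the sensors never report.

```python
# Nominal rail voltage -> readings quoted in the post (from AI Suite).
READINGS = {
    12.0: [12.096, 12.196],
    5.0: [5.04],
    3.3: [3.296],
}

TOLERANCE_PERCENT = 5.0  # ATX allows +/-5% on the +12V, +5V and +3.3V rails


def deviation_percent(nominal, measured):
    """Signed deviation of a reading from its nominal value, in percent."""
    return (measured - nominal) * 100.0 / nominal


def within_tolerance(nominal, measured, tolerance=TOLERANCE_PERCENT):
    """True if the reading is within +/-tolerance percent of nominal."""
    return abs(deviation_percent(nominal, measured)) <= tolerance


if __name__ == "__main__":
    for nominal, values in READINGS.items():
        for value in values:
            status = "ok" if within_tolerance(nominal, value) else "OUT OF RANGE"
            print(f"{nominal}V rail: {value}V "
                  f"({deviation_percent(nominal, value):+.2f}%) {status}")
```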

    Drivers/BIOS:

    Motherboard BIOS is the latest stable version, 0701. The only newer version is 0801, but that's been in "BETA" since 2014 and only unlocks support for 128 gigs of RAM.

    GPU drivers are recent (391.35 from 2 months ago) since they're the first thing I updated when all this started


    So as I said, my main suspect is the GPU, but I'd like a second opinion on this. Maybe I've missed something or there's something else I can try before I start buying new parts.
     
    Last edited: Jun 7, 2018
  2. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    9,660
    Joined:
    Oct 7, 2000
    seems like the gpu is crashing. update your video drivers to the newest ones and try giving it a bit extra voltage. also test your cmos battery, I've had low ones cause all sorts of issues. if none of that helps, then try another gpu/psu next.
     
  3. MrTroy03

    MrTroy03 Limp Gawd

    Messages:
    415
    Joined:
    Feb 12, 2008
    Try a different GPU if you have one or the onboard just to see if it still locks up.
     
  4. mnewxcv

    mnewxcv [H]ardness Supreme

    Messages:
    5,785
    Joined:
    Mar 4, 2007
    Downclock by 10% and retest
     
  5. killernerd

    killernerd Limp Gawd

    Messages:
    206
    Joined:
    Mar 8, 2012
    My driver was only 3 or so releases behind the latest stable, but updating can't hurt I suppose. CMOS battery... didn't think of that one before. Have to admit there's some other weird behavior that might be explained by a low battery.
    PSU's *probably* good, but only probably, given the stability in the sensor readouts.

    Unfortunately I'm too good of a person and hand out my old GPUs so no, I don't have one in reserve atm. Also pretty sure an Asus Rampage IV black edition doesn't have an onboard one.

    I'll try downclocking as a last-ditch effort. If downclocking fixes it then the card's broken in my eyes.
     
  6. mnewxcv

    mnewxcv [H]ardness Supreme

    Messages:
    5,785
    Joined:
    Mar 4, 2007
    My point in downclocking isn't to fix the problem, it's to identify it.
     
  7. vick1000

    vick1000 [H]ard|Gawd

    Messages:
    1,817
    Joined:
    Sep 15, 2007
    Had a similar problem on both a 970 and 760 with certain drivers, GPU just drops signal to monitor. DDU from safe mode, and driver change to a known good one fixed it for me. If it's happening out of the blue though, maybe the card is dying. If you are on win10, could it be updating the driver on you?
     
  8. killernerd

    killernerd Limp Gawd

    Messages:
    206
    Joined:
    Mar 8, 2012
    Of course, but then, in order to fix it I'd have to increase the voltage or keep it running at 10% lower clocks.

    Not sure tbh, I have noticed win10 automatically updating drivers before, but since I manually updated the GPU drivers it's no longer doing it.

    In any case, I'm redoing the stability tests on the latest Nvidia drivers and so far it hasn't hardlocked.
    So even though I've tried 2-3 other releases, this appears to be the one that fixes the issue (at least, so far).
     
  9. killernerd

    killernerd Limp Gawd

    Messages:
    206
    Joined:
    Mar 8, 2012
    As I expected, drivers were not the issue. While playing BioShock the computer froze again and reset itself (which I forgot to add it does sometimes on its own).

    Next step is downclocking with MSI Afterburner, though this only allows me to downclock by 90 MHz on the core and 200 MHz on the memory. Plus I'm not sure how much effect this'll have, because during most crashes the GPU had already throttled itself because of temperature.

    Someone suggested upping the voltage. MSI Afterburner shows the slider but it's greyed out. What other app/method could I use to increase voltage to the GPU?
     
    Last edited: Jun 7, 2018
  10. mnewxcv

    mnewxcv [H]ardness Supreme

    Messages:
    5,785
    Joined:
    Mar 4, 2007
    If your card is getting hot enough to throttle, then you don't want to increase voltage. Try setting a custom fan curve in Afterburner so that when the card hits 85C, the fan is at 100%. Combine that with downclocking by 90/200 and see what happens.
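
    In Afterburner the curve itself is set by dragging points in the GUI, not by code, but as a rough illustration of what a curve like that does, here is a small Python sketch that interpolates linearly between points. Only the last point (85C -> 100%) comes from the advice above; the lower points and the name `fan_speed` are made-up placeholders.

```python
# (temperature in C, fan speed in %) pairs, sorted by temperature.
# Only (85, 100) is from the post; the rest are hypothetical.
CURVE = [(40, 30), (60, 50), (75, 80), (85, 100)]


def fan_speed(temp_c, curve=CURVE):
    """Fan speed in % for a given temperature, linearly interpolated."""
    # Below the first point or above the last one, clamp to the end values.
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    # Otherwise find the segment the temperature falls in and interpolate.
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for temp in (35, 50, 70, 80, 85, 90):
        print(f"{temp}C -> {fan_speed(temp):.0f}% fan")
```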
     
  11. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    9,660
    Joined:
    Oct 7, 2000
    yeah I didn't see that you were getting throttle issues. crank your fans up. maybe even clean and re-TIM the gpu.
     
  12. killernerd

    killernerd Limp Gawd

    Messages:
    206
    Joined:
    Mar 8, 2012
    Yeah, the case I'm using (Cooler Master Cosmos II) is better suited for watercooling (which I intended on doing, but shit hit the fan, which is why this computer is such a weird/imbalanced build), so the GPU'll hit about 80-85°C and start throttling.
    Though I must admit I thought it was pretty normal for the GPU to hit a certain temp and prioritize throttling over increasing fan speed, as the default fan curve refuses to go above 65-ish %. (A quick google says max temp is 92°C, which the card has never hit.)
    The GPU will throttle itself from 1200-ish MHz down to as low as 900 MHz (on the core, haven't checked memory), so maybe I should start thinking about getting a case with more airflow...

    In any case, core and memory have been downclocked and a custom fan curve has been set so it'll hit 100% when the GPU gets to 80°C (card will start throttling at 83°C according to Afterburner).
    btw: I found the setting to unlock voltage control, apparently MSI disables it for safety. But as you said, it might not be the brightest idea to start playing with the voltages.
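
    To put the clock numbers from this thread side by side, a quick back-of-the-envelope calculation in Python. The 1200 MHz figure is the "1200-ish" stock core clock mentioned above, so treat the results as approximate; the variable and function names are just for illustration.

```python
STOCK_CORE_MHZ = 1200      # "1200-ish" core clock before throttling
THROTTLED_CORE_MHZ = 900   # lowest core clock seen while throttling
MAX_OFFSET_MHZ = 90        # largest core downclock Afterburner allowed
SUGGESTED_PERCENT = 10     # the "downclock by 10%" suggestion


def percent_of(part, whole):
    """part expressed as a percentage of whole."""
    return part * 100.0 / whole


suggested_drop_mhz = STOCK_CORE_MHZ * SUGGESTED_PERCENT / 100
offset_percent = percent_of(MAX_OFFSET_MHZ, STOCK_CORE_MHZ)
throttle_percent = percent_of(STOCK_CORE_MHZ - THROTTLED_CORE_MHZ, STOCK_CORE_MHZ)

if __name__ == "__main__":
    print(f"10% downclock would be {suggested_drop_mhz:.0f} MHz")
    print(f"90 MHz offset is {offset_percent:.1f}% of stock")
    print(f"thermal throttling drops the core by {throttle_percent:.0f}%")
```

    So the 90 MHz limit falls a bit short of a full 10% downclock, and the drop from thermal throttling alone is already several times larger than the manual offset.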
     
  13. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    9,660
    Joined:
    Oct 7, 2000
    a little voltage bump won't make too much heat difference if you're running stock otherwise. if your case has dust filters, take em out to help airflow.
     
  14. Pennaith

    Pennaith n00bie

    Messages:
    1
    Joined:
    Jun 7, 2018
    I feel like I had similar behavior when I had an SSD on its way out. You might try limiting the drives you have hooked up and see if you can narrow it down to the issue only occurring when certain one(s) are present.
     
  15. killernerd

    killernerd Limp Gawd

    Messages:
    206
    Joined:
    Mar 8, 2012
    Pennaith: Possible, but then I'd expect other issues as well outside of stuff that uses the GPU, because right now it's only crashing when gaming and watching stuff on YouTube (though the latter is rare).
    And the only SSD in my system that might be going out is my old 120 gig OCZ Vertex that I bought 7-odd years ago, which only holds the OS and drivers (and a couple other small bits).

    Not to mention that SSDs tend to fail pretty spectacularly, going from functioning to dead pretty much overnight as the number of "dead cells" explodes exponentially.

    In any case.

    I've been running the underclock + custom fan profile for the past couple of days now with 0 crashes so I suppose that's good news.
    It does lead to 3 possible conclusions:

    1. the GPU is dying and requires the underclock to keep functioning (worst case imo)
    2. it's temperature related which was solved by running the custom fan profile (it stabilizes at 75-80° with fans spinning at 90-95%)
    3. it's voltage related, somehow the stock V settings aren't cutting it and thus the GPU can't get enough juice so it dies. Though I think this is less likely given the fact that I've also had 2 or 3 crashes when it was pretty much idling.

    (4. all of the above)

    In the end I think further testing is required... What I'll be doing next is resetting to stock clocks but keeping the custom fan profile.
    If that's stable, it's probably temperature related; if it crashes again, the GPU's probably dying or in need of a voltage bump.
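
    Written out as a tiny Python sketch, the reasoning behind that test plan looks like this. The function name and the wording of the "inconclusive" branch are my own additions for completeness; the other two outcomes are the ones described above.

```python
def interpret(stable_underclock_with_fan, stable_stock_with_fan):
    """Map the two test runs to a likely conclusion.

    stable_underclock_with_fan: no crashes with underclock + custom fan profile
    stable_stock_with_fan:      no crashes with stock clocks + custom fan profile
    """
    if not stable_underclock_with_fan:
        # Not a case from the post: added so every combination has an answer.
        return "inconclusive: crashes even with underclock and extra cooling"
    if stable_stock_with_fan:
        # Cooling alone was enough, so the clocks were never the problem.
        return "probably temperature related"
    # Extra cooling alone is not enough; the card needs lower clocks.
    return "GPU probably dying or in need of a voltage bump"


if __name__ == "__main__":
    # First run from the post: underclock + fan profile was stable.
    print(interpret(True, True))
    print(interpret(True, False))
```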

    I'll update again in a week or so.
     
  16. killernerd

    killernerd Limp Gawd

    Messages:
    206
    Joined:
    Mar 8, 2012
    A rather quicker update than anticipated but the computer started locking again on stock clocks with the custom fan profile.

    Last thing I'm going to do is run stock clocks, with the custom fan profile, and bump the core voltage up by 30 mV.
    Though at this stage I think it's safe to conclude that the GPU is on its way out.

    Oh well, as long as I can get it stable (with either underclocking or overvolting) I suppose it's fine.
    I'm thinking of upgrading anyway towards the end of the year. Maybe get myself an AMD Ryzen system.

    In any case, I'll update this topic once more to report my findings with the +30 mV on the GPU core.
     
  17. vick1000

    vick1000 [H]ard|Gawd

    Messages:
    1,817
    Joined:
    Sep 15, 2007
    One last thing: if you are running the rig on a UPS, try bypassing it and running straight from the wall. A lot of the time an old UPS cannot provide stable power to a hungry system.
     
  18. TheFlayedMan

    TheFlayedMan n00bie

    Messages:
    39
    Joined:
    May 29, 2015
    Any updates? I'm curious about this thread.