computer hardlocks - possibly GPU

killernerd

My computer's been acting strange lately, and by strange I mean it's been hardlocking on me.

hardware:

Asus Rampage IV black edition
Intel core I7 3820
16 gigs RAM (4 x 4gig)
Nvidia GTX 980 ti (hooked to 2 24" 1080p monitors)
Corsair HX1050 modular PSU
5 harddrives (2 SSD, 3 data drives)
1 disk drive I no longer use but am too lazy to remove

windows 10 pro (64 bit)

When did it start:

Not sure on exact moment, a couple of weeks ago probably. One day it was fine and the next it wasn't. No major windows updates/driver updates/hardware changes to my knowledge.

What happens:

monitors will freeze (including the little one on my G19 keyboard) and computer will no longer respond to any inputs.
Sound keeps playing like normal for approx 30 seconds to 1 minute before "corrupting" (playing the last bit of sound in a loop, like a buzzing sound, nothing recognizable).

OR

Computer will freeze for a couple seconds then reset itself. Nothing particularly interesting to be found in the logs.

No BSOD or other error messages are given (eventlogs only indicate an unexpected Kernel Power event due to resetting), the computer does respond to pressing the reset button which is the only way to fix it.
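
For anyone wanting to re-check the logs: a minimal sketch of pulling the most recent Kernel-Power entries with the built-in `wevtutil` tool. It assumes the event in question is the usual Event ID 41 ("rebooted without cleanly shutting down"); the helper names and the default count of 10 are illustrative only.

```python
import subprocess

# XPath filter: Event ID 41 from the Kernel-Power provider in the System log.
# Assumption: this is the "unexpected Kernel Power event" the post refers to.
QUERY = "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]"

def kernel_power_cmd(count=10):
    """Build the wevtutil command line (hypothetical helper name)."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{QUERY}",
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output instead of XML
    ]

def query_kernel_power(count=10):
    """Run the query; only works on Windows, where wevtutil ships with the OS."""
    result = subprocess.run(kernel_power_cmd(count), capture_output=True, text=True)
    return result.stdout or result.stderr
```

Calling `print(query_kernel_power())` lists the timestamps of the unclean resets, which can then be lined up against what was running at the time.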

When does it happen:

So far I've encountered 2 scenarios:

  1. I boot up a game and after a random amount of time (though never more than 10 minutes) the computer will hardlock. If it doesn't crash in the first 10-15 minutes it'll be fine for the rest of the session, though scenario 2 might still occur.
  2. I'll play a game without issues, close it down and go do something else, mostly either youtube or netflix. After a couple of minutes the computer will hardlock.
Both scenarios have the GPU in common, which is why it's currently my main suspect.

Things I've checked/tried
  • Voltages all seem fine according to Asus' AI suite (both at idle and under load the voltages don't move a bit)
    • 12V rail sits at 12V dead (12.096V at idle, highest I've seen is 12.196V, but only sporadically and only for a second)
    • 5V is at 5.04
    • 3.3 is at 3.296
    • Vcore is at 1.275
  • Checked voltages in BIOS too but since there's no load this only indicates that voltages are fine when idling
  • temperatures are fine (according to speedfan):
    • GPU will get up to about 85-90°C before throttling starts
    • CPU is high-ish but nothing concerning at 65 - 68°C (depending on core)
    • all other temps are sitting at 30 - 35°C (harddrives, motherboard)
So in short: no anomalies to be found.
  • Ran Furmark for an hour, no issues (strangely enough)
  • Ran intel burntest (for about 30 minutes), no issues
  • Ran memtest (couple of hours), no issues
  • updated GPU drivers
  • CMOS reset
  • Case is as clean as can be expected for a system that hasn't been changed in 3 years (last thing replaced was the motherboard and GPU).
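
To put the rail readings in the list above in context, here is a small sketch that compares them against the ±5% tolerance the ATX spec allows on the 12V, 5V and 3.3V rails. The function names are made up for illustration. Keep in mind that software sensor readouts are slow and not very precise, so they can miss short dips under load.

```python
# The ATX spec allows +/-5% on the +12V, +5V and +3.3V rails.
TOLERANCE = 0.05

# (label, nominal voltage, reading quoted in the post)
readings = [
    ("12V idle", 12.0, 12.096),
    ("12V peak", 12.0, 12.196),
    ("5V", 5.0, 5.04),
    ("3.3V", 3.3, 3.296),
]

def deviation_pct(nominal, measured):
    """Signed deviation from nominal, in percent."""
    return (measured - nominal) / nominal * 100.0

def within_spec(nominal, measured, tolerance=TOLERANCE):
    """True if the reading is inside nominal +/- tolerance."""
    return abs(measured - nominal) <= nominal * tolerance

for name, nominal, measured in readings:
    status = "OK" if within_spec(nominal, measured) else "OUT OF SPEC"
    print(f"{name}: {deviation_pct(nominal, measured):+.2f}% {status}")
```

All four quoted readings land within 2% of nominal, so going by the sensors alone the rails are well inside spec.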

Drivers/BIOS:

Motherboard BIOS is latest stable version 0701, only newer version is 0801 but that's been in "BETA" since 2014 and only unlocks support for 128gigs of ram

GPU drivers are recent (391.35 from 2 months ago) since they're the first thing I updated when all this started


So as i said my main suspect is the GPU but i'd like a second opinion on this. Maybe I've missed something or there's something else I can try before I start buying new parts.
 
seems like the gpu is crashing. update your video drivers to the newest ones and try giving it a bit extra voltage. also test your cmos battery, had low ones cause all sorts of issues. if none of that helps, then try another gpu/psu next.
 
Try a different GPU if you have one or the onboard just to see if it still locks up.
 
seems like the gpu is crashing. update your video drivers to the newest ones and try giving it a bit extra voltage. also test your cmos battery, had low ones cause all sorts of issues. if none of that helps, then try another gpu/psu next.

My driver was only 3 or so releases behind the latest stable but can't hurt I suppose. cmos battery... didn't think of that one before. Have to admit there's some other weird behavior that might be explained by a low battery.
PSU's *probably* good, but only probably, given the stability in the sensor readouts.

Try a different GPU if you have one or the onboard just to see if it still locks up.

Unfortunately I'm too good of a person and hand out my old GPUs so no, I don't have one in reserve atm. Also pretty sure an Asus Rampage IV black edition doesn't have an onboard one.

Downclock by 10% and retest

I'll do this as a last ditch effort. If downclocking fixes it then it's broken in my eyes
 
My point in downclocking isn't to fix the problem, it's to identify it.
 
Had a similar problem on both a 970 and 760 with certain drivers, GPU just drops signal to monitor. DDU from safe mode, and driver change to a known good one fixed it for me. If it's happening out of the blue though, maybe the card is dying. If you are on win10, could it be updating the driver on you?
 
My point in downclocking isn't to fix the problem, it's to identify it.

Of course, but then, in order to fix it I'd have to increase the voltage or keep it running at 10% lower clocks.

Had a similar problem on both a 970 and 760 with certain drivers, GPU just drops signal to monitor. DDU from safe mode, and driver change to a known good one fixed it for me. If it's happening out of the blue though, maybe the card is dying. If you are on win10, could it be updating the driver on you?

Not sure tbh, I have noticed win10 automatically updating drivers before but since I manually updated the GPU drivers it's no longer doing it.

In any case, I'm redoing the stability tests on the latest nvidia drivers and so far it hasn't hardlocked.
So even though I've tried 2-3 other releases, this appears to be the one that fixes the issue (at least, so far).
 
As i expected, drivers were not the issue. While playing bioshock computer froze again and reset itself (which I forgot to add it does sometimes on its own).

Next step is downclocking with msi afterburner, though this only allows me to downclock by 90 MHz on the core and 200 MHz on the memory. Plus I'm not sure how much effect this'll have because during most crashes the GPU had already throttled itself because of temperature.

Someone suggested upping the voltage, msi afterburner shows the slider but it's greyed out. What other app/method could I use to increase voltage to the GPU?
 
if your card is getting hot enough to throttle, then you don't want to increase voltage. Try setting a custom fan curve in after burner so that when the card hits 85C, the fan is at 100%. Combine that with downclocking by 90/200 and see what happens.
 
yeah I didn't see that you were getting throttle issues. crank your fans up. maybe even clean and re-tim the gpu.
 
Yeah, the case I'm using (coolermaster cosmos 2) is better suited for watercooling (which I intended on doing but shit hit the fan which is why this computer is such a weird/imbalanced build) so the GPU'll hit about 80-85° and start throttling.
Though I must admit I thought this was pretty normal for the GPU to hit a certain temp and prioritize throttling over increasing fanspeed as the default fan-curve refuses to go above 65-ish %. (A quick google says max temp is 92°C which the card has never hit)
The GPU will throttle itself from 1200-ish MHz down to as low as 900 (on the core, haven't checked memory) so maybe I should start thinking about getting a case with more airflow...
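
A quick bit of arithmetic on the clock figures quoted in this thread (the suggested 10% downclock, Afterburner's 90 MHz limit on the core, and throttling from roughly 1200 MHz to 900 MHz). It assumes "1200-ish" means exactly 1200 MHz, so treat the results as ballpark numbers.

```python
# Figures quoted in the thread; "1200-ish" is treated as exactly 1200 here.
STOCK_CORE_MHZ = 1200
THROTTLED_CORE_MHZ = 900
MAX_OFFSET_MHZ = 90      # Afterburner's downclock limit on the core

def percent_drop(from_mhz, to_mhz):
    """How far to_mhz sits below from_mhz, in percent."""
    return (from_mhz - to_mhz) / from_mhz * 100.0

ten_percent_target = STOCK_CORE_MHZ * 0.90
offset_drop = percent_drop(STOCK_CORE_MHZ, STOCK_CORE_MHZ - MAX_OFFSET_MHZ)
throttle_drop = percent_drop(STOCK_CORE_MHZ, THROTTLED_CORE_MHZ)

print(f"10% downclock target: {ten_percent_target:.0f} MHz")
print(f"-90 MHz offset: {offset_drop:.1f}% below stock")
print(f"thermal throttle to 900 MHz: {throttle_drop:.1f}% below stock")
```

With these numbers the Afterburner offset on its own is a 7.5% cut, a bit short of the suggested 10%, while the card's own thermal throttling already takes it 25% below stock.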

In any case, Core and memory have been downclocked and a custom fan-curve has been set so it'll hit 100% when gpu gets to 80°C (card will start throttling at 83°C according to afterburner)
btw: I found the setting to unlock voltage control, apparently MSI disables it for safety. But as you said it might not be the brightest idea to start playing with the voltages.
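
For reference, a rough sketch of what a custom fan curve like the one described above works out to, assuming straight-line interpolation between the curve points. Only the last point (100% at 80°C) comes from the post; the lower points are made-up placeholders.

```python
# (temperature in °C, fan duty in %). Only the last point (100% at 80°C)
# comes from the post; the lower points are hypothetical placeholders.
CURVE = [(40, 30), (60, 50), (70, 75), (80, 100)]

def fan_percent(temp_c, curve=CURVE):
    """Linearly interpolate fan duty between curve points, clamped at both ends."""
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    # Walk consecutive point pairs and interpolate inside the matching segment.
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)

for temp in (50, 65, 75, 80, 83):
    print(f"{temp}°C -> {fan_percent(temp):.1f}% fan")
```

The point of topping out at 80°C is that the fan is already at full speed before the card reaches the 83°C throttle point.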
 
a little voltage bump won't make too much heat difference if you're running stock otherwise. if your case has dust filters, take em out to help airflow.
 
I feel like I had similar behavior when I had an SSD on its way out. You might try limiting the drives you have hooked up and see if you can narrow it down to the issue only occurring when certain one(s) are present.
 
Pennaith: Possible, but then I'd expect other issues as well outside of stuff that uses the GPU, because right now it's only crashing when gaming and watching stuff on youtube (though the latter is rare).
And the only SSD in my system that might be going out is my old 120 gig OCZ vertex that i bought 7-odd years ago which only holds the OS and drivers (and a couple other small bits).

Not to mention that SSDs tend to fail pretty spectacularly, going from functioning to dead pretty much overnight as the amount of "dead cells" explodes exponentially.

In any case.

I've been running the underclock + custom fan profile for the past couple of days now with 0 crashes so I suppose that's good news.
It does lead to 3 possible conclusions:

1. the GPU is dying and requires the underclock to keep functioning (worst case imo)
2. it's temperature related which was solved by running the custom fan profile (it stabilizes at 75-80° with fans spinning at 90-95%)
3. it's voltage related, somehow the stock V settings aren't cutting it and thus the GPU can't get enough juice so it dies. Though I think this is less likely given the fact that I've also had 2 or 3 crashes when it was pretty much idling.

(4. all of the above)

In the end I think further testing is required... What I'll be doing next is resetting to stock clocks but keeping the custom fan profile.
If that's stable it's probably temperature related, if it crashes again the GPU's probably dying or in need of a voltage bump.

I'll update again in a week or so.
 
A rather quicker update than anticipated but the computer started locking again on stock clocks with the custom fan profile.

Last thing i'm going to do is run stock clocks, with the custom fan profile, and bump the core voltage up by 30 mV.
Though at this stage I think it's safe to conclude that the GPU is on its way out.

Oh well as long as I can get it stable (with either underclocking or overvolting) i suppose it's fine.
I'm thinking of upgrading anyway towards the end of the year. Maybe get myself an AMD Ryzen system.

In any case, I'll update this topic once more to report my findings with the +30mV on the gpu core.
 
One last thing, if you are running the rig on a UPS, try bypassing it and running straight from the wall, a lot of times an old UPS cannot provide stable power to a hungry system.
 
As promised, an update.

One last thing, if you are running the rig on a UPS, try bypassing it and running straight from the wall, a lot of times an old UPS cannot provide stable power to a hungry system.

I'm not running a UPS but good idea nonetheless.

As I said in my previous reply I've been running the system with +30 mV on the gpu core and a custom fan profile that'll actually go up to 100% if required for about 2 weeks now.
I've noticed that it keeps the system from crashing with a 99% success rate. I've only had it crash once on these settings (which was probably a fluke).

What I do still notice is some random mini-freezes (if that makes sense) where the system (or at least the game) freezes for about half a second as if the GPU's trying to decide whether or not life's worth living (or that game's worth rendering).
Which is always a butt-clenching moment, especially when you're playing a high-paced multiplayer shooter with your friends. So far it's always pulled through but I have a feeling that there's gonna be a moment when it won't.
Maybe going to +50 mV will fix this? Food for thought (and experimentation).

I did try reverting back to stock (or at least resetting the core voltage and keeping the fan profile) a couple of times as a way of verifying that stock settings still crash the system and that this isn't some kind of fluke (which it wasn't).

So, in short:
At this point it's pretty much certain that the GPU is on the way out or has at least been banged up enough to require more voltage to keep working properly.
Other than those little mini-freezes and that one (fluke) crash it's been completely stable on +30 mV.

I'll keep running the system on these settings for as long as it's stable. If it starts crashing again I can try upping the voltage some more, but there's a limit to it obviously (which is 86 mV when using afterburner; if I want to go beyond that I'll have to find another app, but at that point I might as well consider the damn thing dead and bury it).

As I've said before I might simply end up building a completely new system by the end of the year but that'll depend on finances 'n such, if it really comes down to it I might just replace the GPU.
 
Another update.

The lagging and crashing is getting worse (again).
It's getting to the point that the shooters I'm playing are becoming frustratingly unplayable, pretty much each explosion/enemy I face will cause the game to freeze for about a second, invariably killing me...

And though my pc doesn't crash while the game is running anymore (so far at least), it's now mainly crashing using the "hang and reset a couple of minutes after the game has shut down" way.
Really strange (btw: still nothing out of the ordinary on the temps and voltages front).
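
Since on-screen readouts vanish with the lockup, one option is to log GPU temperature, clocks, fan speed and power draw to disk once a second so the last samples before a hang survive. Below is a minimal sketch using `nvidia-smi`'s query mode. It assumes `nvidia-smi` is on the PATH (on older Windows drivers it may only exist under the NVIDIA install folder), and the helper names and log filename are illustrative.

```python
import os
import subprocess

# Query fields understood by nvidia-smi's --query-gpu option.
FIELDS = ["timestamp", "temperature.gpu", "clocks.sm", "clocks.mem",
          "fan.speed", "power.draw"]

def smi_command(interval_s=1):
    """Command line that prints one CSV sample every interval_s seconds."""
    return ["nvidia-smi",
            "--query-gpu=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
            "-l", str(interval_s)]

def parse_line(line):
    """Turn one CSV sample into a dict; unsupported values become None."""
    values = [v.strip() for v in line.split(",")]
    row = dict(zip(FIELDS, values))
    for key in FIELDS[1:]:
        try:
            row[key] = float(row[key])
        except ValueError:
            row[key] = None  # e.g. "[Not Supported]" on some cards
    return row

def log_forever(path="gpu_log.csv"):
    """Append samples to path until interrupted (filename is a placeholder)."""
    with subprocess.Popen(smi_command(), stdout=subprocess.PIPE, text=True) as proc, \
         open(path, "a") as out:
        for line in proc.stdout:
            out.write(line)
            out.flush()
            os.fsync(out.fileno())  # ask the OS to write the sample to disk now
```

Run `log_forever()` in a terminal before starting a game. After a reset, the last few lines of the log show what the card was doing right before it hung.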

At this point the GPU is running at +50 mV and it's clear that more Vs isn't going to solve the problem.

So I guess it's time to start planning the burial of this aging system (but i'm still open to suggestions should anyone still have any).
 
lower the clock/mem speeds and see what it does. if it acts normal it's def dying.
 
Did you replace the thermal compound on the GPU? It sounds to me like the crashes are almost entirely heat-related. The factory TIM on the 980 is probably dried up.
 
lower the clock/mem speeds and see what it does. if it acts normal it's def dying.

Good idea

Did you replace the thermal compound on the GPU? It sounds to me like the crashes are almost entirely heat-related. The factory TIM on the 980 is probably dried up.

That's what I initially thought too but the thermal throttling on the card kept it from going above 85°C.
Now that I've got a custom fan profile running it's keeping it sub 80°C
 
It's been a couple of months but this is probably the last update.

Seems like overvolting, underclocking and a custom fanprofile are enough to keep it stable in *most* games.
Specifically those that don't push the card too hard.

But even now there are games that push it over the edge in the "lock and reset" kind of way.

I'm growing kinda tired of this shit so i'm going to call it quits.
Guess i'm gonna have to dive into the world of system building again for the first time in like 8 years.
 