Display driver atikmdag/nvlddmkm stopped responding and has recovered

taltamir

How to fix the error:
"Display driver atikmdag/nvlddmkm stopped responding and has recovered".

A year ago, if your video card crashed, your computer would simply freeze on one picture (usually a distorted one) and stay that way until you turned it off and on again.
However, with new drivers and the help of some new Windows capabilities, the system now terminates the driver and power cycles the video card, and then gives you the message listed above: "atikmdag" for ATI users, "nvlddmkm" for NVIDIA. This is a wonderful thing, as it eliminates the need to restart when a video card crash occurs, protects your open data, and even allows some games to continue operating without crashing (some games can rehook into the driver once it restarts and proceed as if nothing happened).

Unless this is happening with only one game (in which case it is the fault of the game), you have a HARDWARE PROBLEM!
For example, I get it under a very specific, reproducible scenario in galciv2. I get it whenever the buggy nwn2 crashes, and I got it a few other times with specific games where it STOPPED happening after patches for those games OR newer drivers were released... but if you are getting this all the time in all games, then your video card is crashing.

It is ALSO possible for it to happen with only some games and STILL be a hardware problem... Just short of a year ago I built a system that would crash after about an hour of intensive gaming in only two games... those were the only two games I played that were intensive enough to cause it (with frame limits preventing it in the others)... The reason was that my power supply was inadequate. Upgrading the power supply eliminated all the problems.

Here is a step by step process of testing:

1. Make sure your power supply outputs enough amps on the appropriate rails for your video card. Remember that factory-overclocked video cards require more power and that some power supplies cannot MAINTAIN their maximum rated output over long periods of time... so the rating should be a certain amount over the minimum.
Note: If available, you can test the video card in another computer (which has a good enough power supply) to quickly find out if your video card is defective.
2. Run Memtest86+ from www.memtest.org overnight. There should be 0 errors... even 1 error means you are overclocking the RAM too hard, or that the RAM / motherboard is defective.
3. If neither of the above is the problem then get a warranty replacement of your card!
4. If it still doesn't work get a warranty replacement for the motherboard.
5. If you have a factory overclocked card for which the warranty has EXPIRED you can try downclocking it to the "stock" speeds for the chip it uses. This will most likely solve your problem.
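
For step 1, here is a rough Python sketch of the headroom arithmetic. Everything in it is an assumption for illustration: the function names are made up, the 25% margin is just a starting point rather than a rule, and the amp figures have to come from your own card's spec sheet and the label on your power supply.

```python
def watts_to_amps(watts, volts=12.0):
    """Convert a power draw in watts to amps on a rail (12 V by default)."""
    return watts / volts


def rail_has_headroom(card_amps, other_amps, rail_amps, margin=0.25):
    """Return True if the rail covers the total load plus a safety margin.

    card_amps  -- amps the video card needs on the rail (from its spec sheet)
    other_amps -- amps drawn by everything else on the same rail (your estimate)
    rail_amps  -- amps the power supply label promises for that rail
    margin     -- extra fraction kept in reserve (0.25 is an assumed value)
    """
    required = (card_amps + other_amps) * (1.0 + margin)
    return required <= rail_amps
```

With made-up numbers, `rail_has_headroom(12, 10, 40)` passes, while the same load on an 18 A rail does not. A factory-overclocked card would call for a larger `card_amps`, and a supply that cannot sustain its rated output would call for a larger `margin`.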

Unless you are overclocking or have ATROCIOUS blockages in your computer, cooling shouldn't be a problem... but it just might be. I assume that if you are overclocking, you know better and already realize that you simply need to overclock less aggressively.
 
Your last few lines there nail it.

I've only had this happen with ATi and nVidia products when I overclock way too far. I've been using Vista x64 since release. My GTX is a first rev card I've had since they came out too.

Recently I have been playing with HD 3850 Crossfire in my other rig, then an HD 3870 X2.

Also great points on the PSU. Never put hardware on a cheap a$$ PSU with "dirty" power and expect your overclocking to do well or your components to last long. Beware of very cheap PSUs with very high peak power ratings tested in cool conditions.
 
This has been very frustrating for me.

I've had this problem come up after the system was running fine.

I just put this system together a few weeks ago and it was doing OK until a few days ago when this started happening while playing COD4. It started to happen more and more to the point that it was doing it on the desktop when nothing was open or running. It was happening every minute making the system virtually unusable.

System Specs:
Vista Ultimate 64-bit
P5E mobo
Q6600
Mushkin DDR2 800 (2 x 2GB)
Seagate 7200 320GB (Windows and Programs)
Seagate 7200.11 750GB (File Storage)
CORSAIR CMPSU-520HX 520 watt PSU
BFG Tech 8800 GTS (G92) 512MB
Forceware 169.25 drivers


I've got the Processor running stock.
The video card is set to 650 core / 970 memory.
Windows update has everything installed.
I've installed many of the hotfixes that I've read about in other posts and such (ex: kb945149, kb936710).
I've disabled the transparent desktop.
I'm running the Windows Classic theme.
I've uninstalled and reinstalled the video drivers.

This has helped. It does not seem to crash on the desktop or while web surfing anymore. But I can't play COD4 for more than 2-3 minutes without a freeze...black screen...freeze...back to life again.

I don't think heat is a problem. I've had the case opened up and the problem will still happen.

What else can I do? Everything was fine for a few weeks and then this came up out of nowhere.
 
Opening the case can actually make heat worse (though usually it offers a slight improvement), and even when it helps, the improvement is minimal. Get RivaTuner and use it to measure the actual temperature; your GPU has a built-in temperature sensor.

The 8800GT and the 8800GTS 512 have pretty bad overheating issues. I have gone through several GTS cards myself. I recommend you RMA yours (the shorter time before crashing indicates to me that some permanent damage was done).

And when you get the new card use rivatuner to increase the default fanspeed from 29% to something higher. I went with 52% because at 53% it gets loud for me.
 
I know opening the case is not always going to bring temps down.

Here's some more info:

I have found that the GPU temp (using the nvidia control panel with ntune installed) is about 55-57°C when idle with automatic fan control on. If I change to Direct fan control and crank it up to 99%, the temp drops to around 44-45°C. The problem still continues even with the fan cranked up and the temp staying at 44-45°C. Please note it is still happening while the system is pretty much idle: no gaming, just the nvidia control panel and one web page open. If I downclock the Core bus to 450 MHz and the Memory bus to 750 MHz, the system does seem to respond at a low usage level, meaning any kind of 3D gaming is out of the question.

My CPU temp is around 28°C and system temp is 35°C according to AI Suite when idle. I rendered a movie using Sony Vegas. The CPU usage went up to about 85-90%. The CPU temp rose to about 47°C. During the render (about 15 minutes long) the System temp rose to about 35°C and the GPU rose 1°C. I'm sure the minimal rise in GPU temp was due to the overall temperature increase in the case. There were no problems during the render, but I really didn't expect any during that type of operation.

I tried to launch Quake III Arena and was going to play a bit while windowed so I could keep an eye on the temps and such. But the system failed and actually blue screened and reset. When it came back there were a few "Driver stopped responding..." problems, but after I cranked up the fan control and downclocked the Core and Mem it has been stable enough to at least type this post.

I have noticed a lot of disk usage (sometimes lasting up to 5 minutes after a reboot). I can't be sure but it seems like sometimes before/during/after one of these failures there will be a spike in usage and CPU load.

I'm sorry for the long post, but I wanted to give as much info as I could. I'm pretty much ready to RMA the card to newegg anyway. Unless anyone can see something I missed.
 
"I'm sorry for the long post, but I wanted to give as much info as I could. I'm pretty much ready to RMA the card to newegg anyway. Unless anyone can see something I missed."

Why RMA with NewEgg? Just go straight to BFG...

@OP: good post, nice info :)
 
I called BFG this morning to discuss with them and RMA if necessary. When I pulled out the manual to get the phone number I noticed they have 24/7 support. What an idiot. It never occurred to me that I could have called over the weekend. Anyway, they want me to call back tonight while I'm in front of the PC with the card installed. The tech support guy did give me some information:

He told me I should install some of the Microsoft Hotfixes. I told him that I installed everything that Windows Update offered and some other fixes that I downloaded directly from MS that I had read about on forum posts.

He told me that I should install the latest chipset drivers. I said I had the latest ones installed. He said I should reinstall them.

He said that in the worst case scenario I may have to format/reinstall. I really hate that "last resort" approach they always seem to come up with. He said he has heard of this problem before and it has always been fixed with driver updates, and he has only had one person who formatted/reinstalled. He specifically said it was a Windows problem in the way it works with the driver.

I told him that I cranked up the fan control and he said that really would not do anything. I told him that many others have found heat to be the problem and that's why I increased the fan control. He didn't think that had anything to do with it.

The funny thing is that he never asked or said anything about installing the latest Forceware drivers. At the end of the conversation I did tell him that I had 169.25 installed.

It's funny what the tech support guys say and think sometimes.
 
Tech support knows jack.
Don't call them. Submit an online RMA repair request, preferably with cross-ship. They send you a recertified card, you send the damaged card. They either fix it, trash it, or send it to some other guy (doubtful).
The fact is, it deteriorated; that's a clear indication of heat damage.
The "auto" fanspeed is actually locked at 29%; it doesn't actually scale (except on the GeForce 8800GT from eVGA, for which eVGA made a custom BIOS that scales it with heat).
The idle of 57c is exactly the reason there are issues. 99% fanspeed is unnecessary; 52% is fast enough. With a 57c idle you are gonna have over 90c under load (probably 95c or so). Maybe even over 100c... Consider: water boils at 100c.

My current temp is 41c; I am idling at 52% fanspeed in a room temp of 18c (65 Fahrenheit). Fanspeed and temp are not linearly related.
Normally my idle is higher. But even with the heating on I don't get the insane temps I got with 29% fanspeed (i.e., "auto").
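
Since idle readings from different rooms are hard to compare directly, one option is to look at the GPU's rise over room temperature instead. The Python sketch below shows that arithmetic; the function names are hypothetical and the only figures used are the ones quoted in this post (41c GPU, 18c room).

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def rise_over_ambient(gpu_c, room_c):
    """Return how many degrees Celsius the GPU sits above room temperature."""
    return gpu_c - room_c
```

For the figures above, `rise_over_ambient(41, 18)` is a 23c rise, and `c_to_f(18)` comes out at roughly 64.4 Fahrenheit, close to the 65 quoted. The 57c idle reading from earlier in the thread came without a room temperature, so its rise cannot be worked out the same way.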
 
I talked to BFG again last night. This time I got a much more knowledgeable guy.

He told me that this is a known problem (he blamed Vista). He said there is a fix that will be in the Vista Service Pack due out soon. It has something to do with the nvlddmkm.sys file getting overloaded and crashing. Something like it keeps getting written to over and over. He had me do the following workaround:

Uninstall the nvidia drivers.
Boot into safe mode and run the “Driver Sweeper” utility that he had me download from guru3d.
Then we checked for nvlddmkm.sys. There were still 3 copies of the file in system folders. We deleted them.
I then booted up regularly into Vista and reinstalled the Forceware 169.25 drivers.
He said that should do it; if it does not, the card is probably bad and I will have to RMA it.
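
For anyone who wants to check for leftover copies without hunting through folders by hand, here is a minimal Python sketch of that search step. It is not what the BFG tech used; the function name is made up, and it only lists matching files so you can review them yourself. It deletes nothing.

```python
import os


def find_driver_copies(root, filename="nvlddmkm.sys"):
    """Return a sorted list of paths under root whose file name matches.

    The comparison ignores case, since Windows file names are
    case-insensitive. Folders that cannot be read are skipped silently,
    which is os.walk's default behaviour.
    """
    target = filename.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() == target:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling `find_driver_copies(r"C:\Windows")` would be the obvious use, assuming Windows is installed in the default location. Walking the whole folder can take a while.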

After all that I fired up COD4 hoping for the best. About 90 seconds into it the game froze, went black and came back. Same thing again.

Anyway, I'm posting the above “work around” because maybe it will help someone. It didn't work for me. When I get the new card I will be upping the fan control to at least 50% or so. I'll go as high as I can without it being too loud.

"The "auto" fanspeed is actually locked at 29%; it doesn't actually scale (except on the GeForce 8800GT from eVGA, for which eVGA made a custom BIOS that scales it with heat)."

So is it really the case that the fan speed never changes? That seems pretty lame to have an adjustable fan control that doesn't adjust.
 
Sorry to bring this post back up, but I'm having this problem. It only occurs on startup though. Usually if I reset the computer the problem goes away, but lately I've been getting the same error message even after I restart the computer. As long as I don't get the error message on startup, games run fine. If I do get the error message and then play a game, the screen shows my desktop but is unresponsive while the game music plays. It's as if the game is running but does not show up on the screen. I have a pretty beefy PSU (Silverstone 750W with 4x18A rails).
 