EVGA GTX 1070 FTW BSoD crashes due to "nvlddmkm.sys"

Ph.D

Hello there,

Today I had a BSoD out of the blue (heh). The BSoD mentioned "nvlddmkm.sys" (I did not manage to get the entire error code, but if the BSoD happens again I'll be sure to write it down), so I looked it up and found many people having issues dating back to 2009. I then set out to try some of the (least extreme) fixes that have been suggested over the years.

I uninstalled all Nvidia drivers, then used Driver Sweeper to clean up the remnants and reinstalled the latest Nvidia GPU drivers (375.70, release date: 10/28/2016). Sadly, I got another BSoD shortly after performing the clean installation, and now I'm getting what appears to be tearing when scrolling up or down or watching moving images.

Here is an example of what I assume to be tearing during a video:
sTg7tpu.jpg


Things of note:

- The BSoDs happen at seemingly random times but possibly always while a video is playing.

- The GPU temperature, usage or fan speed do not spike when the BSoD happens.

- I also wanted to try http://ccm.net/faq/6210-how-to-fix-an-nvlddmkm-sys-error-message
but not only do I not have c:\Nvidia I also do not have (or at least can't find) the "nvlddmkm.sy" file described in that post.

- The most recent changes I have made to the PC are getting this GPU in September and disabling the nasty Nvidia telemetry processes described here: http://www.majorgeeks.com/news/stor...o_latest_drivers_heres_how_to_disable_it.html

- Oh and yes I know this is one of the infamous EVGA FTW cards and I have already submitted the request for new thermal pads. Whether today's issues are related to the GPU model I do not know.

- The internet browser I am using here is Firefox. I've tried a chrome-based browser and IE as well and it seems they don't really get the tearing issue.

That's about all I can think of.

Additional info:
  • OS: Windows 7 Ultimate
  • Mobo: Asus P8Z68-V PRO
  • GPU driver: 375.70 - Stock settings, no overclock.
  • CPU: i7 2600K @4.2GHz
  • PSU: Seasonic X-760
  • RAM: 16GB Corsair Vengeance DDR3
  • Monitor: Dell U2311H (1080p 60Hz)
 
Try DDU to uninstall the old drivers. Then reinstall your new drivers.

AMD and Nvidia have been having issues with video playback in browsers lately. Black screen flashes are prominent. I was told to disable hardware acceleration in the browsers to alleviate the issue. thesmokingman suggested using MSI Afterburner to track which applications are still using hardware acceleration and how to block them from doing so.

If your card uses Micron memory then it may have a bios update on the EVGA website as the bad bios caused screen tearing and artifacts. This might be your problem. You can use GPU-Z to see who manufactured your memory as the video card manufacturers swap in different memory types as they see fit.


 
I totally forgot about this. I still had 1.45 on my PC too. Sadly my C:\Windows\MiniDump folder is empty. I think I might have automatically cleaned it out when shutting down the PC or something.

According to Nvidia Inspector I have the Samsung memory. Does that change anything?
This tearing also started yesterday when the BSoDs started. I had no such issues before.
I suppose I could go through the entire driver removal and reinstalling process again but since it didn't fix anything the first time I'm not sure how much good it will do this time.
 
Just for the sake of it, try an older driver, like the 369.05 release, just to exclude WDDM 2.1 issues.

MS updated a few bits lately as well. Make sure Windows is fully updated too.

And don't mess with solutions that disable stuff if you've got issues.

To get a minidump to debug from you need a pagefile. So make sure you enable that and then get a BSOD to see the exact error.
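To check whether the machine is actually set up to write a minidump, the crash dump settings can be read from the registry. The following is a minimal sketch, assuming the standard `CrashControl` key and its `CrashDumpEnabled` and `MinidumpDir` values; the helper names `describe_dump_mode` and `read_crash_control` are made up for illustration, and the registry read only works on Windows.

```python
import sys

# Registry key that holds the Windows crash dump settings.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Meanings of the CrashDumpEnabled value.
DUMP_MODES = {
    0: "none (no dump is written)",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
    7: "automatic memory dump",
}


def describe_dump_mode(value):
    """Translate a CrashDumpEnabled value into a readable description."""
    return DUMP_MODES.get(value, f"unknown value {value}")


def read_crash_control():
    """Return (CrashDumpEnabled, MinidumpDir) from the registry (Windows only)."""
    import winreg  # only available on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        mode, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        minidump_dir, _ = winreg.QueryValueEx(key, "MinidumpDir")
    return mode, minidump_dir


if __name__ == "__main__":
    if sys.platform == "win32":
        mode, minidump_dir = read_crash_control()
        print("Dump mode:", describe_dump_mode(mode))
        print("Minidump folder:", minidump_dir)
    else:
        print("The registry check only works on Windows.")
```

If the reported mode is "none", or the pagefile is disabled, an empty Minidump folder after a crash would be expected.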
 
Did you follow the RTSS method? I never have any of these problems going BACK YEARS... :D
 
So I used DDU to once again wipe all the drivers I could find and performed a clean installation. Back up and running again and I hope it will last this time. Fingers crossed.
 
UPDATE:

Alas, bad news. I thought the problem was fixed but I just had another BSoD. On the bright side, I managed to get the BSoD info this time using Bluescreenview.

==================================================
Dump File : 111516-15459-01.dmp
Crash Time : 15/11/2016 03:22:57
Bug Check String :
Bug Check Code : 0x00000116
Parameter 1 : fffffa80`169ba4e0
Parameter 2 : fffff880`0f95e0a8
Parameter 3 : 00000000`00000000
Parameter 4 : 00000000`0000000d
Caused By Driver : dxgkrnl.sys
Caused By Address : dxgkrnl.sys+5d1f0
File Description : DirectX Graphics Kernel
Product Name : Microsoft® Windows® Operating System
Company : Microsoft Corporation
File Version : 6.1.7601.23418 (win7sp1_ldr.160408-2045)
Processor : x64
Crash Address : ntoskrnl.exe+70400
Stack Address 1 :
Stack Address 2 :
Stack Address 3 :
Computer Name :
Full Path : C:\Windows\Minidump\111516-15459-01.dmp
Processors Count : 8
Major Version : 15
Minor Version : 7601
Dump File Size : 702,891
Dump File Time : 15/11/2016 03:25:00
==================================================

BSoD GPU Bluescreenview.png



If you need anything else, let me know!
(Oh god please don't let it be the other hardware.)
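For comparing crashes over time, the text report can be parsed into fields. The sketch below assumes the "Key : Value" layout shown in the report above; the function names are hypothetical and the lookup table covers only two codes. For what it's worth, Microsoft's bug check reference lists 0x116 as VIDEO_TDR_FAILURE (an attempt to reset the display driver and recover from a timeout failed) and describes parameter 2 as a pointer into the responsible device driver module, so the "Caused By Driver" field alone may not tell the whole story.

```python
# A few lines copied from the BlueScreenView report above.
SAMPLE_REPORT = r"""
Dump File : 111516-15459-01.dmp
Bug Check String :
Bug Check Code : 0x00000116
Parameter 2 : fffff880`0f95e0a8
Caused By Driver : dxgkrnl.sys
Full Path : C:\Windows\Minidump\111516-15459-01.dmp
"""

# Small lookup table; only codes relevant to display driver timeouts.
BUG_CHECK_NAMES = {
    0x116: "VIDEO_TDR_FAILURE",
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
}


def parse_report(text):
    """Split 'Key : Value' lines into a dict; separator lines are skipped."""
    fields = {}
    for line in text.splitlines():
        if " :" not in line:
            continue
        key, _, value = line.partition(" :")
        fields[key.strip()] = value.strip()
    return fields


def bug_check_name(fields):
    """Look up a symbolic name for the report's bug check code."""
    code = int(fields["Bug Check Code"], 16)
    return BUG_CHECK_NAMES.get(code, f"unknown (0x{code:08X})")


def parse_pointer(value):
    """Convert a pointer such as fffff880`0f95e0a8 into an integer."""
    return int(value.replace("`", ""), 16)


if __name__ == "__main__":
    report = parse_report(SAMPLE_REPORT)
    print(report["Dump File"], bug_check_name(report))
    print("Parameter 2:", hex(parse_pointer(report["Parameter 2"])))
```

Running this against each new report makes it easy to see whether the bug check code or the parameters change from one crash to the next.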
 
This one is a DirectX module and not nVidia. Make sure you run everything at stock while doing these debugs. So remove the overclocking. Also try a memory tester.

Perhaps it's as simple as your overclock not being stable anymore. If so, then after removing the overclock and verifying it's the cause, you should reinstall again due to the possibility of corrupted files.

Also, any reason you don't run Windows 10?
 
The only thing that is overclocked is the CPU so I guess I can drop that to stock settings. The only problem is I have no idea what exactly is causing the problem so I'll have no way of knowing if removing the CPU OC is actually helping outside of the long term.

I'll run MemTest86 to see if my RAM is OK.

Some added info: From what I can tell nearly every single time I've had the BSoD I've been watching a video. I've had video driver crashes before but until recently it was always "the video driver crashed and recovered" and now it seems to not recover. The screen goes black and the lights on the GPU go bright and dim and bright again (I normally have them dimmed) as if it's trying to restart or something. The screen stays black and the audio keeps going for a while. Then the audio gets stuck and loops, then it goes from a black screen to a BSoD.


As for why I'm not running Windows 10: I really don't like replacing an OS with a new one through an update. If I do it, I'd rather make a clean wipe and start fresh. I don't feel it's worth it for this 5-6 year old setup anymore. I'll probably build a new PC next year (originally it was going to be this year, but Kaby Lake is too much of a disappointment), so I'll just start from scratch with Windows 10 on that one.
 
You may end up reinstalling anyway if it's some broken installation.

Every time you get a BSOD, keep checking if the error changes.
 
hi Ph.D! i saw your post earlier and wanted to dig up the link for you. i had the exact same 'nvlddmkm.sys' BSOD except i am running Win10. i know it sounds unlikely, but the Nvidia graphics card driver crash was related to my Nvidia networking drivers! i could only use the wireless connection or wired connection / had to disable the other one. in my case, i went to the control panel / device manager / network adapters and disabled the Nvidia Nforce networking adapter.

"Only Lan working
Only W-Lan working

Both activated --> Bluescreen"

http://www.tenforums.com/bsod-crashes-debugging/19187-nvlddmkm-sys-bluescreen-windows-10-a-3.html
 
I think I can now say I have completely ruled out the RAM as a potential cause.
First I ran Windows Memory Diagnostic and received 0 errors.
Then I ran a full MemTest86 test, you can see the results of it below:

memtest86 results.png



So now that I've definitely crossed that off the list of possible problems (and have sadly sacrificed a big chunk of productivity because these tests take a very long time) I think I can move onto the next possibility.

Because removing the CPU overclock and then just hoping for the best cannot give me any immediate or definite results I'm thinking about performing a CPU stress test as well. Maybe that way I can definitely confirm the OC is the actual problem or not. What do you think?



That's an interesting problem/solution. I'm afraid it does not apply to me though. I have no Nvidia Nforce network adapter. Or at least I don't see one listed. Even when I set it to "Show Hidden Devices".


What kind of broken installation would it be? I've had this PC for so long now and it's not like I change out parts every week. I don't recall having made any big software installations in recent weeks/months either.

And yes I will make sure to back up the minidumps if I get more BSoDs.
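Backing up the dumps can be automated so that a cleanup tool or a reinstall doesn't wipe them. This is a minimal sketch, assuming the default C:\Windows\Minidump folder mentioned earlier and a backup folder of your choosing (the `D:\bsod-dumps` path is purely a placeholder); reading the Minidump folder may require an elevated prompt.

```python
import shutil
from pathlib import Path


def backup_minidumps(source_dir, backup_dir):
    """Copy every .dmp file that is not already in backup_dir.

    Returns the list of newly copied files.
    """
    source = Path(source_dir)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)

    copied = []
    for dump in sorted(source.glob("*.dmp")):
        target = backup / dump.name
        if not target.exists():
            # copy2 keeps the timestamp, which records when the crash happened
            shutil.copy2(dump, target)
            copied.append(target)
    return copied


if __name__ == "__main__":
    # Placeholder paths; adjust to your own setup.
    source_dir = Path(r"C:\Windows\Minidump")
    if source_dir.is_dir():
        for path in backup_minidumps(source_dir, r"D:\bsod-dumps"):
            print("Backed up", path.name)
    else:
        print("No minidump folder found at", source_dir)
```

Running it after every crash is harmless, since dumps that were already copied are skipped.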
 
Software, unless it's the OC :)

A bad OC can also corrupt files.

But let's see. I assume you're running the CPU at stock now, so we'll wait and see if a BSOD comes along again.
 
Since running at stock would not give me a conclusive answer I actually decided to run some stability/stress tests on the CPU to see if it's no longer stable. I figure that is the most effective way to actually know if the CPU or its OC is at fault. Or could there be some other reason I'd need to go full stock?

I ran an overnight x264 stress test. I kind of screwed up the first 2 hours because I forgot that my PC is set to shut down after being idle for 2 hours, but once I disabled that I let it run until almost noon today without any issues.

Loop 40: 11:37:26.13
encoded 2121 frames, 3.16 fps, 36016.71 kb/s

I'm planning on running a RealBench stability test as well but if both of them result in zero errors I feel like I should probably start thinking in another direction.

RMA Time.
I sure hope not but on the other hand EVGA has definitely dropped the ball with all these GPUs. The thermal thing alone is enough to make me think twice about a GPU I had to pay nearly 500 euros for.





UPDATE:


I might have found something. When running RealBench Stress Test for 15 minutes I get a few instances where the screen goes black but otherwise the PC makes it through the test.
But when I run the test for 30 minutes I seem to be able to hit the problem I've been encountering, albeit without the BSoD. The display driver crashes and recovers and Luxmark64 stops working, stopping the Stress Test.
If anyone knows what this could mean, I'm very interested in finding out!


UPDATE 2:

After reading online that I shouldn't run the RealBench stress test with a program like EVGA Precision or Afterburner active, I decided to give that a try.
... I passed the 30 minutes test without any problems at all.


UPDATE 3:

I just ran the test again with Afterburner on and it passed again. I think I'm just going to try a few other tests and see what turns up on those.
 
Re-install nvidia driver, click custom and clean install
I've already done the complete clean install twice though. First time using Driver Sweeper and the second time using the newer DDU.



I ran the Aida64 stability test for a few hours while I was away today and these are the results: http://imgur.com/a/oDm2a

Notice any problems or issues here? (Temps, stability, etc.) It looks pretty OK to me.
 
I'm just suggesting you do try the built in NVidia way...takes like 5 minutes. Give it a whirl before you pull your hair out.
 
Oh don't get me wrong. I have done that. Frankly I'm still amazed you can update GPU drivers now without having to even reboot or wipe the previous drivers.

I just can't seem to find the issue causing these BSoDs I've been having. I thought the last wipe/clean install I did solved the problem. But a few days after it happened again.
 
Did it happen again after you ran stock CPU? And any BSOD dump?
 
Did you check the results I posted? That was with the CPU OC in place.

I currently have everything reverted to stock settings and I'm about to run the entire battery of tests again.
 
Isn't it better to just play at stock and see if it happens again instead of running limited tests? :)
 
I'm not sure what you mean. I have no intentions of keeping my CPU at stock clocks for the foreseeable future. Just until I can pinpoint the problem. I've also never had the crashes this entire thread is about while playing video games. They have only happened while browsing/watching videos.

So far I've already had my browser crash on me once since reverting to stock clocks while watching multiple videos. No display driver crash or BSoD though, just the pages going black and Firefox basically getting stuck.
 
WELL.

Everything seems to have gone from bad to way worse. Since resetting the CPU clocks my PC boots up extremely slowly/sluggishly and I am now also getting a constant USB device being connected/disconnected sound. I also get the "USB Device Not Recognised" pop-up, but when I click on it I just get an empty window. When I go to Device Manager the window constantly refreshes and I do not see a device with a problem anywhere.
I've already tried disconnecting every USB device connected to my PC one by one but none of them have any effect on the issues.
 
I think at this point you are beyond a graphics issue. Tried a fresh Windows install?
 
Well, since I was out of ideas and everyone else seemed to be as well I went for the nuclear option and reinstalled Windows.

Surprise surprise the USB issue has not gone away at all.
Trying to fix these BSoD issues has really proven to be extremely ruinous for my PC, my free time, my mood and my work experience in general.

But of course we try to look at this on the bright side and I feel like I might have made some headway into analysing the problem.

In the image below you can see all the USB ports I was using permanently.

From bottom to top:
  1. USB cable going to the monitor to enable its USB ports (basically never used)
  2. My desk microphone. Used in some video games
  3. My keyboard.
  4. My mouse.
PZuLkL6.jpg



In pure desperation I started uninstalling Universal Serial Bus Controllers in device manager and when I deleted the bottom one my mouse and keyboard stopped working BUT ALSO the Unrecognised USB Device issue + constant connecting and disconnecting stopped.
I then plugged my keyboard and mouse into the USB ports at the front of the case and they seem to be working fine. This leads me to believe there might be something wrong with that row of USB ports or its software. Maybe it's all a pure goddamn coincidence that it started going wrong as soon as I removed the CPU overclock?


If anyone knows any potential test methods to find the problem or fixes for said problem I'd love to hear from you.



P.S.
Oh and in case anyone is still wondering: yes, my PC is still running way slower than it did before resetting the CPU clocks. Even though this is a fresh install of Windows, it's very noticeable both in the boot-up speed and in general usage.
 