Error 43 on GPU has me completely clueless

Admiral-Awesome

Weaksauce
Joined
Sep 15, 2007
Messages
105
This has been a SAGA so far, with some very perplexing symptoms, and I am hoping to get some fresh eyes on it.

System specs:
CPU: 5600X with a CM Hyper 212 air cooler
MOBO: ASUS Strix X570-E Gaming
RAM: G.Skill Trident Z Neo 3600 CL16 2x8GB
GPU: EVGA 3070 XC3 Ultra
PSU: EVGA 750 GA 750 W Gold+
OS: Windows 10 21H1

No components were under any sort of overclock (including PBO). RAM was operating at 3600 via XMP.



After about 9 months of having a perfectly stable system, I was playing COD and closed the game for the night when my computer froze on the desktop (hard lock, no BSOD). After rebooting, my resolution was set to a forced 800x600 (which persists in the BIOS). The NVIDIA software was no longer active in Windows and the resolution options within Windows were greyed out. Device Manager showed my 3070 with a Code 43 error: "Windows has stopped this device because it has reported problems." I'm going to list my attempts at solving this issue so it's quicker to read:

1. Attempted to reinstall multiple versions of Nvidia drivers, all of them failed with error "This NVIDIA graphics driver is not compatible with this version of Windows. This graphics driver could not find compatible graphics hardware".
2. Fresh install of Windows on a secondary SSD, same behavior.
3. Changed PCI-E mode from 4.0 to 3.0. No change.
4. Moved GPU to Z170/6700K system - it works perfectly and I've been gaming on this setup for weeks now while I try to figure this out.
5. Installed working GTX 960 into system - same symptoms with code 43 error and no ability to install drivers.
6. Swapped to a different RAM kit. No change.
7. Installed a different power supply (from the Z170 system). No change. With both PSUs, the GPU was connected via two independent 6-pin cables.
8. Flashed multiple previous BIOS versions as well as the current release and the beta release. I also reset the BIOS before each re-flash, with the same results.
9. Moved GPU to second PCI-E slot - same behavior with both 3070 and 960.
10. Removed NVME drive.
11. RMA'd motherboard - received a replacement which was an entirely new board with a different S/N and booted up to the same issue.
12. Bought a B550 board because I couldn't believe the result of #11. Same result. (!!!)
13. Moved PSU and RAM to Z170 system, running fully stable.

At this point, I've swapped everything except the CPU for another tested and working part and still can't get this issue to go away. I'm wearily looking towards the CPU as the next thing to isolate, but I don't have access to one to swap, and it's just such a strange issue if it is the CPU. I NEVER post requests for technical help, as I've been a lifelong PC enthusiast and have a decade of professional IT experience. I can almost always figure something out through my own knowledge and Google searches, but I've exhausted all the recommended solutions to this issue online and have resorted to brute-forcing the isolation by straight up replacing almost the entire computer piece by piece.
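
The elimination so far can be written out as a small Python sketch. This is only an illustration of the reasoning in the list above: the component labels are informal shorthand chosen for the example, not output from any diagnostic tool, and step 10 is counted as cleared on the assumption that removing the NVMe drive made no difference.

```python
# Every part of the failing build (labels are informal shorthand for this example).
components = {
    "GPU", "RAM", "PSU", "motherboard",
    "Windows install", "NVMe drive", "CPU",
}

# Parts cleared by a test in the numbered list, with the step that cleared them.
cleared = {
    "Windows install": "step 2: fresh install, same behavior",
    "GPU": "step 4: works in the Z170 system",
    "RAM": "steps 6 and 13: different kit, original stable in Z170",
    "PSU": "steps 7 and 13: different unit, original stable in Z170",
    "NVMe drive": "step 10: removed (assumed no change)",
    "motherboard": "steps 11 and 12: RMA replacement and a B550 board",
}

# Whatever has not been cleared is still a suspect.
suspects = sorted(components - set(cleared))
print(suspects)
```

Only the CPU is left in the suspect set, which is why it is the next thing to isolate.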

Maybe one of you has my lightbulb moment? Thanks for reading!
 

Admiral-Awesome

Thanks for the reply! I've installed the chipset drivers for both the X570 and B550 boards from the manufacturer. I also installed them from a USB drive onto a fresh Windows install that didn't have internet access. Still the same problem.
 

djstarfox

Limp Gawd
Joined
Sep 20, 2010
Messages
496
Shot in the dark here... but in the BIOS, try disabling the following:
* Above 4G Decoding
* Re-Size BAR Support

The fact that the 3070 works on an Intel CPU setup leads me to believe there is something AMD-specific (or the RAM) that is causing the problem.

Also, it could not hurt to run MemTest86 Free Edition for 3 passes. Download the bootable USB stick image here:
https://www.memtest86.com/download.htm
 

robijito123

Limp Gawd
Joined
Feb 2, 2021
Messages
227
I had this error on an older MSI Haswell laptop with a GTX 970M. It was triggered by Windows update 1903, I think because of a GPU BIOS issue. I know you have a much newer system, but do you have a BIOS switch on your GPU? Maybe try that, or compare the version you have against the TechPowerUp GPU database...
 

learners permit

[H]ard|Gawd
Joined
Jun 15, 2005
Messages
1,223
Check the CPU pins with a magnifying glass for deformities. Sounds like it's time for AMD to test that one for you.
 

Scoobydo2

n00b
Joined
Sep 27, 2020
Messages
27
I had the same problems. From what I have read online, it's related to PCI Express 4.0 switching from a high-power state to a low-power state and back again, which confuses the computer, video card, or driver. I never found a fix.

That said, the things that used to cause the error have not happened since I upgraded to Windows 11. I have not had a crash since my reinstall of Win 11 six months ago. I am not saying that will help you, but it seems something there fixed the problem for me. *Knocks on wood*
 

Axman

[H]F Junkie
Joined
Jul 13, 2005
Messages
14,714
CPU issues can happen. People think they don't, but when they do, they're usually incredibly difficult to pin down. And they tend not to fail in any predictable way when they do.
 

Admiral-Awesome

Interesting development. The B550 board I purchased for testing has one x16 PCI-E slot and one x8 slot; the Strix X570 has three x16 slots. When I plug my 3070 into the x8 slot on the B550, it works "normally": drivers install, and I can play games just fine and get decent frames. However, the link is locked to PCI-E 3.0 x4 and bandwidth dropped from ~27 GB/s to about 3 GB/s (according to 3DMark). If I move it back to the top x16 slot, I'm back to error 43. I'm going to put the 3070 back into the Z170 setup and test the bandwidth there to see if x16 is enabled.
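
For reference, here is a rough Python sketch of the theoretical one-direction PCI-E bandwidth for the link widths mentioned above. It assumes the published per-lane transfer rates (8 GT/s for 3.0, 16 GT/s for 4.0) and 128b/130b line encoding, and it ignores packet overhead, so measured numbers will come in somewhat lower. The function name is just something made up for the example.

```python
# Per-lane transfer rate in GT/s for each PCI-E generation used in this thread.
GT_PER_S = {3: 8.0, 4: 16.0}

# PCI-E 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING = 128 / 130


def pcie_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Theoretical bandwidth in GB/s, one direction, before packet overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8


for gen, lanes in [(4, 16), (3, 16), (3, 4)]:
    print(f"PCI-E {gen}.0 x{lanes}: {pcie_bandwidth_gbytes(gen, lanes):.1f} GB/s")
```

That works out to roughly 31.5 GB/s for 4.0 x16 and 3.9 GB/s for 3.0 x4, which lines up with the ~27 and ~3 readings above if they are in GB/s.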

> Interesting. Have you tried turning off XMP or increasing the RAM voltage?

I have turned off XMP and run two different RAM kits, both of which passed memtest. I have not increased the voltage and probably won't, as my memory kit is pretty fussy.

> Shot in the dark here... but in the BIOS, try disabling the following:
> * Above 4G Decoding
> * Re-Size BAR Support
>
> The fact that the 3070 works on an Intel CPU setup leads me to believe there is something AMD-specific (or the RAM) that is causing the problem.
>
> Also, it could not hurt to run MemTest86 Free Edition for 3 passes. Download the bootable USB stick image here:
> https://www.memtest86.com/download.htm

Both settings end with the same result. When I research this issue, at least 95% of the people with the problem have Ryzen systems.

> I had this error on an older MSI Haswell laptop with a GTX 970M. It was triggered by Windows update 1903, I think because of a GPU BIOS issue. I know you have a much newer system, but do you have a BIOS switch on your GPU? Maybe try that, or compare the version you have against the TechPowerUp GPU database...

I've checked the vBIOS and it's the same one that's on TPU. I've considered flashing it to potentially clear any corruption, but given the current state of the GPU market I'm slightly nervous to take this step haha.

> Check the CPU pins with a magnifying glass for deformities. Sounds like it's time for AMD to test that one for you.

I'm onto this next. I'm going to move my 3070 to a friend's B550/5600X setup to see if a change of CPU solves the issue.

> I had the same problems. From what I have read online, it's related to PCI Express 4.0 switching from a high-power state to a low-power state and back again, which confuses the computer, video card, or driver. I never found a fix.
>
> That said, the things that used to cause the error have not happened since I upgraded to Windows 11. I have not had a crash since my reinstall of Win 11 six months ago. I am not saying that will help you, but it seems something there fixed the problem for me. *Knocks on wood*

Interesting. I may try Windows 11, since I hear that some people having this issue have no problems operating under Linux.

> CPU issues can happen. People think they don't, but when they do, they're usually incredibly difficult to pin down. And they tend not to fail in any predictable way when they do.

That's the hardest part about this. I've never had a CPU go bad on me, but I know it can manifest itself in so many ways, and a breakdown in the PCI-E lanes seems possible. I'm no expert on the internal workings to this degree, though.

> I didn't see any mention of trying a different monitor or cable.

Tried multiple monitors and cables, from DVI (on the 960)/HDMI/DP - each reacts the same way.
 

spine

2[H]4U
Joined
Feb 4, 2003
Messages
2,699
I'm thinking it's the CPU. Double check all the pins as someone else suggested and reseat it firmly.

Also, try disabling as many power-saving features as you can, in both the BIOS and Windows, and see if that changes anything.
 

Admiral-Awesome

Just wanted to follow up on this, as there are multiple dead-end threads across a number of forums on this topic. The CPU was the issue. There were no bent pins or any visible damage, but I received my RMA replacement from AMD today and it works wonderfully. Both ASUS and AMD were great to work with for the RMA process, with the nod going to AMD for paying shipping and having a much faster turnaround time. Thanks to everyone for providing input on this issue!
 
