Strange, intermittent VGA failures - 1080ti

VanGoghComplex

[H]ard|Gawd
Joined
Apr 5, 2016
Messages
1,995
I'm just checking in to see if anyone else has experienced anything like this.

I have a Thermaltake Core P5 with a vertical GPU mount, seen below:

IMG_20190830_124539024.jpg

I recently replaced a few of the hard tubed lines with ZMT, as I needed the 14mm fittings for the build's next iteration. One of the runs replaced was the connection between CPU and GPU, and that 3/8" x 5/8" tubing has got some spring to it in a bend that tight. It's kinda pressing against the GPU block terminal, forcing it toward the viewer.

Now, the machine only boots one in ten tries. The other nine, it fails on VGA, as indicated by the status LEDs and beep codes.

It seems like if I grab the GPU waterblock and pinch it by hand while booting, the machine will start. That's really bizarre though, so before I drain it and rework it, I wanted to ask: has anyone else seen any issue like this before?
 

Armenius

Fully [H]
Joined
Jan 28, 2014
Messages
22,719
The strain may have caused the video card to make imperfect contact on the PCI-E pins in the extender. I would definitely rework the loop to allow a better connection to the PCI-E extender. It's never good to have any kind of strain on component connections.
 

Zedicus

Gawd
Joined
Nov 2, 2010
Messages
662
ZMT can be softened with a (good) hair dryer on the high heat setting. It will form enough to take some of the spring out of it. You should be able to do this without draining the loop. Put some extra pressure on the card while doing this to get a bit more form on the tubing. (A cautious person with a heat gun would be fine as well.)
 

VanGoghComplex

[H]ard|Gawd
Joined
Apr 5, 2016
Messages
1,995
I reconfigured the machine to a more traditional state with the GPU mounted directly in the PCI-E slot. No video, and beep codes still indicate VGA failure. So I grabbed the passively cooled GT710 I keep on hand for situations like this and plugged it into my second PCI-E slot. Switched monitor inputs (leaving the DP cable plugged into the 1080ti) and hit the button.

The machine boots and I load the BIOS - I'm still getting error LEDs and beep codes indicating VGA failure, but in the BIOS, my 1080ti is properly identified and I'm getting temperature readings from it.

Unrelated, but I'm recovering the system from a malware infection right now (be CAREFUL when you snag VLC media player) and so Windows boots into safe mode. In safe mode, both display adapter drivers are running and reporting no problems.

I try booting in regular mode and the OS refuses to load fully - I get a black screen with a mouse cursor, but no Windows interface and no HDD activity. So I grab my USB stick and reinstall Windows.

I install while unplugged from ethernet and so both the display adapters present are running on generic Microsoft display drivers - and one of them is reporting a problem. I don't know which one it is, but when I switch inputs on my monitor to the 1080ti, I get no signal. I switch back to the 710 and plug in the ethernet cable to allow Windows to snag drivers for the cards.

It does so (and goes through its cycle of reboots for applying updates), and I pull up Device Manager again - both cards are reported and identified, and neither is reporting problems. I switch to the 1080ti input on my monitor and it's WORKING. I'm typing this on it right now.

I've got my fingers crossed while I download a video benchmark to test stability. I do NOT want to have to replace my GPU right now. Have you ever heard of malware screwing a video driver so hard that it has problems like this?
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,611
Is there any chance that coolant from your water cooling setup leaked on it? I've never heard of malware that could cause a failure in this manner - leakage from the cooling system seems vastly more likely, particularly if you're using hard tubes.

The behavior you're seeing where it works with the compatibility driver, but not the real one is a pretty common indicator of a failure of the logic portion of the graphics card. It's not uncommon for some cards to start working again once they heat up after a few minutes with power applied.

Have you checked for corrosion on the back of the card? If that's the problem, you may be able to just clean it off and get it to work again.
 

VanGoghComplex

[H]ard|Gawd
Joined
Apr 5, 2016
Messages
1,995
It's not impossible, but I'd say it's highly unlikely. I'm very fastidious with my loop and very careful when making and breaking connections. I also had the block apart for a loop change I'm doing, and I checked the PCB for any indication of a prior leak - there's no residue or anything there.
 

VanGoghComplex

[H]ard|Gawd
Joined
Apr 5, 2016
Messages
1,995
And now I'm pretty sure I'm losing my mind. After replying to you, RazorWind, I pulled the GT710 out. On boot the MB reported VGA failure again via LED and beep codes, but... the machine booted anyway? I got video and ran a benchmark, no problem at all. So I rebooted it again, and got the same behavior. The MB is reporting a VGA failure but the system runs fine.
 

lopoetve

Imhotep
Joined
Oct 11, 2001
Messages
29,498
Might have cooked part of the board. Years and years ago I had a similar issue from an intermittent short to ground (spacer was damaged and I didn't realize it; was making contact outside of the hole). Put it on a test bench and see if it does it out of the case?
 

VanGoghComplex

[H]ard|Gawd
Joined
Apr 5, 2016
Messages
1,995
The case is a Core P5 so it's basically a test bench already.
 

lopoetve

Imhotep
Joined
Oct 11, 2001
Messages
29,498
Yeah, but things are connected and held in place by structure; something there might have gone weird. I’m thinking a cardboard box and stuff just sitting there to test. :)
 

VanGoghComplex

[H]ard|Gawd
Joined
Apr 5, 2016
Messages
1,995
Not a bad idea, but very inconvenient as I'm on a water loop. Like I said in previous posts, it's now weirdly booting just fine as the motherboard reports VGA errors via LED and beep codes... stress tests just fine and Windows reports no errors.
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,611
What do you get as the PCI-E link speed in GPU-Z? Could be there's something wrong with the slot.
 

VanGoghComplex

[H]ard|Gawd
Joined
Apr 5, 2016
Messages
1,995
GPU-Z reports 16x.

Further weirdness: I cannot access the BIOS with the 1080ti installed. Since this debacle began, the boot process does not show me the splash screen and jamming DEL won't get me into the BIOS. I've disabled fast boot, tried accessing the UEFI from the Windows restart menu, etc... no dice.

UNTIL I install the GT710 in the second slot. Then, as long as the monitor is paying attention to the HDMI input and not the DP one, I can get to the BIOS. Furthermore, with both GPUs slotted, I get no error LEDs or beeps. With only the 1080ti slotted, I get both, but I still get display and the machine runs fine.

I'm about to just give up. The thing works but it's not doing what it's supposed to.
 

RazorWind

2[H]4U
Joined
Feb 11, 2001
Messages
3,611
Have you tried the suspect card in another system? That might help narrow down whether it's the motherboard or the graphics card that's the culprit here.

You can totally boot a 1080 Ti with just an empty water block on it, FYI. No need to set up a whole water cooling circuit just to test it.
 