Graphics Card Necromancy: Resurrecting a Dead GTX 690

L19, with the eR33 marking. C229 looks like it went from wet to dry, too. Hard to tell whether Q15 and Q16 were wet and are now dry; that may just be overspray from the other parts.
 
If you guessed that the suspect component is the low-side mosfet (Q15) to the right of the choke in the photo, you guessed right! We can even see that the short is in the upper right corner, as that's where the alcohol evaporates off first. We can confirm that this is our problem by removing the suspect component from the board, then checking the resistance across it and the resistance of our circuit without it, which should be back to normal.

Edit: Well, I should probably clarify that we can confirm that this is a problem - it's totally possible that we have more than one. Also, I checked to make sure that the 0 ohm resistor they seem to be using as a fuse is still OK. It is.

With the mosfet in place, we have 1.6 ohms across it.

IMG_4612.jpg
And here's our pad where it goes, with the mosfet removed.
IMG_4615.jpg

With the mosfet removed, our resistance is back in the normal range.
IMG_4614.jpg

And the resistance across our suspect mosfet, which should be several thousand ohms, is actually almost zero. This was very difficult to photograph. I think the real resistance may be more like 2 ohms, and one of the probes slipped, but you get the idea.
IMG_4618.jpg

Luckily, we have a brand new replacement, which I purchased a few weeks ago for one of the other cards and didn't end up using.
Untitled-2.png

Resistance across this new one is much higher. I tried to get a photo of this, but I need at least two more hands for that. Suffice to say that it's several million ohms.
 
Here's the new mosfet, soldered to the board in place of the old one.

IMG_4622.jpg

Resistance across the circuit still checks out, at 7 or 8 ohms. I've got continuity to the gate pin on the controller, continuity to ground on the source pins, and continuity to the positive side of the circuit on the drain.

I sat for a little while, trying to think of anything else I should test before plugging the card into the system and powering it up. I triple checked the resistance on all three of the mosfet's terminals, and compared them to Card B, which still has this circuit unaltered. They're all within a couple of percent of each other. So, I plugged the card in, crossed my fingers, and hit the switch...













And after a brief eternity (probably less than 10 seconds), it worked!
IMG_4623.jpg

Now that I know it at least kinda works, I need to clean the flux residue off of it and put the cooler back on. Then I can let it run long enough to check the voltages at the rail's controller. It's possible for it to run and still be broken, such that it would fail again if I ran it too long, or too hard.
 
Alright, posting from the test bench now, actually using the card. All the requisite voltages are there, and within spec.

The card seems to work, with the exception that GPU #2 (as reported by Precision) runs really hot, so I bailed out of Heaven after about 30 seconds. It got up to about 80C before GPU #1 got past 55 or so.
Is the stock cooler on this card just that ineffective? I tried re-pasting it, but that didn't make any difference. I guess I should try one of the Arctic Cooling coolers I have piling up in my house. Kind of a bummer - they sure don't look as cool as the stock one.

Still, I just played a few minutes of Mass Effect 2, and everything seems to work. I redid the graphics settings and reinstalled the motherboard's chipset drivers, and now it runs at a steady 60FPS (with vsync on).
 
The stock cooler is fair, though not really great by any means. Being a single-fan blower design, it cannot compete with dual-fan non-blower cooling systems. However, in my experience it does keep both GPUs within a few degrees (2 to 4C) of each other when the load is averaged between them, such as under normal SLI conditions while gaming or benchmarking. Both GPU speeds should be exactly the same or vary only slightly (±15 MHz or so) during SLI usage.

Using a custom fan curve in MSI Afterburner, I always set it so 80C resulted in 100% fan and (almost) linearly configured the fan speed back from there. So 80C was 100%, 70C was 85%, 60C was 70% and 50C was 50%. Something like that anyway. Under that custom fan profile, gaming would result in 70C - 72C on both cores once heat saturated, which I considered acceptable both in terms of temperatures and noise.
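In case it helps anyone replicate it, here's a minimal sketch of that kind of piecewise-linear fan curve, using the breakpoints quoted above (the clamping behavior outside the 50-80C range is just an assumption for illustration, not necessarily how Afterburner handles it):

```python
# Rough sketch of the custom fan curve described above: 50C -> 50%,
# 60C -> 70%, 70C -> 85%, 80C -> 100%, interpolated linearly in between.
# Clamping below 50C and above 80C is an assumption for illustration.
CURVE = [(50, 50), (60, 70), (70, 85), (80, 100)]

def fan_percent(temp_c: float) -> float:
    if temp_c <= CURVE[0][0]:
        return CURVE[0][1]
    if temp_c >= CURVE[-1][0]:
        return CURVE[-1][1]
    for (t0, p0), (t1, p1) in zip(CURVE, CURVE[1:]):
        if t0 <= temp_c <= t1:
            return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)

print(fan_percent(72))  # ~88% with these breakpoints
```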
My 690 has been re-installed in the original system I bought it for (Asus Rampage IV Formula w/ i7 3820) since I upgraded my main rig and moved the video cards over to it. I can even play FC5 and New Dawn 1920x1200 max detail at around 45fps under SLI and it's a decent, if not stellar, experience. Certainly better than expected given only 2GB memory per GPU.

If you are seeing much higher temps on GPU #2 compared to #1, then it could be that the heatsink isn't fully touching the complete surface of the GPU #2 die, or excess voltage is reaching GPU #2. You might try checking the GPU voltages (try HWiNFO64 or MSI Afterburner if your software doesn't show them).
 
Thanks for your insight!

I think I may have figured it out, and this is a new one for me. It appears that my #2 GPU die may be slightly convex, and thus doesn't make very good contact with its heatsink. This seems to be exacerbated by the relatively low mounting pressure of the stock heatsinks, and I suspect also that the heatsink I was trying to use may also be slightly convex, which aggravates the problem.

This is the heatsink for GPU #1. See how the grease looks pretty even?
Untitled-2.jpg

This is the heatsink for GPU #2. See the thin spot in the center?
Untitled-3.jpg

I swapped one of the Arctic Cooling "Twin Turbo 690" coolers onto the card, and now both cores hit about 60 degrees after a couple of minutes of Heaven. GPU #2 still seems to run a couple of degrees hotter, though, and heats up faster, which is why I think it's the die surface, and not just the heatsink, that's convex.

I had not considered that it might be running extra voltage on GPU number 2, but I'll have to check on that tomorrow.
 
Ok, so, good news and bad news today. Bad news is, there's still something wrong. Good news is, I'm not really convinced that the problem is the card. I managed to kill the card again, in the same failure mode as yesterday, where it burned out that little 0 ohm resistor AND the low side mosfet, probably not in that order.

Out of an abundance of caution, I stuck a thermocouple directly on the mosfet before running it, and I discovered that it gets alarmingly hot - over 100 degrees as measured, meaning Tj is likely much higher. I wasn't quick enough shutting it down, and it overheated the mosfet, which in turn led to overdrawing current through that little resistor. It's a good thing that resistor is there. I replaced both MOSFETs and the resistor, and the card now works again.

Now, one of the things that can cause a MOSFET to run hot is inadequate gate drive voltage, as I've talked about before. The gate drive is the signal that causes the MOSFET to switch on. If the voltage is too low, the MOSFET will only switch on partially, and will have a higher resistance from the source (connected to ground) to the drain (connected to the load - our GPU in this case). This higher resistance causes more of the power passing through the MOSFET to be dissipated as heat than you'd normally see, causing the MOSFET to run hot, which is undesirable for several reasons.
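To put rough numbers on that effect (the current and on-resistance values below are made up for illustration, not measurements from this card):

```python
# Conduction loss in a MOSFET is roughly P = I^2 * Rds(on).
# The resistance values here are hypothetical: a few milliohms when fully
# enhanced vs. a much higher value when only partially turned on.
def conduction_loss_w(current_a: float, rds_on_ohm: float) -> float:
    return current_a ** 2 * rds_on_ohm

current = 20.0                            # assumed amps through the low-side FET
print(conduction_loss_w(current, 0.004))  # fully on:     ~1.6 W
print(conduction_loss_w(current, 0.040))  # partially on: ~16 W
```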

So. One thing I've noticed is that the voltage making it to the controller seems to be a little low. I had assumed this was intentional, as the card has a VRM that seems to exist for the purpose of regulating the 12V input power from the PSU, and it outputs about 11.5 volts. There are even unused traces on the board that would supply 12V power directly from one of the 8-pin connectors; that VRM is used instead. If you look at the photos I posted of card C, there are empty pads next to that 0 ohm resistor. They connect to the .95V rail's input power on one side, and the leftmost 8-pin on the other - to use it, I'd need to install another 0 ohm resistor on those pads, but I assume nvidia left this out, and included a VRM, for a reason.

With this in mind, I'm starting to suspect that the reason my cards keep dying is not a defect in the cards themselves, but rather that the power being supplied to them by the power supply is insufficient, and some quirks of this .95V rail design cause the gate drive voltage to droop with the power supply's 12V voltage. I have a feeling that, under load, the "12V" being supplied by the power supply is dropping to like 11.5 or less. Notably, this power supply is an older 650W unit, and 650W is the minimum that nvidia calls for the GTX 690. Thus, I think the next step is to swap that unit out with a newer, more robust one. I have one I could borrow from my main rig, but it would be a huge pain to extract it with all of its cables, so I'm considering buying another.
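As a back-of-the-envelope sanity check on that theory (the card's draw and the wiring resistance below are assumed values for illustration, not measurements):

```python
# Rough estimate of how much the "12V" seen at the card can droop under load.
# Both the card's power draw and the total cable/connector resistance are
# assumed values; the real numbers depend on the PSU and harness.
card_power_w = 300.0          # assumed GTX 690 draw under load
supply_setpoint_v = 12.0
cable_resistance_ohm = 0.020  # assumed total round-trip wiring/connector resistance

current_a = card_power_w / supply_setpoint_v  # ~25 A
droop_v = current_a * cable_resistance_ohm    # ~0.5 V
print(supply_setpoint_v - droop_v)            # ~11.5 V at the card
```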

Keeping in mind that I'm going to hook this thing up to used hardware I bought from ebay and repaired myself using questionable logic and methods, I'd prefer not to spend a gazillion bucks on something super fancy. Anyone have any recommendations on a reliable, but inexpensive power supply in the 1000ish watt range?

Edit: I did check the voltage, as Azrak suggested - GPU #2 is actually running a hair less voltage, I assume because it's running hotter and GPU Boost is throttling back a little.
 
I wonder if you've considered that maybe the motherboard is the real villain here? :confused:
 
Yes, I've considered that. The thing is, the rail that keeps failing doesn't seem to be connected directly to the motherboard power. Power comes into the card at three places. One is the PCI-E connector, and the other two are the 8 pin power connectors. The VRM I mentioned that produces "12 volts" doesn't connect to the PCI-E power at all, as far as I can tell, and instead is connected to one of the 8 pins*.

So, I've considered it, but it seems less likely to be the root cause of the problem than the power supply. Not impossible, though.

*I think. I've slept since I checked on this, but I'll double check it again tonight.
 
I mean it most likely is the PSU, if at all.

If it were me, I'd find a relatively new one, and use it purely to power the GPU via the PCIe connectors, and leave the existing one powering the mobo + PCIe slot. It'll also post without the extra power, which could be useful for testing. (y)
 
It actually won't post without the 8 pins connected. The VCore VRMs have a total of five phases each, three of which are connected to one of the 8 pins, and two of which are connected to the other. Without both 8 pins hooked up, the Vcore VRM doesn't power up at all.

Also, the .95 rail seems to be hooked up to both of the 8 pin connectors, such that it wants both to be working before it starts.

I've pretty much resigned myself to pulling my water cooled 1080 Ti out of my gaming machine, and testing with that, since it has a really nice Corsair 1000 watt power supply, with voltage monitoring. It's just that draining the coolant loop is a pain...
 

Correction to the above: the system doesn't POST, but the card does produce a nice, friendly warning on the screen about how you need to plug in the power connectors if you try to use it without them.
IMG_4654.jpg

Anyway, good news, kids.

After some deliberation, I splurged, and hooked myself up with this beast - a Corsair AX1200i:
IMG_4655.jpg IMG_4656.jpg

I know it's overkill in the overall wattage sense for my specific use case here, but I have some plans for other stuff in the relatively near future where I think having a power supply this robust will be handy, so I think it's worth it. Also, I really like the per-rail voltage monitoring that my RM1000i in my gaming machine has, and this unit really goes ham with that feature. It even lets you set individual overcurrent protection for each of the 6/8 pin cables, which is kinda neat. For troubleshooting faulty components, that seems like a really handy feature.

Anyway, I hate to say it, but I'm not super impressed with it. It appears that the components inside it are packed in there so closely that one of the ATX power connectors for the motherboard has a cap in the way of its little retaining clip, such that I can't quite get it plugged in securely (see the second photo). This isn't a huge deal for my case, since I'm not installing it permanently anywhere, but it would leave a pretty unpleasant taste in my mouth if I were using this in a normal build. I may yet return it as defective, but I haven't decided if I really care to deal with that. It does work as intended otherwise.

Nevertheless, I swapped out the old suspect power supply for this new hotness, and now my low-side mosfet runs way cooler. Instead of hitting 60C immediately when turned on, it hits about 40C. Seems like maybe the gate drive voltage was, in fact, the problem.
IMG_4658.jpg

Under load, it goes up to the 80s, but I still eventually get a black screen if I run the card under full load long enough. It doesn't seem to be related to the .95 rail, though, as the card turns right back on if I restart the system, with no apparent damage. I have my doubts about the effectiveness of the Arctic Cooling heatsink at cooling the VRMs, though, so it may be related to that, or to the PSU's overcurrent protection not being set right. That seems unlikely, since the limits are something like 23 amps per cable (which is almost 300 watts per cable), but maybe I didn't read the instructions right. Still, I think it's a victory that our .95 rail seems to survive actual use now.
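For reference, the per-cable arithmetic behind that "almost 300 watts" figure (just Ohm's-law math on the limit I set, nothing measured):

```python
# Sanity check on the per-cable overcurrent limit mentioned above:
# 23 A on a 12 V cable works out to roughly 276 W per cable.
ocp_limit_a = 23.0
rail_v = 12.0
print(ocp_limit_a * rail_v)  # 276.0 W, i.e. "almost 300 watts per cable"
```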

One thing I may try is swapping the nvidia heatsink back on. The Arctic Cooling heatsinks have great big monster heatsinks for the core, but just a thin piece of aluminum for all the other parts of the card, unlike the nvidia heatsink, which has a massive casting for the memory and VRM, with some actual fins. The VRMs on this card are all somewhat more stressed than on a 680, because they have generally fewer phases in order to fit them all on there, and it wouldn't surprise me if the Arctic Cooling heatsink doesn't really cool them well enough.
 
Hi RazorWind. I found myself a pair of 690s for basically free, where one of them was broken. From the looks of it, I'm pretty sure it's just a single capacitor that got damaged during shelf storage. Trying to confirm the issue, I attempted moving the capacitor from my working board to the broken one, but accidentally broke it. I have tried a few surface-mount capacitors I have around, but none of them gets the board recognized with the right ID for driver installation.
I found one capacitor that, when put on the board I just broke (board A), worked with an older version of the nvidia driver, but causes crashes in games.
Since you apparently have quite a few dead boards now, I was wondering if you could help me out with new caps? My cap is shorted, so it's impossible to measure as well.

The missing cap is labeled C913 - on the back of the board, opposite from the PCIE finger.

Would really like to at least get one of these boards working again.
 
IMG_4677.jpg

Is that the cap you're talking about? I can desolder it later and check the capacitance, but it's probably best to get known good ones, if you can. I guess I could send you one if you can't, though. Are you in the US?


Can you clarify, you have two cards? One that used to work, but you removed this cap, and now it doesn't ("Board A"), and another where it was broken off, and never worked("Board B")?

I'd expect that the card would mostly work without it, as it seems to be there for the purpose of smoothing out noise in the 12V supply power, but there's a lot about this card that seems less robust than it maybe should be, to me. If you replaced it with a capacitor of some unknown value, though, one never knows what might happen.

PS: How did you manage to break the cap in the process of removing it? Are you just using a soldering iron, and not hot air?
 

Yes, I have 2 cards. Card A was working fine; card B was not working. I suspected it was this missing cap. To confirm this theory, I wanted to move the cap from A to B, but as I only have a normal soldering iron, I managed to break it with my pliers (just me being sloppy - I didn't realize it was so brittle). I can now confirm that it does not work without that cap. I tried moving C507 (from the row of 4 at the front end of card B) to card A, but the capacitance is wrong, so it doesn't run stable.

If it's a common capacitance value, it's probably easier to order from China, as I live in Norway. I haven't found any way to check the capacitance value, though, as I broke mine and no schematics are available. It would be awesome if you could help me out by measuring it.
 
Oh. Yeah, you really need hot air or at least hot tweezers to do this sort of SMD rework.

I desoldered the cap from one of my dead boards, and I measure it at ~11.8 uF. If I had to guess, I'd say it's probably supposed to be 12uF.

I guess I should post an update about my own Card D while I'm here.

I cobbled together one of the nVidia coolers such that it cools Card D's second GPU a little better, and then installed the system in the case, plugged it into my TV and played a few rounds (that is - several hours) of Mass Effect and Skyrim. The card seems to do fine, although the GPUs only seem to be seeing about 40% load, I think because the CPU in my test rig isn't fast enough to keep up with them. So, I really need to get my hands on a better motherboard/CPU, I guess, but it hasn't died yet, and it doesn't crash.

I unplugged it while I was vacuuming yesterday, but here it is as it sits currently.

IMG_4678.jpg

My next order of business may be to modify the case so I can use it as a test bench. The short version of that is removing the drive cage and cutting a hole in the bottom so a riser cable can reach out of the case.
 
Ok, so philosophical forum usage question:

I mentioned before that I bought a pile of other cards to work on, which are not GTX 690s. Would it be considered bad form to start a fresh thread about those? I'm thinking I may make some videos for Youtube, which I'd like to post here for the sake of sharing, but I don't want to be accused of shilling my Youtube channel or anything like that. It just seems like if a picture is worth 1000 words, a video is probably worth at least ten thousand.
 
I think a fresh thread would be fitting, as ultimately these threads are a useful resource for anyone else in a similar situation. Separate threads mean more accurate search results.
Having said that, I too have thoroughly enjoyed reading this thread. I'm following you (mostly) through the troubleshooting and testing, and wish I were as skilled or experienced so that I could fix a few of my own dead GPUs. One is an old 7800 GTX 512MB, which is VERY rare these days. Oh, and Port A is a fun little spot. I live in San Antonio and we sometimes make the trip down to Port A for vacation.
 
Yeah, I think a new thread would be best, but please stick a link to it in this thread so we don't miss it :)

This stuff is way over my head but I find it absolutely fascinating. I've really enjoyed reading this thread. Great stuff.
 
Well, I thought I would come back here and update you on my progress in fixing my 690s and my horrible soldering skills :)

[image]


Here is a picture of card number one with all of my soldering gear. This card worked before I broke it. This was the 120 uF capacitor you measured for me - thanks :) I couldn't find a 120, so I put a 100 and a 20 in parallel.
[image]

Ugly as hell, but no shorts and they sit well.
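For what it's worth, capacitances in parallel simply add, which is why the 100 + 20 substitution lands on the 120 uF target; a one-line check:

```python
# Capacitors wired in parallel add directly: C_total = C1 + C2.
c1_uF = 100
c2_uF = 20
print(c1_uF + c2_uF)  # 120 uF, the target value
```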

[image]


Card back in PC, got issues.

[image]


[image]


So I have no idea what's up; I was guessing it was graphics driver related, or the fact that I got Windows updates and bumped the power button while closing the cover. Anyways, this Windows install was no good anymore. Doing a fresh install with the nvidia card only, installing drivers, and then adding the AMD card worked. Had to put the nvidia card in the top slot for things to work properly.
 
So, one card fixed, I went on to the other one. This one I have never seen work, but the seller claims it worked a while back when he last used it. 2 caps are missing - one of them my own doing. I tried moving it to see if it worked as a replacement for the 120 uF cap, but it didn't. No idea how big it is. Here I just put it back into place.
[image]

Ugly as hell, but from my measurements those caps are in parallel, so it's fine.
[image]


Went on to solder the 120 uF caps as well, same as on the other card. Didn't get a picture of the last cap, and it's too dark to get a good picture now, but it's soldered in the same place where it's shown loosely placed in this picture.

So, I plugged it into my computer, aaaand - magic smoke. It came from inside the card, just below where I replaced the 120 uF cap. I don't have a small enough Torx screwdriver, so I can't open it up and check what burnt.
I did measure the resistance across the 12V rails after I soldered, so I know it wasn't a straight short there. Any guesses what it might have been?
 
Um, the cap I measured was 11.8uF, not 120.

Without tearing the card down and looking, it's hard to say. If you're lucky, it's one of the zero ohm resistors that they seem to be using either as fuses, or to configure power distribution. I say get yourself the right torx bit and take the heatsink off and look for clues.

Either way, if you did this to two cards, undo it and get the right caps before you try it on the second one.
 
Ah, sorry, remembered the number wrong. I did order the correct values though :) I will see about getting the torx bit, but at this point I might just sell it as parts.
 
Would you be interested in a dead obscure graphics card? I have a 512 MB Matrox M9120 that doesn't POST, but compared to these things it should be a cinch. You'll need a DMS-59 connector to get video output, but I'd put it in the mail for the cost of shipping to see it examined here.
 
Sure. I can certainly take a look at it, anyway. Figure out how much it'll cost to ship, and PM me.
 

If you're in Texas like me it'll probably run about $8 for Priority Mail. I can find a cheaper solution but it won't be a huge cost change. Let me know.
 
This is amazing.
With a long face, I look back at myself and wonder why I wasted so many cards over small issues.
I have a question!
If a card seems to work fine in every way but won't install any drivers, what could be the problem?
So like, you get a picture, and you can boot into Windows with the "VGA Display Adapter" driver, but when you try to install the real Nvidia/AMD driver, it doesn't recognize the card as compatible? Are you getting Error 43 in the device manager?

That could be indicative of a lot of different things - a damaged or corrupted BIOS (hardware or software), some sort of GPU die or memory hardware failure, even problems unrelated to the card itself, such as the motherboard or just a corrupted Windows install. From memory, both AMD and Nvidia cards are capable of doing that, but they do it for different reasons, and each card design may be prone to different modes of failure.

That sort of problem is the hardest to troubleshoot without insider information from the card's manufacturer, but a corrupted BIOS is what I would suspect first (this could maybe be repaired without the need for soldering). After that, some sort of hardware failure within the GPU or memory would be next, either in the silicon itself, or the BGA solder balls holding it to the board. At that point, if all you want is a working card, just buying a working one is probably the best option, unless it's a super expensive 2080 Ti, or something. If that's the case, it should still be under warranty.
 
I have a working 690 that I'd sell or trade you.
Did you not read the thread? I got one working.

And you can get working ones for about $100-$120 on ebay. You could almost get two for the price of that SSD.
 
Sorry to resurrect an old thread; I tried to send Razorwind a PM, but I must be restricted since I'm new.
At least my post is related to the thread topic and will hopefully help others.

I recently purchased two GTX 690s on eBay. Both cards look brand new and seem to work just fine, that is, until I go to play a game or run 3DMark. Then they start to act up in the same manner that Razorwind's did: the screen goes black. I can do a hard reset and everything works just fine again, until I go to play a game or run 3DMark.
I took the cards apart and replaced the thermal paste and pads.
After that, the 1st card will now run every game I throw at it; 3DMark is kind of spotty, though. I can run 3DMark 5 times and on the 6th the screen may go black, or sometimes it'll go black on the 3rd... you just never know.
The 2nd card I have seen no change in. It still goes to a black screen as soon as I try to run a game.
I contacted the company that's selling on eBay and they gave me a new card for free and let me keep the faulty one. The new one works perfectly by the way.
So now with 3 cards I have a faulty one I can fiddle with.
I started thinking: either a component is overheating, or it's not getting enough power. Since I already replaced the thermal paste and pads, I decided to download Kepler Bios Tweaker 1.27 and proceeded to up the mW values in the power table; what have I got to lose?

Below are pictures of some BIOS adjustments I made, the left being the stock BIOS and the right being the modded one.
I'm not just raising mW values willy-nilly... OK, I am somewhat, but I've determined which ones are for the PCIe slot power and kept those at a max of 75W, and figured if I stay below 300W on the rails I should be good. I also raised the minimum values, just in case there wasn't enough wattage.
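For anyone following along, the budget I was sanity-checking against looks roughly like this (the per-connector numbers are the standard PCIe spec limits, not values read out of this particular BIOS):

```python
# Rough power budget for a GTX 690 per the PCIe spec:
# 75 W from the slot plus 150 W from each of the two 8-pin connectors.
slot_w = 75
eight_pin_w = 150
connectors = 2
print(slot_w + eight_pin_w * connectors)  # 375 W theoretical ceiling
```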

Now I can game! At least for 15 minutes or so, and then I get a black screen.

I still haven't ruled out that a component is overheating. But if it was overheating why would putting more power to the card help it?

Any recommendations or thoughts on what is going on here?

IMG_20200128_224251.jpg

IMG_20200128_224309.jpg

IMG_20200128_224326.jpg
 
Ok, first flash the stock BIOS back on there. Trying to troubleshoot this with a BIOS you've messed with will be a nightmare. Next, stop trying to play games until you figure out what the problem is. The potential exists to kill the cards such that they require SMD component-level repairs if you keep doing that.

Now... I suspect that your problem is your power supply. The 690 is a thirsty card, and some odd quirks of its design mean that it demands 12 actual volts on its 12 volt rails in order to function properly. Unlike most other cards, it is not tolerant of lower voltage. If it doesn't have the required 12 volts, what happens is that the power MOSFETs, which require that voltage to turn on all the way, will only turn on like 90% of the way. This causes them to burn off a very large amount of energy as heat as they switch on and off, and no amount of cooling you can provide will be able to cool them adequately, regardless of how fresh the pads and grease are.

Here's what you do:
1. Get yourself a multimeter and set it to voltage mode.
2. Start the system up and while it's idling at the desktop, measure the voltage from the 12V pins at the PCI-E connectors to ground. The housing of the DVI ports is a good ground to use. What you want to see is above 12V. Something like 12.05. Take note of this measurement. Obviously, you need to be very careful to only touch the probes to the 12V pins and ground.
3. Start up a game or a benchmark, and wait three or four seconds for it to get running.
4. Repeat the measurement between the 12V pins and ground on the card. Take note of this measurement.
5. Shut the game down.

A few things to note: You need to take this measurement at the card. You can't use a loose pigtail from the power supply. It must be measured at the card. Also, the 12V pins are the ones that have about 4.5K ohms of resistance to ground at the PCI-E connector, with the cables disconnected. Figure this out before you turn the system on, so you know which terminals to probe.

I suspect what you'll find is that your "12V" rail is drooping into the 11.7-11.8 range, and after a few minutes, it may droop even lower. If you read the thread, you'll see that I figured this out by repeatedly killing two of my cards before replacing the (very old) power supply in my test bench with a new one. Especially if you're trying to eventually use two of these cards for quad SLI, you need a very powerful, healthy power supply. I used an AX1200i with good results with a single card. I'm not sure if it could handle two.
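If it helps, here's a small sketch of how to interpret the two readings from that procedure (the 11.9V cutoff is an arbitrary rule of thumb for flagging droop, not a number from any spec):

```python
# Compare the idle and under-load 12V readings taken at the card and flag
# meaningful droop. The 11.9 V cutoff is an arbitrary rule of thumb for
# illustration, not a number from any spec.
def check_droop(idle_v: float, load_v: float, minimum_v: float = 11.9) -> str:
    droop = idle_v - load_v
    verdict = "looks healthy" if load_v >= minimum_v else "suspect PSU/cabling"
    return f"droop {droop:.2f} V, {load_v:.2f} V under load: {verdict}"

print(check_droop(12.05, 11.98))  # healthy example
print(check_droop(12.00, 11.70))  # the kind of droop described in the thread
```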
 
Thanks for the reply, I really appreciate it.
I am indeed planning on doing a dual 690 setup once I get two cards running stable. It may not be practical, but I'm more interested in cool hardware than crazy FPS. And at only $89 a card for what was once a $1000 video card that can still hold its own, I find that awesome.
I bought these cards three weeks ago and was constantly getting DPC watchdog violations and black screens. So after I got the DPC violations figured out, the next thing I did was try a different power supply.
My original is a Thermaltake 875-watt unit. I ended up taking my EVGA 1300-watt out of a computer that I have running dual GTX 590s, and it didn't seem to make a difference at all.
I am using an EVGA X99 Classified board, so I decided to plug in the extra power connector that feeds the PCIe slots... but it didn't make a difference.
I cleaned the PCIe slot, but still no joy.
It really wasn't until I received the replacement card Monday that I was able to completely rule out any problems with my computer hardware or software, because the new card works perfectly.
I have a volt meter.... somewhere. I'll hunt for it and post my findings.
 
Yeah, you still need to do the voltage test. It doesn't matter what the rated wattage of the power supply is - it's the voltage that it actually provides that matters, and that can degrade over time.

Failing that, assuming it's not something like the GPU dies overheating (I assume you checked for this already, beyond just replacing the grease), I'd be inclined to suspect a board issue. It could be a cracked solder ball or trace, degraded dies, or something power related. You could conceivably try turning off SLI to see if the problem goes away when you only run one of the GPUs.

Edit: One other thing to check for is missing tiny SMD capacitors. These are super easy to knock off the board via rough handling, particularly when you see a pile of the cards stacked up like some ebay sellers like to show in their photos. As I showed in my thread about the 290X, you can get this kind of behavior if even one of those tiny caps is missing, if it's the right one.
 
I checked resistances on all of the pins. I also checked voltages on 3 of the 12V pins: the pin on the bottom row furthest to the left, the pin on the bottom row furthest to the right, and the pin on the bottom row second from the last on the right.
The bottom-row pin furthest to the right showed 0 voltage... but when I tested it for ohms, it showed close to 3k. The other two pins showed 11.5V but did not fluctuate at all, even when the screen went black. I figured this was low, so I decided to put it in the PC with the 1300W PSU (which is new, purchased just about a month ago). This showed the exact same results: no voltage at the bottom right pin, and a constant 11.5V with no fluctuations. Maybe my volt meter is reading a little on the low side.
How do I check to see if the GPU dies are overheating? I only replaced the paste and pads. I will do a side-by-side comparison and see if there are any components missing.
Another thing to mention that I had forgotten about: both cards would black screen just sitting at the desktop, not doing anything, that is, until I adjusted a setting in the nvidia control panel. I set the power management mode from Optimal to Performance, which raised the static voltage from .987 to 1.050.
Just this change prevented the computer from black screening at the desktop.
IMG_20200129_132143.jpg IMG_20200129_132149.jpg IMG_20200129_132157.jpg IMG_20200129_132206.jpg IMG_20200129_132213.jpg IMG_20200129_132220.jpg IMG_20200129_132231.jpg IMG_20200129_132240.jpg IMG_20200129_134446.jpg IMG_20200129_140411.jpg

I only uploaded pictures for the left bank; the right bank gives the exact same readings. The second-to-last picture is from the PC with the 875W PSU. The last picture is from the PC with the 1300W PSU.
 
Your resistance measurements look good. A few thousand is what you should expect to see across a whole graphics card.

11.5 volts is way too low. You need twelve. actual. volts. It may just be that your voltmeter reads a hair low, but if that's accurate, that's your problem right there. I'd try to get my hands on a digital one and do the voltage test again. You should definitely see a small difference in voltage between idle and running a game - like .2 volts or less. In the pictures, is the card running a game, or just idling?

You can use GPU-Z or Afterburner to check the GPU die temperatures, but that's probably not the problem if it's not stable even at idle. I think I might check the actual core voltage with a digital multimeter next, to see if it agrees with what's reported by GPU-Z or afterburner. If it's lower than expected, that will lead to instability as well.
 