Graphics Card Necromancy - Radeon R9 295x2

RazorWind

I concentrated on the video for this one, so I regrettably didn't get a ton of photos, but what we have on the bench today is 2014's king of heat, a Radeon R9 295x2. For those unfamiliar, this is two full-fat Radeon 290Xs on one board, which can be linked together using CrossFire, or used independently to drive many displays or for compute tasks. I have a soft spot in my heart for these dual core cards, just because I think they're cool. They're obviously not the most practical thing in the world, especially given that this card cost $1500 when it was new, which means you're paying almost a 50% premium to get those two GPUs on one board. This particular design is also interesting in that it features a novel dual-jacket liquid cooling unit that integrates a separate pump/jacket for each GPU and one shared radiator. As far as I know, this was the first time this design was used, although AMD did release a few similar designs later, which they called "Pro Duos."

252479_295x2_1.jpg


The cooling unit itself:
295x2_6.jpg 295x2_7.jpg

The cooling unit is contained inside this shroud assembly, and the two aluminum and copper plates pictured here are used as heatsinks for the memory and VRMs.
295x2_8.jpg 295x2_9.jpg

I actually have two cards, which we'll call A and B. A is the one that's stripped down here, mid-repair, and B is the one still wearing its cooling unit. Both cards came to me via eBay for about $50 apiece, not working. Card B looks pretty clean. Card A looked like it had lived a pretty hard life in a hot and humid environment, and has residue from a strange, very shiny thermal paste on it. Also, note how grubby the pads look, and the corrosion on the heatsink in the photos above.

295x2_4.jpg

Card A had some very obvious damage to one of these tantalum caps on the #2 12 volt rail, shown here with the offending cap removed. Given the obvious physical damage, I decided this was the better candidate of the two for repair. I did some quick probing on the card when I received it, and found that I had a 100 ohm short to ground on that rail, versus ~3K ohms on the other GPU's 12 volt rail. 3K is sane for the resistance through an entire graphics card. This would explain why the card doesn't work, and also indicates the potential for further damage if I tried to power it up, so we're going to fix the obviously broken bits before we try to test it.

295x2_5.jpg
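If you want to put numbers on why that 100 ohm reading is scary, here's the back-of-the-envelope version as a quick Python sketch. The readings are the ones above; the 500 ohm "shorted" threshold is just a rule of thumb I'm assuming, not anything from AMD:

```python
# Quick Ohm's-law sanity check on the two 12V rails, using the readings
# from the post. A healthy rail on this card reads in the low kilohms.

RAIL_VOLTAGE = 12.0  # volts

readings_ohms = {
    "#1 12V rail (good GPU)": 3000.0,   # ~3K, sane for a whole card
    "#2 12V rail (failed cap)": 100.0,  # way too low
}

for rail, ohms in readings_ohms.items():
    # Naive static draw if the fault were purely resistive; a real short
    # often conducts far more once the full 12V is actually applied.
    amps = RAIL_VOLTAGE / ohms
    verdict = "looks shorted" if ohms < 500 else "plausible"  # 500 is my guess
    print(f"{rail}: {ohms:,.0f} ohms -> ~{amps * 1000:.0f} mA ({verdict})")
```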

I don't know what's wrong with Card B. I have plugged it in and tested it, and it really doesn't work, but beyond some physical damage to these two tiny SMD caps that I assume are for noise filtering on one of the PCI-E lanes, I don't see anything obviously wrong with it. That's the extent of the testing I've done, though. We'll get to this card eventually, but I suspect that the problems with it run deeper than just those two caps. Note the nick in the backplate nearby - I suspect the card got dropped or mishandled after whatever actually killed it happened.
295x2_10.jpg

Anyway, back to Card A...

I did some resistance checks, and here's what I found:
295x2_11.jpg

#1 Vcore (bright red): 4.0 ohms - looks ok
#1 Memory (bright purple): 121 ohms - looks ok
#1 Memory controller (light yellow): 47 ohms - probably ok

#2 Vcore (dark red): 4.0 ohms - looks ok
#2 Memory (dark purple): 138 ohms - a little high, but probably alright
#2 Memory controller (dark yellow): 60 ohms - also a little high, but alright
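Since the two halves of this card should be near-twins, the easiest sanity check is just comparing each rail against its counterpart. Here's that logic as a little Python sketch, using the measurements above; the 30% tolerance is arbitrary on my part, since without a schematic "close to its twin" is the best reference available:

```python
# Compare each rail's resistance between GPU #1 and GPU #2 and flag
# anything that drifts too far. The tolerance is a made-up number, not a spec.

TOLERANCE = 0.30

rails_ohms = {
    # rail name: (GPU #1 reading, GPU #2 reading)
    "Vcore": (4.0, 4.0),
    "Memory": (121.0, 138.0),
    "Memory controller": (47.0, 60.0),
}

for name, (r1, r2) in rails_ohms.items():
    drift = abs(r1 - r2) / min(r1, r2)
    verdict = "worth a closer look" if drift > TOLERANCE else "close enough"
    print(f"{name}: #1 = {r1} ohms, #2 = {r2} ohms, drift {drift:.0%} -> {verdict}")
```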

I'm not sure what the other voltage regulators do; I can't tell without powering the card up and checking them. I would guess that 1.8V is produced by the ones labeled in dark green (one for each GPU), the light green is a 1.8V-ish supply for the PLX bridge, and the white is maybe 0.95V, which I think has something to do with the display drive. The one by the DVI connector in dark gray is an ON Semiconductor 78M05G, which produces 5V, presumably for the display output. The ICs labeled in blue are most likely BIOS chips, since there are four of them, but this form factor is sometimes used for small VRMs, like we saw in my thread about the GTX 690.

Here's the back of card A, as it sits now:

295x2_2.jpg

The remains of the failed capacitor. From what I can determine, this is (was) a Panasonic 16TQC100MYF tantalum electrolytic. These are among the best, most expensive capacitors on the market, prized for their SMD mounting, compactness, and very good high-frequency noise filtering, and they appear to be used here for space reasons. In bulk, these things cost about $2 apiece, meaning we're looking at $20 worth of capacitors just in the two 12V banks on the back of the card. Once you start thinking in those terms, it's a little easier to understand why high-end graphics cards cost so much. It should be noted that there are more of these caps on the front, at the right-hand side, south of the power connectors.
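To spell that math out (the cap count is my own estimate from the photos, and the bulk price is approximate):

```python
# Back-of-the-envelope cap cost for the two 12V banks.
CAPS_IN_12V_BANKS = 10  # assumption: total count across both banks
UNIT_COST_USD = 2.00    # rough bulk price per cap
print(f"~${CAPS_IN_12V_BANKS * UNIT_COST_USD:.0f} in tantalum caps")  # ~$20
```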

295x2_3.jpg

When I heated the failed cap up, it sort of disintegrated, and the ceramic casing on the outside started flaking off, as you can see there. This card has a ton of copper in that area, and it took a ton of heat to unsolder this. I'm a little worried I may have damaged something else, as the resistance on that rail went up to 20Kish once I removed it, which is pretty high, but we'll see once I get the new cap installed.

And that's where I'm at. I made a video for YouTube here, for those who prefer that:
 

Aww yiss!!



Moving right along, I finally got the new caps in my hand, and proceeded to clean up the pads and solder a fresh cap in its place. The markings on the cap indicate the capacitance (100 uF in this case) and the manufacture date, which is why the new cap's markings differ from the others. Soldering it on there was tough - the card has so much copper in that area that my soldering iron has a hard time getting it hot enough. I may reheat it with my hot air station later and see if I can get it soldered a little neater. Anyway...
295x2_1.jpg

I sanity checked the resistance across the two 12V rails against card B, and everything checked out. The number 1 rail tops out around 3.2K, and the number 2 rail tops out around 25K. Given that card B didn't work either, that's not a guarantee that we're fixed, but it's something.

I then reinstalled the CLC on the cores, using Arctic Silver 5 instead of the wacky bright silver crap that was on there before. Whatever the other stuff is, it's a pain in the neck to remove. Anyway, here it is, ready to test.
295x2_2.jpg

So, I plugged the card in, flipped the switch, and...


Signs of life!
295x2_3.jpg

I didn't have an OS drive plugged in, so I don't know if it will work in Windows just yet, but it at least works well enough to produce an image. There's something wacky going on with the BIOS that looks... not good, like it's in a different language or something, but we'll worry about that later. For now, I need to fully reassemble the cooling unit so I can run the card without risking damage to the VRM.
 
I have a soft spot in my heart for these dual core cards, just because I think they're cool.

These are dual GPU cards, not dual core.

I'm a little worried I may have damaged something else, as the resistance on that rail went up to 20Kish once I removed it, which is pretty high, but we'll see once I get the new cap installed.

I don't know which points you were measuring, but you want to be as far away from 0 ohms as possible between a high and low impedance part of the circuit. Less resistance means more current will be drawn. The fact it jumped up to 20k ohms means you removed a short on the power rail.
 
Very cool. You have some nice skills there. Glad to see people making the most of old hardware!
 
When dealing with the internals, it makes more sense referring to them as cores.

The card has two discrete GPUs on it, it's a dual GPU card. Calling it a dual core card makes no sense because each GPU die has 2,816 shader cores on it.
 
These are dual GPU cards, not dual core.

I bet you're really fun at parties. :D

Edit: I guess you're not technically wrong, though. Fair enough.

I don't know which points you were measuring, but you want to be as far away from 0 ohms as possible between a high and low impedance part of the circuit. Less resistance means more current will be drawn. The fact it jumped up to 20k ohms means you removed a short on the power rail.
Experience working on graphics cards of this vintage has taught me that what you want is not necessarily high resistance, but correct resistance. Too high is unlikely to cause any damage, but it's also a clue that you haven't fixed the problem yet. I talked about it in the video, but once I removed the failed cap and the resistance went up to 20Kish, I couldn't be sure whether that was just normal, or whether I had damaged something in the process. The other rail's resistance was much lower, even though the two rails appear to be nearly identical, with the remaining components apparently connected to power via the slot connector.

If you read my second post, I talk about how I got out Card B to compare it to, and saw that the resistance through both rails on that card matches Card A's, so either the same thing is wrong with both of them, or 3K on #1 rail and 20K on #2 rail is normal. I concluded that it's probably the latter, but lacking a schematic, it's hard to say for sure without powering the card up to see if it works or not.
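To make that "correct resistance" idea concrete, here's roughly the decision rule I'm applying, sketched in Python. Card B stands in as the reference, and the 2x acceptance band is my own guess; without a schematic, there's no authoritative number:

```python
# "Correct resistance, not just high resistance": judge a reading against
# a known(ish)-good reference card, flagging both directions of failure.

def judge_rail(measured_ohms: float, reference_ohms: float, band: float = 2.0) -> str:
    if measured_ohms < reference_ohms / band:
        return "too LOW - possible short, don't power it up"
    if measured_ohms > reference_ohms * band:
        return "too HIGH - possible open trace or missing part"
    return "in the expected range"

# Card A after the repair, judged against Card B's readings
print("#1 rail:", judge_rail(3200, 3000))    # ~3.2K vs ~3K
print("#2 rail:", judge_rail(25000, 25000))  # both cards read ~25K on #2
```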

Anyway, I was all set to get started reassembling the cooler when I realized that the thermal pads for the memory chips are super thin, so I had to order some in that thickness. I should probably go outside or something...
 
The card has two discrete GPUs on it, it's a dual GPU card. Calling it a dual core card makes no sense because each GPU die has 2,816 shader cores on it.

Again, we get that. But he said he reinstalled the CLC on the cores. You wouldn't say he reinstalled it on the two CARDS, as it is on just one card. As if anyone was confused and thought he reinstalled the CLC on each one of the 2,816 cores.
 
Again, we get that. But he said he reinstalled the CLC on the cores. You wouldn't say he reinstalled it on the two CARDS, as it is on just one card. As if anyone was confused and thought he reinstalled the CLC on each one of the 2,816 cores.
I think he was giving me shit for calling it a "dual core card" in my original post, which he's not really wrong about.

It's just that that's a super pedantic thing to give me shit about when my soldering looks that bad and I'm clearly just guessing at what the problem is.
 
The picture of the card shows 2 big cores so calling it dual core is not really a big deal. People with 1/2 a brain will figure it out.
 
There are probably hundreds of high end cards with similar problems on eBay that you could flip for a profit with your skills.

Reducing the amount of e-waste is a great bonus too.
 
Good example: a quick $50-$100 GTX 1080 Ti:
https://www.ebay.com/itm/EVGA-GeFor...038055&hash=item3d979cf62d:g:AL4AAOSwj-5d0aeR

"Puff of smoke, and then didn't work." Could be just a blown cap as well.
That closed for almost $300, which, as I recall, is not much more than I sold a like-new one for a few months ago.

I've thought about this before, and no matter what angle I approach it from, I don't think I could really make the economics of it work out as a business.

* The really high end cards come with warranties that last much longer than the card is relevant for
* Ebay, at least, is not a cheap enough source to leave much room for profit when reselling the cards as working.
* I don't think it would go too well to try to sell repaired cards without disclosing they've been repaired, and given how angsty even HardForum users seem to be about expecting used parts to be perfect, I don't think you could expect anyone to pay enough for them to be worth it.

What could maaaaaybe work is just repairing broken cards for their owners, like Louis Rossmann does with MacBooks, but even that is a long shot. He charges about $350 for a typical logic board repair, which is a lot easier to justify on a $2000 laptop with as short a warranty as they tend to come with, especially now that their soldered-in SSDs trap the user's data in there. Graphics cards, on the other hand, have too short a lifespan and too long a warranty to really be worth the cost of repairing for most people.

I should probably be clear that I don't actually need this card for the purpose of playing games. I just like fixing broken stuff and posting on YouTube.
 
I love seeing your posts and videos covering the troubleshooting and repair work. I have a defective card I would love to somehow resurrect. I'll shoot you a PM about it.
 
* Ebay, at least, is not a cheap enough source to leave much room for profit when reselling the cards as working.
* I don't think it would go too well to try to sell repaired cards without disclosing they've been repaired, and given how angsty even HardForum users seem to be about expecting used parts to be perfect, I don't think you could expect anyone to pay enough for them to be worth it.

People sell refurb and repaired parts on eBay all the time; I used to do it years ago, before eBay started screwing individual sellers in favor of people or companies running large stores pushing high volumes of products.

What could maaaaaybe work is just repairing broken cards for their owners.

I've done that a few times here and on another forum. There's really no profit in it; it's just a hobby of repairing old gear.

I got an Xbox 360 E and a controller from the dump a couple of weeks ago for $10 that were half submerged in water and corroded to hell. I just got it and the controller working on the bench a few days ago. The controller was a basket case - one of the analog stick assemblies was a rusting hulk, and several microswitches were the same. I got replacements off eBay and it works fine now. The only thing that can't be fixed is the connector for the optional NiMH battery pack, as the MOSFET on the board and its pads on the PCB corroded away (batteries were left in the controller when it was submerged in water), so it'll always be limited to two AA cells. I need to find a tiny microswitch for the wireless button on the top of the controller as well, but I don't really know what it's for, since you can push the Xbox logo to connect to the console.
 
I got an Xbox 360 E and a controller from the dump a couple of weeks ago for $10 that were half submerged in water and corroded to hell. ... I need to find a tiny microswitch for the wireless button on the top of the controller as well, but I don't really know what it's for, since you can push the Xbox logo to connect to the console.
As I recall, the button on the top is used for pairing. You can turn the controller on with the "jewel" button, and it will reconnect to the last device it was used with, but to use it with a different device, you need that button to put it into pairing mode. It's pretty impressive that you managed to get something going that had been submerged in water for days with a battery installed.

Anyway, I spent some time last night working on the card. I shot some B-roll for the next episode, covering reassembly of the heatsink: I cleaned off the flux residue, removed the gross old thermal pads, and replaced them with about 40(!) new ones, which I cut by hand from sheets of thermal pad material. Then I shut the camera off and tried to test the card again.

The card sort of works in Windows, with the "Microsoft display adapter" driver loaded. It seems to choke once it tries to load whatever Windows thinks is the real driver (that is, not the one direct from AMD), and I just get a black screen with the display connected to it. I have some ideas about why this is, which I'll test tonight, hopefully. If I use a second card for the display, it seems to work properly, and I have a hunch I could mine the shit out of some Ethereum with it, but I didn't have time to try.
 