Graphics Card Necromancy, Continued: Radeon R9 290X

Discussion in 'Video Cards' started by RazorWind, Jun 7, 2019.

  1. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,288
    Joined:
    Feb 11, 2001
    Folks! Following up on my previous thread here, we've got a new pile of dead graphics cards on the bench, and we're going to attempt to get at least one of them working.

    For those who prefer video, I made a video version.


    The cards in this case are Radeon R9 290Xs - once again, we'll call them Card A, B and C.

    IMG_4067.jpg

    Card C has belonged to me for several years. I actually traded for it with a fellow [H]er, back when it was relatively current. It works perfectly fine, although as you can probably guess from the photo, fans are somehow a consumable for these things.

    Cards A and B I bought on eBay for about $20 apiece. They're both "dead," although Card B did actually produce a picture when I tested it a couple of weeks ago. Subsequent tests resulted in no picture, though. All three cards are physically identical, but Card A has the reference BIOS, as opposed to the Sapphire overclocked version that B and C have.

    Given the apparently intermittent nature of the problem with Card B, we're going to concentrate on Card A as our candidate for repair first. Card C will serve as a reference, since it's undamaged, and works properly.

    I've already removed Card A's heatsink. Unsurprisingly, there's no sign of physical damage to the front of the PCB.

    IMG_4070.jpg

    But if we look at the back...


    IMG_4760.jpg
    What's this? A missing cap? Hmm...

    I'd be sort of surprised if the missing cap is the reason our card doesn't work. Those tiny caps are generally there to help filter out noise in the power plane, but as someone commented in the GTX 690 thread, his card didn't work with one of the smaller ceramic SMD caps broken off, so some designs may be sensitive enough that most of the caps are actually critical.

    Before we just replace that capacitor, though, we should do some additional testing, to make sure it's related to our apparent problem. First, we'll check resistance on each of our voltage rails to see if any are shorted or open. Remember, we're looking for between 1 and 1000 ohms. Anything in that range is probably OK.

    Here's VCore. Looks OK.
    resistance_vcore.jpg

    VDDCI (AKA memory power). Also looks OK.
    resistance_vddci.jpg

    The Aux rail. Also looks sane.
    resistance_aux.jpg

    The 1.8 rail. I think this is related to the display ports... looks sane.
    resistance_1_8.jpg

    The .95 rail - I don't actually know what this does, but it's required for function. Also looks sane...
    resistance_95.jpg

    Unknown SOP-8 chip on the back of the card, pin 8, which is usually the phase pin on this type of regulator. This looks sane too, although I don't know for sure what this IC does.

    resistance_unknown.jpg


    Ok, that's all of our resistances. We didn't find any shorts, so that's good, and there's nothing with a huge resistance, which might indicate a totally open circuit. Now we need to power the card up and see which rails actually run.
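
    The resistance rule of thumb above can be written down as a tiny check. This is only a sketch: it assumes the 1 to 1000 range is in ohms, the function name and labels are made up for illustration, and the readings in the example are hypothetical, not the values from the photos.

```python
# Rule of thumb from the post: a rail-to-ground reading between 1 and 1000
# (assumed to be ohms) is probably OK. Names and labels are illustrative.
SHORT_BELOW_OHMS = 1.0
OPEN_ABOVE_OHMS = 1000.0


def classify_resistance(ohms):
    """Return 'short', 'ok', or 'open' for a rail-to-ground resistance reading."""
    if ohms < SHORT_BELOW_OHMS:
        return "short"
    if ohms > OPEN_ABOVE_OHMS:
        return "open"
    return "ok"


# Hypothetical readings, NOT the ones measured on Card A.
example_readings = {"rail_a": 0.2, "rail_b": 35.0, "rail_c": 250000.0}
report = {rail: classify_resistance(ohms) for rail, ohms in example_readings.items()}
print(report)
```

    A reading that lands in the "ok" bucket only says the rail isn't obviously shorted or open. It doesn't prove the rail will actually run, which is why the powered-up voltage checks come next.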

    VCore - this should be 1.0 - 1.2 volts. So we know this rail isn't working.
    voltage_vcore.jpg

    VDDCI - this should be about 1.5 volts, so we also know that this rail isn't working either. Notably, this and VCore share a pretty complex controller.
    voltage_vddci.jpg

    .95 - Ok, this one is working.

    voltage_95.jpg

    1.8v Rail - This one is working too.
    voltage_1_8.jpg

    Aux - Not working. I have a feeling that this may be waiting for an enable signal from something else, maybe the memory rail. I think I mentioned in the GTX 690 thread that the output of one VRM is frequently wired up to the enable input on another, so that rails start up in a specific order.
    voltage_aux.jpg

    5V Rail - Also working. This is what powers the VRMs themselves. The controllers need power of their own, and in some cases it's also used for the gate drive of the MOSFETs.

    voltage_5.jpg

    Ok, so we've learned that something major isn't working at all. These symptoms lead me to suspect the problem lies in or around the control IC for our VCore VRM, which is shared with the VDDCI VRM. That's a pretty elaborate chip with 56 (!!!!) tiny pins. I think the next step is to look for anything simple, like power to it, or an enable signal that's missing, and for that, I need to consult the data sheet.
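
    For anyone keeping score at home, the powered-up checks boil down to comparing each reading against what the rail should be. Here's a rough sketch of that. The expected values for VCore and VDDCI are the ones quoted above, the 10% tolerance is my own assumption, and the sample readings are hypothetical stand-ins rather than the numbers on the meter in the photos. Aux is left out because I haven't stated an expected voltage for it.

```python
# Expected (low, high) volts per rail. VCore and VDDCI come from the post;
# the other rails use their nominal names. TOLERANCE is an assumption.
EXPECTED_VOLTS = {
    "vcore": (1.0, 1.2),
    "vddci": (1.5, 1.5),
    "0.95v": (0.95, 0.95),
    "1.8v": (1.8, 1.8),
    "5v": (5.0, 5.0),
}
TOLERANCE = 0.10


def rail_ok(name, measured_volts, tolerance=TOLERANCE):
    """True if the reading falls inside the expected range, widened by tolerance."""
    low, high = EXPECTED_VOLTS[name]
    return low * (1 - tolerance) <= measured_volts <= high * (1 + tolerance)


def dead_rails(readings):
    """Return the sorted names of rails whose readings are out of range."""
    return sorted(name for name, volts in readings.items() if not rail_ok(name, volts))


# Hypothetical readings for illustration only.
sample = {"vcore": 0.0, "vddci": 0.0, "0.95v": 0.95, "1.8v": 1.8, "5v": 5.0}
print(dead_rails(sample))
```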
     

  2. FlawleZ

    FlawleZ Gawd

    Messages:
    812
    Joined:
    Oct 20, 2010
    Subscribed (again). I hope to learn more from your endeavor.
     
  3. Bawjaws

    Bawjaws Limp Gawd

    Messages:
    433
    Joined:
    Feb 20, 2017
    I love this stuff, although it's way over my head. Looking forward to reading more!
     
  4. Meeho

    Meeho [H]ardness Supreme

    Messages:
    4,472
    Joined:
    Aug 16, 2010
    You're the text version of Louis Rossmann
     
  5. THUMPer

    THUMPer 2[H]4U

    Messages:
    3,054
    Joined:
    May 6, 2008
    Make sure you do a follow up. You'll have some views shortly. :D
     
  6. Thevoid230

    Thevoid230 n00b

    Messages:
    57
    Joined:
    May 7, 2019
    Was following your 690 thread and by coincidence, I just had a G92 card go dead short and blow up the mosfet in a PSU in a family computer.
     
  7. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,288
    Joined:
    Feb 11, 2001
    And here I was thinking he was the video version of me... :D

    Also, I made a video of this one.

    G92 = 8800GT generation? You probably got your money's worth out of that.

    A quick update:

    Three dead rails, two controllers. One (ONSemi NCP5230) has a reasonably complete data sheet. The other (Infineon IR3567B) just has a pinout, with no explanation of what the pins do. I suspect that's because it's super complex, it's designed to control two rails independently, and they actually program it at the factory for each customer's specific application.

    I did a lot of probing, and as best I can tell, I'm missing an enable signal for the memory rail. The trace has a test pad I can probe, but I think the other end of it is on the opposite side of the board, so I'm going to have to rig up a way to find it. I wanted to cover this in a video, so I didn't take any pictures. Without finding the other end of the trace, I can't be sure whether it's the memory waiting on the core/aux to power up or the other way around, but I don't think both of them are actually damaged. Another possibility is that the memory rail is starting up, but has a short I haven't found, and then aborts.
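
    To illustrate why one bad rail can make several look dead, here's a small sketch of enable chaining, where each rail only starts once the rail feeding its enable pin is up. The wiring in the example is purely hypothetical. I don't know the real start-up order on this board, and that's exactly the thing I'm trying to find out.

```python
def rails_that_start(enable_source, faulty):
    """Return the set of rails that come up.

    enable_source maps each rail to the rail whose output drives its enable
    pin, or None if it is enabled directly. faulty is the set of rails that
    cannot start on their own account.
    """
    started = set()
    changed = True
    while changed:
        changed = False
        for rail, source in enable_source.items():
            if rail in started or rail in faulty:
                continue
            if source is None or source in started:
                started.add(rail)
                changed = True
    return started


# Hypothetical wiring, NOT the actual 290X sequencing.
chain = {"5v": None, "aux": "5v", "memory": "aux", "vcore": "memory"}
print(sorted(rails_that_start(chain, faulty={"aux"})))
```

    With that made-up chain, a single fault on aux keeps memory and vcore down too, even though nothing is wrong with them. Swap the order around and the same three rails could be dead for a different root cause, which is why finding the other end of that enable trace matters.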

    Lastly, with great respect, I think Mr. Buildzoid may be incorrect in his 290X breakdown video. He claims that the memory and core VRMs are controlled by the IR3567B, and just kind of glosses over the Aux rail. This is untrue. In reality, it's the core and aux that are controlled by the IR3567B, and the memory is controlled by the NCP5230.

    Oh yeah, also Card B just started working again, at least well enough to load Windows. Not sure what's up with that, but it's convenient for testing purposes.
     
  8. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,288
    Joined:
    Feb 11, 2001
    Semi-bored at work, and browsing eBay, I found this:
    https://www.ebay.com/itm/MSI-Lightn...739910?hash=item2acd954b86:g:rbQAAOSwdtBc-KJT

    And this:
    https://www.ebay.com/itm/Titan-Z-12...587991?hash=item287ed51e57:g:eCIAAOSwTPdc-vrH

    I'm kind of tempted to buy that Titan Z...

    Anyway, progress!


    I tracked down the data sheets for our control and phase drive ICs, and tested the phase drive pins for each of the large power MOSFETs on the board. We need the data sheets because two of each mosfet's three terminals are hidden under that huge drain terminal on the top, so we need to know which pins on the controller they connect to, and check there.

    The VCore ones seem sane...
    High side:
    vcore_high.jpg

    Low Side:
    vcore_low.jpg

    The memory rail has a different controller with integrated phase drivers. Its pins are SUPER tiny, so I'm not even sure I'm probing the right pins, but this all looks at least kind of sane. I don't see anything that's obviously a problem here, so I'm moving on. I'll come back to it if I can't find anything else wrong...
    mem_high.jpg
    mem_low.jpg


    Finally, I looked at the Aux rail.

    High side looks alright.
    aux_hi.jpg

    But the low side...
    aux_low.jpg

    That's, uh, not so good.


    So, at this point, we know that some part of the low side gate drive on the Aux rail is basically shorted to ground. The problem could lie in either the phase drive IC or the MOSFETs themselves, but we can't tell which it is with both of them still on the board. So, let's remove the drive IC, since it's the easier of the two.

    Flux on...
    flux.jpg
    Heat it up...
    heat.jpg

    And off it comes.
    yank.jpg

    Now, we test the resistance on the low side gate again.
    better.jpg

    That looks much better. We'll confirm our issue by testing the IC we removed.
    yep.jpg

    Same resistance value as when it was on the board. So, while the MOSFETs may also be hosed, we know this IC definitely is.
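
    The remove-and-retest logic is simple enough to sketch in a few lines. The 10 ohm threshold and the function name are assumptions for illustration, and the readings in the example are hypothetical rather than the ones in the photos.

```python
def isolate_fault(board_after_removal_ohms, removed_ic_ohms, short_below_ohms=10.0):
    """Return a list of suspects after pulling the driver IC off the board.

    A low reading on the loose IC implicates the IC. A low reading still
    present on the board implicates whatever remains there (e.g. the MOSFETs).
    A healthy board reading does NOT prove the MOSFETs are good.
    """
    suspects = []
    if removed_ic_ohms < short_below_ohms:
        suspects.append("driver IC")
    if board_after_removal_ohms < short_below_ohms:
        suspects.append("parts still on the board")
    return suspects


# Hypothetical readings: board looks fine once the IC is gone, IC still reads low.
print(isolate_fault(board_after_removal_ohms=5000.0, removed_ic_ohms=2.0))
```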

    While I could cannibalize one of the working cards, I think I'll just source a new IC. I'll probably also get a couple of fresh MOSFETs, just in case. So, we'll reconvene once we have our spares on hand and can solder them back on the board.
     
  9. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,288
    Joined:
    Feb 11, 2001
    Aaaaand we're back. First, I'd like to apologize to those of you who were following along. I didn't mean to abandon this thread, but real life happened, and the card sat on my healing bench for two months, and I'd look mournfully at it every once in a while. Today, though, it's hot as balls outside, meaning it's a perfect day to get back after this.

    I eventually got my hands on some replacement MOSFETs and a replacement phase drive IC, and swapped the old ones out with the new ones. Here it is with the flux residue still on it.
    IMG_5126.jpg
    IMG_5125.jpg

    Unfortunately, the Aux rail, and thus the card, is still dead. At this point, I was pretty stumped. Clearly, there is something wrong with the circuit that creates this rail, but I've now replaced the three most complex and delicate components, and it's not fixed. Today, though, I took a closer look at one of the other 290Xs that I have, and I noticed that on my dead card, this little resistor here has no markings at all, which may not be indicative of damage, but struck me as odd, since they usually have at least some kind of marking. On my other cards, it's marked with a zero, which usually indicates that it's a zero ohm resistor.
    suspect_resistor.jpg

    A resistance check across it reveals an open circuit, though. I sanity checked this against one of my working cards, and sure enough, I get zero ohms on that card.

    A little more probing reveals that it's connected to pins 2 and 4 on our dead phase driver IC. These pins accept the voltage that will be supplied to the high side and low side MOSFETs' gate drives, respectively. If no voltage is supplied here, the phase driver won't be able to turn the MOSFETs on at all, because the signal it sends to them is dead. Perhaps unsurprisingly, the other side of the dead resistor is connected to our 12V pin on the PCI-E edge connector. Per the data sheet for the CHL8510/IR3537 phase drive IC, you're supposed to supply 12V to those pins.

    So, we've found a burned out zero ohm resistor, which seems to be used as a fuse. Our next step is to replace that resistor with a good one.
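
    Comparing against a known good card is what caught this, and the comparison itself is mechanical. Below is a minimal sketch, assuming you've written down the resistance across the same parts on both cards. The part names, the 1 ohm cutoff and the readings are all hypothetical.

```python
import math


def find_open_links(dead_card, reference_card, max_link_ohms=1.0):
    """Return parts that read as a link (near zero ohms) on the reference
    card but not on the dead card. Parts not measured on the dead card are
    skipped rather than guessed at."""
    suspects = []
    for part, reference_ohms in reference_card.items():
        if part not in dead_card:
            continue
        if reference_ohms <= max_link_ohms and dead_card[part] > max_link_ohms:
            suspects.append(part)
    return sorted(suspects)


# Hypothetical readings; math.inf stands in for an open-circuit reading.
reference = {"R_suspect": 0.0, "R_other": 0.0}
dead = {"R_suspect": math.inf, "R_other": 0.1}
print(find_open_links(dead, reference))
```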

    Here's our patient, once again prepared for surgery.
    IMG_5127.jpg

    Bad resistor removed.
    IMG_5129.png
    IMG_5130.jpg

    The pads cleaned up, and ready for the new resistor.
    IMG_5131.jpg

    New resistor installed. Note the "0" marking.
    IMG_5132.jpg
     
  10. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,288
    Joined:
    Feb 11, 2001
    Well, crap.

    I changed out the failed resistor, checked resistances on the rail's various pins, and then, satisfied I hadn't obviously made it worse, plugged the card back into the test bench and switched it on. Sadly, the aux rail is STILL dead.

    I triple checked that the proper voltage is being supplied at each of the phase driver's pins, and it is as far as I can tell. I've got 12V at VCC and both gate supplies, and ~11.5v at the bootstrap pin, which I think is appropriate if the phase isn't actually running. I then checked the numerous 0 ohm resistors on the board, and all of the minor rails one more time. None of the resistors are open, and all of the minor rails I tested in the original post are still working.

    At this point, there are only so many components on the card that could cause this, with the number one suspect being our main voltage controller, the IR3567B.

    As I mentioned before, this IC is super complex, and is also not very well documented. Furthermore, it's apparently programmed by the factory to the specs given by the customer. This is a problem in that even if I could remove it from the board and replace it (not a given, since it has a HUGE ground contact on the bottom), I would have to get one that's programmed for use on a reference 290X. I have two working boards that I could cannibalize, but that doesn't seem right, cannibalizing a working board to fix a dead one.

    Anyway, to rule out something simple, I did at least check that the obvious required voltages are present at the right pins. AMD was kind enough to provide probe pads near this thing, so I at least didn't have to probe the QFN pins directly.

    VCC for this IC is 3.3V. I've got 3.3 volts at the pad for pin 39 (blue), which is its main power, and at the enable pin (red). From what I can tell, if those are present, it should be attempting to run. I also checked the CFP pin (yellow), which is an output that goes high if the output of the VRM exceeds whatever its configured maximum voltage is. Zero volts there.

    IMG_5135.jpg

    And that's where I'm at. I suspect what may have happened is that one of the low side MOSFETs on the Aux rail shorted its gate pin to its drain. When this happened, it sent 12V back through the phase driver, and then on through the driver's current sense output to our controller, which lobotomized it.
     
  11. Halon

    Halon Limp Gawd

    Messages:
    289
    Joined:
    Aug 13, 2004
    Aww. That's too bad. I don't have your acumen for this stuff, but your MOSFET hypothesis sounds well-reasoned. Fare thee well, 290x.
     
  12. Kardonxt

    Kardonxt 2[H]4U

    Messages:
    2,991
    Joined:
    Apr 13, 2009
    Sad we won't get to see this thing resurrected but really impressed with how far you have made it. Way over my head and very interesting. Thanks for the thread!