Bought an ex-mining RX480 and it's crashing at stock

Discussion in 'Video Cards' started by ChefJoe, May 13, 2019.

  1. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    Hi, I recently bought an MSI Gaming X RX 480 4G card. I removed my prior Nvidia drivers using DDU, put in the RX 480, and then installed Adrenalin. I'm new to AMD graphics cards and their WattMan utility, so I'm trying to decipher what adjustments are reasonable and where to make them.

    The card started up with the right clocks and a valid bios number, so I'm thinking it was flashed appropriately back to stock bios. However, when I tried to run Firestrike or Time Spy benchmarks, the card was crashing hard (blank screen, required full power and cmos reset to get running again).

    I thought it might be the incorrect bios for the card, so I extracted the bios using GPU-Z and asked on MSI's forum for the bios that belongs with the serial number. The bios provided by MSI matches the SHA value of the one I extracted, so I have not flashed the bios and have relied on what the seller did.
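
    The hash check described above can be sketched in a few lines of Python. This is only an illustration: the post does not say which SHA variant was compared, so SHA-256 is assumed here, and the function names `sha256_of` and `same_image` and the file names in the demo are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_image(a: Path, b: Path) -> bool:
    """True when both files hash to the same digest."""
    return sha256_of(a) == sha256_of(b)


if __name__ == "__main__":
    # Demo with placeholder bytes standing in for real BIOS dumps (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        extracted = Path(tmp) / "extracted.rom"
        from_vendor = Path(tmp) / "from_vendor.rom"
        extracted.write_bytes(b"\x55\xaa" + bytes(1024))
        from_vendor.write_bytes(b"\x55\xaa" + bytes(1024))
        print("match" if same_image(extracted, from_vendor) else "different")
```

    Matching digests only show that the two files are byte-identical. They say nothing about whether that image is the right one for the hardware.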

    I started from bios-default for the card (1303 / 1750 gpu/mem clock) and then disabled zero RPM, adjusted the power level higher, and used the slider to adjust GPU frequencies down 10%. Then I could game and even run benchmark programs without the hard crash, but it was like the card was briefly freezing every 2-3 seconds. When I looked at the clock graph in GPU-Z over a longer window, it looked like the card was throttling back frequently. When I only dialed the GPU back -7%, it was crashing mid-run.

    I'm doing what testing I can on my end to see if this card is broken in some way or if there's something I'm overlooking, but the added confusion of new tools and different terminology is pretty high.

    Does anyone out there have any thoughts about these sensor readings, which were recorded while running Time Spy, or suggestions on what to try or what might be wrong? t6-graphs.PNG t6-wattman.PNG
     
    Last edited: May 13, 2019
  2. N4CR

    N4CR 2[H]4U

    Messages:
    3,518
    Joined:
    Oct 17, 2011
    Turn memory speed down. Miners usually abuse the memory more than anything else and then claim the cards have had an easy life. On Vega it's an issue due to HBM degradation, so I'm curious and suspect you have unhappy GDDR. Good luck.
     
  3. thebufenator

    thebufenator [H]ard|Gawd

    Messages:
    1,090
    Joined:
    Dec 8, 2004
    RAM wouldn't cause crashes with RX cards, just graphical issues.

    I know you said you checked the bios, but I suggest reading the values in that bios using Polaris Bios Editor. The graph you posted shows a very solid VDDC which looks too low, as well as really high temps.
    Also, those downclocks are when the core temp spikes.

    My guess is that your heatsink needs to be re-seated with thermal paste, and verify the bios is correct.

    If you post a link for your bios that you extracted I can take a look at it.
     
    ChefJoe and auntjemima like this.
  4. Eymar

    Eymar Limp Gawd

    Messages:
    232
    Joined:
    Sep 15, 2005
    Temps and fan speed look high under load. I don't think it should touch 75C or go above 2000 RPM if it's the huge MSI two-fan heatsink. Maybe dried-up thermal paste?
     
  5. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    It's easy enough to try, so I can attempt it.

    The bios was at MSI's forum in this thread. With the SHA values the same, I figured that's a pretty strong indication not to mess with it just yet.

    Yeah, I did expect I'd probably need fresh thermal paste and possibly a set of new fans, but I figured I'd worry about doing that TLC after just running stock for a bit. The temp isn't completely out of line with blower card values, but maybe if it's throttling down the gpu and still hitting mid 70s there's more of an issue.

    Perhaps I'm changing too many things at once, but I backed the memory down from 1750 MHz to 1600, the memory voltage down to 975 mV from the 985 mV default, and went down the GPU states list and reduced the voltages by 1 notch (roughly 50 mV) the whole way through the list. It didn't crash, but the trace doesn't look all that different and it was still doing the ~2 second interval stuttering while running Time Spy.
    t8-graphs.PNG
     
    Last edited: May 13, 2019
  6. thebufenator

    thebufenator [H]ard|Gawd

    Messages:
    1,090
    Joined:
    Dec 8, 2004
    Is 1175 the max core clock?
     
  7. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    Stock from the bios is 1303 but my -10% setting in wattman brings it to 1175.

    The more I look at temperature graphs from reviews, the more I'm seeing this card's rapid temperature fluctuations look off.
     
    Last edited: May 13, 2019
  8. Joust

    Joust 2[H]4U

    Messages:
    2,468
    Joined:
    Nov 30, 2017
    Is the card able to breathe?

    Are the fans running? Set them to max. Damn the noise: cool the card. If you have ANYTHING like the temp spikes you're getting, with the fans running, you're not transferring heat worth a crap. A good cleaning wouldn't hurt. Thermal paste at the same time.

    If you suspect the BIOS is wrong - and I would only get there after exhausting temperature issues - it's a bit of a hassle but it is fixable. That being said, if you bone up that deal, it moves from a mere inconvenience into a real bitch.

    This is why I insisted on doing BIOS flashes on all the (many) cards I sold. I had the original bios, literally from each card, to put back on.
     
  9. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    The card has plenty of room to breathe: the side panel is off and there's like 5 inches + of space below the card.

    At first I suspected the bios might be wrong, since it couldn't handle stock and there are a few bios files at techpowerup for this model card. Now I've peeked at the card a lot more carefully and I noticed a screw is completely missing out of the backplate, which is the piece with the serial number sticker on it. The little MSI warranty screw sticker is still on, so I kind of presume it's the original TIM.
     
  10. N4CR

    N4CR 2[H]4U

    Messages:
    3,518
    Joined:
    Oct 17, 2011
    Is the screw for the GPU mount?

    I've never seen a temp graph that granular before... paste is definitely worth checking.
     
  11. RazorWind

    RazorWind 2[H]4U

    Messages:
    3,183
    Joined:
    Feb 11, 2001
    Yeah, the temp/clock/power graph looks pretty funny, to me. I'd be checking the heatsink mounting and TIM. If you're missing a screw, that's probably the source of your problem right there, if it's one of the four that hold the heatsink on the GPU.

    I'd probably also try to get my hands on the real deal stock bios image and just flash it on there. Who knows what the miner did, and whether or not he or she actually flashed it back.
     
  12. ccityinstaller

    ccityinstaller 2[H]4U

    Messages:
    3,839
    Joined:
    Feb 23, 2007
    Which driver are you running? I had some FUNKY issues with an RX 550 4GB that was a miner dump back in spring. I got it for $85, which was dirt cheap at the time, and I was power maxed running 3 OC'd and undervolted VEGAs (along with the 7 large fans, pump, and 4.3GHz 2700 AC turbo in sig, on a Focus Plus 850W Gold). I needed to stay under 75W, so I got a card made for that, knowing I was only going to be using ~35W or so for HTPC duties (x265 decoding etc). The card would run fine at its stock 1750MHz RAM for about 20 mins, and while temps never were very high (low 50s C with the single fan 1/2" from my case side), it would start to artifact or give a solid green screen.

    I tried everything from driver 18.4.3 to the 18.7.x before ruling out a driver issue. I pulled the BIOS stats from P. Bios Editor and restored to stock with no luck. I then DL'd the factory BIOS direct from MSI and flashed 3 times with no luck, reinstalling drivers or not. Finally I had a mining buddy shoot me over a BIOS for his Asus 4GB 550 and I flashed that. Windows booted and the card installed, but it was giving me a code 43 error (but HTPC playback was FIXED)...

    I left it this way until tearing down my loop and replacing the VEGAs with a pair of VIIs. I moved the 550 to an old AMD P2 X945 system I had collecting dust while waiting for GPU blocks to be released for the VIIs. I decided to reflash the MSI BIOS (this was with 19.2.x installed now in order to support the VIIs) and the system rebooted, the card was found, and WattMan picked it back up. Code 43 was gone, and it has worked every day running HTPC duties on high bitrate x265 (it ramps the card up quite high for 2+ hours at a time) with no issues...

    Sorry for the long-winded post, but I would try redoing the TIM, and then reflashing the stock BIOS. DDU and install fresh drivers once the card comes back up. If that does not fix it and the card uses a reference PCB, then flash with another vendor's reference PCB BIOS, or seek a return through the vendor or your payment provider.
     
  13. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    Nope, it was a screw at the "far from the display connections" end of the GPU that connects the backplate to that metal plate on the front that connects to most of the components but isn't the main heatsink base. All 4 screws of the main heatsink base are there and have the little springs present.

    What I was provided at the MSI forums for the serial number had the exact same SHA value as what I'd extracted from the card using GPUz. They said it was flashed back to stock and it was in their "test rig" before sending, but when I said I was having issues and inquired about what tests were done/if there's any troubleshooting suggestions they have, I've not gotten any response back from them. Different coasts, work schedules, etc so I'm trying to be patient considering I first messaged them about issues Sunday evening.

    I started out with the 19.4.1 version of Adrenalin and then uninstalled using DDU and installed 19.5.1 yesterday in case there was some odd bugfix, possibly.

    Taking apart the card to re-seat/replace TIM or even bios flashing is not something I want to do without the seller's acknowledgement/permission to attempt it. Right now, the card crashes at stock settings and, if downclocked, does short freezing every 2 seconds during 3d games/benchmarks such that it's choppy in games and scores about half the benchmark score the GPU should. I think that's a very strong case if I should have to go through a return process. If I start removing heatsinks or bios flashing, those are things I'd only attempt with permission from the seller.
     
    Last edited: May 14, 2019
  14. Eymar

    Eymar Limp Gawd

    Messages:
    232
    Joined:
    Sep 15, 2005
    The GPU clocks have a stair-step look, which points to either power or temp throttling. Since the target temp for the MSI Gaming RX 480 is 73C ( https://www.techpowerup.com/vgabios/185789/msi-rx480-4096-160720 ), that is the most likely reason for the throttled clocks. That could be due to TIM, fans, heatsink seating, or the GPU has indeed gone bad. If you're not replacing the TIM, then manually set the fans to 75-100% before running benchmarks to see if that helps.
     
  15. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    I believe that temperature target is specifically for the fan speed control and thermal throttle for the gpu is 90C. Right now the most favorable "the GPU works" outcome is if I get permission from the seller to remove the heatsink to check on/replace the TIM/make sure the heatsink is mounted on the chip. I just don't want to be fighting with them about my competency in doing that if replacing TIM doesn't fix the problem.

    The card was from FS/FT here so generally the flavor is nobody does that level of ebay-accusations.

    edit: I set GPUz to record with 0.3 sec resolution and recorded the start-up of TimeSpy with the prior -10%, reduced memory speeds, etc. I'm not sure it's physically possible for the actual GPU temperature to jump 10 C in 0.3 seconds, even if the TIM was gone.

    Code:
    Time       GPU Clock [MHz]   GPU Temperature [°C]   GPU only Power Draw [W]   VDDC Power Draw [W]
    34:55.6    1104.6            74                     82.2                      56.9
    34:55.9    361.8             64                     28.7                      17.9
    34:56.2    1124.4            73                     91.3                      72.7
    34:56.5    420.2             66                     30.3                      13.7
    34:56.8    1071.9            70                     100.7                     81
    34:57.2    610               70                     35.8                      18.2
    34:57.5    944.1             68                     101.5                     82.3
    34:57.8    949.8             72                     54.1                      35
    34:58.1    738.5             64                     54.7                      58.7
    34:58.4    1109.2            73                     68.3                      58.3
    34:58.7    396.2             63                     18.3                      10.2
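
    As a rough way to put numbers on that log, here is a small Python sketch that parses the rows above and reports how far the temperature reading moves between samples and how often it moves in the same direction as the clock. The variable and function names are illustrative, and the rows are simply retyped from the table.

```python
# Rows copied from the GPU-Z log above:
# time, GPU clock [MHz], GPU temperature [C], GPU-only power [W], VDDC power [W]
LOG = """
34:55.6 1104.6 74 82.2 56.9
34:55.9 361.8 64 28.7 17.9
34:56.2 1124.4 73 91.3 72.7
34:56.5 420.2 66 30.3 13.7
34:56.8 1071.9 70 100.7 81
34:57.2 610 70 35.8 18.2
34:57.5 944.1 68 101.5 82.3
34:57.8 949.8 72 54.1 35
34:58.1 738.5 64 54.7 58.7
34:58.4 1109.2 73 68.3 58.3
34:58.7 396.2 63 18.3 10.2
"""


def parse(log):
    """Turn the whitespace-separated log into (seconds, clock, temp, power) tuples."""
    rows = []
    for line in log.strip().splitlines():
        stamp, clock, temp, power, _vddc = line.split()
        minutes, seconds = stamp.split(":")
        rows.append((int(minutes) * 60 + float(seconds),
                     float(clock), float(temp), float(power)))
    return rows


rows = parse(LOG)
pairs = list(zip(rows, rows[1:]))

# Change in reported temperature between consecutive samples.
temp_steps = [b[2] - a[2] for a, b in pairs]
largest_step = max(abs(step) for step in temp_steps)
peak_temp = max(row[2] for row in rows)

# Count sample pairs where clock and temperature moved in the same direction.
same_direction = sum(1 for a, b in pairs if (b[1] - a[1]) * (b[2] - a[2]) > 0)

print(f"largest step between samples: {largest_step:.0f} C")
print(f"peak reading: {peak_temp:.0f} C")
print(f"clock and temperature moved together in {same_direction} of {len(pairs)} steps")
```

    For the eleven rows above this gives a largest step of 10 C, a peak of 74 C, and the temperature moving with the clock in 8 of 10 steps.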
    
     
    Last edited: May 14, 2019
  16. thebufenator

    thebufenator [H]ard|Gawd

    Messages:
    1,090
    Joined:
    Dec 8, 2004
    btw, you probably don't want to use the bios pulled with GPU-z. Whenever I did that, it was always a different file from what atiflash would pull. Atiflash would also refuse to flash a bios pulled with GPU-z saying something like, "device ID mismatch". I do not know why.


    I suggest using the newest ATIflash on techpowerup -- https://www.techpowerup.com/download/ati-atiflash/

    Download the bios, then use Polaris Bios editor to read it -- https://github.com/IndeedMiners/PBE...ses/download/1.7.2/PolarisBiosEditor1.7.2.zip

    Will take you no more than a few minutes to see what the power steps are set to.
     
  17. ccityinstaller

    ccityinstaller 2[H]4U

    Messages:
    3,839
    Joined:
    Feb 23, 2007
    I'm willing to wager it's still on a modified 1 click strap timing or the power limit is turned down at the bios level, either way using PBE.


    If you bought it here, I would simply return the card. There are too many great deals on used cards to deal with a bad one.
     
  18. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    The longer filename (top picture) is what I extracted using atiflash just now. The bottom picture is the file that MSI support provided me based on the serial number sticker on the backplate. I can't see any difference and the CRC values (I think in green) came up the same.

    polaris-extracted.PNG
    polaris-163.PNG
     
  19. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    12,090
    Joined:
    Oct 7, 2000
    my rx470 works best with afterburner, not wattman. you could try that but would need to ddu and reinstall the new drivers to undo the wattman control. but as mentioned, you should redo the TIM and make sure the heatsink screws are good and tight. if it were ram you would get sparkles/glitches, not crashes.
     
  20. thebufenator

    thebufenator [H]ard|Gawd

    Messages:
    1,090
    Joined:
    Dec 8, 2004
    Do the tail end of the timing straps look same? The part that is hidden?
     
  21. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    Yeah, I don't see any difference.

    extracted from card -

    polaris-extracted-timings.PNG
    163 straight from MSI
    polaris-163-timings.PNG
     
  22. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    12,090
    Joined:
    Oct 7, 2000
    the bios looks fine to me, there's something else going on. do you have any other monitoring programs or afterburner running in the background? i see you have tried downclocking but have you tried bumping the power limit?
     
  23. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    No other monitoring programs than GPUz, HWinfo, and the Adrenalin panel.

    After it crashed using bios defaults I actually bumped up power limits to 50% and progressively backed down from that, even with a -10% on the gpu frequency. It was still doing that rapid gpu frequency/temp cycle.
     
    pendragon1 likes this.
  24. auntjemima

    auntjemima [H]ardness Supreme

    Messages:
    4,339
    Joined:
    Mar 1, 2014
    This is clearly a paste issue. It's stuttering because it's overheating and downclocking. It's the very first thing you should have done when you noticed fan speed at near 100% and high temperatures.

    If you won't do it without the seller's permission, then get it and do it.
     
    pendragon1 and RAutrey like this.
  25. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    My concern with calling it overheating is that the vbios is set to thermal throttle at 90 C and shutdown at 94 C. When I put the card under stress for 1 min and record with 0.3 second intervals with gpuz, it not only rapidly bounces around (sometimes 10 degrees between measurements) but the highest temp I record is 75 C. I would have expected a reading closer to the thermal throttle setting if that was the cause.
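
    That reasoning can be written down as a quick check against the 0.3-second GPU-Z log posted earlier in the thread. The 90 C throttle limit and the 1175 MHz target come from the posts. The "less than half the target clock" cutoff for what counts as a drop is an arbitrary assumption for the sketch, as are the names.

```python
THROTTLE_C = 90        # thermal throttle limit read from the vbios (per the post)
TARGET_MHZ = 1175      # core clock after the -10% WattMan setting (per the thread)
DROP_FRACTION = 0.5    # assumption: below half the target counts as a "drop"

# (GPU clock MHz, GPU temperature C) pairs from the earlier GPU-Z log.
samples = [
    (1104.6, 74), (361.8, 64), (1124.4, 73), (420.2, 66),
    (1071.9, 70), (610.0, 70), (944.1, 68), (949.8, 72),
    (738.5, 64), (1109.2, 73), (396.2, 63),
]


def drops_below_throttle(samples, target_mhz=TARGET_MHZ,
                         drop_fraction=DROP_FRACTION, throttle_c=THROTTLE_C):
    """Return samples where the clock collapsed although the reported
    temperature was still under the thermal throttle limit."""
    cutoff = target_mhz * drop_fraction
    return [(clock, temp) for clock, temp in samples
            if clock < cutoff and temp < throttle_c]


drops = drops_below_throttle(samples)
hottest = max(temp for _, temp in samples)

print(f"{len(drops)} clock drops with the reading under {THROTTLE_C} C")
print(f"hottest reading: {hottest} C, {THROTTLE_C - hottest} C under the throttle limit")
```

    Every drop it finds sits well under the 90 C limit. That fits the point above: either the sensor reading is not capturing the real peak, or something other than the vbios thermal throttle is pulling the clock down.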

    I hope to hear from the seller.
     
  26. ryan_975

    ryan_975 [H]ardForum Junkie

    Messages:
    13,959
    Joined:
    Feb 6, 2006
    Are GPUz and the vBIOS reading the same temperature sensor? Does the vBIOS apply any offset to the sensor data like AMD does with their CPUs?
     
  27. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    12,090
    Joined:
    Oct 7, 2000
    nope, no offset on the gpus.
     
  28. Susquehannock

    Susquehannock 2[H]4U

    Messages:
    3,196
    Joined:
    Jul 26, 2005
    Have the same exact card. If it's like mine the factory paste was installed like crap by the factory robot. Thick as a dime in some spots and not making much contact.

    102032_paste1.jpg
     
  29. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    Minor update: I did make contact with the seller earlier tonight and they're not going to leave me with a bum card, even offering return shipping. I'm not sure if I'm going to solve what's going on with this card, though.
     
  30. funkydmunky

    funkydmunky 2[H]4U

    Messages:
    2,240
    Joined:
    Aug 28, 2008
    Did they give permission to re-paste? Worth a try to see.
     
    chockomonkey likes this.
  31. auntjemima

    auntjemima [H]ardness Supreme

    Messages:
    4,339
    Joined:
    Mar 1, 2014
    I'm not sure why you're even here, honestly. The two major suggestions were a BIOS change to the known one from MSI, even though you assure us your SHA value is the same and reapplying paste.

    Neither of which you are willing to do. Are you here for help or?
     
    Armenius likes this.
  32. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    I was asking what looked off about the card, for things to try, and am fully willing to attempt things that won't compromise the seller's willingness to accept a return. I know if I sold something and then someone returned it as broken after saying they performed bios flashes and TIM replacement, I'd be more suspicious of what that person has done to the card. If I considered my money gone/the card to be fully mine I'd have followed all those suggestions. I've offered to do TIM replacement on my end if they'll still accept a return if that doesn't solve the problem.

    I also pointed the seller to this thread to report what I was seeing better than a conversation thread.
     
    Last edited: May 15, 2019
  33. HAL_404

    HAL_404 Limp Gawd

    Messages:
    251
    Joined:
    Dec 16, 2018
    Yup. It's a mining card alright :barefoot:
     
  34. thebufenator

    thebufenator [H]ard|Gawd

    Messages:
    1,090
    Joined:
    Dec 8, 2004
    Hey now, my ex mining cards make great lanparty gaming cards :LOL:
     
    Shadowed likes this.
  35. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    The seller did give permission to change the paste and they were also very willing to just accept a return without me bothering. I forged ahead and found that the paste was indeed extremely dry. I'd removed most of it before remembering I should take a photo or two. The GPU chip looked ok, although maybe one quarter of the top had some sort of pocked/micro-abrasion appearance to it that didn't lift off with a q-tip.
    thermal-paste.jpg odd scratches on chip.jpg
    I'd initially noticed some extra dust around the VRM area and some oily looking residue (like, little droplets on the board in-between mosfets). I cleaned that off but the thermal pads appear soaked with whatever that is. I'm not sure what to make of that as it's around all the VRMs (even the two on the other side of the board/the memory vrm area). Might just be cheaper thermal pads breaking down after a long time running (something I haven't seen before).
    residue near vrm.jpg oily residue in thermal pads.jpg
    After cleaning that oily residue up, removing the smaller broken bits of the vram thermal pads, and then applying some MX-4 thermal paste to the chip, the card appears to be doing OK and running Timespy benchmarks without throttling down. During teardown I did notice that the standoffs for the heatsink have a bevel to rest on the card, but the heatsink also relies on 4 springs to keep tension. Maybe during shipping the heatsink got dislodged/rode up that bevel and wasn't making real contact with the chip.

    added timespy run graph, which scores a shade under 4,000 for graphics in time spy 1.0, which is expected.
    p3-graphs-ts.PNG
     
    Last edited: May 16, 2019
  36. Eymar

    Eymar Limp Gawd

    Messages:
    232
    Joined:
    Sep 15, 2005
    That's more like it, the fans on the huge heatpipe HSF rarely go above 25% so big sign it's temp related. Nice to know drivers save your card from burning up, looks like the target temp becomes max temp if temp rises too fast.
     
    ChefJoe likes this.
  37. thebufenator

    thebufenator [H]ard|Gawd

    Messages:
    1,090
    Joined:
    Dec 8, 2004
    Looks like you are good to go!

    Nice that it worked out.
     
    auntjemima and ChefJoe like this.
  38. ChefJoe

    ChefJoe Limp Gawd

    Messages:
    154
    Joined:
    Oct 25, 2009
    It's a very nice part of dealing with HardForum's FS/FT, sellers aren't trying to move product quickly without making sure the buyer is happy/just viewing problems as an expense.

    Yes, I was planning to replace the thermal paste of an ex-mining card all along, but I was very confused about the readings being worse than I'd have expected from just dead/dried thermal paste.
     
  39. vividshock

    vividshock [H]Lite

    Messages:
    124
    Joined:
    May 31, 2014
    Lesson of the day...don't need seller's permission to change simple thermal paste...
     
  40. auntjemima

    auntjemima [H]ardness Supreme

    Messages:
    4,339
    Joined:
    Mar 1, 2014
    Incorrect. If a buyer contacted me to tell me the video card they bought from me isn't working as expected and that they had already removed the heatsink, I would be wary.

    edit: but I am glad the thermal paste application did the trick, ChefJoe !