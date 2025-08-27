

I figured I would post about a recent theory I have been researching and am now trying to track down hardware for, as well as any pictures I can find of GPU artifacting on pre-GCN 1, GCN 1, and the slight refresh that I call GCN 1.1, which barely matters unless you care about package underfill.

I am batty and care about underfill, so GCN 1.1 it is.

## So What Is This Moonbat Talking About?

In 2013, Apple releases this:

The Tube Mac Pro, the nMP: honorable names.

The Trashcan, Literal Garbage: more common names.

I got this machine in order to shrink down my PC. I wanted to keep the platform, keep my CPU, and really just not have to worry about GPUs at all. This machine is actually part of an entire broadcast pipeline, and is essential to making my whole livestream broadcast setup work.

For now.

This doesn't involve using the GPUs at all, and the machine doesn't get hot during broadcasts. Temps stay under 40, and there are barely any thermal problems.
I've also spent 3 months stabilizing the cooling and working out the best way to set the thing up, and after a while I determined something.

Despite the GPU problems, everyone was full of it about this machine.

What was not understood, however, was the total pipeline needed to actually USE the nMP to begin with.

## The Flaw, Outside the GPU Issues (we'll get to that)

This machine is a Fat Client.

And I am not being facetious about that whatsoever.

The expectation was that, if you bought this, you still had your old Mac Pro, which you would use to feed your nMP data and act as an expansion cage, leaving lighter scale, more threaded tasks to the machine.

Also, y'know, iCloud.

If you had decent storage locally, even your own SSD you install, and a good network to host a NAS with VMs on the side, you not only have a storage rack, you have a GPU shelf, you have FireWire if you bloody want it, and you have Thunderbolt allowing for actually insane breakouts to any sort of expansion you want, including a PCIe slot in a box.

While most people will complain it's only a Gen 3 x4 tier connection, I'll argue I'll take that over the GPU issues on the stock machine with no mods.

Oh yeah, about that.

## The Issue People Actually Have Valid Complaints About: The GPU

At launch there were issues with the lowest tier GPUs, the D300s. We will get to this. There were other issues, completely separate from the ones I am looking at on the D300, that the higher tier cards, the D500 and D700, had, and those were more or less the death of the actual chips.

The original run chips were the D300 and the D500. These machines are actually, mostly, fine from what I can tell. You will have the programming issues that I am looking at, but past that, unless you were just unlucky with GPU underfill, these should mostly just act like they aren't programmed correctly.

Second run was the final run for the D300, the release of the D700, and the start of the screen explosion art you see online.
This was not resolved until mid 2017, and by then, who was buying these?

The last, last run, 2017-2019, will be the best machines you can get with all functioning parts that won't light on fire, but I don't know if they have other engineering issues going on, or if they stay at the 2013 standard.

--------------------

The exact issue I am looking at with the D300 is simple.

The D300 is not a W7000, nor an HD 7850.

It's a Sky 500.

Over the years there have been many theories about where the D300 chip originated. Many have assumed that since it is a Pitcairn XL, it must be more closely aligned with what got released on the desktop.

Apple was BUYING laptop and desktop parts, were they not? In fact, it was only in 2011 that Apple kicked Nvidia's can and went full commit.

However, there was a fun thing that happened between 2009 and 2011. And it starts by looking at what had been around before FireGL V had gone full blast.

These were the "Server Tier GPGPU Processors" before Boltzmann, before HSA, before AMDGPU was a concept in Linux.
When the design house for this hardware tier was rolling, there were a number of other design houses using all the same relative dies, and while a 9270 might be based on UT SE, it was SIGNIFICANTLY different in power stages and timings compared to its desktop X2600XTX counterpart.

In 2010, going into 2011, AMD got big britches knowing they were going to be Apple's ONLY GPU provider, and in turn underwent a corporate restructuring that ultimately was just a renaming, plus the equivalent of 4400 new Jira tickets today, all relating to the CEO's confusion.

During this restructuring, projects like Torrenza were closed and replaced by Boltzmann, introducing GPGPU as a closed compute concept with the end goal being APUs gaining an "extra core" for low end users.

This never panned out, sadly, mostly turning into how everything is developed, tested, and verified in hardware, and software-wise the end result was the AMD ROCm suite in 2016.

But where that entire mess got snared was this damn D300.

## Introducing GCN 1, a Horrendous Mess of Impeccable Autism

GCN in general is a mess. Trying to trace out what strains of GPU end up where, starting as a 7750 and ending as an RX435, is about as complicated as weed in pot shops nowadays, or at least it can feel that way.

- 7970 is a 280 is a 380 is sort of(?) a 460
- But a 280X is something else entirely
- A 290X2 and 295X2 are not the same card
- and a D300 is a Radeon Sky 500

The reason this is important is because of the IO pickups to the cache, the addresses that poll from microcode reference, RAM timings, power stages, power access.
All of these can be tweaked to become 7770s, 7750s, 250Xs, RX420s; it's simply a matter of the nm generation, the print quality, and what else is being marketed.

With that massive corporate transition that AMD did to essentially redefine the company in order to look better standing next to Apple, they ended up screwing up the entire engineering department's testing and implementation lab, at some point mislabelling the Sky 500 based dies as W7000 based, which at most were just the desk accessible versions of the card in development, meant for early testing in developing Metal and iPhone crap.

Where this went wrong, I don't know, and where in the pipeline things get mixed up I'm not sure. From how things are wired, as far as I can tell from my current standpoint, both GPUs have their own "slots", so it's not a problem with laning, but the SSD being mounted on the back of the GPU that is logically wired to the display logic throws everything off.

This x16 is actually bifurcated, sharing an x4, or adding an x4, no idea, for the SSD to communicate with the host bus.

Depending on the SSD you actually use, entirely different problems can occur.

Anything from screen garbage to absolute system halts and restarts.

At most, the SSD should be a cache, or all 3D media NEEDS to be run on the 2nd GPU that IS NOT wired to the display logic. Near as I can tell, any SSD read or write knocks GPU 1 completely off course, and sends an interrupt and reset until the GPU gets whatever process correct. As the GPU is programmed to operate as a W7000, not a Sky 500 with 1 extra power stage, the chip has to do constant internal ECC in L1 cache to correct assumed addresses to real addresses that are actually available, and the more this fails in a row, the closer you are to a system reset.

This is also why in some games, if you stop moving, go to the pause menu, and touch nothing, the system can actually wake back up and come back.
The GPU has been given enough time to recover.

As this GPU is running an on-die ECC suite from the early Boltzmann days, this halt and correction state can take up to 5 whole minutes, where a modern RX 7000 series card can do this in milliseconds.

However, it CAN recover. Eventually.

## What the hell? So what do you do?

Well, first off, what the hell firmware is being loaded?

In Linux there is no specific D300, DCN, or Radeon Sky anything firmware. I looked. I actually dug through all the available repos from 2011 to now on Arch, Ubuntu, and Gentoo, and other than getting drivers from AMD where a bin was in the driver pack, there is generally nothing out in the wild about this, at all, and the GPUs are otherwise forced to run the basic Pitcairn package.

Again, while they are Pitcairn, while they are the same family of print, the changes are just barely there enough to make ANY internal crash catastrophic.

## So wat do?

Well, right now I am looking for a huge list of crap. And some of you can even help if you want.

First off, if you want to help me research this, go back and look up any pictures of AMD GPU artifacting you might have from your personal rig. Tell me the specs, what happened, and whether the card eventually went black and died or not. Did you mod it? Did you reflow it? What did you do? What happened? Tell me literally everything you can possibly remember.

Past this, I have a short list of cards I need to find and do physical testing with, possibly even killing the hardware.
These include:

- The last AMD FireStream cards, and their desktop counterparts
- AMD Radeon Sky 500 and 700
- AMD Radeon HD 7770, 7870, 7950
- AMD FirePro W7000, W7100

I also need multiple tiers of SSD to use in the PCIe slot, but if and when I have a remote possibility of affording one, I will get a DMI logic dumper plugged into the M.2 slot and get every firmware dump and signal dump I possibly can.

I will also need to locate the UART on each board, including the PSU and USB controller.

While mapping this information, I will need to cross check chip failure modes against any and all noticed differences in features and end-of-life failure modes, determining whether the memory or the die itself had more problems at the time.

If the end result is an SMC firmware fix and an early init system added to OpenCore's bootloader, yeet. I have no damn clue how to do that but god dammit I'll learn.

/thread
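
For anyone on Linux who wants to send specs along with their artifact pictures, here is a minimal sketch of how the PCI IDs, bound driver, and negotiated PCIe link width of each AMD GPU could be pulled from sysfs. It assumes the standard sysfs PCI attributes (`vendor`, `device`, `class`, `current_link_width`, and so on) are exposed on your kernel. The function name `list_amd_gpus` and the dictionary keys are made up for this example and are not part of any existing tool. It does not tell you which firmware blob was loaded, only which device and driver are in play.

```python
from pathlib import Path

AMD_VENDOR_ID = "0x1002"       # PCI vendor ID assigned to AMD/ATI
DISPLAY_CLASS_PREFIX = "0x03"  # PCI base class 0x03 = display controller


def _read(path):
    """Return the stripped contents of a sysfs attribute, or None if missing."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_amd_gpus(sysfs_root="/sys/bus/pci/devices"):
    """Collect ID and PCIe link info for every AMD display device found.

    sysfs_root is a parameter only so the function can be pointed at a
    fake directory tree for testing (hypothetical convenience, not a standard).
    """
    gpus = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return gpus
    for dev in sorted(root.iterdir()):
        vendor = _read(dev / "vendor")
        pci_class = _read(dev / "class") or ""
        if vendor != AMD_VENDOR_ID or not pci_class.startswith(DISPLAY_CLASS_PREFIX):
            continue
        driver_link = dev / "driver"  # symlink to the bound kernel driver, if any
        gpus.append({
            "address": dev.name,
            "device_id": _read(dev / "device"),
            "subsystem_vendor": _read(dev / "subsystem_vendor"),
            "subsystem_device": _read(dev / "subsystem_device"),
            "driver": driver_link.resolve().name if driver_link.exists() else None,
            "link_width": _read(dev / "current_link_width"),
            "max_link_width": _read(dev / "max_link_width"),
            "link_speed": _read(dev / "current_link_speed"),
        })
    return gpus


if __name__ == "__main__":
    for gpu in list_amd_gpus():
        print(gpu)
```

Comparing `link_width` against `max_link_width` on both GPUs is one rough way to see whether the card carrying the SSD negotiates a narrower link than its twin, though that alone would not prove or disprove the bifurcation theory above.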