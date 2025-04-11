I would put this in the apple section, but to be honest, the problem I have is with the GPU itself. This is more an investigation of firmware, PCIe calls, and all that stuff, in order to solve hard crashes, not "My Mac Brokie".



So here's the deal.



I have been optimizing my space since coming home after a rough period of life. I've now trimmed my power budget down a lot, and I have, essentially, 2 machines with server hardware that barely peak at 800 watts at max use.



I have a server that hosts data and feeds a Mac Pro that I use as a workstation / broadcast / recording station. This little tube replaces a P9X79LE build that was a fire hazard, and it runs vastly cooler without a bunch of effort to find a GPU to put in it. However, I have run into an issue that... well, I guess wasn't really ever fixed until 2017, a year before the 6,1 ceased manufacturing. It even sparked the original D300 recall, which seemed to be a mix of issues (early manufacturing errors on top of the firmware problem).



My Mac Pro is a 2013 original build with 2 D300s. It seems the year 1 machines were all fine, possibly because the designers were still there working on it, so any weird problems were caught before they went out the door.



The issue only exists in Linux and Windows. In macOS there are no problems.



The lead I was given was that the x4 connection for the NVMe happens to share lanes with the GPU it's mounted on, which happens to be Card(0) (the first GPU). The reason would have been the same as on the PS5: fast access to 3D assets on request. However, I'm not sure that even works.
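One way to check that lead from the Linux side is to ask sysfs where the NVMe and the GPUs actually sit. This is only a rough sketch: it assumes the standard `/sys/bus/pci/devices` layout, picks devices by PCI class code (0x03xxxx for display controllers, 0x010802 for NVMe), and the `describe` / `read_attr` helpers are names I made up. If the SSD and Card(0) really hang off the same bridge, the two should show a common parent address in `chain`.

```python
from pathlib import Path

PCI_ROOT = Path("/sys/bus/pci/devices")  # standard Linux sysfs location


def read_attr(dev, name):
    """Return a sysfs attribute as text, or '?' if it is missing or unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return "?"


def describe(root=PCI_ROOT):
    """List GPUs and NVMe drives with their negotiated PCIe link and bridge chain."""
    rows = []
    if not root.is_dir():  # not Linux, or sysfs not mounted
        return rows
    for dev in sorted(root.iterdir()):
        cls = read_attr(dev, "class")
        if cls.startswith("0x03"):
            kind = "GPU"
        elif cls.startswith("0x010802"):
            kind = "NVMe"
        else:
            continue
        # Each sysfs entry is a symlink; the resolved path spells out every
        # bridge above the device. PCI addresses look like 0000:02:00.0.
        chain = [p for p in dev.resolve().parts if p.count(":") == 2]
        rows.append({
            "addr": dev.name,
            "kind": kind,
            "speed": read_attr(dev, "current_link_speed"),
            "width": read_attr(dev, "current_link_width"),
            "chain": chain,
        })
    return rows


if __name__ == "__main__":
    for row in describe():
        print(row["kind"], row["addr"], row["speed"], "x" + row["width"],
              " <- ".join(reversed(row["chain"])))
```

This only shows the topology the kernel enumerated, so it can confirm or rule out a shared upstream bridge, but it won't say anything about what the firmware does with it.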



So, in macOS, the NVMe is throttled. x4 drives operate at 1370 MB/s max, x2 at 1550 MB/s.



In Linux, my drive supposedly hits 2200 - 2800 MB/s, but I will need to double check this.



In Windows 10 the NVMe hits a solid 1800 MB/s, as reported by CrystalDiskMark.



When a 3D game is running that actively loads assets from storage and renders them directly, the SSD and the GPU can supposedly collide in signals, and then the machine hard resets.



However, games that preload assets and do not stream from storage run fine.



TF2 in Linux will last for anywhere from 20 frames to 1500 frames, depending on the map and assets loaded, then will reset the system. On Windows 10, TF2 will last maybe 15 - 20 minutes before crashing. However, Adrenalin 23 WILL catch the hard crash, suspend the system, and hard reset it. It just looks like it's taking a moment to get to the desktop. Only on occasions with lots of other things going on will the machine itself hard reset. For example, using the VCE encoder on GPU(1) (the second card) in OBS while running TF2 on Card(0) will trash everything and hard reset the machine.



Borderlands 2 and similar low-asset games will not crash, even at highest settings. I haven't played long enough to get past the snow area (I like running around in it), so I don't know if loading a new area will trash everything. Possible, but I'm not sure.



Emulators I have yet to test.



In Windows 10, games like Rust or Escape From Tarkov run for... idk, HOURS, before a crash. High, low, medium, the settings don't matter. However, there's a pretty hard-set time period for when the games crash, even in menus. And, again, this DOESN'T happen in macOS.



However, in Linux, any Bethesda game (FO3, FONV, Skyrim) behaves the same as TF2. A few seconds, bang, system is gone.



In fact, in macOS the only things that trash everything are apps that call for Metal 2 and screw up the rendering, because these cards aren't Metal 2 capable. Vivaldi, anything Electron, completely broken on newer macOS with OCLP. Maybe worth looking at, no idea.



I'm mostly just trying to collect my notes rn, but the behaviour is pretty consistent. On Linux crashes are fast, on Windows they take longer but can still happen, and on macOS they don't happen.



First step is to look for tools that can pull those PCIe calls out and log them, possibly externally, then run a few tests.



1. Does using USB storage work?

2. Network storage?

3. TBolt?
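For these three tests it might help to have a repeatable read load instead of relying on a game to stream assets. Below is a minimal sketch of a userspace reader that hammers one file and reports MB/s, with an optional rate cap. `stream_read` and its parameters are my own invention, not from any tool. Point it at a big file on the internal SSD, then on USB, network, and Thunderbolt storage while something renders on Card(0), and see which combinations take the box down. Big caveat: once a file is in the page cache, repeat reads come from RAM and never touch the drive, so the test file should be larger than RAM (or caches dropped between runs).

```python
import sys
import time


def stream_read(path, seconds=10.0, chunk=4 * 1024 * 1024, cap_mbps=None):
    """Read `path` in a loop for about `seconds` and return the average MB/s.

    If cap_mbps is set, sleep between chunks so the running average stays
    at or below that rate (a crude userspace throttle, not a kernel one).
    """
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while time.monotonic() - start < seconds:
            data = f.read(chunk)
            if not data:      # hit end of file: wrap around and keep going
                f.seek(0)
                continue
            total += len(data)
            if cap_mbps:
                # Time at which `total` bytes would be "allowed" under the cap.
                due = total / (cap_mbps * 1e6)
                delay = due - (time.monotonic() - start)
                if delay > 0:
                    time.sleep(delay)
    elapsed = time.monotonic() - start
    return total / 1e6 / elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    cap = float(sys.argv[2]) if len(sys.argv) > 2 else None
    print(f"{stream_read(sys.argv[1], cap_mbps=cap):.0f} MB/s")
```

Running it with a cap (say 1300, roughly the macOS figure) versus uncapped would be one cheap way to see whether read rate has anything to do with the resets. It only throttles this script's own reads, not the game's.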



Then some questions



1. Does throttling the SSD work in other OSes?

2. Are the SSD and GPU actually colliding?

2a. Why?????

3. Find a firmware image and see how this garglefunk mess actually operates.
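On the "log them externally" idea from earlier: since a hard reset eats whatever was about to hit the disk, the kernel messages need to leave the machine as they happen. Here's a rough sketch that follows `dmesg --follow` and fires matching lines at another box over UDP. The keyword list is just my guess at what's relevant, and `forward` / `interesting` are made-up names. A userspace script may well not get scheduled before the reset, so the kernel's own netconsole module is probably the more reliable route. This is just the quick version.

```python
import socket
import subprocess
import sys

# Guessed substrings for GPU / NVMe / PCIe error lines; adjust to taste.
KEYWORDS = ("amdgpu", "radeon", "nvme", "pcieport", "aer:")


def interesting(line):
    """True if a kernel log line mentions any of the keywords."""
    low = line.lower()
    return any(k in low for k in KEYWORDS)


def forward(lines, host, port):
    """Send each interesting line as one UDP datagram; return how many were sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    try:
        for line in lines:
            if interesting(line):
                payload = line.rstrip("\n").encode("utf-8", "replace")
                sock.sendto(payload[:1400], (host, port))  # stay under a typical MTU
                sent += 1
    finally:
        sock.close()
    return sent


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: script.py <receiver-ip> <port>   (usually needs root to read dmesg)
    proc = subprocess.Popen(["dmesg", "--follow"],
                            stdout=subprocess.PIPE, text=True)
    forward(proc.stdout, sys.argv[1], int(sys.argv[2]))
```

On the receiving machine anything that listens on a UDP port and appends to a file will do. The last few lines before the stream goes quiet are the ones worth staring at.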



Reapproaching forums, so sorry if I am a little scatterbrained on post one. Honestly, trying to understand this is a little baffling, so if anyone has any ideas I am all ears.