Ryzen 7000X3D Series: A Brief Technical Chat with AMD

The 7000 series, and especially the delay between the original chips and the 3D cache chips, was supposed to give them time to sort out these issues, but it seems the work was incomplete.
TSMC recently (2022) presented on the hybrid bonding that they use for the stacked cache:
https://www.semianalysis.com/p/packaging-developments-from-ectc
There is a lot to unpack there, but TSMC is still having problems getting the cache stacked correctly, and the electrical resistance of the copper-based formulas is too high.
When the chips aren't perfectly stacked the resistance goes up substantially, making a bad situation worse. They also know they have a hard ceiling of around 1.35 volts before the current itself causes the bond to break down.
Most of the delay was TSMC working with BESI to clean up the accuracy of the 8800 Ultra they use for the stacking process. Its variance was too large, which resulted in too high a failure rate at the smaller N5 process node.
 
This is frustrating considering that AMD, while they don't have Intel's wallet, have been very successful in the CPU and GPU arenas over the past several generations. I can somewhat understand not wanting to spend on a hardware solution, but if AMD is ever going to get beyond the "Intel and Nvidia are the default" viewpoint in the market, they can't afford to screw up. If they really couldn't afford to add a hardware monitor to the CPU (or something chipset-based in their big new fancy X670E boards?), then I would expect them to innovate by having the firmware and OS/driver/software side absolutely on point to make up for it. Even without an on-chip monitor, things could have been better if they had just shipped a comprehensive (and ideally open source) driver package and utility that included a bunch of profiles for known games and applications, allowed users to modify them or make their own custom ones, and offered some sort of "best guess" heuristics and monitoring, and if they had then spent some time with Windows and Linux devs to see what improvements could be made to the scheduler and other components to mesh well with the CPU and its software. The fact they didn't (or haven't) for long-awaited, top-of-the-line chips is very frustrating.
On Linux at least you can sort of do this with taskset, i.e. pin a process to a particular set of CPUs. If a process spins off children in separate PIDs, as games might for audio, then you could also pin those to separate CPUs. Basically, if all AMD's Windows scheduler is really doing is looking up the optimal CPU mask for an application, something like:

Code:
if (info = list_of_optimized_applications[process])
    pin_process(process, info.cpu_mask)

then you can easily replicate that in Linux from userspace. I suspect (hope?) there's more going on though. For example, what if child processes communicate with the parent via shared memory/IPC? You'd probably want to be a bit smarter about where you place a parent that requires MOAR cache relative to children which require MOAR freq. None of this addresses the fact that a process may need moar cache at some point and moar freq at another. Oh, and you'd need to account for the movement of processes from one CPU to another.
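As a rough illustration of the "look up a mask, then pin" idea from userspace, here is a minimal Python sketch. The core numbering (`CACHE_CPUS`, `FREQ_CPUS`), the application names, and the helper names `choose_mask` and `pin_process` are all made up for the example; only `os.sched_getaffinity` and `os.sched_setaffinity` are real, and they are Linux-only. It says nothing about what AMD's driver actually does.

```python
import os

# Hypothetical core layout: logical CPUs on the cache CCD vs the frequency CCD.
# Real numbering differs per system; check lscpu before relying on it.
CACHE_CPUS = frozenset(range(0, 8))
FREQ_CPUS = frozenset(range(8, 16))

# Hypothetical table of known applications and the CPU set each prefers.
OPTIMIZED_APPS = {"some_game": CACHE_CPUS, "some_encoder": FREQ_CPUS}


def choose_mask(name, available):
    """Return the preferred CPUs for a known app, limited to CPUs that exist here."""
    wanted = OPTIMIZED_APPS.get(name)
    if wanted is None:
        return None  # unknown app: leave it to the scheduler
    usable = set(wanted) & set(available)
    return usable or None  # no overlap with this machine: do nothing


def pin_process(pid, name):
    """Pin pid (0 means the calling process) and return the mask applied, if any."""
    if name not in OPTIMIZED_APPS:
        return None
    mask = choose_mask(name, os.sched_getaffinity(pid))
    if mask is not None:
        os.sched_setaffinity(pid, mask)  # Linux-only call
    return mask
```

A launcher could call `pin_process` once for the parent and again for each child PID with a different table entry. Affinity on Linux is set per thread, so the already-running threads of a multi-threaded game would each need the same treatment (`taskset -a` does this), and none of it handles a workload whose needs change over time.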
 

New BIOS Updates Attempt To Keep Ryzen 7000X3D Processors From Frying Themselves (arstechnica.com)

"Over the weekend, users on Reddit and YouTube began posting about problems with AMD's newest Ryzen 7000X3D processors. In some cases, the systems simply stopped booting. But in at least one instance, a Ryzen 7800X3D became physically deformed, bulging out underneath and bending the pins on the motherboard's processor socket. In a separate post, motherboard maker MSI indicated that the damage "may have been caused by abnormal voltage issues." Ryzen 7000X3D processors already impose limits on overclocking and power settings, but new BIOS updates from MSI specifically disallow any kind of "overvolting" features that could give the CPUs more power than they were built to handle."
 

This probably deserves its own thread just for awareness' sake, as it has now been reported on non-3D chips too. I came across a der8auer video earlier where he talks about it on his non-3D 7900X, along with other cases that have popped up now that folks are looking for it.

 
In a way I'm happy to see this is not solely a 3D Cache issue, or a 2 CCX issue, but either way I hope that A) it is not particularly widespread and B) there are fast remedies.

On A: it only takes a relatively few apparent instances for the online rumor mill to spin up and for things to get out of hand quickly (i.e. don't buy brand X, don't buy anything model Y from brand X, don't buy anything of model Y by manufacturer Z from brand X, etc.), and that may not be entirely accurate. It may be exceedingly rare (though still worth fixing, exchanging the affected chips, and patching where possible), or it may be limited to particular configurations, like the whole 4090 thing.

On B: glad to see Asus on top of even the possible factors with some solutions, but I am curious whether these mitigations will end up being overly restrictive when it comes to overclocking or cause other performance issues. Ultimately we'll need to properly identify the specific conditions under which this is likely to happen to ensure the most targeted fixes possible, hopefully without ending up in a "well, we know it kinda happens in this area, so we're just going to rope off the whole area and call it a day" situation.

There's a lot of discussion about the relationship, or lack thereof, between EXPO, SoC voltage and other factors that could be involved (most of all the lack of thermal protections kicking in), but it would be nice to see what sort of hardware configurations are seemingly leading to these problems. The fact that we're only recently seeing it on even non-3D cache procs, half a year after launch, adds more questions, including whether any recent updates are what leads to this vulnerability. Lots of variables, in any case. I admit, from a more than slightly self-interested perspective, that I want to see how the 7950X3D fares paired with a higher-end Asus ROG X670E mobo and reasonably OC-friendly RAM.
In theory these boards were both the most likely to default to more aggressive performance and had some of the best hardware and firmware features for handling OC, tweaking, and reporting safely.

Either way it's a bit frustrating, as it may very well mean that AMD gets distracted from the previously discussed slapdash 79xxX3D 3D cache scheduling solution in favor of something that seems way more dramatic from a PR standpoint. Let's hope it gets sorted soon on both fronts. It seems that when everything is going right the 3D Cache CPUs should be an excellent addition to the already solid 7000 series, but I want to see AMD, chipset/board manufacturers, and others putting effort into ensuring this.

Oh I agree - I too hope there is more going on, and more communication about what is going where. If nothing else, I'm sure that enterprising users (perhaps assisted by Valve?) will eventually pull together a collaborative, open source utility that profiles what to pin where, if they have to. However, the bigger issue is that (at least at launch and so far) AMD, mobo partners, and the like haven't shown that a plan is even on their radar (aside from WINDOWS XBOX BAR ON? CACHE GO!). Clearly this is something that could and should have been planned in advance and dealt with on multiple levels to make up for the decision not to have Intel-style on-board monitoring. It was an easy PR loss to avoid with relatively little investment, plus there was the potential for AMD to pick up even more favor from the free/libre open source community by making whatever monitoring and tweaking utilities FOSS, with open APIs and SDKs for others to extend them further. Failure to do this (or at least something similar) seems like snatching defeat from the jaws of victory. Launching your high-end "do everything and all of it well, old school HEDT alternative" CPUs with so many circumstances where some reviewer can catch them not working in an optimal fashion is mind-boggling.
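To make the "shared profiles that users can override" idea a little more concrete, here is a minimal Python sketch. The JSON layout, the application names, and the functions `load_profiles` and `preference_for` are invented for illustration; no existing AMD or community tool is being described, and the fallback value is only a stand-in for a real "best guess" heuristic.

```python
import json

# Hypothetical shared defaults, as a community-maintained file might ship them.
DEFAULT_PROFILES = json.loads('{"some_game": "cache", "some_renderer": "frequency"}')


def load_profiles(defaults, user_overrides):
    """Merge user overrides over the shared defaults; user entries win."""
    merged = dict(defaults)
    merged.update(user_overrides)
    return merged


def preference_for(name, profiles, fallback="frequency"):
    """Look up an executable's preference by lowercased name, else use the fallback."""
    return profiles.get(name.lower(), fallback)
```

For example, `load_profiles(DEFAULT_PROFILES, {"some_game": "frequency"})` lets a user flip a single entry without touching the shared file, and anything not listed falls through to the fallback guess.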
 
AMD response

https://www.techpowerup.com/307808/...ement-on-ryzen-7000x3d-series-burn-out-issues
 

AMD Releases Second Official Statement Regarding Ryzen 7000X3D Issues

by T0@st Today, 12:58
AMD has today released another statement to the press, following on from controversy surrounding faulty Ryzen 7000X3D series processors - unlucky users are reporting hardware burnouts resulting from voltage-assisted overclocking. TPU has provided coverage of this matter this week, and made light of AMD's first statement yesterday. AMD ensures customers that it has fully informed ODM partners (motherboard manufacturers) about up-to-date and correct voltages for the Ryzen processor family - yet user feedback (via online hardware discussions) suggests that standard Ryzen 7000 models are also being affected by the burnout issue - this side topic has not been addressed by AMD (at the time of writing). This second statement repeats the previous one's recommendation that affected users should absolutely make contact with AMD Support personnel:
 

New BIOS updates try to keep Ryzen 7000X3D processors from frying themselves [Updated]

AMD is putting more voltage limitations in place in future BIOS updates.

AMD says that it has identified the issue that was causing some Ryzen 7000 CPUs to burn out and that it has released a new version of AGESA (an AMD-controlled part of every Ryzen system's BIOS) "that puts measures in place on certain power rails on AM5 motherboards to prevent the CPU from operating beyond its specification limits, including a cap on SOC voltage at 1.3V." "None of these changes affect the ability of our Ryzen 7000 Series processors to overclock memory using EXPO or XMP kits or boost performance using [Precision Boost Overdrive]," the company's statement continues. Motherboard makers should release new BIOS versions with these updates "over the next few days." "We recommend all users to check their motherboard manufacturer's website and update their BIOS to ensure their system has the most up to date software for their processor," says AMD.

I'm guessing that these new limitations apply to both the standard X and the 3D cache 7000 series? Despite AMD's assurances, I am a bit concerned that the limits they place may be overzealous and end up restricting overclocking even on capable, high-end motherboards with high-performance coolers. However, I only have a passing familiarity with Ryzen 7000 OCing and performance tweaking, so these limitations may be a non-issue; can anyone confirm? So far it only mentions an SoC voltage cap of 1.3V, but clearly they are limiting other factors as well. Especially after this negative buzz and these potential limits as mitigation for the burnout issue, I can only hope that AMD decides to put some effort into making sure their chips, especially the 3D cache models, are as performant as possible by developing something akin to the open source/open spec, platform-independent software and firmware solution for core monitoring/pinning/profiling that I proposed earlier.
 
This deserves its own thread. Never change AMD, never change.
Why AMD?
Have you watched the GN video?

AMD's problem: they set the voltage limit for the X3D chips and "only communicated" it to the mobo vendors (ODMs), instead of forcing them to follow the limit.
Asus is so far the worst offender here, with auto VSoC spiking to 1.39~1.4v and a 400w limit given to X3D chips instead of a lower value.
 
I think that, like other issues of the recent past, this will be corrected, few will be harmed, and AMD will do what they should.
Seems like we have constant fanboi battles trying to overhype anything.
That said, it seems to me that AMD isn't employing enough "details" folk these days. As an outside observer, it seems to me that there are too many "yes men" and their corporate culture may be in need of a shake-up. Less diamond rings and more leather jackets?
/s
 
I have. It goes beyond VSoC limits, and Steve explains and hints at other issues.
 
If the VSoC goes beyond the limit, then it's not AMD's fault. It's the mobo vendors' (ODMs') responsibility to follow the Vcore / VSoC limits set by AMD, and it's also the ODMs' responsibility to ensure that their $200++ boards have basic safety functions such as lower OCP / OVP etc...

Like I said, the only problem caused by AMD is that they don't force the limit upon the mobo vendors. They just communicate it, not enforce it. They should have forced the mobo vendors to limit the VSoC to 1.3v and lock the vcore in the first place, especially for the X3D chips.
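For anyone logging their own VSoC readings, checking them against the 1.3 V figure discussed here is simple arithmetic. A minimal Python sketch follows; the function names are made up, and how the samples are obtained (motherboard sensors, monitoring software) is deliberately left out because it varies by board.

```python
VSOC_LIMIT_V = 1.30  # the SoC voltage cap discussed in this thread


def over_limit(samples, limit=VSOC_LIMIT_V):
    """Return the samples (in volts) that exceed the limit, preserving order."""
    return [v for v in samples if v > limit]


def worst_overshoot(samples, limit=VSOC_LIMIT_V):
    """Return the largest excess over the limit in volts, or 0.0 if none exceed it."""
    return max(0.0, max((v - limit for v in samples), default=0.0))
```

Feeding in a log that includes spikes like the 1.39~1.4v mentioned above would flag those samples and report an overshoot of roughly 0.1 V. Software-reported values may not match what the CPU actually sees at the socket, so treat this as a sanity check only.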
 
Wouldn't a CPU be at least partly (if not mostly) in charge of throttling itself (or shutting down if need be) in case of overheating?

AMD seems to be working on the fix, and it seems to involve CPU stuff like:
PROCHOT Control and PROCHOT Deassertion Ramp Time

at least according to articles:
the new firmware also updates the PROCHOT Control & PROCHOT Deassertion Ramp Time which is an internal mechanism on Ryzen 7000 CPUs used for thermal protection.

And a quick google.

There are different issues: overvoltage, overcurrent, heat, how high temperatures are detected and handled, etc.
 
Just watched Steve's video. He specifically mentions that many of the faults could be around the GPU IO. I wonder if that has anything to do with the rumored issues with the RDNA 3 stuff and the last-minute changes they had to make there. I know the iGPU is RDNA 2, but the IO die is all new, and I wonder if there is some sort of shared flaw between the two platforms?
 
It should have been like that, but Steve of GN found that a VSoC of 1.4v is already enough to cause instability on an X3D chip and then, in the long run, damage to it, while the Asus mobo under test kept oversupplying current while trying to resurrect / power on the chip, thus burning the internal component(s).
 
I think it's a good idea to hide it in here. That way people have to dig through a thread not related to chips exploding to see what is going on.

Though I have a strong feeling that if it was related to Intel or Nvidia, it would have its own post 1 min after it was discovered on Reddit... A simple observation.

If it's broken out into its own thread then other members of the forum may complain
 