Major flaw detected with AMD 7900 XTX vapor chamber cooler

Some OEM screwing up by underfilling the cooler, or by sealing it improperly so the fluid evaporates away, makes a lot more sense to me than anything else I have seen speculated.
This is exactly the issue. AMD only makes the actual GPU, and that is it. All the reference cards were assembled by Flextronics. I do not know what company built that cooler.
 
While true, AMD is still responsible for each of the issues so far. The buck ultimately must stop with them.
 
I'm thinking Dynatron, but yeah, there are so many options out there that it doesn't really matter. Somewhere, some poor schmuck making a slave wage in China was working at a station with a faulty pressure valve or some substandard solder, a batch or two went through that station, and QC didn't catch it.
 
Yep, this sounds completely plausible, and I hope it is correct, as that would be a relatively easy and relatively cheap RMA fix.

Hopefully it won't have reputationally sabotaged sales for the entire generation of AMD cards already though :/
 
Hopefully AMD will be able to identify the "bad" batch of coolers, cross-reference it with build serial numbers, and get the affected cards back to be replaced.

(In a way it is very good to see this, as it hopefully explains some of the sporadic benchmark results we saw. Given that we are back in the days of mostly "reviewers" running canned benchmarks, it makes me wonder how many of them ever checked actual clocks under load.)
 
Has anyone mentioned that we all REALLY miss [H] reviews these days? REALLY really miss them?
 
If it were this, wouldn't the "current temp" just keep climbing too? (Kyle's doesn't even throttle.)
Has anyone confirmed that junction temps go back to "normal" with a different cooler, or even under water?

Edit: are there multiple different issues? I'm confused...

Jay's take:
 
Has anyone mentioned that we all REALLY miss [H] reviews these days? REALLY really miss them?

Second this.

Everyone these days is trying to be an influencer. No one is a tech journalist anymore.

And that's exactly what big tech wants. They can sell their spin via influencers. They don't want anyone challenging them on the facts.
 
Watching his take on this will only make you dumber than you were to begin with. His take shaved two points off my IQ. This is just another talking head talking about what someone who did actual research presented. I am not sure that JazzyJ even understands what derBauer presented.

Here is the original information. Video starts where it gets into the meat and potatoes of the issues.

 
I'm pretty sure he does, but he needs something to get the clicks to hold him over until the CES stuff finishes up so he can report on that.
 
So does this affect the 7900 XT too? Don't they have the same cooler?
 
lol
Yeah, I watched that but skipped a bit and missed some important parts, I guess, as I don't recall the vertical vs. horizontal part, but I caught it in Jay's recap. Was your link supposed to be timestamped?
Has anyone slapped another cooler or water onto a problem card to see if temps are normal and stay that way?

Edit: and just so I've got things straight, only the AMD reference 7900 XTX is affected?
 
And some AIBs... It's probably all going to land on one major OEM none of us have really heard of, one that makes these coolers and others like them for all sorts of other companies and had an off week. Maybe a faulty pressure valve, a bad vacuum pump, or something relatively minor resulted in a few thousand coolers being produced before the part finally gave out completely or was discovered to be faulty. Not knowing how long the tool was malfunctioning, all they can really do is sit and wait for AMD and the AIBs to process their stuff so it can get pushed uphill back to them.
It happens all the time. It sucks, but it happens with anything mass produced; it's just that tolerances aren't what they once were and there is very little room for error, so we are seeing it more.
 
December is a horrible time to hard launch a video card, no way around that. Chinese New Year being very early this year probably had an impact as well. I imagine it was all "assholes and elbows," which is not a good way to run.
 
In December and January nothing gets done the way it's supposed to; it's either a countdown to not being there or a shit show as everybody comes back and plays catch-up. I get that AMD had to do it when they did, and it sucks for everybody, but they really need to get a new production schedule in place.
 
Hopefully it won't have reputationally sabotaged sales for the entire generation of AMD cards already though :/
Oh, don't worry, it's already sabotaged sales for AMD right now, given that I have to return my RX 7900 XTX and get my money back once its RTX 4080 FE replacement arrives, estimated for Saturday.

Now, I paid attention and know that it's specifically an issue with reference 7900 XTXs, but thanks to the media spin and general inaccuracy of reporting, some people think that literally all XTXs are affected, when that has already been shown not to be the case.

That's what I mean when I say the reputation damage is already done, and AMD just squandered their biggest opening against NVIDIA in a long while.
 
lopoetve said:
Has anyone mentioned that we all REALLY miss [H] reviews these days? REALLY really miss them?

Second this.

Everyone these days is trying to be an influencer. No one is a tech journalist anymore.

And that's exactly what big tech wants. They can sell their spin via influencers. They don't want anyone challenging them on the facts.
I don't understand statements like this. The very same guy that did GPU reviews here is doing them the very same way just on another site. There is no reason to miss them.
 
That is fair, and I do read The FPS Review; those guys are great, and I am thankful for what they are doing.

What I miss is the more investigative side of things: the "I am going to get to the bottom of the space invaders issue" or "take on the industry when they are being asshats" kind of reporting.

When it was active, HardOCP had real pull behind it because of its large readership, and that gave Kyle quite a bit of influence. As great as The FPS Review is, it doesn't have the same ability to make the big chipmakers backpedal.
 
Interestingly, ever since joining what's colloquially called "big tech" as an engineering leader, I couldn't possibly list all the people I know, including people on my own teams, who root-cause issues of this magnitude and write detailed report-outs on them as weekly tasks. Three or four world-class design engineers with ~$5M in lab equipment, plus a couple of technicians to operate it, could burn through "tech journalism exclusive" matters like a knife through butter. It's just not worth it when those engineers safely earn mid-six figures designing the actual cutting-edge, next-gen thing... why would you instead waste your time debugging somebody else's poorly designed crap in order to get an article published? The real investigative capacity is already captured by the companies that actually create products like silicon, PCBAs and hardware components.
 
When it was active, HardOCP had real pull behind it because of its large readership, and that gave Kyle quite a bit of influence. As great as The FPS Review is, it doesn't have the same ability to make the big chipmakers backpedal.
Oh, agreed, but [H] had more pull for getting the kit and working with it than they do, sadly :( And on top of that, Kyle had the sources and contacts to dig in and uncover some unique stuff, and wasn't afraid to take the hits (Infinium Labs, anyone?).
 
The real investigative capacity is already captured by the companies that actually create products like silicon, PCBAs and hardware components.
Also true: I've worked on the software and hardware side of that for almost 20 years now.
 
Igor Wallossek has access to unreleased internal emails from AMD acknowledging the issue:

Exchange or return? AMD offers both

There are now also instructions to the employees concerned to address the problem actively and accommodatingly if the customer can plausibly explain or prove a corresponding error. The first cards are already being exchanged, and AMD offers both a refund and an exchange for a working graphics card. However, the return shipping for a refund is at the customer's expense, which is unattractive but not unusual.

https://www.igorslab.de/en/amd-rade...apor-chamber-affected-the-replacement-begins/

Of course, one can ask why all this was not recognized much earlier and why quality control obviously failed so grandiosely. This also has something to do with the fragmentation of production and supply chains. Both the chamber and the complete cooler come from a third-party manufacturer far in advance and are of course very difficult to test without the board assembled. Using QR codes, however, all components can be assigned in time and also located subsequently.

If the PVT (Production Validation Test) samples worked, then at most one later takes samples from current MP (Mass Production). In the best case, such cards are tested, e.g. at PC-Partner (manufacturer of ZOTAC), in special hot boxes, but there they sit in a vertical setup. And that is exactly how this case came about, because the affected vapor chambers are not completely non-functional, only more or less limited in function. However, this is almost impossible to monitor with granular tests.
 
TLDR: AMD has started RMAing some cards in Germany.
 
AMD statement to HardwareLuxx (scroll to bottom for update)

"We are working to determine the root cause of the unexpected performance limitation of the AMD Radeon RX 7900 XTX graphics cards. Based on our observations so far, we believe the issue is related to the thermal solution used in the AMD reference design and with a limited number of cards sold. We are working to resolve this issue for the affected cards. Customers experiencing this unexpected limitation should contact AMD Support https://www.amd.com/en/support/ contact call) ."


https://www-hardwareluxx-de.transla...html?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en-GB
 
Some people have already fixed the problem...


Pinned comment under youtube video:

[Screenshot of the pinned comment]
 
AMD are their own worst enemy.
We'll see; it probably needs more time. In fairness, Nvidia, Intel, Apple, Microsoft and most corporations under shareholder rule would respond similarly in this situation:

1. Try to assess what happened
2. Try to appear in control and say you already know what happened in the meantime
3. Cost-benefit analyze which course of action toward resolution (or the perception of resolution) will lose the least amount of blood

Not an enviable position, but they'll get past it and be fine. AMD is different from Nvidia in that nobody's rooting for them to fail.
 
derBauer maintains an impressive degree of levelheadedness and emphasizes the importance of not jumping to conclusions with every internet drama.

I think there is one big hole in his argument here. He is making the HUGE assumption that coolers are cataloged/tracked in the same way that the GPU and SMT components are. To my understanding, it is not done with anywhere near that granularity.
 
I highly doubt the coolers are serial numbered in any way. If it was caused by a line failure and the factory making them has multiple lines, then at best AMD and the AIBs can say that GPUs manufactured between date X and date Y may be affected. Cataloging and tracking cost money, and lowest-bidder manufacturing doesn't generally allow for those sorts of extras.
 
I would suggest that when this surfaced two weeks ago, AMD immediately stopped any more reference cards from going into the channel until they had a handle on what was actually going on. The long and short of it is that all the affected cards that could have been sold have already been sold.

I keep seeing the word "recall" being used, but recalls are generally for safety issues, which this is not. If we are reaching for an automotive metaphor here, this is more of a service bulletin.

I would suggest that nearly every early adopter knows how to check junction temp. Heck, it is built right into the Adrenalin software. Two clicks and you are there.

On another note, derBauer's and Igor's technical prowess continues to impress me over and over. derBauer is god-tier when it comes to diagnosis and analysis, and he has the tools that are right for the job. However, both derBauer and Igor are heavily lacking in their understanding of the overall business model and the structure of the supply chain, build chain, and channel. I am not pooh-poohing them; this is just a critique on my part. I have seen both say things this week that are not truly informed opinions.
 