RTX Space Invaders Wanted

Well, in the case of the TSMC chemical wafer issue, it's easier to understand how it might have gone unnoticed. The degradation might not show up immediately, only over time, across a series of heating and cooling cycles. Beyond that, without knowing their quality control standards, it's anyone's guess. They may have tested an initial batch of wafers that checked out fine while a later batch was affected. Both batches, though, could be part of the initial stockpile of GPUs they shipped, since otherwise they'd get complaints about a "paper" launch.


Can you provide any link that contaminated products left the factory? I am unable to locate any such evidence.
 
2nd time I've seen this unsubstantiated TSMC contaminants rumor here. No links, nothing.
 

Exactly. They don't exist. I'm trying to be respectful and all because it's a front page article and I'm tired of getting suspended. Lol
 
Thanks for taking the time to do all this work related to the card. I am very interested to see the results!
 
Is there an eventual goal to this? Do we expect to see a class action lawsuit? Hardware (even whole batches) has been known to have issues throughout computing history. You RMA it and try again, or get a refund and move on. Why single out this hardware defect among all the various defects consumers come across?

I mean, if I visit car forums, I'm exposed to an insanely biased sample and would totally think a given car is riddled with problems... but if you look at the whole population, it's relatively not a big problem at all. It seems similar here, or else we'd see some sort of SKU number change as they silently fix the issue, or production being discontinued.
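A rough way to see the car-forum point: if owners with a failed card are far more likely to post than owners whose card works, the share of failures among posters can dwarf the real failure rate. This is a minimal sketch using Bayes' rule; the 2% failure rate and the posting probabilities are made-up illustrative numbers, not measurements of RTX cards.

```python
def share_of_posters_with_failures(failure_rate, p_post_if_failed, p_post_if_ok):
    """Fraction of forum posters whose card failed, via Bayes' rule.

    All inputs are hypothetical probabilities in [0, 1]:
    failure_rate      - true fraction of cards that fail
    p_post_if_failed  - chance an owner with a failed card posts about it
    p_post_if_ok      - chance an owner with a working card posts at all
    """
    posters_failed = failure_rate * p_post_if_failed
    posters_ok = (1.0 - failure_rate) * p_post_if_ok
    return posters_failed / (posters_failed + posters_ok)


if __name__ == "__main__":
    # Hypothetical: 2% of cards fail, half of affected owners post,
    # but only 1% of happy owners bother to post.
    share = share_of_posters_with_failures(0.02, 0.5, 0.01)
    print(f"{share:.1%} of posters report a failure")
```

With those made-up inputs, roughly half of the posters report a dead card even though only 2% of cards failed. If both groups posted at the same rate, the forum share would simply equal the true failure rate.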
 
Can you provide any link that contaminated products left the factory? I am unable to locate any such evidence.
Who said or claimed it was evidence?

2nd time I've seen this unsubstantiated TSMC contaminants rumor here. No links, nothing.
It's simply a theory for what might have happened. Feel free to come up with a better explanation. Personally, I see TSMC as more probable than some sort of Micron chip failing so catastrophically, in various unpredictable ways. I would think faulty memory chips would simply die slowly, if anything, or exhibit issues pretty quickly and be caught by Micron/Nvidia well before shipping to consumers. Since when is evidence required for a theory, and why can't one be partially substantiated at the same time? We know degradation can cause catastrophic problems: the red ring of death, and what about those wonderful Nvidia solder joints as well? Add to that a graveyard of chips fried by overclocking or a fan failure. I think there would have been obvious red flags had it been faulty Micron chips. If there were, and Micron/Nvidia ignored them, that's a horribly bad bluff, is all I can say about that.
 


You presented your theory/opinion as fact.

I'll wait for Kyle to come back with his engineering report.
I'll also wait for some evidence/proof to come about that says damaged silicon made it to customers and then out into the channel.
 
Who said or claimed it was evidence?

It's simply a theory for what might have happened. Feel free to come up with a better explanation. Personally, I see TSMC as more probable than some sort of Micron chip failing so catastrophically, in various unpredictable ways. I would think faulty memory chips would simply die slowly, if anything, or exhibit issues pretty quickly and be caught by Micron/Nvidia well before shipping to consumers. Since when is evidence required for a theory, and why can't one be partially substantiated at the same time?
I'm not here to hypothesize. No offense intended, it's just my nature. Carry on with your unmitigated analysis.
 
alxlwson & Mega6
I'm just interested in your opinion here.

That said, would you not agree that without speculation like this, we're doing nothing but spinning our wheels and going nowhere?
I mean, up until now, all we've had are unsubstantiated claims as to what has caused these to fail. Realistically, pointing the finger at TSMC's recent discovery is no different. Furthermore, it sounds like a far more plausible cause, considering they stated the problem couldn't be tested for and found until after production.

On top of that (and yes, this is an assumption based on just watching TV shows like How It's Made), I'd guess quality control testing is a continuous, unrelenting load test. That means if the problem does have anything to do with thermal cycling, those QC samples never go through the cool-down phase. To me that sounds likely, since the goal is to ensure the parts stay functional after hours and hours, over years, of use, and the only way to accomplish that is to keep them under load 24/7.
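To put the thermal-cycling idea in rough numbers: solder-joint fatigue is commonly modelled with the Coffin-Manson relationship, where cycles to failure scale roughly as N_f proportional to (delta T)^-n. This sketch assumes solder fatigue is the failure mode, which nobody here has established; the exponent n = 2.0 is an assumed placeholder (the real value depends on the material), and the temperature swings are hypothetical.

```python
def acceleration_factor(delta_t_test, delta_t_field, exponent=2.0):
    """Coffin-Manson acceleration factor: field cycles equivalent to one test cycle.

    delta_t_test, delta_t_field: temperature swings (kelvin or degrees C).
    exponent: assumed, material-dependent fatigue exponent.
    """
    return (delta_t_test / delta_t_field) ** exponent


def field_cycles_to_failure(test_cycles, delta_t_test, delta_t_field, exponent=2.0):
    """Scale a lab result (cycles to failure at delta_t_test) to a field swing."""
    return test_cycles * acceleration_factor(delta_t_test, delta_t_field, exponent)


if __name__ == "__main__":
    # Hypothetical: a joint lasts 1000 cycles at a 100-degree swing;
    # how many 50-degree gaming-session cycles would it survive?
    print(field_cycles_to_failure(1000, 100, 50))
```

By this model, halving the temperature swing quadruples the cycles to failure (with n = 2), and a rig that never cools down racks up almost no cycles at all, which is the gap the post is pointing at.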

Point is, that up until now it's been "wild accusations" to the cause. This at least is a likely cause considering the affected node, is it not? As such, it provides a starting point for Kyle's engineering firm to perhaps look into, instead of (potentially) blindly analyzing the card for problems.


In my opinion, just because nobody has said the issue applies to anything that made it out into the product stream doesn't mean it doesn't. We all know companies keep this sort of detail hush-hush so as not to impede the flow of cash. If it were mentioned that there was even a small chance some of the 2000-series chips were affected, that would greatly impact card sales, since people wouldn't want to take the risk.
 


Plausible assumptions? Maybe. Anything is possible at this point. I have faith that the experts contracted here by Kyle will get to the issue(s) and are not flying blind. Carry on I guess?
 
You presented your theory/opinion as fact.

I'll wait for Kyle to come back with his engineering report.
I'll also wait for some evidence/proof to come about that says damaged silicon made it to customers and then out into the channel.
More like you interpreted it as such and ignored the parts of what I said suggesting it could be tied to Micron's DRAM, or even to other, more obscure problems that didn't get spotted, passed basic tests, and came back to haunt them in the form of degradation. Personally, I think there's a stronger chance it's tied to TSMC than to Micron DRAM, because a wafer problem would more easily go undetected, slip through, and pass Nvidia's initial quality assurance tests, while Micron would have performed its own internal testing as well, which it does all the time for binning purposes. Believe what you want, think what you want, but quit trying to start crap for no reason because you got your panties in a bunch over how I may have worded a few things on a forum of loose discussion about a problem whose exact cause none of us can definitively establish with facts and evidence right now, facts and evidence that you of course claim I made or presented.
 
Very possible it could be a board issue.
A few years ago, when I was an X-ray inspector, I discovered an odd image on one of our boards. Looking into it further, it appeared that one of the layers was misaligned with the layers above and below. This had the potential to cause electrical clearance violations (electrical shorts). The engineer took a few of the cards, sliced them, and discovered they were indeed bad. The boards we had built passed testing, but we didn't fully load them during testing; the testing was more of a circuit check. They may or may not have failed in actual use, but we could not allow them to get that far into production to find out. We then isolated the boards and had to scrap any boards already populated with components.
 
More like you interpreted it as such and ignored the parts of what I said suggesting it could be tied to Micron's DRAM, or even to other, more obscure problems that didn't get spotted, passed basic tests, and came back to haunt them in the form of degradation. Personally, I think there's a stronger chance it's tied to TSMC than to Micron DRAM, because a wafer problem would more easily go undetected, slip through, and pass Nvidia's initial quality assurance tests, while Micron would have performed its own internal testing as well, which it does all the time for binning purposes. Believe what you want, think what you want, but quit trying to start crap for no reason because you got your panties in a bunch over how I may have worded a few things on a forum of loose discussion about a problem whose exact cause none of us can definitively establish with facts and evidence right now, facts and evidence that you of course claim I made or presented.

The issues might as well be caused by space magic at this point, lol, though I'm pretty sure most of us have considered that TSMC might be the culprit after their announcement. Instead of subscribing to any theories, I'm just going to wait patiently for the experts to let us know what they find out.
 
Plausible assumptions? Maybe. Anything is possible at this point. I have faith that the experts contracted here by Kyle will get to the issue(s) and are not flying blind. Carry on I guess?
I have faith they will as well. It'll definitely be interesting to find out, no matter what the outcome.

I'm also curious to see whether or not nVidia will try to "head Kyle off" by making their own announcement...

Then again, that's assuming they know and aren't waiting for Kyle's findings as much as the rest of us (though I'd find it hard to believe they don't already know).
 
I'm not here to hypothesize. No offense intended, it's just my nature. Carry on with your unmitigated analysis.

I have no idea if it's true or not, but there have certainly been reports of this. A quick search did turn up these links. As always, take them with a grain of salt until it's proven to be the issue with these cards.

https://asia.nikkei.com/Business/Co...r-Nvidia-and-Huawei-hit-by-defective-chemical

https://www.extremetech.com/computi...stroys-tens-of-thousands-of-nvidia-gpu-wafers
 
One fab and one batch of bad dies could not give Nvidia a 20% failure rate on the 2000 series since release, if that is what you are all implying.

"TSMC has discovered a shipment of chemical material used in the manufacturing process that deviated from the specification and will impact wafer yield," the company said in a statement. It said it was investigating the cause of the problems at the Fab 14B production site in southern Taiwan and contacting customers.
 
I like how people are dismissing the RAM thing. Both Samsung and Micron memory are being used, yet he states in the video that the OC community is finding it's the cards with Micron RAM that seem to be having issues, whereas cards with Samsung don't seem to be. I can't see two different layouts being used for two different manufacturers' RAM on the same card. Micron's RAM may not be up to snuff at the rated speeds they claim.
 

It isn't that people are dismissing RAM or any other factor. It is that without proof from an engineering firm anything is just a random guess. They want actual proof as to what the cause is before they say with 100% confidence that yes that is the issue.
 
One fab and one batch of bad dies could not give Nvidia a 20% failure rate on the 2000 series since release, if that is what you are all implying.

"TSMC has discovered a shipment of chemical material used in the manufacturing process that deviated from the specification and will impact wafer yield," the company said in a statement. It said it was investigating the cause of the problems at the Fab 14B production site in southern Taiwan and contacting customers.
Okay, so where is your source that the failure rate of RTX cards is 20%?
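Even with a source, a failure rate estimated from a modest number of reports carries wide error bars. This is a minimal sketch of the Wilson score interval for a proportion (z = 1.96 gives roughly 95% confidence); the counts in the usage line are hypothetical, not real RTX numbers.

```python
import math


def wilson_interval(failures, n, z=1.96):
    """Wilson score confidence interval for a failure proportion.

    failures: number of failed units observed
    n:        number of units observed
    z:        normal quantile (1.96 ~ 95% confidence)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical: 20 failures among 100 sampled cards.
    low, high = wilson_interval(20, 100)
    print(f"{low:.1%} to {high:.1%}")
```

With 20 failures out of 100 hypothetical cards, the interval runs from roughly 13% to 29%, and that's before accounting for who chooses to report a failure in the first place.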
 
I'm pondering what might cause these types of problems outside of the VRAM itself or wafer damage. It seems like a host of other things could cause problems, and could even compound one another. It also seems like just shrinking from 16nm to 12nm could make a lot of things that used to be taken for granted trickier, like ripple current and jitter, and digital VRM/capacitor quality could come into play cumulatively. We've seen how vdroop can impact stability, and at a smaller node it would be a bigger concern if it's not controlled as well, especially if it's compensated for with more voltage, which could bring more degradation along with it.
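To put rough numbers on the vdroop point: with a resistive load-line, the voltage at the GPU sags in proportion to current, V_load = V_set - I * R_LL. The set-point, current and load-line resistance below are hypothetical values chosen for illustration, not RTX 2080 Ti specifications.

```python
def voltage_under_load(v_set, current_a, r_loadline_ohm):
    """Voltage seen at the die under a simple resistive load-line (vdroop) model."""
    return v_set - current_a * r_loadline_ohm


def set_point_for_minimum(v_min, current_a, r_loadline_ohm):
    """Set-point needed so the drooped voltage still reaches v_min at this current."""
    return v_min + current_a * r_loadline_ohm


if __name__ == "__main__":
    # Hypothetical: 1.000 V set-point, 250 A load, 0.4 milliohm load-line.
    v = voltage_under_load(1.000, 250.0, 0.0004)
    print(f"Voltage under load: {v:.3f} V")
```

The trade-off the post describes falls out directly: raising the set-point to hold a minimum voltage at full load means that at light load, when current drops, the chip sits much closer to the full set-point, which is extra voltage stress over time.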
 
This just dawned on me, but does nVidia handle their memory similar to the way AMD does, in terms of the "ECC"?
If so, perhaps it's a requirement these days for a stable card, given how fast VRAM runs? In other words, the RAM inherently produces errors, but at a rate the ECC handles without issue or performance loss.

As such, maybe that's what's failing on these cards, the ECC circuit in the GPU, which is allowing the memory's "natural" errors to manifest visually, but also get out of control and crash the system?

If down-clocking the RAM is a solution for more than just that YouTuber (and I have no clue if it is), then that may explain it a little since fewer of these "natural" errors would occur. Additionally, it'd explain why it doesn't seem to be specific to Micron or Samsung GDDR chips.


Anyone here have that sort of expertise and knowledge to say whether or not it's plausible?
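For what it's worth, my understanding (an assumption worth checking against the JEDEC GDDR5/GDDR6 specs) is that graphics memory uses an error-detection scheme, a CRC checksum on the data link with the controller retrying transfers that fail the check, rather than ECC that silently corrects errors. This sketch shows only the general detect-and-retry idea with a CRC-8 (polynomial x^8 + x^2 + x + 1); the `send` callable and the retry limit are hypothetical, and this is not NVIDIA's actual memory controller logic.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, initial value 0 (polynomial x^8 + x^2 + x + 1)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def transfer_with_retry(send, payload: bytes, max_retries: int = 3):
    """Move payload over a possibly noisy link, retrying on CRC mismatch.

    `send` is a hypothetical callable modelling the link: it takes the
    payload and returns the bytes as received. Returns (data, attempts),
    or raises RuntimeError once the retries are exhausted.
    """
    expected = crc8(payload)  # checksum computed on the sending side
    for attempt in range(1, max_retries + 1):
        received = send(payload)
        if crc8(received) == expected:  # receiver re-checks the data
            return received, attempt
    raise RuntimeError("CRC mismatch persisted; link error rate too high")
```

In this toy model, a lower raw error rate (for example from downclocking) means fewer retries, and only when retries run out does the error surface as something visible, which loosely matches the "natural errors getting out of control" idea in the post.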
 
It's a strange one. I have seen instances reported where downclocking the RAM didn't stop the crashes. Maybe it's the memory controller itself. There are also reports of it happening with Samsung memory, but not to the extent of the Micron RAM. I still think the true cause is really up in the air. I really hope the people Kyle has working on this can get to the bottom of it.
 
Well, this is not a good outcome so far. It seems that now the card is going through thermal testing, the card no longer produces the issues. Interestingly enough, the last card I sent, "fixed itself" during testing as well, although the failure was totally different.

"Kicking off another round of 2hr+ thermal lab runs to see if we can reproduce. The sudden disappearance of the issue hints at high-thermals providing some temporary fix to the problem (solder joints?). Can’t confirm yet with the data we have generated, so we hope the additional runs will allow us to get to the bottom of it.


"Once we can confidently rule out (or not) any correlation with thermals we will proceed with memory test plan."
 
That's interesting. I did not expect to see the card come back to life so to speak. That certainly sounds like some sort of solder issue. The thing that's strange about that is that AIB cards also suffered. The plot thickens.
 
Is it possible it's an issue with certain chipset configs w/ the card itself? This would explain why some people were so 'unlucky' and having consistent issues across new cards.

The thing is, the engineers Kyle is using were able to get the card to artifact. They then did more testing, mainly thermals by the sound of it, and the card has stopped artifacting. Unless they did their initial tests on a different system, it doesn't sound like that's the case.
 
So, kind of like how covering an Xbox 360 with a towel to make it heat up "fixed" the red ring of death. Definitely an interesting find.
I actually mentioned solder joints and the red ring of death as possible quality control degradation issues, but it would be sad if Nvidia actually had solder issues again so soon after the last incident. You'd really expect them to keep a keener eye on that kind of issue today. It could happen, but it just feels like a reach, so I'd be a lot more shocked if it ended up being due to that than to something else. I'd be less shocked if particular GPU workloads caused it to manifest a lot more readily. Has anyone happened to keep track of when these occurrences have been triggered, like particular games and such? Perhaps that could shed a bit of light on the problem based on the workloads in question. A game that's very heavy on TMUs, ROPs, VRAM, poly count, etc. might help narrow it down to a closer cause.
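If people did start logging which games trigger the problem, a tally like the one below is about the simplest way to look for a pattern. This is a minimal sketch: the report format (a `game` name plus workload `tags`) is hypothetical, as are the example names and player counts.

```python
from collections import Counter


def tally_reports(reports):
    """Count failure reports per game and per workload tag.

    Each report is a dict with hypothetical keys 'game' (str) and
    'tags' (list of workload labels such as 'vram-heavy' or 'rop-heavy').
    """
    per_game = Counter(r["game"] for r in reports)
    per_tag = Counter(tag for r in reports for tag in r.get("tags", []))
    return per_game, per_tag


def reports_per_player(per_game, players):
    """Normalise raw counts by (hypothetical) player counts, since popular
    games naturally collect more reports. Games with no player count are skipped."""
    return {g: c / players[g] for g, c in per_game.items() if players.get(g)}


if __name__ == "__main__":
    example = [
        {"game": "Game A", "tags": ["vram-heavy"]},
        {"game": "Game A", "tags": ["vram-heavy", "rop-heavy"]},
        {"game": "Game B", "tags": ["tmu-heavy"]},
    ]
    games, tags = tally_reports(example)
    print(games.most_common(), tags.most_common())
```

The normalisation step matters: raw counts mostly reflect which games are popular, so a workload only stands out if its reports per player are unusually high.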
 
Self-repairing Nvidia nanobots?

Probably originally intended to downgrade the RTX cards when the next gen comes out, but in a moment of panic some code was snuck out in an update to help with these invasions.
 
Ok so 2 for 2 (sorta) in bad cards being unreliably bad for them... Could mounting have something to do with it?
For example, were yours (the one you originally sent them) and Akbar's cards installed in a case with the card mounted traditionally (horizontal, fans facing down), while at the firm they were mounted in a test bench (vertical)?

I ask because there's a chance that weight-induced flexing of the card may be causing a component or the GPU chip to twist ever so slightly. If it is a solder issue, then that twisting may be exacerbating the flaky connection just enough to make it manifest... or in this case, to stop manifesting.


It's either that, or nVidia has somehow gotten to your firm and paid them off, and they just so happen to be unable to reproduce the problem! :eek: However, given we don't know who you've used, and I'm not sure anyone even knew you were doing this with your first card... this comment is meant to be taken purely in jest. lol
(Either that, or it IS nanobots programmed to sabotage that are doing their job too soon....)

"I'm here because we've heard you're testing a FE RTX2080ti at this facility"

 