RTX Space Invaders Wanted

I'm interested in the detailed process they used for each test, if you can gather some information and pictures. Maybe do a visit?
That would make a great article :)

Space Invaders: The journey of a test escape !
 
Ultimately I don't want Kyle to draw any attention to his quest for the truth. I don't want to see anything, I don't want to hear anything, I don't want people discussing it... until he's got the hard evidence and then crucifies Nvidia with it in a scathing article that gets carried by more than one outlet and blows the shit out of the cover-up that Nvidia has been running.

The reason for my desire to know nothing is because Nvidia can, rather easily, step in and throw money at any number of individuals involved to silence the truth before it emerges. It's essentially what they're doing right now with all their AIB partners.

This isn't a conspiracy, it's just the tale of a mega corporation that owns 75+% of the market and feels invincible. That leads them to anti-competitive business practices and to releasing flawed hardware that is then sold at never-before-seen premiums, while most outlets continue to praise it because they're not allowed to say anything else or they don't get video cards.

Honestly, none of us know if this will yield anything amazing. I don't hate Nvidia. I just wish they were a shitload more transparent and gave a damn about their consumers.

I will shut up now :cool:
 
I'm interested in the detailed process they used for each test, if you can gather some information and pictures. Maybe do a visit?
That would make a great article :)

Space Invaders: The journey of a test escape !
Not going to happen. Too much IP to be exposed there.
 
I don't hate Nvidia. I just wish they were a shitload more transparent and gave a damn about their consumers.
IMO, for those reasons you should hate nVidia (or at least be pissed at them). You can like their products, I can respect that. I hate nVidia, but I still recognize their superiority, I just personally will have nothing to do with them regardless.
(And yet, despite my lack of respect for them, I still feel it necessary to spell their name the way they used to... Go figure... lol)
 
Hatred maintained is a slippery slope, my friend. I try not to hate anyone. I can be mad as hell at someone or something for a while... but even that I let go. I try to understand why people and companies do what they do instead. That's how I've come to terms with people that have done some really unforgivable things to me in my life. I can forgive; forgetting is another matter entirely.

Nvidia is a corporation that's run largely unchecked in its lifetime and it's doing everything it can to remain top dog in an industry that can change in a few short years. While their tactics appear a lot more like what Intel had done to AMD in the past, and they've more or less worked in one form or another, they will not be well received if evidence of their anti-consumer practices ever comes to light. It will probably hit their stock value, give them a slap on the wrist and send them back to work. That sounds underwhelming but they likely won't pull this shit again if they're exposed.

The fact that the number one graphics card maker is even resorting to these measures indicates they fear the future. They're trying to crush AMD's graphics division before Intel enters the market.

Say AMD recaptures 36% of the market share after Navi. So they're back up there, and then Intel steals 15-20% of the market from Nvidia... That's why they're trying to place a stranglehold on the market, because they know with more consumer choice, they're going to lose market share, no matter what.

I buy their tech because it's the fastest. Someone else releases a 4K, max resolution graphics card that will put out 60+ FPS in damn near all games, I will buy it. Until then, I only have one choice.

I may despise their business practices but if I've learned one thing in the IT industry it's that corporate ethics are pretty damn hard to come by. I have a business degree and an IT degree and the business one taught me all about good business ethics... None of which I have seen, in damn near any form, in corporate America...
 
Trust me, I realize hate is a strong word and that it may not really be the proper word to use in this context. I am also a forgive-but-not-forget person, as you are; however, some people (and companies, which are made up of people) just don't have any desire to change and will continue on the path they always have. Therefore, there are times when forgiveness just isn't deserved, not until they manage to change their ways and continue to do so.

But as you said, most companies are some level of shady unfortunately. I just see nVidia being openly shady, except they have good lawyers who keep them within the scope of various loopholes and thus out of the courtroom since it'd cost too much to try to prove they're guilty. nVidia just isn't shy about it, which is primarily what I 'hate' about them.
 
Does anyone know if Nvidia has mostly resolved this issue with current manufacturing? Or is this problem going on at the same rate or slightly diminished?

I wonder if it was related to the TSMC chemical issues reported earlier that caused a number of problems with 12nm-14nm dies. This should really be a huge news story that keeps being updated, but the quietness of it all is actually rather disturbing. A 20% failure rate is way out there for a product, especially if these GPUs are used for more than gaming.
 
No.
 
Out of curiosity, will you try replacing the memory before doing destructive testing?

If known-good chips still error, then that will eliminate the memory from the equation at least.
 
Kudos to [H] and Kyle for doing this.

What is disturbing to me is the way Nvidia is handling this failure. Kyle mentioned some quiet, backchannel, comms from AIBs. That indicates the AIBs are reticent to speak out. Obviously, the source of that reticence is the fear of Nvidia's response.

This aligns with GPP and Nvidia's other market practices. Unfortunately, none of these reflect well on Nvidia.

Manufacturing errors do occur. With GPUs and their billions of transistors melded into a complex working whole, it's amazing they work as well as they do, as often as they do. Imagine if Nvidia had taken an open approach, stopped selling their cards, put out a reward for anyone discovering the source of the problem, etc. In other words, an open and frank admission. What a difference.

Keeping OT, this failure mode and the failure rate are very interesting. Obviously, Nvidia did not mean for this to occur... How did such an experienced manufacturer fall into this error?
 
Probably tried to cut cost somewhere. In a different industry I have seen the results before of a company lowering the cost of a chip 1-2 cents causing a board to suddenly have a 15% failure rate in the field after some time due to the chips failing. I don't know what they changed to make them a tad cheaper but it caused a lot of problems that is for sure. So it could be something as simple as they decided that since their profits were starting to slip they would try to save a few cents here or few cents there and slightly changed a few things that is resulting in the issues.
 
As per this evening...

We can’t get the board to remain stable following driver load anymore. Stability has deteriorated significantly over time. I spent a couple hours with it today, the GPU shows corruption at windows load screen (immediately after loading driver), screen blinks, driver de-loads, code 49 in device manager. If I disable and re-enable the device, immediate corruption, rinse and repeat until a BSOD.

We are sending it out to our other lab for imaging/analysis on the package, DRAM, and board for any faults. We may have some early shots by end of next week, although analysis will take longer.
 
After my unpleasant experience of a dying RTX 2080ti FTW3 a few days ago, I will provide here the only logical explanation about this RTX mess.

Facts:

- all models from all brands are affected: this problem can impact any RTX card

- different PCBs with oversized power delivery and cooling are also affected: so it's not a problem related to temperature or power failure

- cards can die with Micron memory (used from the beginning, so widespread) but also with Samsung memory (used later): so the problem is not related to the memory, otherwise it would mean both Samsung and Micron are unable to manufacture GDDR6 (and memory would not explain the other types of death with no artifacts)
https://forums.geforce.com/default/...-evga-2080ti-ftw3-icx2-hydro-copper-error-43/

- we can see different types of death with different symptoms: it's hard to believe that this RTX generation is cursed to the point of suffering from several unrelated problems

- the core itself remains the only common point between all these dying cards: since some cards are still working fine, it's not a faulty design of the Turing architecture from Nvidia

- the survival time is variable: this implies abnormal and progressive deterioration of the core over time (not detectable at the factory)

- we learn at the same time that the TSMC 12nm fab that builds the RTX dies has suffered a serious contamination: but little information has been provided, so as not to scare away customers and investors
https://www.extremetech.com/computi...stroys-tens-of-thousands-of-nvidia-gpu-wafers

- a contamination of the core can show up in many ways: when the memory controller (inside the core) is affected, we can see artifacts (corrupted cache); if the card has a light contamination level, it can survive longer but with strange behavior (power drops, freezes, crashes, temperature spikes...); and if the core is seriously damaged, Windows shows a blue screen with a core dump and the card is not detected by Device Manager


If someone has a better explanation, I'm open-minded, but do not expect any information from Nvidia, who has flooded the market with their TOXIC RTX, leaving the burden to the customers to clean all their bad inventories until the last one!
 
Leak the story? Get the truth out even if it’s just another unsubstantiated rumor. Maybe someone else can run with the information and push it? Maybe there really is no story and we’re grasping at any failure as a portent of doom for RTX. Who knows.
 
Probably tried to cut cost somewhere. In a different industry I have seen the results before of a company lowering the cost of a chip 1-2 cents causing a board to suddenly have a 15% failure rate in the field after some time due to the chips failing. I don't know what they changed to make them a tad cheaper but it caused a lot of problems that is for sure. So it could be something as simple as they decided that since their profits were starting to slip they would try to save a few cents here or few cents there and slightly changed a few things that is resulting in the issues.

Not sure I’d buy that given how over engineered the VRM was and general high fit and finish. There were plenty of places they could have skimped first that wouldn’t carry the risk. I would more believe an outsourced assembly or supplier cutting corners and not telling them.

I do agree their heavy handed BS to silence partners is no bueno though and respect the work that is going into this :)
 
Wouldn't explain custom cards though. Custom designed cards use different power circuitry than the stock board. The defect has to be something that doesn't change between stock and custom cards.
 
That would only leave the GPU die which isn’t fabbed by NVIDIA?
 
That and the memory. There are likely other minor chips and parts shared between stock and custom cards, but not sure if they'd be causing the issue.
 
Memory chips are picked by AIBs as long as they meet the spec
 
Only two companies make GDDR6 so Nvidia and AIBs both use Micron and Samsung chips. That said, given that this happens across both manufacturers, it is unlikely the culprit.
Could be that nvidia engineers decided to ignore or misinterpreted some parts of the gddr6 spec, which could cause issues down the road (if not immediately). I kinda doubt that, but if it was their opengl team I wouldn't be surprised. :p
 
Nvidia lies about everything, especially since their stock has been taking a beating. Anyone who signed their NDA can't speak the truth and their recent investor figures are all twisted to make it appear as if their RTX cards have outstripped sales for 10 series ones.

The problem with their cards is in the silicon, their GPU silicon. It's not the RAM, not the power delivery, not custom boards, not their boards, it's all boards of every make, model and manufacturer.

If it were an issue with anything else on these cards it would have been isolated and resolved by now. Yet it still continues to happen... There is a flaw buried in their die somewhere that can be binned into manifesting less to some degree. I think all the 20 series cards are ticking time bombs. I'm just waiting for mine to reach its final destination.

We may never see Nvidia called out on their bullshit, however, does it matter anymore? We already know there's something deeply flawed somewhere in their silicon. Calling them out on it won't break their company, it will only make them get nastier on their next set of programs that limit what anyone can say about their products.

It would hit their stock value, maybe, if it doesn't get outright discredited by all the other hardware review sites that are kissing Nvidia's ass for review products.

I'm honestly happy Kyle got out when he did. Because no matter the findings, it would have been a shit show. And he would be the target of all of it.

I'm looking forward to seeing some good stuff from Intel coming up here. If anyone can give Nvidia a run for their money, it's them. It will be good to see Intel with a soul for once, and they appear to have assembled a solid crew of hardware people that give a shit about us.

Edit: it doesn't matter to me anymore because I know, following my gut and with all my shitty experiences with the 20 series as knowledge. As hardware enthusiasts, all we have sometimes is that gut feeling based off of experience. That's enough for me, and it can be enough for you too. We know, and while we may be a vocal minority, people do hear us daily. I've personally influenced over a hundred people into not buying the 20 series. If you're on this forum, you love this shit as much as I do and chances are, people listen to you. You're not under Nvidia's NDA, so tell people to ban this generation and that hits Nvidia right in the nuts.

Maybe next generation they will release something without a 20-25% failure rate. However, by then I will probably have an Intel GPU and won't give a shit anymore ;)
 
The problem with their cards is in the silicon, their GPU silicon.

...

We may never see Nvidia called out on their bullshit, however, does it matter anymore? We already know there's something deeply flawed somewhere in their silicon. Calling them out on it won't break their company, it will only make them get nastier on their next set of programs that limit what anyone can say about their products.

Given that the problem is most likely in the GPU silicon itself, do we actually know that it is Nvidia's fault? TSMC has had issues with their smaller nm processes, so how do we know that this problem isn't ultimately with TSMC?
 
I don't buy cards from TSMC. If there is a problem in the silicon, it is up to NVIDIA to catch it and fix it, and not sell me something they know is broken.
 
Unless they did not know it was broken. Sometimes errors only pop up when production is ramped up. By that time all the chips are already in motion...literally. That isn't to say that Nvidia doesn't share the responsibility, but it is hard to blame Nvidia for not fixing a problem they are not directly involved with, and what I quoted was a claim that it was specifically Nvidia's design that was at fault. I was merely suggesting it may not be the design, it may be the manufacturing process.
 
Given that the problem is most likely in the GPU silicon itself, do we actually know that it is Nvidia's fault? TSMC has had issues with their smaller nm processes, so how do we know that this problem isn't ultimately with TSMC?
The issues at TSMC have been isolated. Until it is explicitly stated that TSMC has screwed up a bunch of Nvidia's silicon, it's going to be Nvidia's issue. What I think will happen if someone calls Nvidia out on this is that they will simply blame TSMC. They're holding that card in their back pocket.

I agree with Grimlaking: the issues are so prevalent that it's fucking irresponsible for a company like Nvidia to have allowed their product to be propagated to AIBs and done nothing about it, when they would have had to know this was an issue given the sheer volume of failures. Even if they didn't initially, they certainly did an internal investigation and discovered what is actually going on. Yet they have done nothing. This is an example of corporate ethics at their finest... which, essentially, is that any sense of good business ethics in corporations typically doesn't exist. Nvidia has been rather open about how they couldn't give a shit about fair business practices, with their anti-competitive tactics, lack of transparency and a history of attempting to obscure the truth that has actually ended in at least one class action lawsuit, over the GTX 970. Those cards had 4 GB of RAM, but only 3.5 GB of it sat on a 224-bit path and the last 0.5 GB sat on a 32-bit path, which caused stuttering when programs would access that block of GDDR. You can bet your ass that this was one of many initiatives that they attempted to get away with because it saved money, and they "technically" didn't lie about the size of the RAM, only about the bandwidth available to 512 MB of it.
 
Unless they did not know it was broken. Sometimes errors only pop up when production is ramped up. By that time all the chips are already in motion...literally. That isn't to say that Nvidia doesn't share the responsibility, but it is hard to blame Nvidia for not fixing a problem they are not directly involved with, and what I quoted was a claim that it was specifically Nvidia's design that was at fault. I was merely suggesting it may not be the design, it may be the manufacturing process.
I think the big argument here is about protecting the consumer, which Nvidia totally doesn't give a rat's ass about. The 20 series cards saw a price hike, a massive one at that. There is zero transparency in that company. They have a history of trying to screw their consumer base. I get what you're saying, that once the wheel is in motion, it's hard to stop. Nvidia acknowledging that there was an issue would have been the ethical thing to do. However, they didn't do that. They flat out lied about the frequency of issues by calling them test escapes and claiming it was some absurd 0.001 percent of their cards that had the issues. When, in fact, it was more like 20-25% of all of their 20 series products. We have seen 2070s, 2080s, 2080 Tis and RTX Titans all fail hard. I don't have any figures about the 2060s... However, it's likely an issue there as well. The 1660 and 1660 Ti are derived from the same Turing architecture and may very well suffer the same issues...

The right call would have been to announce the problem, support their user base, correct the issue and then inform the user base of the batches of cards moving forward that don't have an issue anymore. This would have hit their stock prices, people would have stopped buying their cards and their stock which had already tanked would have been reduced to a dumpster fire. The discovery TSMC made hadn't yet aired into public knowledge and that is why Nvidia lied and remained silent.

If they came out now, it would look even worse, like corporate negligence or flat out lying to their consumer base. Furthermore, coming out now would likely ignite a class action lawsuit against them.

The fact that Nvidia hasn't even broached the subject of TSMC's contaminated batches of silicon is one of two things. As above I said it's a card to save in their back pocket in case they do get called out.... And/or it's because TSMC's bad batches have nothing to do with the problem.
 
Ha... Kyle, please do what it takes to get the final word on this mess. I'm so looking forward to it. After all you have gone through (Nvidia NDA, etc.), it would be a shame not to finish what you have started, that is, to find the real answer as to why it's happening. Thanks!
 
I think the big argument here is about protecting the consumer, which Nvidia totally doesn't give a rat's ass about. The 20 series cards saw a price hike, a massive one at that. There is zero transparency in that company. They have a history of trying to screw their consumer base. I get what you're saying, that once the wheel is in motion, it's hard to stop. Nvidia acknowledging that there was an issue would have been the ethical thing to do. However, they didn't do that. They flat out lied about the frequency of issues by calling them test escapes and claiming it was some absurd 0.001 percent of their cards that had the issues. When, in fact, it was more like 20-25% of all of their 20 series products. We have seen 2070s, 2080s, 2080 Tis and RTX Titans all fail hard. I don't have any figures about the 2060s... However, it's likely an issue there as well. The 1660 and 1660 Ti are derived from the same Turing architecture and may very well suffer the same issues...

The right call would have been to announce the problem, support their user base, correct the issue and then inform the user base of the batches of cards moving forward that don't have an issue anymore. This would have hit their stock prices, people would have stopped buying their cards and their stock which had already tanked would have been reduced to a dumpster fire. The discovery TSMC made hadn't yet aired into public knowledge and that is why Nvidia lied and remained silent.

If they came out now, it would look even worse, like corporate negligence or flat out lying to their consumer base. Furthermore, coming out now would likely ignite a class action lawsuit against them.

The fact that Nvidia hasn't even broached the subject of TSMC's contaminated batches of silicon is one of two things. As above I said it's a card to save in their back pocket in case they do get called out.... And/or it's because TSMC's bad batches have nothing to do with the problem.

I think people just like to hate on Nvidia because they are the big bad guy. But they aren't doing anything that other companies don't do as well. As far as their customer focus, to be honest, I have always gotten great support from Nvidia. So I am not sure why you would say they are not customer focused. The improvements they make to their cards and architecture are with the express purpose of pushing the limits of what you can do with the technology. So they are obviously customer focused.

As for this particular issue, they are not handling it any different than other companies would handle this. If they do not know the root cause, or they have not yet found a fix for it, then what do you expect them to say? This is why there are RMAs. It isn't all cards that are having these problems or we would have seen a lot more people offering theirs up for Kyle.

As for looking worse, you are suggesting that it would look worse for them to come out when they have a solution and a path forward, than to just say..."well shit, we got nothing"? This does not generally happen in the business world expressly because that does not make the business look better.

Again, I am not saying they are not culpable, but to suggest things that have zero backing to them and especially others who suggest actually propagating false stories is just a bit ridiculous.
 
There is no truth in unsubstantiated rumors.

God damn you use a lot of fucking words to say nothing of value.

Anywho, yes there can be truth in an unsubstantiated rumor. In this example it’s only unsubstantiated because the hypothetical evidence might not be possible to release. In other examples rumors can be true from undisclosed sources.
 
There are a lot of murmurings about increased failure of the card, but the return rates are only at 3.5% with half being normal returns. So you have a failure rate if we are being generous of 2%. Do you shut everything down and stop selling the product because of a 2% failure rate?
Hard to believe when people (and reviewers) are complaining about receiving a replacement card defective too.
What is the probability of getting several defective cards?
I don't remember such a mess with previous generations...
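
To put rough numbers on that question, here is a minimal sketch. It assumes every card fails independently with the same probability, which is a simplification (replacements drawn from the same bad batch would not be independent). The function name is made up for illustration, and the 2% and 20% rates are just the two figures being argued over in this thread, not measured data.

```python
from math import comb


def prob_at_least_k_defective(n: int, k: int, p: float) -> float:
    """P(at least k of n cards are defective), binomial model.

    Assumes each card fails independently with probability p
    (an assumption for illustration, not a claim about real RMA data).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Compare the two failure rates quoted in the thread.
for rate in (0.02, 0.20):
    both = prob_at_least_k_defective(2, 2, rate)
    two_of_three = prob_at_least_k_defective(3, 2, rate)
    print(
        f"failure rate {rate:.0%}: "
        f"P(card and its replacement both fail) = {both:.2%}, "
        f"P(at least 2 of 3 fail) = {two_of_three:.2%}"
    )
```

Under these assumptions, getting a dead card followed by a dead replacement is about a 1-in-2,500 event at a 2% failure rate and about 1-in-25 at 20%, so repeated reports of defective replacements fit the higher figure much better than the lower one.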
 
There are a lot of murmurings about increased failure of the card, but the return rates are only at 3.5% with half being normal returns. So you have a failure rate if we are being generous of 2%. Do you shut everything down and stop selling the product because of a 2% failure rate? Do you panic before you find the cause of some of these recurring and similar failures? Tell me, what do you do?

LOL WTF?? 3.5% failure rate, where are you plucking that figure from? Nvidia?

It was Kyle who said that AIBs have reported to him that they are experiencing an RMA rate higher than 20%. It would only take a quick look around the various tech forums to see that the figure is much more likely to be 20% than 2%.

And if you trust and have faith in Kyle's ability and willingness to get to bottom of this issue then you have to believe his reporting of the returns rate is accurate.
 
My failure rate, alone, with the 2080Ti is 66.666666666666666666666666666666% Lol :D
 
After my unpleasant experience of a dying RTX 2080ti FTW3 a few days ago, I will provide here the only logical explanation about this RTX mess.

Facts:

- all models from all brands are affected: this problem can impact any RTX card

- different PCB with oversized power supply and cooling are also affected: so it's not a problem related to temperature or power failure

- cards can die with Micron memory (used from the beginning so widespread) but also with Samsung memory (used later): so the problem is not related to the memory, otherwise it would mean both Samsung and Micron are unable to manufacture GDDR6 (moreover without giving any explanation on other types of death with no artifacts)
https://forums.geforce.com/default/...-evga-2080ti-ftw3-icx2-hydro-copper-error-43/

- we can see different types of death with different symptoms: it's hard to believe that this RTX generation is cursed to the point of undergoing several types of problems

- the core itself remains the only common point between all these dying cards: knowing that some cards are still working fine, then it's not a faulty design of the Turing architecture from Nvidia

- The survival time is variable: this implies abnormal and progressive deterioration of the core over time (not detectable at factory)

- we learn at the same time that the TSMC 12nm fab that builds RTX has suffered a serious contamination: but little information was provided, so as not to scare away customers and investors
https://www.extremetech.com/computi...stroys-tens-of-thousands-of-nvidia-gpu-wafers

- a contamination of the core can show up in many ways: when the memory controller (inside the core) is affected, we see artifacts (corrupted cache); if the card has a light contamination level, it can survive longer but with strange behavior (power drops, freezes, crashes, temperature spikes...); and if the core is seriously damaged, Windows shows a blue screen with a core dump and the card is not detected by Device Manager!

I think this is it as well. The GPU silicon. I was down to either the GPU silicon, or the PCB board, but since custom AIB's are also seeing the issue, which I believe use their own custom PCB designs, all that is left is the GPU itself.

As far as the affected RTX cards, is it still just the 2080Ti? Or is the 2080, 2070 etc also lumped into this? For them to be included, they should be at the same 20% failure rate. I think up to 5% is fairly typical, beyond that I would call it a high rate of failure.
I thought that the silicon of the 2080 vs the 2080Ti were different? https://en.wikipedia.org/wiki/GeForce_20_series They have differing transistor counts by the billions, as well as different sizes.

So it could be a TSMC issue with the production, or a design flaw with the GPU (seems less likely). If the issue affects all three RTX dies (TU102, TU104, TU106), then design flaw becomes much more probable. If it's a manufacturing issue, then it's just one more nail in Moore's Law's coffin, one more issue to be resolved before things move forward...

As far as nVidia not saying anything more than what they already have, they have to protect relationships with TSMC, et al. Plus, they don't want to scare away any sales. If 80% of the cards are good, and end up having a normal lifespan >3years, at least the customer/enthusiast isn't screwed over... I haven't heard that nVidia or any AIB's are not supporting their customers having these issues. Of course it sucks to have to deal with it, sucks to get multiple failed cards in a row, sucks to pay return shipping, and it sucks they were so expensive.

I'm just going to wait for the 7nm GPU's before I upgrade from my 1080Ti.

If nVidia came out and said "we are going to double the warranty on the 2080Ti cards", that would be well received by the community. Of course, if EVGA still has lifetime warranty, this is already in place at least for some... does anyone still do lifetime warranties? Maybe I am thinking of old BFG...

Still hoping that there is some kind of conclusion to the testing Kyle is doing, but if it's the silicon, not sure that any amount of testing will reveal that, short of swapping the GPU out with one from a card that has been running for months with no issues. Expensive, and not very easy to do. I wonder if Louis Rossmann could swap these giant ass GPU's...

Appreciate all the work you've put into this Kyle.
 
LOL WTF?? 3.5% failure rate, where are you plucking that figure from? Nvidia?

It was Kyle who said that AIBs have reported to him that they are experiencing an RMA rate higher than 20%. It would only take a quick look around the various tech forums to see that the figure is much more likely to be 20% than 2%.

And if you trust and have faith in Kyle's ability and willingness to get to the bottom of this issue then you have to believe his reporting of the return rate is accurate.

3.5% is the highest reported figure I have seen from either Nvidia or resellers. It may be the numbers are higher, but they are not being reported. What is being reported has nothing to do with my faith in Kyle's ability or his conclusion. Also my original premise that it may have been something that slipped through also seems to be a thought shared by Nvidia, as early as November of last year. And it seems mostly to deal with Ti cards. Even if we take the figure of 20%, the same logic applies though.

I see no one has chosen to answer what should be done?

God damn you use a lot of fucking words to say nothing of value.

Anywho, yes there can be truth in an unsubstantiated rumor. In this example it’s only unsubstantiated because the hypothetical evidence might not be possible to release. In other examples rumors can be true from undisclosed sources.

No, by definition, there cannot.
 
3.5% is the highest reported figure I have seen from either Nvidia or resellers. It may be the numbers are higher, but they are not being reported. What is being reported has nothing to do with my faith in Kyle's ability or his conclusion. Also my original premise that it may have been something that slipped through also seems to be a thought shared by Nvidia, as early as November of last year. And it seems mostly to deal with Ti cards. Even if we take the figure of 20%, the same logic applies though.

I see no one has chosen to answer what should be done?



No, by definition, there cannot.
The only thing that can be done is for the truth to come out.

I understand what you, personally, are citing. However, I'm going to trust what I know and the word of hardware legend Kyle Bennett over anything that's been reported by Nvidia at this point.

I have had 2 bad cards in a row and my third still exhibits issues that lead me to believe it will fail at some point. I must be part of your 3.5 percent then.. lol
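
For what it's worth, here is a rough back-of-envelope sketch of how likely two bad cards in a row would be under each of the rates being argued about in this thread. It assumes failures are independent from card to card, which is my assumption and probably too generous (a bad batch at one retailer would break it), and the function name is just something made up for illustration:

```python
def prob_consecutive_failures(rate: float, n: int) -> float:
    """Chance that n cards in a row are all bad, assuming independent failures."""
    return rate ** n


# The two rates quoted in this thread: 3.5% (reported) and 20% (AIBs, per Kyle).
for label, rate in [("3.5%", 0.035), ("20%", 0.20)]:
    p = prob_consecutive_failures(rate, 2)
    print(f"failure rate {label}: {p:.4%} chance of 2 bad cards in a row")
```

One buyer's streak doesn't prove either number, but it is a lot less surprising at 20% than at 3.5%.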
 
I think this is it as well. The GPU silicon. I was down to either the GPU silicon, or the PCB board, but since custom AIB's are also seeing the issue, which I believe use their own custom PCB designs, all that is left is the GPU itself.

As far as the affected RTX cards, is it still just the 2080Ti? Or is the 2080, 2070 etc also lumped into this? For them to be included, they should be at the same 20% failure rate. I think up to 5% is fairly typical, beyond that I would call it a high rate of failure.
I thought that the silicon of the 2080 vs the 2080Ti were different? https://en.wikipedia.org/wiki/GeForce_20_series They have differing transistor counts by the billions, as well as different sizes.

So it could be a TSMC issue with the production, or a design flaw with the GPU (seems less likely). If the issue affects all three RTX dies (TU102, TU104, TU106), then design flaw becomes much more probable. If it's a manufacturing issue, then it's just one more nail in Moore's Law's coffin, one more issue to be resolved before things move forward...

As far as nVidia not saying anything more than what they already have, they have to protect relationships with TSMC, et al. Plus, they don't want to scare away any sales. If 80% of the cards are good, and end up having a normal lifespan >3years, at least the customer/enthusiast isn't screwed over... I haven't heard that nVidia or any AIB's are not supporting their customers having these issues. Of course it sucks to have to deal with it, sucks to get multiple failed cards in a row, sucks to pay return shipping, and it sucks they were so expensive.

I'm just going to wait for the 7nm GPU's before I upgrade from my 1080Ti.

If nVidia came out and said "we are going to double the warranty on the 2080Ti cards", that would be well received by the community. Of course, if EVGA still has lifetime warranty, this is already in place at least for some... does anyone still do lifetime warranties? Maybe I am thinking of old BFG...

Still hoping that there is some kind of conclusion to the testing Kyle is doing, but if it's the silicon, not sure that any amount of testing will reveal that, short of swapping the GPU out with one from a card that has been running for months with no issues. Expensive, and not very easy to do. I wonder if Louis Rossmann could swap these giant ass GPU's...

Appreciate all the work you've put into this Kyle.
The failures afaik are on 2070's, 2080's, 2080Ti's, RTX Titans and I don't have any figures on the 2060's but suspect they are suffering in a similar fashion. The 1660 and 1660Ti are made from the same silicon... I don't have figures on those recent boards tho.
 