Amazon’s New World game is bricking GeForce RTX 3090 graphics cards

Ebernanut

[H]ard|Gawd
Joined
Dec 15, 2010
Messages
1,569
I don't understand why most seem to think that both can't be partially at fault. There's a clear trend in the cards that are dying, showing that something is going on at the hardware/firmware/driver level that's making certain models more susceptible, as well as the high end of the 3000 series overall. At the same time, it's one game that's killing them, which shows that it's doing something no other game does, and from what I can tell it's nothing that benefits the user, like better graphics; it's simply pushing the cards harder.

I think in the early days of FurMark it was more to blame for killing cards, since it was pushing them in a way the designers had never anticipated. Once the engineers started designing cards to withstand the abuse, the responsibility shifted to them to make sure it actually works. I also don't think it's simply high framerates that are responsible for this, since plenty of other games have had that issue (and still do) without killing this many cards. Edit: It might be interesting to see if some of these games that have super high framerates in the menu also cause these cards to die. I know they're still out there, because I've encountered it fairly recently in older games, but maybe there aren't enough people running them on these cards to get any attention.

These days I do expect cards to have more protections that keep them from dying, so it's fair to blame the hardware. But if the game is doing something weird that was never anticipated, then it shares some blame as well. How much blame each deserves in this scenario depends on what exactly the game is doing to trigger this and how hard it would have been to anticipate.

I suspect the game will get patched to fix whatever is causing this, and that GPU manufacturers will add protections and/or fix design issues in future designs. Like with FurMark, it will mainly be because of the cost of replacing cards under warranty, but they also don't want to deal with the backlash, and this time they can't hide behind the fact that FurMark isn't a game and was intentionally made to stress cards.
 

Wat

Limp Gawd
Joined
Jun 23, 2019
Messages
333
If it were the game (high framerates), then why wouldn't 20-year-old games be killing cards too? Surely someone with a 3090 has tried to play Interstate '76 or something.
 

GoodBoy

2[H]4U
Joined
Nov 29, 2004
Messages
2,195
Fair, still the card's fault. I hear a lot of people blaming the game... which seems odd. There should be no software you can run that would kill your card; it should have both overvolt and heat protections that work. Worst case, it should shut down, right?
I generally agree with this point, but you can write software that will damage components.
No rage... we just disagree on the expected behaviour of voltage protection systems. :)

I believe it's silly to blame any software for melting a GPU. These cards all have hardware-level protections against exactly this. This is probably a soldering defect, as per the company that makes the cards in question. Speculation goes that Nvidia changed the spec at the last minute, requiring higher voltage handling, and some of the MFGs' solutions are subpar, but neither EVGA nor Nvidia have admitted anything like that, nor would they. That is on them, and perhaps Nvidia, not Amazon and New World.
The GPUs are not melting. That protection is in place and works. The GPU is the main processor for the graphics card; none of those have been damaged by these issues.
Those warnings were from a time before most chips even adjusted their frequencies, never mind real-time fine tuning of voltage and power delivery. Completely different era, where yeah, a subpar cooling setup could be exposed by far worse than black screens and reboots. :) Also, that was when many people's video cards had not been used for anything but basic 2D output... I remember the first time I got Quake to run in hardware mode on an old AMD card. Different time. First time that card had to really work, and the room got warmer. ;)

No software stress test should be capable of smoking a modern GPU or CPU if it's made properly.
The GPU doesn't 'smoke'; as previously mentioned, no GPU to our knowledge has been damaged. It's either been a fuse, a solder joint, or a single capacitor.

Even still, the software should be patched. There is 0 reason for a menu screen to run at 9000fps.

While everyone agrees this shouldn't damage any components, it is also an obviously unexpected scenario, therefore it wasn't tested. The total power, the total current - those protections are in place and work - see fuse and board power limits that are firmware enforced. The unexpected scenario allows unanticipated power levels through a single sub-pathway, but the total often is not high enough to trigger the other protection mechanisms. GPU and RAM protections are temperature based, and voltage capped. The entire board has a total power limit. Not all external components to the GPU are temperature monitored, or even individually voltage monitored. Doing so is not feasible.
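A toy illustration of that point: a board-level limit only sees the sum, so a load that shifts power onto one path can stay under the board limit while exceeding that one path's rating. The rail names, ratings, and limit below are hypothetical numbers chosen for the example, not figures from any RTX 3090.

```python
# Hypothetical numbers for illustration only; not measured from any real card.
BOARD_POWER_LIMIT_W = 350.0

# Per-rail ratings (W) for parts the firmware does NOT monitor individually.
RAIL_RATINGS_W = {"core": 250.0, "memory": 80.0, "misc": 40.0}


def board_limit_tripped(rail_power_w):
    """Firmware-style check: only the summed board power is compared to a limit."""
    return sum(rail_power_w.values()) > BOARD_POWER_LIMIT_W


def overstressed_rails(rail_power_w):
    """What per-rail monitoring would catch, if it existed."""
    return [name for name, p in rail_power_w.items() if p > RAIL_RATINGS_W[name]]


typical = {"core": 220.0, "memory": 70.0, "misc": 30.0}  # 320 W total
unusual = {"core": 290.0, "memory": 30.0, "misc": 20.0}  # 340 W total, skewed onto "core"

for label, load in (("typical", typical), ("unusual", unusual)):
    print(label, sum(load.values()), board_limit_tripped(load), overstressed_rails(load))
```

Neither load trips the board limit, but the skewed one pushes the hypothetical "core" rail past its rating, which is the kind of gap the post describes.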

The next generations (and probably some in this generation) will likely have a specific mechanism in place to protect against this problem. It would also be ill-advised to leave this 'feature' in the software, I do not see why you keep trying to excuse it.
 

ChadD

Supreme [H]ardness
Joined
Feb 8, 2016
Messages
5,475
The next generations (and probably some in this generation) will likely have a specific mechanism in place to protect against this problem. It would also be ill-advised to leave this 'feature' in the software, I do not see why you keep trying to excuse it.

Well, mostly because it's not an outlier. Nvidia's early 3000 cards have been dying the same types of deaths in many games. The fault isn't a menu running at too many fps. Of course I agree best practice would be to put a frame limit on something like that.... but I mean, should they need to? The driver should cap max FPS in the 500 range at the least anyway... and again, how does running a menu at 9000 FPS put any more strain on a GPU than running scenes with millions of polygons and 10+ GB of textures at 200 FPS? I suppose it perhaps could load more power to one specific point on a GPU... but I mean, if you make a 900 horsepower supercar, you'd better engineer the axle to withstand some force. In this case it seems the power delivery designs were inadequate... could have been some MFGs cheaping out, and/or perhaps Nvidia upped the power requirements after designs were in production (which has been reported).

The only difference with New World is that it was a widely popularized beta. You had a lot of people playing the same game and talking a lot about it at the same time. Also, Amazon likely purposely made sure they had many players in their beta with new hardware. When developers select players for closed betas, they try to get a wide swath of rig variation... but they also try to get the folks with the expensive hardware. Those guys tend to both be purchasers themselves and influence sales. If you game on a potato you won't get beta invites; if you game on a shiny new top-of-the-line rig you will get an invite to any beta you apply to. Anyway, long post short, New World had eyes on it, and a lot of 3090 owners playing. So if 10% of all 3090s died everywhere... people wouldn't notice the 8 or 9 people complaining Halo killed their cards, but they might notice 100 saying the New World beta did. (And yes, I understand they did an open beta... but they had a closed round before, and I'm sure they got a bunch of 3000 owners in on the beta.)

In any event it looks like basically all the Nvidia MFGs have updated/upgraded their designs. This was the fault of the GPU MFGs and perhaps Nvidia, all we know for sure is the GPU manufacturers have all taken responsibility and are making it right for consumers.
 

cybereality

[H]F Junkie
Joined
Mar 22, 2008
Messages
8,789
I still don't think it's the game. If that Jay video is correct, and it's the power spikes, then I don't see how the game developer would cause that.

For example, in DirectX 12 (or other APIs), it's not like you request that the GPU jumps to 125% power (well maybe you could hack it with C++ but I seriously doubt Amazon did this). You can request that a 3D model is loaded onto the GPU, or that a set of textures goes to video memory, etc.

As the developer, you are making a request via a standard interface (the API, which the OS software and hardware have to conform to). Then the DirectX 12 library and runtime process the request, communicate with the video driver, and finally send it to the GPU hardware.

If you were to make a mistake in the code, sending the wrong values to a function, chances are the game would just crash, or you would get a black screen, or some other fail state. But it would not damage the hardware.

Since cards are dying, but not all of them, my guess is that it's on the hardware level. I'm a software guy, so I won't posit what the issue is, I just don't believe it can be code. Nvidia or AMD could also find the issue and maybe put a fail safe on the driver level, but it would basically be a band-aid on a hardware flaw.
 

GoodBoy

2[H]4U
Joined
Nov 29, 2004
Messages
2,195
Well, JayZ said to never, ever increase the power settings... and that's all I currently have set on my 3090 FE. Set +100% on the core voltage and 114% (likely the true limiter) on the Power Limit, and the card can OC itself just fine. It's been running like this for months.

Some amount of failure is to be expected. What's unknown is how many cards have had this issue. Let's try to overestimate it at 20,000, from a generous estimate of 2,000 people complaining about it on the (game publisher's) forums, multiplied by 10. How many 3090s have been sold? Well, I know Nvidia sold 18 million cards in 21Q1 alone. The 3090 has been out for a year now. Estimate 60 million Nvidia GPUs sold since the 3090's release; we just have to break out the ratio vs. other chips. The Steam hardware survey can give the ratios. Add up all 3000 series chips (including laptop GPUs) and that total is 7.48 percent, the 3090 0.42 percent, or 5.61% of all 3000 GPUs. That means in Q1 21 alone, 1.01 million 3090s were sold. Since launch we can safely estimate roughly 4 million. So with a likely overestimate of 20,000 being affected, that's less than half a percent.
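For reference, the power-limit slider is a percentage of the card's default board power. This sketch assumes a 350 W default for the 3090 FE (an assumption, not a figure from this thread); under that assumption, 114% works out to a cap of about 399 W.

```python
# Assumption: RTX 3090 Founders Edition default board power of 350 W.
DEFAULT_BOARD_POWER_W = 350


def power_cap_watts(slider_pct, default_w=DEFAULT_BOARD_POWER_W):
    """Convert a power-limit slider percentage into an absolute board power cap in watts."""
    return default_w * slider_pct / 100


print(power_cap_watts(100), power_cap_watts(114))
```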
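The estimate above can be reproduced in a few lines of Python. The inputs are the post's own figures; note that, as in the post, the RTX 3090's share of the 30 series is applied to all 18 million Nvidia cards sold in Q1, and "about a year" is treated as four quarters. Both are simplifying assumptions, not measured data.

```python
# Figures as stated in the post (Steam survey shares, Q1 2021 sales, guessed failures).
all_rtx30_share = 7.48        # % of surveyed GPUs that are any RTX 30-series
rtx3090_share = 0.42          # % of surveyed GPUs that are an RTX 3090
nvidia_q1_units = 18_000_000  # Nvidia cards sold in Q1 2021, per the post
quarters_since_launch = 4     # "out for a year", treated as four quarters
affected_guess = 2_000 * 10   # forum complaints x 10, deliberately high

fraction_3090 = rtx3090_share / all_rtx30_share           # 3090 share of the 30 series
q1_3090_units = nvidia_q1_units * fraction_3090           # 3090s sold in Q1
total_3090_units = q1_3090_units * quarters_since_launch  # 3090s sold since launch
failure_rate_pct = 100 * affected_guess / total_3090_units

print(f"{fraction_3090:.2%} of RTX 30-series, {q1_3090_units / 1e6:.2f}M in Q1, "
      f"{total_3090_units / 1e6:.2f}M total, {failure_rate_pct:.2f}% affected")
```

This gives about 1.01 million 3090s in Q1, about 4.04 million since launch, and a failure rate of roughly 0.49%, in line with the "less than half a percent" figure.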

If anyone knows the real number affected, let us know.

In the big scheme, looks pretty minor. But 2,000 people complaining on the game's forums is a loud noise that all nerds can hear. Just have to put it in perspective.

Devil's Advocate: Not all 3090s are playing this game. But I don't know how to estimate the % of 3090 owners who play it... the game would need its own hardware survey.
 

cybereality

[H]F Junkie
Joined
Mar 22, 2008
Messages
8,789
Sounds believable. Except I seriously doubt it's 20k people with the problem. It would be kind of hard to have a $1,000+ product die and not talk about it on the internet.
 

GoodBoy

2[H]4U
Joined
Nov 29, 2004
Messages
2,195
Yeah, I overestimated it on purpose, just to point out how insignificant this is in the grand scheme.
 

Geforcepat

Gawd
Joined
Jun 2, 2012
Messages
1,000
I hate it for the gamers whose hardware is messing up on this game. But when they saw Amazon, they should have just waited a year or two and tried it at half off, with a year of driver updates and game patches.
 

Sycraft

Supreme [H]ardness
Joined
Nov 9, 2006
Messages
5,011
Those warnings were from a time before most chips even adjusted their frequencies, never mind real-time fine tuning of voltage and power delivery. Completely different era, where yeah, a subpar cooling setup could be exposed by far worse than black screens and reboots. :) Also, that was when many people's video cards had not been used for anything but basic 2D output... I remember the first time I got Quake to run in hardware mode on an old AMD card. Different time. First time that card had to really work, and the room got warmer. ;)

No software stress test should be capable of smoking a modern GPU or CPU if it's made properly.
I don't understand why you seem to cast all blame on GPUs, and give software a complete pass. I could flip around what you say and it should also be true: "No GPU should be damaged by properly made software." It goes both ways. While hardware should be made to try and limit itself so that it can't be damaged, software should also avoid acting in ways that put undue stress on hardware that can damage it. The idea that safeties are perfect is silly. They aren't, they never have been, and they can have issues when things are run past the norm.

As a kind of analogy take elevators: They have lots of safety systems to make sure they don't hurt or kill you during normal operations. Otis brakes, buffers at the bottom, position sensors, lockouts, etc. They are even designed to deal with conditions outside normal operation, like if they get overloaded they'll just get mad and refuse to move, rather than risk breaking... However that doesn't mean you can't hurt yourself. If you override normal operation, particularly the rooftop inspection panel or the motor control room, you can create an unsafe condition. If you want to be safe it is incumbent on you NOT to do those things that are outside of normal operation unless you are trained and know what you are doing. Likewise safeties can fail. While an elevator should never move unless both doors are closed and latched, it can happen. If it does, it is incumbent on you not to stick your body in the shaft, or you will get hurt. So to be safe, don't lean against doors even in properly operating elevators. While they SHOULDN'T fail that doesn't mean they CAN'T and you are part of the safety equation.

So yes, the GPU manufacturers can and should try to figure out how this particular load is bypassing the safeties on the card and make sure it can't, but the game maker also needs to stop operating the card in such a manner. It is doing something stupid, given that this is not something we see with other software.
 

mouacyk

Limp Gawd
Joined
Dec 5, 2015
Messages
158
The word on the street is that some cards have improperly balanced VRMs, so some phases get overloaded beyond spec. Could be a mismatch of controller to phases available at either hardware or firmware level. Definitely not a software API problem. An API is an interface contract (from the hardware) to do a known thing. If an unknown thing happens through a standard API call, that's on the hardware/driver provider.
 

ChadD

Supreme [H]ardness
Joined
Feb 8, 2016
Messages
5,475
I don't understand why you seem to cast all blame on GPUs, and give software a complete pass. I could flip around what you say and it should also be true: "No GPU should be damaged by properly made software." It goes both ways. While hardware should be made to try and limit itself so that it can't be damaged, software should also avoid acting in ways that put undue stress on hardware that can damage it. The idea that safeties are perfect is silly. They aren't, they never have been, and they can have issues when things are run past the norm.

As a kind of analogy take elevators: They have lots of safety systems to make sure they don't hurt or kill you during normal operations. Otis brakes, buffers at the bottom, position sensors, lockouts, etc. They are even designed to deal with conditions outside normal operation, like if they get overloaded they'll just get mad and refuse to move, rather than risk breaking... However that doesn't mean you can't hurt yourself. If you override normal operation, particularly the rooftop inspection panel or the motor control room, you can create an unsafe condition. If you want to be safe it is incumbent on you NOT to do those things that are outside of normal operation unless you are trained and know what you are doing. Likewise safeties can fail. While an elevator should never move unless both doors are closed and latched, it can happen. If it does, it is incumbent on you not to stick your body in the shaft, or you will get hurt. So to be safe, don't lean against doors even in properly operating elevators. While they SHOULDN'T fail that doesn't mean they CAN'T and you are part of the safety equation.

So yes, the GPU manufacturers can and should try to figure out how this particular load is bypassing the safeties on the card and make sure it can't, but the game maker also needs to stop operating the card in such a manner. It is doing something stupid, given that this is not something we see with other software.
My point is it has little to do with this load. These cards are dying in all games. This game is simply in the spotlight and has a high concentration of PC gamers with $1000+ GPUs playing it right now. Amazon like all developers making a MMO stacked their closed beta with as many potential whales as possible.

Here is something most people don't think about... when a developer invites you to a beta, or there is an application process, they are not really asking you what CPU and GPU you're running for technical reasons, at least not alone. They want to get as many high-value marks in on the game as possible; they want players with $5000 rigs to feel special... they got in on the game before anyone else, etc.

So the terrible failure rate on first-gen 3000 cards got noticed. That is all that happened here. You can find plenty of people who posted to Reddit and many other game forums long before the Amazon beta went open. Cards reportedly died in the same ways... and the replacements consumers got in RMAs have newer boards. It sounds like many MFGs knew there were issues... this is only news due to the high number of 3080/90 owners playing this game.
 

cybereality

[H]F Junkie
Joined
Mar 22, 2008
Messages
8,789
It's not the game. People want to place blame because it's Amazon and it's a popular new title, but that is just for headlines. It could have happened with any game (and not because of bad programming).

Fact is, there was a design flaw in the video cards. Maybe not Nvidia's fault, it might have been AIBs that cheaped out on components, or maybe Nvidia did push the design too far. In either case, it's a hardware problem, plain and simple.
 

ZodaEX

Supreme [H]ardness
Joined
Sep 17, 2004
Messages
4,129
It should be really simple for the developer to cap the menus at say....120fps.
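A menu cap like that is essentially a frame limiter. Below is a minimal Python sketch, assuming a simple loop that sleeps off whatever is left of each frame's time budget; `run_capped` and `draw_menu` are hypothetical names, not anything from New World's code or a real engine API, and real engines may also rely on vsync or a driver-level cap.

```python
import time


def run_capped(render_frame, fps_cap, n_frames):
    """Call render_frame() n_frames times, sleeping so the loop never exceeds fps_cap."""
    frame_budget = 1.0 / fps_cap  # seconds allowed per frame
    next_deadline = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
        next_deadline += frame_budget
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)  # idle instead of immediately drawing the next frame
        else:
            # Fell behind schedule; reset rather than "catch up" with a burst of frames.
            next_deadline = time.perf_counter()


def draw_menu():
    pass  # hypothetical stand-in for drawing a cheap menu frame


start = time.perf_counter()
run_capped(draw_menu, fps_cap=120, n_frames=60)
elapsed = time.perf_counter() - start
print(f"60 frames in {elapsed:.3f}s (~{60 / elapsed:.0f} fps)")
```

At a 120 fps cap each frame gets about 8.3 ms, so the 60 frames take roughly half a second no matter how cheap the menu is to draw.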

I'm reading that the game is a work in progress.
I bet by the time the game comes out of early access, the fps in the game will be capped.
 

cageymaru

Fully [H]
Joined
Apr 10, 2003
Messages
20,783
New World runs great on my RX Vega 64. I would think if it was bad software, it would kill off the old hardware first. Maybe Nvidia is experiencing test escapes again?
 

Lateralus

More [H]uman than Human
Joined
Aug 7, 2004
Messages
16,813
New World runs great on my RX Vega 64. I would think if it was bad software, it would kill off the old hardware first. Maybe Nvidia is experiencing test escapes again?
It was multiple cards from both vendors. The 3000 series might have been affected the most (we don't know?) but according to an earlier post, EVGA RMA'd about two dozen cards over the issue. If that is indeed the case, this thread/issue is beyond laughable and a fine example of internet sensationalism.
 

cageymaru

Fully [H]
Joined
Apr 10, 2003
Messages
20,783
It was multiple cards from both vendors. The 3000 series might have been affected the most (we don't know?) but according to an earlier post, EVGA RMA'd about two dozen cards over the issue. If that is indeed the case, this thread/issue is beyond laughable and a fine example of internet sensationalism.
All I can say is the game runs like butter on my system. :) That Reddit thread is hilarious. People complaining that the game uses too many PC resources. I always thought that was the DREAM for a PC enthusiast; to find a game that can utilize all of their PC's resources to give us a reason to upgrade!

Well off to play in the real world called the Post Office and the grocery store. Maybe tonight I can find time to fire up the "New World." :)
 

tangoseal

[H]F Junkie
Joined
Dec 18, 2010
Messages
9,329
I've got about 40 hours (level 30) into New World on my 6900 XT, water cooled, custom loop. The game barely pushes my GPU past 45C. I'm not sure what to say, but it seems the 2nd-best GPU is the winner in the end. My garbage AMD, as some would call it here, is still working great!
 

GoodBoy

2[H]4U
Joined
Nov 29, 2004
Messages
2,195
EVGA said it was about 2 dozen cards, that it was a bad solder job on a small batch, and that they were all advance RMA'd without the usual upfront deposit. They probably know the serial number range of affected 3090s.

This is why we love EVGA.

EVGA FTW.
 

tunatime

Well...OK
Joined
Sep 15, 2011
Messages
4,906
I've got about 40 hours (level 30) into New World on my 6900 XT, water cooled, custom loop. The game barely pushes my GPU past 45C. I'm not sure what to say, but it seems the 2nd-best GPU is the winner in the end. My garbage AMD, as some would call it here, is still working great!
This is what happens when they push cards to the limit out of the box. Out of the box, at stock, my 3090 was always power limited... something is wrong when undervolting helps clock speeds, even on water.
 

Gorankar

[H]F Junkie
Joined
Jul 19, 2000
Messages
10,865
About 20 hours in game and I have not had any issues with a 3090 FE, or vanilla, or whatever we are calling a direct from NV card these days. Temps seem OK, even mem temps are staying below 70c. I do run a custom fan profile, fans always run, and ramp sooner, but that is it really.
 

pendragon1

Extremely [H]
Joined
Oct 7, 2000
Messages
35,670
EVGA said it was about 2 dozen cards, that it was a bad solder job on a small batch, and that they were all advance RMA'd without the usual upfront deposit. They probably know the serial number range of affected 3090s.

This is why we love EVGA.

EVGA FTW.
It's not just EVGA, and that situation was only for that batch. Other cards/brands are dying in other ways.
 

cybereality

[H]F Junkie
Joined
Mar 22, 2008
Messages
8,789
Been testing some 2D engine stuff, and easily getting over 10,000 fps on my 6800 XT. There is some coil whine, but no problems.
 

OutOfPhase

Supreme [H]ardness
Joined
May 11, 2005
Messages
4,428
It really depends upon the exact workloads. 2D stuff doesn't keep very much of the card on, so it would be very hard pressed to upset any limits.

It really appears as though there's a just right blend to sneak through the heuristics, and amazon nailed it. :)
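One way a "just right blend" could sneak through is if a limiter acts on averaged power rather than instantaneous power. This Python sketch assumes a moving-average check over 1 ms samples; the window, limit, and trace are invented for illustration, and the real power-limit heuristics on these cards are not documented in this thread.

```python
def averaged_power_trips(samples_w, window, limit_w):
    """Return True if any moving average over `window` samples exceeds limit_w."""
    for i in range(len(samples_w) - window + 1):
        if sum(samples_w[i:i + window]) / window > limit_w:
            return True
    return False


LIMIT_W = 350.0  # hypothetical board power limit

# 1 ms samples: steady 300 W with a 2 ms excursion to 500 W (made-up numbers).
trace = [300.0] * 20 + [500.0] * 2 + [300.0] * 20

print(max(trace) > LIMIT_W)                                      # instantaneous peak is over the limit
print(averaged_power_trips(trace, window=10, limit_w=LIMIT_W))   # 10 ms average never trips
print(averaged_power_trips([380.0] * 20, 10, LIMIT_W))           # a sustained overload does trip
```

The short spike peaks at 500 W, but its worst 10-sample average is only 340 W, so an averaging check would let it through while a sustained 380 W load gets caught.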
 

GoodBoy

2[H]4U
Joined
Nov 29, 2004
Messages
2,195
It's not just EVGA, and that situation was only for that batch. Other cards/brands are dying in other ways.
Yeah, no matter your GPU a max fps cap that matches your LCD's max refresh rate is still recommended.
 

Falkentyne

[H]ard|Gawd
Joined
Jul 19, 2000
Messages
1,817
No it's not the game. It's crappy AOZ power stages combined with multiple power stages per phase, causing imbalancing and then overloading.

Founders Edition cards (3080 and 3090) have 10 phases to 10 power stages for NVVDD, so, no boom.

The game is shader heavy in the menus.
Path of Exile with Global Illumination quality=Ultra is even more shader heavy. But I guess no one cares about that.
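Why multiple power stages per phase can end up imbalanced is easy to see with a simplified resistive current divider. This Python sketch assumes two stages share one phase passively, with current splitting in proportion to conductance; the resistances and the 50 A rating are made-up numbers, and real multiphase controllers actively balance current between phases, so treat it as an illustration of the failure mode described above, not a model of any specific card or of AOZ parts.

```python
def share_current(total_a, resistances_mohm):
    """Split a total current across parallel paths in proportion to their conductance (1/R)."""
    conductances = [1.0 / r for r in resistances_mohm]
    g_total = sum(conductances)
    return [total_a * g / g_total for g in conductances]


STAGE_RATING_A = 50.0  # hypothetical per-stage current rating

# Two stages sharing one phase: perfectly matched vs. one path with lower resistance.
matched = share_current(90.0, [1.0, 1.0])
mismatched = share_current(90.0, [0.8, 1.2])

for label, currents in (("matched", matched), ("mismatched", mismatched)):
    print(label, [round(c, 1) for c in currents],
          any(c > STAGE_RATING_A for c in currents))
```

With matched paths each stage carries 45 A, but a modest mismatch shifts the split to 54 A and 36 A, putting one stage over its rating even though the phase total is unchanged.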
 

tunatime

Well...OK
Joined
Sep 15, 2011
Messages
4,906
I really think Nvidia changed the specs for the card makers at the last minute or something when they learned how good the AMD cards were this round.
It explains why they had to lower boost clocks, the bad caps, cards hitting the power limit, and undervolting helping clocks.
 

t1337duder

Limp Gawd
Joined
Sep 7, 2014
Messages
229
Hardware is supposed to be reliable for the software, as it's lower level in the chain. It doesn't work the other way around. Software developers can't anticipate scenarios like this because they have to code for a variety of hardware. There are smarter ways to code things so that they are more efficient, yes, but in the end it's up to the hardware developer to be responsible for the reliability of their products.

It's NVIDIA's job to anticipate every single imaginable scenario that could kill the GPU and put protections in place to prevent that from happening. It's not much of a philosophical debate, and you shouldn't need a bachelor's in computer science to understand this.
 

DanNeely

Supreme [H]ardness
Joined
Aug 26, 2005
Messages
4,134
No it's not the game. It's crappy AOZ power stages combined with multiple power stages per phase, causing imbalancing and then overloading.

Founders Edition cards (3080 and 3090) have 10 phases to 10 power stages for NVVDD, so, no boom.

The game is shader heavy in the menus.
Path of Exile with Global Illumination quality=Ultra is even more shader heavy. But I guess no one cares about that.
Has PoE blown up any cards? I haven't seen anything about it doing so, but I've also taken this league off to play other stuff, so I've not been following its news closely.

Edit: If it did though, I'd've expected to see a dev response collected on GGG Tracker not just a forum/Reddit thread.
 

GoodBoy

2[H]4U
Joined
Nov 29, 2004
Messages
2,195
So many GPU card design experts in this thread!

[H] is really lucky!

Remember, this issue is affecting an estimated tenth of a percent of cards. That is what you are arguing about.
 