Nvidia's real problem - the next generation

JayteeBates

So Fermi is here and its performance is in the same ballpark as ATi's current cards. But the engineering at its heart is radically different, going for brute-force "MOAR POWAH"!

So while they now have a card to compete with ATi - can they produce a next generation with the same philosophy? I say they cannot, and I present what I consider the end of the brute-force approach.

Code:
Core   Card                Fab   Die     Core    Mem Bus  Transistors  Watts
                           (nm)  (mm^2)  (MHz)   (bits)   (millions)
G80    GeForce 8800 Ultra  90    484     612     384      681          ?
GT200  GTX 280             65    576     602     512      1400         ?
GF100  GTX 480             40    529     700     384      3200         ?

Now if we look at the history of transistor count in each generation from the G80 onward, we see an increase of more than double the previous generation. So let's make some future predictions...

If we extrapolate the next generation NV product will have:

6,400,000,000 transistors
32nm fab process

The transistor count follows the trend, and let's even give them the benefit of the doubt and assume they can manufacture it at the same process level Intel currently uses, 32nm. That leaves us with a die size of approximately:

655 mm^2
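A quick sanity check on that arithmetic - a rough sketch assuming transistor density scales with the inverse square of the feature size (an ideal shrink; the exact figure depends on which GF100 die-size estimate you start from):

```python
# Ideal-shrink die-size extrapolation: transistor density is assumed to
# scale with the inverse square of the feature size.
def projected_die_area(base_area_mm2, base_node_nm, new_node_nm, transistor_factor):
    """Die area after scaling the transistor count by transistor_factor
    and shrinking from base_node_nm to new_node_nm."""
    shrink = (new_node_nm / base_node_nm) ** 2
    return base_area_mm2 * transistor_factor * shrink

# Double-transistor GF100 successor at 32nm, from a ~529 mm^2 GF100:
print(round(projected_die_area(529, 40, 32, 2)))  # 677 mm^2
# Starting from a ~512 mm^2 estimate instead reproduces the ~655 mm^2 above:
print(round(projected_die_area(512, 40, 32, 2)))  # 655 mm^2
```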

Just doesn't seem feasible for them to continue down the current path. They are going to be forced to take a different approach, be it mimicking ATi's more modular design or some other path. The power requirements for a 655mm^2 die would be astronomical.
 
Aren't most manufacturing companies skipping 32nm and going to 28nm? That would put the die size in the 500 to 510 mm^2 range. Still way too big, but pretty much the same as the previous generations.
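That 500 to 510 mm^2 ballpark drops out of the same ideal square-law shrink assumption, just with 28nm plugged in (a rough sketch; the exact number depends on the GF100 die-size estimate used):

```python
# Ideal shrink: area scales with (new_node / old_node)^2.
def shrink_area(area_mm2, old_nm, new_nm, transistor_factor=1.0):
    return area_mm2 * transistor_factor * (new_nm / old_nm) ** 2

# Double-transistor GF100 successor at 28nm, from two GF100 size estimates:
print(round(shrink_area(512, 40, 28, 2)))  # 502 mm^2
print(round(shrink_area(529, 40, 28, 2)))  # 518 mm^2
```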
 
Nvidia and ATI are already about halfway through developing their next generation.
 
TSMC is not going to go 32nm; the next process node is 28nm, followed by 22nm in the middle to late part of next year.

Keep in mind, the GF100 is the "big jump" architecture-wise - the last big jump for them was G80. And the interesting jump with G80 (size wise) was G92b.

Think of how much mileage and scaling (in both directions) was achieved with G80. In the short term (next 12 months), there is almost no chance of a "double Fermi" as you have extrapolated. The G92b shrunk a lot compared to G80, and was faster. I have no doubt that there are things that will be trimmed from the current GF100 die for its G92/G92b equivalent in addition to enabling the remaining SMs to make the next major revision, before a new derivative with more SMs is ever released. I would predict that the next major revision will actually be smaller, not larger.

One of the major issues with the GF100 was the memory controller and the other was heat / power draw - I bet we'll see a "485" (or renamed 5xx series) down the road that is close to the same size (or smaller) and will be quite a bit faster, with all cores active.
 
While 28nm is possible, it isn't even listed on TSMC's site under advanced tech yet. I consider Intel to be the leader in silicon fab, and they are still on 32nm at the moment. I don't think TSMC will be able to provide the 28nm process en masse for at least 1-2 more years.

Link to TSMC Advanced Tech page: http://www.tsmc.com/english/b_technology/b01_platform/b0101_advanced.htm

Here is a link from Jan 8th 2010 from TSMC about them working on 28nm process with Qualcomm: http://www.tsmc.com/tsmcdotcom/PRListingNewsAction.do?action=detail&newsid=4462&language=E

However, assuming they can make it to 28nm, that would extend things out perhaps one more cycle. They are still going to run into a wall with regard to a monolithic design. The complexity increase is not linear. Intel had to back off frequency increases with the P4 and take another approach - NV will have to do the same with this current line of engineering. I don't foresee them failing, nor do I want them to. I want cheaper, faster cards with more eye candy in my games.
 
I thought part of the reason Fermi has such ridiculously high thermals is due to some leakage problems they had on the 40nm process. At least isn't that what Charlie was spouting about for the last 5 months? So NVIDIA could reduce the thermals considerably by fixing whatever they broke on the 40nm size, and then even more by doing a die-shrink.

I definitely agree with OP that NVIDIA is doing a "brute force" approach in order to get the GTX 480 out the door right now, but I'm fairly certain the architecture itself has a lot of room for improvement in both performance and efficiency.

I'm intrigued by the technology the GF100 brings to the table, but not enough to go out and grab one just yet. There's no question that NVIDIA is scrambling to get a respin done, and no doubt in my mind that the respin will be a much better product. The GTX 480 is really just a stop-gap product for PR damage-control IMO. I'm very much looking forward to seeing their next card.
 
We'll have to see what NVidia does with its engineering strategy. It seems obvious to me that AMD's strategy makes a lot more sense. It's just a question of whether NVidia decided to start designing smaller GPUs after RV770 launched, or whether they continued with their big-GPU design.

I thought part of the reason Fermi has such ridiculously high thermals is due to some leakage problems they had on the 40nm process. At least isn't that what Charlie was spouting about for the last 5 months? So NVIDIA could reduce the thermals considerably by fixing whatever they broke on the 40nm size, and then even more by doing a die-shrink.
Yeah. I think even Anand was talking about how the TSMC 40nm was leaked a lot more and was a lot more variable than other processes, which impacted performance and yields.
 
GlobalFoundries is doing a 28nm process, and AMD/ATI are supposed to jump to them. For the past 2 years ATI was faster in switching to a smaller process node than Nvidia; I would attribute this to their design approach, but also to having more experience in the actual process integration when it comes to making these chips.

Nvidia is fabless; AMD/ATI just went fabless, hence their manufacturing experience is greater. The overall loser will be TSMC, and I would not be surprised if Nvidia wants to jump ship.
 
Fermi's problem isn't really that it was designed from the start to be a "brute force, more power" GPU. It's the flaws and leakage in TSMC's 40nm manufacturing that resulted in them having to pump ridiculous amounts of power into the card to get it working at decent speeds. They could have reduced the TDP, but then it would be slower than a 5870 and do them no favors.

Similarly, I'm sure ATI, if they really wanted, could beef up the VRM (the weakest area) and GPU cooling on the 5870 and put another 120 watts into it to get to Fermi levels - maybe then the 5870 would be even more of a beast.
 
Fermi's problem isn't really that it was designed from the start to be a "brute force, more power" GPU. It's the flaws and leakage in TSMC's 40nm manufacturing that resulted in them having to pump ridiculous amounts of power into the card to get it working at decent speeds. They could have reduced the TDP, but then it would be slower than a 5870 and do them no favors.

Similarly, I'm sure ATI, if they really wanted, could beef up the VRM (the weakest area) and GPU cooling on the 5870 and put another 120 watts into it to get to Fermi levels - maybe then the 5870 would be even more of a beast.

No, IT IS the design of Fermi itself that causes the leakage, which is why TSMC couldn't manufacture the thing the way they did ATI's.

Didn't we already discuss this a while back in the RV870 article thread?
 
I saw an interview with a senior ATI engineer a month or so back. His job was to take the giant "how to make your chip work on our 40nm process" book from TSMC and figure out what in it was really important, and what would cause more problems elsewhere if implemented. One of the things his team figured out early on was that TSMC's 40nm process was going to have problems with connections between layers of the chip, which would require cranking the voltage and dropping the clock speed to maintain suitable yields. ATI responded by doubling up every connect for redundancy. Nvidia apparently didn't, and ended up with a hot-as-hell chip with disappointing yields. At some point they'll be doing a refresh of the Fermi design, and will almost certainly end up doing the same. This should make Fermi 1.1 significantly more competitive than the current revision.
 
I saw an interview with a senior ATI engineer a month or so back. His job was to take the giant "how to make your chip work on our 40nm process" book from TSMC and figure out what in it was really important, and what would cause more problems elsewhere if implemented. One of the things his team figured out early on was that TSMC's 40nm process was going to have problems with connections between layers of the chip, which would require cranking the voltage and dropping the clock speed to maintain suitable yields. ATI responded by doubling up every connect for redundancy. Nvidia apparently didn't, and ended up with a hot-as-hell chip with disappointing yields. At some point they'll be doing a refresh of the Fermi design, and will almost certainly end up doing the same. This should make Fermi 1.1 significantly more competitive than the current revision.

Actually, Fermi is already supposed to be the A3 revision, so that would make it Fermi 1.3.
 
ATI has more engineers and more experience in jumping quickly to a smaller process than Nvidia. ATI was more conservative in what they expected out of TSMC's 40nm process, and was able to better predict and compensate for potential problems. Nvidia was more optimistic. ATI turned out to be right.

What Nvidia does for their next generation will certainly be interesting. ATI's smaller-die approach is working out better than expected. Will Nvidia follow suit? I doubt it. Fermi tried to be great at both GPGPU and gaming, and I expect Nvidia to try that approach again, which will continue to cause their transistor count to skyrocket.
 
No, IT IS the design of Fermi itself that causes the leakage, which is why TSMC couldn't manufacture the thing the way they did ATI's.

Didn't we already discuss this a while back in the RV870 article thread?

No, you are confusing cause and effect. Leakage is an unintended consequence of poor tolerances by TSMC which the design did not compensate for. The design did not CAUSE the leakage; leakage is a symptom. TSMC and ATI had the same leakage problems, because TSMC could not provide the tolerances necessary for the transistor channel length. ATI had the same problems as Nvidia did, but they switched to 40nm a generation earlier and learned their lessons. Nvidia didn't, because TSMC kept promising them it was fixed and they could do it (when they couldn't). What is different is that ATI realized the leakage was there and, from the lessons learned on RV740, redesigned to put double redundancy on all the connects to deal with it. Nvidia didn't do this, so the leakage got out of control and caused the power problems. TSMC told them they could deal with it, but obviously they couldn't.

-edit
Exactly what DanNeely said.
 
ATI has more engineers and more experience in jumping quickly to a smaller process than Nvidia. ATI was more conservative in what they expected out of TSMC's 40nm process, and was able to better predict and compensate for potential problems. Nvidia was more optimistic. ATI turned out to be right.

What Nvidia does for their next generation will certainly be interesting. ATI's smaller-die approach is working out better than expected. Will Nvidia follow suit? I doubt it. Fermi tried to be great at both GPGPU and gaming, and I expect Nvidia to try that approach again, which will continue to cause their transistor count to skyrocket.

ATI was fortunate to learn their lessons on the less relevant RV740 a generation ago. Nvidia jumped to 40nm with a giant, key part, so if it failed, their problems would be compounded.

Also, Fermi is rumored to be a reclaimed off-the-shelf GPGPU Tesla part wrangled into a graphics chip.
 
Maybe on 28nm Nvidia will finally hit their target 900MHz core clocks for Fermi with all 512 SPs enabled.

Assuming an equivalent boost in GDDR5 speeds as well, would a 30% speed boost enable Fermi to compete with the HD 6870?
 
We still don't know if Nvidia will keep developing these massive GPUs, or if they'll move to a different strategy like AMD's: making smaller GPUs and putting 2 of them on the same card.
 
Spare-Flare and DanNeely, are redundant interconnects something that could be added fairly easily, that is, are there a manageable number of them between chip layers that could be tweaked without heavy reworking, or are such interconnects present all through the design?

If they are relatively few, one would think nVidia could apply ATi's solution to the problem and make Fermi what it should have been.
 
Also, Fermi is rumored to be a reclaimed off-the-shelf GPGPU Tesla part wrangled into a graphics chip.

Given nVidia's mysterious statement that there will be no B1 revision of the current design, I wonder if Fermi was a re-purposed stopgap and a pure gaming chip is on the way already?
 
Spare-Flare and DanNeely, are redundant interconnects something that could be added fairly easily, that is, are there a manageable number of them between chip layers that could be tweaked without heavy reworking, or are such interconnects present all through the design?

If they are relatively few, one would think nVidia could apply ATi's solution to the problem and make Fermi what it should have been.

No, you basically have to redo the entire thing from scratch. You'll have to wait for their next generation.

Nvidia hasn't even done a base layer respin on Fermi, I think they did a metal layer respin which is mostly useless.

I think they had to put Fermi out in this state as a stopgap and they need some revenue for return on the investment they made in it. Nvidia is sitting on bags of cash though. I recently sold all my Nvidia stock so I kept up on their financials for awhile. They'll be fine if their next generation is okay. For all the 6 months they were having problems with Fermi, I'm pretty sure they have another whole team out there working on the next chip and they are maybe even halfway to tape-out.
 
I've been negative on Fermi for the past couple of days, but I'm looking forward to Fermi 2. I think it's going to be everything the current Fermi was meant to be and then some. I hope they don't take the brute-force approach and try to ram as much shit as possible onto the die. We need efficiency, speed, and affordability. I hope Nvidia took some notes from ATI. Let your hardware do the talking, not the CEO/marketing department.
 
IIRC, the FX led to the 6800 series. Maybe some good can come out of this. I was looking forward to some competition, but it sounds like it won't happen for several months. Maybe not until fall or the end of the year.
 
Honestly, the 4xx cards aren't that bad. They have technical problems, but you also have to consider the compute side of the GPU. It could quite easily have been a Tesla-style GPU that was re-purposed, but that doesn't mean it isn't a powerhouse. I think Fermi will have more eventual success than ATI will with adding more and more shaders. ATI cards are up to 1600 shaders now, and if that rate keeps up, sooner rather than later adding shaders will do nothing. GPUs need to remain innovative, unlike CPUs.
 
Honestly, the 4xx cards aren't that bad. They have technical problems, but you also have to consider the compute side of the GPU. It could quite easily have been a Tesla-style GPU that was re-purposed, but that doesn't mean it isn't a powerhouse. I think Fermi will have more eventual success than ATI will with adding more and more shaders. ATI cards are up to 1600 shaders now, and if that rate keeps up, sooner rather than later adding shaders will do nothing. GPUs need to remain innovative, unlike CPUs.

ATI and Nvidia count shaders differently. And graphics work is massively parallel by nature, so ATI and Nvidia can continue to double shaders and see massive gains until we stop using shaders. And remember, with Fermi Nvidia also doubled the shaders. The unique things like the cache design and double precision units don't help with games - they are for GPGPU applications.
 
No, you basically have to redo the entire thing from scratch. You'll have to wait for their next generation.
Nvidia hasn't even done a base layer respin on Fermi, I think they did a metal layer respin which is mostly useless.
I think they had to put Fermi out in this state as a stopgap and they need some revenue for return on the investment they made in it. Nvidia is sitting on bags of cash though. I recently sold all my Nvidia stock so I kept up on their financials for awhile. They'll be fine if their next generation is okay. For all the 6 months they were having problems with Fermi, I'm pretty sure they have another whole team out there working on the next chip and they are maybe even halfway to tape-out.

I wish I could be that optimistic.

Assuming a new base spin with new/redundant connects, that equals a bigger chip, all else being equal. Those redundancies don't come free. NVidia engineers may make a cooler chip (less leakage), but at the cost of real estate. I don't know, maybe there will be a new chip layout with significant changes. It is telling when TSMC cannot even provide enough chips with all 512 SPs enabled for the consumer market, even for a PR halo. 28nm won't be online anytime soon, and if you thought leakage at 40nm was bad...

Long story short, there is no immediate solution for NVidia's G4xx, even with a base respin. TSMC doesn't have the capability to make a chip that big, with low enough leakage, thermal envelope, and defect rates, to make it economically viable for NV to push it out the door at a price that can compete with ATI. Essentially, I have no faith TSMC can fix their 40nm problems to the point of making the G400s viable short term.

I think the current crop of G400s are nothing more than beta products. After 2 respins they got a chip competitive enough to release without everyone thinking it was a complete joke, and it was pushed out the door as a face-saving measure. No way NVidia will turn a buck on this current G400 series. I would almost be willing to bet that there are no further plans to lay out another 3-billion-transistor chip on TSMC 40nm. NV might just bide its time, as ATI looks to find itself in a similar situation with its next-gen part, and wait 9 months for TSMC or GF to get their shit together for 28nm. NV does have a pile of cash to sit on.

Opinions are like assholes, everyone has one.
 
I wouldn't say "Fermi 2" would scale so linearly in transistor count. Some aspects of Fermi's core seem to be more or less fixed, and wouldn't just be "doubled" for double the SPs/ROPs/TMUs etc.

I also think Nvidia will cut a lot of the GPGPU features down, as the "cache" and all that seemed to do jack all - even in F@H it will only be 2x a 285 (2x the specs with 2x the perf. = no real architecture boost).

GF100 at this point does seem very much like a beta chip, as it was obviously very leaky and didn't see any real performance gains from the architecture changes.

Wouldn't it be interesting to see Nvidia "double up the redundancy", or whatever the term is (sorry, no computer engineering degree yet :p ), and have a new card out in 6 months at most? I don't know how much work it would take to re-do the chip with redundant lanes (or w/e), but I guess the real pain would simply be the time TSMC takes to do any chip.

And yeah, GF FTW. TSMC bought $1.5B worth of machinery and they still can't manufacture jack shit.
 
The real question is, when ATI jumps to GF for the wafers (is there any doubt?), how much further can they pull ahead? GF is becoming a monster of a fab, with capabilities second only to Intel at this point. I have little doubt ATI has a 28nm part in less than a year - maybe not on shelves, but likely being polished for release.

Links for fun reads

http://www.globalfoundries.com/
http://www.brightsideofnews.com/new...-28nm-launch-customer-at-globalfoundries.aspx
http://www.dvhardware.net/article35585.html

A lot of speculation (well, except that ATI is already on board with GF, and GF has 28nm already... so 1+1=2), but it will really come down to whether TSMC can get their shit together and actually compete with the GF juggernaut. Will Nvidia stay if TSMC gives them great prices? If they do stay with TSMC, are they doomed to be second tier from here on out? I know this much: the next gen could really hurt Nvidia, as I think ATI has an entirely different process, different goals, and a big head start on the next gen. Pressure from the current Intel war isn't going to help them either.
 
I know this much: the next gen could really hurt Nvidia, as I think ATI has an entirely different process, different goals, and a big head start on the next gen.

The head start must be keeping them awake at night. Hopefully they created different teams with different priorities.

I think that even if TSMC gives nVidia good prices, why should they stay? If GF is becoming second only to Intel it's a no-brainer. nVidia has to bolt.
 
AMD's experience with the manufacturing process played a vital role in the 5 series. Nvidia will learn from this and should rebound with Fermi 2. I also thought 32nm was reserved for CPUs and all GPUs would jump straight to 28nm.

The beginning of next year will make up for the Fermi saga, and we can hopefully see a real battle between the 6 series and Fermi 2.
 
The head start must be keeping them awake at night. Hopefully they created different teams with different priorities.

Of course they did. That's how both sides work. ATI was well into working on the 5xxx series before the 4xxx series even launched. GPUs take years to engineer. Fermi 3 is already well under way by now.

The real question is, when ATI jumps to GF for the wafers (is there any doubt?), how much further can they pull ahead? GF is becoming a monster of a fab, with capabilities second only to Intel at this point. I have little doubt ATI has a 28nm part in less than a year - maybe not on shelves, but likely being polished for release.

Links for fun reads

http://www.globalfoundries.com/
http://www.brightsideofnews.com/new...-28nm-launch-customer-at-globalfoundries.aspx
http://www.dvhardware.net/article35585.html

A lot of speculation (well, except that ATI is already on board with GF, and GF has 28nm already... so 1+1=2), but it will really come down to whether TSMC can get their shit together and actually compete with the GF juggernaut. Will Nvidia stay if TSMC gives them great prices? If they do stay with TSMC, are they doomed to be second tier from here on out? I know this much: the next gen could really hurt Nvidia, as I think ATI has an entirely different process, different goals, and a big head start on the next gen. Pressure from the current Intel war isn't going to help them either.

Well, GF was AMD's manufacturing division before it spun off into its own company, so it's no surprise it's second only to Intel. AMD's manufacturing lagged behind Intel, but it was still no slouch compared to everyone else. It wouldn't be surprising at all for ATI to switch to GF - ATI is owned by AMD, after all, and TSMC isn't keeping up with ATI's needs very well.
 
What the hell? Let's see how Fermi performs with a respin before calling its architecture a failure. I get the sense that TSMC's screwed-up 40 nm process is the root of GTX 480's problems. And ATI has its work cut out for it in terms of catching up to Fermi's GPGPU-friendly, modular design.
 
We'll see ATI's new architecture this year also. I'm sure they will have a Fermi card in little pieces in their R&D department very soon or now.
 
We'll see ATI's new architecture this year also. I'm sure they will have a Fermi card in little pieces in their R&D department very soon or now.

They do, and I hear that once the laughter died down they found a real use for it.
They removed the stock cooler and mounted a saucer-shaped plate on it. I hear it scrambles eggs quite nicely, but at only 200 degrees F it takes forever to cook a burger.
 
What the hell? Let's see how Fermi performs with a respin before calling its architecture a failure.

This thread isn't calling Fermi architecture a failure - I started this thread simply saying it really looks like they are about to hit the wall of complexity/physics.

I get the sense that TSMC's screwed-up 40 nm process is the root of GTX 480's problems. And ATI has its work cut out for it in terms of catching up to Fermi's GPGPU-friendly, modular design.

If their 40nm process is giving them issues, how bad will a 32 or 28nm process be? Getting smaller is generally more difficult, not easier. Plus, first-run production on a new process is always going to turn out more defective parts than a mature process.
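To put rough numbers on why a big die suffers so much more on an immature process, here's the textbook Poisson yield model, where the fraction of defect-free dies falls off exponentially with die area: Y = exp(-D*A). The defect density below is a made-up illustrative figure, not real TSMC data:

```python
import math

def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-defects_per_mm2 * area_mm2)

# Illustrative defect density for an immature process: 0.4 defects/cm^2.
d = 0.4 / 100  # converted to defects per mm^2
for area in (334, 529):  # roughly Cypress-sized vs GF100-sized dies
    print(area, round(poisson_yield(area, d), 2))  # 334 -> 0.26, 529 -> 0.12
```

The same defect density hits the bigger die roughly twice as hard, which is why an immature process punishes a monolithic design disproportionately.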
 
What the hell? Let's see how Fermi performs with a respin before calling its architecture a failure. I get the sense that TSMC's screwed-up 40 nm process is the root of GTX 480's problems. And ATI has its work cut out for it in terms of catching up to Fermi's GPGPU-friendly, modular design.

Except some of Fermi's yield issues could very well be architectural problems that a respin can't fix. Nvidia has also stated that there won't be a Fermi respin.

Here is a really good read: http://www.anandtech.com/video/showdoc.aspx?i=3740&p=7
 
Can't wait to read it soon as Anandtech gets their malware attack sorted out. :)
 
How impossible would it be for Nv to create separate cores for GPGPU and gaming? Is there that much money in it that it's worth the trouble? Is it just impossible for them, with their current manpower and patents/licenses, to make something AMD/ATI-sized?
 