Spitballing some Maxwell napkin math for 20nm!

GoldenTiger · Feb 15, 2014

I'm gonna go off the deep end here with some detailed guesswork about what we might see with GM200 at 20nm. THE RESULTS AND MATH ARE THEORETICAL ONLY; THIS IS NOT BASED ON ANY SECRETLY-KNOWN INFO FOR UNRELEASED PRODUCTS. So rumor sites, don't post this up as anything, even if it's half-sensible like I think it is. Feel free to use the idea here as a base for your own speculation, of course.

According to this slide, the claim for Maxwell's first generation is that we should see half the wattage for the same performance level, even on 28nm, and a 35% gain per core (by their claims, at this point, of course). So their claim is that a card that would take 300w on Kepler can be done in a 150w or smaller envelope, even on 28nm. That leaves a huge amount of extra power for stuffing in more cores and increasing clocks. Now add in that Big Maxwell (GM200/204) is going to be on 20nm, which improves power efficiency and usable die area even further. Now add in that GM107 is Maxwell FIRST generation, and that GM200/204 are SECOND generation by all known info.... and we have a recipe for easily doubling the performance at least, if their claims are anywhere near true.

So, time for some math based on the theoretical claims and on a real product launching in a couple of days (GTX 750 Ti).

GM107 uses a single GPC, which contains 5 SMM units of 128 CUDA cores each. It uses 60 watts for performance that is beaten by a GTX 660 by only about 12% per the leaked benches. A GTX 660 has 960 Kepler cores, while GM107 has 640 Maxwell cores. A GTX 660 has a TDP rating of 140 watts. See an interesting number here?

960 is 50% more cores than the 640 Maxwell is using for the 750 Ti. Yet the 750 Ti is only around 12% slower.... see the magic number close by? Nvidia claims a 35% performance increase PER CORE and additionally that the cores will use only around half the power overall in TDP. Now extrapolate with some napkin math: the GM107 only has a 128-bit bus. That means the bandwidth efficiency is greatly improved, because on Kepler a GTX 650 Ti Boost is a huge amount faster than a GTX 650 Ti (192-bit vs. 128-bit bus widths and basically the same otherwise).
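To make the "magic number" explicit, here is a minimal sketch that derives the implied per-core and per-watt ratios from the figures quoted above. It assumes performance is a single scalar, treats TDP as actual power draw, and ignores clock speed differences; the function names are just for illustration.

```python
# Figures quoted in the post above. Clock differences are ignored (assumption).
GTX660_CORES = 960
GM107_CORES = 640
GTX660_TDP_W = 140
GM107_TDP_W = 60
GTX660_LEAD = 1.12  # GTX 660 is ~12% faster per the leaked benchmarks


def per_core_ratio():
    """Maxwell per-core throughput relative to Kepler (GM107 perf = 1.0)."""
    maxwell = 1.0 / GM107_CORES
    kepler = GTX660_LEAD / GTX660_CORES
    return maxwell / kepler


def perf_per_watt_ratio():
    """Maxwell performance per TDP watt relative to Kepler."""
    maxwell = 1.0 / GM107_TDP_W
    kepler = GTX660_LEAD / GTX660_TDP_W
    return maxwell / kepler


print(f"core count ratio (660 / GM107): {GTX660_CORES / GM107_CORES:.2f}x")
print(f"implied per-core gain:          {per_core_ratio():.2f}x")
print(f"implied perf-per-watt gain:     {perf_per_watt_ratio():.2f}x")
```

Under those assumptions the per-core ratio comes out near 1.34x and the perf-per-watt ratio near 2.08x, which lines up with the 35% and "half the power" claims.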

So it's safe to say that with 50% more bandwidth and 50% more cores, a GTX 660 only performing ~12% higher than a Maxwell part with slightly higher clocks is incredibly impressive. Now let's scale! We know at 28nm lithography that GM107 is around 148 square millimeters for the die size. Let's COMPLETELY IGNORE 20NM for a second here on the power savings and size! Forget about it for a minute. It would be extremely easy to see, since even on 28nm the reticle limit is around 570mm2, that they could fit 3 GPC units on 28nm taking around 420-430mm2 in an imaginary chip with 15 SMM units. 15 SMM units times 128 cores per unit would mean 1920 Maxwell cores.

Pretend their scheduler is great and the performance scales well with core count and clocks, and that they kept the idea of triple everything in this hypothetical, non-existent card that is an illustration only. So we'd have a 384 bit bus with 1920 Maxwell cores, probably 7ghz GDDR5 memory at least, like Kepler has, and a TDP that fits inside of 200 watts. Now let's say that you only get about 75% scaling from core count here, which is a safe assumption: Kepler scales pretty linearly, but Maxwell is a new architecture. Recall that a GTX 660 performs 12% better than a GM107 with 640 cores. Triple the core count, along with everything else, in our rough napkin math and you would have a card performing around the same as a fully unlocked GK110, by that theory-crafting at least, with better potential for higher clocks thanks to the lower power usage.
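The tripled 28nm chip can be sketched the same way. This is only an illustration of the arithmetic above: it assumes the 75% factor applies to the whole 3x multiplier, and it borrows the "GTX 660 is about 50% of a 780 Ti" figure quoted later in this post.

```python
GTX660_LEAD_OVER_GM107 = 1.12  # GTX 660 is ~12% faster than GM107
GTX780TI_VS_GTX660 = 2.0       # 780 Ti is ~2x a GTX 660 (figure quoted later in the post)


def scaled_performance(base_perf, unit_multiplier, scaling_efficiency):
    """Performance after multiplying units, derated by a flat efficiency factor."""
    return base_perf * unit_multiplier * scaling_efficiency


gpcs = 3
cores = gpcs * 5 * 128   # 5 SMMs per GPC, 128 cores per SMM
bus_bits = gpcs * 128    # triple the 128-bit bus

gm107_perf = 1.0 / GTX660_LEAD_OVER_GM107          # in units of "one GTX 660"
chip_perf = scaled_performance(gm107_perf, gpcs, 0.75)

print(f"{cores} cores on a {bus_bits}-bit bus")
print(f"hypothetical chip: {chip_perf:.2f}x a GTX 660")
print(f"relative to a 780 Ti: {chip_perf / GTX780TI_VS_GTX660:.2f}x")
```

With these inputs the hypothetical chip lands at roughly 2.0x a GTX 660, i.e. about level with a fully unlocked GK110.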

However, in reality, we know they are going 20nm. This allows for even more power savings. This also allows for many more transistors per square mm on the die. So pretend they want to go for a 520mm2 chip on 20nm, to keep costs down a tad for Big Maxwell and improve yields per wafer. According to released documents such as this: http://www.cadence.com/rl/Resources/overview/20nm_qa.pdf we can expect to see possible transistor counts of 8-12 billion. GK110 is 7.1 billion transistors. Using that as a point of reference, let's scale 28nm GK110 to 28nm Maxwell GM107: we need fewer memory controllers and less pad space, so we can safely come up with a number in the neighborhood, considering the 148mm2 die size compared to GK110's 551mm2. At 20nm, you will be able to fit upwards of 11-12 billion transistors for a high-end part. For 28nm the 148mm2 die size indicates roughly 1.7-1.8 billion transistors with the 128-bit bus.
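As a rough check on that transistor estimate, the sketch below scales GK110's transistor count pro rata by die area. It assumes uniform density across the die, which is a simplification; the post's 1.7-1.8 billion figure discounts this for the smaller memory controller and pad footprint.

```python
GK110_TRANSISTORS = 7.1e9
GK110_DIE_MM2 = 551
GM107_DIE_MM2 = 148


def density_per_mm2(transistors, die_mm2):
    """Average transistors per square millimeter."""
    return transistors / die_mm2


def pro_rata_transistors(die_mm2, density):
    """Transistor count for a die of the given area at the given density."""
    return die_mm2 * density


gk110_density = density_per_mm2(GK110_TRANSISTORS, GK110_DIE_MM2)
gm107_estimate = pro_rata_transistors(GM107_DIE_MM2, gk110_density)

print(f"GK110 density: {gk110_density / 1e6:.1f} M transistors/mm2")
print(f"GM107 pro-rata estimate: {gm107_estimate / 1e9:.2f} billion")
```

The straight pro-rata number is a little over 1.9 billion, so it works as an upper bound on the estimate in the text.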

So now we have a decent number here: we know that Maxwell at 28nm in GM107 form is taking about 2 billion transistors to perform at a level about 89% as fast as a GK106. Let's use this as a base for the next part of this thought exercise.

So 2 billion.... it's safe to say they could fit 7 GPCs at 20nm easily, since 20nm should very easily provide roughly 2x the transistor density, and we wouldn't need to duplicate memory controllers beyond 3x the 28nm design's if we went for a 384-bit bus. At 20nm let's say they went for a 384-bit bus to feed 7 GPCs' worth of cores, since the architecture is clearly far more bandwidth-efficient than Kepler. That'd give us 4480 Maxwell cores, so it probably would be plenty well-fed on that end of things.

Power-wise we'd be looking at an envelope of 250 watts, since Maxwell GM107 at 28nm takes a full-card power of 60w for 640 cores. That means probably around 45w for the GPU itself, allowing 10w for the GDDR5 and 5w for the fan and other circuitry. The move to 20nm will improve power efficiency drastically, due to the shorter gate lengths. So it's fair to say they could fit seven of those inside of that envelope, more than easily.
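The power budget can be written out as follows. This is a sketch only: it counts GPU power alone and leaves memory and board power out, so the reduction it reports is a lower bound on what 20nm would have to deliver.

```python
CARD_POWER_W = 60   # GTX 750 Ti full-card power
MEMORY_W = 10       # allowance for GDDR5 (from the post)
OTHER_W = 5         # allowance for fan and other circuitry (from the post)
ENVELOPE_W = 250
GPCS = 7

gpu_only_w = CARD_POWER_W - MEMORY_W - OTHER_W


def required_power_reduction(units, per_unit_w, envelope_w):
    """Fraction of per-unit power that must be saved to fit the envelope."""
    total = units * per_unit_w
    return max(0.0, 1.0 - envelope_w / total)


total_28nm_w = GPCS * gpu_only_w
reduction = required_power_reduction(GPCS, gpu_only_w, ENVELOPE_W)

print(f"GPU-only power per GPC at 28nm: {gpu_only_w} W")
print(f"7 GPCs at 28nm power levels:    {total_28nm_w} W")
print(f"minimum saving needed for {ENVELOPE_W} W: {reduction:.1%}")
```

Seven GM107-class GPCs at 28nm power levels would draw about 315 W for the GPU alone, so the shrink needs to save at least around 21% per GPC before memory and board power are counted.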

Again, let's go for a linear scaling factor of approximately 75% for the core count improvements... so we have a GTX 660 GK106 card we will use and compare to a GK110 780 Ti. A GTX 660 at 1080p is able to pull about 50% of the performance of a fully-unlocked 780 Ti (source: http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_780_Ti/27.html).

Now we know each 5-SMM GPC will provide around 90% of that performance, and revision 2.0 (second gen) on 20nm will probably be closer to a 100% figure. So let's conservatively say each SMM-based GPC results in the real-world performance of a GTX 660. In other words, around half of a 780 Ti. Now let's conservatively also say we only see the benefit of 75% of the cores when scaling it to 7 GPCs or 35 SMMs. 35 SMMs would be, as you recall, 4480 Maxwell cores. Multiply the performance of the 50% card by 7 and we'd have 350% of the performance of a full GK110 (or 2.5x faster). However, let's now apply the 75% rough rule and we come up with a much more reasonable 262.5% of the performance, or 2.6x as fast as (1.6x faster than) a GTX 780 Ti.
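The final scaling step is just two multiplications. A minimal sketch, assuming each GPC equals one GTX 660 (half a 780 Ti) and that the 75% factor is applied flat to the total:

```python
SMMS_PER_GPC = 5
CORES_PER_SMM = 128
GPCS = 7
GPC_VS_780TI = 0.5         # one GPC ~ one GTX 660 ~ half a 780 Ti (assumption from the post)
SCALING_EFFICIENCY = 0.75  # flat derating for imperfect scaling

cores = GPCS * SMMS_PER_GPC * CORES_PER_SMM
ideal = GPCS * GPC_VS_780TI
derated = ideal * SCALING_EFFICIENCY

print(f"{GPCS * SMMS_PER_GPC} SMMs, {cores} cores")
print(f"ideal scaling:   {ideal:.1%} of a 780 Ti")
print(f"derated scaling: {derated:.1%} of a 780 Ti")
```

Changing `SCALING_EFFICIENCY` shows how sensitive the conclusion is to that one guess: at 0.6 the result drops to about 2.1x a 780 Ti.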

My predictions, therefore, are that we will see a Big Maxwell on 20nm with at minimum twice the performance of a GTX 780 Ti, and by current rumors it is due this year. Add in that Maxwell will be able to clock higher at 20nm (I based those power figures off of the numbers above, which were of a card at 28nm with a 1085mhz GPU clock), and realistically they can probably fit a stock core speed of 1100-1150mhz in a chip this big. Thus my prediction is that we would see a 450-460mm2 die size for this hypothetical card, at 20nm, with a 250-260w TDP rating and, thanks to the 2x transistor density, approximately 9.2b transistors.

Napkin math, for sure, but worth thinking about, eh?

TaintedSquirrel · Feb 15, 2014

My predictions, therefore, are that we will see a Big Maxwell on 20nm with at minimum twice the performance of a GTX 780 Ti

Why would Nvidia kill future profits to make a jump like that?
Something like 30% is much more reasonable and it gives them leeway for the future. There's no reason for them to put all their cards on the table.

Nvidia will do the least amount possible to match AMD while also charging an extra $100. If the R9 390 is 50% faster than the 780 Ti, then big Maxwell GTX 880 will be 60% faster.

Disclaimer: I only read your conclusion to see what kind of results you came up with. I'll read the rest of your post now. :3

jwcalla · Feb 15, 2014

Don't forget that they might be reserving some die space for some ARMv8 IP. Whether that comes in the first or second gen Maxwell or whether it appears in GeForce at all (as opposed to Tesla-only) remains uncertain.

I don't think we'll see 2x performance of a 780 Ti in the first round (800 series) but we should see a dramatic increase in "performance per watt" and nvidia has claimed likewise. The 770 can do twice as many GFLOPS as the 570, so it's not unheard of.

Milena · Feb 15, 2014

They're going to milk Maxwell as long as possible, just like they did with Kepler. I'm sure a new GPU with twice the performance of the 780ti won't come before mid or end of 2015. +30% seems reasonable, then +50-60% with the refresh and eventually the full chip again.

LordEC911 · Feb 15, 2014

Again... like I mentioned in the other thread, your math is completely wrong. ALUs account for less than half the die size, so they aren't going to be what is holding you back on power consumption.
GM107 isn't going to have the same shader complexity, cache hierarchy and a lot of other misc little things that will all add up on GM200.

20nm is a complete unknown at this point and with the quick move to Finfet and relabeling it 16nm Finfet, you have to expect there to be some missed target from TSMC.
You are taking all around numbers of an extreme best case scenario, adding them together and doing the math. It isn't that simple.

trick0502 · Feb 15, 2014

I have a feeling nvidia will do the same thing they did with the gk. Release a gm104 as the high end and later release the gm110.

Both nvidia and amd will see large performance jump with their 20nm gpus. Also, I wouldn't count on tsmc having 20nm ready this year.

GoldenTiger · Feb 15, 2014

LordEC911 said:
Again... like I mentioned in the other thread, your math is completely wrong. ALUs account for less than half the die size, so they aren't going to be what is holding you back on power consumption.
GM107 isn't going to have the same shader complexity, cache hierarchy and a lot of other misc little things that will all add up on GM200.

20nm is a complete unknown at this point and with the quick move to Finfet and relabeling it 16nm Finfet, you have to expect there to be some missed target from TSMC.
You are taking all around numbers of an extreme best case scenario, adding them together and doing the math. It isn't that simple.

Thanks for proving my point. The mem controller and other aspects take up tons of die space, so extrapolating off of a worst case scenario results in my guess being conservative. Of course if you read what I wrote and knew about gpu arch, you wouldn't be flaming. Also I hadn't written about this in any other thread. Maybe you are mixing me up with someone else...?

jwcalla · Feb 16, 2014

trick0502 said:
I have a feeling nvidia will do the same thing they did with the gk. Release a gm104 as the high end and later release the gm110.

Isn't that what they did for Fermi? It seems kind of ordinary. It's clear that nvidia needs to get two generations out of this -- an 800 and 900 series.

They can't just roll out a new architecture every eight months.

And TSMC has been producing 20nm chips for at least a month now. They have three fabs open currently.

Semantics · Feb 16, 2014

jwcalla said:
Isn't that what they did for Fermi? It seems kind of ordinary. It's clear that nvidia needs to get two generations out of this -- an 800 and 900 series.

They can't just roll out a new architecture every eight months.

And TSMC has been producing 20nm chips for at least a month now. They have three fabs open currently.

I thought Fermi was a top-down release, flagship first and smaller variants later, which is why it was rife with manufacturing issues: making the fat chip was the hard part, and they needed more time to refine the process.

Liger88 · Feb 16, 2014

Semantics said:
I thought Fermi was a top-down release, flagship first and smaller variants later, which is why it was rife with manufacturing issues: making the fat chip was the hard part, and they needed more time to refine the process.

That and other things. Didn't help that the cooling solution was absolute shit on the first generation of Fermi chips. Thing was practically a tornado at idle, but again that does go hand in hand with the fundamental building of the chip itself.

At least we should be good for the next 3 years once 20nm does start making it out to the masses, since 16nm FinFET research began at the same time 20nm did and is practically wrapped up itself. After that, where FinFET will take us, god only knows. TSMC isn't quick like Intel when it comes to node changes lately, and if Intel is struggling with 14nm, god help us all: we may be in the final hurrah before Volta drops.

LordEC911 · Feb 16, 2014

GoldenTiger said:
Thanks for proving my point. The mem controller and other aspects take up tons of die space, so extrapolating off of a worst case scenario results in my guess being conservative. Of course if you read what I wrote and knew about gpu arch, you wouldn't be flaming. Also I hadn't written about this in any other thread. Maybe you are mixing me up with someone else...?

I did read it, and it still reeks of silly season.

So there are two different GoldenTigers? You didn't post in the "Official Nvidia Maxwell slides" thread?

limitedaccess · Feb 16, 2014

jwcalla said:
Isn't that what they did for Fermi? It seems kind of ordinary. It's clear that nvidia needs to get two generations out of this -- an 800 and 900 series.

They can't just roll out a new architecture every eight months.

And TSMC has been producing 20nm chips for at least a month now. They have three fabs open currently.

The 4xx series were not fully enabled Fermi designs, so for the 5xx series they refreshed by using fully enabled Fermi designs (with some other tweaks) as well as clockspeed increases. GF100 was released before GF104.

If they follow Kepler and use GM204 for the 800 series what will be interesting is how they address memory bandwidth. The GTX 680 matched the GTX 580 in memory bandwidth (and had more VRAM) despite the narrower bus because of a launch jump in memory clockspeeds. If the GTX 880 is a GM204 part (with the largest design being saved for 9xx) how will it compare to the GTX 780 (let alone the 780ti) in this aspect?

trick0502 · Feb 16, 2014

jwcalla said:
Isn't that what they did for Fermi? It seems kind of ordinary. It's clear that nvidia needs to get two generations out of this -- an 800 and 900 series.

They can't just roll out a new architecture every eight months.

And TSMC has been producing 20nm chips for at least a month now. They have three fabs open currently.

Gf100 was the 480, the gf104 was the 460. The respin gf110 was the 580.

Nahawand · Feb 16, 2014

To be honest, am I the only one that would like this kind of jump? Remember when the 8800GTX came out? It changed the freaking game. It was such a big leap that it took years for people that bought it to even replace it, and it left the competition in the dust. Why not make a card 100% faster than what you currently have, sell it for a profit for years to come, and just go all out?

Edit: Also, a card that is 2 times faster than a 780 Ti will be so dominant for people with Surround, 4k, or 1600p monitors that they will be set for a long time. Tbh we are due for this big of a jump; the 20%-40% is getting really old and unjustifiable. The same is happening with CPUs.

limitedaccess · Feb 16, 2014

Twice the performance of the GTX 780ti with a "full" Maxwell part isn't as impressive (or as big of a jump) as you might think though. The GTX 780ti is basically a "full" Kepler, now compare it to the GTX 580 which was basically "full" Fermi. Also the assumption here is that Nvidia actually releases a fully realized Maxwell sooner rather than later (both Fermi and Kepler were later, albeit in different ways).

There are also two aspects of performance, especially as it relates to gaming and particularly to scaling up resolutions: the core and the memory. Even with a 512bit bus and 7ghz GDDR5 that would only be a ~33% increase over the 780ti. By comparison the 780ti has ~75% more bandwidth than the GTX 580 (if you're wondering, the 8800 GTX had ~69% more than the 7900 GTX).

Will be interesting to see how this issue is tackled. GTX 750ti performance scaling could need to be looked at to see if there is any "secret sauce" (the larger cache?) that might address this. More room with GDDR5? GDDR6 introduction? Something else entirely?

If Volta actually reaches the memory bandwidth currently being touted (1 TB/s), this would give it over 100% more bandwidth than a hypothetical GPU with 7ghz memory on a 512bit bus.
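The bandwidth figures in this post follow from bus width times effective data rate. A minimal sketch, assuming the 780 Ti's 384-bit bus at 7 Gbps per pin (a known spec, not stated in this thread) and taking 1 TB/s as 1000 GB/s:

```python
def bandwidth_gb_s(bus_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_bits / 8 * data_rate_gbps


gtx780ti = bandwidth_gb_s(384, 7.0)      # 384-bit bus is an assumption, see lead-in
hypothetical = bandwidth_gb_s(512, 7.0)  # 512-bit bus with 7ghz GDDR5
volta_touted = 1000.0                    # 1 TB/s, taken as 1000 GB/s

print(f"780 Ti:           {gtx780ti:.0f} GB/s")
print(f"512-bit @ 7 Gbps: {hypothetical:.0f} GB/s "
      f"(+{hypothetical / gtx780ti - 1:.0%} over 780 Ti)")
print(f"Volta touted:     {volta_touted / hypothetical:.2f}x the 512-bit figure")
```

That gives 336 GB/s and 448 GB/s, a one-third increase, with the touted Volta figure at a bit over 2.2x the hypothetical 512-bit card.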

Sunin · Feb 16, 2014

GoldenTiger said:
My predictions, therefore, are that we will see a Big Maxwell on 20nm with at minimum twice the performance of a GTX 780 Ti, and by current rumors it is due this year. Add in that Maxwell will be able to clock higher at 20nm (I based those power figures off of the numbers above, which were of a card at 28nm with a 1085mhz GPU clock), and realistically they can probably fit a stock core speed of 1100-1150mhz in a chip this big. Thus my prediction is that we would see a 450-460mm2 die size for this hypothetical card, at 20nm, with a 250-260w TDP rating and, thanks to the 2x transistor density, approximately 9.2b transistors.

Napkin math, for sure, but worth thinking about, eh?

My post from this thread http://hardforum.com/showthread.php?t=1806578&page=3 at 3:49pm on 2/15:

So no math put to this yet? At a minimum, not assuming gains in cuda cores from going 20nm and efficiency gains that will bring from a wattage side. Here is the minimum we will see at 28nm:
3200 cuda cores (640@60w projected to a 300w spec)
That is an 11.11% gain in cores, and those cores are 35% faster, so at a minimum we will see a 50% increase in speed at 28nm. I would also expect that, if nvidia is smart, they will put out the first 880 at 28nm unless amd comes in strong.

Carry this logic to 20nm, and depending on wattage drops and such, I would expect we will see 6x that of the 750, thus 3840 cuda cores, or a peak gain of 33.33% in cuda cores and about an 80% increase in performance. This will be their 880ti refresh. They may even cripple it some so they can do a 990 refresh and offer up another 10-20% performance if, again, amd doesn't come in strong.

Just my magic ball prediction

So I think you are being a touch over optimistic, time will tell but unless amd comes strong i see no reason for nvidia to come in at 2x performance
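The numbers in the quoted projection can be reproduced as below. The baseline of 2880 CUDA cores for the 780 Ti is not stated in the thread; it is the known spec that makes the 11.11% and 33.33% figures work out, and the sketch assumes power scales linearly with core count.

```python
GM107_CORES = 640
GM107_POWER_W = 60
GTX780TI_CORES = 2880   # known 780 Ti spec, not stated in the thread
PER_CORE_GAIN = 1.35    # Nvidia's claimed per-core improvement


def projected_gain(cores):
    """Return (core gain, performance gain) over a 780 Ti as fractions."""
    core_ratio = cores / GTX780TI_CORES
    return core_ratio - 1, core_ratio * PER_CORE_GAIN - 1


cores_28nm = GM107_CORES * 300 // GM107_POWER_W  # 640 cores at 60w projected to 300w
cores_20nm = 6 * GM107_CORES                     # "6x that of the 750"

for label, cores in (("28nm", cores_28nm), ("20nm", cores_20nm)):
    core_gain, perf_gain = projected_gain(cores)
    print(f"{label}: {cores} cores, +{core_gain:.2%} cores, +{perf_gain:.0%} performance")
```

Both cases match the post: 3200 cores for about +50%, and 3840 cores for about +80%.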

Spazturtle · Feb 16, 2014

There may be other limits you are not thinking of; they may not be able to make dies that large on 20nm.

GoldenTiger · Feb 17, 2014

LordEC911 said:
I did read it, and it still reeks of silly season.

So there are two different GoldenTigers? You didn't post in the "Official Nvidia Maxwell slides" thread?

I didn't post this speculation in any other thread. I did make posts about different things in the linked one. So, what is your theory on what will happen? I took the time to explain mine.

Lord_Exodia · Feb 18, 2014

Wow...

Stretch, yawn. I've been away for a long time.

GoldenTiger, I think your napkin math makes perfect sense. If memory serves, on each new major generation Nvidia shoots for 2x the performance of the previous generation. Sometimes they don't hit that number. This time I think they will. Your math adds up to a scenario where they have the flexibility to scale back and still hit 2x the performance of the 780Ti pretty easily on either 28nm or 20nm.

Realistically, since they are going to try to impress with the new Maxwell architecture and AMD has been really aggressive lately, I expect them to double performance with this new release. However, in light of Mantle and how it may really boost AMD performance figures on their new architecture, they may shoot for the stars on Big Maxwell.

GoldenTiger · Feb 18, 2014

Oh wow, I haven't seen your name in a loooong time Exodia. Thanks for the post, hope all is well!
