RTX 3xxx performance speculation

There's an argument for spreading out the heat generation a bit: supposing everything else is lined up properly (see AMD failing at this with their first HBM implementations), cooling is perhaps both less complicated and more efficient. AMD has also shown this with the Ryzen 3000 (Zen 2) series, with a large cache/uncore die and up to two eight-core CPU dies in a package.

Current 10th-gen Intel desktop CPUs, despite using significantly more power, still run at cooler temperatures than Ryzens with the same cooler. So I am not seeing this advantage. Here the OC 10900K is consuming ~70 watts more than the Ryzen 9 and is running a few degrees cooler. Single- vs. multi-chip makes a near-irrelevant difference in cooling.
[Image: Temperature-Testing.png.webp]
 
There's an argument for spreading out the heat generation a bit: supposing everything else is lined up properly (see AMD failing at this with their first HBM implementations), cooling is perhaps both less complicated and more efficient. AMD has also shown this with the Ryzen 3000 (Zen 2) series, with a large cache/uncore die and up to two eight-core CPU dies in a package.

Years ago, when Kirk was the main man planning NVIDIA's architecture, I watched a webcast about the G80.
In that talk, an engineer from NVIDIA talked about what consumes power in a chip:
Compute = cheap.
Moving data around = EXPENSIVE.

Moving data between chip(let)s is the worst case... the notion that a compute node (involved in rendering frames) would be separate from the main GPU goes not only against NVIDIA's way of designing things, but also against industry knowledge about data transfer.

It is indeed Silly Season once again *sigh*
 
Current 10th-gen Intel desktop CPUs, despite using significantly more power, still run at cooler temperatures than Ryzens with the same cooler. So I am not seeing this advantage. Here the OC 10900K is consuming ~70 watts more than the Ryzen 9 and is running a few degrees cooler. Single- vs. multi-chip makes a near-irrelevant difference in cooling.

You also have to consider that Intel is physically shaving down the dies to reduce the thermal resistance of these chips, as well as the fact that they're quite a bit less dense. It's not simply a "chiplet vs monolithic" comparison.
 
It's not simply a "chiplet vs monolithic" comparison.

It never will be that simple; the other factors will always swamp single- vs. multi-chip at the same power, because those differences are inconsequential.

There is no real case for there being a significant difference between one chip and two under the same heat spreader. 120 W of cooling is still needed whether it's one 120 W die or two 60 W dies.

Furthermore, splitting the die increases the actual power dissipation, because off-chip communication uses more power than on-chip communication.
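
To put rough numbers on that (the pJ-per-bit figures below are generic ballpark values I'm assuming, not anything NVIDIA or AMD has published), moving the same traffic over a die-to-die link instead of on-die wires costs an order of magnitude more power:

[code]
# Back-of-envelope: power cost of moving data on-die vs. between chiplets.
# The pJ/bit figures are illustrative ballpark numbers, not vendor data.

ON_CHIP_PJ_PER_BIT = 0.1   # short on-die wires (assumed)
OFF_CHIP_PJ_PER_BIT = 2.0  # organic-package die-to-die link (assumed)

def link_power_watts(bandwidth_gbytes_s, pj_per_bit):
    """Power spent just moving data at the given bandwidth."""
    bits_per_second = bandwidth_gbytes_s * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12

bw = 500  # GB/s of traffic that would have stayed on-die in a monolithic GPU
print(f"on-chip : {link_power_watts(bw, ON_CHIP_PJ_PER_BIT):.1f} W")   # 0.4 W
print(f"off-chip: {link_power_watts(bw, OFF_CHIP_PJ_PER_BIT):.1f} W")  # 8.0 W
[/code]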
 
Here the OC 10900K is consuming ~70 watts more than the Ryzen 9 and is running a few degrees cooler. Single- vs. multi-chip makes a near-irrelevant difference in cooling.
[Image: Temperature-Testing.png.webp]
There's... a lot of assumptions being made here in order to make a comparison. The first is that there is no way to verify that temperatures are being recorded in the same way, which is highlighted by the Intel CPU's higher power draw on a monolithic die yet lower reported temperature. Temperature, at least as reported by CPUs, is an inaccurate way to compare heat generation and removal between architectures, unfortunately. Then you add in the different packaging styles...

The main point with chiplets is that there's a physical separation of 'heat centers'. The chiplet approach spreads the main energy consumers apart, enlarging the package overall, and thus enlarging the needed heatspreader and the contact area for a cooling solution. Obviously there's now some interconnect power needed where it wasn't before, but when using a silicon interposer this is minimized relative to spreading the dies about a PCB as AMD is currently doing with Ryzen.
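
As a sketch of what I mean (the die areas and power split below are made-up round numbers, not real products), spreading the same total power over more silicon lowers the heat flux each die has to push through its contact area:

[code]
# Illustrative only: same total package power, one big die vs. chiplets.
# Die areas and per-die power are invented round numbers.

def heat_flux(power_w, area_mm2):
    """Average heat flux through the die top, in W/mm^2."""
    return power_w / area_mm2

# Monolithic: all 120 W through one 150 mm^2 die.
print(f"monolithic : {heat_flux(120, 150):.2f} W/mm^2")  # 0.80

# Chiplets: two 75 mm^2 compute dies at 50 W each + a 100 mm^2 I/O die at 20 W.
print(f"compute die: {heat_flux(50, 75):.2f} W/mm^2")    # 0.67
print(f"I/O die    : {heat_flux(20, 100):.2f} W/mm^2")   # 0.20
[/code]

The cooler still has to remove the same 120 W total, but the hottest die is pushing less power per unit of contact area.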
 
Compute = cheap.
Moving data around = EXPENSIVE.

Moving data between chip(let)s is the worst case... the notion that a compute node (involved in rendering frames) would be separate from the main GPU goes not only against NVIDIA's way of designing things, but also against industry knowledge about data transfer.
That's true whether you're inside the GPU die or trying to pipe stuff around a datacenter, yes. I don't really see a means for a separate RT-focused part to be efficiently integrated with an RTX GPU. Nothing stopping them from trying, of course, and I believe that's really all there is to substantiate an argument for such an approach, but I agree that it's just not very likely for interframe rendering. Much more likely (and yet still highly unlikely) would be some form of multi-GPU arrangement where the second die consists of most of the compute blocks found on the main GPU and works in parallel on the same data.
 
The main point with chiplets is that there's a physical separation of 'heat centers'. The chiplet approach spreads the main energy consumers apart, enlarging the package overall, and thus enlarging the needed heatspreader and the contact area for a cooling solution. Obviously there's now some interconnect power needed where it wasn't before, but when using a silicon interposer this is minimized relative to spreading the dies about a PCB as AMD is currently doing with Ryzen.

What evidence do you have that having two dies under the same heat spreader is better than one? Also, AFAIK AMD doesn't use a silicon interposer for Ryzen.

Also, when interposers are used, the chips are typically butted directly against each other, meaning there is no significant separation.

MCM is all about yields, and pretty much nothing else.
 
What evidence do you have that having two dies under the same heat spreader is better than one? Also, AFAIK AMD doesn't use a silicon interposer for Ryzen.

Also, when interposers are used, the chips are typically butted directly against each other, meaning there is no significant separation.

MCM is all about yields, and pretty much nothing else.
Yes, it is about yields, but it's also about combining products across markets, as with Zen using one basic building block from the low end all the way to data centers. It's a kind of Lego approach, but very effective in avoiding multiple different chip designs, fab production lines and so on, which in itself also increases yields.
 
NVIDIA is going chiplet with Hopper (next generation), not Ampere.
Years ago, when Kirk was the main man planning NVIDIA's architecture, I watched a webcast about the G80.
In that talk, an engineer from NVIDIA talked about what consumes power in a chip:
Compute = cheap.
Moving data around = EXPENSIVE.

Moving data between chip(let)s is the worst case... the notion that a compute node (involved in rendering frames) would be separate from the main GPU goes not only against NVIDIA's way of designing things, but also against industry knowledge about data transfer.

It is indeed Silly Season once again *sigh*
WAT???

Can you please explain how Nvidia going chiplet with Hopper (rumor) squares with that going against NVIDIA's way of designing things?

"It is indeed Silly Season once again *sigh*" is fitting.
 
What evidence do you have that having two dies under the same heat spreader is better than one? Also, AFAIK AMD doesn't use a silicon interposer for Ryzen.
It's better if the solution is unworkable without it. In this case, it's just different. However, there is an argument to be made that pushing the power consumers apart allows for a greater contact surface and eases cooling (at the cost of a larger heatspreader). That's what I'm getting at, whether on an interposer or just a tightly coupled package like Zen 2.
Also, when interposers are used, the chips are typically butted directly against each other, meaning there is no significant separation.
Perhaps; if the chiplet arrangement is simply the result of a monolithic design being 'chopped' into smaller dies with interconnects added through the interposer, then there's likely very little tangible benefit and perhaps even an increase in cooling needed.
MCM is all about yields, and pretty much nothing else.
Well, yes. Whether that's what AMD is doing with Zen, where a monolithic version would still be manufacturable, just at greater difficulty and cost, or a product that would simply be too large to manufacture (i.e. impossible, so yields would otherwise be zero), which is where we expect Nvidia to be heading soon.
 
It's better if the solution is unworkable without it. In this case, it's just different. However, there is an argument to be made that pushing the power consumers apart allows for a greater contact surface and eases cooling (at the cost of a larger heatspreader). That's what I'm getting at, whether on an interposer or just a tightly coupled package like Zen 2.

Perhaps; if the chiplet arrangement is simply the result of a monolithic design being 'chopped' into smaller dies with interconnects added through the interposer, then there's likely very little tangible benefit and perhaps even an increase in cooling needed.

Well, yes. Whether that's what AMD is doing with Zen, where a monolithic version would still be manufacturable, just at greater difficulty and cost, or a product that would simply be too large to manufacture (i.e. impossible, so yields would otherwise be zero), which is where we expect Nvidia to be heading soon.
It is hard to compare temperatures across two different nodes, fab processes and chip densities. In other words, it is meaningless; there are too many other factors to consider. If you had the same node and chip design, a big chip vs. two smaller chiplets at the same power envelope, then yeah, that would be good grounds to determine whether a monolithic chip is better thermally than breaking it up into chiplets. In the end it doesn't matter as long as it works as expected; temperature has little bearing as long as it works.

AMD has one chiplet design that can be combined: one design, one test cycle, one round of revisions, and then it goes into desktop, HEDT and data center/server products. Intel has a separate chip for each segment, with all the R&D, testing and revisions that entails, then repeats it for HEDT chips and again for desktop chips, which themselves span multiple monolithic designs. AMD has basically two designs, the APU and the chiplet, and the APUs may be chiplet-based in the future. Then of course there are the Xbox and PS5 designs, which are joint ventures.

Look at it this way: Zen 3 will update everything from the low end all the way to HPC/data centers/servers virtually overnight, while Intel has to do each segment separately. Speed of implementation and the cost to design and produce are all dramatically reduced, including much better yields and lower costs at the fabs, since there is only one design instead of many.
 
Well, yes. Whether that's what AMD is doing with Zen, where a monolithic version would still be manufacturable, just at greater difficulty and cost, or a product that would simply be too large to manufacture (i.e. impossible, so yields would otherwise be zero), which is where we expect Nvidia to be heading soon.

I don't think we are there that soon, and NVidia has shown they are willing to go very big on GPU dies.

GPU MCM also has a lot more negative trade-offs than CPU MCM. We have had years of multi-socket CPUs without much issue, but multi-card, or even multi-GPU-chip-on-one-card, setups are stuck with needing individual, exclusive memory pools for each chip and some kind of CF/SLI software kludge.

It could happen with Hopper as rumored, but the MCM part of that rumor might just be for data center GPUs, where MCM is more viable.

Even if we get real MCM gaming GPUs (shared memory pool, no CF/SLI software), I wouldn't expect a big drop in the price of GPUs.
 
I don't think we are there that soon, and NVidia has shown they are willing to go very big on GPU dies.

This is true. Not sure what the 'limit' is, but if they plan to exceed it, they'll need to break the GPU apart.

GPU MCM also has a lot more negative trade-offs than CPU MCM. We have had years of multi-socket CPUs without much issue, but multi-card, or even multi-GPU-chip-on-one-card, setups are stuck with needing individual, exclusive memory pools for each chip and some kind of CF/SLI software kludge.
These are... incomparable. CPUs are relatively low-bandwidth, latency-sensitive devices, and GPUs are quite the opposite. As seen on modern CPUs, keeping multiple cores going requires fucktons of cache (Imperial fucktons, not the new-age metric ones). GPUs make up for this by not having such random workloads, so latency can be hidden behind bandwidth. CPUs can't do that; see Zen and Zen+. Zen 2 is little more than Zen with enough cache and fewer barriers.
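
A quick way to see the gap is Little's law: the data you have to keep in flight to hide latency is roughly bandwidth × latency. The bandwidth and latency numbers below are round illustrative guesses, not specs for any particular chip:

[code]
# Little's law: bytes that must be in flight = bandwidth * latency.
# Illustrative round numbers; not tied to a specific product.

def bytes_in_flight(bandwidth_gbytes_s, latency_ns):
    return bandwidth_gbytes_s * 1e9 * latency_ns * 1e-9  # bytes

# A CPU core hiding ~80 ns of DRAM latency at ~50 GB/s per core:
print(f"CPU core: {bytes_in_flight(50, 80) / 1024:.1f} KiB outstanding")   # ~3.9 KiB

# A GPU hiding ~400 ns of (possibly cross-die) latency at ~900 GB/s:
print(f"GPU     : {bytes_in_flight(900, 400) / 1024:.1f} KiB outstanding") # ~352 KiB
[/code]

GPUs cover that with thousands of threads in flight instead of giant caches; a CPU core has nowhere near enough outstanding requests, hence all the cache.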

Further, we're not talking about SLI/CF or some other form of linking two complete GPUs. We're talking about splitting GPU work up between dies, in hardware, with software transparency (the driver will probably need to be aware so it can do some data massaging to account for developer laziness/stupidity, as always). Where external multi-GPU solutions could provide a full 2x performance jump with proper optimization through the software stack, a hardware split could just run the way a monolithic die with the same resources would, without developer tuning, for most common use cases (which games are).

It could happen with Hopper as rumored, but the MCM part of that rumor might just be for data center GPUs, where MCM is more viable.
Cost is a pretty big factor here, so you're probably right. Nvidia will likely target compute SKUs first just to hide any BOM increases due to production problems behind the higher MSRPs that compute SKUs command for the first go around.

Even if we get real MCM gaming GPUs (shared memory pool, no CF/SLI software), I wouldn't expect a big drop in the price of GPUs.
Oh no, I don't expect pricing to go down. I expect it to go up!

But I also expect the performance ceiling to go up too.
 
These are... incomparable. CPUs are relatively low-bandwidth, latency-sensitive devices, and GPUs are quite the opposite. As seen on modern CPUs, keeping multiple cores going requires fucktons of cache (Imperial fucktons, not the new-age metric ones). GPUs make up for this by not having such random workloads, so latency can be hidden behind bandwidth. CPUs can't do that; see Zen and Zen+. Zen 2 is little more than Zen with enough cache and fewer barriers.

Further, we're not talking about SLI/CF or some other form of linking two complete GPUs. We're talking about splitting GPU work up between dies, in hardware, with software transparency (the driver will probably need to be aware so it can do some data massaging to account for developer laziness/stupidity, as always). Where external multi-GPU solutions could provide a full 2x performance jump with proper optimization through the software stack, a hardware split could just run the way a monolithic die with the same resources would, without developer tuning, for most common use cases (which games are).

Workload determines latency sensitivity. In the NVidia MCM GPU paper, they tested a variety of compute workloads and saw varying impacts from latency; I expect greater latency sensitivity from gaming workloads. I also expect ray tracing will increase latency sensitivity, since rays can bounce anywhere in the scene, so you need fast access to everything in the scene (you can see RT drastically increases memory usage), not just a little local slice.

Cost is a pretty big factor here, so you're probably right. Nvidia will likely target compute SKUs first just to hide any BOM increases due to production problems behind the higher MSRPs that compute SKUs command for the first go around.

At the very least, data center will be first. Data center Ampere is monolithic, so IMO there is zero chance gaming Ampere will be MCM.

If data center Hopper is monolithic, then the same applies. But even if data center Hopper is MCM, there is still no guarantee that gaming Hopper will be.

Data center work won't be as latency-sensitive. In general this is work that is easily split across multiple cards without issue, and it also isn't real-time, so again it's the kind of load that won't suffer from a bit more MCM latency.
 
Another factor is TSMC's rapid success not only with 7nm but with 5nm and, it looks like, even smaller nodes, which was not predictable several years back when Nvidia and AMD were seriously looking at MCM options. Still, price constraints may yet make MCM win out.
 
The KkatCorgi Twitter account, source of much of the Ampere rumors, has a minor update again:
https://twitter.com/KkatCorgi/status/1273889616282521603

2nd Gen NVIDIA TITAN
GA102-400-A1 5376 24GB 17Gbps

GeForce RTX 3090
GA102-300-A1 5248 12GB 21Gbps

GeForce RTX 3080
GA102-200-Kx-A1 4352 10GB 19Gbps

It makes more sense that the 24GB card is a Titan, but at such a huge drop in memory speed, it would likely perform worse.

But I am very skeptical of the 21 Gbps memory on the 3090.
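
For reference, here's what those rumored speeds would mean for raw bandwidth. The bus widths are my assumption based on the capacities (320-bit for 10 GB, 384-bit for 12/24 GB); they're not part of the leak:

[code]
# Bandwidth (GB/s) = bus width (bits) * data rate (Gbps) / 8.
# Bus widths below are assumptions inferred from the rumored capacities,
# not figures from the KkatCorgi leak.

def bandwidth_gbytes_s(bus_bits, gbps):
    return bus_bits * gbps / 8

cards = {
    "Titan (24GB, 17 Gbps)": (384, 17),
    "3090  (12GB, 21 Gbps)": (384, 21),
    "3080  (10GB, 19 Gbps)": (320, 19),
}
for name, (bus, rate) in cards.items():
    print(f"{name}: {bandwidth_gbytes_s(bus, rate):.0f} GB/s")  # 816, 1008, 760
[/code]

Which would leave a 17 Gbps Titan well behind a 21 Gbps 3090 in bandwidth, hence the "would likely perform worse" point above.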
 
But I am very skeptical of the 21 Gbps memory on the 3090.

I suppose there's nothing stopping Nvidia from using non-standard memory speeds but at what cost? Surely if you could run GDDR6 at 20Gbps+ with reasonable power consumption and yields we would have heard something from JEDEC and the memory manufacturers by now. I would keep this in the nonsense rumor bucket.
 
I suppose there's nothing stopping Nvidia from using non-standard memory speeds but at what cost? Surely if you could run GDDR6 at 20Gbps+ with reasonable power consumption and yields we would have heard something from JEDEC and the memory manufacturers by now. I would keep this in the nonsense rumor bucket.

AFAIK, JEDEC only specifies up to 16 Gbps, and only Samsung has announced anything faster at 18 Gbps. NVidia OCing to 21 Gbps seems like quite a stretch. I don't think they have a history of doing this. They need their products to be reliable.
 
Well, if the rumors about the GA102 3080 are true, this could be a $699 3080 that's 30% faster than the 2080 Ti, and you could have a $999 3090 that's 45% faster. Maybe with $799/$1199 launch prices again for the first year, and a full uncut Titan for $2499 with 5376 CUDA cores.

This would more or less give you the same gap as 2080 Super vs 2080 Ti.
 
According to that article, 2080Ti Kingpin is 33% faster than 2080Ti FE.
What? Am I reading that wrong? That's like, one generation leap.
 

48% faster with 28% more power draw compared to a 2080 Ti.

With a 29% higher boost clock (1700 -> 2200?) and 24% more cores, it should be ~60% faster with a 0% increase in IPC.....
With a 10% higher boost clock (2000 -> 2200) and 24% more cores, it'd be ~36% faster; with an 8% increase in IPC it matches the numbers on the chart.

Not out of the realm of possibility that they have an 8% IPC increase, slightly higher clocks, and more cores...
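
Spelling out that arithmetic (same speculative clocks, core counts and IPC as above, nothing official):

[code]
# Expected uplift ~= clock ratio * core-count ratio * IPC ratio.
# All inputs are the speculative figures from the post above.

def uplift(clock_ratio, core_ratio, ipc_ratio=1.0):
    return clock_ratio * core_ratio * ipc_ratio - 1.0

print(f"{uplift(2200/1700, 1.24):.0%}")        # ~60%: 1700 -> 2200 MHz, +24% cores, no IPC gain
print(f"{uplift(2200/2000, 1.24):.0%}")        # ~36%: 2000 -> 2200 MHz, +24% cores, no IPC gain
print(f"{uplift(2200/2000, 1.24, 1.08):.0%}")  # ~47%: add an 8% IPC gain, close to the 48% on the chart
[/code]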
 
627 mm² is the new rumor for GA102. With 25% less density on Samsung 8nm vs TSMC 7nm, it would be about 501 mm² on 7nm. Big Navi will be 505 mm² on TSMC 7nm. So it sounds like both next-gen flagships will come in at about the same size. Reminds me of the Vega 64 vs 980 Ti days.
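
For what it's worth, the ~501 mm² figure falls out if you read that 25% as TSMC 7nm being about 25% denser than Samsung 8nm (my reading; the rumor doesn't state the direction of the comparison):

[code]
# Scaling a die area between nodes by relative transistor density.
# The 1.25x ratio is the rumored "25%" read as 7nm being 25% denser than 8nm.

def scaled_area(area_mm2, density_ratio):
    """Area the same design would occupy on a node that is density_ratio times denser."""
    return area_mm2 / density_ratio

ga102_8nm = 627  # rumored GA102 size on Samsung 8nm, mm^2
print(f"{scaled_area(ga102_8nm, 1.25):.1f} mm^2 equivalent on TSMC 7nm")  # 501.6, matching the post
[/code]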
 
This guy seems to enjoy it ;)



Kinda off topic but this guy was great and my son was always excited to show me vids of some of the cool projects he would do. It's too bad the guy died unexpectedly last year.
 
627 mm² is the new rumor for GA102. With 25% less density on Samsung 8nm vs TSMC 7nm, it would be about 501 mm² on 7nm. Big Navi will be 505 mm² on TSMC 7nm. So it sounds like both next-gen flagships will come in at about the same size. Reminds me of the Vega 64 vs 980 Ti days.
Is this a new rumor I haven't heard about? We had actual industry insiders back in May say that low-end products will be using 8LPP or 8LPU while mid- to high-end products will be using N7P.
 