Chiplets Are the Future, but They Won’t Replace Moore's Law

At its recent Next Horizon event, AMD’s EPYC demonstration proved the company has fully embraced the chiplet concept, unveiling a processor with eight 7nm chiplets surrounding a single 14nm die for I/O and logic. While this is a sensible way forward for AMD, especially in terms of yield and cost, ExtremeTech argues it isn’t the breakthrough many say it is and calls it a step backward, as recent advancements in computing technology have stemmed from the integration of functionality, not segregation.

We’ve reached the point where it simply makes little sense to continue scaling down wire sizes. Every time we perform a node shrink, the number of designs that benefit from the shift is smaller. AMD’s argument with Rome is that it can get better performance scaling and reduce performance variability by moving all of its I/O to this single block, and that may well be true, but it’s also an acknowledgment that the old trend of assuming that process node shrinks were just unilaterally good for the industry is well and truly at an end.
 
Stacking was supposed to be the answer, but CPUs run too hot to apply it to them. Intel will probably copy AMD anyway, and both will be doing chiplets.
 
Sure, if you're harking back to the 1990s for trends, lol. However negatively charged the terms "reverse" and "segregation" sound, that is exactly what we need. While it's true that segregating responsibility by process scale runs counter to recent trends, it's also a step in the right direction, especially as more and more memory gets integrated directly on die. Memory-centric architecture has been a pipe dream for too long, and we're finally starting to see movement in that direction.
 
Moore's Law meets Rock's Law and no one can afford better lawyers than TSMC.
 
I read the ExtremeTech article, which seems both circular and gaseous. "Node scaling isn't working like it used to, so companies are doing things like chiplets, which can't fix scaling, which will continue to not work like it used to."

There - saved you all the trouble. I can't see why it got any links.
 
Tech is circular. As new innovations are made, sometimes an old way of doing things becomes great again. Progress is progress no matter how it's done.

I like to think of it as looking at the old way with fresh eyes, noticing something we hadn't seen the first time around, finding out it's actually useful, and ending up with a better implementation.
 
This is also the writing on the wall for the end of x86-64.
The IPC gains from die shrinks are so minimal anymore (not to mention the extreme cost and difficulty of doing them) that the only true performance gains we are seeing at this point are more cores, better SMP scaling, and software optimization.
Hardware-level and API-level offloads to the GPU and other ASICs are great, but even that is nearing its zenith for optimization and functionality for general-purpose computing.

A paradigm shift is upon us; whether it takes the form of a new CPU ISA, offloading to data and processing centers, or some other unforeseen variable will become clear within the next decade, perhaps even within the next few years.
 
It's a decent article; I like these guys and have read them since they were a text page.

But the article has been dumbed down to make it understandable to more people, and it doesn't really explain anything.

To fully explain the issues would take a different audience. :)

So, here we go. :D


Simply put, the smaller a transistor gets, the more it leaks current.

Voltage effects:
The gate bias that turns the transistor off has to decrease, or electrons tunnel straight through the gate oxide; it's voltage/thickness dependent. 100% of this is excess heat and current; it's not useful.

A lower voltage isn't as effective at turning them off, so leakage through the channel increases. (A partially off transistor is in the linear conduction mode; great for an audio amp, bad for a switch.)
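If you want a feel for how steep that channel-leakage curve is, here's a minimal sketch using the textbook subthreshold relationship (leakage grows by roughly a decade of current for every subthreshold swing's worth of threshold-voltage reduction). All the numbers are invented for illustration, not taken from any real process:

```python
# Illustrative subthreshold leakage: I_leak ~ I0 * 10^(-Vth / S),
# where S is the subthreshold swing (~60 mV/decade ideal at room temp,
# more like 70-100 mV/decade in practice). Numbers are invented.
I0 = 1e-6      # arbitrary reference current per device, amps
S = 0.080      # subthreshold swing, volts per decade

for vth in (0.45, 0.35, 0.25):          # shrinking threshold voltage
    i_leak = I0 * 10 ** (-vth / S)
    print(f"Vth = {vth:.2f} V -> off-state leakage ~ {i_leak:.2e} A per device")
# Every ~0.1 V shaved off the threshold buys more than a 10x increase in
# leakage, multiplied across billions of transistors.
```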

Capacitance:
The gate overlap has to be smaller, or the capacitance is bigger, so the operating frequency is slower. (This isn't too bad to skim; you don't want to see the final exam: http://bwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_f10/Lectures/Lecture11-MOS_Cap_Delay-6up.pdf)

The faster the switching frequency, the more current goes into charging/discharging the gate capacitance. It's directly proportional: dynamic power ≈ activity factor × capacitance × voltage² × frequency.
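To put rough numbers on that, here's a minimal sketch of the usual back-of-the-envelope form P ≈ α·C·V²·f; the values are invented for illustration, not measurements of any real chip:

```python
# Dynamic switching power: P = alpha * C * V^2 * f
# (activity factor, switched capacitance, supply voltage, clock frequency).
# The values below are invented for illustration.
def dynamic_power_watts(alpha, c_farads, v_volts, f_hz):
    return alpha * c_farads * v_volts ** 2 * f_hz

C = 20e-9      # 20 nF of total switched capacitance (made up)
for v, f in [(1.00, 4.0e9), (1.10, 4.5e9), (1.25, 5.0e9)]:
    p = dynamic_power_watts(alpha=0.2, c_farads=C, v_volts=v, f_hz=f)
    print(f"{f/1e9:.1f} GHz at {v:.2f} V -> ~{p:.0f} W of switching power")
# Frequency alone is linear; the voltage bump that usually comes with a
# higher clock is what really hurts, since V enters squared.
```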

Heat:
Everything is closer together, so getting the heat out of the chip is harder, and raising the temperature increases the rate of both tunneling and leakage. (Heat is lattice vibration, so energetic electrons are easier to knock loose; push it far enough and ionization vaporizes the part.)

Heat also makes the gates grow if they're metal; polysilicon gates help with that, but they're less conductive than aluminum or copper, so that's another set of issues. (Electrons take more time to diffuse across the gate, and gate turn-off is electrostatic.)


Those effects combined are why, when we approach a maximum overclock, heat and current draw start climbing exponentially; that's the wall we're seeing right now.

When the graph starts to go vertical, you're done. Usually we stop well before that; you see the wall just before it dies. :)


Smaller chiplets can be more thoroughly designed to "Do what they do best"; no single architecture does everything well.

VRM transistors are not at all like logic transistors; they want area and maximum thermal transfer, as do pin drivers and the like.

A 300A driver transistor based on 7nm tech would be stupid, and someone would be fired for suggesting such a shrink.

Driving an output pin is a bitch, and if it's not a perfect transmission line, it's very iffy. No PCB trace is ever perfect. These losses are proportional to capacitance, as above.

That's why we see quad clocked serial transmission; parallel address/data lines died 20 years ago.
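As a rough illustration of why "not a perfect transmission line" matters: any impedance mismatch along the trace reflects part of the signal edge back at the driver, per the standard reflection coefficient Γ = (Z_load − Z0)/(Z_load + Z0). The example impedances below are invented:

```python
# Reflection at an impedance mismatch on a transmission line:
#   gamma = (Z_load - Z0) / (Z_load + Z0)
# A perfectly matched load reflects nothing; every mismatch bounces part
# of the edge back toward the driver. Example values are invented.
def reflection_coefficient(z0_ohms, z_load_ohms):
    return (z_load_ohms - z0_ohms) / (z_load_ohms + z0_ohms)

Z0 = 50.0                                 # nominal trace impedance, ohms
for z_load in (50.0, 60.0, 75.0, 100.0):
    gamma = reflection_coefficient(Z0, z_load)
    print(f"Z_load = {z_load:5.1f} ohm -> {gamma * 100:4.1f}% of the edge reflected")
# At multi-GHz serial rates, even a small reflection can corrupt a bit
# if it arrives back at the receiver at the wrong time.
```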

Putting an Oscope on a data line will crash any modern computer instantly. :)


Logic transistors want to be as close as possible; the same gate structure can switch several transistors, and designs are interleaved to an extreme degree.

Memory cells are huge in comparison, but they need to be low leakage, or you get memory errors.

But cache RAM is a must if your internal and external buses run at different speeds.

The biggest delay in a modern processor is a cache miss and pipeline flush; it can cost hundreds of cycles. More cache means more hits and fewer misses.
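A minimal sketch of why cache size matters so much, using the standard average-memory-access-time relation (AMAT = hit time + miss rate × miss penalty); the latencies are invented round numbers:

```python
# Average memory access time: AMAT = hit_time + miss_rate * miss_penalty.
# Latencies below are invented round numbers, not measurements.
def amat_ns(hit_time_ns, miss_rate, miss_penalty_ns):
    return hit_time_ns + miss_rate * miss_penalty_ns

for miss_rate in (0.10, 0.05, 0.02):      # bigger cache -> fewer misses
    t = amat_ns(hit_time_ns=4.0, miss_rate=miss_rate, miss_penalty_ns=100.0)
    print(f"miss rate {miss_rate:4.0%} -> average access ~{t:.0f} ns")
# Cutting the miss rate in half does more for average latency than any
# realistic clock bump, which is why dies keep spending area on cache.
```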

This is where branch prediction and speculative execution come in, along with Spectre, and Meltdown, and...



That's why we have multiple cores; doing Anything Else means Something Happens while you fetch from main memory.

Otherwise, you have the single thread Prescott thing, where you just wait. (long pipelines + heat killed it as a product)



I took the points they were making, and tried to explain them a little better, but I'd bet I just pissed off everyone here.

Sorry; it's what I do. :)
 
.....

I took the points they were making, and tried to explain them a little better, but I'd bet I just pissed off everyone here.

Sorry; it's what I do. :)

Thanks for the better explanation. Enjoyed it. And don't be sorry. Pissing off everyone on [H] gets you instant [H] points redeemable at the Microsoft Store. In 2028. :D
 
To me it's pretty simple:
* Node shrinks are usually used to improve the performance of a core; higher clock speed and lower power consumption are the assumed benefits. That is now at a stage where the returns are diminishing.
* (Further) node shrinks are even less beneficial to the other parts of a CPU.
* Chiplets are the clear way forward for adding more cores to a CPU (instead of having them all integrated on the same chip).

YouTuber AdoredTV has a two-part series where he investigates and explains what chiplet design is and how/why/when it's beneficial.

Tech is circular. ...
True! Just look at racing bicycle frames: they started out made of steel tubing, went through aluminium and carbon fiber, and are only now coming back to steel (of higher quality, allowing it to be made lighter than CF).
 
We learned that in the '70s on bicycle frames; a buddy ruined a Mongoose frame to the point where it went flexible. :)

Steel doesn't accumulate defects like aluminum or CF; if you anneal a steel frame after it's welded, and temper it properly, it can last forever.

We hit the wall with Si at about the 22nm level; it's been diminishing returns ever since.

But making a 2"x2" chip is not really possible; there would be too many defects in a chip that big.

Clean rooms are amazing, but the defect density per area of silicon hasn't changed much since the 2000s, and layer counts have exploded.

It only takes 1 defect on any layer to ruin the whole chip...
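Here's a minimal sketch of that yield math, using the simple Poisson model Y = exp(−D·A) with an invented defect density. It shows why one huge die burns far more wafer area per working part than eight small chiplets that can be tested and binned individually (packaging cost and the separate I/O die are ignored):

```python
import math

# Poisson yield model: Y = exp(-D * A), with D = defects per cm^2 and
# A = die area in cm^2. Defect density and die sizes are invented.
def die_yield(defects_per_cm2, die_area_cm2):
    return math.exp(-defects_per_cm2 * die_area_cm2)

D = 0.2                         # invented defect density, defects/cm^2
big_die = 7.0                   # one large monolithic die, cm^2
chiplet = big_die / 8           # eight small chiplets instead

y_big = die_yield(D, big_die)
y_small = die_yield(D, chiplet)

# Wafer area consumed per working product: bad chiplets are discarded
# individually, so a defect costs one small die, not the whole 7 cm^2.
area_per_good_monolith = big_die / y_big
area_per_good_chiplet_set = 8 * (chiplet / y_small)

print(f"monolithic: {y_big:.0%} yield, ~{area_per_good_monolith:.1f} cm^2 of wafer per good part")
print(f"chiplets:   {y_small:.0%} yield, ~{area_per_good_chiplet_set:.1f} cm^2 of wafer per good part")
```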
 
This is also the writing on the wall for the end of x86-64.
The IPC gains from die shrinks are so minimal anymore (not to mention the extreme cost and difficulty of doing them) that the only true performance gains we are seeing at this point are more cores, better SMP scaling, and software optimization.
Hardware-level and API-level offloads to the GPU and other ASICs are great, but even that is nearing its zenith for optimization and functionality for general-purpose computing.

A paradigm shift is upon us; whether it takes the form of a new CPU ISA, offloading to data and processing centers, or some other unforeseen variable will become clear within the next decade, perhaps even within the next few years.

I'm not an expert by any measure, but I'm pretty sure that most non-gaming software can profit heavily from just throwing more cores at it.

The other thing is, when it comes to gaming, how much more single-threaded performance do we need?
When will GPUs start to level out?
Yes, you might see some issues if you're rocking a 200+ Hz display driven by an RTX 2080 Ti while playing Diablo 2 at 480p.
I'm just not seeing the real-world performance issues yet; a Ryzen 2700 / Core i9-9900K has more than enough juice to drive ~144 FPS at low settings, with some room to spare for a potential 3080/4080.

Maybe we'll even start seeing some improvements to multithreading in games/engines, and IPC and frequency will start to matter less than they do today.
 
I'm not an expert by any measure, but I'm pretty sure that most non-gaming software can profit heavily from just throwing more cores at it.
As long as the software isn't single-threaded and is written to utilize more than 4 cores/threads, then yes, adding more cores will net a performance gain.
However, if only 4 cores/threads are used, or if the application is indeed single-threaded, as a majority still are, only IPC or clock frequency increases will improve performance - both of which are nearly tapped out at this point.
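That intuition is basically Amdahl's law. A minimal sketch with illustrative parallel fractions and core counts:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction
# of the work that can run in parallel and n is the core count.
def amdahl_speedup(parallel_fraction, cores):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

core_counts = (4, 8, 16, 64)
for p in (0.50, 0.90, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in core_counts)
    print(f"{p:.0%} parallel -> {row}")
# A half-serial workload tops out below 2x no matter how many cores you
# throw at it; only code that is almost entirely parallel keeps scaling.
```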

Where more cores (threads) really help is with video/audio rendering software and HPC - these are not really "general purpose" applications, though.
As far as games go, from what I have seen at least, ~8 threads still seem to be the upper limit on most titles, not counting RTS games.

Hopefully this is enough and will be enough for the next decade, at least.
 
Chiplets remind me of when they moved from CPU sockets to slots with the Pentium II's Slot 1 and the Athlon's Slot A.
Back then it was to add external cache, and by the next generation everything was back to a CPU socket.
 