TSMC Reveals Wafer-on-Wafer Chip Stacking Technology

DooKey

TSMC is showing off its new Wafer-on-Wafer (WoW) chip stacking technology, and it might be a boon for multichip solutions in the future. The technology lets two dies sit on top of each other, which keeps the interconnects very short and minimizes transfer times between them. Imagine the new Intel and AMD EMIB chip utilizing this technique and you can imagine how much space it would save. I think it would be great for multi-GPU processors as well, since the boards wouldn't have to put the chips side by side and then run traces all over the place. Hopefully this will make it to market in the near future and work with high-performance processors. The future looks bright.

Notice that this new tech is called Wafer-on-Wafer and not die-on-die. This technique stacks silicon while it is still within its original wafer, offering advantages and disadvantages.
The advantage here is that this tech can connect two wafers of dies at once. Imagine an alternative method where we connect individual dies in the same way: it would offer a lot less parallelisation within the manufacturing process and the possibility of higher end costs.
 
Wouldn't you end up with a lot more heat and a lot less surface area with which to cool? I might be understanding this wrong.
 
Even with a cooling system, the die size will still be limited by the amount of thermal energy that can be dissipated by any given cooling solution small enough to work through the die itself. Reducing the amount of thermal energy to be dissipated is the best bet for getting the die size larger.

Then again, a thermal dissipation layer between the dies could be a good solution. If you could achieve the thermal transfer rates of sodium, then it might allow for larger die sizes to be sandwiched together.

Neat tech though. Very neat.
 
They need to stack the wafers in a Big Little Core approach. You could basically have lower voltage/clock speed wafers stacked vertically downward that produce less heat between each wafer. Just stack the fastest and hottest wafers on top and the slowest and coolest ones on the bottom. You might also incorporate vapor chamber/heat piping between the wafers that comes in contact with the IHS. The hardest problem to get around will be the heat dissipation, but if the cores are clocked much slower and run at lower voltages it's probably a non-issue. Think of stacking a low-clocked Atom under an i3/i5/i7, for example: the voltage and clock speeds are lower and thus the heat output is minimized a lot, but you get some extra performance to help with background tasks and can turn off your more power-hungry main CPU more routinely while still having it clocked to the ceiling. They could also use clock modulation to reduce the heat dissipation impact of the lower stacked wafers. I think V-NAND was mentioned, but I believe MOSFETs are already kind of doing this as well.
 

You could also heatsink the backside through a mount hole in the board. That would push pin placement out to the edges, making the package bigger, but the overall die size would remain the same.
 
The cooling solution is a super easy fix: the return of the slot over the socket. Problem solved. Soldered-on heat spreader on either side with a fan on the hottest side... or a simple water block on both sides. Problem solved, no need for complicated bottom-of-the-board cooling, or even super strict "bottom.TOP" (big.LITTLE) designs.
 

PCBs aren't exactly good thermal conductors. You'll have to have direct contact, and that contact has to be exposed to the air. That requires a backside hole.
 
There is no "easy" cooling solution as the PCB is more insulator than thermal conductor.
 
Interesting, but fairly limited in application, as you need a high yield (90+% from the article) on both wafers and at least the bottom die has to run cool. Betting the lower die could be memory of some sort and the top an SoC. You're still going to have lower thermal performance for the top die vs. other multi-chip module tech.
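To put a number on why the yield requirement bites: with wafer-on-wafer bonding you can't pick known-good dies, so a stacked pair only works if both dies in it are good. A rough sketch, assuming defects on the two wafers are independent (my assumption, not something from the article; `stacked_yield` is just an illustrative name):

```python
def stacked_yield(top_yield: float, bottom_yield: float) -> float:
    """Fraction of stacked die pairs where both dies are good.

    Assumes defects on the two wafers are independent, so the
    probabilities simply multiply.
    """
    for y in (top_yield, bottom_yield):
        if not 0.0 <= y <= 1.0:
            raise ValueError("yield must be between 0 and 1")
    return top_yield * bottom_yield


if __name__ == "__main__":
    # Same per-wafer yield on both layers, for a few illustrative values.
    for y in (0.95, 0.90, 0.80, 0.70):
        print(f"per-wafer yield {y:.2f} -> stacked yield {stacked_yield(y, y):.4f}")
```

So two 90% wafers already put you around 81% good stacks, and it drops off quickly from there.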
 
I don't know if this is going to be as much of a problem as people think it is. I seem to recall this same conversation came up with the die shrink from Sandy Bridge to Ivy Bridge. You'd run into issues if you had the same overall TDP in a smaller die package, because there is less surface area to dissipate the heat. So in that example you need to make sure that the heat transfer from the die to the IHS is very good or it may become a bottleneck. However, just because you stack more cores doesn't mean it will have a heat issue. If those cores themselves are more efficient, then they generate less heat. When you work this out as TDP vs die area, the actual thermal density remains similar. Case in point would be GPU dies. They seem to target around the same total die area with around the same TDP. They might have double the number of cores, but the actual thermal density doesn't really change. Obviously you can't stack two 300W TDP dies on top of each other, because you will end up with 600W TDP within the same surface area, but if you stacked two 150W TDP dies of the same die size as a 300W design, the amount of dissipation required is basically the same.

The thing to keep in mind is that if you push in height, that doesn't mean that you can just crank up the power. If you go up in height you just need to keep the heat transfer per unit of surface area the same. So if you had 1 mm² that was designed to dissipate 5W effectively, you could have 1mm of thickness that generates 5W per mm³, or you could have 2mm of thickness that only generates 2.5W per mm³. Obviously at some point the dies themselves will have issues with heat transfer, but in reality when you figure out the amount of heat that needs to flow through the silicon it's basically the same by the time you get to the surface.
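Here's that arithmetic as a quick sketch. It only looks at the heat that has to leave through the top surface and ignores thermal resistance inside the stack. The function names and the 600 mm² die area are made up for illustration; the wattages are the ones from the examples above.

```python
def surface_heat_flux(power_density_w_per_mm3: float, thickness_mm: float) -> float:
    """Heat leaving through each mm^2 of surface (W/mm^2) for a slab that
    generates heat uniformly through its thickness."""
    return power_density_w_per_mm3 * thickness_mm


def stack_heat_flux(die_powers_w, footprint_mm2: float) -> float:
    """Heat flux (W/mm^2) through the footprint of a stack of dies that
    all share the same footprint area."""
    return sum(die_powers_w) / footprint_mm2


if __name__ == "__main__":
    # 1 mm thick at 5 W/mm^3 vs 2 mm thick at 2.5 W/mm^3: same surface flux.
    print(surface_heat_flux(5.0, 1.0), surface_heat_flux(2.5, 2.0))

    area = 600.0  # hypothetical die area in mm^2, not a real product figure
    print("one 300W die:  ", stack_heat_flux([300.0], area))
    print("two 150W dies: ", stack_heat_flux([150.0, 150.0], area))
    print("two 300W dies: ", stack_heat_flux([300.0, 300.0], area))
```

Two 150W dies stacked ask the cooler for the same W/mm² as one 300W die; two 300W dies double it.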

I know everyone is probably thinking that they would use this to just stack 2 of a current design on top of each other, but I don't think that's what this will do at all. Without running a bunch of numbers I don't know what the actual wattage vs surface area is on CPU cores. They might be less dense than GPUs, so this allows them to make use of that extra dissipation potential that exists. It might be that core density is actually going up slower than power consumption is going down, so they need to keep making larger dies to get the logic they need, but the actual power consumption is still the same or lower even with a bigger die. Obviously the bigger the die the lower the yields and the more costly it is to produce. If stacking allows yields to go up and keep the power consumption the same for the same die surface area, it would make sense why you'd want to do that.
 
I am thinking this is not a good way to go yet for high power devices, and thinking more of phones and tablets, where space is an issue and heat is already dissipated passively. If you were to take the Apple A9 chips that TSMC already produces and look at their layout, you can easily see how they could in theory shrink the overall die area by taking out the GPU core pairs and placing them on vertical layers instead. Doing this, they could easily double or triple their core pairs while also shrinking the area the die covers. With Apple going on about how they are going to bring 8K video to their tablets and phones in the next 3 years, I am thinking this is how they are going to do it.
 
At 7nm and lower nodes the difficulty of this is certainly reduced much further, since the requirements for heat dissipation are lowered along with the node. Big Little Core design for wafer stacking I think has its advantages over a more balanced stacking. Take 60W + 60W vs 90W + 30W: the 30W chip can run at a lower clock and voltage and save power when needed by turning off the 90W chip. That's a 30W reduction advantage over a load-balanced wafer stack, which is a good thing if the task doesn't require much power. Actually, just the nature of having twice as many transistors due to the stacking could lower power and heat a lot. Just don't expect to double the TDP from it; that would require some way of dealing with extra heat dissipation between wafer stacks. I guess the real benefit is that it largely negates the need for cherry-picked CPU binning, much like AMD's approach does. In fact you could combine the two approaches and basically get much closer to Epyc-level performance at more like a ThreadRipper cost point, and ThreadRipper performance at the cost of Ryzen, so that is a big deal. Basically, if they take a CPU with multiple cores and cut the cores in half while retaining about the same transistor count, it should actually have lower heat dissipation and power.
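The 60W + 60W vs 90W + 30W comparison, spelled out as a sketch. It assumes a powered-off die draws nothing and that the light task fits on the smallest die in the stack; both are simplifications, and `light_load_power` is just an illustrative name.

```python
def light_load_power(stack_w):
    """Power drawn under a light load when every die except the
    lowest-power one is switched off (off dies assumed to draw 0 W)."""
    return min(stack_w)


balanced = (60, 60)    # load-balanced stack, watts per die
asymmetric = (90, 30)  # Big Little style stack, watts per die

if __name__ == "__main__":
    # Both stacks have the same full-load budget.
    print("full load: ", sum(balanced), "W vs", sum(asymmetric), "W")
    # Under light load only the smallest die stays on.
    print("light load:", light_load_power(balanced), "W vs",
          light_load_power(asymmetric), "W")
    print("saving:    ",
          light_load_power(balanced) - light_load_power(asymmetric), "W")
```

Same 120W ceiling either way, but the asymmetric stack idles on the 30W die instead of a 60W one.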
 
IBM seems to have some ideas on this;
https://patents.google.com/patent/US9748218

Alcatel-Lucent also has a solution;
https://patents.google.com/patent/US20150084182

Hitachi has a patent that could be adapted... or perhaps provide enough additional cooling to work out just fine;
https://encrypted.google.com/patents/US5345107

And of course perhaps the most simple solution of all... just do a better job of removing heat in general... lots of ideas around there.
AMD holds this one for an on chip Peltier;
https://patents.google.com/patent/US6800933B1

At the end of the day however if no simple inexpensive mechanical fix can be found...
There is always the supercomputer / military chip cooling option;
http://multimedia.3m.com/mws/media/65495O/3m-heat-transfer-fluids-brochure.pdf
Perhaps it's time for fans to go away anyway. :) Really some interesting work being done with stuff like 3M Fluorinert. Bit of trivia: it's the same stuff that makes the real-life breathing liquid seen in the movie The Abyss. The humans didn't use it but the rats in the movie really did... and no, they didn't die. lol
 
That was the best scene of the movie The Abyss. I think with the heat dissipation they'll just double up on the transistor count density by splitting the overall chip complexity (cores and cache) in half, so each wafer will have the same transistor count. It would be much like an E8400 + E8400 rather than a Q9550, except with the transistor count of a Q9550 per wafer. That in turn should lower TDP and increase efficiency, and it will probably be easier to fabricate than a larger, wider and more complex multi-core chip, plus require less binning. Even if they don't double the transistor count, if they add as little as 20% more per wafer it'll be quicker and more efficient overall and probably have fewer defects in the end, presumably at least.

AMD is already essentially taking this same approach, but is growing chips horizontally rather than vertically, which is the simpler of the two approaches and has the benefit of a bigger and wider heat sink to help deal with the increased TDP. The vertical stacking is more complex, because the practical way to increase performance without increasing TDP and running into cooling complexities is to increase transistor count and reduce the heat output through increased energy efficiency. I wonder if AMD will use this on its future MB chipsets, as I believe they are fabricated at TSMC.
 