AI GPU production to be constrained till end of 2024 due to TSMC packaging capacity crunch


Sep 28, 2018
Shortages of a key chip packaging technology are constraining the supply of some processors, Taiwan Semiconductor Manufacturing Co. Ltd. chair Mark Liu has revealed.

Liu made the remarks during a Wednesday interview with Nikkei Asia on the sidelines of SEMICON Taiwan, a chip industry event. The executive said that the supply shortage will likely take 18 months to resolve.

Historically, processors were implemented as a single piece of silicon. Today, many of the most advanced chips on the market comprise not one but multiple semiconductor dies that are manufactured separately and linked together later. One of the technologies most commonly used to link dies together is known as CoWoS.

TSMC reportedly intends to expand its CoWoS capacity from 8,000 wafers per month today to 11,000 wafers per month by the end of the year, and then to around 20,000 by the end of 2024.

TSMC currently has the capacity to process roughly 8,000 CoWoS wafers every month. Between them, Nvidia and AMD utilize about 70% to 80% of this capacity, making them the dominant users of this technology. Broadcom is the third largest user, accounting for about 10% of the available CoWoS wafer processing capacity. The remaining capacity is distributed among 20 other fabless chip designers.
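
To put those shares into wafer counts, here is a rough sketch using only the figures quoted above. The even split among the 20 smaller customers is my own assumption for illustration (the article gives no per-customer breakdown), and the function and variable names are made up.

```python
# Back-of-the-envelope split of TSMC's CoWoS capacity from the quoted figures.
# Integer percentages keep the arithmetic exact.

MONTHLY_CAPACITY = 8_000   # CoWoS wafers per month (quoted)
BROADCOM_PCT = 10          # "about 10%" (quoted)
OTHER_DESIGNERS = 20       # "20 other fabless chip designers" (quoted)


def allocation(nvidia_amd_pct: int) -> dict:
    """Split monthly capacity for a given combined Nvidia + AMD share."""
    rest_pct = 100 - nvidia_amd_pct - BROADCOM_PCT
    rest_wafers = MONTHLY_CAPACITY * rest_pct // 100
    return {
        "nvidia_amd": MONTHLY_CAPACITY * nvidia_amd_pct // 100,
        "broadcom": MONTHLY_CAPACITY * BROADCOM_PCT // 100,
        "others_total": rest_wafers,
        # ASSUMPTION: an even split, which the article does not claim.
        "others_each": rest_wafers / OTHER_DESIGNERS,
    }


low = allocation(70)    # Nvidia + AMD at the low end of the quoted range
high = allocation(80)   # Nvidia + AMD at the high end of the quoted range

if __name__ == "__main__":
    for label, split in (("70% case", low), ("80% case", high)):
        print(label, split)
```

So if Nvidia and AMD really take 80%, the 20 smaller customers would be sharing something like 800 wafers a month between them, which goes some way to explaining why everyone else is feeling the squeeze.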

Nvidia uses CoWoS for its highly successful A100, A30, A800, H100, and H800 compute GPUs.

AMD's Instinct MI100, Instinct MI200/MI250X, and the upcoming Instinct MI300 also use CoWoS.

Taiwan Semiconductor Manufacturing Co. Chairman Mark Liu said the squeeze on AI chip supplies is "temporary" and could be alleviated by the end of 2024.
"Currently, we can't fulfill 100% of our customers' needs, but we try to support about 80%. We think this is a temporary phenomenon. After our expansion of [advanced chip packaging capacity], it should be alleviated in one and a half years."

Liu revealed that demand for CoWoS surged unexpectedly earlier this year, tripling year-over-year and leading to the current supply constraints. The company expects its CoWoS capacity to double by the end of 2024.

CoWoS is cool

"AI and HPC megatrends are boosting requirements for advanced packaging."

"Current demand for packaging technologies like chip-on-wafer-on-substrate (CoWoS) far outpaces the available capacity, which is why TSMC is presently accelerating efforts to boost such production capacity, the report says.

TSMC reportedly pledged to process an extra 10,000 CoWoS wafers for Nvidia throughout the duration of 2023. Given Nvidia gets about 60-ish A100/H100 GPUs per wafer (H100 is only slightly smaller), that would mean an additional ~600,000 top-end data center GPUs.

The projections imply an increase of about 1,000 to 2,000 wafers each month for the rest of this year. TSMC's monthly CoWoS output oscillates between 8,000 and 9,000 wafers, so supplying Nvidia with an additional 1,000 to 2,000 wafers monthly will significantly enhance the utilization rate of TSMC's high-end packaging facilities. This upsurge might lead to a supply scarcity of CoWoS services for other industry players due to the heightened demand, and that's why TSMC reportedly plans to expand its advanced packaging capacities.

TSMC's production increase is reportedly aimed at supporting the escalating demand for Nvidia's AI chips, which are extensively employed across the industry. For instance, Google recently launched its new A3 supercomputer, based on Nvidia's H100, boasting 26 ExaFLOPS of AI performance. Similarly, several prominent firms such as Microsoft, Oracle, and even Elon Musk's upcoming AI venture have procured tens of thousands of Nvidia's AI GPUs in the past few months.

It remains unclear which specific compute GPUs Nvidia intends to ramp up, as its current range includes the A100, A30, H100, and China-exclusive A800 and H800 GPUs. All of TSMC's facilities that provide advanced packaging services are located in Taiwan."
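
The wafer-to-GPU arithmetic in the quoted report is easy to reproduce. A minimal sketch, assuming the "60-ish" GPUs per wafer figure is exact and that every GPU on every wafer ends up as a sellable part (neither of which the report guarantees); the names are my own.

```python
# Reproduce the report's wafer-to-GPU estimate.
EXTRA_WAFERS_2023 = 10_000   # extra CoWoS wafers pledged to Nvidia (quoted)
GPUS_PER_WAFER = 60          # "60-ish" A100/H100 per wafer (quoted, approximate)


def extra_gpus(wafers: int, gpus_per_wafer: int = GPUS_PER_WAFER) -> int:
    """GPUs from a number of wafers, ASSUMING no yield or packaging losses."""
    return wafers * gpus_per_wafer


total_extra = extra_gpus(EXTRA_WAFERS_2023)
# The report's 1,000 to 2,000 extra wafers per month, expressed as GPUs.
monthly_range = (extra_gpus(1_000), extra_gpus(2_000))

if __name__ == "__main__":
    print(f"Extra GPUs in 2023: {total_extra:,}")
    print(f"Extra GPUs per month: {monthly_range[0]:,} to {monthly_range[1]:,}")
```

That matches the ~600,000 figure, and works out to roughly 60,000 to 120,000 extra GPUs a month under the same assumptions.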

Until TSMC can bring additional capacity online, Nvidia's H100 and older A100 – which power many popular generative AI models, such as GPT-4 – are at the heart of this shortage. However, it's not just Nvidia. AMD's upcoming Instinct MI300-series accelerators – which it showed off during its Datacenter and AI event in June – make extensive use of CoWoS packaging technology.

AMD's MI300A APU is currently sampling with customers and is slated to power Lawrence Livermore National Laboratory's El Capitan system, while the MI300X GPU is due to start making its way into customers' hands in Q3.

We've reached out to AMD for comment on whether the shortage of CoWoS packaging capacity could impact availability of the chip and we'll let you know if we hear anything back.
TSMC currently produces the vast majority of processors that power popular AI services, including compute GPUs (such as AMD's Instinct MI250 and Nvidia's A100 and H100), FPGAs, and specialized ASICs from companies like d-Matrix and Tenstorrent, as well as proprietary processors from cloud service providers, such as AWS's Trainium and Inferentia and Google's TPU.

It is noteworthy that compute GPUs, FPGAs, and accelerators from CSPs all use HBM memory to get the highest bandwidth possible and use TSMC's interposer-based chip-on-wafer-on-substrate (CoWoS) packaging.

While traditional outsourced semiconductor assembly and test (OSAT) companies like ASE and Amkor also offer similar packaging technologies, it looks like TSMC is getting the lion's share of the orders, which is why it can barely meet demand for its packaging services.

Industry analysts believe that OSATs are less motivated to offer advanced packaging services because it requires them to invest hefty amounts of capital and poses more financial risks than traditional packaging. For example, if something goes wrong with a mainstream processor that sits on an organic substrate, an OSAT loses only one chip, whereas if something goes wrong with a package carrying four chiplets and eight HBM memory stacks, the company loses hundreds if not thousands of dollars. Since OSATs do not get substantial margins making those chiplets, such risks slow down the expansion of advanced packaging capacity at OSATs, even though advanced packaging costs significantly more money than traditional packaging.
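
The risk argument above is basically an expected-loss calculation. Here is a hedged sketch: every dollar figure and the yield below are HYPOTHETICAL placeholders I picked so the numbers are easy to follow, not real costs, and the helper names are my own. Only the "four chiplets and eight HBM stacks" configuration comes from the article.

```python
def package_value(chiplets: int, chiplet_cost: float,
                  hbm_stacks: int = 0, hbm_cost: float = 0.0) -> float:
    """Value of the silicon that is scrapped if package assembly fails."""
    return chiplets * chiplet_cost + hbm_stacks * hbm_cost


def expected_scrap_loss(value_at_risk: float, assembly_yield: float) -> float:
    """Average loss per assembly attempt at a given assembly yield (0..1)."""
    if not 0.0 <= assembly_yield <= 1.0:
        raise ValueError("assembly_yield must be between 0 and 1")
    return value_at_risk * (1.0 - assembly_yield)


# HYPOTHETICAL costs in dollars, for illustration only.
mainstream = package_value(chiplets=1, chiplet_cost=50.0)
advanced = package_value(chiplets=4, chiplet_cost=200.0,
                         hbm_stacks=8, hbm_cost=100.0)

ASSUMED_YIELD = 0.99  # HYPOTHETICAL; real assembly yields are not public

if __name__ == "__main__":
    for name, value in (("mainstream", mainstream), ("advanced", advanced)):
        loss = expected_scrap_loss(value, ASSUMED_YIELD)
        print(f"{name}: ${value:,.0f} at risk, ~${loss:.2f} expected loss per attempt")
```

Even at the same assembly yield, the money at risk per failed package scales with everything mounted on it, which is the analysts' point about why OSATs are cautious.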
TSMC first introduced CoWoS technology in 2012 and has continued to upgrade its packaging technology since then. In the meantime, the global semiconductor industry saw the emergence of a new technology that combines different types of semiconductors, such as memory and system semiconductors, to create entirely new classes of semiconductors (heterogeneous integration). Now, Nvidia, Apple, and AMD cannot make their core products without TSMC and its packaging technology.

This unrivaled packaging technology explains why global IT giants such as Nvidia and Apple still want to use TSMC’s production lines even though Samsung Electronics succeeded in mass-producing 3-nm semiconductors ahead of TSMC in 2022. As a result, all big foundry orders for AI and autonomous driving semiconductors have gone to TSMC, and the market share gap is getting wider and wider between that company and Samsung.

To overtake TSMC’s CoWoS, Samsung is developing more advanced versions of its I-Cube and X-Cube packaging technologies. In particular, the Korean chipmaker is reportedly focusing its research on three-dimensional (3D) packaging, in which multiple chips are stacked vertically to boost performance. “Samsung is preparing a more advanced way, the three-dimensional packaging of semiconductors,” a semiconductor industry insider said. “Soon there will be a head-on collision between Samsung and TSMC in packaging.”
This focus on advanced chip packaging is not exclusive to TSMC; other industry giants like Intel and Samsung are also prioritizing it, with Intel aiming to quadruple its capacity for its top-tier chip packaging by 2025.

(hilarious url, btw)
Intel’s focus on their packaging technologies is one of their key strategies for the 20A and 18A nodes. Their advancements in power delivery and cooling methods for dealing with stacked silicon are also very impressive.

Their work with Samsung on node-agnostic stacked cache in the form of HBM3 is also very cool, as it works from their 14nm nodes down.

The biggest hurdle for TSMC, Samsung, Intel, and everybody else wanting to improve packaging is the backlog at ASML for the needed hardware. Intel has purchased and prepaid for that hardware, basically consuming ASML’s entire manufacturing lineup for the next year on that class of machinery and leaving everyone else waiting in line while Intel gets a jump start on it.

This lack of advanced packaging capacity was something Intel aimed to tackle back in 2019, as they saw it as a looming industry-wide problem, and they made a big deal at the time about wanting to get ahead of it.

The node shrink race has reached a point where the benefits of moving to a smaller node are plateauing, so for the next while it’s going to be all about moving and connecting the bits like a complex Tetris game.
"AMD will reportedly prioritize its allocation of wafers from TSMC to GPGPU and FPGA products, according to a report from Bits and Chips on Twitter."

One issue I see with this rumour is that if AMD is reallocating its packaging capacity for RDNA 4 chiplets to FPGAs & compute GPUs, then surely the same should hold for Navi 31 as well?

Once the monolithic Navi 43 (die-shrunk Navi 32) & Navi 44 (die-shrunk Navi 33) are released (planned for sometime next year), will AMD also stop all fresh production of Navi 31, or will they release a monolithic die-shrunk Navi 31 too?
AMD is quite back-ordered on their Xilinx catalog; Xilinx does, after all, make the FPGA used for the Javelin missile program.
AMD also made quite a few promises to oil and energy customers regarding their accelerators, which they are somewhat behind schedule on because of everything else going on just about everywhere. I'm not dumping on them for that; it's out of their control.
So yeah, with a cool consumer GPU market and big enterprise customers offering Jensen just about anything for the chance of getting their AI shipments faster, now is the best time for them to shift production.

Apparently TSMC has several kinds of packaging technologies & CoWoS is only one of them, used primarily by Nvidia GPUs & Xilinx FPGAs.

The big showstopper is CoWoS-S (Silicon Interposer). It involves taking a known good die and flip-chip packaging it onto a passive wafer which has wires patterned in it. This is where the name CoWoS comes from: Chip on Wafer on Substrate. It is the highest-volume 2.5D packaging platform out there by a long shot. As discussed in part 1, this is because Nvidia datacenter GPUs such as P100, V100, and A100 utilize CoWoS-S. While Nvidia has been the highest volume, Broadcom, Google TPU, Amazon Trainium, NEC Aurora, Fujitsu A64FX, AMD Vega, Xilinx FPGAs, Intel Spring Crest, and Habana Labs Gaudi are just a few more notable examples of CoWoS usage. Most compute-heavy chips with HBM, including AI training chips from a variety of startups, use CoWoS.

Apparently the tech used in RDNA 3 (Navi 31 & 32) could be InFO_oS (Integrated Fan-Out on Substrate).

There is an incredible number of advanced packaging types and brand names from Intel (EMIB, Foveros, Foveros Omni, Foveros Direct), TSMC (InFO-OS, InFO-LSI, InFO-SOW, InFO-SoIS, CoWoS-S, CoWoS-R, CoWoS-L, SoIC), Samsung (FOSiP, X-Cube, I-Cube, HBM, DDR/LPDDR DRAM, CIS), ASE (FoCoS, FOEB), Sony (CIS), Micron (HBM), SKHynix (HBM), and YMTC (XStacking).

Maybe CoWoS is a more advanced technology than InFO?

In any case, it looks like this affects Nvidia Hopper but not RDNA 3.

Not sure if RDNA 4 (Navi 41 & Navi 42) was planning to switch from InFO to CoWoS. The reports above say that the new MI300 series uses CoWoS.

If that was the case, then it could be that AMD is reallocating Navi 41 & Navi 42 capacity to Xilinx FPGAs.
While advanced packaging methods in general, and multi-chiplet designs in particular, present ample opportunities for chip developers, these are very complex technologies that present numerous challenges to producers of microelectronics and OSATs.

Assembling a processor based on five or more chiplets (e.g., Intel’s Meteor Lake consists of five chiplets) made on different nodes sounds plausible from many points of view, but a high yield of the packaging process is crucial: if packaging fails, all chiplets go to the bin, which means losses for the chipmaker and/or OSAT.

Both Intel and TSMC argue that their advanced packaging yields are very high, but they certainly do not disclose any numbers.
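
Since nobody publishes real numbers, here is only a sketch of why packaging yield matters more as chiplet counts grow. It ASSUMES each chiplet attach step succeeds independently with the same probability, which is a simplification of my own and not how Intel or TSMC describe their processes. The 99% figure is a HYPOTHETICAL placeholder.

```python
def package_yield(per_attach_yield: float, chiplets: int) -> float:
    """Yield of a whole package if every chiplet attach must succeed.

    ASSUMPTION: attach steps fail independently with equal probability.
    """
    if not 0.0 <= per_attach_yield <= 1.0:
        raise ValueError("per_attach_yield must be between 0 and 1")
    return per_attach_yield ** chiplets


def chiplets_per_good_package(per_attach_yield: float, chiplets: int) -> float:
    """Average chiplets consumed per good package, counting scrapped ones."""
    return chiplets / package_yield(per_attach_yield, chiplets)


if __name__ == "__main__":
    ASSUMED_ATTACH_YIELD = 0.99  # HYPOTHETICAL placeholder
    for n in (1, 5, 12):
        y = package_yield(ASSUMED_ATTACH_YIELD, n)
        used = chiplets_per_good_package(ASSUMED_ATTACH_YIELD, n)
        print(f"{n:2d} chiplets: package yield {y:.3f}, "
              f"{used:.2f} chiplets consumed per good package")
```

Under these assumptions the package yield drops geometrically with the number of dies, so a per-step yield that looks fine for a single die compounds into a noticeable loss on a five-chiplet part.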

Intel acknowledged this year that large substrates, such as those used for massive system-in-packages like Ponte Vecchio, tend to warp, which poses yield risks and makes it difficult to assemble them onto a motherboard. For now, Intel seems to be satisfied with what it has. But, to ensure that its future SiPs do not bend and perform better (as they integrate things like optical interconnects), the company plans to implement glass substrates instead of organic substrates in the second half of this decade. Such a move requires a lot of changes and investment and to a substantial degree it is necessitated by the ongoing transition to multi-chiplet designs.