Harvard Dropouts Raise $5 Million for LLM HW Accelerator

erek

[H]F Junkie
Joined
Dec 19, 2005
Messages
11,162
Hardware acceleration for LLMs

Specialized accelerator
Uberti cites bitcoin mining chips as an example of a successful specialized ASIC offering. In the AI accelerator domain, several companies have specialized architectures for particular workloads. There are a few examples of CNN-focused architectures at the edge (see: Kneron), while specialized architectures for the data center have mainly focused on DLRM (deep learning recommendation model), which is notoriously difficult for GPUs to accelerate (see: Neuchips). By contrast, Nvidia already has a fully deployed software feature called the Transformer Engine in its current H100 GPU, which allows LLM inference to be run without further quantization.

There’s also the problem of hyperscalers’ appetite for building their own specialized chips for their own workloads. Meta recently announced it had built its own DLRM inference chip, which is already widely deployed. Google’s TPU and AWS’ Inferentia are built for more general workloads.

Any comparison with recommendation workloads should take into account the timescales involved, Zhu said, since recommendation is relatively mature at this point.

“This is a very recent development—the market for running transformers didn’t really exist six months ago, whereas DLRM on the other hand has had comparatively longer,” he said. “The world has changed very rapidly and that is our opportunity.”

On the flipside, the rapid evolution of workloads in the AI space could spell disaster if Etched.ai specializes too much.

“That’s a real risk, and I think it’s turning off a lot of other people from going down this route, but transformers aren’t changing,” Uberti said. “If you look back four years to GPT-2, compared to Meta’s recent Llama model, there are just two differences—the size and the activation function. There are differences in how it is trained, but that doesn’t matter for inference.”

The basic components of transformers are fixed, and while there are nuances, Uberti is not worried.

“Innovations don’t come out of thin air,” he said. “There’s still this cycle of things published in academia that takes some time to be integrated.”

Uberti’s examples include gated linear activation units, which first appeared in the literature in 2020 but didn’t find their way into Google’s PaLM model until 2022, and 2021’s ALiBi, a method for positional encoding, which didn’t find widespread adoption until the end of 2022. A typical startup might take 18 to 24 months to develop a chip from scratch.
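For context on the kind of innovation Uberti is describing: a gated linear unit multiplies one linear projection of the input by a sigmoid "gate" computed from a second projection. A minimal NumPy sketch (the function and parameter names here are illustrative, not from the article):

```python
import numpy as np

def glu(x, W, V, b, c):
    """Gated linear unit: (xW + b) * sigmoid(xV + c).
    x: (batch, d_in); W, V: (d_in, d_out); b, c: (d_out,).
    A toy sketch of the activation family discussed above."""
    gate = 1.0 / (1.0 + np.exp(-(x @ V + c)))  # sigmoid gate in (0, 1)
    return (x @ W + b) * gate

# Toy usage: project a 4-dim input to 3 dims with a learned gate.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
W, V = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
b, c = np.zeros(3), np.zeros(3)
out = glu(x, W, V, b, c)
print(out.shape)  # (2, 3)
```

Because the gate is bounded in (0, 1), the output can never exceed the magnitude of the ungated projection, which is what makes the unit a "gate" rather than a plain nonlinearity.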

“The edge industry tells us a lot—the one lesson they’ve learned is not to specialize, that you don’t know what the future holds, place your bets in the wrong place and you could be useless,” Uberti said. “We took that advice and threw it out the window.”

Sohu chip

The partners have been working on ideas for their first chip, codenamed Sohu, which they claim can reach 140× the throughput per dollar of an Nvidia H100 PCIe card processing GPT-3 tokens.

Sohu will be “a chip that has a lot of memory,” and Uberti hinted that the two-order-of-magnitude performance metric was mainly due to impressive throughput (rather than a drastic cost differential), with the chip designed to support large batch sizes.

The men have “most of the architecture fleshed out” already, Uberti said. The tiled nature of the design should make design faster and minimize complexity, he added. Supporting only one type of model also minimizes the complexity of the software stack, particularly the compiler.

The company’s customers will be anyone who wants to pay less to use ChatGPT. Beyond that, the company is still working on its business model, Uberti said. He confirmed the company already has “customer dollars committed” but declined to give further details.

Etched.ai plans to spend its seed funding on hiring an initial team, getting to work on RTL front-end development, and beginning to talk to IP providers. So far, Etched.ai has hired Mark Ross, who had a spell as Cypress CTO in the early 2000s, as chief architect.

The company will likely seek a Series A round at the beginning of next year.

“Most investors are skeptical, and rightfully so, because what they see is a pair of undergrads trying to tackle the semiconductor industry,” Zhu said. “But there is definitely still a non-trivial portion of investors who are impressed and excited by the vision that we’re pitching them and what we think that this can become.”

Etched.ai is aiming to have its Sohu chip available in 2024.

Source: https://www.eetimes.com/harvard-dropouts-raise-5-million-for-llm-accelerator/