AI Startup Etched Unveils Transformer ASIC Claiming 20x Speed-up Over NVIDIA H100

Really hope someone takes over the gaming GPU crown.. someone.. Like if NVIDIA gets licked badly enough by some other company that is faster at AI acceleration.. Maybe NVIDIA doesn't deserve gaming enthusiasts and they'll learn their lesson for abandoning us


"However, there are some doubts going forward. While it is generally believed that transformers are the "future" of AI development, having an ASIC solves the problem until the operations change. For example, this is reminiscent of the crypto mining craze, which brought a few cycles of crypto ASIC miners that are now worthless pieces of sand, like Ethereum miners used to dig the ETH coin on proof of work staking, and now that ETH has transitioned to proof of stake, ETH mining ASICs are worthless.

Nonetheless, Etched wants the success formula to be simple: run transformer-based models on the Sohu ASIC with an open-source software ecosystem and scale it to massive sizes. While details are scarce, we know that the ASIC runs on 144 GB of HBM3E memory, and the chip is manufactured on TSMC's 4 nm process. That would enable AI models with 100 trillion parameters, more than 55x bigger than GPT-4's 1.8 trillion parameter design."
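
A quick back-of-the-envelope check on those figures, as a rough Python sketch. The 144 GB, 1.8 trillion and 100 trillion numbers come from the quoted article; the bytes-per-parameter values are my own assumption (the article doesn't state a precision), and this only counts storing the weights, not activations, KV cache or anything else.

```python
import math

# Figures taken from the quoted article.
HBM_PER_CHIP_GB = 144      # HBM3E per Sohu ASIC
GPT4_PARAMS = 1.8e12       # parameter count the article attributes to GPT-4
TARGET_PARAMS = 100e12     # parameter count the article says would be enabled


def size_ratio(target_params, baseline_params):
    """How many times larger the target model is than the baseline."""
    return target_params / baseline_params


def chips_for_weights(params, bytes_per_param, hbm_gb=HBM_PER_CHIP_GB):
    """Minimum number of chips whose combined HBM holds the weights alone.

    Assumes 1 GB = 1e9 bytes and ignores everything except weight storage.
    """
    total_gb = params * bytes_per_param / 1e9
    return math.ceil(total_gb / hbm_gb)


if __name__ == "__main__":
    print(f"size ratio vs GPT-4: {size_ratio(TARGET_PARAMS, GPT4_PARAMS):.1f}x")
    # bytes_per_param values are assumptions: 2 = 16-bit weights, 1 = 8-bit.
    for label, bytes_per_param in (("16-bit", 2), ("8-bit", 1)):
        n = chips_for_weights(TARGET_PARAMS, bytes_per_param)
        print(f"{label} weights: at least {n} chips just to hold the model")
```

So whatever "enabling" a 100 trillion parameter model means here, it would have to mean a very large multi-chip system, not a single ASIC.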
A machine that only does the LLM transformer part (Groq or this) could, we can imagine, beat general-purpose compute at it.

That would allow generating thousands of different answers using 6-7 different models and comparing them (average, weight, etc.), making Sora-type video inference viable outside major film studios, and so on.
 
And then the equation changes ever so slightly, or a different equation comes to prominence and all the thousands to millions of these ASICs become worthless.
 
As with any of this shit: You should believe it when you see it. Companies LOVE to make big claims that then often don't pan out in reality. Not only that, but how impressive something is depends on WHEN it comes out. If they released it on the market today and it was 20x as fast, that's something. However, if it takes them a couple more years to get it to market, well, who knows where nVidia is at that point? Maybe they have a 20x gain.

I can't count how many times I've seen a company claim they have an amazing new chip coming soon that will totally slaughter the competition, only to heavily underdeliver when it does come, and also for the competition to have gotten much better in the intervening time.
 
So this company's premise is that they build an AI accelerator ASIC, tailor-fit to a specific LLM.
Change the LLM too much and the hardware becomes useless.
If you are building your AI from the ground up, and want to save on deployment costs and don't expect to make drastic changes after launch beyond updates, this is viable.
Develop it on AMD or Intel-based hardware, and deploy it on this hardware which is presumably cheaper.

As their hardware is not running their models with CUDA, it must be a generic OpenML model, which the Nvidia H series hardware is terrible at; you can get nearly double the OpenML performance on the L series hardware for far less.

I have to assume they are then showing their best-case scenario against Nvidia's worst. Still, it should be viable for specifically dialed-in AI models.

But I mean the ASIC companies building all the coin mining gear had to do something right???
 
Teenyman45 said:
And then the equation changes ever so slightly, or a different equation comes to prominence and all the thousands to millions of these ASICs become worthless.
According to the website, they do purely the transformer matrix part, which has been a strong tool in that field since 2017, but yes, they say that the next revolution could make the chips worthless:

By burning the transformer architecture into our chip, we can’t run most traditional AI models: the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion 2. We can’t run CNNs, RNNs, or LSTMs either.

Today, every state-of-the-art AI model is a transformer: ChatGPT, Sora, Gemini, Stable Diffusion 3, and more. If transformers are replaced by SSMs, RWKV, or any new architecture, our chips will be useless.

And if it continues to be a world with so much money (and so much energy required to run things), it could be constantly worth it to make new specialty chips fully optimized only for the new way.
 
LukeTbk said:
A machine that only does the LLM transformer part (Groq or this) could, we can imagine, beat general-purpose compute at it.

https://www.etched.com/

That would allow generating thousands of different answers using 6-7 different models and comparing them (average, weight, etc.), making Sora-type video inference viable outside major film studios, and so on.
Researchers Upend AI Status Quo By Eliminating Matrix Multiplication In LLMs


“Researchers from UC Santa Cruz, UC Davis, LuxiTech, and Soochow University have developed a new method to run AI language models more efficiently by eliminating matrix multiplication, potentially reducing the environmental impact and operational costs of AI systems. Ars Technica's Benj Edwards reports: Matrix multiplication (often abbreviated to "MatMul") is at the center of most neural network computational tasks today, and GPUs are particularly good at executing the math quickly because they can perform large numbers of multiplication operations in parallel. [...] In the new paper, titled "Scalable MatMul-free Language Modeling," the researchers describe creating a custom 2.7 billion parameter model without using MatMul that features similar performance to conventional large language models (LLMs). They also demonstrate running a 1.3 billion parameter model at 23.8 tokens per second on a GPU that was accelerated by a custom-programmed FPGA chip that uses about 13 watts of power (not counting the GPU's power draw). The implication is that a more efficient FPGA "paves the way for the development of more efficient and hardware-friendly architectures," they write.”
 
