date 2025-10-28
erek
[H]F Junkie
2FA
- Joined: Dec 19, 2005
- Messages: 13,947
“NVIDIA claims the DGX Spark can achieve one FP4 PetaFLOP, but this relies on structured sparsity, a technique that ignores zeros in a neural network. If you disable this feature, as most standard models do, the machine operates at about half the advertised speed, aligning with the figures we recently observed, so that might be an issue as well. The chip combines a MediaTek-sourced Arm CPU die with a Blackwell GPU die in a 2.5D package, built using TSMC's 3 nm process. On the CPU side it features 20 Arm v9.2 cores divided into two clusters of ten, with each cluster supported by a 16 MB shared L3 cache (32 MB total), while each core has its own private L2 cache. The memory subsystem is a unified LPDDR5X-9400 setup on a 256-bit bus, supporting up to 128 GB and providing approximately 301 GB/s of raw bandwidth to the package. High-speed I/O is concentrated on the CPU die, with NVMe storage and peripherals utilizing PCIe lanes there, and a ConnectX-7 NIC connected via a PCIe Gen 5 x8 link for multi-unit networking.”
Source: https://www.techpowerup.com/342321/...rtedly-runs-at-half-the-power-and-performance
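For anyone who wants to sanity-check the numbers in the quote, here is a rough back-of-the-envelope sketch in Python. The function names are made up for illustration, and the 2x sparsity factor is an assumption taken from the article's "about half" wording, not a measured value.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_width_bits / 8          # 256-bit bus -> 32 bytes per transfer
    return transfer_rate_mts * bytes_per_transfer / 1000  # MB/s -> GB/s


def dense_from_sparse(sparse_pflops: float, sparsity_speedup: float = 2.0) -> float:
    """Dense throughput implied by a sparse figure.

    sparsity_speedup=2.0 is an assumption based on the quoted
    "about half the advertised speed" without structured sparsity.
    """
    return sparse_pflops / sparsity_speedup


# Figures as quoted in the article: LPDDR5X-9400 on a 256-bit bus, 1 PFLOP FP4 (sparse)
bandwidth = peak_bandwidth_gbs(9400, 256)
dense = dense_from_sparse(1.0)

print(f"Peak memory bandwidth: {bandwidth:.1f} GB/s")
print(f"Dense FP4 estimate:    {dense:.2f} PFLOP")
```

The bandwidth comes out at roughly 300.8 GB/s, which lines up with the "approximately 301 GB/s" in the article, and the dense FP4 estimate lands at about 0.5 PFLOP under the 2x assumption.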