Wow, two fucking years after release, and they finally fixed the feature list to make it competitive with every other x86 SBC that includes NVMe out there?
If you want to know why Broadcom has zero customers for their SoCs outside of the Pi, their complete lack of support here is a pretty easy answer.
NVMe has a much greater queue depth and more command queues than SATA.
Because it's silly for Broadcom to waste time and effort on getting NVMe up on the single PCIe 2.0 lane for I/O speeds that are slower than SATA3?
It depends on the cost and the task; other ARM-based solutions which natively offer more PCIe lanes have much higher costs associated with them.
I'm not sure why RPi are bothering. The PCIe connectivity is only available from RPi 4 Compute Modules; on the 4B it's dedicated to the USB ports. Any application that really needs such I/O would probably be better off with a much higher-end ARM SoC (Snapdragon or Apple M1 level; I can never keep all of the various ARM architectures straight) or x86-64.
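For reference, some napkin math on why a single PCIe 2.0 lane loses to SATA3 on raw line rate. The encoding overheads below are the standard assumption (8b/10b on both links); nothing here is measured:

# Back-of-the-envelope line-rate comparison; assumed figures, not measurements.
def usable_mb_per_s(line_rate_gbps: float, encoding_efficiency: float = 8 / 10) -> float:
    # Convert a raw line rate in Gb/s into usable MB/s after encoding overhead.
    return line_rate_gbps * encoding_efficiency * 1000 / 8

pcie2_x1 = usable_mb_per_s(5.0)  # PCIe 2.0: 5 GT/s per lane -> ~500 MB/s per direction
sata3 = usable_mb_per_s(6.0)     # SATA3: 6 Gb/s link -> ~600 MB/s before protocol overhead
print(f"PCIe 2.0 x1: ~{pcie2_x1:.0f} MB/s, SATA3: ~{sata3:.0f} MB/s")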
NVMe has a much greater queue depth and more command queues than SATA.
This makes it a low-cost boon for ARM developers looking to test high-queuing tasks such as databases, where singular transfer rates aren't as important, especially compared to the SATA, eMMC, or microSD flash storage solutions that are currently natively available.
It depends on the cost and the task; other ARM-based solutions which natively offer more PCIe lanes have much higher costs associated with them.
x86-64 solutions don't really apply unless the ISA doesn't matter to the end-user.
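If you want to see that queuing difference on an actual Linux box, a sketch like the one below reads the block-layer queue setup out of sysfs. The device names (nvme0n1, sda) are only examples, and the mq/ directory is only populated on blk-mq devices:

# Rough sketch: compare queue depth and hardware queue count for two block devices.
from pathlib import Path

def queue_info(dev: str) -> dict:
    base = Path("/sys/block") / dev
    nr_requests = int((base / "queue" / "nr_requests").read_text())
    # blk-mq exposes one directory per hardware queue under mq/ (if present)
    mq = base / "mq"
    hw_queues = len(list(mq.iterdir())) if mq.exists() else 1
    return {"device": dev, "nr_requests": nr_requests, "hw_queues": hw_queues}

for dev in ("nvme0n1", "sda"):  # example device names, adjust for your system
    try:
        print(queue_info(dev))
    except FileNotFoundError:
        print(f"{dev}: not present on this system")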
Underlying Grace’s performance is fourth-generation NVIDIA NVLink® interconnect technology, which provides a record 900 GB/s connection between Grace and NVIDIA GPUs to enable 30x higher aggregate bandwidth compared to today’s leading servers.
Grace will also utilize an innovative LPDDR5x memory subsystem that will deliver twice the bandwidth and 10x better energy efficiency compared with DDR4 memory. In addition, the new architecture provides unified cache coherence with a single memory address space, combining system and HBM GPU memory to simplify programmability.
The company isn’t directly gunning for the Intel Xeon or AMD EPYC server market, but instead they are building their own chip to complement their GPU offerings, creating a specialized chip that can directly connect to their GPUs and help handle enormous, trillion parameter AI models.
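As a rough sanity check on that 30x aggregate bandwidth figure, assuming the baseline NVIDIA has in mind is a single PCIe 4.0 x16 link (the release doesn't actually say, so treat this as illustration only):

# NVLink vs. an assumed PCIe 4.0 x16 baseline; numbers are nominal, not measured.
nvlink_gbps = 900                       # GB/s Grace <-> GPU, per the announcement
pcie4_lane_gbps = 16 * 128 / 130 / 8    # ~1.97 GB/s usable per lane per direction
pcie4_x16_gbps = 16 * pcie4_lane_gbps   # ~31.5 GB/s per direction

print(f"NVLink / PCIe 4.0 x16: ~{nvlink_gbps / pcie4_x16_gbps:.0f}x")  # ~29x, in the ballpark of the quoted 30x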
The reason I didn't post it is that it's still a mystery. I assume this thing is built to accept as many GPUs as you have slots for (just like the existing AMD servers)?
NVIDIA Announces CPU for Giant AI and High Performance Computing Workloads
Credit goes to Lakados
https://nvidianews.nvidia.com/news/...t-ai-and-high-performance-computing-workloads
https://www.anandtech.com/show/1661...formance-arm-server-cpu-for-use-in-ai-systems
[Attachment image]
Old design infrastructure with x86-64 and PCIe:
[Attachment image]
New design infrastructure with AArch64 and NVLink:
[Attachment image]
The full thing is rendered in the post by Red Falcon.
The reason I didn't post it is that it's still a mystery. I assume this thing is built to accept as many GPUs as you have slots for (just like the existing AMD servers)?
It is based on a future microarchitecture to be announced by ARM, but NVIDIA has already announced over 300 points on SPECrate2017_int_base.
They can't even tell you any fucking details about the CPU.
The Swiss National Supercomputing Center and the Los Alamos National Laboratory will build supercomputers based on this.
So yeah, this is a Pointless Press Release (tm) that we will have to wait a goddamned year to find out NVIDIA added a tiny tweak to the standard N2 design (plus the obvious addition of on-chip custom AI communicating with that CPU).
This feels like another empty Orin Press Release (along with an 18-month delay before specs and boards were shown)
https://www.anandtech.com/show/12598/nvidia-arm-soc-roadmap-updated-after-xavier-comes-orin
Fuck this preannounce shit, man. You're not putting these in cars (so you don't have to give car designers two years of empty PR notice to design these in).
TL;DR: the empty press release shows tons of potential but is, in reality, purely hype. I'm more pissed off because Tegra has made this "normal" for NVIDIA.
The full thing is rendered in the post by Red Falcon.
There are no slots, because slots are slow.
It is based on a future microarchitecture to be announced by ARM, but NVIDIA has already announced over 300 points on SPECrate2017_int_base.
The Swiss National Supercomputing Center and the Los Alamos National Laboratory will build supercomputers based on this.
Ampere moving to custom cores - Anandtech Link
Interesting to see them jump to custom cores after their success with the Neoverse cores. I find it absolutely exciting to see a bunch of custom ARM stuff popping up in both the server and consumer space.
Now just need to see some more RISC-V movement.
The Ampere Altra Max Review: Pushing it to 128 Cores per Socket
Very unique. Moar Cores / Moar Problems.
TL;DR: less L3, cache coherency rears its head, throughput for some things is amazing as is compiling (not linking), transactional Java sux.
The Grace CPU Superchip memory subsystem provides up to 1TB/s of bandwidth, which Nvidia says is a first for CPUs and more than twice that of other data center processors that will support DDR5 memory. The LPDDR5X comes spread out in 16 packages that provide 1TB of capacity. In addition, Nvidia notes that Grace uses the first ECC implementation of LPDDR5X.
This brings us to benchmarks. Nvidia claims the Grace CPU Superchip is 1.5X faster in the SPECrate_2017_int_base benchmark than the two previous-gen 64-core EPYC Rome 7742 processors it uses in its DGX A100 systems. Nvidia based this claim on a pre-silicon simulation that predicts the Grace CPU at a score of 740+ (370 per chip). AMD's current-gen EPYC Milan chips, the current performance leader in the data center, have posted SPEC results ranging from 382 to 424 apiece, meaning the highest-end x86 chips will still hold the lead. However, Nvidia's solution will have many other advantages, such as power efficiency and a more GPU-friendly design.
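Putting those quoted numbers side by side (these are the article's figures, not independent results):

# SPECrate2017_int_base comparison using only the numbers quoted above.
grace_superchip = 740                    # Nvidia pre-silicon estimate, two Grace dies
grace_per_chip = grace_superchip / 2     # ~370
rome_7742_pair = grace_superchip / 1.5   # implied baseline behind the 1.5x claim
milan_per_chip = (382, 424)              # published EPYC Milan per-chip results

print(f"Grace per chip: ~{grace_per_chip:.0f}")
print(f"Implied dual EPYC Rome 7742 baseline: ~{rome_7742_pair:.0f}")
print(f"EPYC Milan per chip: {milan_per_chip[0]}-{milan_per_chip[1]}")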
Things are starting to get interesting.
At long last, GPU functionality on the Raspberry Pi.
It is a start, and that's how everything begins: very small.
Executive summary: the drivers plus I/O architecture are so bad on the ARM platform that the only folks who can get GPU compute working on ARM servers are supercomputer vendors like NVIDIA.
https://www.anandtech.com/show/1757...2-the-next-generation-of-arm-server-cpu-cores
The Anandtech article gives a good breakdown from the announcement.
https://www.arm.com/company/news/20...e-with-next-generation-arm-neoverse-platforms
But here’s the original for reference.
We can expect NVIDIA to be making announcements about their Grace chips, as they are supposedly using the Neoverse V2 platform; it might come up in the NVIDIA keynote scheduled for today (Sept 20, 2022).
The conclusion is about what you would expect, especially if you saw our AoA Analysis Marvell ThunderX2 Equals 190 Raspberry Pi 4. I also discuss the Ampere Altra Max’s HPL performance versus the newly released AMD EPYC Genoa as in the floating point workload, there is a massive gap between the efficiency of AMD and Ampere parts. There is a reason that the Arm server CPUs typically have integer-focused performance figures for things like web serving and SPEC CPU2017 integer rates, not floating point.
This is actually pretty cool.
Will be doing just this on my Raspberry Pi CM4 very soon.
VMware ESXi would have been painful on a microSD card, but should be much better over native NVMe, even at PCIe 2.0 x1 (~500MB/s).
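A crude sequential-read timing like the sketch below is enough to tell microSD speeds from NVMe speeds on the CM4; the file path is hypothetical (drop a large file on the target drive first) and the page cache will inflate repeat runs:

# Minimal sequential-read throughput check; not a substitute for a real benchmark like fio.
import time

def read_throughput(path: str, block_size: int = 4 * 1024 * 1024) -> float:
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    return total / (time.perf_counter() - start) / 1e6  # MB/s

print(f"{read_throughput('/mnt/nvme/testfile.bin'):.0f} MB/s")  # hypothetical path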
System Specs:
Broadcom BCM2711 OC'ed @ 2.0GHz AArch64 SoC
8GB 3200MT/s LPDDR4 SDRAM
Broadcom VideoCore VI GPU
512GB Samsung PM9A1 M.2 NVMe SSD
Raspberry Pi OS 64-bit (Debian 11)
GeeekPi Aluminum Alloy SoC Heatsink
Noctua NF-A4x20 PWM Fan (40x20mm)
ineo Copper Alloy M.2 2280 Heatsink
Integrated ARM SoC 1000Base-T NIC
InnoMaker HiFi DAC PCM5122
NVIDIA’s Grace Superchip is a 500W part, but that includes the LPDDR5X memory. We have been seeing roughly 5W per DDR5 RDIMM, so for AMD EPYC 9004 Genoa with 12 channels of memory, we would add ~60W to the CPU’s TDP. That makes NVIDIA’s chip slightly more power-hungry, but with what should be a bit over twice the memory bandwidth of a single Genoa CPU.
That single CPU comparison will become important. The AMD EPYC Genoa more than doubled memory bandwidth, and it has much higher compute performance due to adding more cores and a faster microarchitecture. Our sense is that in HPC workloads, Grace will compete with dual-socket Genoa. On the integer side, AMD should be ahead based on what we have seen with existing Arm architectures.
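Rough numbers behind that comparison, using the figures above plus two assumptions on my part: a 360W TDP for a top-bin single Genoa SKU and DDR5-4800 RDIMMs at 38.4GB/s per channel:

# Power and memory-bandwidth comparison; Genoa TDP and DIMM speed are assumptions.
grace_power_w = 500              # Superchip, LPDDR5X included
grace_bw_gbs = 1000              # ~1 TB/s quoted

genoa_tdp_w = 360                # assumed top-bin SKU TDP
genoa_dimm_w = 12 * 5            # ~5 W per RDIMM, 12 channels populated
genoa_total_w = genoa_tdp_w + genoa_dimm_w

genoa_bw_gbs = 12 * 38.4         # 12 channels of DDR5-4800

print(f"Grace: {grace_power_w} W, {grace_bw_gbs} GB/s")
print(f"Genoa CPU + DIMMs: {genoa_total_w} W, {genoa_bw_gbs:.0f} GB/s")
print(f"Bandwidth ratio: {grace_bw_gbs / genoa_bw_gbs:.2f}x")  # a bit over 2x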