ARM server status update / reality check

Red Falcon

[H]F Junkie
Joined
May 7, 2007
Messages
11,140
Things are getting more and more interesting.
I can't recommend Jeff Geerling's YouTube channel enough.



NVMe booting is currently in beta but will be added to the main feature set soon by the Raspberry Pi Foundation.
Jeff's video shows that it is pretty quick to get up and running, though, and ~400 MB/s is pretty good for such a low-power SBC.
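Quick napkin math on where that figure tops out (my numbers, not from the video): the Pi exposes a single PCIe 2.0 lane, and SATA3 is the obvious comparison point.

```python
# Back-of-envelope ceilings for one PCIe 2.0 lane vs. SATA3 (both 8b/10b encoded).
pcie2_lane_gts = 5.0        # PCIe 2.0: 5 GT/s per lane
sata3_gbps = 6.0            # SATA3: 6 Gb/s line rate
encoding = 8 / 10           # 8b/10b encoding efficiency

pcie2_MBps = pcie2_lane_gts * encoding * 1000 / 8   # ~500 MB/s
sata3_MBps = sata3_gbps * encoding * 1000 / 8       # ~600 MB/s

print(f"PCIe 2.0 x1 ceiling: ~{pcie2_MBps:.0f} MB/s")
print(f"SATA3 ceiling:       ~{sata3_MBps:.0f} MB/s")
# ~400 MB/s measured is already close to that one lane's practical limit
# once packet/protocol overhead is subtracted.
```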
 
Last edited:

defaultluser

[H]F Junkie
Joined
Jan 14, 2006
Messages
14,103
Wow, two fucking years after release, and they finally fixed the feature list to make it competitive with every other x86 SBC out there that includes NVMe?

If you want to know why Broadcom has zero customers for their SoCs (outside the Pi), their complete lack of support here is a pretty easy answer.
 
  • Like
Reactions: travm
Joined
Dec 1, 2011
Messages
1,007
Wow, two fucking years after release, and they finally fixed the feature list to make it competitive with every other x86 SBC out there that includes NVMe?

If you want to know why Broadcom has zero customers for their SoCs (outside the Pi), their complete lack of support here is a pretty easy answer.

Because it's silly for Broadcom to waste time and effort getting NVMe up on a single PCIe 2.0 lane for I/O speeds that are slower than SATA3?

I'm not sure why RPi are bothering. The PCIe connectivity is only available on the RPi 4 Compute Module; on the 4B it's dedicated to the USB ports. Any application that really needs that kind of I/O would probably be better off with a much higher-end ARM SoC (Snapdragon or Apple M1 level; I can never keep all of the various ARM designs straight) or x86-64.

AFAICT Broadcom's SoC business is doing fine. Their chips are embedded in tons of devices from manufacturers willing to pay for the proper support and docs that the cheap Chinese SoCs don't offer (e.g., you're not going to find Allwinner in your car).
 

Red Falcon

[H]F Junkie
Joined
May 7, 2007
Messages
11,140
Because it's silly for Broadcom to waste time and effort getting NVMe up on a single PCIe 2.0 lane for I/O speeds that are slower than SATA3?
NVMe has much greater queue depth and far more command queues than SATA.
This makes it a low-cost boon for ARM developers looking to test high-queue-depth workloads such as databases, where raw transfer rates aren't as important, especially compared to the SATA, eMMC, or microSD flash storage options natively available today.
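Something like this crude sketch shows what I mean; the device path is hypothetical and a thread pool is standing in for a proper O_DIRECT/async benchmark like fio, so treat it as illustrative only:

```python
# Rough illustration: random 4 KiB reads at increasing "queue depth".
# NVMe tends to scale with outstanding requests far better than SATA/eMMC/SD.
# Needs root to read the raw device; the page cache will flatter the numbers.
import os, random, time
from concurrent.futures import ThreadPoolExecutor

DEV = "/dev/nvme0n1"     # hypothetical device node, adjust for your setup
BLOCK = 4096             # 4 KiB reads
COUNT = 20000            # reads per test
SPAN = 8 * 1024**3       # spread offsets over the first 8 GiB

def read_at(fd, offset):
    os.pread(fd, BLOCK, offset)

def run(depth):
    fd = os.open(DEV, os.O_RDONLY)
    offsets = [random.randrange(0, SPAN // BLOCK) * BLOCK for _ in range(COUNT)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=depth) as pool:
        for off in offsets:
            pool.submit(read_at, fd, off)   # pool drains before the block exits
    elapsed = time.perf_counter() - start
    os.close(fd)
    print(f"queue depth ~{depth:3d}: {COUNT / elapsed:8.0f} IOPS")

for depth in (1, 4, 16, 64):
    run(depth)
```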

I'm not sure why RPi are bothering. The PCIe connectivity is only available on the RPi 4 Compute Module; on the 4B it's dedicated to the USB ports. Any application that really needs that kind of I/O would probably be better off with a much higher-end ARM SoC (Snapdragon or Apple M1 level; I can never keep all of the various ARM designs straight) or x86-64.
It depends on the cost and the task, and other ARM-based solutions which natively offer more PCIe lanes have much higher costs associated with them.
x86-64 solutions don't really apply unless the ISA doesn't matter to the end user.
 
Joined
Dec 1, 2011
Messages
1,007
NVMe has much greater queue depth and far more command queues than SATA.
This makes it a low-cost boon for ARM developers looking to test high-queue-depth workloads such as databases, where raw transfer rates aren't as important, especially compared to the SATA, eMMC, or microSD flash storage options natively available today.

True, NVMe is technically more capable. But I doubt the RPi SoC or similar is going to be a top choice for anyone who needs to process a ton of IOPS. I'd be concerned that the CPU wouldn't be able to keep up and load average would go through the roof. Initial dev, yeah, sure.


It depends on the cost and the task, and other ARM-based solutions which natively offer more PCIe lanes have much higher costs associated with them.
x86-64 solutions don't really apply unless the ISA doesn't matter to the end user.

Well yeah, that's a given, more capable == more $$$.
 

Red Falcon

[H]F Junkie
Joined
May 7, 2007
Messages
11,140

NVIDIA Announces CPU for Giant AI and High Performance Computing Workloads

Credit goes to Lakados


https://nvidianews.nvidia.com/news/...t-ai-and-high-performance-computing-workloads
Underlying Grace’s performance is fourth-generation NVIDIA NVLink® interconnect technology, which provides a record 900 GB/s connection between Grace and NVIDIA GPUs to enable 30x higher aggregate bandwidth compared to today’s leading servers.

Grace will also utilize an innovative LPDDR5x memory subsystem that will deliver twice the bandwidth and 10x better energy efficiency compared with DDR4 memory. In addition, the new architecture provides unified cache coherence with a single memory address space, combining system and HBM GPU memory to simplify programmability.

https://www.anandtech.com/show/1661...formance-arm-server-cpu-for-use-in-ai-systems
The company isn’t directly gunning for the Intel Xeon or AMD EPYC server market, but instead they are building their own chip to complement their GPU offerings, creating a specialized chip that can directly connect to their GPUs and help handle enormous, trillion parameter AI models.

[Image: Grace]

Old design infrastructure with x86-64 and PCIe:
[Image: PCIe topology]

New design infrastructure with AArch64 and NVLink:
[Image: NVLink topology]
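For a rough sense of scale (the 900 GB/s is NVIDIA's figure; the PCIe 4.0 x16 number is my own assumption for what "today's leading servers" use):

```python
# Rough comparison of the quoted NVLink bandwidth vs. a conventional PCIe CPU<->GPU link.
nvlink_GBps = 900             # Grace <-> GPU aggregate bandwidth from the press release
pcie4_x16_GBps = 32           # approx. PCIe 4.0 x16, per direction (my assumption)

print(f"NVLink vs. PCIe 4.0 x16: ~{nvlink_GBps / pcie4_x16_GBps:.0f}x")   # ~28x
# In the ballpark of the claimed "30x higher aggregate bandwidth"; exactly how
# NVIDIA counts links and directions to land on 30x isn't spelled out.
```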
 
Last edited:

defaultluser

[H]F Junkie
Joined
Jan 14, 2006
Messages
14,103

NVIDIA Announces CPU for Giant AI and High Performance Computing Workloads

Credit goes to Lakados


https://nvidianews.nvidia.com/news/...t-ai-and-high-performance-computing-workloads


https://www.anandtech.com/show/1661...formance-arm-server-cpu-for-use-in-ai-systems


View attachment 347381

Old design infrastructure with x86-64 and PCIe:
View attachment 347382

New design infrastructure with AArch64 and NVLink:
View attachment 347383
The reason I didn't post it is that it's still a mystery. I assume this thing is built to accept as many GPUs as you have slots for (just like the existing AMD servers)?

They can't even tell you any fucking details about the CPU.

So yeah, this is a Pointless Press Release (tm), and we will have to wait a goddamned year to find out NVIDIA added a tiny tweak to the standard N2 design (plus the obvious addition of on-chip custom AI silicon communicating with that CPU).

This feels like another empty Orin Press Release (along with an 18-month delay before specs and boards were shown)

https://www.anandtech.com/show/12598/nvidia-arm-soc-roadmap-updated-after-xavier-comes-orin

Fuck this preannounce shit, man. You're not putting these in cars (so you don't have to give car designers two years of empty PR notice to design these in).
 
Last edited:

juanrga

2[H]4U
Joined
Feb 22, 2017
Messages
2,801
The reason I didn't post it is that it's still a mystery. I assume this thing is built to accept as many GPUs as you have slots for (just like the existing AMD servers)?
The full thing is rendered in the post by Red Falcon.

There are no slots, because slots are slow.
They can't even tell you any fucking details about the CPU.
It is based on a future microarchitecture yet to be announced by ARM, but Nvidia has already claimed a score of over 300 on SPECrate2017_int_base.
So yeah, this is a Pointless Press Release (tm), and we will have to wait a goddamned year to find out NVIDIA added a tiny tweak to the standard N2 design (plus the obvious addition of on-chip custom AI silicon communicating with that CPU).

This feels like another empty Orin Press Release (along with an 18-month delay before specs and boards were shown)

https://www.anandtech.com/show/12598/nvidia-arm-soc-roadmap-updated-after-xavier-comes-orin

Fuck this preannounce shit, man. You're not putting these in cars (so you don't have to give car designers two years of empty PR notice to design these in).
The Swiss National Supercomputing Center and the Los Alamos National Laboratory will build supercomputers based on this.
 

defaultluser

[H]F Junkie
Joined
Jan 14, 2006
Messages
14,103
The full thing is rendered in the post by Red Falcon.

There are no slots, because slots are slow.

It is based on a future microarchitecture yet to be announced by ARM, but Nvidia has already claimed a score of over 300 on SPECrate2017_int_base.

The Swiss National Supercomputing Center and the Los Alamos National Laboratory will build supercomputers based on this.
TL;DR: an empty press release that shows tons of potential but is, in reality, purely hype. I'm more pissed off because Tegra has made this "normal" for NVIDIA.
 

defaultluser

[H]F Junkie
Joined
Jan 14, 2006
Messages
14,103
Ampere moving to custom cores - Anandtech Link

Interesting to see them jump on the custom-core train after their success with the Neoverse cores. I find it absolutely exciting to see a bunch of custom ARM stuff popping up in both the server and consumer space.

Now just need to see some more RISC-V movement.

They own X-Gene, so it will be interesting to see what revision 4 brings!

Will it be faster than N2, or is ARM upping the license costs after the surprise success of N1?
 
Last edited: