ARM server status update / reality check

That kid is a testing machine... and I love his work, thanks for sharing!
Yeah I love how much they test. They pretty much have it all scripted to run benchmarks, but it still takes them days to get results, and obviously any small changes/tweaks need to be tested again. They collect so much data; it's great to see, especially since some of it is actually relevant to me. Time to sit down with some popcorn and read through their 100+ benchmarks, lol.
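
Not their actual harness, obviously, but the general shape of "script everything, rerun after every tweak" looks something like this. Everything below is a placeholder I made up (the test file, the commands, the best-of-3 timing), just to show the pattern:

```python
import os
import subprocess
import time

# Generate a throwaway input file so the sketch is self-contained.
TESTFILE = "testfile.bin"
if not os.path.exists(TESTFILE):
    with open(TESTFILE, "wb") as f:
        f.write(os.urandom(64 * 1024 * 1024))  # 64 MB of random data

# Placeholder "benchmarks"; a real suite obviously runs far more interesting workloads.
BENCHMARKS = {
    "compress": ["gzip", "-9", "-k", "-f", TESTFILE],
    "checksum": ["sha256sum", TESTFILE],
}

def run_benchmark(cmd, runs=3):
    """Run a command a few times and keep the best wall-clock time."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - start)
    return min(times)

for name, cmd in BENCHMARKS.items():
    print(f"{name}: {run_benchmark(cmd):.3f} s (best of 3)")
```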

Edit: Favorite quote from the article:
"For the past number of days I have been running 140+ benchmarks on Amazon's Graviton2 m6g.metal instance for tapping the bare metal performance of this latest high-end Arm server SoC and then comparing it to the bare metal performance of EPYC 7742, AMD's current generation 64-core server CPU offering. The EPYC 7742 was tested with and without SMT for matching the Graviton2 that lacks SMT. No Intel CPUs were tested in this comparison due to their current lacking of a 64-core processor."
 


https://www.raspberrypi.org/blog/8gb-raspberry-pi-4-on-sale-now-at-75/

The BCM2711 chip that we use on Raspberry Pi 4 can address up to 16GB of LPDDR4 SDRAM, so the real barrier to our offering a larger-memory variant was the lack of an 8GB LPDDR4 package. These didn’t exist (at least in a form that we could address) in 2019, but happily our partners at Micron stepped up earlier this year with a suitable part. And so, today, we’re delighted to announce the immediate availability of the 8GB Raspberry Pi 4, priced at just $75.
 

As... very cool as that is, I really struggle to think of something that I would want to do on a Pi 4 that would actually use 8GB of RAM.

It'd be like putting 32GB of VRAM on a GPU for gaming... most of it would just go unused. I don't use the 1GB on my Pi 3b+ as it is!

(by the time you got around to using that much RAM, for most real workloads you'd be taxing the limits of the SoC; and a bigger complaint with the Pi 4 is that the drivers still aren't there yet to the point where the hardware is fully leveraged...)
 
RAMdisk! :p
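
Only half joking; a tmpfs RAMdisk really is the easiest way to soak up that much memory. Minimal sketch below, driven from Python purely for illustration; the mount point and the 6G size are arbitrary picks of mine, and it needs root:

```python
import subprocess

MOUNT_POINT = "/mnt/ramdisk"  # arbitrary example path
SIZE = "6G"                   # leave some headroom on an 8GB board

# Create the mount point and mount a tmpfs there (run as root).
subprocess.run(["mkdir", "-p", MOUNT_POINT], check=True)
subprocess.run(
    ["mount", "-t", "tmpfs", "-o", f"size={SIZE}", "tmpfs", MOUNT_POINT],
    check=True,
)

# Show what we ended up with.
print(subprocess.run(["df", "-h", MOUNT_POINT],
                     capture_output=True, text=True, check=True).stdout)
```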

EDIT: This is quite the large project in the works as well:

 
I mean, shit, sure; since we're (finally) looking at containerizing our product offerings at work, I was looking at building a four-node cluster out of Pi 4 units, using a stacking chassis with PoE hats... yeah.

Biggest problem I had was figuring out what the hell I'd do with it all after I'd spent hundreds to build it!
 
I mean, shit, sure; since we're (finally) looking at containerizing our product offerings at work, I was looking at building a four-node cluster out of Pi 4 units, using a stacking chassis with PoE hats... yeah.

Biggest problem I had was figuring out what the hell I'd do with it all after I'd spent hundreds to build it!
Ah, if that's the case, you might like this for just such a project and semi-scalable workload:
https://turingpi.com/

If anything, until ARM64 starts to become a serious threat/replacement to x86-64 systems, these do make great proof-of-concept clusters and test beds!
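
Either way, before sinking money into any of these boards, it's worth a quick check that the container images you care about even exist for arm64. A rough sketch with the Docker SDK for Python (the image name is just an example, and it assumes the Docker daemon plus the `docker` pip package are installed; run it on the Pi itself, or anywhere with arm64 emulation set up):

```python
import docker

client = docker.from_env()

# Pull an arm64 image explicitly; on a Pi 4 this is the native platform anyway.
image = client.images.pull("arm64v8/alpine", tag="latest")
print("pulled:", image.tags)

# Run a throwaway container and ask it what architecture it sees.
output = client.containers.run("arm64v8/alpine:latest", ["uname", "-m"], remove=True)
print("container reports:", output.decode().strip())  # expect "aarch64"
```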
 
Ah, if that's the case, you might like this for just such a project and semi-scalable workload:
https://turingpi.com/

If anything, until ARM64 starts to become a serious threat/replacement to x86-64 systems, these do make great proof-of-concept clusters and test beds!

It's not out of the realm for x86; it's just that Intel and AMD hold all of the technical information tightly, as if it were some sort of national secret. Intel sort of provides some details on their lower-end laptop CPUs, but good luck buying those direct from Intel. Quark was a bad joke with no support. Viatech is supposedly in the game, but getting any technical information out of them is also a struggle. AMD just flat out doesn't provide any information if it doesn't have to.

While Nvidia doesn't provide detailed information on their GPUs (and the relevance of that is out of scope for a SBC/SoC discussion, anyways), they do provide just about everything for their Jetson (formerly Tegra) products. Specifications, product design guides, IBIS files, board schematics, BOMs, bringup guides, STEP models, software sources, etc, etc, etc. Short of their GPU drivers, they provide more open information and access than any other group (not that rPi GPU drivers are realistically open, either).

As is typical of Nvidia, they charge through the roof for their products. The Jetson Nano Dev Kit is $100 and didn't even include WiFi. The new Jetson Xavier NX includes WiFi (along with 6 "Carmel" Arm cores and 384 Volta GPU cores, tensor cores, advanced I/O [including two M.2 PCIe slots], etc), but costs $400. The next Jetson Nano is scheduled for 2021, and I'm expecting pricing to still be in the triple digits. That being said, they do seem to be coming down in price over time, while increasing performance and I/O. Nvidia spins their own Ubuntu build and provides their entire SDK (CUDA included), L4T and JetPack respectively, for free.

Either way, we do see some movement from the Odroid side, even into the x86 space. ASUS somehow sees enough market to have released yet another ASUS Tinker board in 2020. Arduinos are amusingly more and more powerful with every official device release.

Such a shame, IMO. VIA was a pioneer in this field for a long time (the other one that comes to mind, the old AMD Geode GX series, came from a former failed acquisition/spinoff related to Viatech), but they never pushed any harder. I don't even know what they do most of the time. They come up every ~10 years with a new CPU architecture, but it usually just disappoints and fades away after ~5 years. How are they still afloat?
 
The Jetson Nano Dev Kit is $100 and didn't even include WiFi
It's also quite a bit more powerful than an RPI, and with their Ubuntu spin, more flexible since all of the driver stuff works out of the box. Also, for these applications, WiFi isn't always desired...
Either way, we do see some movement from the Odroid side, even into the x86 space. ASUS somehow sees enough market to have released yet another ASUS Tinker board in 2020. Arduinos are amusingly more and more powerful with every official device release.
The latest Odroid looks pretty nice, while I'd appreciate ASUS figuring out how to drop the size of the x86 stuff; or, alternatively, getting ahold of AMD's ultrabook-grade SoCs and putting those on boards. Granted, once you get to the NUC-sized boards that x86 SBCs usually come in at, you pretty much want them to be in an enclosure already. They're just too specialized and expensive at that point, and generally, aside from the size shrink, not particularly attractive over building an aftermarket solution with the significant increase in flexibility that brings. There is definitely something special about the Pi form factor!
They come up every ~10 years with a new CPU architecture, but it usually just disappoints and fades away after ~5 years. How are they still afloat?
If they still have a valid x86 license, that'd probably be the only reason. I maintain significant ire toward VIA for the hell they put us through during the early Athlon days, but most recently the parts they put out outside of the CPU market were very much class-leading if a bit niche.
 
It's also quite a bit more powerful than an RPI, and with their Ubuntu spin, more flexible since all of the driver stuff works out of the box. Also, for these applications, WiFi isn't always desired...

The latest Odroid looks pretty nice, while I'd appreciate ASUS figuring out how to drop the size of the x86 stuff; or, alternatively, getting ahold of AMD's ultrabook-grade SoCs and putting those on boards. Granted, once you get to the NUC-sized boards that x86 SBCs usually come in at, you pretty much want them to be in an enclosure already. They're just too specialized and expensive at that point, and generally, aside from the size shrink, not particularly attractive over building an aftermarket solution with the significant increase in flexibility that brings. There is definitely something special about the Pi form factor!

If they still have a valid x86 license, that'd probably be the only reason. I maintain significant ire toward VIA for the hell they put us through during the early Athlon days, but most recently the parts they put out outside of the CPU market were very much class-leading if a bit niche.
It was more powerful than an rPi; now its CPU cores are lagging behind. I do get that the Nano and rPi are for somewhat divergent markets, however, and Nvidia is really trying to woo the Raspberry Pi Compute Module's market (also using a SO-DIMM slot for the module<->mainboard connection). I do agree there are different aspects of what customers want out of a Jetson Nano vs any rPi/CM. It just seemed like a bit of a missed opportunity to me. After all, the WiFi on all of these Tegra/Jetson kits is not integral to the Jetson/Tegra module; it is usually attached via an M.2 2230 WiFi card (the exception was a single variant of the TX2), and Nvidia didn't even bother to bundle a WiFi card with a developer's kit. Fine, the production (and developer) modules do not have integrated WiFi, but the dev kit should have come with something. It's just another unnecessary hurdle for a $100 kit, IMO.

I do generally agree on your first two points! Either way, I do hope the 2021 Jetson Nano (on their roadmap) resolves these issues. They aren't tied to the I/O placement like the rPi seems to be, so hopefully Nvidia's next dev kit will be a bit more of a desktop vs a pure dev kit. I personally bought into the Jetson Nano because it has (almost) the full Nvidia tech stack, with all of the correct dependencies, libraries, tools, and premade workflows. I have since moved to Linux on the desktop, so maybe it's not as important as it used to be, and even if I go back to Windows, MSFT seems to be integrating all of Nvidia's Linux stack into WSL, too.
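
As a concrete example of what I mean by the stack "just working": on the Nano you can get a toy kernel onto the GPU with very little ceremony. A minimal sketch using Numba's CUDA support (my assumption being that you've installed numba on top of JetPack's CUDA; the array size and launch config are arbitrary):

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    # One thread per element; guard against the tail of the grid.
    i = cuda.grid(1)
    if i < out.shape[0]:
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads = 256
blocks = (n + threads - 1) // threads
vector_add[blocks, threads](a, b, out)  # Numba handles the host<->device copies here

assert np.allclose(out, a + b)
print("ran a", n, "element add on the GPU")
```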

As for VIA, is an x86 ISA still that desirable? Outside of form factors (thanks for miniITX), they just haven't delivered enough performance to matter, and their chips (usually soldered onto boards, and sold as modules or complete Single Board Computers) are not cheap enough to make up for the lack of performance. I'm seriously still surprised they had enough money to fund a small team to create a new CPU (the Centaur CHA).
 
As for VIA, is an x86 ISA still that desirable? Outside of form factors (thanks for miniITX), they just haven't delivered enough performance to matter, and their chips (usually soldered onto boards, and sold as modules or complete Single Board Computers) are not cheap enough to make up for the lack of performance. I'm seriously still surprised they had enough money to fund a small team to create a new CPU (the Centaur CHA).
It's less that they do anything with it, and more that they actually have it. Intel isn't handing them out anymore, after all!
 
Credit to KD5ZXG for finding this:



You do know we have actual shipping products with reviews, right? These things launched in March!

https://www.windowscentral.com/samsung-galaxy-book-s-review

Intel's fastest-clocked Lakefield tops out at 3 GHz, and thanks to their mixed architecture the multi-core clock tops out at 1.8 GHz!


Based on the Geekbench results, compare 3.7 GHz turbo Ice Lake to 3 GHz turbo Lakefield, which drops your single-core result to around 900. The multi-core should be about 60% per Atom core at the same turbo, but the clock speed drops by half, so it's about 30%.

So it should have ~20% higher single thread, and a disappointing-as-fuck full-load result (500 + 300 + 300 + 300 + 300 = 1700 Geekbench!). Thanks Intel!
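
Spelling out that napkin math (every input here is my own rough assumption, not a measurement):

```python
# Back-of-the-envelope only; all numbers are assumptions from the post above.
ICE_LAKE_TURBO = 3.7   # GHz
LAKEFIELD_TURBO = 3.0  # GHz

# Single thread scales roughly with clock, landing Lakefield around 900.
print(f"clock scaling: {LAKEFIELD_TURBO / ICE_LAKE_TURBO:.2f}x")

lakefield_single = 900                     # assumed single-core score from the scaling above
atom_core = lakefield_single * 0.6 * 0.5   # ~60% of a big core, at half the clock -> ~270
big_core_loaded = 500                      # assumed big-core contribution at the 1.8 GHz all-core clock

# Rounding each Atom core up to ~300, as above:
estimate = big_core_loaded + 4 * 300
print(f"Atom core estimate: ~{atom_core:.0f}, full-load total: {estimate}")  # 500 + 4*300 = 1700
```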
 
Yeah, I, uhhh, totally knew that... :whistle:
Nice, those are actually surprisingly good, and dat battery life.

Really for business use on the go, it is quite efficient. (y)
Also, right below that article you found: https://www.windowscentral.com/apple-right-embrace-arm-and-dump-intel-some-its-macbook-laptops

Too bad that rumor is from 2011.

While the new development environment crossover makes this the best time ever for it to happen, we're still a week away from the announcement. Wouldn't be the first rumor Bloomberg got wrong.
 
2:28PM EDT - Confirmed: the Mac is transitioning to Apple's silicon (Arm)

Looks like they were right (for once!)

So this one is all CPU, no GPU acceleration? Might make it more versatile than the previous top contenders.

If it's replacing an Intel laptop CPU, it's likely to come with integrated graphics, with support for PCIe cards (they can't offer something as powerful as AMD without heavily impacting their yields, and they would have to design multiple chips).

I can't imagine Apple would have trouble implementing the same switchable graphics they already have on x86.

Something for the Mac Pro would probably be CPU-only.
 
So this one is all CPU, no GPU acceleration? Might make it more versatile than the previous top contenders.

Yes, all the number crunching is done on the CPU, like the K computer before it.

If it's replacing an Intel laptop CPU, it's likely to come with integrated graphics, with support for PCIe cards (they can't offer something as powerful as AMD without heavily impacting their yields, and they would have to design multiple chips).

I can't imagine Apple would have trouble implementing the same switchable graphics they already have on x86.

Something for the Mac Pro would probably be CPU-only.

longblock454 is talking about the Fugaku supercomputer, not about Apple SoCs.
 
schmide posted right above longblock454 with that info, and I'm pretty sure longblock454 was responding to schmide (post #137). :)


Sorry, had Schmide on my ignore list for some reason.

Probably for something he did years ago, because the man goes on "no-post" sabbaticals. How else do you end up with fewer than 300 posts after more than a decade?
 
New #1 Supercomputer: Fujitsu’s Fugaku and A64FX take Arm to the Top with 415 PetaFLOPs

You guys are slacking with the updates.

Noticing 415k / 148k ~= 28k / 10k ~= 2.8 you really don't get something for nothing and your chicks for free.
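
To spell that out (those are roughly the Top500 Rmax figures in TFLOPS and the reported power draws in kW, Fugaku vs. Summit; exact values are my reading of the June 2020 list):

```python
# Rounded Top500 figures for Fugaku vs. Summit (Rmax in TFLOPS, power in kW).
fugaku_rmax, summit_rmax = 415_000, 148_000
fugaku_power, summit_power = 28_000, 10_000

print(f"performance ratio: {fugaku_rmax / summit_rmax:.2f}")  # ~2.8
print(f"power ratio:       {fugaku_power / summit_power:.2f}")  # ~2.8
```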

As listed above, the unit supports INT8 through FP64, and the chip has an on-board custom Tofu interconnect, supporting up to 560 Gbps of interconnect to other A64FX modules.

I like the Tofu interconnect name.:)
 
32GB of HBM, huh... guess that one is not for branching workloads, especially not at 2.2GHz. Sounds more like they didn't want to pay a GPGPU IP holder for compute.

On another note, at the relative 'bottom end', I've been doing alright with my Pi 4 4GB. Figuring out PoE, I think... biggest obstacle is the fanless enclosure I used. I'll say that as a general purpose desktop device, it works extremely well; and the latency of my Pi 3b+ is getting annoying, despite being rock solid. It'll probably get demoted to monitoring something else.
 
32GB of HBM, huh... guess that one is not for branching workloads, especially not at 2.2GHz. Sounds more like they didn't want to pay a GPGPU IP holder for compute.

Sounds like they designed a machine for real-world HPC based on their experience with the K computer.
 
Sounds like they designed a machine for real-world HPC based on their experience with the K computer.
Yup, likely using ARM vs. something more fixed like a GPU for flexibility. Just a different tradeoff; they're also likely focusing on SIMD throughput, which along with HBM, will give GPGPU-like performance.
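
As a toy illustration of how bandwidth-bound that kind of work is, here's a quick triad-style microbenchmark. NumPy only gives a crude lower bound on whatever box you run it on, and the array size is an arbitrary pick of mine:

```python
import time
import numpy as np

# STREAM-triad-like kernel: a = b + scalar * c. Time is dominated by how fast
# memory can feed the SIMD units, not by the floating-point math itself.
n = 20_000_000
b = np.random.rand(n)
c = np.random.rand(n)
scalar = 3.0

start = time.perf_counter()
a = b + scalar * c
elapsed = time.perf_counter() - start

# At least three arrays of 8-byte doubles are touched (NumPy's temporary adds
# more traffic), so treat this as a rough lower bound.
bytes_moved = 3 * n * 8
print(f"~{bytes_moved / elapsed / 1e9:.1f} GB/s effective bandwidth")
```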
 


https://shop.solid-run.com/product/SRLX216S00D00GE064C04CH/
Based on NXP’s excellent Layerscape LX2160A 16-core Arm Cortex A72 SoC built into a COM Express type 7 module – HoneyComb LX2K is a cutting-edge Mini ITX platform for heavy duty network computing!

[Image: ClearFog-CX-LX2K-layout.png]

https://openbenchmarking.org/result/1909169-JONA-190709812
 
It wouldn't surprise me if Freescale ditched PowerPC completely for ARM. They haven't released a new core since the e6500, and I'm sure that A72 is higher-performance.

We decided to use the e6500 for the EA-18G hardware upgrade (so we could retain AltiVec code and avoid the mess of switching to x86), but I can see everyone who needs high performance switching to ARM.
 

Just a little sanity check. The metric here is the cost of a privately built, non-public system. Many things could be at play: artificially set low, low demand, high availability, etc.

I really did like that write-up, though. They scoped out all the variables (compilers, memory, sockets) and how they affect, or could affect, the results.
 
Just a little sanity check. The metric here is the cost of a privately built, non-public system. Many things could be at play: artificially set low, low demand, high availability, etc.

GR2 is cheaper to produce (fewer transistors, cheaper validation) and to run (lower TDP).
 