https://www.servethehome.com/cavium-thunderx2-review-benchmarks-real-arm-server-option/
we have an Arm chip that can go toe-to-toe with Intel and AMD and come out ahead in some cases. Best of all, the list price of the 32 core top-bin CN9980 part is $1795, about half of the competitive Intel and AMD chips.
The fact that they also managed this with 50% fewer cores means they've solved their single-threaded performance issues (or the threading is much enhanced).
I'm now a little more interested in how well they fix things in ThunderX 2. Too bad we'll have to wait two years for benchmarks.
Well, that was because XGene always treated their chip as some kind of a science project with no future. And we all know how cobbled-together the AMD Seattle platform was (great software effort, half-assed hardware).
Well, two years ago, some of us at RWT were predicting that Intel and AMD would have lots of trouble with 16/14nm ARM servers. Back then only the 28nm AMD Opteron Seattle and the 40nm XGene-1 were available, and lots of geniuses at RWT told us that it wasn't happening: "ARM cannot scale up", "maybe for microservers, but no one will beat Xeon",...
They turned out to be wrong, very wrong.
How does he know?
So Intel can tune up STREAM Triad a bit better than Cavium can on the Xeons
ThunderX2 has already been benchmarked by third parties and SPEC scores are in the expected range.
Cavium is quoting internal benchmarks it has run but not yet submitted to SPEC against Intel results that have been submitted, which is not exactly kosher but we have to get the data we can get
No. The Gold 6148 used in the SPEC comparison has two AVX-512 units.
As for floating point math, the custom Armv8 cores in the Vulcan chips have a pair of 128-bit NEON math units, and the Xeon SP Gold chips have a 512-bit AVX-512 unit
In my opinion, better results will come from using Cray's optimized compiler (CCE), which already provides a boost of more than 20% over the GCC compiler in several dozen HPC workloads.
On the SPEC floating point test, the ThunderX2 can beat the Intel chips using GCC compilers, but Intel pulls ahead on its own iron using its own compilers by about 26.5 percent over the ThunderX2 using GCC compilers. The important thing is that Cavium is working with Arm Holdings, which now owns software tools maker Allinea, to create optimized compilers that goose the performance of integer and floating point jobs by around 15 percent, which will put ThunderX2 ahead on integer performance (for these parts anyway) and close the gap considerably on floating point (with about a 10 percent gap still to the advantage of Intel).
I took the average of the single-thread SPEC results and normalized for the different clocks: 3.8GHz vs 2.5GHz. If I didn't make any mistake, the TX2 core gives 99% of the IPC of the Skylake core.
More benches, this time at Anandtech. More praise for the platform, but they're withholding the power consumption figures until they have a shipping system to test on (power management broke).
I like the threaded SPEC performance analysis. Those extra threads can really help, depending on the load.
Just wish the fuckers had put the results on the same page, instead of splitting them up for ad views.
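For anyone who wants to redo the clock normalization mentioned above (average the single-thread SPEC results, then scale for 3.8GHz vs 2.5GHz), here is a minimal sketch of the arithmetic in Python. The scores are made-up placeholders, not the actual SPEC numbers, and `relative_ipc` is just a name picked for illustration. It assumes 3.8GHz is the Xeon and 2.5GHz is the TX2, and that both chips really held those clocks for the whole run, which may not be true with turbo.

```python
def relative_ipc(score_a, ghz_a, score_b, ghz_b):
    """Per-clock performance of chip A relative to chip B (1.0 means equal)."""
    return (score_a / ghz_a) / (score_b / ghz_b)

# Placeholder scores chosen only to illustrate the formula (NOT real results).
tx2_score, tx2_ghz = 25.0, 2.5   # assumed: TX2 at 2.5GHz
skl_score, skl_ghz = 38.0, 3.8   # assumed: Skylake at 3.8GHz

ratio = relative_ipc(tx2_score, tx2_ghz, skl_score, skl_ghz)
print(f"TX2 per-clock performance is {ratio:.0%} of Skylake")
```

Plug in the real averaged scores to check the 99% figure.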
Well yes, I thought they might be jumping to conclusions there at El Reg. But it doesn't counter the fact that the chief engineer left. That will probably slow down progress on Centriq v2.0, while they find a new dream team.
Qualcomm NOT leaving ARM servers
The Centriq v2 design is almost finished, so the chief engineer leaving is not a problem. The Centriq v3 design is canceled and will be replaced by a new core designed by the Qualcomm CDMA Technologies unit.
Well yes, I thought they might be jumping to conclusions there at El Reg. But it doesn't counter the fact that the chief engineer left. That will probably slow down progress on Centriq v2.0, while they find a new dream team.
Mind you, the Thunder X2 team is in the same boat, since they bought this new design from Broadcom. Nobody seems to be willing to stick it out in the ARM server chip design market.
Customers such as SmugMug have already moved away from x86. They get similar performance with ~40% cost savings.
4x4 A72 cores with no custom L3? 2014 called, and wants its phone back (there is only a 10% IPC difference between A57 and A72).
You can tell it's no more than a science project when they didn't even target modern cores. The A75 has been in shipping products for a year now, and the lack of it shows under real benchmarks:
The funny thing is, Amazon is a big enough cloud provider that they could actually benefit from a real effort building their own cutting-edge ASIC. But at this rate of disinterest by Amazon, that's five to ten years away.
Arm's new Neoverse platforms
Seems interesting and a good path forward.
32bit ARM server -- opening move
40nm 64bit ARM server -- move
28nm 64bit ARM server -- move
16nm 64bit ARM server -- check
10nm 64bit ARM server -- check
7nm 64bit ARM server -- checkmate
This 64C Neoverse offers scalar performance similar to 64C Rome, but at about half the power: ~100W vs ~200W. And Intel Cascade Lake and Copper Lake will require ~400W to get that level of performance.
And a 128C Neoverse is in the pipeline
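Taking the power figures above at face value (~100W, ~200W and ~400W for roughly the same scalar throughput), the perf-per-watt gap is just the inverse of the power ratio. A rough sketch in Python. The dictionary keys and `efficiency_advantage` are names made up for illustration, and the wattages are the approximate numbers quoted above, not measurements.

```python
# Approximate power (W) needed for similar scalar performance, as quoted above.
power_w = {
    "Neoverse 64C": 100,
    "Rome 64C": 200,
    "Cascade Lake / Copper Lake": 400,
}

def efficiency_advantage(baseline, other, power=power_w):
    """How many times more performance per watt `baseline` delivers than
    `other`, assuming both reach the same performance level."""
    return power[other] / power[baseline]

for chip in ("Rome 64C", "Cascade Lake / Copper Lake"):
    adv = efficiency_advantage("Neoverse 64C", chip)
    print(f"Neoverse 64C vs {chip}: {adv:.1f}x perf/W")
```

This only holds if the "similar performance" premise does. Any gap in throughput scales the result by the same factor.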
I thought they already ported CUDA to ARM to support ThunderX years ago?
By end of year CUDA on ARM will have the same status as CUDA on x86 or POWER
That is excellent news, I've been wanting CUDA-based applications for ARM for a long time now.
By end of year CUDA on ARM will have the same status as CUDA on x86 or POWER