AnandTech tests Calxeda's 24 node (96 core) ARM server

pxc

http://anandtech.com/show/6757/calxedas-arm-server-tested

The main contenders:
Calxeda 24 nodes, each with a quad core Cortex A9 (ECX-1000) @ 1.4GHz, 96GB RAM (4GB x 24, single channel per node)

Dell PE R720 dual socket Xeon E5-2660 (8 cores per chip) @ 2.2GHz, 96GB RAM
(other CPUs are compared, see the link)

As Johan notes, Calxeda's server is more of a cluster than a typical rack server. The performance is interesting. In workloads that it targets, it performs well against the dual Xeon E5-2660 Dell server, beating it in web server tests in both response time and throughput for various loads. Power consumption is 33% lower than the Xeon server under average and peak load (idle power is nearly equal between the ARM and Xeon servers). And this is just with Cortex A9 chips. A15 and A50 should do even better in the future, although using significantly more power.

Sounds great, but then there's the price: $20,000 for the Calxeda 24 node server vs about $8000 for the Dell dual Xeon E5-2660 server. However, if filling a whole rack (and this server is intended for that usage), the Calxeda servers come out to around $8500 each.
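As a rough illustration, the prices quoted above can be broken down per node and per core. This is only a sketch: the dollar figures are the approximate ones from this post, the core counts come from the spec list at the top, and the helper name `cost_per_unit` is made up for the example. A Cortex A9 core and a Xeon core are obviously not equivalent, so the per-core numbers say nothing about performance.

```python
# Approximate prices quoted in this post (USD); not official list prices.
CALXEDA_SINGLE = 20_000   # one 24 node server bought on its own
CALXEDA_RACK = 8_500      # per server when filling a whole rack
DELL_R720 = 8_000         # dual Xeon E5-2660 server

CALXEDA_NODES = 24
CALXEDA_CORES = CALXEDA_NODES * 4   # quad core Cortex A9 per node
DELL_CORES = 2 * 8                  # two sockets, 8 cores per chip


def cost_per_unit(price, units):
    """Hypothetical helper: price divided by unit count, rounded to cents."""
    return round(price / units, 2)


rows = [
    ("Calxeda (single), per node", CALXEDA_SINGLE, CALXEDA_NODES),
    ("Calxeda (single), per core", CALXEDA_SINGLE, CALXEDA_CORES),
    ("Calxeda (rack), per node", CALXEDA_RACK, CALXEDA_NODES),
    ("Calxeda (rack), per core", CALXEDA_RACK, CALXEDA_CORES),
    ("Dell R720, per core", DELL_R720, DELL_CORES),
]

for label, price, units in rows:
    print(f"{label}: ${cost_per_unit(price, units):.2f}")
```

Even at the rack price, each ARM node works out to a few hundred dollars, far above the roughly $20 a comparable handheld chip would sell for.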

Notice that "dirt cheap" ARM processors have close to zero advantage at server-level pricing. Those ECX-1000 chips, if they had GPUs and other things needed in a handheld or tablet processor, would sell for around $20. A Xeon E5-2660 lists for around $1300. Scale makes a huge difference here.

If Intel weren't replacing the aging Atom and coming out with a similar very low power Atom processor for high density server nodes based on the new uarch, this would be very alarming. The 2 pages of simple benchmarks including a dual core Atom are kind of interesting in this light. The ECX-1000 memory bandwidth is very poor and the dual core Atom N2800 has a healthy lead over the quad core A9 based ECX-1000. It should be possible for Silvermont (or later) Atoms to remain power and performance competitive against future Cortex A50 server designs. The 32nm Atom N2800 used in the comparison, though, is a bit of a power hog, relatively speaking, although not as bad as Johan suggests. (Not the same workloads, but 8.5W per ARM node vs 12W for an Atom system isn't too far off.)
 
Impressive numbers for the webserver part, disappointing everywhere else, but good power numbers too.

I wonder how the A15 and the 64-bit ARMv8 will do.

Well it might not look bad because you only see 3.5W, but that's roughly a 40% increase :p
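For reference, the gap between the two power figures quoted in the first post (8.5W per ARM node, 12W for the Atom system) can be worked out directly. A minimal sketch, assuming those two numbers are comparable at all, which the original post itself cautions they are not (different workloads); the function name is just for illustration.

```python
def percent_change(baseline, value):
    """Relative change of `value` against `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


ARM_NODE_WATTS = 8.5      # per Calxeda node, as quoted in the thread
ATOM_SYSTEM_WATTS = 12.0  # Atom N2800 system, as quoted in the thread

delta = ATOM_SYSTEM_WATTS - ARM_NODE_WATTS
print(f"Difference: {delta:.1f} W")
print(f"Atom relative to ARM: {percent_change(ARM_NODE_WATTS, ATOM_SYSTEM_WATTS):+.1f}%")
print(f"ARM relative to Atom: {percent_change(ATOM_SYSTEM_WATTS, ARM_NODE_WATTS):+.1f}%")
```

Which baseline you pick matters: the same 3.5W is about 41% more when measured against the ARM node, but about 29% less when measured against the Atom system.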
 
64-bit ARM is going to kick some ass. I just hope that we see an ecosystem spring up of standard form factors. I hope AMD helps facilitate this.
 
pxc, don't judge it by its price too much. That's Calxeda charging a fortune for being the only one in the market. The chips themselves are absolutely tiny and on an older more mature node with less complex PCBs and features making them cheaper than the Intel parts. The prices should drop considerably once Samsung, Qualcomm, AMD, Marvell and nVidia enter the market.

Bear in mind these are bog standard ARM SoCs that were designed primarily for smartphones. The A57/A53 64-bit cores are the first real ARM chips that will target the server market.
 
"Bear in mind these are bog standard ARM SoCs that were designed primarily for smartphones."
Not really "bog standard", but it is almost certainly based on a standard licensed Cortex A9 core. http://www.calxeda.com/technology/products/processors/ecx-1000-series/

The A57, ARM's first "server chip", has been described by the company as a server, tablet and phone processor. Clock frequency and inclusion of big.LITTLE will probably separate the mobile and server segments using the A57 model, not a separate uarch for mobile and servers. The Cortex A53 is targeted at Cortex A9 performance levels, but 40% smaller and using less power. IOW, there really isn't a "server" design even on announced future products.
 
pxc, ARM doesn't make 'specific' cores or SoCs. I think you're misinterpreting their business model here...

What ARM does is license IP. That's pretty much it. What people do with that IP is up to them (there are certain limitations, but the ISA is fair game). For example, a company can make a 20-core ARM SoC utilizing A7s. That's not ARM's SoC, but rather just ARM's standard A7 core design. In order to cater to the multitude of markets that ARM cores find themselves in (everything from routers and SSDs to tablets and smartphones), they have to make a core that's very much vanilla. It has to perform well in whatever role a potential customer puts it to. Whenever you see A9 or A7 or A#, it's very much a bog standard ARM core.

The A57 is also a bog standard ARM core. It has to be. It's up to companies like Qualcomm or Marvell to tinker with it to fit their specific needs. For example, Qualcomm might widen the front end and increase IPC at the cost of higher power consumption because they'd only use two of them in a smartphone SoC, but Marvell would go in the opposite direction for embedded devices.

The lack of 64-bit support is what's been holding back ARM processors from gaining significant market share in the server market. RAM is almost always the first bottleneck with most servers, and limiting your cores to 32-bit will certainly limit the number of clients you can appeal to =P Even with the current 32-bit limit, ARM actually has a higher market share than AMD's Opteron division.
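To put a number on the RAM point: a flat 32-bit address can reach 2^32 bytes, which is 4 GiB, and that lines up with the 4GB per node in the Calxeda configuration listed at the top of the thread. A minimal sketch, assuming plain byte addressing and no extension schemes; the function name is made up for illustration.

```python
def addressable_gib(address_bits):
    """GiB reachable with a flat, byte-addressed space of the given width."""
    return 2 ** address_bits / 2 ** 30


for bits in (32, 40, 64):
    print(f"{bits}-bit addressing: {addressable_gib(bits):,.0f} GiB")
```

The 40-bit row is there because the Cortex A15 adds 40-bit physical addressing (LPAE), which raises the per-node ceiling without going fully 64-bit, although each process still sees a 32-bit virtual address space.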
 
The biggest problem I see is that while this is cool, nobody is working on it, even though there has been talk about "Server ARM" for about 4-5 years.

Price is outrageous.
There is no Windows Server + Stack for it
OS/Server Stacks are probably another 5+ years away from being actually optimized. Re-compiling an application for this type of architecture is not optimizing.
 
I think all these things were considered by Calxeda.

1. The single server price is crazy at $20,000, but more reasonable at $8500 per server by the rackful. Overall costs, if utilization is high at least, could be significantly lower with the ARM cluster over the long term (AT's tests show 33% lower power consumption at high load).

2. These types of servers are unlikely to replace traditional office/business server roles. They're made for cloud servers or some type of physicalization-able tasks where host OS doesn't matter much. The stack, when used, will almost certainly be LAMP-ish (someone want to make a new acronym for *nix, Apache, Some SQL/SQL-ish server, Some scripting language ;)).

3. Not all applications will be suitable for these types of servers. I've been pointing that out for a long time. These are high density servers made for data centers, where the bottlenecks could primarily be things other than CPU performance. Not servers that will replace your Windows domain controller, mail server, file server, etc... at least not until a lot more is done for compatibility with clients, Windows security and applications.

ARM certainly has been hyping up servers for years, and the first CPU (Cortex A57) deemed suitable for servers (and tablets and handhelds) isn't coming out in production until next year. Calxeda admits in the marketing materials that ARM servers are in the "early adopter" phase.
 
Yep.

Will have to see what unfolds over the next few years. At least this is a first stepping stone.
 
http://www.fudzilla.com/home/item/30947-arm-and-tsmc-tape-out-first-cortex-a57-chip

ARM and TSMC have taped out the first Cortex A57 processor based on ARM’s next-gen 64-bit ARMv8 architecture.
The all new chip was fabricated on TSMC’s equally new FinFET 16nm process. The 57 is ARM’s fastest chip to date and it will go after high end tablets, and eventually it will find its place in some PCs and servers as well.

Furthermore the A57 can be coupled with frugal Cortex A53 cores in a big.LITTLE configuration. This should allow it to deliver relatively low power consumption, which is a must for tablets and smartphones. However, bear in mind that A15 cores are only now showing up in consumer products, so it might be a while before we see any devices based on the A57.

In terms of performance, ARM claims the A57 can deliver a “full laptop experience,” even when used in a smartphone connected to a screen, keyboard and mouse wirelessly. It is said to be more power efficient than the A15 and browser performance should be doubled on the A57.

It is still unclear when we’ll get to see the first A57 devices, but it seems highly unlikely that any of them will show up this year. Our best bet is mid-2014, and we are incorrigible optimists. The next big step in ARM evolution will be 20nm A15 cores with next-generation graphics, and they sound pretty exciting as well.
 
Why are the better chips being fabricated on a 16nm process
"The all new chip was fabricated on TSMC’s equally new FinFET 16nm process"
while the lower end chips are expected to be fabricated at 20nm?
"The next big step in ARM evolution will be 20nm A15 cores with next-generation"
From my understanding of the fabrication process, the more complex the chip is, the harder it is to fabricate on a smaller process, which is why memory/flash fabrication has always led the way in terms of smallest process size.

Am I missing/misunderstanding something?
 
Yes, but the difference in complexity between an A15 and an A57 isn't drastic. In fact, you can make the assumption that no core is inherently any more difficult than another core with respect to fabrication. The GPU portion of an SoC (or APU/CPU) is much more difficult due to the density and complexity.

TSMC's 16nm process is a 20nm BEOL with FinFETs. GloFo is taking the same approach with their 14nm-XM process.
 
Okay, thanks for clearing that last bit up.
 