ARM server status update / reality check

Discussion in 'All non-AMD/Intel CPUs' started by pxc, Mar 11, 2016.

  1. defaultluser

    defaultluser I B Smart

    Messages:
    11,902
    Joined:
    Jan 14, 2006
    I do like seeing Intel begin to get their ass kicked. After EPYC was just marginal, this is the first sign of real competition from anyone :D
     
    Red Falcon likes this.
  2. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    Messages:
    2,219
    Joined:
    Feb 22, 2017
  3. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    ThunderX2 has been reviewed and benchmarked. It is still running pre-production silicon and firmware, but the results are already impressive:

    https://www.servethehome.com/cavium-thunderx2-review-benchmarks-real-arm-server-option/

    Cavium is still working on compiler optimizations for this microarchitecture. When compiled with GCC and standard flags, ThunderX2 runs circles around the best Xeons and EPYC in both integer performance and memory bandwidth.

    [Chart: Cavium ThunderX2 SPEC Int Rate (peak), GCC 7]

    [Chart: Cavium ThunderX2 STREAM Triad, GCC 7]

    ThunderX2 is also being evaluated by the HPC community and will be part of several supercomputers. The high memory bandwidth is key for memory-bound HPC code.
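    For context on the Stream Triad chart: STREAM's Triad kernel computes a[i] = b[i] + q*c[i] over arrays far larger than cache, and STREAM credits each iteration with 24 bytes of traffic (read b, read c, write a, 8-byte doubles each), so the score mostly reflects how fast memory can feed the cores. A minimal Python sketch of the kernel and that byte accounting follows. It only illustrates the arithmetic; pure Python is far too slow to measure real memory bandwidth (the actual benchmark is C/Fortran, usually with OpenMP), and the array sizes and timings below are hypothetical, not ThunderX2 results.

```python
from array import array


def triad(a, b, c, q):
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n, word_bytes=8):
    """Bytes STREAM counts for one Triad pass: read b, read c, write a."""
    return 3 * word_bytes * n


def bandwidth_gb_s(n, seconds, word_bytes=8):
    """Reported bandwidth in GB/s (10**9 bytes per second)."""
    return triad_bytes(n, word_bytes) / seconds / 1e9


if __name__ == "__main__":
    n = 1_000
    a = array("d", [0.0] * n)
    b = array("d", [1.0] * n)
    c = array("d", [2.0] * n)
    triad(a, b, c, 3.0)
    print(a[0])  # 1.0 + 3.0 * 2.0 = 7.0
    # Hypothetical timing: a 10**8-element pass taking 0.5 s (not a measurement)
    print(bandwidth_gb_s(10**8, 0.5))
```

    Because the byte count is fixed per element, a chip with more memory channels finishes the pass sooner and reports a higher figure, which is why Triad tracks memory bandwidth rather than core speed.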
     
  4. defaultluser

    defaultluser I B Smart

    What a difference two years makes! Intel has MOSTLY stood still, and Cavium now has a competitive product. The wait was definitely worth it :D

    The fact that they also managed this with 50% fewer cores means they've solved their single-threaded performance issues (or the threading is much enhanced).

    EDIT: yeah, 4-wide issue plus 4 threads per core will do it, but the massive number of threads means threading needs to be optimized for each type of workload, so the threads don't step on each other.

    It has a lot of potential for improved threading efficiency through compiler optimization, and even without that it's quite impressive.
     
    Last edited: May 10, 2018
  5. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    Well, two years ago some of us at RWT were predicting that Intel and AMD would have lots of trouble with 16/14nm ARM servers. Back then only the 28nm AMD Opteron Seattle and the 40nm XGene-1 were available, and lots of geniuses at RWT told us that it wasn't happening: "ARM cannot scale up", "maybe for microservers, but no one will beat Xeon",...

    They turned out to be wrong, very wrong.
     
  6. defaultluser

    defaultluser I B Smart

    Well, that was because XGene always treated their chip as some kind of a science project with no future. And we all know how cobbled-together the AMD Seattle platform was (great software effort, half-assed hardware).

    When you don't put in the effort, people don't take you seriously. They were right to poke fun at the projects, but I agree there was a lot of undeserved poking at the ARM server effort. People Fear Change, even though change is good :(

    ThunderX was the first platform to take ARM on servers seriously. I mentioned they were the only one to support CUDA back in 2016. That's kinda important if you want to do any sort of large-scale simulations or content creation. Those APIs don't write themselves :D

    Dual-socket means they're competitive on the most important server configurations out there (quad is not as popular).

    Once Qualcomm learns how to make a dual-socket system, the ARM invasion will officially begin. Their cores are already fairly impressive, so they just need this small addition.
     
    Last edited: May 10, 2018
  7. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    XGene was made on an ancient 40nm node, and its goal was more development and testing than real production workloads. Back then we said that the ARM servers that would give Intel trouble were those made on 14/16nm nodes: i.e. K12, Vulcan, and XGene3. The 10nm Centriq wasn't even a rumor then.

    K12 was canceled. Vulcan was sold and is now Cavium ThunderX2. And XGene3 was sold and is now Ampere A1.

    I expected Vulcan (now ThunderX2) to be competitive because we knew it was a 4-wide core with SMT4, a 180-instruction ROB, 3ALU+3MEM and a ~3GHz target on the 16FF node, but its performance has really impressed me. I didn't expect it to be this good. It makes me wonder whether it would have beaten AMD K12, which I expected to be faster than EPYC.
     
  8. defaultluser

    defaultluser I B Smart

    K12 was cancelled for good reason. At 8 native cores (i.e. Seattle with K12 instead of A57) it would not have been enough to bury AMD's server woes anytime soon, and would have delayed Zen another year.

    I don't feel that AMD would have been fast enough to stand out from their other ARM competitors. When you're ARM, you really have to overwhelm Intel AND YOUR OTHER ARM COMPETITION to get the design win. Qualcomm offers much higher perf/watt, and ThunderX2 offers similar performance for half the price. Both do this without NUMA issues.

    At least with Ryzen they can play pretend in the server market again, because some people are willing to deal with the NUMA mess (or treat each node as a separate server) if they can get their x86 performance for cheap.

    What AMD really needs is a native 32-core chip that bypasses the NUMA mess. That will be big and expensive to develop no matter what the architecture.
     
    Last edited: May 11, 2018
  9. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    K12 was a 32-core design that targeted the same high-performance servers as Zen. Seattle, aimed at microservers, was initially designed as a 16-core system but was finally released only as an 8-core part, replacing the Jaguar Opteron line.

    The problem with AMD canceling K12 due to competition is that they still have to fight both Intel and the ARM competition! If EPYC had launched a couple of years ago, when the ARM ecosystem was in its infancy, it could have gained more traction, but now the ARM ecosystem is mature enough that companies can switch away from x86. Indeed, several companies have migrated from Xeon to ARM servers, ignoring EPYC. During the ThunderX2 server presentation, Microsoft reiterated that it wants more than 50% of its servers powered by ARM.



    Yes, AMD needs a 32-core die, but it is not happening. I expect a 12-core or a 16-core die for Zen2 on 7nm and again MCM2 and MCM4 configurations.
     
  10. defaultluser

    defaultluser I B Smart

    From what I can gather, both Seattle and K12 were supposed to use the same custom core, the way Skylake and Skylake-X share one. So it would be believable if Seattle was supposed to be 16-core and K12 32-core.

    Unfortunately, AMD didn't have the engineering talent to pull it off and release these products before similar chips came out from better-situated companies. Cavium is ONLY developing ThunderX, so they can put all their engineering effort behind that. And Qualcomm is forever pushed to justify its existence by improving processor performance against stock ARM cores and Apple, and that experience flows into their server parts.

    At least if they compete in the x86 world, they have a chance against a sleeping Intel. Apple/Qualcomm is all you need to look at to find the drive behind unstoppable architectural development, something AMD has never consistently done before. (I can't make this judgement about Cavium yet; it's too early in their lifetime. But they are on the way to becoming a third unstoppable ARM maker.)

    For CPUs, both AMD and Intel get complacent when they have the lead. This works fine if all they have is each other, but that complacency means they lack the deep-seated drive of other more successful companies, which will slowly kill both of them once ARM is introduced.

    Nintendo shows how easy it is to switch (har) the consoles to ARM if price/performance/efficiency is right. They've been doing it since the Game Boy Advance, and now ARM is all grown up in the Switch. AMD only won the PS4/Bone because it had no high-end competition in the custom market. But the smart-device revolution has changed that.

    Imagine how much easier it will be to sell to console makers when Qualcomm attaches their server core to a powerful custom Adreno. It's only a matter of time before these markets are opened up to a half-dozen custom-core competitors. Unless AMD fixes its inconsistency, it will fall behind better-run companies.
     
    Last edited: May 11, 2018
  11. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    Seattle uses A57 cores and was going to be socket-compatible with a Puma+ based microserver chip.


    K12 was going to be socket compatible with Zen


    But all those ARM-x86 plans were canceled and Keller left.

    Both Microsoft and Sony wanted ARM in the consoles. AMD won the PS4/Bone because back then ARM was only 32-bit. There was an evaluation of x86-64 vs ARM32 prototypes, and in the end Microsoft and Sony decided to go the x86 route, which left Nvidia out of the competition.
     
  12. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    More benches and info for ThunderX2

    https://www.nextplatform.com/2018/05/16/getting-logical-about-cavium-thunderx2-versus-intel-skylake/

    Interesting article, but with some mistakes, such as when it mentions the Xeon used in the SPEC comparison. The model used was the 6148 (not the 6140), and its base clock is 2.4GHz (not 2.5). Also, I am not sure what the author means by "only 27.5 MB of cache on the die activated": the Xeon model has the amount of cache that corresponds to 20 cores, i.e. 1.375MB per core, just like the rest of the Skylake models. This is more than what ThunderX2 has: 1MB per core.

    How does he know?

    ThunderX2 has already been benchmarked by third parties and SPEC scores are in the expected range.

    No. The Gold 6148 used in the SPEC comparison has two AVX-512 units.

    In my opinion the better results will come from using the Cray optimizing compiler (CCE), which already provides boosts of more than 20% over GCC in several dozen HPC workloads.

    On the HPC results mentioned at the end, the Xeon systems used are E5-2695 v4 and Gold 6152. I guess the Skylake results are using ICC, whereas the ThunderX2 results use a mixture of GCC and Cray compilers.

    For many server workloads ThunderX2 is able to match or beat Skylake at nearly half the price. For many HPC workloads ThunderX2 provides about 85 percent of the performance but with "42 percent better performance per dollar".
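    The two HPC figures quoted above also imply a price ratio. Performance per dollar is performance divided by price, so if (perf_TX2 / price_TX2) / (perf_SKL / price_SKL) = 1.42 and perf_TX2 / perf_SKL = 0.85, then price_TX2 / price_SKL = 0.85 / 1.42, which is roughly 0.60. A small Python sketch of that arithmetic, assuming both quoted numbers refer to the same pair of systems (the article's exact configurations may differ):

```python
def implied_price_ratio(relative_perf, perf_per_dollar_gain):
    """Price of system A relative to system B.

    relative_perf:        perf_A / perf_B (0.85 for "85 percent of the performance")
    perf_per_dollar_gain: (perf_A / price_A) / (perf_B / price_B) - 1 (0.42 for "42 percent better")
    """
    # (perf_A/perf_B) * (price_B/price_A) = 1 + gain  =>  price_A/price_B = relative_perf / (1 + gain)
    return relative_perf / (1.0 + perf_per_dollar_gain)


ratio = implied_price_ratio(0.85, 0.42)
print(f"implied price ratio: {ratio:.2f}")  # 0.85 / 1.42, printed as 0.60
```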
     
    Last edited: May 20, 2018
  13. defaultluser

    defaultluser I B Smart

    More benches, this time at Anandtech. More praise for the platform, but they're withholding the power consumption figures until they have a shipping system to test on (power management was broken on their sample).

    https://www.anandtech.com/show/12694/assessing-cavium-thunderx2-arm-server-reality

    I like the threaded SPEC performance analysis. Those extra threads can really help, depending on the load.

    Just wish the fuckers had put the results on the same page, instead of splitting them up for ad views.
     
    Last edited: May 23, 2018
  14. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    I took the average of the single-thread SPEC results and normalized for the different clocks: 3.8GHz for Skylake vs 2.5GHz for TX2. If I didn't make any mistake, the TX2 core delivers 99% of the IPC of the Skylake core.

    Not bad not bad. (y)
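    The clock normalization above works like this: per-clock performance (a rough stand-in for IPC) is score divided by clock, so the IPC ratio is (score_TX2 / 2.5) / (score_SKL / 3.8), i.e. the raw score ratio multiplied by 3.8 / 2.5 = 1.52. For roughly 99% IPC parity, the TX2 core only needs about 0.99 / 1.52, or roughly 65%, of the Skylake core's single-thread score. The sketch below assumes a geometric mean across SPEC subtests (SPEC's usual convention; the post just says "average"), assumes both cores hold the quoted clocks for the whole run, and uses hypothetical placeholder scores, not the Anandtech numbers.

```python
from math import prod


def geomean(values):
    """Geometric mean, the usual way SPEC aggregates per-benchmark ratios."""
    return prod(values) ** (1.0 / len(values))


def ipc_ratio(scores_a, clock_a_ghz, scores_b, clock_b_ghz):
    """Per-clock performance of core A relative to core B.

    Takes the geometric mean of the per-benchmark score ratios A/B, then
    rescales by the clock ratio so A's lower clock is not counted against it.
    """
    score_ratio = geomean([a / b for a, b in zip(scores_a, scores_b)])
    return score_ratio * (clock_b_ghz / clock_a_ghz)


if __name__ == "__main__":
    # Hypothetical single-thread subtest scores, NOT published results.
    tx2 = [26.0, 19.5, 32.5]
    skylake = [40.0, 30.0, 50.0]
    print(f"IPC ratio: {ipc_ratio(tx2, 2.5, skylake, 3.8):.3f}")
```

    The geometric mean keeps one outlier subtest from dominating the average, which matters when a handful of SPEC tests favor one compiler or cache hierarchy.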
     
  15. defaultluser

    defaultluser I B Smart

    Well, that didn't take long. Qualcomm may end up ditching servers entirely. And even if they don't, the creator has already left the team, so progress will hit a wall.

    https://www.theregister.co.uk/2018/05/24/qualcomm_snapdraon_710/

    It's amazing how short-sighted big-company management can be. It's barely been on the market a year, and they forget that you have to be dedicated to build a market. Not to mention that they still need to add a 2-socket solution if they want to be taken seriously.

    But Qualcomm has gotten used to being #2 behind Apple and in a race for third place with ARM themselves, and they are in no hurry to change this. Guess it's back to business-as-usual.
     
    Last edited: Jun 10, 2018
  16. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

  17. defaultluser

    defaultluser I B Smart

    Well yes, I thought they might be jumping to conclusions there at El Reg. But it doesn't counter the fact that the chief engineer left. That will probably slow down progress on Centriq v2.0, while they find a new dream team.

    Mind you, the ThunderX2 team is in the same boat, since Cavium bought this new design from Broadcom. Nobody seems willing to stick it out in the ARM server chip design market.
     
    Last edited: Jun 13, 2018
  18. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

    The Centriq v2 design is almost finished, so the chief engineer leaving is not a problem. The Centriq v3 design is canceled and will be replaced by a new core designed by the Qualcomm CDMA Technologies unit.
     
  19. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

  20. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI

  21. juanrga

    juanrga Pro-Intel / Anti-AMD Just FYI
