Intel Officials Allegedly Say that Apple Could Move to ARM Soon

It seems you don't understand that you can port any code from x86 to ARM. The reason you have mobile-style apps on a phone or tablet is that it is a mobile device, not that it is built on ARM.

Yes, with enough work it can be done, but it's not easy and takes many man-hours to port something. It isn't free.

Oh, and you can't just port some code without some abstraction functions. For example, NEON does not have a movemask instruction; it takes about 10 dependent instructions to implement one.
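
To make that concrete, here is a minimal sketch (my own illustration, not anyone's production code) of one common way to emulate SSE2's _mm_movemask_epi8 with NEON intrinsics. The exact instruction count depends on the compiler and target, but it lands in the ballpark of the ten dependent instructions mentioned above.

#include <arm_neon.h>
#include <stdint.h>

/* Emulates SSE2's _mm_movemask_epi8 on NEON.
   Input: a comparison result where every byte lane is 0x00 or 0xFF.
   Output: a 16-bit mask with one bit per byte lane. */
static inline uint16_t neon_movemask_u8(uint8x16_t input)
{
    /* Keep only the bit each lane contributes to the final mask. */
    static const uint8_t weights[16] = { 1, 2, 4, 8, 16, 32, 64, 128,
                                         1, 2, 4, 8, 16, 32, 64, 128 };
    uint8x16_t masked = vandq_u8(input, vld1q_u8(weights));

    /* Three pairwise add-long steps collapse the 16 bytes into two
       64-bit sums: lanes 0-7 end up in lane 0, lanes 8-15 in lane 1. */
    uint16x8_t sum16 = vpaddlq_u8(masked);
    uint32x4_t sum32 = vpaddlq_u16(sum16);
    uint64x2_t sum64 = vpaddlq_u32(sum32);

    return (uint16_t)(vgetq_lane_u64(sum64, 0) |
                      (vgetq_lane_u64(sum64, 1) << 8));
}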

Every platform has its own ecosystem, but there are tools to facilitate overlap. The LLVM compiler is a great example.
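
As a small illustration of that overlap (a sketch; the exact target triples below are assumptions and depend on your installed toolchains and sysroots), the very same C source builds for either ISA just by switching clang's --target flag:

/* portable.c - one source, two ISAs, e.g.:
 *   clang --target=x86_64-linux-gnu  -O2 portable.c
 *   clang --target=aarch64-linux-gnu -O2 portable.c
 */
#include <stdio.h>

int main(void)
{
    /* Predefined compiler macros identify the target architecture. */
#if defined(__aarch64__)
    puts("compiled for ARM64");
#elif defined(__x86_64__)
    puts("compiled for x86-64");
#else
    puts("compiled for something else");
#endif
    return 0;
}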

Nothing is going to happen overnight. It will take years, if not decades, to go from market entry to market capture.

Desktop workstations, developer kits, servers, and supercomputers based on ARM aren't running mobile apps.

E.g. scientists using Isambard are running high-performance code to solve the compressible Navier-Stokes equations for shock/boundary-layer interactions. And they are doing that on ARM hardware because the code runs faster than on any Broadwell or Skylake Xeon.

Source on that custom code being faster on ARM?
 
What are you on?

It seems you don't understand that you can port any code from x86 to ARM. The reason you have mobile-style apps on a phone or tablet is that it is a mobile device, not that it is built on ARM.

Desktop workstations, developer kits, servers, and supercomputers based on ARM aren't running mobile apps.

E.g. scientists using Isambard are running high-performance code to solve the compressible Navier-Stokes equations for shock/boundary-layer interactions. And they are doing that on ARM hardware because the code runs faster than on any Broadwell or Skylake Xeon.


Actually, I do understand code portability to a certain degree, but I'm not out here decrying the death of x86 just because you can port x86 code to the 6502...


And while I may be blithely ignorant of all these, umm, ARM applications on, ummm, ARM desktops and workstations (that's so funny)...


Regardless, I'm just not seeing these so-called ARM performance miracles, and I guess it's because I don't use whatever esoteric ARM application gets whiz-bang benchmarks over x86.

But I do use x86, and begrudgingly use those ARM chips on a variety of devices. Call my experience colored, but thus far, ARM has yet to impress me.


Now, if my x86 experience had always been on 300-dollar computers, then yeah, ARM would be like, wow!! Why can't my computer be that fast?!

But that's not the case, and every time I use an ARM device it feels like I have stepped back 20 years in usability. Even that supposedly whiz-bang fast iPhone X and my new Note 9.

I'm sorry, but ARM just is not proving it to me.

And frankly, I believe it's because the chips just don't have the power.

One day we might have apples-to-apples benchmarks to prove otherwise, but till that day I don't care how many Navier-whatevers ARM performs over x86; the truth of it will always be: does it run Crysis??
 
Even more fun incoming ....

"I have 8 ARM cores, it's as fast as your 2700X" ...
"Apple does the heavy lifting in the cloud" "So glad I got optical ethernet with no caps and super duper low latency"
"Why can't I install XYZ that works on MacOS / iOS on my ARM Mac?" (Similar to ARM Windows tablet)

Anyway... time will tell how this pans out. Get your wallet ready!

And saying it's already faster and enough: when is the last time you saw an iPhone running a Plex server transcoding multiple files, 4 VMs, 2 browsers with lots of tabs, and a game at 4K with other stuff in the background? Yeah, those are "enough" for typical users, but to say they're as good or better... prove me wrong. If you're going to use a supercomputer, well yeah... they're powerful, but also above my pay grade.
 
Not so sure.

Looking at my new, ARM-driven cell phone, it outclasses many of the PCs I come across in daily life, from a CPU power perspective.

x86 is not getting more efficient anymore. Adding circuits is not an endless option, and at 5nm, or 3nm at the latest, the end of the pole will have been reached. What then?

ARM keeps pulling ahead, and I would not be surprised to see a megashift coming. Apple is well known to be the first to cut off old tails. I wouldn't be surprised to see ARM-driven desktop devices from Apple by 2020, hell no.

Just because you come across a lot of systems that are pieces of crap doesn't make ARM chips any better. What ARM chip stacks up against my Ryzen 2600X?
 
One day we might have apples-to-apples benchmarks to prove otherwise, but till that day I don't care how many Navier-whatevers ARM performs over x86; the truth of it will always be: does it run Crysis??
Yes, ARM64 CPUs have the processing capabilities to run Crysis, easily.
The ARM64 CPU in the iPhone 8 is 40% faster overall than the x86-64 Jaguar CPU in the PS4 Pro - the only thing lacking is the GPU, obviously.

We do have apples-to-apples benchmarks - almost every thread listing "Apple moving to ARM in 2020" on here has them.
The games just need to be compiled/programmed/released for ARM64 now.

I wouldn't use the Nintendo Switch as an example, either - the ARM64 CPU within it is years old at this point, and the GPU is meager at best compared to modern equipment.
 
Even more fun incoming ....

"I have 8 ARM cores, it's as fast as your 2700X" ...
"Apple does the heavy lifting in the cloud" "So glad I got optical ethernet with no caps and super duper low latency"
"Why can't I install XYZ that works on MacOS / iOS on my ARM Mac?" (Similar to ARM Windows tablet)

Anyway... time will tell how this pans out. Get your wallet ready!
Apple used Rosetta and universal binaries when they were moving from PowerPC to x86 and x86-64 - that was back in 2006, so I don't know why this wouldn't be possible in 2019 when moving from x86-64 to ARM64...

And saying it's already faster and enough: when is the last time you saw an iPhone running a Plex server transcoding multiple files, 4 VMs, 2 browsers with lots of tabs, and a game at 4K with other stuff in the background? Yeah, those are "enough" for typical users, but to say they're as good or better... prove me wrong. If you're going to use a supercomputer, well yeah... they're powerful, but also above my pay grade.
My old ODROID-U3 from 2014, with a quad-core Cortex-A9 32-bit ARM CPU @ 1.7GHz, could transcode two 720p videos at once faster than real-time, and that was an old CPU even back then, paired with only 2GB of LPDDR2 RAM.
Cortex-A76 and beyond ARM64 CPUs with 8+ cores would be more than capable of multiple 1080p transcodes.

The reason you don't see iPhones doing this is because they are normally only equipped with 2-3GB of RAM - give it 16GB+ and we will be telling a different story.
 
It's possible, but you also have to efficiently transport data between the cores themselves and memory. Latency in the bus will reduce performance scaling to a degree, and the bus will also increase power consumption somewhat (depending, of course, on how well the chip is designed). But even assuming 30% less performance than their "simulated" 64-core processor, that's still over 900 on that test.

The interconnect they have designed scales as O(N), and the whole platform goes above 128 cores. The scores they give for the 64-core chip are accurate (±5%).

And while I may be blithely ignorant of all these, umm, ARM applications on, ummm, ARM desktops and workstations (that's so funny)...

https://www.servethehome.com/gigabyte-thunderxstation-using-cavium-thunderx2-launched/
 
Yes, with enough work it can be done, but it's not easy and takes many man-hours to port something. It isn't free.

No one said that porting code is free or trivial. However, Apple has lots of experience migrating to a new microarchitecture. They migrated from PPC to x86 before. No problem migrating from x86 to ARM, as Red Falcon mentioned.
 
I remember the last few times x86 died.
Well, that was with some nm left to work with, but we are getting to the point of diminishing returns in rather big steps, so we started adding cores/circuits and making it all more complex. This also has a finite limit on what can be done, and diminishing returns are already present, according to what I read in threads that focus on such things. For me, that makes sense, but I am only a consumer and mouse pusher, not a chip designer.
 
Well, that was with some nm left to work with, but we are getting to the point of diminishing returns in rather big steps, so we started adding cores/circuits and making it all more complex. This also has a finite limit on what can be done, and diminishing returns are already present, according to what I read in threads that focus on such things. For me, that makes sense, but I am only a consumer and mouse pusher, not a chip designer.

Yep, heard it all before when x86 was declared dead. Then magically the architecture evolved. ARM faces the same pressure, only worse. As you push it more and more into general computing use, you have to add more and more instructions, negating loads of the architectural benefits.
 
Yep, heard it all before when x86 was declared dead. Then magically the architecture evolved. ARM faces the same pressure, only worse. As you push it more and more into general computing use, you have to add more and more instructions, negating loads of the architectural benefits.
And that's the question... If moar cores is put down as a diminishing-returns argument for x86... how is it that moar cores will be a panacea for ARM?
Does the fact that ARM is more energy efficient (runs cooler) necessarily mean that you can sacrifice said efficiency and translate it into lots more performance?
 
An argument could be made that x86 has suffered from inefficient code induced by decades of easy performance gains.
 
The interconnect they have designed scales as O(N), and the whole platform goes above 128 cores. The scores they give for the 64-core chip are accurate (±5%).

O(N) = linear

What a convoluted way of saying linear. It darn better scale linearly; it would suck to sort interconnects.

There is no magic sauce that will make it scale differently than any other topology. It looks like an Intel ring bus, over-provisioned like AMD's.
 
An argument could be made that x86 has suffered from inefficient code induced by decades of easy performance gains.
Meltdown and Foreshadow were the end results of this on x86-64, specifically on Intel processors.
Cutting too many corners for, as you said, easy performance gains.

2017 and 2018 were shit shows for Intel trying to cover their tracks, and unlike during the Netburst era, they couldn't easily use anti-competitive and illegal business practices to get out of it.
The results and performance hits were, and still are, very real, and effectively killed off everything older than Sandy Bridge for hardware-level security.

AMD is the only company still innovating and keeping x86-64 alive at this point, because it sure as hell isn't Intel.
Intel is going to eventually turn into Oracle and IBM: 10% technology development and 90% attorneys/legal for x86 legacy licensing and royalty rights.

Not to mention die-shrinks are only going to get them so much farther before the physical wall is hit.
I give Intel another decade, at most, before they become totally irrelevant.
 
Yep, heard it all before when x86 was declared dead. Then magically the architecture evolved. ARM faces the same pressure, only worse.
That was back when Intel was still developing Core and Core 2, and was at the 65nm process circa 2006/2007.
Intel, and die-shrinks, are nearing their physical limits as well, and cutting security corners for performance gains is out of the question at this point, now that the cats are out of the bag.

There was also nothing "magical" about it - Netburst was a dud from day one, and the only reason Intel succeeded against the Athlon XP and Athlon 64 in overall sales was due to illegal and anti-competitive business practices and license agreements.
Foreshadow basically killed their implementation of SMT (Hyper-Threading) to the point where the only real security "fix" was to just disable it at the firmware/UEFI/BIOS level - great job, Intel.

As you push it more and more into general computing use, you have to add more and more instructions, negating loads of the architectural benefits.
Apple alone has proven this statement wrong with their in-house ARM64 processors and their performance gains each year, let alone other companies' performance gains with in-house, or off-the-shelf, ARM64 processors.
Not to mention, ARM64 doesn't have 40+ years of legacy/inefficient instruction sets holding it back, unlike x86/x86-64.
 
That was back when Intel was still developing Core and Core 2, and was at the 65nm process circa 2006/2007.

And in the early 1990s, and in the mid 1980s... etc. x86 is Lazarus.

Intel, and die-shrinks, are nearing their physical limits as well

Yep, we have heard this before. And then *poof*

There was also nothing "magical" about it - Netburst was a dud from day one, and the only reason Intel succeeded against the Athlon XP and Athlon 64 in overall sales was due to illegal and anti-competitive business practices and license agreements.
Foreshadow basically killed their implementation of SMT (Hyper-Threading) to the point where the only real security "fix" was to just disable it at the firmware/UEFI/BIOS level - great job, Intel.

So, x86 beat x86 and didn't die because x86 did better than x86. Thanks, that makes my point.


Apple alone has proven this statement wrong with their in-house ARM64 processors and their performance gains each year, let alone other companies' performance gains with in-house, or off-the-shelf, ARM64 processors.

No, they have not. Quite the opposite, really. Every day ARM processors add instruction sets and scale beyond original designs. Sure, they look efficient in comparison to other options, but so were MIPS, Alpha, SPARC, POWER, and VLIW-based designs, etc. x86 not only survived but supplanted them almost universally.

Not to mention, ARM64 doesn't have 40+ years of legacy/inefficient instruction sets holding it back, unlike x86/x86-64.

Yet. But it gains the bloat every day... like I said... "As you push it more and more into general computing use, you have to add more and more instructions, negating loads of the architectural benefits."

ARM is not the be-all end-all of computing, much to your and others' disappointment.
 
And in the early 1990s, and in the mid 1980s... etc. x86 is Lazarus.

Yep, we have heard this before. And then *poof*
Do you not realize transistors cannot physically shrink much past 2nm? As in, that is the physical limit, nearing the size of an actual atom.
So, unless you don't understand basic physics, that statement is complete bollocks.

Again, AMD is the only company keeping x86-64 innovative and competitive, since Intel didn't come up with chiplets or move beyond the monolithic die.
The scaling beyond 16 cores, however, is extremely limited by the memory architecture - [H]'s own reviews proved that with their 16- vs. 32-core memory bandwidth and data transfer rate tests.

I'm not seeing this limitation on 32-64+ core ARM64 CPUs, not to mention they scale much better in SMP than x86-64 is doing right now.
If someone had said that x86 was the future in the 1980s, they would have been laughed at, considering the Motorola 68000-68030 CPUs curb-stomped x86 at almost every turn at that point - we are seeing the same thing with ARM64 succeeding x86-64 now, and here everyone is, laughing - history repeats itself...

So, x86 beat x86 and didn't die because x86 did better than x86. Thanks, that makes my point.
Because that was the market at the time, which was vastly different than it is now, and ARM wasn't a major competitor.
Really, Motorola and IBM PowerPC were the only other true competitors in the desktop and laptop markets at the time, and smartphones and tablets with ARM processors didn't even start becoming mainstream until the late 2000s at the earliest.

It's starting to sound like you may not have understood anything that has happened in the computer tech industry in the last 20-30 years.
Snarky one-off comments aren't helping your case...

No, they have not. Quite the opposite, really. Every day ARM processors add instruction sets and scale beyond original designs. Sure, they look efficient in comparison to other options, but so were MIPS, Alpha, SPARC, POWER, and VLIW-based designs, etc. x86 not only survived but supplanted them almost universally.
x86 survived due to then-emerging markets and cost/performance ratios.
x86 systems were laughable at best in terms of performance compared to MIPS, Alpha, and POWER/PowerPC back in the 1990s and early 2000s, but the emerging software/game/development markets helped x86 gain traction over the rest due to costs, availability, and ease of use - guess what, much like what ARM64 is doing now with smartphones and tablets compared to x86-64 equipment.

Are you not seeing the writing on the wall for x86-64?

Yet. But it gains the bloat every day... like I said... "As you push it more and more into general computing use, you have to add more and more instructions, negating loads of the architectural benefits."
I won't deny that this may eventually happen to ARM64 the way it happened to x86-64, but we are at least another 10-20 years out from seeing that happen, and who knows what other markets will emerge that may supersede it - AI ASICs, GPUs, etc.
These were things that weren't around to add computational ability throughout much of x86/x86-64's lifespan, so everything got dumped onto the CPU for performance and backwards compatibility - this may not even happen with ARM64, at least not in the classical sense that it happened with x86/x86-64.

ARM is not the be-all end-all of computing, much to your and others' disappointment.
Neither was x86/x86-64, but that didn't stop it from existing for 40+ years.
The end is nigh for Intel (not so much AMD due to their GPU markets and their own ARM processors) - like I said, I will give it another decade before the die-limitations are reached and Intel runs out of performance gains.

The only thing that will keep them afloat, much like Oracle and IBM, will be their licenses and royalties, because much like those companies, the word "innovation" will be totally dead, and so will the x86-64 ISA.
ARM64 is the future starting in 2020 (Apple device unification), much like x86 was the future starting in 1981 (IBM 5150).

History repeats itself. :)
 
Yes, ARM64 CPUs have the processing capabilities to run Crysis, easily.
The ARM64 CPU in the iPhone 8 is 40% faster overall than the x86-64 Jaguar CPU in the PS4 Pro - the only thing lacking is the GPU, obviously.
Lol, you are comparing a 6-year-old console processor to a 1.5-year-old ARM processor? Totally apples-to-apples comparison there!
 
Lol, you are comparing a 6-year-old console processor to a 1.5-year-old ARM processor? Totally apples-to-apples comparison there!
I compared it because it was one of the 1:1 tests that emerged online.
So yeah, actually it was an apples-to-apples ARM64 to x86-64 comparison.

If you have something better, please share it. :)
 
Do you not realize transistors cannot physically shrink much past 2nm? As in, that is the physical limit, nearing the size of an actual atom.
So, unless you don't understand basic physics, that statement is complete bollocks.

And I do understand basic physics, far more than you do, so that statement would then not be complete bollocks. So, huh. Weird, my comment stands.

Again, AMD is the only company keeping x86-64 innovative and competitive, since Intel didn't come up with chiplets or move beyond the monolithic die.

So, x86 beat x86 and didn't die because x86 did better than x86. Thanks, that makes my point.


The scaling beyond 16 cores, however, is extremely limited by the memory architecture - [H]'s own reviews proved that with their 16- vs. 32-core memory bandwidth and data transfer rate tests.

I'm not seeing this limitation on 32-64+ core ARM64 CPUs, not to mention they scale much better in SMP than x86-64 is doing right now.

With the current memory architecture. Huh. So let's see what might change, then, and do exactly what I said before.

If someone had said that x86 was the future in the 1980s, they would have been laughed at, considering the Motorola 68000-68030 CPUs curb-stomped x86 at almost every turn at that point - we are seeing the same thing with ARM64 succeeding x86-64 now, and here everyone is, laughing - history repeats itself...

Yet x86 was. And it has been every time it has been declared dead. Huh. Seems like you are doing a pretty good job of proving my point for me.


Because that was the market at the time, which was vastly different than it is now, and ARM wasn't a major competitor.

Really, Motorola and IBM PowerPC were the only other true competitors in the desktop and laptop markets at the time, and smartphones and tablets with ARM processors didn't even start becoming mainstream until the late 2000s at the earliest.

The market has been vastly different EVERY TIME people declared x86 dead. Yet, it survived.

It's starting to sound like you may not have understood anything that has happened in the computer tech industry in the last 20-30 years.

Not in this case. I have understood it pretty well. I have watched the end be declared for x86 multiple times, and it did not die. It evolved and even moved upmarket, taking out heavyweights that were untouchable when it was just a lowly PC processor.


x86 survived due to then-emerging markets and cost/performance ratios.
x86 systems were laughable at best in terms of performance compared to MIPS, Alpha, and POWER/PowerPC back in the 1990s and early 2000s, but the emerging software/game/development markets helped x86 gain traction over the rest due to costs, availability, and ease of use - guess what, much like what ARM64 is doing now with smartphones and tablets compared to x86-64 equipment.

And yet, it did not die like it was supposed to. Instead, it evolved and supplanted them. Strange.

Are you not seeing the writing on the wall for x86-64?

I am, just like the last three or four times it died. Your argument that ARM will kill x86 because ARM will kill x86 isn't a real argument.


I won't deny that this may eventually happen to ARM64

You already tried to.


History repeats itself. :)

It does, which seems to be why x86 is still here. Strange.
 
Yes, ARM64 CPUs have the processing capabilities to run Crysis, easily.
The ARM64 CPU in the iPhone 8 is 40% faster overall than the x86-64 Jaguar CPU in the PS4 Pro - the only thing lacking is the GPU, obviously.

We do have apples-to-apples benchmarks - almost every thread listing "Apple moving to ARM in 2020" on here has them.
The games just need to be compiled/programmed/released for ARM64 now.

I wouldn't use the Nintendo Switch as an example, either - the ARM64 CPU within it is years old at this point, and the GPU is meager at best compared to modern equipment.


Still unconvincing; the Jaguar is an old Bulldozer-era AMD processor with a bit of modification to it, and we all know how awesome Bulldozer was...

And I have yet to see any relatable benchmarks showing ARM outperforming x86 in recognizable, everyday-use benchmarks.

Even so, this article, https://blog.cloudflare.com/arm-takes-wing/, demonstrates how x86 pretty much trounces ARM except in certain multi-core benchmarks. With it being dated, it would be interesting to see how it compares vs. the newer AMDs.


Regardless, ARM is not an x86 killer, and Apple's attempt to move their creative professionals over to it will be just another knife in the back.
 
Still unconvincing; the Jaguar is an old Bulldozer-era AMD processor with a bit of modification to it, and we all know how awesome Bulldozer was...

And I have yet to see any relatable benchmarks showing ARM outperforming x86 in recognizable, everyday-use benchmarks.

Even so, this article, https://blog.cloudflare.com/arm-takes-wing/, demonstrates how x86 pretty much trounces ARM except in certain multi-core benchmarks. With it being dated, it would be interesting to see how it compares vs. the newer AMDs.


Regardless, ARM is not an x86 killer, and Apple's attempt to move their creative professionals over to it will be just another knife in the back.
Technically they were more like brothers. Jaguar (Family 16h) and Bulldozer (Family 15h) were separate microarchitectures:
The issue widths (and peak instruction executions per cycle) of a Jaguar, K10, and Bulldozer core are 2, 3, and 4 respectively. This made Bulldozer a more superscalar design compared to Jaguar/Bobcat. However, due to K10's somewhat wider core (in addition to the lack of refinements and optimizations in a first generation design) the Bulldozer architecture typically performed with somewhat lower IPC compared to its K10 predecessors. It was not until the refinements made in Piledriver and Steamroller, that the IPC of the Bulldozer family distinctly began to exceed that of K10 processors such as Phenom II.
 
Still unconvincing; the Jaguar is an old Bulldozer-era AMD processor with a bit of modification to it, and we all know how awesome Bulldozer was...
Jaguar isn't Bulldozer, and wasn't part of the CMT design.
It is also ~4% faster than Bulldozer in general.

Nice article, thanks for sharing.
 
Everything I have been seeing in the last few years points to the death of x86-64, with only AMD keeping it truly competitive and afloat at this point.
I don't know the future, but I do see the writing on the wall, primarily for Intel - even their x86-based Xeon Phi line is officially being discontinued this year.

What has changed, compared to all the other decades, is that now the limitations on transistors and die-shrinks are nearly being hit.
The only paradigm shift I'm really seeing is AMD moving to chiplets and away from monolithic designs, but even then, the memory architecture suffers beyond 16 cores due to their chiplet designs, though they do greatly reduce overall production costs and final prices.

Intel has completely screwed the pooch with Meltdown, Foreshadow, and to a lesser extent, Spectre.
Their only hope is to continue to absorb and/or buy up other new technology startups in an attempt to slow the inevitable.

With whatever comes to pass in the next few years, I am going to refer back to this thread.
If I am right about ARM64 succeeding, I would like you to admit it - and if you are right about x86-64 continuing, I will gladly admit it. (y)
 
I remember the last few times x86 died.

Let the numbers speak for themselves:

[image: nvidia_arm_versus_x86_shipments.jpg]


Yep, heard it all before when x86 was declared dead. Then magically the architecture evolved. ARM faces the same pressure, only worse. As you push it more and more into general computing use, you have to add more and more instructions, negating loads of the architectural benefits.

Nope. ARM64 is an independent, clean ISA developed from scratch. ARM engineers used the transition to 64-bit to eliminate all the legacy modes, useless instructions, and weird stuff from the ISA. This is different from AMD64, which is only an extension to x86 and so inherited all the useless and weird stuff from x86. There are hundreds of x86 instructions that aren't used anymore in new code, but those ancient instructions still have to be supported by modern 64-bit chips, thanks to the smartness of AMD's engineers.

Like x86, ARMv7 had a fair bit of cruft, and the architects took care to remove many of the byzantine aspects of the instruction set that were difficult to implement. The peculiar interrupt modes and banked registers are mostly gone. Predication and implicit shift operations have been dramatically curtailed. The load/store multiple instructions have also been eliminated, replaced with load/store pair. Collectively, these changes make AArch64 potentially more efficient than ARMv7 and easier to implement in modern process technology.
 
O(N) = linear

What a convoluted way of saying linear. It darn better scale linearly; it would suck to sort interconnects.

There is no magic sauce that will make it scale differently than any other topology. It looks like an Intel ring bus, over-provisioned like AMD's.

AMD's interconnects don't scale linearly. That is why they have one interconnect inside the CCX and another outside it.

And no. ARM isn't using a ring topology. Their design scales above 128 cores, no ring can do that.
 
Still unconvincing; the Jaguar is an old Bulldozer-era AMD processor with a bit of modification to it, and we all know how awesome Bulldozer was...

Jaguar doesn't have anything to do with Bulldozer. Jaguar was even designed by a separate team of engineers than Bulldozer and its derivatives.

Jaguar had higher IPC than Bulldozer and Piledriver, and it was better than any similar chip from Intel. Jaguar was the best that AMD's engineers could do, but it was killed by standard phone-class cores such as the Cortex-A57. Even AMD admitted this when they replaced the Jaguar-based Opterons with A57-based Opterons (Seattle).

AMD couldn't make a competitive x86 chip in this power/area range, and so cancelled Jaguar, Puma, and derivatives, including the whole Skybridge project (Keller was working on a new low-power x86 core). AMD focused on Zen only.

And I have yet to see any relatable benchmarks showing ARM outperforming x86 in recognizable, everyday-use benchmarks.

In this thread I have given SPEC benchmarks comparing Apple chips to Xeons, and N1 ARM chips to Xeon and EPYC.

Even so, this article, https://blog.cloudflare.com/arm-takes-wing/, demonstrates how x86 pretty much trounces ARM except in certain multi-core benchmarks. With it being dated, it would be interesting to see how it compares vs. the newer AMDs.

First, the author is comparing a commercial Xeon to an ARM engineering sample:

both Qualcomm and Cavium provided us with engineering samples of their ARM based platforms, and in this blog post I would like to share my findings about Centriq, the Qualcomm platform.

Second, as he mentions, the x86 workloads had been optimized for a while; they had been optimizing for the "Skylake based Purley platform" since 2016. The ARM platform was entirely new, so binaries couldn't be at the same level of optimization. He mentions this for some workloads. This is an example:

Nevertheless, at the SoC level, Falkor wins big time. It is only marginally slower than Skylake at an RSA2048 signature, and only because RSA2048 does not have an optimized implementation for ARM.

This is another example:

It seems like the enablement effort so far was concentrated on the compiler back end, and the library was left largely untouched. There are a lot of low hanging optimization fruits out there, like my 20 minute fix for addMulVVW clearly shows. Qualcomm and other ARMv8 vendors intend to put significant engineering resources to amend this situation, but really any one can contribute to Go.

Third, some workloads depend exclusively on the frequency of the SKU used in the test and have little to do with x86 vs. ARM. E.g. the Broadwell SKU was faster than the Skylake SKU in some tests:

It is interesting, but not surprising to see that in the single core benchmark, the Broadwell core is faster than Skylake, and both in turn are faster than Falkor. This is because Broadwell runs at a higher frequency, while architecturally it is not much inferior to Skylake.

In the end he was impressed by the engineering sample:

The engineering sample of Falkor we got certainly impressed me a lot.

Moreover, not everything is raw performance in servers; power is almost as important. The sample gave him similar performance to Skylake Xeons but at half the power:

during my tests it never went above 89W (for the go benchmark). In comparison Skylake and Broadwell both went over 160W, while the TDP of the two CPUs is 170W.

This means you can fit two of those samples in the same power envelope as a single Xeon, obtaining about twice the performance.

And that is comparing an engineering sample of the first design made by Qualcomm's engineers, with unoptimized binaries/compilers, vs. a commercial Xeon of a highly mature design by Intel's engineers, using binaries/compilers optimized over years.

The same author wrote a follow-up article,

https://blog.cloudflare.com/neon-is-the-new-black/

where he explains his experience optimizing software for ARM. You can read the whole article, but you can find a summary in this image:

[image: jpegtran-asm-1.png]


From left to right: the x86-optimized code on Xeon; the code simply ported to ARM, running on the Qualcomm engineering sample; the ARM code optimized with NEON instructions; and the ARM code optimized with NEON instructions and assembler.


Regardless, ARM is not an x86 killer, and Apple's attempt to move their creative professionals over to it will be just another knife in the back.

It is an x86 killer. Even Intel admits, behind closed doors, that ARM will replace x86.
 
AMD's interconnects don't scale linearly.

I don't think you understand big O notation then.

AMD's interconnects scale linearly. If you pass through N interconnects you visit each one once.

It doesn't matter if one interconnect has a greater penalty than another; the scaling factor is still linear.

And no. ARM isn't using a ring topology. Their design scales above 128 cores, no ring can do that.

So, mesh. ThunderX2 is a ring. Doesn't matter.

Ring, mesh, hybrid - they all have trade-offs.
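
To put rough numbers on those trade-offs, here is a toy model (my own sketch, not any vendor's actual interconnect) comparing the average hop count between two random cores on a bidirectional ring vs. a square 2D mesh. Hops stand in for latency, and how they grow with core count is exactly what is being argued about here:

#include <stdio.h>
#include <stdlib.h>

/* Average hop distance between two random nodes: on a bidirectional
   ring it grows roughly as N/4; on a side x side mesh, roughly
   (2/3) * side, i.e. with the square root of the core count. */

static double ring_avg(int n)
{
    long total = 0;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++) {
            int d = abs(a - b);
            if (d > n - d) d = n - d;   /* take the short way around */
            total += d;
        }
    return (double)total / ((double)n * n);
}

static double mesh_avg(int side)
{
    int n = side * side;
    long total = 0;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++)
            total += abs(a / side - b / side) + abs(a % side - b % side);
    return (double)total / ((double)n * n);
}

int main(void)
{
    for (int side = 4; side <= 12; side += 4) {
        int n = side * side;
        printf("%3d cores: ring avg %.2f hops, mesh avg %.2f hops\n",
               n, ring_avg(n), mesh_avg(side));
    }
    return 0;
}

At 16 cores the ring is fine; by 144 cores its average distance has ballooned while the mesh's grows far more slowly, which fits the earlier claim that no ring scales past 128 cores.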
 
I am going to laugh pretty hard if it turns out that Apple beats Google to the unified operating system. Google's idea to make Chrome OS and Fuchsia and whatever was downright moronic. They should have just kept scaling up phones and tablets into laptops and finally desktops, and run straight Android on it all.
 
The same author wrote a follow-up article,

https://blog.cloudflare.com/neon-is-the-new-black/

where he explains his experience optimizing software for ARM. You can read the whole article, but you can find a summary in this image:

[image: jpegtran-asm-1.png]

From left to right: the x86-optimized code on Xeon; the code simply ported to ARM, running on the Qualcomm engineering sample; the ARM code optimized with NEON instructions; and the ARM code optimized with NEON instructions and assembler.

It is an x86 killer. Even Intel admits, behind closed doors, that ARM will replace x86.

I have some issues with this ASM. (edit: intrinsics)

For the SSE version, the author used one variable to insert the natural-order values. This basically creates a chain of 8 dependent instructions.

For the NEON version, the author used 2 variables and then combined them, allowing for 2 chains of 4 dependent instructions.

The comparison is flawed.
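
For anyone who doesn't see the issue, here is a hypothetical illustration of the two patterns being described (my own sketch in SSE2 intrinsics, not the blog's actual code). An out-of-order core can execute the two short chains in the second function in parallel, so it finishes in roughly half the latency of the first:

#include <emmintrin.h>  /* SSE2 */

/* One accumulator: each insert depends on the previous one,
   forming a single dependency chain of length 8. */
static __m128i build_serial(const short *x)
{
    __m128i v = _mm_cvtsi32_si128(x[0]);
    v = _mm_insert_epi16(v, x[1], 1);
    v = _mm_insert_epi16(v, x[2], 2);
    v = _mm_insert_epi16(v, x[3], 3);
    v = _mm_insert_epi16(v, x[4], 4);
    v = _mm_insert_epi16(v, x[5], 5);
    v = _mm_insert_epi16(v, x[6], 6);
    v = _mm_insert_epi16(v, x[7], 7);
    return v;
}

/* Two accumulators: two independent chains of length 4 that the
   scheduler can overlap, merged at the end with one unpack. */
static __m128i build_split(const short *x)
{
    __m128i lo = _mm_cvtsi32_si128(x[0]);
    lo = _mm_insert_epi16(lo, x[1], 1);
    lo = _mm_insert_epi16(lo, x[2], 2);
    lo = _mm_insert_epi16(lo, x[3], 3);

    __m128i hi = _mm_cvtsi32_si128(x[4]);
    hi = _mm_insert_epi16(hi, x[5], 1);
    hi = _mm_insert_epi16(hi, x[6], 2);
    hi = _mm_insert_epi16(hi, x[7], 3);

    /* Concatenate the low 64 bits of each accumulator. */
    return _mm_unpacklo_epi64(lo, hi);
}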
 
And as I have stated, there are still no apples-to-apples benchmarks; everything is abstracted to some degree.
ARM is not an x86 killer.
It's a great processor, to be sure, and delivers quite a bit of power for its package.
But it still lags behind x86.

I use ARM on a daily basis... in the Note 9, and I encounter it on the iPhone. It is lacking.

And I don't care about the SPEC benchmarks; whatever purpose they serve, they don't do jizzle-bits for me. They don't seem to mean or relate to anything meaningful when it comes to real-world use.


But no, that's not good enough; we have to keep hearing repeatedly how ARM is sooo much more powerful than x86 when it's not true, and every benchmark relies on a layer of abstraction to try to prove the point. Even in that article, with the excuses for the missing instructions and the abstraction, ARM still shows respectable performance, but nope... can't have that, it must be Moar Powerful than x86, so pull out the SPEC benchmarks again... sighhhh.



Regardless, I still feel we are comparing apples to oranges, and the usage scenarios for the two chips are different.

But the hell with it, here are some more examples of ARM being trounced in relatable benchmarks.

Abstract away all you want...


Linux kernel build times:
https://openbenchmarking.org/showdown/pts/build-linux-kernel


7-Zip compression:
https://openbenchmarking.org/showdown/pts/compress-7zip

ImageMagick compilation time:
https://openbenchmarking.org/showdown/pts/build-imagemagick

OpenSSL:
https://openbenchmarking.org/showdown/pts/openssl

Apache concurrent users:
https://openbenchmarking.org/showdown/pts/apache-siege

Since ARM peeps like throwing around meaningless benchmarks regarding some scientific thing, here's one for x86:
https://openbenchmarking.org/showdown/pts/himeno (Wow, look at those numbers, ARM must be dead??!!)


So yeah, I don't know why people are so keen on the death of the PC.
Frankly, I am sick of it.
 
I don't think you understand big O notation then.

AMD's interconnects scale linearly. If you pass through N interconnects you visit each one once.
Obviously they don't scale linearly: [H]'s own test showed that beyond 16 cores, performance diminished greatly, since all 24/32 cores couldn't be fed properly with memory bandwidth (data transfer rate).

Here is another site showcasing this very limitation:
https://www.pcworld.com/article/329...ng-amds-32-core-threadripper-performance.html

[image: 2990wx_die_topology_updated-100768693-large.jpg]


[image: die_top_2950x_updated-100768695-large.jpg]


To quote the article:
A two-die 16-core Threadripper 2950X has 50GBps and two links between two dies, vs. the 25GBps among four dies that AMD originally claimed (and then amended).

That is the complete opposite of scaling linearly, as you stated it could.
Scaling linearly would mean no bandwidth (data transfer rate) loss with the addition of more nodes/cores, but with AMD's current design that isn't happening at all; in fact, that bandwidth is literally being cut in half. Not trying to knock AMD or anything; the tech and interconnects just are what they are.

Again, x86-64's days are numbered, and if AArch64 (ARM64) continues to move forward, with consumer-grade, cost-effective systems and development platforms (as Linus Torvalds stated was needed), then AArch64 will most certainly be the true death, and rightful successor, of x86-64.
 
I don't know if I'm dumber or smarter for having read this thread.
 
I am going to laugh pretty hard if it turns out that Apple beats Google to the unified operating system. Google's idea to make Chrome OS and Fuchsia and whatever was downright moronic. They should have just kept scaling up phones and tablets into laptops and finally desktops, and run straight Android on it all.
That isn't their market, and they already have Chromebooks for that.
Google is focused purely on cloud apps and development, and their smartphones and tablets are only considered "access devices" needed to reach the data.

Really, with Google, the pendulum is swinging back towards the mainframe and dumb-terminal methodology, with cloud services and smartphones/tablets.
 