Ampere Preps 7nm 128-Core Server CPU to Take on AMD and Intel

erek

[H]F Junkie
Finally heatin up! Hardcore!

"The Q80-33 will eventually pass the torch to the Altra Max, which will flaunt up to 128 cores. Ampere has confirmed that the Altra Max (codename Mystique) will be socket-compatible with current Altra offerings. We suspect that the the Altra Max will have an M prefix in its model names.

Ampere will sample the Altra Max in the fourth quarter of this year, and the processor should be available next year.

The company is also firm on its commitment to roll out the 2nd Generation Altra processors (codename Siryn) in 2022. If the nomenclature remains the same, the Siryn should sport the S prefix. The next-generation processors will leverage TSMC's 5nm process node.

Ampere expects to sample Siryn in the latter part of 2021 with a scheduled launch in 2022."


https://www.tomshardware.com/news/ampere-preps-7nm-128-core-server-cpu-to-take-on-amd-and-intel
 
I thought Ampere was a GPU? So confusing.
 
So this company licenses straight up cores from ARM Holdings, makes some minor tweaks, crams a ton of them onto a package, and then calls them server chips?

Interesting business model.

I wonder what the market is for this.

Counter to common belief, ARM has no power efficiency benefits. Power efficiency is mostly instruction set independent.

Amazon and Apple are likely going ARM due to the benefits they see from controlling their own designs and cutting out a middle man, but what benefit does any datacenter have in buying these chips?

Maybe they will be a lot cheaper than the available x86 alternatives?
 
interesting...
We need an updated comparison with Ryzen, 10th gen Core, and A77/A78 from ARM, and maybe Qualcomm's flagship Snapdragon.
Surely there is one flavor of Linux that can support them all and run some benchies...
An 80-core ARM CPU @ 3.3 GHz is pretty good.
 

Will largely depend on the kind of server load. While servers tend to run more threads than desktops for sure, 128 cores is still going to be of limited use. Unlike graphics, general-purpose computing doesn't automatically scale with more parallelism, so just throwing more cores at something isn't going to work all the time, even if it produces big numbers in highly parallel benchmarks.

Doesn't mean it is worthless, but it is of more limited use, so how exciting it ultimately is depends on how it compares to existing CPUs. Say the 128-core CPU is about as fast as a 16-core Intel CPU. That would actually make the Intel CPU a much better choice, even at the same price, because its power can be focused on fewer threads when needed.
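
Back-of-the-envelope Amdahl's law is basically what's at work there. A rough sketch (plain Python; the per-core speed factors are made up purely for illustration, not real numbers for either chip):

Code:
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the parallel
# fraction of the work and n is the core count. Per-core speed factors
# below are hypothetical, just to show where the crossover lands.
def throughput(p, cores, per_core_speed):
    return per_core_speed / ((1.0 - p) + p / cores)

for p in (0.50, 0.90, 0.99):
    wide = throughput(p, 128, 1.0)  # many modest cores
    fast = throughput(p, 16, 2.0)   # fewer cores, assumed 2x faster each
    print(f"parallel fraction {p:.2f}: 128 slow cores {wide:5.1f} vs 16 fast cores {fast:5.1f}")

Until the workload is overwhelmingly parallel, the fewer-but-faster configuration wins; only near-perfectly parallel jobs let the wide chip pull ahead.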
 
Amazon and Apple are likely going ARM due to the benefits they see from controlling their own designs and cutting out a middle man, but what benefit does any datacenter have in buying these chips?

Maybe they will be a lot cheaper than the available x86 alternatives?
I'm thinking something along the lines of 'threads per watt' or some such. The ability to have a core ready to respond with some microservice waiting at the ready, at a much lower cost than relatively fat x86 cores. But outside of something like that, which is a pretty thin use case, it'd need to be something like Fujitsu's new supercomputer spin of ARM, where they strap a fuckton of HBM to the core clusters and stick 512-bit SIMD units in them for HPC work.
 
Will largely depend on the kind of server load. While servers tend to run more threads than desktops for sure, 128 cores is still going to be of limited use. Unlike graphics, general-purpose computing doesn't automatically scale with more parallelism, so just throwing more cores at something isn't going to work all the time, even if it produces big numbers in highly parallel benchmarks.
Video converting will be a good start.
And before anyone says that handbrake can't use more than x cores, remember you can have more instances of ffmpeg running at once...
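
For what it's worth, fanning out independent encodes is easy to script. A rough sketch in Python (file names and settings are just placeholders; assumes ffmpeg with libx264 is on the PATH):

Code:
# Run several independent ffmpeg encodes at once so a many-core CPU stays busy.
# File names and settings are placeholders; assumes ffmpeg/libx264 is installed.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def encode(src):
    out = src.rsplit(".", 1)[0] + "_x264.mkv"
    cmd = ["ffmpeg", "-y", "-i", src,
           "-c:v", "libx264", "-preset", "slow", "-crf", "20",
           "-c:a", "copy", out]
    subprocess.run(cmd, check=True)

sources = ["ep01.mkv", "ep02.mkv", "ep03.mkv", "ep04.mkv"]   # hypothetical inputs
with ThreadPoolExecutor(max_workers=4) as pool:              # one ffmpeg per worker
    list(pool.map(encode, sources))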
 
This thing should probably suck less at video converting... but wouldn't you want to run something even more suited for it, like GPUs or if possible, dedicated ASICs?

Even the dinky GPUs Intel uses spank the best x86 CPUs in terms of encoding speed.
 
Video converting will be a good start.
And before anyone says that handbrake can't use more than x cores, remember you can have more instances of ffmpeg running at once...

You don't really need servers for that though. There really isn't a market. I'm not saying nobody ever would build a server to do a ton of video conversion... but those people are few and far between. Video conversion is one of those quasi-synthetic benchmarks that fanboys like to get all worked up over, but that matters very little to normal use. Video converting is like Cinebench rendering: something the average user just doesn't do. They play videos, they don't convert them from one format to another. However, for people that do...

This thing should probably suck less at video converting... but wouldn't you want to run something even more suited for it, like GPUs or if possible, dedicated ASICs?

Even the dinky GPUs Intel uses spank the best x86 CPUs in terms of encoding speed.

...this is how it is usually done when speed matters. Dedicated silicon can just crush the speed of the intensive math operations like iDCT and such. So when you need throughput, you get dedicated silicon, either as part of your CPU or something else. A modern GPU can do multiple realtime streams of encoding at once, and only the tiniest fraction of its silicon is devoted to it. It is the same deal with crypto. We don't try to have massively powerful, parallel CPUs to do fast AES encryption. Instead we just have specialized silicon on a CPU, or a separate card, to do it. Waaay more efficient.
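
If you want to put rough numbers on the speed gap yourself, timing the same clip through the software encoder and the GPU's fixed-function block is a short script. A sketch (Python; assumes an ffmpeg build that includes both libx264 and h264_nvenc, and the input file name is a placeholder):

Code:
# Time the same clip through a software encoder and the GPU's fixed-function encoder.
# Assumes an ffmpeg build with libx264 and h264_nvenc; "clip.mkv" is a placeholder.
import subprocess, time

def timed_encode(encoder, out):
    cmd = ["ffmpeg", "-y", "-i", "clip.mkv", "-c:v", encoder, "-c:a", "copy", out]
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

cpu = timed_encode("libx264", "clip_cpu.mkv")
gpu = timed_encode("h264_nvenc", "clip_nvenc.mkv")
print(f"libx264: {cpu:.1f}s   h264_nvenc: {gpu:.1f}s")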
 
This thing should probably suck less at video converting... but wouldn't you want to run something even more suited for it, like GPUs or if possible, dedicated ASICs?

Even the dinky GPUs Intel uses spank the best x86 CPUs in terms of encoding speed.
For GPUs, faster encoding speed doesn't mean the same quality.
Turing is only starting to get on par with CPU encoding as far as quality goes.

I've used Intel QuickSync and Maxwell's CUDA for encoding. Yes, they are faster, but at a huge quality loss...
If I wanted to compress a 1080p Blu-ray movie to 6GB with minimal quality loss, I'm stuck with the CPU, as GPU algorithms put out trash compared to CPU.
Where space is not a problem, by all means use the GPU, but I don't have unlimited space, though I wish I did.

Dedicated ASICs are another story. They need respins every time there is a change in the format. Not really feasible unless you're OK with one specific format and you've got the money.
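
The 6GB target is really just bitrate arithmetic, for what it's worth. A quick sketch (Python; the two-hour runtime and the audio bitrate are assumptions for illustration):

Code:
# Back-of-the-envelope video bitrate for a target file size.
# Runtime and audio bitrate are assumptions for illustration.
target_gb  = 6.0
runtime_s  = 2 * 60 * 60      # assume a roughly 2 hour movie
audio_kbps = 640              # assume one lossy surround track

total_kbit = target_gb * 8 * 1_000_000          # treating 1 GB as 10^9 bytes
video_kbps = total_kbit / runtime_s - audio_kbps
print(f"target video bitrate: ~{video_kbps:.0f} kbps")   # roughly 6,000 kbps

Feed that number into a two-pass x264 run (ffmpeg's -b:v with -pass 1 / -pass 2) and you land close to the size you want.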
 
For GPUs, faster encoding speed doesn't mean the same quality.
Turing is only starting to get on par with CPU encoding as far as quality goes.
Agreed, but that's not because they can't, but because they've picked a particular level of quality to bake in. And that's talking about the fixed-function blocks; port the stuff running on CPUs to the shader blocks and it'd be a non-issue (pretty sure this is what Premiere and Resolve etc. do, probably only using the fixed-function stuff for previews).
 
For GPUs, faster encoding speed doesn't mean the same quality.
Turing is only starting to get on par with CPU encoding as far as quality goes.

I've used Intel QuickSync and Maxwell's CUDA for encoding. Yes, they are faster, but at a huge quality loss...
If I wanted to compress a 1080p Blu-ray movie to 6GB with minimal quality loss, I'm stuck with the CPU, as GPU algorithms put out trash compared to CPU.

Understand though that is a very minimal use. Most people don't care about Blu-rays, period, much less reencoding them to be smaller. That you do is not a problem... but it also isn't a reason why the mass market would care about a 128-core server CPU. People aren't going to say "Let me go buy one of those and have a dedicated server for my media reencoding!" in large numbers. For most people, to the extent they need video encoding, it is for streaming or video conferencing, and the hardware is more than good enough for that.
 
It still doesn't change the fact that it is a good benchmark... It is certainly more relevant than something like linpack
 
Also, running those benchmarks can help you determine how it will handle multitasking.
Say this 128-core CPU got smoked by a 64-core EPYC; you could determine that it may not handle multitasking as well as the 64-core EPYC.
Servers can run all kinds of workloads at the same time for max throughput.
So running a multithreaded benchmark on a real-world application to compare IS useful, whether it be ffmpeg encoding, Cinebench, compiling, or all of that at the same time (a rough harness for that is sketched below).
To say that an application that people CAN use in the real world to produce something "has minimal use" is totally unfounded.

No one would be interested in seeing how well a 128-core CPU could run BFV, because most people know a $200 CPU would be faster anyway.
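
A crude way to test that mixed-throughput angle is just to fire off a pile of different real-world jobs at once and time the whole batch. A sketch (Python; the command list is a placeholder mix, swap in whatever workloads you care about):

Code:
# Crude aggregate-throughput test: launch a mix of real-world jobs at once and
# time how long the box takes to finish them all. Commands are placeholders.
import subprocess, time
from concurrent.futures import ThreadPoolExecutor

jobs = [
    ["ffmpeg", "-y", "-i", "clip.mkv", "-c:v", "libx264", "-f", "null", "-"],
    ["xz", "-9", "-k", "-f", "big_tarball.tar"],
    ["make", "-j8", "-C", "some_project"],
]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    list(pool.map(lambda cmd: subprocess.run(cmd, check=True), jobs))
print(f"whole mix finished in {time.perf_counter() - start:.1f}s")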
 
Agreed, but that's not because they can't, but because they've picked a particular level of quality to bake in. And that's talking about the fixed-function blocks; port the stuff running on CPUs to the shader blocks and it'd be a non-issue (pretty sure this is what Premiere and Resolve etc. do, probably only using the fixed-function stuff for previews).
I wonder if you could tweak ffmpeg CPU encoding to suck as bad as QuickSync and then compare the speed difference. The only problem I see with that is that quality is subjective.
I also wonder how much slower GPU encoding would be if you reversed it.
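
Quality doesn't have to stay purely subjective, either: ffmpeg can score an encode against its source with SSIM (or PSNR, or VMAF if the build includes libvmaf). A sketch (Python wrapper; file names are placeholders):

Code:
# Score an encode against its source with ffmpeg's ssim filter, so "looks worse"
# becomes a number. File names are placeholders; VMAF needs a libvmaf-enabled build.
import subprocess

def ssim_report(encoded, reference):
    cmd = ["ffmpeg", "-i", encoded, "-i", reference,
           "-lavfi", "[0:v][1:v]ssim", "-f", "null", "-"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    for line in result.stderr.splitlines():   # ffmpeg prints the SSIM summary to stderr
        if "SSIM" in line:
            print(line)

ssim_report("clip_nvenc.mkv", "clip.mkv")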
 
So this company licenses straight up cores from ARM Holdings, makes some minor tweaks, crams a ton of them onto a package, and then calls them server chips?

Interesting business model.

I wonder what the market is for this.

Counter to common belief, ARM has no power efficiency benefits. Power efficiency is mostly instruction set independent.

Amazon and Apple are likely going ARM due to the benefits they see from controlling their own designs and cutting out a middle man, but what benefit does any datacenter have in buying these chips?

Maybe they will be a lot cheaper than the available x86 alternatives?

ARM Holdings has always had the same business model. They sell architecture licenses and microarchitecture licenses.

The market is all the customers for whom Amazon GR2 is not powerful enough.

The ExtremeTech article is based on a flawed paper from a university group related to Intel. ARM does have power-efficiency advantages over x86. Even Jim Keller gave a talk about this topic and why K12 was going to be better than Zen.

The benefits for customers are (i) an open ecosystem, (ii) lower cost, and (iii) power efficiency.
 
Also, running those benchmarks can help you determine how it will handle multitasking.
Say this 128-core CPU got smoked by a 64-core EPYC; you could determine that it may not handle multitasking as well as the 64-core EPYC.
Servers can run all kinds of workloads at the same time for max throughput.

Considering that the 80-core version is faster than a 64-core EPYC, it is difficult to believe that this 128-core version will be smoked.
 
It still doesn't change the fact that it is a good benchmark... It is certainly more relevant than something like linpack
I wonder if you could tweak ffmpeg CPU encoding to suck as bad as QuickSync and then compare the speed difference. The only problem I see with that is that quality is subjective.
I also wonder how much slower GPU encoding would be if you reversed it.
So... benchmarking is actually hard. Relevance is... relative!

QuickSync / NVENC / AMD's equivalent are designed largely for speed and efficiency first. Quality goes up over time, and Nvidia is a bit ahead here, but primarily because they have the most recently designed logic. I don't really know what AMD's problem is, other than refusing to put GPUs on all of their consumer CPUs.

But overall, something like transcoding in hardware is going to lag in quality, at least until codecs stabilize a bit.
 
So... benchmarking is actually hard. Relevance is... relative!

That's something I always try to get across to people. Sure, something like Cinebench is neat... but unless you are using it to do renders, the results don't tell you much. Different kinds of workloads vary a lot. Like in my case: I play games, in particular a lot of Elder Scrolls Online, which is not well programmed. It tends to be CPU-limited by a single thread. It makes a bit of use of other threads, but only a couple, and is almost always waiting on the CPU, so the GPU doesn't need to be more powerful. So for that, I needed as much per-core performance as I could get. Total performance isn't relevant for that. So high IPC and high clocks were better than more cores.

Or audio production; I also do that. There it all matters in different ways. Total performance isn't irrelevant, since every plugin is a thread, so you can use a core per effect/instrument and thus use tons. However, those things don't generally use very much; they can be stacked real high on a single core, so you don't need lots of cores. However, there are also some instruments with heavy scripting that can hit a single core hard. The scripts only run on one core, they can't be threaded, so you need some beefy per-core performance for those particular instruments. Finally, a thing that is very important, and not measured, is latency. You need to be able to do things FAST and service system calls fast, or you can have dropouts when doing realtime audio. You can have situations where your actual CPU load is fairly low, yet you get a dropout and Cubase shows the realtime audio load as peaking out. This is something that finding benchmarks on is hard, as it is a combo of things that contribute to it. So just having a hugely multi-core CPU with slow per-core performance wouldn't be great for it.

Now of course I am never going to claim that what I do is what everyone does and that everyone should care about what I care about. But it is an example of how benchmarks can be situation-specific. Much like most people don't care about ESO or Cubase, most people also don't care about x264 or Cinebench. Different people have different needs. So trying to find benchmarks that take good advantage of highly parallel CPUs and showing those as a big win... well, that is misleading.
 
show me some benchmarks, I would like to see...

Estimated performance



https://www.phoronix.com/image-viewer.php?id=ampere-altra-80core&image=ampere_altra_6_lrg
 