Little-known Japanese CPU threatens to make Nvidia, Intel and AMD obsolete

erek

Impressed?

"It is considered a general purpose CPU, but surpasses even GPUs from Nvidia and AMD on the all-important metric of performance per watt. Indeed, a 768-CPU prototype sits on top of the Green500 list - the leaderboard for supercomputers that deliver the most power per watt.



The A64FX was designed expressly to power the successor of Japan’s main supercomputer, the K, which was decommissioned back in August 2019.


Its replacement - the Fugaku - is expected to be 100 times faster when it launches later this year, will run on a Linux distribution called McKernel and will reach a staggering 400 petaflops. The aim is for it to be the first supercomputer to hit one exaflop when fully deployed with half a million processors buzzing."


https://www.techradar.com/news/litt...e-nvidia-intel-and-amd-obsolete-in-hpc-market
 
I always find it believable when some company out of nowhere says they came up with consumer tech far better than that of companies that have been doing it for the last 40-50 years.

Particularly with something that has as much money and high speed development as CPUs. Really? You REALLY think you found a way to leapfrog everyone else? Skeptical Sycraft is skeptical.

I mean, I feel the same way about ARM and more particularly their fans: for years there have been claims that ARM would destroy Intel on the desktop, be faster, cheaper, more efficient, etc... And yet here we are, where ARM chips continue to be excellent low power/mobile chips but are not able to throw down the same amount of high end computation... because for them to do so would require more silicon and more power, just like Intel/AMD.
 
I always find it believable when some company out of nowhere says they came up with consumer tech far better than that of companies that have been doing it for the last 40-50 years.

Fujitsu is the co-developer of the SPARC processors (with Oracle). The A64FX was also co-designed with ARM. The prototype of the Japanese PostK supercomputer, using very early A64FX silicon, is already in 159th place on the list of the world's fastest supercomputers.

SPARC is/was never a consumer ISA... and the A64FX chips, although they are ARM, are extremely purpose built. They're not designed to compete in markets outside HPC. Still, many consumers have used Fujitsu CPUs... there is a good chance that those of us with some age on us used a Fujitsu SPARClite CPU if we owned a digital camera in the 90s and early 2000s.

https://en.wikipedia.org/wiki/SPARC

Cray will be building machines using the A64FX. So it's possible we may see more Fujitsu in the supercomputer market now that they have migrated to ARM.
https://insidehpc.com/2020/02/isambard-2-at-uk-met-office-to-be-largest-arm-supercomputer-in-europe/
 
This is stupid. Coreteks said the same thing about A64FX. The truth is that a GPU and a CPU can each do the other's job, but be really shitty at it. Intel and AMD CPUs focus on IPC, specifically serial workloads, meaning higher clock speeds and a lot of cache are what is used to achieve better IPC. GPUs don't care about IPC and instead focus on multi-threaded work that is mostly math.

The A64FX is Intel's Larrabee from 2008. Larrabee used many cut-down x86 cores to achieve a GPU-like level of performance, except it wasn't very good at it and eventually faded away. The A64FX, which sounds oddly like an old Athlon 64, has only 3.38 TFLOPS, while an old GeForce GTX 970 has 4 TFLOPS. The A64FX sounds like the Sony Cell chip in how it is going to take over the world, when in reality the Cell was a shitty CPU that needed endless amounts of extra code to make the thing work.
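For anyone who wants to check the peak numbers in this comparison, here is a rough sketch of the arithmetic. The core counts, SIMD widths and clocks below are assumptions taken from public spec sheets, not from this thread, and the helper name `peak_tflops` is made up for illustration.

```python
def peak_tflops(cores, fma_pipes, lanes, clock_ghz):
    """Theoretical peak in TFLOPS: every lane of every FMA pipe retires
    one fused multiply-add (2 floating-point operations) per clock."""
    return cores * fma_pipes * lanes * 2 * clock_ghz / 1000.0

# Assumed A64FX configuration (not stated in this thread):
# 48 compute cores, two 512-bit SVE FMA pipes per core, 2.2 GHz.
a64fx_fp64 = peak_tflops(cores=48, fma_pipes=2, lanes=512 // 64, clock_ghz=2.2)
a64fx_fp32 = peak_tflops(cores=48, fma_pipes=2, lanes=512 // 32, clock_ghz=2.2)

# Assumed GTX 970 configuration: 1664 CUDA cores at a 1.178 GHz boost clock,
# with FP64 running at 1/32 of the FP32 rate.
gtx970_fp32 = peak_tflops(cores=1664, fma_pipes=1, lanes=1, clock_ghz=1.178)
gtx970_fp64 = gtx970_fp32 / 32

print(f"A64FX   FP64 {a64fx_fp64:.2f}  FP32 {a64fx_fp32:.2f} TFLOPS")
print(f"GTX 970 FP64 {gtx970_fp64:.2f}  FP32 {gtx970_fp32:.2f} TFLOPS")
```

Under those assumptions the 3.38 TFLOPS figure is a double-precision peak, while the roughly 4 TFLOPS GTX 970 figure is single precision, so the two numbers are not like for like.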

 
They somehow broke the law of physics?

In what way do they need to break any laws... these are the first CPUs in the world to use ARM's Scalable Vector Extension (SVE), developed by ARM together with Fujitsu.

There is no doubt this is the most energy efficient HPC architecture there is right now. Cray is also building machines using these chips. Not every supercomputer in the world is going to switch over and dump its racks of GPUs. Supercomputing has a lot of diverse needs. There are many supercomputer workloads where GPU compute doesn't really help, or where general purpose vector extensions can achieve the same result while using far less power.

Fujitsu has already proven these do exactly what they were designed to do...
https://insidehpc.com/2019/11/prototype-of-fugaku-supercomputer-reaches-number-one-on-green500/

PostK, now named Fugaku, will be complete late this year (or early next year). It will end up in the top 500 somewhere... while drawing 1/4 the power and also costing quite a bit less. It's going to run Red Hat Linux with a custom kernel... and provide a good set of tools to programmers.
https://postk-web.r-ccs.riken.jp/
This is a list of what they are targeting to crunch... https://postk-web.r-ccs.riken.jp/appl.html
It's going to have LLVM and GCC compiler support, and the compiler will be enhanced with automatic SIMD vectorization and prefetching, meaning no CUDA or OpenACC. The compiler will also be completely backward compatible with the compiler for Fujitsu's K supercomputer (SPARC). ARM has been working on SVE compiler support for 3 or 4 years already.

What makes the Fujitsu chip interesting is that it is not a Fujitsu proprietary design... it's not something they pulled from their rears. It's an ARM design: ARMv8 with SVE. ARM has already done the work of creating compiler support for SVE. Fujitsu developed SPARC with Oracle... and they developed these chips with ARM. SVE is ARM's play for HPC, not Fujitsu's alone. Over the next couple of years we are going to see more ARM SVE chips come to market.
https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf
The major compilers all have SVE support at this point... and ARM has had emulation tools for SVE for a few years. Software is ahead of the hardware this time... and missing software is what torpedoed most previous ARM HPC plays. They mostly involved ARM cores with tons of proprietary bits that required custom compilers and software workarounds. This ARM play is different, as it's an official ARM ISA extension... we will see more ARMv8 SVE chips. The compiler work already done will not need to be replicated... once there is a software ecosystem, other companies are going to licence SVE from ARM and make their own chips.
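One reason the compiler work should carry over to other SVE chips is that SVE code is written without a fixed vector width. The sketch below is only a plain-Python simulation of that idea, not real SVE code or anyone's compiler output; `daxpy_vla` and the test data are made up for illustration.

```python
def daxpy_vla(a, x, y, vector_bits):
    """y = a*x + y, written once for any vector width (64-bit elements)."""
    lanes = vector_bits // 64          # elements per vector register
    n = len(x)
    out = list(y)
    i = 0
    while i < n:
        # Predicate in the spirit of SVE's WHILELT: a lane is active only
        # while its element index is still inside the array.
        active = [i + lane < n for lane in range(lanes)]
        for lane in range(lanes):
            if active[lane]:
                out[i + lane] = a * x[i + lane] + out[i + lane]
        i += lanes                     # advance by one full vector
    return out

x = [float(k) for k in range(10)]
y = [1.0] * 10
reference = [2.0 * xi + yi for xi, yi in zip(x, y)]

# SVE allows implementations from 128 to 2048 bits; the A64FX uses 512.
for bits in (128, 256, 512, 2048):
    assert daxpy_vla(2.0, x, y, bits) == reference
```

The same loop gives the same answer at every width, with the predicate handling the leftover elements at the end, so nothing has to be rewritten for a chip with wider or narrower vectors.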
 
I always find it believable when some company out of nowhere says they came up with consumer tech far better than that of companies that have been doing it for the last 40-50 years.


When that company is Fujitsu, you may find it a bit more believable. After all, they're the only company that kept high-performance SPARC chips from going into the dumpster early (after Sun Microsystems hit the wall on things).

They've just ditched SPARC for ARM, but you can believe the performance claims. The A64FX was the first ARM chip to ship with Scalable Vector Extension support, and I'm sure their second-generation chip has even higher density.
 
So, how can we consumers benefit?
We probably won't, since just like the vector CPUs NEC uses in its computers in Japan, they are designed for stuff we generally don't do.

Fujitsu's SPARC64 CPUs were pretty good back in the day actually, I've still got an old rack system with them.

I always find it believable when some company out of nowhere says they came up with consumer tech far better than that of companies that have been doing it for the last 40-50 years.
Fujitsu has been making SPARC CPUs since the 80s.
 
A 7nm processor outperforming a bunch of "12"/14/16nm products? Looking at the Green500, that's what I'm seeing.

As already noted, this is a continuation of existing Japanese supercomputer efforts. Generally decent looking HW; very little traction outside of Japanese government use. Does make for some nice conference and IEEE papers to read, though :)

That being said, it's not chasing the coprocessor approach (Viatech/Centaur) or the accelerator approach (Nvidia, NEC, PEZY, etc.); rather, the A64FX uses ARM's SVE ISA extension to address the execution units' SIMD FPUs (which seem to be capable of handling INT16/INT8 as well). It also uses Fujitsu's own point-to-point interconnect (Tofu, now in its 3rd generation as TofuD) for the inter-node network. Looking at their papers on the Tofu network's topology, it does have a lot of scaling and BW limits, like older implementations of Nvidia's NVLink (Nvidia did bring that older layout back for a 4 node board involving their A100 Tensor Core GPU).

Speaking of Nvidia, I suppose we could throw their Jetson products into the coprocessor approach category. In certain aspects (INT8), it has similar throughput to the A64FX. Obviously, a SIMD approach to FPU/ALU (somewhat similar to the NCORE on Via/Centaur's CHA) does have its advantages in register access and (perhaps?) flexibility.
 
Being on a smaller node for sure helps with power efficiency, to a point... you can make a power hungry 7nm chip if that is the goal of your design, and you can make a power sipping 16nm chip as well. When you're talking about scaling out to 1,000s of chips, small power design choices can have a massive impact.
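To put the scaling point in numbers, here is a back-of-the-envelope sketch. Every figure in it is hypothetical and chosen only to make the arithmetic easy to follow; none of them are measurements of the A64FX or any other chip.

```python
def gflops_per_watt(rmax_gflops, power_watts):
    """Green500-style efficiency: sustained GFLOPS divided by watts drawn."""
    return rmax_gflops / power_watts

def fleet_power_kw(chips, watts_per_chip):
    """Total power in kW for a machine built from identical chips."""
    return chips * watts_per_chip / 1000.0

# Hypothetical machine: 100,000 chips, 200 PFLOPS sustained either way.
chips = 100_000
rmax_gflops = 200_000_000

baseline_kw = fleet_power_kw(chips, 120.0)   # hypothetical 120 W per chip
tuned_kw = fleet_power_kw(chips, 110.0)      # same chip trimmed by 10 W
saved_kw = baseline_kw - tuned_kw

print(f"saved {saved_kw:.0f} kW across the machine")
print(f"{gflops_per_watt(rmax_gflops, baseline_kw * 1000):.1f} -> "
      f"{gflops_per_watt(rmax_gflops, tuned_kw * 1000):.1f} GFLOPS/W")
```

A 10 W trim per chip is small on its own, but multiplied by 100,000 chips it is a megawatt, and it moves the performance-per-watt figure by a visible amount.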

Anyway, to say these will gain no traction outside of Japan is not right. Fujitsu has signed a deal with Cray, and Cray has already announced early test bed sales to Los Alamos National Laboratory, Oak Ridge National Laboratory, Stony Brook University, and the University of Bristol. I believe Cray officially launches them this fall... it will be interesting to see if they get any big install wins.

Of all the ARM supercomputer plays of the last few years, this one has the best chance of success, as ARM themselves are heavily invested in the software ecosystem. Not having to reinvent the wheel software wise, and having unified, mature compiler support, makes this imo the first real play at ARM supercomputing. Every previous ARM supercomputer chip was just a standard ARM ISA licence with a many-core design and proprietary bits that made software more expensive and made overall costs higher than with traditional ISAs (and meant betting millions of dollars, in some cases, on a chip package that could end up with zero support when it was abandoned by the manufacturer). In this case, being an official ARM HPC ISA... even if Fujitsu were to leave the market (which is doubtful, they have been making HPC chips since the 80s), ARM will be licensing SVE to other manufacturers.
 
ARM isn't Japanese
 
Impressed?

Not really. The article title is clickbait (the thread title more so). The HPC market is a very specialized area, and this is old news in that area.
 