IBM Squeezes AI Into Tiny Scalable Cores

AlphaAtlas · Oct 10, 2018

At VLSI 2018, IBM showed off an interesting machine learning architecture. Instead of squeezing a ton of throughput through huge cores, as AMD, Nvidia, and Google do with their AI products, IBM tiled tiny cores in a giant 16x32 2D array. David Kanter says that each "PE is a tiny core; including an instruction buffer, fetch/decode stage, 16-entry register file, 16-bit floating-point execution units, and binary and ternary ALUs, and fabric links to and from the neighboring PEs." There are also special function units (SFUs) designed to handle 32-bit floating-point data, and separate X and Y scratchpad SRAMs with 192 GB/s of bandwidth each. Much like Intel's Skylake Xeons, the cores are connected to each other with a mesh fabric. A test accelerator IBM made reportedly offered 1.5 TFLOPS of machine learning training throughput on a 9 mm² chip made on a 14 nm process, and achieved 95% utilization when training on a batch of images.
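To make the mesh numbers a little more concrete, here is a rough Python model of a 16x32 grid of PEs with links only between north/south/east/west neighbors. It assumes a plain (non-wrapping) mesh, dimension-order XY routing, and uniform all-to-all traffic; none of those details come from IBM, and `xy_hops`, `neighbors`, and `mesh_stats` are hypothetical helper names for illustration only.

```python
from itertools import product

# Array size quoted in the article; everything else here is an assumption.
ROWS, COLS = 16, 32


def xy_hops(src, dst):
    """Hop count under dimension-order (XY) routing on a plain 2D mesh,
    which equals the Manhattan distance between the two PEs."""
    (r0, c0), (r1, c1) = src, dst
    return abs(r0 - r1) + abs(c0 - c1)


def neighbors(r, c, rows=ROWS, cols=COLS):
    """Yield the PEs directly linked to (r, c): north, south, west, east.
    Edge PEs have fewer links because this mesh does not wrap around."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def mesh_stats(rows=ROWS, cols=COLS):
    """Return (PE count, link count, diameter in hops, mean hops between distinct PEs)."""
    pes = list(product(range(rows), range(cols)))
    n = len(pes)
    # Each bidirectional link is seen from both ends, so halve the sum.
    links = sum(len(list(neighbors(r, c, rows, cols))) for r, c in pes) // 2
    diameter = xy_hops((0, 0), (rows - 1, cols - 1))
    # Uniform all-to-all traffic: self-pairs contribute 0 hops to the total.
    total = sum(xy_hops(a, b) for a in pes for b in pes)
    mean = total / (n * (n - 1))
    return n, links, diameter, mean


if __name__ == "__main__":
    n, links, diameter, mean = mesh_stats()
    print(n, links, diameter, round(mean, 2))  # 512 976 46 16.0
```

Under those assumptions, a neighbor-to-neighbor transfer costs one hop, a corner-to-corner transfer costs 46, and a random pair of PEs averages 16, which is why keeping traffic local matters so much on a mesh.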

For a research project, absolute performance is not terribly important. However, the key architectural choices are quite interesting. IBM's processor uses a large array of very small processor cores with very little SIMD. This architectural choice enables better performance for sparse dataflow (e.g., sparse activations in a neural network). In contrast, Google, Intel, and Nvidia all rely on a small number of large cores with lots of dense data parallelism to achieve good performance. Relatedly, IBM's PEs are arranged in a 2D array with a mesh network, a natural organization for planar silicon and for a workload with a reasonable degree of locality. While Intel processors also use a mesh fabric for inter-core communication, GPUs have a rather different architecture that looks more like a crossbar. The IBM PEs are optimized for common operations (e.g., multiply-accumulate) and are sufficiently programmable to support different dataflows and reuse patterns. Less common operations are performed outside of the core in the special function units. As with many machine learning processors, a variety of reduced-precision data formats are used to improve throughput. Last, the processor relies on software-managed data (and instruction) movement in explicitly addressed SRAMs, rather than hardware-managed caches. This approach is similar to the Cell processor and offers superior flexibility and power efficiency (compared to caches) at the cost of significant programmer and toolchain complexity. While not every machine learning processor will share all these attributes, IBM's design certainly illustrates an approach different from that of any of the incumbents, and one more consistent with the architectures chosen by start-ups such as Graphcore or Wave that focus solely on machine learning and neural networks.
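The sparse-dataflow argument can be illustrated with a toy count of wasted lanes. The sketch below assumes a SIMD unit can skip a chunk only when every lane in it is zero, while a tiny scalar core skips each zero individually; real hardware (IBM's PEs included) may handle zeros differently, and `zero_skipping_dot` and `lane_utilization` are hypothetical names, not anything from IBM's design.

```python
import random


def zero_skipping_dot(acts, weights):
    """Scalar-PE style dot product: test each activation and issue a
    multiply-accumulate only when it is nonzero. Returns (result, macs_issued)."""
    acc, macs = 0, 0
    for a, w in zip(acts, weights):
        if a != 0:
            acc += a * w
            macs += 1
    return acc, macs


def lane_utilization(acts, width):
    """Fraction of issued lane-slots that do useful (nonzero) work.

    Assumes a `width`-wide SIMD unit can skip a chunk only when every lane in
    it is zero; a partial final chunk still occupies all `width` lanes.
    width=1 models a tiny scalar core that skips zeros one at a time.
    """
    useful = issued_slots = 0
    for i in range(0, len(acts), width):
        nonzero = sum(1 for a in acts[i:i + width] if a != 0)
        if nonzero:
            useful += nonzero
            issued_slots += width
    return useful / issued_slots if issued_slots else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic ReLU-style activations: about half are exactly zero.
    # Illustrative only, not measured network data.
    acts = [max(0.0, rng.gauss(0.0, 1.0)) for _ in range(4096)]
    for width in (1, 4, 16, 64):
        print(f"width {width:2d}: lane utilization {lane_utilization(acts, width):.3f}")
```

With roughly half the activations zero, the scalar case stays at full utilization while wider units drift toward the fraction of nonzeros (about 0.5), because nearly every wide chunk contains at least one nonzero value. The model ignores the control overhead of testing each value, which is part of the price tiny cores pay for that flexibility.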

nutzo · Oct 10, 2018

I, for one, support our tiny scalable overlords.

sfsuphysics · Oct 10, 2018

So at what point do we just roll over and die? Now, I know they use AI as more of a buzzword, similar to how they use "nano" everywhere, but still. We think we can control the beast.

Deleted member 83233 · Oct 10, 2018

sfsuphysics said:
So at what point do we just roll over and die? Now, I know they use AI as more of a buzzword, similar to how they use "nano" everywhere, but still. We think we can control the beast.

We just have to have a smart, yet fully subservient AI that can go to war with the rogue AI if it ever comes up.

We have to pit the safe machines against the naughty and willful machines. We may lose, but maybe it'll give us time to upload our consciousness into a little box and ship it out into space.

Or maybe we have to merge with the machines, and become a war-faring tyrannical bio-mechanical race known as the Strogg.

At some point we'll just have to stop enjoying meat-life, and shift our interests elsewhere. In the meantime, I'll enjoy the decadence of the meat.

KazeoHin · Oct 10, 2018

AI, as it is used today, is merely an imitator. It imitates extremely specific actions well, based on "what was done correctly before". We are nowhere near a GAI, or General Artificial Intelligence, which can learn new actions and make unprecedented decisions without specific instructions.

Deleted member 83233 · Oct 10, 2018

KazeoHin said:
AI, as it is used today, is merely an imitator. It imitates extremely specific actions well, based on "what was done correctly before". We are nowhere near a GAI, or General Artificial Intelligence, which can learn new actions and make unprecedented decisions without specific instructions.

Yeah, yeah, but what fun is that to talk about?

R_Type · Oct 10, 2018

I think we're not far from a point where GPUs become too cumbersome for machine learning and custom dedicated chips rule, like in mining.

N4CR · Oct 10, 2018

R_Type said:
I think we're not far from a point where GPUs become too cumbersome for machine learning and custom dedicated chips rule, like in mining.

Decent SoCs are finally arriving for that market, so you are right. It's like ASICs all over again, but this time they won't be forked out, because the data is the value, not some bitsheckel that requires acceptance.

lostinseganet · Oct 10, 2018

Isn't this what the Cell processor for the Sony PlayStation was? SPEs and PPEs.

andrewaggb · Oct 10, 2018

R_Type said:
I think we're not far from a point where GPUs become too cumbersome for machine learning and custom dedicated chips rule, like in mining.

Guess it depends. Right now, Nvidia also just works out of the box with everything, and they've been adding dedicated "tensor cores" etc. as well. I suspect they won't be kicked to the curb anytime soon for general-purpose machine learning.

Nobu · Oct 10, 2018

lostinseganet said:
Isn't this what the Cell processor for the Sony PlayStation was? SPEs and PPEs.

Something like that (it's even mentioned in the article), but I'm not sure the details are exactly the same. The Cell is partly an IBM design (co-developed with Sony and Toshiba), fwiw.

clockdogg · Oct 11, 2018

"IBM Squeezes AI Into Tiny Scalable Cores"

That's great. But what happens when the AI starts to think big? What will those big AI thoughts think about being deliberately squeezed into tiny cores? Sounds like they need a big containment plan, or it could be big trouble in little cores.
