IBM Squeezes AI Into Tiny Scalable Cores

AlphaAtlas

At VLSI 2018, IBM showed off an interesting machine learning architecture. Instead of squeezing a ton of throughput through huge cores, as AMD, Nvidia, and Google do with their AI products, IBM tiled tiny cores in a giant 16x32 2D array. David Kanter says that each "PE is a tiny core, including an instruction buffer, fetch/decode stage, 16-entry register file, a 16-bit floating-point execution unit, binary and ternary ALUs, and fabric links to and from the neighboring PEs." There are also SFUs designed to handle 32-bit floating-point data, and separate X and Y caches with 192 GB/s of bandwidth each. Much like Intel's Skylake Xeons, the cores are connected to each other with a mesh fabric. A test accelerator IBM made reportedly offered 1.5 TFLOPS of machine learning training throughput on a 9 mm^2 chip made on a 14 nm process, and achieved 95% utilization when training on a batch of images.
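
For a rough sense of scale, the 16x32 grid works out to 512 PEs, so the reported 1.5 TFLOPS comes to roughly 3 GFLOPS per PE. The quick Python sanity check below assumes one FP16 fused multiply-add (2 FLOPs) per PE per cycle at peak; that per-cycle rate and the clock it implies are my own back-of-the-envelope assumptions, not figures from the paper.

```python
# Back-of-the-envelope check on the reported numbers (assumptions flagged below).
GRID_X, GRID_Y = 16, 32              # reported 16x32 2D array of PEs
NUM_PES = GRID_X * GRID_Y            # 512 tiny cores

PEAK_TFLOPS = 1.5                    # reported training throughput
DIE_AREA_MM2 = 9.0                   # reported die area, 14 nm process

flops_per_pe = PEAK_TFLOPS * 1e12 / NUM_PES        # ~2.9 GFLOPS per PE
# Assumption (mine, not IBM's): each PE retires one FP16 FMA = 2 FLOPs per cycle.
implied_clock_ghz = flops_per_pe / 2 / 1e9          # ~1.5 GHz implied clock
area_efficiency = PEAK_TFLOPS * 1e12 / DIE_AREA_MM2 / 1e9   # GFLOPS per mm^2

print(f"PEs: {NUM_PES}")
print(f"Per-PE peak: {flops_per_pe / 1e9:.2f} GFLOPS")
print(f"Implied clock at 1 FMA/PE/cycle: {implied_clock_ghz:.2f} GHz")
print(f"Area efficiency: {area_efficiency:.0f} GFLOPS/mm^2")
```

An implied clock in the 1.5 GHz range is plausible for a 14 nm design, which makes the "one MAC per tiny core per cycle" reading self-consistent, but again the per-cycle throughput and clock here are guesses rather than disclosed figures.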

As a research project, the absolute performance is not terribly important. However, the key architectural choices are quite interesting. IBM's processor uses a large array of very small processor cores with very little SIMD. This architectural choice enables better performance for sparse dataflow (e.g., sparse activations in a neural network). In contrast, Google, Intel, and Nvidia all rely on a small number of large cores with lots of dense data parallelism to achieve good performance. Related, IBM's PEs are arranged in a 2D array with a mesh network, a natural organization for planar silicon and a workload with a reasonable degree of locality. While Intel processors also use a mesh fabric for inter-core communication, GPUs have a rather different architecture that looks more similar to a crossbar. The IBM PEs are optimized for common operations (e.g., multiply-accumulate) and sufficiently programmable to support different dataflows and reuse patterns. Less common operations are performed outside of the core in the special function units. As with many machine learning processors, a variety of reduced precision data formats are used to improve throughput. Last, the processor relies on software-managed data (and instruction) movement in explicitly addressed SRAMs, rather than hardware-managed caches. This approach is similar to the Cell processor and offers superior flexibility and power-efficiency (compared to caches) at the cost of significant programmer and tool chain complexity. While not every machine learning processor will share all these attributes, it certainly illustrates a different approach from any of the incumbents - and more consistent with the architectures chosen by start-ups such as Graphcore or Wave that solely focus on machine learning and neural networks.
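
To make the combination of ideas in that last paragraph concrete (a 2D mesh of tiny MAC-centric PEs, nearest-neighbor data movement, reduced-precision operands, and software-managed local storage), here is a toy Python model of a weight-stationary matrix-vector product mapped onto such a mesh. The 4x4 grid, the skewed-wavefront timing, and the FP16-multiply/FP32-accumulate recipe are all illustrative assumptions on my part; IBM's PEs are programmable, and the paper does not describe this exact dataflow.

```python
import numpy as np

# Toy model of a weight-stationary matrix-vector product on a 2D mesh of PEs.
# Grid size, dataflow, and number formats are illustrative choices, not IBM's.
ROWS, COLS = 4, 4                    # small for readability (IBM's array is 16x32)

rng = np.random.default_rng(0)
W = rng.standard_normal((ROWS, COLS)).astype(np.float16)   # stationary weights, one per PE
x = rng.standard_normal(COLS).astype(np.float16)           # activations fed in at the north edge

# Per-PE state, standing in for the explicitly addressed local SRAM:
act  = np.zeros((ROWS, COLS), dtype=np.float32)   # activation last received from the north
psum = np.zeros((ROWS, COLS), dtype=np.float32)   # partial sum last received from the west
y    = np.zeros(ROWS, dtype=np.float32)           # results collected at the east edge

for cycle in range(ROWS + COLS - 1):
    prev_act, prev_psum = act.copy(), psum.copy()   # neighbors see last cycle's values
    for r in range(ROWS):
        for c in range(COLS):
            if r + c != cycle:
                continue                 # skewed wavefront: PE (r, c) fires on cycle r + c
            a = x[c] if r == 0 else prev_act[r - 1, c]    # one hop from the north
            p = 0.0  if c == 0 else prev_psum[r, c - 1]   # one hop from the west
            p += float(W[r, c]) * float(a)   # FP16 operands, wider accumulation
            act[r, c], psum[r, c] = a, p
            if c == COLS - 1:
                y[r] = p                 # row result leaves the east edge

print("mesh result:", y)
print("reference  :", W.astype(np.float32) @ x.astype(np.float32))
```

The point is the structure rather than the numbers: each PE touches only its own local state plus one value from its north neighbor and one from its west neighbor, which is exactly the kind of locality a planar mesh with explicitly managed SRAM is built to exploit.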
 
So at what point do we just roll over and die? Now I know they use AI as more of a buzzword, similar to how they use "nano" everywhere, but still. We think we can control the beast.
 

We just have to have a smart, yet fully subservient AI that can go to war with the rogue AI if it ever comes up. :D We have to pit the safe machines against the naughty and willful machines. We may lose, but maybe it'll give us time to upload our consciousness into a little box and ship it out into space. ;)

Or maybe we have to merge with the machines, and become a war-faring tyrannical bio-mechanical race known as the Strogg.

At some point we'll just have to stop enjoying meat-life, and shift our interests elsewhere. In the meantime, I'll enjoy the decadence of the meat.
 
AI as it is used today is merely an imitator. It imitates extremely specific actions well, based on "what was done correctly before". We are nowhere near AGI, or artificial general intelligence, which can learn new actions and make unprecedented decisions without specific instructions.
 

Yeah yeah, but what fun is that to talk about? ;)
 
I think we're not far from the point where GPUs become too cumbersome for machine learning and custom dedicated chips rule, like in mining.
 
Decent SoCs are finally arriving for that market, so you are right. Like ASICs all over again, but this time they won't be forked out, because the data is the value, not some bitsheckel that requires acceptance.
 

Guess it depends. Right now Nvidia also just works with everything stock, and they've been adding dedicated "tensor cores" etc. as well. I suspect they won't be kicked to the curb anytime soon for general-purpose machine learning.
 
Isn't this what the Cell processor for the Sony PlayStation was? SPEs and PPEs?
Something like that (it's even mentioned in tfa), but I'm not sure the details are exactly the same. The Cell is an IBM design, fwiw.
 
"IBM Squeezes AI Into Tiny Scalable Cores"

That's great. But what happens when the AI starts to think big? What will those big AI thoughts think about being squeezed deliberately into tiny cores? Sounds like they need a big containment plan, or it could be big trouble in little cores.
 