IBM Squeezes AI Into Tiny Scalable Cores

Discussion in '[H]ard|OCP Front Page News' started by AlphaAtlas, Oct 10, 2018.

  1. AlphaAtlas

    AlphaAtlas Gawd Staff Member

    Messages:
    548
    Joined:
    Mar 3, 2018
    At VLSI 2018, IBM showed off an interesting machine learning architecture. Instead of squeezing a ton of throughput through huge cores, as AMD, Nvidia, and Google do with their AI products, IBM tiled tiny cores in a giant 16x32 2D array. David Kanter says that each "PE is a tiny core; including an instruction buffer, fetch/decode stage, 16-entry register file, a 16-bit floating-point execution unit, binary and ternary ALUs, and fabric links to and from the neighboring PEs." There are also SFUs designed to handle 32-bit floating-point data, and separate X and Y caches with 192GB/s of bandwidth each. Much like Intel's Skylake Xeons, the cores are connected to each other with a mesh fabric. A test accelerator IBM made reportedly delivered 1.5 Tflops of machine learning training throughput on a 9mm^2 chip built on a 14nm process, and achieved 95% utilization when training on a batch of images.
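
    For a rough sense of scale, here's a quick back-of-the-envelope pass at those numbers in Python. The one-FMA-per-PE-per-cycle assumption (and the clock speed that falls out of it) is mine, not a figure from IBM or Kanter:

    Code:
    # Back-of-envelope numbers from the figures quoted above. Assumptions (mine,
    # not from the article): "1.5 Tflops" counts FP16 training flops and each PE
    # issues one fused multiply-add (2 flops) per cycle, so the implied clock is
    # purely illustrative.

    PES = 16 * 32                    # 512 PEs in the 2D array
    PEAK_FLOPS = 1.5e12              # reported ML training throughput
    DIE_AREA_MM2 = 9.0               # 14nm test chip
    SRAM_BW = 2 * 192e9              # X + Y scratchpads, bytes per second

    flops_per_pe = PEAK_FLOPS / PES                 # ~2.9 Gflop/s per PE
    est_clock_ghz = flops_per_pe / 2 / 1e9          # ~1.5 GHz if 1 FMA/PE/cycle (assumed)
    areal_eff = PEAK_FLOPS / DIE_AREA_MM2 / 1e9     # ~167 Gflop/s per mm^2
    flops_per_byte = PEAK_FLOPS / SRAM_BW           # ~3.9 flops per byte of scratchpad traffic

    print(f"{flops_per_pe / 1e9:.2f} Gflop/s per PE")
    print(f"~{est_clock_ghz:.2f} GHz implied clock (if 1 FMA/PE/cycle)")
    print(f"{areal_eff:.0f} Gflop/s per mm^2")
    print(f"{flops_per_byte:.1f} flops per byte of scratchpad bandwidth")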

    As a research project, the absolute performance is not terribly important. However, the key architectural choices are quite interesting. IBM's processor uses a large array of very small processor cores with very little SIMD. This architectural choice enables better performance for sparse dataflow (e.g., sparse activations in a neural network). In contrast, Google, Intel, and Nvidia all rely on a small number of large cores with lots of dense data parallelism to achieve good performance. Relatedly, IBM's PEs are arranged in a 2D array with a mesh network, a natural organization for planar silicon and for a workload with a reasonable degree of locality. While Intel processors also use a mesh fabric for inter-core communication, GPUs have a rather different architecture that looks more like a crossbar. The IBM PEs are optimized for common operations (e.g., multiply-accumulate) and are sufficiently programmable to support different dataflows and reuse patterns. Less common operations are performed outside of the core in the special function units. As with many machine learning processors, a variety of reduced-precision data formats are used to improve throughput. Lastly, the processor relies on software-managed data (and instruction) movement in explicitly addressed SRAMs, rather than hardware-managed caches. This approach is similar to the Cell processor and offers superior flexibility and power efficiency (compared to caches) at the cost of significant programmer and toolchain complexity. While not every machine learning processor will share all these attributes, it certainly illustrates a different approach from any of the incumbents, and one more consistent with the architectures chosen by start-ups such as Graphcore or Wave that focus solely on machine learning and neural networks.
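
    To put the sparse-dataflow point in toy form: a tiny scalar PE can simply skip zero activations, while a wide SIMD engine crunches the whole dense vector regardless. A quick numpy sketch (purely illustrative, with made-up sizes and sparsity, not IBM's actual dataflow):

    Code:
    import numpy as np

    rng = np.random.default_rng(0)
    # Arbitrary 512-element layer with ~70% of activations zeroed out,
    # roughly what ReLU layers can produce.
    activations = rng.random(512) * (rng.random(512) > 0.7)
    weights = rng.random(512)

    # Dense, SIMD-style: every lane does a MAC whether or not the activation is zero.
    dense_macs = activations.size
    dense_result = np.dot(activations, weights)

    # Sparse, tiny-core-style: a scalar PE only issues a MAC for non-zero activations,
    # so work scales with the number of non-zeros instead of the vector length.
    nonzero = np.flatnonzero(activations)
    sparse_macs = nonzero.size
    sparse_result = np.dot(activations[nonzero], weights[nonzero])

    assert np.isclose(dense_result, sparse_result)
    print(f"dense MACs: {dense_macs}, sparse MACs: {sparse_macs} "
          f"({sparse_macs / dense_macs:.0%} of the dense work)")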
     
  2. nutzo

    nutzo [H]ardness Supreme

    Messages:
    7,019
    Joined:
    Feb 15, 2004
    I for one support our tiny scalable overlords :D
     
    jnemesh, N4CR, KazeoHin and 1 other person like this.
  3. sfsuphysics

    sfsuphysics I don't get it

    Messages:
    13,164
    Joined:
    Jan 14, 2007
    So at what point do we just roll over and die? Now, I know they use AI as more of a buzzword, similar to how they use nano everywhere, but still. We think we can control the beast.
     
  4. J3RK

    J3RK [H]ardForum Junkie

    Messages:
    8,622
    Joined:
    Jun 25, 2004
    We just have to have a smart, yet fully subservient AI that can go to war with the rogue AI if it ever comes up. :D We have to pit the safe machines against the naughty and willful machines. We may lose, but maybe it'll give us time to upload our consciousness into a little box and ship it out into space. ;)

    Or maybe we have to merge with the machines, and become a war-faring tyrannical bio-mechanical race known as the Strogg.

    At some point we'll just have to stop enjoying meat-life, and shift our interests elsewhere. In the meantime, I'll enjoy the decadence of the meat.
     
  5. KazeoHin

    KazeoHin [H]ardness Supreme

    Messages:
    7,523
    Joined:
    Sep 7, 2011
    AI as it is used today is merely an imitator. It imitates extremely specific actions well, based on "what was done correctly before". We are nowhere near a GAI, or General Artificial Intelligence, which can learn new actions and make unprecedented decisions without specific instructions.
     
    BlueFireIce and N4CR like this.
  6. J3RK

    J3RK [H]ardForum Junkie

    Messages:
    8,622
    Joined:
    Jun 25, 2004
    Yeah yeah, but what fun is that to talk about? ;)
     
  7. R_Type

    R_Type [H]Lite

    Messages:
    98
    Joined:
    Mar 11, 2018
    I think we're not far from a point where GPUs become too cumbersome for machine learning and custom dedicated chips rule, like in mining.
     
    N4CR likes this.
  8. N4CR

    N4CR 2[H]4U

    Messages:
    2,822
    Joined:
    Oct 17, 2011
    Decent SoCs are finally arriving for that market, so you're right. It's like ASICs all over again, but this time they won't be forked out, because the data is the value, not some bitshekel that requires acceptance.
     
    R_Type likes this.
  9. lostinseganet

    lostinseganet Gawd

    Messages:
    1,013
    Joined:
    Oct 8, 2008
    Isn't this what the Cell processor for the Sony PlayStation was? SPEs and PPEs?
     
  10. andrewaggb

    andrewaggb Limp Gawd

    Messages:
    312
    Joined:
    Oct 6, 2004
    Guess it depends. Right now Nvidia also just works with everything stock, and they've been adding the dedicated 'tensor cores' etc. as well. I suspect they won't be kicked to the curb anytime soon for general-purpose machine learning.
     
    Last edited: Oct 10, 2018
  11. Nobu

    Nobu 2[H]4U

    Messages:
    2,084
    Joined:
    Jun 7, 2007
    Something like that (it's even mentioned in tfa), but I'm not sure the details are exactly the same. The Cell is an IBM design, fwiw.
     
  12. clockdogg

    clockdogg Gawd

    Messages:
    590
    Joined:
    Dec 12, 2007
    "IBM Squeezes AI Into Tiny Scalable Cores"

    That's great. But what happens when the AI starts to think big? What will those big AI-thoughts think about being squeezed deliberately into tiny cores? Sounds like they need a big containment plan, or it could be big trouble in little cores.