Deep learning is big business these days, and Nvidia is currently a major leader in the field. But more and more companies are showing off alternatives to GPU-like architectures for AI processing. IBM unveiled a concept for a tile-based neural net system earlier this month, and according to a report in Spectrum, Flex Logix appears to be taking a similar approach. GPUs optimized for machine learning rely on heaps of external DRAM chips to feed the processors, and Flex Logix finds that approach inefficient. Instead, its architecture spreads SRAM throughout relatively compact "tiles" linked by a special interconnect, which reduces the processor's reliance on an external DRAM bus.
The tiles for Flex Logix's AI offering, called NMAX, each take up less than 2 square millimeters using TSMC's 16-nanometer technology. Each tile is made up of a set of cores that perform the critical multiply-and-accumulate computation, programmable logic to control the processing and flow of data, and SRAM. Three different types of interconnect technology are involved: one links all the pieces on a tile together, another connects the tile to additional SRAM located between the tiles and to external DRAM, and the third connects adjacent tiles to each other. True apples-to-apples comparisons in deep learning are hard to come by, but Flex Logix's analysis comparing a simulated 6 x 6-tile NMAX512 array with one DRAM chip against an Nvidia Tesla T4 with eight DRAMs showed the new architecture identifying 4,600 images per second versus Nvidia's 3,920. The same size NMAX array hit 22 trillion operations per second on a real-time video processing test called YOLOv3 while using one-tenth the DRAM bandwidth of other systems. The designs for the first NMAX chips will be sent to the foundry for manufacture in the second half of 2019, says Flex Logix CEO Geoff Tate.
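For readers unfamiliar with the "critical multiply and accumulate computation" mentioned above: virtually all neural-net inference boils down to dot products, where each weight is multiplied by an activation and the products are summed into an accumulator. The sketch below is a generic illustration of that operation, not Flex Logix's actual hardware design; the function name is made up for the example.

```python
# Multiply-accumulate (MAC): the elementary operation that dedicated
# AI cores (and GPU tensor units) execute billions of times per second.
# Each neuron output is a dot product of weights and input activations.
def mac_dot(weights, activations):
    acc = 0  # the "accumulate" register
    for w, a in zip(weights, activations):
        acc += w * a  # one multiply-accumulate step per element
    return acc

# A tiny layer: 3 inputs feeding one neuron.
print(mac_dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The architectural question the article raises is not the arithmetic itself but where the weights and activations live: fetching them from external DRAM costs far more energy and bandwidth than reading them from on-chip SRAM sitting next to the compute cores, which is the rationale behind the tiled NMAX layout.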