Deep learning is big business these days, and Nvidia is currently a major leader in the field. But more and more companies are showing off alternatives to GPU-like architectures for AI processing. IBM unveiled a concept for a tile-based neural net system earlier this month, and according to a report in Spectrum, Flex Logix appears to be taking a similar approach. GPUs optimized for machine learning rely on heaps of external DRAM chips to feed the processors, and Flex Logix finds that approach inefficient. Instead, its architecture spreads SRAM throughout relatively compact "tiles" linked by a special interconnect, which reduces the processor's reliance on an external DRAM bus.
The tiles for Flex Logix's AI offering, called NMAX, each take up less than 2 square millimeters using TSMC's 16-nanometer technology. Each tile is made up of a set of cores that perform the critical multiply-and-accumulate computation, programmable logic to control the processing and flow of data, and SRAM. Three different types of interconnect technology are involved: one links all the pieces on a tile together, another connects the tile to additional SRAM located between the tiles and to external DRAM, and the third connects adjacent tiles to each other. True apples-to-apples comparisons in deep learning are hard to come by, but Flex Logix's analysis comparing a simulated 6 x 6-tile NMAX512 array with one DRAM chip against an Nvidia Tesla T4 with eight DRAMs showed the new architecture identifying 4,600 images per second versus Nvidia's 3,920. The same size NMAX array hit 22 trillion operations per second on a real-time video processing test called YOLOv3 while using one-tenth the DRAM bandwidth of other systems. The designs for the first NMAX chips will be sent to the foundry for manufacture in the second half of 2019, says Flex Logix CEO Geoff Tate.
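For readers unfamiliar with the "critical multiply and accumulate computation" mentioned above: virtually all neural-net inference boils down to dot products, where each weight is multiplied by an activation and the products are summed into an accumulator. The sketch below is a generic illustration of that operation, not Flex Logix's actual hardware design; the function name is made up for the example.

```python
# Multiply-accumulate (MAC): the elementary operation that dedicated
# AI cores (and GPU tensor units) execute billions of times per second.
# Each neuron output is a dot product of weights and input activations.
def mac_dot(weights, activations):
    acc = 0  # the "accumulate" register
    for w, a in zip(weights, activations):
        acc += w * a  # one multiply-accumulate step per element
    return acc

# A tiny layer: 3 inputs feeding one neuron.
print(mac_dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

The architectural question the article raises is not the arithmetic itself but where the weights and activations live: fetching them from external DRAM costs far more energy and bandwidth than reading them from on-chip SRAM sitting next to the compute cores, which is the rationale behind the tiled NMAX layout.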