Like a brick to a house, the chip is the essential building block of any computer system. However, the convolutional neural networks (CNNs) that have revolutionized AI have thus far been built on a hodgepodge of chips originally intended for other purposes. To boost CNN performance, industry leaders and startups alike are now racing to develop specialized AI chips.
GPUs entered the CNN picture in 2012, when AI researchers half-stumbled onto the discovery that the chips' parallel computing abilities made them suitable for AI tasks. Leading GPU maker Nvidia has pushed the limits of GPUs with its latest iteration, the Tesla V100, widely seen as a game changer. But GPUs are just what the name advertises: graphics processing units, designed for graphics-based vector tasks, not for CNNs.
Although GPUs remain today’s top choice for AI computation, are they the future? Probably not. Head of Facebook AI Research Yann LeCun recently told Wired that “there is a lot of headroom for even more specialized chips that are even more efficient.”
Mobile AI chips, for example, are specialized for running AI applications on smartphones, which previously had to connect to cloud servers to perform the advanced tasks those applications demand. Huawei and Apple incorporated such AI chips into their phones this September.
Apple’s A11 Bionic chip includes a neural engine to enable features like Face ID and Animoji. The company lauds the A11 as “the most powerful and smartest chip ever in a smartphone.” Meanwhile, Huawei’s latest SoC (system on a chip), the Kirin 970, also has a neural processing unit, making it faster and more energy efficient than top mobile CPUs.
Kirin 970 is backed by tech from Chinese unicorn Cambricon, which makes AI chips for cloud servers, mobile devices, computer vision applications, and autonomous driving. The company recently raised a staggering US$100 million in its Series A funding round.
Many other AI startups are also focusing on embedded AI chips for devices like cameras, home appliances, and mobile tablets, most of which will become intelligent in the next five to ten years. The stakes are high: last year Intel acquired on-device chip maker Movidius for an estimated US$400 million (the price was not disclosed).
Smart chip companies are now tailoring their designs for specific purposes such as deep learning or neural networks. “Take the example of a surveillance camera that is meant to recognize humans’ faces, it only needs a CNN-based AI chip,” says Frank Lin, co-founder of Silicon Valley-based chip startup Gyrfalcon. In September the company released its first dedicated AI chip, the Lightspeeur 2801S Neural Processor, which combines high performance with an impressive energy efficiency rating of 9.3 TOPS/Watt.
The Lightspeeur incorporates 28,000 parallel computing cores and a new architecture that performs AI computation directly in memory, eliminating data movement, a step that in other architectures consumes heavy power and can result in overheating.
“Overheating is a problem hindering the development of AI chips. You can build a big system to cool a cloud server, but that is not going to work on the scale of device-embedded AI,” says Lin.
Lightspeeur supports CNN, RNN, and LSTM networks, and is especially suitable for CNNs, as it draws only 0.3 watts at 50 MHz while processing 142 frames per second on 224×224×3 images.
Meanwhile at MIT, a project dubbed Eyeriss is addressing the overheating challenge using a different approach. The team's energy-efficient deep CNN chip features a spatial array of 168 processing elements fed by a reconfigurable multicast on-chip network that minimizes data movement by exploiting data reuse. Data gating and compression are used to reduce energy consumption. The chip can run AlexNet at 35 fps while consuming 278 mW, which is 10 times more energy efficient than mobile GPUs.
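To put the quoted power and frame-rate figures on a common footing, the short Python sketch below converts them into energy per frame and frames per joule. It is only back-of-the-envelope arithmetic: the function and variable names are illustrative, the inputs are the figures quoted above (0.3 watts at 142 frames per second for Lightspeeur, 278 milliwatts at 35 fps for Eyeriss), and the two chips were measured on different workloads, so the result is not a like-for-like benchmark.

```python
# Back-of-the-envelope energy arithmetic using only figures quoted in the article.
# All names here are illustrative; the workloads behind the two figures differ,
# so this is not a like-for-like comparison.

CHIPS = {
    "Lightspeeur 2801S": {"power_w": 0.3, "fps": 142.0},   # 0.3 W, 142 frames/s
    "Eyeriss": {"power_w": 0.278, "fps": 35.0},            # 278 mW, 35 frames/s
}


def energy_per_frame_mj(power_w: float, fps: float) -> float:
    """Energy spent per processed frame, in millijoules (W / fps = J per frame)."""
    return power_w / fps * 1000.0


def frames_per_joule(power_w: float, fps: float) -> float:
    """Frames processed per joule of energy (fps / W)."""
    return fps / power_w


def summarize(chips: dict) -> dict:
    """Return both derived metrics for every chip in the input table."""
    return {
        name: {
            "mj_per_frame": energy_per_frame_mj(spec["power_w"], spec["fps"]),
            "frames_per_joule": frames_per_joule(spec["power_w"], spec["fps"]),
        }
        for name, spec in chips.items()
    }


if __name__ == "__main__":
    for name, row in summarize(CHIPS).items():
        print(f"{name}: {row['mj_per_frame']:.2f} mJ/frame, "
              f"{row['frames_per_joule']:.0f} frames/J")
```

On the quoted numbers this works out to roughly 2.1 millijoules per frame for Lightspeeur and roughly 7.9 millijoules per frame for Eyeriss. Budgets this small are what make on-device inference plausible without the cooling systems a cloud server relies on.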
Although widespread implementation of specialized AI chips remains years away, no chip maker wants to fall behind in the race to get there. Says Jianxiong “Professor X” Xiao, founder of autonomous driving company AutoX, “Using a general-purpose chip is just a waste of resources. If an ASIC (application-specific integrated circuit) or FPGA (field-programmable gate array) chip is specialized to run convolution, the results will be much better.”