AI Hardware Update - Differentiating AI Inference Accelerator Chips

Al Gharakhanian · Apr 21, 2019

I keep track of approximately 50 companies that are building deep neural network (DNN) accelerator chips, and I have been criticized for not having a complete list. While there is no doubt that the market for deep learning chips is growing, and will keep growing, by leaps and bounds, it is hard to imagine more than a dozen viable players with staying power. Clearly the segment will go through consolidation, and unfortunately many vendors will not survive.

Marketing 101 teaches us that superior performance alone is not sufficient to win in a competitive environment. Product positioning, differentiation, and promotion are the factors that set winners apart. While a superior architecture, tight integration with hierarchical memory, and advanced process nodes are critical to having an edge in the market, there are many other areas that chip vendors can exploit to differentiate.

This leads us to the concept of “approximation” as a means of product differentiation. In a nutshell, one can view approximation as doing more with the resources at hand, or doing the same with fewer resources. Finding ways to eliminate redundant connections, weights, and neurons in DNNs, in addition to using the most efficient quantization (numerical representation format) schemes, can go a long way toward minimizing power dissipation, latency, die area, and memory footprint in inference accelerators. The term “DNN approximation” is a catch-all phrase encompassing all of the above. To be clear, DNN approximation is less desirable in training, since typically nothing is spared to maximize the accuracy of DNNs during training. In contrast, using aggressive approximation techniques in inference DNNs makes a lot of sense, since low power dissipation, latency, cost, and memory footprint are huge factors in inference settings, especially in edge applications.
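To make the two ideas above concrete, here is a minimal Python sketch of weight pruning and quantization. It is an illustration under stated assumptions, not how any particular accelerator or framework does it: the weight values, the pruning threshold, and the function names (`prune_by_magnitude`, `quantize_int8`, `dequantize`) are all made up for this example, and symmetric 8-bit linear quantization is just one of many possible numerical formats.

```python
def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize_int8(weights):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.03, 0.41, 0.007, -1.27, 0.05]

pruned = prune_by_magnitude(weights, threshold=0.1)  # small weights become 0
q, scale = quantize_int8(pruned)                     # 8-bit codes plus one scale
restored = dequantize(q, scale)                      # approximate originals

print(pruned)
print(q)
print(restored)
```

In this toy case half of the weights become zero and can be skipped, and each surviving weight is stored in 8 bits instead of a 32-bit float, at the cost of a small rounding error of at most half a quantization step.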

To appreciate the benefits of approximation, consider the following example. The original LeNet-5 MNIST (handwritten digit) classification task required nearly 700k arithmetic operations per classification. Fast forward, and the more recent VGG16 model for classifying ImageNet required nearly 35G arithmetic operations per classification. The good news is that image classification models are getting better; the bad news is that their computational demands will continue to grow by leaps and bounds.
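A quick back-of-the-envelope calculation shows the size of that jump. The sketch below uses only the two approximate operation counts quoted above; the variable names are illustrative.

```python
# Approximate operation counts per classification, as quoted in the text.
lenet5_ops = 700_000          # ~700k operations (LeNet-5 on MNIST)
vgg16_ops = 35_000_000_000    # ~35G operations (VGG16 on ImageNet)

growth = vgg16_ops / lenet5_ops
print(f"VGG16 needs roughly {growth:,.0f}x the operations of LeNet-5")
```

That is a growth of roughly 50,000 times in the arithmetic work per classification.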

So, what does approximation have to do with product differentiation?
