
Chips for Machine Learning (part I)

Al Gharakhanian

New member
We are at an early stage of having chips earmarked for Machine Learning (ML), but there are already a number of companies that have (or are about to have) products in this domain. My goal here is to take a market snapshot of available products addressing this nascent market. First, I would like to make a few clarifications about the characteristics of a typical ML processor.

The basic architecture of processors intended for deep learning is SIMD (Single Instruction, Multiple Data), so you can expect a large number of on-chip processing blocks running the same code simultaneously.
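To make the SIMD idea concrete, here is a minimal sketch in Python/NumPy: one operation (a multiply-add) issued over many data elements at once, rather than once per element. The array sizes and values are illustrative, not tied to any particular chip.

```python
import numpy as np

a = np.arange(8, dtype=np.float32)        # 8 data "lanes"
b = np.full(8, 2.0, dtype=np.float32)
c = np.full(8, 1.0, dtype=np.float32)

# Scalar view: one element at a time, one instruction per element.
scalar = np.array([a[i] * b[i] + c[i] for i in range(8)], dtype=np.float32)

# SIMD view: the same multiply-add issued once across all lanes,
# which is how an array of identical processing blocks executes it.
simd = a * b + c

assert np.array_equal(scalar, simd)
print(simd)  # each lane computed 2*x + 1
```

The result is identical either way; the difference is that the second form maps naturally onto many processing blocks running the same instruction in lockstep.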

The term “Tensor” is used frequently in ML circles. Although this is not the right forum to dissect the technical intricacies of Tensors, the reader should keep in mind that the term signifies large multidimensional arrays of floating point numbers. Tensor processors are computing blocks suited to executing complex mathematical operations on such arrays.
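As a quick illustration of what "multidimensional array" means in practice, the NumPy sketch below multiplies a batch of matrices (a rank-3 tensor) by a shared weight matrix. The shapes are arbitrary examples; this contraction pattern is the kind of operation a tensor processor is built to execute efficiently.

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.random((4, 3, 5)).astype(np.float32)     # rank-3 tensor: 4 matrices of 3x5
weights = rng.random((5, 2)).astype(np.float32)      # rank-2 tensor: 5x2 weight matrix

# Batched matrix multiply: each of the 4 matrices is multiplied by
# the same weights, producing a (4, 3, 2) result tensor.
out = batch @ weights
print(out.shape)  # (4, 3, 2)
```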

The term “Deep” (as in Deep Learning) signifies a complex pipeline-like configuration containing a large number of stages. In a way, you can envision the outcome of an ML machine as a “big decision” that starts as a “small decision” and gets better and better as it propagates through a large number of layers or segments.
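The layered-pipeline idea can be sketched in a few lines of Python/NumPy: an input vector flows through several stages, each applying a transform, until a final small "decision" vector emerges. The layer count, sizes, and random weights below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "deep" pipeline: 4 stages narrowing to a 2-way decision.
layer_sizes = [16, 16, 16, 16, 2]
weights = [rng.normal(0.0, 0.5, (m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    # Each stage: linear transform followed by a ReLU nonlinearity.
    for w in weights:
        x = np.maximum(x @ w, 0.0)
    return x

x = rng.normal(size=16)          # the raw input
decision = forward(x)            # the "big decision" after many stages
print(decision.shape)            # (2,)
```

Real networks learn the weights from data, of course; the point here is only the shape of the computation, namely many sequential stages refining one result.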

High performance ML processors have an insatiable appetite for data. The access latencies and the overall memory throughput are significant determinants of their overall performance.
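A back-of-envelope calculation shows why. The figures below are assumed round numbers, not the specs of any real product: if every operand had to come from external memory every cycle, the bandwidth demand would be absurd, which is why on-chip memory and data reuse dominate the design.

```python
# Illustrative, assumed figures -- not any specific chip.
macs = 4096              # parallel multiply-accumulate units
clock_hz = 1.0e9         # 1 GHz clock
bytes_per_operand = 4    # FP32
operands_per_mac = 2     # one weight + one activation per cycle

# Worst case: every operand fetched from external memory every cycle.
naive_bw = macs * clock_hz * bytes_per_operand * operands_per_mac
print(f"naive demand: {naive_bw / 1e12:.1f} TB/s")  # ~32.8 TB/s
```

No external memory system delivers tens of TB/s, so on-chip SRAM, caching, and reuse must absorb most of that demand; the residual traffic is what makes latency and effective throughput such strong determinants of performance.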

There are several approaches for implementing ML engines. The most common approach is to use general purpose Graphics Processing Units (GPUs). The second approach is programmable yet optimized SoCs that can't effectively be used for other purposes. There are also several serious implementations that rely on FPGAs (Microsoft and Facebook have relied on FPGAs in the past). Finally, there are highly optimized application specific processors that can only cater to a thin sub-segment of an application (e.g., image or voice recognition).

ML processors can also be classified along a second dimension: where they reside. To date most of the deployments have been in large server rooms and data centers that constitute the core of the network. Alternatively, we are starting to see scaled-down and optimized processors that are autonomous and sit at the edge of the network. As an example, an ML engine can sit in a surveillance camera, where it needs sufficient performance for, say, image recognition on its own (without relying on cloud-based GPUs).

The following is a very brief overview of available solutions:


NVIDIA

NVIDIA is the undisputed leader when it comes to chips used for Artificial Intelligence (AI) and ML. As a company, NVIDIA has done a masterful job of leveraging their traditional GPU (Graphics Processing Unit) technology to tackle two very new yet thriving markets that have nothing to do with gaming or graphics: the first being the HPC (High Performance Computing) segment, AI and ML being the second. As a matter of fact, NVIDIA is one of the very few semiconductor companies that has been able to organically build a commanding position in a brand new market. Jensen Huang (CEO) is truly a visionary and energetic leader and has built a company that has consistently delivered. NVIDIA practically owns the market for GPUs used for Deep Learning applications. Most prominent cloud vendors offer NVIDIA-based servers used in all sorts of AI applications. The company is offering at least two generations of GPUs (Tesla and Pascal) widely used in ML applications. The true accomplishment of NVIDIA goes far beyond having powerful GPUs: they have formed a marvelous ecosystem, developed robust development tools (CUDA), and fostered a vibrant and enthusiastic user community. Despite these strengths, NVIDIA’s ML solutions are “general purpose,” supporting a large number of bells and whistles that can be burdensome when it comes to power dissipation and price.


Nervana

Nervana is a startup working on an accelerator chip for deep neural networks. The company was just acquired (yesterday) by Intel for $350M. They are developing a processor architected for inference machines suited to hunting for patterns in large data sets. This is a dream come true for massive IoT deployments. The chip integrates a large number of Tensor processors tightly coupled with on-chip and external SRAM. The company is using 3-D memory stacking technology to make this possible. One unusual aspect of Nervana is their business model. Instead of selling chips, they have opted to offer cloud services to various market verticals (insurance, medical, …). They are building servers powered by their chips to do this. It is unclear what would happen to this business model once they are fully integrated with Intel.

Alphabet (Google)

Google has made huge investments in AI and ML. In addition to many visible and publicized applications such as search, RankBrain, StreetView, and autonomous vehicles, the company uses ML in more than 100 internal applications that are less publicized, including human resources. They have made massive investments in developing an elaborate development platform and framework called TensorFlow, specifically intended for Deep Neural Networks as well as other flavors of ML. They have astutely open-sourced this toolkit, enabling the masses to build ML applications across a plethora of domains. They have also developed an accelerator chip called the Tensor Processing Unit (TPU) that is highly optimized for TensorFlow. They claim that the performance per watt of this device is an order of magnitude better than best-of-breed GPUs. Not much detailed information is available in the public domain about the TPU, and to the best of my knowledge the technology has been developed for internal use only.
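To give a flavor of the kind of workload TensorFlow expresses and an accelerator like the TPU would run, here is a sketch of a single dense layer followed by a softmax, written with plain NumPy (the shapes and random weights are illustrative assumptions, not TPU specifics).

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.normal(size=(8, 64))            # batch of 8 input feature vectors
w = rng.normal(size=(64, 10)) * 0.1     # weights mapping to 10 classes
b = np.zeros(10)                        # biases

# Dense layer: one big matrix multiply -- exactly the operation
# tensor-oriented accelerators are optimized for.
logits = x @ w + b

# Softmax turns each row of scores into a probability distribution.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
print(probs.shape)  # (8, 10); each row sums to 1
```

In a real TensorFlow program the same computation would be written against the framework's tensor operations, with the framework deciding whether it runs on a CPU, GPU, or (internally at Google) a TPU.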

More to come next week . . . .