Platform ASICs Target Datacenters, AI
by Bernard Murphy on 07-17-2018 at 7:00 am

There is a well-known progression in the efficiency of different platforms for targeted applications such as AI, as measured by performance and performance/Watt. The progression is determined by how much of the application can run on specialized hardware rather than in software, since dedicated hardware can be faster and consume less power than software running on a less specialized platform. At the low end are general-purpose CPUs, where the application runs entirely in software; then come GPUs, FPGAs and DSPs; and finally custom hardware: an ASIC such as the Google TPU.

So why not just build every such solution as an ASIC, at least when you can justify the up-front investment? Two reasons dominate. First, the underlying algorithms may be changing rapidly (as in AI); second, designing an ASIC can take significant time, making it very difficult to keep pace with rapidly changing needs. You’d have to look hard to find markets more fiercely competitive than AI applications (e.g. Facebook, Apple, Amazon, Google, Baidu, Alibaba, Tencent and ADAS/autonomous-car suppliers) and datacenters (e.g. Amazon, Microsoft, Google and others). All are competing in rapidly evolving, winner-take-all markets. In these domains, time isn’t just money; it’s survival.

Which is why eSilicon is launching a platform approach to these targeted applications. The ASIC platforms are augmented with libraries and infrastructure targeting AI and datacenter networking needs. Each is built on 7nm technology and is optimized as a whole for power, performance and area (PPA) to meet the specific needs of its domain.

Let’s start with the networking platform. This offers:

  • 56G and 112G SerDes, with both long-reach and short-reach architectures at 56G, to support many lanes at very high data rates while keeping power as low as possible
  • TCAM memory to speed route lookups, packet classification, packet forwarding and access control list (ACL) processing
  • PHY to connect to high-bandwidth memory (HBM2) stacks in the package. Note, incidentally, that eSilicon has significant experience building 2.5D and 3D systems, at both the die and package levels, so a system-in-package solution becomes an easy choice.
  • Specialized memories/memory compilers for pseudo-2-port, pseudo-4-port and other application-specific memories, providing high bandwidth with area and power savings, along with a range of I/O buffers
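To make the TCAM bullet concrete, here is a minimal software sketch of how a ternary match resolves a route lookup, assuming the usual value/mask encoding (mask bit 1 = "care", 0 = "don't care") and first-match priority. The names `Tcam`, `TcamEntry`, `ipv4` and `prefix_mask` are hypothetical illustrations, not eSilicon's IP or any vendor API; real TCAM hardware compares every entry in parallel in a single cycle, which is exactly why it speeds lookups.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TcamEntry:
    value: int   # key bits to match (stored pre-masked)
    mask: int    # 1 = "care" bit, 0 = "don't care" (the ternary X state)
    action: str  # hypothetical result, e.g. an egress port or ACL verdict


class Tcam:
    """Software model of a TCAM. Hardware compares the key against every
    entry in parallel and a priority encoder picks the lowest-index hit;
    this sequential loop only reproduces that result."""

    def __init__(self) -> None:
        self.entries: List[TcamEntry] = []

    def add(self, value: int, mask: int, action: str) -> None:
        # Store the value pre-masked so don't-care bits never affect matching.
        self.entries.append(TcamEntry(value & mask, mask, action))

    def lookup(self, key: int) -> Optional[str]:
        for entry in self.entries:  # index order = priority order
            if (key & entry.mask) == entry.value:
                return entry.action
        return None  # miss


def ipv4(addr: str) -> int:
    a, b, c, d = (int(part) for part in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def prefix_mask(length: int) -> int:
    # /24 -> 0xFFFFFF00, /0 -> 0 (matches everything)
    return 0xFFFFFFFF ^ ((1 << (32 - length)) - 1)
```

Installing longer prefixes at higher priority (lower index) turns the first-match rule into longest-prefix matching: with 10.1.2.0/24, 10.1.0.0/16 and a /0 default installed in that order, 10.1.2.7 hits the /24 entry and 10.1.9.9 falls through to the /16 entry.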

The AI platform (which eSilicon calls neuASIC) is a little more involved. The goal here is first to provide all the IP components you would expect in a standard SoC (CPU, local SRAM, NoC interconnect, interfaces to external memory, I/O buffers), here called the ASIC Chassis. The neural-net (NN) part of the design is implemented in a layer stacked above the chassis, with 3D interconnect linking the two. Again, this leverages eSilicon’s experience in 3D packaging.

If you simply hardwire your AI architecture, it will have great PPA, but you may need to replace it (build a new ASIC) as soon as a competitor jumps past you. The neuASIC structure is designed to limit the need for redesign as algorithms change. First, the Chassis hardware should be relatively insensitive to changes in NN algorithms. Next, the AI layer is divided into tiles. This mega-cell partitioning makes the underlying hardware more durable against changes in NN algorithms, thanks, I would assume, to the naturally modular style of NN designs. Each tile is built around commonly used macro AI functions such as convolution or pooling; some are pre-designed by eSilicon, some might come from third parties and some may be designed by the ASIC customer.
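As a rough illustration of why tiling around macro functions helps, the sketch below models two such functions, a "valid" 2D convolution and non-overlapping max pooling, as interchangeable stages in a pipeline. This is a plain-Python reference model under the assumption of single-channel inputs, stride 1 and no padding for the convolution; `conv2d`, `max_pool2d` and `run_tiles` are hypothetical names, not neuASIC macros. The point is only that swapping or reordering a stage does not disturb the others.

```python
from typing import Callable, List

Matrix = List[List[float]]
Tile = Callable[[Matrix], Matrix]


def conv2d(x: Matrix, k: Matrix) -> Matrix:
    """'Valid' 2D convolution (cross-correlation, as in most NN frameworks),
    stride 1, no padding, single channel."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def max_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    return [[max(x[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


def run_tiles(x: Matrix, tiles: List[Tile]) -> Matrix:
    # Each tile only sees the previous tile's output, so tiles can be swapped independently.
    for tile in tiles:
        x = tile(x)
    return x
```

For example, `run_tiles(x, [lambda m: conv2d(m, [[1, 0], [0, 1]]), max_pool2d])` applies a 2x2 convolution followed by 2x2 pooling; replacing the pooling stage with another function leaves the convolution stage untouched.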

As of May of this year, neuASIC provides a library of MAC blocks, convolution engines and memory-transpose functions as pre-built macro functions (with more in development), speeding assembly of common NN structures. Since memory and operations must be very tightly coupled in NNs to reduce overall power, eSilicon also provides pseudo-4-port memories for neuron support (2 neuron data inputs, 1 weight input, 1 neuron output) and a specialized memory called weight-all-zero-power-saving (WAZPS), which zeroes outputs when the weights are all zero, and does so at lower power than it would by default. All-zero weights are a common occurrence in NNs with sparse weight matrices.
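The power argument behind an all-zero-weight optimization can be sketched in software: if a whole weight word is zero, its contribution to a MAC is zero, so the multiplies (and, in hardware, much of the switching) can be skipped. The sketch below assumes weights are grouped into fixed-size words and counts skipped words as a crude proxy for saved work; `mac_with_zero_skip` and `word_size` are hypothetical, and this is not a description of how the WAZPS circuit is actually built.

```python
from typing import Sequence, Tuple


def mac_with_zero_skip(weights: Sequence[float], acts: Sequence[float],
                       word_size: int = 4) -> Tuple[float, int, int]:
    """Dot product computed one weight word at a time. A word whose weights
    are all zero contributes nothing, so its multiplies are skipped and counted.
    Returns (result, words_processed, words_skipped)."""
    if len(weights) != len(acts):
        raise ValueError("weights and activations must have equal length")
    acc = 0.0
    processed = skipped = 0
    for start in range(0, len(weights), word_size):
        w_word = weights[start:start + word_size]
        if all(w == 0 for w in w_word):
            skipped += 1  # all-zero word: output is zero, no MACs needed
            continue
        processed += 1
        a_word = acts[start:start + word_size]
        acc += sum(w * a for w, a in zip(w_word, a_word))
    return acc, processed, skipped
```

The result always equals the ordinary dot product; only the amount of work changes. With a sparse weight vector such as `[0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0]`, two of the three 4-weight words are skipped entirely.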

Design is supported by a modeling system called the Chassis Builder, in which you can model the functional operation of an NN while also extracting PPA estimates to guide optimization of the design toward your targets.
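The Chassis Builder's internals aren't described here, but the flavor of a first-order performance/power estimate can be shown with a simple MAC count for convolution layers. The sketch below assumes square kernels, no padding and a user-supplied energy-per-MAC figure (`energy_per_mac_pj`, a hypothetical parameter with no default, since real values depend on process and design); `ConvLayer` and `estimate_energy_joules` are illustrative names, not part of any eSilicon tool.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ConvLayer:
    in_h: int
    in_w: int
    in_ch: int
    out_ch: int
    k: int           # square kernel size
    stride: int = 1  # no padding assumed

    def out_hw(self) -> Tuple[int, int]:
        return ((self.in_h - self.k) // self.stride + 1,
                (self.in_w - self.k) // self.stride + 1)

    def macs(self) -> int:
        # One MAC per output pixel, per output channel, per input channel, per kernel tap.
        oh, ow = self.out_hw()
        return oh * ow * self.out_ch * self.in_ch * self.k * self.k


def estimate_energy_joules(layers: Iterable[ConvLayer], energy_per_mac_pj: float) -> float:
    # Crude estimate: total MACs times an assumed per-MAC energy (picojoules -> joules).
    return sum(layer.macs() for layer in layers) * energy_per_mac_pj * 1e-12
```

For instance, a 32x32x3 input convolved with sixteen 3x3 kernels produces a 30x30x16 output and needs 30 × 30 × 16 × 3 × 9 = 388,800 MACs. A real estimator would also account for memory traffic, which often dominates NN power.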

For both platforms, the goal is to provide a fast path to a working solution while also meeting aggressive PPA goals. Doing so requires more than a standard ASIC platform. You need to be able to assemble a chassis quickly, with a predefined I/O ring, interconnect and high-bandwidth memory access; you must have the IP and macro primitives those applications require; that IP should be optimized together for the application; and you must be able to configure and characterize your planned design against your PPA objectives. These platforms look like a good start and a promising long-term path to accelerating high-performance, low-power ASIC design in these domains. You can learn more about the networking platform HERE and the AI platform HERE.