
Quantum Tunneling for OTPs, PUFs: Higher security
by Bernard Murphy on 03-17-2021 at 6:00 am


I’ve had a number of enjoyable discussions with John East, who ran Actel until it was acquired. (John and Actel devices also play an important role in my book, The Tell-Tale Entrepreneur.) This is relevant because Actel were well-known for their anti-fuse FPGAs. eMemory Technology, the subject of this blog, also produce an anti-fuse device, but with a difference. The Actel devices worked, as the name suggests, by growing a resistive silicon link between two points rather than blowing an efuse link. John’s FPGAs are pretty impressive – they’ve driven multiple rovers around Mars and power many satellites. But John admits the technology can be tricky to manage. A better approach might be quantum tunneling for secure OTPs.

Tunneling technology

eMemory use this quantum tunneling idea in their NeoFuse process. Rather than growing a resistive silicon link between two points, they start with an ultra-thin gate oxide. Applying a programming voltage across this oxide can break some silicon/oxygen bonds, creating dangling bonds around each interface (gate and substrate). These dangling bonds grow in density as programming continues and act as traps distributed through the oxide, through which electrons can then tunnel. This tunneling is obviously easier in thin-oxide devices, which are increasingly common in advanced processes.

According to the company, these dangling bonds can only be removed by heating the device to around 600-700°C, so for all practical purposes this is a one-time programmable (OTP) technology. Pretty neat. Around this technology they have spun two solutions, one for OTP device applications and the other for physically unclonable functions (PUFs).

Secure OTP devices

The OTP is offered as an embedded block and is already proven in a variety of processes at TSMC and other foundries. One feature especially interesting to me is the higher security of this approach over traditional fuse devices. Blown links in standard devices are visible in SEM analyses – maybe not to kiddie hackers, but certainly to large-scale criminal enterprises and nation-state hackers. Dangling bonds, by contrast, are inside the gate oxide, and even there would be very difficult to spot or map. Maybe not impossible (never say never), but a lot harder than SEM imaging. eMemory also note that reliability against electromigration failures is better than in eFuse devices. Whatever reliability problems dangling bonds might have, electromigration does not seem like a likely candidate.

PUF application

The second application, for PUFs, is also appealing. A PUF provides an unpredictable but repeatable and unique ID for a device, which can then be used in challenge/response authentication, say to approve an over-the-air software update. A PUF IP often depends on subtle manufacturing variations between chips to generate that unique ID. The NeoPUF technology builds on the same quantum tunneling mechanism described above. Here they apply a high electric field to two identical, neighboring transistors, stimulating the growth of dangling bonds in each. Manufacturing differences ensure that one transistor will support a stronger tunneling current than the other, a measurable binary distinction between the two. Repeat that over many pairs of transistor gates and you have a number that meets all the requirements of a PUF ID.
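
To make the pairing idea concrete, here is a minimal Python sketch of how pairwise comparisons become ID bits. The “currents” are random stand-ins of my own invention; on real silicon they are fixed by manufacturing variation, which is exactly what makes the ID repeatable.

```python
import random

def puf_bit(current_a: float, current_b: float) -> int:
    # One ID bit: which transistor of the pair tunnels more strongly?
    return 1 if current_a > current_b else 0

def read_puf_id(pair_currents) -> int:
    # Concatenate one comparison bit per transistor pair into an ID.
    bits = 0
    for i_a, i_b in pair_currents:
        bits = (bits << 1) | puf_bit(i_a, i_b)
    return bits

# Stand-ins for 128 pairs of post-programming tunneling currents. A real
# device measures these, and they never change, so the same chip always
# reproduces the same ID while other chips yield different ones.
pairs = [(random.random(), random.random()) for _ in range(128)]
print(f"{read_puf_id(pairs):032x}")
```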

They have run extensive tests on the key performance indicators for a PUF. For randomness, the IP passes the NIST 800-22 randomness tests. Stability is consistent across a wide range of voltages and temperatures, and reliability is equally impressive through burn-in testing.
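
For a flavor of what the NIST 800-22 suite checks, here is the simplest of its tests, the frequency (monobit) test, sketched in Python. The input bit string is a toy stand-in, not eMemory data.

```python
import math

def monobit_pvalue(bits: str) -> float:
    # NIST SP 800-22 frequency (monobit) test: a random sequence should
    # have roughly equal counts of ones and zeros.
    n = len(bits)
    s = sum(1 if b == "1" else -1 for b in bits)
    return math.erfc(abs(s) / math.sqrt(n) / math.sqrt(2))

p = monobit_pvalue("1011010110" * 10)  # toy 100-bit sequence
print(f"p = {p:.4f} -> {'pass' if p >= 0.01 else 'fail'}")
```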

You can learn more HERE.

WEBINAR: eMemory’s Embedded ReRAM Solution on Nanometer Technologies


Delivering True Wireless Stereo (TWS) Experience
by Kalar Rajendiran on 03-16-2021 at 10:00 am


A recent blog about the Hearables market covered how expansive and far-reaching the product opportunities promise to be. Of course, consumer purchase criteria will drive product realization, adoption and, consequently, the market success of each of these envisioned products. Whether it is an earbud or one of those futuristic life-augmenting hearables products, a few core requirements must be satisfied to achieve massive market adoption. For the earbud application, refer to Figure 1 below.

Figure 1: Purchase criteria when choosing wireless headphones or earbuds

Source: The State of Play Report 2020, Qualcomm

Six of the top criteria are sound quality, price, comfort in the ear, battery life, ease of use and active noise cancellation. Comfort in the ear and ease of use are basic requirements without which an earbud will not gain widespread use. For the purposes of this blog, we will focus on how to deliver on the other four criteria – sound quality, price, battery life and active noise cancellation – and how semiconductor companies can play a key differentiating role through the solutions they offer to hearables product manufacturers.

It’s in this context that I recently reviewed a whitepaper by Hai Yu and Clement Moulin of Dolphin Design. The whitepaper not only does a thorough job of describing the challenges in delivering on the purchase criteria, but also offers compelling solutions to overcome the hurdles. In this blog, I’ll highlight just some of what I gathered from my review.

For starters, the whitepaper addresses many more aspects than a quick glance at the title might suggest. It goes into detail not only about selecting the optimal ADC architecture for the audio codec application, but also about how to choose the active noise cancellation (ANC) algorithms to execute, how to manage power consumption and how to design quickly and effectively to deliver a cost-effective chip.

Excellent sound quality:

With the Bluetooth 5.2 specification and the introduction of the Low Complexity Communication Codec (LC3) in the LE Audio protocol, developers now have greater flexibility in balancing key attributes such as sound quality, multiple independent audio transmission channels and battery life when designing products. Along with this flexibility come tradeoff choices that must be made when specifying the components that go into the audio signal chain. The whitepaper goes into lots of detail about how to choose the right ADC architecture for implementing the audio codec, how to implement the Voice Activity Detector (VAD) and how to choose the best microphone for an application to get the fullest performance, among other things.

Active Noise Cancellation (ANC):

The TWS earbud use case adds complexity to ambient-noise suppression. Combined with the voice-activation feature, this necessitates processing a large amount of audio data, which in turn increases power consumption.

Dolphin Design’s WhisperTrigger is a patented Voice Activity Detector (VAD) which detects the presence of voice in a sound and triggers a system wake-up interrupt signal. This solution offers on-the-fly customization to adapt to any kind of environment and optimizes power consumption. This solution also does not need any DSP resource support, thereby reducing power demand on the battery. An Always-on-Voice (AOV) device implemented using WhisperTrigger IP would consume only a fraction of the power that a traditional software algorithm and/or conventional DSP implementation would require.
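
WhisperTrigger itself is proprietary hardware IP, but the basic job of any VAD can be sketched in a few lines. Below is a toy energy-threshold detector in Python; the frame length and threshold are arbitrary choices of mine, and a production VAD would add spectral features and adaptive thresholds to reject non-voice noise.

```python
import numpy as np

def vad_wakeup(samples: np.ndarray, frame_len: int = 256,
               threshold: float = 0.01) -> bool:
    # Fire a wake-up as soon as any frame's short-term energy crosses
    # the threshold; otherwise stay asleep.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if float(np.mean(frame ** 2)) > threshold:
            return True  # would raise the system wake-up interrupt
    return False

rng = np.random.default_rng(0)
quiet = 0.01 * rng.standard_normal(4096)               # background noise
speech = np.concatenate([quiet, 0.5 * np.sin(np.arange(1024))])
print(vad_wakeup(quiet), vad_wakeup(speech))           # False True
```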

Additionally, Dolphin Design’s Ultra-low I/O latency codec dramatically eases the ANC software development effort and enhances power efficiency of the signal/noise processing workload.

Managing Power Efficiency:

Dolphin’s PowerStudio platform provides an easy way to implement power management design and integrate into an SoC. Their Power Controller IP is a CPU-less, event-based architecture that consumes ultra-low power and offers high-flexibility to scale with any SoC complexity.

Power efficiency management (refer to Figure 2 below) is accomplished through Always-ON Cluster (low leakage in sleep modes) and Active Cluster (for best energy efficiency in active modes) partitions.

Figure 2: Overview Block Diagram of Dolphin Design’s Power Controller

Source: Dolphin Design

All of the capabilities highlighted above should translate into long play time on the earbuds (extended battery life between charges) and reduced silicon area.

Whether you are a product developer at a Hearables company or a chip developer for the Hearables market, you will gain a lot of detailed knowledge by reading the entire whitepaper. You can download the whitepaper, “Paving the way for the next generation audio codec for True Wireless Stereo (TWS) applications,” from Dolphin Design’s website.


Electromagnetic and Circuit RLCK Extraction and Simulation for Advanced Silicon, Interposers and Package Designs
by Tom Dillinger on 03-16-2021 at 6:00 am


For years, there have been rather distinct domains for the extraction of interconnect models from physical design data.

Chip designers commonly focused on RC parasitics for circuit/path delay calculations and dynamic I*R voltage drop analysis.  The annotation of extracted parasitics to a netlist model required the layout topology to be LVS-clean.  For a select class of high-frequency designs with fast clock slew rates and high switching activity, the impact of inductive impedance was incorporated into the power grid and global clock model extraction. [1]

On-chip inductive spiral components utilized unique methods for electrical model generation.  The layout of these components often required specific metal fill layout topologies below the (thick) top-level metals all the way down to the substrate, to simplify the assumptions about the induced current flow, as depicted below.

The package and printed circuit board design domain requires accurate RLCK model extraction, to provide the power/ground distribution impedance model and the signal interconnect insertion/reflection/crosstalk losses between transceivers.  The budget for allowable P/G distribution voltage swings is inevitably very aggressive, and the cost/area tradeoffs for the addition of decoupling capacitance necessitate very detailed models.  The requirement for very high datarate signaling (especially over long-reach serial interfaces) demands accurate extracted models, valid over a wide frequency range – i.e., to multiple harmonics of the fundamental datarate.

There are several technology trends that are driving new developments in these two extraction domains:

  • increasing use of inductive elements on-die, placed over circuitry

The utilization of tuned RLC “tank” circuits is growing, as part of the on-die clock synthesis requirements.  Wireless opportunities are expanding.  The design of local oscillators as the clock source for high-speed wireline interface links between chips is using LC resonant tanks to a greater degree.

The die area allocated to these circuits is a growing concern.  As illustrated in the figure above, on-die inductors are increasingly being merged with underlying circuitry, necessitating enhanced approaches to model extraction.

  • advanced multi-die 2.5D and 3D packaging technologies introduce new topologies to model

Current packaging technologies incorporate:

  • TSVs for power delivery and signal connectivity from bumps to die, through stacked die
  • short-reach (parallel, clock-forwarded) interfaces between die
  • local redistribution interconnect layers in an interposer

The figures above illustrate a simple 2.5D interposer structure with two die – clock lines are highlighted in yellow, as an example.  It is necessary to analyze electromagnetic (EM) effects throughout the entire structure.

and, last, but most certainly not least:

  • the physical design data volume associated with advanced process node die and multi-die packages is immense

The algorithms for extracting parasitic models need to support distributed computation, with highly scalable performance across multiple processor cores.

I recently had the opportunity to chat with Yorgos Koutsoyannopoulos and Anand Raman at Ansys, to get their perspectives on the trends and tool features needed to support the evolution of these model extraction domains.  Their insights were most illuminating – specifically, how the recently-introduced Ansys RaptorH product addresses these evolving requirements comprehensively.

Yorgos began by saying, “The application space for RLCK extraction and simulation is expanding rapidly.  The designers of 2.5D and 3D ICs are familiar with silicon-centric flows.  They need a modeling solution that combines usability features with the accuracy demanded by the high signal datarates and power delivery challenges of these package solutions.”

“How did you approach that balance, between usability and accuracy?”, I asked.

Yorgos replied, “Ansys HFSS is the gold standard for electromagnetic analysis, spanning the gamut from wireless propagation to PCB-level signal and power integrity simulation.  The previous generation RaptorX product focused on parasitic calculations for on-chip structures – such as spirals, power grids, on-die MIM decoupling capacitors.  We have merged HFSS and RaptorX into RaptorH.  Both engines are integrated.  Designers leverage the best of both algorithms automatically – the tool applies the optimum approach to each element of the model.”

Anand added, “Several considerations were an integral part of the RaptorH product development.  A silicon-centric design environment is the basis for these 2.5D and 3D packages.  GDS-II or OASIS data represents the design.  The techfile stackup definition utilizes the process description from the foundry.  All layer and dimensional information is encrypted.  Process corner definitions use the same definitions as the traditional silicon environment.”

“Yorgos highlighted the focus on usability – how did that influence the product development?”, I inquired.

Anand replied, “The RaptorH desktop will be familiar to both current RaptorX and HFSS users.  The 3D design geometry and the visualization of the electromagnetic field solution use the existing Ansys desktop interface.”

Anand continued, “Both S-parameter and circuit netlist models are provided.  Of specific note is that this analysis is available pre-LVS, while designs are still in flight.”

I asked, “For general electromagnetic analysis, HFSS typically requires significant expertise at the controls – for example, the definition and placement of model ports.  How is that managed in RaptorH?” 

Anand replied, “The silicon-centric nature of the RaptorH flow means we needed to provide a familiar environment to chip designers.  We don’t need to support free-space electromagnetics, waveguides, antennas, and the like.  All metals are created equal.  Designers set circuit ports just as if they were placing a probe tip in the lab.”

I asked, “These 2.5D and 3D package model databases can be huge – how is the RaptorH tool performance?”

Yorgos answered, “The intent of RaptorH is to present the entire layout for EM analysis.  No pruning of data lanes required, hoping the sampled topology is representative of the full interface.  The tool quickly analyzes the footprint of the design, the ports, and techfile stackup data to provide guidelines on the computational resources needed – that algorithmic analysis takes a small percentage of the total computation time.   EM  model generation is extremely parallelizable.  For very large problems, RaptorH utilizes multiprocessing cloud resources, with an excellent speedup factor when using multiple processors.”

If you are pursuing a 2.5D/3D packaging solution, accurate signal and power distribution model extraction is an absolute necessity.  I would encourage you to investigate the unique features of the Ansys RaptorH solution.  Specifically, there is a brief webinar available discussing electromagnetic coupling within these complex systems that provides lots of additional information – I learned a lot.

Ansys RaptorH Pre-LVS Electromagnetic Modeling — link.

Ansys RaptorH webinar:  De-Risking High-Speed Serial Links from On-Chip Electromagnetic Crosstalk and Distribution Issues — link.

-chipguy

References

[1]  Restle, P., et al., “Measurement and Modeling of On-Chip Transmission Line Effects in a 400MHz Microprocessor”, IEEE Journal of Solid-State Circuits, Vol. 33, No. 4, April 1998, pp. 662-665.

Also Read

Need Electromagnetic Simulations for ICs?

Webinar: Electrothermal Signoff for 2.5D and 3D IC Systems

Best Practices are Much Better with Ansys Cloud and HFSS


Enabling Edge AI Vision with RISC-V and a Silicon Platform
by Tom Simon on 03-15-2021 at 10:00 am

Chart: AI Chipset Market

AI vision processing moving to the edge is an undeniable industry trend. OpenFive, the custom silicon business unit of SiFive, discusses this trend with compelling facts in their recent paper titled “Enabling AI Vision at the Edge.” AI vision is being deployed in many applications, such as autonomous vehicles, smart cities, agriculture, industrial & warehouse robotics, delivery drones, augmented reality, and smart retail & home.

Initially, it was only feasible to run AI vision processing in the cloud, due to its capacity and processing power requirements. However, as billions of devices are deployed, processing solely in the cloud becomes unscalable. The network bandwidth required by billions of devices capturing high-resolution video from multiple cameras would exceed 5 petabytes per second!
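
A rough back-of-the-envelope check in Python shows how a number of that scale arises. The device count, camera count and per-stream bitrate below are illustrative assumptions of mine, not figures from the OpenFive paper.

```python
devices = 1e9       # a billion deployed edge devices
cameras = 4         # cameras per device
stream_bps = 10e6   # ~10 Mbps per compressed high-resolution stream

total_bytes_per_s = devices * cameras * stream_bps / 8
print(f"{total_bytes_per_s / 1e15:.0f} PB/s")  # -> 5 PB/s
```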

On top of the logistical issues, cloud-based AI vision processing exacerbates privacy and latency issues. I for one would not want my self-driving car to rely on a wireless internet connection for making real-time driving decisions.

Associated with the push to move AI vision processing to the edge, there is large growth in the chipsets used to perform this processing. As shown in the chart, custom ASICs will become a dominant solution, providing the performance, power and functional advantages needed in AI vision applications.

Chart: Edge AI Vision – Deep Learning Market

SiFive, OpenFive’s parent company, was founded on applying the ideas that have made software development so productive, by eliminating the inefficiencies typically encountered in hardware design. Yunsup Lee, co-founder and CTO of SiFive, participated in the development of the RISC-V open-source instruction set architecture (ISA) in 2010. His vision has been to reduce the barriers to hardware design. The work of OpenFive is bearing fruit, with impressive reductions in the cost, manpower and time needed to develop custom ASICs.

OpenFive’s use of SiFive’s RISC-V processor IP gives developers access to a well-developed set of operating systems, compilers, development packages and debugging tools. OpenFive’s AI vision platform is intended to speed up development of custom AI vision SoCs by providing multiple customizable subsystems that enable designers to focus on their key differentiators.

The platform contains just about every subsystem needed and can be tailored to eliminate unnecessary ones or to add specialized new blocks for specific applications:

  • At the heart of the platform is SiFive’s multicore, super-scalar, Linux-capable U74 CPU complex, with support for up to 8 cores and 2 MB of L2 cache.
  • A 32/64-bit LPDDRx interface at 6400 MT/s provides the gigabytes of high-bandwidth DRAM required by edge AI applications.
  • The platform management unit, powered by SiFive’s S21 embedded CPU, is responsible for power, boot and system health.
  • SiFive Shield secures the platform, performing security functions such as crypto, secure boot and key management.
  • A vision subsystem provides a vision DSP as well as MIPI interfaces.
  • OpenFive includes an AI accelerator subsystem, or users can add their own; other customer-specific accelerators can be added as well.
  • The audio subsystem offers a wide range of features, such as echo suppression and noise cancellation, with its audio DSP.
  • An integrated GPU handles visualization and graphics output.
  • Naturally, there is a wide range of high-speed I/Os, and even a die-to-die interface to improve performance with additional chiplets.

OpenFive’s business model allows their customers to engage with them during all stages of the ASIC development process. Customers can easily and quickly leverage OpenFive to complement their own skills, instead of needing to have in-house expertise in every one of the several dozen fields needed to produce a custom ASIC.

With open-source hardware and platform-based ASIC development, it is certain that we will see new products coming to market quickly that offer real hardware innovation. The rapid progress and growth that SiFive (and OpenFive) are experiencing is proof that there is pent-up demand for this. “Enabling AI Vision at the Edge” offers more details about OpenFive’s AI vision platform and is worth a look. The paper is available for download on their website.

Also Read:

WEBINAR: Differentiated Edge AI with OpenFive and CEVA

Open-Silicon SiFive and Customizable Configurable IP Subsystems

Ethernet Enhancements Enable Efficiencies


Sondrel Explains One of the Secrets of Its Success – NoC Design
by Mike Gianfagna on 03-15-2021 at 8:00 am

Seventeen horizontal layers of a complex digital chip design showing the interconnection layouts for each layer

Sondrel is an interesting and unique company. They are a supplier of turnkey services from system design to silicon supply. So far, not that unique, as there are a lot of companies with this mission. What is unique is their focus on complex designs. The company takes on chips that need teams of engineers working for a year or more, with the aim of providing economies of scale. I’ve spent some time at the leading edge of custom chip design, and I can tell you it’s not for the faint of heart. This stuff is very, very difficult, and those who can help are rare and quite valuable. There are lots of ways to address the daunting challenges of complex custom chip design, so I was quite excited to get some of the backstory from a key member of the Sondrel team. Read on to learn how Sondrel explains one of the secrets of its success – NoC design.

First, some of the basics. If you want to learn more about this unique company, you can read an in-depth interview Daniel Nenni did with Sondrel’s CEO here. Next, a bit about NoC, or network-on-chip, technology. If you think of the typical IP building blocks of an SoC as the electrical fixtures in a home, the NoC is the wiring. It’s the interconnect backbone that delivers the right data to the right location at the speed required to make the whole system work. Something as complex as interconnecting the elements in an SoC, and even beyond to external devices such as memory, really benefits from a structured approach. That’s what a NoC delivers. This technology can offer the margin of victory if done correctly.

I was able to catch up with Dr. Anne-Françoise Brenton, Sondrel’s NoC expert. Anne-Françoise has extensive design experience from ST Micro, Thomson Consumer Electronics and Thomson Multi-Media, and TI before joining Sondrel over seven years ago. Anne-Françoise offered some great insights into why a NoC is so critical to complex chip design and how Sondrel approaches its design.

She began by explaining that in an ideal design all the sections that need high speed, high data flow between them would be located as close together as possible. That is, memory in the middle of the chip next to the blocks of IP that need memory access. In reality, apart from cache, memory is located off chip on dedicated memory chips, which use state-of-the-art memory technologies so that access points to memory are located on the perimeter of the chip. As a result, a complex network of interconnections is needed to route the data traffic between blocks and to and from off-chip memory. On a big chip design, there could be seventeen layers of horizontal interconnections plus a number of vertical connections between these layers. The graphic at the top of this post illustrates such a case.

In her words, “It’s rather like designing a massive, multi-level office block where you have to design it to allow for optimal movement of people between areas and floors. Where a lot of people need to move rapidly between two locations, you need a wide fast corridor and the length of it affects the timing of the arrival of people. Similarly, an infrequently used, non-urgent route can be long and narrow, and therefore slow. The analogy continues with the vertical interconnects being lifts with big capacity, lifts that just connect two specific floors to provide a dedicated route for high-speed connections, and lifts that stop at all floors that are slower but connect a lot of locations. On top of this is the arbitration that dynamically controls the data flow through the NoC with buffering to smooth and optimize as demand changes, for example when two IP blocks are sharing and accessing the same memory.”

Is your head hurting yet? Mine was. Anne-Françoise went on to explain that designing a NoC is an iterative collaboration throughout the entire chip design process between the front-end, back-end and NoC design teams.

She also explained that one of the challenges in NoC design is that a third-party IP block can be a black box, with very little data provided about its data-flow demands because the vendor wants to protect the exact workings of its IP. This is overcome as the whole design matures, by using timing analysis and performance modeling to help ensure that the NoC delivers the data as required, arbitrating the pathways according to pre-assigned priorities – there cannot be any bottlenecks.
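
As a toy illustration of the arbitration idea (my own sketch, not Sondrel’s implementation), the Python model below grants one request per cycle among IP blocks sharing a memory port, round-robin so that no requester is starved. Priority weighting and buffering, as described above, would layer on top of this.

```python
from collections import deque

def round_robin_arbiter(queues, cycles):
    # Each queue holds pending requests from one IP block; a rotating
    # grant pointer picks the next non-empty requester each cycle.
    grant, order = 0, []
    for _ in range(cycles):
        for i in range(len(queues)):
            idx = (grant + i) % len(queues)
            if queues[idx]:
                order.append((idx, queues[idx].popleft()))
                grant = idx + 1  # rotate past the winner
                break
    return order

masters = [deque(["rd0", "rd1"]), deque(["wr0"]), deque(["rd2", "wr1"])]
print(round_robin_arbiter(masters, 5))
# [(0, 'rd0'), (1, 'wr0'), (2, 'rd2'), (0, 'rd1'), (2, 'wr1')]
```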

Anne-Françoise concluded our discussion by explaining that “NoC design is a constantly changing juggling act. Change one parameter and several other things could change. It’s as intellectually challenging as playing several games of chess simultaneously and it is immensely rewarding.”

The holistic perspective that Anne-Françoise offered was quite refreshing. Sondrel seems to have its act together when tackling near-impossible high-end design. If you need help doing the impossible, I would strongly recommend you contact Sondrel, now that you’ve seen how Sondrel explains one of the secrets of its success – NoC design. You can learn more about Sondrel here.

Also Read:

SoC Application Usecase Capture For System Architecture Exploration

CEO interview: Graham Curren of Sondrel

Sondrel explains the 10 steps to model and design a complex SoC


All-Digital In-Memory Computing
by Tom Dillinger on 03-15-2021 at 6:00 am


Research pursuing in-memory computing architectures is extremely active.  At the recent International Solid-State Circuits Conference (ISSCC 2021), multiple technical sessions were dedicated to novel memory array technologies to support the computational demands of machine learning algorithms.

The inefficiencies associated with moving data and weight values from memory to a processing unit, then storing intermediate results back to memory are great.  The information transfer not only adds to the computational latency, but the associated power dissipation is a major issue.  The “no value add” data movement is a significant percentage of the dissipated energy, potentially even greater than for the “value add” computation, as illustrated below. [1]  Note that the actual computational energy dissipation is a small fraction of the energy associated with data and weight transfer to the computation unit.  The goal of in-memory computing is to reduce these inefficiencies, especially critical for the implementation of machine learning inference systems at the edge.

The primary focus of in-memory computing for machine learning applications is to optimize the vector multiply-accumulate (MAC) operation associated with each neural network node.  The figure below illustrates the calculation for the (trained) network – the product of each data input times weight value is summed, then provided to a bias and activation function.

For a general network, the data and weights are typically multi-bit quantities.  The weight vector (for a trained, edge AI network) could use a signed, unsigned, or two’s complement integer bit representation.  For in-memory computing, the final MAC output is realized by the addition of partial multiplication products.  The bit width of each (data * weight) arc into the node is well-defined – e.g., the product of two n-bit unsigned integers is covered by a 2n-bit vector.  Yet the accumulation of (data * weight) products for all arcs into a highly-connected network could require significantly more bits to accurately represent the MAC result.
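
A short Python sketch makes the node computation and the bit-growth argument concrete; the 8-bit width and 256-input fan-in are example values of mine, not figures from the paper.

```python
import math

def mac_node(data, weights):
    # Vector multiply-accumulate for one neural-network node, before
    # the bias and activation function are applied.
    return sum(d * w for d, w in zip(data, weights))

# Bit-growth bookkeeping for unsigned n-bit data and weights: each
# product fits in 2n bits, and accumulating K products can add up to
# ceil(log2(K)) carry bits on top of that.
n, K = 8, 256
acc_bits = 2 * n + math.ceil(math.log2(K))
assert K * (2**n - 1) ** 2 < 2**acc_bits
print(f"{acc_bits}-bit accumulator covers {K} products of {n}-bit values")
```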

One area of emphasis of the in-memory computing research has been to implement a bitline current-sense measurement using resistive RAM (ReRAM) bitcells.  The product of the data input (as the active memory row wordline) and weight value stored in the ReRAM cell generates a distinguishable bitline current applied to charge a reference capacitance.  A subsequent analog-to-digital converter (ADC) translates this capacitor voltage into the equivalent binary value for subsequent MAC shift-add accumulation.  Although the ReRAM-based implementation of the (data * weight) product is area-efficient, it also has its drawbacks:

  • the accuracy of the analog bitline current sense and ADC is limited, due to limited voltage range, noise, and PVT variations
  • the write cycle time for the ReRAM array is long
  • the endurance of the ReRAM array severely limits the applicability as a general memory storage array

These issues all lead to the same conclusion.  For a relatively small inference neural network, where all the weights can be loaded in the memory array, and the data vector representation is limited – e.g., 8 bits or less – a ReRAM-based implementation will offer area benefits.

However, for a machine learning application requiring a network larger than stored in the array and/or a workload requiring reconfigurability, updating weight values frequently precludes the use of a ReRAM current sense approach.  The same issue applies where the data precision requirements are high, necessitating a larger input vector.

An alternative for an in-memory computing architecture is to utilize an enhanced SRAM array to support (data * weight) computation, rather than a novel memory technology.  This allows a much richer set of machine learning networks to be supported.  If the number of layers is large, the input and weight values can be loaded into the SRAM array for node computation, output values saved, and subsequent layer values retrieved.  The energy dissipation associated with the data and weight transfers is reduced over a general-purpose computing solution, and the issue with ReRAM endurance is eliminated.

In-Memory Computing using an Extended SRAM Design

At the recent ISSCC, researchers from TSMC presented a modified digital-based SRAM design for in-memory computing, supporting larger neural networks.[2]

The figure above illustrates the extended SRAM array configuration used by TSMC for their test vehicle – a slice of the array is circled.  Each slice has 256 data inputs, which connect to the ‘X’ logic (more on this logic shortly).  Consecutive bits of the data input vector are provided in successive clock cycles to the ‘X’ gate.  Each slice stores 256 4-bit weight segments, one weight nibble per data input;  these weight bits use conventional SRAM cells, as they could be updated frequently.  The value stored in each weight bit connects to the other input of the ‘X’ logic.

The figure below illustrates how this logic is integrated into the SRAM.

The ‘X’ is a 2-input NOR gate, with a data input and a weight bit as inputs.  (The multiplicative product of two one-bit values is realized by an AND gate;  by using inverted signal values and DeMorgan’s Theorem, the 2-input NOR gate is both area- and power-efficient.)  Between each slice, an adder tree plus partial sum accumulator logic is integrated, as illustrated below.
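
The DeMorgan substitution is easy to verify exhaustively in a couple of lines of Python:

```python
# The 1-bit partial product is AND(data, weight). Driving inverted
# values into a NOR gate gives the same result, since
# NOR(~a, ~b) = ~(~a | ~b) = a & b.
for a in (0, 1):
    for b in (0, 1):
        nor_of_inverted = 1 - ((1 - a) | (1 - b))
        assert nor_of_inverted == (a & b)
print("NOR with inverted inputs matches AND on all four input pairs")
```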

Note that the weight bit storage in the figure above uses a conventional SRAM topology – the weight bit word lines and bit lines are connected as usual, for a 6T bitcell.  The stored value at each cell fans out to one input of the NOR gate.

The output of each slice represents a partial product and sum for a nibble of each weight vector.  Additional logic outside the extended array provides shift-and-add computations, to enable wider weight value representations.  For example, a (signed or unsigned integer) 16-bit weight would combine the accumulator results from four slices.
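
Here is a small Python sketch of that recombination step, with made-up values: each slice accumulates products against one 4-bit nibble of the weights, and shifting slice i left by 4*i bits before adding reconstructs the MAC against the full-width weight.

```python
def combine_nibble_slices(slice_sums):
    # slice_sums[i] holds the accumulated (data * weight_nibble_i)
    # partial sums, nibble 0 being the least significant.
    return sum(s << (4 * i) for i, s in enumerate(slice_sums))

data = [3, 1, 2]
weights = [0xBEEF, 0x1234, 0x00FF]   # 16-bit unsigned weights
nibble = lambda w, i: (w >> (4 * i)) & 0xF

slice_sums = [sum(d * nibble(w, i) for d, w in zip(data, weights))
              for i in range(4)]     # four 4-bit slices
assert combine_nibble_slices(slice_sums) == \
    sum(d * w for d, w in zip(data, weights))
print(hex(combine_nibble_slices(slice_sums)))
```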

Testsite results

A micrograph of the TSMC all-digital SRAM-based test vehicle is shown below, highlighting the 256-input, 16 slice (4-bit weight nibble) macro design.

Note that one of the key specifications for the SRAM-based Compute-in-Memory macro is the efficiency with which new weights can be updated in the array.

The measured performance (TOPS) and power efficiency (TOPS/W) versus supply voltage are illustrated below.   Note that the use of a digital logic-based MAC provides functionality over a wide range of supply voltage.

(Parenthetically, the TOPS/W figure-of-merit commonly used to describe the power efficiency of a neural network implementation can be a misleading measure – it is strongly dependent upon the “density” of the weights in the array, and the toggle rate of the data inputs.  There is also a figure below that illustrates how this measure depends upon the input toggle rate, assuming a 50% ratio of ‘1’ values in the weight vectors.)

Although this in-memory computing testsite was fabricated in an older 22nm process, the TSMC researchers provided preliminary area and power efficiency estimates when extending this design to the 5nm node.

Summary

There is a great deal of research activity underway to support in-memory computing for machine learning, to reduce the inefficiencies of data transfer in von Neumann architectures.  One facet of the research is seeking to use new memory storage technology, such as ReRAM.  The limited endurance of ReRAM limits the scope of this approach to applications where weight values will not be updated frequently.  The limited accuracy of bitline current sense also constrains the data input vector width.

TSMC has demonstrated how a conventional SRAM array could be extended to support in-memory computing, for large and/or reconfigurable networks, with frequent writes of weight values.  The insertion of 2-input NOR gates and adder tree logic among the SRAM rows and columns provides an area- and power-efficient approach.

-chipguy

 

References

[1]  https://energyestimation.mit.edu

[2]  Chih, Yu-Der, et al., “An 89TOPS/W and 16.3TOPS/mm² All-Digital SRAM-Based Full-Precision Compute-in-Memory Macro in 22nm for Machine-Learning Applications”, ISSCC 2021, paper 16.4.

 


Honda Asserts Automated Driving Leadership
by Roger C. Lanctot on 03-14-2021 at 10:00 am


When Honda Motor Co. tied up with General Motors almost a year ago to collaborate on vehicle propulsion technology, connected car tech, and assisted driving, observers might have been forgiven for thinking Honda was surrendering its independence to catch up in the EV race. Honda reasserted its independence last week with the launch of the semi-autonomous Honda Legend in Japan – a leasable fleet vehicle equipped with the world’s first Level 3 semi-autonomous driving system.

The announcement was momentous for validating SAE’s Level 3 classification. Industry insiders and experts frequently dismiss these classifications as artificial and unhelpful, with Level 3 being the most controversial. Level 3 automated driving allows hands-off, eyes-off driving with the expectation and understanding that the driver must be available – presumably in the driver’s seat – to take back driving control under appropriate conditions.


Level 3 semi-autonomous driving is the mode at which airline pilots look and simply shake their heads – “It will never work.” Audi announced its intention to introduce Level 3 semi-autonomous driving on the A8 in Europe and then withdrew the announcement due to a lack of local regulatory support.

Honda’s announcement demonstrates that auto makers will distinguish between their autonomous driving development work for so-called robotaxis and shuttles and the technology they bring to potentially mass market vehicles. Honda was happy to invest billions in GM’s Cruise operation, but continued to pursue its own proprietary assisted driving platform – not unlike GM’s work on Super Cruise. In fact, most auto makers have similar in-house solutions in development.

According to AutoExpress: “Honda’s Sensing Elite system builds on the brand’s existing Honda Sensing safety technology, but uses a more accurate global positioning system, more detailed three-dimensional maps and several sensors which give the ECU a 360-degree view of the car’s surroundings.

“The most impressive part of the new Sensing Elite system is the hands-off driving mode, which uses Honda’s existing adaptive cruise control and lane-keeping assist systems to assume total control over the car when driving on the motorway.” The car is also capable of automatically overtaking slower vehicles, as well as detecting an inattentive or unresponsive driver – automatically moving to the hard shoulder, bringing the car to a halt, flashing its hazards and sounding the horn to warn other road users.

The Sensing Elite-equipped Legend became available in the form of 100 leased fleet vehicles in Japan, each of which will cost approximately $100K, due in part to the incorporation of five LiDAR sensors. The driver has up to 30 seconds to take over the driving task when alerted by the system – and is not responsible for anything that might occur during those 30 seconds.

The introduction of the car is reminiscent of GM’s launch of the electric EV1 as a lease-only vehicle. GM later recalled and crushed all of the EV1s.

No immediate plans to bring the car and the new system to the U.S., U.K., or E.U. were announced. The arrival of the Honda Legend was enabled by new regulations in Japan that opened the door to Level 3-type operation.

The regulatory changes included a legal adjustment and the adoption of the WP.29 Automated Lane Keeping Systems rule. The legal change, adopted in April 2020, allows for “the car, not the driver, (to be) responsible for the driving,” according to a report in Nikkei Asia. Japan then adopted the WP.29 ALKS regulation, which establishes strict requirements for Automated Lane Keeping Systems (ALKS) for passenger cars which, once activated, are in primary control of the vehicle.

The UNECE states: “ALKS can be activated under certain conditions on roads where pedestrians and cyclists are prohibited and which, by design, are equipped with a physical separation that divides the traffic moving in opposite directions. In its current form, the Regulation limits the operational speed of ALKS systems to a maximum of 60 km/h.”

More details on the UN ALKS regulation: https://unece.org/transport/press/un-regulation-automated-lane-keeping-systems-milestone-safe-introduction-automated

Honda committed $2.75B (over 12 years) to GM’s Cruise autonomous vehicle operation, including a $750M equity investment, and has announced further plans to bring Cruise’s Origin robotaxi to Japan for testing. From these announcements it is clear Honda wants a defined and limited commitment, but nevertheless a substantial stake, in the autonomous vehicle business.

When it comes to brand-defining mass-market semi-autonomous tech, though, Honda clearly prefers to keep its assets separate from GM. Honda may adopt GM’s electric propulsion tech and the associated Bolt connectivity and driver assist platforms to catch up in EVs. But GM’s Super Cruise will not displace Honda’s Sensing Elite system. Most important of all, Honda’s introduction of the Sensing Elite system on the Legend has served notice to the world that Japan – long silent in the world of autonomous driving – will be a leader.


A Funny Thing Happened on the Way to 5G Cars
by Roger C. Lanctot on 03-14-2021 at 8:00 am


Prior to the arrival of the COVID-19 pandemic, car makers were inching toward putting their 5G connectivity plans into action.  There was a growing recognition at the time that no one wanted to miss the 5G car boat – and no car maker would want to be stuck selling 4G cars in a 5G world.

The onset of the pandemic was a shock to the system.  Advanced connectivity plans suddenly seemed secondary to basic survival.  The 5G road maps were shelved; autonomous vehicle development was frozen; and the industry held its breath for two critical months.

When car dealerships and factories re-opened and customers returned, those 5G plans now looked a little too ambitious.  A closer look at 5G revealed higher costs and some delayed network deployments.  Maybe 5G in cars could wait.

The arrival of 2021 has changed all that.  Car makers are back to the 5G drawing board and 5G deployments are beginning to trickle into premium vehicle RFQs with market introductions just a year or two away.  Still, the return of 5G planning has remained out of the headlines.

Car makers are battling over batteries and prattling about robotaxis, but no one is talking about 5G.  That is, no one, but the Chinese.

Chinese auto makers such as BAIC, BYD, GAC, SAIC, Great Wall, Nio, WM Motors, and DFM have all announced their 5G plans.  Others – FAW, SAIC, Ford, and Human Horizons – have announced C-V2X plans.  In the U.S. and the E.U., there is radio silence.

“Western” auto makers have, for the most part, clammed up regarding their C-V2X and 5G plans due to pending disputes with regulators over the standards and spectrum allocations for enabling inter-vehicle communications.  The dreaded Wi-Fi-based DSRC (dedicated short-range communication) technology proposed 20 years ago for vehicle-to-vehicle communications remains a sticking point on both sides of the Atlantic.

In the U.S., the Federal Communications Commission split up the spectrum that had originally been preserved for V2V (in fact, V2-everything) communications in an effort to open up more spectrum for unlicensed Wi-Fi applications.  This was intended to be the “last word” on the subject – the end of DSRC in the U.S.  But the devotees fight on – thereby freezing OEMs leery of adopting a technology in flux.

In the E.U., the European Commission is working under the guidance of the ITS Directive to define a regulatory regime to enable an interoperable transportation connectivity regime.  The Commission is known to still favor ITS-G5 (i.e. DSRC) technology in spite of some claims of being technology agnostic.

Further complicating matters in the E.U. is Volkswagen’s launch of the DSRC-equipped Golf and a Euro NCAP (New Car Assessment Program) which includes V2X – with the relevant technology yet to be defined.  As in the U.S., E.U. auto makers are loath to lay out their plans not knowing which way the regulatory winds will blow.

As noted, none of this 5G hesitancy applies in China, where top-down regulatory alignment has all but ruled out DSRC technology for most V2X applications.  The clarity is no doubt welcomed by auto makers, with their 3-4-year development cycles and adherence to standards.

One can only hope that the DSRC fever will finally pass and auto makers in the U.S. and Europe can get on with the business of making safer, better connected cars.  The promise of C-V2X and 5G wireless technology is enhanced vehicle situational awareness and collision avoidance – something sorely needed in the U.S. where annual highway fatalities are once again on the rise.  It would be a shame, and not very funny, if more lives were lost for the sake of DSRC nostalgia.


Podcast EP11: Semiconductor Shortages and the CHIPS Act Explained
by Daniel Nenni on 03-12-2021 at 10:00 am

Dan and Mike are joined by Terry Daly for a thoughtful and informative discussion of global semiconductor supply challenges and an excellent overview of the CHIPS for America Act, an ambitious piece of U.S. legislation aimed at establishing investments and incentives to support U.S. semiconductor manufacturing, research and development, and supply chain security.

Terry is a Senior Fellow – Council on Emerging Market Enterprises at The Fletcher School of Law and Diplomacy. He is a long-time veteran of the semiconductor industry. His prior experience includes SVP at GLOBALFOUNDRIES. There, he served as head of strategy and corporate development, chief of staff to the CEO, and head of corporate program management. He had instrumental roles in global strategic alliances and M&A, including the acquisition of the microelectronics business from IBM.

There is a lot of very useful information in this podcast. As a result, we exceed our 30-minute maximum length rule by two minutes.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: R.K. Patil of Vayavya Labs
by Daniel Nenni on 03-12-2021 at 6:00 am


RK has over 25 years of industry experience in the domains of telecom, embedded software and semiconductors. Before co-founding Vayavya, he was a co-founder at Smart Yantra Technologies, where he held various positions in engineering, marketing and management. At Vayavya, RK is responsible for overall management and strategic decisions related to engineering and business activities.

Vayavya Labs is different from most EDA companies as it provides solutions for software engineers to validate their software for SoCs. How has this journey been so far?

We started Vayavya Labs in 2006 with the single-minded focus of addressing the evolving needs of the embedded software industry and helping companies adapt to the different software environments needed for their SoCs.  By then, highly programmable SoCs had already started making inroads into a number of different devices and systems in consumer electronics, automotive, communication and industrial applications. The programmability of these devices and the associated software environments (operating systems and software architectures) had compelled semiconductor companies to build software engineering teams, which often outnumbered the ASIC engineering teams.

Our approach to addressing this problem was to provide a set of tools and a methodology that would enable code synthesis/generation from a high-level golden specification. This was rather a niche concept, and probably a bit ahead of its time. A number of people appreciated the technology but were unwilling to adopt a tool from a small company based in India.

In 2008, we started an embedded software services unit to help sustain the company and fund our R&D efforts. Over the past decade we have grown steadily, become profitable and have sharpened our focus to address the demanding requirements of SoC verification from a system, software and a hardware perspective.

Today, we are in a rather unique situation of having the expertise and the solutions to address the hardware and software verification requirements at the architectural, pre-silicon and post-silicon stages of the SoC design flow, ensuring shorter development cycles.

With the number of people working on SoC designs constantly increasing as system companies start developing their own SoCs, what challenges do you envisage Vayavya Labs solving?

One of the biggest challenges facing designers of today’s complex SoCs is the need to verify the IC design’s functionality not only against the specification, but also in an end-system context. With shorter time-to-market windows, it is also becoming a necessity for design houses to ensure that the system and SoCs adhere to the many different industry-specific standards and requirements prior to tape-out. Verification for an SoC today no longer implies merely hardware verification; it also covers software verification and requires intimate knowledge of embedded systems to ensure system compliance.

As a contributing member of the Accellera committee for portable stimulus (PSS), we have contributed extensively to this standard, and our contributions are referred to as Hardware-Software Interfaces (HSI). To enable design companies to easily verify their SoCs from a system software perspective, we have launched an open initiative called OpenHSI™, where designers can leverage a ready-made library of device drivers and middleware stacks. We strongly believe that PSS needs to leverage the full potential of HSI/OpenHSI™ in order to realize software-driven verification.

At Vayavya Labs, we help companies meet the challenges of verification in three ways:

  • We provide virtual platforms by creating models that can be used to verify the architecture, explore the performance and subsequently verify the IP/SoC at both the pre-silicon and post-silicon stages. In addition, the virtual platforms can be used to validate the software using emulation.
  • We provide software tools for generating bare-metal and operating-system-specific device drivers from a hardware-software interface specification, making it easier to validate the hardware-software interface and the SoC functionality from a software perspective.
  • We enable companies to validate their SoCs by building PSS models to automate verification test cases. Additionally, we help them realize these tests by providing the necessary software drivers and stacks to validate the SoC across all platforms – virtual models, simulation, emulation, etc.

In a post-covid era, there is a growing importance of a Digital-Twin. A number of companies are now adopting virtualization for their IPs and SoCs. What solutions does Vayavya Labs provide in this area?

The notion of a digital twin is very pertinent these days, as it provides a way for software and hardware design teams dispersed across the world to rapidly test, debug and refine their systems. Virtualization, as the name implies, removes the designers’ dependency on physical hardware. It gives design teams an opportunity to explore the performance requirements early in the design cycle, and it ensures consistency between the hardware and software verification results for common tests, eliminating the potential for misinterpretation between the two teams. In addition to enabling concurrent development of software and hardware, virtualization also lets design teams robustly test the hardware-software interaction to mitigate post-silicon issues.

Unfortunately, developing virtual models is quite challenging as it needs varied technical skills such as knowledge about the modeling languages, insights into the hardware architecture, embedded software and domain knowledge.

Vayavya Labs provides two types of solutions to address the need for virtual platforms. The first includes ready-made generic modeling libraries to jump-start virtual platform development, plus software for automatic generation of device drivers. The second includes custom development services, such as developing custom SystemC, QEMU, Simics and SimNow models for IPs and SoC peripherals, in addition to custom development of bare-metal software, OS bring-up, OS ports and software device drivers.

Vayavya Labs has been part of the Accellera committee for portable stimulus (PSS) and has contributed significantly in the area of the hardware-software interface. There is also the OpenHSI™ initiative, which Vayavya Labs launched to promote PSS usage. Can you elaborate on it?

The benefits of using PSS are significant, as it enables design teams to define test intent through the domain-specific language (DSL) in PSS and use it across all platforms, such as simulation, FPGAs, emulation, etc. It also gives hardware and software engineers an inherent ability to test the SoC from a system and software perspective. However, one of the biggest impediments to PSS adoption is realizing the tests defined in the DSL, as this requires APIs, device drivers and middleware/protocol stacks to be exercised completely by the software. All of this falls in the realm of embedded software, the nuances of which most hardware design engineers are uncomfortable with. Consequently, to promote the usage of PSS across semiconductor companies, Vayavya Labs launched the OpenHSI™ initiative, which includes commonly used APIs, device drivers and middleware stacks for use by engineering teams to jumpstart their system-level verification.

Your team has been working on the device driver generator (DDGen) tool for some time now – something of immense value as design companies struggle to verify their SoCs from a system perspective. Can you provide any insights about it?

Thanks for bringing up this question. Our efforts to develop a software tool (DDGen) that can automatically generate device drivers from a specification continue to evolve as we keep learning and adapting to changing trends in the industry, such as safety requirements in automotive and consumer electronics. DDGen is now a stable product and is currently in use with a few customers. With the current emphasis on system validation, we expect more customers to adopt DDGen.

Our current offering of DDGen also helps automotive ECU developers with MCAL automation. The MCAL drivers generated by DDGen can easily be integrated with any AUTOSAR stack provider’s offering, enabling rapid development of ECU software for automotive applications.

In addition to creating device drivers, DDGen also plays a vital role in PSS adoption for SoC system-level verification, as it generates consistent register access APIs and bare-metal drivers to be used by all hardware and software teams.
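
As a purely hypothetical illustration of what a consistent register-access layer looks like, here is a sketch modeled in Python for brevity; DDGen’s actual output format is not public, and real generated drivers would be C against memory-mapped registers. All register names and offsets here are invented.

```python
# Invented register map for an illustrative UART block.
REG_STATUS, REG_TXDATA = 0x04, 0x08
STATUS_TX_READY = 1 << 0

class RegisterBlock:
    # Stand-in for memory-mapped I/O: a real bare-metal driver would
    # read and write physical addresses rather than a dictionary.
    def __init__(self):
        self.regs = {REG_STATUS: STATUS_TX_READY}
    def read(self, offset: int) -> int:
        return self.regs.get(offset, 0)
    def write(self, offset: int, value: int) -> None:
        self.regs[offset] = value

def uart_putc(dev: RegisterBlock, ch: str) -> None:
    # Poll until the transmitter can accept a byte, then write it.
    while not dev.read(REG_STATUS) & STATUS_TX_READY:
        pass
    dev.write(REG_TXDATA, ord(ch))

uart_putc(RegisterBlock(), "A")
```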

What does the next 12 months have in store for Vayavya Labs?

We have grown at a modest rate over the past decade, building up the necessary expertise, domain knowledge and credibility with customers while maintaining profitability. We are now well poised for growth in the automotive, consumer electronics and wireless/5G verticals, and we are continuing to expand our presence in North America and Europe. With strong fundamentals, we look forward to investing in new areas alongside that growth.

Also Read:

CEO Interview: Dr. Shafy Eltoukhy of OpenFive 

CEO interview: Graham Curren of Sondrel

CEO Interview: Mark Williams of Pulsic