Methodology for Aging-Aware Static Timing Analysis
by Tom Dillinger on 12-28-2021 at 10:00 am


At the recent Design Automation Conference, Cadence presented their methodology for incorporating performance degradation measures due to device aging into a static timing analysis flow. [1] (The work was a collaborative project with Samsung Electronics.)  This article reviews the highlights of their presentation.

Background

Designers need to be cognizant of the mechanisms that contribute to degradation over the operational lifetime of a part, to ensure the overall product requirements are satisfied (e.g., FIT rate).  There are both failure and degradation mechanisms to address.

Failure criteria are an absolute consideration, while degradation (or “aging”) processes may result in a hard fail or have an adverse impact on circuit performance.  The methodology for analyzing an aging mechanism involves an engineering assessment of the expected temperature and voltage environment, plus the switching activity likely to be applied during the part’s lifetime.

Failure Mechanisms

There is little latitude associated with the addition of ESD protection and latch-up suppression circuitry to avoid the related failures.

Time-dependent dielectric breakdown (TDDB) is an aging factor due to the “wearout” of the gate oxide dielectric.  The mechanism associated with TDDB is a thermo-chemical reaction, where (weak) chemical bonds in the dielectric are broken after extended exposure to the gate electric field.  The common model for TDDB is thus strongly dependent upon temperature and applied gate voltage, and may result in a “soft” (resistive) breakdown followed by a “hard” breakdown current path through the gate dielectric.
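For reference, TDDB lifetime is often described with a thermochemical “E-model” of the following general form (an illustrative expression; the qualified foundry model may differ):

TTF = A * exp( -gamma * E_ox ) * exp( E_a / (k*T) )

where E_ox is the oxide electric field, gamma is the field-acceleration factor, E_a is the activation energy, k is Boltzmann’s constant, and T is the absolute temperature – capturing the strong voltage (field) and temperature dependence noted above.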

Exceeding the peak current density limits in interconnects and vias is an immediate failure process.   The resistance change due to electromigration is an aging process, also strongly dependent upon temperature.  (Parenthetically, some methodologies view jRMS-related electromigration wearout analysis as indicative of a hard fail, whereas other methodologies approach the PDN and/or signal interconnect resistance increase as a performance-related impact.)
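Electromigration lifetime itself is commonly estimated with Black’s equation (again, an illustrative form rather than the foundry sign-off model):

MTTF = A * J^(-n) * exp( E_a / (k*T) )

where J is the current density, n is an empirically fitted exponent, E_a is the activation energy, k is Boltzmann’s constant, and T is the temperature – which is why both the current density limits and the operating temperature enter the electromigration analysis.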

Degradation Mechanisms

There are two principal device degradation aging mechanisms designers need to analyze, in terms of the potential performance impact – i.e., hot carrier injection (HCI) and bias temperature instability (BTI).  These are not direct fail processes, in that they result in changes in device drive currents and threshold voltages, but not an immediate failure of the circuit to operate.  They relate to the presence of carrier “trap states” at the channel interface and in the gate dielectric stack.  Channel carriers may cross the potential barrier at the interface (at high electric fields) and fill the traps.  The result is a change in the effective electric field at the channel from an applied gate voltage.

  • HCI

HCI is commonly associated with a device operating in the saturation region – also, commonly referred to as “pinchoff” at the drain node.  Carriers accelerated through the pinchoff depletion region are subjected to the gate-to-drain electric field.  These carriers may originate from the channel current and/or from secondary carriers due to impact ionization.  These energetic carriers may undergo a collision resulting in a vertical velocity vector, and may then trap in the dielectric stack near the drain.  Hot carriers may also break chemical bonds in the dielectric stack, resulting in the generation of additional traps.

The result is a localized reduction in the gate-to-drain electric field, as part of the electric field now terminates on the trapped charge.  This is typically modeled as a reduction in the effective channel carrier mobility.  (Note that if the device is used as a bidirectional pass gate, this drain node now becomes the source – a model that alters the threshold voltage rather than the carrier mobility may be more appropriate.)

For logic circuits, devices are operating in the saturation region only during a brief interval of a signal transition.  (For analog/mixed-signal circuits, devices biased in saturation are subjected to greater HCI exposure.)  As a result, logic performance degradation is commonly associated with BTI.

  • BTI

The bias temperature instability mechanism is present when the device is operating in the linear region.  This occurs when a logic device is “on” and has completed a signal transition.

Channel carriers enter the dielectric stack and fill trap states.  BTI manifests as an adverse shift in the device threshold voltage – i.e., an increase in the absolute value of Vt for both nMOS and pMOS devices.  Negative BTI (NBTI) refers to the pMOS device Vt shift, due to the negative gate-to-channel electric field direction; positive BTI (PBTI) refers to the nMOS device Vt shift.

The delta in the threshold voltage eventually saturates over time as trap states are filled.  Note that BTI models also include a (partial) recovery in the Vt shift for the time period when the device gate-to-channel electric field is reversed, as depicted below. [2]
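Foundry aging models are proprietary, but the stress behavior is often approximated by an empirical power law in stress time – sketched here only to make the saturation and recovery trends concrete:

delta_Vt(t_stress) ~ A(Vgs, T) * t_stress^n      (n typically ~0.1 to 0.25)

with a fraction of the shift relaxing during recovery intervals when the gate-to-channel field is removed or reversed.  The per-device split between stress and recovery time is exactly what the signal-probability calculation described later in the flow provides.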

As the BTI mechanism is present whenever a logic gate is quiescent, the Vt shift contributes to significant performance degradation over a part’s lifetime.

Static Timing Analysis Methodology with Aging

The simplest method of modeling aging effects would be to apply a multiplicative “derate” to the target cycle time.  In short, the “fresh” cycle time used during design timing closure would be multiplied by a conservative aging factor and released with the reduced frequency spec – i.e., a “guardband” approach.
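Written in the style of the characterization formula shown later, this amounts to something like the following, with the derate value itself being a single, conservative, design-wide assumption:

aged_cycle_time = fresh_cycle_time * aging_derate      (aging_derate > 1)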

Alternatively, a more sophisticated method would be to apply a cell instance-specific delay calculation for aging to an STA flow.  The individual cell delay arcs would reflect a (voltage and temperature) environmental assumption over the circuit lifetime.  This method requires a cell library characterization strategy that expands upon the traditional model of:

delay_arc = f( PVT, input_slew, output_load)

to include new dimensions, reflecting the aging delay value.  The figure below depicts the Cadence methodology for cell characterization and aging-aware STA.

The characterization strategy requires adding delay values for different combinations of Vt shifts due to BTI of individual devices.  Spice aging models are provided by the foundry.
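Conceptually, this adds a Vt-shift axis to the familiar slew/load delay tables.  A minimal sketch of what an aging-aware delay lookup might look like is shown below (illustrative only – the grid points, sensitivities and table format here are invented, not the actual library data):

from itertools import product

# Hypothetical characterization grid at one PVT corner: delay (ps) indexed by
# (input_slew_ps, output_load_fF, dVt_shift_mV), where dVt is the BTI-induced shift.
slews = [10.0, 50.0]         # ps
loads = [5.0, 20.0]          # fF
dvts  = [0.0, 20.0, 40.0]    # mV

# Fresh delay plus a made-up sensitivity of 0.3 ps per mV of Vt shift,
# purely to illustrate the extra table dimension.
table = {(s, l, d): 30.0 + 0.4 * s + 1.5 * l + 0.3 * d
         for s, l, d in product(slews, loads, dvts)}

def aged_delay(slew, load, dvt_shift):
    """Nearest-grid-point lookup; a real flow would interpolate."""
    key = (min(slews, key=lambda x: abs(x - slew)),
           min(loads, key=lambda x: abs(x - load)),
           min(dvts,  key=lambda x: abs(x - dvt_shift)))
    return table[key]

print(aged_delay(50.0, 20.0, 0.0))    # "fresh" delay arc
print(aged_delay(50.0, 20.0, 40.0))   # same arc after the projected BTI shift

The per-device dVt value applied during STA comes from the stress/recovery durations discussed next.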

The static timing analysis flow is depicted on the right side of the figure above.  An additional input to the aging-aware STA flow is a description of the (piece-wise) expected voltage and temperature conditions which individual blocks will experience over the part lifetime.  The methodology for calculating the duration for which each device is subjected to forward and recovery BTI stress is based on signal probability measures, as illustrated in the figure below.

As an example, for the 2-input NAND gate in the figure, if pin A has a (0,1) probability of (0.44,0.56), and pin B has a (0.6,0.4) probability, the gate output will have a (0.224,0.776) probability to apply to its fanout, derived from the calculation (0.56*0.4, 1 – 0.56*0.4).
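A minimal sketch of this static probability propagation (assuming independent inputs; the production flow also converts these probabilities into per-device stress and recovery durations):

def nand2_prob(p_a, p_b):
    # Each signal is a tuple (P(x=0), P(x=1)).
    # A NAND output is 0 only when both inputs are 1.
    p_out_low = p_a[1] * p_b[1]
    return (p_out_low, 1.0 - p_out_low)

pin_a = (0.44, 0.56)
pin_b = (0.60, 0.40)
print(nand2_prob(pin_a, pin_b))   # ~(0.224, 0.776), matching the example above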

An alternative approach would be to apply signal value duty cycles from extensive (gate-level) workload simulations.  The probabilistic approach is simpler, yet it may not reflect extended periods of operation in a specific quiescent state.

To illustrate the flow, Cadence collaborated with Samsung on a 5nm process node design example.  Using the Samsung aging model design kit for library cell characterization, STA was pursued for a core-level design.  Then, 500 paths were selected for a detailed Spice-based aging delay simulation.  The STA versus Spice comparison data is shown below.

Summary

Designers need to evaluate performance degradation effects due to BTI stress over a part’s lifetime.  Using a uniform guardband multiplier could be quite inaccurate, as it would not be representative of the varying stress/recovery characteristics of (instance-specific) circuit activity.

For more information on the aging-aware STA flow from Cadence, please follow this link.

References

[1]  Amin, C., et al., “Aging-aware Static Timing Analysis”, DAC 2021.

[2]  https://www.cadence.com/en_US/home/tools/custom-ic-analog-rf-design/custom-ic-analog-rf-flows/legato-reliability-solution/advanced-aging.html

Also Read

Scalable Concolic Testing. Innovation in Verification

More Than Moore and Charting the Path Beyond 3nm

Topics for Innovation in Verification


Delivering Systemic Innovation to Power the Era of SysMoore
by Kalar Rajendiran on 12-28-2021 at 6:00 am


With the slowing down of Moore’s Law, the industry as a whole has been working on various ways to maintain the rate of growth and advancement. A lot has been written about the various solutions being pursued to address specific aspects. The current era is referred to by different names, SysMoore being the one that Synopsys uses. Aart de Geus, Chairman and co-CEO of Synopsys, coined this term as a shorthand way to describe the new era, one that blends Moore’s Law-driven advances with innovations that tackle systemic complexity. Per Synopsys’ website, “SysMoore is a descriptive term for state-of-the-art integrated circuit design, which combines the scale complexity of Moore’s law with the systemic complexity of hyper-convergent integration.”

Synopsys gave a presentation at DAC 2021 on the topic of delivering systemic innovation to power the era of SysMoore. The talk was given by Neeraj Kaul, VP of Engineering, Silicon Realization Group (SRG) at Synopsys. He starts by looking back at the Moore’s Law era and spends the rest of his presentation focusing on the SysMoore era. He highlights new complexities, opportunities for new advances, and what Synopsys is bringing out in terms of new technologies for this era. The following is a synthesis of the salient points I gathered from his talk. You can listen to Neeraj’s entire talk from the TechTalks track of the DAC 2021 virtual sessions.

View of an Evolving Landscape

Transformation is happening at a much faster rate than we have seen in the past few decades. The amount of compute power currently available is tremendous. At the same time, the amount of data being sensed, processed and transferred, in petabytes, exabytes and zettabytes, is requiring us to re-examine our way of computing. The number of design starts is accelerating at a rapid rate. This is placing tremendous pressure on the industry and calls for thinking of new ways of handling the complexity requirements and time pressure demands of the markets.

There are a number of vertical markets in this evolving landscape. Refer to the figure below. While the markets are vertical, there are some things all of them have in common: the age-old performance, power and area (PPA) requirements and increased pressure on cost and turnaround time to results. Together, these five things are termed by the acronym PPAct. General-purpose chips cannot deliver to market/product expectations on PPAct metrics. This pressure is pushing customers to design custom silicon. A custom silicon initiative allows customers to look at the entire system, all the way from software to silicon, and optimize through vertical integration.

As if the PPAct pressures were not enough, SysMoore applications introduce vertical-specific challenges into the mix. For example, mean time to failure, longevity of a chip, security, etc., become critically important when dealing with the data center, automotive and healthcare markets.

The Waning of Moore’s Law Era

Moore’s Law had been delivering well over several decades. We got accustomed to seeing 2x improvements on all three aspects of the PPA metric every two years or so. Over the last few years, we have seen a flattening of the Moore curve. PPA improvement is becoming difficult to achieve simply by moving from the current process node to the next. As we started entering the sub-7nm era, power and performance are not scaling at the rate Moore’s Law has been delivering. We are seeing only 15% to 30% improvement moving from node to node. Power and performance are becoming bottlenecks, while area scaling continues to deliver at 2x. But the market demand for power and performance improvements remains. The industry and the market have entered the SysMoore era.

Synopsys’ Approach to Powering the SysMoore Era

The SysMoore era requires innovations in many different areas in addition to moving from node to node. We need ways to deal with systemic complexities and continue to advance in the same way, and at the same rate, as in the past. The systemic complexities are adding to the explosive demand on engineering resources, compute power and turnaround time expectations. We need techniques to improve overall productivity, so that we don’t need 2x-3x the number of engineers to tackle SysMoore-era designs and systems.

Synopsys has identified six vectors as complexity/efficiency roadmap drivers to power the SysMoore Era.

Enabling domain-specific architectures

Support for domain-specific architectures is key to achieving customers’ PPAct metrics, as these architectures help maximize performance and minimize power for each application. Synopsys’ Platform Architect and RTL Architect products are used by designers and architects to customize and optimize their systems and chips. Neeraj shared a customer example where the RTL Architect product was used to explore a larger design space and choose the right RTL architecture. The customer was able to achieve 5X faster TAT and a 300MHz frequency boost for their product.

Scaling Challenge

Traditional tools/flows require iterations, build in pessimistic margins and deliver sub-par PPA results. 1D, 2D and diagonal placement rules, along with context-based timing and power, are all crucial to consider in the early stages of a design. The Fusion technology/platform from Synopsys is a hyperconverged system handling RTL to tapeout with an integrated common database. The flow/platform is augmented with AI-driven Design-Space Optimization (DSO) to achieve better results faster. And a comprehensive analytics platform completes the trifecta. This triple play of Fusion, DSO and the analytics platform enables customers to quickly and accurately identify root causes of issues, which in turn helps them rapidly resolve those issues.

A customer example that Neeraj presented shows an 11% power reduction with just one engineer working on a high-performance GPU design. In the past, achieving comparable results would have required many engineers working for many months.

Robustness analysis for advanced-node variability

On-chip variation is a big issue these days as we move to finer and finer geometries. Synopsys PrimeShield analyzes the robustness of a design to on-chip variation. It performs sensitivity analysis so that paths can be fixed before they cause silicon failures. The tool helps identify sensitive bottlenecks and improves resilience to IR drops. This analytical capability helps improve post-silicon robustness by detecting voltage slack paths and optimizing before tapeout.  Voltage slack is a new metric to measure how resilient a design is to voltage variation. Neeraj shared a customer example where a 9% voltage slack improvement was achieved on a CPU core.

3DIC Compiler

Synopsys 3DIC Compiler enables efficient integration of systems of chips, aka chiplets, leveraging 2.5D/3D multi-die designs. It leverages the Fusion single data model and allows for fast exploration and pathfinding to accelerate the design process. Auto die-to-die (D2D) routing, native DRC and DFT for design realization and validation are included. Together with signal integrity, power integrity, thermal and EMIR analysis, it assists designers in arriving at optimal PPA per sq. mm.

Summary

The fusion of tools over an integrated common database, the deployment of AI techniques to augment the tools and the provision of insightful analytics are key to powering the SysMoore era. Synopsys’s innovations are designed to address the PPAct, productivity, safety, security and resilience requirements of this era’s markets and applications.

Also Read:

Creative Applications of Formal at Intel

Synopsys Expands into Silicon Lifecycle Management

CDC for MBIST: Who Knew?


DAC 2021 – Taming Process Variability in Semiconductor IP
by Daniel Payne on 12-27-2021 at 10:00 am


Tuesday at DAC was actually my very first time attending a technical session, and the presentation, from Nebabie Kebebew of Siemens EDA, was called Mitigating Variability Challenges of IPs for Robust Designs. There were three presentations scheduled for that particular Designer, IP and Embedded Systems track, but with the COVID precautions, only one presenter was on site. The technical paper authors were from both STMicroelectronics and Siemens EDA.

ST designs both digital and mixed-signal IPs for use in diverse applications like IoT, automotive and even AI, using process nodes from 90nm down to 18nm. For safety critical products like automotive chips, the challenge is to reduce the failure rate, measured in Parts Per Million (PPM). High-sigma verification is needed to meet the stringent PPM goals, and with each new process node the design sizes are getting more complex, causing even more simulation runs to verify across many Process, Voltage and Temperature (PVT) corners.

To reach the PPM goals would require high-sigma Monte Carlo circuit simulation for millions or billions of runs, something that is not feasible because the run times are just too long.

At ST they wanted to use one EDA tool flow to verify both standard cell libraries and memory IP across all PVT corners, using a smarter Monte Carlo simulation approach to achieve high-sigma verification in much less time.

Standard Cells

The challenge was that the standard cell libraries contained about 10,000 cells, and required 100 PVT corners to be simulated. Another factor was that IC design process variations are non-Gaussian at high-sigma.

For example, the long-tail distribution of 894k samples on an internal node of a sequential cell is shown below:

Non-Gaussian distribution

The bars in green are from running just 1,000 samples; if we assume a Gaussian distribution, then linear extrapolation predicts a node value of 723mV at 4.5 sigma. Using the worst case from 894k samples, the actual worst node value is 680mV, which is about 50mV different from the linear extrapolation. For sequential cells like a Master Slave Flip-Flop, you cannot use linear extrapolation to qualify high sigma, nor can you wait long enough for brute-force Monte Carlo results to complete.
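To see why the Gaussian assumption fails, here is a toy illustration on synthetic data (the distribution, seed and numbers are invented and are not ST’s data; only the qualitative gap matters):

import random, statistics

random.seed(1)

# Synthetic, deliberately non-Gaussian node-voltage population (long lower tail),
# standing in for the 894k-sample distribution discussed above.
population = [0.75 - abs(random.gauss(0, 0.01)) - random.expovariate(200)
              for _ in range(894_000)]

small_sample = random.sample(population, 1_000)
mu    = statistics.mean(small_sample)
sigma = statistics.stdev(small_sample)

gaussian_4p5_sigma = mu - 4.5 * sigma    # linear extrapolation from 1k runs
empirical_worst    = min(population)     # the "brute force" answer

print(f"Gaussian extrapolation at -4.5 sigma: {gaussian_4p5_sigma*1000:.0f} mV")
print(f"Empirical worst case over 894k runs:  {empirical_worst*1000:.0f} mV")

The long tail makes the extrapolated estimate optimistic, which is exactly the gap a dedicated high-sigma methodology has to close.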

An EDA tool from Siemens EDA called Solido Variation Designer addresses these exact issues with a two-stage flow:

  1. Solido PVTMC Verifier
  2. Solido High-Sigma Tech

The first tool uses a machine learning algorithm to identify all of the worst-case PVT corners, quickly and accurately. Even the long-tail distributions are captured, and the approach dramatically reduces the actual number of PVT corners to be simulated.

The Solido High-Sigma Tech launches a SPICE simulator called Eldo to capture the rare simulation events at the highest-sigma points using only the worst-case PVT corners identified in the first tool, thus reducing the verification run times.

Worst-Case Points

Using this two-stage flow to verify latch robustness at 6-sigma for a positive edge-triggered Master Slave Flip-Flop required only 20 PVT combinations (enumerated in the short sketch after the list):

  • 5 process corners: FF, SS, FS, SF, TT
  • 2 voltages: 0.80V +/- 10%
  • 2 temperatures: -40C / 125C
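The 20 combinations are simply the cross product of those lists; a trivial sketch, using the corner names and values from the bullets above:

from itertools import product

process    = ["FF", "SS", "FS", "SF", "TT"]
voltages_v = [0.80 * 0.9, 0.80 * 1.1]   # 0.80V +/- 10%
temps_c    = [-40, 125]

corners = list(product(process, voltages_v, temps_c))
print(len(corners))                      # 20 PVT combinations
for p, v, t in corners[:3]:
    print(f"{p}  {v:.2f}V  {t}C")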

Real SPICE circuit simulations were run at high-sigma conditions in order to capture the non-Gaussian long-tail behavior, but only on the worst PVT corners, resulting in high-sigma verification that was 10,000X faster than brute-force techniques.

Memory IP

As with standard cells, there was a huge verification space for memory IP, with about 64 PVT combinations and tens of IP instances. Netlists were quite large, with over 1M components. The bitcells in a memory IP require 6-sigma verification, and there are millions of bitcells in a memory IP block.

Using conventional verification methods, an engineer would identify the memory instances with the smallest race margins and the worst PVT corners, run 300 Monte Carlo circuit simulations by sigma saturation on all of the race conditions, then assume a Gaussian distribution to determine the -5 sigma tail values. Drawbacks of this verification method are that you don’t know at which PVT corners the race-condition tail failures will occur, instances with larger nominal race margin values may have a larger sigma too, and the actual process distributions deviate from Gaussian in their long tails.

Shown in red below are Gaussian approximations, while in reality the green curve shows actual variations.

Gaussian approximation – Red

Using the two-stage flow with Solido Variation Designer on memory IP for 28nm and 40nm designs showed that high-sigma verification results could be obtained 27,000X faster than with brute-force Monte Carlo circuit simulations.

Summary

This group at ST has adopted a unified flow and methodology for standard cell libraries and memory IP verification, across PVT corners, while still modeling non-Gaussian behaviors, and achieving dramatic runtime reductions compared to brute-force Monte Carlo simulations, by using Solido Variation Designer. Nebabie also did a poster session on this paper at DAC.




5 Talks on RISC-V
by Milos Tomic on 12-27-2021 at 6:00 am


Veriest recently hosted a webinar focusing on RISC-V as a forerunner of the ongoing open-source revolution in chip design. The speakers were distinguished professionals from industry and academia. The webinar covered topics from market trends to open-source hardware initiatives, tools and methodologies.

Zvonimir Bandić: RISC-V market update and CHIPS Alliance

Zvonimir is a Research Staff Member and Senior Director of the Next Generation Platform Technologies Department at Western Digital (WD) and Chairman of the CHIPS Alliance. He shared the story of how RISC-V came to WD, almost by accident, from UC Berkeley and made a huge impact on the company. Zvonimir cited a report from 2020 which claims that 23% of all ASIC and FPGA projects incorporate RISC-V in some way, while his personal feeling is that this percentage is even higher. Some of the markets where RISC-V has already found its place are data centers, cloud, HPC, telecom, automotive, consumer and IoT, AI/ML, edge computing, etc. It is estimated that the RISC-V CPU core market will grow at a 114.9% CAGR, capturing over 14% of all CPU cores by 2025 – nearly 80 billion cores. Zvonimir claims that the core is only 3% of the whole ecosystem and that, along with the core market, the surrounding IP and software markets will grow as well, offering a vast number of opportunities for companies and individuals to jump on board.

Some of the main challenges in chip design today are development cost, time-to-market and need for purpose-built architecture. CHIPS Alliance is an organization looking to address these challenges by focusing on open-source hardware and open-source software for hardware design. It develops and hosts open-source RISC-V CPUs, hardware IPs and open-source ASIC & FPGA development tools. People and organizations looking to start design of their own product can find everything they need on CHIPS Alliance GitHub.

Prof. Borivoje Nikolić: Chipyard – Generating the next wave of custom silicon

Prof. Nikolić is a Distinguished Professor of Engineering at the University of California, Berkeley. He shared insights on how RISC-V was born at Berkeley and how they are addressing the main challenges in chip design today. Like Zvonimir, Prof. Nikolić sees increased market demand for specialized chips, with the main challenges being the development cost and the time it takes to build a custom chip. At Berkeley they believe that the current way of delivering IPs as black boxes greatly affects reusability. Instead of delivering instances, their approach is to use parametrized generators to describe the hardware and generate RTL. The generators are written in Chisel, a hardware design language based on Scala. Generators not only provide an easy way to customize designs, but also enable agile hardware development and fast turnaround cycles, something that is hard to achieve with the traditional approach. Proof of this concept is Rocket Chip, a parametrized SoC generator with hundreds of commercial implementations.

To build a complete chip, a lot of open-source components are used. To connect tooling, generators and flows together, Berkeley created a framework called Chipyard, which is a one-stop shop for agile SoC design.

Vladislav Palfy: OneSpin 360 processor verification app

Vladislav is a Senior Applications Engineering Manager at OneSpin. OneSpin is part of Siemens EDA and a member of RISC-V International and the Open Hardware Group. Vladislav explained how the OneSpin 360 product can be used to perform formal verification of a RISC-V core. Vladislav also pointed out the advantages of formal verification over simulation and why it is especially suitable for a design as complex as a processor. In functional simulation it is very hard, if not impossible, to describe and cover all states. In OneSpin 360, the formal verification test process is automated, no test development or assertion specification is required, and runtime and coverage closure are much faster. In addition, formal verification will help find bugs we were not looking for, and also discover functionality that is not documented – something they encountered with one of the popular RISC-V cores. OneSpin 360 supports RISC-V extensions, custom instructions, and RISC-V cores specified in Chisel. In case of an issue, the tool offers a graphical debugging environment where the user can see the failing checker, the trace, and the code that caused the failure.

Siniša Stanojlović: RISC-V Memory protection

Siniša is the CEO of Micro Circuits Development, a professional services provider for embedded systems. Siniša elaborated on the vulnerabilities of all-connected/all-smart devices. RISC-V based devices are not immune to these threats, but they are different. While RISC-V devices implement security modules similar to those of other architectures, the key difference is that they are open. As in software, some see this openness as an advantage, others as a disadvantage.

Siniša then focused on the example of memory protection in RISC-V through memory isolation. To achieve this, the RISC-V ISA includes a privileged instruction set specification which defines three types of computer systems:

  • Systems that have only machine mode
  • Systems with machine mode and user mode
  • Systems with machine mode, supervisor mode, and user mode

RISC-V has physical memory protection, which is used to enforce memory access restrictions on less privileged modes; e.g., from machine mode, software can configure which parts of memory each user-mode application can access.

Miloš Tomić: Getting started with open-source RISC-V cores

I’m an ASIC Design Engineer at Veriest, an ASIC design and verification services provider. The RISC-V surge has created a lot of new business opportunities for service companies in the semiconductor industry.

For this webinar, I shared my view of the RISC-V ecosystem and my experience getting started with RISC-V. The focus was on available open-source core implementations and their specifics. I covered some of the key considerations that have to be made when choosing an open-source core for a new project. These include core features, target application and technology, software requirements, licensing, etc.

In the end, a short summary and comparison of some of the most widespread RISC-V implementations was given.

Finally, the conclusion was that you can build your own RISC-V SoC using only open-source tools and components, and there is more than one path you can take.
We’re looking forward to continuing to explore this interesting topic in future events. If you would like to be informed about such events, please let us know here.

Also Read:

Ramping Up Software Ideas for Hardware Design

Verification Completion: When is Enough Enough?  Part II

Verification Completion: When is enough enough?  Part I


Podcast EP54: Ventana Micro, RISC-V, HPC and Chiplets
by Daniel Nenni on 12-24-2021 at 10:00 am

Dan is joined by Balaji Baktha, founder and CEO of Ventana Micro. Balaji explores the application of RISC-V in high-performance applications and the specific advantages of a chiplet-based approach.

RISC-V Summit Panel: https://www.youtube.com/watch?v=duZaAhWxhWM

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


AI for EDA for AI
by Daniel Nenni on 12-24-2021 at 6:00 am


I’ve been noticing over the last few years that electronic design automation (EDA) vendors just love to talk about artificial intelligence (AI) and machine learning (ML), sometimes with deep learning (DL) and neural networks tossed in as well. It can get a bit confusing since these terms are used in two distinct contexts. The first is the use of EDA tools to develop chips for AI applications, which are some of the largest and most complex designs being developed today. Of course, self-driving cars and other autonomous vehicles are the most popular examples. It’s easy to see why; being able to replace human drivers and all the multi-faceted decisions they make requires a ton of powerful AI software running on specialized hardware. The system must be optimized for image recognition, ML, natural language processing (NLP), real-time responses, and more. AI shows up in other applications as well; speech recognition in particular is everywhere.

I guess that’s obvious, but the other context is the use of AI techniques within EDA tools. That is not as widely known despite the tendency of EDA vendors to trumpet such usage. At times I’ve wondered if it’s all a lot of hype to jump on the AI bandwagon, but at this point I think it’s clear that there are at least a few, and perhaps many, places in the EDA sphere where AI and ML really do apply. For example, vendors have announced implementation tools (logic synthesis, floorplanning, and place and route) that use ML from past projects and the results thus far on the current project to improve chip power, performance, and area (PPA). In addition, AI-based recognition of error signatures can speed debug of failing tests during chip verification. Just before DAC, another example caught my attention: Agnisys announced the use of AI to translate English descriptions of design intent to SystemVerilog Assertions (SVA), and vice-versa. I had not heard of AI/ML being used for this purpose before, so I decided to learn more.

The first thing that struck me was that the press release announced a “technology” and not a product. It sounded as if the translation was available to anyone at iSpec.ai so I checked it out. I was pleasantly surprised to find the site just as advertised. Users can type in some English text and push a button to generate SVA or enter some SVA code and generate the English equivalent, and then provide feedback on the results. I don’t claim to be an assertions expert, but I tried some English examples and the underlying algorithms seemed to handle them just fine. I wondered why an EDA vendor would offer this site for free rather than charging for the technology in a product, so I asked Agnisys CEO and founder Anupam Bakshi for more information.

Anupam described this as a crowdsourcing site and said that they made it free and open specifically to gather many different examples of how engineers think about design intent and describe assertions in natural language. He said that they performed initial training on the algorithms using assertion examples gathered from many sources, including an industry expert who literally wrote the book on SVA. But they knew that this would not be enough to create a robust technology that users could rely on, so they created the site and announced its availability to their users. Their R&D engineers carefully studied all the examples provided and, especially when users provided feedback that the results were not perfect, provided guidance to the tool as needed to learn from these additional examples. By the end of this process, they were comfortable letting everyone try it out. Anupam remarked that the technology is not yet “done” and that it will continue to improve in capacity and flexibility with additional crowdsourcing and lots more diverse examples.

Having said that, he stressed that what’s available now is powerful and valuable. He pointed out that the developers focused on robustness, necessary given the inherent ambiguity of natural language. He demonstrated the resilience of the algorithms by typing in a bunch of examples with typos in the English text, and the generated SVA was still correct. I was impressed that typing “onenot” instead of “onehot” and “bicycles” rather than “cycles” didn’t cause confusion; I guess that’s truly intelligent AI in action. It seems to me that iSpec.ai will be immediately useful for many types of assertions. I won’t yet go as far as to predict that users won’t have to learn SVA at all, but that seems like an entirely possible outcome as the technology matures further.

Users who do want to learn and write SVA will doubtless benefit from the translation in the opposite direction, using the generated English descriptions to double-check that their assertions specify what they intended. Anupam mentioned two additional uses: understanding and documenting existing assertions. Engineers often license IP blocks or inherit designs from other sources, and these may contain SVA that they didn’t write. Translating them into text could help the users to figure out what these assertions do. This process could also be used to document assertions, whether self-written or inherited, in verification plans and specifications.

I found this whole topic fascinating, and I suggest that everyone interested in assertions visit iSpec.ai and try some examples. I think you’ll be impressed and, if you do fool the AI with a novel way to express design intent, just provide feedback and rest assured that the Agnisys team will use your clever example to enhance and expand the technology for the benefit of all users. That’s what crowdsourcing is all about. Have fun!

Also read:

What the Heck is Collaborative Specification?

AUGER, the First User Group Meeting for Agnisys

Register Automation for a DDR PHY Design


Scalable Concolic Testing. Innovation in Verification
by Bernard Murphy on 12-23-2021 at 10:00 am


Combining simulation and symbolic methods is an attractive way to excite rare branches in block-level verification, but is this method really scalable? Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Scalable Concolic Testing of RTL Models. The paper was published in the IEEE Transactions on Computers, July 2021. The authors are from the University of Florida, Gainesville.

Reminder: concolic testing uses simulation analysis (concrete) together with symbolic methods to try to achieve a balance between the scalability of concrete simulation and the exhaustive coverage of symbolic analysis. The authors use this technique to improve coverage of hard-to-reach branches in block-level RTL. Their main contribution is improved heuristics to reduce path explosion in the symbolic analysis without impacting coverage.

The authors have a couple of innovations. They have a method they call “contribution-aware edge-realignment” whose goal is to find an efficient way to force a single path to an uncovered branch. This avoids state explosion problems. They look for assignments to variables used in the branch condition and grade these based on their likely contribution to meeting that condition.

The second innovation aims to overcome inefficiency in covering only one rare branch at a time. They strive to cover multiple targets in part by pruning, so that if uncovered branch x is in the path to uncovered branch y, x can be dropped as it will be covered in activating y (unless y is unreachable). They show considerable improvement over other reported work in run times and memory to activate rare branches.

Paul’s view

This is a great paper and a very easy read given the depth of the content. I am a big fan of concolic methods, here used to improve branch coverage. This work is very relevant for mainstream commercial verification.

The core contribution in the paper is the concept of a branch activation graph. This is a graph with nodes representing branches in the RTL (i.e. a branch condition and its begin-end body), with an edge from branch A to branch B if there exists a design state where branch A is triggered and B is not, and where executing the body of branch A takes the design to a state in which branch B is then triggered.

This activation graph serves as a guide for a symbolic simulator to prioritize its search to reach an as-yet uncovered branch in verification. If there are no input values that can trigger an uncovered branch from the current state, try input values that trigger an adjacent branch in the activation graph. If this is not possible, pick a branch that is two hops away from the uncovered branch in the activation graph. And so on. After applying this heuristic for several clock cycles there is a good chance the symbolic simulator will hit the uncovered branch. Certainly a much better chance than were it to just randomly toggle inputs and hope to get lucky.
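A rough sketch of the hop-distance idea (my own illustration, not the authors’ implementation): compute each branch’s distance to the uncovered target with a reverse breadth-first search over the activation graph, then at each step steer toward the triggerable branch with the smallest remaining distance.

from collections import deque

def hop_distance_to_target(activation_graph, target):
    # activation_graph: {branch: set of branches it can trigger next}.
    # Returns each branch's minimum number of hops to `target` (reverse BFS).
    reverse = {}
    for src, dsts in activation_graph.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)
    dist = {target: 0}
    frontier = deque([target])
    while frontier:
        node = frontier.popleft()
        for pred in reverse.get(node, ()):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                frontier.append(pred)
    return dist

# Toy graph: branch 'a' can lead to 'b' or 'c', 'b' leads to the rare branch, etc.
graph = {"a": {"b", "c"}, "b": {"rare"}, "c": {"b"}, "rare": set()}
dist = hop_distance_to_target(graph, "rare")

def pick_next_branch(triggerable):
    # Among branches triggerable from the current state, prefer the one
    # closest (in activation-graph hops) to the uncovered target.
    reachable = [b for b in triggerable if b in dist]
    return min(reachable, key=dist.get) if reachable else None

print(dist)                          # e.g. {'rare': 0, 'b': 1, 'a': 2, 'c': 2}
print(pick_next_branch({"b", "c"}))  # 'b' (one hop from the rare branch)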

The results presented are compelling. Use of the activation graph along with a few other innovations to prune searches and pick good starting traces results in a solution that is 40x faster and scales 10x more efficiently in memory consumption with design size compared to prior work using alternate heuristics in the symbolic simulator. There is just one outlier case, a usb_phy, where their approach does not work as well. I am really curious why this testcase was an exception; unfortunately, this wasn’t explained in the paper.

We are working on a formal-based concolic engine at Cadence that we call “trace swarm”. The activation graph concept in this paper could be a great fit for this too.

Raúl’s view

The system uses the Design Player Toolchain to flatten Verilog and the Yices SMT solver for constraint solving. The authors compare experimental results to the Enhanced Bounded Model Checker (EBMC) and to QUalifying Event Based Search (QUEBS), another concolic approach. They selected benchmarks from ITC99, TrustHub and OpenCores, with 182 to 456,000 lines of code for the flattened designs, and 20 hard-to-reach targets in each benchmark, which they picked after running a million random tests.

Coverage for this approach is 100%, execution time ranges from sub-second to 134s, and memory from 9.5MB to 1.2GB. The other approaches don’t reach 100% coverage. They also run between a few times and a few hundred times slower, using 1-10x the memory. There are outliers: USB_PHY runs 16-50x slower in the presented approach (134s) using 5 times more memory (just 138MB). As Paul commented, an explanation would have been nice. QUEBS runs ICACHE 30,000 times slower.

The paper also shows that memory scalability, as unrolled cycles/lines of code increase, is much better than EBMC’s (EBMC is the more scalable of the competing approaches). Finally, they compare target pruning to EBMC, also showing better results. For example, the number of targets pruned by this approach is 15 vs. 10 in EBMC.

I like the approach for being pragmatic. It uses an intuitive concept of distance based on assigned variables and path lengths, avoiding repeat computations (pruning, clustering). And of course there is the mix of simulation and constraint solving.  The results are promising, with modest execution times and memory usage, and scale well. This would be a worthwhile addition to constrained RTPG to increase coverage, or to existing concolic tools.

My view

As a path to activate rare branches, this looks like a more targeted approach than starting traditional BMC checking from various points along a simulation path. Very interesting.

Also Read

More Than Moore and Charting the Path Beyond 3nm

Topics for Innovation in Verification

Learning-Based Power Modeling. Innovation in Verification


Cut Out the Cutouts
by Aaron Edwards on 12-23-2021 at 6:00 am


In 2014, many of the customers that my team and I supported in North America were still using HFSS 3D to model boards and packages. These customers were content with that interface, able to get their models set up quickly, and were okay with the solution times because when HFSS gave them an answer, they knew it was the right answer. I was a little frustrated with this situation because Ansys had delivered some amazing technology in the HFSS 3D Layout environment, and within the HFSS solver, to allow customers to get to those extremely accurate answers, but in less time. As I have blogged about before, Ansys introduced key technologies that enabled substantial reduction in simulation times:

  • Phi Mesher – efficient meshing technique that tackles the layered structures found in PCB, package and IC designs
  • Distributed Direct Solver – distributes the matrix solution during the adaptive pass or frequency sweep stages across multiple cores or across multiple nodes for improved scalability
  • Auto HPC – allows HFSS to optimally apply the total number of cores and/or machines available to solve the project in the most efficient manner
  • Ansys on the Azure Cloud – made HPC extremely easy to access and to scale up cores/RAM to solve models fast

These were just a few of the key advances in the software that allowed users to speed up their simulation times… yet users were not switching to HFSS 3D Layout. Six years later, in June of 2020, due to competitive claims against HFSS, our customers came to us stating, “I hear that other tools can run faster than HFSS… how can you make HFSS run faster?” Like a broken record I told them about HFSS 3D Layout, the Phi Mesher, the Distributed Direct Solver, Auto HPC, and Ansys on the Azure Cloud. Those competitive claims were just what we needed to get our customers to see the light! HFSS 3D Layout was poised to be the solution they were looking for, and for every benchmark run it reduced their solution times from 2X to 20X in some cases. 18 months later, our customers have fully embraced HFSS 3D Layout for their chip/package/board workflows, and I see they are benefiting in two ways. Some are able to solve their critical nets far faster than before, and are able to give pass/fail metrics to their design teams a lot sooner; this allows those teams to get their products to the market faster. The other benefit I see is that customers are solving 2X to 4X the number of nets that they would have solved before. The simulation times are still reasonable for the increased number of nets, and customers are able to characterize reliability concerns like cross-talk and cross-coupling to ensure robust designs. Being able to solve larger portions of the design in one model allows customers to reduce failures after production because they have modeled more of the design under real-world conditions.

With the incredible success our customers have had with HFSS 3D Layout, there is still one issue that I see today with their models… and that is ‘cutouts’. Customers are still trying to make the model as small as possible to reduce the amount of RAM it needs to solve. I want to stop this practice, but I know it is ingrained in our user base… and I know why they do it… ME! Yes, this was a common practice 5-10 years ago, when the overwhelming majority of our customers were running projects on one machine. This machine may have had 250GB of RAM if they were lucky, and making sure the simulation was able to mesh, adaptively refine and complete a frequency sweep, all within that RAM footprint, was critical. So Ansys’ AE staff back then would teach customers how to create cutouts that would minimize the size of the model, and in turn, minimize the RAM footprint. Sometimes that practice would cause accuracy issues because the cutout would be too close to the traces and adversely affect the return path, and/or introduce false return paths. There were two common ways to cut the model: using a conformal cut that followed the path of the traces (which often caused rounded edges) or manually creating a polygon to closely follow the traces. We would also teach customers to methodically go through the design and remove any object that was not electrically important to the model. Objects like vias, pads, and thru-holes were all manually removed to get rid of those unnecessary mesh elements. This type of cleanup could take hours to perform.

So, I am happy to announce that with 2021R2 HFSS 3D Layout, and the aforementioned key advancements in the software, we can abandon those practices. Our suggestion is to just use a rectangular cut sufficiently far away from the traces. No more conformal cuts, no more polygons, and in general… no need to perform excessive cleanup. Why should you use just a rectangular cutout? For one, hardware has dramatically improved. Many customers have access to on-prem hardware that is well above the previous standard of 250GB of RAM. They have been able to string multiple machines together to allow larger simulations to run without issue. We have also seen adoption of the Ansys Cloud, which has given customers access to the cores/RAM needed to solve their biggest projects. Secondly, rectangular cuts tend to help the mesher reduce unnecessary edges and vertices that were introduced by the old cutout methodologies.

Need more convincing? Below, I show a quick comparison of the old methodologies used for cutting out a model, versus the 2021R2 methodology of using a rectangular cut.

  • The Conformal Cutout – The outer airbox follows the path of the traces which creates many rounded edges and vertices

 

  • The Bounding Box – Uses a polygon to cut through the model, but causes excessive dielectric regions which don’t match real-world operation
  • The Rectangular Cut – The outline is far away from the traces, and ensures the natural return path is preserved

Key takeaways from the simulation results

  • The smaller models didn’t solve faster
    • Conformal – 54mins; Bounding_Box – 52mins; Rectangular Cutout – 47mins
    • The total solution time for the adaptive convergence was fastest with the rectangular cutout. This largely has to do with the convergence taking only 9 passes rather than 10.
  • The smaller models had fewer initial/final mesh elements
    • This is true, but I wanted to show that a larger cut doesn’t mean that it will take longer to solve. As you can see, the rectangular cut produced the smallest RAM footprint. The rectangular cut allows the mesher to focus the mesh refinement on fields around the traces, and not on the edge boundary conditions.
    • Conformal – 79GB; Bounding_Box –66GB; Rectangular Cutout – 63.3GB
  • Take the guesswork out of the simulation
    • Making a large rectangular cutout allows users to avoid making cutouts too small, which may affect the return paths.
    • It reduces the need to clean up the model

Summary

  • Use the latest release, which is currently 2021R2
  • Use HFSS 3D Layout for planar designs like IC, packages, and boards
  • Use rectangular cutouts when cutting models down from their original size
  • Contact an Ansys AE if you need help with setting up any of our models!

And cut out the cutouts!

Also Read

Is Ansys Reviving the Collaborative Business Model in EDA?

A Practical Approach to Better Thermal Analysis for Chip and Package

Ansys CEO Ajei Gopal’s Keynote on 3D-IC at Samsung SAFE Forum


More Than Moore and Charting the Path Beyond 3nm
by Kalar Rajendiran on 12-22-2021 at 10:00 am


The incredible growth that the semiconductor industry has enjoyed over the last several decades is attributed to Moore’s Law. While no one argues that point, there is also industry-wide acknowledgment that Moore’s Law started slowing down around the 7nm process node. While die-size reductions still scale, performance jumps and power reductions aren’t scaling as they used to. At the same time, die sizes for designs have been increasing at an unsustainable rate, reaching close to the current reticle size limit. This has introduced a myriad of issues to tackle. The industry as a whole has been working on various ways to overcome the hurdles, and a lot has been written about the various solutions being pursued to address specific aspects.

Something that I haven’t often come across is a treatise on the Moore’s Law era and what is needed for the next era. One such presentation was made at the recently concluded DAC 2021. The talk was given by Michael Jackson, Ph.D., Corporate VP of Research and Development at Cadence. The semiconductor market couldn’t have developed to even a fraction of its size today without the electronic design automation (EDA) industry. Michael takes us through his view of how EDA enabled Moore’s Law and the changes happening to EDA as driven by AI/ML. The last part of his presentation covers the integration changes needed to drive the continued growth of the semiconductor industry. The following is a synthesis of what I garnered from his talk titled “More Than Moore and Charting the Path Beyond 3nm.” You can listen to Michael’s entire talk from the TechTalks track of the DAC 2021 virtual sessions.

Three ways EDA has fundamentally enabled Moore’s Law

Process technology advances are an obvious enabler of Moore’s Law, as they are intrinsic to it. Another, not so intrinsic but nonetheless fundamental, enabler of Moore’s Law is EDA. The three ways EDA has enabled Moore’s Law are design methodology, EDA tool turnaround time (TAT) and process technology enablement.

Design Methodology

EDA has advanced from polygon pushing to transistor-level to cell-based to IP re-use design methodology development and support. At every step of this progress, EDA has delivered an average 10x productivity boost.

EDA Tool TAT

If tool run time could be reduced in half (say from 8 hours to 4 hours), that translates to a huge benefit for a designer. Since the early 2000s, the EDA industry began focusing more on such core values and less on features for features’ sake. Inspired by Moore’s Law, tool TAT improvement became a major focus for each release of tools within the EDA industry.

Michael shares examples of systematic runtime performance improvements release over release: synthesis products delivering runtime improvements of 1.5x with every release, as measured statistically over suite runs rather than over just a few select designs; emulation capacity increasing more than 10,000-fold over the last 30 years; and the Spectre® X simulator’s 10x speed improvement over Spectre APS while maintaining Spectre golden accuracy standards.

Process Technology Enablement

Process technology advances impact EDA tools with hard and soft requirements that must be addressed. Hard requirements are changes at each process technology node that must be handled by EDA tools. Examples of hard requirements are changes such as double-patterning, special via support, DRC rule enablement and extraction enablement. Place-and-route tools are very highly dependent on process technology driven hard requirements. At the other end of the spectrum are RTL simulation tools, with very low dependency on process technology. And then there are soft requirements, such as accuracy improvements to enable improved analysis and optimization at each process node. Low-voltage accuracy and aging analysis are examples of soft requirements that are process node dependent.

ML-enabled EDA is the next Big Thing

EDA is full of NP-hard/NP-complete problems that are non-trivial to solve and require exponential run times. Because of this, overdesign and margin inefficiencies are traditionally built into designs to save on run times. Machine learning’s robust, rapid pattern-matching framework can reduce overdesign and margin inefficiencies.

ML-Based EDA can

  • help change design methodology as well as help improve run times of EDA tools
  • improve PPA results

Cadence’s ML-enabled EDA tools and capabilities span a wide spectrum of functional areas. Refer to Figure below.

Cadence Cerebrus™ digital implementation full flow, for example, delivers PPA and runtime improvements and frees engineering resources to work on more designs. This has been covered in detail in an earlier post. Michael provides a number of examples of improvements achieved through ML-enabled EDA.

Solution requirements needed to support the More than Moore Era

The slowdown of Moore’s Law has accelerated the growth of complex system designs, leading to heterogeneous system integration. This era is termed the More than Moore era. Just as the Moore’s Law era was enabled by EDA, so will the More than Moore era be, in the form of 3D design methodology, 3D EDA tool TAT improvements, and 3D process technology enablement. Today’s complex systems call for integrating digital, analog, RF, sensors, passives and fluidics in 3D ICs, and on PCBs.

Cadence has been investing in multi-chip(let) packaging for a long time. When dealing with 3D-IC requirements, already complex and time-consuming tasks may take on an even larger scale. For example, consider static timing analysis (STA) and the number of corners for signoff. When going from a single-die implementation to a chiplet implementation, the number of corners for signoff could increase 10x-100x depending on the design. Cadence’s Rapid, Automated Inter-Die (RAID) analysis significantly reduces STA corner data and TAT. Cadence has also developed and incorporated other capabilities into Tempus to enhance efficiency for 3D-ICs. In order to avoid costly overdesign of the individual dies and packages that make up a 3D-IC, a fully integrated platform is needed: a platform that integrates die implementation, package design, power, thermal and timing analysis, and DRC/LVS checks, all operating on a multi-technology common database.

Also Read

Topics for Innovation in Verification

Learning-Based Power Modeling. Innovation in Verification

Battery Sipping HiFi DSP Offers Always-On Sensor Fusion


DAC 2021 – Siemens EDA talks about using the Cloud
by Daniel Payne on 12-21-2021 at 10:00 am


My third event at DAC on Monday was all about using EDA tools in the Cloud, and so I listened to Craig Johnson, VP EDA Cloud Solutions, Siemens EDA. Early in the day I heard from Joe Sawicki, Siemens EDA, on the topic of Digitalization.


Why even use the Cloud for EDA? That’s a fair question to ask, and Craig had several high-level answers:

  • Increased throughput
  • Higher capacity and availability
  • VMs tailored to specific workloads for maximum compute efficiency
  • Enables multi-party collaboration
  • Provides a global scale and consistency
  • More services are available
  • Better testing and optimization of tools and flows

Siemens EDA supports the three major cloud vendors: AWS, Azure and Google. Mr. Johnson shared that engineering teams come up to speed with cloud-based tool flows through a process of: starting with deployment planning resources, reading technical papers, watching presentations, finding application notes, making their own checklists, creating deployment guides, receiving AE assistance, using templates, deploying EDA tools, and re-using cloud-specific scripts.

Several specific EDA tools were mentioned from Siemens EDA, like:

Calibre nmDRC

Design groups can use cloud-based tools in a self-managed environment, or have Siemens manage the environment for them. Craig showed that there are four ways to use cloud-based EDA tools: a managed cloud from Siemens, cloud-connected as an extension to on-premise compute, cloud-native for full or partial tool flows, and finally, Velocity cloud, which is using the Veloce emulator in the cloud.

Cloud Offerings

With the managed cloud offering from Siemens, they will configure all of your software tools in the cloud, provide CAD support, and share reference designs to get you started most quickly. This approach keeps your engineering headcount lower, by using the cloud as a service. Data traceability is included, so you’ll always know who uses a particular tool and which designs they have run through each tool. VPN technology gives your engineers a remote desktop to run each of the EDA tools in the managed cloud.

For a cloud-connected tool flow, you start with on-premise compute, then add cloud services as needed, depending on the workloads. Peak loads can be handled in the connected cloud.

The cloud-native flow has all of your EDA tools in the cloud, along with all design data, tool results, log files, PDK files, semiconductor IP, tests, etc. One application is for IC companies to showcase their new chips by providing a virtual evaluation board, in the cloud, instead of manufacturing a board and then shipping it out for evaluations. Engineers could evaluate the new chip as mounted on a virtual evaluation board, apply stimulus, make measurements, even run their own firmware or software.

Buying a hardware emulator is expensive, so offering an emulator in the cloud makes a lot of economic sense if your team just needs to run some software on a new SoC before silicon is ready. Emulation as a service is an emerging market and can be quite attractive for first-time emulation users.

In summary, doing EDA in the cloud makes sense because of the speed benefits, and with the cloud you can do simulation, verification, virtual boards and even emulation. Most of these tasks are not as feasible with on-premise infrastructure.
