
DAC 2021 – Cliosoft Overview

by Daniel Payne on 12-30-2021 at 6:00 am


It’s been a while since I really looked at what Cliosoft has to offer in the EDA tool space, so at the 58th DAC I stopped by their exhibit booth on Tuesday to visit with Karim Khalfan, VP of Application Engineering, and Simon Rance, VP of Marketing. Their booth listed all of the hot market segments: Automotive, 5G, IoT, AI, Foundries.

Simon Rance, Karim Khalfan

History

Cliosoft was founded in 1997 and has grown into a worldwide organization with 350+ customers, providing IC design and data management for semiconductor design companies. Some of their tier-one enterprise customers include: ON Semi, Cadence, TSMC, Marvell, MediaTek. Other notable clients that design high-tech electronic products are: Google, Qualcomm, Microsoft, AMD, Boeing.

Since EDA users mix and match tools from multiple vendors, Cliosoft participates in all of the well-known partner programs: Siemens EDA, Synopsys, Cadence, MathWorks, Keysight, Silvaco.

EDA Products

There are three major EDA products from Cliosoft that will interest IC design teams:

  • SOS – design and data management (file, text and binary data); runs in the background or with interactive and batch tool use
  • HUB – IP management and re-use system
  • VDD – schematic and layout comparison tools (cosmetic changes vs netlist changes)

The SOS product is used in several areas: helping your team manage design data, providing revision control for all IP blocks in use, and performing releases of an IP or an entire IC design. With SOS, multiple engineers can collaborate on a project, sharing data safely across geographies, with clear terms for hand-off – like between schematic and layout designers. Everyone on the team knows when any IP has changed. Architects can quickly re-use IP blocks, confident they have the most recent version, and can review the history of IP versions at any time.

IC designs contain many files in both text and binary formats, so with SOS you save all of your design data in repositories, which can be either centralized or distributed. Enterprise design centers tend to use the distributed approach, with references back to the golden source.

The SOS storage approach saves disk space: instead of making physical copies every time a user creates a new project – which duplicates storage – SOS places symbolic links in the work area, creating a much smaller footprint than a physical copy. This also makes creating a new work area faster. The approach is fairly unique, in use since 2001, so it is well proven over the past two decades. SOS also supports the local work area model, but about 95% of customers use symbolic links instead.
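The space savings from a symbolic-link work area can be sketched in a few lines. This is a rough illustration of the general idea, not Cliosoft's actual implementation; the file names and layout are hypothetical:

```python
import os
import tempfile

def create_symlink_workspace(repo_dir: str, work_dir: str) -> int:
    """Populate a work area with symbolic links into a managed
    repository instead of physical copies; returns the link count."""
    count = 0
    for root, _dirs, files in os.walk(repo_dir):
        rel = os.path.relpath(root, repo_dir)
        target_dir = os.path.normpath(os.path.join(work_dir, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            os.symlink(os.path.join(root, name),
                       os.path.join(target_dir, name))
            count += 1
    return count

# Demo with throwaway directories standing in for a design repository
repo = tempfile.mkdtemp(prefix="repo_")
work = tempfile.mkdtemp(prefix="work_")
with open(os.path.join(repo, "top.sch"), "w") as f:
    f.write("x" * 10_000)  # pretend schematic data
links = create_symlink_workspace(repo, work)
# The work area now holds tiny symlink inodes, not 10 kB file copies
print(links, os.path.islink(os.path.join(work, "top.sch")))  # 1 True
```

Each new work area costs only a directory tree plus symlink inodes, which is why creating one is faster than copying the repository.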

In the SOS architecture there’s a main repository for your IC design, along with remote caching servers, so remote geographies don’t have to wait a long time to see the latest updates.

For projects using the OpenAccess database, there are multiple views for each cell:
  • Schematic view
  • Layout view
  • Extracted view
  • Symbol view

All of these cell views get packaged together and are always in sync, so the engineers operate at an abstract level with packages.

The HUB product is used to quickly reference and re-use IP blocks across multiple projects inside a company. With HUB your management knows exactly where every IP block is being used, and it tracks 3rd-party IP along with internally re-used IP.

Beyond simple check-in and check-out of IP cells, team members also use tagging to move a design block from the logic design group to the layout group. You can even use your existing bug-tracking tools, like: Jira, Trac, Bugzilla and FusionForge.

SOS and HUB tools can both be used on AWS, Google, and the Azure cloud platforms. Each team decides how much cloud and on-premise work is done, and the hybrid approach is a popular design management trend. Cliosoft is also collaborating with Microsoft in the newly announced Rapid Assured Microelectronics Prototypes (RAMP) program, where the design data and IP management flows work in Microsoft Azure.

The final tool is called VDD – Visual Design Diff, and it compares schematics or layouts, flat or hierarchical, highlighting all of the subtle changes. This capability is useful for tracking the progress of schematics and layout, so management can determine the percentage of cells completed and the percentage remaining. The VDD tool is built into the Cadence environment, so a user can quickly tell how many cell views are finished. Anyone on the team can look at the cell labels to understand the progress.

Cadence Virtuoso users never really have to look at SOS or use command-line options; they just use the familiar Library Manager. The entire Cadence IP group uses Cliosoft for data management. Likewise, users of Synopsys Custom Compiler get SOS as a built-in set of features.

To protect your IP from being moved to the wrong geography, a team sets up access control restrictions; for example, a mil-aerospace contractor could require security clearance just to view an IP.

HUB works with many data management systems: git, Subversion, SOS, Perforce and NAS/SAN (store IPs anywhere). To find the right IP, you just use a catalog with search and comparison features, which even shows differences between similar IPs, all inside a web-based GUI. You can also find out how many designs a specific IP has been used on, in order to lower your risk.

There’s hierarchical visibility, so you know who is using each IP block, and where inside the hierarchy it is placed. Users can see a bill of materials for all sub-blocks inside of an IP.  You can even track all documentation per IP block.

With Cliosoft tools there are plenty of 3rd party software integrations, as the HUB tool connects to Design Management, Bug Tracking, Documents (Confluence, Google Docs, Dropbox, Box), EDA tools (Virtuoso, PLM tools, in-house tools), and Issue tracking (Jira, Bugzilla).

Users have quick access to an IP catalog for reuse, while management has oversight on how IP is being used, and the tools provide IP traceability so IP audits can be performed. Your company gets to define the process by which a new IP block can be used, for example by first requiring that there’s legal approval, management approval, and a signed license agreement.

Summary

My impression is that after two decades serving the IC design and data management market, Cliosoft has figured out how to integrate its tool features into your existing IC design flow, providing revision control, design release and derivative management. Their customer list looks impressive, so give them a call, or contact Bob Slee at EDA Direct to learn more.

Daniel Payne, Bob Slee

Related Blogs


Heterogeneous Integration – A Cost Analysis

by Tom Dillinger on 12-29-2021 at 10:00 am


Heterogeneous integration (HI) is a general term used to represent the diverse possibilities for die technology incorporated into advanced 2.5D/3D packaging.  At the recent International Electron Devices Meeting (IEDM) in San Francisco, a team from Synopsys and IC Knowledge presented data from analyses of future potential HI implementations.[1]

This article briefly summarizes the highlights of their paper, with an emphasis on the rather startling HI cost analysis.

HI Interconnects

The nomenclature for advanced HI packaging is illustrated in the figure below.

A complex HI package could incorporate:

  • 3D (thinned die) high-bandwidth memory DRAM stacks
  • 3D stacked die
  • a 2.5D interposer, with redistribution layers (RDL) for signal interconnects between die and the package substrate
  • a hierarchy of attach technologies:
    • C4 bumps (~110-150um pitch)
    • microbumps for die-to-interposer attach (~40-55um pitch)
    • hybrid bonded (bumpless) attach, for 3D stacked die, in either a face-to-face or face-to-back orientation
    • through silicon vias (TSVs) in the interposer between the bumps and RDL layers
    • micro-TSVs through the silicon in the 3D die stack (~10um pitch)

There is also the potential to replace the silicon interposer with smaller silicon “bridges” between die edges in the 2.5D configuration, maintaining the high interconnect density while reducing cost (not shown in the figure above).  The tradeoff with bridges embedded in an organic substrate, versus an interposer, is that the redistribution interconnect density is considerably reduced.

HI Interconnect Electrical Analysis

A key requirement of any heterogeneous integration system is the available bus bandwidth for data communication between die.

An electrical design consideration is whether the interconnect characteristics between die (on the interposer or bridge) will support wide parallel bus signaling at lower clock rates to achieve the requisite throughput, or whether a more sophisticated (and more power-hungry) high-speed serial interface design is required.

A physical and electrical analysis of the interconnects includes estimates for:

  • interconnect density
  • package wire length
  • signal latency from Tx-to-Rx
  • losses (signal fidelity at the receiver circuitry)
  • bit error rate (past the receiver)
  • power/bit

The interconnect density of interposer (or bridge) RDL wires has led to the development of parallel bus electrical standards for die-to-die communications in advanced 2.5D packages, such as AIB [2] and OpenHBI [3].

Synopsys commented that the circuit challenges for the PHY IP for a parallel HBI interface (@ 4Gbps) are “far less demanding” than for a SerDes operating at a much higher data rate.  This interface is optimal for interconnect lengths on the order of ~5mm.  The table below from the presentation highlights the serial versus parallel interface tradeoffs.

For die-to-die interconnect lengths afforded by 3D hybrid bonding (~1um), direct buffered signaling is viable – no PHY required.
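The wire-rich interposer RDL makes the lane-count tradeoff concrete. The 4Gbps parallel lane rate comes from the presentation; the 112Gbps SerDes lane rate and the aggregate bandwidth target below are my own assumptions, chosen only to illustrate why dense 2.5D wiring favors wide, slow parallel buses:

```python
import math

def lanes_needed(total_gbps: float, per_lane_gbps: float) -> int:
    """Signal lanes required for an aggregate die-to-die bandwidth."""
    return math.ceil(total_gbps / per_lane_gbps)

target_gbps = 1024                               # illustrative one-direction aggregate
parallel_lanes = lanes_needed(target_gbps, 4)    # HBI-style lane @ 4Gbps
serial_lanes = lanes_needed(target_gbps, 112)    # assumed 112G SerDes lane
print(parallel_lanes, serial_lanes)              # 256 10
```

The parallel option needs ~25x the wires, which only the interposer (or bridge) RDL density makes affordable; in return, each lane's driver circuitry is far simpler and lower power than a SerDes PHY.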

PDN for Heterogeneous Integration

Another design consideration is how to provide the global power distribution network (PDN) to the HI configuration.  The figure below illustrates a unique 2.5D die plus HBM topology proposed by the Synopsys team, where the PDN is fabricated directly on the interposer.

The interposer with PDN is hybrid-bonded to the backside of an ultra-thinned die, with “nano-TSVs” at 3um pitch connecting to buried power rails (BPRs) embedded locally with the logic circuitry.  A silicon lid “carrier” is bonded to the top side of the die to support the die ultra-thinning process.  This configuration offers simplified PDN processing, improved I*R drop on the VDD/GND supplies, and frees up BEOL routing tracks on the die for improved circuit density.

(Foundries are also working on single-die backside PDN fabrication process capability.  This proposal leverages the presence of the 2.5D interposer for HI configurations.)

HI Cost Analysis

An enlightening part of the Synopsys presentation related to an analysis of the relative costs of a monolithic versus disaggregated HI implementation.  The team worked with IC Knowledge, LLC on the financial forecast models.[4]  (Note that the configuration below uses 2nm process technology estimates.)

The parameters used for this comparative analysis were:

SoC:  2nm process node, gate-all-around devices, 17-layer metal (17LM), 600mm² die size, with 65% logic, 20% L3 SRAM, 10% I/O

HI implementation:  Core die in the original 17-layer metal 2nm process, L3 SRAM die in a 4-layer metal 2nm process hybrid bonded to the base die, separate I/O die in a 7-layer metal 90nm process on a 2.5D interposer

The figure below illustrates the results of the analysis – a 48% cost reduction!

The cost benefits accrue from:

  • higher die yields
  • no need for 17LM fabrication for the non-logic functions
  • 4LM in the 2nm process for the L3
  • 7LM in a 90nm process for the I/O

These cost reductions more than compensate for the additional expense related to:

  • die sort
  • 6LM silicon interposer with TSVs
  • HI assembly, test
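The yield leverage behind these savings can be sketched with a simple Poisson die-yield model. All of the numbers below (defect density, wafer cost, die counts) are hypothetical placeholders, not the paper's data; the point is only that splitting one large die into smaller sorted die raises yield per die and lowers cost per good die:

```python
import math

def poisson_yield(defects_per_cm2: float, area_cm2: float) -> float:
    """Classic Poisson die-yield model: Y = exp(-D * A)."""
    return math.exp(-defects_per_cm2 * area_cm2)

def cost_per_good_die(wafer_cost: float, dies_per_wafer: int,
                      yield_frac: float) -> float:
    return wafer_cost / (dies_per_wafer * yield_frac)

D = 0.1          # defects/cm^2 -- hypothetical for a leading-edge node
WAFER = 30_000   # $/wafer -- hypothetical

# Monolithic: one 6 cm^2 die per site
mono = cost_per_good_die(WAFER, 80, poisson_yield(D, 6.0))
# Disaggregated: two 3 cm^2 die, each sorted good before assembly
disagg = 2 * cost_per_good_die(WAFER, 160, poisson_yield(D, 3.0))

print(f"{mono:.0f} {disagg:.0f} ratio={disagg/mono:.2f}")  # 683 506 ratio=0.74
```

Even before accounting for moving the SRAM and I/O to cheaper BEOL stacks and an older node, the exponential yield model rewards smaller die; the paper's 48% figure layers those process savings on top, minus sort, interposer and assembly costs.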

Summary

Advanced packaging technology has enabled heterogeneous integration of disaggregated functionality where different process technologies (and BEOL options) are available for the individual die.  The analysis by Synopsys and IC Knowledge indicates the cost advantages of a 3D + 2.5D HI configuration can be substantial.

Additionally, this packaging technology offers tradeoffs in the choice of serial versus parallel bus implementations.  For the high interconnect density and short length of 2.5D signaling, wide parallel buses offer the requisite data bandwidth with simpler circuitry and lower pJ/bit power dissipation.

The Synopsys IEDM presentation also illustrated an alternative for the PDN, utilizing the interposer with ultra-thin die and nano-TSV connections.

-chipguy

References

[1]  Lin, X.-W., et al., “Heterogeneous Integration Enabled by State-of-the-Art 3DIC and CMOS Technologies:  Design, Cost, and Modeling”, IEDM 2021, paper 3.4.

[2]  https://github.com/chipsalliance/AIB-specification

[3]  https://www.synopsys.com/designware-ip/technical-bulletin/openhbi-die-to-die.html

[4]  https://www.icknowledge.com/

All images in this article are copyrighted by the IEEE.

Also Read:

Delivering Systemic Innovation to Power the Era of SysMoore

Creative Applications of Formal at Intel

Synopsys Expands into Silicon Lifecycle Management


2D NoC Based FPGAs Valuable for SmartNIC Implementation

by Tom Simon on 12-29-2021 at 6:00 am


Smart network interface cards (SmartNICs) have proven themselves valuable in improving network efficiency. According to Scott Schweitzer, senior product manager at Achronix, it has been shown that SmartNICs can relieve up to – and perhaps beyond – 30% of the host processor’s loading. SmartNICs started out taking on simple functions to supplement the host processor. With advances in SmartNIC design and architecture they have taken on much more complex roles and provide a high degree of flexibility with their re-programmability. I recently watched an on-demand webinar replay from Achronix where Scott talked about five important aspects of SmartNICs. The webinar is titled “5 Reasons Why a High Performance Reconfigurable NIC Demands a 2D NoC”.


According to Scott there are three fundamental architectures for SmartNIC design: bump-in-a-wire, Von Neumann sidecar and single chip. All but the single-chip approach require multiple chips with chip-to-chip interfaces that create bottlenecks. With 100GbE and above, packet rates are staggering, reaching 2,400 Mpps on dual-port 400G. Each packet is typically touched multiple times while transiting the NIC, so the slower PCIe transfers within multi-chip SmartNICs hinder throughput. FPGAs are attractive for SmartNIC operations because they can be reconfigured for different workloads depending on the application. All of this points to single-chip, FPGA-based solutions dominating the market.

SmartNICs need high internal bandwidth to handle the increasing external bandwidth they are seeing. Some estimates suggest that internal data movement in a SmartNIC needs to be 10x the external rate in order to smoothly handle the functions they are asked to perform. The 2D network on chip (NoC) used by the Achronix Speedster7t has two vertical NoC lanes for each Ethernet controller. These lanes each operate at 512Gbps, servicing an Rx/Tx pair (400Gbps each).

For receive, network traffic moves easily through the onboard Ethernet SerDes and PCS/MAC layer onto a 2D NoC column.  In the FPGA fabric, an Rx engine processes the packets and forwards them along a horizontal NoC row to the matching engine. After this, packets move via the NoC to a DMA engine for conversion to PCIe buffers. After moving through another vertical NoC column, the packets reach the PCIe controller and SerDes.

Virtualization and software-defined overlay networks add complexity to the Rx/Tx and matching engines, and block sizes can be larger in these environments. With all this comes increased on-chip traffic. While the overlay network may appear less complex, data movement on the underlay network can become quite complex. Physical SmartNICs will see heavier loads and more throughput as a result.

Scott talks about the reasons that security, filtering, encryption and key management make single chip SmartNICs more attractive. Each of these activities is necessary and growing more challenging in networks today. For instance, filtering in the matching engine requires deep packet inspection, tagging, rewriting packet headers and unwrapping & wrapping packets, etc. At the same time the SmartNIC needs to offer full support for key management and encryption/decryption for VPN tunnel termination.

Scott also touches on the changes coming with CXL and NVMe during the webinar. He also makes the case that the continuing move to higher bandwidth network interfaces and changes in applications, such as VMs will call for higher throughput and flexibility. All of the above factors play an important role in driving the preferred architecture and the specific implementation for SmartNICs. Achronix’s use of a 2D NoC with their programmable FPGA fabric offers impressive data handling capabilities to meet these needs.

Their 2D NoC offers 20 Tbps aggregate on-chip bandwidth. Each vertical and horizontal bus handles 512 Gbps in a matrix that covers the FPGA fabric. There are numerous Network Access Points (NAPs) for on and off-loading data to the NoC. Scott points out that if each packet moves through 4 processing blocks, 3.2 Tbps would be needed with 4 x 400 GbE. Scalability and future proofing could call for 10x that.
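These figures can be sanity-checked with the 10x internal-to-external estimate mentioned earlier. The 10x factor, the port configuration and the 20 Tbps aggregate come from the webinar; the arithmetic itself is just my illustration:

```python
def internal_bw_tbps(ports: int, port_gbps: float,
                     movement_factor: float) -> float:
    """Rough internal bandwidth need: external line rate times a
    data-movement multiplier (the 10x rule of thumb cited above)."""
    return ports * port_gbps * movement_factor / 1000.0

need = internal_bw_tbps(4, 400, 10.0)   # 4 x 400GbE front end
noc_aggregate = 20.0                    # Tbps, Speedster7t 2D NoC
print(need, need <= noc_aggregate)      # 16.0 True
```

So a fully loaded 4 x 400GbE card with 10x internal data movement lands at 16 Tbps, inside the 20 Tbps the 2D NoC provides, which is the headroom argument Scott makes.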

This webinar offers a stark view of the needs of SmartNICs today and in the future. Historically they might have started off as handy assistants to offload simple operations from host CPUs. It is clear that SmartNICs are becoming more and more the center of gravity for complex network applications. The full webinar is available for viewing on the Achronix website.


Methodology for Aging-Aware Static Timing Analysis

by Tom Dillinger on 12-28-2021 at 10:00 am


At the recent Design Automation Conference, Cadence presented their methodology for incorporating performance degradation measures due to device aging into a static timing analysis flow. [1] (The work was a collaborative project with Samsung Electronics.)  This article reviews the highlights of their presentation.

Background

Designers need to be cognizant of the mechanisms that contribute to degradation over the operational lifetime of a part, to ensure the overall product requirements are satisfied (e.g., FIT rate).  There are both failure and degradation mechanisms to address.

Failure criteria are an absolute consideration, while degradation (or “aging”) processes may result in a hard fail or have an adverse impact on circuit performance.  The methodology for analyzing an aging mechanism involves an engineering assessment of the expected temperature and voltage environment, plus the switching activity likely to be applied during the part’s lifetime.

Failure Mechanisms

There is little latitude associated with adding ESD protection and latch-up suppression circuitry to avoid the related failures.

Time-dependent dielectric breakdown (TDDB) is an aging factor due to the “wearout” of the gate oxide dielectric.  The mechanism associated with TDDB is a thermo-chemical reaction, where (weak) chemical bonds in the dielectric are broken after extended exposure to the gate electric field.  The common model for TDDB is thus strongly dependent upon temperature and applied gate voltage, and may support a “soft” (resistive) followed by a “hard” breakdown current path through the gate dielectric.

The peak current density in interconnects and vias is an immediate failure process.   The resistance change due to electromigration is an aging process, also strongly dependent upon temperature.  (Parenthetically, some methodologies view jRMS-related electromigration wearout analysis as indicative of a hard fail, whereas other methodologies approach the PDN and/or signal interconnect resistance increase as a performance-related impact.)

Degradation Mechanisms

There are two principal device degradation aging mechanisms designers need to analyze, in terms of the potential performance impact – i.e., hot carrier injection (HCI) and bias temperature instability (BTI).  These are not direct fail processes, in that they result in changes in device drive currents and threshold voltages, but not an immediate failure of the circuit to operate.  They relate to the presence of carrier “trap states” at the channel interface and in the gate dielectric stack.  Channel carriers may cross the potential barrier at the interface (at high electric fields) and fill the traps.  The result is a change in the effective electric field at the channel from an applied gate voltage.

  • HCI

HCI is commonly associated with a device operating in the saturation region – also, commonly referred to as “pinchoff” at the drain node.  Carriers accelerated through the pinchoff depletion region are subjected to the gate-to-drain electric field.  These carriers may originate from the channel current and/or from secondary carriers due to impact ionization.  These energetic carriers may undergo a collision resulting in a vertical velocity vector, and may then trap in the dielectric stack near the drain.  Hot carriers may also break chemical bonds in the dielectric stack, resulting in the generation of additional traps.

The result is a localized reduction in the gate-to-drain electric field, as part of the electric field now terminates on the trapped charge.  This is typically modeled as a reduction in the effective channel carrier mobility.  (Note that if the device is used as a bidirectional pass gate, this drain node now becomes the source – a model that alters the threshold voltage rather than the carrier mobility may be more appropriate.)

For logic circuits, devices are operating in the saturation region only during a brief interval of a signal transition.  (For analog/mixed-signal circuits, devices biased in saturation are subjected to greater HCI exposure.)  As a result, logic performance degradation is commonly associated with BTI.

  • BTI

The bias temperature instability mechanism is present when the device is operating in the linear region.  This occurs when a logic device is “on” and has completed a signal transition.

Channel carriers enter the dielectric stack and fill trap states.  BTI manifests as an adverse shift in the device threshold voltage – i.e., an increase in the absolute value of Vt for both nMOS and pMOS devices.  Negative BTI (NBTI) refers to the pMOS device Vt shift, due to the negative gate-to-channel electric field direction; pBTI refers to the nMOS device Vt shift.

The delta in the threshold voltage eventually saturates over time as trap states are filled.  Note that BTI models also include a (partial) recovery in the Vt shift for the time period when the device gate-to-channel electric field is reversed, as depicted below. [2]
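A common empirical form for the BTI shift is a saturating power law, dVt ≈ A·t^n, with partial recovery once the gate stress is removed. The coefficients below are hypothetical, chosen only to illustrate the shape of the model, not foundry values:

```python
def bti_delta_vt(stress_time_s: float, a: float = 3e-3,
                 n: float = 0.2) -> float:
    """Empirical power-law BTI threshold shift, dVt = A * t^n (volts).
    Coefficients are illustrative, not foundry model values."""
    return a * stress_time_s ** n

def after_recovery(dvt: float, recovery_fraction: float = 0.3) -> float:
    """Partial relaxation of the shift once gate stress is removed."""
    return dvt * (1.0 - recovery_fraction)

ten_years_s = 10 * 365 * 24 * 3600
shift = bti_delta_vt(ten_years_s)
print(round(shift, 3), round(after_recovery(shift), 3))
```

The sub-linear exponent captures the saturation as trap states fill, and the recovery term is why duty cycle (forward versus reversed gate bias) matters so much in the STA flow described next.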

As the BTI mechanism is present whenever a logic gate is quiescent, the Vt shift contributes to significant performance degradation over a part’s lifetime.

Static Timing Analysis Methodology with Aging

The simplest method of modeling aging effects would be to apply a multiplicative “derate” to the target cycle time.  In short, the “fresh” cycle time used during design timing closure would be multiplied by a conservative aging factor and the part released with the reduced frequency spec – i.e., a “guardband” approach.

Alternatively, a more sophisticated method would be to apply a cell instance-specific delay calculation for aging to an STA flow.  The individual cell delay arcs would reflect a (voltage and temperature) environmental assumption over the circuit lifetime.  This method requires a cell library characterization strategy that expands upon the traditional model of:

delay_arc = f( PVT, input_slew, output_load)

to include new dimensions, reflecting the aging delay value.  The figure below depicts the Cadence methodology for cell characterization and aging-aware STA.

The characterization strategy requires adding delay values for different combinations of Vt shifts due to BTI of individual devices.  Spice aging models are provided by the foundry.
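One way to picture the extended characterization model is a delay table with an added delta-Vt axis, which STA interpolates at each instance's computed shift. This is a minimal sketch of the idea, not Cadence's actual data model; the table values are made up:

```python
import bisect

class AgedDelayArc:
    """Delay arc at a fixed PVT/slew/load corner, extended with a
    delta-Vt axis.  A sketch of the concept only; values are made up."""

    def __init__(self, dvt_points, delays_ps):
        self.dvt = list(dvt_points)    # sorted BTI Vt-shift samples (V)
        self.delays = list(delays_ps)  # characterized delays (ps)

    def lookup(self, dvt: float) -> float:
        """Linear interpolation at the instance's computed Vt shift."""
        if dvt <= self.dvt[0]:
            return self.delays[0]
        if dvt >= self.dvt[-1]:
            return self.delays[-1]
        i = bisect.bisect_left(self.dvt, dvt)
        x0, x1 = self.dvt[i - 1], self.dvt[i]
        y0, y1 = self.delays[i - 1], self.delays[i]
        return y0 + (y1 - y0) * (dvt - x0) / (x1 - x0)

arc = AgedDelayArc([0.0, 0.03, 0.06], [12.0, 13.1, 14.5])
print(arc.lookup(0.0), arc.lookup(0.045))  # fresh vs. aged delay
```

In a real library, each delay arc would carry this extra dimension per device (or per arc), characterized with the foundry's Spice aging models at a handful of Vt-shift points.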

The static timing analysis flow is depicted on the right side of the figure above.  An additional input to the aging-aware STA flow is a description of the (piece-wise) expected voltage and temperature conditions which individual blocks will experience over the part lifetime.  The methodology for calculating the duration for which each device is subjected to forward and recovery BTI stress is based on signal probability measures, as illustrated in the figure below.

As an example, for the 2-input NAND gate in the figure, if pin A has a (0,1) probability of (0.44,0.56), and pin B has a (0.6,0.4) probability, the gate output will have a (0.224,0.776) probability to apply to its fanout, derived from the calculation (0.56*0.4, 1 – 0.56*0.4).
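The signal-probability calculation for the NAND example can be reproduced directly; the only modeling assumption is that the inputs switch independently:

```python
def nand_output_prob(p_ones):
    """Static (0,1) probability of a NAND output, given each input's
    probability of being 1; inputs are assumed independent."""
    p_all_one = 1.0
    for p in p_ones:
        p_all_one *= p          # NAND output is 0 only when all inputs are 1
    return p_all_one, 1.0 - p_all_one   # (P(out=0), P(out=1))

# The article's example: A is 1 with p=0.56, B is 1 with p=0.4
p0, p1 = nand_output_prob([0.56, 0.4])
print(round(p0, 3), round(p1, 3))  # 0.224 0.776
```

Propagating these probabilities gate by gate through the netlist yields the per-device stress/recovery durations the aging calculation needs, without running workload simulations.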

An alternative approach would be to apply signal value duty cycles from extensive (gate-level) workload simulations.  The probabilistic approach is simpler, yet it may not reflect extended periods of operation in a specific quiescent state.

To illustrate the flow, Cadence collaborated with Samsung on a 5nm process node design example.  Using the Samsung aging model design kit for library cell characterization, STA was pursued for a core-level design.  Then, 500 paths were selected for a detailed Spice-based aging delay simulation.  The STA versus Spice comparison data is shown below.

Summary

Designers need to evaluate performance degradation effects due to BTI stress over a part’s lifetime.  Using a uniform guardband multiplier could be quite inaccurate, as it would not be representative of the varying stress/recovery characteristics of (instance-specific) circuit activity.

For more information on the aging-aware STA flow from Cadence, please follow this link.

References

[1]  Amin, C., et al., “Aging-aware Static Timing Analysis”, DAC 2021.

[2]  https://www.cadence.com/en_US/home/tools/custom-ic-analog-rf-design/custom-ic-analog-rf-flows/legato-reliability-solution/advanced-aging.html

Also Read

Scalable Concolic Testing. Innovation in Verification

More Than Moore and Charting the Path Beyond 3nm

Topics for Innovation in Verification


Delivering Systemic Innovation to Power the Era of SysMoore

by Kalar Rajendiran on 12-28-2021 at 6:00 am


With the slowing down of Moore’s law, the industry as a whole has been working on various ways to maintain the rate of growth and advancement. A lot has been written about the various solutions being pursued to address specific aspects. The current era is referred to by different names, SysMoore being the one Synopsys uses. Synopsys chairman and co-CEO Aart de Geus coined the term as a shorthand way to describe the new era: one that blends Moore’s law driven advances with innovations that tackle systemic complexity. As per Synopsys’ website, “SysMoore is a descriptive term for state-of-the-art integrated circuit design, which combines the scale complexity of Moore’s law with the systemic complexity of hyper-convergent integration.”

Synopsys gave a presentation at DAC 2021 on the topic of delivering systemic innovation to power the era of SysMoore. The talk was given by Neeraj Kaul, VP of Engineering, Silicon Realization Group (SRG) at Synopsys. He starts by looking back at the Moore’s Law era and spends the rest of his presentation on the SysMoore era, highlighting its new complexities, the opportunities for new advances, and the new technologies Synopsys is bringing out for this era. The following is a synthesis of the salient points I gathered from his talk. You can listen to Neeraj’s entire talk in the TechTalks track of the DAC 2021 virtual sessions.

View of an Evolving Landscape

Transformation is happening at a much faster rate than we have seen in the past few decades. The amount of compute power currently available is tremendous. At the same time, the amount of data being sensed, processed and transferred – in petabytes, exabytes and zettabytes – is requiring us to re-examine our way of computing. The number of design starts is accelerating at a rapid rate. This places tremendous pressure on the industry and calls for new ways of handling the complexity requirements and time pressures of the markets.

There are a number of vertical markets in this evolving landscape; refer to the figure below. While the markets are vertical, they all have some things in common: the age-old performance, power and area (PPA) requirements, plus increased pressure on cost and turnaround time to results. Together, these five things are termed by the acronym PPAct. General-purpose chips cannot deliver to market/product expectations on PPAct metrics, and this pressure is pushing customers to design custom silicon. A custom silicon initiative allows customers to look at the entire system, all the way from software to silicon, and optimize through vertical integration.

As if the PPAct pressures were not enough, SysMoore applications introduce vertical-specific challenges into the mix. For example, mean time to failure, chip longevity, security, etc., become critically important when dealing with the data center, automotive and healthcare markets.

The Waning of Moore’s Law Era

Moore’s Law delivered well over several decades. We got accustomed to seeing 2x improvements on all three aspects of the PPA metric every two years or so. In the last few years, however, we have seen a flattening of the Moore curve: PPA improvement is becoming difficult to achieve simply by moving from the current process node to the next. As we enter the sub-7nm era, power and performance are not scaling at the rate Moore’s law had been delivering – only 15% to 30% improvement from node to node – while area scaling continues to deliver 2x. Power and performance are becoming bottlenecks, but the market demand for improvements in both remains. The industry and the market have entered the SysMoore era.

Synopsys’ Approach to Powering the SysMoore Era

The SysMoore era requires innovations in many different areas beyond moving from node to node. We need ways to deal with systemic complexities and continue to advance in the same way, and at the same rate, as in the past. The systemic complexities add to the explosive demand on engineering resources, compute power and turnaround time expectations. We need techniques that improve overall productivity, so that we don’t need 2x-3x the number of engineers to tackle SysMoore era designs and systems.

Synopsys has identified six vectors as complexity/efficiency roadmap drivers to power the SysMoore Era.

Enabling domain-specific architectures

Support for domain-specific architectures is key to achieving customers’ PPAct metrics as these architectures help maximize performance and minimize power for each application. Synopsys’ Platform Architect and RTL Architect products are used by designers and architects to customize and optimize their systems and chips. Neeraj shared a customer example where they used the RTL Architect product to explore a larger design space and choose the right RTL architecture. The customer was able to achieve 5X faster TAT and 300MHz frequency boost for their product.

Scaling Challenge

Traditional tools and flows require iterations, build in pessimistic margins and deliver sub-par PPA results. 1D, 2D and diagonal placement rules, along with context-based timing and power, are all crucial to consider in the early stages of a design. Synopsys’ Fusion platform is a hyperconverged system handling RTL to tapeout with an integrated common database. The flow is augmented with AI-driven Design-Space-Optimization (DSO) to achieve better results faster, and a comprehensive analytics platform completes the trifecta. This triple play of Fusion, DSO and analytics enables customers to quickly and accurately identify the root causes of issues, which in turn helps them resolve those issues rapidly.

A customer example that Neeraj presented showed an 11% power reduction on a high-performance GPU design with just one engineer working on it. In the past, achieving comparable results would have taken many engineers many months.

Robustness analysis for advanced-node variability

On-chip variation is a big issue as we move to finer and finer geometries. Synopsys PrimeShield analyzes a design’s robustness against on-chip variation, performing sensitivity analysis and fixing paths before they fail in silicon. The tool helps identify sensitive bottlenecks and improves resilience to IR drops, detecting voltage-slack paths and optimizing them before tapeout to improve post-silicon robustness. Voltage slack is a new metric that measures how resilient a design is to voltage variation. Neeraj shared a customer example where a 9% voltage slack improvement was achieved on a CPU core.

3DIC Compiler

Synopsys 3DIC Compiler enables efficient integration of systems-of-chips (aka chiplets) leveraging 2.5D/3D multi-die designs. It builds on the Fusion single data model and allows fast exploration and pathfinding to accelerate the design process. Auto die-to-die (D2D) routing, native DRC and DFT for design realization and validation are included. Together with signal integrity, power integrity, thermal and EMIR analysis, it helps designers arrive at the optimal PPA per sq. mm.

Summary

The fusion of tools over an integrated common database, the deployment of AI techniques to augment the tools and the provision of insightful analytics are key to powering the SysMoore era. Synopsys’s innovations are designed to address the PPAct, productivity, safety, security and resilience requirements of this era’s markets and applications.

Also Read:

Creative Applications of Formal at Intel

Synopsys Expands into Silicon Lifecycle Management

CDC for MBIST: Who Knew?


DAC 2021 – Taming Process Variability in Semiconductor IP

DAC 2021 – Taming Process Variability in Semiconductor IP
by Daniel Payne on 12-27-2021 at 10:00 am

process node variability min

Tuesday at DAC was actually my first time attending a technical session, and the presentation from Nebabie Kebebew of Siemens EDA was titled Mitigating Variability Challenges of IPs for Robust Designs. There were three presentations scheduled for that particular Designer, IP and Embedded Systems track, but with the COVID precautions, only one presenter was on site. The technical paper authors were from both STMicroelectronics and Siemens EDA.

ST designs both digital and mixed-signal IP for use in diverse applications like IoT, automotive and even AI, using process nodes from 90nm down to 18nm. For safety-critical products like automotive chips, the challenge is to reduce the failure rate, measured in Parts Per Million (PPM). High-sigma verification is needed to meet the stringent PPM goals, and with each new process node the designs get more complex, requiring even more simulation runs to verify across the many Process, Voltage and Temperature (PVT) corners.

Reaching the PPM goals would require high-sigma Monte Carlo circuit simulation with millions or billions of runs, which is not feasible because the run times are simply too long.
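To put rough numbers on why brute force fails, a 6-sigma target corresponds to a failure probability near 1e-9, so a brute-force Monte Carlo run would need on the order of 10^10 SPICE simulations just to observe a handful of failures. A back-of-the-envelope sketch (illustrative only, not figures from the paper):

```python
import math

def one_sided_fail_prob(sigma):
    """Probability mass beyond `sigma` standard deviations (one-sided Gaussian tail)."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

p6 = one_sided_fail_prob(6)   # roughly 1e-9
# To observe ~10 failures on average, brute force needs ~10/p runs.
runs_needed = 10 / p6         # on the order of 1e10 simulations
print(f"P(fail) at 6 sigma: {p6:.2e}, brute-force runs needed: {runs_needed:.1e}")
```

At even a second per SPICE run, 1e10 runs is centuries of compute, which is why a smarter sampling strategy is required.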

At ST they wanted to use one EDA tool flow to verify both standard cell libraries and memory IP across all PVT corners, using a smarter Monte Carlo simulation approach to achieve high-sigma verification in much less time.

Standard Cells

The challenge was that the standard cell libraries contained about 10,000 cells and required 100 PVT corners to be simulated. Another factor is that IC process variations are non-Gaussian at high sigma.

For example, the long-tail distribution of 894k samples on an internal node of a sequential cell is shown below:

Non-Gaussian distribution

The bars in green are from running just 1,000 samples; if we assume a Gaussian distribution, linear extrapolation predicts a node value of 723mV at 4.5 sigma. Using the full 894k samples, the actual worst node value is 680mV, about 50mV lower than the linear extrapolation. For sequential cells like a master-slave flip-flop you can neither use linear extrapolation to qualify high sigma, nor wait long enough for brute-force Monte Carlo results to complete.
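The effect is easy to reproduce with a toy model. Below, a hypothetical node voltage is mostly Gaussian, but a rare (1%) mode shifts it low; fitting a Gaussian to 1,000 samples and extrapolating to 4.5 sigma predicts a noticeably more optimistic worst case than a large brute-force run actually observes. The distribution and sample counts here are invented for illustration, not ST's data:

```python
import random
import statistics

random.seed(42)

def node_mv():
    # Mostly N(750mV, 5mV), with a rare 1% mode centered at 720mV (long lower tail)
    if random.random() < 0.01:
        return random.gauss(720, 10)
    return random.gauss(750, 5)

# Gaussian fit plus linear extrapolation from a small sample
small = [node_mv() for _ in range(1_000)]
mu, sd = statistics.fmean(small), statistics.stdev(small)
extrapolated_worst = mu - 4.5 * sd

# Brute-force "ground truth" from a large sample
big_worst = min(node_mv() for _ in range(500_000))

print(f"4.5-sigma Gaussian extrapolation: {extrapolated_worst:.0f} mV")
print(f"observed worst of 500k samples:   {big_worst:.0f} mV")
# The long tail makes the true worst case notably lower than the Gaussian prediction.
```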

An EDA tool from Siemens EDA called Solido Variation Designer addresses these exact issues with a two-stage flow:

  1. Solido PVTMC Verifier
  2. Solido High-Sigma Tech

The first tool uses a machine learning algorithm to identify all of the worst-case PVT corners quickly and accurately. Even the long-tail distributions are captured, and the approach dramatically reduces the number of PVT corners that actually need to be simulated.

The Solido High-Sigma Tech stage launches the Eldo SPICE simulator to capture the rare events at the highest-sigma points, using only the worst-case PVT corners identified by the first tool, thus reducing verification run times.

Worst-Case Points

Using this two-stage flow to verify 6-sigma latch robustness of a positive edge-triggered master-slave flip-flop required only 20 PVT combinations:

  • 5 process corners: FF, SS, FS, SF, TT
  • 2 voltages: 0.80V +/- 10%
  • 2 temperatures: -40C / 125C
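Those 20 combinations are simply the cross-product of the three lists above (5 corners × 2 voltages × 2 temperatures), which can be sketched as:

```python
from itertools import product

corners = ["FF", "SS", "FS", "SF", "TT"]
voltages = [0.72, 0.88]   # 0.80V +/- 10%
temps_c = [-40, 125]

pvt_combos = list(product(corners, voltages, temps_c))
print(len(pvt_combos))    # 20 PVT combinations
print(pvt_combos[0])      # ('FF', 0.72, -40)
```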

Real SPICE circuit simulations were run at high-sigma conditions to capture the non-Gaussian long-tail behavior, but only on the worst PVT corners, resulting in high-sigma verification that was 10,000X faster than brute-force techniques.

Memory IP

As with standard cells, the verification space for memory IP was huge: about 64 PVT combinations and tens of IP instances, with netlists exceeding 1M components. The bitcells on a memory IP require 6-sigma verification, and there are millions of bitcells on a memory IP block.

Using conventional verification methods, an engineer would identify the memory instances with the smallest race margins and worst PVT corners, run 300 Monte Carlo circuit simulations to sigma saturation on all of the race conditions, then assume a Gaussian distribution to determine the -5 sigma tail values. The drawbacks of this method are that you don’t know at which PVT corners the race-condition tail failures will occur, that instances with larger nominal race margins may also have a larger sigma, and that the process deviates from Gaussian in the long tails.

Shown in red below are the Gaussian approximations, while the green curve shows the actual variation.

Gaussian approximation – Red

Using the two-stage flow with Solido Variation Designer on 28nm and 40nm memory IP designs showed that high-sigma verification results could be obtained 27,000X faster than brute-force Monte Carlo circuit simulations.

Summary

This group at ST has adopted a unified flow and methodology for standard cell library and memory IP verification across PVT corners, still modeling non-Gaussian behaviors while achieving dramatic runtime reductions compared to brute-force Monte Carlo simulation, by using Solido Variation Designer. Nebabie also presented this paper in a poster session at DAC.

Nebabie Kebebew, Siemens EDA

Related Blogs


5 Talks on RISC-V

5 Talks on RISC-V
by Milos Tomic on 12-27-2021 at 6:00 am

Milos Tomic

Veriest recently hosted a webinar focusing on RISC-V as a forerunner of the ongoing open-source revolution in chip design. The speakers were distinguished professionals from industry and academia, and the webinar covered topics from market trends to open-source hardware initiatives, tools and methodologies.

Zvonimir Bandić: RISC-V market update and CHIPS Alliance

Zvonimir is a Research Staff Member and Senior Director of the Next Generation Platform Technologies Department at Western Digital (WD) and Chairman at CHIPS Alliance. He shared the story of how RISC-V came to WD almost accidentally from UC Berkeley and made a huge impact on the company. Zvonimir cited a 2020 report claiming that 23% of all ASIC and FPGA projects incorporate RISC-V in some way, while his personal feeling is that the percentage is even higher. Some of the markets where RISC-V has already found its place are data centers, cloud, HPC, telecom, automotive, consumer and IoT, AI/ML, and edge computing. It is estimated that the RISC-V CPU core market will grow at a 114.9% CAGR, capturing over 14% of all CPU cores by 2025 – nearly 80 billion cores. Zvonimir noted that the core is only 3% of the whole ecosystem, and that along with the core market, the surrounding IP and software markets will grow as well, offering a vast number of opportunities for companies and individuals to jump on board.

Some of the main challenges in chip design today are development cost, time-to-market and the need for purpose-built architectures. The CHIPS Alliance is an organization looking to address these challenges by focusing on open-source hardware and open-source software for hardware design. It develops and hosts open-source RISC-V CPUs, hardware IP, and open-source ASIC and FPGA development tools. People and organizations looking to start designing their own product can find everything they need on the CHIPS Alliance GitHub.

Prof. Borivoje Nikolić: Chipyard – Generating the next wave of custom silicon

Prof. Nikolić is a Distinguished Professor of Engineering at the University of California, Berkeley. He shared insights on how RISC-V was born at Berkeley and how they are addressing the main challenges in chip design today. Like Zvonimir, Prof. Nikolić sees increased market demand for specialized chips, with the main challenges being the cost and time it takes to build a custom chip. At Berkeley they believe that the current way of delivering IP as black boxes greatly limits reusability. Instead of delivering instances, their approach is to use parametrized generators to describe the hardware and generate RTL. The generators are written in Chisel, a hardware design language based on Scala. Generators not only provide an easy way to customize designs, but also enable agile hardware development and fast turnaround cycles, something that is hard to achieve with the traditional approach. Proof of this concept is Rocket Chip, a parametrized SoC generator with hundreds of commercial implementations.

To build a complete chip, many open-source components are used. To connect tooling, generators and flows together, at Berkeley they created a framework called Chipyard, a one-stop shop for agile SoC design.

Vladislav Palfy: OneSpin 360 processor verification app

Vladislav is a Senior Applications Engineering Manager at OneSpin. OneSpin is part of Siemens EDA and a member of RISC-V International and the OpenHW Group. Vladislav explained how the OneSpin 360 product can be used for formal verification of a RISC-V core, and he pointed out the advantages of formal verification over simulation and why it is especially suitable for a design as complex as a processor. In functional simulation it is very hard, if not impossible, to describe and cover all states. In OneSpin 360 the formal verification process is automated – no test development or assertion specification is required – and runtime and coverage closure are much faster. In addition, formal verification finds bugs you were not looking for, and can even discover undocumented functionality – something they encountered with one of the popular RISC-V cores. OneSpin 360 supports RISC-V extensions, custom instructions, and RISC-V cores specified in Chisel. In case of an issue, the tool offers a graphical debug environment where the user can see the failing checker, the trace, and the code that caused the failure.

Siniša Stanojlović: RISC-V Memory protection

Siniša is the CEO of Micro Circuits Development, a professional services provider for embedded systems. He elaborated on the vulnerabilities of always-connected smart devices. RISC-V based devices are not immune to these threats, but they are different: while RISC-V devices implement security mechanisms similar to other architectures, the key difference is that they are open. As in software, some see this openness as an advantage, others as a disadvantage.

Siniša then focused on the example of memory protection in RISC-V through memory isolation. To support this, the RISC-V ISA includes the privileged instruction set specification, which defines three classes of systems:

  • Systems with machine mode only
  • Systems with machine mode and user mode
  • Systems with machine mode, supervisor mode, and user mode

RISC-V provides physical memory protection (PMP), which is used to enforce memory access restrictions on less privileged modes; e.g., machine-mode software can configure which parts of memory each user-mode application may access.
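As a concrete illustration of that configuration, the privileged spec’s PMP scheme can encode a naturally aligned power-of-two (NAPOT) region directly into a `pmpaddr` CSR value. A minimal sketch of that encoding (a simplified helper for illustration; consult the RISC-V privileged specification for the full `pmpcfg`/`pmpaddr` layout):

```python
def pmp_napot_addr(base, size):
    """Encode a NAPOT region for a RISC-V pmpaddr CSR.

    The register holds address bits [N:2]; a NAPOT region of 2^(k+3) bytes
    is marked by k trailing 1 bits below the shifted base address.
    """
    assert size >= 8 and size & (size - 1) == 0, "size must be a power of two >= 8"
    assert base % size == 0, "base must be naturally aligned to size"
    return (base >> 2) | ((size >> 3) - 1)

# A 4KB region at 0x8000_0000 encodes to pmpaddr value 0x200001FF
print(hex(pmp_napot_addr(0x8000_0000, 4096)))
```

Machine mode would write this value into a `pmpaddr` register and set the matching `pmpcfg` field to NAPOT with the desired R/W/X permissions for user mode.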

Miloš Tomić: Getting started with open-source RISC-V cores

I’m an ASIC Design Engineer at Veriest, an ASIC design and verification services provider. The RISC-V surge has created many new business opportunities for service companies in the semiconductor industry.

For this webinar, I shared my view of the RISC-V ecosystem and my experience getting started with RISC-V. The focus was on available open-source core implementations and their specifics. I covered some of the key considerations when choosing an open-source core for a new project, including core features, target application and technology, software requirements, licensing, etc.

In the end, I gave a short summary and comparison of some of the most widespread RISC-V implementations.

Finally, the conclusion was that you can build your own RISC-V SoC using only open-source tools and components, and there is more than one path you can take. We’re looking forward to continuing to explore this interesting topic in future events. If you would like to be informed about such events, please let us know here.

Also Read:

Ramping Up Software Ideas for Hardware Design

Verification Completion: When is Enough Enough?  Part II

Verification Completion: When is enough enough?  Part I


Podcast EP54: Ventana Micro, RISC-V, HPC and Chiplets

Podcast EP54: Ventana Micro, RISC-V, HPC and Chiplets
by Daniel Nenni on 12-24-2021 at 10:00 am

Dan is joined by Balaji Baktha, founder and CEO of Ventana Micro. Balaji explores the application of RISC-V in high-performance applications and the specific advantages of a chiplet-based approach.

RISC-V Summit Panel: https://www.youtube.com/watch?v=duZaAhWxhWM

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


AI for EDA for AI

AI for EDA for AI
by Daniel Nenni on 12-24-2021 at 6:00 am

Agnisys AI EDA AI

I’ve been noticing over the last few years that electronic design automation (EDA) vendors just love to talk about artificial intelligence (AI) and machine learning (ML), sometimes with deep learning (DL) and neural networks tossed in as well. It can get a bit confusing since these terms are used in two distinct contexts. The first is the use of EDA tools to develop chips for AI applications, which are some of the largest and most complex designs being developed today. Of course, self-driving cars and other autonomous vehicles are the most popular examples. It’s easy to see why; being able to replace human drivers and all the multi-faceted decisions they make requires a ton of powerful AI software running on specialized hardware. The system must be optimized for image recognition, ML, natural language processing (NLP), real-time responses, and more. AI shows up in other applications as well; speech recognition in particular is everywhere.

I guess that’s obvious, but the other context is the use of AI techniques within EDA tools. That is not as widely known despite the tendency of EDA vendors to trumpet such usage. At times I’ve wondered if it’s all a lot of hype to jump on the AI bandwagon, but at this point I think it’s clear that there are at least a few, and perhaps many, places in the EDA sphere where AI and ML really do apply. For example, vendors have announced implementation tools (logic synthesis, floorplanning, and place and route) that use ML from past projects and the results thus far on the current project to improve chip power, performance, and area (PPA). In addition, AI-based recognition of error signatures can speed debug of failing tests during chip verification. Just before DAC, another example caught my attention: Agnisys announced the use of AI to translate English descriptions of design intent to SystemVerilog Assertions (SVA), and vice-versa. I had not heard of AI/ML being used for this purpose before, so I decided to learn more.

The first thing that struck me was that the press release announced a “technology” and not a product. It sounded as if the translation was available to anyone at iSpec.ai so I checked it out. I was pleasantly surprised to find the site just as advertised. Users can type in some English text and push a button to generate SVA or enter some SVA code and generate the English equivalent, and then provide feedback on the results. I don’t claim to be an assertions expert, but I tried some English examples and the underlying algorithms seemed to handle them just fine. I wondered why an EDA vendor would offer this site for free rather than charging for the technology in a product, so I asked Agnisys CEO and founder Anupam Bakshi for more information.

Anupam described this as a crowdsourcing site and said that they made it free and open specifically to gather many different examples of how engineers think about design intent and describe assertions in natural language. He said that they performed initial training on the algorithms using assertion examples gathered from many sources, including an industry expert who literally wrote the book on SVA. But they knew that this would not be enough to create a robust technology that users could rely on, so they created the site and announced its availability to their users. Their R&D engineers carefully studied all the examples provided and, especially when users provided feedback that the results were not perfect, provided guidance to the tool as needed to learn from these additional examples. By the end of this process, they were comfortable letting everyone try it out. Anupam remarked that the technology is not yet “done” and that it will continue to improve in capacity and flexibility with additional crowdsourcing and lots more diverse examples.

Having said that, he stressed that what’s available now is powerful and valuable. He pointed out that the developers focused on robustness, necessary given the inherent ambiguity of natural language. He demonstrated the resilience of the algorithms by typing in a bunch of examples with typos in the English text, and the generated SVA was still correct. I was impressed that typing “onenot” instead of “onehot” and “bicycles” rather than “cycles” didn’t cause confusion; I guess that’s truly intelligent AI in action. It seems to me that iSpec.ai will be immediately useful for many types of assertions. I won’t yet go as far as to predict that users won’t have to learn SVA at all, but that seems like an entirely possible outcome as the technology matures further.

Users who do want to learn and write SVA will doubtless benefit from the translation in the opposite direction, using the generated English descriptions to double-check that their assertions specify what they intended. Anupam mentioned two additional uses: understanding and documenting existing assertions. Engineers often license IP blocks or inherit designs from other sources, and these may contain SVA that they didn’t write. Translating them into text could help the users to figure out what these assertions do. This process could also be used to document assertions, whether self-written or inherited, in verification plans and specifications.

I found this whole topic fascinating, and I suggest that everyone interested in assertions visit iSpec.ai and try some examples. I think you’ll be impressed and, if you do fool the AI with a novel way to express design intent, just provide feedback and rest assured that the Agnisys team will use your clever example to enhance and expand the technology for the benefit of all users. That’s what crowdsourcing is all about. Have fun!

Also read:

What the Heck is Collaborative Specification?

AUGER, the First User Group Meeting for Agnisys

Register Automation for a DDR PHY Design


Scalable Concolic Testing. Innovation in Verification

Scalable Concolic Testing. Innovation in Verification
by Bernard Murphy on 12-23-2021 at 10:00 am

Scalable Concolic Testing

Combining simulation and symbolic methods is an attractive way to excite rare branches in block-level verification, but is this method really scalable? Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Scalable Concolic Testing of RTL Models. The paper was published in the IEEE Transactions on Computers, July 2021. The authors are from the University of Florida, Gainesville.

Reminder: concolic testing combines simulation-based (concrete) analysis with symbolic methods, aiming for a balance between the scalability of concrete execution and the exhaustive coverage of symbolic analysis. The authors use this technique to improve coverage of hard-to-reach branches in block-level RTL. Their main contribution is improved heuristics that reduce path explosion in the symbolic analysis without impacting coverage.

The authors offer a couple of innovations. The first is a method they call “contribution-aware edge-realignment”, whose goal is to find an efficient way to force a single path to an uncovered branch, avoiding state explosion. They look for assignments to variables used in the branch condition and grade these based on their likely contribution to meeting that condition.

The second innovation aims to overcome inefficiency in covering only one rare branch at a time. They strive to cover multiple targets in part by pruning, so that if uncovered branch x is in the path to uncovered branch y, x can be dropped as it will be covered in activating y (unless y is unreachable). They show considerable improvement over other reported work in run times and memory to activate rare branches.
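The pruning idea can be sketched as a small reachability check over the activation graph: an uncovered branch that lies on the way to another uncovered branch is dropped from the target list, since activating the farther target covers it anyway. This is a simplified illustration of the idea, not the authors’ code:

```python
def prune_targets(edges, targets):
    """Drop target x if some other target y is reachable from x:
    activating y will cover x along the way."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    def reachable(src):
        # Iterative DFS collecting every node reachable from src
        seen, stack = set(), [src]
        while stack:
            n = stack.pop()
            for m in succ.get(n, ()):
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    return {x for x in targets if not (reachable(x) & (targets - {x}))}

# b1 lies on the path to b2, so only b2 needs to be targeted directly
print(prune_targets([("b1", "b2"), ("b2", "b3")], {"b1", "b2"}))  # {'b2'}
```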

Paul’s view

This is a great paper and a very easy read given the depth of the content. I am a big fan of concolic testing, here used to improve branch coverage. This work is very relevant for mainstream commercial verification.

The core contribution in the paper is the concept of a branch activation graph. This is a graph with nodes representing branches in the RTL (i.e. a branch condition and its begin-end body), with an edge from branch A to branch B if there exists a design state where branch A is triggered and B is not, and where executing the body of branch A takes the design to a state in which branch B is then triggered.

This activation graph serves as a guide for a symbolic simulator to prioritize its search to reach an as-yet uncovered branch in verification. If there are no input values that can trigger an uncovered branch from the current state, try input values that trigger an adjacent branch in the activation graph. If this is not possible, pick a branch that is two hops away from the uncovered branch in the activation graph, and so on. After applying this heuristic for several clock cycles there is a good chance the symbolic simulator will hit the uncovered branch – certainly a much better chance than if it were to just randomly toggle inputs and hope to get lucky.
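That hop-based heuristic maps naturally onto a breadth-first search over the activation graph: compute every branch’s hop distance to the uncovered target, then have the symbolic engine prefer whichever currently triggerable branch is closest. A hypothetical sketch of the bookkeeping (the branch names are invented, and this is not the authors’ implementation):

```python
from collections import deque

def hops_to_target(edges, target):
    """Hop distance from every branch to `target`, via reverse BFS over
    activation edges (A -> B: executing A's body can trigger B)."""
    rev = {}
    for a, b in edges:
        rev.setdefault(b, []).append(a)
    dist, queue = {target: 0}, deque([target])
    while queue:
        n = queue.popleft()
        for p in rev.get(n, []):
            if p not in dist:
                dist[p] = dist[n] + 1
                queue.append(p)
    return dist

def pick_next(triggerable, dist):
    """Among currently triggerable branches, steer toward the target."""
    return min(triggerable, key=lambda b: dist.get(b, float("inf")))

edges = [("reset", "idle"), ("idle", "busy"), ("busy", "error")]
dist = hops_to_target(edges, "error")
print(dist)                                # {'error': 0, 'busy': 1, 'idle': 2, 'reset': 3}
print(pick_next({"idle", "reset"}, dist))  # idle
```

Repeating `pick_next` each cycle walks the search one hop closer to the rare branch instead of toggling inputs at random.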

The results presented are compelling. Use of the activation graph along with a few other innovations to prune searches and pick good starting traces results in a solution that is 40x faster and scales 10x more efficiently in memory consumption with design size compared to prior work using alternate heuristics in the symbolic simulator. There is just one outlier case, a usb_phy, where their approach does not work as well. I am really curious why this testcase was an exception; unfortunately, this wasn’t explained in the paper.

We are working on a formal-based concolic engine at Cadence that we call “trace swarm”. The activation graph concept in this paper could be a great fit for this too.

Raúl’s view

The system uses the Design Player Toolchain to flatten the Verilog, and the Yices SMT solver for constraint solving. The authors compare experimental results to the Enhanced Bounded Model Checker (EBMC) and to QUalifying Event Based Search (QUEBS), another concolic approach. They selected benchmarks from ITC99, TrustHub and OpenCores, with 182 to 456,000 lines of code for the flattened designs, and 20 hard-to-reach targets in each benchmark, picked after running a million random tests.

Coverage for this approach is 100%, execution time ranges from sub-second to 134s, and memory from 9.5MB to 1.2GB. The other approaches don’t reach 100% coverage, and they run between a few times and a few hundred times slower while using 1-10x the memory. There are outliers: USB_PHY runs 16-50x slower with the presented approach (134s), using 5 times more memory (still just 138MB). As Paul commented, an explanation would have been nice. QUEBS runs ICACHE 30,000 times slower.

The paper also shows that memory scalability, as unrolled cycles and lines of code increase, is much better than EBMC’s (EBMC being the more scalable of the competing approaches). Finally, they compare target pruning to EBMC, again showing better results; for example, the number of targets pruned by this approach is 15 vs. 10 for EBMC.

I like the approach for being pragmatic. It uses an intuitive concept of distance based on assigned variables and path lengths, avoids repeated computations (pruning, clustering), and of course mixes simulation and constraint solving. The results are promising, with modest execution times and memory usage, and they scale well. This would be a worthwhile addition to constrained random test generation to increase coverage, or to existing concolic tools.

My view

As a path to activate rare branches, this looks like a more targeted approach than starting traditional BMC checking from various points along a simulation path. Very interesting.

Also Read

More Than Moore and Charting the Path Beyond 3nm

Topics for Innovation in Verification

Learning-Based Power Modeling. Innovation in Verification