TECHTALK: Hierarchical PI Analysis of Large Designs with Voltus Solution
by Bernard Murphy on 03-03-2021 at 6:00 am

Power integrity analysis in large chip designs is especially challenging thanks to the huge dynamic range the analysis must span. At one end, EM estimation and IR drop through interconnect and advanced transistor structures require circuit-level insight—very fine-grained insight but across a huge design. At the other, activity modeling requires system-level insight and rolling EM-IR analytics up to the full-chip power delivery network (PDN). Watch this Cadence TECHTALK on March 11 to learn about a new hierarchical approach to PI analysis that meets this need. REGISTER NOW to make sure you don’t miss the webinar.

The need

These are real design problems today, such as in the giant AI chips you are likely to see in hyperscaler installations, or perhaps in a CPU cluster together with eight giant GPUs. These designs are already far too big to run full-flat EM-IR analysis across the whole chip. Yet these analyses are very important to get right, because marketable implementations depend on finding the narrow window between under-design and over-design: a design that may fail on timing and/or reliability in production because critical areas of the PDN were not sized up sufficiently, or a design in which, to overcompensate for an uncertain analysis, too much of the network was sized up, pushing chip area outside a profitable bound.

Cadence has introduced a hierarchical analysis methodology in the Voltus IC Power Integrity Solution, which is particularly well suited to large designs with multiple repeated elements like those GPUs. (Come to think of it, this may well cover most super-large designs. After all, who is going to build such a design purely out of unique functions?) This latest release will generate models for IP blocks that can stand in for those blocks in full-chip analysis. These models have an order-of-magnitude-lower memory demand yet preserve accuracy within a few percent of a full-flat analysis—a very practical approach to managing EM-IR analysis across huge designs.

Summary: Hierarchical PI Analysis of Large Designs with Voltus Solution

Memory requirements and runtime for full-chip EM-IR analysis have become a major challenge at advanced nodes. It is not uncommon to see designs with hundreds of millions of cells, and some even in the multi-billion range. Running a flat analysis requires multiple terabytes of memory over a distributed network. To mitigate these issues, the Cadence® Voltus™ IC Power Integrity Solution enables designers to run hierarchical analysis using IP modeling technology. This helps designers create xPGV models for their IP blocks, accurately capturing the demand current and electrical parasitics. These xPGV models are an order of magnitude smaller than the fully extracted block. When used in chip-level analysis, they can significantly reduce runtime and memory. The modeling methodology used in the Voltus IC Power Integrity Solution ensures minimal difference in results relative to a fully flat analysis. This TechTalk will cover the generation of xPGV models, including the package model, and their use in IC-level analysis.

Attend this Cadence TECHTALK to learn how to:

  • Run your largest designs much faster with lower memory
  • Perform very accurate sub-chip analysis, including the impact of chip-level demand current and parasitics
  • Reuse IP models in different designs or for multiple instantiations within a design

Also Read

Finding Large Coverage Holes. Innovation in Verification

2020 Retrospective. Innovation in Verification

ML plus formal for analog. Innovation in Verification


USB4 Makes Interfacing Easy, But is Hard to Implement
by Tom Simon on 03-02-2021 at 10:00 am

USB made its big splash by unifying numerous connections into a single cable and interface. At the time there were keyboard ports, mouse ports, printer ports and many others. Over the years USB has delivered improved performance and greater functionality. However, as serial interfaces became more popular and started being used for what were previously parallel interfaces, there was a proliferation of new serial cables and protocols. The latest version of USB, referred to as USB4, makes a bold new move to unify many of these different interfaces. USB4 naturally works for USB data streams, but it can also tunnel PCIe, Thunderbolt 3, and DisplayPort data streams.

USB4 supports 20 Gbps and can go up to 40 Gbps. It specifies use of the USB Type-C connector, which further simplifies the user experience. And unlike its predecessors, it mandates power delivery management with USB PD. It offers one connector for device interfaces, storage, peripherals and display output. However, with this unification comes complexity under the hood. Many legacy and new features are included in the host and device specifications for USB4.

One of the hallmarks of the USB interface is its backward compatibility, and so USB4 is USB 2 and USB 3 compatible, as one might expect. USB4 is a multi-lane interface, with support for lane bonding as well as pipelining. Higher data rates call for more sophisticated encoding and error correction algorithms. Layers of abstraction for routing and tunneling add further complexity. Indeed, the list of features inside a properly functioning USB4 interface is lengthy.

Implementing USB4 is not a trivial task. At each stage of development, it is essential to be able to verify that everything conforms to the specification and is implemented properly. It is imperative to have a verification environment that can exercise all the functionality and provide designers with information to help isolate and pin down issues. Last summer Truechip, a leading provider of verification IP (VIP), announced the customer shipment of its USB4 and eUSB VIP.

USB4 Verification IP

Truechip has a truly impressive offering of VIP for nearly every category of design, including storage, bus and interfaces, USB, automotive, memory, PCIe, networking, MIPI, AMBA, display, RISC-V, and defense and avionics. Their VIP includes coverage, assertions, BFMs, monitors, scoreboards and test cases, and works on a wide range of platforms – UVM, OVM, VMM and Verilog.

Truechip’s USB4 VIP is fully compliant with the v1.0 specification. It includes backward compatibility with USB 2.0. As expected, it also supports USB Power Delivery 3.0 and Type-C v2.0. Truechip’s VIP also supports all logical layer ordered sets. It has 64b/66b, 128b/132b and Reed-Solomon FEC encoding and decoding. In truth, the list of features it supports is too long to cover here.

The deliverables for the USB4 VIP are also comprehensive. In addition to the host and device models, it includes bus functional models and agents for the electrical layer, logical layer, transport layer, configuration layer and the protocol adapter layer. It comes with a monitor and scoreboard. There are test suites for basic and directed protocol tests. It has low power tests, error scenario tests, stress tests, random tests and compliance tests.

Truechip’s USB4 VIP is highly configurable and contains everything needed to verify any portion of a USB4 interface design. With it designers can be assured that their finished products will fully conform to the specification and will work reliably in silicon. For more information on this VIP check out the Truechip website.

Also read:

Bringing PCIe Gen 6 Devices to Market

PCIe Gen 6 Verification IP Speeds Up Chip Development

TrueChip CXL Verification IP

Webinar Replay on TileLink from Truechip


Features of Resistive RAM Compute-in-Memory Macros
by Tom Dillinger on 03-02-2021 at 8:00 am

Resistive RAM (ReRAM) technology has emerged as an attractive alternative to embedded flash memory storage at advanced nodes.  Indeed, multiple foundries are offering ReRAM IP arrays at 40nm nodes and below.

ReRAM has very attractive characteristics, with one significant limitation:

  • nonvolatile
  • long retention time
  • extremely dense (e.g., 2x-4x density of SRAM)
  • good write cycle performance (relative to eFlash)
  • good read performance

but with

  • limited endurance (limited number of ‘1’/’0’ write cycles)

These characteristics imply that ReRAM is well-suited for the emerging interest in compute-in-memory architectures, specifically for the multiply-accumulate (MAC) computations that dominate the energy dissipation in neural networks.

To implement a trained NN for inference applications, node weights in the network would be written to the ReRAM array, and the data inputs would be (spatially or temporally) decoded as the word lines accessing the array weight bits.  The multiplicative product of the data/wordline = ‘1’ and the stored weight_bit = ‘1’ would result in significant memory bitline current that could be readily sensed to denote the bit product output – see the figure below.
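To make the bitline MAC idea concrete, here is a minimal Python sketch (my own illustration with invented resistance and read-voltage values, not taken from the paper): each wordline driven by a data bit of '1' contributes its cell's current to the bitline, and the summed current encodes the dot product of the input vector with the stored weight column.

# Minimal illustration: a ReRAM bitline as a binary multiply-accumulate.
# Stored weights are cell resistances (low-resistance state for '1',
# high-resistance state for '0'); input data bits drive the wordlines.
# All numeric values below are arbitrary, for illustration only.

R_LRS = 10e3      # low-resistance state, stores weight = '1'  (ohms)
R_HRS = 200e3     # high-resistance state, stores weight = '0' (ohms)
V_READ = 0.2      # small read voltage so the cells are not disturbed (volts)

def bitline_current(data_bits, weight_bits):
    """Sum the cell currents of every row whose data/wordline bit is '1'."""
    i_total = 0.0
    for d, w in zip(data_bits, weight_bits):
        if d:  # wordline is active only when the input data bit is '1'
            r_cell = R_LRS if w else R_HRS
            i_total += V_READ / r_cell
    return i_total

data    = [1, 0, 1, 1, 0, 1, 1, 1]   # input activations for one network node
weights = [1, 1, 0, 1, 0, 0, 1, 1]   # one stored weight column

i_bl = bitline_current(data, weights)
dot = sum(d & w for d, w in zip(data, weights))   # ideal digital dot product
print(f"bitline current = {i_bl*1e6:.1f} uA, ideal dot product = {dot}")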

At the recent International Solid-State Circuits Conference (ISSCC), researchers from Georgia Tech and TSMC presented results from an experimental compute-in-memory design using TSMC’s 40nm ReRAM macro IP. [1]  Their design incorporates several unique features – this article summarizes some of the highlights of their presentation.

Background

As the name implies, ReRAM technology is based on the transitions of a thin film material between a high-resistance and low-resistance state.  Although there are a large number of different types of materials (and programming sequences) used, a typical metal-oxide thin-film implementation is depicted in the figure below.

The metal-oxide thin film material shown incorporates the source and transport of oxygen ions/vacancies under an applied electric field of high magnitude.  (The researchers didn’t elaborate on the process technology in detail, but previous TSMC research publications on ReRAM development did utilize a TiO-based thin film programming layer; multiple other metal-oxide thin film materials are also in use.)

As depicted in the figure above, an initial “filament forming” cycle is applied, resulting in transport of oxygen ions in the thin film.  In the Reset state (‘0’), a high electrical resistance through the metal-oxide film is present.  During the application of a Set (‘1’) write cycle, oxygen ion migration occurs, resulting in an extension of the filament throughout the thin film layer, and a corresponding low electrical resistance.  In the (bipolar operation) technology example depicted above, the write_0 reset cycle breaks this filament, returning the ReRAM cell to its high resistance state.

The applied electric field across the top thin film for the (set/reset) write operation is of necessity quite large;  the applied “read” voltage to sense the (low or high) bitcell resistance utilizes a much smaller electric field.

There are several items of note about ReRAM technology:

  • the bitcell current is not a strong function of the cell area

The filamentary nature of the conducting path implies that the cell current is not strongly dependent on the cell area, offering opportunities for continued process node scaling.

  • endurance limits

There is effectively a “wearout” mechanism in the thin film for the transition between states – ReRAM array specifications include an endurance limit on the number of write cycles (e.g., 10^4 to 10^6).  Commonly, there is no limit on the number of read cycles.

The endurance constraints preclude the use of ReRAM as a general-purpose embedded “SRAM-like” storage array, but they are acceptable for an eFlash replacement and for a compute-in-memory offering, where pre-calculated weights are written and then updated very infrequently.

  • resistance ratio, programming with multiple write cycles

The goal of ReRAM technology is to provide a very high ratio of the high resistance to low resistance states (HRS/LRS).  When the cell is being accessed during a read cycle – i.e., data/wordline = ‘1’ – the bitline sensing circuit is simplified if i_HRS << i_LRS.

Additionally, it is common to implement a write to the bitcell using multiple iterations of a write-read sequence, to ensure the resulting HRS or LRS cell resistance is within the read operation tolerances.  (Multiple write cycles are also initially used during the forming step.)

  • HRS drift, strongly temperature dependent

The high-resistance state is the result of the absence of a conducting filament in the top thin film, after the oxygen ion transport during a write ‘0’ operation.  Note in the figure above the depiction of a high oxygen vacancy concentration in the bottom metal oxide film.  Any time a significant material concentration gradient is present, diffusivity of this material may occur, accelerated at higher temperatures.  As a result, the HRS resistance will drift lower over extended operation (at high temperature).

Georgia Tech/TSMC ReRAM Compute-in-Memory Features

The researchers developed a ReRAM-based macro IP for a neural network application, with the ReRAM array itself providing the MAC operations for a network node, and supporting circuitry providing the analog-to-digital conversion and the remaining shift-and-add logic functionality.  The overall implementation also incorporated three specific features to address ReRAM technology issues associated with:  HRS and LRS variation; low (HRS/LRS) ratio; and, HRS drift.

low HRS/LRS ratio

One method for measuring the sum of the data inputs to the node multiplied by a weight bit is to sense the resulting bitline current drawn by the cells whose data/wordline = ‘1’.  (Note that unlike a conventional SRAM block with a single active decoded address wordline, the ReRAM compute-in-memory approach will have an active wordline for each data input to the network node whose value is ‘1’.  This necessitates considerable additional focus on read-disturb noise on adjacent, unselected rows of the array.)  However, for a low HRS/LRS ratio, the bitline current contribution from cells where data = ‘1’ and weight = ‘0’ needs to be considered.  For example, if (HRS/LRS) = 8, the cumulative bitline current of eight (data = ‘1’ X weight = ‘0’) products will be equivalent to one LRS current (‘1’ X ‘1’), a binary multiplication error.
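The arithmetic above is easy to reproduce. The short sketch below (my own numbers, assuming HRS/LRS = 8 and an illustrative read voltage) shows how eight data = '1' x weight = '0' cells leak as much bitline current as one genuine '1' x '1' product when simple current sensing is used.

# Illustration of the low HRS/LRS ratio problem (assumed numbers, not from
# the paper): with HRS/LRS = 8, eight cells storing '0' under active
# wordlines draw as much bitline current as one cell storing '1'.

R_LRS = 10e3            # ohms, illustrative
RATIO = 8               # assumed HRS/LRS ratio
R_HRS = R_LRS * RATIO
V_READ = 0.2            # volts, illustrative

i_lrs = V_READ / R_LRS          # current of one true (1 x 1) product
i_hrs = V_READ / R_HRS          # leakage of one (1 x 0) "zero" product

# Eight active rows with weight '0' sum to the same current as a single
# active row with weight '1' -- the binary multiplication error noted above.
print(f"one LRS cell:    {i_lrs*1e6:.2f} uA")
print(f"eight HRS cells: {8*i_hrs*1e6:.2f} uA")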

The researchers chose to use an alternative method.  Rather than sensing the bitline current (e.g., charging a capacitor for a known duration to develop a readout voltage), the researchers pumped a current into the active bitcells and measured Vbitline directly, as illustrated below.

The effective resistance is the parallel combination of the active LRS and HRS cells.  The unique feature is that the current source value is not constant, but is varied with the number of active wordlines – each active wordline also connects to an additional current source input.  Feedback from Vbitline to each current source branch is also used, as shown below.

This feedback loop increases the sensitivity of each current source branch to Reffective, thus amplifying the resistance contribution of each (parallel) LRS cell on the bitline, and reducing the contribution of each (parallel) HRS cell.  The figure below illustrates how the feedback loop fanout to each current branch improves the linearity of the Vbitline response with an increasing number of LRS cells accessed (and thus, parallel LRS resistances contributing to Rtotal).

LRS/HRS variation

As alluded to earlier, multiple iterations of write-read are often used, to confirm the written value into the ReRAM cell.

The technique employed here to ensure a tight tolerance on the written HRS and LRS values evaluates the digital value read back after the write, and increases or decreases the pulse width of the subsequent (reset/set) write iteration until the resistance target is reached, at which point the write cycle ends.
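A rough behavioral sketch of that write-verify loop is shown below. The cell response, pulse widths and resistance target are invented for illustration only and are not taken from the ISSCC paper.

import random

# Toy write-verify loop for writing a '1' (LRS): apply a set pulse, read
# the resistance back, and widen the next pulse until the read-back value
# is within the target. All parameters are invented for this sketch.

random.seed(0)

def apply_set_pulse(width_ns):
    """Toy model: a longer set pulse grows the filament further, giving a
    lower low-resistance-state value (with cycle-to-cycle variation)."""
    return 120e3 / width_ns * random.uniform(0.8, 1.2)   # ohms

def write_lrs_with_verify(r_target=15e3, width_ns=2, step_ns=2, max_iters=10):
    """Pulse, verify, and widen the pulse until R is at or below target."""
    for i in range(1, max_iters + 1):
        r = apply_set_pulse(width_ns)        # write attempt
        if r <= r_target:                    # verify (read-back) step
            return i, r                      # converged: end the write cycle
        width_ns += step_ns                  # widen the next pulse
    return None, r                           # flag a cell that never converges

iters, r_final = write_lrs_with_verify()
print(f"converged after {iters} write-verify iterations, R = {r_final/1e3:.1f} kohm")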

HRS drift

The drift in HRS resistance after many read cycles is illustrated below (measured at high operating conditions to accelerate the mechanism).

To compensate for the drift, each bitcell is periodically read – any HRS cell value which has changed beyond a pre-defined limit will receive a new reset write cycle to restore its HRS value.  (The researchers did not discuss whether this “mini-reset” HRS write cycle has an impact on the overall ReRAM endurance.)

Testsite Measurement Data

A micrograph of the ReRAM compute-in-memory testsite (with specs) is shown below.

Summary

ReRAM technology offers a unique opportunity for computing-in-memory architectures, with the array providing the node (data * weight) MAC calculation.  The researchers at Georgia Tech and TSMC developed a ReRAM testsite with additional features to address some of the technology issues:

  • HRS/LRS variation:  multiple write-read cycles with HRS/LRS sensing are used
  • low HRS/LRS ratio:  a Vbitline voltage-sense approach is used, with a variable bitline current source (with high gain feedback)
  • HRS drift:  bitcell resistance is read periodically, and a reset write sequence applied if the read HRS value drops below a threshold

I would encourage you to review their ISSCC presentation.

-chipguy

References

[1]  Yoon, Jong-Hyeok, et al., “A 40nm 64kb 56.67TOPS/W Read-Disturb-Tolerant Compute-in-Memory/Digital RRAM Macro with Active-Feedback-Based Read and In-Situ Write Verification”, ISSCC 2021, paper 29.1.



It’s Energy vs. Power that Matters
by Lauri Koskinen on 03-02-2021 at 6:00 am

In tiny devices, such as true wireless headphones, the battery life of the device is usually determined by the chips that execute the device’s functions. Professor Jan Rabaey of UC Berkeley, who wrote the book on low power, also coined the term “energy frugal” a number of years ago, and this term is even more valid today with the proliferation of true wireless devices.

When optimizing battery lifetime, power and energy are often used interchangeably. However, they are not interchangeable: the device’s battery stores energy, and reducing power can actually consume more energy. Techniques to reduce energy by reducing voltage are being deployed more broadly as demand takes off for true wireless products. In this blog, I’m going to illustrate what’s behind this trend through several examples that demonstrate the relationship between energy, power and voltage.

Let’s start by reviewing the basic equations for energy and power, shown below in Figure 1.  They look similar, but there are a few critical takeaways: 1) energy consumption cannot be reduced by reducing frequency, 2) leakage cannot be reduced without reducing VDD (excluding process options) and finally, 3) because of the quadratic relationship, VDD is by far the most effective lever for reducing energy.

Figure 1: Basic equations for energy and power

Let’s look at the takeaways with some examples. For takeaway 1), an example is simple: reducing frequency by 10% increases Eleak by 10% (as t increases 10%) while Edyn remains unchanged. This “fallacy” is mostly seen in “run-to-complete” strategies. For example, let’s say that your processor consumes 90% dynamic energy and 10% leakage energy at its nominal voltage. If you run to complete (i.e. run the processor as fast as you can) and then let it leak (i.e. no power gating), neither dynamic energy nor leakage changes (see equation). But the fallacy shows up if you try to run faster for the sake of shutting down earlier.  For example, take a 10% VDD increase to get a 10% frequency increase and run to complete 10% faster. Your new energy consumption is E = 0.9*(1.1)^2 + 0.1*1.1*0.9 = 119%. Clock gating doesn’t change this equation as it equally affects all dynamic energy cases, but let’s look at power gating’s effect. If your power gating switches super-fast and doesn’t cost active energy, then the theoretical maximum you can save is the leakage energy (10%). How about running as fast as you can and then power gating? The dynamic energy increase is quadratic and the leakage linear, so you can’t win. For the 10% frequency increase case above, you would still end up consuming more energy (0.9*(1.1)^2 + 0 = 109%).

For takeaways 2) and 3) above, let’s turn to examples that employ reduced voltage. These are not hypothetical examples as we are working with companies to deploy solutions based on reduced voltage today.  I’ll need to explain a few assumptions to start.  Assume that your computation time linearly depends on VDD (a realistic assumption up to a point). Let’s say that this is a slow operating mode (you also have modes that take more of the clock cycle), so your processor (at the same 90% dynamic / 10% leakage energy as above) finishes in 50% of the clock cycle. Let’s use the remaining 50% of the clock cycle to reduce VDD (i.e. halve VDD). This would result in a huge reduction in energy.  For those interested in the exercise: E = 0.9*(0.5)^2 + 0.1*0.5*2 = 32.5%. It gets even better, as Ileak reduces exponentially with voltage. Let’s say that Ileak reduces by 90% when VDD is halved as above. Your new energy is reduced further to only 23.5% (E = 0.9*(0.5)^2 + 0.1*0.1*(0.5*2) = 23.5%).

In case you are thinking that I’m writing this from an ivory tower, there are also cases where reducing voltage does not make sense when looking at the total chip. Let’s say that you have an old PLL which consumes as much energy as your processor but which can be shut off with no leakage. Then the 50% VDD drop case from above would end up consuming more energy (2*0.5 + 0.5*(0.9*(0.5)^2 + 0.1*0.1*(0.5*2)) = 112%). It’s not an uncommon story in the IC industry that the overhead ends up cancelling out the gains, and in upcoming blogs I’ll show you how to avoid that with dynamic voltage and frequency scaling (DVFS) systems, based on our experience with design teams working on true wireless devices.
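For readers who want to check the arithmetic, here is a small sketch that reproduces the numbers above; the 90/10 dynamic/leakage split, the doubling of execution time at half VDD, and the 90% leakage-current reduction are the blog's own assumptions, simply re-expressed in code.

# Reproduce the energy examples from the text. Energy is normalized so that
# the baseline run (nominal VDD and frequency) is 1.0, split 90% dynamic and
# 10% leakage as assumed above.

E_DYN0, E_LEAK0 = 0.9, 0.1

def energy(vdd_scale, time_scale, ileak_scale=1.0):
    """E = Edyn*Vdd^2 + Eleak*Ileak*Vdd*t, all expressed as scale factors."""
    return E_DYN0 * vdd_scale**2 + E_LEAK0 * ileak_scale * vdd_scale * time_scale

# 1) Run-to-complete fallacy: +10% VDD for +10% frequency -> ~119% energy.
print(round(energy(1.1, 1 / 1.1), 3))

# 2) Same, but leakage perfectly power-gated away afterwards -> ~109%.
print(round(E_DYN0 * 1.1**2, 3))

# 3) Use the idle half of the clock cycle to halve VDD (time doubles) -> 32.5%.
print(energy(0.5, 2.0))

# 4) As (3), with Ileak also dropping 90% at the lower voltage -> 23.5%.
print(energy(0.5, 2.0, ileak_scale=0.1))

# 5) The PLL counter-example: a PLL that burns as much energy as the processor
#    but now runs twice as long at full voltage -> ~112% overall.
print(round(2 * 0.5 + 0.5 * energy(0.5, 2.0, ileak_scale=0.1), 4))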

https://minimaprocessor.com/


Webinar: Achronix and Vorago Deliver Innovation to Address Rad-Hard and Trusted SoC Design
by Mike Gianfagna on 03-01-2021 at 10:00 am

Radiation hardening is admittedly not a challenge every SoC design team faces. Methods to address this challenge typically involve a new process technology, a new library or both. Trusted, secure design is something more design teams worry about, and that number is growing as our interconnected world creates new and significant attack surfaces. This challenge typically requires the introduction of new IP, new process tweaks or both. There is a webinar coming on SemiWiki that explains how to deal with both of these challenges with minimal perturbation to the IP and process strategy. The work here is significant. Read on to learn how Achronix and Vorago deliver innovation to address rad-hard and trusted SoC design.

The webinar presents the collaboration of two companies. Achronix brings embedded FPGA technology to the table and Vorago brings a unique and low-impact approach to radiation hardened design. Together, these two companies solve a lot of rather difficult problems in an elegant way. First, a bit about the speakers.

Dr. Patrice Parris

The webinar begins with a presentation by Dr. Patrice Parris, chief technology officer at Vorago Technologies. With several degrees in EE, CS and physics from MIT and a diverse career in innovative work at NXP, Freescale and Motorola, Patrice provides a comprehensive overview of radiation hardened design that is easy to follow. He describes Vorago’s unique and patented capabilities to provide technology solutions to address radiation hardening and extreme temperature requirements. More on this technology in a moment.

The next speaker is Raymond Nijssen, vice president and chief technologist at Achronix. Raymond has deep background in ASIC/FPGA design as well as EDA product development. He is driving both the software systems to support Achronix FPGAs as well as key aspects of its FPGA architectures. Both of these gentlemen hold multiple patents. The depth of their technical understanding is substantial. More relevant for the webinar is that they both are able to explain complex concepts in ways that are easy to understand.

Raymond Nijssen

If the topics of radiation hardening or trusted, secure design are of interest, I highly recommend this webinar. You will come away with new tools and new insights. I will provide an overview of the topics covered in the webinar and then provide a link to register.

We’ll start with Vorago. The company provides an innovative technology called HARDSIL® that adds radiation hardening cost-effectively to existing production fab capability. The approach is to add a small number of mask steps and implants to achieve rad-hard performance. These steps are easily added and don’t impact transistor performance or yield, so there is minimal impact on the design flow and IP. If this sounds too good to be true, watch the webinar. You will be treated to a very comprehensive overview of how this all works, including SEM photos.  Patrice also does a great job explaining the various types of circuit events that occur during radiation dosing of semiconductors. There are several, with different implications for short- and long-term performance of the circuit. I thought I understood these issues. I wound up learning some new and interesting concepts.

Throughout the webinar, Patrice and Raymond interleave their presentations to build the complete story. Achronix is a unique company that provides both stand-alone and embedded FPGA solutions. I previously covered the offerings of Achronix in this post. There are many other excellent posts about Achronix on its SemiWiki page. Raymond provides an overview of the threats that exist in the semiconductor supply chain. There are many opportunities for theft, tampering and reverse engineering. A trusted flow is daunting for sure. What is quite interesting are the benefits of using embedded FPGA technology in chip design. You need to see Raymond unfold the benefits in detail, but the primary point is that the function and implementation of a circuit are separated in an FPGA and that makes a big difference regarding security.

Raymond and Patrice also describe how HARDSIL is being applied to the Achronix embedded FPGA technology to complete the picture. There is a lot of very useful information presented in this webinar. The tight collaboration between Achronix and Vorago comes across quite well. This webinar will be presented on Tuesday, March 9, 2021 at 10AM Pacific time. You can register for the webinar here. I highly recommend you attend and see how Achronix and Vorago deliver innovation to address rad-hard and trusted SoC design.

 

The views, thoughts, and opinions expressed in this blog belong solely to the author, and not to the author’s employer, organization, committee or any other group or individual.


TSMC ISSCC 2021 Keynote Discussion
by Daniel Nenni on 03-01-2021 at 6:00 am

Now that semiconductor conferences are virtual there are better speakers, since they can prerecord, and we have extra time to do a better job of coverage. Even when conferences go live again, I think they will also be virtual (hybrid), so our in-depth coverage will continue.

ISSCC is one of the conferences we covered live since it’s in San Francisco so that has not changed. We will however be able to cover many more sessions as they come to our homes on our own time.

First off is the keynote by TSMC Chairman Mark Liu: “Unleashing the Future of Innovation.”

Given the pandemic-related semiconductor boom that TSMC is experiencing, Mark might not have had time to do a live keynote, so this was a great opportunity to hear his recorded thoughts on the semiconductor industry, the foundry business model, and advanced semiconductor technologies. Here are some highlights from his presentation/paper intermixed with my expert insights:

  • The semiconductor industry has been improving transistor energy efficiency by about 20-30% for each new technology generation and this trend will continue.
  • The global semiconductor market is estimated at $450B for 2020.
  • Products using these semiconductors represent 3.5% of GDP ($2T USD).
  • From 2000 to 2020 the overall semiconductor industry grew at a steady 4%.
  • The fabless sector grew at 8% and foundry grew 9% compared to IDM at 2%.
  • In 2000 fabless revenue accounted for 17% of total semiconductor revenue (excluding memory).
  • In 2020 fabless revenue accounted for 35% of total semiconductor revenue (excluding memory).
  • Unlike IDMs, innovators are only limited by their ideas not capital.

Nothing like a subtle message to the new Intel CEO. It will be interesting to see if the Intel – TSMC banter continues. I certainly hope so. The last one that started with Intel saying that the fabless model was dead did not end so well.

Mark finished his IDM message with:

“Over the previous five decades, the most advanced technology had been available first to captive integrated device manufacturers (IDMs). Others had to make do with technologies that were one or several generations behind. The 7nm logic technology (mass production in 2017) was a watershed moment in semiconductor history. In 2017, 7nm logic was the first time that the world’s most advanced technology was developed and delivered by a pure-play foundry first, and made available broadly to all fabless innovators alike. This trend will likely continue for future technology generations…”

As we all now know Intel will be expanding TSMC outsourcing at 3nm. TSMC 3nm will start production in Q4 of this year for high volume manufacturing beginning in 2H 2022. The $10B question is: Will Intel get the Apple treatment from TSMC (early access, preferred pricing, and custom process recipes)?

I’m not sure everyone understands the possible ramifications of Intel outsourcing CPU/GPU designs to TSMC so let’s review:

  • Intel and AMD will be on the same process so architecture and design will be the focus. More direct comparisons can be made.
  • Intel will have higher volumes than AMD so pricing might be an issue. TSMC wafers cost about 20% less than Intel if you want to do the margins math.
  • Intel will have designs on both Intel 7nm and TSMC 3nm so direct PDK/process comparisons can be made.

Bottom line: 2023 will be a watershed moment for Intel manufacturing, absolutely!


The Chip Market / China Conundrum
by Malcolm Penn on 02-28-2021 at 2:00 pm

In its February 20, 2021 edition, the Economist published an article entitled “How to kill a democracy; China faces fateful choices, especially involving Taiwan”.  It went on to quote “To many Chinese, the island’s conquest is a sacred national mission” as well as a by-line “America is losing its ability to deter a Chinese attack on Taiwan.  Allies are in denial.”

The thought of such an attack should send cold shivers down the chip industry’s spine given, were this to happen, a pivotal part of the western world’s chip supply would dry up overnight.  Chip inventories would quickly become exhausted and end equipment production lines everywhere would grind to a halt within a matter of weeks, even days.  The near instant impact on global trade and the world economy would be orders of magnitude greater than the 2008 Lehman Brothers crash or the 2020 Covid-19 lockdown.

This problem has been brewing for years, the combined result of an efficient out-sourcing regime, driven faultlessly by TSMC, aided and abetted by super-efficient chip-design tools.  Both trends have been manna from heaven to chip firms, users and their investors alike, as they offered lower chip costs and allowed firms to deploy outsourcing-rich, asset-lite manufacturing strategies, increasing profits and diverting their cash flows from investments to dividends and share buy-back schemes.  It was accounting spreadsheet heaven.

No-one paid any attention to the loss of control of a key strategic manufacturing industry, and why should they?  Taiwan was the West’s friend and TSMC an outstanding company and, in any case, chips were just another commodity.  The ‘Real men have fabs’ naysayers were ridiculed as out of touch, out of date, twentieth century dinosaurs.

The current chip shortage, and its devastating impact on the automotive industry, has to a limited extent stirred the chip-supply hornet’s nest, but this will all blow over once the supply-demand imbalance gets sorted.  Knee-jerk initiatives, such as the US ‘Chips for America’ act and the EU ‘European Initiative on Processors and Semiconductor Technologies’, are the wrong answer to the right problem.  They fail to address the fundamental issue that chip firms do not want to own wafer fabs (it screws up their balance sheet) and chip users don’t care where the chips come from (so long as they’re cheap).  There’s neither market pull nor push!

China has been aware of this out-sourced dependency risk for years, hence its drive for national self-sufficiency in chip production, but any fast-follower catch-up strategy is notoriously hard to achieve.  As a benchmark, it took TSMC over twenty-five years to come close to manufacturing parity with the best-in-practice manufacturers, and only in the past five has it moved into pole position, yet it is, without doubt, the best chip firm in the world.  If it took TSMC this long to catch up, what chance has anyone else?  Hence, even before the US-imposed sanctions, China had made only modest progress.

But, as the Economist points out, the Taiwan conundrum represents unfinished business from the 1949 war, when the defeated Nationalist regime fled into exile in Taiwan.  Whether President Xi fulfils China’s pledge to bring the 23rd Province of China under Communist Party control is more a matter of when, not if, with D-Day shaped by the judgement call of whether America would (could?) stop him.

The big question is America’s ability to deter such an invasion, but as America’s cutting off of Huawei’s chip supply has shown, invasion today no longer entails tanks and troops on the ground, or the streets of Taipei scorched by fire and stained with blood; simply cutting off the electricity and shutting down TSMC’s factories is all it would take to bring America and the rest of the western world to its knees.

For the hawks in China, what better time to do that than now, while the non-China world is still struggling with the Covid-19 pandemic, US democracy and government have been battered by a brutal and divisive presidential election, the world is struggling with a global chip shortage, and there is no global consensus on whether Taiwan’s independence is worth angering China, especially for countries where China is their largest, or a crucial, trade partner.

Taiwan’s recovery back into the Communist fold is not just a sacred national mission, it would also signal that American global leadership had come to an end.  The only deterrent is if China feels it cannot complete the task at a bearable cost.  Once that fear is reconciled, there is little doubt China will act and, from a chip supply perspective, there will be nothing the rest of the world can do … as the automotive industry has realized, there is no Plan B.

https://www.futurehorizons.com/


Accelerating AI-Defined Cars
by Manouchehr Rafie on 02-28-2021 at 10:00 am

Convergence of Edge Computing, Machine Vision and 5G-Connected Vehicles

Today’s societies are becoming ever more multimedia-centric, data-dependent, and automated. Autonomous systems are hitting our roads, oceans, and air space. Automation, analysis, and intelligence are moving beyond humans to “machine-specific” applications. Computer vision and video for machines will play a significant role in our future digital world. Millions of smart sensors will be embedded into cars, smart cities, smart homes, and warehouses using artificial intelligence. In addition, 5G technology will provide the data highways in a fully connected intelligent world, promising to connect everything from people to machines and even robotic agents – the demands will be daunting.

The automotive industry has been a major economic sector for over a century and it is heading towards autonomous and connected vehicles. Vehicles are becoming ever more intelligent and less reliant on human operation. Vehicle to vehicle (V2V) and connected vehicle to everything (V2X), where information from sensors and other sources travels via high-bandwidth, low-latency, and high-reliability links, are paving the way to fully autonomous driving. The main compelling factor behind autonomous driving is the reduction of fatalities and accidents. Realizing that more than 90% of all car accidents are caused by human failures, self-driving cars will play a crucial role in accomplishing the ambitious vision of “zero accidents”, “zero emissions”, and “zero congestion” of the automotive industry.

The only obstacle is that vehicles must possess the ability to see, think, learn and navigate a broad range of driving scenarios.

The market for automotive AI hardware, software, and services will reach $26.5 billion by 2025, up from $1.2 billion in 2017, according to a recent forecast from Tractica. This includes machine learning, deep learning, NLP, computer vision, machine reasoning, and strong AI. Fully autonomous cars could represent up to 15% of passenger vehicles sold worldwide by 2030, with that number rising to 80% by 2040, depending on factors such as regulatory challenges, consumer acceptance, and safety records, according to a McKinsey report. Autonomous driving is currently a relatively nascent market, and many of the system’s benefits will not be fully realized until the market expands.

Figure 1 – Automotive AI market forecast for the period of 2017 through 2025

AI-Defined Vehicles

The fully autonomous driving experience is enabled by a complex network of sensors and cameras that recreate the external environment for the machines. Autonomous vehicles process the information collected by cameras, LiDAR, radar, and ultrasonic sensors to tell the car about its distance to surrounding objects, curbs, lane markings, visual information of traffic signals and pedestrians.

Meanwhile, we are witnessing the growing intelligence of vehicles and mobile edge computing with recent advancements in embedded systems, navigation, sensors, visual data, and big data analytics. It started with Advanced Driver Assistance Systems (ADAS), including emergency braking, backup cameras, adaptive cruise control, and self-parking systems.

Fully autonomous vehicles are gradually expected to come to fruition following the introduction of the six levels of autonomy defined by the Society of Automotive Engineers (SAE) as shown in Figure 2. These levels range from no automation, conditional automation (human in the loop) to fully automated cars. With increasing levels of automation, the vehicle will take over more functions from the driver. ADAS mainly belongs to Level 1 and Level 2 of automation. Automotive manufacturers and technology companies, such as Waymo, Uber, Tesla, and a number of tier-1 automakers, are investing heavily in higher levels of driving automation.

Figure 2 – Levels defined by SAE for autonomous vehicles

With the rapid growth of innovations in AI technology, there is a broader acceptance of Level 4 solutions, targeting vehicles that mostly operate under highway conditions.

Although the barrier between Level 3 and Level 4 is mainly regulatory at this time, the leap is much greater between Levels 4 and 5. The latter requires achieving the technological capability to navigate complex routes and unforeseen circumstances that currently necessitate human intelligence and oversight.

As the automation levels increase, there will be a need for more sensors, processing power, memory, efficient power consumption, and networking connectivity bandwidth management. Figure 3 shows various sensors required for self-driving cars.

Figure 3 – Sensors (camera, LiDAR, Radar, Ultrasound) required for autonomous vehicle levels

The convergence of deep learning, edge computing, and the Internet of Vehicles is driven by recent advancements in automotive AI and vehicular communications. Another enabling technology for machine-oriented video processing and coding in visual data applications and industries is the emerging MPEG Video Coding for Machines (MPEG-VCM) standard. Two specific technologies are investigated for VCM:

  • Efficient compression of video/images
  • The shared backbone of feature extraction

Powerful AI accelerators for inferencing at the edge, standard-based algorithms for video compression and analysis for machines (MPEG-VCM), and 5G connected vehicles (V2X) play a crucial role in enabling the full development of autonomous vehicles.

The 5G-V2X and emerging MPEG-VCM standards enable the industry to work towards harmonized international standards. The establishment of such harmonized regulations and international standards will be critical to global markets of future intelligent transportation and AI automotive industry.

There are a number of possible joint VCM-V2X architectures for the future autonomous vehicle (AV) industry.  Depending on the requirements of the given AV infrastructure scenario, we can have centralized, distributed, or hybrid VCM-V2X architectures, as shown in Figure 4.  Currently, most connected-car manufacturers are experimenting with the centralized architecture using low-cost cameras.  However, as cameras become more intelligent, distributed and hybrid architectures can become more attractive due to their scalability, flexibility, and resource-sharing capabilities.  The emerging MPEG-VCM standard also provides the capability of transporting compressed extracted features rather than sending compressed video/images between vehicles.


Gyrfalcon Technology Inc. is at the forefront of these innovations, using the power of AI and deep learning to deliver a breakthrough solution for AI-powered cameras and autonomous vehicles, with unmatched performance, power efficiency, and scalability for accelerating AI inferencing at the device, edge, and cloud level.

The convergence of 5G, edge computing, computer vision, and deep learning, and Video Coding for Machine (VCM) technologies will be key to fully autonomous vehicles. Standard and interoperable technologies such as V2X, emerging MPEG-VCM standard, powerful edge, and onboard compute inferencing accelerator chips enable low-latency, energy-efficient, low-cost, and safety benefits to the demanding requirements of the AI automotive industry.

About Manouchehr Rafie, Ph.D.
Dr. Rafie is the Vice President of Advanced Technologies at Gyrfalcon Technology Inc. (GTI), where he is driving the company’s advanced technologies in the convergence of deep learning, AI edge computing, and visual data analysis. He is also serving as co-chair of the emerging MPEG Video Coding for Machines (VCM) standard.  Prior to joining GTI, Dr. Rafie held executive and senior technical roles in various startups and large companies, including VP of Access Products at Exalt Wireless, Group Director and fellow-track positions at Cadence Design Systems, and adjunct professor at UC Berkeley. He has over 90 publications and has served as chairman, lecturer, and editor for a number of technical conferences and professional associations worldwide.


The Quest for Bugs: Dilemmas of Hardware Verification
by Bryan Dickman on 02-28-2021 at 6:00 am

Functional Verification for complex ASICs or IP-Core products is a resource limited ‘quest’ to find as many bugs as possible before tape-out or release. It can be a long, difficult and costly search that is constrained by cost, time and quality. The search space is practically infinite, and 100% exhaustive verification is an unrealistic and non-tractable problem. The goal is to deliver the highest possible quality (measured by both product performance and absence of bugs), achieving that in the shortest possible time (in order to maximise revenue), with the lowest possible costs.

Complexity continuously increases, and the functional verification challenge gets progressively harder. The search gets longer, and the bugs become increasingly more difficult to find. In practice, some bugs will be missed because verification is inherently imperfect and non-exhaustive. How do you find most of the bugs? How do you find all of the critical bugs? In “The Origin of Bugs” we asserted that:

Verification is a resource limited ‘quest’ to find as many bugs as possible before shipping.

Terms of Reference

So, what makes verification more challenging than other aspects of engineering when developing complex semiconductor products? Take other design workflows of semiconductor development, such as RTL synthesis, timing analysis, place and route, power analysis and sign-off checks.  As a rule, these workflows tend to be reasonably deterministic. They may consume significant engineering resources (skilled people and compute), but there is usually a well-defined process to converge or iterate towards a result. When you get there, there is a measurable degree of completeness in the results. We’re generalizing of course. There are challenges and uncertainties with these workflows, but with verification the $64,000 question is always “Am I done yet?”. We don’t believe we can answer the verification sign-off question with the same degree of certainty as the design implementation sign-off. We also know that the consequences of missed critical bugs can be potentially catastrophic. See “The Cost of Bugs” for a more detailed discussion on the cost of “not finding bugs”.

We characterize this verification uncertainty as the following set of dilemmas that need to be carefully navigated: completeness, complexity, constrained-random, resources and delivery.

The Completeness Dilemma

Verification is not an exact science. There is no way to achieve completeness or perfection. You have to decide when you have done enough work that you are satisfied that the risk of critical bugs remaining in the design is acceptably and manageably low.

Verification is an exercise in risk mitigation.

The problem comes down to the impossibly large state space in almost all modern ASICs. It’s a non-tractable problem. You might also think of this as the

“impossibly large state-space dilemma!”

This is also related to the complexity-dilemma which we will discuss in the next section.

Some sense of completeness can be achieved when verification targets have been met. Verification becomes a data-driven quest; it’s all about metrics and analysis, with verification sign-off driven by the assessment of metrics such as coverage, test-pass rates, cycles since last bug, cycles since last RTL change, formal properties proven, bug rates, etc. All of these sign-off criteria should be captured in the “test plan” and reviewed, revised and approved. You’re looking for a convincing absence of bugs despite measurable efforts to find further bugs in an RTL code base that has been stable for an acceptable period of time. You can’t be too complacent. The bugs are still in there. Your efforts so far have failed to find them!

All of your verification efforts will suffer from the completeness dilemma.

  • Take test planning, which is the normal starting point. The process to develop the test plan is one of design and specification analysis, brainstorming, reviewing, and iterative refinement. It’s probably the most important step; sometimes undertaken at the beginning of the project, sometimes evolving over time in a more agile fashion. Either way, it is impossible to develop complete test plans. Late found bugs can usually be traced back to a shortfall in the test plan, maybe a missing corner case, or even a gap where a whole class of behaviors have been omitted from the test plan.
  • Take coverage, which is commonly relied upon as a good measure of the completeness of verification stimulus. Code coverage does give some sense of absolute measure in that every line of RTL code and every branch and expression has been visited during testing. However, it does not inform us that functionality is correct, or about missing lines of code i.e., missing functionality. We know that code coverage is just one measure. Commonly teams will be using functional coverage to measure stimulus in terms of its ability to exercise functional behaviors observed as events or sequences or more complex combinations and crosses of coverage. These functional coverage points collectively form the coverage model, but how do you know that the coverage model is complete? Like test planning, it is another “best-effort” exercise that relies on a process of analysis and review, refinement and iteration. You may still omit an important functional coverage point which (had it been present) would have exposed a critical gap in your stimulus that was hiding a critical bug. Best to always assume,

If it’s not tested, it’s broken!

  • The same applies to most other aspects of verification. Even formal verification, where you might think there is some promise of completeness (by exhaustively proving properties) is an incomplete science. Many properties cannot be exhaustively proven, only shown to not fail for a given depth of cycles (bounded proofs). You have the same issue with your formal properties as you do with functional coverage and stimulus – how do you know for certain that you have planned and implemented a complete set of properties?
  • Ditto for system-level verification/system validation. You will not know that your testing payloads are complete and that they will be able to find all of the remaining bugs that were missed by previous levels of verification.

Sadly, some bugs will be missed.

Those hidden bugs might never present in the field over the lifetime of the product, meaning that the product is fit for purpose and therefore meets its requirements. However, you can’t be certain of what code from the software ecosystem will eventually run on your system. Software development tools may change over time and may produce different code sequences that were not present in the testing payloads at the time of verification sign-off. All of this runs the risk that eventually a previously unseen dark corner-case will be encountered with unpredictable consequences, potentially occurring for the first time in the field.  If you are lucky, there may be a viable and deployable software mitigation that does not significantly degrade performance or function, in which case, you got away with it! If you are unlucky and no such software mitigation is possible, you may be looking at a hardware mitigation or a costly product update. Again, see “The Cost of Bugs” for a more in-depth discussion about cost impacts of hardware bugs.

So, this lack of completeness is a dilemma for the developer of complex ASICs or IP-Cores. It is something to be constantly aware of and something to be considered at length when reasoning about the age-old question,

“Am I done yet?”

The Complexity Dilemma

The key causes of the completeness dilemma are the “impossibly large state-space” dilemma and the complexity dilemma. ASIC or IP-Core products only ever get more complex, they seldom deliver more capabilities by being simpler. We’re talking about complex hardware components or sub-systems such as CPUs, GPUs, ML/AI processors, integrated into multi-core SoCs and ASICs that may be multi-billion gate devices. For verification, it’s a good result if tools, methodologies and platform performances and capacities can at least keep up with complexity growth, let alone getting ahead of it. That’s the dilemma. Engineering teams need to understand complexity and curtail it wherever they can, but at the same time complexity is necessary to achieve performance, capabilities and ultimately, competitive advantage.  Complexity that is introduced to achieve performance and functional goals, is often harder to contain, and design teams are always innovating new architectures and microarchitectures that will set their product apart from the competition in terms of performance and functional capabilities.

Furthermore, complexity is not something that you may have set out to achieve. A once clean and elegant design can degrade over time into something containing a lot of technical debt, as the code is iterated and reworked, optimized and re-factored, potentially by multiple individuals over the lifetime of the development.

When the code author admits that they really no longer have a complete understanding, it’s time to panic!

Think about strategies to contain design complexity wherever possible. Reduce it by refactoring code, purging redundant code, and maintaining code readability and maintainability. Investigate suitable metrics to measure complexity if possible.

The Constrained-Random Dilemma

Over recent decades, constrained-random verification methodologies have become the norm. Given our understanding of the completeness dilemma (which means we acknowledge that it is impossible to identify all possible testing scenarios) random testing offers a way to find scenarios we had not specified thanks to “random chance”. The probabilities of hitting these unknown scenarios are increased by volume of testing.

If we run enough cycles, we will eventually hit it… probably, we hope!

But this philosophy has hugely driven up verification platform costs, and oftentimes we don’t really have a good understanding of how effective all those random cycles are at finding these unknown-unknowns. Of course, there is more to it than that. This is not a fully random strategy, it is ‘constrained-random’. We identify constraints for the stimulus generators to guide stimulus into specific areas of interest. We then use ‘coverage’ methods to measure the effect of the stimulus, and stimulus-generators and coverage models are refined over time. However, this strategy eventually leads to a saturation point, where we are no longer finding new bugs and are now running ‘soak’ cycles to build confidence and assurances that a “respectable” (you have to decide what that is) volume of bug-free testing has been achieved. Determining what this safe assurance level is can be difficult and you may be required to justify exceedingly large engineering costs based on your judged targets and analytics.

How do you know if you have sufficient constraints? Over-constraining means you might be missing some key areas of stimulus where bugs could be hiding. You might not realize that because your coverage model is not complete either! Some bugs may require pathological sequences that are just too improbable for the generator to produce. If you can identify these cases, you might be able to program them into your generator – but that requires you to realize that these cases exist. Shared learning from experience and historical projects can really help here.
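As a toy illustration of that saturation effect (my own sketch, not a real testbench; a production flow would use a constrained-random HDL/UVM environment and a proper coverage model), the loop below generates random stimulus under simple constraints and counts how quickly new coverage bins stop appearing.

import random

# Toy constrained-random illustration: stimulus is a (burst_length, offset)
# pair generated under simple constraints, and "functional coverage" is just
# the set of distinct pairs seen so far. New bins arrive quickly at first,
# then the run saturates and mostly re-hits bins it has already covered.

random.seed(0)

def gen_stimulus():
    burst = random.choice([1, 2, 4, 8, 16])       # constraint: legal burst lengths
    offset = random.randrange(0, 4096, 4)         # constraint: 4-byte-aligned offsets
    return (burst, offset)

coverage_bins = set()
new_bins_per_block = []
for _ in range(10):                               # ten blocks of 1,000 random tests
    new_hits = 0
    for _ in range(1000):
        bin_id = gen_stimulus()
        if bin_id not in coverage_bins:
            coverage_bins.add(bin_id)
            new_hits += 1
    new_bins_per_block.append(new_hits)

total_bins = 5 * (4096 // 4)                      # size of this tiny coverage model
print("new bins hit per 1,000 tests:", new_bins_per_block)
print(f"covered {len(coverage_bins)} of {total_bins} bins")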

Constrained-random suffers from the completeness dilemma and the resources dilemma.

The Resources Dilemma

How do you deliver the product on-time and to the right quality level?

You had better make sure that you are using the available resources in the most efficient and effective way possible.

At the risk of repeating ourselves, let’s say it again anyway…

Verification is a resource-limited quest to find as many bugs as possible before release.

Resources are always finite and limited, regardless of whether your resources are on-prem or in the cloud. On-prem implies investment costs to establish infrastructure and development tool capacity, and then ongoing operational costs to operate and maintain the platforms. Cloud implies that there are some cost constraints or budget parameters that you have to operate within. If your capacity demand changes, there will be additional costs of acquisition and commissioning, but there is likely to be an availability lag as it inevitably takes time to expand on-prem capacities. If you are already using cloud to provision your infrastructure, the availability lag may not be there, but the incremental usage costs will be.

Let’s not forget the human resources side of this equation. Team sizes can flex, but people costs are the biggest resource cost in general, and you have to deliver your product within the constraints of the available team. Make sure that your teams have the right skills profile, the best tools, and are well-motivated and engaged, because staff turnover can be one of the most disruptive things to occur mid-project.

Engineering teams have to use the resources that are available to them, in the most effective and efficient way, to achieve the best verification results possible within these constraints.

Sometimes these constraints help to drive engineering innovation to improve the efficiency and effectiveness of verification workflows. How do we achieve better Quality of Results (QoR), from the same or less resources, in the same or less time, thus reducing the cost of verification and increasing product ROI? In our experience, engineers love to innovate, so direct them towards these challenging problems. After all,

Scarcity of resources, drives innovation.

Optimization is a never-ending quest that requires you to measure everything.  Optimize your workflows; optimize your tools and select the ones with the best performance; profile and optimize your testbenches and verification code to make them run as efficiently as possible.

Securing resources for your project is oftentimes a process of negotiation. Good resource forecasting is essential to ensure you have planned ahead for the resource demand, but these forecasts need to be reviewed and refined throughout the project. If you are competing for shared resources, human behavior can lead to negotiation tactics, e.g., figure out what you think you need and add a buffer or sandbag your estimates by an amount that you think you will be negotiated back down by! Forecasting really needs to be a transparent and data-driven process where predictions are accurate and based on best-practice analytics.

Conclusion – the Delivery Dilemma

Finally, you have to deliver your product on time and on cost. Endlessly polishing the product beyond the point where the documented goals and sign-off criteria are met will erode the product ROI and, left unchecked, can destroy it. Remember…

Perfection is the enemy of success.

The delivery dilemma can lead to some tough calls on occasions. It’s a matter of risk management. This is where you have to be very clear about what your sign-off criteria are and how you are measuring up against them. You started with good intentions and a comprehensive test plan, but now you need to assess status. Look at all the data and make a judgement call on the remaining risk.  You can think of this signoff in terms of “must do”, “should do”, “could do”, and

“Things that will help you to sleep better at night!”

By the time you get here you have probably achieved all of the first three items and are making a judgment call on the fourth. Consider the following:

  • Delaying the final release will block resources; people and infrastructure, that are needed to execute on the delivery of other revenue bearing projects and the overall business roadmaps.
  • Delaying the final release will increase the product cost and erode ROI.
  • Delaying the final release will have a downstream impact on the customer’s schedules, which in turn impacts their ROI (and potentially your future opportunities).

Get this wrong however, and you might incur substantial rework costs, opportunity costs, and reputational costs, as a consequence of an impactful bug!

Having made the release, there is still a window of opportunity where you could continue to make marginal improvements to the verification so that any new bugs can be intercepted and mitigated before the product is widely deployed into the field. As verification engineers, we know that some level of extended background verification can be a good investment of engineering resources, especially if we are still in a pre-silicon situation.

The real challenge is in deciding when to stop!

Although this paper does not prescribe solutions to these dilemmas, having an understanding of them can help in navigating good verification choices.

Happy sailing!

Also see: The Quest for Bugs Landing Page


Podcast EP9: Why Fabs and Fabless Matter
by Daniel Nenni on 02-26-2021 at 10:00 am

Dan and Mike are joined by Ray Zinn, the longest serving CEO in Silicon Valley. Join us for a tour of the very beginnings of the semiconductor industry and the rise of the fabless movement. Beyond a historical perspective, Ray also discusses the importance of semiconductor technology and its impact on the world, governments and the US in particular.

“As the longest serving CEO in Silicon Valley, it is now my mission to enlighten, encourage and guide the next generation of entrepreneurs.”

My web: http://toughthingsfirst.com/
My blog: http://toughthingsfirst.com/blog/
My podcast: http://toughthingsfirst.com/tough-things-first-podcast/

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.