
TSMC 2022 Technology Symposium Review – Advanced Packaging Development

by Tom Dillinger on 06-27-2022 at 6:00 am


TSMC recently held their annual Technology Symposium in Santa Clara, CA.  The presentations provide a comprehensive overview of their technology status and upcoming roadmap, covering all facets of the process technology and advanced packaging development.  This article will summarize the highlights of the advanced packaging technology presentations – a previous article covered the process technology area.

General

TSMC has merged their 2.5D and 3D packaging offerings into a single brand – “3D Fabric”.  The expectations are that there will be future customers that pursue both options to provide dense, heterogeneous integration of system-level functionality – e.g., both “front-end” 3D vertical assembly, combined with “back-end” 2.5D integration.

Technically, the 2.5D integration of SoCs with “3D” high-bandwidth memory HBM stacks is already a combined offering.  As illustrated above, TSMC is envisioning a much richer mix of topologies in the future, combining 3D SoIC with 2.5D CoWoS/InFO as part of very complex heterogeneous system designs.

As with the process technology presentations at the Symposium, the packaging technology updates were pretty straightforward – an indication of successful, ongoing roadmap execution.  There were a couple of specific areas representing new directions that will be highlighted below.

Of particular note is the TSMC investment in an Advanced System Integration fab, which will support the 3D Fabric offerings, providing full assembly and test manufacturing capabilities.

2.5D packaging

There are two classes of 2.5D packaging technologies – “chip-on-wafer-on-substrate” (CoWoS) and “integrated fanout” (InFO).

(Note that in the figure above, some of the InFO offerings are denoted by TSMC as “2D”.)

The key initiative for both these technologies is to continue to expand the maximum package size, to enable a larger number of die (and HBM stacks) to be integrated.  As an example, the fabrication of the interconnect layers on a silicon interposer (CoWoS-S) requires “stitching” multiple lithographic exposures – the goal is to increase the interposer size in terms of multiples of the maximum reticle dimensions.

  • CoWoS

CoWoS has expanded to offer three different interposer technologies (the “wafer” in CoWoS):

  • CoWoS-S
    • uses a silicon interposer, based on existing silicon wafer lithographic and redistribution layer processing
    • in volume production since 2012, >100 products for 20+ customers to date
    • the interposer integrates embedded “trench” capacitors
    • 3X max reticle size in development – to support a design configuration with 2 large SoCs and 8 HBM3 memory stacks, with eDTC1100 (1100nF/mm**2)
  • CoWoS-R
    • uses an organic interposer for reduced cost
    • up to 6 redistribution layers of interconnect, 2um/2um L/S
    • 2.1X reticle size supporting one SoC with 2 HBM2 stacks in a 55mmX55mm package; 4X reticle size in development, with 2 SoCs and 2 HBM2 stacks in an 85mmX85mm package
  • CoWoS-L
    • uses a small silicon “bridge” inserted into an organic interposer, for high density interconnects between adjacent die edges (0.4um/0.4um L/S pitch)
    • 2X reticle size supports 2 SoCs with 6 HBM2 stacks (2023); 4X reticle size in development to support 12 HBM3 stacks (2024)
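
To give a sense of scale for the reticle multiples listed above, here is a rough sketch. It assumes the commonly quoted 26mm x 33mm maximum scanner exposure field, a value that is not stated in the presentation summary, and the function name is purely illustrative.

```python
# Assumption: one "reticle" corresponds to a 26mm x 33mm exposure field.
RETICLE_W_MM = 26.0
RETICLE_H_MM = 33.0
RETICLE_AREA_MM2 = RETICLE_W_MM * RETICLE_H_MM  # 858 mm^2


def interposer_area_mm2(reticle_multiple: float) -> float:
    """Approximate interposer area for a given multiple of the reticle field."""
    return reticle_multiple * RETICLE_AREA_MM2


# Reticle multiples quoted in the list above.
configs = [
    ("CoWoS-S, in development", 3.0),
    ("CoWoS-R, current", 2.1),
    ("CoWoS-R, in development", 4.0),
    ("CoWoS-L, 2023", 2.0),
    ("CoWoS-L, 2024", 4.0),
]

for name, multiple in configs:
    area = interposer_area_mm2(multiple)
    print(f"{name}: {multiple}X reticle is roughly {area:.0f} mm^2")
```

Under this assumption, a 4X interposer is on the order of 3,400 mm**2, which is consistent with it needing a package as large as 85mm x 85mm.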

TSMC highlighted that they are working with the HBM standards group on the physical configuration of HBM3 interconnect requirements for CoWoS implementations.  (The HBM3 standard appears to have settled on the following for the stack definition:  capacity of 4GB w/four 8Gb die to 64GB w/sixteen 32Gb die; 1024-bit signal interface; up to 819GBps bandwidth.) These upcoming CoWoS configurations with multiple HBM3 stacks would provide tremendous memory capacity and bandwidth.
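
The HBM3 figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes a per-pin data rate of 6.4 Gb/s (the HBM3 maximum, which is not stated in the text); the function names are illustrative only.

```python
def stack_capacity_gbytes(num_die: int, die_density_gbit: int) -> float:
    """Capacity of one HBM stack in gigabytes (8 bits per byte)."""
    return num_die * die_density_gbit / 8


def stack_bandwidth_gbytes_per_s(interface_bits: int, pin_rate_gbit_s: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s."""
    return interface_bits * pin_rate_gbit_s / 8


# Assumption: 6.4 Gb/s per pin, the maximum HBM3 data rate.
PIN_RATE_GBIT_S = 6.4

min_cap = stack_capacity_gbytes(4, 8)     # four 8Gb die
max_cap = stack_capacity_gbytes(16, 32)   # sixteen 32Gb die
bw = stack_bandwidth_gbytes_per_s(1024, PIN_RATE_GBIT_S)

print(f"capacity per stack: {min_cap:.0f} GB to {max_cap:.0f} GB")
print(f"bandwidth per stack: {bw:.1f} GB/s")

# Aggregate upper bounds for the 8-stack and 12-stack CoWoS configurations.
for stacks in (8, 12):
    total_cap = stacks * max_cap
    total_bw_tb = stacks * bw / 1000
    print(f"{stacks} stacks: up to {total_cap:.0f} GB and {total_bw_tb:.2f} TB/s")
```

The per-stack results reproduce the 4GB, 64GB, and roughly 819GBps values in the text; the multi-stack totals are upper bounds that assume every stack is at maximum capacity and data rate.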

Also, in anticipation of much greater power dissipation in upcoming CoWoS designs, TSMC is working on appropriate cooling solutions, both improved thermal-interface-materials (TIM) between die and package, as well as transitioning from air to immersion cooling.

  • InFO

After accurate (face-down) placement orientation on a temporary carrier, die are encapsulated in an epoxy “wafer”.  Redistribution interconnect layers are added to the reconstituted wafer surface.  The package bumps are then connected directly to the redistribution layers.

There are InFO_PoP, InFO_oS, and InFO_B topologies.

As shown below, InFO_PoP denotes a package-on-package configuration, and is focused on integration of a DRAM package with a base logic die.  The bumps on the DRAM top die utilize through-InFO vias (TIV) to reach the redistribution layers.

  • InFO_PoP primarily for the mobile platform
  • over 1.2B units shipped since 2016

An issue with the InFO_PoP implementation is that currently the DRAM package is a custom design, and only able to be fabricated at TSMC.  There is an alternative InFO_B topology in development, where an existing (LPDDR) DRAM package is added on top, with assembly to be provided by an external contract manufacturer.

InFO_oS (on-substrate) enables multiple die to be encapsulated, with the redistribution layers and their microbumps connected to a substrate with TSVs.

  • in production for over 5 years, focus is on HPC customers
  • 5 RDL layers on the substrate, with 2um/2um L/S
  • the substrate enables a large package footprint, currently at 110mm X 110mm with plans for greater sizes
  • 130um C4 bump pitch

As depicted above, InFO_M is an alternative to InFO_oS, with multiple encapsulated die and redistribution layers, without the additional substrate + TSVs (< 500mm**2 package, production in 2H2022).

3D packaging

InFO-3D

There is a 3D stacked package technology that utilizes micro-bumped die integrated vertically with redistribution layers and TIVs, focused on the mobile platform.

3D SoIC

The more advanced vertical-die stacked 3D topology packaging family is denoted as “system-on-integrated chips” (SoIC).  It utilizes direct Cu bonding between the die, at an aggressive pitch.

There are two SoIC offerings – “wafer-on-wafer” (WOW) and “chip-on-wafer” (COW).  The WOW topology integrates a complex SoC die on a wafer providing deep trench capacitor (DTC) structures for optimal decoupling.  The more general COW topology stacks multiple SoC die.

The process technologies qualified for SoIC assembly are shown in the table below.

Design Enablement for 3DFabric, including 3Dblox

As illustrated in the upper right corner of the 3D Fabric image above, TSMC is envisioning complex system design-in-package implementations, combining both 3D SoIC and 2.5D technologies.

The resulting complexity in the design flow is great, as highlighted above, with advanced thermal, timing, and SI/PI analysis flows required (which can also deal with the model data volume).

To enable the development of these system-level designs, TSMC has collaborated with EDA vendors on three major design flow initiatives:

  • improved thermal analysis, using a coarse-grained plus fine-grained approach
  • hierarchical static timing analysis
    • individual die are represented by an abstracted model, to reduce the total (multi-corner) data analysis complexity 
  • front-end design partitioning

To help accelerate the front-end design partitioning of a complex system, TSMC has pursued an initiative denoted as “3Dblox”.

The goal is to break down the overall physical package system into modular components, which are then integrated.  The module categories are:

  • bumps/bonds
  • vias
  • caps
  • interposers
  • die

These modules would be incorporated into any of the SoIC, CoWoS, or InFO package technologies.
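
The modular decomposition described above can be pictured as a small data model. The sketch below is purely illustrative: the class and field names are hypothetical and do not represent the actual 3Dblox format or any EDA tool API. It only shows the idea of composing a package from the five module categories.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ModuleKind(Enum):
    """The five module categories listed in the text."""
    BUMP_BOND = "bumps/bonds"
    VIA = "vias"
    CAP = "caps"
    INTERPOSER = "interposers"
    DIE = "die"


class PackageTech(Enum):
    """Package technologies the modules may be incorporated into."""
    SOIC = "SoIC"
    COWOS = "CoWoS"
    INFO = "InFO"


@dataclass
class Module:
    name: str          # hypothetical instance name
    kind: ModuleKind


@dataclass
class PackageDesign:
    tech: PackageTech
    modules: List[Module] = field(default_factory=list)

    def add(self, module: Module) -> "PackageDesign":
        self.modules.append(module)
        return self

    def count_by_kind(self) -> Dict[ModuleKind, int]:
        """Tally how many modules of each category the design contains."""
        counts: Dict[ModuleKind, int] = {}
        for module in self.modules:
            counts[module.kind] = counts.get(module.kind, 0) + 1
        return counts


# Hypothetical example: two die on one interposer in a CoWoS package.
design = PackageDesign(PackageTech.COWOS)
design.add(Module("soc0", ModuleKind.DIE))
design.add(Module("hbm0", ModuleKind.DIE))
design.add(Module("si_interposer", ModuleKind.INTERPOSER))

for kind, count in design.count_by_kind().items():
    print(f"{kind.value}: {count}")
```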

Of specific note is that TSMC is driving an effort to enable 3D Fabric designs to use various EDA tools – that is, to complete physical design with one EDA vendor tool, and (potentially) use different EDA vendor products for timing analysis, signal integrity/power integrity analysis, and thermal analysis.

3Dblox appears to take the concept of “reference flows” for SoCs to a new level, with TSMC driving interoperability between EDA vendor data models and formats.  The overall 3Dblox flow capability will be available in 3Q2022.  (A preliminary step – i.e., automated routing of redistribution signals on InFO – will be the first feature released.)

Clearly, TSMC is investing extensively in advanced packaging technology development and (especially) new fabrication facilities, due to the anticipated growth in both 2.5D and 3D configurations.  The transition from HBM2/2e to HBM3 memory stacks will result in considerable performance benefits to system designs utilizing CoWoS 2.5D technology.  Mobile platform customers will expand the diversity of InFO multi-die designs.  The adoption of complex 3DFabric designs combining both 3D and 2.5D technologies will no doubt increase, as well, leveraging TSMC’s efforts to “modularize” the design elements to accelerate system partitioning, as well as their efforts to enable a broad set of EDA tools/flows to be applied.

-chipguy

Also Read:

TSMC 2022 Technology Symposium Review – Process Technology Development

Three Key Takeaways from the 2022 TSMC Technical Symposium!

Inverse Lithography Technology – A Status Update from TSMC


Multiphysics, Multivariate Analysis: An Imperative for Today’s 3D-IC Designs

by Daniel Nenni on 06-26-2022 at 6:00 am


Semiconductor manufacturers are under constantly increasing and intense pressure to accelerate innovative new chip designs to market faster than ever in smaller package sizes while assuring signal integrity and reducing power consumption. Three-dimensional integrated circuits (3D-ICs) promise to answer all these demands but, at the same time, they introduce a new level of design complexity that is challenging traditional tools and processes.

Manufactured by stacking dies and interconnecting them so they perform as a single device, 3D-ICs create new risks, including thermal build-up caused by greater density. Because they’re significantly larger than a typical system-on-a-chip (SoC), with longer interconnects, they need to be rigorously tested for faulty integration points as well as system-level failures.

However, most semiconductor development teams simply aren’t equipped to manage the difficult job of 3D-IC analysis and design validation. They’re burdened by a historical approach to SoC simulation that relies on a serial, step-by-step process, in which single-physics simulation tools are applied one by one. When engineers apply these disparate tools, and this serial process, to complex 3D-IC designs, they miss system-level interactions, connection points, consolidated thermal effects and other considerations, leaving room for something to go seriously wrong.

As 3D-ICs become more common for advanced semiconductor applications, engineering teams need a new analytic approach that’s equally innovative. They need a single, open and proven platform to conduct concurrent, multivariate simulation and analysis across the entire product design. They need to consider multiple physics, quickly and simultaneously, at both the component and system levels.

An Open Platform for Optimizing Every Performance Aspect

Ansys’ industry-leading solutions for 3D-IC simulation and analysis provide engineering teams with best-in-class capabilities for optimizing every aspect of performance, including power integrity, reliability, electromagnetics (EM), thermal, computational fluid dynamics (CFD) and mechanical stress.

This is a resistance heatmap of a chip-package system with pin resolution. The IR drop map and electromigration map can also be generated for power integrity and reliability sign-off of a 3D-IC system.

The comprehensive Ansys toolkit positions semiconductor engineering teams to assess stand-alone performance aspects like thermal conductivity, while simultaneously looking at every other critical metric. The entire 3D-IC design can be subjected to realistic operating conditions as an integrated system, beginning at the earliest design stage.

This is a temperature contour map of a chip-package system with die, interposer and package modeled by RHSC ET. The nodal temperature at each layer of the system can be displayed to identify the hotspot location for applying thermal integrity solutions.

Ansys provides a unified 3D-IC simulation platform incorporating our best-in-class solutions. For example, Ansys RedHawk-SC Electrothermal can be leveraged to verify the thermal hotspots, melting risk, and local failure modes of each welding site, based on the electrical current load at that specific point. Ansys CFD capabilities can optimize the performance of fans and heat sinks as they generate airflows to cool the assembly. Ansys solutions can also analyze advanced performance aspects, such as low-frequency power oscillations, and predict their impact on the larger design.

Shown here is an analysis of mechanical stress/warpage, as well as thermal gradients, in a 3D-IC multi-die package. This is an example of a complex real-world problem that can only be solved quickly via a multiphysics, multivariate approach.

Not only does Ansys address all of these individual engineering challenges through best-in-class solvers, but it equips semiconductor development teams to conduct these studies simultaneously. Only Ansys supports this type of multiphysics, multivariate, concurrent approach that reveals critical design trade-offs at the system level, rapidly and at an early stage.

In Your Rush to Market, Don’t Shortchange Your Analysis

Faced with worldwide chip shortages, increasing performance demands, a lack of engineering talent, and an urgent need for low-cost innovation, semiconductor manufacturers may be tempted to focus on their existing set of serial processes and isolated, single-physics simulation tools. But these outdated methods are insufficient to capture the complexity of 3D-IC designs and their equally complex failure risks.

Different engineers, using different simulation and analysis tools, may actually work at cross-purposes. For instance, one team’s efforts to resolve a signal-integrity issue might inadvertently create a timing failure or thermal risk that needs to be resolved by another team — which then hands it back to the signal-integrity team. The result? The dreaded ping pong effect with costly delays, resource-intensive handoffs, and significant rework.

In contrast, the robust and comprehensive Ansys simulation platform supports synergistic, collaborative, and cross-functional analysis. It’s fast and intuitive for the multidisciplinary design team to look at the holistic 3D-IC design and concurrently analyze novel physics, to optimize performance aspects from electrical reliability to mechanical and thermal stability.

Collaboration-Driven Innovation: The Wave of the Future

The world’s semiconductor leaders are realizing that true 3D-IC innovation requires a new level of collaboration and vertical integration. Only by removing traditional functional boundaries — and eliminating a single-physics, serial approach — can development teams accelerate the design cycle, drive down costs, and produce game-changing new performance innovations.

Ansys’ open, extensible, and powerful simulation platform for multiphysics 3D-IC simulation is purpose-built to realize this vision. By leveraging a unified platform with proven, best-in-class solutions for multiphysics and multiscale analysis, semi development teams can launch new designs quickly and collaboratively, without sacrificing analytic rigor or product confidence. Costly handoffs and rework are reduced as the cross-functional team shares the same understanding of performance trade-offs and ultimate goals.

While it can be difficult to break down cultural and organizational barriers to collaboration and vertical integration, the rewards are well worth it, including faster time-to-market and higher levels of innovation. Replacing sequential analysis and a disparate toolkit with the Ansys platform to support concurrent, multiphysics, system-level design simulations is a critical first step.

Visit Ansys at DAC 2022

If you’re hoping to fully capitalize on the incredible promise of 3D-IC designs, you owe it to yourself to learn more about the Ansys platform for multivariate, multiphysics simulation. Visit Ansys at Booth #1539 at the Design Automation Conference (DAC), in San Francisco July 11-14. Request a meeting or product demo now to start supporting a new level of 3D-IC design optimization.

Also read:

A Different Perspective: Ansys’ View on the Central Issues Driving EDA Today

Unlock first-time-right complex photonic integrated circuits

Take a Leap of Certainty at DAC 2022


The Evolution of Taiwan’s Silicon Shield

by Craig Addison on 06-25-2022 at 6:00 am


The original Silicon Shield theory, as described in my 2001 book, stated that Taiwan’s role as producer of 90 per cent of the world’s IT products (at that time) protected it from an attack by China because the United States, acting in its own self interest, would come to the island’s defense. A similar scenario – involving oil, not electronics – occurred in 1990 when the US intervened after Iraq invaded Kuwait.

Fast forward a decade after the book, and much of Taiwan’s electronics production, including laptops and mobile phones, had moved to China – although it was still controlled by Taiwanese-owned companies like Compal, Foxconn and Quanta. (The transfer of Taiwanese chip technology to China was restricted, and still is.)

The 2009 Silicon Shield documentary reflected this shift by arguing that China would refrain from attacking Taiwan because of the harm it would inflict upon itself. In other words, a Cold War-style mutually assured destruction (MAD) scenario would keep the peace.

So what is the Silicon Shield today?

Both of the above still apply in their own way, but some pundits now believe the Silicon Shield may even increase the risk of Taiwan being forcibly taken by China. The “Broken Nest” theory states that Taiwan should adopt a scorched earth policy and destroy TSMC et al in the event of an attack, thus reducing the island’s value to the invaders.

While the Broken Nest has its fair share of critics, a similar scenario was foreshadowed by one of the people interviewed for the 2009 documentary. Chih-Yuan Lu, former head of Taiwan’s Submicron Project and since then president of Macronix International, said Taiwan’s semiconductor industry could be compared to jade, the precious mineral valued by the Chinese.

“If you have valuable jade in your pocket and you cannot defend yourself, there are many robbers who will target you,” Lu said at the time. In the case of two parties fighting over ownership, “at the last moment they even want to break the jade” to prevent the other from having it, he explained.

Since the outbreak of the pandemic, semiconductors have been elevated from relative obscurity to an industry of keen interest to mainstream media and the general public. The same goes for Taiwan and its role in the hi-tech supply chain. These developments motivated me to revive the original Silicon Shield documentary for a new audience.

The result is “Silicon Shield 2025” – the year being a reference to the date Taiwan’s defense minister believes China will have the ability to invade. The new version, available for streaming on Vimeo On Demand, uses the same voice-over narration and video interviews from the 2009 production, but the content has been digitally remastered and updated with HD b-roll footage as well as new material to reflect recent events. Indeed, it is remarkable how much of the original documentary narrative from 13 years ago is relevant today, perhaps more so.

SemiWiki members choosing the “rent” option on Vimeo On Demand can watch “Silicon Shield 2025” free of charge by using the promo code CHIPS, which is valid until July 25.

In addition, be sure to check out The Chip Warriors podcast – the most recent episode being on Taiwan’s Chip Warriors, featuring the above mentioned C.Y. Lu, as well as legends like TSMC founder Morris Chang.

For those interested in how Taiwan got into this situation in the first place – caught between two superpowers – check out the Nixon’s China Choice podcast. Nothing about semiconductors here, but it is a fascinating look into the minds of Nixon, Kissinger and Haldeman as they sought rapprochement with Communist China while trying not to sacrifice Taiwan in the process. Nixon failed in the latter, but that setback – along with the loss of US diplomatic recognition under Carter in 1979 – provided the impetus for Taiwan’s leaders to take the enormous risk of betting their national survival on semiconductors.

Also read:

US Supply Chain Data Request Elicits a Range of Responses, from Tight-Lipped to Uptight

Losing Lithography: How the US Invented, then lost, a Critical Chipmaking Process

Why Tech Tales are Wafer Thin in Hollywood


Podcast EP90: A Tour of Cadence’s Cloud Solutions with Mahesh Turaga

by Daniel Nenni on 06-24-2022 at 10:00 am

Dan is joined by Mahesh Turaga, VP of Cloud Business Development at Cadence Design Systems. Mahesh brings extensive customer-facing experience to Cadence in business development, strategy, pre-sales, and consulting. He provides an overview of the cloud solutions provided by Cadence. The various business models, technical details and target customer profiles are all discussed.

Mahesh holds an MBA from Northwestern University – Kellogg School of Management and a Ph.D. in aeronautics, structural mechanics, composites and fluid dynamics from Purdue University.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


ASML EUV Update at SPIE

by Scotten Jones on 06-24-2022 at 6:00 am


At the 2022 SPIE Advanced Lithography Conference, ASML presented an update on EUV. I recently had a chance to go over the presentations with Mike Lercel of ASML. The following is a summary of our discussions.

0.33 NA

The 0.33 NA EUV systems are the production workhorse systems for leading edge lithography today. 0.33 NA systems are in high volume production for both logic and DRAM. Figure 1 illustrates the number of EUV layers for logic and DRAM (bars) and wafers exposed per year (area). Author’s note: the 2021 values for logic are typical of foundry 5nm processes at 10+ EUV layers, and 2023 logic would be in line with foundry 3nm processes at ~20 layers; DRAM usage is currently ~5 layers. I asked Mike about future DRAM exposures. He pointed out there are ~8 critical layers on a DRAM, and eventually some of those layers could need multi-patterning, bringing EUV exposures up to 10 per wafer.

Figure 1. EUV Adoption.

Through Q1 of 2022 ASML has shipped 136 EUV systems and ~70 million wafers have been exposed, see figure 2.

Figure 2. Number of EUV Wafers Exposed.

System availability continues to improve; it is a little less than 90% today. The new NXE:3600D is better than the NXE:3400C and provides ~93% availability. EUV system availability is getting close to DUV system levels (~95%).

Figure 3. Availability.

NXE:3600D systems can produce 160 wafers per hour (wph) at 30mJ/cm2, 18% better than the NXE:3400C. The NXE:3800E systems in development will provide >195 wph at 30mJ/cm2 initially, and 220 wph with throughput upgrades. The NXE:3800E will have incremental optical improvements in aberration, overlay and throughput.
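
The throughput and availability numbers above can be combined into a quick back-of-the-envelope estimate. The sketch below is a rough upper bound only: it assumes the tool runs at its rated wph whenever it is available, which ignores dose changes, lot scheduling, and other fab effects. The function names are illustrative.

```python
def implied_baseline_wph(new_wph: float, improvement: float) -> float:
    """Throughput of the older tool implied by a fractional improvement."""
    return new_wph / (1.0 + improvement)


def max_wafers_per_day(wph: float, availability: float) -> float:
    """Upper bound on daily wafer exposures at rated throughput."""
    return wph * 24 * availability


# 160 wph quoted as 18% better than the NXE:3400C.
nxe3400c_wph = implied_baseline_wph(160, 0.18)
print(f"implied NXE:3400C throughput: {nxe3400c_wph:.1f} wph")

# NXE:3600D at the ~93% availability quoted in the text.
print(f"NXE:3600D upper bound: {max_wafers_per_day(160, 0.93):.0f} exposures/day")

# Same availability assumed for the future 195 wph and 220 wph figures.
for wph in (195, 220):
    print(f"{wph} wph upper bound: {max_wafers_per_day(wph, 0.93):.0f} exposures/day")
```

Note that each figure counts single-layer exposures, so a wafer with ~20 EUV layers consumes about 20 of them.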

Figure 4. Throughput Improvements.

Matched machine overlay for the NXE:3400C was 1.5nm and is 1.1nm for the NXE:3600D. The NXE:3600D uses the same new 12 wavelength alignment system as the newest DUV systems with just a few material differences due to vacuum use.

The ASML roadmap includes the NXE:4000F with >220wph around 2025, see figure 5.

Figure 5. System Roadmap.

Pellicles now achieve greater than 90% transmission and manufacturing has been transferred to Mitsui. I run into people from time to time who think pellicles are a future EUV item, but pellicles have been in production use on select layers for over a year.

Figure 6. Pellicle Performance.

Finally for the 0.33 NA system ASML is working on reducing the energy required for each exposure by increasing throughput and decreasing total energy.

Figure 7. Energy Per Exposure.

We discussed the ultimate resolution limits for 0.33 NA systems. In theory, 0.33 NA can produce 26nm in a single exposure. Imec is currently working on 28nm single exposure, but it isn’t in production yet.

0.55 NA (High NA)

As described in the previous section 0.33 NA EUV is in high volume production. Leading edge foundry processes have now reached the 3nm “node” and double patterning with 0.33 NA EUV is becoming necessary. By raising the NA from 0.33 to 0.55 double patterned layers can be replaced with single exposures.

Figure 8 illustrates how DUV layer counts grew, driven by process complexity and multipatterning, until 0.33 NA EUV took over, eliminating a lot of multipatterning. As 0.33 NA EUV multipatterning use grows, 0.55 NA EUV can eliminate some multipatterning, reducing layer counts again.

Figure 8. Mask Count Trends.

High-NA provides a better image log slope. Stochastic defects are 3D, and high-NA helps with defect reduction. ASML is working on attenuated phase shift masks for EUV to improve contrast and depth of field. They will be implemented for 0.33 NA first and then 0.55 NA later.

ASML’s roadmap has the first High NA system (EXE:5000) being installed in a lab at the ASML factory run jointly with Imec in 2023 for initial evaluation. EXE:5000 systems should be delivered to customers in 2024 and the production EXE:5200 system should be delivered to customers for production use around 2025, see figure 9.

Figure 9. High-NA System Roadmap.

The optics for High-NA are significantly larger than for 0.33 NA and require a unique design approach. 0.55 NA systems will have an anamorphic lens system with a 4x reduction ratio in one direction (the same as 0.33 NA) and an 8x reduction ratio in the orthogonal direction. Due to the size of the reticle and the 8x reduction, the printable field size is cut in half to 16.5mm in the scan direction, see figure 10.
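
The half-field result follows directly from the reduction ratios. The sketch below assumes a usable mask pattern area of 104mm x 132mm on a standard reticle (a value not given in the text), which yields the familiar 26mm x 33mm full field at 4x in both directions. The die height in the example is hypothetical.

```python
import math

# Assumption: usable pattern area on the reticle, in mm.
MASK_X_MM = 104.0
MASK_Y_MM = 132.0  # scan direction


def wafer_field_mm(reduction_x: float, reduction_y: float) -> tuple:
    """Printable field on the wafer for the given reduction ratios."""
    return MASK_X_MM / reduction_x, MASK_Y_MM / reduction_y


def exposures_needed(die_height_mm: float, field_height_mm: float) -> int:
    """Number of stitched exposures needed along the scan direction."""
    return math.ceil(die_height_mm / field_height_mm)


full_field = wafer_field_mm(4, 4)   # 0.33 NA: 4x in both directions
half_field = wafer_field_mm(4, 8)   # 0.55 NA: 4x by 8x anamorphic

print(f"0.33 NA field: {full_field[0]} x {full_field[1]} mm")
print(f"0.55 NA field: {half_field[0]} x {half_field[1]} mm")

# Hypothetical 25mm-tall die: fits one full field, needs two half fields.
print(exposures_needed(25.0, full_field[1]), exposures_needed(25.0, half_field[1]))
```

Any die taller than 16.5 mm in the scan direction therefore requires stitched half-field exposures on a 0.55 NA system.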

Figure 10. Anamorphic Lens System.

Simulations show no direction differences between a half-field and full field exposure. Half-field exposures can be aligned to full field exposures so that existing DUV and 0.33 NA EUV systems can be used in a mix and match strategy with 0.55 NA systems. If necessary for large die, 0.55 NA half-field exposures can be stitched together, possibly with a small stitch boundary for global connections.

Using research tools at The Center for X-Ray Optics at Berkeley and the Paul Scherrer Institut, ASML has been able to demonstrate High-NA EUV resolution down to 8nm, see figure 11.

Figure 11. 8nm Line/Spaces.
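
The resolution numbers in this article can be related through the Rayleigh criterion, half-pitch = k1 × λ / NA. The sketch below computes the k1 factor each figure implies. It assumes the 26nm and 28nm values quoted for 0.33 NA are pitches (so the half-pitch is half of each), which the text does not state explicitly, and it takes the 8nm line/space result as a half-pitch.

```python
WAVELENGTH_NM = 13.5  # EUV wavelength


def k1_factor(half_pitch_nm: float, na: float, wavelength_nm: float = WAVELENGTH_NM) -> float:
    """Rayleigh k1 implied by a printed half-pitch."""
    return half_pitch_nm * na / wavelength_nm


def half_pitch_nm(k1: float, na: float, wavelength_nm: float = WAVELENGTH_NM) -> float:
    """Half-pitch achievable for a given k1 and numerical aperture."""
    return k1 * wavelength_nm / na


# Assumption: 26nm and 28nm are pitches for 0.33 NA single exposure.
k1_26 = k1_factor(26 / 2, 0.33)
k1_28 = k1_factor(28 / 2, 0.33)
# 8nm lines and spaces demonstrated for High-NA.
k1_8 = k1_factor(8, 0.55)

print(f"0.33 NA, 26nm pitch: k1 = {k1_26:.3f}")
print(f"0.33 NA, 28nm pitch: k1 = {k1_28:.3f}")
print(f"0.55 NA, 8nm half-pitch: k1 = {k1_8:.3f}")

# Scaling the 0.33 NA limit to 0.55 NA at the same k1.
print(f"0.55 NA at k1 = {k1_26:.3f}: {half_pitch_nm(k1_26, 0.55):.1f} nm half-pitch")
```

Under these assumptions the implied k1 values are all in a similar range, which suggests the 8nm demonstration is roughly what one would expect from scaling the 0.33 NA limit by the ratio of the apertures.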

The 0.55 NA system design is broken up into 4 independently testable sub systems (see figure 12) and assembly of the first exposure tool to go into the ASML/Imec lab in 2023 has begun (see figure 13).

Figure 12. High-NA Sub Systems.

Figure 13. 0.55 NA System Integration.

ASML continues to work on increasing source power and has recently demonstrated >500 watts in research. Historically it has taken ~2 years for research developments to reach production. Figure 14 illustrates source power over time.

Figure 14. Source Power Trends.

0.7 NA

In a recent article Tom Dillinger discussed an interview with Mark Phillips of Intel, and Mark mentioned 0.7 NA as a successor to 0.55 NA. I was surprised by this; I thought ASML had ruled out developing anything after 0.55 NA due to the high investments ASML has had to make on EUV. Mike said ASML hasn’t ruled out a 0.7 or greater NA system; they are looking at it. He said they have ruled out shorter wavelengths than the current 13.5nm (author’s note: at one time there was some discussion of a shorter-wavelength 6.x nm system). They do want any new system to be air shippable, which limits how much bigger the system can be than the 0.55 NA systems.

Conclusion

0.33 NA EUV systems are now production workhorse systems with continuously improving availability and throughput. 0.55 NA systems are expected to enter production in 2025 with higher resolution enabling process simplification. Beyond 0.55 NA ASML is looking at even higher NA systems. EUV is well positioned to continue to drive lithography resolution for the next decade.

Also Read:

Obscuration-Induced Pitch Incompatibilities in High-NA EUV Lithography

The Electron Spread Function in EUV Lithography

Double Diffraction in EUV Masks: Seeing Through The Illusion of Symmetry


Using STA with Aging Analysis for Robust IC Designs

by Daniel Payne on 06-23-2022 at 10:00 am


Our laptops and desktop computers have billions of transistors in their application processor chips, yet I often don’t consider the reliability effects of aging that the transistors experience in the chips. At the recent Synopsys User Group (aka SNUG), there was a technical presentation on this topic from Srinivas Bodapati, an engineer at Intel.

Device Aging

As transistors are switched on and off, the drain currents can slowly decrease over time. This in turn changes path delays, making the chip slow down and even fail to meet specifications. Device aging is now a first order problem when designing leading edge processor chips and GPUs. To manage power dissipation, many SoC designs employ Dynamic Voltage Frequency Scaling (DVFS) techniques, yet the stress from running with a high VDD begins to impact circuit operation when in low VDD mode.

Gate Level Aging: Specification Failure

Device aging is dependent on workload, voltage, temperature and frequency, and the two effects that cause transistor performance to shift over time are:

  • Bias Temperature Instability (BTI)
  • Hot Carrier Injection (HCI)

The device aging mechanisms for HCI and BTI are summarized in this table as a function of each factor:

Device Aging Mechanisms

At 14nm the main aging contribution was from BTI, but at 10nm it was from HCI effects. At the same time the End Of Life (EOL) drive currents increased by 1.65X, going from 14nm to 10nm.

Use Condition Problems

With DVFS circuits during the high VDD frequency mode there is stress to the transistors, which then impacts the circuit operation in low VDD mode. The delay of gates can become slower through aging, even to the point of getting out of specification, causing a timing failure.

During Static Timing Analysis (STA), the challenge is to model the workload dependency of aging, and consider that input slope plus output load impact aging. Consider an SoC example where there is a high performance core (PCore), an efficiency core (Ecore), fabric, and system IP blocks. These four types of IP have very different supply voltage ranges, and also temperatures. Trying to use the same static guard band for each IP block would be overly pessimistic for some scenarios, so using an existing aged library cannot really capture all of the various stress scenarios.

Aging for different circuits

 

STA Aging Model Complexity

In the example below there’s a launch path and a capture path, but each path has unique switching activity, so the two paths degrade by different amounts. For each path the effects of both BTI and HCI also need to be taken into account, as aging degradation depends on each.

Launch path, Capture path
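To make the launch versus capture point concrete, here is a minimal sketch of a textbook setup-slack check in which the launch side and the capture side carry different aging derates. The class name, the function name, the delays and the derate values are all made up for illustration; they are not taken from the Intel presentation or from any Synopsys tool, and lumping the data path in with the launch clock is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    """Fresh (unaged) delays in picoseconds for one timing path."""
    launch_clock: float   # clock source to the launching flop
    data: float           # launching flop through logic to the capturing flop
    capture_clock: float  # clock source to the capturing flop
    setup: float          # setup time of the capturing flop


def setup_slack(p: PathDelays, period: float,
                launch_derate: float = 1.0,
                capture_derate: float = 1.0) -> float:
    """Setup slack with separate aging derates on launch and capture sides.

    A derate of 1.05 means that side is 5% slower than when fresh.
    Assumption: the data path ages with the launch-side derate.
    """
    arrival = (p.launch_clock + p.data) * launch_derate
    required = period + p.capture_clock * capture_derate - p.setup
    return required - arrival


path = PathDelays(launch_clock=200.0, data=600.0, capture_clock=200.0, setup=50.0)
fresh = setup_slack(path, period=1000.0)
# Launch side heavily stressed, capture side lightly stressed (made-up derates).
aged_asymmetric = setup_slack(path, 1000.0, launch_derate=1.08, capture_derate=1.02)
# One static guard band applied to both sides.
aged_uniform = setup_slack(path, 1000.0, launch_derate=1.08, capture_derate=1.08)
print(fresh, aged_asymmetric, aged_uniform)
```

In this made-up case the uniform derate reports more setup slack than the asymmetric one, because it slows the capture clock as much as the launch side. The only point is that the answer depends on how each side ages, which is why the switching activity of each path matters.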

Old and New Approaches

The older approach was to use STA with aged libraries and then run path simulations to derive derates. Its drawbacks are that DVFS usage is not accounted for, BTI and HCI effects are not separated, and it requires handcrafted paths. The other challenge is productivity: STA and simulation are typically handled by different expert teams, so modeling aging involves multiple cycles of identifying paths, running simulations, and analyzing results to come up with derates. Those derates are then used to model aging, but they are often pessimistic.

The new approach is an aging-aware STA methodology. It automates workload dependency, simulates actual paths, takes BTI and HCI tradeoffs into account, works within a single simulation structure, and scales across aging mission profiles without trading off accuracy, enabling designers to find the actual worst case.

Aging-Aware STA Flow

The Synopsys tool for this aging-aware flow is called PrimeShield, and there are two components:

  • Aging STA
  • Aging-aware SPICE simulation

Intel used the aging-aware SPICE simulation component, where the circuit designer specifies a set of paths for simulation in Simlink. The designer specifies and creates the stress conditions, and Simlink simulates them with HSPICE, creating a degradation file that is then played back under fresh conditions to measure the aging impact. Aging-aware Simlink makes it easier to create stress conditions and, based on the initial inputs, automates evaluation of the aging impact at various other stress conditions.

Aging-Aware STA Flow

The aging-aware STA flow, on the other hand, eases the methodology further by using aged libraries with mission profile information to calculate the impact of aging on the actual paths, using the path-based analysis (PBA) methodology in Synopsys PrimeTime. It also enables designers to configure the stress waveform by setting the cycle count, activity factor, signal probability, age time, and stress voltage ratio.
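The five stress-waveform settings listed above can be pictured as a small record with sanity checks. This is a hypothetical sketch only: `StressProfile`, its field names, the units, and the `toggles` helper are invented for illustration and are not PrimeShield or PrimeTime commands. The field comments reflect the usual meanings of activity factor and signal probability, which is an assumption about how the tool defines them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StressProfile:
    """Hypothetical container for the five stress-waveform settings."""
    cycle_count: int             # number of clock cycles in the stress waveform
    activity_factor: float       # fraction of cycles with a transition, 0..1
    signal_probability: float    # fraction of time the signal is high, 0..1
    age_time_years: float        # target lifetime to age to
    stress_voltage_ratio: float  # stress VDD relative to nominal VDD

    def __post_init__(self) -> None:
        if self.cycle_count <= 0:
            raise ValueError("cycle_count must be positive")
        for name in ("activity_factor", "signal_probability"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        if self.age_time_years <= 0 or self.stress_voltage_ratio <= 0:
            raise ValueError("age time and voltage ratio must be positive")


def toggles(profile: StressProfile) -> int:
    """Number of transitions implied by the cycle count and activity factor."""
    return round(profile.cycle_count * profile.activity_factor)


# Example values are made up; 0.2 matches the reference activity factor
# mentioned in the results, the rest are arbitrary.
profile = StressProfile(cycle_count=1000, activity_factor=0.2,
                        signal_probability=0.5, age_time_years=10.0,
                        stress_voltage_ratio=1.1)
print(toggles(profile))
```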

Results

Using the aging-aware flow, the team wanted to see the workload dependency of slack degradation. The reference case is called slack2, where both the launch clock and capture clock have an activity factor of 0.2, as shown in the table below:

Workload dependency of slack degradation

Slack2 is the reference scenario, with equal activity factors for launch and capture clocks. The other three scenarios have a variety of activity factors for launch and capture clocks, and the yellow table shows how the slack degradation increases for each scenario, with scenario slack82 having the worst case slack degradation. These results depend on the effects of HCI and BTI.

Running and plotting many paths to compare normalized degraded slack versus normalized reference slack is shown in the next plot. The legend shows four types of results:

  • Launch clock at 0.2, capture clock at 0.2 (l2c2)
  • Launch clock at 0.8, capture clock at 0.8 (l8c8)
  • Launch clock at 0.8, capture clock at 0.2 (l8c2)
  • Launch clock at 0.2, capture clock at 0.8 (l2c8)
Normalized results

This helps designers identify worst case corners for each IP block in a path aging flow.

Conclusions

Running STA with aging effects is quite complex, especially when using DVFS design techniques, and accurate answers depend on the workload. Intel designers working with Synopsys tools and AEs have developed an aging-aware STA flow that uses PrimeShield, Simlink and HSPICE together for path simulations. Reliability issues are now first order, so automation for aging analysis in an STA flow is a must-have feature.

Related Blogs

Scaling Safety Analysis. Reusability for FMEDA

by Bernard Murphy on 06-23-2022 at 6:00 am

FMEDA generation

It is common when a new type of analysis is introduced in almost any domain that it works well enough for a while. Until it begins to struggle with growing problem size, prompting refinements to the methodology to allow continued scaling. We see this routinely in analytics for SoC design, so it should not be a big surprise that safety analysis, in the form of failure modes, effects and diagnostic analysis (FMEDA), is starting to look a little creaky. Which is no small concern. FMEDAs are the contracts passed up from IP developers to SoC integrators, providing assurance that safety weaknesses have been fully analyzed and mitigated. This is not a requirement we want to short-change because the analysis problem becomes too messy.

Configurability – the root cause

Most IPs are configurable, even in-house IPs, because to be useful as reusable components, they must be able to adapt to a variety of different SoC applications. There is no IP more configurable than a network-on-chip (NoC). The whole structure of the NoC will change depending on how many components it must connect, what quality of service goals it must meet, how it should adapt to minimize congestion and so on.

All this configurability is essential to meet SoC design goals, but it comes with a downside. FMEDA is a flat characterization based on fault simulation, run on the component as configured. There is generally no way to analyze a parametrized IP before configuration. SoC integrators must run the analysis per IP, even on commercial components. IP suppliers will provide as much help as they can in the form of templates and advice, but the burden of final and lengthy FMEDA remains with the integrator. The SoC team must repeat this analysis if the configuration changes, all adding up to a lot of extra work.

The core problem is a lack of reusability in FMEDA. If this could be restructured to support reuse, then IP suppliers could provide a means to generate not only a configured IP but also an FMEDA for that IP. Integrators could avoid most of the effort in repeating flat analyses in this case.

Redesigning FMEDA for reuse

FMEDAs, as they stand, are not parametrizable, but they could be generated through a combination of low-level safety models and a compiler which could read those models together with the configured IP RTL to determine how root causes will propagate to effects. Arteris IP has written a thought leadership paper on putting these ideas into practice, cutting out much of the unnecessary rework in rebuilding FMEDAs. Conceptually this makes sense to me. Failure modes don’t change on configuration. There may be more or fewer of a certain type in some cases, added or subtracted in predictable ways. How these can propagate to effects also won’t change much, except perhaps in ways you could analyze through interpolation between a few carefully selected configurations, allowing a tool to compute the influence on the likelihood of failure. The concept seems very reasonable.
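As a rough illustration of the idea that failure modes stay the same while their counts scale with configuration, here is a minimal sketch. It is not the Arteris IP method: the `FailureModeModel` record, the `generate_fmeda` function, the element names, the FIT values and the coverage figures are all hypothetical, and a real generator would derive instance counts and propagation from the configured RTL rather than from a hand-written dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureModeModel:
    """Hypothetical low-level safety model for one element type."""
    element: str
    failure_mode: str
    fit_per_instance: float     # failure rate contribution per instance (FIT)
    diagnostic_coverage: float  # fraction of failures detected, 0..1


def generate_fmeda(models, instance_counts):
    """Build FMEDA rows for one configuration.

    instance_counts maps element type -> number of instances in the
    configured IP; element types absent from the configuration are skipped.
    """
    rows = []
    for m in models:
        count = instance_counts.get(m.element, 0)
        if count == 0:
            continue
        total_fit = m.fit_per_instance * count
        rows.append({
            "element": m.element,
            "failure_mode": m.failure_mode,
            "instances": count,
            "total_fit": total_fit,
            # Undetected share of the failure rate for this row.
            "residual_fit": total_fit * (1.0 - m.diagnostic_coverage),
        })
    return rows


# Made-up models and two made-up NoC configurations.
MODELS = [
    FailureModeModel("router", "packet misrouted", 2.0, 0.9),
    FailureModeModel("link", "data corrupted", 1.0, 0.99),
    FailureModeModel("bridge", "transaction dropped", 4.0, 0.5),
]

small = generate_fmeda(MODELS, {"router": 4, "link": 8})
large = generate_fmeda(MODELS, {"router": 16, "link": 40, "bridge": 2})
for row in large:
    print(row)
```

The same models serve both configurations; only the instance counts change, which is the reuse the paragraph above argues for.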

You could extend automated generation not only to generating IP FMEDAs but also to generating the SoC FMEDA. Apparently, leading semis in the automotive space already do something like this internally. The SoC generator must aggregate FMEDAs from the IPs, applying in-context requirements and assumptions of use to abstract failure modes to those relevant to system behavior. Adding this functionality with IP FMEDA generation could take a lot of the pain out of safety analysis for SoC integrators.

You can learn more about this topic in this Arteris IP presentation HERE.

Also Read:

Why Traceability Now? Blame Custom SoC Demand

Assembly Automation. Repair or Replace?

Experimenting for Better Floorplans


Podcast EP89: An Overview of NXP’s MCX MCU Products with CK Phua

by Daniel Nenni on 06-22-2022 at 10:00 am

Dan is joined by CK Phua of NXP. CK joined Philips Semiconductors in 1993 and worked in various roles including quality, applications engineering, product engineering and technical marketing. After Philips, CK joined Freescale in 2012 and rejoined NXP through the Freescale merger. CK is now a Product Manager for Microcontrollers in the Edge Processing Business Line.

CK provides a detailed overview of NXP’s MCX product line and its product families, including architecture and capabilities across a broad range of applications. The supporting development environment is also discussed, as well as security capabilities.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


TSMC 2022 Technology Symposium Review – Process Technology Development

by Tom Dillinger on 06-22-2022 at 5:00 am

finFLEX

TSMC recently held their annual Technology Symposium in Santa Clara, CA.  The presentations provided a comprehensive overview of their status and upcoming roadmap, covering all facets of process technology and advanced packaging development.  This article will summarize the highlights of the process technology updates – a subsequent article will cover the advanced packaging area.

First, here is a brief overview of some of the general observations and broader industry trends, as reported by C.C. Wei, TSMC CEO.

General

  • “This year marks TSMC’s 35th anniversary. In 1987, we had 258 employees in one location, and released 28 products across 3 technologies.  Ten years later, we had 5,600 employees, and released 915 products across 20 technologies.  This year in 2022, we have 63,000 employees, and will release 12,000 products across 300 technologies.” 
  • “From 2018 to 2022, the volume of 12” (equivalent) wafers has had an annual CAGR exceeding 70%. In particular, we are seeing a significant increase in the number of ‘big die’ products.”  (>500mm²)
  • “In 2021, TSMC’s North America business segment shipped more than 7M wafers and over 5,500 products. There were 700 new product tapeouts (NTOs).  This segment represents 65% of TSMC’s revenue.”
  • “Our gigafab expansion plans have typically involved adding two new ‘phases’ each year – that was the case from 2017-2019. In 2020, we opened six new phases, including our advanced packaging fab.  In 2021, there were seven new phases, including fabs in Taiwan and overseas – advanced packaging capacity was added, as well. In 2022, there will be 5 new phases, both in Taiwan and overseas.” 
    • N2 fabrication: Fab20 in Hsinchu
    • N3: Fab 18 in Tainan
    • N7 and N28: Fab22 in Kaohsiung
    • N28: Fab16 in Nanjing, China
    • N16, N28, and specialty technologies: Fab23 in Kumamoto, Japan (in 2024)
    • N5 in Arizona (in 2024)
  • “TSMC has 55% of the worldwide installed base of EUV lithography systems.”
  • “We are expanding our capital equipment investment significantly in 2022.” (The table below highlights the considerable jump in cap equipment planned expenditures.)
  • “We are experiencing stress in the manufacturing capacity of mature process nodes. In 35 years, we have never increased the capacity of a mature node after a subsequent node has ramped to high volume manufacturing – that is changing.”
  • “We continue to invest heavily in “intelligent manufacturing”, focusing on precision process control, tool productivity, and quality. Each gigafab handles 10M dispatch orders per day, and optimizes tool productivity.  Each gigafab generates 70B data points daily to actively monitor.” 

For the first time at the Symposium, a special “Innovation Zone” on the exhibit floor was allocated. The recent product offerings from a number of start-up companies were highlighted.  TSMC indicated, “We have increased our support investment to assist small companies adopt our technologies.  There is a dedicated team that focuses on start-ups.  Support for smaller customers has always been a focus.  Perhaps somewhere in this area will be the next Nvidia.”

Process Technology Review

With a couple of exceptions discussed further on, the process technology roadmap presentations were somewhat routine – that’s not a bad thing, but rather an indication of ongoing successful execution of prior roadmaps.

The roadmap updates were presented twice, once as part of the technology agenda, and again as part of TSMC’s focus on platform solutions.  Recall that TSMC has specifically identified four “platforms” that individually receive development investment to optimize the process technology offerings:  mobile; high-performance computing (HPC); automotive; and IoT (ultra-low power).  The summaries below merge the two presentations.

N7/N6

  • over 400 NTOs by year-end 2022, primarily in the smartphone and CPU markets
  • N6 offers transparent migration from N7, enabling IP re-use
  • N6RF will be the RF solution for upcoming WiFi7 products
  • there is an N7HPC variant (not shown in the figure above), providing ~10% performance improvement at overdrive VDD levels

For N6, logic cell-based blocks can be re-implemented in a new library for additional performance improvements, achieving a major logic density improvement (~18%).

N5/N4

  • in the 3rd year of production, with over 2M wafers shipped, 150 NTOs by year-end 2022
  • mobile customers were the first, followed by HPC products
  • roadmap includes ongoing N4 process enhancements
  • N4P foundation IP is ready, interface IP available in 3Q2022 (to the v1.0 PDK)
  • there is an N5HPC variant (not shown in the figure above, ~8% perf improvement, HVM in 2H22)

As with N7/N6, N4 provides “design re-use” compatibility with N5 hard IP, with a cell-based block re-implementation option.

The complexity of SoC designs for the automotive segment is accelerating.  There will be an N5A process variant for the automotive platform, qualified to AEC-Q100 Grade 1 environmental and reliability targets (target date: 2H22).  The N5A automotive process qualification involves both modeling and analysis updates (e.g., device aging models, thermal-aware electromigration analysis).

N3 and N3E

  • N3 will be in HVM starting in the second half of 2022
  • N3E process variant in HVM one year later; TSMC is expecting broad adoption across mobile and HPC platforms
  • N3E is ready for design start (v0.9 PDK), with high yield on the standard 256Mb memory array qualification testsite
  • N3E adds the “FinFLEX” methodology option, with three different cell libraries optimized for different PPA requirements (more at the end of this article)

Note that N3 and N3E are somewhat of an anomaly to the prior TSMC process roadmap.  N3E will not offer a transparent migration of IP from N3.  The N3E offering is a bit of a “correction”, in that significant design rule changes to N3 were adopted to improve yield.

TSMC’s early-adopter customers push for process PPA updates on an aggressive timeline, whether an incremental, compatible variant to an existing baseline (e.g., N7 to N6, N5 to N4), or for a new node.  The original N3 process definition has a good pipeline of NTOs, but N3E will be the foundation for future variants.

N2

  • based on a nanosheet technology, target production date: 2025
  • compared to N3E, N2 will offer ~10-15% performance improvement (@iso-power, 0.75V) or ~25-30% power reduction (@iso-perf, 0.75V); note also the specified operating range in the figure above down to 0.55V
  • N2 will offer support for a backside power distribution network

Parenthetically, TSMC is faced with the dilemma that the requirements of the different platforms have such a broad range of targets for power, performance, and area/cost.  As was noted above, N3E is addressing these targets with different libraries, incorporating a different number of fins that define the cell height.  For N2 library design, this design decision is replaced by a process technology decision on the number of vertically-stacked nanosheets throughout (with some allowed variation in the device nanosheet width).  It will be interesting to see what TSMC chooses to offer for N2 to cover the mobile and HPC markets, in terms of the nanosheet topology.  (The image below from an earlier TSMC technical presentation at the VLSI 2022 Conference depicts 3 nanosheets.)

NB:  There are two emerging process technologies being pursued to reduce power delivery impedance and improve local routability – i.e., “buried” power rail (BPR) and “backside” power distribution (BSPDN).  The initial investigations into offering BPR have quickly expanded to process roadmaps that integrate full BSPDN, like N2.  Yet, it is easy to get the two acronyms confused.

Specialty Technologies

TSMC groups the following offerings into a class denoted as “specialty technologies”:

  • ultra-low power/ultra-low leakage (utilizing an ultra-high Vt device variant)
    • requires specific focus on ultra-low leakage SRAM bitcell design
    • N12e in production, N6e in development (focus on very low VDD model support)
  • (embedded) non-volatile memory
    • usually integrated with a microcontroller (MCU), typically in a ULP/ULL process
    • RRAM
      • requires 2 additional masks, embedded in BEOL (much lower cost than the 12 masks for eFlash)
      • 10K write cycles (endurance specification), ~10 years retention @125C
    • MRAM
      • 22MRAM in production, focus is on improving endurance
      • 16MRAM for Automotive Grade 1 applications in 2023
  • power management ICs (PMIC)
    • based on bipolar-CMOS-DMOS (BCD) devices: 40BCD+, 22BCD+
    • for complex 48V/12V power domains
    • requires extremely low device R_on
  • high voltage applications (e.g., display drivers, using N80HV or N55HV)
  • analog/mixed-signal applications, requiring unique active and passive structures (e.g., precision thin-film resistors and low noise devices, using N22ULL and N16FFC)
  • MEMS (used in motion sensors, pressure sensors)
  • CMOS image sensors (CIS)
    • pixel size of 1.75um in N65, 0.5um in N28, transitioning to N12FFC
  • radio frequency (RF), spanning from mmWave to longer wavelength wireless communication; the upcoming WiFi7 standard was highlighted

“The transition from WiFi6 to WiFi7 will require a significant increase in area and power, to support the increased bandwidth requirements – e.g., 2.2X area and 2.1X power.  TSMC is qualifying the N6RF offering, with a ~30-40% power reduction compared to N16RF.  This will allow customers currently using N16RF to roughly maintain existing power/area targets, when developing WiFi7 designs.”

The charts below illustrate how these specialty technologies are a fundamental part of platform products – e.g., smartphones and automotive products.  The characteristic process nodes used for these applications are also shown.

Although the focus of smartphone development tends to be on the main application processor, the chart below highlights the extremely diverse requirements for specialty technology offerings, and their related features.  In the automotive area, the transition to a “zonal control” architecture will require a new set of automotive ICs.

N3E and FinFLEX

The FinFLEX methodology announcement was emphasized, with TSMC indicating “FinFLEX will offer full-node scaling from N5.”

As FinFET technology nodes have scaled – i.e., from N16 to N10 to N7 to N5 – the fin profile and drive current per micron have improved significantly.  Standard cell library design has evolved to incorporate fewer pFET and nFET fins that define the cell height (specified in terms of the number of horizontal metal routing tracks).  As illustrated above, the N5 library used a 2-2 fin definition – that is, 2 pFET fins and 2 nFET fins to define the cell height.  (N16/N12 used a 3-3 configuration.)

The library definition for N3E was faced with a couple of issues.  Mobile and HPC platform applications are increasingly divergent, in terms of their PPA (and cost) goals.  Mobile products focus on circuit density to integrate more functionality and/or reduced power, with less demanding performance improvements.  HPC is much more focused on maximizing performance.

As a result, N3E will offer three libraries, as depicted in the figure above:

    • an ultra low power library  (cell height based on a 1-fin library)
    • an efficient library (cell height based on a 2-fin library)
    • a performance library (cell height based on a 3-fin library)

The figure below is from TSMC’s FinFLEX web site, illustrating the concept (link).

Now, offering multiple libraries for integration on a single SoC is not new.  For years, processor companies have developed unique “datapath” and “control logic” library offerings, with different targets for:  cell heights, circuit performance, routability (i.e., max cell area utilization), and distinct logic offerings (e.g., wide AND-OR gates for datapath multiplexing).  Yet, the physical implementation of SoC designs using multiple libraries relied upon a consistent library per design block.

The unique nature of the FinFLEX methodology is that multiple libraries and multiple track heights will be intermixed within a block. 

After the TSMC Symposium, additional information became available.  A block design will alternate rows for the two libraries.  For example, a 3:2 block design will have alternate row heights accommodating cells from the 3-fin and 2-fin library designs.  A 2:1 block design will have alternate rows for cells from the 2-fin and 1-fin libraries.
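The alternating-row arrangement can be sketched in a few lines. The function name `row_libraries` and the convention that the first library named in the configuration occupies the first row are assumptions made for illustration; the article only states that the rows alternate.

```python
def row_libraries(num_rows: int, config: str) -> list[str]:
    """Return the library assigned to each row for a block config like "3:2".

    Assumption: the first library named in the config takes row 0,
    and the two libraries then strictly alternate.
    """
    first, second = config.split(":")
    pattern = (f"{first}-fin", f"{second}-fin")
    return [pattern[i % 2] for i in range(num_rows)]


print(row_libraries(5, "3:2"))
print(row_libraries(4, "2:1"))
```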

TSMC indicated, “Different cell heights (in separate rows) are enabled in one block to optimize PPA.  FinFLEX in N3E incorporates new design rules, new layout techniques, and significant changes to EDA implementation flows.”

There will certainly be more information to come about FinFLEX and the changes to the general design flow.  Off-hand, there will need to be new approaches to:

    • physical synthesis
      • how will synthesis improve timing on a critical signal
      • will synthesis strive to provide a netlist with a balanced ratio of cells from the two libraries for the alternating rows

For example, to improve timing on a highly-loaded signal, synthesis would typically update a cell assignment in the library to the next higher drive strength – e.g., NAND2_1X to NAND2_2X.

With FinFLEX, additional options are available with the second library – e.g., whether an update to NAND2_1X_2fin would use NAND2_2X_2fin or NAND2_1X_3fin.  Yet, if the latter is chosen, the new cell will need to be “re-balanced” to a different row in the block floorplan.  The effective changes in performance and input/output wire loading for these choices are potentially quite complex to estimate during physical synthesis.
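The two upsizing choices described above can be enumerated mechanically. The sketch below assumes the illustrative naming scheme used in this article (function, drive strength, fin count) and assumes the next drive strength is simply double the current one; real library naming and drive-strength ladders will differ. The function name `upsizing_candidates` is hypothetical.

```python
import re

# Matches illustrative names such as NAND2_1X_2fin.
CELL_RE = re.compile(r"^(?P<func>[A-Z0-9]+)_(?P<drive>\d+)X_(?P<fins>\d+)fin$")


def upsizing_candidates(cell: str, max_fins: int = 3):
    """List (new_cell, needs_row_move) options for speeding up one cell."""
    m = CELL_RE.match(cell)
    if m is None:
        raise ValueError(f"unrecognized cell name: {cell}")
    func, drive, fins = m["func"], int(m["drive"]), int(m["fins"])
    # Option 1: stay in the same library (same row), next drive strength up.
    options = [(f"{func}_{drive * 2}X_{fins}fin", False)]
    # Option 2: same drive strength in the taller library; the cell must
    # then be re-balanced to a row of the other height.
    if fins < max_fins:
        options.append((f"{func}_{drive}X_{fins + 1}fin", True))
    return options


for name, moves in upsizing_candidates("NAND2_1X_2fin"):
    print(name, "(row move)" if moves else "(same row)")
```

The sketch only lists the options. Choosing between them is the hard part, since the row move changes placement and therefore wire loading.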

The cell selection options get even more intricate when considering specific flop cells to use, given not only the differences in clock-to-Q delays, but also the setup and hold time characteristics, and input clock loading.   When would it be better for individual flop bits in a register to use different output drive strengths in the same library (and be placed locally) versus having register bits re-balanced to a row corresponding to a different library selection?

With an alternating row configuration, the assumption is that there will be an even mix of cells from the two libraries.  Yet, the synthesis of a block may only require a small percentage of “high-performance” cells to meet timing objectives.   An output netlist without a balanced mix of library cells may have low overall utilization, suggesting a uniform row, single-library block floorplan may be suitable instead.  This may result in iterations in the chip floorplan (and likely, revisions in the power distribution network, as well).
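A back-of-the-envelope calculation shows why an unbalanced netlist hurts utilization. The sketch below assumes that tall rows take a fixed share of the block area (0.6 for a 3:2 block, which assumes row height proportional to fin count) and that no row type may exceed 80% utilization. Both numbers, the cell areas, and the function name `block_utilization` are illustrative assumptions, not TSMC data.

```python
def block_utilization(tall_cell_area: float, short_cell_area: float,
                      tall_row_share: float, max_row_util: float = 0.8) -> dict:
    """Size a block with alternating rows and report utilization.

    The block must be large enough that neither the tall rows nor the
    short rows exceed max_row_util, so the more demanding row type
    sets the block area.
    """
    area_for_tall = tall_cell_area / (max_row_util * tall_row_share)
    area_for_short = short_cell_area / (max_row_util * (1.0 - tall_row_share))
    block_area = max(area_for_tall, area_for_short)
    return {
        "block_area": block_area,
        "tall_util": tall_cell_area / (block_area * tall_row_share),
        "short_util": short_cell_area / (block_area * (1.0 - tall_row_share)),
        "overall_util": (tall_cell_area + short_cell_area) / block_area,
    }


# Cell area split matches the row area split: both row types fill evenly.
balanced = block_utilization(600.0, 400.0, tall_row_share=0.6)
# Only 10% of the cell area needs the high-performance library.
skewed = block_utilization(100.0, 900.0, tall_row_share=0.6)
print(balanced["overall_util"], skewed["overall_util"], skewed["tall_util"])
```

In the skewed case the short rows fill up while the tall rows sit mostly empty, so the overall utilization drops well below the per-row limit. That is the situation in which a uniform, single-library floorplan may be the better choice.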

    • sub-block level IP integration

Blocks often contain a number of small hard IP macros, such as register files (typically provided by a register file generator).  With non-uniform row heights, the algorithms in the generator become more complex, to align the power continuity between the macro circuits and the cell rows.  And, there will be placement restriction rules that will need to be added to the hard IP models.

    • timing/power optimizations during physical design

Similarly to the physical synthesis block construction options, there will be difficult decisions on cell selection during the timing and power optimization steps in the physical design flow.  For example, if a cell can reduce its assigned drive strength to save power while still meeting timing, would a change in library selection, and thus row re-balancing, be considered?  Would the corresponding changes in the cell placement negate the optimization?

and, last but most certainly not least,

    • Will there be new EDA license costs to enable N3E FinFLEX?

(Years ago, the CAD department manager at a previous employer of mine went ballistic at the license cost adder to enable placement and routing for multipatterning requirements.  Given the significant EDA investment required to support FinFLEX, history may repeat itself with additional license feature costs.)

The FinFLEX methodology definitely offers some intriguing options.  It will be extremely interesting to see how this approach evolves.

Analog design migration automation

Lastly, TSMC briefly highlighted work they are pursuing to help designers migrate analog/mixed-signal circuits and layouts to newer process nodes.

Specifically, TSMC has defined a set of “analog cells”, with the capability to take an existing schematic, re-map to a new node, evaluate circuit optimizations, and migrate layouts, including auto-placement and (PG + signal) routing.

The definition of the analog cell libraries for N5/N4 and N3E is complete, with N7/N6 support to follow.  TSMC showed an example of an operational transconductance amplifier (OTA) that had been through the migration flow.

Look for more details to follow. (This initiative appears to overlap with comparable features available from EDA vendor custom physical design platforms.)

A subsequent article will cover TSMC’s advanced packaging announcements at the 2022 Technology Symposium.

-chipguy

Also read:

Three Key Takeaways from the 2022 TSMC Technical Symposium!

Inverse Lithography Technology – A Status Update from TSMC

TSMC N3 will be a Record Setting Node!


Qualcomm’s AI play

by Anand Joshi on 06-21-2022 at 10:00 am

int nvda qcom

Qualcomm is a well-known name for chips in the mobile industry. The company generated $33 billion in revenue in 2021 and continues to march ahead with its innovations. However, Qualcomm doesn’t get the same visibility and mention as Nvidia and Intel in the world of AI chips. By our estimate, Qualcomm’s contribution to the AI chip market is comparable to that of Intel and Nvidia, given the volume shipment of smartphones and the silicon content dedicated to AI in recent years. Qualcomm has been steadily making progress in key AI chip markets and perhaps has the most diverse and comprehensive portfolio to cater to all AI chip markets.

Figure shows different segments within AI chip market and products in each

The AI chip market has grown significantly in the past few years, and you can read all about it in JP Data’s latest report on AI chips. According to the analysis, the overall AI chip market is best segmented by power consumption: data center AI chips (50+W), mid power AI chips (5-50W, primarily for automotive and similar markets), low power AI chips (0.1-5W, primarily for mobile and client computing) and ultra-low power AI chips (<0.1W, for always-on applications).  There’s no sign of a slowdown in AI yet, with enterprises as well as edge device makers eager to test out new solutions. Many use cases and exciting applications continue to emerge. Proof-of-concept applications that are going into production are driving the need for AI inference chips.
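The power-based segmentation above maps directly onto a small lookup. In the sketch below the function name `ai_chip_segment` is hypothetical, and the handling of the shared boundaries (exactly 0.1W, 5W and 50W go to the higher segment) is an assumption, since the ranges as quoted overlap at their endpoints.

```python
def ai_chip_segment(power_watts: float) -> str:
    """Classify an AI chip by power consumption, per the four segments above.

    Assumption: a value on a shared boundary belongs to the higher segment.
    """
    if power_watts <= 0:
        raise ValueError("power must be positive")
    if power_watts < 0.1:
        return "ultra-low power"
    if power_watts < 5:
        return "low power"
    if power_watts < 50:
        return "mid power"
    return "data center"


for watts in (0.05, 2, 30, 300):
    print(watts, "W ->", ai_chip_segment(watts))
```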

Qualcomm is poised to play in all of these markets, which sets it apart from other companies. For the data center market, the company has introduced the AI100 chip, and its results submitted to MLPerf compete well with Nvidia. Qualcomm boasts significantly higher performance per watt than the competition. Qualcomm is actively adapting its Snapdragon product line to support the automotive market and recently claimed design wins at BMW. Qualcomm’s dominance in the low power segment within the mobile world is well known and needs no introduction. The same chips offer an ultra-low power mode for always-on applications, enabling a whole new set of AI use cases for device manufacturers.

This makes its portfolio even more comprehensive than those of Nvidia and Intel, if we set the training aspect aside. Nvidia, for example, doesn’t have products in the mobile space, and neither does Intel. Intel and Nvidia don’t have solutions for the ultra-low power market either.

Qualcomm was somewhat late to the party and focused earlier on accelerating AI by enhancing its Hexagon DSP and Adreno GPU. The company then acquired Nuvia to create a new AI accelerator. At Microsoft’s 2022 Build conference, the company announced Project Volterra, a new device powered by Snapdragon chips that contain an AI accelerator (NPU).  The dedicated accelerator will become part of Microsoft’s Windows 11. Via the included SDK for building AI applications, the chip will enable AI usage within a large number of Windows applications, potentially challenging x86 dominance in the PC world.

Qualcomm has invested heavily in AI since. Qualcomm announced a $100 million AI fund way back in 2018, has aggressively invested in AI R&D and has released an SDK that allows developers to take a model and customize it for mobile, automotive, IoT, robotics or other markets.  While there is no data on active AI developers for Qualcomm, we expect the number to be much lower than what Nvidia and Intel can boast of. In fact, a Google Trends search reveals that searches for Qualcomm AI are far below those for Nvidia AI or Intel AI, suggesting that there’s a lot of catching up to do.

The AI chip market is still emerging. Nvidia has become the de facto standard in training, but the inference market is just starting its ramp-up. If Qualcomm is indeed able to offer a consistent software experience across different market segments, it has the potential to become a formidable player in the AI chip market.

Also read:

A Fresh Look at HLS Value

How to Cut Costs of Conversational AI by up to 90%

HLS in a Stanford Edge ML Accelerator Design