
Stop-For-Top IP Model to Replace One-Stop-Shop by 2025

by Eric Esteve on 06-17-2022 at 6:00 am


…and support the creation of a successful chiplet business

The One-Stop-Shop model allowed the IP vendors of the 2000s to build a successful IP business, mostly driven by consumer applications such as smartphones and set-top boxes. The industry has changed dramatically: as of 2020 it is driven by data-centric applications (datacenter, AI, networking, HPC…), which require best-in-class, high-performance IP developed on bleeding-edge technology nodes.

That’s why the Stop-For-Top IP model should replace the One-Stop-Shop model during the 2020 decade, so that the right IP can be supplied more efficiently to demanding customers working on data-centric applications.

The next step will be to develop and market chiplets created from the Stop-For-Top IP portfolio, helping chip makers overcome the limits of Moore’s law and accelerate time to market (TTM) for systems developed on technology nodes at 3nm and below. We think the IP vendors that select the Stop-For-Top IP strategy will be best positioned to offer chiplets at the right time, when the semiconductor industry needs this innovation.

During the 2010 decade, the successful business model for interface IP was the One-Stop-Shop IP model. Offering IP customers a single place to buy several functions helped them decide to buy instead of make, while minimizing administrative and legal tasks. It was faster to negotiate and sign an IP license contract with one supplier than with many.

But the nature of modern IP has changed: it can no longer be seen as a commodity that is cheaper to buy than to make. If we consider the star interface IP licensed for multiple millions of dollars, like PCIe 6 or CXL, DDR5 or HBM memory controllers, or PAM4 112G SerDes, designed on the most advanced technology nodes, then performance, reliability, and robustness are now the essential pillars of decision making.

We have shown that the interface IP market was extremely healthy from 2016 to 2021, growing at a 20% CAGR from $520 million in 2016 to $1300 million in 2021. If we consider the 2021 to 2026 forecast for the interface IP category, there are clearly two groups of protocols.
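As a quick check on the growth figure above, the compound annual growth rate follows directly from the two endpoint revenues. The sketch below uses only the $520 million (2016) and $1300 million (2021) figures quoted in the text; the function name `cagr` is ours, chosen for illustration.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Interface IP revenue in $ million, as quoted in the text.
rate = cagr(520, 1300, 2021 - 2016)
print(f"2016-2021 interface IP CAGR: {rate:.1%}")
```

The result rounds to the 20% quoted above, so the two revenue figures and the growth rate are consistent with each other.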

The first group includes PCIe and CXL, DDR memory controller, Ethernet and SerDes, and chip-to-chip protocols. For these protocols, the largest part of IP revenues is generated by the most advanced functions targeting bleeding-edge technology nodes.

For the other protocols, the group of USB, MIPI, SATA or HDMI, both the weight and the growth rate are lower. It’s no coincidence that the protocols in this second group are used in consumer applications like smartphones, PCs or TVs, or even automotive. Protocols from the first group are required in applications like datacenter, HPC, networking, wireless base stations, storage, etc., which we can summarize as enterprise. It sounds like the old battle: consumer vs enterprise.

We have reworked the interface IP forecast for the next five years to extract the high-end part of PCIe and CXL, DDR memory controller, Ethernet and SerDes, and chip-to-chip IP products, which target advanced technology nodes, 7nm and below. The result is summarized in Table 1.

Table 1: High-End Protocols Interface IP Forecast 2021-2026

It can be interesting to compare these results with the total generated by all interface IP protocols for the same period:

Figure: All Protocols Interface IP Forecast 2021-2026

In 2021 the high-end part of interface IP revenue is slightly less than 50% of the total, but this share grows steadily to reach 72% in 2026. The reason is the five-year CAGR, which is much larger for the high-end group.

During the 2010 decade, two EDA vendors successfully deployed a One-Stop-Shop strategy, mostly targeting the interface protocol category, and built a successful IP business. Synopsys held a combined 55% market share (or $727 million) of the interface IP category in 2021 by supporting every protocol. On top of PCIe and CXL, DDR memory controller, Ethernet and SerDes, Synopsys supports USB, MIPI, SATA, HDMI and DisplayPort. These additional interconnect protocols are used intensively in consumer, industrial and automotive applications, but are rarely selected in the “star” applications of the 2020 decade, the data-centric ones (datacenter, hyperscale, networking, HPC, AI, etc.).

The main question is whether it will be possible to create a successful IP business during the 2020 decade by focusing only on the high-end data-centric interconnect protocols developed on advanced technology nodes, 7nm and below. If we consider the 2021 to 2026 forecast of high-end IP (Table 1), the segment that looked like a niche market in 2020 is expected to become a two-billion-dollar market in 2026. The question becomes: would a vendor devoting all its engineering resources to high-end data-centric interconnect protocols be able to reach 25% market share in 2026 and create a successful $500 million business?

An IP vendor able to position itself on Top IP only, by moving from the well-known “One-Stop-Shop” model (selected by Synopsys and Cadence in the 2000s) to the “Stop-For-Top” model, will generate a better ROI. This IP vendor will differentiate itself from Synopsys and Cadence and achieve higher IP revenue growth!

The goal is clear. The strategy will have to be defined and fine-tuned for each data-centric protocol, keeping in mind that the long-term process must be completed by a second step: market deployment of application-specific chiplets, with specifications based on the high-end data-centric IP portfolio. The Stop-For-Top IP strategy is now clearly defined.

The need for ever-increasing bandwidth has put pressure on the industry to move faster to bleeding-edge technology nodes and to release new versions of interconnect protocols (PCIe, Ethernet, memory controller) more quickly. Innovations like PAM4 modulation and DSP-based SerDes, replacing the old 100% analog techniques, made it possible to break the 100Gbps barrier. Innovative architectures have been defined, pushing the adoption of new standards like CXL, which supports cache-coherent memory sharing between processors, co-processors and AI accelerators, and of chip-to-chip protocols between a main SoC and chiplets. These make it possible to get past the technological area limit and offer a more powerful system in a single package, supporting the ever-increasing need to compute and interconnect data flows, just as the use of SoCs in the 2000s led to the smartphone explosion.

To summarize, the next technology revolution will require top interconnect, and IP vendors will have to offer best-in-class interface IP to create a successful IP business based on Stop-For-Top IP positioning. We think that offering Stop-For-Top IP should be the first step of a strategy whose final goal is to offer a chiplet portfolio, built by integrating already-available interconnect IP into an integrated circuit (IC), commonly called a chiplet. To support this strategy, IP vendors will have to rely on a pool of dedicated resources specialized in ASIC design services. They will have to build this engineering team either organically or inorganically, by acquiring an ASIC design service vendor.

Chip makers developing SoCs for high-end applications, such as HPC, datacenter, AI or networking, are likely to be early adopters of chiplet architectures. Specific functions, like AI accelerators or Ethernet, PCIe and CXL interfaces, should be the first candidates for chiplet designs. Once these early adopters have demonstrated the validity of heterogeneous chiplet architectures leveraging multiple different business models, and of course the manufacturing feasibility of test and packaging, an ecosystem will emerge that is critical to support this new technology. At that point, we can expect wider market adoption, not only for high-performance applications.

As was the case for Design IP sourcing to build an SoC in the 2000s, the make-or-buy decision for chiplet sourcing to complete a system design will weigh core competency protection against the sourcing of non-differentiating functions. Design IP business growth since the 2000s has been sustained by a continuous increase in external sourcing. Both models will coexist (chiplets designed in-house or by a vendor), but history has shown that the buy decision eventually overtakes the make decision.

IPnest believes this trend will have two main effects on the interface IP business: strong growth of D2D IP revenues in the short term (2021-2025), and the creation of a heterogeneous chiplet market arising from the Stop-For-Top IP portfolio. This market is expected to consist of complex protocol functions like PCIe, CXL or Ethernet. Even IP vendors delivering USB, HDMI, DP or MIPI interface IP integrated in SoC I/Os may decide to deliver I/O chiplets instead.

The Stop-For-Top IP model is the first step of a successful strategy, followed by the creation of a chiplet portfolio by IP vendors to support the industry’s need for an open chiplet ecosystem. The semiconductor industry needs this ecosystem to overcome the limits of Moore’s Law and reach one trillion dollars during the 2020 decade.

By Eric Esteve (PhD.) Analyst, Owner IPnest

This white paper has been sponsored by Alphawave IP; nevertheless, the content reflects the author’s position on the IP market and the way it is expected to evolve during the 2020 decade. To read the complete white paper:

https://www.awaveip.com/en/news-views/stop-for-top-ip-model-to-replace-one-stop-shop-by-2025-and-support-the-creation-of-successful-chiplet-business/

Also read:

Die-to-Die IP enabling the path to the future of Chiplets Ecosystem

Design IP Sales Grew 19.4% in 2021, confirm 2016-2021 CAGR of 9.8%

Alphawave IP and the Evolution of the ASIC Business

Demand for High Speed Drives 200G Modulation Standards


Three Key Takeaways from the 2022 TSMC Technical Symposium!

by Daniel Nenni on 06-16-2022 at 12:10 pm

TSMC Technology Roadmap 2022

The TSMC Technical Symposium is today so I wanted to give you a brief summary of what was presented. Tom Dillinger will do a more technical review as he has done in the past. I don’t want to steal his thunder but here is what I think are the key takeaways. First a brief history lesson.

The history of TSMC Technology Development with 12 key milestones:

In 1987 TSMC was founded with the creation of the PurePlay business model.

In 1999 TSMC was the first foundry to offer 0.18 micron copper technology.

2001 brought the first foundry reference design flow. I participated in this with multiple EDA and IP vendors and I can tell you first hand that TSMC spent a huge amount of money creating the massive EDA and IP ecosystem we enjoy today.

In 2011 TSMC brought HKMG 28nm to the fabless ecosystem. Other foundries faltered at 28nm so this was a record breaking node for TSMC.

2012 brought CoWoS, the first heterogeneous 3DIC test vehicle.

In 2014 TSMC delivered the first fully functional FinFET networking processor which began the FinFET era that TSMC dominates today.

In 2015 TSMC qualified InFO, the advanced 3DIC packaging technology.

In 2018 TSMC delivered the most advanced logic technology (N7) available to all.

In 2020 TSMC led the industry with N5 EUV based logic technology.

In 2021 TSMC launched N4P, N4X, and N6RF.

In 2022 TSMC will launch what will be the most advanced N3 process nodes covering a wide range of vertical markets. N3 will also break the record for tapeouts in a 5 year period, in my opinion.

And last but not least, in 2022 TSMC announced the next generation process technology for the masses (N2).

Takeaway #1

TSMC will continue to invest in mature node and specialty technologies with a 1.5x capacity expansion from 2021 to 2025 which includes fabs F14 P8 (Tainan), F16 P1B (Nanjing), F22 P2 (Kaohsiung) and F23 P1 (Kumamoto, Japan).

TSMC also announced an Integrated Specialty Technology Platform for NVM, HV, Sensor, PMIC, ULP/ULL, analog, and RF technology. Tom Dillinger will go into more detail here.

Takeaway #2

TSMC will continue scaling N3. N3 is on track for HVM in 2H 2022. N3E follows in 2H 2023 with improved performance/power and low process complexity for both mobile and HPC applications.

N3E PPA versus N5 comes in at an 18% speed improvement at the same power or a 34% power reduction at the same speed, and a 1.6x logic density increase.

More importantly, TSMC announced FinFlex: Ultimate Design Flexibility for N3. TSMC just published a blog on FinFlex with more detail. Tom Dillinger will also have his say on this so stay tuned. Bottom line: you can change fin configurations to further optimize designs for area, speed, and power.

Takeaway #3

TSMC will use nanosheet transistors for N2. Not a huge surprise since Intel and Samsung have already made announcements but there is much more here than meets the eye. N2 PPA vs N3E is expected to be a 15% speed improvement at the same power or 25-30% power improvement at the same speed, and > 1.1x density. N2 is expected in 2025.

TSMC also discussed device architecture futures which included Nanosheet, CFET, 2D TMD, and CNT. We will be writing more about this later.

Bottom line: One thing we must all remember is that there is a distinct difference between a PurePlay and an IDM foundry. TSMC must produce the most cost effective, wide ranging process technologies with a fully supported ecosystem for hundreds of products. IDM foundries can pick and choose what is important and don’t have to worry about wafer margins. Semiconductor insiders know this but the media does not so expect continued misinformation in the coming days, absolutely.

So many more things were presented. If you have questions post them in the comments section and I will get the answers for you, absolutely.

Also Read:

Inverse Lithography Technology – A Status Update from TSMC

TSMC N3 will be a Record Setting Node!

Intel and the EUV Shortage


Podcast EP87: How Axiomise Addresses the Verification Challenge

by Daniel Nenni on 06-16-2022 at 10:00 am

Dan is joined by GD Bansal, COO at Axiomise. Dan and GD explore the Axiomise business model of providing training and consulting services for formal verification. The benefits and challenges of using formal verification on complex designs are discussed, along with the benefits of the Axiomise vendor-neutral approach to deploying state-of-the-art tools.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


HLS in a Stanford Edge ML Accelerator Design

by Bernard Murphy on 06-16-2022 at 6:00 am

AI for stanford min

I wrote recently about Siemens EDA’s philosophy on designing quality in from the outset, rather than trying to verify it in. The first step is moving up the level of abstraction for design. They mentioned the advantages of HLS in this respect and I refined that to “for DSP-centric applications”. A Stanford group recently presented at a Siemens EDA-hosted webinar, extending this range to building ML accelerators for the edge. Their architecture is built around several innovations, and the talk was also an enthusiastic endorsement of the value of HLS in designing the accelerator core.

Key Innovations

Karthik Prabhu, a doctoral candidate in EE at Stanford, presented their Chimera SoC, with a goal to support training at the edge with excellent performance yet still at edge-like low power. For this purpose their design uses resistive RAM (RRAM) for weights, eliminating the need to go off-chip for this data. The SoC architecture supports scale-out to multiple chips, something they call an Illusion system, with chip-to-chip interfacing (protocol not mentioned). I would imagine this might be even more effective in a multi-chiplet implementation, but as a proof of concept I’m sure the multi-chip version is enough.

For ResNet-18 with ImageNet they measured energy at 8.1 mJ/image, latency at 60 ms/image, average power at 136 mW and efficiency at 2.2 TOPS/W. Given that the intent is to support on-chip training, they do note RRAM’s drawbacks: high write energy and relatively low write endurance. The tests they apply seem to converge on training within the endurance bound; however, they didn’t mention how they overcome the energy issue in training.
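The reported figures can be cross-checked against each other, since average power is energy divided by time. The sketch below assumes the chip processes one image at a time and is busy for the full 60 ms latency; that assumption is ours, not something stated in the webinar summary.

```python
# Figures as reported for ResNet-18 on ImageNet.
energy_per_image_j = 8.1e-3    # 8.1 mJ/image
latency_per_image_s = 60e-3    # 60 ms/image
reported_power_w = 136e-3      # 136 mW average power

# Assumption: one image in flight at a time, so power = energy / latency.
implied_power_w = energy_per_image_j / latency_per_image_s
relative_gap = abs(implied_power_w - reported_power_w) / reported_power_w

print(f"implied average power: {implied_power_w * 1e3:.0f} mW")
print(f"gap vs reported 136 mW: {relative_gap:.1%}")
```

Under that assumption the implied power lands within about 1% of the reported average, so the energy, latency and power numbers hang together.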

Architecting the accelerator core

This section could have been taken directly from an earlier Siemens EDA tutorial. The team started with a convolution algorithm (6 nested loops in this case) over input activations, weights and output activations. Their goal was to map that to a systolic array of processors, considering many possible variables in the architecture. How many PEs might they need in the array, how many levels in the memory hierarchy, and how should they size the buffers in that hierarchy? In data optimization, were they going to prefer weight stationary, output stationary or row stationary?
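For readers who have not seen it written out, a direct convolution really is six nested loops over the three tensors mentioned above. The following is a minimal sketch in plain Python, assuming stride 1 and no padding; the loop order and variable names are ours and are not taken from the Chimera code.

```python
def conv2d_six_loops(inputs, weights):
    """Direct convolution, stride 1, no padding.

    inputs:  [C][H][W] input activations
    weights: [K][C][R][S] filter weights
    returns: [K][H-R+1][W-S+1] output activations
    """
    C, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    K, R, S = len(weights), len(weights[0][0]), len(weights[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    outputs = [[[0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):                      # output channels
        for y in range(out_h):              # output rows
            for x in range(out_w):          # output columns
                for c in range(C):          # input channels
                    for r in range(R):      # filter rows
                        for s in range(S):  # filter columns
                            outputs[k][y][x] += (
                                inputs[c][y + r][x + s] * weights[k][c][r][s]
                            )
    return outputs


# One 3x3 input channel and one 2x2 filter that sums the main diagonal.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
diag_filter = [[[[1, 0], [0, 1]]]]
result = conv2d_six_loops(image, diag_filter)
print(result)
```

Roughly speaking, the architectural questions in the paragraph above are about how these loops are reordered, tiled and spread across PEs, and the weight-, output- and row-stationary options differ in which operand stays put in a PE while the others stream past.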

They used Interstellar to optimize architecture. This is an open-source tool for design space exploration of CNN accelerator architectures, also from Stanford. I think this is pretty cool. They input a neural net basic spec (layers in network and tensor sizes), a range of memory sizes to explore, along with cost info for a MAC, a register file and a memory. Based on this input, Interstellar told them they should use a 16×16 systolic array with a 9-wide vector inside each PE. They needed a 16KB input buffer, no weight buffer and a 32KB accumulation buffer. And many more details!

Implementation

The Chimera team used Catapult to implement the accelerator, which they were able to accomplish in 2-3 months. This was a timeframe they reasonably argued would not have been possible if they were implementing in RTL. They also stressed another advantage: they made heavy use of C++ templates to parametrize much of the implementation, which simplified adjusting implementation details, from buffer sizes to how weights were distributed to reduce wiring congestion. This level of parametrization also made it easy to reuse the implementation for follow-on designs.

There’s a nice description of the verification flow. All the test development was at the C++ level, allowing for fast testing; a 10 second simulation in C++ versus a 1-hour parallelized simulation in RTL. (Catapult also generated the infrastructure to map this to RTL testing.) They caught almost all bugs at C++ and could experiment with design tweaks given the fast turn-around. This also allowed them to verify training, requiring many samples to run through the design. C++-based simulation made this possible.

An interesting bottom line to this work is that they implemented Chimera in 40nm (I’m guessing for the RRAM support?). A comparison SoC, implemented in 16nm, shows higher core energy and about the same energy and latency/image. Not bad! All in all, a useful validation from an obviously credible academic research source. You can watch the session HERE.


Seeing 1/f noise more accurately

by Don Dingee on 06-15-2022 at 10:00 am

Decimation chain speeds up measurements for 1/f noise

Electronics noise is often described as “white,” spread evenly across a band, typical on older semiconductor processes where thermal and shot noise dominate. As transistors shrink, “pink” 1/f noise takes over at low frequencies – becoming stronger in advanced processes and quantum computing technology. But it’s not an easy thing to characterize. Measurement time is bound by slow sampling at low frequencies, while other noise sources factor in across wider device bandwidths. Now, there’s a new approach to seeing 1/f noise more accurately.

The shape of noise depends on its source

In frequency domain over a narrow bandwidth, white noise may look flat – thus the term noise floor. But stretch the bandwidth out from near-DC to high frequencies, and noise takes on a shape, with different contributions from different types of noise. Contributors include:

  • Thermal noise comes from the Brownian motion of electrons through resistance, showing up as white noise extending across the analog bandwidth.
  • Shot noise happens when electrons flow discontinuously between semiconductor P-N junctions, adding to the white noise profile.
  • Random telegraph noise (RTN), or burst noise, comes from small voltage or current transitions due to charge trapping. RTN shows up with 1/f² power spectral density at near-DC frequencies, sometimes called “red” noise.
  • 1/f noise is also caused by charge trapping and is usually more prominent than other sources if manufacturing process quality is high. While strongest at near-DC frequencies, 1/f noise can add noise energy up to a corner frequency.
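To make the idea of a corner frequency concrete, here is a minimal sketch of a common textbook simplification: a flat white floor plus a 1/f term, with the corner defined as the frequency where the two contributions are equal. It ignores RTN and any device-specific behavior, and the numeric values are hypothetical, chosen only to make the shape visible.

```python
def noise_psd(freq_hz: float, white_floor: float, corner_hz: float) -> float:
    """White floor plus 1/f term; the two are equal at the corner frequency."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return white_floor * (1.0 + corner_hz / freq_hz)


# Hypothetical device: normalized white floor of 1.0 and a 1 kHz corner.
for freq in (10, 100, 1_000, 10_000, 100_000):
    print(f"{freq:>7} Hz  PSD = {noise_psd(freq, 1.0, 1_000.0):.2f}")
```

Well below the corner the 1/f term dominates and the density falls by a factor of ten per decade; well above it the curve flattens onto the white floor.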

Additionally, two external noise sources can affect 1/f noise measurements. The first is chamber noise, usually optimized by probe station manufacturers. The second is environmental noise, from sources such as AC power line noise, ground loops in cabling, and nearby equipment – all mitigatable through electromechanical best practices.

Decimation chain speeds up 1/f noise measurements

An OEM’s system noise floor profile affects metrics like signal-to-noise ratio (SNR), receiver sensitivity, and error vector magnitude (EVM). For a low system noise floor, low 1/f noise components are a must. Both passive and active device manufacturers can turn to a low frequency noise analyzer for characterizing parts, improving their performance, and extracting device models for customer use in system simulations.

1/f noise measurement has been a different process from wideband noise measurement. One reason is the sheer amount of time a 1/f measurement takes at low frequencies. A good measurement requires a lot of samples and averaging, and as analysis frequencies drop, sample times turn into minutes or longer. This pushed many manufacturers into two different noise measurement instruments, one for low frequencies, one for a wider analog bandwidth. But this approach has twice the test setups and disparate test data from instruments with different noise floors, settings, and algorithms.

One instrument, one test setup, and complete noise results from near-DC to the maximum analog bandwidth sounds great. The question is, how? Keysight looked at the problem differently and came up with the idea of a decimation chain. In short, it takes one set of samples and decimates down to the lowest frequency band. Instead of resampling data for higher frequency bands, it reuses the same samples and runs decimation, FFTs, and averaging on bands in parallel. The result is a solid 1/f noise measurement with major time savings and no reduction in quality.
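The general idea of reusing one capture across frequency bands can be sketched in a few lines. This is not Keysight's algorithm: the decimation factor of 2, the pair-averaging filter, the segment length and the sequential processing are all our simplifying assumptions, and a real instrument would use proper anti-alias filters and FFTs rather than the naive DFT used here for self-containment.

```python
import cmath
import math
import random


def decimate_by_2(samples):
    """Average adjacent pairs (a crude anti-alias filter) and keep one value per pair."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples) - 1, 2)]


def build_chain(samples, sample_rate_hz, stages):
    """Return [(rate, samples), ...], each stage at half the rate of the previous one."""
    chain = [(sample_rate_hz, samples)]
    for _ in range(stages - 1):
        rate, data = chain[-1]
        chain.append((rate / 2, decimate_by_2(data)))
    return chain


def averaged_spectrum(samples, segment_len):
    """Average the DFT power of consecutive non-overlapping segments."""
    n_segments = len(samples) // segment_len
    power = [0.0] * (segment_len // 2)
    for seg in range(n_segments):
        block = samples[seg * segment_len:(seg + 1) * segment_len]
        for k in range(segment_len // 2):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / segment_len)
                        for n, x in enumerate(block))
            power[k] += abs(coeff) ** 2 / n_segments
    return power


random.seed(0)
# One capture, reused by every band instead of resampling per band.
capture = [random.gauss(0.0, 1.0) for _ in range(1024)]
chain = build_chain(capture, sample_rate_hz=1024.0, stages=3)
for rate, data in chain:
    spectrum = averaged_spectrum(data, segment_len=32)
    print(f"rate {rate:7.1f} Hz  bin width {rate / 32:5.1f} Hz  "
          f"segments averaged {len(data) // 32}")
```

Each stage of the chain sees the same underlying data at half the sample rate, so with a fixed segment length the lower stages resolve finer frequency bins near DC while the upper stages cover the wide band. In the approach described above those bands are processed in parallel; the sketch simply loops over them.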

Turnkey full wafer-level characterization

One core feature of the Keysight E4727B A-LFNA is its ability to see very low noise in devices, such as MOSFET linear region noise. With a system noise floor of 1e-28 A²/Hz and a 1/f noise corner frequency of 15 Hz, plus a maximum analog bandwidth of 100 MHz, the E4727B improves noise characterization. Combining a low noise floor with a wider analysis bandwidth and faster measurement speed makes the E4727B a world-class solution.

When paired with the Keysight WaferPro Express measurement platform and a wafer probing solution like the CM300xi-ULN from FormFactor, the E4727B A-LFNA provides turnkey measurement of DC characteristics, 1/f noise and RTN, and data analysis. Applications include mass characterization of noise specifications for devices on-wafer before packaging, and manufacturing statistical process control.

Another important application is developing process design kits (PDKs). Many firms designing low noise semiconductors are fabless, turning to foundries for wafer fabrication services. The transistors in those foundry processes aren’t under designer control – they’re offered as part of libraries chip designers can choose from. Understanding their characteristics is vital to design success. Adding the Keysight PathWave Model Builder (MBP) or PathWave Device Modeling (IC-CAP) to the suite automates 1/f noise model parameter extraction and model library generation. A foundry can characterize transistors and other primitives in their processes and make 1/f noise data and models available for customers during their evaluation and design workflow.

Seeing 1/f noise more accurately takes on greater importance as transistors get smaller and faster on advanced process nodes. The idea that wafer-level characterization can screen large numbers of devices quickly is powerful. Plus, savings in measurement speed and test setups from using a Keysight E4727B A-LFNA translate to more wafer throughput.


Truechip’s Network-on-Chip (NoC) Silicon IP

by Kalar Rajendiran on 06-14-2022 at 10:00 am

Truechip NoC Silicon IP Block Diagram

Driven by the need to rapidly move data across a chip, the NoC IP is already a very common structure for moving data within an SoC. Various implementations of the NoC IP are available in the market depending on the end system requirements. Over the last few years, the RISC-V architecture and the TileLink interface specification have been gaining broad adoption. While the TileLink specification was originally developed to work with the RISC-V architecture, it actually supports other instruction set architectures (ISAs) too. The conjunction of these trends has created a need for a NoC IP to work with the TileLink protocol.

A recent SemiWiki post discussed the DisplayPort VIP solution from Truechip, an IP company that has been serving customers for more than a decade. While Truechip has established itself as a global provider of verification IP (VIP) solutions, they are always on the lookout for strategic IP needs from their customer base. Truechip has seized the above strategic NoC IP opportunity to develop a design IP targeting RISC-V based chips supporting the TileLink interface specification. Since its introduction to the market last year, this IP has been gaining a lot of adoption within Truechip’s customer base. While this is their first design IP addition to their product offering, we can expect to see more strategic additions in the future.

Truechip’s NoC Silicon IP

Truechip’s NoC silicon IP’s target applications are RISC-V based chip system implementations leveraging the TileLink specification. The IP provides chip architects and designers with an efficient way to connect multiple TileLink based master and slave devices for reduced latency, power, and area. And of course, it helps reduce physical interconnect routing and use of resources inside an SoC. The solution is offered in native Verilog. Truechip’s unique RTL coding technique has yielded a high quality IP that offers low latency, high throughput and takes very little silicon area. While the current version supports the TileLink Uncached Lightweight (TL-UL) and TileLink Uncached Heavyweight (TL-UH) conformance levels, the next version will include support for TL-C (cache coherency) conformance level.

Some Salient Features

  • Supports N master and M slave ports as per customer requirements
  • Supports wide range of memory map
  • Supports both little endianness and big endianness
  • Supports both the TL-UL and TL-UH conformance levels
  • Supports all TileLink networks that follow a directed acyclic graph (DAG)
  • Supports configurable widths of various parameters of data and address bus
  • Supports all types of operations per conformance levels
    • Access
    • Hint
    • Transfer
  • Can work as any node of a graph tree
    • Nothing
    • Trunk
    • Tip (with no branches)
    • Tip (with branches)
    • Branch
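The DAG requirement in the feature list is easy to check on a proposed topology before integration. The sketch below uses Kahn's topological-sort algorithm on a list of master-to-slave links; the node names and the `is_dag` helper are hypothetical illustrations and are not part of Truechip's deliverables or any TileLink tooling.

```python
from collections import deque


def is_dag(nodes, links):
    """Return True if the directed links between nodes contain no cycle."""
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in nodes}
    for src, dst in links:
        downstream[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == len(nodes)


# Hypothetical topology: two masters, one NoC, two slaves.
nodes = ["cpu0", "cpu1", "noc", "dram", "uart"]
links = [("cpu0", "noc"), ("cpu1", "noc"), ("noc", "dram"), ("noc", "uart")]
print(is_dag(nodes, links))                         # acyclic topology
print(is_dag(nodes, links + [("uart", "cpu0")]))    # adding a back edge creates a cycle
```

A check like this could run as part of an integration script whenever the port map changes, flagging a loop before it shows up as a deadlock in simulation.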

Deliverables

  • NoC Silicon IP in RTL form
  • Testbench and Sanity Tests
  • User Manual and Integration guide
  • Quick start guide
  • TruEye Tool for debug (optional)
  • Full Verification IP for TileLink (optional)

About Truechip

Truechip, the Verification IP specialist, is a leading provider of Design and Verification solutions. It has been serving customers for more than a decade. Its solutions help accelerate the design cycle, lower the cost of development and reduce the risks associated with the development of ASICs, FPGAs, and SoCs. The company has a global footprint with sales coverage across North America, Europe and Asia. Truechip provides the industry’s first 24×7 support model with specialization in VIP integration, customization and SoC Verification.

For more information, refer to the Truechip website.

Also Read:

LIDAR-based SLAM, What’s New in Autonomous Navigation

Die-to-Die IP enabling the path to the future of Chiplets Ecosystem

Very Short Reach (VSR) Connectivity for Optical Modules


A Different Perspective: Ansys’ View on the Central Issues Driving EDA Today

by John Lee on 06-14-2022 at 6:00 am

RedHawk SC uses Ansys SeaScape Big Data Platform Designed for EDA Applications

For the past few decades, System-on-Chip (SoC) has been the gold standard for optimizing the performance and cost of electronic systems. Pulling together practically all of a smartphone’s digital and analog capabilities into a monolithic chip, the mobile application processor serves as a near-perfect example of an SoC. But today’s leading integrated circuits (IC) are pushing up against the upper limit of a chip’s physical size, which is limited by the manufacturing equipment’s optical reticle size. This has proven difficult to increase and has grown only slowly over the years. Yet market pressure continues unabated for bigger, more capable electronic systems with more integrated memory, more digital logic, and more analog/mixed signal circuitry. This tension is driving some significant business and technology trends in EDA that will reshape the market in the coming years.

The Four Engines Driving Semiconductor Design
The road forward has plenty of challenges and we are seeing design companies making significant efforts to adapt and come to grips with the following four technology and market drivers:

  • The requirement for concurrent multiphysics analysis to ensure reliable and efficient electronic systems
  • The blurring of the lines between chip, package, and system
  • The need for open, extensible, and inclusive platforms that interoperate with the full range of tools required to solve today’s multiphysics designs
  • Bespoke silicon as the major driver for EDA at hyperscalers and system companies

Blurring of Silicon and System Design
The advent of 3D-IC opens up new horizons for solutions that can be implemented in silicon. But it also forces a closer integration between three distinct technology markets that have co-existed symbiotically for many decades: IC design, package design, and printed circuit board (PCB) design. These markets use different tools, different data formats, different manufacturing back-ends, operate at different computational and geometric scales, and focus on different physical concerns. Yet, emerging 2.5D/3D-IC technology combines many aspects of all three: It features heterogeneous silicon die but also board-like substrates and interposers that stitch the chips together. The collapse of all this expertise into a single project is requiring companies to re-imagine their design capabilities and flows, as well as their organizational structure.

Open, Extensible, Multiphysics Platforms
The siloed isolation of chip design from PCB design and package design means that each of these markets has developed insular data structures that are ill-suited to deal with the breadth of multiphysics analysis for 3D-IC design. Many different physical disciplines – including computational fluid dynamics, mechanical stress, and electromagnetic radiation – are all needed to solve the multiphysics challenge. No one company offers the entire range of required tools, so we see the need for open multiphysics platforms that allow easy data exchange and tool integration. A crucial factor for advanced users is the ability to customize their design flow around these platforms with popular extension languages like Python. And, finally, there is the issue of tool capacity to handle the enormous size of modern silicon systems. EDA platforms must embrace the modern cloud compute paradigm that enables realistic analysis in a time of relevance.

Bespoke Chips
Today’s market-leading companies are heavily dependent on technology for their continued success and market differentiation. Silicon systems are now so powerful and central that their performance can shift the needle for entire business divisions. Everybody from online retailers to telecommunications to social networking companies and hyperscalers are moving away from off-the-shelf solutions and turning to custom-built silicon to give them an edge. Many of these companies are seeking to gain market share by leveraging proprietary AI/ML algorithms trained on their extensive troves of market data – but this requires yet greater amounts of compute power and specialized chips. Access to high-quality silicon solutions is vital in today’s world and there is strong demand for continually more complex and powerful electronics.

3D-IC an Inflection Point in Electronic Design
3D-IC design is recognized as an inflection point in electronic design and presents major challenges that are realigning the electronic design industry around this new reality.

The key technology breakthrough of 3D-IC is that it makes it possible to spread a system out over multiple chips – moving the industry away from the traditional monolithic SoC approach. By abandoning the need to integrate an entire system on a single SoC and instead allowing it to be disaggregated over multiple chips, 3D-IC enables Moore’s Law to break through the reticle size barrier, improves yield by shrinking the size of individual chips, and makes it possible to mix different process technologies optimized for each function.

Summary
The four trends outlined above are deeply interconnected and mutually reinforcing. We believe that they give a perspective for EDA innovation over the coming years and show a path forward for all stakeholders in the electronic design market to align their development priorities to take advantage of the incredible technical opportunities that are available to us.

About John Lee
John Lee is general manager and vice president of the Ansys Electronics and Semiconductor Business Unit. Lee co-founded and served as CEO of Gear Design Solutions (now Ansys), developer of the first purpose-built big data platform for integrated circuit design. He cofounded two other startups (Mojave Design and Performance Signal Integrity), which successfully exited into companies now part of Synopsys. He holds undergraduate and graduate degrees from Carnegie Mellon University.

Also Read:

Unlock first-time-right complex photonic integrated circuits

Take a Leap of Certainty at DAC 2022

Bespoke Silicon is Coming, Absolutely!


Podcast EP86: Negative Outlook for the Semiconductor Industry with Malcolm Penn

by Daniel Nenni on 06-13-2022 at 10:00 am

Dan is joined by Malcolm Penn, founder and CEO of Future Horizons, a firm that provides industry analysis and consulting services on the global semiconductor industry.

Dan and Malcolm discuss the current and future state of the semiconductor industry. What has driven the cyclic nature of the business and are we doomed to repeat these cycles? Will the industry shrink or grow over the next few years, and what are the factors that will shape these outcomes?

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Intel 4 Deep Dive

by Scotten Jones on 06-13-2022 at 6:00 am

Figure 1

As I previously wrote about here, Intel is presenting their Intel 4 process at the VLSI Technology conference. Last Wednesday Bernhard Sell (Ben) from Intel gave the press a briefing on the process and provided us with early access to the paper (embargoed until Sunday 6/12).

“Intel 4 CMOS Technology Featuring Advanced FinFET Transistors optimized for High Density and High-Performance Computing,”

The first thing I want to discuss is the quality of this paper. This paper is an excellent example of a well written technical paper describing a process technology. The paper includes the critical pitches needed to judge process density, the performance data is presented on plots with real units and the discussion provides useful information on the process. I bring this up because at IEDM in 2019 TSMC published a paper on their 5nm technology that had no pitches, and all the performance plots were normalized without real units. In my view that was a marketing paper not a technical paper. At the conference press luncheon, I asked the organizing committee if they considered rejecting the paper due to the lack of content and they said they had but ultimately decided 5nm was too important.

Intel has disclosed a roadmap for the next four nodes (Intel 4, 3, 20A, and 18A) with dates, device types, and performance improvement targets. They are now filling in more detail on Intel 4. In contrast, Samsung is in risk starts on its 3nm process and has disclosed PPA (Power, Performance and Area) targets but no other details; for 2nm it has disclosed that it will be its third-generation Gate All Around (GAA) technology, due in 2025, but no performance targets. TSMC has disclosed PPA for 3nm, which is currently in risk starts, and for 2nm a risk-start date has been disclosed but no information on performance or device type.

Intel 4 Use Target

Before getting into the details on Intel 4, I want to comment on the target for this process. As we went through the details it became clear this process is targeted at Intel internal use to manufacture compute tiles; it is not a general-use foundry process. Intel 4 is due late this year and Intel 3 is due next year; Intel 3 is the focus for Intel Foundry Services. Specifically, Intel 4 does not have I/O fins because they are not needed on a compute tile that is going to communicate solely with other chips on a substrate. Intel 4 also offers only high-performance cells and does not have high-density cells. Intel 3 will offer both I/O fins and high-density cells as well as more EUV use and better transistors and interconnect. Intel 3 is designed to be an easy port from Intel 4.

Density

Anyone who has read my previous articles and comparisons knows I put a lot of emphasis on density. In figure 1 of the Intel 4 paper, Intel discloses critical pitches for Intel 4 and compares them to Intel 7; see figure 1.

Figure 1. Intel 4 Versus 7 Pitches.

The high-performance cell height (CH) is 408nm for Intel 7 and 240nm for Intel 4. The Contacted Poly Pitch (CPP) is 60nm for Intel 7 and 50nm for Intel 4. The product of CH and CPP is therefore 24,480nm² for Intel 7 and 12,000nm² for Intel 4, providing an ~2x density improvement for high-performance cells. Intel 4 also provides a 20% performance-per-watt improvement versus Intel 7, and high-density SRAMs are scaled by 0.77x.
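The CH x CPP arithmetic above can be reproduced with a short sketch. The dictionary layout and function name are illustrative assumptions, not anything from Intel's paper; only the pitch and cell-height values come from the text.

```python
# Cell footprint = cell height (CH) x contacted poly pitch (CPP), in nm^2.
# CH and CPP values are the ones quoted in the text; names are illustrative.
CELLS_NM = {
    "Intel 7": {"ch": 408, "cpp": 60},
    "Intel 4": {"ch": 240, "cpp": 50},
}


def cell_footprint(ch_nm, cpp_nm):
    """Return the CH x CPP product in nm^2."""
    return ch_nm * cpp_nm


footprints = {
    name: cell_footprint(cell["ch"], cell["cpp"])
    for name, cell in CELLS_NM.items()
}

# A smaller footprint means more cells per unit area, so the density
# gain is the ratio of the old footprint to the new one.
density_gain = footprints["Intel 7"] / footprints["Intel 4"]

for name, area in footprints.items():
    print(f"{name}: {area:,} nm^2")
print(f"Intel 7 -> Intel 4 density gain: {density_gain:.2f}x")
```

The ratio comes out at 2.04x, which is where the "~2x" figure comes from. Note that this compares high-performance cells only; it says nothing about high-density cells, which Intel 4 does not offer.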

To put this density improvement in context it is useful to better understand Intel’s recent process progression. Figure 2 summarizes four generations of Intel’s 10nm process.

Figure 2. Intel 10nm Generations.

IC Knowledge has a strategic partnership with TechInsights; we believe them to be the best in the world at construction analysis of leading-edge semiconductors. TechInsights first analyzed Intel 10nm in July 2018 and refers to this as generation 1. TechInsights completed another 10nm analysis in December 2019, finding the same density but a different fin structure, leading them to refer to this as generation 2. In January 2021 TechInsights analyzed the 10nm Super Fin parts that offer a 60nm CPP option for performance along with the original 54nm CPP (generation 3). Finally, in January 2022 TechInsights analyzed a 10nm enhanced Super Fin part, what Intel now calls Intel 7 (10nm generation 4). One interesting thing to me about the result of the Intel 7 analysis is that TechInsights only found 60nm CPP in the logic area, no 54nm CPP, and taller cells.

My policy for characterizing process density is to base it on the densest cell available on the process. For Intel 7 a 54nm CPP cell 272nm high is “available” but not used, and the 408nm high cell with a 60nm CPP yields a transistor density of ~65 million transistors per square millimeter (MTx/mm²) versus ~100 MTx/mm² for earlier generations. So how do we place Intel 4 versus prior-generation processes and the forthcoming Intel 3 process? See figure 3.

Figure 3. Intel Density Comparison.

In figure 3 I have presented high-density and high-performance cell density separately. Intel 4 is ~2x the high-performance cell density of Intel 7, as Intel has disclosed. Intel 3 is supposed to have “denser” libraries versus Intel 4. If I assume the same pitches but a smaller track height for Intel 3, I get ~1.07x denser high-performance cells and ~1.4x denser high-density cells versus Intel 10/7.

Another interesting comparison is Intel 4 high-performance cell size versus TSMC high performance cell sizes for 5nm and 3nm, see figure 4.

Figure 4. Intel 4 versus TSMC N3 and N5 High-Performance Cells.

TSMC N5 has a 51nm CPP and 34nm M2P with a 9.00-track high-performance cell that yields a 306nm CH and a 15,606nm² CPP x CH. We believe TSMC N3 has a 45nm CPP and 28nm M2P, and for a 9.00-track high-performance cell that yields a CH of 252nm and a CPP x CH of 11,340nm². For Intel 4 the CPP is 50nm and M2P is 45nm (disclosed in the briefing although not in the paper); this yields a track height of only 5.33 for the quoted 240nm CH and a CPP x CH of 12,000nm². These values are consistent with a 4 designation since it slots between N5 and N3 for the leading foundry company TSMC, although it is closer to TSMC N3 than TSMC N5. We also believe Intel 4 will have performance slightly better than TSMC N3. I didn’t include Samsung in Figure 4, but based on my current estimates Intel 4 is denser than Samsung GAE3. Samsung may have a small performance advantage over Intel 4 and TSMC N3, but Intel 3 should surpass both Samsung GAE3 and TSMC N3 for performance next year.

I am surprised that Intel’s high-performance cell works out to just over 5-tracks in height but that is the math for the disclosed cell height and M2P.
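The track-height math can be checked with a minimal sketch. It assumes, as the comparison above does, that cell height equals M2 pitch (M2P) times the number of tracks; the helper names are illustrative, and the TSMC N3 pitches are the author's estimates rather than disclosed values.

```python
# Cell height (CH) = M2 pitch (M2P) x track count, so tracks = CH / M2P.
# All pitch values are taken from the comparison in the text.

def cell_height_nm(m2p_nm, tracks):
    """Cell height implied by a metal-2 pitch and a track count."""
    return m2p_nm * tracks


def track_count(ch_nm, m2p_nm):
    """Track count implied by a cell height and a metal-2 pitch."""
    return ch_nm / m2p_nm


# TSMC: CH derived from M2P and a 9-track high-performance cell.
n5_ch = cell_height_nm(34, 9)
n3_ch = cell_height_nm(28, 9)   # N3 pitches are estimates, not disclosed

# Intel 4: track count derived from the quoted 240nm CH and 45nm M2P.
intel4_tracks = track_count(240, 45)

# CPP x CH footprint in nm^2 for each high-performance cell.
footprint_nm2 = {
    "TSMC N5": 51 * n5_ch,
    "TSMC N3": 45 * n3_ch,
    "Intel 4": 50 * 240,
}

print(f"Intel 4 track height: {intel4_tracks:.2f}")
for name, area in sorted(footprint_nm2.items(), key=lambda kv: kv[1]):
    print(f"{name}: {area:,} nm^2")
```

Sorting by footprint puts Intel 4 between N3 and N5, with a gap of 660nm² to N3 and 3,606nm² to N5, which is the basis for saying it sits closer to N3.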

DTCO

From a Design-Technology-Co-Optimization (DTCO) perspective Intel 4 has 3 improvements over Intel 7:

  1. Contact Over Active Gate is optimized for Intel 4.
  2. Diffusion break by dummy gate removal used to need two dummy gates (double diffusion break); Intel 7 went to 1 (single diffusion break).
  3. The n to p spacing used to be two fin pitches and is now 1 fin pitch. When we talk about CH in terms of M2P and tracks it is easy to forget that the devices have to fit into that same height and figure 5 illustrates how n to p spacing contributes to cell height.

Figure 5. Cell Height (CH) Scaling.

During the briefing Q&A there was a question about cost per transistor and Ben said that cost per transistor went down for Intel 4 versus Intel 7.

Performance

Intel 10/7 offered 2-threshold-voltage (2 PMOS and 2 NMOS = 4 total) and 3-threshold-voltage (3 PMOS and 3 NMOS = 6 total) versions. Intel 4 provides 4 threshold voltages (4 PMOS and 4 NMOS = 8 total). This results in ~40% lower power and ~20% higher performance.

I believe the drive current values mentioned during the briefing are 2mA/μm for PMOS and 2.5mA/μm for NMOS.

EUV usage

EUV is used in both the back end and front end of the process. Intel has focused EUV use on where a single EUV exposure can replace multiple immersion exposures. Even though an EUV exposure is more expensive than an immersion exposure, replacing multiple immersion exposures and their associated deposition and etch steps can save cost and improve cycle time and yield. In fact, Ben mentioned that single EUV exposures resulted in 3-5x fewer steps in the sections that EUV replaced. Going from Intel 7 to Intel 4 there is a reduction in masks and step count. In the front end of line, EUV is focused on replacing complicated cuts, gate or contact. Intel didn’t explicitly disclose that EUV is used in fin patterning, but we believe that for Intel 7 fin patterning involved a mandrel mask (Intel calls this a grating mask) and 3 cut masks (Intel calls these collection masks). For Intel 4 this could easily have transitioned to 4 cut masks. Without naming the layer, Intel mentioned replacing 4 cut masks with a single EUV mask, and we believe this could be where that happens.

In the paper Intel mentions that M0 is quadruple patterned. For Intel 10/7 Intel also disclosed quadruple patterning, and TechInsights analysis showed that 3 block masks were needed. It is possible that Intel 4 would need 4 block masks for M0, and this may be another place where EUV eliminates 4 cut/block masks.

A gridded layout was used for interconnect to improve yield and performance.

We believe there are ~12 EUV exposures used in this process, but this was not disclosed by Intel.

Interconnect

It is well known that Intel went to cobalt (Co) for M0 and M1 at 10nm. Co offers better electromigration resistance than copper (Cu) but higher resistance (author’s note: electromigration resistance of a metal is proportional to melting point). For Intel 4, Intel has gone to an “enhanced” Cu scheme where pure Cu is encased in Co (in the past Intel doped the Cu). A typical flow to encapsulate Cu in Co is to put down a barrier layer with a Co layer to serve as the seed for plating. Once plating is complete and planarized to form an interconnect, the Cu is capped with Co. This process results in slightly degraded electromigration resistance versus Co, but still above the 10-year lifetime goal, and the resistance of the line is reduced. In fact, even though the interconnect lines are narrower for Intel 4 versus Intel 7, the RC values are maintained.

The process has 5 enhanced copper layers, 2 giant metal layers and 11 “standard” metal layers for a total of 18 layers.

MIM caps

With the increasing importance of power delivery, Metal-Insulator-Metal (MIM) capacitors are used to reduce power swings and have undergone continuous improvement. For Intel’s 14nm process 37 fF/μm² was achieved; this improved to 141 fF/μm² for 10nm and 193 fF/μm² for Intel 7, and has now been increased ~2x to 376 fF/μm² for Intel 4. Higher values enable MIM capacitors with more capacitance, improving power stability without taking up excess space.
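To make the MIM progression concrete, the sketch below computes the generation-over-generation ratios from the densities quoted above. The 1 nF decoupling target used for the area comparison is a hypothetical figure chosen only for illustration; it is not from the paper or the briefing.

```python
# MIM capacitor density by process, in fF/um^2, as quoted in the text.
MIM_DENSITY_FF_PER_UM2 = [
    ("14nm", 37),
    ("10nm", 141),
    ("Intel 7", 193),
    ("Intel 4", 376),
]


def generation_ratios(points):
    """Return (older, newer, ratio) for each consecutive pair of nodes."""
    return [
        (prev[0], cur[0], cur[1] / prev[1])
        for prev, cur in zip(points, points[1:])
    ]


def area_um2(capacitance_ff, density_ff_per_um2):
    """Plate area needed to reach a given capacitance."""
    return capacitance_ff / density_ff_per_um2


ratios = generation_ratios(MIM_DENSITY_FF_PER_UM2)
for older, newer, ratio in ratios:
    print(f"{older} -> {newer}: {ratio:.2f}x")

# Hypothetical target: 1 nF of decoupling capacitance (1 nF = 1e6 fF).
TARGET_FF = 1_000_000
areas = {name: area_um2(TARGET_FF, d) for name, d in MIM_DENSITY_FF_PER_UM2}
for name, area in areas.items():
    print(f"{name}: {area:,.0f} um^2 for 1 nF")
```

The Intel 7 to Intel 4 step works out to roughly 1.95x, matching the "~2x" in the text, and the same capacitance needs about half the area it did on Intel 7.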

Where they went wrong

During the Q&A Ben was asked where Intel went wrong in the past. He said that Intel had tried to do too much at once (author’s note: for example, Intel 22nm to 14nm was a 2.4x density increase and then 14nm to 10nm was a 2.7x density increase, see figure 3). Intel has now adopted a modular approach where you can separately develop modules and deliver more performance, more quickly.

When asked what he was most proud of, he said achieving yield and performance with library scaling, and that the process looks good in the factories. The process is simpler with EUV, which improves yield and reduces registration issues.

Production sites

Also during the Q&A Ben was asked about production sites. He said initial production will be in Hillsboro followed by Ireland. He said they haven’t disclosed additional production plans beyond that.

In our own analysis of EUV availability we have published here that EUV exposure tools will be in short supply for the next few years. This is also consistent with Pat Gelsinger discussing tool shortages for Intel’s new fabs. We believe EUV tool availability will gate Intel’s fab ramp. Furthermore, we believe Intel has ~10 to 12 EUV tools presently, and until recently they were all in Hillsboro. One of those tools has now been moved to Fab 34 in Ireland, and we believe that as Intel receives further EUV tools this year they will be able to ramp Fab 34 up. Late this year we expect Fab 38 in Israel to begin ramping, and our belief is that it will be the next Intel 4/3 production site. Following that, in the later part of 2023, Fabs 52 and 62 in Arizona should start receiving EUV tools. We also believe most of this capacity will be needed for Intel’s own internal use and they will have limited EUV-based foundry capacity until the 2024/2025 timeframe.

Yield and Readiness

Throughout the briefing everything we heard about yield is that it is “healthy” and “on schedule”. Meteor Lake compute tiles are up and running on the process. The process is ready for product in the second half of next year.

Conclusion

I am very impressed with this process. The more I compare it to offerings from TSMC and Samsung the more impressed I am. Intel was the leader in logic process technology during the 2000s and early 2010s before Samsung and TSMC pulled ahead with superior execution. If Intel continues on-track and releases Intel 3 next year they will have a foundry process that is competitive on density and possibly the leader on performance. Intel has also laid out a roadmap for Intel 20A and 18A in 2024. Samsung and TSMC are both due to introduce 2nm processes in the 2024/2025 time frame and they will need to provide significant improvement over their 3nm processes to keep pace with Intel.

Also Read:

An Update on In-Line Wafer Inspection Technology

0.55 High-NA Lithography Update

Intel and the EUV Shortage