Synopsys Enables AI Advances with UALink
by Mike Gianfagna on 08-28-2025 at 6:00 am

The evolution of hyperscale data center infrastructure to support the processing of trillions of parameters for large language models has created some rather substantial design challenges. These massive processing facilities must scale to hundreds of thousands of accelerators with highly efficient and fast connections. How to deliver fast memory access to all those GPUs and how to efficiently move large amounts of information across the data center are two significant problems to be solved.

It turns out Synopsys has been working on critical enabling IP technology in this area. The company has announced multiple industry-first solutions. Synopsys also co-hosted a very informative discussion on the emerging UALink standard with several industry heavyweights. Links are coming, but first let's take a big-picture view of how Synopsys enables AI advances with UALink.

About the Standards

A bit of background is useful.

Ultra Accelerator Link, or UALink, is an open specification for die-to-die and serial bus communication between AI accelerators. It was co-developed by Alibaba, AMD, Apple, Astera Labs, AWS, Cisco, Google, Hewlett Packard Enterprise, Intel, Meta, Microsoft, and Synopsys. The UALink Consortium officially incorporated as an electronics industry consortium in 2024 to promote and advance UALink. This is another very large ecosystem effort, with over 100 companies participating.

An Informative Discussion

LinkedIn Event Speakers

Getting four industry experts together to discuss technical topics can be challenging. Coordinating the discussion so lots of very useful information is conveyed in 25 minutes is truly noteworthy.  That’s exactly what Synopsys achieved with AMD, Astera Labs and Moor Insights in a recent discussion posted on LinkedIn.

The event, entitled Designing the Future of AI Infrastructure with UALink, brought together four impressive technologists:

Matthew Kimball, VP & Principal Analyst, Moor Insights & Strategy
Kurtis Bowman, Director of Architecture and Strategy, AMD
Chris Petersen, Fellow of Technology and Ecosystems, Astera Labs
Priyank Shukla, Director of Product Management, Synopsys Inc

Chris Petersen led the discussion. A link is coming so you can watch the entire event. Here are some of the items that are discussed:

  • UALink aims to improve bandwidth, latency, and density. How does advanced SerDes fit into these scenarios?
  • How do customers perceive the UALink consortium? What are the benefits of this collaboration?
  • This is a very broad-based consortium, including many competing entities. How does such a system work? What are the dynamics of the group?
  • How does this work optimize memory access for massively parallel systems?
  • What are the key design challenges for advanced design centers and how does UALink address them?
  • What is the interaction between UALink, OCP, IEEE and UEC?
  • What will the timing be for product introductions based on this work?

A couple of comments from Priyank Shukla of Synopsys will give you a flavor of the discussion. Priyank discussed how the IEEE has expanded its communication specifications to facilitate the very high performance, low power, and low latency required for advanced AI systems. He explained that high-speed SerDes technology is the foundation for the critical physical-level link for all communication between any two chips that comprise a system, including GPUs, accelerators, and switches.

He went on to describe the massive ecosystem effort across many companies to implement the required standards and supporting technology. He pointed out that, “it takes a whole village to train a machine learning model.”  Priyank explained that designing these systems is so challenging, there is no one right answer. He stated that, “when over 100 companies come together, the best idea survives.”

There are many more topics covered. A link is coming.

Industry First UALink IP Solution for Scale-Up Networks

Synopsys announced industry-first IP solutions for both Ultra Ethernet and UALink implementations back in December of 2024. The announcement details advances for both scale-up and scale-out strategies.

The Synopsys UALink IP solution facilitates scaling up AI clusters to 1,024 accelerators. Designers can achieve 200 Gbps per lane and unlock memory sharing across accelerators. The solution also speeds verification with built-in protocol checks.

The Synopsys Ultra Ethernet IP solution facilitates scaling out networks to 1 million nodes. Silicon-proven Synopsys 224G PHY IP is available to achieve up to 1.6 Tbps of bandwidth, and designers can unlock ultra-low latency for real-time processing. You can learn more about the Synopsys 224G PHY IP on SemiWiki here, including characterization data and how the IP fits in these new standards.
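To put those headline numbers in perspective, here is a minimal back-of-the-envelope sketch. The lane and port counts are illustrative assumptions, not figures from the Synopsys announcement; only the 200 Gbps UALink lane rate, the 1,024-accelerator scale-up limit, and the 224G/1.6 Tbps Ultra Ethernet figures come from the text above.

```python
# Back-of-the-envelope bandwidth math for scale-up (UALink) and scale-out
# (Ultra Ethernet) ports. Lane and port counts are illustrative assumptions.

UALINK_LANE_GBPS = 200            # per-lane rate cited for the UALink IP
UE_PHY_LANE_GBPS = 224            # Synopsys 224G PHY lane rate
MAX_SCALE_UP_ACCELERATORS = 1024  # UALink scale-up pod size cited above

def port_bandwidth_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Raw aggregate bandwidth of one port built from N lanes."""
    return lanes * lane_rate_gbps

# Hypothetical 4-lane UALink port per accelerator
ualink_port = port_bandwidth_gbps(4, UALINK_LANE_GBPS)    # 800 Gbps
# Hypothetical 8-lane Ultra Ethernet port: 8 x 224G raw is ~1.8 Tbps,
# in the same ballpark as the 1.6 Tbps usable figure quoted above
ue_port = port_bandwidth_gbps(8, UE_PHY_LANE_GBPS)        # 1792 Gbps

# Aggregate scale-up bandwidth if every accelerator in a full pod contributes
# one such port (ignores switch topology, encoding, and protocol overheads)
pod_aggregate_tbps = MAX_SCALE_UP_ACCELERATORS * ualink_port / 1000

print(f"UALink port: {ualink_port:.0f} Gbps")
print(f"Ultra Ethernet port (raw): {ue_port:.0f} Gbps")
print(f"1,024-accelerator pod, one port each: {pod_aggregate_tbps:.0f} Tbps")
```

The exact numbers are not the point; the point is that per-lane rate, lane count, and pod size multiply quickly, which is why the SerDes and protocol layers discussed in the panel carry so much of the design burden.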

There is a lot of useful information in the announcement, including how Synopsys’ industry-leading communication IP has enabled more than 5,000 successful customer tapeouts.

Synopsys also has plenty of help scaling the HPC and AI accelerator ecosystem through collaboration with industry leaders, including AMD, Astera Labs, Juniper Networks, Tenstorrent, and XConn.

There are informative quotes from senior executives at Juniper, AMD, Astera Labs, Tenstorrent, and XConn, as well as Synopsys. The chairperson of the board at the UALink Consortium also weighs in.

To Learn More

You can read the complete product announcement, Synopsys Announces Industry’s First Ultra Ethernet and UALink IP Solutions to Connect Massive AI Accelerator Clusters here. The LinkedIn panel entitled Designing the Future of AI Infrastructure with UALink can be viewed here. You can learn more about the Synopsys Ultra Ethernet IP solution here. And you can learn more about the Synopsys UALink IP solution here. And that’s how Synopsys enables AI advances with UALink.

Also Read:

448G: Ready or not, here it comes!

Synopsys Webinar – Enabling Multi-Die Design with Intel

cHBM for AI: Capabilities, Challenges, and Opportunities


Revolutionizing Chip Packaging: The Impact of Intel’s Embedded Multi-Die Interconnect Bridge (EMIB)
by Admin on 08-27-2025 at 10:00 am

Intel Foundry Packaging Evolution 2025

In an era dominated by artificial intelligence (AI), machine learning (ML), and high-performance computing (HPC), the demand for semiconductors that deliver high data throughput, low latency, and energy efficiency has never been greater. Traditional chip designs often struggle to keep pace with these requirements, leading to bottlenecks in performance and scalability. Intel Foundry's Embedded Multi-Die Interconnect Bridge (EMIB) is a groundbreaking 2.5D interconnect technology that redefines chip packaging. In high-volume manufacturing since 2017, EMIB enables the seamless integration of multiple dies into a single package, enhancing performance, power efficiency, and design flexibility. This innovation addresses the limitations of monolithic chips by allowing heterogeneous integration, combining dies from different process nodes, without necessitating complete system redesigns.

At its core, EMIB employs small silicon bridges embedded within an organic substrate to facilitate high-bandwidth communication between adjacent dies. Unlike conventional approaches that rely on large silicon interposers, which embed multiple routing layers and require all signals and power vias to pass through them, EMIB uses a compact bridge with targeted microbump pitches. As illustrated in the technology brief, this design maintains a tight pitch only at the bridge interface, while the rest of the die-core region can retain a looser pitch, optimizing cost and efficiency. The manufacturing process aligns with standard semiconductor package-assembly flows, with the key difference being the substrate fabrication: bridges are placed in cavities, secured with adhesives, and layered with dielectrics and metals. This method not only reduces the footprint but also preserves input/output (I/O) signal integrity and power characteristics, avoiding the thermal and electrical challenges posed by full interposers.

EMIB’s advantages extend beyond its architecture. It supports high-data-rate signaling with simple driver/receiver circuitry, enabling customizable layouts for large, heterogeneous die complexes. Each die-to-die link can be optimized individually, tailoring bridges to specific interconnect needs, such as logic-logic or logic-high-bandwidth memory (HBM) communications. In response to evolving demands, Intel has expanded the EMIB portfolio. EMIB-M incorporates Metal Insulator Metal (MIM) capacitors into the bridges to improve power delivery, mitigating noise in high-power applications. Meanwhile, EMIB-T introduces through-silicon vias (TSVs) for vertical power delivery, enhancing compatibility with HBM and facilitating conversions from other packaging technologies. These variants achieve assembly yields comparable to standard flip-chip ball grid arrays (FCBGA), making them viable for high-volume production using both Intel and external silicon.

The true potential of EMIB unfolds when combined with Intel’s Foveros die-stacking technology, creating EMIB 3.5D—a hybrid architecture that merges 2.5D lateral bridging with 3D vertical stacking. This “system of chips” approach overcomes challenges like thermal warping, reticle size limits, and interconnect constraints, expanding the silicon surface area for complex systems. As depicted in the evolution from traditional wire-bond packages to advanced solutions, EMIB 3.5D balances package size, compute performance, power usage, and cost, making it ideal for AI accelerators and HPC workloads. By enabling disaggregated chiplet-based designs, it accelerates time-to-market and supports standards like UCIe for die-to-die interfaces.

Intel Foundry’s role as a pioneer in this space cannot be overstated. Through its Advanced System Assembly & Test (ASAT) division, it offers end-to-end solutions, including testing services and ecosystem partnerships for systems technology co-optimization (STCO). This shift from “system on chip” to “systems of chips” positions Intel at the forefront of the semiconductor industry’s transformation, fostering innovation in diverse sectors like servers, networks, and edge computing.

Bottom line: EMIB represents a paradigm shift in chip packaging, empowering designers to build more powerful, efficient, and scalable systems. As applications like AI continue to push boundaries, technologies like EMIB will be instrumental in driving progress, ensuring that computational demands are met with ingenuity and precision. With ongoing advancements, Intel Foundry is not just adapting to the future—it’s shaping it.

Also Read:

EMIB Technology Brief

Foveros Product Brief

UCIe 3.0 Wiki


Cocotb for Verification. Innovation in Verification
by Bernard Murphy on 08-27-2025 at 6:00 am

This time let’s see if we can stir up some lively debate. Cocotb isn’t new but it is an interesting alternative to mainstream testing methodologies. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month's pick is Analyzing AES Verification: A Comparative Study of UVM and Cocotb Approaches. The authors are from the PSG Institute of Technology and Applied Research in Tamil Nadu, India. The paper is from the 2024 ICSSEECC conference and has no citations so far.

I must apologize for this month’s selection, not up to our usual standards. That’s on me. Paul subsequently found a better paper on cocotb which we may consider for a later post.

UVM is so dominant in functional verification that to mention any other possibility seems to invite reflexive opposition. But an alternative needn't be a threat; it can add complementary value. UVM is certainly a fixture in production verification; however, there are other verification needs where a flexible platform with a quick learning curve may fit better. This paper looks at the relative strengths of UVM and Python-based cocotb in different contexts and finds pluses and minuses for both approaches.

Ultimately, familiarity for quick and easy testing in early RTL development may be the real advantage for cocotb, while still leaving heavy-duty verification to proven UVM flows.

Paul’s view

We wanted to give some airtime in our blog to cocotb (https://www.cocotb.org/). It's an open-source Python-based alternative to UVM for testbench design. A few of our customers are loyal advocates for it, claiming how much more productive they are coding in Python, and how much easier it is to ramp up new college grads using it. This month's paper compares UVM with cocotb for verifying an AES crypto block. The paper is light on details – it shows terrible coverage from their UVM bench, with no explanation why, and has no data on the relative runtime performance of UVM vs. cocotb.

Stepping back from the paper and looking at the big picture, a simulation testbench is a software program to generate stimulus and monitor/check design behavior (design outputs, internal registers, internal signals). It can be written in any software programming language. SystemVerilog/UVM and C/C++ are the most common, Python works, Specman-e still has a loyal following, and there’s a growing following for Portable Stimulus (PSS).

The main advantage I can see from SystemVerilog/UVM is not the language itself, but the significant investment commercial EDA vendors have made to natively integrate support for it with our logic simulators. We've tuned compiled UVM code for the best possible runtime performance, and we've built very sophisticated constraint solvers to pick pseudo-random values with the best possible distributions (an NP-complete Boolean SAT problem). To the extent that a cocotb benchmark might lose to UVM, it would probably come down to these reasons and nothing fundamental to cocotb itself.

Raúl’s view

As the title states, this paper compares Universal Verification Methodology (UVM), the SystemVerilog standard verification framework, and Cocotb, a Python verification framework, through just one case study verifying an Advanced Encryption Standard (AES) hardware implementation. Much of this short paper is spent surveying some literature (15 papers), explaining some AES basics, and giving a very high-level introduction to UVM and Cocotb. To compare the approaches, the authors built a UVM testbench with drivers, monitors, sequencers, a scoreboard, and a reference model for AES. For Cocotb, they created Python-based tests using coroutines and Python's Crypto.Cipher library as a reference model. The key results of the comparison are:

  • Coverage: Cocotb achieved 89.55% code coverage, slightly higher overall than UVM at 87.49%. For functional coverage, Cocotb hit 100% of expected cases, while UVM covered 47 out of 64. It is unclear why the UVM coverage is so low and how the authors built the testbenches; in a typical design environment this would not be the case.
  • Simulation Time: stated as 1000 ns for UVM and 10000.5 ns for Cocotb; most likely this refers to the amount of simulated time. The paper states that UVM is superior here.
  • Build Time: Cocotb required less effort due to Python’s built-in features and simpler synchronization, although no concrete times are given.
  • Flexibility, automation: Python (Cocotb) offers richer data types and extensive third-party libraries for automation, computation, and visualization.

For readers who are not familiar with Cocotb, this case study presents a simple overview of its strengths and weaknesses. It is not a real comparison with UVM, but since Python is the “lingua franca” of artificial intelligence, Cocotb may play an increasing role in functional verification.
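To make this concrete, here is a minimal sketch of what a cocotb test along the lines the authors describe might look like: a Python coroutine drives the DUT and checks it against PyCryptodome's Crypto.Cipher reference model. The DUT port names (clk, load, key, block_in, block_out, done) and the handshake are invented for illustration; they are not taken from the paper.

```python
# Minimal cocotb-style AES test sketch (DUT ports and handshake are hypothetical)
import os

import cocotb
from cocotb.clock import Clock
from cocotb.triggers import RisingEdge
from Crypto.Cipher import AES  # PyCryptodome reference model, as in the paper


@cocotb.test()
async def aes_ecb_matches_reference(dut):
    """Drive one random key/plaintext pair and compare against Crypto.Cipher."""
    cocotb.start_soon(Clock(dut.clk, 10, units="ns").start())

    key = os.urandom(16)
    plaintext = os.urandom(16)
    expected = AES.new(key, AES.MODE_ECB).encrypt(plaintext)

    # Apply stimulus: assert 'load' for one clock cycle
    dut.key.value = int.from_bytes(key, "big")
    dut.block_in.value = int.from_bytes(plaintext, "big")
    dut.load.value = 1
    await RisingEdge(dut.clk)
    dut.load.value = 0

    # Wait for the DUT to signal completion
    while int(dut.done.value) == 0:
        await RisingEdge(dut.clk)

    result = int(dut.block_out.value).to_bytes(16, "big")
    assert result == expected, f"AES mismatch: {result.hex()} != {expected.hex()}"
```

The whole test fits in one short file with an off-the-shelf reference model, which is essentially the productivity argument cocotb advocates make; the heavy-duty constrained-random and coverage machinery remains UVM's home turf.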

Also Read:

A Big Step Forward to Limit AI Power Demand

Streamlining Functional Verification for Multi-Die and Chiplet Designs

Chiplets and Cadence at #62DAC


Can RISC-V Help Recast the DPU Race?
by Jonah McLeod on 08-26-2025 at 10:00 am

ARM’s Quiet Coup in DPUs

The datacenter is usually framed as a contest between CPUs (x86, ARM, RISC-V) and GPUs (NVIDIA, AMD, custom ASICs). But beneath those high-profile battles, another silent revolution has played out: ARM quietly displaced Intel and AMD in the Data Processing Unit (DPU) market.

DPUs — also called SmartNICs — handle the “plumbing” of the datacenter. They offload networking by managing packet processing, TCP/IP, and RDMA. They handle storage services such as compression, encryption, and NVMe-over-Fabrics (NVMe-oF). They enforce security isolation, a critical requirement in multi-tenant cloud environments where trust boundaries are constantly tested. And they take responsibility for orchestration tasks that would otherwise burn valuable CPU cycles.

NVIDIA (BlueField via Mellanox), Marvell (OCTEON), AMD (Pensando), and Broadcom all adopted ARM cores for their DPUs. The reason was straightforward: ARM cores were small, power-efficient, licensable, and already embedded in networking silicon. By the time Intel reacted with its Infrastructure Processing Unit (IPU) program, ARM had already captured the ecosystem and set the standard.

Market Context: Why Now?

The global Data Processing Unit (DPU) market is projected to grow from $1.5 billion in 2023 to approximately $9.8 billion by 2032, reflecting a robust compound annual growth rate (CAGR) of 22.8% (Dataintelo Consulting Pvt. Ltd., 2024). Dataintelo attributes this growth to the exponential rise in data generation and the need for efficient data management and processing solutions across industries. At present, ARM cores power the overwhelming majority of DPU shipments, while Intel continues to promote its IPUs but has yet to gain broad market traction.
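As a quick sanity check on those projections (a one-off calculation, not from the Dataintelo report), the implied compound growth rate can be recomputed directly:

```python
# Sanity check of the DPU market figures cited above
start_usd_b = 1.5        # 2023 market size, $B
end_usd_b = 9.8          # 2032 projection, $B
years = 2032 - 2023      # 9-year span

implied_cagr = (end_usd_b / start_usd_b) ** (1 / years) - 1
print(f"Implied CAGR: {implied_cagr:.1%}")   # ~23.2%, close to the quoted 22.8%
```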

Meanwhile, RISC-V already has momentum in adjacent domains. Storage controllers from companies like Seoul-based Fadu — which integrates RISC-V cores into its enterprise SSD controllers for I/O scheduling and latency optimization — and SiFive use RISC-V to accelerate I/O. Orchestration and security processors also frequently rely on lightweight RISC-V designs such as OpenTitan. These are natural adjacencies to the DPU role. At the same time, geopolitics favors diversification: China in particular is accelerating sovereign RISC-V adoption, and DPUs are exactly the kind of infrastructure component where sovereignty matters.

The combination of market expansion, ARM’s lock-in, and hyperscalers’ desire for architectural alternatives sets the stage for a serious RISC-V entry into DPUs.

RISC-V’s Opportunity in DPUs

Unlike ARM, RISC-V offers an open ISA that companies can tailor to their exact workloads. (Wevolver, RISC-V vs. ARM, 2023). This is especially relevant for DPUs, which integrate diverse functional blocks: networking engines for packet flows, storage accelerators for compression and NVMe-oF, security modules for isolation, and control-plane CPUs for orchestration. RISC-V allows vendors to adapt each of these roles with custom instructions instead of relying on ARM’s fixed roadmaps.

Today's DPUs often use clusters of ARM Cortex-A cores (ranging from Cortex-A53 to A72) (Marvell, OCTEON 10 Technical White Paper, 2023) to handle control-plane and lightweight compute functions. Here, RISC-V offers advantages, starting with customization: vendors can tune instruction sets for specialized workloads rather than being tied to ARM's fixed roadmaps.

Some RISC-V vendors, such as Akeana, support simultaneous multithreading (SMT) with up to four threads per core (Electronics360, 2024), improving throughput and utilization in workloads with high memory or I/O latency, such as networking and packet processing. Recent RISC-V vector extensions map naturally to packet processing, cryptography, and storage acceleration.

Emerging matrix extensions extend programmability into AI inference and security. Startup Simplex Micro’s architecture integrates scalar, vector, and matrix execution within a time-scheduled framework—leveraging RISC-V’s extensibility to deliver deterministic performance across diverse AI and HPC workloads. Finally, RISC-V avoids ARM royalties while maintaining compatibility with open-source stacks like Linux, TensorFlow, and PyTorch.

Enter RISC-V’s Scalar-to-Matrix Roadmap

What makes this moment interesting is not just another IP vendor’s pitch, but the way RISC-V itself has evolved. The ISA began by addressing scalar compute — small, efficient cores for microcontrollers, embedded systems, and simple Linux-capable processors. Over the past few years, RISC-V has steadily added vector extensions, enabling data-parallel acceleration that maps naturally onto networking, storage, and cryptographic workloads. Most recently, the roadmap has expanded to include matrix extensions, designed to bring AI inference and other matrix-math-heavy tasks into the same ISA framework.

Table 1. RISC-V Companies Advancing Unified Scalar/Vector/Matrix Architectures

Company | Focus | Differentiator
SiFive | General-purpose + AI | Early vector adoption, strong ecosystem support
Andes | Embedded + DSP/Vector | Broad portfolio, DSP + vector extensions for AI/IoT
Akeana | Datacenter-class CPUs | First RISC-V mover with SMT (4 threads) + matrix engine
Ventana | Server-class CPUs | Hyperscaler-aligned, clear path to vector workloads
Simplex Micro | Unified pipeline | Novel scalar/vector/matrix integration, latency-tolerant multithreading
SemiDynamics | Configurable HPC cores | Advanced vector + memory subsystem customization
XiangShan | Open-source research | Academic/industry project exploring unified designs

This progression — scalar to vector to matrix — mirrors the way DPUs are being asked to perform. DPUs must handle scalar control-plane logic, vectorizable packet and crypto flows, and increasingly matrix-oriented inference tasks for telemetry and security. In other words, the RISC-V roadmap provides the full ingredient set for a truly programmable DPU.

Several companies are now pursuing this vision. Akeana, with its SMT-enabled designs and AI matrix computation engines, represents one of the first movers applying RISC-V directly to datacenter-class compute. Ventana Micro Systems is building server-class RISC-V processors with a clear path from scalar to vector workloads, aligning with hyperscaler requirements. SemiDynamics in Europe is focused on configurable vector cores tailored for data-intensive and AI-centric applications.

SiFive has emphasized Linux-capable RISC-V cores with vector support, targeted at HPC and infrastructure. Andes Technology has extended its cores with vector and DSP capabilities for embedded acceleration. Simplex Micro is explicitly developing a unified scalar/vector/matrix architecture with programmable extensions aimed at spanning edge to datacenter-class infrastructure solutions. At the research level, XiangShan in China is already experimenting with scalar and vector unification under one architecture.

Leapfrogging or Reinforcing ARM?

The question is not simply whether RISC-V can replace ARM, but whether it can expand the DPU definition itself. ARM’s current dominance in DPUs relies on scalar cores plus fixed accelerators. RISC-V provides an avenue to leapfrog by blending scalar, vector, and matrix programmability into one platform. This does not have to come at ARM’s expense — indeed, ARM could even adopt RISC-V vector and matrix extensions to strengthen its own DPU position.

Why This Matters

For the broader industry, RISC-V’s rise in DPUs offers a rare chance to reset the playing field. Instead of being restricted by ARM’s licensing model, companies can bend the architecture to their needs. This is especially relevant for hyperscalers, who want to optimize power, performance, and sovereignty. RISC-V also avoids monopoly dynamics: rather than a single vendor dictating the roadmap, an open ecosystem fosters multiple paths forward (SiFive, 2023).

With RISC-V, a company like Qualcomm or any major vendor would find itself in the driver’s seat — able to design a unique, custom CPU optimized for its DPU architecture, rather than depending on ARM’s licensing terms and roadmap. This independence could be a critical differentiator as DPUs become central to datacenter infrastructure.

The timing is right. AI-driven datacenter fabrics are exploding, and DPUs are no longer just about networking. They are about orchestrating compute, storage, and AI flows. In that world, a DPU that combines scalar, vector, and matrix programmability looks far more attractive than one that only integrates scalar ARM cores and fixed-function engines.

A Broader Opening

Just as ARM spotted and exploited the DPU opportunity to outflank Intel and AMD, RISC-V now offers the chance to redefine the category. Instead of fighting NVIDIA head-on in GPUs or trying to revive CPUs, vendors can leapfrog with a programmable DPU platform that reimagines datacenter infrastructure. It would be a comeback story — not by repeating old battles, but by opening a new front.

Final Thought

The industry often frames RISC-V as a CPU story — whether it can replace ARM or x86 — or as an edge IoT play. Yet the more disruptive opportunity may lie in the datacenter’s control plane. ARM built a DPU franchise that Intel and AMD never anticipated, and now RISC-V has a chance to redefine the category with vector and matrix programmability. Ultimately, ARM and RISC-V may coexist in DPUs — with ARM maintaining its incumbency and RISC-V offering an open, customizable alternative — giving vendors and hyperscalers greater architectural choice as the market matures.

Also Read:

What XiangShan Got Right—And What It Didn’t Dare Try

Podcast EP294: An Overview of the Momentum and Breadth of the RISC-V Movement with Andrea Gallo

Andes Technology: Powering the Full Spectrum – from Embedded Control to AI and Beyond


Breaking out of the ivory tower: 3D IC thermal analysis for all
by Admin on 08-26-2025 at 6:00 am

Todd Burkholder and Andras Vass-Varnai, Siemens EDA

As semiconductor devices become smaller, more powerful and more densely integrated, thermal management has shifted from an afterthought to a central challenge in modern IC design. In contemporary 3D IC architectures—where multiple chiplets are stacked and closely arrayed—power densities reach extreme levels comparable to the surface of the Sun. This is not mere analogy but a demonstrable engineering constraint that defines the boundaries of viable designs today. Consequently, the traditional approach—where thermal analysis was conducted at the end of the design process by esoteric and isolated specialists—is no longer sufficient.

Historically, silicon, package, and system teams worked in silos, with thermal analysis relegated to rule-of-thumb calculations or late-stage verification. In the context of 3D ICs, such practices are a recipe for risk, rework, and likely disaster. With multiple interacting thermal domains in extremely compact spaces, proactive and continuous thermal management is essential from the earliest design phases.

Organizations like Siemens are responding by laying the groundwork for robust thermal analysis workflows that are accessible for design engineers at all skill levels, and that are no longer confined to dedicated thermal analysts possessing esoteric knowledge.

The thermal analysis 3D IC conundrum

Thermal models are foundational for advanced IC packaging. Accurate modeling determines not just whether a chip will function, but also how long it will last and how reliably it will operate. Overheating or poorly managed thermal gradients can corrupt data, degrade performance or cause premature field failures—outcomes with both technical and business consequences.

Figure 1. Illustrative example of a 3D IC with heat dissipation.

This risk is especially pronounced in heterogeneous 3D IC stacks, where devices with different thermal tolerances are integrated together. For instance, high-power logic may safely reach junction temperatures above 110°C, but adjacent high-bandwidth memory may need to operate below 90°C to retain data integrity. Tight proximity optimizes data transfer and electrical performance, but it also creates substantial risk of cross-heating. If a thermal problem is not identified until after layout, required changes can be staggeringly expensive and time-consuming.
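A rough two-die thermal network illustrates why this coupling matters. The power levels and thermal resistances below are invented for illustration; only the ~110°C logic and ~90°C memory limits come from the discussion above.

```python
# Toy two-die thermal model: a logic die and an HBM stack sharing a package.
# All resistances (degC/W) and powers (W) are illustrative assumptions.
import numpy as np

T_AMBIENT = 45.0  # degC, board/ambient reference

# R[i][j]: temperature rise at die i per watt dissipated in die j.
# Off-diagonal terms capture thermal coupling through the shared package.
R = np.array([[0.30, 0.10],   # logic: self-heating and coupling from HBM
              [0.10, 0.50]])  # HBM:   coupling from logic and self-heating

P = np.array([150.0, 8.0])    # W, hypothetical logic and HBM power
T = T_AMBIENT + R @ P

print(f"Logic junction: {T[0]:.1f} degC (limit ~110 degC)")
print(f"HBM junction:   {T[1]:.1f} degC (limit ~90 degC)")
# With these numbers the HBM sits near 64 degC, but the 0.10 degC/W coupling
# term means every additional 100 W in the logic die adds ~10 degC to the
# memory stack, the kind of interaction that siloed, late-stage analysis misses.
```

Real 3D IC models are of course far more detailed, but even this toy network shows that the memory die's margin is set as much by its neighbor's power as by its own.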

Effective 3D IC thermal analysis is not just about improved simulation. It demands unified workflows and seamless communication across electrical, mechanical, and system domains. One of the most pervasive challenges in the industry is the lack of common tools, culture, and terminology. Silicon designers, package architects, and thermal analysts often operate in their own environments, with minimal cross-team or cross-domain data exchange.

This fragmented structure complicates the creation and updating of accurate thermal models. Constructing reliable models often requires detailed physical characterization of chiplets, interposers, substrates and interconnects. The level of necessary detail can range from broad estimates in early exploration to intricate sign-off models. Unfortunately, most models are built manually—and typically late in the process. As a result, any design changes near tape-out can force costly and time-consuming recalibration that is simply unsustainable when each revision can cost millions.

Further adding to the challenge is the disconnect between electrical and thermal disciplines. Electrical engineers may lack insight into thermal issues, while thermal specialists may not grasp silicon constraints. Few current workflows synchronize thermal models automatically as design updates occur, making agile, accurate simulation across the design cycle difficult.

Shift-left thermal analysis

The most effective 3D IC thermal workflow must follow an integrated, dynamic process that begins early in the design phase and continues to keep thermal models current as a design evolves. Shifting thermal analysis “left” means incorporating it much earlier—and updating it continuously—throughout design. This empowers chip designers, package architects, and thermal specialists to work from a consistent, data-rich foundation.

In practice, this translates into updating the thermal model with each design iteration, so that the digital twin always reflects the latest changes. An ideal ecosystem allows all stakeholders to use interconnected tools for seamless data exchange, minimizing manual conversion or handoffs. Under this flow, design, verification and sign-off are not sequential checkpoints but interconnected phases in a fluid, evolving design process.

Key segments are already adopting this approach. Automotive electronics, particularly in AV and ADAS, require chiplet-based designs cooled at the system level. Data center architectures—supporting AI and high-compute workloads—depend on packages with stacked chips and must be analyzed from the die all the way to the system, often including advanced cooling solutions. Efficient bidirectional feedback and holistic analysis are quickly becoming competitive imperatives in these domains.

The democratization of thermal analysis

Democratizing thermal analysis—making it accessible to all engineering disciplines regardless of expertise—is essential for next-generation IC packaging. Recognizing this dire need, Siemens has spent years developing, ground proving, and refining a thermal analysis flow applicable to traditional ICs, SoCs, and 3D ICs.

Figure 2. Siemens’ integrated 3D IC design and thermal analysis process.

Siemens’ Calibre 3DThermal embodies this philosophy by delivering a unified platform that combines established IC verification with advanced thermal modeling. Calibre 3DThermal converts native IC databases into high-fidelity thermal models and supports designers, including non-specialists, in conducting meaningful simulations and interpreting results.

Package architects benefit as well, using tools like Innovator 3D IC, a unified cockpit for design planning, prototyping, and predictive analysis of 2.5/3D heterogeneous integrated devices. Innovator 3D IC works with Calibre 3DThermal to translate full connectivity data into robust thermal models. This integration enables electrical and thermal verification to advance in step, ensuring design changes propagate efficiently across all domains.

Mechanical engineers leveraging Simcenter Flotherm experience new efficiency as system-level models can now be automatically generated upstream, eliminating redundant rebuilding. This digital continuity ensures a synchronized and accelerated workflow from early silicon through package and system validation.

End users of 3D ICs, in turn, depend on thorough thermal analysis to ensure long-term component reliability. Many now request digital twins of a device's thermal profile for validation in their application environment. However, sharing detailed thermal models raises IP protection concerns—these models can reveal die sizes, placement, and power distribution. This underscores the need for useful thermal representations that protect valuable IP.

Siemens has addressed this challenge through Flotherm’s boundary condition independent reduced order models (BCI-ROM). BCI-ROMs preserve the essential thermal profile while safeguarding sensitive structural details, enabling secure collaboration and model sharing throughout the supply chain.

Keeping cool in the third dimension

The future of 3D IC thermal analysis will be shaped by four interconnected trends: greater integration as thermal simulation becomes routine; automation as model updates accelerate; collaboration as teams communicate seamlessly across domains; and democratization as powerful analysis becomes accessible to all contributors. Embracing these trends will enable the semiconductor industry to address mounting thermal complexities while reducing project risk and time-to-market for next-generation applications.

As 3D IC packaging matures, thermal analysis must remain central in order to deliver robust, manufacturable products. The Siemens integrated and automated approach—connecting design roles, automating model creation, and ensuring IP security—sets a compelling example for the industry.

Todd Burkholder is a Senior Editor at Siemens DISW. For over 25 years, he has worked as editor, author, and ghost writer with internal and external customers to create print and digital content across a broad range of EDA technologies. Todd began his career in marketing for high-technology and other industries in 1992 after earning a Bachelor of Science at Portland State University and a Master of Science degree from the University of Arizona.

Andras Vass-Varnai obtained his MSc and PhD degrees in Electrical Engineering from the Budapest University of Technology and Economics. He spent over a decade at Mentor Graphics as a product manager, leading various R&D projects focused on thermal test hardware and methodologies. Before assuming his current role as a 3D IC reliability solution engineer, Andras served as a business development lead in South Korea and the United States. Now based in Chicago, IL, he is dedicated to contributing to the development of a novel 3D IC package toolchain, leveraging his experience in thermal and reliability engineering.

Also Read:

Software-defined Systems at #62DAC

DAC TechTalk – A Siemens and NVIDIA Perspective on Unlocking the Power of AI in EDA

Digital Implementation and AI at #62DAC

Calibre Vision AI at #62DAC


Intel’s Pearl Harbor Moment
by Daniel Nenni on 08-25-2025 at 10:00 am

There is a lot of talk about where Intel went wrong, the latest is missing AI, but people seem to forget one of the more defining blunders in the history of Intel. In April of 2012 Kirk Skaugen, the new general manager of Intel’s client PC group, moderated a Q&A with Mark Bohr, a 33+ year Intel fellow, and Brad Heaney, the Ivy Bridge program manager (22nm). All very accomplished people.

“Being an integrated device manufacturer really helps us solve the problems dealing with devices this small and complex,” Bohr said. “The foundries and fabless companies won’t be able to follow where Intel is going.”

This is monopolistic thinking, and it was the beginning of the end for Intel. To me this was Intel’s Pearl Harbor moment. As the movie goes: “I fear all we have done is to awaken a sleeping giant and fill him with a terrible resolve.” The giant being TSMC, Apple, Nvidia, AMD, etc. I first wrote about this here: Intel says fabless model collapsing… really? and the rest, as they say, is history.

I met Mark Bohr at a conference after that Q&A. He sat right next to me and was friendly and humble. He is a legend in the semiconductor industry, a career Intel employee who worked under “Only the Paranoid Survive” Andy Grove, who weaponized semiconductor manufacturing. So how did Mark and Intel miss this one so badly?

For background I asked my AI Super Agent why Intel failed:

Process leadership broke down
Intel’s 10nm program slipped repeatedly as the company set an overly aggressive density target and leaned on complex multi-patterning instead of moving to EUV early. Management later admitted the goals were “too aggressive.” Mass production was pushed out and the cadence unraveled.

Meanwhile, TSMC put EUV into high-volume at 7nm+ in 2019 and ramped 5nm in 2020, giving AMD/Apple a clean runway on leading nodes while Intel was stuck.

The product cadence collapsed
Intel’s classic “tick-tock” rhythm ended in 2016, replaced by a slower “process-architecture-optimization” model just as the 10nm issues hit—compounding stagnation on 14nm refreshes.

Execution stumbles on flagship parts
The Sapphire Rapids Xeon rollout suffered multiple slips and large bug/respins before finally shipping, letting AMD’s Epyc surge in servers. Intel only began to regain footing with the Xeon 6 family (Sierra Forest in June 2024; Granite Rapids later in 2024/2025).

Strategic misses outside the PC/Server core
Intel exited smartphone SoCs in 2016 and sold its 5G modem business to Apple in 2019—ceding mobile. Earlier, the Larrabee discrete-GPU project was cancelled, delaying a credible GPU roadmap.

Security shocks
Spectre/Meltdown and follow-on side-channel issues forced mitigations that dented performance and trust, right in the middle of Intel’s process troubles.

Competitors seized the moment
AMD, fabbed at TSMC, gained steadily in x86 servers and desktops as Intel’s node woes lingered. (Mercury Research data shows AMD’s server unit share around the high-20s% by 2025.) Nvidia came to dominate AI accelerators, shifting the data-center spotlight away from CPU-only roadmaps.

Culture & capital allocation didn’t help
Years of buybacks (tens of billions) and sprawling bets (e.g., Optane later shut down; NAND sold) coincided with underperformance in the core manufacturing mission. In 2024 Intel announced deep cost cuts amid a share-price slump.

Becoming a customer of others’ fabs
As delays mounted, Intel increasingly used TSMC for tiles and products (e.g., Meteor Lake’s GPU tile on TSMC N5; certain HPC tiles), undercutting its historic IDM advantage.

Wrong! These are all symptoms. The root cause of Intel’s failure, in my opinion, was being lulled into complacency by the lack of competition. As I have said many times, being a monopoly is not for the greater good of a company or industry. We have seen this time and time again and, unfortunately, we will see it again. When will we learn? Never it seems.

Today, Intel has its back against the wall and will need to be clever AND innovate. With the recent CEO change and votes of confidence by Softbank and the US Government I have very high hopes. Next we need the top fabless companies to step up and cast a vote of confidence, for the greater good of the semiconductor industry, absolutely.

Lip-Bu Tan, CEO of Intel, said:

“We are very pleased to deepen our relationship with SoftBank, a company that’s at the forefront of so many areas of emerging technology and innovation and shares our commitment to advancing U.S. technology and manufacturing leadership. Masa and I have worked closely together for decades, and I appreciate the confidence he has placed in Intel with this investment.”

“As the only semiconductor company that does leading-edge logic R&D and manufacturing in the U.S., Intel is deeply committed to ensuring the world’s most advanced technologies are American made,” said Lip-Bu Tan, CEO of Intel. “President Trump’s focus on U.S. chip manufacturing is driving historic investments in a vital industry that is integral to the country’s economic and national security. We are grateful for the confidence the President and the Administration have placed in Intel, and we look forward to working to advance U.S. technology and manufacturing leadership.”

Also Read:

Should the US Government Invest in Intel?

Should Intel be Split in Half?

Making Intel Great Again!


A Big Step Forward to Limit AI Power Demand
by Bernard Murphy on 08-25-2025 at 6:00 am

By now everyone knows that AI has become the all-consuming driver in tech and that NVIDIA GPU-based platforms are the dominant enabler of this revolution. Datacenters worldwide are stuffed with such GPUs, serving AI workloads from automatically drafting emails and summarizing meetings to auto-creating software and controlling factories. A true revolution in automation is underway, yet we already see that the power required to meet this new demand will quickly exceed utility generation plans. Bending that power growth curve is imperative, demanding further AI power reduction while the hardware is still in design. Understanding how to make such improvements starts with improved pre-silicon power estimation; Cadence and NVIDIA have been working together for many years on this objective and have announced a major step forward according to a recent press release.

The challenges in pre-silicon power estimation for AI

Pre-silicon dynamic power estimation has been available for some time, but there are three key challenges in applying these methods to AI systems: the size of the designs, the size and complexity of representative test cases based on AI models and benchmarks, and the added burden of dynamic power analysis under these constraints, which for high accuracy demands gate-level modeling, further straining capacity limits.

NVIDIA has been using Cadence emulation platforms (Palladium) for 20 years or more to verify the functionality of their chip designs before committing to manufacturing, as have many other semiconductor (and more recently system) design companies. As design sizes have grown exponentially, Palladium capacity has kept pace to the point that these emulators can now accommodate designs running to tens of billions of gates, in step with the very largest designs being built today.

The second challenge is the size of test cases. In non-AI applications, real use-case tests have been viewed as impractically large for detailed pre-silicon testing. Engineers have resorted to synthetic tests to validate essential characteristics while postponing real use-case validation to post-silicon, where problems found may require a costly re-spin of the design. This limitation can become even more acute for dynamic power analysis, which builds on top of functional verification. DPA overheads have commonly limited analysis to sampling only in expert-determined time windows. From these samples engineers construct an extrapolated sense of dynamic power averages and peaks through the synthetic test cycle yet risk the possibility that they will miss critical power anomalies outside those windows.

Unfortunately, this sampling approach is ineffective for estimating dynamic power in large AI applications. The only use-cases worth evaluating power against are complete AI tests with models and benchmark testcases, since it is very unclear how synthetic or sampling methods could confidently cover a representative subset of corner cases. We already know how big AI models can be and how involved the processing pipeline is for such models, whether for say 4/8K image CNNs or for transformer-based LLMs. Given typical frames/second rates in image processing or prompt response times in LLMs, it is obvious that billions of cycles must be emulated to span a realistic use case.
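A rough bound makes that scale concrete. The clock frequency, frame rate, and token rate below are illustrative assumptions, not NVIDIA or Cadence figures; the point is simply that realistic AI use cases land in the billions of emulated cycles.

```python
# Rough estimate of how many design clock cycles a realistic AI use case spans.
# All workload parameters below are illustrative assumptions.

design_clock_hz = 1.0e9        # assume a 1 GHz accelerator clock

# Vision example: one second of 4K video at a real-time frame rate
frames, fps = 30, 30
vision_cycles = (frames / fps) * design_clock_hz

# LLM example: a 500-token response at an assumed 50 tokens/second
tokens, tokens_per_s = 500, 50
llm_cycles = (tokens / tokens_per_s) * design_clock_hz

print(f"Vision burst: {vision_cycles:.1e} cycles")   # ~1e9 cycles
print(f"LLM response: {llm_cycles:.1e} cycles")      # ~1e10 cycles
```

Whatever the exact workload, sampling a handful of windows out of a run that long leaves plenty of room to miss a power anomaly.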

Add to this the overhead for dynamic power analysis (DPA) on top of that functional emulation and you can understand why realistic power profiling for big AI in pre-silicon testing seemed out of reach. Until now.

Cadence Redefines What is Possible with their DPA App

Cadence recently released their new DPA App leveraging the capabilities of their Palladium® Z3 Enterprise Emulation Platform. Here I must jump straight to the punchline because it is amazing. Cadence reported, with NVIDIA approval (NVIDIA ran their own benchmarks), that they ran DPA on this platform on billion-gate designs across billions of cycles within a few hours, with up to 97% power accuracy as determined against post-silicon power measurements. Everyone likes to claim that whatever they are selling is a game-changer, but results like this truly merit that description.

It’s worth peeling the accuracy point further since I have some background in this area from a previous life. I talked with Michael Young (Director of Product Marketing at Cadence and one of the Quickturn guys from way back) to get a better understanding.

First, I should acknowledge that I have been as dismissive as others of claims for accuracy in pre-silicon power estimation. These are usually based on RTL simulations, already suspect because they don't accurately reflect synthesis optimizations or power dissipated in interconnect unless they support parasitics back-annotation from implementation trials. Accuracies under these constraints typically land only within 15-20% of signoff estimates, not good enough to drive careful design optimization for power.

Michael makes two counter arguments for this DPA flow. First the analysis must be run at gate level on the design, with directly backannotated parasitics. Second, he was very careful to stress that comparisons with post silicon power should use exactly the same conditions (same model, same benchmarks, same software) as used in emulation DPA testing. He told me that he sometimes hears from other companies in non-AI applications that DPA doesn’t correlate very accurately with their post silicon measurements. When they dig deeper, it becomes obvious that they are not comparing identical pre- and post-silicon conditions. It’s tempting to believe we can be approximate on test similarity for power estimation pre-silicon and measurement in the lab and still get claimed accuracy. But of course we can’t – different conditions are where the discrepancies arise.

As AI usage grows worldwide, we need every possible tool we can find to bend that power curve. Cadence DPA running on Palladium Z3 provides a big step forward to help companies like NVIDIA further tune the power their chips consume, under real workloads. You can learn more at the Cadence Palladium web page.

Also Read:

Streamlining Functional Verification for Multi-Die and Chiplet Designs

Chiplets and Cadence at #62DAC

Prompt Engineering for Security: Innovation in Verification


Free and Open Chip Design Tools: Opportunities, Challenges, and Outlook
by Admin on 08-24-2025 at 10:00 am

Designing semiconductor chips has traditionally been costly and controlled by a few major Electronic Design Automation (EDA) vendors—Cadence, Synopsys, and Siemens EDA—who dominate with proprietary tools protected by NDAs and restrictive licenses. Fabrication also requires expensive, often export-controlled equipment. This oligopoly raises barriers for small companies, researchers, and students.

A growing movement is shifting towards free and open-source EDA tools and open Process Design Kits (PDKs) that lower costs, broaden access, and foster innovation. Open EDA can be applied partially for simulation or across the entire design flow from concept to fab-ready layout. While most advanced nodes still require proprietary PDKs, open PDKs for older nodes (e.g., SkyWater 130 nm, IHP 130 nm, GlobalFoundries 180 nm, ICSprout 55 nm) enable ASIC production without NDAs.

Illustrated uses span education, research, and industry:
  • Commercial adoption: Google, Nvidia, and NXP use tools like Verilator (fast open-source simulation) and DREAMPlace (GPU-accelerated placement) in production workflows. SPHERICAL applies open tools to design radiation-hardened chips for satellites; Swissbit integrates open, provably correct cryptographic modules.

  • Educational outreach: Initiatives like Tiny Tapeout and One Student One Chip let thousands of students produce working ASICs for as little as $300, using open PDKs.

  • Emerging products: Automotive ECUs, high-speed serial links, and gaming consoles are being prototyped with open flows.

Advantages and opportunities include:
  • Cost reduction: Proprietary EDA licenses can cost $10k+ per workstation per month. Open tools slash entry costs, enabling low-volume and niche ASIC designs to be profitable.

  • Innovation: Open frameworks allow modification, integration of AI-based design aids, and rapid iteration, often impossible in closed systems.

  • Security: Open designs and tools can be audited, mathematically verified, and built to standards like Caliptra, reducing risks of hidden hardware Trojans or backdoors. Transparent, community-verified cryptographic modules enhance trust.

  • Education and skills: Students can install and explore open EDA freely, enlarging the future talent pool in semiconductor design.

Challenges remain significant:
  • Performance gap: Open tools may lag behind commercial software in supporting advanced nodes, analog/mixed-signal, or mm-wave designs.

  • PDK access: Leading-edge fabs (e.g., TSMC N3) will likely never open their PDKs, limiting open EDA’s reach to legacy and mid-range processes.

  • Commercial risk: For high-value ASICs, companies may prefer proven proprietary flows to minimize costly re-spins.

  • Coordination vs. fragmentation: Efforts like OpenROAD (US), iEDA (China), and Coriolis (France) may duplicate work or compete rather than pool resources.

The paper outlines strategic options:
  • For investors: Experiment internally with open tools, join cost-sharing consortia (CHIPS Alliance, Linux Foundation), and explore hybrid flows mixing open and proprietary components.

  • For governments: Fund open PDK development, secure fabrication options, and support international cooperation—even across geopolitical lines—to build technological sovereignty.

Security-driven innovation is a standout theme. Openness enables formal verification of hardware components, provably secure random number generators, and side-channel-resistant designs. These can be deployed in “trusted” or even open fabs, forming verified value chains from chip design to system integration.

The future of open EDA could follow several paths:
  1. Remain primarily educational, producing small ASICs for learning.

  2. Emerge as a viable commercial competitor to the “Big Three,” especially for mid-range nodes.

  3. Blend into hybrid flows, where open tools augment proprietary ones.

  4. Drive creation of transparent, standardized fabs with openly shared process data.

Bottom line: While barriers exist, the trend appears, per UCSD’s Andrew Kahng, “unstoppable and irreversible.” As more companies and governments seek cost-effective innovation, security assurance, and independence from proprietary lock-in, free and open EDA is poised to expand its role in global semiconductor development.

The original paper from HEP Alliance is here.

Also Read:

WEBINAR: Functional ECO Solution for Mixed-Signal ASIC Design

Taming Concurrency: A New Era of Debugging Multithreaded Code

Perforce Webinar: Can You Trust GenAI for Your Next Chip Design?


Chiplets: providing commercially valuable patent protection for modular products
by Robbie Berryman on 08-24-2025 at 6:00 am

Many products are assembled from components manufactured and distributed separately, and it is important to consider how such products are manufactured when seeking to provide commercially valuable patent protection. This article provides an example in the field of computer chip manufacture.

Chiplets

A system-on-a-chip (SoC) is a type of integrated circuit product which acts as an entire computer in a single package, providing low-power and high performance data processing. SoCs are widely used, and provide the brains of smartphones, leading edge laptops, IoT devices, and much more.

A SoC includes essential functionality such as a central processing unit (CPU), memory, input/output circuitry, and so on. A traditional SoC provides these functions within a single monolithic piece of semiconductor material (for example, silicon) manufactured and distributed as a single integral device.

An emerging technology is the manufacture of a chiplet-based SoC by assembling a number of separately manufactured microchips known as “chiplets” together in a package. Each chiplet is a building block having some functionality, and the collection of chiplets together provides the functionality of the SoC.

A chiplet-based SoC can have various advantages over monolithic SoCs, including increased production yield due to testing of individual chiplets, increased design flexibility as chiplets can be made using different manufacturing processes, and allowing simplified SoC design by assembling off-the-shelf chiplets.

Monolithic system-on-a-chip (SoC) and chiplet-based SoC

The law

In the UK, direct infringement under s.60(1) UK Patents Act requires that an infringing product includes every feature claimed in a patent claim.

This is fairly straightforward when considering infringement by an integral device such as a monolithic SoC: anyone making, importing, or selling a monolithic SoC including the claimed invention infringes the patent. For a monolithic SoC it doesn’t matter which parts of the SoC perform the different parts of the invention: the SoC is manufactured in one go and therefore an infringing SoC includes all of the claimed features from the point of manufacture.

However, due to the introduction of chiplets, a SoC is also an example of a product which may be assembled from separately manufactured parts. When patenting inventions in SoCs, such as developments at an architectural or micro-architectural level or techniques which might be implemented using a SoC, it may be natural to claim features of the SoC as a whole based on an assumption that a monolithic SoC would be used. However, this might lead to difficulties enforcing the patent.

In particular, a patent claim for an invention implemented in a chiplet-based SoC might include features provided by different chiplets, meaning that no chiplet alone provides all of the features of the invention. Therefore, manufacturers (or importers or sellers) of individual chiplets might not directly infringe the patent. The patent might only be directly infringed when the chiplets are finally assembled into a SoC, and this can diminish the commercial value of the patent as large parts of the supply chain are unprotected.

Indirect infringement under s.60(2) UK Patents Act might provide a get-out in some cases where a chiplet could be considered to indirectly infringe a patent for a SoC even if the chiplet does not include all the claimed features. However, it is often much harder to prove that indirect infringement has occurred, especially in cases of cross-border sale between a manufacturer in one territory and a downstream party in another territory.

Practical advice

It is important that an attempt is made to draft claims so they are directly infringed by products which are manufactured and distributed together. In the field of computer chip manufacture, claims should attempt to cover individual chiplets rather than full SoCs.

Often the core concept of an invention is actually provided by features of a particular sub-component, such as a particular chiplet. A careful selection of claim features can limit the claims to that particular component, so manufacture and sale of the component alone directly infringes the claim. If other elements are important to provide context for the inventive concept, it may be sufficient to refer to those elements indirectly in the claims so that they are not required for infringement.

It is worth noting that, as demonstrated by the introduction of chiplets changing the way SoCs are manufactured, what might be considered an integral product is liable to change as technology develops. We therefore recommend seeking professional advice from a patent attorney familiar with the technical field of your invention.

D Young & Co is a leading top-tier European intellectual property firm, dedicated to protecting and enforcing our clients’ IP rights. For over 130 years we’ve been applying our world-class expertise to take ideas, products and services further.

Also Read:

Alphawave Semi and the AI Era: A Technology Leadership Overview

Enabling the Ecosystem for True Heterogeneous 3D IC Designs

Altair SimLab: Tackling 3D IC Multiphysics Challenges for Scalable ECAD Modeling


IMEC’s Advanced Node Yield Model Now Addresses EUV Stochastics
by Fred Chen on 08-23-2025 at 8:00 am

It lays the foundation for the Stochastics Resolution Gap

Chris Mack, the CTO of Fractilia, recently wrote of the “Stochastics Resolution Gap,” which is effectively limiting the manufacturability of EUV despite its ability to reach resolution limits approaching 10 nm in the lab [1,2]. As researchers have inevitably found, the shrinking dimensions of features targeted by EUV lithography have led to increasing stochastic variability operating at the molecular level [1,3,4]. This, in turn, leads to variations of feature width, feature position, edge roughness, and worst of all, yield-killing defects.

An SPIE paper by IMEC last year gave an updated yield model which strove to take into account the stochastic behavior of EUV lithography [5]. The model made use of defect density from calibrated wafer data and was said to be benchmarked against industry [5,6]. The model is essentially:

Wafer yield = Systematic yield x Random yield,

where systematic yield is an estimated value (98%) and random yield is given by the Poisson model exp(-A*D0), with A being the die area (here taken to be 1 cm2) and D0 being the defect density. Since some layers require more than one mask, the layer's effective D0 is the sum of the defect densities per mask use. Since EUV stochastic effects get worse with smaller pitch, the corresponding D0 per use of an EUV mask increases as the pitch shrinks. In fact, there is a cliff that starts just below 40 nm pitch (Figure 1).

Figure 1. Defect density per EUV mask use from calibrated wafer data, owing to stochastic behavior [5].

Advanced nodes also use immersion ArF lithography (193i). In IMEC’s model, the defect density per 193i mask use is held fixed at 0.005/cm2 [5,6]. Table 1 gives the assumed D0 values per mask use for the 7nm and 5nm nodes.

Table 1. Defect density (per cm2) per mask use for 7nm and 5nm process nodes [5].

Note that there are several versions of 5nm nodes. N5M applies both 193i and EUV for metal patterning (one mask each), while N5 EUV applies EUV only (single exposure patterning) and N5C EUV adds an EUV cut mask for metal patterning. Likewise, for 7nm, N7C EUV is N7 EUV with an EUV cut mask added for metal patterning. N5M EUV has better D0 per mask due to a more relaxed M1 pitch (equal to gate pitch), while N5 EUV and N5C EUV used tighter M1 pitch, effectively a shrink compared to N5 193i and N5M.

Table 2. Number of 193i and EUV masks used per 7nm and 5nm layer, assumed in [5].

Assuming a 1 cm2 die area, the yields are calculated for each of the five layers (M1, V1, M2, V2, M3) according to the number of masks used; the layer yields are then multiplied together, and finally the systematic yield is multiplied by the result to give the total yield. For 7nm, we see in Figure 2 that although the use of fewer masks with EUV increases the yield, the difference from the all-193i case is small: at 7nm, 193i LELE (self-aligned) patterning is largely sufficient and actually cheaper than single exposure EUV [7,8].

Figure 2. Estimated yields for 7nm (all 193i/all EUV single exposure/all EUV, including cuts).

On the other hand, for 5nm, more layers are at tighter pitches, increasing the stochastic defect density from EUV. Thus, increasing EUV use actually increased the overall defect density, lowering yield (Figure 3).

Figure 3. Estimated yields for 5nm (all 193i/mixed 193i/EUV/all EUV single exposure/all EUV, including cuts).

The reason for the drastic change in trend is the much higher EUV defect density (0.057/cm2) at the tighter metal pitch. In fact, compared to the 193i defect density (0.005/cm2), it is 11 times higher, meaning its impact on yield would be the same as using eleven 193i masks in multipatterning!
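The arithmetic behind this comparison is easy to reproduce. The sketch below implements the model as stated (systematic yield times a Poisson random yield, with per-mask D0 values summed per layer); only the D0 values quoted in the text are used, and the mask counts per layer are illustrative placeholders rather than the actual Table 2 assignments.

```python
# Structure of IMEC's yield model [5]:
#   wafer yield = systematic yield * product over layers of exp(-A * sum(D0 per mask))
# D0 values below are the ones quoted in the text; mask counts per layer are
# illustrative placeholders, not the actual Table 2 assignments.
import math

SYSTEMATIC_YIELD = 0.98
DIE_AREA_CM2 = 1.0

D0_193I = 0.005       # per 193i mask use, /cm2
D0_EUV_TIGHT = 0.057  # per EUV mask use at the tighter 5nm metal pitch, /cm2

def layer_yield(mask_d0s, area=DIE_AREA_CM2):
    """Poisson yield of one layer, given the D0 of each mask it uses."""
    return math.exp(-area * sum(mask_d0s))

def wafer_yield(layers):
    """Systematic yield times the product of the layer yields."""
    y = SYSTEMATIC_YIELD
    for masks in layers:
        y *= layer_yield(masks)
    return y

# Hypothetical 5nm comparison over five layers (M1, V1, M2, V2, M3):
all_193i = [[D0_193I] * 2] * 5       # e.g. two 193i masks (LELE) per layer
all_euv_se = [[D0_EUV_TIGHT]] * 5    # single-exposure EUV per layer

print(f"All 193i, 2 masks/layer:  {wafer_yield(all_193i):.1%}")   # ~93%
print(f"All EUV single exposure:  {wafer_yield(all_euv_se):.1%}") # ~74%
# Ten 193i masks cost exp(-0.05), about a 5% yield hit, while five EUV masks
# at the tighter pitch cost exp(-0.285), roughly a 25% hit, which is why
# heavier EUV use can lower 5nm yield despite using fewer masks.
```

With the real per-layer pitches and mask counts from Tables 1 and 2 the absolute numbers shift, but the structure of the conclusion does not.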

Thus, IMEC’s updated yield model is the basis for the Stochastics Resolution Gap, as it can be projected that yield impact from stochastics outweighs intrinsic resolution in process choices for advanced nodes. Further tuning of the model would be beneficial, such as accounting for performance impact from roughness and edge placement error as well as CD variations. This spotlights the need for improved, high-volume metrology techniques for detecting these issues from EUV stochastics, definitely something that Fractilia would be happy to deal with.

References

 

[1] C. Mack, Stochastics: Yield-Killing Gap No One Wants to Talk About.

[2] J. Y. Choi et al., Proc. SPIE 13424, 134240A (2025).

[3] H. Fukuda, J. Appl. Phys. 137, 204902 (2025), and references therein; https://doi.org/10.1063/5.0254984.

[4] F. Chen, Facing the Quantum Nature of EUV Lithography.

[5] Y-P. Tsai et al., Proc. SPIE 12954, 1295404 (2024).

[6] Y-P. Tsai et al., Proc. SPIE 12052, 1205203 (2022).

[7] E. Vidal-Russell, J. Micro/Nanopatterning, Materials, and Metrology 23, 041504 (2024).

[8] L-Å Ragnarsson et al., EDTM 2022.

This article first appeared on Substack: IMEC’s Advanced Node Yield Model Now Addresses EUV Stochastics

Also Read:

Edge Roughness Differences Among EUV Resists

Facing the Quantum Nature of EUV Lithography

High-NA Hard Sell: EUV Multi-patterning Practices Revealed, Depth of Focus Not Mentioned