
Podcast EP337: The Importance of Network Communications to Enable AI Workloads with Abhinav Kothiala

by Daniel Nenni on 03-27-2026 at 10:00 am

Daniel is joined by Abhinav Kothiala, a principal product manager for the Synopsys Ethernet IP portfolio. He has over 12 years of experience across engineering and product management, spanning SoC design, functional verification, and building wireless connectivity platforms and IoT products. He also holds two patents in circuit design.

Dan discusses the evolution of Ethernet standards with Abhinav, who explains that traditional Ethernet is not well-suited for distributed AI workloads. In this informative discussion, Abhinav describes new and emerging interface protocols better suited to AI environments.

Abhinav discusses what is needed to achieve the required scale-up, explaining that the network must “disappear” by delivering very low latency, deterministic performance. The capabilities of ESUN and UALink are explored in detail. Abhinav also explains what the future will require and the role of the IP provider.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Musk’s Orbital Compute Vision: TERAFAB and the End of the Terrestrial Data Center

by Jonah McLeod on 03-27-2026 at 6:00 am


At the TERAFAB launch event in Austin on March 21, Elon Musk made a prediction that would have sounded like science fiction a decade ago—and may still: roughly 80 percent of AI compute will eventually move off-planet.

The argument is straightforward once you accept his premises. Earth-based data centers face three hard constraints—land, cooling, and grid capacity—and all three are getting worse as AI infrastructure demand accelerates. Land requires zoning, permitting, and proximity to fiber and power. Cooling consumes enormous quantities of water or electricity, or both. And grid capacity, particularly clean grid capacity, is increasingly contested.

Space, Musk argues, dissolves all three simultaneously. Satellites don’t need real estate. The vacuum of space is an ideal radiative heat sink—no water, no chillers, no mechanical systems at all. And solar irradiance above the atmosphere runs roughly five times the average output of a ground-based installation—not because the sun shines harder in space, but because a space-based array sees the sun continuously, with no night cycle, no weather, and no atmospheric losses. It is, Musk suggested, basically a free data center—if you can get there.

The obvious objection is launch cost. Getting hardware into orbit remains expensive by any terrestrial comparison. Musk’s counter is that Starship changes the math, and TERAFAB—announced the same evening, in a defunct Austin power plant, with light beams shooting into the sky and the Governor of Texas in the audience—changes it further.

TERAFAB is a $20–25 billion joint venture between Tesla, SpaceX, and xAI, to be built at Giga Texas in Austin, consolidating chip design, lithography, fabrication, memory production, packaging, and testing under one roof—vertical integration no semiconductor company has attempted at this scale, for reasons that will become apparent. The stated production target is chips with an aggregate power draw of one terawatt—roughly fifty times the estimated power consumption of all advanced AI chips currently in production worldwide.

Musk uses power draw as his unit of scale because it is the one metric that translates across wildly different chip architectures, and because it serves his core argument: total US grid capacity runs approximately 0.5 terawatts, making a terawatt of chip power physically impossible to run on Earth. Most of it, he concludes, must go to space. Getting that much compute into orbit means launching roughly 10 million tons per year—approximately 50,000 Starship flights annually, or one every ten minutes. Musk provided no construction or production timeline.
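That launch arithmetic is easy to check back-of-the-envelope. The payload figure below is an assumption (Starship's orbital payload target is on the order of 100–200 tons, not a demonstrated number):

```python
# Back-of-the-envelope check of the launch arithmetic quoted above.
# Payload per flight is an assumed target, not a demonstrated figure.
TONS_PER_YEAR = 10_000_000      # mass to orbit per year, from the talk
PAYLOAD_TONS = 200              # assumed payload per Starship flight
MINUTES_PER_YEAR = 365 * 24 * 60

flights_per_year = TONS_PER_YEAR / PAYLOAD_TONS
minutes_between_flights = MINUTES_PER_YEAR / flights_per_year
print(f"{flights_per_year:,.0f} flights/year, one every {minutes_between_flights:.1f} minutes")

# Grid comparison: one terawatt of chip power vs ~0.5 TW of total US grid capacity.
US_GRID_TW = 0.5
CHIP_TARGET_TW = 1.0
print(f"Chip power target is {CHIP_TARGET_TW / US_GRID_TW:.0f}x total US grid capacity")
```

At 200 tons per flight the numbers land exactly where Musk put them: 50,000 flights a year, roughly one every ten and a half minutes.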

TERAFAB is intended to produce two chip families: AI5, a purpose-stripped inference processor for Tesla vehicles and Optimus robots, with design nearly complete and small-batch production expected later this year; and D3, a space-hardened chip purpose-built for the orbital satellite constellation. Musk has described personal involvement in AI5’s design—the strategic decisions appear to be his; the detailed engineering work is being done by Tesla’s in-house chip team, whose names are not public. The D3 has no disclosed timeline, no foundry assignment, and no published architecture. SpaceX has already filed with the FCC to launch up to one million satellites built around it. The satellites are ready for ordering. The chip is ready for naming.

If launch prices fall to the levels Musk is targeting and TERAFAB delivers at anything approaching its stated capacity, the economics of orbital compute become at least arguable. Space offers effectively unlimited siting, free radiative cooling, and abundant solar power without grid or permitting constraints. In that model, the long-term savings eventually swamp the upfront cost of getting hardware off the ground. The physics are genuine. The execution is another matter.

What Stays on the Ground

Anything with a human or machine waiting on a response. Conversational AI, agentic pipelines, autonomous vehicles, industrial robotics, financial systems, real-time audio and video processing—all require response times that orbital round-trips cannot accommodate. LEO adds 40–80ms of latency before a single computation runs. GEO pushes that past 500ms. For a user waiting on a reply, or a robot waiting on a command, that’s disqualifying. Gravity, it turns out, is not the only thing keeping compute on Earth.
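The physics floor behind those latency figures follows from altitude alone. The sketch below assumes a ~550 km Starlink-class LEO shell and the 35,786 km geostationary altitude; note that pure light-speed propagation gives LEO only a few milliseconds, and it is routing, queuing, and ground-segment overhead that push real-world LEO round trips to the 40–80 ms range:

```python
# Speed-of-light lower bound on user -> satellite -> ground -> satellite -> user.
C = 299_792_458          # speed of light, m/s
LEO_ALT_M = 550e3        # assumed LEO shell altitude (Starlink-class)
GEO_ALT_M = 35_786e3     # geostationary altitude

def round_trip_floor_ms(altitude_m: float) -> float:
    # Four one-way hops straight up/down; ignores slant range and routing delays.
    return 4 * altitude_m / C * 1000

print(f"LEO propagation floor: {round_trip_floor_ms(LEO_ALT_M):.1f} ms")
print(f"GEO propagation floor: {round_trip_floor_ms(GEO_ALT_M):.0f} ms")
```

GEO's ~477 ms floor alone explains why the article's "past 500ms" figure is unavoidable there, before any computation runs.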

What moves to orbit? Training runs and batch workloads. A model training job that takes days doesn’t care about a 60ms round-trip. Neither does batch inference, large-scale data processing, scientific simulation, or pre-generated content rendering. These are the workloads that consume the most power and are hardest to site on Earth—and they are genuinely good candidates for orbital migration, if someone can build the infrastructure to get them there.

The 80 Percent Problem

Here is where Musk’s headline figure deserves scrutiny. Current data on workload composition suggests the orbital-eligible fraction of global data center compute is closer to 20–30 percent—not 80. The gap between those numbers is not a rounding error. It is the entire argument.

According to McKinsey’s December 2025 data center demand model, total global data center demand in 2025 runs approximately 82 GW, with AI training accounting for 23 GW, AI inference 21 GW, and non-AI workloads 38 GW. [McKinsey & Company] Training—the most straightforwardly orbital-eligible workload—represents roughly 28 percent of the total. Add the latency-tolerant fraction of batch processing and non-AI workloads and you might reach 35–40 percent, generously.
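The share arithmetic on those figures is straightforward to reproduce. In the sketch below, the 23/21/38 GW split is from the McKinsey model quoted above, but the 15 percent "latency-tolerant" fraction of inference and non-AI work is an illustrative assumption used only to show how generous one must be to reach the upper bound:

```python
# Reproducing the workload-share arithmetic from the McKinsey 2025 figures.
total_gw = 82
training_gw, inference_gw, non_ai_gw = 23, 21, 38

training_share = training_gw / total_gw
print(f"Training share: {training_share:.0%}")  # the straightforwardly orbital-eligible slice

# Generous upper bound: all of training plus an ASSUMED 15% latency-tolerant
# fraction of inference and non-AI workloads (illustrative, not sourced).
latency_tolerant_fraction = 0.15
eligible_gw = training_gw + latency_tolerant_fraction * (inference_gw + non_ai_gw)
print(f"Generous orbital-eligible share: {eligible_gw / total_gw:.0%}")
```

Training comes out near 28 percent, and even the generous bound stays under 40 percent, half of Musk's headline figure.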

The bigger problem is where growth is headed. Inference will account for roughly two-thirds of all AI compute by 2026, up from about one-third in 2023. [Deloitte Insights] And inference is structurally latency-bound. Inference workloads follow user behavior, and real-time responsiveness is key—which is why inference infrastructure needs to be close to population centers. [Edgecore] That requirement doesn’t dissolve with cheaper launch costs. It doesn’t dissolve at all.

McKinsey projects that by 2030, AI inference will represent more than 40 percent of total data center demand, overtaking non-AI workloads by 2029, while training holds steady at just under 30 percent. [McKinsey & Company] The dominant and fastest-growing category of compute is precisely the one most resistant to orbital migration. Musk’s 80 percent assumes a future where most inference migrates off-planet—which would require either a latency breakthrough that does not appear on any roadmap, or a fundamental restructuring of how AI applications are built that nobody has proposed.

None of this invalidates the core insight. Training workloads are insensitive to latency and can tolerate delays of up to 100 milliseconds between adjacent regions, which already allows hyperscalers to site them in remote, power-rich areas where grid capacity, land, and water are more available. [McKinsey & Company] Orbit is simply the logical extreme of that same siting logic. A more defensible claim might be that orbital compute captures 25–35 percent of global data center demand within the next two decades, concentrated in training and scheduled batch workloads. That is still an enormous market. It is just not the one Musk described in Austin.

The Harder Questions

Thermal management in low earth orbit, radiation hardening at scale, on-orbit servicing, and debris risk remain largely unaddressed in Musk’s public presentation. The D3’s design philosophy—running hotter to shed radiator mass—is elegant engineering thinking. But a chip that hasn’t taped out is not a solution to any of those problems yet. And the launch arithmetic is sobering: 50,000 Starship flights a year is not an engineering challenge, it is a category error relative to anything in the current manifest.

What is real: the terrestrial power constraint driving this vision is genuine and worsening. The semiconductor and systems industries have been quietly watching data center power demand outrun grid capacity for years. Musk is the first person with launch infrastructure, chip design capability, and apparent willingness to spend $25 billion making the orbital alternative credible. That is worth taking seriously, even if the specific numbers are not.

In Austin last week, the conversation shifted. Whether or not TERAFAB delivers on its promises, orbital compute is no longer a thought experiment. That much Musk has accomplished—which is, it should be said, more than most people accomplish in a career.

The rest of the scorecard, however, looks like this: Dojo was cancelled, revived, renamed, and partially absorbed into AI6—all within six months. AI5 was “finished” in July 2025, “almost done” in January 2026, and still not taped out in March. The D3 chip that the entire orbital compute vision depends on has no disclosed design, foundry, or timeline. SpaceX has an FCC filing for a million satellites built around a chip that doesn’t exist yet. And TERAFAB itself has no construction timeline and a price tag that isn’t in Tesla’s capital plan.

Standing in front of all of that, Musk announced the next three projects: megawatt satellites, a lunar factory, and an electromagnetic mass driver on the Moon.

He is, as ever, a man who is always three projects ahead of his last unfinished one.

Also Read:

Silicon Insurance: Why eFPGA is Cheaper Than a Respin — and Why It Matters in the Intel 18A Era

Captain America: Can Elon Musk Save America’s Chip Manufacturing Industry?

TSMC Technology Symposium 2026: Advancing the Future of Semiconductor Innovation


Silicon Insurance: Why eFPGA is Cheaper Than a Respin — and Why It Matters in the Intel 18A Era

by Daniel Nenni on 03-26-2026 at 10:00 am


As semiconductor technology advances into increasingly complex and expensive process nodes, the economic and technical risks associated with ASIC design have grown dramatically. At advanced nodes such as Intel 18A, the cost of a single design error can escalate into tens of millions of dollars, compounded by months of delay. In this environment, embedded FPGA (eFPGA) technology has emerged as a compelling solution, often described as “silicon insurance.” By integrating reconfigurable logic directly into an ASIC, eFPGA enables post-silicon flexibility that can eliminate the need for costly respins. Companies like QuickLogic are at the forefront of this shift, demonstrating that the modest overhead of eFPGA is far outweighed by the financial and strategic benefits it provides.

The fundamental issue with traditional ASIC design lies in its rigidity. Once fabricated, an ASIC is effectively immutable. Any functional bug, evolving standard, or late-stage requirement change necessitates a full respin. This process involves redesigning portions of the chip, regenerating masks, fabricating new wafers, and repeating validation cycles. At leading-edge nodes, mask sets alone can cost tens of millions of dollars, while the full respin cycle can delay product launch by six to twelve months. In fast-moving markets such as artificial intelligence, automotive systems, and data center infrastructure, such delays can result in lost market share that far exceeds the direct engineering costs.

Embedded FPGA addresses this challenge by introducing programmable logic into the ASIC fabric. Unlike fixed-function logic, eFPGA blocks can be reconfigured after fabrication, allowing designers to patch bugs, update algorithms, or adapt to new standards without altering the physical silicon. This capability fundamentally changes the risk profile of chip design. Instead of committing entirely to fixed functionality, designers can reserve a portion of the die for flexibility, effectively hedging against uncertainty. The additional area and power overhead associated with eFPGA becomes a predictable, bounded cost, analogous to an insurance premium, in exchange for avoiding the potentially catastrophic expense of a respin.
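The insurance analogy can be made concrete with a simple expected-cost comparison. All of the numbers below are illustrative assumptions, not figures from QuickLogic or Intel:

```python
# Expected-cost comparison: fixed eFPGA "premium" vs probabilistic respin.
# All inputs are hypothetical, chosen only to illustrate the break-even logic.
respin_cost = 30e6        # assumed full respin cost at an advanced node, USD
respin_probability = 0.3  # assumed chance a bug or standard change forces a respin
efpga_overhead = 3e6      # assumed total cost of eFPGA area/power/licensing overhead

expected_respin_cost = respin_probability * respin_cost
print(f"Expected respin cost: ${expected_respin_cost/1e6:.0f}M "
      f"vs eFPGA premium: ${efpga_overhead/1e6:.0f}M")

# The premium pays off whenever overhead < probability * respin cost,
# i.e. whenever the respin probability exceeds overhead / respin_cost.
break_even_probability = efpga_overhead / respin_cost
print(f"Break-even respin probability: {break_even_probability:.0%}")
```

Under these assumed numbers the hedge pays off whenever the respin risk exceeds 10 percent, which is why the "premium" framing holds at advanced nodes where both the cost and the probability of a respin are rising.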

Beyond cost avoidance, eFPGA provides significant advantages in time-to-market. In many applications, being first to market is critical. A respin not only incurs direct costs but also disrupts product timelines, often causing companies to miss key deployment windows. By contrast, eFPGA enables iterative development even after silicon has been deployed. Hardware can evolve alongside software, allowing companies to respond quickly to changing requirements or competitive pressures. This agility is particularly valuable in domains where workloads are not fully defined at design time, such as machine learning accelerators or edge computing platforms.

Historically, one of the main criticisms of FPGA-based approaches has been their impact on power, performance, and area (PPA). However, recent advancements in eFPGA hard IP have significantly narrowed this gap. QuickLogic has been a leader in optimizing eFPGA architectures to reduce overhead while maintaining flexibility. This progress is exemplified by the company’s recent announcement, QuickLogic Announces Contract for High Density eFPGA Hard IP Optimized for Intel 18A. This development focuses on architectural enhancements that improve logic density, reduce power consumption, and increase performance, making eFPGA more viable for integration into cutting-edge ASIC designs.

The significance of this announcement extends beyond a single contract. Intel 18A represents one of the most advanced semiconductor nodes, incorporating new transistor architectures and manufacturing techniques that further increase design complexity and cost. By optimizing eFPGA hard IP for such a node, QuickLogic is demonstrating that embedded programmability can coexist with the highest levels of silicon efficiency. Moreover, the architectural improvements developed for this engagement are designed to be extensible across multiple advanced nodes, indicating a broader industry trend toward integrating flexibility directly into the silicon fabric.

This shift reflects a deeper transformation in how hardware is designed and deployed. As systems become more complex and workloads more dynamic, the traditional boundary between hardware and software is blurring. eFPGA enables a more software-like approach to hardware, where functionality can be updated, optimized, and extended over time. This capability is increasingly important for applications that require long lifecycles, such as aerospace, defense, and infrastructure, where replacing hardware is costly or impractical.

Bottom line: eFPGA serves as a form of silicon insurance that fundamentally alters the economics of chip design. By trading a modest increase in area and power for the ability to avoid expensive and time-consuming respins, designers can significantly reduce both financial risk and time-to-market uncertainty. QuickLogic’s work on high-density eFPGA hard IP optimized for Intel 18A underscores the growing maturity of this approach and its relevance at the most advanced process nodes. As the semiconductor industry continues to push the limits of performance and integration, the ability to adapt silicon after fabrication is no longer a luxury; it is a necessity.

Also Read:

Global 2nm Supply Crunch: TSMC Leads as Intel 18A, Samsung, and Rapidus Race to Compete

TSMC vs Intel Foundry vs Samsung Foundry 2026

Intel to Compete with Broadcom and Marvell in the Lucrative ASIC Business


Synopsys Advances Hardware Assisted Verification for the AI Era

by Kalar Rajendiran on 03-26-2026 at 6:00 am


At the 2026 Synopsys Converge Event, Synopsys announced a broad set of new products and platform upgrades, with its hardware-assisted verification (HAV) announcement emerging as a key highlight within that lineup. A key aspect of this announcement was moving beyond a hardware-centric model to a more scalable, programmable infrastructure that can continuously evolve through software updates. By positioning HAV in this way, Synopsys emphasized improvements in performance, automation, and system level validation needed to keep pace with the growing complexity of AI driven semiconductor designs. As AI systems scale in complexity and deployment accelerates, verification has increasingly become a bottleneck to bringing new silicon to market. In that context, the shift to software-defined HAV represents a strategic response to one of the industry’s most pressing constraints.

The announcement builds on the company’s 2025 introduction of next generation HAV hardware, including the Synopsys ZeBu-200 emulation platform and Synopsys HAPS-200 FPGA prototyping systems. Those platforms expanded capacity and performance for large system on chip designs. In 2026, Synopsys layered on top of that foundation with software-defined updates, along with automation capabilities, and new configurations such as the ZeBu-200 12 FPGA system and the HAPS-200 1 FPGA and 12 FPGA systems.

Verification Hits a Breaking Point

The backdrop to the announcement is a fundamental shift in semiconductor design. AI processors now integrate heterogeneous compute engines, massive memory bandwidth, and increasingly rely on multi die chiplet architectures. At the same time, the software stacks that run on these systems have grown just as complex.

Traditional RTL simulation, long the backbone of verification, cannot keep up with these demands at system scale. Running meaningful workloads can take weeks or months, making it impractical for validating full system behavior. The challenge is no longer validating blocks in isolation, but validating entire systems running real AI workloads, something conventional approaches cannot do within practical development timelines. Hardware assisted verification, using emulation and FPGA prototyping, has therefore emerged as a critical solution, enabling designs to run orders of magnitude faster and allowing software development to begin before silicon is available.

From Hardware Appliances to Software Defined Infrastructure

What distinguishes the 2026 announcement is a shift in how HAV platforms are conceptualized. Historically, systems like emulators and FPGA prototypes were treated as high performance but relatively fixed hardware appliances, with improvements tied primarily to new hardware generations.

Synopsys is now moving toward a software-defined HAV architecture, in which the software layer of platforms such as ZeBu Server 5, ZeBu-200, and HAPS-200 dynamically manages resources, workloads, and debugging capabilities. This enables performance gains, such as up to two times improvements on ZeBu Server 5, to be delivered through software updates rather than requiring new hardware deployments.

This shift is critical because AI workloads, architectures, and software stacks are evolving faster than hardware refresh cycles. A hardware only model cannot keep pace with this rate of change. By contrast, software-defined HAV allows verification platforms to improve continuously, enabling teams to adapt to new AI workloads and system requirements without waiting for the next generation of hardware. The result is a more flexible and future proof verification environment.

Automation Moves into the Core of Verification

Another notable aspect of the announcement is the introduction of hardware-assisted test automation, signaling a shift toward more automated verification workflows. Rather than relying heavily on manually constructed tests, engineers can now run automated validation scenarios directly on HAV platforms.

These include complex system level checks such as cache coherency validation and subsystem stress testing across processor, memory, and IO architectures. In the 2026 announcement, Synopsys positions these not simply as automation features, but as hardware-assisted test solutions capable of exercising full processor subsystems under realistic workloads. By running automated coherency and subsystem validation directly on Synopsys HAV platforms, engineers can expose bugs that typically only appear under long running, highly concurrent AI workloads, conditions that are difficult or impractical to reproduce in traditional verification environments.

At AI scale, where subsystem interactions dominate system behavior, this shift from manually constructed tests to automated, workload driven validation becomes essential. The number of possible interactions across cores, caches, and memory systems is simply too large for manual approaches, making automation not just a productivity improvement, but a fundamental requirement for verifying modern AI processors.

Scaling for AI Scale Designs

Performance and capacity remain central to the HAV roadmap, and the 2026 announcement introduces meaningful gains in both areas. The ZeBu Server 5 sees up to a twofold increase in runtime performance, while modular configurations such as the ZeBu-200 12 FPGA and HAPS-200 12 FPGA systems enable similar scaling in capacity.

These improvements are critical for supporting AI chips that may incorporate tens of billions of gates, multiple compute domains, and complex interconnect structures. Faster compile times and enhanced debug capabilities further help teams manage these increasingly large verification workloads, reducing the time required to reach meaningful coverage. A customer quote substantiates this point:

“As AI-driven systems become more complex, verification must scale just as quickly. Hardware-assisted verification is no longer optional. It is critical to meeting aggressive time-to-market goals and ensuring silicon readiness,” said Salil Raje, Senior Vice President and General Manager, Adaptive and Embedded Computing Group, AMD. “FPGA-based emulation and prototyping play a central role in that effort by accelerating system bring-up and enabling earlier software development. Our collaboration with Synopsys reflects that focus. Through joint optimization of Synopsys ZeBu with the AMD Vivado™ software stack, and by leveraging AMD EPYC™ processors for compute acceleration, we are reducing compile times and helping customers move to accurate system models faster.”

Flexible Configurations for a Broader Range of Use Cases

Alongside performance improvements, Synopsys is expanding the flexibility of its HAV offerings. The introduction of configurations such as the HAPS-200 1 FPGA desktop system provides an entry point for IP level validation and early software development, while larger configurations like the HAPS-200 12 FPGA and ZeBu-200 12 FPGA systems scale to full system verification.

This range of configurations allows design teams to align their verification infrastructure more closely with specific project needs, supporting a continuum from early stage validation to full system workload execution.

Expanding the Scope of What Can Be Verified

The 2026 HAV enhancements also broaden the scope of verification itself. New support for real number modeling (RNM) allows analog behavior to be approximated within digital verification flows, while fault emulation capabilities address the needs of safety critical applications.

These additions reflect a shift toward full system validation, where digital logic, analog effects, and software interactions must all be considered together. As semiconductor designs become more heterogeneous, this expanded coverage is essential for accurately validating real world AI systems rather than simplified models.

Summary

The Synopsys Converge Event HAV Announcement reflects a broader industry transition. As AI drives exponential growth in chip complexity, verification must evolve from a collection of tools into a scalable, software driven platform.

The significance of software defined HAV lies in its role in enabling AI proliferation itself. As AI hardware becomes more complex and deployment cycles accelerate, the ability to verify systems quickly and at scale determines how fast innovation can reach the market. By removing verification as a limiting factor, through continuous performance improvements, automation, and system level validation, Synopsys is positioning HAV not just as a tool, but as a critical enabler of the AI ecosystem.

Read the entire HAV announcement here.

Learn more at Synopsys.com/HAV

Also Read:

Scaling Multi-Die Connectivity: Automated Routing for High-Speed Interfaces

Synopsys Explores AI/ML Impact on Mask Synthesis at SPIE 2026

Agentic AI and the Future of Engineering


Chemical Origins of Environmental Modifications to MOR Lithographic Chemistry

by Daniel Nenni on 03-25-2026 at 10:00 am


In the pursuit of advanced extreme ultraviolet (EUV) lithography for high-NA patterning, metal oxide resists (MORs) offer significant promise but face challenges like critical dimension (CD) variation due to atmospheric interactions. Presented at SPIE Advanced Lithography + Patterning 2025 by Kevin M. Dorney and colleagues from imec, this study delves into the chemical mechanisms behind environmental modifications during exposure and processing, emphasizing the role of gases like O₂, CO₂, and H₂O in post-exposure delay (PED) and bake (PEB).

MORs, such as tin-based systems, undergo ligand loss upon EUV exposure, leading to condensation and pattern formation. However, post-exposure atmospheric exposure can cause CD drift, linked to airborne molecular contaminants (AMCs) and humidity. Literature reviews highlight mechanisms: Kenane et al. propose CO₂ and H₂O forming Sn-O-C=O-Sn bridges; Frederick et al. suggest O₂ enhancing ligand loss in Keggin clusters; Zhang et al. outline air/N₂ pathways yielding oxygenated Sn sites; and Castellanos et al. attribute CD drift to H₂O and AMCs cleaving ligands faster.

To probe these mechanisms, imec uses its BEFORCE platform, which integrates EUV exposure, FTIR spectroscopy, outgassing measurements, and controlled environments to enable precise studies. The tool allows MOR coating, dosed EUV exposure, PED/PEB in custom atmospheres (e.g., varying O₂, N₂, CO₂, RH), and offline development/ellipsometry.

Initial FTIR on an open-source MOR (OSMO) reveals post-exposure ligand loss, PED-induced H₂O uptake, and PEB-driven further loss, counter-ion departure, and SnO formation. Notably, a new ~1700 cm⁻¹ peak (C=O) emerges post-PEB in air but not vacuum or N₂, indicating an air-specific, thermally stable product.

Investigating origins, CO₂ variations show no chemical change, ruling it out. Isolating H₂O via RH skew in N₂ vs. clean air (CA) PEBs (35 mJ, 220°C, 60 s) yields faint or absent C=O in N₂ but consistent presence in CA regardless of RH. Offsets in the 1700/1550 cm⁻¹ peak ratio suggest a role for O₂, possibly forming esters.

Focused DOEs confirm O₂ drives C=O: at fixed 220°C PEB, intensity rises non-linearly with O₂% (diluted in N₂); at fixed O₂%, it emerges ~200°C and sharpens with temperature. Thus, O₂ and PEB temp dominate oxygenated carbon formation; CO₂/H₂O show no dependence.

Kinetics reveal that O₂ amplifies ligand cleavage: at fixed temperature, loss follows a first-order rate law, suggesting a unimolecular O₂-MOR reaction. Rate constants (k) scale with O₂ (e.g., 4.18×10⁻³ s⁻¹ at 21%, 6.11×10⁻³ s⁻¹ at 50%), yielding activation energies of ~58 kJ/mol. The temperature dependence shows an exponential increase in cleavage, consistent with a non-catalytic process.
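With a first-order rate law, the fraction of ligands remaining after a bake is simply exp(-kt). The sketch below plugs in the quoted rate constants at an assumed 60 s bake (the duration is taken from the 60 s PEB conditions mentioned earlier, not stated for this particular dataset):

```python
import math

# First-order ligand loss: remaining fraction = exp(-k * t).
k_21pct_O2 = 4.18e-3   # s^-1, quoted rate constant at 21% O2
k_50pct_O2 = 6.11e-3   # s^-1, quoted rate constant at 50% O2
t_peb = 60.0           # s, bake duration assumed from the 60 s PEB conditions

for label, k in [("21% O2", k_21pct_O2), ("50% O2", k_50pct_O2)]:
    remaining = math.exp(-k * t_peb)
    print(f"{label}: k = {k:.2e} /s, ligands remaining after {t_peb:.0f} s: {remaining:.1%}")

print(f"Rate enhancement at 50% vs 21% O2: {k_50pct_O2 / k_21pct_O2:.2f}x")
```

The roughly 1.5x rate enhancement between 21% and 50% O₂ is what compounds, over longer bakes and higher temperatures, into the ~3x ligand-loss enhancement cited below.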

Contrasting exposed vs. unexposed films, O₂ enhances cleavage only post-EUV, implying radicals or active sites from exposure enable O₂ reaction. Proposed: EUV cleaves ligands, creating radical Sn for O₂ insertion, forming Sn-O-Sn or peroxides, unlike unexposed thermal processes.

Quantitatively, 50% O₂ yields ~3x ligand loss enhancement at high temps, potentially lowering EUV doses by amplifying sensitivity without resolution loss. Companion work (Pollentier et al., SPIE ALP 2026) links this to dose-to-gel reductions up to 30%.

Bottom Line: This study reveals O₂-dependent chemistry in model MORs that requires EUV activation. The outlook includes in-situ PEB studies, PED O₂ effects, and H₂O+O₂ synergy. Funded by the EU’s Chips Joint Undertaking and partners, these insights enable co-optimized environments for stable, sensitive MOR processes, advancing semiconductor scaling.

Acknowledgments thank imec teams, Intel, and suppliers for materials and discussions.

Also Read:

Beyond Moore’s Law: High NA EUV Lithography Redefines Advanced Chip Manufacturing

Accelerating Computational Lithography Using Massively Parallel GPU Rasterizer

Unraveling Dose Reduction in Metal Oxide Resists via Post-Exposure Bake Environment


Post-Silicon Validating an MMU. Innovation in Verification

by Bernard Murphy on 03-25-2026 at 6:00 am


Some post-silicon bugs are unavoidable, but we’re getting better at catching them before we ship. Here we look at a method based on a bare-metal exerciser to stress-test the MMU. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Post Silicon Validation of the MMU. The authors are from IBM, and this is an extension to a paper we looked at in 2022. The paper we review here was published at the DATE conference in 2021 and has 2 citations. The method generates (offline) multi-threaded tests to run on first silicon, requiring only a bare-metal interface. This exerciser is a self-contained program with all supporting data and library functions and can run indefinitely using new randomized choices for each set of threads.

The extension here focuses on MMU functions (TLB, page table walks, etc.) in a multi-core environment. In part, the concepts are similar to pre-silicon testing methods but exploit the advantage that randomized post-silicon testing can cover a much larger state space over a much longer time than would be possible in RTL simulation. Similarly, the exerciser can add system-level stresses to testing such as context switches and page migration.

Paul’s view

Back to the topic of bare-metal exercisers this month with another paper from IBM on their Threadmill tool. These exercisers are template-based generators of pseudo-random software programs. The programs generated are “bare metal” without any need for an operating system so have the freedom to create a range of low-level hardware race conditions that would otherwise be very hard to hit. The primary application is in post-silicon testing, but there is an increasing trend to use these exercisers in pre-silicon emulation since they often run much faster than classic emulation testbenches.

Our previous blogs covered bare-metal exercisers targeting coherency bugs in multi-core CPUs, deploying various tricks to stress test load/store race conditions between concurrent threads running on multiple cores. This month’s blog is on stress testing virtual to physical address translation in a memory management unit (MMU). A modern MMU has a lot of complexity, including caches for recently accessed address translations, multi-level address translation tables, and security controls. Covering all the corner cases for cache misses, thread context switches to different virtual address spaces, and security policy violations, especially race conditions between combinations of these exceptions, is borderline impossible without doing it at a bare metal level.

One key innovation in the paper is to use constraint solving (exactly as in a commercial logic simulator), to stress test permutations of concurrent address translations between multiple threads, especially including all possible walks through multi-level address translation tables. Another innovation relates to stress testing changes to the translation tables. Here, one thread runs a program that continuously locks a random block of virtual addresses and moves its associated physical address block to another location in physical memory, updating the MMU translation table accordingly. Meanwhile multiple other threads continuously run random load and store operations to those virtual addresses.
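The page-migration stress pattern described above can be sketched in miniature: one "mover" thread continuously relocates a random virtual page's physical frame and updates the translation table, while worker threads load and store through the translation. This is a toy model under assumed names (page_table, phys_mem), not the authors' Threadmill implementation; a real exerciser would run bare-metal on silicon with per-block locks rather than a single Python lock.

```python
import random
import threading

# Illustrative sketch of the paper's page-migration stress pattern.
# PAGES, FRAMES, and the data structures are assumptions for this toy model.
PAGES, FRAMES, ITERS = 8, 32, 2000

page_table = {v: v for v in range(PAGES)}   # virtual page -> physical frame
phys_mem = {f: 0 for f in range(FRAMES)}    # physical frame -> stored value
table_lock = threading.Lock()               # stands in for per-block locking

def mover():
    """Continuously migrate a random virtual page to a free frame."""
    rng = random.Random(1)
    for _ in range(ITERS):
        with table_lock:
            vpage = rng.randrange(PAGES)
            new_frame = rng.randrange(FRAMES)
            if new_frame in page_table.values():
                continue                            # frame busy; try again later
            old_frame = page_table[vpage]
            phys_mem[new_frame] = phys_mem[old_frame]  # copy the data
            page_table[vpage] = new_frame              # update the translation

def worker(seed):
    """Random loads and stores through the translation table."""
    rng = random.Random(seed)
    for _ in range(ITERS):
        with table_lock:
            frame = page_table[rng.randrange(PAGES)]
            if rng.random() < 0.5:
                phys_mem[frame] = rng.randrange(100)   # store
            else:
                _ = phys_mem[frame]                    # load

threads = [threading.Thread(target=mover)] + [
    threading.Thread(target=worker, args=(s,)) for s in (2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Invariant: after thousands of migrations, no two virtual pages
# may share a physical frame.
assert len(set(page_table.values())) == PAGES
```

The interesting bugs in real silicon arise when the lock granularity or TLB invalidation is subtly wrong, which is exactly the race-condition space this pattern is designed to probe.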

The authors implement all their ideas using Threadmill and highlight three deep corner-case bugs that their solution found that were missed both by other in-house IBM exercisers and by their regular pre-silicon DV work. They also compare RTL code coverage achieved across the same experiments, showing that Threadmill beats the other exercisers by about 4%, although it still trails regular pre-silicon DV coverage by 3%. Tight paper, with some nice ideas and clear benefits.

Raúl’s view

This month’s paper, “Post-Silicon Validation of the MMU”, presents a methodology for validating a Memory Management Unit using a bare-metal exerciser (Threadmill). The MMU is not “just another block”: while the core is pure logic, the MMU is a co-designed, distributed HW/SW protocol, which makes it disproportionately hard to verify. It sits at the boundary between hardware and the OS (or hypervisor), translating virtual to physical addresses through multi-level tables and caches (TLBs). This creates a massive combinatorial space, with aliasing, contexts, and shared resources across cores. The key idea is to significantly enrich the generation of address translation scenarios beyond simple VA-to-PA mappings, toward randomized, constraint-driven, and context-aware translations.

The approach includes off-target generation of translation mappings (page tables and paths) using constraint solvers, graph-based constraint solving to create diverse translation paths, complex runtime behaviors (page migration, context switching, TLB invalidations) and embedded exception handlers. The result is a system that better stresses corner cases in MMU behavior, especially those involving concurrency, aliasing, and rare timing interactions — areas where pre-silicon verification is weakest.

The paper reports RTL coverage as the primary quantitative metric, highlighting a ~4% improvement over a state-of-the-art exerciser. RTL coverage is primarily a pre-silicon sign-off metric and is used here as a proxy for comparability. In post-silicon validation the real interest is bugs, especially the rare, high-impact ones. Coverage tells us how much of the map was explored; it does not tell us if we found what we were looking for. The paper does list non-trivial bugs found, including ones that led to additional tape-outs; the 4% is unlikely to be “more of the same” coverage but rather harder-to-reach corner cases.

I found the paper hard to read. Many of the innovations over a baseline Threadmill-style exerciser are described almost entirely in dense prose, for example the translation engine (GCSP over DAGs) and runtime scenarios like page migration, context switching, and attribute perturbations. The paper also leans heavily on prior work (papers, patents, and internal techniques). For experts this may be familiar; for broader audiences it reduces accessibility.

The paper is a strong, experience-driven contribution to post-silicon verification reflecting a rich, mature body of industrial knowledge, particularly relevant for teams dealing with MMU verification. While the presentation is dense and occasionally hard to read, the underlying ideas remain highly relevant. It is methodologically aligned with current practice (randomization, stress, coverage), but does not yet reflect newer paradigms (e.g., ML-guided test generation, agentic flows, feedback-driven exploration). The reported gains may look incremental, but in the context of late-stage silicon validation, they can be the difference between a clean product launch and an expensive respin.

Also Read:

An Agentic Formal Verifier. Innovation in Verification

Agentic EDA Panel Review Suggests Promise and Near-Term Guidance

TSMC and Cadence Strengthen Partnership to Enable Next-Generation AI and HPC Silicon


Securing UALink in AI clusters with UALinkSec-compliant IP

Securing UALink in AI clusters with UALinkSec-compliant IP
by Don Dingee on 03-24-2026 at 10:00 am

UALinkSec 200 Security Module block diagram

A classic networking problem is securing connections with encrypted data, but implementing strong encryption algorithms at wire speeds can limit performance. However, introducing blazing-fast connectivity without an encryption strategy leaves systems vulnerable. The architects in the UALink Consortium, including Synopsys representation, understood their assignment. UALink defines point-to-point accelerator links with a switched architecture for scaling up AI clusters to 1,024 accelerators, and the latest UALink 200G specification solidifies the UALinkSec security framework. As a companion to its UALink controller IP and robust 224G PHY IP, Synopsys is introducing its UALinkSec_200 Security Module, the first specification-compliant implementation for UALink security.

Inserting UALinkSec in the UALink network layers

UALink borrows a standard Ethernet PHY physical layer and adds unique link, transaction, and protocol layers to build in advanced features for point-to-point connections. This physical-layer choice enables immediate reuse of Ethernet 802.3dj-compliant PHY components, including the Synopsys 224G PHY IP. Low latency is a primary consideration, and simplifying assumptions help. Fixed payloads carry either 64 or 640 bytes; cable length is kept under 4 meters; and endpoints are limited to 1,024. Link-layer retransmission and credit-based flow control keep data moving, with retransmissions occurring in less than 1 µs. A high-level overview of the stack from the UALink 200 v1.0 spec:
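The credit-based flow control mentioned above can be sketched as a toy model: the sender spends one credit per transmitted flit, and the receiver returns a credit each time it drains a flit from its buffer, so the sender can never overrun the receive buffer. Class, method, and buffer-size choices here are illustrative assumptions, not taken from the UALink specification.

```python
from collections import deque

class CreditLink:
    """Toy model of link-layer credit-based flow control."""
    def __init__(self, buffer_slots=4):
        self.credits = buffer_slots     # one credit per receive-buffer slot
        self.rx_buffer = deque()

    def send(self, flit):
        """Transmit only if a credit is available; otherwise back-pressure."""
        if self.credits == 0:
            return False                # sender must wait
        self.credits -= 1
        self.rx_buffer.append(flit)
        return True

    def receive(self):
        """Receiver drains a flit and returns a credit to the sender."""
        if not self.rx_buffer:
            return None
        flit = self.rx_buffer.popleft()
        self.credits += 1
        return flit

link = CreditLink(buffer_slots=2)
assert link.send("flit-0") and link.send("flit-1")
assert not link.send("flit-2")          # buffer full: back-pressured
assert link.receive() == "flit-0"       # credit returned to sender
assert link.send("flit-2")              # transmission resumes
```

The appeal of the scheme is that back-pressure is implicit and lossless: no flit is ever dropped for lack of buffer space, which keeps retransmission limited to genuine link errors.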

Between the transaction and protocol layers sits UALinkSec, deceptively thin in its description as “[end-to-end] encryption and authentication.” Its role is to protect network traffic and switches from any adversary, whether physically present or virtually inserted. UALinkSec supports encryption and authentication of all the UPLI protocol channels – requests, read responses, and write responses. When enabled, it provides data confidentiality and integrity. A simplified view, with the keys indicating UALinkSec operation:

Encryption based on AES-GCM for security and speed

The good news is UALinkSec is cleanly decoupled from the other UALink layers, making it ripe for a dedicated hardware co-processor block. Still, processing encryption algorithms can be a heavy-duty task, and power efficiency in AI data centers is a growing concern, especially since it scales directly with the number of AI nodes. Any encryption battle where processing time and power consumption are crucial parameters is won or lost on a simple decision: choosing the right encryption algorithm. If an algorithm is efficient, it’s a much more straightforward task to wrap processing around it and deliver encrypted data on time with as few watts as possible.

When you create a new security specification, you can choose a modern encryption algorithm that offers both security and speed. For UALinkSec, that choice is AES-GCM, an authenticated-encryption mode that combines the AES block cipher with Galois/Counter Mode for extremely fast symmetric-key encryption with built-in integrity protection. Dedicated, inexpensive hardware unleashes the full speed of AES-GCM.

Against that background, Synopsys created a new IP block, the UALinkSec_200 Security Module, which complements its UALink controller IP and 224G PHY, forming a complete UALink IP Solution. The UALinkSec_200 Security Module aligns with the UALinkSec component of the UALink 200 specification. In addition to encryption and decryption functions, it supports key derivation functionality and optional authentication support – all at full UALink speeds of 200 GT/s per lane. A block diagram shows how it handles both transmit and receive data paths:
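The key-derivation functionality mentioned above follows a general pattern that can be illustrated with a minimal HKDF (RFC 5869) sketch using only Python's standard library. To be clear, UALinkSec defines its own key-derivation scheme; this is just the generic extract-then-expand shape such hardware implements, and the function and input names are illustrative.

```python
import hashlib
import hmac

def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """Minimal HKDF-SHA256 sketch (RFC 5869): extract, then expand."""
    # Extract: concentrate input keying material into a fixed-size PRK.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: stretch the PRK into the requested number of output bytes.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Hypothetical inputs; real inputs would come from the link's key exchange.
session_key = hkdf(b"shared-secret", b"salt", b"ualink-session", 32)
assert len(session_key) == 32
# Derivation is deterministic for the same inputs.
assert session_key == hkdf(b"shared-secret", b"salt", b"ualink-session", 32)
```

In a hardware security module, this derivation chain lets a single provisioned secret spawn per-session AES-GCM keys without ever exposing the root key on the wire.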

For more background, the UALink Consortium has a white paper that provides an introduction to the UALink 200G specification, including a section on UALinkSec.

Synopsys teams detail their solution in a blog post discussing the architecture and features of the UALinkSec_200 Security Module, along with additional information, including overviews and data sheets for all three components of the UALink IP Solution. Learn more at these links.

Blog post:    Securing UALink: Introducing Synopsys UALinkSec_200 Security Module

Webpages:

Synopsys UALinkSec_200 Security Module

Synopsys UALink IP Solution

UALink for Scalable AI Systems


GTC 2026: Agentic AI for Semiconductor Design and Manufacturing

GTC 2026: Agentic AI for Semiconductor Design and Manufacturing
by Daniel Nenni on 03-24-2026 at 8:00 am

Janhavi Giri, PhD Principal Architect, EDA & AI NetApp, GTC 2026

Agentic AI is emerging as a transformative paradigm in semiconductor design and manufacturing, driven by the exponential growth in data, system complexity, and performance demands. Modern semiconductor fabs generate massive volumes of heterogeneous data at unprecedented velocity. For instance, a single minute of operation in a gigafab can produce tens of thousands of wafer movement events, thousands of sensor readings, and over 100 GB of equipment and lithography data. This data-intensive environment necessitates advanced AI-driven systems capable of real-time ingestion, analysis, and decision-making.

Traditionally, semiconductor workflows relied on heuristic-based methods and manual engineering expertise. Over the past three decades, these workflows have evolved through classical machine learning and deep learning into generative AI systems, culminating in the emergence of agentic AI. Agentic AI represents a shift from assistive intelligence to autonomous systems capable of planning, reasoning, and executing complex tasks with minimal human intervention. This transition enables higher levels of automation, improved design productivity, and significant reductions in time-to-market.

In EDA, agentic AI is being deployed as multi-agent systems that orchestrate various stages of chip design. These agents specialize in tasks such as specification generation, microarchitecture design, verification, and physical implementation. Coordinated by an orchestrator agent, they collaboratively optimize power, performance, and area while ensuring functional correctness. Industry implementations demonstrate substantial productivity gains; for example, AI-driven EDA agents can achieve up to 10× acceleration in design workflows and significantly improve bug detection and resolution efficiency.
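The orchestrator-plus-specialists pattern described above can be sketched schematically. Everything here is a hypothetical illustration: the agent names, the "verified" gating flag, and the retry policy are assumptions for the sketch, not any vendor's implementation; real agents would wrap LLM and tool calls.

```python
from typing import Callable

def spec_agent(state: dict) -> dict:
    """Stand-in for an agent that drafts a specification."""
    state["spec"] = "decoded instruction set spec"
    return state

def verify_agent(state: dict) -> dict:
    """Stand-in for a verification agent; records a pass/fail flag."""
    state["verified"] = "spec" in state
    return state

class Orchestrator:
    """Sequences specialist agents and retries a stage if its checks fail."""
    def __init__(self, stages: list[Callable[[dict], dict]]):
        self.stages = stages

    def run(self, state: dict, max_retries: int = 2) -> dict:
        for stage in self.stages:
            for _ in range(max_retries + 1):
                state = stage(state)
                if state.get("verified", True):   # gate each stage on checks
                    break
        return state

result = Orchestrator([spec_agent, verify_agent]).run({})
assert result["verified"]
```

The shared state dictionary is what makes the agents collaborative rather than a fixed pipeline: each stage can read what earlier stages produced and the orchestrator can route failures back for rework.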

In manufacturing, agentic AI enhances yield optimization and root cause analysis. Advanced machine learning models analyze wafer inspection data to detect defect patterns and predict failure modes. Knowledge graph-based systems provide structured representations of semiconductor processes, linking entities such as lots, wafers, dies, and packages. These semantic models ground AI reasoning, reduce hallucinations, and enable traceable decision-making. Additionally, digital twins and virtual fabrication environments allow AI agents to simulate process variations and recommend optimal parameter adjustments, reducing yield ramp time and operational costs.

A critical enabler of agentic AI is the underlying data infrastructure. Semiconductor systems require AI-ready data fabrics that support high-throughput ingestion, storage, indexing, and retrieval of multimodal data. These platforms must incorporate versioning, lineage tracking, and real-time streaming to ensure reproducibility and reliability of AI-driven workflows. GPU-accelerated compute pipelines further enable large-scale simulations and model training, while scalable MLOps frameworks support continuous deployment and orchestration of AI agents.

The transition to agentic AI is typically incremental. Organizations begin with assistive AI tools such as copilots for documentation search, code generation, and design exploration. Subsequently, they build domain-specific semantic foundations using ontologies and knowledge graphs. As data pipelines mature, enterprises adopt single-purpose agents and gradually evolve toward fully orchestrated multi-agent systems targeting high-impact use cases such as verification signoff, yield analysis, and design-for-manufacturability checks.

Bottom line: Agentic AI represents a paradigm shift in semiconductor engineering, enabling autonomous, intelligent systems that span the entire silicon lifecycle—from design to fabrication. By integrating advanced AI models with robust data infrastructure and domain-specific knowledge representations, the industry is moving toward fully automated, self-optimizing workflows. This evolution is essential to address the growing complexity of semiconductor technologies and to sustain innovation in the trillion-dollar global chip market.

Also Read:

Agentic EDA Panel Review Suggests Promise and Near-Term Guidance

Cloud-Accelerated EDA Development

Agentic AI and the EDA Revolution: Why Data Mobility, Security, and Availability Matter More Than Ever


Trust in Verification with AI

Trust in Verification with AI
by Bernard Murphy on 03-24-2026 at 6:00 am

Uncharted waters

These are stressful times in functional verification. We are being pushed to more aggressively embrace AI-based automation, knowing we will continue to be held accountable for quality of results. Verification misses could upend careers, maybe enterprises. It is tempting to believe that sanity will prevail and we will ultimately settle back into cautious AI adoption, but I am no longer so sure. Offset against verification risk is the very real chance that a competitor with a bigger risk appetite and a little luck will jump ahead and leave the rest of us wondering why customers are disappearing. We need to accept that we must reach further, exploiting our creativity to manage verification risk in uncharted waters, trusting with confidence that agents on semi-autopilot will not steer us into a rock.

Views from DVCon

I heard several opinions at DVCon. These come down to decomposing agentic flows into steps with checkpoints at which a human reviewer can easily check an agent’s work and correct as needed.

An example would be using an agent to read a spec/test spec for a specific function and from that generate PSS tests. A reasonably experienced DV engineer should be able to compare the relevant section of the spec with the generated PSS to check for correctness and completeness, and iterate if needed. Finally, synthesize the PSS into UVM or other tests, run a simulation, and score the run for coverage and assertion failures, which may in turn trigger more iterations, steered by additional feedback from DV. There is evidence from a panel I moderated that, through enough iterations, it is possible to converge to engineering accuracy. Over time, fewer iterations are needed, growing trust.
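The generate-score-iterate loop described above can be sketched schematically. The generate and score functions below are stand-ins for real agent and simulation calls, and all the numbers (coverage per test, target, retry budget) are arbitrary assumptions; the point is only the shape of the feedback loop with its human checkpoint.

```python
def generate_tests(spec: str, feedback: str) -> list[str]:
    # Stand-in for an LLM agent emitting PSS/UVM tests from the spec;
    # it produces more tests for each coverage gap reported in feedback.
    n = 2 + feedback.count("gap")
    return [f"test_{i}" for i in range(n)]

def score_run(tests: list[str]) -> float:
    # Stand-in for a simulation run; coverage saturates as tests accumulate.
    return min(1.0, 0.25 * len(tests))

def converge(spec: str, target: float = 0.9, max_iters: int = 5) -> float:
    """Iterate generation and scoring until coverage reaches the target."""
    feedback, coverage = "", 0.0
    for _ in range(max_iters):
        tests = generate_tests(spec, feedback)
        coverage = score_run(tests)
        if coverage >= target:      # a human review checkpoint would sit here
            break
        feedback += " gap"          # DV feedback steers the next pass
    return coverage

assert converge("mmu spec") >= 0.9
```

The checkpoint inside the loop is where the trust argument lives: a DV engineer reviews the generated tests against the spec before the next iteration, so errors are caught per-step rather than at the end.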

Another consideration is that sometimes the right place to fix a problem is in the spec, not in agentic training. Here explainability becomes important – why did the agent do something wrong or unexpected? Human-generated specs can be incomplete, inconsistent, or incorrect in places (see this). Automating spec correction and refinement guided by human feedback is an important component in reinforcing trust.

Views from Software Engineering

I found several recent papers studying use of agentic methods in software engineering. One such paper from Monash U. in Australia and Columbia U. emphasizes that software engineering is a collaborative effort, from requirements gathering all the way to long-term maintenance and evolution. While AI agents should be able to perform some tasks autonomously, they must also conform to this collaborative model, allowing for evolving problem statements, tests, constraints and feedback.

The authors point out that trust is accumulated over time in sequences of interactions in which correctness and reliability are obviously fundamental. It’s OK for agents to be wrong in early stages, but correctness and reliability after sufficient training are essential.

Another paper from the National U of Singapore, CMU, and Stuttgart U. offers some interesting insights detailing technical considerations and human considerations for building trust. Some of their technical considerations are familiar in our context: tool-based methods to validate correctness, performance and compliance with general best practices.

Human factors are more interesting. Explainability and transparency require that AI be able to justify why it made certain choices. Team practice compliance expects that agents adhere not just to general best practices but also more tightly to local team practices. The authors also suggest checks that explanations be matched appropriately to developer experience and that developers do not depend too blindly on agents (I assume detected through insufficient review and/or little correction, though I didn’t notice suggestions on this point).

Views from Davos

Don’t laugh. I agree in general that big consensus-driven organizations are poorly equipped to formulate policies for technologies moving much faster than they can respond. However the World Economic Forum (WEF, who host the annual Davos meeting) bring together world leaders, business leaders and academics to discuss challenges and to formulate guidance rather than to regulate. An AI forum is a regular event now at Davos, and trust is now viewed as critical for encouraging worldwide AI growth.

Trust is a human condition which can’t be “fixed” with automation, but it can be fostered. WEF have published an article suggesting a “trust stack”. The first stack layer is “non-deceptive affect”, meaning the agent should not try to gain trust through empathetic or praising cues or emotional appeals. A second is “epistemic humility”, a mouthful meaning that agents must communicate appropriate levels of uncertainty alongside their claims, up to “I don’t know” where appropriate. Agents should also emphasize consistency over persuasion; answers should read as principled beliefs, not opinions of the moment. A fast way to destroy trust is to provide answers that change each time a question is asked.

There are more layers in the stack, but you get the idea. We want agents to act professionally and treat us as professionals, just as we would expect junior engineers to behave.

Takeaways

Trust with confidence in agentic flows is achievable but it doesn’t come free. We must adapt our behavior, to be appropriately wary of first-pass responses, to carefully review deliverables for issues, and to invest time in training agents through multiple cycles before we can consider them effective team members. Even then their output should remain subject to design review, just as for any human team member.

Some trust-centric improvements may require more detailed setup prompts to scope agent behavior (expert, professional, etc.) and to guard against unconsidered signoffs. Explainability with support for correction may be one of the most important factors in building trust, allowing us to detect and retrain where reasoning goes wrong. Today this is supported in some models through after-the-fact mechanisms, though I sense that much here is still in R&D.

Full autonomy may not be a reachable or even a desirable goal but there is certainly a path to significantly improve trust in stages, delivering improved productivity and shorter schedules. Ultimately semiconductor executives don’t expect miracles, but they do want to see significant improvement.

Also Read:

Podcast EP336: How Quadric is Enabling Dramatic Improvements in Edge AI with Veer Kheterpal

WEBINAR: HBM4E Advances Bandwidth Performance for AI Training

Siemens Fuse EDA AI Agent Releases to Orchestrate Agentic Semiconductor and PCB Design


Scaling Multi-Die Connectivity: Automated Routing for High-Speed Interfaces

Scaling Multi-Die Connectivity: Automated Routing for High-Speed Interfaces
by Kalar Rajendiran on 03-23-2026 at 10:00 am

Bump maps for HBM PHY and HBM memory

This article concludes the three-part series examining key methodologies required for successful multi-die design. The first article, Reducing Risk Early: Multi-Die Design Feasibility Exploration, focused on feasibility exploration and early architectural validation, while the second article, Building the Interconnect Foundation: Bump and TSV Planning for Multi-Die Systems, discussed bump and TSV planning as the foundation for physical interconnect infrastructure. With these elements established, the next critical step is routing high-speed die-to-die interfaces.

As multi-die systems adopt advanced interconnect standards such as High-Bandwidth Memory (HBM) and Universal Chiplet Interconnect Express (UCIe), routing complexity has increased dramatically. These standards require extremely dense interconnect fabrics while maintaining strict signal integrity and performance requirements. Automated routing methodologies have therefore become essential for achieving scalable and reliable implementation.

The Rise of High-Speed Chiplet Interconnect Standards

High-speed interconnect standards are driving innovation in multi-die architectures by enabling efficient communication between heterogeneous chiplets. High-Bandwidth Memory provides exceptional data transfer rates through wide I/O interfaces and vertically stacked memory dies interconnected through TSVs. Universal Chiplet Interconnect Express enables standardized die-to-die communication across vendors, supporting scalable system integration and design reuse.

Both standards rely on extremely dense bump maps and fine interconnect pitches, placing significant demands on routing methodologies and signal integrity control.

Bump Maps for HBM PHY and HBM Memory

Routing Challenges in Multi-Die Interfaces

Routing high-speed signals across multi-die systems introduces numerous competing constraints. Dense bump arrays create severe routing congestion, while limited routing layers must be shared with power delivery and shielding structures. Signal integrity concerns such as crosstalk, reflection, attenuation, and skew must be carefully controlled to ensure reliable data transmission.

Die placement and interface alignment further complicate routing implementation. PHY placements that are not perfectly aligned with each other often require complex routing geometries and multi-stage routing paths. As signal counts scale into the thousands across multiple dies, traditional manual routing approaches become increasingly impractical.

Early Routing Feasibility Analysis

Effective routing implementation begins with early feasibility analysis that evaluates routing pitch, channel spacing, shielding strategies, and technology limitations. Integrating routing feasibility into earlier design stages (Bump and TSV planning) allows designers to identify routing constraints before physical implementation begins, reducing design iterations and improving overall predictability.

Automated Routing Methodologies

Automated routing solutions use specialized algorithms to implement high-bandwidth bump-to-bump interconnects efficiently. These solutions analyze interface topology, partition signal channels, generate routing tracks, and create optimized routing guides. By automating escape via creation and routing path generation, automated routers significantly reduce manual effort while improving routing quality.

Bump-to-bump 2-layer, 45-degree HBM routing from PHY to Memory on a silicon interposer
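The channel-partitioning step that automated routers perform can be illustrated with a toy sketch: group an ordered bump-signal list into fixed-width routing channels and assign each signal a track index within its channel. Real routers solve this with congestion- and signal-integrity-aware algorithms; the function name, signal names, and channel width here are illustrative assumptions.

```python
def partition_channels(signals: list[str], tracks_per_channel: int):
    """Toy channel partitioner: slice an ordered signal list into channels."""
    channels = []
    for i in range(0, len(signals), tracks_per_channel):
        group = signals[i:i + tracks_per_channel]
        # Each entry: (signal, channel index, track index within channel).
        channels.append([(s, i // tracks_per_channel, t)
                         for t, s in enumerate(group)])
    return channels

# Hypothetical HBM data signals partitioned into 4-track channels.
dq = [f"DQ{i}" for i in range(10)]
plan = partition_channels(dq, tracks_per_channel=4)
assert len(plan) == 3                       # channels of 4 + 4 + 2 signals
assert plan[0][0] == ("DQ0", 0, 0)
assert plan[2] == [("DQ8", 2, 0), ("DQ9", 2, 1)]
```

Keeping adjacent bump signals in the same channel is what lets the router generate uniform track geometries per channel, which in turn keeps trace lengths and skew matched across the interface.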

Signal Integrity-Driven Routing Optimization

Achieving connectivity alone is insufficient for high-speed interfaces. Automated routing engines must also optimize electrical performance. Advanced routing strategies ensure consistent trace geometry, flexible shielding implementation, and accurate differential pair routing. Additional techniques, such as return-path via placement and routing accommodations for decoupling capacitors, further improve signal integrity and system reliability.

Automated Verification and Reporting

Automated routing platforms provide comprehensive reporting capabilities that allow designers to evaluate routing performance and completeness. These reports include routing success metrics, congestion analysis, connectivity verification, and signal length statistics. Such visibility allows design teams to identify optimization opportunities and validate routing quality early in implementation.

Synopsys 3DIC Compiler Platform for Scalable Multi-Die Integration

As multi-die systems continue to grow in complexity and performance requirements, automated routing solutions are becoming indispensable. Synopsys 3DIC Compiler platform fulfills this requirement with an integrated, automated routing solution purpose-built for high-bandwidth die-to-die interconnects. The platform supports specialized capabilities for HBM and UCIe, enabling fast and reliable routing with minimal manual intervention. It combines routing automation with integrated multiphysics analysis, allowing designs to maintain signal integrity while accelerating implementation timelines and reducing design risk.

Learn more by accessing the whitepaper here.

Summary: A Unified Multi-Die Design Methodology

Multi-die design success depends on a coordinated workflow that integrates feasibility exploration, interconnect planning, and automated routing implementation. Each stage builds upon the previous one, enabling design teams to progressively refine architectural concepts into manufacturable, high-performance systems. Together, these methodologies provide a scalable framework for developing next-generation heterogeneous multi-die semiconductor solutions capable of meeting the demands of emerging computing applications.

Also Read:

Synopsys Explores AI/ML Impact on Mask Synthesis at SPIE 2026

Agentic AI and the Future of Engineering

Ravi Subramanian on Trends that are Shaping AI at Synopsys