
Impact of Varying Electron Blur and Yield on Stochastic Fluctuations in EUV Resist
by Fred Chen on 05-03-2025 at 4:00 pm


A comprehensive update to the EUV stochastic image model

In extreme ultraviolet (EUV) lithography, photoelectron/secondary electron blur and secondary electron yield are known to drive stochastic fluctuations in the resist [1-3], leading to the formation of random defects and the degradation of pattern fidelity at advanced nodes. For simplicity, blur and electron yield per photon are often taken to be fixed parameters for a given EUV resist film. However, there is no reason to expect this to be true, since the resist is inhomogeneous on the nanoscale [4-6].

I have updated the model I have been using to analyze EUV stochastics with the following:

  • Image fading from EUV pole-specific shift is not included (expected to be minor)

  • Polarization consideration: assume 50% TE/50% TM for the 2-beam image commonly used for EUV lines and spaces [7]

  • Poisson statistics are applied to the absorbed photon density (photons/nm²)

  • The electron blur function is zero at zero distance and matches the exponential attenuation length at larger distances, i.e., exp(-x/attenuation length)

  • Electron blur is treated as locally varying rather than a fixed number

  • The electron (or acid) number per absorbed EUV photon has minimum and maximum values

  • Acid blur still uses a Gaussian form (σ = 5 nm)

A key feature is that the electron yield per photon is drawn from a distribution. This distribution is often modeled as Poissonian but in actuality can deviate from it significantly [8]. The maximum number of electrons is roughly the EUV photon energy (~92 eV) divided by the ionization potential (~10 eV), giving 9. The minimum number of electrons released can be estimated as 6 from the Auger emission scenario in Figure 1; with one electron assumed lost to the underlayer, this gives a minimum of 5.

Figure 1. EUV Auger emission scenario releasing minimal number of electrons.
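
To make the counting statistics concrete, below is a minimal Python sketch of the per-pixel sampling implied by the above: a Poisson-distributed number of absorbed photons, each releasing an electron count bounded between the minimum and maximum. The clipped-Poisson yield distribution and the mean values are illustrative assumptions only; as noted, the actual yield distribution can deviate from Poisson [8].

import numpy as np

rng = np.random.default_rng(0)

def electrons_in_pixel(mean_photons=10.0, mean_yield=7.0, min_e=5, max_e=9):
    # Poisson-distributed absorbed photon count for one pixel, with each
    # photon's electron yield clipped to the [min_e, max_e] range.
    n_photons = rng.poisson(mean_photons)
    yields = np.clip(rng.poisson(mean_yield, size=n_photons), min_e, max_e)
    return yields.sum()

samples = np.array([electrons_in_pixel() for _ in range(20000)])
print(samples.mean(), samples.std())  # pixel-to-pixel stochastic fluctuation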

We also recall that electron blur is not a fixed value, but can take on different values at different locations [4-6]. As constructed previously [3], the electron blur function shape arises from the difference of two exponential functions. To maintain zero probability at zero distance, an exponential function with 0.4 nm attenuation length is subtracted from the normalized exponential function with target attenuation length.
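
As a rough numerical sketch of that construction, the unnormalized blur weight at distance x is the difference of the two exponentials, which is zero at x = 0 by design. Only the 0.4 nm and ~2 nm attenuation lengths come from the text; the normalization over a finite range is an assumption for illustration.

import numpy as np

def electron_blur(x, attn_len=2.0, short_len=0.4):
    # Difference of two exponentials: zero at x = 0 nm, decaying with
    # the target attenuation length at larger distances.
    return np.exp(-x / attn_len) - np.exp(-x / short_len)

x = np.linspace(0.0, 10.0, 501)
w = electron_blur(x)
w /= w.sum() * (x[1] - x[0])  # normalize to unit area over the sampled range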

While a typical blur can correspond to an attenuation length of ~2 nm, the inhomogeneity within the resist can lead to a rare occurrence of higher blur, e.g., corresponding to an attenuation length of ~4 nm (Figure 2).

Figure 2. A “typical” electron blur distribution could peak at just under 1 nm distance and decay exponentially with an attenuation length of ~2 nm. On the other hand, a “rare” electron blur distribution may peak at ~1 nm distance and decay exponentially with an attenuation length of ~ 4 nm.

The impact of varying blur is to increase the percentage of defective pixels in the image (Figure 3), i.e., higher stochastic defectivity. This is to be expected, as increasing blur decreases the max-min difference, making it more likely for particle number fluctuations to cross a given threshold.

Figure 3. A higher local blur (i.e., attenuation length) reduces image contrast more, increasing the likelihood of electron number fluctuations to cross the printing threshold.

Etching can affect the stochastics in the final image. Depending on its direction, etch bias can make obstructions within a trench more likely, or openings between adjacent trenches more likely. This is the origin of the “valley” between the stochastic defectivity cliffs (Figure 4).

Figure 4. Etch bias can also affect the stochastic defectivity. Increasing etch bias (right) means undersizing the feature, which makes it more likely for trenches to be blocked.

As pitch increases, the contrast loss from the “typical” electron blur shown in Figure 2 reaches a minimum value of 20% [3]. However, the stochastic behavior for larger pitches improves, as thicker resists may be used, increasing the absorbed photon density. Going from 30 nm to 40 nm pitch (Figure 5), the absorbed photon density increases, and the contrast reduction from electron blur is also improved. However, there is still significant noise in the electron density at the feature edge, as well as defectivity with etch bias.

Figure 5. Same as Figure 4 but for 40 nm pitch. The absorbed photon density and contrast are increased, but the defectivity is only slightly improved.

When chemically amplified resists are used, acid blur must be added. Acid blur is usually modeled as a Gaussian function with sigma on the order of 5 nm [9]. The extra blur from acid aggravates the stochastic behavior even more (Figure 6).
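
As a minimal illustration (not the article's actual model), acid blur can be layered on as a Gaussian convolution with sigma of about 5 nm; the grid spacing and the toy input profile below are assumptions chosen only to show the extra contrast loss.

import numpy as np
from scipy.ndimage import gaussian_filter1d

dx = 0.25                                             # nm per grid point (assumed)
x = np.arange(0.0, 60.0, dx)
electron_profile = (np.sin(2 * np.pi * x / 30.0) > 0).astype(float)  # toy 30 nm pitch pattern
acid_profile = gaussian_filter1d(electron_profile, sigma=5.0 / dx)   # sigma = 5 nm in grid units
print(np.ptp(electron_profile), np.ptp(acid_profile))  # contrast before/after acid blur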

Figure 6. Acid blur is a strong blur component in chemically amplified EUV resists.

Consequently, double or multiple patterning is used with EUV, comprising exposures at >40 nm pitch with larger features and thicker resists. DUV multipatterning will still have the following advantages over EUV multipatterning:

  • Polarization restriction (not unpolarized)
  • No electron blur
  • Higher photon density
  • 2-beam imaging (instead of 3- or 4-beam)
  • Well-developed phase-shift mask technology

Most importantly, DUV double patterning has lower cost than EUV double patterning.


References

[1] Z. Belete et al., J. Micro/Nanopattern. Mater. Metrol. 20, 014801 (2021).

[2] F. Chen, A Perfect Storm for EUV Lithography.

[3] F. Chen, A Realistic Electron Blur Function Shape for EUV Resist Modeling.

[4] F. Chen, Measuring Local EUV Resist Blur with Machine Learning.

[5] F. Chen, Stochastic Effects Blur the Resolution Limit of EUV Lithography.

[6] G. Denbeaux et al., “Understanding EUV resist stochastic effects through surface roughness measurements,” IEUVI Resist TWG meeting, February 23, 2020.

[7] H. J. Levinson, Jpn. J. Appl. Phys. 61, SD0803 (2022).

[8] E. F. da Silveira et al., Surf. Sci. 408, 28 (1998); Z. Bay and G. Papp, IEEE Trans. Nucl. Sci. 11, 160 (1964); L. Frank, J. Elec. Microsc. 54, 361 (2005).

[9] H. Fukuda, J. Micro/Nanolith. MEMS MOEMS 19, 024601 (2020).

This article is based on the presentation in the following video: Comprehensive Update to EUV Stochastic Image Model.



Executive Interview with Koji Motomori, Senior Director of Marketing and Business Development at Numem
by Daniel Nenni on 05-03-2025 at 2:00 pm

Koji Motomori

Koji Motomori is a seasoned business leader and technologist with 30+ years of experience in semiconductors, AI, embedded systems, data centers, mobile, and memory solutions, backed by an engineering background. Over 26 years at Intel, he drove strategic growth initiatives, securing $2B+ in contracts with OEMs and partners. His expertise spans product marketing, GTM strategy, business development, deal-making, and ecosystem enablement, accelerating the adoption of CPU, memory, SSD, and interconnect technologies.

Tell us about your company.

At Numem, we’re all about taking memory technology to the next level, especially for AI, Edge Devices, and Data Centers. Our NuRAM SmartMem™ is a high-performance, ultra-low-power memory solution built on MRAM technology. So, what makes it special? It brings together the best of different memory types—SRAM-like read speeds, DRAM-like write performance, non-volatility, and ultra-low power.

With AI and advanced computing evolving fast, the demand for efficient, high-density memory is skyrocketing. That’s where we come in. Our solutions help cut energy consumption while delivering the speed and reliability needed for AI training, inference, and mission-critical applications. Simply put, we’re making memory smarter, faster, and more power-efficient to power the future of computing.

What problems are you solving?

That’s a great question. The memory industry is really struggling to keep up with the growing demands of AI and high-performance computing. Right now, we need memory that’s not just fast, but also power-efficient and high-capacity. The problem is, existing technologies all have major limitations.

Let’s take SRAM, for example: it’s fast but has high leakage power and doesn’t scale well at advanced nodes. HBM DRAM is another option, but it’s higher cost, power-hungry, and still not fast enough to fully meet AI’s needs. And then there’s DDR DRAM, which has low bandwidth, making it a bottleneck for high-performance AI workloads.

That’s exactly why we developed NuRAM SmartMem to solve these challenges. It combines the best of different memory types:

  • It gives you SRAM-like read speeds and DRAM-like write speeds, so AI workloads run smoothly.
  • It has 200x lower standby power than SRAM, which is huge for energy efficiency.
  • It’s 2.5x denser than SRAM, helping reduce cost and die size.
  • It delivers over 3x the bandwidth of HBM, eliminating AI bottlenecks.
  • And it’s non-volatile, meaning it retains data even when the power is off.

So, with NuRAM SmartMem™, we’re not just making memory faster, we’re making it more efficient and scalable for AI, Edge, and Data Center applications. It’s really a game-changer for the industry.

What application areas are your strongest?

That’s another great question. Our memory technology is designed to bring big improvements across a wide range of applications, but we’re especially strong in a few key areas.

For data centers, we help make AI model training and inference more efficient while cutting power consumption. Since our technology reduces the need for SRAM and DRAM, companies see significant Total Cost of Ownership (TCO) benefits. Plus, the non-volatility of our memory enables instant-on capabilities, meaning servers can reboot much faster.

In automotive, especially for EVs, real-time decision-making is critical. Our low-power memory helps extend battery life, and by consolidating multiple memory types like NOR Flash and LPDDR, we save space, power, cost, and weight—while also improving reliability.

For Edge AI devices and IoT applications, power efficiency is a huge concern. Our ultra-low-power memory helps reduce energy consumption, making these devices more sustainable and efficient.

Aerospace is another area where we stand out. Mission-critical applications demand reliability, energy efficiency, and radiation immunity—all of which our memory provides.

Then there are security cameras—with ultra-low power consumption and high bandwidth, our memory helps extend battery life while supporting high-resolution data transmission. And since we can replace memory types like NOR Flash and LPDDR, we also optimize space, power, and cost.

For wearable devices, battery life is everything. Our technology reduces power consumption, enabling lighter, more compact designs that last longer—something consumers really appreciate.

And finally, in PCs and smartphones, AI-driven features need better memory performance. Our non-volatile memory allows for instant-on capabilities, extends battery life, and replaces traditional memory types like boot NOR Flash and DDR, leading to power and space savings, plus faster boot times and overall better performance.

So overall, our memory technology delivers real advantages across multiple industries.

What keeps your customers up at night?

A lot of things. AI workloads are becoming more demanding, and our customers are constantly looking for ways to stay ahead.

One big concern is power efficiency and thermal management. AI systems push power budgets to the limit, and with rising energy costs, the total cost of ownership (TCO) becomes a huge factor. Keeping power consumption low is critical, not just for efficiency, but for performance and profitability.

Then there’s the issue of memory bandwidth bottlenecks. Traditional memory architectures simply can’t keep up with the growing performance demands of AI, which creates bottlenecks and limits system scalability.

Scalability and cost are also major worries. AI applications need more memory, but scaling up can drive spending up quickly. Our customers want solutions that provide higher capacity without blowing the budget.

And finally, reliability and data retention are key, especially for AI and data-heavy applications. These workloads require memory that’s not just fast, but also non-volatile, secure, and long-lasting while still keeping power consumption low.

That’s exactly where NuRAM SmartMem comes in. Our technology delivers ultra-low power, high-density, and high-bandwidth memory solutions that help customers overcome these challenges and future-proof their AI-driven applications.

What does the competitive landscape look like, and how do you differentiate?

The high-performance memory market is dominated by SRAM, LPDDR DRAM, and HBM. Each of these technologies has strengths, but they also come with some major challenges.

SRAM, for example, is fast, but it has high standby power and scalability limitations at advanced nodes. LPDDR DRAM is designed to be lower power than standard DRAM, but it still consumes a lot of energy. And HBM DRAM delivers high bandwidth, but it comes with high cost, power constraints, and integration complexity.

That’s where NuRAM SmartMem™ stands out. We’ve built a memory solution that outperforms these technologies in key areas:

  • 200x lower standby power than SRAM, making it perfect for always-on AI applications that need ultra-low power.
  • 5x higher density than SRAM, reducing die size and overall memory costs.
  • Non-volatility: unlike SRAM and DRAM, NuRAM retains data even without power, adding both energy efficiency and reliability.
  • Over 3x faster bandwidth than HBM3E, solving AI’s growing memory bandwidth challenges.
  • Over 260x lower standby power than HBM3E, thanks to non-volatility and our flexible power management feature per block.
  • Scalability & Customization—NuRAM SmartMem™ is available as both IP cores and chiplets, making integration seamless for AI, IoT, and Data Center applications.

So, what really differentiates us? We’re offering a next-generation memory solution that maximizes performance while dramatically reducing power and cost. It’s a game-changer compared to traditional memory options.

What new features/technology are you working on?

We’re constantly pushing the boundaries of AI memory innovation, focusing on performance, power efficiency, and scalability. A few exciting things we’re working on right now include:

  • Smart Memory Subsystems – We’re making memory smarter. Our self-optimizing memory technology is designed to adapt and accelerate AI workloads more efficiently.
  • 2nd-Gen NuRAM SmartMem™ Chiplets – We’re taking things to the next level with even higher bandwidth, faster read/write speeds, lower power consumption, and greater scalability than our first generation.
  • AI Optimized Solutions – We’re fine-tuning our memory for LLM inference, AI Edge devices, and ultra-low-power AI chips, ensuring they get the best performance possible.
  • High-Capacity & Scalable Operation – As AI models keep growing, memory needs to scale with them. We’re expanding die capacity and improving stacking while working closely with foundries to boost manufacturability and yield for high-volume production.
  • Memory Security & Reliability Enhancements – AI applications rely on secure, stable memory. We’re enhancing data integrity, security, and protection against corruption and cyber threats to ensure reliable AI operations.

For the future, we’re on track to deliver our first-generation chiplet samples in Q4 2025 and second-generation chiplets samples in Q2 2026. With these advancements, we’re setting a new benchmark for efficiency, performance, and power optimization in AI memory.

How do customers normally engage with your company?

We work closely with a wide range of customers, including AI chip makers, MCU/ASIC designers, SoC vendors, Data Centers, and Edge computing companies. Our goal is to integrate our advanced memory solutions into their systems in the most effective way possible.

There are several ways customers typically engage with us:

  • NuRAM + SmartMem™ IP Licensing – Some customers embed our NuRAM SmartMem™ technology directly into their ASICs, MCUs, MPUs, and SoCs, boosting performance and efficiency for next-gen AI and computing applications.
  • SmartMem™ IP Licensing—Others use our SmartMem™ technology on top of their existing memory architectures, whether Flash, RRAM, PCRAM, traditional MRAM, or DRAM, to improve memory performance and power efficiency.
  • Chiplet Partnerships – For customers looking for a plug-and-play solution, we offer SmartMem™ chiplets that deliver high bandwidth and ultra-low power, specifically designed for server and Edge AI accelerators while seamlessly aligning with industry-standard memory interfaces.
  • Custom Memory Solutions – We also work with customers to customize memory architectures to their specific AI and Edge workloads, ensuring optimal performance and power efficiency.
  • Collaborations & Joint Development – We actively partner with industry leaders to co-develop next-generation memory solutions, maximizing AI processing efficiency and scalability.

At the end of the day, working with Numem gives customers access to ultra-low-power, high-performance, and scalable memory solutions that help them meet AI’s growing demands while significantly reducing energy consumption and cost.

Also Read:

Executive Interview with Leo Linehan, President, Electronic Materials, Materion Corporation

CEO Interview with Ronald Glibbery of Peraso

CEO Interview with Pierre Laboisse of Aledia


Video EP3: A Discussion of Challenges and Strategies for Heterogeneous 3D Integration with Anna Fontanelli
by Daniel Nenni on 05-02-2025 at 10:00 am

In this episode of the Semiconductor Insiders video series, Dan is joined by Anna Fontanelli, founder and CEO of MZ Technologies. Anna explains some of the substantial challenges associated with heterogeneous 3D integration. Dan then begins to explore some of the capabilities of GenioEVO, the first integrated chiplet/package EDA tool to address, in the pre-layout stage, the two major issues of 3D-IC design: thermal and mechanical stress.

Contact MZ Technologies

The views, thoughts, and opinions expressed in these videos belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview with Richard Hegberg of Caspia Technologies
by Daniel Nenni on 05-02-2025 at 6:00 am

Richard Hegberg

Rick has a long and diverse career in the semiconductor industry. He began as VP of sales at Lucent Microelectronics. He has held executive roles at several high-profile companies and participated in several acquisitions along the way. These include NetApp, SanDisk/WD, Atheros/Qualcomm, Numonyx/Micron, ATI/AMD, and VLSI Technology.

Rick was CEO of three semiconductor start-ups. He has a deep knowledge and passion for semiconductor systems with a special focus on analog and AI. Rick joined Caspia Technologies as CEO in September 2024.

Tell us about your company

Caspia Technologies was born out of pioneering work in cybersecurity at the University of Florida in Gainesville.  The founding team includes Dr. Mark Tehranipoor, Department Chair of Electrical and Computer Engineering. Mark is also the founding director of the Florida Institute for Cybersecurity Research.

He and his research team have driven an expanded understanding of the complex world of cybersecurity. This team began developing unique GenAI assisted tools to assure new chip designs are resistant to current and future cyberattacks. That commercial application of the team’s work is what created Caspia Technologies and got my attention to lead the effort.

What problems are you solving?

Caspia is delivering technology and know-how to ensure chip designs are resistant to cyberattacks. If you think about current production chip design flows, there are well-developed processes to verify the functionality, timing and power of new chip designs. What is missing is a process to verify how secure and resistant to attack these chip designs are. A security verification flow is needed, and Caspia is delivering that.

There is more to this story. We all know that AI is driving many new and highly complex chip designs. Design teams are now using AI technology to make it easier and faster to design the required chips. Using AI to design AI chips if you will. While this approach has shown substantial results, there are two critical risks that emerge.

First, the vast data sets that are being used to train AI models are typically not managed and secured appropriately. This opens the door to faulty design practices. And second, these same AI algorithms can be used to mount very sophisticated cyberattacks. AI makes it easier to design chips but also easier to attack those same chips.

Using GenAI technology and a deep understanding of cyberattack methods, Caspia is addressing these problems.

What application areas are your strongest?

Assurance of security at the hardware level is our focus and our core strength. We are presently deploying a GenAI assisted security platform that examines pre-silicon designs using three approaches.

First, we perform static checking of RTL code to identify design practices that can lead to security weaknesses. The tool also helps designers fix the problems it finds. We use a large set of checks that is constantly updated with our security-trained large language models (LLMs).

Second, we use GenAI to create security-based assertions that can be fed to standard formal verification tools. This opens the door to deep analysis of new designs from a security standpoint.

And third, we use GenAI to set up scenarios using standard co-simulation and emulation technology to ensure the design is indeed resistant to real-world cyberattacks.

What keeps your customers up at night?

That depends on the customer. Those that are new to the use of AI technology for chip design may not realize the risks they are facing. Here, we provide training to create a deeper understanding of the risks and how to address them.

Design teams that are already using AI in the design flow better understand the risks they face. What concerns these teams usually falls into two categories:

  1. How can I protect the data I am using to train my LLMs, and how can I use those LLMs more effectively?
  2. How can I be sure my design is resistant to current and future cyberattacks?

Caspia’s security platform addresses both of these issues.

What does the competitive landscape look like and how do you differentiate?

There are many commercially available IP blocks that claim various degrees of security hardness. There are also processes, procedures and tools that can help ensure a better design process.

But there is no fully integrated platform that uses GenAI to verify new chip designs are as secure as they can be. This is the unique technology that is only available from Caspia.

What new features/technology are you working on?

Our current security verification platform is a great start. This technology has received an enthusiastic response from both small and very large companies. The ability to add a security verification to your existing design flow is quite compelling. No new design flow, just new security insights.

We are constantly updating our GenAI models to ensure we are tracking and responding to the latest threats. Beyond that, we will be adding additional ways to check a design. Side-channel assessment, fault injection, IP protection and silicon backside protection are all on our roadmap.

How do customers normally engage with your company?

We are starting to be visible at more industry events. For example, we are a Corporate Sponsor at the IEEE International Symposium on Hardware Oriented Security and Trust (HOST) in San Jose on May 5-8 this year and Dr. Tehranipoor will be giving a SkyTalk at DAC as well in June. You can stop by and see us at these events. You can also reach out to us via our website here. We’ll take it from there.

Also Read:

CEO Interview with Dr. Michael Förtsch of Q.ANT

Executive Interview with Leo Linehan, President, Electronic Materials, Materion Corporation

CEO Interview with Ronald Glibbery of Peraso


TSMC Describes Technology Innovation Beyond A14
by Mike Gianfagna on 05-01-2025 at 10:00 am

Device Architecture Outlook

The inaugural event for the 2025 TSMC Technology Symposium recently concluded in Santa Clara, California. This will be followed by events around the world over the next two months. We have summarized information from this event regarding process technology innovation and advanced packaging innovation. Overall, the A14 process node was presented as the most advanced technology available from TSMC. Recently, a presentation from the event was posted that discusses technology leadership and, in that presentation, what lies beyond A14. Seeing what’s around the next corner is always interesting. Let’s look at how TSMC describes technology innovation beyond A14.

The Presenter

Dr. Yuh-Jier Mii

The presenter was Dr. Yuh-Jier Mii, EVP and Co-Chief Operating Officer at TSMC. Dr. Mii is an excellent presenter. He describes very complex work in language everyone can understand. His presentation builds on work he presented at last year’s IEDM event. Dr. Mii covered a lot of information. A link is coming. But first, I’d like to focus on his comments on innovation at TSMC beyond A14. 

What Was Said

Dr. Mii’s discussion focused broadly on new transistor architectures and new materials. He began by discussing device architectures. The current evolution is from FinFET to Nanosheet. Beyond these technologies, vertically stacked NFET and PFET devices, called CFETs, are a likely scaling candidate. Beyond CFET, there are breakthroughs in channel materials that can enable further dimensional scaling and energy reduction. These developments are summarized in the graphic above.

Dr. Mii reported that TSMC has been actively building CFET devices on silicon to enable the next level of scaling. TSMC presented its first CFET transistor at a 48nm gate pitch at IEDM 2023. This year at IEDM, TSMC presented the smallest CFET inverter. The figure below illustrates the well-balanced performance characteristics of this device up to 1.2V.

He explained that this demonstration achieved a significant milestone in CFET technology development that will help to drive future technology scaling.

Dr. Mii reported that great progress has also been made on transistors with 2D channel materials. TSMC has demonstrated the first electrical performance using a monolayer channel in stacked nanosheet architecture similar to the N2 technology. An inverter has also been developed using well-matched N and P channel devices operating at 1V. This work is summarized in the figure below.

Going forward, there are plans to continue to develop new interconnect technologies to improve interconnect performance. For copper interconnect, the plan is to use a new via scheme to reduce via resistance and coupling capacitance. Work is also underway on a new copper barrier to reduce copper line resistance.

Beyond copper, there is work underway on new metal materials with an air gap that could further reduce resistance and coupling capacitance. Intercalated Graphene is another new and promising metal material that could significantly reduce interconnect delay in the future. This work is summarized in the graphic below.

To Learn More

Dr. Mii covered many other topics. You can view his entire presentation here. And that’s how TSMC describes technology innovation beyond A14.

Also Read:

TSMC Brings Packaging Center Stage with Silicon

TSMC 2025 Technical Symposium Briefing

IEDM 2025 – TSMC 2nm Process Disclosure – How Does it Measure Up?


SNUG 2025: A Watershed Moment for EDA – Part 2
by Lauro Rizzatti on 05-01-2025 at 6:00 am


At this year’s SNUG (Synopsys Users Group) conference, Richard Ho, Head of Hardware, OpenAI, delivered the second keynote, titled “Scaling Compute for the Age of Intelligence.” In his presentation, Richard guided the audience through the transformative trends and implications of the intelligence era now unfolding before our eyes.

Recapping the current state of AI models, including the emergence of advanced reasoning models in fall 2024, Richard highlighted the profound societal impact of AI technology, extending well beyond technical progress alone. He followed with a discussion on the necessary scaling of compute and its implications for silicon technology. The talk emphasized the distinction between traditional data centers and AI-optimized data centers, concluding with an analysis of how these shifts are reshaping silicon design processes and the role of EDA engineers.

Reasoning Models: The Spark of General Intelligence

AI entered mainstream consciousness in November 2022 with the launch of ChatGPT. Its rapid adoption surprised everyone, including OpenAI. Today, the latest version, GPT-4o—introduced in 2024 with advanced reasoning models—has surpassed 400 million monthly users, solidifying its role as an essential tool for coding, education, writing, and translation. Additionally, generative AI tools like Sora are revolutionizing video creation, enabling users to express their ideas more effectively.

Richard believes that the emergence of reasoning models heralds the arrival of AGI. To illustrate the concept, he used a popular mathematical riddle, known as the locker problem. Imagine a corridor with 1,000 lockers, all initially closed. A thousand people walk down the corridor in sequence, toggling the lockers. The first person toggles every locker, the second every other locker, the third every third locker, and so on. The 1,000th person toggles only the 1,000th locker. How many lockers remain open?

Richard presented this problem to a reasoning model in ChatGPT, requesting its chain of thought. Within seconds, ChatGPT concluded that 31 lockers remain open.

More important than the result was the process. In summary, the model performed:
  • Problem Extraction: The model identified the core problem: toggling and counting divisors.
  • Pattern Recognition: It recognized that only perfect squares have an odd number of divisors.
  • Logical Deduction: It counted the perfect squares up to 1,000 (1² through 31²) and concluded that 31 lockers remain open.

This example, while relatively simple, encapsulated the promise of reasoning models: to generalize, abstract, and solve non-trivial problems across domains.
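
For the curious, the riddle is easy to verify by brute force; the short sketch below toggles the lockers exactly as described and confirms the count of 31 (the lockers at perfect-square positions).

open_lockers = [False] * 1001                  # lockers 1..1000, all start closed
for person in range(1, 1001):
    for locker in range(person, 1001, person):
        open_lockers[locker] = not open_lockers[locker]
print(sum(open_lockers[1:]))                   # 31: exactly the perfect squares up to 1000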

The Economic Value of Intelligence

The keynote then pivoted from theory to impact, discussing the impact of AI on global productivity. Richard started from the Industrial Revolution and moved to the Information Age, showing how each major technological shift triggered steep GDP growth. With the advent of AI, particularly language models and autonomous agents, another leap is expected — not because of chatbots alone, but because of how AI enables new services and levels of accessibility.

Richard emphasized the potential of AI in a variety of fields as:

  • Medical expertise: available anywhere, anytime,
  • Personalized education: tuned to individual learners’ needs,
  • Elder care and assistance: affordable, consistent, and scalable,
  • Scientific collaboration: AI as an assistant for tackling climate change, medicine, and energy.

To illustrate the transformative potential of AI agents, Richard recalled his experience at D.E. Shaw Research working on the protein folding challenge. Fifteen years ago, D.E. Shaw Research designed a line of purpose-built supercomputers, called Anton 1 & 2, to accelerate the physics calculations required to model the forces—electrostatics, van der Waals, covalent bonds—acting on every atom in a protein, across femtosecond time steps. It was like brute-forcing nature in silicon, and it worked.

In late 2020, DeepMind tackled the same problem, using AI. Instead of simulating molecular physics in fine-grained time steps, they trained a model and succeeded. Their AlphaFold system was able to predict protein structures with astonishing accuracy, bypassing a decade of painstaking simulation with a powerful new approach, and rightly earned one of the highest recognitions in science, a Nobel Prize.

AI is no longer just mimicking human language or playing games. It’s becoming a tool for accelerated discovery, capable of transformative contributions to science.

Scaling Laws and Predictable Progress

The talk then shifted gears to a foundational law: increasing compute power leads to improved model capabilities. In fact, the scaling of compute has been central to AI’s progress.

In 2020, OpenAI observed that increasing compute consistently improves model quality, a relationship that appears as a straight line on a log-log plot. The relationship carries over to practical tasks, such as solving coding and math problems. The evolution of GPT is living proof of this dependency:

  • GPT-1 enabled state-of-the-art word prediction.
  • GPT-2 generated coherent paragraphs.
  • GPT-3 handled few-shot learning.
  • GPT-4 delivered real-world utility across domains.

To exemplify this, Richard pointed to the MMLU benchmark, a rigorous suite of high school and university-level problems. Within just two years, GPT-4 was scoring near 90%, showing that exponential improvements were real — and happening fast.
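
The “straight line on a log-log plot” simply reflects a power-law relationship. A small sketch, with made-up constants rather than OpenAI’s fitted values, shows why the slope is constant on log-log axes:

import numpy as np

a, alpha = 10.0, 0.05                      # illustrative constants, not fitted values
compute = np.logspace(0, 8, 9)             # arbitrary compute units
loss = a * compute ** (-alpha)             # power-law scaling: loss falls as compute grows
log_slope = np.diff(np.log(loss)) / np.diff(np.log(compute))
print(log_slope)                           # constant -alpha, i.e., a straight line on log-log axes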

The Infrastructure Demands of AI

The compute required to train these large models has been increasing at eye-popping rates — 6.7X annually pre-2018, and still 4X per year since then, far exceeding Moore’s Law.

Despite industry buzz about smaller, distilled models, Richard argued that scaling laws are far from dead. Reasoning models benefit from compute not only at training time, but also at inference time — especially when using techniques like chain-of-thought prompting. In short, better thinking still requires more compute.

This growth in demand is reflected in investment. OpenAI alone is committing hundreds of billions toward infrastructure. The industry as a whole is approaching $1 trillion in total AI infrastructure commitments.

The result of all this investment is a new class of infrastructure that goes beyond what we used to call “warehouse-scale computing” for serving web services. Now we must build planet-spanning AI systems to meet the needs of training and deploying large models. These training centers don’t just span racks or rooms—they span continents. Multiple clusters, interconnected across diverse geographies, collaborating in real time to train a single model. It’s a planetary-scale effort to build intelligence.

Designing AI Systems

Today, we must focus on the full stack of AI compute: the model, the compiler, the chip, the system, the kernels—every layer matters. Richard recalled Amdahl’s Law, namely that optimizing a single component in the chain doesn’t guarantee a proportional improvement in overall performance. To make real gains, we have to improve the stack holistically—software, hardware, and systems working in concert.
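
Amdahl’s Law makes the point quantitatively; a quick sketch with arbitrary numbers shows how a 10x speedup of one component yields far less than 10x overall:

def amdahl_speedup(fraction_improved, component_speedup):
    # Overall speedup when only a fraction of total time benefits from the improvement.
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / component_speedup)

print(amdahl_speedup(0.5, 10.0))  # ~1.8x overall, despite a 10x faster component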

Within this vertically integrated stack, one of the most difficult components to design is the chip accelerator. Designing the right hardware means striking a careful balance across multiple factors. It’s not just about raw compute power—it’s also about throughput, and in some cases, latency. Sometimes batching is acceptable; other times, real-time response matters. Memory bandwidth and capacity remain absolutely critical, but increasingly, so does bandwidth between dies and between chips. And perhaps most importantly, network bandwidth and latency are becoming integral to system-level performance.

Peak performance numbers often advertised are rarely achieved in practice. The real value comes from the delivered performance in processing AI software workloads, and that requires full-stack visibility: from compiler efficiency to how instructions are executed, where bottlenecks form, and where idle cycles (the “white spaces”) appear. Only then can realistic system throughput and latency be assessed. Co-design across model, software, and hardware is absolutely critical.

Richard emphasized a critical yet often overlooked aspect of scaling large infrastructure: reliability. Beyond raw performance, he stressed the importance of keeping massive hardware clusters consistently operational. From hardware faults and rare software bugs triggered in edge-case conditions to network instability—such as intermittent port flaps—and physical data center issues like grid power fluctuations, cooling failures, or even human error, he argued that every layer of the stack must be designed with resilience in mind, not just speed or scale.

Resiliency is perhaps the most underappreciated challenge. For AI systems to succeed, they must be reliable across thousands of components. Unlike web services, where asynchronous progress is common, AI training jobs are synchronous — every node must make progress together. One failure can stall the entire job.

Thus, engineering resilient AI infrastructure is not just important, it’s existential.

From Chip Design to Agentic Collaboration

The keynote closed with a return to the world of chip design. Richard offered a candid view of the challenges in hardware timelines — often 12 to 24 months — and contrasted this with the rapid cadence of ML research, which can shift in weeks.

His dream? Compress the design cycle dramatically.

Through examples from his own research, including reinforcement learning for macro placement and graph neural networks for test coverage prediction, he illustrated how AI could help shrink the chip design timeline. And now, with large language models in the mix, the promise grows further.

He recounted asking ChatGPT to design an async FIFO for clock domain crossings (CDC) — a common interview question from his early career. The model produced functional SystemVerilog code and even generated a UVM testbench. While not yet ready for full-chip design, it marks a meaningful step toward co-designing hardware with AI tools.

This led to his final reflection: In the age of intelligence, engineers must think beyond lines of code. With tools so powerful, small teams can accomplish what once took hundreds of people.

Richard concluded by accentuating that the next era of AI will be shaped by engineers like those attending SNUG, visionaries whose work in compute, architecture, and silicon will define the future of intelligence. He closed with a call to action: “Let’s keep pushing the boundaries together in this incredibly exciting time.”

Key Takeaways
  • AI capabilities are accelerating — not just in chatbot quality, but in math, reasoning, science, and infrastructure design.
  • Scaling laws remain reliable — compute continues to drive capability in predictable and exponential ways.
  • The infrastructure must evolve — to handle synchronous workloads, planetary-scale compute, and the resiliency demands of large-scale training.
  • Hardware co-design is essential — model, software, and system improvements must be approached holistically.
  • AI for chip design is here — not yet complete, but showing promise in coding, verification, and architecture exploration.
Also Read:

SNUG 2025: A Watershed Moment for EDA – Part 1

DVCon 2025: AI and the Future of Verification Take Center Stage

The Double-Edged Sword of AI Processors: Batch Sizes, Token Rates, and the Hardware Hurdles in Large Language Model Processing


Emerging NVM Technologies: ReRAM Gains Visibility in 2024 Industry Survey
by Daniel Nenni on 04-30-2025 at 11:00 am


A recent survey of more than 120 anonymous semiconductor professionals offers a grounded view of how the industry is evaluating non-volatile memory (NVM) technologies—and where things may be heading next.

The 2024 NVM Survey, run in late 2024 and promoted through various semiconductor-related platforms and portals including SemiWiki, drew responses from engineers, architects, and decision-makers across North America, Europe, and Asia. It focused on how memory IP is being selected, which technologies are under review, and what factors matter most to the people making those calls.

81% of respondents said they’re currently evaluating or have previously used NVM IP. These are teams with real-world design decisions in motion. The respondent base included a mix of semiconductor vendors, IP companies, and system developers—ranging from large global firms to focused design teams. Job titles covered everything from engineers to CTOs.

When asked about emerging NVM types, ReRAM (Resistive RAM) ranked among the most recognized technologies. Over 60% of respondents were familiar with it, placing it slightly ahead of MRAM. While embedded flash remains dominant, newer options like ReRAM are clearly on the radar as potential alternatives. That recognition doesn’t guarantee adoption. But it does indicate that ReRAM is part of the memory conversation for more companies than in years past.

A notable number of respondents expect to select NVM IP within the next six to 12 months. Some respondents are also evaluating multiple NVM options in parallel, which reflects a shifting landscape. Cost, power, integration complexity, and endurance are all forcing companies to think beyond the status quo.

When asked about the criteria driving their NVM IP selection, respondents cited power efficiency, reliability, integration flexibility, and scalability. Two factors stood out: reliability (42%) and high-temperature performance (37%). Reliability shows up in two columns—technical and commercial—which makes sense. Especially in markets like automotive, industrial, and IoT, that’s not negotiable.

Respondents also shared what’s not working with existing solutions. Top issues included limited endurance, high power consumption, and complex integration workflows. These pain points explain the interest in exploring new NVM types. But most emerging options still have hurdles to clear—scalability, ecosystem maturity, and total cost of ownership being the most cited.

Survey participants are building for a wide range of markets, with a few recurring themes: IoT, where power efficiency and size matter most; automotive, where memory must survive heat and stress; and AI/ML, where fast, reliable access drives performance. These are sectors with sharp constraints—and they’re forcing design teams to re-evaluate long-held assumptions about memory.

The survey also asked how professionals stay informed. The most common answers: technical content from vendors, peer recommendations, and webinars or conference sessions. That may not surprise anyone, but it reinforces a key point: decisions are being made by people actively looking for clarity, not just headlines.

This year’s survey shows a market in transition. Traditional NVM, notably flash, isn’t going anywhere just yet, but it’s no longer the only path forward. Newer technologies—like ReRAM—are being seriously evaluated, something more possible now that major foundries like TSMC are offering ReRAM IP as a main part of their portfolio.

There will be another survey later this year. Stay tuned to see how things progress.

Also Read:

Designing and Simulating Next Generation Data Centers and AI Factories

How Cadence is Building the Physical Infrastructure of the AI Era

Achieving Seamless 1.6 Tbps Interoperability for High BW HPC AI/ML SoCs: A Technical Webinar with Samtec and Synopsys


Podcast EP300: Next Generation Metalization Innovations with Lam’s Kaihan Ashtiani
by Daniel Nenni on 04-30-2025 at 10:00 am

Dan is joined by Kaihan Ashtiani, Corporate Vice President and General Manager of atomic layer deposition and chemical vapor deposition metals in Lam’s Deposition Business Unit. Kaihan has more than 30 years of experience in technical and management roles, working on a variety of semiconductor tools and processes.

Dan explores the challenges of metallization for advanced semiconductor devices with Kaihan, where billions of connections must be patterned reliably to counteract heat and signal integrity problems. Kaihan describes the move from chemical vapor deposition to the atomic layer deposition approach used for advanced nodes. He also discusses the motivations for the move from tungsten to molybdenum for metallization.

He explains that thin film resistivity challenges make molybdenum a superior choice, but working with this material requires process innovations that Lam has been leading. Kaihan describes the ALTUS Halo tool developed by Lam and the ways this technology addresses the challenges of metallization patterning for molybdenum, both in terms of quality of results and speed of processing.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Feeding the Beast: The Real Cost of Speculative Execution in AI Data Centers
by Jonah McLeod on 04-30-2025 at 10:00 am


For decades, speculative execution was a brilliant solution to a fundamental bottleneck: CPUs were fast, but memory access was slow. Rather than wait idly, processors guessed the next instruction or data fetch and executed it ‘just in case.’ Speculative execution traces its lineage back to Robert Tomasulo’s work at IBM in the 1960s. His algorithm—developed for the IBM System/360 Model 91—introduced out-of-order execution and register renaming. This foundational work powered performance gains for over half a century and remains embedded in most high-performance processors today.

But as workloads have shifted—from serial code to massively parallel AI inference—speculation has become more burden than blessing. Today’s data centers dedicate massive silicon and power budgets to hiding memory latency through out-of-order execution, register renaming, deep cache hierarchies, and predictive prefetching. These mechanisms are no longer helping—they’re hurting. The effort to keep speculative engines fed has outpaced the benefit they provide.

It’s time to rethink the model. This article explores the economic, architectural, and environmental case for moving beyond speculation—and how a predictive execution interface can dramatically reduce system cost, complexity, and energy use in AI data centers. See Fig. 1, which shows a side-by-side comparison of integration costs per module: predictive interface SoCs eliminate the need for HBM3 and complex speculative logic, slashing integration cost by more than 3×.

When IBM introduced Tomasulo’s algorithm in the 1960s, “Think” was the company’s unofficial motto—a call to push computing forward. In the 21st century, it’s time for a new mindset, one that echoes Apple’s challenge to the status quo: “Think Different.” Tomasulo changed computing for his era. Today, Dr. Thang Tran is picking up that torch—with a new architecture that reimagines how CPUs coordinate with accelerators. Predictive execution is more than an improvement—it’s the next inflection point.

Figure 1: Per-Module Cost Breakdown – Grace Hopper Superchip (GH200) vs. Predictive Interface SoC

Freeway Traffic Analogy: Speculative vs. Predictive Execution

Imagine you’re driving on a crowded freeway during rush hour. Speculative execution is like changing lanes the moment you see a temporary opening—hoping it will be faster. You swerve into that new lane, pass 20 cars… and then hit the brakes. That lane just slowed to a crawl, and you have to switch again, wasting time and fuel with every guess.

Predictive execution gives you a drone’s-eye view of the next 255 car lengths. You can see where slowdowns will happen and where the traffic flow is smooth. With that insight, you plan your lane changes in advance—no jerky swerves, no hard stops. You glide through traffic efficiently, never getting stuck. This is exactly what predictive interfaces bring to chip architectures: fewer stalls, smoother data flow, and far less waste.

Let’s examine the cost of speculative computing in current hyperscaler designs. The NVIDIA Grace Hopper Superchip (GH200) integrates a 72-core Grace CPU with a Hopper GPU via NVLink-C2C and feeds them using LPDDR5x and HBM3 memory, respectively. While this architecture delivers impressive performance, it also incurs massive BoM costs due to its reliance on HBM3 high-bandwidth memory (96–144 GB), CoWoS packaging to integrate GPU and HBM stacks, deep caches, register renaming, warp scheduling logic, and power delivery for high-performance memory subsystems.

GH200 vs. Predictive Interface: Module Cost Comparison

| GH200 Module Components                          | Cost          | Architecture with Predictive Interface                     | Cost        |
|--------------------------------------------------|---------------|------------------------------------------------------------|-------------|
| HBM3 (GPU-side)                                  | $2,000–$2,500 | DDR5/LPDDR5 memory (shared)                                 | $300–$500   |
| LPDDR5x (CPU-side)                               | $350–$500     | Interface control fabric (scheduler + memory coordination)  | $100–$150   |
| Interconnect & Control Logic (NVLink-C2C + PHYs) | $250–$350     | Standard packaging (no CoWoS)                               | $250–$400   |
| Packaging & Power Delivery (CoWoS, PMICs)        | $600–$1,000   | Simplified power delivery                                   | $100–$150   |
| Total per GH200 module                           | $3,200–$4,350 | Total cost per module                                       | $750–$1,200 |
A Cost-Optimized Alternative

An architecture with predictive interface eliminates speculative execution and instead employs time-scheduled, deterministic coordination between scalar CPUs and vector/matrix accelerators. This approach eliminates speculative logic (OOO, warp schedulers), makes memory latency predictable—reducing cache and bandwidth pressure, enables use of standard DDR5/LPDDR memory, and requires simpler packaging and power delivery. In the same data center configuration, this would yield a total integration cost of $2.4M–$3.8M, resulting in a total estimated savings: $7.8M–$10.1M per deployment.
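
For reference, a quick sketch totaling the ranges in the table above reproduces the per-module figures and the more-than-3x cost ratio cited earlier (taken at the midpoints of the ranges; the module counts behind the deployment-level totals are not broken out here):

gh200 = {"HBM3 (GPU-side)": (2000, 2500), "LPDDR5x (CPU-side)": (350, 500),
         "Interconnect & control logic": (250, 350), "Packaging & power delivery": (600, 1000)}
predictive = {"DDR5/LPDDR5 (shared)": (300, 500), "Interface control fabric": (100, 150),
              "Standard packaging": (250, 400), "Simplified power delivery": (100, 150)}

def total_range(parts):
    # Sum the low and high ends of each component's cost range.
    return sum(lo for lo, _ in parts.values()), sum(hi for _, hi in parts.values())

gh_lo, gh_hi = total_range(gh200)          # (3200, 4350) per GH200 module
pr_lo, pr_hi = total_range(predictive)     # (750, 1200) per predictive-interface module
print((gh_lo + gh_hi) / (pr_lo + pr_hi))   # ~3.9x at the midpoints of the ranges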

While the benefits of predictive execution are substantial, implementing it does not require a complete redesign of a speculative computing system. In most cases, the predictive interface can be retrofitted into the existing instruction execution unit—replacing the speculative logic block with a deterministic scheduler and timing controller. This retrofit eliminates complex out-of-order execution structures, speculative branching, and register renaming, removing approximately 20–25 million gates. In their place, the predictive interface introduces a timing-coordinated execution fabric that adds 4–5 million gates, resulting in a net simplification of silicon complexity. The result is a cleaner, more power-efficient design that accelerates time-to-market and reduces verification burden.

Is $10M in Savings Meaningful for NVIDIA?

At NVIDIA’s global revenue scale (~$60B in FY2024), a $10M delta is negligible. But for a single data center deployment, it can directly impact total cost of ownership, pricing, and margins. Scaled across 10–20 deployments, savings exceed $100M. As competitive pressure rises from RISC-V and low-cost inference chipmakers, speculative execution becomes a liability. Predictive interfaces offer not just architectural efficiency but a competitive edge.

Environmental Impact

Beyond cost and performance, replacing speculative execution with a predictive interface can yield significant environmental benefits. By reducing compute power requirements, eliminating the need for HBM and liquid cooling, and improving overall system efficiency, data centers can significantly lower their carbon footprint.

  • Annual energy use is reduced by ~16,240 MWh
  • CO₂ emissions drop by ~6,500 metric tons
  • Up to 2 million gallons of water saved annually by eliminating liquid cooling
Conclusion: A Call for Predictable Progress

Speculative execution has long served as the backbone of high-performance computing, but its era is showing cracks—both in cost and efficiency. As AI workloads scale exponentially, the tolerance for waste—whether in power, hardware, or system complexity—shrinks. Predictive execution offers a forward-looking alternative that aligns not only with performance needs but also with business economics and environmental sustainability.

The data presented here makes a compelling case: predictive interface architectures can slash costs, lower emissions, and simplify designs—without compromising on throughput. For hyperscalers like NVIDIA and its peers, the question is no longer whether speculative execution can keep up, but whether it’s time to leap ahead with a smarter, deterministic approach.

As we reach the tipping point of compute demand, predictive execution isn’t just a refinement—it’s a revolution waiting to be adopted.

Also Read:

LLMs Raise Game in Assertion Gen. Innovation in Verification

Scaling AI Infrastructure with Next-Gen Interconnects

Siemens Describes its System-Level Prototyping and Planning Cockpit


LLMs Raise Game in Assertion Gen. Innovation in Verification
by Bernard Murphy on 04-30-2025 at 6:00 am


LLMs are already simplifying assertion generation but still depend on human-generated natural language prompts. Can LLMs go further, drawing semantic guidance from the RTL and domain-specific training? Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Using LLMs to Facilitate Formal Verification of RTL, published on arXiv.org in 2023. The authors are from Princeton University. The paper has 27 citations according to Google Scholar.

The authors acknowledge that there is already published work on using LLMs to generate SVA assertions from natural language prompts but point out that the common approach doesn’t alleviate much of the burden on test writers, who must still reason about and express test intent in natural language. Their goal is to explore whether LLMs can generate correct SVA for a given design without any specification beyond the RTL—even when the RTL contains bugs. They partially succeed, though still depend on designer review/correction.

Paul’s view

Great find by Bernard this month – paper out of Princeton on prompt engineering to improve GPT4’s ability to generate SVAs.  The intended application is small units of code that can achieve full statement and toggle coverage based only on SVAs and model checking.

The authors refine their prompt by taking RTL for a simple FIFO module which is known to be correct and repeatedly asking GPT4 to “write SVA assertions to check correctness of ALL the functionality” of that module. After each iteration they review the SVAs and add hints to their prompt to help GPT4 generate a better result. For example, “on the postcondition of next-cycle assertions (|=>), USE $past() to refer to the value of wires.” After 23 iterations and about 8 hours of manual effort they come up with a prompt that generates a complete and correct set of assertions for the FIFO.

Next, the authors take their engineered prompt and try it on a more complex module – the page table walker (PTW) of an opensource RISC-V core. They identify a recent bug fix to the PTW and take an RTL snapshot from before that bug fix. After calling GPT4 8 times (for a total of 80 SVAs generated), they are able to get an SVA generated that catches the bug. An encouraging step in the right direction, but of course it’s much easier to find an SVA to match a known bug vs. looking at several failing auto-generated SVAs and wondering which ones are due to a real bug in the RTL vs. the SVA itself being buggy.

The latter part of the paper investigates if auto-generated SVAs can improve RTL generation: the authors take a 50-word plain text description of a FIFO queue and ask GPT4 to generate RTL for it. They generate SVAs for this RTL, manually fix any errors, and add the fixed SVAs back into the prompt. After 2 iterations of this process they get clean RTL and SVAs with full coverage. Neat idea, and another encouraging result, but I do wonder if the effort required to review and fix the SVAs was any less than the effort that would have been required to review and fix the first RTL generated by GPT4.

Raúl’s view

Formal property verification (FPV) utilizing SystemVerilog Assertions (SVA) is essential for effective design verification. Researchers are actively investigating the application of large language models (LLMs) in this area, such as generating assertions from natural language, producing liveness properties from an annotated RTL module interface, and creating a model of the design from a functional specification for comparison with the RTL implementation. This paper examines whether LLMs can generate accurate SVA for a given design solely based on the RTL, without any additional specifications – which has evident advantages. The study builds upon the previously established framework, AutoSVA, which uses GPT-4 to generate end-to-end liveness properties from an annotated RTL module interface. The enhanced framework is referred to as AutoSVA2.

The methodology involves iteratively refining prompts with rules to teach GPT-4 how to generate correct SVA (even state-of-the-art GPT-4 generates syntactically and semantically wrong SVA by default) and crafting rules to guide GPT-4 in generating SVA output, published as open-source artifacts [2]. Two examples of such rules are: “signals ending in _reg are registers: the assigned value changes in the next cycle” and “DO NOT USE $past() on postcondition of same-cycle assertion”.

The paper details extensive experimentation that identified a previously undetected bug in the RISC-V CVA6 Ariane core. AutoSVA2 also allows the generation of Register Transfer Level (RTL) code for a FIFO queue based on a fifty-word specification. To illustrate the process, here is an excerpt from the paper describing the workflow:

  1. Start with a high-level specification in English
  2. The LLM generates a first version of the RTL based on the specification, the module interface, and an instruction to generate synthesizable Verilog
  3. AutoSVA2 generates an FPV Testbench (FT) based on the RTL
  4. JasperGold evaluates the FT
  5. The engineer audits and fixes the SVA
  6. The LLM generates a new version of the RTL after appending the SVA to the previous prompt.
  7. Steps 3 to 6 are then repeated until convergence: either (a) full proof and coverage of the FT or (b) a plateau in the improvements of the RTL and SVA.
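
A minimal Python-style sketch of that loop is shown below. The helper functions (call_llm, generate_fpv_testbench, run_formal, engineer_review) are hypothetical placeholders standing in for GPT-4, AutoSVA2, JasperGold, and the human audit step; they are not real APIs.

def iterative_rtl_sva_refinement(spec, module_interface, max_iters=10):
    # Steps 1-2: first RTL version from the high-level spec and module interface.
    prompt = f"{spec}\n{module_interface}\nGenerate synthesizable Verilog."
    rtl = call_llm(prompt)
    for _ in range(max_iters):
        sva = generate_fpv_testbench(rtl)        # step 3: FPV testbench from the RTL
        results = run_formal(rtl, sva)           # step 4: formal evaluation
        sva = engineer_review(sva, results)      # step 5: engineer audits and fixes the SVA
        if results.fully_proven_and_covered:     # step 7(a): convergence
            break                                # (plateau detection, 7(b), omitted)
        prompt += "\n" + sva                     # step 6: append fixed SVA to the prompt
        rtl = call_llm(prompt)                   # regenerate RTL and repeat
    return rtl, sva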

This process differs significantly from the role of a designer or verification engineer. GPT-4’s creativity allows it to generate SVA from buggy RTL as well as create buggy SVA for correct RTL; reproducibility presents a challenge; internal signals, timing, syntax, and semantics may be partially incorrect and are partly corrected by the rules mentioned above.

On the positive side, AutoSVA2-generated properties improved coverage of RTL behavior by up to six times over AutoSVA-generated ones with less effort and exposed an undiscovered bug. The authors think that the approach has the potential to expand the adoption of FPV and pave the way for safer LLM-assisted RTL design methodologies. The Times They Are A-Changin’?

Also Read:

High-speed PCB Design Flow

Perspectives from Cadence on Data Center Challenges and Trends

Designing and Simulating Next Generation Data Centers and AI Factories