
Metal fill extraction: Breaking the speed-accuracy tradeoff
by Admin on 05-12-2025 at 10:00 am


As semiconductor technology scales and device complexity increases, accurately modeling the parasitic effects of metal fill has become critical for circuit performance, power integrity, and reliability. Metal fill is a crucial part of the manufacturing process, ensuring uniform layer density, improving planarization, and managing thermal and stress effects. However, the added metal fill structures also introduce parasitic capacitances that can significantly impact the behavior of the circuit.

What is metal fill and why is it important?

Metal fill refers to the addition of non-functional conductive structures in the layout to ensure uniform density across the chip. This is important for the planarization process during chemical mechanical polishing (CMP), as well as for thermal management and stress management during manufacturing. Without metal fill, there would be large open areas on the chip with little to no metal, leading to uneven deposition and thickness variations that are difficult to planarize. Figure 1 presents an example of a design with many metal fill shapes populating the empty areas in between signal nets.

Figure 1. An illustration of an IC design with dense metal fill structures in-between signal nets.

By adding these metal fill shapes, the density of each metal layer is evened out, enabling more effective CMP and better control over the final wafer topography. This improves the manufacturability and reliability of the chip. Metal fill also helps with thermal management by providing more conductive paths for heat dissipation, and it can help mitigate mechanical stress effects during processing.

As noted, these added fill structures are not electrically benign: the parasitic capacitances they introduce can influence the chip’s timing, power integrity, and overall performance. As a result, accurately modeling the impact of metal fill has become crucial for modern IC designs.

Limitations of traditional approaches

Traditionally, there has been a significant computational challenge in balancing the accuracy of parasitic extraction with the runtime required. Ignoring metal fill entirely oversimplifies the model, while extracting all parasitic components from the fill shapes is computationally expensive. This has led to two main approaches:

  • Ignoring metal fill until signoff
    The simpler method is to ignore metal fill until the final signoff stage. The design is placed and routed, RC values are extracted, and static timing analysis (STA) is performed without considering metal fill. When the design is believed to be ready, metal fill is inserted, and RC extraction is rerun to account for the capacitive coupling. However, this late-stage addition of metal fill often results in timing violations that require numerous design modifications to resolve, delaying the project schedule.
  • Inserting metal fill during layout
    The second approach is to perform metal fill insertion after each place and route step, enabling extraction and STA to account for the effects on every iteration. However, this adds significant time to the layout loop, as inserting metal fill on every layer can add hours to each iteration, slowing down the development process. The metal fill would still need to be verified during signoff, but there should be fewer violations that require design modifications.

These two approaches hint at an underlying tradeoff: modeling accuracy vs. computational efficiency. The most detailed modeling includes all metal fill as part of the parasitic extraction procedure with high accuracy, but at the cost of very high computational resources and processing time. Approximations will reduce the computational load, but at a cost to accuracy, leading to potential design errors not caught during simulation phases.

There are several ways to control the tradeoff between accuracy and computational cost. For example, fill nets can be modeled as grounded or left floating, a decision with significant impact on the accuracy of the extracted parasitic values. Consider the options (the sketch after this list illustrates why the floating-versus-grounded choice matters):

  • Floating, no reduction: Keep all fill shapes as floating in simulations and extract all of them for a more precise modeling of the actual physical design.
  • Reduction: Reduces the parasitic network of the fill shapes by analyzing the whole design to understand the impact of each fill shape. This generates a reduced netlist and maintains parasitic network equivalence to enable faster simulations.
  • Grounding: Assume floating nets are grounded to simplify the extraction process and reduce computational overhead.
  • Ignoring: Ignore all metal fill shapes and extract only signal nets. This is usually only used for debugging purposes.
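To make the floating-versus-grounded distinction concrete, here is a minimal sketch in Python, assuming a single fill shape lumped between two signal nets with illustrative capacitance values; it captures the circuit intuition only, not any extraction tool’s algorithm.

```python
# Minimal sketch (not any tool's algorithm): effective net-to-net coupling
# through one fill shape, comparing "floating" vs "grounded" handling.
# All capacitance values are illustrative placeholders.

def series(c1: float, c2: float) -> float:
    """Series combination of two capacitances: C1*C2 / (C1 + C2)."""
    return c1 * c2 / (c1 + c2)

# Assumed couplings (fF): net A to fill, fill to net B.
c_a_fill, c_fill_b = 2.0, 2.0

# Floating fill: the fill node's potential is free to move, so nets A and B
# still couple to each other through it (a series path; any fill-to-ground
# capacitance, ignored here, would weaken this path further).
coupling_floating = series(c_a_fill, c_fill_b)

# Grounded fill: the fill node is tied to AC ground, so the A-to-B path
# through it disappears; each net instead sees extra capacitance to ground.
coupling_grounded = 0.0
extra_gnd_a, extra_gnd_b = c_a_fill, c_fill_b

print(f"A-B coupling, floating fill: {coupling_floating:.2f} fF")
print(f"A-B coupling, grounded fill: {coupling_grounded:.2f} fF "
      f"(+{extra_gnd_a:.1f}/{extra_gnd_b:.1f} fF to ground on A/B)")
```

The series path through a floating fill shape still couples the two nets, which is why grounding or ignoring fill shifts the extracted coupling and, ultimately, the timing picture.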

The need for a new solution

These traditional flows are no longer sufficient for today’s large and complex chips. Designers need a solution that does not delay the schedule but still delivers high correlation to real metal fill in terms of extraction and timing accuracy.

The solution is a smart and adaptive metal fill extraction technique that selectively focuses on the most impactful parasitic capacitances. This approach is an enhancement to the “reduction” method and can provide over 4x runtime improvements compared to conventional methods, while maintaining minimal impact on accuracy (figure 2).

An adaptive metal fill extraction technique dynamically adjusts the level of detail based on the design context, such as the density of signal nets versus fill shapes. This context-aware solution improves designer efficiency as semiconductor technology continues to scale and design complexity increases.

Figure 2: Chart comparing runtime for the different extraction techniques.
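The exact selection criteria of such an adaptive approach are not published, but a hypothetical sketch of the general idea, using invented distance and density thresholds and field names, might look like the following; it illustrates selective fill handling, not the actual algorithm.

```python
# Hypothetical sketch of a selective ("adaptive") fill-handling policy:
# fill shapes close to signal nets are kept for detailed extraction, the
# rest are reduced or grounded. Thresholds and field names are invented.

from dataclasses import dataclass

@dataclass
class FillShape:
    name: str
    distance_to_signal_um: float   # spacing to the nearest signal net
    local_signal_density: float    # 0..1, fraction of nearby area that is signal metal

def classify_fill(shape: FillShape) -> str:
    """Decide how this fill shape should be modeled during extraction."""
    if shape.distance_to_signal_um < 0.1 and shape.local_signal_density > 0.3:
        return "extract_floating"  # strongest coupling: model it fully
    if shape.distance_to_signal_um < 0.5:
        return "reduce"            # moderate impact: fold into a reduced network
    return "ground"                # negligible coupling: cheap approximation

fills = [
    FillShape("fill_0", 0.05, 0.6),
    FillShape("fill_1", 0.30, 0.2),
    FillShape("fill_2", 2.00, 0.0),
]
for f in fills:
    print(f.name, "->", classify_fill(f))
```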

Conclusion: Enhancing productivity and reliability

Smarter metal fill extraction ensures that there are no unpleasant surprises late in the schedule, as the design team can be confident in the fast, accurate solution provided by the adaptive metal fill extraction technique. As semiconductor technology continues to advance, this type of smart, context-aware approach will become increasingly important for managing the complexity of metal fill and its impact on design implementation. This helps ensure the final IC product meets stringent performance and reliability requirements, even as semiconductor technology continues to scale.

About the author, Shehab Ashraf

Shehab is a product engineer for Calibre Design Solutions at Siemens Digital Industries Software, specializing in parasitic extraction. He received his BE in Electrical and Electronics Engineering from The German University in Cairo.

Also Read:

Siemens Describes its System-Level Prototyping and Planning Cockpit

Verifying Leakage Across Power Domains

Going Beyond DRC Clean with Calibre DE

Siemens Fleshes out More of their AI in Verification Story


How Arteris is Revolutionizing SoC Design with Smart NoC IP
by Mike Gianfagna on 05-12-2025 at 6:00 am

How Arteris is Revolutionizing SoC Design with Smart NoC IP

Recently, Design & Reuse held its IP-SoC Days event at the Hyatt Regency in Santa Clara. Advanced IP drives a lot of the innovation we are seeing in chip design. This event provides a venue for IP providers to highlight the latest products and services and share a vision of the future. IP consumers are anxious to hear about all the new technology conveniently in one place. Some of the presentations rose above the noise and made a substantial impact. This is a story of one of those presentations. Read on to see how Arteris is revolutionizing SoC design with Smart NoC IP.

Who’s Talking?

 The presentation was delivered by Rick Bye, director of product management at Arteris. He also developed the content with Guillaume Boillet, senior director of strategic marketing at Arteris. These folks bring a lot to the table when it comes to advanced chip design.

Guillaume Boillet

Besides Arteris, Guillaume Boillet has worked at Synopsys, Mentor, Atrenta, ST-Ericsson, STMicroelectronics, and Thales.  He has substantial experience in hardware design and EDA tool development and marketing with a focus on power optimization and connectivity. The skills he brings to Arteris are well-suited to the company’s broad footprint.


Rick Bye

Before joining Arteris, Rick Bye had a long career in the development of chips and IP for advanced communications and power management at companies such as Texas Instruments, Broadcom, Zarlink Semi, Silicon Labs, NXP, and Arm. The combined knowledge and experience of these two gentlemen is formidable. The content developed and delivered at IP-SoC Days reflected that experience. A link to the presentation is coming, but first let’s look at the topics covered.

What Was Discussed

Rick began with an overview of who Arteris is and where the company touches the semiconductor ecosystem. His first slide is an eye-popping overview of technical accomplishments and ecosystem impact. Once you click on the link below, you’ll get to see the incredible array of logos touched by Arteris.

In terms of corporate resume, there are also a lot of impressive statistics to review. Most folks know that Arteris is a system IP company. Some facts you may not know: the company can claim 3.7B+ SoCs shipped in electronic systems, 200+ active customers, and 850+ SoC design starts. Its products are used by 9 of the top 10 semiconductor companies, and it has a 90%+ customer retention rate. There are plenty more accomplishments cited.

In terms of footprint, the focus is on overcoming complexity with maximum flexibility for optimized SoCs. There are three parts to this story: SoC Integration Automation, Network-on-Chip (NoC) Interconnect IP, and Network-on-Chip Interface IP. The balance of the presentation focused on the second item and how Arteris impacts design in this area. Regarding NoC usage, it was reported that there are typically 5-20 NoCs per chip or chiplet, with NoCs representing 10-13% of the silicon.

The next section of the presentation examines semiconductor market dynamics. The emerging trends and the impact of AI are always interesting to hear about, and there are some great statistics presented. For those who have been in this industry for a while, you will be familiar with the projections made by Handel Jones at IBS. You will get to see Handel’s latest numbers, which are always interesting.

The next part of the presentation focuses on Arteris Smart NoC technology. There are two families of products here. Ncore delivers the requirements for cache-coherent interconnect IP, and FlexGen, FlexNoC, and FlexWay serve the needs for non-coherent interconnect IP. The remainder of the presentation focuses primarily on the non-coherent portion of the design. The figure below illustrates where the non-coherent interconnect IP products fit.

FlexGen impacts a broad range of applications, so more details are presented on this technology. The graphic at the top of this post presents some of those details. To provide more context, here are some additional facts:

Challenge: SoC design complexity has surpassed manual human capabilities, requiring smart NoC automation. Modern SoCs have 5 to 20+ unique NoC instances and each instance can require 5-10 iterations.

FlexGen, smart NoC IP from Arteris, delivers:

  • Productivity Boost: Accelerates chip design by up to 10x, shortening iteration cycles from weeks to days for greater efficiency
  • Expert-Level Results: Enhances engineering efficiency by 3x while delivering expert-quality results with optimized routing and reduced congestion
  • Wire Length Reduction: AI-driven heuristics reduce wire length by up to 30%, improving chip or chiplet power efficiency

FlexGen connects any processor (Arm, RISC-V, x86) and supports industry-standard protocols.

The presentation then dives into more detail about FlexGen and how it builds on FlexNoC technology for physical awareness. Arteris’s portfolio of 80+ patents in this area is explored, and specific examples of performance and efficiency advantages are presented, along with the workflows involved and their impact.

To Learn More

I’ve just scratched the surface in this overview. You need to watch the complete presentation from IP-SoC Days to get the full picture of how Arteris can help with your next design. You can access a video of the complete presentation here. And that’s how Arteris is revolutionizing SoC design with Smart NoC IP.

Also Read:

Is Arteris Poised to Enable Next Generation System Design?

Arteris Raises Bar Again with AI-Based NoC Design

MCUs Are Now Embracing Mainstream NoCs



CEO Interview with Ido Bukspan of Pliops
by Daniel Nenni on 05-10-2025 at 4:00 pm

Ido Bukspan

Prior to becoming CEO of Pliops in 2023, Ido Bukspan was the senior vice president of the Chip Design Group at NVIDIA and one of the leaders at Mellanox before it was acquired by NVIDIA for nearly $7 billion.

Tell us about your company.

Pliops accelerates and amplifies the performance and scalability of global GenAI infrastructure, driving unparalleled efficiency and innovation.

Pliops was founded in 2017 by storage industry veterans from Samsung, M-Systems, and XtremIO. Pliops is pioneering a new category of product that enables cloud and enterprise data centers to access data up to 50 times faster with one-tenth of the computational load and power consumption. Its technology consolidates multiple inefficient layers into one ultra-fast device based on a groundbreaking approach. Pliops’ solution addresses the scalability challenges posed by the cloud data explosion and the increasing data demands of AI and ML applications.

What problems are you solving?

Pliops XDP LightningAI, our revolutionary Accelerated Key-Value distributed smart node, introduces a new tier of memory that surpasses HBM for GPU compute applications. Our product can double end-to-end performance and enhance efficiency for vLLM, a leading inferencing solution. By leveraging Pliops’ state-of-the-art technology, we deliver advanced GenAI and AI solutions, significantly improving GPU utilization, reducing total cost of ownership, and cutting power consumption and carbon emissions.

What application areas are your strongest?

Our strongest application areas focus on accelerating GenAI applications, including LLM inferencing, DLRM, RAG/VectorDB, and SQL and Document DB acceleration. Our users benefit from over 3X better utilization of compute resources, more than 50% savings in capital expenditures, and a 50% reduction in carbon footprint.

What keeps your customers up at night?

Our customers are deeply concerned about power consumption in data centers, especially as AI infrastructure and emerging AI applications significantly increase power footprints and strain cooling budgets. They also worry about maintaining margins as they expand their AI infrastructure, adding GPU tiers of compute. The growing power and cooling demands, combined with substantial capital expenditures on GPUs, are consuming margins and keeping our customers up at night.

What does the competitive landscape look like and how do you differentiate?

The competitive landscape in GenAI and applications requiring new infrastructure, including additional CapEx, cooling budgets, and increased power, is still emerging as we stand at the dawn of the AI revolution. Innovation is essential in this space, and we embrace all solutions, not just our own. Specifically, our focus is on enabling GPUs to have IO access to large amounts of data for local consumption by applications. There are various approaches to solving this, and our strategy is among them.

What new features/technology are you working on?

We are continually advancing our technology to meet the evolving demands of AI applications. Our latest developments include enhancements to our XDP LightningAI, which now provides unprecedented levels of memory and compute efficiency for GenAI workloads. We’re focusing on solutions that allow GPUs to access vast amounts of data for local processing, significantly improving performance and reducing energy consumption. Additionally, we are working on innovative methods to further decrease the power footprint and cooling requirements of data centers, ensuring our solutions remain at the forefront of sustainable AI infrastructure.

How do customers normally engage with your company?

We provide a comprehensive toolbox for critical GenAI applications, making it exceptionally easy for our customers to adopt our products and technology. Our solutions encompass silicon, hardware, and software components, and we offer reference designs that ISVs, CSPs, and operators can utilize with minimal effort to realize significant benefits. Our products are available as cloud-ready solutions, and we collaborate with compute/storage OEMs and ISVs to support HPC and on-prem data center operators.

Talk to a Pliops product expert.

Also Read:

CEO Interview with Roger Cummings of PEAK:AIO

Executive Interview with Koji Motomori, Senior Director of Marketing and Business Development at Numem

CEO Interview with Richard Hegberg of Caspia Technologies


CEO Interview with Roger Cummings of PEAK:AIO
by Daniel Nenni on 05-10-2025 at 2:00 pm

Roger Cummings of PEAK AIO

Roger Cummings is the CEO of PEAK:AIO, a company at the forefront of enabling enterprise organizations to scale, govern, and secure their AI and HPC applications. Under Roger’s leadership, PEAK:AIO has increased its traction and market presence in delivering cutting-edge software-defined data solutions that transform commodity hardware into high-performance storage systems for AI and HPC workloads.

Roger is a seasoned entrepreneur and business leader with a distinguished track record of driving growth, innovation, and market leadership. Specializing in application infrastructure and AI/ML technologies, Roger has consistently identified emerging opportunities and built organizations that establish market dominance in rapidly evolving industries. Over his career, Roger has successfully guided five early-stage companies through highly successful acquisitions, raising over $1 billion in funding to fuel their global expansion.

In addition to his executive roles, Roger is an advisory board member at DevNetwork and an advisor at High Alpha Innovation. He has co-authored several papers on go-to-market strategies, operational excellence, and AI application infrastructure, reflecting his thought leadership in the field.

Tell us about your company.

PEAK:AIO is a pioneering AI infrastructure company specializing exclusively in data storage solutions engineered for Artificial Intelligence workloads. Unlike legacy IT storage vendors, our solutions are built from the ground up to match AI innovators’ exact needs and skill sets in the healthcare, government, life sciences, and advanced research sectors. Our mission is to eliminate bottlenecks in AI development by delivering unmatched speed, simplicity, and scalability.

What problems are you solving?

AI projects today often struggle with data bottlenecks caused by legacy storage solutions originally designed for traditional IT workloads. These outdated systems are typically complex, slow, energy-inefficient, and poorly aligned with the requirements of modern AI workloads. PEAK:AIO directly addresses these issues by providing storage infrastructure specifically designed for AI, eliminating data throughput limitations and significantly improving efficiency. We deliver up to six times the performance in one-sixth of the footprint, using just a fraction of the energy compared to traditional IT storage solutions.

What application areas are your strongest?

We excel in sectors where rapid access to vast datasets is crucial for successful AI outcomes, notably healthcare, life sciences, government, and large-scale AI research. Our technology underpins many significant AI initiatives, including pioneering deployments within the UK’s NHS and cutting-edge research collaborations with renowned institutions such as Los Alamos National Labs (LANL).

What keeps your customers up at night?

AI innovators, including data scientists, researchers, and medical professionals, worry most about their ability to rapidly and reliably access massive datasets needed to train complex AI models. They fear data bottlenecks that slow their projects, escalate costs, and use energy inefficiently. Furthermore, managing complex, legacy storage infrastructure is not their core competency. Our customers need solutions that offer powerful simplicity, high performance, and extreme reliability, which is exactly what PEAK:AIO delivers.

What does the competitive landscape look like and how do you differentiate?

The competitive landscape is primarily populated by traditional IT storage vendors, which have adapted legacy solutions to attempt to meet AI needs. These repurposed offerings are usually complex, costly, and energy-intensive.

PEAK:AIO differentiates through purpose-built architecture optimized explicitly for AI workloads, delivering industry-leading performance, significantly lower energy use, and an unparalleled simplicity that fits the persona of today’s AI leaders, who are often highly skilled researchers and scientists who demand solutions that just work.

What new features/technology are you working on?

We are actively expanding our technology leadership by integrating advanced memory and storage architectures, such as CXL (Compute Express Link) and next-generation NVMe solutions, into our platforms. Additionally, we’re advancing our proprietary software-defined storage engine, designed to dynamically adapt storage tiers based on real-time AI workloads, continuously ensuring optimal performance and efficiency.

We are also evaluating vector database technologies and how they can help our clients.

How do customers normally engage with PEAK:AIO?

Customers typically engage through partnerships. We collaborate closely with each partner and customer to understand their specific AI infrastructure needs, and our specialist team provides personalized guidance, installation support, and ongoing optimization. This high-touch, expert-led engagement ensures our customers gain the maximum value and performance from their AI investments.

Also Read:

Executive Interview with Koji Motomori, Senior Director of Marketing and Business Development at Numem

CEO Interview with Richard Hegberg of Caspia Technologies

CEO Interview with Dr. Michael Förtsch of Q.ANT


Video EP4: A Deeper Look at Advanced Packaging & Multi-Die Design Challenges with Anna Fontanelli
by Daniel Nenni on 05-09-2025 at 10:00 am

In this episode of the Semiconductor Insiders video series, Dan is once again joined by Anna Fontanelli, founder and CEO of MZ Technologies. In this discussion, more details of the challenges presented by advanced packaging and multi-die design are explored. Anna provides details of what’s involved in architectural exploration and interconnect management. She also provides some background on how MZ Technologies helps tame these challenges with its GENIO EVO platform.

Contact MZ

Video EP3: A Discussion of Challenges and Strategies for Heterogeneous 3D Integration with Anna Fontanelli

The views, thoughts, and opinions expressed in these videos belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Analog Bits Steals the Show with Working IP on TSMC 3nm and 2nm and a New Design Strategy
by Mike Gianfagna on 05-09-2025 at 8:00 am

Analog Bits Steals the Show with Working IP on TSMC 3nm and 2nm and a New Design Strategy

The TSMC Technology Symposium recently kicked off in Santa Clara, with a series of events scheduled around the world. This event showcases the latest TSMC technology. It is also an opportunity for TSMC’s vast ecosystem to demonstrate commercial application on TSMC’s technology. There is a lot to unpack at an event like this. There are great presentations and demonstrations everywhere, but occasionally a company rises above the noise and grabs the spotlight with unique or memorable news.

My view is that Analog Bits stepped into the spotlight this year with cutting-edge analog IP on the latest nodes and a strategy that will change the way design is done. Let’s examine how Analog Bits steals the show with working IP on TSMC 3nm and 2nm and a new design strategy.

Blazing the Trail to 2nm

Working silicon demonstrations of TSMC’s CLN2P technology represent rare air at this TSMC event. Analog Bits recently completed a successful second test chip tapeout at 2nm, but the real news is the company also came to the show with multiple working analog IPs at 2nm. Six precision IPs were demonstrated; the locations of those blocks on the test chip are shown below, and the finished chip is pictured at the top of this post.

ABITCN2P – Test Chip Layout

What follows are some details from the cutting edge. Let’s begin with the wide-range PLL. Features of this IP include:

  • Electrically programmable for multiple applications
  • Wide range of input and output frequencies for diverse clocking needs
  • Implemented with Analog Bits’ proprietary architecture
  • Low power consumption
  • Spread spectrum tracking capability
  • Requires no additional on-chip components or bandgaps, minimizing power consumption
  • Excellent jitter performance with optimized noise rejection

The figure below illustrates some power and jitter numbers. Note that the jitter data is for the whole test setup (test chip output buffers, test board, and measurement equipment), not a de-embedded number for the PLL alone.

PLL Jitter and Power

Next is the PVT sensor. IPs like this are critical for managing power and heat. There will be more on power management in a bit. Features of this IP include:

  • High accuracy thermometer is a highly integrated macro for monitoring temperature variation on-chip
  • Industry leading accuracy untrimmed, with easy trimming procedures
  • An additional voltage sample mode is included allowing for voltage monitoring
  • The block includes a simple-to-use digital interface that works with standard core and IO level power supplies
  • Implemented with Analog Bits’ proprietary architecture
  • Low power consumption

Demonstrations included showcasing the temperature accuracy and temperature and voltage linearity of the IP.

Next is a droop detector. Voltage droop is another key item for power management. It occurs when the current in the power delivery network (PDN) abruptly changes, often due to workload fluctuations. This effect can lead to supply voltage drops across the chip, which can cause performance degradation, reduce energy efficiency, and even result in catastrophic timing failures. Features of this IP include:

  • Integrated voltage reference for stand-alone operation
  • Easy to integrate with no additional components or special power requirements
  • Easy to use and configure
  • Programmable droop detection levels
  • Low power
  • Implemented with Analog Bits’ proprietary architecture
  • Requires no additional on-chip macros, minimizing power consumption

The next IP is an 18-40MHz crystal oscillator. Features for this IP include:

  • Pad macro that supports most industry standard crystals in the 18-40MHz range
  • Uses standard CMOS transistors
  • Power-down option for IDDQ testing
  • Oscillator by-pass mode option for logic testing
  • Self-contained ESD protection structure

And finally, the differential transmit (TX) and receive (RX) IP blocks. Features here include:

TX

  • Wide frequency range support up to 2,000 MHz output for diverse clocking needs
  • Implemented with Analog Bits’ proprietary architecture
  • Low power consumption
  • Requires no additional on-chip components or bandgaps, minimizing power consumption

RX

  • Differential clock receiver
  • Single-ended output to chip core
  • Wide range of input frequencies for diverse clocking needs
  • Implemented with Analog Bits’ proprietary architecture
  • Low power consumption
  • Programmable termination
  • Spread spectrum tracking capability
  • Requires no additional on-chip components or bandgaps, minimizing power consumption

On the Cutting Edge with 3nm IP

Four power management IPs from TSMC’s CLN3P process were also demonstrated at the show. The test chip these IPs came from is also pictured in the graphic at the top of this post. The IPs demonstrated include:

A scalable low-dropout (LDO) regulator. Features of this IP include:

  • Integrated voltage reference for precision stand-alone operation
  • Easy to integrate with no additional components or special power requirements
  • Easy to use and configure
  • Scalable for multiple output currents
  • Programmable output level
  • Trimmable
  • Implemented with Analog Bits’ proprietary architecture
  • Requires no additional on-chip macros, minimizing power consumption

The line regulation performance of this IP is shown in the figure below.

Next is a spread spectrum clock generation PLL supporting PCIe Gen4 and Gen5. Features of this IP include:

  • High performance design emphasis for meeting low jitter requirements in PCIe Gen4 and Gen5 applications
  • Implemented with Analog Bits’ proprietary LC architecture
  • Low power consumption
  • Spread spectrum clock generation (SSCG) and tracking capability
  • Excellent jitter performance with optimized noise rejection
  • Calibration code and bandgap voltage observability (for test)
  • Requires no additional on-chip components, minimizing power consumption

A high-accuracy thermometer IP using Analog Bits’ patented pinless technology was also demonstrated. Features of this IP include:

  • IP is a highly integrated macro for monitoring temperature variation on-chip
  • Industry leading accuracy untrimmed, with easy trimming procedures
  • An additional voltage sample mode is included allowing for voltage monitoring
  • The block includes a simple-to-use digital interface that works with just the standard core power supply, saving customers analog routing and simplifying package design
  • Pinless technology means the IP is powered by the core voltage; no analog power pin is required
  • Low power consumption

Voltage linearity for this IP is shown in the figure below.

Voltage Linearity

And finally, a droop detector for 3nm. Features include:

  • Integrated voltage reference for stand-alone operation
  • Easy to integrate with no additional components or special power requirements
  • Easy to use and configure
  • Programmable droop detection levels
  • Low power
  • Implemented with Analog Bits’ proprietary architecture
  • Requires no additional on-chip macros, minimizing power consumption

Intelligent Power Architecture Launches a New Design Strategy

Innovation brings new challenges. A big design challenge is optimizing performance and power in an on-chip environment that is constantly changing, is prone to on-chip variation and is faced with all kinds of power-induced glitches. As multi-die design grows, these problems are compounded across many chiplets that now also need a high-bandwidth, space-efficient, and power-efficient way to communicate.

This problem cannot be solved as an afterthought. Plugging in optimized IP or modifying software late in the design process will not be enough. Analog Bits believes that developing a holistic approach to power management during the architectural phase of the project is the only path forward.

It is against this backdrop that the company announced its Intelligent Power Architecture initiative at the TSMC Technology Symposium. The company stated that its high-accuracy on-die PVT sensors, process performance monitors, integrated power-on resets, droop detectors, LDOs, and glitch catchers all work together with its low power SerDes, ADCs and pinless IP libraries to deliver a power management architecture that will meet the most demanding requirements. Pinless IP technology, invented by Analog Bits, will become even more critical as designs migrate below 3nm, since all of the IP works directly from the core voltage. The technology is already proven in production silicon on N5 and N3.

Analog Bits stated the company is already working with large, successful organizations that are building some of the most power-hungry chips in the world to achieve this goal. The mission now is to bring an intelligent power architecture to mainstream design for all companies. This work will be interesting to watch as Analog Bits re-defines the way advanced design is done. 

To Learn More

You can find extensive coverage of Analog Bits on SemiWiki here. You can also learn more about what Analog Bits did at the TSMC Technology Symposium here, including additional IP demos of automotive grade pinless high-accuracy PVT, pinless PLL, and PCIe SERDES on TSMC N5A. And you can watch the details of both the 2nm and 3nm demos here.

Keep watching the company’s website as the strategy behind the Intelligent Power Architecture unfolds. And that’s how Analog Bits steals the show with working IP on TSMC 3nm and 2nm and a new design strategy.

Also Read:

2025 Outlook with Mahesh Tirupattur of Analog Bits

Analog Bits Builds a Road to the Future at TSMC OIP

Analog Bits Momentum and a Look to the Future


Podcast EP286: The Significant Impact of Ambient Scientific Technology on AI Deployment with GP Singh
by Daniel Nenni on 05-09-2025 at 6:00 am

Dan is joined by GP Singh, CEO of Ambient Scientific. With over 20 years of experience, GP has played a pivotal role in shaping the industry, driving 50+ chip tapeouts, including game-changing advancements at FinFET technology nodes. Now, as the CEO of Ambient Scientific, GP brings together hands-on engineering expertise and visionary leadership, with more than 50 patents and 5 publications to his name.

In this highly informative and very relevant discussion, GP explains some of the fundamental obstacles to large-scale deployment of AI in both the datacenter and at the edge. A primary issue today is that AI circuits are either too limited in capability or too power hungry. GP describes the genesis of a unique approach to this problem developed by Ambient. He explains that the approach is to use analog AI compute, but to create a hybrid analog/digital architecture that delivers high performance at low power.

Ambient plans to deliver applications on top of its silicon technology, but customers can also build their own, creating a very wide footprint in the market. The impact of this enabling technology can be quite substantial.

Contact Ambient Scientific

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


RISC-V Virtualization and the Complexity of MMUs
by Bernard Murphy on 05-08-2025 at 7:00 am

Virtual memory mapping

In the early days of RISC-V adoption, applications were microcontroller-centric with no need for virtualization support. But horizons expanded and now RISC-V is appearing in application processors, very much needing to be able to virtualize multiple apps concurrently. Take another step forward to datacenter servers running virtual machines under hypervisors, with each virtual machine running multiple virtual processes. Virtualization is turning up everywhere in RISC-V, supported by a standard for the MMU that virtualization requires. But the complexity of the standard is taxing verification teams when it comes to developing comprehensive testplans. I talked to Adnan Hamid (President and CTO) and Dave Kelf (CEO) of Breker Verification to get insight.

A quick recap on virtual memory and MMUs

The basic idea is quite simple. Each software developer can assume their program is running standalone with as much memory as it needs. The operating system (and the MMU) supports this fiction through an indirection between virtual and physical memory. This memory space indirection allows the OS/MMU to allocate and move around chunks of memory in the form of pages to support multiple processes occupying physical memory and/or offline storage at the same time. Virtualization delivers multiple benefits: more than one program can be active at a time; each program can assume it has access to more memory than is physically available, since overflow can be swapped out to disk; the OS/MMU can transparently optimize to reduce memory fragmentation as running processes complete; and the OS/MMU can ensure memory isolation between processes, so that if one process tries to access an out-of-bounds address in its own space, the attempt doesn’t affect other processes running at the same time.

Hypervisors add another level of indirection. These run multiple virtual machines, each in their own virtual space hosting an OS with services, in turn running multiple virtualized processes within that virtualized space. Nothing really complicated there.

Where it gets complicated

Seems pretty straightforward, right? Unfortunately it gets a whole lot more tangled in the details. MMU complexity isn’t unique to RISC-V. Adnan has previously worked on MMU verification for x86 and Arm-based systems and confirms there is plenty of complexity in both. Still, the RISC-V definition is unique in a few ways. First, the definition was finalized more recently, implying perhaps more time is needed for the standard (or at least documentation of the standard) to fully mature through widespread deployment. Second, in keeping with the RISC-V philosophy, MMU support is defined through extensions to the ISA, but the compatibility test framework requires demonstrating system level compatibility between multiple processors, probably coherent networks, the MMU, external memory and backing store. Third, the RISC-V standard teams saw opportunity to further generalize the definition, no doubt adding more capability but also more complexity.

Some of the complexity is just in the nature of MMUs. Process image data is stored in pages, each page 4KB by default, though different profiles allow for larger pages, even a mix of page sizes. Pages are indexed by page tables, a lookup mechanism mapping virtual page numbers to physical page locations. When a load or store is made to an address, the MMU attempts to find the corresponding translation in these page tables. Naturally this lookup is supported by a cache, the translation lookaside buffer (TLB), to enhance performance. If the appropriate address is already in a page in memory, the value can be returned or updated. If not, the MMU faults through to finding the appropriate page in main memory or backing store, bringing it in and making space by evicting a least-recently-used page currently in memory. When a hypervisor is active, the lookup must go through two levels of indirection.
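As a rough illustration of the lookup described above, here is a simplified Python sketch of a two-level page-table walk with a TLB; the index widths, table contents, and fault handling are invented for illustration and do not follow the RISC-V Sv39/Sv48 formats exactly.

```python
# Simplified sketch of a two-level page-table walk with a TLB, assuming
# 4 KB pages and made-up table contents; real MMUs (including RISC-V
# Sv39/Sv48) use more levels, permission bits, and hardware TLBs.

PAGE_SIZE = 4096      # 4 KB pages
OFFSET_BITS = 12
INDEX_BITS = 10       # page-table entries per level in this toy example

tlb = {}              # virtual page number -> physical page number

# Toy page tables: {level-1 index: {level-0 index: physical page number}}
page_tables = {3: {5: 0x1234, 6: 0x1235}}

def translate(vaddr: int) -> int:
    offset = vaddr & (PAGE_SIZE - 1)
    vpn = vaddr >> OFFSET_BITS
    if vpn in tlb:                       # TLB hit: no table walk needed
        return (tlb[vpn] << OFFSET_BITS) | offset
    idx1 = (vpn >> INDEX_BITS) & ((1 << INDEX_BITS) - 1)
    idx0 = vpn & ((1 << INDEX_BITS) - 1)
    try:
        ppn = page_tables[idx1][idx0]    # two memory accesses in real hardware
    except KeyError:
        raise RuntimeError("page fault: OS maps the page, then the access retries")
    tlb[vpn] = ppn                       # fill the TLB for next time
    return (ppn << OFFSET_BITS) | offset

# With a hypervisor active, the result of this guest-level walk is itself a
# guest-physical address that must go through a second, host-level walk.
vaddr = (3 << (OFFSET_BITS + INDEX_BITS)) | (5 << OFFSET_BITS) | 0x42
print(hex(translate(vaddr)))             # 0x1234042
```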

Add to this multiple levels of page table to accelerate lookup, multiple address translation protocols, privilege management, and other goodies which play into the details of how the MMU should function to be compatible with the RISC-V compliance tests. There is a written specification which Adnan repeatedly called “dense”, meaning long and complex. No doubt very carefully thought through by experts, though there still seems to be some debate about whether it is fully finalized.

Fairly quickly I get out of my depth in all this complexity. Instead I’ll turn to my own level of indirection by talking about what Breker has been doing to help DV teams in this space. Industrial experience in working with the standard is a pretty good indicator of maturity. One important point to remember is that the standard defines ISA extensions for MMU support, and it provides a system compatibility reference checker. It doesn’t tell you how to build your MMU or how to verify it. Both are left as exercises for the design and verification teams.

Breker SystemVIP for MMU verification

Breker hosted a tutorial on MMU testing at DVCon which was well attended (90 people). So popular that they have subsequently repeated the tutorial, reaching similar crowds. The tutorials reinforce that DV experts are struggling to know how to write testplans around MMUs for RISC-V-based systems.

Breker has put a lot of work into understanding these requirements to build a system VIP that can provide a canned starting point for DV testplans and test implementation. Adnan freely confesses that they aren’t all the way there yet. In Breker’s own work and in talking with clients, they know of holes in the Breker solution. Adnan says they have frequent and spirited discussions around whether the Breker interpretation is correct on any given point. At this point Adnan feels that Breker has it right more often than not, but they still consider feedback both to test and to drive refinements to their implementation. Meantime clients and prospects keep coming back to Breker, with questions and arguments. A pretty good indication that even if incomplete, Breker is still leading the pack!

Very interesting. MMU system testing in the RISC-V world may be a niche but it’s a very important niche for anyone building a system which claims to support virtualization. You can learn more about Breker work in this space HERE.

Also Read:

How Breker is Helping to Solve the RISC-V Certification Problem

Breker Brings RISC-V Verification to the Next Level #61DAC

System VIPs are to PSS as Apps are to Formal


Beyond the Memory Wall: Unleashing Bandwidth and Crushing Latency
by Lauro Rizzatti on 05-07-2025 at 2:00 pm


VSORA AI Processor Raises $46 Million to Fast-Track Silicon Development

We stand on the cusp of an era defined by ubiquitous intelligence—a stone’s throw from a tidal wave of AI-powered products underpinned by next-generation silicon. Realizing that future demands nothing less than a fundamental rethink of how we design semiconductors and architect computers.

At the core of this transformation is a simple—but profound—shift: AI silicon must be shaped by AI workloads from day one. Gone are the days when hardware and software evolve in parallel—and only converge at validation, by which point the architecture is set in stone. Today’s paradigm demands re-engineering engineering itself: software-defined hardware design that tightly integrates AI code and silicon from the ground up.

Brute Force, No Grace: GPUs Hit the Memory Wall Processing LLMs

Today, the dominant computing architecture for AI processors is the Graphics Processing Unit (GPU). Originally conceived in 1999, when Nvidia released the GeForce 256 marketed as the “world’s first GPU”, it addressed the growing demand for parallel processing in rendering computer graphics. The GPU has since been repurposed to handle the massive, highly parallel workloads required by today’s AI algorithms—particularly those based on large language models (LLMs).

Despite significant advancements in GPU theoretical throughput, GPUs still face fundamental limitations, namely poor computational efficiency, high power consumption, and suboptimal latency. To exemplify, a GPU with a theoretical peak performance of one PetaFLOPS and a realistic efficiency of 10% when processing a state-of-the-art LLM such as GPT-4 or Llama 3 405B (noting that efficiency varies depending on the specific algorithm) would in practice deliver only 100 TeraFLOPS. To achieve a sustained PetaFLOPS of performance, 10 such GPUs would be required, resulting in substantially more power consumption than that of a single device. Less apparent, this configuration also introduces significantly longer latency, compounding the inefficiencies.
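The arithmetic behind that example is worth spelling out; the per-device power figure below is an added illustrative assumption, not a datasheet number.

```python
# Back-of-the-envelope arithmetic from the paragraph above: how much of a
# GPU's peak is sustained in practice, and what that implies for a cluster.
# The 10% efficiency is the article's illustrative figure; the per-GPU
# power number is an assumption for illustration only.

peak_tflops = 1000.0      # 1 PetaFLOPS peak, expressed in TeraFLOPS
efficiency = 0.10         # realistic utilization on a large LLM (assumed)
gpu_power_kw = 1.0        # illustrative per-device power draw

sustained_tflops = peak_tflops * efficiency       # 100 TFLOPS per GPU
gpus_needed = peak_tflops / sustained_tflops      # 10 GPUs for a sustained PFLOPS
total_power_kw = gpus_needed * gpu_power_kw

print(f"Sustained per GPU: {sustained_tflops:.0f} TFLOPS")
print(f"GPUs for 1 sustained PFLOPS: {gpus_needed:.0f}, drawing ~{total_power_kw:.0f} kW total")
```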

Peeling back the layers of a GPU would uncover the culprit behind its poor efficiency: the memory wall. This long-standing bottleneck arises from an ever-widening gap between the insatiable demand of compute cores for data and the finite bandwidth of off-chip memory. As a result, cores frequently stall waiting on data transfers, preventing sustained utilization even when computational resources are plentiful.

Enhancements to memory bandwidth via layered access in the form of multi-level caches helped mitigate the impact—until the advent of AI workloads exposed the limitation again. The GPU’s brute-force approach, necessary to handle large language models (LLMs), comes at a price: poor efficiency, resulting in high energy consumption and long latency.

While GPU limitations during LLM training primarily manifest as increased computational cost, they pose a more critical obstacle during inference. This is especially pronounced in edge deployments, where stringent power budgets and real-time latency requirements, crucial for applications like autonomous driving, severely restrict GPU viability.

The VSORA Solution: Knocking Down the Memory Wall

While the semiconductor industry is intensely focused on mitigating the memory bandwidth bottleneck that plagues LLM inference processing, French startup VSORA has quietly pioneered a disruptive solution. The solution represents a paradigm shift in memory management.

VSORA Architecture: Functional Principles

The VSORA architecture redefines how data is stored, moved, and processed at scale. At its heart lies an innovative, scalable compute core designed around a very fast tightly coupled memory (TCM).

The TCM functions like an expansive register file, offering the lowest-latency, single-cycle read/write access of any on-chip memory. Placed directly alongside the compute fabric, it bypasses the multi-cycle penalties of conventional cache hierarchies. As a result, VSORA maintains exceptionally high utilization even on irregular workloads, since hot data is always available in the very next cycle.

Together, the compute logic and the TCM form a unified, scalable compute core that minimizes data-movement overhead and bypasses traditional cache hierarchies. The result is an order-of-magnitude reduction in access latency and blazing-fast end-to-end inference performance across edge and data-center deployments. See figure 1.

Figure 1: Traditional hierarchical-cache memory structure vs VSORA register-like memory approach [Source: VSORA]
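One way to see why a register-like TCM matters is a simple average-memory-access-time (AMAT) comparison; the hit rates and cycle counts below are generic, textbook-style assumptions, not VSORA or GPU measurements.

```python
# Illustrative average-memory-access-time comparison: a conventional
# multi-level cache hierarchy vs a single-cycle tightly coupled memory.
# Latencies are total cycles from the core for a hit at that level, and
# all numbers are assumptions for illustration only.

def amat(levels):
    """levels = [(hit_latency_cycles, hit_rate), ...]; last level always hits."""
    expected, reach_prob = 0.0, 1.0
    for latency, hit_rate in levels:
        expected += reach_prob * hit_rate * latency
        reach_prob *= (1.0 - hit_rate)
    return expected

# L1 (4 cycles, 90% hit), L2 (14, 80%), L3 (40, 70%), DRAM (200, always hits)
cache_hierarchy = [(4, 0.90), (14, 0.80), (40, 0.70), (200, 1.00)]
tcm = [(1, 1.00)]   # register-file-like: single cycle, always hits

print(f"Cache-hierarchy AMAT: {amat(cache_hierarchy):.1f} cycles")
print(f"TCM AMAT:             {amat(tcm):.1f} cycles")
```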

VSORA Architecture: Physical Implementation

The VSORA architecture is realized using a chiplet-based design within a 2.5D silicon‐interposer package, coupling compute chiplets to high-capacity memory chiplets. Each compute chiplet carries two VSORA basic compute cores, and each memory chiplet houses a high-bandwidth memory stack. Compute and memory chiplets communicate over an ultra-low-latency, high-throughput Network-on-Chip (NoC) fabric.

In the flagship Jotunn8 device, eight compute chiplets and eight HBM3e chiplets are tiled around the central interposer, delivering massive aggregate bandwidth and parallelism in a single package.

Beyond Bandwidth/Latency: VSORA’s On-the-Fly Re-configurable Compute Cores Unlock Algorithm-Agnostic Deployment

In most AI accelerators today, the fundamental compute element is a single-bit multiply-accumulate (MAC) unit. Thousands—or even hundreds of thousands—of these MACs are woven together in a massive array, with both the compiler and the user defining how data flows spatially across the array and in what temporal order each operation executes. While this approach excels at raw throughput for uniform, fixed-precision workloads, it begins to fracture under the demands of modern large language models and cutting-edge AI applications, which require:

  • Mixed-precision support: LLMs often need to employ different quantization on different layers, for example, a mix of FP8 Tensorcore, FP16 Tensorcore and FP16 DSP layers within the same network to balance performance, accuracy and numerical fidelity. This requires the system to repeatedly quantize and dequantize data, introducing both overhead and rounding error.
  • Dynamic range management: Activations and weights span widely varying magnitudes. Architectures built around a single bit width can struggle to represent very large or very small values without resorting to costly software-driven scaling.
  • Irregular and sparse tensors: Advanced workloads increasingly exploit sparsity to prune redundant connections. A rigid MAC mesh, optimized for dense operations, underutilizes its resources when data is sparse or when operations deviate from simple dot products.

These limitations introduce bottlenecks and reduce accuracy: throughput drops when precision conversions don’t map neatly onto the MAC fabric, and critical data must shuffle through auxiliary units for scaling or activation functions.
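The quantize/dequantize round-trip referred to above is easy to illustrate; the sketch below uses a standard per-tensor symmetric INT8 scheme with made-up activation values and is not specific to any accelerator.

```python
# Sketch of the quantize/dequantize round-trip: mapping float values to
# INT8 with a per-tensor scale and measuring the rounding error introduced.
import numpy as np

def quantize_int8(x: np.ndarray):
    scale = np.max(np.abs(x)) / 127.0              # per-tensor symmetric scale
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

activations = np.array([0.02, -1.3, 4.7, 0.0004, -0.9], dtype=np.float32)
q, scale = quantize_int8(activations)
restored = dequantize(q, scale)

# Small values lose resolution relative to the largest one: the
# dynamic-range problem the bullet list describes.
print("max abs error:", np.max(np.abs(activations - restored)))
```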

VSORA’s architecture flips the script on traditional accelerator fabrics by adopting reconfigurable compute tiles that adapt on the fly—zero downtime, zero manual reprogramming. Instead of dedicating large swaths of silicon to fixed-function MAC arrays or rigid tensor cores, each VSORA tile can instantly assume either DSP-style or Tensorcore-style operation, at any precision (FP8, FP16, INT8, etc.), on a per-layer basis.

In practice, this means that:

  • Layer-optimal precision: One layer might run at FP16 with high-dynamic-range DSP operations for numerically sensitive tasks, then the very next layer switches to FP8 Tensorcore math for maximum throughput—without any pipeline stalls.
  • Resource consolidation: Because every tile can serve multiple roles, there’s no idle silicon stranded when workloads shift in precision or compute type. VSORA sustains peak utilization across the diverse math patterns of modern LLMs.
  • Simplified compiler flow: The compiler’s task reduces to choosing the ideal mode per layer—Tensorcore or DSP—instead of wrestling with mapping data to dozens of discrete hardware blocks.

The result is an accelerator that tunes itself continuously to each model’s needs, delivering higher accuracy, lower latency, and superior energy efficiency compared to static, single-purpose designs.

The VSORA architecture is not just about raw bandwidth; it’s about intelligent data processing tailored to the specific demands of each application. This meticulous attention to detail at the core level is what distinguishes VSORA, enabling it to deliver AI inference solutions that are both powerful and efficient.

VSORA’s Secret Weapon: The Intelligent Compiler

Hardware ingenuity is only half the equation. VSORA’s algorithm-agnostic compiler consists of two stages. A front-end, hardware-independent graph compiler ingests standard model formats (TensorFlow, PyTorch, ONNX, etc.) and optimizes the model via layer fusion, layer re-ordering, weight compilation and scheduling, slicing, tensor layout optimization, execution scheduling, and sparsity enablement (data and weights). A back-end, LLVM-based compiler fully automates the mapping of leading-edge LLMs—such as Llama—onto the VSORA J8.
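To give a flavor of one of those graph-level optimizations, here is a hypothetical, heavily simplified layer-fusion pass over a toy operator list; it is not VSORA’s compiler, only an illustration of the concept of collapsing adjacent ops so intermediate tensors never leave fast local memory.

```python
# Toy layer-fusion pass: collapse adjacent (matmul -> bias_add -> relu)
# ops into a single fused op. Op names and the pattern are illustrative.

FUSABLE = ("matmul", "bias_add", "relu")

def fuse_layers(graph):
    """graph: ordered list of op names; returns a new list with fused groups."""
    fused, i = [], 0
    while i < len(graph):
        # Greedily match the longest prefix of the fusable pattern.
        run = 0
        while (run < len(FUSABLE) and i + run < len(graph)
               and graph[i + run] == FUSABLE[run]):
            run += 1
        if run > 1:
            fused.append("fused_" + "_".join(graph[i:i + run]))
            i += run
        else:
            fused.append(graph[i])
            i += 1
    return fused

model = ["matmul", "bias_add", "relu", "matmul", "bias_add", "softmax"]
print(fuse_layers(model))
# ['fused_matmul_bias_add_relu', 'fused_matmul_bias_add', 'softmax']
```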

VSORA’s architecture radically simplifies the deployment of large language models by replacing the tedious, error-prone mapping workflows common in GPU environments with an automated, software-defined memory management layer. Unlike traditional GPU toolchains—where developers must hand-tune data layouts, manage low-level memory transfers, and master platform-specific APIs such as NVIDIA CUDA—VSORA’s compiler handles all of this transparently. As a result, teams can bring LLMs online far more quickly and reliably, even in power-constrained or latency-sensitive applications, without sacrificing performance or requiring deep hardware-level expertise.

The result is a seamless compilation software stack that maximizes chip utilization, simplifies deployment, and unleashes the full performance potential of VSORA’s breakthrough inference platform.

Conclusion

Unlike general-purpose accelerators optimized for training, VSORA conceived an architecture optimized for inference. The specialization reduces latency, boosts real-world responsiveness, and drives down operational costs in scenarios where every millisecond counts—from on-device AI in smart cameras to safety-critical systems in self-driving cars.

Market research forecasts AI inference revenue to double from about $100 billion in 2025 to an estimated $250 billion by 2030—a 15+ percent compound annual growth rate. As enterprises race to deploy real-time AI at scale, VSORA’s efficiency-first approach could redefine cost structures and performance benchmarks across the industry.

On April 27, 2025, VSORA announced a $46 million investment led by Otium Capital and a prominent French family office, with participation from Omnes Capital, Adélie Capital, and co-financing by the European Innovation Council Fund. In the words of Khaled Maalej, VSORA founder and CEO, “this funding empowers VSORA to tape-out the chip and ramp up production.”

Also Read:

SNUG 2025: A Watershed Moment for EDA – Part 1

SNUG 2025: A Watershed Moment for EDA – Part 2

DVCon 2025: AI and the Future of Verification Take Center Stage


Intel’s Foundry Transformation: Technology, Culture, and Collaboration
by Kalar Rajendiran on 05-07-2025 at 10:00 am

Intel and UMC 2025

Intel’s historical dominance in semiconductor process technology began to erode around 2018, as competitors started delivering higher performance at smaller nodes. In response, Intel is now doubling down on innovation across two fronts: advanced process nodes such as Intel 18A and 14A, and cutting-edge packaging technologies.

Interestingly, this emphasis on packaging innovation isn’t a deviation from Moore’s Law—it’s an expansion of it. In the original paper that gave birth to Moore’s Law, Gordon Moore wrote that it may prove economical to build large systems out of smaller functions, which are separately packaged and interconnected. That concept is materializing today through multi-die architectures and chiplet-based integration, which are key to Intel’s packaging roadmap.

These dual pillars of process and packaging took center stage at the recent Intel Foundry Direct Connect event, where Intel outlined how these technologies will power next-generation products in a world increasingly defined by AI-driven workloads and heterogeneous computing.

A separate article covers what was shared regarding advanced process and packaging technology. During Day 2 of the Direct Connect event, Walter Ng, VP of Worldwide Business Development at Intel Foundry Services, and TJ Lin, President of UMC-USA, gave a joint talk. This article focuses on that session.

The Cultural Challenge: From Products to Services

Technology alone is not enough to reinvent Intel’s role in the industry. A transformation from a product-centric company to a customer-focused foundry demands an equally profound cultural shift. For decades, Intel has engineered and delivered its own products; now, it must serve as a platform for others’ innovations. This shift was a major theme at the event, especially during the joint presentation by Intel and its strategic foundry partner, United Microelectronics Corporation (UMC).

UMC’s own evolution from an IDM (Integrated Device Manufacturer) to a dedicated foundry equips it with a culture deeply rooted in customer collaboration, operational efficiency, and service orientation. These are exactly the qualities Intel must adopt to succeed in its foundry ambitions—and UMC is well-positioned to help guide that transformation.

A Strategic Opportunity

While Intel is forging ahead on advanced process and packaging fronts, the 12nm process node was selected for the Intel-UMC partnership for several strategic reasons. Although future collaborations may include additional nodes, the immediate focus is on delivering a competitive 12nm platform that targets a broad range of applications: high-performance computing, mobile, RF, consumer, industrial, automotive, aerospace, and medical sectors.

This market is expected to grow to $20 billion by 2028, with early momentum driven by logic and RF designs. From 2027 onward, growth in specialty technologies is expected. Application areas include WiFi combo chips, RF transceivers, image signal processors, set-top box SoCs, and more—addressing the full spectrum of modern semiconductor demands.

Distributed Development and Accelerated Execution

Development is proceeding in parallel at UMC’s Tainan facility in Taiwan and Intel’s Ocotillo Technology Fabrication (OTF) site in Arizona, reinforcing a geo-diversified manufacturing strategy. With fabs across the US, Taiwan, Korea, China, EMEA, and Japan, the collaboration supports customers in building resilient, multi-sourced supply chains.

Initial performance benchmarks are promising: compared to UMC’s 22uLP node, the new 12nm offering delivers 28% better performance, 47% lower power consumption, and over 50% area savings. In response to anchor customers, Intel has accelerated its Process Design Kit (PDK) delivery schedule, enabling earlier design-in and tape-out.

The partners are also closely coordinating foundry operations and support services to ensure a seamless transition from design to high-volume manufacturing.

UMC’s Role and Expertise

UMC brings decades of experience in foundry operations, with a comprehensive ecosystem of IP and design enablement tools, support for specialty devices, and a diverse global customer base. Its track record in delivering complex, customized solutions makes it a strong partner in applications where tailored performance is essential.

Intel’s Added Value

Intel contributes significant R&D depth in FinFET technology, established advanced-node capacity, and leadership in packaging innovation. Initiatives like the Chiplet Alliance are enabling a robust ecosystem for modular system design. Furthermore, Intel’s domestic manufacturing footprint in the U.S. strengthens its appeal for customers with localization or national security requirements.

Together, Intel and UMC are offering a competitive FinFET solution that supports multi-sourcing strategies and provides a clear technology migration path for future products.

Service Culture Learning as a Catalyst for Change

Beyond technological and operational synergies, this collaboration serves a more profound purpose in Intel’s evolution: accelerating its cultural transformation. UMC’s journey from IDM to foundry is now becoming part of Intel’s learning curve. As Intel adopts a more customer-first mindset, this partnership offers valuable guidance and real-world insight.

The collaboration is not merely an exchange of capabilities; it is also a transfer of values, principles, and best practices that may shape the long-term success of Intel Foundry Services.

Summary

In a semiconductor industry defined by diversification, specialization, and global complexity, the Intel-UMC 12nm partnership exemplifies smart, strategic collaboration. By combining UMC’s mature process expertise with Intel’s FinFET and packaging leadership—alongside a deepening cultural alignment—the partnership is well-positioned to unlock new market opportunities.

As Intel seeks to reclaim its role as a technology leader and establish itself as a next-generation foundry platform, this collaboration with UMC isn’t just strategic—it’s foundational.

Also Read:

Intel’s Path to Technological Leadership: Transforming Foundry Services and Embracing AI

Intel Foundry Delivers!

Intel Presents the Final Frontier of Transistor Architecture at IEDM