Emerging NVM Technologies: ReRAM Gains Visibility in 2024 Industry Survey

by Daniel Nenni on 04-30-2025 at 11:00 am

A recent survey of more than 120 anonymous semiconductor professionals offers a grounded view of how the industry is evaluating non-volatile memory (NVM) technologies—and where things may be heading next.

The 2024 NVM Survey, run in late 2024 and promoted through various semiconductor-related platforms and portals including SemiWiki, drew responses from engineers, architects, and decision-makers across North America, Europe, and Asia. It focused on how memory IP is being selected, which technologies are under review, and what factors matter most to the people making those calls.

81% of respondents said they’re currently evaluating or have previously used NVM IP. These are teams with real-world design decisions in motion. The respondent base included a mix of semiconductor vendors, IP companies, and system developers—ranging from large global firms to focused design teams. Job titles covered everything from engineers to CTOs.

When respondents were asked about emerging NVM types, ReRAM (Resistive RAM) was the most recognized technology: over 60% of respondents were familiar with it, placing it slightly ahead of MRAM. While embedded flash remains dominant, newer options like ReRAM are clearly on the radar as potential alternatives. That recognition doesn’t guarantee adoption, but it does indicate that ReRAM is part of the memory conversation for more companies than in years past.

A notable number of respondents expect to select NVM IP within the next six to 12 months. Some respondents are also evaluating multiple NVM options in parallel, which reflects a shifting landscape. Cost, power, integration complexity, and endurance are all forcing companies to think beyond the status quo.

When asked about the criteria driving their NVM IP selection, respondents cited power efficiency, reliability, integration flexibility, and scalability. Two factors stood out: reliability (42%) and high-temperature performance (37%). Reliability shows up in two columns (technical and commercial), which makes sense: in markets like automotive, industrial, and IoT, reliability is not negotiable.

Respondents also shared what’s not working with existing solutions. Top issues included limited endurance, high power consumption, and complex integration workflows. These pain points explain the interest in exploring new NVM types. But most emerging options still have hurdles to clear—scalability, ecosystem maturity, and total cost of ownership being the most cited.

Survey participants are building for a wide range of markets, with a few recurring themes: IoT, where power efficiency and size matter most; automotive, where memory must survive heat and stress; and AI/ML, where fast, reliable access drives performance. These are sectors with sharp constraints—and they’re forcing design teams to re-evaluate long-held assumptions about memory.

The survey also asked how professionals stay informed. The most common answers: technical content from vendors, peer recommendations, and webinars or conference sessions. That may not surprise anyone, but it reinforces a key point: decisions are being made by people actively looking for clarity, not just headlines.

This year’s survey shows a market in transition. Traditional NVM, notably flash, isn’t going anywhere just yet, but it’s no longer the only path forward. Newer technologies such as ReRAM are being seriously evaluated, which is more feasible now that major foundries like TSMC offer ReRAM IP as a core part of their portfolios.

There will be another survey later this year. Stay tuned to see how things progress.



Feeding the Beast: The Real Cost of Speculative Execution in AI Data Centers

by Jonah McLeod on 04-30-2025 at 10:00 am

For decades, speculative execution was a brilliant solution to a fundamental bottleneck: CPUs were fast, but memory access was slow. Rather than wait idly, processors guessed the next instruction or data fetch and executed it ‘just in case.’ Speculative execution traces its lineage back to Robert Tomasulo’s work at IBM in the 1960s. His algorithm—developed for the IBM System/360 Model 91—introduced out-of-order execution and register renaming. This foundational work powered performance gains for over half a century and remains embedded in most high-performance processors today.

But as workloads have shifted—from serial code to massively parallel AI inference—speculation has become more burden than blessing. Today’s data centers dedicate massive silicon and power budgets to hiding memory latency through out-of-order execution, register renaming, deep cache hierarchies, and predictive prefetching. These mechanisms are no longer helping—they’re hurting. The effort to keep speculative engines fed has outpaced the benefit they provide.

It’s time to rethink the model. This article explores the economic, architectural, and environmental case for moving beyond speculation, and how a predictive execution interface can dramatically reduce system cost, complexity, and energy use in AI data centers. Fig. 1 shows a side-by-side comparison of integration costs per module: predictive interface SoCs eliminate the need for HBM3 and complex speculative logic, cutting integration cost by more than 3×.

When IBM introduced Tomasulo’s algorithm in the 1960s, “Think” was the company’s unofficial motto, a call to push computing forward. In the 21st century, it’s time for a new mindset, one that echoes Apple’s challenge to the status quo: “Think Different.” Tomasulo changed computing for his era. Today, Dr. Thang Tran is picking up that torch with a new architecture that reimagines how CPUs coordinate with accelerators. Predictive execution is more than an improvement; it’s the next inflection point.

Figure 1: Per-Module Cost Breakdown – Grace Hopper Superchip (GH200) vs. Predictive Interface SoC

Freeway Traffic Analogy: Speculative vs. Predictive Execution

Imagine you’re driving on a crowded freeway during rush hour. Speculative execution is like changing lanes the moment you see a temporary opening—hoping it will be faster. You swerve into that new lane, pass 20 cars… and then hit the brakes. That lane just slowed to a crawl, and you have to switch again, wasting time and fuel with every guess.

Predictive execution gives you a drone’s-eye view of the next 255 car lengths. You can see where slowdowns will happen and where the traffic flow is smooth. With that insight, you plan your lane changes in advance—no jerky swerves, no hard stops. You glide through traffic efficiently, never getting stuck. This is exactly what predictive interfaces bring to chip architectures: fewer stalls, smoother data flow, and far less waste.

Let’s examine the cost of speculative computing in current hyperscaler designs. The NVIDIA Grace Hopper Superchip (GH200) integrates a 72-core Grace CPU with a Hopper GPU via NVLink-C2C and feeds them using LPDDR5x and HBM3 memory respectively. While this architecture delivers impressive performance, it also incurs massive BoM costs due to its reliance on HBM3 high-bandwidth memory (96–144 GB), CoWoS packaging to integrate GPU and HBM stacks, deep caches, register renaming, warp scheduling logic, and power delivery for high-performance memory subsystems.

GH200 vs. Predictive Interface: Module Cost Comparison

| GH200 Module Components | Cost | Architecture with Predictive Interface | Cost |
| --- | --- | --- | --- |
| HBM3 (GPU-side) | $2,000–$2,500 | DDR5/LPDDR5 memory (shared) | $300–$500 |
| LPDDR5x (CPU-side) | $350–$500 | Interface control fabric (scheduler + memory coordination) | $100–$150 |
| Interconnect & Control Logic (NVLink-C2C + PHYs) | $250–$350 | Standard packaging (no CoWoS) | $250–$400 |
| Packaging & Power Delivery (CoWoS, PMICs) | $600–$1,000 | Simplified power delivery | $100–$150 |
| Total per GH200 module | $3,200–$4,350 | Total cost per module | $750–$1,200 |

A Cost-Optimized Alternative

An architecture with a predictive interface eliminates speculative execution and instead employs time-scheduled, deterministic coordination between scalar CPUs and vector/matrix accelerators. This approach removes speculative logic (OOO, warp schedulers), makes memory latency predictable (reducing cache and bandwidth pressure), enables the use of standard DDR5/LPDDR memory, and requires simpler packaging and power delivery. In the same data center configuration, this would yield a total integration cost of $2.4M–$3.8M, for total estimated savings of $7.8M–$10.1M per deployment.
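The per-module totals and the deployment-level savings can be cross-checked with a few lines of arithmetic. The sketch below copies the cost ranges from the comparison table. The article does not state how many modules a deployment contains, so `ASSUMED_MODULES = 3200` is an assumption, back-solved because it roughly reproduces the quoted $2.4M–$3.8M and $7.8M–$10.1M figures.

```python
# Per-module cost ranges (low, high) in USD, copied from the comparison table.
GH200 = {
    "HBM3 (GPU-side)": (2000, 2500),
    "LPDDR5x (CPU-side)": (350, 500),
    "Interconnect & control logic": (250, 350),
    "Packaging & power delivery": (600, 1000),
}
PREDICTIVE = {
    "DDR5/LPDDR5 memory (shared)": (300, 500),
    "Interface control fabric": (100, 150),
    "Standard packaging (no CoWoS)": (250, 400),
    "Simplified power delivery": (100, 150),
}

# Assumption, not from the article: a module count that roughly
# reproduces the quoted deployment totals.
ASSUMED_MODULES = 3200


def total(parts):
    """Sum the low ends and the high ends of each cost range."""
    low = sum(lo for lo, _ in parts.values())
    high = sum(hi for _, hi in parts.values())
    return low, high


def deployment_savings(modules):
    """Low-vs-low and high-vs-high savings for a given module count."""
    g_lo, g_hi = total(GH200)
    p_lo, p_hi = total(PREDICTIVE)
    return (g_lo - p_lo) * modules, (g_hi - p_hi) * modules


if __name__ == "__main__":
    g_lo, g_hi = total(GH200)
    p_lo, p_hi = total(PREDICTIVE)
    print("GH200 per module:     ", (g_lo, g_hi))
    print("Predictive per module:", (p_lo, p_hi))
    print("Cost ratio (low, high):", round(g_lo / p_lo, 2), round(g_hi / p_hi, 2))
    print("Savings at assumed size:", deployment_savings(ASSUMED_MODULES))
```

Under that assumed module count, the low-to-low and high-to-high differences come to about $7.8M and $10.1M, and both cost ratios stay above 3×, consistent with the claims in the text.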

While the benefits of predictive execution are substantial, implementing it does not require a complete redesign of a speculative computing system. In most cases, the predictive interface can be retrofitted into the existing instruction execution unit—replacing the speculative logic block with a deterministic scheduler and timing controller. This retrofit eliminates complex out-of-order execution structures, speculative branching, and register renaming, removing approximately 20–25 million gates. In their place, the predictive interface introduces a timing-coordinated execution fabric that adds 4–5 million gates, resulting in a net simplification of silicon complexity. The result is a cleaner, more power-efficient design that accelerates time-to-market and reduces verification burden.
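The "net simplification" can be bounded with interval arithmetic on the two gate-count ranges quoted above. This is only a sketch of that arithmetic; the function name is illustrative and nothing here models an actual design.

```python
def net_gates_removed(removed, added):
    """Return the (min, max) net reduction, in millions of gates.

    The smallest net reduction pairs the fewest gates removed with the
    most gates added; the largest pairs the most removed with the
    fewest added.
    """
    removed_lo, removed_hi = removed
    added_lo, added_hi = added
    return removed_lo - added_hi, removed_hi - added_lo


if __name__ == "__main__":
    # Ranges quoted in the text: 20-25M gates removed, 4-5M gates added.
    print(net_gates_removed((20, 25), (4, 5)))
```

With the quoted ranges, the net reduction falls between 15 and 21 million gates.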

Is $10M in Savings Meaningful for NVIDIA?

At NVIDIA’s global revenue scale (~$60B in FY2024), a $10M delta is negligible. But for a single data center deployment, it can directly impact total cost of ownership, pricing, and margins. Scaled across 10–20 deployments, savings can exceed $100M. As competitive pressure rises from RISC-V and low-cost inference chipmakers, speculative execution becomes a liability. Predictive interfaces offer not just architectural efficiency but a competitive edge.

Environmental Impact

Beyond cost and performance, replacing speculative execution with a predictive interface can yield significant environmental benefits. By reducing compute power requirements, eliminating the need for HBM and liquid cooling, and improving overall system efficiency, data centers can significantly lower their carbon footprint.

  • Annual energy use is reduced by ~16,240 MWh
  • CO₂ emissions drop by ~6,500 metric tons
  • Up to 2 million gallons of water saved annually by eliminating liquid cooling

Conclusion: A Call for Predictable Progress

Speculative execution has long served as the backbone of high-performance computing, but its era is showing cracks—both in cost and efficiency. As AI workloads scale exponentially, the tolerance for waste—whether in power, hardware, or system complexity—shrinks. Predictive execution offers a forward-looking alternative that aligns not only with performance needs but also with business economics and environmental sustainability.

The data presented here makes a compelling case: predictive interface architectures can slash costs, lower emissions, and simplify designs—without compromising on throughput. For hyperscalers like NVIDIA and its peers, the question is no longer whether speculative execution can keep up, but whether it’s time to leap ahead with a smarter, deterministic approach.

As we reach the tipping point of compute demand, predictive execution isn’t just a refinement—it’s a revolution waiting to be adopted.



LLMs Raise Game in Assertion Gen. Innovation in Verification

by Bernard Murphy on 04-30-2025 at 6:00 am

LLMs are already simplifying assertion generation but still depend on human-generated natural language prompts. Can LLMs go further, drawing semantic guidance from the RTL and domain-specific training? Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Using LLMs to Facilitate Formal Verification of RTL, published on arXiv in 2023. The authors are from Princeton University. The paper has 27 citations according to Google Scholar.

The authors acknowledge that there is already published work on using LLMs to generate SVA assertions from natural language prompts but point out that the common approach doesn’t alleviate much of the burden on test writers, who must still reason about and express test intent in natural language. Their goal is to explore whether LLMs can generate correct SVA for a given design without any specification beyond the RTL, even when the RTL contains bugs. They partially succeed, though they still depend on designer review and correction.

Paul’s view

Great find by Bernard this month – paper out of Princeton on prompt engineering to improve GPT4’s ability to generate SVAs.  The intended application is small units of code that can achieve full statement and toggle coverage based only on SVAs and model checking.

The authors refine their prompt by taking RTL for a simple FIFO module which is known to be correct and repeatedly asking GPT4 to “write SVA assertions to check correctness of ALL the functionality” of that module. After each iteration they review the SVAs and add hints to their prompt to help GPT4 generate a better result. For example, “on the postcondition of next-cycle assertions (|=>), USE $past() to refer to the value of wires.” After 23 iterations and about 8 hours of manual effort they come up with a prompt that generates a complete and correct set of assertions for the FIFO.

Next, the authors take their engineered prompt and try it on a more complex module: the page table walker (PTW) of an open-source RISC-V core. They identify a recent bug fix to the PTW and take an RTL snapshot from before that bug fix. After calling GPT4 eight times (for a total of 80 SVAs generated), they are able to get an SVA generated that catches the bug. An encouraging step in the right direction, but of course it’s much easier to find an SVA to match a known bug than to look at several failing auto-generated SVAs and wonder which ones are due to a real bug in the RTL and which to the SVA itself being buggy.

The latter part of the paper investigates if auto-generated SVAs can improve RTL generation: the authors take a 50-word plain text description of a FIFO queue and ask GPT4 to generate RTL for it. They generate SVAs for this RTL, manually fix any errors, and add the fixed SVAs back into the prompt. After 2 iterations of this process they get clean RTL and SVAs with full coverage. Neat idea, and another encouraging result, but I do wonder if the effort required to review and fix the SVAs was any less than the effort that would have been required to review and fix the first RTL generated by GPT4.

Raúl’s view

Formal property verification (FPV) utilizing SystemVerilog Assertions (SVA) is essential for effective design verification. Researchers are actively investigating the application of large language models (LLMs) in this area, such as generating assertions from natural language, producing liveness properties from an annotated RTL module interface, and creating a model of the design from a functional specification for comparison with the RTL implementation. This paper examines whether LLMs can generate accurate SVA for a given design solely based on the RTL, without any additional specifications – which has evident advantages. The study builds upon the previously established framework, AutoSVA, which uses GPT-4 to generate end-to-end liveness properties from an annotated RTL module interface. The enhanced framework is referred to as AutoSVA2.

The methodology involves iteratively refining prompts with rules that teach GPT-4 how to generate correct SVA (even state-of-the-art GPT-4 generates syntactically and semantically wrong SVA by default). The resulting rules are published as open-source artifacts [2]. Two examples of such rules: “signals ending in _reg are registers: the assigned value changes in the next cycle”, “DO NOT USE $past() on postcondition of same-cycle assertion”.

The paper details extensive experimentation that identified a bug in the RISC-V CVA6 Ariane core which had previously gone undetected. AutoSVA2 also allows the generation of Register Transfer Level (RTL) for a FIFO queue based on a fifty-word specification. To illustrate the process, here is an excerpt from the paper describing the workflow:

  1. Start with a high-level specification in English
  2. The LLM generates a first version of the RTL based on the specification, the module interface, and an order to generate synthesizable Verilog
  3. AutoSVA2 generates an FPV Testbench (FT) based on the RTL
  4. JasperGold evaluates the FT
  5. The engineer audits and fixes the SVA
  6. The LLM generates a new version of the RTL after appending the SVA to the previous prompt.
  7. Steps 3 to 6 are then repeated until convergence: either (a) full proof and coverage of the FT or (b) a plateau in the improvements of the RTL and SVA.
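The workflow above is essentially a loop with two exit conditions. The sketch below shows only that control flow. It is not the AutoSVA2 implementation: every callable (`generate_rtl`, `generate_ft`, `run_fpv`, `audit_sva`) is a hypothetical placeholder for the LLM, the testbench generator, the formal tool, and the engineer respectively. Treating "plateau" as "coverage did not improve since the previous round" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class FpvResult:
    proven: bool      # True when every property in the testbench is proven
    coverage: float   # fraction of coverage reached, 0.0 to 1.0


def refine_rtl(spec, generate_rtl, generate_ft, run_fpv, audit_sva, max_rounds=5):
    """Iterate steps 3 to 6 until full proof and coverage, or a plateau.

    All four callables are hypothetical stand-ins supplied by the caller.
    """
    prompt = spec                       # step 1: English specification
    rtl = generate_rtl(prompt)          # step 2: first RTL version
    best_coverage = -1.0
    history = []
    for _ in range(max_rounds):
        ft = generate_ft(rtl)           # step 3: FPV testbench from the RTL
        result = run_fpv(rtl, ft)       # step 4: formal tool evaluates it
        history.append(result)
        if result.proven and result.coverage >= 1.0:
            break                       # exit (a): full proof and coverage
        if result.coverage <= best_coverage:
            break                       # exit (b): no improvement (assumed plateau test)
        best_coverage = result.coverage
        fixed_sva = audit_sva(ft)       # step 5: engineer audits and fixes the SVA
        prompt = prompt + "\n" + fixed_sva
        rtl = generate_rtl(prompt)      # step 6: regenerate RTL with SVA appended
    return rtl, history
```

The `max_rounds` cap is an extra safeguard that is not part of the published steps; it only keeps the sketch from looping indefinitely if neither exit condition is met.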

This process differs significantly from the role of a designer or verification engineer. GPT-4’s creativity allows it to generate SVA from buggy RTL, but it can also produce buggy SVA for correct RTL. Reproducibility is a challenge, and internal signals, timing, syntax, and semantics may be partially incorrect, which the rules mentioned above only partly correct.

On the positive side, AutoSVA2-generated properties improved coverage of RTL behavior by up to six times over AutoSVA-generated ones with less effort and exposed an undiscovered bug. The authors think that the approach has the potential to expand the adoption of FPV and pave the way for safer LLM-assisted RTL design methodologies. The Times They Are A-Changin’?



Automotive Functional Safety (FuSa) Challenges

by Daniel Payne on 04-29-2025 at 10:00 am

Modern vehicles have become quite sophisticated, like a supercomputer on wheels. They integrate a vast number of electronic components, including thousands of chips, to deliver advanced functionalities ranging from infotainment to critical safety systems. This increasing complexity necessitates a robust approach to automotive Functional Safety (FuSa), a discipline focused on ensuring the absence of unreasonable risk due to hazards caused by malfunctioning electrical and electronic (E/E) systems in road vehicles. For engineers in the automotive market, understanding and implementing FuSa principles is essential to designing safe, reliable, and trustworthy vehicles.

Industry Standards and Cooperation

Ensuring uniform safety across the automotive industry requires collaboration and adherence to common frameworks. The International Organization for Standardization (ISO), in partnership with auto manufacturers (OEMs) and their suppliers, has developed key standards, most notably ISO 26262. This standard governs the functional safety of E/E systems within road vehicles. A critical element of ISO 26262 is the Automotive Safety Integrity Level (ASIL) classification, which ranges from ASIL-A (least stringent) to ASIL-D (most stringent). ASIL-D serves as a crucial benchmark for assessing and guaranteeing the reliability and safety of systems-on-chip (SoCs) and 3D integrated circuits (ICs) in applications where a failure could lead to severe consequences. OEMs collaborate with engineers to incorporate automotive-grade IP alongside ISO 26262-certified design and testing methodologies to create SoCs that meet stringent safety requirements.

Beyond ISO 26262, other standards contribute to vehicle safety. The Institute of Electrical and Electronics Engineers (IEEE), through its committee on IEEE P2851, sets guidelines for the design, implementation, and evaluation of safety-critical systems. These standards outline essential methods, description languages, data models, and databases that can be utilized across the industry in a technology-agnostic manner, contributing to safer vehicles and potentially reducing costs associated with redesigns and recalls due to safety issues. These standards also facilitate data exchange and interoperability throughout the vehicle’s lifecycle and evolve alongside emerging technologies like AI. The integration of more technology inherently introduces more risk, underscoring the importance of these safety frameworks.

Balancing Risk and Reward in Feature Integration

The integration of advanced technologies, such as Artificial Intelligence (AI) components, into modern vehicles offers numerous benefits, including park assist and real-time analysis of driving conditions. This integration requires balancing the potential rewards against the inherent risks. High-performance SoCs that handle AI workloads consume more power, impacting energy efficiency, particularly in electric vehicles. 3D ICs introduce thermal management challenges that require effective cooling methods to ensure reliability and longevity, which is especially critical for the battery life and thermal stability of electric vehicles.

The addition of more chips and safety features inherently increases system complexity, leading to a higher risk of failure. Data security is another significant concern that must be addressed proactively. Material costs associated with these new technologies can impact profit margins and vehicle affordability. Functional safety necessitates a careful balancing act where OEMs must weigh safety mechanisms against budget constraints, performance requirements, and security considerations.

Security as an Indispensable Element of Safety

A fundamental principle in modern automotive engineering is that if the technology is not secure, it is not safe. To ensure reliable and predictable vehicle operation, tamper-proof data transfer among the numerous sensors and components is essential. Achieving this requires a comprehensive security approach that integrates security considerations from the earliest stages of design. This includes incorporating the extensive security expertise gained in the networking domain over the past three decades into in-vehicle network architectures. Recommended security measures encompass encryption for data in transit and at rest, multi-factor authentication, secure communication protocols, and regular security audits.

Hardware-based security features are vital in defending against potential threats. These features include secure enclaves, Trusted Execution Environments (TEEs), and Intrusion Detection and Prevention Systems (IDPS) that protect sensitive data and system integrity. The use of Hardware Security Modules (HSMs) and secure boot processes ensures that only authenticated and untampered firmware and software can operate within the vehicle’s Electronic Control Units (ECUs). Adherence to the ISO 21434 standard is also crucial for comprehensive vehicle security, as it covers the entire vehicle lifecycle, emphasizing risk management, organizational and technical requirements, and continuous monitoring. Since the components governing security also rely on chips, their safe operation must be ensured through proactive measures like predictive maintenance.

Proactive Reliability through Predictive Maintenance

Predictive maintenance leverages advanced analytics and machine learning algorithms to anticipate potential failures before they occur. This proactive approach can be applied to any part of the vehicle and is increasingly being used at the silicon level to predict chip degradation. Predictive maintenance techniques can monitor the health of critical components such as an engine’s electronic control unit (ECU) or the battery management system (BMS) in electric vehicles, allowing for timely maintenance before actual failures.

Achieving optimal results with predictive maintenance requires the vehicle operating system to analyze vast amounts of data using advanced technologies capable of identifying patterns and predicting potential failures with high precision. This often involves leveraging edge computing to process data locally within the vehicle and cloud computing to aggregate and analyze data at a larger scale. Advanced machine learning models are trained on both historical and real-time data to recognize early indicators of component degradation, such as subtle rises in operating temperature preceding a chip failure. A comprehensive framework for managing and effectively utilizing this extensive data is essential to fully realize the benefits of predictive maintenance.
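As a rough illustration of what an "early indicator" check might look like, the sketch below flags a sustained rise in temperature readings relative to an initial baseline. It is a deliberately simple threshold rule standing in for the trained machine learning models the text describes. The function name, window sizes, and the 2.0 °C margin are all assumptions chosen for the example, not values from any vendor's tooling.

```python
from statistics import mean


def rising_trend(samples, baseline_n=20, recent_n=5, margin_c=2.0):
    """Return True when recent readings run hotter than the baseline.

    Compares the mean of the last `recent_n` samples against the mean of
    the first `baseline_n` samples and flags a difference above
    `margin_c` degrees Celsius. All thresholds are illustrative.
    """
    if len(samples) < baseline_n + recent_n:
        return False  # not enough history to judge
    baseline = mean(samples[:baseline_n])
    recent = mean(samples[-recent_n:])
    return recent - baseline > margin_c


if __name__ == "__main__":
    steady = [70.0] * 30
    drifting = [70.0] * 20 + [71.0, 72.0, 73.0, 74.0, 75.0]
    print(rising_trend(steady))    # no drift in the readings
    print(rising_trend(drifting))  # recent mean is 3 degrees above baseline
```

A check like this could run at the edge on each monitor's data stream, with the heavier model training and fleet-wide aggregation left to the cloud, which matches the edge/cloud split described above.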

Silicon Lifecycle Management

Silicon Lifecycle Management (SLM) provides a comprehensive strategy for managing the data and processes associated with the maintenance and service of vehicle components throughout their entire lifecycle. By integrating SLM with predictive maintenance, cybersecurity measures, and adherence to industry standards, manufacturers can ensure that maintenance activities are not only timely but also aligned with the overall vehicle service strategy.

Synopsys offers a broad portfolio of standards-based, automotive-grade IP, including interface, processor, security, and foundation IP, which is compliant with industry standards to accelerate SoC-level design and qualification. Synopsys also provides a comprehensive suite of integrated, standards-based SLM tools, IP, and methodologies that offer observability, analytics, and automation at the silicon level. Its Process, Voltage, and Temperature (PVT) Monitor IP, for example, is certified as ASIL-B ready and meets the AEC-Q100 Grade 2 standard. By gathering data at every stage of the product lifecycle, Synopsys’ SLM IP facilitates continuous analysis and provides actionable insights, improving design efficiency and quality while also predicting in-field chip degradation or failure. These automotive-grade IPs and the continuous insights derived from SLM are crucial for ensuring the long-term functional safety of modern vehicles.

Synopsys’ SLM product family continuously monitors critical silicon metrics like voltage, temperature, margins, and health, enhancing the reliability and performance of ECUs, CPUs, GPUs, and other architectures through real-time edge and cloud analytics. This enables the assessment of silicon aging and reliability, providing valuable insights into both systemic and random defects and facilitating predictive maintenance to extend silicon lifecycle and reduce costs. Synopsys’ electronic digital twins (eDT) further support automotive development by enabling the validation of each step from SoC to ECU to E/E architecture before production, optimizing for performance, safety, reliability, quality, and security.

Summary

Ensuring automotive functional safety in today’s complex vehicles demands a holistic, silicon-to-systems approach. While modern safety features and sensors can reduce human errors, they also introduce new levels of complexity and risk. To maintain and enhance functional safety, the industry must continue to promote and refine essential standards, rigorously ensure the security of data within and around the vehicle, and leverage comprehensive approaches that provide end-to-end monitoring, verification, and predictability from the silicon level up to the complete vehicle system. Companies like Synopsys play a lead role by providing the necessary IP, tools, and methodologies to navigate these challenges and build safer, more reliable software-defined vehicles.


Scaling AI Infrastructure with Next-Gen Interconnects

by Kalar Rajendiran on 04-29-2025 at 6:00 am

At the recent IPSoC Conference in Silicon Valley, Aparna Tarde gave a talk on the importance of Next-Gen Interconnects to scale AI infrastructure. Aparna is a Sr. Technical Product Manager at Synopsys. A synthesis of the salient points from her talk follows.

The rapid advancement of artificial intelligence (AI) is fundamentally reshaping the requirements for data center infrastructure. Meta’s Llama 3, for instance, has 405 billion parameters and was trained on 15.6 trillion tokens using 16,000 Nvidia H100 GPUs over 70 days. These massive workloads demand not only immense compute resources but also unprecedented memory capacity and ultra-fast interconnects to sustain performance growth. To meet performance requirements, systems now rely heavily on efficient XPU-to-XPU communication. This demands high-bandwidth, low-latency, and energy-efficient interconnects capable of synchronizing large-scale compute clusters.

Memory requirements are also becoming a challenge; for example, Llama 3.1 requires 854 GB of memory per model instance, while current GPUs like the Nvidia H200 offer only 141 GB, necessitating either memory scaling or advanced compression. To accommodate the growing needs of generative AI, data center architecture is being redefined. Traditional homogeneous resource pools are being disaggregated, while heterogeneous resources are aggregated to meet specific workload demands.
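The memory gap can be made concrete with a quick calculation. The sketch below uses the 854 GB and 141 GB figures quoted above to find the minimum number of GPUs whose combined memory could hold one model instance. The weight-size estimate assumes 16-bit (2-byte) parameters, which is an assumption for illustration: the talk does not say how the 854 GB figure was derived.

```python
import math


def min_gpus(model_gb, gpu_gb):
    """Smallest GPU count whose combined memory holds one model instance."""
    return math.ceil(model_gb / gpu_gb)


def weight_gb(parameters, bytes_per_param):
    """Raw weight footprint in decimal gigabytes."""
    return parameters * bytes_per_param / 1e9


if __name__ == "__main__":
    # Figures quoted in the text: 854 GB per instance, 141 GB per H200.
    print(min_gpus(854, 141))
    # Assumption: 405 billion parameters stored at 2 bytes each.
    print(weight_gb(405_000_000_000, 2))
```

At least seven such GPUs are needed per instance on capacity alone, before accounting for any per-device overhead, which is why the interconnect between them becomes the limiting factor.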

Higher bandwidth and new communication protocols are becoming essential, especially as longer communication reaches drive the shift from copper to optical links. AI workloads also impose strict requirements for low power consumption and latency.

The Role of Interfaces in AI and HPC SoCs

Efficient data movement within and between SoCs is becoming the cornerstone of AI scalability. At the same time, no single interface can address all AI workload requirements. Instead, a mix of standards is evolving to meet various demands: PCIe and CXL manage general-purpose and memory-centric workloads, UALink handles intra-node compute scaling, and Ultra Ethernet supports inter-node communication across large clusters. Together, these interfaces create a cohesive and open ecosystem, supplanting legacy proprietary networks and enabling more flexible and scalable infrastructure.

UALink: Scaling Up AI Performance

UALink is purpose-built to meet the demands of AI scale-up, offering a lightweight, high-bandwidth, open-standard interconnect optimized for XPU-to-XPU resource sharing and synchronization across up to 1,024 accelerators. Unlike PCIe 7.0, which lacks remote memory access, or CXL 3.x, which is slowed by complexity and adoption lag, UALink delivers streamlined performance without the overhead. By replacing proprietary networks and complementing existing standards, UALink provides the efficient, scalable backbone needed for next-gen AI systems.

Ultra Ethernet: Scaling Out AI Across the Data Center

As AI models grow more distributed, scale-out networking becomes critical. Ultra Ethernet is designed to meet this challenge, providing high bandwidth, multi-path routing, and low-latency communication among more than 1 million endpoints. Unlike traditional Ethernet, which struggles with configuration complexity, network congestion, and out-of-order delivery, Ultra Ethernet is optimized for simplicity and performance.

The Optical Shift: From Copper to Co-Packaged Optics

With data movement becoming a primary concern, the limitations of copper are more apparent than ever. Different interconnect media now serve distinct roles in the data center: PCB traces for intra-device links, copper for intra-rack communication, and optics for inter-rack distances. Co-Packaged Optics (CPO) is the next step in this evolution, integrating silicon and optical components within a single package to dramatically reduce power consumption and latency. Network switches bear the brunt of AI-related bandwidth scaling and are expected to be the early adopters of CPO.

Multi-Die Packaging: Enabling the Next Generation of AI SoCs

As single-die integration becomes insufficient for AI workloads, multi-die packaging technologies like 2.5D, 3D, and 3.5D are becoming mainstream. These techniques allow for metal-to-metal die connections that significantly reduce latency and power while increasing interconnect bandwidth. Though more complex and costly due to thermal modeling and bonding challenges, 3D stacking enables re-usable, scalable chiplet architectures. UCIe 2.0 is suitable for 2D and 2.5D integration, while UCIe-3D and 3DIO PHY support vertical die-to-die communication for 3D stacking. These approaches are essential for AI SoCs that require close cache-to-core coupling and high-speed D2D links.

Summary

AI infrastructure is undergoing a massive transformation driven by the need for more performance, efficiency, and scalability. Next-generation interconnects like UALink for scale-up and Ultra Ethernet for scale-out, combined with advanced optical links and multi-die packaging, form the foundation of tomorrow’s AI-powered data centers. Synopsys is enabling the buildup of next-gen AI infrastructure through its comprehensive and performance-optimized high-speed interface IP portfolio.

Learn more about Synopsys HPC IP Solutions

Also Read:

The Growing Importance of PVT Monitoring for Silicon Lifecycle Management

Achieving Seamless 1.6 Tbps Interoperability for High BW HPC AI/ML SoCs: A Technical Webinar with Samtec and Synopsys

SNUG 2025: A Watershed Moment for EDA – Part 1


Siemens Describes its System-Level Prototyping and Planning Cockpit

by Mike Gianfagna on 04-28-2025 at 10:00 am

We all know semiconductor design is getting harder. Much harder when you consider the demands of AI workloads and heterogeneous integration of many chiplets in a single package. This class of system demands co-optimization across the entire design flow. For example, functional verification, thermal analysis, signal and power integrity, electromigration, and IR drop all need to be balanced across a complex process of die and package co-design. Data management and tool integration are particularly vexing here.

Against this backdrop, Siemens Digital Industries Software has published an eBook that flattens this class of problem. The company illustrates how its approach works using Intel Foundry’s EMIB packaging technology. If you face any type of complex chip design, this eBook is a must-read. Don’t let the category scare you off. The eBook isn’t long, but it’s packed with solid examples of how to tame complex chip design. A link is coming, but first let’s examine how Siemens describes its system-level prototyping and planning cockpit.

About the Authors

Two exceptional gentlemen with substantial background in the problems addressed in this eBook are the authors.

Keith Felton

Keith Felton has over 30 years of experience developing and marketing advanced tools and supporting customers in the use of those tools for complex chip design, PCB design, and high-density advanced packaging. He has worked at companies such as Cadence, Viewlogic and Zuken-Redac as well as Siemens. He has also led partnerships across the semiconductor ecosystem.

Mike Walsh

Mike Walsh has over 30 years of experience helping customers around the world to design challenging advanced packages. He has broad knowledge of the system planning and design process and expertise in system-in-package (SiP), interposers, 3D IC, wafer-level integration, and multi-substrate solutions.

The insights offered by these authors are eye-opening, relevant and quite valuable.

About Intel Foundry’s EMIB Technology

Embedded multi-die interconnect bridge (EMIB) is a semiconductor packaging technology developed by Intel Foundry that uses a small, embedded silicon bridge to interconnect multiple dies, or chiplets, within a single package. In contrast to large silicon interposers, the EMIB bridge only spans the area needed to connect the specific dies, making it a more compact and cost-effective solution.

EMIB facilitates integration of multiple dies in a single package with the ability to have multiple EMIB bridges. This approach provides a good example of how to deploy an integrated chip/package flow since the increased design complexity intrinsic to EMIB technology shifts more of the challenges to the package level.

The Design Challenges

Design challenges include high pin counts, integration of diverse components, and providing an accurate representation of the EMIB structure for package design tools. Because EMIB is a passive bridge without active silicon, defining the EMIB component modules and setting up constraints to achieve low latency design rule checks (DRC) is crucial. Power delivery to the EMIB bridge is a primary design concern, requiring point-to-point connections and sufficient power distribution.

Typical advanced packaging workflows present several challenges. These flows include design and analysis tools from different vendors, creating disconnected manual processes. An environment like this requires importing and exporting a lot of data, which leads to many data iterations that can introduce errors. This approach can also be time-consuming, tempting designers to skip steps such as functional simulation. But skipped steps mean undetected connectivity errors, and those produce non-functional designs.

The diagram below illustrates the complexity and interdependence of the process.

Advanced Package Workflow Challenges

The Siemens System-Level Prototyping and Planning Cockpit

The eBook describes the Siemens approach to designing for Intel Foundry’s EMIB technology. This is accomplished by defining and driving everything from a single digital twin model of the entire advanced package assembly, which is constructed and managed by the Siemens Innovator3D IC™ solution.

Innovator3D IC uniquely represents the entire system, including dies, chiplets, interposers, EMIBs, packages, and PCBs within a single environment. It builds a cohesive view of the system by consuming data in a variety of formats and different levels of completeness. This unified view enables the creation of application-specific data sets, which are then pushed into other tools like Calibre® 3DSTACK, Calibre 3DThermal, Aprisa™, and Tessent™ from Siemens.

Innovator3D IC also leverages these tools in a predictive manner. For example, with Calibre 3DThermal, early insights into thermal performance can guide floor planning adjustments and corrective actions before serious issues arise. Even incomplete data, like preliminary power or heat sink information, can be used to overlay results back into Innovator3D IC, providing valuable feedback for optimization.

By providing a platform that integrates all these tools and workflows, Innovator3D IC ensures early issue identification and seamless collaboration across the design flow, ultimately improving efficiency and design quality.

The eBook presents details of a six-step process to perform the complete, integrated design in one unified environment. The descriptions are clear and easy to follow. I highly recommend getting your copy of this eBook. The link is coming soon. The six steps detailed are:

  • Step #1 – Die, EMIB, and package co-design
  • Step #2 – Functional verification of system-level connectivity
  • Step #3 – Early predictive thermal analysis
  • Step #4 – Physical layout using Xpedition Package Designer
  • Step #5 – SI/PI/EM analysis and extraction using HyperLynx
  • Step #6 – 3D assembly verification using Calibre 3DSTACK

To Learn More

I have just scratched the surface of what you will learn from this Siemens eBook. If complex chip/package co-design is on your mind, and especially if you are considering Intel Foundry’s EMIB technology, you need to get your own copy.

You can access your copy of Reference workflows for Intel Foundry EMIB and EMIB-T integration platforms here.

You can also learn more about the various tools from Siemens Digital Industries Software that comprise this unique and well-integrated flow here:

And that’s how Siemens describes its system-level prototyping and planning cockpit.


Recent AI Advances Underline Need to Futureproof Automotive AI

by Bernard Murphy on 04-28-2025 at 6:00 am

The world of AI algorithms continues to advance at a furious pace, and no industry is more dependent on those advances than automotive. While media and analysts continue to debate whether AI will deliver value in business applications, there is no question that it adds value to cars, in safety, some level of autonomous driving, and in comfort. But there’s a tension between these rapid advances and the 15-year nominal life of a car. Through that lifetime, software and AI models must be updated at service calls or through over-the-air updates. Such updates are now relatively routine for regular software, but AI advances are increasing stress on NPU architectures even more than they have in the past.

A BEVDepth algorithm (courtesy of GitHub)

From CNNs to Transformers to Fusion

For most of us CNNs were the first big breakthrough in AI, amply served by a matrix/vector engine (MACs) followed by a bit of ALU activity to wrap up the algorithm. Hardware was just that – a MAC array and a CPU. Transformers made this picture messier. The attention part is still handled by a matrix engine but the overall flow goes back and forth between matrix, vector and scalar operations. Still manageable in common NPUs with three engines (MAC, DSP, CPU) but traffic between these engines increases, adding latency unless the NPU and the model are optimized carefully for minimal overhead. Now add in fusion, depending on special operators which must be custom coded in C++ to run on the CPU. The important metric, inferences per second, depends heavily on model designer/implementor expertise.

This raises two important questions for any OEM or Tier1 planning around an NPU selection. First, what kind of performance can they expect for advanced algorithms for next generation designs? Second, will they need NPU provider expertise to code and optimize their proprietary/differentiating algorithms? Revealing company secrets is only part of the problem. The NPU market is still young and likely volatile, unlike the established CPU market. After committing to an NPU, who will take care of their model evolution needs over the 15-year life of a car?

Predicting what might be needed in the future is impossible, but it is certainly possible to look to model needs on the near horizon for a sense of which NPU architectures might best support adaptation to change. Quadric has an intriguing answer, citing some of the newer AI models such as BEVDepth.

Evolution in Bird’s Eye View (BEV) applications

If you have a relatively modern car you are probably already familiar with Bird’s Eye View as an aid to parallel parking. This is an option on your infotainment screen, an alternative to the backup camera view and the forward-facing camera view. BEV is the screen that shows a view from an imaginary camera floating six feet above the car, amazingly useful to judge how close you are to the car behind, the car in front, and the kerb.

This view is constructed through the magic of optics: multiple cameras around the car in effect project their images onto a focal plane at that imaginary camera location. The images are stitched together, with some adjustment, providing that bird’s-eye view.

Neat and useful, but model designers have larger aspirations than support for parallel parking. BEV is already making its way into some aspects of autonomous driving, especially as a near-range supplement to LIDAR or RADAR. But to be truly useful it needs to extend to a 3D view.

Adding depth information to BEV has stimulated a lot of research. Each camera contributes a different input to the BEV, not just as a slice of that view, but also through differing perspectives and intrinsic properties of the cameras. There are multiple proposed algorithms, of which one is BEVDepth. This algorithm uses point clouds from LIDAR as a reference for transformer-based depth learning around camera images.

An important step in this process involves voxel pooling. Pooling is a familiar step in CNNs, reducing the dimension of an image while preserving important features. Voxels are just the “3D pixels” you would expect in a 2D image (BEV) with depth. Voxel pooling is a complex algorithm and (in the GitHub version) is implemented in CUDA, the well-known NVIDIA programming standard. At nearly 200 lines of CUDA, this is not a simple operator to be added easily to the ONNX standard operator set. Further, I am told this operation accounts for 60% of the compute cost of BEVDepth and must run on an ALU. Could you implement this on a regular NPU? Probably, but apparently other NPU experts still haven’t delivered performant versions, while Quadric has already demonstrated their implementation.
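To make the idea concrete, here is a deliberately simplified sketch of voxel pooling in plain Python. It is not the BEVDepth CUDA kernel. It assumes one common formulation, in which the feature vectors of all points landing in the same BEV cell are summed and the height axis is collapsed. The function name and the toy inputs are illustrative.

```python
def voxel_pool(points, features):
    """Sum-pool per-point features into bird's-eye-view (BEV) cells.

    points   -- list of (x, y, z) integer voxel indices, one per point
    features -- list of equal-length feature vectors, one per point
    Returns a dict mapping each occupied (x, y) cell to its summed feature vector.
    """
    if len(points) != len(features):
        raise ValueError("points and features must have the same length")
    bev = {}
    for (x, y, _z), feat in zip(points, features):
        # Collapse the height axis: every z at the same (x, y) shares one cell.
        cell = bev.setdefault((x, y), [0.0] * len(feat))
        for i, value in enumerate(feat):
            cell[i] += value
    return bev

# Toy input: two points stacked in cell (0, 0), one point in cell (1, 0).
points = [(0, 0, 0), (0, 0, 1), (1, 0, 0)]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(voxel_pool(points, features))
```

The access pattern is a scatter-add with data-dependent addresses, not a dense matrix multiply, which is consistent with the point above that this step does not map onto a MAC array and has to run on an ALU.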

A good fit for Quadric Chimera

You may not remember the key value prop for the Chimera NPU. These can be arranged as systolic arrays, nothing novel there. But each processing element (PE) in the array has MACs, an ALU, and local register memory wrapped in a processor pipeline. In switching between matrix, vector, and scalar operations, there’s no need to move data. Computation of all types can be handled locally as data flows through the array, rather than having to be swapped back and forth between matrix, DSP, and CPU engines.

Sounds good, but does it deliver? Quadric ran a benchmark of the Voxel Pooling algorithm, comparing performance on an Nvidia RTX 3090 chip versus a Quadric QC-Ultra (quad-core), running at the same clock frequency. The Quadric solution ran more than 2 times faster at substantially lower power. And here’s the clincher. While the algorithm is written in CUDA, the only difference between CUDA C++ and Quadric’s C++ is some easily understood memory pointer changes. Quadric was able to port the GitHub code in 2 weeks and claims anyone with C++ experience could have made the same changes. They claim the same applies to any operation which can be written in C++.

The takeaway is that a model as advanced as BEVDepth, supported by a key function written in CUDA, was easily mapped over to the Quadric platform and ran twice as fast as the same function running on an Nvidia chip at substantially lower power. Faster of course because Chimera is designed for IoT inferencing rather than heavy-duty training. Much lower power for the same reason. And programming is easily managed by an OEM or Tier1 C++ programmer, ensuring that models can be maintained and upgraded long-term over the life of an automotive product line.

Staying current with AI innovation is a challenge in all markets, but none more so than in automotive. The NPU architecture you want to bet on must allow you to upgrade models in ways you can’t yet predict over 15-year lifespans. You need a solution your own software programmers can manage easily yet which offers all the performance advantages you expect from an NPU. You might want to check out Quadric’s website.

Also Read:

2025 Outlook with Veerbhan Kheterpal of Quadric

Tier1 Eye on Expanding Role in Automotive AI

A New Class of Accelerator Debuts


Podcast EP285: The Post-Quantum Cryptography Threat and Why Now is the Time to Prepare with Michele Sartori

by Daniel Nenni on 04-25-2025 at 10:00 am

Dan is joined by Michele Sartori – senior product manager at PQShield. Michele is a software engineer in Computer and Network Security, specializing in product management. He is a passionate tech team leader at the forefront of emerging technologies focused on achieving tangible results.

In this highly informative discussion, Dan explores the details of preparing for post-quantum cryptography with Michele. Michele explains why the time to begin preparation for these changes is NOW. He describes what needs to be done in key areas such as performance, integration and, of course, security. He explains how to develop a three-step plan to prepare the enterprise for important changes that will become a mandate by 2030.

Michele also explains what risks exist today, even before quantum computers have reached the required performance level to pose a real threat. He describes current and future cryptography and security strategies and the work PQShield is doing across the ecosystem to help organizations prepare for the coming changes.

Contact PQShield

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Cost-Effective and Scalable: A Smarter Choice for RISC-V Development

by Daniel Nenni on 04-25-2025 at 6:00 am

Vast library of 90+ Prototype Ready IP

The RISC-V ecosystem is witnessing remarkable growth, driven by increasing industry adoption and a thriving open-source community. As companies and developers seek customizable computing solutions, RISC-V has become a top choice. Providing a scalable and cost-effective ISA foundation, RISC-V enables high-performance and security-enhanced implementations, making it ideal for next-generation digital infrastructure.

RISC-V’s modular ISA and growing ecosystem support a wide range of configurations, making it highly adaptable across applications. Designers have options to integrate extensions such as vector processing, floating-point, atomic operations, and compressed instructions. Furthermore, its scalability spans from single-core to multi-core architectures and can incorporate optimizations like out-of-order execution to enhance performance. To achieve an optimal balance of performance, power efficiency, and scalability, selecting the right RISC-V microarchitecture and system integration strategy is crucial.

For entry-level RISC-V development, single-core open-source implementations are well-suited for small to medium capacity FPGA-based platforms. Among them, Xilinx VU9P-based solutions, such as the VCU118 development board, have been widely adopted by engineers for their balanced capabilities and accessibility. The S2C Prodigy S7-9P Logic System takes this foundation even further. Built on the same powerful VU9P FPGA with 14M ASIC gates, it enhances usability, expandability, and cost-efficiency. With seamless integration of daughter cards and an advanced toolchain, the S7-9P offers an ideal fit for small to medium-scale RISC-V designs, empowering developers to accelerate their innovation with confidence.

Media-Ready Prototyping: MIPI and HDMI for Real-World Applications

As multimedia processing becomes increasingly integral to RISC-V applications, the demand for high-speed data handling and versatile prototyping tools has never been greater. The S2C Prototyping systems meet this need with support for MIPI and HDMI via optional external daughter cards, making them an ideal choice for smart displays, AR/VR systems, and AI-powered cameras. For example, if you’re developing a RISC-V-based smart camera, a complete prototyping environment, from capturing images via MIPI D-PHY to displaying output through HDMI, can be deployed with ease. The flexible expansion options allow developers to experiment with various configurations, refine their designs, and push the boundaries of RISC-V media applications.

High-Speed Connectivity: QSFP28 Ethernet for Next-Gen Networking

With networking requirements becoming more demanding, high-speed connectivity is crucial for RISC-V-based applications. The S7-9P rises to this challenge with built-in QSFP28 Ethernet support, enabling 100G networking applications. This makes it an optimal choice for developing and testing RISC-V-based networking solutions, including routers, switches, and edge AI processing units.

Need More Scalability?

While the S7-9P is an excellent choice for entry-level to mid-range RISC-V prototyping, more complex designs may require greater capacity. For high-end verification and large-scale projects, S2C also offers advanced solutions like the VU440 (30M ASIC gates), VU19P (49M ASIC gates), and VP1902 (100M ASIC gates), providing the scalability needed for RISC-V subsystems, multi-core, AI, and data-intensive applications.
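The capacity figures above lend themselves to a simple sizing check. The sketch below picks the smallest listed FPGA whose capacity covers a design's estimated ASIC gate count. The 70% usable-capacity margin is an assumption made for illustration, not an S2C recommendation, and the function name is hypothetical. Designs that exceed the largest single device would need multi-FPGA partitioning, which this sketch does not model.

```python
# Capacities in millions of ASIC gates, taken from the figures quoted in the article.
PLATFORMS = [("VU9P", 14), ("VU440", 30), ("VU19P", 49), ("VP1902", 100)]

def pick_platform(design_mgates, utilization=0.7):
    """Return the smallest FPGA that fits the design, or None if no single device does.

    utilization is the assumed usable fraction of the quoted capacity; the 0.7
    default is an illustrative margin, not a vendor figure.
    """
    for name, capacity in PLATFORMS:  # list is ordered smallest to largest
        if design_mgates <= capacity * utilization:
            return name
    return None

for gates in (8, 20, 60, 90):
    print(gates, "M gates ->", pick_platform(gates))
```

Tightening or loosening the margin shifts the break points, so treat the result as a starting point for a conversation with the vendor, not a sizing rule.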

Special Offer: Save 25% on the Prodigy S7-9P Bundle

For a limited time, get the Prodigy S7-9P bundle—which includes a free Vivado license (valued at $5,000+)—for just $14,995, a 25% savings! Visit S7-9P for information or Contact our team to find the perfect fit for your project.

Also Read:

S2C: Empowering Smarter Futures with Arm-Based Solutions

Accelerating FPGA-Based SoC Prototyping

Unlocking SoC Debugging Challenges: Paving the Way for Efficient Prototyping


High-speed PCB Design Flow

by Daniel Payne on 04-24-2025 at 10:00 am

PCB design phases

High-speed PCB designs are complex, often requiring a team with design engineers, PCB designers and SI/PI engineers working together to produce a reliable product, delivered on time and within budget. Cadence has been offering PCB tools for many years, and they recently wrote a 10-page white paper on this topic, so I’ll share what I learned. The promise is that using early identification and resolution of SI and PI challenges will shorten the overall time to market.

The three PCB design steps are Schematic, Layout, and Post-layout and Signoff. If your EDA tool flow includes in-design analysis, then the team can find and fix SI and PI issues earlier and with more accuracy.

Collaboration across teams means that an EE can define the high-speed constraints at the schematic stage with little need for an SI expert. Layout designers use visualization tools to see SI/PI issues quickly in their tools. Handoffs between team members are made efficient by in-tool feedback.

The Power Distribution Network (PDN) can be analyzed for issues like IR drop under DC operating conditions, enabling decisions on current density and specifying copper weight and thicknesses. You can visualize DC drop analysis in Cadence tools.

DC drop analysis
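For intuition about why copper weight matters for DC drop, here is a first-order estimate using Ohm's law and the resistance of a uniform rectangular trace, R = ρL / (w·t). This is a hand calculation, not what a field solver does. The resistivity and the roughly 35 µm per ounce thickness are approximate textbook values, and the example rail (2 A, 50 mm long, 2 mm wide) is hypothetical.

```python
COPPER_RESISTIVITY = 1.68e-8  # ohm*m, approximate room-temperature value
OZ_THICKNESS_M = 35e-6        # approx. foil thickness per 1 oz/ft^2 of copper

def trace_resistance(length_m, width_m, copper_oz=1.0):
    """DC resistance of a uniform rectangular trace: R = rho * L / (w * t)."""
    thickness_m = copper_oz * OZ_THICKNESS_M
    return COPPER_RESISTIVITY * length_m / (width_m * thickness_m)

def ir_drop(current_a, length_m, width_m, copper_oz=1.0):
    """Voltage drop along the trace from Ohm's law: V = I * R."""
    return current_a * trace_resistance(length_m, width_m, copper_oz)

# Hypothetical rail: 2 A through a trace 50 mm long and 2 mm wide.
for oz in (1.0, 2.0):
    drop_mv = ir_drop(2.0, 0.050, 0.002, oz) * 1e3
    print(f"{oz:.0f} oz copper: {drop_mv:.1f} mV")
```

Doubling the copper weight halves the drop in this model. Real planes have non-uniform current density, vias and temperature rise, which is where tool-based analysis comes in.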

During transient operation the PCB design encounters high-frequency switching currents that couple with inductance to create voltage noise. Adding decoupling capacitors and minimizing inductance are ways to mitigate this noise. AC power analysis tools simulate transient responses from the PCB, along with power noise and impedance profiles so that each component has stable and clean power.

AC Analysis
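Two rules of thumb sit behind this kind of analysis: the noise voltage across a parasitic inductance, V = L·(di/dt), and a target impedance for the PDN, Z = (Vdd × allowed ripple) / transient current. Neither formula comes from the white paper. They are standard first-order estimates, and all numbers in the sketch are hypothetical.

```python
def inductive_noise(inductance_h, delta_i_a, delta_t_s):
    """Noise voltage across a parasitic inductance: V = L * di/dt."""
    return inductance_h * delta_i_a / delta_t_s

def target_impedance(vdd_v, ripple_fraction, transient_current_a):
    """Largest PDN impedance that keeps ripple within the allowed budget."""
    return vdd_v * ripple_fraction / transient_current_a

# Hypothetical: 1 nH of loop inductance, 1 A switching in 1 ns.
print(f"noise: {inductive_noise(1e-9, 1.0, 1e-9):.2f} V")

# Hypothetical: 1.0 V rail, 5% ripple budget, 10 A transient load.
print(f"target impedance: {target_impedance(1.0, 0.05, 10.0) * 1e3:.1f} mOhm")
```

A single nanohenry produces a full volt of noise in this example, which illustrates why reducing loop inductance and placing decoupling capacitors close to the load are the usual mitigations.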

High-speed datalinks are commonly used for PCIe, Ethernet, USB and UCIe designs, so care is required to manage channel losses and via effects, and to pass compliance testing. Vias can add undesired discontinuities, create impedance mismatches, degrade signals through inductance and capacitance effects, cause stub resonance and even add return path discontinuities. Engineers can now design, view and validate via structures early on with the Aurora Via Wizard.

Aurora Via Wizard
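Stub resonance can be estimated by hand before any tool is involved. An unterminated via stub behaves roughly like a quarter-wave resonator, so its first resonance is near f = c / (4 · L_stub · sqrt(εr)). This approximation is an assumption brought in for illustration and is not taken from the white paper. The stub length and dielectric constant below are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def stub_resonance_hz(stub_length_m, dielectric_constant):
    """Quarter-wave estimate of a via stub's first resonant frequency."""
    return SPEED_OF_LIGHT / (4.0 * stub_length_m * math.sqrt(dielectric_constant))

# Hypothetical: 1.5 mm stub in a dielectric with relative permittivity 4.0.
f_res = stub_resonance_hz(1.5e-3, 4.0)
print(f"first stub resonance near {f_res / 1e9:.1f} GHz")
```

Shortening the stub, for example by back-drilling, pushes the resonance higher and away from the signal band. A 3D solver is still needed for an accurate answer because pads and antipads load the stub.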

Traces at high frequencies exhibit losses from conductor resistance, dielectric absorption and the roughness of copper traces. Designers can choose low-loss dielectrics, optimize the trace geometry and maintain a continuous ground plane under signal traces to mitigate these losses. To simulate different dielectric materials, the Sigrity X Topology Explorer Workbench comes into play. For SerDes interfaces there are the Compliance Analysis tools to validate a design early, adjust signal paths and pass protocol specifications.
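One reason conductor loss and copper roughness grow with frequency is the skin effect: current crowds into a thin layer at the conductor surface with depth δ = sqrt(ρ / (π·f·μ0)). The sketch below evaluates that textbook formula for copper. It is offered as background intuition and is not drawn from the white paper. The resistivity is an approximate room-temperature value.

```python
import math

COPPER_RESISTIVITY = 1.68e-8  # ohm*m, approximate room-temperature value
MU_0 = 4.0e-7 * math.pi       # H/m, permeability of free space

def skin_depth_m(frequency_hz, resistivity=COPPER_RESISTIVITY):
    """Skin depth of a non-magnetic conductor: sqrt(rho / (pi * f * mu0))."""
    return math.sqrt(resistivity / (math.pi * frequency_hz * MU_0))

for f_ghz in (1, 4, 16):
    depth_um = skin_depth_m(f_ghz * 1e9) * 1e6
    print(f"{f_ghz:>2} GHz: {depth_um:.2f} um")
```

At gigahertz frequencies the skin depth is only a micron or two, comparable to the surface roughness of copper foil, so the current follows the rough surface and the effective resistance rises.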

Designing DDR5 interfaces at multi-gigabit speeds is enabled by using Sigrity X Topology Explorer Workbench for parameter sweeps to find the best termination configuration and optimal routing solutions while flagging any timing violations. DDR memory buses can have hundreds of signals, and Sigrity X Aurora helps to automate impedance validation, crosstalk analysis and return path optimization.

Signal quality

Another high-speed design issue is Simultaneous Switching Noise (SSN), which causes ground bounce, increased jitter and timing errors. Cadence has power-aware IBIS and advanced PDN analysis tools to quickly identify these vulnerabilities, guide decoupling capacitor placement and accurately simulate SSN effects. For via-to-via crosstalk issues there are 2.5D and 3D analysis tools for via modeling, along with design recommendations for via shielding and optimized layer transitions.

Cadence Tools

The full high-speed PCB flow is covered by tools that work together from schematic to signoff: Allegro X Design Platform, Sigrity X Platform, Sigrity X Aurora Via Wizard, Sigrity X Topology Explorer Workbench, Clarity 3D.

Summary

High-speed PCB design teams can navigate successfully through the challenges of signal integrity and power integrity by using in-design analysis tools. This approach shortens time to market through tool automation, using distributed computing and making complex concepts easier to understand.

Read the complete white paper from Cadence online.

Related Blogs