Feeding the Beast: The Real Cost of Speculative Execution in AI Data Centers
by Jonah McLeod on 04-30-2025 at 10:00 am


For decades, speculative execution was a brilliant solution to a fundamental bottleneck: CPUs were fast, but memory access was slow. Rather than wait idly, processors guessed the next instruction or data fetch and executed it ‘just in case.’ Speculative execution traces its lineage back to Robert Tomasulo’s work at IBM in the 1960s. His algorithm—developed for the IBM System/360 Model 91—introduced out-of-order execution and register renaming. This foundational work powered performance gains for over half a century and remains embedded in most high-performance processors today.

But as workloads have shifted—from serial code to massively parallel AI inference—speculation has become more burden than blessing. Today’s data centers dedicate massive silicon and power budgets to hiding memory latency through out-of-order execution, register renaming, deep cache hierarchies, and predictive prefetching. These mechanisms are no longer helping—they’re hurting. The effort to keep speculative engines fed has outpaced the benefit they provide.

It’s time to rethink the model. This article explores the economic, architectural, and environmental case for moving beyond speculation—and how a predictive execution interface can dramatically reduce system cost, complexity, and energy use in AI data centers. See Fig. 1, which shows a side-by-side comparison of integration costs per module: predictive interface SoCs eliminate the need for HBM3 and complex speculative logic, slashing integration cost by more than 3×.

When IBM introduced Tomasulo’s algorithm in the 1960s, “Think” was the company’s unofficial motto—a call to push computing forward. In the 21st century, it’s time for a new mindset, one that echoes Apple’s challenge to the status quo: “Think Different.” Tomasulo changed computing for his era. Today, Dr. Thang Tran is picking up that torch—with a new architecture that reimagines how CPUs coordinate with accelerators. Predictive execution is more than an improvement—it’s the next inflection point.

Figure 1: Per-Module Cost Breakdown – Grace Hopper Superchip (GH200) vs. Predictive Interface SoC

Freeway Traffic Analogy: Speculative vs. Predictive Execution

Imagine you’re driving on a crowded freeway during rush hour. Speculative execution is like changing lanes the moment you see a temporary opening—hoping it will be faster. You swerve into that new lane, pass 20 cars… and then hit the brakes. That lane just slowed to a crawl, and you have to switch again, wasting time and fuel with every guess.

Predictive execution gives you a drone’s-eye view of the next 255 car lengths. You can see where slowdowns will happen and where the traffic flow is smooth. With that insight, you plan your lane changes in advance—no jerky swerves, no hard stops. You glide through traffic efficiently, never getting stuck. This is exactly what predictive interfaces bring to chip architectures: fewer stalls, smoother data flow, and far less waste.
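
To put rough numbers behind the analogy, here is a toy Python sketch (not a model of any real pipeline) that counts cycles thrown away by speculative guessing versus a schedule that is known in advance. The prediction accuracy, flush penalty, and branch count are illustrative assumptions, not measurements.

```python
import random

random.seed(0)

FLUSH_PENALTY = 15          # cycles lost per misprediction (illustrative)
PREDICTION_ACCURACY = 0.90
N_BRANCHES = 10_000

# Speculative execution: guess every branch and pay a pipeline flush on each miss.
speculative_waste = sum(
    FLUSH_PENALTY for _ in range(N_BRANCHES)
    if random.random() > PREDICTION_ACCURACY
)

# Predictive execution, as described above: the upcoming schedule is known,
# so no issued work has to be discarded.
predictive_waste = 0

print(f"Speculative cycles discarded: {speculative_waste}")
print(f"Predictive cycles discarded:  {predictive_waste}")
```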

Let’s examine the cost of speculative computing in current hyperscaler designs. The NVIDIA Grace Hopper Superchip (GH200) integrates a 72-core Grace CPU with a Hopper GPU via NVLink-C2C and feeds them using LPDDR5x and HBM3 memory, respectively. While this architecture delivers impressive performance, it also incurs massive BoM costs due to its reliance on HBM3 high-bandwidth memory (96–144 GB), CoWoS packaging to integrate GPU and HBM stacks, deep caches, register renaming, warp scheduling logic, and power delivery for high-performance memory subsystems.

GH200 vs. Predictive Interface: Module Cost Comparison

GH200 Module Component | Cost | Predictive Interface Architecture | Cost
HBM3 (GPU-side) | $2,000–$2,500 | DDR5/LPDDR5 memory (shared) | $300–$500
LPDDR5x (CPU-side) | $350–$500 | Interface control fabric (scheduler + memory coordination) | $100–$150
Interconnect & control logic (NVLink-C2C + PHYs) | $250–$350 | Standard packaging (no CoWoS) | $250–$400
Packaging & power delivery (CoWoS, PMICs) | $600–$1,000 | Simplified power delivery | $100–$150
Total per GH200 module | $3,200–$4,350 | Total per predictive module | $750–$1,200

A Cost-Optimized Alternative

An architecture with a predictive interface eliminates speculative execution and instead employs time-scheduled, deterministic coordination between scalar CPUs and vector/matrix accelerators. This approach eliminates speculative logic (OOO, warp schedulers), makes memory latency predictable—reducing cache and bandwidth pressure—enables use of standard DDR5/LPDDR memory, and requires simpler packaging and power delivery. In the same data center configuration, this would yield a total integration cost of $2.4M–$3.8M, for total estimated savings of $7.8M–$10.1M per deployment.
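
As a sanity check on those deployment totals, here is a small Python sketch. The article does not state how many modules a deployment contains; roughly 3,200 modules is an assumption chosen here because it reproduces the quoted figures from the per-module ranges above.

```python
# Per-module integration cost ranges from the table above (USD).
gh200_module = (3_200, 4_350)
predictive_module = (750, 1_200)

# Assumed module count per deployment (not stated in the article; chosen so the
# totals line up with the quoted $2.4M–$3.8M and $7.8M–$10.1M figures).
MODULES_PER_DEPLOYMENT = 3_200

gh200_total = tuple(c * MODULES_PER_DEPLOYMENT for c in gh200_module)
predictive_total = tuple(c * MODULES_PER_DEPLOYMENT for c in predictive_module)
savings = tuple(g - p for g, p in zip(gh200_total, predictive_total))

print(f"GH200 deployment:      ${gh200_total[0]/1e6:.1f}M–${gh200_total[1]/1e6:.1f}M")
print(f"Predictive deployment: ${predictive_total[0]/1e6:.1f}M–${predictive_total[1]/1e6:.1f}M")
print(f"Estimated savings:     ${savings[0]/1e6:.1f}M–${savings[1]/1e6:.1f}M")
```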

While the benefits of predictive execution are substantial, implementing it does not require a complete redesign of a speculative computing system. In most cases, the predictive interface can be retrofitted into the existing instruction execution unit—replacing the speculative logic block with a deterministic scheduler and timing controller. This retrofit eliminates complex out-of-order execution structures, speculative branching, and register renaming, removing approximately 20–25 million gates. In their place, the predictive interface introduces a timing-coordinated execution fabric that adds 4–5 million gates, resulting in a net simplification of silicon complexity. The result is a cleaner, more power-efficient design that accelerates time-to-market and reduces verification burden.

Is $10M in Savings Meaningful for NVIDIA?

At NVIDIA’s global revenue scale (~$60B in FY2024), a $10M delta is negligible. But for a single data center deployment, it can directly impact total cost of ownership, pricing, and margins. Scaled across 10–20 deployments, savings exceed $100M. As competitive pressure rises from RISC-V and low-cost inference chipmakers, speculative execution becomes a liability. Predictive interfaces offer not just architectural efficiency but a competitive edge.

Environmental Impact

Beyond cost and performance, replacing speculative execution with a predictive interface can yield significant environmental benefits. By reducing compute power requirements, eliminating the need for HBM and liquid cooling, and improving overall system efficiency, data centers can significantly lower their carbon footprint (a rough consistency check on these figures follows the list):

  • Annual energy use is reduced by ~16,240 MWh
  • CO₂ emissions drop by ~6,500 metric tons
  • Up to 2 million gallons of water saved annually by eliminating liquid cooling
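
The emissions figure is consistent with a standard grid-intensity estimate. The quick Python check below assumes roughly 0.4 metric tons of CO₂ per MWh, a typical blended grid factor; the article does not say which factor it used.

```python
# Rough consistency check on the CO2 figure above.
energy_saved_mwh = 16_240
grid_intensity_t_per_mwh = 0.4   # assumed blended grid factor (t CO2 / MWh)

co2_saved_t = energy_saved_mwh * grid_intensity_t_per_mwh
print(f"CO2 avoided: ~{co2_saved_t:,.0f} metric tons")   # ~6,500 t
```
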
Conclusion: A Call for Predictable Progress

Speculative execution has long served as the backbone of high-performance computing, but its era is showing cracks—both in cost and efficiency. As AI workloads scale exponentially, the tolerance for waste—whether in power, hardware, or system complexity—shrinks. Predictive execution offers a forward-looking alternative that aligns not only with performance needs but also with business economics and environmental sustainability.

The data presented here makes a compelling case: predictive interface architectures can slash costs, lower emissions, and simplify designs—without compromising on throughput. For hyperscalers like NVIDIA and its peers, the question is no longer whether speculative execution can keep up, but whether it’s time to leap ahead with a smarter, deterministic approach.

As we reach the tipping point of compute demand, predictive execution isn’t just a refinement—it’s a revolution waiting to be adopted.

Also Read:

LLMs Raise Game in Assertion Gen. Innovation in Verification

Scaling AI Infrastructure with Next-Gen Interconnects

Siemens Describes its System-Level Prototyping and Planning Cockpit


LLMs Raise Game in Assertion Gen. Innovation in Verification
by Bernard Murphy on 04-30-2025 at 6:00 am


LLMs are already simplifying assertion generation but still depend on human-generated natural language prompts. Can LLMs go further, drawing semantic guidance from the RTL and domain-specific training? Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Using LLMs to Facilitate Formal Verification of RTL, published on arXiv in 2023. The authors are from Princeton University. The paper has 27 citations according to Google Scholar.

The authors acknowledge that there is already published work on using LLMs to generate SVA assertions from natural language prompts but point out that the common approach doesn’t alleviate much of the burden on test writers, who must still reason about and express test intent in natural language. Their goal is to explore whether LLMs can generate correct SVA for a given design without any specification beyond the RTL—even when the RTL contains bugs. They partially succeed, though the flow still depends on designer review and correction.

Paul’s view

Great find by Bernard this month – a paper out of Princeton on prompt engineering to improve GPT-4’s ability to generate SVAs. The intended application is small units of code that can achieve full statement and toggle coverage based only on SVAs and model checking.

The authors refine their prompt by taking RTL for a simple FIFO module which is known to be correct and repeatedly asking GPT4 to “write SVA assertions to check correctness of ALL the functionality” of that module. After each iteration they review the SVAs and add hints to their prompt to help GPT4 generate a better result. For example, “on the postcondition of next-cycle assertions (|=>), USE $past() to refer to the value of wires.” After 23 iterations and about 8 hours of manual effort they come up with a prompt that generates a complete and correct set of assertions for the FIFO.

Next, the authors take their engineered prompt and try it on a more complex module – the page table walker (PTW) of an open-source RISC-V core. They identify a recent bug fix to the PTW and take an RTL snapshot from before that bug fix. After calling GPT-4 8 times (for a total of 80 SVAs generated), they are able to get an SVA generated that catches the bug. An encouraging step in the right direction, but of course it’s much easier to find an SVA to match a known bug than to look at several failing auto-generated SVAs and wonder which ones are due to a real bug in the RTL vs. the SVA itself being buggy.

The latter part of the paper investigates if auto-generated SVAs can improve RTL generation: the authors take a 50-word plain text description of a FIFO queue and ask GPT-4 to generate RTL for it. They generate SVAs for this RTL, manually fix any errors, and add the fixed SVAs back into the prompt. After 2 iterations of this process they get clean RTL and SVAs with full coverage. Neat idea, and another encouraging result, but I do wonder if the effort required to review and fix the SVAs was any less than the effort that would have been required to review and fix the first RTL generated by GPT-4.

Raúl’s view

Formal property verification (FPV) utilizing SystemVerilog Assertions (SVA) is essential for effective design verification. Researchers are actively investigating the application of large language models (LLMs) in this area, such as generating assertions from natural language, producing liveness properties from an annotated RTL module interface, and creating a model of the design from a functional specification for comparison with the RTL implementation. This paper examines whether LLMs can generate accurate SVA for a given design solely based on the RTL, without any additional specifications – which has evident advantages. The study builds upon the previously established framework, AutoSVA, which uses GPT-4 to generate end-to-end liveness properties from an annotated RTL module interface. The enhanced framework is referred to as AutoSVA2.

The methodology involves iteratively refining prompts with rules to teach GPT-4 how to generate correct SVA (even state-of-the-art GPT-4 generates syntactically and semantically wrong SVA by default) and crafting rules to guide GPT-4 in generating SVA output, published as open-source artifacts [2]. Two examples of such rules are: “signals ending in _reg are registers: the assigned value changes in the next cycle” and “DO NOT USE $past() on postcondition of same-cycle assertion”.

The paper details extensive experimentation that identified a bug in the RISC-V CVA6 Ariane core which had previously gone undetected. AutoSVA2 also allows the generation of Register Transfer Level (RTL) for a FIFO queue based on a fifty-word specification. To illustrate the process, here is an excerpt from the paper describing the workflow (a schematic sketch of the loop follows the list):

  1. Start with a high-level specification in English
  2. The LLM generates a first version of the RTL based on the specification, the module interface, and an order to generate synthesizable Verilog
  3. AutoSVA2 generates an FPV Testbench (FT) based on the RTL
  4. JasperGold evaluates the FT
  5. The engineer audits and fixes the SVA
  6. The LLM generates a new version of the RTL after appending the SVA to the previous prompt.
  7. Steps 3 to 6 are then repeated until convergence: either (a) full proof and coverage of the FT, or (b) a plateau in the improvements of the RTL and SVA.
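
Here is a minimal Python sketch of that loop, only to make the control flow concrete. The helper functions are hypothetical stand-ins for the LLM, AutoSVA2, JasperGold, and the engineer in the loop; they are not real APIs.

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    fully_proven: bool
    fully_covered: bool

# Hypothetical stand-ins for the LLM, AutoSVA2, JasperGold, and the engineer.
def generate_rtl(prompt: str) -> str:  return "// RTL candidate"
def generate_sva(rtl: str) -> str:     return "// SVA candidate"
def run_model_checker(rtl: str, sva: str) -> CheckResult:  return CheckResult(True, True)
def engineer_fixes(sva: str, result: CheckResult) -> str:  return sva

def refine_design(spec: str, max_iters: int = 10):
    prompt = spec
    rtl = sva = ""
    for _ in range(max_iters):
        rtl = generate_rtl(prompt)            # step 2: LLM drafts RTL
        sva = generate_sva(rtl)               # step 3: build the FPV testbench
        result = run_model_checker(rtl, sva)  # step 4: formal tool evaluates it
        sva = engineer_fixes(sva, result)     # step 5: engineer audits/fixes the SVA
        if result.fully_proven and result.fully_covered:
            break                             # step 7(a): converged
        prompt += "\n" + sva                  # step 6: append fixed SVA to the prompt
    return rtl, sva

print(refine_design("fifty-word FIFO specification"))
```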

This process differs significantly from the role of a designer or verification engineer. GPT-4’s creativity allows it to generate SVA from buggy RTL as well as create buggy SVA for correct RTL; reproducibility presents a challenge; internal signals, timing, syntax, and semantics may be partially incorrect and are only partly corrected by the rules mentioned above.

On the positive side, AutoSVA2-generated properties improved coverage of RTL behavior by up to six times over AutoSVA-generated ones with less effort and exposed an undiscovered bug. The authors think that the approach has the potential to expand the adoption of FPV and pave the way for safer LLM-assisted RTL design methodologies. The Times They Are A-Changin’?

Also Read:

High-speed PCB Design Flow

Perspectives from Cadence on Data Center Challenges and Trends

Designing and Simulating Next Generation Data Centers and AI Factories


Siemens Describes its System-Level Prototyping and Planning Cockpit
by Mike Gianfagna on 04-28-2025 at 10:00 am


We all know semiconductor design is getting harder. Much harder when you consider the demands of AI workloads and heterogeneous integration of many chiplets in a single package. This class of system demands co-optimization across the entire design flow. For example, functional verification, thermal analysis, signal and power integrity, electromigration, and IR drop all need to be balanced across a complex process of die and package co-design. Data management and tool integration are particularly vexing here.

Against this backdrop, Siemens Digital Industries Software has published an eBook that flattens this class of problem. The company illustrates how its approach works using Intel Foundry’s EMIB packaging technology. If you face any type of complex chip design, this eBook is a must-read. Don’t let the category scare you off; the eBook isn’t long, but it’s packed with solid examples of how to tame complex chip design. A link is coming, but first let’s examine how Siemens describes its system-level prototyping and planning cockpit.

About the Authors

The authors are two exceptional gentlemen with substantial backgrounds in the problems addressed in this eBook.

Keith Felton

Keith Felton has over 30 years of experience developing and marketing advanced tools and supporting customers in the use of those tools for complex chip design, PCB design, and high-density advanced packaging. He has worked at companies such as Cadence, Viewlogic and Zuken-Redac as well as Siemens. He has also led partnerships across the semiconductor ecosystem.


Mike Walsh

Mike Walsh has over 30 years of experience helping customers around the world to design challenging advanced packages. He has broad knowledge of the system planning and design process and expertise in system-in-package (SiP), interposers, 3D IC, wafer-level integration, and multi-substrate solutions.

The insights offered by these authors are eye-opening, relevant, and quite valuable.

About Intel Foundry’s EMIB Technology

Embedded multi-die interconnect bridge (EMIB) is a semiconductor packaging technology developed by Intel Foundry that uses a small, embedded silicon bridge to interconnect multiple dies, or chiplets, within a single package. In contrast to large silicon interposers, the EMIB bridge only spans the area needed to connect the specific dies, making it a more compact and cost-effective solution.

EMIB facilitates integration of multiple dies in a single package with the ability to have multiple EMIB bridges. This approach provides a good example of how to deploy an integrated chip/package flow since the increased design complexity intrinsic to EMIB technology shifts more of the challenges to the package level.

The Design Challenges

Design challenges include high pin counts, integration of diverse components, and providing an accurate representation of the EMIB structure for package design tools. Because EMIB is a passive bridge without active silicon, defining the EMIB component modules and setting up constraints to achieve low latency design rule checks (DRC) is crucial. Power delivery to the EMIB bridge is a primary design concern, requiring point-to-point connections and sufficient power distribution.

Typical advanced packaging workflows present several challenges. These flows combine design and analysis tools from different vendors, creating disconnected, manual processes that require importing and exporting large amounts of data. The resulting data iterations can introduce errors. The approach is also time-consuming, tempting designers to skip steps such as functional simulation. But skipping design steps means connectivity errors go undetected, producing non-functional designs.

The diagram below illustrates the complexity and interdependence of the process.

Advanced Package Workflow Challenges

The Siemens System-Level Prototyping and Planning Cockpit

The eBook describes the Siemens approach to designing for Intel Foundry’s EMIB technology. This is accomplished by defining and driving everything from a single digital twin model of the entire advanced package assembly, which is constructed and managed by the Siemens Innovator3D IC™ solution.

Innovator3D IC uniquely represents the entire system, including dies, chiplets, interposers, EMIBs, packages, and PCBs, within a single environment. It builds a cohesive view of the system by consuming data in a variety of formats and at different levels of completeness. This unified view enables the creation of application-specific data sets, which are then pushed into other tools like Calibre® 3DSTACK, Calibre 3DThermal, Aprisa™, and Tessent™ from Siemens.

Innovator3D IC also leverages these tools in a predictive manner. For example, with Calibre 3DThermal, early insights into thermal performance can guide floor planning adjustments and corrective actions before serious issues arise. Even incomplete data, like preliminary power or heat sink information, can be used to overlay results back into Innovator3D IC, providing valuable feedback for optimization.

By providing a platform that integrates all these tools and workflows, Innovator3D IC ensures early issue identification and seamless collaboration across the design flow, ultimately improving efficiency and design quality.

The eBook presents details of a six-step process to perform the complete, integrated design in one unified environment. The descriptions are clear and easy to follow. I highly recommend getting your copy of this eBook. The link is coming soon. The six steps detailed are:

  • Step #1 – Die, EMIB, and package co-design
  • Step #2 – Functional verification of system-level connectivity
  • Step #3 – Early predictive thermal analysis
  • Step #4 – Physical layout using Xpedition Package Designer
  • Step #5 – SI/PI/EM analysis and extraction using HyperLynx
  • Step #6 – 3D assembly verification using Calibre 3DSTACK

To Learn More

I have just scratched the surface of what you will learn from this Siemens eBook. If complex chip/package co-design is on your mind, and especially if you are considering Intel Foundry’s EMIB technology, you need to get your own copy.

You can access your copy of Reference workflows for Intel Foundry EMIB and MIB-T integration platforms here.

You can also learn more about the various tools from Siemens Digital Industries Software that comprise this unique and well-integrated flow here:

And that’s how Siemens describes its system-level prototyping and planning cockpit.


Recent AI Advances Underline Need to Futureproof Automotive AI
by Bernard Murphy on 04-28-2025 at 6:00 am


The world of AI algorithms continues to advance at a furious pace, and no industry is more dependent on those advances than automotive. While media and analysts continue to debate whether AI will deliver value in business applications, there is no question that it adds value to cars in safety, some level of autonomous driving, and comfort. But there’s a tension between these rapid advances and the 15-year nominal life of a car. Through that lifetime, software and AI models must be updated at service calls or through over-the-air updates. Such updates are now relatively routine for regular software, but AI advances are increasing stress on NPU architectures even more than they have in the past.

A BEVDepth algorithm (courtesy of GitHub)

From CNNs to Transformers to Fusion

For most of us, CNNs were the first big breakthrough in AI, amply served by a matrix/vector engine (MACs) followed by a bit of ALU activity to wrap up the algorithm. Hardware was just that – a MAC array and a CPU. Transformers made this picture messier. The attention part is still handled by a matrix engine, but the overall flow goes back and forth between matrix, vector, and scalar operations. This is still manageable in common NPUs with three engines (MAC, DSP, CPU), but traffic between these engines increases, adding latency unless the NPU and the model are optimized carefully for minimal overhead. Now add in fusion, which depends on special operators that must be custom coded in C++ to run on the CPU. The important metric, inferences per second, depends heavily on model designer/implementor expertise.
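
To see why the flow ping-pongs between engines, here is a minimal NumPy sketch of single-head attention, annotated with the kind of engine each step would typically land on in a three-engine NPU. The shapes and the engine mapping are illustrative, not a description of any particular product.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d = 8, 16                                  # toy sequence length and width
x = rng.standard_normal((seq, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# Matrix engine (MAC array): the large matrix multiplies.
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d)

# Vector/DSP engine (with some scalar bookkeeping): softmax over each row.
scores -= scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)

# Back to the matrix engine for the weighted sum of values.
out = weights @ v
print(out.shape)   # (8, 16)
```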

This raises two important questions for any OEM or Tier1 planning around an NPU selection. First, what kind of performance can they expect for advanced algorithms in next-generation designs? Second, will they need NPU provider expertise to code and optimize their proprietary/differentiating algorithms? Revealing company secrets is only part of the problem. The NPU market is still young and likely volatile, unlike the established CPU market. After committing to an NPU, who will take care of their model evolution needs over the 15-year life of a car?

Predicting what might be needed in the future is impossible, but it is certainly possible to look to model needs on the near horizon for a sense of which NPU architectures might best support adaptation to change. Quadric has an intriguing answer, citing some of the newer AI models such as BEVDepth.

Evolution in Bird’s Eye View (BEV) applications

If you have a relatively modern car you are probably already familiar with Bird’s Eye View as an aid to parallel parking. This is an option on your infotainment screen, an alternative to the backup camera view and the forward-facing camera view. BEV is the screen that shows a view from an imaginary camera floating six feet above the car, amazingly useful to judge how close you are to the car behind, the car in front, and the kerb.

This view is constructed through the magic of optics: multiple cameras around the car in effect project their images onto a focal plane at that imaginary camera location. The images are stitched together, with some adjustment, providing that bird’s-eye view.

Neat and useful, but model designers have larger aspirations than support for parallel parking. BEV is already making its way into some aspects of autonomous driving, especially as a near-range supplement to LIDAR or RADAR. But to be truly useful it needs to extend to a 3D view.

Adding depth information to BEV has stimulated a lot of research. Each camera contributes a different input to the BEV, not just as a slice of that view, but also through differing perspectives and intrinsic properties of the cameras. There are multiple proposed algorithms, of which one is BEVDepth. This algorithm uses point clouds from LIDAR as a reference for transformer-based depth learning around camera images.

An important step in this process involves voxel pooling. Pooling is a familiar step in CNNs, reducing the dimension of an image while preserving important features. Voxels are just the “3D pixels” you would expect in a 2D image (BEV) with depth. Voxel pooling is a complex algorithm and (in the GitHub version) is implemented in CUDA, the well-known NVIDIA programming standard. At nearly 200 lines of CUDA, this is not a simple operator to be added easily to the ONNX standard operator set. Further, I am told this operation accounts for 60% of the compute cost of BEVDepth and must run on an ALU. Could you implement this on a regular NPU? Probably, but apparently other NPU experts still haven’t delivered performance versions, while Quadric has already demonstrated their implementation.
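
The GitHub implementation runs to roughly 200 lines of CUDA; the much-reduced NumPy sketch below captures only the core scatter-add idea (point features accumulated into BEV grid cells), to show why the operation is irregular, data-dependent work better suited to an ALU than a MAC array. Grid size and feature width are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_feat = 10_000, 32
grid_x, grid_y = 128, 128                 # illustrative BEV grid dimensions

# Each lifted camera point carries a feature vector and a BEV cell coordinate.
feats = rng.standard_normal((n_points, n_feat)).astype(np.float32)
ix = rng.integers(0, grid_x, n_points)
iy = rng.integers(0, grid_y, n_points)

# Voxel pooling reduced to its essence: scatter-add every point's features into
# the cell it falls in. The memory access pattern depends on the data, which is
# why it maps poorly onto a pure matrix (MAC) engine.
bev = np.zeros((grid_x, grid_y, n_feat), dtype=np.float32)
np.add.at(bev, (ix, iy), feats)

print(bev.shape)   # (128, 128, 32)
```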

A good fit for Quadric Chimera

You may not remember the key value prop for the Chimera NPU. These can be arranged as systolic arrays, nothing novel there. But each processing element (PE) in the array has MACs, an ALU, and local register memory wrapped in a processor pipeline. In switching between matrix, vector, and scalar operations, there’s no need to move data. Computation of all types can be handled locally as data flows through the array, rather than having to be swapped back and forth between matrix, DSP, and CPU engines.

Sounds good, but does it deliver? Quadric ran a benchmark of the Voxel Pooling algorithm, comparing performance on an Nvidia RTX 3090 chip versus a Quadric QC-Ultra (quad-core), running at the same clock frequency. The Quadric solution ran more than 2 times faster at substantially lower power. And here’s the clincher. While the algorithm is written in CUDA, the only difference between CUDA C++ and Quadric’s C++ is some easily understood memory pointer changes. Quadric was able to port the GitHub code in 2 weeks and claims anyone with C++ experience could have made the same changes. They claim the same applies to any operation which can be written in C++.

The takeaway is that a model as advanced as BEVDepth, supported by a key function written in CUDA, was easily mapped over to the Quadric platform and ran twice as fast as the same function running on an Nvidia chip at substantially lower power. Faster, of course, because Chimera is designed for IoT inferencing rather than heavy-duty training, and much lower power for the same reason. And programming is easily managed by an OEM or Tier1 C++ programmer, ensuring that models can be maintained and upgraded long-term over the life of an automotive product line.

Staying current with AI innovation is a challenge in all markets, but none more so than in automotive. The NPU architecture you want to bet on must allow you to upgrade models in ways you can’t yet predict over 15-year lifespans. You need a solution your own software programmers can manage easily yet which offers all the performance advantages you expect from an NPU. You might want to check out Quadric’s website.

Also Read:

2025 Outlook with Veerbhan Kheterpal of Quadric

Tier1 Eye on Expanding Role in Automotive AI

A New Class of Accelerator Debuts

 


Podcast EP285: The Post-Quantum Cryptography Threat and Why Now is the Time to Prepare with Michele Sartori
by Daniel Nenni on 04-25-2025 at 10:00 am

Dan is joined by Michele Sartori – senior product manager at PQShield. Michele is a software engineer in Computer and Network Security, specializing in product management. He is a passionate tech team leader at the forefront of emerging technologies focused on achieving tangible results.

In this highly informative discussion, Dan explores the details of preparing for post-quantum cryptography with Michele. Michele explains why the time to begin preparation for these changes is NOW. He describes what needs to be done in key areas such as performance, integration and, of course, security. He explains how to develop a three-step plan to prepare the enterprise for important changes that will become a mandate by 2030.

Michele also explains what risks exist today, even before quantum computers have reached the required performance level to pose a real threat. He describes current and future cryptography and security strategies and the work PQShield is doing across the ecosystem to help organizations prepare for the coming changes.

Contact PQShield

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Cost-Effective and Scalable: A Smarter Choice for RISC-V Development
by Daniel Nenni on 04-25-2025 at 6:00 am


The RISC-V ecosystem is witnessing remarkable growth, driven by increasing industry adoption and a thriving open-source community. As companies and developers seek customizable computing solutions, RISC-V has become a top choice. Providing a scalable and cost-effective ISA foundation, RISC-V enables high-performance and security-enhanced implementations, making it ideal for next-generation digital infrastructure.

RISC-V’s modular ISA and growing ecosystem support a wide range of configurations, making it highly adaptable across applications. Designers have options to integrate extensions such as vector processing, floating-point, atomic operations, and compressed instructions. Furthermore, its scalability spans from single-core to multi-core architectures and can incorporate optimizations like out-of-order execution to enhance performance. To achieve an optimal balance of performance, power efficiency, and scalability, selecting the right RISC-V microarchitecture and system integration strategy is crucial.

For entry-level RISC-V development, single-core open-source implementations are well-suited for small to medium capacity FPGA-based platforms. Among them, Xilinx VU9P-based solutions, such as the VCU118 development board, have been widely adopted by engineers for their balanced capabilities and accessibility. The S2C Prodigy S7-9P Logic System takes this foundation even further. Built on the same powerful VU9P FPGA with 14M ASIC gates, it enhances usability, expandability, and cost-efficiency. With seamless integration of daughter cards and an advanced toolchain, the S7-9P offers an ideal fit for small to medium-scale RISC-V designs, empowering developers to accelerate their innovation with confidence.

Media-Ready Prototyping: MIPI and HDMI for Real-World Applications

As multimedia processing becomes increasingly integral to RISC-V applications, the demand for high-speed data handling and versatile prototyping tools has never been greater. The S2C prototyping systems meet this need with support for MIPI and HDMI via optional external daughter cards, making them an ideal choice for smart displays, AR/VR systems, and AI-powered cameras. For example, if you’re developing a RISC-V-based smart camera, a complete prototyping environment, from capturing images via MIPI D-PHY to driving displays through HDMI, can be deployed with ease. Flexible expansion options allow developers to experiment with various configurations, refine their designs, and push the boundaries of RISC-V media applications.

High-Speed Connectivity: QSFP28 Ethernet for Next-Gen Networking

With networking requirements becoming more demanding, high-speed connectivity is crucial for RISC-V-based applications. The S7-9P rises to this challenge with built-in QSFP28 Ethernet support, enabling 100G networking applications. This makes it an optimal choice for prototyping and testing RISC-V-based networking solutions, including routers, switches, and edge AI processing units.

Need More Scalability?

While the S7-9P is an excellent choice for entry-level to mid-range RISC-V prototyping, more complex designs may require greater capacity. For high-end verification and large-scale projects, S2C also offers advanced solutions like the VU440 (30M ASIC gates), VU19P (49M ASIC gates), and VP1902 (100M ASIC gates), providing the scalability needed for RISC-V subsystems, multi-core, AI, and data-intensive applications.

Special Offer: Save 25% on the Prodigy S7-9P Bundle

For a limited time, get the Prodigy S7-9P bundle—which includes a free Vivado license (valued at $5,000+)—for just $14,995, a 25% savings! Visit S7-9P for information or Contact our team to find the perfect fit for your project.

Also Read:

S2C: Empowering Smarter Futures with Arm-Based Solutions

Accelerating FPGA-Based SoC Prototyping

Unlocking SoC Debugging Challenges: Paving the Way for Efficient Prototyping


High-speed PCB Design Flow
by Daniel Payne on 04-24-2025 at 10:00 am


High-speed PCB designs are complex, often requiring a team with design engineers, PCB designers and SI/PI engineers working together to produce a reliable product, delivered on time and within budget. Cadence has been offering PCB tools for many years, and they recently wrote a 10-page white paper on this topic, so I’ll share what I learned. The promise is that using early identification and resolution of SI and PI challenges will shorten the overall time to market.

The three PCB design phases are schematic, layout, and post-layout/signoff. If your EDA tool flow includes in-design analysis, then the team can find and fix SI and PI issues earlier and with more accuracy.

Collaboration across teams means that an EE can define the high-speed constraints at the schematic stage with little need for an SI expert. Layout designers use visualization tools to see SI/PI issues quickly in their tools. Handoffs between team members are made efficient by in-tool feedback.

The Power Distribution Network (PDN) can be analyzed for issues like IR drop under DC operating conditions, enabling decisions on current density and specifying copper weight and thicknesses.  You can visualize DC drop analysis in Cadence tools.

DC drop analysis

During transient operation the PCB design encounters high-frequency switching currents that couple with inductance to create voltage noise. Adding decoupling capacitors and minimizing inductance are ways to mitigate this noise. AC power analysis tools simulate transient responses from the PCB, along with power noise and impedance profiles, so that each component has stable and clean power.
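
As a rough illustration of the two quantities involved, the short Python sketch below computes inductive switching noise (V = L·di/dt) and a target PDN impedance (allowed ripple divided by transient current). The component values are illustrative assumptions, not numbers from the Cadence white paper.

```python
# Illustrative values, not taken from the white paper.
L_loop = 0.5e-9          # 0.5 nH loop inductance between die and decoupling caps
di, dt = 1.0, 1e-9       # 1 A current step over a 1 ns switching edge

v_noise = L_loop * di / dt
print(f"Inductive noise without decoupling: {v_noise * 1e3:.0f} mV")   # 500 mV

# Target PDN impedance: allowed ripple voltage divided by the transient current.
vdd, ripple_fraction, i_transient = 1.0, 0.05, 2.0
z_target = vdd * ripple_fraction / i_transient
print(f"Target PDN impedance: {z_target * 1e3:.0f} mOhm")              # 25 mOhm
```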

AC Analysis

High-speed data links are commonly used for PCIe, Ethernet, USB, and UCIe designs, so care is required to manage channel losses and via effects and to pass compliance testing. Vias can add undesired discontinuities, create impedance mismatches, degrade signals through inductance and capacitance effects, cause stub resonance, and even add return path discontinuities. Engineers can now design, view, and validate via structures early on with the Aurora Via Wizard.

Aurora Via Wizard

Traces at high frequencies exhibit losses from conductor resistance, dielectric absorption, and the roughness of copper traces. Designers can choose low-loss dielectrics, optimize the trace geometry, and maintain a continuous ground plane under signal traces to mitigate these losses. To simulate different dielectric materials, the Sigrity X Topology Workbench comes into play. For SerDes interfaces, there are Compliance Analysis tools to validate a design early, adjust signal paths, and pass protocol specifications.

Designing DDR5 interfaces at multi-gigabit speeds is enabled by using Sigrity X Topology Explorer Workbench for parameter sweeps to find the best termination configuration and optimal routing solutions while flagging any timing violations. DDR memory buses can have hundreds of signals, and Sigrity X Aurora helps automate impedance validation, crosstalk analysis, and return path optimization.

Signal quality

Another high-speed design issue is Simultaneous Switching Noise (SSN), which causes ground bounce, increased jitter, and timing errors. Cadence has power-aware IBIS models and advanced PDN analysis tools to quickly identify these vulnerabilities, guide decoupling capacitor placement, and accurately simulate SSN effects. For via-to-via crosstalk issues there are 2.5D and 3D analysis tools for via modeling, along with design recommendations for via shielding and optimized layer transitions.

Cadence Tools

The full high-speed PCB flow is covered by tools that work together from schematic to signoff: Allegro X Design Platform, Sigrity X Platform, Sigrity X Aurora Via Wizard, Sigrity X Topology Explorer Workbench, Clarity 3D.

Summary

High-speed PCB design teams can navigate successfully through the challenges of signal integrity and power integrity by using in-design analysis tools. This approach shortens time to market through tool automation, using distributed computing and making complex concepts easier to understand.

Read the complete  white paper from Cadence online.

Related Blogs


ESD Alliance Executive Outlook Features View of How Multi-Physics is Reshaping Chip Design and EDA Tools
by Bob Smith on 04-24-2025 at 6:00 am


Every spring, the ESD Alliance, a SEMI Technology Community, organizes a get-together where industry executives and experts gather to network and talk about trends in the electronic design automation industry.

The theme of this year’s event, once again co-hosted by Keysight, is “How Multi-Physics is Reshaping Chip Design and EDA Tools.” It will be held Thursday, May 22, starting at 5:30 p.m. at Keysight’s office in Santa Clara, Calif.

Our event speakers and panelists are all technically involved with multi-physics and will share their experiences and the opportunities and challenges still ahead. Moderated by Ed Sperling, Editor-in-Chief, Semiconductor Engineering, panelists include: Bill Mullen, Ansys Fellow at Ansys; John Ferguson, Sr. Director, Product Management from Siemens EDA; Chris Mueth, Sr. Director, New Markets and Strategic Initiatives of Keysight; and Albert Zheng, Sr. Engineering Group Director with Cadence.

Registration is open. Pricing for members is free. Non-member pricing is $49. Register at: https://tinyurl.com/bucunc7j.

In a perfect world, multi-physics analysis would be seamlessly integrated within the chip/system design flow resulting in early detection and correction of physical issues.

While the industry isn’t fully there yet, there is rapid adoption of these new technologies. With system complexities ever-increasing, chip designers are being required to expand their scope and responsibilities beyond “the chip.” Modern semiconductor-based systems often include novel packaging of devices and substrates in form factors that minimize system area, while simultaneously optimizing for performance, power and reliability.

While traditional analysis tools continue to play an important role in the design process, multi-physics tools are rapidly being adopted to address system-level issues that must be considered to bring new products to market. In order to achieve market success, these products must meet broad specifications including reliability and safety in addition to typical chip metrics such as performance, size, energy, and throughput.

The term multi-physics covers the range of physical effects that are typically not within the scope of traditional chip design analysis tools. These effects include (but are not limited to) mechanical stress, heat, electromagnetic interference and even packaging and cooling.

Mechanical stress must be considered in situations where discrete devices (such as chiplets) are interconnected by stacking or sharing a substrate. During operation, heat may cause the components to undergo thermal expansion that can lead to mechanical strain that impacts functioning or leads to system failure.
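
As a rough sense of scale, the small Python estimate below uses textbook coefficients of thermal expansion for a silicon die and an organic laminate; the 80 °C temperature swing is an illustrative assumption, not a figure from the event.

```python
# Textbook CTE values (per degC); the temperature swing is an illustrative assumption.
cte_silicon = 2.6e-6       # silicon die
cte_laminate = 17e-6       # organic substrate it is mounted on
delta_t = 80               # operating temperature rise in degC

mismatch_strain = (cte_laminate - cte_silicon) * delta_t
print(f"CTE-mismatch strain: {mismatch_strain:.2e} (~{mismatch_strain * 100:.2f}%)")
```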

Heat generated must be analyzed to understand how it propagates through the system. Hot spots may lead directly to device performance issues and induce mechanical stress that causes further issues.

Electromagnetic interference typically arises due to high-speed signals within the system that can interact with other signals within the system or other nearby system components. These interactions can lead to performance issues or even system failures.

Packaging and cooling analysis is necessary to understand how to mitigate the effects of heat and mechanical stress on the chip/system and to other nearby components.

Join us at the 2025 Executive Outlook event and gain insight into how these rapidly evolving technologies are changing chip and system design flows. The event will be held at Keysight, Building 5, 5301 Stevens Creek Blvd in Santa Clara.

About the ESD Alliance

The ESD Alliance, a SEMI Technology Community, offers initiatives and activities that bring value to our entire industry including:

  • Coordinating and amplifying the collective and regional voices of our industry.
  • Continually promoting the value our industry delivers to the global semiconductor and electronics industry.
  • Addressing and defending threats and reducing risks to our industry.
  • Achieving efficiencies for our industry.
  • Marketing the attractiveness of the design ecosystem as an ideal industry for pursuing a career.
  • Enabling networking, sharing and collaboration across our industry.

If your company is not currently a member, shouldn’t it consider joining the ESD Alliance and SEMI? Contact me at bsmith@semi.org to get the discussion started.

Also Read:

Andes RISC-V CON in Silicon Valley Overview

SNUG 2025: A Watershed Moment for EDA – Part 1

DVCon 2025: AI and the Future of Verification Take Center Stage


TSMC Brings Packaging Center Stage with Silicon
by Mike Gianfagna on 04-23-2025 at 11:45 am


The worldwide TSMC 2025 Technology Symposium recently kicked off with the first event in Santa Clara, California. These events typically focus on TSMC’s process technology and vast ecosystem. These items were certainly a focus for this year’s event as well. But there is now an additional item that shares the spotlight – packaging technology. Thanks to the increase in heterogeneous integration driven in large part by AI, the ability to integrate multiple dies in sophisticated packages has become another primary driver for innovation. So, let’s look at what was shared at the pre-briefing by Dr. Kevin Zhang and how TSMC brings packaging center stage with silicon.

A Growing Palette of Options

TSMC has taken advanced packaging well beyond the 2.5D interposer approach that is now quite familiar. The diagram above was provided by TSMC to illustrate the elements that comprise the TSMC 3DFabric® technology portfolio. According to TSMC, transistor technology and advanced packaging integration technology go hand-in-hand to provide its customers with a complete product-level solution.

On the left are the options for stacking, or die-level/wafer-level integration. SoIC-P (below) uses microbump technology to deliver down to a 16um pitch. Using bumpless technology (SoIC-X), you can achieve a pitch of a few microns. TSMC started with 9um and is now in production at 6um with more improvements to come, creating a monolithic-like integration density.
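
Connection density scales roughly with the inverse square of bond pitch, so the pitch progression above compounds quickly. The one-liner below uses that standard approximation; the relative-density figures are derived here, not quoted by TSMC.

```python
# Vertical connections per unit area scale roughly as 1 / pitch^2.
for pitch_um in (16, 9, 6):
    relative_density = (16 / pitch_um) ** 2   # normalized to the 16 um microbump case
    print(f"{pitch_um:>2} um pitch -> ~{relative_density:.1f}x the 16 um connection density")
```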

For 2.5/3D integration, there are many options available. Chip on Wafer on Substrate (CoWoS) technology supports both the familiar silicon interposer as well as CoWoS-L, which uses an organic interposer with a local silicon bridge for high-density interconnect. CoWoS-R provides a pure organic interposer.

Integrated Fan-Out (InFO) technology began in 2016 for mobile applications. The platform has been expanded to support automotive applications as well.

There is also the newer System-on-Wafer (TSMC-SoW™) packaging. This technology broadens the integration scale to the wafer level. There is a chip-first approach (SoW-P), where the chip is put on the wafer and then an integrated RDL is built to bring the dies together.  Or, there is a chip-last approach (SoW-X), where you first build the interposer at the wafer level and then add the chips across the wafer. This last approach can produce a design that is 40X larger than the standard reticle size.

High-performance computing for AI is clearly a major driver for advanced packaging technology. The first diagram below, provided by TSMC, illustrates a typical AI accelerator application today that integrates a monolithic SoC with HBM memory stacks through a silicon interposer. Some major improvements are coming for this type of architecture, as shown in the next diagram.

The monolithic SoC is now replaced with a 3D stack of chips to address high-density compute requirements. HBM memory stacks are integrated with an RDL interposer. Integrated silicon photonics will also be part of the design to improve communication bandwidth and power. Integrated voltage regulators will also help to optimize power for this type of application.

Regarding power optimization, future AI accelerators can require thousands of watts of power, creating a huge challenge in terms of power delivery into the package. Integrated voltage regulators will help to tame this class of problem. TSMC has developed a high-density inductor, which is a key component required to develop this class of regulator. So, a monolithic PMIC plus this inductor can provide a 5X power delivery density (vs. PCB level).

There are many exciting new technologies on the horizon which will require all the packaging innovation discussed here. Augmented reality glasses are one example of a new product that will require everything discussed. A device like this will require, among other things, an ultra-low power processor, a high-resolution camera for AR sensing, eNVM for code storage, a large main processor for spatial computing, a near-eye display engine, WiFi/Bluetooth for low-latency RF, and a digital-intensive PMIC for low-power charging. This kind of product will set a new bar for complexity and efficiency.

While autonomous vehicles get a lot of attention, the demands of humanoid robots were also discussed. TSMC provided the graphic below to illustrate the significant amount of advanced silicon required. And the ability to integrate all of this into dense, power efficient packages is critical as well.

To Learn More

It was clear at the TSMC Technology Symposium that advanced processing and advanced packaging will need to work as one going forward to achieve the type of product innovation on the horizon. TSMC has clearly taken this challenge and is developing unified offerings to address the coming requirements.

You can learn more about TSMC’s 3DFabric Technology here. And that’s why TSMC brings packaging center stage with silicon.

 

UPDATE: TSMC is sharing recordings of the presentations HERE.

Also Read:

TSMC 2025 Technical Symposium Briefing

IEDM 2025 – TSMC 2nm Process Disclosure – How Does it Measure Up?

TSMC Unveils the World’s Most Advanced Logic Technology at IEDM

IEDM Opens with a Big Picture Keynote from TSMC’s Yuh-Jier Mii

 


TSMC 2025 Technical Symposium Briefing
by Daniel Nenni on 04-23-2025 at 11:40 am

TSMC Advanced Technology Roadmap 2025

At the pre-conference briefing, Dr. Kevin Zhang gave quite a few of us media types an overview of what will be highlighted at the 2025 TSMC Technical Symposium here in Silicon Valley. Since most of the semiconductor media are not local this was a very nice thing to do. I will be at the conference and will write more tomorrow after the event. TSMC was also kind enough to share Kevin’s slides with us.

The important thing to note is that TSMC is VERY customer driven so this presentation is based on interactions with the largest semiconductor manufacturing customer base the industry has ever seen, absolutely.

As you can imagine, AI is driving the semiconductor industry now not unlike what smartphones did for the last two decades. The difference being that AI consumes leading edge silicon at an alarming rate which is a good thing for the semiconductor industry. While AI is very performance centric, it must also be power sensitive. This puts TSMC in a very strong position from all of those years of manufacturing mobile SOCs for smartphones and other battery operated devices.

Kevin started with the AI revolution and how AI will be infused into most every electronic device from the cloud to the edge and will enable many new applications. Personally, I think AI will transform the world in a similar fashion as smartphones have but on a much grander scale.

Not long ago the mention of the semiconductor industry hitting $1T seemed like a dream. It is one thing for industry observers like myself to say it but it is quite another when TSMC does. There is little doubt in my mind that it will happen based on my observations inside the semiconductor ecosystem.

There have been some minor changes to the TSMC roadmap. It has been extended out to 2028 adding N3C and A14. The C is a compressed version meaning the yield learning curve is at a point where the process can be further optimized for density.

A14 will certainly be a big topic of discussion at the event. A14 is TSMC’s second generation of nanosheet transistor and is considered a full node (PPA) improvement versus N2: 10-15% speed improvement at the same power, 25-30% power reduction at the same speed, and 1.2X logic density improvement. The first iteration of A14 does not have backside power delivery. It was the same with N2, which was followed by A16 with Super Power Rail (SPR). SPR for A14 is expected in 2029.

The TSMC A16 specs were updated as well. A16 is the first version of SPR, for reduced IR drop and improved logic density. This has the transistor connection on the back. SPR is targeted at AI/HPC designs with improved signal routing and power delivery. A16 is on track for production in the second half of 2026. In comparison to N2P, A16 provides an 8-10% speed improvement at the same power and a 15-20% power reduction at the same speed.

From what I have heard TSMC N2 is yielding quite well and is on track for production later this year. The big question is who will be the first customer to ship N2 product? Usually it is Apple but word on the street is the iPhones this year will again be using N3. I already have an N3 iPhone so I will skip this generation if that is the case. If Apple does an N2 based iPhone Max Pro this year then count me in!

TSMC N2P is also on track for production in the second half of 2026. As compared to N3E, N2P offers: 18% speed improvement at the same power, a 36% power reduction at the same speed, and a 1.2x density improvement.

The most interesting thing about N2 is the rapid growth of tape-outs between N5, N3, and N2. It really is astounding. Given that TSMC N3 was an absolute landslide for customer tape-outs I had serious doubts if we would ever see a repeat of that success but here we are. Again, in the past mobile was the driver for early tape-outs but now we have AI/HPC as well.

Finally, as Kevin said, TSMC N3 is the last and best FinFET technology available on such a massive scale with N3, N3E, N3P, N3X, N3A, and now N3C. Yet, N2 tape-outs beat N3 in the first year and the second year even more so. Simply amazing. I guess the question is who is NOT using TSMC N2?

The second part of the presentation was on packaging which will be covered in another blog. After the event I can provide even more details and get a feeling for the vibe at the event from the ecosystem. Exciting times!

UPDATE: TSMC is sharing recordings of the presentations HERE.

Also Read:

TSMC Brings Packaging Center Stage with Silicon

IEDM 2025 – TSMC 2nm Process Disclosure – How Does it Measure Up?

TSMC Unveils the World’s Most Advanced Logic Technology at IEDM

IEDM Opens with a Big Picture Keynote from TSMC’s Yuh-Jier Mii