
Is Intel About to Take Flight?
by Jonah McLeod on 04-21-2026 at 10:00 am


The Pan Am–Boeing playbook and what Musk’s Terafab order could mean for Intel Foundry

“We either build the Terafab or we don’t have the chips.” That’s Elon Musk, speaking to Reuters, stating a supply constraint as plainly as anyone has stated one. TSMC is sold out. Samsung is committed. The existing supply chain can’t expand fast enough to meet what his companies will need for AI, robotics, and space. So he went looking for a different kind of supplier — one with capacity, know-how, and no queue.

Pan Am’s Juan Trippe found himself in the same position in the mid-1960s: a seat shortage. Existing aircraft couldn’t move enough people at a price that made mass air travel possible. Boeing was hemorrhaging money chasing Concorde, consumed by the prestige race for supersonic speed. Trippe didn’t ask for the fastest plane; he asked for one with enough capacity to move the most people to the destinations they dreamed of but couldn’t afford.

Trippe walked into Boeing with a vision of a world that had never flown before — and left with Boeing committed to building something beyond its own comprehension. Boeing president Bill Allen built a factory larger than anything the company had ever constructed. Trippe ordered planes Pan Am couldn’t fully pay for. The supersonic program died. The 747 reshaped global travel for fifty years.

What made both men effective was their audacity — the alpha instinct to get out in front of the pack and spot the big kill before anyone else sees it. Trippe walked into Boeing with a bet neither company could afford to lose. In his mind the 747 in Pan Am’s blue and white livery was already crossing every ocean, connecting cities worldwide, carrying people who had never flown.

Musk walks into Intel with the same completed picture — a conveyor belt of silicon feeding his robots, his cars, his satellites, a future he can already see in full resolution even if no one else yet can. He walked in as a partner proposing a bet Intel can’t afford to lose alone. Both pairs of men stand at the bridge’s edge — cord untested, betting it holds.

In mid-April, Intel CEO Lip-Bu Tan sent a memo to his staff — two days after Intel announced its Terafab involvement via a 60-word post on X with no press release published to its own website. “Musk’s expansive vision across AI, transportation, communications, robotics and space travel relies heavily on an ample and uninterrupted supply of silicon chips,” Tan wrote of Musk. “Intel is thus a natural partner to help him realize his vision.” He noted that he and Musk had held “wide-ranging and deep conversations,” from which “both sides quickly realized that working together would be mutually beneficial.” That’s the Allen moment — the handshake that commits the company before the engineering is fully proven.

The prestige race is with TSMC. And like Boeing’s pursuit of supersonic transport, it has been expensive, consuming, and largely beside the point of what Intel actually needs to survive.

TSMC built its dominance not by inventing new physics but by running sophisticated machines with greater discipline, consistency, and yield than anyone else. Five companies supply those machines and collectively control the world supply of leading-edge chip production. ASML holds a monopoly on EUV lithography — the only process capable of printing patterns that define a leading-edge transistor, with no alternative anywhere on earth. Applied Materials encodes deposition and materials engineering. Lam Research owns plasma etch. KLA dominates inspection and metrology. Tokyo Electron controls thermal processing and the track systems every wafer passes through before lithography. One Dutch, one Japanese, three American. Remove any single link from the chain and leading-edge production stops globally. Not slows. Stops.

These companies represent the accumulated process expertise of an entire generation of engineers, encoded into capital equipment over decades. The lithography knowledge that once lived in the minds of physicists at Philips now lives in a $400 million machine. The etch knowledge that took Lam’s engineers thirty years to develop runs in software on a plasma chamber. The inspection intuition KLA’s founders built from first principles is now a metrology system examining wafers at resolutions the human eye cannot approach.

The encoding worked — up to a point. What it captured was the physics, the measurement, the repeatable process. What it couldn’t capture was the judgment — what to do when you find something unexpected in a combination of circumstances that has never occurred before in exactly that way. That knowledge lives in people. It always has.

ASML and TSMC didn’t arrive at dominance separately. They built each other. TSMC needed a lithography partner willing to develop equipment around a pure-play foundry model. ASML needed a customer whose volume commitments made the economics of EUV development viable — a technology that took twenty years and billions of dollars to become real. TSMC’s process discipline gave ASML a proving ground. ASML’s roadmap gave TSMC the node leadership no integrated device manufacturer could match.

They didn’t plan to build a duopoly. They kept showing up for each other when the technology needed one more generation of commitment to become real. The three American equipment companies — Applied Materials, Lam, KLA — built the tools that made TSMC possible. The Dutch-Taiwanese partnership built the manufacturing culture that knew how to run them. The tools stayed in California. The judgment went to Hsinchu.

Pan Am and Boeing ran the same play a decade earlier. Neither knew they were building something that would reshape global travel. They were solving immediate problems for each other — Pan Am needed seats, Boeing needed a customer large enough to justify the factory. The 747 was the output of that mutual dependency, not a grand vision executed from the top down.

Tan’s memo reads the same way. Not a grand vision — a natural partnership. A customer with an uninterrupted need for silicon, a supplier with allocated capacity and no queue. Tan has assigned his chief of staff and interim CTO, Pushkar Ranade — an 18-year Intel veteran — to manage the engagement directly, with Tan overseeing it personally. “I have asked Pushkar to assemble and engage select technologists across the company to contribute to this project,” Tan wrote. That is not a business development assignment. That is the company’s institutional memory being pointed at the problem.

What makes the situation urgent is that the judgment Hsinchu accumulated is exactly what Hillsboro is now losing.

The process techs laid off at National Semiconductor in the 1980s, whose departure preceded an immediate yield collapse, are the oldest story in semiconductor manufacturing — management cutting what it couldn’t measure, losing what it couldn’t replace. Intel’s layoffs are running the same script right now, with engineers who ran 14nm and Intel 4 through production ramps walking out the door carrying knowledge no machine has recorded.

The fab managers executing those layoffs made locally rational decisions. So did every executive who offshored assembly, every investor who rewarded the fabless model, every university that defunded process engineering programs because students wanted to study AI. Each decision looked good on its own terms. The aggregate is TSMC’s Arizona operation running identical machines to Taiwan at lower yields — because the machines transferred and the judgment didn’t.

You cannot allocate budget for the intuition that forms when enough people who know what they are doing work in close enough proximity for long enough that the knowledge stops being individual and becomes environmental. The CHIPS Act funded the machines. It didn’t fund the community. AI is making this worse in a way the policy apparatus hasn’t fully reckoned with.

AI is doing to semiconductor process engineering education what offshoring did to semiconductor manufacturing employment — making it invisible as a career path at exactly the moment it becomes strategically critical. A Stanford graduate who could become a process integration engineer at Hillsboro is instead becoming a machine learning researcher at a billion-dollar startup. The pay, the status, and the trajectory all point the same direction, and it isn’t toward a fab.

AI runs on chips. The foundation models, inference engines, training clusters — all of it requires leading-edge silicon, which requires the process engineering expertise the AI industry is pulling talent away from. The demand that makes AI possible is eroding the supply chain that makes AI possible. The next generation of potential process engineers is being pulled into AI’s orbit before they ever acquire the know-how that needs recording. The reservoir isn’t just draining from the top. It is failing to refill from the bottom.

Musk isn’t requesting Intel’s most sophisticated capability. Tesla’s inference chips for Optimus and FSD are demanding but structurally simpler than the hyperscaler XPUs that exposed Intel’s yield limitations with Broadcom. Where Broadcom throws fastballs — face-to-face stacking, “Correct By Construction” yield requirements, zero tolerance for first-pass failure — Musk lobs softballs. Single-die inference chips, EMIB packaging Intel already owns, volumes that ramp gradually. Intel can hit that. Every wafer it runs for Tesla is a yield learning cycle on 18A — batting practice that makes the fastball more hittable the next time Broadcom steps onto the mound.

Intel also has something Terafab couldn’t acquire independently: ASML machines already installed, allocated, and running on a node with no external customer queue. Musk doesn’t need to build a fab. He needs to fill one that already exists and is currently running below its potential. TSMC’s allocation is spoken for years out. Samsung is committed. There is no line to cut at either foundry. Intel is sitting on capacity the rest of the industry can’t offer.

Pan Am’s 747 order didn’t make Boeing immediately competitive with Concorde. It kept Boeing solvent and learning while the prestige race burned money in the background. Musk’s inference chip orders may do the same for Intel Foundry — keeping the organization alive, building process confidence, absorbing displaced Hillsboro talent into a program with real volume and real deadlines before that talent disperses entirely.

The skepticism is warranted. Brad Gastwirth, global head of research at supply chain firm Circular Technology, noted last week that while “the ambition implied is significant,” visibility into execution remains limited. “There is no defined timeline to high volume manufacturing, no disclosure around capital intensity or cost per wafer, and no guidance on yield ramp expectations — which are critical given how sensitive advanced node production remains.” Those gaps are real. Tan’s promise to disclose the scope in coming weeks will either close them or widen them.

The warning the Pan Am analogy carries is the one Boeing learned the hard way. Pan Am pushed faster than the engineering was ready, contributing to early reliability problems that nearly killed the 747 before it found its footing. Musk’s timelines carry the same risk. The customer who saves you can also be the customer who breaks you if the pressure outruns the capability. Tweets never rescued a process node that couldn’t sustain yield.

The machines are extraordinary. The equipment suppliers encoded more process expertise into capital than any industry in history. But yield still lives in people. And the people are leaving. The window is closing — one career change at a time, the same way the knowledge was lost.

Tan’s memo said Intel is a natural partner. The 747 flew. Whether Intel’s version does is the question the next few years will answer.



Live Event: Engineering the Future of AI Systems
by Daniel Nenni on 04-21-2026 at 8:00 am


The rapid acceleration of artificial intelligence (AI) workloads is placing unprecedented demands on system design, validation, and performance optimization. To address these challenges, Keysight Technologies presents its forward-looking event, Engineering the Future of AI Systems—a technical deep dive into the tools, methodologies, and measurement strategies required to build next-generation AI infrastructure.

The Keysight DES Roadshow brings together experts across Design & Verification, CAE, Software Test Automation, Data Management, and more to show how organizations can build a complete foundation for AI-driven engineering.

When: 4/29/26, 9:30 AM – 3:00 PM
Where: Keysight Offices, Santa Clara, CA

RESERVE YOUR SPOT

This event is tailored for engineers, system architects, and R&D leaders working across data centers, high-performance computing (HPC), and advanced semiconductor ecosystems. As AI models grow in complexity—driven by larger parameter counts, distributed training, and real-time inference requirements—the underlying hardware and interconnect architectures must evolve in lockstep. Keysight’s session provides a rigorous examination of how to design, validate, and scale these systems efficiently.

At the core of the discussion is signal integrity and high-speed data transfer. AI systems depend heavily on ultra-fast interconnects such as PCIe Gen5/Gen6, CXL, and high-bandwidth memory interfaces. The event explores how engineers can accurately characterize channel performance, mitigate jitter and noise, and ensure compliance with emerging standards. Using advanced measurement techniques and simulation workflows, Keysight demonstrates how to reduce design risk while accelerating time-to-market.

Another focal point is power integrity and thermal management—two critical constraints in dense AI compute environments. As GPUs and AI accelerators push power envelopes higher, maintaining stable voltage delivery and managing heat dissipation becomes increasingly complex. The event outlines best practices for dynamic power analysis, transient response validation, and system-level thermal modeling. These insights are essential for sustaining performance under real-world workloads while avoiding reliability issues.

Keysight also addresses the growing importance of co-design across hardware and software layers. Modern AI systems are no longer optimized in silos; instead, they require tight integration between silicon design, firmware, and application workloads. The event highlights how measurement-driven design approaches can bridge these domains, enabling engineers to validate performance against actual AI use cases rather than synthetic benchmarks alone.

In addition, attendees will gain exposure to cutting-edge test automation and digital twin methodologies. By leveraging virtual prototyping and automated validation frameworks, engineering teams can iterate more rapidly and identify bottlenecks earlier in the design cycle. Keysight showcases how these techniques reduce costly redesigns and improve overall system robustness.

The event also touches on scalability challenges in AI clusters and hyperscale data centers. Topics include high-speed networking validation, synchronization across distributed systems, and latency optimization. As AI workloads increasingly rely on parallel processing across thousands of nodes, ensuring deterministic performance and minimal communication overhead is crucial. Keysight’s expertise in network emulation and protocol testing provides actionable guidance for addressing these issues.

A distinguishing feature of Engineering the Future of AI Systems is its emphasis on practical application. Rather than remaining purely theoretical, the session incorporates real-world case studies and measurement scenarios drawn from leading-edge AI deployments. This approach allows participants to directly translate insights into their own design and validation workflows.

Ultimately, this event positions Keysight at the forefront of AI system engineering, offering a comprehensive toolkit for tackling the most pressing technical challenges in the field. For organizations striving to remain competitive in the AI race, the ability to design reliable, high-performance systems is no longer optional—it is foundational.

By attending, engineers will not only deepen their understanding of AI infrastructure complexities but also gain access to proven methodologies that streamline development and enhance system confidence. In an era where innovation cycles are shrinking and performance expectations are soaring, Keysight’s expertise provides a critical advantage in engineering the future of AI systems.

Agenda

9:30 AM Doors Open
10:00 AM Keynote: Engineering the Future of Design with AI
10:05 AM Design & Verification: The AI Hardware Revolution—Are Your Design Flows Ready?
10:35 AM Computer-Aided Engineering: From Simulation to Insight—The Rise of Predictive Engineering
11:05 AM Software Quality Engineering: Beyond Scripts—AI-Driven Testing from the User’s Perspective
11:35 AM Engineering Data Management: From Data Chaos to AI-Ready Engineering
12:05 PM Optical Design Engineering: Designing the Next Generation of Intelligent Optical Systems
12:35 – 1:15 PM Lunch + Hands-On Demo Stations
1:15 – 3:30 PM Breakout Sessions

RESERVE YOUR SPOT

Breakout Sessions

Design and Verification

      • The AI Hardware Challenge: Why Next-Generation Design Requires a New EDA Platform
        Understand the growing complexity of AI hardware and why unified EDA platforms are essential to scale design, simulation, and verification.
      • AI-Designed Analog Chips: From Research to Real-World Design
        Explore how AI is transforming analog design with real-world examples and forward-looking insights.
      • PowerArtist: Building Energy-Smart Chips
        Learn how early RTL power analysis enables faster, more efficient chip design and helps prevent costly late-stage issues.

Software Test Automation

      • Testing from an End User’s Perspective with Keysight Eggplant
        Validate software exactly as users experience it, moving beyond scripts with AI-driven testing.

Computer-Aided Engineering

      • Virtual Prototyping Redefined
        Accelerate development with predictive, real-time, and immersive simulations across multiphysics domains.

Engineering Data Management

      • AI-Ready Engineering Data: Preparing for AI with SOS Enterprise
        Transform fragmented design data into structured, governed, AI-ready assets that enable scalable innovation.

RESERVE YOUR SPOT

Also Read:

Analog Bits Demos Real-Time On-Chip Power Sensing and Delivery on N2P at the TSMC 2026 Technology Symposium

WEBINAR: Beyond Moore’s Law and The Future of Semiconductor Manufacturing Intelligence

When a Platform Provider Becomes a Competitor: Why Arm’s Silicon Strategy Changes the Incentives


proteanTecs at Chiplet Summit – Changing the Game for Health & Performance Monitoring of Chiplets
by Mike Gianfagna on 04-21-2026 at 6:00 am


The recent Chiplet Summit 2026 was a great place to learn about new chiplet designs, emerging standards, and a growing array of support technologies to help design and manufacture chiplet-based systems. In my travels at the show, I found a lot of technology that fit these descriptions. But there were also companies at the show that took a different approach to support chiplet design. We’re all aware of the importance of a well-centered design that delivers optimal performance and power consumption in the smallest footprint.

Starting with a solid design that is extensively verified before tapeout is an important step to achieve this goal. But production and real-world uncertainties can create problems for even the best design. proteanTecs is one of the companies that uses novel technology to approach this problem. Their approach delivers accurate information about the chip throughout its lifetime and takes action on this data to ensure consistent health while optimizing performance and power. And all this is done without impacting die size. Let’s take a look at how proteanTecs is changing the game for health & performance monitoring of chiplets.

My Tour Guide

Nir Sever

I met Nir Sever, Senior Director of Business Development at proteanTecs, in the company’s booth at Chiplet Summit. Nir has been with proteanTecs for over six years. Previously, he had a long history in executive management, chip design, and design methodology at companies such as Tehuti Networks, 3dfx, Cadence, and Zoran.

Nir explained the significant capabilities offered by proteanTecs and he demonstrated how it all worked in a live demo on the show floor with a real commercial application.

The Demo

The first point Nir covered was the scope of what proteanTecs delivers to its customers. proteanTecs offers monitoring that goes well beyond the simple concept of embedded sensors. Leveraging its proprietary ML-driven software engine, the company analyzes the details of the design, examining elements such as block size and the number of clock and power domains.

Demo System

Then, based on this analysis, a recommended configuration of monitoring agents and Hardware (HW) Monitoring System Infrastructure is deployed. Once this configuration is agreed to, proteanTecs software does additional analysis to determine where to place each of the agents. A key point here is that this work is done *after* the design is placed and routed, so the agents literally fit in the white space of the design, avoiding any additional overhead. Agent placement is an important step: it optimizes proximity to the areas being monitored and enables tight interaction between the agents.

Once implemented, the HW Monitoring System delivers critical information about the circuit without impacting die size or performance. Nir demonstrated the solution, which is compatible with all major implementation flows, on an Alphawave Semiconductor system with a datacenter-grade high-speed optical networking chip built in a TSMC 5nm process.

The diagram below shows the configuration and placement of the agents for the demo circuit.

Configuration and Placement of the Agents

Once the Agents are embedded in the design, Nir highlighted some key capabilities. The first capability he showed was Continuous Performance Monitoring, which periodically reads the information from all agents and visualizes it. This information helps system designers understand how different chip configurations or workloads impact overall performance. It’s similar to a logic analyzer or virtual scope embedded in the system. The diagram below is a screen capture of a fully populated display.

Continuous Performance Monitoring

Margin to Timing Failure tracks the critical path for the chip. Each of the traces in this display represents the information from one of the Margin Agents (blue dots) in the “Floorplan” figure. DC Voltage displays the voltage at various locations. V&T Stress shows the impact of voltage and temperature stress over time. Frequency depicts the stress from the toggle rate, which reflects aging in the chip. Clock Cycle-to-Cycle Jitter measures noise on the clock, and Effective Cycle Time Delta measures power supply noise. The result is a complete view of chip status and behavior delivered in real time.

This mode passively displays the chip’s performance. Next, Nir moved to a mode where the embedded system performed active optimization and enabled adaptive voltage scaling. He explained that a typical chip operates with a voltage that has sufficient margin to ensure the chip operates correctly over its lifetime. While this strategy delivers predictable performance, it wastes power in the early stage of the chip’s lifetime, when it can safely operate at a lower voltage.

The adaptive voltage scaling function delivers the capability to “tune” the operating voltage to a safe level over the lifetime of the chip. The diagram below summarizes how this works.

Adaptive Voltage Scaling

The left display has the familiar Margin to Timing Failure traces. The green line below the traces shows the safe minimum timing value. You can see the system’s default operation is significantly above this value, meaning it is operating at an unnecessarily high voltage. The system then begins to lower the operating voltage as shown on the right. This continues until the chip is operating at the lowest safe timing threshold. On the far left, you can see a power saving of 11.64% has been achieved. And because the chip is operating at a lower voltage the lifetime of the chip has been extended by 16.28%. Nir explained that the code to perform this optimization is embedded in the on-chip application. It is completely self-contained and requires no external cloud access.

To demonstrate the safety net, Nir forced a lower operating voltage to drive the system below the safe operating margin. Within two clock cycles the problem was sensed and corrected, keeping the chip exactly at the minimum safe operating point. This is one of many potential applications that can be deployed using proteanTecs. Nir explained that customers can develop their own applications as well. Since most of the processing is done on the embedded hardware, the size of a typical application is quite small at 10 – 50 kilobytes. It became clear to me that the proteanTecs technology delivers safe, robust operation across the lifetime of the chip with very little overhead.
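
To make the adaptive loop concrete, here is a minimal Python sketch of a control step like the one described above. The agent interface, thresholds, and step size are illustrative assumptions, not proteanTecs’ actual firmware API.

```python
# Minimal sketch of an adaptive voltage scaling control step.
# Agent interface, thresholds, and step size are illustrative
# assumptions, not proteanTecs' actual firmware API.

class MarginAgent:
    """Stand-in for an embedded Margin Agent reporting timing slack."""
    def __init__(self, slack):
        self.slack = slack

    def margin(self):
        return self.slack

SAFE_MARGIN = 0.05   # minimum safe margin-to-timing-failure (normalized)
STEP_MV = 5          # regulator step per control cycle, in millivolts

def avs_step(vdd_mv, agents):
    worst = min(a.margin() for a in agents)   # worst case across the die
    if worst < SAFE_MARGIN:
        return vdd_mv + STEP_MV               # safety net: raise voltage now
    if worst > SAFE_MARGIN * 1.5:
        return vdd_mv - STEP_MV               # excess margin: trim for power
    return vdd_mv                             # hold at the minimum safe point

agents = [MarginAgent(0.12), MarginAgent(0.09)]
print(avs_step(750, agents))   # 745: margin is ample, so voltage steps down
```

The real system runs this kind of logic on-chip within a couple of clock cycles; the sketch only shows the decision structure, not the latency or the regulator interface.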

To Learn More

During my time with Nir at the proteanTecs booth I saw an impressive demonstration of what proteanTecs can deliver to optimize any design over its lifetime. This discussion only scratches the surface of what the company offers.

You can learn more about the power reduction solution and AVS Pro here. You can also learn more about the extensive capabilities offered by proteanTecs here. And that’s proteanTecs at Chiplet Summit – changing the game for health & performance monitoring of chiplets.

Also Read:

Intelligent Networks: Power, Reliability, and Maintenance in Telecom — Webinar Preview

Accelerating NPI with Deep Data: From First Silicon to Volume

Failure Prevention with Real-Time Health Monitoring: A proteanTecs Innovation


WEBINAR: Intrinsic Techniques in RF Power Amplifier Design
by Don Dingee on 04-20-2026 at 10:00 am

Intrinsic node overview

Load-pull power amplifier (PA) design techniques determine the optimal impedances at the power transistor’s extrinsic reference plane, which is the physically accessible boundary for measurement or simulation. This reference plane can be the package transistor leads, die bond pads, or IC chip terminals. It includes the parasitic resistance, capacitance, and inductance inside the GaN device. A simulation-based GaN PA design can supplement this load-pull technique by providing access to the power device’s intrinsic current source. Enhanced GaN models in Keysight’s Advanced Design System (ADS), along with intrinsic techniques like load-line analysis, help RF engineers design high-frequency, highly efficient PAs.

The second webinar session in Keysight’s RF PA design master class series, led by Matt Ozalas, Principal Product Owner for ADS and Scientist, and Joe Schultz, RF Solutions Engineer, explores intrinsic techniques in detail. (The first session focused on extrinsic techniques, particularly load-pull analysis.) Webinar 2 introduces a simple Class J PA design with an intrinsic-node model, examines the history of intrinsic modeling and analysis techniques, and walks through in-depth simulations of the Class J amplifier in an ADS intrinsic analysis workspace.

Waveform engineering applied to a Class J PA

An intrinsic node is a virtual construct designed for simulation. Matt starts with a simplified GaN transistor model that surrounds an ideal current generator with non-ideal parasitic elements. Representing parasitics faithfully increases behavior fidelity, which in turn enables a more accurate simulation of efficiency. In the chart below, power tracks between extrinsic and intrinsic analyses, but drain efficiency differs considerably between the two techniques, with the intrinsic analysis being more accurate.

That’s important for a Class J PA, which begins with a construct similar to a Class B amplifier but tunes the harmonic load impedances so that power delivery is zero at the harmonics, theoretically allowing maximum PA efficiency at full voltage swing. Reaching that efficiency requires waveform engineering, strategically reducing the overlap between voltage and current waveforms to minimize resistive power dissipation. “Once we’ve designed these waveforms mathematically, it’s possible to apply a Fourier transform, and then, rather than looking at time-domain waves, we look at harmonic frequency tones,” Matt says. Voltage and current tones at each harmonic frequency yield computed impedances. By applying those Fourier-transformed impedances back to the idealized current generator in the intrinsic transistor model, energy organizes itself to maximize achievable efficiency.
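
To make that Fourier step concrete, here is a toy Python sketch with stand-in waveforms (a half-rectified, Class B-like current and a simple sinusoidal voltage swing, not the actual engineered Class J shapes from the webinar): transform the time-domain waves, read off the harmonic tones, and compute the load impedance each harmonic sees.

```python
import numpy as np

N = 1024
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)

# Stand-in intrinsic waveforms in normalized units -- placeholders for
# the mathematically engineered voltage/current a designer would supply.
i_d = np.maximum(np.cos(theta), 0.0)   # half-rectified (Class B-like) current
v_ds = 1.0 - np.cos(theta)             # voltage swing about a 1.0 V supply

# Fourier transform: time-domain waves -> harmonic frequency tones
I = np.fft.rfft(i_d) / N
V = np.fft.rfft(v_ds) / N

# Impedance presented to the current generator at each harmonic
for n in range(1, 4):
    if abs(I[n]) > 1e-12:
        Z = -V[n] / I[n]               # minus: current flows into the load
        print(f"harmonic {n}: Z = {Z:.3f}")
    else:
        print(f"harmonic {n}: no current tone (open termination)")
```

With these stand-ins the fundamental sees a real load and the second harmonic sees a short; the engineered Class J waveforms reshape those terminations into the reactive harmonic impedances the text describes.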

Building comprehensive intrinsic analysis in an ADS workspace

Joe is a recent addition to the Keysight team, with three decades of RF power amplifier design experience spanning Motorola, Freescale, and NXP. He begins with a bit of a digression, mentioning the power amplifier work of Steve Cripps, particularly his 1999 text, “RF Power Amplifiers for Wireless Communications,” and a seminal 2009 article by Paul Tasker on “Practical Waveform Engineering” in IEEE Microwave Magazine.

Joe also shares his own design experience. He worked with Motorola’s LDMOS transistor technology in his early career, which was state-of-the-art at the time. GaN technology can deliver four or five times the power density of LDMOS in amplifier applications, but only if designers pay proper attention to efficiency and power dissipation in PA designs.

After some slides on LDMOS versus GaN, Joe dives into the core of his presentation: a highly technical discussion of an intrinsic approach to RF PA design. The foundation of the analysis is Cripps’ load-line method, which uses swept DC current-voltage (DCIV) analysis with full-scale voltage and current excursions to generate power contours by tracing constant-resistance and constant-conductance circles for different power points on a Smith chart.

Joe also discusses conduction angle, the portion of the RF cycle where current conduction occurs and one of the differentiators between amplifier classes. The presentation then builds into a GaN FET analysis in ADS, using a new Keysight ASM-HEMT GaN FET demo model and running DCIV, S-parameter, AC sweeps, harmonic balance, and both load-pull and load-line analysis to establish matching impedances. The power of ADS’s data display with the full results is evident in one of Joe’s concluding slides, though a screenshot alone doesn’t do the narrative justice.

If you are interested in RF power amplifier design or learning more about extrinsic load-pull and intrinsic load-line analysis with ADS, hearing from Keysight’s experts makes the time spent viewing these two webinars worthwhile – and yes, the ADS workspaces used in these webinars are available online for hands-on learning. Register for both on-demand sessions at this link:

RF Power Amplifier Design MasterClass Webinar Series

Also Read:

WEBINAR: Two-Part Series on RF Power Amplifier Design

On the high-speed digital design frontier with Keysight’s Hee-Soo Lee

2026 Outlook with Nilesh Kamdar of Keysight EDA


Analog Bits Demos Real-Time On-Chip Power Sensing and Delivery on N2P at the TSMC 2026 Technology Symposium
by Mike Gianfagna on 04-20-2026 at 6:00 am


Analog Bits has a way of stealing the show at every event they attend. The formula is actually quite straightforward: come to the show with the most relevant, highest-impact IP running on the most advanced process. The company will be applying this strategy again at the upcoming TSMC 2026 Technology Symposium with an array of real-time on-chip sensing and delivery IP on TSMC’s N2P process.

The latest megawatt AI and HPC systems, built on multi-kilowatt SoCs, face thermal, power efficiency, performance variability, and reliability challenges that digital design alone cannot solve. Traditional approaches are no longer effective as transistors speed up and voltages scale down, creating ever-increasing power density. Multi-chip packaging compounds the power problems as well.

Addressing these power density challenges requires new approaches that leverage advanced processes along with architectural-level optimizations to achieve power targets. These are the capabilities Analog Bits will bring to the TSMC event. Let’s look a bit closer as Analog Bits demos real-time on-chip power sensing and delivery on N2P at the TSMC 2026 Technology Symposium.

The Hardware

Below is a photo of the TSMC N2P test board with the test chip inserted.

ABITCN2P2 Test Board : Test Chip

And here is the test chip layout.

ABITCN2P2 Test Chip Layout

The Demos

Here is an overview of some of the capabilities that you will see at the show.

LDO: A linear regulator with a small difference (dropout) between input and output voltage, for example 50-100mV. Benefits of the on-die LDO include improved power efficiency and signal integrity, fast transient response and efficient regulation, voltage scalability, integration and space savings, and noise reduction.

Target customer use cases include high-performance CPU (ARM) cores and high lane count high performance SERDES. There are multiple working silicon examples in N3P. The N2P LDO delivers a 30% area reduction and ultra-high bandwidth operation.
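
As a rough illustration of why a small dropout matters (the rail voltage below is an assumed example, not an Analog Bits specification), an ideal linear regulator’s efficiency is bounded by the ratio of output to input voltage:

```python
# Rough LDO efficiency bound: eta <= Vout / Vin for an ideal linear
# regulator. The 0.75 V rail is an assumed example, not an Analog Bits spec.
v_out = 0.75                   # volts, hypothetical core rail
for dropout in (0.05, 0.10):   # the 50-100 mV range cited above
    v_in = v_out + dropout
    print(f"dropout {dropout * 1000:.0f} mV -> efficiency <= {v_out / v_in:.1%}")
```

At a 0.75 V rail, a 50 mV dropout bounds efficiency near 94%, whereas a regulator burning 250 mV of headroom at the same rail would be capped at 75%. That arithmetic is why a tight dropout is worth the design effort.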

The Droop Detector provides an always-on sensor that detects security hacks on TSMC N2P.

The Glitch Catcher provides high-frequency glitch detection for high reliability on TSMC N2P.

The Ultra Low Power PLL supports microwatt-class low-power applications on TSMC N2P.

A Low Jitter C2C PLL (up to 20GHz), ultra-low power PLL, and patented pinless core powered PLL are all available on TSMC N2P.

A high accuracy PVT sensor and pinless high accuracy PVT sensor are both available on TSMC N2P.

Also being featured for the first time are highly accurate remote pinless PVT sensors with +/-3.5°C accuracy (untrimmed), and low-power PLLs featuring microwatt-class power levels at 0.5 microwatt/MHz.

These new IPs will provide significant advantages to customers seeking PPA optimization and intelligent on-chip power management for advanced SoCs on TSMC N2P process technology. Mahesh Tirupattur, CEO of Analog Bits, commented, “Our integrated on-die LDO provide an exceptionally clean form of power delivery, coupled with the glitch catcher and droop detector features, making power observable in real-time and enabling fast corrective actions to be taken almost instantaneously.”

To Learn More

The 2026 TSMC Technology Symposium will be held on April 22, 2026, at the Santa Clara Convention Center. You can register for the event here. Be sure to stop by Analog Bits at booth #608 to see a demo of over 12 IPs on the TSMC N2P test chips. And that’s how Analog Bits demos real-time on-chip power sensing and delivery on N2P at the TSMC 2026 Technology Symposium.

Also Read:

2026 Outlook With Mahesh Tirupattur of Analog Bits

Podcast EP322: A Wide-Ranging and Colorful Conversation with Mahesh Tirupattur

Analog Bits Steps into the Spotlight at TSMC OIP


Disaggregating LLM Inference: Inside the SambaNova Intel Heterogeneous Compute Blueprint
by Daniel Nenni on 04-19-2026 at 2:00 pm


SambaNova Systems and Intel have introduced a blueprint for heterogeneous inference that reflects a significant shift in how modern large language model (LLM) workloads are deployed. Instead of relying on a single accelerator type, the proposed architecture assigns different phases of inference to specialized hardware: GPUs for prefill, SambaNova Reconfigurable Dataflow Units (RDUs) for decode, and Intel® Xeon® 6 CPUs for agentic tools and orchestration. This design addresses the growing complexity of agentic AI systems, where reasoning loops, tool calls, and iterative execution create heterogeneous compute demands that cannot be efficiently served by homogeneous accelerator clusters.

At the core of the proposal is the observation that inference is not a monolithic workload. It consists of distinct computational phases with different performance bottlenecks. The prefill phase processes the user prompt, computes attention matrices, and builds key–value caches. This stage is highly parallel and compute-intensive, making GPUs the most efficient hardware choice. GPUs excel at dense matrix operations and high-throughput tensor math, allowing rapid ingestion of large prompts and minimizing time-to-first-token latency. By isolating prefill onto GPU resources, the architecture ensures high utilization of GPU compute capabilities without wasting cycles on sequential token generation.

Following prefill, the workload transitions to the decode phase, where tokens are generated one at a time. Decode is fundamentally different from prefill: it is memory-bandwidth bound and heavily dependent on efficient access to attention caches. GPUs, while powerful, often underperform in decode scenarios because their architecture is optimized for large batched operations rather than sequential token generation. SambaNova’s RDUs are designed specifically for dataflow-oriented execution, enabling optimized memory access patterns and efficient handling of transformer inference during decode. This specialization improves token throughput and reduces latency, especially for long-context or multi-step reasoning workloads.

The third component of the architecture is the use of Intel® Xeon® 6 CPUs for agentic tools and orchestration. Agentic AI systems increasingly involve external actions such as database queries, API calls, code execution, and workflow management. These tasks are not well suited for accelerators but benefit from general-purpose CPU capabilities, large memory footprints, and mature software ecosystems. Xeon 6 processors act as the control plane, coordinating execution between GPUs and RDUs while also handling tool invocation, validation, and decision logic. This separation allows accelerators to remain focused on model inference while CPUs manage procedural logic and integration with enterprise systems.
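
Taken together, the three phases suggest a control plane along the lines of this minimal sketch. All names and interfaces here are hypothetical placeholders for illustration, not a real SambaNova or Intel API.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    max_tokens: int

async def prefill_on_gpu(req: Request) -> dict:
    # Phase 1: compute attention over the full prompt and build the KV
    # cache (parallel and compute-bound, so it goes to the GPU pool).
    return {"kv_cache": f"<kv for {len(req.prompt)} chars>"}

async def decode_on_rdu(kv_cache: str, max_tokens: int) -> str:
    # Phase 2: generate tokens one at a time against the cache
    # (sequential and bandwidth-bound, so it goes to the RDU pool).
    return " ".join(f"tok{i}" for i in range(max_tokens))

def run_tool_on_cpu(name: str, args: dict) -> str:
    # Phase 3: tool calls, validation, and workflow logic stay on CPUs.
    return f"result of {name}({args})"

async def serve(req: Request) -> str:
    state = await prefill_on_gpu(req)
    draft = await decode_on_rdu(state["kv_cache"], req.max_tokens)
    if "weather" in req.prompt:            # a toy agentic tool step
        draft += " | " + run_tool_on_cpu("weather_api", {"city": "Austin"})
    return draft

print(asyncio.run(serve(Request("What is the weather?", 8))))
```

In a real deployment the three functions would be RPCs into independently scaled GPU, RDU, and CPU pools; the point of the sketch is only the phase-to-hardware routing.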

This heterogeneous architecture delivers several system-level benefits. First, it improves hardware utilization by ensuring each processor operates within its optimal performance envelope. GPUs handle parallel compute-heavy tasks, RDUs manage memory-bound token generation, and CPUs execute control and orchestration logic. Second, the design enhances scalability for agentic workloads. As agents perform multiple reasoning steps, decode latency accumulates; specialized RDUs mitigate this bottleneck. Third, the architecture enables modular infrastructure scaling, allowing organizations to independently scale GPU, RDU, and CPU pools depending on workload demands.

Another key advantage is improved cost efficiency. GPU-only deployments often suffer from underutilization during decode or orchestration phases. By offloading those tasks to specialized hardware, the system reduces the need for excessive GPU capacity. This approach aligns with emerging data center trends that emphasize disaggregated compute and composable infrastructure. Additionally, using x86-based CPUs for orchestration ensures compatibility with existing enterprise software stacks, reducing integration complexity.

The blueprint also highlights the evolution of AI workloads toward agentic reasoning systems. Traditional chat-style inference involved single-pass generation, but modern agents iteratively plan, execute, and refine outputs. These workflows create alternating compute patterns: dense prompt processing, sequential decoding, and CPU-driven tool execution. A heterogeneous architecture maps naturally to this pattern, reducing performance bottlenecks and improving responsiveness.

In summary, the SambaNova–Intel blueprint demonstrates a practical pathway toward next-generation AI infrastructure. By combining GPUs for prefill, RDUs for decode, and Xeon 6 CPUs for agentic tools, the architecture reflects a shift from homogeneous accelerator clusters to specialized compute fabrics. This design improves performance, utilization, and scalability for agentic AI workloads, and it signals how future AI data centers may evolve to support increasingly complex reasoning systems.

Building the Blueprint for Premium Inference

Also Read:

Intel, Musk, and the Tweet That Launched a 1000 Ships on a Becalmed Sea

Agentic AI Demands More Than GPUs

Silicon Insurance: Why eFPGA is Cheaper Than a Respin — and Why It Matters in the Intel 18A Era


CEO Interview with Johan Wadenholt Vrethem of Voxo
by Daniel Nenni on 04-19-2026 at 12:00 pm


With over two decades of experience bridging technology and business, Johan Wadenholt Vrethem has focused on harnessing AI to transform how organizations operate and engage with their customers. After leading critical digital initiatives and client engagements in the banking and finance sectors at CGI, Johan co-founded Voxo to drive innovation in conversational analytics and event technology.

Today, through Voxo Insights and Voxo Event, the company delivers real-time, AI-powered understanding of customer interactions and event discussions—empowering teams to act on data in ways they never thought possible.

Tell us about your company.

Voxo is an AI event content partner. We capture everything spoken on stage: keynotes, panels, and roundtables, and turn it into structured, branded, shareable content within minutes of a session ending.

We started in Stockholm in 2016, initially in conversational analytics for financial advisory and customer service. That product taught us what it actually takes to build reliable speech AI in demanding real-world environments, with accuracy requirements, speaker variations, and real-time latency.

The pivot to events came from a clear market signal. We summarized all stage sessions at Techarena 2024, Scandinavia’s largest tech conference. The demand afterward was immediate, and we decided to launch this as an event-specific product.

Today, we work with enterprise customers globally. Partners include event tech platforms such as RainFocus and Amego, and customers such as HubSpot, GitHub, and Intuit.

What problems are you solving?

Events are among the most information-dense environments in the world, and almost all of that intelligence evaporates the moment a session ends. A speaker walks off stage after 45 minutes, and without a summary the insights are effectively gone. Our summaries are live minutes after the speakers walk off the stage, with key takeaways that are branded and ready to post on social media.

On the production side, creating post-event content from hundreds of sessions takes marketing teams weeks. By then, the audience has moved on. Our customers tell us that for them to create what Voxo delivers in a day would have taken their teams months.

For attendees, the service adds more value to the event experience. It’s impossible to be in three sessions at once. With Voxo, they get a summary of every session they missed, available immediately. At The AI Summit New York, that meant 25,000 summary downloads across roughly 200 sessions. The content doesn’t expire; it can be used for future marketing efforts to promote next year’s event.

What application areas are your strongest?

Enterprise conferences and large-scale industry summits. Multi-day, multi-track events with demanding quality standards and marketing teams that need to move fast. We’re also about to launch a self-service SaaS platform for events that will make it easier for smaller organizations and enterprise organizers to use the whole event agenda in their content schedule during and after the event.

What keeps your customers up at night?

Accuracy. Publishing something inaccurate for a global event brand is a real risk. One wrong summary published with the wrong company name or speaker could be damaging. That’s why we built a human-in-the-loop review into the workflow. We train our event specialists to review quickly so we don’t lose the speed advantage.

Other than that, it’s quality. We have to maintain quality across 200 sessions simultaneously, with sometimes imperfect audio, accented speakers, and no time to redo anything. That’s what we’re built for.

And beyond that, there’s the ROI question. Events are expensive. Sponsorships are expensive. So our customers need to show that their event is actually creating value for stakeholders. We measure every summary download and can connect it to the personal data from the event apps, so the sponsors get more visibility and also tangible ROI.

What does the competitive landscape look like and how do you differentiate?

There are a lot of transcription tools out there like Otter and Fireflies, and now a wave of AI tools built for events. But what’s different about our offer is that we’re not just another tool that generates summaries. We’ve focused on the layer around it: how content is actually captured, structured, branded, and quality-checked. That’s the hard part at scale, and event marketers know it better than anyone else.

Capturing and creating content at a multi-stage event is a massive undertaking, and we’re adding a layer of valuable content that benefits everyone from organizers and speakers, to attendees.

We’ve also seen on-demand viewing increase by up to 400% when summaries create that initial interest, so it feeds into the already existing event content structure as well.

What new features and technology are you working on?

We’re focusing on deepening the personalization for attendees. Right now, we deliver session summaries to everyone equally. The next step is understanding who the attendee is (their role, their interests, what they attended) and surfacing content most relevant to them. This is a feature we’re delivering for the first time in June with an enterprise client.

Within the year, we will release a fully customizable content lab with new formats, more interactive content, and deep integrations into social media platforms. Earlier this year, we released AI Podcasts that are created on-the-fly during the event, and we’re also experimenting with motion graphics and automated video creation to be released soon!

The self-service platform will be a major release for us, enabling medium and smaller companies (the vast majority of the market) to use the same tools we provide to our enterprise clients today.

How do customers normally engage with your company?

Event organizers or enterprise marketing teams often find us through a recommendation or from attending an event we’ve summarized. If it’s through an event, then they’ve experienced the attendee side of the summaries and are usually pretty intrigued right off the bat.

Or it’s through event platform partnerships like RainFocus or Amego. The organizer is already using their platform; they see Voxo as a partner, and the integration is smooth sailing from there.

The best salespeople are our existing happy customers, which is something we’re really proud of since we’re a new technology application in the event industry. So far, more than 90% of our clients from 2025 have signed up for the services again for this year.

Also Read:

CEO Interview with Dr. Hardik Kabaria of Vinci

CEO Interview with Steve Kim of Chips&Media

CEO Interview with Jussi-Pekka Penttinen of Vexlum


TSMC to Elon Musk: There are no Shortcuts in Building Fabs!
by Daniel Nenni on 04-17-2026 at 10:00 am


The opening of the TSMC 2026 earnings call series brought no surprises. CC Wei has done more than 30 such calls since taking the CEO position in 2018, and he never disappoints. Once again, CC Wei reported numbers above guidance, driven by strong demand and flawless execution. This illustrates the benefit of TSMC’s close collaborations and deeply trusted relationships with partners and customers. The TSMC forecast is the most trusted forecast the semiconductor industry will ever see, absolutely.

I do remember the one time CC Wei did disappoint on an earnings call, and that was during COVID, which was a painful supply chain lesson for all. CC Wei turned that COVID supply chain experience into a master class on why supply chain trust and resilience are so important, one that goes to the heart of the TSMC mission statement, and that is Trust.

“Our mission is to be the trusted technology and capacity provider of the global logic IC industry for years to come.”

As expected, TSMC N5 and N3 accounted for the majority of 2026 revenue, meaning that margins are also well above 60% and look to stay that way for the foreseeable future. TSMC N3 is also fast approaching the 5-year depreciation mark, so TSMC corporate margins will only go up from here.

As we discussed before, TSMC N3 is the final node in the record setting FinFET family of process technologies and it has ZERO competition in the merchant foundry business. I remember tracking design wins when N3 was first launched and realizing that TSMC N3 would be the most dominant process node I would ever see in my 40+ year semiconductor career and that is certainly the case as it stands today.

CC Wei: In Taiwan, we are adding a new 3-nanometer fab to our GIGAFAB cluster in Tainan Science Park. Volume production is scheduled for the first half of 2027. In Arizona, our second fab will also utilize 3-nanometer technologies. Construction is already complete and volume production will begin in the second half of 2027. In Japan, we now plan to utilize 3-nanometer technology in our second fab and volume production is scheduled in 2028.

CC Wei also discussed moving more N5 capacity to N3. Samsung has reportedly fixed their yield problems at 5/4nm so it makes complete sense for TSMC to focus on the higher margin N3 process technologies. Besides, it is easier to move a TSMC N5 design to TSMC N3 than to Samsung 4nm and much easier than moving a design to Samsung 3/2nm (GAA) so CC Wei’s strategy is clear and sound.

CC Wei: Next, let me talk about our N2 capacity expansion plan. Our practice is to prioritize the land in Taiwan to support the fast ramp of our newest node due to the need for tight integration with R&D operations. Today, our new node, N2, has already entered high-volume manufacturing in the fourth quarter of 2025 with good yield. N2 is ramping successfully in multiple phases at both Hsinchu and Kaohsiung site, supported by strong demand from both smartphone and HPC/AI applications.

In regards to TSMC N2, TSMC’s N3 dominance not only sets up customers for a smooth transition to the N2 process family, it brings forward the strongest ecosystem of partners the semiconductor industry has ever seen, which is a very big deal. There is little doubt that TSMC will dominate the 2nm process node. I’m just wondering how big the NOT TSMC market will be at 2nm? It was next to zero at 3nm due to the lack of competition. I hope 2nm will be different with Intel Foundry 18AP and Samsung Foundry SF2 offering viable alternatives to TSMC N2, and maybe even Rapidus 2nm.

The call was closed out with a TSMC A14 status update. Will TSMC A14 again dominate the foundry business? Or better yet: how big will the NOT TSMC market be at 14 angstroms? It is too soon to tell, but my guess would be that the NOT TSMC market will continue to grow due to supply chain concerns.

CC Wei: Finally, let me talk about our A14 status. Featuring our second-generation nanosheet transistor structure, A14 will deliver another full-node stride from N2, with performance and power benefit to address the insatiable need for high performance and energy efficient computing. Compared with N2, A14 will provide 10% to 15% speed improvement at the same power, or 25% to 30% power improvement at the same speed, and close to 20% chip density gain.

Our A14 technology development is on track and progressing well. We are observing a high level of customer interest and engagement from both smartphone and HPC applications. Volume production is scheduled for 2028. Our A14 technology and its derivatives will further extend our technology leadership position and enable TSMC to capture the growth opportunities well into the future.

Of course there were references to Elon Musk and Terafab during the Q&A. CC Wei offered Elon Musk some very sound advice:

CC Wei: Again, let me say that it takes two to three years to build a new fab. No shortcuts. And it takes another one to two years to ramp it up. Again, that’s a fundamental of foundry industry. And whether we try to win them back (Intel and Tesla), actually, they are still our customers and we are very confident in our technology position. And we work very hard to capture every piece of business possible.

Did you get that Elon? No shortcuts in semiconductor manufacturing.

In regards to CapEx, TSMC raised CapEx from $40-41B in 2025 to $52-56B in 2026, which is huge! CC Wei mentioned that TSMC would probably be at the high end of that range when asked during the Q&A. In my opinion TSMC will definitely be at the high end, and maybe even higher. It all depends on how well the NOT TSMC market is developing.

Also Read:

TSMC Technology Symposium 2026: Advancing the Future of Semiconductor Innovation

Global 2nm Supply Crunch: TSMC Leads as Intel 18A, Samsung, and Rapidus Race to Compete

TSMC Process Simplification for Advanced Nodes


Podcast EP342: The Evolution and Impact of Physical AI with Hezi Saar
by Daniel Nenni on 04-17-2026 at 6:00 am

Daniel is joined by Hezi Saar, Executive Director of Product Marketing at Synopsys. Hezi is responsible for the mobile, automotive, and consumer IP product lines and brings more than 20 years of experience in the semiconductor and embedded systems industries.

Dan explores the growing field of physical AI with Hezi, who explains that this discipline focuses on edge AI applications where the AI interacts with humans and the surrounding world. He goes on to describe some of the unique requirements of this field, including the ability to process text and vision information so the AI can take action. Hezi describes self-driving cars as one example.

He describes the unique and demanding latency, bandwidth and power requirements for physical AI applications. The impact of open standards is discussed, along with an assessment of the market’s future growth and expanding applications.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Speculation: Silicon’s Most Expensive Compulsion
by Admin on 04-16-2026 at 10:00 am


How Time-Based Scheduling Reclaims Silicon Wasted by Speculative Execution

By: Dr. Thang Tran, Founder and CTO, Simplex Micro

I have spent my career designing processor architectures, and I have reached an uncomfortable conclusion: a substantial fraction of the silicon area and power in modern high-performance processors exists not to compute results, but to hide from software the fact that instructions execute out of program order.

Out-of-order speculative execution was a necessary engineering choice for an era dominated by general-purpose workloads with predictable branches and abundant independent instructions. Today, the workloads that drive compute investment–AI inference, scientific simulation, and EDA tool execution–have dependency structures that defeat the assumptions of speculative execution. These workloads pay the full cost of speculation machinery while receiving little of its benefit.

The ratification of RISC-V’s RVA23 profile confirms the direction. By making the vector extension mandatory, RVA23 shifts the performance burden from speculative scalar execution to explicit vector parallelism–and in doing so, makes simpler, deterministic scalar cores viable for the first time in mainstream application processors.

Published microarchitecture research documents this overhead consistently. The reorder buffer, reservation stations, register renaming logic, and branch prediction structures together consume an estimated 30 to 50 percent of processor core area in aggressive out-of-order designs. Branch prediction alone accounts for more than 10 percent of total chip power in high-end implementations.

Simplex Micro’s Time-Based Scheduling (TBS) architecture applies this principle to the vector processing unit (VPU), where it matters most for AI workloads. The Simplex CPU is a conventional superscalar out-of-order design; TBS governs the VPU, which executes deterministically and non-speculatively. In the VPU, vector instructions dispatch when their data is ready–not when a predictor guesses they might be ready, not speculatively into a future that may need to be discarded. For AI applications computing many data elements in parallel, speculation would multiply wasted work across every element in the vector. Non-speculative execution is not a constraint; it is the correct architectural choice for this workload class. The silicon recovered from speculation machinery in the VPU can be reinvested in execution units, cache capacity, or core count. These are the resources that determine throughput on the workloads that matter now.

The Problem with Speculation

Out-of-order execution was developed to keep processor execution units busy during data dependency stalls. When instruction B depends on the result of instruction A, a processor that must wait for A before starting B wastes cycles. The out-of-order solution is to look ahead in the instruction stream, find instruction C that does not depend on A or B, and execute C while waiting for A to complete. But implementing out-of-order execution requires four categories of hardware that have no computational purpose–they exist only to support the speculation machinery itself.
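
To make the mechanics concrete, here is a minimal sketch of that scheduling problem in Python rather than hardware; the instructions, registers, and latencies are invented for illustration. An in-order machine stalls B behind A, while a machine that can look ahead slips the independent C into the gap.

    # Toy model (not a real pipeline): B depends on A, C is independent.
    # In-order issue stalls behind B; lookahead issue fills the gap with C.
    from dataclasses import dataclass

    @dataclass
    class Instr:
        name: str
        srcs: tuple     # register names read
        dst: str        # register written
        latency: int    # cycles to produce the result

    prog = [Instr("A", (), "r1", 4),        # long-latency producer (a load, say)
            Instr("B", ("r1",), "r2", 1),   # depends on A
            Instr("C", (), "r3", 1)]        # independent work

    def finish_time(prog, in_order=True):
        ready, cycle, finish = {}, 0, 0     # ready: register -> available cycle
        pending = list(prog)
        while pending:
            for i in pending:
                if all(ready.get(s, 1 << 30) <= cycle for s in i.srcs):
                    ready[i.dst] = cycle + i.latency
                    finish = max(finish, ready[i.dst])
                    pending.remove(i)
                    break                   # single-issue: one instruction per cycle
                if in_order:
                    break                   # in-order: a stalled head blocks everyone
            cycle += 1
        return finish

    print(finish_time(prog, in_order=True))   # 6 cycles: C stalls behind B
    print(finish_time(prog, in_order=False))  # 5 cycles: C issues during A's latency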

The reorder buffer holds instructions that have executed out of program order and tracks them until they can be retired in order, maintaining the architectural illusion of sequential execution. In the Intel P6 family and its descendants, the ROB also stores uncommitted register values, making it a heavily multi-ported structure. Published research characterizes it as “a complex multi-ported structure that dissipates a significant percentage of the overall chip power.” [1] Modern high-performance designs maintain 200 to 500 ROB entries simultaneously, each carrying instruction state, operand values, result data, and status bits tracking whether the instruction has completed speculatively.
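
As a rough picture of what each of those entries carries, consider the toy model below; the field names and sizes are mine, not any vendor's. Hundreds of such records, read and written by several pipeline stages every cycle, are what make the ROB a large, heavily ported SRAM.

    # Toy model of P6-style ROB state (illustrative field names, not a real design).
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ROBEntry:
        pc: int                        # instruction address
        dest_reg: Optional[int]        # architectural destination register
        result: Optional[int] = None   # uncommitted value, held until retirement
        executed: bool = False         # completed, but still speculative
        exception: bool = False        # faults are deferred to in-order retirement

    class ReorderBuffer:
        def __init__(self, entries: int = 256):
            self.slots: list[Optional[ROBEntry]] = [None] * entries
            self.head = self.tail = 0  # allocate at tail in fetch order, retire at head

        def can_retire(self) -> bool:
            # Retirement is strictly in order: only the head entry may leave,
            # preserving the illusion of sequential execution.
            e = self.slots[self.head]
            return e is not None and e.executed and not e.exception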

Reservation stations hold instructions after decode, waiting for operands to become available. Each cycle, the reservation station logic scans all in-flight instructions, detects ready operands through tag-matching comparisons, and selects instructions for dispatch. This comparison logic fires continuously and scales in complexity with the number of in-flight instructions–not with the number of instructions actually ready to execute.
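
A sketch of that wakeup scan makes the scaling visible. This is a toy software model of what hardware does with one comparator per (waiting operand, result bus) pair; the comparator count grows with entries times broadcast ports, regardless of how many instructions are actually ready.

    # Toy wakeup/select loop (not a real design): every in-flight entry is
    # scanned against every broadcast result tag, every cycle.
    def wakeup_and_select(entries, broadcast_tags):
        for e in entries:                        # scan all in-flight entries...
            for tag in e["waiting"] & broadcast_tags:
                e["waiting"].discard(tag)        # ...matching each broadcast tag
        return [e for e in entries if not e["waiting"]]   # ready to dispatch

    rs = [{"op": "add", "waiting": {"t7"}},
          {"op": "mul", "waiting": {"t7", "t9"}}]
    print(wakeup_and_select(rs, {"t7"}))         # only the add wakes up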

Register renaming eliminates false data dependencies–write-after-read and write-after-write hazards–that arise when instructions from different parts of the program reuse the same architectural register names within an out-of-order execution window. A processor with 16 architectural integer registers may maintain 180 to 256 physical registers to support renaming, along with rename tables and freelist management. Published analysis confirms: “All contemporary dynamically scheduled processors support register renaming to cope with false data dependencies… the ROB is a large multiported structure that occupies a significant portion of the die area and dissipates a sizable fraction of the total chip power.” [2]
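
A minimal rename-stage sketch (my own toy model, table sizes arbitrary) shows how a fresh physical register per write makes those WAW and WAR hazards disappear, and why the physical register file and rename tables must be so much larger than the architectural state:

    # Toy rename stage: every architectural write gets a fresh physical register,
    # so two writers of "r1" from unrelated code regions no longer conflict.
    free_list = list(range(16, 256))               # spare physical registers
    rename_map = {f"r{i}": i for i in range(16)}   # architectural -> physical

    def rename(dst, srcs):
        phys_srcs = [rename_map[s] for s in srcs]  # read current mappings
        rename_map[dst] = free_list.pop(0)         # fresh dest: no WAW/WAR hazard
        return rename_map[dst], phys_srcs

    print(rename("r1", []))   # (16, []) -- first writer of r1
    print(rename("r1", []))   # (17, []) -- second writer; value in 16 stays live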

Branch prediction allows the processor to speculatively execute instructions past unresolved conditional branches by guessing the outcome before the branch condition is evaluated. State-of-the-art predictors achieve over 95 percent accuracy but require multiple large prediction tables, branch target buffers, return address stacks, and indirect branch predictors. “High-end processors typically incorporate complex branch predictors consisting of many large structures that together consume a notable fraction of total chip power–more than 10 percent in some cases.” [3]
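
For contrast with those TAGE-class structures, the classic two-bit saturating-counter scheme below shows the basic shape of the state involved. It is a textbook predictor, orders of magnitude smaller than the EV8's 352 Kbits, yet even it must be read on every fetch.

    # Textbook 2-bit saturating-counter predictor (far simpler than TAGE-class
    # designs, but it shows why predictor state is per-fetch table storage).
    class BimodalPredictor:
        def __init__(self, index_bits: int = 12):
            self.table = [1] * (1 << index_bits)   # counters start weakly not-taken
            self.mask = (1 << index_bits) - 1

        def predict(self, pc: int) -> bool:
            return self.table[pc & self.mask] >= 2     # taken if counter in {2, 3}

        def update(self, pc: int, taken: bool):
            i = pc & self.mask
            if taken:
                self.table[i] = min(3, self.table[i] + 1)
            else:
                self.table[i] = max(0, self.table[i] - 1)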

The published characterization of the aggregate cost is unambiguous: “Out-of-order scheduling logic requires a substantial area of the CPU die to maintain dependence information and queues of instructions… A larger portion of the chip in out-of-order processors is dedicated to issuing instructions out-of-order than to actual execution.” [4] More die area is devoted to the scheduling machinery than to the execution units that compute results. This is the overhead cost of hiding that fact from software.

Itemizing the Cost

The Reorder Buffer’s area cost scales with entry count and port count. A 500-entry ROB storing full instruction state and operand values with the multi-porting required for simultaneous read and write access at peak instruction bandwidth is a large SRAM structure. Research published in the Journal of Supercomputing notes that “naive scaling of the conventional reorder buffer architecture can severely increase the complexity and power consumption,” [5] confirming that area scales super-linearly with ROB capacity as designers push for larger instruction windows.

Conservative published estimates place the ROB at 5 to 10 percent of total core area in high-performance designs. Die photo analysis of commercial out-of-order processors–one of the few sources of actual area breakdown data–confirms the ROB as a first-order contributor to core area, accounting for a significant fraction of the identifiable structures. [6]

Branch prediction overhead is the best-documented component because reducing it has been an active research priority for two decades. The published evidence is quantified and consistent:

  • High-end processors typically have branch predictors consuming more than 10 percent of total chip power. [3]
  • The Alpha EV8 branch predictor alone used 352 Kbits of storage, with “a very large silicon area devoted to branch prediction.” [7]
  • Reducing BTB size by a factor of eight achieves 9.2 percent dynamic energy reduction of the processor core. [8]
  • A four-wide out-of-order processor’s branch predictor consumes enough power that reducing it by 52 percent reduces overall processor energy by 4.1 percent. [9]

These figures establish branch prediction at 8 to 15 percent of core area and more than 10 percent of core dynamic power in high-performance designs–before accounting for the pipeline flush cost of mispredictions.

Branch prediction overhead extends beyond the predictor structures themselves. When a prediction is wrong, the processor discards all speculatively executed work, flushes the pipeline, and restarts from the correct path. In deep pipelines exceeding 20 stages in modern designs, this flush costs 15 to 30 cycles. Published analysis notes that “around 200 instructions are already executed along the predicted path” before a misprediction is detected and resolved. [10]
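
Putting those figures together gives a feel for the steady-state tax. The flush penalty and accuracy below are the ranges cited above; the branch frequency is my assumption of a typical integer-code value.

    # Back-of-envelope flush tax (branch_freq is an assumption).
    flush_penalty = 20      # cycles per misprediction (15-30 above)
    branch_freq   = 0.18    # ~1 branch per 5-6 instructions (assumed)
    miss_rate     = 0.05    # 95 percent accuracy => 5 percent misses

    waste = branch_freq * miss_rate * flush_penalty
    print(f"{waste:.2f} flush cycles per instruction")   # 0.18
    # At 2 IPC that is ~0.36 flush cycles accrued per cycle of useful work,
    # and data-dependent branches push the miss rate well past 5 percent.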

For the workloads I target–iterative solvers, simulated annealing, mixture-of-experts routing in LLMs, and dynamic dispatch in EDA tools–branches are data-dependent and misprediction rates are high. The flush cost is paid repeatedly, consuming power and cycles without producing any useful result.

The security cost of speculative execution is precisely documented by production deployments. Spectre and Meltdown, disclosed in January 2018, demonstrated that speculatively executed instructions leave observable traces in processor caches even when their results are discarded–because speculation performs real memory accesses against data the program should not be able to read. The root cause is architectural. Speculation by definition executes instructions before knowing whether they should execute.

Software and microcode mitigations for these vulnerabilities impose measured production penalties:
  • Red Hat measured performance impact ranging from 1 to 20 percent across workloads at initial disclosure, improving to 1 to 8 percent with optimized mitigations. [11]
  • Intel’s own benchmarks showed 2 to 21 percent degradation on SYSMark workloads. [12]
  • I/O-intensive server and database workloads showed 7 to 23 percent degradation. [13]
  • HPC workloads running NAMD, NWChem, and HPCC showed 2 to 3 percent single-node degradation, rising to 5 to 11 percent on multi-node MPI configurations. [14]

These are not theoretical costs. They are measured production performance penalties paid by every data center running speculation-based processors, in perpetuity, as the price of a design choice made decades ago. A processor that does not speculate has no Spectre attack surface by construction.

Time-Based Scheduling: The Architectural Alternative

The insight behind TBS is straightforward: track when each instruction’s input data is ready and dispatch it at that moment. Speculation is not needed. There is no requirement to guess at instruction readiness, no need to maintain the appearance of in-order retirement, and no need to search for independent work while waiting on dependencies–the scheduling mechanism resolves all of that directly from operand availability.

TBS tracks operand availability directly. When a producing instruction completes and writes its result, the dependent instruction becomes eligible for immediate dispatch. When operands are not yet available, the instruction waits. No prediction, no speculation, no recovery machinery.
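
That paragraph can be captured in a few lines of event-driven pseudocode. To be clear, this is my own toy model of the idea, not Simplex Micro's implementation, whose details are available only under NDA; every op, register name, and latency here is invented.

    # Toy model of readiness-driven dispatch (not Simplex's implementation):
    # each op dispatches at the exact cycle its last operand is written.
    from dataclasses import dataclass

    @dataclass
    class Op:
        name: str
        srcs: tuple
        dst: str
        latency: int

    def tbs_schedule(ops):
        ready_at = {}                  # register -> cycle its value exists
        dispatch = {}
        for op in ops:                 # program order; producers precede consumers
            t = max([0] + [ready_at[s] for s in op.srcs])   # wait for real data
            dispatch[op.name] = t      # no prediction, hence no recovery path
            ready_at[op.dst] = t + op.latency               # completion event
        return dispatch

    ops = [Op("load", (), "v0", 8),
           Op("mul", ("v0",), "v1", 4),    # waits exactly 8 cycles, no guessing
           Op("add", ("v1",), "v2", 2),
           Op("store", ("v2",), "mem", 1)]
    print(tbs_schedule(ops))   # {'load': 0, 'mul': 8, 'add': 12, 'store': 14}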

This model eliminates or drastically reduces each of the four speculation overhead categories described above:

The reorder buffer is largely eliminated. TBS does not reorder instructions speculatively–it executes them in data-dependency order, the order the computation itself demands. Instructions need not be held pending in-order retirement because the commitment model is defined by data availability. Minimal bookkeeping for precise exceptions remains necessary, but it is a fraction of a full ROB's complexity and area.

The reservation station shrinks dramatically. Rather than a large centralized structure scanning all in-flight instructions every cycle through continuous tag-matching, the scheduling logic is driven by completion events: when a producer writes its result, its dependents become eligible for dispatch at that moment. Event-driven wakeup is architecturally simpler and proportionally lower power than per-cycle polling.

Register renaming is substantially reduced. False dependencies arise primarily in out-of-order execution where instructions from different program regions execute simultaneously and reuse register names. TBS’s data-dependency-ordered dispatch significantly reduces the occurrence of the write-after-read and write-after-write hazards that renaming exists to eliminate. The physical register file can be substantially smaller than in an aggressive out-of-order design.

Branch prediction is eliminated. TBS does not execute past unresolved branches. When a branch condition depends on an in-flight computation, TBS waits for that computation to complete, then dispatches the correct path immediately. No prediction tables. No branch target buffer for speculative fetch. No misprediction recovery machinery. The Spectre attack surface–rooted in speculative memory accesses along wrong-path instructions–is removed by construction, not by patch.

The aggregate area recovery from eliminating these structures is substantial. Conservative estimates derived from published component figures:

  • Reorder buffer reduction: 5 to 7 percent of core area
  • Reservation station simplification: 4 to 6 percent of core area
  • Register renaming reduction: 4 to 6 percent of core area
  • Branch predictor elimination: 8 to 15 percent of core area

Total recovered area: 20 to 35 percent of processor core area, available for reallocation to structures that directly serve computation. In aggressive out-of-order designs where published estimates place total speculation overhead at 40 to 50 percent of core area, the recovery is larger still.

Where Recovered Area Goes

More execution units. A TBS core recovering 25 percent of area from speculation structures can deploy proportionally more arithmetic, load-store, and floating-point execution units. For the workloads I target, more execution units improve throughput directly when the dependency structure permits parallel dispatch–which TBS’s data-readiness model identifies and exploits without speculation overhead.

Larger cache hierarchy. Memory bandwidth and cache capacity are the dominant bottlenecks for LLM inference, HPC iterative solvers, and EDA placement runs. Area invested in larger L1 or L2 caches, or additional cache banks for higher bandwidth, addresses the actual bottleneck rather than overhead the workload never required.

More cores. In a many-core design for parallel workloads–MPI applications, distributed inference, multi-threaded EDA–a simpler TBS core with less speculation overhead means more cores fit on the same die.

On-chip memory. AI inference architectures benefit from large on-chip SRAM scratchpads that reduce off-chip memory traffic. Area recovered from speculation machinery can fund that on-chip memory directly, improving the memory bandwidth situation that limits inference performance more than compute does.

The Power Dividend

Area and dynamic power track closely in CMOS design. Larger structures with more ports consume proportionally more power when active. The speculation machinery that consumes 30 to 50 percent of core area also accounts for a significant fraction of core dynamic power.

Branch prediction’s power profile is particularly significant because the predictor operates on every fetch cycle regardless of whether a branch is encountered. The ROB and reservation stations consume power continuously as comparison logic checks for retirement eligibility and operand readiness. Register rename logic fires on every instruction decode.

Published estimates place speculation-related dynamic power at 20 to 30 percent of core power in high-performance designs. For a processor cluster running continuous workloads–an inference server, an HPC node, an EDA compute farm–20 to 30 percent power reduction translates directly to operating cost over the multi-year deployment lifetime of the hardware.

At data center scale, where power cost over a five-year deployment commonly exceeds hardware acquisition cost, this is a first-order economic consideration, not a secondary specification. I have seen customers plan infrastructure capacity around power envelope more often than around peak compute throughput. TBS addresses both.
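
As a sanity check on the scale involved, here is the arithmetic with assumed inputs. None of these figures are measured, and treating a core-power saving as a node-level saving is itself a simplification.

    # Illustrative five-year operating-cost arithmetic. Every input is assumed.
    watts_per_node = 700            # assumed server draw
    nodes          = 10_000
    usd_per_kwh    = 0.08           # assumed blended energy price
    hours          = 5 * 365 * 24
    saving         = 0.25           # midpoint of the 20-30 percent range above

    kwh_saved = watts_per_node / 1000 * nodes * hours * saving
    print(f"${kwh_saved * usd_per_kwh:,.0f}")   # ~$6,132,000 over five years,
    # before cooling overhead, which typically scales the saving further via PUE.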

Workload Alignment

TBS is not universally superior to out-of-order execution. On workloads with predictable branches and abundant independent instructions–general-purpose integer workloads running operating systems, web servers, and database query engines–speculation machinery provides real benefit and TBS would not automatically outperform it. The crossover point is workloads with deep genuine dependency chains and unpredictable branches. This is exactly the profile of the workloads that now dominate compute investment:

[Chart: Simplex TBS VPU vs. conventional architectures. Performance, determinism, and silicon efficiency across three architecture classes; bar length indicates relative advantage (longer is better). Speculative execution cost is measured as estimated pipeline cycles lost to misprediction and flush overhead, a category Simplex TBS eliminates by design. The GPU comparison reflects edge-deployment conditions, not datacenter throughput.]

LLM inference is a sequential dependency chain by construction. Autoregressive token generation requires each token before the next can be computed. Branch behavior in mixture-of-experts routing is data-dependent and difficult to predict. Speculation machinery provides minimal benefit and pays its full area and power cost regardless. More critically, LLM inference and other AI workloads execute across large vectors of data elements simultaneously. A speculative execution error does not waste one instruction–it wastes work across every element in the vector. The cost of misprediction scales with vector width. Non-speculative, deterministic execution in the VPU eliminates this multiplicative waste entirely.
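
That multiplicative waste is easy to quantify using the wrong-path figure from [10]; the vector width below is an assumed value for illustration.

    # Multiplicative wrong-path waste in vector code (lane count is assumed).
    lanes = 512 // 32          # 16 fp32 lanes per vector instruction
    wrong_path = 200           # instructions executed past a mispredict [10]
    print(wrong_path * lanes, "element-operations discarded per flush")   # 3200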

Scientific computing and HPC iterative solvers–Gauss-Seidel, conjugate gradient, multigrid, simulated annealing–have deep genuine dependency chains where each iteration depends on the previous one. The loop structure is sequential by mathematical necessity. These workloads do not benefit from speculative execution of future iterations because the current iteration’s result determines whether and how the next proceeds.
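
The loop-carried dependence is visible in the kernel itself. This is the standard dense Gauss-Seidel sweep, shown only to make the dependency structure concrete:

    # Standard Gauss-Seidel sweep (dense form for clarity): each updated x[i]
    # is consumed later in the *same* sweep, so there is nothing useful to
    # execute speculatively ahead of the current row.
    def gauss_seidel_sweep(A, b, x):
        n = len(b)
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]    # immediately feeds rows i+1..n-1
        return x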

EDA tool execution–static timing analysis, placement optimization, routing–propagates values through directed dependency graphs where every node depends on its predecessors. The branch behavior during convergence is data-dependent. Speculation machinery works against these workloads rather than for them.

Safety-critical and certified systems represent a fourth category where TBS’s determinism advantage is decisive independent of performance. DO-178C avionics, IEC 61508 industrial safety, and ISO 26262 automotive standards require deterministic, reproducible execution. Speculative execution’s inherent non-determinism at the microarchitectural level is a barrier to certification that cannot be patched away. TBS dispatches in data-dependency order, which is determined entirely by the program and its inputs. The same program on the same inputs follows the same execution path every time. This is determinism by architectural design, not by added overhead.

RVA23 includes Zkt, which mandates constant-time execution for certain operations regardless of data values. This is a determinism requirement in the profile standard itself–one that speculative out-of-order execution makes difficult to satisfy without added overhead, and that TBS provides by architectural design. The same profile makes RVV mandatory, which shifts the scalar core’s role from performance engine to dependency coordinator and makes simple, deterministic scalar execution viable for the first time in mainstream application processors.

Conclusion

The silicon cost of speculation is documented in peer-reviewed microarchitecture research, measured in production deployments affected by Spectre and Meltdown mitigations, and paid continuously in area, power, and security exposure by every data center running out-of-order processors.

The industry accepted this cost for five decades because the benefit was real for the workloads that dominated computing during that era. The workloads that now drive compute investment do not share those characteristics. AI inference, scientific simulation, EDA tool execution, and safety-critical embedded systems all have dependency structures that defeat speculation’s assumptions. They pay the full cost of speculation machinery while receiving little or none of its benefit.

Time-Based Scheduling eliminates that machinery by architectural design. Instructions execute when their data is ready–exactly when the computation’s own logic demands, without speculating when the workload doesn’t require it. The silicon area and power recovered from speculation structures are available for execution units, memory hierarchy, and core count that serve the workload directly.

I designed TBS from first principles about how computation should work when data dependencies are real, sequential, and unavoidable. The Simplex CPU is a conventional superscalar out-of-order design–TBS governs the VPU, which handles the vector-parallel workloads where speculation would be most harmful. The workloads that now dominate compute investment–LLM inference, HPC solvers, EDA convergence–share exactly that structure. TBS is built for them.

Simplex Micro is a RISC-V processor IP company developing Time-Based Scheduling architecture for compute workloads. Specific silicon area and power figures for Simplex Micro’s TBS implementation are available under NDA.

Sources

[1] García Ordaz et al., “A Reorder Buffer Design for High Performance Processors,” Computación y Sistemas, Vol. 16 No. 1, 2012. https://www.scielo.org.mx/pdf/cys/v16n1/v16n1a3.pdf

[2] Kucuk et al., “Complexity-effective Reorder Buffer Designs for Superscalar Processors,” IEEE/ACM MICRO-35, 2002. https://www.academia.edu/18728937

[3] Gao et al., “Efficient Architectural Exploration of TAGE Branch Predictor for Embedded Processors,” ScienceDirect, 2019. https://www.researchgate.net/publication/332775637

[4] ScienceDirect Topics, “Out-of-Order Execution.” https://www.sciencedirect.com/topics/computer-science/out-of-order-execution

[5] Choi, Park, Jeong, “Revisiting Reorder Buffer Architecture for Next Generation High Performance Computing,” Journal of Supercomputing, Vol. 65, 2013. https://link.springer.com/article/10.1007/s11227-011-0734-x

[6] Klauser et al., “Federation: Out-of-Order Execution Using Simple In-Order Cores,” University of Virginia Technical Report, 2007. https://www.cs.virginia.edu/~skadron/Papers/federation_tr_aug07.pdf

[7] Seznec et al., “The Alpha EV8 Conditional Branch Predictor,” 2003. https://www.researchgate.net/publication/3215341

[8] Li et al., “Energy-Efficient Branch Predictor via Instruction Block Type Prediction in Decoupled Frontend,” IET Computers & Digital Techniques, 2025. https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/cdt2/3359419

[9] Li et al., ibid.

[10] ScienceDirect Topics, “Speculative Execution.” https://www.sciencedirect.com/topics/computer-science/speculative-execution

[11] Red Hat, “Speculative Execution Exploit Performance Impacts,” 2018. https://access.redhat.com/articles/3307751

[12] InfoQ, “Intel Found That Spectre and Meltdown Fix Has a Performance Hit of 0-21%,” January 2018. https://www.infoq.com/news/2018/01/intel-spectre-performance/

[13] Databricks, “Meltdown and Spectre’s Performance Impact on Big Data Workloads in the Cloud,” January 2018. https://www.databricks.com/blog/2018/01/13/meltdown-and-spectre-performance-impact-on-big-data-workloads-in-the-cloud.html

[14] The Next Platform, “Reckoning The Spectre And Meltdown Performance Hit For HPC,” January 2018. https://www.nextplatform.com/2018/01/30/reckoning-spectre-meltdown-performance-hit-hpc/

Also Read:

Podcast EP340: A Review of the Q4 2025 Electronic Design Market Data Report with Wally Rhines

From Wooden Boards to White Gloves: How FPGA Prototyping and Emulation Became Two Worlds of Verification… and How the Convergence Is Unfolding

yieldHUB Expands Its Impact with New Technology and a New Website