
Podcast EP337: The Importance of Network Communications to Enable AI Workloads with Abhinav Kothiala

by Daniel Nenni on 03-27-2026 at 10:00 am

Daniel is joined by Abhinav Kothiala, a principal product manager for the Synopsys Ethernet IP portfolio. He has over 12 years of experience across engineering and product management, spanning SoC design, functional verification, and building wireless connectivity platforms and IoT products. He also holds two patents in circuit design.

Dan discusses the evolution of Ethernet standards with Abhinav, who explains that traditional Ethernet is not well-suited for distributed AI workloads. In this informative discussion, Abhinav describes new and emerging interface protocols better suited to AI environments.

Abhinav discusses what is needed to achieve the required scale-up, explaining that the network must “disappear” by delivering very low latency, deterministic performance. The capabilities of ESUN and UALink are explored in detail. Abhinav also explains what the future will require and the role of the IP provider.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Musk’s Orbital Compute Vision: TERAFAB and the End of the Terrestrial Data Center

by Jonah McLeod on 03-27-2026 at 6:00 am


At the TERAFAB launch event in Austin on March 21, Elon Musk made a prediction that would have sounded like science fiction a decade ago—and may still: roughly 80 percent of AI compute will eventually move off-planet.

The argument is straightforward once you accept his premises. Earth-based data centers face three hard constraints—land, cooling, and grid capacity—and all three are getting worse as AI infrastructure demand accelerates. Land requires zoning, permitting, and proximity to fiber and power. Cooling consumes enormous quantities of water or electricity, or both. And grid capacity, particularly clean grid capacity, is increasingly contested.

Space, Musk argues, dissolves all three simultaneously. Satellites don’t need real estate. The vacuum of space is an ideal radiative heat sink—no water, no chillers, no mechanical systems at all. And solar irradiance above the atmosphere runs roughly five times the average output of a ground-based installation—not because the sun shines harder in space, but because a space-based array sees the sun continuously, with no night cycle, no weather, and no atmospheric losses. It is, Musk suggested, basically a free data center—if you can get there.

The obvious objection is launch cost. Getting hardware into orbit remains expensive by any terrestrial comparison. Musk’s counter is that Starship changes the math, and TERAFAB—announced the same evening, in a defunct Austin power plant, with light beams shooting into the sky and the Governor of Texas in the audience—changes it further.

TERAFAB is a $20–25 billion joint venture between Tesla, SpaceX, and xAI, to be built at Giga Texas in Austin, consolidating chip design, lithography, fabrication, memory production, packaging, and testing under one roof—vertical integration no semiconductor company has attempted at this scale, for reasons that will become apparent. The stated production target is chips with an aggregate power draw of one terawatt—roughly fifty times the estimated power consumption of all advanced AI chips currently in production worldwide.

Musk uses power draw as his unit of scale because it is the one metric that translates across wildly different chip architectures, and because it serves his core argument: total US grid capacity runs approximately 0.5 terawatts, making a terawatt of chip power physically impossible to run on Earth. Most of it, he concludes, must go to space. Getting that much compute into orbit means launching roughly 10 million tons per year—approximately 50,000 Starship flights annually, or one every ten minutes. Musk provided no construction or production timeline.
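That launch arithmetic is easy to check back-of-the-envelope. The payload figure below is an assumption (Starship's orbital payload target is on the order of 100–200 tons, not a demonstrated number):

```python
# Back-of-the-envelope check of the launch arithmetic quoted above.
# Payload per flight is an assumed target, not a demonstrated figure.
TONS_PER_YEAR = 10_000_000      # mass to orbit per year, from the talk
PAYLOAD_TONS = 200              # assumed payload per Starship flight
MINUTES_PER_YEAR = 365 * 24 * 60

flights_per_year = TONS_PER_YEAR / PAYLOAD_TONS
minutes_between_flights = MINUTES_PER_YEAR / flights_per_year
print(f"{flights_per_year:,.0f} flights/year, one every {minutes_between_flights:.1f} minutes")

# Grid comparison: one terawatt of chip power vs ~0.5 TW of total US grid capacity.
US_GRID_TW = 0.5
CHIP_TARGET_TW = 1.0
print(f"Chip power target is {CHIP_TARGET_TW / US_GRID_TW:.0f}x total US grid capacity")
```

At 200 tons per flight the numbers land exactly where Musk put them: 50,000 flights a year, roughly one every ten and a half minutes.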

TERAFAB is intended to produce two chip families: AI5, a purpose-stripped inference processor for Tesla vehicles and Optimus robots, with design nearly complete and small-batch production expected later this year; and D3, a space-hardened chip purpose-built for the orbital satellite constellation. Musk has described personal involvement in AI5’s design—the strategic decisions appear to be his; the detailed engineering work is being done by Tesla’s in-house chip team, whose names are not public. The D3 has no disclosed timeline, no foundry assignment, and no published architecture. SpaceX has already filed with the FCC to launch up to one million satellites built around it. The satellites are ready for ordering. The chip is ready for naming.

If launch prices fall to the levels Musk is targeting and TERAFAB delivers at anything approaching its stated capacity, the economics of orbital compute become at least arguable. Space offers effectively unlimited siting, free radiative cooling, and abundant solar power without grid or permitting constraints. In that model, the long-term savings eventually swamp the upfront cost of getting hardware off the ground. The physics are genuine. The execution is another matter.

What Stays on the Ground

Anything with a human or machine waiting on a response. Conversational AI, agentic pipelines, autonomous vehicles, industrial robotics, financial systems, real-time audio and video processing—all require response times that orbital round-trips cannot accommodate. LEO adds 40–80ms of latency before a single computation runs. GEO pushes that past 500ms. For a user waiting on a reply, or a robot waiting on a command, that’s disqualifying. Gravity, it turns out, is not the only thing keeping compute on Earth.
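The physics floor behind those latency figures follows from altitude alone. The sketch below assumes a ~550 km Starlink-class LEO shell and the 35,786 km geostationary altitude; note that pure light-speed propagation gives LEO only a few milliseconds, and it is routing, queuing, and ground-segment overhead that push real-world LEO round trips to the 40–80 ms range:

```python
# Speed-of-light lower bound on user -> satellite -> ground -> satellite -> user.
C = 299_792_458          # speed of light, m/s
LEO_ALT_M = 550e3        # assumed LEO shell altitude (Starlink-class)
GEO_ALT_M = 35_786e3     # geostationary altitude

def round_trip_floor_ms(altitude_m: float) -> float:
    # Four one-way hops straight up/down; ignores slant range and routing delays.
    return 4 * altitude_m / C * 1000

print(f"LEO propagation floor: {round_trip_floor_ms(LEO_ALT_M):.1f} ms")
print(f"GEO propagation floor: {round_trip_floor_ms(GEO_ALT_M):.0f} ms")
```

GEO's ~477 ms floor alone explains why the article's "past 500ms" figure is unavoidable there, before any computation runs.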

What moves to orbit? Training runs and batch workloads. A model training job that takes days doesn’t care about a 60ms round-trip. Neither does batch inference, large-scale data processing, scientific simulation, or pre-generated content rendering. These are the workloads that consume the most power and are hardest to site on Earth—and they are genuinely good candidates for orbital migration, if someone can build the infrastructure to get them there.

The 80 Percent Problem

Here is where Musk’s headline figure deserves scrutiny. Current data on workload composition suggests the orbital-eligible fraction of global data center compute is closer to 20–30 percent—not 80. The gap between those numbers is not a rounding error. It is the entire argument.

According to McKinsey’s December 2025 data center demand model, total global data center demand in 2025 runs approximately 82 GW, with AI training accounting for 23 GW, AI inference 21 GW, and non-AI workloads 38 GW. [McKinsey & Company] Training—the most straightforwardly orbital-eligible workload—represents roughly 28 percent of the total. Add the latency-tolerant fraction of batch processing and non-AI workloads and you might reach 35–40 percent, generously.
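The share arithmetic on those figures is straightforward to reproduce. In the sketch below, the 23/21/38 GW split is from the McKinsey model quoted above, but the 15 percent "latency-tolerant" fraction of inference and non-AI work is an illustrative assumption used only to show how generous one must be to reach the upper bound:

```python
# Reproducing the workload-share arithmetic from the McKinsey 2025 figures.
total_gw = 82
training_gw, inference_gw, non_ai_gw = 23, 21, 38

training_share = training_gw / total_gw
print(f"Training share: {training_share:.0%}")  # the straightforwardly orbital-eligible slice

# Generous upper bound: all of training plus an ASSUMED 15% latency-tolerant
# fraction of inference and non-AI workloads (illustrative, not sourced).
latency_tolerant_fraction = 0.15
eligible_gw = training_gw + latency_tolerant_fraction * (inference_gw + non_ai_gw)
print(f"Generous orbital-eligible share: {eligible_gw / total_gw:.0%}")
```

Training comes out near 28 percent, and even the generous bound stays under 40 percent, half of Musk's headline figure.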

The bigger problem is where growth is headed. Inference will account for roughly two-thirds of all AI compute by 2026, up from about one-third in 2023. [Deloitte Insights] And inference is structurally latency-bound. Inference workloads follow user behavior, and real-time responsiveness is key—which is why inference infrastructure needs to be close to population centers. [Edgecore] That requirement doesn’t dissolve with cheaper launch costs. It doesn’t dissolve at all.

McKinsey projects that by 2030, AI inference will represent more than 40 percent of total data center demand, overtaking non-AI workloads by 2029, while training holds steady at just under 30 percent. [McKinsey & Company] The dominant and fastest-growing category of compute is precisely the one most resistant to orbital migration. Musk’s 80 percent assumes a future where most inference migrates off-planet—which would require either a latency breakthrough that does not appear on any roadmap, or a fundamental restructuring of how AI applications are built that nobody has proposed.

None of this invalidates the core insight. Training workloads are insensitive to latency and can tolerate delays of up to 100 milliseconds between adjacent regions, which already allows hyperscalers to site them in remote, power-rich areas where grid capacity, land, and water are more available. [McKinsey & Company] Orbit is simply the logical extreme of that same siting logic. A more defensible claim might be that orbital compute captures 25–35 percent of global data center demand within the next two decades, concentrated in training and scheduled batch workloads. That is still an enormous market. It is just not the one Musk described in Austin.

The Harder Questions

Thermal management in low earth orbit, radiation hardening at scale, on-orbit servicing, and debris risk remain largely unaddressed in Musk’s public presentation. The D3’s design philosophy—running hotter to shed radiator mass—is elegant engineering thinking. But a chip that hasn’t taped out is not a solution to any of those problems yet. And the launch arithmetic is sobering: 50,000 Starship flights a year is not an engineering challenge, it is a category error relative to anything in the current manifest.

What is real: the terrestrial power constraint driving this vision is genuine and worsening. The semiconductor and systems industries have been quietly watching data center power demand outrun grid capacity for years. Musk is the first person with launch infrastructure, chip design capability, and apparent willingness to spend $25 billion making the orbital alternative credible. That is worth taking seriously, even if the specific numbers are not.

In Austin last week, the conversation shifted. Whether or not TERAFAB delivers on its promises, orbital compute is no longer a thought experiment. That much Musk has accomplished—which is, it should be said, more than most people accomplish in a career.

The rest of the scorecard, however, looks like this: Dojo was cancelled, revived, renamed, and partially absorbed into AI6—all within six months. AI5 was “finished” in July 2025, “almost done” in January 2026, and still not taped out in March. The D3 chip that the entire orbital compute vision depends on has no disclosed design, foundry, or timeline. SpaceX has an FCC filing for a million satellites built around a chip that doesn’t exist yet. And TERAFAB itself has no construction timeline and a price tag that isn’t in Tesla’s capital plan.

Standing in front of all of that, Musk announced the next three projects: megawatt satellites, a lunar factory, and an electromagnetic mass driver on the Moon.

He is, as ever, a man who is always three projects ahead of his last unfinished one.

Also Read:

Silicon Insurance: Why eFPGA is Cheaper Than a Respin — and Why It Matters in the Intel 18A Era

Captain America: Can Elon Musk Save America’s Chip Manufacturing Industry?

TSMC Technology Symposium 2026: Advancing the Future of Semiconductor Innovation


Silicon Insurance: Why eFPGA is Cheaper Than a Respin — and Why It Matters in the Intel 18A Era

by Daniel Nenni on 03-26-2026 at 10:00 am


As semiconductor technology advances into increasingly complex and expensive process nodes, the economic and technical risks associated with ASIC design have grown dramatically. At advanced nodes such as Intel 18A, the cost of a single design error can escalate into tens of millions of dollars, compounded by months of delay. In this environment, embedded FPGA (eFPGA) technology has emerged as a compelling solution, often described as “silicon insurance.” By integrating reconfigurable logic directly into an ASIC, eFPGA enables post-silicon flexibility that can eliminate the need for costly respins. Companies like QuickLogic are at the forefront of this shift, demonstrating that the modest overhead of eFPGA is far outweighed by the financial and strategic benefits it provides.

The fundamental issue with traditional ASIC design lies in its rigidity. Once fabricated, an ASIC is effectively immutable. Any functional bug, evolving standard, or late-stage requirement change necessitates a full respin. This process involves redesigning portions of the chip, regenerating masks, fabricating new wafers, and repeating validation cycles. At leading-edge nodes, mask sets alone can cost tens of millions of dollars, while the full respin cycle can delay product launch by six to twelve months. In fast-moving markets such as artificial intelligence, automotive systems, and data center infrastructure, such delays can result in lost market share that far exceeds the direct engineering costs.

Embedded FPGA addresses this challenge by introducing programmable logic into the ASIC fabric. Unlike fixed-function logic, eFPGA blocks can be reconfigured after fabrication, allowing designers to patch bugs, update algorithms, or adapt to new standards without altering the physical silicon. This capability fundamentally changes the risk profile of chip design. Instead of committing entirely to fixed functionality, designers can reserve a portion of the die for flexibility, effectively hedging against uncertainty. The additional area and power overhead associated with eFPGA becomes a predictable, bounded cost, analogous to an insurance premium, in exchange for avoiding the potentially catastrophic expense of a respin.
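The insurance analogy can be made concrete with a simple expected-cost comparison. All of the numbers below are illustrative assumptions, not figures from QuickLogic or Intel:

```python
# Expected-cost comparison: fixed eFPGA "premium" vs probabilistic respin.
# All inputs are hypothetical, chosen only to illustrate the break-even logic.
respin_cost = 30e6        # assumed full respin cost at an advanced node, USD
respin_probability = 0.3  # assumed chance a bug or standard change forces a respin
efpga_overhead = 3e6      # assumed total cost of eFPGA area/power/licensing overhead

expected_respin_cost = respin_probability * respin_cost
print(f"Expected respin cost: ${expected_respin_cost/1e6:.0f}M "
      f"vs eFPGA premium: ${efpga_overhead/1e6:.0f}M")

# The premium pays off whenever overhead < probability * respin cost,
# i.e. whenever the respin probability exceeds overhead / respin_cost.
break_even_probability = efpga_overhead / respin_cost
print(f"Break-even respin probability: {break_even_probability:.0%}")
```

Under these assumed numbers the hedge pays off whenever the respin risk exceeds 10 percent, which is why the "premium" framing holds at advanced nodes where both the cost and the probability of a respin are rising.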

Beyond cost avoidance, eFPGA provides significant advantages in time-to-market. In many applications, being first to market is critical. A respin not only incurs direct costs but also disrupts product timelines, often causing companies to miss key deployment windows. By contrast, eFPGA enables iterative development even after silicon has been deployed. Hardware can evolve alongside software, allowing companies to respond quickly to changing requirements or competitive pressures. This agility is particularly valuable in domains where workloads are not fully defined at design time, such as machine learning accelerators or edge computing platforms.

Historically, one of the main criticisms of FPGA-based approaches has been their impact on power, performance, and area (PPA). However, recent advancements in eFPGA hard IP have significantly narrowed this gap. QuickLogic has been a leader in optimizing eFPGA architectures to reduce overhead while maintaining flexibility. This progress is exemplified by the company’s recent announcement, QuickLogic Announces Contract for High Density eFPGA Hard IP Optimized for Intel 18A. This development focuses on architectural enhancements that improve logic density, reduce power consumption, and increase performance, making eFPGA more viable for integration into cutting-edge ASIC designs.

The significance of this announcement extends beyond a single contract. Intel 18A represents one of the most advanced semiconductor nodes, incorporating new transistor architectures and manufacturing techniques that further increase design complexity and cost. By optimizing eFPGA hard IP for such a node, QuickLogic is demonstrating that embedded programmability can coexist with the highest levels of silicon efficiency. Moreover, the architectural improvements developed for this engagement are designed to be extensible across multiple advanced nodes, indicating a broader industry trend toward integrating flexibility directly into the silicon fabric.

This shift reflects a deeper transformation in how hardware is designed and deployed. As systems become more complex and workloads more dynamic, the traditional boundary between hardware and software is blurring. eFPGA enables a more software-like approach to hardware, where functionality can be updated, optimized, and extended over time. This capability is increasingly important for applications that require long lifecycles, such as aerospace, defense, and infrastructure, where replacing hardware is costly or impractical.

Bottom line: eFPGA serves as a form of silicon insurance that fundamentally alters the economics of chip design. By trading a modest increase in area and power for the ability to avoid expensive and time-consuming respins, designers can significantly reduce both financial risk and time-to-market uncertainty. QuickLogic’s work on high-density eFPGA hard IP optimized for Intel 18A underscores the growing maturity of this approach and its relevance at the most advanced process nodes. As the semiconductor industry continues to push the limits of performance and integration, the ability to adapt silicon after fabrication is no longer a luxury; it is a necessity.

Also Read:

Global 2nm Supply Crunch: TSMC Leads as Intel 18A, Samsung, and Rapidus Race to Compete

TSMC vs Intel Foundry vs Samsung Foundry 2026

Intel to Compete with Broadcom and Marvell in the Lucrative ASIC Business


Synopsys Advances Hardware Assisted Verification for the AI Era

by Kalar Rajendiran on 03-26-2026 at 6:00 am


At the 2026 Synopsys Converge Event, Synopsys announced a broad set of new products and platform upgrades, with its hardware-assisted verification (HAV) announcement emerging as a key highlight within that lineup. A key aspect of this announcement was moving beyond a hardware-centric model to a more scalable, programmable infrastructure that can continuously evolve through software updates. By positioning HAV in this way, Synopsys emphasized improvements in performance, automation, and system level validation needed to keep pace with the growing complexity of AI driven semiconductor designs. As AI systems scale in complexity and deployment accelerates, verification has increasingly become a bottleneck to bringing new silicon to market. In that context, the shift to software-defined HAV represents a strategic response to one of the industry’s most pressing constraints.

The announcement builds on the company’s 2025 introduction of next generation HAV hardware, including the Synopsys ZeBu-200 emulation platform and Synopsys HAPS-200 FPGA prototyping systems. Those platforms expanded capacity and performance for large system on chip designs. In 2026, Synopsys layered on top of that foundation with software-defined updates, along with automation capabilities, and new configurations such as the ZeBu-200 12 FPGA system and the HAPS-200 1 FPGA and 12 FPGA systems.

Verification Hits a Breaking Point

The backdrop to the announcement is a fundamental shift in semiconductor design. AI processors now integrate heterogeneous compute engines, massive memory bandwidth, and increasingly rely on multi die chiplet architectures. At the same time, the software stacks that run on these systems have grown just as complex.

Traditional RTL simulation, long the backbone of verification, cannot keep up with these demands at system scale. Running meaningful workloads can take weeks or months, making it impractical for validating full system behavior. The challenge is no longer validating blocks in isolation, but validating entire systems running real AI workloads, something conventional approaches cannot do within practical development timelines. Hardware assisted verification, using emulation and FPGA prototyping, has therefore emerged as a critical solution, enabling designs to run orders of magnitude faster and allowing software development to begin before silicon is available.

From Hardware Appliances to Software Defined Infrastructure

What distinguishes the 2026 announcement is a shift in how HAV platforms are conceptualized. Historically, systems like emulators and FPGA prototypes were treated as high performance but relatively fixed hardware appliances, with improvements tied primarily to new hardware generations.

Synopsys is now moving toward a software-defined HAV architecture, in which the software layer of platforms such as ZeBu Server 5, ZeBu-200, and HAPS-200 dynamically manages resources, workloads, and debugging capabilities. This enables performance gains, such as up to two times improvements on ZeBu Server 5, to be delivered through software updates rather than requiring new hardware deployments.

This shift is critical because AI workloads, architectures, and software stacks are evolving faster than hardware refresh cycles. A hardware only model cannot keep pace with this rate of change. By contrast, software-defined HAV allows verification platforms to improve continuously, enabling teams to adapt to new AI workloads and system requirements without waiting for the next generation of hardware. The result is a more flexible and future proof verification environment.

Automation Moves into the Core of Verification

Another notable aspect of the announcement is the introduction of hardware-assisted test automation, signaling a shift toward more automated verification workflows. Rather than relying heavily on manually constructed tests, engineers can now run automated validation scenarios directly on HAV platforms.

These include complex system level checks such as cache coherency validation and subsystem stress testing across processor, memory, and IO architectures. In the 2026 announcement, Synopsys positions these not simply as automation features, but as hardware-assisted test solutions capable of exercising full processor subsystems under realistic workloads. By running automated coherency and subsystem validation directly on Synopsys HAV platforms, engineers can expose bugs that typically only appear under long running, highly concurrent AI workloads, conditions that are difficult or impractical to reproduce in traditional verification environments.

At AI scale, where subsystem interactions dominate system behavior, this shift from manually constructed tests to automated, workload driven validation becomes essential. The number of possible interactions across cores, caches, and memory systems is simply too large for manual approaches, making automation not just a productivity improvement, but a fundamental requirement for verifying modern AI processors.

Scaling for AI Scale Designs

Performance and capacity remain central to the HAV roadmap, and the 2026 announcement introduces meaningful gains in both areas. The ZeBu Server 5 sees up to a twofold increase in runtime performance, while modular configurations such as the ZeBu-200 12 FPGA and HAPS-200 12 FPGA systems enable similar scaling in capacity.

These improvements are critical for supporting AI chips that may incorporate tens of billions of gates, multiple compute domains, and complex interconnect structures. Faster compile times and enhanced debug capabilities further help teams manage these increasingly large verification workloads, reducing the time required to reach meaningful coverage. A customer quote substantiates this point:

“As AI-driven systems become more complex, verification must scale just as quickly. Hardware-assisted verification is no longer optional. It is critical to meeting aggressive time-to-market goals and ensuring silicon readiness,” said Salil Raje, Senior Vice President and General Manager, Adaptive and Embedded Computing Group, AMD. “FPGA-based emulation and prototyping play a central role in that effort by accelerating system bring-up and enabling earlier software development. Our collaboration with Synopsys reflects that focus. Through joint optimization of Synopsys ZeBu with the AMD Vivado™ software stack, and by leveraging AMD EPYC™ processors for compute acceleration, we are reducing compile times and helping customers move to accurate system models faster.”

Flexible Configurations for a Broader Range of Use Cases

Alongside performance improvements, Synopsys is expanding the flexibility of its HAV offerings. The introduction of configurations such as the HAPS-200 1 FPGA desktop system provides an entry point for IP level validation and early software development, while larger configurations like the HAPS-200 12 FPGA and ZeBu-200 12 FPGA systems scale to full system verification.

This range of configurations allows design teams to align their verification infrastructure more closely with specific project needs, supporting a continuum from early stage validation to full system workload execution.

Expanding the Scope of What Can Be Verified

The 2026 HAV enhancements also broaden the scope of verification itself. New support for real number modeling (RNM) allows analog behavior to be approximated within digital verification flows, while fault emulation capabilities address the needs of safety critical applications.

These additions reflect a shift toward full system validation, where digital logic, analog effects, and software interactions must all be considered together. As semiconductor designs become more heterogeneous, this expanded coverage is essential for accurately validating real world AI systems rather than simplified models.

Summary

The Synopsys Converge Event HAV Announcement reflects a broader industry transition. As AI drives exponential growth in chip complexity, verification must evolve from a collection of tools into a scalable, software driven platform.

The significance of software defined HAV lies in its role in enabling AI proliferation itself. As AI hardware becomes more complex and deployment cycles accelerate, the ability to verify systems quickly and at scale determines how fast innovation can reach the market. By removing verification as a limiting factor, through continuous performance improvements, automation, and system level validation, Synopsys is positioning HAV not just as a tool, but as a critical enabler of the AI ecosystem.

Read the entire HAV announcement here.

Learn more at Synopsys.com/HAV

Also Read:

Scaling Multi-Die Connectivity: Automated Routing for High-Speed Interfaces

Synopsys Explores AI/ML Impact on Mask Synthesis at SPIE 2026

Agentic AI and the Future of Engineering


Chemical Origins of Environmental Modifications to MOR Lithographic Chemistry

by Daniel Nenni on 03-25-2026 at 10:00 am


In the pursuit of advanced extreme ultraviolet (EUV) lithography for high-NA patterning, metal oxide resists (MORs) offer significant promise but face challenges like critical dimension (CD) variation due to atmospheric interactions. Presented at SPIE Advanced Lithography + Patterning 2025 by Kevin M. Dorney and colleagues from imec, this study delves into the chemical mechanisms behind environmental modifications during exposure and processing, emphasizing the role of gases like O₂, CO₂, and H₂O in post-exposure delay (PED) and bake (PEB).

MORs, such as tin-based systems, undergo ligand loss upon EUV exposure, leading to condensation and pattern formation. However, post-exposure atmospheric exposure can cause CD drift, linked to airborne molecular contaminants (AMCs) and humidity. Literature reviews highlight mechanisms: Kenane et al. propose CO₂ and H₂O forming Sn-O-C=O-Sn bridges; Frederick et al. suggest O₂ enhancing ligand loss in Keggin clusters; Zhang et al. outline air/N₂ pathways yielding oxygenated Sn sites; and Castellanos et al. attribute CD drift to H₂O and AMCs cleaving ligands faster.

To probe these mechanisms, imec uses its BEFORCE platform, which integrates EUV exposure, FTIR spectroscopy, outgassing measurements, and controlled environments to enable precise studies. The tool allows MOR coating, dosed EUV exposure, PED/PEB in custom atmospheres (e.g., varying O₂, N₂, CO₂, RH), and offline development/ellipsometry.

Initial FTIR on an open-source MOR (OSMO) reveals post-exposure ligand loss, PED-induced H₂O uptake, and PEB-driven further loss, counter-ion departure, and SnO formation. Notably, a new ~1700 cm⁻¹ peak (C=O) emerges post-PEB in air but not vacuum or N₂, indicating an air-specific, thermally stable product.

Investigating origins, CO₂ variations show no chemical change, ruling it out. Isolating H₂O via RH skew in N₂ vs. clean air (CA) PEBs (35 mJ, 220°C, 60 s) yields faint or absent C=O in N₂ but consistent presence in CA regardless of RH. Offsets in the 1700/1550 cm⁻¹ peak ratio suggest a role for O₂, possibly forming esters.

Focused DOEs confirm O₂ drives C=O: at fixed 220°C PEB, intensity rises non-linearly with O₂% (diluted in N₂); at fixed O₂%, it emerges ~200°C and sharpens with temperature. Thus, O₂ and PEB temp dominate oxygenated carbon formation; CO₂/H₂O show no dependence.

Kinetics reveal that O₂ amplifies ligand cleavage: at fixed temperature, loss follows a first-order rate law, suggesting a unimolecular O₂-MOR reaction. Rate constants (k) scale with O₂ (e.g., 4.18×10⁻³ s⁻¹ at 21%, 6.11×10⁻³ s⁻¹ at 50%), yielding activation energies of ~58 kJ/mol. The temperature dependence shows an exponential increase in cleavage, consistent with a non-catalytic process.
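With a first-order rate law, the fraction of ligands remaining after a bake is simply exp(-kt). The sketch below plugs in the quoted rate constants at an assumed 60 s bake (the duration is taken from the 60 s PEB conditions mentioned earlier, not stated for this particular dataset):

```python
import math

# First-order ligand loss: remaining fraction = exp(-k * t).
k_21pct_O2 = 4.18e-3   # s^-1, quoted rate constant at 21% O2
k_50pct_O2 = 6.11e-3   # s^-1, quoted rate constant at 50% O2
t_peb = 60.0           # s, bake duration assumed from the 60 s PEB conditions

for label, k in [("21% O2", k_21pct_O2), ("50% O2", k_50pct_O2)]:
    remaining = math.exp(-k * t_peb)
    print(f"{label}: k = {k:.2e} /s, ligands remaining after {t_peb:.0f} s: {remaining:.1%}")

print(f"Rate enhancement at 50% vs 21% O2: {k_50pct_O2 / k_21pct_O2:.2f}x")
```

The roughly 1.5x rate enhancement between 21% and 50% O₂ is what compounds, over longer bakes and higher temperatures, into the ~3x ligand-loss enhancement cited below.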

Contrasting exposed vs. unexposed films, O₂ enhances cleavage only post-EUV, implying radicals or active sites from exposure enable O₂ reaction. Proposed: EUV cleaves ligands, creating radical Sn for O₂ insertion, forming Sn-O-Sn or peroxides, unlike unexposed thermal processes.

Quantitatively, 50% O₂ yields ~3x ligand loss enhancement at high temps, potentially lowering EUV doses by amplifying sensitivity without resolution loss. Companion work (Pollentier et al., SPIE ALP 2026) links this to dose-to-gel reductions up to 30%.

Bottom Line: This study reveals O₂-dependent chemistry in model MORs that requires EUV activation. The outlook includes in-situ PEB studies, PED O₂ effects, and H₂O+O₂ synergy. Funded by the EU’s Chips Joint Undertaking and partners, these insights enable co-optimized environments for stable, sensitive MOR processes, advancing semiconductor scaling.

Acknowledgments thank imec teams, Intel, and suppliers for materials and discussions.

Also Read:

Beyond Moore’s Law: High NA EUV Lithography Redefines Advanced Chip Manufacturing

Accelerating Computational Lithography Using Massively Parallel GPU Rasterizer

Unraveling Dose Reduction in Metal Oxide Resists via Post-Exposure Bake Environment


Post-Silicon Validating an MMU. Innovation in Verification

by Bernard Murphy on 03-25-2026 at 6:00 am


Some post-silicon bugs are unavoidable, but we’re getting better at catching them before we ship. Here we look at a method based on a bare-metal exerciser to stress-test the MMU. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Post Silicon Validation of the MMU. The authors are from IBM, and this is an extension to a paper we looked at in 2022. The paper we review here was published at the DATE conference in 2021 and has 2 citations. The method generates (offline) multi-threaded tests to run on first silicon, requiring only a bare-metal interface. This exerciser is a self-contained program with all supporting data and library functions and can run indefinitely using new randomized choices for each set of threads.

The extension here focuses on MMU functions (TLB, page table walks, etc.) in a multi-core environment. In part, the concepts are similar to pre-silicon testing methods but exploit the advantage that randomized post-silicon testing can cover a much larger state space over a much longer time than would be possible in RTL simulation. Similarly, the exerciser can add system-level stresses to testing such as context switches and page migration.

Paul’s view

Back to the topic of bare-metal exercisers this month with another paper from IBM on their Threadmill tool. These exercisers are template-based generators of pseudo-random software programs. The programs generated are “bare metal” without any need for an operating system so have the freedom to create a range of low-level hardware race conditions that would otherwise be very hard to hit. The primary application is in post-silicon testing, but there is an increasing trend to use these exercisers in pre-silicon emulation since they often run much faster than classic emulation testbenches.

Our previous blogs covered bare-metal exercisers targeting coherency bugs in multi-core CPUs, deploying various tricks to stress test load/store race conditions between concurrent threads running on multiple cores. This month’s blog is on stress testing virtual to physical address translation in a memory management unit (MMU). A modern MMU has a lot of complexity, including caches for recently accessed address translations, multi-level address translation tables, and security controls. Covering all the corner cases for cache misses, thread context switches to different virtual address spaces, and security policy violations, especially race conditions between combinations of these exceptions, is borderline impossible without doing it at a bare metal level.

One key innovation in the paper is to use constraint solving (exactly as in a commercial logic simulator), to stress test permutations of concurrent address translations between multiple threads, especially including all possible walks through multi-level address translation tables. Another innovation relates to stress testing changes to the translation tables. Here, one thread runs a program that continuously locks a random block of virtual addresses and moves its associated physical address block to another location in physical memory, updating the MMU translation table accordingly. Meanwhile multiple other threads continuously run random load and store operations to those virtual addresses.
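The page-migration stress pattern described above can be sketched in miniature: one "mover" thread continuously relocates a random virtual page's physical frame and updates the translation table, while worker threads load and store through the translation. This is a toy model under assumed names (page_table, phys_mem), not the authors' Threadmill implementation; a real exerciser would run bare-metal on silicon with per-block locks rather than a single Python lock.

```python
import random
import threading

# Illustrative sketch of the paper's page-migration stress pattern.
# PAGES, FRAMES, and the data structures are assumptions for this toy model.
PAGES, FRAMES, ITERS = 8, 32, 2000

page_table = {v: v for v in range(PAGES)}   # virtual page -> physical frame
phys_mem = {f: 0 for f in range(FRAMES)}    # physical frame -> stored value
table_lock = threading.Lock()               # stands in for per-block locking

def mover():
    """Continuously migrate a random virtual page to a free frame."""
    rng = random.Random(1)
    for _ in range(ITERS):
        with table_lock:
            vpage = rng.randrange(PAGES)
            new_frame = rng.randrange(FRAMES)
            if new_frame in page_table.values():
                continue                            # frame busy; try again later
            old_frame = page_table[vpage]
            phys_mem[new_frame] = phys_mem[old_frame]  # copy the data
            page_table[vpage] = new_frame              # update the translation

def worker(seed):
    """Random loads and stores through the translation table."""
    rng = random.Random(seed)
    for _ in range(ITERS):
        with table_lock:
            frame = page_table[rng.randrange(PAGES)]
            if rng.random() < 0.5:
                phys_mem[frame] = rng.randrange(100)   # store
            else:
                _ = phys_mem[frame]                    # load

threads = [threading.Thread(target=mover)] + [
    threading.Thread(target=worker, args=(s,)) for s in (2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Invariant: after thousands of migrations, no two virtual pages
# may share a physical frame.
assert len(set(page_table.values())) == PAGES
```

The interesting bugs in real silicon arise when the lock granularity or TLB invalidation is subtly wrong, which is exactly the race-condition space this pattern is designed to probe.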

The authors implement all their ideas using Threadmill and highlight three deep corner-case bugs that their solution found that were missed both by other in-house IBM exercisers and by their regular pre-silicon DV work. They also compare RTL code coverage achieved across the same experiments, showing that Threadmill beats the other exercisers by about 4%, although it still trails regular pre-silicon DV coverage by 3%. Tight paper, with some nice ideas and clear benefits.

Raúl’s view

This month’s paper, “Post-Silicon Validation of the MMU”, presents a methodology for validating a Memory Management Unit using a bare-metal exerciser (Threadmill). The MMU is not “just another block”: while the core is pure logic, the MMU is a co-designed, distributed HW/SW protocol, which makes it disproportionately hard to verify. It sits at the boundary between hardware and the OS (or hypervisor), translating virtual to physical addresses through multi-level tables and caches (TLBs). This creates a massive combinatorial space, with aliasing, contexts, and shared resources across cores. The key idea is to significantly enrich the generation of address translation scenarios beyond simple VA-to-PA mappings, toward randomized, constraint-driven, and context-aware translations.

The approach includes off-target generation of translation mappings (page tables and paths) using constraint solvers, graph-based constraint solving to create diverse translation paths, complex runtime behaviors (page migration, context switching, TLB invalidations) and embedded exception handlers. The result is a system that better stresses corner cases in MMU behavior, especially those involving concurrency, aliasing, and rare timing interactions — areas where pre-silicon verification is weakest.

The paper reports RTL coverage as the primary quantitative metric, highlighting a ~4% improvement over a state-of-the-art exerciser. RTL coverage is primarily a pre-silicon sign-off metric and is used here as a proxy for comparability. In post-silicon validation the real interest is bugs, especially the rare, high-impact ones. Coverage tells us how much of the map was explored; it does not tell us if we found what we were looking for. The paper does list non-trivial bugs found, including ones that led to additional tape-outs; the 4% is unlikely to be “more of the same” coverage but rather harder-to-reach corner cases.

I found the paper hard to read. Many of the innovations over a baseline Threadmill-style exerciser are described almost entirely in dense prose, for example the translation engine (GCSP over DAGs) and runtime scenarios like page migration, context switching, and attribute perturbations. The paper also leans heavily on prior work (papers, patents, and internal techniques). For experts this may be familiar; for broader audiences it reduces accessibility.

The paper is a strong, experience-driven contribution to post-silicon verification reflecting a rich, mature body of industrial knowledge, particularly relevant for teams dealing with MMU verification. While the presentation is dense and occasionally hard to read, the underlying ideas remain highly relevant. It is methodologically aligned with current practice (randomization, stress, coverage), but does not yet reflect newer paradigms (e.g., ML-guided test generation, agentic flows, feedback-driven exploration). The reported gains may look incremental, but in the context of late-stage silicon validation, they can be the difference between a clean product launch and an expensive respin.

Also Read:

An Agentic Formal Verifier. Innovation in Verification

Agentic EDA Panel Review Suggests Promise and Near-Term Guidance

TSMC and Cadence Strengthen Partnership to Enable Next-Generation AI and HPC Silicon


Securing UALink in AI clusters with UALinkSec-compliant IP

Securing UALink in AI clusters with UALinkSec-compliant IP
by Don Dingee on 03-24-2026 at 10:00 am

UALinkSec 200 Security Module block diagram

A classic networking problem is securing connections with encrypted data, but implementing strong encryption algorithms at wire speeds can limit performance. However, introducing blazing-fast connectivity without an encryption strategy leaves systems vulnerable. The architects in the UALink Consortium, including Synopsys representation, understood their assignment. UALink defines point-to-point accelerator links with a switched architecture for scaling up AI clusters to 1,024 accelerators, and the latest UALink 200G specification solidifies the UALinkSec security framework. As a companion to its UALink controller IP and robust 224G PHY IP, Synopsys is introducing its UALinkSec_200 Security Module, the first specification-compliant implementation for UALink security.

Inserting UALinkSec in the UALink network layers

UALink borrows a standard Ethernet PHY physical layer and adds unique link, transaction, and protocol layers to build in advanced features for point-to-point connections. This physical-layer choice enables immediate reuse of Ethernet 802.3dj-compliant PHY components, including the Synopsys 224G PHY IP. Low latency is a primary consideration, and simplifying assumptions help. Fixed payloads carry either 64 or 640 bytes; cable length is kept under 4 meters; and endpoints are limited to 1,024. Link-layer retransmission and credit-based flow control keep data moving, with retransmissions occurring in less than 1 µs. A high-level overview of the stack from the UALink 200 v1.0 spec:
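The credit-based flow control mentioned above can be sketched as a toy model: the sender spends one credit per transmitted flit, and the receiver returns a credit each time it drains a flit from its buffer, so the sender can never overrun the receive buffer. Class, method, and buffer-size choices here are illustrative assumptions, not taken from the UALink specification.

```python
from collections import deque

class CreditLink:
    """Toy model of link-layer credit-based flow control."""
    def __init__(self, buffer_slots=4):
        self.credits = buffer_slots     # one credit per receive-buffer slot
        self.rx_buffer = deque()

    def send(self, flit):
        """Transmit only if a credit is available; otherwise back-pressure."""
        if self.credits == 0:
            return False                # sender must wait
        self.credits -= 1
        self.rx_buffer.append(flit)
        return True

    def receive(self):
        """Receiver drains a flit and returns a credit to the sender."""
        if not self.rx_buffer:
            return None
        flit = self.rx_buffer.popleft()
        self.credits += 1
        return flit

link = CreditLink(buffer_slots=2)
assert link.send("flit-0") and link.send("flit-1")
assert not link.send("flit-2")          # buffer full: back-pressured
assert link.receive() == "flit-0"       # credit returned to sender
assert link.send("flit-2")              # transmission resumes
```

The appeal of the scheme is that back-pressure is implicit and lossless: no flit is ever dropped for lack of buffer space, which keeps retransmission limited to genuine link errors.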

Between the transaction and protocol layers sits UALinkSec, deceptively thin in its description as “[end-to-end] encryption and authentication.” Its role is to protect network traffic and switches from any adversary, whether physically present or virtually inserted. UALinkSec supports encryption and authentication of all the UPLI protocol channels – requests, read responses, and write responses. When enabled, it provides data confidentiality and integrity. A simplified view, with the keys indicating UALinkSec operation:

Encryption based on AES-GCM for security and speed

The good news is UALinkSec is cleanly decoupled from the other UALink layers, making it ripe for a dedicated hardware co-processor block. Still, processing encryption algorithms can be a heavy-duty task, and power efficiency in AI data centers is a growing concern, especially since it scales directly with the number of AI nodes. Any encryption battle where processing time and power consumption are crucial parameters is won or lost on a simple decision: choosing the right encryption algorithm. If an algorithm is efficient, it’s a much more straightforward task to wrap processing around it and deliver encrypted data on time with as few watts as possible.

When you create a new security specification, you can choose a modern encryption algorithm that offers both security and speed. For UALinkSec, that choice is AES-GCM, an authenticated-encryption mode that combines the AES block cipher with Galois/Counter Mode for extremely fast symmetric-key encryption with built-in integrity protection. Dedicated, inexpensive hardware unleashes the full speed of AES-GCM.

Against that background, Synopsys created a new IP block, the UALinkSec_200 Security Module, which complements its UALink controller IP and 224G PHY, forming a complete UALink IP Solution. The UALinkSec_200 Security Module aligns with the UALinkSec component of the UALink 200 specification. In addition to encryption and decryption functions, it supports key derivation functionality and optional authentication support – all at full UALink speeds of 200 GT/s per lane. A block diagram shows how it handles both transmit and receive data paths:
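The key-derivation functionality mentioned above follows a general pattern that can be illustrated with a minimal HKDF (RFC 5869) sketch using only Python's standard library. To be clear, UALinkSec defines its own key-derivation scheme; this is just the generic extract-then-expand shape such hardware implements, and the function and input names are illustrative.

```python
import hashlib
import hmac

def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """Minimal HKDF-SHA256 sketch (RFC 5869): extract, then expand."""
    # Extract: concentrate input keying material into a fixed-size PRK.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: stretch the PRK into the requested number of output bytes.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Hypothetical inputs; real inputs would come from the link's key exchange.
session_key = hkdf(b"shared-secret", b"salt", b"ualink-session", 32)
assert len(session_key) == 32
# Derivation is deterministic for the same inputs.
assert session_key == hkdf(b"shared-secret", b"salt", b"ualink-session", 32)
```

In a hardware security module, this derivation chain lets a single provisioned secret spawn per-session AES-GCM keys without ever exposing the root key on the wire.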

For more background, the UALink Consortium has a white paper that provides an introduction to the UALink 200G specification, including a section on UALinkSec.

Synopsys teams detail their solution in a blog post discussing the architecture and features of the UALinkSec_200 Security Module, along with additional information, including overviews and data sheets for all three components of the UALink IP Solution. Learn more at these links.

Blog post:    Securing UALink: Introducing Synopsys UALinkSec_200 Security Module

Webpages:

Synopsys UALinkSec_200 Security Module

Synopsys UALink IP Solution

UALink for Scalable AI Systems


GTC 2026: Agentic AI for Semiconductor Design and Manufacturing

GTC 2026: Agentic AI for Semiconductor Design and Manufacturing
by Daniel Nenni on 03-24-2026 at 8:00 am

Janhavi Giri, PhD Principal Architect, EDA & AI NetApp, GTC 2026

Agentic AI is emerging as a transformative paradigm in semiconductor design and manufacturing, driven by the exponential growth in data, system complexity, and performance demands. Modern semiconductor fabs generate massive volumes of heterogeneous data at unprecedented velocity. For instance, a single minute of operation in a gigafab can produce tens of thousands of wafer movement events, thousands of sensor readings, and over 100 GB of equipment and lithography data. This data-intensive environment necessitates advanced AI-driven systems capable of real-time ingestion, analysis, and decision-making.

Traditionally, semiconductor workflows relied on heuristic-based methods and manual engineering expertise. Over the past three decades, these workflows have evolved through classical machine learning and deep learning into generative AI systems, culminating in the emergence of agentic AI. Agentic AI represents a shift from assistive intelligence to autonomous systems capable of planning, reasoning, and executing complex tasks with minimal human intervention. This transition enables higher levels of automation, improved design productivity, and significant reductions in time-to-market.

In EDA, agentic AI is being deployed as multi-agent systems that orchestrate various stages of chip design. These agents specialize in tasks such as specification generation, microarchitecture design, verification, and physical implementation. Coordinated by an orchestrator agent, they collaboratively optimize power, performance, and area while ensuring functional correctness. Industry implementations demonstrate substantial productivity gains; for example, AI-driven EDA agents can achieve up to 10× acceleration in design workflows and significantly improve bug detection and resolution efficiency.
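The orchestrator-plus-specialists pattern described above can be sketched schematically. Everything here is a hypothetical illustration: the agent names, the "verified" gating flag, and the retry policy are assumptions for the sketch, not any vendor's implementation; real agents would wrap LLM and tool calls.

```python
from typing import Callable

def spec_agent(state: dict) -> dict:
    """Stand-in for an agent that drafts a specification."""
    state["spec"] = "decoded instruction set spec"
    return state

def verify_agent(state: dict) -> dict:
    """Stand-in for a verification agent; records a pass/fail flag."""
    state["verified"] = "spec" in state
    return state

class Orchestrator:
    """Sequences specialist agents and retries a stage if its checks fail."""
    def __init__(self, stages: list[Callable[[dict], dict]]):
        self.stages = stages

    def run(self, state: dict, max_retries: int = 2) -> dict:
        for stage in self.stages:
            for _ in range(max_retries + 1):
                state = stage(state)
                if state.get("verified", True):   # gate each stage on checks
                    break
        return state

result = Orchestrator([spec_agent, verify_agent]).run({})
assert result["verified"]
```

The shared state dictionary is what makes the agents collaborative rather than a fixed pipeline: each stage can read what earlier stages produced and the orchestrator can route failures back for rework.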

In manufacturing, agentic AI enhances yield optimization and root cause analysis. Advanced machine learning models analyze wafer inspection data to detect defect patterns and predict failure modes. Knowledge graph-based systems provide structured representations of semiconductor processes, linking entities such as lots, wafers, dies, and packages. These semantic models ground AI reasoning, reduce hallucinations, and enable traceable decision-making. Additionally, digital twins and virtual fabrication environments allow AI agents to simulate process variations and recommend optimal parameter adjustments, reducing yield ramp time and operational costs.

A critical enabler of agentic AI is the underlying data infrastructure. Semiconductor systems require AI-ready data fabrics that support high-throughput ingestion, storage, indexing, and retrieval of multimodal data. These platforms must incorporate versioning, lineage tracking, and real-time streaming to ensure reproducibility and reliability of AI-driven workflows. GPU-accelerated compute pipelines further enable large-scale simulations and model training, while scalable MLOps frameworks support continuous deployment and orchestration of AI agents.

The transition to agentic AI is typically incremental. Organizations begin with assistive AI tools such as copilots for documentation search, code generation, and design exploration. Subsequently, they build domain-specific semantic foundations using ontologies and knowledge graphs. As data pipelines mature, enterprises adopt single-purpose agents and gradually evolve toward fully orchestrated multi-agent systems targeting high-impact use cases such as verification signoff, yield analysis, and design-for-manufacturability checks.

Bottom line: Agentic AI represents a paradigm shift in semiconductor engineering, enabling autonomous, intelligent systems that span the entire silicon lifecycle—from design to fabrication. By integrating advanced AI models with robust data infrastructure and domain-specific knowledge representations, the industry is moving toward fully automated, self-optimizing workflows. This evolution is essential to address the growing complexity of semiconductor technologies and to sustain innovation in the trillion-dollar global chip market.

Also Read:

Agentic EDA Panel Review Suggests Promise and Near-Term Guidance

Cloud-Accelerated EDA Development

Agentic AI and the EDA Revolution: Why Data Mobility, Security, and Availability Matter More Than Ever


Trust in Verification with AI

Trust in Verification with AI
by Bernard Murphy on 03-24-2026 at 6:00 am

Uncharted waters

These are stressful times in functional verification. We are being pushed to more aggressively embrace AI-based automation, knowing we will continue to be held accountable for quality of results. Verification misses could upend careers, maybe enterprises. It is tempting to believe that sanity will prevail and we will ultimately settle back into cautious AI adoption, but I am no longer so sure. Offset against verification risk is the very real chance that a competitor with a bigger risk appetite and a little luck will jump ahead and leave the rest of us wondering why customers are disappearing. We need to accept that we must reach further, exploiting our creativity to manage verification risk in uncharted waters, trusting with confidence that agents on semi-autopilot will not steer us into a rock.

Views from DVCon

I heard several opinions at DVCon. These come down to decomposing agentic flows into steps with checkpoints at which a human reviewer can easily check an agent’s work and correct as needed.

An example would be using an agent to read a spec/test spec for a specific function and from that generate PSS tests. A reasonably experienced DV engineer should be able to compare the relevant section of the spec with the generated PSS to check for correctness and completeness, and iterate if needed. Finally, synthesize the PSS into UVM or other tests, run a simulation, and score the run for coverage and assertion failures, which may in turn trigger more iterations, steered by additional feedback from DV. There is evidence from a panel I moderated that, through enough iterations, it is possible to converge to engineering accuracy. Over time, fewer iterations are needed, growing trust.
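The generate-score-iterate loop described above can be sketched schematically. The generate and score functions below are stand-ins for real agent and simulation calls, and all the numbers (coverage per test, target, retry budget) are arbitrary assumptions; the point is only the shape of the feedback loop with its human checkpoint.

```python
def generate_tests(spec: str, feedback: str) -> list[str]:
    # Stand-in for an LLM agent emitting PSS/UVM tests from the spec;
    # it produces more tests for each coverage gap reported in feedback.
    n = 2 + feedback.count("gap")
    return [f"test_{i}" for i in range(n)]

def score_run(tests: list[str]) -> float:
    # Stand-in for a simulation run; coverage saturates as tests accumulate.
    return min(1.0, 0.25 * len(tests))

def converge(spec: str, target: float = 0.9, max_iters: int = 5) -> float:
    """Iterate generation and scoring until coverage reaches the target."""
    feedback, coverage = "", 0.0
    for _ in range(max_iters):
        tests = generate_tests(spec, feedback)
        coverage = score_run(tests)
        if coverage >= target:      # a human review checkpoint would sit here
            break
        feedback += " gap"          # DV feedback steers the next pass
    return coverage

assert converge("mmu spec") >= 0.9
```

The checkpoint inside the loop is where the trust argument lives: a DV engineer reviews the generated tests against the spec before the next iteration, so errors are caught per-step rather than at the end.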

Another consideration is that sometimes the right place to fix a problem is in the spec, not in agentic training. Here explainability becomes important – why did the agent do something wrong or unexpected? Human-generated specs can be incomplete, inconsistent, or incorrect in places (see this). Automating spec correction and refinement guided by human feedback is an important component in reinforcing trust.

Views from Software Engineering

I found several recent papers studying use of agentic methods in software engineering. One such paper from Monash U. in Australia and Columbia U. emphasizes that software engineering is a collaborative effort, from requirements gathering all the way to long-term maintenance and evolution. While AI agents should be able to perform some tasks autonomously, they must also conform to this collaborative model, allowing for evolving problem statements, tests, constraints and feedback.

The authors point out that trust is accumulated over time in sequences of interactions in which correctness and reliability are obviously fundamental. It’s OK for agents to be wrong in early stages, but correctness and reliability after sufficient training are essential.

Another paper from the National U of Singapore, CMU, and Stuttgart U. offers some interesting insights detailing technical considerations and human considerations for building trust. Some of their technical considerations are familiar in our context: tool-based methods to validate correctness, performance and compliance with general best practices.

Human factors are more interesting. Explainability and transparency require that AI be able to justify why it made certain choices. Team practice compliance expects that agents adhere not just to general best practices but also more tightly to local team practices. The authors also suggest checks that explanations be matched appropriately to developer experience and that developers do not depend too blindly on agents (I assume detected through insufficient review and/or little correction, though I didn’t notice suggestions on this point).

Views from Davos

Don’t laugh. I agree in general that big consensus-driven organizations are poorly equipped to formulate policies for technologies moving much faster than they can respond. However the World Economic Forum (WEF, who host the annual Davos meeting) bring together world leaders, business leaders and academics to discuss challenges and to formulate guidance rather than to regulate. An AI forum is a regular event now at Davos, and trust is now viewed as critical for encouraging worldwide AI growth.

Trust is a human condition which can’t be “fixed” with automation, but it can be fostered. WEF have published an article suggesting a “trust stack”. The first stack layer is “non-deceptive affect”, meaning the agent should not try to gain trust through empathetic or praising cues or emotional appeals. A second is “epistemic humility”, a mouthful meaning that agents must communicate appropriate levels of uncertainty alongside their claims, up to “I don’t know” where appropriate. Agents should also emphasize consistency over persuasion; answers should read as principled beliefs, not opinions of the moment. A fast way to destroy trust is to provide answers that change each time a question is asked.

There are more layers in the stack, but you get the idea. We want agents to act professionally and treat us as professionals, just as we would expect junior engineers to behave.

Takeaways

Trust with confidence in agentic flows is achievable but it doesn’t come free. We must adapt our behavior, to be appropriately wary of first-pass responses, to carefully review deliverables for issues, and to invest time in training agents through multiple cycles before we can consider them effective team members. Even then their output should remain subject to design review, just as for any human team member.

Some trust-centric improvements may require more detailed setup prompts to scope agent behavior (expert, professional, etc.) and to guard against unconsidered signoffs. Explainability with support for correction may be one of the most important factors in building trust, allowing us to detect and retrain where reasoning goes wrong. Today this is supported in some models through after-the-fact mechanisms, though I sense that much here is still in R&D.

Full autonomy may not be a reachable or even a desirable goal but there is certainly a path to significantly improve trust in stages, delivering improved productivity and shorter schedules. Ultimately semiconductor executives don’t expect miracles, but they do want to see significant improvement.

Also Read:

Podcast EP336: How Quadric is Enabling Dramatic Improvements in Edge AI with Veer Kheterpal

WEBINAR: HBM4E Advances Bandwidth Performance for AI Training

Siemens Fuse EDA AI Agent Releases to Orchestrate Agentic Semiconductor and PCB Design


Scaling Multi-Die Connectivity: Automated Routing for High-Speed Interfaces

Scaling Multi-Die Connectivity: Automated Routing for High-Speed Interfaces
by Kalar Rajendiran on 03-23-2026 at 10:00 am

Bump maps for HBM PHY and HBM memory

This article concludes the three-part series examining key methodologies required for successful multi-die design. The first article, Reducing Risk Early: Multi-Die Design Feasibility Exploration, focused on feasibility exploration and early architectural validation, while the second article, Building the Interconnect Foundation: Bump and TSV Planning for Multi-Die Systems, discussed bump and TSV planning as the foundation for physical interconnect infrastructure. With these elements established, the next critical step is routing high-speed die-to-die interfaces.

As multi-die systems adopt advanced interconnect standards such as High-Bandwidth Memory (HBM) and Universal Chiplet Interconnect Express (UCIe), routing complexity has increased dramatically. These standards require extremely dense interconnect fabrics while maintaining strict signal integrity and performance requirements. Automated routing methodologies have therefore become essential for achieving scalable and reliable implementation.

The Rise of High-Speed Chiplet Interconnect Standards

High-speed interconnect standards are driving innovation in multi-die architectures by enabling efficient communication between heterogeneous chiplets. High-Bandwidth Memory provides exceptional data transfer rates through wide I/O interfaces and vertically stacked memory dies interconnected through TSVs. Universal Chiplet Interconnect Express enables standardized die-to-die communication across vendors, supporting scalable system integration and design reuse.

Both standards rely on extremely dense bump maps and fine interconnect pitches, placing significant demands on routing methodologies and signal integrity control.

Bump Maps for HBM PHY and HBM Memory

Routing Challenges in Multi-Die Interfaces

Routing high-speed signals across multi-die systems introduces numerous competing constraints. Dense bump arrays create severe routing congestion, while limited routing layers must be shared with power delivery and shielding structures. Signal integrity concerns such as crosstalk, reflection, attenuation, and skew must be carefully controlled to ensure reliable data transmission.

Die placement and interface alignment further complicate routing implementation. PHY placements that are not perfectly aligned with each other often require complex routing geometries and multi-stage routing paths. As signal counts scale into the thousands across multiple dies, traditional manual routing approaches become increasingly impractical.

Early Routing Feasibility Analysis

Effective routing implementation begins with early feasibility analysis that evaluates routing pitch, channel spacing, shielding strategies, and technology limitations. Integrating routing feasibility into earlier design stages (Bump and TSV planning) allows designers to identify routing constraints before physical implementation begins, reducing design iterations and improving overall predictability.

Automated Routing Methodologies

Automated routing solutions use specialized algorithms to implement high-bandwidth bump-to-bump interconnects efficiently. These solutions analyze interface topology, partition signal channels, generate routing tracks, and create optimized routing guides. By automating escape via creation and routing path generation, automated routers significantly reduce manual effort while improving routing quality.

Bump-to-bump 2-layer, 45-degree HBM routing from PHY to Memory on a silicon interposer
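The channel-partitioning step that automated routers perform can be illustrated with a toy sketch: group an ordered bump-signal list into fixed-width routing channels and assign each signal a track index within its channel. Real routers solve this with congestion- and signal-integrity-aware algorithms; the function name, signal names, and channel width here are illustrative assumptions.

```python
def partition_channels(signals: list[str], tracks_per_channel: int):
    """Toy channel partitioner: slice an ordered signal list into channels."""
    channels = []
    for i in range(0, len(signals), tracks_per_channel):
        group = signals[i:i + tracks_per_channel]
        # Each entry: (signal, channel index, track index within channel).
        channels.append([(s, i // tracks_per_channel, t)
                         for t, s in enumerate(group)])
    return channels

# Hypothetical HBM data signals partitioned into 4-track channels.
dq = [f"DQ{i}" for i in range(10)]
plan = partition_channels(dq, tracks_per_channel=4)
assert len(plan) == 3                       # channels of 4 + 4 + 2 signals
assert plan[0][0] == ("DQ0", 0, 0)
assert plan[2] == [("DQ8", 2, 0), ("DQ9", 2, 1)]
```

Keeping adjacent bump signals in the same channel is what lets the router generate uniform track geometries per channel, which in turn keeps trace lengths and skew matched across the interface.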

Signal Integrity-Driven Routing Optimization

Achieving connectivity alone is insufficient for high-speed interfaces. Automated routing engines must also optimize electrical performance. Advanced routing strategies ensure consistent trace geometry, flexible shielding implementation, and accurate differential pair routing. Additional techniques, such as return-path via placement and routing accommodations for decoupling capacitors, further improve signal integrity and system reliability.

Automated Verification and Reporting

Automated routing platforms provide comprehensive reporting capabilities that allow designers to evaluate routing performance and completeness. These reports include routing success metrics, congestion analysis, connectivity verification, and signal length statistics. Such visibility allows design teams to identify optimization opportunities and validate routing quality early in implementation.

Synopsys 3DIC Compiler Platform for Scalable Multi-Die Integration

As multi-die systems continue to grow in complexity and performance requirements, automated routing solutions are becoming indispensable. Synopsys 3DIC Compiler platform fulfills this requirement with an integrated, automated routing solution purpose-built for high-bandwidth die-to-die interconnects. The platform supports specialized capabilities for HBM and UCIe, enabling fast and reliable routing with minimal manual intervention. It combines routing automation with integrated multiphysics analysis, allowing designs to maintain signal integrity while accelerating implementation timelines and reducing design risk.

Learn more by accessing the whitepaper here.

Summary: A Unified Multi-Die Design Methodology

Multi-die design success depends on a coordinated workflow that integrates feasibility exploration, interconnect planning, and automated routing implementation. Each stage builds upon the previous one, enabling design teams to progressively refine architectural concepts into manufacturable, high-performance systems. Together, these methodologies provide a scalable framework for developing next-generation heterogeneous multi-die semiconductor solutions capable of meeting the demands of emerging computing applications.

Also Read:

Synopsys Explores AI/ML Impact on Mask Synthesis at SPIE 2026

Agentic AI and the Future of Engineering

Ravi Subramanian on Trends that are Shaping AI at Synopsys