Key Takeaways
- VSORA has developed a novel architecture optimized for AI inference, achieving near-theoretical performance in latency, throughput, and energy efficiency.
- The architecture addresses the 'memory wall' issue by using a unified memory stage with a massive SRAM array, facilitating faster data access and eliminating bottlenecks.
- Each processing core in VSORA's architecture features 16 million registers and integrates high-throughput MAC units, enabling flexible tensor operations and high computational efficiency.

VSORA, a pioneering high-tech company, has engineered a novel architecture designed specifically to meet the stringent demands of AI inference—both in datacenters and at the edge. With near-theoretical performance in latency, throughput, and energy efficiency, VSORA’s architecture breaks away from legacy designs optimized for training workloads.
The team behind VSORA has deep roots in the IP business, having spent years designing, testing, and fine-tuning their architecture. Now in its fifth generation, the architecture has been rigorously validated and benchmarked over the past two years in preparation for silicon manufacturing.
Breaking the Memory Wall
The “memory wall” has challenged chip designers since the late 1980s. Traditional architectures attempt to mitigate the performance impact of data movement between external memory and processing units by layering memory hierarchies, such as multi-level caches, scratchpads, and tightly coupled memory, each offering a different tradeoff between speed and capacity.
In AI acceleration, this bottleneck becomes even more pronounced. Generative AI models, especially those based on incremental transformers, must constantly reprocess massive amounts of intermediate state data. Conventional architectures struggle here: every cache miss, or any operation that must reach beyond on-chip memory, can severely degrade performance.
VSORA tackles this head-on by collapsing the traditional memory hierarchy into a single, unified memory stage: a massive SRAM array that behaves like a flat register file. From the perspective of the processing units, any register can be accessed anywhere, at any time, within a single clock. This eliminates costly data transfers and removes the bottlenecks that hamper other designs.
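The benefit of single-clock access to a flat register file is easiest to see with a toy latency model. The hit rates and cycle counts below are assumed purely for illustration; they are not VSORA or competitor figures.

```python
# Toy latency model contrasting a multi-level memory hierarchy with a flat,
# single-cycle register file. All cycle counts and hit rates are assumed
# illustrative values, not VSORA or competitor figures.

def hierarchical_latency(levels):
    """levels: list of (local_hit_rate, latency_cycles); last level catches all."""
    expected, p_reach = 0.0, 1.0
    for hit_rate, latency in levels:
        expected += p_reach * hit_rate * latency
        p_reach *= 1.0 - hit_rate
    return expected

# Assumed L1 / L2 / L3 / DRAM behavior for a cache-based design
cache_based = hierarchical_latency([(0.80, 4), (0.70, 14), (0.60, 40), (1.00, 300)])
flat_sram = 1.0   # the article's single-clock access to any register

print(f"hierarchy: {cache_based:.1f} cycles per access on average")
print(f"flat SRAM: {flat_sram:.1f} cycle per access")
```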
A New AI Processing Paradigm: 16 Million Registers per Core
At the core of VSORA's architecture is a high-throughput computational tile consisting of 16 processing cores. Each core integrates 64K multi-dimensional matrix multiply–accumulate (MAC) units, scalable from 2D to arbitrary N-dimensional tensor operations, alongside eight high-efficiency digital signal processing (DSP) cores. Numerical precision is dynamically configurable on a per-operation basis, ranging from 8-bit fixed-point to 32-bit floating-point formats. Both dense and sparse execution modes are supported, with runtime-selectable sparsity applied independently to weights or activations, enabling fine-grained control of computational efficiency and inference performance.
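Taking the quoted figures at face value, a short back-of-envelope sketch shows what one tile adds up to. Reading "64K" as 64 × 1024 and counting one MAC as two FLOPs are stated assumptions, not published specifications.

```python
# Back-of-envelope on per-tile resources, using only the figures quoted above.
CORES_PER_TILE = 16
MACS_PER_CORE = 64 * 1024              # "64K" read as 64 * 1024
REGISTERS_PER_CORE = 16_000_000        # "16 million registers per core"
FLOPS_PER_MAC = 2                      # one multiply plus one accumulate

macs_per_cycle = CORES_PER_TILE * MACS_PER_CORE
flops_per_cycle = macs_per_cycle * FLOPS_PER_MAC
registers_per_tile = CORES_PER_TILE * REGISTERS_PER_CORE

print(f"{macs_per_cycle:,} MACs issued per clock per tile")            # 1,048,576
print(f"{flops_per_cycle / 1e6:.1f} MFLOP per clock per tile")          # ~2.1
print(f"{registers_per_tile / 1e6:.0f} million registers per tile")     # 256
# Sustained TFLOPS then depends on clock frequency and tile count, which the
# article does not state per tile; the Jotunn8 and Tyr totals appear later.
```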
Each core incorporates an unprecedented 16 million registers, orders of magnitude more than the few hundred to a few thousand typically found in conventional architectures. While such a massive register file would normally overwhelm traditional compiler designs, VSORA overcomes the challenge with two architectural innovations:
- Native Tensor Processing: VSORA’s hardware natively supports vector, tensor, and matrix operations, removing the need to decompose them into scalar instructions. This eliminates the manual implementation of nested loops often required in GPU environments such as CUDA, thereby improving computational efficiency and reducing programming complexity.
- High-Level Abstraction: Developers program at a high level using familiar frameworks, such as PyTorch and ONNX for AI workloads, or Matlab-like functions for DSP, without the need to write low-level code or manage registers directly. This abstraction layer streamlines development, enhances productivity, and maximizes hardware utilization.
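VSORA's toolchain itself is proprietary, but the entry point described above is standard. As a minimal illustration of the kind of artifact such a flow ingests, the snippet below exports a placeholder PyTorch model to ONNX; nothing in it is VSORA-specific, and the model and file names are placeholders.

```python
# Producing the kind of high-level input the article says the flow accepts:
# a PyTorch model exported to ONNX. Placeholder model; no VSORA API involved.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
example_input = torch.randn(1, 1024)

torch.onnx.export(
    model,
    example_input,
    "model.onnx",                                   # hypothetical output path
    input_names=["activations_in"],
    output_names=["activations_out"],
    dynamic_axes={"activations_in": {0: "batch"}},  # allow variable batch size
)
```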
Chiplet-Based Scalability
VSORA’s physical implementation leverages a chiplet architecture, with each chiplet comprising two VSORA computational tiles. By combining VSORA chiplets with high-bandwidth memory (HBM) chiplet stacks, the architecture enables efficient scaling for both cloud and edge inference scenarios.
- Datacenter-Grade Inference. The flagship Jotunn8 configuration pairs eight VSORA chiplets with eight HBM3e chiplets, delivering an impressive 3,200 TFLOPS of compute performance in FP8 dense mode. This configuration is optimized for large-scale inference workloads in datacenters.
- Edge AI Configurations. For edge deployments, where memory requirements are lower, VSORA offers:
- Tyr2: Two VSORA chiplets + one HBM chiplet = 800 TFLOPS
- Tyr4: Four VSORA chiplets + one HBM chiplet = 1,600 TFLOPS
These configurations allow compute and memory resources to be tailored to the constraints of edge applications.
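The three configurations are internally consistent, and a short check makes the scaling rule explicit. The tiles-per-chiplet value comes from the chiplet description above; the TFLOPS figures are the ones listed.

```python
# Compute scales linearly with the number of VSORA chiplets across the listed
# configurations, while HBM capacity is chosen per deployment target.
TILES_PER_CHIPLET = 2    # from the chiplet description above

configs = {  # name: (compute chiplets, HBM chiplets, FP8 dense TFLOPS)
    "Jotunn8": (8, 8, 3200),
    "Tyr4":    (4, 1, 1600),
    "Tyr2":    (2, 1, 800),
}

for name, (chiplets, hbm, tflops) in configs.items():
    per_chiplet = tflops / chiplets
    per_tile = per_chiplet / TILES_PER_CHIPLET
    print(f"{name}: {per_chiplet:.0f} TFLOPS/chiplet, "
          f"{per_tile:.0f} TFLOPS/tile, {hbm} HBM stack(s)")
# Every configuration works out to 400 TFLOPS per chiplet (200 per tile).
```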
Power Efficiency as a Side Effect
The performance gains are evident, but equally remarkable are the advances in processing and power efficiency.
Extensive pre-silicon validation using leading large language models (LLMs) across multiple concurrent workloads demonstrated processing efficiencies exceeding 50%, roughly an order of magnitude higher than state-of-the-art GPU-based designs.
In terms of energy efficiency, the Jotunn8 architecture consistently delivers twice the performance-per-watt of comparable solutions. In practical terms, its power draw is limited to approximately 500 watts, compared to more than one kilowatt for many competing accelerators.
Collectively, these innovations yield multiple times higher effective performance at less than half the power consumption, translating to an overall system-level advantage of 8–10× over conventional implementations.
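Read as arithmetic, the 8–10× figure is the product of an effective-performance advantage and a power advantage. The sketch below reproduces that composition, taking the "multiple times higher effective performance" claim as a 4–5× range for illustration and the power ratio from the roughly 500 W versus more than 1 kW comparison above.

```python
# Composition of the claimed 8-10x system-level advantage: an effective-
# performance ratio multiplied by a power ratio. The 4-5x range is an
# illustrative reading of "multiple times higher effective performance";
# the ~2x power ratio comes from ~500 W vs. >1 kW above.
power_ratio = 1000 / 500          # competitor watts / Jotunn8 watts

for perf_ratio in (4.0, 5.0):
    print(f"{perf_ratio:.0f}x effective performance at {power_ratio:.0f}x lower "
          f"power -> {perf_ratio * power_ratio:.0f}x performance per watt")
```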
CUDA-Free Compilation Simplifies Algorithmic Mapping and Accelerates Deployment
One of the often-overlooked advantages of the VSORA architecture lies in its streamlined and flexible software stack. From a compilation perspective, the flow is dramatically simplified compared to traditional GPU environments like CUDA.
The process begins with a minimal configuration file of just a few lines that defines the target hardware environment. This file enables the same codebase to execute across a wide range of hardware configurations, whether that means distributing workloads across multiple cores, chiplets, full chips, boards, or even across nodes in a local or remote cloud. The only variable is execution speed; the functional behavior remains unchanged. This makes on-premises and localized cloud deployments seamless and scalable.
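VSORA has not published the format of this configuration file, so the sketch below is purely hypothetical: a Python dict with invented field names, intended only to convey the idea of a few lines that describe the target.

```python
# Hypothetical sketch only: invented field names illustrating a "few lines"
# target description. VSORA's real configuration format is not public.
target_config = {
    "device": "jotunn8",       # or "tyr2", "tyr4", a board, or a cloud node
    "chiplets": 8,
    "default_precision": "fp8",
    "deployment": "on_prem",   # the same code could point at a remote cloud
}
# Per the article, swapping this description retargets the identical codebase;
# only execution speed changes, never functional behavior.
```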
A Familiar Flow, Without the Complexity
Unlike CUDA-based compilation processes, the VSORA flow is reassuringly simple, without layers of manual tuning. Traditional GPU environments often require multiple painstaking optimization steps that, when successful, can deliver strong performance but remain fragile and time-consuming. VSORA simplifies this through a more automated and hardware-agnostic compilation approach.
The flow begins by ingesting standard AI inputs, such as models defined in PyTorch. These are processed by VSORA’s proprietary graph compiler, which automatically performs essential transformations such as layer reordering or slicing for optimal execution. It extracts weights and model structure and then outputs an intermediate C++ representation.
This C++ code is then fed into an LLVM-based backend, which identifies the compute-intensive portions of the code and maps them to the VSORA architecture. At this stage, the system becomes hardware-aware, assigning compute operations to the appropriate configuration, whether that is a single VSORA tile, a Tyr4 edge device, a full Jotunn8 datacenter accelerator, a server, a rack, or even multiple racks in different locations.
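As a structural sketch of the flow just described, the stub functions below mirror those stages. The names and return values are invented, since the graph compiler and backend are proprietary; this is not a real VSORA API.

```python
# Structural sketch of the flow described above. The stage functions are
# stubs with invented names; they mirror the text, not a real VSORA API.

def graph_compile(model):
    """Reorder/slice layers, extract weights, emit a C++ intermediate."""
    weights = {"layer0": []}                 # stands in for extracted tensors
    cpp_ir = "// generated C++ kernels"      # stands in for the C++ output
    return weights, cpp_ir

def llvm_backend(cpp_ir, target):
    """Map compute-intensive regions of the C++ onto the chosen hardware."""
    return f"binary for {target}"            # stands in for the final artifact

def compile_for_vsora(model, target="jotunn8"):
    weights, cpp_ir = graph_compile(model)   # proprietary graph compiler stage
    return llvm_backend(cpp_ir, target), weights

binary, weights = compile_for_vsora(model=None, target="tyr4")
```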
Invisible Acceleration for Developers
From a developer’s point of view, the VSORA accelerator is invisible. Code is written as if it targets the main processor. During compilation, the toolchain identifies the code segments best suited for acceleration and transparently handles their transformation and mapping to VSORA hardware. This significantly lowers the barrier to adoption, requiring no low-level register manipulation or specialized programming knowledge.
VSORA’s instruction set is high-level and intuitive, carrying over rich capabilities from its origins in digital signal processing. The architecture supports AI-specific formats such as FP8 and FP16 alongside traditional DSP arithmetic, all handled automatically on a per-layer basis. Switching between modes is instantaneous and requires no manual intervention.
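Per-layer switching can be pictured with a small, invented policy. The heuristic below is not VSORA's actual selection logic, which the toolchain applies automatically; it only illustrates what a per-layer decision looks like.

```python
# Invented per-layer format policy, purely to picture what "handled
# automatically on a per-layer basis" means. Not VSORA's actual logic.
def pick_format(layer_kind):
    # Assumed heuristic: matrix-heavy layers run in FP8, accumulation-
    # sensitive ones in FP16.
    return "fp8" if layer_kind in ("attention", "linear") else "fp16"

layers = ["embedding", "attention", "linear", "layernorm"]
print({name: pick_format(name) for name in layers})
# {'embedding': 'fp16', 'attention': 'fp8', 'linear': 'fp8', 'layernorm': 'fp16'}
```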
Pipeline-Independent Execution and Intelligent Data Retention
A key architectural advantage is pipeline independence—the ability to dynamically insert or remove pipeline stages based on workload needs. This gives the system a unique capacity to “look ahead and behind” within a data stream, identifying which information must be retained for reuse. As a result, data traffic is minimized, and memory access patterns are optimized for maximum performance and efficiency, reaching levels unachievable in conventional AI or DSP systems.
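The "look ahead and behind" idea can be illustrated with a textbook reuse analysis: with visibility into the upcoming access stream, only data that will actually be touched again needs to stay on chip. This is a generic sketch with placeholder tensor names, not VSORA's scheduling algorithm.

```python
# Generic reuse analysis, not VSORA's algorithm: given visibility into the
# upcoming access stream, keep on chip only what will be touched again.
def retained_after(step, access_stream):
    """Tensors worth holding on-chip after `step`: those accessed again later."""
    seen_so_far = set(access_stream[: step + 1])
    upcoming = set(access_stream[step + 1:])
    return seen_so_far & upcoming

stream = ["kv_cache", "w_q", "act0", "kv_cache", "w_q", "act1", "kv_cache"]
print(retained_after(2, stream))   # {'kv_cache', 'w_q'}; 'act0' is never reused
```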
Built-In Functional Safety
To support mission-critical applications such as autonomous driving, VSORA integrates functional safety features at the architectural level. Cores can be configured to operate in lockstep mode or in redundant configurations, enabling compliance with strict safety and reliability requirements.
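A minimal software analogue of lockstep operation is shown below: the same computation runs twice and the results are compared step by step, with any divergence flagged. In VSORA's case this is a hardware mechanism; the snippet only illustrates the principle.

```python
# Software analogue of lockstep execution: redundant runs of the same
# computation are compared, and any divergence raises an error. Purely
# illustrative; the real mechanism is implemented in hardware.
def lockstep(compute, inputs):
    primary = [compute(x) for x in inputs]
    shadow = [compute(x) for x in inputs]    # redundant core's execution
    for i, (a, b) in enumerate(zip(primary, shadow)):
        if a != b:
            raise RuntimeError(f"lockstep mismatch at step {i}")
    return primary

results = lockstep(lambda x: x * x, [1, 2, 3])   # [1, 4, 9]
```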
Conclusion
VSORA is not retrofitting old designs for modern inference needs; instead, it is building from the ground up. With a memory architecture that eliminates traditional bottlenecks, compute units tailored for tensor operations, and unmatched power efficiency, VSORA is setting a new standard for AI inference, whether in the cloud or at the edge.