Pushing the Packed SIMD Extension Over the Line: An Update on the Progress of a Key RISC-V Extension
by Daniel Nenni on 01-20-2026 at 6:00 am


The rapid growth of signal processing workloads in embedded, mobile, and edge computing systems has intensified the need for efficient, low-latency computation. Rich Fuhler’s update on the RISC-V Packed SIMD extension highlights why scalar SIMD digital signal processing (DSP) instructions are becoming a critical architectural feature and how the RISC-V ecosystem is moving closer to standardizing and deploying them at scale.

Packed SIMD, sometimes referred to as scalar SIMD, occupies a middle ground between purely scalar execution and full vector or GPU-style parallelism. Rather than operating on long vectors, packed SIMD instructions perform the same operation on multiple narrow data elements packed into a single scalar register. This approach is particularly effective for DSP-heavy workloads such as audio codecs, image processing, and communications algorithms, where operations like saturated arithmetic, multiply-accumulate (MAC), and bit manipulation dominate execution profiles.

One of the primary motivations for packed SIMD instructions is their suitability for latency-sensitive and deterministic workloads. Many DSP applications must meet strict real-time deadlines and cannot tolerate the overhead or nondeterminism associated with offloading computation to GPUs or wide vector units. Scalar SIMD instructions reduce instruction count and execution cycles while remaining tightly integrated into the scalar pipeline, enabling predictable timing behavior that is essential for real-time systems such as audio processing chains or control loops in industrial applications.

Power and silicon area efficiency are equally important drivers. In embedded and IoT devices, full SIMD or vector units often impose prohibitive costs in terms of energy consumption and die area. The presentation highlights a striking comparison from Andes Technology: a vector extension with two vector processing units can require roughly 850K logic gates, whereas the packed SIMD extension can be implemented in approximately 80K gates. This order-of-magnitude difference makes packed SIMD an attractive solution for designers who need higher performance than scalar code can deliver but cannot afford the overhead of full vector hardware.

As a result, a wide range of markets stand to benefit from the standardization of packed SIMD in RISC-V. These include mobile and edge AI, automotive and industrial IoT, consumer electronics, communications infrastructure such as 5G and satellite systems, and even microcontroller-class devices. In all of these domains, workloads frequently involve fixed-point arithmetic and repetitive DSP kernels that map naturally to packed SIMD operations.

From a standardization perspective, the Packed SIMD extension has reached an important consolidation phase. Instruction definitions that were previously scattered across multiple documents are being combined, with the majority now captured in the v0.92 draft of the specification, albeit with some renaming. New architectural tests have been written, and discussions are ongoing with the Architecture Review Committee to finalize instruction layout and formatting before formal review. An AsciiDoc version of the specification is expected to be published to GitHub, signaling the extension's increasing maturity and openness.

Toolchain support is also progressing rapidly. Updates for GCC, LLVM, and binutils-gdb have already been pushed upstream, ensuring that compiler and debugger ecosystems can take advantage of packed SIMD instructions. Work on C and C++ intrinsic functions is underway, which will make it easier for application developers to explicitly leverage the extension without resorting to hand-written assembly. In addition, architectural models and compliance tools such as SAIL, ACTs, and RISCOF are being prepared for public availability, alongside simulators like QEMU and Spike.
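Once the intrinsics land, DSP kernels could invoke packed operations directly from C rather than hand-written assembly. The sketch below is speculative: the feature macro `__riscv_p`, the header `<riscv_p_intrinsic.h>`, and the intrinsic `__rv_kadd16` are all assumed names, since the final intrinsic API is still being defined. A portable fallback keeps the kernel buildable everywhere:

```c
#include <stdint.h>

/* HYPOTHETICAL: the feature macro, header, and intrinsic name below
 * are assumptions -- the real C intrinsics for the Packed SIMD
 * extension are still under definition.  The fallback computes the
 * same lane-wise 16-bit saturating add in plain C. */
#if defined(__riscv_p)          /* assumed feature-test macro */
#include <riscv_p_intrinsic.h>  /* assumed header name */
#define SADD16(a, b) __rv_kadd16((a), (b))
#else
static uint32_t sadd16_fallback(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int lane = 0; lane < 2; lane++) {
        int32_t s = (int32_t)(int16_t)(a >> (16 * lane))
                  + (int32_t)(int16_t)(b >> (16 * lane));
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        r |= (uint32_t)(uint16_t)s << (16 * lane);
    }
    return r;
}
#define SADD16(a, b) sadd16_fallback((a), (b))
#endif

/* Mix two Q15 audio buffers, two samples per packed operation. */
void mix_q15(const uint32_t *x, const uint32_t *y,
             uint32_t *out, int npairs) {
    for (int i = 0; i < npairs; i++)
        out[i] = SADD16(x[i], y[i]);
}
```

The point of upstream GCC/LLVM support plus intrinsics is that a loop like `mix_q15` compiles to one packed instruction per pair of samples on hardware with the extension, while the same source still builds for cores without it.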

Bottom line: Benchmarking results presented for the Andes D23 core demonstrate substantial performance gains across a wide range of audio codecs and DSP workloads when packed SIMD is enabled, compared with configurations lacking DSP support. These results reinforce the extension’s practical value and underline why pushing the Packed SIMD extension “over the line” is a key milestone for the RISC-V ecosystem.

Also Read:

RISC-V: Powering the Era of Intelligent General Computing

Navigating SoC Tradeoffs from IP to Ecosystem

S2C, MachineWare, and Andes Introduce RISC-V Co-Emulation Solution to Accelerate Chip Development
