
In a presentation at the RISC-V Summit North America 2025, John Simpson, Senior Principal Architect at SiFive, surveyed the evolving landscape of RISC-V extensions tailored for artificial intelligence and machine learning. RISC-V’s open architecture has fueled its adoption in AI/ML markets by allowing customization and extension of core designs. However, Simpson emphasized the importance of balancing this flexibility with standardization under profiles like RVA23, fostering an open ecosystem that promotes innovation while preserving differentiation. As AI models grow exponentially (Epoch AI data shows compute demands shifting from vector workloads to massive matrix operations), accelerated matrix multiplication and broader datatype support have become critical. Different application domains call for different ISA approaches, but because only a handful of matrix-multiply routines need porting, software portability remains relatively insensitive to the choice.
Central to RISC-V’s AI capabilities is the Vector Extension (RVV), which handles the computations beyond matrix multiplies, such as activation functions like LayerNorm, Softmax, Sigmoid, and GELU. These operations, built on exponentials and normalizations, can bottleneck throughput once matrix multiplies are accelerated. For instance, prefilling Llama-3 70B with 1k tokens requires 5.12 billion exponential operations. RVV 1.0 supports integer (INT8/16/32/64) and floating-point (FP16/32/64) datatypes, with extensions like Zvfbfmin for BF16 conversions and Zvfbfwma for widening BF16 multiply-adds. Proposed additions, such as Zvfbfa for BF16 arithmetic and Zvfofp8min for OCP FP8 (E4M3/E5M2) via conversions, aim to expand support. Discussions focus on using an altfmt bit in the vtype CSR to encode the new datatypes without widening instruction encodings. Future work may cover OCP MX formats such as FP8/6/4, potentially requiring more instruction space or additional vtype bits.
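One plausible accounting for the 5.12 billion figure, assuming the exponentials come from attention softmax during prefill (one exp() per query-key pair, per head, per layer), can be sketched as follows. The mapping of the figure to softmax alone is an illustrative guess, not stated in the talk; the Llama-3 70B layer and head counts are public.

```python
# Back-of-envelope check of the exponential count quoted above, under the
# assumption that it comes entirely from attention softmax during prefill.

LAYERS = 80        # transformer layers in Llama-3 70B
HEADS = 64         # attention heads per layer
TOKENS = 1_000     # "1k tokens" of prefill

# Softmax over attention scores needs one exp() per (query, key) pair,
# per head, per layer.
exps = LAYERS * HEADS * TOKENS * TOKENS
print(f"{exps / 1e9:.2f} billion exponentials")  # 5.12 billion exponentials
```

With 80 layers and 64 heads, 80 × 64 × 1000² lands exactly on 5.12 billion, which suggests the talk's figure counts softmax exponentials over the full prefill attention matrix.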
Simpson outlined several matrix extension approaches under consideration by RISC-V task groups. The Zvbdot extension introduces vector batch dot-products with no new architectural state, leveraging the existing vector registers. Each instruction computes eight dot-products: one input vector A against the eight registers of group B (matrix columns held as registers), accumulating into group C, with a 3-bit offset selecting among up to 64 accumulated results. For VLEN=1024 with FP8 inputs and FP32 outputs, that is 1K MACs per instruction while writing only 256 bits, accelerating GEMM and GEMV with a vector-friendly, read-heavy design.
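The arithmetic described above can be made concrete with a reference-semantics sketch. The function name, argument layout, and offset behavior below are illustrative assumptions (the encoding is still under task-group discussion), but the MAC and write-bandwidth counts follow directly from the VLEN=1024/FP8 numbers in the text.

```python
import numpy as np

VLEN = 1024              # vector register width in bits
EW = 8                   # FP8 element width in bits
ELEMS = VLEN // EW       # 128 elements per vector register

def zvbdot(acc, a, b_group, offset):
    """Sketch of one batch dot-product instruction: eight dot-products of
    vector `a` against the eight registers of `b_group`, accumulated into
    eight consecutive FP32 accumulators starting at 8 * offset."""
    assert a.shape == (ELEMS,) and b_group.shape == (8, ELEMS)
    for j in range(8):
        acc[8 * offset + j] += np.dot(b_group[j], a)   # 128 MACs each
    return acc

a = np.random.rand(ELEMS).astype(np.float32)       # one A register
b = np.random.rand(8, ELEMS).astype(np.float32)    # register group B
acc = np.zeros(64, dtype=np.float32)               # 3-bit offset -> 64 results
zvbdot(acc, a, b, offset=0)   # 8 x 128 = 1024 MACs; writes 8 x 32 = 256 bits
```

The read-heavy balance is visible in the shapes: each instruction reads nine 1024-bit vectors (A plus eight of B) but writes back only 256 bits of FP32 results.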
Integrated Matrix Extensions (IME TG) reuse the vector registers as matrix tiles, adding only minimal vtype bits. They support matrix-matrix multiplies, gaining arithmetic intensity from longer vectors. Most sub-proposals require new tile load/store instructions, and Option-G is the one advancing. The write bandwidth demanded by result C may require register renaming inside the matrix unit, transparent to software.
Vector-Matrix Extensions (VME TG) add large matrix accumulator state for C, divided into tiles, while sourcing A and B from RVV vector registers. Outer-product multiplies accumulate into C, with potential “fat” accumulator support for narrower inputs. The extension includes moves between C and vectors or memory, and achieves high throughput by placing the accumulators next to the arithmetic units.
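The outer-product accumulation pattern above can be sketched in a few lines. The tile sizes here are illustrative, not mandated by the VME proposal; the point is that each step consumes one column of A and one row of B from vector registers and performs M×N MACs into the resident accumulator C.

```python
import numpy as np

# Illustrative tile shape: C[M,N] accumulated from A[M,K] @ B[K,N]
M, N, K = 16, 16, 64
A = np.random.rand(M, K).astype(np.float32)
B = np.random.rand(K, N).astype(np.float32)

C = np.zeros((M, N), dtype=np.float32)   # accumulator state near the ALUs
for k in range(K):
    # One outer-product step: two vector reads (a column of A, a row of B)
    # yield M * N multiply-accumulates into C, which never leaves the unit.
    C += np.outer(A[:, k], B[k, :])

assert np.allclose(C, A @ B, rtol=1e-4)
```

The design choice this illustrates: because C stays in dedicated accumulator state across all K steps, only the low-bandwidth A and B operands flow through the vector registers, which is what makes the high MAC throughput sustainable.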
Attached Matrix Extensions (AME TG) introduce separate state for A, B, and C, performing matrix-matrix multiplies independently of RVV. If RVV is absent, new vector operations on the matrix state are needed; otherwise, integration with RVV is preferred. With dedicated load/store paths, AME offers the largest design space and the highest peak performance, though no consensus proposal exists yet.
Performance varies by approach: Zvbdot suits the LLM decode phase at batch=1, accelerating GEMV. IME fits edge devices that prioritize area and power. VME balances vector sourcing against high MAC counts, while AME maximizes MACs at the cost of more resources. For LLMs, larger batches improve compute efficiency but strain KV-cache bandwidth.
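The batch-size trade-off above can be quantified with a simplified arithmetic-intensity model. The model below counts MACs per byte of weight traffic only, ignoring activations and the KV cache; the layer dimensions are illustrative, not from the talk.

```python
# Simplified model: C[batch,N] += X[batch,K] @ W[K,N] with 1-byte (FP8)
# weights, each weight read from memory once per pass.
def macs_per_weight_byte(batch, K, N):
    macs = batch * K * N          # total multiply-accumulates
    weight_bytes = K * N          # FP8: one byte per weight
    return macs / weight_bytes    # arithmetic intensity w.r.t. weights

for batch in (1, 8, 64):
    print(batch, macs_per_weight_byte(batch, K=8192, N=8192))
# batch=1 (GEMV, decode): 1 MAC per weight byte -- memory-bandwidth bound,
# which is why Zvbdot targets this case. Larger batches amortize weight
# traffic linearly, but each extra in-flight sequence adds KV-cache traffic.
```

This is why the matrix extensions diverge: a decode-oriented design needs efficient low-intensity GEMV, while prefill and training workloads at large batch reward the high peak MACs of VME/AME-style accumulators.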
Bottom line: These extensions position RISC-V as a versatile AI platform, evolving to meet diverse needs from edge to hyperscale. SiFive’s insights highlight ongoing standardization efforts to ensure scalability and ecosystem growth.