
At the 2025 RISC-V Summit North America, Min Hsu, Staff Compiler Engineer at SiFive, presented on enhancing tiling support within SiFive’s AI/ML software stack for the RISC-V Vector-Matrix Extension (VME). This extension aims to boost matrix multiplication efficiency, a cornerstone of AI workloads. SiFive’s VME implementation introduces a large matrix accumulator state for the result matrix C, leveraging existing RISC-V Vector (RVV) registers to supply source operands A and B. This design enables outer-product-style multiplications directly into the C accumulator, with options for “fat” k>1 support to handle narrower input datatypes. Rows or columns of C can be moved to vector registers or loaded/stored from memory, and the C state may be segmented into multiple tiles. By positioning the accumulator near arithmetic units, the matrix engine achieves high throughput, making it ideal for compute-intensive AI tasks.
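The outer-product accumulation style described above can be sketched in NumPy. This is a conceptual model only, not the VME instruction set: the function name and loop structure are illustrative, standing in for hardware that streams one A column and one B row per step into a resident C accumulator.

```python
import numpy as np

def outer_product_matmul(A, B):
    """Compute C = A @ B as a sum of rank-1 (outer-product) updates,
    mirroring how a matrix engine accumulates column-of-A times
    row-of-B products into an accumulator C that stays resident
    near the arithmetic units. Conceptual model, not the VME ISA."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i in range(k):
        # One rank-1 update per k step: outer product of
        # the i-th column of A with the i-th row of B.
        C += np.outer(A[:, i], B[i, :])
    return C
```

Each iteration touches only one column of A and one row of B while C never leaves the accumulator, which is the data-movement property that makes the outer-product formulation attractive in hardware.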
A key focus was tiled matrix multiplication, illustrated through a Python pseudocode example. The function tiled_matmul decomposes large matrices A (m x k), B (k x n), and C (m x n) into manageable tiles. Outer loops iterate over tile_m, tile_n, and tile_k dimensions, creating views of sub-matrices (e.g., lhs_tile = A[m1:m1+tile_m, k1:k1+tile_k]). Inner loops then apply register-level tiling with tile_m_v, tile_n_v, and tile_k_v, performing the core operation: dst_tile[mv:mv+tile_m_v, nv:nv+tile_n_v] += np.matmul(lhs_tile_v, rhs_tile_v). This hierarchical tiling optimizes data locality—outer tiles fit into caches, inner ones into registers—reducing memory access overhead and enhancing performance for large-scale AI models.
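A runnable reconstruction of that pseudocode might look as follows. The function and variable names (tiled_matmul, lhs_tile, tile_m_v, and so on) follow the slide's naming; the exact loop bounds and the assumption that tile sizes evenly divide the matrix dimensions are simplifications for illustration.

```python
import numpy as np

def tiled_matmul(A, B, C, tile_m, tile_n, tile_k, tile_m_v, tile_n_v, tile_k_v):
    """Two-level tiled matrix multiply: C += A @ B.

    Outer tiles (tile_m, tile_n, tile_k) target the cache hierarchy;
    inner tiles (tile_m_v, tile_n_v, tile_k_v) model register-level
    blocking. Tile sizes are assumed to divide the dimensions evenly.
    """
    m, k = A.shape
    _, n = B.shape

    # Outer (cache-level) tiling: views into the full matrices.
    for m1 in range(0, m, tile_m):
        for n1 in range(0, n, tile_n):
            for k1 in range(0, k, tile_k):
                lhs_tile = A[m1:m1 + tile_m, k1:k1 + tile_k]
                rhs_tile = B[k1:k1 + tile_k, n1:n1 + tile_n]
                dst_tile = C[m1:m1 + tile_m, n1:n1 + tile_n]

                # Inner (register-level) tiling within each outer tile.
                for mv in range(0, tile_m, tile_m_v):
                    for nv in range(0, tile_n, tile_n_v):
                        for kv in range(0, tile_k, tile_k_v):
                            lhs_tile_v = lhs_tile[mv:mv + tile_m_v, kv:kv + tile_k_v]
                            rhs_tile_v = rhs_tile[kv:kv + tile_k_v, nv:nv + tile_n_v]
                            # Accumulate into C through the view.
                            dst_tile[mv:mv + tile_m_v, nv:nv + tile_n_v] += \
                                np.matmul(lhs_tile_v, rhs_tile_v)
```

Because NumPy slices are views, the innermost `+=` writes straight through to C, matching the accumulate-in-place behavior of the hardware's C state.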
SiFive’s AI/ML software stack integrates these hardware features seamlessly, enabling end-to-end execution of high-profile models on SiFive platforms. Central to this is the Intermediate Representation Execution Environment (IREE), an open-source MLIR-based compiler and runtime optimized for SiFive microarchitectures. IREE supports diverse front-ends like PyTorch for LLMs, applying target-specific tiling policies to break down operations. It enables intra-operation parallelization, generates code via SiFive’s tuned LLVM compilers and Scalable Kernel Libraries (SKL), and mixes MLIR codegen with microkernels (ukernels) for efficiency. The runtime handles inter-operation parallelization through asynchronous execution and task scheduling, supporting both Linux and bare-metal environments.
Hsu highlighted advancements in multi-tile matrix multiplication within IREE. Previously, IREE supported only single-tile K-loops, in which sources A0 and B0 were loaded once and a single matmul accumulated into C00. The enhancements allow multi-tile K-loops, loading sources such as A0 and A1 once and distributing accumulations across multiple C tiles (e.g., C00 += A0 * B0, C10 += A1 * B0, then C01 += A0 * B1, C11 += A1 * B1). This reduces redundant loads, improving arithmetic intensity and efficiency, especially for deep neural networks where K dimensions are large.
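The multi-tile access pattern can be sketched in Python. This is a hedged illustration of the reuse idea, not IREE's generated code: the 2x2 tile arrangement, the function name, and the requirement that each dimension be an even multiple of the tile size are all assumptions made for the example.

```python
import numpy as np

def multi_tile_k_loop(A, B, tile):
    """2x2 multi-tile K-loop sketch: each loaded A-tile and B-tile is
    reused by two accumulator tiles, halving source loads per multiply
    versus a single-tile (1x1) K-loop. Assumes A is (2*tile, K),
    B is (K, 2*tile), and tile divides K evenly."""
    k = A.shape[1]
    C = np.zeros((2 * tile, 2 * tile))
    # Views naming the four accumulator tiles held in the C state.
    C00, C01 = C[:tile, :tile], C[:tile, tile:]
    C10, C11 = C[tile:, :tile], C[tile:, tile:]

    for k1 in range(0, k, tile):
        # Load each source tile once per K step...
        A0 = A[:tile, k1:k1 + tile]
        A1 = A[tile:, k1:k1 + tile]
        B0 = B[k1:k1 + tile, :tile]
        B1 = B[k1:k1 + tile, tile:]
        # ...and reuse it across two accumulator tiles each.
        C00 += A0 @ B0
        C10 += A1 @ B0
        C01 += A0 @ B1
        C11 += A1 @ B1
    return C
```

Eight tile loads feed four tile-matmuls per K step here, versus three loads per matmul in the single-tile case, which is the arithmetic-intensity gain Hsu described.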
In takeaways, Hsu emphasized that tiled matrix multiplication is essential for high-performance AI/ML applications, as it maximizes hardware utilization. IREE excels in automating and optimizing these tiling strategies. RISC-V’s VME is purpose-built for such tiled operations, delivering native performance gains. SiFive’s XM series implements VME in a compact, integrated form factor, and the team’s contributions to IREE—particularly multi-tile support—further amplify efficiency. This software-hardware synergy positions SiFive’s stack as a robust solution for AI acceleration on RISC-V, bridging custom extensions with standardized ecosystems to drive innovation in edge and datacenter AI.
Bottom line: The presentation underscores SiFive’s commitment to advancing RISC-V for AI, combining architectural extensions with sophisticated compiler tools to tackle compute bottlenecks effectively.