August 30, 2023 · January 17, 2025 by Daniel Payne

RISC-V 64 bit IP for High Performance

by Daniel Payne on 08-30-2023 at 10:00 am
Categories: IP, Semidynamics

RISC-V as an Instruction Set Architecture (ISA) has grown quickly in commercial importance and relevance since its release to the open community in 2015, attracting many IP vendors that now provide a variety of RTL cores. Roger Espasa, CEO and Founder of Semidynamics, has presented at RISC-V events on how their IP is customized for compute challenges that require high bandwidth and high performance cores with vector units. Semidynamics was founded in 2016, is headquartered in Barcelona, and already has customers in the US and Asia. It offers two customizable RISC-V IPs:

Avispado – in-order RISCV64GCV, supporting AXI and CHI
Atrevido – out-of-order RISCV64GC, supporting AXI and CHI

A typical CPU has a handful of big cores and large caches, which makes it easy to program, though it does not deliver high performance on parallel code.

GPUs, by contrast, have many tiny cores that provide high performance for parallel code, but are harder to program and add communication latency through the PCIe bus when data needs to be passed back and forth between the CPU and the GPU.

The approach at Semidynamics is to connect a RISC-V core to compute cores, which makes the design easy to program, delivers higher performance for parallel code, and offers zero communication latency. A CPU plus a vector unit provides the best of both worlds.

Figure: CPU plus Vector unit

The RISC-V vector specification defines 32 vector registers. Inside the vector unit you can add a number of vector cores, along with a connection to your cache.

With Semidynamics IP you can customize the number of Vector Cores: 4, 8, 16, or 32. Another way to look at this is that 4 Vector Cores give a 256-bit vector unit, scaling up to 32 Vector Cores at 2,048-bit.

IP users also choose which data types to support: FP64, FP32, FP16, BF16, INT64, INT32, INT16, INT8. For an AI application they may choose FP16 and BF16, while an HPC application could select FP64 and FP32.

The third customization is the Vector Register Length, where for more performance and lower power you can make the vector register bigger than the vector unit.
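
The three customizations above can be sketched as a small calculation. This is only an illustration, not Semidynamics' configuration interface: the 64-bit slice per vector core is inferred from the figures in the text (4 cores = 256-bit, 32 cores = 2,048-bit), and the function names and the "passes" model for a register longer than the unit are hypothetical.

```python
# Assumption: each vector core contributes a 64-bit slice of the datapath,
# inferred from "4 Vector Cores is 256-bit" and "32 Vector Cores is 2,048-bit".
BITS_PER_VECTOR_CORE = 64

# Element widths in bits for the data types listed in the article.
ELEMENT_BITS = {
    "FP64": 64, "FP32": 32, "FP16": 16, "BF16": 16,
    "INT64": 64, "INT32": 32, "INT16": 16, "INT8": 8,
}


def datapath_bits(vector_cores: int) -> int:
    """Width of the vector unit for a given number of vector cores."""
    if vector_cores not in (4, 8, 16, 32):
        raise ValueError("the article lists 4, 8, 16 or 32 vector cores")
    return vector_cores * BITS_PER_VECTOR_CORE


def elements_per_pass(vector_cores: int, dtype: str) -> int:
    """How many elements of `dtype` fit across the vector unit at once."""
    return datapath_bits(vector_cores) // ELEMENT_BITS[dtype]


def passes_per_register(vector_cores: int, register_bits: int) -> int:
    """Hypothetical model: passes needed to process one full vector register
    when the register is longer than the vector unit is wide."""
    width = datapath_bits(vector_cores)
    if register_bits % width != 0:
        raise ValueError("register length must be a multiple of the unit width")
    return register_bits // width


if __name__ == "__main__":
    for cores in (4, 8, 16, 32):
        print(cores, "cores:", datapath_bits(cores), "bits,",
              elements_per_pass(cores, "FP16"), "FP16 elements per pass")
    print("passes for a 2,048-bit register on 8 cores:",
          passes_per_register(8, 2048))
```

Under these assumptions, an 8-core unit is 512 bits wide, so a 2,048-bit vector register would be processed in four passes per instruction. How the hardware actually sequences a longer register is not described in the article.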

Here’s the block diagram of the Atrevido 423-V8:

Figure: Atrevido 423 + V8 Vector Unit

The vector unit is fully out of order, which is unique among RISC-V IP vendors. Together, the vector unit and the Gazzillion unit are capable of streaming data at over 60 bytes/cycle.

Figure: High Bandwidth, Vector + Gazzillion (bytes/cycle)

The purple line shows read performance, which is 20-60 bytes/cycle in the L1 cache. Other machines show a rapid drop in bandwidth after leaving the L1 cache, while this approach keeps going, flattening at 56 bytes/cycle. Even going out to DDR memory shows a bandwidth of 40 bytes/cycle. With a clock rate of 1.0 GHz, that makes 40 GB/s of bandwidth.
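
The conversion from bytes per cycle to GB/s is simple arithmetic, sketched here for clarity. The function name is hypothetical, and GB is taken as 10^9 bytes.

```python
def bandwidth_gb_per_s(bytes_per_cycle: float, clock_hz: float) -> float:
    """Sustained bandwidth in GB/s (1 GB = 1e9 bytes) at a given clock rate."""
    return bytes_per_cycle * clock_hz / 1e9


if __name__ == "__main__":
    clock_hz = 1.0e9  # the 1.0 GHz clock rate quoted in the article
    for label, bpc in (("DDR", 40), ("beyond L1, plateau", 56)):
        print(f"{label}: {bandwidth_gb_per_s(bpc, clock_hz):.1f} GB/s")
```

At 1.0 GHz the numbers are the same in both units, so the DDR figure of 40 bytes/cycle corresponds to 40 GB/s. A different clock rate would scale the result linearly.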

IP customers can even add their own RTL code connected to the Vector Unit for their own purposes.

Performance of matrix multiplication is important in AI workloads. On the OOO V8 Vector Unit there’s a peak of 16 FP64 FLOPS/cycle, with 99% of peak reached for a matrix size >= 400. For a small matrix size of 24×24 the performance is about 7 FP64 FLOPS/cycle, or roughly half of peak. Matrix multiplication for FP16 using a Vector Unit with 8 vector cores has a peak of 64 FP16 FLOPS/cycle, and reaches 99% of peak for M >= 600.
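
The quoted peak figures are consistent with a simple model, sketched below. Two assumptions here are not stated in the article: each vector core is a 64-bit slice (inferred from the 256-bit and 2,048-bit figures given earlier), and each lane completes one fused multiply-add per cycle, counted as 2 FLOPS. The function names are hypothetical.

```python
BITS_PER_VECTOR_CORE = 64  # assumption, inferred from 4 cores = 256-bit
FLOPS_PER_FMA = 2          # assumption: one multiply plus one add per lane per cycle


def peak_flops_per_cycle(vector_cores: int, element_bits: int) -> int:
    """Peak FLOPS/cycle under the lane-count and FMA assumptions above."""
    lanes = vector_cores * BITS_PER_VECTOR_CORE // element_bits
    return lanes * FLOPS_PER_FMA


def fraction_of_peak(achieved: float, peak: float) -> float:
    """Achieved throughput as a fraction of peak."""
    return achieved / peak


if __name__ == "__main__":
    fp64_peak = peak_flops_per_cycle(8, 64)
    fp16_peak = peak_flops_per_cycle(8, 16)
    print("FP64 peak:", fp64_peak, "FLOPS/cycle")
    print("FP16 peak:", fp16_peak, "FLOPS/cycle")
    print("24x24 FP64 fraction of peak:", fraction_of_peak(7, fp64_peak))
```

With 8 vector cores this model gives 16 FP64 and 64 FP16 FLOPS/cycle, matching the peaks reported above. It is a back-of-the-envelope check, not a description of the actual pipeline.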

A real-time object detection benchmark called YOLO (You Only Look Once) was run on the Atrevido 423-V8 platform, and it showed 58% higher performance per vector core than competitors. These results were for video, with a network of 24 layers, 5.56 Gops/frame and about 9M parameters.

Summary

Choosing a RISC-V IP vendor is a complicated task, so knowing about vendors like Semidynamics can help you better understand how a customized approach could most efficiently run your specific workloads. With Semidynamics you get to pick among architectural options like in-order or out-of-order, with or without vector units. The reported numbers from this IP vendor look promising, and I look forward to their future announcements.
