Acceleration in a Heterogeneous Compute Environment
by Bernard Murphy on 10-02-2019 at 5:00 am

Heterogeneous compute isn’t a new concept. We’ve had it in phones and datacenters for quite a while – CPUs complemented by GPUs, DSPs and perhaps other specialized processors. But each of these compute engines has a very specific role, each driven by its own software (or training, in the case of AI accelerators). You write software for the CPU, you write different software for the GPU, and so on. That makes sense, but it isn’t general-purpose acceleration for a unified code-set. Could there be an equivalent in heterogeneous compute to the multi-threading we use every day in multi-core compute?
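
As a concrete baseline (my sketch, not anything from CacheQ), this is what that everyday multi-core threading looks like in C with OpenMP: one pragma spreads a loop across CPU cores. The question above is whether a CPU-plus-FPGA mix could ever be driven this transparently from a single code-set.

```c
#include <stddef.h>

/* One pragma parallelizes the loop across all CPU cores
 * (build with: cc -fopenmp saxpy.c). */
void saxpy(float *y, const float *x, float a, size_t n)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```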

Of course we need to think outside the box about how this might work; you can’t just drop general code on a mixed architecture and expect acceleration, any more than you can drop general code on a multi-core system with similar expectations. But given sufficient imagination, it appears the answer is yes. I recently came across a company called CacheQ which claims to provide a compelling solution in this space.

The company was founded just last year and is headed by a couple of senior FPGA guys. Clay Johnson (CEO) was VP of a Xilinx business unit for a long time before going on to lead security ventures, and Dave Bennett (CTO) has a similar background, leading software development at Xilinx for many years before joining Clay first in the security business and now at CacheQ. Funding comes from Social Capital (amount not disclosed).

Given their background, it’s not surprising they turned first to FPGAs as a resource for acceleration. FPGAs are becoming more common in datacenters (just look at Microsoft Azure) and in a lot of edge applications, I’m guessing for their flexibility and easy field update, and as an appealing option for relatively low-volume applications. They are also starting to appear as embedded IP inside SoCs.

Back to the goal. CacheQ’s objective is to let software developers start with C-code (or object code) and significantly accelerate it across a combination of CPU and FPGA resources, while speeding and simplifying implementation and partitioning between processor and FPGA. At this point I started to wonder if this was some kind of high-level synthesis (HLS) play. It isn’t (the object-code option is perhaps a hint). They position their product as an ultravisor (think amped-up hypervisor). They build a virtual machine around an application, which then goes through an optimization and partitioning phase, then into code generation and mapping across any of several possible targets: x86 servers or desktops, FPGA accelerators, embedded Arm devices or heterogeneous SoCs.
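
To make that concrete, here is a hypothetical example (mine, not CacheQ’s) of the kind of C a developer might feed into such a flow: a compute-bound FIR filter whose inner loop is a natural candidate to move to an FPGA while the surrounding program stays on the CPU.

```c
#include <stddef.h>

/* Hypothetical acceleration candidate: a dense FIR filter.
 * The inner multiply-accumulate loop has a regular, pipelinable
 * structure, while setup and I/O would remain on the CPU. */
void fir(float *out, const float *in, const float *coeff,
         size_t n, size_t taps)
{
    for (size_t i = 0; i + taps <= n; i++) {
        float acc = 0.0f;
        for (size_t t = 0; t < taps; t++)
            acc += coeff[t] * in[i + t];
        out[i] = acc;
    }
}
```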

Still, doesn’t mapping onto FPGAs require going through RTL, with all its concomitant challenges? Here the company provides some detail, though they are understandably cagey about revealing too much. So here is what I can tell you: part of what they’re doing is unrolling complex loops in the code and mapping these, with pipelining, into the FPGA. They also automatically create the stack that manages CPU-to-FPGA communication, and they manage memory allocation transparently across these domains.
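
As an illustration of the loop transformation they describe (a hand-written sketch of what such a tool would do automatically, not CacheQ’s actual output), here is a simple accumulation loop unrolled four ways. The independent partial sums have no cross-iteration dependency, which is exactly what maps naturally onto pipelined FPGA logic.

```c
#include <stddef.h>

/* Hand-unrolled accumulation: the four partial sums are
 * independent, so they can be evaluated in parallel, pipelined
 * hardware. (Note this reassociates floating-point adds.) */
float sum4(const float *x, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    float s = s0 + s1 + s2 + s3;
    for (; i < n; i++)   /* remainder iterations */
        s += x[i];
    return s;
}
```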

The “aha” here is that they’re providing a way for software developers to get acceleration, not a way for hardware developers to build a design. This is quite a different intent from HLS, and a goal many have been chasing for a while. They don’t have to map everything to the FPGA; they just have to provide significant net speedup in critical pieces of code. They show some impressive numbers for key functions on their website.

I asked about active applications today. Clay mentioned use in weather simulation and in industrial and government applications. I also asked about support for other potential accelerators (GPU, DSP, …). He said these are in long-term planning; each can offer acceleration in its own way, for big matrix operations for example, I would guess.

This looks like an interesting approach to the long-standing problem of making FPGAs (and ultimately other platforms) more accessible to the general-purpose programmer. Worth a closer look. The website is HERE.
