Acceleration in a Heterogeneous Compute Environment
by Bernard Murphy on 10-02-2019 at 5:00 am

Heterogeneous compute isn’t a new concept. We’ve had it in phones and datacenters for quite a while – CPUs complemented by GPUs, DSPs and perhaps other specialized processors. But each of these compute engines has a very specific role, each driven by its own software (or training, in the case of AI accelerators). You write software for the CPU, you write different software for the GPU, and so on. Which makes sense, but it’s not general-purpose acceleration for a unified code-set. Could there be an equivalent in heterogeneous compute to the multi-threading we use every day in multi-core compute?
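To make the contrast concrete, here is the kind of single-source multi-core acceleration that analogy refers to, sketched with an OpenMP pragma (my choice of mechanism, nothing to do with CacheQ): one code base, one directive, and the compiler spreads the loop across cores.

#include <stddef.h>

/* One code base accelerated across CPU cores with a single directive
 * (compile with -fopenmp). The question the article poses: could a
 * heterogeneous CPU+FPGA target ever be driven this transparently? */
void scale_add(float *y, const float *x, float a, size_t n)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}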

Of course we need to think outside the box on how this might work; you can’t just drop general code on a mixed architecture and expect acceleration, any more than you can drop general code on a multi-core system with similar expectations. But given sufficient imagination, it appears the answer is yes. I recently came across a company called CacheQ which claims to provide a compelling answer in this space.

The company was founded just last year and is headed by a couple of senior FPGA guys. Clay Johnson (CEO) was VP of a Xilinx BU for a long time before going on to lead security ventures, and Dave Bennett (CTO) has a similar background, leading software development at Xilinx for many years before joining Clay, first in the security business and now at CacheQ. Funding comes from Social Capital (amount not disclosed).

Given their background, it’s not surprising they turned first to FPGAs as a resource for acceleration. FPGAs are becoming more common as resources in datacenters (just look at Microsoft Azure) and in a lot of edge applications, I’m guessing for their flexibility and easy field updates, and as an appealing option for relatively low-volume applications. They are also starting to appear as embedded IP inside SoCs.

Back to the goal. CacheQ’s objective is to let software developers start with C code (or object code) and significantly accelerate it by leveraging a combination of CPU and FPGA resources, while speeding and simplifying implementation and partitioning between processor and FPGA. At this point I started to wonder if this was some kind of high-level synthesis (HLS) play. It isn’t (the object-code option is perhaps a hint). They position their product as an ultravisor (think amped-up hypervisor). They build a virtual machine around an application, which then goes through an optimization and partitioning phase, then into code generation and mapping across any of several possible targets: x86 servers or desktops, FPGA accelerators, embedded Arm devices or heterogeneous SoCs.
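The article doesn’t show CacheQ’s interface, but to make the starting point concrete, here is a hypothetical example of the kind of ordinary, compute-bound C a developer would hand to such a flow; nothing below is CacheQ-specific.

#include <stddef.h>

/* Hypothetical input to a CPU+FPGA partitioning flow: plain C, no
 * pragmas, no vendor API. The tool, not the developer, would decide
 * whether this FIR kernel stays on the CPU or maps to FPGA fabric. */
void fir(float *out, const float *in, const float *coef,
         size_t n, size_t taps)
{
    for (size_t i = 0; i + taps <= n; i++) {
        float acc = 0.0f;
        for (size_t t = 0; t < taps; t++)
            acc += coef[t] * in[i + t];
        out[i] = acc;
    }
}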

Still, doesn’t mapping onto FPGAs require going through RTL, with all its concomitant challenges? Here the company provides some detail, though they are understandably cagey about providing too much. Here is what I can tell you: part of what they’re doing is unrolling complex loops in the code and mapping these, with pipelining, into the FPGA. They also automatically create the stack that manages CPU-to-FPGA communication, and they manage memory allocation transparently across these domains.
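To illustrate the loop-unrolling idea (purely my illustration; CacheQ derives this kind of structure automatically and targets FPGA fabric rather than C), here is a dot product unrolled by four. On an FPGA, the four independent partial sums would become parallel multiply-accumulate units feeding a pipelined reduction.

#include <stddef.h>

/* Illustration of unrolling: four independent accumulators expose
 * parallelism that maps naturally onto pipelined FPGA MAC units. */
float dot(const float *a, const float *b, size_t n)
{
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {   /* unrolled by 4 */
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float s = s0 + s1 + s2 + s3;
    for (; i < n; i++)                   /* remainder loop */
        s += a[i] * b[i];
    return s;
}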

The “aha” here is that they’re providing a way for software developers to get acceleration, not a way for hardware developers to build a design. This is a quite different intent from HLS, and a goal many have been chasing for a while. They don’t have to map everything to the FPGA; they just have to deliver a significant net speedup in critical pieces of code. They show some impressive numbers for key functions on their website.

I asked about active applications today. Clay mentioned use in weather simulation and in industrial and government applications. I also asked about support for other potential accelerators (GPU, DSP, …). He said these are in long-term planning; each can offer acceleration in its own way, I would guess for big matrix operations, for example.

This looks like an interesting attack on the long-standing problem of making FPGAs (and ultimately other platforms) more accessible to the general-purpose programmer. Worth a closer look. The website is HERE.
