Partitioning for Prototypes

by Bernard Murphy on 09-19-2017 at 7:00 am

I wrote an earlier piece to make you aware of a webinar Aldec was hosting on some of their capabilities for partitioning large designs for prototyping. That webinar has now been broadcast and I have provided a link to the recorded version at the end of this piece. The webinar gets into the details of exactly how you would use the software to partition optimally; here I’ll revisit why this is important, and add something I realized along the way about the pros and cons of automatic versus guided partitioning.

Prototyping a hardware design on an FPGA platform is especially important for software development, debug and regression while the ultimate ASIC hardware is still in development, or even at an earlier stage when you’re bidding on an RFP or hoping to persuade VCs/angels that you have an investment-worthy idea. Prototypes are also the best way to test in-system behavior with external interfaces like video streams, storage interfaces and a wealth of communications options.

But you typically can’t fit an SoC into even the largest FPGA; Aldec cited as an example a multi-core graphics processor requiring 15 Xilinx UltraScale devices to fully map the design. This means you need to figure out how to split your design across those devices. The temptation may be to build your own board or set of boards, which may initially seem simpler, but you’ll quickly find that splitting the design effectively and balancing delays at those splits can be very non-trivial.

Wherever you split, signals have to cross between devices and sometimes even between boards. Those signals have to travel through device pins, board traces and possibly backplane connections, so they’re going to switch more slowly than signals within a device. What appeared to be reasonably matched delays in your RTL design can quickly become very unmatched, and in ways that can change wildly with your split strategy. In one experiment you have certain critical paths you need to manage; in the next, those paths need no special help but a new set of paths is suddenly a big problem.
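To make the effect of a split strategy concrete, here is a minimal Python sketch that counts which nets cross between devices for a given assignment of blocks to FPGAs. It is not Aldec's algorithm; the toy netlist, the block names and the `cut_nets` helper are all hypothetical, and each net is treated as a single signal needing one pin on every device it touches.

```python
from collections import defaultdict

def cut_nets(nets, assignment):
    """Return (crossing net names, per-FPGA IO demand) for a partition.

    nets: dict mapping net name -> list of block names it connects
    assignment: dict mapping block name -> FPGA id
    """
    crossing = []
    io_demand = defaultdict(int)
    for net, blocks in nets.items():
        fpgas = {assignment[b] for b in blocks}
        if len(fpgas) > 1:
            # The net spans devices, so it uses one pin on each of them.
            crossing.append(net)
            for f in fpgas:
                io_demand[f] += 1
    return sorted(crossing), dict(io_demand)

# Hypothetical toy netlist: every block talks to a central NoC.
nets = {
    "cpu_noc": ["cpu", "noc"],
    "noc_ram": ["noc", "ram"],
    "noc_uart": ["noc", "uart"],
    "noc_gpio": ["noc", "gpio"],
}

# Two candidate splits of the same design across two FPGAs.
split_a = {"cpu": 0, "noc": 0, "ram": 0, "uart": 1, "gpio": 1}
split_b = {"cpu": 0, "noc": 1, "ram": 1, "uart": 1, "gpio": 1}

crossing_a, io_a = cut_nets(nets, split_a)
crossing_b, io_b = cut_nets(nets, split_b)
print(crossing_a, io_a)
print(crossing_b, io_b)
```

In this toy case the two splits put entirely different nets on the slow inter-device path, which is the reason the set of critical paths can change from one experiment to the next.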

Clocks can be even more challenging; clock signals crossing between devices may introduce significant clock skew you didn’t anticipate. For both signal and clock timing problems, vendor design tools will help you close timing within a single FPGA device, but closing timing across the whole design is going to be your problem.

You also have to deal with IO resource limits on FPGAs. Aside from timing, you can’t arbitrarily divide up your design because in most cases that will require you to support more IO signals on a device than there are signal pins. Handling this requires some clever logic to bundle/unbundle signals to meet pin limitations; a lot more work if you’re trying to hand-craft your own mapping.
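As a rough illustration of the bundle/unbundle idea, the following Python sketch models sharing a limited number of pins among a larger number of crossing signals by sending several values per pin in successive time slots. It is a software model only, under assumed numbers; the `mux_ratio`, `bundle` and `unbundle` helpers are hypothetical and say nothing about how any particular tool implements this in hardware.

```python
import math

def mux_ratio(signals, pins):
    """Smallest number of signals that must share each pin."""
    if pins <= 0:
        raise ValueError("pins must be positive")
    return math.ceil(signals / pins)

def bundle(values, ratio):
    """Group signal values into per-pin frames of up to `ratio` time slots."""
    return [values[i:i + ratio] for i in range(0, len(values), ratio)]

def unbundle(frames):
    """Recover the original flat signal list on the receiving device."""
    return [v for frame in frames for v in frame]

# Hypothetical example: 10 crossing signals, only 4 pins available.
signals = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
pins = 4

ratio = mux_ratio(len(signals), pins)   # signals sharing each pin
frames = bundle(signals, ratio)         # one frame per pin
recovered = unbundle(frames)

print(ratio, len(frames))
```

The trade-off is that a higher ratio means each pin has to carry several values per design clock cycle, which generally limits how fast the prototype can run.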

Making all of this work on your own custom-crafted FPGA boards is not for the faint of heart. A much simpler solution is to use the Aldec HES prototyping systems (supporting up to 630M gates split across 4 boards, each with 6 FPGAs), together with HES-DVM Proto to simplify partitioning and setup. In the webinar, Aldec illustrates this with a NoC-based design hosting a CPU, RAM, and GPIO and UART interfaces, mapped onto one of their HES7XV12000BP boards.

In the demo they highlight some of HES-DVM Proto’s capabilities: clock conversion, interconnect analysis, “Try Move” options, experimenting with different inter-chip connectors, and others. This is an interactive process, so you might wonder why you can’t just push a “Go” button, sit back and wait for it all to be done automatically. That is supported in some tools, but even there, if you want to get to decent performance levels (>5 MHz), you have to get involved. I guess Aldec just skipped the easy but slow option and went right for hands-on and fast.

You can watch the Webinar HERE.