Many advanced algorithmic IPs are described in C++, a language chosen for its flexibility. Software algorithms, however, are written to be executed on processors, so a C++ description by itself doesn’t solve the problems of implementing the algorithm directly in hardware. This is not simply a high-level synthesis (HLS) issue. For a hardware implementation, a software algorithm usually needs to be transformed to operate on a streaming, sample-by-sample basis. To meet its performance targets, a monolithic software algorithm is implemented as a chain of modules operating in parallel; left in its single-threaded form, the C++ algorithm won’t meet the system constraints. For such a multi-module or multi-threaded implementation you’re going to need more architectural information, such as the macro-level block diagram and the number and type of interfaces between the modules. This type of information is best captured in a language like SystemC. But how do you get there?
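To make the streaming transformation concrete, here is a minimal C++ sketch under simplifying assumptions. It uses a hypothetical two-step algorithm: scale each sample by 2, then apply a 2-tap sum. The algorithm is written first as a monolithic, buffer-at-a-time function and then as a chain of per-sample stages. All names are illustrative and do not come from Forte’s tools. In hardware, each stage would become its own thread or module. Here the stages are simply called in sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Monolithic, software-style version: each step walks the whole buffer
// before the next step starts, so all intermediate data is held in memory.
std::vector<int> process_frame(const std::vector<int>& in) {
  std::vector<int> scaled(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) scaled[i] = 2 * in[i];
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    int prev = (i > 0) ? scaled[i - 1] : 0;  // zero history before the first sample
    out[i] = scaled[i] + prev;               // 2-tap sum filter
  }
  return out;
}

// Streaming version: the same algorithm split into stages that each take
// one sample in and produce one sample out, keeping only the state they need.
struct ScaleStage {
  int operator()(int x) const { return 2 * x; }
};

struct SumStage {
  int prev = 0;  // one sample of history replaces the full intermediate buffer
  int operator()(int x) {
    int y = x + prev;
    prev = x;
    return y;
  }
};

std::vector<int> process_stream(const std::vector<int>& in) {
  ScaleStage scale;
  SumStage sum;
  std::vector<int> out;
  out.reserve(in.size());
  for (int sample : in) out.push_back(sum(scale(sample)));  // sample-by-sample chain
  assert(out.size() == in.size());  // each stage emits one output per input
  return out;
}
```

Both functions compute the same result. The streaming form needs only one sample of internal state in `SumStage`, which is what makes it a candidate for splitting into concurrently running hardware blocks.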
Making transformations like these requires familiarity with both the hardware and the algorithms. One part that can be automated is the insertion of the communication channels that interface the threads or modules. This involves not just the synchronization mechanism but also storage and buffering, since the streams and modules operate independently. Communication between blocks is a necessary part of the design, but it may not add any unique value.
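A channel combines synchronization with storage. A minimal way to illustrate this in software is a bounded blocking FIFO connecting two threads. The sketch below uses plain standard C++ rather than SystemC. The class `BoundedFifo` and the function `run_two_stage` are hypothetical names for illustration only, not generated code from any tool. The deque supplies the buffering. The mutex and condition variable make the writer stall when the FIFO is full and the reader stall when it is empty.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical bounded FIFO channel: the deque provides storage, and the
// mutex plus condition variable synchronize two independently running threads.
template <typename T>
class BoundedFifo {
 public:
  explicit BoundedFifo(std::size_t depth) : depth_(depth) { assert(depth > 0); }

  // Blocks the writer while the FIFO is full (back-pressure).
  void put(const T& v) {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return buf_.size() < depth_; });
    buf_.push_back(v);
    cv_.notify_all();
  }

  // Blocks the reader while the FIFO is empty.
  T get() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !buf_.empty(); });
    T v = buf_.front();
    buf_.pop_front();
    cv_.notify_all();
    return v;
  }

 private:
  std::size_t depth_;
  std::deque<T> buf_;
  std::mutex m_;
  std::condition_variable cv_;
};

// Two "modules" running as threads: the producer streams n samples into the
// channel; the consumer reads them and applies its own processing step.
std::vector<int> run_two_stage(int n, std::size_t depth) {
  BoundedFifo<int> ch(depth);
  std::vector<int> results;
  std::thread producer([&] {
    for (int i = 0; i < n; ++i) ch.put(i);
  });
  std::thread consumer([&] {
    for (int i = 0; i < n; ++i) results.push_back(ch.get() * 10);
  });
  producer.join();
  consumer.join();
  return results;
}
```

The FIFO depth changes only how far the producer can run ahead of the consumer. It does not change the data that arrives, so `run_two_stage(n, 1)` and `run_two_stage(n, 16)` return identical results.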
Forte has created an application within its Cynthesizer Workbench, called Interface Generator, that automatically generates the mechanisms for efficiently managing data transfer between multiple threads and modules. The important decisions that require algorithmic understanding, such as which type of channel to use between two modules, are left to the designer. The designer can choose among many kinds of channels by specifying the data type, the storage capacity, how the data is synchronized, and so on.
With this interface generation approach, the custom SystemC channels are added to the design library. The designer transfers data between streams or modules by calling a set of functions implemented in the channels. These standardized function calls, created by the Interface Generator, give the designer a new layer of abstraction. The storage and synchronization protocols are encapsulated inside the channel, which reduces the errors inherent in writing complex interface code by hand (see this video for an example). This also gives the designer the flexibility to try different channel types to find the one that best meets the target specification, without writing low-level RTL protocols for each attempt.
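The benefit of standardized channel calls can be sketched in plain C++. This is not SystemC, and it is not the code that Forte’s Interface Generator emits, whose API isn’t shown in this post. The interface `ChannelIf` and its `put()`/`get()` methods are hypothetical. The module code is written once against that interface, and the channel type is chosen when the channel is instantiated. `FifoChannel` is a buffered FIFO of configurable depth. `HandshakeChannel` has no buffering, so the writer waits until the reader has taken each value.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical channel interface: modules only call put() and get().
template <typename T>
struct ChannelIf {
  virtual ~ChannelIf() = default;
  virtual void put(const T& v) = 0;
  virtual T get() = 0;
};

// Buffered channel: up to `depth` items in flight; the writer stalls only when full.
template <typename T>
class FifoChannel : public ChannelIf<T> {
 public:
  explicit FifoChannel(std::size_t depth) : depth_(depth) { assert(depth > 0); }
  void put(const T& v) override {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return buf_.size() < depth_; });
    buf_.push_back(v);
    cv_.notify_all();
  }
  T get() override {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !buf_.empty(); });
    T v = buf_.front();
    buf_.pop_front();
    cv_.notify_all();
    return v;
  }
 private:
  std::size_t depth_;
  std::deque<T> buf_;
  std::mutex m_;
  std::condition_variable cv_;
};

// Unbuffered handshake: put() returns only after the reader has taken the
// value. Assumes exactly one writer and one reader.
template <typename T>
class HandshakeChannel : public ChannelIf<T> {
 public:
  void put(const T& v) override {
    std::unique_lock<std::mutex> lk(m_);
    slot_ = v;
    full_ = true;
    cv_.notify_all();
    cv_.wait(lk, [this] { return !full_; });  // wait for the reader's acknowledge
  }
  T get() override {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return full_; });
    T v = slot_;
    full_ = false;
    cv_.notify_all();
    return v;
  }
 private:
  T slot_{};
  bool full_ = false;
  std::mutex m_;
  std::condition_variable cv_;
};

// Module code written once against the interface.
void source_module(ChannelIf<int>& out, int n) {
  for (int i = 0; i < n; ++i) out.put(i);
}

std::vector<int> sink_module(ChannelIf<int>& in, int n) {
  std::vector<int> got;
  for (int i = 0; i < n; ++i) got.push_back(in.get() + 1);
  return got;
}

// Runs the same two modules over whichever channel the caller selects.
std::vector<int> run(ChannelIf<int>& ch, int n) {
  std::vector<int> got;
  std::thread src([&] { source_module(ch, n); });
  std::thread dst([&] { got = sink_module(ch, n); });
  src.join();
  dst.join();
  return got;
}
```

Replacing `FifoChannel<int> fifo(8)` with `HandshakeChannel<int> hs` leaves `source_module` and `sink_module` untouched. Only the stall behavior and throughput change, which is the kind of exploration the article describes.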
It seems to me that this approach would be useful in a huge number of applications. For example, many video vendors have C or C++ implementations of picture-improvement algorithms, and these algorithms are implemented under a different specification each time they are used. To meet the various constraints placed on the picture-improvement module, such as area, performance (e.g., frames per second), and power, designers will explore different ways to parallelize the design. How the threads or modules are connected affects both the correctness of the design and its performance. Using the Interface Generator, the designer can easily experiment with multiple channel types to see which ones meet the overall design specification.
Another situation where this problem may come up is when an algorithm is deemed too large and needs to be broken into multiple modules. This could be due to a chip’s floorplan constraints, or to make the design easier to verify. It could also save cost by splitting an algorithm across two or more less expensive FPGAs instead of one large FPGA, or allow the work to be divided among several designers working in parallel. The value of the Interface Generator is clear here: errors are reduced, and several different interface approaches can be tried to meet the design objectives. A video showing usage of the Interface Generator in the design of an edge detection filter can be found here.
Bottom line: Designers who need to take an untimed C++ design and implement it as a multi-threaded or multi-module hardware design can benefit from the automatic creation of communication channels.
For more information see the Forte website here.
Forte is an ‘I LOVE DAC’ sponsor. To get your free DAC badge, or to sign up for a Forte demo at DAC, click here.