A key application of embedded FPGA (eFPGA) technology is implementing specific algorithms in hardware. Because the throughput of such an implementation exceeds that of equivalent code executing on a processor core, these SoC blocks are often referred to as accelerators. The programmability of eFPGA technology gives the SoC designer additional flexibility, allowing algorithm optimizations, or even complete algorithm substitutions, to be deployed in the field.
I recently had the opportunity to chat with Tony Kozaczuk, Director of Solutions Architecture at Flex Logix Technologies, about a new application note that Flex Logix has written to illustrate how eFPGA technology is ideally suited to accelerator designs. I was able to review a pre-release version of the app note; it was enlightening to see the diversity of accelerators, as well as the various implementation tradeoffs available for meeting latency and throughput targets.
The accelerator examples in the app note use the interface protocols of the AMBA architecture. The AMBA specification has evolved to cover a broad range of data transfer bandwidth requirements, for both burst and single transfers, on system and peripheral buses, as summarized in the figure below.
The app note illustrates how the eFPGA block can be readily integrated with these AMBA bus definitions, including both the AXI/AHB master and slave protocols, and how it can communicate over the lower-bandwidth APB bus through an AXI2APB bridge, as illustrated below.
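On the APB side of such a bridge, every register access is a two-phase transfer: a setup cycle with PSEL high and PENABLE low, followed by one or more access cycles with PENABLE high until the completer asserts PREADY. The Python sketch below models that sequence for writing configuration words to an accelerator. It is a behavioral illustration only, not the Flex Logix Verilog; the register base address, the 32-bit data width, and the idea of loading a 128-bit value (for example, a cipher key) as four word writes are hypothetical choices.

```python
# Hypothetical register map: a 128-bit configuration value (e.g. a cipher key)
# stored in four consecutive 32-bit registers. Not taken from the app note.
CFG_BASE = 0x10        # assumed byte address of the first register
APB_DATA_WIDTH = 32    # assumed APB data bus width in bits


def apb_write_cycles(addr, data, wait_states=0):
    """Return per-cycle APB signal snapshots for one write transfer.

    Setup phase: PSEL=1, PENABLE=0 (PREADY is not sampled here).
    Access phase: PSEL=1, PENABLE=1, repeated until the completer drives PREADY=1.
    """
    cycles = [dict(PSEL=1, PENABLE=0, PWRITE=1, PADDR=addr, PWDATA=data, PREADY=None)]
    for i in range(wait_states + 1):
        ready = 1 if i == wait_states else 0  # completer inserts wait states, then completes
        cycles.append(dict(PSEL=1, PENABLE=1, PWRITE=1, PADDR=addr, PWDATA=data, PREADY=ready))
    return cycles


def value_to_apb_writes(value, words=4):
    """Split a wide value into (address, word) pairs, least significant word first."""
    mask = (1 << APB_DATA_WIDTH) - 1
    return [(CFG_BASE + 4 * i, (value >> (APB_DATA_WIDTH * i)) & mask) for i in range(words)]


key = 0x000102030405060708090A0B0C0D0E0F
writes = value_to_apb_writes(key)
total_cycles = sum(len(apb_write_cycles(addr, word)) for addr, word in writes)
```

With zero wait states each write takes two bus cycles, so loading the 128-bit value costs eight APB cycles. That is acceptable for infrequent configuration, which is why bulk data would normally travel over AXI instead.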
Tony reviewed some of the performance tradeoffs of implementing the AMBA bus protocol logic inside the eFPGA block versus outside it.
Flex Logix provides all the Verilog models for attaching an accelerator to these AMBA bus options free of charge on its website; see the link at the bottom of this article.
Several unique features of the Flex Logix eFPGA technology are critical for accelerator design. The I/O signals on each EFLX array tile connect readily to adjacent tiles and, very significantly, to SRAM sub-blocks integrated within the eFPGA physical implementation, without disrupting inter-tile connectivity. The SRAM sub-blocks can be floorplanned within the overall EFLX accelerator for optimal performance; the figure below illustrates a complex example. The graphic on the left is a floorplan of a full accelerator block, composed of array tiles and embedded SRAMs. Flex Logix offers both a logic tile and a specialized DSP tile, as illustrated in the graphic on the right. (The specific accelerator examples described shortly have a simpler SRAM floorplan.)
The EFLX compiler combines the Verilog model's connectivity to the SRAMs with placement configuration information to assemble the full design. The app note includes EFLX code examples for integrating SRAM blocks, a crucial requirement for high-performance accelerators. It also describes how to manage the synchronization of data inputs to the accelerator.
The accelerator examples that Tony briefly reviewed were very informative, and the app note contains more. The AES encryption implementation uses the AXI4-Stream protocol, with the master/slave protocol logic included within the eFPGA array Verilog model.
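In AXI4-Stream, a data beat transfers on a clock edge only when the source asserts TVALID and the sink asserts TREADY in the same cycle; once the source raises TVALID, it must hold the data stable until that happens. The Python sketch below models this handshake cycle by cycle. It is a behavioral illustration rather than the Flex Logix Verilog, and it assumes the whole list of beats forms a single packet, so TLAST marks only the final beat. The sink's TREADY pattern is arbitrary test input.

```python
def simulate_axis(beats, tready_per_cycle):
    """Cycle-by-cycle model of an AXI4-Stream TVALID/TREADY handshake.

    beats: payloads the source wants to send (e.g. 128-bit AES blocks),
           treated here as one packet.
    tready_per_cycle: TREADY value driven by the sink in each clock cycle.
    Returns (cycle, tdata, tlast) for every completed transfer.
    """
    transfers = []
    index = 0  # next beat the source presents on TDATA
    for cycle, tready in enumerate(tready_per_cycle):
        tvalid = index < len(beats)  # source asserts TVALID whenever it has data
        if tvalid and tready:
            # A transfer completes only when TVALID and TREADY are both high.
            tlast = index == len(beats) - 1
            transfers.append((cycle, beats[index], tlast))
            index += 1
        # Otherwise the source keeps the same beat on TDATA and waits.
    return transfers


print(simulate_axis([0xA0, 0xB1, 0xC2], [True, False, True, False, True]))
# prints [(0, 160, False), (2, 177, False), (4, 194, True)]
```

Because the sink deasserts TREADY in cycles 1 and 3, the three beats take five cycles; this backpressure is how a slower downstream stage throttles the accelerator without losing data.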
The figure above shows architecture options for an accelerator implementation. Note that information such as the encryption key could be provided directly as part of the eFPGA programming, or optionally sent separately from a processor core over the APB interface. The throughput of the AES implementation, compiled from Verilog source by the EFLX compiler for TSMC 16FFC technology, is illustrated below and compared with the same algorithm executing as program code on a Cortex-M4 core.
Two EFLX array performance results are quoted: one at the Cortex-M4's published clock frequency, and one at the frequency achievable in the 16FFC physical implementation.
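Quoting two results separates the architectural gain from the clock-rate gain. For a core that completes one block every fixed number of cycles, throughput is simply block size times clock frequency divided by cycles per block, so the two figures differ by the ratio of the two clock frequencies. The sketch below shows that arithmetic; the cycles-per-block value and both frequencies are hypothetical placeholders, not numbers from the app note.

```python
def throughput_mbps(block_bits, cycles_per_block, f_clk_mhz):
    """Sustained throughput in Mbit/s of a core that accepts one block
    every `cycles_per_block` clock cycles at `f_clk_mhz`."""
    return block_bits * f_clk_mhz / cycles_per_block


# Hypothetical operating points, NOT figures from the app note.
BLOCK_BITS = 128        # AES block size in bits
CYCLES_PER_BLOCK = 10   # assumed: roughly one AES-128 round per clock
F_SHARED_MHZ = 100.0    # assumed clock shared with the processor core
F_FABRIC_MHZ = 500.0    # assumed higher clock reachable in the eFPGA fabric

at_shared = throughput_mbps(BLOCK_BITS, CYCLES_PER_BLOCK, F_SHARED_MHZ)
at_fabric = throughput_mbps(BLOCK_BITS, CYCLES_PER_BLOCK, F_FABRIC_MHZ)
```

Under these assumptions the results are 1280 and 6400 Mbit/s; the fivefold difference comes entirely from the clock ratio, while the comparison at the shared clock isolates what the hardware parallelism buys over software.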
Another accelerator example is an FFT calculation engine, as illustrated below. The figure depicts the SRAM sub-blocks integrated in this implementation and how the EFLX tile I/O connects to them. (The design uses six EFLX 2.5K LUT tiles and 18 SRAM sub-blocks.)
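The internals of the Flex Logix FFT engine are not described here, but the generic radix-2 decimation-in-time algorithm below shows why SRAM is central to such a design: the samples live in a working buffer that is reordered once and then read and rewritten in place over log2(N) butterfly stages, with twiddle factors that a hardware engine would typically precompute into a table. This is a textbook Cooley-Tukey sketch in Python, not the app note's implementation.

```python
import cmath


def fft_radix2(samples):
    """Iterative, in-place radix-2 decimation-in-time FFT.

    The input length must be a power of two.
    """
    a = [complex(x) for x in samples]  # working buffer (the role an SRAM plays in hardware)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages; each stage reads and rewrites every buffer entry once.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)  # twiddle step for this stage
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                top = a[start + k]
                bottom = a[start + k + half] * w
                a[start + k] = top + bottom          # butterfly sum
                a[start + k + half] = top - bottom   # butterfly difference
                w *= w_step
        size *= 2
    return a
```

For example, `fft_radix2([1, 1, 1, 1])` returns `[4, 0, 0, 0]` (as complex numbers, up to floating-point rounding). In hardware, each butterfly needs two operand reads and two result writes per cycle, which is one reason an FFT engine spreads its data across many SRAM banks.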
Embedded FPGA technology gives SoC architects compelling options for adding application-specific accelerators to a design, with the added flexibility of programmability compared with a hard, logic cell-based implementation. A critical feature is the ability to integrate SRAM with the accelerator as part of the compilation and physical design flows.
Flex Logix has prepared an app note describing how its eFPGA technology is a great match for accelerator designs, and it is definitely worth a read. The Verilog examples are great as well; they clearly illustrate how to attach to the various AMBA protocols. The app note and Verilog code are available here.