In life, there are some things that just go together. Imagine the world without peanut butter and jelly, eggs and potatoes, telephones and voicemail, or the internet and search engines. In the world of computing there are many such examples – UARTs and FIFOs, processor cores and GPUs, etc. Another trait all these things have in common is inevitability – they were going to happen sooner or later. In hindsight they seem obvious. At the Linley Processor Conference just held in Santa Clara, Achronix presented something new that has “meant to go together” written all over it.
CPUs are flexible, but relatively slow compared to ASICs. ASICs, however, lack flexibility – they are built to perform one task. FPGAs have resided in a sweet spot between the two, with many of the advantages of each but a number of drawbacks: they aren’t as fast as ASICs, nor as flexible as CPUs. For the purposes of this discussion, I am putting GPUs in the same camp as CPUs. Of course, many FPGAs come with embedded processors, but these are usually smaller cores and not always suitable for high-throughput applications like networking or servers.
Designers of SOCs, especially those created to differentiate products, frequently include FPGAs at the system or board level. This comes with a price, however. A lot of real estate on commercial FPGAs is devoted to multi-purpose IOs. System designers also have to live with the on-chip resources supplied on commercial FPGAs – clocks, RAM, DSPs, etc. – even if some are over- or under-utilized. Standalone FPGAs need their own DDR memory, which brings coherency issues with the CPU DRAM. Perhaps the biggest penalty is the time and power required to move data from the system SOC to the FPGA and back again.
Achronix’s position is that configurable embedded FPGA cores are the solution – an ideal pairing that lets system architects take advantage of the benefits of FPGAs while avoiding the drawbacks that would otherwise hurt performance, power and cost. They have announced their Speedcore™ eFPGA, an embedded FPGA that can be configured specifically to the requirements of any particular design. Speedcore eFPGA offers up to 2 million LUTs. The numbers of LUTs and FFs are completely up to the user, as are BRAM, LRAM and DSP density. The same goes for LRAM and BRAM widths and depths, and DSP functionality is customizable too.
The real benefits come from highly parallel, high-speed on-chip interfaces. For instance, 2 x 128-bit interfaces running at 600 MHz offer throughput of 153.6 Gb/s. Just as importantly, this comes with extremely low latency – roughly 10 ns round trip. As many interfaces as needed can be added for higher throughput. With Speedcore eFPGA’s AXI/ACE-Lite interfaces, the eFPGA can be integrated just like any other IP core, which saves time and complexity in moving data to and from the FPGA.
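The arithmetic behind that throughput figure is straightforward – a quick sketch, using the interface count, width and clock rate quoted in the presentation:

```python
def throughput_gbps(num_interfaces, width_bits, clock_hz):
    """Raw aggregate throughput in Gb/s: interfaces x width x clock.

    This ignores any protocol overhead, so it is a best-case figure.
    """
    return num_interfaces * width_bits * clock_hz / 1e9

# The example from the presentation: 2 x 128-bit interfaces at 600 MHz.
print(throughput_gbps(2, 128, 600e6))  # -> 153.6
```

Adding interfaces scales the number linearly, which is why the presentation notes that more can be added when higher throughput is needed.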
The FPGA is now a peer in the system and can offload the CPUs much more effectively. The Accelerator Coherency Port (ACP) lets the eFPGA access memory via the L2 and L3 caches, which lowers latency significantly. The eFPGA can also issue interrupts to the CPU, and it can play a major role in interrupt handling: IRQs can be handled in the FPGA and forwarded to the CPU only if needed.
For configuration, Speedcore eFPGA offers half-DMA for rapid initialization, with speeds of ~2 ms per 100K LUTs. The configuration can be secured with the built-in encryption engine, and there are other significant security benefits as well: beyond the obvious advantage of not having the FPGA configuration stream going off chip, the Speedcore eFPGA accesses memory through the TrustZone controller.
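Since configuration time scales with core size, the ~2 ms per 100K LUTs figure lets us estimate load times for any configuration – a rough sketch, assuming simple linear scaling up to the 2-million-LUT top of the announced range:

```python
MS_PER_100K_LUTS = 2.0  # figure quoted in the presentation

def config_time_ms(num_luts):
    """Approximate configuration load time in milliseconds,
    assuming time scales linearly with LUT count."""
    return num_luts / 100_000 * MS_PER_100K_LUTS

# A modest 100K-LUT core loads in about 2 ms...
print(config_time_ms(100_000))    # -> 2.0
# ...while the largest announced 2M-LUT core takes roughly 40 ms.
print(config_time_ms(2_000_000))  # -> 40.0
```

Even at the largest size, that is fast enough to reconfigure at boot or between workloads.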
The presentation also went into detail on specific example use cases. One was network and protocol acceleration: the FPGA works hand in hand with the system CPU to accelerate packet processing. Packets can be inspected rapidly without throttling memory performance, and the eFPGA can place flagged packet headers into the CPU cache so the CPU can service them. Similar speed-ups were covered in use cases for storage and SQL operations.
Some unexpected benefits come from the additional observability that an on-chip FPGA offers. Because it can access the system memory bus, the FPGA can be configured to monitor and collect statistics about on-chip traffic, gathering extremely detailed information with extensive filtering capabilities. Furthermore, during debug and bring-up the eFPGA can serve as a programmable traffic generator. The eFPGA also makes at-speed testing possible at manufacturing and at power-on.
Because an eFPGA needs less support silicon when placed in an SOC, there are savings in overall silicon area. This advantage extends to BOM reduction, a reduction in board real estate and reduced pin counts on the SOC. In some scenarios, the programmability could also help SOC providers reduce respins on their chips – new functionality or ECOs could be implemented with a bitstream change.
The Achronix presentation went into more detail than I can provide here, but by now it should be pretty clear that pulling a programmable FPGA core into an SOC is a big win from almost every perspective. We can safely assume that embedded FPGAs and SOCs will soon be a famous pairing, right up there with coffee and cream. For more details about the Achronix Speedcore™ eFPGA, please look at their website.