Regular Semiwiki readers are aware that embedded FPGA (eFPGA) IP development is a rapidly growing (and evolving) technical area. The applications for customizable and upgradeable logic in the field are many and diverse — as a result, improved performance, greater configurable logic capacity/density, and comprehensive testability are customer requirements of increasing importance.
I recently had the opportunity to chat with Geoff Tate, CEO, and Cheng Wang, Senior VP of Engineering at Flex Logix, about the expansion and advancements in the eFPGA market. Flex Logix has just announced their “second-generation” array architecture, with initial silicon validation targeting TSMC’s 16FFC technology offering. The discussion of the features incorporated in this new IP generation was especially insightful.
Geoff highlighted, “A large cross-section of our customers are focused on performance. We made a key change to our basic architecture: 6-input LUT’s (also available as dual 5-input LUT’s) replace the 5-input (dual 4-input) topology of our first-generation design.”
I countered with the statement that there is a contingent of FPGA users recommending 4-input LUT’s as the preferred tradeoff between logic mapping and configuration memory area. Cheng provided some interesting data in response: “Our networking customers require high packet processing throughput. This application leverages the high fan-in functions available with 6-input LUT’s, to reduce the number of logic levels in each pipeline stage. Additionally, we have optimized our unique hierarchical interconnect topology for improved performance for larger eFPGA arrays.”
As for the logic mapping efficiency with higher fan-in LUT’s, Cheng provided a comparative data point:
ARM Cortex-M0 microcontroller:
- 3905 4-input LUT’s
- 3089 6-input LUT’s
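As a quick check on what these two figures imply, the relative saving in LUT count can be computed directly. This is only arithmetic on the numbers quoted above; the function name `lut_reduction` is introduced here for illustration and is not part of any Flex Logix tool.

```python
# LUT counts for the ARM Cortex-M0 mapping, as quoted in the comparison above.
LUTS_4_INPUT = 3905
LUTS_6_INPUT = 3089


def lut_reduction(before: int, after: int) -> float:
    """Return the fractional reduction in LUT count going from `before` to `after`."""
    return (before - after) / before


saved = LUTS_4_INPUT - LUTS_6_INPUT
reduction = lut_reduction(LUTS_4_INPUT, LUTS_6_INPUT)

print(f"LUTs saved: {saved}")          # 816
print(f"Reduction:  {reduction:.1%}")  # roughly 21%
```

In other words, the 6-input mapping uses about one fifth fewer LUT's for this design.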
Cheng highlighted that, given the eFLEX2.5K core granularity used when building an array, the Cortex-M0 required 1 DSP and 2 logic cores in their first-generation architecture, whereas the new (6-input LUT) design maps the Cortex-M0 into 1 DSP and 1 logic core. (For a review of the Flex Logix eFLEX design approach that supports arrays of 2.5K cores and interconnect, please refer to a previous Semiwiki article: link.) Synthesis algorithms are clearly successful in leveraging higher fan-in LUT cells.
Cheng referred to several RTL benchmarks indicating that the combination of the new core and hierarchical interconnect is providing 25% performance gains over the previous eFPGA arrays (at the same process node).
eFPGA logic density — moving aggressively from 28nm to 16FFC
The Flex Logix architecture and compiler support a range of eFLEX2.5 core instances, in array configurations as large as 7×7, for a total capacity exceeding 100K LUT’s. They recently released a full array testsite to TSMC’s 16FFC process node.
Figure 1. Image of the 7×7 array of eFLEX2.5 cores integrated on a 16FFC testsite.
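The "exceeding 100K LUT’s" figure is easy to sanity-check. The sketch below assumes each core contributes roughly 2,500 LUT's, which is only inferred from the "2.5K" in the core name; the exact per-core count is not stated in this article, so treat the result as an estimate.

```python
# Assumed LUTs per core, inferred from the "eFLEX2.5K" name (not an official figure).
ASSUMED_LUTS_PER_CORE = 2500


def array_capacity(rows: int, cols: int, luts_per_core: int = ASSUMED_LUTS_PER_CORE) -> int:
    """Estimate the total LUT capacity of a rows x cols array of identical cores."""
    if rows < 1 or cols < 1:
        raise ValueError("array dimensions must be at least 1x1")
    return rows * cols * luts_per_core


largest = array_capacity(7, 7)
print(f"7x7 array: {largest:,} LUTs")  # 122,500 under the assumption above
```

Under that assumption, the largest supported array comfortably clears the 100K mark.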
Cheng continued, “Our customers are enthusiastic about the PPA characteristics of 16FFC. There is significant momentum behind this node. We are seeing consolidation behind the 1P2xa1xd3xe PDK, which we used for the testsite, and we have optimized the use of six metal levels within and between cores.”
An important feature of any programmable logic implementation is the ability to read the configuration bits, as part of production test and/or during functional runtime. “Customers require the capability to verify the configuration data, at any time,” Cheng said. “The SRAM read operation is available with little additional hardware overhead, with the bits visible through the configuration chain.”
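To make the idea of configuration verification concrete, here is a minimal sketch of comparing bits read back through a configuration chain against the intended ("golden") bitstream. It is a toy software model only: the function name, the list-of-bits representation, and the sample data are all hypothetical and do not reflect the actual Flex Logix readback interface.

```python
from typing import List


def find_config_mismatches(golden: List[int], readback: List[int]) -> List[int]:
    """Return the chain positions where the read-back bits differ from the golden bits."""
    if len(golden) != len(readback):
        raise ValueError("golden and readback bitstreams must have the same length")
    return [i for i, (g, r) in enumerate(zip(golden, readback)) if g != r]


# Hypothetical 8-bit configuration chain, for illustration only.
golden_bits = [1, 0, 1, 1, 0, 0, 1, 0]

# Simulate a single upset at chain position 3.
readback_bits = list(golden_bits)
readback_bits[3] ^= 1

mismatches = find_config_mismatches(golden_bits, readback_bits)
print("Mismatched positions:", mismatches)
```

A runtime check of this kind could be repeated periodically; an empty result means the stored configuration still matches what was loaded.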
DFT and Production eFPGA Test
“Customers are also requiring very high (stuck-at) fault coverage during production test,” Geoff emphasized. “And, for sure, tester time, and thus cost, must be optimized, as well.”
Naively, I mentioned that the eFPGA could simply adopt the “standard” embedded IP core wrap test architecture. Cheng educated me on the unique characteristics of eFPGA test: “The test overhead to load configuration bits to a large array as a single entity in wrap test fashion (with 1.4M configuration SRAM bits per core) would be prohibitive. We needed to develop an aggressive approach to parallelizing the loading of configuration bits and exercising test patterns. Given the multiple eFLEX2.5 core instances that comprise the embedded IP, we re-architected the core and array compiler to enable common configuration bits to be applied during production test to each core in parallel.”
Figure 2. 2nd-gen eFLEX2.5 core block diagram (left), with DFT connectivity defined for each core (right)
Figure 3. Parallel loading of common configuration bits and test patterns to each core in the array.
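A rough calculation shows why parallel loading matters for tester time. The sketch below uses the 1.4M configuration bits per core quoted above and a 7×7 array, and it assumes a deliberately simple model: one configuration bit shifted per clock cycle, and the same bits broadcast to every core in the parallel case. The real chain width and clocking are not given in this article, so the absolute cycle counts are illustrative; only the ratio is the point.

```python
BITS_PER_CORE = 1_400_000  # configuration SRAM bits per core, from the quote above
NUM_CORES = 7 * 7          # largest supported array


def config_load_cycles(num_cores: int, bits_per_core: int, parallel: bool) -> int:
    """Estimate shift cycles to load configuration, assuming one bit per cycle.

    Serial: the whole array is treated as one long chain.
    Parallel: identical bits are broadcast to all cores at once.
    """
    return bits_per_core if parallel else num_cores * bits_per_core


serial_cycles = config_load_cycles(NUM_CORES, BITS_PER_CORE, parallel=False)
parallel_cycles = config_load_cycles(NUM_CORES, BITS_PER_CORE, parallel=True)

print(f"Serial load:   {serial_cycles:,} cycles")
print(f"Parallel load: {parallel_cycles:,} cycles")
print(f"Speedup:       {serial_cycles // parallel_cycles}x")
```

Under this model the load time per test configuration scales with one core rather than with the full array, so the saving grows with array size.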
Cheng continued, “We developed a core model that enables commercial DFT tools to generate patterns. We’re achieving well in excess of 98% stuck-at fault coverage. We will collaborate with customers to develop additional patterns to focus on primary I/O and inter-core logic to bring the coverage well above 99%, if required.” It’s clear to me now that an embedded FPGA is definitely not like other IP when it comes to production test pattern development.
The eFPGA market is evolving rapidly: customers are requiring improved performance, greater capacity, (runtime) configuration visibility, and improved production test coverage/efficiency. The Flex Logix development team is responding to these requirements with corresponding innovations in their “second-generation” architecture.
For more information on their recent eFPGA release, please follow this link.