Last month, I made a “no-brainer” forecast that 2017 would be the year in which embedded FPGA (eFPGA) IP would emerge as a key differentiator for new SoC designs (link to the earlier article here).
The fusion of several technical and market factors is motivating design teams to incorporate programmable logic functionality into their feature set:
- the NRE and qualification cost of a new SoC design is increasing significantly at newer process nodes; a single part number that can adapt to multiple applications is a significant cost savings
- many algorithms can be more efficiently implemented in logic than using code executing on a processor core; the flexibility of reconfiguring the algorithm logic to specific market requirements enables broader applicability
- new (on-chip and external) bus interface specifications are emerging, yet final ratification is pending — a programmable logic IP block affords adaptability
Yet, field programmable logic implementations are often associated with higher power dissipation than cell-based logic designs. And, many of the applications for eFPGA technology are power-sensitive.
To better understand how eFPGA IP will address power optimization, and how that compares to logic library design, I reached out to Cheng Wang, Senior VP of Engineering at Flex Logix Technologies Inc., for his insights.
As with conventional logic designs, Cheng indicated that addressing power/performance for eFPGA IP is a multi-faceted task. Architecturally, the programmable logic fabric needs to define the look-up table (LUT) size that best fits the target applications. From a circuit perspective, the LUT array implementation needs to leverage all the available foundry technology features, in a manner similar to standard cell library offerings, but with specific consideration for eFPGA circuits.
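To make the LUT-size trade-off concrete, here is a minimal sketch of a k-input LUT modeled as a truth table. It is only an illustration, not the Flex Logix circuit: the `Lut` class and the `xor_config` helper are hypothetical names invented for this example. The point it shows is that a k-input LUT needs 2^k configuration bits, so a larger LUT buys more logic per cell at the cost of more configuration storage.

```python
class Lut:
    """Behavioral model of a k-input look-up table (illustrative only)."""

    def __init__(self, k, config_bits):
        # A k-input LUT stores one output bit per input combination.
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.k = k
        self.config_bits = list(config_bits)

    def evaluate(self, inputs):
        """Return the configured output for a sequence of k input bits."""
        if len(inputs) != self.k:
            raise ValueError("expected %d inputs" % self.k)
        index = 0
        for position, bit in enumerate(inputs):
            # Input 0 is the least significant bit of the table index.
            index |= (1 if bit else 0) << position
        return self.config_bits[index]


def xor_config(k):
    """Hypothetical helper: configuration bits for a k-input XOR (parity)."""
    return [bin(i).count("1") % 2 for i in range(2 ** k)]


lut4 = Lut(4, xor_config(4))  # 16 configuration bits
lut6 = Lut(6, xor_config(6))  # 64 configuration bits
```

Reprogramming the block amounts to loading a different `config_bits` list; the lookup hardware itself does not change.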
Cheng highlighted, “Customers are seeking the low-power and ultra-low power foundry offerings, such as TSMC’s 40LP/ULP. These processes offer a wide supply voltage range and a variety of device threshold voltage variants, such as SVT, HVT, and UHVT (in 40ULP). For an eFPGA block we offer customers the flexibility in device selection for their application, to which we have incorporated additional power optimizations. For example, the eFPGA configuration storage bits always utilize low-leakage devices, which corresponds to ~30-40% of the block area. We have included support for a local body-bias voltage. And, perhaps most significantly, the eFPGA IP block supports full power-gated operation.”
The figure below depicts the power gating microarchitecture.
The LUT cell includes state-retention when power gated, also utilizing low-leakage devices to minimize quiescent dissipation. A single block input pin defines the sleep state — the necessary internal turn-on/turn-off signal ramp to control current transients is engineered within the block itself.
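The value of power gating depends on how much of the time the block sits idle. The back-of-the-envelope sketch below shows the arithmetic as a time-weighted average of active and idle power. Every number in it is a made-up placeholder chosen for illustration, not Flex Logix or foundry data, and the function name is hypothetical.

```python
def average_power_mw(active_mw, idle_mw, active_fraction):
    """Time-weighted average power for a block that is active part of the time."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_mw + (1.0 - active_fraction) * idle_mw


# Placeholder values (assumptions for illustration only).
ACTIVE_MW = 10.0        # power while the array is switching
IDLE_UNGATED_MW = 1.0   # leakage while idle with the supply left on
IDLE_GATED_MW = 0.05    # residual leakage while power gated (retention cells)
ACTIVE_FRACTION = 0.1   # block is busy 10% of the time

ungated = average_power_mw(ACTIVE_MW, IDLE_UNGATED_MW, ACTIVE_FRACTION)
gated = average_power_mw(ACTIVE_MW, IDLE_GATED_MW, ACTIVE_FRACTION)
print("average power without gating: %.3f mW" % ungated)
print("average power with gating:    %.3f mW" % gated)
```

The lower the active fraction, the more the idle term dominates the average, which is why a mostly idle, power-sensitive design benefits most from gating.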
In short, power optimization for an eFPGA IP block involves many of the same trade-offs as a cell-based design, such as supply and bias conditions and device threshold selection. Some library cell-based logic implementation flows support biasing and power gating, with additional verification and electrical analysis steps. The eFPGA IP includes the additional design engineering within the LUT and configuration circuits for these power optimizations, which are available across the range of eFPGA array sizes.
As an aside, Cheng spoke briefly about the verification of these power optimization features. Shuttle testsites are used to confirm and qualify the silicon eFPGA array implementations. Qualification is performed over the supply voltage ranges supported by the foundry (e.g., 0.6V – 1.0V for TSMC’s 16FFC).
Finally, we chatted about a different class of eFPGA customers, for which performance, not power dissipation, is the key requirement. Cheng characterized this market segment as, “We have customers who are strictly seeking the fastest implementation available, for applications such as user-defined networking. The configurability of an eFPGA array enables them to incorporate various packet-processing algorithms. Architecturally, for these designs, the LUT expanded from 4 to 6 logic inputs; ideally, these algorithms can thus be realized in fewer logic stages. In several design examples we’ve reviewed with customers, more than 90% of the LUTs do utilize more than 4 logic inputs. Correspondingly, the full eFPGA array architecture needed to support a larger number of block I/Os.”
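The "fewer logic stages" point can be illustrated with a simple count. The sketch below assumes a wide associative reduction (an N-input AND, OR, or XOR) built as a balanced tree of k-input LUTs; real packet-processing logic and a real technology mapper will behave differently, and `lut_tree_depth` is a hypothetical helper written for this example.

```python
def lut_tree_depth(num_inputs, lut_inputs):
    """Number of LUT levels needed to reduce num_inputs signals to one output,
    assuming an associative function mapped as a balanced tree of k-input LUTs."""
    if num_inputs < 1 or lut_inputs < 2:
        raise ValueError("need at least 1 signal and LUTs with at least 2 inputs")
    depth = 0
    signals = num_inputs
    while signals > 1:
        # Each level packs up to lut_inputs signals into one LUT output.
        signals = -(-signals // lut_inputs)  # ceiling division
        depth += 1
    return depth


print(lut_tree_depth(32, 4))  # 3 levels: 32 -> 8 -> 2 -> 1
print(lut_tree_depth(32, 6))  # 2 levels: 32 -> 6 -> 1
```

In this toy case, moving from 4-input to 6-input LUTs removes one LUT delay and one trip through the switch fabric from the critical path. The gain is not universal: a 64-input reduction needs three levels with either LUT size under the same assumption.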
Cheng continued, “From a technology and circuits perspective, these customers are designing in FinFET technologies, such as TSMC’s 16FF+/FFC. The circuit drivers in the 16nm design are larger. For performance, we have implemented a clock distribution mesh (as opposed to a tree), which is expandable across the range of eFPGA array sizes and aspect ratios. The device selection is available in SVT and LVT variants. As an IP provider, we work with the foundry to utilize the ‘converged PDK’ metal stack, which the foundry’s customers are adopting for the lower metal levels. Within this metal stack definition, our LUT circuits and switch fabric optimize metal linewidth/space dimensions for performance.” (The Flex Logix EFLX-100 utilizes metals M1-M5, the EFLX-2.5K utilizes M1-M6.)
For these performance-driven customers, the tradeoff in area, performance, and power in the eFPGA design suggests a low value for the power gating and body biasing features included in the older process node implementations.
I learned that eFPGA array design is both similar to and, in many ways, different from cell library-based logic design, when considering power and performance target optimizations. The synthesis of RTL functionality into the target logic offering is similar, to be sure. And, foundry process technology device offerings are leveraged in the logic circuits. Yet, the eFPGA offering from Flex Logix is more than a logic block with programmable functionality; it has been engineered to integrate specific power/performance features that customers expect from a complex IP offering.