GLOBALFOUNDRIES has been evangelizing their 22FDX FD-SOI process for a few months; readers may have seen Tom Simon’s write-up of their preview at ARM TechCon. Dr. Joerg Winkler recently gave an updated webinar presentation of their approach, built around an implementation of an ARM Cortex-A17 core.
By now, you’ve probably heard that 22FDX targets a cost/die comparable to 28SLP while offering a performance boost from its next-generation FD transistors. 22FDX also offers the ability to integrate RF features and an opportunity to reduce RF power consumption significantly.
Most of the story has focused on PPA (power-performance-area) enhancements, but I see two aspects of this that people may have overlooked. To prove out the 22FDX process, GF decided to tape out a baseline implementation for comparison. The starting point was the same: a quad-core Cortex-A17 using the same processor core macro.
However, taking full advantage of 22FDX is not as simple as dropping in a 28SLP design. Details of body-bias routing come into play. We’ve seen the above picture before as well. One can apply reverse body-bias (RBB) to raise VT and lower leakage, or apply forward body-bias (FBB) to lower VT and increase maximum frequency. FBB uses a flipped-well architecture where the nMOS transistor sits on the N-well and the pMOS transistor sits on the P-well.
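To make the RBB/FBB trade-off concrete, here is a minimal first-order sketch in Python. It is not from the webinar: the body factor (`BODY_FACTOR_V_PER_V`), the subthreshold swing (`SWING_V_PER_DECADE`), and the sign convention (positive bias means forward) are illustrative assumptions, not GF-published 22FDX figures. It relies on the textbook approximation that subthreshold leakage changes by one decade for every swing's worth of VT shift.

```python
# First-order body-bias model. All constants are assumptions for
# illustration, not measured or published 22FDX values.
BODY_FACTOR_V_PER_V = 0.08   # assumed VT shift (V) per volt of body bias
SWING_V_PER_DECADE = 0.08    # assumed subthreshold swing (V per decade)


def vt_shift(bias_v: float) -> float:
    """VT change in volts. Positive bias = forward (lowers VT),
    negative bias = reverse (raises VT)."""
    return -BODY_FACTOR_V_PER_V * bias_v


def leakage_ratio(bias_v: float) -> float:
    """Subthreshold leakage relative to zero bias: 10 ** (-dVT / swing)."""
    return 10 ** (-vt_shift(bias_v) / SWING_V_PER_DECADE)


for bias in (-1.0, -0.5, 0.5, 1.0):
    kind = "FBB" if bias > 0 else "RBB"
    print(f"{kind} {bias:+.1f} V: dVT = {vt_shift(bias) * 1000:+.0f} mV, "
          f"leakage x{leakage_ratio(bias):.2f}")
```

Under these assumed constants, reverse bias trades speed for an exponential drop in leakage, while forward bias does the opposite; the real magnitudes depend on the process and are not given in the source.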
Winkler launches into an overview of how GF has teamed with Cadence on tools that handle body-biasing and other specifics of the 22FDX design flow. As a design philosophy, GF chose to implement one unified body-bias scenario for the Cortex-A17 baseline tests in 22FDX. They placed each of the cores on its own power domain, and brought in 5 pairs of body-bias nets with an outer ring approach (the white lines around the “non-CPU” block and the boundary of the four cores).
One of the interesting points is that the body-bias networks are known to the design flow. GF leverages support for UPF in the Cadence platform (UPF scripts were heavily used), as well as multi-corner PVT and PVTB (PVT plus body-bias) support. There is also discussion of the details of handling cache. In this implementation, there are 14 different L1 cache macros, and one L2 cache macro. Each has to be supported for periphery body-biasing and bitcell array body-biasing, leading to the need for 5 body-bias net pairs. The routing has to obey high-voltage spacing rules.
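As a rough illustration of why adding body bias as a corner dimension enlarges the analysis space, the sketch below enumerates corners as a simple cross product. The process, voltage, temperature, and bias values are hypothetical placeholders, not GF's sign-off corners, and a real flow would prune this list rather than run every combination.

```python
from itertools import product

# Hypothetical corner values, for illustration only.
PROCESS = ["ss", "tt", "ff"]
VOLTAGE_V = [0.72, 0.80, 0.88]
TEMP_C = [-40, 25, 125]
BODY_BIAS = ["min", "nominal", "max"]  # assumed bias settings per corner


def corners(with_bias: bool):
    """Return every combination of the corner axes as tuples."""
    axes = [PROCESS, VOLTAGE_V, TEMP_C]
    if with_bias:
        axes.append(BODY_BIAS)
    return list(product(*axes))


pvt = corners(with_bias=False)
pvtb = corners(with_bias=True)
print(len(pvt), "PVT corners ->", len(pvtb), "corners once bias is added")
```

Each bias setting multiplies the corner count, which is one reason it matters that the tools understand the body-bias nets directly instead of treating them as an afterthought.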
After the extensive discussion of how they added body-biasing to a quad-core Cortex-A17 in 22FDX, I got the distinct feeling that it is very hard to compare this big implementation apples-to-apples with the 28SLP baseline, because no specific results were shared. Winkler then switches to another story we’ve seen before: a PPA comparison on the much simpler Cortex-A9. The punchline is that, at the same clock speed, the 22FDX version of the Cortex-A9 uses 45% less power and 45% less area when using RBB. Alternatively, one could choose FBB and, in that same 45% smaller area, get 30% more frequency at the same power point.
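The Cortex-A9 figures can be restated as ratios against the 28SLP baseline. This is only arithmetic on the percentages quoted above; reading "same power point" as the 28SLP baseline power is my assumption, and the energy-per-cycle number is derived here, not something shown in the webinar.

```python
# Ratios relative to the 28SLP Cortex-A9 baseline (baseline = 1.0).
rbb = {"freq": 1.00, "power": 1.00 - 0.45, "area": 1.00 - 0.45}
# Assumption: "same power point" means the 28SLP baseline power.
fbb = {"freq": 1.00 + 0.30, "power": 1.00, "area": 1.00 - 0.45}


def energy_per_cycle(point):
    """Power divided by frequency, relative to the baseline."""
    return point["power"] / point["freq"]


for name, point in (("RBB", rbb), ("FBB", fbb)):
    print(f"{name}: freq x{point['freq']:.2f}, power x{point['power']:.2f}, "
          f"area x{point['area']:.2f}, "
          f"energy/cycle x{energy_per_cycle(point):.2f}")
```

Under that reading, both operating points come out ahead of the baseline on energy per cycle, with RBB favoring efficiency and FBB favoring throughput.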
That leads to what I think are the two main takeaways of 22FDX. First, to change the PPA target on bulk nodes, the implementation has to change; on 22FDX, using body-bias (possibly dynamically under software control), one can slide the same implementation up and down the power-frequency curve. Second, up-front choices of either RBB or FBB can have a major impact. For example, in the same SoC on 22FDX, a big Cortex-A17 cluster could use FBB for maximum performance, and a LITTLE Cortex-A9 cluster could use RBB for minimum power consumption.
You can see the entire GF webinar on 22FDX in the clear on YouTube:
How to Implement an ARM Cortex-A17 Processor in 22FDX FD-SOI Technology
There’s also much more information about 22FDX on the GF website. The investment in getting into 22FDX and having control over tuning an implementation using body-biasing puts it in a unique spot. Instead of just chasing smaller and smaller geometries, 22FDX captures the costs of a now-mature 28nm node with significant performance advantages.