WP_Term Object
(
    [term_id] => 13
    [name] => Arm
    [slug] => arm
    [term_group] => 0
    [term_taxonomy_id] => 13
    [taxonomy] => category
    [description] => 
    [parent] => 178
    [count] => 389
    [filter] => raw
    [cat_ID] => 13
    [category_count] => 389
    [category_description] => 
    [cat_name] => Arm
    [category_nicename] => arm
    [category_parent] => 178
)
            
Mobile Unleashed Banner SemiWiki
WP_Term Object
(
    [term_id] => 13
    [name] => Arm
    [slug] => arm
    [term_group] => 0
    [term_taxonomy_id] => 13
    [taxonomy] => category
    [description] => 
    [parent] => 178
    [count] => 389
    [filter] => raw
    [cat_ID] => 13
    [category_count] => 389
    [category_description] => 
    [cat_name] => Arm
    [category_nicename] => arm
    [category_parent] => 178
)

Cortex-A9 speed limits and PPA optimization

Cortex-A9 speed limits and PPA optimization
by Don Dingee on 12-19-2012 at 3:01 pm

We know by now that clock speeds aren’t everything when it comes to measuring the goodness of a processor. Performance has direct ties to pipeline and interconnect details, power factors into considerations of usability, and the unspoken terms of yield drive cost.

My curiosity kicked in when I looked at the recent press release from Cadence announcing they had reached 2.2 GHz on a 28nm dual-core ARM Cortex-A9 with Open Silicon. Are we reaching the limits of the Cortex-A9 in terms of clock speed growth? Or are more improvements in power, performance, and area (PPA) in store for the core?

The raw percentages quoted by Cadence in that release sound great: 10% reduction in design area, 33% reduction in clock tree power, 27% reduction in leakage power compared to an unnamed prior design flow. These new figures were achieved with a combination of the latest RTL compiler, RTL-to-GDSII core optimization, and clock concurrent optimization techniques, which are really targeted at 20nm design but are certainly applicable to less aggressive nodes.

We may be pressing the limits on what the Cortex-A9 core can do at 28nm, and there is likely only one more major speed bump to 20nm in store for the Cortex-A9. I went hunting and found several data points.

ST-Ericsson has (had?) a 2.3 GHz version, with rumbles of 2.5 GHz possible, of the dual-core NovaThor L8580 running on an FD-SOI process. It’s questionable if this device or the rest of the forward ST-Ericsson roadmap ever get to market in light of STMicro wanting to pull out of the JV, the continuing saga of Nokia attempts to recover, and the stark reality of US carriers preferring Qualcomm 4G LTE implementations.

TSMC has taped out a 3.1 GHz dual-core Cortex-A9 on their 28HPM process, which from what I can find is the unofficial record for Cortex-A9 clock speed. However, the “typical” conditions which TSMC cites leave out one detail: active cooling is required, which rules out use of a real world part at this speed in phones or tablets. The economics of yield at that speed are unclear, but they can’t be good otherwise we’d be hearing a lot more about this on processor roadmaps.

Along the lines of how much PPA optimization is possible, I went looking for another opinion and found this SoC Realization white paper from Atrenta, which discusses how PPA fits into the picture. The numbers Cadence is quoting suggest that we’re close to closing the optimization gap for the Cortex-A9, because the big-hitters in the flow have been optimized.

By back of the envelope calculations, if state-of-the-art optimization for a Cortex-A9 gives us 2.2 GHz at 28nm, a process bump to 20nm creates headroom to about 3 GHz. Reports have Apple heading to TSMC for 20nm quad-core designs, but reading between the lines of that the same concerns of power consumption and cooling exist – these chips aren’t slated for iPhones. (As I’ve said before, Apple is driving multiple roadmap lines, one on the A6 for phones, one on the A6x for tablets and presumably the long awaited Apple TV thingie, and likely a third ARM-based chip for future MacBooks probably on the 64-bit Cortex-A50 series core.)

The reason I say the Cortex-A9 likely gets only one more speed bump is explained pretty well in this article, projecting what 64-bit does for ARM-based core performance. While a lot of that is estimation, the point which I agree with is most of the energy for further EDA optimization will be put into the Cortex-A50 series. TSMC and ARM both agree that the drive for 16nm FinFET and beyond is focused on 64-bit cores.

A couple immutable rules of my own when it comes to tech:

  • 10 engineers can make anything work, once; optimization is more interesting.
  • Once something is optimized, it’s optimized, and it’s time to design the next thing.

I think we’re reaching that point on the Cortex-A9, and 3 GHz is about the end of the line for what PPA optimization and process bumps will do. With that said, what may happen is instead of going for higher clock speeds, designers drive the Cortex-A9 for lower power and take it to more embedded applications.

Punditry has its risks, like being wrong a lot or being labeled Captain Obvious. I’m thick skinned. What are your thoughts on this topic, agree or disagree?

Share this post via:

Comments

There are no comments yet.

You must register or log in to view/post comments.