
Cortex-A9 speed limits and PPA optimization
by Don Dingee on 12-19-2012 at 3:01 pm

We know by now that clock speeds aren’t everything when it comes to measuring the goodness of a processor. Performance has direct ties to pipeline and interconnect details, power factors into considerations of usability, and the unspoken terms of yield drive cost.

My curiosity kicked in when I looked at the recent press release from Cadence announcing they had reached 2.2 GHz on a 28nm dual-core ARM Cortex-A9 with Open Silicon. Are we reaching the limits of the Cortex-A9 in terms of clock speed growth? Or are more improvements in power, performance, and area (PPA) in store for the core?

The raw percentages quoted by Cadence in that release sound great: 10% reduction in design area, 33% reduction in clock tree power, 27% reduction in leakage power compared to an unnamed prior design flow. These new figures were achieved with a combination of the latest RTL compiler, RTL-to-GDSII core optimization, and clock concurrent optimization techniques, which are really targeted at 20nm design but are certainly applicable to less aggressive nodes.

We may be pressing the limits on what the Cortex-A9 core can do at 28nm, and there is likely only one more major speed bump to 20nm in store for the Cortex-A9. I went hunting and found several data points.

ST-Ericsson has (had?) a 2.3 GHz version of the dual-core NovaThor L8580, with rumbles of 2.5 GHz possible, running on an FD-SOI process. It’s questionable whether this device, or the rest of the forward ST-Ericsson roadmap, ever gets to market in light of STMicro wanting to pull out of the JV, the continuing saga of Nokia’s attempts to recover, and the stark reality of US carriers preferring Qualcomm 4G LTE implementations.

TSMC has taped out a 3.1 GHz dual-core Cortex-A9 on their 28HPM process, which from what I can find is the unofficial record for Cortex-A9 clock speed. However, the “typical” conditions which TSMC cites leave out one detail: active cooling is required, which rules out using a real-world part at this speed in phones or tablets. The economics of yield at that speed are unclear, but they can’t be good, otherwise we’d be hearing a lot more about this on processor roadmaps.

Along the lines of how much PPA optimization is possible, I went looking for another opinion and found this SoC Realization white paper from Atrenta, which discusses how PPA fits into the picture. The numbers Cadence is quoting suggest that we’re close to closing the optimization gap for the Cortex-A9, because the big-hitters in the flow have been optimized.

By back-of-the-envelope calculation, if state-of-the-art optimization for a Cortex-A9 gives us 2.2 GHz at 28nm, a process bump to 20nm creates headroom to about 3 GHz. Reports have Apple heading to TSMC for 20nm quad-core designs, but reading between the lines, the same concerns about power consumption and cooling exist – these chips aren’t slated for iPhones. (As I’ve said before, Apple is driving multiple roadmap lines: one on the A6 for phones, one on the A6X for tablets and presumably the long-awaited Apple TV thingie, and likely a third ARM-based chip for future MacBooks, probably on the 64-bit Cortex-A50 series core.)
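
To make the arithmetic behind that headroom estimate explicit (my own crude reading, assuming clock speed scales roughly with the linear shrink from 28nm to 20nm and ignoring everything else that changes at a new node):

    f_{20nm} \approx f_{28nm} \times \frac{28}{20} = 2.2\,\mathrm{GHz} \times 1.4 \approx 3.1\,\mathrm{GHz}

In practice power, cooling, and yield eat into that headroom well before the math does.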

The reason I say the Cortex-A9 likely gets only one more speed bump is explained pretty well in this article, projecting what 64-bit does for ARM-based core performance. While a lot of that is estimation, the point I agree with is that most of the energy for further EDA optimization will be put into the Cortex-A50 series. TSMC and ARM both agree that the drive for 16nm FinFET and beyond is focused on 64-bit cores.

A couple of immutable rules of my own when it comes to tech:

  • 10 engineers can make anything work, once; optimization is more interesting.
  • Once something is optimized, it’s optimized, and it’s time to design the next thing.

I think we’re reaching that point on the Cortex-A9, and 3 GHz is about the end of the line for what PPA optimization and process bumps will do. With that said, what may happen is instead of going for higher clock speeds, designers drive the Cortex-A9 for lower power and take it to more embedded applications.

Punditry has its risks, like being wrong a lot or being labeled Captain Obvious. I’m thick skinned. What are your thoughts on this topic, agree or disagree?


Intel not interested by NVELO? Samsung was…
by Eric Esteve on 12-19-2012 at 3:06 am

Short news broke over the weekend, and LinkedIn was the most efficient medium for learning that NVELO has been acquired. Probably very few people outside the SSD ecosystem knew about NVELO. Based in Santa Clara, the company was a spin-off from Denali, privately owned, and if you look at the top management you will recognize a few names, like David Lin, VP of Product Development, or Sanjay Srivastava, Chairman of the Board, both part of the winning team that sold Denali to Cadence for $315M. The product developed by NVELO? Dataplex, an SSD cache that lets a system benefit from the advantages of NAND flash over pure hard disk drive (HDD) storage, but without the extra cost of a complete SSD storage system. Pretty smart product…

There are several products in the market that use NVELO’s Dataplex software such as OCZ’s Synapse, Corsair’s Accelerator and Crucial’s Adrenaline SSDs. Dataplex is essentially an alternative to Intel’s Smart Response Technology (SRT) but with fewer limitations. For example, Dataplex is not tied to any specific chipsets, making it a viable option for AMD based setups and older systems without Intel’s SRT support. There is also no 64GB cache size limitation like in Intel’s SRT, although most of the SSDs that are bundled with Dataplex are 64GB or smaller. Whether it’s worth it to use an SSD bigger than 64GB for caching is a different question, but at least there is an option for that.
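
For readers who haven’t worked with SSD caching, here is a deliberately simplified sketch of the general idea: a small, fast SSD holds the most recently used blocks in front of a large, slow HDD. This is purely illustrative Python with a plain LRU policy; it says nothing about how Dataplex actually decides what to cache, which is NVELO’s proprietary secret sauce.

    # Illustrative only: a toy block-level cache with a plain LRU policy,
    # showing the general idea behind SSD caching in front of an HDD.
    # This is NOT NVELO's Dataplex algorithm, which is proprietary.
    from collections import OrderedDict

    class BlockCache:
        def __init__(self, capacity_blocks):
            self.capacity = capacity_blocks
            self.cache = OrderedDict()            # block number -> data

        def read(self, block, read_from_hdd):
            if block in self.cache:               # hit: served at SSD speed
                self.cache.move_to_end(block)
                return self.cache[block]
            data = read_from_hdd(block)           # miss: slow HDD access
            self.cache[block] = data
            if len(self.cache) > self.capacity:   # evict least recently used
                self.cache.popitem(last=False)
            return data

    cache = BlockCache(capacity_blocks=2)
    hdd = lambda blk: f"data-{blk}"
    for blk in (1, 2, 1, 3, 1):
        cache.read(blk, hdd)
    print(list(cache.cache))                      # blocks [3, 1] remain cached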

I don’t know if Intel was among the companies bidding for NVELO, but the fact is that Samsung completed this acquisition on December 14, 2012, for an undisclosed amount (probably several tens of millions of dollars; I would say maybe $50 million, but I have absolutely no inside information). According to Samsung: “The timely integration of NVELO’s flagship storage technology into Samsung’s best-in-class SSD technologies will give Samsung customers access to an ever-evolving and more diversified portfolio of NAND storage solutions suitable for a broad range of computing platforms.”

The semiconductor industry is fast moving, which is not a scoop, and Intel is certainly at the beginning of a huge reorganization that it has to complete if it doesn’t want to lose its #1 position, and maybe more than that: missing the move from PC to mobile in general (smartphones or media tablets) could be dramatic if the company doesn’t find a new sustainable source of revenue, something like $20 billion, in the very near future, let’s say by 2015 or so.

The NVELO acquisition will certainly not bring that kind of revenue to Samsung, but it is one of the signs that the semiconductor industry is changing heavily, and that Samsung is now acting more like a leader than a challenger, even if the company still shows respect to Intel. But for how long?

As a bonus, you may want to read some declarations made by Samsung and NVELO representatives: they are excited, and we are too!

“The acquisition of NVELO will enable us to extend our ability to provide SSD related storage solutions to customers. We are pleased with this transaction as the employees of NVELO share our vision to take SSD storage into the next-generation of performance and reliability,” said Young-Hyun Jun, executive vice president of Flash product & technology, Device Solutions, Samsung Electronics.

“The NVELO team is excited to join the Samsung family,” said Jiurong Cheng, president and CEO, NVELO. “We look forward to accelerating storage innovation in close cooperation with Samsung storage experts as we help to deliver fully integrated SSD solutions to the market.”

The acquisition involves all technology and personnel under NVELO, Inc. Further details of the agreement were not disclosed.

Eric Esteve from IPNEST


A Brief History of Berkeley Design Automation
by Daniel Nenni on 12-18-2012 at 1:00 pm

Analog, mixed-signal, RF, and custom digital circuitry implemented in GHz nanometer CMOS introduce a new class of design and verification challenges that traditional transistor-level simulators cannot adequately address. Berkeley Design Automation, Inc. (BDA) was founded in 2003 by Amit Mehrotra and Amit Narayan, UC Berkeley Ph.D. graduates, with the intent of delivering next-generation circuit-level design tools to address these coming challenges. Ravi Subramanian, another UC Berkeley Ph.D., joined shortly thereafter as CEO. Together, and with initial funding from Woodside Fund and Bessemer Venture Partners, the three embarked on launching the venture.

BDA’s initial focus was to deliver full-spectrum device noise analysis for high-performance analog and RF ICs based on Mehrotra’s fundamental research contributions. In 2004, the company introduced its initial product, PLL Noise Analyzer™, the first tool in the industry to provide closed-loop transistor-level device noise analysis for phase-locked loops (PLLs). Through establishing the tool at several leading-edge semiconductor companies, BDA positioned itself to learn about design teams’ most significant circuit design and verification challenges—all of which required breakthrough accuracy, performance, and capacity. BDA was able to leverage the technologies it developed for the PLL Noise Analyzer to create a new nanometer circuit verification platform that would deliver just that.

In 2006, BDA introduced the centerpiece of its new platform—the Analog FastSPICE™ circuit simulator (AFS). The product’s simple yet powerful value proposition was anchored on delivering identical results to traditional SPICE with 5×–10× higher performance and 5×–10× higher capacity, plug-compatible with existing flows. With this breakthrough capability, design teams could solve analog/RF circuit verification problems that were previously impossible or infeasible using traditional SPICE. Thus, the new simulator category “Analog FastSPICE” was born—“Analog” accuracy with “fastSPICE” performance.

In order to establish itself in a market crowded with legacy tools, BDA developed its now well-known engagement model of solving problems that design teams literally could not address with their current tools. Generally this consisted of simulating circuits at one or two levels of hierarchy above what they could currently simulate, running circuit simulations in days that would otherwise take weeks, and simulating circuits that otherwise ran in digital fastSPICE simulators – all with foundry-certified SPICE accuracy and a drop-in use model. Word spread that SPICE wasn’t dead.

Design teams rapidly adopted AFS and encouraged BDA to deliver even more performance and capacity without compromising accuracy. They also asked BDA to extend AFS functionality to address new nanometer CMOS verification needs, including device noise impact on all noise-sensitive circuits, large post-layout simulations, complex mixed-signal simulation, and efficient characterization of global and local process variation. Leveraging its modular architecture implemented as a unified executable, BDA rapidly filled out the AFS Platform, including multithreaded and multi-core parallel execution, >10M-element DC and transient capacity, >100K-element periodic steady-state analysis, full-spectrum periodic noise analysis, full-spectrum transient noise analysis, and HDL co-simulation. All of these capabilities provide foundry-certified accuracy to 20nm for both leading netlist and model formats.

The company’s technology development pipeline continues unabated, with leading-edge development to break down remaining and emerging barriers. Most recently BDA announced breakthrough AMS capabilities in its Analog FastSPICE AMS simulator, which uniquely works with any leading HDL simulator and makes mixed-signal simulation practical for everyday use.

Today, over 125 semiconductor companies around the world rely on the BDA AFS Platform and the company’s deep application expertise to meet their nanometer circuit design and verification challenges. The main circuit application areas include: high-speed I/O, PLLs/DLLs, ADCs/DACs, image sensors, transceivers, memories, and RFICs.

BDA was recognized as one of the 500 fastest growing technology companies by revenue in North America in both 2011 and 2012 by Deloitte. The company is privately held and backed by Woodside Fund, Bessemer Venture Partners, Panasonic Corp., NTT Corp., IT-Farm, and MUFJ Capital.


FinFET Modeling and Extraction at 16-nm
by Daniel Payne on 12-18-2012 at 12:05 pm

In 2012, FinFET is one of the most talked-about MOS technologies of the year because traditional planar CMOS has slowed down on scaling below the 28nm node. To learn more about FinFET process modeling I attended a Synopsys webinar where Bari Biswas presented for about 42 minutes, including a Q&A portion at the end.


Bari Biswas, Synopsys


Double Patterning Tutorial
by Paul McLellan on 12-17-2012 at 4:07 am

Double patterning at 20nm is one of those big unavoidable changes that it is almost impossible to know too much about. Mentor’s David Abercrombie, DFM Program Manager for Calibre, has written a series of articles detailing the multifaceted impacts of double patterning on advanced node design and verification. There is a pdf that gives an introduction and a link to each article here.

Treat this as an advanced tutorial on the issues of double patterning, in particular the different approaches that different companies are deciding to take. Do you want your designers to see the two colors on the mask, or leave it up to either specialists or the foundry? Do you want designers to be able to seed double patterning, which can be important in analog to guarantee better control over some of the parasitics?

David won the award for the best tutorial at the 2012 TSMC OIP for his presentation, along with Peter Hsu of TSMC, on Finding and Fixing Double Patterning Errors in 20nm. This pretty much summarizes what everyone has to become aware of for the lower layers of 20nm.


Of course, since David works for Mentor, the solutions all showcase Calibre’s capabilities. But the pieces mostly don’t mention Calibre and are general tutorials on double patterning that do not depend on the tool flow that you are using. After all, double patterning is a lithography problem requiring doubled-up masks for some layers, and the fundamental problem doesn’t depend on the tool flow you take to get to a solution. If you are doing 20nm layout, you need to have at least a working knowledge of this stuff.
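
At its core, the decomposition step is a two-coloring problem: polygons that sit closer than the same-mask spacing limit must end up on different masks, and an odd cycle of such conflicts cannot be colored at all. Here is a minimal, purely illustrative sketch of that idea in Python; real decomposition engines such as Calibre additionally handle stitching, anchoring, seeding, and many foundry-specific rules, none of which appear here.

    # Minimal sketch of the two-coloring idea behind double patterning
    # decomposition. Illustrative only: real decomposition engines also
    # handle stitching, anchoring/seeding, and foundry-specific rules.
    from collections import deque

    def two_color(num_polygons, conflict_edges):
        """Assign each polygon to mask 0 or mask 1 so that no two polygons
        joined by a conflict edge (spaced closer than the same-mask limit)
        share a mask. Returns None if an odd conflict cycle makes coloring
        impossible, i.e. a double patterning violation."""
        adj = [[] for _ in range(num_polygons)]
        for a, b in conflict_edges:
            adj[a].append(b)
            adj[b].append(a)

        color = [None] * num_polygons
        for start in range(num_polygons):
            if color[start] is not None:
                continue
            color[start] = 0
            queue = deque([start])
            while queue:
                node = queue.popleft()
                for nbr in adj[node]:
                    if color[nbr] is None:
                        color[nbr] = 1 - color[node]
                        queue.append(nbr)
                    elif color[nbr] == color[node]:
                        return None      # odd cycle: the layout needs a fix
        return color

    # Three mutually conflicting polygons form an odd cycle -> uncolorable.
    print(two_color(3, [(0, 1), (1, 2), (2, 0)]))   # None
    print(two_color(3, [(0, 1), (1, 2)]))           # [0, 1, 0]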

The different pieces that go to make up the whole tutorial are:

  • Double Patterning Requires a Double Take
  • Double Patterning: Sharing the Benefit and the Burden
  • Debugging Double Patterning without Getting Double Vision
  • To Cut or Not To Cut? That is the Double Patterning Question
  • Colorblind—Colorless versus Two-Color Double Patterning Design
  • Anchors Away – Anchoring and Seeding in Double Pattern Design
  • Monsters, Inc.: How Do I Fix These Double Patterning Errors Anyway?
  • Double Patterning: Challenges and Possible Solutions in Parasitics Extraction
  • Why do my DP colors keep changing?


I fully recommend reading the entire summary if not all of David’s articles that the summary links to. Again, the document (pdf) is here.


Apache/Ansys presents: 3DIC thermal, transmission lines, low frequency analysis
by Paul McLellan on 12-16-2012 at 10:00 pm

Late in January, DesignCon takes place at the Santa Clara Convention Center, January 28th-31st. Details are here.

On Tuesday from 11.05 to 11.45, Apache and Ansys will be presenting on Thermal Co-analysis of 3D IC/packages/system. This is being presented by a whole team of people: Stephen Pan, senior product specialist at ANSYS; Norman Chang, VP product development at Apache; Mark Qi Ma, a product engineer at Apache; Gokul Shankaran, a product specialist at ANSYS; and Manoj Nagulapally, R&D manager at ANSYS.

The title of the paper makes it pretty clear what they will be presenting. Thermal management in 3D-IC designs is critical because the heat can’t easily get out from the heart of a 3D stack of die. We are used to having to model temperature effects on a single die, but in a stack a hot spot on one die can affect the performance of the die above and below. And, of course, the performance has an effect on the temperature.

Accurate temperature maps on chips have impacts on chip reliability and performance such as EM limits, IR maps, and power distribution. Chip, package, board, and system are all thermally coupled, but with very different length scales. While it is still a challenge to include the details of all the levels in one thermal-analysis model using current technology, an accurate and practical co-analysis flow is presented to help manage thermal problems on 3D-ICs. Apache/ANSYS will describe a methodology for power-thermal co-analysis of 3D-IC packages through power-temperature iterative loops at the chip-package level and power-thermal boundary-condition iterative loops involving chip, package, and system (CPS).
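
The coupling works both ways (leakage power rises with temperature, and temperature rises with power), so the analysis has to iterate to a self-consistent operating point. The sketch below shows that fixed-point loop in miniature, with made-up placeholder models and coefficients; the actual Apache/ANSYS flow exchanges full chip power maps and package/system thermal boundary conditions rather than single numbers.

    # A minimal sketch of a power-temperature iterative loop with made-up
    # placeholder models; the real flow exchanges full power maps and
    # package/system thermal boundary conditions, not single numbers.

    def total_power(temp_c, p_dynamic=2.0, p_leak_25c=0.5, growth=1.03):
        """Watts: fixed dynamic power plus leakage growing ~3% per degree C
        above 25 C (an illustrative coefficient, not a foundry model)."""
        return p_dynamic + p_leak_25c * growth ** (temp_c - 25.0)

    def junction_temp(power_w, t_ambient=45.0, theta_ja=8.0):
        """Degrees C from a single lumped thermal resistance (degC/W)."""
        return t_ambient + theta_ja * power_w

    temp = 45.0
    for iteration in range(50):
        power = total_power(temp)
        new_temp = junction_temp(power)
        if abs(new_temp - temp) < 0.01:   # self-consistent operating point
            break
        temp = new_temp

    print(f"{power:.2f} W at {temp:.1f} C after {iteration} iterations")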

Then from 2.50 to 3.30 that same day (Tuesday), Ansys, Intel, and the University of South Carolina present Analytic Solutions for Periodically Loaded Transmission Line Modeling. This paper is presented by Paul Huray from the University of South Carolina, Priva Pathmanathan from Intel, and Steve Pytel, the Signal Integrity Product Manager for ANSYS.

The next day, Wednesday, from 11.05 to 11.45, ANSYS presents A Reverse Nyquist Approach to Understanding the Importance of Low-Frequency Information in Scattering Matrices. The paper is presented by Daniel Dvorscak and Michael Tsuk of ANSYS.

High-speed applications require high-frequency (20 GHz+) S-parameter models. While high bandwidth is important, it is also critical to model the behavior accurately at low frequency. When using S-parameter models in circuit simulation, the lack of proper low-frequency content can lead to inaccurate results, because the simulator lacks enough information to recapture physical properties of the model such as insertion delay, inter-symbol interference, or the lower knee frequency of discrete passive components. This session will cover the pitfalls of low-frequency undersampling, as well as how to use the duality of time and frequency to predict the appropriate frequency sampling required when generating individual as well as concatenated models.
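
As a rough way to see the duality being referred to (my own summary, not necessarily the presenters’ formulation): S-parameters sampled on a frequency grid with step Δf correspond, after inverse transform, to a time-domain response that repeats every 1/Δf, so the sampling must be fine enough that the longest delay the model has to capture fits comfortably inside that window:

    T_{window} = \frac{1}{\Delta f} \gg \tau_{delay} \quad \Longleftrightarrow \quad \Delta f \ll \frac{1}{\tau_{delay}}

That is why sparse low-frequency sampling, and not just limited bandwidth, can wreck insertion delay and ISI predictions.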


TSMC 28nm and 20nm Update Q4 2012
by Daniel Nenni on 12-16-2012 at 7:00 pm

The big news in Taiwan last week was another increase in TSMC capital expenditures to $9B in 2013. That number could grow however. Last year TSMC CAPEX was set at $6B and ended up at $8.3B due to rapid 28nm capacity expansion and an accelerated 20nm program. 2013 will be all about FinFETs and manufacturing Apple SoCs so $9B may not cover it.

Taiwan weather was very nice last week, for me anyway. Cooler than normal, cool enough for me to wear a suit and tie, which I very rarely do. It is so rare that people joked and took pictures. And thank you to the Hsinchu Royal Hotel for upgrading me to a suite. You never know when you need a second bathroom in your hotel room.

In 2012 TSMC sales will grow a whopping 19%! TSMC revenue for 2013 is expected to grow 15-20% which is a conservative estimate in my opinion. TSMC 28nm will continue to break process node records and the mobile market will continue to drive economic growth. The semiconductor industry should also do well in 2013 with a predicted 5% growth versus a 3% contraction in 2012.

Speaking of 28nm, TSMC made significant progress in both yield and performance this quarter so we will see even more good 28nm die in 2013 and they will be faster. I give 100% credit to the gate-last implementation of HKMG. Interesting to note, TSMC actually started 28nm research using gate-first HKMG but changed to gate-last due to yield and manufacturing issues. Fortunately TSMC has the advantage of high volume production experience using a broad set of applications. Not just CPUs or GPUs, but FPGAs, SoCs, and dozens of other design types from more than one thousand customers.

IBM, on the other hand, does not have that breadth of fabless semiconductor customer experience, so they went with a gate-first approach at 28nm, leaving Common Platform partners GLOBALFOUNDRIES and Samsung way behind the manufacturing yield and performance curve. TSMC owns the 28nm node and that is why they will have another big revenue year in 2013. TSM stock at $20 in 2013! Believe it!

20nm will be a much more interesting node in regards to competition however. After learning the gate-first lesson, IBM is following TSMC with a gate-last HKMG implementation at 20nm. Unfortunately the added difficulty of 20nm double patterning and lithography challenges, which have yet to be solved at a production level, is causing delays. The fabless semiconductor ecosystem is working around the clock on this and I honestly expect a hockey stick 20nm production curve once this has been solved. Crowdsourcing at its finest!

The other big news was the Intel 22nm SoC process announcement at IEDM last week. I was very vocal about Intel not understanding the SoC business when they jumped into the foundry mix last year. This is a big first step but Intel still has a long way to go. I will do a much more detailed analysis in my next blog and you will see what I mean. Let me apologize in advance to the Intel PR people as I take some wind from their overblown sails.

While I enjoy my monthly trips to Taiwan it sure is good to be home. Absence does make the heart grow fonder.


Novocell team finishes record-breaking year with record number of new customers
by Eric Esteve on 12-14-2012 at 8:10 am

In this pretty shaky NVM IP market, where articles frequently mention legal battles rather than product features, it seems interesting to take a look at this Newsletter from Novocell Semiconductor starting with these words: “As the Christmas carols and festive music floods the airwaves (and the shopping areas) here in western Pennsylvania and the threat of cold and snow looms, the cheery band of engineers at Novocell are finding themselves overcome with the holiday spirit…”

This is a nice way to start a newsletter, but you can also find posts about very interesting topics, like this one about legacy nodes, where we discover that customer projects are increasing at nodes from 90nm and above:

Rumors of the demise of legacy nodes have been greatly exaggerated;

2012 customer project work and interest increases at nodes from 90nm to 350nm!

During the past 6 months, we have been seeing a noticeable increase in inquiries, quotes, and project wins for customer projects at nodes from 130nm, 180nm, 250nm and that old standby, 350nm.

Reading the industry press would give the impression that the world has moved to 28 and 40nm processes, and that a vast majority of the microelectronics/SoC industry is preparing to rollout on 20nm next year and on 14nm finFET as soon as TSMC’s new plant doors open in 2015…
… to be continued in the newsletter

If you didn’t know, you will learn why Novocell’s Smartbit family of OTP provides the highest level of foundry and process independence available, Novocell being the only anti-fuse vendor providing OTP at the larger process nodes.
Also very interesting to notice: Novocell has decided to propose a new type of license, the kind you usually see in the software or RTL IP market:

Novocell introduces new “Evaluation Period” License

As more and more firms have come knocking to inquire about antifuse OTP in general, and the advantages of our unique Smartbit technology in particular, we have encountered a large number of firms who have had experience with EEPROM and conventional ROM, but have not worked with memory quite like ours.

Perhaps like you, they are used to getting surprises downstream in projects: additional chip area needed for redundant bits and error-correction circuitry based on reliability requirements, or they commonly have been burned by other technologies’ integration difficulties or late-breaking news of needs for charge pumps, control circuitry, or other “gotchas”.

To help ease these concerns, Novocell has introduced an Evaluation License that gives your team 6 to 9 months to evaluate the Novocell IP macro of interest, at a cost reduced to cover support and IP updates, and allows conversion to a standard IP license upon moving to production.

Did you know that Novocell has built strong technology partnerships with major silicon foundries, like IBM, and that Novocell has completed the final validation step and has been approved as an IBM Foundry Business Partner, along with recognition of the foundational Novobits OTP product line as Ready for IBM Technology?

Then you should have a look at the newsletter!

Eric Esteve from IPNEST



Apache Presents: ESD analysis
by Paul McLellan on 12-13-2012 at 1:15 am

The 26th Conference on VLSI Design will be in Pune, India from January 5th to 10th at the Hyatt Regency. Details on the conference here. Registration here. I happened to be involved in the first of these conferences, which was held in Edinburgh where I was wrapping up my PhD. It was in the considerably less palatial surroundings of the Appleton Tower before it was modernized and was run on a shoe-string. We would have been surprised to have a time-traveler come back and tell us that the conference would still be going in 2013 and perhaps even more surprised that it would be in India.

At next month’s conference, Apache (I guess you all know it is a subsidiary of Ansys by now, don’t you?) is presenting a paper jointly with nVidia.

The paper is on Comprehensive Layout-based ESD Check Methodology with Fast Full-chip Static and Macro-level Dynamic Solutions, presented by Norman Chang and Jai Pollayil of Apache and Ting-Sheng Ku of nVidia on January 9th from 11am to 12.20pm (the dreaded presentation spot between the audience and lunch). By the way, I love the way the sessions are called “pre-lunch”, “post-lunch” and “post-afternoon-tea”. A bit like a cricket test match (which, for Americans and others who are totally bemused by cricket, takes place over 5 days with meal breaks, and makes baseball seem like a blink of an eye).

The paper examines the comprehensive ESD static/dynamic methodology that was developed for failure diagnosis and predictive simulation of improvements. This methodology focuses on full-chip static and dynamic analysis, including modeling of the die-level metal grid, substrate grid and well diode, package effective capacitance, and pogo pin. The paper includes real-world human body model (HBM) and charged device model (CDM) applications.

The paper focuses on an integrated methodology showing how full-chip static and block-based dynamic ESD analysis together provide comprehensive coverage of HBM/CDM issues, which are resistive-dominant, as well as the dynamic problems that require more detailed RLC and transistor-level modeling. This methodology uses PathFinder, an Apache tool that performs full-chip static analysis and constructs dynamic ESD circuits with complete parasitic models for simulation.
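
To give a feel for why HBM checking is largely resistive: the standard HBM tester model discharges a 100 pF capacitor through a 1.5 kΩ resistor, so the peak current is roughly the HBM voltage divided by 1.5 kΩ plus whatever resistance the on-chip discharge path adds, and the static check boils down to keeping the resulting IR drop at the victim device below a failure limit. The sketch below uses that standard model but entirely made-up path resistances, clamp voltage, and failure limit; the paper’s full-chip static analysis (and the dynamic RLC part) is of course far more involved.

    # A rough sketch of the kind of resistive (static) HBM check that a
    # full-chip static ESD tool automates. The 100 pF / 1.5 kohm values are
    # the standard HBM tester model; the path resistances, clamp voltage,
    # and failure limit below are made-up illustrative numbers.

    HBM_R = 1500.0     # ohms, series resistor in the HBM tester model
    HBM_C = 100e-12    # farads, HBM capacitor (sets the ~150 ns time constant)
    HBM_V = 2000.0     # volts, a common HBM qualification level

    def hbm_check(path_resistances_ohm, clamp_v=5.0, fail_limit_v=8.0):
        """Approximate the peak HBM current and the voltage at the victim
        device as clamp voltage plus IR drop along the discharge path."""
        r_path = sum(path_resistances_ohm)
        i_peak = HBM_V / (HBM_R + r_path)          # ~1.3 A at 2 kV
        v_victim = clamp_v + i_peak * r_path
        return i_peak, v_victim, v_victim <= fail_limit_v

    # Pad, bus, and via resistances (ohms) along one discharge path.
    i_peak, v_victim, ok = hbm_check([0.8, 1.2, 0.5])
    print(f"{i_peak:.2f} A peak, {v_victim:.2f} V at victim, pass={ok}")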


New HP Memristor Material Developments
by Ed McKernan on 12-12-2012 at 10:00 pm

At the recent NCCAVS Thin Film Users Group meeting in November, HP was on the program in the person of Joshua Yang, who gave a materials-centric look at the status of the HP ReRAM (memristor) program. A colleague passed on the informative set of slides presented at the meeting. Being a former process integration team leader, I was immediately struck by a couple of TEMs on the roadmap slide during my first scan of the presentation! Joining up a few dots and tracking down the reference (a joint paper with SK Hynix at the 2012 VLSI Technology Symposium*), the TEMs are probably from the Hynix/HP collaboration and show a 54nm (half pitch?) crossbar array fabricated over larger-technology-node CMOS. More over at ReRAM-Forum.com.

By Christie Marrian