Changing your IC Layout Methodology to Manage Layout Dependent Effects (LDE)

by Daniel Payne on 04-18-2012 at 12:38 pm

Smaller IC nodes bring new challenges to the art of IC layout for AMS designs, such as Layout Dependent Effects (LDE). If your custom IC design flow looks like the diagram below, then you're in for many time-consuming iterations, because where you place each transistor impacts its actual Vt and Idsat values, which are now a function of proximity to the well edge:


Source: EE Times, Mentor Graphics

Analog designs are most sensitive to variations in Vt and current levels, especially for circuit designs that need precise matching.

Engineers at Freescale Semiconductor wrote a paper on Layout Dependent Effects, presented at CICC, that quantifies how much Vt and Idsat change based on the distance of MOS devices from the edge of a well.


Well Proximity Effect (WPE), Source: Freescale Semiconductor

What they showed was that Vt becomes a function of proximity to the well edge, and its value can shift by as much as 50 mV:


Vt variation. Source: Freescale Semiconductor

Drain current levels can vary by 30% based on proximity to the well edge:


Id variation. Source: Freescale Semiconductor

EDA developers at Mentor Graphics decided to create a different IC design methodology to provide earlier visibility to the IC design team about how LDE is impacting circuit performance. Here’s the new flow:


Source: EE Times, Mentor Graphics

Design constraints about matching requirements are entered at the schematic design stage, then fed forward into an LDE estimator module for use during placement. A constraint would define, for example, the maximum allowed difference in Vt or Id between transistors that require matching.

While placement is being done, the LDE estimator module quickly determines how each MOS device's Vt and Id values are impacted, then compares that to the design constraints provided by the circuit designer, all before routing is started. The layout designer can continue to rearrange transistor placement until all constraints pass.

Notice that no extraction or SPICE circuit simulation is required during this LDE estimation phase: the layout designer interactively places MOS devices and verifies whether the layout passes or fails the constraints set by the circuit designer.
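To make the idea concrete, here is a minimal sketch, in Python, of the kind of matching-constraint check an LDE estimator performs during placement. The device names, the estimate_lde() helper and all the numbers are hypothetical; a real estimator derives the Vt and Id shifts from well-proximity models supplied by the foundry.

```python
# Hypothetical sketch of an LDE matching-constraint check during placement.
# estimate_lde() stands in for a foundry-calibrated well-proximity model.

def estimate_lde(distance_to_well_edge_um):
    """Toy model: Vt shift (mV) and Id shift (%) decay with distance to the well edge."""
    vt_shift_mv = 50.0 / (1.0 + distance_to_well_edge_um)   # up to ~50 mV near the edge
    id_shift_pct = 30.0 / (1.0 + distance_to_well_edge_um)  # up to ~30 % near the edge
    return vt_shift_mv, id_shift_pct

# Matching constraints from the schematic (hypothetical values).
constraints = {("M1", "M2"): {"max_dvt_mv": 2.0, "max_did_pct": 1.0}}

# Current placement: distance of each matched device to the nearest well edge (um).
placement = {"M1": 0.8, "M2": 4.5}

for (a, b), limit in constraints.items():
    vt_a, id_a = estimate_lde(placement[a])
    vt_b, id_b = estimate_lde(placement[b])
    dvt, did = abs(vt_a - vt_b), abs(id_a - id_b)
    status = "PASS" if dvt <= limit["max_dvt_mv"] and did <= limit["max_did_pct"] else "FAIL"
    print(f"{a}/{b}: dVt={dvt:.1f} mV, dId={did:.1f} %  -> {status}")
```

In this toy run the M1/M2 pair fails, so the layout designer would move the devices further from the well edge (or equalize their distances) and re-check, without ever launching an extraction or SPICE run.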

Test Results
A two-stage Miller OTA amplifier circuit was designed and put through the new methodology.

Schematic capture and layout were done with Pyxis, extraction using Calibre and circuit simulation with Eldo. The target Gain and Bandwidth specs were first met by transistor sizing and circuit simulation, with results shown below:


The first layout iteration was done using the traditional IC flow, without LDE estimation, and the extracted netlist failed both the Gain and Bandwidth specs:

Next, layout was done with the LDE estimator module during placement to give the layout designer early feedback on MOS device constraints. The new layout is only slightly different from the previous one and, most importantly, it meets the Gain and Bandwidth specifications:

Here's a table that summarizes the change in Vt and Id values for each MOS device between the first placement and the final placement:

Using the methodology of LDE estimation during placement produced an analog opamp with Vt variations that were up to 10X smaller, and Id variations that were up to 9X smaller.

Summary
Analog circuits are the most sensitive to LDE, so consider a methodology that provides feedback on layout quality while you are still interactively placing MOS devices, instead of waiting until routing, extraction and circuit simulation are complete. This methodology is all about early feedback, which actually speeds up analog design closure.



ARM Seahawk

by Paul McLellan on 04-17-2012 at 8:27 pm

I wrote on Monday about ARM's Processor Optimization Packs (POPs). Yesterday in Japan, ARM announced Seahawk, a hard macro implementation of the quad-core Cortex-A15 in the TSMC 28HPM process. It is the highest-performance ARM implementation to date, running at over 2 GHz.

The hard macro was developed using ARM Artisan 12-track libraries and the corresponding Processor Optimization Pack announced a couple of days ago. Full details will be announced at the CoolChips conference in Yokohama, Japan today. It delivers three significant firsts for the ARM hard macro portfolio: it is the first quad-core hard macro, the first hard macro based on the highest-performance ARMv7-architecture Cortex-A15 processor, and the first hard macro built on a 28nm process.

The ARM press release is here. A blog entry about the core is here.


Previewing Intel’s Q1 2012 Earnings

by Ed McKernan on 04-17-2012 at 9:15 am

Since November of 2011, when Intel preannounced it would come up short in Q4 due to the flooding in Thailand that took out a significant portion of the HDD supply chain, the analysts on Wall St. have been in the dark as to how to model 2012. Intel not only came up short in Q4 but effectively punted on Q1 as well by starting the early promotion of Ivy Bridge ultrabooks at the CES show in January. Behind the scenes, Intel made a hard switch to ramping 22nm production at three fabs faster than is typical, in order to cross the chasm and leave AMD and nVidia behind. But that is not all: I believe Paul Otellini will spend considerable time discussing the availability of wafers at Intel relative to TSMC and Samsung in supplying the demand expected to come from this year's Mobile Tsunami.

As mentioned in previous writings, the capital expenditures put forth by Intel in 2011 and expected in 2012 point to a company that expects to nearly double in size (wafer capacity) by the end of 2013. Single-digit PC growth and mid-teen server growth cannot soak up all the new wafers; the demand has to come from another high-volume segment. I have speculated that it is Apple and other tablet and smartphone OEMs. In rough numbers it would be on the order of 400MU of mobile processing capacity, or a combination of processors and 3G/4G silicon. Either way, it was a big bet on Intel's part to go out and expand their fab footprint.

In the last few weeks there has been a series of articles on this site and in EE Times that at first argued TSMC was having yield issues at 28nm. As time has gone on, it appears that it was not yield issues but capacity, or the lack thereof. TSMC's customers made forecasts two to three years ago, during the worst part of the economic crisis, that did not account for the step-function increase in demand for leading-edge capacity to service our Mobile Tsunami build-out. The difficulty for any foundry is to modulate the demands of multiple inputs; TSMC has to be wary of double counting, which leads to four or five vendors each expecting to own 200% of the ARM processor or wireless baseband chip market. Intel, however, did make the bet, probably based on the strength of their process technology.

But there are intriguing questions on Intel's side as well. For the past year, I have observed and noted that the ASPs on Intel chips no longer fall every 6 to 8 weeks like they did under the old model. Those regular price cuts were part of a strategy to keep competitors gasping for air as they tried to keep up. The change seems to say that Intel can now set prices at will.

Even more interesting is the fact that the first Ivy Bridge parts to be introduced are in the mid to high-end range, which is different from what Intel did in the past. The low-end Ivy Bridge will not arrive until late Q3. This says there is either very high demand for Ivy Bridge, or they can't build enough, or both. Ramping production at three fabs means a lot of wafers are headed down the line with the goal of getting yield up sooner. Is the 22nm tri-gate process one that inherently has lower yield? If the answer is that Intel will get into high-yield mode this summer, then they have the flexibility of selling FREE $$$ Atoms into the smartphone space with the goal of attaching higher-ASP 3G/4G baseband chips – this is my theory as to how they ramp revenue starting in late 2012 and through 2013, before TSMC and Samsung can catch up on 28nm capacity. Apple, which just launched its new iPad with the A5X built on an antiquated 45nm process, will be taking lots of notes today.

FULL DISCLOSURE: I am long INTC, AAPL, QCOM, ALTR



Laker Wobegon, where all the layout is above average

by Paul McLellan on 04-17-2012 at 4:00 am

TSMC's technology symposium seems to be the new time to make product announcements, with ARM and Atrenta yesterday and Springsoft today.

There is a new incarnation of Springsoft's Laker layout family, Laker³ (pronounced three, not cubed). The original version ran on its own proprietary database. The second version added OpenAccess to the mix, but with an intermediate layer to allow both databases to work. Laker³ bites the bullet and uses OpenAccess as its only native database. This gives it the performance and capacity for 28nm and 20nm flows.

There are a lot of layout environments out there. Cadence, of course, has Virtuoso. Synopsys already had one of its own and, with the acquisition of Magma, now has a second. Mentor is in the space, and so are some startups. Springsoft held an executive pre-release party on Thursday last week (what EDA tool doesn't go better with a good Chardonnay?), and one senior person (who had better remain nameless, since I don't think it was meant to be an official statement of his employer) said he thought that by the time we get to 20nm only a couple of layout systems will have the capability to remain standing, and Springsoft would be one of them.

There are three big new things in Laker³. The first is the switch to OpenAccess. But they didn't just switch; they also rewrote all of the disk-access code, so there is a performance increase of 2-10X on things like reading in designs or streaming out GDSII. And since many intermediate operations also read and write data to disk, it is not just the obvious candidates that speed up.

The second is that previous versions of Laker had a table-driven DRC. That has been completely rewritten, since simple width- and spacing-type rules are no longer adequate ('simple' is not a word that anyone would use about 28nm design rules, let alone 20nm with double patterning and other weird stuff). The new DRC can handle these types of rules, but it is not positioned as a signoff DRC; it is used by all the rule-driven functions and by place and route. On a "trust but verify" basis, Calibre is also built into Laker in the form of Calibre RealTime, which runs continuously in the background giving instant feedback using the signoff rule deck. Since no designer can actually comprehend design rules any more, this is essential. The alternative, as one customer of another product complained, is having to stream out the whole design every 15 minutes and kick off a Calibre run.
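For readers who haven't written layout checks before, here is a minimal sketch, in Python, of what a simple table-driven width and spacing check looks like. The rule values and rectangle representation are invented for illustration; real 28nm/20nm rules are far more context-dependent than this.

```python
# Toy table-driven DRC: minimum width and spacing checks on axis-aligned
# rectangles, per layer. Rule values below are illustrative only.
RULES = {"M1": {"min_width": 0.050, "min_space": 0.064}}  # microns, hypothetical

def width(rect):
    (x1, y1, x2, y2) = rect
    return min(x2 - x1, y2 - y1)

def spacing(a, b):
    """Edge-to-edge distance between two rectangles (0 if they touch/overlap)."""
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = a, b
    dx = max(bx1 - ax2, ax1 - bx2, 0.0)
    dy = max(by1 - ay2, ay1 - by2, 0.0)
    return (dx * dx + dy * dy) ** 0.5

def check_layer(layer, rects):
    rule = RULES[layer]
    for i, r in enumerate(rects):
        if width(r) < rule["min_width"]:
            print(f"{layer} width violation on shape {i}: {width(r):.3f} um")
        for j in range(i + 1, len(rects)):
            s = spacing(r, rects[j])
            if 0.0 < s < rule["min_space"]:
                print(f"{layer} spacing violation between {i} and {j}: {s:.3f} um")

check_layer("M1", [(0.0, 0.0, 0.04, 1.0), (0.10, 0.0, 0.20, 1.0)])
```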

The third big development is an analog prototyping flow. One big difference is that most constraints (which tell the placer what to do) are generated automatically, as opposed to the user having to provide a complex text file of constraints. Symmetrical circuits are recognized by tracing current flow, and common analog and digital subcircuits such as current mirrors are recognized too. The library of matched devices is extendible, so the prototyping flow gets smarter over time as the idiosyncrasies of the designer, design or company get captured. There have been numerous attempts to improve the level of automation in analog layout, and the hillside is littered with the bodies. This one looks to me as if it strikes a good balance, automating routine work while still leaving the designer in control (analog design will never be completely automatic, let's face it).
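To give a feel for what "recognizing a common analog subcircuit" means, here is a minimal sketch, in Python, that spots a simple NMOS current mirror in a flat netlist: a diode-connected reference device plus a second device sharing the same gate and source nets. The netlist format and device names are hypothetical, and a production recognizer handles many more topologies and device variants.

```python
from collections import namedtuple

# Hypothetical flat netlist: (name, drain, gate, source, device type)
Mos = namedtuple("Mos", "name drain gate source mtype")

netlist = [
    Mos("M1", "nbias", "nbias", "gnd", "nmos"),  # diode-connected reference device
    Mos("M2", "out",   "nbias", "gnd", "nmos"),  # mirror output device
    Mos("M3", "out",   "in",    "vdd", "pmos"),
]

def find_current_mirrors(devices):
    """Return (reference, output) device pairs forming simple current mirrors."""
    mirrors = []
    diode_connected = [d for d in devices if d.gate == d.drain]
    for ref in diode_connected:
        for dev in devices:
            if (dev is not ref and dev.mtype == ref.mtype
                    and dev.gate == ref.gate and dev.source == ref.source):
                mirrors.append((ref.name, dev.name))
    return mirrors

print(find_current_mirrors(netlist))   # [('M1', 'M2')] -> candidates for matching constraints
```

Once a pair like M1/M2 is identified, the tool can automatically attach matching and symmetry constraints to it rather than asking the user to write them out by hand.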

Laker was for a time regarded, somewhat unfairly, as "only used by people in Taiwan", where admittedly it has become the dominant tool. But two of the top five fabless semiconductor companies have standardized on Laker, and five of the top ten semiconductor companies are using it. And the hors d'oeuvres in the edible spoons at the launch party were pretty neat.

More details on Laker³ are here.



Soft Error Rate (SER) Prediction Software for IC Design

by Daniel Payne on 04-16-2012 at 10:00 am

My first IC design, in 1978, was a 16Kb DRAM chip at Intel, and our researchers discovered the strange failure mode of soft errors caused by alpha particles from the packaging and by neutrons, which are more prominent at higher altitudes such as Denver, Colorado. Before today, if you wanted to know the Soft Error Rate (SER) you had to fabricate a chip and then submit it to a specialized testing company to measure the Failure In Time (FIT) levels. It can be very expensive to have an electronic product fail in the field because of soft errors, and SER levels are only increasing with smaller process nodes.


Intel 2117, courtesy of www.cpumuseum.com

Causes of SER
Shown below are the three causes of SER:
• Neutrons found in nature can strike silicon, creating alpha particles
• Impurities in packaging materials emit alpha particles
• Boron impurities can create alpha particles

When an alpha particle strikes the IC it can upset the charge in a memory cell or flip-flop, causing it to change states, leading to a temporary logic failure.

SER Prediction Software
The good news is that today a company called iROC announced two software tools that allow IC designers to predict and pinpoint the layout and circuit locations that are most susceptible to high FIT levels:

• TFIT (Transistor Failure In Time)
• SOCFIT (SOC Failure In Time)

The TFIT tool reads in a Response Model provided by the foundry, your SPICE netlist, and the GDSII layout, then runs a SPICE circuit simulation using HSPICE or Spectre (it can be adapted to work with Eldo, etc.). The output from TFIT is the FIT rate of each cell, and it can show you which transistors are most easily upset by neutron particles so that you can reduce your design's sensitivity. This simulation run takes tens of minutes.

SRAM designers can add Error Correcting Codes (ECC) to their designs to mitigate FIT; a flip-flop, however, has no ECC, so one choice is to harden the FF, which creates a cell that is 2X or 3X the size and power.
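To illustrate what ECC buys you, here is a minimal sketch, in Python, of a textbook Hamming(7,4) code: a single bit flip caused by a particle strike in any of the seven stored bits can be detected and corrected when the word is read back. This is a classroom example, not a description of how any particular SRAM compiler implements ECC.

```python
# Textbook Hamming(7,4): encode 4 data bits with 3 parity bits, then detect
# and correct a single flipped bit (e.g. a soft-error upset) on readback.

def encode(d):                       # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    # codeword positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):
    # Syndrome: each parity check covers the positions whose index has that bit set.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if clean
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1         # correct the single-bit upset
    return [c[2], c[4], c[5], c[6]]  # recovered data bits

word = [1, 0, 1, 1]
stored = encode(word)
stored[5] ^= 1                       # simulate an alpha/neutron-induced bit flip
assert decode(stored) == word        # the upset is corrected on read
print("corrected:", decode(stored))
```

This is also why hardened flip-flops are so costly by comparison: the flip-flop has no word-level redundancy to lean on, so the protection has to be built into the cell itself.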

A FF netlist can be analyzed by TFIT in about 10-20 minutes.

The SER data contains the FIT info for all FF and SRAM cells, as well as combinational logic.

SOCFIT can be run on either the RTL or the gate-level netlist, and has a capacity of 10+ million FFs. It uses a static timing analysis tool (Synopsys PrimeTime, Cadence), and can also use simulation tools for fault injection (Synopsys, Cadence). It first runs a static analysis on the RTL or gates to determine the overall FIT rate; if your design is marginal, you can then run a dynamic analysis using fault injection (typically a 10 hour run time). This approach could use emulation to speed up results in the future.
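Conceptually, the static part of such an analysis rolls per-cell FIT rates up to the chip level, weighted by derating factors that account for how often an upset actually becomes an observable failure. The sketch below shows that arithmetic only; the cell names, FIT values and derating factors are made up, and this is not a description of SOCFIT's internal algorithm.

```python
# Hypothetical roll-up of chip-level soft-error FIT from per-cell FIT rates.
# FIT = failures per 1e9 device-hours. All numbers below are illustrative.

cells = {
    # cell type: (FIT per instance, instance count)
    "dff_std":  (1.0e-3, 2_000_000),
    "sram_bit": (5.0e-4, 32_000_000),
}

derating = {
    # Fraction of raw upsets that become observable failures (architectural /
    # timing vulnerability factors), per cell type. Illustrative values.
    "dff_std":  0.10,
    "sram_bit": 0.02,   # low because ECC corrects most single-bit upsets
}

chip_fit = sum(fit * count * derating[name]
               for name, (fit, count) in cells.items())

print(f"Estimated chip-level SER: {chip_fit:.1f} FIT")
mtbf_hours = 1e9 / chip_fit
print(f"Equivalent MTBF: {mtbf_hours:,.0f} hours")
```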

The SOCFIT tool answers the question, "Which cells are the most sensitive in my design?"

You can even run SOCFIT before final tapeout, while the logic is still changing. SOCFIT has been under development for 8 years now, and iROC has seen good correlation between prediction and actual measurement.

SER Info
Both memory and logic have SER issues, even FF circuits, but combinational logic less so because of its high drive.

One particle can now upset multiple memory bits in nodes of 40nm and smaller.

SRAM is more sensitive to neutron particles than FFs, and DRAMs are less sensitive still because alpha particles impact leakage.

Flash memory is even less sensitive than DRAM to Single Event Upsets (SEU).

The FPGA architecture is most sensitive to SER because of its heavy use of FF cells.

Bulk CMOS is more sensitive than SOI.

FinFET is new, so iROC is just starting to analyze it from an R&D viewpoint using 3D TCAD models. You can expect to see more data later in the year.

TFIT covers all voltages and process variations.

TSMC provides the Response Model input to TFIT; in the past TSMC has provided SER data to customers based on testing, not simulation.

iROC – The Company
iROC (Integrated RObustness On Chip) has a mission to analyze, measure and improve SER on ICs. They have been providing SER testing services since 2000, bringing chips to a cyclotron and exposing them to neutron beams to replicate 10 years of life in just minutes. iROC also partners with foundries like TSMC and GLOBALFOUNDRIES.

Competition for the iROC approach comes mostly from internally developed R&D tools at IDMs.

Some 500 chips have been tested so far, so iROC understands the problems and how to prevent them from becoming catastrophic.

Summary
iROC is the first commercial EDA company to offer two SER analysis tools, used at the cell and SOC levels, and the tool results correlate well with actual measurements on silicon chips. This will be an exciting company to watch as it grows a new EDA tool category in the reliability analysis segment.


Atrenta's Spring Cleaning Deal

by Paul McLellan on 04-16-2012 at 9:00 am

Atrenta is having a special offer to let you "spring clean" your IP for free. They are providing two weeks of free access to the Atrenta IP Kit, starting today, April 16th, and running until the end of May. During this period, qualified design groups in the US will be able to use the kit for two consecutive weeks to "spring clean" their third-party or internally developed IP blocks at no cost.

Atrenta's IP Kit is also used by TSMC to qualify soft IP for inclusion in the TSMC 9000 IP library. See my blog here. Plus, it is TSMC's technology symposium tomorrow.

The IP Kit generates two important reports: the Atrenta DashBoard and the DataSheet.

The Atrenta DashBoard provides a pass/fail status for all IP blocks. It shows the status of the block for key design objectives such as CDC, power, test, timing constraints and more. It also reflects the overall readiness of the IP as measured by various quality goals. User-defined success criteria are used to report tolerance to fatals, errors and warnings. Designers are able to drill down to get additional information on the exact violations reported, as well as access trend data that shows overall progress toward achieving a passing status over time. A SpyGlass Clean report has no failures reported.

The second report is the Atrenta DataSheet. This report focuses on IP characteristics. Once the DashBoard report is "clean," the DataSheet acts as a final handoff document that captures key information about the IP block, such as the I/O table, clock trees, reset trees, final power spec, test coverage, constraints coverage and more. Especially useful when a block is being integrated, the report gathers this key information into one easy-to-read HTML document.

And if you really get carried away with the idea of spring cleaning, my condo could do with some attention.

Details on the IP Kit Spring Cleaning promotion are here.

And Atrenta's geek friend has his own take (1.5 mins):


High Yield and Performance – How to Assure?

by Pawan Fangaria on 04-16-2012 at 7:30 am

In today's era, high-performance mobile devices are asserting their place in every gizmo we play with, and guess what enables them to work efficiently behind the scenes – large chunks of memory with low power and high speed, packed as densely as possible. The ever-growing requirements for power, performance and area have led us to process nodes like 20nm, which bring the burgeoning challenge of extreme process variation limiting yield. There is no escaping the need to detect the failure rate early in the design cycle to assure high yield.

In the case of memory, there can be billions of bit cells with column selectors and sense amplifiers, and you can imagine the read/write throughput on those cells. Although redundant columns and error correction mechanisms are provided, they are not sufficient to tolerate bit cell failures above a certain number. The requirement here is to detect failure rates in the range of 6 sigma.

So, how do we detect failure at such high precision? Traditional methods are mostly based on Monte Carlo (MC) simulation, an idea first developed by Stanislaw Ulam, John von Neumann and Nicholas Metropolis in the 1940s. To get a feel for this, consider a bit cell of 6 transistors with 5 process variables per device, for a total of 30 process variables. Below is the QQ plot of the distribution of bit cell read current (cell_i) on the x-axis and the cumulative density function (CDF) on the y-axis. Each dot on the graph is an MC sample point; one million samples were simulated.


QQ plot of bit cell read current with 1M MC samples simulated

The QQ curve is a representation of the response of the output to the process variables. The bend in the middle of the curve indicates a quadratic response in that region, and the sharp drop-off at the bottom left indicates that the circuit cuts off in that region. Clearly, any method that assumes a linear response will be extremely inaccurate.
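For reference, this is how such a normal QQ plot is typically constructed from MC samples. The toy bit-cell response below is invented purely so the code runs stand-alone; it is not the circuit from the white paper.

```python
# Build a normal QQ plot from Monte Carlo samples of a (toy) circuit output.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
n_samples, n_vars = 100_000, 30          # e.g. 6 devices x 5 process variables

# Toy stand-in for SPICE: a mildly nonlinear map from process variables to cell_i.
x = rng.standard_normal((n_samples, n_vars))
cell_i = 20e-6 + 1e-6 * x[:, 0] + 0.2e-6 * x[:, 1] ** 2 - 0.5e-6 * np.maximum(x[:, 2], 0)

# QQ plot: sorted output values vs. the normal quantiles of their empirical CDF.
sorted_i = np.sort(cell_i)
probs = (np.arange(1, n_samples + 1) - 0.5) / n_samples
sigma_axis = norm.ppf(probs)             # the CDF expressed in sigma on the y-axis

plt.plot(sorted_i * 1e6, sigma_axis, ".", markersize=1)
plt.xlabel("bit cell read current, cell_i (uA)")
plt.ylabel("standard normal quantile (sigma)")
plt.title("Normal QQ plot of cell_i (toy model)")
plt.show()
```

If the output were perfectly Gaussian the points would fall on a straight line; bends, kinks and stripes like the ones in the figures are exactly the nonlinear signatures discussed here.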

Now consider the QQ plot for delay of a sense amplifier having 125 process variables.


QQ plot of delay of sense amplifier with 1M MC samples simulated

The three stripes indicate three distinct sets of delays, i.e. discontinuities; a small step in process-variable space sometimes leads to a major change in performance. Such strong nonlinearities make linear and quadratic models fail completely. It must also be noted that the above result is obtained from 1M MC samples, which only covers the distribution out to about 4 sigma. For 6 sigma, one would need about 1 billion MC samples, which is not practical.
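The 1-billion-sample figure follows directly from Gaussian tail probabilities, as the short calculation below shows. It uses only the standard normal tail; in practice you would want to observe several failures, not just one, so the real sample budget is even larger.

```python
# Why 6-sigma yield verification is impractical with plain Monte Carlo:
# the expected number of samples needed just to see one failure.
from math import erfc, sqrt

def tail_prob(k_sigma):
    """One-sided Gaussian tail probability P(Z > k)."""
    return 0.5 * erfc(k_sigma / sqrt(2.0))

for k in (3, 4, 5, 6):
    p = tail_prob(k)
    print(f"{k}-sigma: P(fail) ~ {p:.2e}, ~{1 / p:.1e} samples per expected failure")

# Output (approximately):
# 3-sigma: P(fail) ~ 1.35e-03, ~7.4e+02 samples per expected failure
# 4-sigma: P(fail) ~ 3.17e-05, ~3.2e+04 samples per expected failure
# 5-sigma: P(fail) ~ 2.87e-07, ~3.5e+06 samples per expected failure
# 6-sigma: P(fail) ~ 9.87e-10, ~1.0e+09 samples per expected failure
```

This also matches the 1M-sample plots above: at 4 sigma a million samples yield only a few dozen failures, and at 6 sigma they would almost certainly yield none.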

In order to detect rare failures with fewer samples, many variants of the MC method and other analytical methods have been tried, but each of them falls short in robustness, accuracy, practicality or scalability. Some of them can only handle 6 to 12 process variables. A survey of these methods is provided in a white paper by Solido Design Automation.

Solido has developed a new method, which they call HSMC (High Sigma Monte Carlo), that is promising: fast, accurate, scalable, verifiable and usable. The method has been implemented as a high-quality tool in the Solido Variation Designer platform.

The HSMC method prioritizes simulations toward the most-likely-to-fail cases through adaptive learning with feedback from SPICE. It never rejects a sample that might cause a failure, which increases accuracy. The method can produce the extreme tails of the output distributions (as in a QQ plot), using real MC samples and SPICE-accurate results, in hundreds to a few thousand simulations. The flow goes something like this –

1. Extract 6-sigma corners by simply running HSMC, opening the resulting QQ plot, selecting the point at the 6-sigma mark, and saving it as a corner.
2. Try the bit cell or sense amplifier design with different sizings. For each candidate design, one only needs to simulate at the corner(s) extracted in the 1st step. The output performances are at "6-sigma yield", but with only a handful of simulations.
3. Finally, verify the yield by doing another run of HSMC. The flow concludes if there are no significant interactions between process variables and outputs, which is generally the case. Otherwise, a re-loop is done by choosing a new corner, designing against it and verifying.
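As a purely conceptual illustration of "prioritizing simulations toward the most-likely-to-fail cases" – and emphatically not Solido's actual HSMC algorithm – the sketch below ranks a large pool of candidate MC samples with a cheap surrogate model, spends the expensive-simulation budget on the highest-ranked candidates, and refits the surrogate as results come back. The toy expensive_sim() function stands in for SPICE, and all parameters are invented.

```python
# Conceptual sketch of failure-prioritized sampling (not Solido's HSMC).
import numpy as np

rng = np.random.default_rng(1)
N_VARS, N_CANDIDATES, BUDGET, BATCH = 30, 200_000, 2_000, 200
FAIL_LIMIT = 14.0                      # hypothetical spec limit on the output

def expensive_sim(x):
    """Toy stand-in for a SPICE run: nonlinear response to process variables."""
    return x[:, 0] + 0.5 * x[:, 1] ** 2 + 2.0 * np.maximum(x[:, 2] - 2.0, 0.0) + 10.0

candidates = rng.standard_normal((N_CANDIDATES, N_VARS))
simulated_x, simulated_y = [], []

# Seed the surrogate with a small random batch of real simulations.
idx = rng.choice(N_CANDIDATES, BATCH, replace=False)
remaining = np.setdiff1d(np.arange(N_CANDIDATES), idx)
simulated_x.append(candidates[idx]); simulated_y.append(expensive_sim(candidates[idx]))

while sum(len(y) for y in simulated_y) < BUDGET:
    X = np.vstack(simulated_x); y = np.concatenate(simulated_y)
    # Cheap linear surrogate fit by least squares (a real tool would be smarter).
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = np.hstack([candidates[remaining], np.ones((len(remaining), 1))]) @ coef
    # Simulate the candidates the surrogate predicts are closest to failing.
    order = np.argsort(-pred)[:BATCH]
    pick = remaining[order]
    remaining = np.setdiff1d(remaining, pick)
    simulated_x.append(candidates[pick]); simulated_y.append(expensive_sim(candidates[pick]))

y_all = np.concatenate(simulated_y)
print(f"simulations run: {len(y_all)}, failures found: {(y_all > FAIL_LIMIT).sum()}")
```

The point of the exercise is the budget: a few thousand carefully chosen simulations uncover tail failures that uniform sampling would need orders of magnitude more runs to find, which is the spirit of the HSMC results shown next.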

Let's look at the results of HSMC applied to the same bit cell and sense amplifier designs –


Bit cell_i – 100 failures in first 5000 samples


Sense amp delay – 61 failures in first 9000 samples


QQ plot of cell_i – 1M MC samples and 5500/100M HSMC samples
MC would have taken 100M samples against 5500 with HSMC


QQ plot of sense amp delay – 1M MC samples and 5500/100M HSMC samples

The process is extended further to reconcile global (die-to-die, wafer-to-wafer) and local (within-die) statistical process variation. It is clear that this method is fast, because only a handful of samples need to be simulated; accurate, because no likely failure is rejected; scalable, because it can handle hundreds of process variables; and also verifiable and usable.

The details can be found in the white paper, "High-Sigma Monte Carlo for High Yield and Performance Memory Design", written by Trent McConaghy, Co-founder and CTO, Solido Design Automation, Inc.

By Pawan Kumar Fangaria
EDA/Semiconductor professional and Business consultant
Email: Pawan_fangaria@yahoo.com


Making your ARMs POP

by Paul McLellan on 04-16-2012 at 6:30 am

Just in time for TSMC's technology symposium (tomorrow), ARM have announced a whole portfolio of new Processor Optimization Packs (POPs) for TSMC 40nm and 28nm. For most people, me included, the first question was 'What is a POP?'

A POP is three things:

• physical IP
• certified benchmarking
• implementation knowledge

Basically, ARM takes their microprocessors, which are soft cores, and implements them. Since so many of their customers use TSMC as a foundry, the various TSMC processes are obviously among the most important. They examine the critical paths and the cache memories and design special standard cells and other elements to optimally match the processor to the process. They don't do this just once; they pick a few sensible implementation choices (highest-performance quad-core for networking, medium-performance dual-core for smartphones, lowest-power single-core for low-end devices). A single POP contains all the components necessary for all these different power/performance/area points. Further, although we all casually say things like 'TSMC 40nm', in fact TSMC has two or three processes at each node to hit different performance/power points, so they have to do all of this several times.

Then they provide the performance benchmarks that they managed to achieve, along with all the detailed implementation instructions as to how they did it. These are EDA tool chain independent since customers have different methodologies. But the combination of IP and documentation should allow anyone to reproduce their results or get equivalent results with their own implementations after any changes that they have made for their own purposes and to differentiate themselves from their competitors.

Companies using the POPs get noticeably better results than simply using the regular libraries and doing without the specially optimized IP.

About 50% of licensees of the processors for which POPs are available seem to have licensed them; currently 28 companies are using them. Here's a complete list of the POPs:

Of course ARM has new microprocessors in development (for example, the 64-bit ones already announced) and they are also working closely with foundries at 20nm and 14nm (including FinFETs). So expect that when future microprocessors pop out, a POP will pop out too.
