SpringSoft Community Conferences
by Paul McLellan on 07-18-2011 at 5:31 pm

During the next 6 months or so, SpringSoft will be running a dozen community conferences. These are open not just to users but to anyone interested in SpringSoft’s technology.

There will be three conferences in the US in October, in Irvine, Austin and San Jose; for more details as they become available check here. There will also be three in Bangalore (India), Seoul (Korea) and Yokohama (Japan).

But the first three, coming up next month, are in Taiwan and China.

August 4th in Hsinchu, Taiwan (八月四号在新竹台灣)

August 10th in Shanghai, China (八月十号在上海中国)

August 12th in Beijing, China (八月十二号在北京中国)

All have the same agenda. The morning will consist of two keynotes followed by lunch. Then in the afternoon there are parallel sessions covering either functional verification (Verdi and ProtoLink) or physical layout (Laker). The day wraps up at 5pm with closing remarks and a drawing for an iPad2.

Full details of the Asian seminars are here.


Variation Analysis
by Paul McLellan on 07-18-2011 at 1:33 pm

I like to say that “you can’t ignore the physics any more” to point out that we have to worry about lots of physical effects that we never needed to consider. But “you can’t ignore the statistics any more” would be another good slogan. In the design world we like to pretend that the world is pass/fail. But manufacturing is actually a statistical process and isn’t pass/fail at all. One area that is getting worse with each process generation is process variation and it is now breaking the genteel pass/fail model of the designer.

For those of you interested in variation, there is an interesting research note from Gary Smith EDA. One of the biggest takeaways is that, of course, you are interested in variation if you are designing ICs in a modern process node, say 65nm or below. In a recent survey of design engineering management, 37% identified variation-aware design as important at 90nm, rising to 95-100% at 28nm and 22nm. If you are not worrying about variation now, you probably should be and certainly will be; 65nm seems to be the tipping point.

Today only about a quarter of design organizations have variation-aware tools deployed, with another quarter planning to deploy this year. The only alternative to using variation-aware tools is to guard-band everything for worst-possible-case behavior. The problem is that at the most advanced process nodes there isn't really any way to do this: the worst-case variation is just too large. The basic problem is well illustrated by this diagram: for some parameter the typical (mean) performance advances nicely from node to node, but the worst-case performance advances far less, since a value several standard deviations from the mean hardly moves (and can even get worse). Inadequate handling of variation shows up as worse performance in some metric, forces respins when the first design doesn't work, or, when problems are detected late in the design cycle, leads to tapeout delays.
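
A minimal numerical sketch of that effect, using made-up delay and sigma numbers purely for illustration (these are not figures from the report):

```python
# Illustrative only: hypothetical numbers, not measured process data.
# Shows why worst-case (guard-banded) performance improves far less than
# typical performance when variation (sigma) grows at each node.

nodes = {
    # node: (mean_delay_ps, sigma_ps) for some notional critical path
    "65nm": (100.0, 5.0),
    "45nm": (80.0, 6.0),
    "28nm": (65.0, 8.0),
}

for node, (mean, sigma) in nodes.items():
    worst_case = mean + 3 * sigma  # simple 3-sigma guard band
    print(f"{node}: typical {mean:.0f} ps, 3-sigma worst case {worst_case:.0f} ps")

# Typical delay improves 100 -> 65 ps (35% better), but the 3-sigma corner
# only improves 115 -> 89 ps (~23% better); with larger sigma it can even regress.
```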

All the main foundries have released reference flows that incorporate variation analysis tools, primarily from Solido Design Automation.

Solido is the current leader supplying tools to address variation. The tools are primarily used by people designing at the transistor level: analog and RF designers, standard-cell designers, memory designers and so on. STARC in Japan recently did a case study in which the Solido variation tools exceeded STARC's performance specifications across process corner and local mismatch conditions. Solido is also in the TSMC 28nm AMS 2.0 reference flow and has been silicon validated.

Gary Smith’s full report is here.
Solido’s website is here.
TSMC AMS 2.0 Wiki is here.


Richard Goering does Q&A with ClioSoft CEO
by Daniel Payne on 07-18-2011 at 11:05 am

Richard Goering is well known from his editorial days at EE Times (going back some 25 years). Now at Cadence, he blogs at least once a week on EDA topics that touch Cadence tools.

Before DAC he talked with ClioSoft CEO Srinath Anantharaman about how Cadence tools work together with ClioSoft tools to keep IC design data management simple.

Through just nine questions Richard finds out where ClioSoft came from, how their tools work inside of a Cadence IC flow, and what is new at DAC this year.

Also Read

Hardware Configuration Management at DAC

Cadence Virtuoso 6.1.5 and ClioSoft Hardware Configuration Management – Webinar Review

How Avnera uses Hardware Configuration Management with Virtuoso IC Tools


Intel Briefing: Tri-Gate Technology and Atom SoC
by Daniel Nenni on 07-17-2011 at 3:00 pm

Sorry to disappoint, but my two hours at the Intel RNB (Robert Noyce Building) were a very positive experience. It is much more fun writing negative things about industry leaders because I enjoy the resulting hate mail and personal attacks, but the candor and transparency of the Intel guys won me over. They even asked ME questions, which was a bit telling. I also picked up a very nice Intel hat. I now blog for hats!

The first meeting was with Jon Carvill, Mobile Media guy at Intel. Before that he was VP of Communications at GlobalFoundries and Head of PR at AMD/ATI. I worked with Jon at GlobalFoundries; he's a stand-up guy, very technical, and almost 7 feet tall. I don't take pictures with Jon since he makes me look like a dwarf.

The second meeting was with Rob Willoner, a long-time manufacturing guy at Intel, and Radoslaw Walczyk, an Intel PR guy. You can find Rob's Tri-Gate presentation HERE. In these types of meetings you watch the face of the PR guy when the technology guy answers questions. If the PR guy flinches, you are getting good information!

The questions they asked me were about 40nm yield and 28nm ramping (I will blog on that next week). It was interesting that the conversation went there.

The questions I asked them were about Tri-Gate and Atom in regard to the foundry business. I'm a foundry guy and would really like to see Intel get serious and "raise the foundry competition bar". With that said, here are my comments on Intel in the foundry business, Tri-Gate technology, and Atom SoCs:

1. Intel is definitely serious about the foundry business, not only to promote Atom as an SoC block but also to fill 22nm capacity. Intel will start the foundry business with FPGAs from Achronix and Tabula. FPGAs have very regular structures, which makes them easier to tune a process for. FPGA performance is also important, and Intel is certainly the expert on high-speed silicon.
2. Intel will not manufacture ARM designs. This kills the "Apple to foundry at Intel" rumors. The Apple A6 processor will be fabbed at TSMC in 20nm HKMG using ultra-low-power 3D IC technology, believe it! This also makes Intel a "boutique" foundry like Samsung and not an "open" foundry like TSMC. That position could change, of course, but probably not in my lifetime.
3. Intel still has a lot to learn about a pure-play foundry design ecosystem. None of my design questions were answered, simply because they did not know. Example: Intel does not acknowledge the term restricted design rules (RDRs), since microprocessor design rules have always been restricted. TSMC just went to RDRs at 28nm as a result of the 40nm ramping problem. More about that in the next blog.
4. The Ivy Bridge processor is not in production at 22nm. It's a test vehicle only and will not be in production until sometime next year. 22nm Atom SoC production will come in 2013. The Intel PR guy flinched at this one. 😉 To be fair, Intel production volumes are much higher than most, so the Intel internal definition of production is not the same as the Intel PR definition.
5. What is the difference between Tri-Gate and FinFET? Tri-Gate is a type of FinFET; FinFET is the more general term. Intel's Tri-Gate work started in 2002, and the current implementation uses standard semiconductor manufacturing equipment with a few extra steps. More on Tri-Gate HERE.
6. Tri-Gate manufacturing costs are +2-3%? That would be wafer manufacturing cost only, which does not include mask and other prep costs. 2-3% is definitely a PR spin thing and not the actual cost delta.

Clearly this just scratches the surface of the briefing, so if you have questions post them in the comment section and they will get answered. You have to be a registered SemiWiki user to read/write comments; when you register, put "iPad2" in the referral section and you might even win one.

By the way, when I'm not in Taiwan, I'm on the Iron Horse Trail with my new walking partner Max. Max is a six-month-old Great Dane and he already weighs 110 lbs. I like how Max's big head makes mine look small. Peet's Coffee in Danville is our favorite destination, so stop on by and say "Hi". Be nice though or Max will slobber on you.


Webinar: IP integration methodology
by Paul McLellan on 07-17-2011 at 12:24 pm

The next Apache webinar is coming up on 21st July at 11am Pacific time on "IP integration methodology".

This webinar will be conducted by Arvind Shanmugavel, Director of Applications Engineering at Apache Design Solutions. Mr. Shanmugavel has been with Apache since 2007, supporting the RedHawk and Totem product lines. Prior to Apache he worked at Sun Microsystems for several years, leading various design initiatives for advanced microprocessor designs. He received his Masters in Electrical Engineering from the University of Cincinnati, Ohio.

Today's SoCs consist of several IP blocks, developed internally or externally. Successful integration of IP into a single-chip design requires a methodology that considers the power noise impact of merging sensitive analog circuitry with high-speed digital logic on the same piece of silicon. In addition, it must handle the sharing of IP information and knowledge between disparate design groups to ensure the design will work to specification and at the lowest cost. Apache's power analysis and optimization solutions allow IP designers to validate their designs and create protected, portable models that can be used for mixed-signal analysis and SoC sign-off. Apache's IP Integration Methodology targets the design, validation, and cost reduction of highly integrated mixed-signal SoCs to help deliver robust single-chip designs.

More details on the webinars here.

Register to attend here (and don't forget to select semiwiki.com in the "How did you hear about it?" box).


First low-power webinar: Ultra-low-power Methodology
by Paul McLellan on 07-13-2011 at 12:10 pm

The first of the low-power webinars is coming up on July 19th at 11am Pacific time. The webinar will be conducted by Preeti Gupta, Sr. Technical Marketing Manager at Apache Design Solutions. Preeti has 10 years of experience in the exciting world of CMOS power. She has a Masters in Electrical Engineering from the Indian Institute of Technology, New Delhi, India.

Meeting the power budget and reducing operational and/or stand-by power requires a methodology that establishes power as a design target during the micro-architecture and RTL design process, not something left until the end of the design cycle. Apache's analysis-driven reduction techniques allow designers to explore different power-saving modes. Once RTL optimization is complete and a synthesized netlist is available, designers can run layout-based power integrity analysis to qualify the success of the RTL-stage optimizations, ensuring that the voltage drop in the chip is contained. Apache's Ultra-Low-Power Methodology enables successful design and delivery of low-power chips by offering a comprehensive flow that spans the entire design process.

More details on the webinars here.

Register to attend here (and don't forget to select semiwiki.com in the "How did you hear about it?" box).


And it's Intel at 22nm but wait, Samsung slips ahead by 2nm…
by Paul McLellan on 07-12-2011 at 12:46 pm

Another announcement of interest, given all the discussion of Intel's 22nm process around here, is that Samsung (along with ARM, Cadence and Synopsys) announced that they have taped out a 20nm ARM test-chip (using a Synopsys/Cadence flow).

An interesting wrinkle is that at 32nm and 28nm they used a gate-first process, but for 20nm they have switched to gate-last. Of course, taping out a chip is different from manufacturing one and getting it to yield well. There have been numerous problems with many of the novel process steps in technology nodes below 30nm.

The chip contains an ARM Cortex-M0 along with custom memories and, obviously, various test structures.

It is interesting to look at Intel's versus Samsung's semiconductor revenues (thanks Nitin!). In 2010 Intel was at $40B and Samsung was at $28B. But Samsung grew at 60% versus "only" 25% for Intel. Another couple of years of that and Samsung will take Intel's crown as the #1 semiconductor manufacturer.
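
A quick back-of-the-envelope projection, assuming (purely for illustration) that those 2010 growth rates simply hold:

```python
# Rough projection only: assumes the 2010 revenues and growth rates stay constant.
intel, samsung = 40.0, 28.0          # 2010 semiconductor revenue, $B
intel_growth, samsung_growth = 1.25, 1.60

for year in range(2011, 2014):
    intel *= intel_growth
    samsung *= samsung_growth
    print(f"{year}: Intel ${intel:.0f}B, Samsung ${samsung:.0f}B")

# 2011: Intel $50B, Samsung $45B
# 2012: Intel $62B, Samsung $72B   <- Samsung passes Intel in the second year
# 2013: Intel $78B, Samsung $115B
```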

As I've said before, Intel needs to get products in the fast-growing mobile markets, and I'm still not convinced that Atom's advantages (Windows compatibility) really matter. Of course Intel's process may be enough to make it competitive but that depends on whether Intel's wafers are cheap enough.


Cadence acquires Azuro
by Paul McLellan on 07-12-2011 at 12:20 pm

Cadence this morning announced that it has acquired Azuro. Azuro has become a leader in building clock trees for high-performance SoCs. A good rule of thumb is that the clock consumes 30% of the power in an SoC, so optimizing it is really important. Terms were not disclosed.

The clock trees involve clock gating, which can reduce clock tree power by 30% (and thus overall chip power by about 10%). It can also improve performance of the clock tree by reducing skew, improving overall clock frequency by up to 10%. And all while reducing the area of the clock tree by as much as 30%.
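
The overall-power number follows directly from the 30% rule of thumb above; a trivial check of the arithmetic:

```python
# Back-of-the-envelope numbers quoted in the article, not a measurement.
clock_share = 0.30          # clock tree is roughly 30% of total SoC power
clock_tree_saving = 0.30    # clock gating cuts clock tree power by about 30%

chip_saving = clock_share * clock_tree_saving
print(f"Overall chip power saving: {chip_saving:.0%}")   # 9%, i.e. "about 10%"
```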

By reputation, Azuro's technology is much better than the clock synthesis that comes for "free" in any of the major place and route systems. How easy it will continue to be to use, say, Synopsys for place and route while using Azuro's CCOpt (clock concurrent optimization) technology remains to be seen.


On-chip supercomputers, AMBA 4, Coore's law
by Paul McLellan on 07-11-2011 at 12:45 pm

At DAC I talked with Mike Dimelow of ARM about the latest upcoming revision to the AMBA bus standards, AMBA 4. The standard gets an upgrade about every 5 years. The original ARM in 1992 ran at 10 MIPS with a 20MHz clock. The first AMBA bus was a standard way to link the processor to memories (through the Advanced System Bus, ASB) and to peripherals (through the Advanced Peripheral Bus, APB). Next year ARM-based chips will run at 2.5GHz and deliver 7000 MIPS.

Eric's story of Thomson-CSF's attempt to build a processor with this sort of performance in 1987 points out that in those days it would have qualified as a supercomputer.

The latest AMBA standard proposal actually steals a lot of ideas from the supercomputer world. One of the biggest problems with multi-core computing, once you get a lot of cores, is that each core has its own cache, and when the same memory line is cached in more than one place the copies need to be kept coherent. The simplest way to do this, which works fine for a small number of cores, is to keep the line in only one cache and invalidate it in all the others. Each cache monitors the address lines for any writes and invalidates its own copy, a technique known as snooping. As the number of cores creeps up this becomes unwieldy and is a major performance hit, since more and more memory accesses turn out to be to invalidated lines and therefore require an off-chip memory access (or perhaps another level of cache, but much slower either way). The problem is further compounded by peripherals, such as graphics processors, that access memory too.

The more complex solution is to make sure that the caches are always coherent. When a cache line is written, if it is also in other caches then those copies are updated too, a procedure known as snarfing. The overall goal is to do everything possible to avoid an off-chip memory reference, which is extremely slow in comparison to a cache hit and consumes a lot more power.
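
A toy sketch of the difference between the two policies (hypothetical Python classes purely for illustration; this is not the AMBA signal-level protocol):

```python
# Invalidate-on-write ("snooping") versus update-on-write ("snarfing"),
# reduced to the bare idea.

class Cache:
    def __init__(self):
        self.lines = {}                      # address -> cached value

    def read(self, addr, memory):
        if addr not in self.lines:           # miss: fetch from (slow) memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

def write(addr, value, writer, caches, memory, policy):
    memory[addr] = value
    writer.lines[addr] = value
    for cache in caches:
        if cache is writer or addr not in cache.lines:
            continue
        if policy == "invalidate":           # snoop: drop the stale copy
            del cache.lines[addr]            # the next read becomes a miss
        else:                                # snarf: refresh the copy in place
            cache.lines[addr] = value        # the next read is still a hit

memory = {0x100: 1}
core0, core1 = Cache(), Cache()
core1.read(0x100, memory)                    # core1 now holds the line
write(0x100, 2, core0, [core0, core1], memory, policy="update")
print(core1.lines)                           # {256: 2} -- coherent, and still a hit
```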

The new AMBA 4 supports this. It actually supports the whole continuum of possible architectures, from non-coherent caches (better make sure that no core is writing to the same memory another is reading from) through to the fully coherent snooped and snarfed caches described above.

I'll ignore the elephant in the room of how you program these beasts when you have a large number of cores. Remember what I call Coore's law: Moore's law means the number of cores on a chip is doubling every couple of years; it's just not obvious yet since we're still on the flat part of the curve.

The other big hardware issue is power. On a modern SoC with heterogeneous cores and specialized bits of hardware, power can often be reduced by having a special mini-core. For example, it is much more power-efficient to use a little Tensilica core for MP3 playback than to use the main ARM processor, even though it is "just software" and so the ARM can perfectly well do it. Only one of the cores is used at a time: if no MP3 is playing the Tensilica core is powered down; if MP3 is playing then the ARM is (mostly) idle.

However, when you get to symmetric multiprocessing, there is no point in powering down one core in order to use another: the cores are identical, so if you didn't need a core you wouldn't put it on the chip. If you have 64 cores on a chip, the only reason is that at times you want to run all 64 at once. And the big question, to which I've not seen entirely convincing answers, is whether you can afford, power-wise, to light up all those cores simultaneously. Or is there a power limit on how many cores we can have (unless we lower their performance, which is almost the same thing as reducing the number of cores)?
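
To frame that question with a quick sketch (the power numbers are entirely hypothetical, just to show the shape of the trade-off):

```python
# Hypothetical numbers: a fixed chip power budget versus per-core power.
power_budget_w = 5.0        # what the package and battery can sustain
full_speed_core_w = 0.25    # one core running flat out

max_cores_full_speed = int(power_budget_w / full_speed_core_w)
print(max_cores_full_speed)                     # 20 -- so 64 cores cannot all run flat out

# To light up all 64 cores at once, each gets a smaller slice of the budget,
# which means running at lower voltage and frequency:
per_core_budget_mw = 1000 * power_budget_w / 64
print(f"{per_core_budget_mw:.0f} mW per core")  # ~78 mW each
```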

The AMBA 4 specification can be downloaded here.



Design for test at RTL
by Paul McLellan on 07-10-2011 at 3:09 pm

Design for test (DFT) imposes various restrictions on the design so that the test automation tools (automatic test pattern generation approaches such as scan, as well as built-in self-test approaches) will subsequently be able to generate the test program. For example, different test approaches impose constraints on clock generation and distribution, on the use of asynchronous signals such as resets, on memory controls and so on. If all the rules are correctly addressed, the test program should be complete and efficient in terms of tester time.

The big challenge is that most of these restrictions, and most of the analysis tools around, work on the post-synthesis netlist. There are two big problems with this. The first is that if changes are required to the netlist, it is often difficult to work out how to change the RTL to get the "desired" change in the netlist; often the only fix is to simply patch the netlist and accept that the RTL, the canonical design representation, is no longer accurate. The second is that the changes come very late in the design cycle and, as with all such unanticipated changes, can disrupt schedules and perhaps even performance.

Paradoxically, most of these changes would have been simple to make had the DFT rule-checking been performed on the RTL instead of the netlist. The changes can then be implemented at the correct level of the design, and earlier in the design cycle, when they can be planned and so are not disruptive.

The challenge is that many of the DFT rules relate to specific architectures for scan chains, but at the RTL level the scan chains have not yet been instantiated and will not be until much later in the design cycle. So to be useful, an RTL approach first needs to infer which chains will exist, without the expensive process of instantiating them and hooking them all up. A second problem is that the various test modes alter how the clocks are generated and distributed (most obviously to the various scan chains). And a third issue is that test tools such as fault simulators require completely predictable circuit behavior: bus contention or race conditions can create non-deterministic behavior, which must be avoided. None of these three problems can be addressed by a simple topological analysis of the RTL.


Instead, a look-ahead architecture is required that predicts how the suite of test tools will behave and can then check that all the rules will pass. This can be done using a very fast synthesis to produce a generic hierarchical netlist, but with enough fidelity to allow checks such as latch detection. The netlist can then be flattened for fast checking of topological rules like combinational loop detection. This approach allows DFT rule-checking to be done even before the block or design runs through synthesis, including accurately estimating the test coverage and so avoiding a scramble late in the design cycle to improve inadequate coverage.
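
As a flavor of what one of those topological checks looks like on the flattened netlist, here is a minimal sketch of combinational loop detection as cycle detection on a gate connectivity graph (the netlist representation and gate names are invented for illustration; this is not SpyGlass-DFT's actual implementation):

```python
# Minimal sketch: a combinational loop is a cycle in a graph whose nodes are
# combinational gates and whose edges follow signal connectivity. Flip-flops
# and latches are left out of the graph because they break such loops.

def has_combinational_loop(netlist):
    """netlist: dict mapping each gate name to the list of gates it drives."""
    WHITE, GREY, BLACK = 0, 1, 2              # unvisited / on DFS stack / done
    color = {gate: WHITE for gate in netlist}

    def dfs(gate):
        color[gate] = GREY
        for sink in netlist.get(gate, []):
            if color.get(sink, WHITE) == GREY:                 # back edge: loop found
                return True
            if color.get(sink, WHITE) == WHITE and dfs(sink):
                return True
        color[gate] = BLACK
        return False

    return any(color[gate] == WHITE and dfs(gate) for gate in netlist)

# u1 -> u2 -> u3 -> u1 is a combinational loop; u4 only feeds into it.
netlist = {"u1": ["u2"], "u2": ["u3"], "u3": ["u1"], "u4": ["u1"]}
print(has_combinational_loop(netlist))        # True
```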

The Atrenta SpyGlass-DFT white paper can be downloaded here.