
Richard Goering does Q&A with ClioSoft CEO
by Daniel Payne on 07-18-2011 at 11:05 am

Richard Goering is well-known from his editorial days at EE Times (going back some 25 years). Now at Cadence, he blogs at least once a week on EDA topics that touch Cadence tools.

Before DAC he talked with Srinath Anantharaman about how Cadence tools work together with ClioSoft tools to keep IC Design Data Management Simple.

Through just nine questions Richard finds out where ClioSoft came from, how their tools work inside of a Cadence IC flow, and what is new at DAC this year.

Also Read

Hardware Configuration Management at DAC

Cadence Virtuoso 6.1.5 and ClioSoft Hardware Configuration Management – Webinar Review

How Avnera uses Hardware Configuration Management with Virtuoso IC Tools


Intel Briefing: Tri-Gate Technology and Atom SoC
by Daniel Nenni on 07-17-2011 at 3:00 pm

Sorry to disappoint but my 2 hours at the Intel RNB was a very positive experience. It is much more fun writing negative things about industry leaders because I enjoy the resulting hate mail and personal attacks, but the candor and transparency of the Intel guys won me over. They even asked ME questions which was a bit telling. I also picked up a very nice Intel hat. I now blog for hats!

The first meeting was with Jon Carvill, Mobile Media guy at Intel. Before that he was VP of Communications at GlobalFoundries and Head of PR at AMD/ATI. I worked with Jon at GlobalFoundries; he’s a stand-up guy, very technical, and almost 7 feet tall. I don’t take pictures with Jon since he makes me look like a dwarf.

The second meeting was with Rob Willoner, a long time manufacturing guy at Intel and Radoslaw Walczyk, an Intel PR guy. You can find Rob’s Tri-Gate presentation HERE. In these types of meetings you watch the face of the PR guy when the technology guy answers questions. If the PR guy flinches you are getting good information!

The questions they asked me were about 40nm yield and 28nm ramping (I will blog on that next week). It was interesting that the conversation went there.

The questions I asked them were about Tri-Gate and Atom in regards to the foundry business. I’m a foundry guy and would really like to see Intel get serious and “raise the foundry competition bar”. With that said, here are my comments on Intel in the foundry business, Tri-Gate technology, and Atom SoCs:


  • Intel is definitely serious about the foundry business, not only to promote Atom as an SoC block, but also to fill 22nm capacity. Intel will start the foundry business with FPGAs from Achronix and Tabula. FPGAs have very regular structures, which makes it easier to tune a process for them. FPGA performance is also important and Intel is certainly the expert on high speed silicon.
  • Intel will not manufacture ARM designs. This kills the “Apple to foundry at Intel” rumors. The Apple A6 processor will be fabbed at TSMC 20nm HKMG using ultra low power 3D IC technology, believe it! This also makes Intel a “boutique” foundry like Samsung and not an “open” foundry like TSMC. That position could change of course but probably not in my lifetime.
  • Intel still has a lot to learn about a pure-play foundry design ecosystem. None of my design questions were answered and it was because they just did not know. Example: Intel does not acknowledge the term restricted design rules (RDRs) since microprocessor design rules have always been restricted. TSMC just went to RDRs in 28nm as a result of the 40nm ramping problem. More about that next blog.
  • The Ivy Bridge processor is not in production at 22nm. It’s a test vehicle only and will not be in production until sometime next year. 22nm Atom SoC production will be in 2013. The Intel PR guy flinched at this one. 😉 To be fair, Intel production levels are much higher than most so the Intel internal definition of production is not the same as the Intel PR definition.
  • What is the difference between Tri-Gate and FinFET? Tri-Gate is a type of FinFET; FinFET is the more general term. Intel’s Tri-Gate work started in 2002 and the current implementation uses standard semiconductor manufacturing equipment with a few extra steps. More on Tri-Gate HERE.
  • Tri-Gate manufacturing costs are +2-3%? That would be wafer manufacturing costs, which does not include mask and other prep costs. 2-3% is definitely a PR spin thing and not the actual cost delta.

    Clearly this is just scratching the surface of the briefing, so if you have questions post them in the comment section and they will get answered. You have to be a registered SemiWiki user to read/write comments; when you register put “iPad2” in the referral section and you might even win one.

    By the way, when I’m not in Taiwan, I’m on the Iron Horse Trail with my new walking partner Max. Max is a six month old Great Dane and he already weighs 110 lbs. I like how Max’s big head makes mine look small. Peet’s Coffee in Danville is our favorite destination so stop on by and say “Hi”. Be nice though or Max will slobber on you.


    Webinar: IP integration methodology
    by Paul McLellan on 07-17-2011 at 12:24 pm

    The next Apache webinar is coming up on 21st July at 11am Pacific time on “IP integration methodology”.

    This webinar will be conducted by Arvind Shanmugavel, Director Applications Engineering at Apache Design Solutions. Mr. Shanmugavel has been with Apache since 2007, supporting the RedHawk and Totem product lines. Prior to Apache he worked at Sun Microsystems for several years, leading various design initiatives for advanced microprocessor designs. He received his Masters in Electrical Engineering from the University of Cincinnati, Ohio.

    Today’s SoCs consist of several IP blocks, developed internally or externally. Successfully integrating IP into a single chip requires a methodology that considers the power noise impact of merging sensitive analog circuitry with high-speed digital logic on the same piece of silicon. In addition, it must handle the sharing of IP information and knowledge between disparate design groups to ensure the design will work to specification and at the lowest cost. Apache’s power analysis and optimization solutions allow IP designers to validate their designs and create protected, portable models that can be used for mixed-signal analysis and SoC sign-off. Apache’s IP Integration Methodology targets the design, validation, and cost reduction of highly integrated mixed-signal SoCs to help deliver robust single-chip designs.

    More details on the webinars here.

    Register to attend here (and don’t forget to select semiwiki.com in the “How did you hear about it?” box).


    First low-power webinar: Ultra-low-power Methodology
    by Paul McLellan on 07-13-2011 at 12:10 pm

    The first of the low power webinars is coming up on July 19th at 11am Pacific time. The webinar will be conducted by Preeti Gupta, Sr. Technical Marketing Manager at Apache Design Solutions. Preeti has 10 years of experience in the exciting world of CMOS power. She has a Masters in Electrical Engineering from the Indian Institute of Technology, New Delhi, India.

    Meeting the power budget and reducing operational and/or stand-by power requires a methodology that establishes power as a design target during the micro-architecture and RTL design process, not something that can be left until the end of the design cycle. Apache’s analysis-driven reduction techniques allow designers to explore different power saving modes. Once RTL optimization is completed and a synthesized netlist is available, designers can run layout-based power integrity to qualify the success of RTL stage optimizations, ensuring that the voltage drop in the chip is contained. Apache’s Ultra-Low-Power Methodology enables successful design and delivery of low-power chips by offering a comprehensive flow that spans the entire design process.

    More details on the webinars here.

    Register to attend here (and don’t forget to select semiwiki.com in the “How did you hear about it?” box).


    And it’s Intel at 22nm but wait, Samsung slips ahead by 2nm…
    by Paul McLellan on 07-12-2011 at 12:46 pm

    Another announcement of interest, given all the discussion of Intel’s 22nm process around here, is that Samsung (along with ARM, Cadence and Synopsys) announced that they have taped out a 20nm ARM test-chip (using a Synopsys/Cadence flow).

    An interesting wrinkle is that at 32nm and 28nm they used a gate-first process but that for 20nm they have switched to gate-last. Of course taping out a chip is different from having manufactured one and got it to yield well. There have been numerous problems with many of the novel process steps in technology nodes below 30nm.

    The chip contains an ARM Cortex-M0 along with custom memories and, obviously, various test structures.

    It is interesting to look at Intel vs Samsung’s semiconductor revenues (thanks Nitin!). In 2010 Intel was at $40B and Samsung was at $28B. But Samsung grew at 60% versus “only” 25% for Intel. Another couple of years of that and Samsung will take Intel’s crown as #1 semiconductor manufacturer.
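A quick back-of-envelope check on that "couple of years" claim, using just the 2010 figures and growth rates quoted above (one year's growth rates extrapolated forward, so treat it as illustrative only):

```python
# 2010 revenues ($B) and year-over-year growth rates from the figures above.
intel, samsung = 40.0, 28.0
intel_growth, samsung_growth = 1.25, 1.60

years = 0
while samsung <= intel:
    intel *= intel_growth
    samsung *= samsung_growth
    years += 1

print(years)  # prints 2: Samsung passes Intel in the second year at those rates
```

So "another couple of years" is exactly what the arithmetic says, if those growth rates held.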

    As I’ve said before, Intel needs to get products in the fast-growing mobile markets, and I’m still not convinced that Atom’s advantages (Windows compatibility) really matter. Of course Intel’s process may be enough to make it competitive but that depends on whether Intel’s wafers are cheap enough.


    Cadence acquires Azuro
    by Paul McLellan on 07-12-2011 at 12:20 pm

    Cadence this morning announced that it has acquired Azuro. Azuro has become a leader in building the clock trees for high performance SoCs. A good rule of thumb is that the clock consumes 30% of the power in an SoC so optimizing it is really important. Terms were not disclosed.

    The clock trees involve clock gating, which can reduce clock tree power by 30% (and thus overall chip power by 10%). They can also improve performance by reducing skew, increasing the overall clock frequency by up to 10%. And all while reducing the area of the clock tree by as much as 30%.
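The arithmetic behind that rule of thumb is easy to check:

```python
# Rule-of-thumb numbers from the paragraph above: the clock tree burns
# ~30% of total SoC power, and clock gating cuts clock-tree power ~30%.
clock_fraction = 0.30    # clock tree's share of total chip power
gating_reduction = 0.30  # reduction in clock-tree power from gating

chip_savings = clock_fraction * gating_reduction
print(f"{chip_savings:.0%}")  # prints 9%, i.e. roughly the 10% quoted
```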

    By reputation, Azuro’s technology is much better than the clock synthesis that comes for “free” in any of the major place and route systems. How easy it will continue to be to use, say, Synopsys for place and route while using Azuro’s ccopt (clock concurrent optimization technology) remains to be seen.


    On-chip supercomputers, AMBA 4, Coore’s law
    by Paul McLellan on 07-11-2011 at 12:45 pm

    At DAC I talked with Mike Dimelow of ARM about the latest upcoming revision to the AMBA bus standards, AMBA 4. The standard gets an upgrade about every 5 years. The original ARM in 1992 ran at 10MIPS with a 20MHz clock. The first AMBA bus was a standard way to link the processor to memories (through the ARM system bus ASB) and to peripherals (through the ARM peripheral bus APB). Next year ARM-based chips will run at 2.5GHz and deliver 7000 MIPS.

    Eric’s story of Thomson-CSF’s attempt to build a processor of this type of performance in 1987 points out that in those days that would have qualified as a supercomputer.

    The latest AMBA standard proposal actually steals a lot of ideas from the supercomputer world. One of the biggest problems with multi-core computing once you get a lot of cores is the fact that each core has its own cache, and when the same memory line is cached in more than one place they need to be kept coherent. The simplest way to do this, which works fine for a small number of cores, is to keep the line in only one cache and invalidate it in all the others. Each cache monitors the address lines for any writes and invalidates its own copy, known as snooping. As the number of cores creeps up this becomes unwieldy and is a major performance hit, as more and more memory accesses turn out to be to invalidated lines that therefore require an off-chip memory access (or perhaps another level of cache, but much slower either way). The problem is further compounded by peripherals, such as graphics processors, that access memory too.

    The more complex solution is to make sure that the caches are always coherent. When a cache line is written, if it is also in other caches then these are updated too, a procedure known as snarfing. The overall goal is to do everything possible to avoid needing to make an off-chip memory reference, which is extremely slow in comparison to a cache-hit and consumes a lot more power.
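As a toy illustration (nothing to do with the actual AMBA signalling, just the two policies described above), write-invalidate snooping versus write-update snarfing can be sketched in a few lines:

```python
# Toy model of cache coherence policies: each Cache holds address -> value.
class Cache:
    def __init__(self):
        self.lines = {}

def write(caches, writer, addr, value, policy):
    """One core writes a line; the other caches snoop the bus."""
    caches[writer].lines[addr] = value
    for i, c in enumerate(caches):
        if i != writer and addr in c.lines:
            if policy == "invalidate":
                del c.lines[addr]      # snoop: drop the now-stale copy
            else:
                c.lines[addr] = value  # snarf: update the copy in place

caches = [Cache(), Cache()]
caches[0].lines[0x100] = 1
caches[1].lines[0x100] = 1

write(caches, 0, 0x100, 2, "invalidate")
print(0x100 in caches[1].lines)  # prints False: core 1 must refetch next read

caches[1].lines[0x100] = 2
write(caches, 0, 0x100, 3, "update")
print(caches[1].lines[0x100])    # prints 3: core 1 still hits in its own cache
```

The snarfing case is exactly the "avoid the off-chip reference" win: the second core never takes the miss.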

    The new AMBA 4 supports this. It actually supports the whole continuum of possible architectures, from non-coherent caches (better make sure that no cores are writing to the same memory another is reading from) through the fully coherent snooped and snarfed caches described above.

    I’ll ignore the elephant in the room of how you program these beasts when you have a large number of cores. Remember what I call Coore’s law: Moore’s law means the number of cores on a chip is doubling every couple of years; it’s just not obvious yet since we’re still on the flat part of the curve.

    The other big hardware issue is power. On a modern SoC with heterogeneous cores and specialized bits of hardware, power can often be reduced by having a special mini-core. For example, it is much more power-efficient to use a little Tensilica core for MP3 playback than to use the main ARM processor, even though it is “just software” and so the ARM can perfectly well do it. Only one of the cores is used at a time: if no MP3 is playing, the Tensilica core is powered down; if MP3 is playing, the ARM is (mostly) idle.

    However, when you get to symmetrical multiprocessing, there is no point in powering down one core in order to use another: they are the same core, so if you didn’t need the core then don’t put it on the chip. If you have 64 cores on a chip then the only point of doing it is because at times you want to run all 64 cores at once. And the big question, to which I’ve not seen entirely convincing answers, is whether you can afford power-wise to light up all those cores simultaneously. Or is there a power limit on how many cores we can have (unless we lower their performance, which is almost the same thing as reducing the number of cores)?
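To make that question concrete, here is a back-of-envelope with made-up numbers (the 100W package budget and 3W per core are assumptions for illustration, not Intel or ARM figures):

```python
# Assumed illustrative numbers, not vendor data.
chip_budget_w = 100.0  # assumed package power budget
core_power_w = 3.0     # assumed per-core power at nominal frequency
cores = 64

demand = cores * core_power_w
print(demand)  # prints 192.0: nearly 2x over the assumed budget

# Dynamic power scales roughly with f*V^2, so to fit the budget every core
# must run well below nominal frequency/voltage; this fraction is the
# "almost the same as fewer cores" point made above.
print(round(chip_budget_w / demand, 2))  # prints 0.52
```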

    The AMBA 4 specification can be downloaded here.

    Note: You must be logged in to read/write comments.


    Design for test at RTL
    by Paul McLellan on 07-10-2011 at 3:09 pm

    Design for test (DFT) imposes various restrictions on the design so that the test automation tools (automatic test pattern approaches such as scan, as well as built-in self-test approaches) will subsequently be able to generate the test program. For example, different test approaches impose constraints on clock generation and distribution, on the use of asynchronous signals such as resets, memory controls and so on. If all the rules are correctly addressed then the test program should be complete and efficient in terms of tester time.

    The big challenge is that most of these restrictions, and most of the analysis tools around, work on the post-synthesis netlist. There are two big problems with this. The first is that if changes are required to the netlist, it is often difficult to work out how to change the RTL to get the “desired” change to the netlist; the only fix is to simply update the netlist and accept that the RTL, the canonical design representation, is no longer accurate. The second is that the changes come very late in the design cycle and, as with all such unanticipated changes, can disrupt schedules and perhaps even performance.

    Paradoxically, most of these changes would have been simple to have made had the DFT rule-checking been performed on the RTL instead of the netlist. The changes can then be implemented at the correct level in the design, and earlier in the design cycle when they are planned and so not disruptive.

    The challenge is that many of the DFT rules relate to specific architectures for scan chains, but at the RTL level the scan chains have not yet been instantiated and will not be until much later in the design cycle. So to be useful, an RTL approach needs to first infer which chains will exist without the expensive process of instantiating them and hooking them all up. A second problem is that the various test modes alter how the clocks are generated and distributed (most obviously to the various scan chains etc). And a third issue is that test tools such as fault simulators require completely predictable circuit behavior. Bus contention or race conditions can create non-deterministic behavior which must be avoided. None of these three problems can be addressed by a simple topological analysis of the RTL.


    Instead a look-ahead architecture is required that predicts how the suite of test tools will behave and can then check that all the rules will pass. This can be done using a very fast synthesis to produce a generic hierarchical netlist, but with enough fidelity to allow checks such as latch detection. The netlist can then be flattened for fast checking of topological rules like combinational loop detection. This approach allows DFT rule-checking to be done even before the block or design runs through synthesis, including accurately estimating the test coverage, and so avoids a scramble late in the design cycle to improve an inadequate coverage.
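One of the topological checks mentioned above, combinational loop detection, is simple to sketch once the netlist is flattened. This toy version (illustrative only, real tools like SpyGlass do far more) models the flattened netlist as a directed graph from each combinational gate to the gates it drives, and looks for a cycle with a depth-first search:

```python
# Combinational loop detection on a flattened netlist, modeled as a
# directed graph: gate name -> list of combinational gates it drives.
def has_comb_loop(netlist):
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / done
    color = {g: WHITE for g in netlist}

    def dfs(g):
        color[g] = GRAY
        for nxt in netlist.get(g, []):
            if color.get(nxt, WHITE) == GRAY:   # back edge: combinational loop
                return True
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[g] = BLACK
        return False

    return any(color[g] == WHITE and dfs(g) for g in list(netlist))

# g1 -> g2 -> g3 -> g1 is a loop; the second netlist is loop-free.
print(has_comb_loop({"g1": ["g2"], "g2": ["g3"], "g3": ["g1"], "g4": []}))  # prints True
print(has_comb_loop({"g1": ["g2"], "g2": [], "g4": ["g1"]}))                # prints False
```

Registers break such paths, which is why the check is run on the combinational graph only.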

    The Atrenta SpyGlass-DFT white paper can be downloaded here.


    Intel Twisting ARM?
    by Daniel Nenni on 07-10-2011 at 11:00 am

    Intel’s new Tri-Gate technology is causing quite a stir on the stock chat groups. Some have even said if Intel uses its Tri-Gate technology on only Intel processors ARM will be in deep deep trouble. These guys are “Intel Longs” of course and they are battling “Intel Shorts” with cut and paste news clips.

    “ARM is in trouble & this is why. Future smartphones will require more & more capability/features/functions. That’s just the way it is. ARM is great at performance/power/other specs based on today’s capabilities. But, when the architecture gets stretched, all bets are off. We’re starting to see that today with certain benchmarks. Intel’s architecture will be far superior in the long run because they own the end-to-end (design to manufacturing), it will be scalable, it will be affordable, etc. The analysts are too dumb to understand this yet. They will in less than a year’s time though.” backbay_bston

    I don’t own any of these stocks so I’m financially neutral, but clearly I’m very suspicious of Intel’s Tri-Gate claims as I blogged in: TSMC Versus Intel: The Race to Semiconductors in 3D! That blog got me an invitation to the Intel RNB (Robert Noyce Building) to meet with one of their manufacturing guys and talk about Tri-Gate. I spend a lot of time in Asia and have seen the horrors of 40nm statistical process variation (yield). More recently I have seen a near perfect implementation of 28nm HKMG, but I promise you I’m going into this meeting with an open mind and an Intel powered laptop.

    In preparation for my technical deep dive on Tri-Gate technology at RNB I need to come up with good questions so I will look smart. I could really use your help with this; here is what I have so far:

    On the manufacturing side:

  • What is the difference between Tri-Gate and bulk CMOS HKMG?
  • Additional processing steps?
  • How many more masks/layers?
  • Special manufacturing equipment?

    On the design side:

  • Spice Models: There are no “standards” for multi-gate Spice models — the Compact Model Council has not really made adoption of MG models a priority… What did Intel use for device models and circuit simulation? An approach internal to Intel? (Most of the modeling research published in technical journals to date has been for a single fin.)
  • Layout Dependent Effects: For several generations of planar technologies, the influence of Layout Dependent Effects has continued to increase — what are the LDE in a Tri-Gate technology? For example, for six device fins in parallel, do the fins on the outer edges behave differently than the middle fins? Or, is the volume of the fin so small that adjacent layout structures have little influence on the device current? (If LDE is less of an issue with Tri-Gates, that would be a major turning point in CAD tools and flows.) Restricted design rules?
  • Custom parasitic extraction with Tri-Gate is very challenging! There are unique device parasitics associated with Tri-Gates — the input gate resistance is more intricate due to the 3D topology over and between fins, and the parasitic gate-to-drain and gate-to-source capacitances are likewise more involved. What approach did Intel take toward parasitic extraction? (Were “standard” multiple-fin device combinations chosen to simplify the task of (custom) parasitic extraction?)
  • Why 6 and 2? Intel appears to have “standardized” on offering two design choices — six FinFETs in parallel and two in parallel — what were the considerations that went into this choice? (also, see #3)
  • AMS design impact of Tri-Gate: Analog mixed-signal designs are constrained by the limited set of diodes and resistors available in planar technology — what circuit methodology changes did the AMS design teams have to make? Did Intel ever consider offering a mixed Tri-Gate and planar device offering on the same die?
  • MultiVt Device Options and Circuit Optimization: Tri-Gate does not offer the custom circuit designer as much freedom in design optimization, due to the quantization of the device width in increments of additional fins… what changes did Intel make to their circuit optimization methods? What device Vt and gate length options are available to designers for optimization?
  • Thermal Characteristics: What additional thermal heat transfer issues are present, due to the power dissipation in the small volume of the fin?
  • Tri-Gate vs. Dual-Gate FinFETs: Was this comparison done? Why did Intel choose a “tri-gate” device, rather than a “dual-gate” device (with a thicker, non-contributing oxide on top of the fin)? (Tri-Gate devices are reported to have worse leakage current behavior at the top corners of the fin.)
  • Statistical Process Variation: How will it be addressed? What are the major contributors to statistical process variation with FinFET fabrication?
  • Fin Dimensions: The fin height, fin thickness, and spacing between fins are key manufacturing parameters toward achieving a high circuit density — what criteria did Intel use in optimizing the Tri-Gate device dimensions?

    Let me know what else interests you about Intel’s new Tri-Gate technology. Clearly the design side questions are for the people who believe Intel is a foundry.

    Tri-Gate technology certainly could be a game changer, especially for AMD. How is AMD going to compete on processor speed using 28nm Gate-First HKMG technology? Is this a factor in AMD’s inability to attract a top CEO candidate?



    For those of you who have not met me before here is a recent mug shot.
    Not only do I have a hot wife half my age but look at the size of my head. You can only imagine how smart I am. Plus I drive a Porsche. Cool AND smart, absolutely.


    Low Power Webinar Series
    by Paul McLellan on 07-08-2011 at 4:57 pm

    At DAC 2011 in San Diego, Apache gave many product presentations. Of course not everyone could make DAC or could make all the presentations in which they were interested. So from mid-July until mid-August these presentations will be given as webinars. Details, and links for registration, are here on the Apache website.

    The seminars are as below. All webinars are 11am to 12pm PDT.

    • Ultra-low-power methodology, July 19th
    • IP integration methodology, July 21st
    • PowerArtist: RTL power analysis, reduction and debug, July 26th
    • RedHawk: SoC power integrity and sign-off for 28nm design, July 28th
    • Totem: analog/mixed-signal power noise and reliability, August 2nd
    • PathFinder: full-chip ESD integrity and macro-level dynamic ESD, August 4th
    • Chip-Package-System (CPS) convergence solution, August 9th
    • Sentinel: PSI IC-package power and signal integrity solution, August 11th