
Antun Domic, on Synopsys’ Secret Sauce in Design

by Paul McLellan on 07-20-2015 at 7:00 am

Antun Domic is the GM of the Design Group at Synopsys. I sat down with him a couple of weeks ago.

His name is Croatian although, of course, there was no Croatia back then; it was part of Yugoslavia. But in fact he grew up in Chile and went to university there, where he studied EE and math. He came to the US as a grad student and did a PhD in math at MIT. He returned to Chile in the Pinochet era and taught for 2 years before returning once and for all to the US (his wife is American). He worked briefly for Honeywell in EDA.

He then went to MIT Lincoln Labs doing work funded by DARPA on restructurable wafer-scale VLSI, big systems larger than a chip, essentially silicon compilation. He says DARPA was key in the mid to late 1980s in funding EDA research at MIT, Berkeley, CMU etc., and is probably one key reason why even today EDA is dominated by US companies.

He then moved to the microprocessor group of Digital in the era when every company had internal CAD. I actually interviewed there with his predecessor Alain Hanover, who went on to found Viewlogic. Now part of Synopsys! He left after tapeout of the second Alpha chip and went to Cadence.

At Cadence he ran the group that was doing their synthesis product back then called Synergy. But Synergy had a hard time against Synopsys’ Design Compiler and was eventually canceled (and Cadence bought Ambit where I was VP engineering, but that’s my story not Antun’s). Antun became head of P&R, where Cadence was a lot stronger.

He came to Synopsys in 1997, so nearly 20 years ago, and ran logic synthesis and timing analysis (DC and PrimeTime). Now he has essentially all design tools: logic synthesis, timing, P&R, test, layout verification, circuit simulation and custom layout.

The design environment has changed a lot. 15 years ago everyone designed on node N (whether analog or digital) and was just waiting to move to node N+1. Today there is a much wider spectrum of designs. 90nm for automotive. 28nm will be around for a long time. Obviously lots of stuff on 14/16nm.

A big business imperative for Synopsys is to service 10nm with their leading edge customers (the usual suspects but I won’t mention them since they sometimes get upset about that sort of thing). However, an important part of the industry will remain on established nodes. Some design is still done by hand but most is automated with the usual Verilog, DC, STA, P&R flow. One of the great successes of EDA is the level of automation which is stunning. Today we can do flat P&R of 10M cells, with 10 corners, GHz clocks, FinFET. Design Compiler in the early 1990s was limited to 5K gates.

The focus recently has been introducing ICC2 (Synopsys’ place & route system, called Newton internally). It is one of the rare occasions that an EDA company has looked at the problem completely from scratch: new database, new optimization, new clock-tree synthesis. The only parts kept were placement and routing (together under 25% of the code).

The main reason for starting from scratch was that you could no longer depend on getting increased single-thread performance, just more cores. So almost everything needed to be rewritten for threading. Also, existing algorithms don’t always parallelize effectively, so new algorithms were needed. On the horizon were 20/16/14/10nm, which would need much more speed and data capacity. ICC1 was already the fastest tool available but was still too slow. The plan was for a 5X increase in raw speed; for design planning it ended up 10X faster. Some 5M lines of new code were written. Most of the Magma engineering team was retained (despite Magma’s P&R product Talus being phased out) and they contributed to this effort. Other additions were in timing analysis and circuit simulation; for example, the current head of PrimeTime was the head of timing analysis at Magma.

Another major project involves Custom Designer (Synopsys’ internally developed custom design tool) and SpringSoft Laker (acquired, largely confined to Asia). An important project to combine the two technologies has been ongoing; watch this space for the official announcement.

Physical verification, with IC Validator, is a much larger business than most people realize. It is used heavily internally in IP development and QA and is now much better supported by foundries. It has been used to support the largest FinFET designs for many foundries.

The flows have changed. P&R is the tool where a lot of steps happen today (metal fill, power minimization, chip finishing), so connecting layout, extraction and verification is very important, such as running DRC/LVS without leaving the P&R environment. Incremental analysis is a big time saver: don’t re-extract the whole chip if only a few nets have changed. Synopsys has a big effort in incremental analysis.
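The incremental idea can be sketched in a few lines. This is a toy model by this editor, not Synopsys code: cache a per-net result, and when an edit arrives, re-run only the nets it touched rather than the whole chip.

```python
# Toy model of incremental extraction: cache per-net results and re-run
# only the nets touched by an edit. Names and structure are illustrative.

class IncrementalExtractor:
    def __init__(self, nets):
        self.cache = {}
        self.extractions = 0          # counts how often "extraction" runs
        for net in nets:
            self._extract(net)        # initial full-chip pass

    def _extract(self, net):
        self.extractions += 1
        self.cache[net] = f"parasitics({net})"   # stand-in for real work

    def edit(self, changed_nets):
        for net in changed_nets:      # only dirty nets are re-extracted
            self._extract(net)

chip = IncrementalExtractor([f"net{i}" for i in range(10_000)])
chip.edit(["net42", "net7"])          # a small ECO touches two nets
print(chip.extractions)               # → 10002: one full pass, then just 2
```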

Things are much better. 10 years ago there were lots of data transfer format flow issues. Eventually stuff got truly standardized: .lib (aka liberty), LEF/DEF, UPF. Effort now is to make flows incremental and deciding when to do analysis.

Another challenge is teaching algorithms when to give up, when synthesis or P&R can’t meet the constraints. Placement has to keep improving too: it is still handling a sea of standard cells, but now IP is not just a memory in the corner but perhaps 500-1000 larger blocks spread through the sea. So the cells are no longer “roughly” the same size as with pure standard cells.

His group has about 2000 people distributed everywhere. Even, to his amazement, 70 engineers in his home country Chile.

See also Bijan Kiani Talks Synopsys Custom Layout and More


Cost Modeling as a Decision Making Tool

by Scotten Jones on 07-19-2015 at 10:00 am

The use of simulation is well established in the semiconductor industry. Virtually all circuit designs are run through a Spice simulation, layouts are analyzed for timing issues and even process development employs process simulation tools. What I believe is less widely used but just as useful is cost modeling.

The semiconductor industry has been driven by Moore’s Law for the last fifty years. Moore’s Law is not just about increasing chip density but also about lowering costs. Cost modeling can be a very useful tool in the process development, design and procurement phases of a chip’s lifetime, allowing various alternatives to be evaluated for cost impact, and it can even help in supplier negotiations.

The first project I did after founding IC Knowledge was to evaluate potential cost savings for a customer if they did a shrink of an ASIC they were having made. The ASIC was originally expected to be a low volume runner but was now running in the millions of pieces per year. The first cost model I ran for the customer was a baseline of their current design and process. I immediately discovered that the customer was paying a greater than 70% margin for the ASIC. Armed with that knowledge the customer negotiated a reduction in the ASIC price of over a dollar a unit saving millions of dollars per year. Further modeling also revealed the opportunity to cost reduce the part by redesigning it into a smaller node. Since our modeling produced revised wafer cost estimates as well as mask set costs, the payback for the redesign could be easily calculated.
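That payback arithmetic is simple enough to sketch. Every number below is invented for illustration; none comes from the customer case above.

```python
# Hypothetical payback calculation for a die-shrink redesign. All figures
# are illustrative, not from the article's customer engagement.

def payback_quarters(old_unit_cost, new_unit_cost, redesign_nre, units_per_quarter):
    """Quarters until per-unit savings recover the one-time NRE
    (redesign engineering plus the new mask set)."""
    savings_per_unit = old_unit_cost - new_unit_cost
    if savings_per_unit <= 0:
        raise ValueError("redesign must lower the unit cost")
    return redesign_nre / (savings_per_unit * units_per_quarter)

# Example: $1.20/unit saved, $2.4M NRE, 1M units per quarter
q = payback_quarters(4.20, 3.00, 2_400_000, 1_000_000)
print(f"payback in {q:.1f} quarters")   # → payback in 2.0 quarters
```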

Cost modeling is only a useful tool if the results are accurate. Our models are highly complex bottom-up simulators. My company, IC Knowledge LLC, introduced our IC Cost and Price Model in 2000 and has now been selling it commercially for fifteen years. In that time the model has become the industry standard for cost modeling of low power silicon ICs such as ASICs, SoCs, microprocessors, microcontrollers, DRAM, Flash and many other chip types. We have built up an extensive customer and partner network that provides a steady stream of feedback that ensures and refines the accuracy of our models. We have also introduced models for high power silicon integrated circuits and discrete devices, and MEMS products, as well as our newest product, the Strategic Model, which projects out to the 5nm node with detailed equipment and materials requirements, and wafer and process step costs.

Our modeling is routinely used for negotiations around wafer or finished product pricing, for benchmarking, outside analysis, technology selection and many other purposes. Our IC Cost and Price Model covers all of the major foundry processes and produces manufacturing wafer cost and selling price estimates by supplier, node, year and quarter, and volume. The estimates can be run out to 2020 to anticipate costs over the lifetime of a part. Our new strategic model has been very successful with materials and equipment companies and also offers the ability to cost out new processes during development.

With our new Strategic Cost Model a process integration engineer developing a new process could evaluate alternate process flows and evaluate the cost impact of specific process design decisions. Our easy to use IC Cost and Price Model allows designers to evaluate the cost of different process nodes and suppliers as well as various process adders. Our Discrete and Power Products Cost and Price model is particularly popular with the automotive industry and our MEMS Cost and Price Model can model complete MEMS products with up to 2 MEMS die and up to 2 IC die per part. All of these models come standard with 12 months of updates and support.

With the economics of semiconductor production under so much pressure at the leading edge nodes, the use of cost modeling to optimize processes, designs and procurement is essential for the semiconductor industry. IC Knowledge LLC is the world leader in this space, with a broad line of standard off-the-shelf models available as well as custom project cost consulting.


GPS Chronicle: The Early History

by Majeed Ahmad on 07-19-2015 at 4:00 am

There is really nothing new about GPS: the technology was reinvented from the old. After satellite communications was established, scientists and engineers started to look for different ways of utilizing this fascinating space marvel. Radio navigation systems such as Loran had been developed during World War II for aircraft operations, and satellite navigation subsequently evolved from this work.

In 1958, the U.S. Navy began building on this radio navigation work to develop a satellite system, “Transit,” for indicating the position of a receiver on the ground. Two years later, the Navy launched the Transit-1B satellite to demonstrate the feasibility of using satellites as navigational aids. A receiver on a ship used the measured Doppler shift of the satellite’s radio signal, along with known characteristics of the satellite orbits, to calculate the ship’s position.


GPS has been reinvented from satellite communications

A practical system was born out of the need of U.S. troops to pinpoint their locations during the Vietnam War. However, this system had limited accuracy and was difficult to use due to its bulky terminals. So, in the mid-1970s, the U.S. Department of Defense began a project to upgrade the navigation devices built around this concept for classified military use. The solution they developed required two dozen satellites, atomic clocks, microwave radio transmitters and some heavy-duty number-crunching hardware.

A more portable unit could now pinpoint an object’s exact location anywhere on the globe by receiving signals from a network of orbiting satellites and triangulating them to determine latitude and longitude. The military called it Navstar, after the satellite constellation it used, but the industry and users ignored this nomenclature, and the technology became known to the world as the Global Positioning System, or GPS.

The operational system contained twenty-one satellites in three orbital planes, plus three spares. This GPS constellation of twenty-four satellites orbited twelve thousand miles above the Earth, constantly transmitting each satellite’s precise time and position in space. With GPS, a receiver on the ground or in the air could calculate its position using time signals from the satellites.

The calculation itself was based on a kind of triangulation (more precisely, trilateration), a math technique used to locate an object based on its distance from three known points. So signals from three satellites were necessary, although in practice a fourth satellite was used to improve the accuracy of the other three signals. The result was that a GPS receiver could produce highly accurate coordinates of latitude, longitude, and altitude.
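The distance-based fix can be sketched in code. This is a toy 2-D version with invented coordinates; real GPS solves in three dimensions plus a receiver clock-bias term, using pseudoranges from four satellites.

```python
# Toy 2-D trilateration: locate (x, y) from distances to three known points.
# Subtracting pairs of circle equations leaves a 2x2 linear system, solved
# here by Cramer's rule. Points and distances below are illustrative.

def trilaterate_2d(p1, d1, p2, d2, p3, d3):
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # (x-xi)^2 + (y-yi)^2 = di^2; subtract equation 1 from 2 and from 3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 - (x1**2 + y1**2) + (x2**2 + y2**2)
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 - (x1**2 + y1**2) + (x3**2 + y3**2)
    det = a1 * b2 - a2 * b1
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Receiver at (1, 2); distances measured from three "satellites"
x, y = trilaterate_2d((0, 0), 5**0.5, (4, 0), 13**0.5, (0, 4), 5**0.5)
print(round(x, 6), round(y, 6))   # → 1.0 2.0
```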


GPS was originally developed for military use

The U.S. Air Force played a crucial role in nurturing the GPS technology by incorporating features like accurate digital maps and satellite photographs. As a result, pilots were able to spot the key target areas and hit them effectively. Precision-guided munitions, dubbed “smart bombs,” increasingly used GPS to home in on a fixed target such as a military installation or an airfield.

Content of this article is based on excerpts from Smartphone: Mobile Revolution at the Crossroads of Communications, Computing and Consumer Electronics.


TSMC (Apple) Update Q2 2015!

by Daniel Nenni on 07-18-2015 at 8:00 pm

The TSMC quarterly conference call was last week and of course it stirred up quite a bit of controversy. Let me share with you my experience, observations, and opinions and maybe together we can come up with an accurate prediction for 2016. First let’s take a look at 20nm and what people now call the “Apple effect.”

Correct me if I’m wrong here but this is how I remember it: The TSMC 20nm process was highly criticized for cost, power leakage, and yield prior to the arrival of the Apple A8 and A8x SoCs. As we now know 20nm was the fastest ramping process in the history of TSMC and the A8 powered iPhone 6 is a huge success. This much is now well documented.

Next came TSMC 16nm. Unfortunately, the first 16nm process did not meet the expectations of the fabless semiconductor ecosystem as compared to Intel and Samsung 14nm. Intel 14nm was faster and denser and Samsung 14nm was lower power. This was clearly a misstep for TSMC but they learned from it and came back with 16FF+ (second generation FinFETs), which is now the best performing process of its kind. TSMC openly makes this claim but I have confirmed it with several early access IP and fabless companies and they would know. 16FF+ based mobile products will hit the market in Q4 2015 and you will be impressed, absolutely.

TSMC 16FF+ does use the same BEOL (back end of line) as 20nm, which is the second half of the chip manufacturing process. The FEOL (front end of line) however is quite different. In fact, you will see a difference between the original TSMC 16nm and 16FF+ which has resulted in a significant PPA improvement (performance, power, and area). So when Morris Chang claims that 16FF+, which is technically their second generation FinFET, will be an even faster ramp than 20nm I believe it to be true.

As I predicted last year, Apple chose Samsung 14nm LPE for the iPhone 6S (A9 SoC) and TSMC 16FF+ for the iPads (A9X). I stand by that prediction even though on the conference call Morris Chang said that in 2016 TSMC 16nm market share will be much greater than “our next competitors.” Given that Apple and Qualcomm, TSMC’s two largest customers, are currently using Samsung 14nm, there is really only one way this prediction can come true: Apple and Qualcomm will use 16FFC (TSMC’s third FinFET generation) for their SoCs in 2016.

TSMC also mentioned that 10nm will be in production in Q1 2017, which supports the above prediction that the iProducts released in 2016 will not be 10nm. The other interesting thing to note is the PPA numbers for 10nm: a 15% speed gain at the same total power, or more than 35% power reduction at the same speed, with a density 2.2 times that of 16nm FinFET. I can tell you that Apple will not accept a 15% speed gain for a new process. I was told that the new 16FFC process due out mid 2016 was built “with” Apple, so I would expect the same for 10nm. 16FF+ provides a 40% higher speed and 60% power savings over 20nm. My prediction is that the Apple version of 10nm for the 2017 iProducts will offer a minimum 25% speed increase.

Sound reasonable?

The conference call transcript is HERE.


Why Drones Love Atmel SAM E70

by Eric Esteve on 07-18-2015 at 7:00 am

Avionics is by nature a mature market requiring validated system solutions: safety is an absolute requirement, and innovative systems go through a stringent qualification phase. That’s why the very fast adoption of drones as an alternative to human-piloted planes is impressive. It took only 10 years or so for drones to be widely developed and used for applications ranging from war to entertainment, at prices ranging from a few hundred dollars to several hundred thousand. But even if we consider consumer-oriented, rather cheap drones, the processing needs require a high-performance yet versatile MCU, able to manage a gyroscope, accelerometer, geomagnetic sensor, GPS, rotational station, 4- to 6-axis control, optical flow and so on.

When I was designing for avionics, namely the electronic engine control for the CFM56 (this engine, jointly developed by GE in the US and Snecma in France, was the worldwide leader, equipping Boeing and Airbus planes), the CPU was a multi-hundred-dollar Motorola 68020, leading to a $20-per-MIPS cost! I don’t know the Atmel SAM E70 price precisely (I would guess it costs a few dollars), but what I do know is that the MCU offers in excess of 600 DMIPS. This very high performance, as well as the very large on-chip memory (up to 384 Kbytes of SRAM and 2 Mbytes of Flash), are the main reasons why this MCU was selected to support the “Drone with integrated navigation control to avoid obstacles and improve stability”.

In fact the key design requirements for this application were: 600+ DMIPS, a camera sensor interface, dual ADC and PWM for motor control, dual CAN and a small package offering. Looking at the block diagram below helps link the MCU features with the various application capabilities: Gyroscope (SPI), Accelerometer (SPI x2), Geomagnetic sensor (I2C x2), GPS (UART), 1- or 2-channel rotational station (UART x2), 4/6-axis control communication (CAN x2), Voltage/current (ADC), Analog sensor (ADC), Optical flow sensor (through the Image Sensor Interface or ISI) and Pulse Width Modulator (PWM x8) to support the rotational station and 4/6-axis speed PWM control.

The SAM E70 is based on the Cortex-M7, a high-performance MCU core that combines strong compute with extensive peripheral sets supporting multi-threaded processes. This multi-thread support will open up many more drone capabilities in the future than simply flying…

Today’s drones are capable of flying or staying stationary, taking pictures or movies… and it is already very impressive to see sub-kilogram devices offering such capabilities! But the drone industry is already preparing the future, with the desire to get more application stacks into drones so they can take on automation, routing, cloud connectivity (when available), 4G/5G, and various optional connectivity to enhance data pulling and posting. Just imagine a small town of a few thousand inhabitants which, for a couple of days or weeks per year because of a special event or simply the holidays, suddenly receives a hundred thousand visitors. These people want to feed their smartphones with multimedia or share the experience live by sending movies or pictures, most of them at the same time. The 4G/5G and cloud infrastructure is not tailored for such a number of people, so the communication system may simply break. This could be fixed simply by sending drones to reinforce the communication infrastructure.

This is just one example of what the advanced usage of drones could be, and these innovative applications will be characterized by a common set of requirements: high processing performance, large SRAM and Flash memory capacity, and extensive peripheral sets supporting multi-threaded processes. The Cortex-M7 ARM-based SAM E70 MCU from Atmel is a good example, offering processing power in excess of 600 DMIPS, large on-chip SRAM (up to 384 Kbytes) and Flash (up to 2 Mbytes), and the ability to manage all sorts of sensors, navigation, automation, servos, motors, routing, adjustments, video/audio, and more.

More products and design kits are available on the Atmel sales portal.

By Eric Esteve from IPNEST


How ARM Implemented a Mali GPU using Logic Synthesis and Place/Route Tools

by Daniel Payne on 07-17-2015 at 12:00 pm

ARM is a well-known semiconductor IP provider and they often create a reference design so that SoC companies can have a starting point to work with. On the GPU side of IP the ARM engineers have an architecture called Mali, and a recent webinar hosted by Synopsys reviewed how the physical design area was minimized by using a combination of tools:

  • Logic Synthesis – Design Compiler Graphical
  • Place/Route – IC Compiler

Front-end design engineers should be attracted to Design Compiler Graphical over the standard Design Compiler tool for logic synthesis because of its promised benefits: improved QoR (up to 10% higher clock frequency), congestion prediction and optimization, floorplan exploration, and physical guidance to IC Compiler that gives 1.5X faster placement.

Pierre-Alexandre Bou-Ach from ARM talked about how the Mali GPUs were designed and optimized for smallest area or lowest power. The ARM Mali-T820 was a GPU optimized for smallest area. The Implementation Reference Methodology (iRM) for the Mali GPU is based on Synopsys tools and shows how to achieve a specific PPA (Power, Performance, Area) result.

Related – Synopsys Eats Their Own Dog food

There are a multitude of both front-end and back-end factors that affect silicon area for a GPU.

For an area-centric design the strategy is to continuously track area using multiple metrics:

  • Core area
  • Die area

    • Physical only cells area
    • Hard macro area
    • Memories area
  • Combinational cells area
  • Repeaters area
  • Sequential standard cells area
  • Standard cells area

Related – ARM A57 (A53) Virtualizer + IP Accelerated = ?

An area Pareto chart shows that the largest area contribution was coming from the combinational cells without repeaters. The grey line is the cumulative area contribution.

An analysis of area by design hierarchy was performed so that any change to the RTL could be directly related to an area impact, and the biggest modules were identified during the earliest stages of development. The placement of blocks within the hierarchy was studied to understand how to minimize repeater insertions. The IC Compiler tool helps in area reduction by reporting why any new cells are being inserted, so for the shader core the new cells added were to fix hold time violations:

Some best practices in the iRM flow when using the 28HPM process node:

  • Apply dont_use constraints on high drive repeaters and complex cells
  • Use memories from the ARM compiler
  • Manage the cell density with placer_max_cell_density_threshold 0.80
  • Design Compiler Graphical

    • Use the SPG flow
    • Try hierarchy reduction and flattening
    • Increase area priority
    • Set a realistic clock latency
    • Use area recovery
  • IC Compiler

    • Control repeater insertion during placement
    • Refine path group control
    • Area recovery enabled
    • Layer optimizations


Using multibit registers (2-bit and 4-bit cells) versus no multibit showed savings of up to 32% in the standard cell implementation. Using ultra-high-density (UHD) memories where appropriate in the shader core provided a 25.46% area reduction of the memory, while using UHD memories in the top-level L2 gave a 16.37% area reduction. The total area reduction from UHD memories was 4.57% for the shader core and 6.87% for the top-level L2.
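The memory-level and block-level numbers relate simply: if UHD memories shrink memory area by some ratio and memory occupies a fraction of the block, the block-level saving is the product of the two. Inverting that (an editor's inference, not a figure from the webinar) hints at how much of each block the memories occupied.

```python
# block_saving = f * memory_saving, where f is the memory's share of the
# block area. Inverting gives the implied memory share. The inference is
# this editor's, derived from the percentages quoted in the article.

def implied_memory_fraction(block_saving, memory_saving):
    return block_saving / memory_saving

shader = implied_memory_fraction(0.0457, 0.2546)   # shader core: 4.57% / 25.46%
l2 = implied_memory_fraction(0.0687, 0.1637)       # top-level L2: 6.87% / 16.37%
print(f"implied memory share: shader ≈ {shader:.0%}, L2 ≈ {l2:.0%}")
```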

Adding up all of the optimizations, the Mali-T820 GPU team was able to achieve >4% area savings across the total cell area, while at the same time leakage power was reduced by >4%.

Summary
ARM has created an iRM flow that provides a reference Mali-T820 design for minimum area when using the Synopsys tools for logic synthesis and place/route. Watch the entire 25 minute archived webinar online here.


GlobalFoundries 22nm FD-SOI: What Happens When

by Paul McLellan on 07-17-2015 at 7:00 am

Earlier in the week I wrote about GlobalFoundries’ announcement of 22nm FD-SOI. At SEMICON West there were three events that filled in some more details. First, on Tuesday, a lunch presentation given by SOITEC, who make the wafer blanks that FD-SOI requires. Then on Wednesday I sat down for an hour with Gary Patton and Subi Kengeri to get more details. And finally, on Wednesday evening there was a meeting with many of the people who are participating in the 22nm FD-SOI ecosystem. See also GlobalFoundries FD-SOI. Yes, it’s true.

Gary Patton used to be the head of R&D at IBM Semiconductor. Since IBM is retaining semiconductor R&D, he was what was called a “voluntary” and could decide whether to remain with IBM or join GlobalFoundries. He decided to join GF as CTO and says he is “all in”. He is impressed with Sanjay Jha (who I assume was also the person who closed the deal to bring him over).

A bit of history. IBM used SOI for its high-end processors, but partially-depleted rather than fully-depleted. That is a process that is very high performance, but expensive and hard to deal with. There is also an RF-SOI process used by both IBM and GlobalFoundries (in Singapore). This has become the substrate of choice for building radios in the modern era of multiband phones. Your phone almost certainly has some inside, since 100% of phones do these days (it is a lot cheaper to manufacture than SiGe or GaAs). The IBM RF process (where they are world leaders) will run in Burlington, Dresden and Singapore. See also GlobalFoundries Adds RF to 28nm.

Gary said that despite deciding primarily to go forward with FinFET, IBM continued to do research on FD-SOI at Albany. Plus, of course, STMicroelectronics developed 28nm FD-SOI, which GlobalFoundries licensed. But when they went to customers, they were told the performance wasn’t high enough.
So they decided to develop a 22nm version with the aim of getting very close to FinFET performance but with a manufacturing cost the same as 28nm, and much lower power. As I said in my earlier blog, there are actually 4 different processes that make up the 22FDX process family, although it is very modular. Each process has a couple of extra masks but it is almost the same basic process.

Why did they do this? After all, GlobalFoundries already has 14nm FinFET (licensed from Samsung). The business driver is that volume and growth are both higher at the low end. Yes, the most advanced application processors for mobile need FinFET, but the price is too high for the mainstream. Think of a cheap application processor with a battery life of a week for emerging markets. So what are the key features:

  • operation as low as 0.4V. It is the only process in the world, including any known to be in development, that can do this
  • integrated RF. The insulating substrate makes this a lot easier
  • body bias allowing for tradeoffs between power and performance under software control
  • up to 70% lower power than 28HKMG
  • performance up to 70% faster than 28HKMG (with FBB at 1.5V, can actually go to 1.8V) although not at the same time as the lowest power
  • 50% fewer immersion layers than FinFET (hence the significantly lower manufacturing cost and lower mask cost)
  • 20% smaller die than 28 planar

So what about availability? The initial PDKs exist and have gone to early customers and IP developers. Because 22FDX is similar to 28nm FD-SOI, doesn’t require double patterning, and doesn’t have the complexity of FinFET, the program is on an accelerated schedule, with design enablement (EDA, IP etc) working in parallel with technology development. They expect early tapeouts soon after the technology is qualified.

So when will that be? They are running internal shuttles already (they call them TQE). They will start external shuttles in Q1 of 2016. Risk production is planned for the end of 2016. Apparently the first silicon run of the “lightning” testchip was closer to the target than anyone had ever seen before, with N transistors on the dot and P just 2% off.

The process will run in the Dresden fab (where Dan has been this week, along with CEO Sanjay Jha, not to mention Angela Merkel). It uses the same toolset as 28nm. It could also run in Malta or East Fishkill, but not Singapore. There is plenty of capacity for high volume customers.

I asked Gary about 10nm and 7nm. He pointed out that with the IBM semiconductor acquisition there is a huge infusion of talent who have done leading-edge TD for decades. Most of them are now in Malta to accelerate 10/7nm, plus there is the Albany Nanotech Center just 20 minutes away.

Later in the evening there was a one hour panel session, moderated by Subi, with:

  • Marie-Noëlle Semeria, CEO, Leti (research on FD-SOI)
  • Paul Boudre, CEO, Soitec (manufacturer of base wafers)
  • Ron Moore, VP Marketing, ARM (physical libraries and microprocessors)
  • Juan Rey, Senior Engineering Director, Mentor Graphics (EDA)
  • Brandon Wang, Group Director, Strategic Programs, Cadence (EDA and IP)
  • Jamil Kawa, Group Director, Synopsys (EDA and IP)
  • Bill Wang, VP and GM, VeriSilicon Holdings (design services)
  • Patrick Soheili, VP, Product Management and Corporate Development, eSilicon (fabless ASIC)
  • Dasaradha Gude, CEO, INVECAS (design services)

I won’t go into what everyone said. The main conclusions were that forward body bias (FBB) is the only thing that requires special attention. Obviously physical verification rule decks need to be created, but since DRC/LVS already supports 20nm planar, 16nm FinFET and 28nm FD-SOI, no issues are anticipated. The IP people all had 28nm FD-SOI experience and also don’t expect any issues. Ron Moore of ARM confirmed that they had PDKs and were investigating the performance of ARM processors (which also means they must have built a preliminary standard cell library).

So, it’s been FD-SOI week all week. Given everything I’ve seen and heard, this is a real announcement of something significant.


How PowerArtist Interfaces with Emulators

by Pawan Fangaria on 07-16-2015 at 5:00 pm

Last month at DAC I saw some of the top innovations in the EDA world. EDA is a key enabler for advances in semiconductor design. Among a number of innovations worth mentioning (about which I blogged just after DAC), the integration of Mentor’s Veloce with ANSYS’ PowerArtist for power analysis of live applications caught my attention. We already know Veloce as a versatile tool for hardware emulation and PowerArtist as a versatile tool for RTL-level power analysis of SoCs. What makes the combination of the two interesting is that the power consumed by a device while actually running an application can be measured and analyzed accurately and much faster. So I was interested in learning how exactly the interface between these tools works.

Before I go into the interface details, let me briefly describe ANSYS PowerArtist’s functionality. PowerArtist provides power analysis of SoCs at the RTL level in different measures, such as average or time-based power, and can also select power-critical vectors. It uses an RTL Power Model (RPM) for RTL-driven physical power integrity, and its PowerArtist Calibrator and Estimator (PACE) technology ensures that early RTL power estimates track the final gate-level power numbers. PowerArtist provides interactive debugging for power and employs various techniques for power reduction at the clock, memory and logic level.

The activity data for computing power is typically acquired from a simulation testbench and stored in files with standard formats such as SAIF (Switching Activity Interchange Format), VCD (Value Change Dump) and FSDB (Fast Signal Database). PowerArtist reads the data from these files for power analysis. Clearly, the file-based interface limits you to post-simulation power analysis and brings its own overhead, making the analysis slow and error prone. Moreover, each of these formats falls short on either accuracy or capacity: SAIF does not include temporal information; VCD has temporal information but is inefficient because it is a textual format; FSDB is both temporal and binary, but its generation slows down emulators and simulators.
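As a toy illustration of why these trade-offs matter (my own sketch, not anything from the PowerArtist codebase), here is how switching activity might be recovered from a minimal VCD fragment. Note that every single value change costs a line of text, which is exactly why the textual format becomes a bottleneck on large designs:

```python
# Toy VCD toggle counter (illustrative only, not a full VCD parser: real
# VCD files add scopes, timescales, vectors and multi-character identifiers).
from collections import defaultdict

def count_toggles(vcd_text):
    """Return {signal_id: toggle_count} from scalar value-change records."""
    last = {}
    toggles = defaultdict(int)
    for line in vcd_text.splitlines():
        line = line.strip()
        # Scalar value change, e.g. '0a' or '1a': value, then 1-char identifier.
        if len(line) == 2 and line[0] in "01xz":
            value, sig = line[0], line[1]
            if sig in last and last[sig] != value:
                toggles[sig] += 1
            last[sig] = value
    return dict(toggles)

sample = """$dumpvars
0a
1b
$end
#10
1a
#20
0a
0b
"""
print(count_toggles(sample))  # {'a': 2, 'b': 1}
```

Even this tiny trace needs one text line per value change; an SoC-scale emulation run produces billions of such events, which is the capacity problem the streaming interface below addresses.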

To overcome these issues, ANSYS and Mentor developed an innovative approach in which the activity data stream of an application running in the Veloce emulator is captured directly by PowerArtist through a streaming interface. By eliminating the file-based interface, both the emulator hardware and the power analysis software run an order of magnitude faster while still reflecting the power actually consumed. A key advantage of this approach is that it enables early RTL power visibility and budgeting for live applications, which is not possible with the traditional file-based approach.

PAVES (PowerArtist Vector Streaming) is an innovative new RTL power socket that connects to emulators and simulators, enabling streaming activity transfer. The PAVES socket interfacing with the Veloce emulator’s DRW (Dynamic Read Waveform) API was demonstrated working well at the 52nd DAC. This enables early gate-level power verification for live applications and, therefore, power budgeting decisions for derivative designs. Since PAVES can process activity in parallel with the application running in Veloce, the power analysis can be much faster while remaining accurate.
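Conceptually, the streaming flow can be sketched like this (a hypothetical illustration only; the event format, energy numbers and function names are invented and are not the actual PAVES or DRW API):

```python
# Hypothetical streaming power accumulator: activity events flow straight
# from the "emulator" into the accumulator, with no intermediate VCD/FSDB
# file on disk. All names and figures here are invented for illustration.

def emulated_activity():
    """Stand-in for an emulator waveform API: yields (time, net, value)."""
    yield (0, "clk", 0)
    yield (5, "clk", 1)
    yield (5, "data", 1)
    yield (10, "clk", 0)
    yield (10, "data", 0)

# Illustrative per-net switching energy in femtojoules per toggle.
ENERGY_FJ = {"clk": 800, "data": 1200}

def stream_power_fj(events):
    """Accumulate toggle energy as events arrive, instead of post-processing a file."""
    last, total_fj = {}, 0
    for _, net, value in events:
        if net in last and last[net] != value:
            total_fj += ENERGY_FJ.get(net, 0)
        last[net] = value
    return total_fj

print(stream_power_fj(emulated_activity()))  # 2800 fJ: clk toggles twice, data once
```

Because the consumer runs in lockstep with the producer, analysis overlaps emulation instead of waiting for a complete dump file, which is the essence of the speedup claimed for the streaming approach.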

This approach to power analysis and budgeting for live applications has been tested by early access partners and customers. The runtime performance improvement over the file-based approach can be up to 4.25x among the designs shown in the table above, without any compromise on RTL-to-gate power accuracy.

With PAVES, PowerArtist can read switching data directly from any supported emulator running a live application, providing visibility into RTL power as well as gate-level power verification without the overhead of file-based transfer. This is another feather in the cap for emulation-based verification of SoCs, whose importance keeps growing.

Read the technical paper on the ANSYS website to learn more about this methodology.
Also read:
Eyes Meet Innovations at DAC
Getting the Best Dynamic Power Analysis Numbers
Benefits of RTL Power Budgeting

Pawan Kumar Fangaria
Founder & President at www.fangarias.com


Leveraging Power Reduction Techniques for MCU Based SoCs

Leveraging Power Reduction Techniques for MCU Based SoCs
by Daniel Nenni on 07-16-2015 at 12:00 pm

Dolphin Integration launched a new 32-bit microcontroller, RISC-351 Zephyr, targeting low-power SoCs for competitive IoT-like markets. It addresses power consumption from three angles: architecture, memory and software.

Architecture Angle
As a reminder, 8-bit versus 16-bit versus 32-bit applies to three dimensions independently: instruction code, addressing space and word width. The Arithmetic Logic Unit (ALU) performs operations on the word width. Thanks to an innovative instruction set and core micro-architecture, the RISC-351 Zephyr offers the unique flexibility of handling 8, 16 and 32-bit words using dedicated instructions and the minimum sufficient data path, in order to achieve low power consumption and small silicon area at the subsystem level (including program and data memories).

Beyond clock gating, which has been carefully implemented so that most functional blocks can be gated separately, RISC-351 Zephyr is available in a Retention Ready (RR) version which supports efficient power gating in ‘Deep Sleep’ mode. The advantage of this mode is that only the registers that hold the information needed to wake up in the same state are kept in retention; all other logic is completely switched off.

Memory Angle
Memories play a major part in the overall power consumption of any microcontroller-based subsystem. The reduced instruction set of Zephyr achieves unsurpassed code density thanks to smart use of variable instruction sizes wherever possible. This enables either adding more functionality to the program or selecting smaller program memories, whether RAMs or NVMs (and thus saving leakage power).

The RISC-351 Zephyr also features an innovative pre-fetch interface dedicated to minimizing the number of accesses to the program memory by eliminating unnecessary ones. The number of accesses is reduced by 15% compared to conventional 32-bit low-power MCUs.

In addition, Dolphin Integration offers an instruction cache controller (R-Stratus-LP) which has been specifically designed to reduce the power consumption and access time of embedded Flash and EEPROM memories by more than three times. The R-Stratus-LP achieves the highest hit rates thanks to its ability to change its associativity ways and line size on the fly.
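To see why tunable associativity and line size matter, here is a toy LRU set-associative cache simulator (my own illustration, not Dolphin’s R-Stratus-LP) showing how the hit rate of the same access trace changes with cache geometry:

```python
# Toy LRU set-associative cache model: returns the hit rate of a byte-address
# trace for a given geometry. Real instruction caches add tags, valid bits
# and critical-word-first fills; this only models hit/miss behavior.
from collections import OrderedDict

def hit_rate(addresses, n_sets, ways, line_size):
    """Simulate an LRU set-associative cache; return fraction of hits."""
    sets = [OrderedDict() for _ in range(n_sets)]
    hits = 0
    for addr in addresses:
        tag = addr // line_size          # which cache line the byte lives in
        idx = tag % n_sets               # which set that line maps to
        s = sets[idx]
        if tag in s:
            hits += 1
            s.move_to_end(tag)           # mark as most recently used
        else:
            if len(s) >= ways:
                s.popitem(last=False)    # evict least recently used line
            s[tag] = True
    return hits / len(addresses)

# A tight 64-byte code loop, fetched twice.
trace = list(range(64)) * 2
print(hit_rate(trace, n_sets=4, ways=2, line_size=8))  # 0.9375
print(hit_rate(trace, n_sets=4, ways=1, line_size=8))  # 0.875
```

With two ways, both conflicting lines of each set stay resident and the second pass hits every time; halving the associativity reintroduces conflict misses. Being able to retune these parameters per workload is the point of the on-the-fly reconfiguration.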

Software Angle
A complete and innovative Integrated Development Environment (IDE) and compiler is essential to fully optimize any MCU subsystem.

The RISC-351 Zephyr is delivered with SmartCC, an innovative compiler and the first in the low-power MCU market to be based on the widely acclaimed LLVM framework. In addition to SmartCC’s broad compatibility with GCC and the latest ANSI C standards, the compiler has been designed to maximize the use of internal registers and thus reduce dynamic power consumption by minimizing the energy spent on memory accesses.

Last but not least, Dolphin Integration enables developers to go further with its new IDE, SmartVision™, which can quantify the energy consumed by each function during program execution, allowing a designer to identify and optimize the most energy-consuming functions.
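In spirit, per-function energy accounting boils down to attributing measured power over time to the function that was executing. A minimal sketch (with invented function names and power figures, not SmartVision’s actual data model) might look like this:

```python
# Hypothetical per-function energy accounting: given samples of
# (function, time_ms, avg_power_mW), rank functions by energy = P * t.
# All names and numbers are invented for illustration.
from collections import defaultdict

def energy_by_function(samples):
    """Return [(function, energy_uJ)] sorted from most to least energy."""
    totals = defaultdict(float)
    for func, time_ms, power_mw in samples:
        totals[func] += power_mw * time_ms   # mW * ms = microjoules
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

trace = [
    ("crc32",      4.0,  2.5),   # 10 uJ
    ("uart_send",  1.0,  1.0),   #  1 uJ
    ("crc32",      2.0,  2.5),   #  5 uJ
    ("sleep_wait", 50.0, 0.01),  #  0.5 uJ
]
print(energy_by_function(trace))
```

Note how a short, power-hungry routine (crc32) can dominate a long low-power wait; ranking by energy rather than by time is what points the designer at the right function to optimize.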

More information on: RISC-351 Zephyr and its IDE SmartVision™


Coventor SEMulator3D, Now With Added Dopant, Diffusion, Illumination and More

Coventor SEMulator3D, Now With Added Dopant, Diffusion, Illumination and More
by Paul McLellan on 07-16-2015 at 7:00 am

Coventor just rolled out the latest version of SEMulator3D, their virtual fabrication tool. Very conveniently it is SEMICON West this week and they have a booth. I dropped by and got a demo from David Fried, Coventor’s CTO, of all the new stuff. He’s very proud of SEMulator3D’s new logo, but mostly he is proud of several major improvements in its ability to do virtual fabrication of wafers. The value proposition of SEMulator3D is that you can avoid the cost, and especially the time, of running a lot of wafers, particularly in design of experiments (DOE) type work where many wafers need to be run with slightly different parameters. He told me that customers say time is probably the biggest factor since, in practice, in a new fab, not running wafers just means the equipment sits idle and the only saving is in materials, so they tend to run wafers of some sort continuously. As we all know, the primary way to get yield up is to run a lot of wafers.

As processes get more complex, and especially more vertical, we need to model very complex structures such as vertical III-V nanowires, octuple patterning and vertical flash. Doing it the old way, by running experimental wafers, is not enough on its own. Cost and development time stretch out, and eliminating systematic structural defectivity becomes the key to a successful ramp. But vertical structures cause problems due to shadowing, complex doping, deep etching and more.

So what’s new in version 5.0?

SEMulator3D has always had a module for handling dopants, basically implant and diffusion, but it was inadequate for the types of processes now being created, where there is a lot of interaction between the physical structure and the electrical effect of the dopants. The new module handles ion implant, thermal diffusion, doped diffusion and doped epitaxy. It also includes visualization of dopant concentration gradients and dopant-type concentrations, as in the diagram below of a 20nm SRAM. For example, look at the NFETs (top row), where you can see the source/drain implants penetrating, shadowed by the gate/spacer structure. In the PFETs (bottom row) you can see the Boron (blue) and how it eventually diffuses out (the Arsenic in the NFETs stays put better due to its diffusion properties).
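For intuition about what the thermal diffusion modeling has to capture, here is a toy 1-D solver for Fick’s second law (dC/dt = D·d²C/dx²) using an explicit finite-difference scheme. SEMulator3D’s solver is of course 3-D and far more complete, so treat this only as an illustration of how an implanted spike spreads during anneal:

```python
# Toy 1-D dopant diffusion: explicit finite-difference update of Fick's
# second law with crude zero-flux (copy) boundaries. Illustrative only;
# units and coefficients are arbitrary.

def diffuse(profile, d_coef, dt, dx, steps):
    """Return the dopant profile after `steps` explicit time steps."""
    c = list(profile)
    r = d_coef * dt / (dx * dx)      # must be <= 0.5 for stability
    assert r <= 0.5, "explicit scheme unstable"
    for _ in range(steps):
        nxt = c[:]
        for i in range(1, len(c) - 1):
            nxt[i] = c[i] + r * (c[i - 1] - 2 * c[i] + c[i + 1])
        nxt[0], nxt[-1] = nxt[1], nxt[-2]   # approximate zero-flux boundaries
        c = nxt
    return c

# A sharp implant spike in the middle of a 21-point grid.
initial = [0.0] * 21
initial[10] = 1.0
final = diffuse(initial, d_coef=1.0, dt=0.1, dx=1.0, steps=50)
print(round(max(final), 3), round(sum(final), 3))
```

After 50 steps the sharp spike has relaxed into a broad bell shape while the total dose stays roughly constant; the interesting part in a real process simulator is doing this in 3-D through arbitrary material boundaries, which is exactly where structure and doping interact.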

Another new module allows analysis of visibility limitations, such as shadowing and off-axis effects. In some cases these off-axis effects are intentional, such as an angled etch (as in the picture on the left below). Sometimes they are unintentional: on a 300mm wafer the central die may have a perfectly vertical etch while near the edge of the wafer there may be a few degrees of error (as in the picture on the right). This sort of analysis can be key to achieving high yield at the edge of the wafer.
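A quick back-of-the-envelope calculation (with my own numbers, not Coventor’s) shows why even a couple of degrees of off-axis error matters: the lateral offset at the bottom of a deep feature grows as depth × tan(tilt):

```python
# Lateral offset of an etched feature's bottom due to an off-axis tilt.
# Geometry only; the depth and tilt values below are illustrative.
import math

def lateral_offset_nm(depth_nm, tilt_deg):
    """Offset of the feature bottom relative to its top, in nm."""
    return depth_nm * math.tan(math.radians(tilt_deg))

# A 2-degree tilt near the wafer edge on a 100 nm-deep etch:
print(round(lateral_offset_nm(100.0, 2.0), 1))  # ~3.5 nm
```

A few nanometers of drift is a large fraction of a contacted feature at advanced nodes, which is why this kind of edge-of-wafer analysis feeds directly into yield.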

The final big change, beyond lots of incremental improvements in user interface, performance and so on, is the ability to link SEMulator3D to other tools. It is the best tool for the type of modeling it does, but there is a whole selection of other tools that handle various detailed analyses, typically starting from a mesh of the structure. SEMulator3D has always supported output interfaces, and SEMulator3D 5.0 adds to the library of available modeling platforms.


Coventor are at SEMICON West at booth 2531.