SemiWiki – Page 811 – The Open Forum for Semiconductor Professionals

August 19, 2014 | August 22, 2024

SEMulator3D: GlobalFoundries Process Variation Reduction

by Paul McLellan on 08-19-2014 at 7:01 am
Categories: Coventor, EDA, Foundries, GlobalFoundries

At SEMICON last month, Rohit Pal of GlobalFoundries gave a presentation on their methodology for reducing process variation. It was titled Cpk Based Variation Reduction: 14nm FinFET Technology.

Capability indices such as Cpk are commonly used to assess the variation maturity of a technology. Cpk looks at a given parameter’s variability and compares the distance to the specification limits with the spread of the process, measured in sigma. The higher the number the better: at 1.33 the process should yield close to 100% (for that parameter) and 2 is the full 6 sigma. Using Cpk makes it easy to track metrics that assess variation improvement for a technology. The indices can also be used as a gating item for technology milestone achievement. However, Cpk is not truly an absolute value; it is a function of the specification limits.
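To make the dependence on specification limits concrete, here is a minimal sketch using the standard textbook definition, Cpk = min(USL − mean, mean − LSL) / (3 × sigma). The gate-height readings and the limits are invented for illustration and are not from the GlobalFoundries presentation.

```python
from statistics import mean, stdev

def cpk(samples, lsl, usl):
    """Return Cpk = min(USL - mean, mean - LSL) / (3 * sigma)."""
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Hypothetical gate-height readings in nm (illustrative numbers only)
heights = [49.0, 50.0, 51.0, 50.0, 49.5, 50.5, 50.0, 50.0]

# Same data, two different sets of specification limits
print(round(cpk(heights, lsl=47.0, usl=53.0), 2))
print(round(cpk(heights, lsl=48.0, usl=52.0), 2))
```

The measurements are identical in both calls; only the limits change, and the index changes with them. That is why a Cpk value means little without the specification it was computed against.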

One of the big challenges in modern processes is that variation at one stage of the process can depend critically on variation at an earlier stage, so the steps cannot be considered individually. Plus, with a fab cycle measured in months and mask costs measured in millions, doing experiments on real silicon is prohibitive. At a high level, the approach GlobalFoundries used is structural simulation with Coventor’s SEMulator3D virtual fabrication platform. By analyzing the output it is possible to assess the knock-on effects of process changes, meaning effects later in the process. It is then possible to see which early factors have a major effect on variation later in the process, and thus where to focus the effort for improvement. On the other hand, factors which make little difference later can be left alone.

Structural simulation in SEMulator3D works by taking a specification of all the process parameters along with the layout data. SEMulator3D then builds up the result of fabricating that layout on the process with those particular parameters. This structural output can then be used to derive electrical and other data. The picture at the top of this blog entry shows some example output, with the gates of the FinFETs in bright green and the fins themselves in purple. SEMulator3D has modules that understand the implications of almost everything that might be used in a process, such as directional deposition, anisotropic etch, chemical mechanical polishing (CMP), implant and so on. Just as in actual fabrication, the virtual fabrication lays the various steps down one after another and builds up the outcome, but in the form of a 3-dimensional model rather than an actual chip, of course. In a lot less time. For no mask or fab charges.

The example that Rohit went into in detail was FinFET gate height. Insufficient gate-height was identified as a yield problem. But gate-height is influenced by many steps (fin definition, dummy poly definition, junction, poly open, work function patterning, tungsten fill, tungsten etching, CMP and probably more). For example, the picture below shows an adjustment made to eSiGe Space RIE (reactive ion etching). After simulating more steps, you can easily see visually a big difference in the eSiGe epitaxy.

For the gate height improvement, a 9-factor, two-level DOE (design of experiments) was executed, and based on the simulation they could determine that fin reveal, poly CMP, poly open CMP and tungsten CMP were statistically significant. So the specification limits were redefined and the variation spread amongst the contributing steps.
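As a rough illustration of how a two-level DOE separates significant factors from the rest, the sketch below runs a full factorial over nine coded factors (2^9 = 512 runs) and estimates each main effect as the mean response at the high level minus the mean at the low level. Several things here are assumptions: the article does not say whether the DOE was full or fractional, the five placeholder factor names and all coefficients are made up, and `simulated_gate_height` is a toy stand-in for a virtual-fabrication run, not a SEMulator3D call.

```python
from itertools import product

# Four names come from the article; f5..f9 are hypothetical placeholders
FACTORS = ["fin_reveal", "poly_cmp", "poly_open_cmp", "tungsten_cmp",
           "f5", "f6", "f7", "f8", "f9"]

# Made-up sensitivities in nm per coded unit (assumption, for illustration)
TRUE_COEFF = {"fin_reveal": 1.5, "poly_cmp": 2.0,
              "poly_open_cmp": 3.0, "tungsten_cmp": 2.5}

def simulated_gate_height(run):
    """Toy stand-in for one virtual-fabrication run; returns gate height in nm."""
    return 50.0 + sum(TRUE_COEFF.get(name, 0.0) * level
                      for name, level in run.items())

def main_effects(factors, response):
    """Run a two-level full factorial and return (run count, main effects)."""
    runs = [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]
    results = [response(run) for run in runs]
    effects = {}
    for name in factors:
        high = [y for run, y in zip(runs, results) if run[name] == 1]
        low = [y for run, y in zip(runs, results) if run[name] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return len(runs), effects

n_runs, effects = main_effects(FACTORS, simulated_gate_height)
print(n_runs)
for name, effect in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:14s} {effect:+.2f}")
```

Running 512 experiments is only practical because each one is a simulation; on real silicon the same matrix would take many lots and many months.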

For example, one step is poly open CMP. The original process had poor yield and an unacceptably low Cpk of 0.36. Adding steps to the process, using a two-level deposition before the first CMP and then doing a second CMP, got the Cpk up to 1.1.
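To get a feel for what moving from 0.36 to 1.1 means, a Cpk value can be converted into an approximate out-of-spec fraction. The sketch below assumes the parameter is normally distributed and counts only the tail beyond the nearer specification limit, which sits 3 × Cpk standard deviations from the mean; real process data may not be normal, so treat the numbers as indicative only.

```python
from math import erfc, sqrt

def tail_fraction(cpk):
    """Fraction of a normal distribution beyond the nearer spec limit,
    which lies 3 * Cpk standard deviations from the mean."""
    z = 3.0 * cpk
    return 0.5 * erfc(z / sqrt(2.0))  # equals the normal CDF at -z

for value in (0.36, 1.1, 1.33, 2.0):
    print(f"Cpk {value}: {tail_fraction(value):.2e} beyond the nearer limit")
```

Under these assumptions a Cpk of 0.36 puts roughly 14% of parts outside the nearer limit, while 1.1 brings that down to about 0.05%.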

The conclusion is that the Cpk approach, along with structural simulations (Coventor’s SEMulator3D) and physical-to-physical, electrical and yield correlations, was used to define specification limits for physical measurements. Gate height variation for 14nm FinFET technology was successfully improved using this methodology.

The slides for Rohit’s presentation are here.


August 19, 2014

Enable a new generation of connected devices?

by Eric Esteve on 08-19-2014 at 4:08 am
Categories: Imagination Technologies, IP

Imagination Technologies has designed a complete environment to address the needs of emerging IoT and other connected devices: FlowCloud. The technology has been engineered by Imagination to optimize device-to-cloud connectivity for embedded applications. FlowCloud is a cloud-based, application-independent development platform, but it’s also a set of core services and supporting infrastructure that form a set of building blocks specifically designed to accelerate the deployment of cloud-based applications. FlowCloud provides a platform that enables rapid construction and management of machine-to-machine and man-to-machine connected services, equally suitable for the hobbyist programmer through to large corporate clients. In other words, anybody can design an IoT or cloud-connected device using FlowCloud. But the platform is also able to handle complex services relying upon subscription, billing and payment mechanisms.

The white paper introducing FlowCloud can be accessed here (you may have to register before downloading).

In the real world, once the concept of the innovative emerging Internet of Things (IoT) or Machine-to-Machine (M2M) cloud-connected devices has been sketched, you want to be able to prototype it, and do it fast. Imagination’s silicon partners have created several low-cost reference platforms with full support for FlowCloud. For example, the chipKIT Wi-Fire development platform from Digilent is an ideal starting point; it uses a PIC32 microcontroller (MCU) with a MIPS microAptiv CPU and boasts on-board Wi-Fi. You can exercise your application, still using FlowCloud. At the heart, there is a set of core services and supporting infrastructure that form a set of building blocks specifically designed to accelerate the deployment of cloud-based applications. State-of-the-art data centres host the FlowCloud platform and supported services, using cluster server technology and built-in redundancy to deliver high reliability and guarantee system uptime.

Let’s take a look at some real case applications.
Case 1 is a Simple home control application. PowerBox is an example application that tracks and controls the electricity consumed by an appliance, enabling running costs to be monitored by the consumer. The application also delivers remote control and scheduling of individual power sockets.

Case 2 is an example of Secure systems, Electronic Healthcare, where high integrity, security and privacy of data are primary considerations. In the example of the SensiumVitals system from Sensium Healthcare, the device itself takes the form of a medical patch worn by the patient which provides live monitoring of vital signs including respiration, heartbeat and body temperature.

Reading the Imagination white paper, you will discover how FlowCloud can be used to build a complete system competing with Apple itself, with Case 3, Cloud Music Service. The service allows full subscriptions, billing and micropayment.

Major features of FlowCloud include device and user management, asynchronous messaging services, event logging, data storage facilities, secure transactions and electronic payments. From the analytics standpoint, a full suite of administration and reporting tools provide dynamic views into the data stored server-side, enabling monitoring and management of all user interactions plus the status of all devices registered to your cloud-based services. Moreover, the tools allow both aggregation and deep analysis of this data, enabling the creation of advanced intelligent services.

If you read this white paper too quickly, you may only get the cloud-based “development platform” view and miss the innovative set of FlowCloud services, including registration, authentication, association, security, notifications, updates and remote control. Imagination also proposes optional plug-in modules to accelerate development, including FlowTalk (VoIP), FlowFunds (electronic payments), FlowMusic (audio subscription services) and many others.

FlowCloud has been designed like a consumer interface: a cloud-based set of services that lets everyone from the hobbyist programmer through to large corporate clients rapidly construct and manage machine-to-machine and man-to-machine connected services. FlowCloud may allow a garage start-up to quickly develop and demonstrate not only the system but also the business model, and exercise it in real life…

From Eric Esteve, IPNEST

Intel 14nm is NOT in Production Yet!

by Daniel Nenni on 08-18-2014 at 2:01 pm
Categories: Foundries, Intel Foundry

Okay, maybe I’m the only one questioning Intel 14nm yield but I think it will be an interesting discussion in the comments section. Here are the questions I would have asked Intel during their recent 14nm PR tour: Has the P1272 process been rolled out to the production fabs in OR, AZ, and Ireland? Is the process officially in production (at Intel this means yield is in a specific range)? Before I share the answers I dug up to those questions, let’s take a look at the slide show Intel presented last November during the analyst meeting. Here are the most interesting yield slides:

Please note that some of the slides have *Forecast at the bottom. Just last week Intel shared an updated yield slide with notably less detail. Wait, is Broadwell really an SoC?

Clearly Intel missed the Q1 2014 “matching yield” projection but the question is why? Given that 14nm is a second generation FinFET process it really boggles the mind why yield is such a challenge. The consensus at SEMICON West last month is that there was a significant materials change at 14nm. If you know more about this please let us know in the comments section. Another slide Intel shared recently also shows a FinFET change which was predicted/discussed by Asen Asenov of GSS: Has Intel Learned from Predictive Simulations?

Gold Standard Simulations (GSS) offers complete solutions for Design Technology Co-Optimisation (DTCO), PDK development and exploration and screening of future technology options. Our tool chain integrates predictive Monte Carlo and statistical TCAD simulations, statistical compact model extraction and high sigma statistical circuit simulation using ‘push button’ cluster-based technology. Our tools are the ‘gold standard’ in terms of physical accuracy, efficiency and usability.

Why is Intel releasing this information now? My guess is that they are under considerable pressure from Wall Street (I have received several calls on it and have another one coming up). The last comment on 14nm production I remember is from BK on the Q2 2014 conference call last month:

“We also expect the first 14-nanometer Broadwell Core M processor-based systems including fanless two-in-ones will be on shelves for the holiday selling season, followed by broader OEM availability in the first half of 2015.”

Since BK is an experienced Intel operations person it would have been nice if he had said, “The P1272 14nm process has been moved from R&D (copy exact) to production fabs in Oregon, Arizona, and Ireland. P1272 is currently in pre-production at those fabs with production targeted by the end of 2014.” It’s all about transparency, Brian, absolutely.

It would also be interesting to know why Intel chose such an aggressive metal fabric for 14nm. Is Intel bound by Moore’s Law and the ability to go where no transistor has gone before? Or was there a technical method in their madness?

Hopefully the foundries will have an easier time with yield since they chose to reuse the 20nm metal fabric for their first FinFET implementation. In the foundry business it’s all about manufacturability and servicing a very large customer base so the method in TSMC’s madness is easy to understand.

Also Read: Intel Versus TSMC 14nm Processes

FD-SOI at 14nm

by Paul McLellan on 08-17-2014 at 7:01 am
Categories: FD-SOI, Foundries, STMicroelectronics

At the recent Semicon West, Michel Haond of ST Microelectronics had a presentation on 14nm FD-SOI, or what they more lengthily call UTBB FD-SOI (which when you expand it all out comes to Ultra Thin Body and Buried-Oxide Fully Depleted Silicon on Insulator). When Chenming Hu (or whoever in his group) came up with the term FinFET it was certainly a much catchier name. Even legendary marketers Intel could only come up with TriGate, which doesn’t seem an improvement to me. Anyway, seems we are stuck with FD-SOI now.

As you probably know (at least if you have been following Semiwiki), bulk transistors ran out of steam at 20nm and we needed new transistor architectures. The two competitors are FinFET and FD-SOI. As ST pointed out, if you turn the FinFET on its side (see the diagram above) then the two transistors are not that different. The problem with bulk planar is that the channel is not well controlled by the gate and so leakage is unacceptably high since it is not possible to truly turn the transistor off. FinFET and FD-SOI both make the channel region very thin, FinFET by putting the gate on both sides (and the top) of the channel, FD-SOI by backing the channel with an insulator so there is no route for leakage current to sneak around the back.

ST Microelectronics has been manufacturing FD-SOI at 28nm and they have also licensed the technology to Samsung (and, perhaps, GlobalFoundries). It certainly seems to be a good way of extending 28nm, getting 20nm performance in a 28nm process. That is important since there is a lot of 28nm capacity but, more importantly, 28nm does not require double patterning.

In the presentation, Michel described 14nm FD-SOI as a 2-D bulk process with the same performance as a 3-D FinFET process. There are some significant potential advantages: a simpler process with fewer steps and fewer masks (but a more expensive base wafer); no channel doping and no pocket implants; and the potential for back-bias control, making it possible to dynamically trade off performance against lower power. ST believe the process is scalable down to 10nm.

The presentation contains a lot of detail about process innovations and process performance boosters that is too specialized to go into unless you are a die-hard TD engineer. But a couple of things to point out: there is local interconnect (middle-of-line, MOL). M1/M2/M3 are double patterned with a 64nm pitch. Higher metal layers have 80nm pitch (or greater) and are single patterned. The N-transistors have a silicon channel with a hafnium oxide/titanium nitride gate stack (HfO2/TiN). The P-transistors have silicon germanium channels.

The process has 18 masks for the FEOL (transistors), 7 masks for MOL (local interconnect) and 27 masks for BEOL (metal fabric) for 11 layers of metal. If you want all the gory details they are in the picture above.

The timeline for all this is: 28nm is available, 14nm is in development now and 10nm will be in R&D during 2015 and 2016. At 14nm, the expected performance improvement over the previous generation is either a 20% speedup or a 30% power reduction at the same speed. At 10nm, either a 20% speedup or a 25% power reduction at the same speed.

Michel’s full presentation is here.


August 17, 2014 | June 14, 2019

Another debug view in the UVM Toolbox

by Don Dingee on 08-17-2014 at 1:00 am
Categories: Aldec, EDA

One of the most endearing qualities of a debug environment for any type of coding is the availability of multiple ways to accomplish a task. Whether the preference is keyboard shortcuts, mouse left-click drill-down and right-click pull-down menus, source code view, hierarchical class view, or graphical relationship view, a good debugger just lets developers be productive.

With a variety of tools arrayed within a debug environment, it is easy to pick and choose the way information is viewed and accessed, and control the level of detail needed. Sometimes, the fastest way is a simplified view of functions and variables. Other times, a more robust view of complex relationships is handy, especially to see interprocedural issues.

Aldec continues their quest to enhance their mixed-language, advanced verification platform, this time with the latest Aldec Riviera-PRO 2014.06 release. Beyond the obligatory gains in performance and language support with each new version, Aldec has been concentrating of late on their visual debug capability for UVM.

In our previous installment on UVM tools from Aldec, we saw the UVM Graph feature which helps visualize relationships within a testbench model. The new UVM Toolbox feature provides the quick and easy version of how to find a component with a simplified, tree-like hierarchy. UVM Toolbox is completely synchronized with UVM Graph, as well as the Class Viewer and HDL Editor, allowing developers to jump between views as desired while retaining context.

The hierarchy reveals parent-child relationships of UVM components easily and clearly. When a component is selected in UVM Toolbox, object properties are displayed. With an emphasis on speed of access and readability, the new view is a solid addition.

Another capability of Riviera-PRO is the waveform viewer, and it has been extended to include support for hierarchical virtual objects. This means that virtual records and arrays can be created, including other virtual objects and named rows. Also added is an antialiasing option in the Analog tab, which can help clean up views of analog waveforms.

Also noteworthy are changes to maintain the integrity of a development environment given what else is going on in the world. For those still clinging to a Windows XP development box, it’s time to move on – this is the first Aldec Riviera-PRO release to declare non-support for Win XP. Also, the OpenSSL library has been updated to the Heartbleed-free 1.0.1g version, as well as updates to the 2.8 version of the OVL library, the 2014.01 version of the OSVVM library, and the 1.2 version of the UVM library (uvm_1_2) included in a precompiled version.

In related news, the educational version of Riviera-PRO EDU is now available on EDA Playground. While it may not have all of the advanced features we’ve been discussing, it is an easy way for students and developers to learn about HDL simulation and debug.

For more on Aldec Riviera-PRO 2014.06, see the What’s New presentation.

Related stories:
Then, Python walked in for verification

Now, even I can spot bad UVM

August 16, 2014

How to Reduce Maximum Power at RTL Stage?

by Pawan Fangaria on 08-16-2014 at 8:30 am
Categories: EDA

Of course that reduction has to stay throughout the design cycle up to layout implementation and fabrication. Since the advent of high-density, mega-functionality SoC designs at advanced nodes and battery-life-critical devices operated at our fingertips, the gap between the SoC power requirement and actual SoC power has only increased. There has been enough emphasis on power reduction techniques such as gate and interconnect capacitance reduction and voltage and frequency scaling, which have reached their limits given the performance and process variation at lower nodes. Then there are effective techniques such as clock and data gating, memory gating, flop sharing and cloning etc. available at RTL to reduce activity. However, how often are these done in the right manner? In order to gain maximum power reduction, they need to be guided by sequential analysis of the design across state boundaries (and their behavior across clock cycles), which can eliminate unnecessary computations and reduce power consumption per operation or spread the operation over a longer time. So, how do we do it?

I learned a great deal from a webinar on the Calypto website about gaining the maximum advantage from these techniques in a manual-cum-automated way. Combinational clock gating, which saves power in flops by eliminating ‘clock’ power in gated flops (without any power saving in downstream logic), is very common in existing synthesis tools and is verifiable by any combinational logic equivalence checker. Data activity can be reduced by sequential data gating, reduction in the number of operators, and operand reordering that pushes the high-activity data operand towards later stages of a complex operation. A significant power saving can be achieved by the ‘flop sharing’ technique, where flops are shared between data and control paths, eliminating redundant flops. Then there is ‘flop cloning’, which reduces activity by cloning high fan-out flops and identifying specific gating conditions. Similarly, reduction in memory activity can be an important source of power reduction, where the memory enable can be shut off during any redundant read or write. The memory can be put in a sleep mode such as ‘light sleep’, ‘deep sleep’ or ‘shut down’ depending upon the situation.

As discussed above, there are very effective power saving techniques, but how best to utilize them to gain the maximum saving in power? Above is an example of sequential clock gating, where the key is to find when a data read or write is going to be redundant and then gate the flop appropriately, thus saving power in the clock as well as the logic. However, in practice such conditions are not so simple to find.

Consider the above circuit; a simple pattern matching tool cannot detect such conditions. It requires mathematical and formal reasoning to find conditions under which writes to a flop never make their way to the design output or the same data value is getting written over and over to the same flop. In other words, a non-pattern-dependent formal approach is required to discover gating conditions.
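One of the conditions named above, the same value being written over and over to the same flop, can be illustrated with a small trace-based sketch. This is not the formal approach the article describes: it only counts redundant writes for one particular stimulus, whereas a formal tool reasons about all possible inputs. The function name and the trace format are assumptions made for this example.

```python
def redundant_writes(trace, reset_value=0):
    """Count enabled cycles that store the value the flop already holds.

    trace: iterable of (enable, data) pairs, one per clock cycle.
    Returns (enabled_cycles, redundant_cycles).
    """
    q = reset_value
    enabled = redundant = 0
    for enable, data in trace:
        if enable:
            enabled += 1
            if data == q:
                redundant += 1  # the clock could have been gated this cycle
            q = data
    return enabled, redundant

# Hypothetical stimulus: (enable, data) per cycle
trace = [(1, 5), (1, 5), (1, 5), (0, 9), (1, 7), (1, 7), (1, 3)]
enabled, redundant = redundant_writes(trace)
print(enabled, redundant)
```

A count like this hints at how much clock power is being wasted on one workload, but only exhaustive analysis can prove that a gating condition is safe for every input sequence.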

The Calypto Power Platform has automatic sequential analysis and optimization capability (vectorless or controlled by user provided switching activity) that performs exhaustive analysis of a design to find all optimization opportunities, computes potential power saving for each of the optimized expressions and determines optimal enable logic that can maximize power saving without impacting area or time.

The Calypto RTL power flow provides very early, fast and accurate feedback on possible power saving in a design along with any area impact, information about complete and incomplete clock-gating expressions and any wasted power. While a complete expression found by RTL sequential analysis is safe to gate a clock to save power, an incomplete expression may change design functionality and hence needs interactive analysis and correction before implementation of clock gating.

Above is an example of an incomplete expression where the value of a signal from the previous cycle is not available, and data and control paths are optimized separately.

Similarly, there is another example of an incomplete expression where registers appear to be in multiple clock domains. It’s unsafe to use a signal from a different clock domain to create a clock-gating expression.

The overall flow is very flexible and robust, providing lint-clean optimized RTL (that takes care of CDC and timing issues) with ECO support and equivalence checking against the original RTL through Calypto’s unique SLEC (Sequential Logic Equivalence Checker) tool. The automated optimization implements gating expressions automatically. If the schedule permits more power optimization, designers can analyze the incomplete expressions, complete them by fixing the RTL and iterate over the flow to gain maximum reduction in power.

This flow, combining manual exploration with automation, was performed on a few TI designs and provided impressive results: 2-3 iterations without any impact on design schedule resulted in overall power savings in the range of 26% to 52%.

Calypto has variants of specialized power estimation and reduction tools for various design needs; PowerPro CG for logic, register and clock-tree; PowerPro MG for memory; PowerPro Adviser for IP core where manual control over design is needed; PowerPro PA for RTL power estimation and analysis of results.

The challenge of power optimization of SoCs and IPs can be addressed by power efficient RTL, and to increase the efficiency of RTL for maximum reduction in power, sequential analysis followed by automated and interactive optimization of RTL is a must. Since the optimization is done at the functional level in RTL without changing the functionality of the design, it stays throughout the design process. More details can be obtained from the on-line webinar, very well presented by Abhishek Ranjan, Sr. Director of Engineering at Calypto.

Cadence Completes Power Signoff Solution with Voltus-Fi

by Paul McLellan on 08-15-2014 at 7:01 am
Categories: Cadence, EDA

You probably remember Cadence introduced Voltus towards the end of last year at their signoff summit. This was aimed at digital designers. Prior to that they had announced Tempus, their static timing analysis tool. More recently they announced Quantus QRC extraction. All of these tools that end in -us have been re-architected to take advantage of large server farms, able to use dozens or even hundreds of cores to handle the largest designs in reasonable speed. These tools are primarily focused on supporting large digital SoCs.

Last week Cadence announced Voltus-Fi to complete their power signoff solution. It is aimed at analog designs and extends the electromigration and IR drop (EMIR) analysis to analog. It provides best-in-class transistor-level EMIR accuracy, especially in advanced node FinFET processes. It uses Cadence’s patented voltage-based iteration method, which requires a smaller memory footprint and runs faster than the industry’s traditional current-based iteration method. Basically it is a transistor level EMIR tool with SPICE level accuracy especially targeted at the most advanced nodes.

As I said at the announcement of Voltus: of course, those tools work just fine in non-advanced nodes too, but at 20nm and 16nm there are FinFETs, double patterning, timing impacts from dummy metal fill, a gazillion corners to be analyzed and so on.

As you would expect, it is fully integrated with Voltus itself, to give a seamless flow for advanced mixed-signal designs that contain both digital and analog blocks. It also leverages Quantus QRC for transistor-level parasitic extraction, the Spectre Accelerated Parallel Simulator and the Spectre Extensive Partitioning Simulator.

It is also fully integrated into the Virtuoso platform for analog and custom block design. EMIR results from Voltus-Fi can be displayed on the real physical layout for quick analysis, debugging and optimization.

All this integration and performance shrinks the power signoff closure cycle. Many designs are more constrained by power and integrity issues than they are by raw performance, not least many of the most advanced chips for mobile where battery life is one of the key features of a device that shows through all the way to the end user. Consumers might not know what microprocessor is in their phone but they certainly know how long the battery lasts before they need to recharge it. Many submarkets of the Internet of Things (IoT) are even more power critical and also typically involve mixed-signal designs incorporating analog blocks (and perhaps sensors too).


August 14, 2014 | June 14, 2019

A Deeper Insight into Quantus QRC Extraction Solution

by Pawan Fangaria on 08-14-2014 at 7:00 pm
Categories: Cadence, EDA

Last month Cadence announced its fastest parasitic extraction tool (minimum 5 times better performance compared to other available tools), which can handle growing design sizes with interconnect explosion, number of parasitics and complexities at advanced process nodes including FinFETs, without impacting accuracy of extraction. Obviously, massive parallelism combining the power of several CPUs is at work, as Cadence also did with the Tempus timing signoff solution and the Voltus power integrity solution, but there is more to read into why it appears to be the best solution positioned for signoff extraction.

For the parasitic extraction to be of signoff quality it needs to be silicon proven, which Quantus QRC Extraction Solution provides with best-in-class accuracy; being fully certified for the ultimate 16nm FinFET process of TSMC. A new high-performance ‘random-walk’ field solver, Quantus FS embedded in Quantus QRC enables it to accurately extract critical nets; benchmark on a 20nm design shows mean of -0.01 and standard deviation of 3.09 compared to field solver on 1000 random nets.
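For readers unfamiliar with the idea behind a random-walk field solver, the sketch below shows the basic principle on a toy problem: the electrostatic potential at a point satisfying Laplace's equation can be estimated by launching many random walks from that point and averaging the boundary values where they land. This is a plain grid-based Monte Carlo walk written for illustration. It makes no claim about how Quantus FS is implemented; production extractors work on real 3D geometry with far more sophisticated walk schemes. The grid size, the boundary setup and the function names are all assumptions.

```python
import random

def potential_by_random_walk(start, size, boundary_value, walks, rng):
    """Estimate the Laplace potential at `start` on a square grid.

    Each walk steps to a random neighbour until it lands on the boundary;
    the average of the boundary values reached estimates the potential.
    """
    total = 0.0
    for _ in range(walks):
        x, y = start
        while 0 < x < size and 0 < y < size:
            dx, dy = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
            x, y = x + dx, y + dy
        total += boundary_value(x, y)
    return total / walks

SIZE = 20  # grid spans 0..SIZE in both directions

def top_plate(x, y):
    """Toy boundary: top edge held at 1 V, the other three edges at 0 V."""
    return 1.0 if y == SIZE else 0.0

estimate = potential_by_random_walk((10, 10), SIZE, top_plate,
                                    walks=4000, rng=random.Random(1))
print(round(estimate, 3))
```

By symmetry the exact answer at the centre of this square is 0.25, so the estimate should land close to that, with statistical noise that shrinks as the number of walks grows. Each walk is independent of the others, which is part of why this style of solver suits parallel hardware.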

Time appears to be the scarcest at the time of signoff and tape-out. The Quantus QRC provides automated incremental extraction for functional ECOs (Engineering Change Orders), such as any routing change in EDI (Encounter Digital Implementation), directly through an integrated database, thus eliminating the need of time consuming full-flat extraction at the chip or block level with every change.

Supporting FinFET process means taking into account many new parameters such as fringe 3D capacitances from gates and fins, new capacitance components to fins from gate thickness, new resistances, external capacitances to M0/V0 MEOL (Middle End of Line) contacts and below M1 FEOL (Front End of Line) features like complex poly structures, raised source and drain, two-step M0 and multi-finger fins with varied pitches and widths. Also, litho bias, corner variations and mask shift variations in BEOL (Back End of Line) process and double patterning technology need to be considered. The increase in parameters, resulting in bigger netlists, design size, interconnect corners (3x more corners with double patterning at 20nm and below) etc., impacts post-layout simulation performance. This requires complex modeling for better accuracy and efficient and faster simulation runs. The Quantus QRC Extraction Solution has a robust 3D modeling framework which provides unmatched accuracy against foundry and ~2x smaller netlist. The tool provides ~2.5x faster simulation run and faster characterization of standard cells, SRAMs and IPs.

The tool provides unique functionalities required for different types of designs such as SerDes, IP/SRAM/bitcell characterization, memory, powerMOS, image sensors, custom/analog and RF designs. It has unique capabilities for substrate noise analysis (SNA) with a full 3D substrate model, extraction of inductance and analysis of parasitic impact on clock and long nets in designs at ~100GHz, support of Partial Element Equivalent Circuit (PEEC) method and mutual and self-inductance, RC and RCLK reduction that can reduce simulation time by an order of ~20x, and meshR (used for powerMOS) providing better accuracy for irregular or wide metal shapes (large grids being at the center of the die and fine grids near contacts, edges and corners) and higher speed of simulation using adaptive meshing technique which reduces the number of resistances. A 3DIC using TSVs can be extracted precisely with this tool.

The Quantus QRC is closely integrated with Virtuoso ADE environment which provides early visibility into parasitics at the schematic level through in-design extraction of partial layout which can be easily generated from Virtuoso ADE. This helps in better correlation between schematic and post-layout simulation, thus reducing design iterations and aiding in faster design convergence.

The Quantus QRC Extraction Solution is integrated with all P&R tools, virtual prototyping and analysis tools, and signoff tools. Using the same extraction engine during implementation and signoff provides better correlation and faster design closure. Users working in the Encounter Digital Implementation System get single-click execution for all extraction models.

Coming back to massive parallelism, what’s special about it? Performance scales linearly with the number of CPUs, which is not common with other architectures. It scales for multi-corner simulation runs as well; the icing on the cake is that the tool runs 2-3x faster in the multi-corner case. The Cadence proprietary parallel architecture allows scaling to an unlimited number of CPUs and machines as the SoC size increases, thus providing the highest capacity and performance.

The Quantus QRC Extraction Solution is a best-in-class technology for parasitic extraction and analysis for analog, digital and AMS SoCs employing today’s advanced node technologies. Its in-design integration with both analog and digital platforms, along with a state-of-science field solver, provides silicon-proven accuracy with faster design convergence and better correlation. More details can be obtained from a whitepaper written by Hitendra Divecha, Product Marketing at Cadence. The whitepaper has details of encouraging benchmarks for various steps in the overall design process.

Also read –
https://www.semiwiki.com/forum/content/3665-cadence-announces-quantus-next-generation-extraction.html

http://www.cadence.com/Community/blogs/ii/archive/2014/07/14/quantus-qrc-massive-parallelism-extracts-accurate-parasitics-quickly.aspx?postID=1335602

When TSMC advocates FD-SOI…

by Eric Esteve on 08-14-2014 at 1:00 pm
Categories: FD-SOI, Foundries, STMicroelectronics, TSMC

I recently found a patent granted to TSMC (May 14, 2013), “Planar Compatible FDSOI Design Architecture”. The following sentences, extracted directly from this patent, advertise FDSOI design better than a commercial promotion! “Devices formed on SOI substrates offer many advantages over their bulk counterparts, including absence of reverse body effect, absence of latch-up, soft-error immunity, and elimination of junction capacitance typically encountered in bulk silicon devices. SOI technology therefore enables higher speed performance, higher packing density, and reduced power consumption.” Nothing new here for Semiwiki readers… except that this enumeration of the advantages of SOI technology with respect to bulk planar is coming from TSMC…

In fact, the sentence mentions “SOI substrates”, but when you look at the next paragraph, you find the definitions of the partially-depleted (PD) SOI transistor and the fully-depleted (FD) SOI transistor, and their respective behaviors and advantages:

A PDSOI transistor is formed in an active region with an active layer thickness that is larger than the maximum depletion width. The PDSOI transistor therefore has a partially depleted body. PDSOI transistors have the merit of being highly manufacturable, but they suffer from floating body effects. Digital circuits, which typically have higher tolerance for floating body effects, may employ PDSOI transistors.
A FDSOI transistor is formed in an active region with an active layer thickness that is smaller than the maximum depletion width. FDSOI transistors avoid problems of floating body effects with the use of a thinner active layer thickness or a lighter body doping. Generally, analog circuitry performs better when designed using FDSOI devices than using PDSOI devices.
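
The PD/FD distinction above comes down to comparing the active layer thickness with the maximum depletion width. As a rough illustration only, here is a minimal Python sketch that uses the standard textbook expression for the maximum depletion width of a uniformly doped p-type silicon body, W_max = sqrt(4·ε_Si·φ_F / (q·N_A)) with φ_F = (kT/q)·ln(N_A/n_i). The function names, the uniform-doping assumption and the approximate intrinsic carrier density are assumptions made for this example; none of them come from the patent.

```python
import math

Q = 1.602176634e-19        # elementary charge, C
K_B = 1.380649e-23         # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12   # vacuum permittivity, F/m
EPS_SI = 11.7 * EPS_0      # permittivity of silicon, F/m
N_I = 1.0e16               # approx. intrinsic carrier density of Si near 300 K, m^-3


def max_depletion_width(n_a_cm3, temp_k=300.0):
    """Textbook maximum depletion width (in metres) for a uniform p-type body.

    n_a_cm3 is the acceptor doping in cm^-3 (illustrative parameter name).
    """
    n_a = n_a_cm3 * 1e6  # convert cm^-3 to m^-3
    phi_f = (K_B * temp_k / Q) * math.log(n_a / N_I)  # Fermi potential, V
    return math.sqrt(4.0 * EPS_SI * phi_f / (Q * n_a))


def classify_soi(t_si_nm, n_a_cm3):
    """Return 'FDSOI' if the active layer is thinner than W_max, else 'PDSOI'."""
    w_max_nm = max_depletion_width(n_a_cm3) * 1e9
    return "FDSOI" if t_si_nm < w_max_nm else "PDSOI"


if __name__ == "__main__":
    for t_si_nm, n_a in [(7, 1e17), (150, 1e17), (50, 1e18)]:
        w_nm = max_depletion_width(n_a) * 1e9
        print(f"t_si={t_si_nm} nm, N_A={n_a:.0e} cm^-3, "
              f"W_max={w_nm:.0f} nm -> {classify_soi(t_si_nm, n_a)}")
```

Under these assumptions, lowering the body doping widens the maximum depletion width, which matches the patent's remark that full depletion can be reached with either a thinner active layer or a lighter body doping.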

To illustrate this patent, TSMC refers to a baseband IC for mobile applications, or maybe an integrated baseband and application processor. In both cases many of the integrated IP blocks, like memory cells or high-speed SerDes, are based on analog circuitry, so FDSOI clearly appears to be the best choice.

You may wonder why TSMC is promoting FDSOI so highly, as we know that the foundry has not selected this technology. TSMC is supporting 28nm bulk planar, then 20nm (including double patterning for critical layers) and 16nm FinFET. So why is TSMC advertising FDSOI like this? Reading further, we can see:

“An FDSOI ASIC design in the same footprint as a bulk planar ASIC design provides several advantages over the bulk planar ASIC design. Adaptive body bias techniques are inefficient with bulk planar designs because of the PN junction forward bias issue and because junction leakage increases in the reverse bias condition. Therefore, planar technologies have to adopt voltage scaling techniques for power savings in single Vt designs.”

It looks like TSMC wants to demonstrate that an FDSOI design can be ported to a bulk planar technology, provided that the power rails have been carefully designed; this requirement is extensively described within the patent (in fact, it’s the core of the patent). We have highlighted in Semiwiki one of the important advantages linked with FDSOI technology: a dual Vt library can support a complete SoC design, allowing cost savings (fewer masks and process steps) and faster process turnaround time when compared with four Vt on bulk planar, the only bulk option to offer the same level of power savings as FDSOI.

But we still don’t know why TSMC has filed this patent. Is it because the company is willing to offer FDSOI as an additional process option to existing customers? In this case, this patent could be a way to minimize risk, showing a customer moving to FDSOI that they could decide to come back to a bulk planar option with no redesign, because the “FDSOI ASIC design is in the same footprint as a bulk planar ASIC design”. By the way, TSMC offering an FDSOI process option would be a scoop…

Another possibility is that TSMC is not willing to support FDSOI, but certain existing ASIC customers are willing to try FDSOI with TSMC’s competition. This patent would allow TSMC to keep the door open, so that these customers could come back to a bulk planar ASIC processed at TSMC. This approach would be like double sourcing, but between bulk planar and FDSOI.

TSMC has certainly looked carefully at FDSOI as a technology option, even if so far the company doesn’t support it. I am happy to see that a TSMC patent highlights the many technical advantages of FDSOI vs bulk planar, like absence of reverse body effect, absence of latch-up, soft-error immunity, and elimination of junction capacitance. To this list of advantages we can add potential cost savings (when SOI wafer prices go down), faster wafer fab cycle time and, probably the most important, far better power efficiency, whether the SoC is designed for networking infrastructure or a mobile application processor. Will all these advantages be enough to compensate for some current weaknesses, like customer fear of innovation and a work-in-progress IP ecosystem, and finally push TSMC to join the ST and Samsung train?

From Eric Esteve of IPNEST

Transaction-based Emulation

by Paul McLellan on 08-14-2014 at 7:01 am
Categories: EDA, Synopsys

Verification has been going through a lot of changes in the last couple of years. Three technologies that used to be largely contained in their own silos have come together: simulation, emulation and virtual-platforms.

Until recently, the workhorse verification tool was simulation. Emulation had its place but limits on capacity and its high cost, and difficulty of use, kept it from the mainstream. Virtual platforms had their niche but the modeling challenge meant that they were not nearly as widely used as they could have been.

Then simulation ran out of steam. State of the art SoCs were just too big for simulation. And we are not talking about gate-level simulation here, which ran out of steam years ago. This is RTL simulation. At the same time emulation technology improved in terms of both capacity and usability. It used to be a multi-week or even multi-month project to move a design onto an emulator and get everything up and running. Also, the cost came down, and the ability to share an emulator among multiple users at the same time further reduced the amortized cost. I have seen statements that emulation is now the cheapest verification cycle you can get, compared to running simulation on server farms. I don’t know if that is strictly true but it seems to be getting into the same ballpark. I moderated a panel on emulation at DAC last year with companies like TI and Broadcom on the panel. They all used emulation extensively and their only real problem was not being able to get enough of it. But there is never enough time and money to do all the verification you might like on a modern SoC.

It turned out that once people had emulators, the modeling problem for virtual platforms could be made to go away. Instead of hand-crafting behavioral or transaction level models and then trying to keep them synchronized with the RTL, it became possible to just use the RTL. Run the processor and its associated software load using the virtual platform technology but run the rest of the design by compiling the RTL into an emulator.

As you probably remember, Synopsys acquired Eve and its Zebu emulation product line last year. With various flavors of VCS they already had RTL simulation, of course. Plus, 3 or so years ago, Synopsys acquired Virtio, VaST and CoWare, giving them virtual platform technology. Now, with a lot more integration work having been done, Synopsys has new capabilities, most of which they market under the brand name Verification Compiler.

A couple of days ago Synopsys had a webinar, Creating a High-performance Transaction-based Emulation Environment (yes, I know it would have been better to put this out a couple of days ago instead of today, but migraine struck. But there is a replay). Transaction-based emulation or TBE has become an increasingly popular method for utilizing emulators because of the high verification performance and flexibility in connecting to existing environments. Achieving high performance requires a combination of the emulator’s capabilities and tuning the environment that drives it to avoid bottlenecks. This tutorial will explain the necessary components and techniques to create a high performance emulation environment.
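
To give a feel for why working at the transaction level helps avoid bottlenecks, here is a small Python sketch. It is a toy model only. It assumes a hypothetical burst-write bus protocol (one address cycle, one data cycle per word, one idle cycle) and compares how often the host testbench and the emulator would have to exchange data when driven cycle by cycle versus one message per transaction. The class and function names are invented for illustration and do not correspond to any Zebu or Synopsys API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BurstWrite:
    """A hypothetical bus transaction: one address plus a list of data words."""
    address: int
    data: List[int]


def expand_to_cycles(txn: BurstWrite) -> List[Tuple[str, Optional[int]]]:
    """What a transactor would do on the emulator side in this toy model:
    unpack one transaction into per-clock pin activity."""
    cycles: List[Tuple[str, Optional[int]]] = [("ADDR", txn.address)]
    cycles += [("DATA", word) for word in txn.data]
    cycles.append(("IDLE", None))
    return cycles


def signal_level_exchanges(txns: List[BurstWrite]) -> int:
    """Cycle-by-cycle driving: host and emulator synchronize on every clock."""
    return sum(len(expand_to_cycles(t)) for t in txns)


def transaction_level_exchanges(txns: List[BurstWrite]) -> int:
    """Transaction-based driving: one message per transaction crosses the link."""
    return len(txns)


if __name__ == "__main__":
    workload = [BurstWrite(address=0x1000 + 64 * i, data=list(range(16)))
                for i in range(100)]
    sig = signal_level_exchanges(workload)
    tbe = transaction_level_exchanges(workload)
    print(f"signal-level exchanges:      {sig}")
    print(f"transaction-level exchanges: {tbe}")
    print(f"reduction factor:            {sig / tbe:.0f}x")
```

In this toy model the reduction factor is just the number of clock cycles per transaction, so longer bursts benefit more. Real environments have other bottlenecks that this sketch does not capture.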

The webinar was presented by Lance Tamura, who is the CAE manager for the Zebu emulator.

The replay for the webinar is available here (registration).

And the non-silicon-valley SNUGs are coming up (with a bit better notice than for the webinar):

Boston on September 11th
Austin on September 23rd
Ottawa on October 8th

More articles by Paul McLellan…