
FD-SOI at 14nm
by Paul McLellan on 08-17-2014 at 7:01 am

At the recent Semicon West, Michel Haond of ST Microelectronics had a presentation on 14nm FD-SOI, or what they more lengthily call UTBB FD-SOI (which when you expand it all out comes to Ultra Thin Body and Buried-Oxide Fully Depleted Silicon on Insulator). When Chenming Hu (or whoever in his group) came up with the term FinFET it was certainly a much catchier name. Even legendary marketers Intel could only come up with TriGate, which doesn’t seem an improvement to me. Anyway, seems we are stuck with FD-SOI now.


As you probably know (at least if you have been following Semiwiki), bulk transistors ran out of steam at 20nm and we needed new transistor architectures. The two competitors are FinFET and FD-SOI. As ST pointed out, if you turn the FinFET on its side (see the diagram above) then the two transistors are not that different. The problem with bulk planar is that the channel is not well controlled by the gate and so leakage is unacceptably high since it is not possible to truly turn the transistor off. FinFET and FD-SOI both make the channel region very thin, FinFET by putting the gate on both sides (and the top) of the channel, FD-SOI by backing the channel with an insulator so there is no route for leakage current to sneak around the back.

ST Microelectronics has been manufacturing FD-SOI at 28nm and they have also licensed the technology to Samsung (and, perhaps, GlobalFoundries). It certainly seems to be a good way of extending 28nm, getting 20nm performance in a 28nm process. That is important since there is a lot of 28nm capacity but, more importantly, 28nm does not require double patterning.


In the presentation, Michel described 14nm FD-SOI as a 2-D bulk process with the same performance as a 3-D FinFET process. There are some significant potential advantages: a simpler process with fewer steps and fewer masks (but a more expensive base wafer); no channel doping and no pocket implants; and back-bias control, which allows performance to be dynamically traded against power. ST believes the process is scalable down to 10nm.
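The back-bias knob can be sketched numerically. Note that the body factor used below is an illustrative assumption for a UTBB device, not a figure from ST's presentation:

```python
# Toy model of threshold-voltage modulation via back bias in FD-SOI.
# The body factor (Vt shift per volt of back bias) is an assumed,
# illustrative value; real values depend on BOX thickness and process.

def vt_with_back_bias(vt0, vbb, body_factor=0.085):
    """Forward back bias (vbb > 0) lowers Vt (faster, leakier);
    reverse back bias (vbb < 0) raises Vt (slower, less leakage)."""
    return vt0 - body_factor * vbb

# Nominal Vt of 0.40 V, with a +/-1 V back-bias range:
fast = vt_with_back_bias(0.40, +1.0)   # 0.315 V: performance mode
slow = vt_with_back_bias(0.40, -1.0)   # 0.485 V: low-leakage mode
```

The point is that the same die can be shifted between a fast/leaky and a slow/low-leakage operating point at run time, something bulk planar cannot do efficiently.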


The presentation contains a lot of detail about process innovations and performance boosters that are too specialized to go into unless you are a die-hard TD engineer. But a couple of things are worth pointing out: there is local interconnect (middle-of-line, MOL). M1/M2/M3 are double patterned with a 64nm pitch; higher metal layers have an 80nm pitch (or greater) and are single patterned. The N-transistors have a silicon channel with a hafnium-oxide/titanium-nitride gate stack (HfO2/TiN); the P-transistors have silicon-germanium channels.


The process has 18 masks for the FEOL (transistors), 7 masks for MOL (local interconnect) and 27 masks for BEOL (metal fabric) for 11 layers of metal. If you want all the gory details they are in the picture above.
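The mask budget above sums up as follows (simple arithmetic on the quoted counts):

```python
# Mask counts quoted for the 14nm FD-SOI process
feol_masks = 18   # FEOL: transistors
mol_masks = 7     # MOL: local interconnect
beol_masks = 27   # BEOL: 11 layers of metal
total = feol_masks + mol_masks + beol_masks
print(total)  # 52 masks in total
```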


The timeline for all this is: 28nm is available, 14nm is in development now, and 10nm will be in R&D during 2015 and 2016. At 14nm, the expected improvement over the previous generation is either a 20% speedup or a 30% power reduction at the same speed. At 10nm, either a 20% speedup or a 25% power reduction at the same speed.
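Under the usual dynamic-power approximation P ∝ C·V²·f, a quoted power reduction at constant speed implies a supply-voltage scaling. This back-of-the-envelope mapping is my own assumption, not something from the presentation:

```python
import math

# Dynamic power P = a*C*V^2*f. At the same frequency and capacitance,
# a power ratio P2/P1 implies a supply ratio of sqrt(P2/P1).
def implied_vdd_scaling(power_ratio):
    return math.sqrt(power_ratio)

# 14nm claim: 30% power reduction at the same speed
print(round(implied_vdd_scaling(0.70), 3))  # 0.837, i.e. ~16% lower Vdd
# 10nm claim: 25% power reduction at the same speed
print(round(implied_vdd_scaling(0.75), 3))  # 0.866, i.e. ~13% lower Vdd
```

In practice the savings come from a mix of voltage scaling, capacitance reduction and back-bias, so this is only an order-of-magnitude sanity check.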

Michel’s full presentation is here.

See also:
FD-SOI: 20nm Performance at 28nm Cost
FD-SOI Better Than FinFET?

Keywords: FD-SOI, Cost, FinFET


More articles by Paul McLellan…


Another debug view in the UVM Toolbox
by Don Dingee on 08-17-2014 at 1:00 am

One of the most endearing qualities of a debug environment for any type of coding is the availability of multiple ways to accomplish a task. Whether the preference is keyboard shortcuts, mouse left-click drill-down and right-click pull-down menus, a source code view, a hierarchical class view, or a graphical relationship view, a good debugger just lets developers be productive.

With a variety of tools arrayed within a debug environment, it is easy to pick and choose the way information is viewed and accessed, and control the level of detail needed. Sometimes, the fastest way is a simplified view of functions and variables. Other times, a more robust view of complex relationships is handy, especially to see interprocedural issues.

Aldec continues their quest to enhance their mixed-language, advanced verification platform, this time with the latest Aldec Riviera-PRO 2014.06 release. Beyond the obligatory gains in performance and language support with each new version, Aldec has been concentrating of late on their visual debug capability for UVM.

In our previous installment on UVM tools from Aldec, we saw the UVM Graph feature which helps visualize relationships within a testbench model. The new UVM Toolbox feature provides the quick and easy version of how to find a component with a simplified, tree-like hierarchy. UVM Toolbox is completely synchronized with UVM Graph, as well as the Class Viewer and HDL Editor, allowing developers to jump between views as desired while retaining context.

The hierarchy reveals parent-child relationships of UVM components easily and clearly. When a component is selected in UVM Toolbox, object properties are displayed. With an emphasis on speed of access and readability, the new view is a solid addition.

Another capability of Riviera-PRO is the waveform viewer, and it has been extended to include support for hierarchical virtual objects. This means that virtual records and arrays can be created, including other virtual objects and named rows. Also added is an antialiasing option in the Analog tab, which can help clean up views of analog waveforms.

Also noteworthy are changes to maintain the integrity of a development environment given what else is going on in the world. For those still clinging to a Windows XP development box, it's time to move on: this is the first Aldec Riviera-PRO release to drop support for Windows XP. The OpenSSL library has been updated to the Heartbleed-free 1.0.1g version, and the release also includes updates to version 2.8 of the OVL library, the 2014.01 version of the OSVVM library, and a precompiled version 1.2 of the UVM library (uvm_1_2).

In related news, the educational version of Riviera-PRO EDU is now available on EDA Playground. While it may not have all of the advanced features we’ve been discussing, it is an easy way for students and developers to learn about HDL simulation and debug.

For more on Aldec Riviera-PRO 2014.06, see the What’s New presentation.

Related stories:
Then, Python walked in for verification

Now, even I can spot bad UVM


How to Reduce Maximum Power at RTL Stage?
by Pawan Fangaria on 08-16-2014 at 8:30 am

Of course, any power reduction achieved at RTL has to persist throughout the design cycle, all the way to layout implementation and fabrication. Since the advent of high-density, mega-functionality SoC designs at advanced nodes, and of battery-life-critical devices operated by our fingertips, the gap between an SoC's power budget and its actual power has only widened. There has been plenty of emphasis on power reduction techniques such as gate and interconnect capacitance reduction and voltage and frequency scaling, but these have reached their limits given the performance and process variation at lower nodes. Then there are effective techniques available at RTL to reduce activity, such as clock and data gating, memory gating, and flop sharing and cloning. However, how often are these applied in the right manner? To gain maximum power reduction, they need to be guided by sequential analysis of the design across state boundaries (and its behavior across clock cycles), which can eliminate unnecessary computations and reduce power consumption per operation, or spread an operation over a longer time. So, how do we do it?

I learned a great deal from a webinar on the Calypto website about gaining the maximum advantage from these techniques in a combined manual and automated flow. Combinational clock gating, which saves power in flops by eliminating 'clock' power in gated flops (without any power saving in downstream logic), is very common in existing synthesis tools and is verifiable by any combinational logic equivalence checker. Data activity can be reduced by sequential data gating, by reducing the number of operators, and by operand reordering, pushing high-activity data operands towards the later stages of a complex operation. Significant power can be saved by the 'flop sharing' technique, where flops are shared between data and control paths, eliminating redundant flops. Then there is 'flop cloning', which reduces activity by cloning high-fan-out flops and identifying specific gating conditions. Similarly, reducing memory activity can be an important source of power reduction: the memory enable can be shut off during any redundant read or write, and the memory can be put into a 'light sleep', 'deep sleep' or 'shut down' mode depending on the situation.

As discussed above, these are very effective power saving techniques, but how do we best utilize them to gain the maximum power saving? Above is an example of sequential clock gating, where the key is to find when a data read or write is going to be redundant and then gate the flop appropriately, saving power in the clock as well as the logic. In practice, however, finding such conditions is not so simple.
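The idea behind sequential gating can be sketched with a toy model (my own illustration, not Calypto's algorithm): suppress the clock to a register whenever the incoming value equals the stored one, and count the clock events saved.

```python
# Toy illustration of sequential data/clock gating: suppress the clock
# to a register when the write would be redundant (the same value being
# rewritten) and count how many clock events are saved. A real tool uses
# formal sequential analysis to prove such conditions; this only shows
# why the opportunity exists.

def simulate(writes, gated):
    stored, clocked = None, 0
    for value in writes:
        if gated and value == stored:
            continue          # redundant write: clock is gated off
        clocked += 1          # flop is clocked and captures the value
        stored = value
    return clocked

stream = [3, 3, 3, 7, 7, 1, 1, 1, 1, 4]
print(simulate(stream, gated=False))  # 10 clock events
print(simulate(stream, gated=True))   # 4 clock events (captures 3, 7, 1, 4)
```

For low-activity data streams like this one, most clock toggles (and the downstream logic activity they trigger) are simply wasted.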

Consider the above circuit; a simple pattern matching tool cannot detect such conditions. It requires mathematical and formal reasoning to find conditions under which writes to a flop never make their way to the design output or the same data value is getting written over and over to the same flop. In other words, a non-pattern-dependent formal approach is required to discover gating conditions.

The Calypto Power Platform has automatic sequential analysis and optimization capability (vectorless, or controlled by user-provided switching activity) that performs exhaustive analysis of a design to find all optimization opportunities, computes the potential power saving for each optimized expression, and determines the optimal enable logic that maximizes power saving without impacting area or timing.

The Calypto RTL power flow provides very early, fast and accurate feedback on possible power saving in a design along with any area impact, information about complete and incomplete clock-gating expressions and any wasted power. While a complete expression found by RTL sequential analysis is safe to gate a clock to save power, an incomplete expression may change design functionality and hence needs interactive analysis and correction before implementation of clock gating.

Above is an example of incomplete expression where value of a signal from previous cycle is not available, and data and control paths are optimized separately.

Similarly, there is another example of incomplete expression where registers appear to be in multiple clock domains. It’s unsafe to use a signal from different clock domain to create clock-gating expression.

The overall flow is flexible and robust, producing lint-clean optimized RTL (which takes care of CDC and timing issues) with ECO support and equivalence checking against the original RTL through Calypto's unique SLEC (Sequential Logic Equivalence Checker) tool. The automated optimization implements gating expressions automatically. If the schedule permits more power optimization, designers can analyze the incomplete expressions, complete them by fixing the RTL, and iterate over the flow to gain the maximum reduction in power.

This flow, combining manual exploration with automation, was run on a few TI designs and produced impressive results: 2-3 iterations, with no impact on the design schedule, resulted in overall power savings in the range of 26% to 52%.

Calypto has variants of specialized power estimation and reduction tools for various design needs: PowerPro CG for logic, registers and clock-tree; PowerPro MG for memory; PowerPro Adviser for IP cores where manual control over the design is needed; and PowerPro PA for RTL power estimation and analysis of results.

The challenge of power optimization of SoCs and IPs can be addressed by power efficient RTL, and to increase the efficiency of RTL for maximum reduction in power, sequential analysis followed by automated and interactive optimization of RTL is a must. Since the optimization is done at the functional level in RTL without changing the functionality of the design, it stays throughout the design process. More details can be obtained from the on-line webinar, very well presented by Abhishek Ranjan, Sr. Director of Engineering at Calypto.

More Articles by Pawan Fangaria…..


Cadence Completes Power Signoff Solution with Voltus-Fi
by Paul McLellan on 08-15-2014 at 7:01 am

You probably remember Cadence introduced Voltus towards the end of last year at their signoff summit. It was aimed at digital designers. Prior to that they had announced Tempus, their static timing analysis tool. More recently they announced Quantus QRC extraction. All of these tools ending in -us have been re-architected to take advantage of large server farms, using dozens or even hundreds of cores to handle the largest designs at reasonable speed. These tools are primarily focused on supporting large digital SoCs.

Last week Cadence announced Voltus-Fi to complete their power signoff solution. It is aimed at analog designs and extends the electromigration and IR drop (EMIR) analysis to analog. It provides best-in-class transistor-level EMIR accuracy, especially in advanced node FinFET processes. It uses Cadence’s patented voltage-based iteration method, which requires a smaller memory footprint and runs faster than the industry’s traditional current-based iteration method. Basically it is a transistor level EMIR tool with SPICE level accuracy especially targeted at the most advanced nodes.
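Whatever the solver internals, the underlying IR-drop computation on a power rail reduces to Ohm's law over a resistive network. Here is a minimal 1-D sketch of that idea (my simplification; it is not Cadence's voltage-based iteration method, and real tools solve enormous 3-D networks):

```python
# Minimal IR-drop sketch: a power rail modeled as series resistances,
# with a cell tapping current at each node. The drop at node k is the
# sum over upstream segments of (segment R) * (current through it).

def ir_drop(seg_res, tap_currents):
    """seg_res[i]: resistance of the segment feeding node i (ohms).
    tap_currents[i]: current drawn at node i (amps)."""
    drops, total = [], 0.0
    remaining = sum(tap_currents)
    for r, i_tap in zip(seg_res, tap_currents):
        total += r * remaining   # all downstream current crosses this segment
        drops.append(total)
        remaining -= i_tap
    return drops

# Four cells drawing 1 mA each through 0.1-ohm rail segments:
print(ir_drop([0.1] * 4, [1e-3] * 4))
# drops of 0.4 mV, 0.7 mV, 0.9 mV, 1.0 mV along the rail
```

Cells far from the supply pin see the largest drop, which is why EMIR analysis has to be done against the real physical layout rather than an idealized supply.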


As I said at the announcement of Voltus: of course, those tools work just fine in non-advanced nodes too, but at 20nm and 16nm there are FinFETs, double patterning, timing impacts from dummy metal fill, a gazillion corners to be analyzed and so on.

As you would expect, it is fully integrated with Voltus itself, to give a seamless flow for advanced mixed-signal designs that contain both digital and analog blocks. It also leverages Quantus QRC for transistor-level parasitic extraction, the Spectre Accelerated Parallel Simulator and the Spectre Extensive Partitioning Simulator.


It is also fully integrated into the Virtuoso platform for analog and custom block design. EMIR results from Voltus-Fi can be displayed on the real physical layout for quick analysis, debugging and optimization.

All this integration and performance shrinks the power signoff closure cycle. Many designs are more constrained by power and integrity issues than they are by raw performance, not least many of the most advanced chips for mobile where battery life is one of the key features of a device that shows through all the way to the end user. Consumers might not know what microprocessor is in their phone but they certainly know how long the battery lasts before they need to recharge it. Many submarkets of the Internet of Things (IoT) are even more power critical and also typically involve mixed-signal designs incorporating analog blocks (and perhaps sensors too).

See also:
Signoff Summit and Voltus

More articles by Paul McLellan…


A Deeper Insight into Quantus QRC Extraction Solution
by Pawan Fangaria on 08-14-2014 at 7:00 pm

Last month Cadence announced its fastest parasitic extraction tool (at least 5 times better performance compared to other available tools), which can handle growing design sizes, with their explosion of interconnect, parasitics and complexity at advanced process nodes including FinFETs, without impacting extraction accuracy. Obviously, massive parallelism combining the power of several CPUs is at work, as Cadence also did with the Tempus timing signoff and Voltus power integrity solutions, but there is more to why it appears to be the best-positioned solution for signoff extraction.

For parasitic extraction to be of signoff quality it needs to be silicon proven, which the Quantus QRC Extraction Solution provides with best-in-class accuracy, being fully certified for TSMC's 16nm FinFET process. A new high-performance 'random-walk' field solver, Quantus FS, embedded in Quantus QRC enables it to accurately extract critical nets; a benchmark on a 20nm design shows a mean error of -0.01 and a standard deviation of 3.09 compared to a field solver on 1000 random nets.
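Random-walk field solvers rest on a classic result: the value of a harmonic function at a point equals the expected boundary value where a random walk first exits the domain. A toy "walk on spheres" sketch of that principle follows; it is purely illustrative and bears no resemblance to production Quantus FS, which handles real 3-D conductor geometry:

```python
import math
import random

# Toy walk-on-spheres solver for Laplace's equation on the unit disk.
# From an interior point, repeatedly jump to a uniform point on the
# largest circle that fits inside the domain; when close enough to the
# boundary, sample the boundary condition there. Averaging many walks
# estimates the solution at the starting point.

def walk_on_spheres(x, y, g, rng, eps=1e-3):
    while True:
        r = 1.0 - math.hypot(x, y)        # distance to the disk boundary
        if r < eps:                       # close enough: read boundary value
            s = 1.0 / math.hypot(x, y)
            return g(x * s, y * s)        # project onto the boundary
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(theta)
        y += r * math.sin(theta)

def solve(x, y, g, n=2000):
    rng = random.Random(0)                # fixed seed for reproducibility
    return sum(walk_on_spheres(x, y, g, rng) for _ in range(n)) / n

# Boundary condition g(x, y) = x has the exact solution u(x, y) = x,
# so the estimate at the origin should be close to 0.
print(solve(0.0, 0.0, g=lambda bx, by: bx))
```

The attraction for extraction tools is that each walk is independent, so the method parallelizes almost perfectly and only needs geometry queries, not a global mesh.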

Time is scarcest at signoff and tape-out. Quantus QRC provides automated incremental extraction for functional ECOs (Engineering Change Orders), such as a routing change in EDI (Encounter Digital Implementation), directly through an integrated database, eliminating the need for a time-consuming full-flat extraction at the chip or block level with every change.

Supporting a FinFET process means taking into account many new parameters: fringe 3-D capacitances from gates and fins, new capacitance components to fins from gate thickness, new resistances, external capacitances to M0/V0 MEOL (Middle End of Line) contacts, and below-M1 FEOL (Front End of Line) features like complex poly structures, raised source and drain, two-step M0 and multi-finger fins with varied pitches and widths. Litho bias, corner variations and mask-shift variations in the BEOL (Back End of Line) process and double patterning technology also need to be considered. These extra parameters result in bigger netlists, larger designs and more interconnect corners (3x more with double patterning at 20nm and below), all of which hurt post-layout simulation performance, requiring complex modeling for accuracy together with efficient, fast simulation runs. The Quantus QRC Extraction Solution has a robust 3-D modeling framework which provides unmatched accuracy against foundry data and a ~2x smaller netlist, giving ~2.5x faster simulation runs and faster characterization of standard cells, SRAMs and IPs.

The tool provides unique functionalities required for different types of designs such as SerDes, IP/SRAM/bitcell characterization, memory, powerMOS, image sensors, custom/analog and RF designs. Its unique capabilities include:

  • substrate noise analysis (SNA) with a full 3-D substrate model;
  • extraction of inductance (self and mutual, with support for the Partial Element Equivalent Circuit (PEEC) method) and analysis of parasitic impact on clocks and long nets in designs running at ~100GHz;
  • RC and RCLK reduction that can cut simulation time by an order of ~20x;
  • meshR (used for powerMOS), providing better accuracy for irregular or wide metal shapes (large grids at the center of the die and fine grids near contacts, edges and corners) and higher simulation speed through an adaptive meshing technique that reduces the number of resistances.

A 3DIC using TSVs can also be extracted precisely with this tool.

Quantus QRC is closely integrated with the Virtuoso ADE environment, providing early visibility into parasitics at the schematic level through in-design extraction of a partial layout that can easily be generated from Virtuoso ADE. This improves correlation between schematic and post-layout simulation, reducing design iterations and aiding faster design convergence.

The Quantus QRC Extraction Solution is integrated with all P&R, virtual prototyping, analysis and signoff tools. Using the same extraction engine during implementation and signoff provides better correlation and faster design closure. Users working in the Encounter Digital Implementation System get single-click execution for all extraction models.

Coming back to massive parallelism, what's special about it? Performance scales linearly as CPUs are added, which is generally not the case with other architectures. It scales for multi-corner simulation runs as well; the icing on the cake is that the tool runs 2-3x faster in multi-corner simulation. Cadence's proprietary parallel architecture allows scaling to an unlimited number of CPUs and machines as SoC size increases, providing the highest capacity and performance.
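Linear scaling really is the exception rather than the rule: Amdahl's law shows how even a small serial fraction caps multi-CPU speedup. A quick generic sketch (an illustration of the general principle, not a claim about Quantus internals):

```python
# Amdahl's law: speedup on n CPUs when a fraction p of the work
# parallelizes perfectly. Even 5% serial work caps the speedup far
# below linear as CPU counts grow.

def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (8, 64, 512):
    print(n, round(amdahl_speedup(0.95, n), 1))
# 8 -> 5.9x, 64 -> 15.4x, 512 -> 19.3x (linear would be 8x, 64x, 512x)
```

An extraction architecture that genuinely scales linearly must therefore have driven its serial fraction very close to zero.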

The Quantus QRC Extraction Solution is best-in-class technology for parasitic extraction and analysis for analog, digital and AMS SoCs employing today's advanced node technologies. Its in-design integration with both analog and digital platforms, along with a state-of-the-art field solver, provides silicon-proven accuracy with faster design convergence and better correlation. More details can be obtained from a whitepaper written by Hitendra Divecha, Product Marketing at Cadence; the whitepaper includes encouraging benchmarks for various steps in the overall design process.

Also read –
https://www.semiwiki.com/forum/content/3665-cadence-announces-quantus-next-generation-extraction.html

http://www.cadence.com/Community/blogs/ii/archive/2014/07/14/quantus-qrc-massive-parallelism-extracts-accurate-parasitics-quickly.aspx?postID=1335602

More Articles by Pawan Fangaria…..


When TSMC advocates FD-SOI…
by Eric Esteve on 08-14-2014 at 1:00 pm

I recently found a patent (granted May 14, 2013) to TSMC, "Planar Compatible FDSOI Design Architecture". The following sentences, extracted directly from the patent, advertise FDSOI design better than any commercial promotion: "Devices formed on SOI substrates offer many advantages over their bulk counterparts, including absence of reverse body effect, absence of latch-up, soft-error immunity, and elimination of junction capacitance typically encountered in bulk silicon devices. SOI technology therefore enables higher speed performance, higher packing density, and reduced power consumption." Nothing new here for Semiwiki readers… except that this enumeration of the advantages of SOI technology with respect to bulk planar is coming from TSMC…


In fact, that sentence mentions "SOI substrates", but the next paragraph gives the definitions of the partially-depleted (PD) SOI transistor and the fully-depleted (FD) SOI transistor, and their respective behavior and advantages:

  • A PDSOI transistor is formed in an active region with an active layer thickness that is larger than the maximum depletion width. The PDSOI transistor therefore has a partially depleted body. PDSOI transistors have the merit of being highly manufacturable, but they suffer from floating body effects. Digital circuits, which typically have a higher tolerance for floating body effects, may employ PDSOI transistors.
  • An FDSOI transistor is formed in an active region with an active layer thickness that is smaller than the maximum depletion width. FDSOI transistors avoid the problems of floating body effects through a thinner active layer or lighter body doping. Generally, analog circuitry performs better when designed with FDSOI devices than with PDSOI devices.
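The dividing line in both definitions is the maximum depletion width, which follows from standard MOS theory. A quick sketch (textbook formula; the doping and film-thickness values are illustrative, not from the patent):

```python
import math

# Classify an SOI film as partially or fully depleted by comparing the
# active-layer (silicon film) thickness against the maximum depletion
# width W_d,max = sqrt(4 * eps_Si * phi_F / (q * Na)). This is the
# standard textbook MOS expression; the numbers below are illustrative.

Q = 1.602e-19                 # electron charge (C)
EPS_SI = 11.7 * 8.854e-12     # permittivity of silicon (F/m)
KT_Q = 0.0259                 # thermal voltage at 300 K (V)
NI = 1.5e10                   # intrinsic carrier concentration (cm^-3)

def max_depletion_width(na_cm3):
    phi_f = KT_Q * math.log(na_cm3 / NI)   # Fermi potential (V)
    na_m3 = na_cm3 * 1e6                   # convert doping to m^-3
    return math.sqrt(4 * EPS_SI * phi_f / (Q * na_m3))  # meters

wd = max_depletion_width(1e17)        # ~0.1 um for Na = 1e17 cm^-3
film = 7e-9                           # a ~7 nm UTBB silicon film
print("FD" if film < wd else "PD")    # thin film => fully depleted
```

An ultra-thin body is thus fully depleted even at moderate channel doping, which is exactly why UTBB devices can drop channel doping altogether.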

To illustrate the patent, TSMC refers to a baseband IC for mobile applications, or perhaps an integrated baseband and application processor. In both cases many of the integrated IPs, like memory cells or high-speed SerDes, are based on analog circuitry, so FDSOI clearly appears to be the best choice.


You may wonder why TSMC is promoting FDSOI so strongly, as we know the foundry has not selected this technology: TSMC supports 28nm bulk planar, then 20nm (including double patterning for critical layers) and 16nm FinFET. So why is TSMC advertising FDSOI this way? Reading further, we can see:

"An FDSOI ASIC design in the same footprint as a bulk planar ASIC design provides several advantages over the bulk planar ASIC design. Adaptive body bias techniques are inefficient with bulk planar designs because of the PN junction forward bias issue and because junction leakage increases in the reverse bias condition. Therefore, planar technologies have to adopt voltage scaling techniques for power savings in single Vt designs."

It looks like TSMC wants to demonstrate that an FDSOI design can be ported to a bulk planar technology, provided the power rails have been carefully designed, and this requirement is extensively described within the patent (in fact, it's the core of the patent). We have highlighted on Semiwiki one of the important advantages of FDSOI technology: a dual-Vt library can support a complete SoC design, allowing cost savings (the number of masks and process steps is lower) and faster process turnaround time compared with four Vts on bulk planar, the only bulk option that offers the same level of power savings as FDSOI.

But we still don't know why TSMC filed this patent. Is it because the company is willing to offer FDSOI as an additional process option to existing customers? In that case, the patent could be a way to minimize risk, showing a customer moving to FDSOI that it could come back to a bulk planar option with no redesign, because the "FDSOI ASIC design is in the same footprint as a bulk planar ASIC design". TSMC offering an FDSOI process option would certainly be a scoop…

Another possibility is that TSMC is not willing to support FDSOI, but that certain existing ASIC customers want to try FDSOI with TSMC's competition; the patent would then allow TSMC to keep the door open, so these customers could come back to a bulk planar ASIC processed at TSMC. This approach would be like double sourcing, but between bulk planar and FDSOI.

TSMC has certainly looked carefully at FDSOI as a technology option, even if so far the company doesn't support it. I am happy to see that a TSMC patent highlights the many technical advantages of FDSOI versus bulk planar: absence of reverse body effect, absence of latch-up, soft-error immunity, and elimination of junction capacitance. To this list we can add potential cost savings (when SOI wafer prices come down), faster wafer fab cycle time and, probably most importantly, far better power efficiency, whether the SoC is designed for networking infrastructure or a mobile application processor. Will all these advantages be enough to compensate for some current weaknesses, like customer wariness of innovation and a work-in-progress IP ecosystem, and finally push TSMC to join the ST and Samsung train?

From Eric Esteve from IPNEST

More Articles by Eric Esteve…..


Transaction-based Emulation
by Paul McLellan on 08-14-2014 at 7:01 am

Verification has been going through a lot of changes in the last couple of years. Three technologies that used to be largely contained in their own silos have come together: simulation, emulation and virtual-platforms.

Until recently, the workhorse verification tool was simulation. Emulation had its place, but limits on capacity, high cost and difficulty of use kept it from the mainstream. Virtual platforms had their niche, but the modeling challenge meant they were not nearly as widely used as they could have been.

Then simulation ran out of steam. State-of-the-art SoCs were just too big for simulation. And we are not talking about gate-level simulation here; that ran out of steam years ago. This is RTL simulation. At the same time, emulation technology improved in both capacity and usability. It used to be a multi-week or even multi-month project to move a design onto an emulator and get everything up and running. The cost also came down, and the ability to share an emulator among multiple users at the same time further reduced the amortized cost. I have seen statements that emulation is now the cheapest verification cycle you can get, compared to running simulation on server farms. I don't know if that is strictly true, but it seems to be getting into the same ballpark. I moderated a panel on emulation at DAC last year with companies like TI and Broadcom on the panel. They all used emulation extensively and their only real problem was not being able to get enough of it. But there is never enough time and money to do all the verification you might like on a modern SoC.

It turned out that once people had emulators, the modeling problem for virtual platforms could be made to go away. Instead of hand-crafting behavioral or transaction level models and then trying to keep them synchronized with the RTL, it became possible to just use the RTL. Run the processor and its associated software load using the virtual platform technology but run the rest of the design by compiling the RTL into an emulator.

As you probably remember, Synopsys acquired Eve and its Zebu emulation product line last year. With the various flavors of VCS they already had RTL simulation, of course. Plus, three or so years ago, Synopsys acquired Virtio, VaST and CoWare, giving them virtual platform technology. Now, with a lot more integration work done, Synopsys has new capabilities, most of which they market under the brand name Verification Compiler.

A couple of days ago Synopsys had a webinar, Creating a High-performance Transaction-based Emulation Environment (yes, I know it would have been better to put this out a couple of days ago instead of today, but migraine struck; happily there is a replay). Transaction-based emulation (TBE) has become an increasingly popular way of utilizing emulators because of its high verification performance and its flexibility in connecting to existing environments. Achieving high performance requires a combination of the emulator's capabilities and tuning of the environment that drives it to avoid bottlenecks. The tutorial explains the components and techniques necessary to create a high-performance emulation environment.

The webinar was presented by Lance Tamura, who is the CAE manager for the Zebu emulator.

The replay for the webinar is available here (registration).

And the non-silicon-valley SNUGs are coming up (with a bit better notice than for the webinar):

  • Boston on September 11th
  • Austin on September 23rd
  • Ottawa on October 8th


More articles by Paul McLellan…


Intel Versus TSMC 14nm Processes
by Scotten Jones on 08-13-2014 at 5:00 pm

Intel has begun to release some details on their 14nm process. I thought it would be interesting to contrast what Intel has disclosed to TSMC’s 16nm process disclosure from last year’s IEDM (TSMC calls their 14nm process 16nm).

|                | Intel 14nm     | TSMC 16nm      | Ratio TSMC/Intel |
|----------------|----------------|----------------|------------------|
| Process target | MPU            | SOC            |                  |
| Status         | Shipping       | Development    |                  |
| Process type   | FinFET on bulk | FinFET on bulk |                  |
| Gate           | Gate last HKMG | Gate last HKMG |                  |
| Fin pitch      | 42nm           | 48nm           | 1.14             |
| Gate pitch     | 70nm           | 90nm           | 1.29             |
| M1 pitch       | 52nm           | 64nm           | 1.23             |
| SRAM cell size | 0.0588um2      | 0.07um2        | 1.19             |

There are both similarities and differences between the processes. Intel's process is targeted at MPUs and TSMC's at SOCs. MPU processes are more narrowly targeted and require fewer options: a TSMC SOC process, for example, would typically offer two or more gate oxide thicknesses with options for four or more Vts, while Intel's MPU processes use a single gate oxide and, at 22nm, three Vts. On the other hand, Intel is now shipping 14nm MPUs, while TSMC will not be shipping SOCs on 16nm until the middle of next year (although Intel will likely not ship the SOC version of its 14nm process until next year either). Intel's disclosure also shows a significant density advantage over TSMC, almost 20% on SRAM cell size.
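As a sanity check, the ratio column above can be reproduced directly from the published dimensions, and a first-order logic-density estimate derived from them. A minimal Python sketch; note that using the gate-pitch × M1-pitch product as a density proxy is my own rough assumption, not a metric either company has disclosed:

```python
# Reproduce the TSMC/Intel ratio column from the published dimensions.
intel = {"fin_pitch": 42, "gate_pitch": 70, "m1_pitch": 52, "sram_um2": 0.0588}
tsmc  = {"fin_pitch": 48, "gate_pitch": 90, "m1_pitch": 64, "sram_um2": 0.07}

for key in ("fin_pitch", "gate_pitch", "m1_pitch", "sram_um2"):
    print(f"{key}: TSMC/Intel = {tsmc[key] / intel[key]:.2f}")

# Rough first-order logic-density proxy: standard-cell area scales
# approximately with gate pitch x metal-1 pitch (an assumption for
# illustration, not a disclosed metric).
area_ratio = (tsmc["gate_pitch"] * tsmc["m1_pitch"]) / \
             (intel["gate_pitch"] * intel["m1_pitch"])
print(f"logic area proxy (gate pitch x M1 pitch): TSMC/Intel = {area_ratio:.2f}")
```

On these numbers the pitch-product proxy suggests a larger logic-density gap (around 1.6×) than the ~1.19× SRAM cell-size ratio, which is why SRAM cell size alone understates the density story.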

Also read: Who Will Lead at 10nm?

The preceding numbers are all based on TSMC's IEDM paper from last December. TSMC is also known to have FF and FF+ processes, with FF+ showing significant performance improvements over FF. Is this due to a shrink, or to some other performance enhancement? It will also be interesting to see how Samsung's 14nm process compares once we have critical dimensions for it. I would be very interested to hear from any Semiwiki readers who can provide additional information on the TSMC or Samsung processes.

A critical metric for both processes will be cost. Intel has already disclosed that 14nm produces a significant cost reduction per transistor versus 22nm (at least for MPUs). Various industry observers have published articles projecting increased cost per transistor for foundries at both 20nm and 16nm/14nm. Our modeling suggests TSMC will achieve a cost reduction at 20nm and may achieve a small cost reduction at 16nm as well.
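The cost-per-transistor argument comes down to simple arithmetic: wafer cost spread over good die, times transistors per die. A minimal sketch of that model; every number below is an assumption picked for illustration, not a disclosed figure from Intel, TSMC, or our own modeling:

```python
# Illustrative cost-per-transistor model. All inputs are hypothetical
# round numbers, not disclosed foundry figures.
def cost_per_mtransistor(wafer_cost, gross_die, yield_frac, mtransistors):
    """Wafer cost spread over good die, per million transistors per die."""
    good_die = gross_die * yield_frac
    return wafer_cost / (good_die * mtransistors)

# Hypothetical older node: cheaper wafer, fewer transistors per die.
c_old = cost_per_mtransistor(wafer_cost=4500, gross_die=500,
                             yield_frac=0.85, mtransistors=800)
# Hypothetical newer node: pricier wafer (e.g. double patterning),
# but roughly 2x the transistors in the same die area.
c_new = cost_per_mtransistor(wafer_cost=6000, gross_die=500,
                             yield_frac=0.80, mtransistors=1500)

print(f"old node: ${c_old:.4f}/Mtransistor")
print(f"new node: ${c_new:.4f}/Mtransistor")
```

The point of the exercise: whether cost per transistor falls at a new node hinges on whether the density gain outruns the wafer-cost and yield penalties, which is exactly where the industry observers and our modeling disagree.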


Smart Meters

Smart Meters
by Paul McLellan on 08-13-2014 at 7:05 am

The Internet of Things (IoT) isn't a single homogeneous market; it splits into segments with very different requirements. A lot of IoT markets are still in our future: next-generation wearable medical devices, autonomous cars and more. One area where IoT has been going strong, long enough that it probably pre-dates the catchy buzzword IoT, is smart power meters.

Today Atmel announced their latest power-line communications SoC designed specifically for this market. The Atmel SAM4CP16B is an extension of Atmel's SAM4Cx smart energy platform, built on a dual-core 32-bit ARM Cortex-M4 architecture. It is fully compatible with Atmel's ATPL230A OFDM physical-layer device, which is compliant with the PRIME standard specification. The flexible solution addresses OEMs' needs for varied system partitioning, BOM reduction and fast time-to-market by incorporating independent application, protocol-stack and physical-layer processing functions within the same device. Key features of the SoC include an integrated low-power driver, advanced cryptography, 1MB of embedded Flash, 152KB of SRAM, a low-power real-time clock, and an LCD display controller.


I think that as the various submarkets of the Internet of Things develop, we will see a lot of devices like this: SoCs that integrate everything required for a particular application, leaving the system company to customize the hardware, add its own software and so on. IoT will not be a market like mobile, with huge chips done in the latest process generation. Many IoT designs will include analog, RF and sensors, all of which are best designed in older processes like 65nm or even 130nm.

The system volumes for many designs will be relatively low and so designing a specific chip for each application will be unattractive. Even in mobile where the volumes are much higher, only Apple and Samsung design their own application processors, as far as I know. Everyone else licenses one from Qualcomm, Mediatek or others. Even Apple gets the modem (radio) from Qualcomm. The aggregate volumes will end up being large (there will be a lot of things) so the prize goes to the semiconductor companies that do the best job of designing chips that match what the system companies require.

The data sheet for the Atmel part is here (warning: it's 1,000 pages).

See also:

What is the Latest in Mobile


More articles by Paul McLellan…


One Breath, One Milliwatt

One Breath, One Milliwatt
by Eran Belaish on 08-11-2014 at 8:00 pm

To understand how challenging it is to successfully implement Always-on Technology, consider doing any kind of sport while holding your breath. Sounds crazy? There’s actually one sport in which participants do just that – freediving. So what does freediving have to do with always-on technology? Quite a few things apparently.

Harsh Environment
In freediving, there is one resource that is by far scarcer than any other: oxygen. In always-on technology, that resource is power. The problem is that these are also the resources most essential to the functioning of the two kinds of systems, oxygen for the biological one and power for the electrical one. Being short of such a fundamental resource usually spells bad news for the system, unless you know what you are doing.

Optimal Resource Utilization
To maintain an operational level of such precious resources, activity level has to be kept to the minimum necessary. Furthermore, any activity that is taking place has to make optimal use of the limited resources, which mandates deviating from old habits. For example, any component that is always-on must not include wasteful elements such as power-hungry processors like the ones found in smartphones. On the other hand, the always-on processor has to be efficient enough to be able to execute its tasks (e.g. Bluetooth-enabled voice activation, gesture recognition) — which could be quite intensive — with minimal power consumption. Similarly, to succeed in freediving one has to let go of terrestrial habits that might be fine given unlimited oxygen credit but have no real justification when submerged. Consider, for instance, hand movement that automatically happens when we walk. Such movement has no effect underwater and it wastes oxygen in vain. Furthermore, neutralizing the hands also helps in keeping a streamlined position.


Cutting Edge Technology to the Rescue

Less than 100 years ago, medical doctors believed that freediving below 30 meters was biologically impossible: the lungs would collapse under the pressure of the water column above. Around that depth the air in the lungs does compress dramatically, reaching residual volume, about a third of its volume at the surface. What the doctors didn't know is that two mechanisms, the mammalian diving reflex and blood shift, kick in to slow the heart rate, optimize blood circulation and move blood into the lungs to keep them from collapsing at residual volume. The current world record is 281 meters, so don't believe everything your doctor tells you.

In always-on technology, the breakthroughs are still ahead of us. First and foremost, SoC vendors should let dedicated processors handle always-on tasks rather than running them on yesterday's power-hungry application processors. For example, the CEVA-TeakLite-4 runs various always-on functions simultaneously (voice trigger, face trigger, sensor fusion and Bluetooth Low Energy) in less than 150uW at a 28nm HPM process node. Running similar functions on the application processor (AP) would require at least two orders of magnitude more power, clearly exceeding the power budget for reasonable battery life. This gap is not accidental: unlike an AP, the CEVA-TeakLite-4 DSP is well suited to such functions, which often require intensive signal processing. Furthermore, with its power-optimized hand-crafted RTL, power-scaling mechanism and 10-stage pipeline that easily accommodates low-power memories, the CEVA-TeakLite-4 consumes ultra-low power by design. On top of that, some technologies that still sound a little like science fiction will probably be commercialized given enough time, such as subthreshold conduction, which can dramatically reduce power consumption, and energy harvesting, which can charge the device with scavenged energy.
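The two-orders-of-magnitude gap translates directly into battery life. A back-of-the-envelope sketch: the 150 µW DSP figure is from the text above, but the battery capacity and the "~100× more" AP figure are my own assumptions for illustration:

```python
# Rough always-on battery-life arithmetic. The 150 uW DSP figure comes
# from the text; the 100 mAh cell and the ~100x AP multiplier are
# assumptions for illustration only.
BATTERY_MWH = 100 * 3.7          # hypothetical 100 mAh smartwatch cell at 3.7 V

def always_on_hours(power_mw):
    """Hours of operation if the always-on function were the only drain."""
    return BATTERY_MWH / power_mw

dsp_hours = always_on_hours(0.150)   # dedicated DSP at 150 uW
ap_hours  = always_on_hours(15.0)    # application processor, ~100x more

print(f"DSP: {dsp_hours:.0f} h ({dsp_hours / 24:.0f} days)")
print(f"AP:  {ap_hours:.0f} h ({ap_hours / 24:.1f} days)")
```

Under these assumptions the dedicated DSP stretches the same cell from roughly a day to over three months, which is the difference between a wearable that charges weekly and one that lives in a drawer.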

Separate the Men from the Boys
Any inefficiency in an always-on system has immediate consequences, namely poor battery life, as witnessed with recent smartwatches. While poor battery life can be tolerated in other devices, users of wearables are less forgiving, demanding weeks rather than hours between charges, and many of those devices end up in the back of a drawer as a consequence.

In freediving, any inefficiency immediately translates to poor bottom time. The challenge goes far beyond neutralizing frenzied limbs: of all the organs, the brain is the biggest oxygen consumer, and there is only one natural way to keep its consumption to a minimum, relaxation. So next time you go freediving, keep that in mind (or better, don't keep anything in mind), and next time you design an always-on application, think carefully about which processor is the best fit for the always-on functions. In both cases it will help you reach a similar goal: less frequent recharging.