
Semi Equip Stocks Roll Over & Play Dead? Semicon Post Partum Depression?

Semi Equip Stocks Roll Over & Play Dead? Semicon Post Partum Depression?
by Robert Maire on 07-28-2015 at 4:00 pm

Are Semi Equipment Stocks & Industry Rolling Over?
Is there any Upside in the second half of 2015?
Waiting for next year’s iPhone 7 refresh?
Does Windows 10 even matter?

“Sell in May and Go Away”

An old investor’s adage that has proven correct this year. The semiconductor equipment stocks had a nice run-up after reporting Q1 numbers in April, which propelled the stocks higher into the first part of May. The euphoria of a good Q1 seemed to fade with a string of less-than-positive tech news related to semis, perhaps started by the weakening of the DRAM market.

The stocks then flattened in the second half of May and started to “roll over” in the beginning of June. That downward trend picked up speed in the beginning of July and seems to have accelerated post Semicon West and into initial earnings results. This acceleration was partly due to capex issues from Intel & TSMC and partly due to a less than stellar tone coming out of Semicon West. Earnings news so far hasn’t lit anyone on fire yet.

Semicon Didn’t help much…

We were clear in both our Semicon preview and our post-Semicon summary that there was no motivation to go out and buy semi equipment stocks, as there wasn’t enough positive tone to offset the negative news and questions going into the show. If anything, the flattish tone of the show made us more concerned…

Semicon Summary

The question now becomes “Is this just a pause or a lull or will the industry roll over and see a downturn?” The stocks seem to be forecasting a downturn rather than a pause.

A one-legged stool….

It would appear the only remaining strong driver in the industry is 3D NAND spending; the other three legs of the stool (foundry, logic & DRAM) are less than stable.

The second half of 2015 will have to see a significant uptick in 3D NAND spend to offset reductions in other areas…and even getting to a flat second half would be hard enough; it will be far harder for 3D NAND spending to both offset weak spending elsewhere and be strong enough to power an increase in the second half.

Where to hide if things roll over…

Historically, consumables tend to be more stable than capital equipment plays when it hits the fan. Fabs still need consumables such as slurry, photomasks, filters, probe cards, etc. This would suggest that companies such as CCMP, PLAB, ENTG and FORM should fare a bit better even though they will see the downturn as well. CCMP is a bit weaker but slurry sales will continue. FORM appears to be oversold by an overreacting market, as its enterprise value has been cut in half and we are relatively sure that its business won’t be cut in half. ENTG continues to be a strong performer.

What could suffer more…

Sub suppliers that provide components to the bigger equipment companies tend to be at the “end of the whip” and fall faster in a downturn and climb higher in an upturn. In a downturn, companies such as AMAT & LRCX slow down orders and burn off inventory. Sub suppliers are a leveraged play to the equipment companies. In this category we find AEIS, BRKS & MKSI. Though all three have been doing a good job of trying to diversify, at the end of the day they are still leveraged to equipment companies.

Back End not a hiding place…
Back end spending has been more stable in past years but appears to be falling off a bit with the front end this time around. In the test area we do not see major changes that would drive significant new tester sales, such as a transition from 4G to 5G (which is still way, way off). Internet of Things devices can be tested on existing tools, as we are not talking about a significant change in capability. Assembly and inspection in the back end also do not look great, as we saw from BESI’s guidance and UTEK’s results. We would anticipate that back end business at RTEC may see similar weakness.

Playing 3D NAND exposure…
If we wanted to stay in large cap equipment companies we would likely choose LRCX, as they appear to have the highest exposure to 3D NAND and multi-patterning. At Semicon, LRCX pointed out that they have the number one share in the 3D NAND market, which could help them as memory exposure has helped them in the past.

Windows 10 is a non event…

It seems as if Windows 10 is not even a factor in the semiconductor industry. In years past, before the advent of smartphones and tablets, each new version of Windows would drive a huge up cycle in the chip industry; so far Windows 10 is more of a lame whimper which appears to have zero impact on spending and chip production.

Following the iPhone refresh cycle…

Cycles in the semi industry continue to reinforce the seasonal pattern. We are well into production of the components for the iPhone 6S, so it’s clear that we aren’t going to see near-term spending for either processors or memory in the 6S; we are past the spending season for chips in the 6S and we aren’t yet into the period where spending will pick up for the iPhone 7, so we are stuck in a natural lull.

Macro questions still remain and Android phones are not on fire. Qualcomm’s recent results throw another wrench into the semi industry, as you can’t imagine them being as aggressive in the number of different SKUs or in overall demand for chip capacity. We would imagine that Qualcomm will certainly be more conservative, and that translates to spending less.

Timing….

Given that we will be going into the dead zone of August, it’s likely the stocks may be “dead money” for a while, or at least until the iPhone 6S launch in the fall. The only thing going on near term is Windows 10, and doing a product upgrade as we enter August seems like timing guaranteed to fail. But then again, our expectations of Win 10 are zero anyway….

The next chance for the stocks to recover may be in October but right now we are not holding our breath for that….

Robert Maire
Semiconductor Advisors LLC

Also Read:
Intel 10nm delay confirmed by Tick Tock arrhythmia leak-“The Missing Tick”


Build Low Power IoT Design with Foundation IP at 40nm

Build Low Power IoT Design with Foundation IP at 40nm
by Pawan Fangaria on 07-28-2015 at 12:00 pm

In the power hungry world of semiconductor devices, multiple ways are being devised to budget power from the system level down to the transistor level. The success of IoT (Internet of Things) edge devices specifically depends on the lowest power, lowest area, optimal performance, and lowest cost. These devices need to be highly energy efficient to sustain battery life over long durations. Some health care devices may not even be accessible for battery replacement.

A typical SoC for the tremendously growing wearable and fitness segment of IoT can have a significant amount of logic, memory, and foundry IP including Flash and GPIO. On-chip power management and wakeup circuitry are essential to conserve energy while various components or power domains are in sleep mode. Also, energy can be minimized by optimal adjustment of Vdd and Vth at different performance levels. A key to enabling successful energy saving operations in an IoT device lies in the type of Foundation IP the device employs.

There are IoT-optimized low-power Foundation IP offerings, including memories and logic libraries, provided by Synopsys and TSMC that support energy saving operations by various means in SoCs. Synopsys provides TSMC-sponsored 40nm ULP logic libraries for IoT applications that can be used with or without eFlash. These contain power optimization kits and low-power cell kits for low voltage operations (down to 60% of VddNom) with reduced leakage and dynamic power. The power optimization kits contain special cells such as power gates, isolation cells, always-on cells, retention registers, and level shifters to enable operations such as eliminating leakage in idle cells, managing multiple voltage domains, lowering leakage in active mode, and so on. For example, Synopsys provides an innovative ‘Live Latch’ for the retention register, with its protocol fully adopted in Liberty, UPF and CPF systems. The TSMC-sponsored offering includes different clusters of PVTs to manage power domains ranging from 0.9V +/- 10% to 1.1V +/- 10%.

There are special foundation blocks with structures optimized for low power, beyond the power reduction provided by voltage- and time-based operations. Let’s analyze a few of these.


There are special FFs that can stretch performance at low voltages and minimize power. On the left side of the above picture, the delay-optimized flop passes a signal very fast from clock to Q, and the setup-optimized flop catches the signal with a very low or even negative setup time, thus allowing a faster clock. On the right side, there is a 2-bit flop that operates on a single clock line. This reduces the load (capacitance) on the clock line, which improves area and leakage. The methodology can be used for larger flop structures, thus reducing a significant amount of the power spent in a clock tree.

There is a ULL (Ultra Low Leakage) library for always-on wakeup circuits that supports a wide range of operating voltages from 0.9V to 3.6V and a leakage reduction of up to 100x. The ULL control block is used to manage the power of a device through on-chip LDOs (low-dropout regulators) as well as off-chip regulators. It also provides alarms to wake up the SoC, logic, memory, and so on.


IoT devices need high density memories with low leakage and power reduction modes. Above is an example where source biasing can be used to reduce leakage in an SRAM by 70%. There is also an ultra-low-power via ROM with 20%+ leakage reduction. The array can be shut down when not in use.


Synopsys provides a DesignWare system for embedded and external memory test, repair and diagnostics with a very small BIST footprint. It supports SST’s 40nm eFlash. Errors in an eFlash, which differ from those in an SRAM, can be located, and the affected cells can be substituted with a column of spare bit cells. The system is seamlessly integrated with a Yield Accelerator and a SiliconBrowser. The award winning debug and diagnostic ecosystem allows seamless entry into the device from a PC.

Oticon designed a hearing aid device using 65LP DesignWare memory and logic IP. It runs a multi-DSP core at extremely low voltages and uses the power optimization kit to manage multiple voltage domains. The device is being migrated to 40nm.

With low-power foundation IP and high density eFlash as big enablers, 40ULP is an attractive technology node for IoT devices as it provides a significant gain in area compared to 55nm and 90nm. TSMC 40ULP has 7-track UHD libraries with less leakage and less power, and 9-track HD libraries for higher performance. There are various options to trade off performance and leakage across the range of 7-track UHD libraries.

It’s important that the EDA tools and flows understand various techniques used to reduce power. For example, performance flops and multi-bit flops need to be understood by synthesis, clock-tree synthesis, P&R, DFT and verification tools.

There is a webinar sponsored by Synopsys in which Ken Brock provides a lot of detail about how this ULP Foundation IP at 40nm provides an attractive solution for IoT devices. The on-line webinar is freely available HERE; it requires only a quick registration.

Pawan Kumar Fangaria
Founder & President at www.fangarias.com


Ultra-low Power IP for Wearables

Ultra-low Power IP for Wearables
by Paul McLellan on 07-28-2015 at 7:00 am

Wearables and the Internet of Things (IoT) in general are all about low power. Everyone must have read about (or even experienced) the phenomenon of putting on something like a Fitbit and then, after a short period, leaving it in a drawer or putting it on charge and forgetting about it for weeks. The longer devices can last, the more likely they are to be successful. The biggest complaint about the Apple Watch seems to be battery life; it barely lasts a day.

But wearables and IoT are not just about the actual device, they are also about communication infrastructure and computing/storage infrastructure. It is all about partitioning intelligence between the device and the cloud and ensuring that the data can be transmitted efficiently between them. Power is important in all of these areas. Even datacenters are power limited, although obviously at a completely different tradeoff point from wearables.

eSilicon has a lot of experience with low power design in all three of the IoT segments. They have an extensive portfolio of internally developed special purpose memories and register files and experience with special low-power processes:

  • ultra low power and ultra low voltage SRAM and ROM
  • pseudo-DP SRAM
  • low-leakage SRAM
  • 65nm, 55nm, 40nm, 28nm at TSMC, GlobalFoundries, SMIC and more

IoT mobile computing requires low-power high-performance embedded memories. IoT medical requires low-power and low-voltage embedded memories. And IoT computing requires low-power high-density embedded memories optimized for microcontrollers and processors. Although there is some overlap, these segments have largely different requirements.

eSilicon also provides STAR Navigator, which allows design groups to explore the IP portfolio risk-free. They can download all the views of the IP (except the physical layout) so that they can easily do rapid prototyping, experiments, comparisons and generally “try before you buy”. This is all handled through an online portal and doesn’t require interacting with salesmen or placing a purchase order. Of course, if you eventually want to tape out a chip using some of the IP you “tried”, you do have to “buy” it to get access to the physical layout.

eSilicon also have another service called STAR Optimizer. This makes use of a lot of internal big-data analytics on all the designs eSilicon have seen and allows a design that is far enough along to be further optimized, sometimes with subtle changes like changing the core voltage of a memory but not the periphery, or swapping out one bit cell for a different one, or varying the process spread and trading off a little yield for lower power. This provides a very structured way to get access to the captured design expertise of a team that has done a lot more designs than any one design group is likely to have done.

For example, here is an interface ASIC in 28nm with 17M gates and about 42Mb memory subsystem. The goal was the lowest possible standby power and idle power. By making changes, standby power was reduced by 8X and idle power by 20X.

Another important recent development for low power is that TSMC has gone back and produced new ULP (ultra low power) versions of some of its mature processes such as 40nm. Other foundries have also been returning to older nodes, not just pushing on to FinFET, since designs are now spread across the process spectrum depending on what features they require. By remapping designs from 40LP to 40ULP there is considerable reduction in leakage power which has a major impact on SRAM leakage in particular (and many chips are dominated in area/leakage by SRAM).

IoT designs are likely to go through a sequence where initially they are built out of standard products until it is clearer what optimized solutions can be created using silicon level design and IP. But a lot of differentiation is going to come from the ability to choose and deploy low power semiconductor IP.

The eSilicon STAR portal is here.


Designing an IDCT for H.265 using High Level Synthesis

Designing an IDCT for H.265 using High Level Synthesis
by Daniel Payne on 07-27-2015 at 8:00 pm

Math geeks know all about Inverse Discrete Cosine Transforms (IDCT), and a popular use is in the hardware architecture of High Efficiency Video Coding (HEVC), also known as H.265, the new video compression standard widely used in consumer and industrial video devices. You could go about hand-coding RTL to create an IDCT function, but it would take too many lines of code and precious engineering time compared to using higher level languages like C++ or SystemC. The promise of High Level Synthesis (HLS) is that you can code your video algorithms in much less time and with less code than RTL, thus getting to market quicker with less engineering effort.

Uday Das from Calypto presented a tutorial at the #52DAC event last month in San Francisco with the subject, “Building an IDCT for H.265 Using Catapult”, so I reviewed the 46 slides and share my impressions in this brief blog. The HEVC specification calls for 4 transform units of various sizes: 4×4, 8×8, 16×16 and 32×32, to code the prediction residual. The hardware architecture here uses a row-column decomposition approach that performs a 1-D operation on each row, followed by another 1-D operation on each column:
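In equation form (my notation, not taken from the slides), the separability behind this row-column approach is: for an $N \times N$ coefficient block $Y$ and a 1-D inverse transform matrix $A_N$, the 2-D inverse transform is

$$X = A_N^{T}\, Y\, A_N$$

which factors into $N$ independent 1-D transforms over the rows followed by $N$ more over the columns; these are exactly the two 1-D passes described.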

Related – NVIDIA and Qualcomm Talk about High Level Synthesis, Samsung on Low Power for Mobile

Algorithm
The IDCT algorithm can be described as a lower order matrix embedded in a higher order matrix, then detailed in a signal flow graph as an 8-point IDCT A8, made up of a 4-point 1-D IDCT A4 and an odd matrix M4:

Data flow for this algorithm can be designed using two major functions: Butterfly and Mult_odd.

An interface description can then be written in either C or SystemC, where C code is more compact:


A core class can be written and then re-used for the 4, 8, 16 and 32 points of Mult_odd and Butterfly member functions:

The Butterfly function is common for all sizes, and notice that there is no timing information at this level. The HLS tool Catapult will unroll the loop to create hardware for parallel execution.
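To make that structure concrete, here is a minimal C++ sketch of the Butterfly and Mult_odd pair. The function names follow the tutorial, but the signatures, data types and the matrix argument are my own illustrative assumptions, not Calypto’s actual code:

```cpp
#include <cstdint>

// Butterfly: combine the even-part results e[] with the odd-part
// products o[] into the N output samples. The same structure is
// reused for all sizes (4/8/16/32); because N is a compile-time
// constant, an HLS tool such as Catapult can fully unroll the loop
// into parallel adders.
template <int N>
void butterfly(const int32_t e[N / 2], const int32_t o[N / 2],
               int32_t dst[N]) {
  for (int k = 0; k < N / 2; ++k) {
    dst[k]         = e[k] + o[k];
    dst[N - 1 - k] = e[k] - o[k];
  }
}

// Mult_odd: multiply the odd-indexed input coefficients by the odd
// matrix M (the coefficient values come from the HEVC spec and are
// omitted here).
template <int N>
void mult_odd(const int32_t src_odd[N / 2],
              const int16_t M[N / 2][N / 2],
              int32_t o[N / 2]) {
  for (int k = 0; k < N / 2; ++k) {
    int32_t acc = 0;
    for (int j = 0; j < N / 2; ++j)
      acc += M[k][j] * src_odd[j];
    o[k] = acc;
  }
}
```

An 8-point IDCT would then call the 4-point core on the even coefficients, mult_odd<8> on the odd ones, and butterfly<8> to combine them, matching the recursive A8 = (A4, M4) decomposition above.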

Related – Shorten the Learning Curve for High Level Synthesis

Our functional model of the 1-D IDCT has instances of function calls and some muxes:

To meet the H.265 specification we have to make a parallel implementation and create a 2-D IDCT using some hierarchy:

Using HLS
Designers use the HLS tool Catapult by adding design files, clicking on the hierarchy tab to select the top-level blocks, then clicking on libraries to select a specific technology and RAM models. Next you click on mapping and choose a target clock frequency, then map your data_in and data_out as RAM.

You next select your main loop and see which resources are being used in the design:

To schedule when operations are to occur you click on the schedule tab and work with a Gantt chart. Finally, you are ready to generate RTL code.

Verification
To double check that the generated RTL code is actually performing what we had in mind with our algorithm we need to create a testbench and verification flow. Most of this process is now push-button automated for us:

The transactors are what convert function calls into pin-level signal activity.

Related – Verifying the RTL Coming out of a High-Level Synthesis Tool

Summary
The tutorial from DAC showed me that C++ and SystemC coding are more compact to describe my video hardware than using RTL code. The Catapult tool for HLS is used to control micro-architectural decisions so that I can trade off power, performance and area metrics.

Companies like Google have found that using HLS on their VP9 video compression design was 2X faster than the previous approaches using hand-coded RTL, while dramatically reducing the number of lines written. Give the folks at Calypto a call to start discussing how appropriate HLS is for your hardware architecture, you may just find out that you can get your next IP or SoC to market in less time with fewer engineers, a nice benefit.


6 Memory Considerations for IoT Designs Built Around Cortex-M7 MCUs

6 Memory Considerations for IoT Designs Built Around Cortex-M7 MCUs
by Majeed Ahmad on 07-27-2015 at 12:00 pm

Tightly coupled memory (TCM) is a salient feature of the Cortex-M7 microcontrollers, as it boosts MCU performance by offering single-cycle access for the CPU and by handling high-priority, latency-critical requests from the peripherals.

Continue reading “6 Memory Considerations for IoT Designs Built Around Cortex-M7 MCUs”


Synopsys’ Andreas Kuehlmann on Software Development

Synopsys’ Andreas Kuehlmann on Software Development
by Paul McLellan on 07-27-2015 at 7:00 am

Andreas Kuehlmann is the general manager of what is officially now known as the Software Integrity Group of Synopsys, what you might think of as Coverity although they have made some acquisitions too, so they now have a broader technology base. I sat down to talk to him last week.

He was brought up in Germany and came to the US in 1991 to join the IBM TJ Watson Research Center. He was involved with high level synthesis and worked on equivalence verification, in time enabling IBM’s custom verification.

In 2000 he joined Cadence Berkeley Labs (which is where I think I first met him, since I was at Cadence at the time). In 2003 Andreas was promoted to running the labs. In 2010 he joined Coverity as the VP of R&D. Funnily enough, he became president of the IEEE Council on Electronic Design Automation (CEDA) just before leaving design automation! Since 2002 he has also been an adjunct professor at Berkeley.

Coverity was acquired by Synopsys and in May of this year Andreas was appointed GM of the Software Integrity Group. Despite being part of Synopsys, Andreas emphasized that their business is not EDA. They are serving the software industry which is much larger than semiconductor. To give you an idea, there are about 100,000 design engineers, 1M embedded software engineers and 10+M software engineers total growing 10% per year.

A year ago they acquired a small startup in France which manages software test execution, finding which tests need to be run when a change is made. Recently, they added two acquisitions in the area of dynamic security testing, complementing the static analysis approach used by the Coverity technology. So these acquisitions add dynamic analysis.

The mission of the group is to make software development a more mature process. There is great diversity in the maturity level companies apply to software development, and many don’t use modern methodologies. In chip design you don’t get to “run” the code by taping out the chip, so if you don’t use modern methodologies you don’t get working chips. Software development is not like that, and quality and security suffer as a result.

What is needed is a more general approach, like we use in hardware design, with a combination of different techniques. Static analysis under the hood uses some of the same technology as formal verification, but no code is reused; you can’t just yank out some Synopsys product and make a software version. Some Synopsys products in the system space, such as virtual platforms, are also involved in embedded software, where there is a much stronger awareness of the disciplined approach since those teams see what the IC designers do day-to-day.

Embedded software is simply any software that runs in a box: a car, a washing-machine, a router. It is not small scale. There is a lot of code in your smartphone as you probably know, although that is a lot less mission critical than your car.

In the IC world, the tool investment is $50-100K per engineer. In the software world it is more like $10-12K. This will change. Software development is a labor-intensive process and with modern tools it can be done much better. It makes no sense to pay a software engineer $150K/year and then not give him or her good tools, any more than it makes sense with an IC designer.

I asked Andreas about open source competitors. He says they are inferior. Anything with high algorithmic content is hard to develop using open source projects because it depends on deep expertise, not just manpower. It really doesn’t make any more sense for a software engineer to write their own C++ static analysis than it does for a design engineer to write their own static timing analysis. Apart from the opportunity cost, they almost certainly don’t know how to do it.

Having said that, they are involved with the open source community. They have scan.coverity.com which allows open source projects to use the Coverity technology for free. It has been applied to several thousand projects already. And they have found their share of bugs, even in some high profile projects like Linux and Apache (the webserver, not the EDA company).

See also Bijan Kiani Talks Synopsys Custom Layout and More
See also Antun Domic, on Synopsys’ Secret Sauce in Design


Power Analysis Needs Shift in Methodology

Power Analysis Needs Shift in Methodology
by Pawan Fangaria on 07-26-2015 at 7:00 am

It’s been the case most of the time that until we hit a bottleneck, we do not realize that our focus is not at the right spot. Similar is the case with power analysis at the SoC level. Power has become equally if not more important than the functionality and other parameters of an SoC, and therefore has to be verified early, along with the functional verification of the SoC. Today, an SoC is a complete system with various functional units with different power profiles and requirements. Hence the traditional method of running simulation, generating switching activity for a number of cycles, using a power analysis tool to analyze this activity, and then extrapolating to estimate power for the whole SoC is no longer appropriate; it was fine for smaller chips with limited functionality. Today, we need to analyze actual switching activity for complete runs of applications on an SoC.

Emulation is an emerging solution to capture switching activity over long emulation runs, typically in an SAIF (Switching Activity Interchange Format) file. However, SAIF files do not contain temporal information, which is key to identifying power peaks at different times. VCD and FSDB formats have temporal information; however, they are inefficient due to their large generation times and subsequent read/write times. There are also other inefficiencies in their data organization, storage and access mechanisms. Moreover, power analysis tools may not be able to handle such large files generated by emulation. Overall, even an emulation methodology based on a file-based flow is not the right solution for exploring and analyzing power at the SoC level.

So, what’s the alternative for detailed power analysis of design regions and applications that cause high switching in SoCs in real scenarios? During the 52nd DAC we heard from Mentor and ANSYS about an innovative approach where Veloce generates real-time dynamic power data and PowerArtist reads it directly for power measurement and analysis without any file-based interface. I have already written about some details on the PowerArtist side; see the link at the end of this article. Now it’s my pleasure to write about some actual details of what happens on the Veloce side.

The Veloce emulation system is used to boot the OS and run live applications. The Veloce Activity Plot is its unique capability to identify high-switching regions over long emulation runs and enable designers to trace back to the logic blocks or applications that have power concerns. One can view the activity plot of a full design and analyze its power consumption pattern in an order of magnitude less time than a file-based system takes. As an example, the activity plot of a 100 million gate design over 75 million clock cycles can be generated in just 15 minutes by the Veloce emulation system.

After identifying high switching activity regions at the top level of the design, the sub-blocks or applications responsible for high switching activity can be analyzed further. The time zone information thus obtained can be captured in a tzf (Time Zone File) file and sent to Veloce for generation of complete data for the selected time windows for detailed power analysis.

During the emulation run, live switching data can be sent to PowerArtist through Dynamic Read Waveform API (DRW-API). This approach enables accurate power calculation at the system level where booting an OS and running software applications is required. The dynamic API-based streaming of switching data between emulation and power analysis tools allows for all operations to be run in parallel including emulation of the SoC, capture of switching data, reading of the switching data, power analysis and generation of power numbers.

Both tools work on the same data model, which improves the efficiency of data exchange between them. The compile times of both tools are aligned. Also, a native ‘critical signal list’ (typically 10-20% of total signals in a design) is used to further improve time-to-power performance by reducing data exchange between the tools.

Veloce along with PowerArtist enables complete RTL power exploration for power budgeting and tradeoffs, as well as accurate power analysis and signoff at the gate level in a targeted application environment. The direct data exchange provides a huge improvement in overall flow time. The verification cycles needed to collect design switching activity can be very long compared to simulation. Data-driven decisions for accurate power analysis are enabled over a variety of test scenarios.

This particular flow to generate power numbers has provided up to 4.25x speed improvement over file-based flow on real customer designs. A table of designs along with their speed improvement numbers is given in a whitepaper written by Vijay Chobisa and Gaurav Saharawat at Mentor Graphics. The whitepaper contains more details about the new methodology; it can be accessed from HERE.

Also read: How PowerArtist Interfaces with Emulators

This innovative methodology opens up a powerful and accurate way of measuring and analyzing power in a new environment where multiple functions such as computing, gaming, video streaming, and watching movies can be integrated on a single device such as a smartphone. We can expect more such innovations for power analysis in the near future.

Pawan Kumar Fangaria
Founder & President at www.fangarias.com


A Candid Conversation with the GlobalFoundries CEO!

A Candid Conversation with the GlobalFoundries CEO!
by Daniel Nenni on 07-25-2015 at 8:00 pm

I did not know Dr. Sanjay Jha prior to this meeting but I certainly knew of him from his time at Qualcomm. It seemed a bit odd for me to fly to Dresden to meet a man that is based here in Silicon Valley, but that made the meeting all the more interesting. Especially after finding out that German Chancellor Angela Merkel would also be visiting Fab 1, absolutely.

While at Qualcomm, Sanjay was known as a technically brilliant man with exemplary business skills, an assessment I agree with wholeheartedly. After 20 years at Qualcomm, which started in design and ended in the executive ranks, Sanjay joined Motorola as Co-CEO and spun out the mobility division, which he later sold to Google for a whopping $12.5B. The next time I heard Sanjay’s name it was on the short list for the next Intel CEO, which I think would have been an excellent choice. That position of course went to an Intel insider (Brian Krzanich) and, much to my surprise and great pleasure, Sanjay joined GlobalFoundries. Knowing what I do about him I expected great things, but I never would have expected the acquisition of the IBM semiconductor division. Not only did GF acquire it, they got paid $1.5B! Talk about the epic deal of the century!

One thing I should warn you about when talking to Sanjay is to be careful what you ask technically, because he can go down to the transistor level, no problem. We had a very interesting discussion about Vts and body bias constraints. It is interesting to note that body bias is not mandatory when designing in FD-SOI, but if you choose to use it, it can be applied at the chip and block level, so a designer can do what-ifs and decide how best to leverage it, but I digress… I will have one of our PhD bloggers write about body biasing in more detail because it seems to be a point of contention between bulk CMOS and FD-SOI design.

After the interview I was afforded a trip through the Fab 1 clean room. Certainly not my first time through a clean room, but it is always exciting to see the inner workings of semiconductor manufacturing. In fact, one of my first jobs out of college was in a fab on Mathilda Avenue in Sunnyvale circa 1984. I remember seeing the “No Accidents in X Days” sign for the first time and the X was in single digits, which was frightening considering how toxic fabs could be back then. And I have no idea why they call clean room outfits bunny suits because there is nothing cute about them!

Other bloggers covered the technical details of the 22nm FD-SOI announcement:

GlobalFoundries 22nm FD-SOI: What Happens When

GlobalFoundries FD-SOI. Yes, It’s True

GlobalFoundries Endorse ST/LETI FD-SOI 22nm!

And Paul McLellan just added two more interesting FD-SOI blogs here:

FD-SOI: a Gentle Introduction

Thomas Skotnicki: FD-SOI 26 Years in the Making

Sanjay also committed an additional $250M investment for 22nm development and capacity. This brings the total Dresden investment to more than $5B since 2009, ensuring that Fab 1 will continue to be the largest semiconductor manufacturing facility in Europe. Hopefully Chancellor Merkel was impressed and will offer matching funds or government incentives of some sort.

Also Read:

CTO Interview with Dr. Wim Schoenmaker of Magwel

IROC Technologies CEO on Semiconductor Reliability

CEO Interview: Jens Andersen of Invarian


Device Noise Analysis, What Not to Do for AMS IC Designs

Device Noise Analysis, What Not to Do for AMS IC Designs
by Daniel Payne on 07-24-2015 at 12:00 pm

AMS IC designers have a lot to think about when crafting transistor-level designs to meet specifications and schedules, so the most-used tool in their kit is the trusted SPICE or FastSPICE circuit simulator to help analyze timing, power, sensitivity and even device noise. I just did a Google search for “device noise analysis ic” and the first five of ten search results all have links to Mentor Graphics, so that led me to read their recent White Paper entitled, “Ten Common Device Noise Analysis Mistakes“.

The three types of Device Noise Analysis are listed in this table for a variety of AMS circuits:


The basic idea is to run the appropriate type of noise analysis depending on the circuit type. If you are using a standard SPICE circuit simulator, then the following list of 10 mistakes will be helpful to avoid, or you could consider using the Analog FastSPICE (AFS) tool from Mentor that is equipped to handle each issue.

1. Insufficient Transient Accuracy
An AMS circuit may require 80 dB to 120 dB of dynamic range; however, SPICE tolerances typically default to reltol=1e-3, providing just 60 dB of dynamic range. For each additional 20 dB of dynamic range, tighten reltol by another factor of 10.
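As a quick back-of-the-envelope check (my arithmetic, implied by the rule above), the usable dynamic range scales as

$$\mathrm{DR_{dB}} \approx 20 \log_{10}\!\left(\frac{1}{\mathrm{reltol}}\right)$$

so reltol = 1e-3 gives about 60 dB, 1e-4 gives about 80 dB, and reaching the full 120 dB implies reltol on the order of 1e-6.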

2. Periodic Noise Analysis with Too Few Sidebands
The default number of sidebands in SPICE for periodic noise analysis is under 50; however, for 45 nm and smaller geometry process nodes you will want to use more than 1,000 sidebands. Traditional SPICE run times increase quadratically with the number of sidebands, so you can quickly run out of RAM. The magic sauce with AFS RF allows you to run full-spectrum device noise analysis, which has an accuracy equivalent to an unlimited number of sidebands per run.


Switched-Capacitor Filter Noise Voltage Comparison

3. Simplifying Circuits for Periodic Noise
Using SPICE for periodic noise analysis consumes too much RAM, so you may be tempted to just simplify your netlist by removing transistors and other elements. The big downside is that you could make a mistake in your reduced netlist, plus you are taking up valuable engineering time. With the AFS RF tool you can simulate circuits with more than 100K elements, a much easier approach.

Related – Full Spectrum Analog FastSPICE Useful for RF Designs on Bulk CMOS

4. Not Including Parasitics in VCO device noise analysis
With limited capacity in SPICE it’s tempting to simulate a VCO using a pre-layout netlist with no parasitics, although the values your simulation returns aren’t really accurate without parasitics.

5. Using Oscillator Noise instead of VCO noise
Having a circuit simulator that can perform only free running oscillator noise is quite a limitation if you really have a voltage controlled oscillator (VCO), because the analysis results are quite different. The AFS RF tool handles both free running oscillator and VCO noise analysis.

6. Manually Analyzing VCO Sensitivity
One part of optimizing a VCO is to minimize phase noise, and you can perform this analysis manually by taking more time, increasing the possibility of errors, and really only approximating the values. A more elegant approach is to visually see the instantaneous noise from every device as it contributes to the VCO output noise. Here’s what that analysis looks like with AFS RF for the noise intensity, sensitivity and jitter:


VCO Noise Contribution Sensitivity

7. Not Updating Transient Noise for Every Device at Every Time Step
A traditional transient noise simulator may update the random device noise injection at fixed time intervals, but not at every simulator time step, causing inaccuracy. With the AFS simulator you will see accurate transient noise results by injecting noise at every time step.

8. Setting Transient Noise NoiseFmax Too Low
With SPICE you can set the noisefmax simulator setting to trade off runtime and accuracy: choosing a smaller noisefmax gives a shorter runtime but lower accuracy, while choosing a higher noisefmax gives better accuracy at slower runtimes. It’s hard to tell when you’ve made the right trade-off with noisefmax. Since AFS uses the full noise spectrum from noisefmin to noisefmax, there is no truncating the device noise spectrum.

Related – DAC Update on IC Design Tools at Mentor

9. Setting Transient Noise Tstop Too Short
Users set the tstop value, which controls how many cycles a simulation runs, but choosing a value that is too short means that the results could be statistically uncertain. When using the AFS simulator for transient noise analysis you get a recommended tstop value that gives the desired statistical confidence level.
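To see why a short tstop is risky, here is a rough statistical argument (mine, not from the white paper): if the noise samples are roughly Gaussian and you collect $M$ effectively independent samples, the relative standard error of the estimated RMS noise scales as

$$\frac{\sigma_{\hat{x}_{\mathrm{rms}}}}{x_{\mathrm{rms}}} \approx \frac{1}{\sqrt{2M}}$$

so pinning the RMS value down to about 1% (one sigma) takes on the order of 5,000 independent samples, which is why a tool-recommended tstop for a target confidence level is useful.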

10. Post-Processing Mistakes
Many AMS designs have performance metrics measured in the frequency domain, while transient noise analysis is in the time domain, so you have to apply fast Fourier transform (FFT) post-processing to the data. Be cautious about post-processing, because accuracy can be hurt by the following (see the short sketch at the end of this item):

  • Extra spectral leakage beyond 2.5 FFT bins
  • Some signal frequencies are not perfectly centered
  • An FFT window which doesn’t minimize spectral leakage
  • Using MATLAB default FFT windows


Excessive Spectral Leakage

With AFS, there’s CalcPad, which does the post-processing of transient noise waveforms to ensure correct FFT-based results.
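The leakage and centering issues in the list above are easy to reproduce. Below is a small, self-contained C++ toy (my own demo, unrelated to CalcPad or AFS): it generates a tone that does not land on an integer FFT bin and compares the spectrum near the signal with and without a Hann window.

```cpp
#include <cmath>
#include <complex>
#include <cstdio>
#include <vector>

static const double PI = 3.14159265358979323846;

// Direct single-bin DFT; fine for probing a handful of bins in a demo.
std::complex<double> dft_bin(const std::vector<double>& x, int k) {
  const double N = static_cast<double>(x.size());
  std::complex<double> acc(0.0, 0.0);
  for (int n = 0; n < static_cast<int>(x.size()); ++n)
    acc += x[n] * std::polar(1.0, -2.0 * PI * k * n / N);
  return acc;
}

int main() {
  const int N = 1024;
  const double fs = 1.0e6;       // sample rate (illustrative)
  const double f_sig = 100.3e3;  // NOT an integer multiple of fs/N: leakage

  std::vector<double> raw(N), windowed(N);
  for (int n = 0; n < N; ++n) {
    const double s = std::sin(2.0 * PI * f_sig * n / fs);
    const double w = 0.5 - 0.5 * std::cos(2.0 * PI * n / (N - 1));  // Hann
    raw[n] = s;
    windowed[n] = s * w;
  }

  const int k0 = static_cast<int>(std::round(f_sig * N / fs));  // nearest bin
  for (int k = k0 - 3; k <= k0 + 3; ++k)
    std::printf("bin %4d  rect %9.2f  hann %9.2f\n", k,
                std::abs(dft_bin(raw, k)), std::abs(dft_bin(windowed, k)));
  return 0;
}
```

With the rectangular (no) window the skirts decay slowly across many bins; the Hann window confines the energy to a few bins around the tone, at the cost of a slightly wider main lobe. Choosing a coherent frequency, f = round(f_sig*N/fs)*fs/N, removes the leakage entirely.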

Summary
Device noise analysis is required for AMS IC designs to reach performance goals, but there are many pitfalls when using a traditional SPICE circuit simulator. Choosing a circuit simulator like Analog FastSPICE from Mentor Graphics will help you avoid these pitfalls safely and quickly. Read the complete 6 page White Paper here.


FD-SOI: a Gentle Introduction

FD-SOI: a Gentle Introduction
by Paul McLellan on 07-24-2015 at 7:00 am

Over the last couple of weeks, FD-SOI has been in the news with GlobalFoundries’ announcement of a 22nm FD-SOI process that will run in the Dresden fab. Also, earlier in the week I talked to Thomas Skotnicki about the saga (and it is a saga) of how FD-SOI got from his PhD thesis to volume manufacturing and global deployment. But there is a lot less knowledge around about FD-SOI than there is about FinFET, so I thought it would be good to go back and see where the motivation for a process like this came from.

See GlobalFoundries 22nm FD-SOI: What Happens When

See Thomas Skotnicki: FD-SOI 26 Years in the Making


By 28/20nm, planar processes were running into problems. The channel area underneath the gate was getting very short and the gate was no longer powerful enough to control it properly. It could control the top part of the channel, but the further from the gate, the less the control. In particular, when the gate was off there were paths between source and drain that remained on, and so there was very high leakage. See the diagram above. It was clear that a new transistor architecture would be required.

The basic constraint was that all of the channel needed to be close to the gate so that it could be controlled properly. One way to do this was to make the channel into a thin vertical fin (like a shark’s fin; that is where the name comes from) and wrap the gate around it, which gives you FinFET. Since the fin is thin, it is never far from the gate, so control is good and leakage is low.

The alternative is to go horizontal. If a thin channel is put on top of an insulator, and the gate is built on top of that then there is once again good control and low leakage. There are simply no paths through the channel that are far from the gate and so poorly controlled because the insulator is…well, an insulator. Current cannot flow there. The transistor is not quite as good as FinFET since it only controls one side of the channel, but it is a lot easier to manufacture. That is thick-box FD-SOI (box just stands for buried oxide, the insulator underneath the channel). If, however, the box is very thin then in effect the substrate itself becomes a sort of second back gate and can further be used to control the channel, not to turn it on and off but to affect its performance. See the diagram above.

The above diagram shows how forward body bias (FBB) works. FBB of up to about 1.5V can be applied (GF reckon they can go to 1.8V at 22nm). When higher performance is required the FBB is applied. The transistors are faster at a cost of slightly higher leakage. When the design is in a standby mode of some sort then the FBB can be removed to reduce the leakage again. Alternatively, the power supply voltage can be reduced, which would slow the transistor down too much to meet timing, and then FBB can be used to speed it up again. But the voltage is still down, reducing leakage and, of course, dramatically reducing dynamic power (voltage is squared in the power equation).

Since the FBB is under software control, it can be used very much like dynamic voltage and frequency scaling (DVFS), and the EDA and signoff flows required for FBB are almost identical. The above diagram shows the effect for the GF 22FDX process. The red line is 28HKMG, the green is 22FDX and the blue is 22FDX with FBB of 1.5V.

FBB can also be used to reduce overdesign by recentering the parts, moving slow and typical parts to the fast end of the distribution, and narrowing the distribution. One of the big selling points of the GlobalFoundries 22FDX process(es) is that the power supply voltage can be reduced all the way to 0.4V.
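To put rough numbers on the squared-voltage point (my arithmetic; the 0.9V nominal is an assumption for illustration only), dynamic power follows

$$P_{\mathrm{dyn}} = \alpha\, C\, V_{dd}^{2}\, f$$

so dropping $V_{dd}$ from a nominal 0.9V to 0.4V at the same frequency cuts dynamic power by $(0.9/0.4)^2 \approx 5\times$, before counting any leakage savings or the help FBB gives in recovering the lost speed.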

You might have heard that FD-SOI uses an expensive starting material (wafer blank) and that is true. But, in effect, that is because a couple of “mask steps” have already been done. I put it in quotes since no masks are involved. The wafer blank looks like the above picture. For 28nm the top silicon layer is 12nm thick and the box is 20nm. Some (5-6nm) of the top silicon is lost during processing leaving a channel depth of about 6nm. For 22nm these numbers are further reduced. The rest of the process is actually simpler than the equivalent bulk process involving fewer masks and fewer mask steps (and about half the number of FEOL masks as the equivalent FinFET process, although the BEOL is, of course, the same).

Any SOI technology has some inherent advantages over bulk:

  • no well taps required
  • isolation makes on-chip RF much simpler
  • higher resistance to latch up
  • inherently more radiation hard (in fact SOI was first used in space)
  • reduced parasitic capacitance (this is one of the challenges with FinFET)

Thin box FD-SOI has the further advantage of forward body bias. And the FD (fully depleted) part makes manufacture easier, since zero is a very easy number to keep under control.