
TSMC OIP: What to Do With 20,000 Wafers Per Day

by Paul McLellan on 09-17-2015 at 4:42 pm

Today is TSMC’s OIP Ecosystem Innovation Forum. It is an annual event but also serves as a semi-annual update on TSMC’s processes, investment, volume ramps and more. TSMC have changed the rules for the conference this year: they have published all the presentations by their partners and customers. Tom Quan of TSMC told me that they will also provide a subset of the presentations TSMC gave to open the day.

The semiconductor business is driven by several large markets, the biggest of which is mobile. The fun statistics of the day: mobile grew 26% from 2014 to 2015, to shipments of 1.9B units. Since there are 4.3B worldwide mobile users, this means that the annual replacement rate is close to 50%. Global mobile traffic is forecast to go up 10X in 5 years, from 30EB/yr in 2014 to 292EB/yr in 2019 (EB is exabyte).
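These back-of-envelope numbers are easy to check. Here is a minimal Python sketch using only the figures quoted above; the variable names are illustrative, and the compound annual rate is derived from the two traffic figures rather than quoted by TSMC.

```python
# Figures quoted in the post; variable names are illustrative.
shipments_2015 = 1.9e9      # mobile units shipped
mobile_users = 4.3e9        # worldwide mobile users
traffic_2014_eb = 30.0      # exabytes per year
traffic_2019_eb = 292.0     # exabytes per year (forecast)
years = 2019 - 2014

# Share of users who would get a new device if each shipment went to one user.
replacement_rate = shipments_2015 / mobile_users

# Overall traffic multiple and the compound annual growth rate it implies.
traffic_multiple = traffic_2019_eb / traffic_2014_eb
traffic_cagr = traffic_multiple ** (1 / years) - 1

print(f"replacement rate: {replacement_rate:.0%}")
print(f"traffic multiple: {traffic_multiple:.1f}X")
print(f"implied traffic CAGR: {traffic_cagr:.0%}")
```

So "close to 50%" is about 44%, and 10X in five years works out to a bit under 60% growth a year.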

For the future, the three big markets other than mobile are Internet of Things (IoT), Automotive, and High-performance Computing (HPC).

Let’s start with IoT: the market has a forecast CAGR from 2013 to 2018 of 21%. But the market is ripe in that 99.4% of devices are not connected, so by 2022 the average house is forecast to have 500 smart devices. Of course every time you blink the IoT forecast goes up by a billion units, but for sure it is real.

The big opportunity in automotive in the medium term is driverless cars or, before that, advanced driver assist systems (ADAS). Google’s driverless cars have done over 2M miles (with 16 minor accidents, all the fault of the other vehicle). Delphi/Audi drove a vehicle across the US from SF to NY (which I wrote about during DAC). Tesla will have autopilot in all their cars. One interesting potential change that autonomous vehicles might bring is to ownership. If you could have a car on-demand whenever you wanted one, would you own your own vehicle at all? Your car plan in a decade might be like your cellphone plan today, with various options depending on usage.

HPC is required to provide the back-end for all those mobile devices, typically in large datacenters, aka cloud computing. The need for low latency and location awareness means that the mobile device needs to provide local intelligence, but a low-latency connection to the datacenter will be required too. This means that there will be upgrade cycles to all the base stations, of which there are (literally) millions.

TSMC provides a wide range of processes for different types of silicon. The process nodes mentioned here are where TSMC is working on bringing the process up; volume production is one (or sometimes two) nodes behind.

[TABLE]
|-
| Application
| Technology
|-
| MEMS
| 0.13um
|-
| Image Sensor
| 40nm
|-
| Embedded flash
| 28nm
|-
| RF
| 16nm
|-
| Logic
| 7nm
|-
| Analog
| 16nm
|-
| High voltage
| 40nm
|-
| Embedded DRAM
| 40nm
|-
| BCD/power
| 0.13um
|-

R&D overall is up 19% year-on-year from 2014 to 2015. It was $1.9B in 2014 and will be $2.2B in 2015. OIP has grown and now has over 200 PDKs, 7500 technology files and 8500 IP blocks. The wafers enabled by this IP grew at a CAGR of 22% from 2005-14. Capex is up 10-16% from 2014 to about $10.5B to $11B, compared to $9.5B last year. Total capacity is 1.6M 8″ equivalent wafers per month, over 20,000 per day, up 12% year-on-year.

UPDATE: I totally messed up the title of this blog and the computation. It is over 50,000 wafers per day or over 2,000 per hour.
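The corrected arithmetic is simple enough to check. A quick sketch, assuming a 30-day month (an assumption; the post does not say how the month was counted):

```python
# Capacity figure quoted in the post; the 30-day month is an assumption.
wafers_per_month = 1.6e6    # 8-inch-equivalent wafers
days_per_month = 30
hours_per_day = 24

wafers_per_day = wafers_per_month / days_per_month
wafers_per_hour = wafers_per_day / hours_per_day

print(f"{wafers_per_day:,.0f} wafers per day")
print(f"{wafers_per_hour:,.0f} wafers per hour")
```

That gives roughly 53,000 wafers a day and a little over 2,200 an hour.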

New processes are ramping faster than ever. N40 ramped in 35 months. N28 ramped in 22 months. N20 ramped in 3 months. N16 is ramping even faster. At this rate volume production will be faster than qual!


The second presentation was by Jack Sun, TSMC’s CTO. I tried to take notes on the processes but there was too much information. I’ll revisit this once I get some slides to work from. But in the meantime, here are a few highlights.

  • N10 will be in risk production in Q4 of 2015. Development is on-track.
  • N7 will be in risk production in Q1 of 2017. SRAM test-chip is functional.
  • 16FFC will be in risk production in Q2 of 2016.
  • 16FF+ is in volume production, with a couple of dozen tapeouts and 50 more expected before end of year.

The key new process coming soon is 16FFC, which is the third generation of 16nm process. Speedup is 65% vs 28nm and 40% vs 20nm. Or a power saving of 70% vs 28nm or 60% vs 20nm. It can go down to 0.55V. TSMC have repeatedly stated that 16FFC will be a long-lived node, which I take to mean that 16FFC will be cheaper per transistor than N28. The design rules are the same so migrating designs and IP should be fairly straightforward. There is a new library coming that will allow operation down to 0.4V, with a focus on minimizing the non-gaussian variation.

N10 has a scale factor of 50% versus 16FF+, with a performance improvement of 20% or a power saving of 40%. There are three different Vt options and gate-length biases covering a wide range of leakage/speed envelopes. N10 SRAM is yielding well, and SERDES runs at 56Gbps with 22% better power efficiency than 16FF+.

N7 has a further speed improvement of 10-50% versus N10, or a power saving of 25-30%. It will be 1.6X the density. Risk production will be Q1 of 2017. Initially there will be libraries for mobile, followed by new second-generation libraries with taller cells for HPC. There is special SRAM for HPC too, with 25% better performance. There is an ARM Cortex-A57 test chip showing 40-45% area reduction.

But the roadmap doesn’t end there. TSMC is doing research on Ge FinFET, III-V NFET, gate-all-around nanowires, 2D crystal, directed self-assembly, multi-e-beam direct write and inverse computational lithography. And, of course, EUV. TSMC have achieved 90W source power in-house. ASML have demonstrated 130W. They are working jointly to get all the settings worked out for 125 wafers/hour production.

Other segments. CMOS Image Sensor (CIS):

  • FSI: front-side illuminated image sensor
  • BSI: back-side illuminated image sensor (the die is thinned and the light comes through the back)
  • BSI/ISP: back-side illuminated image sensor flipped onto an image signal processor
  • NIR: near-infrared

MEMS

  • accelerometer
  • pressure sensor
  • motion sensor
  • microphone
  • new gas sensors
  • new biometric sensors

Emerging new memories:

  • eRRAM
  • eMRAM

This is all from my handwritten notes. If you spot errors then correct me in the comments.


The Future of Moore’s Law

by Daniel Payne on 09-17-2015 at 12:00 pm

I lived in Silicon Valley, then moved north to the Silicon Forest (aka Portland, Oregon) in 1995, and thankfully we have a lot of high-tech companies here, like Intel, Lam Research (Novellus), Lattice Semi, Qorvo, Synopsys, Mentor, Cadence and Northwest Logic. There’s a global industry organization called SEMI that serves the manufacturing supply chain for micro-electronics and nano-electronics, including:

  • Semiconductors
  • Photovoltaics (PV)
  • High-Brightness LED
  • Flat Panel Display (FPD)
  • Micro-electromechanical systems (MEMS)
  • Printed and flexible electronics

SEMI is over 40 years old and they have a Pacific Northwest breakfast forum on the topic: The Future of Moore’s Law. This event will take place on Friday, October 30th from 7:30AM-11:00AM at Mentor Graphics in Wilsonville, Oregon. I’ll be attending the event, so look forward to my trip report in another blog.

The list of speakers and topics looks quite impressive because it covers a wide range including semiconductor fab, semiconductor equipment, packaging, testing, research and EDA software.
[TABLE]
|-
| Time
| Speaker
| Program
|-
| 7:30am
|
| Breakfast & Registration
|-
| 8:00am
| Tim Cleary, Sr. Director of Marketing, Cascade Microtech (Moderator)
| Welcome Remarks
|-
| 8:05am
| Dr. Walden C. Rhines, Chairman and Chief Executive Officer, Mentor Graphics
| Design Perspectives and Challenges
|-
| 8:25am
| Dr. Chris Spence, Vice President, Advanced Technology Development, ASML
| Lithography Perspective and Challenges
|-
| 8:45am
| David Bloss, Director, Fab Equipment, Global Supply Management, Intel Corporation
|
|-
| 9:05am
|
| Networking Break
|-
| 9:25am
| Vamsi K. Paruchuri, Ph.D., Senior Manager, BEOL Technology Research, IBM Research @ Albany Nanotech
| 7nm and beyond
|-
| 9:45am
| John Hunt, Senior Director, Engineering, ASE Inc.
| Innovation in Packaging for Mobile Applications
|-
| 10:05am
| Dave Towne, Senior Technical Analyst, Yole Développement
|
|-
| 10:25am
|
| Networking
|-
| 11:00am
|
| Adjourn
|-

Registration
Pricing for Early Bird registration before October 16th is $55.00 for SEMI members and $75.00 for non-members. Where else can you network with such an interesting and influential group of people as this?



The Internet of Sensors

by Paul McLellan on 09-17-2015 at 7:00 am

The internet of things (IoT) has a number of key attributes: low power, security, connectivity. But almost every IoT application involves sensors of one sort or another. The visual sensors are built using CCD arrays and are basically low-resolution cameras, but the mechanical ones are typically built using MEMS technology. These include things like accelerometers, gyroscopes, compasses and even microphones.

For example, a multi-axis accelerometer, along with some clever signal processing software, can count your paces, tell whether you are walking, running or cycling, and more. Smartwatches and more limited-function Fitbit-like devices use this to track your activity. Before smartphones, one of the main drivers of MEMS accelerometers was their use in automotive sensors for airbag deployment. Another use is in filters for smartphones, which are built using MEMS techniques despite having no moving parts.
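To give a flavor of the signal processing involved in counting paces, here is a deliberately naive Python sketch. It counts upward crossings of a threshold on the acceleration magnitude. The function name, the threshold and the minimum gap are all made-up values for illustration; real devices use filtering and classification well beyond this, and this is not how any particular product does it.

```python
import math

def count_steps(samples, threshold=1.2, min_gap=10):
    """Count steps as upward crossings of an acceleration-magnitude threshold.

    samples   -- iterable of (x, y, z) readings in g
    threshold -- magnitude in g that counts as a footfall (assumed value)
    min_gap   -- minimum samples between steps, to ignore bounce (assumed value)
    """
    steps = 0
    above = False
    last_step = -min_gap
    for i, (x, y, z) in enumerate(samples):
        magnitude = math.sqrt(x * x + y * y + z * z)
        # A step is registered only on the rising edge, and only if the
        # previous step was at least min_gap samples ago.
        if magnitude > threshold and not above and i - last_step >= min_gap:
            steps += 1
            last_step = i
        above = magnitude > threshold
    return steps


# Synthetic walk: 1 g at rest with a 0.5 g bump every 25 samples (made-up data).
walk = [(0.0, 0.0, 1.0 + (0.5 if i % 25 == 0 else 0.0)) for i in range(100)]
print(count_steps(walk))
```

Using the magnitude rather than a single axis means the count does not depend on how the device is oriented in a pocket or on a wrist.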

See also Acoustic Resonators for RF: MEMS with No Moving Parts

They are also used to measure the G-forces in racing crashes. For example, since 2002, the radio earpieces used in INDYCAR racing also contain a 3-axis accelerometer that is used to measure the effect on the driver’s head in an impact. The data is streamed to a 90-second buffer in the car’s black box. One big attraction of MEMS is that the devices are physically small and light. There is not a lot of spare room inside a smartphone and even less inside an INDYCAR driver’s communication earpiece. The NFL had been doing something similar on a voluntary basis (with the accelerometers in the helmet, players are banned from wearing earpieces) but the program has been suspended for now.

MEMS stands for Micro ElectroMechanical Systems. In practice what this means is building very small mechanical systems using semiconductor manufacturing techniques. Sometimes the MEMS devices are constructed standalone but sometimes the electronics is integrated onto the same substrate. Today MEMS is an $11B business with double-digit growth forecast to reach $21B in 2020.

During SEMICON West there was a MEMS panel session. The entire theatre was full, with many rows of people standing behind the chairs (and me behind them). There is a lot of interest in MEMS for IoT.

One challenge in the MEMS market is that there hasn’t really been anything new recently. If you look at the legend to the graph above then every product segment was invented years ago. If the forecast of 50B IoT devices by 2020 is even approximately real, then this will drive volume in the MEMS market even without new product segments. But with many suppliers in each segment it is hard to have true product differentiation. It is important that MEMS devices are quick and easy to design so that they can be matched closely to their markets.

One attractive area of future potential growth is medical devices. Implantable (or in your contact lens) glucose sensors and other bio-medical sensors have the possibility of revolutionizing various aspects of medical care. Instead of measuring your blood pressure once in a blue moon when you visit a physician, it could be monitored continuously. Or your heart’s ECG. I believe that something like this will prove to be the killer app (or rather the keep-you-alive app) that will make us all wear a smartwatch; just getting our text messages on our wrist is not enough.

Historically MEMS devices have been built in specialized foundries and often each device required its own tweaks to the process. But the design and manufacture is becoming more standardized. This is similar to what has happened in IC process technology, where custom processes that used to be common are now vanishingly rare.

However, it is not just manufacturing that is becoming more standardized; so are the methodologies by which the devices are designed. Coventor was not on the panel session, although by coincidence their booth was only about 20 feet away. MEMS designers generally use one of two approaches to predicting the multi-physics behavior of their device designs: either highly simplified, hand-crafted models or computationally slow finite-element analysis (FEA). While each of these approaches has merit, neither can accurately predict the dynamic behavior of the entire device while taking full account of multi-physics coupling effects. As a result, MEMS designers have historically resorted to expensive and time-consuming prototyping on real silicon. Coventor’s MEMS+ is a different kind of FEA, based on a unique MEMS-specific library of high-order, parametric finite elements. These elements provide the accuracy and generality of FEA and the simulation speed of hand-crafted models. Because MEMS+ models run extremely fast compared to conventional FEA, designers can simulate their entire MEMS device, including gas damping effects and control circuitry. This is leading to a more standardized approach to MEMS design. What is really required is the equivalent of a PDK in the IC design world, a kit of pre-defined and pre-characterized basic structures out of which real MEMS devices can be created.
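For a sense of what the "highly simplified, hand-crafted model" end of the spectrum looks like, here is a Python sketch of a one-degree-of-freedom spring-mass-damper, the textbook lumped model of a proof mass on a suspension. This is not Coventor's method; the function name and the component values are invented purely for illustration.

```python
import math

def lumped_resonator(mass_kg, stiffness_n_per_m, damping_n_s_per_m):
    """Return (natural frequency in Hz, quality factor) of a 1-DOF spring-mass-damper."""
    omega0 = math.sqrt(stiffness_n_per_m / mass_kg)      # rad/s
    f0 = omega0 / (2 * math.pi)
    q = math.sqrt(stiffness_n_per_m * mass_kg) / damping_n_s_per_m
    return f0, q


# Made-up proof mass values, chosen only to give plausible orders of magnitude.
f0, q = lumped_resonator(mass_kg=1e-9, stiffness_n_per_m=1.0, damping_n_s_per_m=1e-6)
print(f"f0 = {f0 / 1e3:.1f} kHz, Q = {q:.1f}")
```

A model like this runs instantly but reduces the whole structure to three numbers, which is exactly why it cannot capture the coupled effects described above.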

See also Coventor, Lego and IoT in Denmark

Earlier this year Coventor did a webinar (along with ARM and Cadence) Addressing Smart Sensor Design for SoCs and IoT. One of the topics covered was how to create a mechanical MEMS component in Coventor’s MEMS+ environment. You can watch a replay of the webinar here.


IoT does NOT lack tools!

by Daniel Nenni on 09-16-2015 at 4:00 pm

Rarely does a month go by without acquisitions in the fabless semiconductor ecosystem. Not surprisingly, one of the most read pages on SemiWiki is the EDA Merger and Acquisitions Wiki, with more than fifty-seven thousand views. It really is a nice family tree, one which we (Daniel Payne) are diligent about keeping current. One of the most accretive EDA acquisitions this year, in my opinion, is Tanner EDA, and I will tell you why.

EDA Mergers and Acquisitions Wiki – SemiWiki.com


IoT Lacks Tools, Says EDA Vet

The above is a recent headline of an EETimes article featuring Alberto Sangiovanni-Vincentelli, a Berkeley professor and co-founder of EDA giants Cadence and Synopsys.

“The Internet of Things is just an intermediate step on the way to the sensor dominated world” where the numbers of networked sensors will exceed the population by several orders of magnitude, said Alberto Sangiovanni-Vincentelli, a Berkeley professor and co-founder of EDA giants Cadence and Synopsys.

“My passion is the science of design. I hate seat-of-the-pants design, leaving engineer free to design is a recipe for disaster,” he said. “What excites me is abstracting the meaning of a design and applying it to everything,” he added.

This may certainly apply to Berkeley, Cadence, and Synopsys but it does not apply to Caltech and Tanner EDA. There is a “Brief History of Tanner EDA” which traces their roots back to Caltech and Carver Mead’s seminal textbook on VLSI design HERE. While I understand Alberto’s points, I’m more of a seat-of-the-pants guy and believe the majority of the IoT designs will be from small to medium sized groups of entrepreneurs which will also consist of seat-of-the-pants kind of people.

An example of an IoT design is MEMS (Micro-electro-mechanical systems) where very small devices such as sensors, gyroscopes, accelerometers, and resonators are integrated into an SoC for IoT applications. Tanner EDA is all about MEMS and has a nice video, webinar, and whitepapers to get you started:

Micro-electro-mechanical systems (MEMS) is the technology of very small devices, such as sensors, gyroscopes, accelerometers and resonators. Whether your design is purely MEMS or a combination of MEMS and IC SoC, these tools can meet your most challenging needs. This video introduces the Tanner tool flow for MEMS design.

Tanner MEMS Tool Suite Overview and Demo: This webinar will show how Tanner L-Edit MEMS Design and SoftMEMS™ 3D Solid Modeler can be used to shorten your design cycle and improve the manufacturability of your MEMS devices.

Meeting MEMS Design Challenges with Unique Layout Editing and Verification Features–Part 1: A big difference between MEMS layout and IC layout is the use of unique, irregular shapes. Unlike conventional CMOS IC design, where layout shapes are usually Manhattan style (such as rectangles and rectilinear polygons) or polygons with 45-degree edges for routing, MEMS design utilizes a much broader variety of geometries, due to its wide application in mechanical, optical, magnetic, fluidic and biological fields. This two-part paper describes how and why support and ease of use for implementation of irregular shapes, including curves and all-angle polygons, is a critical criterion differentiating MEMS-oriented CAD tools from conventional IC-oriented tools. (Part 1 focuses on layout editing; Part 2 on verification.)

The bottom line is that Tanner EDA made AMS tools both affordable and easy to use, two things that are critical in the developing IoT market. If you ask me that is why Mentor acquired Tanner EDA and is keeping the brand as a separate business unit, absolutely.


Re-Thinking Server Design

by Alex Lidow on 09-16-2015 at 12:00 pm

The demand for information is growing at an unprecedented rate. Our insatiable appetite for communication, computing and downloading is driving this demand. With emerging technologies, such as cloud computing and the internet of things, not to mention the 300 hours of video being uploaded to YouTube every minute, this trend for more and faster access to information is showing no signs of slowing. What makes the transfer of information at high rates of speed possible are racks and racks of computing equipment – data centers. What is required to run these computing engines is electrical power delivered efficiently at extremely precise levels to various parts of numerous servers. Converting a distributed 48 VDC to individual processors running at 1 VDC precisely and efficiently is the crux of the challenge for today’s power conversion systems designers. This challenge must be met in order to meet the demands of the information explosion – simply put, the delivery of power needs to keep pace with the expansion of computing power.

How will power conversion systems continue to improve in order to keep pace with the rapid improvements in computing power and the need for efficient data centers? Traditionally, power conversion has been accomplished using silicon-based power transistors, but it is well known that silicon components are reaching their theoretical performance limits. A higher performing base material is needed for semiconductors if the demand for more and faster communications and computational tasks continues, and that demand will certainly grow.

Fortunately, in the past few years alternative higher performing materials, such as gallium nitride, have emerged. This material has the potential to perform more than 1000 times better than silicon. From a performance point of view, gallium nitride (GaN) is one of the most promising technologies and, what is really exciting, GaN has been shown to be price competitive with traditional silicon technology – and is already less expensive to produce. This is a disruptive technology.

So we wrote a book…


Given our experience with GaN technology we created new power conversion solutions using GaN devices and made performance comparisons with silicon power transistors traditionally used in power conversion systems.
Overall, this book is an aid to help leading-edge power systems designers understand and adopt gallium nitride power transistors for use in the ever-demanding application of DC-DC conversion for computing platforms, and to examine possible new directions for delivering efficient power to computing equipment within data centers. As the first book to re-examine Datacom power architecture using non-silicon based semiconductors, it examines new power conversion solutions with specific hardware examples.

The book shows how the dramatic improvement in switching performance of gallium nitride transistors as compared to silicon not only permits vast improvements in existing converters, but prompts a fresh look at changing power conversion system architectures.
In very specific ways, this book examines the benefits of enhancement-mode gallium nitride (eGaN®) FETs in power conversion applications with an input voltage range centered around 48 V with load voltage as low as 1 V. Examples of conventional PWM isolated converters, unregulated isolated converters of both hard-switched and soft-switched designs, and finally non-isolated converters using eGaN FETs are considered.

By combining the discussion of power systems architecture and GaN technology performance, we propose, create, and test a new power delivery architecture that takes advantage of the superior performance attributes of GaN. eGaN FETs and integrated circuits have demonstrated their ability to enable new power delivery approaches that can significantly improve overall system efficiency, power density, and cost.

Buy your copy now at: http://epc-co.com/epc/Products/Publications/DC-DCConverterHandbook.aspx


FPGA Prototyping: From Homebrew to Integrated Solutions

by Paul McLellan on 09-16-2015 at 7:01 am

Years ago, when FPGA prototyping started, there were no solutions that you could go out and buy and everything was created as a one-off: buy some FPGAs or an FPGA-based board, and put it all together. It was a lot of effort, nobody really knew in advance how long it would take, there was very limited visibility for debug and the whole thing was basically unsupportable. There is more discipline these days but even so, roughly half of all FPGA prototyping is done in a proprietary way that doesn’t scale as designs get larger and lacks more and more desirable features. The other half of the market uses an integrated solution that ties together FPGA-based hardware, the software for getting the design up and running, debug and daughter boards for hardware interfaces.

Last week I talked to Johannes Stahl of Synopsys about the new solution that they are announcing today. He told me that for some time, Synopsys has had a free book, the FPGA-based Prototyping Methodology Manual which was available for download if you answered a few questions. From those questions the top 5 care-abouts turned out to be:

  • mapping to the FPGA
  • debug visibility
  • performance
  • limited capacity
  • turnaround time

Today Synopsys is announcing an integrated solution combining ProtoCompiler software and HAPS-80 hardware that addresses these issues and:

  • reduces time to high-performance prototype to under 2 weeks
  • built-in debug captures over 1000 RTL signals per FPGA at speed, integrated with Verdi
  • 100MHz system performance
  • scalable up to 1.6B ASIC gates, which is around 7B transistors using the usual rules of thumb
  • fast parallelized tool flow

The fast bringup addresses three steps. First, an automated flow including partitioning and automatically inserting all the multiplexors necessary to get signals between the FPGAs. Second, reduced hardware and debug bringup time, and finally getting the performance optimized in multi-FPGA configurations (which is most of them). Bringup is less than two weeks, so not quite the one day that emulation has achieved today, but also not the multiple months that FPGA prototyping used to entail.

The performance increase is driven by new proprietary multiplexing that delivers 2X higher performance, system timing improved by up to 60%, and better P&R guidance bringing another 10% timing improvement. Plus, under the hood, there are the latest Xilinx Virtex UltraScale VU440 devices with 26-million-ASIC-gates capacity per FPGA.

This means that a single-FPGA configuration can achieve 300MHz, a multi-FPGA configuration that does not involve signal multiplexing can achieve 100MHz, and designs requiring the new high-speed pin multiplexing can achieve 30MHz. These speeds mean, for example, that you can boot a system to the OS prompt in less than a minute. HAPS-80 also enables at-speed operation of real world I/O.

The system is scalable from a single module, delivering 26M ASIC gates up to an enterprise system with 1.6B. Increasingly, in fact, enterprises are putting FPGA prototyping into the datacenter so that it can be shared among different engineers. This can either be done by putting generic hardware on the network, or else for a critical design configuring one or more systems and making them available to be shared.
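The capacity numbers lend themselves to a quick sizing calculation. Here is a rough Python sketch using the 26M gates per FPGA quoted above; the four-transistors-per-gate ratio is one common rule of thumb that I am assuming, and the function names and the `utilization` parameter are hypothetical, not part of any Synopsys tool.

```python
import math

GATES_PER_FPGA = 26e6          # ASIC gates per FPGA, as quoted above
TRANSISTORS_PER_GATE = 4       # assumed rule of thumb (one 2-input NAND)

def fpgas_needed(design_gates, utilization=1.0):
    """Number of FPGAs needed to hold a design at a given usable fraction."""
    return math.ceil(design_gates / (GATES_PER_FPGA * utilization))

def transistor_estimate(design_gates):
    """Rough transistor count for a gate count, using the assumed ratio."""
    return design_gates * TRANSISTORS_PER_GATE

print(fpgas_needed(1.6e9))                     # FPGAs for the largest quoted system
print(f"{transistor_estimate(1.6e9):.1e}")     # transistors, order of magnitude
```

With these assumptions the largest system works out to around 62 FPGAs and 6.4B transistors, in the same ballpark as the 7B quoted. In practice you would not fill every FPGA completely, which is what the `utilization` parameter is there to model.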

For people who have been using HAPS-70, the previous generation, everything is backwards compatible. Cables and connectors are the same, the daughter-boards are the same, the form-factor is the same as Haps Trak 3. The software flow through ProtoCompiler is the same.

There are really two somewhat separate reasons for wanting to make use of FPGA prototyping solutions. The first is to get the hardware debugged by exercising it with large amounts of realistic software load. The second is to get the software development done without needing to wait for prototypes to be manufactured. There are, of course, alternatives to this: emulation, virtual platforms, hybrid emulation. Which is most appropriate depends to some extent on the stability of the design. If the RTL is changing extensively, then bringing up FPGA prototyping is less attractive since it takes a couple of weeks, by which time it is obsolete. But when the RTL is close to stable then it is far and away the fastest running solution and so the most attractive. If you need to validate a lot of hardware against a lot of software then this is the sweet spot.

Everything is available now. Faster bringup, higher performance, more visibility, large capacity, accelerated tool flow, backwards compatibility. What’s not to like?

The Synopsys blog on FPGA prototyping, Breaking the Three Laws, is here. The HAPS product page is here.


How to Overcome NoC Validation Multiple Challenges?

by Eric Esteve on 09-15-2015 at 12:00 pm

NetSpeed has developed NocStudio, a front-end design optimization tool that helps architects create an SoC architecture while bridging the gap to the back end (floorplanning and place and route). At the chip level, NocStudio generates a cache-coherent Network-on-Chip (NoC) that interconnects the various CPUs, GPUs or acceleration engines (the cache-coherent clusters) with the I/O-coherent and non-coherent agents by means of multiple cache-coherent controllers. NetSpeed’s Gemini coherent NoC is high performance, scalable, and highly configurable for a wide range of applications, and provides benefits such as optimized routing and channels, much easier place and route, and lower power consumption.

But coherency is all about sharing and there is a complex set of protocols to make sure that sharing happens correctly. A bug in any part of the execution can bring down the complete scheme and product. In that sense, NetSpeed’s Gemini NoC is also an additional IP function which needs to be extensively verified. Let’s see how NetSpeed has addressed the verification challenges linked with this highly configurable coherency IP.

The first challenge stems from the very specific nature of the IP: for a cache-coherent NoC, the verification state space is massive. The NoC must be free of deadlocks and fatal, unrecoverable errors, and verification must also cover rare events that require large warm-up periods, since some bugs only manifest after millions of cycles. Running millions of cycles in simulation translates into weeks, so NetSpeed decided to use the Cadence® Palladium XP acceleration/emulation platform as part of its multi-layered approach to exhaustively verify the Gemini NoC IP. The initial implementation and bring-up phase takes about one week, and this investment generates a quick return: NetSpeed could run in minutes on the emulation platform cases that take one to two weeks on a simulator.

The next challenge comes from the very high configurability of the cache-coherent NoC IP. All these parameters can vary:

  • Number of masters (from 1 to 64)
  • Number of slaves (up to 200 I/O coherent and non-coherent agents)
  • Topology
  • Performance and power requirements (PPA)
  • Quality of Service (QoS) levels

NetSpeed has developed NocWeaver as a systemic solution to address large state space verification. NocWeaver, integrated into NocStudio design flow, is a random NoC generator, able to generate 1000 configurations per night.
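To illustrate the idea of randomized configuration generation (not NocWeaver itself, whose internals are not described here), here is a minimal Python sketch that draws configurations within the ranges listed above. The class and function names, the topology names, the QoS values and the lower bound of one slave are all placeholders I have assumed; PPA targets are left out for brevity.

```python
import random
from dataclasses import dataclass

TOPOLOGIES = ("mesh", "ring", "torus")   # placeholder names, not NetSpeed's list
QOS_LEVELS = (1, 2, 4, 8)                # placeholder values

@dataclass(frozen=True)
class NocConfig:
    masters: int
    slaves: int
    topology: str
    qos_levels: int

def random_config(rng):
    """Draw one configuration within the ranges listed above."""
    return NocConfig(
        masters=rng.randint(1, 64),
        slaves=rng.randint(1, 200),
        topology=rng.choice(TOPOLOGIES),
        qos_levels=rng.choice(QOS_LEVELS),
    )

def nightly_batch(seed, count=1000):
    """Generate a reproducible batch: the same seed yields the same configs."""
    rng = random.Random(seed)
    return [random_config(rng) for _ in range(count)]

batch = nightly_batch(seed=20150915)
print(len(batch), batch[0])
```

Seeding the generator is the important design choice: any configuration that exposes a bug can be regenerated exactly from its seed and rerun.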

NetSpeed decided on a “depth and breadth” verification strategy that used simulation to quickly cover large numbers of configurations and emulation to validate complex time-dependent scenarios.

The next challenge is the need for intelligent, coordinated stimulus. The goal is to run realistic coherency testing that exercises false sharing, true sharing and trust zone, while coordinated coherency tests check for index thrashing, credit control and delay randomization. Thanks to self-checking stimulus, errors are detected in a timely manner.

To summarize, emulation with Palladium makes it possible to run hundreds of NoC configurations quickly and to detect errors easily. Because the NoC verification has been built to be deterministic, a bug detected in emulation can be reproduced in simulation, where debug is done using additional checkers.

The main challenge with highly configurable, flexible and scalable NoC IP is verification time. NetSpeed has used several techniques to minimize run time. Cache initialization sequences differ between simulation, where cache states are pre-loaded, and emulation, where initialization is done by the real hardware state machines. Likewise, reset sequences are done through a backdoor in simulation, whereas emulation runs the full reset sequences. Finally, exit checks, which are expensive to verify every time in simulation, run much faster in emulation.

Cache-coherent NoC IP verification raises multiple technical challenges (massive state space, large warm-up period, reproducibility, etc.), along with the business challenges of delivering the product on time and building customer confidence in a new and complex IP. Even compared with a highly configurable and complex IP such as one implementing the PCIe 4.0 protocol, NoC verification adds one more level of difficulty: every SoC is different, which keeps the product specification “open” in a way that a protocol specification is not.

NetSpeed has built a complete verification strategy based on “depth and breadth”, using emulation to verify hundreds of configurations very quickly and detect bugs, then reproducing those bugs in simulation for easier debug. That’s a high price paid by NetSpeed to release a trustworthy cache-coherent NoC IP in the field.

From Eric Esteve from IPNEST


All Models Are Wrong, Some Are Useful

by Paul McLellan on 09-15-2015 at 7:00 am

“All models are wrong, some are useful.” This remark is attributed to the statistician George Box who used it as the section heading in a paper published in 1976.

Just for fun I looked up a few semiconductor statistics from 1976. Total capital spending was $238M in Japan and $306M in the US and…that’s it, there was nobody else back then (at least according to SIA; I’m pretty sure something was going on in what became ST in Europe, for example). It looks like Intel’s process technology had been 10um in 1974 for the 4040 (3000 transistors!) and was 3um in 1978 when the 8086 was released. So probably in 1976 the state-of-the-art was 5um, or 5000nm in today’s terminology.

Anyway, George Box went on to clarify what he meant: Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an “ideal” gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules.
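Box’s gas example is easy to try numerically. The sketch below compares PV = RT with the van der Waals equation, which is itself only a second, slightly less wrong model. The choice of CO2, its van der Waals constants and the two molar volumes are my own illustrative assumptions, not something from Box’s paper.

```python
R = 8.314462618   # gas constant, J/(mol*K)

# van der Waals constants for CO2 (illustrative choice of gas)
A = 0.3640        # Pa*m^6/mol^2
B = 4.267e-5      # m^3/mol


def ideal_pressure(v_m: float, t: float) -> float:
    """Pressure in Pa from PV = RT, with V the molar volume in m^3/mol."""
    return R * t / v_m


def vdw_pressure(v_m: float, t: float) -> float:
    """Pressure in Pa from the van der Waals equation for one mole."""
    return R * t / (v_m - B) - A / v_m ** 2


def relative_gap(v_m: float, t: float) -> float:
    """How far the ideal-gas pressure is from the van der Waals pressure."""
    p_vdw = vdw_pressure(v_m, t)
    return abs(ideal_pressure(v_m, t) - p_vdw) / p_vdw


# Roughly atmospheric conditions versus a heavily compressed gas, both at 300 K.
for v_m in (0.024, 0.0005):
    print(f"V = {v_m} m^3/mol: gap = {relative_gap(v_m, 300.0):.1%}")
```

Near atmospheric pressure the two models agree to within a fraction of a percent, while for the compressed gas they diverge substantially. That is the sense in which the simple model is useful inside a bounded operating range.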

My favorite example of different kinds of models being useful for different things are these two:

The model on the left is the sort of plastic kit that I used to put together as a kid (you can still buy them apparently). If you wanted to, for example, measure the wingspan then this would be a good model. Trying to learn anything about aerodynamics, not so much. The model on the right is useless for pretty much anything other than learning about aerodynamics. It doesn’t even look like a real plane. But it flies.

I think the biggest model that we have in semiconductors is what I like to call the digital illusion. We have analog transistors and voltage levels, but we pretend they are digital gates which are 0 or 1 and have a delay that can be captured in a few parameters. When I started in EDA, we didn’t even model input and output slopes. We would characterize a gate by using SPICE, putting a step function on the input and seeing when the output reached some threshold; that was the delay. Then we went to adding slopes, so that we would measure the delay from when the input rose to 50% to when the output fell to 40% (or 60% if it was rising). It turned out some gates reached 40% on the output even before the input reached 50%, so they had negative delay. So we lifted up the corner of the rug, swept the dust underneath, and set that value to zero. Then we had to start to model interconnect resistance…
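The threshold-based measurement described above can be sketched in Python. This is a toy reconstruction under my own assumptions, not any real characterization tool: waveforms are lists of sampled points, crossings are found by linear interpolation, the gate is inverting, and the function names are made up for the illustration.

```python
def crossing_time(times, values, level, rising):
    """First time the sampled waveform crosses `level` in the given direction."""
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        crossed_up = rising and v0 < level <= v1
        crossed_down = (not rising) and v0 > level >= v1
        if crossed_up or crossed_down:
            # Linear interpolation between the two surrounding samples.
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def inverter_delay(times, vin, vout, vdd=1.0, clamp=True):
    """Delay from input rising through 50% to output falling through 40%."""
    t_in = crossing_time(times, vin, 0.5 * vdd, rising=True)
    t_out = crossing_time(times, vout, 0.4 * vdd, rising=False)
    delay = t_out - t_in
    # The "sweep it under the rug" fix: negative delays are set to zero.
    return max(0.0, delay) if clamp else delay


times = [0.0, 1.0, 2.0, 3.0, 4.0]
slow_input = [0.0, 0.25, 0.5, 0.75, 1.0]   # slow ramp, reaches 50% at t = 2
fast_output = [1.0, 0.3, 0.0, 0.0, 0.0]    # already below 40% before t = 1

print(inverter_delay(times, slow_input, fast_output, clamp=False))
print(inverter_delay(times, slow_input, fast_output))
```

With a slow input ramp and a fast gate, the unclamped measurement comes out negative, which is the situation the clamp papers over.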

But even today we still model gates as being digital with a fairly simple delay function. When we do signoff we get a bit more complex and start to model more of the analog effects to make sure our digital illusion hasn’t broken down. If we had to model a billion transistors as genuine analog devices using circuit simulation then we would never be able to design a microcontroller, never mind a smartphone application processor or a server microprocessor.

Of course we also set bounds on the environment where we expect the model to work. If you set the power supply voltage to 0.1V we don’t expect the digital illusion to tell us how the silicon would actually behave. But in the normal operating range, luckily for us, the digital illusion seems to hold and people who don’t even know how to run SPICE can confidently write thousands of lines of SystemVerilog knowing that these will accurately be transformed into real analog transistors about which they know next to nothing.

Now that’s a model that is useful.


Mongoose: The Making of Samsung’s Custom CPU Core

by Majeed Ahmad on 09-14-2015 at 4:00 pm

Samsung is seemingly ready to move to a new milestone in its brief but exciting system-on-chip (SoC) history: a custom CPU core codenamed Mongoose. It’s going to be based on the ARMv8 instruction set and is expected to outperform the Exynos 7420 application processor that Samsung unveiled this year. Some media reports suggest that Samsung has been working on its own CPU core since 2011.

Samsung’s current Exynos 7420 chipset, used in the Galaxy S6 and S6 Edge phones, has turned out to be a powerhouse processor. It’s the first mobile SoC built on a 14nm node and is reportedly 30 percent to 35 percent more efficient than most application processors on the market.


Exynos M1 is making a shift to a custom core called Mongoose

More details are emerging about Samsung’s next chipset called the Exynos M1. Below are some of the key highlights:

• Exynos M1 is going to be built on a 14nm FinFET manufacturing process.
• The new 64-bit chips will have clock speeds of up to 2.3 GHz.
• Exynos M1 might utilize a Heterogeneous System Architecture.
• The new mobile SoCs will feature Mali-T880 GPU core from ARM.

There are even leaked benchmark scores that show Exynos M1 way ahead of rival mobile SoCs such as Qualcomm’s Snapdragon 820, Huawei’s Kirin 950, LG’s Nuclun 2 and MediaTek’s Helio X30. Samsung’s new mobile chipset scored 2,136 points in GeekBench’s single-core test and 7,497 in the multi-core test, ahead of Kirin 950’s 1,909 and 6,096 points and Snapdragon 820’s 1,732 and 4,970 points, respectively.

Moreover, Exynos M1 scored 1,698 and 5,263 points in the power-saving mode and 1,323 and 3,489 points in the ultra power-saving mode, respectively. According to these preliminary GeekBench results, the Exynos M1’s single-core score of 2,136 is roughly 43 percent better than that of its predecessor, the Exynos 7420, which scored 1,495 points.
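The percentage comparisons can be recomputed directly from the scores quoted above. The short Python sketch below does only that arithmetic; the dictionary layout and the function name `percent_gain` are mine, and the scores themselves are the leaked, unconfirmed figures from the text.

```python
# (single-core, multi-core) GeekBench scores as quoted in the text.
EXYNOS_M1 = (2136, 7497)
RIVALS = {
    "Kirin 950": (1909, 6096),
    "Snapdragon 820": (1732, 4970),
}
EXYNOS_7420_SINGLE = 1495


def percent_gain(new: float, old: float) -> float:
    """How much higher `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


for name, (single, multi) in RIVALS.items():
    print(
        f"Exynos M1 vs {name}: "
        f"single-core +{percent_gain(EXYNOS_M1[0], single):.1f}%, "
        f"multi-core +{percent_gain(EXYNOS_M1[1], multi):.1f}%"
    )

gain_over_7420 = percent_gain(EXYNOS_M1[0], EXYNOS_7420_SINGLE)
print(f"Exynos M1 vs Exynos 7420, single-core: +{gain_over_7420:.1f}%")
```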


Will Samsung Galaxy S7 have the in-house Exynos M1 or Qualcomm’s Snapdragon 820?

That raises an interesting question: While Exynos 7420 has been a stellar mobile chipset, Exynos M1 looks way ahead of it. That premise doesn’t sit well with the media speculation that Samsung the smartphone maker might go for Qualcomm’s Snapdragon 820 chipset instead of its own mobile SoC to power its upcoming premium Galaxy S7 handset.

More details will be available about Samsung’s new mobile SoC in the coming months. The Exynos M1 application processor is expected to be released in early 2016.

Also read:

3 Key Frontiers for Samsung’s Next Mobile SoC

Why Qualcomm Lost Samsung and Will Get Them Back