
nVidia: Virtual Platform/Emulation Hybrid

by Paul McLellan on 11-05-2013 at 11:57 am

I was the VP marketing at VaST Systems Technology and then at Virtutech. Both companies sold virtual platform technology which consisted of two parts:

  • an extremely fast processor emulation technology that actually worked by doing a binary translation of the target binary code (e.g. an ARM) into the native instruction set of the server on which the emulation was running (usually an x86 workstation of some sort). This was done dynamically as the code was executed in the same way as JIT compilers for Java (and other bytecodes) work
  • a modeling technology for other devices in the system (or on the chip if it was an SoC design) along with libraries of pre-existing models
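
To make the first bullet concrete, here is a minimal sketch of dynamic binary translation with a translation cache. It is a toy, not how VaST or Virtutech implemented it: the four-instruction "target" set (`addi`, `dec`, `jnz`, `halt`), the register names and the function names are all invented for illustration, and the "host code" is simply generated Python source. The idea it shows is the same, though: translate each basic block once, the first time it is reached, then reuse the translated version on every later pass.

```python
# Toy dynamic binary translator. The target instruction set is hypothetical.

def translate_block(program, pc):
    """Translate the basic block starting at pc into a host (Python) function."""
    lines = ["def block(regs):"]
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "addi":
            lines.append(f"    regs['acc'] += {arg}")
        elif op == "dec":
            lines.append("    regs['cnt'] -= 1")
        elif op == "jnz":
            # A branch ends the block; return the next target pc.
            lines.append(f"    return {arg} if regs['cnt'] != 0 else {pc}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown op {op!r}")
    namespace = {}
    exec("\n".join(lines), namespace)  # "compile" the generated host code
    return namespace["block"]


def run(program, regs):
    """Execute program, translating each block only on first use."""
    cache = {}          # target pc -> translated host function
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = translate_block(program, pc)
            cache[pc] = block
            translations += 1
        pc = block(regs)
    return translations


PROGRAM = [
    ("addi", 5),     # 0: acc += 5
    ("dec", None),   # 1: cnt -= 1
    ("jnz", 0),      # 2: loop back to 0 while cnt != 0
    ("halt", None),  # 3: stop
]
```

Running `run(PROGRAM, {"acc": 0, "cnt": 1000})` executes the loop body a thousand times but translates it only once, which is where the speed comes from on hot code.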

Both companies were moderately successful at selling their technology, VaST especially in automotive and Virtutech especially in networking and base-stations. VaST ended up being acquired by Synopsys and Virtutech by the Wind River division of Intel.

But the big hurdle to using the technology was the need to create models for all the “other” devices. Everyone loved the processor technology and found its performance unbelievable. But if it took 3 months of the 6 months you were going to save to develop those models, the ROI was a lot less compelling. Plus blocks were always changing, so there was always the problem of ensuring that the models matched the RTL (or whatever representation was being used). One answer would have been to use emulation and just run the RTL fast. But in the mid-2000s emulators were million-dollar boxes used at only a handful of the biggest semiconductor companies, certainly not available to embedded software developers. The economics have since changed. By some estimates, emulation is now the cheapest way to do simulation per cycle, beating even Verilog simulators on big server farms, which have been the simulation infrastructure of choice for a long time.


At ARM TechCon last week nVidia were reporting on how they had used Cadence’s hybrid virtual platform and emulation system to bring up their latest Tegra chips.

You might think that with a big emulator lying around you would just load the RTL for the ARM processor into the box too. But the problem is that real software development requires that you first boot an operating system before you get to run the code you are really working on. Linux takes 1B instructions to boot, Android takes 20B and Windows RT 50B. Booting Windows would take days on the emulator. Instead the fast processor models are used along with the rest of the chip loaded in the emulator. Obviously there is some clever glue making all this work cleanly together as the Palladium/VSP Hybrid.


Results? Linux boots in 2 minutes versus 45 minutes using just the emulator. Android in 45 minutes versus hours. Windows RT in 90 minutes versus days. Nobody knows how many days since nobody bothered to try.
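
As a back-of-the-envelope check on those numbers, the sketch below turns the quoted instruction counts and boot times into effective instructions per second. The one assumption is loud: it takes the emulator-only rate measured on the Linux boot (1B instructions in 45 minutes) and applies it unchanged to Android and Windows RT, which is only a rough approximation. The function and variable names are mine.

```python
# Instruction counts and hybrid boot times as quoted in the article.
BOOT = {
    "Linux": (1e9, 2),         # (instructions to boot, hybrid boot minutes)
    "Android": (20e9, 45),
    "Windows RT": (50e9, 90),
}
EMULATOR_LINUX_MINUTES = 45    # emulator-only Linux boot time from the article


def rate(instructions, minutes):
    """Effective throughput in instructions per second."""
    return instructions / (minutes * 60)


# Assumption: the emulator runs every OS at the rate seen on the Linux boot.
emulator_rate = rate(BOOT["Linux"][0], EMULATOR_LINUX_MINUTES)


def emulator_hours(instructions):
    """Estimated emulator-only boot time in hours under that assumption."""
    return instructions / emulator_rate / 3600
```

Under that assumption the emulator alone manages roughly 370,000 instructions per second, so Android would need about 15 hours and Windows RT about 37.5 hours, which is consistent with the "hours" and "days" above. For Linux the hybrid is 22.5 times faster.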

Ultimately, though, what is important is whether all this made any difference to the design. Here is nVidia’s experience:

  • eliminated reliance on other pre-silicon platforms
  • found some software race conditions
  • found some memory management bugs
  • found some code completeness issues

Then silicon came back from the fab. Of course system bringup requires running the software on the actual silicon:

  • this was the smoothest bringup they had done
  • software was ready to demo product at Speed-Of-Light (SOL)
  • fewer bugs meant they could put more effort into optimizing for power and performance.

The nVidia presentation is on the Cadence website here.




Synopsys Creates a High-performance ARC Core

by Paul McLellan on 11-05-2013 at 10:00 am

ARC is a family of configurable processors. Originally it was a standalone company in the UK (what is it with the UK and processor cores?) spun out from Argonaut Software. The A in ARC originally stood for Argonaut. ARC International was acquired by Virage, and then Virage was acquired by Synopsys, so now it is part of Synopsys’ DesignWare IP offering.

The family of cores has always been focused on low power and on configurability, but there has not been a high-performance core in the offering. Today Synopsys announced a new generation of high-speed processors, following a sneak preview at the Linley Microprocessor Conference a couple of weeks ago:

  • advanced ARCv2 architecture
  • 18% improvement in code density
  • Real-time and high-end embedded focus
  • >3000 DMIPS per core at under 60mW, 0.15mm²
  • power efficient 10-stage scalar pipeline
  • out of order execution
  • branch prediction
  • late-stage ALU to improve throughput
  • 64-bit loads and stores to move data
  • 64-bit multiply and multiply-accumulate
  • hardware integer divider (4 to 19 cycles)
  • IEEE 754 compliant FPU with single/double precision
  • ECC protection for all memories in processor
  • I/O coherency for DMA and peripherals

From an implementation point of view, the new core makes less severe demands on the type of memories required on the chip. Only single-port SRAMs are required, even for the branch prediction cache. Two pipeline stages are dedicated to accessing CCMs and caches.


Compared to the older ARC cores such as the ARC700 there is a lot more throughput per megahertz and 50% higher CoreMark per megahertz. The maximum operating frequency is up from 1.1GHz to 1.6GHz. Also importantly, it can run and function at a clock rate as low as 2MHz, so offers lots of scope for dynamic frequency scaling for power saving.

The core is optimized for the high-end embedded market such as automotive driver assist, solid state drives, digital TV, home networking and so on. Many of these have strong real-time requirements. To meet them, the core offers:

  • single-cycle peripheral and memory access
  • fast context switch with a second register file
  • configurable to hit sweet-spot for performance vs power
  • custom instructions (as all ARC processors always have)
  • robust interrupt architecture with up to 240 interrupts, 16 levels of priority, auto save and restore
  • optional ECC hardware on all memories (to correct single event upsets etc)

Of course there is a full development tool chain. ARChitect optimizes and configures the processor. There is a range of compilers. ARC works with virtual prototypes. Out of the box it has support for Linux and Android.

More details on ARC processors are on Synopsys’ website here.


GlobalFoundries and ARM

by Paul McLellan on 11-04-2013 at 4:56 pm

GlobalFoundries had several interesting things at the ARM TechCon last week. Firstly, GlobalFoundries won the best in show award in the chip design category recognizing the best-in-class technologies introduced since the last TechCon.

Earlier in the summer GlobalFoundries and ARM announced the ARM Cortex-A12 processor, for which GlobalFoundries was the foundry launch partner. The A12 is expected to be a very high volume processor since it is targeted at the low end of the smartphone market (which cannot afford a Cortex-A57 class processor). The low end of the smartphone market is expected to be the fastest growing going forward, the high end already being largely saturated.

Donar is the name GlobalFoundries use for the family of A12 test chips. They are based on Semper, which is an older family of A9 test chips taped out several times in 28nm and 20nm. They were created using a Cadence flow. In a joint presentation, GF and Cadence gave details on what was done. The design was a quad-core A12 and will tape out imminently in GF’s 28nm-SLP process. It was the first experience with the A12 for the entire project team of ARM, Cadence and GF.


The work was split up as follows. ARM developed the Cortex-A12, optimized POP components and the initial reference methodology. Cadence supplied the RTL to GDS2 implementation flow, methodology and tool support. GlobalFoundries supplied the Donar test chip design, 28nm-SLP MPW, full Cadence design flow enablement, development of a set of fast cache instances and design resources. The whole design was done on a very tight schedule between May and October.

Another interesting design that GF were showing in the exhibit hall is a 2.5D interposer-based design that was jointly created along with OpenSilicon. The design is called Avatar and consists of two ARM-based die in 28nm on a 65nm silicon interposer. This was a pipe-cleaner design to shake out problems with this sort of design, rather than anything that is expected to enter volume production.

Since there were no acceptable I/Os for this sort of design, OpenSilicon developed specialized die-to-die I/Os (which, of course, are now available for other designs). The problem with “normal” I/Os in this application is threefold: the ESD requirements for full-chip I/O are much higher than needed, the drive strength is too high since it is designed for a PCB trace, and a die-to-die I/O needs to be small enough to fit under the microbump pitch.


The interposer has 4 front-side and 1 back-side layers of metal and through-silicon-vias (TSVs). The two die are assembled on the interposer. A lot of additional testing needs to be done on the die at wafer sort compared to a normal assembly because of what is called the “known good die” issue. If a faulty die slips through then not only does that bad die get discarded, an interposer and a second good die are also wasted.
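
A rough way to see why known good die matters is to price the escapes. The sketch below is a simplified model under stated assumptions, not anything presented by GF or OpenSilicon: every die that passes wafer sort has the same independent probability `escape_rate` of actually being bad, the interposer and the assembly step themselves never fail, and one bad die scraps the whole assembly. All the names and any cost figures you plug in are hypothetical.

```python
def cost_per_good_assembly(die_cost, interposer_cost, escape_rate, dies=2):
    """Expected spend per working assembly under the simplified model above.

    escape_rate: probability that a die which passed wafer sort is faulty.
    """
    # Every assembly attempt consumes all the dies plus one interposer.
    assembly_cost = dies * die_cost + interposer_cost
    # The assembly works only if every die on it is good.
    good_fraction = (1 - escape_rate) ** dies
    return assembly_cost / good_fraction
```

With two dies at 10 units each and an interposer at 5 units, a 5% escape rate raises the cost per working assembly from 25 to about 27.7 units, and the penalty compounds as more die are added to the interposer. That is the economic argument for the extra testing at wafer sort.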

2.5D interposer-based designs allow different technologies to be mixed in the same design (although that was not done with Avatar): SoC with very wide memory, SoC with analog and high-speed interfaces, or even SoC and FPGA. As we move below 20nm it is hard to put analog on the same die, so using a mature process that is optimized for analog design and then putting two or more die on an interposer is an attractive solution. Watch a video about GlobalFoundries 28nm here.



GSA Award Nominees Announced

by Paul McLellan on 11-04-2013 at 4:32 pm

Today GSA announced the award nominees for the 2013 awards. They will be presented at the GSA Award Dinner on Thursday December 12th at the Santa Clara Convention Center. The keynote will be given by Steve Forbes.

Recently it was announced that the 2013 Dr. Morris Chang Exemplary Leadership Award winners are CEO and Chairman, Dr. Sehat Sutardja and President and Co-founder, Ms. Weili Dai of Marvell Technology Group Ltd. (Marvell).

The evening’s program will recognize leading semiconductor companies that have exhibited market growth through technological innovation and exceptional business management strategies. The award categories and nominees (in alphabetical order) are as follows:

Start-Up to Watch Award

  • GEO Semiconductor, Inc. (GEO)
  • Quantenna Communications, Inc.
  • Tabula, Inc.

Most Respected Private Semiconductor Company Award

  • Aquantia Corporation
  • Cortina Systems
  • SiTime Corporation

Most Respected Emerging Public Semiconductor Company Award (Achieving $100 to $250 Million in Annual Sales):

  • Ambarella, Inc.
  • Cavium, Inc.
  • InvenSense, Inc.

Most Respected Public Semiconductor Company Award (Achieving $251 Million to $1 Billion in Annual Sales)

  • Dialog Semiconductor
  • Microsemi Corporation
  • Silicon Labs

Most Respected Public Semiconductor Company Award (Achieving Greater than $1 Billion in Annual Sales)

  • MediaTek Inc.
  • QUALCOMM Incorporated
  • Xilinx, Inc.

Best Financially Managed Semiconductor Company Award (Achieving Up to $500 Million in Annual Sales):

  • Audience, Inc.
  • InvenSense, Inc.
  • RDA Microelectronics

Best Financially Managed Semiconductor Company Award (Achieving Greater than $500 Million in Annual Sales)

  • Maxim Integrated Products, Inc.
  • Semtech
  • Xilinx, Inc.

Analyst Favorite Semiconductor Company Award Nominees (chosen by analyst Joseph Moore of Morgan Stanley)

  • Ambarella, Inc.
  • Avago Technologies
  • Cavium, Inc.

Analyst Favorite Semiconductor Company Award Nominees (chosen by analyst Quinn Bolton of Needham & Company, LLC)

  • Ambarella, Inc.
  • Inphi Corporation
  • MaxLinear Inc.

Outstanding Asia Pacific Semiconductor Company Award

  • MediaTek Inc.
  • Samsung Electronics, Co., Ltd.
  • Spreadtrum Communications Inc.

Outstanding EMEA Semiconductor Company Award

  • CSR plc
  • Dialog Semiconductor
  • NXP Semiconductors

You can make reservations to attend the Awards Dinner here.




Addressing Power at Architectural and RTL Levels

by Paul McLellan on 11-03-2013 at 4:30 pm

Major power reductions are possible by reducing power at the RTL and system levels, and not just at the gate and physical level. In fact, as is so often the case in design, changes can have much more impact when done at the higher level, even given that at that point in the design there is less accurate feedback about changes. Later the impact of a change is known much more accurately but the difference any change can make is smaller. In fact 80% of chip power is determined by the RTL level and above, and the maximum difference that can be made by clever synthesis and clock tree gating is 10-20%.


Power is, of course, a huge issue in SoC design. Not just for mobile and other battery powered devices, but also for tethered devices like servers and routers (a lot of the cost of a datacenter is cooling) or home DVRs and televisions (where fans are not acceptable). And all chips have potential thermal issues if the power is too high, from reliability to package cost.

There are many changes that can be made at the RTL level or above. Here are some of the most important ones:

  • System architecture level

    • SW-HW partitioning
    • OS/firmware-level APIs for standby/sleep modes
    • Single core vs Multi cores
    • Bus and memory architecture
    • Communication vs computation tradeoffs
  • Micro-architecture level

    • Frequency and voltage scaling
    • Memory/register file banking
    • Auto-inferencing of appropriate FIFOs and other communication channels
  • RTL level

    • Combinational clock gating
    • Sequential clock gating
    • Power gating
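
To illustrate why frequency and voltage scaling and clock gating pay off, here is a small sketch using the standard first-order model of CMOS switching power, P = α·C·V²·f. The activity factor, capacitance, voltages and frequencies below are hypothetical round numbers chosen for the arithmetic, not figures from Calypto or from any real chip, and leakage power is ignored.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order switching power in watts: alpha * C * V^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Hypothetical block: 1 nF switched capacitance, 20% activity, 1.0 V, 1 GHz.
baseline = dynamic_power(0.2, 1e-9, 1.0, 1e9)

# Voltage and frequency scaling: half the clock rate at 0.8 V.
dvfs = dynamic_power(0.2, 1e-9, 0.8, 0.5e9)

# Clock gating modelled as halving the effective activity factor.
gated = dynamic_power(0.1, 1e-9, 1.0, 1e9)
```

In this example the scaled operating point uses 32% of the baseline switching power (0.8² × 0.5), because voltage enters as a square, while halving the activity through gating halves the power at full speed.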

Below the RTL level the main optimizations are multiple voltage domains, multiple threshold libraries (high performance on the critical path, low power otherwise) and clock network optimization.


On Tuesday November 19th Calypto is hosting a webinar. Abhishek Ranjan, who is a senior director of engineering, will present how to use Calypto’s HLS product Catapult, along with the PowerPro RTL-level power optimization tool, to reduce power at the architectural and RTL levels. The webinar is titled Techniques for Reducing Power at Various Levels and will last about an hour. It starts at 11am Pacific Time. He will discuss dynamic voltage and frequency scaling (DVFS), power-gating, bus-data encoding, low power arithmetic architectures, memory-banking, sequential clock/memory gating and other micro-architectural techniques.

Details and registration are here. November 19th at 11am Pacific.

And a reminder about a Calypto webinar next week, How to Maximize the Verification Benefit of High Level Synthesis with SystemC, at 11am on Tuesday November 5th. Details and registration here. And for anyone from outside the US (or in Arizona!), we come off daylight saving time a couple of days before, so make sure to log in at the correct time.




Fabless: The Transformation of the Semiconductor Industry

by Daniel Nenni on 11-03-2013 at 4:00 pm


As I have mentioned before, Paul McLellan and I are writing a book on the history of the fabless semiconductor industry. There is a preview available HERE. It will initially be sold as an e-book on SemiWiki and put into print early next year. Working with Paul McLellan and Beth Martin on this was an amazing experience. The research, the writing, the “constructive” criticism of everyone who participated: it was time consuming and exhausting at times but absolutely well worth the effort. We truly wrote this book for the greater good of the fabless semiconductor industry.

Like the fabless ecosystem itself, writing this book was a collaboration of monumental proportions with contributed chapters from the leading companies that made all of the cool mobile electronics stuff we have today possible. The book starts with the invention of the transistor and chronicles the evolution of the fabless semiconductor ecosystem up to where we are now. The final chapter is forward looking so we need your help (crowdsourcing):

WHAT’S NEXT FOR THE SEMICONDUCTOR INDUSTRY?

We’ve talked a lot about the history of the semiconductor industry, from its nascent beginning with the invention of the transistor and integrated circuit, through the changing business models and technological innovations that shaped the world of electronics we have today. But where are we heading?

Currently, smart phones and tablets powered by highly-integrated SoCs are the largest market driver for semiconductor technology. Even so, over the past 5 years the semiconductor industry has seen relatively flat revenue growth. The following passages are from industry luminaries sharing their vision of what will take the semiconductor industry to the next level of innovation and financial success.

This is your chance to be part of a best-selling book chronicling the transformation of the semiconductor industry. Be an industry luminary, send me a maximum of 300 words and be part of history! If we include your passage in the book you will get not just fame but also good fortune (a free copy of the book). Sound reasonable? I need them by November 29th and you know where to find me.

Now available on Amazon.com


Webinar: IP Lifecycle Management: What is it, what problems does it solve?

by Daniel Nenni on 11-03-2013 at 11:00 am

SoCs are now dominated by IP blocks sourced either from 3rd parties or internal design teams. This means that IP is now critical to the success of the SoC, yet it is the part of the design that teams have the least control over, or visibility into. Most design teams utilize at best ad-hoc methods to manage this IP, and the few that utilize some form of formal process tend to limit it to the management of the underlying IP data.

IP Lifecycle Management follows IP from creation through qualification and distribution into final SoC integration. As the IP passes through each stage it is tracked and managed to give a very high-level of visibility into the design and the IP status. Advanced analytics integrated throughout enable potential problems to be identified early and resolved quickly.

Formalizing and codifying this IP management process significantly reduces the risk of bad IP impacting the final design, eliminates unnecessary rework to significantly reduce design and verification resource requirements, and improves internal design reuse.

In this webinar IP Lifecycle Management will be defined, and each aspect of the lifecycle will be introduced together with the problems it solves and how it benefits design teams. The webinar will utilize practical examples running on the ProjectIC platform to demonstrate the benefits of IP Lifecycle Management. In addition to the examples there will be the opportunity for Q&A with the presenter.

ProjectIC is an IP Lifecycle Management platform that is methodology agnostic and can be easily integrated into any design flow. It is built on top of Methodics’ industry-leading and proven IP Data Management Platform to deliver the capabilities needed to manage IP-driven SoC designs.

The webinar will take place on Tuesday 5th of November at 1PM Pacific Standard Time – to register for the webinar please visit http://www.methodics.com/11052013-webinar.



SEMICO Impact 2013 Next Wednesday

by Paul McLellan on 11-01-2013 at 5:54 pm

Semico’s IMPACT 2013 IP event is next Wednesday November 6th at the DoubleTree Hilton in San Jose.

Here’s what you get if you attend. Keynotes from:

  • Kurt Shuler of Arteris. Give him some hard questions about Qualcomm who have just acquired their technology and engineering team
  • Chris Rowen of Tensilica, recently acquired by Cadence
  • Steve Teig of Tabula, building FPGAs with Intel 22nm as their foundry
  • John Koeter of Synopsys
  • Robert Krohn of Cisco

A panel on IP Ecosystem Solutions for Complex Systems moderated by Mahesh Tirupattur of AnalogBits with panelists Jason Polychoronopoulos of Mentor, Warren Savage of IPextreme, Chris Rowen of Tensilica/Cadence and Suk Lee of TSMC.

A panel on Designing for New World Applications moderated by Kent Shimasaki of Infinitedge with panelists Ron Moore of ARM, Grant Pierce of Sonics, Steve Singer of Inside Secure and John O’Neill of Skyworks.

A technical track hosted by Constellations (IPextreme) including talks from (surprise) IPextreme, Recore Systems, Ridgetop Group, Certus Semiconductor and Atrenta.

And not only is there such a thing as a free lunch, there is a free breakfast and a networking reception afterwards, including a chance to win an iPad mini and a NEST thermostat.

The full agenda is here. The registration page is here ($75 registration closes Monday at 5pm; a few registrations will be accepted at the door for double, $150).




Using OTP Memories for High-performance Video

by Paul McLellan on 11-01-2013 at 4:15 pm

One of the most demanding uses of semiconductors is digital video, in everything from tablet computers to home entertainment. The iPad with a Retina display already exceeds high-definition (HD) resolution at 2048×1536, and all indications are that video is racing towards what is known as 4K resolution, also known as ultra high definition: 3840×2160 pixels, which is four times the pixels of 1080p HD (1920×1080) and so roughly four times as demanding.
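
The pixel arithmetic is easy to check. The sketch below counts pixels for the three resolutions and, as a rough indication of the processing load, computes the uncompressed data rate. The 60 frames per second and 24 bits per pixel are assumptions of mine for illustration, not figures from Sigma Designs or Sidense, and real systems move compressed video most of the time.

```python
RESOLUTIONS = {
    "1080p HD": (1920, 1080),
    "iPad Retina": (2048, 1536),
    "4K UHD": (3840, 2160),
}


def pixels(name):
    """Pixels per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def raw_gbit_per_s(name, fps=60, bits_per_pixel=24):
    """Uncompressed video data rate in Gbit/s (fps and depth are assumed)."""
    return pixels(name) * fps * bits_per_pixel / 1e9
```

4K comes to 8,294,400 pixels per frame, exactly four times 1080p and about 2.6 times the Retina iPad. At the assumed frame rate and depth that is just under 12 Gbit/s of raw pixel data, against about 3 Gbit/s for 1080p.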

One of the leaders in digital TV processing (and other home control and connectivity applications) is Sigma Designs. Coincidentally they were also our lead beta customer when I was CEO at Envis. Doing high performance video is hard enough, but doing it within a tight power budget is a real challenge. Our power-reduction tool Chill wasn’t compelling enough for them to adopt it but they just announced their selection of OTP (one-time-programmable) memory supplier and it is Sidense.

Sigma have signed a multi-year license to use Sidense SHF OTP macros. These are used in advanced processes from 40nm down to 16nm FinFET. Sigma will start by using SHF in a 40nm implementation, which has already been qualified in G and low-power/low-leakage variants, for set-top-box (STB) and digital TV applications. In some sense this is a continuation of an existing relationship: Sigma have been a customer of Sidense since 2008 and have products in production using older technologies.

The factors that make Sidense attractive for these applications are:

  • small area (so low cost)
  • no mask or process changes to standard digital process (so low cost)
  • high security: there is no visible difference between a 0 and 1 bit cell, even etching the die down, and no charge is held on the bitcell
  • advanced node coverage (20nm now, 16nm in qualification)
  • high performance at low power (both active and standby)


An SHF module consists of an OTP core (the bitcell array), a charge pump hard macro for in-field programming (it generates the non-standard voltages required), a device access port (DAP) providing access through a 16/32-bit parallel bus, and SBPI, which provides serial and byte-wide interfaces with SPI-compatible protocols. Read access times are as low as 20ns depending on configuration and process, and the write speed (at 28/20nm) is 1us/bit.


At the recent TSMC OIP meeting, Sidense revealed the advanced process roadmap for design, characterization and qualification (above). FinFET structure aligns well with Sidense OTP implementation (which is antifuse and basically depends on forming a tiny crystal in the gate-oxide, technically known as dielectric breakdown induced epitaxy). For all nodes down to 40nm, IP9000 qualification is complete. It is in progress for 28nm and 20nm. 16nm is at the test chip stage.

There is more information about Sidense SHF memories here.


Is FD-SOI Smarter than Moore?

by Eric Esteve on 11-01-2013 at 12:03 pm

If you have read the excellent article from Paul McLellan, you know about FDSOI as a technology, so I will not come back to the FDSOI device or to the comparison with FinFET in terms of device topology, doping level and so on. If you missed it, I recommend reading that article, as well as the many comments (all of them relevant). It’s good to know that Semiwiki readers are so smart! Let’s have a look at the features that make FDSOI a smart choice, smarter than bulk at the same technology node:

  • First, FDSOI is cheaper than Bulk, as you need fewer mask levels to process FDSOI devices. Some people still think it’s more expensive, as they have in mind the 10% extra cost of the SOI wafer. But, when this wafer has been completely processed, the final cost is lower.

  • FDSOI is faster than Bulk. If you take some time to decipher the above picture, you will see that, for the same power budget, an ARM processor will reach slightly above 1.2 GHz on 28LP, 1.4 GHz on 28G and almost 1.6 GHz on 28FDSOI technology.

  • FDSOI is cooler than Bulk. If you want your processor to consume as little power as possible but still exhibit good performance, reaching for example 1.5 GHz, you will compare 28FDSOI @ 0.9V with 28G @ 0.85V (28LP is already “out”) and see that there is almost one order of magnitude difference in terms of leakage power.

So FDSOI is clearly an attractive technology, especially for wireless or multimedia application processors (AP), as it allows drastically reducing the power budget (by almost an order of magnitude for leakage power) or increasing the processor core frequency. In fact, using FDSOI lets you design one technology node back (28nm instead of 20nm) and still benefit from lower mask cost and process complexity.

As I told you before, Semiwiki readers are pretty smart, and I have extracted two comments:
“Silicon-proven IP is prerequisite to the success of any technology. So, ST needs close collaboration with fabless companies, or maybe even opening up some of their designs or at least their experiment in designing with FDSOI to other parties.”

“One of the main pre-requisite for success of FD-SOI or FinFET will be availability of IPs. Most of the cases the selection of foundry, even process nodes depend on the availability of silicon proven IP.”

Because STMicroelectronics is a chip maker, they know how important it is to have the right IP portfolio available for SoC design on FDSOI technology. They have managed IP migration to support their own SoC design, and propose the following approach, extracted from the White Paper “Planar fully depleted silicon technology to design competitive SOC at 28nm and beyond”:
At SOC level, migrating an existing design from bulk to planar FD represents an effort comparable to half-node migration, for example from 45nm to 40nm. In other words, it brings very worthwhile benefits at reasonable efforts. A typical approach could be:

  • CPU and GPU: the main objective is maximum peak performance and the design is re-worked, making the most of FBB;
  • Other SOC blocks: the main objective is power savings, by reaching the target operating frequencies at lower Vdd; there is no change to block design, Timing Analysis is re-run and ECO (Engineering Change Order) is performed to fix violations if needed.
  • Other IP such as IOs and PHY blocks are swapped for their planar FD counterpart.

As far as I am concerned, I think that the availability of the right IP on FD-SOI will be very important for the adoption of this technology. STM seems to be in line with this position, as Giorgio Cesana, Director of Marketing and Communication, STM, will present at IPSOC Grenoble, on November 6th, a paper titled: “FD-SOI Technology for Efficient SoC: IP Development examples”. I definitely plan to attend, and I will give you feedback about it!

From Eric Esteve from IPNEST
