Using Virtual Platforms to Make IP Decisions
by Paul McLellan on 04-27-2013 at 10:48 am

Most SoC designs these days consist largely, but not entirely, of purchased IP blocks. But there are lots of tradeoffs involved in selecting IP blocks, and since those tradeoffs change with process node, even decisions that seem “obvious” based on the last generation of the design may not be so clear cut. Even if you have already decided, due to existing software, to use (say) an ARM processor, there are a number of potential processors that could do the job and hit different performance/power points. Not to mention area and license fees.

Caches are a notoriously hard area to get right. Too much cache and you waste area and leakage power; too little and the performance is not what you expect. Not to mention the power, since cache misses are a lot more expensive from a power point of view than hits. Caches are very complex these days, with multiple masters, GPUs, snooping for coherency and so on. The caches also interact closely with decisions made about the interconnect (buses, NoCs, etc.) in non-obvious ways.
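
To make that cache tradeoff concrete, here is a toy first-order model (a sketch only, with made-up constants rather than silicon data): the miss rate is assumed to fall roughly with the square root of capacity, hit energy and leakage grow with capacity, and a miss costs far more energy and latency than a hit.

```python
# Toy first-order cache model: every constant below is an illustrative assumption,
# not silicon data. Miss rate follows the classic sqrt rule, hit energy and
# leakage grow with capacity, and a miss is far more expensive than a hit.
def cache_tradeoff(size_kb, accesses=1_000_000):
    miss_rate = min(1.0, 0.20 * (32.0 / size_kb) ** 0.5)   # ~20% at 32 KB, falling with size
    e_hit_pj = 5.0 + 0.02 * size_kb                        # hit energy grows with capacity
    e_miss_pj = 500.0                                      # off-chip access dominates a miss
    leak_uw = 2.0 * size_kb                                # leakage grows with area
    t_hit_ns, t_miss_ns = 1.0, 50.0

    dyn_energy_uj = accesses * ((1 - miss_rate) * e_hit_pj + miss_rate * e_miss_pj) * 1e-6
    avg_latency_ns = (1 - miss_rate) * t_hit_ns + miss_rate * t_miss_ns
    return miss_rate, dyn_energy_uj, leak_uw, avg_latency_ns

for kb in (64, 256, 1024, 4096):
    mr, e, leak, lat = cache_tradeoff(kb)
    print(f"{kb:5d} KB: miss {mr:5.1%}  dynamic {e:7.1f} uJ  leakage {leak:6.0f} uW  avg access {lat:4.1f} ns")
```

Even this crude model shows a sweet spot: total energy falls as the cache grows, then climbs again as hit energy and leakage take over, which is exactly why the “obvious” size from the last node may no longer be right.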

Another difficult area is software/hardware tradeoffs. In a prior version of a design, it might have been necessary to use a special handcrafted RTL block to achieve the performance necessary. But in a later process node this might be better implemented either in software on the main control processor or perhaps in software on a specialized offload processor.


So how do you make these decisions? Obviously it is too complicated to actually put all the RTL together for the entire chip just to decide whether that is really the RTL you need. Besides, RTL simulation is too slow to run a full load of software (Android, for example), and these days the purpose of many SoCs is simply to run that software as efficiently as possible, so it is not possible to do a meaningful analysis by looking closely at the hardware alone without running realistic software scenarios.

The answer is to use virtual platforms, which can quickly be configured to swap IP blocks in and out, vary the size of the cache, switch from an ARM Cortex-A15 to a Cortex-A9 and so on, all while running fast enough that you can boot the operating system, run apps, run standard benchmarks, run test software and generally perform analysis at whatever depth you want.

Then, when you have made all your decisions, you have a virtual platform ready to deliver to the software team so that they can start work in parallel with the SoC design. Since there are typically more software engineers than IC design engineers on a project these days, this is especially important. Without having a virtual platform, it is easy for software engineers to “pretend to program” since it is impossible to be effective without being able to run the code immediately.

Carbon CTO Bill Neifert’s blog on this subject is here. Andy Meier’s blog on CPU selection is here.


GSA European Executive Forum
by Paul McLellan on 04-27-2013 at 9:58 am

The first week of June is DAC in Austin of course. But over in Europe, on the Wednesday and Thursday of that week, June 5-6th, is the GSA European Executive Forum, bringing C-level executives together from all over Europe. It actually runs from 2pm on Wednesday until about 2pm on Thursday, including a VIP dinner on Wednesday evening sponsored by eSilicon. The overall theme is The Path to Global Growth: Optimism, Opportunity and the Role of Europe. The conference is held at the Sofitel Munich Bayerpost.

The first session is about wireless and the Internet of Things (IoT). It opens with a keynote on Reimagine Wireless: The Internet of Things Comes of Age, which will present a vision of the wireless landscape and the key factors spurring its revolution, such as LTE deployment and machine-to-machine communication.

This is followed by a panel moderated by Aart de Geus (who must be missing at least part of DAC to be there) ranging over the whole topic of wireless, IoT, Europe and so on. The panelists are:

  • Stan Boland, CEO, Neul
  • Matthias Bopp, CEO, Micronas
  • Graham Budd, COO, ARM
  • Maria Marced, President, TSMC Europe
  • Henri Seydoux, Founder, Chairman & CEO, Parrot

After that is a fireside chat in which Joep van Beurden, CEO of Cambridge Silicon Radio (although I believe they are officially just CSR these days) interviews Rick Clemmer, the CEO of NXP Semiconductors which is, of course, the spin-out from Philips of the old Philips Semiconductors (both companies always with an ‘s’ on the end, don’t forget).

There is a reception and then the aforementioned dinner.

Thursday morning starts with a keynote on Enhancing Automotive Safety and Efficiency by Mark Basten, Group Chief Engineer, Electrical & Electronic, Tata Motors European Technical Centre. Tata is of course based in India but in Europe is probably most famous for being the current owner of Jaguar and Land Rover.

That is followed by a panel session moderated by Ingo Schroeter, Partner and Managing Director, The Boston Consulting Group on Reengineered and Remodeled: The Connected Car. The panelists are:

  • Hans Adlkofer, VP Automotive System Group, Infineon Technologies
  • Fabio Marchio, Group VP, GM, Automotive Microcontroller and Infotainment Division, STMicroelectronics
  • Lars Reger, VP, Head of Strategy, New Business and R&D, Automotive Business Unit, NXP Semiconductors
  • Hanns Windele, VP, Europe and India, Mentor Graphics

The topic then switches from Automotive to Energy with a keynote from André-Jacques Auberton-Hervé, Chairman & Chief Executive Officer, Soitec on Leading the Sustainable Energy Future.

There is then a panel session, moderated by David Baillie, CEO, CamSemi on Smart Energy Management. The panelists are:

  • Kourosh Boutorabi, Head of Energy Management Group, Atmel
  • Sandro Cerato, VP, Applications and System, Member of the Board, Power Management & Multimarket Division, Infineon Technologies
  • Eugen Mayer, Managing Director, Power Plus Communications
  • Dr. Hans Stork, CTO & Senior Vice President, ON Semiconductor

The conference wraps up with lunch sponsored by GlobalFoundries.

Full details are here.


TSMC ♥ Solido
by Daniel Nenni on 04-27-2013 at 8:00 am

Process variation has been a top trending term since SemiWiki began, as a result of the articles, wikis, and white papers posted on the Solido landing page. Last year Solido and TSMC did a webinar together and an article in EE Times, and Solido released a book on the subject. Process variation is a challenge today at 28nm and it gets worse at 20nm and 16nm, so you had better be ready.

Solido and TSMC recently completed qualification of Solido Variation Designer for 20-nm memory and standard cell designs. Solido’s software provides accurate, scalable and verifiable 6-sigma design coverage on TSMC 20-nm designs in orders-of-magnitude fewer simulations than Monte Carlo analysis.


Memory bitcells and sense amps are the first design blocks to take advantage of each shrink in process technology. Transistors are now so small that atomic variances directly impact design variation. Monte Carlo, as the standard for statistical analysis, has not been able to scale to the demands of memory design. Alternate solutions are inaccurate, scale poorly and are difficult to verify.

Consider a 256 Mb SRAM design, which consists of 256M bitcells and 64k sense amps. For the SRAM to yield, the bitcell yield would need to be 6-sigma, and the sense amp yield would need to be 4.5-sigma. However, verifying to this sigma would need billions of Monte Carlo samples, which is far too slow.
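
As a back-of-the-envelope check on those sigma targets (an illustrative sketch only; the 90% block yield target below is an assumption), you can work backwards from the instance count to the per-instance failure probability and the corresponding sigma, and see how many plain Monte Carlo samples it would take just to observe a single failure:

```python
import math

def tail_prob(sigma):
    # One-sided tail probability of a standard normal beyond `sigma`
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

def required_sigma(num_instances, block_yield=0.9):
    """Sigma each instance must meet so the whole block yields `block_yield`.
    The 90% block yield target is an assumption for illustration."""
    p_fail = 1.0 - block_yield ** (1.0 / num_instances)
    lo, hi = 0.0, 10.0                       # bisect tail_prob, which is monotonic in sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_prob(mid) > p_fail:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), p_fail

for name, count in [("bitcell", 256_000_000), ("sense amp", 64_000)]:
    sigma, p = required_sigma(count)
    print(f"{name:9s}: ~{sigma:.1f} sigma  (per-instance fail prob {p:.1e}, "
          f"~{1.0 / p:.1e} plain Monte Carlo samples to see one failure)")
```

With per-instance failure probabilities down around 1e-10, brute-force Monte Carlo needs billions of samples per measurement, which is the gap a high-sigma approach is meant to close.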

Solido’s High-Sigma Monte Carlo (HSMC) was shown to overcome the key drawbacks of traditional Monte Carlo analysis, providing:

  • Significantly fewer simulations
  • SPICE and Monte Carlo accurate results in the regions of interest
  • Scalable support on all design blocks used in memory design
  • Verification, for high confidence in results

Solido’s System Monte Carlo adds yield analysis capability at the array level:

  • Providing fast 3-sigma analysis across the array
  • Leveraging probability density function (PDF) data from cell-level analysis
  • Reporting tradeoffs between performance and yield
  • Fast enough to enable exploration of different array configurations

Results of running Solido on TSMC 20-nm memory design:

  • Measured bitcell performance to 6.15 sigma

    • Analyzed 12.8 billion Monte Carlo samples in only 5355 simulations
  • Measured sense amp performance to +/- 4.5 sigma

    • Analyzed 3.2 million Monte Carlo samples in only 2727 simulations
  • Extracted probability density function (PDF) of bitcell and sense amp
  • Measured Monte Carlo based yield on a 64Mb array for 6 different read speeds in 1.5 hours
  • Improved memory specs by 11% to 52%

Retargeting standard cell libraries to new technologies is expensive. It takes lots of simulator licenses and design time, layout has become part of the design loop, and increasing variability makes it difficult to size cells optimally for yield and performance. High-sigma analysis is necessary for the latest process technologies, but it needs too many Monte Carlo samples to achieve accuracy, and extrapolation with fewer samples is unreliable and inaccurate.

Cell Optimizer adds automation for sizing standard cells, providing:

  • Full script-based operation
  • Design sizing across multiple corners and testbenches
  • Support for pre- and post-layout netlists
  • Simulator independence

On the initial TSMC 20-nm standard cell design, 3 out of 4 measurements failed to meet the specification. After sizing, all measurements met specification.

Sign up for a DAC demo here:

http://www.solidodesign.com/

Solido Design Automation Inc. is a leading provider of variation-aware custom integrated circuit design software. Solido Variation Designer and application packages are used by analog/RF, IO, memory and standard cell digital library designers to improve design performance, parametric yield and designer productivity. Solido has pioneered a proprietary and patent-pending set of algorithms forming the core of its technology.



FPGAs – The Possibilities are Endless – Almost
by Luke Miller on 04-26-2013 at 8:00 pm

Has your wife ever said “Your name, I’m not a computer”? Well maybe mine has. I know what you are thinking… this guy is married? Yup, I overachieved too. We have child #7 on the way, Lord willing, so you probably guessed I don’t follow much of the world’s planning and such. Like you, no one in my house really understands what I do, nor cares much. Hey, they are like middle management; get ready for your yearly review where acronyms are king. In my last review I was told I was not visible enough, so I guess I need to eat more. OK, I need to stop. While humans have what would seem like infinite possibilities, and women are multithreaded and operate in a non-binary way, I look to the possibilities of the finite, non-personal FPGA for some amazement. My bumper sticker says “I brake for HLS” (I stole that from my Xilinx buddy).

Every time I test a new bit image on a new device and the FPGA passes the smoke test (the done light is on and the math is working), I think, wow, I can’t believe this is working. Now, it is not because I’m that bad of a designer; I hide that well. I’m just in awe of all the things that have to work just to make my little algorithm crank away. Don’t get me wrong, it is not like watching a child being born, or even a seed popping through the soil in my garden, but the sheer magnitude of all the collective efforts around the world to get an FPGA on a board that works is simply amazing. From the fab lines, to node characterization, IO design, hard IP, 3rd-party tools to aid the layout, DRCs, parasitic modeling, place and route 10 times, I could keep going. The configuration scheme alone, I’m sure, is years’ worth of work. Inside that wonder square of gates are billions of transistors, and you know what? They work! And not only that, they work for a long time; they are reliable. Did I forget to mention all the high-speed, 20+ layer board design, the micro switching power supplies? I would have to say the demo board that I program easily must have had 1000s of paws helping out so I could make a design a reality. Now I know I missed a whole bunch in there so don’t get nervous, I know you helped too, and if you’re dead wood you at least faked it; by the way, you are fooling no one, can you say sequester.

The largest Virtex-7 has a configuration bit stream of 293,892,224 bits. That’s a lot. Many, many possibilities. Now don’t get technical on me; let’s just say it’s the full 293,892,224 bits, and that could be 2^293,892,224 different designs. I wonder how long it would take to find my bit stream match for a beamformer design. That is a neat thought. Too bad it would take a billion years to find out, but the idea is you design for expects, not the function. I have always thought of the FPGA as a player piano: the bit stream is the music roll and we make the music. Now that we have bounded the FPGA’s possibilities and we see they are finite, but huge, does anyone know the maximum possibilities for a CPU? It is not infinite, can’t be, assuming a fixed clock speed. That question brings up two more thoughts, which are that there is no such thing as random and no such thing as infinite. Yes, in theory they exist, like helpful people at a help desk, but you cannot find them in practice. Roll some dice; they obeyed physics. Sum up all the matter in the universe, divide by the Planck mass, and that’s all the smallest parts possible, not infinite. Mind boggling isn’t it? OK, go program an FPGA.
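
Just for scale, a couple of lines of Python (an aside of my own, not anything Xilinx publishes) will tell you how many decimal digits 2^293,892,224 has, without ever constructing the number:

```python
import math

config_bits = 293_892_224
# Number of decimal digits of 2^n is floor(n * log10(2)) + 1
decimal_digits = math.floor(config_bits * math.log10(2)) + 1
print(f"2^{config_bits:,} has {decimal_digits:,} decimal digits")
```

That works out to roughly 88 million digits; for comparison, the commonly quoted estimate of about 10^80 atoms in the observable universe is a number with a mere 81 digits.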



Mentor CEO Wally Rhines U2U Keynote
by Daniel Nenni on 04-26-2013 at 2:00 pm

You will never meet a more approachable CEO in the semiconductor ecosystem than Dr. Walden C. Rhines. The first time I met Wally was way back when I blogged for food and he invited me over for lunch. Even better, a year or two later I was having dinner with a friend at the DBL Tree in San Jose. Wally was waiting for his flight home so he joined us for a glass of wine and an impromptu industry discussion.

At the Mentor U2U conference today Wally did a replay of the presentation he did at the GlobalPress Electronics Summit which Paul McLellan blogged about HERE. Since I’m more of a foundry person let me comment on a different part of his presentation.

Wally pointed out that when I started in this business almost 30 years ago semiconductor companies had their own fabs and could more accurately measure designs based on performance, power, area, AND manufacturability. With the emergence of independent semiconductor foundries this all changed. Manufacturing cost (yield) was a wedge between design and manufacturing. Fabless semiconductor companies emerged and pounded on the foundry doors begging for capacity for products that would have the best PPA (performance, power, area). The foundries wanted products that were manufacturable with high yield (low cost). It all came down to the choice of design rules: Should the design rule manuals (DRMs) be more accommodating to design with aggressive rules? Or should they be guard banded to allow for manufacturing variability?

First the foundries offered manufacturing-centric DRMs with minimum design rules that had to be followed. As foundry competition emerged, fabless companies had more choices and demanded more design-oriented DRMs for better PPA. At 1.3µm (from what I remember) the foundries compromised and introduced the concept of recommended design rules. The minimum rules were more design oriented (tight spacing) while the recommended (optional) rules were manufacturing oriented (larger spacing). Naturally the fabless designers did NOT use the recommended rules since they were not PPA focused, especially the ones purchasing good die versus wafers.


This arrangement broke at 40nm, which resulted in an extended yield ramp and painful market delays. At 28nm recommended rules were done away with and more restrictive design rules were implemented. As a result, 28nm ramped in record time and will be the most successful process node we will ever see (my opinion). As the slides from my EDPS keynote show, the more restrictive DRMs and DRC decks are, the larger and more complex they become.

This transition will continue to require better EDA tools for designers and fabs to manage this ongoing collaboration and the resulting complexity. One example Wally used was the Calibre PERC product, which we recently blogged about HERE. This transition will also require closer collaboration between the fabless companies, EDA companies, IP companies, and the foundries. CEOs like Wally Rhines and conferences like U2U, DAC, ARM TechCon, and TSMC OIP are critical to our survival, so I ask all executives in the fabless semiconductor ecosystem to please allocate budget and send your best people out to make sure we all thrive in the coming process nodes.



When installing a sink, it’s a lot faster to buy a saw
by Don Dingee on 04-25-2013 at 8:10 pm

Mentor’s announcement from Design West this week pretty much signals the end of standalone ESL tools, in favor of more useful stuff. They have pulled the pieces of their Sourcery CodeBench environment along with their embedded Linux offering and their Vista virtual prototyping platform into a native embedded software development environment.

Continue reading “When installing a sink, it’s a lot faster to buy a saw”


Morris Chang on Altera and Intel
by Daniel Nenni on 04-25-2013 at 7:00 pm


If you want to know why I have written so much about TSMC in the past five years here it is: TSMC executives are approachable, personable, answer questions straight on, and have yet to lead me astray. If you want an example of this read the Chairman’s comments on the TSMC Q1 2013 earnings call transcript.

“On 16-nanometer FinFET, we have said several times that this is a change in cadence in our new technology introduction. It used to be 2 years per node and in the case of 16-nanometers FinFET, it follows just 1 year, by 1 year, the 20 SoC. So it is a quickening of cadence and that is because of market request, market requirements, customers’ requests.”

Call it Taiwanese culture, or maybe it is that TSMC executives are highly technical people (experts in their fields); as a result, the flow of information is excellent for people who know what questions to ask. I’m not talking about press releases that professional PR people do for them in PR speak. I’m talking about unscripted Q&A sessions like the ones in the conference calls.

“The second point I want to make is that we have been collaborating with our customers and ecosystem partners for more than 15 years. Through the ecosystem OIP, TSMC’s technology has been collaboratively optimized for SoC development.”

My favorite Morris Chang story is when I saw him at the Royal Hotel in Hsinchu last year. I came in at the same time he did and he beat me up the three flights of stairs to the lobby. Not kidding. This man has me beat by 30 years and 3 steps. I’m training on a Stairmaster now so I will be ready for him next time!

“CapEx will be between $9.5 billion and $10 billion this year. This is an increase from the last guidance we gave, which was about $9 billion. Basically, we have stepped up the preparation for the ramp-up of 20-nanometer and 16-nanometer. We have pulled some of the capital in because we want to be — to have as high yields as possible when we do start ramp-up, volume ramp-up. And of course, we are continuing to build up 28-nanometer capacity. Therefore, approximately 90% of the capital expenditures are for 28-nanometer, 20-nanometer, 16-nanometer, both building facility and equipment. Another 5% is for R&D and that’s mainly for 10-nanometer, 7-nanometer, et cetera.”

The best part of the call was in the Q&A with a question about Altera moving to Intel. Generally speaking the analyst questions are pretty dull but every once in a while they come up with a good one.

“I very much regret Altera’s decision to work on the 14-nanometer with Intel even though the financial impact is relatively small and Altera remains a major and valued partner of TSMC’s. We have gained many customers in the last few years but I really hate to lose even a part of an old one. We want them all, really. I regret it and because of this, we have thoroughly critiqued ourselves. If there was a thing like an investigative commission on what happened, we had it. And there were, in fact, many reasons why it happened and we have taken them to heart. And it’s a lesson to us and I don’t think that we — at least, we’ll try our very best not to let similar kinds of things happen again.”

In my opinion there was nothing TSMC could have done. Altera left TSMC because of Xilinx. Xilinx is a fierce competitor on all fronts: financial, marketing, sales, technology, ecosystem, etc… so there is no way Altera can outrun Xilinx on a level playing field. TSMC is open to all customers and does not do exclusive partnerships so Intel was a smart choice for Altera.

The question is: Can Intel be a good foundry partner for Altera? My guess is yes they can, as long as the new Intel CEO is on board with it and Altera does not need ARM (ARM and Intel do NOT mix). Not great news for Intel’s other FPGA partners though (Achronix and Tabula). They must really be steaming over the “exclusive” Altera deal!



Best Practice for RTL Power Design for Mobile
by Paul McLellan on 04-25-2013 at 11:54 am

Mobile devices are taking over the world. If you want lots of graphs and data then look at Mary Meeker’s presentation that I blogged about earlier this week. The graph on the right is just one datapoint, showing that mobile access to the internet is probably up to about 15% now from a standing start 5 years ago.

Of course, one obvious thing about mobile devices is that they run on batteries. Although there is slow, steady improvement in battery technology, nobody is predicting any imminent breakthrough, so if we want our batteries to last then we have to do that by getting the power consumed by the SoCs in the phone/tablet down, or at the very least not letting it increase. For very high volume devices there are some big discontinuous changes on the horizon, such as 20nm FinFET or FD-SOI, both of which have considerably lower power than the previous generation of 28nm planar (which doesn’t have great leakage characteristics).

So when do you decide to do power analysis? One answer is all the time. But realistically, above the RTL level the design is just a block diagram without any detail for blocks that don’t yet exist (IP blocks may be well-characterized). Below the RTL level, at gates or even lower, there is very accurate data available (especially post-layout) but it is too late to make anything other than minimal changes. So like Goldilocks’ porridge, RTL is not too hot and not too cold; it is just right.

Any power analysis, except the most coarse, requires vectors that stimulate the design in a “typical” way so that you can measure the power. Since you also want to get a good power network designed early, you need to find corner-case vectors with the big swings in current that might drain the decaps, cause noise spikes or cause unacceptable voltage droop. So out of perhaps millions of available vectors, only a tiny percentage are needed to get good analysis done.

Design is not a static process. So once a strategy for keeping power under control has been agreed, regressions are necessary to make sure that, as the design progresses, no surprises occur that suddenly increase the power. It is always easier to fix a bad change to a design just after it has been made rather than when you are about to tape out.


So the basic flow is to start by making design tradeoffs. Next, power vectors need to be profiled in the various operating modes that the design might be in (playing mp3, transmitting, receiving, watching video…). The power can then be checked against the budgets and any hotspots identified. These can then be prioritized, deciding which anomalies are likely to have the biggest “bang for the buck” when fixed. Using automated tools, perhaps along with some manual and even embedded software work, the power can then be further optimized. And finally, regressions are created to make sure that the hard-won reductions don’t suddenly get lost.
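
As an illustration of what such a power regression could look like in practice (a sketch only; the block names, budgets and numbers are invented, not taken from any real flow or tool), each run’s per-block RTL power estimate can be compared against its budget and against the last known-good run:

```python
# Illustrative power regression check: all block names and mW figures are made up.
BUDGET_MW   = {"cpu": 350.0, "gpu": 500.0, "modem": 200.0, "ddr_ctrl": 150.0}
BASELINE_MW = {"cpu": 310.0, "gpu": 470.0, "modem": 180.0, "ddr_ctrl": 120.0}
CURRENT_MW  = {"cpu": 325.0, "gpu": 515.0, "modem": 178.0, "ddr_ctrl": 131.0}

def check_power(current, baseline, budget, growth_tol=0.05):
    """Flag blocks that exceed their budget or grew past the last good run."""
    failures = []
    for block, mw in current.items():
        if mw > budget[block]:
            failures.append(f"{block}: {mw:.0f} mW exceeds budget {budget[block]:.0f} mW")
        elif mw > baseline[block] * (1.0 + growth_tol):
            failures.append(f"{block}: {mw:.0f} mW is more than {growth_tol:.0%} above "
                            f"last good run ({baseline[block]:.0f} mW)")
    return failures

for msg in check_power(CURRENT_MW, BASELINE_MW, BUDGET_MW):
    print("POWER REGRESSION:", msg)
```

Anything flagged here gets investigated while the offending change is still fresh, which is far cheaper than discovering the overshoot just before tapeout.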

Apache (you know they are a subsidiary of ANSYS, don’t you!) has a webcast on best practice for RTL power. It is presented by Preeti Gupta, who is director of RTL product management. Here is the link for the webcast. It is 30 minutes long.


Bring high end camera image quality to smartphone
by Eric Esteve on 04-25-2013 at 9:13 am

We have to go back to 2008 to understand why Super Resolution is so badly needed by smartphone users, who expect to take high quality pictures with their smartphone, at least as good as with their camera. It was in 2008 that worldwide smartphone shipments surpassed standalone compact camera shipments… and we don’t expect this trend to reverse!

When you buy a smartphone, you probably don’t buy it first for the camera function, but you are happy to learn that you benefit from a 41M pixel sensor in your phone, and you think, like I do, that such a pixel count should provide top quality images. In fact, the CMOS sensor (the chip itself) in a smartphone is much smaller than in a camera, so as the pixel count grows, each pixel becomes very small and more sensitive to noise and low light. Two other effects, based purely on geometry, further degrade the image quality of a smartphone compared with a camera, whatever the pixel count of your sensor, as we will see with these two pictures:

The first effect is due to the distance between the sensor and the lens: as we can see, there is at least one order of magnitude difference, if not more, when comparing the smartphone with the high end camera.

The second difference is simply linked with the lens size itself. It’s only geometry, but the laws of optics are totally based on geometry. Even if it looks cruel, it’s a matter of fact and, since Euclid issued the first rules and theorems, we can’t change these laws… But what we can do is digitize the signal and then process it using DSP algorithms; that’s the solution proposed by CEVA, called “Super Resolution”.

The principle looks obvious, like any great idea: instead of trying to take one high quality image with your smartphone (which is almost impossible due to the geometry), you take four images with a low resolution sensor (say 5 MPixel) and you process them to finally generate a high resolution image. Since you can enhance the resolution by 2X per axis, you can generate up to a 20 MPixel quality image, starting with your 5 MPixel sensor.

The process starts by enhancing image quality by:

  • Extracting image details
  • Reducing noise in the Luma & Chroma channels
  • Accurate sharpening
  • Ghost Blur Removal

Then you run the algorithm stages:

  • Coarse registration
  • Fine registration
  • Image fusion, including ghost removal

Super Resolution has been described in technical papers for a while, but it is based on an iterative process requiring high bandwidth, up to 10,000 operations per pixel. CEVA has greatly reduced this complexity, down to less than 100 operations per pixel on a PC and finally, using the CEVA-MM3101 DSP core, to as little as 16 cycles/pixel. For example, in a 28nm process, the CEVA-MM3101 processor is able to take four 5MPixel images and fuse them into a single high-resolution 20MPixel image in a fraction of a second, while consuming less than 30mW.
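
To show the core idea of multi-frame fusion (and only the idea; this is a toy sketch, not CEVA’s algorithm, and it skips the registration, sharpening and ghost-removal steps listed above), here is a minimal example where four low-resolution frames taken at known half-pixel offsets are interleaved onto a 2X-per-axis grid:

```python
import numpy as np

def fuse_frames(frames, offsets, scale=2):
    """frames: list of HxW arrays; offsets: their (dy, dx) shifts on the high-res grid."""
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    weight = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, offsets):
        acc[dy::scale, dx::scale] += frame      # drop each frame onto its sub-grid
        weight[dy::scale, dx::scale] += 1.0
    weight[weight == 0] = 1.0                   # avoid divide-by-zero where nothing landed
    return acc / weight

# Synthetic test: sample a known high-resolution scene at four half-pixel offsets.
rng = np.random.default_rng(0)
scene = rng.random((16, 16))                    # stand-in for the true 2x-resolution image
offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [scene[dy::2, dx::2] for dy, dx in offsets]

recovered = fuse_frames(frames, offsets)
print("max reconstruction error:", float(np.abs(recovered - scene).max()))  # 0.0 here
```

With perfect half-pixel offsets the high-resolution scene is recovered exactly; real handheld captures have arbitrary sub-pixel motion, which is why accurate registration and ghost removal dominate the complexity of a real implementation.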

Jeff Bier, founder of the Embedded Vision Alliance (www.Embedded-Vision.com), commented: “Smartphones are the most commonly used devices for capturing still images and video, but the slim form factor of these devices places severe limitations on the quality of captured images. CEVA’s Super Resolution algorithm, coupled with the CEVA-MM3101 imaging and vision processor, is an excellent example of how clever computer vision algorithms can be combined with optimized processor architectures to overcome physical limitations of imaging systems.”

Comparing image quality on a PC is not an easy task, but we can understand, from the above picture, that CEVA SR looks better than a PC application, and far better than Bi-cubic. And we agree with Eran Briman, vice president of marketing at CEVA, commenting: “Our new Super Resolution algorithm for the CEVA-MM3101 platform marks the first time that this technology is available in software for embedded applications. It is a testament to both the expertise of our highly skilled software engineers and to the low power capabilities of our CEVA-MM3101 platform, which comprises the hardware platform together with optimized algorithms, software components, kernel libraries, software multimedia framework and a complete development environment. We continue to lead the industry in the embedded imaging and vision domain and the addition of this latest high performance software component to our platform furthers illustrates the strength of our IP portfolio for advanced multimedia applications.”

The CEVA-MM3101 offers SoC designers an unrivalled IP platform for integrating advanced imaging and vision capabilities into any device. Coupled with CEVA’s internally developed computational photography and imaging expert algorithms such as dynamic range correction (DRC), color enhancement, digital image stabilizer and now super resolution, CEVA’s customers are equipped with a full development platform for image enhancement and computer vision applications for any end market, including mobile, home and automotive.

For more information, visit www.ceva-dsp.com/CEVA-MM3101.html.

Eric Esteve from IPNEST



10 to 100X faster HDL Simulation Speeds
by Daniel Payne on 04-24-2013 at 10:44 am

Speed, capacity, accuracy – these are the three major EDA tool metrics that we pay attention to and that enable us to design and verify an SoC. Talk to any design or verification engineer and ask if they are satisfied with the time that it takes to simulate their latest design, or to verify that it meets spec and is functionally correct. The answer that you hear is, “No, I’m not satisfied, simulation of my RTL takes way too long.”

The EDA industry has responded to this challenge with several verification approaches:

  • HDL simulators – powerful debugging capabilities, good signal visibility, moderate cost, too slow
  • Emulators – faster speeds than HDL simulation, pricey, lack of signal visibility
  • FPGA Prototyping – faster speeds than HDL simulation, moderate cost, unconnected to the HDL simulator

In 2011 the engineers at Aldec came up with an approach that combines an HDL simulator with an FPGA-based prototyping board, dubbed the HES XCELL. So a design or verification engineer can now use a familiar HDL simulator with debugging features, connected to an FPGA prototyping board to get a 10X to 100X speed up over just using an HDL simulator.

With this accelerated simulation approach the engineer continues to use a familiar HDL simulator to control the simulation and see results in the waveform viewer, while the actual design is simulated on the FPGA hardware to provide the speed-up. You still determine what simulates in the HDL simulator versus the prototype board, so a design engineer can place new blocks in the HDL simulator and reused blocks in hardware. As your design work is completed, you would place only your testbench in the HDL simulator, while the Design Under Test (DUT) is placed in the hardware:

8051 Example
The popular 8051 core and testbench were simulated in Aldec’s HDL simulator, called Riviera-PRO:

The 8051 core was using 97.08% of the CPU time, while the testbench was using only 1.29%. If we place the 8051 core in hardware instead of running it in the HDL simulator, a significant speed-up can be had. Here’s a chart of the speed-up factor that you can expect with this accelerated simulation approach:
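
A quick Amdahl’s-law estimate (a rough sketch that ignores communication overhead between the simulator and the board, with the hardware speed-up factors chosen purely for illustration) shows why the portion left in the HDL simulator bounds the overall gain:

```python
def accelerated_speedup(serial_fraction, hw_speedup):
    """Amdahl's law: overall speedup when everything except `serial_fraction`
    of the runtime moves to hardware that is `hw_speedup` times faster."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / hw_speedup)

testbench_share = 0.0129   # the ~1.29% of runtime that stays in the HDL simulator
for hw in (10, 100, 1000):
    print(f"DUT in hardware {hw:4d}x faster -> overall ~{accelerated_speedup(testbench_share, hw):5.1f}x")
```

Even with effectively unlimited hardware speed the overall gain saturates around 1/0.0129, roughly 78x, which is consistent with the 10X to 100X range quoted for this approach.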

Getting Your Design Into Hardware
There are five steps to get your RTL design into hardware for the accelerated simulation:

  1. Import your compiled HDL using the Design Verification Manager
  2. Configure your design for debugging
  3. Use logic synthesis on the blocks that will be accelerated into hardware
  4. Partition your design between HDL-based simulation and accelerated simulation
  5. Place and route your design for use in the FPGA

Aldec uses an FPGA prototyping board with Xilinx Virtex-5 parts; you can also use boards from the DINI Group or Synopsys HAPS.

Summary
The approach of accelerated simulation has been shown to speed up HDL simulation results by a factor of 10X to 100X, allowing you to complete your verification more quickly, while providing full debugging as in a traditional software-based HDL simulator.

Further Reading

White Paper: Simulation Acceleration with HES XCELL
