
Using HSPICE StatEye to Tackle DDR4 Rail Jitter

by Tom Simon on 02-15-2017 at 12:00 pm

The world is a risky place, according to Scott Wedge, Principal R&D Engineer at Synopsys, who presented at the Synopsys HSPICE SIG on Feb 2nd in Santa Clara. Indeed, the world circuit designers face can be uncertain. Dealing with risk and departure from ideal was a main theme in the fascinating talks at this dinner event, which is held every year in conjunction with DesignCon. Scott focused on how it is possible to overcome uncertainty through predictability.

Scott talked about new features in HSPICE that can help designers do their job better and work to manage that risk. Sources of uncertainty can come from variability and mismatch. Then over time MOS aging can contribute to reliability issues down the road. With smaller Vdd, circuits are more vulnerable to noise and jitter. The list goes on. Shortly I’ll get back to one of his items – noise effects that can lead to higher bit error rates in high speed data paths.

There were two customer presentations as well. One was from Xilinx, on how Synopsys worked with them to create new modeling methods to enable advanced node designs. The other was from ARM, who discussed large scale Monte Carlo simulation methods.

John Ellis from the Signal and Power Integrity (SPI) Group at Synopsys gave an impressive presentation titled “Simulating Power Supply Induced Jitter Effects in Low BER DDR4 Interfaces Using StatEye”. While this is quite a mouthful, the title makes very clear what the subject matter of the talk was. With DDR4 and LPDDR4 there is now an explicit BER requirement. Brute force simulation of 10^16 bits leaves two choices: an eternity of SPICE runs, or the use of a Linear Time Invariant (LTI) environment in a statistical simulator. We are caught between the need for transistor level accuracy and the overwhelming time requirements of traditional SPICE.
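
To put the brute-force option in perspective, here is a rough back-of-the-envelope sketch. It uses the standard zero-error confidence formula for BER testing. The 1e-16 BER target, the 95% confidence level and the 3.2 Gb/s pin rate are illustrative assumptions, not figures from the presentation.

```python
import math

def bits_for_confidence(ber_target, confidence=0.95):
    """Error-free bits needed to claim BER < ber_target at the given confidence.

    Standard zero-error result: N = -ln(1 - CL) / BER.
    """
    return -math.log(1.0 - confidence) / ber_target

def days_at_rate(bits, bits_per_second):
    """Wall-clock days needed to push `bits` through a link at the given rate."""
    return bits / bits_per_second / 86400.0

# Assumed values for illustration only.
n_bits = bits_for_confidence(1e-16, confidence=0.95)
print(f"bits required: {n_bits:.2e}")
print(f"days on real hardware at 3.2 Gb/s: {days_at_rate(n_bits, 3.2e9):.0f}")
```

Even real silicon running at full speed would need months to observe that many bits, and a transistor-level SPICE run covers only a tiny fraction of that bit count in the same wall-clock time, which is why a statistical approach is attractive.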

John’s approach is to leverage HSPICE StatEye functionality to capture simultaneous switching output (SSO) effects – including rail jitter. StatEye at its most basic can rapidly capture the pulse response from an LTI system. This approach works well for modeling the datapath for LPDDR4. However, there is more at work here than the data path by itself. Fortunately, it is possible to use HSPICE StatEye to model nonlinear effects as well.
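
For readers unfamiliar with how a statistical eye is built from an LTI pulse response, the following is a minimal sketch of the general textbook technique, not the HSPICE StatEye implementation. Each inter-symbol interference (ISI) cursor of the pulse response adds or subtracts its value with equal probability, so the ISI voltage distribution is the convolution of two-point distributions. The cursor values, the 1 mV grid and the function names are hypothetical.

```python
from collections import defaultdict

def isi_pdf(cursors, step=0.001):
    """Return {isi_voltage: probability} for random, equally likely bits.

    Each ISI cursor h contributes +h or -h with probability 0.5, so the
    overall distribution is built by repeated convolution on a voltage grid.
    """
    pdf = {0: 1.0}  # keys are integer grid indices (voltage / step)
    for h in cursors:
        shift = round(h / step)
        nxt = defaultdict(float)
        for v, p in pdf.items():
            nxt[v + shift] += 0.5 * p
            nxt[v - shift] += 0.5 * p
        pdf = dict(nxt)
    return {v * step: p for v, p in sorted(pdf.items())}

def prob_below(pdf, main_cursor, threshold):
    """Probability that a transmitted '1' (main cursor plus ISI) is below threshold."""
    return sum(p for v, p in pdf.items() if main_cursor + v < threshold)

# Hypothetical pulse-response samples at one sampling phase, in volts.
pdf = isi_pdf([0.05, -0.02, 0.01])
print(prob_below(pdf, main_cursor=0.4, threshold=0.33))
```

Because the distribution is computed rather than sampled, very low probabilities are reachable without simulating a correspondingly long bit stream. The catch is that this construction assumes superposition holds, which is exactly what breaks down once the supply rail and IO become nonlinear.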

The question he posed is: Can we capture SSO performance adequately? His approach is to include the IO model in the StatEye simulation. By adding it to the channel, it is possible to include it in the pulse response – which will be affected by the IO voltage. Drawing the IO current from a non-ideal supply path with rail noise creates a nonlinear response with the output distorted by SSO noise.

SPICE simulations show differing edge response times for rise and fall when the model is nonlinear. Standard StatEye misses these effects. Synopsys StatEye can model nonlinearities with “Multi-Edge” or “Full Transient” mode. Each has its advantages and limitations.

Multiple Edge response requires one simulation for each edge. Each response can be saved for future use to shorten run times. Clearly with the addition of each edge response the HSPICE StatEye better matches the SPICE results.

Full Transient response gives a probability density function (PDF) based on an arbitrary bit stream. However, it cannot be saved and must be rerun for each case. This will serve as a good starting point to see how well we can model the effect of rail jitter on the data path. If this works, Multiple Edge Response is a possible route to increased flexibility and speed.

John’s presentation included details about the simulation model set up for DDR4. Then he showed how rail noise of around 13% will affect the reference SPICE runs. He compared the “Full Transient” mode of HSPICE to SPICE in predicting rail noise and saw good correlation. If this did not work, the subsequent analysis based on this nonlinearity would not be useful.

Next he compared HSPICE StatEye in Full-Transient mode for the full datapath including the rail noise to his SPICE results and saw good agreement in the horizontal and vertical openings. VREF was shifted somewhat. Next he worked to include more realistic triggering by factoring in DQS, including jitter. This does require two StatEye runs, but the results are impressive. The first run is used to generate the jitter function and the second one applies this to the full channel. The results are definitely better.

The next steps were to apply the Multiple Edge mode to the same case. Rather than attempt to cover the details here, I would suggest viewing the entire presentation, along with the others from the meeting.

The end results show that applying HSPICE StatEye with its full capabilities to power and signal integrity problems like this can yield significant insight into system performance. This, in turn, can help reduce risk in high speed designs by factoring in real world effects during the design and verification stages of product development.


Xilinx vs Altera Update 2017

by Daniel Nenni on 02-15-2017 at 7:00 am

I truly miss the Xilinx versus Altera war of words (competition at its finest) and competition is what makes the fabless semiconductor ecosystem truly great, absolutely. So with great disappointment I read the Intel Analyst Day transcript published by Bloomberg last week. It is attached at the bottom in case you are interested but to me it is a 41 page snoozefest.

Here is the detailed (sarcasm) update on the Programmable Solutions Group (Altera) from Intel CEO Brian Krzanich (BK). Please note that PSG was not even on the agenda with the rest of the Intel groups:

What I’m really proud of for this group is two things for 2016. One, 14 nanometer, first 14 nanometer FPGA shipped to customers, which is on Intel technology…

Let’s not forget that Intel and Altera signed a foundry agreement exactly four years ago for the 14nm product that they are just now shipping today. This agreement led to the image below showing that the Intel 14nm process is by far superior to the TSMC 16nm process. Xilinx has been in production at 16nm for more than a year now which brings me to my first question for BK:

Can you please update this slide now that both chips are in production?

and secondly, in 2016, they hit their growth targets growing faster than market, meaning we believe we gained share in 2016. First year out of an acquisition, we feel part of an acquisition we feel like those are great results and something to be proud of for Intel and the Altera team.


The FPGA market has an expected CAGR of 8.4% from 2016 to 2020 and according to the recent Xilinx investor call:


  • Sales from our 28-nanometer Zynq product family increased by nearly 20%.
  • 20-nanometer revenue again reached a record level, significantly exceeding our $50 million target.
  • 16-nanometer sales grew significantly in the December quarter to a new record, exceeding our forecast.
  • 16-nanometer shipping 12 unique products to over 300 active customers.

    What specific market share did you gain in 2016?

    And we’re already starting to look at 10-nanometer products for FPGAs as well. So, we believe we can continue to win share and grow faster than market in FPGAs.

    Xilinx is skipping TSMC 10nm in favor of an accelerated 7nm schedule which will go into production in 2018. As history has shown, the first FPGA to a process node wins majority market share. Xilinx beat Altera to 28nm by a quarter. Xilinx beat Altera to 20nm by about a year, and 16nm beat 14nm by more than a year.

    Xilinx Collaborates with TSMC on 7nm for Fourth Consecutive Generation of All Programmable Technology Leadership and Multi-node Scaling Advantage. Four generations of advanced process technology and 3D ICs, fourth generation of FinFETs

    SAN JOSE, Calif., May 28, 2015 /PRNewswire/ – Xilinx, Inc. (NASDAQ: XLNX) announced that it has collaborated with TSMC on the 7nm process and 3D IC technology for its next generation of All Programmable FPGAs, MPSoCs, and 3D ICs. The technology represents the fourth consecutive generation where the two companies have worked together on advanced process and CoWoS 3D stacking technology, and will become TSMC’s fourth generation of FinFET technology. The collaboration will provide Xilinx a multi-node scaling advantage and build on its outstanding product, execution, and market success at 28nm, 20nm, and 16nm nodes.

    From what it looks like today, Xilinx 7nm will beat Altera 10nm again by more than a year. In fact, Xilinx may be at TSMC 5nm by the time Altera has ramped up 10nm.

    Bottom line: The problem with semiconductor marketing writing checks that engineering can’t cash is that at some point in time those checks will bounce. Intel’s former bravado and current lack of transparency in the FPGA market have fallen flat and that is nothing to be proud of.


    Making Functional Simulation Faster with a Parallel Approach

    by Daniel Payne on 02-14-2017 at 12:00 pm

    I’ll never forget working at Intel on a team designing a graphics chip. We wanted to simulate the design to ensure proper functionality before tapeout, but because of the long run times we compromised and sped things up by reducing the size of the display window to just 32×32 pixels. Well, when first silicon arrived, sure enough, the only display that worked was 32×32 pixels, so we had to do a re-spin to correct the logic bugs. In the 1980’s it was quite popular to use interpreted logic simulators, which made it easy to interactively debug hundreds to thousands of gates.

    In the 1990’s I was working at an EDA company that acquired the simulator company that had just written the fastest, compiled-code Verilog simulator. Wow, what a dramatic improvement over the older, interpreted logic simulators.

    Today, we have SoCs with billions of gates, so this extreme size has really pushed the EDA vendors to come out with something new that can handle that capacity with run times that take hours to days, instead of weeks. The new approach to deal with these present day challenges is a 3rd generation, parallel simulation engine that scales. Here’s a chart showing the three generations of functional simulators:

    I spoke by phone with Adam Sherer of Cadence Design Systems recently to get his insight about functional simulation since the 1980’s. It turns out that back in early 2016 Cadence acquired the start-up company Rocketick, which had a parallel simulator called RocketSim. Yes, most of the EDA companies had been trying to develop their own parallel simulators, but the earliest results were not promising enough to become viable products because of poor scaling and manual compile processes. The real accomplishment of RocketSim was to provide a parallel simulator that could:

    • Handle multiple cores
    • Accept multiple clocking domains
    • Work with complex interconnect fabrics
    • Simulate hundreds of IP cores
    • Scale to billions of components
    • Support RTL, gate-level functional simulation and gate-level DFT

    Related blog – EDA Mergers and Acquisitions Wiki

    The secret sauce behind RocketSim is the ability to identify dependencies among independent threads of execution, while minimizing the memory footprint required. You can expect the following typical speed-ups when using this parallel simulation approach:

    • 3X for Verilog / SystemVerilog RTL
    • 5X for gate-level functional simulation
    • 10X for gate-level DFT

    With fine-grain multi-processing technology, you can run RocketSim on multi-core servers using up to 64 cores, and it knows how to separate your code into portions that can be accelerated, and portions that cannot be accelerated. As a user of this simulator you don’t need to change your testbench, design or even the assertions. Now that’s convenient.
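
The split between portions that can and cannot be accelerated is what bounds the achievable speed-up. A quick sketch using Amdahl's law, a general model and not something Cadence presented, shows how the typical 3X to 10X figures could relate to a 64-core server. The parallel fractions computed here are purely illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: overall speed-up when only part of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

def fraction_for_speedup(speedup, cores):
    """Invert Amdahl's law: parallel fraction implied by an observed speed-up."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

# What parallel fraction would each quoted speed-up imply on 64 cores?
for label, s in [("RTL", 3), ("gate-level functional", 5), ("gate-level DFT", 10)]:
    p = fraction_for_speedup(s, 64)
    print(f"{label}: {s}X implies roughly {p:.0%} of run time accelerated")
```

Under this simple model, even a 10X speed-up leaves a noticeable share of run time in code that does not parallelize, which is consistent with gate-level DFT (highly regular, little testbench activity) benefiting most.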

    Forum – CDNS reports increases in Q1 2016 results, $448M revenue, $0.17/share earnings

    The largest SoC teams have long used hardware-based engines like Palladium to get even faster runtimes, although that approach can become pricey compared to software simulators. One difference between a software simulator like RocketSim and a hardware engine like Palladium is that RocketSim handles four-state logic, which includes the Z and X states, while the hardware engine supports only two-state logic.
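
To illustrate why the extra two states matter, here is a small sketch of a four-state AND gate following standard Verilog semantics: a 0 on either input dominates, and any remaining unknown (X) or high-impedance (Z) input yields X. The function name and string encoding are hypothetical.

```python
def and4(a, b):
    """Four-state AND over the values '0', '1', 'x', 'z' (Verilog-style)."""
    if a == '0' or b == '0':
        return '0'   # a known 0 dominates, whatever the other input is
    if a == '1' and b == '1':
        return '1'
    return 'x'       # any remaining x or z input makes the result unknown

# An uninitialized register ('x') gated with an enable signal:
print(and4('x', '1'))  # the unknown propagates and can be flagged
print(and4('x', '0'))  # the unknown is masked by the 0
```

A two-state engine has no X value to propagate, so an uninitialized register simply looks like a valid 0 or 1 and the problem can go unnoticed.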

    Related blog – Improving Methodology the NVIDIA Way

    I was impressed to learn that the RocketSim team, based in Israel, has actually grown in size since being acquired by Cadence, always a positive sign that the team is being treated well and that the marketplace is growing for a parallel simulator.

    Summary
    Functional simulation has come a long way since the 1980’s, so we are living in exciting times as the promise of parallel simulation is being adopted to keep simulation run times reasonable instead of having to wait weeks and months for regression results. Adam Sherer has written a White Paper on this topic that you may read online here.


    The Next Big Thing in Deep Learning

    by Bernard Murphy on 02-14-2017 at 7:00 am

    I mentioned adversarial learning in an earlier blog, used to harden recognition systems against bad actors who could use slightly tweaked images to force significant misidentification of objects. It’s now looking like methods of this nature aren’t just an interesting sidebar on machine learning, they are driving major advances in the field (per Yann LeCun at Facebook).

    The class of systems considered in these approaches are called Generative Adversarial Networks (GANs) in which one neural network is played off against another. One network, called the discriminator, performs image recognition with a twist – it reports on whether it believes the image to be real or fake (artificially constructed). The second network, called the generator, reverses the normal function of a recognition system to create artificial images which it feeds to the discriminator. If the discriminator determines an image to be fake, it feeds back information to the generator on what caused it to come to that conclusion.
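
The generator/discriminator feedback loop can be sketched with a deliberately tiny, pure-Python toy. This is an illustrative assumption-laden example, not any production GAN: the "images" are single numbers, the generator is a linear function of noise, the discriminator is a one-input logistic unit, and all names and hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(t):
    t = max(-60.0, min(60.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))

def train_toy_gan(steps=2000, lr=0.05, real_mean=4.0, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator: fake = a * z + b
    w, c = 0.1, 0.0   # discriminator: D(x) = sigmoid(w * x + c), "probability real"
    for _ in range(steps):
        real = rng.gauss(real_mean, 0.5)   # a sample from the real data
        z = rng.gauss(0.0, 1.0)            # noise fed to the generator
        fake = a * z + b

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: the discriminator's gradient tells the generator
        # which direction makes the fake look more real (ascent on log D(fake)).
        d_fake = sigmoid(w * fake + c)
        grad_fake = (1.0 - d_fake) * w
        a += lr * grad_fake * z
        b += lr * grad_fake
    return a, b, w, c

a, b, w, c = train_toy_gan()
print(f"generator offset b = {b:.2f}, scale a = {a:.2f}")
```

The key structural point is the second half of the loop: the generator never sees real data directly, it only receives the discriminator's gradient. Toy GANs like this one tend to oscillate rather than settle, which hints at why training real GANs is delicate.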

    The beauty of this setup is that this pair of networks, after a bootstrap on a relatively modest set of real images, can self-train to recognition/generation levels of quality that would normally require much larger sets of labeled images. This is a big deal. A standard reference for images, ImageNet, contains over 14 million images across 1000 categories. That’s for “standard” benchmark images. If you want to train on something outside that set, unless you get lucky you must first build a database of tens of thousands of labeled reference images. But with GAN approaches you can reduce the size of the training database to hundreds of images. That’s not only more efficient, it can be important where access to larger databases can be limited for privacy reasons, as is often the case for medical data.

    This raises an interesting question in deep learning – if GAN-enhanced training on a small set of examples can achieve similar levels of recognition to (non-enhanced) training on a much larger set, doesn’t that imply significant redundancy in the larger set? But then how do you measure or better yet eliminate that redundancy? This is a question we understand quite well in verification, but I’m not aware of work in this area for deep learning training data. I would think the topic should be extremely important. A well-chosen training set, together with GAN methods, could train a system to be accurate in recognition across a wide range of examples. A poorly chosen training set, even with GAN reinforcement, could do no better than recognize well across a limited range. If anyone knows of work in this area, let me know.

    So one thing you get out of GAN is improved learning on smaller datasets. But the other thing you get is improved image generation (because the discriminator is also training the generator). Why would that be useful? I can imagine that movie-makers might find some way to take advantage of this. A more serious application is to support something called inpainting – filling in missing parts of an image. This has obvious applications in criminal investigation as one example.

    Another very interesting application is in astronomy, specifically in approaches to mapping dark energy by looking for weak gravitational lensing of galaxies. This is a tricky problem. We don’t really know much about dark energy, and we’re looking for galaxies whose size and shape we don’t know, because they’re distorted by that dark energy. This seems like a problem with too many unknowns, but one group at CMU has found a way to attack the problem through generative creation of galaxy images. They expect to be able to use methods of this nature, together with models of estimated shearing of the images caused by lensing, to map against the images we actually detect. By tuning to get accurate matches they can effectively deduce the characteristics of the dark energy distribution.

    Deep learning marches on. It continues to become more interesting, more capable and more widely applicable. The Nature article that started me on this topic is HERE.



    Qorvo Uses ClioSoft to Bring Design Data Management to RF Design

    by Mitch Heins on 02-13-2017 at 12:00 pm

    A couple weeks ago I gave a heads-up about a webinar that was being hosted by ClioSoft, Qorvo and Keysight. The topic of the webinar was how to manage custom RF designs across multiple design teams and CAD flows. The webinar was held on February 1st and included presentations by Marcus Ray of Qorvo and Michele Azarian of Keysight.

    Much has been written about ClioSoft’s SOS product. In summary, it’s a great product for data and IP management and it enables companies to manage design complexity across multiple, geographically dispersed design teams. I saw this put into practice in my previous job, where one of our customers, a large semiconductor IC provider, was using ClioSoft with our EDA products to simultaneously work on the same design using teams located in Japan, Europe and the United States. In that case, the task was somewhat simplified by the fact that all of those teams were using tools from one EDA vendor. Nonetheless, it was a powerful statement as to the capabilities that ClioSoft provided.

    In the Qorvo case, they are doing RF system design using multiple IC technologies all assembled onto a 4 to 10-layer laminate that includes matching components embedded in the laminate for each die. Qorvo teams are dispersed across multiple locations around the world with some of those teams using Cadence’s Virtuoso design environment while others are using Keysight’s ADS environment.

    At least three things had to happen to make this flow possible.

    • Cadence had to integrate with ClioSoft SOS – first versions of this were released in 2001, not too long after ClioSoft’s founding in 1997.
    • Keysight had to integrate with ClioSoft SOS – first versions of this were released in 2012.
    • Lastly, ClioSoft recently did some interoperability work around OpenAccess (which was already being used by both Cadence and Keysight) to enable the two different EDA vendors to both read and write the same OpenAccess meta-data. This last step enabled true interoperability by giving both systems the same understanding about all of the data being shared.

    Data sharing was made much simpler by the fact that both Cadence and Keysight were using the same OpenAccess databases for their design repositories. However, as anyone who has worked with OpenAccess knows, it takes more to interoperate than simply being able to read and write OpenAccess data. The ability to share a common understanding of what is in the database makes all the difference in the world and ClioSoft’s work to codify this meta-data was a key component to making this interoperability flow work. Once this was in place, the Qorvo design teams were able to seamlessly move back and forth between Virtuoso and ADS while taking advantage of all of the ClioSoft data management capabilities.

    The beauty for Qorvo is that they are able to use ClioSoft SOS to manage and share their RF work across all their various design sites while interoperating between Cadence Virtuoso and Keysight ADS. Another key feature in this setup is that SOS is fairly technology agnostic as seen by the fact that Qorvo is managing multiple IC technologies in addition to laminate substrates. The impact of this is that Qorvo is able to hierarchically build designs using these different technologies and then simulate them together at the system level.

    Another nice feature of this setup is that Qorvo can use SOS’s capabilities to seed the workspaces of each of their design groups. This includes common libraries, IPs, test benches, scripts and data files. Even though the teams are in different geographies, they all have access to a consistent design environment that is kept up to date by design management policies. Designers can selectively pull what they need, but having ready-made setups available saves time and goes a long way toward eliminating simple but critical errors like using out-of-date test benches or high level models that don’t match their lower level implementations.

    The webinar did a nice job of explaining the desire and need for data and IP management and the presenters did a good job of showing how easy it was to move back and forth between the tool sets integrated with ClioSoft’s SOS7. Key features for Qorvo in this heterogeneous environment included revision control of IP and designs, team collaboration across multiple sites, archiving of design revisions and IP management that is used to trace IP usage in each of their tape-outs. This last item can come in handy when issues are found in an IP block and you need to know which designs may be affected by those issues. Additionally, having archival and versioning control also enables downstream teams such as product and test engineering to have easy access to design data without the need to hunt down design engineers who have moved on to different projects.

    Which brings up the final point of this webinar, which is that data management really needs to be done across the entire design and manufacturing flow, including requirements and specifications, logic design and test bench generation, IC and laminate layout, tape-out revision control, packaging revisions, and back annotation of empirical data from fabricated parts. These systems are complex and require good data management practices to ensure success and Qorvo found that ClioSoft SOS was the platform that worked for them.

    If you missed the webinar you can view a video recording of the event here: http://cliosoft.com/corp_web/webinar_recordings/ads_0117_qrvo/request.php

    Additional information on each vendor’s offerings can be found at the following websites:
    ClioSoft: www.cliosoft.com

    Keysight Technologies: http://www.keysight.com/en/pc-1297113/advanced-design-system-ads?cc=US&lc=eng

    Cadence Design Systems: http://www.cadence.com

    Also Read

    Qorvo and KeySight to Present on Managing Collaboration for Multi-site, Multi-vendor RF Design

    Tool Trends Highlight an Industry Trend for AMS designs

    Managing International Design Collaboration


    CEO Interview: Amit Gupta of Solido Design

    by Daniel Nenni on 02-13-2017 at 7:00 am

    Solido Design Automation is rapidly making a name for itself in EDA. Amit Gupta is founder and CEO of Solido Design Automation, based in Saskatoon, Canada. You should also know that Solido is one of the founding members of SemiWiki.com. In the last six years we have published 44 Solido related blogs that have racked up more than 200,000 page views. I recently had the opportunity to have a New Year’s chat with Amit for a Solido update. Below is a throwback graphic I used in one of my early Solido blogs and it is still one of my favorites because it is so true.

    Tell us about Solido Design Automation
    I founded Solido in 2005. We focus on providing variation-aware design software for custom IC designers. Our flagship product, Variation Designer, was launched in 2007 with version 4 available today. Product development and customer applications are both based in Saskatoon, Canada, and we have sales offices around the world.

    We currently have a team of about 60 people working to create, provide, maintain, and support products for custom IC designers. We have over 35 major customers working in memory design, standard cell library design, and analog/RF design. Solido’s software helps them meet industry and market demands by building designs with better power, better performance, better area, and better yield.

    We have 15 patents protecting our core machine learning technologies, enabling designers to get the most accurate results in the fastest time.

    What makes Solido unique?
    There are a few aspects that make Solido really unique:

    First, we invest heavily in machine learning technologies to provide disruptive solutions to our customers in terms of speed, accuracy, capacity, and verifiability.

    Second, we invest heavily in user experience design experts to provide an unmatched product-user interface that is easy to use and deploy quickly across an organization.

    Third, we invest heavily in our customer applications team to ensure our world-wide user base of over 2000 people have great support and get the full benefits of Solido software.

    The combination of these investments has given us the world leading position in variation-aware design software.

    Why should designers be concerned with design variation?
    There are some big semiconductor trends happening right now. We’re seeing growth in many semiconductor segments: mobile, 5G networking, automotive, IoT and industrial IoT, and cloud computing.

    This growth is driving chip complexity. There’s a need to move to advanced nodes, including advanced FinFET designs at 16-, 14-, 10-, and 7nm; FDSOI designs at 22- and 12nm; and low-power variant designs at both advanced and mature nodes at 28nm, 40nm, and 65nm.

    To meet specifications and to stay competitive in designing the best-performing quality chip with low power, high performance, low area, and high yield, designers need to be able to do extensive SPICE simulations to account for all the potential design variation. Using brute-force PVT and Monte Carlo requires too much time and resources for full design coverage. Solido Variation Designer enables customers to get full design coverage in orders-of-magnitude fewer simulations than brute force.
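
A rough sense of why brute-force Monte Carlo becomes impractical can be had from the Gaussian tail probability. The sketch below is generic statistics, not Solido's method. The sigma targets and the "expect about 10 failures" rule of thumb are illustrative assumptions.

```python
import math

def tail_probability(sigma):
    """One-sided Gaussian tail: probability of a sample beyond `sigma` deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

def samples_needed(sigma, expected_failures=10):
    """Monte Carlo runs needed to expect the given number of failures at `sigma`."""
    return expected_failures / tail_probability(sigma)

# Assumed sigma targets, for illustration only.
for sigma in (3, 4, 5, 6):
    print(f"{sigma} sigma: about {samples_needed(sigma):.1e} simulations")
```

Each additional sigma of coverage multiplies the simulation count by orders of magnitude, and that is before multiplying by the number of PVT corners, so a 6-sigma bit cell or sense amplifier is far out of reach for plain sampling.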

    What does Variation Designer allow the designer to do?
    Solido Variation Designer uses machine learning algorithms that enable designers to reduce the number of simulations by a factor of 10 to one million, while still achieving the accuracy of brute-force PVT and Monte Carlo analysis. As a result, our customers achieve full design coverage without having to compromise on accuracy, allowing them to get high performance, low area, low power, high yielding ICs and to stay competitive within these rapidly advancing semiconductor trends.

    We’ve hit an inflection point between design challenges and a need for variation-aware design tools. With our machine learning technologies, we’re able to meet those challenges. In addition, our user interface allows customers to pick it up and implement it in their organizations quickly and efficiently.

    2016 was a big year for Solido. What were some of the highlights?
    2016 was a great year. We are now among the largest private electronic design automation (EDA) companies, again achieving 50% revenue growth, which we’ve accomplished each year for the last 5 years. We were also recognized in Deloitte’s Technology Fast 50™ program for being among the fastest growing technology companies.

    We’ve also been hiring very aggressively. Last year we increased our team from 30 to 50 people. This year, we will be more than doubling our team, to over 100 people. We’re actively hiring software developers and customer applications people to support our growing customer base and continue to build the world’s best product.

    We’re really looking forward to 2017. Our software is being used by more designers and more companies, and we’ll be launching some exciting new products in 2017.

    About Solido Design Automation
    Solido Design Automation Inc. is a leading provider of variation-aware design software for high yield and performance IP and systems-on-a-chip (SOCs). Solido plays an essential role in de-risking the variation impacts associated with the move to advanced and low-power processes, providing design teams improved power, performance, area and yield for memory, standard cell, analog/RF, and custom digital design. Solido’s efficient software solutions address the exponentially increasing analysis required without compromising time-to-market. The privately held company is venture capital funded and has offices in the USA, Canada, Asia and Europe. For further information, visit www.solidodesign.com or call 306-382-4100.

    Also Read:

    CEO Interview: David Dutton of Silvaco

    CEO Interview: Toshio Nakama of S2C

    CTO Interview: Mohamed Kassem of efabless


    DVCon San Jose February 27th – March 2nd

    by Bernard Murphy on 02-10-2017 at 7:00 am

    DVCon is fast approaching, less than 3 weeks away. As a verification geek, this must be one of my favorite conferences, so I’ll be there; you’ll see me at tutorials, presentations and wandering around the Exhibit hall. (Pictures here from the 2016 DVCon – many of the same attendees will be at this year’s conference after all.)

    As usual, Monday is tutorial day, which I personally find very helpful to stay current with emerging/evolving standards in verification. The day kicks off with a session on creating portable stimulus models in the soon-to-be-finalized portable test and stimulus standard (PSS). Quite a few companies are already using this in various (pre-ratified) forms so I expect it to take off fast. The afternoon continues with a review of the next step in UVM (IEEE 1800.2) and the impact this may have on existing verification environments. Finally, you can wrap up with a tutorial on SystemC design and verification – what’s new in the synthesizable subset definition, advice for high-performance modeling and an update on the emerging UVM-SystemC standard, so you can reuse your SystemC testbenches at RTL.

    Tuesday is papers, posters and an intriguing lunch topic (Cadence-sponsored) on whether verification needs differ between edge nodes, hubs, networks and servers. Throughout, all the papers and posters look interesting. I’ll just mention a few that particularly caught my attention: Using UVM sequences to layer protocol verification (Microsoft), Emulation-based low-power validation (Samsung), trends in verification in 2016 (Harry Foster, Mentor), Assertion-based verification for AMS designs (poster, TI), Formal strategies for IP verification (poster, Microsoft), Regression efficiency with Jenkins (poster, Mentor), Optimizing random test using Machine Learning (ARM).

    Wednesday starts with a can’t-miss session – users talk back on the portable stimulus standard. Given the audiences I usually see at DVCon, I expect to hear lively debate. Again, a few topics of special interest for me include: Early software development/verification using hybrid emulation/virtual prototyping (Samsung), Making formal mainstream (Intel), Machine Learning-based PVT/worst-case coverage in AMS (TI). The lunch is sponsored by Synopsys with fellow Atrenta alum Piyush Sancheti moderating a discussion on how industry leaders approach verification using Synopsys technology.

    The post-lunch panel could be exciting, depending on how controversial the panelists want to be, debating what SystemVerilog has done for us (or to us) and what might come after. In afternoon papers, I like: Ironic but effective, how formal can improve your simulation constraints (Mediatek), and Methods to improve verification reuse in AMBA-based designs (SK Hynix).

    Thursday is back to tutorials, kicking off with Cadence talking about new approaches to reinventing SoC verification. Mentor have framed a tutorial on formal in an entertaining task – how to verify an FPGA-based solar-powered rescue drone using only formal, when you’re depending on that drone working to get out word that you need to be rescued. Synopsys follows with a very important tutorial on managing low power verification complexity, organized by another fellow Atrenta alum, Kiran Vittal. Low power design has made verification significantly more complex. How do you know you have covered all realistic possibilities, given a seemingly boundless range of configuration and switching options and how can you systematically approach power verification?

    Mentor hosts a lunch on trends in verification with a view to an Enterprise Verification platform – should be interesting. Afternoon tutorials start with Cadence talking about IP verification and warning this is not a solved problem. They’ll discuss how to optimize coverage across the spectrum of verification techniques. Mentor follows with a tutorial on how to create a complex UVM testbench in a couple of hours. I’m curious to see how they do that. Synopsys closes with a tutorial on optimizing productivity with formal and getting to closure with formal (a perennially intriguing topic).

    If you are involved in verification, DVCon is the one conference each year you cannot afford to miss. Sign up HERE.

    More articles by Bernard…


    GlobalFoundries Makes Pure-Play Foundry Great Again!

    by Daniel Nenni on 02-09-2017 at 9:00 pm

    The pure-play foundry business just got stronger and so did semiconductor manufacturing in the United States. As we all know, the fabless semiconductor industry started by utilizing extra capacity from traditional semiconductor manufacturers (IDMs). However, putting your designs in the hands of a competitor is not a good idea, so the pure-play foundry business was born (1987) and has grown more dominant every year.


    Today we still have a wide range of pure-play foundries but most of them have fallen behind and are still struggling with FinFETs (SMIC and UMC) or have stopped leading-edge development altogether (Powerchip, TowerJazz, Vanguard, Hua Hong, Dongbu, and X-Fab). As a result, the front door was left wide open for IDM foundries (Intel and Samsung) to bring leading-edge technology to the insatiable fabless chip and fabless system companies.

    That door is now closing with the GlobalFoundries acquisition of IBM semiconductor and the leading-edge process development expertise that came with it. Further proof is the multibillion-dollar expansion announcement GF made today (GLOBALFOUNDRIES Expands to Meet Worldwide Customer Demand).

    GF’s name has come up quite frequently of late during conferences and customer visits, especially by the IP companies who are now developing IP for the GF 7nm process. Take a look at the customer quotes and let me confirm that I have heard significant GF chatter involving these companies and about a dozen more:

    “GF has had a strong foundry relationship with Qualcomm Technologies for many years across a wide range of process nodes,” said Roawen Chen, senior vice president, QCT global operations, Qualcomm Technologies, Inc. “We are excited to see GF making these new investments in differentiated technology and expanding global capacity to support Qualcomm Technologies in delivering the next wave of innovation across a range of integrated circuits that support our business.”

    “Collaborative foundry partnerships are critical for us to differentiate ourselves in the competitive market for mobile SoCs,” said Min Li, chief executive officer of Rockchip. “We are pleased to see GF bringing its innovative 22FDX technology to China and investing in the capacity necessary to support the country’s growing fabless semiconductor industry.”

    “As our customers increasingly demand more from their mobile experiences, the need for a strong manufacturing partner is greater than ever,” said Joe Chen, co-chief operating officer of MediaTek. “We are thrilled to have a partner like GF that invests in the global capacity we need to deliver powerful and efficient mobile technologies for markets ranging from networking and connectivity to the Internet of Things.”


    The expansion involves their facilities in New York (FinFET), Dresden (FD-SOI), Singapore (CMOS), and the new fab in China (CMOS and FD-SOI), meaning GlobalFoundries is truly a global pure-play foundry:

    • US advanced manufacturing, New York Fab 8, we are expanding 14nm FinFET capacity by 20% as well as developing advanced 7nm FinFET technology by 2018.
    • European manufacturing, Dresden Fab 1, expanding 22FDX® capacity by 40% by 2020 as well as developing 12FDX™ technology with expected tape-outs in mid-2018.
    • Asia Pacific manufacturing, Singapore 300mm and 200mm Fabs, expanding 40nm capacity by 35% at 300mm, 180nm capacity at 200mm as well as adding new capabilities to produce industry-leading RF-SOI technology.
    • China manufacturing, Chengdu Fab 11, a new 300mm fab in joint venture with the Chengdu municipality to support existing 180/130nm technologies, production starting in 2018 and then focus on manufacturing GF’s commercially available 22FDX process technology, with volume production expected to start in 2019.

    And of course we can all thank Sanjay Jha, one of my favorite semiconductor CEOs, for making the pure-play foundry business great again:

    “We continue to invest in capacity and technology to meet the needs of our worldwide customer base,” said GF CEO Sanjay Jha. “We are seeing strong demand for both our mainstream and advanced technologies, from our world-class RF-SOI platform for connected devices to our FD-SOI and FinFET roadmap at the leading edge. These new investments will allow us to expand our existing fabs while growing our presence in China through a partnership in Chengdu.”


    Intel Alternative Facts!

    by Robert Maire on 02-09-2017 at 12:00 pm

    Brian Krzanich, CEO of Intel, standing next to Trump in the Oval Office, announced a $7B investment in Fab 42 in Arizona as evidence of a positive reaction to Trump’s new policies.

    Alternative fact;
    Paul Otellini, then Intel’s CEO, made a similar promise about Fab 42 in the company of Obama in 2011, during a visit to Hillsboro, Oregon.

    BK said that it would bring 3,000 new jobs to Arizona, where Intel is the state’s largest private employer. BK further said that these were not jobs returning to the US from overseas but that Intel was all about “growth”.

    Alternative fact; If you add the 3,000 jobs that may be hired in the future for Fab 42, to the 12,000 or so that Intel reduced last year, Intel is still negative 9,000 jobs…AKA “negative growth”

    There was no mention, in the Oval Office, of the H1B visa program, over which Intel joined 100 other Silicon Valley companies this past weekend to sue the government.

    Alternative fact;
    Intel asked the government for 14,523 H1B visas and green cards for foreign workers between 2010 and 2015, the years leading up to the 12,000-employee reduction of US workers that followed those additions.

    Failed dinner consolation prize…

    Perhaps the Oval Office photo op and announcement were done to make up for BK first setting up a dinner at his home for then-candidate Donald Trump and then being forced to cancel it due to the outcry in Silicon Valley. BK, as one of the few Trump supporters in the valley, will be expecting some payback in the form of tax and regulatory easements that were hinted at during the announcement today.

    Fab 42’s long rumored resurrection…

    After being put in “mothballs” several years ago, it was only a matter of time before an appropriate use would be found at an appropriate time. With 10NM firmly in Israel, Fab 42 is an easy choice for 7NM. When you add to the decision-making process the advent of EUV tools at 7NM or 5NM, which require boatloads of electrical power along with gigantic, very expensive cranes to hoist the huge tools into the fab, the only place that makes sense for 7NM is Fab 42. So the reality is that this was going to happen anyway, but Intel wanted to get some free political capital out of it.

    When is a dollar not a dollar? When it’s part of Intel’s CAPEX plan….

    We have heard from many suppliers and tool makers in the semiconductor industry that are wondering who is getting all of the alleged spending on CAPEX from Intel. The numbers just don’t seem to add up. Intel’s announced CAPEX over the last couple of years does not seem to be proportionately translating into dollars spent at suppliers. It’s almost as if $1 announced by Intel translates into 50 cents spent in real money. It’s almost impossible to accurately measure this but our anecdotal evidence points to less spending than announced. Part of this may be “sandbagging” by management to make it easier for Intel to hit its financial targets but even still there’s a mismatch.

    We have heard from a number of suppliers in the industry that Intel is not only no longer number one in spending, a title it lost long ago, but doesn’t even make it into the top 3 or 4 spenders anymore at many vendors.

    Not all $7B goes to the US…

    Given that the bricks and mortar are all done at Fab 42 and all that is needed is equipment move-in, we can assume that the $7B is all equipment. If we subtract spend on ASML, TEL, Hitachi and all other foreign vendors, it’s likely that less than $5B actually “stays” in the US.

    Intel probably still spends more overseas than in the US on CAPEX…

    If we look at Intel’s global footprint of fabs, especially the near-term spend in Israel and China, the $7B spend in Fab 42, especially when spread over several years, is in the minority. I am sure that BK could stand next to China’s Xi Jinping, in the equivalent of their Oval Office in China, for a similar photo op and claim a similar, if not larger, amount of money that will be spent and jobs that will be created for memory production in China by Intel. We are also sure that Intel got some sweet political deals there as well…..

    Intel the stock…
    We view today’s announcement as not impactful either way for shareholders of the company, but we do applaud Intel’s ability to work both political sides of China versus U.S. and H1B versus foreigner bans. Intel hasn’t given up anything it would not have done anyway and in return it gets an IOU with Trump at a time when Silicon Valley is in open revolt against the new administration.

    Maybe BK read “The Art of the Deal”…..


    Notes from the Neural Edge

    by Bernard Murphy on 02-09-2017 at 7:00 am

    Cadence recently hosted a summit on embedded neural nets, the second in a series for them. This isn’t a Cadence pitch but it is noteworthy that Cadence is leading a discussion on a topic which is arguably the hottest in tech today, with this range and expertise of speakers (Stanford, Berkeley, ex-Baidu, Deepscale, Cadence and more), and drawing at times a standing room only crowd. It’s encouraging to see them take a place at the big table; I’m looking forward to seeing more of this.


    This was an information-rich event so I can only offer a quick summary of highlights. If you want to dig deeper Cadence has said that they will post the slides within the next few weeks. The theme was around embedding neural nets in the edge – smartphones and IoT devices. I talked about this in an earlier blog. We can already do lots of clever recognition in the cloud and we do training in the cloud. But, as one speaker observed, inference needs to be on the edge to be widely useful; value is greatly diminished if you must go back to the cloud for each recognition task. So the big focus now is on embedded applications, particularly in vision, speech and natural language (I’ll mostly use vision applications as examples in the rest of the blog). Embedded application creates new challenges because it needs to be much lower power, it needs to run fast on limited resources and it must be much more accessible to a wide range of developers.

    One common theme was the need for greatly improved algorithms. To see why, understand that recent deep nets can have ~250 layers. In theory each node in each layer requires a multiply-accumulate (MAC) and the number of these required per layer may not be dramatically less than the number of pixels in an image. Which means you’ll need to process billions of MACs per second in a naïve implementation. But great progress is being made. Several speakers talked about sparse matrix handling; many/most (trained) weights for real recognition are zero so all those operations can be skipped. And training downloads/update sizes can be massively reduced.
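
    To put rough numbers on this, here is a minimal Python sketch. The 1-megapixel image and 30 frames per second are my own illustrative assumptions, not figures from the summit, and the function names are hypothetical. It estimates the naive MAC rate and then shows how a dot product can skip zero weights.

```python
def naive_macs_per_second(layers, nodes_per_layer, frames_per_second):
    """Naive estimate: one MAC per node per layer for every frame."""
    return layers * nodes_per_layer * frames_per_second


def sparse_dot(weights, inputs):
    """Dot product that skips zero weights.

    Returns the result and the number of MACs actually performed.
    """
    total = 0.0
    macs = 0
    for w, x in zip(weights, inputs):
        if w == 0:
            continue  # a zero weight contributes nothing, so skip the MAC
        total += w * x
        macs += 1
    return total, macs


# Assumed workload: ~250 layers, ~1M nodes per layer (about one per pixel
# of a 1-megapixel image), 30 frames per second.
rate = naive_macs_per_second(layers=250, nodes_per_layer=1_000_000,
                             frames_per_second=30)
print(f"{rate:.2e} MACs/s")

# Toy weight vector in which most entries are zero.
weights = [0.0, 0.5, 0.0, 0.0, -1.0, 0.0, 0.25, 0.0]
inputs = [3.0, 2.0, 7.0, 1.0, 4.0, 9.0, 8.0, 5.0]
result, macs = sparse_dot(weights, inputs)
print(result, macs, "of", len(weights), "MACs performed")
```

    Under these assumptions the naive rate is in the billions of MACs per second, and the toy dot product performs only 3 of 8 possible MACs. Real implementations would use a compressed sparse storage format rather than testing each weight, but the saving comes from the same idea.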

    Then there’s operation accuracy. We tend to think that more is always better (floating point, 64 bit), especially in handling images, but apparently that has been massive overkill. Multiple speakers talked about weights as fixed-point numbers and most were getting down to 4-bit sizes. You might think this creates massive noise in recognition but it seems that incremental accuracy achieved above this level is negligible. This is supported empirically and to some extent theoretically. One speaker even successfully used ternary weights (-1, 0 and +1). These improvements further reduce power and increase performance.
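
    A minimal Python sketch of what low-precision weights look like follows. The symmetric rounding scheme, the clipping range and the ternary threshold of 0.3 are all illustrative assumptions of mine, not the methods the speakers described.

```python
def quantize_fixed_point(w, bits=4, max_abs=1.0):
    """Map a float weight to a signed integer code with `bits` bits.

    Uses a symmetric range, so 4 bits gives codes from -7 to +7.
    Returns the integer code and the float value it represents.
    """
    levels = 2 ** (bits - 1) - 1
    clipped = max(-max_abs, min(max_abs, w))  # saturate out-of-range weights
    code = round(clipped / max_abs * levels)
    return code, code * max_abs / levels


def ternarize(w, threshold=0.3):
    """Map a float weight to -1, 0 or +1 (threshold is an assumed value)."""
    if w > threshold:
        return 1
    if w < -threshold:
        return -1
    return 0


weights = [0.42, -0.9, 1.7, 0.05]
codes = [quantize_fixed_point(w)[0] for w in weights]
ternary = [ternarize(w) for w in weights]
print(codes)
print(ternary)
```

    With ternary weights each multiply reduces to an add, a subtract or nothing at all, which is one way to see where the power and performance gains come from.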

    Another observation was that general-purpose algorithms are often the wrong way to go. General-purpose may be easier in implementation, but some objectives can be much better optimized if tuned to an objective. A good example is image segmentation – localizing a lane on the road, or a pedestrian, or a nearby car. For (automotive) ADAS applications the goal is to find bounding boxes, not detailed information about an object, which can make recognition much more efficient. Incidentally, you might think optimizing power shouldn’t be a big deal in a car, but I learned at this summit that one current autonomous system fills the trunk of a BMW with electronics and must cool down after 2 hours of driving. So I guess it is a big deal.


    What is the best platform for neural nets as measured by performance and power efficiency? It’s generally agreed that CPUs aren’t in the running. GPUs and FPGAs do better but are not as effective as DSPs designed for vision applications, and DSPs tuned to both vision and neural net applications do better still. And as always, engines custom-designed for NN applications outperform everything else. Some of these can get interesting. Kunle Olukotun, a professor at Stanford, presented a tiled interleaving of memory processing units and pattern processing units as one approach, but of course custom hardware will need to show compelling advantages outside research programs. Closer to volume applications, Cadence showed several special capabilities they have added to their Vision P6 DSP, designed around minimizing power per MAC, minimizing data movement and optimizing MACs per second.

    Another problem that got quite a bit of coverage was software development productivity and building a base of engineers skilled in this field. Google, Facebook and similar companies can afford armies of PhDs, but that’s not a workable solution for most solution providers. A lot of work is going into democratizing recognition intelligence through platforms and libraries like OpenCV, Vuforia and OpenVX. Stanford is working on OptiML to intelligently map from parallel patterns in a re-targetable way onto different underlying hardware platforms. As for building a pool of skilled graduates, that one seems to be solving itself. In the US at least, Machine Learning is apparently the fastest-growing unit in undergraduate CS programs.

    Pixel explosion in image sensors

    AI was stuck for a long time in cycles of disappointment where results never quite rose to expectations, but neural nets have decisively broken out of that trap, generally meeting or exceeding human performance. Among many examples, automated recognition is now detecting skin cancers with the same level of accuracy as dermatologists with 12 years of training, and lip-reading solutions (useful when giving commands in a noisy environment) are detecting sentences at better than 90% accuracy, compared to human lip-readers at ~55%. Perhaps most important, recognition is now going mainstream. Advanced ADAS features such as lane control and collision-avoidance already depend on scene segmentation. Meanwhile the number of image sensors already built surpasses the number of people in the world and is growing exponentially, implying that automated recognition of varying types must be growing at similar speeds. Neural net-based recognition seems to have entered a new and virtuous cycle, driving rapid advances of the kind listed here and rapid adoption in the market. Heady times for people in this field.

    You can learn more about Cadence vision solutions HERE.

    More articles by Bernard…