Think your future historical encrypted data is secure? Think again…
by Bill Montgomery on 02-18-2017 at 7:00 am

It’s been 32 years since the hit sci-fi comedy Back to the Future saw 17-year-old Marty McFly – played by Michael J. Fox – accidentally travel 30 years back in time to 1955. The film was a box-office smash, as audiences worldwide delighted in McFly’s antics, only to soon realize that the cool kid from the future was jeopardizing the very thing that made his life a reality – his then-teen parents meeting and falling in love.

A key tenet of the film is that somebody living in today’s world – McFly – has discovered a way to go back in time – through a cosmic “back door” – and gain access to secret (let’s call it “encrypted”) information considered inaccessible to anyone other than those who generated it first-hand.

While time travel is not possible as far as we know (though I suspect Google is working on it), what is conceivable is that people or nation states will soon be able to travel back in digital time and retrieve securely encrypted data long considered non-retrievable.

Say what?

Well, here’s what Isaac Chuang, a distinguished MIT Professor of Physics, Electrical Engineering and Computer Science, says,

“If you are a nation state, you probably don’t want to publicly store your secrets using encryption that relies on factoring as a hard-to-invert problem. Because when these quantum computers start coming out, you’ll be able to go back and unencrypt all those old secrets.”

That shouldn’t be a problem, right? After all, in today’s technologically-advanced day and age, who and what could possibly be using factoring in cryptographic schemes?

The answer: almost everybody and everything. That’s because the foundational cryptography underlying most existing so-called “secure” solutions is RSA, which relies on factoring as a hard-to-invert problem for its underlying security and which leading pundits believe is on the verge of crumbling.
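To make the threat concrete, here is a small, purely illustrative C sketch using toy textbook-RSA numbers (p = 61, q = 53), not a production attack: an eavesdropper who records a ciphertext today only needs to factor the public modulus n later to recover the private key and read the message. Trial division suffices for this toy modulus; a large quantum computer running Shor’s algorithm is what would make the equivalent step feasible for real 2048-bit keys.

```c
#include <stdio.h>

/* Modular exponentiation: base^exp mod m, with toy-sized operands. */
static long mod_pow(long base, long exp, long m) {
    long result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Extended Euclid: modular inverse of e mod phi (assumes gcd(e, phi) == 1). */
static long mod_inverse(long e, long phi) {
    long t = 0, newt = 1, r = phi, newr = e;
    while (newr != 0) {
        long q = r / newr, tmp;
        tmp = t - q * newt; t = newt; newt = tmp;
        tmp = r - q * newr; r = newr; newr = tmp;
    }
    return (t < 0) ? t + phi : t;
}

int main(void) {
    long n = 3233, e = 17;        /* public key only (toy values) */
    long m = 65;                  /* the plaintext "secret" */
    long c = mod_pow(m, e, n);    /* ciphertext an attacker records today */

    /* Years later: factor n. Trial division works for a toy modulus;
       Shor's algorithm would make this feasible for real key sizes. */
    long p = 0;
    for (long i = 2; i * i <= n; i++) {
        if (n % i == 0) { p = i; break; }
    }
    long q = n / p;
    long phi = (p - 1) * (q - 1);
    long d = mod_inverse(e, phi);         /* recovered private exponent */
    long recovered = mod_pow(c, d, n);    /* decrypted "old secret" */

    printf("ciphertext=%ld, factored n=%ld*%ld, d=%ld, recovered plaintext=%ld\n",
           c, p, q, d, recovered);
    return 0;
}
```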

And it’s not just the pundits sounding the death knell for RSA. The chart below, extracted from the US NIST Report on Post-Quantum Cryptography, clearly indicates that RSA, ECDH and DSA are nearing the end of their life cycle.


Professor Chuang alerts Nation States to the dangers of continuing to use crypto which relies on factoring, but his warning just as easily applies to businesses, even individuals. Picture your competitors or personal enemies going back in digital time and unlocking any data that you have ever created and encrypted for safe-keeping. The impact of such revelations will be overwhelming, even dangerous.

It’s time for the world to abandon cryptographic schemes created last century and make the bold move to quantum-resistant crypto that will ensure secrets, be they national, corporate or personal, remain where they are meant to be – locked and impermeable to outside intrusion.


Could China Take the Lead in Installed 300mm Capacity?
by Scotten Jones on 02-17-2017 at 12:00 pm

China buys more than half of the semiconductors manufactured in the world and yet produces less than 10% of its own demand. Recently there have been a lot of announcements out of China about large-scale investments in semiconductor manufacturing. The Chinese government, for example, has announced plans to invest $161 billion over ten years in semiconductor manufacturing.

In terms of specific announcements, Tsinghua Unigroup has the most ambitious plans. Tsinghua Unigroup has announced plans to invest $28 billion in a 500,000 wafer-per-month foundry fab, $24 billion in a 300,000 wafer-per-month 3D NAND fab for its XMC subsidiary, and also has plans for a $30 billion investment in a 300,000 wafer-per-month memory fab. GLOBALFOUNDRIES has partnered with the Government of Chengdu on a $10 billion foundry fab we project will produce >60,000 wafers per month, including GLOBALFOUNDRIES’ 22FDX FDSOI process. There are several other projects underway or planned as well.

IC Knowledge LLC produces a database of all the current and planned 300mm fabs worldwide. We believe this is the most detailed and comprehensive 300mm database available. We are currently tracking 164 fabs and part of our analysis includes capacity by country.

As of the end of 2016 the top five countries in the world in terms of 300mm capacity in order from greatest to least capacity are:
1. South Korea
2. Taiwan
3. Japan
4. United States
5. China

Based on current announcements and our projections we expect that China will pass the United States for the fourth most installed capacity by the end of 2018. The end-of-2018 capacity ranking by country would then become:

1. South Korea
2. Taiwan
3. Japan
4. China
5. United States

Further projecting forward to 2020, we are forecasting China may pass Japan for third place and the rankings to be:

1. South Korea
2. Taiwan
3. China
4. Japan
5. United States

As surprising as these results may be, by the end of 2023 we are forecasting that China may pass Taiwan for second place and the ranking to be:

1. South Korea
2. China
3. Taiwan
4. Japan
5. United States

And finally, in 2024 we are forecasting China may become the world leader in installed 300mm capacity and the ranking to be:

1. China
2. South Korea
3. Taiwan
4. Japan
5. United States

Figure 1 summarizes the percentage of worldwide capacity each country represents by year.

Figure 1. Percentage of worldwide 300mm capacity by country.

This analysis comes with several cautions:

• This is 300mm capacity only and does not include older legacy 200mm and smaller wafer sizes. However, as 300mm is the most advanced and productive wafer size, we believe this is a good metric for determining manufacturing leadership.
• This analysis is based on current and announced fabs. Other countries could install more capacity than we are currently forecasting, and/or China could install less capacity than it is currently planning. The recent announcements from China are both larger and more forward-looking than most 300mm announcements. Certainly, China has been working to climb the ranks of the semiconductor industry for a long time with only moderate success to date, so not all of these announcements may actually take place.
• Currently the 300mm fabs in China lag the leading edge. The leading foundries worldwide are ramping 10nm processes and preparing 7nm for the next 12 to 24 months, at a time when China’s most advanced foundry fabs don’t yet have 14nm in production. The XMC 3D NAND fab will bring up a 32-layer process at a time when other 3D NAND producers will be on 64 layers and working on 96 layers.

Despite these cautions, the potential for China to have the largest installed 300mm capacity base by the mid-2020s should serve as a warning to the rest of the semiconductor industry of just how aggressive China’s plans for semiconductor manufacturing expansion are.


    Mentor Plays for Keeps in Emulation
    by Bernard Murphy on 02-17-2017 at 7:00 am

    EDA has always been a fiercely competitive market, no more so than in emulation where the clash of claims and counter-claims can leave those of us on the sidelines wondering who’s really on top. Sales are the obvious indicator but leadership there flips back and forth between product releases. That makes Mentor’s choice to play a long game all the more interesting. This week they announced their Veloce Strato platform, raising the bar on specs again but also positioning this as the first step in a 5-year plan. When did you last hear of a 5-year plan from an EDA company?

Jean-Marie Brunet (Sr. Director of Marketing at Mentor) told me they had started with a study of capacity needs, stretching out to 2021. They see a need to support up to 2.5 billion gates today and they have charted sizes, based on semiconductor company announcements and internal analysis, up to 15 billion gates 5 years from now. These monster designs are found where you would expect – CPUs, GPUs, APUs (application processing units in smartphones) and NPUs (network processing units) – possibly multi-die or multi-chip, maybe single die if EUV ever goes mainstream.

    To get reasonable run-time performance, a design must fit in a single emulation box, whatever that box may contain. In light of these capacity projections, Mentor felt they had to move to a new architecture; Jean-Marie casts this as an evolution of the Veloce2 architecture rather than a revolution but significant enough that it can rise to these demands. This week they announced the Veloce Strato platform, designed to scale all the way to 15 billion gates, and availability of the Veloce StratoM emulator at 2.5 billion gate capacity as the first step in this plan.

    Together with the Veloce Strato OS, designed with the same objective in mind, the new Veloce StratoM system delivers some impressive stats:

    • 2.5x the capacity of Veloce2
    • Up to 3x improvement in compile times with 100% success rate
    • Up to 10x improvement in time to debug/visibility
    • Up to 3x co-modeling speed improvement
    • Overall, up to 5x improvement in throughput (compile-run-debug).


    Strato OS has been designed to be platform independent and remains compatible with Veloce Apps and protocol solutions; it is also interoperable across legacy Veloce installations (Jean-Marie didn’t say how far back), as well as with Strato solutions. The StratoM fits in the aisle of a datacenter at 4-4.5 racks high and remains air-cooled. And StratoM boxes can be linked through StratoLink to further extend multi-user support.

    Strato OS in many ways is the centerpiece of the Veloce Strato architecture. Of course, it must maintain compatibility and transparency across different Veloce architectures, but it also needs to offer support across multiple use-modes (ICE, accelerated testbenches, virtual components and others). Most important, it must integrate support for debug capabilities like Replay and LiveStream. This cross-platform support is a big part of what ensures scalability in the solution.

Applications for emulation beyond functional verification are multiplying in areas like software development and debug, power estimation and test debug. These capabilities will also scale with the platform. And there’s another compelling application where this level of emulation horsepower is already starting to become important – validation. We tend to think primarily in terms of verification when we think of EDA hardware, but validation (does the system operate as expected, not just does it conform to the spec) is just as important. Mentor has already taken a step in this direction in their partnership with Ixia, to model realistic network traffic. They anticipate, and they can hardly be wrong, that pre-silicon validation along these lines can only become more important across a wide range of designs. For them, that will drive Strato OS as a common platform for verification, prototyping and validation solutions.

    Naturally Mentor already has a StratoM customer and, just as naturally, it’s a customer who doesn’t allow their name to be used in press releases. But given the class of designs requiring this kind of box, it doesn’t take a lot of thought to narrow down the list of possibles. Feedback from that customer has been very positive and it sounds like other customers are now starting to use StratoM.

    Putting this all together, Mentor has laid out a path to support scalable emulation of 15 billion gates within 5 years, they have redesigned hardware and software to meet this goal and have delivered the first step on that path, Veloce StratoM, proven with at least one large customer. And finally, the solution requires no disruption to existing Veloce customer flows, apps or protocol modeling; a run on StratoM just appears to be on a bigger and faster resource. That looks like the start of a well-executed long-game.

    More articles by Bernard…


    Four Barriers to Using an SoC for IoT Projects
    by Daniel Payne on 02-16-2017 at 12:00 pm

    I often read about the large number of expected IoT design starts around the world, so I started to think about what the barriers are for launching this industry in order to meet the projections. One of my favorite IoT devices is the Garmin Edge 820, a computer for cyclists that has sensors for speed, cadence, power, heart rate, altitude and temperature. The Edge 820 also communicates with Bluetooth and ANT+ wireless protocols, and has GPS to track each ride. At the recent ARM TechCon event there was a panel session on this topic of IoT design with participants from ARM, Mentor Graphics, Open-Silicon and Sondrel. This group came up with the following four barriers to using an SoC for IoT projects:

    • Cost of the semiconductor IP blocks
    • Cost of the EDA software tools for design and verification
    • Silicon development costs
    • SoC design experience

    Typical IoT devices use sensors, which means processing analog signals, plus there is typically a processor to run an OS or code. Here’s a snapshot of the building blocks in most IoT chips:

    NRE
This acronym stands for Non-Recurring Engineering, and it appears in three of the top four barriers to creating an IoT project. So what if there was a way to reduce this NRE so that you could do a proof of concept at little to no cost? Now that idea sounds compelling, and it turns out that ARM and Mentor Graphics have done something about it.

    Related blog – Industrial IoT – Beyond Silicon Valley

    ARM DesignStart
We’ve heard about the great success that ARM enjoys as an IP company offering CPU cores for many market segments, and they’ve created a way for designers to get a trial selection of their cores without charge, called DesignStart.

What DesignStart means is that you can download their Cortex-M0 for free and use it for design and simulation. The M0 offers a low-power, 32-bit CPU in a very small footprint.

    EDA Software Tools
    Now that you have a processor and some of your own analog IP blocks, you’ll need some EDA tools to do design exploration. There are free evaluation tools from Tanner EDA, part of Mentor Graphics, that last for 30 days, enough time to do a proof of concept. Design entry is done with schematic capture, and simulation is handled by T-spice for the analog portions and ModelSim for the digital blocks.

    Related blog – IoT from SEMI Meeting: EDA, Image Sensors, MEMS

    Sample IoT Design
    To illustrate how you would use the ARM + Mentor design flow, consider an example IoT design with a sensor, ADC block, and Processor:


For this proof-of-concept design we’re just connecting up the components, not running any code on the M0 processor; rather, we are verifying the simple control logic between the ADC and the processor. In the Design Kit from ARM you’ll receive a pre-integrated processor subsystem with the following peripheral components:

    Our Control Block is shown in dark purple above so we next add Verilog code to describe the behavior using the text editor in S-Edit:

    The control block connects to the subsystem bus, so we use Verilog again after learning a bit about the APB (AMBA Advanced Peripheral Bus) and create a module to define APB inputs and outputs, design IOs, design signals, and port mapping:

In Verilog we connect our peripheral to the M0, then we can write a simple test program for the M0 in C using the ARM Keil MDK-Lite, a software development environment. Here’s the C code that sets the memory-mapped address of APB port 15:
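The original listing isn’t reproduced here, so below is a minimal bare-metal sketch of what such a test program might look like. The 0x40000F00 base address for APB port 15, the register names and the done-flag handshake are illustrative assumptions, not the actual Design Kit definitions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical base address for the custom ADC control block on APB port 15.
   The real address comes from the memory map of the ARM Design Kit subsystem. */
#define ADC_CTRL_BASE   0x40000F00UL
#define ADC_DATA        (*(volatile uint32_t *)(ADC_CTRL_BASE + 0x00))
#define ADC_STATUS      (*(volatile uint32_t *)(ADC_CTRL_BASE + 0x04))
#define ADC_DONE_MASK   0x1UL

#define EXPECTED_COUNTS 215u   /* (1.85 V / 2.2 V) * 256, rounded */

int main(void) {
    /* Wait for the control block to signal that a conversion has completed. */
    while ((ADC_STATUS & ADC_DONE_MASK) == 0) {
        /* spin - in simulation this resolves within a few ADC clock cycles */
    }

    uint32_t counts = ADC_DATA & 0xFFu;   /* 8-bit result from the ADC */

    /* printf is retargeted to the UART model, so these messages appear
       in the simulator transcript. */
    if (counts == EXPECTED_COUNTS) {
        printf("PASS: ADC read %u counts\n", (unsigned)counts);
    } else {
        printf("FAIL: ADC read %u counts, expected %u\n",
               (unsigned)counts, (unsigned)EXPECTED_COUNTS);
    }
    return 0;
}
```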

This C code runs printf statements in the simulator through the UART module. With an ADC input set at 1.8V and an ADC reference of 2.2V, we expect an ADC output value of (1.85V/2.2V) * 256 ≈ 215. If 215 counts are read in simulation the test passes; otherwise it fails.

    Simulating the IoT Design
    Design entry was done with S-Edit and the Verilog-AMS netlist gets split into two parts for simulation in either digital or analog simulators:


    One last step is to create a design testbench that models the analog sensor input as a constant 1.8V, has a clock, and reads IO values for display:

    Our 8-bit ADC does a successive approximation that converts the analog input from the sensor into a digital value, read by the processor. In the waveforms below we can see the Red signal reaching the 1.8V level:
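For readers unfamiliar with how a successive-approximation converter arrives at that count, here is a small behavioral C model of the algorithm – an illustration only, not the actual block used in the design: starting from the MSB, each step trial-sets one bit and keeps it only if the resulting DAC voltage does not exceed the input.

```c
#include <stdio.h>

/* Behavioral model of an N-bit successive-approximation (SAR) conversion.
   Each step trial-sets one bit, compares the resulting DAC voltage with the
   input, and keeps the bit only if the DAC voltage does not exceed the input. */
static unsigned sar_convert(double vin, double vref, int bits) {
    unsigned code = 0;
    for (int bit = bits - 1; bit >= 0; bit--) {
        unsigned trial = code | (1u << bit);          /* propose this bit */
        double vdac = vref * trial / (1 << bits);     /* DAC output for trial */
        if (vdac <= vin)                              /* comparator decision */
            code = trial;                             /* keep the bit */
    }
    return code;
}

int main(void) {
    /* Values from the article: 2.2 V reference, sensor input held constant. */
    printf("1.85 V -> %u counts\n", sar_convert(1.85, 2.2, 8)); /* ~215 */
    printf("1.80 V -> %u counts\n", sar_convert(1.80, 2.2, 8)); /* ~209 */
    return 0;
}
```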

    Summary
It is now possible to do a proof-of-concept SoC design for an IoT project at no cost other than your engineering time, by using processor IP from ARM and EDA tools from Tanner EDA. So the first three barriers listed at the start are now addressed; the fourth barrier is addressed by ARM, which has a list of SoC design partners to help you through the development process. There is an 11-page white paper from Mentor with more details here online. I cannot wait to see all of the new IoT designs coming out over the next few years that will improve my life.


    Aldec Rounds Out ALINT-PRO Checker
    by Bernard Murphy on 02-16-2017 at 7:00 am

    If there’s anyone out there who still doesn’t accept the importance of static RTL verification in the arsenal of functional verification methods, I haven’t met any recently. That wasn’t the case in my early days in this field. Back then I grew used to hearing “I don’t make mistakes in my RTL”, “I’ll catch that in simulation”, “My editor automatically sets the RTL up correctly” and variants on these confident statements of infallibility.

    Positions like these became much less frequent after IP reuse and SoC design took off. You might still feel the same way about your own code, but now you must work with RTL developed by someone no longer at the company, and integrate with other RTL developed by parties even further removed. How can you know what assumptions they made? You don’t have time to reverse-engineer this stuff in simulation, so do you just hope the other designers thought exactly the way you do when they built that code?

This topic is fairly widely understood in the ASIC world, perhaps less so in the FPGA world, where design teams now working on monster FPGA SoC designs are starting to learn the importance of verification disciplines their ASIC counterparts have crafted over many years. A recent survey on trends in verification highlights that FPGA and ASIC verification needs are converging, which is good news for Aldec, whose ALINT-PRO offers a common verification platform for both.

Static verification is your first safety net in functional verification. Naturally it won’t catch complex functional problems but it will get you past the basics – incorrect inferences, unintended truncation, unclocked cycles and other basic design flaws. You could still catch many of these in simulation, but in verification environments of any scale that would be grossly inefficient. Such environments all kick off with smoke tests, including static verification, to ensure basic mistakes are caught before valuable simulation cycles are wasted.

    What’s more, simulation won’t catch everything. Signals crossing between asynchronous clock domains (say between a peripheral port and a central bus) can lock into metastable states or drop cycles, causing all kinds of havoc. While some claim you can catch these problems using simulation approaches, analyses of that kind are invariably incomplete. Static tools like ALINT-PRO have this kind of analysis built-in and since it is static, it is test-case independent, ensuring you will find all potentially problematic crossings.

That said, I’ll now contradict myself by adding that sometimes a combination of static and dynamic analysis is essential to reach more complete domain-crossing verification, especially where functional behavior is an essential part of the check. This often comes up in checking handshaking synchronization schemes. Aldec supports this through close linkages between ALINT-PRO and the Riviera-PRO simulator or other simulators.

    ALINT-PRO also provides pre-defined block-level models for Xilinx primitive libraries and now adds models for most of the Intel/Altera families of devices. This is important. When static analysis tools bump into a hard macro, such as a memory, they need hints on how to proceed, such as whether this is a registered interface and which clock controls the interface. Aldec provides a method for you to define these yourself, but life is a lot easier when models for all the basic blocks are already defined and verified with the FPGA vendor.

    One last point I learned the hard way during my time in these trenches. Static checkers are based on rules and everyone has their own opinion on what rules should and shouldn’t be checked and at what stage. Some users want the whole check to be fire-and-forget so they enable all rules (more must be better, right?) and run. The result is massive volumes of reports they can’t possibly read and which they therefore ignore. Until a silicon spin fails, the boss asks whether they checked the static analysis and there’s a long, awkward silence.


The lesson is that checking everything makes no sense; you must be selective. And what you choose to select is sensitive to where you are in the design cycle. When building a brand-new RTL block, you might want to require more checks to comply with an internal (or external) standard. When checking modifications to a legacy piece of RTL, you need to loosen up; you don’t want to know about coding-style problems in areas you don’t plan to touch. In system integration, you want to focus mostly on functional issues (such as clock and reset domain crossing analysis). ALINT-PRO makes it possible to craft these choices in a way that reflects your local preferences.

    You can read the ALINT-PRO product description HERE.

    More articles by Bernard…


    Using HSPICE StatEye to Tackle DDR4 Rail Jitter
    by Tom Simon on 02-15-2017 at 12:00 pm

The world is a risky place, according to Scott Wedge, Principal R&D Engineer at Synopsys, who presented at the Synopsys HSPICE SIG on Feb 2nd in Santa Clara. Indeed, the world circuit designers face can be uncertain. Dealing with risk and departure from ideal was a main theme in the fascinating talks at this dinner event, which is held every year in conjunction with DesignCon. Scott focused on how it is possible to overcome uncertainty through predictability.

Scott talked about new features in HSPICE that can help designers do their job better and work to manage that risk. Sources of uncertainty include variability and mismatch, and over time MOS aging can contribute to reliability issues. With smaller Vdd, circuits are more vulnerable to noise and jitter. The list goes on. Shortly I’ll get back to one of his items – noise effects that can lead to higher bit error rates in high-speed data paths.

    There were two customer presentations as well. One from Xilinx on how Synopsys worked with them to create new modeling methods to enable advanced node designs. The other customer presentation was from ARM where they discussed large scale Monte Carlo simulation methods.

    John Ellis from the Signal and Power Integrity (SPI) Group at Synopsys gave an impressive presentation titled “Simulating Power Supply Induced Jitter Effects in Low BER DDR4 Interfaces Using StatEye”. While this is quite a mouthful, the title makes very clear what the subject matter of the talk was. With DDR4 and LPDDR4 there is now an explicit BER requirement. Brute force simulation of 10^16 bits leaves two choices: an eternity of SPICE runs, or the use of a Linear Time Invariant (LTI) environment in a statistical simulator. We are caught between the need for transistor level accuracy and the overwhelming time requirements of traditional SPICE.

John’s approach is to leverage HSPICE StatEye functionality to capture simultaneous switching output (SSO) effects – including rail jitter. StatEye at its most basic can rapidly capture the pulse response from an LTI system. This approach works well for modeling the datapath for LPDDR4. However, there is more at work here than the data path by itself. Fortunately, it is possible to use HSPICE StatEye to model nonlinear effects as well.

    The question he posed is: Can we capture SSO performance adequately? His approach is to include the IO model in the StatEye simulation. By adding it to the channel, it is possible to include it in the pulse response – which will be affected by the IO voltage. Drawing the IO current from a non-ideal supply path with rail noise creates a nonlinear response with the output distorted by SSO noise.

    SPICE simulations show differing edge response times for rise and fall when the model is nonlinear. Standard StatEye misses these effects. Synopsys StatEye can model nonlinearities with “Multi-Edge” or “Full Transient” mode. Each has its advantages and limitations.

    Multiple Edge response requires one simulation for each edge. Each response can be saved for future use to shorten run times. Clearly with the addition of each edge response the HSPICE StatEye better matches the SPICE results.

    Full Transient response gives a probability density function (PDF) based on an arbitrary bit stream. However, it cannot be saved and must be rerun for each case. This will serve as a good starting point to see how well we can model the effect of rail jitter on the data path. If this works, Multiple Edge Response is a possible route to increased flexibility and speed.

    John’s presentation included details about the simulation model set up for DDR4. Then he showed how rail noise of around 13% will affect the reference SPICE runs. He compared the “Full Transient” mode of HSPICE to SPICE in predicting rail noise and saw good correlation. If this did not work, the subsequent analysis based on this nonlinearity would not be useful.

    Next he compared HSPICE StatEye in Full-Transient mode for the full datapath including the rail noise to his SPICE results and saw good agreement in the horizontal and vertical openings. VREF was shifted somewhat. Next he worked to include more realistic triggering by factoring in DQS, including jitter. This does require two StatEye runs, but the results are impressive. The first run is used to generate the jitter function and the second one applies this to the full channel. The results are definitely better.

    The next steps were to apply the Multiple Edge mode to the same case. Rather than attempt to cover the details here, I would suggest viewing the entire presentation, along with the others from the meeting.

The end results show that applying HSPICE StatEye with its full capabilities to power and signal integrity problems like this can yield significant insight into system performance. This, in turn, can help reduce risk in high-speed designs by factoring in real-world effects during the design and verification stages of product development.


    Xilinx vs Altera Update 2017
    by Daniel Nenni on 02-15-2017 at 7:00 am

I truly miss the Xilinx versus Altera war of words (competition at its finest), and competition is what makes the fabless semiconductor ecosystem truly great, absolutely. So it was with great disappointment that I read the Intel Analyst Day transcript published by Bloomberg last week. It is attached at the bottom in case you are interested, but to me it is a 41-page snoozefest.

    Here is the detailed (sarcasm) update on the Programmable Solutions Group (Altera) from Intel CEO Brian Krzanich (BK). Please note that PSG was not even on the agenda with the rest of the Intel groups:

    What I’m really proud of for this group is two things for 2016. One, 14 nanometer, first 14 nanometer FPGA shipped to customers, which is on Intel technology…

    Let’s not forget that Intel and Altera signed a foundry agreement exactly four years ago for the 14nm product that they are just now shipping today. This agreement led to the image below showing that the Intel 14nm process is by far superior to the TSMC 16nm process. Xilinx has been in production at 16nm for more than a year now which brings me to my first question for BK:

    Can you please update this slide now that both chips are in production?

    and secondly, in 2016, they hit their growth targets growing faster than market, meaning we believe we gained share in 2016. First year out of an acquisition, we feel part of an acquisition we feel like those are great results and something to be proud of for Intel and the Altera team.


    The FPGA market has an expected CAGR of 8.4% from 2016 to 2020 and according to the recent Xilinx investor call:

1. Sales from our 28-nanometer Zynq product family increased by nearly 20%.
2. 20-nanometer revenue again reached a record level, significantly exceeding our $50 million target.
3. 16-nanometer sales grew significantly in the December quarter to a new record, exceeding our forecast.
4. 16-nanometer is shipping 12 unique products to over 300 active customers.

    What specific market share did you gain in 2016?

    And we’re already starting to look at 10-nanometer products for FPGAs as well. So, we believe we can continue to win share and grow faster than market in FPGAs.

Xilinx is skipping TSMC 10nm in favor of an accelerated 7nm schedule which will go into production in 2018. As history has shown, the first FPGA to a process node wins the majority of market share. Xilinx beat Altera to 28nm by a quarter, beat Altera to 20nm by about a year, and its 16nm beat Altera’s 14nm by more than a year.

    Xilinx Collaborates with TSMC on 7nm for Fourth Consecutive Generation of All Programmable Technology Leadership and Multi-node Scaling Advantage. Four generations of advanced process technology and 3D ICs, fourth generation of FinFETs

    SAN JOSE, Calif., May 28, 2015/PRNewswire/ –Xilinx, Inc. (NASDAQ: XLNX) announced that it has collaborated with TSMC on the 7nm process and 3D IC technology for its next generation of All Programmable FPGAs, MPSoCs, and 3D ICs. The technology represents the fourth consecutive generation where the two companies have worked together on advanced process and CoWoS 3D stacking technology, and will become TSMC’s fourth generation of FinFET technology. The collaboration will provide Xilinx a multi-node scaling advantage and build on its outstanding product, execution, and market success at 28nm, 20nm, and 16nm nodes.

    From what it looks like today, Xilinx 7nm will beat Altera 10nm again by more than a year. In fact, Xilinx may be at TSMC 5nm by the time Altera has ramped up 10nm.

Bottom line: The problem with semiconductor marketing writing checks that engineering can’t cash is that at some point those checks will bounce. Intel’s former bravado and current lack of transparency in the FPGA market have fallen flat, and that is nothing to be proud of.


    Making Functional Simulation Faster with a Parallel Approach
    by Daniel Payne on 02-14-2017 at 12:00 pm

I’ll never forget working at Intel on a team designing a graphics chip. We wanted to simulate to ensure proper functionality before tapeout, but because of the long run times it was decided to make a compromise to speed things up by reducing the size of the display window to just 32×32 pixels. Well, when first silicon arrived, sure enough, the only display that worked was 32×32 pixels, so we had to do another re-spin to correct the logic bugs. In the 1980’s it was quite popular to be using interpreted logic simulators, which made it easy to interactively debug hundreds to thousands of gates.

    In the 1990’s I was working at an EDA company that acquired the simulator company that had just written the fastest, compiled-code Verilog simulator. Wow, what a dramatic improvement over the older, interpreted logic simulators.

    Today, we have SoCs with billions of gates, so this extreme size has really pushed the EDA vendors to come out with something new that can handle that capacity with run times that take hours to days, instead of weeks. The new approach to deal with these present day challenges is a 3rd generation, parallel simulation engine that scales. Here’s a chart showing the three generations of functional simulators:

I spoke by phone with Adam Sherer of Cadence Design Systems recently to get his insight about functional simulation since the 1980’s. It turns out that back in early 2016 Cadence acquired the start-up Rocketick and its parallel simulator, RocketSim. Yes, most of the EDA companies had been trying to develop their own parallel simulators, but the earliest results were not promising enough to become viable products because of poor scaling and manual compile processes. The real accomplishment of RocketSim was to provide a parallel simulator that could:

    • Handle multiple cores
    • Accept multiple clocking domains
    • Work with complex interconnect fabrics
    • Simulate hundreds of IP cores
    • Scale to billions of components
    • Support RTL, gate-level functional simulation and gate-level DFT

    Related blog – EDA Mergers and Acquisitions Wiki

    The secret sauce behind RocketSim is the ability to identify dependencies among independent threads of execution, while minimizing the memory footprint required. You can expect the following typical speed-ups when using this parallel simulation approach:

    • 3X for Verilog / SystemVerilog RTL
    • 5X for gate-level functional simulation
    • 10X for gate-level DFT

With fine-grain multi-processing technology, you can run RocketSim on multi-core servers using up to 64 cores, and it knows how to separate your code into portions that can be accelerated and portions that cannot. As a user of this simulator, you don’t need to change your testbench, design or even your assertions – now that’s convenient.

    Forum – CDNS reports increases in Q1 2016 results, $448M revenue, $0.17/share earnings

The largest SoC teams have long used hardware-based engines like Palladium to get even faster runtimes, although that approach can become pricey compared to software simulators. One difference between a software simulator like RocketSim and a hardware engine like Palladium is that RocketSim handles four-state logic, which includes the Z and X states, while the hardware engine supports only two-state logic.

    Related blog – Improving Methodology the NVIDIA Way

    I was impressed to learn that the RocketSim team, based in Israel, has actually grown in size since being acquired by Cadence, always a positive sign that the team is being treated well and that the marketplace is growing for a parallel simulator.

    Summary
Functional simulation has come a long way since the 1980’s, so we are living in exciting times as the promise of parallel simulation is being adopted to keep simulation run times reasonable, instead of having to wait weeks and months for regression results. Adam Sherer has written a white paper on this topic that you may read online here.


    The Next Big Thing in Deep Learning
    by Bernard Murphy on 02-14-2017 at 7:00 am

    I mentioned adversarial learning in an earlier blog, used to harden recognition systems against bad actors who could use slightly tweaked images to force significant misidentification of objects. It’s now looking like methods of this nature aren’t just an interesting sidebar on machine learning, they are driving major advances in the field (per Yann LeCun at Facebook).

    The class of systems considered in these approaches are called Generative Adversarial Networks (GANs) in which one neural network is played off against another. One network, called the discriminator, performs image recognition with a twist – it reports on whether it believes the image to be real or fake (artificially constructed). The second network, called the generator, reverses the normal function of a recognition system to create artificial images which it feeds to the discriminator. If the discriminator determines an image to be fake, it feeds back information to the generator on what caused it to come to that conclusion.

The beauty of this setup is that this pair of networks, after a bootstrap on a relatively modest set of real images, can self-train to recognition/generation levels of quality that would normally require a much larger database of labeled images. This is a big deal. A standard reference for images, ImageNet, contains over 14 million images across 1000 categories. That’s for “standard” benchmark images. If you want to train on something outside that set, unless you get lucky you must first build a database of tens of thousands of labeled reference images. But with GAN approaches you can reduce the size of the training database to hundreds of images. That’s not only more efficient, it can be important where access to larger databases can be limited for privacy reasons, as is often the case for medical data.

    This raises an interesting question in deep learning – if GAN-enhanced training on a small set of examples can achieve similar levels of recognition to (non-enhanced) training on a much larger set, doesn’t that imply significant redundancy in the larger set? But then how do you measure or better yet eliminate that redundancy? This is a question we understand quite well in verification, but I’m not aware of work in this area for deep learning training data. I would think the topic should be extremely important. A well-chosen training set, together with GAN methods, could train a system to be accurate in recognition across a wide range of examples. A poorly chosen training set, even with GAN reinforcement, could do no better than recognize well across a limited range. If anyone knows of work in this area, let me know.

    So one thing you get out of GAN is improved learning on smaller datasets. But the other thing you get is improved image generation (because the discriminator is also training the generator). Why would that be useful? I can imagine that movie-makers might find some way to take advantage of this. A more serious application is to support something called inpainting – filling in missing parts of an image. This has obvious applications in criminal investigation as one example.

Another very interesting application is in astronomy, specifically in approaches to mapping dark energy by looking for weak gravitational lensing of galaxies. This is a tricky problem. We don’t really know much about dark energy, and we’re looking for galaxies whose size and shape we don’t know, because they’re distorted by that dark energy. This seems like a problem with too many unknowns, but one group at CMU has found a way to attack the problem through generative creation of galaxy images. They expect to be able to use methods of this nature, together with models of estimated shearing of the images caused by lensing, to map against the images we actually detect. By tuning to get accurate matches they can effectively deduce the characteristics of the dark energy distribution.

    Deep learning marches on. It continues to become more interesting, more capable and more widely applicable. The Nature article that started me on this topic is HERE.

    More articles by Bernard…


    Qorvo Uses ClioSoft to Bring Design Data Management to RF Design
    by Mitch Heins on 02-13-2017 at 12:00 pm

    A couple weeks ago I gave a heads-up about a webinar that was being hosted by ClioSoft, Qorvo and Keysight. The topic of the webinar was how to manage custom RF designs across multiple design teams and CAD flows. The webinar was held on February 1st and included presentations by Marcus Ray of Qorvo and Michele Azarian of Keysight.

Much has been written about ClioSoft’s SOS product. In summary, it’s a great product for data and IP management and it enables companies to manage design complexity across multiple, geographically dispersed design teams. I saw this put into practice in my previous job, where one of our customers, a large semiconductor IC provider, was using ClioSoft with our EDA products to simultaneously work on the same design using teams located in Japan, Europe and the United States. In that case, the task was somewhat simplified by the fact that all of those teams were using tools from one EDA vendor. Nonetheless, it was a powerful statement as to the capabilities that ClioSoft provided.

    In the Qorvo case, they are doing RF system design using multiple IC technologies all assembled onto a 4 to 10-layer laminate that includes matching components embedded in the laminate for each die. Qorvo teams are dispersed across multiple locations around the world with some of those teams using Cadence’s Virtuoso design environment while others are using Keysight’s ADS environment.

    At least three things had to happen to make this flow possible.

    • Cadence had to integrate with ClioSoft SOS – first versions of this were released in 2001, not too long after ClioSoft’s founding in 1997.
    • Keysight had to integrate with ClioSoft SOS – first versions of this were released in 2012.
• Lastly, ClioSoft recently did some interoperability work around OpenAccess (which was already being used by both Cadence and Keysight) to enable the two different EDA vendors to both read and write the same OpenAccess meta-data. This last step enabled true interoperability by giving both systems the same understanding about all of the data being shared.

Data sharing was made much simpler by the fact that both Cadence and Keysight were using the same OpenAccess databases for their design repositories. However, as anyone who has worked with OpenAccess knows, it takes more to interoperate than simply being able to read and write OpenAccess data. The ability to share a common understanding of what is in the database makes all the difference in the world, and ClioSoft’s work to codify this meta-data was a key component to making this interoperability flow work. Once this was in place, the Qorvo design teams were able to seamlessly move back and forth between Virtuoso and ADS while taking advantage of all of the ClioSoft data management capabilities.

      The beauty for Qorvo is that they are able to use ClioSoft SOS to manage and share their RF work across all their various design sites while interoperating between Cadence Virtuoso and Keysight ADS. Another key feature in this setup is that SOS is fairly technology agnostic as seen by the fact that Qorvo is managing multiple IC technologies in addition to laminate substrates. The impact of this is that Qorvo is able to hierarchically build designs using these different technologies and then simulate them together at the system level.

    Another nice feature of this setup is that Qorvo can use SOS’s capabilities to seed the workspaces of each of their design groups. This includes common libraries, IPs, test benches, scripts and data files. Even though the teams are in different geographies, they all have access to a consistent design environment that is kept up to date by design management policies. Designers can selectively pull what they need, but having ready-made setups available saves time and goes a long way toward eliminating simple but critical errors like using out-of-date test benches or high level models that don’t match their lower level implementations.

The webinar did a nice job of explaining the desire and need for data and IP management, and the presenters did a good job of showing how easy it was to move back and forth between the tool sets integrated with ClioSoft’s SOS7. Key features for Qorvo in this heterogeneous environment included revision control of IP and designs, team collaboration across multiple sites, archiving of design revisions, and IP management that is used to trace IP usage in each of their tape-outs. This last item can come in handy when issues are found in an IP block and you need to know which designs may be affected by those issues. Additionally, having archival and versioning control also enables downstream teams such as product and test engineering to have easy access to design data without the need to hunt down design engineers who have moved on to different projects.

This brings up the final point of the webinar, which is that data management really needs to be done across the entire design and manufacturing flow, including requirements and specifications, logic design and test bench generation, IC and laminate layout, tape-out revision control, packaging revisions, and back annotation of empirical data from fabricated parts. These systems are complex and require good data management practices to ensure success, and Qorvo found that ClioSoft SOS was the platform that worked for them.

    If you missed the webinar you can view a video recording of the event here: http://cliosoft.com/corp_web/webinar_recordings/ads_0117_qrvo/request.php

    Additional information can also be found for each vendors’ offerings at the following websites:
    ClioSoft: www.cliosoft.com

    Keysight Technologies: http://www.keysight.com/en/pc-1297113/advanced-design-system-ads?cc=US&lc=eng

    Cadence Design Systems: http://www.cadence.com

    Also Read

    Qorvo and KeySight to Present on Managing Collaboration for Multi-site, Multi-vendor RF Design

    Tool Trends Highlight an Industry Trend for AMS designs

    Managing International Design Collaboration