
Cooley on Synopsys-EVE

Cooley on Synopsys-EVE
by Paul McLellan on 10-02-2012 at 7:56 am

John Cooley has an interesting “scoop” on the Synopsys-EVE acquisition. The acquisition itself is not a surprise: emulation is the one big hole in Synopsys’s product line, and EVE is the perfect plug to fill it. It was also about the only thing Cadence has (apart from PCB) that Synopsys does not.

The interesting thing is that John noticed it via a filing in San Jose district court, where a document said that Synopsys entered into an agreement to acquire EVE on September 27th. Even before the acquisition is announced (when, presumably, Synopsys would inherit the EVE-Mentor legal stuff), Synopsys seems to be suing Mentor to get a ruling that Mentor doesn’t have a leg to stand on.

What I don’t understand is how, under full-disclosure rules, it is acceptable for Synopsys to tell the court without telling analysts and, indeed, everyone else via a press release. Right now I’m in China and for some reason the great firewall won’t let me access the Synopsys website (all that subversive simulation technology or something), but I can access the EVE website and there is nothing there about any acquisition. Synopsys is a $1.75B company and EVE is a $60M-ish company, so perhaps the deal is regarded as completely non-material, but I’ve never seen anything like it: via one obscure channel the company is announcing the acquisition while via the normal channel for announcing these things there is silence. I wonder what would happen if someone formally asked the company about it: “we never comment on takeover rumors” is the usual response, of course.

Presumably more will become clear in the next few days, perhaps even tomorrow. We shall see.

Another interesting question, mainly of prurient interest, is who leaked it to John. I mean I can’t believe he is a daily reader of San Jose court documents.

More details on deepchip here.


Current Embedded Memory Solutions Are Inadequate for 100G Ethernet

Current Embedded Memory Solutions Are Inadequate for 100G Ethernet
by Sundar Iyer on 10-01-2012 at 7:00 pm

With an estimated 7 billion connected devices, the demand for rich content, including video, games, and mobile apps, is skyrocketing. Service providers around the globe are scrambling to transform their networks to satisfy the overwhelming demand for content bandwidth. Over the next few years, they will be looking to network equipment manufacturers to provide high-performance and cost-effective products that will ultimately fulfill the promise of 100G Ethernet. However, network equipment manufacturers must grapple with the stark reality that aggregated line speeds (and, as a result, the requirements on aggregate network processing speeds) have grown at over 25% CAGR in the past decade. This has dwarfed the growth rate of SoC memory clock speeds.

System designers can crank up processing performance with faster processor speeds, parallel architectures, and multicore processors. However, if memory performance cannot keep up, processors will have to wait for memory requests to execute, which will cause the system to stall. Memory performance must be increased. Since memory clock speeds are limited, the next logical step to boost performance is to use multiport memories, which allow multiple memory access requests to be processed in parallel within a single clock cycle.

For example, with the recent launch of 100G Ethernet, there is a need for line cards supporting two or four 100G links per card. As aggregated line rates approach 400 Gb/s (via the more traditional aggregation of 40 x 10 Gb/s ports, or 4 x 100 Gb/s ports in the future), the networking datapath needs to support 600 million packets per second, i.e., 1200 million 64-byte cells per second in the worst case. This requires a clock frequency of 1.2 GHz or more, depending on the design, which is not possible with any single-port memory available today.
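As a rough sanity check on those rates, here is a back-of-envelope sketch; the 20 bytes of per-packet Ethernet overhead (preamble plus inter-frame gap) and the 65-byte worst-case packet for cell segmentation are my assumptions, not figures from the article:

```python
# Back-of-envelope check of the 400 Gb/s packet and cell rates quoted above.
# Assumptions (not from the article): 20 bytes of Ethernet overhead per packet
# (preamble + inter-frame gap), and a 65-byte packet as the worst case for
# cell segmentation, since it needs two 64-byte cells.

LINE_RATE_BPS = 400e9          # aggregated line rate, bits per second
OVERHEAD_BYTES = 20            # preamble + IFG per packet (assumption)

# Worst case for packet rate: minimum-size 64-byte packets
min_pkt_bits = (64 + OVERHEAD_BYTES) * 8
pkt_rate = LINE_RATE_BPS / min_pkt_bits
print(f"packets/s with 64B packets: {pkt_rate/1e6:.0f} M")   # ~595 M, i.e. ~600 M

# Worst case for cell rate: 65-byte packets, each split into two 64-byte cells
pkt_rate_65 = LINE_RATE_BPS / ((65 + OVERHEAD_BYTES) * 8)
cell_rate = pkt_rate_65 * 2
print(f"cells/s with 65B packets:  {cell_rate/1e6:.0f} M")   # ~1176 M, i.e. ~1200 M

# One cell per clock on a single-port datapath implies a clock near 1.2 GHz
print(f"implied clock for one cell per cycle: {cell_rate/1e9:.2f} GHz")
```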

While multiport memories have traditionally had a reputation for being difficult to implement, new technology now makes multiport memories an attractive choice for high-performance networking applications. At Memoir we have developed Algorithmic Memory™, essentially a configurable multiport memory that can be synthesized by combining commercially available single-port memory IP with specialized memory algorithms. These algorithms employ a variety of techniques within the memory, such as caching, address translation, pipelining, and encoding, to increase performance, all of which are transparent to the end user. The resulting memories appear as standard multiport embedded memories (typically with no added clock-cycle latency) that can be easily integrated on chip within existing SoC design flows.

Algorithmic Memory addresses the challenges of memory performance at a higher level. It allows system architects to treat memory performance as a configurable characteristic with its own set of trade-offs with respect to speed, area and power. For example, it is possible to increase the performance of single-port memory by 4X, as measured in memory operations per second (MOPS), by using the single-port memory to generate a four-port Algorithmic Memory.

In addition, the read and write ports of these multiport memories can be configured based on application requirements. For example, as shown in Figure 1, multiport memories with four read ports (e.g., a multiport memory supporting 4R or 1W per cycle) can be used for applications such as lookups, which primarily need read performance and perform only occasional writes. Similarly, applications such as NetFlow and statistics counters, which need equal read and write performance, can use 2R2W multiport memories.


Figure 1. There are no single-port memories that can support networking datapaths at 600M packets per second (1200M 64-byte cells per second). A four-port Algorithmic Memory can deliver up to 2400 million MOPS at 600 MHz, as required for next-generation 10G/100G Ethernet systems. Memory read/write ports can also be configured for application-specific requirements, including asymmetrical reads and writes.
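A quick check of the MOPS arithmetic in the caption: aggregate MOPS is simply the number of memory operations accepted per cycle multiplied by the clock frequency. The helper below is purely illustrative and uses the 600 MHz clock from Figure 1:

```python
# Aggregate memory operations per second (MOPS) = operations per cycle x clock.
def mops(ops_per_cycle: int, clock_hz: float) -> float:
    """Total memory operations per second for a multiport memory."""
    return ops_per_cycle * clock_hz

CLOCK_HZ = 600e6  # 600 MHz datapath clock, as in Figure 1

# A four-port Algorithmic Memory at 600 MHz
print(f"4-port @ 600 MHz: {mops(4, CLOCK_HZ)/1e6:.0f} M MOPS")      # 2400 M

# A 2R2W configuration also sustains 4 operations per cycle
print(f"2R2W   @ 600 MHz: {mops(2 + 2, CLOCK_HZ)/1e6:.0f} M MOPS")  # 2400 M

# A single-port memory at the same clock, for comparison
print(f"1-port @ 600 MHz: {mops(1, CLOCK_HZ)/1e6:.0f} M MOPS")      # 600 M
```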

Depending on the memory IP core that is selected, it is possible to create memories with performance increases of up to 10X more MOPS. In some cases, Algorithmic Memory technology can also be used to lower memory area and power consumption without sacrificing performance. This is because there is a significant area and power penalty when a higher performance memory is built using circuits alone. With Algorithmic Memory technology, it is possible to take a lower performance memory (which typically has lower area and power), incorporate memory algorithms, and synthesize a new memory. The new Algorithmic Memory achieves the same MOPS as a high performance memory built using circuits alone, but can have lower area and power. The area and power savings are even more beneficial for high performance networking ASICs (typically over ~400 mm² and over 80W in power) that are architected at the inflection point of cost, power, yield and buildability.

Since they are implemented as soft RTL, Algorithmic Memories are compatible with any process node or foundry. The memories use a standard SRAM interface with identical pinouts, and integrate into a normal SoC design flow, including ASIC, ASSP, GPP and FPGA implementations.

In summary, memory tends to be the weak link in increasing network performance. Networking wire speeds are increasing faster than memory clock speeds, and networking gear is very memory intensive, often requiring several memory operations per packet. The bottom line is that faster processors alone cannot improve network performance unless we are able to increase the total MOPS. As rates approach 400 Gb/s, there are no practical, viable physical memory solutions other than multiport memories. Algorithmic Memories, which build on physical memories, offer a scalable and versatile alternative that can help alleviate the performance challenges of aggregated 100G Ethernet and beyond.


User Review: iOS 6 on iPad

User Review: iOS 6 on iPad
by Daniel Payne on 10-01-2012 at 11:11 am

Much has been written about the new iPhone 5 and iOS 6 in terms of the features, specifications, bill of materials, and chips used in the design. Today I’ll share my experiences of actually using the new iOS 6 on iPad as an EDA blogger.

Upgrading to iOS6
Clicking the On button, I noticed that the App Store icon had something new, so I just clicked the icon to start the upgrade process. The download took about 5 minutes and the actual install took another 15 minutes or so.
Continue reading “User Review: iOS 6 on iPad”


Toshiba’s ReRAM R&D Roadmap

Toshiba’s ReRAM R&D Roadmap
by Ed McKernan on 09-30-2012 at 11:00 pm

Most companies in the memory business have ReRAM on their radar if not their roadmaps. Toshiba has made some bullish comments about the roadmap and chip size for ReRAM at a recent R&D Strategies Update. At face value, the schedule would put Toshiba quite a bit ahead of their competitors. Over at ReRAM-Forum.com, we have done a little digging into these announcements and wonder what it all means. Read more at the ReRAM-Forum Blog.


Converge in Detroit

Converge in Detroit
by Paul McLellan on 09-30-2012 at 10:04 pm

When I worked for VaST we went to a show that I’d never heard of in EDA: SAE Convergence (SAE is the Society of Automotive Engineers). It is held once every two years and focuses on transportation electronics, primarily automotive, although there did seem to be some aerospace stuff there too. This is an even year, and Convergence is October 16-17.

The conference is held in the Detroit Convention Center (officially the Cobo Center). This is one of those convention centers that the city built in a blighted area (OK, pretty much downtown, but I repeat myself) to try and revitalize it. It didn’t work. The area still has nothing there apart from a couple of hotels that people wiser than us didn’t stay in. We asked in the hotel where there was somewhere we could get breakfast and it turned out to be a 4 mile taxi ride.

You can see the old glories of Detroit: what obviously used to be smart department stores, now just boarded up. It is hard to believe that in the 1950s Detroit was not just rich, it was the richest city in the entire world. The one place in Detroit that seemed to be safe and fun was Greek Town, which really did have some very good Greek restaurants (and lots of other types).

Anyway, Convergence is mostly about automotive electronics. There are really two separate businesses in automotive: safety-critical (airbags, ABS, engine control) and everything else (adjusting your seat, GPS, radios). The issues for the two markets are very different. After all, if you have an acceleration problem like Toyota did a few years ago (almost certainly driver error and nothing to do with electronics), you have a big problem. If you occasionally have problems getting to the next track on your CD, not so much.

Infotainment is pretty much like any other consumer product. A lot of it is ARM-based. There is not much difference between designing an in-car GPS and designing a smartphone. In fact nobody really wants an in-car GPS any more since we already have smartphones, just as the ideal in-car entertainment system is just a plug for our iPhone/Android phone. An expensive multiple-disk CD changer just isn’t needed when we have all our music in our pockets.

Safety-critical is an odd beast. Quite a lot is based on unusual microprocessors that you’ve probably never heard of (the NEC V850, for example). Some is standard (Freescale is a big supplier). There are in-car networks like FlexRay and CAN bus. When I was visiting GM back then, they had two big strategic programs: embedded software and emissions/mileage. Since much of the second comes down to the engine control computers, which are mostly software, the two programs had a large area of overlap.

Synopsys will be at Convergence, booth 719. They will have 3 demos, one in safety critical, one in infotainment and one in multi-domain physical modeling:

  • Virtual Hardware-in-the-Loop & Fault Testing for Safety Critical Embedded Control. This demonstration will highlight how Virtualizer Development Kits based on microcontroller virtual prototypes can help start system integration and test early, as well as facilitate fault-injection testing for safety-critical embedded control applications. Microcontrollers from Freescale, Renesas and Infineon, as well as integration with automotive development tools such as Simulink, Vector and Saber, are supported by Synopsys.
  • Accelerating the Development of Android SW Stacks for Car Infotainment Application. This demonstration will highlight how Synopsys Virtualizer Development Kits for ARM Cortex processors provide an Android-aware debug and analysis framework, allowing for a deterministic and successive top-down debug approach. The demonstration will leverage a virtual prototype of the ARM big.LITTLE subsystem.
  • Multi-Domain Physical Modeling and Simulation for Robust xEV Power Systems. This demonstration will highlight how SaberRD is used to create high-fidelity physical models of key components in an xEV powertrain and how its extensive selection of analysis capabilities can be applied to assess system reliability and achieve robustness goals.

Click here to register for Convergence (sorry, you missed the pre-registration discount).


How Much Cost Reduction Will 450mm Wafers Provide

How Much Cost Reduction Will 450mm Wafers Provide
by Paul McLellan on 09-28-2012 at 9:05 pm

I’ve been digging around the Interwebs a bit trying to find out what the received wisdom is about how big a cost reduction can be expected if and when we transition to 450mm (18″) wafers from today’s standard of 300mm (12″). And the answers are totally all over the place. They vary from about a 30% cost reduction to a cost increase. That is, 450mm wafers may cost more per square cm than 300mm.

Here are some of the issues:

Creating wafer blanks. It is currently very expensive and not expected to come down to equivalent 300mm costs. It turns out that pulling an 18″ ingot of silicon has to be done really slowly because of various stress reasons. A 450mm wafer blank is expected to be twice the price per area of 300mm.

Double patterning. There is some saving during lithography using 450mm versus 300mm because you don’t have to switch from one wafer to the next so often. But the main litho step of flashing the reticle on each die gains nothing from having more die on the wafer. A flash is a flash. And with double patterning that step is a bigger fraction of the whole manufacturing process. Oh, and don’t think EUV will save the day completely. It is so late that the earliest it can be introduced is maybe 9nm, and the expectation is that EUV will require double patterning (it is 13.5nm light, although I don’t really understand why the OPC techniques we used to get, say, 90nm features with 193nm light without double patterning won’t work, but I’m not an expert on optics).

Since 450mm wafers hold roughly twice as many die as 300mm wafers, any given volume involves half as many wafers, so, except for the highest-volume parts like flash, there will be an increase in the number of machine setups (fewer machines, fewer fabs, but more switches from lot to lot).
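As a rough check on the “roughly twice as many die” figure, here is a sketch using the standard gross-die-per-wafer approximation; the 100 mm² die size is an assumption chosen only for illustration:

```python
import math

def gross_die_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Standard approximation: wafer area / die area, minus a correction
    for the partial die lost around the wafer edge."""
    d, s = wafer_diameter_mm, die_area_mm2
    return int(math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s))

DIE_AREA = 100.0  # mm^2, an assumed mid-size SoC die for illustration

dpw_300 = gross_die_per_wafer(300, DIE_AREA)
dpw_450 = gross_die_per_wafer(450, DIE_AREA)
print(dpw_300, dpw_450, f"ratio {dpw_450 / dpw_300:.2f}x")
# Prints roughly 640 and 1490, a ratio of about 2.3x: the raw area ratio is
# (450/300)^2 = 2.25, and edge effects favor the larger wafer slightly,
# consistent with the "roughly twice as many die" figure above.
```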

Cost of equipment. This is the biggie. If 450mm equipment only costs about 1.3X 300mm equipment then the future is probably rosy. But there are people out there predicting cost increases of 3X which will make the transition economically infeasible.

Capital efficiency. The big advantage, if 450mm is a success, is that it is more capital efficient. It doesn’t take so much capital to get the same capacity in place. But if that is really so, can the semiconductor equipment industry survive?

Process generations. Received wisdom is that semiconductor equipment manufacturers are only now getting through their R&D costs and starting to make money on their investment in the 300mm transition. That’s a lot of process nodes. How many more nodes will there be (with current equipment) over which to recover any 450mm transition?

Who wants it? Intel and Samsung. TSMC is apparently a reluctant guest at the party since their business requires smaller runs and lots of changes from one design to another, so realistically their cost savings will be less than Intel’s and Samsung’s whatever happens.

Human resources. The industry is already arguably short of the people needed to develop new processes for the 20nm-14nm-9nm transitions, since the work has become so demanding. Where are the people going to come from to manage the 450mm transition as well? If we get 450mm but not 14nm, what does that do to costs?

SEMI, the semiconductor equipment association, has been against 450mm. Almost everything they have written for the last 5 years has been against it. I can’t see how it will be a win for them. In fact, I think the Intel/TSMC/Samsung investments in ASML are an acknowledgement of that: the R&D investment has to come from the end users of the equipment.

Here’s a SEMI quote, admittedly from 2010: “450-mm wafer scale-up represents a low-return, high-risk investment opportunity for the entire semiconductor ecosystem; 450-mm should, therefore, be an extremely low priority area for industry investment,” says a recent SEMI report.

OK, so that’s not very informative and I haven’t come across anything that convinces me that anyone really knows the answer. But there is a lot of food for thought.


Variation at 28-nm with Solido and GLOBALFOUNDRIES

Variation at 28-nm with Solido and GLOBALFOUNDRIES
by Kris Breen on 09-27-2012 at 9:00 pm

At DAC 2012 GLOBALFOUNDRIES and Solido presented a user track poster titled “Understanding and Designing for Variation in GLOBALFOUNDRIES 28-nm Technology” (as was previously announced here). This post describes the work that we presented.

We set out to better understand the effects of variation on design at 28-nm. In particular, we had the following questions:

  • How does process variation compare between 28-nm and 40-nm?
  • How do process corners compare vs. statistical variation?
  • How can variation be handled effectively in 28-nm custom design?

To answer these questions we looked at GLOBALFOUNDRIES 28-nm and 40-nm technologies, and performed variation-aware analysis and design with Solido Variation Designer.

We first looked at how process variation has changed from GLOBALFOUNDRIES 40-nm technology to the 28-nm technology by measuring transistor saturation current, idsat, for a minimum-size transistor in each technology. The plots below show the statistical distribution of idsat in 28-nm (left) and 40-nm (right) technology with global variation applied. The results show a significant overall increase in idsat variation from 40-nm to 28-nm. When both global variation and mismatch effects were included, the overall increase in idsat variation was lower, but still quite significant. This result underscores the increasing importance of accounting for variation.

Next, we used a delay cell and a PLL VCO design to compare PVT corner simulations with statistical simulations in GLOBALFOUNDRIES 28-nm technology. The plots below show the results for the delay cell (left) and for the PLL VCO (right). From the plots it can be seen that, for the simple delay cell, the FF and SS process corners reasonably predict the tail region of the statistical performance distribution. However, for the more complex PLL VCO, the FF and SS corners do not align well with the best- and worst-case performance of the statistical distribution. This is because simple FF and SS corners are normally extracted based on digital performance assumptions and often do not reflect the true worst-case performance conditions of a particular design. To avoid this fundamental limitation, it is necessary either to increase the number of corners being simulated or to use statistical simulation.

Solido Variation Designer helps with both of these approaches: Solido’s fast PVT analysis can be used when a large number of corners exist to reduce the number of simulations required, and Solido’s sigma-driven corner extraction capability can find accurate statistical corners that can be used instead of the traditional process corners.

Variation-Aware Design with the 28-nm PLL VCO

With the PLL VCO design, we used Solido Variation Designer to perform both fast PVT corner analysis and statistical design (3-sigma and high-sigma analysis).

PVT corner analysis is an integral part of a variation-aware design flow, taking into account different process conditions for various devices (e.g., MOS, resistor, capacitor) as well as environmental conditions that may affect the design (e.g., voltage, temperature, bias, load). When designing in 28-nm technology, the number of corner combinations can readily become very large.

For the PLL VCO design, the conditions that needed to be taken into account were the process conditions for MOS, resistor, and capacitor devices, along with voltage and temperature conditions. Even with just these few conditions, several hundred combinations needed to be considered to properly cover the design. Since the PLL VCO has a relatively long simulation time to measure its duty cycle, it would be very time consuming to simulate all of these combinations.
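To see how quickly the corner count grows, here is a small sketch; the specific per-condition counts (five MOS process corners and three values each for resistor, capacitor, voltage, and temperature) are hypothetical, chosen only to show how a handful of conditions multiplies out to several hundred combinations:

```python
from itertools import product

# Hypothetical condition values, not the actual GLOBALFOUNDRIES corner set.
conditions = {
    "mos":         ["TT", "FF", "SS", "FS", "SF"],
    "resistor":    ["typ", "min", "max"],
    "capacitor":   ["typ", "min", "max"],
    "voltage":     ["0.9*Vdd", "Vdd", "1.1*Vdd"],
    "temperature": ["-40C", "25C", "125C"],
}

# Every combination of every condition must be covered in an exhaustive sweep.
corners = list(product(*conditions.values()))
print(len(corners))  # 5 * 3 * 3 * 3 * 3 = 405 combinations to simulate
```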

The steps we used for fast PVT design with Solido Variation Designer are shown below:

Solido Variation Designer’s Run Fast PVT application made it possible to find the worst-case conditions 5.5x faster than running all combinations. Furthermore, the results from Fast PVT could then be used with Solido’s DesignSense application to determine the sensitivity of the design under variation-aware conditions. This made it much more practical to perform iterative design, and to explore opportunities to improve the duty cycle performance under worst-case conditions.

Statistical variation analysis and design is another key part of variation-aware design. As discussed earlier, corner analysis does not always capture design performance properly under variation conditions. Furthermore, specific sigma levels, such as 6-sigma, often need to be achieved, which requires the use of statistical simulation.

Solido Variation Designer makes it possible to verify designs at any practical sigma level. For lower-sigma designs, such as 3-sigma, Solido Variation Designer can be used with GLOBALFOUNDRIES models to extract 3-sigma corners, perform design iteration, and verify the design. The image below shows the result of performing 3-sigma corner extraction on the PLL VCO duty cycle. As can be seen in the image, Variation Designer was able to extract corners for the PLL VCO design that bound the distribution much better than the FF/SS corners.

For higher sigma levels, using Solido Variation Designer’s High-Sigma Monte Carlo (HSMC) capability makes it possible to perform fast, accurate, scalable and verifiable high-sigma design. This is important for high-volume and/or high-quality designs including memory, standard cell, automotive, and medical applications.
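To get a feel for why brute-force Monte Carlo breaks down at high sigma (my own illustration, not a description of how HSMC works): the number of random samples needed just to observe a handful of failures is roughly the desired failure count divided by the normal tail probability at the target sigma.

```python
from scipy.stats import norm

# Rough number of brute-force Monte Carlo samples needed to see ~10 failures
# at a given one-sided sigma level. Illustrative only; high-sigma methods
# avoid this cost by steering samples toward the failure region.
TARGET_FAILURES = 10  # an assumed minimum for a meaningful failure-rate estimate

for sigma in (3, 4, 5, 6):
    p_fail = norm.sf(sigma)             # one-sided normal tail probability
    samples = TARGET_FAILURES / p_fail  # expected sample count to hit the target
    print(f"{sigma}-sigma: p_fail ~ {p_fail:.2e}, ~{samples:.1e} samples")

# 5-sigma needs on the order of 1e7-1e8 samples and 6-sigma on the order of
# 1e10, far beyond what SPICE-level simulation of a PLL VCO can afford.
```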

The steps we used for high-sigma design with Solido Variation Designer are shown below:

Solido Variation Designer provided a 100,000x and 16,666,667x reduction in the number of simulations required to accurately verify the design to 5- and 6-sigma, respectively. In addition, Variation Designer allowed the extraction of precise high-sigma corners that could be used to design iteratively while taking into account high-sigma variation.
Below are the results from statistical design with the PLL VCO:

It was clear during our work that variation is an important consideration for design at 28-nm. To achieve yielding, high-performance designs, it is necessary to be aware of variation and take it into account in the design flow. Using Solido Variation Designer with GLOBALFOUNDRIES technology makes it possible to do this efficiently and accurately.


Will Paul Otellini Convince Tim Cook to Fill Intel’s Fabs?

Will Paul Otellini Convince Tim Cook to Fill Intel’s Fabs?
by Ed McKernan on 09-27-2012 at 8:30 pm

An empty Fab is a terrible thing to waste, especially when it is leading edge. By the end of the year Intel will, by my back of the envelope calculation, be sitting with the equivalent of one idle 22nm Fab (cost $5B). What would you do if you were Paul Otellini?

Across the valley, in Cupertino, you have Tim Cook, whose modus operandi is to scour the world for underutilized resources and plug them into the ever-growing Apple Keiretsu at below market prices. It’s always time to go more vertical.

With the launch of the iPhone 5 behind him and the supply chain ramped to deliver 50MU of iPhone 5 in Q4, there seems to be a silly game in the press of trying to raise Tim’s dander about all that is wrong in the Apple ecosphere. The component shortages that exist today are in reality the flip side of the coin known as unlimited demand on day one of a new product launch. However, with Samsung ever on Apple’s heels, the game doesn’t stop, and Apple must continue to innovate as well as wring out supply chain inefficiencies. The one inefficiency that, no doubt, is staring Cook in the eye for 2013 is the A6 processor, currently in production in Samsung’s Austin Fab. It is the last major component being produced by Samsung and it needs to move to a friendlier foundry.

For months the rumor mills have been rattling with stories of a TSMC-Apple partnership at 20nm targeting first production at the end of 2013. This seems logical, given that Apple is moving to a two-supplier model across most of its major components. If they were to continue with this strategy, then it would mean picking up another foundry (i.e., Global Foundries or Intel) to go hand in hand with TSMC and avoid any single point of failure due to “Acts of God” or unforeseen upside, both of which we have seen in the past 24 months.

Intel’s announcement a couple of weeks ago of a PC slowdown in Q3 came with a hint that 22nm is yielding well. If, however, Intel’s revenues going forward are flat or only slightly rising, as opposed to the 24% growth they experienced in both 2010 and 2011, then the Fab expansion plans they outlined last year for 22nm and 14nm raise the question: for what reason? Perhaps it was the only strategy that Otellini could logically employ as Intel tries to outrun TSMC and Samsung.

A year ago, there were doubts as to whether Intel’s new 22nm FinFET process would yield as well as previous process technologies. If the PC market and the data center continued to grow as in past years, and if Ivy Bridge were to cannibalize the graphics cores of AMD and nVidia, then the argument could be made to expand Intel’s 22nm Fab footprint from 3 to 4. And so it is expected that at year-end the 4th Fab will come on line while Intel is swimming in well-yielding Ivy Bridges. Look out below, AMD and nVidia: your days may be numbered in a soft PC market.

The addition to Intel’s capex budget of two mammoth 14nm Fabs that can be upgraded to 450mm seems to speak of insanity, unless Intel expects them to come on line much sooner and believes they truly do represent a 4-year lead over competitors. Mark Bohr at IDF mentioned that 14nm will be ready for production at the end of 2013, and word is that the 14nm successor to Haswell, called Broadwell, is already up and running Windows. This begs the question: is Broadwell really two years away from production, or will Intel launch it early, thus setting up a 22nm-to-14nm Fab transition in 2H 2013? Otellini would seem to be in a position to deploy his large, highly efficient 22nm aircraft carriers in any number of foreign oceans, wreaking havoc. Or perhaps to aggressively leverage them for a long-term fab deal with Apple.

If Otellini were to offer Apple free wafers, would Tim Cook disregard it? Preposterous, you say. OK, but this is what game theory is all about. You have to test the limits, and I believe that until the summer slowdown, Otellini’s bid to Apple was to sell wafers at a 60% margin markup.

In this new environment, Otellini will be more likely to offer a price that is closer to cost plus a small adder, for any time starting in the first half of 2013 and extending through 2015. What are the ramifications for Apple? The new A6 processor is a 95mm² die in Samsung’s 32nm process and costs somewhere around $25 (I have seen estimates from $18 to $28). In round numbers, the A6 in Intel’s 22nm process would be about 50mm² in size. If Intel saves Apple $10 a chip, then that is equivalent to $3B a year (at 300MU) dropping to its operating line, which would add nearly $50 to Apple’s stock price (based on a 15.5 P/E).
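The arithmetic behind that last sentence, as a quick sketch; the share count of roughly 940 million is my assumption for Apple’s shares outstanding at the time, not a number from the article:

```python
# Back-of-envelope: what a $10-per-chip saving could mean for AAPL's share price,
# following the same reasoning as the paragraph above.
SAVING_PER_CHIP = 10        # dollars per A6-class chip, from the scenario above
UNITS_PER_YEAR = 300e6      # 300MU of chips per year
PE_RATIO = 15.5             # the P/E multiple used in the article
SHARES_OUTSTANDING = 940e6  # assumption: roughly Apple's 2012 share count

extra_operating_income = SAVING_PER_CHIP * UNITS_PER_YEAR  # $3B per year
extra_market_cap = extra_operating_income * PE_RATIO       # ~$46.5B at 15.5x
per_share = extra_market_cap / SHARES_OUTSTANDING          # ~$49 per share

print(f"${extra_operating_income/1e9:.1f}B of extra income "
      f"-> ~${per_share:.0f} per share at a {PE_RATIO} P/E")
```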

The overriding issue for Intel and Paul Otellini is, as I mentioned before, that they need to move to 14nm as quickly as possible and take as much of the market with them (both x86 and Apple), thereby eliminating the threat posed by TSMC and Samsung as foundries looking to supply a greater percentage of the total semiconductor market built on leading-edge processes. Until the last couple of years, Intel consistently had over 90% of the leading-edge compute semiconductor content delivered with their x86 processors, a legacy that goes back to the transition from IBM mainframes to the desktop PC.

The End Game continues to get more interesting as we get closer to “All In with Leading Edge.”

Full Disclosure: I am Long AAPL, INTC, QCOM and ALTR