
Synopsys Awarded TSMC’s Interface IP Partner of the Year
by Eric Esteve on 11-09-2011 at 9:19 am

Is it surprising to see that Synopsys has been selected Interface IP partner of the year by TSMC? Not really, as the company is the clear leader in this IP market segment (which includes USB, PCI Express, SATA, DDRn, HDMI, MIPI and other protocols like Ethernet, DisplayPort, HyperTransport, InfiniBand, Serial RapidIO…). But looking five years back (to 2006), Synopsys was competing with Rambus (no longer active in this type of business), ARM (still present, but not very involved), and a bunch of “defunct” companies like ChipIdea (bought by MIPS in 2007, then sold to Synopsys in 2009) and Virage Logic (acquired by Synopsys in 2010). At that time, the Interface IP market was worth $205M (according to Gartner) and Synopsys had a decent 25% market share. Since then, growth has been sustained (see the picture showing the market evolution for USB, PCIe, DDRn, SATA and HDMI) and in 2010 Synopsys is enjoying a market share of… be patient, I will disclose the figure later in this document!

What we can see in the above picture is the negative impact of the Q4 2008 through Q3 2009 recession on the growth rate of every segment except the DDRn memory controller. Even though the market recovered in 2010, we should only return to 20-30% growth rates in 2011. What will happen in 2012 depends, as always, on the health of the global economy. Assuming no catastrophic event, the 2010/2011 growth should continue, and the interface IP market should reach a $350M level in 2012, 58% larger than in 2009 (a 17% CAGR over these three years).

The reasons for growth are well known (at least for those who read SemiWiki frequently!): the massive move from parallel I/Os to high-speed serial, and the ever-increasing need for bandwidth, not only in networking but also in the PC, PC peripheral, wireless and consumer electronics segments, simply because we (the end users) exchange more data through email and social media, and watch movies or listen to music on various, and new, electronic systems. Another reason is that these protocol standards are not falling into commoditization (which would badly impact interface IP selling prices), as the various standards organizations (SATA, USB, PCIe, DDRn to name the most important) keep releasing new protocol versions (PCIe Gen-3, USB 3.0, SATA 6G, DDR4) which help keep IP selling prices high. For the mature protocols, chip makers expect the IP vendors to port the PHY (the physical, technology-dependent part) to the latest technology nodes (40 or 28 nm), which again helps keep prices in the high range (half a million dollars or so). Thus the market growth will continue, at least for the next three to four years. IPnest has built a forecast dedicated to these interface IP segments, up to 2015, and we expect sustained growth for a market climbing into the $400M to $450M range (don’t expect IPnest to release a three-digit-precision forecast; that would simply be anti-scientific!)…

But what about Synopsys’ position? Our latest market evaluation (one week old), integrated in the “Interface IP Survey 2005-2010 – Forecast 2011-2015”, shows that in 2010 Synopsys not only kept the leader position but consolidated it, moving from a 25% market share in 2006 to more than 40% in 2010. Even more impressive, the company holds at least a 50% share (sometimes more than 80%) in the segments where it plays, namely USB, PCI Express, SATA and DDRn, with the exception of HDMI, where Silicon Image is really too strong; on a protocol they invented, that makes sense!

All of the above explains why TSMC made the right choice, and any other decision would not have been rational… except maybe deciding to develop (or at least market) the interface IP functions themselves, as the FPGA vendors do…

By the way, if you plan to attend IP-SoC 2011 on December 7-8 in Grenoble, don’t miss the presentation I will give on the Interface IP market; see the conference agenda.

Eric Esteve from IPnest. The Table of Contents for the “Interface IP Survey 2005-2010 – Forecast 2011-2015” is available here.


Using Processors in the SoC Dataplane
by Paul McLellan on 11-08-2011 at 9:17 am

Almost any chip of any complexity contains a control processor of some sort. These blocks are good at executing a wide range of algorithms, but there are often two problems with them: the performance is inadequate for some applications, or the amount of power required is much too high. Control processors pay with limited performance and excessive power in order to get the huge flexibility that comes from programming a general-purpose architecture. But for some applications, such as audio, video, baseband DSP, security, protocol processing and more, it is not practical just to write the algorithm in software and run it on the control processor. These functions comprise the dataplane of the SoC.

In many designs the dataplane is now the largest part of the SoC and, as a result, typically takes the longest to design and verify. The design is typically parceled out to several design groups, each designing and verifying its part of the SoC. Meanwhile the software team is trying to start software development, but this goes slowly and requires significant rework because the hardware design is not yet available.

A typical dataplane application consists of the datapaths themselves, where the main data flows, along with a complex finite-state machine (FSM) that drives the various operations of the datapath. Think of a function like MPEG video decode. The video data itself flows through the datapaths, and the complex algorithms used to decompress that data into a raw pixel stream are driven by a complex state machine off to one side (see diagram).
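
To make that datapath/controller split concrete, here is a minimal Verilog sketch (my own illustration, not Tensilica’s architecture or code): a tiny multiply-accumulate datapath whose operation is sequenced by a small state machine off to one side.

// Hypothetical sketch: a tiny datapath (multiply-accumulate) sequenced by
// a side FSM, illustrating the datapath/controller split described above.
module dataplane_sketch (
  input  wire        clk, rst_n, start,
  input  wire [7:0]  a, b,
  output reg  [15:0] acc,
  output reg         done
);
  localparam IDLE = 2'd0, RUN = 2'd1, DONE = 2'd2;
  reg [1:0] state;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      state <= IDLE;
      acc   <= 16'd0;
      done  <= 1'b0;
    end else begin
      case (state)
        IDLE: begin                 // wait for the controller to kick off a job
          done <= 1'b0;
          if (start) begin
            acc   <= 16'd0;
            state <= RUN;
          end
        end
        RUN: begin                  // datapath operation driven by the FSM
          acc   <= acc + a * b;
          state <= DONE;
        end
        DONE: begin
          done  <= 1'b1;
          state <= IDLE;
        end
        default: state <= IDLE;
      endcase
    end
  end
endmodule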


There are two challenges with this architecture. First, RTL just isn’t a very good way to describe complex data operations, and as a result it takes too long to create and verify. Second, the complex algorithmic nature of the control function is not a good match for an implementation in bare silicon, since any change, even a minor one, to tweak the algorithm typically requires a complete respin of the chip.

One obvious approach would be to use the control processor to drive the dataplane, but often it does not have enough performance (for high-performance applications like router packet analysis) or consumes too much power for simple ones (like MP3 decode). In addition, the control processor is typically busy doing other things, so the only practical implementation requires adding a second control processor.

A better approach is to use a dataplane processor (DPU). This can strike the right balance between performance, power and flexibility. It is programmable, but doesn’t come with a general-purpose processor’s overheads, and it is not as hard-wired, or as expensive to create, as dedicated RTL. The FSM can be implemented extremely efficiently while retaining a measure of programmability.

The main advantages are:

  • Flexibility. Changing the firmware changes the block’s function.
  • Software-based development, with lower cost tools and ease of making last minute changes.
  • Faster and more powerful system modeling, and more complete coverage
  • Time-to-market
  • Ability to make post-silicon changes to the firmware to tweak/improve algorithms
  • DPU processor cores are pre-designed and pre-verified. Just add the firmware.

So the reasons to use a programmable processor in the dataplane of an SoC are clear: flexibility, design productivity and, typically, higher performance at lower power. Plus the capability to make changes to the SoC after tapeout without requiring a re-spin.

The Tensilica white paper on DPUs is here.



RTL Power Models
by Paul McLellan on 11-08-2011 at 8:00 am

One of the challenges of doing a design in the 28nm world is that everything depends on everything else, yet some decisions need to be made early, with imperfect information. The better the information we have, the better those early decisions will be. One area of particular importance is selecting a package, designing a power network and generally putting together a power policy. Everything is power sensitive these days, not just mobile applications but anything that is going to end up inside a cloud computing server farm, such as routers, disk drives and servers. Power gating and clock gating make things worse, since they can cause very abrupt transitions as inrush current powers up a sleeping block or a large block suddenly starts to be clocked again.

But these decisions about power can’t wait until physical design is complete. That is too late. And the old way of handling things with a combination of guesswork and Excel spreadsheets is too cumbersome and inaccurate. The penalties are severe: underdesigning the power delivery network results in failures; overdesigning it results in a larger die or a more expensive package. The ideal approach would be to do the sort of analysis that is possible after physical design, but at the RTL level.

That is basically what Apache is announcing today. An RTL power model (RPM) of the chip combines three technologies.

  • Fast critical frame selection
  • PACE (PowerArtist Calibrator and Estimator)
  • RTL (pre-synthesis) power analysis

To do power analysis requires vectors (well, there are some vectorless approaches but they are not very accurate). But all vectors are not created equal. There might be tens of millions of vectors in the full verification suite, but for critical power analysis there may be just very short sequences of a dozen vectors that cause the maximum stress to the power network or which cause the maximum change in the current. These critical frames are the ones that are needed to ensure that there are no power problems due to excessive current draw or excessively fast change that, for example, drains all the decaps.

PACE, PowerArtist Calibrator and Estimator, is a tool for analyzing a chip for a similar application and process and generating the parameters that are required to make the RTL power analysis accurate for similar designs. Obviously power depends on things like the capacitance of interconnect, the cell library used, the type of clock tree and so on. PACE characterizes this physical information allowing RTL-power based design decisions to be made with confidence.

With these two things in place, the critical frames and the data in PACE, plus, of course, the RTL, it is now possible to generate an RTL Power Model for the design. The RPM can then be used in Apache’s analysis tools. How accurate is it? Early customers have found that the RPM is within 15% of the values from actual layout with full parasitics. For example, here is a comparison of package resonance frequency comparing just the package (green) with the package along with a CPM (chip-power-model) from layout (red) and with the package along with a CPM from RPM (blue).


And another example, showing the change in resonance frequency as the size of the decoupling capacitors is changed.

More and more designs are basically assemblies of IP blocks, mostly RTL. The RTL is often poorly understood since it is purchased externally, created in another group overseas, or comes from a previous incarnation of the design. This makes early analysis like this even more important; it avoids both underdesigning the power delivery network and risking in-field failures, and overdesigning it and being uncompetitive due to cost.


Managing Test Power for ICs
by Beth Martin on 11-07-2011 at 12:17 pm

The goal for automatic test pattern generation (ATPG) is to achieve maximum coverage with the fewest test patterns. This conflicts with the goals of managing power, because during test the IC is often operated beyond its normal functional modes to get the highest quality test results. When switching activity exceeds a device’s power capability during test, it can have detrimental effects on the IC, such as collapse of the power supply, switching noise, and excessive current that could lead to Joule heating and connection failure. These effects lead to false failures, and can damage the IC in ways that decrease its lifetime.

When planning for power-aware test and creating production test patterns, several techniques should be used to manage power during test. Using the techniques I outline here, scan shift switching, the largest contributor to power usage during test, can typically be reduced from 50% (the normal level, given an even distribution of 1s and 0s) to 25% with minimal impact on test time.

Reduce the clock frequency to allow power to dissipate and reduce the heating and average power. However, this could exacerbate a problem with instantaneous power because the circuit will settle more between the lower frequency clock pulses.

Skew the clocks such that they rise at different points within the cycle, thus reducing instantaneous power. This technique is highly dependent on the clock and circuit design, does not help with average power, and in some cases will not address localized problems.

Use a modular test approach to manage switching activity during production test by sequencing test activity and controlling power on a block-by-block basis. This method requires configuring test pattern generation so that only active blocks are considered during the ATPG process and the remaining blocks are held in a steady state.

Manage switching activity during scan test with various ATPG strategies. In standard ATPG, pattern generation targets the maximum number of faults in the minimum number of patterns. This approach leads to high levels of switching activity, usually in the early part of the test set. But if you relax the rate of fault detection and set a switching threshold as part of the initial constraints for ATPG, the coverage rate is spread throughout the entire test set, leading to a lower average switch activity.

Use clock-gating to limit capture power by holding state elements that are not being used to control or observe targeted faults. ATPG controls the clock-gating logic with either scan elements or primary pins. Hierarchical clock controls can provide a finer level of control granularity while using fewer control bits.
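
As a sketch of the idea behind that last technique (a generic illustration, not Tessent-specific logic), a test-controllable clock gate might look like this in RTL, where a scan-loaded enable bit lets ATPG hold an idle block’s clock quiet during capture:

// Sketch of a test-controllable clock gate (latch-based ICG style).
// The enable can come from functional logic or, in scan mode, from a
// scan-loaded control bit so ATPG can keep unused blocks quiet.
module test_clock_gate (
  input  wire clk,
  input  wire func_en,     // functional clock enable
  input  wire test_en,     // scan-loaded enable used during capture
  input  wire scan_mode,   // 1 = ATPG controls the gate
  output wire gclk
);
  wire en = scan_mode ? test_en : func_en;
  reg  en_lat;

  // Latch the enable while the clock is low to avoid glitches on gclk.
  always @(clk or en)
    if (!clk)
      en_lat <= en;

  assign gclk = clk & en_lat;
endmodule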

Achievable Results
Scan shift switching can typically be cut in half with minimal impact on test time. Capture switching activity can also be significantly reduced, but this is highly design dependent. By leveraging features already in place at the system level to control power dissipation in functional mode, power used during production test can be effectively managed. When design-level approaches to power control are combined with ATPG and BIST techniques, you can achieve high-quality test and manage power integrity.

For more information on low power testing, download my new white paper “Using Tessent Low Power Test to Manage Switching Activity.”

By Giri Podichetty on behalf of Mentor Graphics.


3D Transistors @ TSMC 20nm!
by Daniel Nenni on 11-06-2011 at 12:51 pm

Ever since the TSMC OIP Forum, where Dr. Shang-Yi Chiang openly asked customers, “When do you want 3D transistors (FinFETs)?”, I have heard quite a few debates on the topic inside the top fabless semiconductor companies. The bottom line, in my expert opinion, is that TSMC will add FinFETs to the N20 (20nm) process node in parallel with planar transistors, and here are the reasons why:



Eliminating Excess:
In the next few years, traditional planar CMOS field-effect transistors will be replaced by alternate architectures that boost the gate’s control of the channel. The UTB SOI replaces the bulk silicon channel with a thin layer of silicon mounted on insulator. The FinFET turns the transistor channel on its side and wraps the gate around three sides.

The 1999 IEDM paper “Sub 50-nm FinFET: PMOS” started the 3D transistor ball rolling, then in May of 2011 Intel announced a production version of a 3D transistor (Tri-Gate) technology at 22nm. Intel is the leader in semiconductor process technologies, so you can be sure that others will follow. Intel has a nice “History of the Transistor” backgrounder in case you are interested. Probably the most comprehensive article on the subject was just published by IEEE Spectrum: “Transistor Wars: Rival architectures face off in a bid to keep Moore’s Law alive”. This is a must-read for all of us semiconductor transistor laymen.


Down and Up:
A cross section of UTB SOI transistors and a micrograph of an array of FinFET transistors.

Why the push to 3D transistors at 20nm?

Reason #1 is because of scaling. From 40nm to 28nm we saw significant opportunities for a reduction in die size and power requirements plus an increase in performance. The TSMC 28nm gate-last HKMG node will go down in history as the most profitable node ever, believe it! Unfortunately standard planar transistors are not scaling well from 28nm to 20nm, causing a reduction of the power/die savings and performance boost customers have come to expect from a process shrink. From what I have heard it is half what was expected/hoped for. As a result, TSMC will definitely offer 3D transistors at the 20nm node, probably as a mid-life node booster.



Shrinking Returns:
As transistors got smaller, their power demands grew. By 2001, the power that leaked through a transistor when it was off was fast approaching the amount of power needed to turn the transistor on, a warning sign for the chip industry. As these Intel data show, the leakage problem eventually put a halt to transistor scaling, the progression called Dennard’s law. Switching to alternate architectures will allow chipmakers to shrink transistors again, boosting transistor density and performance.

Reason #2 is that TSMC can, and it will offer a significant competitive advantage against the second-source foundries (UMC, GFI, SMIC). Dr. Chenming Hu is considered an expert on the subject and is currently the TSMC Distinguished Professor of Microelectronics at the University of California, Berkeley. Prior to that he was the Chief Technology Officer of TSMC. Hu coined the term FinFET 10+ years ago when he and his team built the first FinFETs and described them in the 1999 IEDM paper. The name FinFET was chosen because the transistors (technically known as field-effect transistors) look like fins. Hu didn’t register patents on the design or manufacturing process, in order to make it as widely available as possible, and he was confident the industry would adopt it. Well, he was absolutely right!

The push for 3D transistors clearly shows that the days of planar transistor scaling will soon be behind us. It also shows what lengths we will go to in order to continue Moore’s Law. Or, as TSMC says, “More-than-Moore” technologies.


ARM Chips Away at Intel’s Server Business!
by Ed McKernan on 11-06-2011 at 7:40 am


When Intel entered the server market in the 1990s with their Pentium processors and follow-on Xeons beginning in 1998, they focused on simple enterprise applications. At the same time they laid the groundwork for what will turn out to be a multi-decade war to wrest control from all mainframes and workstations. With the announcements this past week by Calxeda of the first 32-bit ARM server chip and by ARM of their new 64-bit server architecture, known as the “v8 core,” we see a similar strategy unfolding. We should not be surprised at ARM’s aggressive push into servers, but we should also recognize that the battle between ARM and Intel will occur over decades, with many new and interesting twists and alliances.

For the past year, Kirk Skaugen, the General Manager of Intel’s Datacenter and Connected Systems Group, has shown the slide to the left to describe not only the growth of the server market but how it is divided into many sub-segments. The overriding message is that Intel is going after it all, but the more important point is that the market fragmentation will cause suppliers to customize their processors for particular markets. The interesting takeaway from the Calxeda (pronounced Cal-Zeda) introduction, which can be found here, is not the fact that it is reported to use 90% less power than an Intel Xeon, or, said another way, to deliver 10X the performance at the same power; it is that they are driving a completely new vector which takes packing density into account.

How many complete servers, with networking and SATA storage interfaces, can you cram into a 1U chassis (think pizza-box size)? The Calxeda chip integrates 4 ARM cores with a DDR3 memory controller, six Gigabit Ethernet connections, four 10G XAUI interfaces, SATA ports to interface up to 5 drives, and four PCIe 2.0 controllers. Think of it as a complete Server on Chip (another use of the acronym SOC). By tuning this chip to under 5 watts, it allows dense packing without all the large, spacious cooling solutions required with Intel Xeons running at up to 110 watts. So what are the prospects for Calxeda? Although it is only a 32-bit chip, the Calxeda solution, assuming it undercuts an Intel Atom with its I/O chipset, could play in the entry-level office and small-business server market, handling rudimentary functions. It will also get a play in the data center, dishing out web pages that don’t handle critical transactions. The datacenter companies have two large operating cost items: the power bill and the equipment cost. Google, Facebook, Amazon and the rest of the big server buyers can’t afford not to look at the Calxeda solution and play around with it. HP recognized this, and that is why they are establishing what they call the “HP Discovery Lab” in Houston for customers to test out their applications on the new server built with Calxeda parts. Assume that Google and the rest have teams of engineers tuning their code for ARM.

One other aspect of the Calxeda business model is interesting. The tradition with many Xeon-based servers is to buy the biggest processor and then load it with VMware for the purpose of dividing one processor among many workloads, since individual workloads rarely utilize even a fraction of a single processor. Calxeda’s model is to say instead that the user should get rid of the VMware and run a single application per processor core. So now we have two economic models fighting it out. I am sure both are suitable for various market segments, but now VMware will align more closely with Intel. Now that we have seen what Calxeda has developed, what should we expect from nVidia, AMCC, and Intel? AMCC announced they have an ARM v8 running on FPGAs. This is fine, but it is too early to tell what direction they are headed in. It is likely that nVidia has something in the labs that they will announce when they have silicon to sample. nVidia has the advantage of selling into the High Performance Computing (HPC) market today and would like to leverage that further into the datacenter. For Intel, there clearly is a need to rethink their architectures in ways that are dramatically different. For the dense server market they will need to imitate Calxeda in the integration of I/O (SATA and Ethernet).

Secondly, they are probably thinking and designing along the path of more heterogeneous processor cores (for example, dropping the x86 floating-point unit in some cases, as Calxeda has done in its server chip). And finally, they no doubt will demonstrate at 22nm that large L3 caches make a huge difference in performance benchmarks, even for simple applications. Why do I say this? Intel’s ultimate strength is in process technology and the fact that they are the world’s greatest designers of SRAM caches. The two go hand-in-hand, as the SRAM cache is the pipe cleaner for each new process technology. The Calxeda parts have a nice L2 cache but no L3 cache. Intel is brute-forcing large L3 caches into Xeon based on the justification that keeping a workload on the processor die and avoiding trips off-chip to DRAM saves power. As Intel builds Xeons with larger caches, it simultaneously sells customers on the value proposition that the high-ASP Xeons (as high as $4,200 per part) save much more than that on their power bills. Today the highest-end Xeon has 30MB of cache; in 12 months this could be 60MB on 22nm. Intel will argue that many applications running on the Calxeda chip must go off to DRAM, whereas with Intel they stay on the processor. This is where the PR performance-per-watt battle is likely to be fought with benchmarks over the next year. If Calxeda and the other ARM server processors gain traction, look for Intel to offer a “Celeron” server processor strategy that drags Calxeda into a scorched-earth price war in a very narrow niche.

Meanwhile, the Intel high-end juggernaut will continue to steam ahead with more cores and bigger L3 caches. No one today can say how this will play out. Competition is good and there is a sense that more alternatives will lead to better economics. The good news is that the battle is joined and we get to live in interesting times.



PC Growth Latches on to the Parabolic Curve of Emerging Markets
by Ed McKernan on 11-04-2011 at 7:56 am

One of the interesting tidbits of information to come from Intel’s October earnings call was that Brazil, a country of nearly 200M people, has moved up to the #3 position in terms of PC unit sales. This was a shock to most people and as usual brushed aside by those not familiar with the happenings of the emerging markets (i.e. the countries keeping the world out of a true depression). A few days ago I saw an article about Brazil’s economy posted on one of my favorite web sites called Carpe Diem. The picture to the left and the following article should put things in perspective (see Brazil to Surpass U.K. in 2011 to Be No. 6 Economy). Brazil’s economy (GDP) has increased 500% since 2002 and is expected to grow another 40% in the next 4 years. Does this not look like the Moore’s Law parabolic curve with which we are all familiar?

For the past year, Intel has reiterated on every conference call and analyst meeting that they conservatively see an 11% growth rate for the PC market over the next 4 years. The Wall St. analysts scoffed that Intel was overly optimistic and used data from Gartner and IDC to back them up. Gartner and IDC were, in Intel’s words, not able to accurately count sell-through in the emerging markets. For those of you not familiar with the relationship between Intel and Gartner/IDC, let’s just say Intel NEVER shares processor data with analysts. It’s a guessing game at best, and therefore Gartner and IDC put together forecasts that are backward looking and biased towards the US and Western Europe. If these two regions are flat while the emerging markets are growing, then you get the picture.

The result of all this is that the understanding of the worldwide PC and Apple markets is skewed towards what sits on the analyst’s desk and not what is sold in the hinterlands. Intel knows best what is going on: there are three distinct markets in the world. There is the Apple growth story that is playing out in the US and Western Europe, cannibalizing the consumer/retail PC market at a fast clip. Then we have the corporate market that is tied to the Wintel legacy, and these machines are selling at an awesome rate. How do we know? Intel’s strong revenue and gross margins tell us. Finally, there is the emerging market that is based almost solely on Intel or AMD with some fraction of Windows (real or imaginary). For this market, the iPad and Mac notebooks are too expensive. Given the growth rate of the emerging market economies, the PC will have a strong future.

Considering that Brazil is just surpassing the UK in GDP and Brazil has 3 times the population of the UK, one can see several trends. First, income is rising to the point where PCs are affordable. Second, there is much more demand coming on stream over the next few years from younger countries with rising salaries. And finally, if, as one would expect, LCD prices continue to fall, DVD drives are discarded, and SSDs finally enter the mix as a cheaper alternative to HDDs in the next 24 months, then there is further room for notebooks to move lower in price. A $300 notebook today that trends to $200 and below may result in a new parabolic demand curve. Moore’s Law shows up again in another unexpected place.


Arteris vs Sonics battle…Let’s talk NoC architecture
by Eric Esteve on 11-04-2011 at 6:23 am

The text of this very first article about Arteris had disappeared from SemiWiki for an absolutely unknown reason… If you missed it, it is a pretty useful introduction to the NoC concept, as well as to the legal battle between Arteris and Sonics:

The Network-on-Chip (NoC) is a pretty recent concept; let’s try to understand how it works. Anybody who has been involved in supercomputer design (as I was in the ’80s) knows that you need a “piece” between the multiple CPUs and memory banks, at that time a “crossbar switch”. To make it outrageously simple: you want to interconnect the M blocks on the left side with the N blocks on the right side, so you create a switch made of MxN wires.
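
To see where the MxN wires come from, here is a deliberately naive Verilog sketch of such a crossbar (my own illustration, not any vendor’s product). Every one of the N outputs needs its own full-width M-to-1 multiplexer, so the wiring and mux cost grows as MxN times the bus width:

// Naive parameterized crossbar: each of the N outputs can select any of
// the M inputs, so cost grows as M x N x WIDTH.
module crossbar_naive #(
  parameter M     = 4,       // number of initiators
  parameter N     = 4,       // number of targets
  parameter WIDTH = 32,      // bus width
  parameter SELW  = 2        // select width, ceil(log2(M))
)(
  input  wire [M*WIDTH-1:0] in_bus,   // M initiator buses, packed
  input  wire [N*SELW-1:0]  sel,      // per-target source select
  output reg  [N*WIDTH-1:0] out_bus   // N target buses, packed
);
  integer i;
  always @* begin
    for (i = 0; i < N; i = i + 1)
      // each target output is a WIDTH-wide M-to-1 mux
      out_bus[i*WIDTH +: WIDTH] = in_bus[sel[i*SELW +: SELW]*WIDTH +: WIDTH];
  end
endmodule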

The “crossbar switch” is as old as the phone industry! If this type of architecture were implemented in a multimillion-gate System-on-Chip, you can easily guess the kind of issues generated: routing congestion, overly long wires, and increasing delay and power consumption due to the interconnect. Thanks to modern telecommunications, the “old” way of building networks has been replaced by “bit-packet switching over multiplexed links”. This simply means that you can use a single physical link to support several functional interconnects, and that you apply the packet transport approach, very popular in newer interface protocols like PCI Express, serialized to further reduce the number of wires. The NoC concept was born in the early 2000s, and the first dedicated research symposium on Networks on Chip was held at Princeton University in May 2007.

Let’s come back to the fight of the day: on my left Sonics, fighting with Arteris, on my right. Sonics was founded in 1996; it’s important to mention this, as it was very early in NoC history. The first products launched by Sonics, like the Synapse 3220, were based on a “crossbar switch” topology, and the “significant benefit,” according to Sonics here, was “its control of the interconnects power dissipation. In a classic bus-based architecture, during any given transaction, all of the wires in the bus are driven, not just those from a given initiator to a given target. This wastes a significant amount of power. Synapse 3220 addresses the problem by activating only the required segment of the interconnect for a given transaction, while the rest is kept deactivated.” As you can see, the product was not packet based, multiplexed, or serialized; it was a crossbar switch where you could deactivate the buses not used in a given transaction. If we look at the NoC products released in 2005 (Arteris was created in 2003), like the SonicsMX, they were still based on a “crossbar switch” (just look at the dark blue blocks):

When Sonics came on the market, they were alone in a niche, enjoyed many design wins and grew their customer base. And they had to keep their legacy interfaces (based on OCP, for example) to satisfy existing customers when developing new products. When Arteris started business (in the mid-2000s), they jumped to the most effective, modern NoC topology: “point-to-point connections rather than the more typical mixture of multiple fan-out meshes. While a more standard hybrid bus will contain a centralized crossbar and numerous wires that create difficult-to-route congestion points, an NoC’s distributed communication architecture minimizes routing congestion points to simplify the chip’s layout effort”.

What was the market’s answer? In 2005, Sonics was still enjoying prestigious NoC design wins at many application processor chip makers for the consumer or wireless handset market (“Broadcom, Texas Instruments, Toshiba Corp., Samsung and several unnamed Original Equipment Manufacturers”). What we see today is that Arteris’ customer list includes Qualcomm, Texas Instruments, Toshiba, Samsung… and also LG, Pixelworks and Megachip! There is nothing like customer design wins to quantify an IP product’s success.

So, I don’t know if Sonics’ patents can be applied to Arteris’ NoC IP (I am not a lawyer, nor a NoC expert), but what I can see is that Sonics came very early to the NoC IP market, using a “crossbar switch” topology, and enjoyed good success in a niche market… where it was the single player. About 10 years later, Arteris came to the same niche, but rich, market (complex SoCs for application processors for wireless handsets, or multimedia processors) with a more innovative product (see above) and pretty quickly (5 years is quick to design in a new concept) won major IDM and ODM sockets… If your product is not good enough, is it time to go legal? I don’t say that is the case, but it looks like it is!

Eric Esteve from IPnest


Learning Verilog for ASIC and FPGA Design
by Daniel Payne on 11-02-2011 at 11:17 am

Verilog History
Prabhu Goel founded Gateway Design Automation, and Phil Moorby wrote the Verilog language back in 1984. In 1989 Cadence acquired Gateway, and Verilog grew into a de-facto HDL standard. I first met Prabhu at Wang Labs in 1982, where I designed a rather untestable custom chip named the WL-2001 (yes, it was named to honor 2001: A Space Odyssey) and was lectured about the virtues of testability. Oh well.

Learning Verilog
Today you can learn Verilog by a variety of means:

  • Buy a book and self study
  • Browse the Internet and self study
  • Attend a class, seminar or workshop

    I’ve learned Verilog through self-study and kept in touch with a corporate trainer named Tom Wille, who operates TM Associates; we both worked at Mentor Graphics. Several years ago Tom asked me to update and deliver a Verilog class for Lattice Semiconductor to use:

    I’ve delivered the Verilog class to both Lattice Semi and other companies in the US. Recently I updated the Verilog class again and trained a group of AEs at Lattice Semi in Hillsboro, Oregon using:

    Class Experience
    Each AE brought in their own laptop computer loaded with software, and I handed out a thick binder with lecture material and notes, plus a smaller binder for lab exercises. Most of the AEs used Aldec and Lattice Diamond on Microsoft Windows; however, one AE ran ModelSim on Linux. Some of the reasons for having an on-site Verilog training class are:

  • Convenient method for engineers to quickly come up to speed and learn Verilog by theory (lecture) and application (labs)
  • Interactive questions encouraged
  • Uses a tested process for learning
  • Learn by doing the labs


    In three days we covered 12 units of study, typically two units before lunch and two units after lunch. Here’s the general outline we followed:

    Day 1
    Unit 1: Introduction

    • Coding and synthesizing a typical Verilog module to be used in the wireless chip
    • Synthesis-friendly features of Verilog-2001
    • Migrating the module from an FPGA prototyping technology to a submicron ASIC technology
    • Wireless chip design spec

    Unit 2: Combinational Logic

    • Effective use of conditional constructs (if-else, case, casez, ?:), as in the sketch after this list
    • Decoding. Priority encoding
    • Code conversion. ROM usage
    • Multiplexing/demultiplexing
    • Iterative constructs (for, while, disable)
    • Signed/unsigned arithmetic
    • Using concurrency
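
    As a flavor of what this unit covers (my own illustration, not one of the class labs), here is a small combinational decoder using an always block, a case statement and a default assignment to avoid unintended latches:

// Illustrative combinational decoder: full default assignment keeps the
// block purely combinational, so synthesis infers no latches.
module decode2to4 (
  input  wire [1:0] sel,
  input  wire       en,
  output reg  [3:0] y
);
  always @* begin
    y = 4'b0000;             // default for every path
    if (en)
      case (sel)
        2'd0: y = 4'b0001;
        2'd1: y = 4'b0010;
        2'd2: y = 4'b0100;
        2'd3: y = 4'b1000;
      endcase
  end
endmodule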

    Unit 3: Sequential Logic

    • Sequential building blocks
    • Registers with synch/asynch reset and clock enable (see the sketch after this list)
    • Parallel/serial converter
    • Ring counter
    • Edge detector
    • Using blocking vs. non-blocking assignments
    • Non-synthesizable constructs and workarounds
    • ASM (algorithmic state machine) charts as an aid to sequential-machine design
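
    Again for flavor (my own sketch, not class material): a register with asynchronous reset, a synchronous clock enable, and non-blocking assignments in the clocked block:

// Illustrative register: asynchronous active-low reset, synchronous
// clock enable, non-blocking assignments.
module reg_en #(
  parameter WIDTH = 8
)(
  input  wire             clk,
  input  wire             rst_n,   // asynchronous, active low
  input  wire             en,      // synchronous clock enable
  input  wire [WIDTH-1:0] d,
  output reg  [WIDTH-1:0] q
);
  always @(posedge clk or negedge rst_n)
    if (!rst_n)
      q <= {WIDTH{1'b0}};
    else if (en)
      q <= d;
endmodule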

    Unit 4: Block Integration

    • Chip-level design and integration issues
    • Coding above the block level
    • Multiple clock domains
    • Partitioning an entire chip into modules
    • Separating blocks with different design goals
    • Registering outputs
    • Maximizing optimization
    • Instantiating IP blocks such as Synopsys DesignWare
    • Instantiating I/O cells using generate loops

    Day 2
    Unit 5: FSMs and Controllers

    • Coding FSM-oriented designs (a small sketch follows this list)
    • ASM (algorithmic state machine) chart usage
    • Mealy vs. Moore insights
    • Modified Mealy FSM with registered next-outputs
    • Hierarchical FSMs
    • Controller for wireless chip
    • Datapath/controller paradigm
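
    For flavor, a minimal two-always-block Moore FSM of the kind this unit builds on (my own illustration, not class material): a simple "10" sequence detector with a registered state and combinational next-state logic:

// Illustrative Moore FSM: detects a "1" followed by a "0" on din.
module seq10_fsm (
  input  wire clk, rst_n, din,
  output wire found
);
  localparam S_IDLE = 2'd0, S_GOT1 = 2'd1, S_FOUND = 2'd2;
  reg [1:0] state, next;

  // State register
  always @(posedge clk or negedge rst_n)
    if (!rst_n) state <= S_IDLE;
    else        state <= next;

  // Next-state logic (combinational)
  always @* begin
    next = state;
    case (state)
      S_IDLE : if (din)  next = S_GOT1;
      S_GOT1 : if (!din) next = S_FOUND;
      S_FOUND: next = din ? S_GOT1 : S_IDLE;
      default: next = S_IDLE;
    endcase
  end

  assign found = (state == S_FOUND);   // Moore output: depends on state only
endmodule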


    Unit 6: Getting the most out of your tools

    • Synthesizable HDL subset
    • Unsupported constructs
    • Excluding simulator-oriented code
    • Using parameters and localparam
    • Name-based redefinition
    • Text substitution
    • Managing code modules
    • Using include directives and definition.vh files
    • Coding for reuse and portability

    Unit 7: Coding for Area

    • Classic area/delay trade-off
    • Avoiding excess logic
    • Reducing ASIC gate count or FPGA LUT usage
    • Minimizing algebraic tree nodes
    • Sharing arithmetic resources
    • Sharing non-arithmetic logic like array indexing
    • Caching recomputed quantities
    • Scheduling over multiple clock cycles


    Unit 8: Coding for Performance

    • Parallelizing operations
    • Minimizing algebraic tree height
    • Resource implementation selection
    • Exploiting concurrency
    • Accommodating late input arrivals

    Day 3
    Unit 9: Verification

    • Verification definition, methodology
    • Testbench architecture
    • Clock generation
    • Timescale
    • Stimulus generation
    • Sampling response at regular intervals or on change
    • Comparison of responses
    • Using $random (used in the sketch after this list)
    • Using fork/join
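
    As a taste of this testbench style (an illustrative sketch, not one of the class labs), here is a small self-checking testbench for the reg_en example sketched under Unit 3 above, showing `timescale, clock generation, $random stimulus and response checking:

// Illustrative self-checking testbench for the reg_en sketch above.
`timescale 1ns/1ps
module tb_reg_en;
  reg        clk = 1'b0, rst_n = 1'b0, en = 1'b0;
  reg  [7:0] d = 8'h00, expected = 8'h00;
  wire [7:0] q;
  integer    i, errors;

  reg_en #(.WIDTH(8)) dut (.clk(clk), .rst_n(rst_n), .en(en), .d(d), .q(q));

  always #5 clk = ~clk;                 // 100 MHz clock

  initial begin
    errors = 0;
    repeat (2) @(posedge clk);          // hold reset for two cycles
    rst_n = 1'b1;
    for (i = 0; i < 20; i = i + 1) begin
      @(negedge clk);                   // drive inputs away from the clock edge
      en = $random;
      d  = $random;
      @(posedge clk);
      if (en) expected = d;             // simple reference model
      @(negedge clk);                   // sample after the flop has settled
      if (q !== expected) begin
        errors = errors + 1;
        $display("Mismatch at %0t: q=%h expected=%h", $time, q, expected);
      end
    end
    if (errors == 0) $display("Test PASSED");
    else             $display("Test FAILED with %0d errors", errors);
    $finish;
  end
endmodule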


    Unit 10: Testbench Techniques

    • Encapsulating tests within tasks
    • Self-checking testbenches
    • File-oriented testbenches
    • Using $readmem
    • Fixed vectors
    • Bus functional models
    • Synchronizing stimuli
    • Named events
    • Accessing the Verilog PLI

    Unit 11: Avoiding Simulation Pitfalls

    • Weaknesses of Verilog-2001
    • Truncation and other risky coding constructs
    • Timescale pitfalls
    • Avoiding race conditions during simulation
    • Avoiding simulation-synthesis mismatches
    • Avoiding simulator bottlenecks and improving performance

    Unit 12: Advanced Topics

    • Bottom-up vs. top-down verification methodology
    • Emergence of static verification
    • Coding guidelines for formal equivalence
    • Co-simulation using Vera
    • Scan-based testing
    • DFT guidelines
    • Future directions.

    The pace is fast and the group of AEs had many questions that were answered and clarified using the white board. More than half of the time is spent in the labs where students really get to apply the theory in a practical way by coding Verilog, debugging and then verifying correct results. We code both designs and test benches.

    In this particular class we did uncover one subtle difference between Verilog simulation results using Modelsim versus Aldec. The student using Modelsim was able to tweak the one lab design to pass the test bench.

    Summary
    If you have a group of engineers that needs to learn Verilog for the first time, or just increase their Verilog understanding then consider contacting Tom Wille to find out if an on-site class might be of value. His company also offers VHDL training and has been in business for many years using a variety of freelance instructors.


    High-efficiency PVT and Monte Carlo analysis in the TSMC AMS Reference Flow for optimal yield in memory, analog and digital design!
    by Daniel Nenni on 11-01-2011 at 9:00 am

    Hello Daniel,
    I am very interested in the articles on PVT simulation. I worked in that area in the past, when I was in process technology development and SPICE modeling, and I also started a company called Device Modeling Technology (DMT), which built a SPICE model library of discrete components, such as bipolar/MOS/power MOSFET/analog switch/ADC/CDA/PLL, sold to companies like Fujitsu, Toshiba, etc.

    We used to have a project, when I worked in R&D, to simulate the process based on the device architecture and send the output data to a simulator called PICE, which is a device simulator, whose output in turn was fed to the input of the SPICE simulator, so that the process simulator, the device simulator and the SPICE simulator were connected.

    We could easily determine the performance of the targeted analog circuit with variations of the process recipe and device structures, and we could also predict the yield at each corner by running the SPICE PVT simulation against the six-sigma SPICE models. However, as you know, performance always has to be traded off against reliability, and you can’t run the circuit simulation together with the reliability models, because no such models are available.

    As a result I do not pay much attention to the results of SPICE simulation, because they can never tell you what the reliability will be, and I still believe real corner-lot wafers are the best way to verify performance, yield and reliability.

    Hi Edward,

    Process variation is of great interest at 28nm and even more at 20nm. In a recent independent survey, variation-aware custom IC design was ranked the number one area requiring advancement over the next two years. The survey revealed:


  • 53% of design groups missed deadlines or experienced respins due to variation issues
  • Designers experienced an average 2 month delay due to variation issues
  • Designers spent an average 22% of design time on variation issues

    For further information, see the Gary Smith EDA analyst report on variation design.

    Here is a recent webinar done by Solido and TSMC on High-efficiency PVT and Monte Carlo analysis in the TSMC AMS Reference Flow for optimal yield in memory, analog and digital design.

    Attendees of this webinar learned:


  • Variation challenges in custom IC design
  • Variation-aware solutions available in the TSMC AMS reference flow
  • Methods to develop and verify designs over PVT corners in less time
  • How to efficiently apply Monte Carlo techniques in design sign-off
  • How Monte Carlo is really possible up to 6-sigma (see the note after this list)
  • Customer case studies of the above methods
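
    To give a rough sense of why the 6-sigma Monte Carlo point above matters (my own back-of-the-envelope arithmetic, not a figure from the webinar): for a normally distributed parameter, the one-sided probability of landing beyond 6 sigma is roughly 1e-9, so verifying a 6-sigma design with brute-force Monte Carlo would take on the order of billions of SPICE simulations just to observe a handful of failures, which is why smarter sampling techniques are needed to make 6-sigma verification practical.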

    Solido customer case studies include:


  • NVIDIA for memory, standard cell, analog/RF design
  • Qualcomm for memory design
  • Huawei-HiSilicon for analog design
  • Qualcomm for I/O design
  • Anonymous for analog/RF design

    Presenters:


  • Nigel Bleasdale, Director of Product Management, Solido Design Automation
  • Jason Chen, Design Methodology and Service Marketing, TSMC

    Audience: Circuit Designers, Design Managers, CAD Engineers