Indecision is actually a very bad decision: it is an implicit decision for the status quo. It is worse than an explicit decision, regardless of how badly that decision turns out. In my experience, our indecisiveness is usually rooted in difficult trade-offs.
Moore’s Law Linear Approximation and Mathematical Analysis!
Gordon Moore gave the world real computing power, and Stephen Hawking's work over many years has revealed the physics and mathematics of the universe. The shrinking size and growing intelligence of computing come from the steady evolution of semiconductor technology: the process node has shrunk from 250 nanometers (1997) to 14 nanometers (2014), and it continues to shrink, though fundamental laws of physics impose limits.
The Revenge of Microprocessor Design: The Return of the Macro
(Two Star Wars™ allusions in one title – eat your heart out, George Lucas.) Most of us are comfortable with the idea that you design more or less whatever you want in RTL and let the synthesis tool pick logic gates to implement that functionality. Sure, it may need a little guidance here and there, but otherwise synthesis is more or less a hands-free operation (subject to meeting timing). Not so for microprocessor designers who, until recently, thanks to very tight margins, more often than not had to implement sequential stages using special large macros, even while they relied on synthesis for other logic.
Now it seems FinFET technologies are driving us all back to large macros. The problem is that, in several cases for a variety of reasons to do with the arcana of FinFET technology, an aggregate of small cells placed and routed using standard methods often has significantly lower performance and higher power than a custom-crafted macro (cue old designers muttering “Well – duh”). Some opportunities to optimize are large bit count register trays, pulse latches and retention flops. There are even valuable performance and power improvements possible in folding larger logic gates into registers.
As always, there's a challenge. Actually two challenges, both in characterization. Large macros simulate much more slowly than small cells because run-time grows steeply with the number of nodes. Worse yet, Monte Carlo Spice, required to deal with acute on-chip variation in FinFETs, massively amplifies run-time on these large circuits. You're caught between an unavoidable need to use these macros to meet power and performance goals and impossibly long times to characterize them with sufficient accuracy to trust that characterization in chip-level analysis.
Maybe you could sample a sparser space in Monte Carlo and apply margins to cover whatever you might have missed? It's pretty clear this approach is no longer viable, especially if you are running at low voltages (0.6V), where the difference between early and late arrivals can be as much as 100%. And we all know (or should know) about the evils of over-margining, which can, for example, produce a device that comes out in spec from one foundry but with much higher power (and lower battery life) from another.
So you have to do comprehensive variance analysis and you know that any standard approach is impossibly slow. What you need is a way to simulate large circuits with Spice-level accuracy but much faster, and a better way than Monte Carlo to deal with mapping the effect of variations on many paths into the characterization. Macro FX™, based on the CLKDA FX simulator, combines solutions to both problems. FX is an intrinsically fast Spice accurate simulator (“check” on the first problem), but what I find most interesting is how it deals with the second problem. To elaborate a little more on that issue, no matter how fast any one Spice simulation runs, Monte Carlo techniques multiply that time by hundreds or thousands in order to cover a sufficiently representative sample space. Whatever gain you may have in simulation speed, you lose that and much more in having to run many, many simulations.
FX on the other hand solves for sensitivities based on variances as it runs simulation, which is a lot more efficient than running simulations many times over. The outcome is a parameterization of variances within the characterization data. This can be output as AOCV, LVF, POCV or SOCV tables, or can be used directly in-line in PathFX™ to get the ultimate in signoff accuracy.
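To make the contrast concrete, here is a toy Python sketch comparing the two approaches. The two-parameter linearized delay model and all the numbers are invented for illustration; this is not the FX algorithm, just the general idea of why propagating sensitivities beats repeated sampling.

```python
import math
import random

# Toy delay model: delay (ps) depends on two process parameters
# (a Vth shift and a channel-length shift), linearized around nominal.
def delay(dvth, dleff):
    return 100.0 + 40.0 * dvth + 25.0 * dleff

sigma_vth, sigma_leff = 0.03, 0.02

# Monte Carlo: many thousands of "simulations" to estimate the spread.
random.seed(1)
samples = [delay(random.gauss(0, sigma_vth), random.gauss(0, sigma_leff))
           for _ in range(100_000)]
mean = sum(samples) / len(samples)
mc_sigma = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

# Sensitivity approach: two extra evaluations give the partial
# derivatives, then variances propagate analytically -- no resampling.
eps = 1e-6
s_vth = (delay(eps, 0) - delay(0, 0)) / eps
s_leff = (delay(0, eps) - delay(0, 0)) / eps
analytic_sigma = math.sqrt((s_vth * sigma_vth) ** 2
                           + (s_leff * sigma_leff) ** 2)

print(round(mc_sigma, 3), round(analytic_sigma, 3))  # nearly identical
```

The Monte Carlo loop needed 100,000 model evaluations; the sensitivity approach needed three. For a real circuit the model is nonlinear and the simulator does the hard work, but the economics are the same.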
You can learn more about how FinFETs are driving a need for larger macros, and how MacroFX helps HERE. To learn more about the effects of variance on timing, especially at low voltages, click HERE.
May the FX be with you.
China Further Expands Its Influence in Semiconductor Industry
For the fourth consecutive year, China’s semiconductor consumption growth far exceeded worldwide market growth. At the end of 2014, the country had a record 56.6% of the global semiconductor consumption market.
China’s semiconductor consumption grew by 12.6% in 2014, exceeding the worldwide chip market growth of 9.8%. To put that in a broader perspective, over the past 11 years China’s semiconductor consumption has grown at an 18.8% compounded annual growth rate (CAGR), compared with a 6.6% CAGR for total worldwide consumption.
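For readers who want to check the arithmetic, a CAGR is just the constant annual rate that compounds to the observed total growth. A quick sketch using the rates quoted above (the $10B starting value is a notional number for illustration):

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate over the given number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Grow a notional $10B market at the two quoted 11-year rates:
# 18.8% (China) versus 6.6% (worldwide).
china = 10 * (1 + 0.188) ** 11
world = 10 * (1 + 0.066) ** 11
print(round(china, 1), round(world, 1))

print(round(cagr(10, china, 11), 3))  # recovers 0.188
```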
At the end of 2014, China had three semiconductor companies with US$1 billion or more in annual revenue. Collectively, these companies have experienced an 18.5% CAGR over the past 11 years. In the future, we expect to see more Chinese semiconductor companies break the billion-dollar revenue mark either organically or through mergers.
Even with Chinese semiconductor companies growing in number and size, non-Chinese global semiconductor companies remain the dominant semiconductor suppliers to China. This contributed to China’s integrated circuit (IC) consumption/production gap of US$120 billion at the end of 2014—US$12 billion wider than at the end of 2013. The growing production/consumption gap and the strategic importance of the industry will continue to favorably influence the policies of the Chinese government as far as the semiconductor industry is concerned.
So what areas contributed to China’s chip consumption during 2014? We saw heavier concentrations in the data processing (computing) and communications applications sectors, and slightly more concentration in the consumer sector versus the worldwide market. Contrasting with the worldwide market, however, China’s semiconductor consumption was less concentrated in the automotive sector, and noticeably less concentrated in the industrial/medical/other, and military/aerospace sectors.
China’s IC consumption over the past decade has grown by more than US$134 billion, compared with just US$99 billion for the worldwide market. The country’s growth in this area has come at the expense of other regions’ IC markets, but we’re also seeing China’s rate of IC consumption market growth gradually moving closer to the worldwide rate. China’s IC consumption increased by more than US$20 billion in 2014, which was US$6 billion less than the worldwide market.
China's O-S-D (optoelectronics-sensor-discrete) segment tells a similar story, with consumption growing 8.1% during 2014 to reach a new peak of US$34.3 billion. Sensors are fundamental to the Internet of Things (IoT), and will help drive an increase in semiconductor industry billings. But for the first time in four years, China's O-S-D increase was slightly less than the worldwide market increase, meaning China's O-S-D market share remained relatively flat in 2014, at 56%.
When we look at China’s pragmatic government policies in this area, and combine them with a culture of entrepreneurship and a vast pool of engineering talent, we believe the Chinese semiconductor industry should continue to gain strength over the rest of the decade.
I’d love to hear your opinion in the comments. How do you see China’s semiconductor industry taking shape over the next several years? Are there additional factors you see influencing the space?
Also, I encourage you to stay informed in the coming months as we explore this area more. You can register for updates on our microsite covering China’s impact on the semiconductor industry.
Raman Chitkara leads the global technology practice at PwC. Read his full biography here.
Don’t forget to follow SemiWiki on LinkedIn HERE…Thank you for your support!
Moore’s Law and Silicon Forest
When I first moved to Oregon in 1978 the largest industry was forestry, but then the endangered Spotted Owl was found and that put an end to many forestry companies and decimated the economy of many rural cities. Strangely enough it turns out that the Spotted Owl was found in great numbers across multiple states, so it never should’ve been placed on the endangered species list in the first place. Fast forward to 2015 and the number one Oregon industry is semiconductors, with Intel at the top revenue position. Many Californians have made the move to Oregon because housing is affordable in the Silicon Forest, the air is clean, and our Pinot Noir wines are world-class.
SEMI is a global industry association, and it just hosted an event in Wilsonville (aka Mentor Graphics) with over 150 techies in attendance. The theme was Moore's Law, and I attended after a short 7-mile drive from Tualatin.
Mentor Graphics
Wally Rhines waxed energetically about extending semiconductor cost reduction for another 20 years, although even Gordon Moore is quoted as saying that, “No exponential is forever”. Here’s the log-log chart defining the learning curve where the X-Axis is cumulative units produced and the Y-axis is cost per unit:
Rhines showed historical data that followed this exact curve for:
- Revenue per transistor versus cumulative transistors shipped
- Personal computer volume
- Revenue per MIPS
- Semi equipment supplier
- Photolithography equipment
- Assembly equipment
- ATE
- EDA tools
In general, Wally has seen EDA revenue hold at about 2% of total semiconductor revenue over the past 20 years.
Costs per wafer rise about 20% at each new node, but at 20nm the cost per transistor came out higher than at 28nm. At 14nm the cost per transistor is still too high because we've missed the commercialization of EUV sources. In spite of all that, he expects continued progress per Moore's Law for another decade.
ASML
Chris Spence started out by humorously apologizing for EUV being late to market, then presented an info-packed slide on 30 years of Litho progress.
EUV is still beset by slow throughput, measured in Wafers Per Hour (WPH), and high costs. ASML's 3rd-generation EUV tool, the NXE 3350B, can produce up to 125 WPH; that's progress, but it isn't easy.
Defects can be related to IC layout patterns rather than particles, so OPC with a tool like Tachyon can predict those patterns. Yes, multi-patterning allows us to reach below 10nm, however it will be pricey.
INTEL
Our speaker from the supply chain side was David Bloss, and he was quick to emphasize that, “Moore’s law is thriving at Intel”. The most controversial thing that David shared was that Intel has found a way to make cheaper transistors at 22nm and smaller nodes, contrary to what the rest of the industry seems to be saying. Their 7nm node is about a decade out, so that means 2025.
This slide is from Bill Holt’s investor meeting presentation of November 2014, one year ago. Intel was pleased that their suppliers are helping make Moore’s Law continue on. As for 450mm wafers we can expect that Intel will wait for the industry to drive demand.
IBM
From the R&D side we had Dr. Vamsi Paruchuri talking about the materials and process technology for advanced CMOS nodes. Although I don’t have permission to show you any of the slides that Vamsi presented, I can show you the IBM 7nm test chip that came out of fab in July, designed with silicon germanium for FinFET transistors, using EUV to get the small geometries.
Credit: Darryl Bautista/IBM
I found it interesting that at the 7nm node they needed SiGe channels on the FinFETs for lower power and faster switching. Gate lengths are 15nm in the 7nm technology, transistor width is 8nm minimum, and the height is 20nm.
Other interesting research technologies include Si Nanowires, Carbon Electronics (Carbon Nanotubes – optics and semiconductors together), III-V FinFETs, Stacked Nanowires, Stacked Nanosheets, and Gate-All-Around FETs (GAAFET).
ASE
Representing the package and assembly business we heard from John Hunt. The hot acronym to remember is WLCSP: Wafer Level Chip Scale Package.
Popular consumer devices like the Apple Watch and iPhone will be using technology like WLCSP to enable extremely thin products. TSMC will build the Apple A10 with fanout packaging technology. For IoT devices you can expect to see Fan Out SiP (System in Package).
Yole Development
Dave Towne showed us how More than Moore is being enabled by advanced packaging. Yole Development sells standard and custom research reports, and they know the market trends. For Fan Out Wafer Level Packaging (FOWLP), they predict the market segment will grow from $174M this year to $790M next year, when Apple announces the A10 using TSMC for both the chip and the packaging technology. TSMC calls this Integrated Fan Out (InFO).
Take a quick look at how 2D versus 3D packaging features compare:
Source: Yole Development
Mobile phone companies like Apple and Samsung are routinely using Wafer Level Packaging to achieve the thinnest integrations. CMOS Image sensors are a big user of 3D packaging, and that trend continues on strongly.
Summary
It was a pleasant surprise to be exposed to such diverse presentations from SEMI members in one setting, because I normally have my EDA blinders on and get too focused on software-only solutions to semiconductor design challenges. It was clear that all of these companies must collaborate closely in order to exploit everything that silicon has to offer us. For now, Moore’s Law continues and the only question is, “At what cost per transistor?”
ARM TechCon 2015 Preview!
Of all the live events I attend ARM TechCon is one of my favorites. The keynotes are always very good but the real meat of the conference is who attends because that is who the content is specifically developed for:
Why is Gemini 2.0 Tailored for Tomorrow's SoC Designs?
You have probably seen, many times, the graphic showing how the number of IP blocks has exploded, from a few dozen in SoCs designed at 65nm to 120 or more in the latest generation targeting 16FF or 10FF. The graphic captures the raw IP count well, but it misses another strong trend: more agents are participating in coherency. Put another way, complex SoC designs are moving out of a homogeneous comfort zone into heterogeneity, a shift that began as soon as designs moved from a single core (with a single cache level) to two cores with cache memory. Looking at the picture below, you realize that some designs integrate several very complex multi-CPU (and GPU) clusters architected around three cache levels. Cache coherency is becoming a key issue; when integrating a Network-on-Chip, you had better make sure the NoC is cache coherent!
NetSpeed initially built its Gemini NoC IP to explicitly support cache-coherent designs; the new version, Gemini 2.0, has been developed to support SoCs requiring coherency across multiple IP clusters. Customers use NoCStudio to create ARM-compliant interconnects, supporting AMBA ACE and ACE-Lite as well as coherency details. Gemini 2.0 improves configurability, adds a last-level cache option, and generates synthesizable RTL. It is an automated coherent NoC generator, and design teams will benefit from the tool, especially under the pressure of an aggressive schedule.
Gemini 2.0 also adds a new feature: the Pegasus cache. Pegasus is a configurable IP; the designer determines cache capacity, associativity, banking, internal power gating and allocation policies for each instance, and an SoC can include several Pegasus modules. The number of outstanding cache misses, among other cache features, is also configurable.
Using NetSpeed's coherent NoC architecture improves cache utilization by controlling the address ranges serviced by each cache, or by defining which IP blocks can access a given cache. A coherent cache only allows coherent accesses. What about non-coherent traffic? It goes directly to memory.
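That routing policy can be sketched in a few lines of Python. The class and method names here are invented for illustration only; this is not NetSpeed's actual API, just the decision the text describes:

```python
# Hypothetical model: a coherent cache services configured address
# ranges for permitted agents; non-coherent traffic bypasses it.
class CoherentCache:
    def __init__(self, addr_ranges, allowed_agents):
        self.addr_ranges = addr_ranges            # list of (lo, hi)
        self.allowed_agents = set(allowed_agents)

    def services(self, agent, addr, coherent):
        if not coherent or agent not in self.allowed_agents:
            return False                          # goes to memory
        return any(lo <= addr < hi for lo, hi in self.addr_ranges)

# One last-level cache serving the low 2GB for two CPUs.
llc = CoherentCache([(0x0000_0000, 0x8000_0000)], {"cpu0", "cpu1"})

def route(agent, addr, coherent):
    return "cache" if llc.services(agent, addr, coherent) else "memory"

print(route("cpu0", 0x1000, True))    # cache: coherent, in range
print(route("dma0", 0x1000, False))   # memory: non-coherent bypass
```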
Pegasus supports only one layer of coherency: if different IP clusters each include their own Pegasus cache, they will be non-coherent with each other. NetSpeed's NocStudio enables cache-hierarchy customization, and the Gemini 2.0 architecture is flexible enough to adapt to different scenarios. As the picture above shows, different clusters of two CPUs with caches can be implemented in a variety of ways. One cluster may require both CPUs to share the same Cache Coherency Controller (CCC), the same last-level cache and the same (external) memory. Another cluster may be architected with the CPUs still sharing the same CCC, but with one CPU directly accessing on-chip RAM while the other accesses external memory through the last-level cache…
Such flexibility explains why customers developing complex coherent systems can greatly benefit from Gemini 2.0 to accelerate time-to-market. Architects who need a scalable, high-performance, correct-by-construction SoC interconnect should evaluate NetSpeed's technology, especially if the design requires cache coherence.
Pushing on AXI-connected IP in FPGAs
Success stories are great. Reading how someone uses a product contributes much more insight than reading about a product. Last month we had a teaser for a presentation by Wave Semiconductor; this month, we have the slides showing how they are using FPGA-based prototyping, AXI transactions, and DPI to speed up development.
First, a mea culpa. I got the press release last month and the term DPI dangled ambiguously. My frame of reference is “deep packet inspection”, and when I inquired via email if that was the intent, the answer was yes. It seemed like a plausible response.
Wave is developing a programmable data flow computing accelerator with IP forming massively parallel reconfigurable processor arrays and a fabric interconnect. In their own words, they are “moving algorithms to the data”, a key tenet of big data analytics. From the press release just issued (link below), we see they have rebranded their architecture as Byte-Fabric. An older diagram on their website portrays the concept; this may not be the most recent version:
So, I get the slides from Wave this month, and sure enough DPI is all over them as promised. What they are really talking about is the SystemVerilog Direct Programming Interface, allowing C calls from SystemVerilog. Blogger fail. Now that we have the story straight, let’s connect the dots on this success and what it means for ASIC developers.
Wave is pushing on AXI as the strategy for IP interconnect in FPGAs for several reasons, primarily to aid in integration. Xilinx is heavily promoting the use of AXI-connected IP in FPGAs, which is a great thing to reduce partitioning differences between FPGA-based prototypes and the final ASIC. An abstracted AXI backbone helps with DMA, quality of service, out-of-order transactions, and other system-level performance characteristics.
The key to this particular success story is less obvious. By using the AXI interconnect, the FPGA-based prototyping platform from S2C can dynamically load, unload, and debug software through AXI on the FPGAs using a host PC. By applying a C language API, the same code from the verification environment can be reused in production.
Put another way, production C code can be used to verify the FPGA-based prototype. Usually testbenches are developed in SystemVerilog or maybe SystemC. In this project, Wave has used S2C’s Prodigy ProtoBridge with a driver that connects to a Xilinx PCIe IP block in the Prodigy Logic Module. This gives the PC user-level access to the AXI master, simplifying read/write to the entire FPGA using C/C++ calls. Then, a DPI-based AXI Master Transactor spans between the SystemVerilog testbench and C code.
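The host-side idea can be caricatured in a few lines of Python. The class and calls below are invented stand-ins; the real flow goes through S2C's ProtoBridge driver and the Xilinx PCIe block, but the essence is that user-level read/write calls map onto AXI transactions into the FPGA's address space:

```python
# Illustrative model only: a host-side "AXI master" whose read/write
# calls stand in for AXI transactions carried over PCIe to the FPGA.
class AxiMasterTransactor:
    def __init__(self):
        self.mem = {}          # stands in for the FPGA address space

    def axi_write(self, addr, data):
        for i, byte in enumerate(data):   # one burst, byte by byte
            self.mem[addr + i] = byte

    def axi_read(self, addr, length):
        return bytes(self.mem.get(addr + i, 0) for i in range(length))

axi = AxiMasterTransactor()
axi.axi_write(0x4000_0000, b"\xde\xad\xbe\xef")  # production C code
print(axi.axi_read(0x4000_0000, 4).hex())        # would issue the
                                                 # equivalent calls
```

In the real setup the same calls come from production C code via DPI, so the testbench and the eventual application exercise the design through the identical AXI path.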
In Wave’s environment, algorithms for big data analytics are developed in C, so running actual C code is essential to thorough verification testing. By preserving the ability to run the SystemVerilog testbench and use more conventional simulation tools, yet run production C code, Wave has combined the best of both worlds on an S2C platform.
If that type of hybrid scheme combining host-based simulation with an FPGA-based prototyping platform sounds vaguely familiar, it may be because it is. This idea bears a lot of resemblance to SCE-MI – a working group within Accellera that Toshio Nakama of S2C once chaired. The difference in this approach is the AXI connection and the heavy use of AXI-based IP in an FPGA, with a simpler API scheme that does the basics.
I’ll be curious to see what the final Wave Byte-Fabric product looks like. You can read more about what Wave has done with the S2C platform, AXI, and DPI in their development:
Wave Semiconductor Achieves Rapid Design Success Using S2C FPGA Prototyping Solutions
Case Study: AXI HW/SW Verification for FPGA (registration and PDF download)
When Wave says this approach simplified their debug environment and reduced development time, keep in mind the end product – the Byte-Fabric SoC and supporting software – is designed around AXI for performance. Tapping directly into AXI removes several interim steps in verification, particularly in creating a testbench reusing production C code on an FPGA-based prototyping platform.
Perfecting the Great Verification Fugue
Michael Sanie (Senior Director Marketing in the Synopsys Verification Group) gave the wrap-up presentation at SpyGlass World recently, on the Synopsys Verification Direction. I learned from an interview Michael gave to Paul McLellan that he is an accomplished pianist. I’m a pianist also, though of considerably less talent, so I had to build this blog around a musical theme.
The fugue is a form, perfected in the Baroque period, in which one or more melodies chase (the meaning of fugue) through multiple voices or levels of a piece. This reminds me of the way we think of functional verification today. We have multiple different ways to verify, like multiple voices: simulation, static and formal verification, emulation, virtual prototyping and FPGA prototyping. Each of these has its own advantages but you couldn’t think of completing verification without using all of them. And as in a fugue, verification involves a complex interplay of themes between these voices. Emulation may carry a theme for a while, until some unexpected behavior creates a need to drop into simulation, which may in turn require a need to drop a sub-theme into formal verification until a problem is resolved, from which the theme then returns to the emulation voice.
Now imagine if each time a transition was reached the performers had to stop to re-read the score, reset instruments and do a couple of dry-runs to make sure they had the next part polished. The audience wouldn’t stick around for long. But that’s the way the verification fugue has worked in the past. Synopsys coined the term Verification Continuum™ for rendering a seamless verification fugue. This requires native integrations of compile, debug and more across the platform of solutions in the continuum. If every tool reads the score in the same way, there is no need to re-read on every transition. If every tool interfaces to debug in the same way, there is no need to reset debug style or cross-compare between incompatible interfaces.
Fully accomplishing this is not easy, in part because individual components have their origins in different tools with different objectives but also because an optimum architecture for one component isn’t necessarily optimum for another. The Synopsys approach is a more difficult but also more enduring engineering solution than any number of quick fixes. Each individual theme (a component) is refined, while seamless transitions between the themes (the Continuum) are optimized. Synopsys started with this approach several years ago, even before its acquisitions of Springsoft and Eve. We’re already seeing the results of those optimizations.
Perfecting the Verification Continuum is an ongoing exercise. The SpyGlass™ integration roadmap is still being developed. But the direction is clear: to make delivering a majestic verification fugue as seamless and as compelling as possible. You can learn more about Synopsys verification solutions HERE. This shows a pretty easy-to-understand picture of the Verification Continuum. And if you’re confused (as I was) about the difference between Verification Continuum and Verification Compiler, Verification Continuum is a platform of verification solutions whereas Verification Compiler is a product which includes simulation, VIP, static and formal verification and more, but not emulation or prototyping. To learn more about Verification Compiler, click HERE.
By the way, if you’re not a fan of classical music and all this fugue talk means nothing to you, fear not. A greatly simplified version is the round (Row, row, row your boat, Frère Jaques or Three blind mice, sung by multiple voices, each starting at a different time). Counterpoint – using multiple voices with different but harmonically related melodies – underlies the fugue and is quite common in rock, folk and related music. Some examples: Scarborough Fair (Simon and Garfunkel), Hello Goodbye (Beatles), and I Get Around (Beach Boys). Replace “fugue” with any one of these and you should get the idea – melodies chasing around between different voices.
Implications of LTE in the 5 GHz Band
Back in December 2013, during a 3GPP Radio Access Network (RAN) plenary meeting, Ericsson and Qualcomm introduced LTE-Unlicensed, a scheme that puts LTE-Advanced signals in the 5 GHz unlicensed UNII band, in conjunction with an LTE “anchor” signal in a licensed band. The objective is supplemental downlink bandwidth (eventually both supplemental downlink and uplink bandwidth), to help meet ever-expanding mobile broadband data demands.
This proposal, and the follow-on License-Assisted Access (LAA) proposal targeted for 3GPP Release 13, has generated a great deal of discussion regarding coexistence and airtime fairness among 5 GHz LTE and 802.11n/ac signals. The FCC, as steward of both unlicensed and licensed spectrum in the United States, has been the locus of stakeholder comments, both affirmative and less sanguine. The stakes in this game are enormously high, especially for Wi-Fi services incumbents.
Wireless services providers have spent many tens of billions of dollars globally acquiring geographically exclusive spectrum licenses. With up to 500 MHz of unlicensed spectrum available between 5.17 and 5.85 GHz, the economic incentive to cellular operators for LTE-U and LAA is obvious. Qualcomm and other proponents have put forward technical methods of clear channel assessment to assure coexistence and airtime fairness based on carrier sensing and energy detection. Most countries in the world require a “Listen-Before-Talk” protocol to avoid co-channel interference, not only between LTE signals and Wi-Fi signals, but also among multiple LAA service operators.
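At its simplest, energy-detection Listen-Before-Talk amounts to measuring channel energy over a sensing window and transmitting only if it falls below a threshold. A minimal sketch follows; the threshold, window length and power levels are illustrative numbers, not values from any regulation or standard:

```python
import math

def channel_clear(samples, threshold_dbm=-72.0):
    """Energy-detection clear channel assessment over one window.

    samples: per-sample received power in dBm.
    Returns True if the average power is below the ED threshold.
    """
    mean_mw = sum(10 ** (s / 10) for s in samples) / len(samples)
    return 10 * math.log10(mean_mw) < threshold_dbm

quiet = [-95.0] * 16                 # only noise floor observed
busy = [-95.0] * 12 + [-60.0] * 4    # a Wi-Fi burst lands in the window

print(channel_clear(quiet), channel_clear(busy))  # True False
```

Note the averaging is done in linear (milliwatt) units before converting back to dBm: a single strong burst dominates the window, which is exactly the behavior you want from a politeness mechanism.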
To me, there are some interesting and perhaps more subtle open questions:
- Supplier market dynamics. It seems clear that LAA drives tighter functional integration between the LTE transceiver and the Wi-Fi transceiver. The medium-term impact on Wi-Fi-only radio semiconductor suppliers is potentially significant.
- Need for additional RF content in smartphones and small cell base stations. LAA is likely to be a boost for small cell base station suppliers, since maximum transmit power levels in unlicensed bands are much lower than in licensed bands, which can utilize macro cell base stations. Both LAA-enabled small cells and smartphones will likely need additional 5 GHz TDD RF front-end content to meet new consumer connectivity use cases. LAA complicates the already staggering complexity introduced by 3GPP Release 10 LTE carrier aggregation, which allows up to five 20 MHz interband component carriers. Release 13 is anticipated to support up to thirty-two 20 MHz interband component carriers across licensed FDD and TDD bands, as well as the 5 GHz unlicensed TDD band. The RF front-end content required to support these new carrier aggregation combinations will no doubt have to grow substantially.
- Potential impact on the layer 2 (data link layer) architecture in next generation (“5G”) cellular and IEEE 802.11ax Wi-Fi. The admission control mechanisms defined in 3GPP LTE-A provide assured levels of Quality of Service not typically achieved in IEEE 802.11n/ac, and despite the enhancements of 802.11e, Wi-Fi is a best effort service in most consumer network implementations. This is one of the secondary technical motivations for LAA proponents. There is continued room for improvement in the MAC layer architectures of next generation wireless standards to complement expected PHY layer improvements, and it will be interesting to follow innovations in this area.
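The spectrum arithmetic behind those carrier-aggregation numbers is simple enough to sketch:

```python
# Back-of-envelope aggregate bandwidth for the carrier-aggregation
# cases mentioned above: each component carrier is 20 MHz wide.
CC_MHZ = 20

rel10 = 5 * CC_MHZ    # Release 10: up to five component carriers
rel13 = 32 * CC_MHZ   # Release 13: anticipated up to thirty-two

print(rel10, rel13)   # 100 MHz vs 640 MHz of aggregated spectrum
```

Every one of those combinations needs filtering, switching and amplification in the front end, which is why the RF content keeps growing.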
We will be publishing a report on these and other commercial and technical implications in the coming months.

