
Worldwide Design IP Revenue Grew 12.4% in 2017
by Daniel Nenni on 05-11-2018 at 7:00 am

When starting SemiWiki we focused on three market segments: EDA, IP, and the Foundries. Founding SemiWiki bloggers Daniel Payne and Paul McLellan were popular EDA bloggers with their own sites and I blogged about the foundries so we were able to combine our blogs and hit the ground running. For IP I recruited Dr. Eric Esteve who had never blogged before but he took to it quite quickly. I knew Eric from his IP reports at my previous position working with the foundries at Virage Logic.

Since going online in 2011 SemiWiki has published 693 IP related blogs with 3,572,124 views. Eric has written 277 of those blogs averaging close to 6,000 views per blog. Today Eric is by far the most respected IP analyst with the most detailed and accurate reports and it is an honor to work with him, absolutely.

According to the Design IP Report from IPnest, the market is still doing very well with YoY growth of 12.4% in 2017. The ARM Group of SoftBank (previously known as ARM Holdings) is again a strong #1 with IP revenues (licenses plus royalties) of $1,660 million and 46.2% market share, followed by Synopsys, growing by 18% to $525 million and 14.7% share. Broadcom, the combination of Avago, LSI Logic, and the former Broadcom, enters the top 3, replacing Imagination. Both Cadence and CEVA showed 20%+ growth in 2017.


IPnest has defined 11 categories ranking IP vendors. The CPU IP category is the largest with about 42% of revenues from design IP. There are strong disparities between CPU, DSP, and GPU & ISP as the weight of the CPU category is about 9x the DSP and 5x the GPU/ISP.


ARM is obviously #1 in the CPU category and will probably keep this position for a long time due to the royalty mechanism. Nevertheless, we can see that ARM CPU IP license revenue declined by 6.8% YoY, more than compensated by royalty revenue growing by 17.8%. There may be several reasons. After the ARM acquisition by SoftBank, the accounting policy was changed, creating what Eric calls an “artifact”. However, in my opinion we are starting to see the impact of RISC-V becoming a credible alternative to the ARM CPU hegemony. The 2019 Design IP report should confirm this.

In the Processor group (CPU + DSP + GPU & ISP), Imagination Technologies (IMG) is still #2, but I expect their royalty revenue to collapse when Apple effectively moves to an internal GPU solution. Considering the followers, both CEVA and Cadence made 20%+ progressions in 2017, so it wouldn’t be surprising to see one of these two companies become #2 next year. In my opinion it will be CEVA, but in any case the ranking in the Processor group will be disrupted next year.


The next group after Processor is Physical IP, which includes Wired Interface IP, SRAM memory compilers, other memory compilers, physical libraries, analog and mixed-signal IP, and Wireless Interface IP.

If we look at the Wired Interface IP category, it’s now at $735 million (20% YoY growth) and 20.5% of the total. Synopsys is the clear leader with about 45% market share in the category, and it also leads the Physical IP group with 35% market share.

If you take a look at the picture above you can see an interesting trend: from 2016 to 2017, Wired Interface IP’s portion of the total moves from 19% to 20.5%, while Processor IP declines from 58.3% to 56.4%.

As forecast by IPnest in the “Interface IP Survey & Forecast”, Wired Interface IP should reach $1 billion in a few years (2021 or 2022). Even though ARM is giving up ground in China through a so-called “joint venture”, the Wired Interface IP category will remain an island of (growing) stability in the 2020s.

Increasing complexity: can you imagine that the DDR5 memory controller PHY now runs at 4400 Mb/s? That is 4.4 Gb/s, almost the PCIe Gen-2 data rate (5 Gb/s), which was one of the IP stars 10 years ago.

Eric chaired a panel at DAC 2017, “Growing IP market despite semi consolidation”, and the panel came to a consensus: the IP market in 2010-2020 is like the EDA market in 1980-1990, when outsourcing was the rule and the result was an EDA market completely externalized by 2000. As long as a function is not perceived as a differentiator by a design team, it can be outsourced, and it will become an IP sold by commercial vendors (see the Top 10 list).

Bottom line: If you apply a 12% CAGR for the next 5 years you can easily predict a $6 billion IP market in 2022.
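A quick sanity check of that bottom line in Python (the ~$3.6 billion 2017 base is implied by ARM’s $1,660 million at 46.2% share; the 12% CAGR is the article’s own assumption):

```python
# Rough sanity check of the ~$6B 2022 projection.
# Base: total 2017 Design IP revenue implied by ARM's $1,660M at 46.2% share.
base_2017 = 1660 / 0.462            # ~$3,593M total market in 2017
cagr = 0.12                          # assumed compound annual growth rate
years = 5                            # 2017 -> 2022
projection_2022 = base_2017 * (1 + cagr) ** years
print(f"2017 base: ${base_2017:,.0f}M")              # ~$3,593M
print(f"2022 projection: ${projection_2022:,.0f}M")  # ~$6,332M, i.e. ~$6B
```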

To buy this report or talk to Eric, you can contact him at eric.esteve@ip-nest.com. He will also be at DAC again this year if you want to meet him.


TSMC Technologies for Mobile and HPC
by Alex Tan on 05-10-2018 at 12:00 pm

During the TSMC 2018 Technology Symposium, Dr. B.J. Woo, TSMC VP of Business Development, presented market trends in mobile applications and high-performance computing (HPC), and shared TSMC’s progress in the breakthrough technology offerings serving these two market segments.

Both 5G and AI are taking center stage in shaping the high double-digit growth in data demand. For the mobile segment, the move from 4G LTE to 5G requires higher modem speeds (from 1 Gbps to 10 Gbps), a 50% faster CPU, a 2x faster GPU, double the transistor density, and a 3x performance increase in AI engines to a 3 TOPS (trillion operations per second) target, all without much power increase. In this market segment, TSMC is ushering the move from 28HPC+ toward 16FFC.

On the cloud side, data center switches demand double the throughput, from 12.8 Tbps to 25.6 Tbps, as computing demand moves toward the network edge. Similar drives toward double the memory bandwidth, a 3 to 4x increase in AI accelerator throughput, and up to 4x transistor density improvement are taking place.

N7 Technology Progress
Dr. Woo stated that delivering the high density and power efficiency required to satisfy the low latency of data-intensive AI applications has been key to the success of TSMC’s N10 process, which also enabled AI in the smartphone space. Meanwhile, the N7 node has been making good progress, providing excellent PPA values with >3x density improvement, >35% speed gain, and >65% power reduction over its 16nm predecessor.

The N7 HPC track provides 13% more speed than N7 mobile (7.5T vs. 6T libraries), and it has passed the yield and qualification tests (SRAM, FEOL, MEOL, BEOL) with a mass-production-ready D0 (defect density). A contributing factor is that TSMC successfully leveraged learning from N10 D0; the node is targeted for Fab 15.

The N7 IP ecosystem is also ready, with over 50 tapeouts slated by the end of 2018 for mobile, HPC, automotive, and servers. The 7nm technology is anticipated to have a long life, similar to its 28nm/16nm predecessors. The combination of mild pitch scaling from N10 to N7, the migration from immersion lithography to EUV, and a denser standard cell architecture makes a significant overall improvement.

EUV Adoption and N7+ Process Node
She shared some progress on the EUV application in N7+. Applied on selected layers, EUV reduces process complexity and enhances the resulting pattern fidelity. It also enables future technology scaling while offering better performance, yield, and shorter cycle times. Dr. Woo showed that via resistance has a much tighter distribution in N7+ EUV versus N7+ immersion, as EUV delivers better uniformity.

The N7+ value proposition includes delivering 20% more logic density than N7, 10% lower power at the same speed, and additional performance improvements anticipated from the ongoing collaboration with customers.

N7+ will also deliver a double-digit good-die increase over the N7 node, as it gains traction from capitalizing on the same equipment and tooling. She claimed that it has lower defect density than other foundries, as well as 256Mb SRAM yield and device performance comparable to the N7 baseline. TSMC provides easy IP porting (layout and re-K) from N7 to N7+ for design entities that do not need to be redesigned.

HPC and N7+ Process Node
For the HPC platform solution, the move from N7 to N7+ involves the incorporation of EUV, a denser standard cell architecture, ultra-low-Vt transistors, high-performance cells, SHDMIM (Super High Density MIM) capacitance, a larger CPP (Contacted Poly Pitch), and 1-fin cells.

N7+ offers better performance and power through an innovative standard cell architecture. It allows higher fin density in the same footprint for a 3% speedup. Conversely, applying single-fin cells in non-timing-critical areas cuts capacitance by about 20%, and in turn, the dynamic power.

The adoption of new structures also enhances MIM capacitance and utilization rate, giving HPC a 3% to 5% performance boost. The N7+ design kit is ready to support the IP ecosystem.

N5 Value Proposition
N5 has a new eLVt (extreme low Vt) device offering a 25% maximum speed-up versus N7, incorporating aggressive scaling and full-fledged EUV. N5 has made good progress, with double-digit yield on 256Mb SRAM. Risk production is slated for 1H2019.

Dr. Woo also shared a few metrics compared with N7 process (test vehicles used ARM A72 CPU core + internal ring oscillator):
– 15% speed improvement (up to 25% max speed)
– 30% power reduction
– 1.8x increased logic density through innovative layout and routing
– 1.3x analog density improvement through poly pitch shrink and selective Lg and fin #, yielding a more structured layout (“brick-like” patterns)

16FFC/12FFC Technologies
Dr. Woo covered RF technologies and the roadmap (more on this in a subsequent blog on IoT and Automotive). She mentioned that the N16 and N12 FinFET-based platform technologies have broad coverage, addressing HPC, mobile, consumer, and automotive. Both 16FFC and 12FFC have shown strong adoption, with over 220 tapeouts. 12FFC should deliver a 10% speed gain, 20% power reduction, and 20% more logic density versus 16FFC through dual-pitch BEOL, device boost, a 6-track standard cell library, and 0.5V VCCmin.

To recap, AI and 5G are key drivers of both mobile and HPC product evolution. Along this line, TSMC keeps pushing the PPA (Power, Performance and Area) envelope for mainstream products while delivering leading RF technologies to keep pace with the accelerating designs in these segments.

Also read: Top 10 Highlights of the TSMC 2018 Technology Symposium


Converter Circuit Optimization Gets Powerful New Tool
by Tom Simon on 05-09-2018 at 12:00 pm

DC converter circuit efficiency can have a big effect on the battery life of mobile devices. It also affects power efficiency for wall-powered circuits. Even before parasitics are factored in, converter circuit designers have a lot of issues to contend with, and optimizing circuit operation is essential for giving consumers what they want. Switching converters are light years ahead of old-school transformer-based designs. However, switching converters often operate at high frequencies that create challenges for efficient operation. In addition to parasitic inductances and the additional current from the reverse recovery effect, the PowerMOS devices themselves do not behave as ideal devices.

It’s necessary to understand that PowerMOS devices are really an assembly of large numbers of parallel intrinsic devices with a complex and distributed structure. As such, switching does not occur simultaneously across all the intrinsic devices. Within a PowerMOS device, RC delays greatly affect the Vgs present at the gates of the low- and high-side transistors. Previously it has been difficult to run simulations that take this into consideration; fine-grained extraction of gate, source, and drain interconnect is not a good application for traditional rule-based extractors. Designers have struggled with this lack of visibility up until now.

Recently Magwel has released a tool specifically targeted at realizing comprehensive and accurate simulation of converter circuits, including the complex internals of PowerMOS devices. Magwel’s PTM-TR does several unique things to provide transparency into the detailed switching behavior of PowerMOS devices. PTM-TR uses a solver based extractor to correctly and accurately determine parasitics for the internal metallization within PowerMOS devices. The gate regions are divided up according to user set parameters and the intrinsic device model is applied to create a simulation view of the device that incorporates full internal structure. This model is known as a Fast3D model and is used by PTM-TR with Cadence Spectre® to co-simulate dynamic gate switching behavior at each time step of circuit operation.

Because the Fast3D model is used in conjunction with Spectre circuit simulation, it can be used with test benches, or to perform any desired simulation, such as corner analysis. PTM-TR comes with the additional benefit of showing graphically the internal field view of the device at each time step. This is a direct benefit of the co-simulation. Magwel has a fascinating video on their website that shows how the field view of the PowerMOS device can be useful in understanding dynamic switching performance.

The Magwel video highlights how the Vgs reported in simulation can differ from the Vgs at each individual gate location. At one time step in the transition of the half-bridge, the delta in Vgs is ~2V, which can have a large effect on shoot-through current. Also, during early switching, with only certain sections of the device turned on, higher-than-expected current densities are possible, leading to EM and thermal issues. With PTM-TR, designers can modify and test PowerMOS devices to achieve optimal performance.

PTM-TR is part of Magwel’s complete family of power transistor modeling tools. The base product, PTM, reports Rdson and static power and current per layer. PTM-ET gives insight into combined electro-thermal performance of PowerMOS devices. PTM-ET uses thermal models that can include heat sources and sinks on the die, as well as thermal properties of the package.

The PTM-TR video can be viewed on the Magwel website. More information about the PTM product family and Magwel’s solutions for ESD and power distribution network analysis can be found there as well.


Semiconductor Specialization Drives New Industry Structure
by Daniel Nenni on 05-09-2018 at 7:00 am

When traveling the world there are the things that you see and the people that you meet. I have been very fortunate to meet some of the most amazing people, and one of those people is Dr. Walden Rhines. Wally spent the first half of his career in semiconductors at TI and the second half in EDA with Mentor Graphics, which gives him a cyborg-like quality, absolutely.



Low Power Verification Shifting Left
by Bernard Murphy on 05-08-2018 at 11:00 pm

I normally think of shift left as a way to move functional verification earlier in design, to compress the overall design cycle. But it can also make sense in other contexts, one particularly important example being power intent verification.

If you know anything about power intent, you know that it affects pretty much all aspects of design, from architecture through verification to PG netlist. If the power intent description (in UPF) isn’t correct and synchronized with the design at any of these stages, you can run into painful rework. At each handoff step you will carefully verify, but design and power intent both iterate and transform through the flow so keeping these aligned can become very time (and schedule) consuming.

One important step towards minimizing the impact is to ensure that interpretation is consistent across the design flow. UPF is a standard, but that doesn’t guarantee all tools will interpret commands in exactly the same way, especially where the standard does not completely bound interpretation. This means, like it or not, closure between synthesis, implementation, and verification in mixed-vendor flows is likely to be more difficult. Not to say you couldn’t also have problems of this kind in flows from a single vendor, but Synopsys has made a special effort to maximize consistency in interpreting power intent across their toolset.

A second consideration is how you verify power intent and at what stage in the design. There are static and dynamic ways to verify this intent (VC LP and VCS NLP power-aware simulation, respectively), and each naturally excels at certain kinds of checks. You couldn’t meaningfully check in simulation that you are using appropriate level shifters (LS) wherever you need them, but you can in VC LP. Conversely, you can’t check the correct ordering and sequencing of, say, a soft reset in VC LP, but you can in power-aware sims.

Naturally there are grey areas in this ideal division of techniques and those are starting to draw more attention as design and verification teams push to reduce their cycle times. Simulation is always going to be time-consuming on big designs; simply getting through power-on-reset takes time, before you can start checking detailed power behavior. As usual, the more bugs that can be flushed out before that time-consuming simulation starts, the more you can focus on finding difficult functional bugs. This is why we build smoke tests.

In low power verification this is not a second-order optimization. Power aware sims can take days, whereas a VC LP run can complete in an hour. For example, an incorrect isolation strategy could trigger X-propagation in simulation, which burns up not only run-time but also debug time. Improved static checking to trap at least some of these cases then becomes more than a signoff step – it optimizes the dynamic power verification flow. Hello shift left.

This grey area is ripe for checks. Some already provided by VC LP include checking whether a global clock or reset signal passes through a buffer (or other gate) in a switchable power domain; this can obviously lead to problems when that domain is off. Similarly, a signal controlling isolation between two domains but sourced in an unrelated switchable domain will never switch out of isolation while that domain is off. These are the kinds of problems that can waste simulation cycles. Synopsys tells me they continue to add more checks of this type to minimize these kinds of problems.

Another capability that I have always thought would be useful is to be able to check UPF independent of the RTL. After all, the RTL may not yet be ready or it may be rapidly changing; that shouldn’t mean that the power intent developer is stuck being able to write UPF but not check its validity. VC LP apparently provides this capability, allowing you to check your UPF standalone for completeness and correctness; given the power/state table definition, are all appropriate strategies defined, for isolation and level-shifters for example?

Prior to implementation, VC LP will run predictive checks for DC/ICC2, looking for potential synthesis or P&R issues, such as problems with incorrect or missing library cells and signal routing. And of course for low power signoff, it will run checks on the PG netlist looking for missing or incorrect connections by power/ground domains. Still the signoff value you ultimately need, but now adding capabilities for shifting low power verification left. You can learn more about VC LP HERE.


Cross View Static Validation
by Alex Tan on 05-08-2018 at 12:00 pm

Improper handling of design validation can quickly turn into a debugging exercise. In a mainstream RTL2GDS flow, design implementation involves top-level integration and lower-level block development. These lower-level components, comprising macros, IPs, and standard cells, are subjected to frequent abstraction captures, as inherently required by many cross-domain development and analysis tools. As a result, validation without automation is becoming a complex debugging feat.

Checklist
In dealing with numerous design view formats, such as those for netlists or layout, ambiguity may be present at the interface, so port directionality and labeling are critical. Although some attributes or design entities might later be implicitly defined at the top level (such as interface signals, which become internal nets at the top level), correct assignment is still needed for the completeness of the subdesign’s or model’s boundary conditions. For example, default I/O loading conditions or input slew rates may be needed to prevent unrealistic ideal scenarios.

Many approaches are available to capture a checklist in a design flow. Static-verification-driven processes, such as code linting, design rule checks, and optimization-constraint checks, have been quite effective in shaping design implementation, and they rely on a comprehensive checklist (rule set). Such a checklist is normally aggregated over several project cycles and can be overwhelmingly long as well as complex to automate.

To address this need to validate IP blocks and libraries across data formats, Fractal Technologies provides its Crossfire software. It is capable of the following two types of checks:

  • Implementation completeness of a design entity. For example, it will compare any item that has been parsed against its equivalent item from a reference. Such checks are applied to all object types in the Crossfire data model, such as cells, pins, nets, timing arcs, and power domains.

  • Intrinsic sanity checks of model databases. For example, it uses the various process corners described in library files and checks whether delays increase with rising substrate temperature or decrease with increasing supply voltage.
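As an illustration of the second category of checks, a corner-consistency test might look like the sketch below. The corner data and function are hypothetical, for illustration only; Crossfire’s actual rule implementation is not public.

```python
# Sketch of an intrinsic library sanity check: cell delay should
# increase with substrate temperature and decrease with supply voltage.
# Hypothetical corner data: (temperature_C, voltage_V, delay_ns).
corners = [
    (-40, 0.9, 0.110),
    (25, 0.9, 0.125),
    (125, 0.9, 0.150),
    (125, 1.0, 0.138),
]

def check_monotonic(corners):
    """Flag corner pairs that violate the expected delay trends."""
    violations = []
    for (t1, v1, d1) in corners:
        for (t2, v2, d2) in corners:
            # Same voltage, hotter corner: delay should be larger.
            if v1 == v2 and t2 > t1 and d2 <= d1:
                violations.append(("temperature", (t1, t2)))
            # Same temperature, higher voltage: delay should be smaller.
            if t1 == t2 and v2 > v1 and d2 >= d1:
                violations.append(("voltage", (v1, v2)))
    return violations

print(check_monotonic(corners))  # [] when the data is consistent
```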

In general, there are three usage categories for Crossfire: as a sign-off tool for libraries and hard IPs; as a sanity checkpoint in library/IP development; and as a facility to inspect the quality of incoming libraries and hard IPs.

View Formats
Over the last few years, Fractal has enhanced Crossfire validation to expand coverage of the settings and databases used by many well-known third-party implementation and analysis tools. More than 40 formats and databases are supported, including the following list:


Bottleneck and Root Cause Analysis
Design rule checks can number in the hundreds or thousands, and multiple occurrences of them can translate into tens of thousands of DRC errors or more, depending on the design size. Similarly, for database consistency checks, root-causing the trigger point of a downstream error is a tricky feat requiring both traceability and an understanding of the data flow.

Fractal’s Crossfire has a mechanism that allows quick traceability through color-based connected graphs, called fingerprint analysis and visualization (as shown in figure 2). For example, a designer can view the flagged errors and filter down to a cluster of interest through color-aided selection or by way of controlled waivers. This can be done up and down the readily linked design entities. Errors can also be binned by rule or by format.


Another Fractal usage flavor is to perform debug visualization through the reporting dashboard. For example, if an error occurs after a LEF vs. DEF check run, the designer can click the analyze button to open a message window. Subsequently, by selecting the layout view, the portion of the layout contributing to the error is highlighted in a pop-up view.

Library and hard IP validation is key in ensuring a smooth deployment for downstream tools in the design flow. Fractal’s Crossfire provides the facility to bridge many cross-domain design views and rule-based checklist to confirm the integrity of lower-level design implementation. The GUI driven dashboard and fingerprint diagram simplify diagnostic reporting, visualization and debugging process.

Native checks can be combined with externally done validation results (such as from DRC runs) to be viewed through the navigation dashboard. Currently more than 250 checks have been incorporated into Fractal Crossfire.

For more Fractal tool info, including whitepapers on usage cases and features, check this website LINK.


Formal Signoff – a Cisco Perspective
by Bernard Murphy on 05-08-2018 at 7:00 am

The second segment of Oski’s most recent “Decoding Formal” event was a talk by Anatoli Sokhatski (formal tech lead at Cisco) on training and methodology development for a structured and scalable approach to formal verification, particularly with emphasis on formal signoff.

Anatoli stressed that he and others on the team did not go into this as novices in formal, but they found they needed a more consistent methodology they could carry forward between designs. Formal had been used successfully on previous products; however, the learning hadn’t really been captured before the experts moved on. I’m beginning to think this is a root cause of why many organizations still consider formal to be hard. Imagine if you had to re-create, from scratch on each program, UVM and constrained-random expertise, along with coverage measures, assertions and all the other paraphernalia of modern dynamic verification. You don’t have to do that because you can start from a lot of process, assets and training built up over many programs. The same approach equally makes sense for formal.

Anatoli’s talk had three main sections: why formal is so important in verification for networking, a few examples of the methods they use to manage complexity in proofs, and the process they developed around formal signoff to ensure this expertise could carry forward onto other programs.

Formal is important in many contexts, but it’s always interesting to understand the application-specific reasons that make it important to a given domain. For Anatoli, in networking the problem starts with wide busses, deep pipelines and big memories. This is compounded by protocol challenges – packets spread across many cycles, packet interleaving and priority packets breaking into lower priority packets. Together these create one of those challenges so suited to formal – a huge range of combinational possibilities with some sequential depth but not too much, in proving that there is no possible case in which data could be corrupted in some manner.

Anatoli’s group kicked off with an intensive two-week Oski training session (lectures and labs) which gave them a solid grounding in flows, principles for calculating required proof depths (needed when justifying bounded proofs), techniques to overcome complexity (which I’ll briefly mention next), abstraction methods, constraints and managing over-constraints, wrapping up with principles of formal coverage and formal signoff.

Anatoli discussed several concepts which I’ve seen touched on in advanced formal usage, though rarely explained in depth. I’m going to give a quick summary here (thanks to info provided by Roger Sabbagh, VP Applications Engineering at Oski). I hope to do a deeper dive on these in a later blog. All of these are prompted by a basic question: how do you check the correctness/ integrity of packets passing through a pipeline when of course those packets can contain arbitrary data?

The starting point (at least for me) is something called the floating pulse method. This is a technique in which the formal tool can assert a single pulse at some arbitrary time after reset, and on that pulse do something special. The special thing done in this context is to tag a word in an incoming packet with an easily recognized ID, a process widely known as coloring. That color can then be spotted at a later/deeper cycle, allowing for various kinds of check.

So for example, Anatoli said they applied this method to check the time between the floating pulse on an incoming packet and the point at which the forwarded colored data appeared at the output; this should fall within a required maximum number of cycles. The clever trick here is that, thanks to the floating pulse method, the formal tool effectively tracks any element of any incoming packet, therefore verifying that this maximum number of cycles holds for all possible packet sequences.

Anatoli talked about other checks, but I’ll mention just one that I particularly like, not least because it has confused me for a long time. This is Wolper coloring and is used (at least here) for data integrity checks. The same general approach is followed as above, but in this case, two consecutive words are colored, presumably differently, in the incoming packet. The check at the output is then that the same words are seen consecutively, in the correct sequence, with no additional colored words around them. This confirms that no words were dropped, no words were replicated, words weren’t swapped and nothing was inserted between words. In other words, data integrity was preserved. Again, pretty clever.
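As a rough software analogy (not the actual SVA properties used at Cisco, which weren’t shown), the output-side Wolper check can be modeled like this; the function name and word values are illustrative:

```python
# Illustrative model of a Wolper-style data-integrity check:
# two consecutive input words are "colored"; at the output we expect
# exactly those two colored words, adjacent and in order.
def wolper_check(output_words, color_a="A", color_b="B"):
    """Return True if the colors appear exactly once, adjacent, in order."""
    positions = [i for i, w in enumerate(output_words)
                 if w in (color_a, color_b)]
    if len(positions) != 2:
        return False  # dropped, replicated, or spurious colored word
    i, j = positions
    return (j == i + 1
            and output_words[i] == color_a
            and output_words[j] == color_b)

print(wolper_check(["x", "A", "B", "y"]))  # True: integrity preserved
print(wolper_check(["x", "B", "A", "y"]))  # False: words swapped
print(wolper_check(["x", "A", "y", "B"]))  # False: word inserted between
```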

The third part of Anatoli’s presentation was on how they set up a repeatable and scalable flow, particularly for formal signoff, though I imagine elements of this approach would be useful even for basic property proving. The first step towards not having to reinvent the wheel each time in a process must be documentation and templates. He detailed their approach:

  • A document describing phases & milestones

    • FV Planning, Basic Function Complete, Main Function Complete, Block Ready for Speculative Freeze, FV complete
  • A detailed description for/definition of each phase; for example for Main Function Complete:

    • Checkers implemented for all non-error cases
    • Constraints implemented for all non-error cases
    • Over-constraints applied to disable error cases
  • Defined exit criteria for each phase, for example for FV Complete:

    • Final FV Environment Specification review
    • Final FV Test Plan review
    • Checklist approved

Templates include a checklist for each milestone (e.g. review required proof depth per assertion, review cover properties) and a documentation template (e.g. interface components, E2E checkers, testbench configurations). Finally, they capture all this in their (in-house) testplan tracking tool (similar, I would guess, to Cadence vPlanner/vManager). This may all seem like a bunch of bureaucratic overhead, but anyone who has had to launch similar tasks more than twice or been tasked with training new hires on verification setup will instantly understand the value of this process.

Naturally they support all of their regression testing through in-house Makefile flows, which they use to run both simulation and formal testing (with different goal scripts per task). One aspect I found interesting is that they control proof parameters through Makefile macros, which simplifies setting up case-split runs (though no doubt they may need to experiment with this to ensure they cover all the parameters on which they might want to split).

Anatoli made a final point on interface checkers. Even though they are aiming for formal signoff, they still want to roll up interface checks (assertions and constraints) to simulation so they capture these in a separate file which can easily be reused in sim regressions. A reminder that even the more advanced users of formal methods (E2E checks) still want to sanity check against simulation. You can watch the video HERE.


AI processing requirements reveal weaknesses in current methods
by Tom Simon on 05-07-2018 at 12:00 pm

The traditional ways of boosting computing throughput are to increase operating frequency or to use multiprocessing. The industry has done a good job of applying these techniques to maintain a steady increase in performance. However, there is a discontinuity in the need for processing power: Artificial Intelligence (AI) is creating an exponential increase in required throughput. Strategies for delivering more performance for AI have included moving to larger numbers of processors and to architectures that include GPUs and FPGAs.

Nonetheless, hard obstacles exist in using these approaches. For starters, processor speeds have been frozen at ~4GHz for quite some time. This is why parallel processing has become the method of choice. Yet, as the number of processing elements increases, the performance per processor decreases as overhead ramps up. This is a result of a combination of hardware limitations and the difficulty of fully utilizing the available computing power in software. With multiprocessing, GPUs, or FPGAs, there is consistently a bottleneck at the 'head end' processor. AI exacerbates this with its inherently parallel operation and massive peer-to-peer data transfer needs.
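The diminishing per-processor return can be seen with a back-of-the-envelope Amdahl's-law calculation (a generic model of parallel overhead, not a measurement of any specific system):

```python
def amdahl_speedup(n, serial_fraction):
    """Amdahl's law: speedup on n processors when a fixed
    fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# Even with only 5% serial work, per-processor efficiency collapses
# as the processor count grows:
for n in (1, 16, 256, 4096):
    s = amdahl_speedup(n, 0.05)
    print(f"{n:5d} processors: speedup {s:6.1f}, "
          f"efficiency {s / n:.1%} per processor")
```

With 5% serial work the speedup can never exceed 20x no matter how many processors are added, which is exactly the wall that motivates rethinking the architecture rather than just adding cores.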

Wave Computing has developed systems that use a completely new, underlying dataflow processor architecture optimized to run machine learning (ML) workloads effortlessly and at scale. Their massively parallel data processor unit (DPU) chips contain 16,000 individual processors that are highly interconnected. Each processor runs non-branching instruction sequences, handling training or inference tasks at clock rates pushing 6 GHz. This is possible because they have eschewed a global clock signal and use an elegant synchronization method to keep the processors in step with each other.

In the Wave Compute Appliance, multiple chips are used to achieve up to 128,000 processing elements per appliance. Four appliances can be combined to provide 512,000 processing elements.

The hard work of delegating processing across the reconfigurable array is done up front with their programming tools. Without the need for a central task scheduler, their solution avoids a major potential choke point. In addition to a throughput advantage, their approach offers a significant energy advantage with the deskside system consuming no more than 0.16kW.

Naturally, the system connectivity and memory need to be optimized to take advantage of their dataflow architecture. Their DPU boards have slots for 32GB of interconnected high-speed, high-bandwidth DRAM, and are equipped with 512GB of DDR4 high-capacity DRAM. Additionally, there is PCIe connectivity. Wave Computing had the task of building in extensive and varied high-speed off-chip interfaces. To meet their power and performance needs at 16nm, they tapped Analog Bits for the SoC's critical SerDes IP.

Using optimized SerDes can save a lot of area and power, given the increasing portion of a chip's total resources they consume in newer designs with many lanes and higher data rates. Wave Computing wanted to provision SerDes that matched the power-sipping characteristics of their DPUs. Analog Bits offers a state-of-the-art, non-LC based SerDes that supports multiple protocols and is tolerant of metallization and orientation changes. This means that on three sides of their DPU, Wave Computing was able to use a single SerDes to support all their needs. Analog Bits' 16nm low-power SerDes needs only 4mW/Gbps to operate at up to 10Gbps. Above 16Gbps, their solution uses only 6.5mW/Gbps.
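Using the per-Gbps figures quoted above, a quick power budget is simple arithmetic (the lane counts and data rates below are illustrative assumptions, not Wave's actual configuration):

```python
def serdes_power_mw(lanes, gbps_per_lane, mw_per_gbps):
    """Total SerDes power in mW for a set of identical lanes."""
    return lanes * gbps_per_lane * mw_per_gbps

# Figures from the article: ~4 mW/Gbps up to 10 Gbps, and
# ~6.5 mW/Gbps above 16 Gbps, for the 16nm low-power SerDes.
p10 = serdes_power_mw(lanes=16, gbps_per_lane=10, mw_per_gbps=4.0)
p25 = serdes_power_mw(lanes=16, gbps_per_lane=25, mw_per_gbps=6.5)
print(f"16 lanes @ 10 Gbps: {p10 / 1000:.2f} W")  # 0.64 W
print(f"16 lanes @ 25 Gbps: {p25 / 1000:.2f} W")  # 2.60 W
```

At tens of lanes per chip edge, even fractions of a mW/Gbps compound into a meaningful slice of a 0.16kW deskside power budget, which is why the SerDes efficiency mattered to Wave.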

Wave Computing estimates that AI training can use 80% of data center capacity. Deploying a faster and more energy efficient data center resource for this activity could dramatically change AI performance and its carbon footprint. A side benefit will be the ability to allow for more experimentation and ultimately the development of better algorithms. A data scientist could have a supercomputer at their deskside for their dedicated use. Alternatively, with lower power densities and more compact packaging, serious computing power can move from the cloud closer to the edge to provide lower latency and better data fusion.

Comprehensive information about Analog Bits’ analog IP can be found on their website. To learn more about Wave Computing’s solution that uses this IP, I suggest looking on their website and checking out the white papers on their dataflow technology.

About Wave: Wave Computing is revolutionizing the deep learning industry by enabling organizations to drive better business value from their data with its roadmap of WaveFlow™ computing systems. The company’s innovative system solutions based upon dataflow technology provide high-performance training and high-efficiency inferencing at scale, bringing deep learning to customers’ data wherever it may be. Wave Computing was named a Machine Learning Industry Technology Innovation Leader by Frost & Sullivan, and a Top 25 Artificial Intelligence Provider by CIO Application Magazine.

About Analog Bits: Founded in 1995, Analog Bits, Inc. (www.analogbits.com), is a leading supplier of mixed-signal IP with a reputation for easy and reliable integration into advanced SoCs. Products include precision clocking macros such as PLLs & DLLs, programmable interconnect solutions such as multi-protocol SERDES and programmable I/Os, as well as specialized memories such as high-speed SRAMs and TCAMs. With billions of IP cores fabricated in customer silicon and design kits supporting processes from 0.35-micron to 7nm, Analog Bits has an outstanding heritage of "first-time-working" with foundries and IDMs.

Also Read:

7nm SERDES Design and Qualification Challenges!

CEO Interview: Alan Rogers of Analog Bits

IP development strategy and hockey


My advice to the world’s entrepreneurs: Copy and steal the Silicon Valley way

by Vivek Wadhwa on 05-07-2018 at 7:00 am

In a videoconference hosted by Indian startup media publication Inc42, I gave entrepreneurs some advice that startled them. I said that instead of trying to invent new things, they should copy and steal all the ideas they can from China, Silicon Valley and the rest of the world. A billion Indians coming online through inexpensive smartphones offer Indian entrepreneurs an opportunity to build a digital infrastructure that will transform the country. The best way of getting started on that is not to reinvent the wheel but to learn from the successes and failures of others.

Before Japan, Korea and China began to innovate, they were called copycat nations; their electronics and consumer products were knockoffs from the West. Silicon Valley succeeds because it excels in sharing ideas and building on the work of others. As Steve Jobs said in 1994, "Picasso had a saying, 'Good artists copy, great artists steal,' and we have, you know, always been shameless about stealing great ideas." Almost every Apple product has features that were first developed by others; rarely do its technologies wholly originate within the company.

Mark Zuckerberg also built Facebook by taking pages from MySpace and Friendster, and he continues to copy products. Facebook Places is a replica of Foursquare; Messenger video imitates Skype; Facebook Stories is a clone of Snapchat; and Facebook Live combines the best features of Meerkat and Periscope. This is another of Silicon Valley's secrets: if stealing doesn't work, then buy the company.

By the way, they don’t call this copying or stealing; it is “knowledge sharing.” Silicon Valley has very high rates of job-hopping, and top engineers rarely work at any one company for more than three years; they routinely join their competitors or start their own companies. As long as engineers don’t steal computer code or designs, they can build on the work they did before. Valley firms understand that collaborating and competing at the same time leads to success. This is even reflected in California’s unusual laws, which bar noncompetition agreements.

In most places, entrepreneurs hesitate to tell others what they are doing. Yet in Silicon Valley, entrepreneurs know that when they share an idea, they get important feedback. Both sides learn by exchanging ideas and developing new ones. So when you walk into a coffee shop in Palo Alto, those you ask will not hesitate to tell you their product-development plans.

Neither companies nor countries can succeed, however, merely by copying. They must move very fast and keep improving themselves and adapting to changing markets and technologies.

Apple became the most valuable company in the world because it didn’t hesitate to cannibalize its own technologies. Steve Jobs didn’t worry that the iPad would hurt the sales of its laptops or that the music player in the iPhone would eliminate the need to buy an iPod. The company moved forward quickly as competitors copied its designs.

Technology is now moving faster than ever and becoming affordable to all. Advances in artificial intelligence, computing, networks and sensors are making it possible to build new trillion-dollar industries and destroy old ones. The new technologies that once only the West had access to are now available everywhere. As the world’s entrepreneurs learn from one another, they will find opportunities to solve the problems of not only their own countries but the world. And we will all benefit in a big way from this.

For more, follow me on Twitter: @wadhwa and visit my website: www.wadhwa.com.


Automotive FD-SOI Update

by Daniel Nenni on 05-07-2018 at 7:00 am


We have been tracking automotive related articles on SemiWiki since 2015 and have published more than 300 automotive blogs thus far that have garnered more than one million views. The automotive publishing pace has picked up quite a bit lately and the number of domains reading them has increased exponentially. So yes, automotive is a big deal for the semiconductor business, absolutely!

GlobalFoundries has a very nice automotive landing page to get you started. If you look at the market overview you will see the different market segments: Powertrain, Advanced Driver Assistance (ADAS), Infotainment, Body (human) Electronics, Instrument Cluster, Chassis and Safety, and EV.

FD-SOI is also a popular topic on SemiWiki. Since 2013 we have published 90 blogs that have been viewed close to one million times. FD-SOI is also one of the most commented-on topics on SemiWiki. In fact, when I first blogged about FD-SOI I immediately thought of two markets: China and Automotive.

Today the automotive market is hot around the world, especially in China. So the question I had for GlobalFoundries, after their latest FD-SOI design win with Arbe Robotics for resolution imaging radar to enable safety for autonomous cars, was: how does FD-SOI compare to FinFET or bulk planar technology for reliability aging?

Reliability aging is one of my biggest concerns with automotive semiconductors, especially when talking about autonomous cars. My wife and I generally keep our cars for about 10 years (100,000 miles) and we keep our iPhones for 1-2 years. The national average for new car ownership seems to increase every year and is close to 6 years according to Kelley Blue Book. But the average life of a vehicle is at an all-time high of 10.8 years, which matches our habits since we generally buy new cars and keep them.

Jamie Schaeffer, Sr. Director of Product Line Management at GF, gave me the answer I was looking for:

All technologies are qualified to industry standard specifications; performance is the variable that is maximized while still satisfying the reliability limits of a predefined lifetime specification. 22FDX, however, provides three unique advantages for reliability aging compared to bulk planar or FinFET technologies:

First, utilizing body-bias to boost frequency has inherently less impact on reliability lifetime compared to the traditional approach of applying voltage overdrive. Body-bias has zero to minimal impact on typical reliability parameters (TDDB, BTI, HCI) compared to voltage overdrive, which strongly accelerates reliability aging and restricts design parameters such as temperature, lifetime, area, or PPM.

Second, by using adaptive body-bias techniques the designer can compensate for reliability aging effects in their design. This enables power and area efficiencies by designing with less margin and compensating, real-time, for aging induced performance degradation through the use of on-die process monitors, controller IP, and body-bias generators. The use of body-bias is also more precise, less noisy, and less sensitive to IR drop than applying voltage overdrive techniques.
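The adaptive body-bias scheme Jamie describes is essentially a closed feedback loop around an on-die process monitor. Here is a highly simplified software model of that idea (the thresholds, step sizes, and monitor interface are all invented for illustration, not GF's controller IP):

```python
def adjust_body_bias(monitor_delay_ps, target_delay_ps,
                     bias_mv, step_mv=25, bias_range=(-300, 300)):
    """One iteration of a toy adaptive body-bias loop: if the on-die
    process monitor reports the critical path slowing with age, apply
    more forward body bias to restore speed; back off when the part
    runs faster than needed (saving leakage)."""
    if monitor_delay_ps > target_delay_ps:
        bias_mv += step_mv      # speed up aging-degraded silicon
    elif monitor_delay_ps < target_delay_ps - step_mv:
        bias_mv -= step_mv      # reduce bias when comfortably fast
    lo, hi = bias_range
    return max(lo, min(hi, bias_mv))

# Aged part: the monitor reads 1050 ps against a 1000 ps target.
bias = 0
for _ in range(3):
    bias = adjust_body_bias(1050, 1000, bias)
print(f"applied forward body bias: {bias} mV")  # 75 mV after 3 steps
```

The point of the real scheme is that this compensation happens continuously in silicon, so the designer can margin for today's silicon rather than for ten-year-old silicon.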

Third, 22FDX has superior soft-error rate for alpha and neutron radiation compared to bulk technologies, giving it an intrinsic advantage for radiation-induced aging effects. For single-cell upsets 22FDX demonstrates 30x better FIT/Mbit (failures in time per Mbit) and for multi-cell upsets 22FDX demonstrates >1000x better FIT/Mbit compared to the historical trend of bulk technologies including 28, 40, and 45nm nodes. The improved SER in 22FDX saves layout space and computation overhead for ECC.
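To put the 30x single-cell figure in perspective, SER scales linearly with memory size, so the improvement can be applied directly (the bulk baseline FIT rate below is a hypothetical number chosen for illustration, not from the article):

```python
def array_fit(mbits, fit_per_mbit):
    """Soft-error rate of an SRAM array, in FIT
    (expected failures per 10^9 device-hours)."""
    return mbits * fit_per_mbit

BULK_FIT_PER_MBIT = 1000.0                  # hypothetical bulk baseline
FDX_FIT_PER_MBIT = BULK_FIT_PER_MBIT / 30   # 30x better, per the article

mbits = 64
print(f"bulk  : {array_fit(mbits, BULK_FIT_PER_MBIT):8.0f} FIT")
print(f"22FDX : {array_fit(mbits, FDX_FIT_PER_MBIT):8.1f} FIT")
```

A 30x lower raw upset rate is what lets a design reach an automotive PPM target with lighter ECC, which is where the layout and computation savings come from.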

Jamie is one of my favorite sources/speakers at GF. He has a PhD in Materials Science and Engineering from the University of Texas at Austin and started his career at Freescale in 1997 before joining GF in 2010. If you have more questions for Jamie, post them here and I will make sure they get answered.