
EUV Continues Roll Out With Lumpy Quarters Ahead
by Robert Maire on 04-20-2018 at 12:00 pm

ASML put up good results, with revenue of €2.285B versus the street's €2.22B and EPS of €1.26 versus the street's €1.17. Guidance is for €2.55B versus the street's €2.46B, but EPS of €1.16 versus street EPS of €1.35 on lower gross margins, slipping from 48% to 43%.

A couple of EUV systems have slipped out. This is not surprising given the hugely complex systems and surrounding environment. We would expect to see more slips going forward as customer acceptance and ability to ship may be impacted by many unpredictable variables.

The obvious issue is that the very high cost of EUV systems can make for lumpy quarters if a couple of systems shift around as we are seeing. We think this shifting is not likely to get better and could potentially get worse as customers may delay shipments or installs as they work the bugs out of their EUV process flow.

The gross margin may be more of a concern, as investors could become worried that costs are getting away from the company as tools slip or problems arise that add to costs. Part of the issue may simply be product mix, lower-margin EUV versus more DUV as experienced in Q1, but we think we need to watch for increasing costs associated with problems.

ASML is not the second domino after Lam
Investors have cause for concern after Lam's meltdown yesterday and today, now followed by ASML's less-than-stellar guide. While both stocks' problems are due to guidance issues, they are not tied to the same root cause.

In Lam's case, the concern is about a rollover in memory spending; in ASML's case it is more related to the lumpy roll out of EUV and the associated margin variation and revenue instability. The only commonality is that business is at levels so high that it's difficult for things to get much better from here, which means there is only one direction of travel.

Everyone’s in line for EUV & lining up for high NA
ASML said they will no longer talk about backlog. The company has a very solid backlog, as no one wants to be shut out of EUV, or left behind, when the industry makes the transition. However, by the same token, no one wants to have a bunch of unproductive, very expensive EUV tools sitting around while bugs are being worked out of the process.

This is a very difficult balancing act and will contribute to pushes and pulls in EUV shipments. Though ASML surely wants certainty in ship dates and revenue, customers may want to game the system a bit more. With the announcement of 3 orders for high NA systems we can be sure that means Intel, Samsung & TSMC are the first 3 in line.

Big memory exposure
Memory sales jumped to 74% of business in March, up from 53%, which paralleled Lam's jump to 84% memory business. This obviously helped out with DUV's higher margins. As we mentioned yesterday, this also refocuses investor attention on less memory-centric tool makers such as KLAC, NANO or NVMI, who could experience a dip much as Lam or ASML did, though that may not be the case, as yield management tools sell on a different cycle from wafer fab tools such as those from ASML and Lam.

Multiple E Beam
ASML also announced a significant step in its E Beam program with the successful test of a 9-parallel-beam system. Though this is a very long way from a highly parallel system of many more beams, it is nonetheless a positive step that bears watching.

KLA missed E Beam when it canceled its program; an engineer from that program went on to found the E Beam effort at Hermes, which was highly successful and is now inside ASML on its way to multi-beam. KLA will have to double down on the actinic inspection program it has revived, as ASML is not slowing down and continues to push all things EUV as hard as possible.

The stocks
Both ASML and LRCX are fighting to stay above their $200 price points. We think this is a critical support level that both stocks have continued to dance around. We think they will continue to be weak, as there aren't any catalysts to go out and buy the shares. We wouldn't abandon ship and dump our position; we simply have no strong incentive to put more money to work in these names as we are fighting a negative tape and negative sentiment for the group. The secular story for both ASML and LRCX remains strong, but the near-term news will likely stop forward progress.


The Intention View: Disruptive Innovation for Analog Design
by Daniel Nenni on 04-20-2018 at 7:00 am

Intento Design builds responsive analog EDA. The ID-Xplore tool is used for analog design acceleration and technology porting at the functional level, helping companies move analog IP quickly between technology nodes and across business units. The Intention view is a simple, elegant, and powerful concept that gives the speed of digital design to the analog designer for the first time. Intento Design receives a lot of questions on how the Intention view was conceived and how it works.

The following is a Q&A discussion with Dr. Caitlin Brandon of Intento Design:

How did Intento Design create the Intention view? Well, analog EDA has proven very difficult for many reasons, but mostly because the approach actually used by analog designers was not considered. By training, analog designers are much more pen-and-paper than digital designers, and those pen-and-paper processes are counterintuitive to automate.

For instance, the first training an analog designer receives is usually the DC bias of an amplifier. The DC bias and sizing are calculated with pen and paper, transistor size data is entered in the schematic by hand, and an AC performance simulation is started with a SPICE simulator. Only then are the performance results of the hand-calculated DC bias seen for the first time. Clearly, using a pen-and-paper approach, getting to the performance results is slow.

But here's the interesting thing: before starting an AC analysis, the SPICE simulator first calculates the DC operating point using a DC solver. Likewise, a TRAN analysis also requires a starting DC operating point calculation, with energy stored in components. For Periodic AC, a more complex analysis necessary for modern mixed-signal analog systems, multiple DC operating points are calculated using – you guessed it – the DC solver. Spot the trend? Yes – SPICE simulation relies on the DC solver.

There it is – if a designer could directly choose their optimal DC operating point, the AC, TRAN and other performances would be met in an almost straightforward manner. The importance of DC analysis has been underestimated in analog design automation. And the EDA industry has failed to provide a tool for design of the analog circuit DC bias – until now.

So, the Intention view is used for DC bias and transistor size calculations? Yes, ID-Xplore uses the testbench directly for performance analysis, and the innovation of the Intention view provides the ability to quickly calculate transistor size data.

In creating the Intention view, it became really clear that calculating the DC bias is itself not a simple act to follow. During calculation of a DC bias, the analog designer uses substantial explicit/trained knowledge and implicit/gained knowledge. Their trained knowledge obviously involves applied circuit analysis, such as Kirchhoff's laws and known circuit topologies, whereas their gained knowledge may involve knowing by experience where signal frequency content will be lost due to the physical implementation. The designer understands – at a glance at the schematic – how analog sub-blocks work together to achieve block-level performance. To calculate a good DC operating point, the analog designer translates sub-block DC bias parameters into a first-order model of some performance objectives and then calculates preliminary transistor sizing. It is this unique combination of advanced trained knowledge and advanced gained knowledge that makes analog circuit design challenging.

So – if it isn’t broke, why fix it?
Exactly. The analog designer knows how to describe their DC bias. What they really need is an automation tool to help them explore a range of bias and sizing in a technology PDK. Intento Design responded to this analog automation gap with the ID-Xplore tool and the Intention view.

What is the Intention view?
The Intention view is a text-based description of the analog circuit DC bias, written in a way which is very similar to the pen-and-paper version. The Intention view itself is technology independent, and an exploration can be done in any PDK. Electrical and physical parameters described in the DC bias, such as channel inversion, node voltage, branch current, transistor length – the list goes on – can be explored in just minutes to hours. With many varying parameters, the potential number of DC bias points can be quite large, often in the hundreds of millions, which obviously requires ID-Xplore automation to handle.
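
To make the scale of that exploration concrete, here is a rough back-of-the-envelope sketch in C. The parameter names and sweep counts are purely illustrative assumptions, not Intento's numbers: sweeping even a handful of DC bias parameters multiplies out to millions of candidate bias points, which is why hand exploration quickly becomes impractical.

#include <stdio.h>

/* Hypothetical illustration only: a handful of swept DC bias parameters
 * multiply out to a huge number of candidate bias points. Names and
 * counts are made up for illustration. */
int main(void) {
    const char *param[] = { "branch current", "overdrive voltage", "node voltage",
                            "channel length", "inversion level" };
    int steps[] = { 40, 30, 25, 20, 15 };   /* assumed sweep points per parameter */
    long long points = 1;
    for (int i = 0; i < 5; i++) {
        points *= steps[i];
        printf("after sweeping %-17s: %lld candidate DC bias points\n", param[i], points);
    }
    /* 40*30*25*20*15 = 9,000,000; a couple of additional parameters
     * push the count into the hundreds of millions. */
    return 0;
}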

So, as a way of thinking about it, the Intention view is old-school – just like the pen-and-paper approach – while the ID-Xplore tool is new-school, advanced automation enabling large-scale exploration of the designer's intentions (the Intention view).

But what happens to the performance specifications? As a plug-in tool, ID-Xplore evaluates the testbench performance specifications at full SPICE accuracy for each individual DC bias point. Transistor sizing, previously handcrafted into transistor parameter fields in traditional design, is now fully automated. The ID-Xplore tool uses the OpenAccess database to back-annotate the designer-selected DC bias and transistor sizing.

Does the Intention view play well with others – for collaboration? Yes, because collaboration and individual member experience are among the strongest factors impacting analog design team performance, the Intention view and ID-Xplore enable both knowledge transfer and training. Engineers share Intention views directly when they share the schematic database. For training, the ID-Xplore tool is useful to understand circuit sensitivity and design impacts at the performance level of the PDK. Exploration can uncover trends or verify hard limits of performance, illustrating where to concentrate design effort or when it is necessary to implement an entirely different schematic topology.

In short, the Intention view was created to allow designers to design, document and share a vision for the analog circuit performance as they see it – transistor by transistor.

Is Intention-based exploration faster than size-based optimizers?
Yes. Analog designers are plenty smart. At Intento Design, we like to think we are plenty smart too – for designing EDA tools. Here’s why.

Size-based optimizers use the SPICE DC solver on the whole circuit – even when only minor incremental or local size changes are required, resulting in substantial excess calculation. In addition, being size-based, the exploration requires many steps and can sometimes produce non-realizable results.

For ID-Xplore, we took a different approach. Using graph-theory, we analyzed the nodes and edges of the DC solver matrix and created a structural approach. While graph-theory itself can be very complex, the simple fact is that the number of SPICE calculations is much, much lower. In addition, variation of electrical parameters, rather than size, ensures that an exploration stays within a designer-validated region.

Analog designers appreciate the graph-based structural approach right away. Structural, of course, refers to the fact that the arrangement of the transistors in the circuit is taken into account while calculating size and bias. For instance, take a differential pair with a varying tail bias current. Assuming the input gate voltage is fixed, the differential pair source voltage varies with the current. Consequently, changes can occur in the tail current transistor's drain voltage and the differential pair's bulk-source voltage. Graph theory allows ID-Xplore sizing operations to accurately transmit branch-level, or even circuit-wide, information so that local transistor sizing is correct-by-construction, taking into account both gross changes (current) as well as local effects, such as threshold voltage variation. When you think about it, this is exactly what the analog design engineer does with pen and paper.
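
For readers who want to see what that pen-and-paper, first-order step looks like, here is a minimal sketch in C using the textbook square-law model; the transconductance parameter, overdrive voltage and tail currents are assumed illustrative values, not PDK data, and this is not Intento's algorithm. Given a tail current and a chosen overdrive, the per-device W/L and gm of the differential pair fall out directly, and changing the tail current re-sizes the devices, which is the kind of propagation the structural approach automates.

#include <stdio.h>

/* Pen-and-paper style first-order sizing of a differential pair (illustrative
 * only): square-law model Id = 0.5*kp*(W/L)*Vov^2, so W/L = 2*Id/(kp*Vov^2)
 * and gm = 2*Id/Vov. Real devices need the full PDK model. */
static double w_over_l(double id, double kp, double vov) {
    return 2.0 * id / (kp * vov * vov);
}

int main(void) {
    const double kp  = 200e-6;  /* assumed transconductance parameter, A/V^2 */
    const double vov = 0.15;    /* chosen overdrive voltage, V */
    const double tails[] = { 50e-6, 100e-6, 150e-6, 200e-6 };  /* tail currents, A */

    for (int i = 0; i < 4; i++) {
        double id = tails[i] / 2.0;     /* each input device carries half the tail */
        double gm = 2.0 * id / vov;     /* first-order transconductance */
        printf("Itail = %3.0f uA -> per-device W/L = %5.1f, gm = %4.2f mS\n",
               tails[i] * 1e6, w_over_l(id, kp, vov), gm * 1e3);
    }
    return 0;
}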

So, the pen-and-paper approach of the analog designer really is best? Yes, that's true. Compared with a size-based optimizer, which is a brute-force approach, the pen-and-paper approach is already more intelligent. The Intention view, combined with the exploration capabilities and data analytics of ID-Xplore, is designed to mirror quite closely the applied training of the analog designer. This means corporate investment in analog design team knowledge is not lost – but accelerated.

How does the Intention view enable technology porting? This is simple to understand. The Intention view is a parameterized description of the DC bias. And, while default values may be set inside the Intention view, it's really inside the ID-Xplore tool that values are assigned for exploration in any given PDK. To move from one technology to another, the ID-Xplore tool is simply pointed at another technology PDK and the default parameters are adjusted for the exploration.

What tool is used to create the Intention view? The Intention view is created inside the Constraint Editor™ of Cadence. To do this, a transistor or group of transistors is selected, and then the Intento Design pull-down menu in the Constraint Manager tab is selected, which opens the constraint entry field. The constraint data fields take parameterized electrical descriptions of the DC bias parameters, such as bias current and overdrive voltage. Once complete, the Intention view is exported to the ID-Xplore tool for design exploration using testbenches already set up in ADE-XL or ADE Assembler.


Figure 1 Creation of the Intention view inside the Constraint Editor of Cadence

Can you show an example of an ID-Xplore Intention view exploration? Yes, the following image shows design curve results in ID-Xplore. Each curve shows the specifications that resulted from a unique DC bias point; together, the design curves show the range of performances for the number of design points simulated. Selecting a specific solution shows the electrical values in the design point, the performances, and the transistor size data. In this way the ID-Xplore tool really displays DC bias versus performance, and size data is directly available for back-annotation.


Figure 2 ID-Xplore showing results of exploration of Intention view

How fast is Intention-based exploration using ID-Xplore?
Fast. Moving into 2018, Intento Design has worked on enhanced data display and faster operations to produce an almost 200x speed-up under the hood of ID-Xplore. With multi-core parallel partitioning and advanced graph analysis, exploration of the DC bias of a multi-stage, fully-differential 75-transistor CMOS amplifier now takes only a few minutes.


Figure 3 Design acceleration using ID-Xplore

Minutes?
Well, the creation of the Intention view itself can take an hour or more, because this is really the analog hand-crafting aspect of the tool, but the exploration takes only minutes – yes. And, once created, the Intention view stays attached to the schematic to enable more exploration or technology porting. We believe ID-Xplore with the Intention view is an elegant, powerful and disruptive analog EDA tool – putting the speed of digital design into the analog designer's hands for the first time.


Data Breach Laws 0-to-50 States in 16 Years
by Matthew Rosenquist on 04-19-2018 at 12:00 pm

It has taken the U.S. 16 years to enact data breach laws in every state. California led the way with the first, in 2002, to protect its citizens. Last in line was Alabama, which signed its law in March 2018. There is no overarching, consistent data breach law at the federal level; it is all handled independently by each state. This causes some confusion, as there are different standards and requirements. Businesses must understand and conform to each, in addition to all the international privacy laws.

Over the past decade, privacy compliance has become a massive bureaucratic beast, requiring policies, lawyers, audits, and oversight to meet a sometimes vague and complex regulatory landscape that is often changing. A legion of privacy professionals now exists throughout the world.

All for Good Reason

The world of technology leapt beyond the limits of paper records, which were difficult to duplicate, share, and transmit. We have successfully created a world where digital information can easily be created and disseminated across the planet in the blink of an eye. This has led to the desire to gather more data on people and their behaviors. Their financial status, social influence, purchasing preferences, political viewpoints, and many other facets are valuable to influencers and product vendors.

Innovation adoption moved too fast and mistakes were made. Companies developing products and services were far too quick to begin gathering such valuable knowledge nuggets about their customers. Consumers and governments were lax or greatly delayed in establishing proper controls to protect people's data. End users were blasé about what they shared and who could obtain it, and chose to remain ignorant of how it could be used to their detriment. It all seemed harmless, until it wasn't.

Unscrupulous yet profitable data sharing crept into the mix. Criminals realized a windfall of nearly unprotected data was just waiting for them to scoop up. The results began to turn the mindset of society: data was valuable, even in the wrong hands.

People were being manipulated and treated with unfair bias, based upon private data that was now in the open. Personal financial data and healthcare records were the first major issues. Fraudsters who obtained a few select pieces of information could cause an economic tornado for victims, opening credit lines and loans, making fraudulent purchases, and even filing for fake tax refunds. Harvesting login credentials and passwords opened systems and services to manipulation and hacking. Even subtle data collection, such as web browsing habits, searches, and product purchases, was used to create profiles that marketeers could wield to improve sales. Recently, social media connections have been used to manipulate the attention economy to sway viewers' political and social opinions. It is a free-for-all, fueled by personal data.

Rules to Play Nice

Late as they are in arriving, it has become apparent that regulations are needed to establish guardrails that begin to enforce boundaries on data gathering, handling, and protection, and to stem the hemorrhaging losses.

It has been a long sixteen years to get a fundamental data breach law on the books in every state. The first privacy laws in the U.S. are primarily focused on breach notification. That is only the first step. Like Europe, we must also address collection, protection, fair use, and the ability for subjects to correct and control their data. The upcoming EU General Data Protection Regulation (GDPR) is the latest step, unifying privacy regulation across the European Union. The U.S. is far more fragmented and less comprehensive.

Enforcement is also required. Stiff penalties help encourage compliance and can take many forms. Regulatory fines, litigation, and customer loyalty are all plausible forces to shift protections toward users and away from other self-serving entities. In the U.S. the damages for non-compliance vary but are considered minimal. The GDPR, however, can penalize a company up to 4% of its global revenue, which establishes a new high-water mark. Overall, no one carrot or stick will be a quick fix, but progress, maturity, and stability are needed.

This is a race. We must move faster, with greater purpose, and better foresight in cooperation with businesses, consumers, and legislatures if we are to limit damages while enabling the technology everyone wants in their lives.

Interested in more? Follow me on your favorite social sites for insights and what is going on in cybersecurity:

LinkedIn, Twitter (@Matt_Rosenquist), YouTube, Information Security Strategy blog, Medium, and Steemit


Meltdown, Spectre and Formal
by Bernard Murphy on 04-19-2018 at 7:00 am

Once again Oski delivered in their most recent Decoding Formal session, kicking off with a talk on the infamous Meltdown and Spectre bugs and possible relevance of formal methods in finding these and related problems. So far I haven’t invested much effort in understanding these beyond a hand-waving “cache and speculative execution” sort of view so I found the talk educational (given that I’m a non-processor type); I hope you will find my even more condensed summary an acceptable intro to the more detailed video (which you can find HERE).


The speaker was Mike Hamburg of Rambus, who, along with other teams such as Google Project Zero, was involved in the research on these vulnerabilities (summarized in the image above). I'll only consider the Meltdown part of his talk, starting with a simple example he used. Consider this line of code:

result = foo[bar[i] * 256];

In the simple world, to execute this you look up the value referenced by bar, multiply it by 256 and look up what that references in the array foo. In even a modest OS with a distinction between kernel mode and user mode, a user-mode process will fault on a privilege check if it tries to access kernel-only memory. This is part of how you implement secure domains in memory. Only trusted processes can access privileged regions. Particularly for virtual machines running in a cloud, you expect these types of walls between processes. I can't look at your stuff and you can't look at mine.

But we want to run really fast, so the simple world-view isn't quite right. Instead, processors work on many instructions per clock cycle, allowing for 100 or more operations to be in flight simultaneously, including instructions ahead of the current instruction. These speculative executions work with whatever values the processor reasonably assumes they may have: data values already in cache, a fetch triggered if a needed value is not yet in cache, a prediction about whether a branch will be taken, or a variety of other guesses. When the current program counter catches up, if the guess was correct the result is ready; if not, the processor has to unwind that stage and re-execute. Despite the misses and consequent rework, overall this works well enough that most programs run much faster.

The problem, as Mike put it, is that the unwinding is architectural but not micro-architectural. When a speculatively-executed instruction turns out to have been wrong, the processor winds back the appropriate instructions to re-execute. But other stuff that happened during the speculative execution doesn’t get wound back. Data was fetched into cache and branch predictors were updated. Who cares? It doesn’t affect the results after all. Cache fetches in these cases even prove beneficial apparently – more often than not they save time later on.

This is where Meltdown happens but in a fairly subtle way, through cache-timing side-channel attacks. Daniel Bernstein showed in 2005 that it is possible to extract secret data for the AES encryption algorithm through such attacks, simply by measuring encryption times with a precision timer for a series of known plaintext samples and running a simple analysis on those values. In a system with cache these times vary, which is what ultimately leaks information; you run enough data through the system and you can reconstruct the key.
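
To make the side-channel mechanism concrete, here is a minimal sketch (x86 intrinsics under GCC/Clang, assumptions noted in the comments) of the measurement primitive such attacks build on, not an exploit: it simply shows that a load hitting in cache is measurably faster than one whose line has been flushed, and that the difference is visible to ordinary user code.

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtscp, _mm_clflush, _mm_lfence (x86 only, GCC/Clang) */

/* The measurement primitive behind cache-timing side channels: a load that
 * hits in the cache takes far fewer cycles than one whose line was flushed. */
static uint64_t time_access(volatile uint8_t *p) {
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    (void)*p;                                   /* the load being timed */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

static uint8_t probe[4096];

int main(void) {
    probe[0] = 1;                               /* touch the line: now cached */
    uint64_t hit = time_access(&probe[0]);

    _mm_clflush(&probe[0]);                     /* evict the line from the caches */
    _mm_lfence();                               /* make sure the flush completes first */
    uint64_t miss = time_access(&probe[0]);

    printf("cached  access: ~%llu cycles\n", (unsigned long long)hit);
    printf("flushed access: ~%llu cycles\n", (unsigned long long)miss);
    return 0;
}

A Meltdown-style attack uses exactly this kind of timing check, but on an array indexed by speculatively read secret data, so the cache footprint left behind reveals the secret.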

In case you hoped this might be an isolated problem for AES, these kinds of attacks are more broadly applicable and not just for getting encryption keys. Mike made the point that without mitigation, a Meltdown-enabled attack can effectively read all user memory. A defense, especially in the cloud, is to use separate (memory) page tables per process, which may not be a big deal on modern server-class CPUs but can carry a 30% performance penalty on older CPUs.

Another approach would be to make all memory accesses take the same time, at least within certain domains, or disallow speculation without bounds checks on critical operations or … In fact given the complexity of modern processor architectures, it’s not easy to forestall or even anticipate all possible ways an attack might be launched. In Mike’s view formal can play a role in detecting potential issues, though he thinks this would be limited to small cores today. So not yet server-class cores, but I’m sure the big processor guys are working hard on the problem.

This would start by writing a contract for covert channels – what should and should not be possible. Mike feels this can't be a blanket attempt to make attacks impossible – maybe that could only happen if we forbid speculation. But we could, on small machines, define contracts to bound general execution versus execution around privileged/secret operations and/or characterize those cases where such guarantees cannot be given. Then a careful crypto-programmer, for example, could write code in such a way that it would not be susceptible, or at least less so, to this class of attack.

Mike wrapped up with some general observations. Absent banning speculative execution, we’re likely to need careful analysis of covert channels. Perhaps we need to rely on slower but more resilient secure code (allowing for memory access checks even in speculation – I think I heard that AMD already does this). We also need to plan more for secure enclaves inaccessible by any form of external code, privileged or not. It was unclear in his mind whether TrustZone or similar systems could rise to this need today (maybe they can, but that’s not yet known). Certainly it seems more and more desirable to run crypto in a separate core or (if still run in software) supported by dedicated instructions hardened against side-channel attacks. I suspect there will be a lot of interest in further advancing proofs of resilience to such attacks. In formal at least this won’t be available anytime soon in apps – this is going to take serious hands-on property-development and checking.


RDC – A Cousin To CDC
by Alex Tan on 04-18-2018 at 12:00 pm

In post-silicon bringup, it is customary to bring the design into a known state prior to applying further test sequences. This is achieved through a Power-on-Reset (POR) or similar reset strategy, which translates to initializing all the storage elements to a known state.

During design implementation, varying degrees of constraining may be applied to the reset signal. For example, a designer may impose a multicycle path (MCP) constraint in order to avoid unneeded timing optimization on the reset logic (although a check for slew violations is still necessary). In this article we will discuss reset mechanisms and RDC (reset domain crossings).

Just like the notion of hard- or soft-reboot in system bringup, we could first categorize this initialization step into hard/soft-reset as captured in figure 1.
In synchronous designs, asynchronous reset de-assertion causes metastability issues and unpredictable values in the memory elements. This increases the risk of not having a stable design initialization. The snapshot in figure 2 illustrates the issue, in which the reset signal de-asserts during the active clock edge – causing metastability as well as randomly initialized register values. To avoid non-determinism, synchronization of the reset de-assertion is needed.

Synchronous and Asynchronous Reset
Let's probe into the flip-flop element which sits at the center of this phenomenon. In the standard cell library, this storage element or register may come in two flavors, i.e., with or without a reset option. On the other hand, in the design RTL code, registers may be pre-instantiated or left to be inferred during logic synthesis, depending on whether the logic designer would like to control the type of registers used. If inferred, logic synthesis will also infer the reset implementation and select registers from the library with the corresponding set/reset configuration.

In FPGA designs, a slice may contain a cluster of registers sharing a set of control signals such as clock, enable and reset. The frequent endorsement of non-resettable registers in FPGA design stems from better device utilization in both Shift Register LUTs (SRL) and Block RAM (BRAM), although care should be given to initializing registers to known values during functional simulation, as registers with undefined "X" states are very common occurrences here.


Comparing two types of reset implementation, each comes with advantages and disadvantages.

Unlike ASICs, FPGA designs implement a Power-on-Reset function. It initiates loading of the bitstream and configures the LUTs. The bitstream contains the initial values for every register and RAM bit in the device. Registers are initialized throughout the configuration process, and the Global Set Reset (GSR) signal keeps the device in a non-operational mode until the end of the configuration stage.

Reset Synchronization Techniques
Since the reset signal is external to the device and asynchronous, it needs to be synchronized. The conversion of the external reset into an internal, synchronized reset can be achieved through multi-stage registers. A minimum of two clock cycles is required to ensure the minimum reset pulse width is met. However, depending on the type of registers used (i.e., non-resettable), the reset synchronizer could require as many as ten clock cycles. It is recommended that any reset de-assertion be done only with a stable, actively running clock. For example, in some subsystems containing finite state machines or counters, all registers must come out of reset on the same clock edge to prevent illegal state transitions.
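
As a purely behavioral illustration (plain C, not RTL, and not tied to any particular library cell or vendor flow), the classic two-flop reset synchronizer can be sketched as follows: assertion of the external reset is asynchronous and immediate, while de-assertion is sampled by the clock and needs two active edges to reach the internal logic.

#include <stdio.h>
#include <stdbool.h>

/* Behavioral model (not RTL) of a two-flop reset synchronizer: assertion of
 * the external active-low reset clears both stages immediately; de-assertion
 * is sampled on the clock and needs two active edges to reach the core. */
typedef struct { bool ff1, ff2; } reset_sync_t;

static bool clock_edge(reset_sync_t *s, bool ext_rst_n) {
    if (!ext_rst_n) {          /* asynchronous assertion */
        s->ff1 = s->ff2 = false;
    } else {                   /* synchronous de-assertion: shift a '1' through */
        s->ff2 = s->ff1;
        s->ff1 = true;
    }
    return s->ff2;             /* synchronized internal active-low reset */
}

int main(void) {
    reset_sync_t s = { false, false };
    const bool ext_rst_n[] = { false, false, true, true, true, true };  /* released at cycle 2 */
    for (int cyc = 0; cyc < 6; cyc++)
        printf("cycle %d: external rst_n=%d -> internal rst_n=%d\n",
               cyc, (int)ext_rst_n[cyc], (int)clock_edge(&s, ext_rst_n[cyc]));
    return 0;
}

Running this shows the internal reset releasing two clock edges after the external release, which is where the two-clock-cycle minimum mentioned above comes from.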

Similar to the solution for clock domain crossings (CDC), an NDFF (multi-flop) synchronizer can also be utilized across two clock domains. Since each domain has its own minimum pulse-width requirement, a pulse stretcher can be inserted ahead of the synchronizer to ensure that the minimum pulse width is met. In FPGAs, glitch prevention may be warranted because non-resettable registers are used; initializing these registers to the same value as the external asynchronous reset signal should avoid possible reset glitches.


FPGA resources such as SRLs and BRAMs contain non-resettable registers and may introduce non-determinism, as the GSR net can release different storage elements in different clock cycles. This in turn triggers a chain reaction causing some registers to "wake up" one or more clock cycles earlier than others. Coupled with any sequential loop-back condition, this may corrupt the initialization values and lead the design into an unpredictable state.

The selection of non-resettable registers also accommodates synthesis optimization techniques commonly seen in Intel's devices, such as register retiming, pipelining or other register-related netlist modifications. Register-specific optimizations are done only in the absence of an asynchronous reset. For a list of design practices that could help eliminate non-determinism, refer to this.

In designs with multiple reset signals targeting different sections of the system, RDC can occur. These signals introduce asynchronous reset assertion events in each reset domain, which may lead to metastability and unpredictable design initialization (see figure 3a).
The frequency of reset operations also increases susceptibility to the RDC effect; on the other hand, a proper reset ordering sequence should minimize its occurrences. Linting tools such as Aldec's ALINT-PRO help designers identify RDC and other aspects of reset domains, and help generate design assertions to confirm that proper reset sequences are implemented in the design.

RDC synchronization methods include isolating the receiving-domain register from the source-domain register. An enable such as "iso_en" is asserted prior to the assertion of "rst1". The receiving register then holds its value while the asynchronously reset FF's data changes, as seen in figure 3b. Through the linting step, the isolation cells used in RDC prevention can be identified, and verification code can be generated to ensure correct operation of these "iso_en" signals relative to reset assertions. For a discussion of other techniques, refer to this.

Despite occurring less often than CDC, RDC has been getting more attention, especially for heterogeneous designs with complex reset strategies and segregated regions needing frequent reset sequences. Preventing and fixing such conditions can be addressed through the use of both linting tools and subsequent functional verification.

For info on Aldec's ALINT-PRO™, please check here.
For Aldec's white paper on RDC, please find it here.


Artificial Intelligence calls for Smart Interconnect
by Tom Simon on 04-18-2018 at 7:00 am

Artificial Intelligence based systems are driving a metamorphosis in computing, and consequently precipitating a large shift in SOC design. AI training is often done in the cloud and has requirements for handling huge amounts of data with forward and backward data connections. Inference usually occurs at the edge and must be power efficient and fast. Each of these imposes new requirements on computing systems. Training puts a premium on throughput and inference relies on low latency, especially for real time applications like ADAS.

To accommodate these new requirements, there are sweeping changes occurring in computational architectures. In much the same way that mini- and then micro- computers changed the landscape of computing, the changes necessitated to support AI will permanently alter how things are done.

The what and how of these changes was the topic of a presentation given by NetSpeed at the Linley Processor Conference on April 11th in Santa Clara. The presentation, by Anush Mohandass, VP of Marketing at NetSpeed, discussed how a smart interconnect fabric helps enable embedded AI applications. Their first point was that AI is making its way into a large and broad range of applications, including vision, speech, forecasting, robotics and diagnostics, among others.

Inside these new SOCs there is a new data flow. A large number of small, efficient compute elements need to perform peer-to-peer data exchange rapidly and efficiently. There will be many multicast requests, and the transfers should be non-blocking. Indeed, QoS becomes very important. Previous architectures operated differently, with processing units using a central memory as an interchange system.

AI systems need ‘any-to-any’ data exchanges that benefit from wide interfaces and need to support long bursts. However, the central requirement is that all the elements need to be active simultaneously. Naturally, it is easy to see that this can lead to power management issues that should be resolved with aggressive clock gating and traffic sensitive optimizations.

NetSpeed talked about their approach, which can help enable SOCs that have requirements like those imposed by AI applications. They provide the logic needed to integrate, coordinate and control the large number of types and instances of IPs in an SOC. This covers many facets: interconnect, cache coherency, system level cache, system level debug, bandwidth allocation, QoS controls, power management, and clock crossings. With so many parameters and requirements, what is really needed is a design environment specifically geared to implementing the optimal solution.

This is something NetSpeed offers. It supports an architectural design approach that starts off with a specification, and then helps work through the various tradeoffs. Their design environment provides feedback along the way and is checking for design correctness continually.

NetSpeed offers Orion for creating non-coherent interconnect. Their Gemini offering is for coherent system backbones. Their Crux backbone is architecture agnostic. Finally, for programmable L2, L3, and LLC cache they offer Pegasus. Their design environment assists with design and assembly. They use a machine learning based cognitive engine to help with implementation. The system outputs extensive data analytics and visualizations.

In much the same way that TCP/IP offers a multi-layered protocol that provides abstraction for data transmission on the internet, NetSpeed's SOC solution uses a multi-layer protocol implementation to provide optimal performance and the highest throughput. With this come QoS, multicast support and non-blocking behavior, as needed for AI processing.

The NetSpeed presentation went into greater depth on the technology and is well worth reviewing. The big takeaway is that entirely new design approaches will be necessary to accommodate the needs of AI in future SOCs. It may come to pass that we look back at CPU-based computing the way we do punched cards and magnetic tapes.


Tensilica 5th Generation DSP: Mix of Vision and AI
by Eric Esteve on 04-17-2018 at 12:00 pm

Cadence has launched the new Tensilica Vision Q6 DSP IP, delivering 1.5x more performance than the former Vision P6 DSP IP and 1.25x better power efficiency. According to Cadence, the mobile industry is moving from traditional feature-based embedded vision to AI-based algorithms, even if all use cases still have a mix of vision and AI operations. The result is a need for both vision and AI processing in the camera pipeline, translating into the implementation of both the Vision Q6 DSP and the C5 DSP to cover the complete camera processing pipeline.

Implemented in the Huawei Mate 10, the Cadence Vision DSP enables advanced imaging applications like HDR video, image stabilization or hybrid zoom with two scene-facing cameras. Compared to a CPU or GPU, the Vision P6, and now the Q6, helps meet high-resolution video capture requirements thanks to its high performance, and battery-life requirements thanks to much better energy efficiency. The Vision P6 IP core also serves as the unit for AI processing in the MediaTek P60, which MediaTek calls the Mobile APU.

If you look at the way MediaTek communicates about the P60, AI capability is highlighted as much as the power of the four ARM Cortex-A73 CPUs: "users can enjoy AI-infused experiences in apps with deep-learning facial detection (DL-FD), real-time beautification, novel, real-time overlays, object and scene identification, AR/MR acceleration, enhancements to photography or real-time video previews and much more."

Cadence Vision DSPs are also implemented in chips supporting automotive applications, like the GW5400 camera video processor (CVP) from GEO Semiconductor, where the Vision DSP enables ADAS functions such as pedestrian detection, object detection, blind spot detection, cross traffic alert, driver attention monitoring, lane departure warning, as well as target-less auto calibration (AutoCAL®). For such a device, energy efficiency is key to meeting the very-low-power, zero-airflow requirements of automotive cameras.

According to Mike Demler, senior analyst at The Linley Group: "SoC providers are seeing an increased demand for vision and AI processing to enable innovative user experiences like real-time effects at video capture frame rates. The Q6 offers a significant performance boost relative to the P6, but it retains the programmability developers need to support rapidly evolving neural network architectures. This is a compelling value proposition for SoC providers who also want the flexibility to do both vision and AI processing."

The race for higher performance in vision processing is impacting all kinds of applications, as is the emerging need to implement local AI engines. If we take a look around, we can list:

Mobile

Over the next 4 years there will be a 3x increase in dual cameras, and projections show that dual-sensor smartphones will reach a 50%/50% share of shipments in 2020.
On-device AI experiences at video capture rates are now a feature that helps differentiate smartphone suppliers.

AR/VR Headsets

In robotic mapping and navigation, simultaneous localization and mapping (SLAM) is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent’s location within it. Latency requirements for SLAM and image processing are decreasing, pushing again the need for speed.
On-device AI is required for object detection/recognition, gesture recognition and eye tracking.

Surveillance cameras

There is a need for increased camera resolution and image enhancement techniques, and for on-device AI for family/stranger resolution and anomaly detection.

Automotive

This is probably the most demanding segment, as it requires an increase in both the number of cameras and camera resolution. On-device AI is clearly a "must have" in ADAS for pedestrian/object recognition.

Drones and robots

360° capture at 4K or greater resolutions and advanced computer vision for autonomous navigation are required, as well as on-device AI for subject and scene recognition.

To increase performance, some obvious solutions, like increasing SIMD width or VLIW slots to bring more parallelism, implementing multiple cores to multiply the processing power, or simply running the processor at a higher frequency, have severe drawbacks in terms of power consumption, area impact or programming model.

Cadence has reworked the processor architecture, now based on a 13-stage pipeline, and the Vision Q6 can reach a 1.5GHz peak frequency. Compared with the Vision P6, the Q6 delivers 1.5x the performance for vision and AI applications, 1.5x the frequency in the same floorplan area, and 1.25x better energy efficiency at Vision P6 peak performance. To compare apples with apples, these data come from implementations in a 16nm process in both cases.

As we can see in the above picture, the complete architecture of the Tensilica Vision Q6 DSP has been reworked, with a deeper pipeline, improved system bandwidth, and imaging and AI enhancements for this 5th-generation Vision DSP IP.

By Eric Esteve from IPnest


Sometimes a Solver is a Suitable Solution
by admin on 04-17-2018 at 7:00 am

Traditional rule-based RC extractors rely on a substantial base of assumptions, which are increasingly proving unreliable. Having accurate RC extraction results for parasitic R's and C's is extremely important for ensuring proper circuit operation and for optimizing performance and power. Advanced process nodes are making it more difficult to get sufficiently accurate parasitics using rule-based extractors. The problem is twofold: the design data given to the extractor looks less and less like the actual fabricated physical design, and rules are becoming less accurate due to increasingly complex structures in the circuits. These problems occur in both BEOL and MEOL.

During a webinar in March, Dr. Garrett Schlenvogt of Silvaco gave some examples of the divergence between rule-based extraction and the more accurate solver-based approach. Using a ring oscillator, Garrett showed how, as metal structures become more complex, delays simulated with rule-based extraction diverge from measured and solver-based delays. The figure below illustrates this.

Another point that Garrett made during the webinar is that the 3D geometry to be analyzed needs to match the results from the fabrication process, not just the idealized 3D extrapolation of the 2D layout. He outlined the many factors that need to be considered. In advanced designs there are multiple dielectrics and metals. The geometries are not nicely stratified and metals frequently are not planar. In addition, the metal cross sections are not rectangular. The image below gives an idea of the complexity of fabricated 3D structures.

Clearly a solver cannot be used on large designs, but there are many cases where it can be used not just at the device level but also at the circuit level. In Silvaco's Victory Process, a simple step-by-step sequence describes each step in the fabrication process, which is then applied to the mask information. Users can toggle between precise physical modeling and a simplified final representation, depending on accuracy requirements.

The output of Victory Process is passed to Victory Mesh for meshing. For non-TCAD users it's easy enough to take the interconnect portion of the design into Clever 3D, Silvaco's field-solver-based extraction tool. This produces a netlist, including parasitics, suitable for SPICE simulation. The result is a flow that is much easier to deploy than a classic TCAD approach but gives the benefit of extremely high accuracy.

Because their modeling of physical fabrication steps is comprehensive, there are applications for this flow in many other domains besides FinFET/CMOS. Garrett touched on TFT/LCD/OLED, power devices such as DMOS/IGBT/SiC/GaN, optical, and even rad-hard applications. Another of his examples showed a conformal metal interconnect modeled with and without 3D considerations. The figure below shows the difference in the resistance value results.

During the webinar Garrett mentioned several interesting applications that can benefit from accurate RC extraction. One of these was MEMS capacitors. Another application he highlighted was CCD sensors. Garrett closed with an example containing a memory cell. Along with the parasitics, Silvaco generates a 3D model that can be viewed to ensure the processing steps are properly defined and that the resulting structure is correct.

For engineers looking for the most accurate results, field solver based extraction is the first choice. A field solver based extractor can also be used to verify a rule based approach. However, for full chip and high capacity designs a rule based approach will be needed. The entire webinar, with much more information than we could cover here, is available on the Silvaco website.


Functional Safety – the Analytics
by Bernard Murphy on 04-17-2018 at 7:00 am

ISO 26262 is serious stuff, the governing process behind automotive safety. But, as I have observed before, it doesn’t make for light reading. The standard is all about process and V-diagrams, mountains of documentation and accredited experts. I wouldn’t trade a word of it (or my safety) for a more satisfying read, but all that process stuff doesn’t really speak to my analytic soul. I’ve recently seen detailed tutorials / white-papers from several sources covering the analytics, which I’ll touch on in extracts in upcoming blogs but I’ll start with the Synopsys functional safety tutorial at DVCon, to set the analytics big picture (apologies to the real big picture folks – this is a blog, I have to keep it short).

To open, chip and IP suppliers have to satisfy Safety Element out of Context testing requirements under Assumptions of Use, which basically comes down to demonstrating fault avoidance/control and independent verification for expected ASIL requirements under expected integration contexts. Which in turn means that random hardware failures/faults can be detected/mitigated with an appropriate level of coverage (assuming design/manufacturing faults are already handled).

Functional safety analysis/optimization then starts with a failure mode and effects analysis (FMEA), a breakdown of the potential functional modes of failure in the IP/design. Also included in this analysis is an assessment of the consequence of the failure and the likely probability/severity of the failure (how important is this potential failure given the project use-modes?). For example, a failure mode for a FIFO would be that the FULL flag is not raised when the FIFO is full, and a consequence would be that data could be overwritten. A safety mechanism to mitigate the problem (assuming this is a critical concern for projected use-cases) might be a redundant read/write control. All of this obviously requires significant design/architecture expertise and might be captured in a spreadsheet or a spreadsheet-like tool automating some of this process.

The next step is called failure mode and effects diagnostic analysis (FMEDA) which really comes down to “how well did we do in meeting the safety goal?” This document winds up being a part of the ISO 26262 signoff so it’s a very important step where you assess safety metrics based on the FMEA analysis together with planned safety mechanisms where provided. Inputs to this step include acceptable FIT-rates or MTBF values for various types of failure and a model for distribution of possible failures across the design.
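
As a rough illustration of what the FMEDA roll-up computes, here is a hedged sketch with hypothetical FIT rates and diagnostic coverages (the first failure mode echoes the FIFO example above): the residual, uncovered failure rate is compared against the total safety-related failure rate to give a single-point fault metric of the kind that is then judged against ASIL targets.

#include <stdio.h>

/* Illustrative FMEDA roll-up with made-up FIT rates and diagnostic coverages.
 * Residual (uncovered) failure rate vs. total safety-related failure rate
 * gives a single-point fault metric: SPFM ~= 1 - sum(residual FIT)/sum(total FIT). */
typedef struct {
    const char *mode;
    double fit;        /* failure rate of this mode in FIT (failures per 1e9 hours) */
    double dc;         /* diagnostic coverage of its safety mechanism, 0..1 */
} failure_mode_t;

int main(void) {
    const failure_mode_t fm[] = {
        { "FIFO FULL flag not raised", 12.0, 0.99 },   /* e.g. redundant read/write control */
        { "address decoder fault",      8.0, 0.90 },
        { "config register bit flip",   5.0, 0.60 },
    };
    double total = 0.0, residual = 0.0;
    for (int i = 0; i < 3; i++) {
        total    += fm[i].fit;
        residual += fm[i].fit * (1.0 - fm[i].dc);
    }
    double spfm = 1.0 - residual / total;
    printf("total = %.1f FIT, residual = %.2f FIT, SPFM = %.1f%%\n",
           total, residual, 100.0 * spfm);
    /* For reference, ISO 26262-5 asks for SPFM of roughly 90% (ASIL B),
     * 97% (ASIL C) and 99% (ASIL D). */
    return 0;
}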

Here’s where we get to fault simulation along with all the usual pros and cons of simulation. First, performance is critical; a direct approach would require total run-times comparable to logic simulation time multiplied by the number of faults being simulated, which would be impossibly slow when you consider the number of nodes that may have to be faulted. Apparently, Synopsys’ Z01X fault simulator is able to concurrently simulate several thousand faults at a time (I’m guessing through clever overlapping of redundant analysis – only branch when needed), which should significantly improve performance.

There are two more challenges: how comprehensively you want to fault areas in the design and, as always, how good your test suite is. Synopsys suggests that at the outset of what they call your fault campaign, you start with a relatively low percentage of faults (around a given failure mode) to check that your safety mechanism meets expectations. Later you may want to crank up fault coverage depending on confidence (or otherwise) in the safety mechanisms you are using. They also make a point that formal verification can significantly improve fault-sim productivity by pre-eliminating faults that can’t be activated or can’t be observed (see also Finding your way through Formal Verification for a discussion on this topic).

An area I find especially interesting in this domain is coverage – how well is simulation covering the faults you have injected? The standard requires determining whether the effect of a fault can be detected at observation points (generally the outputs of a block) and whether diagnostic points in a safety mechanism are activated in the presence of the fault (e.g. a safety alarm pin is activated). A natural concern is that the stimulus you supply may not be sufficient for a fault to propagate. This is where coverage analysis, typically specific to fault simulation, becomes important (Synopsys provides this through Inspect Fault Viewer).

At the end of all this analysis, refinement and design improvement you get estimated MTBFs for all the different classes of fault which ultimately roll up into 26262 metrics for the design. These can then be aligned to the standard required for the various ASIL levels.

Now that’s analytics. You can learn more about the Synopsys safety solutions HERE.


Samsung is Starting 7nm Production with EUV in June
by Scotten Jones on 04-16-2018 at 12:00 pm

There is a report in the Seoul Economic Daily that Samsung has completed development of their 7nm process using EUV and that production will begin in June. What is claimed in the report is:

  • The process is installed in the Hwaseong S3 Fab
  • Samsung has more than 10 EUV systems installed
  • Production starts in June with Qualcomm, Xilinx, Apple and HiSilicon as customers (Author's correction: the original article was in Korean and the source I used got the translation wrong; apparently only Qualcomm was listed in the original article)

Initially, when I read this, I was skeptical, but the more I have thought about it and investigated the various elements of this claim, the more I have come to believe this report is largely true. The following is my rationale:

According to my tracking of 300mm wafer fabs as published in the IC Knowledge 300mm Watch Database, the Hwaseong fab has 4 phases. Phase 1 is for DRAM, phase 2 is for 3D NAND, and phases 3 and 4 are known as S3 phase 1 and 2 and are for logic. S3 phase 1 is for 10nm and S3 phase 2 for 7nm. The S designation is used by Samsung for its foundry logic fabs. This fab "cluster" is also known to be Samsung's EUV hub, so 7nm production with EUV in "S3" makes sense and is consistent with this site and our expectations for how it will be used.

I gave a talk on EUV at ISS this year that I wrote up here. While researching EUV status for that talk, I tried to determine where every EUV system installed to date is located. My conclusion was that Samsung had approximately 10 EUV systems installed, consistent with the article's assertion of more than 10 installed systems.

The biggest surprise in this article is the idea that development of 7nm with EUV is done. At the SPIE Advanced Lithography Conference this year it seemed like everyone just woke up to the stochastic issues with EUV (I wrote that up here).

Simply put, dose is given by:

Dose = photon energy x number of photons

EUV photons have roughly 10x the energy of deep-UV photons, so for the same dose there are roughly 10x fewer EUV photons. This contributes to a variety of stochastic effects such as shot noise and photoresist issues. There is, however, a simple fix for this – run a higher dose. The problem with running a higher dose is the impact on throughput.
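
The photon arithmetic can be made concrete with a short worked example; only physical constants are used, and the 20mJ/cm² dose matches the ASML baseline quoted below. For the same dose, the photon count per unit area at 13.5nm comes out roughly an order of magnitude lower than at 193nm, which is exactly the shot-noise concern.

#include <stdio.h>

/* Worked version of the dose arithmetic: photon energy E = h*c/lambda, so for
 * the same dose (energy per area) EUV delivers roughly an order of magnitude
 * fewer photons than 193nm DUV -- the root of the shot-noise concern. */
int main(void) {
    const double h = 6.626e-34;       /* Planck constant, J*s */
    const double c = 2.998e8;         /* speed of light, m/s */
    const double q = 1.602e-19;       /* J per eV */
    const double lambda[] = { 193e-9, 13.5e-9 };
    const char  *name[]   = { "DUV 193 nm ", "EUV 13.5 nm" };
    const double dose = 20e-3 * 1e4;  /* 20 mJ/cm^2 expressed in J/m^2 */

    for (int i = 0; i < 2; i++) {
        double e_photon = h * c / lambda[i];            /* energy per photon, J */
        double per_um2  = (dose / e_photon) * 1e-12;    /* photons per square micron */
        printf("%s: E = %5.1f eV, photons at 20 mJ/cm^2 = %.2e per um^2\n",
               name[i], e_photon / q, per_um2);
    }
    return 0;
}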

The following slide from my ISS talk illustrates the effect of dose on throughput.

Figure 1. EUV Throughput

To date, the throughput numbers that ASML has published are based on a 20mJ/cm² dose with 96 steps and no pellicle. Logic devices generally require around 110 steps, and Samsung is expected to be using EUV for metal layers, so they will need a pellicle. I have been hearing rumors that Samsung is using a 50mJ/cm² dose. The overall impact of the higher dose, pellicle and more steps is that throughput will only be around 60 wafers per hour (wph) (please note that since this slide was created, ASML has achieved 140 wph for 96 steps, 20mJ/cm² and no pellicle).

Assuming Samsung is in fact running at 50mJ/cm², that may be sufficient to get around most of the worst stochastic issues and produce a usable process.

The question then becomes whether they would be willing to accept such low throughput and therefore increased costs. Once again there is a relevant rumor, which is that "foundries" are accepting that they may have to absorb higher initial costs for EUV wafers. Samsung is also a company that is rumored to use brute force to get a process started. At 10nm it was said that in the beginning, when yields were low, they simply ran a lot of wafers to get shippable parts. Perhaps in order to get EUV started they will accept low throughput and high costs to be first to production and to start the high-volume learning process.

The combination of what is known about EUV and the rumors about Samsung make me believe that we will in fact see Samsung begin to ship 7nm wafers using EUV starting in June. Likely this will be by running the EUV systems in a way that delivers low throughput and high costs and there may be yield issues as well, but this will make Samsung the first to enter production with EUV.

I will say the customer list surprises me; I thought Apple was at TSMC for 100% of its 7nm business, and I thought Qualcomm and Xilinx were also TSMC 7nm customers. But the rest of this report is credible in my opinion.