Eta Compute Receives Two Awards from ARM at TechCon

by Tom Simon on 11-19-2018 at 7:00 am

Many startups set out with the goal of accomplishing a technical feat that was previously considered impossible. Quite frankly, most do not succeed. Occasionally, though, a company comes along that succeeds with a game-changing breakthrough. Eta Compute has done just this. Even more impressively, this 3-year-old company has developed not just one "impossible" technological achievement but two. Better still, it already has working products that incorporate them. Consequently, it is positioned to radically change artificial intelligence processing on edge devices.

Eta Compute has announced its TENSAI AI platform, which is based on the ARM Cortex M3 processor, and has demonstrated a 30X reduction in energy consumption for image classification. Using only 0.4mJ per image, it has bested a previously published energy consumption figure of 30mJ for the same task on a different processor. The unique technology enabling this is Eta Compute's delay-insensitive asynchronous logic (DIAL). Not only does it save tremendous amounts of power, but it also enables dynamic voltage and frequency scaling and near-threshold voltage operation.


Their novel processor, based on their asynchronous logic, won two awards at ARM TechCon: Design Innovation of the Year and Best Use of Advanced Technologies. By implementing the extremely popular and proven Cortex M3 processor with dramatically improved power efficiency, Eta Compute has opened up opportunities for more comprehensive AI processing on edge devices. When edge devices can efficiently run neural networks, a large number of new applications open up. At the same time, overall power consumption, latency, and bandwidth are reduced.

The TENSAI processor offers a Cortex M3 running at up to 100MHz, with sub-1µA sleep current, 512KB of Flash and 128KB of SRAM. It has an 8/16-bit dual-MAC DSP, as well as independent SRAM for the M3 and the DSP. Lastly, it has DMA engines to ensure efficient data transfer from I/O to memory. Its highly efficient PMIC adjusts voltage to keep the operating frequency constant despite process and voltage variations. An interesting characteristic of the device is how well its current per MHz scales. For instance, running CoreMark at 3.3V and 10MHz, the processor draws 13.3µA/MHz; at 100MHz it draws 18.1µA/MHz.
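The per-MHz figures are easier to interpret as total current. The sketch below is a back-of-the-envelope calculation, not an Eta Compute tool; it assumes the quoted µA/MHz values multiply directly by clock frequency to give average active current, and that this current is drawn from the 3.3V supply. The helper names are illustrative.

```python
def total_current_ua(ua_per_mhz: float, freq_mhz: float) -> float:
    """Average active current in microamps, from a uA/MHz efficiency figure."""
    return ua_per_mhz * freq_mhz


def supply_power_mw(current_ua: float, volts: float) -> float:
    """Supply power in milliwatts: P = V * I, converting I from uA to mA."""
    return volts * current_ua / 1000.0


# CoreMark operating points quoted in the article, both at a 3.3V supply
OPERATING_POINTS = [(13.3, 10), (18.1, 100)]  # (uA/MHz, MHz)

for ua_per_mhz, freq_mhz in OPERATING_POINTS:
    i_ua = total_current_ua(ua_per_mhz, freq_mhz)
    p_mw = supply_power_mw(i_ua, 3.3)
    print(f"{freq_mhz:>3} MHz: {i_ua:7.1f} uA, {p_mw:.2f} mW")
```

Under those assumptions, going from 10MHz to 100MHz raises total current from about 133µA to about 1.81mA, roughly 13.6X for a 10X increase in clock rate.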

The second technical accomplishment Eta Compute has under its belt is the implementation of a Spiking Neural Network (SNN). This kind of network works more like its biological analogues than traditional CNNs do. SNNs require fewer neurons and need only addition operations rather than multiply-accumulates. This can make them 100X more efficient than traditional CNNs. Eta Compute says that its SNN is capable of unsupervised learning with no data labels, which makes it ideal for anomaly detection applications.
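To illustrate why spiking networks can get by with additions alone, here is a minimal integrate-and-fire neuron. It is a generic textbook-style model, not Eta Compute's SNN; the function name, the integer weights, and the constant subtractive leak are assumptions chosen so that every update is an add or subtract. Because input spikes are binary, a spike simply adds its synaptic weight to the membrane potential.

```python
from typing import List, Sequence


def integrate_and_fire(
    spike_trains: Sequence[Sequence[int]],
    weights: Sequence[int],
    threshold: int,
    leak: int = 0,
) -> List[int]:
    """Simulate one integrate-and-fire neuron over discrete time steps.

    spike_trains[t][i] is 1 if input i spiked at step t, else 0.
    Returns the neuron's output spike (0 or 1) for each step.
    """
    potential = 0
    output = []
    for step in spike_trains:
        for fired, w in zip(step, weights):
            if fired:
                potential += w  # binary spike: accumulate the weight, no multiply
        potential = max(potential - leak, 0)  # constant subtractive leak
        if potential >= threshold:
            output.append(1)
            potential -= threshold  # reset by subtraction
        else:
            output.append(0)
    return output


if __name__ == "__main__":
    trains = [[1, 0], [1, 1], [0, 0], [1, 1]]
    print(integrate_and_fire(trains, weights=[3, 2], threshold=5, leak=1))
```

With weights [3, 2], a threshold of 5 and a leak of 1, this prints [0, 1, 0, 0]: only the second step, where both inputs spike together, pushes the potential over threshold.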

Eta Compute is actively exploring new application areas for their technology. For instance, always-on wake-up features are familiar to anyone who uses Google Home or Alexa. This is an ideal edge operation, as it needs to run fast and not require sending large amounts of raw data to the cloud. Data reduction at the edge is another promising technique. Edge neural networks can crop and filter data intelligently so smaller data sets are sent to the cloud for more intensive processing. Eta Compute says they already have a number of novel engagements to apply their technology in agriculture, retail, factories and even in data centers where there are large numbers of sensors deployed to monitor environmental parameters.

By combining its hardware and software advantages, Eta Compute seems to be in a strong position to enable AI at the edge, where the impact will be significant in the coming years. We have already seen the changes brought about by pervasive connectivity. Anyone looking at what is ahead can see that there will be even more dramatic changes coming from pervasive AI. More information on Eta Compute's technology and product can be found on their website.


Why Apple failed in India and how it can recover

by Vivek Wadhwa on 11-18-2018 at 12:00 pm

Apple iPhone sales in India are expected to have fallen dramatically this year to two million, from three million phones last year. Reuters reports that during Diwali, the peak shopping season, Apple stores were deserted. This occurred in the world's fastest-growing smartphone market, where sales often increase by more than 20% every quarter.

Yet Apple's loss of the Indian market was entirely predictable. In a Washington Post column of March 2017, I described Apple's repetition in India of the mistakes it made in China: relying entirely on its brand recognition to build a market for its products there. Rather than attempt to understand the needs of its customers, Apple made insulting plans to market older and inferior versions of iPhones to its Indian customers, and lost their loyalty.

The iPhone no longer stands out as it once did from its competition. Chinese and domestic smartphones boasting capabilities similar to those of the iPhone are now available for a fraction of the iPhone's cost. Samsung's high-end phones have far more advanced features. And, with practically no brand recognition among the hundreds of millions of Indians who are buying their first devices, Apple does not have the kind of product lock-in it enjoys with western consumers who have owned other Apple products and are now buying smartphones. Apple also made no real attempt to customize its phones or applications to address the needs of Indian consumers; they are the same as in the United States. Siri struggles no less on an Indian iPhone than on a U.S. one to recognize an Indian name or city or to play Bollywood tunes.

It wasn’t even their technical superiority that made the earlier iPhones so appealing to the well-to-do in India; it was the status and accompanying social gratification they offered. There is no gratification in buying a product that is clearly inferior. Indian consumers who can afford iPhones want the latest and greatest, not hand-me-downs.

So Apple could hardly have botched its entry into the Indian market more perfectly.

And it’s not just Apple’s global distribution and marketing strategy that needs an overhaul. The company needs to rethink the way it innovates. Its pursuit of perfection is out of touch with the times.

The way in which innovation happens now is that you release a basic product and let the market tell you how to make it better. Google, Facebook, Tesla, and tens of thousands of startup companies are always releasing what are called minimum viable products, functional prototypes with the most basic of features. The idea is to get something out as quickly as possible and learn from customer feedback. That is because in the fast-moving technology world, there is no time to get a product perfect; the perfected product may become obsolete even before it is released.

Apple hasn’t figured that out yet. It maintains a fortress of secrecy, and its leaders dictate product features. When it releases a new technology, it goes to extremes to ensure elegant design and perfection. Steve Jobs was a true visionary who refused to listen to customers — believing that he knew better than they did about what they needed. He ruled with an iron fist and did not tolerate dissension. And people in one Apple division never knew what others in the company were developing; that’s the kind of secrecy the company maintained.

Jobs’s tactics worked very well for him, and he created the most valuable company in the world. But, since those days, technological change has accelerated and cheaper alternatives have become available from all around the globe.

Apple’s last major innovation, the iPhone, was released in June 2007. Since then, Apple has been tweaking that device’s componentry, adding faster processors and more-advanced sensors, and releasing it in larger and smaller form factors — as with the iPad and Apple Watch. Even Apple’s most recent announcements were uninspiring: yes, yet more smaller and larger iPhones, iPads, and watches.

There is a way in which Apple could use India's market to its advantage: to make it a testbed for its experimental technologies. No doubt Apple has a trove of products that need market validation and that are not yet perfect, such as TV sets, virtual-reality headsets, and new types of medical devices. India provides a massive market that will lap up the innovations and provide critical feedback. Apple could develop these products in Indian languages so that they aren't usable back at home, and price them affordably for Indian customers.

To the visionaries who once guided Apple, experimenting with new ideas in new markets would have been an obvious possibility to explore. Taking instead the unimaginative option of dumping leftovers on a prime market suggests that Apple’s present leaders have let their imaginations wither on the vine.

For more, follow me on Twitter: @wadhwa and visit my website: www.wadhwa.com


AMAT and the Jinhua Jinx!

by Robert Maire on 11-18-2018 at 7:00 am

Applied Materials reported a quarter that was just "in line," but guidance was well below street expectations. AMAT reported EPS of $0.97 and revenues of $4.01B versus street estimates of $0.97 and $4B. Guidance missed the mark by a wide margin, with revenues of $3.56B to $3.86B and EPS of $0.75 to $0.83 versus already reduced street expectations of $3.94B and $0.92. Applied's stock was down almost 10% in after-market trading.
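A quick way to size the miss is to compare the midpoint of each guidance range with the street figure. Using the midpoint is an assumption made here for illustration; it is not necessarily how Applied or the analysts frame the comparison.

```python
def midpoint(low: float, high: float) -> float:
    """Center of a guidance range."""
    return (low + high) / 2


def shortfall_pct(guide_mid: float, street: float) -> float:
    """How far the guidance midpoint sits below the street estimate, in percent."""
    return (street - guide_mid) / street * 100


rev_mid = midpoint(3.56, 3.86)  # revenue guidance, $B
eps_mid = midpoint(0.75, 0.83)  # EPS guidance, $

print(f"Revenue midpoint ${rev_mid:.2f}B, {shortfall_pct(rev_mid, 3.94):.1f}% below street")
print(f"EPS midpoint ${eps_mid:.2f}, {shortfall_pct(eps_mid, 0.92):.1f}% below street")
```

On that basis, the revenue guide sits about 5.8% below the street and the EPS guide about 14.1% below.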

In our preview note yesterday, we predicted that Applied would disappoint versus reduced expectations, and it certainly did.

Jinhua Jinx
We suggested that the Jinhua loss, which was downplayed by KLAC and LRCX, would be worse at AMAT due to its extra China exposure. We were proven correct: AMAT management laid most of the blame for the weak Q/Q guidance on Jinhua, saying that revenue would have been flat to up without the Jinhua issue.

Share Slump
We also said in our preview that we were concerned about share loss, and Applied said on the call that share loss was the second reason for the worse-than-expected guidance.

Gary Dickerson, CEO of Applied, said that current conditions "do not play to Applied's strengths," which is code for share loss. Management suggested that the EUV rollout was part of the share loss issue, something we have been talking about for a long time: multi-patterning use will be reduced, and fabs will spend more on litho-related tools.

“Can you Canoe?”
Management also said that we will see more of a "shallow and gradual recovery" rather than the previously expected quicker comeback. This suggests more of a "U" or "canoe" shaped cycle bottom than previously expected.

It seems very clear that 2019 will be weaker than the 2018 WFE peak.

We had suggested in our preview that AMAT would buy back fewer shares, which would not help offset reduced EPS the way buybacks did at Lam, which bought back a slug of stock to pump up EPS. In the quarter, AMAT bought back about $750M in stock, so the EPS weakness was more apparent. AMAT may be keeping some of its powder dry and may amp up its buybacks if the share price drops further.

Service Saves & Display is OK
Service was very strong and up 18% while systems were down by 5%. Display at $700M was not bad and in line with plan. Applied said that going forward systems could be down 21% but service up 7%.

Applied harder hit
The main issue we pointed to in our preview note was that we thought AMAT would get hit harder than its peers in the group and this report seems to underscore that view and prove our thesis correct.

Lam is also exposed to similar share weakness but perhaps has less China exposure. ASML, with its EUV rollout and long lead times, is perhaps the most immune to the current weakness. KLAC is also more resistant to the near-term issues as it supports the EUV rollout, but it does have high China exposure at 25% of business.

The Stocks
We expect AMAT to get hit for the disappointment, and it's enough of a miss, with a weak enough guide, to take the rest of the group down in sympathy. We have been saying that while we are getting closer to a bottom, we are not there yet, so it's still not safe to go back in the water and buy the stocks.

The overhang from China remains far from resolved, and AMAT's report makes us painfully aware of that. There is not likely to be a near-term recovery in memory, and the weakness could continue for much of 2019. Right now there is zero visibility into the timing of a recovery, and AMAT underscored that with its "shallow and gradual" comment. We continue to stay on the safety of the sidelines, avoiding the group and more specifically Applied.


I Thought that Lint Was a Solved Problem

by Daniel Nenni on 11-16-2018 at 12:00 pm

A few months back, we interviewed Cristian Amitroaie, the CEO of AMIQ EDA. We talked mostly about their Design and Verification Tools (DVT) Eclipse Integrated Development Environment (IDE) and how it helps design and verification engineers develop code in SystemVerilog and several other languages. Cristian also mentioned their Verissimo SystemVerilog Testbench Linter, which enforces group or corporate coding guidelines for verification environments written in SystemVerilog. This stuck in my mind because it’s hard to imagine making money selling a stand-alone linter when so much related functionality is already built into the front ends of simulators and logic synthesis tools. So, I started asking more questions.

A little history lesson may be useful. The idea of a dedicated utility to examine code for programming errors and dubious constructs arose in the Unix team at Bell Labs in the late 70s, where Steve Johnson wrote the original version. The name “lint” cleverly captured the idea of getting rid of something unwanted. We have lint brushes for our clothes, lint traps in our dryers, and lint tools for our code. Many typos and simple syntax errors could be detected very early in the coding process, and over time linters added deeper analysis and the ability to find some semantic errors as well.

C programmers who used lint couldn’t imagine life without it. Its fast runtime and precise messages made it an ideal tool for use whenever code was changed. In fact, lint was sometimes added as a “screen” for checking in code to revision control systems. Linters were developed for other programming languages and became essential tools for software engineers. Then, in the late 80s and early 90s, hardware engineers started coding as well. Schematics were replaced by hardware description languages (HDLs) such as Verilog and VHDL. Hardware designers soon faced many of the same debug challenges as their programming colleagues.

In the mid-90s, InterHDL introduced Verilint, which checked for typos, races, dead code, undeclared variables, and more. As with the original lint, Verilint ran quickly and produced clear, accurate results. Other companies developed competing products but, over time, most of them were absorbed into the "Big 3" EDA vendors. Many of the linting capabilities were rolled into the front ends of other tools. This brings me back to my opening question of whether there is a place in the market today for stand-alone linters, and how AMIQ is succeeding.

For a start, Verissimo has the advantage of being faster and better suited to the linting task than simulation or synthesis. But as I dug into Verissimo further, I began to appreciate why it is so popular. One immediate asset is the number of rules: more than 450 according to Cristian, about 75% of them based on the coding guidelines of actual users. This ensures that the problems reported are real issues for real testbenches. Although Verissimo also finds problems in designs, its focus is on testbenches, which are far more complex. This code is not well checked by simulators or traditional linters, while logic synthesis tools never read in testbenches at all.

Further, testbenches use the most advanced object-oriented constructs of the SystemVerilog and Universal Verification Methodology (UVM) standards. That’s why so many rules are needed. In fact, the latest version of the IEEE SystemVerilog standard is more than 1300 pages, and the latest UVM release adds nearly 500 more. With this complexity come overlapping constructs, multiple ways of doing the same thing, and opportunities for language misuse, performance issues, and code maintenance issues. Verissimo enforces best practices for dealing with all these challenges, with rules based on problems found by real users in the past.


Figure 1: Verissimo displays the results from checking more than 450 SystemVerilog rules

Now that verification consumes much more of a project schedule than design, managers are looking closely at how to improve efficiency. With two or three verification engineers per designer, it is important for all members of the team to be aligned on coding style and interoperability. Cristian points out that testbench linting is a great way to ensure that everyone is following the rules for coding conventions, naming conventions, constructs/patterns/recipes to be used or avoided, and even the organization of testbench files. It brings new team members up to speed quickly, reducing their learning curve and aligning them to the prescribed way of coding. Verissimo also automates most of the traditional manual code review tasks, saving even more project time.

The resource demands of verification also mean that testbench code is more likely to be reused than in the past. Verification harnesses, interface checkers, and other testbench elements are often applicable to multiple generations of designs. In fact, Cristian argues that the lifespan of verification code may exceed the lifespan of actual designs. Sometimes this code lives so long that nobody remembers exactly what it does, but everybody is afraid to throw it away or rewrite it. Testbench linting plays an important role in ensuring that new and legacy code is consistent. Just as with original C lint, Verissimo is frequently used as a screen before check-in.


Figure 2: Debugging Verissimo rule violations is easy within DVT Eclipse IDE

Verissimo supports defining new rules; customizing existing rules by tuning their parameters, enabling or disabling them, or setting their severity levels; and waiving specific rule instances. Verissimo can be run from a command line, started from a regression manager, included in a continuous integration process, or invoked from within DVT Eclipse IDE. Cristian notes that it is easier to see and fix errors when running in the IDE GUI, a big improvement over early linters. All of its current users also have access to plenty of other tools, so clearly Verissimo adds unique value to the design and verification flow. I'm now sold on the idea that a stand-alone linting tool with the right set of features can succeed today. By the way, I also learned that "Verissimo" means "very true" in Italian.
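To make rule customization, severity levels, waivers, and the check-in "screen" concrete, here is a toy line-oriented rule checker. It is a hypothetical sketch, not Verissimo's API or rule set: the class names, rule IDs, and regex patterns are invented for illustration, and a real SystemVerilog linter parses the language rather than matching regexes against lines.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Rule:
    rule_id: str
    pattern: str               # regex matched against each source line
    message: str
    severity: str = "warning"  # "error", "warning" or "info"
    enabled: bool = True


@dataclass
class Violation:
    rule_id: str
    line_no: int
    severity: str
    message: str


@dataclass
class Linter:
    rules: List[Rule]
    # A waiver silences one rule at one specific line: (rule_id, line_no)
    waivers: Set[Tuple[str, int]] = field(default_factory=set)
    # Per-rule severity overrides, e.g. promote a warning to an error
    severity_overrides: Dict[str, str] = field(default_factory=dict)

    def check(self, source: str) -> List[Violation]:
        found = []
        for line_no, text in enumerate(source.splitlines(), start=1):
            for rule in self.rules:
                if not rule.enabled or (rule.rule_id, line_no) in self.waivers:
                    continue
                if re.search(rule.pattern, text):
                    sev = self.severity_overrides.get(rule.rule_id, rule.severity)
                    found.append(Violation(rule.rule_id, line_no, sev, rule.message))
        return found


def passes_checkin_screen(violations: List[Violation]) -> bool:
    """Gate a check-in: block it only when an error-severity violation remains."""
    return not any(v.severity == "error" for v in violations)


# Hypothetical team guidelines for UVM testbench code
RULES = [
    Rule("NO_DISPLAY", r"\$display\b", "use `uvm_info instead of $display"),
    Rule("NO_HARD_DELAY", r"#\s*\d+", "avoid hard-coded delays", severity="error"),
]

SV_SOURCE = """class my_driver extends uvm_driver;
  task run_phase(uvm_phase phase);
    $display("starting");
    #10;
  endtask
endclass"""

# Waive the delay on line 4 (say, a reviewed exception); the $display warning remains
for v in Linter(RULES, waivers={("NO_HARD_DELAY", 4)}).check(SV_SOURCE):
    print(f"line {v.line_no}: [{v.severity}] {v.rule_id}: {v.message}")
```

Without the waiver, the hard-coded delay on line 4 is reported as an error and `passes_checkin_screen` returns False, which is the kind of gate the original C lint provided before check-in.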

To learn more, visit https://www.dvteclipse.com/products/verissimo-linter.



SiFive Extends Portfolio with 7 Series RISC-V Cores

by Camille Kokozaki on 11-16-2018 at 7:00 am

At the recent Linley Fall Processor Conference in Santa Clara, Jack Kang, SiFive's VP of Product Marketing, introduced SiFive's Core IP 7 Series. Designed to power devices requiring Embedded Intelligence and Intelligence Everywhere, the cores offer scalability, efficient performance and customization. The Core IP 7 Series is suited for use in consumer devices (AR/VR gaming, wearables), storage and networking (5G, SSD, SAN, NAS) and AI/ML/edge (sensor hubs, gateways, IoT, autonomous machines).

The 7 Series product family includes the E7, S7, and U7 series. The E7 Core IP Series comprises the 32-bit E76 and E76-MC and provides hard real-time capabilities. The SiFive Core IP S7 Series brings high-performance 64-bit architectures to the embedded markets with the S76 and S76-MC. The SiFive Core IP U7 Series is a Linux-capable applications processor with a highly configurable memory architecture for domain-specific customization. The 64-bit U74 and U74-MC, like all SiFive U cores, fully support Linux, while the E76, E76-MC, S76, and S76-MC support bare-metal environments and real-time operating systems.

The broad portfolio of cores enabled by the 7 Series features low power consumption, 64-bit addressability, tight accelerator coupling, and support for custom instructions. According to SiFive, these features are new to the market and make the 7 Series the highest-performance commercial RISC-V processor IP available today. The SiFive Core IP 7 Series raises the bar with hardware-based, real-time capabilities and unprecedented scalability.

The 7 Series enables the sharing of common features through in-cluster heterogeneous compute, allowing users to combine E7 and S7 cores with U7 cores in a single coherent core complex, which greatly eases the software team's development effort.

More specifically the Core IP 7 Series offers:

Efficient Performance
  • ~60% improvement in CoreMarks/MHz*
  • ~40% improvement in DMIPS/MHz*
  • ~10% improvement in Fmax*

Scalability
  • 8+1 coherent CPUs in a cluster
  • 512 coherent on-chip CPUs via TileLink
  • 2048 multi-socket coherent CPUs via ChipLink

Feature Set
  • In-cluster heterogeneous compute for Application + Real-time processors
  • 64-bit architectures across the portfolio
  • Innovative L1 Memory microarchitecture

*Compared to SiFive Core IP 5 series

In storage applications, the 64-bit real-time addressability will be a key feature for big data applications to exploit. In addition, the capability for specific custom instructions will greatly supplement storage, machine learning (ML) and cryptography use cases.

Tightly integrated memories (TIM) and cache lock capabilities will benefit critical real-time workloads in 5G and networking. Configurable memory maps and coherent accelerator ports allow designers to tightly couple storage with specific accelerators. It is also possible to have coherent in-cluster combinations of application processors and real-time processors. Safety applications will be enhanced by ECC capability across the SRAMs as well as significant guarantees around deterministic performance.

AR/VR/sensor fusion applications can combine multiple SiFive Core IP series. For example, the 2, 3, 5 and 7 series can all be flexibly integrated into a single design with tight power constraints. Mixed precision arithmetic accelerates machine learning compute.

Within the 7 Series portfolio, standard cores are offered for customers who prefer existing configurations with known power, performance and area (PPA). Customers will have the option of using the standard cores as silicon-verified design starting points, with the ability to customize the 7 Series core to meet application-specific requirements.

The U7 Series contains:

  • Heterogeneous in-cluster combinations of application processor and real-time processor supported
  • Configurable Level 2 cache with cache lock capability and Tightly Integrated Memory (TIM) available
  • Functional safety, security and real-time features such as:

o SECDED ECC on all L1 and L2 memories
o PMP and MMU for memory protection
o Programmatically clear and/or disable dynamic branch prediction for deterministic execution and enhanced security

E7, S7, U7 Core Series Architectural Features

  • Dual-issue, in-order 8-stage Harvard pipeline
  • A very flexible memory system
  • Multi-core capable with coherency and optional L2 (E7, S7)
  • Deterministic fast interrupt responses
  • Higher throughput and efficiency


The E7/S7 Level 1 memory system allows access to large SRAMs on the system side and lets other masters on the SoC access that memory through the main core complex, with fast I/O ports that are ideal for attaching accelerators.

SiFive can aggregate value by giving a single deliverable to customers with all the various desired hardware design options packaged, integrated, and enabled with software development.

During a panel at the Linley Fall Processor Conference, Kang stressed the configurability of the cores, which all have the ability to change branch prediction sizes as well as L1 and L2 memory configurations. The cores can include or exclude single or double-precision floating-point-units and have the ability to add custom extensions. The ability to combine cores in a heterogeneous cluster is a unique differentiator from other core architectures that are not coherent. Kang clarified that, even though internal development is in Chisel, all deliverables to customers are in human-readable Verilog.

I got a chance to have a side chat with Jack Kang, and he clarified that heterogeneous operation refers to a mix of SiFive cores from different core series connected together to form a coherent core complex.

Kang added that, even though no single architecture rules them all, customers need general purpose programmability and control. In new AI and ML application domains, vector extensions can be added to provide functionality.

I then asked what constitutes the next success metric in achieving critical mass or adoption. Kang said that the market has started seeing commercial products with RISC-V chips this year. The next phase is seeing those products announced, launched, and shipped in volume. The situation today is that companies are hesitant to be first to adopt RISC-V in their industry while simultaneously worrying about being left behind. This has forced many companies to review their RISC-V strategies. Kang's view is that companies will be late to RISC-V if they do not come in now. Next year, RISC-V products will be out, and since products are trailing indicators, that is a sure sign to move to RISC-V now. Kang sees that companies are not merely replacing designs but are seeking advanced features and choosing RISC-V (and SiFive specifically) because it allows them to tackle unsolved problems.

Kang went on to say that a good architecture serves as table stakes but that there must also be an ecosystem to back it up. RISC-V has an ecosystem that is being globally co-developed by all RISC-V member companies (including the likes of Google and Samsung), not just SiFive (founded in 2015 by the inventors of the RISC-V ISA). The rate of growth of the software ecosystem is high, with Debian and Fedora being ported to RISC-V as evidence of momentum. The ecosystem and tools are rapidly maturing.

I asked Kang if he had a special message to convey. He stated that with the 7 Series, SiFive brings new features, such as in-cluster heterogeneous core complexes, that are needed to enable embedded intelligence. The takeaway is that "RISC-V is just not a replacement architecture. It is innovation and customization with new features enabling embedded intelligence and we are starting to see it really take off."

Other factoids:

– With SCIE (SiFive Custom Instruction Extension), customers can add custom Verilog-based instructions that execute in a single cycle or over multiple cycles. Customers can also create their own extensions and keep them secret.
– SCIE uses intrinsics for custom instruction generation, which decouples custom instructions from specific compiler versions and allows their use with standard GCC and LLVM toolchains.
– SiFive’s RISC-V Core IP will always support the latest RISC-V standard extensions.


Eliminate PCB Re-Spins Using an Integrated Multi-Dimensional Verification Platform

by Daniel Nenni on 11-15-2018 at 12:00 pm

The rapidly increasing complexity of today’s designs, combined with schedule pressure to deliver innovative products to market as quickly as possible, strains engineering resources to the limit, often to the point of breaking. As a result, 17% of all projects get canceled, and another 28% miss their target release date (Source: Lifecycle Insights – September, 2018). Project health is suffering. A more efficient design flow is needed to better utilize available engineering resources, while keeping complex projects moving forward on schedule.

The key to a more efficient design flow is the early detection and elimination of potential design issues. These potential issues can range from simple schematic errors allowed to propagate forward into layout, to complex mechanical issues, to issues impacting product testability and manufacturability. Identifying and fixing these potential issues as early in the process as possible avoids unnecessary schedule delays and costly design re-spins. It also frees up valuable engineering talent to move on to other projects.

The Conventional Design Flow
The traditional project development flow is inefficient and fraught with pitfalls. It relies far too heavily on manual reviews and costly prototypes, and verification of each design phase occurs far too late in the process. Valuable engineering resources are spent debugging errors in the lab that should have been caught during schematic entry. Errors uncovered this late in the game result in costly re-spins that once again follow the same inefficient, error-prone, manual review process.

As a result of this conventional process flow, the typical project goes through 2.9 re-spins, with an average schedule hit of 8.5 days and a cost of $44,000 per re-spin (Source: Lifecycle Insights – September 2018). For high-performance designs, the costs are often much higher. Due to the complexity of modern designs, these delays and added costs are unpredictable, so project managers tend to bake them into their schedules and budgets. This conventional approach wastes time, talent and materials, and puts projects at risk of cancellation.
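Multiplying the survey averages gives a rough per-project expectation. This is a simple illustration under two assumptions: that the 8.5-day schedule hit and the $44,000 cost both apply to each re-spin, and that the averages can be multiplied as if re-spin count, delay, and cost were independent. The survey itself may not support either assumption.

```python
RESPINS_PER_PROJECT = 2.9
DAYS_PER_RESPIN = 8.5          # assumed to be per re-spin
COST_PER_RESPIN_USD = 44_000

# Expected totals for a typical project under the independence assumption
expected_delay_days = RESPINS_PER_PROJECT * DAYS_PER_RESPIN
expected_cost_usd = RESPINS_PER_PROJECT * COST_PER_RESPIN_USD

print(f"Expected schedule hit: {expected_delay_days:.2f} days")
print(f"Expected re-spin cost: ${expected_cost_usd:,.0f}")
```

That works out to roughly three and a half weeks of schedule and about $127,600 per project, before counting the higher costs typical of high-performance designs.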

The Shift-Left Approach to Integrated Design Verification
To eliminate the inefficiencies of the conventional design flow, a "shift-left" approach is needed that integrates verification as early as possible in the design process. This means catching errors and potential issues at the source, before they can propagate forward into subsequent phases of the project. Schematic errors should be caught during schematic entry, not in the lab after building costly prototypes and spending hundreds of hours on debug. Automated schematic integrity analysis should be employed to eliminate the reliance on manual, visual schematic reviews.

Routing constraints for signal and power integrity, as well as design-for-test constraints, should be specified during schematic capture, not shoe-horned in at the layout phase. Signal and power integrity, EMI compliance, thermal analysis and vibration analysis should all be validated during the layout process.

The goal of the shift-left approach is the same in all cases: to move as much verification as early in the design cycle as possible, while also automating the analysis to provide the highest possible degree of coverage. The conventional design flow is frustratingly unpredictable. It relies far too much on manual visual design checks that allow too many errors to propagate forward to the next step in the process. The shift-left verification flow catches errors and identifies potential issues early in the process, where they are quickly and economically corrected. It is a more efficient process that provides more predictable results, eliminates design re-spins and yields higher quality products in less time. The ultimate goal should be an all-inclusive, multi-dimensional verification process that reduces reliance on both manual reviews and manual debugging of physical prototypes.

A Multi-Dimensional Verification Solution
A multi-dimensional verification solution comprises a broad range of analysis and verification tools used during the schematic and layout phases of the project. These tools are aimed at non-specialist PCB design engineers and layout designers. They let these users work within their familiar authoring environments to identify problems early in the design.

During schematic capture, automated schematic integrity analysis eliminates common schematic errors that often escape the manual review process. Signal integrity and power integrity analysis determines a set of placement and routing constraints that are passed forward to layout. Testability analysis should also occur during schematic entry, before layout: the design is analyzed, and test point requirements are identified and passed to layout as constraints.
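Here is a minimal sketch of the kind of connectivity rules an automated schematic integrity checker might apply. The netlist format, the pin types and the three rules are hypothetical illustrations, not Xpedition's actual checks. The rules flag single-pin nets, undriven signal nets and nets with multiple drivers:

```python
from collections import defaultdict

# Hypothetical netlist rows: (reference designator, pin, pin type, net).
# Pin types: "out" drives a net, "in" loads it, "pwr" is a supply input.
NETLIST = [
    ("U1", "TX", "out", "UART_TX"),
    ("U2", "RX", "in", "UART_TX"),
    ("U2", "TX", "out", "UART_RX"),   # nothing listens on UART_RX
    ("U1", "EN", "in", "ENABLE"),     # ENABLE is never driven
    ("U3", "EN", "in", "ENABLE"),
    ("U1", "VDD", "pwr", "VCC_3V3"),
    ("U2", "VDD", "pwr", "VCC_3V3"),
]


def check_schematic(netlist):
    """Return (net, message) findings for a few common connectivity errors."""
    pins_by_net = defaultdict(list)
    for ref, pin, pin_type, net in netlist:
        pins_by_net[net].append((ref, pin, pin_type))

    findings = []
    for net, pins in sorted(pins_by_net.items()):
        if len(pins) == 1:
            findings.append((net, "single-pin net: likely unconnected"))
            continue
        types = [pin_type for _, _, pin_type in pins]
        if "pwr" in types:
            continue  # supply nets would get their own rules (not shown)
        drivers = types.count("out")
        if drivers == 0:
            findings.append((net, "no driver"))
        elif drivers > 1:
            findings.append((net, f"{drivers} drivers: possible contention"))
    return findings


for net, message in check_schematic(NETLIST):
    print(f"{net}: {message}")
```

A manual visual review can easily miss an undriven enable line like this one. A rule-based pass flags it on every run.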

During layout, as component placement progresses, EMI validation, thermal analysis, vibration/acceleration analysis and manufacturability analysis should all be performed to quickly identify and correct any potential issues. In the traditional design flow, these issues would not be discovered until physical testing in an EMI, thermal or HALT test chamber. Issues that affect the mechanical integrity of the design are usually the most expensive and time-consuming to fix if they are not caught during layout. Such issues often require board re-spins and tooling changes to correct. Simulation during layout greatly increases the likelihood of first-pass success.

The Tools to Implement a Shift-Left Automated Verification Design Flow
The Mentor® Xpedition® platform includes all of the shift-left enhancements described in this article. Xpedition provides a multi-dimensional integrated verification platform for single- and multi-board PCB designs. It includes automated schematic integrity analysis with built-in automated design checks and an extensive library of intelligent models. Xpedition also includes integrated testability analysis, automated component modeling for vibration analysis, DC voltage drop analysis for rigid-flex and multi-board designs, and concurrent DFM analysis during layout.

Xpedition provides powerful verification tools integrated within the authoring environment. These tools enable easier, faster validation, which reduces costly design re-spins, improves time-to-market for new products and results in higher-quality products with fewer defects. The Xpedition integrated verification platform is a better, more modern approach to today’s complex design challenges.

For more information, read Integrated Verification: A Shift-Left Solution for a More Efficient Design Flow.


NXP Strengthens Security, Broadens ML Application at the Edge

by Bernard Murphy on 11-15-2018 at 7:00 am

Security and machine learning (ML) are among the hottest areas in tech, especially for the IoT. The need for higher security is, or should be, blindingly obvious at this point. We struggle to fend off daily attacks even in our mainstream compute and networking environment. How defenseless will we be when we have billions of devices in our homes, cars, cities, utilities and farms, open to attack by any malcontent (or worse) with an urge to create chaos? Meanwhile, ML is gaining traction at the edge simply because, for many of these devices, the classic human-interface paradigm of keyboards and monitors/cryptic displays is too cumbersome, too difficult to use and too costly.

To raise capabilities in both of these domains, NXP recently launched two new platforms and a toolkit for intelligence at the edge. I’ll start with the platforms: the LPC5500 microcontrollers and the i.MX RT600 crossover processors. NXP takes a multi-layered approach to security in these platforms, including:

  • Secure boot for hardware-based immutable root-of-trust
  • Certificate-based secure debug authentication
  • Encrypted on-chip firmware storage with real-time, latency-free decryption

They’ve added a couple more important security features. Device-unique keys can be generated on demand through an SRAM-based physically unclonable function (PUF). They also provide support for the DICE standard, which is becoming increasingly popular for IoT identity, attestation and encryption. Even more interesting (to me), NXP are working on a relationship with Dover Microsystems, which I’ll talk about more in a later blog. NXP plan to integrate Dover’s CoreGuard technology, which offers an active, rule-based security mechanism.
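DICE builds a device identity by chaining one-way functions. A Unique Device Secret (UDS) and a measurement of the first mutable code are combined into a Compound Device Identifier (CDI), so any firmware change yields a different identity. The sketch below assumes HMAC-SHA256 as the one-way function (one common choice) and uses made-up secret and firmware values. It is not NXP's implementation. On real silicon the UDS, for example one derived from the PUF, would never be visible to software like this:

```python
import hashlib
import hmac


def measure(image: bytes) -> bytes:
    """Measurement of the first mutable code: a SHA-256 digest of the image."""
    return hashlib.sha256(image).digest()


def derive_cdi(uds: bytes, first_mutable_code: bytes) -> bytes:
    """Compound Device Identifier = one-way function of (UDS, measurement).
    HMAC-SHA256 keyed with the UDS is an assumed choice of one-way function."""
    return hmac.new(uds, measure(first_mutable_code), hashlib.sha256).digest()


# Hypothetical values for illustration only.
uds = bytes(range(32))
cdi_v1 = derive_cdi(uds, b"firmware image v1.0")
cdi_v2 = derive_cdi(uds, b"firmware image v1.1")
print(cdi_v1.hex()[:16], cdi_v2.hex()[:16])  # different firmware, different CDI
```

Because the CDI depends on the firmware measurement, a modified image cannot reproduce keys derived from the genuine image's CDI.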

On the ML side, NXP recently announced its eIQ software environment for mapping cloud-trained ML models to edge devices. I found this to be one of the most compelling parts of the NXP announcement. Normally, when you think about mapping a TensorFlow, Caffe2 or other neural net model to a resource-constrained edge device, you think about mapping to a specific NN architecture in that device. But what if you need to target a wide range of devices, all the way from CPUs up to dedicated ML cores? Will that require a different mapping solution, and lots of ML expertise, per platform? According to Geoff Lees, Sr VP and GM of microcontrollers, eIQ and the platforms mentioned above should make this multi-device targeting a lot easier.

I asked why anyone would want to implement ML on a CPU. After all, CPUs are famously the least effective platform for ML in terms of performance per watt. I asked a similar question at an ARM press briefing last year and got what I thought was a rather defensive response, so I was curious to get NXP’s take. Geoff provided a great example: an intelligent microwave. It doesn’t need a lot of ML horsepower to handle trigger-word recognition and basic natural language processing for a very limited vocabulary locally. Better yet, it could recognize the food when you put it into the microwave. Nor does it have to provide microsecond response times or run off a coin-cell battery (since a microwave has to be wired anyway). So a Cortex M33, with its support for DSP processing, is amply suited to the task and is likely cheaper than more elegant NN platforms. That matters in a mass-market appliance.

For fancier applications, you’ll still want to rely on a dedicated ML engine. In the i.MX RT600 family, this is the Cadence Tensilica HiFi 4 DSP. Hopefully you now see the value of eIQ: a common ML mapping platform that can handle mapping to all NXP devices, from the high-end i.MX 8QM down through mid-range devices to the Cortex M33-based devices.

As examples of how these technologies can be applied, NXP recently showed an industrial application at the Barcelona IoT World Congress. It used various subsystems, including drones, operator recognition (is this operator allowed to perform this function?), object recognition for operator safety, voice control, and anomaly detection to predict failures in drone operation. At TechCon, NXP showed trigger-word recognition and voice control. In vision, they showed food recognition (for that microwave) and traffic sign recognition.

From microwaves to traffic sign recognition and factory floor automation, it looks like NXP is making a play to own an important piece of edge processing, in both security and machine learning, across a wide range of processor solutions.


Mentor’s Symphony in Tune with AMS Designer Needs

by Tom Simon on 11-14-2018 at 12:00 pm

Mixed signal simulation is a very hot topic these days. In modern designs, it is harder to draw a line between analog and digital and to work on them independently. Analog blocks are showing up everywhere. Even in what would have qualified as a digital design a few years ago, designers now need to look at things like PLLs, IOs and SerDes in context, from a detailed analog perspective, to ensure proper design behavior and performance. Several trends have all intensified the need for faster, easier and more accurate mixed signal modeling: the drive to reduce power, the addition of sensors, and the increased use of ADCs, oscillators and other analog blocks in SoCs. At the same time, requirements imposed by automotive standards such as ISO 26262 are creating the need for more comprehensive verification of mixed signal chips.

Last week, Mentor created quite a buzz with the introduction of its Symphony Mixed Signal Platform. Mentor has never been a slouch when it comes to analog and digital simulation, and its AFS (Analog Fast SPICE) has been a game changer for the industry. What Symphony brings to the table is the ability to easily combine that leading analog simulator with Mentor’s own, or other vendors’, digital simulators. At the same time, Symphony overcomes many of the limitations that engineers faced while trying to verify mixed signal designs.

Historically, transistor-level analog simulation was too slow to incorporate into digital simulations. As a result, people turned to behavioral models to speed up the analog side of the simulation. However, creating these models requires specialized skills and extra development time, and, of course, any design revision required reworking the models. Symphony lets design teams achieve faster run times without the need for behavioral modeling. AFS provides nanometer SPICE accuracy and a capacity of 20M SPICE elements.

One of the key concepts in Mentor’s Symphony is its use of Boundary Elements (BEs), which support all signal types and multiple power domains, including dynamic supplies. This approach significantly improves debug, because detailed information about the signals at the interfaces can now be examined easily. It is also flexible enough to support mixed digital and analog hierarchies with multiple levels and no restrictions on mixing A or D at each level. One important feature that Mentor is highlighting is Hi-Z checking, which lets designers detect when a mixed signal net goes into a ‘Z’ (high-impedance) state.
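The sketch below is a rough illustration of what a boundary element and a Hi-Z check do. It maps an analog voltage onto a digital value using thresholds expressed as fractions of the local supply, so the thresholds track a dynamic supply. It also scans a digital net's event list for high-impedance intervals. The threshold fractions, the event format and the function names are assumptions for illustration, not Symphony's actual behavior:

```python
def a2d(voltage: float, vdd: float, vil_frac: float = 0.3, vih_frac: float = 0.7) -> str:
    """Convert an analog node voltage to '0', '1' or 'X' (between thresholds).
    Thresholds are fractions of the local supply, so they follow a dynamic supply."""
    if voltage <= vil_frac * vdd:
        return "0"
    if voltage >= vih_frac * vdd:
        return "1"
    return "X"


def hi_z_intervals(events, end_time):
    """Return (start, end) intervals where a digital net sits in 'Z'.
    events: time-ordered (time, value) pairs; values are '0', '1', 'X' or 'Z'."""
    intervals = []
    z_start = None
    for time, value in events:
        if value == "Z" and z_start is None:
            z_start = time                      # net just went high-impedance
        elif value != "Z" and z_start is not None:
            intervals.append((z_start, time))   # net is driven again
            z_start = None
    if z_start is not None:                     # still floating at end of run
        intervals.append((z_start, end_time))
    return intervals


print([a2d(v, vdd=1.0) for v in (0.2, 0.5, 0.9)])
print(a2d(0.5, vdd=0.8), a2d(0.5, vdd=0.6))
print(hi_z_intervals([(0, "0"), (10, "Z"), (25, "1"), (40, "Z")], end_time=50))
```

The same 0.5 V node reads as 'X' on a 0.8 V supply but as '1' on a 0.6 V supply. This is why a converter that knows about dynamic supplies matters at A/D interfaces.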

According to Mentor, 30 customers were using Symphony before its release, and the announcement contains many customer quotes reporting dramatic improvements in runtime and overall results.

Stepping back, this new product from Mentor is starting to paint a picture of what the Siemens acquisition means for Mentor. Going from a public company to a privately held company can mean big changes. I know that many people in the industry were wondering whether Mentor would become the private EDA group for Siemens or whether it would be able to continue robust product development. Much of Mentor’s recent reputation and success has come from the Calibre line, although Mentor has very competitive offerings across its product line. Symphony, however, looks like a major long-term investment that aims to upset the analog mixed signal flow status quo. There is more information about Symphony on the Mentor website.


Synopsys DDR5 LPDDR5 Memory Interface IP Targets AI, Automotive, and Mobile SoCs

by Camille Kokozaki on 11-14-2018 at 7:00 am

On October 24, Synopsys announced new DesignWare® Memory Interface IP solutions supporting the next-generation DDR5 and LPDDR5 SDRAMs. The DDR5 and LPDDR5 IP significantly increase memory interface bandwidth compared to DDR4 and LPDDR4/4X SDRAM interfaces, while reducing area and improving power efficiency. The DesignWare DDR5 IP operates at data rates of up to 4800 Mbps and can interface with multiple DIMMs per channel, up to 80 bits wide. This delivers the fastest DDR memory interface solution for artificial intelligence (AI) and data center system-on-chips (SoCs).

The industry’s first LPDDR5 IP, running at up to 6400 Mbps, provides significant area and power savings for mobile and automotive SoCs with its dual-channel memory interface option that shares common circuitry between independent channels. For additional power savings, the DesignWare Memory Interface IP solutions provide several low-power states with short exit latencies and offer multiple pre-trained states for dynamic frequency change capability. The DDR5 and LPDDR5 controller and PHY seamlessly interoperate via the latest DFI 5.0 interface, providing a complete memory interface IP solution for high-bandwidth, low-power SoC designs.

DesignWare DDR IP Solutions

| DesignWare DDR PHY (new) | SDRAMs supported / maximum data rate | Interface to memory controller | Typical application |
|---|---|---|---|
| LPDDR5/4/4X | LPDDR5: 6400 Mbps; LPDDR4: 4267 Mbps; LPDDR4X: 4267 Mbps | DFI 5.0 | Designs in 16 nm and below that require high-performance mobile SDRAM support up to 6400 Mbps |
| DDR5/4 | DDR5: 4800 Mbps; DDR4: 3200 Mbps | DFI 5.0 | Designs in 16 nm and below that require high-performance DDR5/4 support up to 4800 Mbps |
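The data rates in the table translate into theoretical peak bandwidth as per-pin rate × data width ÷ 8. The sketch below assumes 64 data bits per DDR channel, with the remainder of an 80-bit DDR5 interface typically carrying ECC, and a 16-bit LPDDR channel. These are idealized peaks, not measured throughput:

```python
def peak_bandwidth_gbs(data_rate_mbps: float, data_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes) for a bus whose
    pins each run at data_rate_mbps; ignores refresh, turnarounds and protocol overhead."""
    return data_rate_mbps * 1e6 * data_bits / 8 / 1e9


# Assumed data widths: 64 data bits per DDR channel (the rest of an 80-bit
# DDR5 interface typically carries ECC) and a 16-bit LPDDR channel.
CONFIGS = {
    "DDR5-4800 (64 data bits)": (4800, 64),
    "DDR4-3200 (64 data bits)": (3200, 64),
    "LPDDR5-6400 (x16 channel)": (6400, 16),
    "LPDDR4/4X-4267 (x16 channel)": (4267, 16),
}

for name, (rate, bits) in CONFIGS.items():
    print(f"{name:30s} {peak_bandwidth_gbs(rate, bits):6.2f} GB/s")
```

At equal width, DDR5 at 4800 Mbps gives 1.5× the peak of DDR4 at 3200 Mbps (38.4 versus 25.6 GB/s for 64 data bits).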

Some highlights:

  • The industry’s first LPDDR5 controller, PHY and verification IP solution supports data rates up to 6400 Mbps with up to 40% less area than previous generations
  • The complete DDR5 IP solution supports up to 4800 Mbps with single and dual channels for discrete devices and DIMMs
  • Both solutions provide several low-power states with short exit latencies and offer multiple pre-trained states for dynamic frequency change capability

The DesignWare DDR5 and LPDDR5 IP solutions support all required features of the DDR and LPDDR specifications, enabling designers to incorporate the necessary functionality into their SoCs:

  • Firmware-based training via an embedded calibration processor in the PHY optimizes the boot-time memory training for highest data reliability and margin at the system level. It also allows fast updates to the training algorithms without requiring changes to the hardware
  • Decision feedback equalization (DFE) used in the input receivers reduces the impact of inter-symbol interference (ISI) to improve signal integrity
  • Reliability, availability, serviceability (RAS) features, including inline or sideband error correcting code (ECC), parity, and data cyclic redundancy checks (CRC), reduce system downtime
  • Synopsys PHY hardening and signal/power integrity expertise enable faster design completion and a higher degree of design confidence.
  • Synopsys VIP for DDR5 and LPDDR5 provides randomized configuration and runtime selection, as well as built-in comprehensive coverage, verification plan, and protocol checks for increased productivity.
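Decision feedback equalization subtracts the inter-symbol interference predicted from already-decided symbols before slicing the current sample. Below is a minimal sketch with a made-up two-tap channel and ideal, known tap weights. A real receiver adapts its taps during training, and a wrong decision can propagate into later ones, which this sketch ignores:

```python
def channel(symbols, post_cursor):
    """Hypothetical lossy channel: each symbol leaks into the next len(post_cursor) UIs."""
    received = []
    for n, s in enumerate(symbols):
        isi = sum(h * symbols[n - k]
                  for k, h in enumerate(post_cursor, start=1) if n - k >= 0)
        received.append(s + isi)
    return received


def slicer(samples, threshold=0.0):
    """Plain receiver: decide each symbol on its own sample."""
    return [1 if x > threshold else -1 for x in samples]


def dfe(samples, taps, threshold=0.0):
    """Decision feedback equalizer: before slicing each sample, subtract the ISI
    predicted from earlier decisions (taps[0] weights the most recent decision)."""
    decisions = []
    for x in samples:
        recent = reversed(decisions[-len(taps):]) if taps else []
        isi = sum(h * d for h, d in zip(taps, recent))
        decisions.append(1 if x - isi > threshold else -1)
    return decisions


sent = [1, 1, -1, -1, 1, -1, 1, 1, -1]
taps = [0.6, 0.5]                 # assumed post-cursor ISI, also used as DFE taps
rx = channel(sent, taps)
print("slicer errors:", sum(a != b for a, b in zip(slicer(rx), sent)))
print("DFE errors:   ", sum(a != b for a, b in zip(dfe(rx, taps), sent)))
```

Whenever the accumulated post-cursor ISI outweighs the current symbol, the plain slicer decides wrongly. The DFE removes that ISI before deciding.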

ARM, Micron and SK Hynix provided testimonials in a Synopsys press release on October 24, 2018. In that press release, John Koeter, vice president of marketing for IP at Synopsys, emphasized that emerging applications such as AI, automotive and cloud require significantly higher memory bandwidth to handle the massive amount of data throughput. He added that Synopsys is offering designers the fastest DDR5 and LPDDR5 IP solutions on the most advanced FinFET processes, so they can deliver innovative products that are differentiated in bandwidth, power and area.

Availability

  • The DesignWare DDR5 PHY and LPDDR5 PHY are scheduled to be available in Q1 of 2019
  • The DesignWare DDR5 Controller and LPDDR5 Controller are scheduled to be available in Q2 of 2019
  • The VC Verification IP for DDR5 and LPDDR5 is available now.

Worth Noting

  • All the DFI-compatible DDR PHYs are supported by Synopsys’ unique DesignWare DDR PHY Compiler. In addition, Synopsys’ DesignWare DDR5/4 Controller, LPDDR5/4/4X Controller, and Enhanced Universal DDR Memory and Protocol Controller IP feature a DFI-compliant interface, low latency and low gate count while offering high bandwidth. Optional market-specific features, such as AMBA AXI/4 AXI Quality of Service (QoS) and Reliability, Availability, and Serviceability (RAS) features, allow designers to match the area and capabilities of the controllers to their needs.
  • Synopsys also offers DesignWare HBM2 IP, which provides 12x the bandwidth of DDR4 IP and 10x better power efficiency for graphics, high-performance computing, and networking SoCs.

DesignWare® Memory Interface IP solutions


Fusion Synthesis for Advanced Process Nodes

by Alex Tan on 11-13-2018 at 12:00 pm

Synopsys recently unleashed Fusion Compiler™, a new RTL-to-GDSII product that enables data-driven design implementation. It does so by revamping the Design Compiler architecture and leveraging the successful Fusion Technology, seamlessly fusing the logical and physical realms to produce predictable QoR. It is a long-awaited move that provides a breakthrough solution as more designers migrate to advanced nodes, 7nm and beyond.

Let’s first glance at the key synthesis challenges that set the stage for this product announcement.

Traditional synthesis challenges
As part of the RTL-to-GDSII flow, a synthesis tool such as Design Compiler transforms a design’s RTL description into an optimized gate-level representation. This includes architectural, logic and gate-level optimization steps. Synthesis uses a standard cell library, pre-characterized for timing and power across various input slews, load conditions and process corners (PVT: process, voltage, temperature). From it, synthesis generates an optimal design for a given set of PPA (performance, power and area) targets. Over time, synthesis has been given limited physical and placement awareness as an inroad into routing.
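To illustrate what “pre-characterized across input slews and load conditions” means in practice, here is a sketch of interpolating a cell delay from a two-dimensional table indexed by input slew and output load, in the spirit of a Liberty NLDM table. The numbers are invented. Production tools handle many more details, such as multiple corners, separate rise/fall arcs and output slew tables:

```python
from bisect import bisect_right

# Hypothetical delay table for one timing arc, in the spirit of a Liberty
# NLDM table: rows are input slew (ns), columns are output load (pF).
SLEW_NS = [0.01, 0.05, 0.20]
LOAD_PF = [0.001, 0.010, 0.050]
DELAY_NS = [
    [0.020, 0.045, 0.140],
    [0.028, 0.054, 0.150],
    [0.050, 0.080, 0.180],
]


def _segment(axis, x):
    """Index i of the table segment [axis[i], axis[i+1]] used for x (edges clamped)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def cell_delay(slew_ns, load_pf):
    """Bilinear interpolation in the table; linear extrapolation beyond its edges."""
    i, j = _segment(SLEW_NS, slew_ns), _segment(LOAD_PF, load_pf)
    ts = (slew_ns - SLEW_NS[i]) / (SLEW_NS[i + 1] - SLEW_NS[i])
    tl = (load_pf - LOAD_PF[j]) / (LOAD_PF[j + 1] - LOAD_PF[j])
    d00, d01 = DELAY_NS[i][j], DELAY_NS[i][j + 1]
    d10, d11 = DELAY_NS[i + 1][j], DELAY_NS[i + 1][j + 1]
    return ((1 - ts) * (1 - tl) * d00 + (1 - ts) * tl * d01
            + ts * (1 - tl) * d10 + ts * tl * d11)


print(f"{cell_delay(0.03, 0.0055):.5f} ns")  # midway between four table entries
```

The output load in such a lookup depends on the wiring. That is where the interconnect estimation problem described next enters.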

As wire performance fails to keep pace with device performance in advanced process scaling, inadequate interconnect modeling or estimation has created a disparity between synthesis QoR and the QoR produced by downstream physical implementation tools. The PPA trade-offs that were tolerable in the micrometer process era are no longer acceptable for sign-off at advanced nodes, as designs increasingly target emerging applications that also require power efficiency.
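The first-order Elmore delay model shows why wires hurt more as designs scale. A wire's own RC delay grows with the square of its length, because both its resistance and its capacitance grow linearly with length. The per-micron values below are illustrative assumptions, not data for any real process:

```python
def elmore_delay(r_per_um, c_per_um, length_um, r_drv=0.0, c_load=0.0, segments=100):
    """Elmore delay (s) at the far end of a wire driven through r_drv into c_load.
    The wire is split into `segments` series-R / shunt-C sections; each resistance
    is multiplied by all the capacitance downstream of it."""
    r_seg = r_per_um * length_um / segments
    c_seg = c_per_um * length_um / segments
    downstream = c_seg * segments + c_load
    delay = r_drv * downstream            # driver charges everything
    for _ in range(segments):
        delay += r_seg * downstream       # this section's R sees all C beyond it
        downstream -= c_seg
    return delay


R_UM, C_UM = 2.0, 0.2e-15                 # illustrative ohm/um and F/um
for length in (100, 200, 400):
    wire_only = elmore_delay(R_UM, C_UM, length)
    print(f"{length:4d} um: {wire_only * 1e12:6.3f} ps")
```

Doubling the length roughly quadruples the wire-only delay. This is one reason a synthesis result built on coarse wire estimates can drift from post-route timing.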

The interconnect shift affects not only delay-related metrics but also power, through increased RC or slew-degradation-induced power dissipation. This gap has been exacerbated by lower device thresholds and near-threshold operation, which shift the total power contribution from the dynamic term toward the static term. This drives the need for a solution that delivers optimal results for both performance and power.
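The shift from dynamic to static power can be illustrated with a textbook first-order model. Dynamic power scales as α·C·V²·f, while subthreshold leakage grows exponentially as Vth drops. The sketch below compares the static share of total power at three points: a nominal operating point, a lower threshold voltage, and a supply and clock reduced toward near-threshold operation. All constants are illustrative assumptions, and the model ignores effects such as DIBL and gate leakage:

```python
import math

K_OVER_Q = 8.617333262e-5   # Boltzmann constant / electron charge, in V/K


def power_split(vdd, vth, f_hz, c_sw=1e-9, alpha=0.1, i0=40.0, n=1.5, temp_k=300.0):
    """First-order (dynamic W, static W) estimate.
    dynamic = alpha * C * Vdd^2 * f
    static  = Vdd * I0 * exp(-Vth / (n * kT/q))  (subthreshold leakage only)
    c_sw, alpha, i0 and n are illustrative fitting values, not process data."""
    v_thermal = K_OVER_Q * temp_k
    dynamic = alpha * c_sw * vdd ** 2 * f_hz
    static = vdd * i0 * math.exp(-vth / (n * v_thermal))
    return dynamic, static


def static_fraction(**kwargs):
    """Share of total power that is static for the given operating point."""
    dynamic, static = power_split(**kwargs)
    return static / (dynamic + static)


nominal = static_fraction(vdd=0.8, vth=0.35, f_hz=1e9)
low_vth = static_fraction(vdd=0.8, vth=0.25, f_hz=1e9)
near_threshold = static_fraction(vdd=0.45, vth=0.35, f_hz=1e8)
print(f"static share: nominal {nominal:.0%}, lower Vth {low_vth:.0%}, "
      f"near-threshold {near_threshold:.0%}")
```

In both cases the static share rises sharply. An optimizer that treats leakage as a minor term would mis-rank its options.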

Moreover, increased design density has strained synthesis tools, demanding scalability, runtime improvements and more physical awareness. For example, the tool needs not only congestion awareness but also the ability to generate a legalized placement. This ensures accurate resource utilization and minimal perturbation during route optimization.
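Legalization turns an approximate global placement into one where every cell sits on a legal site with no overlaps. The sketch below is a deliberately simplified, single-row greedy pass in the spirit of Tetris-style legalizers. Production legalizers work across many rows and minimize displacement more carefully. The cell names, sizes and site width are hypothetical:

```python
def legalize_row(cells, site_width, row_start, row_end):
    """Greedy single-row legalization in the spirit of Tetris-style legalizers.
    cells: (name, desired_x, width) tuples from global placement, with widths
    that are multiples of site_width. Cells are visited left to right, snapped to
    the nearest site and pushed right just enough to avoid overlap.
    Returns {name: legal_x}; raises ValueError if the row runs out of space."""
    legal = {}
    cursor = row_start                          # first free x position in the row
    for name, x, width in sorted(cells, key=lambda cell: cell[1]):
        snapped = row_start + round((x - row_start) / site_width) * site_width
        legal_x = max(snapped, cursor)
        if legal_x + width > row_end:
            raise ValueError(f"row overflow while placing {name}")
        legal[name] = legal_x
        cursor = legal_x + width
    return legal


# Hypothetical global-placement result (x and widths in nm, 50 nm sites).
cells = [("A", 12, 100), ("B", 60, 150), ("C", 230, 100), ("D", 240, 50)]
placement = legalize_row(cells, site_width=50, row_start=0, row_end=1000)
print(placement)
print("total displacement:", sum(abs(placement[n] - x) for n, x, _ in cells))
```

The displacement total is a rough proxy for how much the legal placement perturbs the timing and congestion picture seen before legalization.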

Common Data Model and Fusion Technology
Key to this breakthrough is the adoption of a common data-centric architecture. Fusion Compiler’s single data model contains both logical and physical information, enabling the sharing of libraries, data, constraints and design intent throughout the implementation flow. It scales to support ultra-large designs with the smallest feasible memory footprint. The Fusion Data Model serves all design phases and provides faster tool-data model interaction, interactive what-if analysis, and native multi-everything (cores, corners, etc.) with near-linear scalability across multiple CPU cores. It also supports transparent, multi-level hierarchy and is efficient enough to run compute-intensive algorithms, enabling more optimizations for better QoR.

Another enabler is Synopsys Fusion Technology™, which was announced in March 2018. It provides a new level of integration of Synopsys synthesis, place-and-route and signoff tools, redefining conventional product boundaries through systematic sharing of algorithms, code and data representations across multiple tasks.

Fusion provides Design Fusion, ECO Fusion, Signoff Fusion and Test Fusion technologies. Design Fusion enables synthesis technology inside place-and-route, and place-and-route technology inside synthesis. ECO Fusion drives faster signoff closure, with signoff analysis and ECO optimization enabled directly from within implementation. Signoff Fusion eliminates design margin and over-design by using PrimeTime and StarRC for both optimization and signoff. Test Fusion combines design-for-test (DFT), synthesis and automatic test pattern generation (ATPG) technologies. Using physical design data, Test Fusion ensures optimal placement of test points while minimizing routing congestion and area impact.

Fusion Technology offers bidirectional access between synthesis and the adjoining implementation tools, including shared optimization engines across the two domains. Fusion Compiler integrates all synthesis, place-and-route and signoff engines on a single data model. This removes the need for data conversion and transfer, providing good QoR accuracy, the best predictability and optimal throughput.

The Fusion Design Platform is also AI-enhanced, enabling additional QoR and TTR gains by speeding up computation-intensive analyses, predicting outcomes to improve decision-making, and leveraging past learning to drive better results.

Fusion Compiler QoR and Customer Feedback
The unified architecture of Fusion Compiler shares technologies across the RTL-to-GDSII flow delivering 20 percent better QoR and 2X faster time-to-results (TTR). It has been silicon-proven at several customers.

Fusion Compiler’s new solver-based global optimization engine enables path-based total negative slack (TNS) targets and analysis of critical path traces for effective design closure. Both pre- and post-route engines use the same costing and infrastructure for consistent correlation throughout the flow. Its underlying multi-corner multi-mode (MCMM) and multi-voltage (MV)-aware heuristic algorithms concurrently tackle all the design metrics for the best QoR. Likewise, logic remapping, rewiring and legalization interleaved with placement minimize congestion and speed timing closure. The CTS engine follows a network flow paradigm for optimal balancing of latency and skew.
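Worst negative slack (WNS) is the slack of the single most-violating timing endpoint. Total negative slack (TNS) sums the violations across endpoints, so it measures how much work remains for closure. Below is a minimal sketch using the conventional endpoint-based definition and a made-up timing report. It does not attempt to reproduce Fusion Compiler's path-based costing:

```python
from collections import defaultdict


def wns_tns(path_slacks):
    """path_slacks: iterable of (endpoint, slack_ns) for analysed timing paths.
    Keeps the worst slack per endpoint, then returns
    (WNS, TNS, number of violating endpoints); WNS is 0.0 when nothing fails."""
    worst = defaultdict(lambda: float("inf"))
    for endpoint, slack in path_slacks:
        worst[endpoint] = min(worst[endpoint], slack)
    violations = [slack for slack in worst.values() if slack < 0]
    return min(violations, default=0.0), sum(violations), len(violations)


paths = [  # hypothetical timing report
    ("ff1/D", -0.12), ("ff1/D", -0.05),
    ("ff2/D", 0.03),
    ("ff3/D", -0.30), ("ff3/D", 0.10),
]
wns, tns, count = wns_tns(paths)
print(f"WNS {wns:.2f} ns, TNS {tns:.2f} ns over {count} endpoints")
```

An optimizer that only chases WNS can leave many small violations behind. Targeting TNS keeps the whole set of failing endpoints in view.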

“The power of this technology is essential for the design of tomorrow’s FinFET-based automotive applications. With Fusion Compiler, we achieved the target design goal and completed the tapeout. Compared to conventional technology, we confirmed a 33 percent reduction in timing violations, 10 percent area reduction, and 30 percent less leakage power, while cutting the design turnaround time in half. We have completed the integration of Fusion Compiler in Toshiba’s design environment and have begun to deploy it to upcoming SoC designs,” said Seiichi Mori, senior vice president, Toshiba Electronic Devices and Storage Corporation.

As shown in figures 2 and 3, Fusion Compiler runs produced better PPA results from improved via-pillar handling and CCD (concurrent clock and data) optimization. These are two snapshots of the many underlying technological enhancements that contribute to the overall QoR gain.
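“Useful skew” illustrates the idea behind concurrent clock and data optimization: shifting one flip-flop's clock arrival moves slack between the path feeding it and the path leaving it. The sketch below handles a single flop with setup checks only and invented slack values. A real CCD engine must trade off many flops, hold constraints and data-path changes at once, which this does not attempt:

```python
def useful_skew(slack_in_ns, slack_out_ns, max_adjust_ns):
    """Balance setup slack around one flip-flop by adjusting its clock latency.
    Delaying the flop's clock by d gives the path into it d more time and takes
    d away from the path out of it (hold checks are ignored in this sketch).
    Returns (d, new_slack_in, new_slack_out); d > 0 means delay the clock."""
    d = (slack_out_ns - slack_in_ns) / 2.0           # equalize both slacks
    d = max(-max_adjust_ns, min(max_adjust_ns, d))    # limit the latency change
    return d, slack_in_ns + d, slack_out_ns - d


print(useful_skew(-0.08, 0.12, max_adjust_ns=0.20))  # fully balanced
print(useful_skew(-0.08, 0.12, max_adjust_ns=0.05))  # clamped adjustment
```

In the first case, a 0.1 ns clock delay turns a failing input path into two paths with equal positive slack. In the second, the allowed latency change only partly fixes the violation.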

“As design complexity increases across all our market segments, our key requirement is to achieve the best product performance coupled with the highest levels of predictability,” said Michael Goddard, senior vice president, Samsung SARC and ACL. “With Fusion Compiler, we are on track to achieve optimal PPA with up to 10 percent better timing, 10 percent lower leakage, two-to-five percent dynamic power savings, and typically two-to-three percent area reduction for our most challenging design blocks on our imminent tapeout. In addition, the predictable path from synthesis to signoff reduces design iterations, ensuring that we can meet our aggressive product schedules.”

“Strong semiconductor market drivers like autonomous driving and the adoption of AI continue to drive global demand for larger, faster, and more energy-efficient SoCs”, said Sassine Ghazi, co-general manager of Synopsys’ Design Group.

“Our early assessment of Fusion Compiler shows significantly better full-flow predictability, faster full-flow turnaround time, and better timing QoR compared to the previous approaches. We are collaborating with Synopsys to deploy this innovative RTL-to-GDSII solution, as it will streamline physical design of our mission-critical projects and allow us to bring new products to market much faster,” said Taichiro Sasabe, general manager, SoC Design Division at Socionext.

With this release of Fusion Compiler, Synopsys has raised the bar for a holistic synthesis solution. It replaces the traditional RTL-to-GDSII design flow, made up of disconnected or loosely coupled tools, for emerging applications and advanced process nodes.

For more detail, see the Fusion Compiler whitepaper and datasheet.