Autonomous Vehicles: Avoiding Obstacles and Responsibility
by Roger C. Lanctot on 09-27-2020 at 6:00 am

The headline screams off the page and challenges all that we know about the fatal crash in Tempe, Ariz., that took the life of Elaine Herzberg two and a half years ago. “Backup Driver of Autonomous Uber SUV Charged with Negligent Homicide in Arizona.”

How could the National Transportation Safety Board (NTSB) associate itself with such an outcome?  Shouldn’t Uber ATG bear full responsibility for the crash?

An NPR (National Public Radio) report notes that “Rafaela Vasquez (the driver of the Uber Advanced Technologies Group car) appeared in court on Tuesday (last week) in Maricopa County, Ariz. She pleaded not guilty to the (negligent homicide) charge and has been released with an ankle monitor. Her trial is set for Feb. 21st.”

A glance at the extensive NTSB reporting on its investigation reveals a Volvo SUV kitted out with a vast sensor array including:

  • One Velodyne lidar
  • Eight radar sensors
  • Ten cameras
  • Twelve ultrasonic sensors
  • GPS, an inertial measurement unit (IMU), and LTE

The report also details the decision-making protocols written into the software defining the automated driving system (ADS), and it notes that the vehicle was designed to operate only in designated areas that had been “mapped” by Uber. Not all sensors were operational, and when the ADS was engaged, the advanced driver assistance systems built into the underlying Volvo SUV were disabled.

NTSB Final Report on Uber ATG Crash: https://www.ntsb.gov/investigations/AccidentReports/Reports/HAR1903.pdf

Responsibility for the crash could be credibly attributed to Uber on a variety of levels including the configuration of the system, the reliability and robustness of the underlying software code and algorithms, the reliability of the hardware including sensors and processors, the quality of the underlying map governing the operational design domain (ODD), and the training of the “safety driver.”

Uber’s responsibility, indeed its liability, virtually shouts from the pages of the NTSB report. In the fatal crash, the system determined that emergency braking was required, but the emergency braking maneuver was not enabled while the car was in self-driving mode, according to NTSB findings. Uber stated that this was intended to reduce the likelihood of erratic stops on public roads (something Tesla vehicles have been found to do). It was up to the safety driver to intervene, but the system was not designed to alert the driver when emergency braking was required.

It’s true that the safety driver was distracted at the time of the crash – watching a television program on a mobile device brought into the car. Video recorded by the vehicle’s driver monitoring system clearly reveals this distraction.

The purpose of the driver monitor is to ensure that the driver remains engaged in the driving task. Uber failed to link the driver monitoring system to either a driver warning or to disengagement of the automated driving system. The fatal crash is a strong argument for simultaneous remote driver monitoring, if not remote vehicle control, in such testing circumstances.

Even worse than these issues, though, was Uber ATG’s history of crashes as reported by the NTSB:

“ATG shared records of fleet crash history with NTSB investigators. The records showed that between September 2016 and March 2018 (excluding the current crash), there were 37 crashes and incidents involving ATG test vehicles which at the time operated in autonomous mode. Most of these crashes involved another vehicle striking the ATG test vehicle—33 such incidents; 25 of them were rear-end crashes and in 8 crashes ATG test vehicle was side swiped by another vehicle.

“In only two incidents, the ATG test vehicles were the striking vehicles. In one incident, the ATG vehicle struck a bent bicycle lane bollard that partially occupied the ATG test vehicle’s lane of travel. In another incident, the vehicle operator took control of the vehicle to avoid a rapidly approaching oncoming vehicle that entered the ATG vehicle’s lane of travel; the vehicle operator steered away and struck a parked car. In the remaining two incidents, an ATG vehicle was damaged by a passing pedestrian while the vehicle was stopped.”

The history reported by Uber ATG to NTSB suggests a less than stellar performance by the ATG system on the road leading up to the fatal crash. It is perhaps no surprise that both Uber ATG and its newfound partner in automated driving at the time, Nvidia, stopped testing their automated driving systems in the wake of the crash.

The conclusion of the NTSB was that the ADS was sufficiently proficient to lull the safety driver into a false sense of security and that Uber failed to put adequate countermeasures in place to overcome that driver complacency. In the end, though, the NTSB determined the probable cause of the crash to be “the failure of the vehicle operator to monitor the driving environment and the operation of the automated driving system because she was visually distracted throughout the trip by her personal cell phone.”

Uber settled with Elaine Herzberg’s family almost immediately. Uber ATG has made multiple changes in its program and resumed testing since the crash. But the surfacing of criminal charges against the safety driver two and a half years after the incident raises questions regarding responsibility and liability in the automotive industry.

With decent legal representation, the Uber safety driver should be able to avoid serious sanction or jail time. The flawed configuration of the automated driving system and the failure to link the driver monitor to its operation clearly point to Uber’s responsibility.

Getting this right, properly assigning responsibility, is essential to the creation, deployment, and adoption of semi-autonomous systems such as Tesla Motors’ Autopilot beta and General Motors’ Super Cruise. Before deploying these systems we must know how, when, and whether they will work. There is no excuse for allowing the attention of drivers to stray during automated operation – if a system is really semi- and not fully autonomous.

It’s pretty clear from the NTSB report that the Uber ATG system should never have been on the road as configured. It was a crash waiting to happen. In the process, Uber cast the entire autonomous vehicle project and the related regulatory framework (or lack of one) into doubt.

We are left with the unresolved issue of how to regulate automated driving – even as testing continues to expand across the U.S. and commercial deployments commence. The lack of a regulatory or enforcement infrastructure is the enduring legacy of the incident.

We are left with self-certification – which clearly failed in the case of Uber ATG. Regulators and legislators are left holding the bag – which is full of inscrutable software code and algorithms.

Advocates for Federal AV legislation are arguing for widespread regulatory exemptions for AVs, and Federal priority over future AV regulations concerning vehicle design and performance parameters. In this case, perhaps a Federal framework might be a good start.

In the meantime, all AVs ought to be equipped with remote driver monitoring as well as remote control. It is clear that in the Tempe, Ariz., crash Uber ATG lost all plausible deniability. The system recorded the driver’s misbehavior without doing anything about it. That ought to be immediately corrected.


Don’t You Forget About “e”
by Daniel Nenni on 09-25-2020 at 10:00 am

I imagine that the title of this post will remind many of 80s synth-pop, or perhaps the movie The Breakfast Club. But my topic is the venerable hardware verification language (HVL) known simply as e. It has quite an interesting history, and it played a key role in the development of the modern testbench methodology that most chip verification engineers use today. I was wondering about the language and where it stands now, and I thought that it would be an interesting topic for a blog post. Let me start with the history.

By the late 1980s, functional verification was hitting a wall. In the days of small chips, at best the designers might have hand-written some interesting input values or sequences, run them in simulation, and looked at waveforms to check the results. As chips grew bigger, this was no longer enough.

Project managers saw value in separating design and verification, and during the second half of the 80s dedicated verification engineers became more common. They generally started with a verification plan in a spreadsheet or document, iterating all the features to be verified. The engineers hand-wrote tests for these features, checking them off as they ran and passed in simulation. Verification teams gradually developed more automated methods, including randomized input data and self-checking tests. They started using hardware description language (HDL) line coverage to see how well the tests exercised the design, and some of the more advanced teams added ad hoc functional coverage metrics such as reporting which states in a finite state machine (FSM) had been visited.

In the early 90s, a really smart guy named Yoav Hollander invented the e language to further automate chip verification. He developed the Specman tool to execute the language when linked with an HDL simulator, and formed InSpec (later named Verisity) to market the solution. Specman was introduced as a product in 1996 and it quickly gained favor with teams developing some of the biggest and baddest chips in the world. Specman and e represented a major shift in verification. Object-oriented programming (OOP) provided data encapsulation, inputs were randomized within the bounds of constraints, functional coverage constructs generated precise verification metrics, assertions monitored for unexpected conditions, and aspect-oriented programming (AOP) made it easier for users to add new functionality to existing testbenches.

Cadence acquired Verisity, standardized e as IEEE 1647, and added native support to its line of simulators. The language was a significant influence on SystemVerilog (IEEE 1800), but it seemed that many Specman users had no interest in changing. It wasn’t just because of different syntax; e has several key features, especially around AOP, that were not—and are still not—available in SystemVerilog. There are countless millions of lines of e code in use, and new code is being developed all the time for new projects and even new companies, as experienced verification engineers change jobs and are reluctant to lose the productivity gains they have seen. I checked with friends at Cadence and they confirmed this active usage, noting that they have recently added some new valuable e-related features to Specman Elite and their flagship Xcelium simulator.

The most common rap against e has been that it is a “single-vendor language” but that’s not really the case. Specman Elite enables e support for other simulators and there have been multiple companies over the years offering related tools, Verification IP, and services. One of these is AMIQ EDA, whose Design and Verification Tools (DVT) Eclipse Integrated Development Environment (IDE) includes e support. I touch base with their CEO Cristian Amitroaie every few months, so I asked him about the status of the language. Frankly, he surprised me a bit when he said that they have more than 1000 active users writing testbenches in e. They do have quite a few more SystemVerilog users, but the e-xperts remain e-nthusiastic and have no plans to give up the advantages they enjoy.

From Cristian’s viewpoint, e is just another in a long list of standard languages and formats they support, including Verilog and Verilog-AMS, SystemVerilog, VHDL, Portable Stimulus Standard (PSS), SystemC, Property Specification Language (PSL), the Universal Verification Methodology (UVM), and the Unified Power Format (UPF). He believes strongly that verification engineers using e have every right to expect the same sort of EDA tool features and support as their SystemVerilog and C/C++/SystemC colleagues. Accordingly, DVT Eclipse IDE provides a full range of capabilities. Users can search and use hyperlinks to navigate around the testbench code as well as the design being verified. They can take advantage of specialized OOP and AOP views showing hierarchies, inheritance, and extensions.

DVT Eclipse IDE compiles e code “on the fly” as it is typed in, reporting a wide range of syntactic and semantic errors. Cristian said that he is especially proud of the built-in language intelligence that allows the tool to suggest fixes for many classes of problems, from typographical errors and undeclared variables to errors in complex verification structures. For new constructs being added to the testbench, DVT Eclipse IDE provides easy-to-complete templates that enable correct-by-construction programming. Renaming verification elements is performed with no need for manual searching, and code can be automatically reformatted to satisfy project or corporate coding guidelines.

I found it fascinating to learn how popular e is and to see the high level of assistance available to the many verification engineers devoted to this well-proven solution. As we discussed recently, engineers today live in a polyglot world and it’s great to see AMIQ EDA stepping up to support such a wide range of languages and formats as uniformly as possible.

To learn more, visit https://www.dvteclipse.com.

Also Read

The Polyglot World of Hardware Design and Verification

An Important Step in Tackling the Debug Monster

Debugging Hardware Designs Using Software Capabilities


Synopsys talks about their DesignWare USB4 PHY at TSMC’s OIP
by Tom Simon on 09-25-2020 at 6:00 am

When USB initially came out, it revolutionized how peripherals connect to host systems. We all remember when Apple did away with many separate connections for mouse, keyboard, audio and more with their first computers supporting USB. USB has continued to develop more flexibility and more throughput. In 2015 Apple again introduced the MacBook with just a single USB Type-C connector and only a headphone jack. The Type-C connector has been used for USB 3.2, but will now also be used for the latest USB specification: USB4. Synopsys recently gave an excellent presentation on USB4 and their DesignWare USB4 PHY IP at the TSMC OIP event. Despite all the changes and improvements in USB, each generation maintains compatibility with earlier versions. Gervais Fong, Director of Marketing at Synopsys, clearly described how backwards compatibility is maintained while impressive new features and performance are added.

In 1998 the first specification for USB 1.1 allowed data transfers of 1.5 or 12 Mbits/s. Leaping forward, USB4 supports all previous data rates and can run at a maximum aggregate bandwidth of 40 Gbits/s. Among the biggest additions are the USB4 host controller and device routers. Nevertheless, USB4 maintains bypasses for 1- and 2-lane legacy USB up to 20 Gbits/s and 1, 2 or 4 lanes for DisplayPort 1.4 TX up to 20 Gbits/s. This permits older devices that do not use a USB4 router to still transfer data. USB4 also supports tunneling of PCIe, USB and DisplayPort at up to 40 Gbits/s. USB4 incorporates UTMI+ and PIPE5.

Gervais included a useful slide showing USB4’s five different operating modes. Rather than try to describe the five modes, the slide is included below. The trend of combining protocols is significant. It means that with a single connector high speed data for peripherals, networking, storage and displays are all supported. This improves the user experience and offers unmatched flexibility. A high level of interoperability is available because Apple and Intel are both contributing and supporting USB’s evolution.

Five Modes for DesignWare USB4 PHY

While the user experience is improving, chip designers who want to incorporate USB4 need to ensure that their USB silicon is fully compliant and has been completely verified. The USB4 PHY alone needs to support a dizzying array of operating modes, configurations, protocols and speeds. Gervais pointed out that the USB4 PHY is not just handling USB; it is handling DisplayPort and Thunderbolt as well. The PHY has to interface with and be compatible with the router and controllers.

Synopsys has developed a DesignWare USB4 PHY that meets all of the specification’s requirements and is available on 12nm, 6/7nm and 5nm. It is built on an optimized, low power SerDes. Gervais said that they have over 100,000 CPU hours of simulation with Synopsys routers and controllers.

Gervais also talked about their test silicon from TSMC N5 that is now being tested. The PHY includes a programmable 3-tap feed-forward equalizer (FFE) that is used to adjust the equalization for the various operating modes and frequencies; this is essential for meeting the USB4 PHY specifications. They have achieved first-silicon success in TSMC N5P. The eye diagram for this silicon at 20 Gbits/s shows a wide-open eye for TX. The receive path includes a continuous-time linear equalizer (CTLE) and a 1-tap decision feedback equalizer (DFE) with programmable settings.
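
To make the equalization idea a bit more concrete, here is a minimal numerical sketch (in Python) of what a 3-tap FFE does. The tap weights and channel model are purely illustrative, not values from the Synopsys PHY:

```python
import numpy as np

# Minimal sketch of a 3-tap feed-forward equalizer (FFE).
# The tap weights and channel response below are illustrative only;
# a real USB4 PHY adapts its taps per operating mode and channel.
pre, main, post = -0.15, 1.0, -0.25        # pre-cursor, main, post-cursor

# Crude channel model: each symbol smears into its neighbors (ISI).
channel = np.array([0.1, 0.7, 0.2])

tx = np.array([1, -1, -1, 1, 1, -1, 1, -1], dtype=float)

# The FFE pre-distorts the transmitted waveform with the tap weights...
ffe_out = np.convolve(tx, [pre, main, post], mode="same")
# ...so that after the channel smears it, the received eye opens back up.
rx_equalized = np.convolve(ffe_out, channel, mode="same")
rx_raw = np.convolve(tx, channel, mode="same")
print("without FFE:", np.round(rx_raw, 2))
print("with FFE:   ", np.round(rx_equalized, 2))
```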

The complete DesignWare USB4 solution from Synopsys includes PHYs, router, controller, verification IP and supporting subsystems. The talk presented a comprehensive overview of USB4 and its requirements, as well as an insightful look at the Synopsys DesignWare that supports interface development.

Also Read:

AI/ML SoCs Get a Boost from Synopsys IP on TSMC’s 7nm and 5nm

Parallel-Based PHY IP for Die-to-Die Connectivity

Making Full Memory IP Robust During Design


112G/56G SerDes – Select the Right PAM4 SerDes for Your Application
by Mike Gianfagna on 09-24-2020 at 10:00 am

This is another installment covering TSMC’s very popular Open Innovation Platform (OIP) event, held on August 25. This event presents a diverse and high-impact series of presentations describing how the members of TSMC’s vast ecosystem collaborate with each other and with TSMC. Not all SerDes are the same. The presentation covered here, from Cadence, discusses the various flavors of LR, MR/VSR and XSR high-speed SerDes and where they fit best. When it comes to 112G/56G SerDes, you really need to select the right PAM4 SerDes for your application.

The presentation was given by Wendy Wu, product marketing director at Cadence. Wendy has also worked in marketing and applications engineering at NetLogic Microsystems, Broadcom and Cavium, so she speaks with strong authority on the topic. She began her talk discussing a semiconductor law that is somewhat less known than Moore’s Law, but very relevant. Rent’s rule is based on internal memoranda at IBM from 1960. It says that the number of I/O pins tracks the number of gates/transistors, so as functionality increases, I/O bandwidth must increase as well. This is why the topic is inherently important.
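
Rent’s rule is commonly written as T = t·G^p, where T is the terminal (pin) count, G is the gate count, and t and p are empirical constants. A quick Python illustration follows; the constants are typical textbook values, not numbers from the Cadence talk:

```python
# Rent's rule: T = t * G**p. T = terminal (pin) count, G = gate count;
# t and p are empirical constants. The values below are typical
# textbook figures for random logic, not from the Cadence talk.
t, p = 2.5, 0.6

for gates in (10_000, 100_000, 1_000_000):
    terminals = t * gates ** p
    print(f"{gates:>9,} gates -> ~{terminals:,.0f} terminals")
```

With p = 0.6, each 10x jump in gate count roughly quadruples the terminal count, which is why growing functionality keeps pushing I/O bandwidth.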

Wendy then discussed how high-speed interconnect is the backbone of cloud data centers. Higher throughput with lower latency and flat power describe the challenge. Wendy shared an interesting statistic – 85% of the traffic in a typical data center is between compute nodes in that data center.  Data communications is clearly a key item for continued growth in this huge market.

Looking at AI requirements for high-speed comms, 7nm and 5nm are the preferred nodes today, with 3nm around the corner. We are at the cutting edge here. Wendy then discussed the various applications for 56G and 112G SerDes. She touched on four areas:

  • Long reach: backplane applications, between processors and racks. Drive, performance and signal loss are key parameters here.
  • Medium reach: chip-to-chip and mid-range backplanes.
  • Very short reach: chip-to-module applications.
  • Extra short reach: die-to-die, system-in-package applications.

With regard to die-to-die communications, three methods were discussed. This technology is also an enabler for the growing chiplet market. There is the previously discussed PAM4 SerDes approach. NRZ serial interface is another approach. Finally, a parallel interface can be considered, similar to what is used for HBM stacks with a silicon interposer. Each of these approaches has its strengths and weaknesses.

Next, Wendy examined analog vs. digital equalizer architectures. An analog solution delivers better density and lower power but is susceptible to channel noise and can equalize up to 20 dB of loss. ADC-based, DSP-driven approaches are more stable and reliable, and they can equalize up to 40 dB of loss. Traditionally, these solutions have required more power than analog, but starting at 7nm and below, the power requirements of digital solutions are very similar to analog. With all this background, what is the best approach? Clearly that depends on the application. Wendy provided a good overview of where each technology fits. This is captured in the diagram below.

Wendy then discussed the 56G and 112G offerings from Cadence, built by a best-in-class engineering team that is strong in both analog and digital techniques. The IP is fully compliant with relevant industry standards. She also pointed out that Cadence works with connector, cable and optical module suppliers to ensure good interoperability. Both 56G and 112G parts are proven with multiple test chips. She explained that the portfolio can support requirements from LR to XSR. These points are illustrated by the graphic at the top of this post.

Wendy went into some detail on the Cadence 112G-LR DSP SerDes. The key advantages are summarized in the figure below.

Wendy concluded with a discussion of the Cadence UltraLink D2D PHY IP. This IP can connect two designs through a multi-chip module or an organic substrate. The figure, below, summarizes the performance parameters of this IP.

You can learn more about how to select the right PAM4 SerDes for your application and the Cadence IP portfolio here.

Also Read:

Lip-Bu Hyperscaler Cast Kicks off CadenceLIVE

How does TensorFlow Lite on Tensilica HiFi DSP IP Sound?

Ultra-Short Reach PHY IP Optimized for Advanced Packaging Technology


Verifying Warm Memory. Virtualizing to manage complexity
by Bernard Murphy on 09-24-2020 at 6:00 am

SSD memory is enjoying a resurgence in datacenters through NVMe. Not as a replacement for traditional HDDs, which though slower are still much cheaper. NVMe storage has instead become a storage cache between hot DRAM memory close to processors and the “cold” HDD storage. I commented last year on why this has become important for the hyperscalers. Cloud throughput, and therefore revenue, is heavily impacted by storage latencies, which makes fast storage cache a high priority. That creates implications for verifying warm memory: proving your solution will deliver what it promises.

You start to wonder what other operations you could offload into storage. SQL serving for example. Database operations work on lots of data which can dominate latency (and power) if you first have to drag it all over to the processor. It’s faster and lower power to do the bulk of the heavy lifting right in the NVMe unit. I’ve even seen a recent suggestion that linear algebra could be moved into SQL, from which it would be a short jump to push it into NVMe. Another paper suggests an architecture to accelerate big data computation using this kind of approach.

Architecture complexity

It seems there is no limit to what we can do with computation close to storage, when we put our minds to it. All of which makes that NVMe memory much more powerful. The downside is that verifying warm memory implementations, already complex, becomes even more complex.

First there’s the architecture complexity. One of these devices may service multiple hosts and many I/O queues. It must provide a similar level of security to that offered by the hosts including at least encryption, perhaps a hardware root of trust and other features to harden the device against attacks.

Implementation complexity

Then there’s the implementation complexity. The controller must deal with the NVMe interface, encryption, logical-to-physical address mapping, wear-leveling, garbage collection, an interface to local DRAM through DDR (to store data while it’s doing garbage collection), and so on. This is a full-blown processor in its own right. As if that weren’t enough, you can’t just model the flash as perfect memory. Reading a bit can return a soft error to which the controller must adapt. According to the Mentor Veloce folks, design teams need to model flash bit behavior down to this level of accuracy in order to have full confidence in their system-level testing. Mentor provides soft models for NAND, NOR and DDR to represent these components.
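
To give a feel for the kind of bookkeeping such a controller performs, here is a toy Python sketch of logical-to-physical mapping with naive wear-leveling. Real flash translation layers (with garbage collection, ECC and bad-block management) are vastly more sophisticated:

```python
# Toy flash translation layer (FTL): logical-to-physical page mapping
# with naive wear-leveling. Real SSD controllers add garbage collection,
# ECC, bad-block management and much more; this is illustrative only.
class ToyFTL:
    def __init__(self, num_pages):
        self.l2p = {}                        # logical page -> physical page
        self.erase_counts = [0] * num_pages  # wear per physical page
        self.free = set(range(num_pages))

    def write(self, logical_page):
        # Flash pages can't be overwritten in place: pick the least-worn
        # free page, remap, and retire the old page for later erasure.
        target = min(self.free, key=lambda pg: self.erase_counts[pg])
        self.free.remove(target)
        old = self.l2p.get(logical_page)
        if old is not None:
            self.erase_counts[old] += 1      # old copy gets erased
            self.free.add(old)
        self.l2p[logical_page] = target

ftl = ToyFTL(num_pages=8)
for i in range(20):
    ftl.write(logical_page=i % 3)            # keep rewriting three hot pages
print("erase counts:", ftl.erase_counts)     # wear spreads across pages
```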

Traffic complexity

Finally, there’s traffic complexity. A verification plan must also model traffic with all the variations you might expect to see in those loads from the host (one or more servers), connected through a PCIe interface. For benchmarking this requires running a standard I/O load like IOmeter, FIO or CrystalMark. Measuring throughput, latencies, all the factors you are aiming to improve through use of warm memories.

Put all of this together and you have a big verification task – virtual host and an SSD simulation model which you have to run in emulation to deliver the kind of throughput you need for this volume of verification. Ben Whitehead, Storage Products Specialist at Mentor, has written a white-paper, “Virtual Verification of Computational Storage Devices”, to describe the Veloce solution they have assembled to address this need. With a bunch of application-specific features for measurement, checking and debug.  An interesting read for anyone working in this hot domain.

Also Read:

Trusted IoT Ecosystem for Security – Created by the GSA and Chaired by Mentor/Siemens

Emulation as a Service Benefits New AI Chip

WEBINAR: Addressing Verification Challenges in the Development of Optimized SRAM Solutions with surecore and Mentor Solido


Update on Mentor’s Acquisition of Avatar Integrated Systems
by Daniel Nenni on 09-23-2020 at 10:00 am

Mentor Graphics, a Siemens Business, has completed its acquisition of EDA company Avatar Integrated Systems. I recently spoke with Joe Sawicki, Executive VP of the Mentor IC EDA segment, about the acquisition strategy and the IC Design platform goals for integration of the Avatar products.

Avatar (formerly ATopTech) focused on physical implementation tools for complex digital SoC designs (e.g., floorplanning, placement, clock-tree synthesis, routing, and ECO flows). Specifically, the foundation of the Aprisa product is a route-centric, hierarchical data model on which its physical algorithms are built. The right-hand side of the figure below highlights the Avatar strategy.

The Aprisa SAPR input data is a simple LEF/DEF design model from a (physical-aware) logic synthesis toolset.  From the synthesis netlist, Aprisa applies optimizations that focus on ensuring subsequent routability – e.g., congestion avoidance, pin access, adherence to multipatterning decomposition coloring.  An internal physical DRC verification engine is applied.  A diverse set of clock tree design styles are available, including useful clock skew timing optimizations throughout.

An internal synthesis engine allows for further optimization.  The input netlist placement assumptions may not accurately reflect the route impact of congestion, R*C delays, and clock skews.  Logic restructuring based on the routing model may be needed.  The tool incorporates static timing, noise, IR, and EM analysis algorithms to guide placement and route assignment decisions.

Joe indicated, “Designers of complex SoCs at advanced nodes are seeking the following from their APR flow – better synthesis-to-post route timing correlation, no coupling noise issues, no DRC violations, in short, fewer APR iterations and faster time to closure.  We benchmarked Aprisa, and found the PPA results to be excellent.  The learning curve was extremely quick.  We had competitive evaluation data within a few weeks.” 

The figure above illustrates the pre-route (Steiner estimate) to post-route timing correlation on the Mentor benchmarks at the 7nm node.

Joe then described the IC Design product strategy.  “The Nitro-SoC platform will be supported through the 16/14nm node.  Going forward, Aprisa will be the SAPR solution for 7nm and below.  The DRC engine that was internal to Aprisa will be replaced by Calibre InRoute.”

Joe continued, “The strength of the combined engineering and support teams will offer roadmap stability and continuity to customers, who may have been anxious given the relatively small size of Avatar’s team.  Mentor will leverage its relationship with the foundries to extend the Aprisa product certification for advanced process nodes.”

With regards to the competitive position of the new offering, relative to the integrated platforms available for physical implementation, Joe said, “Designers want an APR tool that is feature-rich and easy to use.  The route-centric data model and optimization algorithms in Aprisa provide faster closure and signoff accurate results.  The use of a physical-aware (placement-centric) synthesis flow is a good start, but the set of optimizations available is a key differentiator, specifically route-aware logic re-synthesis.  Refinement is where you get considerable value.  We’ve already flipped customers from other products.” 

It will be interesting to track how Aprisa emerges in the reference flow certification from the foundries, and how the route-centric with logic re-synthesis methodology evolves as a point tool solution.  Mentor’s acquisition of Avatar expands the scope and future development of SAPR offerings.  More competition among EDA providers is always a good thing for the IC design community.


Executive Interview: Vic Kulkarni of ANSYS
by Daniel Nenni on 09-23-2020 at 6:00 am

On the eve of the Innovative Designs Enabled by Ansys Semiconductor (IDEAS) Forum, I spoke with Vic on a range of topics, including his opening keynote: Accelerating Moore and Beyond Moore with Multiphysics. You can register here.

Vic Kulkarni is Vice President and Chief Strategist, Semiconductor Business Unit, Ansys, San Jose, CA. Vic is responsible for steering the business, technology, go-to-market and product strategy, connecting the dots from chip-package-system design solutions with ANSYS multiphysics simulation technology to address challenges faced by multiple verticals, including 5G, AI, HPC, mobile and autonomous. He drives strategic customer executive relationships and acquisitions with the Ansys leadership team.

Q: What are the key trends which are shaping your business?
The hi-tech sector remains strong.

We are witnessing a renaissance in semiconductor and electronic systems. We see an emerging duality between Moore’s Scaling Law and the Beyond Moore trend.

On the one hand, compute-intensive demands by a range of markets (including HPC, cloud, storage, autonomous vehicles, 5G, and ML/AI) are driving feature sizes down from 5nm to 4nm and now 3nm as Tier-1 semis and hyper-scalers continue to invest in semiconductors. This is due to increased workloads of HPC cloud compute, networking storage, 5G, and AI training and inferencing chips like Google’s TPU.

At the same time, there is an accelerating trend to go Beyond Moore with 2.5/3D ICs, chiplets, and other multi-die configurations driven by edge compute, 3D intelligent sensors for autonomous, and high-bandwidth, low-latency, power, area and cost-sensitive applications.

We believe that pervasive multiphysics simulation and analysis in all phases of the design cycle from ideation to lifecycle management will be an important enabler to accelerate innovation and achieve silicon-to-system success.

Q: How are customers responding to the pandemic?
Despite COVID-19, we kept focusing on our customer support excellence and achieved significant success in pre-sales campaigns, customer design tape-outs and customer technical collaboration.

A few cash-poor startups are affected by COVID-19, but that’s a small fraction of our business. We see great momentum for our flagship RedHawk-SC power-integrity/signal-integrity (PI/SI) signoff product in China. We completed 9 evaluations and have several ongoing and planned product evaluations.

Automotive electronics remains on track, as these companies continue to invest in R&D that enable autonomy.

Q: Tell me more about your upcoming opening keynote for the IDEAS Digital Forum.
Vic took me through his presentation, which is a great set-up for the first day. He starts with a brief overview of the Ansys Multiphysics Simulation Platform and moves into the benefits of simulation-driven design, from concept to design to validation, and the resulting savings. ANSYS has a broad range of customers, so these numbers are VERY impressive.

Vic then talks about custom chips by systems companies for differentiation and faster TTM, semiconductor megatrends and technology challenges. The airplane graphic above explains it quite well (ANSYS tools are on the wings).

Bottom line: ANSYS is an important part of the leading-edge semiconductor ecosystem for simulation, AI/ML, HPC, 5G, hardware security and autonomous vehicles. And while I miss the ANSYS live events (great food and networking), the ANSYS virtual events are must-attend, absolutely.

Also Read

World’s Leading Chip Designers at IDEAS Digital Forum Show How to Streamline Design Flows and Reduce Design Cost

Ansys Multiphysics Platform Tackles Power Management ICs

Qualcomm on Power Estimation, Optimizing for Gaming on Mobile GPUs


AI/ML SoCs Get a Boost from Synopsys IP on TSMC’s 7nm and 5nm
by Mike Gianfagna on 09-22-2020 at 10:00 am

This is another installment covering TSMC’s very popular Open Innovation Platform (OIP) event, held on August 25. This event presents a diverse and high-impact series of presentations describing how the members of TSMC’s vast ecosystem collaborate with each other and with TSMC. The presentation covered here, from Synopsys, focuses on the unique needs of training and inference for AI/ML engines. The algorithms implemented by these designs have very specific requirements, and meeting those requirements demands specialized IP. These special needs and the optimized Synopsys DesignWare IP are discussed to illustrate how AI/ML SoCs get a boost from Synopsys IP on TSMC’s 7nm and 5nm processes.

The presentation was given by Faisal Goriawalla, senior product marketing manager at Synopsys. Faisal has over 18 years of engineering and marketing experience in embedded physical IP libraries and non-volatile RAM. He started his career developing embedded SRAM memory compilers and before Synopsys held various technical and marketing positions for memories, standard cells and I/O libraries at ARM. Faisal’s strong background inspires confidence.

Faisal began his presentation focusing on the unique requirements of deep learning and convolutional neural networks (CNNs). He explained that CNNs create a mathematical graph of a problem and train it with a data set of known values. The process begins with training the network, which is compute intensive and then proceeds to inference, where the trained model is deployed. He went into a very good explanation of the requirements of various AI problems with regard to performance, model compression and power. The diagram below summarizes this discussion.

He then explained some of the aspects of a CNN and how it is used to process two-dimensional data. This segment of the presentation provides a very good overview of AI algorithms. I recommend watching it if this is of interest.
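
To ground the two-dimensional part, here is a minimal Python sketch of the core CNN operation, a 2D convolution. The kernel is a hand-picked vertical-edge detector purely for illustration; in a real CNN the kernel weights are learned during training:

```python
import numpy as np

# Minimal sketch of the core CNN operation on 2D data: slide a small
# kernel across an image and accumulate multiply-add (MAC) results.
# This nested-loop form also shows why AI chips are dominated by MACs.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2:] = 1.0                            # dark left half, bright right
kernel = np.array([[-1, 0, 1],                # hand-picked vertical-edge
                   [-1, 0, 1],                # detector; real CNN kernels
                   [-1, 0, 1]])               # are learned during training
print(conv2d(image, kernel))                  # responses peak along the edge
```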

Faisal then discussed some of the design challenges for AI chips. Of course, power and area are key items, along with a predictable schedule. He pointed out that an application-aware approach is needed to meet these goals. Some of the items to consider with an approach like this include:

  • Choosing the right mix of VTs-Lg-tracks
  • Converging on an optimal floorplan
  • Managing congestion in multiply-accumulate blocks (MACs)
  • Navigating the RTL to GDSII flow
  • Achieving PPA targets

Faisal went into some detail on these points. The discussion then turned to application-aware IP, what is needed, and what the benefits will be. From an IP component point of view, what is needed to achieve PPA targets includes:

  • Low power memories, especially for Read
  • Low power combo cells to reduce internal energy
  • Complex combinational cells to reduce switching power
  • Special clock gates with lower internal power
  • Granular delay cells to reduce the area and power cost of hold fix
  • Multi-bit flops to reduce active power

From a methodology point of view, what is needed includes:

  • Choice of VT-Lg to give a good starting point on PPA
  • Power recovery post-route to reduce leakage
  • Flow stage correlation never adds >10% to any metric

Faisal then discussed some of the DesignWare IP solutions from Synopsys to address these requirements:

HPC Kit Enhanced for AI Applications

This package includes IP for object detection and recognition. There are special cells that reduce CNN power consumption by up to 39%. Tradeoff tuning enables a 7% frequency boost with 28% lower power. The figure below summarizes some of the benefits of the HPC Kit. This IP is typically used for ADAS applications.

Memory Architectures

The benefits of customizing memory architectures to optimize PPA for AI designs was also discussed. Synopsys offers a wide range of architectures, bitcells, VTs and PVTs here, including:

  • Ultra-high density, high density and high speed
  • Small (128Kb) range register file
  • Large (>1Mb) range SRAM
  • UHD 2-port memories provide FIFO functionality with smaller area & lower leakage at slower speeds
  • Configurable multi-port memories

GPIO Libraries

AI designs are typically core-limited (as opposed to pad-limited). Inline I/O libraries with a shorter, wider form factor are optimal for reducing SoC area in this situation. Synopsys offers DesignWare IO Libraries with:

  • High (up to 250MHz) performance and high drive strengths for additional margin while supporting longer trace lengths
  • Support for 1.8V, 2.5V and 3.3V I/O supplies (technology dependent) for other interfaces on an AI/ML SoC

DFT

The ability to integrate an on-chip test and repair engine is important for reducing area and power in AI applications. The Synopsys STAR Memory System provides this support. Total core area can be reduced by ~7% and dynamic power can be reduced by ~12%.

Conclusion

Faisal concluded by explaining that the IP discussed is silicon-proven in volume at TSMC 7nm and test silicon proven at TSMC 5nm. You can learn more about Synopsys DesignWare IP for AI here. You can access the TSMC OIP presentations here. AI/ML SoCs truly get a boost from Synopsys IP on TSMC’s 7nm and 5nm.

Also Read:

Parallel-Based PHY IP for Die-to-Die Connectivity

Making Full Memory IP Robust During Design

ARC Processor Virtual Summit!


Bug Trace Minimization. Innovation in Verification
by Bernard Murphy on 09-22-2020 at 6:00 am

A checker tripped in verification. Is there a bug trace minimization technique to simplify manual debug? Paul Cunningham (GM, Verification at Cadence), Jim Hogan and I continue our series to highlight all the great research that’s out there in verification. Feel free to comment.

The Innovation

This month’s pick is Simulation-Based Bug Trace Minimization With BMC-Based Refinement. We found this paper in IEEE Transactions on CAD, 2005. The authors are/were from the University of Michigan. This is an old paper but still intriguing.

Debug, tracing back from an identified bug to the root cause, is the biggest time-sink in verification. Any contribution to reducing that time will have high value. The authors’ approach starts with a waveform trace from a simulation or semi-formal analysis. It aims to reduce the trace to a much shorter trace that still triggers the bug, an easier starting point for manual debug.

The paper describes four simulation-based and one BMC-based technique to reduce traces. They first reduce traces by removing cycles, re-simulating each time to check the bug still reproduces. At first glance this looks very unscalable, requiring O(N²) runs for a trace of N cycles. They greatly reduce this complexity first by hashing the circuit state at each cycle. They then watch for hash matches between the original trace and a re-simulation of a candidate reduced trace. If a previously hashed state is hit during re-simulation, they know that the bug can be reached from that hashed state. They can abort the simulation since it can still trigger the bug, i.e. the reduction is proven viable.

Through this process they look for any variant trace which triggers the check sooner, which becomes a new and shorter reference trace. Alternatively it may hit a state already seen in an earlier analysis at a later clock cycle. Then this trace can skip ahead, also leading to a shorter reference trace. In a final high-effort simulation step, they also look for opportunities to drop input events (rather than whole cycles), as shown in table 3.
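
To make the loop structure clearer, here is a heavily simplified Python paraphrase of the cycle-elimination idea with state hashing. The simulate and bug_triggered callables stand in for a real simulator; this is my sketch of the concept, not the authors’ code:

```python
import hashlib

# Heavily simplified paraphrase of the paper's cycle elimination with
# state hashing. `simulate(trace)` yields the circuit state after each
# cycle; `bug_triggered(state)` checks the failing condition. Both stand
# in for a real simulator; this is a concept sketch, not the authors' code.
def state_hash(state):
    return hashlib.md5(repr(sorted(state.items())).encode()).hexdigest()

def minimize(trace, simulate, bug_triggered):
    # Hash every per-cycle state along the known-failing reference trace.
    ref_hashes = {state_hash(s) for s in simulate(trace)}
    i = 0
    while i < len(trace):
        candidate = trace[:i] + trace[i + 1:]      # try dropping one cycle
        for state in simulate(candidate):
            # Rejoining a hashed reference state proves the bug is still
            # reachable, so the re-simulation can be aborted early.
            if bug_triggered(state) or state_hash(state) in ref_hashes:
                trace = candidate                  # accept the shorter trace
                # (A real implementation would refresh ref_hashes here.)
                break
        else:
            i += 1                                 # this cycle was essential
    return trace
```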

Using common datapath functions, FPU, DES and a picoJava engine as benchmarks, the authors show impressive reductions in cycle lengths, better than 98% in most cases and better than 99% in all cases in removing unnecessary inputs. Runtime on most tests was under a minute. The most complex (picoJava) was 10hrs for 30k cycles. Reduced traces were mostly under 10 cycles.

Paul’s view

This reminds me of an earlier paper we discussed “Using AI to locate a fault”. Both combine multiple methods in fault tracing, getting more out of the combination than out of any one method alone. This approach combines five methods, four simulation-based and one BMC-based. The results, e.g. in figure 12, clearly show how different techniques have different impact on different testcases. Which underlines that you really need all these methods. For a commercial vendor this looks very practical, a fusion of methods rather than a single super-method. Tables 5 and 6 show the bulk of reduction coming from simulation methods and a smaller incremental benefit from BMC. Also encouraging for commercial mass deployments given scalability considerations for model-checking.

Intuitively the method makes a lot of sense. State hashing and looking for matches will almost certainly be very effective on randomized simulations. It’s the classic computer science random walk problem where the drunken man walking randomly is going to do a lot of circling relative to the amount of actual useful distance moved. All these circles quickly prune away by looking for state hash matches, which should massively reduce practical runtimes.

In their experiments picoJava (around 140k gates) ran in 10 hours. That was running a 1GHz Sun blade (2005 remember). Now 3GHz servers are typical, so you’re looking at 500K gates in 10 hours for a single CPU job on a modern server farm. The algorithm is also very parallelizable, so can be scaled up by just farming re-simulation jobs out to multiple servers, sharing the same state hash database. Which makes it very commercially interesting.

Jim’s view

Moving this into the cloud seems a great way to speed up and reach for larger designs. Thinking about it, 10 hours is an overnight run. You could slipstream these runs in behind regression runs. By the time the verification team had a chance to look at them, most or all of those traces would already be reduced.

On investment, this naturally fits into the verification suite, so it’s not an independent product. A bit of a challenge is that this is speeding up something engineers already do rather than making something possible that wasn’t possible before. Can it in some way show a huge improvement in productivity? Maybe add an AI angle for 100X improvement over time? That could enhance appeal for an investor.

My view

First, while a lot of good ideas come from software verification, it’s nice to see some coming from hardware verification. Second, I had been looking mostly for recent papers. Paul pushed me to look at some older papers as well. Good intuition!

Click HERE to see the previous Innovation blog

Also Read

Anirudh CadenceLIVE Plays Up Computational Software

Lip-Bu Hyperscaler Cast Kicks off CadenceLIVE

Quick Error Detection. Innovation in Verification


WEBINAR: UVM RISC-V and DV
by Daniel Payne on 09-21-2020 at 10:00 am

Oh, our semiconductor industry just loves acronyms, and the title of my blog packs three of the most popular acronyms together at once. I attended a webinar hosted by Aldec last week on this topic, “UVM Simulation-based environment for Ibex RISC-V CPU core with Google RISC-V DV“. Verification engineers have been adopting the Universal Verification Methodology in order to make their verification results more robust, in less time.

RISC-V continues to grow in importance as an open-source instruction set architecture (ISA), and at the dac.com site there are some 3,110 search results for RISC-V. I expect this trend to continue, because engineers often want to customize aspects of their SoC for a specific purpose or domain. A big question then arises: how do you actually verify a RISC-V project?

Google has created an SV/UVM-based instruction generator for RISC-V processor verification and posted it on GitHub. With some 765 commits, it is an actively supported instruction generator.

There are many RISC-V core projects around the world to choose from, and Ibex is a small, 32-bit RISC-V core, also available on GitHub with 1,860 commits to date.

Using Riviera-PRO, Aldec simulates the UVM testbench with the Google DV random instruction generator and the Ibex RISC-V core.

Source: https://ibex-core.readthedocs.io/en/latest/verification.html

In the testbench diagram, SV classes are shown as blocks with rounded corners, SV modules as blocks with square corners, and the code to be run is depicted in blue with folded corners.

Random commands come from the Google DV generator, and the testbench also injects random interrupts during testing. The co-simulation flow loads the test binaries into both an ISS and the RTL, runs the simulations, and then compares the results with a Python script (a simplified sketch of that comparison step follows the list below). You can have the same verification experience if you assemble all of the pieces:

  • SystemVerilog simulator that supports UVM (i.e. Riviera-PRO)
  • Instruction Set Simulator (Spike or OVPsim)
  • RISC-V toolchain
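
The actual comparison scripts ship with the riscv-dv project on GitHub; purely to illustrate the idea, here is a simplified sketch of the ISS-vs-RTL trace comparison step. The CSV column names (pc, instr, rd_value) are my assumptions, not the real trace format:

```python
import csv

# Simplified sketch of the ISS-vs-RTL comparison step. The real riscv-dv
# flow ships its own comparison scripts; the CSV column names used here
# (pc, instr, rd_value) are illustrative assumptions, not the real format.
def load_trace(path):
    with open(path, newline="") as f:
        return [(row["pc"], row["instr"], row["rd_value"])
                for row in csv.DictReader(f)]

def compare(iss_csv, rtl_csv):
    iss, rtl = load_trace(iss_csv), load_trace(rtl_csv)
    mismatches = [(i, a, b)
                  for i, (a, b) in enumerate(zip(iss, rtl)) if a != b]
    if not mismatches and len(iss) == len(rtl):
        print(f"PASS: {len(iss)} instructions matched")
        return True
    for i, a, b in mismatches[:5]:               # show the first few diffs
        print(f"MISMATCH at instruction {i}: ISS={a} RTL={b}")
    return False
```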

The webinar also walked through the flow of tools and files used for verification.

The second half of the webinar was an actual, live demo of this verification flow in action, running the makefiles, scripts, simulators and comparison on a Linux platform. Batch-mode verification of the Ibex core was shown first, then GUI mode was run, and in both cases there were zero mismatches between the ISS and the Riviera-PRO simulator results.

The GUI for Riviera-PRO had multiple windows: Source code, Classes, Assertions, Messages. In the upper right is the Classes Window, showing instances and hierarchy, methods, properties, derived classes and base classes.

A very useful documentation feature was the UVM graph that can be auto-generated within Riviera-PRO: it showed memory interface agents, one interrupt agent and their connections, and allowed stepping into instances for more detail to understand connectivity.

An assertions window showed all assertions in one place, without having to look at multiple files, while seeing any failures, passes, and the time when it last happened, quite useful for debugging.

Next, the waveform viewer was invoked after restarting simulation, and they added waveforms from the DUT.

Finally, they showed the RTL code coverage after simulation had finished, then generated an HTML cumulative summary.

Summary

RISC-V is one of the biggest topics of 2020 for the electronics industry, and the ecosystem continues to grow each day, but verification can be a burden. Aldec showed in this webinar how their SystemVerilog simulator along with other tools could be used in verifying a RISC-V core called Ibex.

I’ve included links to each open-source tool on GitHub, so go explore on your own and save some verification time, instead of starting from scratch.

To watch the archived webinar, visit here.

Related Blogs