Don’t You Forget About “e”

I imagine that the title of this post will remind many of 80s synth-pop, or perhaps the movie The Breakfast Club. But my topic is the venerable hardware verification language (HVL) known simply as e. It has quite an interesting history, and it played a key role in the development of the modern testbench methodology that most chip verification engineers use today. I was wondering about the language and where it stands now, and I thought that it would be an interesting topic for a blog post.

Let me start with the history. By the late 1980s, functional verification was hitting a wall. In the days of small chips, at best the designers might have hand-written some interesting input values or sequences, run them in simulation, and looked at waveforms to check the results. As chips grew bigger, this was no longer enough. Project managers saw value in separating design and verification, and during the second half of the 80s dedicated verification engineers became more common. They generally started with a verification plan in a spreadsheet or document, listing all the features to be verified. The engineers hand-wrote tests for these features, checking them off as they ran and passed in simulation. Verification teams gradually developed more automated methods, including randomized input data and self-checking tests.
They started using hardware description language (HDL) line coverage to see how well the tests exercised the design, and some of the more advanced teams added ad hoc functional coverage metrics such as reporting which states in a finite state machine (FSM) had been visited.

In the early 90s, a really smart guy named Yoav Hollander invented the e language to further automate chip verification. He developed the Specman tool to execute the language when linked with an HDL simulator, and formed InSpec (later named Verisity) to market the solution. Specman was introduced as a product in 1996 and it quickly gained favor with teams developing some of the biggest and baddest chips in the world.

Specman and e represented a major shift in verification. Object-oriented programming (OOP) provided data encapsulation, inputs were randomized within the bounds of constraints, functional coverage constructs generated precise verification metrics, assertions monitored for unexpected conditions, and aspect-oriented programming (AOP) made it easier for users to add new functionality to existing testbenches.

Cadence acquired Verisity, standardized e as IEEE 1647, and added native support to its line of simulators. The language was a significant influence on SystemVerilog (IEEE 1800), but it seemed that many Specman users had no interest in changing. It wasn’t just because of different syntax; e has several key features, especially around AOP, that were not—and are still not—available in SystemVerilog. There are countless millions of lines of e code in use, and new code is being developed all the time for new projects and even new companies, as experienced verification engineers change jobs and are reluctant to lose the productivity gains they have seen. I checked with friends at Cadence and they confirmed this active usage, noting that they have recently added some valuable new e-related features to Specman Elite and their flagship Xcelium simulator.
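To make those ideas concrete, here is a tiny sketch of constrained-random stimulus and functional coverage in Python rather than e. The class, field names, and ranges are invented for illustration; real e code expresses the same intent with `keep` constraints and `cover` groups, and the simulator drives generation.

```python
import random

# Sketch of two core e/Specman concepts in plain Python:
#   - constrained randomization: fields drawn only from legal ranges
#   - functional coverage: count how often each interesting case was hit
class Packet:
    def __init__(self, rng):
        self.length = rng.randint(1, 64)          # constraint: 1 <= length <= 64
        self.kind = rng.choice(["DATA", "CTRL"])  # constraint: only legal kinds

coverage = {"short": 0, "long": 0}                # functional coverage bins
rng = random.Random(0)                            # seeded for reproducibility
for _ in range(100):
    p = Packet(rng)
    coverage["short" if p.length <= 16 else "long"] += 1

print(coverage)  # both bins should be hit after 100 packets
```

In a real testbench the coverage model would come from the verification plan, and the random seed would vary from run to run to explore the state space.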
The most common rap against e has been that it is a “single-vendor language” but that’s not really the case. Specman Elite enables e support for other simulators and there have been multiple companies over the years offering related tools, Verification IP, and services. One of these is AMIQ EDA, whose Design and Verification Tools (DVT) Eclipse Integrated Development Environment (IDE) includes e support.

I touch base with their CEO Cristian Amitroaie every few months, so I asked him about the status of the language. Frankly, he surprised me a bit when he said that they have more than 1000 active users writing testbenches in e. They do have quite a few more SystemVerilog users, but the e-xperts remain e-nthusiastic and have no plans to give up the advantages they enjoy.

From Cristian’s viewpoint, e is just another in a long list of standard languages and formats they support, including Verilog and Verilog-AMS, SystemVerilog, VHDL, Portable Stimulus Standard (PSS), SystemC, Property Specification Language (PSL), the Universal Verification Methodology (UVM), and the Unified Power Format (UPF). He believes strongly that verification engineers using e have every right to expect the same sort of EDA tool features and support as their SystemVerilog and C/C++/SystemC colleagues.

Accordingly, DVT Eclipse IDE provides a full range of capabilities. Users can search and use hyperlinks to navigate around the testbench code as well as the design being verified. They can take advantage of specialized OOP and AOP views showing hierarchies, inheritance, and extensions. DVT Eclipse IDE compiles e code “on the fly” as it is typed in, reporting a wide range of syntactic and semantic errors. Cristian said that he is especially proud of the built-in language intelligence that allows the tool to suggest fixes for many classes of problems, from typographical errors and undeclared variables to errors in complex verification structures.
For new constructs being added to the testbench, DVT Eclipse IDE provides easy-to-complete templates that enable correct-by-construction programming. Renaming verification elements is performed with no need for manual searching, and code can be automatically reformatted to satisfy project or corporate coding guidelines.

I found it fascinating to learn how popular e is and to see the high level of assistance available to the many verification engineers devoted to this well-proven solution. As we discussed recently, engineers today live in a polyglot world and it’s great to see AMIQ EDA stepping up to support such a wide range of languages and formats as uniformly as possible. To learn more, visit https://www.dvteclipse.com.

Synopsys talks about their DesignWare USB4 PHY at TSMC's OIP

When USB initially came out, it revolutionized how peripherals connect to host systems. We all remember when Apple did away with many separate connections for mouse, keyboard, audio and more with their first computers supporting USB. USB has continued to develop more flexibility and more throughput. In 2015 Apple again introduced the MacBook with just a single USB Type-C connector and only a headphone jack. The Type-C connector has been used for USB 3.2, but will now also be used for the latest USB specification, USB4. Synopsys recently gave an excellent presentation on USB4 and their DesignWare USB4 PHY IP at the TSMC OIP event.
Despite all the changes and improvements in USB, each generation maintains compatibility with earlier versions. Gervais Fong, Director of Marketing at Synopsys, clearly described how backwards compatibility is maintained while impressive new features and performance are added. In 1998 the first specification for USB 1.1 allowed data transfers of 1.5 or 12 Mbits/s. Leaping forward, USB4 supports all previous data rates and can run at 40 Gbits/s max aggregate bandwidth.

One of the biggest additions is the USB4 host controller and device routers. Nevertheless, USB4 maintains bypasses for 1- and 2-lane legacy USB up to 20 Gbits/s and 1-, 2- or 4-lane DisplayPort 1.4 TX up to 20 Gbits/s. This permits older devices that do not use a USB router to still transfer data. USB4 also supports tunneling of PCIe, USB and DisplayPort at up to 40 Gbits/s. USB4 incorporates UTMI+ and PIPE5. Gervais included a useful slide showing USB4’s five different operating modes. Rather than try to describe the five modes, the slide is included below.

The trend of combining protocols is significant. It means that with a single connector, high-speed data for peripherals, networking, storage and displays are all supported. This improves the user experience and offers unmatched flexibility. A high level of interoperability is available because Apple and Intel are both contributing to and supporting USB’s evolution.

[Figure: Five Modes for DesignWare USB4 PHY]

While the user experience is improving, chip designers who want to incorporate USB4 need to ensure that their USB silicon is fully compliant and has been completely verified. The USB4 PHY alone needs to support a dizzying array of operating modes, configurations, protocols and speeds. Gervais points out the USB4 PHY is not just handling USB, it is handling DisplayPort and Thunderbolt as well. The PHY has to interface with and be compatible with the router and controllers.
Synopsys has developed a DesignWare USB4 PHY that meets all of the specification’s requirements and is available on 12nm, 6/7nm and 5nm. It is built on an optimized, low-power SerDes. Gervais said that they have over 100,000 CPU hours of simulation with Synopsys routers and controllers. Gervais also talked about their test silicon from TSMC N5 that is now being tested.

The PHY includes a programmable 3-tap Feed Forward Equalizer that is used to adjust the equalization for the various operating modes and frequencies. This is essential for meeting the USB4 PHY specifications. They have achieved first silicon success in TSMC N5P. The eye diagram for this silicon at 20 Gbits/s shows a wide-open eye for TX. The receive path includes a Continuous Time Linear Equalizer and a 1-tap Decision Feedback Equalizer with programmable settings.

The complete DesignWare USB4 solution from Synopsys includes PHYs, router, controller, verification IP and supporting subsystems. The talk presented a comprehensive overview of USB4 and its requirements, as well as an insightful look at the Synopsys DesignWare IP that supports interface development.

112G/56G SerDes - Select the Right PAM4 SerDes for Your Application
This is another installment covering TSMC’s very popular Open Innovation Platform (OIP) event, held on August 25. This event presents a diverse and high-impact series of presentations describing how members of TSMC’s vast ecosystem collaborate with each other and with TSMC. Not all SerDes are the same. The presentation covered here, from Cadence, discusses the various flavors of LR, MR/VSR and XSR high-speed SerDes and where they fit best. When it comes to 112G/56G SerDes, you really need to select the right PAM4 SerDes for your application.
The presentation was given by Wendy Wu, product marketing director at Cadence. Wendy has also worked in marketing and applications engineering at NetLogic Microsystems, Broadcom and Cavium, so she speaks with strong authority on the topic. She began her talk by discussing a semiconductor law that is somewhat less known than Moore’s Law, but very relevant. Rent’s rule is based on internal memoranda at IBM from 1960. It basically says that the number of I/O pins tracks the number of gates/transistors, so as functionality increases, I/O bandwidth must increase as well. This is why the topic is inherently important.
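For the curious, Rent's rule is usually written T = t·G^p, where T is the number of external terminals, G the internal gate count, t the average terminals per gate, and p the Rent exponent. The constants below are illustrative assumptions, not figures from the talk:

```python
# Rent's rule: T = t * G**p
#   T = external I/O terminals, G = gate count,
#   t = terminals per gate, p = Rent exponent (typically ~0.5-0.8).
# t and p below are illustrative, not measured values.
def rent_terminals(gates: int, t: float = 4.0, p: float = 0.6) -> float:
    return t * gates ** p

# I/O demand grows with gate count, but sub-linearly:
for g in (10_000, 100_000, 1_000_000):
    print(f"{g:>9} gates -> ~{rent_terminals(g):,.0f} terminals")
```

The sub-linear exponent is the whole story: packages cannot add pins as fast as dies add gates, which is exactly why per-pin bandwidth (i.e., faster SerDes) has to keep climbing.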
Wendy then discussed how high-speed interconnect is the backbone of cloud data centers. Higher throughput with lower latency and flat power describe the challenge. Wendy shared an interesting statistic – 85% of the traffic in a typical data center is between compute nodes in that data center. Data communications is clearly a key item for continued growth in this huge market.
Looking at AI requirements for high-speed comms, 7nm and 5nm are the preferred nodes today, with 3nm around the corner. We are at the cutting edge here. Wendy then discussed the various applications for 56G and 112G SerDes. She touched on four areas:
- Long reach: backplane applications, between processors and racks. Drive, performance and signal loss are key parameters here.
- Medium reach: chip-to-chip and mid-range backplanes.
- Very short reach: chip-to-module applications.
- Extra short reach: die-to-die, system-in-package applications.
With regard to die-to-die communications, three methods were discussed. This technology is also an enabler for the growing chiplet market. There is the previously discussed PAM4 SerDes approach. NRZ serial interface is another approach. Finally, a parallel interface can be considered, similar to what is used for HBM stacks with a silicon interposer. Each of these approaches has its strengths and weaknesses.
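As a side note on the PAM4 signaling these SerDes families share: PAM4 carries two bits per symbol on four voltage levels, conventionally written ±1 and ±3, and usually Gray-coded so adjacent levels differ by only one bit. This toy encoder is a generic sketch, not Cadence's implementation:

```python
# PAM4 packs 2 bits per symbol onto 4 amplitude levels, doubling the bit
# rate per lane versus NRZ at the same baud rate. Gray coding keeps
# adjacent levels one bit apart, so a single-level slip costs one bit error.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def bits_to_pam4(bits):
    """Map an even-length bit list to a list of PAM4 symbol levels."""
    assert len(bits) % 2 == 0, "PAM4 consumes bits two at a time"
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

print(bits_to_pam4([0, 0, 0, 1, 1, 1, 1, 0]))  # [-3, -1, 1, 3]
```

The trade-off versus NRZ is that the three smaller eye openings make PAM4 far more sensitive to noise and channel loss, which is why equalization choices (next section) matter so much.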
Next, Wendy examined analog vs. digital equalizer architectures. An analog solution delivers better density and lower power but is susceptible to channel noise and can equalize up to 20 dB of loss. Analog-to-digital, DSP-based approaches are more stable and reliable. They can equalize up to 40 dB of loss. Traditionally, these solutions have been higher power than analog. Starting at 7nm and below, the power requirements of digital solutions are very similar to analog. With all this background, what is the best approach? Clearly that depends on the application. Wendy provided a good overview of where each technology fits. This is captured in the diagram below.
Wendy then discussed the 56G and 112G offerings from Cadence, built by a best-in-class engineering team that is strong in both analog and digital techniques. The IP is fully compliant with relevant industry standards. She also pointed out that Cadence works with connector, cable and optical module suppliers to ensure good interoperability. Both 56G and 112G parts are proven with multiple test chips. She explained that the portfolio can support requirements from LR to XSR. These points are illustrated by the graphic at the top of this post.
Wendy went into some detail on the Cadence 112G-LR DSP SerDes. The key advantages are summarized in the figure below.
Wendy concluded with a discussion of the Cadence UltraLink D2D PHY IP. This IP can connect two designs through a multi-chip module or an organic substrate. The figure below summarizes the performance parameters of this IP.
You can learn more about how to select the right PAM4 SerDes for your application and the Cadence IP portfolio here.

Verifying Warm Memory. Virtualizing to manage complexity

SSD memory is enjoying a new resurgence in datacenters through NVMe. It is not a replacement for more traditional HDD disk drives, which though slower are still much cheaper. NVMe storage has instead become a storage cache between hot DRAM memory close to processors and the “cold” HDD storage. I commented last year on why this has become important for the hyperscalers. Cloud throughput, and therefore revenue, is heavily impacted by storage latencies, which makes fast storage cache a high priority. That creates implications for verifying warm memory: proving your solution will deliver what it promises.

You start to wonder what other operations you could offload into storage. SQL serving, for example. Database operations work on lots of data, which can dominate latency (and power) if you first have to drag it all over to the processor. It’s faster and lower power to do the bulk of the heavy lifting right in the NVMe unit. I’ve even seen a recent suggestion that linear algebra could be moved into SQL, from which it would be a short jump to push it into NVMe. Another paper suggests an architecture to accelerate big data computation using this kind of approach.
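A back-of-the-envelope model shows why pushing a selective filter into the drive pays off. The bandwidth and selectivity numbers below are my own illustrative assumptions, not figures from any paper mentioned here, and the model counts only transfer time:

```python
# Toy model: time to move data for a selective SQL-style filter.
# A host-side filter must ship the whole table over the link first;
# an in-storage filter ships only the matching rows.
LINK_GBPS = 32.0  # assumed usable host link bandwidth, gigabits/s

def transfer_seconds(gigabytes: float) -> float:
    return gigabytes * 8.0 / LINK_GBPS

def host_side_filter(gb: float) -> float:
    return transfer_seconds(gb)                 # move everything, then scan

def in_storage_filter(gb: float, selectivity: float = 0.01) -> float:
    return transfer_seconds(gb * selectivity)   # move only matching rows

print(host_side_filter(100.0), in_storage_filter(100.0))  # 25.0 0.25
```

Even ignoring the compute itself, a 1%-selective query moves 100x less data, which is the whole argument for computational storage.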
Architecture complexity

It seems there is no limit to what we can do with computation close to storage, when we put our minds to it. All of which makes that NVMe memory much more powerful. The downside is that verifying warm memory implementations, already complex, becomes even more complex. First there’s the architecture complexity. One of these devices may service multiple hosts and many I/O queues. It must provide a similar level of security to that offered by the hosts, including at least encryption, perhaps a hardware root of trust and other features to harden the device against attacks.
Implementation complexity

Then there’s the implementation complexity. It must deal with the NVMe interface, encryption, logical-to-physical address mapping, wear-leveling, garbage collection, the interface with local DRAM through DDR (to store data while it’s doing garbage collection) and so on. This is a full-blown processor in its own right. As if that weren’t enough, you can’t just model the flash as perfect memory. Reading a bit can return a soft error to which the controller must adapt. According to the Mentor Veloce folks, design teams need to model flash bit behavior down to this level of accuracy in order to have full confidence in their system-level testing. Mentor provides soft models for NAND, NOR and DDR to represent these components.
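The flash-accuracy point can be illustrated with a toy model that injects soft errors on reads. The bit error rate here is an assumed value for illustration, not a figure from Mentor's models:

```python
import random

# A "perfect memory" flash model hides the soft errors a real NAND read
# can return. This toy read path flips bits at an assumed raw bit error
# rate (BER), so the controller's ECC/retry logic actually gets exercised.
def nand_read(page: bytes, ber: float = 1e-4, rng: random.Random = None) -> bytes:
    rng = rng or random.Random()
    out = bytearray(page)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < ber:
                out[i] ^= 1 << bit   # inject a soft error into this bit
    return bytes(out)
```

A testbench built on a model like this (plus wear- and retention-dependent BER) forces the controller's error handling through its paces, which a clean memory array never would.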
Traffic complexity

Finally, there’s traffic complexity. A verification plan must also model traffic with all the variations you might expect to see in the loads from the host (one or more servers), connected through a PCIe interface. For benchmarking, this requires running a standard I/O load like IOmeter, FIO or CrystalMark, and measuring throughput, latencies, and all the other factors you are aiming to improve through the use of warm memories.

Put all of this together and you have a big verification task: a virtual host and an SSD simulation model, which you have to run in emulation to deliver the kind of throughput you need for this volume of verification. Ben Whitehead, Storage Products Specialist at Mentor, has written a white paper, “Virtual Verification of Computational Storage Devices”, describing the Veloce solution they have assembled to address this need, with a bunch of application-specific features for measurement, checking and debug. An interesting read for anyone working in this hot domain.

Update on Mentor’s Acquisition of Avatar Integrated Systems

Mentor Graphics, a Siemens Business, has completed its acquisition of EDA company Avatar Integrated Systems. I recently spoke with Joe Sawicki, Executive VP of the Mentor IC EDA segment, about the acquisition strategy and IC Design platform goals for integration of the Avatar products.
Avatar (formerly ATopTech) focused on physical implementation tools for complex, digital SoC designs, e.g., floorplanning, placement, clock-tree synthesis, routing, and ECO flows. Specifically, the foundation of the Aprisa product is a set of physical algorithms built on a route-centric, hierarchical data model. The right-hand side of the figure below highlights the Avatar strategy.

The Aprisa SAPR input data is a simple LEF/DEF design model from a (physical-aware) logic synthesis toolset. From the synthesis netlist, Aprisa applies optimizations that focus on ensuring subsequent routability, e.g., congestion avoidance, pin access, and adherence to multipatterning decomposition coloring. An internal physical DRC verification engine is applied. A diverse set of clock-tree design styles is available, including useful clock skew timing optimizations throughout. An internal synthesis engine allows for further optimization. The input netlist placement assumptions may not accurately reflect the route impact of congestion, R*C delays, and clock skews, so logic restructuring based on the routing model may be needed. The tool incorporates static timing, noise, IR, and EM analysis algorithms to guide placement and route assignment decisions.

Joe indicated, “Designers of complex SoCs at advanced nodes are seeking the following from their APR flow: better synthesis-to-post-route timing correlation, no coupling noise issues, no DRC violations, in short, fewer APR iterations and faster time to closure. We benchmarked Aprisa, and found the PPA results to be excellent. The learning curve was extremely quick. We had competitive evaluation data within a few weeks.”

The figure above illustrates the pre-route (Steiner estimate) to post-route timing correlation on the Mentor benchmarks at the 7nm node. Joe then described the IC Design product strategy. “The Nitro-SoC platform will be supported through the 16/14nm node. Going forward, Aprisa will be the SAPR solution for 7nm and below.
The DRC engine that was internal to Aprisa will be replaced by Calibre InRoute.” Joe continued, “The strength of the combined engineering and support teams will offer roadmap stability and continuity to customers, who may have been anxious given the relatively small size of Avatar’s team. Mentor will leverage its relationship with the foundries to extend the Aprisa product certification for advanced process nodes.”

With regard to the competitive position of the new offering, relative to the integrated platforms available for physical implementation, Joe said, “Designers want an APR tool that is feature-rich and easy to use. The route-centric data model and optimization algorithms in Aprisa provide faster closure and signoff-accurate results. The use of a physical-aware (placement-centric) synthesis flow is a good start, but the set of optimizations available is a key differentiator, specifically route-aware logic re-synthesis. Refinement is where you get considerable value. We’ve already flipped customers from other products.”

It will be interesting to track how Aprisa emerges in the reference flow certification from the foundries, and how the route-centric, logic re-synthesis methodology evolves as a point-tool solution. Mentor’s acquisition of Avatar expands the scope and future development of SAPR offerings. More competition among EDA providers is always a good thing for the IC design community.
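As an aside on that pre-route (Steiner estimate) correlation: the simplest and most common pre-route wirelength proxy is half-perimeter wirelength (HPWL), which Steiner-tree estimators then refine for multi-pin nets. A minimal sketch, with invented pin coordinates:

```python
# Half-perimeter wirelength (HPWL): the bounding-box estimate of a net's
# wirelength that placers optimize before routing. For 2- and 3-pin nets
# it equals the rectilinear Steiner minimum tree length; for larger nets
# it is a lower bound that Steiner estimators tighten.
def hpwl(pins):
    """pins: iterable of (x, y) coordinates for one net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

print(hpwl([(0, 0), (3, 4), (1, 2)]))  # 7
```

The better these cheap estimates track the final routed parasitics, the fewer placement/route iterations a flow needs, which is exactly the correlation claim being benchmarked above.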
Executive Interview: Vic Kulkarni of ANSYS

On the eve of the Innovative Designs Enabled by Ansys Semiconductor (IDEAS) Forum, I spoke with Vic on a range of topics including his opening keynote: Accelerating Moore and Beyond Moore with Multiphysics. You can register here.

Vic Kulkarni is Vice President and Chief Strategist, Semiconductor Business Unit, Ansys, San Jose, CA. Vic is responsible for steering the business, technology, go-to-market and product strategy, connecting the dots from chip-package-system design solutions with ANSYS multiphysics simulation technology to address challenges faced by multiple verticals, including 5G, AI, HPC, mobile and autonomous. He drives strategic customer executive relationships and acquisitions with the Ansys leadership team.

Q: What are the key trends which are shaping your business?

The hi-tech sector remains strong. We are witnessing a renaissance in semiconductor and electronic systems. We see an emerging duality between Moore’s Scaling Law and the Beyond Moore trend. On the one hand, compute-intensive demands by a range of markets, including HPC, cloud, storage, autonomous vehicles, 5G, and ML/AI, are driving feature sizes down from 5nm to 4nm and now 3nm as Tier-1 semis and hyperscalers continue to invest in semiconductors.
This is due to increased workloads of HPC cloud compute, networking storage, 5G, and AI training and inferencing chips like the Google TPU. At the same time, there is an accelerating trend to go Beyond Moore with 2.5/3D ICs, chiplets, and other multi-die configurations driven by edge compute, 3D intelligent sensors for autonomous, and high-bandwidth, low-latency, power-, area- and cost-sensitive applications. We believe that pervasive multiphysics simulation and analysis in all phases of the design cycle, from ideation to lifecycle management, will be an important enabler to accelerate innovation and achieve silicon-to-system success.

Q: How are customers responding to the pandemic?

Despite COVID-19, we kept focusing on our customer support excellence and achieved significant success in pre-sales campaigns, customer design tape-outs and customer technical collaboration. A few cash-poor startups are affected by COVID-19, but that’s a small fraction of our business. We see great momentum for our RedHawk-SC flagship PI-SI signoff product in China. We completed 9 evaluations and have several ongoing/planned product evaluations. Automotive electronics remains on track, as these companies continue to invest in R&D that enables autonomy.

Q: Tell me more about your upcoming opening keynote for the IDEAS Digital Forum.

Vic took me through his presentation, which is a great set-up for the first day. He starts with a brief overview of the Ansys Multiphysics Simulation Platform and moves into the benefits of simulation-driven design from Concept to Design to Validation and the resulting savings. ANSYS has a broad range of customers, so these numbers are VERY impressive. Vic then talks about custom chips by systems companies for differentiation and faster TTM, semiconductor megatrends and technology challenges. The airplane graphic above explains it quite well (ANSYS tools are on the wings).
Bottom line: ANSYS is an important part of the leading-edge semiconductor ecosystem for simulation, AI/ML, HPC, 5G, hardware security and autonomous vehicles. And while I miss the ANSYS live events (great food and networking), the ANSYS virtual events are an absolute must-attend.
This is another installment covering TSMC’s very popular Open Innovation Platform (OIP) event, held on August 25. This event presents a diverse and high-impact series of presentations describing how members of TSMC’s vast ecosystem collaborate with each other and with TSMC. The presentation covered here, from Synopsys, focuses on the unique needs of training and inference for AI/ML engines. The algorithms implemented by these designs have very specific requirements, and meeting those requirements demands specialized IP. These special needs and the optimized Synopsys DesignWare IP are discussed to illustrate how AI/ML SoCs get a boost from Synopsys IP on TSMC’s 7nm and 5nm processes.
The presentation was given by Faisal Goriawalla, senior product marketing manager at Synopsys. Faisal has over 18 years of engineering and marketing experience in embedded physical IP libraries and non-volatile RAM. He started his career developing embedded SRAM memory compilers and, before joining Synopsys, held various technical and marketing positions for memories, standard cells and I/O libraries at ARM. Faisal's strong background inspires confidence.
Faisal began his presentation focusing on the unique requirements of deep learning and convolutional neural networks (CNNs). He explained that CNNs create a mathematical graph of a problem and train it with a data set of known values. The process begins with training the network, which is compute-intensive, and then proceeds to inference, where the trained model is deployed. He gave a very good explanation of the requirements of various AI problems with regard to performance, model compression and power. The diagram below summarizes this discussion.
He then explained some of the aspects of a CNN and how it is used to process two-dimensional data. This segment of the presentation provides a very good overview of AI algorithms. I recommend watching it if this is of interest.
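To make concrete why CNN layers are so dominated by multiply-accumulate (MAC) work, here is a minimal sketch of a 2D convolution over two-dimensional data in plain Python/NumPy. The image, kernel, and sizes are invented for illustration; real frameworks use far more optimized implementations:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D convolution (strictly, cross-correlation, as in most
    deep-learning frameworks). Each output pixel costs kh*kw
    multiply-accumulate (MAC) operations."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            # One dot product per output pixel: the MAC hot loop.
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[1.0, -1.0]])   # toy horizontal-difference kernel
feat = conv2d(image, edge)

macs = feat.size * edge.size     # MAC count for this single tiny layer
print(feat.shape, macs)          # → (5, 4) 40
```

Even this toy layer does one MAC per kernel tap per output pixel; scaled to real feature-map and channel counts, that product is what makes MAC arrays the area and congestion hot spot in AI silicon.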
Faisal then discussed some of the design challenges for AI chips. Of course, power and area are key items, along with a predictable schedule. He pointed out that an application-aware approach is needed to meet these goals. Some of the items to consider with an approach like this include:
- Choosing the right mix of VTs, gate lengths (Lg) and track heights
- Converging on an optimal floorplan
- Managing congestion in multiply-accumulate blocks (MACs)
- Navigating the RTL to GDSII flow
- Achieving PPA targets
Faisal went into some detail on these points. The discussion then turned to application-aware IP, what is needed, and what the benefits will be. From an IP component point of view, what is needed to achieve PPA targets includes:
- Low-power memories, especially for read operations
- Low-power combo cells to reduce internal energy
- Complex combinational cells to reduce switching power
- Special clock gates with lower internal power
- Granular delay cells to reduce the area and power cost of hold fixes
- Multi-bit flops to reduce active power
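The power-oriented items in this list all attack the classic switching-power relation P = α·C·Vdd²·f, mostly by lowering the effective activity factor α or the internal capacitance. A back-of-the-envelope sketch of how, say, clock gating shows up in that model (all numbers invented for illustration, not from the presentation):

```python
def dynamic_power(alpha, cap_f, vdd, freq_hz):
    """Classic switching-power model: P = alpha * C * Vdd^2 * f."""
    return alpha * cap_f * vdd ** 2 * freq_hz

# Illustrative numbers only.
C = 1e-9     # 1 nF of total switched capacitance
VDD = 0.75   # volts
F = 1e9      # 1 GHz

p_free = dynamic_power(0.20, C, VDD, F)         # 20% activity, ungated
p_gated = dynamic_power(0.20 * 0.4, C, VDD, F)  # gating masks 60% of toggles

saving = 1 - p_gated / p_free                   # ~60% dynamic-power saving
print(p_free, p_gated, saving)
```

The same model explains the other bullets: multi-bit flops and complex combinational cells shrink C, while lower-internal-power clock gates and combo cells reduce the per-toggle energy directly.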
From a methodology point of view, what is needed includes:
- Choice of VT/Lg to give a good starting point on PPA
- Post-route power recovery to reduce leakage
- Flow-stage correlation that never adds more than 10% to any metric
Faisal then discussed some of the DesignWare IP solutions from Synopsys to address these requirements:
HPC Kit Enhanced for AI Applications
This package includes IP for object detection and recognition. Special cells reduce CNN power consumption by up to 39%, and tradeoff tuning enables a 7% frequency boost with 28% lower power. The figure below summarizes some of the benefits of the HPC Kit. This IP is typically used for ADAS applications.
The benefits of customizing memory architectures to optimize PPA for AI designs were also discussed. Synopsys offers a wide range of architectures, bitcells, VTs and PVTs here, including:
- Ultra-high density, high density and high speed
- Small (128Kb-range) register files
- Large (>1Mb-range) SRAMs
- UHD 2-port memories provide FIFO functionality with smaller area & lower leakage at slower speeds
- Configurable multi-port memories
AI designs are typically core-limited (as opposed to pad-limited). Inline I/O libraries with a shorter, wider form factor are optimal for reducing SoC area in this situation. Synopsys offers DesignWare IO Libraries with:
- High (up to 250MHz) performance and high drive strengths for additional margin while supporting longer trace lengths
- Support for 1.8V, 2.5V and 3.3V I/O supplies (technology dependent) for other interfaces on an AI/ML SoC
The ability to integrate an on-chip test and repair engine is important for reducing area and power in AI applications. The Synopsys STAR Memory System provides this support. Total core area can be reduced by ~7% and dynamic power can be reduced by ~12%.
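Test-and-repair engines of this kind work by mapping failing bitcells onto spare rows and columns built into the memory. As a rough illustration of the allocation problem, here is a generic greedy sketch — not how the Synopsys STAR Memory System actually works, and note that optimal spare allocation is NP-complete, so real repair engines use stronger heuristics:

```python
from collections import Counter

def allocate_spares(faults, spare_rows, spare_cols):
    """Greedy spare-row/column allocation for memory built-in
    self-repair (BISR). A simplified sketch only.
    Returns (repaired?, rows_used, cols_used)."""
    faults = set(faults)
    rows_used, cols_used = [], []
    while faults:
        row_hits = Counter(r for r, _ in faults)
        col_hits = Counter(c for _, c in faults)
        r, rn = row_hits.most_common(1)[0]
        c, cn = col_hits.most_common(1)[0]
        # Repair whichever line covers the most remaining faults.
        if rn >= cn and len(rows_used) < spare_rows:
            rows_used.append(r)
            faults = {f for f in faults if f[0] != r}
        elif len(cols_used) < spare_cols:
            cols_used.append(c)
            faults = {f for f in faults if f[1] != c}
        elif len(rows_used) < spare_rows:
            rows_used.append(r)
            faults = {f for f in faults if f[0] != r}
        else:
            return False, rows_used, cols_used   # out of spares
    return True, rows_used, cols_used

# Three faults on row 2 plus a stray fault at (5, 7): one spare row
# and one spare column suffice.
ok, rows, cols = allocate_spares({(2, 1), (2, 4), (2, 6), (5, 7)},
                                 spare_rows=1, spare_cols=1)
print(ok, rows, cols)   # → True [2] [7]
```

Integrating this decision logic on-chip, alongside the BIST engine that produces the fault map, is what lets yield recovery happen at power-up without tester time.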
Faisal concluded by explaining that the IP discussed is silicon-proven in volume at TSMC 7nm and proven in test silicon at TSMC 5nm. You can learn more about Synopsys DesignWare IP for AI here. You can access the TSMC OIP presentations here. AI/ML SoCs truly get a boost from Synopsys IP on TSMC's 7nm and 5nm.

A checker tripped in verification. Is there a bug trace minimization technique to simplify manual debug? Paul Cunningham (GM, Verification at Cadence), Jim Hogan and I continue our series to highlight all the great research that's out there in verification. Feel free to comment.
The Innovation
This month's pick is Simulation-Based Bug Trace Minimization With BMC-Based Refinement. We found this paper in IEEE Transactions on CAD, 2005. The authors are/were from the University of Michigan. This is an old paper but still intriguing.

Debug, tracing back from an identified bug to the root cause, is the biggest time-sink in verification, so any contribution to reducing that time will have high value. The authors' approach starts with a waveform trace from a simulation or semi-formal analysis and aims to reduce it to a much shorter trace that still triggers the bug, an easier starting point for manual debug. The paper describes four simulation-based techniques and one BMC-based technique for reducing traces.

They first reduce traces by removing cycles, re-simulating each time to check that the bug still reproduces. At first glance this looks very unscalable, requiring O(N²) runs for a trace of N cycles. They greatly reduce this complexity by first hashing the circuit state at each cycle, then watching for hash matches between the original trace and a re-simulation of a candidate reduced trace. If a previously hashed state is hit during re-simulation, they know that the bug can be reached from that state, so the simulation can be aborted: the reduction is proven viable. Through this process they look for any variant trace which triggers the check sooner, which becomes a new and shorter reference trace. Alternatively, a re-simulation may hit a state already seen at a later clock cycle of an earlier analysis; the trace can then skip ahead, also leading to a shorter reference trace. In a final high-effort simulation step, they also look for opportunities to drop individual input events (rather than whole cycles), as shown in table 3.
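The simplest of these techniques, cycle elimination by re-simulation, can be sketched in a few lines. This toy version (the accumulator "DUT" and its checker are invented for illustration) omits the paper's state hashing and BMC refinement, but shows the core loop of dropping cycles and keeping any shorter trace that still fires the checker:

```python
def simulate(step, init, inputs, is_bug):
    """Apply inputs cycle by cycle; return the 1-based cycle at which
    the checker first fires, or None if it never does."""
    state = init
    for i, inp in enumerate(inputs):
        state = step(state, inp)
        if is_bug(state):
            return i + 1
    return None

def minimize_trace(step, init, inputs, is_bug):
    """Cycle elimination by re-simulation: repeatedly try dropping one
    cycle, keeping any candidate that still fires the checker,
    truncated at the cycle where it fires. O(N^2) runs in the worst
    case -- the paper's state hashing exists to prune exactly this."""
    n = simulate(step, init, inputs, is_bug)
    assert n is not None, "reference trace must trigger the bug"
    trace = list(inputs[:n])
    i = 0
    while i < len(trace):
        candidate = trace[:i] + trace[i+1:]
        n = simulate(step, init, candidate, is_bug)
        if n is not None:
            trace = candidate[:n]   # shorter failing trace found
            i = 0                   # rescan from the start
        else:
            i += 1
    return trace

# Toy DUT: an accumulator whose checker fires when the sum hits 7.
step = lambda s, x: s + x
bug = lambda s: s == 7
short = minimize_trace(step, 0, [3, 1, -1, 2, 0, 1, 1], bug)
print(short)   # → [3, 1, 2, 1]
```

Here a 7-cycle failing trace shrinks to 4 cycles; on real designs the paper layers state hashing, skip-ahead, input-event dropping, and BMC on top of this basic loop.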
Using common datapath functions, an FPU, DES and a picoJava engine as benchmarks, the authors show impressive reductions in trace lengths, better than 98% in most cases, and better than 99% removal of unnecessary inputs in all cases. Runtime on most tests was under a minute; the most complex (picoJava) took 10 hours for 30k cycles. Reduced traces were mostly under 10 cycles.
Paul's view
This reminds me of an earlier paper we discussed, "Using AI to locate a fault". Both combine multiple methods in fault tracing, getting more out of the combination than out of any one method alone. This approach combines five methods, four simulation-based and one BMC-based. The results, e.g. in figure 12, clearly show that different techniques have different impact on different testcases, which underlines that you really need all of them. For a commercial vendor this looks very practical: a fusion of methods rather than a single super-method. Tables 5 and 6 show the bulk of the reduction coming from the simulation methods, with a smaller incremental benefit from BMC, which is also encouraging for commercial mass deployment given scalability considerations for model-checking.

Intuitively the method makes a lot of sense. State hashing and looking for matches will almost certainly be very effective on randomized simulations. It's the classic computer-science random-walk problem: the drunken man walking randomly does a lot of circling relative to the useful distance actually covered. All those circles are quickly pruned away by looking for state-hash matches, which should massively reduce practical runtimes. In their experiments picoJava (around 140k gates) ran in 10 hours, on a 1GHz Sun blade (2005, remember). Now 3GHz servers are typical, so you're looking at roughly 500K gates in 10 hours for a single-CPU job on a modern server farm. The algorithm is also very parallelizable, so it can be scaled up by simply farming re-simulation jobs out to multiple servers sharing the same state-hash database, which makes it very commercially interesting.
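Paul's random-walk intuition is easy to check numerically: a symmetric 1-D walk revisits old states so often that the number of distinct states visited grows only on the order of the square root of the number of steps, and that redundancy is exactly what state hashing prunes. A quick sketch (a toy walk, not a circuit model):

```python
import random

random.seed(0)          # fixed seed so the run is repeatable
pos, visited = 0, {0}
steps = 10_000
for _ in range(steps):
    pos += random.choice((-1, 1))   # the drunken man's next step
    visited.add(pos)

revisit_fraction = 1 - len(visited) / steps
print(len(visited), revisit_fraction)
```

On a typical run the walk touches only a few hundred distinct positions in 10,000 steps, i.e. well over 90% of simulation effort re-covers old ground — the "circles" a state-hash database lets you skip.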
Jim's view
Moving this into the cloud seems a great way to speed it up and reach larger designs. Thinking about it, 10 hours is an overnight run; you could slipstream these runs in behind regression runs, and by the time the verification team had a chance to look at them, most or all of those traces would already be reduced. On investment, this naturally fits into a verification suite, so it's not an independent product. A bit of a challenge is that this speeds up something engineers already do rather than making something possible that wasn't possible before. Can it in some way show a huge improvement in productivity? Maybe add an AI angle for a 100X improvement over time? That could enhance the appeal for an investor.
My view
First, while a lot of good ideas come from software verification, it's nice to see some coming from hardware verification. Second, I had been looking mostly for recent papers; Paul pushed me to look at some older papers as well. Good intuition! Click HERE to see the previous Innovation blog.

Oh, our semiconductor industry just loves acronyms, and the title of my blog packs three of the most popular together at once. I attended a webinar hosted by Aldec last week on this topic, "UVM Simulation-based environment for Ibex RISC-V CPU core with Google RISC-V DV". Verification engineers have been adopting the Universal Verification Methodology (UVM) in order to make their verification results more robust, in less time. RISC-V continues to grow in importance as an open-source Instruction Set Architecture (ISA), and at the dac.com site there are some 3,110 search results for RISC-V. I expect this trend to continue, because engineers often want to customize aspects of their SoC for a specific purpose or domain. A big question then arises: how do you actually verify a RISC-V project? Google has created an SV/UVM-based instruction generator for RISC-V processor verification and posted it on GitHub. There have been some 765 commits, so this is an actively supported instruction generator.
There are many RISC-V core projects around the world to choose from, and Ibex is a small, 32-bit RISC-V core, also available on GitHub with 1,860 commits to date. Using Riviera-PRO, Aldec simulates the UVM testbench with the Google DV random instruction generator and the Ibex RISC-V core.

(Source: https://ibex-core.readthedocs.io/en/latest/verification.html)

In the testbench diagram, SV classes are blocks with rounded corners, SV modules are shown with square corners, and the code to be run is depicted in blue with folded corners. Random commands come from the Google DV generator, and the testbench also injects random interrupts during testing. The co-simulation flow has both an ISS and the RTL loaded with test binaries; simulations are run, then the results are compared by a Python script. You can have the same verification experience if you assemble all of the pieces:
- SystemVerilog simulator that supports UVM (i.e. Riviera-PRO)
- Instruction Set Simulator (Spike or OVPsim)
- RISC-V toolchain
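The final step of the co-simulation flow above, comparing ISS and RTL results with a Python script, is conceptually simple: walk the two retired-instruction logs in lockstep and flag the first divergence. A minimal sketch of that comparison step (the log format and field names here are invented for illustration, not the actual Ibex/Google DV scripts):

```python
def parse_trace(lines):
    """Each line: '<pc> <reg> <value>', e.g. '80000004 x2 0000002a'."""
    return [tuple(line.split()) for line in lines if line.strip()]

def compare_traces(iss_lines, rtl_lines):
    """Walk both retired-instruction logs in lockstep. Return None on
    a match, else (index, iss_entry, rtl_entry) of the first
    divergence; a length mismatch diverges at the shorter log's end."""
    iss, rtl = parse_trace(iss_lines), parse_trace(rtl_lines)
    for i, (a, b) in enumerate(zip(iss, rtl)):
        if a != b:
            return i, a, b
    if len(iss) != len(rtl):
        return min(len(iss), len(rtl)), None, None
    return None

iss_log = ["80000000 x1 00000001", "80000004 x2 00000002"]
rtl_log = ["80000000 x1 00000001", "80000004 x2 0000ffff"]
print(compare_traces(iss_log, rtl_log))   # diverges at instruction 1
```

The value of random instruction generation comes from exactly this kind of check: the ISS acts as a golden reference, so any architectural-state mismatch localizes an RTL bug to a specific retired instruction.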
Summary
RISC-V is one of the biggest topics of 2020 for the electronics industry, and its ecosystem continues to grow each day, but verification can be a burden. Aldec showed in this webinar how their SystemVerilog simulator, along with other tools, can be used to verify a RISC-V core called Ibex. I've included links to each open-source tool on GitHub, so go explore on your own and save some verification time instead of starting from scratch. To watch the archived webinar, visit here.