
Podcast EP50: What happens next in the CPU and GPU wars?

by Daniel Nenni on 11-26-2021 at 10:00 am

Tom is the creator of the Moore’s Law Is Dead YouTube Channel and Broken Silicon podcast. He creates videos and writes articles containing in-depth commentary and analysis of what’s going on in Technology, Gaming, and Computer Hardware, and on Broken Silicon he also recaps the news and interviews people working within the gaming & semiconductor industry.

YouTube Channel (https://www.youtube.com/channel/UCRPdsCVuH53rcbTcEkuY4uQ)

Podcast
(https://podcasts.apple.com/us/podcast/broken-silicon/id1467317304)

Website
(https://www.mooreslawisdead.com/).

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Pradeep Vajram of AlphaICs

by Daniel Nenni on 11-26-2021 at 6:00 am


Pradeep Vajram is a successful entrepreneur and a veteran of the Semiconductor / Embedded industry. He has more than 25 years of experience in the design and development of ASIC products, having executed at all levels of responsibility.

Since 2017, Pradeep has been an active investor in semiconductor and deep-tech start-ups in the USA-India corridor, and he has vast experience in building successful businesses in Silicon Valley and India.

Currently, Pradeep is the CEO & Exec. Chairman of the AlphaICs Corporation. Before AlphaICs, Pradeep founded SmartPlay Technologies in 2008 – the world’s first integrated end-to-end product engineering services company. SmartPlay was then acquired by Aricent in 2015.

Prior to SmartPlay, he served as the Vice President of Engineering at Qualcomm, heading the India semiconductor division in Bangalore. Under his leadership, Qualcomm Bangalore Design Center developed into a strong center of excellence and delivered multiple 3G/4G products successfully.

Prior to Qualcomm, Pradeep was the CEO & co-founder of Spike Technologies – a leading chip design services company. Spike was acquired by Qualcomm in 2004.

Pradeep has a Bachelor’s degree in Electronics Engineering from Karnataka University and a Master’s degree in Computer Engineering from Wayne State University, Detroit.

What is the backstory of AlphaICs and what does it do?

AlphaICs Corporation, a 4-year-old startup, designs and develops best-in-class AI co-processors that deliver high-performance AI computing on edge devices. With the growing popularity of Deep Neural Networks, there has been huge demand for running such networks in real time on edge devices. The AI hardware market is estimated to reach $67 billion by 2025. We have developed a power-efficient, high-throughput AI processor technology called Real AI Processor (RAP™) for accelerating AI workloads. RAP™ is highly scalable and modular, enabling OEMs to choose the configuration that fits their performance and power requirements.

The RAP™ co-processor can be configured from 0.5 TOPS to 32 TOPS and can scale beyond 32 TOPS (64 TOPS, 128 TOPS, etc.) using a multi-core strategy. We have developed the entire software stack for taking neural networks developed in standard AI frameworks and deploying them on RAP™. The software tool-chain provides an easy way to port existing neural networks onto our processors. Our software stack currently supports TensorFlow, and we plan to add support for other AI frameworks in the future.

What is your current status and go-to-market strategy?

We are excited to have our first silicon, Gluon, an 8 TOPS AI inference co-processor. We showcased Gluon’s capabilities with our marketing partner CBC at the AI Expo in Tokyo, Japan, last month.

The response to our technology was very encouraging, and we are very excited to bring this product to our customers. Competing solutions on the market offer an SoC that integrates the host processor and the AI accelerator, which necessitates a complete redesign of the system, resulting in huge investment and delay. We believe a co-processor strategy will let our customers quickly integrate AI capabilities into their current systems, resulting in significant savings. Our initial focus is video analytics. This is a big market, and many verticals such as surveillance, retail, automotive, manufacturing, and healthcare will have AI-enabled video analytics applications by 2025.

Our product enables OEMs and system integrators to achieve their cost and power-performance goals for edge solutions. So, in a nutshell, we are developing high-performance, low-power, easy-to-use edge AI co-processors that let our customers integrate AI quickly into their solutions.

How do you differentiate from various AI start-ups and incumbent solutions in this space?

AlphaICs’ differentiation comes from its proprietary architecture. Gluon provides better throughput at lower power than incumbent products as well as other startups’ solutions. We have also developed a software tool-chain that makes it very convenient for users to deploy their trained networks on Gluon.

AlphaICs solutions will enable edge AI compute both for inference and for incremental edge learning. Edge learning is the ability of devices to learn from new data and scenarios on which they were not trained, providing additional intelligence to the edge devices. In this mode, devices start with a model trained on partial data, and then they learn new scenarios as they encounter new data. We have showcased this on our architecture, and it is a unique feature that gives our solution an advantage over the other solutions out there. Edge learning is planned for our next-generation product.

Can you elaborate on your edge learning technology?

Today, edge devices run inference on trained deep neural networks to accomplish tasks such as object recognition, image classification, and image segmentation, to name a few. When edge devices encounter new, unseen data, the accuracy of such systems can drop substantially. This is a major problem today for real-world solutions, as the nature of the data keeps changing in these applications. With this in mind, at AlphaICs we designed our proprietary Real Artificial Intelligence Processor (RAP™) to enable learning when new data is available to the edge devices, without affecting the already learned intelligence. We showcased a proof of concept for edge learning based on a research grant from a US government R&D institution. Our results are very promising, and we will continue to develop this technology further.
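AlphaICs does not describe how RAP™ implements edge learning, so the following is only a toy illustration of the general idea of learning new classes on-device without disturbing what was already learned. It uses a nearest-class-mean classifier, a well-known incremental-learning baseline that we chose for illustration; the class name, labels, and feature vectors are all hypothetical. Because each class keeps its own running mean, learning a new class leaves earlier classes untouched.

```python
class NearestClassMean:
    """Toy incremental classifier: one running mean per class label.

    Learning a sample only updates the mean of its own class, so classes
    learned earlier keep exactly the same decision data.
    """

    def __init__(self):
        self.sums = {}    # label -> running sum of feature vectors
        self.counts = {}  # label -> number of samples seen

    def learn(self, x, label):
        total = self.sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            total[i] += v
        self.counts[label] = self.counts.get(label, 0) + 1

    def mean(self, label):
        n = self.counts[label]
        return [v / n for v in self.sums[label]]

    def predict(self, x):
        # Pick the class whose mean is closest in squared Euclidean distance.
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.mean(label)))
        return min(self.sums, key=dist)


model = NearestClassMean()
model.learn([0.0, 0.0], "car")
model.learn([0.0, 1.0], "car")
model.learn([5.0, 5.0], "person")
before = model.predict([0.0, 0.4])   # "car"
model.learn([10.0, 0.0], "bicycle")  # a new, previously unseen class
after = model.predict([0.0, 0.4])    # still "car"
```

Real edge-learning schemes work on learned feature embeddings rather than raw inputs, but the property shown here, that adding knowledge does not overwrite old knowledge, is the one the interview emphasizes.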

What is AlphaICs’ future roadmap and direction?

AlphaICs’ core technology, RAP™, supports edge inference and edge learning. We are working on our next product, which will integrate inference and edge learning. Our current solution is 8 TOPS, and we will scale up to 64 TOPS as well as integrate pre- and post-processing capabilities. We are very bullish on the huge opportunities at the edge, and we have the right technologies to enable edge AI for our customers.

https://alphaics.ai/



PCIe Gen5 Interface Demo Running on a Speedster7t FPGA

by Kalar Rajendiran on 11-24-2021 at 10:00 am


The major market drivers of today all have one thing in common, and that is the efficient management of data. Whether it is 5G, hyperscale computing, artificial intelligence, autonomous vehicles or IoT, there is data creation, processing, transmission and storage. All of these aspects of data management need to happen very fast. Fast storage and high-speed networking are ever more critical for today’s applications. Data centers and hyperscale data centers cannot afford to tolerate data traffic jams anywhere in the data path. They need to process incoming external data very efficiently and get the data to its final destination rapidly. But, with Ethernet speeds evolving much faster than PCIe generational speed jumps, the gap is growing.

As network interfaces upgrade from 100GbE to 400GbE, a full-duplex 400GbE link requires 800Gbps of bandwidth, which translates to 100GB/s. A PCIe Gen4 x16 link cannot handle that bandwidth, but a PCIe Gen5 x16 link can. And, as offloading tasks traditionally handled by the host becomes more common, NVMe storage is being used like network-attached storage, with access managed by a SmartNIC. A faster NVMe storage solution can be implemented with PCIe Gen5. In other words, PCIe Gen5 will become very important for data centers, where fast storage and high-speed networking are critical for communications.
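The bandwidth comparison above can be checked with a few lines of arithmetic. This sketch compares one direction of each link (a full-duplex 400GbE link needs 50 GB/s each way, 100 GB/s in total). The per-lane rates and line codes are the published PCIe values; the function names are our own, and the calculation deliberately ignores packet and protocol overhead, so real payload throughput is somewhat lower than these figures.

```python
# (per-lane transfer rate in GT/s, line-code efficiency) for each PCIe generation
PCIE_GENS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_gbytes_per_s(gen, lanes):
    """One-direction PCIe bandwidth in GB/s after line coding.

    TLP headers, DLLPs and flow-control overhead are ignored.
    """
    rate_gt, efficiency = PCIE_GENS[gen]
    return rate_gt * efficiency * lanes / 8  # 8 bits per byte


def ethernet_gbytes_per_s(gbps):
    """One direction of an Ethernet link in GB/s."""
    return gbps / 8


need = ethernet_gbytes_per_s(400)  # 50 GB/s each way, 100 GB/s full duplex
for gen in (4, 5):
    have = pcie_gbytes_per_s(gen, 16)
    verdict = "enough" if have >= need else "not enough"
    print(f"Gen{gen} x16: {have:.1f} GB/s per direction vs {need:.0f} GB/s -> {verdict}")
```

A Gen4 x16 link tops out around 31.5 GB/s per direction, well short of 50 GB/s, while Gen5 x16 doubles that to roughly 63 GB/s.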

SmartNICs are expected to handle more functionality and offer the flexibility to handle changing data management requirements. An earlier blog discussed how a reconfigurable SmartNIC can benefit from a Speedster7t FPGA-based implementation. The focus of that post was the 2D NoC feature of the Speedster7t FPGA. The blog was based on an Achronix webinar titled “Five Reasons Why a High Performance Reconfigurable SmartNIC Demands a 2D NoC.” You can watch that on-demand webinar by registering here.

This blog focuses on the Speedster7t FPGA’s PCIe Gen5 capability. The Speedster7t family is among the first FPGA families available now that natively support the PCIe Gen5 specification. It is in this context that a recent video publication by Achronix is of interest. The video shows a demonstration of a successful PCIe Gen5 link between a Teledyne LeCroy PCIe exerciser and a Speedster7t FPGA. Teledyne LeCroy offers an integrated and automated compliance testing system, approved by the PCI-SIG® as a standard tool for compliance testing of PCIe specifications. The PCI Express exerciser can generate PCI Express transactions, observe behavior, and perform both stress testing and compliance testing.

Steve Mensor, vice president of sales and marketing at Achronix, introduces the Speedster7t FPGA with a high-level overview of its features. He then hands off to Katie Purcell, application engineering manager at Achronix, to present the PCIe Gen5 interface demo on the Speedster7t FPGA. The demo setup includes a Speedster7t FPGA board, the PCIe exerciser and a connected computer to set up the exerciser.

First, Katie launches the exerciser’s control program graphical user interface (GUI) on the connected computer. The goal of the demo is to show the FPGA successfully linking (reaching the PCIe L0 state) at Gen1 through Gen5 speeds. The demo shows that the L0 state can be achieved between the FPGA and the Gen5-capable LeCroy A58 PCIe exerciser. Although the FPGA supports up to PCIe Gen5 x16, the demo runs in x8 mode, as that is the maximum mode supported by the exerciser. All eight lanes, downstream and upstream, show the status of having reached the L0 state at the 32GT/s PCIe Gen5 data rate. The exerciser is then cycled through to show that links can be achieved at all five PCIe generation speeds.

If you are involved in, or will be upgrading to, a PCIe Gen5 system, you may want to watch the demo. It is just four minutes long and could be useful for your project. You can find out more details about the Speedster7t FPGA family here.


WEBINAR: Using Design Porting as a Method to Access Foundry Capacity

by Tom Simon on 11-24-2021 at 8:00 am


There have always been good reasons to port designs to new foundries or processes. These reasons have included reusing IP in new projects, moving an entire design to a smaller node to improve PPA, or second sourcing manufacturing. While there can be many potential business motivations for any of the above, in today’s environment with semiconductor supply shortages, design porting has taken on a new and compelling importance. With almost every fabless semiconductor company facing reductions in fab allocation, design teams are pressed to move existing designs to alternative fabs.

Webinar: Efficient and User-Friendly Analog IP Migration

Second-sourcing SoCs calls for porting both the digital and analog portions of the designs. In many SoCs it is enough to find equivalent analog IP for such things as PLLs and I/Os, but mixed-signal designs that feature custom IP blocks need more attention. Porting digital designs is never truly easy, but thanks to RTL, libraries, synthesis and P&R, the task is tractable. Analog is another thing altogether. Fortunately, MunEDA has a comprehensive solution for each stage of the analog design porting process. They offer their Schematic Porting Tool (WiCkeD SPT) and a suite of analog tools for tuning device parameters and design optimization.

InPLAY Inc. is a rapidly growing company focused on RF designs for low latency wireless (SMULL), Bluetooth, and Industrial IoT. Their products offer unique features and extremely high performance in terms of range, throughput and battery life. With demand growing rapidly, especially for their new active BLE beacon product, NanoBeacon, they have sought to diversify their manufacturing. I spoke recently with InPLAY’s co-founder and Director of RF/AMS Design, Russell Mohn, about how they are managing the process.

Design Porting the NanoBeacon

Russell told me that once they realized they would need to move production to additional foundries, they chose MunEDA’s SPT, partly because they were already using MunEDA’s WiCkeD analysis and verification tools to optimize their analog designs. WiCkeD offers Circuit & Sensitivity Analysis, PVT & Corner Analysis, Monte Carlo Statistical Analysis, High Sigma & Worst Case Analysis, and a Robustness Verification Flow. Russell has been quite happy with the design results he has achieved with WiCkeD, so it was an easy choice to look at SPT to solve their new challenges.

SPT handles all the details of switching to the devices in the new process PDK. SPT helps the user set up the device, pin and parameter mapping information. Of course, some manual intervention is required, but the SPT user interface makes the task intuitive and straightforward. SPT will even help manage the changes in the drawn schematic symbols so the schematic remains legible.
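To make "device, pin and parameter mapping" concrete, here is a minimal sketch of what such a mapping does to one schematic instance. It is not SPT's data format or API, which the article does not describe; the `DeviceMapping` class, the `port_instance` function, and the device names `nch_lvt` and `nfet_lv` are all hypothetical. An unmapped device or parameter raises an error, which corresponds to the cases where manual intervention is needed.

```python
from dataclasses import dataclass


@dataclass
class DeviceMapping:
    """How one source-PDK device maps onto a target-PDK device (hypothetical format)."""
    target_cell: str
    pin_map: dict    # source pin name -> target pin name
    param_map: dict  # source parameter -> (target parameter, scale factor)


def port_instance(inst, mappings):
    """Rewrite one schematic instance from the source PDK to the target PDK."""
    m = mappings[inst["cell"]]  # KeyError flags a device with no mapping yet
    pins = {m.pin_map[p]: net for p, net in inst["pins"].items()}
    params = {}
    for name, value in inst["params"].items():
        if name not in m.param_map:
            raise ValueError(f"no mapping for parameter {name!r} of {inst['cell']}")
        new_name, scale = m.param_map[name]
        params[new_name] = value * scale
    return {"cell": m.target_cell, "pins": pins, "params": params}


# Illustrative names only: "nch_lvt" and "nfet_lv" are not real PDK devices.
mappings = {
    "nch_lvt": DeviceMapping(
        target_cell="nfet_lv",
        pin_map={"D": "d", "G": "g", "S": "s", "B": "b"},
        param_map={"w": ("W", 1e6), "l": ("L", 1e6)},  # metres -> microns
    )
}
src = {"cell": "nch_lvt",
       "pins": {"D": "out", "G": "in", "S": "gnd", "B": "gnd"},
       "params": {"w": 1e-6, "l": 60e-9}}
ported = port_instance(src, mappings)
```

Nets stay attached to the same logical pins, so connectivity is preserved even when pin names, parameter names and units differ between PDKs.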

Symbol Mapping

In analog designs there is, of course, a lot more to moving to a new PDK than just mapping devices. Every aspect of the circuit behavior is prone to change. MunEDA’s DNO sizing and optimization tools, however, can automate most of the work using designer-provided performance targets.
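As a toy example of why ported circuits need re-sizing, suppose the target process has a lower transconductance parameter than the source process. The sketch below uses the textbook long-channel square-law model and plain bisection to find the transistor width that restores the original drain current. It is not MunEDA's DNO algorithm, which works with full SPICE models and many simultaneous specifications; the model, function names, and numbers are illustrative assumptions.

```python
def drain_current(w, l, kp, vov):
    """Long-channel square-law saturation current: I_D = 0.5 * kp * (W/L) * Vov**2."""
    return 0.5 * kp * (w / l) * vov ** 2


def size_width(target_id, l, kp, vov, w_lo=1e-8, w_hi=1e-3, iters=100):
    """Bisect on W until the modelled current meets target_id.

    Assumes I_D rises monotonically with W and the answer lies in [w_lo, w_hi].
    """
    for _ in range(iters):
        w_mid = 0.5 * (w_lo + w_hi)
        if drain_current(w_mid, l, kp, vov) < target_id:
            w_lo = w_mid
        else:
            w_hi = w_mid
    return 0.5 * (w_lo + w_hi)


# Hypothetical numbers: the target process has a lower kp than the source process.
L, VOV = 0.1e-6, 0.2
target = drain_current(2e-6, L, kp=300e-6, vov=VOV)  # current achieved in the source process
w_new = size_width(target, L, kp=250e-6, vov=VOV)    # width restoring it in the target process
```

Here the answer is simply 2 µm × 300/250 = 2.4 µm, which makes the sketch easy to check; a real optimizer earns its keep when many coupled performance targets have no closed-form solution.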

While I am sure that folks like Russell would rather be working 100% on developing new products, it comes as a huge relief for him to have an effective option for keeping up with the growing demand for their products at a time when the extra effort is required. It might be that SPT is a product whose time has come.

If you are interested in learning more about SPT and how it can smooth the move to new PDKs please register for this webinar.



Traceability and ISO 26262

by Bernard Murphy on 11-24-2021 at 6:00 am


Since traceability and its relationship to ISO 26262 may be an unfamiliar topic for many of my readers, I thought it might be useful to spend some time on why this area is important. What is the motivation behind a need for traceability in support of automotive systems development? The classic verification and validation V-diagram is a useful starting point for understanding. The left arm of the V decomposes system design from concepts into requirements, architecture, and detailed design. The right arm represents verification and validation steps from unit testing all the way up to full system validation.

System development, verification and validation

Interdependency in system design

First, let’s talk about systems. A system is generally more than just a chip + software running on that chip. An example system is a car, in which SoCs (chips + software) play multiple roles. There are many mechanical components in a car, such as the engine, body structure, braking, airbags, seats and windows, all of which are enabled in various ways by electronic components: sensors to detect possible collisions, actuators to control brakes and steering, seat and window positions, communication and infotainment. These must work together as flawlessly as possible. Accomplishing this goal is managed through mountains of specifications, requirements lists and use-case definitions to ensure everyone is building and testing against the same expectations.

Car design depends heavily on reuse for the same reasons we face in SoC design: cost, schedule, quality, reliability. Plus, of course, safety. All of this puts heavy constraints on interfaces between levels in the system. A new SoC must comply with multiple existing requirements in addition to meeting new requirements. The larger system and its software are very expensive to change and re-certify. New components like SoCs must fit the system requirements.

Mind the gaps

Now that we’ve established that everything starts with specifications, requirements and use-cases, all non-negotiable expectations on a supplier, how does an SoC company map those into what it needs to build? Going down the left arm of the V, requirements are managed in tools like IBM DOORS or Jama, specifications might be in PDF, and use-cases perhaps in SysML. This information is very high level, not directly executable by an RTL design team.

An architect will use her expertise to map these requirements manually into a more detailed specification, leveraging available IP and the company’s differentiated skills. She will also optimize the architecture to meet performance, power and cost goals. The architect will use a different set of tools at this level, together with virtual modeling, to start early software development. That intent is then translated, usually manually, into the more familiar RTL design and modeling phase, where the full implementation is developed.

In the right arm of the V, verification and validation start with unit testing. These tests are built independently from development to maximize integrity of the testing. Subsystem and system testing follow, also independently developed for the same reason. Finally, full system validation runs against system software in a lab emulation of the full electronic system, perhaps even with some mechanical modeling.

There are gaps between all these stages, some well-intended, but gaps nonetheless. Humans must bridge these gaps; however, we are imperfect. We miss some things, we misinterpret others, and we don’t stay current with spec changes. You might hope for a universal modeling language to design out human fallibility, but that dream seems unattainable. Instead, we bridge the gaps with traceability – links connecting a higher-level requirement to lower-level implementation and tests of that requirement.

How traceability bridges the gaps

Without automation, the way you check correspondence between levels is through painstaking line-by-line checks between requirements and implementation, tying up experts for days. That is not so bad the first time, but as the design evolves, if the customer changes the specification or if multiple customers have conflicting requirements, periodically repeating that level of detailed checking becomes very hard.

Bridging the gaps with Arteris® Harmony Trace™ traceability

A better solution would automate links between requirements and implementation, say a bus width or a register offset. Setup will require some initial effort, but then the integrity of that check persists through the design lifecycle and beyond. In a design review, you don’t have to slog through the documents every time; the tool will check automatically. If some parameter slipped out of compliance, you’d instantly know. You know you are still in compliance if the tool hasn’t raised any flags.
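A minimal sketch of the kind of automated link described above: each requirement is tied to a named design parameter and a required value, and a check re-runs against whatever the current implementation reports. This is not Harmony Trace's data model or API, which the article does not detail; the requirement IDs, parameter names, and values are made up for illustration.

```python
# Hypothetical requirement records: requirement ID -> (design parameter, required value).
requirements = {
    "REQ-017": ("apb_data_width", 32),
    "REQ-042": ("uart0_base_offset", 0x4000),
    "REQ-108": ("irq_line_count", 64),
}


def check_traceability(requirements, implementation):
    """Return (req_id, message) pairs for every requirement the implementation no longer meets."""
    issues = []
    for req_id, (param, required) in sorted(requirements.items()):
        if param not in implementation:
            issues.append((req_id, f"{param} not found in implementation"))
        elif implementation[param] != required:
            issues.append((req_id, f"{param} = {implementation[param]!r}, required {required!r}"))
    return issues


# Parameters as they might be extracted from the current design database (made-up values).
implementation = {"apb_data_width": 32, "uart0_base_offset": 0x5000}
for req_id, msg in check_traceability(requirements, implementation):
    print(req_id, msg)
```

An empty result is the "no flags raised" state; every non-empty result names the requirement it violates, which is also the audit trail a reviewer or auditor would want.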

Traceability also gives you instant evidence you can show to somebody who’s going to check your work. Maybe your own internal safety team, maybe a customer demanding proof of compliance, maybe a safety process auditor. There’s another benefit. When something goes wrong (because this is engineering; something always goes wrong), you have an audit trail through traceability to help figure out what you should change in your process.

Traceability throughout the lifecycle

Traceability isn’t only important in the design phase. After the chip goes into production, when the auto OEM is running extensive tests, they may find a problem. In diagnosis, they want to trace back through software to hardware components. Can this problem be attributed to a deficiency in the requirements?  Being able to trace quickly to a root cause can have a huge impact on corrective action and ultimately model release. Being able to provide quick turn-around and definitive evidence of compliance, or an unforeseen problem, can only enhance the reputation of a provider in the supply chain.

Arteris® Harmony Trace™

Is there such a solution available for SoC product design teams? Arteris IP has now released its Harmony Trace product to automate and report on these links. Harmony Trace connects IP-XACT-based SoC assembly and the hardware/software interface to popular requirements management tools and to popular documentation formats. There is now an automated path to ensure compliance with those higher-level requirements and to be able to quickly demonstrate that compliance to customers and ISO 26262 auditors. To learn more, click HERE.



Bonds, Wire-bonds: No Time to Mesh Mesh It All with Phi Plus

by Matt Commens on 11-23-2021 at 10:00 am


Automatic adaptive meshing in HFSS is a critical component of its advanced simulation process. Guided by Maxwell’s Equations, it efficiently refines the mesh to accurately capture both the geometric and electromagnetic detail of a design. The end result is a process that guarantees accurate and reliable simulation results with no user input required.

Before the adaptive meshing process can begin, there must be an initial mesh to faithfully represent the device’s geometry. With today’s highly dense and complex designs, creating this mesh can be a challenging task. A variety of initial meshing approaches are available in HFSS, each with a different scheme for mesh generation and a different set of strengths, apropos for different design types. For example, the TAU meshing technology is well suited for meshing complex 3D CAD while the Phi meshing technology is highly effective for meshing PCBs. Attempting to apply a one-mesh-fits-all paradigm is a significant challenge when approaching complex designs containing a mixture of CAD, like a PCB in a shielding enclosure.

Fortunately, today HFSS uses the new breakthrough HFSS Mesh Fusion technology to apply meshing approaches according to local CAD specifications. From there, HFSS proceeds with the same reliable, adaptive refinement process with guaranteed accuracy.

Figure 1. HFSS Mesh Fusion applied to PCB-Connector-Flex System. Mesh-left, Fields-right

Phi is one of the meshing techniques that Mesh Fusion can apply to local CAD for components like PCBs and IC packaging. Phi is “geometry-aware.” It works with designs composed of 2D layers uniformly swept along their normal direction. When applied to the right CAD, such as a PCB, Phi is 10, 15, or even 20 times faster than its partner 3D meshing technologies, Classic and TAU.
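One way to see why layered geometry is cheaper to mesh is that a 2D triangulation can simply be swept through the layer stack, turning each triangle into a prism between two z-levels, with no general 3D tetrahedral meshing required. The sketch below shows only that idea, under the simplifying assumption that every layer shares one 2D triangulation; it is not Ansys's Phi algorithm, and the function name and layer heights are hypothetical.

```python
def sweep_layers(points_2d, triangles, z_levels):
    """Sweep one 2D triangle mesh through a stack of z-levels.

    Every triangle between two consecutive levels becomes a 6-node prism
    (bottom triangle, then top triangle), so no 3D meshing is needed.
    """
    n = len(points_2d)
    vertices = [(x, y, z) for z in z_levels for (x, y) in points_2d]
    prisms = []
    for k in range(len(z_levels) - 1):
        bottom, top = k * n, (k + 1) * n  # vertex index offsets for the two levels
        for a, b, c in triangles:
            prisms.append((bottom + a, bottom + b, bottom + c, top + a, top + b, top + c))
    return vertices, prisms


# A unit square split into two triangles, swept through a hypothetical 3-level stack (mm).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tris = [(0, 1, 2), (0, 2, 3)]
verts, prisms = sweep_layers(square, tris, z_levels=[0.0, 0.035, 0.2])
```

The cost is one 2D meshing step plus a linear pass per layer, which is the intuition behind the speedup on PCB-like CAD; a wirebond, by contrast, cannot be described as a swept 2D layer at all.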

After two decades of meshing innovation, there were still a few “rogue” design types that were notoriously difficult to mesh—most notably a common, inexpensive design solution in consumer electronics: wirebond packaging. With tiny, high-aspect-ratio copper wires connecting to an integrated circuit inside the package, its design is both layered and 3D in nature. These two CAD features, when combined in a single design, were challenging for the existing meshing technologies to manage…until now.

Introducing Phi Plus
Ansys created Phi Plus using a ground-up approach for new meshing technology, specifically designed for challenging mixed geometries like wirebond packaging. Like its predecessor, Phi, it’s “geometry-aware.” It was designed specifically to understand how wirebond packages, and other complicated 3D components like PCB connectors, are manufactured and assembled. Phi Plus takes advantage of this design knowledge and accounts for those nuances in the meshing model.

Figure 2. Package on PCB system meshed with Phi Plus and HFSS Mesh Fusion. Z-stretched view

Key features include:

Parallel Meshing Technology
Ansys prioritized parallel meshing to speed up Phi Plus processing time. From the ground up, Phi Plus was designed to take advantage of parallel computational strategies; in beta, upwards of 10 times faster mesh times were observed when meshing with 12 cores.
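To put "upwards of 10 times faster with 12 cores" in perspective, the short calculation below computes parallel efficiency and, if one models the mesher with Amdahl's law (our assumption; real speedups also depend on memory bandwidth and load balance), the implied parallel fraction of the work.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def amdahl_parallel_fraction(speedup, cores):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / N) for the parallel fraction p."""
    return (1 - 1 / speedup) / (1 - 1 / cores)


eff = parallel_efficiency(10, 12)     # about 0.83
p = amdahl_parallel_fraction(10, 12)  # about 0.982, i.e. under 2% serial work
```

Under that model, a 10x speedup on 12 cores means well over 98% of the meshing work runs in parallel, which is consistent with a design built for parallelism from the ground up.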

Robustness and Reliability
Phi Plus combines the reliability of Phi with component-specific considerations to control for a uniform, high-quality, and water-tight mesh. A higher quality mesh generates downstream benefits in the adaptive meshing and frequency sweep solving steps. Benefits include smaller final mesh count, less solution memory, and the ability to solve more frequency points in parallel for a faster total speed with high performance computing (HPC) resources.

Figure 3. Close-up view of wirebond at top layer of package

More Than “Just” Fast
With no change to user flow, the entire simulation process can be upwards of 10 times faster with little to no mesh failures. For now, implementation requires a single settings change in simulation set up with no other action needed from the user. Phi Plus has proven to be so robust in its beta rollout that it’s anticipated to become the default meshing approach in a near-future release.

It doesn’t stop with “just” wirebond packages either. Phi Plus addresses system complexity that wasn’t possible before. Its robust meshing capabilities can manage other common 3D effects in complex electronics design, including trace etching or the inclusion of 3D Encrypted Components.

The Bottom Line
However beautifully rendered a 3D design may look on the computer screen, it’s the mesh that dictates what’s simulated. Mesh is foundational to an accurate physics model. HFSS has a long legacy of letting Maxwell’s Equations guide the creation of accurate, efficient mesh, and that legacy grows stronger with the advent of Phi Plus. With Ansys HFSS 2022 R1, Phi Plus meshing will be fully incorporated into our system-capable Mesh Fusion technology. Combined with hyper-scale technologies like Ansys Cloud, there are no limits to what teams can tackle with Ansys HFSS.



Learning-Based Power Modeling. Innovation in Verification

by Bernard Murphy on 11-23-2021 at 6:00 am


Is it possible to automatically generate abstract power models for complex IP which can both run fast and preserve high estimation accuracy? Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Learning-Based, Fine-Grain Power Modeling of System-Level Hardware IPs. We found this paper in the 2018 ACM Transactions on Design Automation of Electronic Systems. The authors are from UT Austin.

I find it easiest here to start from the authors’ experiments, and drill down along the way into their methods. They start from Vivado high-level synthesis (HLS) models of complex IP such as generalized matrix multiply and JPEG quantization. These they synthesize against a generic library and run gate-level simulations (GLS) and power estimation. Next they capture activity traces at nodes in the HLS model mapped at three levels of abstraction of the implementation: cycle-accurate traces for datapath resources, block-level IO activity within the IP, and IO activity on the boundary of the IP. Through ML methods they optimize representations to model power based on the three activity abstractions.

There is some sophistication in the mapping and learning phases. In resource-based abstractions, they adjust for out-of-order optimizations and pipelining introduced in synthesis. The method decomposes resource-based models to reduce problem dimensionality. At block and IP levels, the method pays attention to past states on IOs as a proxy for hidden internal state. The authors devote much of the paper to detailing these methods and their justification.

Paul’s view

This is a very important topic, widely agreed to be one of the big still-unsolved problems in commercial EDA. Power estimation must span from low-level physical detail for accurate gate and wire estimation, up to a full software stack and apps running for billions of cycles for accurate activity traces. Abstracting mid-level behavior with good power accuracy and reasonable performance to run against those software loads is a holy grail for EDA.

The scope of this paper spans designs for which the RTL can be generated through high-level synthesis (HLS) from a C++ software model. I see two key contributions in the paper:

  • Reverse mapping activity from gate level simulations (GLS) to the C++ model and using this mapping to create a software level power model that enables power estimation from pure C++ simulations of the design.
  • Using an ML-based system to train these software level power models based on whatever low level GLS sims are available for the synthesized design.

A nice innovation in the first contribution is a “trace buffer” to solve the mapping problem when the HLS tool shares hardware resources across different lines in the C++ code or generates RTL that permissibly executes C++ code out of order.

The power models themselves are weighted sums of signal changes (activities), with weights trained using standard ML methods. But the authors play some clever tricks, having multiple weights per variable, with each weight denoting a different internal state of the program. Also, their power models conceptually convolve across clock cycles, with the weighted sum including weights for signal changes up to n clock cycles before and after the current clock cycle.
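The model structure Paul describes can be sketched in a few lines: per-cycle power is a weighted sum of signal activities, the weight vector is selected by the current internal state, and the activity features span a window of n cycles before and after the current one. This is our illustration of that structure, not the authors' code; the function names, the zero-padding at the trace edges, and the example weights are assumptions.

```python
def windowed_features(activity, t, n):
    """Concatenate activity vectors for cycles t-n .. t+n (zero-padded at trace edges)."""
    width = len(activity[0])
    feats = []
    for dt in range(-n, n + 1):
        k = t + dt
        feats.extend(activity[k] if 0 <= k < len(activity) else [0.0] * width)
    return feats


def predict_power(activity, states, weights, n):
    """Per-cycle power: dot product of the state-specific weights with the windowed features."""
    power = []
    for t, state in enumerate(states):
        feats = windowed_features(activity, t, n)
        power.append(sum(w * x for w, x in zip(weights[state], feats)))
    return power


activity = [[1, 0], [0, 1], [1, 1]]         # toggles of two signals over three cycles
states = ["idle", "busy", "busy"]           # internal state in each cycle
weights = {"idle": [1, 1, 2, 3, 0.5, 0.5],  # 2 signals x 3 cycles (t-1, t, t+1)
           "busy": [1, 1, 1, 1, 1, 1]}
power = predict_power(activity, states, weights, n=1)
```

Keeping the model linear in the weights is what makes training cheap: for each state, fitting the weights is an ordinary regression problem.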

The results presented are solid, though for very small circuits (<25k gates). Also noteworthy is that HLS-based design is still a small fraction of the market. Most designs have a behavioral C++ model and an independent hand-coded RTL model without any well-defined mapping between the two. That said, this is a great paper. Very thought-provoking.

Raúl’s view

The article presents a novel machine learning-based approach for fast, data-dependent, fine grain, accurate power estimates of IPs. It can capture activity traces at three abstraction levels:

  • individual resources (“white box” cycle level),
  • block (“gray box” basic blocks), and
  • I/O transitions (“black box” invocations).

The technique develops a power model for each of these levels; for example, for power in cycle n and state s:

$$p_n = \boldsymbol{\theta}_{s_n} \cdot \mathbf{a}_{s_n}$$

with an activity vector a over the relevant signals and a coefficient vector θ. The approach develops similar models, with fewer coefficients and relevant signals, for blocks and I/O. Training uses the same input vectors as an (accurate) gate-level simulation and adjusts the coefficients using various linear and non-linear regression models.
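Because the per-cycle model is linear in θ, the simplest way to fit it for each state is ordinary least squares against gate-level reference power. The sketch below does exactly that, with a tiny ridge term for numerical safety; it stands in for the "various linear and non-linear regression models" the paper evaluates, which we do not reproduce. It is written in pure Python to stay dependency-free (in practice `numpy.linalg.lstsq` would do the solve), and the synthetic training data is invented so the recovered coefficients can be checked.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_state_coefficients(samples, ridge=1e-9):
    """Fit one coefficient vector theta_s per state by lightly regularised least squares.

    samples: iterable of (state, activity_vector, reference_power) from gate-level runs.
    Solves (A^T A + ridge * I) theta = A^T p separately for each state.
    """
    by_state = {}
    for state, act, power in samples:
        by_state.setdefault(state, []).append((act, power))
    theta = {}
    for state, rows in by_state.items():
        d = len(rows[0][0])
        ata = [[sum(a[i] * a[j] for a, _ in rows) + (ridge if i == j else 0.0)
                for j in range(d)] for i in range(d)]
        atp = [sum(a[i] * p for a, p in rows) for i in range(d)]
        theta[state] = solve_linear(ata, atp)
    return theta


# Synthetic "gate-level" data generated from known coefficients (illustrative only).
true_theta = {"idle": [0.2, 0.05], "compute": [1.5, 0.8]}
acts = [[1, 0], [0, 1], [1, 1], [3, 2]]
samples = [(s, a, sum(t * x for t, x in zip(true_theta[s], a)))
           for s in true_theta for a in acts]
fitted = fit_state_coefficients(samples)
```

Fitting each state separately mirrors the per-state coefficient vectors θ_s in the model; Raúl's question about how many traces are needed shows up here as the requirement that each state see enough linearly independent activity vectors.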

Experiments used Vivado HLS (Xilinx), Synopsys DC, and PrimeTime PX for power estimation. They tested 5 IP blocks ranging from 700 to 22,000 gates. The authors were able to generate models within 34 minutes for all cases, with most of this time spent generating the actual gate-level power traces. The accuracy at the cycle, block, and I/O levels is within 10%, 9%, and 3%, respectively, of the commercial gate-level estimation tool (presumably PrimeTime), while running 2,000x to 15,000x faster. They also write briefly about demonstrating the benefits of these models for virtual platform prototyping and system-level design space exploration, for which they integrated the models into the GEM5 system simulator.

This is an interesting and possibly very useful approach to power modeling. The results speak for themselves, but doubts remain about how general it is. How many signal traces are needed for training, and what is the maximum error for a particular circuit; can it be bounded? This is akin to the “just 95% accuracy” of image recognition. There is perhaps the potential to develop and ship very fast, accurate power estimation models for IP.
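The gap between average accuracy and worst-case error is easy to see with a few numbers. The sketch below uses an invented four-cycle trace, not data from the paper: per-cycle errors reach 10%, yet they cancel exactly in the total energy.

```python
from typing import Sequence, Tuple


def error_stats(predicted: Sequence[float],
                reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean per-cycle relative error, worst per-cycle relative error,
    relative error of total energy) against a reference trace of positive values."""
    rel = [abs(p - r) / r for p, r in zip(predicted, reference)]
    total = abs(sum(predicted) - sum(reference)) / sum(reference)
    return sum(rel) / len(rel), max(rel), total


reference = [10.0, 12.0, 8.0, 20.0]  # invented gate-level per-cycle power
predicted = [11.0, 11.0, 8.0, 20.0]  # invented model output
mean_err, max_err, total_err = error_stats(predicted, reference)
print(f"mean {mean_err:.1%}, worst {max_err:.1%}, total {total_err:.1%}")
# → mean 4.6%, worst 10.0%, total 0.0%
```

Which of these numbers a paper reports matters when asking whether the error can be bounded.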

My view

I spent quite a bit of time over the years on power estimation so I’m familiar with the challenges. Capturing models which preserve physical accuracy across a potentially unbounded state space of use cases is not a trivial problem. Every reasonable attempt to chip away at the problem is worthy of encouragement 😃

Also Read

Battery Sipping HiFi DSP Offers Always-On Sensor Fusion

Memory Consistency Checks at RTL. Innovation in Verification

Cadence Reveals Front-to-Back Safety


Machine Learning Applied to IP Validation, Running on AWS Graviton2

by Daniel Payne on 11-22-2021 at 10:00 am

I recall meeting with Solido at DAC back in 2009 and learning about their Variation Designer tool, which allowed circuit designers to quickly find out how their designs performed under process variation, in effect finding the true corners of the process. Under the hood, the Solido tool used Machine Learning (ML) techniques, so instead of running millions of brute-force SPICE simulations for Monte Carlo analysis, it could produce Monte Carlo results from a much smaller subset of SPICE simulations. Siemens acquired Mentor Graphics in March 2017, and Mentor went on to acquire Solido in December 2017.

I spoke with Sathishkumar Balasubramanian of Siemens EDA, where he is Head of Products for AMS verification, to get an update. The big news is that EDA tools like Variation Designer, which have run on the most popular engineering platforms powered by Intel chips, now also run on AWS Graviton2, which uses 64-bit Arm Neoverse cores. Not only are EDA tools moving to the cloud, but once you are in the cloud you also get to choose x86 or AWS Graviton2 chips to run demanding workloads.

Engineers at Siemens EDA ported Solido Variation Designer to AWS Graviton2 and optimized it for performance. Users of Variation Designer on both x86 and AWS Graviton2 platforms have the same user experience.

Arm has been a long-time user of Solido tools, even prior to using AWS. Compared with the brute-force approach, Arm is seeing a 1,000X speedup, along with better accuracy and coverage, using Variation Designer on its IP validation runs, where standard cell IP must be verified to Six Sigma. The cost-benefit analysis for Arm tilted in favor of Graviton2 over x86 on AWS: with Graviton2 they were able to get more processors for the same cost as on x86.

Arm processor use is growing in HPC and data centers, giving EDA users some choices. The Variation Designer tool launches any of the major SPICE circuit simulators available today, not just AFS from Siemens EDA. Arm could run its IP validation jobs on premises, but by using AWS Graviton2-based Amazon EC2 instances in the cloud it got a better return on investment, lowering costs by 24%, reducing CPU time by 12%, and improving turnaround time by 6%. Running jobs in the cloud offers both scalability and capacity, something hard to achieve on premises. It’s kind of cool that Arm is designing IP for its next-generation systems on Arm cores in the cloud.

With Variation Designer you can expect four releases per year, along with monthly patch updates. Safety-critical designs like automotive chips require 6 sigma validation of their IP, which corresponds to roughly 1 failure in a billion units, so using a smart ML approach in the cloud gets you there quickly and at lower cost. IC designs at advanced process nodes require high-sigma variation analysis too, because of much higher transistor counts and more silicon functionality per chip.
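The arithmetic behind "1 failure in a billion" also shows why brute-force Monte Carlo is impractical at this level. The sketch below assumes a one-sided tail of a standard normal distribution, the convention that matches the one-in-a-billion figure; it says nothing about how Solido's ML method itself works.

```python
import math


def one_sided_tail(sigma: float) -> float:
    """P(Z > sigma) for a standard normal variable Z."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


def brute_force_samples(p_fail: float, expected_failures: float = 1.0) -> float:
    """Plain Monte Carlo samples needed just to *expect* a given number of failures."""
    return expected_failures / p_fail


for s in (3, 4, 5, 6):
    p = one_sided_tail(s)
    print(f"{s} sigma: p_fail = {p:.3e}, ~{brute_force_samples(p):.2e} runs per expected failure")
```

At 6 sigma the failure probability is about 9.9e-10, so plain Monte Carlo needs on the order of a billion SPICE runs just to expect a single failure, and several times more to estimate the rate with any confidence.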

Summary

If you are using Solido Variation Designer already, and are attracted to the scalability and capacity of cloud computing, then give some thought to using AWS Graviton2 processors.

Related Blogs


Numerical Sizing and Tuning Shortens Analog Design Cycles

by Tom Simon on 11-22-2021 at 6:00 am

By any measure, analog circuit design is a difficult and complex process. This point is driven home in a recent webinar by MunEDA, in which Michael Pronath, VP Products and Solutions at MunEDA, lays out why, even with the assistance of simulators, analog circuit sizing and tuning can consume weeks in what can be a non-convergent process. The webinar, titled “Optimal circuit sizing strategies for performance, low power, and high yield of analog and full custom IP”, describes the problems designers encounter and offers a solution.

Memories, custom cells, RF and analog blocks face challenges that include smaller process nodes, difficult PPA trade-offs, reliability, low-noise requirements, and yield. Equation-based circuit sizing can become intractable, especially when numerous performance specifications are added to the problem; it quickly becomes an expanding n-dimensional problem. Not only is it hard to find a working solution through manual iteration, but manual approaches also often prevent significant further optimization. In many cases, Michael explains, reaching the optimal value of one spec violates another. This can arise from non-linear parameter dependencies that produce mixed effects. He suggests that it is not enough just to shift nominal values; sensitivities also need to be minimized so that yield can be improved.

MunEDA has developed an automated tool that performs numerical resizing based on simulation results, refining device parameters to achieve all of a design’s specifications. The design needs some initial device sizes, but even if it does not meet all specs, MunEDA’s circuit sizing, optimization and variation analysis tools can find optimum results or tell the designers that a different topology is necessary. MunEDA’s WiCkeD Tool Suite delivers performance optimization over multiple PVT corners on all test benches simultaneously. It is smart enough to perform automatic analog structure recognition, and it performs yield optimization as well as power and area optimization. It is suitable for traditional process technologies and FinFET nodes. As the WiCkeD tools work on a design, they keep a design history and database at each step, which makes experimentation and exploration easy.

MunEDA’s automated numerical circuit sizing with WiCkeD progresses through four distinct stages. Feasibility optimization finds a design with correct DC biasing for the MOS devices. Nominal tuning at typical PVT fulfills the specs at typical conditions. Worst-case operating conditions are then met through tuning at different PVTs. Finally, design centering improves the robustness of the design against process variation and mismatch. Each phase narrows down the design parameters to achieve the best result.
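To illustrate what "tuning at different PVTs" means numerically, here is a toy sketch of worst-case tuning: pick the device sizes that maximize the smallest normalized spec margin across all corners. Everything here is hypothetical. The two-parameter "circuit", its closed-form performance formulas (stand-ins for SPICE results), the corners, and the specs are invented, and a brute-force grid search stands in for MunEDA's optimizer, whose algorithms are not described in the webinar summary.

```python
import itertools

# Hypothetical PVT corners: device-strength factor k (process/temperature) and supply voltage.
CORNERS = [{"k": k, "vdd": v} for k, v in itertools.product((0.8, 1.0, 1.2), (0.9, 1.0, 1.1))]

# Specs as (name, kind, bound): "min" means perf >= bound, "max" means perf <= bound.
SPECS = [("gain", "min", 60.0), ("power", "max", 50.0), ("area", "max", 20.0)]


def performances(w, i, corner):
    """Invented closed-form stand-ins for what would really come from SPICE runs."""
    gm = corner["k"] * (w * i) ** 0.5          # toy transconductance
    return {
        "gain": 10.0 * gm / (1.0 + 0.01 * i),  # toy gain, degraded at high current
        "power": corner["vdd"] * i,            # toy power
        "area": 2.0 * w,                       # toy area
    }


def margin(perf):
    """Smallest normalized spec margin; positive means every spec passes."""
    margins = []
    for name, kind, bound in SPECS:
        val = perf[name]
        margins.append((val - bound) / bound if kind == "min" else (bound - val) / bound)
    return min(margins)


def worst_case_margin(w, i):
    """Worst margin over all corners: the quantity worst-case tuning tries to raise."""
    return min(margin(performances(w, i, c)) for c in CORNERS)


def tune(widths, currents):
    """Brute-force grid search that maximizes the worst-case margin."""
    return max(((w, i) for w in widths for i in currents),
               key=lambda p: worst_case_margin(*p))


widths = [0.5 * n for n in range(1, 21)]   # width 0.5 .. 10.0
currents = list(range(1, 46))              # bias current 1 .. 45
w_best, i_best = tune(widths, currents)
print(w_best, i_best, round(worst_case_margin(w_best, i_best), 3))
```

The max-min objective captures the trade-off described above: raising width or current helps gain but eats into the area or power margin, so the best point balances the binding specs at their worst corners. Design centering would go one step further and account for statistical variation around that point.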

After this discussion Michael moves the webinar on to a thorough demo that shows in detail how designers interact with the tool as they go through a design. There were several interesting highlights during this part of the webinar. The user interface provides both text and graphic feedback on the design state and performance. There is a fascinating view of the design that shows pairwise dependencies between specs so it is easy to comprehend where trade-offs might be difficult or easy to make. At each step of the sizing, tuning and optimization process there are graphs available that show values for each specification.

Michael runs through a convergent process of fitting specifications and moving toward completion. Frequently a process that might have taken weeks through manual optimization can be completed in several hours – including setup. The design history allows reverting to earlier steps and repeating them with different goals to find the best result.

Michael concludes with several case studies of customer designs where the WiCkeD Tool Suite has delivered impressive results: a high-speed DDRx IO, a PA core and filter, and a rail-to-rail input, push-pull output amplifier. In each of these examples the design time was reduced from weeks to hours, and the PPA was often better than the by-hand results by a wide margin.

Michael’s expertise shows through in this concise but detailed talk on how to improve analog circuit design efforts. Breaking the bottleneck in analog circuit sizing and tuning can meaningfully shorten time to tapeout. The webinar is available to view on demand at the MunEDA website.

Also Read

Webinar on Methods for Monte Carlo and High Sigma Analysis

CEO Interview: Harald Neubauer of MunEDA

Webinar on Tools and Solutions for Analog IP Migration


Supply Chain Breaks Under Strain Causes Miss, Weak Guide, Repairs Needed

by Robert Maire on 11-21-2021 at 8:00 am

-AMAT -Supply chain can’t keep up with expanding business
-May be longer term issue which will limit upside
-Being tough on vendors may have come back to bite Applied
-Fixing supply chain will likely take longer than the current cycle

Supply Chain issues come home to roost

Applied Materials missed on both earnings and revenues versus the street, coming in at EPS of $1.94 and revenues of $6.123B versus street expectations of $1.96 and $6.375B, with whispers of over $2 and about $6.5B.

Guidance is for EPS of $1.85 ± $0.07 and revenues of $6.16B ± $250M, which sounds flattish in a world of huge demand. Street expectations were $2.01 and $6.5B.

Compared with the rest of the semiconductor equipment industry, where supply chain impact ranged from virtually zero to significant, Applied was the most negatively impacted company. This strongly suggests that the problems reported were Applied-specific. According to the company, the supply chain related impact was around $300M.

The shoemaker’s children go barefoot

Applied laid the blame primarily on semiconductor shortages, although we think it likely goes well beyond that to subsystems using semiconductors. It also sounded like most of the issues were on process tools.

We find it a bit odd that, after navigating issues for much of the year, supply chain problems finally caught up with them. We imagine that much of the inventory and stock in the supply chain has been used up, to the point where there is little to no buffer left in the system.

It’s not a question of if upside will be limited, but by how much

We had predicted in our quarterly preview note of October 4th that we would likely see more impact from supply chain issues than in previous quarters, and that these issues would start to limit performance, which would in turn limit stock upside. Applied’s report is the best example of that projected issue.

We have at the very least two quarters of negative impact on revenue and earnings, likely extending into 2022. Although the reported quarter is only about a 5% hit to revenues, it could worsen.
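As a quick check on the ~5% figure, a few lines of Python reproduce the ratios from the numbers quoted above. This treats the guidance midpoints as point estimates and ignores the ± ranges.

```python
# Figures quoted in the article ($B for revenue, $ per share for EPS).
reported_rev, street_rev = 6.123, 6.375
supply_impact = 0.300
guide_rev, street_guide_rev = 6.16, 6.5
guide_eps, street_guide_eps = 1.85, 2.01


def shortfall(actual: float, expected: float) -> float:
    """Fractional shortfall of `actual` relative to `expected`."""
    return (expected - actual) / expected


print(f"Supply impact vs reported revenue: {supply_impact / reported_rev:.1%}")        # 4.9%
print(f"Revenue miss vs street: {shortfall(reported_rev, street_rev):.1%}")              # 4.0%
print(f"Revenue guide midpoint vs street: {shortfall(guide_rev, street_guide_rev):.1%}")  # 5.2%
print(f"EPS guide midpoint vs street: {shortfall(guide_eps, street_guide_eps):.1%}")      # 8.0%
```

The $300M impact is roughly 4.9% of reported revenue, consistent with the "about a 5% hit", while the EPS guide midpoint sits about 8% below the street.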

Does inability to supply lead to share loss to competitors?

From Applied management it was clear that the areas most impacted were, coincidentally, those with the most competition, such as deposition and etch. Could we see Lam, Tokyo Electron, Hitachi and ASMI pick up some share as customers get more desperate for tools that Applied can’t supply?

This is obviously more damaging, as it creates a longer-term problem: lost share is harder to get back.

Demand is beyond huge… it’s all a supply issue

Demand remains at super strong levels, and we will likely see record demand in the near term. Sooner or later this “super duper cycle” will subside and things will slow, so it’s very important to “make hay while the sun shines” and not miss any opportunities. We would hope that Applied can address the supply issues before the current cycle slows.

Being tough on vendors may have come back to bite Applied

Applied has always been very proud of being tough on suppliers… perhaps a bit too tough, always trying to squeeze the last margin point or concession out of vendors. It seems that management has recognized this, perhaps a little late given the current demand environment.

CEO Gary Dickerson said during his opening remarks on the call: “the economic value of capturing upside opportunities far outweighs pure efficiency savings, we’re also seeing changes in supply agreements across the eco-system as companies place a premium on having preferential access to capacity”.

It’s certainly better late than never, but it will take a long time to change the ingrained habits of the supply chain managers at Applied (similar to those in the auto industry). In our long history in the semiconductor industry, we know of a number of sub-suppliers who either didn’t do business with Applied or preferred doing business with others due to Applied’s hardball tactics. Now that things are tight, it’s going to be even tougher to get new or additional suppliers that are already servicing other customers.

We will likely see some impact on margins near term for expediting supplies or shipments or simply paying more to incentivize suppliers.

The stocks

We will likely see a knee-jerk reaction across the space to a problem that is largely specific to Applied, although not completely limited to it, as others have issues, though not as bad.

The fear will be that supply issues will get worse and spread further across the industry, a fear that is not unfounded.

We had suggested in our Oct 4th preview that the stocks had seen their near term peak and supply chain fears could be the reason that keeps them in check for the near term.

In general, we think that KLAC and LRCX have done a better job on the supply chain issues, with KLAC seeing essentially zero impact. ASML trades in another universe, so even supply chain issues are of little impact given its monopolistic position.

We continue with the view that upside is limited in general, as the fears will likely grow, especially after Applied’s report.

Sub-suppliers may see a bit more love in the near term but shouldn’t expect it to last through a downturn when it comes.

Also Read:

KLAC- Foundry/Logic Drives Outperformance- No Supply Chain Woes- Nice Beat

Intel – “Super” Moore’s Law Time warp-“TSMC inside” GPU & Global Flounders IPO

Intel- Analysts/Investor flub shows disconnect on Intel, Industry & challenges