
Podcast EP119: The Latest Innovations at Agile Analog with Barry Paterson

by Daniel Nenni on 11-04-2022 at 10:00 am

Dan is joined by Barry Paterson, CEO of Agile Analog.  Barry has held senior leadership, engineering and product management roles at Wolfson Microelectronics and Dialog Semiconductor. He has been involved in the development of custom mixed-signal silicon solutions for many of the leading mobile and consumer electronics companies across the world. He has a technical background in Ethernet, audio, haptics and power management and is passionate about working with customers to deliver high quality products.

Barry provides an overview of exciting new developments at Agile Analog, including new technology and new analog IP titles, as well as a unique digital cell library that is well-suited to digital control of analog processing.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Pushing Acceleration to the Edge

by Dave Bursky on 11-04-2022 at 6:00 am

(Table: performance comparison of a configurable AI IP block versus a custom AI accelerator. Source: Siemens EDA)

As more AI applications turn to edge computing to reduce latency, the need for computational performance at the edge continues to grow. Commodity compute engines, however, either lack the compute power or are too power-hungry to meet the needs of edge systems. Speaking at last month’s AI Hardware Summit in Santa Clara, Calif., Joe Sawicki, Executive VP for the IC EDA Division of Siemens, suggested several approaches to consider when designing AI accelerators for the edge: custom hardware optimized for performance, high-level synthesis to radically reduce design cost, and hybrid verification to significantly reduce validation cost.

When these approaches are combined, designers can craft high-performance AI accelerators for edge computing applications. That high performance will be needed since model sizes of the AI algorithms are growing over time—over the past five years, explained Sawicki, the models (such as the ImageNet algorithm) have increased in computational load by more than 100X and that growth shows no sign of slowing down.

Industry estimates from International Business Strategies show that the AI value contribution to the overall IC market revenue will grow from today’s 18% to 66% by the year 2030, while the total IC market revenue will grow from $529 billion today to $1144 billion by 2030. The gain in AI value demonstrates the increasing momentum in custom accelerators to improve both edge device performance and overall AI performance. Although the customized accelerators can deliver exceptional performance, they have one drawback – they have limited flexibility since they are typically optimized for a limited number of algorithms.
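As a sanity check on those figures, the implied AI-attributable revenue can be worked out directly. This is a rough illustration; the 2022 and 2030 endpoints are the only inputs taken from the IBS estimates.

```python
# Rough arithmetic behind the IBS estimates quoted above (illustrative only).
total_2022, total_2030 = 529e9, 1144e9      # total IC market revenue, USD
ai_share_2022, ai_share_2030 = 0.18, 0.66   # AI value contribution

ai_2022 = total_2022 * ai_share_2022        # AI-attributable revenue, 2022
ai_2030 = total_2030 * ai_share_2030        # AI-attributable revenue, 2030

# Implied compound annual growth rate over the 8 years from 2022 to 2030
cagr = (ai_2030 / ai_2022) ** (1 / 8) - 1
print(f"AI revenue: ${ai_2022/1e9:.0f}B -> ${ai_2030/1e9:.0f}B, CAGR ~{cagr:.0%}")
```

The AI-attributable slice grows roughly eightfold, which is what drives the momentum behind custom accelerators described above.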

In an example described by Sawicki, a configurable block of AI intellectual property is compared to a custom AI accelerator design. Area, speed, power and energy all show significant improvements for the custom accelerator (see table): area is 50% smaller, speed is improved by 94%, power is 60% lower, and energy consumed per inference is reduced by 97%. There is no magic here; the architecture was targeted specifically at implementing that one algorithm, explained Sawicki.

Part of the optimization challenge is determining the best level of quantization. For example, 32-bit floating point is often preferred for accuracy, but with just a small loss in result precision, a 10-bit fixed-point alternative that saves 20X in area/power can be used instead, improving compute throughput and reducing chip area. Additionally, by applying high-level synthesis in the hardware design flow, designers can go from the neural network architecture to a bit-accurate C++ algorithm, to a C++ Catapult architecture, and on through high-level synthesis to a synthesizable RTL design that can be implemented using RTL synthesis tools and back-end flows.
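As a rough sketch of what such a quantization step looks like (the 10-bit format with 7 fractional bits is an illustrative choice, not the specific format Sawicki described):

```python
import numpy as np

def quantize_fixed(x, total_bits=10, frac_bits=7):
    """Round to signed fixed-point with frac_bits fractional bits, then clip
    to the representable range of total_bits."""
    scale = 2 ** frac_bits
    lo = -(2 ** (total_bits - 1)) / scale
    hi = (2 ** (total_bits - 1) - 1) / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

# A synthetic weight distribution stands in for a real network's weights.
weights = np.random.default_rng(0).normal(0.0, 0.5, 10_000).astype(np.float32)
quantized = quantize_fixed(weights)
max_err = np.abs(weights - quantized).max()
print(f"max absolute quantization error: {max_err:.4f}")
```

For in-range values the error is bounded by half a least-significant bit, which is the "small loss in result precision" being traded against the 20X area/power saving.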

The use of C++ allows designers to easily explore various architectural approaches, suggests Sawicki. In a second example, he described the design exploration of a RISC-V Rocket core with three design options in a 16-nm process: one optimized for low power using an accelerator plus the Rocket core, a second focused on shrinking the core area to minimize silicon cost, and a third optimized for speed. The low-power option consumed 86.54 mW, completed the workload in 25.67 ms, and occupied a total area of about 3 million square microns. The second option cut the total silicon area by about one-third, to 2 million square microns, slowed execution to 37.54 ms, and kept power just under 90 mW. Lastly, the speed-optimized version brought the area back to about the same level as the first option, cut execution time to just 12.07 ms, but raised power consumption to 93.45 mW. These tradeoffs show how design choices can considerably affect the performance and area of a potential design.
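Those figures can be reduced to energy per run, often the more telling metric at the edge. In the sketch below the quoted numbers are reused, with "just under 90 mW" read as an assumed 89.9 mW:

```python
# Energy per run = power x execution time for the three Rocket-core options.
# Power/latency/area figures are quoted from the text; 89.9 mW is an assumed
# reading of "just under 90 mW".
options = {
    "low power":  (86.54e-3, 25.67e-3, 3.0e6),  # (watts, seconds, um^2)
    "small area": (89.90e-3, 37.54e-3, 2.0e6),
    "fast":       (93.45e-3, 12.07e-3, 3.0e6),
}

energies = {name: p * t for name, (p, t, _area) in options.items()}
for name, e in energies.items():
    print(f"{name:>10}: {e * 1e3:.2f} mJ per run")
```

Interestingly, the speed-optimized version consumes the least energy per run (about 1.13 mJ) despite drawing the most power, because it finishes the workload much sooner.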

Incorporating AI/ML functions in an edge-AI design also adds verification challenges. The verification tools must deal with the training data set and the AI network mapping, as well as the AI accelerator logic (the structured RTL). As Sawicki explained, functional benchmarking has to cover virtual platform performance, modeling of the hybrid platform, and simulation/emulation of the modeling platform. Throughout, the tools must also perform power and performance analysis. To do that, the verification technology has to be matched to the needs of the project: hybrid verification and run-fast/run-accurate operation (the ability to switch between model fidelities in a single run) make it possible to test real-world workloads in the verification environment.

By using open standards, Sawicki expects designers to leverage a rich ecosystem of modeling capabilities in a heterogeneous environment for multi-domain and multi-vendor modeling. Tools for scenario generation, algorithmic modeling, TLM modeling and physics simulation can all be tied together via a system-modeling interconnect approach that allows analog and digital simulation, hardware-assisted verification through digital twins, and virtual platform models to interact.

https://eda.sw.siemens.com/en-US/

Also Read:

Why Use PADS Professional Premium for Electronic Design

DFT Moves up to 2.5D and 3D IC

Siemens EDA Discuss Permanent and Transient Faults

3D IC – Managing the System-level Netlist


Elevating Production Testing with proteanTecs and Advantest’s ACS Edge™ Platforms

by Kalar Rajendiran on 11-03-2022 at 10:00 am

Embedded Universal Chip Telemetry Agents

SemiWiki recently posted a blog on “Deep Data Analytics for Accelerating SoC Product Development.” That blog focused on proteanTecs’ AI-enabled chip analytics platform that helps accelerate SoC product development. The blog provided insight into proteanTecs’ approach and shared quantifiable business-impact metrics as derived by Semico using a sample data center accelerator SoC.

proteanTecs recently published a whitepaper on how to enhance the economics of testing by leveraging its solution together with Advantest’s Advanced Cloud Solutions (ACS) Edge solution. With proteanTecs delivering enhanced visibility through embedded Universal Chip Telemetry (UCT), the ACS Edge enables the production test environment for real-time execution. The combination elevates production testing to a new level. This post will discuss the salient points garnered from that whitepaper.

Challenges for Enhanced Testing

With ever-increasing levels of system integration, testing faces several challenges:

    • Applications involve increasing amounts of hardware-software interaction, whether in mobile-oriented or other embedded products
    • Ever-increasing levels of integration within the same physical footprint make it hard to understand the operating conditions at failure
    • Difficulty correlating results across test, assembly and system-integration stages, due to the lack of a common data language
    • Lack of visibility into the operating conditions leading up to a failure, making root cause analysis (RCA) difficult

Gaps in Current Testing Approaches

Traditionally, each stage of testing, from post-silicon validation to system-level testing, has been performed with different equipment using different data languages to describe the respective tests. While the segmented approach may have helped optimize each stage, overall results can fall short of what is achievable through an integrated approach. Because the segmented approach lacks data sharing among testing stages, optimizing product cost is difficult. An overall optimization workflow with data sharing is needed for effective and economically efficient testing.

Moving Coverage Between Stages

Wafer test has the highest cost per second, followed by package test and then system-level test (SLT). But for a device manufacturer to guarantee the field failure rate, what ultimately matters is the quality and reliability of the final product. As a device moves through the stages, the test cost versus the cost of scrapping the device changes, directly impacting product profitability. Naturally, a device manufacturer wants to run all of the effective screens at the earliest point, saving only elusive faults for system-level testing.

For example, $500K of scrap cost might be avoided by moving 5% of coverage from package test to wafer sort. To shift coverage left or right, one must confirm that the test at one stage is equivalent to the test at another. That decision can only be enabled through improved visibility into each stage, allowing tests to be adjusted for each test insertion.
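The arithmetic behind such a saving is simple; a toy model (all inputs hypothetical, chosen to land on the $500K figure above) might look like:

```python
# Illustrative shift-left arithmetic; every input here is hypothetical.
def scrap_savings(units, defect_rate, coverage_moved, scrap_cost_per_unit):
    """Cost avoided by catching defects at wafer sort instead of scrapping
    packaged parts at package test."""
    defects_caught_earlier = units * defect_rate * coverage_moved
    return defects_caught_earlier * scrap_cost_per_unit

# 2M units, 5% defect rate, 5% of coverage moved to wafer sort,
# $100 scrap cost per packaged unit (die + package + test time)
print(f"${scrap_savings(2_000_000, 0.05, 0.05, 100):,.0f} avoided")
```

The point of the model is that the same screen, run one insertion earlier, converts expensive packaged-part scrap into cheap wafer-sort rejects.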

proteanTecs Platform

The proteanTecs platform is designed to provide detailed device visibility from initial manufacturing testing throughout the device’s lifetime. It is a holistic software platform that applies machine learning and analytics to data created by on-chip UCT agents. The platform’s pre-configured dashboards provide meaningful and useful insights and alerts. The platform is scalable and flexible, and fuses information from the entire fleet of chips and systems. And it offers an open development environment for incorporating customized algorithms.

Advantest’s ACS Edge™ Platform

Advantest’s ACS Edge™ is a high-performance, highly secure edge compute and analytics solution enabling ultra-fast algorithmic AI decision-making with millisecond latencies during test execution. ACS Edge connects to test equipment via a private high-speed encrypted link. Users develop machine learning or other compute-intensive applications that operate in near real time on data generated by tests in the test program. These applications are wrapped in an Open Container Initiative (OCI) compliant container, which simplifies global distribution and management while hardening them against compute-environment changes.

proteanTecs + ACS Edge: Tremendous Potential

The combination of the proteanTecs analytics platform with the ACS Edge analytics solution offers tremendous potential to change the economics of testing. Compared to traditional testing approaches, the combined solution delivers the following benefits.

    • Reduced test execution time (more than 70% reduction compared to traditional approaches)
    • Production yield improvement (more than 2% compared to traditional approaches)
    • Enhanced outlier detection within an insertion
    • Real-time adaptive test
    • Reduction in retesting
    • Improved visibility for shift-left decisions
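As a loose illustration of the kind of near-real-time screen an ACS Edge application might run (a generic sketch, not proteanTecs’ or Advantest’s actual algorithm), here is a minimal parametric outlier check within a single insertion:

```python
import statistics

def flag_outliers(measurements, k=3.0):
    """Flag parts whose parametric reading sits more than k standard
    deviations from the population of the current insertion."""
    mu = statistics.fmean(measurements)
    sd = statistics.stdev(measurements)
    return [i for i, m in enumerate(measurements) if abs(m - mu) > k * sd]

# Hypothetical per-part readings from one insertion; part 5 is the escape
# a traditional pass/fail limit might miss.
readings = [1.02, 0.99, 1.01, 0.98, 1.00, 1.55, 1.03, 0.97]
print(flag_outliers(readings, k=2.0))
```

Real implementations would of course fuse UCT telemetry across many parameters and apply trained models rather than a single z-score, but the decision latency requirement is the same: the verdict must arrive while the part is still on the tester.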

The whitepaper includes many examples and use cases, with the benefits achieved reported in quantified form. For full details, download the joint whitepaper.

For more details about:

The proteanTecs platform, visit https://www.proteantecs.com/solutions

The ACS Edge platform, visit https://www.advantest.com/acs/edge

Also Read:

proteanTecs Technology Helps GUC Characterize Its GLink™ High-Speed Interface

How Deep Data Analytics Accelerates SoC Product Development

CEO Interview: Shai Cohen of proteanTecs


Step into the Future with New Area-Selective Processing Solutions for FSAV

by Bhushan Zope on 11-03-2022 at 6:00 am


Area-selective processing (ASP) is assuming ever greater importance in semiconductor fabrication. ASP involves the deposition and removal of materials at the molecular level, 10 nm or less. Key applications of ASP include self-aligned contacts and fully self-aligned vias (FSAVs), scaling boosters that are essential to continued device shrinkage. By supporting techniques like metal-recess flows and dielectric-on-dielectric (DoD) flows, ASP provides the tools the semiconductor industry needs to stay on the roadmap.

Executing these precision fabrication steps requires very tight control of deposition and etch. These are complex processes with challenging chemistries. By addressing those issues, EMD Electronics, a business of Merck KGaA, Darmstadt, Germany, and its integrated portfolio of companies, provides a complete solution that enables semiconductor companies to build innovative devices.

Paths to ASP

Just as different types of photoresists have been formulated for conventional wafer patterning, so different ASP approaches have been developed to perform different tasks in additive fabrication.

·        Intrinsically Selective Molecules

Intrinsically selective molecules are molecules engineered to aggregate on specific materials, such as metals only or dielectrics only. They can be used for selective deposition or for selective epitaxial growth. Applications include the growth of silicon germanium (SiGe) source/drains for logic or memory transistors, or Si/SiGe stacks for nanosheet gate-all-around (GAA) transistors.

·        Small-Molecule Inhibitors

As the name suggests, small-molecule inhibitors discourage molecules from depositing on a surface. They are typically used to prepare nonplanar surfaces for the patterning of very small structures. The approach enables more aggressive scaling of design rules, which might otherwise be limited by gate-to-source/drain spacing and metal fill.

·        Self-Assembled Monolayers

Self-assembled monolayers (SAMs) can be spun on or applied using vapor-phase deposition. These precisely assembled layers are used for selective capping of the top or bottom of fine structures such as lines and spaces or vias. SAMs enable bottom-up fill of advanced-node trenches or vias.

·        Atomic-Layer Etching

Being able to precisely remove material is just as important as being able to precisely deposit it. Atomic-layer etching (ALE) enables the removal of monolayers of material. Applications include thickness reduction of DRAM dielectrics and cleanup steps following reactive ion etching (RIE). Atomic-layer etching is also used to create SiGe recesses in GAA transistors or to recess other metals.

EMD Electronics participates in all these areas.

ASP in action: fully self-aligned vias (FSAVs)

FSAVs provide a good example of how ASP is being used to support device scaling. FSAVs are vias aligned with lines during the fabrication process through successive deposition and etch. It’s an effective approach but depends on accurate positioning. Beyond the 3-nm node, edge placement errors (EPEs) from one layer to the next become a problem: the via ends up too close to the adjacent line. In extreme cases, copper can diffuse to the adjacent line, causing a short.

Reducing the width of the via in the next layer can help prevent this but the trade-off is increased resistance, with heat generation and greater power consumption, as well as RC delay. The alternative is to widen the spacing. This can be done in one of two ways: metal-recess flow and dielectric-on-dielectric (DoD) flow.

Metal-recess flow is an ASP step based on precision removal. It begins post-CMP with a highly controlled etch step that creates recesses in the copper (see figure 1). A second precision etch step cleans up the sides of the barriers down to the dielectric. Finally, the next layer of copper is deposited. The metal-recess step increases the spacing between the via and the adjacent line.

Figure 1: Edge placement errors (EPEs) that reduce spacing between vias and adjacent lines can enable copper to diffuse across to the adjacent line, causing a short. In metal-recess flow, recessing the copper lines increases the distance between the via and the interconnect (right), preventing diffusion.

The DoD flow process requires both precision deposition and precision removal. In the DoD flow, an inhibitor SAM engineered to grow on the copper but not the dielectric is grown on the metal (see figure 2). Next, a dielectric layer is deposited on the existing dielectric, but the inhibitor layer prevents growth of dielectric on the copper. After deposition, the inhibitor layer is removed from the copper. The cycle finishes with the growth of the next via layer. Once again, the spacing between the via and the adjacent line has been increased using ASP.

Note that the dielectric-on-dielectric layer remains even after removal of the inhibitor layer.

Figure 2: In the DoD flow process, a copper-selective inhibitor (gray) is grown on the copper (brown). These features make it possible to deposit a dielectric layer (green) on top of the existing dielectric (blue). Once the inhibitor is removed from the copper, the next layer of via is deposited. Once again, the space between the via and the adjacent line is extended.

The fully self-aligned via (FSAV) process with ASP presents a few challenges:

  • DoD Process:
    • Materials: Developing a SAM selective for copper
    • Process conditions: High temperatures can degrade SAM molecules, but atomic-layer deposition (ALD) traditionally yields the best quality dielectric thin films at higher temperatures.
    • Chemistry: The halogen and oxygen precursors typically used for silicon dioxide (SiO2) ALD can damage the SAM, so we need halogen-free precursors and milder oxidants such as water.
  • ALE of copper is a multistage process requiring complex chemistry. The first stage involves chemically modifying the copper. The second stage actually removes the modified copper.

EMD Electronics has the answers

We specialize in sophisticated chemistries and can draw on the resources of our integrated solutions across the organization. This has enabled us to solve the core challenges across the process.

  • Materials: We’ve developed a SAM molecule that is copper selective.
  • Process conditions and chemistry: We’ve developed a reduced-temperature ALD based on a halogen-free silicon precursor with water as the oxidant. Despite running at lower temperatures, the process yields high quality dielectric films (dielectric constant ≤ 5; leakage current ≤ 5e-7A/cm2), without damaging the SAM molecules.
  • DoD selectivity: as a result of the material and process development described above, we are able to demonstrate selectivity of the deposited SiO2 film up to ~10 nm, with deposition on SiO2 and minimal, if any, deposition on copper (figure 3)
  • Chemistry: By optimizing the molecular composition of the chemistry and the copper-modification stage of our ALE process, we have demonstrated effective and highly controlled copper removal with minimal added roughness. This is an important improvement over conventional processes, allowing atomic-level control over the copper etch rate (see figure 4).

Figure 3: DoD selectivity of >95% for deposited SiO2 film up to a thickness of ~10 nm on SiO2 but not on copper (Ref. G. Liu et al., ASD 2022 Conference)

Figure 4: Atomic-level control over copper removal using our optimized ALE process (blue), beginning at 170 °C. Conventional ALE (pink) shows minimal removal of copper, regardless of process temperature.

Conclusion

ASP is an essential technology to equip the semiconductor industry to meet the challenges of the future. We used FSAVs here to highlight the power of ASP and our technologies, but there are many other applications for this suite of processes in the semiconductor industry. ASP can be used for selective metal definition for source/drain regions. It can be applied to create an area-selective copper line barrier, for example, or for metal-on-metal deposition, such as a cobalt cap.

EMD Electronics supports the full ASP process, from the SAMs required to protect the surface, to the precursors required to deposit materials at low temperatures, to the specialty processes needed to etch away inhibitors. We have novel chemistries to address the challenges. We have processes and chemistries optimized to work together. Most of all, we have an organization that delivers vertically integrated support for development, problem-solving and innovation.

Also Read:

Integrating Materials Solutions with Alex Yoon of Intermolecular

Ferroelectric Hafnia-based Materials for Neuromorphic ICs

Webinar: Rapid Exploration of Advanced Materials (for Ferroelectric Memory)


Podcast EP118: An Assessment of the Worldwide Semiconductor Ecosystem with Sagar Pushpala

by Daniel Nenni on 11-02-2022 at 10:00 am

Dan is joined by Sagar Pushpala, a seasoned semiconductor professional with more than 35 years of experience with IDMs, fabless and related semiconductor entities. He is actively involved with nearly a dozen companies in the US, Singapore and India in advisory/board, consulting and investment roles.

Dan explores the dynamics of the worldwide semiconductor ecosystem with Sagar. Potential mergers, consolidation, areas requiring more investment and the interplay between financial and political forces are all discussed with specific comments for all regions of the world.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Slashing Power in Wearables. The Next Step

by Bernard Murphy on 11-02-2022 at 6:00 am


In wearables and hearables, low power is king. Earbuds, for example, still manage only half a day of active use before needing a recharge. Half a day falls short of truly convenient for most of us; a full day would be much better, allowing for overnight recharging. Physics limits battery sizes, so system designers must look to SoC architectures to reduce power further. BLE is the obvious choice for communication with a nearby phone or other device, but it is only a minimum requirement; extending battery life further requires more ingenuity. Some will come from cleverness in software running on an MCU or DSP, managing sensing, audio, automatic noise reduction and gesture recognition. However, one significant opportunity is often overlooked: power reduction through application-specific SRAMs.

Why look to SRAMs for power reduction?

Embedded SRAM compilers provide some help in managing power, though typically at a fairly coarse grain. Memory-compiler vendors have consolidated significantly, and together with foundry IP services they must serve a broad range of market needs. Those providers can’t optimize for every possibility; instead, they tune for the most popular options. Any design with an application-specific requirement must make do with a close but probably less-than-ideal fit. Second, all competitors have access to the same compilers. Since a significant share of the power budget is likely consumed in embedded memory, using a mainstream memory is a missed opportunity to differentiate on power.

Is it really possible to do better than the standard compilers? Dynamic power control in a memory doesn’t work the same way as in logic. Standard compilers offer a variety of compile options, which primarily control the periphery and array-wide power switching. One way to go deeper while still using standard bitcells is to augment standard memory architectures: clever address coding, limiting long-line voltage swings, and better control of bitline voltage swings are all possible techniques. For leakage, support for multiple banks allows independent control of each bank, and bitline control can also reduce leakage. Together these options can contribute significant power savings, both dynamic and static.

Going lower

Another familiar way to reduce power is to reduce voltage under DVFS operation. Here again, the constraints for memory differ a little from those for logic. The lower bound is the minimum voltage at which a memory cell can retain its value, Vret. However, the lowest voltage at which that memory content can be reliably accessed, Vmin, is normally close to the nominal operating voltage. A memory design that could drop Vmin closer to Vret could deliver significant power savings thanks to the V² factor in dynamic power.
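A back-of-the-envelope calculation shows the size of the prize; the 0.8 V and 0.5 V figures below are assumed for illustration, not sureCore specifications:

```python
# Dynamic power scales as V^2 (P ~ a*C*V^2*f). Voltage figures are assumed
# for illustration only.
v_min, v_ret = 0.8, 0.5   # volts: nominal access voltage vs. retention voltage
saving = 1 - (v_ret / v_min) ** 2
print(f"dynamic power saving from operating near Vret: {saving:.0%}")
```

Dropping the supply from 0.8 V to 0.5 V alone cuts dynamic power by roughly 60%, before any frequency scaling, which is why closing the gap between Vmin and Vret is so attractive.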

sureCore memory solutions

sureCore, based in the UK, has built application specific compilers for low power memories taking advantage of such ideas. One product, PowerMiser, offers single port SRAM IP with programmable sleep modes. PowerMiser delivers up to 50% dynamic and 20% static power reduction over competing options and is available as a compiler or in single instances. EverOn is the ultra-low power option, supporting operating voltages as low as Vret. In banked memory, a single bank can be powered at this low level to support wake functionality while other banks are powered off. This option is also available in compiler and single instance options. EverOn has shown 70% reduction in dynamic power and 60% in leakage power.

Another product line, MiniMiser, builds register files tuned to application-specific needs. The single-rail design is not based on foundry bit cells and can be connected directly to system logic. MiniMiser memories commonly deliver power savings of >50% over competing options and are typically delivered as instances.

Not magic but careful design

How is sureCore able to offer these advantages? They build their own memory architectures, their own compilers and, in some instances, their own bit cells, guided by discussions with customers about their system and application needs. They have also built their own automation for verification and robust characterization. Paul Wells, the CEO, tells me that sureCore often sees common themes: around wearables, around AI, and around other applications. These in some cases drive standardized compilers, though they also advise clients on how to optimize configurations to system needs. Discussions might examine factors such as read-versus-write dominance in an application, for example.

In cases where a client needs the best possible power profile, sureCore offer a custom program called sureFIT. Here they work with a client using every trick in the book to build a one-of-a-kind solution. Techniques here include segmented arrays and bit-line voltage control, near-threshold operation, pipelined read circuitry and more.

sureCore memories are already deployed in several wearable, AI and RISC-V applications. You should check them out. As an IP provider ranging from standard solutions to bespoke options that deliver best-in-class power profiles for many edge applications, sureCore is an intriguing proposition. It will be interesting to see how their business evolves!


Quadric’s Chimera GPNPU IP Blends NPU and DSP to Create a New Category of Hybrid SoC Processor

by Kalar Rajendiran on 11-01-2022 at 10:00 am


Performance, power and area (PPA) are the most commonly touted metrics in the semiconductor industry, placing PPA among the most widely used acronyms in chip development. And rightly so, as these three metrics greatly impact every electronic product developed. The degree of impact depends, of course, on the specific product and the target end markets and applications. Accordingly, product companies make PPA tradeoff decisions when choosing the various chips (and IP for ASICs) for their end products.

Yet another important consideration is ensuring a product’s longevity without requiring a redesign: in other words, future-proofing the product against changing market and product requirements. While product companies deploy auxiliary measures to extend a product’s life before having to redesign, a path that offers direct future-proofing is preferred. For example, FPGAs played a key role in future-proofing communications-infrastructure products during that market’s aggressive growth period of fast-changing requirements. An alternate path may have offered better PPA than the FPGA path delivered, but FPGAs saved product companies a lot of time and money by avoiding redesigns and re-spins of chips, and ensured they could maintain or grow market share.

An additional consideration is the ease and speed with which a product can be developed. This translates directly into time to market, which in turn translates into market share and profitability. And last but not least is the ease with which customers can develop application software on the product.

Market Situation

Artificial intelligence (AI)-driven, machine learning (ML)-enabled products and applications are growing fast, with enormous market opportunities. New ML models are being introduced rapidly, and existing models are being enhanced as well. The market opportunity spectrum ranges from data centers to edge-AI products and applications. Many products targeting these markets cannot afford to trade off PPA, future-proofing, or ease of product and application development against one another.

What if there were a way to offer PPA optimization, future-proofing, ease of product development and ease of application development, all rolled into one offering? A hybrid processor IP that simplifies SoC hardware design and programming. A unified architecture that addresses ML inference, pre-processing and post-processing all in one.

New Category of Hybrid SoC Processor

Recently, Quadric announced the first family of general-purpose neural processors (GPNPUs), a semiconductor intellectual property (IP) offering that blends a neural processing accelerator and a digital signal processor (DSP). The IP uses one unified architecture that delivers ML performance characteristics and DSP capabilities with full C++ programmability. This post looks at the components of a typical ML-enabled SoC architecture, its limitations, the Quadric offering, its benefits, and availability information.

Components of a Typical ML-Enabled SoC Architecture

The key components of an ML-enabled architecture are the neural processing unit (NPU), the digital signal processor (DSP) and a real-time central processing unit (CPU). The NPU runs the graph layers of today’s most popular ML networks and performs very well on known inference workloads. The DSPs efficiently execute voice and image processing involving complex math operations. The real-time CPU coordinates ML workloads between the NPU, the DSP and the memory that holds the ML model weights. Typically, only the CPU is directly available to the software developer for code development; the NPU and DSP are accessible only through pre-defined application programming interfaces (APIs).

Limitations of a Typical Architecture

As mentioned above, typical accelerator NPUs are not fully programmable processors. While they run known graph layers very efficiently, they cannot run new layers as ML models evolve. If an ML operator not available through an API is needed, it must be implemented on the CPU, where it will perform poorly. The architecture does not lend itself to future-proofing for new ML models and operators: at best, lower-performing solutions can be delivered by implementing new ML operators on the real-time CPU.

Another limitation is that programmers have to partition their code across the NPU, DSP and real-time CPU and then tune the interactions to meet the desired performance goals. The typical architecture may also split matrix operations between an NPU core and a CPU core, which creates inference latency and power dissipation issues as large data blocks are exchanged between the cores.
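The cost of shuttling activation data between cores can be sketched with a simple back-of-the-envelope model. The bus bandwidth and energy-per-byte figures below are illustrative assumptions for the sake of the arithmetic, not vendor data:

```python
# Rough model of the cost of splitting an operation across two cores.
# Bandwidth (GB/s) and energy (pJ/byte) values are assumed, not measured.

def transfer_cost(tensor_bytes, bus_gbps=8.0, pj_per_byte=20.0):
    """Return (latency in microseconds, energy in microjoules) to move
    one activation tensor between cores over a shared bus."""
    latency_us = tensor_bytes / (bus_gbps * 1e9) * 1e6
    energy_uj = tensor_bytes * pj_per_byte * 1e-6
    return latency_us, energy_uj

# A 224x224x64 int8 activation map (~3.2 MB) moved once per split layer:
lat, en = transfer_cost(224 * 224 * 64)
```

Even with generous assumptions, every split layer adds hundreds of microseconds of bus traffic plus the associated energy, which is the overhead a single-core architecture avoids.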

Multiple IP cores from various IP suppliers force developers to rely on multiple design and productivity tool chains, which generally prolongs development times and makes debugging challenging.

Benefits of The Quadric Approach

Quadric’s Chimera GPNPU family provides a unified, single-core architecture for both ML inference and the related conventional C++ processing of images, video, radar and other signals. This allows neural network graphs and C++ code to be merged into a single software code stream. Memory bandwidth is optimized by a single unified compilation stack, which leads to significant power savings. Programming a single-core system is also much easier than dealing with a heterogeneous multi-core system, and only one tool chain is required for scalar, vector and matrix computations.

Other benefits of the unified Chimera GPNPU architecture include area and power savings resulting from not having to shuffle activation data between the NPU, DSP and CPU. The unified core architecture greatly simplifies hardware integration and makes the task of performance optimization much easier.

The system design task of profiling memory usage to determine the optimal amount of off-chip bandwidth is simplified as well. This also directly leads to power minimization.

Application Development

The Chimera software development kit (SDK) merges graph code from common ML training toolsets with customers’ C++ code through a two-step compilation process, producing a single code stream that runs on the unified Chimera processor core. The widely used ML training toolsets supported are TensorFlow, PyTorch, ONNX and Caffe. Users of the implemented SoC have full access to all of the Chimera core’s resources for maximum flexibility in applications programming, and the entire system can be debugged from a single debug console.

Future Proofing Without Losing Performance

The Chimera GPNPU architecture excels at convolution layers, the heart of convolutional neural networks (CNNs), yet Chimera GPNPUs can run any ML operator. Custom ML operators can be added by writing a C++ kernel using the Chimera Compute Library (CCL) API and compiling that kernel with the Chimera SDK. Custom operators perform at the level of native operators because they utilize the same core resources of the Chimera GPNPU.
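Conceptually, adding a custom operator means registering a new kernel alongside the native ones so it runs through the same dispatch path as every other operator. The Python sketch below illustrates only that idea; on a Chimera core the kernel would actually be written in C++ against the CCL API and compiled with the Chimera SDK, and all names here are invented for illustration:

```python
# Conceptual sketch: extending an operator set with a custom kernel.
# Names are hypothetical; real Chimera kernels are C++ compiled via the SDK.

OPERATORS = {}  # operator name -> callable kernel

def register_op(name):
    def wrap(fn):
        OPERATORS[name] = fn
        return fn
    return wrap

@register_op("relu")
def relu(xs):
    # A "native" operator.
    return [max(0.0, x) for x in xs]

@register_op("hard_swish")
def hard_swish(xs):
    # A newer operator added "after tapeout": x * relu6(x + 3) / 6.
    # It uses the same dispatch path as native operators.
    return [x * min(max(x + 3.0, 0.0), 6.0) / 6.0 for x in xs]

def run(op_name, xs):
    return OPERATORS[op_name](xs)
```

The point of the sketch is that a newly registered operator is indistinguishable from a built-in one at dispatch time, which is what lets software keep pace with evolving ML models on fixed silicon.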

SoC developers can implement new neural network operators and libraries long after the SoC has been taped out. This in itself increases a chip’s useful life dramatically.

Software developers can continue to optimize the performance of their models and algorithms throughout a product’s lifecycle, adding new features and functionality to gain a competitive edge in the marketplace.

Current Offerings from Quadric

The Chimera architecture has already been proven at speed in silicon. The entire family of QB-series GPNPUs can achieve 1 GHz operation in mainstream 16nm and 7nm processes using conventional standard cell flows and commonly available single-ported SRAM. The Chimera cores can be targeted to any silicon foundry and any process technology.

The QB series of the Chimera family of GPNPUs includes three cores:

  • Chimera QB1 – 1 trillion operations per second (TOPS) machine learning, 64 giga operations per second (GOPS) DSP capability
  • Chimera QB4 – 4 TOPS machine learning, 256 GOPS DSP
  • Chimera QB16 – 16 TOPS machine learning, 1 TOPS DSP

If needed, two or more Chimera cores can be paired to meet even higher performance levels.
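Given the published figures, choosing a core for a workload reduces to a simple capacity check. A minimal sketch (treating the QB16’s 1 TOPS DSP rating as 1000 GOPS, and ignoring multi-core pairing):

```python
# Illustrative core selector over the published QB-series ratings.

QB_SERIES = [            # (name, ML TOPS, DSP GOPS)
    ("QB1", 1, 64),
    ("QB4", 4, 256),
    ("QB16", 16, 1000),  # 1 TOPS DSP expressed as 1000 GOPS
]

def pick_core(ml_tops, dsp_gops):
    """Return the smallest QB core meeting both requirements, or None."""
    for name, tops, gops in QB_SERIES:
        if tops >= ml_tops and gops >= dsp_gops:
            return name
    return None
```

A workload needing 2 TOPS of ML and 100 GOPS of DSP would land on the QB4; anything beyond the QB16’s ratings would call for pairing multiple cores.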

For more details, visit Quadric.io.

Availability

Quadric is ready for immediate engagement with customers.

November 2022: Beta release of Quadric’s SDK

Q1 2023: Full production release of Quadric’s SDK

End of Q1 2023: Production-ready RTL of Quadric’s GPNPU IP

About Quadric.io

Quadric.io Inc. is the leading licensor of general-purpose neural processor (GPNPU) IP that runs both machine learning inference workloads and classic DSP and control algorithms. Quadric’s unified hardware and software architecture is optimized for on-device ML inference. Learn more at www.quadric.io.

Also Read:

Flash Memory Market Ushered in Fierce Competition with the Digitalization of Electric Vehicles

The Corellium Experience Moves to EDA

CEVA’s LE Audio/Auracast Solution


Why Use PADS Professional Premium for Electronic Design

Why Use PADS Professional Premium for Electronic Design
by Daniel Payne on 11-01-2022 at 6:00 am


My IC design career started just a few years before PADS launched in 1985 with a DOS-based tool for PCB design. A lot has changed since then: PADS was acquired by Mentor Graphics in 2001 and has continued to grow under Siemens EDA, now with four versions to choose from, the top version being PADS Professional Premium.

This blog focuses on PADS Professional Premium, an EDA tool packed with useful automation that enables engineers to design and verify their latest electronic systems. I’ll cover the top 12 features to give you an idea of how they help automate engineering tasks.

Schematic Definition

Schematic capture with PADS Designer is the starting point for circuit design and simulation, where components are selected from a centralized library and graphically placed. Hierarchy support allows for abstraction, and you can even use interconnect automation between components and blocks, eliminating manual connections.

Constraints are entered in an intuitive spreadsheet, lowering the time spent on entering physical and electrical rules for PCB designs.

PADS Designer

Part Selection and Library Creation

With billions of parts in the catalog, it’s quick to find the right component using PartQuest Portal Essential, a cloud-based app. For parts that aren’t found you can request they be added, or create your own custom library element.

PartQuest Explorer

Component Sourcing Data

You will always know whether your components are available, their price, and their compliance and lifecycle data, because the Supply Chain cloud app is connected with 80 suppliers from around the globe. No more surprises about component shortages, and creating a BOM is quickly done in the Supply Chain app.

Verification by Simulation

Measuring signal integrity in both pre-layout and post-layout ensures that your design will work reliably in the field, avoiding the time and expense of respins. Logic simulation and AMS simulation using SPICE or VHDL-AMS models verify correctness before manufacturing begins. The analysis environment is integrated with the schematic and layout tools for easier signal integrity analysis.

Signal integrity analysis

Product Variants

Engineers create and manage their product variants by removing and replacing components. Each variant’s BOM is auto-generated, and you can even compare BOMs for reviews.

Variant Management

Collaboration and Version Management

Having a single source of truth is fundamental to team design, and cloud access ensures that the right version is being used. Both desktop integration and web-browser viewing can be used with Connect for PADS Professional Premium. During the review process, designers can mark up designs using Connect, and any project-related file can be shared with the entire team.

Web browser and desktop views

PCB Design

Placing components is simplified by using floorplanning groups set in the schematic. Both schematic and layout tools use the same spreadsheet environment for constraints. Automation for plane creation replaces manual effort.

Routing

Users can interactively control the router instead of routing manually, reducing PCB layout time by up to 80%. The completed PCB layout can be transferred to manufacturing using ODB++ and other popular formats.

Sketch and Interactive Route

MCAD Integration

PCB and mechanical engineers collaborate using data exchange standards like the ProSTEP file exchange format (IDX). 3D models can be imported into Siemens Solid Edge or NX tools, avoiding the need to maintain two libraries.

Rigid-flex PCB Design

My laptop and smartphone both contain several rigid-flex boards, and this popular design style is supported in a correct-by-construction process in the Layout tool.

Rigid-flex boards

RF Design

Bluetooth, WiFi and 5G devices have high-frequency signals that require RF analysis in tools like Keysight ADS. The PADS tool integrates with RF analysis tools via import/export at both the schematic and PCB levels. Layout techniques for RF, such as via shielding and tapered or chamfered corner traces, are automated.

RF Design

FPGA/PCB Co-design

The high pin counts of FPGAs are challenging to route, so with FPGA/PCB co-design you can have fewer net-line crossovers, reduced routing layers, improved signal integrity, shorter traces, and a reduced number of vias.

Before and After: FPGA/PCB co-design

Summary

The PADS Professional Premium tool from Siemens EDA meets the challenges of the modern PCB design flow, such as increased electronic product complexity, through automated features. For more details there is an 18-page paper, which requires a brief registration.

Related Blogs


Flash Memory Market Ushered in Fierce Competition with the Digitalization of Electric Vehicles

Flash Memory Market Ushered in Fierce Competition with the Digitalization of Electric Vehicles
by Daniel Nenni on 10-31-2022 at 10:00 am


Governments worldwide have been paying close attention to alternative energy vehicles recently. Many have launched related electric vehicle subsidy policies, accelerating global sales over recent years.

In 2021 at IAA Mobility in Munich, Germany, many major car manufacturers, including Porsche, showcased their all-electric or related concept cars. In addition to adding more smart-driving features, most new vehicles are equipped with extensive digital touch dashboards. In-vehicle infotainment systems, security and anti-hacking technology, and the overall efficiency of electric vehicles have also improved. In the meantime, cloud and edge computing capabilities have been tremendously enhanced, epitomized by digital data stream processing, zonal architectures and the digitalization level of devices. New energy-storage concepts and environmental enhancements, such as material recycling, have also been introduced.

WEBINAR REPLAY: Increasing Security Concerns in IoT Devices 

In June 2022, the European Parliament voted to cease the sale of fossil-fuel vehicles from 2035. Although the vote faced opposition, it has become common knowledge that electric vehicles will become mainstream. The digital revolution in the automotive industry has also brought immense opportunities to memory chip manufacturers. Among the various types of memory chips, competition for flash memory in the automotive market appears particularly intense.

HPC Platform to become the new standard for future cars with Flash memory

The transition from mainstream fossil-fuel vehicles to electric vehicles, and now to all-electric and driverless vehicles, requires the constant improvement of advanced driver-assistance systems (ADAS). This means automotive chips must continually deliver higher levels of self-driving capability and sustained enhancement of AI algorithms.

HPC integrates several complex technologies, such as high-performance multi-core chips, in-vehicle operating systems, diversified software systems, high-speed and low-latency communication, functional safety, information security and OTA to satisfy application requirements like high-level autonomous driving and vehicle control. The introduction of HPC represents a fundamental restructuring of the automotive electrical and electronic architecture. Currently, many Tier 1 automotive manufacturers are considering adopting zonal architecture, where the central HPC platform makes the highest-level decision and then transmits data commands and power through domain control units (DCU) scattered in various parts of the vehicle.

Like today’s smartphones, high-end smart cars with enhanced software and hardware are available at higher cost, giving customers higher-level audio and video facilities and enhanced security. Customers can also purchase an entry-level vehicle that meets regulatory requirements at the lowest cost. For car manufacturers, revenue will no longer rely solely on new-car sales or repairs and maintenance, but also on the software and platforms customers freely choose based on tiered pricing. Customers can subscribe to online video streaming on a monthly or annual basis and even purchase related services such as assisted autonomous driving for a one-time fee, bringing additional revenue to car makers.

According to relevant statistics, the automotive HPC market is about $560 million in 2022 and is expected to grow year on year, reaching $8.05 billion by 2025. To seize the market, automotive chip developers have launched various new HPC SoCs (systems on chip), some even equipped with AI deep-learning accelerators that collect, analyze and learn from automotive data, accumulating extensive databases to help future algorithms improve. Memory chips responsible for collecting and storing these vast amounts of information have therefore become the cornerstone of the car’s journey to digitization.
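Those two figures imply a remarkably steep growth trajectory. A quick check of the implied compound annual growth rate (assuming three years of growth from the 2022 base, with values in billions of dollars):

```python
# Implied CAGR for the quoted HPC market figures:
# $0.56B in 2022 growing to $8.05B by 2025 (three growth years assumed).

def cagr(start, end, years):
    """Compound annual growth rate implied by start/end values."""
    return (end / start) ** (1 / years) - 1

growth = cagr(0.56, 8.05, 3)   # roughly 1.43, i.e. ~143% per year
```

In other words, the forecast implies the market more than doubling every year through 2025.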

Flash Memory Supports a Vehicle’s Information Flow

As automobiles evolve toward digitalization, the demand for data storage and information transmission keeps rising. The flow of information and the reading and processing of data are key elements of communication between devices, while OTA acts as a medium for devices to continuously learn and recognize new devices and communication languages. Therefore, demand for applications such as OTA wireless updates will increase, and OTA-related personal and vehicle information security, authentication and authorization will become more critical.

In addition, information must also flow between vehicles and other devices, which gave rise to C-V2X (cellular vehicle-to-everything). C-V2X plays a vital role in enhancing road safety, smoothing traffic and reducing total energy consumption. Related statistics show C-V2X growing at a compound annual growth rate of about 30%, with the market estimated to reach $18.8 billion by 2027.

C-V2X is an in-vehicle communication system that includes more specific categories such as V2I (vehicle-to-infrastructure), V2N (vehicle-to-network), V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian) and V2D (vehicle-to-device), covering information transmission and interpretation between vehicles and different systems or devices. For the information flow of these wireless communications, timely delivery and data-flow analysis are most important for driving safety.

During this process, all automotive applications require qualified storage products and devices that will endure in embedded environments exposed to extreme temperatures. As a reliable non-volatile memory, NOR flash typically features fast reading speed, high stability, and no data loss when interrupted, which makes it ideal for automotive applications.


However, with the relentless increase in the amount of data that devices need to record, such as boot information and user-related information, NAND flash, with its larger capacity and cost advantage, has gradually grown in importance.

The Driving Force behind the Automotive Flash Memory Market

After decades of flash memory development, the NOR flash market had tended to favour niche industries. After 2016, however, increased demand for low-to-medium-capacity consumer electronics and IoT devices led the NOR flash market to pick up and attract new players, bringing NOR flash into a new era of competition.

From an overall market perspective, benefiting from the rise of emerging application scenarios and the impact of shortages, NOR flash has entered a rising cycle. According to an IC Insights report, NOR flash accounted for only 4% of the flash memory market in the second quarter of 2021. NOR flash revenue surged 63% to $2.9 billion last year, with shipments up 33% and average selling prices up more than 20%. The NOR flash market is expected to grow another 21% to $3.5 billion in 2022.

However, current NOR flash is mainly applied in low-capacity applications such as IoT and consumer electronic devices. As competition in this part of the market becomes increasingly fierce, profits can easily diminish. For this reason, major storage manufacturers are trying to break into high-end applications.

The many applications brought about by the digitalization of automobiles offer opportunities for memory products. According to IC Insights’ statistics on several NOR flash manufacturers, most NOR flash growth comes mainly from automotive applications. Many manufacturers have seized the automotive market, which has once again revitalized the NOR flash market.

Winbond is the largest NOR flash supplier, with over $1 billion in sales, accounting for about one-third of the global NOR flash market. Winbond has been deeply involved in the automotive field for more than ten years as a leading supplier in the NOR flash market, and the world’s top ten automobile manufacturers are end customers of Winbond memory products. Alongside its existing niche DRAM, SLC NAND and other product lines, Winbond has recently expanded its deployment of automotive NOR flash as a high-margin product line.

Winbond serial NOR flash has long played a leading role in most automotive applications. However, with the advancement of digitization, the code size that needs to be stored keeps increasing. To meet this demand, Winbond launched OctalNAND flash with 8 I/O ports to fulfil the requirement for instant communication and fast uploads and downloads. Thanks to the 8 I/Os, the maximum transmission rate can reach 240MB/s, suitable for high-speed, low-latency applications. Octal NOR flash will also join the lineup shortly.
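The quoted peak rate is consistent with simple width-times-rate arithmetic: with one bit per line per transfer, 8 I/O lines deliver one byte per transfer. The 240 megatransfers-per-second figure below is an assumption chosen to match the quoted peak, not a Winbond specification:

```python
# Peak throughput from I/O width and transfer rate.
# 240 MT/s is an assumed rate that reproduces the quoted 240 MB/s.

def peak_mb_per_s(io_lines, mtransfers_per_s):
    """Peak MB/s given parallel I/O lines (1 bit each) and MT/s."""
    bytes_per_transfer = io_lines / 8
    return bytes_per_transfer * mtransfers_per_s

rate = peak_mb_per_s(8, 240)   # 240.0 MB/s for an octal interface
```

The same arithmetic shows why a single-line serial interface at the same transfer rate would top out at an eighth of that bandwidth.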

In addition, Winbond memory products are also suitable for many additional applications, including wireless communication systems, Lidar, tire pressure detectors, in-vehicle wireless charging systems, electric vehicle battery management systems, airbag systems, head-up displays, power management systems and audio systems; as well as the OTA wireless system covering network communication, infotainment and driving recorders. It can support V2X applications, car navigation, digital dashboard, driver monitoring and interactive systems, and it also can be used in gateways, cameras and other in-vehicle fields.

No matter how in-vehicle devices evolve, driving safety will remain a top priority in the automotive industry. Winbond has continuously launched various flash memory products suitable for automotive applications while adhering to the belief of creating high-quality, high-efficiency, low-energy consumption and advanced information security products. Winbond will continue to work closely with automotive manufacturers to fulfil consumers’ wishes.

To learn more, visit Code Storage Flash Memory 

Also Read:

The Corellium Experience Moves to EDA

CEVA’s LE Audio/Auracast Solution

VeriSilicon’s AI-ISP Breaks the Limits of Traditional Computer Vision Technologies


Are EDA companies failing System PCB customers?

Are EDA companies failing System PCB customers?
by Rahul Razdan on 10-31-2022 at 6:00 am


Electronic Design Automation (EDA) is a critical industry that enables the development of electronic systems. Traditionally, EDA has been bifurcated into two distinct market segments: semiconductor and systems (PCB). If one were to look at the EDA industry in the early 1970s, one would find significant capabilities for the physical design of both semiconductors (layout) and system PCBs (board layout). Since the 1970s, the economics of the EDA industry have been tied closely to the semiconductor industry, and specifically to Moore’s law. Thus, today, the semiconductor EDA business includes massive amounts of automation in synthesis (automatic place, route and floorplanning), verification (formal, simulation, emulation, HW/SW co-verification) and IP (enabling, test, memory controllers, verification IP, etc.).

Interestingly, however, System PCB capabilities have remained largely the same. PCB physical design tools (e.g., Allegro) continue to provide value, and capabilities have certainly been added for improved signal integrity and advanced packaging. However, the automation needed to handle increasingly complex semiconductors at higher levels of function (programmable fabrics, SW, AI) within a System PCB fabric is completely missing. In this article, we will discuss the nature of this missing functionality, its impact on the marketplace, and the opportunity for EDA to connect the semiconductor and systems parts of the electronic design process.

System PCB Design:

Figure 1: The Modern System PCB Design Process (non-consumer)

Traditionally, the relationship between semiconductor companies and their customers has been a function of the volume driven by the customer. In very high-volume markets such as the consumer marketplace, large numbers of staff from semiconductor companies work with their system counterparts to effectively co-design the system product.

 

Figure 2:  The System PCB Design Process

For the non-consumer electronics flow, the electronic design steps consist of the following stages (Figure 1):

  1. System Design: In this phase, a senior system designer maps their idea of a function onto key electronic components. In picking these key components, the system designer often weighs the following considerations:
    1. Do these components conform to any certification requirements in my application?
    2. Is there a software (SW) ecosystem that provides so much value that I must pick hardware (HW) components within a specific software architecture?
    3. Are there AI/ML components critical to my application that imply the choice of an optimal HW and SW stack best suited for my end application?
    4. Do these components fit in my operational domain of space, power and performance at a feasibility level of analysis?
    5. Observation: this stage determines the vast majority of immediate and lifecycle cost.
    6. Today, this stage of design is largely unstructured, relying on generic personal productivity tools such as Excel, Word, PDF (for reading 200+ page datasheets) and, of course, Google search. There is little to no EDA support.
  2. System Implementation: In this phase, the key components from the system design must be refined into a physical PCB design. Typically driven by electrical engineers within the organization or sourced from external design services, this stage has the following considerations:
    1. PCB Plumbing: Combining the requirements of key components with the external-facing aspects of the PCB is the job at this stage. It often involves the physical layout of the PCB, defining the power/ground/clock architecture, and any signal-level electrical work. This phase also involves part selection, but typically of low-complexity (microcontroller) and analog parts. Today, this stage of design is reasonably well supported by the physical design, signal integrity and electrical simulation tools from traditional EDA vendors such as Cadence, Zuken and Mentor Graphics, while part selection is reasonably well supported by web interfaces from companies such as Mouser and Digikey.
    2. Bootup Architecture: As the physical design comes together, a bootup architecture is defined, typically proceeding through electrical stability (DC_OK), testability and microcode/FPGA ramp-up to a live operating system. A large range of tools to help debug the PCB is typically connected to this work. The combination of all these capabilities is referred to as the Board Support Package (BSP). BSPs must span all the abstraction levels of the System PCB, so today they are often “cobbled” together from a base of tools with parts sitting on various websites.

This design flow contrasts with the System PCB flow of the 1980s, where the focus of a System PCB was largely to build a function. In those days, the semiconductors used to build the function were of moderate complexity and the communication mechanism of a datasheet was adequate. Today, the job of a System PCB designer is really to manage complex fabrics within complex HW/SW ecosystems (with AI coming next).

Yet the primary method for communicating technical information remains imprecise English prose sitting in datasheets and websites. Further, most of the non-consumer marketplace has requirements for long life cycles (LLC), which are at odds with the core of the consumer-focused semiconductor chain.

This is all the more ironic because the semiconductor EDA tools actually contain all the information required by their System PCB EDA counterparts.

Figure 3:  The Semiconductor and PCB Tool Disconnect

What is missing? Two fundamental flaws exist in today’s EDA infrastructure (Figure 3):

  1. Semiconductor Signoff Flow: Today, there is a very strong semiconductor signoff flow, supported by EDA tools, that transmits information from semiconductor designers to manufacturing. However, there is no signoff flow that transmits information from semiconductor designers to system designers. Rather, this communication interface is manual (legions of folks writing datasheets), lacking in deep standards (tool-specific symbol libraries), and haphazardly distributed over a variety of channels (websites, downloads, etc.).
  2. System PCB Abstraction: Today’s semiconductors need to communicate not only physical-layer information but also various levels of information in the SW (and increasingly AI) layers to system designers. In fact, this information is increasing in value relative to the physical-layer information. Currently, while there are plenty of semiconductor HW/SW/behavioral EDA capabilities, there is no equivalent in the System PCB space to accept this information.

How do we address this massive gap?   Let’s consider the System PCB abstraction.

Building the System PCB Abstraction:

The abstractions can be broadly classified into four categories:

  1. Physical Layer or Hardware Abstraction: This covers component pinouts, component block diagrams and functionality, CAD models for layout, and so on. This is the bottom layer.
  2. Programmable or Configurability Layer: Microcontrollers, FPGAs and highly programmable multi-function chips are becoming ever more flexible as semiconductor vendors drive deeper integration. For system designers, this means a new arsenal of tools and possibilities for embedded development. The trend is toward multiple pre-built configurable options and, more importantly, toward “Configurable Logic (CL),” which allows embedded programmers to easily add their own custom functionality, from simple signal inverters to more complex Manchester decoders; the CL can operate completely independently of the processor core. This is creating an increasingly important new level of abstraction for system designers.
  3. Virtual Layer or Software Abstraction: This consists of an entire software stack, from OS ports to drivers to BSPs to IDE environments and libraries that aid end-application development. In many end markets, the overwhelming amount of software IP is the driving factor for chip selection.
  4. Artificial Intelligence Abstraction: Finally, given the focus on AI inference at the edge, with powerful capabilities around vision and generic data processing, the supported AI stack is increasingly vital for part selection and end-application implementation.

However, the state of the union when it comes to the availability of these abstractions to system designers is abysmal.

Connecting the Semiconductor and System PCB World

How can this issue be addressed? Two simple steps:

  1. Smart System Designer: The System PCB EDA flow must support all the important abstractions generated and enabled by the semiconductor design process.
  2. Formal Signoff Flow: The semiconductor signoff flow needs to be extended in formalization, standardization and content to include the abstractions described above.

Examples of these signoff steps are shown in Figure 4 below. In this picture, the validations done in the semiconductor design process are made available in a structured manner to the system designer. Machine-readable data is also made available directly, rather than being transmitted through a datasheet.
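The kind of machine-readable record such a signoff flow would publish, in place of a prose datasheet, can be sketched as a structured document spanning the four abstraction layers. The schema, field names and part number below are invented purely for illustration; no standard of this shape currently exists, which is precisely the gap the article describes:

```python
# Hypothetical machine-readable component record spanning the four
# abstraction layers; every field name here is invented for illustration.
import json

component = {
    "part": "XYZ123",  # hypothetical part number
    "abstractions": {
        "physical": {"package": "QFN-48", "pins": 48},
        "configurable": {"fpga_fabric": False, "configurable_logic": True},
        "software": {"os_ports": ["Zephyr", "FreeRTOS"]},
        "ai": {"runtimes": ["TFLite-Micro"]},
    },
    # Validations carried over from the semiconductor signoff flow:
    "signoff": {"validated": ["pin_map", "power_domains", "bsp_boot"]},
}

record = json.dumps(component, sort_keys=True)  # what a tool would ingest
```

A System PCB tool could consume such a record directly, instead of an engineer transcribing the same facts from a 200-page PDF.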

Fig. 4 Smart System Designer: Connecting the Semi & PCB/System Toolsets

Overall, there is an excellent opportunity for EDA companies to leverage their System PCB and Semiconductor capabilities to add value to the vast number of System PCB customers.

Acknowledgements: Special thanks to Anurag Seth for co-authoring this article.

Also Read:

Bespoke Silicon Requires Bespoke EDA

Post-Silicon Consistency Checking. Innovation in Verification

Higher-order QAM and smarter workflows in VSA 2023