
Efficient Memory BIST Implementation
by Daniel Payne on 05-05-2022 at 10:00 am


Test experts use the acronym BIST for Built-In Self-Test: test logic added to an IP block that speeds up testing by creating stimulus and then checking the output results. Memory IP is a popular category for SoC designers, as modern chips include multiple memory blocks for fast, local data and register storage. As the number of memory IP blocks in a chip grows, the challenge is how to implement memory BIST with minimum area and maximum throughput. A recent white paper from Harshitha Kodali, Product Engineer at Siemens EDA, focuses on this topic, and from it I've learned how a shared bus architecture is the most efficient implementation.

Here’s what a shared bus architecture looks like, where the shared bus is shown in black, physical memories are in teal color, and the physical memories are combined into four logical memories:

Shared Bus Interface

The MBIST logic used to test any of these memories has several components:

Memory BIST shared bus hardware

Engineers insert this DFT logic automatically at the RTL or gate level, and they define the memories using a Tessent Core Description (TCD) that includes details such as:

  • Shared bus interface ports
  • Access codes per logical memory
  • Port mappings between logical memories and shared bus interface

Shared bus learning is a methodology that automatically maps the physical memory makeup of every logical memory and verifies that the cluster and logical memory library files are correct. Here's the flow:

Shared bus learning flow

The library validation step ensures that no memory is missed from MBIST testing, port mappings are consistent, and that pipeline stages around the logical memory are consistent with the cluster TCD.
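To make the validation step concrete, here is a minimal sketch in Python of the three checks described above. The data structures and field names are invented for illustration; they are not Siemens' actual TCD format or tool flow.

```python
# Hypothetical sketch of the library-validation checks: coverage, port
# mapping consistency, and pipeline-stage consistency. Illustrative only.

def validate_memory_library(cluster, logical_memories):
    """Return a list of validation errors (an empty list means the library is clean)."""
    errors = []

    # 1. Every physical memory in the cluster must map to some logical memory,
    #    so that no memory is missed from MBIST testing.
    mapped = {p for lm in logical_memories for p in lm["physical_memories"]}
    for phys in cluster["physical_memories"]:
        if phys not in mapped:
            errors.append(f"memory '{phys}' is not covered by any logical memory")

    # 2. Port mappings must be consistent: each logical-memory port must
    #    connect to a port that actually exists on the shared bus interface.
    bus_ports = set(cluster["shared_bus_ports"])
    for lm in logical_memories:
        for lm_port, bus_port in lm["port_map"].items():
            if bus_port not in bus_ports:
                errors.append(f"{lm['name']}.{lm_port} maps to unknown bus port '{bus_port}'")

    # 3. Pipeline stages around each logical memory must match the cluster TCD.
    for lm in logical_memories:
        if lm["pipeline_stages"] != cluster["pipeline_stages"]:
            errors.append(f"{lm['name']} pipeline depth {lm['pipeline_stages']} "
                          f"!= cluster depth {cluster['pipeline_stages']}")
    return errors

cluster = {"physical_memories": ["m0", "m1", "m2"],
           "shared_bus_ports": ["addr", "wdata", "rdata"],
           "pipeline_stages": 2}
logicals = [{"name": "LM0", "physical_memories": ["m0", "m1"],
             "port_map": {"A": "addr", "D": "wdata", "Q": "rdata"},
             "pipeline_stages": 2}]
print(validate_memory_library(cluster, logicals))  # reports 'm2' as uncovered
```

The interesting property is that all three checks are mechanical once the cluster and logical memory descriptions exist, which is why this step can be fully automated in the insertion flow.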

There are five steps to insert the shared bus logic into a design, where ICL is the Instrument Connectivity Language used by an IJTAG flow:

DFT insertion flow

For larger memories, yield can be improved by using repairable memories, so the Tessent MemoryBIST approach supports this and inserts the Built-In Repair Analysis (BIRA) plus Built-In Self Repair (BISR) logic. The added BIRA and BISR logic is shown below:

Repairable memory logic

Simple memory instances have a single port for reading and writing; however, more complex configurations like multi-port and pseudo-vertical stacking are also supported with Tessent MemoryBIST. All of the memory configuration details are defined in the logical memory TCD.

The DFT area overhead can also be optimized if the design has identical memory instances that are not tested concurrently, as the memory interface and virtual memory will be reused.

Summary

Memory BIST for designs with many IP instances can be efficiently implemented with a shared bus architecture using Tessent MemoryBIST. There's quite a bit of flexibility in the DFT automation approach offered by Siemens EDA to handle physical memories, logical memories, and memory library mapping and validation.

The complete white paper is available to view online after a simple registration step, or there's a recorded webinar online to view.

Related Blogs


Design IP Sales Grew 19.4% in 2021, confirm 2016-2021 CAGR of 9.8%
by Eric Esteve on 05-05-2022 at 6:00 am


Design IP sales reached $5.45B in 2021, up 19.4% YoY after 16% in 2020, in sync with semiconductor growth of 26.2% in 2021 according to WSTS. IPnest released the "Design IP Report" in May 2022, ranking IP vendors by category (CPU, DSP, GPU & ISP, Wired Interface, SRAM Memory Compiler, Flash Memory Compiler, Library and I/O, AMS, Wireless Interface, Infrastructure and Misc. Digital) and by nature (License and Royalty).

The main trends shaping the Design IP market in 2021 are very positive for most IP vendors, especially for Synopsys, growing by 21.7%, more than the market, as well as Imagination Technologies (IMG) at 43.4%, the flash memory compiler vendors (SST, eMemory Technology), and Alphawave with more than 100% growth.

Synopsys' and Alphawave's growth confirms the importance of the wired interface IP market (22.7% growth for the category), aligned with data-centric applications: hyperscalers, datacenter, networking and AI. And the good performance of ARM and IMG proves the comeback of the smartphone industry and the emergence of automotive as a growth vector.

Looking at the 2016-2021 IP market evolution reveals interesting trends. The global IP market grew by 59.3%, while the top three vendors saw unequal growth: #1 ARM grew by 33.7%, while #2 Synopsys grew by 140.9% and #3 Cadence by 167.2%. Market share information is even more significant. ARM moved from 48.1% in 2016 to 40.4% in 2021, while Synopsys enjoyed a move from 13.1% to 19.7% (a gain of 50% in market share from 2016 to 2021!) and Cadence progressed from 3.4% to 5.8%.

This can be synthesized by comparing the 2016 to 2021 CAGR:

The key takeaway is that the Design IP market has enjoyed almost 10% CAGR for 2016-2021! It's also notable that Synopsys, with 19.2% CAGR, has grown more than three times as fast as ARM (6% CAGR).
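The quoted CAGR figures follow directly from the five-year growth numbers above, via the standard formula CAGR = (1 + total growth)^(1/years) − 1. A quick check in Python:

```python
# Derive the 2016-2021 CAGR from the total five-year growth percentages
# quoted above (market 59.3%, ARM 33.7%, Synopsys 140.9%, Cadence 167.2%).

def cagr(total_growth_pct, years=5):
    """Compound annual growth rate, in percent."""
    return ((1 + total_growth_pct / 100) ** (1 / years) - 1) * 100

print(f"Market:   {cagr(59.3):.1f}%")   # ~9.8%
print(f"ARM:      {cagr(33.7):.1f}%")   # ~6.0%
print(f"Synopsys: {cagr(140.9):.1f}%")  # ~19.2%
print(f"Cadence:  {cagr(167.2):.1f}%")
```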

IPnest has also calculated the IP vendors ranking by License and royalty IP revenues:

Synopsys is the clear #1 by IP license revenues with 31.2% market share in 2021, while ARM is #2 with 25.6%. Alphawave, created in 2017, is now ranked #4 just behind Cadence, showing how essential high-performance SerDes IP is for modern data-centric applications (Alphawave is the leader for PAM4 112G SerDes, available in 7nm, 5nm and 3nm from various foundries: TSMC, Samsung and Intel-IFS).

SemiWiki readers shouldn't be surprised, as I predicted the importance of SerDes IP in a blog written in 2012, "Such a small piece of Silicon, so strategic PHY IP": http://www.semiwiki.com/forum/content/1241-such-small-piece-silicon-so-strategic-phy-ip.html

In fact, Synopsys' good performance is partly related to their strong focus on the wired interface category, where they enjoy 55.6% of a $1.3B market, and high-performance SerDes is the main pillar of the interconnect market. Synopsys has adopted a "One-Stop-Shop" strategy, supporting almost all protocols (USB, PCIe, Ethernet, SATA, HDMI, MIPI, DDR memory controller) and enjoying a leading market share in every protocol.

Alphawave is complementary in the sense that their strategy is more "Stop-For-Top," restricting their support to the most advanced products on leading-edge technology nodes. Looking at the 2021 Design IP results, both can be successful while following different strategies and market positioning.

The 2021 ranking for royalty shows ARM's dominance with 60.8% market share, not a surprise considering their installed customer base and their strong position in the smartphone industry. More surprising is the comeback of SST and Imagination Technologies (IMG), respectively #2 and #3 in this Top 5.

SST is benefiting from the microcontroller upturn, as they equip the majority of microcontroller products sold. IMG has been able to overcome the air pocket generated by Apple a few years ago and reposition itself as a modern GPU provider in various segments beyond smartphones, like automotive entertainment, smart TVs and tablets.

With 19.4% YoY growth in 2021, the Design IP industry is simply confirming how incredibly healthy this niche is within the semiconductor market, and the 2016 to 2021 CAGR of 9.8% is a good metric! IPnest has also run a 5-year forecast (not yet published) for Design IP, weighing in at $11B in 2026 for a future CAGR (2021 to 2026) of 15%. Optimistic? The 2021 year-over-year growth is in line with this prediction…
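The $11B figure is easy to sanity-check: compounding the 2021 base of $5.45B at 15% per year for five years lands right on it.

```python
# Compound the 2021 Design IP market ($5.45B) at the forecast 15% CAGR
# through 2026 to reproduce IPnest's ~$11B projection.
value = 5.45
for year in range(2022, 2027):
    value *= 1.15
print(f"2026 projection: ${value:.2f}B")  # ≈ $10.96B
```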

Eric Esteve from IPnest

To buy this report, or just to discuss IP, contact Eric Esteve (eric.esteve@ip-nest.com)

Also read:

Chiplet: Are You Ready For Next Semiconductor Revolution?

IPnest Forecast Interface IP Category Growth to $2.5B in 2025

Design IP Sales Grew 16.7% in 2020, Best Growth Rate Ever!


Podcast EP76: Geopolitical Forces on Semis, the Past, Present and Future with Terry Daly
by Daniel Nenni on 05-04-2022 at 10:00 am

Dan is joined by Terry Daly, a 35-year veteran of the semiconductor industry, former senior VP at GLOBALFOUNDRIES and an executive at IBM Microelectronics. Terry is currently an independent consultant, and also a Senior Fellow at the Council on Emerging Market Enterprises at The Fletcher School of Law & Diplomacy at Tufts University.

Dan and Terry discuss the various sanctions, subsidies, competition and legislation associated with the semiconductor industry, with a view of how we got here and where it all may take us.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker's employer, organization, committee or any other group or individual.


Tensilica Edge Advances at Linley
by Bernard Murphy on 05-04-2022 at 6:00 am


The Linley spring conference this year had a significant focus on AI at the edge, with all that implies. Low power/energy is a key consideration, though increasing performance demands for some applications are making this more challenging. David Bell (Product Marketing at Tensilica, Cadence) presented the Tensilica NNE110 engine to boost DSP-based AI, using a couple of smart speaker applications to illustrate its capabilities. Amid a firehose of imaging AI in the media, I for one am always happy to hear more about voice AI. The day when voice banishes keyboards can’t come soon enough for me 😎 These Tensilica Edge advances also support vision applications naturally.

The Need

DSPs are strong platforms for ML processing since ML needs have much in common with signal processing. Support for parallelism and accelerated MAC operations has been essential in measuring, filtering and compressing analog signals for many decades. The jump to ML applications is obvious. As those algorithms rapidly evolve, DSP architectures are also evolving for more parallelism, more MACs and more emphasis on keeping big data sets (weights, images, etc) on-chip for as long as possible to limit latencies and power.

Another area of evolution is in specialized accelerators to augment the DSP for specialized functions with even lower latency and power. In voice-based applications, two very important examples are noise suppression and trigger word detection. In noise suppression, intelligent filtering can now do better than conventional active noise filtering. Trigger word detection must be always-on, running at ultra-low power to allow the rest of the system to remain off until needed. Recognizing trigger words requires ML, at ultra-low power.

Meeting these needs with NNE110

A now-popular method for de-noising is based on an LSTM network trained to separate speech from environmental noise, which allows adapting across a wide variety of environments. Profiling reveals that 77% of the operations in a pure DSP implementation are matrix and vector operations, and about half of the remaining operations are activation functions such as sigmoid or tanh. These are obvious candidates to run on the accelerator. Comparing pure DSP and DSP+NNE implementations, both latency and power improve by over 3X. For a different de-noising algorithm, latency and power reduce even more dramatically, by 12X and 15X respectively. This is for a CNN based on U-NET, here adapted from a different domain.
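As a back-of-envelope check (my own estimate, not from the talk), an Amdahl's-law calculation shows these numbers are self-consistent: with ~77% matrix/vector ops plus roughly half of the remaining 23% as activations, about 88.5% of the work is offloadable, and the accelerator only needs to run those ops about 4x faster than the DSP to yield the reported ~3x end-to-end gain.

```python
# Amdahl's-law estimate: overall speedup when a fraction f of the work
# is offloaded to an accelerator that runs it s times faster.

def overall_speedup(offload_fraction, accel_speedup):
    return 1 / ((1 - offload_fraction) + offload_fraction / accel_speedup)

f = 0.77 + 0.5 * 0.23          # ≈ 0.885 of the workload is offloadable
for s in (2, 4, 8, 1000):
    print(f"accelerator {s:>4}x faster -> overall {overall_speedup(f, s):.2f}x")
# Even an infinitely fast accelerator caps out near 1/(1-f) ≈ 8.7x, which
# is why the remaining DSP-side ops matter so much.
```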

Implementation

The NNE accelerator looks like it slips in very cleanly to the standard Tensilica XAF flow. In mapping instructions from TensorFlow Lite for Microcontrollers, standard Tensilica HiFi options are reference ops and HiFi optimized ops. NNE ops are just another option connected through a driver to the accelerator. In development, supported operations simply map to the accelerator rather than one of the other classes of ops.

David pointed out that multiple applications can benefit from this fast and very low-power always-on extension. This is in the visual domain as well as in voice recognition. Obvious candidates include trigger word recognition, visual wake words, gesture detection and more.

If you want to learn more, you probably had to be registered for the Linley conference to get the slides; however, Cadence has a web page on NNE. You can also learn more about the LSTM algorithm HERE and the U-NET algorithm HERE.

Also read:

ML-Based Coverage Refinement. Innovation in Verification

Cadence and DesignCon – Workflows and SI/PI Analysis

Symbolic Trojan Detection. Innovation in Verification

 


Bigger, Faster and Better AI: Synopsys NPUs
by Kalar Rajendiran on 05-03-2022 at 10:00 am

ARC NPX6 440 TOPS

AI-based applications are advancing fast with evolving neural network (NN) models, pushing aggressive performance envelopes. Just a few years ago, performance requirements of NN-driven applications were at 1 TOPS and less. Current and future applications in the areas of augmented reality (AR), surveillance, high-end smartphones, ADAS vision/LiDAR/RADAR, high-end gaming and more are calling for 50 TOPS to 1000+ TOPS. This trend is driving the development of neural processing units (NPUs) to handle these demanding requirements.

Pierre Paulin, Director of R&D, Embedded Vision at Synopsys gave a talk on NPUs at the Linley Spring Conference April 2022. His presentation was titled “Bigger, Faster and Better AI: Synopsys NPUs” and covered their recently announced ARC NPX6 and ARC NPX6FS processors. This post is a synopsis of the salient points from his talk.

Embedded Neural Network Trends

Four factors contribute to the increasing levels of performance requirements of artificial intelligence (AI) applications.

  • AI research is evolving and new neural network models are emerging. Solutions must be able to handle models such as AlexNet from 2012 as well as the latest models such as transformer and recommender graphs.
  • With the automotive market being a big adopter of AI, applications need to meet functional safety standards. This market requires mature and stable solutions.
  • Applications are leveraging higher-definition sensors, multiple camera arrays and more complex algorithms. This calls for parallel processing of data from multiple types of sensors.
  • All of the above push more requirements onto the SoCs implementing and supporting the AI applications. The hardware and software solutions should enable ever-faster time to market.

Synopsys’ New Neural Processing Units (NPUs)

Synopsys recently introduced their new NPX series of NPUs to deliver performance, flexibility and efficiency demanded by the latest NN trends.

The NPU core of the NPX6 offering is based on a scalable architecture with 4K-MAC building blocks. A single NPU instance can be built from 1 to 24 NPU cores, and a multi-NPU configuration can include up to 8 NPU instances. Synopsys also offers the NPX6FS to support the automotive market. Refer to the figures below for the corresponding block diagrams.
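The scaling limits above imply a wide configuration range. A small illustrative calculator (my own arithmetic from the stated limits, not a Synopsys tool):

```python
# MAC-count arithmetic from the stated configuration limits:
# 4K MACs per core, 1-24 cores per instance, up to 8 instances.

MACS_PER_CORE = 4096

def total_macs(cores_per_instance, instances=1):
    assert 1 <= cores_per_instance <= 24 and 1 <= instances <= 8
    return MACS_PER_CORE * cores_per_instance * instances

print(total_macs(1))      # smallest single-core instance: 4,096 MACs
print(total_macs(24))     # largest single instance: 98,304 MACs
print(total_macs(24, 8))  # maximum multi-NPU configuration: 786,432 MACs
```

So the same architecture spans nearly 200x in raw MAC count, which is how one IP line can cover the 50 TOPS to 1000+ TOPS range mentioned earlier (actual TOPS depends on clock frequency and utilization).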

The key building block within the NPU core is the convolution accelerator. Synopsys' main focus was on MAC utilization for handling the most modern graphs, such as EfficientNet. The NPX6/NPX6FS contain a generic tensor accelerator to handle the non-convolution parts, and it fully supports the Tensor Operator Set Architecture (TOSA).

A high bandwidth, low latency interconnect is included within the NPU core and is coupled with high-bandwidth L1 and L2 memories. The NPX6 also includes an intelligent broadcast feature which works as follows. Anytime a feature map or coefficient is read from external memory, it is read only once and reused as much as possible within the core. The data is broadcast only when used by more than one core.
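A toy model of that read-once/broadcast behavior (my own illustration, not Synopsys' implementation) makes the idea concrete: each tensor is fetched from external memory at most once, then delivered locally or broadcast when several cores consume it.

```python
# Toy model of the intelligent broadcast feature: external memory is read
# once per tensor; delivery is a broadcast only when >1 core needs the data.

class BroadcastFabric:
    def __init__(self):
        self.external_reads = 0   # count of (expensive) external-memory fetches
        self.on_chip = set()      # tensors already brought on-chip

    def fetch(self, tensor_id, consumer_cores):
        if tensor_id not in self.on_chip:
            self.external_reads += 1      # read from external memory only once
            self.on_chip.add(tensor_id)
        if len(consumer_cores) > 1:
            return f"broadcast {tensor_id} to cores {sorted(consumer_cores)}"
        return f"deliver {tensor_id} to core {next(iter(consumer_cores))}"

fabric = BroadcastFabric()
print(fabric.fetch("weights_L1", {0, 1, 2, 3}))  # one external read, broadcast
print(fabric.fetch("weights_L1", {0}))           # served on-chip, no new read
print(fabric.external_reads)                     # 1
```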

Of course, the hardware is only half the story. The other half is software and Synopsys has been working on the entire effort for many years to deliver a solution that is fully automatic. Some of the key features/functionality are mentioned below.

Flexibility

With every new NN model comes a new activation function. The NPX6/NPX6FS cores support all activation functions (old, new and ones yet to come) using a programmable lookup table approach.
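The lookup-table idea is worth sketching: any scalar activation can be tabulated once and then evaluated by interpolation, so supporting a brand-new function needs only a new table, not new hardware. This is a minimal illustration assuming simple linear interpolation; the NPX6's actual table size and interpolation scheme are not public.

```python
# Minimal LUT-based activation evaluation: tabulate any scalar function
# over a range, then approximate it by linear interpolation.
import math

def build_lut(fn, lo=-8.0, hi=8.0, entries=256):
    step = (hi - lo) / (entries - 1)
    return [fn(lo + i * step) for i in range(entries)], lo, step

def lut_eval(lut, x):
    table, lo, step = lut
    pos = min(max((x - lo) / step, 0.0), len(table) - 1.000001)
    i = int(pos)
    frac = pos - i
    return table[i] * (1 - frac) + table[i + 1] * frac  # linear interpolation

sigmoid = lambda x: 1 / (1 + math.exp(-x))
lut = build_lut(sigmoid)
print(lut_eval(lut, 0.0), sigmoid(0.0))  # both close to 0.5
print(lut_eval(lut, 1.5), sigmoid(1.5))  # LUT tracks the exact value closely
```

Swapping `sigmoid` for `math.tanh`, GELU, or any future activation only changes the table contents, which is the flexibility argument being made.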

Enhanced datatype support

Though the industry is moving toward 8-bit datatype support, there are still cases where a mix of datatypes is appropriate. Synopsys provides a tool that automatically explores hybrid versions, with a couple of layers in 16-bit and all other layers in 8-bit. The NPX6 supports FP16 and BF16 (as options) with very low overhead. Customers are taking this option to move quickly from a GPU-oriented, power-hungry solution to an embedded, low-power, small-form-factor solution.

Latency reduction

Instead of pipelining, the NPX architecture takes an approach of parallelizing a convolutional layer on multiple cores to deliver both higher throughput and lower latency.

Power Efficiency

The NPX6 is able to achieve 30 TOPS/W in 5nm, which is an order of magnitude better than many solutions out there today.

Bandwidth Reduction

With a machine running at over 100 TOPS, the NPX6 is able to handle the bandwidth requirement with a LPDDR4/LPDDR5 class of memory interface.

Benchmark Results

Refer to the figure below for benchmark results, using frames per second per watt as the comparison metric.

On-Demand Access to Pierre’s entire talk and presentation

You can listen to Pierre’s talk from here, under “Keynote and Session 1.”  You will find his presentation slides here, under “Day 1 – Keynote – AM Sessions.”

Also read:

The Path Towards Automation of Analog Design

Design to Layout Collaboration Mixed Signal

Synopsys Tutorial on Dependable System Design

 

 


Future.HPC is Coming!
by Daniel Nenni on 05-03-2022 at 6:00 am


According to the experts, the semiconductor industry is poised for a decade of growth and is projected to become a trillion dollar industry by 2030. In 2021 the semiconductor industry finally hit $600B so $1T by 2030 seems like a big ask, but not really if you look at the indicators inside the semiconductor ecosystem. Foundries, EDA, IP, and other ecosystem markers grew at record levels in 2021. The $1T question is: What will be the next big driver in the next decade for semiconductors?

The answer, of course, is high performance computing (HPC), and if you want to learn more about HPC market segments, Argonne National Laboratory, 3M, and Google are a great place to start:

Future.HPC Virtual Event

Register Here

Compute Intelligence for Breakthrough Results

Altair’s flagship high-performance computing event looks at outstanding HPC-driven end results customers have realized in the last year. HPC professionals from across the globe will share how they empower their organizations to turn CPUs and GPUs into groundbreaking medical research, faster planes and automobiles, smaller chips, smarter financial models, and so much more.

Featuring a mix of leadership presentations, panel discussions, breakout sessions, and networking opportunities, attendees can connect with and learn from fellow designers, engineers, IT specialists, and data scientists on the latest technology topics influencing every industry and the world around us.

All presentations will have live audio translations into French, Spanish, German, Italian, and Portuguese.  The event will be presented in two time zones – CEST (Europe & APAC) and EDT (AMER).

Join experts from Argonne National Laboratory, 3M, Google, WIRED Editor-in-Chief Greg Williams, and many more, for leadership presentations, roundtables, breakout sessions, and networking opportunities. Whether you’re an HPC pro or an end-user who prefers to keep the complexity “under the hood,” Future.HPC is the place to connect virtually with designers, engineers, IT specialists, and data scientists accelerating innovation timelines in your industry.

Day One Tuesday May 17, 2022

10:00 AM (Paris); 1:30 PM (India); 4:00 PM (Shanghai/Kuala Lumpur)
11:00 AM (New York); 8:00 AM (San Francisco); 10:00 AM (Mexico City); 12:00 PM (Sao Paulo)

Welcome and Introduction

Dr. Rosemary Francis, Chief Scientist, Altair
Rick Watkins, Senior Director Cloud Computing, Altair

Altair Keynote

James R. Scapa, Founder, Chairman, & CEO, Altair,
Joe Sorovetz, SVP Enterprise Solutions, Altair

Harnessing the Great Acceleration

Greg Williams, Editor-in-Chief, WIRED Magazine

The next big technology trends that will be driven by supercomputing.

3M: Accelerating Cloud Adoption for HPC Workloads with Altair PBS Professional and Altair Control

Gabe Turner, HPC Solutions Architect, 3M

With approximately 60,000 products spanning four business groups, 3M is constantly improving products to fit the needs of its customers. Modeling and simulation touch many stages of the product development process at 3M and high-performance computing is critical to accommodating such computationally intensive workloads. 3M has been using Altair PBS Professional for many years to great success, and in 2018 became an early adopter of Altair Control and its cloud bursting capabilities for HPC.

Google: Feature Partner Presentation

Dr. William (Bill) Magro, Chief Technologist, High-Performance Computing , Google

Building the Future of HPC Workload Management at ANL

William Allcock, ALCF Advanced Integration Group, Argonne National Laboratory

As Argonne National Laboratory prepares to support researchers tackling critical problems on its Polaris supercomputer and eventually its Exascale system, Aurora, William (Bill) Allcock, Manager, ALCF Advanced Integration Group, outlines the critical role of workload management in advancing what’s possible with HPC. He will share how his team’s initial exploration of open-source workload management technology led to their adoption of Altair’s commercial solution, PBS Professional, in 2021, and to a partnership between HPC experts at ANL and Altair that will …

HPC at Punch Torino

Mauro Bighi, CIO PUNCH Group

Panel Discussion on High-performance Computing: Decades of Technological Advancement Give Drastically New Meaning to “HPC”

Dr. Bill Nitzberg, Chief Engineer – HPC, Altair
Fritz Ferstl, SVP Software Development, Altair
Stuart Taylor, Director Software Development, Altair

At this moment, Altair has the largest brain trust of HPC expertise in the world. Join the experts who have shepherded technology as original as PBS, Sun Grid Engine and Runtime Design Automation into today’s high stakes commercial HPC space. They’ll discuss the decades of advancement, development and acquisition that went into assembling the most comprehensive HPC optimization portfolio on the market, preparing us for an all new definition of “high performance computing.”

Day Two Wednesday May 18, 2022

10:00 AM (Paris); 1:30 PM (India); 4:00 PM (Shanghai/Kuala Lumpur)
11:00 AM (New York); 8:00 AM (San Francisco); 10:00 AM (Mexico City); 12:00 PM (Sao Paulo)
Welcome and Introduction

Dr. Rosemary Francis, Chief Scientist, Altair

A Cloud for Every Workload: Which Model is Right for You?

Rick Watkins, Senior Director Cloud Computing, Altair

As the leading provider of workload management and compute optimization technology for three decades, Altair has helped enterprise computing customers at the cutting edge of HPC move critical workloads to the cloud. Now, we’re helping thousands of simulation software end users enticed by the promise of increasing exploration potential with “infinite” scalability do the same. Whether you’re an HPC expert looking for tools to tune and optimize or an end user looking to boost productivity without IT complexity, join cloud.

Parallel Track – Cloud for End Users: Turbocharge Engineering Productivity, No IT Expertise Required

Raghvendra Srivastava, Product Manager, Altair

HPC and cloud accessibility make it possible to scale exploration exponentially, but the key to breakthrough results is ensuring time spent interfacing with IT doesn’t scale accordingly. In this breakout session just for simulation application end users, attendees will see how the most competitive teams access their software, their data, and powerful HPC resources to turbocharge productivity from anywhere in the world. From launching software on any device and accelerating jobs with on-demand solving power to seamlessly visualizing and sharing.

Parallel Track – Cloud for HPC Pros: Expand HPC Infrastructure On-demand with Cost-effective, Multi-cloud Scaling

Ian Littlewood, Product Manager Enterprise Computing, Altair

Beyond empowering your team with flexible compute resources to scale productivity, cloud bursting technology provides tuning and automation opportunities that make a real impact on the metrics that matter most to your organization. Join this breakout session for HPC stakeholders to see how to manage, optimize and forecast compute resources, bursting to and between your on-prem resources and Oracle Cloud Infrastructure, Google Cloud Platform, Microsoft Azure, and Amazon Web Services (AWS). "It just works" may be the magic words for end.

Join us for our flagship HPC event for 2022!

Register Now to Receive Agenda Updates

 

About Altair
Altair is a global leader in computational science and artificial intelligence (AI) that provides software and cloud solutions in simulation, high-performance computing (HPC), data analytics, and AI. Altair enables organizations across all industries to compete more effectively and drive smarter decisions in an increasingly connected world – all while creating a greener, more sustainable future. For more information, visit https://www.altair.com/.

Also Read:

Six Essential Steps For Optimizing EDA Productivity

Latest Updates to Altair Accelerator, the Industry’s Fastest Enterprise Job Scheduler

Chip Design in the Cloud – Annapurna Labs and Altair


IP Subsystems and Chiplets for Edge and AI Accelerators
by Daniel Payne on 05-02-2022 at 10:00 am

Scalable Chiplet Platform

From a business viewpoint, we often read in the technical press about the virtues of applying AI. In the early days most of the AI model building was done in the cloud because of the high computation requirements, yet there's now a developing trend to use AI accelerators at the edge. The other mega-trend of the past decade is that the RISC-V ISA has been applied to more and more tasks, and the momentum is only growing. Ketan Mehta from OpenFive presented at IP-SoC Silicon Valley 2022 in April, so I attended to see what's happening with RISC-V, chiplets, the edge and AI accelerators.

OpenFive was founded in 2003, named Open-Silicon at that time, and venture-funded; it has grown swiftly to over 600 people while providing custom silicon design services resulting in over 350 tape-outs. They have engineering expertise in RISC-V, memory IP, and connectivity IP like chip-to-chip, die-to-die, and even chiplets.

The data center is morphing as the demands of HPC (High Performance Computing) continue to build, so processors now connect to accelerator NICs using the CXL standard, and even HBM (High Bandwidth Memory) devices use accelerators to connect with processors through CXL. Memory IP for cache is often used inside these accelerators. OpenFive uses their experience designing scalable chiplets to address some of these technology challenges, meeting system design requirements like:

  • Low latency – sub-10ns
  • Low footprint – edge devices, PCIe server cards
  • Low power – from 0.5W to 10W
  • High throughput – Tbps/mm

For connecting HBM, LPDDR IP and D2D (Die to Die) there are three IP products designed by OpenFive:

A scalable chiplet platform from OpenFive can have compute, memory and connectivity IP; with both subsystems (green) and custom IP (blue):

An example of an Edge AI system using four RISC-V cores, hardware accelerators, memory controllers, and IO connectivity, all at a power target under 5W was presented by Ketan:

Edge AI RISC-V platform

Engineers at OpenFive have already delivered several scalable chiplet platforms to customers in process nodes ranging from 5nm up to 16nm, using a variety of RISC-V cores, memory IP and interconnect IP combinations.

Chiplets are a way to combine multiple die in a single package to achieve higher yields at lower cost than a single SoC, while meeting power and throughput budgets. Two chiplet examples were provided: a CPU chiplet and an IO chiplet.

CPU chiplet
IO chiplet

Summary

Ketan’s presentation showed me how OpenFive has been able to design and then deliver silicon-proven subsystems across multiple applications, like: Edge, AI and HPC. Chiplet usage is now ramping up, as more system companies are able to optimize their ideas using disaggregated silicon die that are tuned for the workloads of their applications. Using a vendor with a large array of IP subsystems is a competitive advantage, as IP reuse provides time to market benefits.

Related Blogs


High Efficiency Edge Vision Processing Based on Dynamically Reconfigurable TPU Technology
by Kalar Rajendiran on 05-02-2022 at 6:00 am


While many tough computing problems have been solved over the years, vision processing is still challenging in many ways. Cheng Wang, Co-Founder and CTO of FlexLogix Technologies, gave a talk on edge vision processing at Linley's Spring 2022 conference. During the talk he referenced how Gerald Sussman took the early steps of computer vision processing way back in 1966: Gerald, then a first-year undergraduate student under the guidance of MIT AI Lab co-founder Marvin Minsky, tried to link a camera to a computer. Much progress has happened since then. Of course, the requirements and markets for computer vision haven't stayed static during this time.

The early era of computer vision processing focused on industrial grade computing equipment that tolerated large form factors and high costs of the solutions. Fast forward to the most recent decade, neural network models and GPUs have played critical roles in advancing vision processing capabilities. But delivering solutions in smaller form factors and at low costs is still a challenge. In his talk, Cheng discusses the reasons behind these challenges and FlexLogix’s solution to edge vision processing based on dynamically reconfigurable TPU technology. The following are some excerpts from his presentation.

Performance, Efficiency and Flexibility

Edge computer vision requires an extreme amount of processing, at teraops rates. Vision solutions need to demonstrate high accuracy at low latency, operate at low power, and be available at low cost points. While GPUs can deliver the performance, they are large, expensive and power-hungry, and thus not a good match for edge compute devices. GPUs also count on a huge amount of memory bandwidth via DDR-type interfaces. On top of these challenges, the neural models are fast evolving: not only are new models emerging at a rapid rate, even existing models undergo frequent incremental changes. Refer to the figure below to see how frequently the popular YOLOv5 model goes through changes.

The processing of neural network models is very different from general-purpose processing when it comes to compute workload and memory access patterns. Each layer may present a different computational load relative to the memory bandwidth it requires, and this changes dynamically as different layers are processed. So an optimal approach to solving these challenges counts on memory efficiency and future-proofing for changing models. Graph streaming helps reduce DRAM requirements, but bandwidth matching on a varying load is difficult.
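The layer-to-layer variation is easy to quantify with arithmetic intensity (MACs per byte moved), the ratio that determines whether a layer is compute-bound or bandwidth-bound. A small sketch, using made-up but representative convolution layer shapes:

```python
# Arithmetic intensity (MACs per byte of data moved) for a conv layer.
# Layer shapes below are illustrative examples, not from the talk.

def conv_layer_stats(h, w, cin, cout, k, bytes_per_el=1):
    macs = h * w * cin * cout * k * k
    traffic = bytes_per_el * (h * w * cin            # input feature map
                              + h * w * cout         # output feature map
                              + k * k * cin * cout)  # weights
    return macs, macs / traffic

for name, shape in [("early layer", (224, 224, 3, 32, 3)),
                    ("middle layer", (28, 28, 256, 256, 3)),
                    ("late 1x1 layer", (7, 7, 1024, 1024, 1))]:
    macs, intensity = conv_layer_stats(*shape)
    print(f"{name:>14}: {macs/1e6:7.1f} MMACs, {intensity:8.1f} MACs/byte")
```

The intensity swings by more than an order of magnitude across these layers, which is exactly why a statically provisioned memory system is either over-built or a bottleneck, and why dynamic bandwidth matching matters.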

FlexLogix’s Dynamic TPU

FlexLogix’s Dynamic TPU offers a flexible, load-balanced, memory-efficient solution for edge vision processing applications.

The Dynamic TPU is implemented using Tensor Processor Arrays (ALUs) and EFLX logic. The architecture enables very efficient layer processing across multiple Tensor Processor Arrays, which communicate via FlexLogix's XFLX interconnect and access L2 SRAM for memory efficiency. Because the TPU uses EFLX cores, the control and data paths are future-proofed against changes in activation functions and operators. By streaming data at a sub-graph level, more efficient bandwidth matching is made possible. Refer to the figure below.
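As a rough illustration of sub-graph-level streaming (a sketch under assumed numbers, not the actual InferX scheduler), the idea is to cut the layer graph wherever an activation pair will not fit in on-chip L2 SRAM, so that only sub-graph boundaries touch external memory:

```python
# Illustrative sketch only: greedily split a linear chain of layers into
# sub-graphs whose producer/consumer activation pair fits in an assumed
# L2 SRAM budget. Activations inside a sub-graph stream through L2;
# only activations at sub-graph boundaries spill to external DRAM.

L2_BUDGET = 4 * 1024 * 1024  # hypothetical 4 MB on-chip L2

def partition_subgraphs(act_bytes, budget=L2_BUDGET):
    """act_bytes[i] = output activation size (bytes) of layer i."""
    groups = [[0]]
    for i in range(1, len(act_bytes)):
        # Layer i reads activation i-1 and writes activation i; both must
        # fit in L2 for the pair to stay inside one streamed sub-graph.
        if act_bytes[i - 1] + act_bytes[i] <= budget:
            groups[-1].append(i)
        else:
            groups.append([i])  # cut here: activation i-1 goes through DRAM
    return groups

# Hypothetical activation sizes for a 5-layer chain
acts = [3_000_000, 2_000_000, 1_000_000, 5_000_000, 500_000]
print(partition_subgraphs(acts))  # layers grouped into streamed sub-graphs
```

Fewer sub-graph boundaries means fewer DRAM round trips, which is why streaming at the sub-graph level rather than layer by layer improves bandwidth matching.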

While a GPU-based edge vision processing solution may consume power in the 75W-300W range, a Dynamic TPU-based solution consumes in the 6W-10W range. Whereas a GPU-based solution predominantly relies on GDDR, a Dynamic TPU-based solution relies on local connections, XFLX connections, flexible L2 memories, and LPDDR.

The FlexLogix solution includes the InferX SDK, which directly converts a TensorFlow graph model to a dynamic InferX hardware instance. A Dynamic TPU-based solution yields much higher efficiency on the Inference/Watt and Inference/$ metrics than a GPU- or CPU-based solution. All in all, it delivers superior performance, with software flexibility and future-proofing versus ASIC solutions.

On-Demand Access to Cheng’s talk and presentation

You can listen to Cheng's talk here, under Session 5. You will find his presentation slides here, under Day 2 - PM Sessions.

Also read:

A Flexible and Efficient Edge-AI Solution Using InferX X1 and InferX SDK

Flex Logix and Socionext are Revolutionizing 5G Platform Design

Using eFPGA to Dynamically Adapt to Changing Workloads

 


ITSA – Not So Intelligent Transportation

ITSA – Not So Intelligent Transportation
by Roger C. Lanctot on 05-01-2022 at 10:00 am

ITSA Not So Intelligent Transportation

The Infrastructure Investment and Jobs Act (IIJA) passed last year in the U.S. earmarks billions of dollars that can be used for the deployment of potentially life-saving C-V2X car connectivity technology. The U.S. Department of Transportation and state DOTs are poised to commence that spending, but one thing stands in the way of car maker or state DOT willingness to proceed – a lawsuit by the Intelligent Transportation Systems of America (ITSA) and the American Association of State Highway Transportation Officials (AASHTO).

ITSA and AASHTO are seeking to reverse the Federal Communication Commission’s (FCC) re-allocation of 45MHz of spectrum in the 5.9GHz band – previously preserved for dedicated short range communication (DSRC) use by automobiles – for unlicensed Wi-Fi use. ITSA and AASHTO want the 45MHz restored.

The judge is unlikely to reverse the FCC's unanimous decision, and probably does not have the authority to do so. The only possible path to success for ITSA and AASHTO would be for the judge to find the FCC's decision-making process somehow flawed. This is highly unlikely – which means the legal action is a waste of time and money and maybe…lives.

Ironically, ITSA and AASHTO wave the bloody flag in their efforts to preserve the prior spectrum allocation – claiming those efforts are intended to save lives in the best way possible, with connected car technology. In reality, their efforts further delay the near-term adoption of connected car technology.

ITSA CEO Laura Chase comments in an opinion piece in the latest ITSA magazine:

“While there is no magic bullet to reduce crashes and fatalities, we have a responsibility to use all the tools at our disposal to save lives. The best tool we currently have is connected vehicle technologies – but without wide-scale deployment, we can’t hope to move the needle on reducing traffic fatalities.”

Well, Laura, we can't move that needle as long as ITSA and AASHTO continue to inject uncertainty into the regulatory process. Car makers need clarity, alignment, and commitment. State and Federal contracting authorities, too, need clarity. ITSA and AASHTO have muddied the waters and put a bullet in the head of potentially life-saving infrastructure projects incorporating C-V2X technology.

Even stranger, Chase says ITSA is simultaneously working on reimbursement for stranded DSRC deployments – something that is already provided for in the IIJA. The only good news from ITSA is that the organization appears to be "accepting" the FCC's overt endorsement of cellular-based C-V2X technology over DSRC. Thank goodness for small things.

The legal action by ITSA and AASHTO means that car companies and state DOTs are frozen. Proposals can't be written and money cannot be allocated until the case is resolved.

Multiple car companies and DOTs have applied for waivers from the FCC to proceed with their projects – but the FCC has not even posted the waiver requests, which are subject to public comment. ITSA and AASHTO have gummed up the very process for which they have worked for more than 20 years – to bring V2X technology to the market.

A senior General Motors executive speaking as part of a 5.9GHz forum at the recent ITSA event in Charlotte, N.C., said, of this legal action: “You lost half of the dedicated 5.9GHz spectrum because you did nothing with it for 20 years. If you don’t do something with it now you’re likely to lose the rest of it.”

Worse, though, is the reality that the legal action by ITSA and AASHTO is actually costing the U.S. valuable time in the race to compete with China. China long ago abandoned DSRC as the primary connected car technology in favor of C-V2X.

As many as 13 auto makers in China have either already introduced C-V2X-equipped vehicles or have announced plans to do so. In the U.S., Ford Motor Company, Audi of America, and Jaguar Land Rover and multiple state DOTs have submitted waiver requests to introduce the technology – and, so, they wait.

ITSA and AASHTO are on the wrong side of history. These organizations are wasting time, money, and lives in the interest of turning back the clock. The FCC has spoken. The spectrum has been allocated. The billions of dollars have been approved. It’s time for ITSA and AASHTO to simply get out of the way.

Also read:

OnStar: Getting Connectivity Wrong

Tesla: Canary in the Coal Mine

ISO 26262: Feeling Safe in Your Self-Driving Car


Has KLA lost its way?

Has KLA lost its way?
by Robert Maire on 05-01-2022 at 6:00 am

KLA SPIE 2022

-KLA has another great QTR in face of overwhelming demand
-Supply chain issues obliterated by backlog
-Longer term technology leadership concerns are increasing
-We see limited upside near term & remain cyclically cautious

Another great quarter- demand remains super strong

KLA's performance remains great, as does overall semiconductor equipment demand. KLA reported revenues of $2.3B versus expectations of $2.2B, and non-GAAP EPS of $5.13 versus the street's $4.82. Guidance was for revenues of $2.3B to $2.55B versus current expectations of $2.36B, with earnings in the range of $4.93 to $6.03 versus current expectations of $5.30.

KLA can “dial in” numbers given the huge backlog

Historically, KLA has almost always been able to accurately dial in numbers for the next quarter, given the huge and long backlog it carries. The current backlog is out the door and down the street, and not likely to shorten any time soon.

This gives KLA the ability to both guide and deliver numbers wherever it wishes. Some segments remain a little bit lumpy due to high selling prices or mix shifts. With most deliveries running at a year or more and over $8B in solid orders, we don't see a lot of risk to the backlog right now. We have seen backlog deflate in prior cycles, but never from the level we currently have.

Supply chain issues remain but not very impactful

Supply chain issues remain "fluid," but the large backlog clearly mitigates most, if not all, of that instability. Compared to other companies in the industry that typically run a more turns-based business, KLA can modulate production to adapt to shortages.

Yield management continues to be a crucial market segment

Growth in all things yield management remains very strong, especially in emerging markets that have a lot more to learn when it comes to semiconductor manufacturing. This means China, which remains a huge market for semiconductor tool makers, including KLA.

This adds perhaps a bit more risk, but so far we don't think the US government is in a mood to upset the Chinese by turning up the heat on trade restrictions, given the Ukraine situation.

Has KLA lost its way?

KLA's first product, back in 1975, was reticle inspection, and it has been one of the two pillars of the company, along with wafer inspection, for the company's entire life. We think the reticle inspection pillar is weakening; though perhaps not gone completely, it has certainly lost the technology lead, and along with it the future business.

A former upstart pimple on KLA, Lasertec, has clearly taken the technology lead and, with it, the most profitable part of the reticle inspection market as well as its future. Lasertec likely has the dominant share of leading-edge reticle inspection revenue and will likely expand that lead.

Lasertec’s recent quarter

Lasertec recently announced its quarter, along with projections for the next twelve months of business, which it expects will come in at over $2B – versus KLA's just-reported $611M in the quarter in "patterning," which likely does not represent 100% pure reticle inspection tools.

More importantly, Lasertec is the only game in town in EUV actinic inspection. We believe KLA's actinic tool has been further delayed by issues with several hardware subsystems – not to mention that the price of xenon, the noble gas the system runs on, has recently skyrocketed from $10 a liter to over $200 a liter, if you can get it…and is still climbing.

We have heard that KLA's E-Beam reticle inspection tool (the 8XX) has not been popular with customers, and public data shows that E-Beam is just way too slow. But right now customers may settle for a slower 8XX tool, as actinic is years away from either Lasertec or, maybe eventually, KLA.

KLA may point to "print check" (AKA print and pray), which uses a wafer inspection tool to look at what the reticle has printed on the wafer, but it's an indirect (inferred-only) approach that is useless in a mask shop anyway. Actinic is clearly the gold standard, and only Lasertec has it.

Data points from SPIE

We recently attended SPIE (a conference about all things lithography), where a major chipmaker that is first in line for High NA EUV tools gave a talk on High NA reticle inspection and showed a picture of a system…and it wasn't KLA's.

So where the industry is, and is going, in reticle inspection is currently not KLA, and KLA may not have time to catch up given the delays. We have seen this movie before, as ASML was around 10-15 years delayed in getting EUV scanners to market. KLA's multi-year, self-imposed halt in the program certainly made things even worse.

KLA still does a great business in older-technology reticle inspection for all those second- and third-tier fabs in China, but that's not saying a lot.

Weakness in E Beam wafer tools

While reticle inspection may already be a fait accompli, we are also starting to get more concerned about wafer inspection. ASML recently announced a 5X5, 25-multibeam (not multicolumn…there is a difference) E-Beam wafer inspection tool in its Hermes division. ASML has been winning in wafer defect inspection, while AMAT has been exploding in the E-Beam wafer metrology market. KLA still dominates in optical, which is about four times the size of E-Beam, but clearly needs to catch up to ASML and AMAT in E-Beam.

The stock

The results and financials are great…as always. Demand remains super strong. We certainly are not concerned about the near term but have questions about the longer term especially when the market eventually slows.

Right now customers are desperate for tools and anything that will help the yields of ever more complex processes, so KLA is in a good seat. Perhaps not as good as ASML's, but second best.

Much of the current success is due to momentum, size, and desperation, not necessarily technology leadership. This makes us more concerned about the longer-term issues.

While 2022 seems almost "in the bag," we are more concerned about where things go when the tide goes out and exposes issues in the longer term.

From a valuation perspective, it's hard to fight the negative tape in chip stocks, and much of the strong performance, including a strong second half, is already baked into the numbers and expectations.

We don't see a lot of upside headroom in the stock and see more potential downside longer term, which would make us avoid putting more money to work here.

Also read:

LRCX weak miss results and guide Supply chain worse than expected and longer to fix

Chip Enabler and Bottleneck ASML

DUV, EUV now PUV Next gen Litho and Materials Shortages worsen supply chain