Tensilica Edge Advances at Linley
by Bernard Murphy on 05-04-2022 at 6:00 am


The Linley spring conference this year had a significant focus on AI at the edge, with all that implies. Low power/energy is a key consideration, though increasing performance demands for some applications are making this more challenging. David Bell (Product Marketing at Tensilica, Cadence) presented the Tensilica NNE110 engine to boost DSP-based AI, using a couple of smart speaker applications to illustrate its capabilities. Amid a firehose of imaging AI in the media, I for one am always happy to hear more about voice AI. The day when voice banishes keyboards can’t come soon enough for me 😎 These Tensilica Edge advances also support vision applications naturally.

The Need

DSPs are strong platforms for ML processing since ML needs have much in common with signal processing. Support for parallelism and accelerated MAC operations has been essential in measuring, filtering and compressing analog signals for many decades. The jump to ML applications is obvious. As those algorithms rapidly evolve, DSP architectures are also evolving for more parallelism, more MACs and more emphasis on keeping big data sets (weights, images, etc) on-chip for as long as possible to limit latencies and power.

Another area of evolution is in specialized accelerators to augment the DSP for specialized functions with even lower latency and power. In voice-based applications, two very important examples are noise suppression and trigger word detection. In noise suppression, intelligent filtering can now do better than conventional active noise filtering. Trigger word detection must be always-on, running at ultra-low power to allow the rest of the system to remain off until needed. Recognizing trigger words requires ML, at ultra-low power.

Meeting these needs with NNE110

A now popular method for de-noising is based on an LSTM network trained to separate speech from environmental noise, which allows it to adapt across a wide variety of environments. Profiling reveals that 77% of the operations in a pure DSP implementation are matrix and vector operations, and about half of the remaining operations are in activation functions such as sigmoid or tanh. These are obvious candidates to run on the accelerator. Comparing pure DSP and DSP+NNE implementations, both latency and power improve by over 3X. For a different de-noising algorithm, latency and power reduce even more dramatically, by 12X and 15X respectively. This is for a CNN based on U-NET, here adapted from a different domain.
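To make those profiling numbers concrete, here is a minimal NumPy sketch of one LSTM cell step; the layer sizes are my own illustrative assumptions, not Tensilica's configuration. It shows why matrix-vector MACs dominate the work, with sigmoid/tanh activations making up most of the remainder:

```python
# Minimal NumPy sketch of one LSTM time step, illustrating why an LSTM-based
# de-noiser is dominated by matrix/vector work plus sigmoid/tanh activations.
# Sizes are illustrative assumptions, not Tensilica's actual configuration.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """x: input frame, h/c: previous hidden/cell state.
    W, U: stacked weight matrices for the 4 gates; b: stacked biases."""
    z = W @ x + U @ h + b          # matrix-vector MACs: the bulk of the work
    i, f, g, o = np.split(z, 4)    # input, forget, candidate, output gates
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # activation functions
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

n_in, n_hid = 257, 256             # e.g. FFT bins in, hidden units (assumed)
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h = np.zeros(n_hid); c = np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)

# Rough op count: MACs in W@x and U@h vs. element-wise activation evaluations
macs = 4 * n_hid * (n_in + n_hid)
act_evals = 5 * n_hid              # 3 sigmoids + 2 tanhs per hidden element
print(f"MACs per step: {macs}, activation evaluations per step: {act_evals}")
```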

Implementation

The NNE accelerator looks like it slips very cleanly into the standard Tensilica XAF flow. When mapping operations from TensorFlow Lite for Microcontrollers, the standard Tensilica HiFi options are reference ops and HiFi-optimized ops. NNE ops are just another option, connected through a driver to the accelerator. In development, supported operations simply map to the accelerator rather than to one of the other classes of ops.
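The production integration sits in the TFLM C/C++ runtime; the sketch below is only a conceptual Python illustration of the routing idea, with the op names and selection policy being hypothetical rather than taken from Cadence's implementation:

```python
# Conceptual sketch (all names hypothetical) of how an op from a TensorFlow Lite
# for Microcontrollers graph might be routed to one of three kernel classes:
# reference ops, HiFi-optimized DSP ops, or NNE-accelerated ops.
NNE_SUPPORTED = {"FULLY_CONNECTED", "CONV_2D", "LOGISTIC", "TANH"}
HIFI_OPTIMIZED = {"FULLY_CONNECTED", "CONV_2D", "ADD", "MUL", "SOFTMAX"}

def select_kernel(op_name, nne_present=True):
    if nne_present and op_name in NNE_SUPPORTED:
        return "nne"        # dispatched through the accelerator driver
    if op_name in HIFI_OPTIMIZED:
        return "hifi"       # HiFi-optimized DSP kernel
    return "reference"      # portable fallback implementation

graph = ["CONV_2D", "LOGISTIC", "SOFTMAX", "RESHAPE"]
print({op: select_kernel(op) for op in graph})
```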

David pointed out that multiple applications can benefit from this fast and very low-power always-on extension. This is in the visual domain as well as in voice recognition. Obvious candidates include trigger word recognition, visual wake words, gesture detection and more.

If you want to learn more, you probably had to be registered for the Linley conference to get the slides; however, Cadence has a web page on NNE. You can also learn more about the LSTM algorithm HERE and the U-NET algorithm HERE.

Also read:

ML-Based Coverage Refinement. Innovation in Verification

Cadence and DesignCon – Workflows and SI/PI Analysis

Symbolic Trojan Detection. Innovation in Verification

 


Bigger, Faster and Better AI: Synopsys NPUs
by Kalar Rajendiran on 05-03-2022 at 10:00 am


AI-based applications are advancing fast with evolving neural network (NN) models, pushing aggressive performance envelopes. Just a few years ago, performance requirements of NN-driven applications were at 1 TOPS or less. Current and future applications in the areas of augmented reality (AR), surveillance, high-end smartphones, ADAS vision/LiDAR/RADAR, high-end gaming and more are calling for 50 TOPS to 1000+ TOPS. This trend is driving the development of neural processing units (NPUs) to handle these demanding requirements.

Pierre Paulin, Director of R&D, Embedded Vision at Synopsys gave a talk on NPUs at the Linley Spring Conference April 2022. His presentation was titled “Bigger, Faster and Better AI: Synopsys NPUs” and covered their recently announced ARC NPX6 and ARC NPX6FS processors. This post is a synopsis of the salient points from his talk.

Embedded Neural Network Trends

Four factors contribute to the increasing levels of performance requirements of artificial intelligence (AI) applications.

  • AI research is evolving and new neural network models are emerging. Solutions must be able to handle models such as AlexNet from 2012 as well as the latest transformer and recommender graphs.
  • With the automotive market being a big adopter of AI, applications need to meet functional safety standards. This market requires mature and stable solutions.
  • Applications are leveraging higher-definition sensors, multiple camera arrays and more complex algorithms. This calls for parallel processing of data from multiple types of sensors.
  • All of the above push more requirements onto the SoCs implementing and supporting the AI applications. The hardware and software solutions should enable ever quicker time to market.

Synopsys’ New Neural Processing Units (NPUs)

Synopsys recently introduced their new NPX series of NPUs to deliver performance, flexibility and efficiency demanded by the latest NN trends.

The NPU core of the NPX6 offering is based on a scalable architecture with 4K-MAC building blocks. A single NPU instance can be built from 1 to 24 NPU cores, and a multi-NPU configuration can include up to 8 NPU instances. Synopsys also offers the NPX6FS to support the automotive market. Refer to the Figures below for the corresponding block diagrams.
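As a rough sanity check on how such a configuration scales, here is a back-of-the-envelope peak-TOPS estimate; the clock frequency and the assumption of 2 ops per MAC per cycle are mine, not Synopsys-published figures:

```python
# Back-of-the-envelope peak-TOPS estimate for an NPU built from 4K-MAC cores.
# Clock frequency and configuration are assumptions for illustration only.
def peak_tops(cores_per_npu, npu_instances=1, macs_per_core=4096, freq_ghz=1.3):
    macs = cores_per_npu * npu_instances * macs_per_core
    ops_per_cycle = 2 * macs                  # 1 MAC = multiply + accumulate
    return ops_per_cycle * freq_ghz * 1e9 / 1e12

print(f"1 core:             {peak_tops(1):7.1f} TOPS")
print(f"24-core instance:   {peak_tops(24):7.1f} TOPS")
print(f"8 x 24-core system: {peak_tops(24, 8):7.1f} TOPS")
```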

The key building block within the NPU core is the Convolution Accelerator. Synopsys' main focus was on MAC utilization for handling the most modern graphs, such as EfficientNet. The NPX6/NPX6FS also contain a generic tensor accelerator to handle the non-convolution parts and fully support the Tensor Operator Set Architecture (TOSA).

A high bandwidth, low latency interconnect is included within the NPU core and is coupled with high-bandwidth L1 and L2 memories. The NPX6 also includes an intelligent broadcast feature which works as follows. Anytime a feature map or coefficient is read from external memory, it is read only once and reused as much as possible within the core. The data is broadcast only when used by more than one core.

Of course, the hardware is only half the story. The other half is software and Synopsys has been working on the entire effort for many years to deliver a solution that is fully automatic. Some of the key features/functionality are mentioned below.

Flexibility

With every new NN model comes a new activation function. The NPX6/NPX6FS cores support all activation functions (old, new and ones yet to come) using a programmable lookup table approach.
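A programmable lookup table works roughly as in the sketch below; the table size and linear interpolation scheme are illustrative assumptions rather than details of the NPX6 hardware:

```python
# Minimal sketch of a programmable lookup-table activation: the same table
# hardware can be reprogrammed for any activation function.
import numpy as np

def build_lut(fn, lo=-8.0, hi=8.0, entries=256):
    xs = np.linspace(lo, hi, entries)
    return xs, fn(xs)

def lut_activate(x, xs, ys):
    # Clamp to the table range, then linearly interpolate between entries
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, ys)

# Program the same table with two different activation functions
gelu = lambda v: 0.5 * v * (1 + np.tanh(0.7978845608 * (v + 0.044715 * v**3)))
for name, fn in [("tanh", np.tanh), ("gelu", gelu)]:
    xs, ys = build_lut(fn)
    x = np.linspace(-4, 4, 5)
    print(name, np.round(lut_activate(x, xs, ys), 4))
```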

Enhanced datatype support

Though the industry is moving toward 8-bit datatype support, there are still cases where a mix of datatypes is appropriate. Synopsys provides a tool that automatically explores hybrid versions, with a couple of layers in 16-bit and all other layers in 8-bit. The NPX6 supports FP16 and BF16 (as options) with very low overhead. Customers are taking this option to move quickly from a GPU-oriented, power-hungry solution to an embedded, low-power, small-form-factor solution.
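The sketch below illustrates the general idea of such an exploration, not Synopsys' actual tool: quantize every layer to 8-bit, measure per-layer error, and promote only the most sensitive layers to 16-bit. The error metric and threshold are my own illustrative choices:

```python
# Toy sketch of hybrid-precision exploration across layers of a network.
import numpy as np

def quantize(w, bits):
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(1)
# Five dummy weight tensors; layer2 is deliberately more sensitive (larger values)
layers = {f"layer{i}": rng.standard_normal((64, 64)) * (0.1 if i != 2 else 2.0)
          for i in range(5)}

errors = {name: np.mean((w - quantize(w, 8)) ** 2) for name, w in layers.items()}
threshold = np.percentile(list(errors.values()), 75)   # keep the worst 25% in 16-bit
plan = {name: (16 if err > threshold else 8) for name, err in errors.items()}
print(plan)
```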

Latency reduction

Instead of pipelining, the NPX architecture takes an approach of parallelizing a convolutional layer on multiple cores to deliver both higher throughput and lower latency.
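One common way to parallelize a single convolutional layer is to split its output channels across cores, as in the toy sketch below; this illustrates the general technique, not Synopsys' actual mapping:

```python
# Sketch of splitting one convolution layer across N cores by output channels.
# Each "core" computes a slice of output channels; results are concatenated.
import numpy as np

def conv2d(x, w):  # x: (H, W, Cin), w: (K, K, Cin, Cout), stride 1, no padding
    H, W_, Cin = x.shape
    K, _, _, Cout = w.shape
    out = np.zeros((H - K + 1, W_ - K + 1, Cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i+K, j:j+K, :]
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def conv2d_split(x, w, n_cores):
    slices = np.array_split(np.arange(w.shape[3]), n_cores)  # partition output channels
    parts = [conv2d(x, w[..., idx]) for idx in slices]        # each part runs on one core
    return np.concatenate(parts, axis=-1)

rng = np.random.default_rng(2)
x = rng.standard_normal((16, 16, 8))
w = rng.standard_normal((3, 3, 8, 32))
assert np.allclose(conv2d(x, w), conv2d_split(x, w, n_cores=4))
print("channel-split result matches single-core result")
```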

Power Efficiency

The NPX6 is able to achieve 30 TOPS/W in 5nm, which is an order of magnitude better than many solutions out there today.

Bandwidth Reduction

With a machine running at over 100 TOPS, the NPX6 is able to handle the bandwidth requirement with an LPDDR4/LPDDR5-class memory interface.
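The arithmetic below illustrates why on-chip data reuse is what makes an LPDDR-class interface sufficient at these compute rates; all numbers are illustrative assumptions, not NPX6 measurements:

```python
# Rough illustration: external bandwidth needed for a 100 TOPS workload at
# different data-reuse factors (ops performed per byte fetched from DRAM).
def ext_bandwidth_gbs(tops, bytes_per_op_from_dram):
    return tops * 1e12 * bytes_per_op_from_dram / 1e9

for reuse in [1, 10, 100, 1000]:
    bw = ext_bandwidth_gbs(100, 1.0 / reuse)
    print(f"reuse {reuse:4d}x -> {bw:9.1f} GB/s of external bandwidth")
# LPDDR-class interfaces offer on the order of tens of GB/s per channel, so
# reuse factors in the hundreds to thousands are what make them sufficient.
```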

Benchmark Results

Refer to the Figure below for performance benchmark results, using frames per second per watt as the metric.

On-Demand Access to Pierre’s entire talk and presentation

You can listen to Pierre’s talk from here, under “Keynote and Session 1.”  You will find his presentation slides here, under “Day 1 – Keynote – AM Sessions.”

Also read:

The Path Towards Automation of Analog Design

Design to Layout Collaboration Mixed Signal

Synopsys Tutorial on Dependable System Design

 

 


Future.HPC is Coming!
by Daniel Nenni on 05-03-2022 at 6:00 am


According to the experts, the semiconductor industry is poised for a decade of growth and is projected to become a trillion dollar industry by 2030. In 2021 the semiconductor industry finally hit $600B so $1T by 2030 seems like a big ask, but not really if you look at the indicators inside the semiconductor ecosystem. Foundries, EDA, IP, and other ecosystem markers grew at record levels in 2021. The $1T question is: What will be the next big driver in the next decade for semiconductors?

The answer of course is high performance computing (HPC), and if you want to learn more about HPC market segments, Argonne National Laboratory, 3M, and Google are a great place to start:

Future.HPC Virtual Event

Register Here

Compute Intelligence for Breakthrough Results

Altair’s flagship high-performance computing event looks at outstanding HPC-driven end results customers have realized in the last year. HPC professionals from across the globe will share how they empower their organizations to turn CPUs and GPUs into groundbreaking medical research, faster planes and automobiles, smaller chips, smarter financial models, and so much more.

Featuring a mix of leadership presentations, panel discussions, breakout sessions, and networking opportunities, attendees can connect with and learn from fellow designers, engineers, IT specialists, and data scientists on the latest technology topics influencing every industry and the world around us.

All presentations will have live audio translations into French, Spanish, German, Italian, and Portuguese.  The event will be presented in two time zones – CEST (Europe & APAC) and EDT (AMER).

Join experts from Argonne National Laboratory, 3M, Google, WIRED Editor-in-Chief Greg Williams, and many more, for leadership presentations, roundtables, breakout sessions, and networking opportunities. Whether you’re an HPC pro or an end-user who prefers to keep the complexity “under the hood,” Future.HPC is the place to connect virtually with designers, engineers, IT specialists, and data scientists accelerating innovation timelines in your industry.

Day One Tuesday May 17, 2022

10:00 AM (Paris); 1:30 PM (India); 4:00 PM (Shanghai/Kuala Lumpur)
11:00 AM (New York); 8:00 AM (San Francisco); 10:00 AM (Mexico City); 12:00 PM (Sao Paulo)

Welcome and Introduction

Dr. Rosemary Francis, Chief Scientist, Altair
Rick Watkins, Senior Director Cloud Computing, Altair

Altair Keynote

James R. Scapa, Founder, Chairman, & CEO, Altair,
Joe Sorovetz, SVP Enterprise Solutions, Altair

Harnessing the Great Acceleration

Greg Williams, Editor-in-Chief, WIRED Magazine

The next big technology trends that will be driven by supercomputing.

3M: Accelerating Cloud Adoption for HPC Workloads with Altair PBS Professional and Altair Control

Gabe Turner, HPC Solutions Architect, 3M

With approximately 60,000 products spanning four business groups, 3M is constantly improving products to fit the needs of its customers. Modeling and simulation touch many stages of the product development process at 3M and high-performance computing is critical to accommodating such computationally intensive workloads. 3M has been using Altair PBS Professional for many years to great success, and in 2018 became an early adopter of Altair Control and its cloud bursting capabilities for HPC.

Google: Feature Partner Presentation

Dr. William (Bill) Magro, Chief Technologist, High-Performance Computing , Google

Building the Future of HPC Workload Management at ANL

William Allcock, ALCF Advanced Integration Group, Argonne National Laboratory

As Argonne National Laboratory prepares to support researchers tackling critical problems on its Polaris supercomputer and eventually its Exascale system, Aurora, William (Bill) Allcock, Manager, ALCF Advanced Integration Group, outlines the critical role of workload management in advancing what’s possible with HPC. He will share how his team’s initial exploration of open-source workload management technology led to their adoption of Altair’s commercial solution, PBS Professional, in 2021, and to a partnership between HPC experts at ANL and Altair that will …

HPC at Punch Torino

Mauro Bighi, CIO PUNCH Group

Panel Discussion on High-performance Computing: Decades of Technological Advancement Give Drastically New Meaning to “HPC”

Dr. Bill Nitzberg, Chief Engineer – HPC, Altair
Fritz Ferstl, SVP Software Development, Altair
Stuart Taylor, Director Software Development, Altair

At this moment, Altair has the largest brain trust of HPC expertise in the world. Join the experts who have shepherded technology as original as PBS, Sun Grid Engine and Runtime Design Automation into today’s high stakes commercial HPC space. They’ll discuss the decades of advancement, development and acquisition that went into assembling the most comprehensive HPC optimization portfolio on the market, preparing us for an all new definition of “high performance computing.”

Day Two Wednesday May 18, 2022

10:00 AM (Paris); 1:30 PM (India); 4:00 PM (Shanghai/Kuala Lumpur)
11:00 AM (New York); 8:00 AM (San Francisco); 10:00 AM (Mexico City); 12:00 PM (Sao Paulo)

Welcome and Introduction

Dr. Rosemary Francis, Chief Scientist, Altair

A Cloud for Every Workload: Which Model is Right for You?

Rick Watkins, Senior Director Cloud Computing, Altair

As the leading provider of workload management and compute optimization technology for three decades, Altair has helped enterprise computing customers at the cutting edge of HPC move critical workloads to the cloud. Now, we’re helping thousands of simulation software end users enticed by the promise of increasing exploration potential with “infinite” scalability do the same. Whether you’re an HPC expert looking for tools to tune and optimize or an end user looking to boost productivity without IT complexity, join cloud.

Parallel Track – Cloud for End Users: Turbocharge Engineering Productivity, No IT Expertise Required

Raghvendra Srivastava, Product Manager, Altair

HPC and cloud accessibility make it possible to scale exploration exponentially, but the key to breakthrough results is ensuring time spent interfacing with IT doesn’t scale accordingly. In this breakout session just for simulation application end users, attendees will see how the most competitive teams access their software, their data, and powerful HPC resources to turbocharge productivity from anywhere in the world. From launching software on any device and accelerating jobs with on-demand solving power to seamlessly visualizing and sharing.

Parallel Track – Cloud for HPC Pros: Expand HPC Infrastructure On-demand with Cost-effective, Multi-cloud Scaling

Ian Littlewood, Product Manager Enterprise Computing, Altair

Beyond empowering your team with flexible compute resources to scale productivity, cloud bursting technology provides tuning and automation opportunities that make a real impact on the metrics that matter most to your organization. Join this breakout session for HPC stakeholders to see how to manage, optimize and forecast compute resources, bursting to and between your on-prem resources and Oracle Cloud Infrastructure, Google Cloud Platform, Microsoft Azure, and Amazon Web Services (AWS). “It just works” may be the magic words for end.

Join us for our flagship HPC event for 2022!

Register Now to Receive Agenda Updates

 

About Altair
Altair is a global leader in computational science and artificial intelligence (AI) that provides software and cloud solutions in simulation, high-performance computing (HPC), data analytics, and AI. Altair enables organizations across all industries to compete more effectively and drive smarter decisions in an increasingly connected world – all while creating a greener, more sustainable future. For more information, visit https://www.altair.com/.

Also Read:

Six Essential Steps For Optimizing EDA Productivity

Latest Updates to Altair Accelerator, the Industry’s Fastest Enterprise Job Scheduler

Chip Design in the Cloud – Annapurna Labs and Altair


IP Subsystems and Chiplets for Edge and AI Accelerators
by Daniel Payne on 05-02-2022 at 10:00 am


From a business viewpoint, we often read in the technical press about the virtues of applying AI. In the early days most AI model building was done in the cloud because of the high computation requirements, yet there is now a developing trend to use AI accelerators at the Edge. The other mega-trend of the past decade is that the RISC-V ISA has been applied to more tasks, and the momentum is only growing. Ketan Mehta from OpenFive presented at IP-SoC Silicon Valley 2022 in April, so I attended to see what’s happening with RISC-V, chiplets, the Edge and AI accelerators.

OpenFive was founded in 2003 (named Open-Silicon at the time), is venture-funded, and has grown swiftly to over 600 people, while providing custom silicon design services resulting in over 350 tape-outs. They have engineering expertise in RISC-V, memory IP, and connectivity IP like chip-to-chip, die-to-die, and even chiplets.

The data center is morphing as the demands of HPC (High Performance Computing) continue to build, so processors are now connecting to accelerator NICs using the CXL standard, and even accelerators with HBM (High Bandwidth Memory) connect with processors through CXL. Memory IP for cache is often used inside these accelerators. OpenFive uses its experience designing scalable chiplets to address some of these technology challenges, meeting system design requirements like:

  • Low latency – sub-10ns
  • Low footprint – edge devices, PCIe server cards
  • Low power – from 0.5W to 10W
  • High throughput – Tbps/mm

For connecting HBM, LPDDR IP and D2D (Die to Die) there are three IP products designed by OpenFive:

A scalable chiplet platform from OpenFive can have compute, memory and connectivity IP; with both subsystems (green) and custom IP (blue):

Ketan presented an example of an Edge AI system using four RISC-V cores, hardware accelerators, memory controllers, and IO connectivity, all at a power target under 5W:

Edge AI RISC-V platform

Engineers at OpenFive have already delivered several scalable chiplet platforms to customers in process nodes ranging from 5nm up to 16nm, using a variety of RISC-V cores, memory IP and interconnect IP combinations.

Chiplets are a way to combine multiple die in a single package, in order to achieve higher yields at lower cost than a single SoC, while meeting power and throughput budgets. Two chiplet examples were provided: a CPU chiplet and an IO chiplet.

CPU chiplet
IO chiplet

Summary

Ketan’s presentation showed me how OpenFive has been able to design and deliver silicon-proven subsystems across multiple applications, like Edge, AI and HPC. Chiplet usage is now ramping up, as more system companies are able to optimize their ideas using disaggregated silicon die tuned for the workloads of their applications. Using a vendor with a large array of IP subsystems is a competitive advantage, as IP reuse provides time-to-market benefits.

Related Blogs


High Efficiency Edge Vision Processing Based on Dynamically Reconfigurable TPU Technology
by Kalar Rajendiran on 05-02-2022 at 6:00 am


While many tough computing problems have been solved over the years, vision processing is still challenging in many ways. Cheng Wang, Co-Founder and CTO of FlexLogix Technologies, gave a talk on edge vision processing at Linley’s Spring 2022 conference. During that talk he referenced how Gerald Sussman took the early steps of computer vision processing back in 1966. Gerald, a first-year undergraduate student under the guidance of MIT AI Lab co-founder Marvin Minsky, tried to link a camera to a computer. Much progress has happened since then. Of course, the requirements and the markets for computer vision haven’t stayed static during this time.

The early era of computer vision processing focused on industrial grade computing equipment that tolerated large form factors and high costs of the solutions. Fast forward to the most recent decade, neural network models and GPUs have played critical roles in advancing vision processing capabilities. But delivering solutions in smaller form factors and at low costs is still a challenge. In his talk, Cheng discusses the reasons behind these challenges and FlexLogix’s solution to edge vision processing based on dynamically reconfigurable TPU technology. The following are some excerpts from his presentation.

Performance, Efficiency and Flexibility

Edge computer vision requires an extreme amount of processing, at teraops rates. The vision solutions need to demonstrate high accuracy at low latency, operate at low power and be available at low cost points. While GPUs can deliver the performance, they are large, expensive and power hungry, and thus not a good match for edge compute devices. GPUs also count on a huge amount of memory bandwidth via DDR-type interfaces. On top of these challenges, the neural models are fast evolving: not only are new models emerging at a rapid rate, but even the same models undergo frequent incremental changes. Refer to the Figure below to see how frequently the popular YOLOv5 model changes.

The processing of neural network models is very different from general-purpose processing when it comes to compute workload and memory access patterns. Each layer may require a different computational load relative to the memory bandwidth that layer needs, and this changes dynamically as different layers are processed. So an optimal approach to solving these challenges depends on memory efficiency and future-proofing for changing models. Graph streaming helps reduce DRAM requirements, but bandwidth matching on a varying load is difficult.
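The toy model below shows how widely arithmetic intensity (MACs per byte moved) can vary between layers of a single network; the layer shapes are generic examples (with same-size output assumed), not taken from any FlexLogix design:

```python
# Toy model of per-layer compute vs. memory traffic for a few convolution shapes.
def conv_layer_stats(h, w, cin, cout, k, bytes_per_elem=1):
    macs = h * w * cin * cout * k * k                # compute work
    traffic = bytes_per_elem * (h * w * cin          # input feature map
                                + h * w * cout       # output feature map
                                + k * k * cin * cout)  # weights
    return macs, traffic, macs / traffic

layers = {
    "early conv (224x224x3 -> 32, 3x3)":  (224, 224, 3, 32, 3),
    "mid conv   (56x56x128 -> 128, 3x3)": (56, 56, 128, 128, 3),
    "late conv  (7x7x512 -> 512, 3x3)":   (7, 7, 512, 512, 3),
}
for name, shape in layers.items():
    macs, traffic, ai = conv_layer_stats(*shape)
    print(f"{name}: {macs/1e6:7.1f} MMACs, {traffic/1e6:5.2f} MB moved, "
          f"intensity {ai:6.1f} MACs/byte")
```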

FlexLogix’s Dynamic TPU

FlexLogix’s Dynamic TPU offers a flexible, load-balanced, memory-efficient solution for edge vision processing applications.

The Dynamic TPU is implemented using Tensor Processor Arrays (ALUs) and EFLX logic. The architecture enables very efficient layer processing across multiple Tensor Processor Arrays that communicate via FlexLogix’s XFLX InterConnect and access L2 SRAM for memory efficiency. Because the TPU uses EFLX cores, the control and data paths are future-proofed against changes in activation functions and operators. By streaming data at a sub-graph level, more efficient bandwidth matching is made possible. Refer to the Figure below.

While a GPU-based edge vision processing solution may consume power in the 75W-300W range, a Dynamic TPU-based solution will consume in the 6W-10W range. Whereas a GPU-based solution predominantly relies on GDDR, a Dynamic TPU-based solution relies on local connections, XFLX connections, flexible L2 memories and LPDDR.

The FlexLogix solution includes the InferX SDK, which directly converts a TensorFlow graph model to a dynamic InferX hardware instance. A Dynamic TPU-based solution will yield much higher efficiency on the Inference/Watt and Inference/$ metrics compared to a GPU- or CPU-based solution. All in all, it offers superior performance with software flexibility and future-proofing versus ASIC solutions.

On-Demand Access to Cheng’s talk and presentation

You can listen to Cheng’s talk from here, under Session 5.  You will find his presentation slides here, under Day 2- PM Sessions.

Also read:

A Flexible and Efficient Edge-AI Solution Using InferX X1 and InferX SDK

Flex Logix and Socionext are Revolutionizing 5G Platform Design

Using eFPGA to Dynamically Adapt to Changing Workloads

 


ITSA – Not So Intelligent Transportation
by Roger C. Lanctot on 05-01-2022 at 10:00 am


The Infrastructure Investment and Jobs Act (IIJA) passed last year in the U.S. earmarks billions of dollars that can be used for the deployment of potentially life-saving C-V2X car connectivity technology. The U.S. Department of Transportation and state DOTs are poised to commence that spending, but one thing stands in the way of car maker or state DOT willingness to proceed – a lawsuit by the Intelligent Transportation Systems of America (ITSA) and the American Association of State Highway Transportation Officials (AASHTO).

ITSA and AASHTO are seeking to reverse the Federal Communication Commission’s (FCC) re-allocation of 45MHz of spectrum in the 5.9GHz band – previously preserved for dedicated short range communication (DSRC) use by automobiles – for unlicensed Wi-Fi use. ITSA and AASHTO want the 45MHz restored.

The judge is unlikely to reverse the FCC’s unanimous decision and probably does not have the authority to do so. The only possible path to success for ITSA and AASHTO would be for the judge to find the FCC’s decision-making process somehow flawed. This is highly unlikely – which means that the legal action is a waste of time and money and maybe…lives.

Ironically, ITSA and AASHTO wave the bloody flag in their efforts to preserve the prior spectrum allocation – claiming their efforts are intended to save lives the best way possible with connected car technology. Their efforts are actually further delaying the prospect of the adoption of connected car technology in the near term.

ITSA CEO Laura Chase comments in an opinion piece in the latest ITSA magazine:

“While there is no magic bullet to reduce crashes and fatalities, we have a responsibility to use all the tools at our disposal to save lives. The best tool we currently have is connected vehicle technologies – but without wide-scale deployment, we can’t hope to move the needle on reducing traffic fatalities.”

Well, Laura, we can’t move that needle as long as ITSA and AASHTO continue to inject uncertainty into the regulatory process. Car makers need clarity, alignment, and commitment. State and Federal contracting authorities, too, need clarity. ITSA and AASHTO have muddied the waters and put a bullet in the head of potentially life-saving infrastructure projects incorporating C-V2X technology.

Even stranger, Chase says the ITSA is simultaneously working on reimbursement for stranded DSRC deployments – something that is already provided for in the IIJA. The only good news from ITSA is that the organization appears to be “accepting” the FCC’s overt endorsement of cellular-based C-V2X technology over DSRC. Thank goodness for small things.

The legal action by ITSA and AASHTO means that car companies or state DOTs are frozen. Proposals can’t be written and money cannot be allocated until the case is resolved.

Multiple car companies and DOTs have applied for waivers from the FCC to proceed with their projects – but the FCC has not even posted the waiver requests, which are subject to public comment. ITSA and AASHTO have gummed up the very process for which they have worked for more than 20 years – to bring V2X technology to the market.

A senior General Motors executive speaking as part of a 5.9GHz forum at the recent ITSA event in Charlotte, N.C., said, of this legal action: “You lost half of the dedicated 5.9GHz spectrum because you did nothing with it for 20 years. If you don’t do something with it now you’re likely to lose the rest of it.”

Worse, though, is the reality that the legal action by ITSA and AASHTO is actually costing the U.S. valuable time in the race to compete with China. China long ago abandoned DSRC as the primary connected car technology in favor of C-V2X.

As many as 13 auto makers in China have either already introduced C-V2X-equipped vehicles or have announced plans to do so. In the U.S., Ford Motor Company, Audi of America, and Jaguar Land Rover and multiple state DOTs have submitted waiver requests to introduce the technology – and, so, they wait.

ITSA and AASHTO are on the wrong side of history. These organizations are wasting time, money, and lives in the interest of turning back the clock. The FCC has spoken. The spectrum has been allocated. The billions of dollars have been approved. It’s time for ITSA and AASHTO to simply get out of the way.

Also read:

OnStar: Getting Connectivity Wrong

Tesla: Canary in the Coal Mine

ISO 26262: Feeling Safe in Your Self-Driving Car


Has KLA lost its way?
by Robert Maire on 05-01-2022 at 6:00 am


-KLA has another great QTR in face of overwhelming demand
-Supply chain issues obliterated by backlog
-Longer term technology leadership concerns are increasing
-We see limited upside near term & remain cyclically cautious

Another great quarter- demand remains super strong

KLA’s performance remains great, as does overall semiconductor equipment demand. KLA reported revenues of $2.3B versus expectations of $2.2B, and non-GAAP EPS of $5.13 versus the street’s $4.82. Guidance was for revenues of $2.3B to $2.55B versus current expectations of $2.36B, with earnings in the range of $4.93 to $6.03 versus current expectations of $5.30.

KLA can “dial in” numbers given the huge backlog

Historically KLA has almost always been able to accurately dial in numbers for the next quarter given the huge and long backlog they have. The current backlog is out the door and down the street and not likely to shorten any time soon.

This results in the ability to both guide and deliver numbers wherever they wish. Some segments remain a little bit lumpy due to high selling prices or mix shifts. With most deliveries running at a year or more and over $8B in solid orders, we don’t see a lot of risk to the backlog right now. We have seen backlog deflate in prior cycles, but never from the level we currently have.

Supply chain issues remain but not very impactful

Supply chain issues remain “fluid,” but the large backlog clearly mitigates most if not all of that instability. Compared to other companies in the industry that typically run a more turns-based business, KLA can modulate to adapt to shortages.

Yield management continues to be a crucial market segment

Growth in all things yield management remains very strong, especially in emerging markets that have a lot more to learn when it comes to semiconductor manufacturing. This means China, which remains a huge market for semiconductor tool makers, including KLA.

This adds perhaps a bit more risk but so far we don’t think the US government is in a mood to upset the Chinese by turning up the heat in trade restrictions given the Ukraine situation.

Has KLA lost its way?

KLA’s first product was reticle inspection back in 1975, and it has been one of the two pillars of the company, along with wafer inspection, for the entire life of the company. We think the reticle inspection pillar is weakening; though perhaps not completely, it has certainly lost the technology lead and, along with it, the future business.

A former upstart pimple on KLA, Lasertec, has clearly taken the technology lead and, with it, the most profitable part as well as the future of the reticle inspection market. Lasertec likely has the dominant share of leading-edge reticle inspection revenue as well and will likely expand that lead.

Lasertec’s recent quarter

Lasertec recently announced its quarter and, along with it, projections for the next twelve months’ business, which they expect will come in at over $2B, versus KLA’s just-reported $611M in the quarter in “patterning,” which likely does not represent 100% pure reticle inspection tools.

More importantly, Lasertec is the only game in town in EUV actinic inspection. We believe KLA’s actinic tool has been further delayed by issues with several hardware subsystems, not to mention the fact that the noble gas xenon, which the system runs on, has skyrocketed recently from $10 a liter to over $200 a liter, if you can get it…and it’s still climbing.

We have heard that KLA’s E-Beam reticle inspection tool (the 8XX) has not been popular with customers, and public data shows that E-Beam is just way too slow. But right now customers may settle for a slower 8XX tool, as actinic is years away, whether from Lasertec or maybe eventually KLA.

KLA may point to “print check” (AKA print and pray), which uses a wafer inspection tool to look at what has been printed from the reticle onto the wafer, but it’s an indirect (inferred) approach that is useless in a mask shop anyway. Actinic is clearly the gold standard, and only Lasertec has it.

Data points from SPIE

We recently attended SPIE (a conference about all things lithography), where a major chipmaker that is first in line for High NA EUV tools gave a talk about High NA reticle inspection and showed a picture of a system…and it wasn’t KLA.

So where the industry is, and is going, in reticle inspection is currently not KLA, and KLA may not have time to catch up given the delays. We have seen this movie before, as ASML was around 10-15 years delayed in getting EUV scanners to market. KLA’s multi-year, self-imposed halt in the program certainly made things even worse.

KLA still does a great business in older technology reticle inspection for all those second and third tier fabs in China but that’s not saying a lot.

Weakness in E Beam wafer tools

While reticle inspection may already be a fait accompli, we are also starting to get more concerned about wafer inspection. ASML recently announced a 5X5, 25-beam multibeam (not multicolumn…there is a difference) E-Beam wafer inspection tool in its Hermes division. ASML has been winning in wafer defect inspection while AMAT has been exploding in the E-Beam wafer metrology market. KLA still dominates in optical, which is about 4 times the size of E-Beam, but clearly needs to catch up to ASML and AMAT in E-Beam.

The stock

The results and financials are great…as always. Demand remains super strong. We certainly are not concerned about the near term but have questions about the longer term especially when the market eventually slows.

Right now customers are desperate for tools and anything that will help the yields of ever more complex processes, so KLA is in a good seat. Perhaps not as good as ASML, but second best.

Much of the current success is due to momentum, size and desperation, not necessarily technology leadership. This makes us more concerned about the longer-term issues.

While 2022 seems almost “in the bag,” we are more concerned about where things go when the tide goes out and exposes issues in the longer term.

From a valuation perspective, it’s hard to fight the negative tape in chip stocks, and much of the strong performance, including a strong second half, is already baked into the numbers and expectations.

We don’t see a lot of upside headroom in the stock and see more longer term potential downside at this point which would make us avoid putting more money to work here.

Also read:

LRCX weak miss results and guide Supply chain worse than expected and longer to fix

Chip Enabler and Bottleneck ASML

DUV, EUV now PUV Next gen Litho and Materials Shortages worsen supply chain


Podcast EP75: Getting There is Half the Fun – Connecting the Digital World with Alphawave IP
by Daniel Nenni on 04-29-2022 at 10:00 am

Dan is joined by Tony Pialis, the co-founder and CEO of Alphawave, a global leader in high-speed connectivity IP enabling industries such as AI, autonomous vehicles, 5G, hyperscale data centers, and more. Tony and Dan discuss the requirements for data connectivity across many high-growth markets and what is required for successful deployment.

Tony is the former VP of Analog and Mixed-Signal IP at Intel and has co-founded three semiconductor IP companies, including Snowbush Microelectronics Inc (sold to Gennum/Semtech and now part of Rambus) and V Semiconductor Inc (acquired by Intel).

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Dr. Robert Giterman of RAAAM Memory Technologies
by Daniel Nenni on 04-29-2022 at 6:00 am


Dr. Robert Giterman is Co-Founder and CEO of RAAAM Memory Technologies Ltd, and has over nine years of experience with the research and development of GCRAM technology, which is being commercialized by RAAAM. Dr. Giterman obtained his PhD from the Emerging Nanoscaled Circuits and Systems Labs Research Center at Bar-Ilan University. Following the completion of his PhD in 2018, he joined the Telecommunications Circuits Laboratory at the Ecole Polytechnique Federale de Lausanne, Switzerland, as a post-doctoral researcher. As part of his research, he has led the front-end and physical implementations of multiple ASICs, and mentored numerous PhD theses and MSc projects in the field of VLSI embedded memories. Dr. Giterman has authored over 40 scientific papers and holds 10 patents.

First, please tell me about RAAAM?
RAAAM Memory Technologies Ltd. is an innovative embedded memory solutions provider that delivers the most cost-effective on-chip memory technology in the semiconductor industry. RAAAM’s silicon-proven Gain-Cell RAM (GCRAM) technology combines the density advantages of embedded DRAM with SRAM performance, without any modifications to the standard CMOS process available from multiple foundries.

RAAAM’s patented GCRAM technology can be used by semiconductor companies as a drop-in replacement for SRAM in their SoCs, allowing them to reduce fabrication costs significantly through die size reduction. Alternatively, increasing the on-chip memory capacity in the same die size enables a dramatic reduction in off-chip data movement to resolve the memory bottleneck. This increase in on-chip memory capacity will enable additional features that can drive industry growth for applications in the areas of AR/VR, Machine Learning (ML), Internet-of-Things (IoT), and Automotive.

What problem are you solving?
Important industry growth drivers, such as ML, IoT, Automotive and AR/VR, operate on ever-growing amounts of data that are typically stored off-chip in an external DRAM. Unfortunately, off-chip memory accesses are up to 1000x more costly in latency and power compared to on-chip data movement. This limits the bandwidth and power efficiency of modern systems. In order to reduce these off-chip data movements, almost all SoCs incorporate large amounts of on-chip embedded memory cache that is typically implemented with SRAM and often constitutes over 50% of the silicon area. This memory bottleneck is further aggravated because SRAM scaling has become increasingly difficult in recent nodes, shrinking only at a rate of 20%-25% compared to almost 50% scaling for logic.
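A simple way to see the impact of that cost ratio is to look at the average access cost as a function of the on-chip hit rate, as in the sketch below (purely illustrative numbers, not RAAAM data):

```python
# Average access cost when an off-chip access is ~1000x more expensive than an
# on-chip one, as a function of the on-chip hit rate.
ON_CHIP_COST = 1.0          # normalized energy/latency units
OFF_CHIP_COST = 1000.0

for hit_rate in [0.90, 0.99, 0.999]:
    avg = hit_rate * ON_CHIP_COST + (1 - hit_rate) * OFF_CHIP_COST
    print(f"on-chip hit rate {hit_rate:.3f} -> average access cost {avg:6.1f}x on-chip")
```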

Can you tell us more about GCRAM technology?
GCRAM technology relies on a high-density bitcell that requires only 2-3 transistors (depending on whether area or performance is the priority). This structure offers up to a 2X area reduction over high-density 6T SRAM designs. The bitcell is composed of decoupled write and read ports, providing native two-ported operation, with a parasitic storage node capacitor keeping the data. Unlike conventional 1T-1C eDRAM, GCRAM does not rely on delicate charge sharing to read the data. Instead, our GCRAM provides an active read transistor that delivers an amplified bit-line current, offering low-latency, non-destructive readout without the need for large storage capacitors. As a result, GCRAM does not require any changes or additional costs to the standard CMOS fabrication process and scales with technology when properly designed.

While the concept of 2T/3T memory cells has been tried in the past, the shrinking of the parasitic storage capacitor and concerns about increasing leakage currents have so far discouraged its application beyond 65nm. RAAAM’s patented innovations comprise clever circuit design at both the memory bitcell and periphery levels, resulting in significantly reduced bitcell leakage and enhanced data retention times, as well as specialized refresh algorithms optimized for various applications, ensuring very high memory availability even under the most extreme operating conditions. In fact, we have demonstrated the successful scaling of GCRAM technology across process nodes of various foundries (e.g., TSMC, ST, Samsung, UMC), including recent silicon demonstrators in 28nm (Bulk and FD-SOI) and 16nm FinFET technologies implementing up to 1Mbit of GCRAM memory macros.

Can you share details about your team at RAAAM and what has been done to validate the GCRAM technology?
RAAAM’s founders, including Robert Giterman, Andreas Burg, Alexander Fish, Adam Teman and Danny Biran, bring over 100 combined years of semiconductor experience. In fact, RAAAM is built on a decade of world-leading research in the area of embedded memories, and GCRAM in particular. Our work on GCRAM technology has been demonstrated on 10 silicon prototypes at leading semiconductor foundries in a wide range of process nodes from 16nm to 180nm, including bulk CMOS, FD-SOI and FinFET processes. Our work on GCRAM is documented in more than 30 peer-reviewed scientific publications in books, journals, and conference proceedings, and is protected by 10 patents.

Who is going to use RAAAM’s technology and what will they gain?
RAAAM’s GCRAM technology enables a significant chip fabrication cost reduction or highly improved performance, resolving the memory bottleneck for semiconductor companies in various application fields. Since GCRAM is directly compatible with any standard CMOS process and uses an SRAM-like interface, it can easily be integrated into existing SoC designs.

As an example of potential system benefits, we can look at the Machine Learning accelerator domain, using a 7nm AI processor integrating 900MB of SRAM on a single die. In this case, the SRAM area constitutes over 50% of the overall die size. Replacing SRAM with RAAAM’s GCRAM technology can provide a reduction of up to 25% of the overall die size, resulting in up to $35 savings per die.
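The arithmetic behind those figures can be checked with a small sketch; the die area and cost per mm² used here are placeholder assumptions chosen only to illustrate how a roughly 2X denser memory translates into the quoted savings:

```python
# Worked check of the die-size arithmetic: if SRAM is ~50% of the die and GCRAM
# needs roughly half the area for the same capacity, the die shrinks by ~25%.
# Die area and cost per mm^2 are placeholder assumptions for illustration.
def die_savings(die_area_mm2, sram_fraction, gcram_area_ratio, cost_per_mm2):
    sram_area = die_area_mm2 * sram_fraction
    saved_area = sram_area * (1 - gcram_area_ratio)
    return saved_area, saved_area / die_area_mm2, saved_area * cost_per_mm2

saved, frac, dollars = die_savings(die_area_mm2=400, sram_fraction=0.5,
                                   gcram_area_ratio=0.5, cost_per_mm2=0.35)
print(f"area saved: {saved:.0f} mm^2 ({frac:.0%} of the die), ~${dollars:.0f} per die")
```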

Alternatively, for memory-bandwidth-limited systems, increasing the on-chip memory capacity can bring substantial performance and power improvements. In fact, the required DRAM bandwidth is often inversely proportional to the on-chip memory capacity. With off-chip memory accesses being up to 1000x more costly in power and latency compared to on-chip data movement, replacing SRAM with 2X more GCRAM capacity at the same area footprint significantly reduces the off-chip bandwidth requirements and enables RAAAM’s customers to gain a competitive advantage in the power consumption of their chip.

What is RAAAM’s engagement model?
RAAAM follows an IP vendor licensing model. Semiconductor companies can license RAAAM’s GCRAM technology for a fee plus production unit royalties. RAAAM implements the front-end memory controller and GCRAM-based hard memory macros according to the customer specifications and delivers a soft RTL wrapper (using a standard SRAM interface), which instantiates the GCRAM hard macros (GDS) and the soft refresh control (RTL). Additionally, the customer receives a characterization report of the hard memory macro and a behavioral model for system-level verification. At present, RAAAM is working on the implementation and qualification of a GCRAM-based memory compiler, which will enable RAAAM’s customers to automatically generate the complete front-end and back-end views of GCRAM IP and corresponding characterization reports according to customer specifications.

Can you tell us about your recent achievements?
RAAAM has made very exciting progress recently. First, we have been evaluating the benefits of our technology for leading semiconductor companies, which has confirmed our projected substantial improvements in performance and cost over existing solutions based on SRAM. In fact, we have recently engaged with a very large semiconductor company on a long-term co-development project, and we continue running customer evaluations for various application fields and process nodes. We see growing interest in our technology in a variety of applications, both in very advanced process nodes (7nm and beyond) and in less advanced ones (16nm and higher). Finally, we are extremely pleased to have joined the Silicon Catalyst Incubator, allowing us to gain access to their comprehensive ecosystem of In-Kind Partners, Advisors, and Corporate VC and institutional investor network.

What is on the horizon for RAAAM?
Our product development roadmap includes full memory qualification in selected nodes of leading semiconductor foundries, based on customer demand. In addition, we have on-going discussions with numerous foundries for further technology migration to their next generation process nodes. Furthermore, we are looking to expand our embedded memory platform and introduce design flow automation based on our memory compiler development efforts. To this end, we are in the process of raising Seed funding to fully qualify our GCRAM technology and to accelerate our company’s overall business growth.

A preliminary GCRAM product brief is available upon request; please send an email to info@raaam-tech.com. Additional information can be found at https://raaam-tech.com/technology and https://www.linkedin.com/company/raaam

Also read:

CEO Interview: Dr. Esko Mikkola of Alphacore

CEO Interview: Kelly Peng of Kura Technologies

CEO Interview: Aki Fujimura of D2S


Freemium Business Model Applied to Analog IC Layout Automation
by Daniel Payne on 04-28-2022 at 10:00 am


Freemium combines the words “free” and “premium,” and many of us have enjoyed using freemium apps on our phones, tablets and desktop devices over the years. The concept is quite simple: you find an app that looks useful, download the free version, mostly to see if it operates as advertised, and then decide if there’s enough promise to warrant buying the fully featured version. But wait, is there actually any EDA vendor offering a freemium business model?

Yes, about a year ago, the UK-based company Pulsic introduced their Animate Preview tool to the EDA world as a free download. The only requirement is that you are using Cadence Virtuoso IC6.1.6, IC6.1.7 or IC6.1.8 software. I had a Zoom call with three Pulsic folks this month to better understand this freemium model:


  • Mark Williams, CEO
  • Mark Waller, Director of User Enablement
  • Otger Perich, Digital Marketing

Q: Why a freemium model?

A: The typical evaluation cycle for a new EDA tool is way too long, often requiring an NDA to be agreed, terms and conditions to be negotiated, and time and resources for a formal evaluation. It can take many weeks before potential customers can start to really get to know the product’s capabilities.

We wanted to find a way to shortcut this process and remove all of the barriers to entry. With the freemium model, any interested engineer can quickly and directly download a free version and get started in minutes instead of weeks.

To make the freemium model work, we made Animate easy to use with a very simple UI, easy to learn and operate.

Q: What does Animate Preview do?

A: Animate Preview works within the Cadence Virtuoso schematic editor, where a circuit designer can quickly see an automatically created initial layout of their analog cells in minutes. The designer can see the effect of their circuit design decisions in the layout and get accurate area estimates. The free version contains all the features of the premium product; the user can do everything that can be done in the paid version, but can only save the design outline and IO pins.

The paid version is called Preview Plus, and with that version, you can save the automatically created initial layouts into OpenAccess. The saved layout includes all the detailed placement information and is a great starting point for creating the final analog block layout.

Animate Preview inside the schematic editor

Q: How long does it take to learn Animate Preview?

A: It’s fast; from downloading the app to seeing the first circuit layout can happen in as little as 20 minutes because it’s a simple process of filling out a form and opening the link in an email to get started. Anyone with a Cadence Virtuoso environment for schematics can use Animate Preview on their analog cells. We’re using a cloud-based license, so you don’t need to think about licensing.

Q: Does the Pulsic tool come with any design examples?

A: Yes, we ship with a Pulsic PDK with example designs in that technology, plus there’s a set of videos to get you started. It’s all designed to just run out of the box. As well as the getting started videos, there is a series of 2-minute tutorials, with 22 tutorials available.

Animate Preview runs in the background when you open a schematic in Virtuoso, which you use just like you normally would. The layouts appear automatically and are updated when circuit changes are made, all without the user needing to create any constraints. Just install and then see the auto-generated IC layouts based on schematics.

Q: What process technology is supported for analog IC layout generation?

A: Our focus has been to ensure that Animate creates great results for TSMC processes from 180nm down to 22nm. However, Animate will work with any planar process on OpenAccess with Cadence P-Cells. We have customers using Animate on many other processes from several fabs. We’re also starting to support some FD-SOI technology, but no FinFET yet.

Q: Is the generated IC layout always DRC clean?

A: Yes, the generated IC layout should be DRC clean, especially for TSMC processes. For other processes, if the rules are in the OA tech file, Animate will obey them. Most customers get good results out of the box, but if a user has any issues, they can contact Pulsic for better support.

Animate Preview generated layout

Q: So, who is using Animate for analog IC cell layout automation?

A: One company that we can talk about is Silicon Labs, out of Austin, Texas; back in 2019, when they were using an early version of the Animate technology, they said, “In our initial evaluation of Animate, we needed to achieve both efficiency and quality for our analog IC layouts, and Animate provided excellent results equal to using traditional approaches but in far less time,” said Stretch Young, Director of Layout at Silicon Labs.  “Collaborating with Pulsic, we see opportunities to improve the quality of our layout, which will increase productivity and save design time.”

Q: How many downloads so far of Animate Preview from your web site?

A: About 360 engineers have downloaded Animate so far. About 100 of these downloaders have created IC layouts, and we’ve followed up with tens of engagements.

Q: What are some of the benefits of offering a freemium model for EDA tools?

A: With the freemium model, there is less pressure. We see that the users like the free download experience, and then we support them when they have follow-up questions. Users can see the benefits of analog automation within days without the hassle and pressure of the usual EDA sales process. Only if they like what they see and want to save the placement do they need to talk to us.

Launching a new product in COVID times was always going to be a challenge, but a big benefit for us was that we didn’t have to travel to do prospecting because it’s been all online evaluations. So we were able to reach the target audience much quicker.

Q: What types of IC design end markets are attracted to analog IC layout automation?

A: The IoT market has been the most significant sweet spot so far because of the need to be quick to market cheaply and the ability to iterate quickly.  Automotive and general analog IP providers also see great results from our tool.

Q: What are the limitations of Animate Preview as an EDA tool?

A: Animate Preview is designed for core analog cells. The tool is always-on inside the Cadence Virtuoso Schematic Editor and continually updates as you change the schematic. So you just leave it on all of the time, but it will warn you if it cannot apply the technology to a cell. A built-in circuit suitability check will warn you when the circuit is not suitable for Animate, e.g., a hierarchy that is too large or a digital block. Animate Preview will automatically create a layout for analog blocks with up to 100 schematic symbols. With Preview Plus, the user can create a layout for larger analog blocks; it might take a few minutes instead of seconds to produce a result.

Q: Will your company be attending DAC in SFO this summer?

A: Yes, look for our booth, and there will be a theatre setup to show the benefits of analog IC layout automation.

Q: How does Animate Preview work, under the hood?

A: Animate is radically different from other IC layout automation because it has a PolyMorphic approach in a virtual space, producing optimal IC layouts. It really is a unique architecture. The polymorphic engine is patented, but we don’t talk about how it works.

Related Blogs