
Semiconductor Crash Update
by Daniel Nenni on 05-12-2022 at 10:00 am


Earlier this year semiconductor oracle Malcolm Penn did his 2022 forecast, which I covered here: Are We Headed for a Semiconductor Crash? The big difference with this update is the black economic clouds that are looming, which may again highlight Malcolm’s forecasting prowess. I spent an hour with Malcolm and company on his Zoom cast yesterday and now have his slides. Great chap, that Malcolm.

The $600B question is: When will the semiconductor CEOs start issuing warnings?

RECAP: PERFECT STORM BROKE IN JULY 2020 (BUT NOBODY WAS PAYING ATTENTION)

The 6.5% growth in 2020 set the stage for a big 2021, which ended up at 26%. Previous semiconductor records were 37% in 2000 and 32% in 2010, so 26% is not that big a number, meaning we have a shorter distance to fall. Covid was the trigger for our recent shortages, but it really was a supply/demand imbalance camouflaged by a crippled supply chain.

The big difference today is the daunting set of economic and geopolitical issues that could possibly raise Malcolm to genius forecasting level: the horrific geopolitics with Russia and China, rampant inflation, workforce challenges around the world, and of course Covid, which is not done with us yet, not even close.

Let’s take a look at the four key drivers from Malcolm’s presentation:

  1. Economy: Determines what consumers can afford to buy.
  2. Unit Demand: Reflects what consumers actually buy plus/minus inventory adjustments.
  3. Capacity: Determines how much demand can be met (under or over supply).
  4. ASPs: Sets the price units can be sold for (supply – demand + value proposition).

The Economy is the big change since I last talked to Malcolm. In my 60 years I have never experienced a more uncertain time, other than the housing crash in 2008, when a significant amount of my net worth disappeared overnight. Today, however, I am a financial genius for holding fast. Property values here are about double the 2008 peak, which is great but also a little concerning.

Bottom line: I think we can all agree the economy is in turmoil with the inflation spike and the jump in interest rates and debt. Maybe some financial experts can chime in here but in my experience this trend will get worse (recession?) before it gets better.

Unit Demand is definitely increasing due to the digital transformation that we have been working on for years. Here is a slide from a keynote at the Siemens EDA Users meeting last week; I will be writing about this in more detail next week. Unit volume is the great revealer of truth versus revenue, so this is the one to watch. Unfortunately, “take or pay” and “prepay” contracts are becoming much more common, and that can distort unit demand as a forecasting metric.

Bottom line: Long term semiconductor unit demand will continue to grow in my opinion (though not at the rate we experienced in 2021), but that will largely be due to the Covid backlog and inventory builds. The big risk here is China. China is in turmoil, and it is the largest consumer of semiconductors. China stemmed the first Covid surge with draconian measures, which it is again employing, and again the electronics supply chain is impeded. Other parts of the world that are not paying attention to what is happening in China will suffer the consequences in the months to come, in my opinion.

Capacity is a tricky one. Let’s break it into two parts: leading edge nodes (FinFETs) and mature nodes (not FinFETs). We are building leading edge capacity with abandon. It’s a PR race between Intel, Samsung, and TSMC, and since Intel is outsourcing significant FinFET capacity to TSMC, it is even trickier.

To be clear, mature node capacity is being rapidly added, but a lot of it is in China, since China does not have access to leading edge technology, and it will pale in comparison to FinFET capacity. Reshoring of semiconductor manufacturing and the record-setting CAPEX numbers are also an important part of this equation, which makes the oversupply argument even easier.

On the other side of the equation, the semiconductor equipment companies are hugely backlogged, no matter who you are, and the electronics supply chain is still crippled, so announcing CAPEX and actually spending it are two different things.

In my opinion, if all of the announced CAPEX is actually spent there will be some empty fabs waiting for equipment and customers. Remember, Intel had an empty fab in AZ for years and there are still empty fabs all over China. Staffing new fabs will also be a challenge since the semiconductor talent pool is seriously strained.

Bottom line: We did not have a wafer manufacturing capacity problem before Covid, we do not have a wafer manufacturing capacity problem today, and I don’t see an oversupply risk in the future. We did have a surge in chip demand due to Covid, but that will end soon, and the crippled supply chain (the inability to get systems assembled and to customers) is easing the fab pressures. That will continue this year and next, depending on Covid and how we respond to it.

ASPs are being propped up by the shortage narrative. Brokers, distributors, and middlemen are hoarding and raising prices, causing ever more supply chain issues. I have heard of 10x+ price increases for $1 MPUs. Systems companies are paying a premium for off-the-shelf chips, and foundries are raising wafer prices in record amounts, which, at some point, will calm demand.

Bottom line: Malcolm is convinced a significant crash is coming, but I do not agree, based on my ramblings above. If someone asked me to place a 10% over/under bet for semiconductor revenue growth in 2022, I would bet the farm on over. My personal number was and still is 15% growth in 2022.

Let me know if you agree or disagree in the comment section and we can go from there. Exciting times ahead, absolutely.

Also read:

Design IP Sales Grew 19.4% in 2021, confirm 2016-2021 CAGR of 9.8%

Semiconductor CapEx Warning

Chip Enabler and Bottleneck ASML

The ASIC Business is Surging!


Scaling is Failing with Moore’s Law and Dennard
by Dave Bursky on 05-12-2022 at 6:00 am


Looking backward and forward, the white paper from Codasip, “Scaling is Failing” by Roddy Urquhart, provides an interesting history of processor development from the early 1970s to the present. However, it doesn’t stop there; it continues to extrapolate what the chip industry has in store for the rest of this decade. For the last half century we have had Moore’s Law, an observation regarding the number of transistors that can be integrated on a chip, crafted by Gordon Moore, one of the founders of Intel Corp. That observation was followed by Robert Dennard of IBM Corp., who, in addition to inventing the single-transistor DRAM cell, defined the rules for transistor scaling, now known as Dennard Scaling.

In addition to scaling, Amdahl’s Law, formulated by Gene Amdahl while at IBM Corp. in 1967, deals with the theoretical speedup possible when adding processors in parallel: any speedup will be limited by those parts of the software that must be executed sequentially. Thus, Moore’s Law, Dennard Scaling, and Amdahl’s Law have guided the semiconductor industry over the last half century (see the figure). However, Codasip claims they are all failing, and that the industry must change and the processor paradigms must change with it. Some of those changes include the creation of domain-specific accelerators, customized solutions, and new companies that create disruptive solutions.
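Amdahl’s Law is compact enough to state in a few lines. Here is a minimal sketch of the speedup formula; the 95% parallel fraction is an illustrative assumption, not a figure from the Codasip paper:

    def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
        """Amdahl's Law: overall speedup is limited by the serial fraction."""
        p = parallel_fraction
        return 1.0 / ((1.0 - p) + p / n_processors)

    # Even with 95% of the work parallelizable, speedup saturates near
    # the 1 / (1 - p) = 20x ceiling no matter how many processors:
    for n in (2, 8, 64, 1024):
        print(f"{n:4d} processors -> {amdahl_speedup(0.95, n):5.2f}x speedup")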

Supporting the paper’s premise that semiconductor scaling is failing are numerous examples in the microprocessor world. The examples start with the Intel x86 family as an illustration of how scaling failed as chip complexities and clock speeds increased with each new generation of the single-core CPUs. As each CPU generation’s clock frequency increased from the MHz to the GHz level thanks to the improvements in scaling, chip thermal limits became a restraining factor for performance. The performance limitation was the result of a dramatic increase in power consumption as clock speeds hit 3 GHz and higher and complexities hit close to a billion transistors on a chip. The smaller size of the transistors also resulted in increased leakage currents, and the higher leakage currents caused the chips to consume more power even when idling.

To avoid thermal runaway caused by increasing clock frequencies, designers opted for multi-core architectures, integrating two, four, or more CPU cores on a single chip. These cores could operate at lower clock frequencies, share various on-chip resources, and thus consume less power. An additional benefit of the multiple cores was the ability to multitask, allowing the chip to run multiple programs simultaneously. However, the multicore approach alone was not enough for the CPUs to handle the myriad tasks demanded by new applications such as graphics, image and audio processing, artificial intelligence, and still other functions.
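The power argument for multi-core follows from the standard first-order dynamic power model, P ≈ activity × C × V² × f, together with the idealized assumption that supply voltage scales roughly with frequency. A normalized sketch with textbook values, not numbers from the white paper:

    def dynamic_power(c_eff: float, v: float, f: float, activity: float = 1.0) -> float:
        """First-order CMOS dynamic power: P = activity * C * V^2 * f."""
        return activity * c_eff * v ** 2 * f

    # One core at normalized voltage 1.0 and frequency 1.0:
    single = dynamic_power(c_eff=1.0, v=1.0, f=1.0)

    # Two cores at half frequency deliver similar total throughput; if
    # voltage scales with frequency (an idealization), each runs at v = 0.5:
    dual = 2 * dynamic_power(c_eff=1.0, v=0.5, f=0.5)

    print(f"single core: {single:.2f}  dual core: {dual:.2f}")  # 1.00 vs 0.25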

Thus, Codasip is proposing that further processor specialization will deliver considerable performance improvements: the industry must change from adapting software to execute on available hardware to tailoring computational units to match their computational load. To accomplish this, many varied custom designs will be needed, permitting companies to design for differentiation. Additionally, new approaches to processor design must be considered, especially the value of a processor design language and processor design automation.

Using the RISC-V modular architecture as an example, with its ability to create specialized cores and its flexibility to craft specialized instructions, Codasip sees RISC-V as an excellent starting point for tailored processing units. Cores are typically classified in one of four general categories, MCU, DSP, GPU, and AP (application processor), with each type optimized for a range of computations, some of which may not match what is actually required by the on-chip subsystem. Some companies have already developed specialized cores (often referred to as application-specific instruction processors, or ASIPs) that efficiently handle a narrowly-defined computational workload. However, crafting such cores requires specialized skills to define the instruction set, develop the processor microarchitecture, create the associated software tool chain, and finally, verify the core.

Codasip suggests that the only way to take specialization a step further is to create innovative architectures to tackle specialized processing problems. Hardware should be created to match the software workload; that can be achieved by customizing the instruction set architecture, creating special microarchitectures, or creating novel processing cores and arrays. ASIPs can be considered a subset of domain-specific accelerators (DSAs), a category defined in a 2019 paper by John Hennessy and David Patterson, “A New Golden Age for Computer Architecture”.

They characterized DSAs as exploiting parallelism (such as instruction-level parallelism, SIMD, or systolic arrays) when the class of applications benefits from it. DSAs can better match their computational capabilities to the intended application. One example is the Tensor Processing Unit (TPU) developed by Google, which is a systolic array working with 8-bit precision. The more specialized the processor, the greater the efficiency in terms of silicon area and power consumption; conversely, the less specialized the DSA, the greater its flexibility. On the DSA continuum there is the possibility of fine-tuning a core for performance, area, and power, enabling design for differentiation.

Specialization is not only a great opportunity, but it means that there will be many different designs created. Those designs will require a broader community of designers and a greater degree of design efficiency. Codasip sees four enablers that can contribute to the efficient design – the open RISC-V ISA, processor design language, processor design automation, and existing verified RISC-V cores for customization.

They feel that RISC-V, a free and open standard that covers only the instruction set architecture and not the microarchitecture, has garnered widespread support; it does not prescribe a licensing model, so both commercially licensed and open-sourced microarchitectures are possible. If designers use a processor design language such as Codasip’s CodAL, they have a complete processor description capable of supporting software, hardware, and verification aspects. Custom instructions are implemented by adding to the processor design language source, and can thus be reflected in the software toolchain and verification environment as well as the RTL.

Also read:

Optimizing AI/ML Operations at the Edge

Podcast EP60: Knowing your bugs can make a big difference to elevate the quality of verification



Podcast EP78: A Tour of DAC 2022 with Rob Oshana, General Chair
by Daniel Nenni on 05-11-2022 at 10:00 am

Dan is joined by Rob Oshana, general chair of this year’s DAC. Rob is vice president of software engineering R&D for the Edge Processing business line at NXP.  He serves on multiple industry advisory boards and is a recognized international speaker.  He has published numerous books and articles on software engineering and embedded systems.  He is an adjunct professor at the University of Texas and Southern Methodist University and is a Senior Member of IEEE.

Dan and Rob discuss the program for this year’s DAC: what the various parts of the conference will offer, plus a surprising discussion about the dynamics of moving back to a live event. This year’s DAC is shaping up to be a memorable event with many relevant topics and focus areas. You definitely want to hear the backstory.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Why Software Rules AI Success at the Edge
by Bernard Murphy on 05-11-2022 at 6:00 am


It is an unavoidable fact that machine learning (ML) hardware architectures are evolving rapidly. Initially most visible in datacenters (many hyperscalers have built their own AI chips), the trend is now red-hot in inference engines for the edge, each spinning new ground-breaking methods. Markets demand these advances to support bigger image and voice recognition problems in real time, for some level of local training, for local processing for privacy/security, and to reduce power and latency in communication to the fog or cloud. Product OEMs depend on those advances for differentiation in power, latency, privacy, cost, etc. But they are far from expert in the underlying hardware. Without that understanding, how can they fully exploit its advantages? That’s why software rules AI success at the edge: the software that maps between open AI solutions trained in the cloud and these highly optimized hardware platforms.

FlexLogix at Linley

Randy Allen, VP of software at FlexLogix, presented at the Linley Spring conference on this question, illustrating it with the FlexLogix InferX X1 product line. Briefly, the heart of the X1 is a dynamically reconfigurable tensor processor, in which the hardware datapath can be rapidly reconfigured for each layer of ML processing. As one layer completes processing, the next layer reconfigures in microseconds. X1 offers the software flexibility of a CPU solution but with the performance and power advantages of a full ASIC solution.

InferX is a good example of a highly optimized edge architecture: capable of amazing performance at amazingly low power, but only if used correctly. Meeting that goal requires compiler magic, knowing how to map optimally from one of the standard open-source networks (TensorFlow, PyTorch, etc.) into the underlying hardware architecture and to connect back to the OEM application, completely hands-free from an OEM developer’s point of view.

You can understand, then, why software (the compiler) can make or break a great edge AI solution: for edge devices, fantastic hardware is useless if it is only usable by experts.

More detail on the FlexLogix compiler

A compiler for one of these devices is a completely different animal from a regular software compiler. In this instance it must map TensorFlow Lite or ONNX operators produced from standard trained networks into a reconfigurable tensor processor in a way that maximizes throughput while minimizing power.

The compiler maps the many diverse layers in a typical network model into the tensor fabric with several constraints in mind. The first is to organize operations in the network for maximum parallelism. The second is to minimize off-chip memory traffic, since any operation that needs to go off-chip automatically incurs significant latency and power penalties. A major consideration in these compilers is therefore finding the greatest possible reuse of the data already on-chip: off-chip fetch and store operations for image data, weights, and activation functions are delayed as long as possible. Between these two constraints is where the X1 compiler works, scheduling, parallelizing, and fusing operations, maximizing on-chip data reuse, and finally generating a bit stream for that optimized model to program the InferX device.
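To make the reuse-versus-capacity tradeoff concrete, here is a toy fusion pass in the spirit described above. This is purely illustrative, not FlexLogix’s actual compiler: the layer sizes, the on-chip budget, and the greedy policy are all invented for the sketch.

    ON_CHIP_BYTES = 8 * 1024 * 1024  # hypothetical on-chip SRAM budget

    # (layer name, output activation size in bytes) -- invented values
    layers = [
        ("conv1", 6_000_000),
        ("relu1",   500_000),
        ("conv2", 9_000_000),
        ("pool1", 2_000_000),
    ]

    def fuse_layers(layers, budget):
        """Greedily group consecutive layers: if a layer's output fits
        on-chip, the next layer can consume it in place; otherwise the
        tensor spills to DRAM and a new fused group begins."""
        groups = [[layers[0][0]]]
        for (_, out_bytes), (next_name, _) in zip(layers, layers[1:]):
            if out_bytes <= budget:
                groups[-1].append(next_name)   # stays on-chip: fuse
            else:
                groups.append([next_name])     # off-chip round trip
        return groups

    print(fuse_layers(layers, ON_CHIP_BYTES))
    # [['conv1', 'relu1', 'conv2'], ['pool1']] -> one spill to DRAM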

Going deeper

You can learn more about the InferX products HERE. If you’re interested in digging deeper into the compiler technology, there’s a nice presentation by Jeremy Roberson (also at FlexLogix), from the 2021 Spring Linley conference.


High-speed, low-power, Hybrid ADC at IP-SoC
by Daniel Payne on 05-10-2022 at 10:00 am


Andrew Levy and I both worked at Intel and Opmaxx, and I knew that he was now working at Alphacore, an IP company specializing in mixed-signal, RF, imaging and rad-hard applications. I was curious what Alphacore was up to, so at the IP-SoC Silicon Valley 2022 event I watched the ADC presentation from Ken Potts, COO of Alphacore. Mr. Potts has been in the semiconductor industry for over three decades, including stints at Silicon Technologies, Cadence, Virage Logic,  Compass Design Automation and VLSI Technology.

I learned from a video interview at the event that Alphacore doubled in size during 2021, and is on track to double again in 2022, so that says a lot about their success in the IP marketplace. Ken’s presentation at IP-SoC was all about their progress in designing hybrid ADC circuits. Applications for RF data converters include: 5G radios, beamforming, direct to RF sampling, and phased array architectures. The challenge is how to achieve high bandwidth while also consuming low power.

The ideal process technology for delivering low power has been FDSOI, so Alphacore has done data converter designs with both STMicroelectronics 28nm and Globalfoundries 22nm. FDSOI delivers about 70% lower power when compared to a 28nm bulk CMOS process node, so that’s a compelling reason to design with FDSOI. Another attraction of FDSOI technology is its tolerance to ~100krad TID, something aerospace designers are concerned about for reliable operation in orbit.

Mr. Potts shared that they have designed a 10-bit, 5GS/s ADC that consumes only 19.7mW, using an 800mV supply. As you lower the sampling rate to 3GS/s, the power dips even lower, to just 12.8mW.

Hybrid ADC

With ADC circuits there’s a question of gain and offset errors, so the good news is that Alphacore has an auto-calibration algorithm to eliminate interleaving spurs as shown below in the output spectrum results:

Spurs removed after calibration

For ADC circuits there’s a Figure of Merit (FOM) to help compare performance; the formula is Power / (Fs * 2^ENOB), and plotting the Alphacore results shows about 10X lower power while achieving GS/s conversion rates:

FOM comparison
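Plugging numbers into that formula gives a feel for it. Here is a quick sketch using the 19.7mW, 5GS/s part quoted above; note the presentation quotes 10-bit resolution but not ENOB, so the 8.5-bit ENOB below is an assumed placeholder:

    def adc_fom(power_w: float, fs_hz: float, enob_bits: float) -> float:
        """Walden-style ADC figure of merit: Power / (Fs * 2^ENOB),
        in joules per conversion step (lower is better)."""
        return power_w / (fs_hz * 2 ** enob_bits)

    # 19.7 mW at 5 GS/s for the 10-bit ADC; the 8.5-bit ENOB is assumed.
    fom = adc_fom(19.7e-3, 5e9, 8.5)
    print(f"{fom * 1e15:.1f} fJ/conversion-step")  # ~10.9 fJ/step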

ADC IP Choices

There are several ADC IP blocks to choose from that use the STMicroelectronics 28nm process:

  • 6-bit, 5GS/s at 14mW – A6B5G
  • 9-bit, 1GS/s at 2mW – A9B1G
  • 10-bit, 2.4GS/s at 6mW – A10B2G
  • 4-bit, 20GS/s Flash ADC – A4B20G

ADC circuits with 22FDX from Globalfoundries include several choices:

  • 10-bit, 3GS/s at 13mW – A10B3G
  • 10-bit, 5GS/s – A10B5G

The complete table of ADC offerings shows which IP blocks are verified GDSII, silicon validated, or in foundry qualification status:

FDSOI CMOS IP Library

Design Kit

When you work with an IP vendor like Alphacore, they deliver to you a complete set of files so that you can place your converter block in a design, and complete your integration with simulation, verification and implementation:

  • GDSII layout file
  • RTL files
  • Schematic
  • Abstract view
  • All DRC/LVS logs
  • Extracted view
  • Extracted simulation model
  • Verilog-AMS models
  • Guide for DFT and I/O requirements

Alphacore Customers

The actual list is proprietary, but be assured that the customers represent many segments: mil-aero, image sensors, advanced electronics, defense contractors, radar, national laboratories, and research agencies. The ADC IP products are being used in six end segments:

  • Wireless and wireline communications
  • Defense applications
  • Test equipment
  • Imaging and Lidar
  • Automotive
  • Scientific & Industrial Instrumentation

Summary

I was pleased to learn about the rapid growth at Alphacore, which validates their strong position in hybrid ADC IP blocks, achieving high bandwidth at low power through clever design and use of FDSOI process technology. They have design kits in place, and most of their IP is already silicon proven, so that makes it a safe choice.

Related Blogs


WEBINAR: Challenges in analyzing High Performance clocks at 7nm and below process nodes
by Daniel Nenni on 05-10-2022 at 6:00 am


Proper clock functionality and performance are essential for SoC operation. Static timing analysis (STA) tools have served well for verifying clocks, yet with new advanced process nodes, lower operating voltages, higher clock speeds, and higher reliability requirements, STA tools alone can’t perform the kinds of analysis needed for clock sign-off anymore. At 7nm and below, a clock failure due to rail-to-rail, duty cycle distortion, or aging issues can jeopardize an entire project. To help find and solve these problems, San Jose-based Infinisim has developed a product called ClockEdge that uses advances in simulation in conjunction with software specifically devoted to analyzing clocks.

REGISTER HERE

ClockEdge is most relevant for clock speeds in excess of 1GHz and clocks designed at below 10nm process nodes. It can handle traditional clock tree structures and also works with grid, mesh and spine-based clocks. ClockEdge overcomes the limitations that STA encounters, offering deeper insights into clocks that help with performance, power, reliability and more. At advanced process nodes STA tools guard-band their results, leading to over-design and unnecessary power consumption. STA also suffers due to lower operating voltages and non-linear device behavior. Results from STA miss rail-to-rail failures, aging effects and supply induced jitter, all of which can lead to chip failures.

Infinisim’s ClockEdge does more than just look at timing; it ensures that the clock is also functionally correct. It delivers SPICE-accurate results, typically with overnight turnaround even for the largest SoCs. ClockEdge performance is achieved through linear scaling using LSF jobs, unlike multithreading, which plateaus at around 10-20X. ClockEdge analyzes clock performance at multiple PVT corners and will perform HCI and NBTI aging analysis. Another benefit of ClockEdge is its ability to compute peak-to-peak, average power and leakage current for each gate in the clock.

Clock analysis rail to rail

Unlike STA, ClockEdge analyzes the entire clock domain, looking at every clock path for its timing and electrical analysis at the same time. Going beyond one path at a time can uncover situations where there is excessive guard-banding or a lurking failure. In advanced process node clock designs, duty cycle distortion (asymmetry in high and low pulse widths) and rail-to-rail failures are often missed by STA but accurately predicted by ClockEdge. If not detected, both of these errors can cause a host of problems and lead to timing failures in the finished chip. ClockEdge does full analog signal analysis to catch and report these issues.

ClockEdge is easy to use because the entire flow is focused on clock analysis. It automatically performs gate level tracing and sensitization, which is followed by transistor level simulation. ClockEdge has comprehensive post processing to generate the reports and the information needed to interpret clock functionality and performance results. As for inputs, ClockEdge uses the same information and data that are used by STA.

Clocks are too important to leave to STA at advanced nodes. A lot needs to be looked at, including power, rail-to-rail and aging to ensure design success. This is especially true for designs below 10nm, where many of these issues can slip through if only STA is used to look at clock issues. Infinisim has put a lot of work into ClockEdge, and they have gained acceptance with major semiconductor companies working on leading edge designs. Their website includes more information on the flow for ClockEdge.

REGISTER HERE

About Infinisim
Infinisim, Inc. is a privately funded EDA company providing design verification solutions. Founded by industry luminaries, the Infinisim team has over 50 years of combined expertise in design and verification.

Infinisim customers are leading edge semiconductor companies and foundries that are designing high-performance mobile, AI, CPU and GPU chips.

Infinisim has helped customers achieve unprecedented levels of confidence in design robustness prior to tape-out. Customers have been able to eliminate silicon re-spins, reduce chip design schedules by weeks and dramatically improve product quality and production yield. www.infinisim.com

Also Read

WEBINAR: Overcome Aging Issues in Clocks at Sub-10nm Designs

White Paper: A Closer Look at Aging on Clock Networks


Designing Ultra-Low-Power, Always On IP
by Daniel Payne on 05-09-2022 at 10:00 am


It’s popular to use DSP chips for vision processing in diverse applications like ADAS, security cameras, and AR. Tensilica has been designing DSP chips and IP since 1997, and their technology was successful enough that Cadence acquired Tensilica back in 2013. At the IP-SoC Silicon Valley 2022 event in April I had the pleasure of watching the presentation from Amol Borkar, Product Marketing Director, IPG TIP, at Cadence Design Systems; his topic was titled Designing the Next Ultra-Low-Power Always-On Silicon.

In the past 25 years, over 40 billion processors have been shipped by Tensilica customers, and 19 of the top 20 semiconductor vendors use Tensilica IP; three recent customers include UNISOC (application processor), Light (auto vision), and Xvision (AR). The trends for processors and sensors can be shown across five categories, and the ultra-low-power segment requires a long battery life:

Processing and Sensor Trends

Microphones and ultrasonic sensors are common approaches to wake up electronic devices, and cameras are now being used for people detection, face detection, even gestures. Amol focused on vision and speech processing for always on embedded and IoT devices. An always-on (AON) device runs continuously, reads inputs from sensors, and often dissipates under 1mW of power, while using a microcontroller.

Waking up a device requires three stages: Detection, Authentication, and Processing. For AON, the focus is on the Detection stage, where a low-power MCU or DSP is typically used to do something like keyword spotting, face detection, visual wake words, or gesture handling. An AON subsystem could be designed with both a vision AON processor and an audio AON processor, or with a combined IP for both workloads.
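A minimal sketch of that three-stage flow is shown below. The stub functions are hypothetical placeholders standing in for the sensor and DSP hooks (they are not Tensilica APIs); they exist only to make the staged power behavior concrete:

    import random, time

    # Hypothetical stubs for the sensor and DSP hooks.
    def read_low_res_frame():   return [[0] * 96 for _ in range(96)]
    def detect_face(frame):     return random.random() < 0.1   # AON stage
    def read_high_res_frame():  return [[0] * 1920 for _ in range(1080)]
    def authenticate(frame):    return random.random() < 0.5   # vision DSP

    def always_on_loop(iterations: int = 20) -> None:
        """Staged wake-up: Detection runs continuously on the low-power
        AON processor; the costlier Authentication and Processing stages
        are powered up only after a positive detection."""
        for _ in range(iterations):
            if detect_face(read_low_res_frame()):        # 1. Detection
                if authenticate(read_high_res_frame()):  # 2. Authentication
                    print("wake application processor")  # 3. Processing
            time.sleep(0.01)  # polling cadence; real designs use interrupts

    always_on_loop()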

For audio and speech dominant AON processing tasks there is a processor family called the Tensilica HiFi DSP, then for image and camera dominant AON processing, there is a processor family called the Tensilica Vision DSP. The low-energy DSP for vision and AI tasks is the Tensilica Vision P1 DSP.

Always-On Face Detection

The earliest approaches used a camera connected to a CPU or App processor, although the neural network algorithms on a CPU are not very energy efficient. A more energy efficient approach is to have the camera connected to a vision DSP first, then alert the CPU or App processor after detection. The most energy efficient approach is shown below, where the P1 DSP reads a low resolution image and detects a face, then a vision DSP reads the high resolution image to detect, recognize and authenticate while connected to a CPU or app processor.

Always on face detection

The DSP and CPU only wake up after the P1 does the face detection, so this approach takes the least energy. Even after the face is detected, you can choose to have the P1 perform co-processing tasks or have the P1 go into standby. Benchmark results compared a competitor to a Vision P1 system using TinyML v0.5 for visual wake word, image classification, keyword spotting, and anomaly detection. The Vision P1 system delivered more inferences per second at much lower energy than the competitor.

TinyML v0.5 benchmark comparison

Some customers use the Vision P1 DSP with their smart image sensors to classify objects and support bounding boxes and segmentation. AR applications for foveated rendering have used a Vision P1 DSP for pupil-center and gaze detection. Aside from AON, Vision P1 is a fully functioning vision DSP, and there’s plenty of support that comes along with the P1:

  • Imaging and vision DSP tools and libraries
  • Supports AI flow of XNNC, NNAPI, TFLm
  • 100+ model zoo networks

Summary

The Cadence Tensilica products for both audio and vision are quite popular, as they’ve been in the DSP business for decades now. The Vision P1 DSP has been optimized for AON applications, and performs tasks like face detection quite efficiently. There’s a 30 minute video of this presentation online.

Related Blogs


Advantages of Large-Scale Synchronous Clocking Domains in AI Chip Designs
by Kalar Rajendiran on 05-09-2022 at 6:00 am

Large models challenge current AI hardware solutions

We are currently in the hockey stick growth phase of AI. Advances in artificial intelligence (AI) are happening at a lightning pace. And while the rate of adoption is exploding, so is model size. Over the past couple of years, we’ve gone from about two billion parameters to Google Brain’s recently announced trillion-parameter AI language model, the largest yet. But with model size and complexity growing at a faster rate than hardware compute capability, AI hardware is running out of steam, and OEMs are looking for new solutions.

Compute challenges are nothing new to the technology industry, so what is different this time around? To date, the tried-and-true means to increase performance has been to multiply the number of tiles, cores, or chips. However, more cores and larger chips amplify a long-standing problem for designers. Chip developers need to continuously battle the innate physics of large chips, combating skew, process variation, and aging effects. And these effects are only multiplied as companies transition to smaller process geometries. As engineers add more processors to a design, enacting a synchronous design at high frequencies becomes an almost insurmountable task. Physical design engineers are forced to overdesign these massive chips, leading to unnecessary clocking overhead and a decrease in inference rates or an increase in training times.

At the recent Linley Spring Processor Conference, Movellus’ Aakash Jani presented on the challenges and opportunities of scaling performance in very large, many-core chip designs. He shows us how an innovative approach to clocking allows greater synchronization throughout the design and enables more efficient and scalable performance for emerging AI applications.

Requirements Driving Scalable AI

Data centers, autonomous vehicles and computer vision are some of the applications that are pushing the limits of scalable AI. The old way of throwing more chips and/or processors at the problem does not lead to a scalable solution. Refer to the Figure below.

In the era of AI, big multicore chips are the new normal. More tiles or cores require more area. More area leads to more power consumption, more interconnect, more latency, and more skew. All of these chip infrastructure overhead problems are amplified on larger designs, and all of them are impacted by the clock network. The above graph shows some well-known AI processors that use a multicore approach to increase performance. The problem is that as more cores are added, the performance per core decreases. This is due to chip infrastructure overhead and, to a large degree, inefficiencies in the clock network.

Today, designers address clocking issues with a divide and conquer approach. They may tackle the biggest offenders first and make incremental changes until they meet design requirements. But if we approach the problem holistically, there is an opportunity for major gains in power-efficiency and performance. Additionally, we can open the door to creating large synchronous clock domains, allowing engineers to scale their systolic arrays for the next generation of multi-trillion parameter models.

Movellus Solution

Movellus presented its holistic clocking solution: intelligent clock networks. What, exactly, is an intelligent clock network? Every chip begins with a perfect clock signal. However, as the signal travels through the chip, it is often delayed and distorted because of process variation and the physics of the chip. Intelligent clock networks bypass most of these problems to help clock architects deliver an ideal clock signal to every flop. These networks achieve this lofty goal using strategically placed smart clock IP modules throughout the chip. Smart clock modules use Movellus’ intelligent clock network technology to actively compensate for skew, process variation, and aging. Smart clock modules are also aware of other smart clock modules and can synchronize with them via a closed feedback loop to create large synchronous clock domains. The beauty of this approach is that it eliminates the need for a multitude of retiming flops and clock domain crossing (CDC) buffers, and thereby avoids a ton of clocking overhead and system latency. It also reduces design complexity and greatly eases timing closure.

The above chart compares Movellus’ intelligent clock network approach with today’s popular solutions, including a tool driven methodology with clock tree synthesis (CTS) and a semi-custom strategy that implements a mesh. The chart shows design tradeoffs regarding fmax, useful clock period, process flexibility, power and area efficiency, and ease of timing closure. Intelligent clock networks can bring the combined advantages of today’s solutions by offering the performance of a mesh at the power consumption of a tree.
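The closed feedback loop mentioned above can be illustrated with a first-order discrete control loop in which a module trims a fraction of its measured phase error against a reference each cycle. The gain and drift numbers below are invented for illustration; they are not Movellus parameters:

    GAIN = 0.25       # loop gain (assumed)
    DRIFT_PS = 0.5    # per-cycle drift from variation/aging (assumed)
    skew_ps = 35.0    # initial skew against the reference, in picoseconds

    for cycle in range(21):
        skew_ps = (skew_ps + DRIFT_PS) * (1.0 - GAIN)  # drift, then correct
        if cycle % 5 == 0:
            print(f"cycle {cycle:2d}: residual skew = {skew_ps:6.2f} ps")

    # Instead of accumulating without bound, the residual skew settles at
    # DRIFT_PS * (1 - GAIN) / GAIN = 1.5 ps in this toy model.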

Summary

Movellus shows how an intelligent clock network that takes a holistic approach to clocking delivers a significant performance enhancement compared to individual clock network component optimizations. The company introduces its new product, Maestro AI, an intelligent clock network IP platform. Maestro AI enables SoC designers to remove unwanted, accumulating system-level latency in larger chips and chiplets. Maestro intelligent clock network solutions occupy a much smaller area than alternative solutions. The solution enables designers to expand the size of synchronous clock domains, and since it is offered in soft IP form, it is easily configurable to customer application requirements and portable to any process technology.

On-Demand Access to Aakash’s talk and presentation

You can listen to Aakash’s talk, “Advantages of Large-Scale Synchronous Clocking Domains in SoCs and Chiplets” here, under Session 4.  You will find his presentation slides here, under Day 2- AM Sessions.

Also read:

It’s Now Time for Smart Clock Networks

CEO Interview: Mo Faisal of Movellus

Performance, Power and Area (PPA) Benefits Through Intelligent Clock Networks


Demonstration of Dose-Driven Photoelectron Spread in EUV Resists
by Fred Chen on 05-08-2022 at 10:00 am


As a consequence of having a ~13.5 nm wavelength, EUV photons transfer ~90% of their energy to ionized photoelectrons. Thus, EUV lithography is fundamentally mostly EUV photoelectron lithography. The actual resolution becomes dependent on photoelectron trajectories.

Photoelectron trajectories in EUV lithography were first extensively studied by Kotera et al. [1]. Photoelectrons are preferentially generated along the polarization direction. As EUV light is unpolarized and propagating mostly vertically in the resist, the photoelectron propagation direction tends to be in any random horizontal direction. Thus, the particular direction can be chosen from a uniform random distribution of angles between 0 and 360 degrees. At the same time, the distance traveled can be selected from a random quantile of an exponential distribution [2].

As shown in the figure below, the resulting distribution of distances over which the photoelectrons migrate and deposit their energy is randomly and irregularly shaped. It also depends very much on how many photoelectrons are generated within the same volume of resist, which is proportional to the dose.

EUV photoelectron lateral spread vs. accumulated dose, showing 1X, 2X, 3X, and 4X nominal dose levels. The top and bottom are two separate cases. Initial photoelectron position is (0,0). Nominal 1X dose is 60 mJ/cm2. With an absorption coefficient of 20/um [3], 22% of the light is absorbed in the lower half of the 40 nm metal-oxide resist, leading to 9 photoelectrons/nm2. Poissonian shot noise is not considered here. Axis labels are in nm.
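The sampling procedure described above (a uniform azimuthal angle and an exponentially distributed travel distance drawn from a random quantile) can be sketched in a few lines. The ~5 nm mean free path is an assumed placeholder for illustration, since the effective value depends on resist chemistry and electron energy:

    import math, random

    def photoelectron_endpoints(n_electrons: int, mean_path_nm: float):
        """Sample lateral travel as described above: a uniform random
        horizontal direction and an exponentially distributed distance
        drawn via the inverse CDF of a random quantile."""
        pts = []
        for _ in range(n_electrons):
            theta = random.uniform(0.0, 2.0 * math.pi)           # 0-360 deg
            r = -mean_path_nm * math.log(1.0 - random.random())  # exponential
            pts.append((r * math.cos(theta), r * math.sin(theta)))
        return pts

    # 9 photoelectrons/nm^2 at the nominal 1X dose (from the caption above);
    # higher doses generate proportionally more electrons in the same volume.
    for dose in (1, 2, 3, 4):
        pts = photoelectron_endpoints(9 * dose, mean_path_nm=5.0)
        spread = max(math.hypot(x, y) for x, y in pts)
        print(f"{dose}X dose: max lateral spread ~ {spread:4.1f} nm")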

Increasing the dose very obviously increases the photoelectron spread, effectively increasing the blur. Yet a higher dose is needed to reduce the impact of Poissonian shot noise, which has not been considered here. The 3-sigma deviation for 9 absorbed photons is 3/sqrt(9) = 100% [4]! Quadrupling the dose would halve it to 50%. There is therefore an unavoidable resolution-roughness tradeoff in EUV photoresists.

References

[1] M. Kotera et al., “Extreme Ultraviolet Lithography Simulation by Tracing Photoelectron Trajectories in Resist,” Jpn. J. Appl. Phys. 47, 4944 (2008).

[2] https://en.wikipedia.org/wiki/Exponential_distribution

[3] A. Grenville et al., “Integrated Fab Process for Metal Oxide EUV Photoresist,” Proc. SPIE 9425, 94250S (2015).

[4] https://en.wikipedia.org/wiki/Shot_noise

This article first appeared in LinkedIn Pulse: Demonstration of Dose-Driven Photoelectron Spread in EUV Resists 

Also Read:

Adding Random Secondary Electron Generation to Photon Shot Noise: Compounding EUV Stochastic Edge Roughness

Intel and the EUV Shortage

Can Intel Catch TSMC in 2025?


The Jig is Up for Car Data Brokers
by Roger C. Lanctot on 05-08-2022 at 6:00 am


The same week that John Oliver took on the topic of privacy on his HBO program “Last Week Tonight,” one of the leading automotive data brokers – Otonomo – became the target of a class action lawsuit in California. While Oliver detailed the creepiness of everyday privacy violations on computers, mobile phones, and connected televisions, the lawsuit highlighted the privacy-compromising possibilities of connected cars.

Lawyers from Edelson PC have alleged that Otonomo collects location data on thousands of California residents resulting in a violation of the California Invasion of Privacy Act. The implications are sobering for both car makers and the dozen or more vehicle data brokers that have given form to this new market sector.

Otonomo rose to prominence with a billion-dollar SPAC in 2021. The special purpose acquisition company (SPAC) merger in August of last year valued money-losing Otonomo at $1.09B with a stock price of $8.31 a share. The gross proceeds of the SPAC were $255.1M for Otonomo, whose stock today stands at less than $2.

Otonomo was not alone in SPAC-ing in 2021. Fellow data broker startup Wejo concluded its own successful SPAC merger following Otonomo’s. Like Otonomo, Wejo’s stock has plunged since going public and the most recent quarterly results – reported March 31 – show an enormous $67.7M net loss with a further $100M+ loss anticipated for 2022.

Enthusiasm for “monetizing” car data was inspired by a notorious 2016 McKinsey report which estimated the value of car data at between $450B and $750B. What many readers of the report ignored was the portion of that forecasted value that was derivative or indirect. These literal interpreters and their car company enablers have sought to directly monetize vehicle data by literally selling it to growing networks of service providers and retailers also seeking to cash in.

The McKinsey report unleashed a gold rush among startups and established players seeking to unlock those billions. Car companies themselves got in on the act with companies such as Volkswagen, Audi, and Stellantis proclaiming that they would soon derive more value and revenue from car data than they would from selling vehicles.

Companies including Caruso and Xapix emerged alongside Wejo and Otonomo. All of these companies began collecting data from a range of sources including direct access to vehicle data from car makers (GM and Volkswagen sharing data with Wejo) as well as data derived from aftermarket devices and connected car smartphone apps.

The dismal reality is now setting in that not only is it difficult to extract value from car data (which often needs to be “cleaned up” and anonymized), it may also be illegal if not done properly. Data monetizers need expertise to extract value from vehicle data and they also need consumer consent.

Most early vehicle data applications derived first and foremost from vehicle tracking – i.e. the basic business of locating vehicles which may be in transit, making deliveries, or stolen. Of late, the emphasis has shifted toward vehicle diagnostics and driver behavior for service scheduling and insurance applications, respectively.

Most rental agreements and new car sales agreements include language regarding customer consent – but consent is often obtained as part of a fairly opaque process. The consumer scans past the small print that gives the dealer or auto maker the right to collect and resell vehicle data. This appears to be the case in the Otonomo class action.

Vice.com reports: “The plaintiff in the case is Saman Mollaei, a citizen of California. The lawsuit does not explain how it came to the conclusion that Otonomo is tracking tens of thousands of people in California.

“Mollaei drives a 2020 BMW X3, and when the vehicle was delivered to him, it contained an electronic device that allowed Otonomo to track its real-time location, according to the lawsuit. Importantly, the lawsuit alleges that Mollaei did not provide consent for this tracking, adding that ‘At no time did Otonomo receive—or even seek—Plaintiff’s consent to track his vehicle’s locations or movements using an electronic tracking device.’

“More broadly, the lawsuit claims that Otonomo ‘never requests (or receives) consent from drivers before tracking them and selling their highly private and valuable GPS location information to its clients.’ The lawsuit says that because Otonomo is ‘secretly’ tracking vehicle locations, it has violated the California Invasion of Privacy Act (CIPA), which bans the use of an ‘electronic tracking device to determine the location or movement of a person’ without consent.”

Otonomo says it protects customer privacy and obtains consent. Vice previously published an account of a 2021 investigation which revealed that data obtained from Otonomo could be used “to find where people likely lived, worked, and where else they drove.”

There is little doubt that vehicle data has value. Whether the value of that data amounts to hundreds of billions of dollars is debatable. What is clear, though, is that the organizations looking to cash in need consumer consent.

The Alliance for Automotive Innovation, representing the interests of foreign and domestic auto makers, has a privacy policy on its Website with two principles:

Participating automakers commit to:

1. Providing customers with clear, meaningful information about the types of information collected and how it is used

2. Obtaining affirmative consent before using geolocation, biometric, or driver behavior information for marketing and before sharing such information with unaffiliated third parties for their own use

Andrea Amico, founder of Privacy4Cars, a data privacy advocacy group, noted that the automotive industry continues to fall well short of these objectives. Amico said that auto makers seeking to access vehicle data need to provide:

1. A clear, prominent notice

2. A method for the customer to provide affirmative consent

3. An easy-to-use method for the customer to opt out

4. A process for managing a change in vehicle ownership

5. A process for erasing data

The one exception might be data related to the operation of a safety or emergency response system, as well as data reporting required by regulatory authorities. Of course, internal use of data by an auto maker is to be distinguished from data used by, sold to, or shared with third parties.

As an example, Aiden is a vehicle data startup focused on the growing number of cars arriving in the market equipped with the Android Automotive operating system. Last week and at the Consumer Electronics Show in January, Aiden demonstrated a first notice of loss (FNOL) car insurance application with an in-dash consumer consent element.

The application provides a complete in-dash display of all data to be shared with the insurance company with the option for the consumer to “Accept,” delay a response: “Later,” or “Reject.” This is a clear and prominent display requiring an affirmative response (which could also be a rejection).

SOURCE: In-dash screen shot of Aiden FNOL consent form.

Current auto maker and, by extension, new car dealer privacy policies are inadequate to fulfill the requirements of prominent notification, affirmative consent, customer control, and ownership transfer. Under current conditions, the Privacy4Cars representative suggested that the Otonomo class action is not likely to be the last.

Car makers, rental car companies, and operators of car sharing programs need to review their policies for data management and privacy. We know it isn’t easy to monetize vehicle data – as demonstrated by two money-losing SPACs. Now we know it also requires consent.

Also Read:

ITSA – Not So Intelligent Transportation

OnStar: Getting Connectivity Wrong

Tesla: Canary in the Coal Mine