
Podcast EP79: Alphacore’s Capabilities and Growth Plans with Ken Potts

by Daniel Nenni on 05-13-2022 at 10:00 am

Dan is joined by Ken Potts, Alphacore’s Chief Operating Officer. Ken has over 30 years of successful entrepreneurship in both Fortune 100 and emerging technology companies. Ken has held numerous executive and operational leadership roles in semiconductor products, semiconductor IP, and electronic design automation.

Dan and Ken discuss Alphacore’s current complement of high performance IP and chip services. The portfolio and customer base are detailed by Ken, with a discussion of what lies ahead in terms of customer growth and product expansion.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Chuck Gershman of Owl AI

by Daniel Nenni on 05-13-2022 at 6:00 am


Chuck Gershman is the CEO and co-founder of Owl Autonomous Imaging, Inc. Chuck is a Drexel University College of Engineering inductee into the Alumni Circle of Distinction, the highest honor bestowed upon alumni. He has been honored as a finalist for CMP Publications’ (EE Times) prestigious ACE Award as High Technology Executive of the Year and was previously named a Top 40 Healthcare Transformer by Medical Marketing & Media for his work on Clinical AI Decision Support for cancer patients. Chuck holds three US patents for his contributions to Microprocessor Architecture.

Chuck brings over 30 years of technology and semiconductor industry experience in executive management, marketing, engineering, business development, sales, consulting, and executive advising. Including Owl Autonomous Imaging, Mr. Gershman has served as CEO/COO and a Board Director for three companies. He knows what it takes to lead a vision to reality, having led successful exits with acquisitions by Intel and PMC-Sierra.

What is the backstory of Owl AI?
The foundation of Owl’s technology was created under a challenge grant from the US Air Force to track ballistic missiles in flight. Leveraging this technology and the associated patent portfolio, Owl has developed a monocular Thermal Ranging™ camera that provides HD thermal video with precision ranging and delivers 150x better spatial resolution than LiDAR (500x that of radar). A number of our team members come from Kodak, where they helped to develop the first commercial digital cameras and the first optical scanner. With regard to thermal imaging, our team has developed two thermal cameras that are currently deployed in space. The team also recently completed a military uncooled thermal design for one of the most advanced military grade thermal cameras developed to date.

What problems/challenges are you solving?
We are basically improving the sensing and perception of living things such as humans and animals with our 3D dense range map, regardless of time of day and regardless of visual impairments such as fog, rain, sleet, snow, exhaust, glare and speed, to name a few.

What markets does Owl address today?
Owl addresses automotive safety markets such as ADAS and AVs, industrial off-road markets that require robotic mobility, and select military applications. With regard to automotive safety, automatic emergency braking (AEB) has quickly evolved into a must-have feature. This capability has now moved beyond automated braking for large objects like cars, buses or trucks to braking for pedestrians and animals. This is known as Pedestrian AEB (PAEB). Though these systems have been shown to dramatically reduce accidents, the current class of systems completely fails when operating at night. Testing completed earlier this year by the Insurance Institute for Highway Safety (IIHS) reported a 32% reduction in pedestrian crashes for systems enabled with PAEB versus those without during the day. However, they also found absolutely no difference in crash rate when operating at night. A complete fail. That is where Owl AI comes in.

What makes Owl AI and Owl AI’s products unique?
Owl’s new monocular ranging sensor system, called the Thermal Ranger, outputs a megapixel (MP) of thermal (night vision) video in parallel with optically fused, 3D range-maps that are similar in appearance to LiDAR and radar range map formats, but delivering orders of magnitude more data points per second. Owl’s solution is analogous to recent announcements of 3D single camera computer vision systems operating in the visual domain; however, Owl’s Thermal Ranger is unique as it delivers rich detail and 3D response day or night, including operation in extreme visually impaired environments known as DVE.

The Thermal Ranger is composed of a first of its kind Megapixel Digital Focal Plane Array (MP-DFPA) semiconductor chip producing nearly four times the resolution of today’s analog-based VGA thermal cameras. The Thermal Ranger also includes a multi-aperture optical component (MLA), and a suite of Convolutional Neural Network (CNN) ranging software for true thermal computer vision. The sensor operates in the thermal spectrum (longwave Infrared) allowing it to see the world clearly, in high-resolution, through adverse DVE and any lighting condition for instant classification and 3D location of pedestrians, cyclists, animals, vehicles, and other objects of interest. This is a true no light system, not to be misconstrued with a low light camera (NIR or SWIR).

This low cost, compact, single lens (monocular) system outputs megapixel HD thermal video producing vivid clarity, while simultaneously generating 3D range maps of up to 90 million points per second, which is orders of magnitude more angular and spatial resolution than LiDAR or radar sensors. For PAEB systems, the novel MLA enables simultaneous capture of both wide angle and telephoto fields of view (FOV) through a single main lens, providing wide angle curb to curb response (100 degrees) while enhancing 2D long-range response to well beyond 300 meters and delivering high accuracy 3D range response at distances of over 185 meters. Removing the MLA and installing a telephoto lens with a FOV optimized for long haul highway scenes tailors the system to long haul AV trucking applications, with object detection response up to 400m, well beyond any other sensor available today including LiDAR.

What’s next for Owl AI? Or what is Owl AI’s future direction?
Owl currently has paying customers. Owl recently completed a Series A financing round of $15M to help us accelerate our development and we are focused on executing on our technology roadmap and expanding our go-to-market resources. We are starting to engage higher volume opportunity customers as well as identify and plan for future optimizations of our roadmap with key customer input. We believe our solution is cost effective today and we will continue to align our products with a strong value proposition over the long term.

Additional thoughts? 
We believe that today’s ADAS, AV and Robotic Mobility systems will be improved through the sensor diversity achieved by adding this fourth sensor modality. Lastly, our solutions are being designed with automotive quality standards in mind and we intend to meet the needs of the massive opportunity in this market.

Also Read:

CEO Interviews: Dr Ali El Kaafarani of PQShield

CEO Interview: Dr. Robert Giterman of RAAAM Memory Technologies

Experts Talk: RISC-V CEO Calista Redmond and Maven Silicon CEO Sivakumar P R on RISC-V Open Era of Computing


Semiconductor Crash Update

by Daniel Nenni on 05-12-2022 at 10:00 am

Semiconductors are Capturing Electronics

Earlier this year semiconductor oracle Malcolm Penn did his 2022 forecast, which I covered here: Are We Headed for a Semiconductor Crash? The big difference with this update is the black economic clouds that are looming, which may again highlight Malcolm’s forecasting prowess. I spent an hour with Malcolm and company on his Zoom cast yesterday and now have his slides. Great chap that Malcolm.

The $600B question is: When will the semiconductor CEOs start issuing warnings?

RECAP: PERFECT STORM BROKE IN JULY 2020 (BUT NOBODY WAS PAYING ATTENTION)

The 6.5% growth in 2020 set the stage for a big 2021, which ended up at 26%. Previous semiconductor records were 37% in 2000 and 32% in 2010, so 26% is not that big of a number, meaning we will have a shorter distance to fall. Covid was the trigger for our recent shortages, but it really was a supply/demand imbalance camouflaged by a crippled supply chain.

The big difference today is the daunting economic and geopolitical issues that could possibly raise Malcolm to genius forecasting level. These include the horrific geopolitics with Russia and China, rampant inflation, the workforce challenges around the world, and of course Covid, which is not done with us yet, not even close.

Let’s take a look at the four key drivers from Malcolm’s presentation:

  1. Economy: Determines what consumers can afford to buy.
  2. Unit Demand: Reflects what consumers actually buy plus/minus inventory adjustments.
  3. Capacity: Determines how much demand can be met (under or over supply).
  4. ASPs: Sets the price units can be sold for (supply – demand + value proposition).

The Economy is the big change since I last talked to Malcolm. In my 60 years I have never experienced a more uncertain time other than the housing crash in 2008, when a significant amount of my net worth would disappear overnight. Today, however, I am a financial genius for holding fast. Property values here are about double the 2008 peak, which is great but is also a little concerning.

Bottom line: I think we can all agree the economy is in turmoil with the inflation spike and the jump in interest rates and debt. Maybe some financial experts can chime in here but in my experience this trend will get worse (recession?) before it gets better.

Unit Demand is definitely increasing due to the digitalization transformation that we have been working on for years. Here is a slide from a keynote at the Siemens EDA Users meeting last week. I will be writing about this in more detail next week. Unit volume is the great revealer of truth versus revenue so this is the one to watch. Unfortunately “take or pay” and “prepay” contracts are becoming much more common and that can disturb unit demand as a forecasting metric.

Bottom line: Long term semiconductor unit demand will continue to grow in my opinion (not at the rate we experienced in 2021) but that will largely be due to the Covid backlog and inventory builds. The big risk here is China. China is in turmoil and they are the largest consumer of semiconductors. China stemmed the first Covid surge with draconian measures which they are again employing, and again the electronics supply chain is impeded. Other parts of the world who are not paying attention to what is happening in China will suffer the consequences in the months to come, in my opinion.

Capacity is a tricky one. Let’s break this one into two parts: Leading edge nodes (FinFETs) and mature nodes (Not FinFETs). We are building leading edge capacity with impunity. It’s a PR race between Intel, Samsung, and TSMC and since Intel is outsourcing significant FinFET capacity to TSMC it makes it even trickier.

To be clear, mature node capacity is being rapidly added, but a lot of it is in China since they do not have access to leading edge technology, and it will pale in comparison to FinFET capacity. Reshoring semiconductor manufacturing and the record setting CAPEX numbers are also an important part of this equation, which makes the over supply argument even easier.

On the other side of the equation, the semiconductor equipment companies are hugely backlogged no matter who you are and the electronics supply chain is still crippled, so announcing CAPEX and actually spending it are two different things.

In my opinion, if all of the announced CAPEX is actually spent there will be some empty fabs waiting for equipment and customers. Remember, Intel had an empty fab in AZ for years and there are still empty fabs all over China. Staffing new fabs will also be a challenge since the semiconductor talent pool is seriously strained.

Bottom line: We did not have a wafer manufacturing capacity problem before Covid, we do not have a wafer manufacturing capacity problem today, and I don’t see an oversupply risk in the future. We did have a surge in chip demand due to Covid but that will end soon and the crippled supply chain (the inability to get systems assembled and to customers) is easing the fab pressures and that will continue this year and next depending on Covid and how we respond to it.

ASPs are being propped up by the shortage narrative. Brokers, distributors, and middlemen are hoarding and raising prices, causing ever more supply chain issues. I have heard of 10x+ price increases for $1 MPUs. Systems companies are paying a premium for off-the-shelf chips and foundries are raising wafer prices by record amounts which, at some point in time, will calm demand.

Bottom line: Malcolm is convinced a significant crash is coming but I do not agree based on my ramblings above. If someone asked me to place a 10% over/under bet for semiconductor revenue growth in 2022 I would bet the farm on over. My personal number was and still is 15% growth in 2022.

Let me know if you agree or disagree in the comment section and we can go from there. Exciting times ahead, absolutely.

Also read:

Design IP Sales Grew 19.4% in 2021, confirm 2016-2021 CAGR of 9.8%

Semiconductor CapEx Warning

Chip Enabler and Bottleneck ASML

The ASIC Business is Surging!


Scaling is Failing with Moore’s Law and Dennard

by Dave Bursky on 05-12-2022 at 6:00 am


Looking backward and forward, the white paper from Codasip “Scaling is Failing” by Roddy Urquhart provides an interesting history of processor development from the early 1970s to the present. However, it doesn’t stop there and continues to extrapolate what the chip industry has in store for the rest of this decade. Moore’s Law, an observation regarding the number of transistors that can be integrated on a chip, was crafted by Gordon Moore, one of the founders of Intel Corp. That observation was followed by the work of Robert Dennard of IBM Corp., who in addition to inventing the single-transistor DRAM cell, defined the rules for transistor scaling, now known as Dennard Scaling.

In addition to scaling, Amdahl’s Law, formulated by Gene Amdahl while at IBM Corp. in 1967, deals with the theoretical speedup possible when adding processors in parallel. Any speedup will be limited by those parts of the software that are required to be executed sequentially. Thus, Moore’s Law, Dennard Scaling, and Amdahl’s Law have guided the semiconductor industry over the last half century (see the figure). However, Codasip claims they are all failing and that the industry must change and the processor paradigms must change with it. Some of those changes include the creation of domain-specific accelerators, customized solutions, and new companies that create disruptive solutions.
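As a quick illustration of the limit Amdahl’s Law describes, the short sketch below evaluates the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of processors. The 90% parallel fraction is an assumed value chosen for illustration and does not come from the Codasip paper.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Theoretical speedup when `parallel_fraction` of the work runs on `processors` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumed workload: 90% of the run time can be parallelized.
ASSUMED_PARALLEL_FRACTION = 0.9

for cores in (1, 2, 4, 16, 1024):
    speedup = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, cores)
    print(f"{cores:5d} cores -> {speedup:.2f}x")
```

With 10% of the work stuck in sequential code, the speedup can never reach 10x no matter how many cores are added, which is the ceiling the paragraph above refers to.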

Supporting the paper’s premise that semiconductor scaling is failing are numerous examples in the microprocessor world. The examples start with the Intel x86 family as an illustration of how scaling failed as chip complexities and clock speeds increased with each new generation of the single-core CPUs. As each CPU generation’s clock frequency increased from the MHz to the GHz level thanks to the improvements in scaling, chip thermal limits became a restraining factor for performance. The performance limitation was the result of a dramatic increase in power consumption as clock speeds hit 3 GHz and higher and complexities hit close to a billion transistors on a chip. The smaller size of the transistors also resulted in increased leakage currents, and the higher leakage currents caused the chips to consume more power even when idling.
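The thermal wall described here can be sketched with the textbook switching-power relation P = C * V^2 * f. Under classic Dennard scaling by a linear factor k, capacitance and voltage shrink by 1/k while frequency rises by k, so power density stays flat. Once supply voltage stops scaling, power density climbs by k^2 per generation. The function and the k = 2 shrink below are illustrative assumptions, and leakage is deliberately ignored.

```python
def scaled_power(k: float, voltage_scales: bool = True) -> tuple:
    """Return (power per transistor, power density) relative to the previous node.

    Uses switching power P = C * V^2 * f with a linear shrink factor k > 1.
    Leakage current is not modeled in this sketch.
    """
    capacitance = 1.0 / k                      # smaller device, less capacitance
    frequency = k                              # shorter gate delay, faster clock
    voltage = 1.0 / k if voltage_scales else 1.0
    area = 1.0 / k ** 2                        # transistor footprint
    power = capacitance * voltage ** 2 * frequency
    return power, power / area


# Assumed shrink factor for illustration only.
K = 2.0
print("Dennard scaling (voltage shrinks):", scaled_power(K, voltage_scales=True))
print("Voltage held constant:           ", scaled_power(K, voltage_scales=False))
```

The second case is the regime the x86 example ran into: each transistor burns as much power as before, but far more of them fit into the same area.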

To avoid thermal runaway caused by increasing clock frequencies, designers opted for multi-core architectures, integrating two, four or more CPU cores on a single chip. These cores could operate at lower clock frequencies, share various on-chip resources, and thus consume less power. The additional benefit of the multiple cores was the ability to multitask, allowing the chip to run multiple programs simultaneously. However, the multicore approach was not enough for the CPUs to handle the myriad tasks that new applications such as graphics, image and audio processing, artificial intelligence, and still other functions require.

Thus, Codasip is proposing that further processor specialization will deliver considerable performance improvements – the industry must change from adapting software to execute on available hardware to tailoring computational units to match their computational load. To accomplish this, many varied custom designs will be needed, permitting companies to design for differentiation. Additionally new approaches to processor design must be considered – especially the value of processor design language and processor design automation.

Using the RISC-V modular architecture as an example of the ability to create specialized cores and its flexibility to craft specialized instructions, Codasip sees RISC-V as an excellent starting point for tailored processing units. Cores will typically be classified in one of four general categories – MCU, DSP, GPU, and AP (application processor), with each type optimized for a range of computations, some of which may not match what is actually required by the on-chip subsystem. Some companies have already developed specialized cores (often referred to as application-specific instruction processors, ASIPs) that efficiently handle a narrowly-defined computational workload. However, crafting such cores requires specialized skills to define the instruction set, develop the processor microarchitecture, create the associated software tool chain, and finally, verify the core.

Codasip suggests that the only way to take specialization a step further is to create innovative architectures to tackle specialized processing problems. Hardware should be created to match the software workload – that can be achieved by customizing the instruction set architecture, creating special microarchitectures, or creating novel processing cores and arrays. ASIPs can be considered a subset of domain-specific accelerators (DSAs), a category defined in a paper presented in 2019 by John Hennessy and David Patterson – “A New Golden Age for Computer Architecture”.

They characterized DSAs as exploiting parallelism (such as instruction-level parallelism, SIMD, or systolic arrays) if the class of applications benefitted from it. DSAs can better match their computational capabilities to the intended application. One example is the Tensor Processing Unit (TPU) developed by Google, which is a systolic array working with 8-bit precision. The more specialized the processor, the greater the efficiency in terms of silicon area and power consumption. However, the less specialized the processor, the greater the flexibility of the DSA. On the DSA continuum there is the possibility of fine-tuning a core for performance, area, and power, which enables design for differentiation.

Specialization is not only a great opportunity, but it means that there will be many different designs created. Those designs will require a broader community of designers and a greater degree of design efficiency. Codasip sees four enablers that can contribute to the efficient design – the open RISC-V ISA, processor design language, processor design automation, and existing verified RISC-V cores for customization.

They feel that RISC-V, a free and open standard that covers only the instruction set architecture and not the microarchitecture, has garnered widespread support. It does not prescribe a licensing model, so both commercially licensed and open-sourced microarchitectures are possible. If designers use a processor design language such as Codasip’s CodAL, they have a complete processor description capable of supporting software, hardware, and verification aspects. Custom instructions are implemented by adding to the processor design language source and can thus be reflected in the software toolchain and verification environment as well as the RTL.

Also read:

Optimizing AI/ML Operations at the Edge

Podcast EP60: Knowing your bugs can make a big difference to elevate the quality of verification

 


Podcast EP78: A Tour of DAC 2022 with Rob Oshana, General Chair

by Daniel Nenni on 05-11-2022 at 10:00 am

Dan is joined by Rob Oshana, general chair of this year’s DAC. Rob is vice president of software engineering R&D for the Edge Processing business line at NXP.  He serves on multiple industry advisory boards and is a recognized international speaker.  He has published numerous books and articles on software engineering and embedded systems.  He is an adjunct professor at the University of Texas and Southern Methodist University and is a Senior Member of IEEE.

Dan and Rob discuss the program for this year’s DAC. They cover what the various parts of the conference will offer and have a surprising discussion about the dynamics of moving back to a live event. This year’s DAC is shaping up to be a memorable event with many relevant topics and focus areas. You definitely want to hear the backstory.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Why Software Rules AI Success at the Edge

by Bernard Murphy on 05-11-2022 at 6:00 am


It is an unavoidable fact that machine learning (ML) hardware architectures are evolving rapidly. Initially most visible in datacenters (many hyperscalers have built their own AI chips), the trend is now red-hot in inference engines for the edge, each spinning new ground-breaking methods. Markets demand these advances to support bigger images and voice recognition problems in real-time, for some level of local training, for local processing for privacy/security, and to reduce power and latency in communication to the fog or cloud. Product OEMs depend on those advances for differentiation in power, latency, privacy, cost, etc. But they are far from expert in the underlying hardware. Without that understanding, how can they fully exploit their advantages? That’s why software rules AI success at the edge – the software that maps between open AI solutions trained in the cloud and these highly optimized hardware platforms.

FlexLogix at Linley

Randy Allen, VP of software at FlexLogix, presented at the Linley Spring conference on this question, illustrating through the FlexLogix InferX X1 product line. Briefly, the heart of the X1 is a dynamically reconfigurable tensor processor, in which you can rapidly reconfigure the hardware datapath for each layer of ML processing. As one layer completes processing, the next layer reconfigures in microseconds. X1 offers the software flexibility of a CPU solution but with the performance and power advantages of a full ASIC solution.

InferX is a good example of a highly optimized edge architecture. It is capable of amazing performance at amazingly low power, but only if used correctly. Meeting that goal requires compiler magic which knows how to map optimally from one of the standard open-source networks (TensorFlow, PyTorch, etc.) into the underlying hardware architecture and to connect back to the OEM application, completely hands-free from an OEM developer point of view.

You can understand then why software (the compiler) can make or break a great edge AI solution. For edge devices, fantastic hardware is useless if it is only usable by experts.

More detail on the FlexLogix compiler

A compiler for one of these devices is a completely different animal from a regular software compiler. In this instance it must map TensorFlow Lite or ONNX operators produced from standard trained networks into a reconfigurable tensor processor in a way that maximizes throughput while minimizing power.

The compiler maps many diverse planes in a typical network model into the tensor fabric with two constraints in mind. The first is to organize operations in the network for maximum parallelism. The second is to minimize off-chip memory traffic as much as possible, since any operation needing to go off-chip will automatically incur significant latency and power penalties. So a major consideration in these compilers is finding the greatest possible reuse of the data already on-chip. Off-chip fetches and stores of image data, weights and activation functions are delayed as long as possible. Between these two constraints is where the X1 compiler works: scheduling, parallelizing and fusing operations, and maximizing on-chip data reuse. Finally, it generates a bit stream for that optimized model to program the InferX device.
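To make the benefit of fusing operations concrete, here is a toy sketch that counts off-chip transfers for a chain of element-wise operations, run once layer by layer and once fused. This is a simplified illustration and not the InferX compiler: the function names, the traffic model (one transfer per value read or written), and the scale/bias/ReLU chain are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Op = Callable[[float], float]


def run_unfused(data: List[float], ops: List[Op]) -> Tuple[List[float], int]:
    """Each op reads its input from, and writes its output to, off-chip memory."""
    traffic = 0
    for op in ops:
        traffic += len(data)                 # fetch the op's input
        data = [op(x) for x in data]
        traffic += len(data)                 # store the op's output
    return data, traffic


def run_fused(data: List[float], ops: List[Op]) -> Tuple[List[float], int]:
    """All ops are applied while each value stays on-chip."""
    traffic = len(data)                      # single fetch of the input
    out = []
    for x in data:
        for op in ops:
            x = op(x)                        # intermediate never leaves the chip
        out.append(x)
    traffic += len(out)                      # single store of the final result
    return out, traffic


# Hypothetical chain: scale, add bias, ReLU.
ops = [lambda x: x * 0.5, lambda x: x + 1.0, lambda x: max(x, 0.0)]
pixels = [-4.0, -1.0, 0.0, 2.0, 6.0]

unfused_result, unfused_traffic = run_unfused(pixels, ops)
fused_result, fused_traffic = run_fused(pixels, ops)
print("same result:", unfused_result == fused_result)
print("off-chip transfers, unfused vs fused:", unfused_traffic, fused_traffic)
```

Both schedules compute the same values, but the fused one touches off-chip memory only for the initial input and the final output, which is the kind of reuse the paragraph above describes.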

Going deeper

You can learn more about the InferX products HERE. If you’re interested in digging deeper into the compiler technology, there’s a nice presentation by Jeremy Roberson (also at FlexLogix), from the 2021 Spring Linley conference.


High-speed, low-power, Hybrid ADC at IP-SoC

by Daniel Payne on 05-10-2022 at 10:00 am


Andrew Levy and I both worked at Intel and Opmaxx, and I knew that he was now working at Alphacore, an IP company specializing in mixed-signal, RF, imaging and rad-hard applications. I was curious what Alphacore was up to, so at the IP-SoC Silicon Valley 2022 event I watched the ADC presentation from Ken Potts, COO of Alphacore. Mr. Potts has been in the semiconductor industry for over three decades, including stints at Silicon Technologies, Cadence, Virage Logic,  Compass Design Automation and VLSI Technology.

I learned from a video interview at the event that Alphacore doubled in size during 2021, and is on track to double again in 2022, so that says a lot about their success in the IP marketplace. Ken’s presentation at IP-SoC was all about their progress in designing hybrid ADC circuits. Applications for RF data converters include: 5G radios, beamforming, direct to RF sampling, and phased array architectures. The challenge is how to achieve high bandwidth while also consuming low power.

The ideal process technology for delivering low power has been FDSOI, so Alphacore has done data converter designs with both STMicroelectronics 28nm and GlobalFoundries 22nm. FDSOI delivers about 70% lower power when compared to a 28nm bulk CMOS process node, so that’s a compelling reason to design with FDSOI. Another attraction for choosing FDSOI technology is that it has tolerance to ~100krad TID, something that aerospace designers are concerned about for reliable operation in orbit.

Mr. Potts shared that they have designed a 10-bit, 5GS/s ADC that consumes only 19.7mW at 5GS/s, using an 800mV supply. As you lower the sampling rate to 3GS/s, the power dips even lower to just 12.8mW.

Hybrid ADC

With ADC circuits there’s a question of gain and offset errors, so the good news is that Alphacore has an auto-calibration algorithm to eliminate interleaving spurs as shown below in the output spectrum results:

Spurs removed after calibration
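For readers curious why gain and offset mismatch matters in an interleaved converter, the sketch below simulates sub-ADCs with slightly different gains and offsets and then equalizes them. It is a generic textbook-style foreground calibration, not Alphacore’s algorithm, which was not described in the presentation. The channel count, mismatch values, and the assumption of a coherently sampled zero-mean sine input are all hypothetical.

```python
import math


def interleaved_capture(n_samples, gains, offsets, cycles, amplitude=1.0):
    """Simulate a time-interleaved ADC: sample i is taken by sub-ADC i % M."""
    m = len(gains)
    out = []
    for i in range(n_samples):
        ideal = amplitude * math.sin(2 * math.pi * cycles * i / n_samples)
        out.append(gains[i % m] * ideal + offsets[i % m])
    return out


def calibrate(raw, m):
    """Estimate each sub-ADC's offset (mean) and gain (RMS), then equalize to sub-ADC 0."""
    channels = [raw[c::m] for c in range(m)]
    means = [sum(ch) / len(ch) for ch in channels]
    rms = [math.sqrt(sum((x - mu) ** 2 for x in ch) / len(ch))
           for ch, mu in zip(channels, means)]
    return [(x - means[i % m]) * rms[0] / rms[i % m] for i, x in enumerate(raw)]


# Hypothetical 4-way interleave with small mismatches (assumed values).
GAINS = [1.00, 1.02, 0.97, 1.01]
OFFSETS = [0.00, 0.01, -0.02, 0.005]
N, CYCLES = 4096, 101          # coherent sampling: 101 whole cycles in 4096 samples

raw = interleaved_capture(N, GAINS, OFFSETS, CYCLES)
ideal = interleaved_capture(N, [1.0] * 4, [0.0] * 4, CYCLES)
fixed = calibrate(raw, len(GAINS))

error_before = max(abs(r - s) for r, s in zip(raw, ideal))
error_after = max(abs(f - s) for f, s in zip(fixed, ideal))
print(f"max error before: {error_before:.4f}, after: {error_after:.2e}")
```

The periodic per-channel error in the uncorrected capture is what shows up as interleaving spurs in the spectrum. Once every sub-ADC has the same effective gain and offset, that periodic pattern disappears.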

For ADC circuits there’s a Figure Of Merit (FOM) to help compare performance, and the formula is Power / (Fs * 2^ENOB). Plotting Alphacore results shows about 10X lower power while achieving GS/s conversion rates:

FOM comparison
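The FOM formula above is easy to evaluate by hand. The sketch below applies it to the two operating points quoted earlier for the 10-bit ADC (19.7mW at 5GS/s and 12.8mW at 3GS/s). The effective number of bits was not given in the presentation summary, so the ENOB used here is a placeholder assumption and the printed values are illustrative only.

```python
def adc_fom(power_w: float, sample_rate_hz: float, enob: float) -> float:
    """Figure of merit in joules per conversion step: Power / (Fs * 2^ENOB)."""
    return power_w / (sample_rate_hz * 2 ** enob)


# ENOB is NOT stated in the source; 8.0 is a hypothetical value for a 10-bit part.
ASSUMED_ENOB = 8.0

fom_5g = adc_fom(19.7e-3, 5e9, ASSUMED_ENOB)
fom_3g = adc_fom(12.8e-3, 3e9, ASSUMED_ENOB)

print(f"5GS/s: {fom_5g * 1e15:.1f} fJ per conversion step (assumed ENOB)")
print(f"3GS/s: {fom_3g * 1e15:.1f} fJ per conversion step (assumed ENOB)")
```

A lower FOM is better, since it means less energy spent per effective quantization level at a given sampling rate.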

ADC IP Choices

There are several ADC IP blocks to choose from that use the STMicroelectronics 28nm process:

  • 6-bit, 5GS/s at 14mW – A6B5G
  • 9-bit, 1GS/s at 2mW – A9B1G
  • 10-bit, 2.4GS/s at 6mW – A10B2G
  • 4-bit, 20GS/s Flash ADC – A4B20G

ADC circuits with 22FDX from Globalfoundries include several choices:

  • 10-bit, 3GS/s at 13mW – A10B3G
  • 10-bit, 5GS/s – A10B5G
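One simple way to compare the blocks listed above is energy per sample, which is just power divided by sampling rate. The sketch below does that arithmetic for the four blocks whose power is quoted; the two blocks listed without a power figure are left out rather than guessed. The dictionary layout and function name are illustrative choices.

```python
# (power in watts, sampling rate in samples per second), taken from the lists above.
ADC_BLOCKS = {
    "A6B5G": (14e-3, 5e9),     # 6-bit, ST 28nm
    "A9B1G": (2e-3, 1e9),      # 9-bit, ST 28nm
    "A10B2G": (6e-3, 2.4e9),   # 10-bit, ST 28nm
    "A10B3G": (13e-3, 3e9),    # 10-bit, 22FDX
}


def energy_per_sample_pj(power_w: float, sample_rate_hz: float) -> float:
    """Energy spent per conversion, in picojoules."""
    return power_w / sample_rate_hz * 1e12


energies = {name: energy_per_sample_pj(p, fs) for name, (p, fs) in ADC_BLOCKS.items()}
for name, pj in energies.items():
    print(f"{name}: {pj:.2f} pJ per sample")
```

Energy per sample ignores resolution, so it complements rather than replaces the FOM, which also accounts for ENOB.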

The complete table of ADC offerings shows which IP blocks are verified GDS II, silicon validated, or in foundry qualification status:

FDSOI CMOS IP Library

Design Kit

When you work with an IP vendor like Alphacore, they deliver to you a complete set of files so that you can place your converter block in a design, and complete your integration with simulation, verification and implementation:

  • GDSII layout file
  • RTL files
  • Schematic
  • Abstract view
  • All DRC/LVS logs
  • Extracted view
  • Extracted simulation model
  • Verilog-AMS models
  • Guide for DFT and I/O requirements

Alphacore Customers

The actual list is proprietary, but rest assured that the customers represent many segments, like: mil-aero, image sensor, advanced electronics, defense contractors, radars, national laboratories, research agencies. The ADC IP products are being used in six end segments:

  • Wireless and wireline communications
  • Defense applications
  • Test equipment
  • Imaging and Lidar
  • Automotive
  • Scientific & Industrial Instrumentation

Summary

I was pleased to learn about the rapid growth at Alphacore, which only validates their strong position in hybrid ADC IP blocks, achieving high bandwidth at low power by clever design and use of FDSOI process technology. They have design kits in place, and most of their IP is already silicon proven, so that makes it a safe choice.



WEBINAR: Challenges in analyzing High Performance clocks at 7nm and below process nodes

by Daniel Nenni on 05-10-2022 at 6:00 am


Proper clock functionality and performance are essential for SoC operation. Static timing analysis (STA) tools have served well for verifying clocks, yet with new advanced process nodes, lower operating voltages, higher clock speeds and higher reliability requirements, STA tools alone can’t perform the kinds of analysis that are needed for clock sign-off anymore. At 7nm and below, a clock failure due to rail-to-rail, duty cycle distortion or aging issues can jeopardize an entire project. To help find and solve these problems, San Jose-based Infinisim has developed a product called ClockEdge that uses advances in simulation in conjunction with software specifically devoted to analyzing clocks.


ClockEdge is most relevant for clock speeds in excess of 1GHz and clocks designed at below 10nm process nodes. It can handle traditional clock tree structures and also works with grid, mesh and spine-based clocks. ClockEdge overcomes the limitations that STA encounters, offering deeper insights into clocks that help with performance, power, reliability and more. At advanced process nodes STA tools guard-band their results, leading to over-design and unnecessary power consumption. STA also suffers due to lower operating voltages and non-linear device behavior. Results from STA miss rail-to-rail failures, aging effects and supply induced jitter, all of which can lead to chip failures.

Infinisim’s ClockEdge does more than just look at timing; it ensures that the clock is also functionally correct. It delivers SPICE accurate results, typically with overnight turnaround even for the largest SoCs. ClockEdge performance is achieved through linear scaling using LSF jobs, unlike multithreading which plateaus at around 10-20X. ClockEdge analyzes clock performance at multiple PVT corners and will perform HCI and NBTI aging analysis. Another benefit of ClockEdge is its ability to compute peak-to-peak power, average power and leakage current for each gate in the clock.

Unlike STA, ClockEdge analyzes the entire clock domain and looks at every clock path for its timing and electrical analysis at the same time. Going beyond looking at one path at a time can uncover situations where there may be excessive guard-banding or lurking failures. In advanced process node clock designs, duty cycle distortion (asymmetry in high and low pulse widths) and rail-to-rail failures are often missed by STA but accurately predicted by ClockEdge. If not detected, both of these errors can cause a host of problems, including timing failures in the finished chip. ClockEdge does full analog signal analysis to catch and report these issues.
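To make the two failure modes above concrete, here is a minimal sketch of how duty cycle distortion and a rail-to-rail check could be computed from a sampled clock waveform. It is only an illustration of the quantities involved, not how ClockEdge works internally; the function names, the VDD/2 threshold, and the 5% limits are assumptions chosen for the example.

```python
from typing import Dict, Sequence, Union


def duty_cycle(samples: Sequence[float], vdd: float) -> float:
    """Fraction of uniformly spaced samples above the VDD/2 threshold."""
    threshold = vdd / 2.0
    high = sum(1 for v in samples if v > threshold)
    return high / len(samples)


def reaches_rails(samples: Sequence[float], vdd: float, margin: float = 0.05) -> bool:
    """True if the waveform gets within margin * VDD of both ground and VDD."""
    return max(samples) >= vdd * (1.0 - margin) and min(samples) <= vdd * margin


def check_clock(
    samples: Sequence[float],
    vdd: float,
    max_dcd: float = 0.05,   # hypothetical limit on |duty cycle - 50%|
    margin: float = 0.05,    # hypothetical rail margin as a fraction of VDD
) -> Dict[str, Union[float, bool]]:
    """Report duty cycle, a distortion flag, and a rail-to-rail flag."""
    dc = duty_cycle(samples, vdd)
    return {
        "duty_cycle": dc,
        "dcd_ok": abs(dc - 0.5) <= max_dcd,
        "rail_to_rail_ok": reaches_rails(samples, vdd, margin),
    }


# Toy waveforms sampled over whole clock periods (volts, VDD = 0.75).
healthy = [0.0, 0.0, 0.75, 0.75] * 4
degraded = [0.1, 0.1, 0.1, 0.6] * 4   # narrow high pulse that never reaches VDD

if __name__ == "__main__":
    print(check_clock(healthy, vdd=0.75))
    print(check_clock(degraded, vdd=0.75))
```

A digital timing model sees only edge arrival times, so a node like `degraded`, which toggles but never swings fully to the supply, is the kind of electrical problem that needs waveform-level analysis to catch.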

ClockEdge is easy to use because the entire flow is focused on clock analysis. It automatically performs gate level tracing and sensitization, which is followed by transistor level simulation. ClockEdge has comprehensive post processing to generate the reports and the information needed to interpret clock functionality and performance results. As for inputs, ClockEdge uses the same information and data that are used by STA.

Clocks are too important to leave to STA at advanced nodes. A lot needs to be looked at, including power, rail-to-rail and aging, to ensure design success. This is especially true for designs below 10nm, where many of these issues can slip through if only STA is used to look at clock issues. Infinisim has put a lot of work into ClockEdge, and the tool has gained acceptance with major semiconductor companies working on leading edge designs. Their website includes more information on the flow for ClockEdge.

About Infinisim
Infinisim, Inc is a privately funded EDA company providing design verification solutions. Founded by industry luminaries, the Infinisim team has over 50 years of combined expertise in the area of design and verification.

Infinisim customers are leading edge semiconductor companies and foundries that are designing high-performance mobile, AI, CPU and GPU chips.

Infinisim has helped customers achieve unprecedented levels of confidence in design robustness prior to tape-out. Customers have been able to eliminate silicon re-spins, reduce chip design schedules by weeks and dramatically improve product quality and production yield. www.infinisim.com

Also Read

WEBINAR: Overcome Aging Issues in Clocks at Sub-10nm Designs

White Paper: A Closer Look at Aging on Clock Networks


Designing Ultra-Low-Power, Always On IP

by Daniel Payne on 05-09-2022 at 10:00 am

It’s popular to use DSP chips for vision processing in diverse applications like ADAS, security cameras and AR. Tensilica has been designing DSP chips and IP since 1997, and its technology was successful enough that Cadence acquired Tensilica in 2013. At the IP-SoC Silicon Valley 2022 event in April I had the pleasure of watching the presentation from Amol Borkar, Product Marketing Director, IPG TIP, at Cadence Design Systems, titled “Designing the Next Ultra-Low-Power Always-On Silicon.”

In the past 25 years, Tensilica customers have shipped over 40 billion processors, and 19 of the top 20 semiconductor vendors use Tensilica IP. Three recent customers are UNISOC (Application Processor), Light (Auto vision) and Xvision (AR). The trends for processors and sensors can be shown across five categories, and the ultra-low-power segment requires a long battery life:

Processing and Sensor Trends

Microphones and ultrasonic sensors are common approaches to wake up electronic devices, and cameras are now being used for people detection, face detection and even gestures. Amol focused on vision and speech processing for always-on embedded and IoT devices. An always-on (AON) device runs continuously, reads inputs from sensors, and often dissipates under 1mW of power while using a microcontroller.

Waking up a device requires three stages: detection, authentication and processing. For AON, the focus is on the detection phase, where a low-power MCU or DSP is typically used for tasks like keyword spotting, face detection, visual wake words or gesture handling. An AON subsystem could be designed with both a vision AON processor and an audio AON processor, or with a single combined IP for both workloads.

For audio and speech dominant AON processing tasks there is a processor family called the Tensilica HiFi DSP, then for image and camera dominant AON processing, there is a processor family called the Tensilica Vision DSP. The low-energy DSP for vision and AI tasks is the Tensilica Vision P1 DSP.

Always-On Face Detection

The earliest approaches used a camera connected to a CPU or App processor, although the neural network algorithms on a CPU are not very energy efficient. A more energy efficient approach is to have the camera connected to a vision DSP first, then alert the CPU or App processor after detection. The most energy efficient approach is shown below, where the P1 DSP reads a low resolution image and detects a face, then a vision DSP reads the high resolution image to detect, recognize and authenticate while connected to a CPU or app processor.

Always on face detection

The DSP and CPU only wake up after the P1 does the face detection, so this approach takes the least energy. Even after the face is detected, you can choose to have the P1 perform co-processing tasks, or have the P1 go into standby. Benchmark results compared a competitor to a Vision P1 system using TinyML v0.5 for visual wake word, image classification, keyword spotting and anomaly detection. The Vision P1 system delivered more inferences per second and much lower energy numbers than the competitor.

TinyML v0.5 benchmark comparison
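The energy argument for the staged wake-up described above can be sketched with some simple arithmetic. The snippet below is only an illustration: the per-frame energy values are arbitrary units invented for the example (they are not Cadence figures), the detector is assumed to be perfect, and the `Stage` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stage:
    name: str
    energy_per_frame: float  # arbitrary units, hypothetical values


def cascade_energy(face_present: Sequence[bool], aon: Stage, main: Stage) -> float:
    """Energy when a small AON stage screens every frame and the larger
    stage only wakes up for frames in which a face was detected."""
    total = 0.0
    for has_face in face_present:
        total += aon.energy_per_frame        # AON detector runs on every frame
        if has_face:
            total += main.energy_per_frame   # wake the big processor on a hit
    return total


def always_on_main_energy(face_present: Sequence[bool], main: Stage) -> float:
    """Energy when the larger processor handles every frame itself."""
    return len(face_present) * main.energy_per_frame


# Ten frames, with a face present in only one of them.
frames = [False] * 9 + [True]
aon_stage = Stage("low-power AON detector", energy_per_frame=0.5)
main_stage = Stage("vision DSP + CPU", energy_per_frame=8.0)

if __name__ == "__main__":
    print("cascade:", cascade_energy(frames, aon_stage, main_stage))
    print("main only:", always_on_main_energy(frames, main_stage))
```

The saving depends entirely on how rarely the expensive stage is needed: if a face were present in every frame, the cascade would cost more than running the main processor alone.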

Some customers use the Vision P1 DSP with their smart image sensor to classify objects and to support bounding boxes and segmentation. AR applications for foveated rendering have used a Vision P1 DSP to detect pupil center and gaze. Aside from AON, the Vision P1 is a fully functioning vision DSP, and there’s plenty of support that comes along with it:

  • Imaging and vision DSP tools and libraries
  • Supports AI flow of XNNC, NNAPI, TFLm
  • 100+ model zoo networks

Summary

The Cadence Tensilica products for both audio and vision are quite popular, as they’ve been in the DSP business for decades now. The Vision P1 DSP has been optimized for AON applications, and performs tasks like face detection quite efficiently. There’s a 30 minute video of this presentation online.


Advantages of Large-Scale Synchronous Clocking Domains in AI Chip Designs

by Kalar Rajendiran on 05-09-2022 at 6:00 am

Large models challenge current AI hardware solutions

We are currently in the hockey stick growth phase of AI. Advances in artificial intelligence (AI) are happening at a lightning pace. And, while the rate of adoption is exploding, so is model size. Over the past couple of years, we’ve gone from about two billion parameters to Google Brain’s recently announced trillion-parameter AI language model, the largest yet. But with model size and complexity growing at a faster rate than hardware compute capability, AI hardware is running out of steam and OEMs are looking for new solutions.

Compute challenges are nothing new to the technology industry, so what is different this time around? To date, the tried-and-true means to increase performance has been to multiply the number of tiles, cores or chips. However, more cores and larger chips amplify a long-standing problem for designers. Chip developers need to continuously battle the innate physics of large chips, combating skew, process variation, and aging effects. These effects are only multiplied as companies transition to smaller process geometries. As engineers add more processors to a design, enacting a synchronous design at high frequencies becomes an almost insurmountable task. Physical design engineers are forced to overdesign these massive chips, leading to unnecessary clocking overhead and a decrease in inference rates or an increase in training times.

At the recent Linley Spring Processor Conference, Movellus’ Aakash Jani presented on the challenges and opportunities of scaling performance in very large, many-core chip designs. He shows us how an innovative approach to clocking allows greater synchronization throughout the design and enables more efficient and scalable performance for emerging AI applications.

Requirements Driving Scalable AI

Data centers, autonomous vehicles and computer vision are some of the applications that are pushing the limits of scalable AI. The old way of throwing more chips and/or processors at the problem does not lead to a scalable solution. Refer to the Figure below.

In the era of AI, big multicore chips are the new normal. More tiles or cores require more area. More area leads to more power consumption, more interconnect, more latency, and more skew. All these chip infrastructure overhead problems are amplified on larger area designs, and all of them are affected by the clock network. The above graph shows some well-known AI processors that use a multicore approach to increase performance. The problem is that as more cores are added, the performance per core decreases. This is due to chip infrastructure overhead and, to a large degree, inefficiencies in the clock network.

Today, designers address clocking issues with a divide and conquer approach. They may tackle the biggest offenders first and make incremental changes until they meet design requirements. But if we approach the problem holistically, there is an opportunity for major gains in power-efficiency and performance. Additionally, we can open the door to creating large synchronous clock domains, allowing engineers to scale their systolic arrays for the next generation of multi-trillion parameter models.

Movellus Solution

Movellus presented its holistic clocking solution: intelligent clock networks. What, exactly, is an intelligent clock network? Every chip begins with a perfect clock signal. However, as the signal travels through the chip, it is often delayed and distorted because of process variation and the physics of the chip. Intelligent clock networks bypass most of these problems to help clock architects deliver an ideal clock signal to every flop. These networks achieve this lofty goal using smart clock IP modules placed strategically throughout the chip. Smart clock modules use Movellus’ intelligent clock network technology to actively compensate for skew, process variation, and aging. Smart clock modules are also aware of other smart clock modules and can synchronize with them to create large synchronous clock domains via a closed feedback loop. The beauty of this approach is that it eliminates the need for a multitude of retiming flops and clock domain crossing (CDC) buffers and thereby avoids a ton of clocking overhead and system latency. It also reduces design complexity and greatly eases timing closure.
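The article does not describe the control algorithm inside the smart clock modules, so the following is only a toy model of the general idea of closed-loop skew compensation. It assumes each module has a tunable delay and repeatedly nudges it by a fraction of its measured phase error against a shared target; the function names, delay values, and gain are all hypothetical.

```python
from typing import List, Sequence


def skew(arrivals_ps: Sequence[float]) -> float:
    """Skew is the spread between the latest and earliest clock arrival."""
    return max(arrivals_ps) - min(arrivals_ps)


def compensate(
    intrinsic_delays_ps: Sequence[float],
    target_ps: float,
    gain: float = 0.5,   # hypothetical loop gain, 0 < gain <= 1
    steps: int = 20,     # hypothetical number of feedback iterations
) -> List[float]:
    """Return clock arrival times after a simple proportional feedback loop.

    Each module measures its arrival error against target_ps and adjusts
    its own tunable delay by gain * error on every iteration.
    """
    adjust = [0.0] * len(intrinsic_delays_ps)
    for _ in range(steps):
        for i, delay in enumerate(intrinsic_delays_ps):
            error = (delay + adjust[i]) - target_ps
            adjust[i] -= gain * error
    return [d + a for d, a in zip(intrinsic_delays_ps, adjust)]


# Four regions of a chip with different uncompensated insertion delays (ps).
before = [100.0, 112.0, 95.0, 107.0]
after = compensate(before, target_ps=120.0)

if __name__ == "__main__":
    print("skew before (ps):", skew(before))
    print("skew after (ps):", skew(after))
```

With a gain of 0.5 the residual error halves on each iteration, so the arrival times converge on the target and the skew between regions shrinks toward zero. The target is set above the largest intrinsic delay here because a tunable delay element in this toy model can only add delay.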

The above chart compares Movellus’ intelligent clock network approach with today’s popular solutions, including a tool driven methodology with clock tree synthesis (CTS) and a semi-custom strategy that implements a mesh. The chart shows design tradeoffs regarding fmax, useful clock period, process flexibility, power and area efficiency, and ease of timing closure. Intelligent clock networks can bring the combined advantages of today’s solutions by offering the performance of a mesh at the power consumption of a tree.

Summary

Movellus shows how an intelligent clock network that takes a holistic approach to clocking delivers a significant performance enhancement compared to individual clock network component optimizations. The company introduces its new product, Maestro AI, an intelligent clock network IP platform. Maestro AI enables SoC designers to remove unwanted and accumulating system-level latency for larger chips and chiplets. Maestro intelligent clock network solutions occupy a much smaller area compared to alternative solutions. The solution enables designers to expand the size of synchronous clock domains. Since the solution is offered in soft IP form, it is easily configurable to customer application requirements and portable to any process technology.

On-Demand Access to Aakash’s talk and presentation

You can listen to Aakash’s talk, “Advantages of Large-Scale Synchronous Clocking Domains in SoCs and Chiplets” here, under Session 4. You will find his presentation slides here, under Day 2 - AM Sessions.

Also read:

It’s Now Time for Smart Clock Networks

CEO Interview: Mo Faisal of Movellus

Performance, Power and Area (PPA) Benefits Through Intelligent Clock Networks