
Bug Hunting in NoCs. Innovation in Verification
by Bernard Murphy on 08-28-2024 at 6:00 am


Despite NoCs being finely tuned in legacy subsystems, when subsystems are connected in larger designs or even across multi-die structures, differing traffic policies and system-level delays between NoCs can introduce new opportunities for deadlocks, livelocks and other hazards. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is NoCFuzzer: Automating NoC Verification in UVM, published in the 2024 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. The authors are from Peking University, Hong Kong University and Alibaba.

Functional bugs should be relatively uncommon in production-grade NoCs, but performance bugs are highly dependent on expected traffic and configuration choices. By their nature NoCs will almost unavoidably include cycles; the mesh and toroidal topologies common in many-core servers and AI accelerators are obvious examples. Traffic in such cases may be subject to deadlock or livelock problems under sufficient traffic load. Equally, weaknesses in scheduling algorithms can lead to resource starvation. Such hazards need not block traffic in a formal sense (never clearing) to undermine product success. If they take sufficiently long to clear, they will still fail to meet the expected service level agreements (SLAs) for the system.

There are traffic routing and scheduling solutions to mitigate such problems – many such solutions. These work fine within one NoC designed by a single system integration team, but what happens when you must combine multiple legacy/3rd-party subsystems, each with a NoC designed according to its own policy preferences and connected through a top-level NoC with its own policies? This issue takes on even more urgency in chiplet-based designs adding interposer NoCs to connect between chiplets. Verification solutions become essential to tease out potential bugs between these interconnected networks.

Paul’s view

A modern server CPU can have 100+ cores all connected through a complex coherent mesh-based network-on-chip (NoC). Verifying this NoC for correctness and performance is a very hard problem and a hot topic with many of our top customers.

This month’s paper takes a concept called “fuzzing” from the software verification world and applies it to UVM-based verification of a 3×3 OpenPiton NoC. The results are impressive: line and branch coverage hit 95% in 120 hours with the UVM bench vs. 100% in 2.5 hours with fuzzing; functional covergroups reach 89-99% in 120 hours with the UVM bench vs. 100% across all covergroups in 11 hours with fuzzing. The authors also try injecting a corner-case starvation bug into the design. The baseline UVM bench was not able to hit the bug after 100M packets, whereas fuzzing hit it after only 2M packets.

To achieve these results the authors use a fuzzing tool called AFL – check out its Wikipedia page here. A key innovation in the paper is the way the UVM bench is connected to AFL: the authors invent a simple 4-byte XYLF format to represent a packet on the NoC. XY is the destination location, L the length, F a “free” flag. The UVM bench reads a binary file with a sequence of 4-byte chunks and then injects each packet in the sequence to each node in the NoC round-robin style: first packet from CPU 00, then CPU 01, 02, 10, 11, and so on. If F is below some static threshold T then the UVM bench just has the CPU put nothing into the NoC for the equivalent length of that packet. The authors set T for a 20% chance of a “free” packet.
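To make the encoding concrete, here is a minimal Python sketch of how such a 4-byte XYLF stream could be decoded and handed round-robin to mesh nodes. The field ordering, mesh wrap-around, and threshold value are illustrative assumptions, not the paper’s exact implementation.

    FREE_THRESHOLD = 51  # ~20% of the 0-255 byte range -> ~20% chance of a "free" slot (assumed encoding)

    def decode_xylf(chunk: bytes):
        """Decode one 4-byte XYLF chunk into an injection decision."""
        x, y, length, free = chunk
        if free < FREE_THRESHOLD:
            return None                                        # node stays idle for `length` slots
        return {"dest": (x % 3, y % 3), "length": length}      # wrap into the 3x3 mesh

    def inject(stream: bytes):
        """Walk the binary file and hand packets to mesh nodes round-robin."""
        nodes = [(r, c) for r in range(3) for c in range(3)]   # cpu 00, 01, 02, 10, 11, ...
        for i in range(0, len(stream) - 3, 4):
            node = nodes[(i // 4) % len(nodes)]
            pkt = decode_xylf(stream[i:i + 4])
            print(node, "idle" if pkt is None else pkt)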

AFL is given an initial seed set of binary files taken from a non-fuzzed UVM bench run, applies them to the UVM bench, and is provided back with coverage data from the simulator – each line, branch, covergroup is just considered a coverpoint. AFL then starts applying mutations, randomly modifying bytes, splicing and re-stitching binary files, etc. A genetic algorithm is used to guide the mutation towards increasing coverage. It’s a wonderfully abstract, simple, and elegant utility that is completely blind to the goals for which it is aiming to improve coverage.
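As a rough illustration of that loop (not AFL’s actual code), a coverage-guided fuzzer can be sketched in a few lines of Python. Here run_sim is a stand-in for a simulation run that returns the set of coverpoints hit by one input file:

    import random

    def mutate(seed: bytes, corpus: list) -> bytes:
        """A few AFL-style mutations: byte flips, splices, truncation."""
        child = bytearray(seed)
        op = random.choice(["flip", "splice", "trim"])
        if op == "flip" and child:
            child[random.randrange(len(child))] ^= 0xFF
        elif op == "splice" and len(child) >= 8:
            donor = bytearray(random.choice(corpus))           # re-stitch two corpus files
            m = min(len(child), len(donor))
            if m >= 8:
                cut = random.randrange(4, m, 4)                # keep 4-byte packet alignment
                child = child[:cut] + donor[cut:]
        elif op == "trim" and len(child) > 4:
            child = child[:-4]
        return bytes(child)

    def fuzz(seeds, run_sim, budget=10_000):
        """Keep mutants that reach new coverpoints; discard the rest."""
        corpus, seen = list(seeds), set()
        for s in corpus:
            seen |= run_sim(s)                 # coverage of the initial seed files
        for _ in range(budget):
            child = mutate(random.choice(corpus), corpus)
            new_points = run_sim(child) - seen # coverpoints never hit before
            if new_points:
                seen |= new_points
                corpus.append(child)           # coverage-increasing inputs survive
        return corpus, seen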

Great paper. Lots of potential to take this further commercially!

Raúl’s view

Fuzzing is a technique for automated software testing where a program is fed malformed or partially malformed data. These test inputs are usually variations on valid samples, modified either by mutation or according to a defined grammar. This month’s paper uses AFL (named after a breed of rabbit), which employs mutation; its description offers a good understanding of fuzzing. Note that fuzzing differs from the random or constrained-random verification commonly applied in hardware verification.

The authors apply fuzzing techniques to hardware verification, specifically targeting Network-on-Chip (NoC) systems. The paper details the development of a UVM-based environment connected to the AFL fuzzer within a standard industrial verification process. They utilized Verilog, the Synopsys VCS simulator, and focused on conventional coverage metrics, predominantly code coverage. To interface the AFL fuzzer to the UVM test environment, the test output of the fuzzer must be translated into a sequence of inputs for the NoC. Every NoC packet is represented as a 40-bit string which contains the destination address, length, port (each node in the NoC has several ports) and a control flag that determines whether the packet is to be executed or the port remains idle. These strings are mutated by AFL. A simple grammar converts them into inputs for the NoC. This is one of the main contributions of the paper. The fuzzing framework is adaptable to any NoC topology.

NoCs are the communication fabric of choice for digital systems containing hundreds of nodes and are hard to verify. The paper presents a case study of a compact 3×3 mesh NoC element from OpenPiton. The results are impressive: fuzz testing achieved 100% line coverage in 2.6 hours, while constrained random verification (CRV) only reached 97.3% in 120 hours. For branch coverage, fuzz testing achieved full coverage in 2.4 hours while CRV only reached 95.2% in 120 hours.

The paper is well written and offers impressive detail, with a practical focus that underscores its relevance in an industrial context. While it is occasionally somewhat verbose, it is certainly an excellent read.


Alphawave Semi Unlocks 1.2 TBps Connectivity for HPC and AI Infrastructure with 9.2 Gbps HBM3E Subsystem
by Kalar Rajendiran on 08-27-2024 at 10:00 am

9.2Gbps HBM3E Subsystem

In the rapidly evolving fields of high-performance computing (HPC) and artificial intelligence (AI), reducing time to market is crucial for maintaining competitive advantage. HBM3E systems play a pivotal role in this regard, particularly for hyperscaler and data center infrastructure customers. Alphawave Semi’s advanced HBM3E IP subsystem significantly contributes to this acceleration by providing a robust, high-bandwidth memory solution that integrates seamlessly with existing and new architectures.

The 9.2 Gbps HBM3E subsystem, combined with Alphawave Semi’s innovative silicon interposer, facilitates rapid deployment and scalability. This ensures that hyperscalers can quickly adapt to the growing data demands, leveraging the subsystem’s 1.2 TBps connectivity to enhance performance without extensive redesign cycles. The modular nature of the subsystem allows for flexible configurations, making it easier to tailor solutions to specific application needs and accelerating the development process.

Micron’s HBM3E Memory

Micron’s HBM3E memory stands out in the competitive landscape due to its superior power efficiency and performance. While all HBM3E variants aim to provide high bandwidth and low latency, Micron’s version offers up to 30% lower power consumption compared to its competitors. This efficiency is critical for data centers and AI applications, where power usage directly impacts operational costs and environmental footprint.

Micron’s HBM3E memory achieves this efficiency through advanced fabrication techniques and optimized design, ensuring that high-speed data transfer does not come at the cost of increased power usage. This makes it a preferred choice for integrating with high-performance systems that demand both speed and sustainability.

Alphawave Semi’s Innovative Silicon Interposer

At the heart of Alphawave Semi’s HBM3E subsystem is their state-of-the-art silicon interposer. This interposer is crucial for connecting HBM3E memory stacks with processors and other components, enabling high-speed, low-latency communication. In designing the interposer, Alphawave Semi addressed the challenges of increased signal loss due to longer interposer routing. By evaluating critical channel parameters such as insertion loss, return loss, intersymbol interference (ISI), and crosstalk, the team developed an optimized layout. Signal and ground trace widths, along with their spacing, were analyzed using 2D and 3D extraction tools, leading to a refined model that integrates microbump connections to signal traces. This iterative approach allowed the team to effectively shield against crosstalk between layers.

Detailed analyses of signal layer stack-ups, ground trace widths, vias, and the spacing between signal traces enabled the optimization of the interposer layout to mitigate adverse effects and boost performance. To achieve higher data rates, jitter decomposition and analysis were performed on the interposer to budget for random jitter, power supply induced jitter, duty cycle distortion, and other factors. This established the necessary operating margins.
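As a generic illustration of how such a budget is typically assembled (the article does not give Alphawave Semi’s actual numbers), deterministic contributors add peak-to-peak while independent random contributors combine root-sum-square and are scaled for the target bit error rate:

$$TJ_{pp} = DJ_{pp} + 2\,Q(\mathrm{BER})\,\sqrt{RJ_{rms,1}^{2} + RJ_{rms,2}^{2} + \cdots}$$

where $2\,Q(\mathrm{BER}) \approx 14.07$ at a BER of $10^{-12}$.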

In addition, the interposer’s stack-up layers for signals, power, and decoupling capacitors underwent comprehensive evaluations for both CoWoS-S and CoWoS-R technologies in preparation for the transition to upcoming HBM4. The team engineered advanced silicon interposer layouts that provide excess margin, ensuring these configurations can support the elevated data rates required by future enhancements in HBM4 technology and varying operating conditions.

Alphawave Semi’s HBM3E IP Subsystem

Alphawave Semi’s HBM3E IP subsystem, comprising both PHY and controller IP, sets a new standard in high-performance memory solutions. With data rates reaching 9.2 Gbps per pin and a total bandwidth of 1.2 TBps, this subsystem is designed to meet the intense demands of AI and HPC workloads. The IP subsystem integrates seamlessly with Micron’s HBM3E memory and Alphawave’s silicon interposer, providing a comprehensive solution that enhances both performance and power efficiency.
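The headline bandwidth follows directly from the pin rate, assuming the standard 1024-bit-wide data interface that JEDEC defines for HBM stacks:

$$9.2\ \mathrm{Gb/s/pin} \times 1024\ \mathrm{pins} = 9420.8\ \mathrm{Gb/s} \approx 1.18\ \mathrm{TB/s}$$

which rounds to the quoted 1.2 TBps per HBM3E stack.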

The subsystem is highly configurable, adhering to JEDEC standards while allowing for application-specific optimizations. This flexibility ensures that customers can fine-tune their systems to achieve the best possible performance for their unique requirements, further reducing the time and effort needed for deployment.

Summary

Alphawave Semi’s HBM3E IP subsystem, powered by their innovative silicon interposer and Micron’s efficient HBM3E memory, represents a significant advancement in high-performance memory technology. By offering unparalleled bandwidth, enhanced power efficiency, and flexible integration options, this subsystem accelerates time to market for hyperscaler and data center infrastructure customers.

For more details, visit

https://awavesemi.com/silicon-ip/subsystems/hbm-subsystem/

Also Read:

Alphawave Semi Tapes Out Industry-First, Multi-Protocol I/O Connectivity Chiplet for HPC and AI Infrastructure

Driving Data Frontiers: High-Performance PCIe® and CXL® in Modern Infrastructures

AI System Connectivity for UCIe and Chiplet Interfaces Demand Escalating Bandwidth Needs


Analog Bits Momentum and a Look to the Future
by Mike Gianfagna on 08-27-2024 at 6:00 am

Analog Bits Momentum and a Look to the Future

Analog Bits is aggressively moving to advanced nodes. On SemiWiki, Dan Nenni covered new IP in 3nm at DAC here. I covered the new Analog Bits 3nm IP presented at the TSMC Technology Symposium here. And now, there’s buzz about 2nm IP to be announced at the upcoming TSMC OIP event in September.  I was able to get a briefing from the master of analog IP, enology and viticulture Mahesh Tirupattur recently. The momentum is quite exciting, and I will cover that in this post. There is another aspect to the story – the future impact of all this innovation. Mahesh touched on some of that, and I will add my interpretation of what’s next. Let’s examine Analog Bits momentum and a look to the future.

The Momentum Builds

The Analog Bits catalog continues to grow, with a wide array of data communication, power management, sensing and clocking technology. Here is a partial list of IP that is targeted at TSMC N2:

Glitch Detector (current IP): Instant voltage excursion reporting with high bandwidth and voltage fluctuation detection. Delivers circuit protection and enhances system security in non-intended operation modes. The IP can be cascaded to function similarly to a flash ADC.

Synchronous Glitch Catcher (new IP):  Multi-output synchronized glitch detection. Reports voltage excursions above and below threshold during the clock period with high bandwidth. Improved detection accuracy with system clock alignment that also facilitates debugging and analysis.

Droop Detector (enhanced IP): Extended voltage range 0.495 – 1.05V with higher maximum bandwidth of 500MHz. Differential sensing and synchronous voltage level reporting. Precision in monitoring with continuous observation and adaptive power adjustment. A pinless version that operates at the core voltage is in development.

On-Die Low Dropout (LDO) Regulator (enhanced IP): Improved power efficiency. Fast transient response and efficient regulation and voltage scalability. Offers integration, space savings, and noise reduction. Use cases include high-performance CPU cores and high lane count, high-performance SerDes.

Chip-to-Chip (C2C) IOs (enhanced IP): Supports core voltage signaling. Best suited for CoWoS with 2GHz+ speed of operation and 10GHz+ in low-loss media.

High-Accuracy PVT Sensor (enhanced IP): Untrimmed temperature accuracy was originally +/- 8 degrees C.  An improved version has been developed that delivers +/- 3.5 degrees C. Working silicon is available in TSMC N5A, N4 & N3P. The figure below summarizes performance.

PVT Sensor Temp Performance

Looking ahead, accuracy of +/- 1 degree C is possible with trimming. The challenge is that trimming is affected by the die temperature, making it difficult to achieve this accuracy. Analog Bits has developed a way around this issue and will be delivering high-accuracy PVT sensors for any die temperature.

This background sets the stage for what’s to come at the TSMC OIP event. In September, Analog Bits will tape out a test chip in TSMC N2. Here is a summary of what’s on that chip:

  • Die Size: 1.43×1.43mm
  • Wide-range PLL
  • 18-40MHz Xtal OSC
  • HS Differential Output Driver and Clock Receiver – Power Supply Droop Detector
  • High Accuracy PVT Sensors
  • Pinless High Accuracy PVT Sensor
  • LCPLL
  • Metal Stack – 1P 15M

The graphic at the top of this post is a picture of this test chip layout. In Q1, 2025 there will be another 2nm test chip with all the same IP and:

  • LDO
  • C2C & LC PLL’s
  • High Accuracy Sensor

The momentum and excitement will build.

A Look to the Future

Let’s recap some of the headaches analog designers face today. A big one is optimization of performance and power in an on-chip environment that is constantly changing, is prone to on-chip variation, and is faced with all kinds of power-induced glitches. As everyone moves toward multi-die design, these problems are compounded across lots of chiplets that now also need a high-bandwidth, space-efficient, and power-efficient way to communicate.

If we take an inventory of the innovations being delivered by Analog Bits, we see on-chip technology that addresses all of these challenges head-on. Just review the list above and you will see a catalog of capabilities that sense, control and optimize pretty much all of it. 

So, the question becomes, what’s next? Mahesh stated that he views the mission of Analog Bits as making life easier for the system designer. The solutions that are available and those in the pipeline certainly do that. But what else can be achieved? What if all the information being sensed, managed and optimized by the Analog Bits IP could be processed by on-chip software?

And what if that software could deliver adaptive control based on AI technology? This sounds like a new killer app to me. One that can create self-optimizing designs that will take performance and power to the next level.  I discussed these thoughts with Mahesh. He just smiled and said the future will be exciting.

I personally can’t wait to see what’s next.  And that’s my take on Analog Bits momentum and a look to the future.


Spatial audio concepts targeted for earbuds and soundbars
by Don Dingee on 08-26-2024 at 10:00 am

Spatial audio concepts differ from traditional surround sound

Spatial audio technologies deliver more realistic sound by manipulating how the listener perceives sounds virtually sourced from different directions and distances in a 3D space. Where traditional surround sound technology uses various sound channels through many speakers positioned around a listener, spatial audio can deliver immersive experiences from fewer speakers in smaller packages, such as in a pair of earbuds or a compact soundbar. Kaushik Sethunath, Audio Test Engineer at Ceva, shared some thoughts leading into his series of blog posts explaining spatial audio concepts and parameters that help define innovative designs.

Better sound is intensely subjective for each listener

Audio has been the subject of intense scrutiny from expert reviewers since the initial development of high-fidelity analog recordings on 33rpm vinyl in 1948. Studio engineers became proficient at mixing multiple recorded tracks into stereo formats. At the peak of the vinyl format, 1970s bands like Steely Dan and Pink Floyd produced albums renowned for their complex yet crisp sound, becoming benchmarks for consumer stereo systems.

What constituted “stereo” sound was relatively simple, with left and right speakers standard and optional center and subwoofer channels on higher-end gear. If one spent more money on equipment – sensitive, mechanically smooth turntables, amplifiers with lower distortion and noise and higher dynamic range, and larger, more powerful speakers with improved response – the sound was, at least in theory, perceptibly better.

However, with so many variables in analog audio, including differences in the frequency sensitivity of each listener’s ears, better sound was a subjective measure. Vinyl records would degrade with handling and excessive play, altering even great experiences. Then, audio went digital, first on physical CDs, then in file formats such as MP3. Digital recordings don’t degrade over time, and new delivery mechanisms appeared.

Perhaps more importantly, digital audio technology ushered in significant engineering changes. Users moved from large, fixed stereo equipment and the 12” vinyl format to smaller, less expensive portable gear playing CDs or files. Some audio engineers responded by recording content for listening through lower-quality headphones in noisy ambient settings, using higher sound levels with less dynamic range, leaving the sound good enough for most listeners.

Use cases drive a need for an audio parameter framework

In the last few years, the pendulum has swung back: consumers can now buy digital audio technology rivaling high-end surround sound systems in affordable soundbars and earbuds, with pervasive streaming technology delivering more sophisticated audio formats like Dolby Atmos and DTS:X. The low-quality approaches to content are leaving listeners wanting more, and they are willing to spend incrementally more to get better quality they can hear.

“Trying to preserve the integrity of the original artist’s vision is really important,” says Sethunath. “We think the best way to experience sound is with different settings for different content. A podcast heard while commuting is a very different use case from a movie in the comfort of a home theater, and gamers have other needs, so there is no one-size-fits-all. Accordingly, based on the content, the parameters of the spatial audio processing need to be tuned, to create the appropriate spatial experience.”

Sethunath sees a more complex landscape where the industry lacks a framework to compare and quantify audio performance in different use cases. He proposes eight technical parameters in two broad categories to guide both spatial audio device design and content curation:

  • Spatialization
    • Degree of Externalization
    • Room Character and Presets
    • Maximum Number of Channels Rendered
    • Mono and Stereo Rendering
    • Artifacts
  • Head Tracking
    • Latency
    • Degrees of Freedom
    • Artifacts

There are tradeoffs and design decisions with host-based rendering (using the power of phones and tablets to do the heavy lifting of spatial audio processing) and embedded rendering on the headset (lowest latency, but without direct multi-channel support due to Bluetooth bandwidth limitations). Ceva provides optimized solutions for both architectures, including head tracking technology to enhance realism in affordable devices.

“I think creating a smoother onboarding process to spatial audio, walking people through what it can do and content that highlights the experience, will be compelling,” says Sethunath. He’s created a new series of three blog posts on spatial audio concepts, explaining the parameters in more detail and describing how designers can evaluate implementations. Links to the posts:

Evaluating Spatial Audio – Part 1 – Criteria & Challenges

Evaluating Spatial Audio – Part 2 – Creating and Curating Content for Testing

Evaluating Spatial Audio – Part 3 – Creating a Repeatable System to Evaluate Spatial Audio


For readers interested in Ceva’s IP with solutions for head tracking, more info is also online:

Ceva-RealSpace: Spatial Audio & Head Tracking Solution


A Closer Look at Conquering Clock Jitter with Infinisim
by Mike Gianfagna on 08-26-2024 at 6:00 am

A Closer Look at Conquering Clock Jitter with Infinisim

As voltages go down and frequencies increase, the challenges in chip design become increasingly complex and unforgiving. Issues that once seemed manageable now escalate, while new obstacles emerge, demanding our attention. Among these challenges, clock jitter stands out as a formidable threat. At its core, clock jitter is defined as the variation of a clock signal from its ideal position in time. Seemingly minor, these kinds of subtle variations in the clock can cause catastrophic failures in high-performance designs. Previously, Dan Nenni provided a great overview of the problem and what Infinisim is doing about it here. Recently, I had the opportunity to speak directly with a co-founder of Infinisim, where I gained deep insights into the magnitude of the clock jitter problem and the monumental efforts required to address it. Read on for a closer look at conquering clock jitter with Infinisim.

Contributors to Clock Jitter

There are two main contributors to clock jitter – the PLL and the power delivery network (PDN). The PLL can deliver a noisy input signal to the clock circuit, creating jitter in the clock. In this case, the jitter is the same throughout the entire clock since it comes from one source. This localized effect isn’t the main focus for Infinisim’s tools. Instead, the company focuses on a much larger and more complex system design challenge, PDN induced jitter.

PDN jitter arises from a noisy supply voltage. Unlike PLL-induced jitter, PDNs can be influenced by multiple input pins and encompass numerous power domains. Add to that the local effects at each gate and you begin to see a pervasive and difficult-to-track problem. This is the area where Infinisim concentrates its efforts. The figure below illustrates these challenges.

PDN Jitter Challenge

What it Takes to Fix Clock Jitter

Dr. Zakir Hussain Syed

I had a highly informative discussion with Dr. Zakir Hussain Syed. Zakir is a co-founder and CTO at Infinisim with over 25 years of experience in EDA. His deep understanding of the issues was evident throughout our discussion, and I gained a wealth of knowledge from our exchange.

Zakir began by explaining the components of PDN-induced clock jitter. In the case of the PDN, every gate in the clock can see some level of noise-induced jitter. Each is an independent event, and the movement of clock edges is very small. Each event has the potential to change timing and behavior of the circuit. To find the best- and worst-case jitter in the circuit requires simulation of thousands of clock cycles – the errors can compound and the only way to find that is to simulate many cycles.

Furthermore, since the edge movement is very small, the simulation must be highly accurate. So, finding PDN-induced clock jitter requires SPICE-level accurate simulation over many cycles as quickly as possible. Remember, this is part of the verification loop, so speed is quite important.  Do you have a headache yet?  I began to at this point.

As Zakir continued, the problem got worse. Clock domains are becoming more complex thanks to multiple voltage domains. This creates more independent noise sources. Beyond that, power comes into the chip through many bump connections – potentially hundreds of bumps. Each bump will have its own noise signature which yet again increases the variety of issues that must be analyzed.

All this creates multiple types of clock jitter:

  • Absolute jitter:
    • The difference between each actual clock transition time and the ideal transition time
  • Period jitter:
    • The difference between each actual clock period and the ideal period
  • Cycle-to-cycle jitter:
    • The difference in period jitter between two adjacent cycles

The figure below summarizes these effects.

Types of Clock Jitter
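To make these definitions concrete, here is a toy Python sketch (not an Infinisim tool) that computes all three metrics from a list of measured clock-edge times:

    def jitter_metrics(edges, period):
        """Compute absolute, period, and cycle-to-cycle jitter from rising-edge times.

        edges  : measured edge timestamps, e.g. from simulation (same unit as period)
        period : ideal clock period; edge k should ideally land at k * period
        """
        absolute = [t - k * period for k, t in enumerate(edges)]       # vs. ideal edge
        period_j = [b - a - period for a, b in zip(edges, edges[1:])]  # vs. ideal period
        c2c      = [b - a for a, b in zip(period_j, period_j[1:])]     # adjacent periods
        return absolute, period_j, c2c

    # Example: a nominally 100 ps clock whose edges arrive slightly off-ideal
    abs_j, per_j, c2c_j = jitter_metrics([0.0, 100.4, 199.8, 300.1], period=100.0)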

Zakir then provided a bit of history for perspective. For the case of on-chip variation (OCV), initially the worst-case number was used for guard banding. As designs got more complex, applying just one number created an overly pessimistic metric, and the result was very poor circuit performance. For many years now, OCV has been calculated across the chip at a very fine-grained local level to provide more realistic guard bands. We are now at a point where the same strategy needs to be applied to clock jitter guard banding. A single number must be replaced by fine-grained analysis across the entire clock of the chip.

That fine-grained analysis looks at the noise per gate, per path, per noise profile for each cycle. Designers are looking for the best- and worst-case jitter on a local level to develop the guard banding to use. It turns out the worst-case jitter can happen anywhere in the path, not just on the input of the flops. If you couple that fact with the per-noise-profile analysis, designers can not only develop much more accurate guard bands, but also reduce the jitter in the circuit.

Zakir explained that the per-gate analysis can identify the weakest gate in the path from a jitter perspective. That gate can then be modified to be less susceptible to jitter. The per-noise-profile analysis can find the power bumps that generate the most noise, and those, too, can be modified to improve performance. All this helps improve the overall circuit performance in meaningful ways.

So, how does Infinisim manage to analyze all those profiles, circuits and scenarios over thousands of cycles with sub-picosecond resolution in a reasonable time frame? Zakir explained that relying on traditional SPICE simulations isn’t feasible – it will simply take far too long. Instead, he detailed Infinisim’s holistic approach to tackling this challenge.

First, the noise in the circuit is developed either with a commercial IR drop tool or with measurements if silicon is available. That data is then analyzed by Infinisim’s ClockEdge and JitterEdge tools holistically across full clock domains. Analyzing the data over many scenarios finds the positive and negative jitter at every gate in the clock.

What is the Impact of Clock Jitter?

Just how big a problem is clock jitter? There are several potential impacts on chip performance and reliability. These include:

Slower chip performance: Clock jitter leads to timing uncertainties. This can cause data to arrive too early or too late, resulting in timing violations. To mitigate this, timing margins are increased which slows the clock frequency.

Lower yield: Clock jitter can cause a higher rate of timing failures, particularly in chips operating close to their performance limits. This can lead to a higher percentage of chips failing during testing and thus a lower manufacturing yield.

So, the question is, what’s the impact of the above effects? Here is one quick “back of the envelope” calculation. Assume a manufacturing cost per chip of $50 for a design with a projected volume of 1 million units. Further assume an expected yield without jitter issues of 95%. If we assume that jitter lowers yield by 5% (a 5% drop in yield due to jitter is a reasonable assumption for a high-volume production environment where even small timing issues can have significant impacts), the following will result:

  • Design without jitter issues:
    • Chips produced: 1,000,000
    • Yield: 95%
    • Good chips: 950,000
    • Cost per good chip: $52.63
  • With jitter issues (5% lower yield):
    • Chips produced: 1,000,000
    • Yield: 90%
    • Good chips: 900,000
    • Cost per good chip: $55.56
  • Increased cost per chip: $2.93
  • Total additional cost: $2.93 * 900,000 = $2,637,000
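The arithmetic is easy to reproduce; here is a small Python sketch of the same back-of-the-envelope calculation (the slight difference from the figures above is rounding):

    def cost_per_good_chip(volume, unit_cost, yield_pct):
        good = volume * yield_pct / 100
        return good, volume * unit_cost / good

    good_hi, cost_hi = cost_per_good_chip(1_000_000, 50, 95)  # without jitter issues
    good_lo, cost_lo = cost_per_good_chip(1_000_000, 50, 90)  # jitter takes 5% of yield

    delta = cost_lo - cost_hi                                 # ~= $2.92 per good chip
    print(f"total additional cost: ${delta * good_lo:,.0f}")  # ~= $2.63M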

Jitter could easily cost you millions over the lifespan of a chip, and this calculation doesn’t even consider the potential loss of market share from a slower chip due to increased timing margins—a much greater concern in competitive markets where even minor performance deficits can lead to significant losses. The specifics may vary depending on your situation, but one thing is certain—clock jitter is a critical issue that cannot be overlooked.

To Learn More

If you are designing high-performance chips, you’re likely lowering voltage and boosting frequency – both of which elevate clock jitter to a critical first-order issue. I strongly recommend exploring how Infinisim can assist with this challenge. You can learn more about Infinisim’s jitter analysis capabilities here. You can also get a broad overview of what Infinisim can do along with access to a webinar replay on clock analysis at 7nm and below here.  And that’s a closer look at conquering clock jitter with Infinisim.


Podcast EP243: What is Yield Management and Why it is Important for Success with Kevin Robinson
by Daniel Nenni on 08-23-2024 at 10:00 am

Dan is joined by Kevin Robinson, yieldHUB’s Vice President of Operations who was recently appointed Head of Sales for Europe, the Middle East & Africa. With over 23 years of experience as a test engineer in the semiconductor industry, Kevin brings a wealth of knowledge and dedication to his dual role. At yieldHUB, Kevin leads both sales and operations teams, playing a crucial role in delivering top-notch experiences to UK and European customers.

Kevin explains the basics of yield management in this broad conversation. He outlines the reasons to implement yield management early, which includes better market traction through customer trust and acceptance.

The aspects of buy vs. build for a yield management system are also explored, along with the risks of not implementing an early yield management system that can scale.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: BRAM DE MUER of ICsense
by Daniel Nenni on 08-23-2024 at 6:00 am


Bram co-founded ICsense in 2004 as a spin-off of the University of Leuven. He has been CEO since 2004 and helped grow the company from 4 to over 100 people in 20 years while remaining profitable every year. He managed the acquisition by TDK in 2017. He is an experienced entrepreneur in the micro-electronics field with a strong interest in efficiently managing design teams and delivering projects with high quality.

Bram is a board member of Flanders Semiconductor, a non-profit organization that represents the Belgian semiconductor ecosystem. He is also a member of the Crown Counsel of SOKwadraat, a non-profit organization to boost the number of spin-offs in Belgium. He holds an MSc degree in micro-electronics and a Ph.D. from the Katholieke Universiteit Leuven, Belgium. Bram has been a research and postdoctoral assistant with the ESAT-MICAS laboratories under Prof. M. Steyaert.

Tell us about your company?
At ICsense, we specialize in analog, mixed-signal and digital ASIC (Application-Specific Integrated Circuit) developments. We handle the complete chain from architectural definition, design and in-house test development up to mass production of the custom components. Today, we are one of the largest fabless European companies active in this domain.

I co-founded ICsense with 3 of my PhD colleagues back in 2004. Our focus has always been on analog, digital, mixed-signal, and high-voltage ICs, serving diverse industries including automotive, medical, industrial, and consumer electronics. ICsense is headquartered in Leuven, Belgium and has a design center in Ghent, Belgium. The semiconductor ecosystem in Belgium is quite lively, with imec as a renowned research center, world-class universities and many industrial players in different parts of the semiconductor value chain represented by Flanders Semiconductors.

In 2017, we became part of the Japanese TDK Group (www.tdk.com), a leading supplier of electronic components. This enabled us to continue our strategy and serve customers worldwide as before. What many people don’t realize is that the majority of ICsense’s business today is outside the TDK Group!

Joining TDK has allowed us to grow faster and broaden our activities. We have invested in ATE (Automated Test Equipment, mass production testers and wafer probers) to do test program developments in-house. This makes ICsense unique in the market of ASIC suppliers, capable of building some of the highest-performance ASICs and bringing them into production for our customers.

What problems are you solving?
Many industries require specialized ICs tailored to specific applications that off-the-shelf solutions often cannot adequately serve. To meet this need, we design custom ASICs for the automotive, medical, industrial, and consumer electronics sectors, ensuring optimal performance and functionality.

Designing high-performance analog and mixed-signal ICs is inherently complex and requires specialized expertise. This expertise is the reason our customers knock on our door. Leveraging our extensive experience in analog, digital, mixed-signal, and high-voltage ICs, we deliver robust and reliable solutions. We develop advanced sensor interfaces, power management solutions, high-voltage actuation and sensing circuits, ultra-low-power circuitry and communication chips.

Every chip is uniquely built for a single customer at a time and only supplied to that customer. The customer’s IP is fully protected to keep their competitive edge in the market.

What application areas are your strongest?
In our 20 years of existence, we have built up a strong track record in complex ASIC developments in different technology nodes and for many different applications. We often push the boundaries to reach the highest performance or tweak the last uA out of a circuit. We are definitely not an “IP-gluer” (i.e. a company that simply combines existing IP blocks without modifications). Our design work is mostly custom, to meet the challenging requirements our customers are faced with.

Over the past 10 years, we have seen strong growth in industries such as automotive and medical that require ICs meeting stringent quality and reliability standards. To address this, we employ rigorous design techniques. ICsense works according to ISO 13485 (medical) and ISO 26262 (automotive) compliance standards. To give you one example, all the automotive ASICs we developed in the last 5 years are at least ASIL-B(D) Functional Safety level.

What keeps your customers up at night?
It really depends on the specific customer. We don’t have a typical client profile; our customers range from startups to large multinationals, from semiconductor companies to OEMs, each with their own unique concerns and expectations. In the medical market, for example, we collaborate with industry leaders in implants, such as Cochlear, as well as with brand-new startups aiming to bring novel ideas to new markets. The common ground among all our clients is their need for a partner who can build innovative, state-of-the-art ASICs with low risk and who supports sustainable production. They appreciate that ICsense combines the flexibility and dynamic team of a startup company, with the rigour, stability and sustainability of a large company.

In recent years, another major concern for our customers has been de-risking their supply chains. Discussions now frequently revolve around second sourcing and geopolitical issues. In response, we have been exploring more technology and partner options across the supply chain. Today, we are one of the few companies worldwide that can offer IC design in over 50 technology flavors, with fabrication facilities in the US, Europe, and Taiwan. Our specific design methodology allows us to efficiently work across various technology nodes, ensuring we can select the best match for our customers.

What does the competitive landscape look like and how do you differentiate?
Lately, there has been a lot of consolidation in the semiconductor value chain in Europe. As a result, ICsense remains one of the few companies of its size and capabilities that can serve external customers. Thanks to our mother company TDK, we can provide ASICs to Fortune 500 companies and to smaller companies and startups at the same time. With a team of over 100 skilled designers and in-house ATE and product engineering, we have a unique position in ASIC design and supply to the medical, industrial, consumer and automotive markets.

What new features/technology are you working on?
All our ASIC developments are customer-specific. Some will hit the market as an ASSP sold by our customer, most as part of a single product. Therefore, all the technology and features we are developing are confidential. We see some trends in the market, such as a shift towards smaller technology nodes (although not deep submicron) and a shift towards more differentiation in the supply chain. Our technology-agnostic design approach is quite powerful for capturing this trend.

Another trend is the push to higher integration and more functionality in many applications, from medical implants to industrial devices, that push the boundaries of the state-of-the-art. Again, this is one of our core strengths.

How do customers normally engage with your company?
We work with customers in 2 models: the first is a pure design support model, where we act as a virtual team for our customer. We perform the full design and hand over the design files, so our customer can integrate this further or handle the manufacturing themselves. Our second and most popular model is the turnkey supply model or – as we call it – ASIC design and supply. We handle the complete development from study up to mass production for our customer and we supply the ASICs to them throughout the lifetime of their product.

An ASIC design can start with just a back of the envelope idea or a full product requirement. Whatever the starting point, our first step is always to do a feasibility and architectural study in which we pin down all the details of the design to be made, define boundary conditions and prove with calculations and preliminary simulations that the requirements can be met.

We then proceed to the actual implementation, the design and layout work, which is the bulk of the work in the project. Through the design cycle, we continuously perform in-depth verification from transistor to chip top level to make sure all use cases are covered prior to the actual manufacturing of the wafers. In parallel to the manufacturing of the engineering silicon, we develop the ATE test hardware and software so that when the silicon returns from the fab, we can immediately start testing.

We have a good track record in first-time functional designs, meaning that the ASIC is fully functional and can be used to build prototypes at the customer side. We typically only need a respin to fix small items and to optimise the yield. This is a result of our proprietary, systematic design flow based on commercially available EDA tools from Cadence, Synopsys and Siemens.

The last stage is industrialisation, including qualification of the chips and additional statistical analysis to prove robustness over the lifetime of the product. Our product engineering team supports our customer with the ramp-up, start of production, monitoring of yield, … during production. The supply model, direct or through partners, depends on the volume and the type of customer.

Also Read:

CEO Interview: Anders Storm of Sivers Semiconductors

CEO Interview: Zeev Collin of Semitech Semiconductor

CEO Interview: Yogish Kode of Glide Systems


Overcoming Verification Challenges of SPI NAND Flash Octal DDR
by Kalar Rajendiran on 08-22-2024 at 10:00 am

Typical Octal Serial NAND Device

As the automotive industry continues to evolve, the demands for high-capacity, high-speed storage solutions are intensifying. Autonomous vehicles and V2X (Vehicle-to-Everything) communication systems generate and process massive amounts of data, necessitating advanced storage technologies capable of meeting these demands. NAND Flash memory, particularly in its Serial NAND form, has emerged as a critical component in this space, offering higher memory density compared to alternatives like NOR Flash. However, the adoption of new architectures, especially those involving SPI Octal DDR interfaces, presents unique challenges in the verification of these storage solutions.

Durlov Khan, a Product Engineering Lead at Cadence, gave a talk at the FMS 2024 Conference on how his company helped overcome these verification challenges.

Challenges in Verifying SPI NAND Flash Octal DDR

One of the significant hurdles in integrating SPI Octal DDR NAND Flash into automotive applications is the difficulty in accurately verifying these advanced storage devices. Traditional verification models for NOR Flash memory cannot adequately model the architecture and addressing schemes of Serial NAND Flash memory, especially when it comes to the Command-Address-Data (C-A-D) instruction sequences.

Existing models for x1, x2, or x4 SPI Quad NAND devices fall short in simulating Octal SPI NAND devices due to key differences in architecture. Octal SPI NAND uses an 8-bit wide data bus, requiring more complex Command-Address-Data (C-A-D) sequences and additional signal pins (SIO3-SIO7), which aren’t supported by Quad SPI models.

Additionally, Octal devices operate at higher frequencies with stricter timing parameters, including the use of a Data Strobe (DS) signal for data synchronization. These factors make existing Quad SPI models inadequate for accurately simulating the behavior of Octal SPI NAND devices.

Attempting to replicate an Octal device by combining multiple SPI or SPI Quad NAND devices is not feasible due to signaling incompatibilities and significant discrepancies in AC/Timing parameters, leading to inaccurate verification results. This gap in verification capabilities poses a substantial risk, as it limits developers’ ability to ensure that their automotive storage solutions will perform reliably in real-world scenarios.

Collaborative Solution: SPI NAND Flash Memory Model Enhancement

To address these challenges, a collaborative effort was undertaken by Cadence, in partnership with Winbond, leading to the development of a robust solution for SPI Octal DDR verification. This solution centers around the enhancement of the Cadence SPI NAND Flash Memory Model, which now supports the new SPI Octal DDR capabilities.

This enhanced Memory Model can be activated through a configuration parameter and includes additional support for a Volatile Configuration Register. This register allows users to program the correct Octal transfer mode, enabling accurate simulation of the SPI Octal DDR interface. In this mode, legacy SI and SO pins are repurposed, and new SIO3-SIO7 pins are introduced, along with a Data Strobe (DS) output pin that works with read data to signal the host controller at maximum DDR frequencies.

The model is fully backward compatible and can operate in multiple modes, including 1-bit SPI Single Data Rate (SDR), 1-bit SPI Double Data Rate (DDR), 8-bit Octal SPI SDR, and 8-bit Octal SPI DDR, depending on user configuration. This flexibility ensures that developers can accurately simulate a wide range of operational scenarios, crucial for the varying demands of automotive applications.

Real-World Application and Results at NXP

The integration of the Cadence VIP into NXP’s test environment demonstrated the effectiveness of this solution. The VIP seamlessly supported various density grades of SPI NAND Flash, with commands automatically adapting to the specific density grade in use. This adaptability and the ability to accurately model the SPI Octal DDR interface provided NXP with a reliable verification tool, ensuring that their storage solutions met the stringent performance and reliability standards required in the automotive sector.

Summary

The challenges in verifying SPI NAND Flash Octal DDR devices highlight the complexities of developing advanced storage solutions for the automotive industry. However, through collaborative efforts and innovative solutions like the enhanced SPI NAND Flash Memory Model from Cadence, developers can overcome these challenges. This advancement not only supports the current needs of automotive applications but also lays the groundwork for future innovations in storage technology, ensuring that the next generation of vehicles can handle the ever-increasing demands of data processing and storage with efficiency, reliability, and security.

For more details, visit Cadence’s SPI NAND solutions page.

Also Read:

The Impact of UCIe on Chiplet Design: Lowering Barriers and Driving Innovation

The Future of Logic Equivalence Checking

Theorem Proving for Multipliers. Innovation in Verification


CAST Advances Lossless Data Compression Speed with a New IP Core
by Mike Gianfagna on 08-22-2024 at 6:00 am

CAST Advances Lossless Data Compression Speed with a New IP Core

Data compression is a critical element of many systems. Thanks to trends such as AI and highly connected systems there is more data to be stored and processed every day.  Data growth is staggering. Statista recently estimated that 90% of the world’s data was generated in the last two years. Storing and processing all that data demands ways to reduce the space it takes.

Data compression takes two basic forms – lossless and lossy. Lossy data compression can result in significantly smaller file sizes, but with the potential loss of some degree of quality. JPEG and MP3 are examples. Lossless compression also produces smaller files, but with complete fidelity to the original. Some data cannot tolerate loss during compression — such as text, code, and binaries — while for other data the maximum, original quality is essential — think medical imaging or financial information. GIF, PNG and ZIP are lossless formats.

So lossless data compression is quite prevalent. That’s why a new IP core from CAST has such significance. Let’s look at how CAST advances lossless data compression speed with a new IP core.

Lossless Compression Basics

As discussed, lossless compression doesn’t degrade the data, and so the decompressed data is identical to the original, just with a somewhat smaller file size. Lossless compression typically works by identifying and eliminating statistical redundancy in the information. This can require additional computing time, so ways to speed up the process are important.

There are many algorithms that can be applied to this problem. Two popular ones are:

LZ4 – features an extremely fast decoder. The LZ4 library is provided as open-source software under a BSD license.

Snappy – a compression/decompression library focused on very high speed with reasonable compression. The software is provided by Google and is used extensively by the company.
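As a quick illustration of what “lossless” means in practice, here is a minimal sketch using the Python bindings of the open-source LZ4 library mentioned above (assuming the lz4 package is installed):

    # pip install lz4 -- Python bindings for the BSD-licensed LZ4 library
    import lz4.frame

    original = b"ABABABABABAB" * 1000            # redundant data compresses well
    packed = lz4.frame.compress(original)
    restored = lz4.frame.decompress(packed)

    assert restored == original                  # lossless: bit-exact roundtrip
    print(len(original), "->", len(packed), "bytes")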

The CAST Announcement

With that primer, the recent CAST announcement will have better context.

New LZ4 & Snappy IP Core from CAST Enables Fast Lossless Data Decompression.

This announcement focuses on a new IP core from CAST that accelerates the popular LZ4 and Snappy algorithms. The core can be used in both ASIC and FPGA implementations and delivers a hardware decompression engine. Average throughput is a peppy 7.8 bytes of decompressed data per clock cycle in its default configuration. And since it’s an IP core, decompression can be improved to 100Gbps or greater by instantiating the core multiple times.
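To put that per-cycle figure in perspective (the clock frequency here is an assumption for illustration, not a CAST specification), at 800 MHz a single instance delivers

$$7.8\ \mathrm{bytes/cycle} \times 800\ \mathrm{MHz} = 6.24\ \mathrm{GB/s} \approx 50\ \mathrm{Gbps}$$

so two or three instances clear the 100Gbps mark.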

Called LZ4SNP-D, CAST believes it is the first available RTL-designed IP core to implement LZ4 and Snappy lossless data decompression in ASICs or FPGAs from all popular providers. Systems using the core benefit from standalone operation, which offloads the host CPU from the significant tasks of decompressing data.

The core handles the parsing of the incoming compressed data files with no special preprocessing. Its extensive error tracking and reporting capabilities ensure smooth system operation, enabling automatic recovery from CRC-32, file size, coding, and non-correctable ECC errors.

My Conversation with the CEO

At DAC, I was able to meet with Nikos Zervas, the CEO of CAST. I found Nikos to be a great source of information on the company. You can read about that conversation here. So, when I saw this press release, I reached out to him to get some more details.

It turns out lossless data compression isn’t new for CAST. The company has been offering GZIP/ZLIB/Deflate lossless data compression and decompression engines since 2014. These engines have scalable throughput, and there are customers using them to compress and decompress at rates exceeding 400Gbps.

Applications include optimization for storage and/or communication bandwidth for data centers (e.g., SSDs, NICs) and automotive (for recording sensor data).  For other applications, the delay and power for moving large amounts of data between SoCs or within SoCs is optimized. I can remember several challenging chip designs during my ASIC days where this kind of problem can quickly become a real nightmare.

An example to consider is a flash device’s inline decompression of boot images. Flash memories are slow and power-hungry. Both the latency (i.e., boot time) and energy consumption for loading a boot image can be significantly reduced by storing a compressed file for that boot image and decompressing it on the fly during boot. Other use cases involve chipsets exchanging large amounts of data over Gbps-capable connections, or parallel processing platforms moving data from one processing element to the next.

It turns out CAST has tens of customers ranging from Tier 1 OEMs in the networking and telecom equipment area to augmented reality startups that use GZIP/ZLIB/Deflate lossless cores.

I asked Nikos, why introduce new lossless compression cores now? He explained that LZ4 and Snappy compression may not be as efficient as GZIP, but they are less computationally complex, and this attribute makes LZ4 and Snappy attractive in cases where compression is at times (or always) performed in software. The lower computational complexity also translates to a smaller and faster hardware implementation, which is also important, especially in the case of high processing rates (e.g., 100Gbps or 400Gbps) where the size of the compression or decompression engines is significant (in the order of millions to tens of millions of gates).

CAST had received multiple requests for these faster compression algorithms over the past couple of years. Nikos explained that the company listened and responded with the hardware LZ4/Snappy decompressor. He went on to say that a compressor core will follow. This technology appears to be quite popular. CAST had its first lead customer signed up before announcing the core.

To Learn More

The new LZ4SNP-D LZ4/Snappy Data Decompressor IP core is available now. It can be delivered in synthesizable HDL (System Verilog) or targeted FPGA netlist forms and includes everything required for successful implementation. Its deliverables include:

  • Sophisticated test environment
  • Simulation scripts, test vectors, and expected results
  • Synthesis script
  • Comprehensive user documentation

If your next design requires lossless compression, you should check out this new IP from CAST here.  And that’s how CAST advances lossless data compression speed with a new IP core.


Robust Semiconductor Market in 2024
by Bill Jewell on 08-21-2024 at 1:30 pm

Semiconductor Market Change 2024

The global semiconductor market reached $149.9 billion in the second quarter of 2024, according to WSTS. 2Q 2024 was up 6.5% from 1Q 2024 and up 18.3% from a year ago. WSTS revised 1Q 2024 up by $3 billion, making 1Q 2024 up 17.8% from a year ago instead of the previous 15.3%.

The major semiconductor companies posted generally strong 2Q 2024 revenue gains versus 1Q 2024. Of the top fifteen companies, only two – MediaTek and STMicroelectronics – saw revenue declines in 2Q 2024. The strongest growth was from the memory companies, with SK Hynix and Kioxia each up over 30%, Samsung Semiconductor up 23% and Micron Technology up 17%. The weighted average growth of the top fifteen companies in 2Q 2024 versus 1Q 2024 was 8%, with the memory companies up 22% and the non-memory companies up 3%.

Nvidia remained the largest semiconductor company, based on its 1Q 2024 guidance of $28 billion in 2Q 2024 revenue. Samsung was number two at $20.7 billion. Broadcom has not yet reported its 2Q 2024 results, but we estimate revenues at $13.0 billion, passing Intel at $12.8 billion. Intel slipped to fourth, after many years of being number one or number two.

Revenue guidance for 3Q 2024 versus 2Q 2024 is positive, but with a wide range of outlooks. AMD expects 3Q 2024 revenue to increase 15% based on strong growth in data center and client computing. Micron indicated the memory boom will continue, with supply below demand, and guided for 12% growth. Samsung Semiconductor and SK Hynix did not provide revenue guidance, but both companies expect continuing strong demand from server AI.

A few companies project low 3Q 2024 revenue growth of about 1%: Intel, MediaTek and STMicroelectronics. Intel blamed excess inventory for the weak outlook. The other five companies providing revenue guidance are in the 4% to 8% range. STMicroelectronics and NXP Semiconductors expect automotive to improve in 3Q 2024, but inventory issues remain in the industrial sector. Texas Instruments projects strength in personal electronics. The 3Q 2024 weighted average revenue growth of the nine non-memory companies providing guidance was 5%.

The substantial increase in the semiconductor market in the first half of 2024 (up 18% from the first half of 2023) will drive robust growth for the full year 2024. 2024 forecasts from the last few months range from 14.4% from the Cowan LRA Model to 20.7% from Statista Market Insights. Our Semiconductor Intelligence (SC-IQ) projection of a 17.0% increase in 2024 is in line with Gartner at 17.4% and WSTS at 16.0%.

The four estimates for 2025 show similar trends – slower but still strong growth ranging from our Semiconductor Intelligence’s 11.0% to Statista’s 15.6%. The growth deceleration from 2024 to 2025 ranges from minus 3.5 percentage points from WSTS (16% to 12.5%) to our minus 6 percentage points (17% to 11%). Our initial projections for 2026 are in the mid-single digits. The momentum from AI and a recovering memory market should taper off by then. The other major end markets (smartphones, PCs and automotive) will probably see flat to low growth in the next couple of years. Barring any significant new growth drivers to boost the market or an economic downturn to depress the market, the outlook for the semiconductor market should remain in the mid-single digits through the end of the decade.

Semiconductor Intelligence is a consulting firm providing market analysis, market insights and company analysis for anyone involved in the semiconductor industry – manufacturers, designers, foundries, suppliers, users or investors. Please contact me if you would like further information.

Bill Jewell
Semiconductor Intelligence, LLC
billjewell@sc-iq.com

Also Read:

Semiconductor CapEx Down in 2024, Up Strongly in 2025

Automotive Semiconductor Market Slowing

2024 Starts Slow, But Primed for Growth