
Podcast EP179: An Expert Panel Discussion on the Move to Chiplets
by Daniel Nenni on 09-01-2023 at 10:00 am

Dan is joined by a panel of experts to discuss chiplets and 2.5/3D design. The panelists are Saif Alam, Vice President of Engineering at Movellus Inc.; Tony Mastroianni, Advanced Packaging Solutions Director at Siemens EDA; and Craig Bishop, CTO of Deca Technologies.

In this spirited and informative discussion the panel explores the move to chiplets. Why it's happening now and who can benefit from the trend are discussed in detail, along with considerations for ecosystem management, design methodology, the role of standards, and how to address the risks associated with this new design style.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


The Incredible Journey of Analog Bits Through the Eyes of Mahesh Tirupattur
by Mike Gianfagna on 09-01-2023 at 6:00 am


If you’ve designed a chip with analog content (and who hasn’t), you know Analog Bits. Along the way, you likely met Mahesh. If you are a lover of fine wines, you probably know Mahesh quite well. More on that later. I got the opportunity to speak with him recently about what he’s been up to, both now and over the past few years. It’s a story about a love for technology and a love for wine. If you believe that wine is an art form, then the statement “life imitates art” is very relevant to what follows. Read on to learn about the incredible journey of Analog Bits through the eyes of Mahesh Tirupattur.

Wine, and Life Imitating Art

Just like in the high-tech world of Silicon Valley, there are many M&A transactions occurring in Napa Valley and beyond. Private equity firms are acquiring and consolidating many of the large wineries we’ve come to know and love over the years. While the details of these transactions are often not public, I do know a few facts about some of the larger ones, thanks to my love for wine and the connections I’ve made along the way.

The model for several of these deals is quite unique. A private equity firm will acquire a controlling interest in a winery and then essentially do nothing, allowing the original creativity to flow unhindered. The message to the owners is simple – we love your wine and the brand you’ve built. Please continue to do what you do regarding making your product. We’ll worry about the operational details.  And when you’re ready to step down, just call us and we’ll be ready to take over. Until then, keep focused on your passion.

This laissez-faire acquisition strategy from the wine industry has found its way to other transactions as well, a case in point being the acquisition of Analog Bits by SEMIFIVE about a year and a half ago. As covered on SemiWiki here, there are a variety of business models and foundry relationships that comprise the combination of these two companies. SEMIFIVE is the pioneer of platform-based SoC design, working with customers to implement innovative ideas into custom silicon in the most efficient way. The company has a close relationship with Samsung Foundry. Analog Bits is the leader in developing and delivering low-power integrated clocking, sensors and interconnect IP that are pervasive in virtually all of today's semiconductors. The company has not only developed IP on the Samsung process, but it also has a close and growing relationship with TSMC.

One approach (and a common one) would be to combine the operations of both companies into one model with one set of relationships. That would cause a significant ripple effect in one or both of these companies' businesses, and not a good ripple effect. Rather than do that, SEMIFIVE took a page out of the winery acquisition playbook being used in Napa Valley and elsewhere.

Analog Bits continues to operate as an independent entity, but now as part of a larger enterprise. The company continues to do the things it loves to do, providing critical enabling IP that its customers need. Dan Nenni summarized it well in the SemiWiki post:

To me this acquisition is another 1+1=3. SEMIFIVE gets a strong IP base in North America plus foundry and customer relationships that have been silicon proven for 20+ years. Analog Bits gets the ability to scale rapidly and increase the depth and breadth of their IP offering.

I mentioned a connection between Mahesh and wine earlier. It turns out he is quite an accomplished Sommelier as well as a technologist, completing three of the four levels that pave the way to Master Sommelier. While there is still more road ahead for Mahesh to achieve this ultimate title, his progress in the face of also building a very successful IP business is noteworthy. There are 269 Master Sommeliers in the world today. This is truly a rare achievement. Mahesh has also become an expert in the making of Sake, which he claims is far more complex and nuanced than wine.

Perhaps this is the topic of a future blog post or podcast.

The Road Ahead

During my discussions with Mahesh, it was quite clear that he was happy with the outcome of the acquisition. Being able to operate independently, continuing to do what he loves with the backing of a larger enterprise, feels good. I can imagine the winemakers that were part of the Napa Valley acquisitions saying the same thing.

He talked about the great position Analog Bits enjoys in the development of purpose-built IP blocks for various high-growth markets. The track record and customer-focused nature of the company make this a great match. Mahesh talked about many new market opportunities. One interesting one is power management and spike detection. With so many cores and power domains in advanced designs, often fueled by AI, power spikes have become a very real liability. Analog Bits is developing on-chip IP to sense and manage these events.

Overall, Analog Bits is becoming more “sticky” for advanced designs thanks to their broad catalog and excellent track record. According to Mahesh, the future is bright and a larger operation at Analog Bits seems likely. And that’s just part of the incredible journey of Analog Bits through the eyes of Mahesh Tirupattur.

 


ISO 21434 for Cybersecurity-Aware SoC Development
by Kalar Rajendiran on 08-31-2023 at 10:00 am

Cybersecurity agreement in supply chain

The automotive industry is undergoing a remarkable transformation, with vehicles becoming more connected, automated, and reliant on software. While these advancements promise convenience, comfort and efficiency to the consumers, the nature and complexity of the technologies also raise concerns for functional safety and security. The ISO 26262 standard was established for ensuring a systematic approach to functional safety in the automotive industry. This standard provides a comprehensive framework for managing functional safety throughout the entire product development lifecycle, including concept, design, implementation, production, operation, maintenance, and decommissioning. It offers guidance on hazard analysis, risk assessment, safety goals, safety mechanisms, and verification and validation processes to ensure that electronic systems function as intended and maintain safety even in the presence of faults or errors.

The ISO 26262 standard addresses impact to safety due to faults and failures. What about addressing factors such as cybersecurity? The soaring adoption of electronics in the automotive sector has led to a corresponding expansion in the cybersecurity threat landscape. As vehicles become more connected and reliant on software-driven functionality, the attack surface expands significantly. This convergence of technological advancement and risk underscores the critical importance of cybersecurity-aware development practices. Road vehicles rely heavily on communication between components and external systems, making them susceptible to various cyber risks. Over-the-Air (OTA) software updates dramatically increase cybersecurity risks. Hackers could potentially manipulate sensor data, compromise vehicle control systems, or gain unauthorized access to sensitive personal information. The ISO/SAE 21434 Road Vehicles—Cybersecurity Engineering standard was established to address the security challenges posed by cyberthreats to road vehicles.

Synopsys has recently published a whitepaper that delves into ISO 21434-driven best practices for cybersecurity-aware SoC development. Anyone involved in the development and post-production support of automotive-related products and systems would find this whitepaper very informative. Following are some excerpts.

Key Aspects of ISO 21434

The ISO 21434 standard provides a structured approach to identifying, assessing, and mitigating cybersecurity risks throughout the development of automotive products, including components like SoCs. This comprehensive framework builds upon similar principles of ISO 26262 to address the cybersecurity dimension. The alignment between these two standards not only streamlines the integration of cybersecurity practices but also establishes a common vocabulary, ensuring seamless adaptation for organizations already compliant with ISO 26262.

Organizational Responsibilities

ISO 21434 follows in the footsteps of ISO 26262 by delineating roles and responsibilities across various stages of product development. This includes the commitment of executive management, the establishment of standardized roles between suppliers and supply chain entities, the creation of distinct phases within the product life cycle, and the formulation of Threat Analysis and Risk Assessment (TARA) processes equivalent to Hazard Analysis and Risk Assessment (HARA) in ISO 26262.

Cybersecurity Risk Assessment and Management

Cybersecurity hinges on a thorough assessment of a product’s inherent risks and its vulnerabilities when deployed. Four critical factors govern the severity of a cybersecurity risk, enabling an informed approach to risk mitigation. These four key factors are the Threat Scenario, Impact, Attack Vector, and Attack Feasibility. Together, these factors determine the potential harm, enabling a structured evaluation of the risk’s impact and the need for intervention. In essence, the Threat Scenario and its Impact gauge potential damage, the Attack Vector factor maps how an attack could be executed, while the Feasibility factor evaluates the ease of enacting the attack. ISO 21434 offers techniques for calculating the risk score from these four factors and elucidates a structured approach for fostering a proactive stance against cyberattacks.
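To make the risk calculation concrete, here is a minimal Python sketch of one matrix-style combination of impact and attack feasibility. The rating scales, matrix values, and the example threat scenario are illustrative assumptions for this post, not the normative tables in the standard or in any particular organization's TARA method.

  # Illustrative ISO 21434-style risk determination (assumed scales and matrix,
  # not the standard's normative tables).
  IMPACT = {"negligible": 1, "moderate": 2, "major": 3, "severe": 4}
  FEASIBILITY = {"very_low": 1, "low": 2, "medium": 3, "high": 4}

  def risk_value(impact: str, feasibility: str) -> int:
      """Combine an impact rating and an attack-feasibility rating into a 1-5 risk value."""
      i, f = IMPACT[impact], FEASIBILITY[feasibility]
      return min(5, max(1, i + f - 3))   # illustrative matrix: risk grows with both factors

  # Hypothetical threat scenario: a spoofed OTA update server.
  print(risk_value("severe", "medium"))      # 4 -> drives a cybersecurity goal and controls
  print(risk_value("moderate", "very_low"))  # 1 -> may be accepted or shared in risk treatment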

Security by Design

The Secure Development Lifecycle (SDL) process championed by Microsoft to address cybersecurity permeates all facets of product development. SDL orchestrates a number of measures during the design phase to safeguard products against potential vulnerabilities. At the heart of this phase lies the mandate to generate concrete evidence affirming the integration of the secure practices the team has been trained in. This evidence encompasses a spectrum of reviews and metrics, from security design reviews and verification plan assessments to privacy design reviews. Tools such as Synopsys Coverity and Black Duck play pivotal roles, generating code coverage and composition analysis reports. These reports help gauge the codebase's maturity while flagging vulnerabilities in third-party components.

Collaboration and Communication

In the interconnected world of today’s product development, cybersecurity cannot operate in isolation. A collaborative approach is imperative, demanding a cohesive and cybersecurity-aware handshake between every link in the supply chain. The collaborative mindset guides the development cycles, necessitating an ongoing flow of cybersecurity information among supply chain entities.

Cybersecurity agreement in supply chain

Continuous Monitoring and Updating

Continuously monitoring products to identify known vulnerabilities, and updating them as needed, ensures cybersecurity from the product's release to its decommissioning. Post-release support is a focal point in SDL's continuum. It mandates the specification of requirements for post-production security controls. This meticulous preparation equips the product to navigate the complexities of its operational environment and supply chain.

Summary

Given the surge in electronics adoption in road vehicles and the evolving landscape of cyberattack threats, customers are demanding cybersecurity assurances. Cybersecurity impacts every level of the automotive supply chain, starting with semiconductor SoCs. For component suppliers, embracing standardized cybersecurity principles and processes becomes a strategic imperative to remain competitive in the dynamic automotive market. By adhering to these evolving industry standards, suppliers can not only address the growing cybersecurity concerns but also cater to the mounting customer expectations for robust cybersecurity assurance.

During development of complex SoCs, partnering with an IP supplier with a structured ISO 21434 development platform minimizes cybersecurity risks and ensures the highest levels of success. Synopsys develops IP products per the ISO 21434 standard and rigorously follows the cybersecurity policies, processes and procedures promulgated in the standard. The company deploys cybersecurity teams through all levels of the organization.

Cybersecurity teams through all levels of an organization

For more details, visit Accelerate Your Automotive Innovation page.

You can access the entire whitepaper here.

Also Read:

Key MAC Considerations for the Road to 1.6T Ethernet Success

AMD Puts Synopsys AI Verification Tools to the Test

WEBINAR: Why Rigorous Testing is So Important for PCI Express 6.0


Anomaly Detection Through ML. Innovation in Verification
by Bernard Murphy on 08-31-2023 at 6:00 am

Assertion based verification only catches problems for which you have written assertions. Is there a complementary approach to find problems you haven’t considered – the unknown unknowns? Paul Cunningham (Senior VP/GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month's pick is Machine Learning-based Anomaly Detection for Post-silicon Bug Diagnosis. The paper was published at the 2013 DATE conference. The authors are/were from the University of Michigan.

Anomaly detection methods are popular where you can’t pre-characterize what you are looking for, in credit card fraud for example or in real-time security where hacks continue to evolve. The method gathers behaviors over a trial period, manually screened to be considered within expected behavior, then looks for outliers in ongoing testing as potential problems for closer review.

Anomaly detection techniques either use statistical analyses or machine learning. This paper uses machine learning to build a model of expected behavior. You could also easily imagine this analysis being shifted left into pre-silicon verification.

Paul’s view

This month we've pulled a paper from 10 years ago on using machine learning to try and automatically root cause bugs in post-silicon validation. It's a fun read and looks like a great fit for revisiting now using DNNs or LLMs.

The authors equate root-causing post-silicon bugs to credit card fraud detection: every signal traced in every clock cycle can be thought of as a credit card transaction, and the problem of root causing a bug becomes analogous to identifying a fraudulent credit card transaction.

The authors’ approach goes as follows: divide up simulations into time slices and track the percent of time each post-silicon traced debug signal is high in each time slice. Then partition the signals based on the module hierarchy, aiming for a module size of around 500 signals. For each module in each time slice train a model of the “expected” distribution of signal %high times using a golden set of bug free post-silicon traces. This model is a very simple k-means clustering of the signals using difference in %high times as the “distance” between two signals.

For each failing post-silicon test, the %high signal distribution for each module in each time slice is compared to the golden model, and the number of signals whose %high time is outside the bounding box of its golden model cluster is counted. If this number is over a noise threshold, then those signals in that time slice are flagged as the root cause of the failure.
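As a rough illustration, here is a minimal Python sketch of that flow, assuming per-cycle 0/1 signal traces are available as NumPy arrays. The module partitioning is omitted, and the cluster count, margin, and noise threshold are placeholders, not the authors' settings.

  import numpy as np
  from sklearn.cluster import KMeans

  def percent_high(trace, n_slices):
      """trace: (n_cycles, n_signals) array of 0/1 samples -> (n_slices, n_signals) fraction high."""
      return np.vstack([s.mean(axis=0) for s in np.array_split(trace, n_slices, axis=0)])

  def train_golden(golden_traces, n_slices, k=8):
      """Per time slice, cluster signals by their average %high over bug-free runs and
      record each signal's cluster id plus each cluster's min/max bounding box."""
      mean_high = np.mean([percent_high(t, n_slices) for t in golden_traces], axis=0)
      model = []
      for s in range(n_slices):
          x = mean_high[s].reshape(-1, 1)                  # one feature per signal: %high time
          km = KMeans(n_clusters=k, n_init=10).fit(x)
          bounds = [(x[km.labels_ == c].min(), x[km.labels_ == c].max()) for c in range(k)]
          model.append((km.labels_, bounds))
      return model

  def flag_anomalies(fail_trace, model, n_slices, margin=0.05, noise=5):
      """Flag (time slice, signals) where enough signals leave their golden cluster's bounding box."""
      f = percent_high(fail_trace, n_slices)
      suspects = {}
      for s, (labels, bounds) in enumerate(model):
          outliers = [i for i, v in enumerate(f[s])
                      if not (bounds[labels[i]][0] - margin <= v <= bounds[labels[i]][1] + margin)]
          if len(outliers) > noise:                        # noise threshold
              suspects[s] = outliers                       # candidate root-cause signals
      return suspects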

It’s a cool idea but on the ten OpenSPARC testcases benchmarked, 30% of the tests do not report the correct time slice or signals, which is way too high to be of any practical use. I would love to see what would happen if a modern LLM or DNN was used instead of simple k-means clustering.

Raúl’s view

This is an "early" paper from 2013 using machine learning for post-silicon bug detection. For its time this must have been advanced work; it is listed with 62 citations in Google Scholar.

The idea is straightforward: run a test many times on a post-silicon design and record the results. When intermittent bugs occur, different executions of the same test yield different results, some passing and some failing. Intermittent failures, often due to on-chip asynchronous events and electrical effects, are among the most difficult to diagnose. The authors briefly consider supervised learning, in particular one-class learning (only positive training data is available, since bugs are rare), but discard it as "not a good match for the application of bug finding". Instead, they apply k-means clustering: similar results are grouped into k clusters of "close" results, minimizing the sum-of-squares distance within clusters. The paper provides numerous technical details needed to reproduce the results:

  • Results are recorded as the "fraction of time the signal's value was one during the time step".
  • The number of signals in a design, on the order of 10,000, is the dimensionality of the k-means problem, which is NP-hard with respect to the number of dimensions, so the number of signals is capped at 500 using principal component analysis.
  • The number of clusters can be neither too small (underfitting) nor too large (overfitting).
  • A proper anomaly detection threshold needs to be picked, expressed as the percentage of the total failing examples under consideration.
  • Time localization of a bug is achieved by two-step anomaly detection: first identifying which time step presents a sufficient number of anomalies to reveal the occurrence of a bug, and then, in a second round, identifying the responsible bug signals.

Experiments for an OpenSPARC T2 design of about 500M transistors ran 10 workloads, with test lengths ranging between 60,000 and 1.2 million cycles, 100 times each as training. Then they injected 10 errors and ran 1000 buggy tests. On average 347 signals were flagged per bug (ranging from none to 1000), and it took ~350 cycles of latency from bug injection to bug detection. The number of clusters and the detection threshold strongly influence the results, as does the quantity of training data. False positives and false negatives added up to 30-40 (in 1000 buggy tests).

Even though the authors observe that "Overall, among the 41,743 signals in the OpenSPARC T2 top-level, the anomaly detection algorithm identified 347, averaged over the bugs. This represents 0.8% of the total signals. Thus, our approach is able to reduce the pool of signals by 99.2%", in practice this may not be of great help to an experienced designer. Ten years have passed; it would be interesting to repeat this work using today's machine learning capabilities, for example LLMs for anomaly detection.


RISC-V 64 bit IP for High Performance
by Daniel Payne on 08-30-2023 at 10:00 am

RISC-V as an Instruction Set Architecture (ISA) has grown quickly in commercial importance and relevance since its release to the open community in 2015, attracting many IP vendors that now provide a variety of RTL cores. Roger Espasa, CEO and Founder of Semidynamics, has presented at RISC-V events on how their IP is customized for compute challenges that require high-bandwidth, high-performance cores with vector units. Semidynamics was founded in 2016, is headquartered in Barcelona, and already has customers in the US and Asia. It offers two customizable RISC-V IP cores:

  • Avispado – in-order RISCV64GCV, supporting AXI and CHI
  • Atrevido – out-of-order RISCV64GC, supporting AXI and CHI

A typical CPU has a handful of big cores and large caches, making them easy to program, though not high performance.

GPUs, by contrast, have many tiny cores that provide high performance for parallel code, but are harder to program and add communication latency through the PCIe bus when data needs to be passed back and forth between the CPU and the GPU.

CPU, GPU comparison

The approach at Semidynamics is to connect a RISC-V core to vector compute cores, which makes the result easy to program, delivers higher performance for parallel code, and offers zero communication latency. A CPU plus a vector unit provides the best of both worlds.

CPU plus Vector unit

The RISC-V vector specification defines 32 vector registers, and you can add a number of vector cores, along with a connection to your cache, inside a vector unit.

Vector Unit

With Semidynamics IP you can customize the number of Vector Cores: 4, 8, 16, or 32. Another way to look at this is that 4 Vector Cores correspond to a 256-bit datapath, scaling up to 32 Vector Cores for a 2,048-bit datapath.

IP users also choose which data types: FP64, FP32, FP16, BF16, INT64, INT32, INT16, INT8. For an AI application they may choose data types of FP16, BF16, while an HPC application could select FP64, FP32.

The third customization is the Vector Register Length, where for more performance and lower power you can make the vector register bigger than the vector unit.
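Some back-of-the-envelope arithmetic ties these options together. Assuming each vector core contributes a 64-bit datapath, which is what the 4-core/256-bit and 32-core/2,048-bit figures imply, the elements processed per cycle for each data type follow directly (a rough sketch, not vendor data):

  ELEMENT_BITS = {"FP64": 64, "FP32": 32, "FP16": 16, "BF16": 16,
                  "INT64": 64, "INT32": 32, "INT16": 16, "INT8": 8}

  def datapath_bits(vector_cores):
      return vector_cores * 64              # assumes 64 bits per vector core

  def elements_per_cycle(vector_cores, dtype):
      return datapath_bits(vector_cores) // ELEMENT_BITS[dtype]

  for cores in (4, 8, 16, 32):
      print(cores, datapath_bits(cores), elements_per_cycle(cores, "BF16"))
  # 4 cores -> 256-bit, 16 BF16 elements/cycle ... 32 cores -> 2,048-bit, 128 BF16 elements/cycle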

Here's the block diagram of the Atrevido 423-V8:

Atrevido 423 + V8 Vector Unit

The vector unit is fully out of order, which is unique among RISC-V IP vendors. The combination of the vector unit plus the Gazzillion unit is capable of streaming data at over 60 bytes/cycle.

High Bandwidth: Vector + Gazzillion

The purple line shows the read performance: in the L1 cache it is 20-60 bytes/cycle. Other machines show a rapid drop in bandwidth after leaving the L1 cache, while this approach keeps going, flattening at about 56 bytes/cycle. Even going to DDR memory sustains a bandwidth of 40 bytes/cycle; with a clock rate of 1.0 GHz that makes 40 GB/s of bandwidth.

IP customers can even add their own RTL code connected to the Vector Unit for their own purposes.

Performance of matrix multiplication is important in AI workloads, and on the OOO V8 Vector Unit there's a peak of 16 FP64 FLOPS/cycle, with 99% of peak reached for a matrix size >= 400. For a small matrix size of 24×24 the performance is 7 FP64 FLOPS/cycle, or roughly 50% of peak. Matrix multiplication for FP16 using a Vector Unit with 8 vector cores has a peak of 64 FP16 FLOPS/cycle, and 99% of peak for M >= 600.
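Those peak numbers are consistent with the lane arithmetic above if a fused multiply-add is counted as two FLOPs per lane per cycle; here is a quick sanity check under that assumption:

  def peak_flops_per_cycle(datapath_bits, element_bits, flops_per_lane=2):
      return (datapath_bits // element_bits) * flops_per_lane   # 2 = fused multiply-add

  print(peak_flops_per_cycle(512, 64))   # V8 (8 cores x 64-bit): 8 FP64 lanes x 2 = 16 FLOPS/cycle
  print(peak_flops_per_cycle(512, 16))   # 32 FP16 lanes x 2 = 64 FLOPS/cycle
  # At the 1.0 GHz example clock, 16 FP64 FLOPS/cycle corresponds to 16 GFLOPS FP64 peak.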

A real-time object detection benchmark called YOLO (You Only Look Once) was run on the Atrevido 423-V8 platform, and it showed 58% higher performance per vector core than competitors. These results were for video using a 24-layer network, at 5.56 Gops/frame and about 9M parameters.

YOLO Comparison

Summary

Choosing a RISC-V IP vendor is a complicated task, so knowing about vendors like Semidynamics can help you better understand how a customized approach could most efficiently run your specific workloads. With Semidynamics you get to choose between architectural choices like in-order or out-of-order, with or without vector units. The reported numbers from this IP vendor look promising, and I look forward to their future announcements.


Also Read:

Deeper RISC-V pipeline plows through vector-scalar loops

RISC-V Summit Buzz – Semidynamics Founder and CEO Roger Espasa Introduces Extreme Customization

Configurable RISC-V core sidesteps cache misses with 128 fetches


Modeling EUV Stochastic Defects with Secondary Electron Blur
by Fred Chen on 08-30-2023 at 8:00 am


Extreme ultraviolet (EUV) lithography is often represented as benefiting from the 13.5 nm wavelength (actually a range of wavelengths, mostly ~13.2-13.8 nm), when in fact it works through the action of secondary electrons: electrons released by photoelectrons, which are themselves released through ionization by absorbed EUV (~90-94 eV) photons. The photons are not only absorbed in the photoresist film but also in the layers underneath. The released electrons migrate varying distances from the point of absorption, losing energy in the process.

These migration distances can go over 10 nm [1-2]. Consequently, images formed by EUV lithography are subject to an effect known as blur. Blur can be most basically understood as the reduction of the difference between the minimum and maximum chemical response of the photoresist. Blur is often modeled through a Gaussian function convolved with the original optical image [3-4].
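In its simplest one-dimensional form this is written as a convolution of the aerial image I(x) with a normalized Gaussian kernel of width sigma (a standard textbook formulation, not tied to any particular simulator):

  I_{\text{blur}}(x) = (I \ast G_{\sigma})(x) = \int_{-\infty}^{\infty} I(x')\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x - x')^{2}}{2\sigma^{2}}\right) dx'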

In such modeling, however, it is often neglected that the blur scale length, often referred to as sigma, is not a fundamentally fixed number but belongs to a distribution [5]. This is consistent with the observation that a higher EUV dose leads to a larger observed blur [2,5]: more electrons released allows a larger range of distances traveled [2,6]. Note that pure chemical blur from diffusion does not have the same dose dependence [3,7].

It was recently demonstrated that secondary electron blur increasing with dose can lead to the observed stochastic defects in EUV lithography [8]. The higher dose leads to a wider allowed range of blur.

Local base blur range at different doses, taken at different probabilities from the base blur probability distribution.

The simulation model combines three stages of random number generation: (1) photon absorption, (2) secondary electron yield, and (3) electron dose-dependent blur range. Unexposed stochastic defects are dominant at low doses where there are too few photons absorbed. Exposed stochastic defects are dominant at higher doses where the rare (e.g., probability ~ 1e-8) ultrahigh (>10 nm) blur promotes too much secondary electron exposure near the threshold value for printing.
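A toy Monte Carlo sketch of those three stages for a single pixel is shown below. The absorbed-photon density, electron yield, blur-versus-dose relationship, and printing threshold are illustrative placeholders, not the calibrated values behind the reference.

  import numpy as np

  rng = np.random.default_rng(0)

  def pixel_exposure(rel_dose, n_trials=100_000):
      """Toy Monte Carlo of the three random stages for one pixel (illustrative numbers only):
      (1) Poisson photon absorption scaled with dose, (2) Poisson secondary-electron yield per
      photon, (3) a blur draw whose tail widens with dose, reducing the locally retained exposure."""
      photons = rng.poisson(20 * rel_dose, n_trials)          # stage 1: photon shot noise
      electrons = rng.poisson(3 * photons)                    # stage 2: secondary-electron yield
      blur = rng.gamma(shape=2.0, scale=1.0 + 0.5 * rel_dose, size=n_trials)  # stage 3
      return electrons * np.exp(-blur / 5.0)                  # exposure retained at the pixel

  def unexposed_defect_rate(rel_dose, threshold=20):
      """Probability that a pixel meant to print fails to reach the printing threshold."""
      return np.mean(pixel_exposure(rel_dose) < threshold)

  for dose in (0.6, 1.0, 1.4):
      print(dose, unexposed_defect_rate(dose))
  # In this toy parameterization the failure rate falls with dose but flattens rather than
  # vanishing, because the blur distribution's tail widens as dose increases.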

Higher blur makes it easier for smaller stochastic dose variations to cross the printing threshold, enabling exposed or unexposed defects.

One consequence of defects arising both from insufficient photon absorption at low dose and from dose-increased blur is the emergence of a floor, or valley, for stochastic defects, preventing them from ever being absent entirely.

At lower dose or exposed CD there tend to be unexposed defects, while at higher dose or exposed CD there tend to be exposed defects. This results in a floor or valley for stochastic defect occurrence.

Another way to interpret the defect floor or valley is that the enlarged blur range at low enough probability increases the entropy significantly and damages the image across all possible printing thresholds.

With the much larger blur range at low enough probabilities (1e-9 in this example), there is significant entropy in the image and the image is damaged regardless of printing threshold. At more commonly observed probabilities (e.g., 1e-1), the image preserves its usual appearance. Note: the raw pixel images were smoothed for better visualization.

It is therefore very risky not to include dose-dependent secondary electron blur ranges in any model of EUV lithography image or defect formation.

References

[1] I. Bespalov, “Key Role of Very Low Energy Electrons in Tin-Based Molecular Resists for Extreme Ultraviolet Nanolithography,” ACS Appl. Mater. Interfaces 12, 9881 (2020).

[2] S. Grzeskowiak et al., “Measuring Secondary Electron Blur,” Proc. SPIE 10960, 1096007 (2019).

[3] D. Van Steenwinckel et al., “Lithographic Importance of Acid Diffusion in Chemically Amplified Resists,” Proc. SPIE 5753, 269 (2005).

[4] T. Brunner et al., “Impact of resist blur on MEF, OPC, and CD control,” Proc. SPIE 5377, 141 (2004).

[5] A. Narasimhan et al., “Studying secondary electron behavior in EUV resists using experimentation and modeling,” Proc. SPIE 942, 942208 (2015).

[6] M. Kotera et al., "Extreme Ultraviolet Lithography Simulation by Tracing Photoelectron Trajectories in Resist," Jpn. J. Appl. Phys. 47, 4944 (2008).

[7] M. Yoshii et al., “Influence of resist blur on resolution of hyper-NA immersion lithography beyond 45-nm half-pitch,” J. Micro/Nanolith. MEMS MOEMS 8, 013003 (2009).

[8] F. Chen, “EUV Stochastic Defects from Secondary Electron Blur Increasing With Dose,” https://www.youtube.com/watch?v=Q169SHHRvXE, 8/20/2023.

This article first appeared in LinkedIn Pulse: Modeling EUV Stochastic Defects With Secondary Electron Blur

Also Read:

Enhanced Stochastic Imaging in High-NA EUV Lithography

Application-Specific Lithography: Via Separation for 5nm and Beyond

ASML Update SEMICON West 2023


Arm Inches Up the Infrastructure Value Chain
by Bernard Murphy on 08-30-2023 at 6:00 am

Arm just revealed at HotChips their compute subsystems (CSS) direction, led by CSS N2. The intent behind CSS is to provide pre-integrated, optimized and validated subsystems to accelerate time to market for infrastructure system builders. Think HPC servers, wireless infrastructure, big edge systems for industry, city, and enterprise automation. This for me answers how Arm can add more value to system developers without becoming a chip company. They know their technology better than anyone else; by providing pre-designed, optimized and validated subsystems – cores, coherent interconnect, interrupt, memory management and I/O interfaces, together with SystemReady validation – they can chop a big chunk out of the total system development cycle.

Accelerating Custom Silicon

A completely custom design around core, interconnect, and other IPs obviously provides maximum flexibility and ability to differentiate but at a cost. That cost isn’t only in development but also in time to deployment. Time is becoming a very critical factor in fast moving markets – just look at AI and the changes it is driving in hyperscaler datacenters. I have to believe current economic uncertainties compound these concerns.

Those pressures are likely forcing an emphasis on differentiating only where essential and standardizing everywhere else, especially when proven experts can take care of a big core component. CSS provides a very standard yet configurable subsystem for many-core compute, including N2 cores (in this case), the coherent mesh network between those cores, together with interrupt and memory management, cache hierarchy, chiplet support through UCIe or custom interfaces, a DDR5/LPDDR5 external memory interface, PCIe/CXL Gen5 for fast and/or coherent I/O, expansion I/O, and system management.

All of this is PPA-optimized for an advanced TSMC 5nm process and proven SystemReady® with a reference software stack. The system developer still has plenty of scope for differentiation through added accelerators, specialized compute, their own power management, etc.

Neoverse V2

Arm also announced a next step in the Neoverse V-series, unsurprisingly improved over V1 with better integer performance and a reduction in system-level cache misses. There are gains on a variety of other benchmarks as well.

Also noteworthy is its performance in the NVIDIA Grace-Hopper combo (based on Neoverse V2). NVIDIA shared real hardware data with Arm on performance versus Intel Sapphire Rapids and AMD Genoa. In raw performance the Grace CPU was mostly on par with AMD and generally faster than Sapphire Rapids by 30-40%.

Most striking for me was their calculation for a datacenter limited to 5MW, important because all datacenters are ultimately power limited. In this case Grace bested AMD in performance by between 70% and 150% and was far ahead of Intel.

Net value

First on Neoverse’s contribution to Grace-Hopper – wow. That system is at the center of the tech universe right now, thanks to AI in general and large language models in particular. This is an incredible reference. Second, while I’m sure that Intel and AMD can deliver better peak performance than Arm-based systems, and Grace-Hopper workloads are somewhat specialized, (a) most workloads don’t need high end performance and (b) AI is getting into everything now. It is becoming increasingly difficult to make a case that, for cost and sustainability over a complete datacenter, Arm-based systems shouldn’t play a much bigger role especially as expense budgets tighten.

For CSS-N2, based on their own analysis Arm estimates up to 80 engineering years of effort required to develop the CSS N2 level of integration, a number that existing customers confirm is in the right ballpark. In an engineer-constrained environment, this is 80 engineering years they can drop from their program cost and schedule without compromising whatever secret differentiation they want to add around the compute core.

These look like very logical next steps for Arm in their Neoverse product line: faster performance in the V-series, and letting customers take advantage of Arm's own experience and expertise in building N2-based compute systems, while leaving open lots of room for adding their own special sauce. You can read the press release HERE.


Visit with Easy-Logic at #60DAC
by Daniel Payne on 08-29-2023 at 10:00 am


I had read a little about Easy-Logic before #60DAC, so this meeting on Wednesday in Moscone West was my first in-person meeting with Jimmy Chen and Kager Tsai to learn about their EDA tools and where they fit into the overall IC design flow. A Functional Engineering Change Order (ECO) is a way to revise an IC design by updating the smallest portion of the circuit, avoiding a complete re-design. An ECO can happen quite late in the design stage, causing project delays or even failures, so minimizing this risk and reducing the time for an ECO is an important goal, one that Easy-Logic has productized in a tool called EasylogicECO.

Easy-Logic at #60DAC

This EDA tool flow diagram shows each place where EasylogicECO fits in with logic synthesis, DFT, low power insertion, Place & Route, IC layout and tape-out.

EasylogicECO tool flow

Let's say your engineering team is coding RTL and finds a bug late in the design cycle. They could make an RTL change and then use the EasylogicECO tool to compare the differences between the two RTL versions and implement the ECO changes, where the output is an ECO netlist plus the commands to control the Place & Route tools from Cadence or Synopsys.

Another usage example for EasylogicECO is post tape-out where a bug is found or the spec changes, and then you want to do a metal-only ECO change in order to keep mask costs lower.

Easy-Logic is a 10-year-old company, based in Hong Kong, and their EasylogicECO tool came out about 5-6 years ago. Most of their customers are in Asia and the names have been kept private, although there are quotes from several companies, like: Sitronix, Phytium, Chipone, Loongson Technology, ASPEED and Erisedtek. Users have designed products for cell phone, HPC, networking, AI, server, and other high-end segments.

EasylogicECO is being used mostly on the advanced nodes, such as 7nm and 10nm, where design sizes can be 5 million instances per block, and functional ECOs are used at the module and block levels. Their tool isn’t really replacing other EDA tools, rather it fits neatly into existing EDA tool flows as shown above. Both Unix and Linux boxes run EasylogicECO, and the run times really depend on the complexity of the design changes. With a traditional methodology it could take 5 days to update a block with 5 million instances, but now with the Easy-Logic approach it can take only 12 hours. This methodology aims to make the smallest patch in the shortest amount of time.

Easy-Logic works at the RTL level. After logic synthesis you basically lose the design hierarchy, which makes it hard to do an ECO. Patents have been issued for the unique approach that EasylogicECO takes by staying at the RTL level.

Engineering teams can evaluate this approach from Easy-Logic within a day or two. They've made the tool quite easy to use, so there's a quick learning curve, as your inputs are just the original RTL, the revised RTL, the original netlist, the synthesized netlist of the revised RTL, and a library.

With 50 people in the company, you can contact an office in Hong Kong, San Jose, Beijing or Taiwan. 2023 was the first year at DAC for the company. Engineers can use this new ECO approach in four use cases:

  • Functional ECO
  • Low power ECO
  • Scan chain ECO
  • Metal ECO

Summary

SoC design is a very challenging approach to product development where time is money, and making last-minute changes like ECOs can make or break the success of a project. Easy-Logic has created a methodology to drastically shorten the time it takes for an ECO, while staying at the RTL level. I expect to see high interest in their EasylogicECO tool this year, and more customer success stories by next DAC in 2024.


Key MAC Considerations for the Road to 1.6T Ethernet Success
by Kalar Rajendiran on 08-29-2023 at 6:00 am

The World of Ethernet is Gigantic and Growing

Ethernet's continual adaptation to meet the demands of a data-rich, interconnected world can be credited to the two axes along which its evolution has been propelled. The first axis emphasizes Ethernet's role in enabling precise and reliable control over interconnected systems. As industries embrace automation and IoT, Ethernet facilitates real-time monitoring, seamless communication, and deterministic behavior, fostering a new era of industrial and infrastructure advancements. The second axis underscores Ethernet's capacity to handle the burgeoning volumes of data generated by modern applications. From cloud computing to AI-driven analytics, Ethernet serves as the backbone for data movement, storage, and deep analysis, accelerating insights and innovation across diverse domains. The next speed milestone in Ethernet's evolution is 1.6T, and this transformative leap requires a meticulous approach to meet the requirements along both of the above axes.

The advent of 1.6T Ethernet heralds a new era of connectivity, one where data-intensive applications will seamlessly coexist with latency-sensitive demands. Through the convergence of 224G SerDes technology, flexible and configurable MAC and PCS IP developments, and optimized silicon architectures, the networking industry can deliver solutions that not only meet but exceed the requirements of 1.6T ethernet systems. This is the context of a Synopsys-sponsored webinar where Jon Ames and John Swanson spotlighted the focus areas of design for achieving efficiency and delivering performance.

Key Considerations for 1.6T Ethernet Success

At the heart of the Ethernet subsystem are the application and transmit/receive (Tx/Rx) queues. Application queues handle data coming from applications and services running on network-connected devices. These queues manage the flow of data into the Ethernet subsystem for transmission. The Tx/Rx queues manage the movement of packets between the Media Access Control (MAC) layer and the PHY layer for transmission and reception, respectively. Efficient queue management ensures optimal data flow and minimizes latency. Scalability, flexibility, efficient packet handling, streamlined error handling, low latency, support for emerging protocols, energy efficiency, forward error correction (FEC) optimization, security and data integrity, interoperability and compliance are all key considerations in an Ethernet subsystem.

The MAC layer is responsible for frame formatting, addressing, error handling, and flow control. It manages the transmission and reception of Ethernet frames and interacts with the PHY layer to control frame transmission timings. Timing considerations are crucial to ensure proper communication between the PHY and MAC layers, especially at high speeds.
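As a simple illustration of the MAC's framing responsibility, the sketch below builds a minimal Ethernet frame in Python: addressing, the EtherType field, padding to the 46-byte minimum payload, and a CRC-32 frame check sequence. It is a conceptual sketch of the basic 802.3 frame format only, not a representation of the Synopsys MAC IP; the FCS byte order shown follows the usual reflected CRC-32 convention.

  import zlib

  def build_ethernet_frame(dst_mac, src_mac, ethertype, payload):
      """Minimal 802.3 MAC framing: addresses, type, padding to 46-byte minimum payload, CRC-32 FCS."""
      assert len(dst_mac) == 6 and len(src_mac) == 6
      padded = payload + b"\x00" * max(0, 46 - len(payload))     # pad short payloads
      header = dst_mac + src_mac + ethertype.to_bytes(2, "big")
      fcs = zlib.crc32(header + padded).to_bytes(4, "little")    # 802.3 FCS is a CRC-32
      return header + padded + fcs                               # preamble/SFD are added below the MAC

  frame = build_ethernet_frame(bytes.fromhex("ffffffffffff"),    # broadcast destination
                               bytes.fromhex("020000000001"),    # locally administered source
                               0x0800,                           # IPv4 EtherType
                               b"hello")
  print(len(frame), frame.hex())                                 # 64 bytes: the minimum frame size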

The Physical Coding Sublayer (PCS) is responsible for encoding and decoding data for transmission and reception. It interfaces between the MAC layer and the PMA/PMD layer. The PCS manages functions like data scrambling, error detection, and link synchronization. It prepares data from the MAC layer for transmission through the PMA/PMD layer.

The PMA (Physical Medium Attachment), PMD (Physical Medium Dependent), and PHY (Physical Layer) collectively handle the physical transmission of data over the network medium, be it copper cables or optical fibers. The PMA/PMD layer performs functions like clock and data recovery, signal conditioning, and modulation. The PHY layer manages signal transmission, equalization, and error correction to ensure reliable data transfer at high speeds.

The synergy between cutting-edge 224G SerDes technology and the development of innovative MAC and PCS IP is poised to redefine the accessibility and scalability of 1.6T Ethernet. These components play a pivotal role in the realization of off-the-shelf solutions that seamlessly align with forthcoming 1.6T Ethernet standards. The 224G SerDes technology offers the crucial physical layer connectivity required to sustain the high data rates demanded by 1.6T Ethernet. Achieving successful communication at high data rates requires close coordination between the PHY and MAC layers, accurate timing synchronization, and the implementation of effective error correction techniques. These factors will collectively contribute to the reliability, efficiency, and performance of 1.6T Ethernet networks.
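The lane arithmetic shows why 224G matters. The figures below are illustrative assumptions rather than numbers from the webinar, but they show the headroom a 224G-class SerDes leaves for FEC and coding overhead when a 1.6 Tb/s MAC rate is spread over eight electrical lanes:

  mac_rate_gbps = 1600
  lanes = 8
  payload_per_lane = mac_rate_gbps / lanes       # 200 Gb/s of payload per lane
  serdes_class = 224
  print(payload_per_lane, round(serdes_class / payload_per_lane, 2))   # 200.0 Gb/s, 1.12x headroom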

Synopsys Solutions

Synopsys MAC, PCS, and 224G SerDes IP solutions come with pre-verified and optimized designs. This means that the IP has already undergone rigorous testing and validation, reducing the need for extensive in-house verification efforts. This accelerates the development process by providing a reliable foundation to build upon. The IP solutions are designed to comply with IEEE 802.3 Ethernet standards and ensure interoperability and compatibility with a wide range of devices and network configurations. Designers can rely on the IP’s adherence to these standards, saving time that would otherwise be spent on custom protocol implementation. The solutions often come with configurability options. This enables designers to tailor the IP to their specific application requirements without having to build everything from scratch. This configurability streamlines the design process and reduces the need for extensive manual modifications.

Summary

As the race toward 1.6T Ethernet intensifies, the development of silicon solutions capable of delivering optimized power efficiency and minimal silicon footprint becomes paramount. To harness the capabilities of 1.6T Ethernet without compromising on energy consumption and design complexity, engineers must craft architectures that seamlessly merge efficiency with innovation. This involves meticulous digital design, ensuring that the intricate interaction between hardware components and software layers is harmonious, thereby producing networking solutions that are both efficient and robust and help accelerate first pass silicon success.

For more details, visit the Synopsys Ethernet IP Solutions page.

You can watch the entire webinar on-demand from here.

Also Read:

WEBINAR: Why Rigorous Testing is So Important for PCI Express 6.0

Next-Gen AI Engine for Intelligent Vision Applications

VC Formal Enabled QED Proofs on a RISC-V Core


Systematic RISC-V architecture analysis and optimization
by Don Dingee on 08-28-2023 at 10:00 am


The RISC-V movement has taken off so quickly because of the wide range of choices it offers designers. However, massive flexibility creates its own challenges. One is how to analyze, optimize, and verify an unproven RISC-V core design with potential microarchitecture changes allowed within the bounds of the specification. S2C, best known for its FPGA-based prototyping technology, gave an update at #60DAC on its emerging systematic RISC-V architecture analysis and optimization strategy, adding modeling and emulation capability.

Three phases to RISC-V architecture analysis

RISC-V differs from other processor architectures in how much customization is possible – from execution unit and pipeline configurations all the way to adding customized instructions. Developers are exploring the best fits of various RISC-V configurations in many applications, where some definitions are still ambiguous. EDA support has yet to catch up; basic tools exist, but few advanced modeling platforms are available.

These conditions leave teams with a problem: if they extend the RISC-V instruction set for their implementation, they must create new cycle-accurate models for those instructions before assessing performance, simulated or emulated. S2C is working to fill this void with a complete chain for systematic RISC-V architecture analysis and optimization featuring one familiar technology flanked by two others.

First in the chain is S2C's new RISC-V "core master" model abstraction platform, Genesis. It provides stochastic modeling, system architecture modeling, and cycle-accurate modeling, with increasing levels of accuracy as models add fidelity. Genesis allows the simulation of commercially available RISC-V cores as IP modules, then lets designers update parameters or add custom logic to the microarchitecture. These simulations enable earlier optimization of cores.

Holding the middle of the analysis chain is the S2C Prodigy prototyping family, facilitating FPGA-based prototypes for hardware logic debugging, basic performance assessment, and early software development. Prodigy prototyping hardware also accepts off-the-shelf I/O modules developed by S2C for stimulus and consumption of real-world signals around the periphery of the SoC, as well as RISC-V IP performance verification.

 

New emulation capability comes with S2C’s OmniArk hybrid emulation system, capable of hyper-scale verification of RISC-V SoCs. OmniArk specializes in compiling automotive SoCs and boasts powerful debugging capabilities for an efficient verification environment. It scales up to 1 billion gates for large designs and supports verification modes like QEMU, TBA, and ICE.

An example: collaboration on the XiangShan RISC-V core project

Accurate behavioral models of RISC-V cores carry through early modeling, FPGA-based prototyping, and hardware emulation processes. Giving designers better control of both IP and models enables tasks once only possible in hardware prototypes to shift into virtual analysis activities earlier in the design cycle, creating more opportunities for optimization.

An example of systematic RISC-V architecture analysis and optimization is in S2C’s collaboration with the XiangShan project team based at the Chinese Academy of Sciences. XiangShan is a superscalar, six-wide, out-of-order RISC-V implementation targeting a Linux variant for its operating system.

The XiangShan team used S2C products to create a core verification platform integrated with an external GPU and other peripherals. The hyperscale core is partitioned across an S2C FPGA-based prototyping platform, with peripherals added via PCIe and other interfaces.

“As RISC-V technology has penetrated various fields, its open-source, conciseness, and high scalability are redefining the future of computing,” says Ying J. Chen, Vice President at S2C. “S2C’s three major product lines can provide various solutions like software performance evaluation for microarchitecture analysis, system integration, and specification compliance testing based on RISC-V.”

We expect more details soon from S2C on how the systematic RISC-V architecture analysis and optimization chain comes together with upcoming US product announcements – for now, S2C's Chinese language site has some information on Genesis. More details on the XiangShan RISC-V project are available from tutorials given at ASPLOS'23.

Also Read:

Sirius Wireless Partners with S2C on Wi-Fi6/BT RF IP Verification System for Finer Chip Design

S2C Accelerates Development Timeline of Bluetooth LE Audio SoC

S2C Helps Client to Achieve High-Performance Secure GPU Chip Verification