
The Immensity of Software Development and the Challenges of Debugging Series (Part 2 of 4)
by Lauro Rizzatti on 09-25-2024 at 10:00 am


Part 2 of this 4-part series reviews the role of virtual prototypes as stand-alone tools and their use in hybrid emulation for early software validation, a practice known as the “shift-left” methodology. It assesses the differences among these approaches, focusing on their pros and cons.

Virtual Prototyping and Hybrid Emulation for Early Software Validation

Debugging a software stack for complex System-on-Chip (SoC) designs is a highly iterative process, often requiring the execution of trillions of verification cycles even before end-user applications are considered. For instance, booting Android can take anywhere from 20 seconds to one minute at 1 GHz, depending on the version and end-application, which equates to at least 20 billion cycles. Rebooting this mobile OS ten times a day consumes at least 200 billion cycles. When end-user workloads are factored in, the challenge intensifies, adding more trillions of cycles.
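The cycle arithmetic above is easy to sanity-check with a short sketch, using only the figures quoted in the text:

```python
# Back-of-envelope cycle budget for Android bring-up, using the figures
# quoted in the text (1 GHz clock, 20-second boot, ten reboots per day).
CLOCK_HZ = 1_000_000_000           # 1 GHz target clock

boot_seconds = 20                   # lower bound for an Android boot
cycles_per_boot = CLOCK_HZ * boot_seconds

reboots_per_day = 10
cycles_per_day = cycles_per_boot * reboots_per_day

print(f"cycles per boot: {cycles_per_boot:.1e}")  # 2.0e+10 (20 billion)
print(f"cycles per day:  {cycles_per_day:.1e}")   # 2.0e+11 (200 billion)
```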

Given the immensity of this task, an effective methodology is essential to support software development teams. Various electronic design automation (EDA) tools and methods are available to address the challenges in response to the exponentially growing demand for processing cycles as one progresses up the software stack.

Hardware Description Language (HDL) Simulators: Viable for Bare-metal Software Validation

At the bottom of the stack, traditional hardware description language (HDL) simulators can be used for bare-metal software validation. However, HDL simulators do not scale with the increasing complexity of hardware designs and the related software complexity. For example, simulating a 100-million-gate-equivalent design may advance at a rate of one cycle per second or slower, roughly six orders of magnitude below the throughput achievable with hardware-assisted verification engines. This restricts their deployment to block-level or small IP designs, even during the early stages of the verification cycle.
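To put the six-orders-of-magnitude gap into wall-clock terms, a quick sketch; the throughput figures are illustrative orders of magnitude, not benchmarks of any specific tool:

```python
# Time to execute a 20-billion-cycle Android boot on engines with very
# different throughput. Figures are illustrative orders of magnitude only.
CYCLES = 20_000_000_000
SECONDS_PER_DAY = 86_400

engines = {
    "HDL simulator (~1 cycle/s on a 100M-gate design)": 1,
    "hardware-assisted engine (~1 MHz effective)": 1_000_000,
}

for name, cycles_per_second in engines.items():
    days = CYCLES / cycles_per_second / SECONDS_PER_DAY
    print(f"{name}: {days:,.1f} days")
```

At one cycle per second, the boot would take centuries; at emulation speed, hours.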

Virtual-based and Hardware-assisted Platforms: Necessary for Early SW Bring-up on SoCs

To meet the demanding requirements of software validation on SoC designs, two primary approaches are prevalent:

  1. Virtual Prototypes: Extensively used for early software development, testing, and validation when Register Transfer Level (RTL) code is still under development.
  2. Hardware-Assisted Verification (HAV) Platforms: Valuable when the RTL is completed, or at least when the necessary hardware blocks and subsystems for software development are available.

Virtual prototypes and HAV platforms possess the processing power to execute billions of verification cycles within feasible timeframes, enabling multiple iterations per hour or, for very large designs, per day.

Each of these approaches serves different objectives at different stages of the design verification process, providing a balanced solution to the growing challenges in hardware and software development.

Virtual Prototyping: Effective Methodology for Early Software Validation

A virtual prototype is a fully functional software model, typically described in C/C++, that represents an electronic system under development. It can model anything from a single SoC to a multi-die system or an entire electronic system, such as a network of electronic control units (ECUs) in a vehicle. Unlike an RTL model of equivalent functionality, a virtual prototype is considerably slimmer and operates at much higher speeds, often approaching real-world performance. It includes a fast processor model (instruction-set simulator) along with memory and register-accurate peripheral models, all integrated into a binary-compatible representation of the entire system.

Virtual prototypes are ideal platforms for validating the entire, unmodified software stack from bare-metal software to firmware, device drivers, operating systems, middleware, and applications. Their key advantage lies in their early availability during the design cycle, preceding RTL finalization. This early availability enables software engineers to start development in parallel with, or even ahead of, hardware development.

In addition, virtual prototypes can be shared not only with the internal development team but also with technology partners and key customers early in the design process. This collaborative approach allows iterative feedback, enabling external stakeholders to contribute to fine-tuning the final product.

Ultimately, virtual prototypes enable software validation before RTL is available, streamline the development process, and accelerate tape-out, resulting in reduced design costs. (See Figure 1.)

Figure 1. Virtual Prototypes accelerate time-to-market with higher quality and fewer resources. Source: Synopsys
Virtual Prototyping Commercial Offerings

Today, several electronic design automation (EDA) companies offer virtual prototyping solutions that include libraries of components to address a broad range of design possibilities. The comprehensive nature of these libraries significantly decreases adoption costs for end users, as the modeling effort required for their specific virtual prototypes is substantially lower when most virtual models are readily available off-the-shelf. These solutions also feature robust support for design visualization and software debugging tools, enhancing the users’ ability to develop, analyze, debug, and optimize the software stacks driving their SoC designs effectively.

IP companies play a crucial role in the virtual prototyping ecosystem by supplying models of their IP. For early software validation, models of processor IPs and peripheral IPs are the most relevant.

Some of the leading providers of virtual prototypes include:

Arm

Arm offers a comprehensive catalog of Arm Fast Models (AFM), covering a wide range of processors and system IP. This comprises models for all Cortex processors, Neoverse processors, and various system IP components such as interconnects, memory management units, and peripherals.

RISC-V Virtual Models from IP providers

RISC-V instruction set simulators (ISS) for virtual platforms are available for various RISC-V variants, provided by research institutions and commercial processor IP vendors such as Synopsys, SiFive, Andes, or OpenHW Group. These processor models are primarily based on two technologies: Imperas[1] Fast Processor Models (ImperasFPM) and QEMU (Quick Emulator).

Synopsys RISC-V Models

Synopsys offers the most comprehensive IP model library, which includes Synopsys interface IP, ARC processor IP, and leading third-party embedded processor IP. Among them are Imperas Fast Processor Models (ImperasFPM) that cover the complete range of RISC-V standard extensions as well as vendor-specific extensions. Other processor models include those from Andes, Codasip, Imagination, Intel, lowRISC, Microsemi, MIPS, NSI-TEXE, OpenHW Group, SiFive, and Tenstorrent. In addition to their use in virtual platforms, these models are used in tools for design verification, as well as for analyzing and profiling RISC-V processors.

Quick Emulator (QEMU)

QEMU is an open-source framework used to create virtual prototypes of complete SoCs, comprising processors and their peripheral models. Open-source initiatives, such as those from Lenovo, are releasing processor models in QEMU for selected processors, such as the Arm Cortex-A series. Additionally, other processor companies are supporting the open-source ecosystem by enabling the creation of QEMU models for their RISC-V cores.

Virtualizer Development Kits (VDKs)

One of the critical barriers to deploying virtual prototypes is assembling and packaging them quickly for easy distribution. The Synopsys Virtualizer tool suite addresses this challenge by providing a large library of IP models, streamlining the creation of virtual prototypes. Teams can further extend and customize these prototypes with their own peripheral models, guided by the tool.

These prototypes can then be packaged as Virtualizer Development Kits (VDKs), which include a suite of efficient debug and analysis tools, as well as sample software. Notably, VDKs integrate seamlessly with leading embedded software debuggers, enabling software bring-up without requiring changes to existing development workflows. VDKs can represent systems ranging from a processor core and SoC to hardware boards and electronic systems, such as a network of electronic control units (ECUs) in a vehicle.

Additionally, Synopsys offers design services to help users fully leverage the capabilities of their VDKs, ensuring they maximize the potential of their virtual prototypes.

Hybrid Emulation: Bridging the Gap for Software Bring-up and Hardware Verification

The hierarchical design process of a SoC begins with a high-level description that assists the design team with multiple tasks. In the initial phase, the high-level description captures the design specifications and intent, enabling early validation of these tasks. Concurrently, it supports early software development through virtual prototyping.

As the design creation process progresses, the high-level description is decomposed into sub-systems and individual blocks, with details added through RTL descriptions. At this stage, co-simulating the high-level design description with RTL code slows the execution of virtual prototypes because of the increased complexity of the RTL.

The solution comes through Hybrid Emulation, which leverages the strengths of both virtual prototyping and hardware emulation, creating a powerful synergy that enhances the design verification process.

  • Virtual Prototyping excels at high-speed execution, allowing unmodified production software to run on a virtualized version of the target hardware, enabling early software bring-up.
  • Hardware Emulation achieves cycle-accurate and fast execution of the RTL blocks and sub-systems, providing a detailed and accurate environment for hardware verification.

While both techniques offer significant speed advantages when deployed in their intended targets, integrating them presents challenges at the interface level where communication between virtual prototypes and hardware emulation occurs. This interface can often become a bottleneck, reducing the overall performance of the hybrid system.

To overcome this, the industry has adopted Accellera’s SCE-MI (Standard Co-Emulation Modeling Interface) standard. This standard defines efficient protocols for data exchange between the virtual prototype and the emulated hardware, ensuring the reliable, high-speed communication necessary to optimize the hybrid emulation process.

Hybrid Emulation Enables the Shift-Left Verification Methodology

Hybrid emulation enables concurrent hardware verification and software validation by combining the fast performance of virtual machines with the accuracy of hardware emulation. The approach speeds up the entire verification and validation process and is often referred to as the “shift-left” methodology because it moves these tasks upstream on the project timeline. Yet, it comes with caveats.

In this hybrid environment, two separate runtime domains arise, each tailored to one of the two tools.

  • High-Performance Virtual Prototype: This domain can reach performance of hundreds of millions of instructions per second (MIPS). The fast environment allows tasks like booting a Linux OS in just a few minutes, compared to over an hour in traditional emulation.
  • Cycle-Accurate RTL Emulation: The RTL emulation domain runs at a few megahertz, equating to a few MIPS. Although slower, the emulator provides cycle-accurate hardware verification.

The performance disparity between the virtual machine environment and the RTL section can offer speed advantages, particularly for software-bound systems where software execution dictates overall performance. In such cases, the virtual machine can advance rapidly, significantly boosting the speed of software development and testing. Conversely, in hardware-bound systems, where the performance of the RTL code is the limiting factor, the overall speed improvement is minimal. Given that many software development tasks are software-bound, hybrid emulation proves to be highly beneficial for supporting software teams. See figure 2.
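The software-bound vs. hardware-bound argument can be captured with a simple Amdahl-style sketch; the MIPS figures below are illustrative placeholders drawn from the rough ranges mentioned above, not measured data:

```python
# Amdahl-style model of hybrid emulation: work splits between a fast virtual
# prototype domain and a slow cycle-accurate RTL emulation domain.
VP_MIPS = 300.0    # virtual prototype: hundreds of MIPS (illustrative)
RTL_MIPS = 3.0     # RTL emulation: a few MIPS (illustrative)

def hybrid_speedup(sw_fraction: float) -> float:
    """Speedup of hybrid emulation vs. running everything in the RTL domain."""
    time_hybrid = sw_fraction / VP_MIPS + (1.0 - sw_fraction) / RTL_MIPS
    time_rtl_only = 1.0 / RTL_MIPS
    return time_rtl_only / time_hybrid

# Software-bound workload: most instructions execute in the virtual prototype.
print(f"95% in VP: {hybrid_speedup(0.95):.1f}x")  # ~16.8x
# Hardware-bound workload: RTL execution dominates, so the gain is minimal.
print(f"20% in VP: {hybrid_speedup(0.20):.1f}x")  # ~1.2x
```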

Figure 2. Acceleration of software bring-up and hardware-assisted verification via Hybrid Emulation (Shift-left Methodology). Source: Synopsys
Conclusions

Virtual prototyping and hybrid emulation are essential tools for the efficient and effective development and debugging of the software stack in modern SoC designs. When combined, they form a strategy that embodies the concept of “Shift-Left,” a move away from traditional sequential hardware and software development.

By supporting a unified workflow, the “Shift-Left” methodology encourages close collaboration between hardware and software teams, enabling them to work in parallel, share critical insights, and complement each other’s efforts. This convergence speeds up the verification cycle, uncovers integration issues early on, enhances efficiency, and significantly expedites time-to-market for SoC development.

[1] Imperas was acquired by Synopsys in 2023.

Also Read

The Immensity of Software Development and the Challenges of Debugging (Part 1 of 4)


Safety Grading in DNNs. Innovation in Verification
by Bernard Murphy on 09-25-2024 at 6:00 am


How do you measure safety for a DNN? There is no obvious way to screen for a subset of safety-critical nodes in the systolic array at the heart of DNNs. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Toward Functional Safety of Systolic Array-Based Deep Learning Hardware Accelerators. This was published in IEEE Trans on VLSI Systems. The authors are from Intel and UT Dallas. The paper has 49 citations (and still climbing!).

Think FMEDA has safety compliance locked down? Not so in AI accelerators, where what constitutes a safety-critical error in the hardware cannot be decoupled from the model running on the hardware. Research here is quite new but already it is clear that the rulebook for faulting, measurement, and ultimately safety mitigation must be re-thought for this class of hardware.

There are multiple recent papers in this field, some of which we may review in later blogs. This paper is a start, taking one view of what errors mean in this context (misclassification) and what tests should be run (a subset of representative test images). On the second point, the authors provide important suggestions on how to trim the test set to a level amenable to repeated re-analysis in realistic design schedules.

Paul’s view

Intriguing paper this month: how do you check that an autonomous drive AI accelerator in a car is free of hardware faults? This can be done at manufacturing time with scan chains and test patterns, but what about faults occurring later during the lifetime of the car? One way is to have duplicate AI hardware (known in the industry as “dual lock step”) and continuously compare outputs to check they are the same. But of course this literally doubles the cost, and AI accelerators are not cheap. They also consume a lot of power, so dual lockstep drains the battery faster. Another option is built-in self-test (BIST) logic, which can be run each time the car is turned on. This is a good practical solution.

This paper proposes using a special set of test images that are carefully selected to be edge cases for the AI hardware to correctly classify. These test images can be run at power on and checked for correct classification, giving a BIST-like confidence but without needing the overhead of BIST logic. Unfortunately, the authors don’t give any direct comparisons between their method and commercial BIST approaches, but they do clearly explain their method and credibly argue that with only a very small number of test images, it is possible to detect almost all stuck-at faults in a 256×256 MAC AI accelerator.

The authors propose two methods to pick the edge case images. The second method is by far the best and is also easy to explain: the final layer of a neural network used to classify images has one neuron for each object type being classified (e.g. person, car, road sign, …). Each of these neurons outputs a numerical confidence score (0 to 1) that the image contains that object. The neuron with the highest confidence score wins. The authors sort all the training images by the max confidence score of all their neuron outputs and then use the n images with the lowest max confidence scores as their edge case BIST images. They present results on 3 different open-source image classification benchmarks. Injecting faults into the AI accelerator that cause it to have a 5% mis-classification rate, using 10 randomly selected images for BIST achieves only 25% fault coverage. Using their edge-case selection method to pick 10 images gives 100% fault coverage. Intuitive and effective result.
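The second selection method can be sketched in a few lines; the random scores below stand in for real final-layer classifier outputs, and `select_edge_cases` is a hypothetical helper name:

```python
import numpy as np

# Pick the n images whose winning neuron is least confident: these are the
# hardest-to-classify "edge cases" used as BIST-like test images.
def select_edge_cases(scores: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n images with the lowest maximum class confidence."""
    max_conf = scores.max(axis=1)     # winning neuron's score per image
    return np.argsort(max_conf)[:n]   # n lowest max-confidence images

rng = np.random.default_rng(0)
scores = rng.random((1000, 10))       # placeholder: 1000 images, 10 classes
bist_images = select_edge_cases(scores, 10)
print(bist_images.shape)              # (10,)
```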

Raúl’s view

In March, we blogged about SiFI-AI, which simulated transient faults in DNN accelerators using fast AI inference and cycle-accurate RTL simulation. The results showed a “low” error probability of 2-8%, confirming the resilience of DNNs, but not acceptable for Functional Safety (FuSa) in many applications. This month’s paper explores both transient and permanent faults in DNN accelerators to assess FuSa, aiming to create a small set of test vectors that cover all FuSa violations.

The configuration used consists of: 1) a deep neural network (DNN) featuring three fully connected hidden layers with architecture 784-256-256-256-10; 2) a systolic array accelerator, similar to Google’s Tensor Processing Unit (TPU), with a 256 x 256 Multiply-Accumulate (MAC) array of 24-bit units; and 3) three image-classification datasets, each with 60,000 training and 10,000 test images: MNIST, a benchmark of digit images; F-MNIST, a set of fashion images in 10 classes; and CIFAR-10, a set of images in 10 classes.
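As a quick sanity check on the network’s size, the parameter count implied by the 784-256-256-256-10 fully connected architecture (weights plus biases; a sketch, not a figure from the paper):

```python
# Fully connected layer sizes: input 784, three hidden layers of 256, output 10.
layers = [784, 256, 256, 256, 10]

# Each layer contributes (inputs x outputs) weights plus one bias per output.
params = sum(a * b + b for a, b in zip(layers, layers[1:]))
print(f"{params:,} parameters")  # 335,114 parameters
```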

The paper performs a FuSa assessment on inference of the fully trained DNN running on the systolic array, injecting various faults into the systolic array reused for all DNN layers. The error-free DNN shows a classification accuracy of 97.4%. Obvious and not so obvious findings include:

  • Errors in less significant bit positions have lower impact; accumulator errors from the 9th bit onward have no effect.
  • Accumulator errors drop accuracy by about 80%, while multiplier errors cause only a 10% drop.
  • More injected faults lead to greater accuracy reductions.
  • Activation functions affect accuracy: ReLU results in about an 80% drop, whereas Sigmoid and Tanh result in around a 40% drop.
  • Data sets also impact accuracy: MNIST and F-MNIST have about an 80% drop, while CIFAR-10 has only a 30% drop.

The key section of the paper focuses on how to select test cases for detecting FuSa violations (any reduction in accuracy). The primary insight is that instead of using random nondeterministic patterns from the entire input space, which mostly do not correspond to an image and cannot be classified by a DNN, the proposed algorithms choose specific patterns from the pool of test vectors in the application data set for FuSa violation detection. The first algorithm calculates the Euclidean distance of each test case from multiple classes in the data set and selects those that resemble multiple classes. The outcomes are remarkable: with only 14-109 test cases, 100% FuSa coverage is achieved. Another algorithm picks the k patterns that have the lowest prediction confidence values, where the number k of test patterns is set by the user. With just k=10 test patterns, which is 0.1% of the total 10,000, all FuSa violations are identified. The authors also present results for a larger DNN with 5 hidden layers and a bigger data set containing 112,800 training and 18,800 test images, achieving similar outcomes.
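The first algorithm might be sketched as follows; the class-centroid formulation and the synthetic data are assumptions for illustration, and the paper’s exact distance computation may differ:

```python
import numpy as np

# Score each test image by how equally close it is to its two nearest class
# centroids: a small gap means the image "resembles multiple classes".
def ambiguity_scores(images: np.ndarray, labels: np.ndarray) -> np.ndarray:
    classes = np.unique(labels)
    centroids = np.stack([images[labels == c].mean(axis=0) for c in classes])
    # Euclidean distance of every image to every class centroid.
    d = np.linalg.norm(images[:, None, :] - centroids[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, 1] - d[:, 0]           # small gap = ambiguous image

rng = np.random.default_rng(1)
imgs = rng.random((200, 784))           # synthetic stand-in for test images
labs = np.repeat(np.arange(10), 20)     # synthetic labels, 20 per class
picked = np.argsort(ambiguity_scores(imgs, labs))[:14]  # 14 most ambiguous
print(picked.shape)                     # (14,)
```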

This is an enjoyable paper to read. The title “Toward Functional Safety” hints that the approach is not yet ready for practical application, given the limited dataset of merely 10,000 images and just 10 classes. It remains open whether this approach would be effective in scenarios with significantly larger datasets and more categories, such as automotive applications, face recognition, or weapon detection.


Collaboration Required to Maximize ASIC Chiplet Value
by Kalar Rajendiran on 09-24-2024 at 10:00 am


Chiplets are well known to provide several advantages over traditional monolithic chips. Despite these benefits, the transition to a chiplet-based design paradigm presents challenges that require coordinated efforts across the industry. In essence, collaboration among the various players involved is not just a nicety but a hard requirement to realize the full benefits. And an ASIC provider can optimize the benefits even further by customizing the various chiplet components of a design.

I recently interviewed Erez Shaizaf, the CTO of Alchip Technologies, Limited, to gain deeper insights into how the combination of the chiplet-paradigm and an ASIC approach can maximize the benefits to be derived from chiplets. Alchip is a leading provider of silicon design and production services for merchant silicon and system companies developing complex and high-volume ASICs and SoCs. Here are the excerpts from that interview.

The Importance of Collaboration in Chiplet-Based Design

In a fast-paced industry where new tape-outs occur every 12 to 18 months, having the right technology is only part of the solution. Efficient execution is equally critical. Chiplet-based designs and advanced packaging technologies are inherently complex, requiring a multi-disciplinary approach to achieve peak performance and maximize financial returns. Extensive collaboration across the ASIC value chain ensures seamless integration from the architecture phase to design, all the way to mass production ramp-up.

Alchip’s Collaborative Efforts

Alchip collaborates closely with IP vendors, EDA providers, foundries, and assembly partners to ensure seamless integration of cutting-edge technologies into advanced packaging designs. This includes managing high-bandwidth memory and numerous UCIe lanes for high die-to-die bandwidth, along with 224G/PCIe multi-lane architectures to support both scale-up and scale-out needs. By partnering with EDA vendors across all stages of design, from RTL to GDS, and through package design and sign-off phases, Alchip streamlines design processes, optimizing them for rapid production and reduced time to market.

In addition to its work with EDA vendors, Alchip collaborates with foundries such as TSMC to ensure compatibility with advanced process nodes and packaging types. This holistic approach extends to assembly and testing partners to validate designs and accelerate the transition from tape-out to mass production. Through optimized package and interposer multi-die floor planning and careful supply chain management, Alchip enables high-volume production efficiently, meeting diverse customer needs while minimizing risk and ensuring smooth transitions from design to mass production within the shortest possible time.

Critical Role of ASIC Providers in the Chiplet Ecosystem

Despite the buzz surrounding chiplets, Alchip challenges the notion of a “chiplet market.” In reality, chiplet development remains largely captive, with an estimated 90% of chiplet development occurring internally within organizations (with the exception of high-bandwidth memory). Each customer has unique performance demands — spanning functionality, bandwidth, process node, and package type — that make a generalized chiplet market hard to foresee.

However, Alchip has taken proactive steps to address these challenges by developing the Soft Chiplet approach, a modular, front-end ready design that allows for rapid tape-out. This modular approach covers various aspects of chiplet design — including verification, firmware, delivery, documentation, and ready-to-harden flow. The Soft Chiplet is an embodiment of Alchip’s collaborative philosophy, allowing flexibility for different customer requirements while speeding up the tape-out process.

Handling Design Complexity in Multi-Die Systems

Designing multi-die systems and chiplet-based architectures introduces significant complexity. Alchip navigates this challenge by leveraging its talented engineering team, which continuously seeks out the best new solutions. The Alchip team drives the ecosystem to innovate and collaborate in meeting customer demands.

Additionally, Alchip’s R&D efforts ensure that the company is always prepared for “the next node” — whether it’s 5nm, 3nm, or beyond. This approach positions the company to provide customers with the most cutting-edge technologies when they need them, ensuring faster tape-outs and more efficient designs.

Packaging Solutions for Chiplet Integration

Alchip supports all TSMC advanced packaging types, as well as certain advanced OSAT packaging solutions, to ensure optimal compatibility with diverse chiplet integration needs. These packaging solutions are critical for achieving the best possible performance and power efficiency in chiplet-based designs. Through its internal R&D efforts, Alchip continuously develops packaging solutions that address the evolving demands of its customers, ensuring high performance and efficient integration across a wide range of applications.

Success Stories

Alchip is a trusted partner for cloud service providers, merchant silicon companies, and innovative start-ups, each of whom has unique design and performance requirements. The company’s ability to tailor ASIC solutions to these varying needs underlines its value as a collaborative partner. However, due to the custom nature of these designs, specific case studies remain confidential.

Summary

Alchip is dedicated to pushing the boundaries of the chiplet ecosystem through collaboration and continuous technological progress. It is actively advancing its Soft Chiplet initiative, with a working prototype expected by the end of this year. This development is part of a broader roadmap aimed at building a flexible and collaborative ecosystem for chiplet-based systems. The company plans to reveal more about its advancements in chiplet technology at upcoming conferences.

To learn more, visit www.alchip.com

Also Read:

Synopsys and Alchip Accelerate IO & Memory Chiplet Design for Multi-Die Systems

The First Automotive Design ASIC Platform

Alchip is Golden, Keeps Breaking Records on Multiple KPIs


Andes Technology is Expanding RISC-V’s Horizons in High-Performance Computing Applications
by Charlie Su on 09-24-2024 at 6:00 am


By: Dr. Charlie Su, President and CTO, Andes Technology Corp.

At Andes Technology, we are excited to share some of our latest advancements and insights into the growing role of RISC-V in several high-performance applications. According to the SHD Group report, “IP Market RISC-V Market Report: Application Forecasts in a Heterogeneous World,” our processor IPs account for 30% of RISC-V shipments, making Andes the number one competitor in the market. This gives us unique visibility into the deployment of the RISC-V ISA worldwide. Our customers offer products for a diverse range of applications across AI/ML, automotive, communication/networking, microcontroller, mobile, and storage sectors.

Leading Innovations in AI Acceleration

AI has been one of our largest licensing markets for our vector processors and advanced CPUs, targeting applications ranging from low-power edge devices—such as smart home devices, wearable health monitors, predictive maintenance in industrial settings, and autonomous drones and robots—to the high-performance cloud AI and data centers, with customers like Meta and Sapeon.

Meta announced their first Meta Training Inference Accelerator (MTIA) at ISCA 2023. It featured an 8×8 array of 64 PEs (processing elements). Each PE includes a processor subsystem incorporating two configurations of the Andes AX25-V100, the predecessor of Andes’ popular NX27V vector processor. Meta also leveraged Andes Custom Extensions (ACE) and COPILOT tools to heavily customize the cores to their unique requirements, significantly reducing their design teams’ engineering time.

Meta noted in their ISCA presentation: “Each PE contains two RISC-V cores… and are heavily customized to suit the functionalities needed… The set of customizations includes custom interfaces, custom registers, custom instructions, and custom exceptions… The focus and strategy when architecting the accelerator therefore has been on adopting and reusing suitable pieces of technology, as well as tools and environments, from vendors and the open-source community….”

Our NX27V and AX45MPV vector processors, and other advanced Andes cores, are key components in next-generation in-memory computing solutions. Customers such as Axelera, Houmo.AI, Rain AI, and TetraMem leverage our technology for their in-memory computing AI accelerators. Most recently, our customer Rivos commented that our NX45 in their high-end AI SoC is the only RISC-V core passing their rigorous verification tests.

A pioneer in compute-in-memory (CIM) technology, Rain AI licensed Andes’ AX45MPV RISC-V vector processor. The AX45MPV CPU future-proofs Rain’s CIM-based NPU by allowing the addition of custom instructions to encapsulate the CIM computing blocks, thereby greatly simplifying software development efforts, especially for the AI compiler. The Andes Custom Extension (ACE) framework and its automated COPILOT tool streamline this customization process, as highlighted in Rain AI’s talk at the recent Andes RISC-V CON.

AI continues to be an area of rapid innovation, making it a perfect fit for RISC-V.  For example, Lightelligence is developing photonics-based AI accelerators performing high-speed matrix multiplication—another exciting area where our vector processors play a crucial role.

Automotive and ADAS (Advanced Driver Assistance Systems) are also growth markets.  Our automotive customers use our Vector and Safety-Enabled family of RISC-V cores across a variety of applications including in-cabin monitoring, display, radar, and sensing/control.  Multiple customers have utilized our fully ISO26262-compliant products and safety packages, achieving product safety certification in record time!

Additionally, our traditional markets such as processors for media-rich computing platforms (e.g., smartphones, tablets, and TVs), signal processing in wireless communications, and general-purpose control remain strong. For example, Renesas uses the Andes AX45MP for their Linux-capable MPU, the RZ/Five. The significant growth in high-performance AI and automotive, combined with the rapidly expanding software ecosystem, is also driving growth in the high-performance general-purpose processing segment.

New and repeat customers continue to value the flexibility, openness, and rapid time-to-market that is enabled by Andes commercial RISC-V IP coupled with our COPILOT toolset for custom extensions.  We are seeing this growth persist in our traditional markets of embedded control, DSP, and real-time systems.  More importantly, this growth is accelerating in AI, Automotive, and General Compute.

Enhanced Software Ecosystem and Innovations

The RISC-V software ecosystem for control processors is rapidly expanding, with support from GNU, LLVM, Linux, and Google’s official support for Android. Linux already has robust RISC-V support, from the toolchain to booting into various Linux distributions. The RISC-V Software Ecosystem (RISE) Project, formed by Andes and other key RISC-V companies, is actively working to enhance open-source support for application processors, recently bringing up support for Java 21 and 22.

Andes is heavily involved in RISE. We are currently contributing to compilers to support ever-increasing memory spaces, and to QEMU (Quick EMUlator), adding new RISC-V extensions such as IOPMP. QEMU is a powerful, fast simulation framework that enables early software exploration and development while the SoC architecture is still on the drawing board.

Under the RISE project, Andes is also responsible for porting and optimizing (“portimizing”) the entire OP-TEE (Open Portable Trusted Execution Environment), a trusted application framework, to RISC-V. With OP-TEE, RISC-V will have a competitive counterpart to ARM’s TrustZone software. OP-TEE provides software-level protection, while ePMP (Enhanced Physical Memory Protection) and IOPMP (Input/Output Physical Memory Protection) provide hardware-level protection. These advanced memory protection features enhance system security and reliability by providing fine-grained control over the permissions of memory accesses from RISC-V processors and other DMA-capable controllers, which is crucial for high-security applications. Porting OP-TEE calls for significant work on the TEE Client API and the OP-TEE Linux driver in the Non-secure World, as well as the TEE Core API and the OP-TEE OS in the Secure World. Underlying both worlds is the OpenSBI layer, which Andes will enhance with OP-TEE extensions. OpenSBI (Open Supervisor Binary Interface) is another critical component of the RISC-V ecosystem, providing the firmware layer needed to boot and manage RISC-V operating systems and hypervisors.

Regarding Android, Google has specified the base requirements for RISC-V in the Compatibility Definition Document (CDD): RVA22 plus the vector, vector crypto, and hypervisor extensions. Andes has been working with the ongoing Android Open-Source Project (AOSP) software, bringing it up on QEMU and porting it to our hardware, which does not yet meet the full CDD requirements, to achieve faster performance for Android development. During this process, we even identified some generic bugs present in ARM builds, which Google acknowledged and fixed.

We also focused on optimizing computational libraries using the RISC-V Vector extension (RVV). One example is libjpeg, the widely used library for handling JPEG images, which offers efficient compression, decompression, and manipulation capabilities. Another is the Android Runtime (ART), a significant evolution of the Android platform’s runtime environment that delivers substantial improvements in performance, efficiency, and developer productivity over its predecessor, Dalvik. By leveraging AOT (Ahead-of-Time) and JIT (Just-In-Time) compilation, enhanced garbage collection, and better debugging tools, ART provides a more responsive and efficient runtime for Android applications.

Conclusion

Andes Technology continues to deliver advanced RISC-V processor IP while making extensive contributions to its software ecosystem. Our collaborations with industry leaders in the AI, automotive, and computing sectors highlight the versatility and potential of RISC-V architecture. Through our robust hardware solutions and the ever-enriched RISC-V software ecosystem, we are paving the way for RISC-V to become a mainstream choice for high-performance computing applications. With ongoing efforts in compiler enhancements, runtime and library optimizations, trusted application framework, and support for critical industry standards, Andes Technology is leading the way in realizing the full potential of RISC-V. We are excited about the future possibilities and innovations, solidifying our position in the evolving landscape of high-performance computing.

Also Read:

TSMC OIP Ecosystem Forum Preview 2024

Linear pluggable optics target data center energy savings

Smarter, Faster LVS using Calibre nmLVS Recon


Synopsys Powers World’s Fastest UCIe-Based Multi-Die Designs with New IP Operating at 40 Gbps

by Kalar Rajendiran on 09-23-2024 at 10:00 am

Synopsys 40G UCIe IP Solution

As the demand for higher performance computing solutions grows, so does the need for faster, more efficient data communication between components in complex multi-die system-on-chip (SoC) designs. In response to these needs, Synopsys has introduced the world’s fastest UCIe-based IP solution, capable of operating at a groundbreaking 40Gbps. This full IP solution, encompassing Controller, PHY, and Verification IP, represents a leap forward for die-to-die connectivity in SoCs, targeting data centers, AI, generative AI, and a wide array of cutting-edge applications.

What sets this solution apart is that it achieves 40Gbps per pin, a 25% increase over the 32Gbps defined in the UCIe specification, without compromising energy efficiency or die area. This capability opens up new possibilities for end users while maintaining seamless integration with the UCIe ecosystem.

During a chat, Michael Posner, vice president of IP product management at Synopsys, commented that the company’s active contribution to the UCIe Consortium is what enabled it to deliver a robust UCIe solution soon after UCIe 2.0 ratification. Delivering robust IP solutions ahead of a standard’s ratification has become routine for Synopsys, as seen with its Ethernet 1.6T and PCIe 7.0 IP solution announcements.

The UCIe Advantage: An Industry Standard for Die-to-Die Connectivity

Universal Chiplet Interconnect Express (UCIe) has emerged as the de facto standard for die-to-die connectivity, offering a scalable solution for high-performance, multi-die designs. The key metrics of UCIe — bandwidth per mm², power efficiency, low latency, and minimal area usage — align perfectly with the needs of customers across various industries, from data centers to AI applications.

Why 40Gbps?

Synopsys’ new IP solution is UCIe-compliant at 32Gbps per pin and goes beyond the specification, offering speeds up to 40Gbps per pin in specific use cases. By increasing the data transfer rate to 40Gbps, the company ensures that multi-die SoCs can efficiently handle growing data requirements in these environments. While pushing beyond the UCIe specification, the solution provides more bandwidth without impacting the energy or area budgets of the design. This is particularly beneficial in environments where dies are tightly linked, such as multi-die architectures that must transfer massive amounts of data between SoC components. The additional bandwidth of the 40Gbps solution lets system designers future-proof their designs while meeting current demands.

Modular Architecture for Scalable Bandwidth

A significant advantage of this UCIe IP solution is its modular architecture. The PHY component of the IP consists of 64 Tx and 64 Rx lanes, which can be combined to create large-scale solutions with substantial bandwidth. This makes the solution ideal for memory-intensive applications, such as those using high-bandwidth memory (HBM) chiplets. In these designs, high-speed die-to-die communication is essential to ensure the performance of memory subsystems. The Synopsys UCIe solution’s 40Gbps capability offers an extra margin of performance for these applications, ensuring that memory bandwidth requirements are not only met but exceeded, allowing for greater flexibility in SoC design. This flexibility allows designers to optimize for both area and performance, achieving high data throughput without sacrificing valuable chip real estate.

“Heterogeneous integration with high-bandwidth die-to-die connectivity gives us the opportunity to deliver new memory chiplets with the efficiency needed for data-intensive AI applications,” said Jongwoo Lee, vice president of the System LSI IP Development Team at Samsung Electronics. “Leveraging Synopsys’ new 40G UCIe IP, we can extend our collaboration to develop industry-leading chiplet solutions for tomorrow’s high-performance data centers.”

With Synopsys’ solutions like PCIe 7.0 and Ethernet 1.6T already addressing the need for high-bandwidth data coming into SoCs, UCIe provides the means for that data to flow between dies seamlessly.

Signal Integrity Monitors: Enabling Reliable Mission-Critical Operations

The UCIe IP solution includes Signal Integrity Monitors that track link health to identify any degradation in performance. When degradation is detected, the system can be flagged for proactive maintenance, or the profiled information can be offloaded for further analysis.

The ability to assess link degradation and flag corrective action before a failure occurs ensures higher reliability and reduces the risk of downtime. Synopsys’ 40G UCIe IP offers silicon lifecycle management (SLM) features, where data from the profiling process can feed longer-term reliability assessments. This proactive approach to reliability is especially important for data centers, where continuous operation is paramount, and is expected to be a game-changer in industries like automotive, where real-time data integrity is essential for reliable and safe vehicle operation.

UCIe’s Redundant Links and Lifecycle Management

The UCIe specification defines redundant links to enhance reliability. Synopsys’ UCIe solution supports them: links found to be defective during manufacturing or testing can be rerouted to spare links.

Simplified Clocking for Easy Integration

The company has further reduced system integration complexity by incorporating built-in clocking capabilities that make use of a 100MHz reference clock, a standard across SoC designs for many years. UCIe typically requires a 1GHz or 2GHz clock, but by integrating the capability to work with a more common 100MHz clock, the company has simplified system-level integration, allowing designers to adopt the technology more easily without the need for complex clocking inputs.

Hardware-Based Initialization: A Key Differentiator

Another unique aspect of the company’s UCIe-based IP is its hardware-based initialization, which eliminates the need for processor or firmware interaction during the bring-up process. The company has simplified this process by developing a system that allows the UCIe link to come online without external intervention, greatly reducing the complexity and cost associated with multi-die designs.

This capability is particularly important in “blind die bring-up” scenarios, where the only connection between two dies is the UCIe link. By eliminating the need for firmware or processor involvement in establishing the link, the company reduces the need for complex software solutions, saving time and reducing development costs.

Summary

With its new UCIe IP solution, capable of operating at 40Gbps per pin, Synopsys has advanced die-to-die communication technology for high-performance AI SoCs. By delivering 25% more bandwidth than the UCIe specification without impacting energy or area, the company is enabling designers to meet the needs of today’s data-intensive applications, while ensuring scalability for future generations.

The modular architecture, integrated Signal Integrity Monitors and test features, protocol bridges and simplified clocking provide a robust, scalable solution for high-bandwidth, low-latency SoC designs. As industries continue to shift towards multi-die architectures, the company’s 40Gbps UCIe IP offers a powerful tool for system designers looking to optimize performance, reliability, and ease of integration.

Read the full press announcement here.

To learn more, visit Synopsys UCIe solution page.

Also Read:

Synopsys IP Processor Summit 2024

Mitigating AI Data Bottlenecks with PCIe 7.0

The Immensity of Software Development the Challenges of Debugging (Part 1 of 4)


PQShield Builds the First-Ever Post-Quantum Cryptography Chip

by Mike Gianfagna on 09-23-2024 at 6:00 am

PQShield Builds the First Ever Post Quantum Cryptography Chip

Quantum computing promises to deliver vast increases in processing power. The technology exploits the properties of quantum mechanics to create revolutionary increases in performance. Medical and material science research are examples of fields that will see dramatic improvement when production-worthy quantum computers arrive.

Unfortunately, all existing data encryption will also become much easier to defeat when this happens. This is still in the future, but it is clearly coming, and vast efforts are underway to prepare for it. PQShield is part of this effort.

The company brings together a world-class collaboration of post-quantum cryptographers, researchers and engineers. Its technology includes hardware, software, firmware and research IP.

Recently, PQShield achieved a significant milestone, by building the first ever post-quantum cryptography test chip. Let’s look at some of the details behind how Team PQShield have done this, and what this development means in the field of post-quantum cryptography.

Why is this Significant?

In a recent video, Ben Packman, PQShield’s Chief Strategy Officer, and Graeme Hickey, VP of Engineering, discuss the new test chip.

It’s noteworthy because NIST (the National Institute of Standards and Technology) has a major effort underway to develop a blueprint for the world to update cryptography, ready for the post-quantum era. New standards have been published, and the entire semiconductor supply chain is already anxious to get to work on implementation. Up to now that work has relied on simulation and FPGA prototyping.

With PQShield’s new test chip, the new NIST standards can be evaluated in a way that closely resembles the final production environment. Hardware and software can now evolve in lockstep, allowing for much better validation and faster progress.

About the Video

In this short video, Ben and Graeme from PQShield talk through the development of the test chip. The chip contains the complete PQPlatform catalog of IP offered by PQShield, in a package that can be programmed for a wide range of uses. Graeme points out that the test chip allows a deep dive into areas such as power utilization, performance, and the efficiency of Side Channel Analysis (SCA) countermeasures designed to defend against physical detection of cryptographic secrets. The test chip will provide important information and insights.

I recommend watching the video to get the full impact of this work. Ben Packman points out that the test chip has been completed before many of the NIST standards were finalized and wonders how the chip could therefore be compliant. Graeme explains that the use of hardware/software co-design in the development of the chip allows new standards (both now and in future) to be loaded into the hardware via firmware, and that the platform is likely to continue to provide substantial value going forward.

To Learn More

You can learn more about PQShield and its comprehensive approach to post-quantum cryptography, on SemiWiki. If you’re contemplating post-quantum additions in your next design, the video is well worth watching from the PQShield website. And that’s how PQShield builds the first-ever post-quantum cryptography chip.



Podcast EP248: The Far-Reaching Impact of Finwave Technology With Dr. Pierre-Yves Lesaicherre

by Daniel Nenni on 09-20-2024 at 10:00 am

Dan is joined by Dr. Pierre-Yves Lesaicherre. Before joining Finwave as CEO, Dr. Lesaicherre was the president, CEO, and a director of Nanometrics, a leading provider of advanced process control metrology and software analytics. He also held the CEO position at Lumileds, an integrated manufacturer of LED components and automotive lighting lamps. Dr. Lesaicherre previously held senior executive positions at NXP and Philips Semiconductors, and served as chairman of the board of Silvaco Group, a leading supplier of TCAD, EDA software, and design IP.

Dan explores with Pierre-Yves the unique technology invented by Finwave and its impact on the industry. The founding of Finwave is discussed, which includes the invention of a novel type of gallium nitride (GaN) transistor based on a FinFET architecture by the Finwave founders while working at MIT.

Pierre-Yves describes the impact of Finwave’s 3DGaN FinFET technology to significantly enhance power amplifier linearity, meeting the demands of advanced communication systems. He touches on several other innovations, including the world’s first high-speed, broadband, high-power RF switch. The superior speed and power handling capabilities of Finwave’s technology are discussed. Pierre-Yves also describes the recently announced work with GlobalFoundries to partner on RF GaN-on-Si technology for cellular handset applications.

Future plans for the company, including product ramp and fund-raising are also discussed.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Adam Khan of Diamond Quanta

by Daniel Nenni on 09-20-2024 at 6:00 am

Adam Khan Bio Profile

Adam Khan is a pioneer in diamond semiconductor technology, renowned for his foresight and expertise. As the founder of AKHAN Semiconductor, he played a crucial role in innovating lab-grown diamond thin-films for various applications, such as enhancing smartphone screens and lenses with Miraj Diamond Glass® and improving aircraft survivability with Miraj Diamond Optics®. In 2019, under Adam’s leadership, AKHAN partnered with the U.S. Army and a major Defense Contractor to showcase the robust capabilities of its diamond technology, highlighting its importance in national security and defense. His vision secured a $20 million A-Round investment, accelerating the scaling of their consumer device technology to meet growing demand. Adam’s leadership and achievements have been widely recognized, including being named a Forbes 30 Under 30 honoree in Energy & Industry.

Tell us about your company?

Diamond Quanta is here to bring about the future of diamond technology. For us, that means applying our groundbreaking fabrication processes to advance the material’s potential, allowing it to confidently handle the long-term, modern issues facing semiconductors and quantum technology.

Through our novel fabrication and doping techniques, we are pushing diamond’s limits beyond what was previously thought possible of the material, expanding its efficiency, durability and sustainability capabilities to provide real-world solutions facing a wide spectrum of prominent industries.

What problems are you solving?

Power electronics face a major issue with energy loss during power conversion, which leads to inefficiency, malfunction, and poor sustainability, among other things. Diamond Quanta is harnessing and advancing diamond’s exceptional wide-bandgap properties to tackle this issue head-on. Our approach creates a transformed diamond material that minimizes power-conversion energy loss, enabling devices that are far more efficient while also being more compact and lightweight.

With these advantages in mind, our technology brings about power electronics that can withstand higher temperatures, operate at increased voltages and deliver superior performance across a broader range of frequencies, with a reduced carbon footprint on top of all of that.

What application areas are your strongest?

Right now, our technology is well-suited to address the issues plaguing data centers, the electric vehicle (EV) transition and aerospace manufacturing.

As data centers expand, their computational needs are growing, making it more and more difficult to efficiently power them. Using our diamond-based technology, semiconductors will have a much easier time handling these lofty power loads, as their superior efficiency and heat dissipation will allow them to confidently meet the computational needs that are asked of them, without requiring the inefficient cooling processes currently used to keep today’s chips functional in these applications.

That same efficiency and heat dissipation is vital to our technology’s application in the EV industry. Diamond-based electronics enable more compact and efficient power electronics, which directly translates to weight savings that are crucial for improving EV range. Range anxiety is one of the key consumer apprehensions to EV adoption; alleviating that anxiety will be crucial in further proliferating the electric transition.

In the ever-evolving worlds of commercial and defense aerospace technology, the application of diamond-based power electronics will similarly bring about major advantages through increased efficiency, weight reduction and heat dissipation, all leading to vehicles with longer lifespans that will require fewer maintenance and replacement costs.

What keeps your customers up at night?

Our customers, which include leaders in the semiconductor, aerospace, automotive, and consumer electronics industries, are often concerned about staying ahead of technological advancements and managing costs. They seek solutions that improve efficiency and performance while reducing energy consumption and thermal management issues. The reliability and scalability of new technologies are critical considerations that keep them up at night, as these factors directly impact their competitive edge and market presence.

What does the competitive landscape look like and how do you differentiate?

The competitive landscape in advanced materials, particularly diamond-based semiconductors, is evolving with several key players focusing on innovative applications of materials science. Diamond Quanta differentiates itself through our proprietary “Unified Diamond Framework,” which allows for precise doping and manipulation of diamond structures at the molecular level. This enables us to enhance mechanical, optical, and electronic properties to meet specific industry needs, which is not commonly offered by competitors. Our approach not only delivers superior performance but also ensures greater durability and efficiency in high-power and high-frequency, photonic and quantum transport applications.

What new features/technology are you working on?

Currently, Diamond Quanta is developing new techniques for integrating our diamond-based materials into existing semiconductor processes without the need for extensive retooling. We are also enhancing our quantum photonic devices, which promise to drastically improve data processing speeds and energy efficiency for applications in quantum computing and advanced AI systems. Furthermore, we are exploring the potential of our materials in supporting sustainable technologies, particularly in electric vehicles, data centers, and renewable energy systems.

How do customers normally engage with your company?

Customers typically engage with Diamond Quanta through direct inquiries via our website, at industry conferences, and through our business development team. We also see significant engagement through collaborations in research and development projects where we work closely with customer engineering teams to tailor our materials and technologies to their specific needs.

Also Read:

Executive Interview: Michael Wu, GM and President of Phison US

CEO Interview: Wendy Chen of MSquare Technology

CEO Interview: BRAM DE MUER of ICsense


TSMC OIP Ecosystem Forum Preview 2024

by Daniel Nenni on 09-19-2024 at 10:00 am

TSMC OIP 2024

The 2024 live conferences have been well attended thus far and there are many more to come. The next big event in Silicon Valley is the TSMC Global OIP Ecosystem Forum on September 25th at the Santa Clara Convention Center. I expect a big crowd filled with both customers and partners.

This is the 16th year of OIP and it has been an honor to be a part of it. The importance of semiconductor ecosystems is greatly understated as is the importance of the TSMC OIP Ecosystem.

The big change I have seen over the last few years is momentum. The FinFET era has gained an incredible amount of ecosystem strength, and the foundation, of course, is TSMC. When we hit 5nm, the tide turned in TSMC’s favor with a huge amount of TSMC N5 EDA, IP, and ASIC services support. In fact, there was a record-setting number of tape-outs on this node. This momentum has increased at 3nm, with TSMC N3 (the final FinFET node) having the strongest ecosystem support and tape-outs in the history of the fabless ecosystem, in my experience.

The momentum is continuing with N2 which will be the first GAA node for TSMC. Rumor has it N2 will have comparable tape-outs with N3. It is too soon to say what will happen with the angstrom era but my guess is that semiconductor innovation and Moore’s Law will continue in one form or another.

A final thought on the ecosystem, while it appears that IDM foundries have more R&D strength than pure-play foundries I can assure you that is not the case. The TSMC OIP Ecosystem, for example, includes the largest catalog of silicon verified IP in the history of the semiconductor industry. IP companies first develop IP in partnership with TSMC to leverage the massive TSMC customer base. In comparison, the IDM foundries pay millions of dollars to port select IP to each of their processes to encourage customer demand.

Throughout the FinFET era, foundries, customers, and partners have spent hundreds of billions of R&D dollars in support of the fabless semiconductor ecosystem, which will carry the semiconductor industry to the one-trillion-dollar mark by the end of this decade, absolutely.

Here is the event promo:

Get ready for a transformative event that will spark innovations of today and tomorrow’s semiconductor designs at the 2024 TSMC Global Open Innovation Platform (OIP) Ecosystem Forum!

This year’s forum is set to ignite excitement with a focus on how AI is transforming chip design and the latest advances in 3DIC system design. Join industry trailblazers and TSMC’s ecosystem partners for an inside look at the latest innovations and breakthroughs.

Through a series of compelling, multi-track presentations, you’ll witness firsthand how the ecosystem is collaborating to address critical design challenges and leverage AI in chip design processes.

Engage with thought leaders and innovators at this unique event, available both in-person and online across major global locations, including North America, Japan, Taiwan, China, Europe, and Israel.

Don’t miss out on this opportunity to connect with the forefront of semiconductor technology.

Get the latest on:
• Emerging challenges in advanced node design and corresponding design flows and methodologies for N3, N2, and A16 processes.

• The latest updates on TSMC’s 3DFabric chip stacking and advanced packaging technologies including InFO, CoWoS®, and TSMC-SoIC®, 3DFabric Alliance, and 3Dblox standard, along with innovative 3Dblox-based design enablement technologies and solutions, targeting HPC, AI/ML, and mobile applications.

• Comprehensive design solutions for specialty technologies, enabling ultra-low power, ultra-low voltage, analog migration, RF, mmWave, and automotive designs, targeting 5G, automotive, and IoT applications.

• Ecosystem-specific AI-assisted design flow implementations for enhanced productivity and optimization in 2D and 3D IC design.

• Successful, real-life applications of design technologies, IP solutions, and cloud-based designs from TSMC’s Open Innovation Platform® Ecosystem members and TSMC customers to speed up time-to-design and time-to-market.

REGISTER NOW

Also Read:

TSMC’s Business Update and Launch of a New Strategy

TSMC Foundry 2.0 and Intel IDM 2.0

What if China doesn’t want TSMC’s factories but wants to take them out?


Linear pluggable optics target data center energy savings

by Don Dingee on 09-19-2024 at 6:00 am

Conceptual diagram of a retimed OSFP versus a linear direct drive solution using an advanced SerDes IP solution and linear pluggable optics

Data center density continues growing, driving interconnect technology to meet new challenges. Two of the largest are signal integrity and power consumption. Optical interconnects can solve many signal integrity issues posed by copper cabling and offer support for higher frequencies and bandwidths. Still, through sheer numbers in a data center – with projected 10x interconnect growth in racks for applications like AI – optical interfaces add up quickly to pose power consumption problems. Retiming circuitry provides flexibility at the cost of added power. New linear direct-drive techniques simplify interfaces, saving energy and helping close the interconnect scalability gap. Here, we highlight Synopsys’ efforts to usher in more efficient linear pluggable optics with their 1.6T Ethernet and PCIe 7.0 IP solutions.

What’s using most of the power in a pluggable optical interface?

Pluggable modules emerged years ago as an easier way to configure (and, in theory, upgrade within controller limits) physical network interfaces. Instead of swapping motherboards or expansion cards inside a server to get different network ports, designs accommodating SFPs let IT teams choose modules and mix and match them for their needs. SFPs also helped harmonize installations with varying types of network interfaces in different platforms across the enterprise network.

The latest form factor for high-speed Ethernet is OSFP. Density increases have fostered new types of OSFPs, which gang lower-speed lanes into a faster interface. A high-level view of an OSFP pluggable optical module shows there is more than just electrical to optical conversion – analog amplifiers team with an MCU and DSP for signal processing and retiming.

Because a high-speed network interface is likely transferring a data stream continuously, the PHY is continuously retiming the incoming signal. In a single OSFP, this power use may not seem like much. However, in a dense data center with aggregate transport bandwidth beyond 25T switches, projections show pluggable optical modules becoming one of the largest power consumers in the networking subsystem. With data center energy usage a crucial consideration, more efficient pluggable optical modules are essential to attain new levels of interconnect scalability.

New SerDes technology enables direct-drive optical interfaces

The complexity in an optical module arises from the onboard (or, more accurately, on-chip) PHY’s inability to compensate for a range of optical impairments, which worsen as speeds increase. What seemed like a good idea to move retiming into the optical module now merits rethinking as power efficiency bubbles up to the top of the list of concerns. A linear direct-drive (LDD) or linear pluggable optical (LPO) interface retools the electrical circuitry, usually in a network switch ASIC inside a server or network appliance, to handle the required compensation. One result is a simpler OSFP that deals only with electrical-to-optical conversion, significantly reducing the power consumption of the retiming function in the PHY.

The tradeoff is handling direct drive functionality efficiently in next-generation, optical-ready PHY IP. Moving the logic into a network controller ASIC requires careful attention to signal integrity and dealing with reflections, crosstalk, noise, dispersion, and non-linearities. High-speed digital circuitry in a compact footprint generates significant heat, requiring sound thermal management. Shared resources in the host ASIC supporting the SerDes IP provide power management advantages over the retimed implementation.

Synopsys is carving a path toward more efficient linear pluggable optics using co-simulation techniques to develop advanced SerDes IP solutions for faster Ethernet and PCI Express. At higher data rates, simplified models that approximate photonic behavior with electrical equivalents produce inaccurate performance estimates. With more robust electro-optical modeling, simulating IP solutions in a system context delivers better results. Synopsys IP solutions first appeared in demonstrations at OFC2024 using OpenLight's PICs.

These Synopsys IP solutions enable scale-out and scale-up SoC designs:

  • A 1.6T Ethernet IP solution with multi-rate, multi-channel 1.6T Ethernet MAC and PCS controllers, 224G Ethernet PHY IP, and verification IP for easier SoC integration.
  • A PCIe 7.0 IP solution with a PHY, controller, IDE security module, and verification IP providing secure data transfers up to 512 GB/sec bidirectional in x16 configurations.
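The 512 GB/sec bidirectional figure for a x16 PCIe 7.0 link follows directly from the per-lane signaling rate. PCIe 7.0 runs at 128 GT/s per lane, roughly 128 Gb/s of raw bandwidth per direction if encoding and FLIT overhead are ignored. A quick sanity check of the arithmetic:

```python
# Sanity-checking the quoted PCIe 7.0 x16 bandwidth figure.
# PCIe 7.0 signals at 128 GT/s per lane; ignoring protocol overhead,
# that is ~128 Gb/s of raw bandwidth per lane per direction.

def pcie_bw_gbytes(gt_per_s: int, lanes: int, bidirectional: bool = True) -> int:
    """Raw link bandwidth in GB/s for a PCIe link of the given width."""
    per_direction = gt_per_s * lanes // 8   # convert Gb/s to GB/s
    return per_direction * (2 if bidirectional else 1)

print(pcie_bw_gbytes(128, 16))  # 512, matching the quoted figure
```

Per direction, the same x16 link delivers 256 GB/s of raw bandwidth; the quoted 512 GB/sec counts both directions.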

The Synopsys PHY IP for 224G Ethernet and PCIe 7.0/6.x has demonstrated linear direct-drive capability, and the 224G Ethernet PHY also supports retimed RX and TX.

Learning more about LDD and LPO solutions

Once the industry sees the possibilities for LDD/LPO in SoC designs for server and networking hardware, the ecosystem for linear pluggable optics should develop rapidly, recapturing as much as 30% of the energy used in a high-interconnect-density data center. Synopsys is discussing more details of its unified electronic and photonic design approach and its optical direct-drive IP solutions at two industry events:

European Conference on Optical Communication (ECOC2024)

Optica Photonic-Enabled Cloud Computing Industry Summit at Synopsys

An on-demand Synopsys webinar also offers more insight into the rising interconnect demands, the evolution of OSFPs, LDD technology, and electro-optical co-simulation techniques:

To retime or not to retime? Getting ready for PCIe and Ethernet over Linear Pluggable Optics