Speeding up Chiplet-Based Design Through Hardware Emulation
by Kalar Rajendiran on 02-16-2023 at 10:00 am

Barriers on the Continuum to SiP

The first chiplet-focused summit took place last month. Many accomplished speakers gave keynote talks on the direction the chiplet ecosystem’s evolution should and would take. Corigine presented a keynote on how hardware emulation should and would evolve to speed up chiplet-based designs. During a pre-conference tutorial session, Corigine shared customer-based case studies highlighting how its MimicPro prototyping and emulation solutions address challenges introduced by chiplet-based designs.

The Chiplet Summit introduced a new tagline, “Chiplets Make Huge Chips Happen.” With large monolithic SoCs losing favor as Moore’s Law slows down, the new tagline highlights how chiplets make large SoCs possible. Of course, taglines by themselves don’t make things happen. It takes an ecosystem, the companies within it, and the people at those companies to make things happen. One of those companies is Corigine, a fabless semiconductor company that designs and delivers leading-edge EDA tools.

Corigine presented insightful thoughts and discussed their innovative solutions during various sessions at the conference. If you missed these sessions, the following is a synthesis of the salient points from those sessions.

Chiplet-based Design Benefits, Challenges and Solutions

Aside from the economic benefit of better yield compared to a large monolithic SoC, chiplets bring many additional benefits to the table: architectural partitioning, enabling of re-use, faster time-to-market and product family scalability. Of course, there are many challenges too. The following diagram shows the continuum of barriers when implementing a chiplet-based chip.

With Corigine’s focus on addressing the front-end barriers, the following are its learnings from developing its own chiplet-based data processing unit (DPU) chip.

Chiplet-based Chip Development and Emulation Requirements

A key consideration for a chiplet is the decision on where to place its various I/O ports. This is driven by system requirements such as machine learning (ML) processing functionality and datapath SIMD or MIMD organization. With an effective architectural decomposition of the system, the next set of requirements revolves around the interconnect’s attributes. The interconnects should be open, extensible and backwards compatible. For example, as UCIe is driven as a standard for D2D interconnects and the UCIe standard evolves, UCIe V2.0 should also support V1.0-based chiplets.

With the interconnects addressed, the next requirement is a pre-tapeout platform to support integration and verification of heterogeneous chiplets. The platform should be able to support a very large number of transistors and ensure IP protection and segmentation. Finally, none of the above matter if silicon and software co-development cannot be accomplished rapidly and successfully. The co-development platform must provide built-in logic analyzers with complex trigger mechanisms to capture and inspect waveforms during software debug.

Corigine’s MimicPro Prototyping and Emulation Solutions

To address the co-development platform, Corigine developed a series of FPGA-based prototyping and emulation platforms by working with the silicon and software teams developing their own chiplet-based DPU chip. These platforms are essentially combined prototyping and emulation systems that can provide faster software turnaround time. They include functionality for collecting and analyzing data and introducing design-for-test and design-for-manufacturing features, thereby enabling software verification before tapeout.

The MimicPro solutions deliver an order-of-magnitude performance improvement over traditional emulators of similar class. Corigine’s patented distributed routing and fine-grain multi-user clocking enable linear performance scaling irrespective of the size of a block being emulated. The dedicated scalable clock/routing infrastructure enables higher utilization of resources for logic emulation.

Corigine MimicPro was initially optimized for performance and scalability, then enhanced with visibility, portability and security. It essentially combines the rich debugging features of emulation, confidential-information protection, and the 10-100 MHz performance of prototyping. It continues to evolve alongside Corigine’s in-house SmartNIC / data processing unit chiplet design.

The following chart showcases the resource utilization efficiency of a MimicPro system in real life use by a SmartNIC.

The following is what Corigine is addressing for chiplets with its MimicPro solutions.

MIMIC Product Information  

MimicPro™ 32

The Corigine MimicPro Prototyping System provides the performance and speed for ASIC and software development for both enterprise and cloud operation, with utmost security and scalability. The MimicPro solution scales from 4 to 32 FPGAs, and the system provides easy upgradeability to the latest available FPGAs. The Corigine MimicPro system is the industry’s next-generation platform for automating prototyping, including previously manual partitioning operations, while providing a system-level view for optimum partitioning and performance. In addition, the MimicPro system adds deep local debug capabilities, providing much greater visibility and faster elimination of bugs. Thus, the MimicPro system reduces overall development time and cost-effectively accelerates software development without a dependence on costly emulation.

For more detailed MimicPro™-32 information, you can refer to Corigine’s product page.

MimicPro-32

MimicTurbo™ GT Card

The Corigine MimicTurbo GT card, based on the UltraScale+™ VU19P FPGA, is designed to simplify the deployment of FPGA-based prototyping at the desktop. Each card supports up to 48 million ASIC gates, has onboard DDR4 component memory, and can be configured to operate with additional connected MimicTurbo GT cards. The card supports 64 GTY transceivers (16 Quads) along with the essential I/O interfaces.

The Corigine MimicTurbo GT board is available from the Xilinx website. You can find more detailed product information on the AMD/Xilinx FPGA-based Corigine MimicTurbo GT card on this page.

MimicTurbo GT Card

Corigine at DVCon US 2023

Corigine is at DVCon demonstrating its MimicPro-32 this month.
Time: February 27 – March 1
Location: DoubleTree by Hilton Hotel San Jose.
Registration: https://dvcon.org/registration/

Also Read:

Alphawave Semi at the Chiplet Summit

Who will Win in the Chiplet War?

The Era of Chiplets and Heterogeneous Integration: Challenges and Emerging Solutions to Support 2.5D and 3D Advanced Packaging


ML-Based Coverage Acceleration. Innovation in Verification
by Bernard Murphy on 02-16-2023 at 6:00 am


We looked at another paper on ML-based coverage acceleration back in April 2022. Here is a different angle from IBM. Paul Cunningham (Senior VP/GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome. And don’t forget to come see us at DVCon, first panel (8am) on March 1st 2023 in San Jose!

The Innovation

This month’s pick is Using DNNs and Smart Sampling for Coverage Closure Acceleration. The authors presented the paper at the 2020 MLCAD Workshop and are from IBM Research in Haifa and the University of British Columbia in Canada.

The authors’ intent is to improve coverage of events which have been hit only rarely. They demonstrate their method on a CPU design, based on refining instruction set (IS) test templates for an IS simulator. Especially interesting in this paper is how they manage optimization in very noisy, low-statistics data where conventional gradient-based comparisons are problematic. They suggest several methods to overcome this challenge.

Paul’s view

Here is another paper on using DNNs to improve random instruction generators in CPU verification which, given the rise of Arm-based servers and RISC-V, is becoming an increasingly hot topic in our industry.

The paper begins by documenting a baseline non-DNN method to improve random instruction coverage. This method works by randomly tweaking instruction generator parameters and banking the tweaks if they improve coverage. The tweaking process is based on a gradient-free numerical method called implicit filtering (see here for a good summary), which works kind of like a zoom out-then-in search: start with big parameter tweaks and zoom in to smaller parameter tweaks if the big tweaks don’t improve coverage.
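
A minimal sketch of that zoom out-then-in search might look like this (illustrative only; the list-of-weights parameter encoding and the coverage function are placeholders, not the paper’s actual setup):

```python
def implicit_filtering(params, coverage_fn, step=1.0, min_step=0.01, max_iters=50):
    """Gradient-free zoom out-then-in search: tweak every parameter at the
    current scale, bank any tweak that improves coverage, and halve the
    step size only when nothing at this scale helps."""
    best = coverage_fn(params)
    for _ in range(max_iters):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params[:i] + [params[i] + delta] + params[i + 1:]
                score = coverage_fn(trial)  # costly: runs real simulations
                if score > best:            # bank the improving tweak
                    params, best, improved = trial, score, True
        if not improved:
            if step <= min_step:
                break        # converged at the finest scale
            step /= 2.0      # zoom in: try smaller tweaks
    return params, best
```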

The authors then accelerate their baseline method using a DNN to assess if the parameter tweaks will improve coverage before going ahead with costly real simulations to precisely measure the coverage. The DNN is re-trained after each batch of real simulations, so it is continuously improving.
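
A hedged sketch of that screening loop, with a small scikit-learn regressor standing in for the paper’s DNN (the network size, batch mechanics, and top-k selection are my assumptions, not the authors’):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def dnn_screened_round(candidates, simulate, X_hist, y_hist, top_k=8):
    """Train a small net on (parameter tweak -> coverage) history, then
    spend costly real simulations only on the top-ranked candidates."""
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    model.fit(np.asarray(X_hist), np.asarray(y_hist))
    scores = model.predict(np.asarray(candidates))
    picked = np.argsort(scores)[-top_k:]       # most promising tweaks
    new_X = [candidates[i] for i in picked]
    new_y = [simulate(x) for x in new_X]       # expensive real sims
    # fold the measured results back in, so the DNN keeps improving
    return np.vstack([X_hist, new_X]), np.concatenate([y_hist, new_y])
```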

The paper is well written, and the formal justification for their method is clearly explained. Results are presented on two arithmetic pipes of the IBM NorthStar processor (5 instructions and 8 registers). It’s a simple testcase: sims run for only 100 clock cycles and measure just 185 cover points. Nevertheless, the results do show that the DNN-based method is able to hit all the cover points with half as many sims as the baseline implicit filtering method. Nice result.

Raúl’s view

As Paul says, we are revisiting a topic we have covered before. In April 2022 we reviewed a paper by Google which incorporated a Control-Data-Flow-Graph into a neural network. Back in December 2021 we reviewed a paper from U. Gainesville using Concolic (Concrete-Symbolic) testing to cover hard-to-reach branches. This month’s paper introduces a new algorithm for coverage-directed test generation combining test templates, random sampling, and implicit filtering (IF) with a deep neural network (DNN) model. The idea is as follows:

As is common in coverage directed generation, the approach uses test templates, vectors of weights on a set of test parameters that guide random test generation. Implicit filtering (IF) is an optimization algorithm based on grid search techniques around an initial guess to maximize chances to hit a particular event. To cover multiple events, the IF process is simply repeated for each event, called the parameter-after-parameter approach (PP). To speed up the IF process, the data collected during the IF process is used to train a DNN, which approximates the simulator and is much faster than simulating every test vector.

The effectiveness of the algorithms is evaluated employing an abstract high-level simulator of part of the NorthStar processor. Four algorithms are compared: Random sampling, PP, DNN and combining IF and DNN. The results of three experiments are reported:

  1. Running the algorithms with a fixed number of test templates, up to 400 runs. Combining IF and DNN is superior, missing only up to 1/3 of the hard-to-hit events
  2. Running the algorithms until all hard-to-hit events are covered. IF combined with DNN converges with half the number of test templates
  3. Running the last algorithm (IF and DNN) 5 times. All runs converge with a similar number of test templates, even the worst using ~30% fewer test templates than the other algorithms

This is a well-written paper on a relevant problem in the field. It is (almost) self-contained, it is easy to follow, and the algorithms employed are reproducible. The results show a reduction of “the number of simulations by a factor of 2 or so” over implicit filtering. These results are based on one relatively simple experiment, NorthStar. I would have liked to see additional experimentation and results; some can be found in other publications by the authors.


The State of FPGA Functional Verification
by Daniel Payne on 02-15-2023 at 10:00 am


Earlier I blogged about IC and ASIC functional verification, so today it’s time to round that out with the state of FPGA functional verification. The Wilson Research Group has been compiling an FPGA report every two years since 2018, so this marks the third time they’ve focused on this design segment. At $5.8 billion the FPGA market is sizable, and it is forecast to grow to $8.1 billion by 2025. FPGAs started out in 1984 with limited gate capacity and have now grown to include millions of gates, processors and standardized data protocols.

Low-volume applications benefit from the low NRE of FPGA devices, and engineers can quickly prototype their designs by verifying and validating at speed. FPGAs now include processors, like the Xilinx Zynq UltraScale+, Intel Stratix and Microchip SmartFusion. Of the 980 participants in the functional verification study, the FPGA and programmable SoC FPGA design styles are the most popular.

Design Styles

As the size of FPGAs has increased, the chance of a bug-free production release has dropped to just 17%, even worse than the 30% first-silicon success rate of IC and ASIC projects. Clearly, we need better functional verification for complex FPGA systems.

FPGA bug escapes into production

The types of bugs found in production fall into several categories:

  • 53% – Logic or Functional
  • 31% – Firmware
  • 29% – Clocking
  • 28% – Timing, path too slow
  • 21% – Timing, path too fast
  • 18% – Mixed-signal interface
  • 9% – Safety feature
  • 8% – Security feature

Zooming into the largest category of failure, logic or functional, there are five root causes.

Root Causes

Most FPGA projects didn’t complete on time, once again caused by the larger size of the systems, the complexity of the logic, and even the verification methods being used.

FPGA Design Schedules

Engineers on an FPGA team can have distinct titles like design engineer or verification engineer, yet on 22% of projects there were no verification engineers – meaning that the design engineers did double-duty and verified their own IP. Over the past 10 years there’s been a 38% increase in the number of verification engineers on an FPGA project, so that’s progress towards bug-free production.

Number of engineers

Verification engineers on FPGA projects spent most of their time on debug tasks at 47%:

  • 47% – Debug
  • 19% – Creating test and running simulation
  • 17% – Testbench development
  • 11% – Test Planning
  • 6% – Other

The number of embedded processors has steadily grown over time: 65% of FPGA designs now have one or more processor cores, increasing the amount of verification across hardware/software interfaces and the need to manage on-chip networks.

Embedded Processors

The ever-popular RISC-V processor is embedded in 22% of FPGAs, and AI accelerators are used in 23% of projects. FPGA designs average 3-4 clock domains, which require gate-level timing simulations plus static Clock Domain Crossing (CDC) tools for verification.

Security features are added to 49% of FPGA designs to hold sensitive data, plus 42% of FPGA projects adhere to safety-critical standards or guidelines. On SemiWiki we’ve often blogged about ISO 26262 and DO-254 standards. Functional Safety (FuSa) design efforts take between 25% to 50% of the overall project time.

Safety Critical Standards

The top three verification languages are VHDL, SystemVerilog and Verilog, but also notice the recent jumps in Python and C/C++.

Verification Languages

The most popular FPGA methodologies and testbench base-class libraries are Accellera UVM, OSVVM and UVVM. The Python-based cocotb was even added as a new category for 2022.

Verification Methodologies

Assertion languages are led by SystemVerilog Assertions (SVA) at 45%, followed by Accellera Open Verification Library (OVL) at 13% and PSL at 11%. FPGA designs may combine VHDL for RTL design along with SVA for assertions.

Formal property checking is growing amongst FPGA projects, especially as more automatic formal apps have been introduced by EDA vendors.

Formal Techniques

Simulation-based verification approaches over the past 10 years show steady adoption, listed in order of relevance: code coverage, functional coverage, assertions, constrained random.

Summary

The low 17% bug-free number for FPGA projects that made it into production in 2022 was the most surprising figure to me, as recalling or re-programming a device in the field is expensive and time-consuming. A more robust functional verification approach should lead to fewer bug escapes into production, and dividing the study participants into two groups does show the benefit.

Verification Adoption

Read the complete 18 page white paper here.

Related Blogs


Area-optimized AI inference for cost-sensitive applications
by Don Dingee on 02-15-2023 at 6:00 am


Often, AI inference brings to mind more complex applications hungry for more processing power. At the other end of the spectrum, applications like home appliances and doorbell cameras can offer limited AI-enabled features but must be narrowly scoped to keep costs to a minimum. New area-optimized AI inference technology from Expedera is taking on this challenge, targeting 1 TOPS performance in the smallest possible chip area.

Optimized for one model, but maybe not for others

Fitting into an embedded device brings constraints and trade-offs. For example, many teams concentrate on developing the inference model for an application using a GPU-based implementation, only to discover that no amount of optimization will get them anywhere near the required power-performance-area (PPA) envelope.

A newer approach uses a neural processing unit (NPU) to handle AI inference workloads more efficiently, delivering the required throughput in less die size and power consumption. NPU hardware typically scales up or down to meet throughput requirements, often measured in tera operations per second (TOPS). In addition, compiler software can translate models developed in popular AI modeling frameworks like PyTorch, TensorFlow, and ONNX into run-time code for the NPU.
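
As a concrete example of the front half of that flow, a PyTorch model is commonly exported to ONNX before a vendor compiler takes over (a generic sketch; the NPU compiler invocation itself is vendor-specific and omitted):

```python
import torch
import torch.nn as nn

# Tiny stand-in for an inference model destined for an NPU
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()

dummy = torch.randn(1, 3, 224, 224)  # example input fixes tensor shapes
torch.onnx.export(model, dummy, "model.onnx", opset_version=13)
# A vendor toolchain then maps the ONNX graph onto NPU execution units.
```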

Following a long-held principle of embedded design, there’s a strong temptation for designers to optimize their NPU hardware for their application, wringing out every last cent of cost and milliwatt of power. If only a few AI inference models are in play, it might be possible to optimize hardware tightly using a deep understanding of model internals.

Model parameters manifest as operations, weights, and activations, varying considerably from model to model. Below is a graphic comparing several popular lower-end neural network models.
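
To see why these quantities differ so much between models, it helps to count one layer by hand (generic back-of-the-envelope arithmetic with made-up layer dimensions, not Expedera data):

```python
def conv2d_costs(h, w, c_in, c_out, k, stride=1):
    """Weights, output activations, and multiply-accumulates (MACs)
    for one square convolution layer with 'same'-style padding."""
    out_h, out_w = h // stride, w // stride
    weights = k * k * c_in * c_out
    activations = out_h * out_w * c_out
    macs = weights * out_h * out_w
    return weights, activations, macs

# e.g. a 3x3 conv over a 112x112x32 tensor producing 64 channels
w, a, m = conv2d_costs(112, 112, 32, 64, 3)
print(f"{w:,} weights, {a:,} activations, {m:,} MACs")
# 18,432 weights, 802,816 activations, 231,211,008 MACs
```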

On top of these differences sits the neural network topology – how execution units interconnect in layers – adding to the variation. Supporting different models for additional features or modes leads to overdesigning with a one-size-fits-all NPU big enough to cover performance in all cases. However, living with the resulting cost and power inefficiencies may be untenable.

NPU co-design solves optimization challenges

It may seem futile to optimize AI inference in cost-sensitive devices where models are unknown when the project starts, or where more than one model must run for different modes. But is it possible to tailor an NPU more closely to a use case without enormous investments in design time or running the risk of an AI inference model changing later?

Here’s where Expedera’s NPU co-design philosophy shines. The key is not hardcoding models in hardware but instead using software to map models to hardware resources efficiently. Expedera does this with a unique work sequencing engine, breaking operations down into metadata sent to execution units as a packet stream. As a result, layer organization becomes virtual, operations order efficiently, and hardware utilization increases to 80% or more.
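
As a loose illustration of the idea (a toy sketch; the packet fields and scheduling below are invented for illustration and are not Expedera’s actual metadata format), operations can be flattened into a stream of work descriptors that execution units consume in order:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class WorkPacket:       # hypothetical metadata descriptor
    op: str             # e.g. "conv", "relu", "add"
    layer_id: int
    tile: tuple         # which slice of the tensor to process

def sequence(layers) -> Iterator[WorkPacket]:
    """Flatten a layer graph into packets, so layer boundaries become
    virtual and execution units stay busy regardless of topology."""
    for layer_id, (op, tiles) in enumerate(layers):
        for tile in tiles:
            yield WorkPacket(op, layer_id, tile)

for pkt in sequence([("conv", [(0, 0), (0, 1)]), ("relu", [(0, 0)])]):
    print(pkt)
```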

In some contexts, packet-centric scalability unlocks higher performance, but in Expedera’s area-optimized NPU technology, packets can also help scale performance down for the smallest chip area.

Smallest possible NPU for simple models

Customers say a smaller NPU that matches requirements and keeps costs to a minimum can make the difference between having AI inference or not in cost-sensitive applications. On the other hand, a general-purpose NPU might have to be overdesigned by as much as 3x, driving up die size, power requirements, and additional costs until a design is no longer economically feasible.

Starting with its Origin NPU architecture, fielded in over 8 million devices, Expedera tuned its engine for a set of low to mid-complexity neural networks, including MobileNet, EfficientNet, NanoDet, Tiny YOLOv3, and others. The results are the new Origin E1 edge AI processors, putting area-optimized 1 TOPS AI inference performance in soft NPU IP ready for any process technology.

“The focus of the Origin E1 is to deliver the ideal combination of small size and lower power consumption for 1 TOPS needs, all within an easy-to-deploy IP,” says Paul Karazuba, VP of Marketing for Expedera. “As Expedera has already done the optimization engineering required, we deliver time-to-market and risk-reduction benefits for our customers.”

Seeing a company invest in more than just simple throughput criteria to satisfy challenging embedded device requirements is refreshing. For more details on the area-optimized AI inference approach, please visit Expedera’s website.

Blog post: Sometimes Less is More—Introducing the New Origin E1 Edge AI Processor

NPU IP product page: Expedera Origin E1


Interconnect Choices for 2.5D and 3D IC Designs
by Daniel Payne on 02-14-2023 at 10:00 am


A quick Google search for “2.5D 3D IC” returns 669,000 results, so it’s a popular topic for the semiconductor industry, and there are plenty of decisions to make, like whether to use an organic substrate or a silicon interposer for interconnecting heterogeneous semiconductor die. Design teams using 2.5D and 3D techniques soon realize that there are many data formats to consider:

  • GDS – chiplet layout
  • LEF/DEF – Library Exchange Format, Design Exchange Format
  • Excel – ball map
  • Verilog – logic design
  • ODB++ – BGA package
  • CSV – Comma Separated Value

A recent e-book from Siemens provides some much-needed guidance on the challenges of managing the connectivity across the multiple data formats. Source data gets imported into their connectivity management tool, and then each implementation tool receives the right data for analyzing thermal, SI (Signal Integrity), PI (Power Integrity), IR drop, system-level LVS, and assembly checking.

For consistency your design team should use a single source of truth, so that when a design change is made the full system is updated and each implementation tool has the newest input data. The Siemens workflow stays in sync through the system-level LVS approach.

There’s no standard file format between package, interposer and board teams, yet by using ODB++ you can bring package and PCB data into the planning tool, allowing your team to communicate and optimize using any EDA tool. A package designer can move bumps around, and the silicon team can then review the changes using DEF files and accept them.

The largest system-in-package designs can have one million total pins, so your tools need to handle that capacity. Yield on a substrate depends on the accurate placement of vias, via arrays and metal areas. Your substrate or interposer layout tool has to manage the interfaces properly, so make sure to get the foundry or OSAT assembly design kit for optimal results.

From the Siemens tool you have a planning cockpit to graphically and quickly create a virtual prototype of the complete 2.5D/3D package assembly, aka a digital twin. This methodology makes System Technology Co-Optimization (STCO) possible. Making early trade-offs between architecture and technology produces the best results for a new system, using predictive analysis to sort through all the different design scenarios. Predictive analysis validates that the net names are consistent between the die, interposer and package, thus avoiding shorts and opens.
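
At its core, that net-name validation is a set-consistency check across design domains (a hedged sketch; the Siemens system-level LVS flow does far more than this):

```python
def check_net_names(die_nets, interposer_nets, package_nets):
    """Flag nets that do not appear consistently across all three
    domains -- a common source of opens and shorts at assembly."""
    domains = {"die": set(die_nets),
               "interposer": set(interposer_nets),
               "package": set(package_nets)}
    all_nets = set().union(*domains.values())
    return {net: [d for d, nets in domains.items() if net not in nets]
            for net in all_nets
            if any(net not in nets for nets in domains.values())}

print(check_net_names({"VDD", "PCIE_TX0"}, {"VDD", "PCIE_TX0"},
                      {"VDD", "PCIE_TXO"}))
# {'PCIE_TX0': ['package'], 'PCIE_TXO': ['die', 'interposer']} - typo caught
```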

System Technology Co-Optimization

System LVS ensures that all design domains are DRC and LVS clean, validating connections at the package bumps, interposer and die.

Physical verification is required during many steps:

  • Die level DRC and LVS
  • Interposer
  • Package
  • All levels together

The Siemens planning tool does all of this, while keeping the system design correct from start to finish, eliminating late surprises. An equivalence check also needs to be run between the planning tool and the final design.

Using a digital twin methodology your team can now verify that the package system is correct. Early mistakes are quickly caught through verification, like “pins up, pins down”, through an overlaps check between the package, silicon and interposer. Bump locations will also be checked for consistency between package and IC teams. Checks can be run after every change or update, just to ensure that there are no surprises.

Summary

The inter-related teams of IC, package and board can now work together by using a digital twin approach, as offered by Siemens. Not many EDA vendors have years of experience in tool flows for all three of these areas, plus you can add many of your favorite point EDA tools. Collaboration and optimization are possible for the challenges of 2.5D/3D interconnects.

Read the full 14 page e-book from Siemens.

Related Blogs



PCIe 6.0: Challenges of Achieving 64GT/s with PAM4 in Lossy, HVM Channels
by Kalar Rajendiran on 02-14-2023 at 6:00 am


As the premier high-speed communications and system design conference, DesignCon 2023 offered deep insights from various experts on a number of technical topics. In the area of high-speed communications, PCIe has played a crucial role over the years in supporting increasingly higher communications speeds with every new revision. Revision 6.0, the latest revision of this communications interface standard, enables system designers to achieve advances in the deployment of AI inference engines and co-processors in data centers. Consequently, PCIe 6.0 was a hot topic at the conference, not just for the 64GT/s speed but also for understanding the engineering challenges to reliably deliver that speed.

PCIe 6.0 poses a demanding set of chip and system design challenges for engineers. To reliably deliver the full benefits of PCIe 6.0, collaboration and cooperation are needed to standardize specifications in the areas of PCIe cards, cables, connector assemblies, test methods, measurement tools, and PCIe PHY and controller IP. An expert panel discussing these very topics included David Bouse from Tektronix, Rick Eads from Keysight Technologies, Steve Krooswyk from Samtec, Madhumita Sanyal from Synopsys and Timothy Wig from Intel. The panel session was moderated by Pegah Alavi from Keysight Technologies.

Pegah opened the session by highlighting the challenges introduced by multi-level signaling (MLS) when the switch was made from NRZ to PAM4 signaling to support 64GT/s. The adoption of MLS has opened up the path to continue increasing data communications speeds. By mapping more than 1 bit into a transmitted symbol, the required bandwidth per bit is reduced. But MLS introduces many challenges too, which need to be overcome to achieve the speed benefit in a reliable manner.

Under MLS, the signal-to-noise ratio worsens, negatively impacting the performance of the channel. Consequently, close attention must be paid to all aspects of the channel. With that introduction, Pegah set the stage for the panelists to update the audience on their respective areas of focus for delivering a reliable PCIe 6.0 end-user solution. The following is a synthesis of the salient points from the session.
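
The bandwidth-versus-SNR trade behind these points can be quantified with standard signaling arithmetic (a back-of-the-envelope sketch, not material from the panel):

```python
import math

data_rate = 64e9                    # 64 GT/s per lane

# NRZ: 1 bit per symbol -> Nyquist frequency = bit rate / 2
nrz_nyquist = data_rate / 2         # 32 GHz
# PAM4: 2 bits per symbol -> half the symbol rate for the same bit rate
pam4_nyquist = data_rate / 2 / 2    # 16 GHz

# PAM4 eye height is 1/3 of NRZ for the same swing: ~9.5 dB SNR penalty
snr_penalty_db = 20 * math.log10(3)

print(f"NRZ Nyquist {nrz_nyquist/1e9:.0f} GHz, "
      f"PAM4 Nyquist {pam4_nyquist/1e9:.0f} GHz, "
      f"SNR penalty {snr_penalty_db:.1f} dB")
```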

PCIe Card and Cable Form Factor Updates

Rev 6.0 of the PCIe Card Electromechanical (CEM) form factor specification is being finalized in 2023. The Rev 6.0 mechanical updates are completely redefining chassis retention on the North and East vias.

The CEM card physical form factor introduces two new power connectors at 48V to deliver 600W.

A shielded plane/south via approach has been introduced to shield the send signals from the receive signals. Without the shielding plane/south via approach, PCIe 6.0 channels would be completely broken, given known examples of inattentive card layout sabotaging even PCIe 5.0 channels.

Two PCIe cable form factors are being defined. Both these new form factors are distinct from previous PCIe cable solutions. An internal cable form factor is being defined based on the EDSFF-TA-1016 cable system targeting PCIe 5.0 and PCIe 6.0 speeds. An external cable form factor is being defined based on the industry standard CDFP. The Internal PCIe cable form factor has been characterized for a range of connectors and cables from multiple vendors, mounting styles and lengths.

Test Methods and Tools

The PCIe ecosystem is keeping PCIe 7.0 in mind as it defines and develops tools and test methods for PCIe 6.0. After all, the PCIe 7.0 spec (128 GT/s) is just around the corner, expected to arrive in the 2024-2025 time frame. The Tx, Rx and channel compliance requirements are kept in mind as the simulation, test and measurement methods are developed to validate connectors and cable-connector assemblies. Forward Error Correction (FEC) has been introduced in PCIe 6.0, a first for the PCIe interface standard, to accommodate the impact of channel loss.

PCIe v6.0 Retimer

All of the things presented above ensure that the cards, cables, connectors and assemblies are validated to support PCIe 6.0. Depending on the end market and application, a PCIe-based system will deploy different channel topologies leveraging the hardware listed above. Consequently, each channel topology brings its own characteristics that impact channel performance.

The following chart shows four different channel topologies that are commonly found in PCIe-based systems.


From the PCIe PHY perspective, it needs to be able to optimize for all possible channel topologies. Given the reduced insertion loss budget imposed by the PCIe 6.0 specification, the question becomes how to ensure that the signal from the root port reaches the destination port without losing fidelity.

The solution is the introduction of a PCIe 6.0 retimer circuit. PCIe retimers enable the expansion of PCIe over system boards, backplanes, cables, risers and add-in cards, irrespective of the channel topology deployed. A retimer is a physical-layer- and protocol-aware but software-transparent device, and can reside anywhere in the channel between the PCIe root port and endpoint. It fully recovers the data over any channel from the host PCIe root port, extracts the clock and re-transmits the clean data over another channel to the PCIe endpoint device. The retimer solution is implemented in the form of a customized PHY and light controller logic for the MAC.
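
The retimer’s value is easiest to see as a loss-budget calculation (the budget and segment losses below are illustrative assumptions, not PCIe 6.0 spec values):

```python
def channel_ok(segment_losses_db, budget_db):
    """A retimer resets the loss budget: each segment between retimers
    only has to fit the per-link budget on its own."""
    return all(loss <= budget_db for loss in segment_losses_db)

budget = 32.0                # assumed end-to-end insertion loss budget, dB
no_retimer = [40.0]          # root port -> endpoint as one long channel
with_retimer = [22.0, 18.0]  # same channel split by a mid-path retimer

print(channel_ok(no_retimer, budget))    # False: link will not close
print(channel_ok(with_retimer, budget))  # True: both segments fit
```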

Summary

The panelists offered a number of tips, tricks and best practices throughout the session. When DesignCon makes the panelists’ presentation materials available on its website, it would be a good idea to download them as reference material. You may want to reach out to the panelists for more specific, detailed information.

Also Read:

Optimization Tradeoffs in Power and Latency for PCIe/CXL in Datacenters

Synopsys Design Space Optimization Hits a Milestone

Webinar: Achieving Consistent RTL Power Accuracy


Optimization Tradeoffs in Power and Latency for PCIe/CXL in Datacenters
by Daniel Nenni on 02-13-2023 at 10:00 am


PCI Express Power Bottleneck

Madhumita Sanyal, Sr. Technical Product Manager, and Gary Ruggles, Sr. Product Manager, discussed the tradeoffs between power and latency for PCIe/CXL in datacenters during a live SemiWiki webinar on January 26, 2023. The demands on PCIe continue to grow with the integration of multiple components and the challenge of balancing power and latency. The increasing number of lanes, multicore processors, SSD storage, GPUs, accelerators, and network switches have contributed to this growth in demand for PCIe in compute, servers, and datacenter interconnects. Gary and Madhumita provided expert insights on PCIe power states and power/latency optimization. I will cherry-pick a few things that interested me.

Watch the full webinar for a more comprehensive understanding of power and latency for PCIe/CXL in datacenters from Synopsys experts.

Figure 1. Compute, Server, and Data Center Interconnect Devices with Multiple Lanes Hit the Power Ceiling

Reducing Power with L1 & L2 PCIe Power States

In the early days of PCIe, the standard was primarily focused on PCs and servers, emphasizing high throughput. This early standard lacked considerations for what we would now call green or mobile-friendly operation. However, since the introduction of PCIe 3.0, PCI-SIG has placed a strong emphasis on supporting aggressive power savings while continuing to advance performance goals. These power savings are achieved through standard-defined link states. Link states range from L0 (everything on) to L3 (everything off), with intermediate states contributing various levels of power savings. Possible link states continue to be refined as the standard advances.

Madhumita explained that PCIe PHYs are the big power hogs, accounting for as much as 80% of power consumption in a fully-on (L0) state! The lower-power L1 state now includes various sub-states, enabling the deactivation of transceivers, PLLs, and analog circuitry in the PHY. The L2 power state reflects a power-off state with only auxiliary power to support circuitry such as retention logic. L1 (and its sub-states) and L2 are the workhorses for fine-tuning power savings. PCIe 6.0 introduces the optional L0p state, which allows for dynamic power-down of a subset of lanes in a link while keeping the remainder fully active. This reduces the number of active lanes, which lowers the bandwidth, with a simultaneous reduction in power consumption.

With PCIe power states defined, the Synopsys experts delved deeper into the process for the host and device to determine the appropriate link state. A link in any form of sleep state will incur a latency penalty upon waking – known as exit latency – such as when transitioning to an L0 state to support communication with an SSD. To reduce the system impact of this penalty, the standard specifies a latency tolerance reporting (LTR) mechanism which informs the host of the latency tolerance of the device towards an interrupt request, ultimately guiding the negotiation process.
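
Conceptually, the decision reduces to a break-even check against the device’s reported tolerance (a simplified sketch with assumed numbers; real LTR negotiation involves more state than this):

```python
def may_enter_state(exit_latency_us, device_tolerance_us, expected_idle_us,
                    transition_energy_uj, idle_savings_mw):
    """Enter a low-power state only if the device tolerates the wake-up
    delay and the idle period pays back the entry/exit energy cost."""
    if exit_latency_us > device_tolerance_us:   # tolerance reported via LTR
        return False
    breakeven_us = transition_energy_uj * 1000 / idle_savings_mw
    return expected_idle_us > breakeven_us

# e.g. an L1 sub-state: 70 us exit latency, device tolerates 100 us,
# 2 uJ to enter/exit, saving 50 mW while idle -> break-even at 40 us
print(may_enter_state(70, 100, expected_idle_us=5000,
                      transition_energy_uj=2.0, idle_savings_mw=50))  # True
```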

Using Clock-Gating to Reduce Activity

The range of power-saving options in digital logic is well known. I was particularly interested in the use of clock-gating techniques to optimize energy consumption by eliminating wasted clock toggling on individual flops, banks of flops, or even globally for entire blocks. Dynamic voltage and frequency scaling (DVFS) decreases power by reducing operating voltage and clock frequency on functions which can afford to run slower at times. Although DVFS can result in significant power savings, it also adds complexity to the logic. Finally, power gating allows for shutting off both dynamic and leakage power at a block level, except perhaps for auxiliary power to support retention logic.
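
The leverage here follows from the dynamic power relation P = α·C·V²·f, which rewards voltage reduction quadratically (a textbook calculation with assumed values, not numbers from the webinar):

```python
def dynamic_power_mw(c_eff_nf, v_volts, f_mhz, activity=1.0):
    """P = alpha * C * V^2 * f; nF x V^2 x MHz conveniently yields mW."""
    return activity * c_eff_nf * v_volts**2 * f_mhz

base = dynamic_power_mw(1.0, 0.9, 1000)        # full voltage and speed
dvfs = dynamic_power_mw(1.0, 0.7, 600)         # V and f scaled together
gated = dynamic_power_mw(1.0, 0.9, 1000, 0.3)  # clock gating cuts activity

print(f"baseline {base:.0f} mW, DVFS {dvfs:.0f} mW, gated {gated:.0f} mW")
# baseline 810 mW, DVFS 294 mW, gated 243 mW
```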

In addition to these options, there are other techniques such as the use of mixed-VT libraries. Madhumita also expanded on board and backplane considerations in balancing performance vs. power in PCIe 6.0. Lower power can be achieved with shorter channel reaches. For a more comprehensive discussion of these topics, I encourage you to watch the webinar.

Latency in PCIe/CXL: Waiting is the Hardest Part!

Gary Ruggles recommends utilizing optimized embedded endpoints to reduce latency. These endpoints avoid the need for the full PCIe protocol from the host, through a physical connection and again through the full PCIe protocol on the device side. For example, a NIC interface could be embedded directly in the same SoC as the host, connecting to the PCIe switch directly through a low latency interface.

Gary also expanded on using a faster clock to decrease latency, while acknowledging the obvious challenges. A faster clock may require higher voltage levels, leading to increased dynamic power consumption, and higher-speed libraries increase leakage power. However, the tradeoff between clock speed and pipelining is not always clear-cut. Despite the potential increase in power consumption, a faster clock may still result in a performance advantage if the added pipelining latency is outweighed by the reduction in functional latency. Latency considerations also factor into how you plan power states in PCIe. Fine-grained power state management can reduce power usage, but it also results in increased exit latencies, which can become more consequential when managing power aggressively.
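
Gary’s clock-versus-pipelining point can be made concrete with a small latency calculation (cycle counts and frequencies here are illustrative assumptions):

```python
def latency_ns(cycles, clock_mhz):
    """Functional latency = pipeline depth / clock frequency."""
    return cycles * 1000.0 / clock_mhz

slow = latency_ns(cycles=10, clock_mhz=500)       # 20.0 ns
fast = latency_ns(cycles=10 + 4, clock_mhz=1000)  # 14.0 ns

# Doubling the clock forced 4 extra pipeline stages, yet the faster
# clock still wins: the added cycles cost less than the speedup gains.
print(slow, fast)
```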

Gary’s final point on managing latency is to consider the use of CXL. This protocol is built on PCIe, while also supporting the standard protocol through CXL.io. CXL’s claim to fame is support for cache-coherent communication through CXL.cache and CXL.mem. These interfaces offer much lower latency than PCIe. If you need coherent cache/memory access, CXL could be a good option.

Takeaways

Power consumption is a major concern in datacenters. The PCIe standard makes allowance for multiple power states to take advantage of opportunities to reduce power in the PHY and in the digital logic. Taking full advantage of the possibilities requires careful tradeoffs between optimization for latency, power, and throughput, all the way from software down to the PCIe physical layer. When suitable, CXL proves to be a promising solution, offering much lower latency compared to conventional PCIe.

Naturally, Synopsys has production IP for PCIe (all the way up to Gen 6) and for CXL (all the way to CXL 3.0).

You can watch the webinar HERE.

Also Read:

PCIe 6.0: Challenges of Achieving 64GT/s with PAM4 in Lossy, HVM Channels

How to Efficiently and Effectively Secure SoC Interfaces for Data Protection

ARC Processor Summit 2022 Your embedded edge starts here!


Big plans for state-of-the-art RF and microwave EDA
by Don Dingee on 02-13-2023 at 6:00 am


RF and microwave design is no longer confined to a few defense and aerospace EEs huddled in dark cubicles working with spreadsheets and primitive circuit simulators. Now, areas like 5G and automotive demand complex RF systems. Advanced RF and microwave EDA tools are taking on electromagnetic (EM), thermal, and power simulation, and everyone from systems engineering to foundry partners touch the workflow.

Quantitatively, the opportunity is drawing interest rapidly. Market analyst Yole Développement pegs CAGR for RF front-end components at 8.3% through 2026. Digital EDA companies are scrambling to incorporate RF-aware technology into their EDA mix – and finding they need many different pieces, not just point products, for a complete RF design workflow.

As a long-time leader in the RF and microwave design and test business, Keysight has proven tools and experience customers have counted on for decades. Its state-of-the-art RF/microwave EDA solutions will drive integration, openness, and scalability for anyone looking to innovate – even designers and systems engineers from digital-first backgrounds with less RF knowledge. Nilesh Kamdar, Senior Director and Portfolio Manager for RF/Microwave, Power Electronics & Device Modeling products, sat down to give us a sense of what’s coming in 2023.

On a journey together with customers to first-pass success

Kamdar sees his business and its mission with customers hinging on collaboration and trust, going back to its HP EEsof origin story four decades ago. “We’ve been #1 in RF/microwave EDA for decades because our customers trust us,” he says. “Now, it’s time to expand that trust and offer solutions that take existing customers – and new ones – into the future.” Keysight portfolio teams meet regularly with Tier 1 customers, and their list of pain points is weighty.

  • Time-to-market is shrinking, especially where consumer life cycles dominate
  • Complexity is also growing exponentially, with more integration, advanced packaging and foundry processes, and higher expectations for user experience
  • Packing more into less space is causing multi-domain interactions, detuning RF performance, and putting pressure on managing heat and power consumption
  • Scale is now immense, with billions of devices produced from a single design
  • Open environments and platforms are redefining boundaries in ecosystems and workflows

Maybe it’s stating the obvious, but Kamdar says another theme for Keysight is design to verification to manufacturing. “We help people simulate it. We help people build it. We help people measure it. No other RF/microwave EDA company really offers all three phases,” he observes. The link between advanced measurement science in virtual and physical space, giving the same results any way a user chooses to work, is unique in the industry. In RF and microwave design, the destination counts, and Keysight gives customers the best chance of first-pass success with fewer schedule-killing hardware re-spins.

Doing more with automation, interoperability, and simulation

There are also a few unknowns lurking beneath the surface. One is the “talent shortage.” It’s a case of not having the right talent with the right RF EDA tools at the right moment in the workflow. It could show up on teams spread across different departments, facilities, continents, or organizations working together.

If a digital design is needed, many people and mature EDA tools can handle even complex designs on advanced processes with smooth handoffs. But introduce mixed-signal technology – CPU, memory, and RF in the same chipsets – and it’s a different game. “Designs can cross technology domains, with roundtrip loops between tools and people for changes and approvals,” notes Daren McClearnon, RF and Microwave Product Marketing Lead at Keysight. “Co-design with less RF-centric people requires another level of interoperability, or else it can degenerate to ‘trick-or-treating’ manually around an organization, trying to close design issues.”

To Kamdar’s earlier point, this looks different to an RF EDA install base versus a prospect who hasn’t embraced the right tools for one of several reasons. “We like it when our customers surface their challenges, and Keysight usually has more to offer them,” Kamdar says. Fundamental changes like using industry-standard file formats, or incremental changes like scripting a frequently-used procedure, can have a big payoff for customers.

RF design prospects face what they think are tougher decisions. Changing a workflow can be painful, and learning curve costs exist. Kamdar boils it down to one question: are your existing solutions achieving your goals? He suggests it’s not always about switching tools per se but more about bringing in a tool that integrates with the mix of tools in service and delivers value without disrupting workflows and adding extra steps.

Kamdar says people know Keysight for RF and 5G design and electromagnetic simulation technology but not so much for other solutions in the portfolio, like thermal simulation, packaging design, and multi-physics analysis. Trust is vital, and Kamdar wants more prospects to experience what customers already see with real-world simulation accuracy. But the emphasis on automating processes and making everything interoperable to help the talent shortfall is equally important in the Keysight 2023 strategy.

Three areas where new Keysight RF and microwave EDA innovation is coming

Kamdar walked us through three focused areas for RF EDA innovation his R&D teams are aggressively pursuing, with rollouts expected throughout 2023.

  • Multi-technology and Open Platforms. Streamlining physical co-design and verification in an all-Keysight environment or in workflows paired with Cadence, Synopsys, and other EDA tools is a top priority. There are also ongoing improvements in foundry PDK offerings, with developments coming from several new or enhanced foundry and semiconductor partnerships.
  • 6G and mmWave Technology Leadership. Keysight is deeply committed to active participation in specification development for 5G-Advanced and 6G and supporting early-stage research working with customers. Improvements to the core EM simulation engine in several product lines target densification challenges, pushing state-of-the-art forward.
  • Enterprise Scale and Transformation. Cloud and high-performance computing platforms are having enterprise-wide productivity impacts. Keysight is bringing them to bear on RF and microwave engineering, scaling up for peak demands at critical moments in the development life cycle. The “all Python, all the time” message drives enhancements for scripted automation of repetitive tasks.

Looking further ahead, Kamdar also sees a more prominent role for AI/ML in modeling and simulation. A decade of Keysight AI/ML research is starting to weave its way into its RF EDA solutions. One exciting application for artificial neural network (ANN) technology is the datasheet curve-to-model work from Alex Petr’s team.

Kamdar concludes that with in-person tradeshows restarting, he and his team are excited to get out and see customers face-to-face again. Keysight’s RF and microwave EDA vision and latest announcements will be on full display this year at several major industry events, including IEEE’s IMS 2023 in San Diego in June and the 60th DAC in San Francisco in July. On the virtual event front, Keysight will speak on February 15th in an online panel moderated by Microwave Journal, pairing with Analog Devices, featured in a new Keysight case study on reference designs for RF front ends.


Microwave Journal Online Panel:
What is the Best Beamsteering Antenna Array and Repeater Technologies for 5G mmWave?

Also Read:

Higher-order QAM and smarter workflows in VSA 2023

Advanced EM simulations target conducted EMI and transients

Seeing 1/f noise more accurately


Podcast EP143: FPGAs, eFPGAs and the Emerging Chiplet Market
by Daniel Nenni on 02-10-2023 at 10:00 am

Dan is joined by Nick Ilyadis, Senior Director of Product Planning at Achronix. With over 35 years of data and semiconductor engineering and manufacturing experience and 72 issued patents under his name, Nick is a recognized expert on software and hardware development and quality control.

Dan explores the emerging chiplet market with Nick. The impact of standards, advanced packaging challenges and how and why to assemble a multi-die chiplet-based system are discussed. The application of chiplets in FPGA and eFPGA applications is also explored.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.

Achronix Semiconductor Corporation is a fabless semiconductor corporation based in Santa Clara, California, offering high-end FPGA-based data acceleration solutions, designed to address high-performance, compute-intensive and real-time processing applications. Achronix is the only supplier to have both high-performance and high-density standalone FPGAs and licensed eFPGA IP solutions. Achronix Speedster®7t FPGA and Speedcore™ eFPGA IP offerings are further enhanced by ready-to-use VectorPath™ accelerator cards targeting AI, machine learning, networking and data center applications. All Achronix products are fully supported by the Achronix Tool Suite which enables customers to quickly develop their own custom applications.


Dr. Anirudh Devgan Elected to The National Academy of Engineering (NAE)
by Daniel Nenni on 02-10-2023 at 6:00 am


Having known many of the top EDA CEOs during my semiconductor tenure, the common traits I have found are brilliance, humility, endurance, and a sharp sense of humor. EDA solves so many problems, complex problem after complex problem, that it takes teams of incredibly smart people to solve them. Even more difficult is leading these teams. Following in the footsteps of great Cadence CEOs Joe Costello and Lip-Bu Tan, Dr. Anirudh Devgan has already made his place in EDA history, absolutely.

Founded in 1964, the U.S. National Academy of Engineering is a private, independent, nonprofit institution that provides engineering leadership in service to the nation. Its mission is to advance the welfare and prosperity of the nation by providing independent advice on matters involving engineering and technology, and by promoting a vibrant engineering profession and public appreciation of engineering.

National Academy of Engineering Elects 106 Members and 18 International Members

FOR IMMEDIATE RELEASE

TUE, FEBRUARY 07, 2023

Washington, D.C., February 07, 2023 —

The National Academy of Engineering (NAE) has elected 106 new members and 18 international members, announced NAE President John L. Anderson today. This brings the total U.S. membership to 2,420 and the number of international members to 319.

Election to the National Academy of Engineering is among the highest professional distinctions accorded to an engineer. Academy membership honors those who have made outstanding contributions to “engineering research, practice, or education, including, where appropriate, significant contributions to the engineering literature” and to “the pioneering of new and developing fields of technology, making major advancements in traditional fields of engineering, or developing/implementing innovative approaches to engineering education.” Election of new NAE members is the culmination of a yearlong process. The ballot is set in December and the final vote for membership occurs during January.

Individuals in the newly elected class will be formally inducted during the NAE’s annual meeting on Oct. 1, 2023. A list of the new members and international members follows, with their primary affiliations at the time of election and a brief statement of their principal engineering accomplishments.

New Members:

Devgan, Anirudh, president and CEO, Cadence Design Systems, San Jose, Calif. For technical and business leadership in the electronic design automation industry.

As we all know, Anirudh is not only President and CEO of Cadence Design Systems, Inc., but also a member of the Board of Directors. Prior to becoming CEO in 2021, he was President of Cadence and Executive Vice President and General Manager of the Digital & Signoff and System Verification groups. Prior to joining Cadence in 2012, Anirudh was with Magma Design Automation, and earlier held management and technical roles at the IBM Thomas J. Watson Research Center, IBM Microelectronics Division, and IBM Austin Research Lab.

What you may not know is that Anirudh successfully pioneered the application of massively parallel and distributed architectures to create several industry firsts and some of the most impactful products in SPICE simulation, library characterization, place and route, static timing, power and electromagnetics, among other areas. He also drove the first common compiler architecture for emulation and prototyping platforms.

As with other notable EDA CEOs, Anirudh has a collection of notable associations and awards: he is an IEEE Fellow, holds 27 US patents, and has received the Phil Kaufman Award for his extensive contributions to EDA, as well as the IBM Corporate Award and the IEEE McCalla Award. He serves on the boards of the Global Semiconductor Alliance and the Electronic System Design Alliance.

So, congratulations Anirudh, it is a pleasure working with you and thank you very much for your contributions to EDA!

Also Read:

2022 Retrospective. Innovation in Verification

Validating NoC Security. Innovation in Verification

Functional Safety for Automotive IP