Podcast EP250: The Inner Workings of RISC-V International and Details About the RISC-V Summit with Andrea Gallo
by Daniel Nenni on 10-04-2024 at 10:00 am

Dan is joined by Andrea Gallo, Vice President of Technology at RISC-V International. Andrea heads up the Technical Activities in collaboration with RISC-V members across workgroups and committees, growing the adoption of the RISC-V Instruction Set Architecture. Prior to RISC-V International, Andrea held multiple roles at Linaro developing Arm-based solutions. Before Linaro, he was a Fellow at ST-Ericsson working on smartphones and application processors, and before that he spent 12 years at STMicroelectronics.

Andrea explains the structure and operation of the RISC-V International organization. This ambitious effort includes 70 working groups that each meet monthly. Andrea attends many of these meetings to ensure good collaboration and to maximize the innovation and impact for all the RISC-V members.

Andrea also describes the upcoming RISC-V Summit. The rich program includes tutorials, member meetings, the popular hackathon, exhibits, a large number of presentations and keynotes from industry leaders, and more.

The RISC-V Summit will take place October 21-23, 2024 in Santa Clara. There are still reduced rate registrations available. You can learn more about the conference and register here.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Nikhil Balram of Mojo Vision
by Daniel Nenni on 10-04-2024 at 6:00 am

Dr. Nikhil Balram has over 25 years of experience in the semiconductor and display industries. Past executive roles include: CEO of EyeWay Vision Inc., a startup developing immersive AR glasses; Head of the Display Group at Google, responsible for developing display systems for all Google consumer hardware, including AR and VR; CEO of Ricoh Innovations Corporation; VP and GM of the Digital Entertainment BU at Marvell; and CTO of the Display Group at National Semiconductor.

He has received numerous awards including the Otto Schade Prize from the Society for Information Display (SID) and a Gold Stevie® Award for Executive of the Year in the Electronics category. Dr. Balram is recognized as a Fellow of the SID and was General Chair for Display Week 2021 and Program Chair for Display Week 2019. Dr. Balram received his B.S., M.S. and Ph.D. in electrical engineering from Carnegie Mellon University, and has served on the faculty of three major universities.

Tell us about your company?

Mojo Vision is developing and commercializing high-performance micro-LED display technology for consumer, enterprise, and government applications. The company combines breakthrough technology, leading display and semiconductor expertise, and an advanced 300mm manufacturing process to deliver on the promise of micro-LED displays. Mojo’s proprietary quantum-dot (QD) technology brings full color capability to our display platform and meets the performance demands for all form factors. Mojo Vision developed the world’s smallest, densest dynamic display for the first augmented reality (AR) smart contact lens and is now applying this innovation and expertise to lead the disruption of the $160B+ display industry. Our beachhead market is AR and we are laser focused on supporting big tech companies with our microdisplay development. 

What problems are you solving?

There are several problems that we are solving, but for conciseness I will focus on two critical ones for our beachhead AR customers: brightness and efficiency. A big problem today for our customers is efficient generation of light. Only a small percentage of the original light input is transmitted in AR glasses, which means that AR requires an extremely high level of brightness, particularly to be effective in sunlight. Brightness levels need to start at one million candelas per square meter (cd/m²); with that amount of light, the conventional quantum dots used in TVs degrade significantly. For TV applications, QDs face a light flux of 4 to 8 milliwatts per square centimeter, but in AR applications, QDs face between 4 and 8 watts per square centimeter, a thousand times more! At Mojo, we created red and green QDs that solve this lifetime issue for micro-LED applications. For example, we published results from testing a red QD film at a power density of 4 watts per square centimeter, 1,000 times more than a TV, and our red QD film showed flat emission with no degradation for 500 hours and took thousands of hours to degrade to the 80 percent emission level that is sometimes used as a measure of lifetime. That meets initial benchmarks from AR customers; it's worth noting that a conventional red QD film degraded to 60 percent emission in only 80 hours in the same test setup.

What application areas are your strongest?

As a component supplier, we don’t necessarily have end applications; rather, we support our customers who are building products for their customers, i.e. the end-users. We believe our micro-LED technology will be a platform that serves many different market segments – augmented reality, light field displays, smart wearables, smartphones, televisions, laptops, automotive heads-up displays, high-speed data connectivity in data centers, 3D printing, the list abounds! Any application that needs really tiny, highly bright, and incredibly efficient light sources can benefit from our technology. I mentioned earlier that our beachhead market is AR. For AR to truly scale, the form factor needs to look and feel like the eyeglasses most people wear today, and the industry continues to push for smaller, less obtrusive headsets and smart glasses. This is where we think our micro-LED technology adds the most immediate value and offers a significant advantage over current display technologies like OLED and LCoS (Liquid Crystal on Silicon). The 4µm pixel size in Mojo’s current generation of monolithic RGB panels is critical to enabling a lightweight, compact display, which will be key to making glasses fully ‘smart’ without sacrificing visual appeal and comfort.

What keeps your customers up at night?

It varies depending on the specific market our customer serves. For those in the AR market, the competitive landscape is intense and constantly evolving. These customers are concerned with staying ahead of their tech rivals by integrating cutting-edge technology and offering value to their end-users. Balancing the right tradeoffs of form factor and performance is a critical worry. The market also has some uncertainty, and the AR hype cycle has left many investors and end users cautious.  Customers need to ensure their devices are not only innovative and scalable but also reliable and widely accessible to gain a foothold in this nascent market.

For those in the mass display market (e.g. TVs, laptops, etc.), the main factors keeping them up at night are the pressure of strong competition and aggressive cost management. In a sector characterized by thin margins and high volume, the race to offer the best price-performance ratio is always on. These customers are constantly seeking ways to reduce product costs while maintaining the highest standards of display quality. The need to innovate and differentiate their products without significantly increasing cost is a delicate balancing act. 

What does the competitive landscape look like and how do you differentiate?

The competitive landscape in the display market is both dynamic and challenging with a number of strong, established incumbents in traditional display technology (e.g. LCD) and a growing number of players in micro-LED display technology. Micro-LED companies are racing to overcome technical hurdles, achieve mass production, and deliver displays that surpass existing technologies in terms of brightness, efficiency, and color accuracy.

Mojo Vision stands out in the field through several key differentiators:

  • High Performance Quantum Dot (HPQD): We use proprietary QD technology to provide vivid color and high brightness with high reliability. We own the entire end-to-end process for QDs – making, integration, testing – and effectively have a QD company nested within Mojo Vision!
  • Stable Supply Chain: In an industry where supply chain disruptions can significantly impact production timelines and costs, our reliable supply chain offers a distinct advantage. We have established strong partnerships and a geopolitically stable supply chain, which has become a requirement for many large customers in the US and Europe.
  • Full Display System Expertise: Unlike many competitors who focus solely on certain elements of a display, we have comprehensive expertise in the entire display system. This holistic approach allows us to optimize every aspect of the display system, from CMOS backplanes to tiny LEDs to custom microlens arrays.  
  • Very Tiny, Very Efficient LEDs: Our micro-LEDs are not only incredibly small (much smaller than a red blood cell!) but also highly efficient. This combination results in displays that are more energy-efficient and capable of delivering superior performance in compact form factors. 

By focusing on these differentiators, we provide our customers with cutting-edge micro-LED displays that meet the highest standards of quality, performance, and size, helping them stay ahead in a highly competitive market.

What new features/technology are you working on?

As a startup, we must prioritize our resources and not “boil the ocean,” so we are very focused on our beachhead market of AR, bringing a suite of microdisplay products to market next year. These microdisplays also have applicability to other markets such as light field displays and automotive head-up displays (HUDs). We just announced a partnership with HUD company CY Vision to develop HUDs with micro-LED technology for the automotive industry. These HUDs will leverage artificial intelligence and 3D imaging to provide drivers an immersive and personalized driving experience with informative, line-of-sight overlays that promote driver safety and provide essential information.

At the same time, we are working to develop and validate our concept of Red, Green, and Blue (RGB) chiplets. Mojo’s innovation here will enable cost-effective large-format display production by significantly increasing the number of LEDs per wafer and simplifying the mass transfer process. The traditional mass transfer process is complex, requiring LEDs from separate red, green, and blue wafers to be transferred to an intermediate substrate and then to a final substrate. Our single RGB wafer with tiny RGB chiplets results in 3x fewer transfers per display and 10x+ more pixels per wafer, which means much lower costs.

How do customers normally engage with your company?

We engage directly with customers throughout the year via the deep connections established through our individual experiences in the semiconductor, display, and AR/XR industries. Our focus is on customers who are leaders in their respective segments, rather than trying to engage with everyone. We also give presentations at several industry conferences, including keynotes, tutorial seminars, panel discussions, and invited papers and articles, that keep our customers, partners, and competitors informed of our industry-leading progress.

Also Read:

CEO Interview: Doug Smith of Veevx

CEO Interview: Adam Khan of Diamond Quanta

Executive Interview: Michael Wu, GM and President of Phison US


The Immensity of Software Development and the Challenges of Debugging (Part 3 of 4)
by Lauro Rizzatti on 10-03-2024 at 10:00 am

Part 3 of this 4-part series analyzes methods and tools involved in debugging software at different layers of the software stack.

Software debugging involves identifying and resolving issues ranging from functional misbehaviors to crashes. The essential requirement for validating software programs is the ability to monitor code execution on the underlying processor(s).

Software debugging practices and tools vary significantly depending on the layer of the software stack being addressed. As we move up the stack from bare-metal software to operating systems and finally to applications, three key factors undergo significant changes:

  1. Lines of Code (LOC) per Task: The number of lines of code per task increases substantially as we move up the stack.
  2. Computing Power (MIPS) Requirements: The computing power needed to execute software within a feasible timeframe for debugging grows exponentially.
  3. Hardware Dependency: The dependency on underlying hardware decreases as we ascend the software stack. Bare-metal software is highly hardware-dependent, while applications are typically hardware-independent.

Additionally, the skills required of software developers vary considerably depending on the specific software layer they are working on. Lower-level software development often necessitates a deep understanding of hardware interactions, making it well-suited for firmware developers. In contrast, operating system (OS) development demands the expertise of seasoned software engineers who should collaborate closely with the hardware design team to ensure seamless integration. At the application software layer, the focus shifts toward logic, user experience, and interface design, requiring developers to prioritize user interaction and intuitive functionality.

Table I below summarizes these comparisons, highlighting the differences in software debugging requirements across various layers of the software stack.

Table I: Comparison of three key software attributes along the software stack.

Effective software debugging is a multidimensional challenge influenced by a variety of factors. The scale of the software program, the computational resources available for validation, and the specific hardware dependencies all play critical roles in determining the optimal tools and methodologies for the task.

Software Debug at the Bottom of the Software Stack

The bare-metal software layer sits between the hardware and the operating system, allowing direct interaction with the hardware without any operating system intervention. This layer is crucial for systems that demand high performance, low latency, or have specific hardware constraints.

Typically, the bare-metal layer includes the following components:

  1. Bootloader: Responsible for initializing the hardware and setting up the system to ensure that all components are ready for operation.
  2. Hardware Abstraction Layer (HAL): A comprehensive set of APIs that allow the software to interact with hardware components. This layer enables the software to work with the hardware without needing to manage low-level details, providing a simplified and consistent interface.
  3. Device Drivers: These software components initialize, configure, and manage communication between software and hardware peripherals, ensuring seamless interaction between different system parts.

Prerequisites to Perform Software Validation at the Bottom of the Software Stack

When validating software at the lower levels of the software stack, two key prerequisites must be considered.

First, processing software code that goes beyond simple routines requires a substantial number of clock cycles, often numbering in the millions. This can be efficiently handled by virtual prototypes or hardware-assisted platforms, such as emulators or FPGA prototypes.

Second, the close interdependence of hardware and software at this level necessitates a detailed hardware description, typically provided by RTL. This is where hardware-assisted platforms excel. However, for designs modeled at a higher level than RTL, virtual prototypes can still be effective, provided the design is represented accurately at the register level.

Processor Trace for Bare-Metal Software Validation

Processor trace is a widely used method for software debugging that involves capturing the activity of a CPU or multiple CPUs non-intrusively. This includes monitoring memory accesses and data transfers with peripheral registers and sending the captured activity to external storage for analysis, either in real-time or offline, after reconstructing it into a human-readable form.

In essence, processor trace tracks the detailed history of program execution, providing cycle counts for performance analysis and global timestamps for correlating program execution across multiple processors. This capability is essential for debugging software coherency problems. Processor trace offers several advantages over traditional debugging methods like JTAG, including minimal impact on system performance and enhanced scalability.

However, processor trace also presents some challenges, such as accessing DUT (Device Under Test) internal data, storing large amounts of captured data, and the complexity and time-consuming nature of analyzing that data.

DUT Data Retrieval to External Storage

Retrieving DUT internal data in a hardware-assisted platform can be achieved through an interface consisting of a fabric of DPI-based transactors. This mechanism is relatively simple, adds little overhead, and only marginally impacts execution speed. The state of any register and net can be monitored and saved to external storage. As the design grows larger and the run-time extends, the volume of retrieved data grows rapidly.

Despite efforts to standardize the format of the collected data, there is currently no universal format, which poses a challenge for analysis. We must also acknowledge that DUT architectures such as x86, RISC-V, and Arm are too fundamentally different to ever allow full standardization.

In summary, even with these challenges, processor trace has been in use for many years and is broadly adopted across modern processors from major vendors such as Arm and the RISC-V ecosystem. Because Arm is a single vendor, standardization has been easier to come by; RISC-V, on the other hand, is open source and multi-vendor.

Arm TARMAC & CoreSight

Arm TARMAC and CoreSight are complementary Arm technologies for debugging and performance analysis.

TARMAC is a post-execution analysis tool capturing detailed instruction traces for in-depth investigations. It records every executed instruction, including register writes, memory reads, interrupts, and exceptions in a textual format. It generates reports and summaries based on the trace data, such as per-function profiling and call trees. This allows developers to replay and analyze the sequence of events that occurred during program execution.

CoreSight is an on-chip solution providing real-time visibility into system behavior without halting execution. It provides real-time access to the processor’s state, including registers, memory, and peripherals, without stopping the CPU. Table II compares Arm TARMAC vs CoreSight.

Table II: Comparison of Arm CoreSight versus TARMAC.

In essence, CoreSight is the hardware backbone that enables the generation of trace data, while Arm Tarmac is the software tool that makes sense of that data.

RISC-V E-Trace

Figure 1: Verdi provides a unified HW/SW view for efficient debug of the interactions between the two domains. Source: Synopsys

E-Trace is a high-compression tracing standard for RISC-V processors. By focusing on branch points rather than every instruction, it significantly reduces data volume, enabling multi-core tracing and larger trace buffers. This is especially beneficial to trace multiple cores simultaneously and store larger trace histories within fixed-size buffers. E-Trace is useful for debugging custom RISC-V cores with multiple extensions and instructions, ensuring that all customizations work correctly. It also supports performance profiling and code coverage analysis.

Synopsys Verdi Hardware/Software Debug

Verdi HW/SW Debug provides a unified view of hardware and software interactions. By synchronizing software elements (C code, assembly, variables, registers) with hardware aspects (waveforms, RTL, assertions), it enables seamless navigation between the two domains. This integrated approach facilitates efficient debugging by correlating software execution with hardware behavior, allowing users to step through code and waveforms simultaneously and pinpoint issues accurately. See Figure 1.

Synopsys ZeBu® Post-Run Debug (zPRD)

ZeBu Post-Run Debug (zPRD) is a comprehensive debugging platform that supports efficient and repeatable analysis. By decoupling the debug session from the original test environment, zPRD accelerates troubleshooting by allowing users to deterministically recreate any system state. It simplifies the debugging process by providing a centralized control center for common debugging tasks like signal forcing, memory access, and waveform generation. Leveraging PC resources, zPRD optimizes waveform creation for faster analysis.

Moving Up the Software Stack: OS Debug

Operating systems consist of a multitude of software programs, libraries, and utilities. While some components are larger than others, collectively they demand billions of execution cycles, with hardware dependencies playing a crucial role.

For debugging an operating system when hardware dependencies are critical, the processor trace method is still helpful. However, this approach, while effective, becomes more complex and time-consuming when dealing with the largest components of an OS.

GNU Debugger

Among the most popular C/C++ software debugging tools in the UNIX environment is GDB (GNU Debugger). GDB is a powerful command-line tool used to inspect and troubleshoot software programs as they execute. It is invaluable for developers to identify and fix bugs, understand program behavior, and optimize performance.

GDB’s key features include:

  • Setting breakpoints: Pause program execution at specific points to inspect variables and program state.
  • Stepping through code: Execute code line by line to understand program flow.
  • Examining variables: Inspect the values of variables at any point during execution.
  • Backtracing: Examine the function call stack to understand how the program reached a particular point.
  • Modifying variables: Change the values of variables on the fly to test different scenarios.
  • Core dump analysis: Analyze core dumps to determine the cause of program crashes.
  • Remote Debugging: GDB can debug programs running on a different machine than the one it is running on, which is useful for debugging embedded systems or programs running on remote servers.

GDB can be employed to debug a wide range of issues in various programming languages. Among common use cases are:

  • Segmentation faults: These occur when a program tries to access memory it doesn’t own. GDB can help pinpoint the exact location where this happens.
  • Infinite loops: GDB can help you identify code sections that are looping endlessly.
  • Logical errors: By stepping through code line by line, you can examine variable values and program flow to find incorrect logic.
  • Memory leaks: While GDB doesn’t have direct tools for memory leak detection, it can help you analyze memory usage patterns.
  • Core dumps: When a program crashes unexpectedly, a core dump is generated. GDB can analyze this dump to determine the cause of the crash.
  • Performance bottlenecks: By profiling your code with GDB, you can identify sections that are consuming excessive resources.
  • Debugging multi-threaded programs: GDB supports debugging multi-threaded applications, allowing you to examine the state of each thread.

GDB is an effective debugging tool for software developers, especially those working with low-level or performance-critical code.

At the Top of the Software Stack: Application Software Debug

Application software spans a wide range of complexity and execution time. Some applications execute within a few million cycles, while others run all the way to billions of cycles. All demand efficient development environments. Virtual prototypes offer near-silicon execution speed, making them ideal for pre-silicon software development.

A diverse array of debuggers serves different application needs, operating systems, programming languages, and development environments. Popular options include GDB, Google Chrome DevTools, LLDB, Microsoft Visual Studio Debugger, and Valgrind.

To further streamline development, the industry has adopted Integrated Development Environments (IDEs), which provide a comprehensive platform for coding, debugging, and other development tasks.

IDEs: Software Debugger’s Best Friend

An Integrated Development Environment (IDE) is a software application that streamlines software development by combining essential tools into a unified interface. These tools typically include a code editor, compiler, debugger, and often additional features like code completion and version control integration. By consolidating these functionalities, IDEs enhance developer productivity, reduce errors, and simplify project management. Available as both open-source and commercial products, IDEs can be standalone applications or part of larger software suites.

Further Software Debugging Methodology and Processes

Error prevention and detection are integral to software development. While debugging tools are essential, they complement a broader range of strategies and processes aimed at producing error-free code.

Development methodologies such as Agile, Waterfall, Rapid Application Development, and DevOps offer different approaches to project management, each with its own emphasis on quality control.

Specific practices like unit testing, code reviews, and pair programming are effective in identifying and preventing errors. Unit testing isolates code components for verification. Code reviews leverage peer expertise to catch oversights. Pair programming fosters real-time collaboration and knowledge sharing.

By combining these strategies with debugging tools, developers can significantly enhance software quality and reliability.

Conclusion

Debugging is an integral part of the software development process that spans the entire software stack, from low-level firmware to high-level application software. Each layer presents unique challenges and requires specialized tools and techniques.

In low-level debugging, understanding hardware interactions and system calls is crucial. Tools like processor trace help developers trace issues at this foundational level. This is where users tend to be comfortable with register models, address maps, memory maps etc. Moving up the stack, debugging becomes more abstract, involving memory management, API calls, and user interactions. Here, debuggers like GDB and integrated development environments (IDEs) with built-in debugging tools prove invaluable. The user in this space is more comfortable with the APIs provided by the OS or the application. They are dependent on hardware or firmware engineers to identify issues in the lower levels of the stack.

During the pre-silicon phase, all software debugging tools rely on the ability to execute the software on a fast execution target, whether a virtual prototype, an emulator, or an FPGA-based prototype. Besides the performance of the underlying pre-silicon target, the flexibility and ease of extracting different types of debug data for the different software stack levels drive debug productivity. With more and more workloads moving to emulation and prototyping platforms, the user community is asking for even more help in debugging their environments and system issues. However, there is a delicate balance to strike: the debuggability and performance of such a platform are inversely related.

Looking forward, the evolution of debugging tools and methodologies is expected to embrace machine learning and AI to predict potential bugs and offer solutions, thereby transforming the landscape of software debugging.

Also Read:

The Immensity of Software Development and the Challenges of Debugging (Part 1 of 4)

The Immensity of Software Development and the Challenges of Debugging Series (Part 2 of 4)


SystemVerilog Functional Coverage for Real Datatypes
by Mariam Maurice on 10-03-2024 at 6:00 am

Functional coverage acts as a guide to direct verification resources by identifying the tested and untested portions of a design. Functional coverage is a user-defined metric that assesses the extent to which the design specification, as listed by the test plan’s features, has been used. It can be used to estimate the presence of intriguing scenarios, corner cases, specification invariants, or other relevant design conditions inside the test plan’s features. Functional coverage is fully specified by the user or from a system point of view to cover all the typical scenarios and corner cases. Therefore, it requires more up-front effort (the verification engineer must write the coverage model). Moreover, it requires a more structured approach for the verification to shorten the overall verification effort and yield higher-quality designs. It can cover any value of a signal anywhere in the modeled DUT.

This article examines coverage models for the “real” datatype through actual analog devices modeled using SystemVerilog Real Number Modeling (SV-RNM). The devices we used are phase-locked loops (PLLs), analog-to-digital converters, and digital-to-analog converters, but they could be any modeled analog devices. The article also shows how a simulator, in this case Questa from Siemens EDA, is used to divide the automatic bins and run the covergroups based on the coverage options that are supported for the “real” datatype. We used the Siemens Visualizer debug environment to clarify how users can debug the simulated values. This was especially important because the number of real-valued bins was huge and needed to be visualized to ensure that the desired values are captured within the modeled design.

Execution of Coverage Metrics Using a Simulator

A. Cover an interval of an electrical signal

Here we seek to cover an interval of an electrical signal from its low to high amplitude value.

Listing 1: Covergroup to cover an interval of an electrical signal.
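
As a rough illustration (not the article’s actual listing), a covergroup along these lines might look like the sketch below. The names V_COV, vout_load, v, and the real_interval type option come from the text; the 0.1 interval width and the -1.0 to 1.0 amplitude range are assumptions for illustration only.

```systemverilog
// Sketch only: names follow the article (V_COV, vout_load, v); the 0.1 interval
// and the -1.0 to 1.0 amplitude range are illustrative assumptions.
module listing1_sketch;
  real vout;   // modeled electrical signal (e.g., output of an SV-RNM device)

  covergroup V_COV with function sample(real vout_load);
    type_option.real_interval = 0.1;          // width of each automatically created bin
    coverpoint vout_load {
      bins v[] = {[-1.0:1.0]};                // cover the -ve to +ve amplitude interval
    }
  endgroup

  V_COV cov = new();

  always @(vout) cov.sample(vout);            // sample on every value change of the signal
endmodule
```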

The amplitude could be a negative (-ve) or positive (+ve) analog value, a low or high supply value, or any modeled electrical signal whose value covers a range of a real amplitude datatype. This is helpful in covering any analog waveform with a certain range, such as sine, triangular, square, noisy, and clipped signals, as illustrated in Figure 1.

  • Cover -ve to -ve real value
  • Cover +ve to +ve real value
  • Cover -ve to +ve real value

The covergroup “V_COV” contains one coverpoint, “vout_load,” which is associated with bins “v.” Notice that the number of automatic bins is defined according to the “real_interval” that was passed as a “type_option,” with the range divided according to the number of real values. This means that once a bin is created for a value, no other bin is created for that value, even if the value is repeated within a different range or declared as a separate value.

Figure 1. Waveforms executed with event-driven simulators.

Table 1 illustrates a simple example of how the number of automatic bins is generated.

B. Cover an electrical signal with a certain tolerance

Here we want to cover an interval of an electrical signal with a certain amplitude with a specific tolerance in its value.

Listing 2. Covergroup to cover an electrical signal with a certain tolerance.
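
As a hedged sketch of what such a covergroup might look like: the nominal value of 0.40 and the 0.02 tolerance come from the discussion of Figure 2, while the signal name vcontrol and the outer range limits are assumptions.

```systemverilog
// Sketch only: covers a PLL control voltage around an assumed nominal of 0.40
// with the 0.02 tolerance from the article; out-of-tolerance bins flag excursions.
module listing2_sketch;
  real vcontrol;   // modeled PLL control voltage

  covergroup VCTRL_COV with function sample(real v);
    coverpoint v {
      bins within_tol = {[0.38:0.42]};        // nominal 0.40 +/- 0.02 (expected values)
      bins above_tol  = {[0.42:1.00]};        // undesired excursions above the tolerance
      bins below_tol  = {[0.00:0.38]};        // undesired excursions below the tolerance
    }
  endgroup

  VCTRL_COV cov = new();
  always @(vcontrol) cov.sample(vcontrol);
endmodule
```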

For example, the voltage control of the fractional-PLL always has a tolerance around its nominal value at locking time. The verification engineer could increase the expected tolerance to ensure that the undesired values are left uncovered. If the undesired values are covered, then the modeled system should be changed. Therefore, functional coverage helps ensure that the expected values are covered and the unexpected ones are uncovered, as illustrated in Figure 2. Figure 2(a) shows the tolerance of the modeled PLL vcontrol increased or decreased by 2% (0.02), meaning that values greater than 0.42 or less than 0.38 will not be covered; we want to cover the unexpected values to capture weird behaviors within the system. Figure 2(b) shows the desired values within the preferred tolerance of about 2% after the designer modified the modeled DUT to meet the system requirements. Notice that coverage collects the undesired values and allows the user to adjust the modeled system, resulting, in this case, in values that stay below 0.42 and above 0.38, which maintains system correctness.

Figure 2. The vcontrol without (a) and with (b) the accepted tolerance.

C. Cover a signal value when another event occurred

Now we seek to cover a signal value within a certain range, or reaching specific values, when another event occurs; or we want to cover the output values when a particular input occurs. For example, the charge pump within the PLL reaches (Icp) when the reference clock is leading the feedback clock and reaches (-Icp) when the reference clock is lagging the feedback clock.

Listing 3. Covergroup to cover a signal value when another event occurred. 
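
A minimal sketch of one way to express this is shown below, assuming a flag that indicates whether the reference clock leads the feedback clock and an illustrative charge-pump current of 100 µA; both of those are assumptions, with only the Icp/-Icp behavior taken from the article.

```systemverilog
// Sketch only: cover the charge-pump output current when the corresponding
// phase relationship occurs. The 100 uA value and +/-10% window are assumptions.
module listing3_sketch;
  real icp_out;      // modeled charge-pump output current
  bit  ref_leads;    // 1 when the reference clock leads the feedback clock

  covergroup CP_COV with function sample(real i, bit leads);
    pump_up   : coverpoint i iff (leads)  { bins at_icp     = {[90.0e-6 : 110.0e-6]};   }
    pump_down : coverpoint i iff (!leads) { bins at_neg_icp = {[-110.0e-6 : -90.0e-6]}; }
  endgroup

  CP_COV cov = new();
  always @(icp_out) cov.sample(icp_out, ref_leads);
endmodule
```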

D. Cover a set of parameters that maintain the stability of the system

The verification engineer can predict the boundaries of the parameter variations that maintain the stability of the system by placing the expected variation of each component/parameter value in one bin (bins predicted) and the unexpected variation that exceeds the desired range in another bin (bins exceeded). The verification engineer should expect the predicted bins to be hit rather than the exceeded bins; otherwise the system could be unstable. Therefore, the verification engineer can use the coverage metrics to collect the trade-off variations within the system.

For example, Figure 3 illustrates the stability of a certain PLL system by calculating the open-loop response under the component variations provided in Table 2. The verification engineer will build a covergroup for each component with the expected and exceeded variations, then apply cross-coverage between them to ensure that the collective metrics are preserved according to the system specifications.

Figure 3. Bode plot of PLL open loop response under components’ variations.

Table 2. Expected variation for each loop component/parameter and PM values that maintain stability.

Listing 4 provides the covergroup for the loop filter components only (the first row in Table 2) to illustrate the idea, but the user needs to build coverage metrics for all the trade-off possibilities.

Listing 4. Covergroup to cover the parameters of the system and crossings between them.
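
Below is a simplified sketch of what such metrics might look like, assuming a normalized nominal value of 1.0 for each loop-filter component and the roughly 10% expected variation from Table 2. The bin names follow the article, but the exact ranges, the omitted “_H” bins, and the sampling details are assumptions.

```systemverilog
// Sketch only: one coverpoint per loop-filter component with "predicted"
// (within ~10% of nominal) and "exceeded" bins, crossed to track which
// combination of mismatches occurred. Values are normalized to nominal = 1.0.
module listing4_sketch;
  real r2, c2, c1;   // loop filter component values, normalized to nominal

  covergroup LPF_COV with function sample(real r2_v, real c2_v, real c1_v);
    R2_cp : coverpoint r2_v {
      bins predicted = {[0.9:1.1]};             // within the expected ~10% variation
      bins exceeded  = {[0.0:0.9], [1.1:2.0]};  // beyond the expected variation
    }
    C2_cp : coverpoint c2_v {
      bins predicted = {[0.9:1.1]};
      bins exceeded  = {[0.0:0.9], [1.1:2.0]};
    }
    C1_cp : coverpoint c1_v {
      bins predicted = {[0.9:1.1]};
      bins exceeded  = {[0.0:0.9], [1.1:2.0]};
    }
    LPF : cross R2_cp, C2_cp, C1_cp {
      bins LPF_predicated  = binsof(R2_cp.predicted) && binsof(C2_cp.predicted) &&
                             binsof(C1_cp.predicted);          // all components within prediction
      bins LPF_exceeded    = binsof(R2_cp.exceeded) || binsof(C2_cp.exceeded) ||
                             binsof(C1_cp.exceeded);           // at least one mismatch
      bins LPF_exceeded_R2 = binsof(R2_cp.exceeded);           // track the offending component
      bins LPF_exceeded_C2 = binsof(C2_cp.exceeded);
      bins LPF_exceeded_C1 = binsof(C1_cp.exceeded);
    }
  endgroup

  LPF_COV cov = new();
  always @(r2, c2, c1) cov.sample(r2, c2, c1);
endmodule
```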

Debug Coverage Metrics Using a Debug Environment

There are two modes the user can choose between when debugging the coverage statistics. These modes are “view coverage” and “coverage debugging.” In the view coverage mode, the user debugs the coverage statistics results from the simulator without design or testbench debugging, while in the coverage debugging mode, the user can debug the coverage results together with the design and/or testbench.

The “covergroups” window displays the coverage results for SystemVerilog covergroups, coverpoints, crosses, and bins in the design. The Visualizer debug environment was used on the example listed in section D to show how it can help the functional verification engineer visualize functional correctness with the modeled design using the “covergroups” window.

The first case (shown in Figure 4) illustrates that either there is no mismatch in the loop filter components or the mismatch is within expectations, about 10% (as illustrated in Table 2), thus maintaining the stability of the system. This means that only the first two bins in the cross “LPF” will be hit while the others are not. Therefore, the first two bins “LPF_predicated/LPF_predicated_H” indicate that the desired values are covered, but that does not mean the undesired values are uncovered unless the last five bins (LPF_exceeded/LPF_exceeded_H/LPF_exceeded_R2/LPF_exceeded_C2/LPF_exceeded_C1) remain un-hit.

Figure 4. Case 1: The ideal or the predicated behavior of the loop filter component.

The second case illustrates when one component in the loop filter has a mismatch while the others have the expected values. The last three bins (LPF_exceeded_R2/LPF_exceeded_C2/LPF_exceeded_C1) help track which component caused the mismatch; more than one component may exceed the predicted values. Figure 5 illustrates that the “C2” component has a mismatch, and therefore the “LPF_exceeded” bin is also hit because at least one component has a mismatch.

Figure 5. Case 2: There is at least one component mismatch.

The third case (shown in Figure 6) illustrates that all the components have mismatches and exceed the predicted values when the bin (LPF_exceeded_H) is hit. The design and verification engineers need to work together to move the mismatches from the worst case, “case 3,” to the best case, “case 1,” as that is the case that allows the system to be stable with functional correctness. The bin “LPF_exceeded_H” is hit 582 times, which means that the three components are incorrectly modeled together 582 times during the simulation run.

Figure 6. Case 3: All components have mismatches and exceed predicted values.

Conclusion

SystemVerilog functional coverage can help verification engineers who model analog signals in a digital environment in the following ways:

  • Ensuring any “real” signal is covered within a certain amplitude range. This range could have a tolerance due to any mismatch within the circuit.
  • Helping to capture the mismatches within the circuit components that determine whether the system behaves correctly or not.
  • Verifying that a certain functionality of any system sub-block is covered.

To read more on this topic, please see the new whitepaper from Siemens EDA, Functional verification of analog devices modeled using SV-RNM.

Mariam Maurice is a Product Engineer for Questa Simulator with a focus on RNM and the Visualizer Debug Environment at Siemens Digital Industries Software (DISW) in Electronic Design Automation (EDA). She received the B.Sc. (Hons.) and M.Sc. degrees in electronics engineering from Ain Shams University (ASU), Cairo, Egypt. Her research interests include analog/mixed-signal (AMS) integrated circuits and systems, system-level design, modeling AMS behaviors using hardware description languages, and Functional Verification.

Also Read:

Automating Reset Domain Crossing (RDC) Verification with Advanced Data Analytics

Smarter, Faster LVS using Calibre nmLVS Recon

Siemens EDA Offers a Comprehensive Guide to PCIe® Transport Security


Synopsys and TSMC Pave the Path for Trillion-Transistor AI and Multi-Die Chip Design
by Kalar Rajendiran on 10-02-2024 at 10:00 am

Synopsys made significant announcements during the recent TSMC OIP Ecosystem Forum, showcasing a range of cutting-edge solutions designed to address the growing complexities in semiconductor design. With a strong emphasis on enabling next-generation chip architectures, Synopsys introduced both new technologies and key updates to existing solutions in collaboration with TSMC.

At the heart of this collaboration is the goal of accelerating the development of trillion-transistor chips, which are necessary to support the computational demands of Artificial Intelligence (AI) and high-performance computing (HPC) applications. As these systems continue to grow in complexity, Synopsys and TSMC are collaborating to leverage AI to streamline the design process and ensure power efficiency, scalability, and system reliability. What caught my interest and attention was the focus that multi-die design, 3D integrated circuits (3DICs), and multi-physics design analysis are receiving in this collaboration. Before we dive into that, below is a roundup of the key announcements.

Roundup of the Key Announcements from Synopsys

Synopsys aims to enable the design of more complex, efficient, and scalable multi-die packages that can meet the evolving demands of AI, HPC, and other advanced computing applications.

Synopsys.ai Suite Optimized for TSMC N2 Process Technology: This was a key update, as Synopsys’ AI-driven EDA suite was already known for its ability to improve Quality of Results (QoR). The latest optimization focuses on the N2 process, helping designers move more swiftly to next-generation nodes while enhancing chip performance and power efficiency.

Backside Power Delivery in TSMC A16 Process: A new innovation that stood out was the backside power delivery system, which promises more efficient power routing and reduced energy consumption. This method helps manage the demands of trillion-transistor architectures by optimizing signal integrity and chip density.

Synopsys IP Solutions for 3DFabric Technologies: Updates were made to Synopsys’ UCIe and HBM4 IP solutions, which are crucial for TSMC’s 3DFabric technologies, including CoWoS (Chip on Wafer on Substrate) and SoIC (System on Integrated Chips). These updates further improve bandwidth and energy efficiency in multi-die designs.

3DIC Compiler, 3DSO.ai and Multi-Physics Flow: One of the more notable announcements involved the enhancement of Synopsys’ 3DIC Compiler platform and 3DSO.ai to address the complexities of multi-die designs and offer AI-driven multi-physics analysis during the design process, helping to streamline system-level integration.

TSMC Cloud Certification for Accelerated Design: To further accelerate the design process, Synopsys and TSMC have also enabled Synopsys EDA tools on the cloud, certified through TSMC’s Cloud Certification. This provides mutual customers with cloud-ready EDA tools that not only deliver accurate QoR but also seamlessly integrate with TSMC’s advanced process technologies.

The Importance of Multi-Die, 3DIC, and Multi-Physics Design

As semiconductor technology pushes beyond the traditional limits of Moore’s Law, multi-die designs and 3DICs have become essential for enhancing performance and density. These technologies allow for multiple dies, each with its own specialized function, to be stacked or placed side-by-side within a single package. However, the integration of these dies—especially when combining electronic ICs with photonic ICs—introduces significant design challenges.

One of the most pressing issues in multi-die design is thermal management. As multiple heat-generating dies are placed in close proximity, the risk of overheating increases, which can degrade performance and shorten the lifespan of the chip. Additionally, electromagnetic interference (EMI), signal integrity, and power distribution present further challenges that designers must account for during early-stage development.

This is where multi-physics analysis plays a critical role. Multi-physics analysis is the process of evaluating how different physical phenomena—such as heat dissipation, mechanical stress, and electrical signals—interact with one another within a chip package. Without an understanding of these interactions, it becomes nearly impossible to design reliable and efficient multi-die systems.

Synopsys Solutions for Multi-Die and 3DIC Challenges

Synopsys is at the forefront of addressing these challenges through its AI-powered solutions, many of which were updated or introduced during the TSMC OIP Ecosystem Forum. These tools are specifically designed to address the complexity of multi-die designs and 3DICs, where early-stage analysis and optimization are crucial for success.

AI-Driven EDA with Synopsys.ai

One of the most significant updates came from Synopsys.ai, which is now optimized for TSMC’s N2 process technology. This suite allows designers to leverage AI to improve design efficiency and reduce the time needed to move designs to production. By incorporating AI into the design process, Synopsys.ai helps engineers navigate the vast array of potential design configurations, ensuring that the most optimal solutions are chosen for performance, power efficiency, and thermal management.

“Synopsys’ certified Custom Compiler and PrimeSim solutions provide the performance and productivity gains that enable our designers to meet the silicon demands of high-performance analog design on the TSMC N2 process,” said Ching San Wu, Corporate VP at MediaTek in Synopsys’ news release. “Expanding our collaboration with Synopsys makes it possible for us to leverage the full potential of their AI-driven flow to accelerate our design migration and optimization efforts, improving the process required for delivering our industry-leading SoCs to multiple verticals.”

3DIC Compiler and 3DSO.ai for Multi-Die Systems

These tools enable designers to conduct multi-physics analysis early in the design process, which is essential for optimizing thermal and power management, signal integrity, and mechanical stability in multi-die systems. By identifying potential issues—such as hotspots or signal degradation—early in the process, designers can make informed adjustments before reaching the later stages of development, thus avoiding costly redesigns.

3DSO.ai leverages AI to analyze complex multi-die configurations, allowing engineers to test a wide range of potential scenarios in a fraction of the time it would take using traditional methods. This capability is critical as designs become more complex, with tens of thousands of possible combinations for how dies are stacked, interconnected, and cooled.

TSMC-certified Synopsys 3DIC Compiler’s compatibility with TSMC’s SoIC and CoWoS technologies further solidifies its position as a leading platform for multi-die designs. This ensures seamless collaboration across design architecture and planning, design implementation, and signoff teams, enabling efficient 3DIC development for cutting-edge applications.

These technologies are critical for enabling the heterogeneous integration of dies in 3DIC systems, which helps overcome traditional scaling challenges such as thermal management and signal integrity.

As a demonstration vehicle, Synopsys recently achieved a successful tapeout of a test chip featuring a multi-die design using TSMC’s CoWoS advanced packaging technology. This test chip leveraged TSMC’s 3DFabric technology and Synopsys’ multi-die solutions, including silicon-proven UCIe IP, the 3DIC Compiler unified exploration-to-signoff platform, and the 3DSO.ai AI-driven optimization solution. The figure below showcases the level of system analysis and optimization enabled by Synopsys 3DSO.ai. The test chip demonstrated unmatched performance reliability.

Figure: Synopsys 3DSO.ai AI-enabled system analysis and optimization 

Optimizing Power Delivery with Backside Power Innovations

The new backside power delivery capability, introduced through TSMC’s A16 process, represents a critical leap forward in ensuring power integrity in multi-die systems. By routing power through the backside of the chip, more space is made available on the front for signal routing and transistor placement. This helps reduce energy consumption while also enhancing signal integrity, ensuring that trillion-transistor designs can operate efficiently and reliably.

Summary

The announcements made by Synopsys at the TSMC OIP Ecosystem Forum underscore the growing importance of multi-die architectures, 3DIC systems, and multi-physics analysis in semiconductor design. With new AI-driven tools and key updates to existing solutions, Synopsys is helping engineers overcome the complex challenges posed by trillion-transistor designs and multi-die integration.

By leveraging Synopsys’ advanced EDA tools, platforms and IP, engineers can now address critical issues—like thermal management, signal integrity, and power distribution—at the earliest stages of the design process. This proactive approach not only improves design efficiency but also ensures that the final product meets the stringent performance requirements of AI, HPC, and other next-generation applications.

You can read the Synopsys announcement in its entirety here, and more details on the test chip tapeout here.

Also Read:

The Immensity of Software Development and the Challenges of Debugging (Part 3 of 4)

The Immensity of Software Development and the Challenges of Debugging Series (Part 2 of 4)

Synopsys Powers World’s Fastest UCIe-Based Multi-Die Designs with New IP Operating at 40 Gbps


Is AI-Based RTL Generation Ready for Prime Time?
by Bernard Murphy on 10-02-2024 at 6:00 am

In semiconductor design there has been much fascination around the idea of using large language models (LLMs) for RTL generation; CoPilot provides one example. Based on a Google Scholar scan, a little over 100 papers were published in 2023, jumping to 310 papers in 2024. This is not surprising. If it works, automating design creation could be a powerful advantage to help designers become more productive (not to replace them as some would claim). But we know that AI claims have a tendency to run ahead of reality in some areas. Where does RTL generation sit on this spectrum?

Benchmarking

The field has moved beyond the early enthusiasm of existence proofs (“look at the RTL my generator built”) to somewhat more robust analysis. A good example is a paper published very recently in arXiv: Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks, with a majority of authors from Nvidia and one author from Cornell. A pretty authoritative source.

The authors have extended a benchmark (VerilogEval) they built in 2023 to evaluate LLM-based Verilog generators. The original work studied code completion tasks; in this paper they go further to include generating block RTL from natural language specifications. They also describe a mechanism for prompt tuning through in-context learning (additional guidance in the prompt). Importantly for both completion and spec to RTL they provide a method to classify failures by type, which I think could be helpful to guide prompt tuning.

Although there is no mention of simulation testbenches, the authors clearly used a simulator (Icarus Verilog) and talk about Verilog compile-time and run-time errors, so I assume the benchmark suite contains human-developed testbenches for each test.

Analysis

The authors compare performance across a wide range of LLMs, from GPT-4 models to Mistral, Llama, CodeGemma, DeepSeek Coder and RTLCoder DeepSeek. Small point of initial confusion for this engineer/physicist: they talk about temperature settings in a few places. This is a randomization factor for LLMs, nothing to do with physical temperature.

First, a little background on scoring generated code. The usual method to measure machine generated text is a score called BLEU (Bilingual evaluation understudy), intended to correlate with human-judged measures of quality/similarity. While appropriate for natural language translations, BLEU is not ideal for measuring code generation. Functional correctness is a better starting point, as measured in simulation.

The graphs/tables in the paper measure pass rate against a benchmark suite of tests, allowing one RTL generation attempt per test (pass@1), so no allowance for iterated improvement except in 1-shot refinement over 0-shot. 0-shot measures generation from an initial prompt and 1-shot measures generation from the initial prompt augmented with further guidance. The parameter ‘n’ in the tables is a wrinkle to manage variance in this estimate – higher n, lower variance.
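For reference, the standard way to estimate this metric, introduced with the HumanEval/Codex benchmark, is the unbiased estimator below; that VerilogEval follows the same convention is an assumption on my part, though it would explain the role of n:

$$\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\,1-\frac{\binom{n-c}{k}}{\binom{n}{k}}\,\right]$$

where n is the number of generations sampled per problem and c is the number that pass the tests. For k = 1 this reduces to the average fraction of passing samples, and a larger n simply tightens the estimate.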

Quality, measured through test pass rates within the benchmark suite, ranges from below 10% to as high as 60% in some cases. Unsurprisingly smaller (LLM) models don’t do as well as bigger models. Best rates are for GPT-4 Turbo with ~1T parameters and Llama 3.1 with 405B parameters. Within any given model, success rates for code completion and spec to RTL tests are roughly comparable. In many cases in-context learning/refined prompts improve quality, though for GPT-4 Turbo spec-to-RTL and Llama3 70B prompt engineering actually degrades quality.

Takeaways

Whether for code completion or spec to RTL, these accuracy rates suggest that RTL code generation is still a work in progress. I would be curious to know how an entry-level RTL designer would perform against these standards.

Also in this paper I see no mention of tests for synthesizability or PPA. (A different though smaller benchmark, RTLLM, also looks at these factors, where PPA is determined in physical synthesis I think – again short on details.)

More generally I also wonder about readability and debuggability. Perhaps here some modified version of the BLEU metric versus expert-generated code might be useful as a supplement to these scores.

Nevertheless, interesting to see how this area is progressing.


5 Expectations for the Memory Markets in 2025
by Daniel Nenni on 10-01-2024 at 10:00 am

TechInsights has a new memory report that is worth a look. It is free if you are a registered member, which I am. HBM is of great interest, and there is a section on emerging and embedded memories for chip designers. Even though I am more of a logic person, memory is an important part of the semiconductor industry. In fact, logic and memory go together like peas and carrots. If you are looking at the semiconductor industry as a whole and trying to figure out what 2025 looks like, you have to include memory, absolutely.

TechInsights also has some interesting videos; I just finished the one on chiplets that was published last month.

TechInsights is a whole platform of reverse engineering, teardown, and market analysis in the semiconductor industry. This collection includes detailed circuit analysis, imagery, semiconductor process flows, device teardowns, illustrations, costing and pricing information, forecasts, market analysis, and expert commentary. 

Inside the memory report are links to many more (not free) reports included for a more detailed view. Here is the first section of the report:

5 Expectations for the Memory Markets in 2025

The memory markets, encompassing DRAM and NAND, are poised for significant growth in 2025, largely driven by the accelerating adoption of artificial intelligence (AI) and related technologies. As we navigate the complexities of these markets, several key trends emerge that are expected to shape the landscape. Here are five expectations for the memory markets in the coming year, along with a potential spoiler that could disrupt everything.

1. AI Leads to Continued Focus on High-Bandwidth Memory (HBM)

The rise of AI, particularly in data-intensive applications like machine learning and deep learning, is driving an unprecedented demand for high-bandwidth memory (HBM). Shipments of HBM are expected to grow by 70% year-over-year as data centers and AI processors increasingly rely on this type of memory to handle massive amounts of data with low latency. This surge in HBM demand is expected to reshape the DRAM market, with manufacturers prioritizing HBM production over traditional DRAM variants. (Learn More)

2. AI Drives Demand for High-Capacity SSDs and QLC Adoption

As AI continues to permeate various industries, the need for high-capacity solid-state drives (SSDs) is on the rise. This is particularly true for AI workloads that require extensive data storage and fast retrieval times. Consequently, the adoption of quad-level cell (QLC) NAND technology, which offers higher density at a lower cost, is expected to increase. QLC SSDs, despite their slower write speeds compared to other NAND types, will gain traction due to their cost-effectiveness and suitability for AI-driven data storage needs. Datacenter NAND bit demand growth is expected to exceed 30% in 2025, after explosive growth of about 70% in 2024. (Learn More)

3. Capex Investment Shifts Heavily Towards DRAM and HBM

Driven by the surge in AI applications, capital expenditure (capex) in the memory market is increasingly being funneled towards DRAM, particularly HBM. DRAM capex is projected to rise nearly 20% year-over-year as manufacturers expand their production capacities to meet the growing demand. However, this shift has left minimal investment for NAND production, creating a potential supply-driven bottleneck in the market. Profitability in the NAND sector continues to improve, which could reignite investment in this area as we move into 2026. (Learn More)

4. Edge AI Begins to Emerge but Won’t Impact Until 2026

Edge AI, which brings AI processing closer to the data source on devices like smartphones and PCs, is anticipated to hit the market in 2025. However, the full impact of this technology won’t be felt until 2026. Devices with true, on-device AI capabilities are expected to launch in late 2025, but sales volumes are unlikely to be significant enough to influence the memory markets immediately. The real shift should occur in 2026 as edge AI becomes more widespread, driving demand for memory solutions tailored to these new capabilities. (Learn More)

5. Datacenter AI Focus Delays Traditional Server Refresh Cycles

The focus on AI-driven data centers has led to a delay in the refresh cycle for traditional server infrastructure. Many organizations are diverting resources to upgrade their AI capabilities, leaving conventional servers in need of updates. While this delay might be manageable in the short term, at some point, these servers will need to be refreshed, potentially creating a sudden surge in demand for DRAM and NAND. This delayed refresh cycle could result in a significant uptick in memory demand once it finally happens. (Learn More)

Spoiler: A Sudden Halt in AI Development Could Upset Everything

While AI is the primary driver behind these market expectations, it’s important to consider the potential for a sudden slowdown in AI development. Whether due to macroeconomic headwinds, diminishing returns on AI investments, or technical roadblocks in scaling AI models, a significant deceleration in AI progress would have profound negative implications for the memory markets. Such a halt would likely lead to a sharp decline in demand for HBM, DRAM, and high-capacity SSDs, disrupting the expected growth and investment patterns in these sectors. As such, while the memory markets are poised for substantial growth in 2025, they remain highly susceptible to the broader trajectory of AI advancements.

Also Read:

Semiconductor Industry Update: Fair Winds and Following Seas!

Samsung Adds to Bad Semiconductor News

Hot Chips 2024: AI Hype Booms, But Can Nvidia’s Challengers Succeed?

The Semiconductor Business will find a way!


Sondrel Redefines the AI Chip Design Process

Sondrel Redefines the AI Chip Design Process
by Mike Gianfagna on 10-01-2024 at 6:00 am


Designing custom silicon for AI applications is a particularly vexing problem. These chips process enormous amounts of data with a complex architecture that typically contains a diverse complement of heterogeneous processors, memory systems and various IO strategies. Each of the many subsystems in this class of chip will have different data traffic requirements. Despite all these challenges, an effective architecture must run extremely efficiently, without processor stalls or any type of inefficient data flow. The speed and power requirements for this type of design cannot be met without a highly tuned architecture. These challenges have kept design teams hard at work for countless hours, trying to find the optimal solution. Recently, Sondrel announced a new approach to this problem that promises to make AI chip design far more efficient and predictable. Let’s examine how Sondrel redefines the AI chip design process.

Architecting the Future

Sondrel recently unveiled an advanced modeling process for AI chip designs. The approach is part of the company’s forward-looking Architecting the Future family of ASIC architecture frameworks and IP. By using a pre-verified ASIC framework and IP, Sondrel reduces the risks associated with “from scratch” custom chip design. The advanced modeling process is part of this overall risk reduction strategy.

The approach uses accurate, cycle-based system performance modeling early in the design process. Starting early, before RTL development, begins the process of checking that the design will meet its specification. This verification approach continually evolves and can be used for the entire flow, from early specification to silicon. Using this unique approach with pre-verified design elements reduces risk and time to market. And thanks to the use of advanced process technology, power can also be reduced while ensuring performance criteria can be reliably met.

Digging Deeper

Paul Martin

I had the opportunity to meet with Paul Martin, Sondrel’s Global Field Engineering Director, to get more details on how the new approach works. Paul has been with Sondrel for almost ten years. He was previously with companies such as ARM, NXP Semiconductors, and Cadence, so he has a deep understanding of what it takes to do advanced custom chip design.

Paul explained that at the core of the new approach is a commercially available transaction-based simulator. Both Sondrel and the supplier of this simulator have invested substantial effort to take this flow well beyond the typical cycle-accurate use model. 

He explained that detailed, timing-accurate models of many IP blocks have been developed. These models essentially create accurate data traffic profiles for each element. Going a bit further, the AI workloads that will issue transactions to these IP blocks are analyzed to create a graphical representation of how transactions are initiated to the chip-level elements such as processors and memories.

Using this view of how the system is running, Paul further explained that a closed-loop simulation system is created that can feed results back to the compiler and the micro-architecture optimization tools for a particular NPU to optimize its performance, avoiding bottlenecks. This ability to model and optimize a system at the transaction level is unique and can be quite powerful.
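To make the idea concrete, here is a minimal sketch of what transaction-level traffic modeling with a bottleneck check might look like. It is only an illustration of the concept; it is not Sondrel’s flow or the commercial simulator Paul described, and all block names, traffic rates, and bandwidth numbers are invented.

```python
# Toy transaction-level traffic model: IP blocks with bursty demand share
# one memory interface, and we measure how often total demand exceeds the
# available bandwidth (a proxy for stalls/bottlenecks). Purely illustrative.
import random
from dataclasses import dataclass

@dataclass
class TrafficProfile:
    name: str
    avg_bytes_per_cycle: float
    burstiness: float  # 0 = perfectly smooth, 1 = highly bursty

    def demand(self) -> float:
        # Randomize per-cycle demand around the average to mimic bursts.
        jitter = random.uniform(-self.burstiness, self.burstiness)
        return max(0.0, self.avg_bytes_per_cycle * (1.0 + jitter))

def stall_fraction(profiles, memory_bw_bytes_per_cycle, cycles=10_000):
    stalls = sum(
        1 for _ in range(cycles)
        if sum(p.demand() for p in profiles) > memory_bw_bytes_per_cycle
    )
    return stalls / cycles

# Hypothetical workload: an NPU, a DSP cluster, and a camera pipeline
# sharing a 64 B/cycle memory controller.
profiles = [
    TrafficProfile("npu", 48.0, 0.4),
    TrafficProfile("dsp", 10.0, 0.2),
    TrafficProfile("camera", 4.0, 0.1),
]
print(f"stalled cycles: ~{stall_fraction(profiles, 64.0):.0%}")

# A closed-loop flow would feed a result like this back into the NPU
# compiler / micro-architecture settings and re-run until stalls vanish.
```

In the real flow, the traffic profiles come from the timing-accurate IP models and analyzed AI workloads Paul described, rather than random jitter, but the feedback loop works on the same principle.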

Paul went on to describe the custom workflow that has been built around the commercial simulator. This workflow allows the same stimulus models to be applied from architectural analysis to RTL design, to emulation and all the way to real silicon. Essentially, the transaction model can be applied all the way through the process to ensure the design maintains its expected level of performance and power. The elusive golden specification, if you will.

Paul explained that by focusing on the architect rather than the software developer, a truly new approach to complex AI chip design is created. He went on to explain that this approach has been applied to several reference designs. He cited examples for video and data processing, edge IoT data processing, and automotive ADAS applications.

To Learn More

You can see the details of the recent Sondrel announcement here. There are also a couple of good pieces discussing Sondrel’s work in the automotive sector on SemiWiki here. And you can explore Sondrel’s overall Architecting the Future strategy here. And that’s how Sondrel redefines the AI chip design process. Exciting stuff.


Elevating AI with Cutting-Edge HBM4 Technology

Elevating AI with Cutting-Edge HBM4 Technology
by Kalar Rajendiran on 09-30-2024 at 10:00 am

HBM4 Compute Chiplet Subsystem

Artificial intelligence (AI) and machine learning (ML) are evolving at an extraordinary pace, powering advancements across industries. As models grow larger and more sophisticated, they require vast amounts of data to be processed in real-time. This demand puts pressure on the underlying hardware infrastructure, particularly memory, which must handle massive data sets with high speed and efficiency. High Bandwidth Memory (HBM) has emerged as a key enabler of this new generation of AI, providing the capacity and performance needed to push the boundaries of what AI can achieve.

The latest leap in HBM technology, HBM4, promises to elevate AI systems even further. With enhanced memory bandwidth, higher efficiency, and advanced design, HBM4 is set to become the backbone of future AI advancements, particularly in the realm of large-scale, data-intensive applications such as natural language processing, computer vision, and autonomous systems.

The Need for Advanced Memory in AI Systems

AI workloads, particularly deep neural networks, differ from traditional computing by requiring the parallel processing of vast data sets, creating unique memory challenges. These models demand high data throughput and low latency for optimal performance. High Bandwidth Memory (HBM) addresses these needs by offering superior bandwidth and energy efficiency. Unlike conventional memory, which uses wide external buses, HBM’s vertically stacked chips and direct processor interface minimize data travel distances, enabling faster transfers and reduced power consumption, making it ideal for high-performance AI systems.

How HBM4 Improves on Previous Generations

HBM4 significantly advances AI and ML performance by increasing bandwidth and memory density. With higher data throughput, HBM4 enables AI accelerators and GPUs to process hundreds of gigabytes per second more efficiently, reducing bottlenecks and boosting system performance. Its increased memory density, achieved by adding more layers to each stack, addresses the immense storage needs of large AI models, facilitating smoother scaling of AI systems.
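As a rough illustration of what “higher data throughput” means at the stack level, peak bandwidth is simply the per-pin data rate times the interface width. The sketch below uses the 9.2 Gb/s, 1024-bit HBM3E configuration referenced at the end of this article as a baseline; the HBM4 row assumes a 2048-bit interface and an illustrative data rate, since final figures are not quoted here.

```python
# Back-of-the-envelope per-stack HBM bandwidth:
#   bandwidth (GB/s) = data rate per pin (Gb/s) x I/O width (bits) / 8
def stack_bandwidth_gb_s(data_rate_gb_per_pin: float, io_width_bits: int) -> float:
    return data_rate_gb_per_pin * io_width_bits / 8.0

configs = {
    "HBM3E (9.2 Gb/s x 1024-bit)": (9.2, 1024),          # matches the ~1.2 TB/s subsystem cited below
    "HBM4 (assumed 6.4 Gb/s x 2048-bit)": (6.4, 2048),   # illustrative assumption, not a final spec
}

for name, (rate, width) in configs.items():
    print(f"{name}: ~{stack_bandwidth_gb_s(rate, width) / 1000:.2f} TB/s per stack")
```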

Energy Efficiency and Scalability

As AI systems continue to scale, energy efficiency becomes a growing concern. AI training models are incredibly power-hungry, and as data centers expand their AI capabilities, the need for energy-efficient hardware becomes critical. HBM4 is designed with energy efficiency in mind. Its stacked architecture not only shortens data travel distances but also reduces the power needed to move data. Compared to previous generations, HBM4 achieves better performance-per-watt, which is crucial for the sustainability of large-scale AI deployments.

Scalability is another area where HBM4 shines. The ability to stack multiple layers of memory while maintaining high performance and low energy consumption means that AI systems can grow without becoming prohibitively expensive or inefficient. As AI applications expand from specialized data centers to edge computing environments, scalable memory like HBM4 becomes essential for deploying AI in a wide range of use cases, from autonomous vehicles to real-time language translation systems.
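Performance-per-watt can be made concrete with the same kind of arithmetic: memory-interface power is roughly the energy needed to move one bit multiplied by how many bits per second are moving. The pJ/bit values and the 1.5 TB/s figure below are assumptions for illustration, not published HBM4 numbers.

```python
# interface power (W) ~= energy per bit (pJ/bit) x bandwidth (bits/s) x 1e-12
def interface_power_w(energy_pj_per_bit: float, bandwidth_tb_per_s: float) -> float:
    bits_per_s = bandwidth_tb_per_s * 1e12 * 8  # TB/s -> bits/s
    return energy_pj_per_bit * 1e-12 * bits_per_s

# At an assumed 1.5 TB/s per stack, dropping from 6 pJ/bit to 4 pJ/bit
# saves roughly a third of the memory-interface power.
for pj in (6.0, 4.0):
    print(f"{pj} pJ/bit @ 1.5 TB/s -> ~{interface_power_w(pj, 1.5):.0f} W")
```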

Optimizing AI Hardware with HBM4

The integration of HBM4 into AI hardware is essential for unlocking the full potential of modern AI accelerators, such as GPUs and custom AI chips, which require low-latency, high-bandwidth memory to support massive parallel processing. HBM4 enhances inference speeds, critical for real-time applications like autonomous driving, and accelerates AI model training by providing higher data throughput and larger memory capacity. These advancements enable faster, more efficient AI development, allowing for quicker model training and improved performance across AI workloads.

The Role of HBM4 in Large Language Models

HBM4 is ideal for developing large language models (LLMs) like GPT-4, which drive generative AI applications such as natural language understanding and content generation. LLMs require vast memory resources to store billions or trillions of parameters and handle data processing efficiently. HBM4’s high capacity and bandwidth enable the rapid access and transfer of data needed for both inference and training, supporting increasingly complex models and enhancing AI’s ability to generate human-like text and solve intricate tasks.

Alphawave Semi and HBM4

Alphawave Semi is pioneering the adoption of HBM4 technology by leveraging its expertise in packaging, signal integrity, and silicon design to optimize performance for next-generation AI systems. The company is evaluating advanced packaging solutions, such as CoWoS interposers and EMIB, to manage dense routing and high data rates. By co-optimizing memory IP, the channel, and DRAM, Alphawave Semi uses advanced 3D modeling and S-parameter analysis to ensure signal integrity, while fine-tuning equalization settings like Decision Feedback Equalization (DFE) to enhance data transfer reliability.
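For readers unfamiliar with DFE, the sketch below shows the basic idea in a few lines of Python: past bit decisions are used to estimate and subtract the inter-symbol interference they leave on the current sample before slicing. It is a textbook illustration only, not Alphawave Semi’s implementation, and the channel taps are invented.

```python
# Minimal decision-feedback equalizer (DFE) for a binary NRZ signal.
import numpy as np

def dfe(received: np.ndarray, taps: np.ndarray) -> np.ndarray:
    """Subtract post-cursor ISI estimated from past decisions, then slice."""
    decisions = np.zeros_like(received)
    for n in range(len(received)):
        isi = sum(taps[k] * decisions[n - 1 - k]
                  for k in range(len(taps)) if n - 1 - k >= 0)
        decisions[n] = 1.0 if received[n] - isi >= 0 else -1.0
    return decisions

# Hypothetical channel: unit main cursor plus two post-cursor ISI taps.
rng = np.random.default_rng(0)
tx = rng.choice([-1.0, 1.0], size=200)
post_cursors = np.array([0.35, 0.15])
rx = tx.copy()
for k, h in enumerate(post_cursors, start=1):
    rx[k:] += h * tx[:-k]

print("bit errors after DFE:", int(np.sum(dfe(rx, post_cursors) != tx)))
```

In a real SerDes receiver, the tap values are adapted continuously against the measured channel characteristics (the S-parameter and insertion-loss analysis mentioned above) rather than being fixed constants.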

Alphawave Semi also focuses on optimizing complex interposer designs, analyzing key parameters like insertion loss and crosstalk, and implementing jitter decomposition techniques to support higher data rates. The development of patent-pending solutions to minimize crosstalk ensures the interposers are future-proofed for upcoming memory generations.

Summary

As AI advances, memory technologies like HBM4 will be crucial in unlocking new capabilities, from real-time decision-making in autonomous systems to more complex models in healthcare and finance. The future of AI relies on both software and hardware improvements, with HBM4 pushing the limits of AI performance through higher bandwidth, memory density, and energy efficiency. As AI adoption grows, HBM4 will play a foundational role in enabling faster, more efficient AI systems capable of solving the most data-intensive challenges.

For more details, visit this page.

Also Read:

Alphawave Semi Unlocks 1.2 TBps Connectivity for HPC and AI Infrastructure with 9.2 Gbps HBM3E Subsystem

Alphawave Semi Tapes Out Industry-First, Multi-Protocol I/O Connectivity Chiplet for HPC and AI Infrastructure

Driving Data Frontiers: High-Performance PCIe® and CXL® in Modern Infrastructures


Podcast EP249: A Conversation with Dr. Jason Cong, the 2024 Phil Kaufman Award Winner

Podcast EP249: A Conversation with Dr. Jason Cong, the 2024 Phil Kaufman Award Winner
by Daniel Nenni on 09-27-2024 at 10:00 am

Dan is joined by Dr. Jason Cong, the Volgenau Chair for Engineering Excellence Professor at the UCLA Computer Science Department. He is the director of the Center for Domain-Specific Computing and the director of the VLSI Architecture, Synthesis, and Technology Laboratory. Dr. Cong’s research interests include novel architectures and compilation for customizable computing, synthesis of VLSI circuits and systems, and quantum computing.

Dr. Cong will be recognized by the Electronic System Design Alliance and the Council on Electronic Design Automation (CEDA) of the IEEE with the 2024 Phil Kaufman Award at a presentation and banquet on November 6 in San Jose, California.

In this far-reaching discussion, Dan explores the many contributions Jason has made to the semiconductor industry. His advanced research in FPGA design automation, from the circuit to the system level, is discussed, along with the many successful companies he has catalyzed as a serial entrepreneur.

Dr. Cong is also inspiring a future generation of innovators through his teaching and research in areas such as quantum computing. He explores methods to inspire his students and the path to democratizing chip design, making it readily available to a wide range of new innovations.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.