Synopsys and TSMC Pave the Path for Trillion-Transistor AI and Multi-Die Chip Design
by Kalar Rajendiran on 10-02-2024 at 10:00 am

OIP 2024 Synopsys TSMC

Synopsys made significant announcements during the recent TSMC OIP Ecosystem Forum, showcasing a range of cutting-edge solutions designed to address the growing complexities in semiconductor design. With a strong emphasis on enabling next-generation chip architectures, Synopsys introduced both new technologies and key updates to existing solutions in collaboration with TSMC.

At the heart of this collaboration is the goal of accelerating the development of trillion-transistor chips, which are necessary to support the computational demands of Artificial Intelligence (AI) and high-performance computing (HPC) applications. As these systems continue to grow in complexity, the two companies are leveraging AI to streamline the design process and ensure power efficiency, scalability, and system reliability. What caught my attention was the focus that multi-die designs, 3D Integrated Circuits (3DICs), and multi-physics design analysis are receiving in this collaboration. Before we dive into that, below is a roundup of the key announcements.

Roundup of the Key Announcements from Synopsys

Synopsys aims to enable the design of more complex, efficient, and scalable multi-die packages that can meet the evolving demands of AI, HPC, and other advanced computing applications.

Synopsys.ai Suite Optimized for TSMC N2 Process Technology: This was a key update, as Synopsys’ AI-driven EDA suite was already known for its ability to improve Quality of Results (QoR). The latest optimization focuses on the N2 process, helping designers move more swiftly to next-generation nodes while enhancing chip performance and power efficiency.

Backside Power Delivery in TSMC A16 Process: A new innovation that stood out was the backside power delivery system, which promises more efficient power routing and reduced energy consumption. This method helps manage the demands of trillion-transistor architectures by optimizing signal integrity and chip density.

Synopsys IP Solutions for 3DFabric Technologies: Updates were made to Synopsys’ UCIe and HBM4 IP solutions, which are crucial for TSMC’s 3DFabric technologies, including CoWoS (Chip on Wafer on Substrate) and SoIC (System on Integrated Chips). These updates further improve bandwidth and energy efficiency in multi-die designs.

3DIC Compiler, 3DSO.ai and Multi-Physics Flow: One of the more notable announcements involved the enhancement of Synopsys’ 3DIC Compiler platform and 3DSO.ai to address the complexities of multi-die designs and offer AI-driven multi-physics analysis during the design process, helping to streamline system-level integration.

TSMC Cloud Certification for Accelerated Design: To further accelerate the design process, Synopsys and TSMC have also enabled Synopsys EDA tools on the cloud, certified through TSMC’s Cloud Certification. This provides mutual customers with cloud-ready EDA tools that not only deliver accurate QoR but also seamlessly integrate with TSMC’s advanced process technologies.

The Importance of Multi-Die, 3DIC, and Multi-Physics Design

As semiconductor technology pushes beyond the traditional limits of Moore’s Law, multi-die designs and 3DICs have become essential for enhancing performance and density. These technologies allow for multiple dies, each with its own specialized function, to be stacked or placed side-by-side within a single package. However, the integration of these dies—especially when combining electronic ICs with photonic ICs—introduces significant design challenges.

One of the most pressing issues in multi-die design is thermal management. As multiple heat-generating dies are placed in close proximity, the risk of overheating increases, which can degrade performance and shorten the lifespan of the chip. Additionally, electromagnetic interference (EMI), signal integrity, and power distribution present further challenges that designers must account for during early-stage development.

This is where multi-physics analysis plays a critical role. Multi-physics analysis is the process of evaluating how different physical phenomena—such as heat dissipation, mechanical stress, and electrical signals—interact with one another within a chip package. Without an understanding of these interactions, it becomes nearly impossible to design reliable and efficient multi-die systems.
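
To make the thermal piece of that concrete, here is a deliberately simplified, first-order sketch (my own illustration, not part of any Synopsys flow): treat a die stack as a one-dimensional thermal resistance network in which all heat exits through a top-side heat sink. The power and resistance values below are made up purely for illustration.

```python
# Toy 1D thermal model of a die stack: all heat is assumed to exit through a
# top-side heat sink, so each interface carries the power of every die below it.
def stack_temperatures(powers_w, r_interface_k_per_w, r_sink_k_per_w, t_ambient_c=25.0):
    """powers_w[0] is the bottom die; r_interface_k_per_w[i] sits between die i and die i+1."""
    n = len(powers_w)
    temps = [0.0] * n
    temps[-1] = t_ambient_c + sum(powers_w) * r_sink_k_per_w      # top die
    for i in range(n - 2, -1, -1):                                # walk down the stack
        heat_through_interface = sum(powers_w[: i + 1])           # watts flowing up past die i
        temps[i] = temps[i + 1] + heat_through_interface * r_interface_k_per_w[i]
    return temps

# Illustrative numbers only: a 3-die stack where the bottom logic die runs hottest.
print(stack_temperatures(powers_w=[20, 5, 5],
                         r_interface_k_per_w=[0.3, 0.3],
                         r_sink_k_per_w=0.5))   # [53.5, 47.5, 40.0] degrees C
```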

Synopsys Solutions for Multi-Die and 3DIC Challenges

Synopsys is at the forefront of addressing these challenges through its AI-powered solutions, many of which were updated or introduced during the TSMC OIP Ecosystem Forum. These tools are specifically designed to address the complexity of multi-die designs and 3DICs, where early-stage analysis and optimization are crucial for success.

AI-Driven EDA with Synopsys.ai

One of the most significant updates came from Synopsys.ai, which is now optimized for TSMC’s N2 process technology. This suite allows designers to leverage AI to improve design efficiency and reduce the time needed to move designs to production. By incorporating AI into the design process, Synopsys.ai helps engineers navigate the vast array of potential design configurations, ensuring that optimal solutions are chosen for performance, power efficiency, and thermal management.

“Synopsys’ certified Custom Compiler and PrimeSim solutions provide the performance and productivity gains that enable our designers to meet the silicon demands of high-performance analog design on the TSMC N2 process,” said Ching San Wu, Corporate VP at MediaTek in Synopsys’ news release. “Expanding our collaboration with Synopsys makes it possible for us to leverage the full potential of their AI-driven flow to accelerate our design migration and optimization efforts, improving the process required for delivering our industry-leading SoCs to multiple verticals.”

3DIC Compiler and 3DSO.ai for Multi-Die Systems

These tools enable designers to conduct multi-physics analysis early in the design process, which is essential for optimizing thermal and power management, signal integrity, and mechanical stability in multi-die systems. By identifying potential issues—such as hotspots or signal degradation—early in the process, designers can make informed adjustments before reaching the later stages of development, thus avoiding costly redesigns.

3DSO.ai leverages AI to analyze complex multi-die configurations, allowing engineers to test a wide range of potential scenarios in a fraction of the time it would take using traditional methods. This capability is critical as designs become more complex, with tens of thousands of possible combinations for how dies are stacked, interconnected, and cooled.

TSMC-certified Synopsys 3DIC Compiler’s compatibility with TSMC’s SoIC and CoWoS technologies further solidifies its position as a leading platform for multi-die designs. This ensures seamless collaboration across design architecture and planning, design implementation, and signoff teams, enabling efficient 3DIC development for cutting-edge applications.

These technologies are critical for enabling the heterogeneous integration of dies in 3DIC systems, which helps overcome traditional scaling challenges such as thermal management and signal integrity.

As a demonstration vehicle, Synopsys recently achieved a successful tapeout of a test chip featuring a multi-die design using TSMC’s CoWoS advanced packaging technology. This test chip leveraged TSMC’s 3DFabric technology and Synopsys’ multi-die solutions, including silicon-proven UCIe IP, the 3DIC Compiler unified exploration-to-signoff platform, and the 3DSO.ai AI-driven optimization solution. The figure below showcases the level of system analysis and optimization enabled by Synopsys 3DSO.ai. The test chip demonstrated unmatched performance reliability.

Figure: Synopsys 3DSO.ai AI-enabled system analysis and optimization 

Optimizing Power Delivery with Backside Power Innovations

The new backside power delivery capability, introduced through TSMC’s A16 process, represents a critical leap forward in ensuring power integrity in multi-die systems. By routing power through the backside of the chip, more space is made available on the front for signal routing and transistor placement. This helps reduce energy consumption while also enhancing signal integrity, ensuring that trillion-transistor designs can operate efficiently and reliably.

Summary

The announcements made by Synopsys at the TSMC OIP Ecosystem Forum underscore the growing importance of multi-die architectures, 3DIC systems, and multi-physics analysis in semiconductor design. With new AI-driven tools and key updates to existing solutions, Synopsys is helping engineers overcome the complex challenges posed by trillion-transistor designs and multi-die integration.

By leveraging Synopsys’ advanced EDA tools, platforms and IP, engineers can now address critical issues—like thermal management, signal integrity, and power distribution—at the earliest stages of the design process. This proactive approach not only improves design efficiency but also ensures that the final product meets the stringent performance requirements of AI, HPC, and other next-generation applications.

You can read the Synopsys announcement in its entirety here, and more details on the test chip tapeout here.

Also Read:

The Immensity of Software Development and the Challenges of Debugging (Part 3 of 4)

The Immensity of Software Development and the Challenges of Debugging Series (Part 2 of 4)

Synopsys Powers World’s Fastest UCIe-Based Multi-Die Designs with New IP Operating at 40 Gbps


Is AI-Based RTL Generation Ready for Prime Time?
by Bernard Murphy on 10-02-2024 at 6:00 am

In semiconductor design there has been much fascination around the idea of using large language models (LLMs) for RTL generation; CoPilot provides one example. Based on a Google Scholar scan, a little over 100 papers were published in 2023, jumping to 310 papers in 2024. This is not surprising. If it works, automating design creation could be a powerful advantage to help designers become more productive (not to replace them as some would claim). But we know that AI claims have a tendency to run ahead of reality in some areas. Where does RTL generation sit on this spectrum?

Benchmarking

The field has moved beyond the early enthusiasm of existence proofs (“look at the RTL my generator built”) to somewhat more robust analysis. A good example is a paper published very recently in arXiv: Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks, with a majority of authors from Nvidia and one author from Cornell. A pretty authoritative source.

The authors have extended a benchmark (VerilogEval) they built in 2023 to evaluate LLM-based Verilog generators. The original work studied code completion tasks; in this paper they go further to include generating block RTL from natural language specifications. They also describe a mechanism for prompt tuning through in-context learning (additional guidance in the prompt). Importantly for both completion and spec to RTL they provide a method to classify failures by type, which I think could be helpful to guide prompt tuning.

Although there is no mention of simulation testbenches, the authors clearly used a simulator (Icarus Verilog) and talk about Verilog compile-time and run-time errors, so I assume the benchmark suite contains human-developed testbenches for each test.

Analysis

The authors compare performance across a wide range of LLMs, from GPT-4 models to Mistral, Llama, CodeGemma, DeepSeek Coder and RTLCoder DeepSeek. Small point of initial confusion for this engineer/physicist: they talk about temperature settings in a few places. This is a randomization factor for LLMs, nothing to do with physical temperature.
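
For anyone else new to the term, temperature simply rescales the model's output logits before sampling: a low temperature is near-greedy, a high temperature spreads probability across more tokens. A minimal sketch:

```python
import math, random

def sample_with_temperature(logits, temperature=1.0):
    """Temperature rescales logits before softmax: low T -> near-greedy, high T -> more random."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                   # subtract max for numerical stability
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# At T=0.2 the highest-logit token wins almost every time; at T=1.5 sampling spreads out.
print(sample_with_temperature([2.0, 1.0, 0.1], temperature=0.2))
```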

First, a little background on scoring generated code. The usual method to measure machine generated text is a score called BLEU (Bilingual evaluation understudy), intended to correlate with human-judged measures of quality/similarity. While appropriate for natural language translations, BLEU is not ideal for measuring code generation. Functional correctness is a better starting point, as measured in simulation.
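
To see why, here is a small illustration of my own using NLTK's sentence_bleu (the Verilog snippets and tokenization are hypothetical): a candidate that is textually close to the reference but logically wrong still earns a respectable n-gram score.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "assign y = a & b ;".split()
candidate = "assign y = a | b ;".split()   # functionally wrong, textually close

smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.2f}")   # roughly 0.49 -- a sizeable score despite the logic bug
```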

The graphs/tables in the paper measure pass rate against a benchmark suite of tests, allowing one RTL generation attempt per test (pass@1), so no allowance for iterated improvement except in 1-shot refinement over 0-shot. 0-shot measures generation from an initial prompt and 1-shot measures generation from the initial prompt augmented with further guidance. The parameter ‘n’ in the tables is a wrinkle to manage variance in this estimate – higher n, lower variance.
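
For readers unfamiliar with the metric, pass@1 with n samples typically refers to the unbiased pass@k estimator introduced with the original Codex work; I assume that is the bookkeeping used here. A minimal sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021): n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Drawing more samples (larger n) for the same underlying pass rate lowers the
# variance of the estimate without changing its expectation.
print(pass_at_k(n=20, c=8, k=1))   # 0.4
```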

Quality, measured through test pass rates within the benchmark suite, ranges from below 10% to as high as 60% in some cases. Unsurprisingly, smaller LLMs don’t do as well as bigger models. Best rates are for GPT-4 Turbo with ~1T parameters and Llama 3.1 with 405B parameters. Within any given model, success rates for code completion and spec-to-RTL tests are roughly comparable. In many cases in-context learning/refined prompts improve quality, though for GPT-4 Turbo spec-to-RTL and Llama3 70B prompt engineering actually degrades quality.

Takeaways

Whether for code completion or spec to RTL, these accuracy rates suggest that RTL code generation is still a work in progress. I would be curious to know how an entry-level RTL designer would perform against these standards.

Also in this paper I see no mention of tests for synthesizability or PPA. (A different though smaller benchmark, RTLLM, also looks at these factors, where PPA is determined in physical synthesis I think – again short on details.)

More generally I also wonder about readability and debuggability. Perhaps here some modified version of the BLEU metric versus expert-generated code might be useful as a supplement to these scores.

Nevertheless, interesting to see how this area is progressing.


5 Expectations for the Memory Markets in 2025
by Daniel Nenni on 10-01-2024 at 10:00 am

Expectations for the Memory Markets in 2025

TechInsights has a new memory report that is worth a look. It is free if you are a registered member, which I am. HBM is of great interest and there is a section on emerging and embedded memories for chip designers. Even though I am more of a logic person, memory is an important part of the semiconductor industry. In fact, logic and memory go together like peas and carrots. If you are looking at the semiconductor industry as a whole and trying to figure out what 2025 looks like, you have to include memory, absolutely.

TechInsights also has some interesting videos; I just finished the one on chiplets that was published last month.

TechInsights is a whole platform of reverse engineering, teardown, and market analysis in the semiconductor industry. This collection includes detailed circuit analysis, imagery, semiconductor process flows, device teardowns, illustrations, costing and pricing information, forecasts, market analysis, and expert commentary. 

Inside the memory report are links to many more (not free) reports included for a more detailed view. Here is the first section of the report:

5 Expectations for the Memory Markets in 2025

The memory markets, encompassing DRAM and NAND, are poised for significant growth in 2025, largely driven by the accelerating adoption of artificial intelligence (AI) and related technologies. As we navigate the complexities of these markets, several key trends emerge that are expected to shape the landscape. Here are five expectations for the memory markets in the coming year, along with a potential spoiler that could disrupt everything.

1. AI Leads to Continued Focus on High-Bandwidth Memory (HBM)

The rise of AI, particularly in data-intensive applications like machine learning and deep learning, is driving an unprecedented demand for high-bandwidth memory (HBM). Shipments of HBM are expected to grow by 70% year-over-year as data centers and AI processors increasingly rely on this type of memory to handle massive amounts of data with low latency. This surge in HBM demand is expected to reshape the DRAM market, with manufacturers prioritizing HBM production over traditional DRAM variants. (Learn More)

2. AI Drives Demand for High-Capacity SSDs and QLC Adoption

As AI continues to permeate various industries, the need for high-capacity solid-state drives (SSDs) is on the rise. This is particularly true for AI workloads that require extensive data storage and fast retrieval times. Consequently, the adoption of quad-level cell (QLC) NAND technology, which offers higher density at a lower cost, is expected to increase. QLC SSDs, despite their slower write speeds compared to other NAND types, will gain traction due to their cost-effectiveness and suitability for AI-driven data storage needs. Datacenter NAND bit demand growth is expected to exceed 30% in 2025, after explosive growth of about 70% in 2024. (Learn More)

3. Capex Investment Shifts Heavily Towards DRAM and HBM

Driven by the surge in AI applications, capital expenditure (capex) in the memory market is increasingly being funneled towards DRAM, particularly HBM. DRAM capex is projected to rise nearly 20% year-over-year as manufacturers expand their production capacities to meet the growing demand. However, this shift has left minimal investment for NAND production, creating a potential supply-driven bottleneck in the market. Profitability in the NAND sector continues to improve, which could reignite investment in this area as we move into 2026. (Learn More)

4. Edge AI Begins to Emerge but Won’t Impact Until 2026

Edge AI, which brings AI processing closer to the data source on devices like smartphones and PCs, is anticipated to hit the market in 2025. However, the full impact of this technology won’t be felt until 2026. Devices with true, on-device AI capabilities are expected to launch in late 2025, but sales volumes are unlikely to be significant enough to influence the memory markets immediately. The real shift should occur in 2026 as edge AI becomes more widespread, driving demand for memory solutions tailored to these new capabilities. (Learn More)

5. Datacenter AI Focus Delays Traditional Server Refresh Cycles

The focus on AI-driven data centers has led to a delay in the refresh cycle for traditional server infrastructure. Many organizations are diverting resources to upgrade their AI capabilities, leaving conventional servers in need of updates. While this delay might be manageable in the short term, at some point, these servers will need to be refreshed, potentially creating a sudden surge in demand for DRAM and NAND. This delayed refresh cycle could result in a significant uptick in memory demand once it finally happens. (Learn More)

Spoiler: A Sudden Halt in AI Development Could Upset Everything

While AI is the primary driver behind these market expectations, it’s important to consider the potential for a sudden slowdown in AI development. Whether due to macroeconomic headwinds, diminishing returns on AI investments, or technical roadblocks in scaling AI models, a significant deceleration in AI progress would have profound negative implications for the memory markets. Such a halt would likely lead to a sharp decline in demand for HBM, DRAM, and high-capacity SSDs, disrupting the expected growth and investment patterns in these sectors. As such, while the memory markets are poised for substantial growth in 2025, they remain highly susceptible to the broader trajectory of AI advancements.

Also Read:

Semiconductor Industry Update: Fair Winds and Following Seas!

Samsung Adds to Bad Semiconductor News

Hot Chips 2024: AI Hype Booms, But Can Nvidia’s Challengers Succeed?

The Semiconductor Business will find a way!


Sondrel Redefines the AI Chip Design Process
by Mike Gianfagna on 10-01-2024 at 6:00 am

Designing custom silicon for AI applications is a particularly vexing problem. These chips process enormous amounts of data with a complex architecture that typically contains a diverse complement of heterogeneous processors, memory systems and various IO strategies. Each of the many subsystems in this class of chip will have different data traffic requirements. Despite all these challenges, an effective architecture must run extremely efficiently, without processor stalls or any type of inefficient data flow. The speed and power requirements for this type of design cannot be met without a highly tuned architecture. These challenges have kept design teams hard at work for countless hours, trying to find the optimal solution. Recently, Sondrel announced a new approach to this problem that promises to make AI chip design far more efficient and predictable. Let’s examine how Sondrel redefines the AI chip design process.

Architecting the Future

Sondrel recently unveiled an advanced modeling process for AI chip designs. The approach is part of the company’s forward-looking Architecting the Future family of ASIC architecture frameworks and IP. By using a pre-verified ASIC framework and IP, Sondrel reduces the risks associated with “from scratch” custom chip design. The advanced modeling process is part of this overall risk reduction strategy.

The approach uses accurate, cycle-based system performance modeling early in the design process. Starting early, before RTL development, begins the process of checking that the design will meet its specification. This verification approach continually evolves and can be used for the entire flow, from early specification to silicon. Using this unique approach with pre-verified design elements reduces risk and time to market. And thanks to the use of advanced process technology, power can also be reduced while ensuring performance criteria can be reliably met.

Digging Deeper

Paul Martin

I had the opportunity to meet with Paul Martin, Sondrel’s Global Field Engineering Director to get more details on how the new approach works. Paul has been with Sondrel for almost ten years. He was previously with companies such as ARM, NXP Semiconductors and Cadence, so he has a deep understanding of what it takes to do advanced custom chip design.

Paul explained that at the core of the new approach is a commercially available transaction-based simulator. Both Sondrel and the supplier of this simulator have invested substantial effort to take this flow well beyond the typical cycle-accurate use model. 

He explained that detailed, timing-accurate models of many IP blocks have been developed. These models essentially create accurate data traffic profiles for each element. Going a bit further, the AI workloads that will issue transactions to these IP blocks are analyzed to create a graphical representation of how transactions are initiated to the chip-level elements such as processors and memories.

Using this view of how the system is running, Paul further explained that a closed-loop simulation system is created that can feed results back to the compiler and the micro-architecture optimization tools for a particular NPU to optimize its performance, avoiding bottlenecks. This ability to model and optimize a system at the transaction level is unique and can be quite powerful.
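
Sondrel has not published the internals of this flow, so purely as an illustration of what transaction-level performance modeling means, here is a toy sketch of mine (all names and numbers are hypothetical): traffic generators issue timed transactions against a single shared memory port, and the worst queuing delay exposes when that port becomes a bottleneck.

```python
import heapq

def simulate(generators, service_ns):
    """Toy transaction-level model: each generator is (name, issue_interval_ns, count).
    A single shared memory port serves one transaction every service_ns."""
    events = []                                       # (issue_time, initiator_name)
    for name, interval, count in generators:
        for i in range(count):
            heapq.heappush(events, (i * interval, name))
    port_free, worst_wait = 0, 0
    while events:
        t, name = heapq.heappop(events)
        start = max(t, port_free)                     # wait if the port is busy
        worst_wait = max(worst_wait, start - t)
        port_free = start + service_ns
    return worst_wait

# Illustrative only: an NPU and a DMA engine sharing one memory port; the combined
# demand exceeds the port's throughput, so queuing delay grows over time.
print(simulate([("npu", 10, 100), ("dma", 25, 40)], service_ns=8))
```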

Paul went on to describe the custom workflow that has been built around the commercial simulator. This workflow allows the same stimulus models to be applied from architectural analysis to RTL design, to emulation and all the way to real silicon. Essentially, the transaction model can be applied all the way through the process to ensure the design maintains its expected level of performance and power. The elusive golden specification if you will.

Paul explained that by focusing on the architect rather than the software developer, a truly new approach to complex AI chip design is created. He went on to explain that this approach has been applied to several reference designs. He cited examples for video and data processing, edge IoT data processing, and automotive ADAS applications.

To Learn More

You can see the details of the recent Sondrel announcement here. There are also a couple of good pieces discussing Sondrel’s work in the automotive sector on SemiWiki here. And you can explore Sondrel’s overall Architecting the Future strategy here. And that’s how Sondrel redefines the AI chip design process. Exciting stuff.


Elevating AI with Cutting-Edge HBM4 Technology
by Kalar Rajendiran on 09-30-2024 at 10:00 am

HBM4 Compute Chiplet Subsystem

Artificial intelligence (AI) and machine learning (ML) are evolving at an extraordinary pace, powering advancements across industries. As models grow larger and more sophisticated, they require vast amounts of data to be processed in real-time. This demand puts pressure on the underlying hardware infrastructure, particularly memory, which must handle massive data sets with high speed and efficiency. High Bandwidth Memory (HBM) has emerged as a key enabler of this new generation of AI, providing the capacity and performance needed to push the boundaries of what AI can achieve.

The latest leap in HBM technology, HBM4, promises to elevate AI systems even further. With enhanced memory bandwidth, higher efficiency, and advanced design, HBM4 is set to become the backbone of future AI advancements, particularly in the realm of large-scale, data-intensive applications such as natural language processing, computer vision, and autonomous systems.

The Need for Advanced Memory in AI Systems

AI workloads, particularly deep neural networks, differ from traditional computing by requiring the parallel processing of vast data sets, creating unique memory challenges. These models demand high data throughput and low latency for optimal performance. High Bandwidth Memory (HBM) addresses these needs by offering superior bandwidth and energy efficiency. Unlike conventional memory, which relies on long off-package buses, HBM’s vertically stacked dies and direct processor interface minimize data travel distances, enabling faster transfers and reduced power consumption, making it ideal for high-performance AI systems.

How HBM4 Improves on Previous Generations

HBM4 significantly advances AI and ML performance by increasing bandwidth and memory density. With higher data throughput, HBM4 enables AI accelerators and GPUs to process hundreds of gigabytes per second more efficiently, reducing bottlenecks and boosting system performance. Its increased memory density, achieved by adding more layers to each stack, addresses the immense storage needs of large AI models, facilitating smoother scaling of AI systems.
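
As a rough sanity check on those throughput figures, per-stack HBM bandwidth is simply interface width times per-pin data rate. The sketch below uses the 1024-bit interface employed through HBM3E and the 9.2 Gbps HBM3E pin rate from Alphawave Semi's earlier announcement (linked under Also Read below); any HBM4 parameters would be assumptions, since final specification numbers are not given in this article.

```python
def hbm_bandwidth_gb_per_s(bus_width_bits, pin_rate_gbps):
    """Per-stack bandwidth in GB/s = (interface width x per-pin rate) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8

# HBM3E point from the Alphawave Semi announcement referenced below: 1024 bits x 9.2 Gbps.
print(hbm_bandwidth_gb_per_s(1024, 9.2))   # ~1178 GB/s, i.e. roughly 1.2 TB/s per stack
```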

Energy Efficiency and Scalability

As AI systems continue to scale, energy efficiency becomes a growing concern. AI training models are incredibly power-hungry, and as data centers expand their AI capabilities, the need for energy-efficient hardware becomes critical. HBM4 is designed with energy efficiency in mind. Its stacked architecture not only shortens data travel distances but also reduces the power needed to move data. Compared to previous generations, HBM4 achieves better performance-per-watt, which is crucial for the sustainability of large-scale AI deployments.

Scalability is another area where HBM4 shines. The ability to stack multiple layers of memory while maintaining high performance and low energy consumption means that AI systems can grow without becoming prohibitively expensive or inefficient. As AI applications expand from specialized data centers to edge computing environments, scalable memory like HBM4 becomes essential for deploying AI in a wide range of use cases, from autonomous vehicles to real-time language translation systems.

Optimizing AI Hardware with HBM4

The integration of HBM4 into AI hardware is essential for unlocking the full potential of modern AI accelerators, such as GPUs and custom AI chips, which require low-latency, high-bandwidth memory to support massive parallel processing. HBM4 enhances inference speeds, critical for real-time applications like autonomous driving, and accelerates AI model training by providing higher data throughput and larger memory capacity. These advancements enable faster, more efficient AI development, allowing for quicker model training and improved performance across AI workloads.

The Role of HBM4 in Large Language Models

HBM4 is ideal for developing large language models (LLMs) like GPT-4, which drive generative AI applications such as natural language understanding and content generation. LLMs require vast memory resources to store billions or trillions of parameters and handle data processing efficiently. HBM4’s high capacity and bandwidth enable the rapid access and transfer of data needed for both inference and training, supporting increasingly complex models and enhancing AI’s ability to generate human-like text and solve intricate tasks.

Alphawave Semi and HBM4

Alphawave Semi is pioneering the adoption of HBM4 technology by leveraging its expertise in packaging, signal integrity, and silicon design to optimize performance for next-generation AI systems. The company is evaluating advanced packaging solutions, such as CoWoS interposers and EMIB, to manage dense routing and high data rates. By co-optimizing memory IP, the channel, and DRAM, Alphawave Semi uses advanced 3D modeling and S-parameter analysis to ensure signal integrity, while fine-tuning equalization settings like Decision Feedback Equalization (DFE) to enhance data transfer reliability.
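
Decision Feedback Equalization itself is conceptually simple: subtract the interference contributed by symbols that have already been decided before slicing the current one. Below is a minimal NRZ sketch with made-up channel coefficients; it is my own illustration, not Alphawave Semi's implementation.

```python
def dfe_slice(samples, taps):
    """1-bit NRZ DFE: subtract post-cursor ISI estimated from previous decisions.
    taps[i] is the assumed ISI contribution of the decision made i+1 bits ago."""
    decisions = []
    for x in samples:
        isi = sum(t * d for t, d in zip(taps, reversed(decisions[-len(taps):])))
        d = 1 if (x - isi) > 0 else -1           # slicer after feedback subtraction
        decisions.append(d)
    return decisions

# Hypothetical channel adds 0.3 of the previous symbol; a single DFE tap cancels it.
tx = [1, -1, -1, 1, 1, -1]
rx = [tx[i] + (0.3 * tx[i - 1] if i else 0) for i in range(len(tx))]
print(dfe_slice(rx, taps=[0.3]) == tx)   # True
```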

Alphawave Semi also focuses on optimizing complex interposer designs, analyzing key parameters like insertion loss and crosstalk, and implementing jitter decomposition techniques to support higher data rates. The development of patent-pending solutions to minimize crosstalk ensures the interposers are future-proofed for upcoming memory generations.

Summary

As AI advances, memory technologies like HBM4 will be crucial in unlocking new capabilities, from real-time decision-making in autonomous systems to more complex models in healthcare and finance. The future of AI relies on both software and hardware improvements, with HBM4 pushing the limits of AI performance through higher bandwidth, memory density, and energy efficiency. As AI adoption grows, HBM4 will play a foundational role in enabling faster, more efficient AI systems capable of solving the most data-intensive challenges.

For more details, visit this page.

Also Read:

Alphawave Semi Unlocks 1.2 TBps Connectivity for HPC and AI Infrastructure with 9.2 Gbps HBM3E Subsystem

Alphawave Semi Tapes Out Industry-First, Multi-Protocol I/O Connectivity Chiplet for HPC and AI Infrastructure

Driving Data Frontiers: High-Performance PCIe® and CXL® in Modern Infrastructures


Podcast EP249: A Conversation with Dr. Jason Cong, the 2024 Phil Kaufman Award Winner
by Daniel Nenni on 09-27-2024 at 10:00 am

Dan is joined by Dr. Jason Cong, the Volgenau Chair for Engineering Excellence Professor at the UCLA Computer Science Department. He is the director of the Center for Domain-Specific Computing and the director of VLSI Architecture, Synthesis, and Technology Laboratory. Dr. Cong’s research interests include novel architectures and compilation for customizable computing, synthesis of VLSI circuits and systems, and quantum computing.

Dr. Cong will be recognized by the Electronic System Design Alliance and the Council on Electronic Design Automation (CEDA) of the IEEE with the 2024 Phil Kaufman Award at a presentation and banquet on November 6 in San Jose, California.

In this far-reaching discussion, Dan explores the many contributions Jason has made to the semiconductor industry. His advanced research in FPGA design automation from the circuit to system levels is discussed, along with the many successful companies he has catalyzed as a serial entrepreneur.

Dr. Cong is also inspiring a future generation of innovators through his teaching and research in areas such as quantum computing. He explores methods to inspire his students and the path to democratizing chip design, making it readily available to a wide range of new innovations.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Doug Smith of Veevx
by Daniel Nenni on 09-27-2024 at 6:00 am

Veevx Team 23

Douglas Smith has focused his career on optimizing advanced technologies for high volume ASIC applications. He has led elite design teams at Motorola SPS and then Broadcom for over 25 years, with 200+ successful tape-outs generating $10B+ in revenue. Douglas left Broadcom to self-fund a startup focused on advanced memory technologies. He has assembled an elite design and business team to provide differentiated memory solutions to his customers.

Tell us about your company

Veevx is a fabless semiconductor company created by a veteran team of engineering architects who left Broadcom’s central engineering group to form a company focused on technology innovation, building products for market needs. We were founded 2.5 years ago in Mesa, AZ, with 20 people split between the US and India.

What problems are you solving?

Our products bring cloud-level AI performance to mobile and edge devices. Processing typically performed in the cloud can be done directly on mobile or edge devices, allowing end users to run translation, AI assistance, and VR & AR functions locally. This provides low latency, a user-tailored experience, and privacy, along with longer battery life and higher security, for a better overall user experience.

We also provide technology to overcome the memory wall. AI functionality is driving the demand for high-performance local memory (SRAM). The manufacturing processes and limitations imposed by device physics restrict SRAM scaling as CMOS nodes shrink. This increases memory costs.

Customers want advances in memory technology to overcome the limitations of SRAM. We will soon sample our iRAM to customers. iRAM is an advanced memory chiplet that is packaged with our customers’ microcontroller. It gives their devices more memory, in the same footprint, at significant power savings.

How do you enable your customers?

We are focused on mobile and low power edge devices. Our product enables end customers to perform rich AI functions, typically done in the data center, on the devices. This eliminates the latency of going to the data center, enhances privacy because the data stays on device and in some cases, reduces cost and power.

How do you differentiate from others in this arena?

Veevx specializes in new memory technology and mixed signal compute. Many academic and R&D papers discuss the advantages of compute in memory and how mixed signal compute is required to reduce the processing power while also increasing the performance for AI accelerators. Our team has decades of experience productizing new technology into silicon and specializes in the area needed for an ultra-low power AI accelerator compute product.

What new features/technology are you working on?

High-performance, ultra-low power compute achieved by integrating mixed-signal compute directly into the memory architecture. We take our silicon-verified, ultra-low power MRAM, add innovations that increase the performance of the memory cells, and add mixed-signal compute to maximize compute-in-memory performance and energy efficiency. Our solution mitigates power and latency by avoiding data movement to the processor. Our mixed-signal compute is flexible enough to perform the mathematical operations used in AI at roughly 1/100 the power of traditional digital compute engines. We want to keep our accelerator flexible so that, as models continue to evolve, we will still be able to run their operations on it.

Can you tell us about your experience of being a Silicon Catalyst incubator company?

Being a Silicon Catalyst portfolio company for the past year and a half has been an invaluable experience for us. The incubator has accelerated our development by providing essential tools and foundry access through their in-kind partnerships, resources that are often out of reach for startups.

They have offered us a platform to showcase our products and innovations to a wide industry audience, which has been instrumental in increasing our visibility and credibility in the market.

Moreover, Silicon Catalyst’s extensive network of experienced advisors, relevant investors, and collaborative customers has opened doors to new opportunities and partnerships. This support system has not only enhanced our technological capabilities but also propelled our business growth.

How do customers normally engage with your company?

We are a fabless semiconductor company that sells chips in chiplet (bare die) or in packaged form.

Customers can engage with Veevx via direct inquiries on our website, at industry conferences, and through our business development team.

We also engage in research and development collaboration projects with customer engineering teams to customize our products to their specific needs.

Veevx US Team

We can be reached at one of the following:
Email: dmccarty@veevx.com
Web: https://veevx.com
LinkedIn: www.linkedin.com/company/veevx-inc/

Also Read:

CEO Interview: Adam Khan of Diamond Quanta

Executive Interview: Michael Wu, GM and President of Phison US

CEO Interview: Wendy Chen of MSquare Technology


Asia Driving Electronics Growth
by Bill Jewell on 09-26-2024 at 4:00 pm

Electronics Production 2024 September

Electronics production in the major developed countries has been showing slow growth or declines in 2024. United States electronics production three-month-average change versus a year ago (3/12 change) was 0.4% in July 2024, the slowest since the pandemic year of 2020. Growth has been slowing since averaging 6.5% in 2022 and 2.3% in 2023. Japan has gone from averaging 6.0% 3/12 growth in 2023 to a 2% decline in June 2024. The 27 countries of the European Union (EU27) have reported declining electronics production since May 2023, except for a 3.2% increase in May 2024. In June 2024, EU27 production was down 8.0%. United Kingdom (UK) production has been declining since September 2023, except for a 0.7% increase in February 2024. UK production was down 3.7% in July 2024.
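
For readers unfamiliar with the notation, the 3/12 change used throughout this piece is the three-month average compared with the same three-month average twelve months earlier. A quick sketch of the arithmetic on a made-up monthly production index:

```python
def three_twelve_change(monthly, i):
    """3/12 change at month i: 3-month average vs. the same 3-month average 12 months earlier."""
    recent = sum(monthly[i - 2 : i + 1]) / 3
    year_ago = sum(monthly[i - 14 : i - 11]) / 3
    return 100.0 * (recent / year_ago - 1)

# Hypothetical production index covering 15 months; the latest month is i = 14.
index = [100, 101, 99, 102, 103, 104, 105, 104, 106, 107, 108, 110, 111, 112, 113]
print(round(three_twelve_change(index, 14), 1))   # 12.0 (percent)
```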

In contrast, most developing Asian countries are experiencing strong growth in electronics production. Taiwan and South Korea are considered developed countries, but their electronics industries are still emerging. Beginning in mid-2023, these countries showed a turnaround in production.

Taiwan has been the strongest country in Asia, with 3/12 production (measured in New Taiwan dollars) moving from a 3.2% decline in June 2023 to an increase of 49% in July 2024. This growth has been driven by computers, with production in January through July 2024 doubling versus January-July 2023. Much of the computer boom can be attributed to AI servers. Market research firm MIC estimates Taiwan produces 90% of global AI servers. TrendForce projects the US dollar value of the AI server market will grow 69% in 2024.

Vietnam has also shown a strong turnaround in electronics production. The low in 3/12 change was negative 10.8% in May 2023. 3/12 growth was over 20% in June and July 2024. Vietnam has benefited from Samsung’s $22 billion investment in the country. Vietnam produces about half of Samsung’s smartphones.

South Korea’s production turnaround was more recent, with 3/12 production change negative in February through April 2024. Production turned positive in May 2024 at 3.2% and reached 23.8% growth in July 2024. Labor strikes this year at Samsung likely had an impact on production trends. The recent strong growth may be a temporary blip.

India is demonstrating healthy electronics production growth, with July 2024 3/12 growth of 14%. 3/12 change had been negative from October 2022 through March 2024. India has benefited from multinational companies increasing manufacturing in the country. Apple has begun manufacturing its latest generation of iPhones, the 16 series, in India. Apple plans to produce 25% of its iPhones in India by 2025, up from about 14% last year, shifting production from China. Lenovo announced this month it has started making AI servers in India for local consumption and export. The government of India projects the country will double its electronics manufacturing over the next five years.

China remains the dominant electronics manufacturer in Asia, but growth has slowed as companies shift manufacturing to other countries. China’s 3/12 production change turned positive in April 2023 and peaked at 13.8% in June 2024. August 2024 3/12 growth was 12.3%, slower than the July data for the other countries in the chart.

The trend of weak electronics production growth in the U.S., Europe, and Japan will likely continue for at least the next few years. Asia will remain the growth driver. Political and economic pressures affecting China will lead to continuing production shifts to other Asian nations. India seems poised for strong growth due to its huge labor force, low labor rates, and significant investment by multinational electronics companies.

Semiconductor Intelligence is a consulting firm providing market analysis, market insights and company analysis for anyone involved in the semiconductor industry – manufacturers, designers, foundries, suppliers, users or investors. Please contact me if you would like further information.

Bill Jewell
Semiconductor Intelligence, LLC
billjewell@sc-iq.com

Also Read:

Robust Semiconductor Market in 2024

Semiconductor CapEx Down in 2024, Up Strongly in 2025

Automotive Semiconductor Market Slowing


Automating Reset Domain Crossing (RDC) Verification with Advanced Data Analytics
by Kalar Rajendiran on 09-26-2024 at 10:00 am

RDC Verification using Data Analysis Techniques

The complexity of System-on-Chip (SoC) designs continues to rise at an accelerated rate, with design complexity doubling approximately every two years. This increasing complexity makes verification a more difficult and time-consuming task for design engineers. Among the key verification challenges is managing reset domain crossing (RDC) issues, particularly in designs that utilize multiple asynchronous resets. RDC occurs when data is transferred between different reset domains.

While the use of Electronic Design Automation (EDA) tools for clock domain crossing (CDC) verification has become a common practice, the verification of RDCs—an equally important aspect—has only recently gained prominence. RDC verification is necessary to ensure data stability between asynchronous reset domains. Failure to do so can hide timing and metastability issues in an SoC, resulting in unpredictable behavior during its operation.

Siemens EDA recently published a whitepaper addressing this important topic and how advanced data analytics techniques can be leveraged to achieve design verification of RDCs.

Challenges of RDC Verification

RDC verification tools are essential for detecting potential metastability issues in designs, but they generate vast amounts of data—often millions of RDC paths. Engineers must manually analyze this data to find common root causes for violations and apply appropriate constraints. This process is both time-consuming and error-prone, often leading to multiple design iterations that delay project timelines.

One of the primary challenges is the sheer volume of RDC violations reported, especially in the early stages of design when no constraints have been applied. The lack of proper constraints related to reset ordering, reset grouping, and isolation signals can lead to false violations and unnecessary debugging effort. Design teams can spend weeks analyzing these violations manually, often overlooking critical issues or spending too much time on trivial paths.

Addressing RDC Challenges with Data Analytics

The manual approach to RDC verification is no longer sufficient given the complexity of modern SoC designs. Advanced data analytics and supervised data processing techniques offer a promising solution. These techniques can quickly analyze the vast amounts of data generated by RDC tools, identify patterns, and suggest optimal setup constraints. By recognizing common root causes of violations, such as incorrect reset domain grouping or reset ordering issues, data analytics techniques provide recommendations for constraints that can be applied to resolve multiple RDC paths at once. These constraints may include stable signal declarations, reset ordering specifications, reset domains, isolation signals, and constant declarations.

Figure 1. RDC verification using data analysis techniques.
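
The whitepaper does not spell out its algorithms, but the flavor of the approach can be sketched in a few lines (a hypothetical illustration of mine, not Siemens EDA's implementation): bucket the raw violation paths by a shared attribute, here the transmit/receive reset pair, and only surface a constraint suggestion when enough paths share that root cause, along the lines of the 200-path threshold used in the case study below.

```python
from collections import Counter

def suggest_constraints(rdc_paths, threshold=200):
    """rdc_paths: iterable of (tx_reset, rx_reset) pairs from the raw violation report.
    Returns candidate reset-ordering/grouping suggestions covering >= threshold paths."""
    counts = Counter(rdc_paths)
    return [
        {"tx_reset": tx, "rx_reset": rx, "paths_covered": n,
         "suggestion": f"review ordering/grouping of {tx} -> {rx}"}
        for (tx, rx), n in counts.most_common()
        if n >= threshold
    ]

# Hypothetical report: one dominant reset pair accounts for most of the noise.
report = [("RST1", "RST2")] * 450 + [("RST3", "RST2")] * 120
for s in suggest_constraints(report):
    print(s)   # only the 450-path RST1 -> RST2 bucket clears the threshold
```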

Key Recommendations for Improving RDC Verification

Several specific recommendations are identified in the whitepaper, to reduce RDC violations through automated data analysis.

Reset Ordering: Ensuring that the receiver flop’s reset is asserted before the transmitter flop’s reset can prevent metastability. If proper ordering constraints are not defined, RDC tools may flag multiple violations due to this common issue. For example, if reset RST2 (receiver) is asserted before RST1 (transmitter), it ensures that metastability does not propagate downstream.

Synchronous Reset Domains: RDC issues arise when reset signals for the source and destination registers are asynchronous. Grouping resets into synchronous reset domains during setup reduces the number of reported crossings.

Directive Specifications: Defining valid constraints for specific design scenarios can prevent unnecessary RDC violations. For example, if a receiving register’s clock is off when a transmitter register’s reset is asserted, there is no risk of metastability, and the tool should not report a violation. Neglecting such constraints leads to noisy results.

Stable Signals: Some signals within the design may not contribute to metastability despite being part of asynchronous reset domains. If these are not marked as stable, they will be incorrectly flagged as potential violations.

Isolation Signals: Isolation techniques can prevent metastability by isolating data transfer between reset domains during asynchronous reset assertion. Properly specified isolation constraints reduce the number of RDC paths requiring manual review.

Non-Resettable Receiver Registers: In some cases, non-resettable registers (NRR) may not pose a metastability risk if a downstream register in the same reset domain exists. Failing to specify such conditions leads to false violations.

Case Study: Data Analytics in Action

A case study was conducted to evaluate the effectiveness of using data analytics in RDC verification. The design in question consisted of 263,657 register bits, multiple clock domains, and nine reset domains. Initial RDC verification runs with manually applied constraints identified approximately 8,000 RDC paths.

After applying advanced data analytics techniques, a consolidated report was generated, recommending several constraints. These constraints addressed reset ordering, data isolation, and synchronous reset domain grouping, among other issues. Following the application of these recommendations, the number of RDC violations dropped from 8,000 to 2,732—a more than 60% reduction in violations.

The use of a threshold value of 200 (indicating constraints would be recommended only if they impacted at least 200 paths) helped streamline the process, focusing on high-impact issues and minimizing noise. The time to reach RDC verification closure was reduced from ten days to under four days, showcasing the significant time savings from data-driven analysis.

Results and Impact

The case study demonstrated that the application of data analytics to RDC verification can lead to a significant reduction in unsynchronized crossings and violations. By systematically identifying root causes and applying targeted constraints, verification engineers were able to resolve up to 60% of RDC violations without manual intervention. This reduction in violations accelerated the verification closure process and improved overall design quality. Additionally, the flexibility provided by the analytics approach—allowing engineers to focus on high-impact suggestions—streamlined the debugging process and ensured that effort was invested in solving critical design issues.

The table below shows the results from applying data analytics techniques to RDC verification of five different designs.

Summary

As SoC designs grow in complexity, traditional manual RDC verification methods are not scalable. By incorporating advanced data analytics into the verification process, engineers can significantly reduce closure time, improve result quality, and avoid costly silicon respins. These techniques not only accelerate root cause identification but also provide actionable insights through constraint recommendations that target common design issues. This automated approach ensures that real design bugs are addressed early, reducing metastability risks and strengthening the final SoC design. Integrating these methods into existing verification flows promises to save time and effort while delivering higher-quality, error-free designs.

The entire whitepaper can be downloaded from here.

Also Read:

Smarter, Faster LVS using Calibre nmLVS Recon

Siemens EDA Offers a Comprehensive Guide to PCIe® Transport Security

Calibre DesignEnhancer Improves Power Management Faster and Earlier


TSMC 16th OIP Ecosystem Forum First Thoughts
by Daniel Nenni on 09-26-2024 at 6:00 am

TSMC Advanced Technology Roadmap 2024

Even though this is the 16th OIP event, please remember that TSMC has been working closely with EDA and IP companies for 20+ years with reference flows and other design enablement and silicon verification activities. Officially, the father of OIP is Dr. Morris Chang, who named it the Grand Alliance. However, Dr. Cliff Hou is the one who actually created the OIP, which is now the largest and strongest ecosystem in the history of semiconductors.

I spent a good portion of my career working with EDA and IP companies on foundry partnerships, as well as with foundries as a customer strategist. In fact, I still do, and it is one of the most rewarding experiences of my career. Hsinchu was my second home for many years and the hospitality of the Taiwan people is unmatched. That same hospitality is a big part of the TSMC culture and part of the reason why they are the most trusted technology and capacity provider.

Bottom line: If anyone thinks this 20+ years of customer-centric collaboration can be replicated or reproduced, it cannot. The OIP is a moving target; it expands and gets stronger every year. An ecosystem is also driven by the success of the company, and at no point in history has TSMC been MORE successful than today, in my opinion.

We will be covering the event in more detail next week but I wanted to share my first thoughts starting with a quote from a blog published yesterday by Dan Kochpatcharin, Head of Ecosystem and Alliance Management Division at TSMC. I met Dan 20 years ago when he was at Chartered Semiconductor. For the last 17 years he has been at TSMC where he started as Deputy Director of the TSMC IP Alliance (working for Cliff Hou) which is now a big part of the TSMC OIP.

Advancing 3D IC Design for AI Innovation by Dan Kochpatcharin

“Our collaboration with TSMC on advanced silicon solutions for our AWS-designed Nitro, Graviton, Trainium, and Inferentia chips enables us to push the boundaries of advanced process and packaging technologies, providing our customers with the best price performance for virtually any workload running on AWS.” – Gary Szilagyi, vice president, Annapurna Labs at AWS

Readers of the SemiWiki Forum will get this inside joke and if you think this quote from AWS is a coincidence you are wrong. C.C. Wei has a very competitive sense of humor!

Dr. L.C. Lu (Vice President of Research & Development / Design & Technology Platform) did the keynote, which was quite good. I first met L.C. when he was in charge of the internal TSMC IP group working for Cliff Hou. He is a very smart, no-nonsense guy who is also a great leader. Coincidentally, L.C. and C.C. Wei both have Ph.D.s from Yale.

Some of the slides were very similar to the earlier TSMC Symposium slides which tells you that TSMC means what it says and says what it means. There were no schedule changes, it was all about implementation, implementation, and implementation.

L.C. did an interesting update on Design-Technology Co-Optimization (DTCO). I first heard of DTCO in 2022 and it really is the combination of design and process optimization. I do know customers who are using it but this is the first time I have seen actual silicon results. Remember, this is two years in the making for N3 FinFlex.

The numbers L.C. shared were impressive. In order to do real DTCO a foundry has to have both strong customer and EDA support and TSMC has the strongest. For energy efficiency (power savings) N3 customers are seeing 8%-20% power reductions and 6%-38% improvement in logic density depending on the fin configuration.

L.C. also shared DTCO numbers for N2 NanoFlex and the coming A16 SPR (Super Power Rail), which were all in the double digits (11%-30%). I do know quite a few customers who are designing to N2; in fact, I am told it is just about all of TSMC’s N3 customers. It will be interesting to see more customer numbers next year.

L.C. talked about packaging as well, which we will cover in another blog, but let me tell you this: By the end of 2024 CoWoS will have more than 150 tape-outs from more than 25 different companies! And last I heard TSMC CoWoS capacity will more than quadruple from 2023 levels by the end of 2026. Packaging is one of the reasons why I feel that the semiconductor industry has never been more exciting than it is today, absolutely!

Also Read:

TSMC OIP Ecosystem Forum Preview 2024

TSMC’s Business Update and Launch of a New Strategy