
Silicon Catalyst: Searching for the Next Great Start-up

by Daniel Nenni on 01-04-2026 at 2:00 pm


Silicon Catalyst has emerged as a distinctive force in the global start-up ecosystem, positioning itself not merely as an accelerator, but as a launchpad for deep-technology innovation. Focused primarily on semiconductor and hardware-based start-ups, Silicon Catalyst addresses a long-standing gap in the venture landscape: while software companies can scale quickly with limited capital, hardware and silicon ventures often require years of development, significant funding, and specialized expertise. In its search for the next great start-up, Silicon Catalyst blends technical rigor, industry partnerships, and long-term vision to nurture companies capable of reshaping entire industries.

At the heart of Silicon Catalyst’s mission is the recognition that silicon remains foundational to modern technology. From artificial intelligence and autonomous vehicles to healthcare devices and clean energy systems, advances in semiconductors drive progress across sectors. Yet, despite their importance, early-stage silicon start-ups face daunting barriers: high fabrication costs, long design cycles, and limited access to manufacturing resources. Silicon Catalyst seeks to remove these obstacles by offering selected start-ups unparalleled access to tools, mentorship, and industry networks that would otherwise be out of reach.

The search for the next great start-up begins with a strong emphasis on technical differentiation. Silicon Catalyst looks for founding teams with deep domain expertise and novel approaches to chip design, materials, or system architecture. Incremental improvements are not enough; the organization prioritizes ideas that promise step-change performance, cost efficiency, or energy savings. Whether it is a breakthrough in photonic computing, advanced sensors, or specialized AI accelerators, the goal is to identify technologies with the potential to become industry standards rather than niche solutions.

Equally important is the quality of the founding team. Silicon Catalyst understands that even the most promising technology can fail without capable leadership. Successful applicants typically combine technical excellence with entrepreneurial resilience. Founders must demonstrate the ability to learn quickly, adapt to market feedback, and navigate the complex relationships inherent in the semiconductor supply chain. The accelerator’s mentors, many of whom are seasoned executives, engineers, and investors, play a crucial role in shaping these teams, helping them avoid common pitfalls and make informed strategic decisions.

What sets Silicon Catalyst apart is its ecosystem-driven model. Instead of relying solely on cash investments, it offers start-ups access to an extensive network of partners, including semiconductor companies, EDA tool providers, foundries, and packaging firms. This in-kind support dramatically reduces development costs and accelerates time to market. For start-ups, this can mean the difference between an idea that remains on paper and a product that reaches customers. For Silicon Catalyst, it ensures that the start-ups it supports are grounded in real-world feasibility, not just theoretical promise.

The search for the next great start-up is also shaped by long-term thinking. Silicon Catalyst recognizes that hardware innovation does not conform to the rapid timelines typical of software ventures. As a result, it encourages patience, from both founders and investors, while maintaining rigorous milestones and accountability. This balanced approach allows start-ups to tackle ambitious problems without being forced into premature commercialization.

Bottom line: Silicon Catalyst’s pursuit of the next great start-up is about more than financial returns. It is about advancing the technological infrastructure that underpins modern society. By empowering silicon-focused entrepreneurs, the organization helps ensure continued innovation in areas critical to economic growth, national competitiveness, and global sustainability. In doing so, Silicon Catalyst is not just searching for the next success story—it is actively shaping the future of technology itself.

Submit Application Here

Also Read:

Revitalizing Semiconductor StartUps

Silicon Catalyst on the Road to $1 Trillion Industry

The 2025 Semi Industry Forum: On the Road to a $1 Trillion Industry


CEO Interview with Artem Golubev of testRigor

by Daniel Nenni on 01-04-2026 at 12:00 pm


Artem Golubev is the founder and CEO of testRigor, a company revolutionizing software test automation through AI-powered plain English testing. With a mission to eliminate the massive maintenance overhead that plagues traditional testing tools while simultaneously improving test coverage, testRigor has enabled hundreds of companies, including Netflix, Splunk, Business Wire, and Koch Industries, to build test automation 15X faster and spend 200X less time maintaining it. I recently had the opportunity to sit down with Artem to discuss testRigor’s innovative approach to solving one of the software industry’s most persistent challenges.

Tell us about your company.

For almost a decade, I’ve been on a mission to reduce the amount of human effort spent on testing AND improve test coverage at the same time. We built testRigor to solve the problems that prevent people from achieving 100% test automation.

The fundamental issue is that many QAs make the mistake of using tools like Selenium simply because they’re free and open-source. However, Selenium introduces a gigantic maintenance overhead that is so huge that it prevents companies from being able to build more automation because they’re drowning in test maintenance. The problem is that with Selenium, people test how engineers built things yesterday rather than how the application works from an end-user’s perspective.

testRigor was born to empower ANYONE to use free-flowing, plain English to build test automation. You don’t need to know how to code. Moreover, there is almost no maintenance since the specifications are in plain English, purely from the end-user’s perspective. Testing can be as easy as writing: “find and select a Kindle” and “add it to the cart.” Our AI-driven platform translates these high-level instructions into specific test steps automatically.

We strongly believe in our mission that technology can do MUCH more for testing compared to what is available on the market today. Our goal is to allow our customers to have as valuable a test suite as possible with as little effort as possible.

What problems are you solving?

The software testing industry faces three critical problems that we’re addressing head-on.

First is the maintenance nightmare. Traditional script-based automation tools require constant updates whenever there are changes to the application’s HTML structure, IDs, or XPath selectors. Companies spend 90-99% of their automation time on maintenance rather than creating new tests. This is simply unsustainable. With testRigor, because tests are written in plain English from the end-user’s perspective, they withstand changes in implementation details. When you write “click on the Submit button,” it doesn’t matter if developers change the button’s ID or CSS class; the test continues to work.

Second, there’s a massive skills gap. Traditional automation requires specialized programming knowledge, which means companies either need to hire expensive automation engineers or watch their manual QA teams sit on the sidelines. This creates a bottleneck where only a small percentage of test cases ever get automated. testRigor democratizes test automation by allowing product managers, manual testers, and business analysts to create and understand automated tests using plain English.

Third is the coverage problem. Most companies struggle to get beyond 30-40% test automation coverage because the maintenance burden becomes overwhelming. We’ve seen Fortune 1000 companies go from 34% automation coverage to 91% in under 9 months using testRigor with only their manual QA team. That’s the kind of transformation we enable.

What application areas are your strongest in?

testRigor excels in several key areas that make us the only end-to-end test automation tool companies need.

Our strongest area is cross-system end-to-end testing. We can build tests spanning web, mobile (both native and hybrid for iOS and Android), desktop applications, APIs, Salesforce, ServiceNow, Microsoft Dynamics, SAP, and any other third-party systems, all in one simple test. This is unique in the market. For instance, you can create a test that starts on a web application, sends an email, verifies the email content, clicks a link in that email, continues the flow on a mobile app, and validates API responses all in plain English within a single test case.

We’re particularly strong with form-based UIs and functionality that has predictable input/output. We excel at test cases that require multiple users to interact with the same flow, whether via email, SMS, or instant messages. Our platform is ideal for applications with constantly changing code and HTML IDs, where traditional automation fails spectacularly.

What keeps your customers up at night?

Our customers lose sleep over several critical concerns, and we’ve designed testRigor to address each one.

The biggest worry is production bugs reaching customers. Companies know they should have comprehensive test coverage, but with traditional tools, building and maintaining that coverage requires resources they don’t have. They’re caught in a terrible dilemma: invest heavily in test automation that will consume endless maintenance time, or ship with inadequate testing and risk customer-impacting bugs. With testRigor, they can finally achieve comprehensive coverage without the maintenance penalty.

Release velocity is another major concern. In today’s competitive market, companies need to ship features fast, but they’re terrified of breaking existing functionality. Traditional regression testing is a bottleneck; either you wait days for manual testing, or you skip tests and cross your fingers. Our customers use testRigor to run comprehensive regression suites in parallel, getting results in minutes instead of hours or days. This enables true continuous deployment.

Technical debt in testing is a silent killer. Many companies have invested years in Selenium-based automation, only to find themselves with brittle test suites that break with every release. They’re spending more time fixing tests than finding bugs. Migration seems daunting, but companies like Enumerate saved $180,000 by switching to testRigor, avoiding the Selenium setup costs and not needing to hire specialized automation engineers.

What new features/technology are you working on?

We’re continuously pushing the boundaries of what’s possible with AI-powered test automation, and we have several exciting developments in the pipeline.

Our generative AI capabilities are evolving rapidly. We recently launched features that allow users to generate entire test cases by simply providing an app description. The AI analyzes the application and creates comprehensive test scenarios automatically. We’re enhancing this to handle increasingly complex applications and edge cases. Users can also generate tests from their existing manual test documentation: they paste in their test steps, and testRigor converts them into executable automation.

We’re expanding our AI-powered testing capabilities specifically for LLM-based applications. As more companies build chatbots, virtual assistants, and other AI-driven features, they need ways to validate these systems. We’re developing advanced natural language validation that can assess whether AI responses are appropriate, detect bias or sensitive information leakage, and validate sentiment across diverse inputs.

Self-healing test capabilities are being enhanced. Our AI already automatically adapts tests when UI elements move or change, but we’re making this even more intelligent. The system will soon provide predictive insights about which tests might be affected by upcoming code changes, allowing teams to proactively review and adjust tests before running them.

How do customers normally engage with your company?

We’ve designed multiple pathways for customers to experience testRigor’s value, because we understand that different organizations have different needs and evaluation processes.

Many customers start with our forever-free public account. This allows a single user to explore testRigor with unlimited test cases, though tests are publicly visible. It’s perfect for individuals who want to experiment with the technology and see the plain English approach in action. The free tier includes a limited feature set and community support, but it’s enough to understand testRigor’s core value proposition.

For teams ready to get serious, we offer private tier plans starting at $99/month for a single user with 1,000 test cases. This provides access to our extended feature set and customer support. Our mid-tier plans, starting at $900 per month, offer unlimited users and test cases – ideal for growing teams that need to scale their automation efforts. Enterprise plans provide custom pricing with additional features like SSO, SLA guarantees, and on-premise deployment options.

The typical engagement starts with a personalized demo. A testRigor specialist walks prospects through the platform, often using the customer’s own application to demonstrate real value. We can usually show meaningful results within the first session – writing actual tests in plain English that immediately work against their system. This hands-on approach resonates with teams who are tired of lengthy evaluation processes with traditional tools.

Contact testRigor.

Also Read:

CEO Interview with Gopi Sirineni of Axiado

CEO Interview with Masha Petrova of Nullspace

CEO Interview with Eelko Brinkhoff of PhotonDelta


Podcast EP325: How KIOXIA is Revolutionizing NAND FLASH Memory

by Daniel Nenni on 01-02-2026 at 10:00 am

Daniel is joined by Doug Wong, senior member of the technical staff at KIOXIA America, where he has contributed to the advancement of memory technologies since 1993. He began his career with KIOXIA in the company’s Memory Division (then part of Toshiba America) and has since focused on a broad range of memory solutions, including PSRAM, SRAM, MROM, EPROMs, NOR flash, and NAND flash. Doug has been an active contributor to industry standards as a member of the JEDEC JC42.4 committee since 1996.

Dan explores the broad range of innovations KIOXIA has made in the development of 3D NAND FLASH memory with Doug. The technology innovations to produce 3D memories using multiple wafers are discussed, along with the work KIOXIA has done with JEDEC to advance the technology for the industry. Doug also describes the impact these advances have had on many growth industries, including AI and automotive. He also describes the newer generations of the technology that are emerging, along with the live demonstrations presented at several major shows.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Revolutionizing Hardware Design Debugging with Time Travel Technology

by Daniel Nenni on 01-02-2026 at 6:00 am


In the semiconductor industry, High-Level Synthesis (HLS) and SystemC have become essential tools, allowing engineers to model complex hardware designs using familiar C/C++ constructs. Yet, despite the widespread adoption of these languages, the debugging workflows in hardware development lag far behind those in software engineering. Traditional methods rely heavily on print statements, logs, waveform viewers, and iterative trial-and-error, often leading to frustration when bugs appear intermittently or in third-party libraries. This is where time travel debugging changes everything.

Time travel debugging, as pioneered by tools like Undo, introduces a powerful paradigm: record, replay, and resolve. Instead of repeatedly rerunning a failing simulation in hopes of reproducing a bug, engineers record the entire execution of a Linux process from the process level down to individual CPU instructions. This recording captures every deterministic and nondeterministic event, including system calls, I/O, timing functions, and multithreaded interactions. Once a crash or failure occurs, the tool automatically stops recording, preserving the exact state at the point of failure.

The magic happens during replay. Engineers load the portable recording into the debugger and navigate freely forwards and backwards in time. If a crash is at the end of the recording, simply jump there and step backward from symptom to root cause. Traditional forward-only debuggers like GDB force users to restart runs repeatedly, but time travel eliminates guesswork. Commands mirror GDB’s familiar syntax with reverse counterparts: reverse-step, reverse-next, reverse-finish, and reverse-continue. A particularly powerful feature is “last,” which instantly jumps to the exact moment a variable or memory location was last modified—ideal for tracking memory corruption or race conditions.
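To make the record-then-query idea concrete, here is a toy sketch in Python. It is not how Undo works internally and does not use any Undo API; the `Recording` class and its `record`, `last`, and `state_at` methods are hypothetical names invented for illustration. It only shows why a log of writes lets a debugger answer "when was this variable last modified?" by scanning backward from the failure.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    """Toy stand-in for a recording: an append-only log of (step, name, value) writes."""
    writes: list = field(default_factory=list)

    def record(self, step, name, value):
        # Steps are assumed to be recorded in increasing order.
        self.writes.append((step, name, value))

    def last(self, name, before_step):
        """Scan backward and return (step, value) of the latest write to `name`
        at or before `before_step`, or None if it was never written."""
        for step, var, value in reversed(self.writes):
            if var == name and step <= before_step:
                return step, value
        return None

    def state_at(self, step):
        """Replay the log forward to rebuild variable values just after `step`."""
        state = {}
        for s, var, value in self.writes:
            if s > step:
                break
            state[var] = value
        return state


rec = Recording()
rec.record(1, "index", 0)
rec.record(2, "buffer_len", 8)
rec.record(3, "index", 1)          # the suspicious write
rec.record(4, "result", "garbage")

crash_step = 4
print(rec.last("index", crash_step))  # (3, 1)
print(rec.state_at(2))                # {'index': 0, 'buffer_len': 8}
```

A real recording captures far more than variable writes (system calls, thread scheduling, I/O), but the backward query has the same shape: start at the symptom and walk toward the cause.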

In a live demonstration involving a SystemC testbench with multiple libraries, a subtle off-by-one error caused a failure: code intended to read the zero-index bit of a string incorrectly accessed the first bit, yielding garbage output. Using the recording, an AI assistant (Claude) interfaced with Undo via a custom library, autonomously navigated backward, set bookmarks, executed reverse commands, and pinpointed the exact faulty array access in minutes—without any manual intervention.

This approach shines in complex scenarios common to hardware modeling:
  • Race conditions: Multithreaded SystemC simulations often exhibit nondeterministic behavior. The “last” command, combined with reverse-continue, reveals which thread overwrote shared memory and when, exposing missing locks without recompilation.
  • Deadlocks: Recordings capture all thread states, allowing engineers to trace blocking calls across time.
  • Intermittent failures: By integrating recording into regression pipelines, failing tests automatically generate recordings only when assertions fail, ensuring reproducible evidence is ready the next morning.

Undo also addresses hardware engineers’ needs with a waveform viewer that generates standard waveforms from recordings. Clicking any signal jumps directly to the corresponding source code line in the debugger, bridging the gap between high-level C++ models and low-level signal behavior.

Performance overhead is minimal for computational code, often near full speed, though I/O-heavy or highly nondeterministic workloads incur some slowdown due to logging external inputs. The tool requires Linux with modern Intel/AMD processors but needs no code changes, debug builds, or instrumentation.

Compared to alternatives like the open-source rr project (great for academic use) or Microsoft’s Time Travel Debugging in Visual Studio, Undo offers production-grade reliability, multithreading support, and seamless integration with modern EDA workflows.

Engineers report that debugging complex SystemC models traditionally takes at least a day, often involving consultations with library vendors or code owners. Time travel debugging reduces this by 4x or more, democratizing debugging: junior engineers can trace issues in unfamiliar codebases by simply following data flow backward. This accelerates verification, improves coverage, shortens time-to-market, and preserves team sanity.

Bottom line: In an industry racing toward ever-more-complex designs, adopting time travel debugging isn’t just an upgrade, it’s a necessity. Tools like Undo bring software’s most powerful debugging techniques to hardware, empowering engineers to resolve bugs faster, more reliably, and with less frustration.

Contact Undo

Also Read:

Taming Concurrency: A New Era of Debugging Multithreaded Code

Video EP7: The impact of Undo’s Time Travel Debugging with Greg Law

CEO Interview with Dr Greg Law of Undo


Addressing Silent Data Corruption (SDC) with In-System Embedded Deterministic Testing

by Daniel Nenni on 01-01-2026 at 10:00 am


Silent Data Corruption (SDC) represents a critical challenge in modern semiconductor design, particularly in high-performance computing environments like AI data centers. As highlighted in a collaborative presentation by Broadcom Inc. and Siemens EDA at the 2025 TSMC OIP event, SDC occurs when hardware defects cause erroneous computations without triggering detectable errors, leading to subtle yet devastating failures. In one customer experiment involving a 54-day training run on 16,384 GPUs, 419 unexpected interruptions were reported, with 6 attributed directly to SDC. Though rare, at about 1.4% of those interruptions, these incidents can disrupt mission-critical operations, such as AI model training, where reliability is paramount.
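The 1.4% figure follows directly from the numbers quoted above. A quick check in Python, using only the values reported in the presentation summary (the variable names are ours):

```python
# Figures quoted in the article for the 54-day training run.
interruptions = 419        # unexpected interruptions reported
sdc_interruptions = 6      # interruptions attributed directly to SDC
days = 54                  # length of the training run

sdc_share_pct = 100 * sdc_interruptions / interruptions
interruptions_per_day = interruptions / days

print(f"SDC share of interruptions: {sdc_share_pct:.1f}%")       # 1.4%
print(f"Interruptions per day:      {interruptions_per_day:.1f}")  # 7.8
```

So the run averaged nearly eight interruptions a day overall, of which SDC accounted for a small but non-zero share.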

The presentation underscores the industry-wide nature of SDC, driven by shrinking process nodes and increasing chip complexity. Defects that evade manufacturing tests may manifest in-field due to aging, voltage fluctuations, or thermal stress. Traditional testing methods fall short here, as they require device removal for diagnostics, which is impractical in deployed systems. To combat this, the teams advocate for in-system testing capabilities that allow periodic checks without downtime. Running ATPG patterns directly in the field detects latent defects that could precipitate SDC, ensuring system integrity. For AI applications, this means integrating test suites that can be executed routinely, preventing costly interruptions. Moreover, new patterns tailored to SDC can be deployed remotely, extending device lifespan without physical intervention.

Siemens’ In-System Test (IST) solution emerges as a key enabler. Built on the Streaming Scan Network (SSN), IST interfaces with embedded deterministic test (EDT) structures to deliver ATPG patterns efficiently. The IST controller drives the SSN’s parallel interface, supporting high-bandwidth data transfer via protocols like APB or AXI. In Broadcom’s implementation, IST was adapted for an EDT-based design with a Streaming Scan Host at the chip level. The controller resides at the top level, loading patterns into local SRAM via an on-chip CPU. Block-level EDT patterns, originally for production testing, are retargeted to IST inputs, allowing selective testing of targeted blocks while maintaining functional operation elsewhere.

Implementation brought several design challenges to the fore. Functional isolation is paramount: “functional” blocks (e.g., CPU subsystems) must remain active to load and execute IST operations, while “targeted” blocks switch to scan mode for testing. This requires isolating scan inputs to prevent interference. All functional block inputs that could disrupt IST, such as interrupts or AXI signals, must be held in a “quiet” state. Outputs from targeted blocks, which toggle during capture, are gated to avoid propagating noise. Broadcom addressed this by inserting isolation blocks and enabling Test Data Registers for control.

Clock splitting posed another hurdle. Broadcom’s methodology places On-Chip Clock controllers (OCC) at the chip top due to custom clocking. Functional blocks need free-running clocks, but targeted ones require OCC activation for scan shifts. Solutions included branching pre-OCC clocks for functional paths or adding secondary OCCs for targeted branches, ensuring synchronized yet independent clock domains.

Verification and Static Timing Analysis added complexity. Typically, STA modes separate functional and Design-for-Test (DFT) paths, but IST demands a hybrid “merged” mode where some blocks are functional and others in DFT. The Siemens tool provides verification collaterals like transaction files, C code, and SystemVerilog tasks for Design Verification (DV) environments. Testing occurs on post-DFT netlists, incorporating boot sequences, which extends runtime. Close collaboration between DV and DFT teams was essential for deliverables and debugging handshakes.

Results from the APB-based IST implementation demonstrate feasibility. With a 32-bit wide subordinate interface and SSN data bus, hardware overhead was modest: the IST Controller (ISTC) added 200 flops and 5,000 normalized combinational logic units, while SSH contributed 1,000 flops and 30,000 units. Five intest modes were run for 2,500 patterns, using 2 MB on-chip SRAM (about 0.5 million 32-bit words). Pattern storage ranged from 165,000 to 260,000 words per mode, with counts of 22-35 patterns. Overall, ~1.9 million 32-bit words were managed, with 4 loads per mode, showcasing efficient compression and bandwidth utilization.
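As a sanity check on the storage and overhead figures above, the sketch below converts the SRAM size into 32-bit words and totals the reported controller overhead. It assumes "2 MB" means 2 × 1024 × 1024 bytes, which the source does not state, and the variable names are ours.

```python
# Assumption: "2 MB" is a binary megabyte count (2 * 1024 * 1024 bytes).
SRAM_BYTES = 2 * 1024 * 1024
WORD_BYTES = 4  # one 32-bit word

sram_words = SRAM_BYTES // WORD_BYTES
print(sram_words)  # 524288, i.e. about 0.5 million 32-bit words

# Hardware overhead as reported: flops and normalized combinational logic units.
istc = {"flops": 200, "logic_units": 5_000}
ssh = {"flops": 1_000, "logic_units": 30_000}

total_flops = istc["flops"] + ssh["flops"]
total_logic = istc["logic_units"] + ssh["logic_units"]
print(total_flops, total_logic)  # 1200 35000
```

The result agrees with the "about 0.5 million 32-bit words" quoted for the on-chip SRAM.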

Bottom line: The collaboration between Broadcom and Siemens highlights IST’s role in mitigating SDC through in-field testing. Despite challenges in isolation, clocking, and verification, the solution was successfully implemented and verified in DFT and DV setups. Future efforts will extend to AXI-based IST, promising broader adoption. This approach not only enhances reliability in AI and hyperscale environments but also reduces field failures, underscoring the value of embedded deterministic testing in next-generation silicon.

Also Read:

Podcast EP323: How to Address the Challenges of 3DIC Design with John Ferguson

3D ESD verification: Tackling new challenges in advanced IC design

Signal Integrity Verification Using SPICE and IBIS-AMI


TSMC’s 6th ESG AWARD Receives over 5,800 Proposals, Igniting Sustainability Passion

by Daniel Nenni on 01-01-2026 at 6:00 am


Taiwan Semiconductor Manufacturing Company has once again demonstrated its leadership in corporate sustainability with the successful conclusion of its 6th ESG AWARD, which attracted more than 5,800 proposals from employees across the organization. The overwhelming response reflects not only TSMC’s strong internal engagement but also the growing momentum of environmental, social, and governance (ESG) values within the global semiconductor industry.

Launched as a platform to encourage employee participation in sustainable innovation, the ESG AWARD has become one of TSMC’s most influential internal initiatives. The sixth edition recorded a significant increase in submissions compared to previous years, highlighting how sustainability has evolved from a corporate objective into a shared mission embraced by employees at all levels. Proposals covered a wide range of topics, including energy efficiency, carbon reduction, water resource management, waste minimization, supply chain responsibility, workplace well-being, and community engagement.

TSMC emphasized that the award is not merely a competition, but a catalyst for turning ideas into action. Many past award-winning proposals have been successfully implemented across fabs and offices, delivering measurable environmental and social benefits. These include innovations in energy-saving manufacturing processes, circular economy practices for materials reuse, and digital solutions to enhance operational transparency and governance. By empowering employees to contribute ideas directly linked to real-world impact, TSMC reinforces a culture where sustainability is embedded into daily operations.

The strong participation in the 6th ESG AWARD also reflects the broader pressures and responsibilities facing semiconductor manufacturers today. As demand for advanced chips grows alongside global digital transformation, the industry’s environmental footprint has come under increasing scrutiny. High energy consumption, water usage, and complex supply chains pose challenges that require both technological innovation and organizational commitment. TSMC’s approach demonstrates how internal engagement can play a crucial role in addressing these challenges proactively.

According to TSMC, proposals submitted this year showed greater maturity and cross-functional collaboration than in previous editions. Many teams combined technical expertise with ESG thinking, proposing solutions that balance productivity, cost efficiency, and sustainability. This shift suggests that ESG considerations are no longer treated as separate from core business goals, but rather as integral to long-term competitiveness and resilience.

The award process includes rigorous evaluation criteria, focusing on innovation, feasibility, scalability, and alignment with TSMC’s sustainability strategy. Selected proposals receive recognition and resources to support further development and implementation. This mechanism not only motivates employees but also accelerates the company’s progress toward its ESG targets, including net-zero ambitions and responsible supply chain management.

Beyond internal impact, the ESG AWARD sends a strong signal to stakeholders, including customers, investors, and partners. It highlights TSMC’s commitment to transparency, accountability, and continuous improvement in ESG performance. In an era where ESG metrics increasingly influence investment decisions and customer trust, such initiatives strengthen TSMC’s reputation as a responsible industry leader.

The enthusiasm generated by the 6th ESG AWARD underscores a key lesson for global corporations: sustainability thrives when employees are empowered to participate meaningfully.

Bottom Line: By transforming ESG from a top-down directive into a bottom-up movement, TSMC has ignited a passion that extends beyond awards and recognition. As the company looks ahead, the ideas and energy unleashed by this year’s record-breaking participation are expected to play a vital role in shaping a more sustainable future for both TSMC and the semiconductor industry as a whole.

Also Read:

TSMC based 3D Chips: Socionext Achieves Two Successful Tape-Outs in Just Seven Months!

Why TSMC is Known as the Trusted Foundry

TSMC’s Customized Technical Documentation Platform Enhances Customer Experience


Tiling Support in SiFive’s AI/ML Software Stack for RISC-V Vector-Matrix Extension

by Daniel Nenni on 12-31-2025 at 10:00 am


At the 2025 RISC-V Summit North America, Min Hsu, Staff Compiler Engineer at SiFive, presented on enhancing tiling support within SiFive’s AI/ML software stack for the RISC-V Vector-Matrix Extension (VME). This extension aims to boost matrix multiplication efficiency, a cornerstone of AI workloads. SiFive’s VME implementation introduces a large matrix accumulator state for the result matrix C, leveraging existing RISC-V Vector (RVV) registers to supply source operands A and B. This design enables outer-product-style multiplications directly into the C accumulator, with options for “fat” k>1 support to handle narrower input datatypes. Rows or columns of C can be moved to vector registers or loaded/stored from memory, and the C state may be segmented into multiple tiles. By positioning the accumulator near arithmetic units, the matrix engine achieves high throughput, making it ideal for compute-intensive AI tasks.

A key focus was tiled matrix multiplication, illustrated through a Python pseudocode example. The function tiled_matmul decomposes large matrices A (m x k), B (k x n), and C (m x n) into manageable tiles. Outer loops iterate over tile_m, tile_n, and tile_k dimensions, creating views of sub-matrices (e.g., lhs_tile = A[m1:m1+tile_m, k1:k1+tile_k]). Inner loops then apply register-level tiling with tile_m_v, tile_n_v, and tile_k_v, performing the core operation: dst_tile[mv:mv+tile_m_v, nv:nv+tile_n_v] += np.matmul(lhs_tile_v, rhs_tile_v). This hierarchical tiling optimizes data locality—outer tiles fit into caches, inner ones into registers—reducing memory access overhead and enhancing performance for large-scale AI models.
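The two-level scheme described above can be sketched in NumPy as follows. This is a reconstruction from the description rather than the slide's exact code: the function and slice names follow the text, while the default tile sizes and the zero-initialized C are assumptions made for illustration.

```python
import numpy as np


def tiled_matmul(A, B, tile_m=8, tile_n=8, tile_k=8,
                 tile_m_v=4, tile_n_v=4, tile_k_v=4):
    """Two-level tiled matmul: outer tiles model cache blocks, inner tiles model registers.

    Tile sizes are illustrative defaults, not values from the presentation.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.result_type(A, B))

    # Outer (cache-level) tiling over the M, N and K dimensions.
    for m1 in range(0, m, tile_m):
        for n1 in range(0, n, tile_n):
            dst_tile = C[m1:m1 + tile_m, n1:n1 + tile_n]      # view into C
            for k1 in range(0, k, tile_k):
                lhs_tile = A[m1:m1 + tile_m, k1:k1 + tile_k]  # view into A
                rhs_tile = B[k1:k1 + tile_k, n1:n1 + tile_n]  # view into B

                # Inner (register-level) tiling inside the current outer tile.
                for mv in range(0, lhs_tile.shape[0], tile_m_v):
                    for nv in range(0, rhs_tile.shape[1], tile_n_v):
                        for kv in range(0, lhs_tile.shape[1], tile_k_v):
                            lhs_tile_v = lhs_tile[mv:mv + tile_m_v, kv:kv + tile_k_v]
                            rhs_tile_v = rhs_tile[kv:kv + tile_k_v, nv:nv + tile_n_v]
                            # Core operation: accumulate a small product into C.
                            dst_tile[mv:mv + tile_m_v, nv:nv + tile_n_v] += np.matmul(
                                lhs_tile_v, rhs_tile_v)
    return C


rng = np.random.default_rng(0)
A = rng.standard_normal((10, 13))
B = rng.standard_normal((13, 7))
print(np.allclose(tiled_matmul(A, B), A @ B))
```

Because NumPy slices clip at array bounds, the sketch also handles matrix sizes that are not multiples of the tile sizes; a real compiler would typically pad or peel those edge tiles instead.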

SiFive’s AI/ML software stack integrates these hardware features seamlessly, enabling end-to-end execution of high-profile models on SiFive platforms. Central to this is the Intermediate Representation Execution Environment (IREE), an open-source MLIR-based compiler and runtime optimized for SiFive microarchitectures. IREE supports diverse front-ends like PyTorch for LLMs, applying target-specific tiling policies to break down operations. It enables intra-operation parallelization, generates code via SiFive’s tuned LLVM compilers and Scalable Kernel Libraries (SKL), and mixes MLIR codegen with microkernels (ukernels) for efficiency. The runtime handles inter-operation parallelization through asynchronous execution and task scheduling, supporting both Linux and bare-metal environments.

Hsu highlighted advancements in multi-tile matrix multiplication within IREE. Previously, IREE supported only single-tile K-loops, where sources A0 and B0 are loaded once, and a single matmul accumulates into C00. Now, enhancements allow multi-tile K-loops, loading sources like A0, A1 once and distributing accumulations across multiple C tiles (e.g., C00 += A0 * B0, C10 += A1 * B0, then C01 += A0 * B1, C11 += A1 * B1). This reduces redundant loads, improving arithmetic intensity and efficiency, especially for deep neural networks where K dimensions are large.
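To make the load savings concrete, here is a small NumPy sketch that counts tile loads for a 2x2 grid of C tiles under both schemes. It is an illustrative model, not IREE code: the function names, the `loads` counter, and the assumption that every tile fetch costs one load are all hypothetical.

```python
import numpy as np


def split_tiles(A, B, t):
    """Cut A (2t x K) into two row tiles and B (K x 2t) into two column tiles,
    each further split along K in steps of t."""
    K = A.shape[1]
    A_tiles = [[A[i * t:(i + 1) * t, k:k + t] for k in range(0, K, t)] for i in range(2)]
    B_tiles = [[B[k:k + t, j * t:(j + 1) * t] for k in range(0, K, t)] for j in range(2)]
    return A_tiles, B_tiles


def single_tile_k_loop(A_tiles, B_tiles):
    """One C tile per K-loop: each C tile loads its own A and B tiles."""
    loads, C = 0, {}
    for i in range(2):
        for j in range(2):
            acc = 0
            for a, b in zip(A_tiles[i], B_tiles[j]):  # K-loop
                loads += 2                            # one A tile + one B tile
                acc = acc + a @ b
            C[i, j] = acc
    return C, loads


def multi_tile_k_loop(A_tiles, B_tiles):
    """Four C tiles share one K-loop: A0 and A1 are loaded once per K step."""
    loads = 0
    C = {(i, j): 0 for i in range(2) for j in range(2)}
    for k in range(len(A_tiles[0])):
        a0, a1 = A_tiles[0][k], A_tiles[1][k]
        loads += 2
        for j in range(2):
            b = B_tiles[j][k]
            loads += 1
            C[0, j] = C[0, j] + a0 @ b   # C0j += A0 * Bj
            C[1, j] = C[1, j] + a1 @ b   # C1j += A1 * Bj
    return C, loads


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 12))
B = rng.standard_normal((12, 8))
A_tiles, B_tiles = split_tiles(A, B, t=4)
_, single_loads = single_tile_k_loop(A_tiles, B_tiles)
_, multi_loads = multi_tile_k_loop(A_tiles, B_tiles)
print(single_loads, multi_loads)  # 24 12
```

Both variants perform the same twelve tile products; in this model the multi-tile loop halves the number of tile loads, which is the source of the higher arithmetic intensity.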

In takeaways, Hsu emphasized that tiled matrix multiplication is essential for high-performance AI/ML applications, as it maximizes hardware utilization. IREE excels in automating and optimizing these tiling strategies. RISC-V’s VME is purpose-built for such tiled operations, delivering native performance gains. SiFive’s XM series implements VME in a compact, integrated form factor, and the team’s contributions to IREE—particularly multi-tile support—further amplify efficiency. This software-hardware synergy positions SiFive’s stack as a robust solution for AI acceleration on RISC-V, bridging custom extensions with standardized ecosystems to drive innovation in edge and datacenter AI.

Bottom line: The presentation underscores SiFive’s commitment to advancing RISC-V for AI, combining architectural extensions with sophisticated compiler tools to tackle compute bottlenecks effectively.

Also Read:

SiFive Launches Second-Generation Intelligence Family of RISC-V Cores

Podcast EP197: A Tour of the RISC-V Movement and SiFive’s Contributions with Jack Kang

Enhancing RISC-V Vector Extensions to Accelerate Performance on ML Workloads


TSMC based 3D Chips: Socionext Achieves Two Successful Tape-Outs in Just Seven Months!

by Daniel Nenni on 12-31-2025 at 6:00 am

Socionext’s recent run of rapid 3D-IC tape-outs is a noteworthy milestone for the industry: two successful tape-outs in just seven months for complex, multi-die designs aimed at AI and HPC workloads. That pace of iteration highlights how advanced packaging, richer EDA toolchains, and closer foundry-ecosystem collaboration are turning what used to be multi-year projects into achievable, repeatable engineering cycles.

At the heart of this acceleration are three interlocking trends: face-to-face 3D stacking that shrinks inter-die latency, process-node specialization across dies (e.g., TSMC N3 compute plus TSMC N5 I/O), and EDA/IP/cloud toolchains purpose-built for multi-die flows. Socionext’s taped-out designs reportedly combine an N3 compute die with an N5 I/O die using TSMC’s SoIC-X 3D stacking, a configuration that reduces interconnect distance and power while increasing bandwidth versus traditional 2D or 2.5D approaches.

Speeding a 3D-IC from concept to tape-out requires more than just clever floorplanning. Mechanical and thermal challenges (warpage, delamination, and heat removal), stringent reliability checks, and new timing/IR signoff flows make multi-die design complex. Socionext’s achievement illustrates how tightly integrated IP (PHYs, SerDes), 3D-aware design rules, and cloud-enabled EDA can remove bottlenecks: by automating design-rule checks for stacked interfaces, enabling distributed compute for large signoff runs, and providing pre-verified IP blocks that support high-speed interconnects. Socionext and its partners emphasize that combining proven IP with AI-augmented EDA flows shortened development cycles and improved first-pass quality.

From a product perspective, 3D stacking supports an attractive value proposition for AI and HPC: put logic where it matters, optimize each die on the best process node for that function, and connect them with ultra-dense interfaces to reach system-level PPA (power, performance, area) that 2D designs cannot match. For vendors like Socionext — which target consumer SoCs as well as data-center accelerators — the ability to deliver working 3D-ICs rapidly opens new architectural options (heterogeneous dies, separable I/O fabrics, and modular chiplet ecosystems). Recent Socionext materials also show the company expanding 3DIC and 5.5D packaging support and promoting configurable chiplet building blocks to simplify system assembly.

Industry partnerships are central to this story. Socionext’s work with EDA and IP suppliers, and collaboration within the TSMC OIP ecosystem, demonstrate that 3D-IC success depends on an end-to-end supply chain: foundry stacking capabilities, packaging houses that can handle F2F and 5.5D substrates, EDA tools that understand multi-die timing and thermal behavior, and IP that is 3D-aware. The Synopsys writeup covering Socionext’s timeline explicitly credits the use of Synopsys’ 3D-enabled IP, AI-powered EDA flows, and cloud solutions as instrumental in hitting multiple tape-outs quickly.

What does this mean for the broader market? Faster, repeatable 3D tape-outs lower the barrier to entry for companies wanting to pursue heterogeneous integration. They also pressure incumbents to adopt modular approaches and to invest in multi-die verification and manufacturing readiness. However, scaling from tape-out to high-yield mass production remains the next big hurdle: yields, test strategies, and supply-chain throughput for advanced packaging will determine whether such rapid tape-out cycles translate into volume shipments and cost-effective products.

Bottom line: Socionext’s two tape-outs in seven months are more than a marketing sound bite; they’re a signal that the multi-die era is maturing. With the right mix of IP, EDA, foundry packaging, and ecosystem collaboration, complex 3D systems can move from experimental demos to production-grade devices on timelines that were hard to imagine just a few years ago.

Also Read:

Cerebras AI Inference Wins Demo of the Year Award at TSMC North America Technology Symposium

TSMC Kumamoto: Pioneering Japan’s Semiconductor Revival

AI-Driven DRC Productivity Optimization: Revolutionizing Semiconductor Design


RISC-V Extensions for AI: Enhancing Performance in Machine Learning

by Daniel Nenni on 12-30-2025 at 10:00 am

In a presentation at the RISC-V Summit North America 2025, John Simpson, Senior Principal Architect at SiFive, delved into the evolving landscape of RISC-V extensions tailored for artificial intelligence and machine learning. RISC-V’s open architecture has fueled its adoption in AI/ML markets by allowing customization and extension of core designs. However, Simpson emphasized the importance of balancing this flexibility with standardization under profiles like RVA23 to foster an open ecosystem that promotes innovation while preserving differentiation. As AI models grow exponentially (Simpson drew on Epoch AI data showing the surge in model sizes) and workloads shift from vector compute to massive matrix operations, the need for accelerated matrix multiplication and broader datatype support has become critical. Different application domains necessitate varied ISA approaches, but with only a handful of matrix multiply routines, software portability remains relatively unaffected by these choices.

Central to RISC-V’s AI capabilities is the Vector Extension (RVV), which addresses computations beyond matrix multiplies, such as normalization and activation operations like LayerNorm, Softmax, Sigmoid, and GELU. These operations, involving exponentials and normalizations, can bottleneck throughput once matrix multiplies are accelerated. For instance, prefilling Llama-3 70B with 1k tokens requires 5.12 billion exponential operations. RVV 1.0 supports integer (INT8/16/32/64) and floating-point (FP16/32/64) datatypes, with extensions like Zvfbfmin for BF16 conversions and Zvfbfwma for widening BF16 multiply-adds. Proposed additions, such as Zvfbfa for BF16 arithmetic and Zvfofp8min for OCP FP8 (E4M3/E5M2) via conversions, aim to expand support. Discussions focus on using an altfmt bit in the vtype CSR to encode new datatypes efficiently, avoiding instruction length expansions. Future activity may include OCP MX formats like FP8/6/4, potentially requiring more instruction space or vtype bits.
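One plausible way to arrive at the 5.12 billion figure is to count one exponential per attention score in the softmax. The sketch below assumes Llama-3 70B's 80 layers and 64 query heads, reads "1k tokens" as 1,000, and counts the full score matrix with no causal-mask savings; the talk's exact accounting is not given here, so treat this as a back-of-the-envelope check.

```python
# Back-of-the-envelope count of softmax exponentials during prefill.
# Assumption: one exp per attention score, full (unmasked) score matrix.
layers = 80        # transformer layers in Llama-3 70B
heads = 64         # query heads per layer
tokens = 1_000     # "1k tokens" read as 1,000

exps = layers * heads * tokens * tokens
print(f"{exps:,}")  # 5,120,000,000
```

The count grows with the square of the prompt length, which is why these non-matmul operations become more visible as contexts get longer.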

Simpson outlined several matrix extension approaches under consideration by RISC-V task groups. The Zvbdot extension introduces vector batch dot-products without new state, leveraging existing vector registers. It computes eight dot-products per instruction, with one input from vector A and eight from group B (columns as registers), accumulating in group C. A 3-bit offset accesses up to 64 results. For VLEN=1024 with FP8 inputs and FP32 outputs, it achieves 1K MACs per instruction while writing only 256 bits, accelerating GEMM and GEMV with a vector-friendly read-heavy design.
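The Zvbdot numbers above can be checked with a few lines of arithmetic, followed by a small functional model. The model is a sketch, not the proposed instruction semantics: `batch_dot`, the flat 64-entry accumulator array, and the reading of the 3-bit offset as selecting one of eight groups of eight results are assumptions, and float32 stands in for FP8 inputs.

```python
import numpy as np

VLEN = 1024          # vector register length in bits
IN_BITS = 8          # FP8 inputs
OUT_BITS = 32        # FP32 results
DOTS_PER_INSTR = 8   # dot-products per instruction
OFFSET_BITS = 3      # offset field width

elems_per_reg = VLEN // IN_BITS                             # 128 elements
macs_per_instr = elems_per_reg * DOTS_PER_INSTR             # 1024 MACs
bits_written = DOTS_PER_INSTR * OUT_BITS                    # 256 bits
addressable_results = (1 << OFFSET_BITS) * DOTS_PER_INSTR   # 64 results


def batch_dot(acc, a, b_group, offset):
    """Accumulate eight dot-products of `a` with the rows of `b_group`
    into the slice of `acc` selected by `offset` (hypothetical model)."""
    assert a.shape == (elems_per_reg,)
    assert b_group.shape == (DOTS_PER_INSTR, elems_per_reg)
    assert 0 <= offset < (1 << OFFSET_BITS)
    start = offset * DOTS_PER_INSTR
    acc[start:start + DOTS_PER_INSTR] += b_group @ a
    return acc


rng = np.random.default_rng(0)
a = rng.standard_normal(elems_per_reg).astype(np.float32)
b_group = rng.standard_normal((DOTS_PER_INSTR, elems_per_reg)).astype(np.float32)
acc = np.zeros(addressable_results, dtype=np.float32)
batch_dot(acc, a, b_group, offset=2)
print(macs_per_instr, bits_written, addressable_results)  # 1024 256 64
```

The ratio of 1,024 MACs to 256 written bits is what makes the design read-heavy: most of the traffic is operand reads from existing vector registers.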

Integrated Matrix Extensions (IME TG) reuse vector registers as matrix tiles, adding minimal vtype bits. They support matrix-matrix multiplies, with higher arithmetic intensity from longer vectors. Most sub-proposals require new tile load/store instructions, and Option-G is advancing. Write demands for result C might necessitate register renaming in the matrix unit, transparent to software.

Vector-Matrix Extensions (VME TG) add large matrix accumulator state for C, divided into tiles, while using RVV vectors for A and B. Outer-product multiplies accumulate into C, with potential “fat” support for narrower inputs. It includes moves between C and vectors/memory, enabling high throughput by placing accumulators near arithmetic units.
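The outer-product formulation can be illustrated with a short NumPy sketch. It models only the arithmetic: `outer_product_matmul` and its `fat_k` parameter are hypothetical names, with `fat_k` standing in for the "fat" k>1 option in which each step consumes several columns of A and rows of B at once.

```python
import numpy as np


def outer_product_matmul(A, B, fat_k=1):
    """Compute C = A @ B as a sum of rank-`fat_k` updates into an accumulator.

    With fat_k=1 each step is a plain outer product of one column of A
    with one row of B. Names and parameters are illustrative only.
    """
    A = np.asarray(A, dtype=np.float32)
    B = np.asarray(B, dtype=np.float32)
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.float32)   # models the accumulator state
    for kk in range(0, k, fat_k):
        a_cols = A[:, kk:kk + fat_k]         # operand sourced from vector registers
        b_rows = B[kk:kk + fat_k, :]         # operand sourced from vector registers
        C += a_cols @ b_rows                 # rank-fat_k update of the accumulator
    return C


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 5))
print(np.allclose(outer_product_matmul(A, B), A @ B, atol=1e-4))
print(np.allclose(outer_product_matmul(A, B, fat_k=2), A @ B, atol=1e-4))
```

Every step touches the whole accumulator but reads only two short vectors, which is consistent with the motivation for keeping C next to the arithmetic units.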

Attached Matrix Extensions (AME TG) introduce separate state for A, B, and C, performing matrix-matrix multiplies independently of RVV. If RVV is absent, new vector operations on matrix state are needed; otherwise, integration is preferred. Requiring dedicated load/store paths, AME offers the largest design space for peak performance, though no consensus proposal exists yet.

Performance varies by approach: Zvbdot suits LLM decode phases with batch=1, accelerating GEMV. IME fits edge devices prioritizing area/power. VME balances vector sourcing with high MACs, while AME maximizes MACs but demands more resources. For LLMs, larger batches improve efficiency but strain KV cache bandwidth.

Bottom line: These extensions position RISC-V as a versatile AI platform, evolving to meet diverse needs from edge to hyperscale. SiFive’s insights highlight ongoing standardization efforts to ensure scalability and ecosystem growth.

Also Read:

SiFive Launches Second-Generation Intelligence Family of RISC-V Cores

Podcast EP197: A Tour of the RISC-V Movement and SiFive’s Contributions with Jack Kang

Enhancing RISC-V Vector Extensions to Accelerate Performance on ML Workloads


Runtime Elaboration of UVM Verification Code

by Tom Anderson on 12-30-2025 at 6:00 am

Recently, I reported on my conversation with Cristian Amitroaie, CEO of AMIQ EDA, about automated generation of documentation from design and verification code. Before we chose that topic for a post, Cristian described several capabilities of the AMIQ EDA product family that might be of interest to design and verification engineers. For today’s post, I’ve selected runtime elaboration of Universal Verification Methodology (UVM) code because I wanted to know more about the benefits for engineers working on real-world chip projects.

What do you mean by elaboration?

When our tools read in design and verification code, we check for a wide variety of errors, and then we build a complex internal model that reflects every aspect of the code. For example, in our Design and Verification Tools (DVT) Integrated Development Environment (IDE) family, we perform a full design elaboration. That means we build a model with the complete design hierarchy, with all parameters computed, generate blocks resolved, binds applied, and so on. This allows design engineers to explore design hierarchies, trace signals and parameters, draw schematic diagrams, and perform many other useful tasks.

How do you handle verification code?

We also build a complete model for verification environments, which are usually based on UVM. Verification engineers often partially mirror the design hierarchy with a tree of components, such as drivers and monitors, organized into UVM testbench components. They also define and instantiate verification-specific components such as scoreboards and sequencers. All the components are connected together using transaction-level modeling (TLM) ports, defining a verification topology.

Is the verification topology like the design hierarchy?

In some ways yes, but verification topologies are not defined in a static manner like design hierarchies. There is no top module instantiating submodules, and so on, that can be statically computed. The verification topology is controlled per UVM test by activating or deactivating drivers, replacing some components with others tuned to match specific test requirements, connecting specific components to specific design interfaces, etc. The UVM verification component hierarchy is constructed by executing a specific UVM flow at simulation time 0. During this execution, all configuration via the “config db” setters/getters mechanism is performed, all the factory overrides are applied, and more.
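To see why such a topology cannot be computed statically, consider the toy model below. It is plain Python, not UVM or SystemVerilog, and every class and function in it (`Factory`, `build_agent`, the `agent.is_active` key) is a hypothetical stand-in for the factory-override and config db mechanisms. The point is only that the component tree depends on settings each test applies before the build code runs.

```python
class Component:
    def __init__(self, name):
        self.name = name
        self.children = []


class Monitor(Component):
    pass


class Driver(Component):
    pass


class ErrorInjectingDriver(Driver):
    """A test-specific replacement for the default driver."""


class Factory:
    """Toy stand-in for a factory with type overrides."""

    def __init__(self):
        self.overrides = {}

    def set_override(self, original, replacement):
        self.overrides[original] = replacement

    def create(self, cls, name):
        # The class actually instantiated is only known at run time.
        return self.overrides.get(cls, cls)(name)


def build_agent(factory, config_db):
    """Toy build step: what gets created depends on config and overrides."""
    agent = Component("agent")
    agent.children.append(factory.create(Monitor, "mon"))
    if config_db.get("agent.is_active", True):   # passive agents get no driver
        agent.children.append(factory.create(Driver, "drv"))
    return agent


def topology(component):
    return [f"{c.name}:{type(c).__name__}" for c in component.children]


# Three "tests" running the same build code with different settings.
print(topology(build_agent(Factory(), {})))
print(topology(build_agent(Factory(), {"agent.is_active": False})))
error_factory = Factory()
error_factory.set_override(Driver, ErrorInjectingDriver)
print(topology(build_agent(error_factory, {})))
```

The same `build_agent` source yields three different trees, so a tool that wants the real topology has to execute the build step for the chosen test rather than infer it from the code alone.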

What does this mean for DVT IDE?

The bottom line is that verification elaboration cannot be completed until UVM phase 0 (activity at time 0) is executed. We could have called a third-party simulator for this execution, but that takes time and adds overhead. Instead, DVT IDE actually performs a “run 0” internally to allow all the UVM elaboration to happen. We call this process UVM runtime elaboration to reflect its non-static nature.

How does this work in DVT IDE?

Users can ask for the runtime elaboration of a specific UVM test and use breakpoints to debug the “run 0” execution. When a breakpoint interrupts the execution, users can browse the call stacks on each parallel thread and inspect variables. We provide different types of breakpoints, which can be conditional. Users can browse the function call stack and all the breakpoints they’ve set in their project. They can also step through the executed code and inspect variable values, add log points to print information without altering the verification code, and add watchpoints to interrupt upon variable changes.

During UVM runtime elaboration, DVT IDE collects information about factory override definitions and if/where they are applied; information about the config database, including set/get calls and how they are paired; information about the register model, including address and bitfield computation; information about which physical interfaces are connected to virtual interfaces; and information about TLM port connections.

How does this help engineers create, explore, and debug the verification topology?

All of the collected information is available in DVT IDE to help engineers explore their verification topology, the tree of components, the register model, the config db, and more. DVT IDE can also display a diagram of all the nested components, including their connections via TLM ports and their connections to the design via virtual interfaces. This is called the UVM Components Diagram.

We can determine some of this verification topology statically, but runtime elaboration allows us to compute actual data that perfectly matches what would happen in a simulator at time 0. Users get all the benefits I’ve mentioned without having to access a simulator. This saves time since the internal UVM runtime elaboration is faster than invoking an external tool that builds a model for full simulation.

What other capabilities benefit the users?

Three things spring to mind. First of all, many verification environments use C models in addition to UVM SystemVerilog code. We support DPI-C calls during “run 0” so this is not an issue. Second, if the verification code changes, users don’t have to go through the compilation and design elaboration process all over again. DVT IDE incrementally analyzes the changes and performs the UVM runtime elaboration. Finally, after the elaboration is done, we save a database that users can load anytime. This means that if there are no changes to the UVM topology, verification engineers can simply load the snapshot without having to execute runtime elaboration again.

Any final thoughts?

The capabilities I’ve listed are robust and well proven by many users over several years. In this post, I’ve only given an overview. To find out more, I recommend a concise tutorial available on our website. Of course, interested verification engineers can contact us to schedule a demo or request an evaluation license.

Thank you for your time, Cristian.

Likewise, and Happy Holidays!

Also Read:

Better Automatic Generation of Documentation from RTL Code

AMIQ EDA at the 2025 Design Automation Conference #62DAC

2025 Outlook with Cristian Amitroaie, Founder and CEO of AMIQ EDA