
AlphaDesign AI Experts Wade into Design and Verification

by Bernard Murphy on 03-06-2025 at 6:00 am


I mentioned in an earlier blog that multiple presentations at DVCon 2025 went all-in on AI-assisted design and verification. The AlphaDesign presentation was one such example, looking very much at top-down, AI-expert application of agentic flows to design and verification. AlphaDesign is a new startup out of UC Santa Barbara headed by William Wang, a professor of AI with a track record of research engagements with Amazon, Intel, Nvidia and others.

The role of AI experts in advancing AI for design and verification

The promise of AI in this domain is both exciting and concerning. Exciting because there is potential to revolutionize productivity. Concerning not for loss of jobs (that will never happen), but because AI is still viewed as delivering approximate, probabilistic answers while engineering is about precision; approximate may be helpful for beginners and quick starts but not for production quality. We will still lean heavily on production tools (synthesis, simulation, etc) to validate and optimize, initially all the way through the flow, likely moving later in the flow as we build confidence in the quality of AI-based design generation.

Yet if we want to seize this opportunity and not talk ourselves out of big advances before we start, we need native AI experts to be involved in this journey as much as native EDA experts. EDA teams with their own AI experts will continue to push from the bottom up, very much with a focus on near-term profitability in optimizing proven flows, because that’s how they can run a healthy business. Top-down AI experts meanwhile can push what could be possible in generation and analysis from natural language prompts/specs, and beyond. That’s where I see AlphaDesign fitting in.

Certainly trust will need to be built along that journey, initially in helping refine verification suites for improved coverage. And perhaps in generating snippets of RTL as designers start to become comfortable with that idea. Later becoming a more accomplished aid in the design and verification process. We’ve been down similar paths before in EDA. I see what AlphaDesign is proposing as yet another improvement in productivity, initially helping tune current flows, gradually switching us over to new ways of thinking about the design task.

Agentic flows

The company calls their solution ChipAgents™, reflecting that the approach uses LLM agents to accomplish a goal. An agent in this world incorporates planning (decomposing a task into subtasks and refining past action), memory (managing context over a long period of time), and tool use (for assessment on a proposed solution and to elaborate designs/tests).
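The three ingredients above fit into a simple control loop. Here is an illustrative Python sketch of a generic agent loop, not AlphaDesign's actual ChipAgents implementation; all names and signatures are hypothetical:

```python
def run_agent(goal, plan, tools, max_steps=10):
    """Generic agent loop: plan subtasks, act with tools, remember results.

    plan(goal, memory) yields subtask dicts; tools maps a tool name to a
    callable that assesses or elaborates a proposed solution. Everything
    here is a hypothetical sketch, not a real agent framework API.
    """
    memory = []  # long-lived context carried across subtasks
    for step, subtask in enumerate(plan(goal, memory)):
        if step >= max_steps:
            break
        tool = tools[subtask["tool"]]            # tool use: run a checker/elaborator
        result = tool(subtask["input"], memory)  # assess the proposed solution
        memory.append({"subtask": subtask, "result": result})  # update memory
        if result.get("done"):                   # refinement can terminate early
            break
    return memory
```

In a real agentic flow the planner would itself be an LLM call and the tools would wrap simulators, linters, or coverage analyzers; the loop structure is what distinguishes an agent from a single LLM prompt.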

Agentic flows have been making big strides in the software engineering world. For automatically locating and fixing bugs in a software repository, the LLM-only success rate is pretty sad (<3%). Adding basic agent support improves that rate by 8X. The Amazon Q developer agent roughly doubles that again to 55%. A further refinement gets to 62%+. Not hands-free yet, but an impressive advance.

Of course this is for software, which can draw on a massive training corpus. Hardware is much more difficult, not because we have to be more clever but because there is so little code to use in training (by one estimate, SystemVerilog amounts to 0.28% of the lines of code accessible in Python+Java+Go+Javascript). Also toolchains in hardware design compound complexity over software engineering.

Progress

This is an early-stage company, receiving their seed funding round in August 2024. Initial staffing has drawn from UCSB graduates with a heavy emphasis on AI and data science training.

The first step has been to build a serious reference design they call ChipAgentsBench, curated from open-source projects like OpenTitan. Good move, since many GenAI demonstrations for design that I have seen so far have been based on toy examples. This reference has 2.8k SystemVerilog files, amounting to over 600k lines of code. AlphaDesign say they plan to open-source a subset of this design at some point.

Details on demonstrated agent capabilities are thin so far. There is a CoverAgent aiming to help improve coverage in testing. As a general concept, using AI to help improve bottom-up coverage is not new. What looks intriguing here, from talking to a couple of the R&D folks, is deriving coverage goals from reading natural language specs. As an example, finding ways to boost coverage in error-handling logic is always challenging in bottom-up testing but may be easier to spot/exercise based on reading a spec.

Unfortunately I missed their talk, thanks to conflicts, so take my limited understanding with a grain of salt. The company says common use models in early engagements include generating DV documentation and code snippets for utility scripts. They also mention code summarization, plus RTL/testbench generation predicated on existing files and design verification IPs – all high-value targets if/when proven.

Definitely a company to watch. You can check out the website HERE.

Also Read:

An Imaginative Approach to AI-based Design

Powering the Future: How Engineered Substrates and Material Innovation Drive the Semiconductor Revolution

The Double-Edged Sword of AI Processors: Batch Sizes, Token Rates, and the Hardware Hurdles in Large Language Model Processing


CEO Interview with Pradyumna (Prady) Gupta of Infinita Lab

by Daniel Nenni on 03-05-2025 at 10:00 am


Dr. Pradyumna (Prady) Gupta is the Founder and Chief Scientist at Infinita Lab and Infinita Materials, pioneering advancements in materials testing and specialized chemicals for cutting-edge industries. A passionate advocate for onshoring critical manufacturing industries, such as semiconductors, back to America, Prady is dedicated to fostering innovation that strengthens domestic supply chains and reinforces technological sovereignty.

At Infinita Lab—the “Uber of Materials Testing”—Dr. Gupta leads a network of over 2,000 labs that deliver solutions in metrology, materials testing, product validation, and ASTM/ISO standardized testing. Through Infinita Materials, he provides specialized chemicals and materials that power critical applications across semiconductors, electric vehicles, aerospace, and emerging technologies.

Prady’s career includes key roles at Saint-Gobain and Corning’s Gorilla Glass commercialization team. He co-founded multiple successful startups. An accomplished scientist, Prady holds several patents and has authored numerous research papers in materials science.

He holds an MBA from INSEAD / Wharton and a PhD in Materials Science, combining deep technical expertise with strategic business acumen. Prady’s entrepreneurial and scientific leadership continues to bridge today’s industrial needs with the innovations required to solve the challenges of tomorrow.

Tell us about your company.

I run two companies: Infinita Lab and Infinita Materials.

Infinita Lab is the “Uber” of materials testing, offering a comprehensive range of testing services. From advanced metrology techniques such as SEM, TEM, RBS, and XPS to environmental, dielectric, mechanical testing, and standardized ASTM or ISO testing, we provide it all through our network of 2,000+ accredited labs.

Our clients include industry leaders like Intel, Tesla, Applied Materials, and Lam Research, who rely on us to equip their engineers with a full spectrum of testing options. We are also the go-to lab for startups and smaller companies that lack in-house testing facilities.

Infinita Materials, on the other hand, specializes in delivering custom inorganic chemicals and sputtering targets for industries such as semiconductors, batteries, fuel cells, and electronics.

Clients value us because you’ll speak directly with a master-level materials scientist who can address your materials-related challenges—not just someone taking your contact information.

Based in Newark, California, we have a national reach thanks to our partnerships with accredited labs across the U.S. Our core services include nondestructive testing, advanced material characterization, chemistry analysis, vibration testing, and root cause failure analysis.

We collaborate with everyone from startups to Fortune 500 companies, tailoring our solutions to meet their unique needs. As an innovation partner, we are as invested in our clients’ success as they are.

What problems are you solving?

Engineers often face the tedious and time-consuming task of finding labs capable of performing the specific materials testing they need. This discovery challenge inspired Infinita Lab, which is designed to streamline and simplify the process.

Infinita Materials addresses the challenge of designing new compositions for the electronics and semiconductor industries. It’s currently costly and difficult for engineers to find specialized composition-making facilities. Many Chinese manufacturers overlook these needs due to low ROI in small-volume compositions. Additionally, confidentiality and communication issues arise. We guarantee confidentiality and provide consultancy from masters-level materials scientists, ensuring specialized composition needs are met efficiently.

What application areas are your strongest?

Infinita Lab’s strength lies in adapting to the unique needs of diverse industries. We’ve built a solid reputation in high-tech sectors like semiconductors, nanotechnology, and energy storage. For instance, in the semiconductor space, we assist companies with thermal and failure analysis to meet rigorous performance and reliability standards.

In the energy sector, we focus on testing advanced battery technologies and solar panel materials, ensuring they’re efficient and durable in extreme conditions. Aerospace is another area of expertise, where we perform vibration and nondestructive testing to ensure components meet safety-critical requirements.

For Infinita Materials, we target semiconductor sputtering targets and specialized inorganic powders used in electronics, batteries, fuel cells, superconductors, and other cutting-edge applications. Additionally, we’re exploring additive manufacturing as a growing field, leveraging our expertise to innovate in 3D-printed materials for high-tech and industrial applications.

What keeps your customers up at night?

If I had to sum it up, I’d say it’s uncertainty—uncertainty about product performance, meeting regulatory standards, or potential failures in the field. For R&D teams, it’s the pressure of innovation—getting their product to market before the competition while ensuring reliability and performance. For manufacturers, it’s the fear of supply chain risks or defective materials. It’s about ensuring consistent quality. A single material defect in a supply chain can lead to catastrophic failures, recalls, or even safety risks. Engineers often face the tedious and time-consuming challenge of finding labs capable of performing the specific material testing they need. Compliance is another challenge—meeting ISO, ASTM, and other standards is equally difficult.

Our role is to make all of this easier. We help clients identify potential risks early, solve complex material-related problems, and ultimately give them confidence that their products will perform as expected. Whether it’s failure analysis for a semiconductor company or environmental durability testing for a solar manufacturer, we’re solving problems that can make or break our clients’ success.

We understand these pressures because we’ve seen them time and again. That’s why we focus on more than just testing—we help our clients see the bigger picture. Our detailed, transparent reports don’t just identify problems; they provide tailored solutions for different industries, giving our clients peace of mind knowing they’re staying ahead of the curve.

What does the competitive landscape look like and how do you differentiate?

The testing industry is diverse, with large players and smaller, specialized labs often dominating specific niches. This fragmented landscape frequently forces clients to juggle multiple providers to meet their varied testing needs.

At Infinita Lab, we stand apart by offering a comprehensive range of services through partnerships with accredited labs, ensuring everything is accessible under one roof.

Our primary competition comes from in-house labs. While convenient and easily accessible, in-house labs are often sub-optimal solutions that can significantly handicap engineers. Here’s why:

Lack of Incentives: Managers in in-house labs typically don’t have the incentive to optimize performance. I’ve seen firsthand how instruments are down half the time or turnaround times stretch to months.

Obsolete Infrastructure: Instruments and technician skills must keep pace with the rapid innovations in materials science. As technology advances at an accelerating rate, in-house labs are often obsolete even before they are set up. This trend is only going to accelerate with the advent of AI.

Convenience vs. Capability: While in-house labs are convenient, they often lack the resources and capacity to provide cutting-edge solutions. We are working to make external labs just as accessible as in-house facilities, without compromising on quality or innovation.

Our differentiation lies in the following:

Expert Access: A master-level materials scientist will personally pick up your call, offering the expertise and guidance you need.

Concierge Service: We provide a seamless and easy concierge service, ensuring your needs are promptly and efficiently met.

Comprehensive Solutions at a Fraction of the Cost: With our expansive network of over 2,000 labs in the US, we provide a complete range of services at a fraction of the cost compared to in-house labs or most external providers. Engineers can request any type of materials testing and receive it quickly and affordably—a powerful proposition that outpaces both in-house and external lab alternatives.

For Infinita Materials, the competitive landscape primarily features Japanese small companies that excel in specialized materials and chemicals. However, we differentiate ourselves through superior communication and a personalized approach. We provide clients with access to high-level experts, ensuring tailored discussions that lead to the creation of high-quality products. This personalized interaction sets us apart, offering both technical expertise and a consultative edge that many competitors lack.

What new features/technology are you working on?

At Infinita Lab and Materials, we’re always looking to push the boundaries. Right now, we’re investing in expanding our capabilities for next-generation materials. These materials hold enormous potential, but testing them requires specialized equipment and expertise—challenges we are stepping up to meet.
The evolving requirements for AI hardware, such as advanced packaging and memory, have introduced new testing challenges. With our unique vantage point of the testing industry as a whole, we are leading the charge to upgrade and prepare the industry for these upcoming demands.

At Infinita Lab, we are working on:

  • A UPS-like tracker to better predict turnaround times (TAT) for samples in the lab.
  • A simplified system for sending samples for analysis.
  • A system to provide more testing options and better match testing needs with appropriate testing methods.

With Infinita Materials, we are addressing the challenge of designing new compositions for the electronics and semiconductor industries. Currently, it is both costly and difficult for engineers to locate specialized composition-making shops. Chinese manufacturers often find small-volume specialized compositions unattractive in terms of ROI. Additionally, issues surrounding confidentiality and communication further complicate the process.
We guarantee confidentiality and provide access to master-level materials scientists who offer expert consultancy on specialized compositions, ensuring high-quality solutions tailored to our clients’ needs.

How do customers normally engage with your company?

We strive to make the process seamless and straightforward. It typically begins with a conversation where clients outline their problem or testing needs. What sets us apart is that when you call Infinita Lab, you are greeted by a master-level materials scientist—not just someone taking your contact information. This expert access ensures that your concerns are addressed immediately, with tailored guidance and actionable insights. Together, we define the project scope. If additional support is needed, our experts are available around the clock to provide guidance.

For Infinita Materials, clients often engage with us for specialized compositions, such as semiconductor sputtering targets or custom inorganic powders. Our master-level experts work closely to understand their unique requirements, guaranteeing confidentiality and precision throughout the process. Our clients value this highly personalized approach, which includes direct access to experts who can discuss and refine their needs in detail. We provide a seamless, efficient concierge experience, ensuring your needs are met promptly and without unnecessary hurdles. With our expansive network of over 2,000 labs in the U.S., we offer a complete range of services quickly and affordably. Engineers can request any type of materials testing, confident they’ll receive high-quality results at a fraction of the cost compared to in-house labs or external providers.

Many of our clients build long-term relationships with us, treating Infinita Lab and Infinita Materials as extensions of their teams. One of the most rewarding aspects of our work is seeing these partnerships empower our clients to achieve their goals and push the boundaries of innovation.

Also Read:

Executive Interview: Steve Howington of the High Performance Flooring Division of Sherwin-Williams

2025 Outlook with Sri Lakshmi Simhadri of MosChip

CEO Interview: Mouna Elkhatib of AONDevices


An Imaginative Approach to AI-based Design

by Bernard Murphy on 03-05-2025 at 6:00 am


DVCon 2025 was unquestionably a forum for pulling out all the stops in AI-based (RTL) design and verification, particularly around generative AI and agentic methods. I heard three product pitches and a keynote and have been told that every AI talk was standing room only. A pitch from Rise-DA particularly appealed to me because they have clearly taken care to balance intelligently between the promise of AI, the pros and cons of abstraction and the real dynamics of introducing new methods into established and proven flows and training.

Abstraction made easy for designers, and for training

Given the heritage of Rise-DA, it shouldn’t be surprising that abstraction is important to this story. The CEO, Badru Agarwala, was GM of the Calypto System Division at Mentor; high-level design/synthesis is in his DNA. Yet C/C++ HLS is still a barrier to adoption for most RTL designers. Rise-DA simplifies adoption by adding untimed/loosely timed SystemVerilog as a supported behavioral description. Rise also supports mixed language, allowing for reuse across multiple design styles.

The second key idea concerns training. A challenge in applying LLMs to RTL design in any capacity is that the code-corpus on which a tool can train is much smaller than for software, further reduced since no enterprise wants to share their trade secrets. Commonly a design team can train on their own RTL corpus, maybe adding some very generic training from the tool vendor. Hardly an extensive training set for generative AI.

However a high-level design tool can train on the full software corpus – C, C++, Python and more. There are some restrictions for synthesis which should be recognized, but those can be handled in fine-tuning and in linting to catch any escapes. What about synthesis from SystemVerilog – doesn’t that run into the same RTL corpus problems? According to the Rise folks the syntax you will use in synthesizable behavioral SV is (modulo some syntactic sugar) little different from that you would use in C/C++. So SV users in this context benefit from the same extensive software corpus training.

Connecting to production tooling through agents

Remember this is a high-level synthesis system. You’re going to use this flow to design new building blocks or subsystems from scratch. These might be for video/audio/radar/lidar pipelines or custom DNNs (or possibly a multi-layer perceptron). CPUs/GPUs/systolic arrays might be possible in principle but don’t play to the strengths of HLS.

The Rise flow will generate synthesizable RTL from your behavioral input, first through well-known HLS transformations (loop unrolling, pipeline scheduling, parallelism, etc.), then through technology/implementation mapping. Rise takes care of the first part, and they have integrated the Google XLS platform for the second part (in this context XLS is Google’s name for accelerated synthesis).
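Loop unrolling, the first transformation mentioned above, is easy to illustrate in software terms: replicate the loop body so that independent operations can execute in parallel (in hardware, on parallel multipliers). A toy Python sketch, purely illustrative since the real transformation operates on hardware datapaths rather than Python loops:

```python
def dot_rolled(a, b):
    """Straightforward loop: one multiply-accumulate per iteration."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc

def dot_unrolled4(a, b):
    """Unrolled by 4: four independent accumulators that an HLS tool could
    map to four parallel multipliers, plus a cleanup loop for the tail."""
    n = len(a)
    acc0 = acc1 = acc2 = acc3 = 0
    i = 0
    while i + 4 <= n:
        acc0 += a[i]     * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
        i += 4
    tail = sum(a[j] * b[j] for j in range(i, n))  # handle n not divisible by 4
    return acc0 + acc1 + acc2 + acc3 + tail
```

Both functions compute the same result; the unrolled form simply exposes the parallelism that the scheduler can exploit, which is exactly the trade an HLS tool makes between area and throughput.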

This flow is designed to be fast and lightweight, in support of fast-turnaround synthesis/implementation experiments to gauge performance and PPA. The Rise folks provided a couple of interesting insights here. They say this is “screaming fast”, allowing for a lot of experimentation to find optimal solutions. A designer might counter that ultra-fast synthesis can’t be very optimized; isn’t this a problem? Rise would agree that was true in the early days of HLS; today, however, all that optimization can be left to production synthesis tools, which are much better at handling that level of implementation detail.

To validate correctness and optimality of generated solutions the flow must run production tools like synthesis or RTL simulation. This is handled through agents which will launch those tools as and when you require. Rise will feed back estimates like PPA to give you insight into how you want to tune the high-level model.

For verification, you will want to validate that generated RTL works the same way as the behavioral source against the behavioral tests you have been using in algorithm development. Rise instruments the generated RTL with transactors so you can plug the generated RTL back into those behavioral sims to check correspondence.

You can also add asserts, cover statements, even display statements, to your HLS model, which will be mapped through to the RTL in support of UVM-based testing. Rise will also add SV attributes (if requested) to the RTL to help you trace back and forth when you’re trying to localize a problem. All providing aids to help you localize mismatches or unexpected behavior, as a guide to further refining the HLS model.
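That correspondence check amounts to driving both models with the same stimuli and comparing outputs. A minimal software sketch of the idea, with hypothetical model signatures; this is not Rise's actual transactor mechanism:

```python
import random

def check_correspondence(behavioral, generated, stimulus_count=1000, seed=0):
    """Drive two models with identical random stimuli and collect mismatches.

    behavioral and generated are stand-ins for the untimed source model and
    the transactor-wrapped generated RTL; both map one input to one output.
    A seeded RNG keeps the stimulus set reproducible for debug.
    """
    rng = random.Random(seed)
    mismatches = []
    for _ in range(stimulus_count):
        x = rng.randrange(0, 256)        # hypothetical 8-bit input space
        if behavioral(x) != generated(x):
            mismatches.append(x)         # record failing stimuli for triage
    return mismatches
```

An empty mismatch list builds confidence that the generated design matches the behavioral intent; any failing stimuli become the starting point for the trace-back aids described above.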

Now add GenAI

With a solid foundation and training scope that can leverage the full range of learning drawn from software engineering, you might understand why I find this direction appealing. Rise supports the kinds of generative code development you might see in a CoPilot platform – statement completion, prompt-based code snippet generation, and retrieval-augmented generation (RAG) to find real code examples, documentation, test suggestions, etc. I believe RAG feedback is limited to customer in-house sources for obvious reasons.

I’m impressed. Well thought through, closely coupled to production tools and a way for RTL designers to progress past the C/C++ barrier. (I suspect even that dam will break as more system enterprises demand flows better suited to their ecosystems.) You can learn more HERE.

Also Read:

CEO Interview: Badru Agarwala of Rise Design Automation

SemiWiki Outlook 2025 with yieldHUB Founder & CEO John O’Donnell

TRNG for Automotive achieves ISO 26262 and ISO/SAE 21434 compliance


Executive Interview: Steve Howington of the Protective, Marine & High Performance Flooring Division of Sherwin-Williams

by Daniel Nenni on 03-04-2025 at 10:00 am


Steve Howington is Global Vice President of Marketing for the Protective, Marine and High Performance Flooring Division of The Sherwin-Williams Company. During his 22 years with the company, Steve has held multiple commercial and business leadership roles in both the architectural and industrial groups within Sherwin-Williams.

Tell us about your company.

Sherwin-Williams is one of the largest paint companies in the world, with a portfolio that includes industrial coatings for advanced manufacturing facilities, like semiconductor fabs. Paint is often associated with aesthetics; however, safeguarding a facility’s investments and substrates requires protective coatings.

Our Protective, Marine & High Performance Flooring division has tailored advanced coating systems designed to meet the unique standards of semiconductor fabs. These more robust coating systems can be seamlessly integrated to create a more efficient fab construction process, which compresses the construction schedule and ultimately accelerates chip production. We have been able to continually find ways to make the fab construction process more efficient for some of the largest fab projects across the country.

What problems are you solving?

We simplify fab construction which accelerates chip production. Fab construction is a complex, costly, and lengthy process involving many partners, suppliers, and trades. Through our industry and mega project experience, along with our global research and development teams, we’ve found many ways to simplify this process and accelerate construction timelines while improving the lifecycle of each fab, without sacrificing safety and sustainability. Additionally, our coatings and application processes ensure extended maintenance cycles and overall reduced costs over the lifetime of each area and asset.

We do this not only through our products, but also through the preconstruction guidance and expertise we provide to the largest semiconductor companies in the U.S. and worldwide.

What application areas are your strongest?

We work in several critical areas of the semiconductor fab. Our protective coatings are designed to protect both clean and non-clean zones, including walls, ceilings, floors, and structural elements. We are leaders in cleanroom-specific applications, providing low VOC, high-performance coatings that meet outgassing standards and prevent contamination.

Beyond clean zones, we deliver durable coatings for industrial wastewater treatment plants, central utility buildings, gas and chemical storage, and other campus buildings that support the fab's operation and meet both performance and environmental standards. Our shop-applied steel and concrete protective coating solutions streamline construction by moving application offsite, reducing onsite labor and risks while accelerating timelines.

What keeps your customers up at night?

We’re all aware of the focus that is being placed on semiconductor chip production right now. With the worldwide implications of this technology, those who are responsible for making it must produce with speed and efficiency. That leads to our customers experiencing significant pressure to meet aggressive construction timelines and production targets. Tight deadlines, mixed with labor shortages and rising material costs, create a lot of stress for our customers as they navigate these issues.

What does the competitive landscape look like and how do you differentiate?

Participating in semiconductor fab construction mega projects demands deep knowledge of specific standards and requirements. There are plenty of coatings providers, but where we differentiate ourselves is our mindset – we don’t see ourselves as just a coatings company, we are a partner to the semiconductor industry and its stakeholders. That mindset shift helps set us apart from other coatings companies. We partner with every stakeholder in the construction value chain, know the technology and industry and we’re invested in the future of chip technology.

How do customers normally engage with your company?

Our Semiconductor Construction Solutions experts are available for direct consultation to answer any questions you may have no matter what phase of the project you’re in. We also have resources on our website, such as whitepapers like Maximizing Cleanroom Performance and Optimizing Facility Management with High-Performance Coatings, which help customers understand how our solutions simplify the most intricate aspects of fab construction. We work to ensure our customers feel supported at every step of their project, ultimately making their fab construction process safer, faster, and simpler.

To learn more about Sherwin-Williams Protective, Marine & High Performance Flooring coatings, visit our website or follow us on LinkedIn.

Also Read:

2025 Outlook with Dr Josep Montanyà of Nanusens

CEO Interview: John Chang of Jmem Technology Co., Ltd.


Unlocking the cloud: A new era for post-tapeout flow for semiconductor manufacturing

by Bassem Riad on 03-04-2025 at 6:00 am


As semiconductor chips shrink and design complexity skyrockets, managing post-tapeout flow (PTOF) jobs has become one of the most compute-intensive tasks in manufacturing. Advanced computational lithography demands an enormous amount of computing power, putting traditional in-house resources to the test. Enter the cloud—an agile, scalable solution with hundreds of compute options, set to revolutionize how foundries manage PTOF workloads.

The unpredictability problem: Bridging the gap in resources

For years, foundries have relied on powerful in-house resources to handle PTOF tasks. But PTOF workloads aren’t consistent—sometimes demand surges, leading to waiting queues that delay production, while at other times, costly resources sit idle. Expanding on-premises infrastructure to match peak demand is both costly and slow, often taking months to deploy. In an industry where every day counts, finding a flexible solution is essential. This is where the cloud steps in, offering dynamic scaling and the freedom to match resources with demand as needed.

Cloud elasticity: Pay only for what you need

Cloud platforms are transforming PTOF workflows by allowing foundries to pay only for what they use. This on-demand scaling means foundries no longer must overprovision or commit to massive hardware investments upfront. With infrastructure managed by cloud providers, teams can shift their focus to developing applications and improving customer engagement while resources expand or contract as needed. Cloud services offer semiconductor companies access to a global network of tools, empowering them to adapt quickly and push the boundaries of innovation.

Scaling up seamlessly: Siemens EDA and AWS join forces

This vision of agility and scalability became a reality in July 2023, when Siemens EDA and AWS signed a Strategic Collaboration Agreement to accelerate EDA workloads in the cloud. Out of this partnership came Cloud Flight Plans—automation scripts and best practices that streamline EDA deployment on AWS. Now, semiconductor manufacturers can effortlessly scale up resources, deploying hundreds of thousands of cores on demand. No more waiting months to expand data centers; cloud resources are available instantly, without capital investments or maintenance.

Building the foundation: A reference architecture for PTOF in the cloud

This agility is enhanced by Siemens EDA’s Cloud Reference Environment, an architecture purpose-built to handle PTOF jobs on AWS. Designed with secure principles and optimized for seamless workload management, this setup dynamically scales resources based on current demand. A central management system allocates resources to high-priority jobs and quickly redirects any underutilized capacity. Real-time spending insights empower semiconductor companies to control their cloud costs, ensuring resources are optimized at every step and that budget surprises are a thing of the past.

Real-time cost control with CalCM+: Smart scaling for smarter budgets

But it’s not just about scaling—it’s also about managing those costs smartly. Enter CalCM+, a solution for maximizing cloud efficiency of Calibre PTOF jobs. Central to CalCM+ is adaptive resource management, which monitors active jobs and allocates resources based on actual demand. This intelligent scaling ensures resources aren’t wasted on overprovisioning, keeping budgets lean.

At the heart of CalCM+ is the cost calculation app, offering real-time spending insights by integrating directly with AWS pricing and the Slurm scheduler. Teams can track job costs in real time, make informed decisions, and optimize resources based on precise needs. A recent study (see chart below) highlights how CalCM+ delivers measurable cost savings through smart scaling and predictive insights, proving that cloud efficiency is as much about cost control as it is about performance.
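To make the idea concrete, here is a hypothetical sketch of per-job cost tracking in the spirit of the cost calculation app described above. The instance types, prices, and job parameters are illustrative stand-ins, not real AWS pricing data or actual CalCM+ code.

```python
# Hypothetical per-job cloud cost estimate from scheduler runtime data.
# Prices below are assumed for illustration only.

HOURLY_PRICE = {          # on-demand $/hour per instance type (assumed)
    "r6i.8xlarge": 2.016,
    "c6i.16xlarge": 2.720,
}

def job_cost(instance_type, runtime_seconds, node_count=1):
    """Estimate the cost of one scheduler job from its runtime."""
    hours = runtime_seconds / 3600.0
    return HOURLY_PRICE[instance_type] * hours * node_count

# Example: a 90-minute job on four r6i.8xlarge nodes
cost = job_cost("r6i.8xlarge", runtime_seconds=90 * 60, node_count=4)
print(f"Estimated job cost: ${cost:.2f}")  # $12.10
```

In a real deployment, runtime and node counts would come from Slurm accounting records and prices from the cloud provider's pricing API.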

Data-driven insights: Predicting the future of resource use

CalCM+ goes a step further with a data analysis module that records usage metrics and job metadata, enabling predictions for future jobs. By studying historical data, this tool provides insights into expected runtime and memory usage, allowing teams to pick the best instance types for each task.
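A minimal sketch of how historical job metadata might inform instance selection, similar in spirit to the data analysis module described above. The job records, instance types, and memory figures are invented for the example.

```python
# Pick the smallest instance whose memory covers the observed peak usage
# of similar past jobs, plus a safety margin. All data here is invented.

HISTORY = [  # (layer_count, peak_mem_gb, runtime_min) from past jobs
    (120, 180, 95), (130, 210, 110), (125, 190, 100),
]

INSTANCE_MEM_GB = {"r6i.4xlarge": 128, "r6i.8xlarge": 256, "r6i.16xlarge": 512}

def recommend_instance(history, headroom=1.2):
    """Smallest instance with memory >= historical peak * headroom."""
    peak = max(mem for _, mem, _ in history)
    need = peak * headroom
    fits = [(mem, name) for name, mem in INSTANCE_MEM_GB.items() if mem >= need]
    return min(fits)[1]

print(recommend_instance(HISTORY))  # smallest instance covering 252 GB
```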

Lean Computing 

The AUTOREVOKECYCLE feature dynamically releases underutilized CPUs and reallocates them to high-demand jobs. This lean computing approach doesn’t just keep costs down—it ensures resources are used precisely where they’re needed, avoiding the waste that comes from overprovisioning. Figure 1 shows the effect of using the AUTOREVOKECYCLE feature.

Figure 1. The AUTOREVOKECYCLE feature dynamically releases underutilized CPUs and reallocates them to high-demand jobs.
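The lean-computing idea behind AUTOREVOKECYCLE can be sketched as follows. This is a hedged illustration with invented data structures; the real feature operates inside Calibre's own scheduling machinery.

```python
# Sketch of revoking underutilized CPUs and reallocating them to the
# busiest job. Job names, allocations, and utilizations are invented.

def rebalance(jobs, threshold=0.25):
    """Return a new allocation that moves idle CPUs to demanding jobs."""
    freed = 0
    alloc = {}
    for name, (cpus, utilization) in jobs.items():
        if utilization < threshold and cpus > 1:
            freed += cpus - 1          # revoke all but one CPU
            alloc[name] = 1
        else:
            alloc[name] = cpus
    if freed:                          # hand freed CPUs to the busiest job
        busiest = max(jobs, key=lambda n: jobs[n][1])
        alloc[busiest] += freed
    return alloc

jobs = {"opc_a": (16, 0.10), "opc_b": (32, 0.95)}
print(rebalance(jobs))  # {'opc_a': 1, 'opc_b': 47}
```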

Cost savings through the power of spot instances

Adding to the cost-saving toolkit is the cloud’s ability to offer dynamic pricing. Foundries can now use spot instances to run high-performance tasks at a fraction of the regular cost. These spot instances, ideal for peak demand, tap into unused cloud capacity at lower rates, helping companies stay within budget without compromising performance.
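A back-of-the-envelope comparison makes the spot-instance economics tangible. The prices below are assumed for illustration; real spot discounts vary by region, instance type, and time of day.

```python
# Compare on-demand vs. spot pricing for a burst of PTOF work.
# Both hourly rates are assumed values, not quoted AWS prices.

ON_DEMAND = 2.72    # $/hour, assumed
SPOT = 0.95         # $/hour, assumed (roughly a 65% discount)

hours = 8
nodes = 200

on_demand_cost = ON_DEMAND * hours * nodes
spot_cost = SPOT * hours * nodes
savings = 1 - spot_cost / on_demand_cost
print(f"on-demand ${on_demand_cost:,.0f}, spot ${spot_cost:,.0f}, saving {savings:.0%}")
```

The trade-off is that spot capacity can be reclaimed by the provider, so it suits interruptible batch work like PTOF jobs with checkpointing better than latency-critical services.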

FullScale processing: Speeding up time-to-tapeout

Cloud elasticity also shines with Calibre FullScale’s high-throughput processing capabilities, a compelling answer to the compute-intensive demands of PTOF. By enabling parallel lithography simulations, Calibre FullScale slashes job completion times, making faster tapeouts more attainable than ever. With the flexibility to adjust resources based on cost and performance needs, FullScale delivers optimal efficiency, ensuring every task is completed on schedule and with maximum precision (figure 2).

Figure 2. Calibre FullScale speeds time to tapeout.

Tapping into GPU power: Acceleration for compute-intensive tasks

For leading-edge technology nodes, the availability of GPU instances in the cloud is a game-changer. Compute-intensive tasks—like lithography, etch, and e-beam simulations—now run with hardware-accelerated performance, reducing runtimes dramatically. With GPU acceleration, manufacturers can conduct highly detailed simulations that were previously limited by on-premises constraints. The cloud’s GPU capabilities bring precision and scale, redefining what’s possible in PTOF simulations.

Cloud-native orchestration: The Kubernetes advantage

Orchestration systems like Kubernetes are also part of this cloud-driven transformation. Siemens EDA’s solutions leverage container orchestration to enable seamless job distribution across cloud resources. With Kubernetes automating deployment, scaling, and workload management, running complex Calibre PTOF jobs becomes effortless, whether on-premises or in the cloud. This cloud-native execution model maximizes resource use, delivering scalability, efficiency, and flexibility for semiconductor manufacturers.

A new era for semiconductor manufacturing

As semiconductor manufacturing embraces the cloud, a new era is taking shape—one where agility, efficiency, and cost control redefine the way PTOF tasks are managed. With the flexibility to scale on demand, optimize budgets, and orchestrate workloads seamlessly, cloud-based PTOF workflows are setting new standards. By tapping into cloud capabilities, container orchestration, and GPU resources, semiconductor manufacturers gain the edge needed to drive innovation, speed time-to-market, and thrive in an ever-evolving industry.

For a deep dive into this PTOF cloud flow, please see the technical paper, Crush Semi-manufacturing runtimes with Calibre in the cloud.

Bassem is a cloud product engineer specializing in scalable and cost-efficient computing solutions for semiconductor design and manufacturing. With expertise in Kubernetes, high-performance computing, and cloud infrastructure, Bassem focuses on optimizing post-tapeout workflows, EDA tool deployment, and hybrid cloud strategies.

Also Read:

Getting Faster DRC Results with a New Approach

Full Spectrum Transient Noise: A must have sign-off analysis for silicon success

PSS and UVM Work Together for System-Level Verification

Averting Hacks of PCIe® Transport using CMA/SPDM and Advanced Cryptographic Techniques


SemiWiki Outlook 2025 with yieldHUB Founder & CEO John O’Donnell

SemiWiki Outlook 2025 with yieldHUB Founder & CEO John O’Donnell
by Daniel Nenni on 03-03-2025 at 10:00 am

John O’Donnell YieldHUB SemiWiki

John is not your typical CEO. He gets to know every single one of his employees and is incredibly humble, despite his many achievements. He doesn’t believe you have to be ruthless to build a business, and thinks empathy and leadership can lead to better outcomes. Fast forward to today and he’s running a company with a platform that’s used by thousands of product and test engineers around the world. John loves what he does and enjoys traveling the world to meet new prospects and customers.

What was the most exciting high point of 2024 for your company?

One of the most exciting milestones in 2024 was the further expansion of our data science team, which allowed us to take a bold step toward fully integrating AI into our solutions. This is not only enhancing our offerings but also helping us grow within our existing customer base.

Another highlight for yieldHUB was attracting new and strategic customers, for example those developing AI chips and others involved in onshoring testing in the USA and Europe.

What was the biggest challenge your company faced in 2024?

The biggest challenge in 2024 was how to keep developing yieldHUB’s next-generation platform while meeting the increasing demand for our current platform as we added new customers.

How is your company’s work addressing this biggest challenge?

We expanded our R&D and customer success teams to accelerate the new platform’s progress while ensuring that our customers continued to receive top-tier support and service. Maintaining strong customer relationships and responsiveness remains a top priority.

What do you think the biggest growth area for 2025 will be, and why?

We have a new product coming out soon called yieldHUB Live, our AI-driven, tester-agnostic real-time monitoring system for test and probe. It speeds up testing by recommending to the operator what to do when there are issues. It also allows in-depth remote monitoring of the test/probe floor and tracks key parameters that reflect the integrity, or not, of testing and trimming. The demand for real-time insights is increasing, and we believe yieldHUB Live will be a game-changer for test houses: it will greatly reduce the time lots spend on hold, and fewer testers will need to be bought when volumes increase again.

How is your company’s work addressing this growth?

We’ve worked hard to ensure yieldHUB Live, although complex behind the scenes, is simple to implement on any tester type, while also being scalable and exceptionally reliable. Once set up, it can fan out to hundreds of testers within days, as it requires no additional hardware.

What conferences did you attend in 2024 and how was the traffic?

We participated in several key industry events in 2024, including ITC Test Week, Semicon West, International Microwave Symposium, PSECE, the Annual NMI Conference, IEEE VLSI Test Symposium, and the Semiconductor Wafer Test Expo. Attendance was strong across all these events, and we had great engagement with both existing and potential customers.

Will you attend conferences in 2025? Same or more?

Absolutely! We’ve already confirmed that we’ll be exhibiting at the NMI Annual Conference (UK), Semicon West, ITC, and PSECE (Philippines), with plans to attend additional events throughout the year. We recently became a member of Silicon Saxony so the plan is to expand our presence in Germany and the EU.

How do customers engage with your company?

We like to make sure that all yieldHUB customers receive exceptional support and value at every stage. Our dedicated Customer Success team is committed to providing proactive, personalized assistance, and our exclusive library of tools and resources empowers customers to maximize the benefits of our solutions.

New customers receive comprehensive online training and all customers have access to our highly efficient ticketing system, ensuring that any inquiries or issues are addressed swiftly. In fact, our median first response time in 2024 was just 5 minutes, meaning customers hear from one of our engineers almost instantly:

https://www.yieldhub.com/request-a-demo/

Beyond reactive support, we prioritize ongoing engagement. Our Director of Customer Success, Michael Clarke, regularly connects with customers via face-to-face video calls to ensure they are fully supported and to gain valuable feedback.

The results speak for themselves: Our customer satisfaction rating for closed tickets in 2024 was an impressive 95%, far exceeding the global benchmark of 74%. This level of responsiveness and care is another area that sets yieldHUB apart and we’re committed to continuing this high standard in 2025 and beyond.

Additional questions or final comments?

We’re excited for what’s to come in the next two years. Our focus remains on delivering cutting-edge AI-driven data analytics that empower semiconductor companies, especially at the test stage, to improve efficiency and maximize profitability. We look forward to continuing our journey with customers, partners and the industry as a whole!

Talk to a yield expert
Also Read:

yieldHUB Improves Semiconductor Product Quality for All

Podcast EP167: What is Dirty Data and How YieldHUB Helps Fix It With Carl Moore

Podcast EP181: A Tour of yieldHUB’s Operation and Impact with Carl Moore

Podcast EP243: What is Yield Management and Why it is Important for Success with Kevin Robinson

Podcast EP254: How Genealogy Correlation Can Uncover New Design Insights and improvements with yieldHUB’s Kevin Robinson


Powering the Future: How Engineered Substrates and Material Innovation Drive the Semiconductor Revolution

Powering the Future: How Engineered Substrates and Material Innovation Drive the Semiconductor Revolution
by Kalar Rajendiran on 03-03-2025 at 6:00 am

Substrate Vision Summit Engineered Substrate Panel Session

Engineered substrate technology is driving an evolution within the semiconductor industry. As Moore’s Law reaches its limits, the focus is shifting from traditional planar wafer scaling to innovative material engineering and 3D integration. Companies like Soitec, Intel and Samsung are pioneering this transition, unlocking new levels of performance, efficiency, and scalability.

The topic of engineered substrates and material innovation was the focus of an interesting panel discussion at the Substrate Vision Summit 2025. Daniel Nenni, Founder of SemiWiki.com, moderated the session. SemiWiki.com is a popular online platform featuring an active discussion forum dedicated to semiconductors. Christophe Maleville, CTO & SEVP of Innovation at Soitec, David Thompson, VP Technology Research at Intel, and Kelvin Low, VP Market Intelligence & Business Development at Samsung Foundry, were the panelists.

Engineered Substrates: Changing the Competitive Landscape

One of the most compelling advantages of engineered substrates is the ability to preinstall critical performance elements into the wafer itself. By embedding functionality at the substrate level, chip designers can achieve significant improvements in efficiency and power savings.

A clear example of this was shown several years ago with RF-SOI wafers, where Soitec demonstrated that a 2G design achieved 3G-level performance simply by switching to an RF-SOI wafer. This breakthrough provided GaAs-like performance without using GaAs technology, proving the potential of engineered wafers across various applications. Such advancements not only enhance performance but also accelerate product development cycles and reduce design complexity.

Addressing Challenges of Engineered Wafers

Semiconductor manufacturers face two major cost components: the cost of processing the wafer (internally or through procurement) and the cost of time (technology development cycles, learning curves, and integration challenges).

If every manufacturer were to independently develop SOI wafer technology, it would be an inefficient process with a steep learning curve. Instead, by relying on specialized providers like Soitec, foundries and chipmakers can source mature, high-performance engineered substrates and focus on differentiation at the chip level. This ecosystem-driven approach accelerates technology readiness and product development while ensuring cost efficiency.

Foundry Adoption and Market Demand

Foundries are recognizing the strategic importance of engineered substrates, particularly for Fully Depleted SOI (FD-SOI) technology. Samsung Foundry, a key player in this space, has already adopted 28FD-SOI in high-volume production at its Austin, TX fab, with customers like NXP and Lattice leveraging its benefits. Furthermore, Samsung is expanding its FD-SOI capacity to meet rising demand, while GlobalFoundries has also joined the ecosystem, reinforcing the technology’s viability. 18FD-SOI is on Samsung Foundry’s roadmap with STMicroelectronics as the lead customer.

Despite early concerns about cost and supply chain stability, FD-SOI has proven to be a compelling solution for applications that can manipulate body-biasing to achieve low power and high efficiency. Soitec has further addressed adoption challenges by investing in design infrastructure—including the acquisition of Dolphin Integration—to enhance support for SOI-based designs.

The 3D Future of Engineered Wafers

Both Soitec and Intel are embracing the 3D way of building engineered wafers. Soitec is advancing Smart Cut™ technology to enable precise layer transfer, facilitating hybrid bonding and wafer stacking for 3D integration. Intel, on the other hand, is developing Foveros 3D stacking, which enables transistors and logic units to be vertically integrated for improved performance and energy efficiency.

Unlike the traditional planar approach, where transistors are arranged side by side, the 3D method stacks layers vertically, reducing interconnect distances and power consumption. This shift is critical for sustaining Moore’s Law and ensuring future generations of semiconductors meet the growing demands of AI, high-performance computing, and edge applications.

Standardization and Scalability: Key to Mass Adoption

The conversation around wafer size standardization is evolving, but the real challenge lies in standardizing die-to-die interconnects for chiplet-based designs. UCIe (Universal Chiplet Interconnect Express) is leading this initiative, enabling interoperability across different foundries and manufacturers.

From an economic standpoint, scaling wafer size can yield more dies per wafer. For engineered materials like SiC or GaN, the cost-benefit analysis varies. A 300mm GaN substrate, for example, can achieve 20X Figure of Merit improvement over a 200mm GaN wafer, demonstrating the potential for engineered substrates to revolutionize power electronics and RF applications.

Value Creation Beyond Die Cost

Ultimately, the value of engineered substrates extends beyond raw die cost. By enhancing performance, reducing power consumption, and enabling new system architectures, these wafers deliver system-wide cost savings and new application possibilities. Without this broader perspective, certain technologies—such as SiC for power electronics—would struggle to establish a strong business case based solely on die cost.

Summary

As the semiconductor industry moves toward a 3D future, engineered substrates are becoming a strategic enabler of next-generation computing. Preinstalling critical performance elements into the wafer itself is helping redefine what’s possible in chip design. Foundries are embracing FD-SOI, and the push for larger, high-performance wafers is opening the door for more efficient, scalable, and cost-effective semiconductor manufacturing.

With increasing demand for AI, 5G, automotive, and high-performance computing, engineered substrates will be at the heart of the semiconductor industry’s next wave of innovation. The companies that leverage this technology early will be the ones shaping the future of computing.

Also Read:

Soitec: Materializing Future Innovations in Semiconductors

I will see you at the Substrate Vision Summit in Santa Clara

EVs, Silicon Carbide & Soitec’s SmartSiC™: The High-Tech Spark Driving the Future (with a Twist!)


Podcast EP276: How Alphawave Semi is Fueling the Next Generation of AI Systems with Letizia Giuliano

Podcast EP276: How Alphawave Semi is Fueling the Next Generation of AI Systems with Letizia Giuliano
by Daniel Nenni on 02-28-2025 at 10:00 am

Dan is joined by Letizia Giuliano, Vice President of Product Marketing and Management at Alphawave Semi. She specializes in architecting cutting-edge solutions for high-speed connectivity and chiplet design architecture. Prior to her role at Alphawave Semi, Letizia held the position of Product Line Manager at Intel, where she facilitated the integration of complex IP for external customers, as well as within Intel’s graphics and CPU products. With a background in Electrical Engineering, Letizia has contributed significantly to her field through technical papers, presentations at conferences and her involvement in defining industry standards like OpenHBI and UCIe.

Dan explores the unique and demanding requirements for next generation systems with Letizia. The need for a platform approach that addresses high-performance connectivity requirements is discussed. The role of advanced interface support through IP, chiplets and custom silicon is examined with respect to the need to scale up and scale out new systems with higher quality, reliability and shorter time to market.

Letizia describes the broad offerings Alphawave Semi is bringing to market to address these challenges. The current and future impact of this technology is explored.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.

 


CEO Interview: Dr. Andreas Kuehlmann of Cycuity

CEO Interview: Dr. Andreas Kuehlmann of Cycuity
by Daniel Nenni on 02-28-2025 at 8:00 am

Andreas 2022 Headshot cropped (2)

Dr. Andreas Kuehlmann, Executive Chairman and CEO at Cycuity, has spent his career across the fields of semiconductor design, software development, and cybersecurity. Prior to joining Cycuity, he helped build a market-leading software security business as head of engineering at Coverity and, after its acquisition by Synopsys, as General Manager of the newly formed Software Integrity business unit. In that role he led its growth from double-digit-millions to a multi-hundred-million-dollar business. He also previously worked at IBM Research and Cadence Design Systems, where he made influential contributions to hardware verification. Dr. Kuehlmann served as an adjunct professor at UC Berkeley’s Department of Engineering and Computer Science for 14 years, and received a Ph.D. in Electrical Engineering from the Technical University Ilmenau, Germany.

Tell us about your company?

Cycuity provides software products and services to specify and verify semiconductor device security. We help customers to ensure that security weaknesses are identified and mitigated during the design phase before manufacturing. Our security solutions are a critical element in the semiconductor product ecosystem for commercial and defense industries. They provide the broadest security assurance for semiconductor development across the design supply chain from secure usage of third-party IPs (3PIP) to full chips, including firmware. Cycuity’s products fit smoothly into existing design flows and utilize the simulation and emulation products of all three EDA vendors: Synopsys, Cadence, and Siemens EDA. Furthermore, our technology is applied to perform advanced security assessments of legacy hardware components in existing systems.

What problems are you solving?

Security threats in modern hardware systems are complex, rapidly evolving, and often overlooked during the early stages of design. The Radix platform directly addresses these challenges by identifying security weaknesses and unexpected behaviors early in the chip design lifecycle, minimizing the risk of escapes that lead to potential exploits. Traditional verification tools frequently fall short in providing complete security coverage across hardware, firmware, and software. Radix closes this gap by delivering a comprehensive security verification solution that spans the entire system from block level to software.

Radix’s systematic approach allows teams to develop security measures effectively and document their proper functioning with full transparency. Radix transforms security assurance from a fragmented and reactive process into a proactive, scalable, and fully traceable solution.

What application areas are your strongest?

We excel in delivering quantifiable assurance for semiconductors used in critical applications across industries, especially for high-stakes applications in defense, automotive, and IoT where security and reliability are paramount.

What keeps your customers up at night?

Our customers are concerned about ensuring the security and resilience of their semiconductor chips and embedded systems. What keeps them up at night is the thought of receiving a call from one of their customers reporting a security vulnerability in a chip that is broadly deployed in many products. Besides the impact on their brand, the cost of remediating a hardware security flaw can be extremely high. Moreover, customers are concerned about delivering secure semiconductors which comply with increasingly stringent industry standards. We address these challenges head-on by providing quantifiable assurance and robust security practices. Our solutions empower customers to achieve confidence in their designs, so they can focus on innovation without compromising on security.

What does the competitive landscape look like and how do you differentiate?

Cycuity is uniquely positioned as a thought leader and innovator of hardware security solutions. We have demonstrated our commitment to the development of secure and resilient microelectronics for defense and commercial applications. Our Radix platform goes beyond the typical “pass or fail” checks. It offers unmatched security design support through advanced security exploration and analysis capabilities, as well as scalable and traceable security verification – helping to more effectively and efficiently achieve security signoff and prove compliance.

What new features/technology are you working on?

We’ve got some exciting new features coming soon—check back next month for the details. For now, we can share a bit about Radix’s unique exploration capabilities, which help security and verification teams to better understand chip designs from a security perspective. Unlike functional security verification, which is aimed at ensuring that a required set of security features is correctly implemented, security exploration is focused on investigating unknown or unintended side effects of security functions that are not entirely understood but could lead to security weaknesses or vulnerabilities. Security exploration with Radix offers powerful analysis and graphical visualization capabilities to reveal unexpected security behaviors that cannot be observed with traditional design tools. Even if the unexpected behavior turns out to be benign, fully analyzing and deeply understanding it serves as a powerful confirmation of the overall design intent.

How do customers normally engage with your company?

Many customers come to us with the need to build a comprehensive chip security program, often starting from scratch. Security is not like flipping a switch or installing a software product. It is a journey on which we help customers progress, starting with training, security requirement development, tool selection, integration, and production ramping, through to documentation and security signoff for manufacturing.

Talk to a Security Expert

Also Read:

Cycuity at the 2024 Design Automation Conference

Hardware Security in Medical Devices has not been a Priority — But it Should Be


The Double-Edged Sword of AI Processors: Batch Sizes, Token Rates, and the Hardware Hurdles in Large Language Model Processing

The Double-Edged Sword of AI Processors: Batch Sizes, Token Rates, and the Hardware Hurdles in Large Language Model Processing
by Lauro Rizzatti on 02-27-2025 at 10:00 am

Accelerated,Computing, ,Parallel,Processing,To,Speed,Up,Work,On

Unlike traditional software programming, AI software modeling represents a transformative paradigm shift, reshaping methodologies, redefining execution processes, and driving significant advancements in AI processor requirements.

Software Programming versus AI Modeling: A Fundamental Paradigm Shift

Traditional Software Programming
Traditional software programming is built around crafting explicit instructions (code) to accomplish specific tasks. The programmer establishes the software’s behavior by defining a rigid set of rules, making this approach ideal for deterministic scenarios where predictability and reliability are paramount. As tasks become more complex, the codebase often grows in size and complexity.

When updates or changes are necessary, programmers must manually modify the code—adding, altering, or removing instructions as needed. This process provides precise control over the software but limits its ability to adapt dynamically to unforeseen circumstances without direct intervention from a programmer.

AI Software Modeling
AI software modeling represents a fundamental shift in how problems are approached. It enables systems to learn patterns from data through iterative training. During training, AI analyzes vast datasets to identify behaviors, then applies this knowledge in the inference phase to perform tasks like translation, financial analysis, medical diagnosis, and industrial optimization.

Using probabilistic reasoning, AI makes predictions and decisions based on probabilities, allowing it to handle uncertainty and adapt. Continuous fine-tuning with new data enhances accuracy and adaptability, making AI a powerful tool for solving complex real-world challenges.

The complexity of AI systems lies not in the amount of written code but in the architecture and scale of the models themselves. Advanced AI models, such as large language models (LLMs), may contain hundreds of billions or even trillions of parameters. These parameters are processed using multidimensional matrix mathematics, with precision or quantization levels ranging from 4-bit integers to 64-bit floating-point calculations. While the core operations, namely multiply-accumulate (MAC) operations, are rather simple, they are performed millions of times across large datasets, with vast numbers of parameters processed in parallel during each clock cycle.
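To illustrate why MAC count dominates AI compute, here is a toy matrix-vector product that counts the MACs it performs. Every weight contributes exactly one multiply-accumulate, so compute scales directly with parameter count.

```python
# Count multiply-accumulate (MAC) operations in one matrix-vector
# product: one MAC per weight, so a model with N parameters performs
# on the order of N MACs per input.

def matvec(weights, x):
    """Matrix-vector product, counting multiply-accumulate operations."""
    macs = 0
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi   # one MAC
            macs += 1
        out.append(acc)
    return out, macs

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3x2 weight matrix
y, macs = matvec(W, [10.0, 1.0])
print(y, macs)  # [12.0, 34.0, 56.0] 6 -- one MAC per weight
```

Scaled up, a model with a trillion parameters performs on the order of a trillion MACs per token generated, which is why hardware parallelism matters so much.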

Software Programming versus AI Modeling: Implications on Processing Hardware

Central Processing Unit (CPU)
For decades, the dominant architecture used to execute software programs has been the CPU, originally conceptualized by John von Neumann in 1945. The CPU processes software instructions sequentially—executing one line of code after another—limiting its speed to the efficiency of this serial execution. To improve performance, modern CPUs employ multicore and multi-threading architectures. By breaking down the instruction sequence into smaller blocks, these processors distribute tasks across multiple cores and threads, enabling parallel processing. However, even with these advancements, CPUs remain limited in their computational power, lacking the enormous parallelism required to process AI models.

The most advanced CPUs achieve computational power of a few gigaFLOPS and feature memory capacities reaching a few terabytes in high-end servers, with memory bandwidths peaking at 500 gigabytes per second.

AI Accelerators
Overcoming CPU limitations requires a massively parallel computational architecture capable of executing millions of basic MAC operations on vast amounts of data in a single clock cycle.

Today, Graphics Processing Units (GPUs) have become the backbone of AI workloads, thanks to their unparalleled ability to execute massively parallel computations. Unlike CPUs, which are optimized for general-purpose tasks, GPUs prioritize throughput, delivering performance in the range of petaFLOPS—often two orders of magnitude higher than even the most powerful CPUs.

However, this exceptional performance comes with trade-offs, particularly depending on the AI workload: training versus inference. GPUs can experience efficiency bottlenecks when handling large datasets, a limitation that significantly impacts inference but is less critical for training. LLMs like GPT-4, OpenAI’s o1/o3, Llama 3-405B, and DeepSeek-V3/R1 can dramatically reduce GPU efficiency. A GPU with a theoretical peak performance of one petaFLOP may deliver only 50 teraFLOPS when running GPT-4. While this inefficiency is manageable during training, where completion matters more than real-time performance, it becomes a pressing issue for inference, where latency and power efficiency are crucial.
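The utilization gap described above is worth making explicit: sustaining 50 teraFLOPS on hardware with a one-petaFLOP peak means the GPU is running at just 5% efficiency.

```python
# Efficiency of a GPU on an LLM workload: sustained / peak throughput.
# Figures are taken from the illustrative example in the text.

peak_flops = 1e15        # 1 petaFLOP/s theoretical peak
sustained_flops = 50e12  # 50 teraFLOP/s sustained on the workload

utilization = sustained_flops / peak_flops
print(f"{utilization:.0%}")  # 5%
```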

Another major drawback of GPUs is their substantial power consumption, which raises sustainability concerns, especially for inference in large-scale deployments. The energy demands of AI data centers have become a growing challenge, prompting the industry to seek more efficient alternatives.

To overcome these inefficiencies, the industry is rapidly developing specialized AI accelerators, such as application-specific integrated circuits (ASICs). These purpose-built chips offer significant advantages in both computational efficiency and energy consumption, making them a promising alternative for the next generation of AI processing. As AI workloads continue to evolve, the shift toward custom hardware solutions is poised to reshape the landscape of artificial intelligence infrastructure. See Table I.

Attributes | Software Programming | AI Software Modeling
Application Objectives | Deterministic and Targeted Tasks | Predictive AI and Generative AI
Flexibility/Adaptability | Rule-based and Rigid | Data-driven Learning and Evolving
SW Development | Specific Programming Languages | Data Science, ML, SW Engineering
Processing Method | Sequential Processing | Non-linear, Heavily Parallel Processing
Processor Architecture | CPUs | GPUs and Custom ASICs

Table I summarizes the main differences between traditional software programming vis-à-vis AI software modeling.

Source: VSORA

Key and Unique Attributes of AI Accelerators

The massively parallel architecture of AI processors possesses distinct attributes not found in traditional CPUs. Specifically, two key metrics are crucial for the accelerator’s ability to deliver the performance required to process AI workloads, such as LLMs: batch sizes and token throughput. Achieving target levels for these metrics presents engineering challenges.

Batch Sizes and the Impact on Accelerator Efficiency

Batch size refers to the number of independent inputs or queries processed concurrently by the accelerator.

Memory Bandwidth and Capacity Bottlenecks

In general, larger batches improve throughput by better utilizing parallel processing cores. As batch sizes increase, so do memory bandwidth and capacity requirements. Excessively large batches can lead to cache misses and increased memory access latency, thus hindering performance.
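A back-of-the-envelope sketch of why memory capacity requirements grow with batch size, using the per-token key/value cache of a decoder-only transformer. All model dimensions below are hypothetical, loosely modeled on a 70B-class network:

```python
def kv_cache_bytes(batch: int, seq_len: int, n_layers: int,
                   n_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV-cache footprint: 2x (keys and values) per layer, head, position."""
    return 2 * batch * seq_len * n_layers * n_heads * head_dim * bytes_per_elem

# Hypothetical dimensions: 80 layers, 64 heads, head_dim 128,
# 4096-token context, FP16 (2 bytes per element).
for batch in (1, 8, 64):
    gib = kv_cache_bytes(batch, 4096, 80, 64, 128) / 2**30
    print(f"batch {batch:3d}: {gib:8.1f} GiB of KV cache")
```

The cache grows linearly with batch size (10 GiB per request in this sketch), which is why large batches quickly exhaust on-chip memory and spill to slower off-chip DRAM.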

Latency Sensitivity

Large batch sizes affect latency because the processor must handle significantly larger datasets simultaneously, increasing computation time. Real-time applications, such as autonomous driving, demand minimal latency, often requiring a batch size of one to ensure immediate response. In safety-critical scenarios, even a slight delay can lead to catastrophic consequences. However, this presents a challenge for accelerators optimized for high throughput, as they are typically designed to process large batches efficiently rather than single-instance workloads.

Continuous Batching Challenges
Continuous batching is a technique where new inputs are dynamically added to a batch as processing progresses, rather than waiting for a full batch to be assembled before execution. This approach reduces latency and improves throughput. It can affect time-to-first-token, but provided the scheduler can keep up with execution, it achieves higher overall efficiency.
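The idea can be illustrated with a toy scheduler. Each request is represented only by its number of remaining decode steps; a new request joins the running batch as soon as a slot frees, instead of waiting for the whole batch to finish:

```python
from collections import deque

def continuous_batching(requests, max_batch):
    """Toy continuous-batching scheduler.

    requests: decode steps each request needs.
    Returns total decode steps until all requests complete.
    """
    waiting = deque(requests)
    running = {}          # request id -> steps remaining
    next_id, steps = 0, 0
    while waiting or running:
        # Admit waiting requests into any free batch slots.
        while waiting and len(running) < max_batch:
            running[next_id] = waiting.popleft()
            next_id += 1
        # One decode step for every in-flight request; drop finished ones.
        steps += 1
        running = {rid: left - 1 for rid, left in running.items() if left > 1}
    return steps

# Four requests of very different lengths, two batch slots.
# Static batching would take 8 + 2 = 10 steps ([8,2] then [2,2]);
# continuous batching finishes in 8 because short requests slot in
# alongside the long one.
print(continuous_batching([8, 2, 2, 2], max_batch=2))  # → 8
```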

Token Throughput and Its Computational Impact

Token throughput refers to the number of tokens—whether words, sub-words, pixels, or data points—processed per second. It depends on input token sizes and output token rates, requiring high computational efficiency and optimized data movement to prevent bottlenecks.

Token Throughput Requirements
A key metric in defining token throughput for LLMs is time-to-first-token: low latency here is typically achieved through continuous batching to minimize queuing delays. For traditional LLMs serving human readers, the output rate must exceed human reading speed, while for agentic AI that relies on direct machine-to-machine communication, sustaining much higher throughput is critical.
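The human-facing floor is modest, as a quick estimate shows. The reading-speed and tokens-per-word figures below are rough rules of thumb, not measurements:

```python
def min_tokens_per_sec(words_per_minute: float = 250,
                       tokens_per_word: float = 1.3) -> float:
    """Rough lower bound on output rate for a human-facing chatbot:
    the model should emit tokens at least as fast as the user reads."""
    return words_per_minute / 60 * tokens_per_word

print(f"{min_tokens_per_sec():.1f} tokens/s")  # → 5.4 tokens/s
```

Agentic machine-to-machine pipelines have no such reading-speed ceiling, which is why their throughput requirements can be orders of magnitude higher.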

Traditional Transformers vs Incremental Transformers
Most LLMs, such as OpenAI o1, Llama, Falcon, and Mistral, use transformers, which require each token to attend to all previous tokens. This leads to high computational and memory costs. Incremental transformers offer an alternative by computing tokens sequentially rather than recomputing the full sequence at every step. This approach improves efficiency in streaming inference and real-time applications. However, it requires storing intermediate state data, increasing memory demands and data movement, which impacts throughput, latency, and power consumption.
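The compute saving can be sketched by counting attention-score operations over an n-token generation. This is a toy cost model that ignores constants and the extra memory traffic for cached state that the text notes:

```python
def full_recompute_ops(n: int) -> int:
    """Attention-score ops when the whole sequence is recomputed at
    every generation step: step t scores t tokens against t positions."""
    return sum(t * t for t in range(1, n + 1))

def incremental_ops(n: int) -> int:
    """With cached intermediate state, step t only scores the one new
    token against the t previous positions."""
    return sum(t for t in range(1, n + 1))

# For a 1024-token generation the incremental scheme does ~683x fewer
# score operations (the ratio works out to (2n+1)/3).
print(full_recompute_ops(1024) // incremental_ops(1024))  # → 683
```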

Further Considerations
Token processing also presents several challenges. Irregular token patterns, such as varying sentence and frame lengths, can disrupt optimized hardware pipelines. Additionally, in autoregressive models, token dependencies can cause stalls in the processing pipeline, reducing the effective utilization of computational resources.

Overcoming Hurdles in Hardware Accelerators
In stark contrast to the CPU, which has undergone a remarkable evolutionary journey over the past 70 years, AI accelerators are still in their formative stage, with no established architecture capable of overcoming all the hurdles in meeting the computational demands of LLMs.

The most critical bottleneck is memory bandwidth, often referred to as the memory wall. Large batches require substantial memory capacity to store input data, intermediate states and activations, while demanding high data transfer bandwidth. Achieving high token throughput depends on fast data transfer between memory and processing units. When memory bandwidth is insufficient, latency increases, and throughput declines. These bottlenecks become a major constraint on computing efficiency, limiting the actual performance to a fraction of the theoretical maximum.
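A roofline-style estimate makes the memory wall concrete. During single-stream decoding, every weight must be streamed from memory once per generated token, so memory bandwidth caps the token rate regardless of available compute. The parameter count and bandwidth figures below are illustrative:

```python
def bandwidth_bound_tokens_per_sec(params_billion: float,
                                   bytes_per_param: float,
                                   hbm_gb_s: float) -> float:
    """Upper bound on single-stream decode rate when every parameter
    is read from memory once per generated token."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return hbm_gb_s * 1e9 / bytes_per_token

# Hypothetical: 70B parameters in FP16 (2 bytes each) behind
# ~3,350 GB/s of HBM3 bandwidth.
print(f"{bandwidth_bound_tokens_per_sec(70, 2, 3350):.1f} tokens/s ceiling")
```

No amount of extra compute lifts this ceiling; only more bandwidth, better weight reuse across a batch, or fewer bytes per parameter does.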

Beyond memory constraints, computational bottlenecks pose another challenge. LLMs rely on highly parallelized matrix operations and attention mechanisms, both of which demand significant computational power. High token throughput further intensifies the need for fast processing performance to maintain smooth data flow.

Data access patterns in large batches introduce additional complexities. Irregular access patterns can lead to frequent cache misses and increased memory access latencies. To sustain high token throughput, efficient data prefetching and reuse strategies are essential to minimize memory overhead and maintain consistent performance.

Addressing these challenges requires innovative memory architectures, optimized dataflow strategies, and specialized hardware designs that balance memory and computational efficiency.

Overcoming the Memory Wall
Advancements in memory technologies, such as high-bandwidth memory (HBM)—particularly HBM3, which offers significantly higher bandwidth than traditional DRAM—help reduce memory access latency. Additionally, larger and more intelligent on-chip caches enhance data locality and minimize reliance on off-chip memory, mitigating one of the most critical bottlenecks in hardware accelerators.

One promising approach involves modeling the entire cache memory hierarchy with a register-like structure that stores data on a single clock cycle rather than requiring tens of clock cycles. This method optimizes memory allocation and deallocation for large batches while sustaining high token output rates, significantly improving overall efficiency.

Enhancing Computational Performance
Specialized hardware accelerators designed for LLM workloads, such as matrix multiplication units and attention engines, can dramatically boost performance. Efficient dataflow architectures that minimize unnecessary data movement and maximize hardware resource utilization further enhance computational efficiency. Mixed-precision computing, which employs lower-precision formats like FP8 where applicable, reduces both memory bandwidth requirements and computational overhead without sacrificing model accuracy. This technique enables faster and more efficient execution of large-scale models.
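As a rough sketch of why lower precision helps on both fronts (parameter counts and byte widths here are illustrative), halving the bytes per weight halves both the memory footprint and the bandwidth needed to stream the weights for each generated token:

```python
def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return params_billion * bytes_per_param

# Hypothetical 405B-parameter model:
fp16 = weight_footprint_gb(405, 2)  # 810 GB
fp8 = weight_footprint_gb(405, 1)   # 405 GB: half the capacity and half
print(fp16, fp8)                    # the bandwidth per generated token
```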

Optimizing Software Algorithms
Software optimization plays a crucial role in fully leveraging hardware capabilities. Highly optimized kernels tailored to LLM operations can unlock significant performance gains by exploiting hardware-specific features. Gradient checkpointing reduces memory usage by recomputing gradients on demand, while pipeline parallelism allows different model layers to be processed simultaneously, improving throughput.
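Gradient checkpointing trades compute for memory. A toy unit-cost model, one unit per layer's activations with checkpoints placed every √n layers (a common heuristic; the exact policy varies by framework), illustrates the saving:

```python
import math

def activation_memory(n_layers: int, checkpoint: bool = False) -> int:
    """Toy unit-cost model of activation memory during backpropagation.

    Without checkpointing, all n_layers activation sets are kept live.
    With checkpointing every sqrt(n) layers, only the checkpoints plus
    one recomputed segment are live at any time (~2*sqrt(n) units).
    """
    if not checkpoint:
        return n_layers
    k = math.isqrt(n_layers)
    return k + math.ceil(n_layers / k)

print(activation_memory(64), activation_memory(64, checkpoint=True))  # → 64 16
```

The 4x memory reduction in this sketch comes at the cost of recomputing each segment's forward pass during the backward pass, which is exactly the compute-for-memory trade described above.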

By integrating these hardware and software optimizations, accelerators can more effectively handle the intensive computational and memory demands of large language models.

About Lauro Rizzatti

Lauro Rizzatti is a business advisor to VSORA, an innovative startup offering silicon IP solutions and silicon chips, and a noted verification consultant and industry expert on hardware emulation.

Also Read:

A Deep Dive into SoC Performance Analysis: Optimizing SoC Design Performance Via Hardware-Assisted Verification Platforms

A Deep Dive into SoC Performance Analysis: What, Why, and How

SystemReady Certified: Ensuring Effortless Out-of-the-Box Arm Processor Deployments