

WEBINAR: Edge AI Optimization: How to Design Future-Proof Architectures for Next-Gen Intelligent Devices
by Daniel Nenni on 07-03-2025 at 10:00 am


Edge AI is rapidly transforming how intelligent solutions are designed, from smart home devices to autonomous vehicles, healthcare gadgets, and industrial IoT. Yet, architects, chip designers, and product managers frequently grapple with a common and daunting challenge: creating efficient, high-performance AI solutions today that remain relevant and adaptable for tomorrow’s innovations.

The reality is, the journey toward optimal edge AI solutions involves navigating numerous critical hurdles. Striking a delicate balance between computational efficiency, power consumption, latency, and flexibility for future algorithm updates requires intricate planning and foresight. Many industry professionals are caught between selecting overly rigid architectures—limiting innovation—or overly flexible systems that sacrifice performance and efficiency.

Additionally, AI models continue to evolve rapidly. Technologies like neural network architectures, quantization methods, and model compression techniques change faster than ever before. Architects and designers must forecast these changes to avoid costly redesigns and stay competitive.

But how can technological evolution be effectively anticipated? How can current AI hardware and software architectures remain powerful enough for future demands yet flexible enough to adapt seamlessly to upcoming innovations? And importantly, how can leveraging cloud inference capabilities enhance scalability and accelerate innovation in AI applications?

These pivotal questions are exactly what Ceva’s upcoming webinar, “What it Really Takes to Build Future-Proof AI Architecture”, will address. Scheduled for July 15th, 2025, this must-attend session will reveal industry-leading insights into overcoming the critical challenges of designing AI-driven solutions, including strategies for effectively leveraging cloud-based inference to meet evolving performance requirements.

Participants will join a distinguished panel of AI and semiconductor experts as they dive into practical approaches, proven methodologies, and innovative architectures to tackle the complexities of AI solutions. Attendees will learn about critical trade-offs, including performance versus power consumption, flexibility versus complexity, immediate cost versus long-term scalability, and how cloud inference can play a pivotal role in achieving future-proof designs.

This webinar isn’t just about theory—it’s about actionable insights. It aims to equip attendees to confidently navigate their next design project, whether they’re focused on edge devices or exploring cloud inference as part of their overall AI strategy. Whether an AI system architect fine-tuning next-generation devices, a chip designer optimizing performance-per-watt, or a product manager aiming to keep solutions competitive, this webinar will provide crucial insights for making informed architectural decisions.

Seats for this insightful webinar are limited, so don’t miss the opportunity to reserve your spot. Embrace this chance to ensure your next AI project doesn’t just meet today’s needs—it exceeds tomorrow’s expectations.

Secure your competitive edge today by registering here.

Uncover what it truly takes to build AI architectures that stand the test of time. Join Ceva on July 15th, 2025, and unlock the key insights your team needs to excel in the rapidly evolving world of AI, enhanced by powerful cloud inference solutions.

Also Read:

Podcast EP291: The Journey From One Micron to Edge AI at One Nanometer with Ceva’s Moshe Sheier

Turnkey Multi-Protocol Wireless for Everyone

Ceva-XC21 and Ceva-XC23 DSPs: Advancing Wireless and Edge AI Processing



WEBINAR Unpacking System Performance: Supercharge Your Systems with Lossless Compression IPs
by Daniel Nenni on 07-03-2025 at 6:00 am


In today’s data-driven systems—from cloud storage and AI accelerators to automotive logging and edge computing—every byte counts. The exponential growth in data volumes, real-time processing demands, and constrained bandwidth has made efficient, lossless data compression a mission-critical requirement. Software-based compression techniques, while flexible, often fall short in meeting the throughput, latency, and power requirements of modern hardware systems.

REGISTER HERE FOR THE LIVE WEBINAR

This webinar dives deep into the world of lossless data compression, with a focus on the industry’s most widely used algorithms: GZIP, LZ4, Snappy, and Zstd. Each of these algorithms presents a unique trade-off between compression ratio, speed, and resource requirements, making the selection of the right algorithm—and the right hardware implementation—crucial for performance and scalability.

We’ll start with a technical comparison of the four algorithms, highlighting their core mechanisms and application domains. You’ll learn how GZIP’s DEFLATE approach, LZ4’s lightning-fast block compression, Snappy’s simple parsing model, and Zstd’s dictionary-based hybrid technique serve different use cases—from archival storage to real-time streaming.
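Before jumping to hardware, it can help to get a feel for these trade-offs in software. The short Python sketch below is purely illustrative and is not part of the webinar material; it assumes the standard-library zlib plus the commonly used lz4, python-snappy, and zstandard PyPI packages are installed, and simply reports ratio and throughput for a repetitive test buffer.

```python
# Quick software baseline for the four algorithms discussed above.
# Assumes: pip install lz4 python-snappy zstandard (GZIP/DEFLATE comes
# from the standard-library zlib).
import time
import zlib

import lz4.frame
import snappy
import zstandard


def benchmark(name, compress, data):
    """Compress `data`, report compression ratio and wall-clock throughput."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(packed)
    mb_per_s = len(data) / (1024 * 1024) / elapsed
    print(f"{name:8s} ratio {ratio:5.2f}x  speed {mb_per_s:8.1f} MB/s")


if __name__ == "__main__":
    # Repetitive log-like data compresses well; real traffic will differ.
    data = b"sensor=42 temp=23.5 status=OK\n" * 200_000

    benchmark("GZIP", lambda d: zlib.compress(d, level=6), data)
    benchmark("LZ4", lz4.frame.compress, data)
    benchmark("Snappy", snappy.compress, data)
    benchmark("Zstd", zstandard.ZstdCompressor(level=3).compress, data)
```

Numbers from a sketch like this illustrate why algorithm choice matters, but they also expose the CPU cost that motivates the hardware discussion in the next section.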

From there, we’ll examine the limitations of software compression, particularly in embedded and high-performance designs. You’ll see how software implementations can quickly become bottlenecks, consuming excessive CPU cycles and failing to maintain line-rate performance. This sets the stage for hardware-accelerated compression, which delivers deterministic latency, high throughput, and significant energy savings—critical in FPGA and ASIC implementations.

The webinar will explore the capabilities and performance of hardware implementations of the above compression algorithms, studying trade-offs between latency, compression ratio, and resources, using examples taken from CAST’s extended portfolio:

ZipAccel-C/D: A GZIP-compatible DEFLATE engine with industry-leading ratio and throughput.

LZ4SNP-C/D: Optimized for ultra-low latency and high-speed performance in real-time systems, using the LZ4 and Snappy algorithms.

You’ll gain insights into integration strategies, including AXI and streaming interface compatibility, resource usage for FPGA vs. ASIC targets, and customization options available through CAST’s flexible IP design process.

Through real-world application examples—ranging from high-speed data transmission to on-board vehicle data logging—we’ll demonstrate how these cores are enabling next-generation performance across industries.

Whether you’re an FPGA designer, system architect, or IP integrator, this session will equip you with practical knowledge to select and implement the right compression core for your needs.

Join us to unpack the power of compression, boost your bandwidth efficiency, and gain the competitive edge that only silicon-optimized IP can deliver.

Webinar Abstract:

As data volumes surge across cloud, AI, automotive, and edge systems, efficient lossless compression has become essential for meeting performance, latency, and bandwidth constraints. This webinar explores the trade-offs and strengths of the industry’s leading compression algorithms—GZIP, LZ4, Snappy, and Zstd—highlighting how hardware-accelerated implementations can overcome the limitations of software-based solutions in demanding, real-time environments.

You’ll gain insights into latency vs. compression ratio vs. resource trade-offs, integration strategies for FPGAs and ASICs, and real-world applications like high-speed networking and automotive data logging. Discover how to boost your system’s efficiency and unlock next-level performance through compression IPs tailored for modern hardware.

Speaker:

Dr. Calliope-Louisa Sotiropoulou is an Electronics Engineer and holds the position of Sales Engineer & Product Manager at CAST. Dr. Sotiropoulou specializes in Image, Video and Data compression, and IP stacks. Before joining CAST she worked as a Research and Development Manager and an FPGA Systems Developer for the Aerospace and Defense sector. She has a long academic record as a Researcher, working on various projects, including the Trigger and Data Acquisition system of the ATLAS experiment at CERN. She received her PhD from the Aristotle University of Thessaloniki in 2014.

REGISTER HERE FOR THE LIVE WEBINAR

About CAST

Computer Aided Software Technologies, Inc. (CAST) is a silicon IP provider founded in 1993. The company’s ASIC and FPGA IP product line includes security primitives and comprehensive SoC security modules; microcontrollers and processors; compression engines for data, images, and video; interfaces for automotive, aerospace, and other applications; and various common peripheral devices. Learn more by visiting www.cast-inc.com.

Also Read:

Podcast EP273: An Overview of the RISC-V Market and CAST’s unique Abilities to Grow the Market with Evan Price

CAST Advances Lossless Data Compression Speed with a New IP Core

CAST, a Small Company with a Large Impact on Many Growth Markets #61DAC



ChipAgents Tackles Debug. This is Important
by Bernard Murphy on 07-02-2025 at 6:00 am


Innovation is never-ending in verification, for performance, coverage, connection to verification plans and other aspects of DV. But debug, accounting for 40% of the verification cycle, has remained stubbornly resistant to significant automation. Debug IDEs help to visualize but don’t address the core problem: given a failure, what is the most likely root cause for that failure? Fault localization, the common name for this objective, is still more of an art than a science. Some progress has been made through spectrum-based analytics (by test pass/fail, code coverage, suspiciousness, all crossed with failures). These methods help but still provide rather coarse-grained localization. The problem is that fine-grained debug generally requires a chain of reasoning complemented by experiments to effectively isolate a root cause. That makes it a perfect candidate for agentic methods, as ChipAgents is now demonstrating.
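To make “spectrum-based analytics” concrete, here is a minimal, generic sketch of Ochiai suspiciousness ranking over per-test coverage data. It illustrates the classical technique only, not ChipAgents’ implementation; the test data and line identifiers are hypothetical.

```python
import math
from collections import defaultdict


def ochiai_ranking(runs):
    """runs: list of (covered_lines, passed) tuples, one per test, where
    covered_lines is a set of RTL line/statement IDs.
    Returns lines ranked by Ochiai suspiciousness (most suspicious first)."""
    total_failed = sum(1 for _, passed in runs if not passed)
    fail_cov = defaultdict(int)   # failing tests covering each line
    pass_cov = defaultdict(int)   # passing tests covering each line
    for covered, passed in runs:
        for line in covered:
            (pass_cov if passed else fail_cov)[line] += 1

    scores = {}
    for line in set(fail_cov) | set(pass_cov):
        ef, ep = fail_cov[line], pass_cov[line]
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: "fsm.sv:87" is covered only by the two failing tests,
# so it ranks highest.
tests = [
    ({"alu.sv:12", "fsm.sv:87"}, False),
    ({"fsm.sv:87", "bus.sv:40"}, False),
    ({"alu.sv:12", "bus.sv:40"}, True),
]
print(ochiai_ranking(tests))
```

Even in this toy case the ranking is only a starting point; turning a suspicious line into a confirmed root cause still requires the chain of reasoning and experiments described below.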

The nature of debug and why agents are promising

Debugging hardware or software is hard because the connection between an observed failure and the root cause for that failure is rarely obvious. These may be separated widely in space, in functions/modules that appear very unrelated, or they may be separated widely in time, a root cause planting a trap which is only sprung millions of cycles later. New bugs can pop up during design evolution thanks to seemingly harmless fixes to other bugs, with no indication for why a problem thought to be resolved should suddenly reappear.

Given this reality, debug becomes an iterative trial-and-error process, guided certainly by strong IDEs to visualize waveforms, cross probe with code, and so on. But still the real intelligence in finding a root cause depends heavily on the insight of DV and design engineers, and experiments they run to test their guesses. This discovery process is fundamental to the debug task. Bugs may have more than one root cause or may result from some latent behavior not considered in spec development or test planning. This is why debug consumes so much time and resource in the verification cycle.

At DAC 2025 I talked to Kexun Zhang and Brian Li (both at ChipAgents) to understand their agents-based approach to automating debug. I was impressed. What they are doing is an entirely logical approach to debug, aware of state-of-the-art techniques like spectrum analysis while building on LLM and agent-based methods to advance beyond first-order localization. This looks like a real step forward in debug automation, managing discovery and allowing for expert guidance in the reasoning flow, or full autonomy if you wish.

ChipAgents applied to debug

I’ll start with what these methods can do for debug, then turn to methods and training considerations. I saw a brief demo at the ChipAgents booth; I wish I could have seen more, though I am usually not a fan of demos. In this case the product consumes a waveform file and a simulation log and then lets you ask natural language questions, just like a chatbot.

In the demo, an engineer typed in something like “X worked correctly on the first pass but not on the second pass. Tell me why.” “X” was a relatively high-level behavior. This prompt launched an initial analysis, narrowing down first level candidate behaviors in the waveforms to explain the difference. Following this initial analysis the tool offered the expert an opportunity to refine the prompt, or to let it continue to refine further those initial localizations itself. In these subsequent steps it might run one or more simulations, possibly with modifications to the code (in a new branch) to test hypotheses.

This process repeats until the tool has isolated one or several final candidates. Then it’s up to the expert to consider if this is an RTL bug, a test plan bug or a spec bug. Or possibly not a bug but a feature!

Learning and agent methods

Kexun stressed that out-of-the-box chat models do not work well in this domain. At ChipAgents they have put significant work into training the system to understand the chip domain much more accurately than a generalist chat model could. They guided training using a combination of synthetic and human-annotated data, which could be an important moat against anyone with plans to copy their approach😀  They have also built tools to parse through giant simulation dump and log files, another must-have you wouldn’t get from a general bot. Agents work with these tools in their analysis and traceback.

On localization methods Kexun had an interesting observation. He said that in earlier agentic approaches to debug in software engineering, agents also used spectrum-based methods and those also proved roughly accurate (20-30% localization). But as models got stronger, agents are now becoming simpler, no longer using spectrum methods explicitly. He added that whether this will also hold for hardware debug is still in debate within the company. I find it intriguing that a combination of simpler agents might become a more powerful approach.

We talked briefly about other sources for analysis – check-ins, specs and testplans for example. All of these are being worked, though the existing debug capability is attracting significant attention for active design teams, an appeal I find totally understandable.

Other development

Brian touched on a few other areas of development. Verification plan generation, driven from specifications running to hundreds or thousands of pages. Checking for disconnects between a spec and an existing test plan. Code generation along the lines of tools like CoPilot. And bug triaging, which is another interesting area where I would like to see if they can add new value over and above automation available today.

Overall a very promising direction. You can learn more about ChipAgents HERE.



Siemens EDA Unveils Groundbreaking Tools to Simplify 3D IC Design and Analysis
by Kalar Rajendiran on 07-01-2025 at 10:00 am


In a major announcement at the 2025 Design Automation Conference (DAC), Siemens EDA introduced a significant expansion to its electronic design automation (EDA) portfolio, aimed at transforming how engineers design, validate, and manage the complexity of next-generation three-dimensional integrated circuits (3D ICs). With the launch of the Innovator3D IC solution suite and Calibre 3DStress, Siemens EDA delivers an end-to-end, multiphysics-driven environment aimed at tackling the inherent challenges of heterogeneous, multi-die integration.

I chatted with Keith Felton, Principal Technical Product Manager for Siemens EDA’s semiconductor packaging solutions, and Shetha Nolke, Principal Product Manager for its Calibre 3DStress tool, to gain additional insights.

3D ICs and the Need for Coherent Design Platforms

As the semiconductor industry increasingly pivots toward chiplet-based and 3D packaging architectures, traditional tools for physical design, thermal analysis, and data management are proving insufficient. The benefits of 3D ICs—ranging from higher performance and energy efficiency to reduced footprint and modular reuse—come with trade-offs in integration complexity and packaging-induced reliability challenges.

Siemens EDA’s newly introduced technologies aim to provide a unified, scalable solution to this new design paradigm. By embedding intelligent simulation, AI-powered layout assistance, and full-stack thermal-mechanical analysis, the company seeks to remove major roadblocks that have traditionally hampered development of these advanced systems.

Innovator3D IC: A Unified Platform for 3D System Design

At the core of Siemens EDA’s announcement is the Innovator3D IC, a modular suite purpose-built for design teams developing heterogeneously integrated 2.5D and 3D chip architectures. Unlike traditional EDA tools that treat multi-die systems as loosely connected components, Innovator3D IC provides a centralized, coherent framework that enables comprehensive design authoring, system-level simulation, and rigorous interface management.

The suite is anchored by Integrator, a cockpit that allows designers to visualize and configure all physical, electrical and thermal characteristics of a 3D IC system stack. With Integrator, engineering teams can manage die-to-die placement, define vertical interconnects, and simulate package-aware system behavior within a single interactive workspace. This enables faster iteration and earlier identification of system bottlenecks.

Complementing this is the Layout Solution, a correct-by-construction environment purpose-built for interposer design and silicon bridge planning. This tool enables designers to author highly constrained routing between chiplets and substrates while respecting package-level constraints and thermal zones.

Another key component is the Protocol Analyzer, which ensures high-speed chip-to-chip interfaces conform to expected protocol standards and signal integrity thresholds. It plays a critical role in verifying that chiplet communication pathways meet stringent electrical and timing requirements early in the design phase.

Finally, the suite includes a robust Data Management layer, which consolidates all relevant design metadata, IP reuse information, and interface specifications. This unified data model supports traceability and revision control across multi-team projects, reducing errors and accelerating design closure.

What distinguishes Innovator3D IC from legacy tools is its scalability and performance. Built with advanced multithreading and powered by AI-assisted layout guidance, the suite can efficiently handle designs featuring over five million pins—meeting the demands of the most complex AI, mobile, and high-performance computing (HPC) architectures.

Calibre 3DStress: Predictive Reliability at the Transistor Level

To complement its Innovator3D IC suite, Siemens EDA also introduced Calibre 3DStress, a first-of-its-kind tool that brings transistor-level resolution to thermo-mechanical analysis. Unlike traditional stress modeling tools that evaluate packaging effects at a die-level scale, Calibre 3DStress provides granular insights into how physical stresses impact individual transistors and circuit blocks.

This capability is essential in modern 3D ICs, where materials with differing coefficients of thermal expansion—across dies, interposers, and substrates—can induce localized strain during manufacturing and operation. These stress-induced deformations can subtly alter transistor behavior, shift threshold voltages, and compromise timing margins, ultimately threatening circuit-level performance even if all nominal design rules are met.

Calibre 3DStress allows engineering teams to simulate such effects with high fidelity before a single chip is fabricated. By modeling post-reflow and operational mechanical stresses and correlating them with electrical performance metrics, the tool enables designers to verify that neither the packaging process nor the final use environment will degrade circuit behavior or long-term product reliability. This simulation engine is fully integrated with Siemens EDA’s existing Calibre 3DThermal technology, allowing for comprehensive multiphysics analysis that spans thermal profiles, structural deformation, and transistor behavior within a unified verification flow.

Positioned for the Future of System Design

Together, Innovator3D IC and Calibre 3DStress represent a strategic leap for Siemens EDA and its customers. These solutions realign the entire development process around the realities of modern system integration. As semiconductor companies embrace chiplet reuse, heterogeneous architectures, and rapid design cycles, the ability to plan, simulate, and verify at scale will be critical. By combining intelligent design orchestration with predictive, physics-based verification, Siemens EDA has positioned itself as a catalyst for the next wave of semiconductor innovation.

To learn more about Siemens EDA’s broad portfolio of solutions for 3D IC architectures, visit here.

 

 



CEO Interview with Faraj Aalaei of Cognichip
by Daniel Nenni on 07-01-2025 at 8:00 am


Faraj Aalaei is a successful visionary entrepreneur with over 40 years of distinguished experience in communications and networking technologies. As a leading entrepreneur in Silicon Valley, Faraj was responsible for building and leading two semiconductor companies through IPOs as a founder and CEO.

Post acquisition of Aquantia by Marvell Technology, Faraj was responsible for Marvell’s billion-dollar Networking and Automotive segment. Prior to his successful entrepreneurial journeys at Aquantia and Centillium Communications, he worked at AT&T Bell Laboratories, Fujitsu Network Communications and M/A-COM.

Tell us about your company?

Cognichip represents a new era in semiconductors — one that makes chip design easier, faster, and more accessible. Our mission is to reshape the economics of the sector and elevate intelligence as our new scaling factor.

If you look back to the early 2000s, there were nearly 200 VC-funded semiconductor startups launching each year in the US. By 2015, the number had dwindled to just a few. AI brought some revitalization to the sector, yet since 2015 many AI hardware startups have struggled to scale and IPOs remain rare.

Now, I am a two-time semiconductor IPO CEO with 40+ years of experience and led Marvell’s billion dollar Networking and Automotive segment. I’ve also invested in over 40 companies in recent years as a VC – none of them in semiconductors.

Why? These days, it takes $200M+ to build a semiconductor company. It takes a long time to get to revenue ramp and most importantly from an investor perspective, you’re in the deal with over half of that before you know if you even have something. Most VCs stay away because it’s hard to make money in a market like that, especially early VC investors who are typically the experts in that vertical.

I founded Cognichip to change all that – with a team of experts from Amazon, Google, Apple, Synopsys, Aquantia, and KLA with deep expertise in semiconductors, AI, and technology development. Last year, we raised $33M in investment from Mayfield, Lux Capital, FPV, and Candou Ventures to create the first Artificial Chip Intelligence (ACI®) for designing semiconductors.

Artificial Chip Intelligence – What is that? What problems are you solving?

ACI® is to the semiconductor industry what AGI is to the broader AI space – a type of artificial intelligence that understands, learns, and applies knowledge across a wide range of chip design tasks, with designer-like cognitive abilities.

Fundamentally, Cognichip aims to solve two major challenges in chip design: reducing development costs and reducing time to market, which will lead to democratizing access to chip design at large scale.

Chip design is slow, expensive, and highly specialized. It can take 3-5 years to go from design to production, introducing significant risk, inefficiency, and high energy consumption. The whole process requires over $100M in capital, and is often limited to regions where a broad set of chip experts in knowledge verticals are available. Over 87% of headcount budgets for semiconductor companies are spent on core experts in design, verification and bring up.

Additionally, access to talent has become exceedingly difficult. It’s no secret that the semiconductor industry faces a severe talent shortage, with an estimated need for 1 million more engineers by 2030. If you look at the last 25 years, the industry has grown in lock-step with its workforce – roughly 7% CAGR. Traditional scaling methods rely on increasing the workforce, but that is no longer sustainable.

With the rise of AI, we have a unique opportunity to modernize semiconductor development—making it faster, more cost-effective, and accessible to a broader range of innovators. Without this transformation, the U.S. semiconductor industry risks falling further behind.

What application areas are your strongest?

We are focusing on fundamentally rethinking the traditional “waterfall” model of chip design. It is the result of 1990s CAD methodologies that require “perfect” requirements and make it very difficult and expensive to react to errors and changes. Cognichip’s Artificial Chip Intelligence will actually flip this model to a compute-centric environment where design abstractions evolve concurrently and requirements are dynamically updated and tested. This shift will bring profound change on architectural exploration, microarchitecture definition, RTL design, verification and more.

This fundamental rethink will help:
Large companies improve efficiency and “do more with less”
Mid-size companies enter new markets currently inaccessible due to capital or expertise constraints
Startups and even individual innovators imagine how to bring a chip into their niche market, effectively democratizing this important industry

What keeps your customers up at night?

There are four constraints on the semiconductor sector, all arising from the fact that it takes 2-3 years from concept to first samples, and another 12-18 months to reach production with a new chip:
1. Long capital commitments – Spending $100M+ before reaching production
2. Resource constraints – Dependence on multiple, and often narrow, areas of expertise (e.g. DSP, switching, processors), while combating the global shortage of engineers
3. Market-fit risk – Anticipating market shifts introduces excessive chip bloating, adding size and power
4. Rigid supply chains – Complex chips are designed for specific manufacturers, only possible in limited geographies

Cognichip’s ACI® will alleviate these constraints, allowing teams to design faster and smarter – even without deep semiconductor experience.

What does the competitive landscape look like and how do you differentiate?

AI offers exciting opportunities and we do see a growing ecosystem of AI-enabled EDA tools, new startups adding their own pieces of the puzzle, and AI-savvy customers, such as hyperscalers, that are applying their deep know-how to chip design steps that are closely tied to their architectures.

At Cognichip, we’re carving out a new lane, with our aspiration to achieve ACI®. We are not looking at incrementally improving these existing spaces, rather we are driving towards an “OpenAI-like” moment for semiconductors that will set a new industry standard.

Which brings me to yet another critical differentiator: talent. Scientists and top engineers are interested in working on hard problems. That’s what we get satisfaction from. We have hired Olympiad gold-medal winners in mathematics and physics, as well as veterans in the chip and software industries. Our management team are experts in the end market, customers, and pain points. We have a market-proven track record of focused execution and success. We have also been fortunate enough to get the highest-quality investors backing and advising the company.

What new features/technology are you working on?

While we’re not announcing product details or timelines at this time, our ACI® technology is built on a first-of-its-kind physics-informed foundation model, purposely built for chip design tasks. It combines physics and logic training, enabling deep navigation of the design search spaces. ACI® transforms serial, human-centric design processes to concurrent, massively parallel, autonomous workflows, enabling today’s engineers to become architects while the computer becomes the designer.

How do customers normally engage with your company?

We just launched out of stealth in May and we have been working on our technology innovations for over a year. We are engaging with leaders that align with our mission to make design easier, less costly, and more accessible. We know there is a lot of interest, and we’ll share more when the time is right. Stay tuned!

Contact Cognichip

Also Read:

Rethink Scoreboards to Supercharge AI-Era CPUs

Rethink Scoreboards to Supercharge AI-Era CPUs
by Admin on 07-01-2025 at 6:00 am


By Dr. Thang Minh Tran, CEO/CTO Simplex Micro

Today’s AI accelerators—whether built for massive data centers or low-power edge devices—face a common set of challenges: deep pipelines, complex data dependencies, and the high cost of speculative execution. These same concerns have long been familiar in high-frequency microprocessor design, where engineers must constantly balance performance with correctness. The deeper the pipeline, the greater the opportunity for instruction-level parallelism—but also the higher the risk of pipeline hazards, particularly read-after-write (RAW) dependencies.

Conventional scoreboard architectures, introduced in the 1970s and refined during the superscalar boom of the 1990s, provided only a partial fix. While functional, they struggled to scale with the growing complexity of modern pipelines. Each additional stage or execution lane increased the number of operand comparisons exponentially, introducing delays that made high clock rates harder to maintain.

The core function of a scoreboard—determining whether an instruction can safely issue—requires comparing destination operands of in-flight instructions with the source operands of instructions waiting to issue. In deep or wide pipelines, this logic quickly becomes a combinatorial challenge. The question I set out to solve was: could we accurately model operand timing without relying on complex associative lookups or speculative mechanisms?

At the time I developed the dual-row scoreboard, the goal was to support deterministic timing in wireless baseband chips, where real-time guarantees were essential and energy budgets tight. But over time, the architecture proved broadly applicable. Today’s workloads, particularly AI inference engines, often manage thousands of simultaneous operations. In these domains, traditional speculative methods—such as out-of-order execution—can introduce energy costs and verification complexity that are unacceptable in real-time or edge deployments.

My approach took a different path—one built on predictability and efficiency. I developed a dual-row scoreboard architecture that reimagines the traditional model with cycle-accurate timing and shift-register-based tracking, eliminating speculation while scaling to modern AI workloads. It split timing logic into two synchronized yet independent shift-register structures per architectural register, ensuring precise instruction scheduling without speculative overhead.

Scoreboard Mechanics – A Shift-Register Approach

Think of the dual-row scoreboard like a conveyor belt system. Each register has two tracks. The upper track monitors where the data is in the pipeline; the lower track monitors when it will be ready. Every clock cycle, the markers on these belts move one step—advancing the timeline of each instruction.

Forwarding Tracker – The Upper Row

This row operates as a shift register that moves a singleton “1” across pipeline stages, precisely tracking the position of an instruction that will generate a result. This enables forwarding without directly accessing the register file.

Issue Eligibility Tracker – The Lower Row

The second row independently tracks when a result will be available, using a string of “1”s starting from the earliest stage of availability. If a dependent instruction requires the data before it’s ready, issue is stalled. Otherwise, it proceeds immediately.

By comparing operand readiness with execution timing, the scoreboard makes a straightforward issue decision using the equation:

D = (EA – E) – EN + 1

Where:

  • E is the current stage of the producer instruction
  • EA is the stage where the result first becomes available
  • EN is the stage where the consumer will first need it

If D ≤ 0, the dependent instruction can issue safely. If D > 0, it must wait.

For example, suppose a result becomes available at EA = E3, the producer is currently at stage E2, and the consumer needs it at EN = E2. Then: D = (3 – 2) – 2 + 1 = 0 → the instruction can issue immediately. This simple arithmetic ensures deterministic execution timing, making the architecture scalable and efficient.
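As a minimal sketch (my own toy abstraction, not the patented scoreboard logic), the issue decision above can be expressed directly in a few lines of Python, reproducing the worked example:

```python
def can_issue(producer_stage, result_available_stage, consumer_need_stage):
    """Issue test from the article:  D = (EA - E) - EN + 1
      E  = current stage of the producer instruction
      EA = stage where the result first becomes available
      EN = stage where the consumer will first need it
    Issue is safe when D <= 0; otherwise the consumer must wait D cycles."""
    d = (result_available_stage - producer_stage) - consumer_need_stage + 1
    return d <= 0, d


# Worked example from the text: EA = 3, E = 2, EN = 2 -> D = 0, issue now.
ok, d = can_issue(producer_stage=2, result_available_stage=3, consumer_need_stage=2)
print(ok, d)   # True 0
```

In hardware this arithmetic never appears explicitly; the relative stage positions are encoded in the two shift-register rows, so the comparison reduces to simple bit logic that shifts with the pipeline clock.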

Integration and Implementation

Each architectural register gets its own scoreboard “page,” which contains both the upper and lower rows. The scoreboard is thus a sparse, distributed structure—conceptually a 3D array indexed by register name (depth), pipeline stage (column), and logic type (upper vs. lower row). Because both rows shift synchronously with the pipeline clock, no multi-cycle arbitration or stall propagation is necessary.

The register file itself is simplified, because many operands never reach it. Data forwarding allows results to skip the register file entirely if they are consumed soon after being produced. This has both power and area benefits, particularly in small-process nodes where register file write ports are expensive.

Why This Still Matters Today

I built my architecture to solve a brutally specific problem: how to guarantee real-time execution in wireless modems where failure wasn’t an option. First rolled out in TI’s OMAP 1710, my design didn’t just power the main ARM+DSP combo—it shaped the dedicated modem pipeline supporting GSM, GPRS, and UMTS.

In the modem path, missing a deadline meant dropped packets—not just annoying like a lost video frame, but mission-critical. So I focused on predictable latency, tightly scoped memory, and structured task flow. That blueprint—born in the modem—now finds new life in AI and edge silicon, where power constraints demand the same kind of disciplined, deterministic execution.

For power-constrained environments like edge AI devices, speculative execution poses a unique challenge: wasted power cycles from mispredicted instructions can quickly drain energy budgets. AI inference workloads often handle thousands of parallel operations, and unnecessary speculation forces compute units to spend power executing instructions that will ultimately be discarded. The dual-row scoreboard’s deterministic scheduling eliminates this problem, ensuring only necessary instructions are issued at precisely the right time, maximizing energy efficiency without sacrificing performance.

Moreover, in cases where the destination register is the same for both the producer and consumer instructions, the producer may not need to write back to the register file at all—saving even more power, particularly in small-process nodes where register file write ports are expensive.

This shift extends into the RISC-V ecosystem, where architects are exploring timing-transparent designs that avoid the baggage of speculative execution. Whether applied to AI inference, vector processors, or domain-specific accelerators, this approach provides robust hazard handling without sacrificing clarity, efficiency, or correctness.

Conclusion – A Shift in Architectural Thinking

For decades, microprocessor architects have balanced performance and correctness, navigating the challenges of deep pipelines and intricate instruction dependencies. Traditional out-of-order execution mechanisms rely on dynamic scheduling and reorder buffers to maximize performance by executing independent instructions as soon as possible, regardless of their original sequence. While effective at exploiting instruction-level parallelism, this approach introduces energy overhead, increased complexity, and verification challenges—especially in deep pipelines. The dual-row scoreboard, by contrast, provides precise, cycle-accurate timing without needing speculative reordering. Instead of reshuffling instructions unpredictably, it ensures availability before issuance, reducing control overhead while maintaining throughput.

In hindsight, the scoreboard isn’t just a control mechanism—it’s a new way to think about execution timing. Instead of predicting the future, it ensures the system meets it with precision—a principle that remains as relevant today as it did when it was first conceived. As modern computing moves toward more deterministic and power-efficient architectures, making time a first-class architectural concept is no longer just desirable—it’s essential.

Also Read:

Flynn Was Right: How a 2003 Warning Foretold Today’s Architectural Pivot

Voice as a Feature: A Silent Revolution in AI-Enabled SoCs

Feeding the Beast: The Real Cost of Speculative Execution in AI Data Centers

Predictive Load Handling: Solving a Quiet Bottleneck in Modern DSPs



CEO Interview with Dr. Noah Sturcken of Ferric
by Daniel Nenni on 06-30-2025 at 10:00 am


Noah Sturcken is a Founder and CEO of Ferric with over 40 patents issued and 15 publications on Integrated Voltage Regulators. Noah leads Ferric with focus on business development, marketing and new technology development. Noah previously worked at AMD R&D Lab where he developed Integrated Voltage Regulator (IVR) technology.

Tell us about your company

Ferric is a growth stage technology company that designs, manufactures, and sells chip-scale DC-DC power converters called Integrated Voltage Regulators (IVR) which are critical for powering high performance computers. Ferric’s line of IVR products are complete power conversion and regulation systems that can be placed especially close to a processor or even within processor packaging to provide significant reductions in system power consumption and area while enabling improved performance. Systems that employ Ferric’s IVRs realize 20%-70% reduction in solution footprint and bill of materials costs with a 10%-50% improvement in energy efficiency. Ferric’s IVR products are being adopted to power next generation Artificial Intelligence (AI) processors where Ferric’s market leading power density and efficiency provides a direct advantage in processor performance.

What problems are you solving?

The intense demand for high performance computing spurred by recent breakthroughs in AI has driven steep increases in datacenter power consumption. The latest generation of processors developed for AI training use liquid cooling and require more than 1000 Watts per processor, which conventional power supplies struggle to provide, resulting in inefficiency and loss of performance. Soon, processors will consume more than 2000 Watts each, further straining conventional power delivery solutions. In addition to performance issues, traditional power solutions take up vast amounts of PCB real estate. Doubling the size of power solutions as power demand doubles is untenable. Next-generation systems must integrate power with significantly better power density, system bandwidth and energy saving capability, which is only possible with IVR solutions such as Ferric’s.

What application areas are your strongest?

High-performance processor applications are among our strongest because of the urgent requirements for powering AI workloads. Our products can achieve a solution current density exceeding 4A/mm2 with conversion efficiency better than 94% and regulation bandwidth approaching 10MHz. The unique combination of density, conversion efficiency and regulation bandwidth available from Ferric’s IVRs allows high performance processors to receive more power with less waste. Other applications for Ferric’s IVRs include FPGAs and ASICs, which tend to have high supply counts and therefore realize dramatic reductions in board area by integrating voltage regulators into the package.

What keeps your customers up at night?

What keeps our customers up at night is the prospect that their processors will underperform because their power solution does not provide enough power when it’s needed. AI workloads are pushing processor power demands like never before and a company’s competitiveness may boil down to whether they can reliably and efficiently deliver enough power to their processors.

What does the competitive landscape look like and how do you differentiate?

Ferric’s technology team consists of experts who have been leading the integration of switched-inductor power converters with CMOS for the past 15 years. Ferric’s technology is 10x denser than the next closest option on the market and is readily available to customers in convenient, chip-scale IVR modules or through Ferric’s partnership with TSMC. Ferric’s patented magnetic inductor and power system technology enables a remarkable improvement in IVR efficiency and integration density, delivering a substantial advantage to Ferric’s customers.

What new features/technology are you working on?

Greater power density, higher conversion efficiency, faster regulation bandwidth and better power integration options. High-performance computing systems are continuously pushing integration and power levels, so we must do the same with our IVRs. We are accomplishing this by driving our designs and magnetic composites even further while working closely with our customers to integrate our products with their systems.

How do customers normally engage with your company?

Similar to other power module vendors, we provide datasheets, models, evaluation boards and samples to facilitate our customers’ evaluation and adoption of Ferric’s IVRs. Our applications team provides direct support to customers as they progress through evaluation, adoption and production. We are experienced in supporting our customers in a wide variety of ways. In addition to supporting our devices, we do power integrity analysis for customer systems, assist with layout and integration schemes, offer thermal solution options, and support numerous integration methods, ranging from PCB attach to co-packaging with the processor.

Contact Ferric

Also Read:

CEO Interview with Vamshi Kothur, of Tuple Technologies

CEO Interview with Yannick Bedin of Eumetrys

The Sondrel transformation to Aion Silicon!



Jitter: The Overlooked PDN Quality Metric
by Admin on 06-30-2025 at 6:00 am


The most common way to evaluate a power distribution network is to look at its impedance over the effective frequency range. A lower impedance will produce less noise when transient current is demanded by the IC output buffers. However, this transient current needs to be provided at the same time for each transition or jitter will be produced that will limit the maximum operating speed of the interface.

While not typically evaluated in PDN design, jitter can have a significant effect on the timing margins on single-ended nets found in interfaces such as double data rate (DDR) memory, limiting the maximum operating speed.

Jitter provides a metric for evaluating the quality of a PDN, since reducing it can improve the performance of a data-driven interface. In this article we describe a simulation methodology to automatically measure the jitter caused by a PDN and use the results to evaluate the quality and effectiveness of the PDN decoupling.

The first step in measuring the jitter induced by a PDN is to create a good electrical model of the PDN that captures all its electrical characteristics, such as the frequency response of the decoupling capacitors, the mounting inductance of the ICs, and the spreading inductance of the planes. This can be done with a 3D electro-magnetic field solver, which is a hybrid of a full-wave solver, with simplifications to support a large power network structure. This type of model is often an S-parameter model with ports at the IC of interest and at the VRM connection.

The PDN model is then placed in a schematic and connected to a VRM model at the input and a driver current model as a load. The VRM model should be a simplified representation of the output impedance of the VRM, covering a range of frequencies below where the main decoupling capacitors are effective. The driver current model produces a linearly increasing current with time.  The current waveform is based on a pseudo-random bit sequence so that a variety of frequencies are covered.

Finally, we need to measure the jitter produced by the varying driver current.  We will use the VHDL-AMS behavioral language to create a model that can measure the jitter between the driver current waveform and the resulting waveform produced at the output of the PDN.  The model will keep track of the largest jitter, as well as the generated voltage noise, and report that in the output waveforms.
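The white paper’s measurement model is written in VHDL-AMS; for readers who prefer to post-process exported waveforms, the sketch below shows an equivalent calculation in Python. It is an illustration under stated assumptions (NumPy arrays of time and voltage samples, a single crossing threshold), not the tool’s actual model.

```python
import numpy as np


def threshold_crossings(t, v, threshold, rising=True):
    """Linearly interpolated times at which waveform v crosses `threshold`."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    above = v >= threshold
    idx = np.where(above[1:] != above[:-1])[0]          # sample pairs that cross
    idx = idx[~above[idx]] if rising else idx[above[idx]]
    frac = (threshold - v[idx]) / (v[idx + 1] - v[idx])  # interpolate within pair
    return t[idx] + frac * (t[idx + 1] - t[idx])


def max_edge_jitter(t, ideal_wave, pdn_wave, threshold):
    """Worst-case displacement of the PDN-degraded edges vs. the ideal edges."""
    ref = threshold_crossings(t, ideal_wave, threshold)
    meas = threshold_crossings(t, pdn_wave, threshold)
    n = min(len(ref), len(meas))
    return float(np.max(np.abs(meas[:n] - ref[:n])))
```

Whether implemented in VHDL-AMS inside the testbench or as post-processing like this, the key point is the same: jitter is measured edge by edge against the ideal stimulus, and the worst case is tracked over the whole pattern.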

Interpreting PDN Performance

Once the testbench has been created, it is easy to substitute various PDN models and then quickly determine how much jitter each PDN implementation introduces. You can add or remove decoupling capacitors and see what the impact would be. You can also experiment with different capacitor values to see which combination is best.

One of the challenges with setting up the simulation is determining what the data rate and edge rate should be when the stimulus is directly connected to the PDN. In the real design, the IC has additional decoupling due to the package and die capacitance. We could add that to our model, but that information is often hard to come by. As a compromise, we will assume that for our DDR4 power net example (1.2 V), the edge rate is slowed to 200 ps by the package and die. For the data rate we will use 4 ns per UI, which corresponds to a 125 MHz Nyquist frequency. This is near the upper frequency limit in which we expect the PDN to be effective. The PRBS stimulus will then produce 4 ns transitions and many sub-harmonics, stimulating the PDN at a variety of frequencies.
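For reference, a PRBS9 pattern like the one used as stimulus can be generated with a simple 9-bit LFSR (polynomial x^9 + x^5 + 1). The sketch below only produces the bit sequence; mapping bits to current steps with a 200 ps edge and 4 ns UI, and driving the PDN model, is left to the circuit simulator.

```python
def prbs9(n_bits, seed=0x1FF):
    """Generate n_bits of a PRBS9 sequence (polynomial x^9 + x^5 + 1)."""
    state = seed & 0x1FF              # 9-bit shift register, must be non-zero
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 8) ^ (state >> 4)) & 1   # taps at stages 9 and 5
        bits.append(state & 1)
        state = ((state << 1) | new_bit) & 0x1FF
    return bits


# One full period of PRBS9 is 511 bits; at 4 ns per UI that is ~2 us of stimulus.
pattern = prbs9(511)
assert len(pattern) == 511
```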

Figure 1 shows the maximum jitter (maxJitter), the skew per edge transition, and the PRBS9 data pattern. After about 1.0 µs, the jitter does not increase significantly for the applied data pattern. The maximum jitter is shown to be about 6.6 ps for both the rising and falling edges.

Figure 1 – Accumulated jitter from PRBS9 data pattern.

We can also display the noise generated at the BGA pins (blue) caused by the stimulus (red), shown in Figure 2.

Figure 2 – Noise generated at BGA pins because of 1V stimulus pattern.

We can now use this technique to compare multiple PDNs to see how they perform. First, we extract the frequency domain models for three decoupling configurations and look at the Z-parameter (impedance versus frequency) plots, as shown in Figure 3. The green plot is the actual decoupling used on the 0.85 V power in a working design. The blue plot is the impedance with all the 100 µF and 4.7 µF caps removed, and the red plot is the optimized impedance profile, which has a higher impedance magnitude but a smoother and flatter impedance profile.

Figure 3 – Three example PDNs plotted as impedance versus frequency.

In Figure 4, we compare the jitter produced by the three different PDNs, and we see that the lowest jitter (5.6 ps) comes from the optimized PDN (red), which has the flattest impedance curve. The next lowest jitter (7.5 ps) is from the actual PDN as designed (green). When we remove some capacitors but keep a similar profile, the impedance and the jitter (8.5 ps) both go up (blue).

Looking at the noise amplitude as a percentage of the signal (maxv_percent), we see a direct correlation between the impedance and the noise induced by the PDN, as expected. If we look at impedance as the only quality metric for the PDN, we might conclude that the lowest impedance PDN has the best performance. However, we see that while the noise amplitude is lower, the jitter is higher.

The PDN was optimized by selecting capacitors that just met a flat impedance profile. This flatter profile also has a more consistent phase shift over the frequency range, so the edges for all data transitions tend to be more aligned and thus produce less jitter.

You may have wondered whether a flat impedance profile is better than a so-called “deep V” profile, where most of the caps have the same value. In this case, it appears that the flatter profile produces better jitter performance, which may be an important consideration for output data switching signals.

Figure 4 – Maximum jitter and noise voltage percentage on three example PDNs.

So, the next time you are thinking about how robust your PDN is, consider how well it can supply current at the right time across all frequencies. PDN induced jitter is another factor that can limit a high-speed design’s performance.

Please download the full white paper, Evaluating a PDN based on jitter, to learn more about this methodology and to see how adding capacitors can sometimes create even more jitter.

Bruce Caryl is a Product Specialist focused on board analysis products, including signal and power integrity, and mixed signal analysis. Prior to this role, he worked in product and technical marketing, consulting, and applications engineering. Bruce has authored several white papers on analysis techniques based on the needs of his customers. He began his career as a design engineer where he did ASIC and board design.

Also Read:

DAC News – A New Era of Electronic Design Begins with Siemens EDA AI

Podcast EP293: 3DIC Progress and What’s Coming at DAC with Dr. John Ferguson and Kevin Rinebold of Siemens EDA

Siemens EDA Outlines Strategic Direction for an AI-Powered, Software-Defined, Silicon-Enabled Future



Facing the Quantum Nature of EUV Lithography
by Fred Chen on 06-29-2025 at 8:00 am

Absorbed Photons Exposing EUV

I have been examining the topics of stochastics and blur in EUV lithography for quite some time now [1,2], but I am happy to see that others are pursuing this direction seriously as well [3]. As advanced node half-pitch dimensions approach 10 nm and smaller, the size of molecules in the resist becomes impossible to ignore for adequate modeling [3,4]. In other words, EUV lithography must face its quantum nature.

Table 1 compares edge dose fluctuations for key cases for DUV, Low-NA EUV, and High-NA EUV lithography [5]. While DUV at a standard dose of 30 mJ/cm2 shows no significant dose fluctuations down to the smallest practical half-pitch, EUV, even at 60 mJ/cm2, shows much greater than 50% fluctuation (3 sigma). Such a large dose fluctuation is expected to result in severe edge placement error, leading to roughness, linewidth errors, and/or feature placement errors. The key aggravating factors are: (1) an EUV photon has ~14 times the energy of a DUV photon, so the photon density is much lower even at double the dose; (2) resist thickness scales with pitch, to avoid large aspect ratios for patterning, leading to reduced absorption; (3) EUV resists targeted for higher resolution would have smaller molecular sizes, leading to a smaller photon collection area.


Table 1. Half-pitch edge dose fluctuations within a molecular pixel for DUV, Low-NA EUV, and High-NA EUV. An incident dose of 30 mJ/cm2 is assumed for DUV, 60 mJ/cm2 for EUV.

The photon absorption is not the final step in the resist exposure. An EUV photon will produce a photoelectron which then proceeds to migrate and produce more electrons, known as secondary electrons [2]. These electrons can in fact migrate distances greater than the molecular size [3]. As a result, the reaction of a molecule at a given location can be affected by the electrons resulting from absorption of photons at different locations, perhaps even several nanometers away [1].

Thus, the modeling needs to be addressed in stages. First, the absorption of EUV photons within the size of a molecule (~ 2nm [3,4]) needs to be addressed. Then, the effect of EUV absorption at different locations producing (random numbers of) secondary electrons all of which affect a given exposure location must also be taken into account. For chemically amplified resists, the acid blur should also be included for comprehensive modeling.

As a reference case, we will examine the 40 nm pitch (20 nm L/S binary grating) with a 0.33 NA EUV system. I’ll assume a chemically amplified resist with 40 nm thickness and absorption coefficient 5/um. Figure 1 shows the photon absorption density plot (top view), using 2 nm as the molecular pixel size, also representing the molecular size for the EUV resist. A 60 mJ/cm2 incident dose is assumed, which would result in 11 mJ/cm2 absorbed, averaged over the pitch. A threshold of 31 absorbed photons/nm2 would nominally correspond to the half-pitch linewidth.

Figure 1. Absorbed photon density in 40 nm thick EUV resist with absorption coefficient 5/um and incident dose 60 mJ/cm2 (averaged over pitch) for imaging a 20 nm L/S binary grating. The pixel size is 2 nm.
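To illustrate the kind of shot-noise statistics behind Figure 1, the sketch below Poisson-samples absorbed photon counts in 2 nm pixels under an idealized sinusoidal aerial image for the 40 nm pitch grating. It is my own simplification, not the author’s model; the peak areal photon density is an illustrative placeholder, not derived from the 60 mJ/cm2 dose above.

```python
import numpy as np

rng = np.random.default_rng(0)

PIXEL_NM = 2.0               # molecular pixel size used in the article
PITCH_NM = 40.0              # 20 nm line/space binary grating
PEAK_PHOTONS_PER_NM2 = 15.0  # illustrative value only, not taken from the article


def absorbed_photon_map(width_nm=80.0, height_nm=80.0):
    """Poisson-sample absorbed EUV photons per 2 nm pixel (top view) under an
    idealized sinusoidal aerial image for the 40 nm pitch grating."""
    nx = int(width_nm / PIXEL_NM)
    ny = int(height_nm / PIXEL_NM)
    x = (np.arange(nx) + 0.5) * PIXEL_NM
    # Idealized two-beam image: intensity varies sinusoidally across the pitch.
    intensity = 0.5 * (1.0 + np.cos(2.0 * np.pi * x / PITCH_NM))
    mean_per_pixel = PEAK_PHOTONS_PER_NM2 * PIXEL_NM**2 * intensity
    # Same mean profile for every row; shot noise differs pixel to pixel.
    return rng.poisson(np.tile(mean_per_pixel, (ny, 1)))


counts = absorbed_photon_map()
print(counts.shape, counts.mean())
```

Even with a perfectly smooth aerial image, the Poisson sampling alone produces the pixel-to-pixel graininess that the article describes.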

Each photon is expected to produce up to 9 electrons [1], but can also be less. These electrons are not produced all at once but entail some electron migration. Thus, the electron blur parameter is often used to characterize this phenomenon. Contrary to the common assumption, we should not expect this to be uniform throughout the resist [1,6] due to the density inhomogeneity of the resist itself. Thus, a random number generator can be used to simulate the local electron blur parameter (Figure 2).

Figure 2. Electron blur is generated as a random number to represent local variation, due to natural material inhomogeneity.

The blur parameter actually describes the probability of finding an electron that has migrated a given distance. For the current example, we use the probability function shape shown in Figure 3, resulting from the difference of two exponential functions [7]. By convolving the local electron blur probability function with the absorbed photon density, then multiplying by the electron yield (taken to be a random number between 5 and 9), we can see the expected migrated electron density (Figure 4). Owing to the extra randomness of the electron yield added to the Poisson noise from photon absorption, the electron density plot includes enhanced non-uniformity.

Figure 3. The shapes for the electron blur distance probability functions used in the modeling for this article. The shapes are generated from the difference of two exponential functions, one with the labeled long-range decay length (1-5 nm), and one with 0.2 nm decay length, so that the probability is zero at zero distance.

Figure 4. The migrated electron density is obtained by convolving the absorbed photon density of Figure 1 with the local electron blur probability functions (up to 8 nearest neighbor pixels) from Figures 2 and 3, then multiplying by the local random electron yield.
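Continuing the toy model from the previous sketch, the following convolves the sampled photon counts with a difference-of-exponentials blur kernel and multiplies by a random per-pixel electron yield between 5 and 9, roughly following the description above. It uses a single global blur length for simplicity, whereas the article varies the blur locally (Figure 2); SciPy is assumed for the convolution, and `counts` and PIXEL_NM come from the earlier sketch.

```python
from scipy.signal import fftconvolve

yield_rng = np.random.default_rng(1)


def blur_kernel(decay_long_nm=2.0, decay_short_nm=0.2, radius_px=4):
    """Radial kernel from the difference of two exponentials, so the
    probability of zero migration distance is zero (cf. Figure 3)."""
    ax = np.arange(-radius_px, radius_px + 1) * PIXEL_NM
    xx, yy = np.meshgrid(ax, ax)
    r = np.hypot(xx, yy)
    k = np.exp(-r / decay_long_nm) - np.exp(-r / decay_short_nm)
    k[k < 0] = 0.0
    return k / k.sum()


def migrated_electron_density(photon_counts, decay_long_nm=2.0):
    """Convolve absorbed photons with the blur kernel, then multiply by a
    random per-pixel electron yield between 5 and 9 (as assumed in the text)."""
    blurred = fftconvolve(photon_counts.astype(float),
                          blur_kernel(decay_long_nm), mode="same")
    yield_map = yield_rng.integers(5, 10, size=photon_counts.shape)  # 5..9
    return blurred * yield_map


electrons = migrated_electron_density(counts)  # `counts` from the previous sketch
```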

Since the resist is a chemically amplified resist, acids are produced by the electrons. These acids finally render resist molecules dissolvable in developer; this is known as deprotection [8]. The acids also diffuse, with the corresponding acid blur probability function taken to be a Gaussian. Figure 5 shows the final acid density plot after convolving a sigma=2.5 nm Gaussian acid blur probability function with the electron density. There is a smoothing effect from the acid blur, so some of the noise from the electron density plot seems to have diminished. However, there is still density variation that remains, since the seed of the acid generation is still random, i.e., the local electron density. Thus, the edge placement error is easily 10% of the linewidth.

Figure 5. The acid density is obtained by convolving a 2.5 nm sigma Gaussian acid blur probability function with the migrated electron density of Figure 4 (in a 5 x 5 pixel area).
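The final acid-blur step of the toy model can be approximated with a Gaussian filter of sigma 2.5 nm (1.25 pixels), truncated to roughly the 5 x 5 pixel support used in the article; again this is an illustrative sketch, not the author’s code, and it reuses `electrons` and PIXEL_NM from the previous sketches.

```python
from scipy.ndimage import gaussian_filter

ACID_BLUR_SIGMA_NM = 2.5
# truncate=1.6 gives a 2-pixel kernel radius, i.e. roughly the 5 x 5 pixel
# area over which the article applies the acid blur.
acid_density = gaussian_filter(electrons,
                               sigma=ACID_BLUR_SIGMA_NM / PIXEL_NM,
                               truncate=1.6)
```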

Smoothing with a larger sigma Gaussian would lead to further diminishing of the acid deprotection gradient; this would actually increase sensitivity to the level of deprotection, i.e., a smaller change in dose could wipe out the whole feature (Figure 6).

Figure 6. Smoothing by Gaussian blur with a larger sigma results in a less steep deprotection gradient (orange), which is more susceptible to wiping out features with dose changes. A smaller sigma would be less susceptible (blue).

The stochastic edge fluctuations begin to consume the whole feature as the exposed or unexposed linewidth shrinks (Figure 7). Essentially, the full-fledged stochastic defectivity shows up.

Figure 7. (Left) 10 nm exposed line on 40 nm pitch; (right) 10 nm unexposed line on 40 nm pitch, under same resist exposure conditions as Figure 1.

This level of edge fluctuation is a specific feature of EUV. The fundamental way to counteract this edge dose fluctuation is to increase the dose. To reduce it 10x, the dose needs to be increased 100x. Referring back to Table 1, to get the 3 sigma fluctuation down to 7%, the dose needs to be 6000 mJ/cm2! That is clearly not feasible.

An alternative is to use double patterning, since doubling the pitch would enable 4-beam imaging, which can have higher normalized image log slope (NILS) than 2-beam imaging [9,10]. The dose would not have to be elevated as much. On the other hand, the 80 nm pitch exposure for double patterning is done more cost-effectively and quickly by DUV instead of EUV.

Moreover, 20 nm linewidths on 80 nm pitch with DUV 2-beam imaging at 30 mJ/cm2 look slightly smoother than optimized EUV 4-beam imaging at 60 mJ/cm2 (Figure 8). Besides the higher absorbed photon density per molecular pixel, there is no random electron yield component for DUV. Thus, DUV avoided the “perfect storm” that befell EUV [11], as it met its classical optical resolution limit before even approaching the molecular quantum limit.

Figure 8. (Top left) 20 nm DUV exposed line on 80 nm pitch; (top right) 20 nm EUV exposed line on 80 nm pitch; (bottom left) 20 nm DUV unexposed line on 80 nm pitch; (bottom right) 20 nm EUV unexposed line on 80 nm pitch. The DUV (3/um) and EUV (5/um) chemically amplified resist thicknesses are both 40 nm.


References:

[1] F. Chen, Impact of Varying Electron Blur and Yield on Stochastic Fluctuations in EUV Resist;

[2] F. Chen, Stochastic Effects Blur the Resolution Limit of EUV Lithography;

[3] H. Fukuda, “Statistics of EUV exposed nanopatterns: Photons to molecular dissolutions.” J. Appl. Phys. 137, 204902 (2025), and references therein; https://doi.org/10.1063/5.0254984.

[4] M. M. Sung et al., “Vertically tailored hybrid multilayer EUV photoresist with vertical molecular wire structure,” Proc. SPIE PC12953, PC129530K (2024).

[5] F. Chen, Stochastic EUV Resist Exposure at Molecular Scale

[6] G. Denbeaux et al., “Understanding EUV resist stochastic effects through surface roughness measurements,” IEUVI Resist TWG meeting, February 23, 2020.

[7] F. Chen, A Realistic Electron Blur Function Shape for EUV Resist Modeling

[8] S. H. Kang et al., “Effect of copolymer composition on acid-catalyzed deprotection reaction kinetics in model photoresists,” Polymer 47, 6293-6302 (2006); doi:10.1016/j.polymer.2006.07.003.

[9] C. Zahlten et al., “EUV optics portfolio extended: first high-NA systems delivered and showing excellent imaging results,” Proc. SPIE 13424, 134240Z (2025).

[10] F. Chen, High-NA Hard Sell: EUV Multipatterning Practices Revealed, Depth of Focus Not Mentioned

[11] F. Chen, A Perfect Storm for EUV Lithography



Podcast EP294: An Overview of the Momentum and Breadth of the RISC-V Movement with Andrea Gallo
by Daniel Nenni on 06-27-2025 at 10:00 am

Dan is joined by Andrea Gallo, CEO of RISC-V International, the non-profit home of the RISC-V instruction set architecture standard, related specifications, and stakeholder community. Prior to joining RISC-V International, Gallo worked in leadership roles at Linaro for over a decade. He built Linaro’s server engineering team from the ground up. He later managed the Linaro Datacenter and Cloud, Home, Mobile, Networking, IoT and Embedded Segment Groups and underlying open source collaborative projects, in addition to driving the company’s membership acquisition strategy as vice president of business development.

Andrea describes the current focus of RISC-V International and where he sees the standard having impact going forward. He details some recent events that have showcased the impact of RISC-V across many markets and applications around the world. Andrea describes the substantial impact RISC-V is having in mainstream and emerging markets. AI is clearly a key part of this, as are automotive and high-performance computing. He explains that some mainstream AI applications have shipped over 1 billion RISC-V cores. The growth of the movement and the breadth of its application are quite impressive.

Contact RISC-V

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.