Beyond Von Neumann: Toward a Unified Deterministic Architecture
by Admin on 09-04-2025 at 6:00 am

By Thang Tran

For more than half a century, the foundations of computing have stood on a single architectural paradigm: the Von Neumann model and its Harvard variant. Nearly all modern chips—CPUs, GPUs, and even many specialized accelerators—rely on some form of this design. Over time, the industry has layered on complexity and specialization to keep up with new demands. Very Long Instruction Word (VLIW) architectures, dataflow chips, and GPUs were each introduced as point solutions to specific bottlenecks, but none offered a holistic alternative. Until now.

Simplex Micro has developed what may be the most significant departure from the traditional paradigm in over half a century—enabling unified scalar, vector, and matrix compute in a single deterministic pipeline. At its core is a revolutionary concept: Predictive Execution. Unlike dynamic execution—which guesses what will happen next—Predictive Execution statically schedules each operation with cycle-level precision, transforming the processor into a deterministic machine with a known execution timeline. This allows a single chip architecture to handle both general-purpose tasks and high-throughput AI workloads without the need for separate accelerators.

The End of Guesswork

Deterministic execution eliminates the inefficiencies and vulnerabilities of dynamic execution. Instead of dynamically dispatching instructions and rolling back incorrect paths, Predictive Execution ensures that every instruction is issued at exactly the right time with the right resources. It’s not just more efficient—it’s predictable, scalable, and inherently more secure.

The breakthrough lies in what Simplex calls the Time-Resource Matrix: a novel patented scheduling mechanism that allocates compute, memory, and control resources across time. Each instruction has a designated time slot and access window, ensuring zero-overlap and eliminating pipeline stalls. Think of it as a train schedule—except the trains are scalar, vector, and matrix operations moving across a synchronized compute fabric.
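
To make the train-schedule analogy concrete, here is a minimal sketch of the idea in Python. This is my own toy model, not Simplex's patented design: the matrix is a grid of resources by cycles, and each instruction claims the earliest conflict-free window, so its issue cycle is fixed before execution ever begins.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    name: str
    resource: str   # execution unit required, e.g. "alu", "vector", "matrix"
    latency: int    # cycles the resource stays occupied

class TimeResourceMatrix:
    """Toy scheduler: rows are resources, columns are future cycles.

    Each instruction is placed in the earliest cycle where its resource
    is free for its whole latency window, so issue times are known at
    schedule time -- no dynamic dispatch, no rollback.
    """
    def __init__(self, resources, horizon=64):
        self.horizon = horizon
        # matrix[r][t] holds the instruction occupying resource r at cycle t
        self.matrix = {r: [None] * horizon for r in resources}

    def schedule(self, instr, earliest=0):
        row = self.matrix[instr.resource]
        for t in range(earliest, self.horizon - instr.latency + 1):
            if all(slot is None for slot in row[t:t + instr.latency]):
                for k in range(instr.latency):
                    row[t + k] = instr.name
                return t  # deterministic issue cycle, fixed before execution
        raise RuntimeError("schedule horizon exhausted")

trm = TimeResourceMatrix(["alu", "vector"])
t0 = trm.schedule(Instruction("add", "alu", 1))
t1 = trm.schedule(Instruction("vmul", "vector", 4))
t2 = trm.schedule(Instruction("vadd", "vector", 2), earliest=t1)
print(t0, t1, t2)  # 0 0 4 -- vadd waits for vmul's window to close
```

The point of the sketch is the zero-overlap property: because every slot is claimed at schedule time, there is nothing to guess at run time and nothing to roll back.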

A Unified Architecture

This innovation allows a single processor to act as both CPU and accelerator, with no switching overhead, no mismatched memory hierarchies, and no need for costly data transfers between heterogeneous units. It’s a general-purpose architecture that matches—and in many cases exceeds—the performance of dedicated AI engines.

This unification is made possible through a suite of patented innovations shown to have no direct relevant prior art. The Time-Resource Matrix provides the foundational execution schedule, and other breakthroughs extend it into new domains: Phantom Registers let the system pipeline instructions beyond physical register file limits, while Vector Data Buffers and Extended Vector Register Sets enable seamless scaling of parallel compute for AI operations. Instruction Replay Buffers ensure that even variable-latency memory accesses and branch events are resolved predictably, without dynamic-execution guesswork.
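
The replay idea can be illustrated with a few lines of code. This toy model is my own assumption of how such a buffer might behave, not the patented mechanism: an instruction that depends on a variable-latency event is parked with a retry cycle fixed at the moment it is parked, so even the slow path lands on a cycle known in advance rather than triggering a speculative rollback.

```python
from collections import deque

class ReplayBuffer:
    """Toy sketch: parked instructions carry a pre-reserved retry cycle,
    so the 'replay' path is part of the static schedule, not a recovery."""
    def __init__(self, retry_interval=8):
        self.retry_interval = retry_interval
        self.pending = deque()  # (retry_cycle, instruction), in park order

    def park(self, instr, cycle):
        retry = cycle + self.retry_interval  # retry slot fixed at park time
        self.pending.append((retry, instr))
        return retry

    def ready(self, cycle):
        """Release every parked instruction whose retry cycle has arrived."""
        out = []
        while self.pending and self.pending[0][0] <= cycle:
            out.append(self.pending.popleft()[1])
        return out

rb = ReplayBuffer()
slot = rb.park("lw x5, 0(x6)", cycle=10)
print(slot)          # 18
print(rb.ready(17))  # []
print(rb.ready(18))  # ['lw x5, 0(x6)']
```

The retry interval here is an arbitrary placeholder; the essential property is only that the replay cycle is determined when the instruction is parked, not discovered at run time.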

Together, these inventions form the building blocks of a compute engine that behaves like a CPU in its flexibility but delivers the sustained throughput of an accelerator—without needing two separate chips. Whether it’s matrix-heavy AI inference or control-heavy real-time decision making, the same processor handles both efficiently, synchronously, and without architectural switching. This represents not just an evolution, but a true reinvention of general-purpose computing.

Real-World Relevance

As AI workloads grow in size and complexity, conventional architectures are buckling under the weight. GPUs require massive power budgets and still struggle with memory bottlenecks. CPUs lack the parallelism needed for modern inference and training tasks. Meanwhile, multi-chip solutions suffer from latency, synchronization, and software fragmentation.

With Predictive Execution, Simplex delivers a unified, time-driven machine that’s already being implemented in RISC-V Vector processors and is ready for integration in next-generation AI infrastructure. It’s not a theoretical concept—it’s working RTL with simulation results, on track to run production code.

For chip architects, this means simpler system design with lower silicon footprint. For software developers, it means programming a unified, predictable target with consistent timing behavior—ideal for safety-critical and performance-sensitive applications alike.

The Path Forward

This is not just a performance story. It’s a return to architectural elegance, where one chip can serve many roles without compromise. By eliminating dynamic execution and grounding execution in deterministic time windows, Simplex has created a platform that can scale with the needs of future AI, edge computing, and cloud applications.

As we enter a new era of AI-driven applications, the need for scalable, unified, and deterministic compute has never been greater. Predictive Execution lays the foundation—and we invite the research and engineering community to build on it.

Selected Patents

  1. US 11,829,762 B2 – Time-Resource Matrix for a Microprocessor with Time Counter
  2. US 12,001,848 B2 – Phantom Registers for Pipelined Vector Execution
  3. US 12,282,772 B2 – Vector Data Buffer for Scalable Parallel Processing
  4. US 12,124,849 B2 – Extended Vector Register Architecture for AI Compute
  5. US 12,190,116 B2 – Instruction Replay Buffer with Deterministic Scheduling

About the Author

Dr. Thang Minh Tran is CTO and Founder of Simplex Micro and a UT Austin alumnus. He is the inventor of more than 180 issued patents in microprocessor design, with pioneering work at AMD, Texas Instruments, Freescale, Analog Devices, and Andes Technology, where he created the first RISC-V processor with Vector extensions, now designed into Meta’s MTIA chip.

Also Read:

Basilisk at Hot Chips 2025 Presented Ominous Challenge to IP/EDA Status Quo

Can RISC-V Help Recast the DPU Race?

What XiangShan Got Right—And What It Didn’t Dare Try
