Wow, that video is amazing! Mind blown.
You're absolutely right that the industry advances through continuous optimization across all layers. The Karpathy perspective on incremental improvements is spot-on for how AI models and software evolve.
Where DRDCL fits is at the hardware architecture layer - and here, breakthroughs do occasionally happen. The transistor, the integrated circuit, the shift from CPU to GPU for AI - these were discontinuous jumps, not incremental improvements. DRDCL represents a similar architectural rethink: we're not shrinking transistors, we're making transistors work better. Instead of fixed silicon running software, we have silicon that reconfigures in nanoseconds to match workload demands.
Your three-tier breakdown (datacenter training, datacenter inference, edge inference) is exactly right. DRDCL's 100-100,000x power efficiency and real-time reconfigurability deliver benefits across all three tiers.
I read the papers you posted. Really dense stuff, but a lot of it was familiar, since I've unfortunately been forced to think about (for one reason or another) every generation of reconfigurable logic that's come along since the 1990s. One of the projects in one of my chip groups was called "programmable state machines," and your papers were reminiscent of some of those ideas; the reasoning was similar as well. That programmable state machine work never came to fruition, because the area of computing it was aimed at wasn't broad enough to justify the investment, and static state machine implementations were simply a better way to go.
What you're discussing, replacing CPUs, GPUs, and TPUs (and other computing accelerators), is much broader and, assuming you had a practical solution, could justify a large R&D investment. The question that sticks in my mind, though, and isn't answered in the papers, is how your company sees the full-stack implementation of DRDCL with existing AI applications. Obviously, these are written in programming languages intended for instruction-set-driven chips. From your posts, it seems you think your devices will replace the instruction-set chips, or did I misinterpret and are you thinking your devices will be more like accelerators that just augment conventional devices? How do you see your strategy going from Python (just as an example) to utilizing DRDCL devices? Recompilation is probably no problem, but unless you can run current applications, meaning you don't need applications specifically designed for your chips, it is difficult to see the path to success in a reasonable time frame.
Can you elaborate?
Hi Blueone,
Excellent question - and thanks for investing the time to read our papers. You identified the critical adoption challenge. Your programmable state machine experience gives you exactly the right context to understand both the opportunity and the implementation complexity.
You're right that we're not replacing instruction-set architectures at the application layer. DRDCL fits into the design flow at the chip design/synthesis level, not the application runtime level. Here's how the stack works:
The Design Flow: Engineers continue using standard HDL (Verilog, VHDL, SystemVerilog) to describe their designs - whether that's a datacenter AI processor to compete with NVIDIA, an edge AI chip, or a custom ASIC. Our Silicon Compiler acts as a drop-in replacement for traditional synthesis tools in existing EDA flows:
- Traditional: HDL → Synthesis (Synopsys/Cadence) → Gate-level netlist → Place & Route
- SoftChip: HDL → SoftChip Silicon Compiler → Optimized DRDCL trees → Place & Route (same tools)
[detailed graphic below. more details on request]
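To make the "drop-in replacement" point concrete, here's a rough, purely illustrative Python sketch of a flow script where only the synthesis stage changes and everything downstream stays the same. Every function name here is a hypothetical stand-in, not our actual tool API:

```python
# Purely illustrative sketch: a flow wrapper where only the synthesis step
# differs. All function names below are hypothetical stand-ins, not real APIs.

def run_traditional_synthesis(hdl_files):
    # Stand-in for a conventional synthesis tool producing a gate-level netlist.
    return {"kind": "gate-level netlist", "sources": hdl_files}

def run_softchip_compiler(hdl_files):
    # Stand-in for the SoftChip Silicon Compiler producing DRDCL tree configs.
    return {"kind": "DRDCL tree configuration", "sources": hdl_files}

def run_place_and_route(netlist):
    # Stand-in for standard P&R tools; identical in both flows.
    return {"layout_for": netlist["kind"]}

def run_flow(hdl_files, use_softchip=False):
    """HDL in, layout out; only the synthesis stage is swapped."""
    synth = run_softchip_compiler if use_softchip else run_traditional_synthesis
    return run_place_and_route(synth(hdl_files))

print(run_flow(["ai_core.sv"], use_softchip=True))
```

The point of the sketch: same HDL inputs, same place-and-route outputs, so the rest of the EDA flow doesn't need to know which synthesis path was used.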
The compiler uses AI-assisted Monte Carlo optimization to convert HDL descriptions into optimal DRDCL tree configurations, then integrates back into standard EDA flows (Cadence, Synopsys, Siemens).
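As a toy illustration of what "Monte Carlo optimization over candidate configurations" can mean in general (the configuration encoding and cost model below are invented for illustration and are not our actual algorithm): sample candidate configurations at random, score each with a cost estimate, and keep the best.

```python
import random

# Toy Monte Carlo search over candidate configurations.
# The encoding and cost model are invented for illustration only.

def random_config(num_nodes=16):
    # A candidate "tree configuration" as a list of per-node settings.
    return [random.randint(0, 3) for _ in range(num_nodes)]

def estimated_cost(config):
    # Hypothetical cost model: per-node power plus a penalty for
    # mismatched neighboring nodes (a stand-in for routing cost).
    power = sum(config)
    routing = sum(abs(a - b) for a, b in zip(config, config[1:]))
    return power + 2 * routing

def monte_carlo_search(samples=10_000):
    best, best_cost = None, float("inf")
    for _ in range(samples):
        cand = random_config()
        cost = estimated_cost(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

config, cost = monte_carlo_search()
print(f"best cost found: {cost}")
```

In practice the search is AI-assisted rather than blind sampling, but the shape of the problem is the same: explore a huge configuration space and converge on tree configurations that implement the HDL with the best power/performance.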
The Application Stack: Python/TensorFlow/PyTorch applications run unchanged. They compile to whatever instruction set the chip designer chose - we don't change that. What changes is the efficiency of the underlying hardware implementing those operations. A chip designer building an AI processor gets 100-100,000x better power/performance from their HDL using our compiler instead of traditional synthesis - enabling them to build chips that compete with or outperform NVIDIA without the power/thermal constraints.
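To underline the "applications run unchanged" point, the snippet below is ordinary PyTorch code with nothing DRDCL-specific in it; under the positioning described above, only the silicon that ends up executing the compiled operations differs.

```python
import torch
import torch.nn as nn

# Ordinary PyTorch code; nothing here is DRDCL-specific. It runs unchanged
# whether the chip designer built the underlying hardware with traditional
# synthesis or with the SoftChip flow.

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.randn(32, 128)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([32, 10])
```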
Target Use Cases: We're primarily targeting chip designers building AI processors - whether datacenter chips competing with NVIDIA's GPUs, edge AI chips for power-constrained environments, or custom ASICs for specialized AI workloads. The power/performance advantages make previously impossible designs viable.
Does this clarify the positioning? Do you see anything we're not considering? Happy to dive deeper into any aspect.
Best,
- Tom Jackson, Founder & VP Business Development, SoftChip | TJ@SoftChip.tech