Where are the semiconductor breakthroughs for AI?

milesgehm

Active member
It seems like the current attitude is: Apple needs a trillion-parameter model so Daniel can plan his vacation. Apple will buy 0.666 billion Nvidia chips. That requires more power than the Hoover Dam. Solution: buy the chips, build 2 nuclear power plants, charge people more for the electricity they need to cool/heat their homes. Daniel gets to go on a better vacation.

Where are the breakthroughs in architecture and chip design that avoid this disaster?
 


The breakthrough you're asking about exists, and it starts with a radically different approach: what if we could do the same AI computations with 100-100,000x fewer transistors? Dynamic Reconfigurable Data Center Logic (DRDCL), developed by SoftChip, does exactly that by fundamentally rethinking how silicon is utilized. Current AI chips waste enormous resources because they're built as fixed-architecture processors. A typical solution might use 3,200+ transistors where DRDCL uses just 38 - and those 38 transistors can dynamically reconfigure in nanoseconds to perform thousands of different operations. This transistor efficiency translates directly to 100-100,000x power efficiency improvements and 99% power reduction at the datacenter level. The real-world impact: instead of needing two nuclear power plants for Apple's AI infrastructure, DRDCL could deliver equivalent performance using a fraction of the chips and 1MW instead of 100MW.

This isn't theoretical - it's mathematically proven (original paper, addendum) and we're raising capital to develop the silicon compiler that will integrate DRDCL seamlessly into existing chip design workflows. Today's AI infrastructure runs at catastrophic 5-25% utilization for inference workloads - meaning 75-95% of the silicon Daniel's vacation planner uses is sitting completely idle, burning power for nothing. DRDCL's architecture can push utilization to 85-95% while using orders of magnitude less power per computation. We don't need to choose between AI services and heating people's homes - we need architectures that aren't burning $400 billion worth of silicon doing nothing. The mathematical proofs are published. The architecture is patent-pending. The industry needs this now.
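
For concreteness, here is a back-of-envelope restatement of those figures in Python - illustrative only, since every input below is a claim from this post rather than measured data:

```python
# Back-of-envelope restatement of the figures claimed above.
# Illustrative only: every input is this post's claim, not measured data.

baseline_power_mw = 100.0   # claimed power for the conventional buildout, in MW
power_gain = 100.0          # low end of the claimed 100-100,000x efficiency range

drdcl_power_mw = baseline_power_mw / power_gain
print(f"claimed DRDCL power: {drdcl_power_mw:.0f} MW")   # 1 MW, i.e. a 99% reduction

# Utilization is a separate multiplier on useful work per watt.
baseline_util, drdcl_util = 0.15, 0.90   # midpoints of the claimed 5-25% and 85-95% ranges
print(f"claimed utilization gain: {drdcl_util / baseline_util:.0f}x")   # ~6x more useful work per chip
```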

- Tom Jackson, Founder & VP Business Development, SoftChip
TJ@SoftChip.tech
 
This isn't theoretical - it's mathematically proven (original paper, addendum) and we're raising capital to develop the silicon compiler that will integrate DRDCL seamlessly into existing chip design workflows.
I respect the writeup and the work your team is doing. This looks honestly exciting, and I'm glad (as a consumer and enthusiast) to see someone is tackling this from a different angle. (Of course - with 1,000x greater power efficiency, software developers can always invent new ways to need 1,000x more energy to achieve the same result -- see Python vs. C++/Assembler in CPU history).

However, I would caution that this efficiency gain isn't proven until there's working silicon and independent benchmarks confirm the results.

P.S. What's doubly exciting is - if this does work as stated, you won't even need 'the most advanced node' to (significantly) exceed the performance of GPUs. That could help tremendously with the cost and availability of products. Good luck with this.

Respectfully,
John
 
A typical solution might use 3,200+ transistors where DRDCL uses just 38 - and those 38 transistors can dynamically reconfigure in nanoseconds to perform thousands of different operations.
Just a dumb question - your technology could work for replacing logic, but the big challenge for AI isn't the logic. It's really the architecture that links computation to massive amounts of memory and the ability to map and execute models onto the combination of compute and memory. From your paper, you don't do anything for memory. Are you suggesting static memory within your chips?
 
Apple needs a trillion-parameter model so Daniel can plan his vacation. Apple will buy 0.666 billion Nvidia chips. That requires more power than the Hoover Dam. Solution: buy the chips, build 2 nuclear power plants, charge people more for the electricity they need to cool/heat their homes. Daniel gets to go on a better vacation.
I think you're a little off on your thinking - there are really three different pieces to the puzzle.
* Data Center training
* Data Center inference using frontier models
* Edge / client inference

Hopefully Apple and Daniel use edge / client inference for most tasks - there are many companies, including Apple, doing all kinds of innovation to build better hardware and optimized models for client-side usage. Data center training and inference are much more focused on rack-level optimization for tuning hardware and very large models. And like the semiconductor industry, or the CPU industry, most of the improvements aren't breakthroughs but a slew of steady optimizations across all aspects. Watch this bit from Andrej Karpathy, one of the co-founders of OpenAI. The whole 2 1/2 hours are totally enlightening, but this bit deals with the evolution of AI - no breakthroughs, just improvements all over.

 
John,

Thank you for the thoughtful response - you're absolutely right on both counts.

On the software efficiency paradox: you're describing Jevons paradox perfectly - efficiency gains often just enable more waste. That said, orders of magnitude improvements aren't just incremental gains, they're revolutionary shifts. We're talking about the difference between needing nuclear power plants versus not. That headroom matters.

On proven silicon: completely agreed. The mathematical foundation is peer-reviewed and the architecture is patent-pending, but you're right that nothing substitutes for working silicon and independent benchmarks. That's exactly why we're seeking fabless partners who can take DRDCL from mathematical proof to production silicon.

Your P.S. hits on what excites us most: DRDCL's power efficiency, speed, and real-time reconfigurability can deliver competitive performance on mature manufacturing nodes. That's what changes the economics - revolutionary performance without the expensive fab dependency or multi-year lead times.

- Tom Jackson, Founder & VP Business Development, SoftChip | TJ@SoftChip.tech
 
Just a dumb question - your technology could work for replacing logic, but the big challenge for AI isn't the logic. It's really the architecture that links computation to massive amounts of memory and the ability to map and execute models onto the combination of compute and memory. From your paper, you don't do anything for memory. Are you suggesting static memory within your chips?
Kevin,

You're right that the memory wall is a critical crisis - but it's not the only one. We're building nuclear power plants and $30B fabs because traditional architectures use 3,200+ transistors where DRDCL uses 38, AND they spend 70-80% of their energy and time moving data between compute and memory.

DRDCL attacks both problems:

Logic Efficiency: 100-100,000x fewer transistors for the same computations through nanosecond reconfiguration - the same 38 transistors perform thousands of different operations. Radically reducing chip area and power dissipation while increasing logic speed will be very helpful for AI, even as memory remains the dominant bottleneck.
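
To picture what "the same transistors perform thousands of operations" means, here is a deliberately simplified lookup-table-style analogy in Python - a sketch of the general reconfiguration idea, not our actual DRDCL circuit:

```python
# Toy model of run-time reconfigurable logic (a LUT-style analogy only,
# not the DRDCL circuit itself): one fixed 2-input block computes whichever
# Boolean function its 4-bit configuration word encodes.

def make_block(config: int):
    """Return a 2-input gate whose truth table is the 4-bit `config`."""
    def gate(a: int, b: int) -> int:
        row = (a << 1) | b            # which row of the truth table
        return (config >> row) & 1    # look up the configured output bit
    return gate

AND = make_block(0b1000)   # output is 1 only for a=1, b=1
XOR = make_block(0b0110)
NOR = make_block(0b0001)

print(AND(1, 1), XOR(1, 0), NOR(0, 0))   # 1 1 1
# Rewriting the configuration word "reconfigures" the same fixed hardware
# into a different operation; no transistors are added or changed.
```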

Memory Bottleneck Solutions: Dynamic reconfiguration is handled by the chip logic itself and, combined with integrated short-term memory elements (retention on the order of microseconds) that have negligible impact on chip density, opens up new architectural possibilities. The transistor efficiency creates headroom for on-chip I/O acceleration:

* Integrated I/O acceleration - circuits dynamically optimize data movement patterns
* Adaptive memory controllers that reconfigure based on access patterns
* On-chip preprocessing for data compression/decompression to reduce bandwidth requirements
* Intelligent caching that adapts to workload patterns in real-time

We're essentially flipping the ratio: instead of 70% data movement overhead and 30% computing, we're targeting 80% useful compute with 20% optimized I/O. This might even lead to completely new architectures that aren't possible with today's fixed silicon.
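
As a rough sanity check on that ratio flip, here is the arithmetic with the percentages above and an arbitrary placeholder energy budget:

```python
# Energy split per unit of work, using the percentages claimed above.
# The absolute joule figure is an arbitrary placeholder.

energy_per_op_j = 1.0                                 # placeholder budget for one unit of work
today  = {"compute": 0.30, "data_movement": 0.70}     # claimed status quo
target = {"compute": 0.80, "data_movement": 0.20}     # claimed DRDCL target

useful_today  = energy_per_op_j * today["compute"]
useful_target = energy_per_op_j * target["compute"]

# Even before any per-transistor savings, flipping the split would put
# roughly 2.7x more of each joule into useful computation.
print(f"useful-energy gain from the ratio flip alone: {useful_target / useful_today:.1f}x")
```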

This is why we're focused on the silicon compiler first - the tooling needs to handle the logic reconfiguration and I/O optimization to fully exploit what DRDCL enables.

- Tom Jackson, Founder & VP Business Development, SoftChip | TJ@SoftChip.tech
 


I read the papers you posted. Really dense stuff. Lots of familiarity to me though, since I've unfortunately been forced to think about (for one reason or another) every generation of reconfigurable logic that's come along since the 1990s. One of the projects in one of my chip groups was called "programmable state machines", and reading your papers was very reminiscent of some of the ideas, and the reasoning was similar to what your papers discussed. That programmable state machine work never came to fruition, because the area of computing it was aimed at wasn't really broad enough to justify the investment, and static state machine implementations were just a better way to go.

What you're discussing, replacing CPUs, GPUs, and TPUs (and other computing accelerators) is much broader, and, assuming you had a practical solution, could justify a large R&D investment. The question that sticks in my mind though, and isn't answered in the papers, is how your company sees the full stack implementation of DRDCL with existing AI applications. Obviously, these are written in programming languages intended for instruction set driven chips. It seems like, from your posts, you think your devices will replace the instruction-set chips, or did I misinterpret and are you thinking your devices will be more like accelerators and just augment conventional devices? How do you think your strategy will go from Python (just as an example) to utilizing DRDCL devices? Recompilation is probably no problem, but unless you can run current applications, meaning you don't need applications specifically designed for your chips to execute, it is difficult to see the path to success in a reasonable time frame.

Can you elaborate?
 
Wow, that video is amazing! Mind blown.
You're absolutely right that the industry advances through continuous optimization across all layers. The Karpathy perspective on incremental improvements is spot-on for how AI models and software evolve.

Where DRDCL fits is at the hardware architecture layer - and here, breakthroughs do occasionally happen. The transistor, the integrated circuit, the shift from CPU to GPU for AI - these were discontinuous jumps, not incremental improvements. DRDCL represents a similar architectural rethink: we're not shrinking transistors, we're making transistors work better. Instead of fixed silicon running software, we have silicon that reconfigures in nanoseconds to match workload demands.

Your three-tier breakdown (datacenter training, datacenter inference, edge inference) is exactly right. DRDCL's 100-100,000x power efficiency and real-time reconfigurability deliver benefits across all three.
 
I read the papers you posted. Really dense stuff. Lots of familiarity to me though, since I've unfortunately been forced to think about (for one reason or another) every generation of reconfigurable logic that's come along since the 1990s. One of the projects in one of my chip groups was called "programmable state machines", and reading your papers was very reminiscent of some of the ideas, and the reasoning was similar to what your papers discussed. That programmable state machine work never came to fruition, because the area of computing it was aimed at wasn't really broad enough to justify the investment, and static state machine implementations were just a better way to go.

What you're discussing, replacing CPUs, GPUs, and TPUs (and other computing accelerators) is much broader, and, assuming you had a practical solution, could justify a large R&D investment. The question that sticks in my mind though, and isn't answered in the papers, is how your company sees the full stack implementation of DRDCL with existing AI applications. Obviously, these are written in programming languages intended for instruction set driven chips. It seems like, from your posts, you think your devices will replace the instruction-set chips, or did I misinterpret and are you thinking your devices will be more like accelerators and just augment conventional devices? How do you think your strategy will go from Python (just as an example) to utilizing DRDCL devices? Recompilation is probably no problem, but unless you can run current applications, meaning you don't need applications specifically designed for your chips to execute, it is difficult to see the path to success in a reasonable time frame.

Can you elaborate?
Hi Blueone,

Excellent question - and thanks for investing the time to read our papers. You identified the critical adoption challenge. Your programmable state machine experience gives you exactly the right context to understand both the opportunity and the implementation complexity.

You're right that we're not replacing instruction-set architectures at the application layer. DRDCL fits into the design flow at the chip design/synthesis level, not the application runtime level. Here's how the stack works:

The Design Flow: Engineers continue using standard HDL (Verilog, VHDL, SystemVerilog) to describe their designs - whether that's a datacenter AI processor to compete with NVIDIA, an edge AI chip, or a custom ASIC. Our Silicon Compiler acts as a drop-in replacement for traditional synthesis tools in existing EDA flows:

* Traditional: HDL → Synthesis (Synopsys/Cadence) → Gate-level netlist → Place & Route
* SoftChip: HDL → SoftChip Silicon Compiler → Optimized DRDCL trees → Place & Route (same tools)
[detailed graphic below. more details on request]

The compiler uses AI-assisted Monte Carlo optimization to convert HDL descriptions into optimal DRDCL tree configurations, then integrates back into standard EDA flows (Cadence, Synopsys, Siemens).
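
For intuition, here is a deliberately simplified Python sketch of a Monte Carlo search over candidate configurations. The cost model, the mutation step, and every name in it are hypothetical placeholders standing in for the compiler's real internals:

```python
import math
import random

# Simplified Monte Carlo / simulated-annealing search over candidate
# configurations. Cost model, mutation step, and data layout are
# hypothetical placeholders, not the actual silicon compiler internals.

def cost(config):
    """Toy objective: pretend a lower sum means less area/power for the mapping."""
    return sum(config)

def mutate(config):
    """Randomly change one configuration slot to a new setting."""
    c = list(config)
    c[random.randrange(len(c))] = random.randint(0, 15)   # say, 16 settings per slot
    return c

def monte_carlo_search(n_slots=32, iterations=10_000, temperature=2.0):
    current = [random.randint(0, 15) for _ in range(n_slots)]
    best = current
    for _ in range(iterations):
        candidate = mutate(current)
        delta = cost(candidate) - cost(current)
        # Accept improvements always, regressions occasionally, so the
        # search can escape local minima.
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            current = candidate
            if cost(current) < cost(best):
                best = current
    return best

print("best cost found:", cost(monte_carlo_search()))
```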

The Application Stack: Python/TensorFlow/PyTorch applications run unchanged. They compile to whatever instruction set the chip designer chose - we don't change that. What changes is the efficiency of the underlying hardware implementing those operations. A chip designer building an AI processor gets 100-100,000x better power/performance from their HDL using our compiler instead of traditional synthesis - enabling them to build chips that compete with or outperform NVIDIA without the power/thermal constraints.

Target Use Cases: We're primarily targeting chip designers building AI processors - whether datacenter chips competing with NVIDIA's GPUs, edge AI chips for power-constrained environments, or custom ASICs for specialized AI workloads. The power/performance advantages make previously impossible designs viable.

Does this clarify the positioning? Do you see anything we're not considering? Happy to dive deeper into any aspect.
Best,

- Tom Jackson, Founder & VP Business Development, SoftChip | TJ@SoftChip.tech
 
