How much more progress can be made in AI hardware?

tvural

New member
I often hear the assumption that even though progress in CPUs seems to be plateauing, we will still have exponential progress in GPUs and ASICs for a long time. Even Elon Musk came out and said progress in AI hardware is exponential.

On the other hand, it seems that the exponential progress in parallel chips will stop once their transistor sizes catch up with those of CPUs. For a given transistor density you can't increase the clock speed beyond a certain point, and we probably won't go much beyond 5nm processes - this is why CPU speeds have stagnated. After that, every extra parallel core or unit of computation increases both the cost and the power of the chip linearly.
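
A quick back-of-envelope illustrates that last point; the numbers below are invented purely for illustration, and only the ratios matter: once clock and density are fixed by the process, adding cores scales throughput, cost, and power in the same proportion, so throughput per dollar and per watt go flat.

```python
# Hypothetical chip: clock and per-core throughput fixed by the process,
# cost (area) and power scaling linearly with core count.
CLOCK_GHZ = 2.0
FLOPS_PER_CORE_CYCLE = 64
COST_PER_CORE_USD = 25.0
WATTS_PER_CORE = 3.0

for cores in (64, 128, 256):
    tflops = cores * FLOPS_PER_CORE_CYCLE * CLOCK_GHZ / 1e3
    cost = cores * COST_PER_CORE_USD
    power = cores * WATTS_PER_CORE
    print(f"{cores:4d} cores: {tflops:6.1f} TFLOP/s, "
          f"{tflops / cost:.4f} TFLOP/s per $, {tflops / power:.3f} TFLOP/s per W")
```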

ASICs have already been pretty well optimized for AI applications. Google's TPU specifically optimizes matrix multiplication, which is where almost all of the computation in training a neural network goes. There may be even more clever tricks to implement, but it's hard to see them yielding more than a 5x improvement or so.
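
To see why matrix multiplication is the thing to target, here is a rough FLOP count for a single fully connected layer (the sizes are arbitrary); convolutions and attention follow the same pattern, and the backward pass roughly doubles everything:

```python
# Rough FLOP count for one fully connected layer; only the ratio matters.
batch, d_in, d_out = 256, 4096, 4096

matmul_flops = 2 * batch * d_in * d_out   # one multiply + one add per (batch, in, out) triple
bias_flops = batch * d_out                # elementwise add
activation_flops = batch * d_out          # roughly one op per output element

total = matmul_flops + bias_flops + activation_flops
print(f"matmul share of FLOPs: {matmul_flops / total:.2%}")   # ~99.98%
```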

There may be more work to be done in the design of the FETs or in 3D layering, but these will not have radical or exponential effects on performance.

Quantum computing won't be better for matrix multiplication. Optical computing is limited to clock speeds of tens of GHz by dispersion of the optical pulses, which is not much better than what can be done with silicon.

It also seems that many of the remaining improvements won't multiply together. If two new architectures come out that are each 5x faster than previous ones, they probably can't just be combined for a 25x improvement. A new FET that's more power efficient might not be suitable for 3D layering. In general it seems that further progress in computations per second per dollar will be slow and difficult rather than exponential.
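
One way to see why gains don't multiply: if each 5x technique only accelerates part of the workload, the combined speedup is capped by whatever neither of them touches (the usual Amdahl's-law argument). The workload split below is made up.

```python
# Amdahl-style combination of two speedups that each cover only part of the runtime.
def combined_speedup(frac_a, speed_a, frac_b, speed_b):
    """frac_a of the runtime gets speed_a, frac_b gets speed_b (non-overlapping),
    and the remainder is untouched."""
    untouched = 1.0 - frac_a - frac_b
    new_time = frac_a / speed_a + frac_b / speed_b + untouched
    return 1.0 / new_time

# Technique A: 5x on 60% of the runtime. Technique B: 5x on a different 30%.
# The remaining 10% (control, I/O, data movement) is untouched.
print(combined_speedup(0.6, 5.0, 0.3, 5.0))   # ~3.6x, nowhere near 25x
```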
 
AI architectures (particularly for neural nets) are still evolving and, at least in some cases, don't look much like conventional architectures. For example, they can be grids of small processing elements, each something like a small processor, but connected quite differently from a conventional multiprocessor. Caching strategies are also quite different and can dramatically affect real throughput, as can 3D-stacking memory on top of the NN compute layer. See e.g. the discussion and references in this. QC is still far from delivering on basic computing at scale; NNs are way outside its practical capabilities.
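
As a loose illustration of that kind of grid (not any particular product's design), here is a toy emulation of an output-stationary systolic array doing a matrix multiply: each processing element owns one output value and only exchanges operands with its immediate neighbours, one hop per cycle, which is why the wires stay short.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy emulation of an output-stationary systolic array computing C = A @ B.

    PE (i, j) owns C[i, j]; rows of A stream in from the left and columns of B
    stream in from the top, skewed so that A[i, s] and B[s, j] meet at PE (i, j)
    on cycle t = i + j + s.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for t in range(n + m + k - 2):     # cycles for the whole wavefront to drain
        for i in range(n):
            for j in range(m):
                s = t - i - j          # operand pair reaching PE (i, j) this cycle
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A = np.random.rand(4, 6)
B = np.random.rand(6, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```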

All told, I wouldn't write off big improvements yet; the progress is much more about architectures and 3D potential than about process.
 
Algorithms are still evolving. Machine learning currently uses global calculations with a lot of data exchange in order to create a finished structure that is highly local, with well-defined data pathways. This suggests there is plenty of potential to make the learning algorithms more suitable for partitioning into local optimization with loose interconnection, which is probably how living systems do it. That would free us up to have more efficient parallelism and locality. Moving data costs a lot more energy than the computation does, and algorithm improvements will likely have us doing a lot less of both.
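
The data-movement point is easy to put numbers on, using the per-operation energy figures commonly quoted from Horowitz's ISSCC 2014 talk (45 nm; exact values vary by node, so treat them as order-of-magnitude only):

```python
# Approximate per-operation energy in picojoules (45 nm ballpark figures).
ENERGY_PJ = {
    "32-bit FP add":       0.9,
    "32-bit FP multiply":  3.7,
    "32-bit SRAM read":    5.0,    # small on-chip buffer
    "32-bit DRAM read":  640.0,    # off-chip
}

mac_pj = ENERGY_PJ["32-bit FP multiply"] + ENERGY_PJ["32-bit FP add"]
ratio = ENERGY_PJ["32-bit DRAM read"] / mac_pj
print(f"Fetching one word from DRAM costs ~{ratio:.0f}x the multiply-accumulate it feeds")
```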

Things to look forward to in algorithms include commonality between training and inference (today you train an NN on one kind of machine, like a GPU, and generally execute it on something different, more resembling a DSP, somewhere like your phone), so that incremental learning in the execution device becomes possible. Also training with much less data: there are many cases where humans can learn from small data sets, and we will make progress with machines that can do that too. And simply taking our existing processes and making them more aggressively parallel, so that we use the same amount of computation but finish in minutes instead of hours, or hours instead of days; that alone can make ML more useful.

Optical maybe cannot push computation beyond tens of GHz, but it can scale interconnects over distance at reasonable power, and there are architectures for passive switching that could aid the construction of massively parallel machines broadcasting updates at low latency. Plus, the increasing use of both 3D and 2.5D packaging can increase throughput over wires while decreasing power per bit enormously, simply by shortening the wires.
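
The "shorter wires" part is just the physics of charging wire capacitance: energy per bit scales roughly with capacitance per unit length times length times voltage squared. The capacitance and voltage below are generic ballpark assumptions, not measurements of any particular package.

```python
# Ballpark energy to drive one bit over a wire: E ~= C_per_mm * length * V^2
# (ignoring activity factors and drivers). 0.2 pF/mm and 0.8 V are generic assumptions.
C_PER_MM_PF = 0.2
VDD = 0.8

def pj_per_bit(length_mm):
    return C_PER_MM_PF * length_mm * VDD ** 2

for name, mm in [("cross-die wire, ~10 mm", 10.0),
                 ("2.5D interposer hop, ~1 mm", 1.0),
                 ("3D through-silicon via, ~0.05 mm", 0.05)]:
    print(f"{name}: ~{pj_per_bit(mm):.3f} pJ/bit")
```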

Overall I would bet we get to at least 1,000x where we are today in about 10 years. This kind of thing has happened before: the MIP solver (mixed integer programming) industry gained about a million times over the 15 years from 1990 to 2005, with roughly equal 1,000x contributions from algorithms and from hardware. Indeed, that set the stage for the solvers used in ML.
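
For scale, 1,000x in 10 years works out to roughly a doubling every year, and the MIP number is just the two 1,000x factors compounding:

```python
print(1_000 ** (1 / 10))       # ~2.0x per year for 1,000x over a decade
print(1_000 * 1_000)           # algorithms x hardware over 1990-2005
print(1_000_000 ** (1 / 15))   # ~2.5x per year sustained for 15 years
```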
 