What Happens When Shrink Ends?

It has already happened; you are just looking in the wrong place. GPUs, and now ML and inferencing, video conversion, signal processing, sensor fusion - if you look at an A15, at least 3/4 of the area is occupied by highly parallel dataflow accelerators.
The A15 is a perfect example of being locked into a dead-end design path. With less than a 20% improvement over the A14, the A15 added 35-40% more transistor density. Much of the power and efficiency gain comes from TSMC's N5P node improvements. When improvements take this kind of effort, it may be time to start looking at new approaches.

Still - it is super impressive that there are still continuous improvements in semiconductor design. At some point, there will be an invention that will shock the industry and reorder everything. History has shown that this occurs in every industry. It is not really a matter of 'if' but 'when'. I am simply stating that I think it will be sooner rather than later...

Innovations rarely replace a well established incumbent. Mostly they simply run around it and leave it where it was, in the rear view mirror.
Agreed. An "invention" that breaks out of the current design path is very different. If a company could create the A15 with 1/10th the complexity, this would be very hard to simply run around. My contention is that this is coming.
 
The A15 is a perfect example of being locked into a dead-end design path. With less than a 20% improvement over the A14, the A15 added 35-40% more transistor density. Much of the power and efficiency gain comes from TSMC's N5P node improvements. When improvements take this kind of effort, it may be time to start looking at new approaches.
What are you on about?! They are on the same node (so 0 density improvement), it had a 27% transistor count increase, and an 8% freq bump for the CPU. They added a new image processor. Finally they buffed up the AI/ML capabilities of the "AI cores" by 43%. A 27% transistor bump on the same node for a 43% boost in specific workloads, an 8% boost in general workloads, and a new image processor hardly sounds like "Apple has hit a dead end" to me.
 
The A14 is produced on N5 and the A15 on N5P, which TSMC says delivers a 5% improvement.
Completely understand the confusion - I said it is a dead-end path, with decreasing returns on effort, not that Apple has hit a dead end. We might disagree on how long it will take to reach the end of that path.
What is clear is that improvements are taking a lot more effort to squeeze out additional benefits.

The ML inference ANE workloads increased from 11.66 TFLOPS to 15.8 TFLOPS - a 35% bump.
My point is that the cost/complexity of an additional 27% in transistor count is much, much higher than that of the initial A14 functionality - meaning Apple is paying a huge premium for the additional 8% in general processing and 43% (or 35%) in specific workloads.
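
The arithmetic behind those percentages, as a quick sanity check - the 11.66 and 15.8 TFLOPS ANE figures and the 27% / 8% numbers are the ones quoted in this thread, nothing else is assumed:

    ane_a14, ane_a15 = 11.66, 15.8                 # ANE throughput, TFLOPS (figures quoted above)
    ane_gain = ane_a15 / ane_a14 - 1
    print(f"ANE gain: {ane_gain:.1%}")             # ~35.5%

    transistor_growth = 0.27                       # quoted A14 -> A15 transistor count increase
    for name, gain in [("general (CPU)", 0.08), ("ANE (TFLOPS)", ane_gain)]:
        # gain delivered per unit of added transistor budget
        print(f"{name}: {gain:.1%} gain for {transistor_growth:.0%} more transistors "
              f"-> ratio {gain / transistor_growth:.2f}")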

I agree that there are still improvements to be made - still, I'd be shocked if there was disagreement with my core statement: "It is increasingly hard/costly to get those benefits."
I don't expect a lot of agreement that these effort premiums are avoidable....
This is my point - the semiconductor industry is stuck in a sort of hubris that results from a long-standing path of success. That path is getting extremely difficult, but no one seems to want to let go of the tried-and-true method of shrink to solve its problems. There have been some pivots, such as Apple's ANE, to 'right-size' processing, but soon some fundamental assumptions will be challenged.
 
The A15 is a perfect example of being locked into a dead-end design path.
Why do you think the A15 is locked into a dead-end design path? (Do you really mean to discuss the A16?) Because it uses a monolithic die without asynchronous circuits?
 
Why do you think the A15 is locked into a dead-end design path? (Do you really mean to discuss the A16?) Because it uses a monolithic die without asynchronous circuits?
I leveraged the A15 example to reiterate that it is an example of an increasingly expensive and difficult design path - a dead-end path. Apple will undoubtedly deliver an A16 with yet more functionality and performance - at an even higher cost/complexity.

I have also posted on 3D designs - which help my particular designs - but I don't say that an integrated-die approach is limited. Asynchronous logic designs have been shown to unlock significant performance gains but are very hard to implement within contemporary architectures and common assumptions.... If we consider just how much complexity/circuitry and space goes into providing the caching required to make synchronous designs work, a breakthrough in how data moves within the chipset itself will unlock a lot of capacity and capability...
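
For anyone who hasn't looked at clockless design, here is a minimal sketch of the four-phase request/acknowledge handshake that replaces the clock in bundled-data asynchronous pipelines. It is a toy Python model for illustration only, under my own assumptions - real asynchronous design is done at the circuit level, not like this:

    import threading
    import time

    class HandshakeChannel:
        """Bundled-data channel: the sender raises req after data is valid,
        the receiver raises ack after latching, then both return to zero."""
        def __init__(self):
            self.req = threading.Event()
            self.ack = threading.Event()
            self.data = None

        def send(self, value):
            self.data = value          # data valid before req (bundling constraint)
            self.req.set()             # phase 1: raise request
            self.ack.wait()            # phase 2: wait for acknowledge
            self.req.clear()           # phase 3: drop request
            while self.ack.is_set():   # phase 4: wait for acknowledge to drop
                time.sleep(0)

        def receive(self):
            self.req.wait()            # wait for request
            value = self.data          # latch the data
            self.ack.set()             # acknowledge
            while self.req.is_set():   # wait for request to drop
                time.sleep(0)
            self.ack.clear()           # return to zero; ready for the next token
            return value

    def producer(ch):
        for x in range(3):
            ch.send(x)

    ch = HandshakeChannel()
    t = threading.Thread(target=producer, args=(ch,))
    t.start()
    print([ch.receive() for _ in range(3)])   # [0, 1, 2] - each transfer paced by the handshake, no clock
    t.join()

Each transfer completes as fast as the two sides can handshake, which is the appeal; the difficulty described above is that timing closure, verification, and the bundling constraint all get much harder without a global clock.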

Also - as the power required to move data around moves much closer to zero, clock rates can be increased and additional logic/compute can be implemented without as much circuitry and thermal impact. Latency moves toward propagation speeds.
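
A back-of-the-envelope check on what "latency moves toward propagation speeds" implies; the 0.5c on-chip signal speed is my assumption for a best case (real RC-limited wires are considerably slower, which is part of the point):

    C = 3.0e8                     # speed of light, m/s
    signal_speed = 0.5 * C        # assumed best-case on-chip propagation speed
    for f_ghz in (1, 3, 5):
        period_s = 1.0 / (f_ghz * 1e9)
        reach_mm = signal_speed * period_s * 1e3
        print(f"{f_ghz} GHz: period = {period_s * 1e12:.0f} ps, max reach ~ {reach_mm:.0f} mm")

Even at this optimistic speed, a 5 GHz clock gives a signal at most ~30 mm of travel per cycle before any logic delay, so chip-scale data movement, not switching, sets the floor on latency.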

As far as I have been able to determine, there have been many sophisticated and complex solutions to address a fundamental assumption around data movement. These complex and complicated solutions work - and will work for some time at an increasing cost - but at some point, the industry will face a glaring challenge to these fundamental assumptions.
 
I leveraged the A15 example to reiterate that it is an example of an increasingly expensive and difficult design path - a dead-end path. Apple will undoubtedly deliver an A16 with yet more functionality and performance - at an even higher cost/complexity.
The A16 is already delivered. I have one in my iPhone 14 Pro. I suspect the most significant performance improvement over the A15 in the 14 and 14 Plus phones is that the A16 uses LPDDR5-6400 DRAM, which allows Apple to use a smaller SRAM system level cache for the CPUs, saving power. Apple also used the better fab process to increase the size of the L2 caches, which should increase real world performance too. I don't run gaming apps on my phone, but I know someone who does, and claims the 14 Pro is visibly better than his old 13 Pro. I have noticed the 14 Pro's 5G performance is pretty remarkable, without heating up the phone at all either.
 
It wasn't "increasingly more expensive" though. Even in this worst-case scenario (a bigger die on the same node) it would only get more expensive slightly faster than the linear die size increase, all while increasing performance far faster than the cost increase. The only scenario where this would not be true is if the defect density on N5P were so atrocious that increasing the die size 27% increased the cost per yielded mm^2 by 30%. And we both know that there was no huge (or likely any) DD regression for N5P from N5. In the best-case scenario (node shrink) that same exact chip would be even cheaper on, say, N3 than it would be on N5P, and better still it would offer better PPW. It's almost like the "orthodox method" is the path of least resistance for increasing performance rather than "the path of greatest resistance that is shackling chip designers" :unsure:.
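
A rough sketch of the yield arithmetic behind that claim, using a simple Poisson yield model; the defect densities and the 100 mm^2 baseline die are placeholder assumptions for illustration, not actual N5/N5P figures:

    import math

    def cost_per_good_die(area_mm2, d0_per_mm2):
        # relative cost: proportional to area, divided by yield (Poisson model)
        yield_frac = math.exp(-d0_per_mm2 * area_mm2)
        return area_mm2 / yield_frac

    base_area = 100.0                  # hypothetical baseline die, mm^2
    big_area = base_area * 1.27        # 27% bigger die on the same node

    for d0 in (0.0005, 0.001, 0.002):  # assumed defects per mm^2
        ratio = cost_per_good_die(big_area, d0) / cost_per_good_die(base_area, d0)
        print(f"D0 = {d0:.4f}/mm^2: cost per good die grows {ratio:.2f}x for a 1.27x die")

Even with the worst of these assumed defect densities, the cost per good die grows ~1.34x for a 1.27x die, i.e. only slightly faster than the area increase, which is the point being made above.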

My advice would be to stop peddling the pseudo-engineering when you seem so poorly informed on how this stuff works. Chip designers and computer scientists are intelligent folks and have probably thought about/theorized every possible computing scheme under the sun. What we have now are the winners of a no-holds-barred evolutionary struggle. If you still believe so strongly that your ideas are stronger though, go make a startup and prove the rest of the industry wrong 😁(y).
 
It wasn't "increasingly more expensive" though. Even in this worst-case scenario (a bigger die on the same node) it would only get more expensive slightly faster than the linear die size increase, all while increasing performance far faster than the cost increase. The only scenario where this would not be true is if the defect density on N5P were so atrocious that increasing the die size 27% increased the cost per yielded mm^2 by 30%. And we both know that there was no huge (or likely any) DD regression for N5P from N5. In the best-case scenario (node shrink) that same exact chip would be even cheaper on, say, N3 than it would be on N5P, and better still it would offer better PPW. It's almost like the "orthodox method" is the path of least resistance for increasing performance rather than "the path of greatest resistance that is shackling chip designers" :unsure:.
The engineering design and verification effort is more expensive. Chip production costs obviously don't increase except as die size increases marginally. So - clearly there are business-case benefits to these increasing design costs and complexities.

My advice would be to stop peddling the pseudo-engineering when you seem so poorly informed on how this stuff works. Chip designers and computer scientists are intelligent folks and have probably thought about/theorized every possible computing scheme under the sun. What we have now are the winners of a no-holds-barred evolutionary struggle.
"Pseudo-engineering"... perhaps.... Poorly informed - not quite. "Every possible computing scheme" - again, that's hubris speaking. It has been an evolutionary struggle, and there is a revolution in the making.

If you still believe so strongly that your ideas are stronger though, go make a startup and prove the rest of the industry wrong 😁(y).
Agreed. Done and Done.
 
For those unfamiliar with asynchronous logic, its history, and its current use models, this presentation is a good high-level overview:


IBM and Intel have separately been working on neuromorphic computing for years, which is a form of top-down asynchrony, and interesting too.

 
Bam! The time is right.

Let's do this. I will implement in TS16ffc. They have shuttles.
 
The problem with asynchronous logic in the past has been the lack of tools for design and verification.

These guys did it fairly recently - https://etacompute.com/# - but dropped it again (AFAIK).

 
After AI/machine learning, there will be quantum computing, and that will change everything from the transistor level to chip design. Everything will change: EDA, architecture design, verification, fabrication, and test.
 
I doubt there will be much quantum computing, it's not the sort of thing that does video decompression or neural networks. I am, however, optimistic about (quasi) adiabatic logic as a way to get to lower power.
 
I doubt there will be much quantum computing, it's not the sort of thing that does video decompression or neural networks. I am, however, optimistic about (quasi) adiabatic logic as a way to get to lower power.
If anything, quantum computers will excel in neural networks; it's just the kind of massively parallel application that lends itself to quantum computing.
 
QC does not lend itself to anything that uses a lot of data, and the requirement for super-cooling it limits its use to server farms. QC does polynomial stuff well; AI is used for NP-complete problems.
 
If anything, quantum computers will excel in neural networks; it's just the kind of massively parallel application that lends itself to quantum computing.
No QC is anything like massively parallel. The largest problems solved, measured by combinatorial complexity, have been toy versions of classes of problems that do scale hard, but the toys can be solved by hand.

The "quantum supremacy" examples have been very weird and specialized problems.

The qubits themselves are giant objects in any known design; their size is more comparable to old vacuum-tube machines when you include all the infrastructure around them. The problem of running error correction to keep coherency long enough to solve large problems is itself a high-speed sensing, compute, and actuation problem with serious performance needs.

If you like things cold, it might be interesting to look at liquid-nitrogen CMOS with carefully selected work-function materials. An ARM/IMEC investigation reported last year (IEDM 2022, session 23.5) showed an overall efficiency advantage of 3x even after the disadvantage of refrigeration.
 