Can Intel recover even part of their past dominance?

Beyond a given point, lowering voltage further increases power consumption per operation.

If you look at the power-delay product, this tells you the energy needed to do a given operation. Dynamic power (CV^2) drops with lower voltage (leakage less so), but so does clock speed, and more rapidly as you get closer to the threshold voltages. For a given gate type (e.g. ELVT, ULVT, LVT, SVT), clock speed, and activity percentage there is a supply voltage where PDP reaches a minimum, and this is where the power consumption is also a minimum -- as VDD drops you have to run slower but with more parallel circuits, which works for many things but not all. And if you're really bothered about power efficiency, you also need to vary VDD with process corner and temperature, and also with circuit activity and clock speed.

For the circuits we've looked at in N3 and N2, which are relatively high activity (e.g. DSP, FEC...), the lowest PDP is usually with ELVT, but the optimum voltage has never been as low as 0.225V -- for lower-activity circuits where ELVT leakage is too high compared to dynamic power, ULVT can be better. But there's no single "best" answer (transistor type, voltage, frequency); it all depends on what the circuits are doing... ;-)
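To make the shape of that trade-off concrete, here's a toy model -- my own sketch with made-up constants (Vth, alpha, Vnom, k_leak), not real library data: the CV^2 term falls as VDD drops while the leakage-times-delay term blows up as VDD approaches Vth, so the normalised energy per operation bottoms out somewhere around 0.4V, and at a somewhat higher voltage for low-activity circuits.

```python
# Toy model (illustrative assumptions only, not real N3/N2 library data) of why
# energy per operation has a minimum versus supply voltage.
Vth, alpha, Vnom = 0.30, 1.5, 0.75    # assumed effective Vth, alpha-power exponent, nominal VDD
k_leak = 0.004                        # assumed leakage-energy coefficient (relative units)

def rel_delay(v):
    """Alpha-power-law gate delay, normalised to 1.0 at nominal VDD."""
    d = lambda x: x / (x - Vth) ** alpha
    return d(v) / d(Vnom)

def energy_per_op(v, activity):
    """Normalised energy per useful operation: CV^2 term plus leakage x time term."""
    e_dyn = v ** 2                                    # dynamic (CV^2) switching energy
    e_leak = (k_leak / activity) * v * rel_delay(v)   # leakage integrated over the (longer) cycle
    return e_dyn + e_leak

vdd_range = [0.35 + 0.001 * i for i in range(451)]    # sweep 0.35 V ... 0.80 V
for act in (0.2, 0.05):                               # high-activity vs low-activity circuit
    v_opt = min(vdd_range, key=lambda v: energy_per_op(v, act))
    print(f"activity {act}: minimum energy/op at VDD ~ {v_opt:.2f} V")
```

With these made-up numbers the minimum lands around 0.37V for the high-activity case and around 0.43V for the low-activity one; that's only meant to show the shape of the curve, not to reproduce any real figures.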
Electricity accounts for 80% of BTC mining cost.
You can try a better solution and there's a lot of money to be made.
Intel was once a wannabe player.
 
Which is true -- I was trying to correct the misapprehension that lower voltage is always better for efficiency/energy use, because it's not. However, the minimum-PDP VDD is well below anything used by chips like CPUs and GPUs today, where lower voltage does always improve efficiency. Depending on process corner (and circuit, and clock speed, and activity, and transistor type, and phase of the moon...) it's usually around 0.4V or a bit lower, which is in the depths far below where CPUs lurk... ;-)

But you can't just take a chip designed for "normal-voltage" operation and drop the supply voltage massively, because it won't work, at least not reliably -- if you want to operate down in this region you need to use special libraries, tool precautions, and new timing checks, because gate delay variation and sensitivity to supply voltage drops get rapidly worse. TSMC enforce special rules for ULV operation, and the voltage where this kicks in varies with transistor type (ELVT, ULVT, LVT, SVT) -- which then causes bigger issues with mixing transistor types (e.g. uncorrelated Vth), because the delay tracking between types gets worse and worse.

All this imposes some penalties on design which reduce performance and increase area (as does going slower and more parallel), so you don't want to do this for a chip which spends most of its time (and dissipates most of its power) at higher Vdd (e.g. 0.5V and above), like a CPU. However, if you have a chip which has one job to do, where power consumption is all-important, and you're willing to use adaptive supply voltage, it's a price worth paying -- we've been doing this for some time now; the typical power saving is similar to a complete process node step, and the worst-case saving is closer to two process nodes... :-)
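To make "adaptive supply voltage" a bit more concrete, here's a rough sketch of the idea -- my own illustration with invented corner factors and target numbers, not any real flow: for each process/temperature corner you pick the lowest VDD that still meets the target clock, so slow/cold silicon gets a higher supply than fast/hot silicon.

```python
# Sketch only: per-corner supply selection so each corner just meets an assumed
# target speed. The alpha-power delay model and corner factors are invented for
# illustration; a real flow would use signoff timing at every PVT point.
Vth, alpha = 0.30, 1.5                                            # assumed effective Vth and alpha exponent
corner_speed = {"FF_hot": 1.30, "TT_25C": 1.00, "SS_cold": 0.75}  # relative drive strength, assumed

def rel_speed(vdd, corner):
    """Relative gate speed at supply vdd for a given corner."""
    return corner_speed[corner] * (vdd - Vth) ** alpha / vdd

target = 0.18   # required relative speed (arbitrary units), assumed

for corner in corner_speed:
    vdd = 0.35
    while rel_speed(vdd, corner) < target:                        # raise VDD until this corner meets timing
        vdd += 0.005
    print(f"{corner}: VDD ~ {vdd:.3f} V")
```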
 
If Intel design an SRAM that scales, they could regain the lead. But SRAM is like other memory: scaling seems to have ended. SRAM is 90% of the area of a logic chip.

This leads to an observation: who knows memory better than Samsung? Maybe Samsung is a dark horse in the battle to scale SRAM.
 
Here is the story:

HP had a big R&D group which spun out as Agilent Technologies, whose semiconductor group became Avago. This was Hock Tan's doing. Avago had an IP group that had the lead in SerDes and other IP, so based on that IP Avago did custom ASICs. This was back when IBM, LSI Logic, VLSI Technology, NEC, and other Japanese semiconductor companies owned the ASIC market. IBM really was a force of nature back then. Avago bought LSI Logic, GlobalFoundries bought IBM's semiconductor business, and there was other consolidation. Avago became Broadcom, again Hock Tan's doing, and the ASIC business grew. Last I heard it was $30B+ of BRCM revenue.

Avago did Google's first TPUs, but Google built up internal teams so they do most of their own design now. Avago still handles some of the backend stuff. I worked for an EDA company that was inside Google for 16nm, 7nm, 5nm, and 3nm. The TSMC N2 TPU is now in process. They wrote some very big checks and are a coveted EDA/IP customer. Broadcom, on the other hand, has always been cheap on EDA tools. I worked on a couple of projects with them back in the 1990s and it was rough. From what I hear Hock Tan has continued that tradition of sharp penciling.
The analyst Beth Kindig just published a piece saying that Broadcom sells these TPUs to Google for $13,000 apiece (not bad for just doing some of the backend stuff). There is also an order backlog of $73B in the next few quarters (not just Google).
 
Which is true -- I was trying to correct the misapprehension that lower voltage is always better for efficiency/energy use, because it's not. However, the minimum-PDP VDD is well below anything used by chips like CPUs and GPUs today, where lower voltage does always improve efficiency. Depending on process corner (and circuit, and clock speed, and activity, and transistor type, and phase of the moon...) it's usually around 0.4V or a bit lower, which is in the depths far below where CPUs lurk... ;-)

But you can't just take a chip designed for "normal-voltage" operation and drop the supply voltage massively, because it won't work, at least not reliably -- if you want to operate down in this region you need to use special libraries, tool precautions, and new timing checks, because gate delay variation and sensitivity to supply voltage drops get rapidly worse. TSMC enforce special rules for ULV operation, and the voltage where this kicks in varies with transistor type (ELVT, ULVT, LVT, SVT) -- which then causes bigger issues with mixing transistor types (e.g. uncorrelated Vth), because the delay tracking between types gets worse and worse.

All this imposes some penalties on design which reduce performance and increase area (as does going slower and more parallel), so you don't want to do this for a chip which spends most of its time (and dissipates most of its power) at higher Vdd (e.g. 0.5V and above), like a CPU. However, if you have a chip which has one job to do, where power consumption is all-important, and you're willing to use adaptive supply voltage, it's a price worth paying -- we've been doing this for some time now; the typical power saving is similar to a complete process node step, and the worst-case saving is closer to two process nodes... :-)
You certainly cannot just scale Vdd and expect the original HPC application to scale. The call for 0.25V Vdd is about getting that power efficiency for the same workload, hence it requires device and architecture innovations.
 
And as I said there's an optimum voltage for lowest PDP (power-delay product), and in my experience (all transistor types, N7 down to N2) it's never as low as 0.25V even with ELVT transistors -- that's from extensive evaluation using real cell libraries across a wide range of conditions.

Yes you can get very low power consumption per gate because dynamic power (CV^2) always drops as you reduce voltage, but once you get below the optimum-PDP voltage -- which is invariably bigger than the sum of the NMOS and PMOS Vths -- delay goes up faster than power drops, so power for the same workload increases. Lower-Vth transistors like ELVT have a lot more leakage, so especially in lower-activity/lower-clock-rate circuits these can make things worse, not better.

No magic device/architecture wand possible here, it's a fundamental property of CMOS transistors -- and the voltages have only moved down slightly over several process nodes, because the shape of the Ids vs. Vgs curves has hardly changed, and neither has subthreshold slope -- Nanosheets in N2 instead of FinFETs show a small improvement, but not that significant.
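For a rough feel of why the subthreshold slope dominates this -- the numbers below are my own round figures, purely illustrative -- leakage rises by about a decade for every ~60-70mV that Vth comes down, which is why ELVT leaks so much more than LVT:

```python
# Illustrative only: exponential leakage penalty for lower-Vth flavours.
# SS (subthreshold swing) and the Vth offsets below are assumed round numbers.
SS = 0.070                                      # V/decade, near the ~60 mV/dec room-temperature limit
for name, dvth in [("LVT", 0.00), ("ULVT", -0.08), ("ELVT", -0.15)]:   # Vth shift vs LVT, assumed
    print(f"{name}: ~{10 ** (-dvth / SS):.0f}x the LVT leakage")
```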

Here's an old example for N7, so not giving anything secret away... ;-)
 

Attachment: PDP.jpg
The analyst Beth Kindig just published a piece saying that Broadcom sells these TPUs to Google for $13,000 apiece (not bad for just doing some of the backend stuff). There is also an order backlog of $73B in the next few quarters (not just Google).

Compare that to what Nvidia charges... $13k seems to be pretty cheap.

How many TPU chips does Broadcom do for Google per year? Hundreds of thousands versus millions?

Google has two N2 designs in progress that I am aware of. Once the design is complete, Google hands over functionally verified RTL to Broadcom or MediaTek for implementation, packaging, and testing. The chips use CoWoS, so TSMC is also a collaborator in this.

Some ASIC companies do the complete design from spec to chip. Some get GDSII and have the chips made. Google is a hybrid where they do the design, integrate IP, run simulations based on the PDKs, and do functional verification. It really is a close collaboration between the customer, the ASIC partner, and the foundry.
 
Broadcom's CEO said the order backlog for the next 6 quarters is $73B for AI processors (those are confirmed orders -- there will be additional orders, so the total will be higher).

$13,000 per device just to do "some backend stuff"... doesn't sound cheap. Also, Broadcom's gross margins (all products) are 78%.
 
I have heard from multiple sources that Google's own design teams are taking over more and more of the TPU design. Does anyone have more specifics on which TPU version this started with, and how the work is split (%) between them?
 
And as I said there's an optimum voltage for lowest PDP (power-delay product), and in my experience (all transistor types, N7 down to N2) it's never as low as 0.25V even with ELVT transistors -- that's from extensive evaluation using real cell libraries across a wide range of conditions.

Yes you can get very low power consumption per gate because dynamic power (CV^2) always drops as you reduce voltage, but once you get below the optimum-PDP voltage -- which is invariably bigger than the sum of the NMOS and PMOS Vths -- delay goes up faster than power drops, so power for the same workload increases. Lower-Vth transistors like ELVT have a lot more leakage, so especially in lower-activity/lower-clock-rate circuits these can make things worse, not better.

No magic device/architecture wand possible here, it's a fundamental property of CMOS transistors -- and the voltages have only moved down slightly over several process nodes, because the shape of the Ids vs. Vgs curves has hardly changed, and neither has subthreshold slope -- Nanosheets in N2 instead of FinFETs show a small improvement, but not that significant.

Here's an old example for N7, so not giving anything secret away... ;-)
You're correct for most chips, but not for BTC ASICs.
The Bitmain 7 nm miner is called the S19 or S19 Pro, and its chips run at around 0.32 V.
BTC chips are currently the industry leaders in achieving the lowest operating voltage and power consumption.
 
You're correct for most chips, but not for BTC ASICs.
The Bitmain 7 nm miner is called the S19 or S19 Pro, and its chips run at around 0.32 V.
BTC chips are currently the industry leaders in achieving the lowest operating voltage and power consumption.
The laws of physics apply to BTC ASICs just like they do for all others, and no amount of being "an industry leader" can get round them... ;-)

If you're willing to do things like accepting reduced yield by skewing/tightening the process window towards the FF corner then you can push the minimum-PDP voltage down, as can be seen from the RH plot I showed (which was for N7) -- but you have to take extreme care doing this because leakage goes up significantly and this increases rapidly with temperature, so you risk thermal runaway at high chip power levels on fast chips.

That's nothing to do with some super-cleverness in design or "secret sauce", it's simply a commercial decision like CPU binning. If Bitmain claim otherwise and you believe them, you're drinking their Kool-Aid... ;-)

P.S. "VDD down to 0.32V" (in the hot FF corner) is more believable; this would also be the minimum-power corner -- but typical would probably be around 0.37V (and higher power), and the cold SS corner would need something like 0.43V (maybe even higher) and dissipate even more power (higher CV^2) -- because that's how CMOS works over PVT variation with adaptive supply voltages... ;-)
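Rough arithmetic on that corner spread, using the example voltages above and the dynamic CV^2 term only (ignoring leakage):

```python
# CV^2 scaling only, using the example corner voltages quoted above.
v_ff, v_tt, v_ss = 0.32, 0.37, 0.43
print(f"TT vs FF: {(v_tt / v_ff) ** 2:.2f}x dynamic energy per operation")   # ~1.34x
print(f"SS vs FF: {(v_ss / v_ff) ** 2:.2f}x dynamic energy per operation")   # ~1.81x
```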

P.P.S. Have you ever actually been involved in real physical design in N7/N5/N4/N3/N2, to give some credibility to your statements?
 