Intel signs Microsoft as foundry customer, says on track to overtake TSMC

I don't get it. As an architecture & design guy, I'd relish the end of Moore's Law, if and when it ever happens. Imagine how important innovative chip architecture and design would have to become to keep adding value over time. If that happened I might actually have to go back to work. (BTW, I'm not trying to be humorous. I'm serious.) But, alas, Moore's Law is not dead, and even though it has slowed, other forms of chip innovation are emerging.
That is an interesting perspective — I like it!

I think the perspective is that a large portion of Intel is dedicated to litho; if you believe you can't scale any further, that belief is probably going to produce a large number of de-energized engineers within Intel ("what is the purpose of my job?"), and that de-energization spreads to others.

For me personally, my 'feeling of Moore's Law' died when Dennard scaling halted (I miss the awesome "everything" scaling of the mid '80s through the mid '00s), but it's still interesting to see how lithography evolves.
 
I'm not sure how much Granite Rapids is going to close the gap with AMD's Turin. It has the same Redwood Cove core found in Meteor Lake, which has basically no IPC improvement at all: https://www.tomshardware.com/pc-com...enchmarks-show-ipc-regressions-vs-raptor-lake
Meanwhile, Turin has Zen 5, which is a major upgrade over Zen 4 - at least a 15% IPC improvement.
Also, most Arrow Lake SKUs are using TSMC N3 - only the low-end SKUs use Intel 20A, according to rumours.
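To put a number on why a flat-IPC generation hurts: per-core performance is roughly IPC × clock, so if you assume (purely for illustration) that the previous cores were at parity and clocks stay equal, a +15% IPC core opens a 1.15x per-core gap. A minimal sketch with assumed figures:

```python
# Rough per-core performance model: perf ~ IPC * clock.
# All numbers are illustrative assumptions, not measured product specs.
redwood_cove_ipc = 1.00   # assumed flat IPC vs. the prior generation
zen5_ipc         = 1.15   # assumed +15% IPC generational uplift
clock_ghz        = 3.8    # assume both run the same all-core clock

perf_intel = redwood_cove_ipc * clock_ghz
perf_amd   = zen5_ipc * clock_ghz
print(f"per-core gap at equal clocks: {perf_amd / perf_intel:.2f}x")  # 1.15x
```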

Going off of this article: https://chipsandcheese.com/2024/01/11/previewing-meteor-lake-at-ces/
I'd say Meteor Lake is much more of a die shrink of Raptor Lake and a move towards a disaggregated architecture. One thing to remember is that Intel has been running 10nm for arguably 4/5 generations (depending on whether you count Cannon Lake - then Ice Lake, Tiger Lake, Alder Lake, and Raptor Lake) and has squeezed absolutely everything out of 10nm, like they did with 14nm. The only thing they didn't do was call it 10nm+++ (which arguably would have made a lot of sense).

Meteor Lake is Intel's first product generation on the Intel 4 process (previously called 7nm, but the name doesn't really matter). Arguably the IPC issues could be attributed to higher latency and the disaggregated architecture. I think Granite Rapids should do much better on the second-generation Intel 3 process, but that remains to be seen.
 
I don't get it. As an architecture & design guy, I'd relish the end of Moore's Law, if and when it ever happens. [...]
Side note: ~10 years ago Nvidia proved the power of chip architecture with their Maxwell architecture (GTX 900 series). In one generation they increased perf/watt by about 40-45% while remaining on the same node as the previous architecture. Initially that paid only 'some' performance dividends over AMD (GPUs weren't yet wattage-limited), but ever since then that one-time investment in architectural efficiency has kept them consistently ahead, and AMD has never caught them in perf/watt for GPUs.
 
Side note: ~10 years ago Nvidia proved the power of chip architecture with their Maxwell architecture (GTX 900 series). [...]
Agreed.

As another example of design and implementation winning over just making CPUs faster, one can take well-defined subroutines, such as networking protocols or complex financial algorithms, use multiple compiler layers to convert them to RTL, run the RTL in a hot and comparatively slow FPGA, and still beat any CPU's performance running the same stuff in software. This was Microsoft's networking acceleration approach for some time (I don't know if it still is). Making ASICs for acceleration is even better, and can have 10-100x advantages over software on a CPU, given the investment in ASIC state-machine logic.

Architecture, design, and implementation quality can be huge factors in performance and efficiency, but it's easier for the CPU design teams to just count on a bump from a better fab process - at least it was until Dennard scaling plateaued. Now it's bigger caches and increased instruction-level parallelism. Sooner or later you end up going back to first principles.
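A back-of-envelope sketch of why the "hot and comparatively slow" FPGA still wins on throughput for a fixed function - all clock rates and cycle counts below are assumptions for illustration, not measurements:

```python
# Back-of-envelope throughput model for offloading a fixed function
# (e.g. a protocol parser) to a fully pipelined FPGA datapath.
# All numbers are illustrative assumptions, not measurements.

cpu_clock_hz   = 3.0e9   # assumed 3 GHz CPU
cpu_cycles_op  = 150     # assumed cycles per record in the software version
fpga_clock_hz  = 250e6   # assumed 250 MHz fabric clock ("hot and slow")
fpga_ops_cycle = 1       # pipelined datapath: one result retired every cycle

cpu_ops  = cpu_clock_hz / cpu_cycles_op    # 20 M records/s
fpga_ops = fpga_clock_hz * fpga_ops_cycle  # 250 M records/s

print(f"CPU : {cpu_ops / 1e6:.0f} M records/s")
print(f"FPGA: {fpga_ops / 1e6:.0f} M records/s "
      f"({fpga_ops / cpu_ops:.1f}x, despite a 12x slower clock)")
```

The point is that a fully pipelined datapath retires one result per cycle, so a ~12x deficit in clock speed is more than repaid when the software version needs ~150 cycles per record.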
 
Just to clarify for me - is backside power delivery an “option” or a requirement for a foundry customer on 18A?

Even if it were an option, that option would require an entirely different DTCOed process / full IP / EDA stack that would cost tens of millions to populate and tune - if the EDA guys even considered it a viable option. I think folks fail to understand the cost of building the ecosystem for every node, especially when the node requires fundamental changes to the standard cells, memories, and PHYs, and subsequently to the EDA.
 
As another example of design and implementation winning over just making CPUs faster, one can take well-defined subroutines, such as networking protocols or complex financial algorithms, use multiple compiler layers to convert them to RTL, run the RTL in a hot and comparatively slow FPGA, and still beat any CPU's performance running the same stuff in software. [...]
Mostly, FPGAs win on latency. That was their main advantage in Catapult networking.

On small functions they can win on throughput - I got a 1,000x perf-per-joule gain out of an FPGA simulating an IP block, for example. But when the functions get larger, smart software algorithms start to overtake, whether on CPU or GPU. And ASICs win overall, of course, if the function is worth that spend.

Just as CPUs hit an energy wall, GPUs and ASICs are hitting it now. You can see GPUs are power monsters running AI, usually throttling back their clocks because the compute load is so intense. They have absorbed the architectural and app-specific headroom of the 15 years or so since CPUs hit the wall. Now you need energy per unit of work - joules per tera-op, for example, or pJ per bit of memory read or data moved over the network - as the dominant goal for improvement in new designs. This will need some adjustment in EDA, which pervasively uses "logical effort" as the optimization target, and that is increasingly different from energy per unit of useful work.

It will be just as important to look at new processes on that metric - energy per unit of work delivered - not simply density. Can a process deliver reductions in capacitance and voltage, and are those theoretical gains maintained in the face of EDA algorithms and routing?
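A minimal sketch of those metrics, with assumed (not measured) numbers - a GPU's joules per tera-op derived from its power and throughput, and the process-level C·V² switching-energy lever mentioned above:

```python
# Energy-per-unit-of-work metrics. All numbers are illustrative
# assumptions chosen to show the shape of the calculation.

# 1) A GPU running AI: joules per tera-op.
gpu_power_w = 700.0    # assumed board power under sustained load
gpu_tops    = 1000.0   # assumed sustained tera-ops/s at that power
j_per_teraop = gpu_power_w / gpu_tops   # 0.7 J/TOp
# 0.7 J per 1e12 ops is 0.7 pJ per op: J/TOp and pJ/op are the same number.

# 2) Process-level lever: supply energy drawn per full charge/discharge
#    cycle of a net is C * V^2, so voltage reductions pay off quadratically.
def switch_energy_fj(c_ff: float, vdd_v: float) -> float:
    """Supply energy for one charge/discharge cycle of a net, in fJ."""
    return c_ff * vdd_v ** 2   # fF * V^2 -> fJ

old_node = switch_energy_fj(c_ff=1.0, vdd_v=0.90)   # assumed old node
new_node = switch_energy_fj(c_ff=0.8, vdd_v=0.75)   # assumed new node

print(f"GPU: {j_per_teraop:.2f} J/TOp ({j_per_teraop:.2f} pJ/op)")
print(f"Per-net switching energy: {old_node:.2f} fJ -> {new_node:.2f} fJ "
      f"({(1 - new_node / old_node) * 100:.0f}% lower)")
```

Because the energy per switching cycle scales with V², the voltage term dominates: the same fractional reduction buys roughly twice as much when taken out of Vdd as when taken out of capacitance.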
 