
Intel’s Ex-Exec Raja Koduri Says “You Don’t Learn Without Shipping”

tedc

Member
 
Make all inventory of PVC (including ones in the Argonne Exascale installation) available to Github developers with no "cloud friction".
Does this mean the Argonne machine sits idle sometimes? That is not a great look imo. More like putting lipstick on a pig.

What am I missing?
 



Intel had a tough 2024. Wishing Intel a wonderful and productive 2025 and a path forward. A lot has been written about Intel lately, largely doom and gloom. Many in the industry, folks at Intel, and friends and family circles have reached out. Some don't see any hope of a turnaround; others wonder if this is the low point and whether they should invest now.

"Intel is so far behind on AI and have no strategy", "they are still many years behind TSMC on process technology" and the list goes from the Intel bears, who seem the majority now. Bulls have "hope" and the bears counter with "hope is not a plan".

I am in the bulls' camp, and the rest of this article outlines my perspective and opinion, which is the basis of my hope. My central thesis is that Intel needs to set itself an audacious product target that inspires its whole engineering organization to rally behind it. To achieve such targets, the entire technology stack, which includes transistor physics, advanced packaging, silicon design, and software architecture, needs to take shared risks.

There is chatter about splitting process technology and product engineering into separate companies; this could be counterproductive. Creating an arm's-length foundry relationship now risks crippling the only company theoretically capable of innovating across the entire stack, from fundamental physics (atoms) to software (Python).

Intel Treasures and Snakes

[Image: Intel snakes and treasures]

Intel Treasures

Intel still has a ton of IP and technology. These are gems that many in the ecosystem envy, and many of these innovations have been sitting on the shelf. They span process technology, advanced packaging, optics, advanced memories, thermals, power delivery, CPU, GPU, and much more. Some of these innovations could give Intel products order-of-magnitude improvements in Performance, Performance/$, and Performance/Watt, the metrics that determine ultimate leadership in all computing domains - across data centers, edge, and personal devices.

Intel Snakes

The tragedy of Intel's treasures lies in their delayed or deferred deployment. For over five years, the company's product roadmap – the vital pipeline for bringing these innovations to market – has been clogged by manufacturing challenges. While the troubles began with 14nm, the 10nm node became an unprecedented bottleneck that cost Intel half a decade of leadership. However, manufacturing delays tell only part of the story. Deeper issues, rooted in culture and leadership, prevented Intel from making pragmatic decisions – such as timely adoption of external manufacturing capabilities like TSMC when internal solutions faltered.

At its core, Intel's DNA is built on performance leadership – the relentless pursuit of benchmark-breaking excellence. Every aspect of its business model, from marketing to sales, is calibrated for being the undisputed leader in its chosen segments. NVIDIA shares this performance-first DNA, evident in its relentless pursuit of benchmark supremacy at any cost. "Performance DNA" companies also build products ahead of customers' needs; they are always ahead of the curve. Neither company thrives as a value or services player – they're not built to compete primarily on value metrics like performance/$ or to deliver services per customer requests. While value/service-oriented companies can be tremendously successful, transforming a performance-focused company into a value player requires major cultural surgery; the reverse transformation is far more natural. Running a foundry service will be a challenging transition for Intel. Licensing partnerships with companies that are already in the foundry services business could be a more pragmatic approach.

The "spreadsheet & powerpoint snakes" – bureaucratic processes that dominate corporate decision-making – often fail to grasp the true cost of surrendering performance leadership. They optimize for minimizing quarterly losses while missing the bigger picture. These processes multiply and coil around engineers, constraining their ability to execute on the product roadmap with the boldness it requires. A climate of fear surrounds any attempt at skunkworks initiatives outside established processes – one misstep, and the bureaucratic snakes strike. This environment has bred a pervasive "learned helplessness" throughout the engineering ranks, stifling the very innovation culture that built Intel's empire. Learned helplessness is a set of behaviors where we give up on escaping a painful situation, because our brain has gradually been taught to assume powerlessness in that situation.

Transformation

Having witnessed companies rise from the ashes before, I know transformations are possible, even from the depths of despair. While financial engineering provides essential sustenance for development, it alone cannot ignite the spark that drives engineers to build something truly revolutionary. In the cutting-edge world of technology, engineers need more than just resources – they need an inspiring, almost audacious target to pursue. The ideal target should be simultaneously intimidating and inspiring: intimidating because it pushes the boundaries of what's possible, inspiring because it represents a leap forward for computing. Leadership's role isn't just to set these targets – it's to provide the tools, show the path forward, and get their hands dirty alongside the team in the trenches.

The pursuit of a formidable challenge – a "big bad monster" – has universal appeal, regardless of experience level. In today's AI computing landscape, what could serve as that inspiring yet daunting target? Let's begin with the hardware challenge.

The Big Bad Monster NVL72

Consider NVIDIA's NVL72, the current apex predator in AI computing:
  • 360 PFLOPS of raw FP8 compute (no sparsity)
  • 576 TB/Sec of HBM Bandwidth at 18.8 TB Capacity
  • 130 TB/Sec of GPU-GPU Bandwidth through NVLink
  • ~$3M price
While NVIDIA's individual GPUs (B100/B200) are impressive in their own right, it's the NVL72's density and scale that truly intimidates. This isn't just about raw GPU power – it's a masterclass in system architecture, showcasing state-of-the-art scale-up and scale-out bandwidth capabilities that set new industry standards. This breakthrough performance comes with corresponding costs: a premium price tag and substantial power consumption (~120-132 KW). Yet NVIDIA commands this premium because the NVL72 stands alone in delivering this combination of generality and performance.

Let's capture the picojoules-per-FLOP (pJ/FLOP) of the NVL72 system, as that will be handy later. 132 kW is 132,000 × 10¹² pJ/s, so:

NVL72 pJ/FLOP = (132,000 × 10¹²) / (360 × 10¹⁵) ≈ 0.4

Inspiring 2027 Target

Here's what I propose as Intel's moonshot system targets:
  • 1 ExaFlop of raw FP8/INT8 compute performance
  • 5 PB/Sec of "HBM" bandwidth at 138 TB Capacity
  • 2.5 PB/Sec of GPU-GPU bandwidth
  • All while maintaining a 132 KW power envelope
  • At $3M price
Let's grasp the audacity of these targets:
  • 3X leap in compute performance
  • 10X revolution in memory bandwidth and capacity
  • 20X breakthrough in interconnect bandwidth.
  • All while maintaining the same power envelope and cost
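A quick sanity check of these multipliers against the two spec lists above (FP8 compute is strictly 2.8X, rounded up to 3X in the framing here):

```python
# Sanity-check the target multipliers against the NVL72 baseline above.
nvl72  = {"pflops_fp8": 360,  "hbm_tb_s": 576,  "gpu_gpu_tb_s": 130,  "power_kw": 132}
target = {"pflops_fp8": 1000, "hbm_tb_s": 5000, "gpu_gpu_tb_s": 2500, "power_kw": 132}

for key in nvl72:
    print(f"{key}: {target[key] / nvl72[key]:.1f}x")
# pflops_fp8: 2.8x, hbm_tb_s: 8.7x, gpu_gpu_tb_s: 19.2x, power_kw: 1.0x
```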
Intel possesses all the technological ingredients to achieve these spectacular specifications. With complete organizational alignment and focus, they can get there. And we should expect NVIDIA to set their sights on similar – or even more ambitious – parameters. You need to go beyond your best to compete against NVIDIA, and you also need to show up again and again in 2028 and 2029.

It's also important to mention that the specs above will translate into very compelling systems in the 1 PetaFLOP (132W) and 100 TeraFLOP (13W) ranges as well, giving Intel an excellent leadership stack from mobile, mini-PC, and desktop to data centers. Intel would have the ability to offer a single stack from device to data center to deploy excellent open models like DeepSeek efficiently to consumers and enterprises. A single system that can productively host the whole 670B-parameter DeepSeek model for under $10K is very much in Intel's realm.
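The memory footprint is the binding constraint for that single-system claim. A back-of-envelope sketch, where the FP8 quantization and KV-cache budget are illustrative assumptions rather than figures from this article:

```python
# Back-of-envelope memory footprint for hosting a ~670B-parameter model.
# FP8 weights (1 byte/param) and the KV-cache budget are assumptions.
params = 670e9
weights_gb = params * 1 / 1e9     # ~670 GB of FP8 weights
kv_cache_gb = 150                 # illustrative serving budget
print(f"~{weights_gb + kv_cache_gb:.0f} GB total -> fits in a ~1 TB near-memory box")
```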

There's a "deepseek" moment in cost within the next 3-5 year horizon. What gives me this optimism? One should go down to first principles and look at the following factors -
  • How many logic and memory wafers do we need for the specs above
  • The price of these wafers
  • Pflops-per-mm2
  • Gbytes-per-mm2
  • Wafer yields
  • Assembly and rest-of-system overheads
  • Margin
With ~0.01 PFLOP/mm² for logic and ~0.03 GB/mm² for memory, you can construct a simple first-principles cost range for these products. You will be amazed to see that there is a 5-10x opportunity on dollars. Exploiting this opportunity is impractical if you don't own most of the components and, more importantly, the final assembly (3D, 2.5D, 2D...).

(I have a simple GPUFirstPrincipleCost web app in the works where you can feed in your inputs and assumptions and the app calculates the cost. I will share it when done.)
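Until that app ships, here is a minimal sketch of such a calculator. The two density figures come from the paragraph above; every other input (wafer area, wafer prices, yield, overheads, margin) is an illustrative assumption, not real pricing data:

```python
# First-principles cost sketch for the 1 EF / 138 TB system above.
PFLOP_PER_MM2 = 0.01       # logic density (from the article)
GB_PER_MM2 = 0.03          # memory density (from the article)
TARGET_PFLOPS = 1000       # 1 ExaFLOP FP8
TARGET_GB = 138_000        # 138 TB

WAFER_AREA_MM2 = 55_000    # usable area of a 300 mm wafer (assumed)
LOGIC_WAFER_COST = 20_000  # $/wafer (assumed)
MEM_WAFER_COST = 8_000     # $/wafer (assumed)
YIELD = 0.7                # good-die fraction (assumed)
ASSEMBLY_OVERHEAD = 1.5    # packaging + rest-of-system multiplier (assumed)
MARGIN = 1.5               # gross-margin multiplier (assumed)

logic_wafers = TARGET_PFLOPS / PFLOP_PER_MM2 / (WAFER_AREA_MM2 * YIELD)
mem_wafers = TARGET_GB / GB_PER_MM2 / (WAFER_AREA_MM2 * YIELD)
silicon = logic_wafers * LOGIC_WAFER_COST + mem_wafers * MEM_WAFER_COST
print(f"{logic_wafers:.1f} logic + {mem_wafers:.0f} memory wafers")
print(f"implied system price: ${silicon * ASSEMBLY_OVERHEAD * MARGIN / 1e6:.2f}M")
```

With these particular assumptions the implied price lands near $2.3M; more aggressive wafer-cost and margin assumptions are what open up the 5-10x gap.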

Let us now look at the pJ/FLOP derived from the ambitious targets above:

Intel 2027 pJ/FLOP = (132,000 × 10¹²) / (1000 × 10¹⁵) ≈ 0.1
Achieving this target requires a 4X reduction in pJ/FLOP – a daunting challenge in the post-Moore's-law era. However, Intel's Lunar Lake silicon already demonstrates promising efficiency, delivering ~100 INT8 TOPS (GPU+NPU) at ~20W, or ~0.2 pJ/op. This baseline proves Intel possesses IP capable of competitive efficiency.
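In code, the same energy-per-op arithmetic for all three data points in this section:

```python
def pj_per_op(watts: float, ops_per_sec: float) -> float:
    """Energy per operation: (J/s) / (ops/s), converted to picojoules."""
    return watts / ops_per_sec * 1e12

print(pj_per_op(132_000, 360e15))  # NVL72 today        -> ~0.37 pJ/FLOP
print(pj_per_op(132_000, 1e18))    # 2027 target        -> ~0.13 pJ/FLOP
print(pj_per_op(20, 100e12))       # Lunar Lake (INT8)  -> ~0.20 pJ/op
```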
Four key connected challenges to overcome:
1. Find another 2X efficiency to get to 0.1 pJ/FLOP
2. While scaling compute 10,000X to get to an ExaFLOP (including the cost of interconnect)
3. While delivering 10X near-memory bandwidth
4. While staying compatible with the existing Python/C/C++ GPU software (i.e., no esoteric diversions like quantum, neuromorphic, logarithmic, and other ideas being pursued by a few startups)
Three key contributors to on-chip power (in femtojoules, fJ):
  • Math ops: ~8 fJ/bit
  • Memory: ~50 fJ/bit
  • Communication: ~100 fJ/bit/mm
All state-of-the-art designs will be in the same ballpark on math-ops power; they will be close to whatever the leading process node of the time (TSMC N2, Intel 14A, etc.) entitles them to. Most of the interesting differentiation for Intel needs to come from the memory and communication aspects.
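To see why, here is an illustrative per-op energy budget built from the three fJ/bit figures above. The operand bit counts and wire lengths are assumptions for illustration only:

```python
# Illustrative FP8 energy budget from the fJ/bit figures above.
MATH_FJ_PER_BIT = 8            # math ops
MEM_FJ_PER_BIT = 50            # memory access
COMM_FJ_PER_BIT_MM = 100       # communication, per mm travelled

BITS_PER_OP = 16               # two FP8 operands per MAC (assumed)
MEM_BITS_PER_OP = 2            # bits fetched per op with high reuse (assumed)

def pj_per_op(wire_mm: float) -> float:
    math_fj = MATH_FJ_PER_BIT * BITS_PER_OP
    mem_fj = MEM_FJ_PER_BIT * MEM_BITS_PER_OP
    comm_fj = COMM_FJ_PER_BIT_MM * MEM_BITS_PER_OP * wire_mm
    return (math_fj + mem_fj + comm_fj) / 1000

for mm in (2.0, 1.0, 0.2):     # shrinking distance via advanced packaging
    print(f"avg wire {mm} mm -> {pj_per_op(mm):.2f} pJ/op")
```

The math term is fixed by the process node; only the movement terms scale with distance, which is exactly where advanced packaging buys its efficiency.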
At IEDM recently, NVIDIA published the picture below.

[Image: NVIDIA IEDM slide]
Advanced packaging techniques have demonstrated potential for 10-20X reduction in Fj/bit for memory access and inter-chip communication. Intel's zetta-scale development work in 2021-2022 produced working prototypes demonstrating these gains. While the zetta-scale initiative that sparked this work has been discontinued, its technical foundations remain relevant.

https://www.servethehome.com/rajas-chip-notes-lay-out-intels-path-to-zettascale/

Having silicon designers and advanced-packaging engineers with access to the latest process technology from both TSMC and Intel is a huge advantage in pulling off the iterations needed to productize these technologies sooner rather than later. Intel started this journey 5+ years ahead of others. Kaby Lake-G (EMIB with a GPU), Lakefield (the first 3D-stacked high-volume chip), and Ponte Vecchio (47 chiplets in a package, combining state-of-the-art 2.5D and 3D) are key examples.

Mastering advanced technology is a relentless iterative process, and you need to ship to learn and improve. There is no "magic" where you solve yields, cost, performance, thermals, and reliability in one single shot. Intel started this iteration loop ahead of the industry but had many self-inflicted stutters in between, primarily by cancelling projects (some of which were even ready for sampling and production), switching to external acquisitions, and restarting the iteration loop. Going back to the Larrabee days (2009), Intel has had at least eight starts and stops in its throughput-computing architecture iteration loop, and at each of these junctures it switched to a different architecture without benefiting from the earlier loop.

Back to the math again: the "math ops" targets by themselves shouldn't be a problem. Intel should be able to do <0.05 pJ/FLOP (FP8) without heroics, and you need that buffer to pay for the power of memory and communication. To hit the 2027 system goals they would also need to find a 2X opportunity outside math. Exploiting the reduction of distance (mm) through advanced packaging is a key tool to achieve that, and I believe they have the technologies to accomplish this. Next are the memory bandwidth targets.

Intel also has homegrown memory technologies that can usher in the era of near-memory computing. Some hints are linked below:

https://www.tomshardware.com/news/intel-patent-reveals-meteor-lake-adamantine-l4-cache
Whether it's their homegrown technology or a "tight" partnership with the DRAM industry, there are 10X bandwidth-increase opportunities in the 3-4 year horizon. Whoever takes the first risks and executes can be far ahead of the rest. Interestingly, the technologies that help deliver 10X memory bandwidth also help with the 20X communication bandwidth target. The key is to free up more of the chip perimeter for chip-to-chip communication.
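A rough way to see the perimeter argument: off-die bandwidth is approximately shoreline times signaling rate per mm. The die size and per-mm rate below are illustrative assumptions:

```python
# "Beachfront" model: off-die bandwidth ~ usable edge length x rate per mm.
DIE_EDGE_MM = 25          # 25 mm x 25 mm die (assumed)
GBPS_PER_MM = 1000        # aggregate edge signaling rate (assumed)

perimeter_mm = 4 * DIE_EDGE_MM
for io_fraction in (0.25, 0.75):   # edge fraction free for chip-to-chip links
    tb_per_s = perimeter_mm * io_fraction * GBPS_PER_MM / 8 / 1000
    print(f"{io_fraction:.0%} of edge for I/O -> {tb_per_s:.1f} TB/s per die")
```

Moving memory traffic off the perimeter (e.g., by stacking it in 3D) roughly triples the shoreline available for chip-to-chip links in this toy model.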

Intel also has excellent silicon photonics technology, which won't amount to anything if it isn't integrated into products to start the learning loop. All technologies and IP are perishable goods with an expiry date. Consume them before it's too late.

Let's talk about Intel's scalability and software now. Recently I got access to an Intel PVC 8-GPU system on their Tiber cloud. I also had access to 8-GPU setups from AMD and NVIDIA. All three systems are floating-point beasts. Here are their FP16/BF16 specifications:
  • NVIDIA 8xH100 - 8 PF
  • AMD 8xMI300 - 10.4 PF
  • Intel 8xPVC - 6.7 PF
I wrote a custom benchmark tool (called torchure 😄) to understand the performance of these systems across various sizes and shapes of matrices. The motivation came from observing the traces of various AI models: the majority of the runtime is dominated by sequences of matrix multiplies, generally on large matrices (4K and above). I also wanted to exercise these systems with PyTorch - standard PyTorch, no fancy libraries or middleware. My thesis is that the quality, coverage, and performance of standard PyTorch is a good benchmark for AI-developer productivity on different GPUs.
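torchure itself isn't shared here, but a minimal sketch of that kind of sweep in stock PyTorch looks roughly like the following. The matrix sizes, dtype, and timing loop are illustrative choices, and the xpu path assumes a PyTorch build with Intel XPU support (native in 2.4+, or via IPEX):

```python
import time
import torch

# NVIDIA and AMD (ROCm) both expose the "cuda" device; Intel exposes "xpu".
if torch.cuda.is_available():
    device, sync = "cuda", torch.cuda.synchronize
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    device, sync = "xpu", torch.xpu.synchronize
else:
    raise SystemExit("no GPU backend found")

def tflops(n: int, dtype=torch.bfloat16, iters: int = 20) -> float:
    """Achieved TFLOPs for one n x n @ n x n GEMM on the chosen device."""
    a = torch.randn(n, n, device=device, dtype=dtype)
    b = torch.randn(n, n, device=device, dtype=dtype)
    for _ in range(3):                 # warm-up: kernel selection, lazy init
        a @ b
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    sync()
    dt = (time.perf_counter() - start) / iters
    return 2 * n**3 / dt / 1e12        # a square GEMM costs 2*n^3 FLOPs

for n in (1024, 2048, 4096, 8192):     # the "shapes and sizes" sweep
    print(f"{n:5d}: {tflops(n):8.1f} TFLOPs")
```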

Software observations: the install and getting things to "work" the first time took more steps with both AMD and Intel, and involved interactions with engineers at both companies before I got going. NVIDIA was straightforward. But I have to acknowledge that both AMD and Intel have made a ton of progress in making PyTorch easy to use compared to where we were two years ago. Intel's driver install and PyTorch setup had a bit less friction than AMD's. AMD supports the torch.cuda device directly, while with Intel you need to map to the torch.xpu device, so there were some code adjustments I needed to make for Intel, but it was not too painful. Intel "sunset" the PVC GPU last year, and from what I have heard the AI software team has been busy with Gaudi for the past few years. My expectations of compatibility and performance on Intel were very low. I was pleasantly surprised that I was able to run my tests to completion - not only on 1 GPU, but all the way up to 8 GPUs. Below are the results for 8x GPUs.

[Image: 8x NVIDIA vs AMD vs Intel]

Across the sweep of different matrix shapes and sizes:
  • Nvidia 8xH100 - 5.3 PF (67% of peak)
  • AMD 8xMI300 - 3.1 PF (30% of peak)
  • Intel 8xPVC - 2.7 PF (40% of peak)

Some observations:
  • Easy to see why NVIDIA is still everyone's darling. This is H100; Blackwell will move the bar up even more.
  • From the SemiAnalysis article, I understand AMD has new drivers coming that seem to improve GEMM numbers substantially, which is good news for AMD. This article is not about AMD or NVIDIA.

  • The surprise here is that the abandoned PVC comes even this close to the top GPUs. PVC is a generation behind MI300X in terms of process technology: the majority of PVC silicon is on Intel 10nm, which is ~1.5 nodes behind TSMC N4. The GPU-to-GPU bandwidth through XeLink seems to perform better than AMD's xGMI solution.

  • There are definitely software optimizations left on the table. They should be able to get to 60% of peak. You can see the impact of software overhead on Intel in the case of smaller matrix dimensions.

  • Intel cancelled the follow-on to PVC, called Rialto Bridge, in March 2023:
    https://www.crn.com/news/components...ter-gpu-road-map-delays-falcon-shores-to-2025
[Image: Rialto Bridge]
  • This chip was ready for tape-out in Q4'22, would have been in volume production in 2024, and was specced to deliver more than H100.
  • AMD began their iteration loop with advanced packaging and HBM with Fiji in 2015 and followed it with Vega in 2017. MI25, MI50, MI100, MI200, and MI250 followed, and eventually MI300. MI300 is AMD's first GPU to cross $1B in revenue. You only learn by shipping.

Getting back to the main thread: the data points above show that Intel has the foundations to compete with the best. They need to be actively playing the game and not thrashing the roadmap. And stop snatching defeat from the jaws of victory.

None of this is going to be easy. All layers of Intel have to go through painful transformations. Executive-leadership musical chairs alone are insufficient.

"Let chaos reign and then rein in chaos."​

This is a famous quote from Andy Grove (probably the last Intel CEO who understood every layer of the company's stack intimately; I often wonder what Andy would do now).

Let's dissect this a bit. Why would you let any chaos reign? Isn't all chaos bad? The answer is no. There is good chaos and bad chaos. Good chaos forces you to invent and change. Major tech and industry transitions are good chaos: the Internet, WiFi, cloud, smartphones, and AI are examples of transitions that can lead to good chaos. Intel benefitted from some of these transitions when it was able to "rein in". Good chaos generally comes from external events. Bad chaos comes from internal issues. I like to call bad chaos "organizational entropy". This is the higher-order bit that decays the efficiency of companies.
https://pdfs.semanticscholar.org/8655/f1d23285639d5833ff4fa0ea4632856011cf.pdf
When entropy crosses a certain threshold, leadership loses control of the company. No amount of executive thrash can fix this situation until you reduce the entropy.

My humble suggestions for whoever takes the leadership mantle at Intel:
  1. Increase the coder-to-coordinator ratio by 10x. This is likely the most painful thing to do, as it could result in a massive reduction in headcount first and some rehiring. Give folks stuck in coordination tasks a re-learning opportunity to get back to coding, or to exit the company. AI tools are great enablers for seniors to get back into hands-on work.

  2. Organize the company around product leadership architecture. Intel can build the whole stack of products from 10W to 150KW with <6 modular building blocks (chiplets) shared across the whole stack. Splitting the company around go-to-market boundaries prevents them from leveraging their leadership IP up and down the stack (e.g., Lunar Lake SoC energy efficiency on Xeon would be awesome, but Xeon energy efficiency is far from leadership today). By leveraging leadership IP across the whole stack, Intel can field top-performing products across client, edge, and data center and get a healthy share of the >$500B TAM accessible to them.

  3. Cancel the cancel culture. The legacy of Intel is built on relentless iteration: iteration cycles to 90% yields on new process technologies every 18 months, the tick-tock model of execution. Stop the "cancel culture". You achieve nothing by cancelling.

  4. Bet on generality and focus on performance fundamentals: ops/clk, bytes/clk, pJ/op, pJ/bit, etc. The boundaries are not CPU, GPU, and AI accelerators. The workloads are an evolving mix of scalar, vector, and matrix computations demanding increasing bandwidth and memory capacity. You have the unique ability to deliver these elements in ratios that can delight your customers and destroy your competitors.
  5. Make a ton of Battlemage and PVC GPUs available to open-source developers worldwide, friction-free. Selling a ton of Battlemage GPUs is a good step toward this; don't worry about the margins on them. This is the most efficient way to get into the hearts and minds of AI developers while delighting millions of gamers worldwide. Battlemage is a great example of the benefit of iteration, with very measurable gains in software robustness and performance since Alchemist in 2022. They will be on a path to leadership if they iterate again and launch Celestial in the next 12 months. Make all inventory of PVC (including the units in the Argonne exascale installation) available to GitHub developers with no "cloud friction": it should be a single click to connect to cloud GPUs from any PC/Mac in the world. Intel GPUs are the most compatible of Intel's choices with the PyTorch/Triton AI developer ecosystem. This effort will help immensely with the leadership 2027 system launch, where more software will be functional on Intel on day one.
Is this all too late?

"Optimism is the essential ingredient of innovation" said Robert Noyce, one of legends that made Intel great.
Below is the legal disclaimer:)

"Everything we hear is an opinion, not a fact. Everything we see is a perspective, not the truth.” — Marcus Aurelius
 
What do you think Raja's end game is here? Is he raising money for his new venture? Or is he trying to sell his new venture?


Or is he looking for a job?
 
What do you think Raja's end game is here? Is he raising money for his new venture? Or is he trying to sell his new venture?


Or is he looking for a job?
There were a lot of rumors that he wanted the Intel CEO slot when Pat got it, and similar rumors that he left AMD for Intel for more career-growth opportunities. Also, Pat allegedly got the job by just sharing his vision with the board (not specifically asking for the job) - but that's rumor, of course.

The only caution I have with "you don't learn without shipping products" is... learning after shipping is the most expensive engineering lesson there is :)
 
He is very good at talking but not at executing. His track record at AMD and Intel shows a similar pattern: every product he developed had beautiful specs and then failed to impress after launch. I also heard a rumor that at Intel he spent millions of dollars on polishing his slides. He should be held responsible for the failures of the Intel Arc A-series and of PVC for the supercomputer at Argonne National Lab. Intel had to cancel the data-center GPU series because of his failure.
 
There were a lot of rumors that he wanted the Intel CEO slot when Pat got it, and similar rumors that he left AMD for Intel for more career-growth opportunities. Also, Pat allegedly got the job by just sharing his vision with the board (not specifically asking for the job) - but that's rumor, of course.

The only caution I have with "you don't learn without shipping products" is... learning after shipping is the most expensive engineering lesson there is :)

Intel wanted Pat Gelsinger on the board; Pat wanted the CEO job. That is how it happened. Pat would have been a better board member for sure, so once again the Intel board made a bad CEO decision.
 
Intel wanted Pat Gelsinger on the board; Pat wanted the CEO job. That is how it happened. Pat would have been a better board member for sure, so once again the Intel board made a bad CEO decision.

I did not realize that nuance on the hiring -- thanks Daniel!

(and agreed re: Pat as a board member).
 