Research paper: Debunking the CUDA Myth Towards GPU-based AI Systems

XYang2023 · Jan 2, 2025

Evaluation of the Performance and Programmability of Intel’s Gaudi NPU for AI Model Serving

IMakeICsAtHomeJK · Jan 5, 2025

Results indicate that Gaudi-2 achieves energy efficiency comparable to A100, though there are notable areas for improvement in terms of software maturity. (Lee, Lim, Bang, et al., 2024)

Energy efficiency sounds nice, but "... notable areas for improvement in terms of software maturity" sounds like a death sentence for serious consideration by companies and their developers. Also, "software maturity" covers much of what people mean by "usability", so much so that it is often the sole metric used in tool acquisition.

Figure 16: [...] In (c), we compare the end-to-end LLM serving throughput and the corresponding mean time per output token (TPOT) of a single vLLM_opt based Gaudi-2 and A100. To properly reflect LLM serving system’s dynamism and variable output length, we used the Dynamic-Sonnet dataset [10]. (Lee et al., 2024)
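For anyone unfamiliar with the two metrics in that caption, here is a minimal sketch of how throughput and mean TPOT are commonly computed from per-request timings. The paper's exact definitions are not quoted in this thread, so the formula below (decode time divided by output tokens after the first) is an assumption, and `RequestTrace` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timing for one served request (hypothetical field names)."""
    start_s: float        # when the request was submitted
    first_token_s: float  # when the first output token arrived
    end_s: float          # when the last output token arrived
    output_tokens: int    # number of generated tokens


def tpot(trace: RequestTrace) -> float:
    """Time per output token, excluding the first token (assumed definition)."""
    if trace.output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (trace.end_s - trace.first_token_s) / (trace.output_tokens - 1)


def mean_tpot(traces: list[RequestTrace]) -> float:
    """Average TPOT over all requests that produced at least two tokens."""
    values = [tpot(t) for t in traces if t.output_tokens >= 2]
    return sum(values) / len(values)


def output_throughput(traces: list[RequestTrace]) -> float:
    """Generated tokens per second over the wall-clock span of the run."""
    wall_s = max(t.end_s for t in traces) - min(t.start_s for t in traces)
    return sum(t.output_tokens for t in traces) / wall_s
```

With variable output lengths, as in the dataset the caption mentions, the two numbers can move independently: batching more requests tends to raise throughput while also raising TPOT for each request.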


XYang2023 · Jan 5, 2025

IMakeICsAtHomeJK said:
Energy efficiency sounds nice, but "... notable areas for improvement in terms of software maturity" sounds like a death sentence for serious consideration by companies and their developers. Also, "software maturity" covers much of what people mean by "usability", so much so that it is often the sole metric used in tool acquisition.


Based on what I heard, Gaudi Next (whatever) and Falcon Shores will both support oneAPI.

Here is my recent testing of the Intel ARC GPU:

KevinK · Jan 5, 2025

I’m going to question the continued focus solely on CUDA model-running parity, especially for inference. CUDA is just one piece (a crucial one, of course) at the bottom of the inference system software stack. And this doesn’t even include the cluster management and optimization added with the Run:AI deal. Of course, the criteria are different if you are doing research vs. deploying production GenAI systems.

Nvidia Rolls Out Blueprints For The Next Wave Of Generative AI

Hardware is always the star of Nvidia’s GPU Technology Conference, and this year we got previews of “Blackwell” datacenter GPUs, the cornerstone of a 2025

www.nextplatform.com

hist78 · Jan 5, 2025

XYang2023 said:
Evaluation of the Performance and Programmability of Intel’s Gaudi NPU for AI Model Serving

https://arxiv.org/pdf/2501.00210

Has this research paper been published in any peer-reviewed journal?

XYang2023 · Jan 5, 2025

KevinK said:
I’m going to question the continued focus solely on CUDA model-running parity, especially for inference. CUDA is just one piece (a crucial one, of course) at the bottom of the inference system software stack. And this doesn’t even include the cluster management and optimization added with the Run:AI deal. Of course, the criteria are different if you are doing research vs. deploying production GenAI systems.


Nvidia Rolls Out Blueprints For The Next Wave Of Generative AI

Hardware is always the star of Nvidia’s GPU Technology Conference, and this year we got previews of “Blackwell” datacenter GPUs, the cornerstone of a 2025

www.nextplatform.com

I think the most difficult part is CUDA, but other companies are catching up. Intel is leading the UXL/oneAPI efforts:

UXL Foundation: Unified Acceleration

uxlfoundation.org

Then there’s Ultra Accelerator Link (UALink):

Digging Into The Ultra Accelerator Link Consortium

The newly formed UALink Consortium brings together major tech companies to address the vital technical challenge of GPU-to-GPU connectivity in datacenters.

www.forbes.com

And Ultra Ethernet:

Ultra Ethernet Consortium

Delivering an Ethernet based open, interoperable, high performance, full-communications stack architecture to meet the growing network demands of AI & HPC at scale.

ultraethernet.org

Intel also has deep experience in:
* Networking/IPUs/FPGAs
* Silicon photonics/Ayar Labs
* Cloud management/Tiber Cloud/CSP customers
* Liquid and immersion cooling technologies

The Gaudi architecture can be scaled into large clusters:

Intel® Gaudi® 3 AI Accelerator White Paper

This technical paper introduces the next-generation AI accelerator from Intel: the Intel® Gaudi® 3 AI accelerator. The paper provides technical and performance information regarding the new accelerator, including: overview, hardware system, architecture, host interface, compute, software suite...

www.intel.com

Similar to Nvidia's latest GPUs, the upcoming Falcon Shores can reach up to 1500W of power consumption:

AI Power Consumption: Rapidly Becoming Mission-Critical

Generative AI and rising GPU shipments is pushing data centers to scale to 100,000-plus accelerators, putting emphasis on power as a mission-critical problem to solve.

www.forbes.com

Additionally, Intel generally demonstrates better software engineering capabilities than competitors like AMD. For example:

Open Platform For Enterprise AI

Efficiently integrate secure, performant, and cost-effective Generative AI workflows into business value.

opea.dev

Overall, I believe Intel has the necessary components to compete effectively in this market. However, Intel should focus on unifying its resources and accelerating its market strategy to gain a stronger competitive edge, while staying disciplined about minimizing unnecessary spending.

XYang2023 · Jan 5, 2025

hist78 said:
Has this research paper been published in any peer-reviewed journal?

I need to read the paper in a bit more detail. I will find time to do so.

hist78 · Jan 9, 2025

XYang2023 said:
I need to read the paper in a bit more detail. I will find time to do so.

My question is not whether you have read it in detail. My question concerns the credibility and quality of this research paper. As far as I can tell, it hasn't been published in any peer-reviewed publication. Why the authors chose this route, or avoided peer review, is an important question.

XYang2023 · Jan 9, 2025

I think if they write it, they will publish it. Maybe it is currently under review. Personally, I want to have a look at their methods.

siliconbruh999 · Jan 9, 2025

Yeah, it needs someone to peer review it. The quality is nice. I haven't read the full paper; the analysis is nice, but they can pull some tricks.
