Dynamo is open source as well. More importantly, it dynamically reallocates and tunes resources for maximum throughput or minimum token latency for each model inference instance running across an entire data center, optimizing operations as models move through their different phases (prefill, token generation).
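To make the idea concrete, here is a toy sketch of that kind of phase-aware rebalancing: a scheduler that shifts GPU workers between a prefill pool and a decode (token generation) pool based on per-worker backlog. All names here are hypothetical illustrations of the concept, not Dynamo's actual API.

```python
# Toy sketch of phase-aware GPU rebalancing (not the Dynamo API).
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    workers: int          # GPUs currently assigned to this phase
    queue_depth: int = 0  # pending requests for this phase


def rebalance(prefill: Pool, decode: Pool, min_workers: int = 1) -> None:
    """Shift one GPU toward the phase with the deeper per-worker backlog."""
    prefill_load = prefill.queue_depth / max(prefill.workers, 1)
    decode_load = decode.queue_depth / max(decode.workers, 1)
    if prefill_load > decode_load and decode.workers > min_workers:
        decode.workers -= 1
        prefill.workers += 1
    elif decode_load > prefill_load and prefill.workers > min_workers:
        prefill.workers -= 1
        decode.workers += 1


if __name__ == "__main__":
    prefill = Pool("prefill", workers=4, queue_depth=120)
    decode = Pool("decode", workers=4, queue_depth=30)
    rebalance(prefill, decode)
    print(prefill, decode)  # one GPU moves from decode to prefill
```

A real system like Dynamo makes this decision continuously and across many nodes, but the core trade-off is the same: prefill-heavy traffic favors throughput, while decode-heavy traffic favors token latency.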
This blog provides a how-to guide on setting up a Triton Inference Server with the vLLM backend on AMD GPUs, showcasing robust performance with several LLMs.
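Once such a server is up, a quick way to smoke-test it is through Triton's generate endpoint. A minimal sketch follows; the model name "vllm_model", the server address, and the sampling parameters are assumptions, so adjust them to match your model repository.

```python
# Minimal smoke test against a Triton server running the vLLM backend,
# via Triton's generate endpoint. "vllm_model" is a placeholder name.
import requests

TRITON_URL = "http://localhost:8000/v2/models/vllm_model/generate"

payload = {
    "text_input": "What is the Triton Inference Server?",
    "parameters": {"stream": False, "temperature": 0.0, "max_tokens": 64},
}

resp = requests.post(TRITON_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["text_output"])
```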