
Tech AMD’s Lisa Su has already vanquished Intel. Now she’s going after Nvidia

That’s why I question folks proposing lower-level standards - I can understand it for HPC problems, but not for GenAI inference. I think the big competitive battle going on right now is going to be about inference cost/power per token at the data center level, for every leading model. The good news is that Llama has been added to MLPerf 5.0. The bad news is that the focus is still on raw performance, so they aren’t looking at cost/power per token yet.
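Since the metric at stake here is cost/power per token rather than raw throughput, here is a minimal sketch of how it falls out of throughput, power draw, and GPU pricing. All numbers and function names are illustrative assumptions, not measured results for any accelerator:

```python
# Hypothetical numbers for illustration only -- not measured results.

def cost_per_million_tokens(tokens_per_sec: float,
                            gpus: int,
                            dollars_per_gpu_hour: float) -> float:
    """Serving cost in $ per 1M generated tokens for a deployment."""
    tokens_per_hour = tokens_per_sec * 3600
    dollars_per_hour = gpus * dollars_per_gpu_hour
    return dollars_per_hour / tokens_per_hour * 1_000_000

def joules_per_token(tokens_per_sec: float, watts: float) -> float:
    """Energy per generated token: J/token = W / (tokens/s)."""
    return watts / tokens_per_sec

# Example: an 8-GPU node serving 5,000 tokens/s at 6 kW, $2/GPU-hour.
print(cost_per_million_tokens(5000, 8, 2.0))  # ~$0.89 per 1M tokens
print(joules_per_token(5000, 6000))           # 1.2 J/token
```

The point of the sketch is that two accelerators with identical MLPerf throughput can differ sharply on these numbers once power and price enter the denominator.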

Gaudi 3 is not bad.
[Screenshot attached]
 

AMD has thrown so much hardware at the problem that it's hilarious to still lose to the H100. As for low level: you know that part of DeepSeek's success was low-level PTX assembly; they utilized the hardware fully.
 
Who knows? Random commissioned benchmarks are a bit meaningless, especially without full disclosure of the comparative environments. I’m thinking that MLPerf 5.0 is a far more reliable, trustworthy, and transparent comparison. Unfortunately, Intel has only done the hard work for Granite Rapids, not Gaudi 3 (yet), for the new Llama LLM benchmarks.

 
Yeah, but most of DeepSeek’s efficiency magic can be duplicated via smarter data center orchestration: GPU planning, prefill/decode disaggregation, smart KV cache management and routing, and efficient communication between GPUs. Guess who has figured out how to make that work (hint: a 30x tokens/sec improvement on DeepSeek-R1).


Open source, so others can go that route.
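The prefill/decode disaggregation idea mentioned above can be shown in toy form. This is a sketch of the scheduling pattern only; the class and function names are hypothetical, not any vendor's or framework's API, and the "KV cache" is a stand-in list rather than real GPU memory:

```python
# Toy sketch of prefill/decode disaggregation (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)  # stand-in for KV blocks

def prefill(req: Request) -> Request:
    # Prefill workers are compute-bound: they process the whole prompt
    # in one pass and materialize the KV cache.
    req.kv_cache = [f"kv{i}" for i in range(req.prompt_tokens)]
    return req

def decode(req: Request) -> int:
    # Decode workers are memory-bandwidth-bound: they generate one token
    # per step, reusing and extending the handed-off KV cache.
    generated = 0
    while generated < req.max_new_tokens:
        req.kv_cache.append(f"kv{len(req.kv_cache)}")
        generated += 1
    return generated

req = Request(prompt_tokens=512, max_new_tokens=64)
req = prefill(req)           # runs on the prefill pool
n = decode(req)              # KV cache handed off to the decode pool
print(n, len(req.kv_cache))  # 64 576
```

Splitting the two phases onto separate pools lets each be sized and batched for its own bottleneck, which is where much of the claimed efficiency comes from.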
 

Intel is the only company that submits MLPerf results for CPUs; it feels weird compared with the ASICs/GPUs.
 