Tachyum Unveils 2nm Prodigy with 21x Higher AI Rack Performance than the Nvidia Rubin Ultra

I know two of the executives at Tachyum. I worked with them when I was a junior engineer. They probably won't remember me.

The data sheet said it runs binaries for x86, Arm, and RISC-V in addition to its native ISA. Could Prodigy be something like Transmeta from many years ago?
 
The weak point of AI hardware is supplying enough memory bandwidth to feed the cores, and it seems like they aren't using HBM. How can it beat Nvidia without HBM?
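
Rough numbers make the point: in the bandwidth-bound regime, every generated token has to stream roughly the full weight set from memory, so tokens/sec is bounded by bandwidth divided by bytes per token. A back-of-envelope sketch in Python; the model size, quantization, and bandwidth figures are my own assumptions, not anything Tachyum or Nvidia has published:

params_billion = 70           # assumed model size: a 70B-parameter LLM
bytes_per_param = 1           # assumed 8-bit (FP8/INT8) weights
bytes_per_token = params_billion * 1e9 * bytes_per_param

for name, bw_tb_s in [("DDR5 RDIMM server (assumed ~0.6 TB/s)", 0.6),
                      ("HBM3e accelerator (assumed ~8 TB/s)", 8.0)]:
    tokens_per_s = bw_tb_s * 1e12 / bytes_per_token
    print(f"{name}: ~{tokens_per_s:.0f} tokens/s per stream, bandwidth-bound")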
 
The data sheet said it runs binaries for x86, Arm, and RISC-V in addition to its native ISA. Could Prodigy be something like Transmeta from many years ago?
That's what it reads like. The most successful software instruction-set emulator I'm aware of is Apple's Rosetta (now Rosetta 2), which runs x86 code on its M-series processors.

Your post made me go back and read about Transmeta... I forgot it was yet another VLIW design that lost out to superscalar architectures. Transmeta's lead founder, Dave Ditzel, later worked at Intel for a while, reportedly trying to do a follow-on generation to the Transmeta design, but nothing seems to have come of it.
 
That's what it reads like. The most successful software instruction-set emulator I'm aware of is Apple's Rosetta (now Rosetta 2), which runs x86 code on its M-series processors.

Your post made me go back and read about Transmeta... I forgot it was yet another VLIW design that lost out to superscalar architectures. Transmeta's lead founder, Dave Ditzel, later worked at Intel for a while, reportedly trying to do a follow-on generation to the Transmeta design, but nothing seems to have come of it.
The idea of using firmware to direct traffic to the appropriate hardware lives on in many other applications ...
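
For what it's worth, the basic mechanism behind Transmeta's Code Morphing and Rosetta-style translation is "translate a block once, cache it, then dispatch from the cache." A toy sketch of that dispatch loop; the helpers guest_fetch_block and translate_block are hypothetical placeholders, and this is not how Prodigy, Rosetta, or Transmeta is actually implemented:

translation_cache = {}   # guest block address -> translated native code (here: a Python callable)

def run_guest(pc, guest_fetch_block, translate_block, state):
    # Dispatch loop: execute translated blocks, translating on a cache miss.
    while pc is not None:
        native = translation_cache.get(pc)
        if native is None:
            guest_block = guest_fetch_block(pc)    # decode the guest (x86/Arm/RISC-V) block
            native = translate_block(guest_block)  # emit equivalent host-native code
            translation_cache[pc] = native         # reuse the translation on later visits
        pc = native(state)                         # run the block; returns the next guest PC, or None to stop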
 
In my view, any company serious about building LLM acceleration hardware has to be talking about circuit improvements through architectural innovation and optimization for full-stack attention / transformer-based inference, not "we make a faster, parallel universal processor that now does FP4."

Thought this talk was eye-opening on the kinds of hardware/software challenges that are the bottlenecks today, along with possible solutions.
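
To make "optimization for attention / transformer-based inference" concrete, here is a single-head scaled-dot-product attention step in NumPy. During decoding, the cached K and V matrices grow by one row per generated token and have to be re-read from memory every step, which is exactly the traffic that dedicated inference hardware tries to tame. The shapes and sizes below are arbitrary assumptions for illustration only:

import numpy as np

def attention(q, K, V):
    # q: (d,) query for the new token; K, V: (t, d) cached keys/values
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)             # dot product against every cached key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # softmax over the whole sequence so far
    return weights @ V                      # weighted sum of cached values

t, d = 4096, 128                            # assumed context length and head dimension
q = np.random.randn(d).astype(np.float32)
K = np.random.randn(t, d).astype(np.float32)
V = np.random.randn(t, d).astype(np.float32)
print(attention(q, K, V).shape)             # -> (128,)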

 
The weak point of AI hardware is supplying enough memory bandwidth to feed the cores and it seems like they aren't using HBM memory. How can it beat Nvidia without HBM?
Probably in reaction to your point:
“The TDIMM is key in reducing the cost of AI systems trained on all the knowledge from $8 trillion and 276 gigawatts to $78 billion and 1 gigawatt in 2028,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “The TDIMM ushers in the era of affordable AI trained on all written knowledge produced by humanity, accessible to many companies and nations.”
Minor changes to the DDR6 controller, PHY, and MRDIMM chips will double bandwidth from 6.7 TB/s to 13.5 TB/s in 2027, exceeding Nvidia Rubin's 13 TB/s. TAI reduces bandwidth requirements by up to 4x, making TAI inference behave as if it had 54 TB/s of bandwidth. Evolutionary changes in 2028 would double the bandwidth of TDIMM-based AI chips to 27 TB/s.
TDIMM power consumption is expected to be about 30% higher for 2x the bandwidth. Using newer DRAM chips will put TDIMM power consumption at about the same level as an older DDR5 RDIMM.
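
The arithmetic behind those figures, as I read the press material (the doublings and the 4x reduction factor are their claims, not independent measurements):

tdimm_2027_tb_s = 6.7 * 2      # "minor changes ... double bandwidth": ~13.4, quoted as 13.5 TB/s
effective_tb_s  = 13.5 * 4     # TAI's claimed up-to-4x reduction: behaves "like" 54 TB/s
tdimm_2028_tb_s = 13.5 * 2     # 2028 evolutionary doubling: 27 TB/s

print(f"2027 raw:       {tdimm_2027_tb_s:.1f} TB/s (vs ~13 TB/s claimed for Rubin)")
print(f"2027 effective: {effective_tb_s:.0f} TB/s assuming the full 4x reduction")
print(f"2028 raw:       {tdimm_2028_tb_s:.0f} TB/s")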
 