The weak point of AI hardware is supplying enough memory bandwidth to feed the cores, and it doesn't seem like they're using HBM. How can they beat Nvidia without HBM?
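To put rough numbers on the bandwidth point, here's a back-of-the-envelope sketch. LLM decoding is typically memory-bandwidth-bound because each generated token has to stream roughly the full set of weights from memory. The model size, precision, and bandwidth figures below are my own assumptions for illustration, not any vendor's specs.

```python
# Rough ceiling on single-stream decode rate if all weights are re-read per token.
# All numbers are illustrative assumptions, not vendor specifications.

def max_tokens_per_sec(model_params_billion: float,
                       bytes_per_param: float,
                       mem_bandwidth_tb_s: float) -> float:
    """Bandwidth-limited upper bound on tokens/second for one decode stream."""
    bytes_per_token = model_params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = mem_bandwidth_tb_s * 1e12
    return bandwidth_bytes_per_s / bytes_per_token

# Assumed: a 70B-parameter model stored at 1 byte/param (FP8-class).
for bw in (0.5, 3.35, 8.0):   # assumed DDR-class vs. HBM-class bandwidths, TB/s
    print(f"{bw:5.2f} TB/s -> ~{max_tokens_per_sec(70, 1.0, bw):6.0f} tokens/s ceiling")
```

The gap between the DDR-class and HBM-class rows is basically the question being asked here.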
That's what it reads like. The most successful software instruction-set emulator I'm aware of is Apple's Rosetta (now Rosetta 2), for running x86 code on its M-series processors.
Your post made me go back and read about Transmeta... I had forgotten it was yet another VLIW design that lost out to superscalar architectures. Transmeta's lead founder, Dave Ditzel, later worked at Intel for a while, apparently trying to do a follow-on generation to the Transmeta design, but nothing seems to have come of it.
In my view, any company serious about building LLM acceleration hardware has to be talking about circuit improvements through architectural innovation and optimization for full-stack attention / transformer-based inference, not "we make a faster, parallel universal processor that now does FP4."
Thought this talk was eye-opening on the kinds of hardware / software challenges that are the bottlenecks today along with possible solutions.
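One way to see why transformer-specific optimization matters so much: at long context lengths the KV cache, not just the weights, dominates memory traffic during decode. A quick illustrative calculation, with model dimensions I'm assuming for the sake of example:

```python
# Size of the key+value cache that has to be read for every generated token.
# Shapes below are assumed for illustration only.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Total bytes of cached keys and values (the leading 2 is K plus V)."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Assumed shape: 80 layers, 8 KV heads (GQA), head_dim 128, FP16 cache, 128k context.
gb = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                    seq_len=128_000, batch=1) / 1e9
print(f"~{gb:.0f} GB of KV cache per 128k-token sequence")
```

Tens of gigabytes of cache per long sequence is exactly the kind of thing a generic "faster parallel processor" pitch doesn't address.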
“The TDIMM is key in reducing the cost of AI systems trained on all the knowledge from $8 trillion and 276 gigawatts to $78 billion and 1 gigawatt in 2028,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “The TDIMM ushers in the era of affordable AI trained on all written knowledge produced by humanity, accessible to many companies and nations.”
Minor changes to the DDR6 controller, PHY, and MRDIMM chips will double bandwidth from 6.7 TB/s to 13.5 TB/s in 2027, exceeding Nvidia Rubin's 13 TB/s. TAI reduces bandwidth requirements by up to 4x, making TAI inference perform as if it had 54 TB/s of bandwidth. Evolutionary changes in 2028 would double the bandwidth of TDIMM-based AI chips again, to 27 TB/s.
TDIMM power consumption is expected to be about 30% higher for 2x the bandwidth. Using newer DRAM chips would put TDIMM power consumption at about the same level as older DDR5 RDIMMs.
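Taking the quoted claims at face value, the arithmetic does hang together, though the "effective" number depends entirely on the claimed 4x reduction in bandwidth demand actually materializing:

```python
# Redoing the press-release arithmetic; figures are the ones quoted above,
# and "effective" bandwidth assumes the claimed 4x demand reduction holds.

baseline_tb_s   = 6.7                  # quoted starting point, TB/s
tdimm_2027      = baseline_tb_s * 2    # "minor changes ... double bandwidth" -> ~13.5 TB/s
effective_2027  = tdimm_2027 * 4       # claimed 4x reduction -> "like 54 TB/s"
tdimm_2028      = tdimm_2027 * 2       # "evolutionary changes ... double" -> ~27 TB/s

print(f"2027 raw:       {tdimm_2027:.1f} TB/s  (Nvidia Rubin cited at 13 TB/s)")
print(f"2027 effective: {effective_2027:.1f} TB/s")
print(f"2028 raw:       {tdimm_2028:.1f} TB/s")
```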
So not just a processor that hasn't been built yet, but also a memory "standard" that still exists only in EDA and CAD diagrams? But I guess with an HQ in Las Vegas, they're going for the extreme long-odds gamblers?