Will the Cerebras four trillion transistor chip change the AI game?

Arthur Hanson

Well-known member
Cerebras maintains that having everything on one giant chip saves power and provides superior results to a board full of separate chips. Any thoughts or comments on this are appreciated.

In my books, Cerebras is the real deal - wafer-scale integration gives them some fundamental strengths that translate into huge wins when it comes to hosting large-scale Gen AI models:
* model speed - wafer-scale on-chip memory delivers huge memory bandwidth, and thus super speedy models within a single slot / cabinet (rough arithmetic sketch after this list).
* lower power - wafer-scale also offers incredible on-chip interconnect density. Far shorter signaling wires mean lower power consumption within a slot / cabinet.

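A quick back-of-the-envelope on the speed point: single-stream decode is usually memory-bandwidth bound, so tokens/s is roughly bandwidth divided by the bytes streamed per token. Here is a minimal Python sketch of that estimate - every number in it is purely illustrative, not a vendor spec:

# Rough batch-1 decode estimate: generation is typically memory-bandwidth bound,
# so the upper bound on tokens/s is (memory bandwidth) / (bytes read per token).
# All numbers below are illustrative assumptions, not vendor specifications.

def tokens_per_second(params_billion: float, bytes_per_param: float, bandwidth_tb_s: float) -> float:
    """Upper-bound decode rate when every parameter is streamed once per token."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / bytes_per_token

# A 70B-parameter model in 16-bit weights:
model_b = 70
bpp = 2.0

# Hypothetical HBM-class device bandwidth vs. hypothetical wafer-scale on-chip SRAM bandwidth.
for name, bw_tb_s in [("HBM-class device", 3.0), ("wafer-scale SRAM", 1000.0)]:
    print(f"{name}: ~{tokens_per_second(model_b, bpp, bw_tb_s):,.0f} tokens/s upper bound")

The gap between the two bandwidth assumptions is the whole story of the single-slot speed advantage; it says nothing yet about multi-user capacity, which is where the next point comes in.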
But those advantages might be less clear when looking at large-scale multi-user performance, capacity, and cost/TCO, especially when going beyond a single slot / chassis to rack-level and multi-rack results. In that world, much more depends on how well the hardware and software share the compute resources within the entire rack across many contexts and users. From what I can tell, the things DeepSeek and others have done around disaggregation across hardware and KV caching greatly increase hardware capacity and speed for heterogeneous hardware - AI processors plus HBM. That's why I find the rack-level LLM benchmarking in the second half of this article so interesting - TCO per processor and power per token are far better at the rack level than at the slot level, thanks to these kinds of rack-level optimizations.
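For context on the KV-caching point, here is a toy sketch of the mechanism in Python with NumPy - the class name, head dimension, and toy projections are all illustrative, not DeepSeek's actual implementation. Each user's past keys/values are cached so a decode step only processes the new token, but the cache itself consumes memory that grows with users times context length, which is exactly the capacity pressure that rack-level disaggregation tries to manage:

import numpy as np

# Minimal single-head attention decode step with a per-user KV cache.
# Caching K/V for past tokens means each new token needs only one new
# projection instead of re-running attention over the whole prefix,
# but the cache consumes memory that scales with users x context length.

D = 64  # head dimension (illustrative)

class KVCache:
    def __init__(self):
        self.k = np.zeros((0, D))
        self.v = np.zeros((0, D))

    def append(self, k_new, v_new):
        self.k = np.vstack([self.k, k_new])
        self.v = np.vstack([self.v, v_new])

    def attend(self, q):
        # Attention over all cached tokens: softmax(q . K^T) V
        scores = self.k @ q / np.sqrt(D)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ self.v

# One decode step at a time: project the new token, extend the cache, attend.
rng = np.random.default_rng(0)
cache = KVCache()
for step in range(4):
    x = rng.standard_normal(D)            # stand-in for the new token's hidden state
    cache.append(x[None, :], x[None, :])  # toy projections: K = V = x
    out = cache.attend(x)
    print(f"step {step}: cache holds {cache.k.shape[0]} tokens, output norm {np.linalg.norm(out):.2f}")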