
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

XYang2023

Abstract

"Recent research, such as BitNet [WMD+23], is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant,namely BitNet b1.58, in which every single parameter (or weight) of the LLM isternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) TransformerLLM with the same model size and training tokens in terms of both perplexityand end-task performance, while being significantly more cost-effective in termsof latency, memory, throughput, and energy consumption. More profoundly, the1.58-bit LLM defines a new scaling law and recipe for training new generations ofLLMs that are both high-performance and cost-effective. Furthermore, it enablesa new computation paradigm and opens the door for designing specific hardwareoptimized for 1-bit LLMs."

"Moreover, 1.58-bitLLMs are more friendly to CPU devices, which are the main processors used in edge and mobile devices. This means that BitNet b1.58 can be efficiently executed on these devices, further improving their performance and capabilities."

 
So if I read the paper correctly:
* Not really 1 bit as advertised, but 2 bits in practice (see the quick arithmetic after this list). (I think Apple has an interesting 4-bit lookup-table-based representation in their NPU.)
* Only useful/relatively accurate for zero-shot learning? That seems to be the only setting where they measured accuracy comparable to FP16/BF16.
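
On the first bullet, the 1.58 number is the information content of a ternary symbol, log2(3) ≈ 1.585 bits, while a straightforward in-memory layout does spend a full 2-bit field per weight; denser packing (e.g. five ternary weights per byte) gets closer to the theoretical figure. Quick arithmetic in Python:

```python
import math

print(math.log2(3))   # ≈ 1.585 bits of information per ternary weight {-1, 0, 1}

# Naive layout: one weight per 2-bit field -> 2.0 bits/weight in memory.
# Denser layout: 3**5 = 243 <= 256, so five ternary weights fit in one byte,
# i.e. 8 / 5 = 1.6 bits/weight.
print(3 ** 5, 8 / 5)
```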
 