Hot Chips 2024: AI Hype Booms, But Can Nvidia’s Challengers Succeed?
by Joseph Byrne on 09-09-2024 at 10:00 am

You don’t know you’re at a peak until you start to descend, and Hot Chips 2024 is proof that AI hype is still climbing among semiconductor vendors. Juggernaut Nvidia, startups, hyperscalers, and major companies presented their AI accelerators (GPUs and neural-processing units—NPUs) and touched on the challenges of software, memory access, power, and networking. As always, microprocessors also made a significant contribution to the conference program.

Nvidia recapitulated details of Blackwell, the monster chip it introduced earlier this year. Comprising 208 billion transistors on two reticle-limited dice, it can deliver a theoretical maximum of 20 PFLOPS on four-bit floating-point (FP4) data. The castle wall protecting Nvidia's dominance is its software: the company discussed its Quasar quantization stack, which facilitates FP4 use, and reminded the audience of its 400-plus CUDA-X libraries.
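
Quasar's internals stayed behind the curtain, but the basic mechanics of FP4 quantization are easy to illustrate. Below is a minimal Python sketch assuming simulated E2M1 FP4 and a single per-tensor scale; a production stack would use finer-grained scaling and calibration:

```python
import numpy as np

# Magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray):
    """Simulated FP4 quantization with a single per-tensor scale."""
    scale = np.abs(x).max() / FP4_GRID[-1]  # map the largest magnitude to 6.0
    mag = np.abs(x) / scale
    # Snap each magnitude to the nearest representable FP4 value.
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx], scale  # dequantize as q * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_fp4(w)
print("max abs error:", np.abs(w - q * s).max())
```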

The software barrier for inference is lower. Seeking to bypass it altogether—as well as to offer AI processing in an easier-to-consume chunk than a whole system—AI challengers such as Cerebras and SambaNova provide API access to cloud-based NPUs. Cerebras is unusual in operating multiple data centers, and both companies also offer the option to buy ready-to-run systems. Tenstorrent, however, sees a developer community and software ecosystem as essential to the long-term success of a processor vendor. The company presented its open-source stack and described how developers can contribute to it at any level, facilitated by its use of hundreds of C-programmed RISC-V CPUs.
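
To make that consumption model concrete, the sketch below issues an OpenAI-style chat-completion request over HTTPS. The endpoint, model name, and environment variable are placeholders invented for illustration; consult the vendors' documentation for the actual URLs and model identifiers:

```python
import os
import requests

# Placeholder endpoint and model name for illustration only; cloud NPU
# vendors expose OpenAI-style chat-completion APIs, but check their docs
# for the real values.
URL = "https://api.example-npu-cloud.com/v1/chat/completions"

resp = requests.post(
    URL,
    headers={"Authorization": f"Bearer {os.environ['NPU_CLOUD_API_KEY']}"},
    json={
        "model": "example-llm-70b",
        "messages": [{"role": "user", "content": "Summarize Hot Chips 2024."}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```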

AI Networking at Hot Chips 2024

Networking is critical to building a Blackwell cluster, and Nvidia showed InfiniBand, Ethernet, and NVLink switches based on its silicon. The NVLink switch picture reveals a couple of interesting challenges to scaling out AI systems. Because PCBs can't handle the 200 Gbps serdes data rate, signals in and out of the NVLink chips must travel over flyover cables, shown in blue and pink in Figure 1. Moreover, the picture suggests the blue lines connect to front-panel ports, indicating that customers' data centers lack the power to support 72-GPU racks and must divide this computing capacity between two racks.

Figure 1. Nvidia NVSwitch. (Source: Nvidia)
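
The two-rack inference can be sanity-checked with rough arithmetic. In the Python sketch below, every figure is an assumption for illustration, not an Nvidia or facility specification:

```python
import math

# Back-of-envelope rack-power check. All numbers are illustrative
# assumptions, not Nvidia or data-center specifications.
gpu_kw = 1.2            # assumed per-GPU board power
gpus = 72               # one NVLink domain
overhead = 1.25         # assumed CPUs, NVLink switches, fans, losses

domain_kw = gpus * gpu_kw * overhead          # ~108 kW for the domain
rack_limit_kw = 60                            # assumed facility rack budget

racks = math.ceil(domain_kw / rack_limit_kw)  # -> 2 racks
print(f"{domain_kw:.0f} kW domain, {rack_limit_kw} kW racks -> {racks} racks")
```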

Short-hop links will eventually require optical networking. Broadcom updated the audience on its efforts developing copackaged optics (CPO). Having created two CPO generations for its Tomahawk switch IC, Broadcom completed a third CPO-development vehicle for an NPU. The company expects CPO to enable all-to-all connectivity among 512 NPUs. Intel also discussed its CPO progress. Its key technology, disclosed two years ago, is an eight-laser IC integrated into the CPO design, replacing the external light source that Broadcom's approach requires.
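
The 512-NPU goal makes the case for optics almost by arithmetic alone, as the sketch below shows; the copper-reach figure in the comment is a rough assumption, not a Broadcom number:

```python
# Fabric scale for all-to-all connectivity among 512 NPUs.
n = 512
pairs = n * (n - 1) // 2
print(f"{pairs:,} NPU-to-NPU pairs")  # 130,816

# Copper at today's serdes rates reaches roughly a meter, while a
# 512-NPU fabric spans many racks, so the links must go optical.
# (The reach figure is a rough assumption for illustration.)
```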

Software and networking intersect at protocol processing. Hyperscalers employ proprietary protocols, adapting the standard ones to their data centers’ rigorous demands. For example, in presenting its homegrown Maia NPU, Microsoft alluded to the custom protocol it employs on the Ethernet backbone connecting a Maia cluster. Seeing standards’ inadequacies but also valuing their economies, Tesla presented its TTPoE protocol, advocating for the industry to adopt it as a standard. It has joined the Ultra Ethernet Consortium (UEC) and submitted TTPoE. Unlike other Ethernet trade groups focused on developing a new Ethernet data rate, the UEC has a broader mission to improve the whole networking stack.
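
The appeal of such a protocol is a fixed-size, software-light header that hardware can parse at line rate. The Python sketch below packs an invented header in that spirit; the field layout is hypothetical, not TTPoE's actual format (see Tesla's UEC submission for that):

```python
import struct

# Hypothetical fixed-layout transport header carried in an Ethernet frame,
# in the spirit of hardware-friendly transports such as TTPoE. The fields
# and sizes are invented for illustration.
HDR = struct.Struct("!BBHIIH")  # version, opcode, flags, seq, ack, length

def pack_header(opcode: int, seq: int, ack: int, length: int) -> bytes:
    return HDR.pack(1, opcode, 0, seq, ack, length)

frame_payload = pack_header(opcode=2, seq=1000, ack=998, length=4096)
print(HDR.unpack(frame_payload))  # fixed offsets keep a hardware parser simple
```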

Alternative Memory Hierarchies at Hot Chips 2024

Despite Nvidia’s success, a GPU-based architecture is suboptimal for AI acceleration. Organizations that started with a clean sheet have gone in different directions, particularly with their memory hierarchies, and Hot Chips 2024 highlighted several approaches. The Meta MTIA accelerator has SRAM banks along its sides and 16 LPDDR5 channels, eschewing HBM. By contrast, Microsoft distributes memory among Maia’s computing tiles and employs HBM for additional capacity. In its Blackhole NPU, Tenstorrent similarly distributes SRAM among computing tiles and avoids expensive HBM, using GDDR6 memory instead. SambaNova’s SN40L takes a “yes-and” approach, integrating prodigious SRAM, including HBM in the package, and additionally supporting standard external DRAM. For on-chip memory capacity, nothing can touch the Cerebras WS-3 because no other design comes close to its wafer-scale integration.
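
These choices trade bandwidth against cost and capacity. The sketch below compares rough per-device bandwidths using generic, datasheet-style numbers; the channel counts and data rates are assumptions for illustration, not figures from the talks:

```python
# Rough peak bandwidth of the DRAM options named above.
# All figures are generic assumptions, not numbers from Hot Chips 2024.
options = {
    "LPDDR5 (16 ch x 32-bit @ 6.4 GT/s)":   16 * 32 / 8 * 6.4,  # GB/s
    "GDDR6 (8 ch x 32-bit @ 16 GT/s)":       8 * 32 / 8 * 16,
    "HBM3 (1 stack, 1024-bit @ 6.4 GT/s)":   1024 / 8 * 6.4,
}
for name, gb_per_s in options.items():
    print(f"{name}: ~{gb_per_s:.0f} GB/s")
```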

CPUs Still Matter

AMD, Intel, and Qualcomm discussed their newest processors, mostly repeating previously disclosed information. Ampere discussed the Arm-compatible microarchitecture employed in the 192-core AmpereOne chip, revealing it to be in a similar class to the Arm Neoverse-N2 and adapted to many-core integration.

The RISC-V architecture has become the standard for NPUs, employed by Meta, Tenstorrent, and others. It was also the subject of a presentation by the Chinese Academy of Sciences, which has two open-source projects under its XiangShan umbrella covering microarchitecture, chip generation, and development infrastructure. Billed as comparable to an Arm Cortex-A76, the Nanhu microarchitecture shown in Figure 2 is a RISC-V design focused more on power and area efficiency than on maximizing performance. The Kunminghu microarchitecture is a high-performance RISC-V design the academy compares with the Neoverse-N2. Open source, and thus freely available, these CPUs present a business-model challenge to the many companies developing and hoping to sell RISC-V cores.

Figure 2. Nanhu microarchitecture (Source: https://github.com/OpenXiangShan)

Bottom Line

Artificial-intelligence mania is propelling chip and networking developments. The inescapable conclusion, however, is that too many companies are chasing the opportunity. Beyond the companies highlighted above, others presented their technologies—a key takeaway about each is available at xpu.pub. The biggest customers are the hyperscalers, and they’re gaining leverage over merchant-market suppliers by developing their own NPUs, such as the Meta MTIA and Microsoft Maia presented at Hot Chips 2024.

Nvidia’s challengers, therefore, are targeting smaller customers by employing various strategies, such as standing up their own data centers (Cerebras), offering API access and selling turnkey systems (SambaNova), or fostering a software ecosystem (Tenstorrent). The semiconductor business, however, is one of scale economies, and aggregating small customers’ demand is rarely as effective as landing a few big buyers.

Although less prominent, RISC-V is another frothy technology. An open-source instruction-set architecture, it also has open-source implementations. Businesses have been built around Linux, but they involve testing, improving, packaging, and contributing to the open-source OS, not replacing it. Their business model could be a template for RISC-V CPU companies, which so far have focused on developing better implementations, a pursuit that could prove fruitless given the availability of high-end open-source cores like Kunminghu.

At some point, both the AI and RISC-V bubbles will burst. If it happens in the next 12 months, we’ll learn that Hot Chips 2024 was the zenith of hype.

Joseph Byrne is an independent analyst. For more information, see xampata.com.

Also Read:

The Semiconductor Business will find a way!

Powering the Future: The Transformative Role of Semiconductor IP

Nvidia Pulled out of the Black Well
