
Gen AI wafer demand on the semiconductor industry

Daniel Nenni



McKinsey analysis estimates the wafer demand of high-performance components based on compute demand and its hardware requirements: logic chips (CPUs, GPUs, and AI accelerators), memory chips (high-bandwidth memory [HBM] and double data rate memory [DDR]), data storage chips (NAND ["not-and"] chips), power semiconductor chips, optical transceivers, and other components. In the following sections, we will look more closely at logic, HBM, DDR, and NAND chips. Beyond logic and memory, we anticipate an increase in demand for other device types. For instance, power semiconductors will be in higher demand because gen AI servers consume more energy. Another consideration is optical components: communications links are expected to transition to optical technologies over time, a shift we have already seen in long-distance networking and backbones, where optics reduce energy consumption while increasing data transmission rates. These new requirements, combined with the high level of anticipated investment, should spur innovation in almost all areas of the industry (see sidebar "Pursuing innovation in semiconductors to capture generative AI value").


Logic chips

Logic chip demand depends on the type of gen AI compute chip and the type of server used for training and inference workloads. As discussed earlier, by 2030 we anticipate that the majority of gen AI compute demand in FLOPs will come from inference workloads. Currently, three types of AI servers can manage inference and training workloads: CPU+GPU, CPU+AI accelerator, and fusion CPU+GPU. Today, CPU+GPU servers have the best availability and are used for both inference and training. In 2030, AI accelerators with ASIC chips are expected to serve the majority of workloads because they perform optimally on specific AI tasks. GPU and fusion servers, on the other hand, are ideal for training workloads due to their versatility in accommodating various types of tasks (Exhibit 4).

Exhibit 4: AI server architecture [image]


In 2030, McKinsey estimates that logic wafer demand from non–gen AI applications will be approximately 15 million wafers: about seven million produced using technology nodes larger than three nanometers and approximately eight million using nodes of three nanometers or less. Gen AI demand would require an additional 1.2 million to 3.6 million wafers at nodes of three nanometers or less. Based on current logic fab planning, it is anticipated that 15 million wafers at nodes of seven nanometers or less can be produced in 2030. Thus, gen AI demand creates a potential supply gap of one million to about four million wafers at nodes of three nanometers or less. To close the gap, three to nine new logic fabs will be needed by 2030 (Exhibit 5).
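The fab-count arithmetic above can be sketched in a few lines of Python. The one-million-to-four-million-wafer supply gap comes from the analysis above; the per-fab output of roughly 450,000 wafers per year is an illustrative assumption chosen to be consistent with the stated three-to-nine-fab range, not a figure from the article.

```python
import math

# Sketch of the "gap to new fabs" arithmetic. The gap range (1M-4M wafers/year
# at nodes <= 3 nm) is from the analysis above; WAFERS_PER_FAB is an assumed
# annual output per new fab, picked to match the stated 3-9 fab range.
WAFERS_PER_FAB = 450_000

def fabs_needed(gap_wafers: int, wafers_per_fab: int = WAFERS_PER_FAB) -> int:
    """Number of new fabs required to close a given annual wafer supply gap."""
    return math.ceil(gap_wafers / wafers_per_fab)

low = fabs_needed(1_000_000)   # lower end of the gap -> 3 fabs
high = fabs_needed(4_000_000)  # upper end of the gap -> 9 fabs
print(low, high)  # prints: 3 9
```

Under a different assumed per-fab capacity the implied fab count shifts accordingly, which is one reason such estimates are quoted as ranges.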

Exhibit 5: AI server architecture, continued [image]

DDR and HBM

Gen AI servers use two types of DRAM: HBM, attached to the GPU or AI accelerators, and DDR RAM, attached to the CPU. HBM has higher bandwidth but requires more silicon for the same amount of data.

As transformer models grow larger, gen AI servers have been expanding memory capacity. However, the growth in memory capacity is not straightforward, posing challenges to hardware and software design. First, the industry faces a memory wall problem, in which memory capacity and bandwidth are the bottleneck for system-level compute performance. How the industry will tackle the memory wall is an open question: static random-access memory (SRAM) is being tested in various chips to increase near-compute memory, but its high cost limits wide adoption, and future algorithms may require less memory per inference run, slowing total memory demand growth. Second, AI accelerators use less memory than CPU+GPU architectures and may become more popular by 2030, when inference workloads flourish. This could mean potentially slower growth in memory demand.


Given these uncertainties, we consider two DRAM demand scenarios in addition to the base, conservative, and accelerated adoption scenarios: a “DRAM light” scenario, in which AI accelerators remain memory-light compared to GPU-based systems, and a “DRAM base” scenario, in which AI accelerator–based systems catch up to GPU-based systems in terms of DRAM demand.

By 2030, we expect DRAM demand from gen AI applications to be five to 13 million wafers in the DRAM light scenario, translating to four to 12 dedicated fabs. In the DRAM base scenario, DRAM demand would be seven to 21 million wafers, translating to six to 18 fabs. The wide range of values reflects the challenges associated with reducing the memory requirements per device.

NAND memory

NAND memory is used for data storage: for the operating system, user data, and input and output. In 2030, NAND demand will likely be driven by dedicated data servers for video and multimodal data, which will require substantial storage (for example, for training on high-resolution video sequences and retrieving data during inference). We expect total NAND demand to be two to eight million wafers, corresponding to one to five fabs. Given that the performance requirements for gen AI NAND storage will be the same as for current servers, fulfilling this demand will be less challenging than for logic and DRAM.

Other components

The rising compute demand will create additional demand for many other chip types. Two types are particularly noteworthy:

High-speed network and interconnect. Gen AI requires high-bandwidth, low-latency connectivity between servers and between the various components within a server. A larger number of network interfaces and switches is required to create all the connections. Today, these interconnects are mostly copper-based, but optical connectivity is expected to gain share as bandwidth and latency requirements rise.

Power semiconductors. AI servers need a large amount of electricity and might consume more than 10 percent of global electricity in 2030. This requires many power semiconductors within the server and on the actual devices.


The surge in demand for gen AI applications is propelling a corresponding need for computational power, driving both software innovation and substantial investment in data center infrastructure and semiconductor fabs. The critical question for industry leaders, however, is whether the semiconductor sector will be able to meet that demand. To prepare, semiconductor leaders should consider which scenario they find most plausible. Investment in semiconductor manufacturing capacity and servers is costly and takes time, so careful evaluation of the landscape is essential to navigating the complexities of the gen AI revolution and developing a view of its impact on the semiconductor industry.

 
AI servers need a large amount of electricity and might consume more than 10 percent of global electricity in 2030. This requires many power semiconductors within the server and on the actual devices.
So they run hot and dissipate lots of heat. Does that really translate into an extra need for power semiconductors?
 