As chip industry chases AI, U.S. national labs look to newcomers for supercomputers

Daniel Nenni

Founder
Staff member

  • Sandia tests NextSilicon chips, which passed key technical milestone for government supercomputing
  • NextSilicon chips use different approach from mainstream products that is more energy efficient
  • Sandia aims to diversify suppliers, ensuring mission-critical chips amid shifting industry priorities
  • Sandia's collaborations have previously driven industry adoption of innovations like liquid cooling

ALBUQUERQUE, New Mexico, May 18 (Reuters) - In a nondescript building on Kirtland Air Force Base on the high desert of New Mexico, liquid-cooled supercomputers gurgle and hum their way through some of the most complex math problems the U.S. government seeks to solve: simulating how hypersonic nuclear weapons would move through the Earth's atmosphere, or what would happen if one nuclear warhead detonated near another.

For more than a decade, the chips handling this secretive and demanding work came from mainstream semiconductor firms like Nvidia or Advanced Micro Devices.

But with those companies increasingly designing their chips for artificial intelligence and facing supply shortages, the managers in charge of the systems at Sandia National Laboratories, which operates the machines at Kirtland and is one of three U.S. labs tasked with developing and maintaining the nation's nuclear weapons arsenal, are increasingly unsure how they will find computing power for high-precision scientific work like theirs.

"The pressure we're feeling right now is on the computing front and also from the supply chain," said Steve Monk, the manager of Sandia's high-performance computing team, explaining the challenge of getting chips that meet his needs. "Looking to the future, it's a bit stressful in terms of our ability to deliver to the mission."

NEW ENTRANTS INTO CHIP MARKET
The lab's predicament shows how the race for better AI chips is having the unintended consequence of opening markets once dominated by the big firms to smaller players such as NextSilicon, an Israeli startup whose chips are being tested by a program at Sandia. It also shows the role that Sandia, which worked with Nvidia extensively as the company rose to prominence in supercomputing and is still collaborating with Nvidia on new memory technology, plays in incubating and shaping new computing technologies.

One major concern for officials at Sandia is what is known as double-precision floating point computation, a technical term for being able to compute both very large and very small numbers without losing accuracy to rounding errors. For years, Nvidia and AMD pursued leadership in speeding up that kind of computing, landing supercomputing contracts with universities and government labs.
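To make the rounding issue concrete, here is a minimal sketch (not taken from the article or from any Sandia code) that compares single and double precision using only Python's standard library. The helper names `to_float32` and `accumulate` are hypothetical and exist only for this illustration. Single precision is emulated by rounding each intermediate result to a 32-bit float, which is an assumption about how to model it rather than a measurement of any real chip.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add `step` to a running total `count` times.

    When `single` is True, the step and every partial sum are rounded to
    32-bit precision, emulating a single-precision accumulator.
    """
    if single:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total = total + step
        if single:
            total = to_float32(total)
    return total


if __name__ == "__main__":
    # A large number plus a small one: the small term vanishes in 32 bits.
    print(to_float32(1e8 + 1.0))  # 100000000.0
    print(1e8 + 1.0)              # 100000001.0

    # Many small additions: rounding error builds up far faster in 32 bits.
    exact = 10_000.0
    single_sum = accumulate(0.1, 100_000, single=True)
    double_sum = accumulate(0.1, 100_000, single=False)
    print("single-precision error:", abs(single_sum - exact))
    print("double-precision error:", abs(double_sum - exact))
```

Physics simulations repeat additions like these an enormous number of times, which is why the accumulated error in lower precision matters more there than in typical AI workloads.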

But AI work does not benefit from double-precision computing in the same way as simulating physics problems. While AMD is releasing a version of its chips aimed at scientific computing, the double-precision performance of Nvidia's forthcoming Rubin chips has declined by some measures, worrying many scientists in the high-performance computing industry, said Ian Cutress, chief analyst at More Than Moore, a chip consulting firm.

Daniel Ernst, senior director of supercomputing products at Nvidia, said the company remains committed to scientific computing, aiming to create a balanced chip that can run real-world scientific applications alongside AI work.

But the shifting chip market has prompted officials at Sandia to test products from newcomers such as NextSilicon, whose chip uses a completely different computing approach than graphics processing units (GPUs) or central processing units (CPUs) from Nvidia and AMD.

NUCLEAR SECURITY WORK
On Monday, Sandia, NextSilicon and Penguin Solutions, the firm that helped weave NextSilicon's chips into a supercomputer, said the systems have passed a key technical milestone using a battery of general supercomputing tests that put the chips in the running for use in government systems.

That sets up NextSilicon's chips for a decision this fall on whether to start testing the chips with more demanding computing problems that closely resemble the kind of nuclear security work they would eventually have to handle.

The NextSilicon chips can perform double-precision computing and are also designed to reprogram themselves on the fly to run more efficiently. NextSilicon's chips save electricity by using what is known as a data flow architecture that spends less time and energy shuffling data back and forth to the computing system's memory.

Sandia's work with chip firms often helps technology become widespread. Liquid cooling systems for chips were an exotic idea when Sandia started urging Intel, AMD and Nvidia to work on the technology more than a decade ago, and now they are common.

James Laros, a senior scientist at Sandia who oversees a program to test new computing architectures, said the work with smaller players like NextSilicon is aimed at ensuring Sandia can always procure the chips it needs, even if major chip firms shift focus.

"We have to keep available options to complete our mission, because the mission is not optional," Laros said.

 
Sandia is also working with Cerebras:


Sandia deploys cutting-edge Cerebras CS-3 testbed for AI workloads

Technician positioning Cerebras wafer enclosure for installation in Kingfisher. Photo taken by Craig Fritz.
In a partnership just reaching two years, Sandia and Cerebras Systems have unveiled a cluster of four Cerebras CS-3 systems to be used as a Sandia testbed that will expand research into AI workloads for national security missions.

The first four Cerebras CS-3 nodes of a planned eight-node system, named Kingfisher, were recently deployed at Sandia, funded by and in support of the NNSA’s Advanced Simulation and Computing Artificial Intelligence for Nuclear Deterrence strategy.

“As part of our ASC AI4ND strategy, the Cerebras CS-3 system positions us to be able to develop large scale trusted AI models on secure internal Tri-lab (Sandia, Lawrence Livermore and Los Alamos Laboratories) data without many of the memory and power challenges that GPU systems face,” said Justin Newcomer, senior manager of the ASC program at Sandia. “Consistent with the Sandia led Advanced Architecture Prototype System program, Vanguard, we are excited to push the boundaries on what is possible with AI systems through this partnership with Cerebras.”

This third-generation wafer scale engine architecture, WSE-3, will expand the capabilities of the NNSA Tri-labs and will allow bleeding edge investigation of future applications of AI to augment the existing ASC mission. In addition to this focus, testing will be conducted to investigate ways the architecture can be applied to Sandia’s traditional modeling and simulation workloads.

“We’re excited to expand our collaboration with Sandia with the deployment of this new Cerebras CS-3 cluster, which consists of four of our 3rd-generation wafer-scale systems,” said Andy Hock, senior vice president of product and strategy at Cerebras. “Building on Sandia’s and Cerebras’ history of record-setting AI and HPC performance and award-winning research, we look forward to seeing how this new, powerful cluster will enable Sandia researchers to uncover new breakthroughs across science, energy, national security and more.”

The system includes Cerebras’ Wafer Scale Engine, which has proven to be a novel alternative to traditional accelerators that have been used for AI. It employs many industry standard semiconductor manufacturing processes also used for general purpose CPUs and accelerators.

Cerebras WSE-3 positioned to show size comparison to that of a dinner plate. Image provided by Cerebras.
Fabrication begins with a dinner-plate-sized piece of silicon, a wafer, which goes through a complex series of manufacturing steps. The process uses optical lithography and other methods to etch a large number of transistors, cores, and individual processor ‘dies’ on the wafer. In a more traditional process, the wafer is chopped up into individual dies that are destined to become smaller packaged processors, like those in a laptop or cell phone. In the Cerebras process, however, the wafer remains intact.

The result is a wafer-scale engine, WSE-3, that contains 900,000 processors built for AI+HPC that are tightly integrated with each other and located close to high-performance on-wafer SRAM memory. This novel approach allows for extremely high-performance compute in a single chip – and even greater in a cluster – with extremely fast communication and high memory bandwidth.

Sandia has initiated several Generative AI projects to develop capabilities for science and engineering use cases in the national security domain. Siva Rajamanickam, PI of the new BANYAN Institute focused on Generative AI, said, “We are excited to have this new system at Sandia. It allows us to evaluate training and finetuning of large multimodal models for our mission. We plan to explore the accuracy and scalability of model training, productivity of model development and understand the energy and power consumption of training workloads.”

This machine arrives as DOE continues to work on its Frontiers in Artificial Intelligence for Science, Security, and Technology initiative. By integrating this state-of-the-art architecture, Sandia not only enhances its current capabilities but also lays the groundwork for pioneering advancements in AI that will support the DOE’s broader strategic objectives.

“The deployment of the Cerebras CS-3 system at Sandia is a significant milestone in our journey to lead in AI and machine learning innovation,” said Jen Gaudioso, director of the ASC Program at Sandia. “This advanced testbed aligns perfectly with the DOE’s FASST initiative, enabling us to explore and develop cutting-edge AI technologies that are crucial for future national security missions.”

Although there is growing excitement for the potential of AI to impact the NNSA mission, modeling and simulation is and will remain critical.

“While CS-3 is designed for AI, Kingfisher will also be used to explore traditional modeling and simulation workloads. Sandia has led a Tri-lab effort under the Advanced Memory Technology program to explore the feasibility of using future versions of the Cerebras Wafer Scale Engine architecture for a combination of Mod-Sim and AI workloads,” said James H. Laros III, distinguished member of technical staff and AMT program lead at Sandia.

The machine was installed in October 2024 and has already started on the path of exploration and innovation, made possible through the important collaborations of national laboratories with industry. Gaudioso stated, “This partnership with Cerebras Systems exemplifies our commitment to pushing the boundaries of what is possible in AI research and development.”


From left to right: Thuc Hoang, Ann Gentile, Andrew Younge, Si Hammond, James Laros, and Kevin Stroup standing beside Kingfisher, the new Cerebras CS-3 system installed at Sandia National Labs. Photo provided by Kevin Stroup.
About NNSA: Established by Congress in 2000, NNSA is a semi-autonomous agency within the U.S. Department of Energy responsible for enhancing national security through the military application of nuclear science. NNSA maintains and enhances the safety, security, and effectiveness of the U.S. nuclear weapons stockpile; works to reduce the global danger from weapons of mass destruction; provides the U.S. Navy with safe and militarily effective nuclear propulsion; and responds to nuclear and radiological emergencies in the United States and abroad.


November 12, 2024

 