
Chinese tech giants reveal how they’re dealing with U.S. chip curbs to stay in the AI race

Daniel Nenni

Admin
Staff member
Key Points
  • Chinese tech giants Tencent and Baidu revealed how they’re staying in the global artificial intelligence race even as the U.S. tightens some curbs on key semiconductors.
  • Methods include stockpiling chips, making AI models more efficient and even using homegrown semiconductors.
  • Washington has continued to restrict China’s access to Nvidia and AMD chips for AI.

Tencent and Baidu, two of China’s largest technology companies, revealed how they’re staying in the global artificial intelligence race even as the U.S. tightens some curbs on key semiconductors.

The companies’ methods include stockpiling chips, making AI models more efficient and even using homegrown semiconductors.

While the administration of U.S. President Donald Trump scrapped one controversial Biden-era chip rule, it still tightened exports of some semiconductors from companies including Nvidia and AMD in April.

Big names in the sector addressed the issue during their latest earnings conference calls.

Martin Lau, president of Tencent — the operator of China’s biggest messaging app WeChat — said his company has a “pretty strong stockpile” of chips that it has previously purchased. He was referring to graphics processing units (GPUs), a type of semiconductor that has become the gold standard for training huge AI models.

These models require vast computing power, supplied by GPUs, to process high volumes of data.

But, Lau said, contrary to American companies’ belief that GPU clusters need to expand to create more advanced AI, Tencent is able to achieve good training results with a smaller group of such chips.

“That actually sort of helped us to look at our existing inventory of high-end chips and say, we should have enough high-end chips to continue our training of models for a few more generations going forward,” Lau said.

Regarding inferencing — the process of actually carrying out an AI task rather than just training — Lau said Tencent is using “software optimization” to improve efficiency, so that the same number of GPUs can be deployed to execute a particular function.

Lau added the company is also looking into using smaller models that don’t require such large computing power. Tencent also said it can make use of custom-designed chips and semiconductors currently available in China.

“I think there are a lot of ways [in] which we can fulfill the expanding and growing inference needs, and we just need to sort of keep exploring these venues and spend probably more time on the software side, rather than just brute force buying GPUs,” Lau said.

Baidu’s approach
Baidu, China’s biggest search company, touted what it calls its “full-stack” capabilities — the combination of its cloud computing infrastructure, AI models and the actual applications based on those models, such as its ERNIE chatbot.

“Even without access to the most advanced chips, our unique full stack AI capabilities enable us to build strong applications and deliver meaningful value,” Dou Shen, president of Baidu’s AI cloud business, said on the company’s earnings call this week.

Baidu also touted software optimization and the ability to bring down the cost of running its models, because it owns much of the technology in that stack. Baidu management also spoke about efficiencies that allow it to get more out of the GPUs it possesses.

“With foundation models driving up the need for a massive computing power, the abilities to build and manage large scale GPU clusters and to utilize GPUs effectively has become key competitive advantages,” Shen said.

The Baidu executive also touted the progress made by domestic Chinese technology firms in AI semiconductors, a move he said would help mitigate the impact of U.S. chip curbs.

“Domestically developed self-sufficient chips, along with [an] increasingly efficient home-grown software stack, will jointly form a strong foundation for long-term innovation in China’s AI ecosystem,” Shen said.

China domestic chip focus
China has been ramping up development of chips designed and manufactured on its home soil for the last few years. Most experts agree that Beijing remains overall behind the U.S. in the realm of GPUs and AI chips, but there have been some advances.

Gaurav Gupta, an analyst covering semiconductors at Gartner, said stockpiling is one way Chinese companies are dealing with export restrictions. Additionally, there has been some progress made in semiconductor technology in China, even if it remains behind the U.S., Gupta added.

“China has also been developing its own domestic semiconductor ecosystem, all the way from materials to equipment to chips and packaging. Different segments have made varying levels of progress, but China has been surprisingly extremely consistent and ambitious in this goal, and one must admit that they have achieved decent success,” Gupta told CNBC by email.

“This provides an avenue for them to procure AI chips, which perhaps can’t compete with those from the U.S. chip leaders but continue to make progress.”

Many U.S. executives have urged Washington to scrap export restrictions in light of China’s progress. Nvidia CEO Jensen Huang called the curbs a “failure” this week, saying they are doing more damage to American businesses than to China.

 
I'm curious to see how far they can go with the 'make AI more efficient' route. Deepseek-R1 already demonstrated an order of magnitude less RAM usage for some use cases with similar or better results than other models. (On the flip side, Nvidia seems to be improving efficiency by >10x per year for energy required per token).

I imagine we're still early enough in the AI cycle there is a lot of room for more efficiency on the software side.
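
For a rough sense of the arithmetic behind the 'more efficient' route, here's a minimal Python sketch. The 70B-parameter model and the precision choices are illustrative assumptions, not figures for DeepSeek or any other specific model:

```python
# Back-of-envelope sketch with illustrative numbers: weight memory scales
# linearly with parameter count and bytes per parameter, which is why
# quantization and smaller/sparser models are the first levers for
# "more efficient AI".

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate GB of accelerator memory needed just to hold the weights."""
    return num_params * bytes_per_param / 1e9

params = 70e9  # hypothetical 70B-parameter model
for label, bytes_pp in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    print(f"{label:>5}: ~{weight_memory_gb(params, bytes_pp):.0f} GB for weights alone")
# fp16: ~140 GB, int8: ~70 GB, 4-bit: ~35 GB -- before counting the KV cache
# and activations, which grow with context length and concurrent users.
```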
 
One main factor is that, for strategic reasons, they can ignore cost/power or at least treat them as secondary considerations, while US companies need to quickly consider scale and business impact.
 

Deepseek was a great accomplishment but AI really is an arms race so why would China allow that to be so public? And what AI advancements have we not heard about? Only the paranoid survive, right?

I remember being read in on top secret compute programs at Lockheed and Lawrence Livermore Labs in the 1980s. I now and again see movies that remind me of the stuff we worked on but for the most part secrets were kept. You know, those big windowless buildings with heavily armed guards in the lobbies? We did not do Zoom briefings back then.

Are secrets still kept or is everything on social media now? Would there even be US semiconductor and AI curbs against China if not for all of the back and forth bragging?
 
There is a book that covers a few interesting classified Lockheed Martin Space Programs of the 80s - revealing information that coworkers/friends of mine never knew about programs they worked on (California based). Once I remember the book I'll reply here - you might find it interesting.

Re: China and AI - There are three arguments I can think of for sharing the sauce on how Deepseek-R1 was made:

1. It gets people outside of China using Deepseek-R1 -- that can benefit China in both "well intended" and "less well intended" ways. FOMO is strong in the AI field.

2. It's a national prestige thing: "look we're doing this better than the West"

3. China may think it can do better in a level playing field than other countries. They can certainly throw resources around (especially energy - no fear of nuclear), and if they have less access to advanced hardware than we do -- lightweight software can negate some of that hardware advantage. By sharing the work, it'll encourage other AI engineers/scientists to go even further, allowing China to benefit from that work too.

AI seems to be (globally) in a spot where the main secrets are about how it's used, not how it works.. but I'm sure there are exceptions.
 
Great answers - I was just getting ready to offer these three points, which line up with your specifics. Why battle in the open in China:

1) PR and negotiation power - China and more specifically DeepSeek have a real brain trust in AI - they want to show it off to demonstrate that Trump’s alleged controls are futile. This means openly showcasing the talent and accomplishments. But they don’t openly show the pitfalls - that DeepSeek’s numbers are somewhat misleading and that the stack they created had to be meticulously hand coded at a low level to work for their exact data center configuration. Just like Huawei/HiSilicon/SMIC claim to have a 3nm process underway, without reference to cost, yield or the limited capacity of the predecessor 5nm and 7nm processes.

2) Velocity - open source and open models (weights) have been the key to most of the market almost catching up with proprietary models.

3) Scale - there would also be a scale problem in China if things weren’t open - a single national-champion model is doomed to failure when solutions have to be diffused throughout many hundreds of companies to achieve maximum value.
 
There is a book that covers a few interesting classified Lockheed Martin Space Programs of the 80s - revealing information that coworkers/friends of mine never knew about programs they worked on (California based). Once I remember the book I'll reply here - you might find it interesting.

This was the Strategic Defense Initiative during the Ronald Reagan era. We called them Star Wars programs. Definitely let me know about the book. Tom Clancy had some good inside sources, we all read his books to reminisce. :cool:
 
Are secrets still kept or is everything on social media now?

One other thought - a lot has changed since I was doing chip and system design. Virtually everything has gone from vertically integrated to ecosystem. When I started at Burroughs doing mainframe design, we were the first generation to outsource the chips (Motorola ECL MCA 1 arrays) and the boards (multiwire). EDA and software OS and apps were still deeply internal. And there were very few standards for connectivity - Burroughs used EBCDIC, not ASCII, and Burroughs mainframes were mainly limited to using Burroughs terminals. No internet or TCP/IP yet, but we could connect to the mainframes in Scotland or SoCal from Paoli PA via proprietary networking.

But we're in a very different ecosystem-based world today. Even IBM, Intel, NVIDIA, Amazon, TSMC and Google survive by being part of ecosystems or creating ecosystems.
 
I'm curious to see how far they can go with the 'make AI more efficient' route. Deepseek-R1 already demonstrated an order of magnitude less RAM usage for some use cases with similar or better results than other models. (On the flip side, Nvidia seems to be improving efficiency by >10x per year for energy required per token).

I imagine we're still early enough in the AI cycle there is a lot of room for more efficiency on the software side.

How many of "AI" people actually took basic computer science?

To me it makes an impression that they missed Memory Management 101
 
How many of "AI" people actually took basic computer science?

To me it makes an impression that they missed Memory Management 101
I think you're being the simpleton here - Memory Management 101 isn't applicable to GPUs/TPUs running inference that requires 1/4 T parameters of the model to be in GPU/TPU memory at one time. There's no notion of single-stream locality to leverage - it's massively parallel, so a very different kind of system-level memory management is required: supporting in-memory models, sharing of models across users, distribution of models across GPUs/TPUs, and the provisioning and swapping of models in and out. Plus the best memory management approach for a convolutional neural network model can be very different from the best approach for transformer models. I think you need to take AI Memory Management 101.


But to understand that, you have to understand how transformer models work for LLMs, plus how to optimize them for speed and latency (as DeepSeek did), using multi-head latent attention (MLA) with disaggregated prefill and decode.
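
As a rough illustration of why MLA matters for inference memory, here's a toy Python comparison of per-token KV-cache size for standard multi-head attention versus an MLA-style compressed latent cache. The layer count, head shape and latent sizes are illustrative assumptions, not any specific model's configuration:

```python
# Toy comparison with an illustrative model shape: per-token KV-cache footprint
# of standard multi-head attention versus an MLA-style compressed latent cache.

def kv_bytes_per_token_mha(layers, heads, head_dim, bytes_per_val=2):
    # Standard attention caches one key and one value vector per head, per layer.
    return layers * 2 * heads * head_dim * bytes_per_val

def kv_bytes_per_token_mla(layers, latent_dim, rope_dim, bytes_per_val=2):
    # An MLA-style cache keeps a single compressed latent (plus a small
    # positional key) per layer instead of full per-head keys and values.
    return layers * (latent_dim + rope_dim) * bytes_per_val

L, H, D = 60, 128, 128  # hypothetical layer count, head count, head dimension
mha = kv_bytes_per_token_mha(L, H, D)
mla = kv_bytes_per_token_mla(L, latent_dim=512, rope_dim=64)  # assumed sizes
print(f"MHA cache: {mha / 1024:6.0f} KiB per token")
print(f"MLA cache: {mla / 1024:6.0f} KiB per token (~{mha / mla:.0f}x smaller)")
# Multiply by context length and concurrent users to see why the cache, not
# the weights, often dominates inference memory.
```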

 


And thus they should have found an algorithmic way to reduce the need for memory locality - I don't believe there is no way to do that.
 
Not sure if you are suggesting "they should have found an algorithmic way to reduce the need for memory locality," or to reduce the need for local memory. The first implies finding approaches very different from Memory Management 101's demand-paged memory management, which is premised on serial execution and memory locality of code and data. That's happening with DeepSeek, vLLM and NVIDIA Dynamo. The second is an impossibility, given that parameter memory is integral to the datapath engines created for LLM inference.
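
To make the "reduce the need for locality" point concrete, here's a toy Python sketch of the block-table idea behind paged KV caches, as popularized by vLLM's PagedAttention. It's a simplified illustration under my own naming, not vLLM's actual API:

```python
# Toy sketch of a paged KV cache: a request's logical token order lives in a
# small per-request block table, while the physical blocks can sit anywhere in
# GPU memory -- no contiguity or locality required.

BLOCK_TOKENS = 16  # tokens per physical KV block (illustrative)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}                      # request -> list of block ids
        self.token_counts = {}                      # request -> tokens cached

    def append_token(self, request):
        """Reserve cache space for one new token; return (block id, offset)."""
        count = self.token_counts.get(request, 0)
        table = self.block_tables.setdefault(request, [])
        if count % BLOCK_TOKENS == 0:             # last block full (or first token)
            table.append(self.free_blocks.pop())  # any free physical block will do
        self.token_counts[request] = count + 1
        return table[count // BLOCK_TOKENS], count % BLOCK_TOKENS

    def release(self, request):
        """Return a finished request's blocks to the pool for immediate reuse."""
        self.free_blocks.extend(self.block_tables.pop(request, []))
        self.token_counts.pop(request, None)

cache = PagedKVCache(num_blocks=1024)
for _ in range(40):                  # a 40-token request spans 3 blocks
    cache.append_token("req-0")
print(cache.block_tables["req-0"])   # non-contiguous physical blocks are fine
cache.release("req-0")
```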
 
How many of "AI" people actually took basic computer science?

To me it makes an impression that they missed Memory Management 101
lol - you're 100% spot on.

Mini-rant: It's been interesting watching software developers find new ways to be less efficient over the decades. Java made a Pentium 3 look very slow at a time when that chip was crushing most software, and Python brings the responsiveness of 8-bit machines to modern PCs...
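
To put a rough number on that interpreter overhead, here's a small Python sketch comparing an element-by-element loop with the same reduction pushed into a compiled NumPy kernel; exact timings will vary by machine and library versions:

```python
# Rough illustration of interpreter overhead: the same sum done element by
# element in pure Python versus in a single call to an optimized C loop.

import time
import numpy as np

data = np.random.rand(10_000_000)

t0 = time.perf_counter()
total = 0.0
for x in data:              # every iteration pays Python's dispatch cost
    total += x
t1 = time.perf_counter()

t2 = time.perf_counter()
total_vec = data.sum()      # one call into an optimized compiled kernel
t3 = time.perf_counter()

print(f"pure Python loop: {t1 - t0:.2f} s")
print(f"numpy sum:        {t3 - t2:.4f} s")
# The gap is typically one to two orders of magnitude -- the same reason AI
# frameworks keep hot loops in fused GPU kernels rather than in Python itself.
```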
 
Confession - I despise memory-safe languages, which use runtime services to manage memory for applications by allocation and deallocation out of heaps, as opposed to more optimized data structures for individual applications, and use garbage collection to gather up no longer needed memory spaces and return them to the heaps. These memory management services add serialization and processing overhead. And it's not just programming languages for non-CS experts like Python and Java, it's the latest mainstream performance languages with parallel processing constructs like Go and Rust. 🤮 The argument is that compute and memory resources are a lot cheaper than humans debugging memory corruption problems. Yeah, I'm a dinosaur. Applications should be written to manage their own memory.
 
This is from an MIT lecture:

[attached image: 1749132037690.png]
 