
Will the Chinese DeepSeek AI upset the AI/ML race?

Arthur Hanson

Well-known member
The Chinese claim their DeepSeek AI runs on Nvidia's lower-cost H800 chip, the variant built for the China market. It looks like they used Meta's Llama and GPT-4 to develop this software, which would be a terms-of-service violation. The Chinese developed this software for a reported cost of only around $6 million. Sam Altman said this could be a threat to the current ecosystems. The Chinese say this is open-source software. I pulled this information from CNBC today. Any thoughts or comments would be appreciated.
 
Two things that jump out at me:
* Their paper on their new MoE (mixture-of-experts) model compares against leading dense models like OpenAI's GPT-4 or Llama, not against other MoE models like Mistral's Mixtral. I suspect there are other, more reasoning-focused benchmarks where GPT-4o would greatly outpace DeepSeek's model.
* It seems like they employ a bunch of techniques that could be rolled back into almost any company's similar models, which would be good for the power-usage profile of the entire industry.

https://www.tomshardware.com/tech-i...ptimizations-highlight-limits-of-us-sanctions
 
Mixture-of-experts models are really interesting. There is an analogy to chiplets: instead of doing everything on a single monolithic chip, you break off different functions into different parts. It's also similar to how our own nervous system works, where specialized nerves and neurons have different functions.
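To make the "specialists" analogy concrete, here is a minimal sketch of MoE top-k routing: a small router picks a couple of expert sub-networks per token and mixes their outputs, so only a fraction of the parameters run per token. All names and sizes are illustrative, not DeepSeek's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, experts, router_w, k=2):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                      # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -k:]  # indices of the top-k experts
    # softmax over just the selected experts' logits -> mixing weights
    sel = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(sel) / np.exp(sel).sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                # mix only the chosen experts
        for j in range(k):
            e = top[t, j]
            out[t] += gates[t, j] * (x[t] @ experts[e])
    return out

d_model, n_experts, tokens = 16, 8, 4
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.1
y = moe_layer(rng.standard_normal((tokens, d_model)), experts, router_w)
print(y.shape)  # (4, 16) -- only 2 of 8 experts ran per token
```

The compute saving is the whole point: with k=2 of 8 experts, each token touches a quarter of the expert parameters a dense layer of the same total size would.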
 
Our son's quick assessment, based on the DeepMind AI. Rest assured that Nvidia will leverage all they can from this new open-source model.

 
With the stock market blow-up on this, any additional comments on its future or who the competition will be would be greatly appreciated. Thanks.
 
Competition is what technology is all about, absolutely...




Jan 27 (Reuters) - Chinese startup DeepSeek was on Monday hit by outages on its website after its AI assistant became the top-rated free application available on Apple's App Store in the United States.

The company resolved issues relating to its application programming interface and users' inability to log in to the website, according to its status page. The outages on Monday were the company's longest in around 90 days and coincided with its skyrocketing popularity.

 

What is DeepSeek and why is it disrupting the AI sector?

January 27, 2025, 6:17 AM PST


Deepseek logo and the Chinese flag are seen in this illustration taken January 27, 2025. REUTERS/Dado Ruvic/Illustration

BEIJING, Jan 27 (Reuters) - Chinese startup DeepSeek's launch of its latest AI models, which it says are on a par with or better than industry-leading models in the United States at a fraction of the cost, is threatening to upset the technology world order.

The company has attracted attention in global AI circles after writing in a paper last month that the training of DeepSeek-V3 required less than $6 million worth of computing power from Nvidia H800 chips.

 

Hard to put a finger on the exact long-term impact. Mistral launched Mixtral, a similar family of open MoE models (as opposed to fully dense models), which gave pretty amazing quality of results about a year ago, though they didn't do the software/hardware optimizations for training on (somewhat) limited resources that DeepSeek did. Not sure why Mixtral hasn't caught on more.

One other thing I notice: the market is making an apples-to-oranges training-cost comparison between DeepSeek and OpenAI/ChatGPT or Llama. We typically see two costs quoted - the total cost of development (experimentation/tuning/final "production") and the training cost of the last production run. The development cost for something like the Llama 3.0/3.1 family is $1B (maybe $700M in GPU), but the final production training might only be $50M in GPU using a standard platform, as compared to roughly $6M for DeepSeek's highly optimized, model-specific training platform. Still an impressive reduction, but not as monumental as some seem to think. But we'll have to see if this forces Trump and Altman to rethink Stargate (which was originally a Microsoft concept in early 2024).

 
I think this also suggests that it is possible to host high-quality models on the edge. I did a rough calculation, and the potential savings could be significant. This raises the question of whether we truly need substantial capital spending, especially if test-time compute—currently the focus—can be performed on the edge. If so, the need for extensive infrastructure build-out would be reduced.

I am not a fan of the Stargate project. I suspect that, in the interest of speed and cost-effectiveness, they might find it logical to construct fossil fuel-based generation to power the data centers. However, I find it absurd that we are planning to expend such an enormous amount of energy and carbon emissions on trivial activities, such as planning holidays, dinner arrangements, etc. Anyway, that’s just my opinion.
 
Yeah, I think there is something to the idea that model training could be in the cloud and model execution could be on the edge. I mean, isn't this the point of the AI PC that is currently being hyped up? And if model execution is on the edge, it substantially decreases the amount of compute required in the cloud.

It will require a lot of memory to run on the edge... but the case could be made for memory prices to drop by 5x over the next 5-10 years in keeping with historic norms.
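For a sense of how much memory edge execution actually needs, here is a rough weights-only estimate (model sizes and bit widths are illustrative; activations and KV cache add more on top):

```python
def model_memory_gb(n_params_billion, bits_per_weight):
    """Approximate weight memory in GB for a model of the given size."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params_b in (7, 70):
    for bits in (16, 8, 4):
        gb = model_memory_gb(params_b, bits)
        print(f"{params_b}B params @ {bits}-bit: ~{gb:.1f} GB of weights")
# A 7B model at 4-bit (~3.5 GB) already fits in a laptop's RAM today;
# a 70B model at 4-bit (~35 GB) still wants a workstation.
```

If memory prices do fall several-fold over the decade, the workstation-class numbers slide down into consumer territory, which is the crux of the edge-execution argument.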
 


I was not so thrilled, though, with DeepSeek's performance relative to ChatGPT. In particular, it will often give answers related to some events involving China with a political spin. On some technical topics as well, not so accurate.

[Screenshot: DeepSeek declines to discuss Tiananmen Square]

[Screenshot: DeepSeek's answer about the Covid origin in Wuhan]
 
I haven't tried it, but I think you might be able to force a model to output certain information by providing your own context. However, it depends on how they implemented instructions and preferences. These are technical challenges in themselves, adding another layer of difficulty that teams working outside of China do not need to deal with.

For R1, what’s valuable is the reasoning capability, which allows you to build something on top of it.
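The "provide your own context" idea can be sketched as simple prompt assembly: prepend trusted source text to the question so the model answers from that context rather than its built-in preferences. The template below is hypothetical, not any vendor's actual API, and whether it overrides a model's tuning depends on how that tuning was implemented.

```python
def build_prompt(context: str, question: str) -> str:
    """Assemble a context-grounded prompt (hypothetical template)."""
    return (
        "Answer using ONLY the context below. "
        "If the context does not cover the question, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    context="(paste an excerpt from a source you trust here)",
    question="What happened on this date?",
)
print(prompt)
```

This is the same mechanism retrieval-augmented generation (RAG) systems use; the open question the post raises is whether the model's instruction tuning lets the supplied context win.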
 
I can't reason with it; it clearly prefers the opportunity for propaganda.

[Screenshot: DeepSeek question guidance]
 
For quantised models, it should be fine. Refer to my video:
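For readers unfamiliar with the term, quantisation stores weights at lower precision to shrink memory. A minimal sketch of symmetric int8 weight quantisation (one scale per tensor; real schemes use per-channel or per-group scales):

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 plus a single scale factor."""
    scale = np.abs(w).max() / 127.0        # largest weight maps to +/-127
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).standard_normal(1024).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print("memory ratio:", w.nbytes / q.nbytes)        # 4.0 vs float32
print("max abs error:", float(np.abs(w - w_hat).max()))
```

The round-trip error is bounded by half the scale step, which is why 8-bit (and often 4-bit) weights cost so little accuracy relative to the 4x-8x memory saving that makes edge deployment plausible.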
 

Intel's former CEO says the market is getting DeepSeek wrong after AI chip stock rout



Pat Gelsinger
@PGelsinger

Wisdom is learning the lessons we thought we already knew. DeepSeek reminds us of three important learnings from computing history:

1) Computing obeys the gas law. Making it dramatically cheaper will expand the market for it. The markets are getting it wrong; this will make AI much more broadly deployed.
2) Engineering is about constraints. The Chinese engineers had limited resources, and they had to find creative solutions.
3) Open wins. DeepSeek will help reset the increasingly closed world of foundational AI model work.

Thank you DeepSeek team.


 