$5 million total is false. It's obvious the people who say it's true know little to nothing about AIDL or software engineering. It's like telling a non-runner someone ran a 2-minute 5k. The non-runner is oblivious.
A well trained pretrained model (the P in GPT for example) alone costs significantly more than $5 million in R&D. For the deluded people out there, that is just the next word predictor, the model before the model. A multilayered feed forward neural net outputs a probability distribution of the words in its vocabulary based on the previous words. This type of model is not new, has been around for decades, and easily surpasses that fictitious $5 million number.
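To make the "next word predictor" description concrete, here is a minimal sketch of a feedforward net that turns the previous words into a probability distribution over a vocabulary. The toy vocabulary, layer sizes, and random untrained weights are all assumptions made up for illustration; real models like GPT use transformer layers and learned weights, so treat this as a picture of the idea only.

```python
import math
import random

# Hypothetical toy vocabulary and sizes, chosen only for illustration.
VOCAB = ["the", "runner", "ran", "a", "fast", "5k"]
EMBED_DIM, HIDDEN_DIM, CONTEXT = 4, 8, 2

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def rand_matrix(rows, cols):
    """Random weights standing in for parameters a real model would learn."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


embeddings = rand_matrix(len(VOCAB), EMBED_DIM)
w_hidden = rand_matrix(HIDDEN_DIM, EMBED_DIM * CONTEXT)
w_out = rand_matrix(len(VOCAB), HIDDEN_DIM)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(previous_words):
    """Map the last CONTEXT words to a probability for every vocabulary word."""
    context = previous_words[-CONTEXT:]
    x = [v for word in context for v in embeddings[VOCAB.index(word)]]
    hidden = [math.tanh(h) for h in matvec(w_hidden, x)]
    return dict(zip(VOCAB, softmax(matvec(w_out, hidden))))


probs = next_word_distribution(["the", "runner"])
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:>7}: {p:.3f}")
```

With untrained weights the probabilities are meaningless; the expensive part is the training that makes them useful.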
That Ben Thompson DeepSeek FAQ (post #43 above) addresses that:
Ben Thompson wrote:
I’m not sure I understood any of that.
The key implications of these breakthroughs — and the part you need to understand — only became apparent with V3, which added a new approach to load balancing (further reducing communications overhead) and multi-token prediction in training (further densifying each training step, again reducing overhead): V3 was shockingly cheap to train. DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.
That seems impossibly low.
DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:
Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre-training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.
So no, you can’t replicate DeepSeek the company for $5.576 million.
I still don’t believe that number.
Actually, the burden of proof is on the doubters, at least once you understand the V3 architecture. Remember that bit about DeepSeekMoE: V3 has 671 billion parameters, but only 37 billion parameters in the active expert are computed per token; this equates to 333.3 billion FLOPs of compute per token. Here I should mention another DeepSeek innovation: while parameters were stored with BF16 or FP32 precision, they were reduced to FP8 precision for calculations; 2048 H800 GPUs have a capacity of 3.97 exoflops, i.e. 3.97 billion billion FLOPS. The training set, meanwhile, consisted of 14.8 trillion tokens; once you do all of the math it becomes apparent that 2.8 million H800 hours is sufficient for training V3. Again, this was just the final run, not the total cost, but it’s a plausible number.
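The "do all of the math" step in the quoted FAQ can be sketched roughly as follows. Every input is a figure quoted above and taken at face value, including the assumption that 333.3 billion FLOPs per token covers the whole training cost per token and that 3.97 exaFLOPS is the cluster's peak rate. The "implied utilization" is a derived number for this sanity check, not something DeepSeek or the FAQ reports.

```python
# Figures quoted in the FAQ above, taken at face value.
FLOPS_PER_TOKEN = 333.3e9      # compute per token
TRAINING_TOKENS = 14.8e12      # size of the training set
CLUSTER_PEAK_FLOPS = 3.97e18   # quoted capacity of 2048 H800 GPUs
NUM_GPUS = 2048
REPORTED_GPU_HOURS = 2.788e6   # DeepSeek's claimed full-training figure

# Total compute needed for one pass over the training set.
total_flops = FLOPS_PER_TOKEN * TRAINING_TOKENS

# Time the cluster would need if it ran at peak the whole time.
ideal_seconds = total_flops / CLUSTER_PEAK_FLOPS
ideal_gpu_hours = ideal_seconds / 3600 * NUM_GPUS

# Fraction of peak the cluster must sustain to finish in the reported hours.
implied_utilization = ideal_gpu_hours / REPORTED_GPU_HOURS

print(f"Total training compute: {total_flops:.3e} FLOPs")
print(f"GPU hours at peak rate: {ideal_gpu_hours:,.0f}")
print(f"Implied utilization:    {implied_utilization:.1%}")
```

On these inputs the run needs about 0.7 million GPU hours at peak, so the reported 2.788 million hours implies roughly a quarter of peak throughput. That is why the figure is plausible for the final run rather than impossibly low.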
Thank you for sharing that. It's an interesting read on where the $5 million is from.
DeepSeek is clear that these costs are only for the final training run
The metric is captivating, one cherry picked for a good go-to-market story or PR stunt, which worked very well.
The metric is basically equal to the compute cost of running a lot of GPUs for 2 months nonstop. But training is not a one-off activity, so it excludes initial training, fine-tuning, and retraining. Many day-to-day costs are not counted either, like R&D, data collection/prep, even other essential parts of compute, and all the other activities needed to develop AI software, none of which are captured in the headlines.
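As a quick check on the "2 months nonstop" framing, the headline number can be reproduced from the GPU-hour figures in the V3 paper excerpt quoted earlier in the thread. This is only a sketch of that arithmetic: the $2 per GPU hour rental price is DeepSeek's own assumption, and the wall-clock estimate assumes all 2048 GPUs run the whole time.

```python
# GPU-hour figures from the V3 paper excerpt quoted earlier in the thread.
PRETRAINING_GPU_HOURS = 2_664_000
CONTEXT_EXTENSION_GPU_HOURS = 119_000
POST_TRAINING_GPU_HOURS = 5_000

PRICE_PER_GPU_HOUR = 2.0  # USD, the rental price DeepSeek assumes
NUM_GPUS = 2048           # cluster size stated in the paper

total_gpu_hours = (
    PRETRAINING_GPU_HOURS
    + CONTEXT_EXTENSION_GPU_HOURS
    + POST_TRAINING_GPU_HOURS
)
total_cost = total_gpu_hours * PRICE_PER_GPU_HOUR

# Wall-clock time if every GPU in the cluster runs continuously.
wall_clock_days = total_gpu_hours / NUM_GPUS / 24

print(f"Total GPU hours: {total_gpu_hours:,}")
print(f"Rental cost:     ${total_cost:,.0f}")
print(f"Wall-clock time: {wall_clock_days:.1f} days on {NUM_GPUS} GPUs")
```

This gives 2,788,000 GPU hours and $5,576,000, or a little under 57 days on the cluster. It covers rented compute for one run and nothing else.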
What is really impressive though is that it's apparently a high-quality LLM trained on GPUs (H800) with roughly half the interconnect bandwidth of the top-of-the-line chips (H100). It also looks like there is a major breakthrough with Multi-Head Latent Attention.
To help put that $5 million into perspective, here's an article describing a bit of what is misleading about it from INSAIT, Institute for CS, AI and Tech.
2. Cost of training (compute): the $5-6M cost of training is misleading. It comes from the claim that 2048 H800 cards were used for *one* training, which at market prices is upwards of $5-6M. Developing such a model, however, requires running this training, or some variation of it, many times, and also many other experiments (item 3 below). That makes the cost to be many times above that, not to mention data collection and other things, a process which can be very expensive (why? item 4 below). Also, 2048 H800 cost between $50-100M. The company that deals with DC is owned by a large Chinese investment fund, where there are many times more GPUs than 2048 H800.
DeepSeek is cool and nice. But GPUs spend most of their cycles moving data about. We need Electronic Neurons to make things really fast. No commercial chip maker has them. Rumors say Japan Mil has them but won't confirm or answer.
What is Japan Mil? I want to be speculative. Link?
It’s especially bad for Nvidia since it implies far less demand for their GPUs. I guarantee you that every single FAANG company has contacted DeepSeek scientists with sight-unseen, seven-figure contracts. Knowledge of their methods will spread. Quickly. Reducing demand.
It’s bad for FAANG because training these models will become feasible for MANY more companies once knowledge becomes widespread.
It might eventually be… really really bad once unaligned models get created and people use them to engineer Very Bad Things a la 12 Monkeys, or Iran uses them to build an H-Bomb.
Right now it’s bad for some companies. And it’s one big step towards the great filter.
1) Why are they making this open source? I know the CEO says that helps him attract the best talent but I'm not necessarily buying that and even if that's true, couldn't he still find enough employees to design this if it wasn't open source?
2) Why does Meta want AI also to be open source?
3) Why do the others not want it to be open source?
I downloaded DeepSeek and asked it to describe what happened in Tiananmen Square in 1989. Response: Sorry that's beyond my current scope. Let's talk about something else.
Is Hu Jintao still alive? Response: Sorry that's beyond my current scope. Let's talk about something else. I won't go into its responses to queries about the Uyghurs and Xinjiang.
This is something I've always wondered. How can a Chinese AI be worth anything if it doesn't have the ability to access all parts of the Internet and give answers to questions asked of it?
... but smaller orgs can still find room to innovate and compete
That's the Open Source spirit and philosophy at work. The Japan Greater East Asian Co-Prosperity Sphere from the 1890s passed out free Physics, Chemistry, English/German Dictionary, etc. translated into Chinese, Thai, Korean, etc. to China and East Asian countries occupied by US and Euro imperialists. Now look at what Japan has created in East Asia, single-handedly and without any external helpers: the Asian Economic Miracle. The Open Source idea is not Japanese; ancient China, Korea, and India had the idea before.
I downloaded DeepSeek and asked it to describe what happened in Tiananmen Square in 1989. Response: Sorry that's beyond my current scope. Let's talk about something else.
Is Hu Jintao still alive? Response: Sorry that's beyond my current scope. Let's talk about something else. I won't go into its responses to queries about the Uyghurs and Xinjiang.
This is something I've always wondered. How can a Chinese AI be worth anything if it doesn't have the ability to access all parts of the Internet and give answers to questions asked of it?
Deepseek ... DELETE
We have forbidden topics too. Ask an American AI model about race and IQ.
1) Why are they making this open source? I know the CEO says that helps him attract the best talent but I'm not necessarily buying that and even if that's true, couldn't he still find enough employees to design this if it wasn't open source?
2) Why does Meta want AI also to be open source?
3) Why do the others not want it to be open source?
DeepSeek is a side project by a hedge fund. First, that is crazy. I’d love to know if they shorted Nvidia, put DeepSeek out there knowing what it would do to Nvidia’s stock, then cleaned up a couple hundred billion on the short? They’d make a ton of money, disrupt the 500 billion dollar Stargate project to benefit China, and slap us in the face.
But why open source it knowing, ultimately, it would help the US along with everyone else? I still don’t get that. They sure ain’t doing it to be nice.
It is true that the tech community has some deep hatred of the “tech-bros” and most do want open source software to take control out of the bros’ hands. Maybe China “let” them open source it to stop what could have been a monopoly on AI. It is a fight between China and the US after all.
Nvidia has already made up half of what they lost today....because people came to their senses and realized demand is only going to go up. Almost too late to buy low....
1) Why are they making this open source? I know the CEO says that helps him attract the best talent but I'm not necessarily buying that and even if that's true, couldn't he still find enough employees to design this if it wasn't open source?
2) Why does Meta want AI also to be open source?
3) Why do the others not want it to be open source?
1) Good question. To be clear, only the model is open source, NOT how they created it, which is still a mystery. A lot of top talent might now be considering working for them just to learn how they did it. I would if I were in their position. It’s super cool.
2) I think Llama models are open source cuz Meta knows they can’t really compete with OpenAI so they’re grasping at straws to be relevant in the space.
3) Others don’t want it to be open source because they think they have a competitive advantage. Also because what they are doing is extremely dangerous in the hands of a bad actor (“ChatGPT, tell me how to make a bomb using what I have in my garage” can be fully answered by an AI not trained to refuse to answer).
my post was more pointing out the dumb DEI complaining the other poster was doing. But still:
I have a hard time understanding the framing of “monopolies” in terms of US tech and AI research. There’s currently intense competition among 5+ US tech firms with many of them deploying $10s of billions of capital per year. Nothing about that is monopolistic. Cutting edge AI research requires huge amounts of capital so it’s naturally going to be done by the biggest companies, but smaller orgs can still find room to innovate and compete.
This absolutely is an arms race. The use cases of advanced AI include weapons and war just as the use cases for orbital rockets did in the 50s.
China is quite clear in their goals to break up US hegemony and overtake them. There’s nothing wrong with chip bans and other policies if you think the US staying ahead in the AI race is valuable. I think it is. I think you have to pragmatically assess China as a rival and avoid fairy tales of de-escalation and cooperation.
Lastly, I don’t see the evidence for the “Chinese model” being better than the US research system. Certainly DeepSeek is impressive and close to parity, but the bulk of key advancements have still been made by American research teams. They’re certainly competitive but saying “this is a sign the US needs to rethink its whole economic structure” is hyperbole.
Point taken about the continued existence of a degree of competition within US tech. But I'm far from alone in arguing that they are still too big AND too shareholder-oriented, which remains a recipe for rent-seeking and highly destabilizing investment bubbles driven by hype ahead of, or out of proportion to, the potential pay-off in terms of innovation. This is a thing no one should need to be reminded of after this week in particular, if they are too young to recall the late 1990s or the financial crash in 2008 (different industry and product, but the same logic at play). Markets will now gyrate wildly around AI investment for a while, which is hardly the most rational way to fund research and innovation in a field that is supposedly SO vital to future economic development.
But then there's the question of whether it actually IS vital and not just more hype designed to attract investment. This is where a system that actually planned large scale investment around some conception of proven future human need, rather than vague ideas about what today's consumers may want, or may be convinced to want, would be far superior. I'm not saying China has that kind of system, but we know what they have is far closer to it than what the US currently has. In short, they don't have a class of massively wealthy private investors interested primarily in short term gains dictating large scale industrial investment from month to month (and sometimes from minute to minute!).
And my point about an "arms race" is that today, unlike the 1950s, no one is under the illusion that actual warfare between two global economic powers is desirable, or even thinkable, really. Even proxy wars using modern technology are unimaginably horrific and feared by any reasonable person today. You're undoubtedly right that AI will have significant military applications, but this will have very little to do with actually planning for war and everything to do with these giant companies rent-seeking through lucrative government contracts. Just like automobile and aeronautics companies before them, these companies just want fat cost-plus contracts in perpetuity, and probably fear the actual use of the weapons they help create as much as any of us. And of course the point is that this amounts to a very wasteful form of international competition, when it isn't actually catastrophic for settled human life.
Good post. I’m less confident that tensions between China and USA will avoid a hot war in the next 20 years. I hope you’re right…
1) Why are they making this open source? I know the CEO says that helps him attract the best talent but I'm not necessarily buying that and even if that's true, couldn't he still find enough employees to design this if it wasn't open source?
2) Why does Meta want AI also to be open source?
3) Why do the others not want it to be open source?
I think companies have decided on two main strategies:
1) build the best model, make money by selling it to everyone else to build products on top of. This assumes that someone can train a much better model than anyone else with suitable expertise and capital. Model is the moat. Needs to be kept secret.
2) build products using the best AI model. If you can train a “good enough” model and give it away free, it will hurt your competitors who want to charge for their model. Instead, you sell products that use the model. “Wrappers”
I’m not sure DeepSeek wants to do 2 but giving away their model also hurts their competitors. Likely they realize they’re going to have a hard time monetizing their model in the West so sowing chaos is the next best option.
Because it cut into the profits. The stock market has built up an AI bubble in the last couple years. All of a sudden we are facing a reality check, one dollar is now worth a dime
I still think Trump coin is a better investment than AI.
As I expected, the Nasdaq gained back most of its losses today. People had time to think about what we're really expecting from this AI stuff.
It has impressed many with its ability to assemble words into coherent-seeming articles that upon inspection are flat-out wrong.
It can draw pictures approximating what a person tries to tell it to draw, but they are soulless clutter, compared to what an average comic artist could generate in the same few seconds.
I have a hunch this is all kabuki theater to ramp up the hype. But at the end of the day, Nvidia still makes the best GPUs, and they were always for frivolous purposes (games) anyhow.
As I expected, the Nasdaq gained back most of its losses today. People had time to think about what we're really expecting from this AI stuff.
It has impressed many with its ability to assemble words into coherent-seeming articles that upon inspection are flat-out wrong.
It can draw pictures approximating what a person tries to tell it to draw, but they are soulless clutter, compared to what an average comic artist could generate in the same few seconds.
I have a hunch this is all kabuki theater to ramp up the hype. But at the end of the day, Nvidia still makes the best GPUs, and they were always for frivolous purposes (games) anyhow.
Bad take bro. AI makes mistakes. So do humans. Both are useful.
Google was largely Sun Microsystems people from Stanford and Cal when it started. Most of Solaris & SunOS was open source. The CPUs then were 1/1000 the speed, and only one thread. Yet Suns would run for years w/o having to be rebooted. You only powered down to add mem or h/w boards.
1) Why are they making this open source? I know the CEO says that helps him attract the best talent but I'm not necessarily buying that and even if that's true, couldn't he still find enough employees to design this if it wasn't open source?
2) Why does Meta want AI also to be open source?
3) Why do the others not want it to be open source?
OpenAI wants to sell enterprise subscriptions with Oracle and Microsoft using ASI and allow for mass layoffs. Many AI researchers see this and want to level the playing field and commoditize their complements so as not to have competition in their main sectors.
For DeepSeek, the parent company High-Flyer has been investing in compute for high-frequency trading using machine-learning-based strategies and scaling its supercomputers. From 2018 to 2024 it outperformed the CSI 300 (Chinese: 沪深300), a capitalization-weighted stock market index designed to replicate the performance of the top 300 stocks traded on the Shanghai Stock Exchange and the Shenzhen Stock Exchange. DeepSeek began as a subsidiary research organization that the CEO likens to DeepMind (CEO Demis Hassabis, parent company Google; AlphaGo, AlphaZero, AlphaFold, robotics, and deep reinforcement learning research broadly).
Meta has Facebook, Instagram, Oculus, and more that stand to benefit from powerful AI for consumer use, but also for things like ad targeting. Meta will thus use the research spoils in its high-volume plays, weaken rising competitors, and establish a larger market share in AI as it integrates AI vertically throughout its ecosystems. Open source releases also drive the interest of talented developers who want to have a larger impact (who doesn't want to be the Linus Torvalds or Dennis Ritchie of their era?) and from time to time create tooling for the developer environment, like PyTorch. Thus Llama 1, 2, and 3 were dropped. OpenAI, Microsoft, and Oracle need it to be proprietary so that they can add you to their subscription pool. Meta and DeepSeek don't care and can supply the compute.
If Deepseek is truly open-source, how can developers verify and adapt it safely without privacy risks?
You clone the repository from github.com in San Francisco or huggingface.com in Brooklyn NYC. Make your changes. Check in your changes. DeepSeek will then look at your changes and accept them or not. In 2018, Microsoft bought GitHub, so many in the Open Source Community switched to little-known HuggingFace.com.
As I expected, the Nasdaq gained back most of its losses today. People had time to think about what we're really expecting from this AI stuff.
It has impressed many with its ability to assemble words into coherent-seeming articles that upon inspection are flat-out wrong.
It can draw pictures approximating what a person tries to tell it to draw, but they are soulless clutter, compared to what an average comic artist could generate in the same few seconds.
I have a hunch this is all kabuki theater to ramp up the hype. But at the end of the day, Nvidia still makes the best GPUs, and they were always for frivolous purposes (games) anyhow.
Bad take bro. AI makes mistakes. So do humans. Both are useful.
You know nothing. You are among the few posters here more useless than an AI.
No one wants you around, and you're still too dumb to realize it. Been around long enough that you must be at least near 30, so no real excuse there. Shoulda grown up by now.
I've been programming since probably before you were born. AI is nothing, just a fad, a marketing gimmick. It's been around for decades, and all honest programmers know what it can't do.