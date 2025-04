LukeTbk said: The total cost of everything they did before that final training run (pure electricity/GPU-rental market price of 6 million) was never claimed to be cheap.

Right. I went straight to the article's cited 'extensive analysis'. The analysis, too, merely tries to make a point of this.

The analysis:
- Posits that they also had access to tens of thousands of H100s, without any evidence. It doesn't even mention the Singapore speculation, literally just expecting the reader to go along with claims of 'we believe' while showing a table of entirely speculated GPU expenditure. The WCCTech article just rides along, uncritically, showing the same table and claims.
- Correctly points out that H800s are no longer officially available for purchase from China. The US put up this restriction around early 2024. Of course, this means nothing for already purchased units.
- Combines the total infrastructure costs as if it's some gotcha, when that was never part of the originally claimed training figure for v3.
- Fails to offer any technical rebuttal to Deepseek's claim of using a lower-level programming language to get higher bandwidth out of their H800 arrays (not possible with the standard CUDA that competing companies use). Instead, the article ignores this key point throughout.
- Shows a chart of OpenAI and Meta lowering their API pricing over time, which is not like-for-like, since the relevant news on Deepseek pricing is about self-hosted cost. They could have made a chart of only open-weight models and their cost to run over time, but it probably wouldn't have illustrated the point they were trying to make (which is that OpenAI has lowered its pricing by n factor in the past, so it's not 'news', a point that Anthropic's CEO argued in a blog post the article also cites near the end).
- Claims Deepseek is being misleading about comparative performance against OpenAI's o1, basing this on a single chart in their whitepaper and saying they didn't show benchmark results where R1 didn't achieve the highest score. This is misleading on the critic's part: Deepseek's GitHub readme has a more extensive chart of comparisons, where R1 isn't always the highest scoring (OpenAI's o1 is in fact shown to top 1/4 of the tests), but the analysis doesn't mention this.
- Segues into OpenAI's provided o3 benchmarks to show higher results than o1 or R1. However, the author doesn't mention that o3 hadn't been released at the time of the paper they're discussing, nor that its primary model still hasn't been released even now. I'm surprised the author thinks this is a relevant point; it should be obvious to a writer that it diminishes any benefit of the doubt that the analysis is even-handed.
- Brings up Google's Gemini 2.0 Flash Thinking as an example of a reasoning model that didn't make the same big news splash as R1 despite being capable (of course ignoring that it is not open weight, and ignoring the efficiency claims of R1 that the article is meant to be debunking). They then inexplicably compare Deepseek v3 to Gemini 1.5 Pro in a price-to-performance graph, neither of which is among the models being discussed.
- Buries the crux of its weakly substantiated argument: Deepseek is said to have not disclosed the compute hardware used to generate synthetic data for training, which the author believes is a smoking gun for higher hardware requirements (while still not addressing the counterpoint that Deepseek achieved higher than normally achievable bandwidth through their use of PTX).

I would have preferred a more robust critique.