DeepSeek Reader
DeepSeek is exploding onto the AI scene and already having a massive impact on AI labs, investors, business models, forecasts, and more. Here’s your overview and reading list.
In the January 12th issue, in the Algorithms section, I linked this piece about DeepSeek-V3 and how a “Chinese AI startup outpaces tech giants in cost and performance.” Back then the hook was that the “training process was completed at a total cost of around $5.57 million.” Very impressive for a frontier model. In late December, some people were already calling it the best open-source LLM.
Last week they announced DeepSeek-R1, claiming “Performance on par with OpenAI-o1.” That claim has since been confirmed left and right through extensive testing, and everyone has been going bonkers. Monday morning I happened on a few articles on the topic: the Nasdaq fell 3.1%, and Nvidia, the King Kong of the GPU (Graphics Processing Unit) chips used to train LLMs, fell 16.9%. I decided to throw everything in a file for an email to members. Tuesday morning I glanced at my newsletters folder, just in case there was something new before sitting down to write, and… half the new issues were about the DeepSeek earthquake. You should be getting this Wednesday morning; hopefully it’s still useful by then!
In summary: A Chinese hedge fund’s side project, stemming from a trade-picking algorithm, was trained for one to two orders of magnitude less money and fewer GPUs than the best American models, greatly affecting their business models and valuations. Why does this matter?
- Besides the training cost, DeepSeek open sourced the whole thing, including open weights, making it easier to use on your own, much more replicable, and somewhat easier to audit.
- Open source has thus largely caught up to closed-source models, which means you can run your own version locally: the full version on some expensive hardware, smaller versions on a recent laptop with a powerful GPU (for example, any Mac with an M1 processor, ideally M2 and up).
- Within days, Hugging Face (kind of the GitHub of LLMs), “the AI community building the future,” published the code for a “fully open reproduction of DeepSeek-R1” to “build the missing pieces of the R1 pipeline such that everybody can reproduce and build on top of it.” That one is more technical, but it means that very quickly thousands of organisations have been enabled to build their own reasoning (breaking problems into constituent parts in order to tackle them more effectively) AI models, roughly on par with the second best of OpenAI and Google.
- The DeepSeek chatbot doesn’t want to answer certain questions about China; it’s “Chinese politically correct.” That brings a whole other set of questions and lines of investigation depending on which platform becomes popular, but so far it seems that local installs of their open source models do answer questions about China correctly.
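To put rough numbers on the “runs on a recent laptop” point above: the memory needed just to hold a model’s weights is roughly parameters × bits per weight, and quantization is what brings the smaller distilled versions into laptop range. A back-of-envelope sketch (7B and 671B are published sizes for a small distilled variant and the full model; activation memory and context caches add more on top):

```python
# Rough memory needed just to hold a model's quantized weights.
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Decimal gigabytes for the weight tensors alone."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

small_4bit = weight_gb(7, 4)    # ~3.5 GB: a 7B distill fits a recent laptop
full_8bit = weight_gb(671, 8)   # ~671 GB: the full model wants server hardware
```

That roughly 200x spread is why “you can run it locally” means very different things depending on which version you grab.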
There’s a reading list below, with lots of mostly well-informed reflections to get caught up further. For this post, I’d like to focus on an international component that I’m curious about.
I haven’t been following AI outside of the USA, China, Europe, and Canada closely enough to have a valuable opinion on the rest of the world, but for the latter two, this seems like a pivotal moment—for Americans too of course, but in a different way reflected in most pieces on the topic. Not that long ago, Canada seemed well on its way to a strong place in AI but, thanks in part to much, much shallower pockets, we’ve fallen behind on most every metric you might think of. Europe has done a bit better, especially France with Mistral, but is also very much an also-ran.
It’s more complex than that, of course, but basically China used a massive amount of engineering talent and the “forcing function” of US restrictions to come up with innovative and much more frugal answers in the quest for the best AI. Europe and Canada, as is usually our wont, tried to copy the US while keeping social goals (ethics, bias, fears of killer robots) in mind. We couldn’t “scale the money” as fast and didn’t consider using other approaches like the Chinese were pushed into. Too bad.
What will this moment mean for Silicon Valley? Likely a very, very quick reassessment of their way of developing and applying resources. More interestingly to me, how will Europe, Canada, and the rest of the world react? We might be seeing a kind of reset of the AI race, where it’s possible for companies but also countries to catch up and come up with a new path forward. Just as there are multiple flavours of Linux, there will be multiple flavours of Llama/DeepSeek, but will they only be led by companies, or will national and international strategies emerge? For example, could Canada or Europe or any group of countries fork DeepSeek and give tax breaks for engineer salaries to work on that open source project? Would it be useful?
I don’t have an answer yet, no one does, but if you thought a handful of VC-funded companies advancing AI made for a fascinating spectator sport mixed in with massive societal impacts, we might have just hit the afterburners. For better or for worse.
Pay attention to
- Chances are, for consumers at least, this will just be the equivalent of TikTok vs Instagram; in other words, two products competing. I think how people take the open source version and “run with it,” independent of DeepSeek and whatever their intentions might be, is something to pay more attention to.
- As stated above: how countries and industries other than the US react to China’s newly (re)proven strength, and how they invest, or not, in open source.
- Bias in opinions about China. Some seem quite pro-China (see Wardley below) and some immediately focus on national security (parts of Newton below). I’m not only talking about accounting for that bias, but also about specifically observing the positions and further positioning of everyone along that specific axis.
- This by Martin Spindler on LinkedIn regarding computing power needs: “Pre-Training and Inference are very different beasts - worth keeping this in mind as we watch how this plays out. With advanced models, differentiation can likely come down to inferencing infrastructure and data access.” In other words, models being cheaper to train doesn’t mean there won’t be loads of compute needed elsewhere.
- Speaking of which, read up on Jevons paradox before you make too many assumptions about those massive computing and energy investments going down (Martin above also mentioned this). Half the training cost doesn’t mean half the energy; it likely means twice as many people training models.
Reading list
→ Four big reasons to worry about DeepSeek (and four reasons to calm down). I flew over the various developments, but you should read this much more detailed recap by Casey Newton. Big caveat, though, on his view that open source projects “reverse engineer” or “rip off” the big AI labs. Open source is not a pale copy of closed source; at the very least it’s a two-way street.
To many prominent voices in AI, DeepSeek seems to have confirmed what they already believed. To AI skeptics, who believe that AI costs are so high that they will never be recouped, DeepSeek’s success is evidence of Silicon Valley waste and hubris. To AI bulls, who think America needs to build artificial general intelligence before anyone else as a matter of national security, DeepSeek is a dire warning to move faster. And to AI safety researchers, who have long feared that framing AI as a race would increase the risk of out-of-control AI systems doing catastrophic harm, DeepSeek is the nightmare that they have been waiting for.
→ DeepSeek: everything you need to know right now. Excellent, more technical overview by Azeem Azhar. Among many good quotes, I’ll pull out this one on speed, where a “system can process 250 tokens per second.”
For instance, consider a network of a hundred high-end reasoning models operating in both collaborative and adversarial modes. They could tackle complex challenges—ranging from climate modeling to financial market simulations—by pooling diverse perspectives, cross-verifying each other’s results and iterating on solutions collectively.
Venkatesh Rao wasn’t talking about the same thing, but the hundreds of collaborating models above reminded me of his Massed Muddler Intelligence piece. That was almost a year ago but still a good read.
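For a quick sense of what 250 tokens per second buys you: reasoning models spend extra tokens “thinking” before they answer, so generation speed directly sets how long you wait. A toy calculation (the token counts are hypothetical):

```python
tokens_per_second = 250  # quoted throughput
reasoning_tokens = 2000  # hypothetical chain-of-thought length
answer_tokens = 500      # hypothetical final answer length

# Total wall-clock wait for one long, "thought-out" response.
wait_seconds = (reasoning_tokens + answer_tokens) / tokens_per_second
```

Ten seconds for a response that “thinks” through 2,000 tokens: fast enough that the hundred-model networks Azhar imagines stop sounding far-fetched.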
→ DeepSeek FAQ. Yet more technical, this one by Ben Thompson is an excellent explainer and perhaps my favorite read of the lot. Also note that his evaluation of impacts, on Meta and Google for example, does not align with Azhar’s above, showing that everyone is still figuring this one out.
Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.
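The mechanics Thompson describes can be sketched in a few lines. This is a generic distillation toy, not any lab’s actual pipeline: the teacher’s output distribution, softened with a temperature, becomes the “soft target” the student is trained against, which carries more information than the single best token.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Turn raw scores into a probability distribution; T>1 softens it."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(target, predicted):
    return float(-(target * np.log(predicted + 1e-12)).sum())

# Teacher's raw scores (logits) for one input, over a tiny 4-token vocabulary.
teacher_logits = np.array([4.0, 1.0, 0.5, -2.0])

# Hard label keeps only the argmax; the soft target at T>1 also preserves
# what the teacher "thought" about the near-miss tokens.
hard_target = np.eye(4)[np.argmax(teacher_logits)]
soft_target = softmax(teacher_logits, T=3.0)

# The student is trained to minimize cross-entropy against the soft targets.
student_logits = np.array([3.0, 1.5, 0.2, -1.0])
loss = cross_entropy(soft_target, softmax(student_logits, T=3.0))
```

Doing this “via API,” as the quote says, just means the teacher logits (or sampled outputs) come from someone else’s hosted model instead of your own checkpoints.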
→ The ‘Chinese Sputnik’ that demands our attention. It’s behind a paywall and Azhar is not thinking only of DeepSeek in his post, but I believe the following is a valid lens to consider.
In our current fractious geopolitical environment, these collectively amount to a Sputnik moment. DeepSeek is a product of China’s entrepreneurial ecosystem, demonstrating vibrancy and ingenuity. The elegance of the approach, more refined than brute force, ought to be a wake-up call for US labs following a ‘muscle-car’ strategy.
→ When people say “China’s DeepSeek Bombshell”. I don’t agree with everything Wardley says in his LinkedIn post, but I think he’s largely directionally correct.
This division will become blindingly obvious over the next decade with China leading the world not only in economic and technological landscapes but also in the social and political ones.
→ The great undermining of the American AI industry. The title tells you what Brian Merchant focuses on in his piece; he ends with a lot of good questions, and he’s correct on some angles. But for other bits, like the quote below, see Jevons paradox above.
Second, this recent semi-hysterical build out of energy infrastructure for AI will also likely soon halt; there will be no need to open any additional Three Mile Island nuclear plants for AI capacity, if good-enough AI can be trained more efficiently. This too, to me, seemed likely to happen as generative AI was commoditized, since it was always somewhat absurd to have five different giant tech companies using insane amounts of resources to train basically the same models to build basically the same products.
→ The impact of competition and DeepSeek on Nvidia. Simon Willison’s thoughts and excerpts from a piece by “Jeffrey Emanuel capturing the current state of the AI/LLM industry.” I haven’t read the latter but this quote drew my attention.
With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward functions, they managed to get models to develop sophisticated reasoning capabilities completely autonomously. This wasn't just about solving problems—the model organically learned to generate long chains of thought, self-verify its work, and allocate more computation time to harder problems.
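To make “carefully crafted reward functions” concrete, here is a toy, rule-based reward in the spirit of what the R1 paper describes: an accuracy reward for a verifiable final answer, plus a format reward for exposing the chain of thought. The tags and weights are illustrative, not DeepSeek’s actual code.

```python
import re

def reward(completion: str, expected_answer: str) -> float:
    """Toy rule-based reward: no learned reward model, just checkable rules."""
    r = 0.0
    # Format reward: the model wrapped its reasoning in <think> tags.
    if re.search(r"<think>.+?</think>", completion, re.DOTALL):
        r += 0.5
    # Accuracy reward: the final answer matches a known-correct value.
    m = re.search(r"<answer>(.+?)</answer>", completion, re.DOTALL)
    if m and m.group(1).strip() == expected_answer:
        r += 1.0
    return r

good = reward("<think>2 + 2 is 4</think><answer>4</answer>", "4")  # 1.5
bad = reward("The answer is 4", "4")                               # 0.0
```

Because the rules are mechanically checkable (math answers, code that passes tests), reinforcement learning can run at scale without the massive supervised datasets the quote mentions.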
→ DeepSeek means AI proliferation is guaranteed. Issue 397 of Jack Clark’s Import AI covers a lot of things, but the bit about R1 being used to convert LLMs into reasoning models is worth a look.
But perhaps most significantly, buried in the paper is an important insight: you can convert pretty much any LLM into a reasoning model if you finetune them on the right mix of data - here, 800k samples showing questions and answers, and the chains of thought written by the model while answering them.
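The data the quote describes is easy to picture as one finetuning record per line of a JSONL file. A minimal sketch; the field names and the sample itself are made up for illustration, not DeepSeek’s actual schema:

```python
import json

# One hypothetical record: a question, the chain of thought a model wrote
# while solving it, and the final answer. 800k of these, one JSON object
# per line, make the kind of finetuning mix the quote describes.
record = {
    "question": "What is 17 * 24?",
    "chain_of_thought": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    "answer": "408",
}

line = json.dumps(record)     # one line of the JSONL training file
restored = json.loads(line)   # round-trips cleanly
```

The point Clark draws out is that nothing here is exotic: any lab that can generate or collect such traces can turn an ordinary LLM into a reasoning model.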
→ What DeepSeek means for energy (and climate). This piece by Susan Su is a last minute add. She goes into detail on the energy angle and provides a great explanation of this Jevons paradox I’ve been mentioning. If you care about the climate crisis (!!), this one might give you nightmares.
In other words, cheaper, more efficient AI means more AI overall, not less. And exponentially so. That means more energy demand — so much more that it will dwarf any energy savings from using even dramatically more efficient models. […]
Regardless of which model, which company or which country wins, AI at its core is an infinitely scalable machine that transforms megawatts into computation and geopolitical power. DeepSeek’s breakthrough demonstration won’t change what AI is or does — it will accelerate it.
Further reading
- How Chinese A.I. start-up DeepSeek is competing with OpenAI and Google | The New York Times
- How small Chinese AI start-up DeepSeek shocked Silicon Valley | MetaFilter thread
- Why everyone in AI is freaking out about DeepSeek | VentureBeat
- DeepSeek R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost | VentureBeat
- How Chinese company DeepSeek released a top AI reasoning model despite US sanctions | MIT Technology Review