Newsletter #34: Did China just jump ahead of America in the AI race via DeepSeek?

If you haven’t heard of DeepSeek AI yet, you most definitely will.

The Chinese company introduced an AI model that has lit up the AI ecosystem in a way that can only be compared to the early reactions to ChatGPT.

We will see how much further things develop, but I wanted to pull together some initial content for those who want to get ahead of this story as it unfolds.

If you want to get the gist of what’s going on, check out “Silicon Valley Is Raving About a Made-in-China AI Model” by WSJ.

“On Jan. 20, DeepSeek introduced R1, a specialized model designed for complex problem-solving.”

Silicon Valley legend Marc Andreessen is not known for hyperbole, so his reaction speaks for itself… “Deepseek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen.”

Why would Marc say that?

"DeepSeek said training one of its latest models cost $5.6 million, compared with the $100 million to $1 billion range cited last year by Dario Amodei, chief executive of the AI developer Anthropic, as the cost of building a model.”

“DeepSeek said in a technical report that it used a cluster of more than 2,000 chips to train its V3 model, compared with tens of thousands of chips for training models of similar size.”

“A few U.S. AI specialists have recently questioned whether High-Flyer and DeepSeek are accessing computing power beyond what they have announced.”

One of those people is Alexandr Wang: “My understanding is that DeepSeek has about 50,000 H100s, which they can’t talk about, obviously, because it is against the export controls that the United States has put in place.”

Now if you want to go deeper, check out the thorough and excellent piece by Deirdre Bosa and CNBC.

The 40-minute piece starts with a 15-minute segment describing the development and its implications, followed by an insightful conversation with Perplexity co-founder and CEO Aravind Srinivas.

There’s a whole lot to unpack but here are my 5 key takeaways with the associated segment excerpts.

1: The cost + speed are breathtaking.

"Google and OpenAI took years and hundreds of millions of dollars to build. DeepSeek says it took just two months and less than $6 million.”

“The AI lab reportedly spent just $5.6 million to build DeepSeek version three. Compare that to OpenAI, which is spending $5 billion a year, and Google, which expects capital expenditures in 2024 to soar to over $50 billion. And then there’s Microsoft, which shelled out more than $13 billion just to invest in OpenAI.”

“And then these guys come out with, like, a crazy model that’s, like, 10x cheaper in API pricing than GPT-4.”

“They did it all with approximately just 2,048 H800 GPUs, which is actually equivalent to somewhere around 1,000 to 1,500 H100 GPUs. That’s like 20 to 30x lower than the amount of GPUs that GPT-4 is usually trained on. They ended up with roughly $5 million in total compute budget. They did it with so little money and such an amazing model, gave it away for free, wrote a technical paper, and definitely it makes us all question, like, okay, if we have the equivalent of DOGE for, like, model training, this is an example of that.”

“So previously, you know, to get to the frontier, you would have to think about hundreds of millions of dollars of investment, and perhaps a billion dollars of investment. What DeepSeek has now done here in Silicon Valley is it opened our eyes to what you can actually accomplish with $10, $15, $20, $30 million.”

“So you can actually create these models that do thinking for much, much less. You don't need those huge amounts to pre-train the model. So I think the game is shifting.”

“It means that staying on top may require as much creativity as capital.”

“In early 2024, former Google CEO Eric Schmidt predicted China was 2-3 years behind the US in AI, but now Schmidt is singing a different tune.”

“‘I used to think we were a couple of years ahead of China. China has caught up in the last six months, in a way that is remarkable. The fact of the matter is that a couple of the Chinese programs, one, for example, is called DeepSeek. Looks like they've caught up.’”

2: “Necessity is the mother of invention.”

“It's hard to fake scarcity, right? If you raise $10 billion and you’ve decided to spend 80% of it on a compute cluster, it's hard for you to come up with the exact same solution that someone with $5 million does, and there's no point, no need to, like sort of berate those who are putting more money in. They're trying to do it as fast as they can.”

“Those chip restrictions from the US government were intended to slow down the race, to keep American tech on American ground, to stay ahead in the race.”

“Necessity is the mother of invention, because they had to go figure out workarounds. They actually ended up building something a lot more efficient.”

“It’s really remarkable the amount of progress they’ve made with as little capital as it’s taken them to make that progress. It drove them to get creative, with huge implications.”

“But the reality is, some of the details in DeepSeek are so good that I wouldn't be surprised if Meta took a look at it and incorporated some of that in Llama, right? I wouldn't necessarily say copy. It's all like, you know, sharing, science, engineering, but the point is, it's changing. It's not like China is a copycat. They're also innovating. We don't know exactly the data that it was trained on, right?”

“I wouldn’t disregard their technical accomplishment just because of how it responds to some prompts, like ‘Who are you?’ or ‘Which model are you?’ In my opinion, that doesn’t even matter.”

“You can always say that, like, everybody copies everybody in this field. You can say Google did the transformer first. It’s not OpenAI; OpenAI just copied it. Google built the first large language models. They didn’t prioritize it, but OpenAI did it in a prioritized way. So you can say all this in many ways. It doesn’t matter.”

3: Innovation cycles are about to pick up a WHOLE lot of speed, a.k.a. Jevons paradox (efficiency gains drive more total consumption, not less) is real.

“We already are beginning to use it, as in, they have an API, and it’s also open source, by the way, so we can host it ourselves too. And it’s good to, like, try to start using that, because it actually allows us to do a lot of the things at lower cost.”

"But what I'm kind of thinking beyond that, which is like, okay, these guys actually could train such a great model this, you know, good team like and that's no excuse anymore for companies in the US, including ourselves, to, like, not try to do something like that.”

“The widespread availability of powerful open-source models allows developers to skip the demanding, capital-intensive steps of building and training models themselves. Now they can build on top of existing models, making it significantly easier to jump to the frontier, that is, the front of the race, with a smaller budget and a smaller team.”

“In the last two weeks, AI research teams have really opened their eyes and have become way more ambitious on what's possible with a lot less capital.”

“It also means any company like OpenAI that claims the frontier today could lose it tomorrow. That’s how DeepSeek was able to catch up so quickly. It started building on the existing frontier of AI. Its approach focuses on iterating on existing technology rather than reinventing the wheel.”

“It can take a really good, big model and use a process called distillation.”

“And what distillation is basically is to use a very large model to help your small model get smart at the thing that you wanted it to get smart at. And that's actually very cost efficient.”
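The distillation idea described in that excerpt can be sketched in a few lines. This is a minimal, hedged illustration with toy numbers, not DeepSeek's actual training recipe: a large "teacher" model's output logits are softened with a temperature and used as the training target for a small "student" model, which minimizes the KL divergence between the two distributions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution.
    Higher temperature -> softer, more informative targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's: the quantity the student is trained to minimize."""
    p = softmax(teacher_logits, temperature)  # teacher: soft targets
    q = softmax(student_logits, temperature)  # student: current predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy example (made-up logits): the student is pushed to match the
# teacher's full distribution over answers, not just the single top answer.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
loss = distillation_loss(teacher, student)  # positive while they disagree
```

The cost efficiency comes from the fact that the student only needs the teacher's outputs, not the teacher's training data or compute, so a small model can inherit narrow capabilities cheaply.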

4: Open-source implications are massive on multiple dimensions.

“What’s more dangerous than trying to do all the things to not let them catch up, and, you know, all this stuff, is that they have the best open source model, and all the American developers are building on that, right?”

“That's more dangerous because then they get to own the mindshare of the ecosystem, the entire American AI ecosystem.”

“Look, in general, it’s known that once open source has caught up to or improved over closed-source software, all developers migrate to that. It’s historically known, right?”

“When Llama was being built and becoming more widely used, there was this question: should we trust Zuckerberg? But now the question is: should we trust China?”

“It doesn’t matter, in the sense that you still have full control. You run it as your own, like, set of weights on your own computer. You are in charge of the model. But it’s not a great look for our own talent to, you know, rely on software built by others, even if it’s open.”

“So there's always a point where open source can stop being open source too, right?”

“The licenses are very favorable today, but if you close it over time, we can always change the license.”

“It’s important that we actually have people here in America building, and that’s why this matters so much.”

“Just try to outcompete and win… That’s just the American way of doing things: just be better.”

5: What are the AI policy implications?

“They accomplished all that despite the strict semiconductor restrictions that the US government has imposed on China, which have essentially locked them out of computing power.”

“Washington has drawn a hard line against China in the AI race, cutting the country off from receiving America’s most powerful chips, like Nvidia’s H100 GPUs.”

“Those were once thought to be essential to building a competitive AI model, with startups and big tech firms scrambling to get their hands on any available. But DeepSeek turned that on its head, sidestepping the rules by using Nvidia’s less performant H800s to build the latest model, and showing that the chip export controls were not the chokehold DC intended.”

“The mystery brings into sharp relief just how urgent and complex the AI face-off against China has become, because it’s not just DeepSeek. Other more well-known Chinese AI models have carved out positions in the race with limited resources as well.”

"Kai-Fu Lee, he's one of the leading AI researchers in China, formerly leading Google's operations there.”

“Now his startup 01.AI is attracting attention, becoming a unicorn just eight months after founding and bringing in almost $14 million in revenue in 2024.”

“‘The thing that shocks my friends in Silicon Valley is not just our performance, but that we trained the model with only $3 million, and GPT-4 was trained for $80 million to $200 million.’”

“It could mean that the prevailing model in global AI may be open source.”

“As organizations and nations come around to the idea that collaboration and decentralization can drive innovation faster and more efficiently than proprietary, closed ecosystems, a cheaper, more efficient, widely adopted open-source model from China could lead to a major shift in dynamics.”

“There are really only two countries in the world right now that can build this at scale, and that is the US and China. And so, you know, the stakes in and around this are just enormous. Enormous stakes, enormous consequences hanging in the balance.”

More to come, that's for sure.

Have a great Sunday and talk soon.

-Alec