
DeepSeek's Deep Shock to the US AI Behemoths

The tech world is waking up belatedly to the simple historical truth that it is difficult to stop advances using a bunch of trade restrictions.
Prabir Purkayastha

Image Courtesy: Flickr

The tech world was shocked when a little-known Chinese company released an AI model called DeepSeek that appears to match OpenAI's most advanced models at a small fraction of their cost. The tech world has been buzzing for the last month: first, leading US tech investors boosted Nvidia's stock to dizzying heights on the back of a dazzling AI future; then they bemoaned that AI's Sputnik moment, DeepSeek's AI models, had wiped out nearly a trillion dollars in stock market value. The leading tech companies took the biggest hit, with Nvidia, which manufactures high-end Graphics Processing Units (GPUs), losing nearly $600 billion in value in just one day.

GPUs were originally developed for the parallel processing of image data, hence the name; but they are now used for all kinds of parallel computational tasks, including AI models. The other striking feature of this frugal Chinese innovation is not simply that DeepSeek built its most advanced models at 3-5% of the cost incurred by OpenAI, Anthropic, Google, Meta and others, but that it did so in spite of stringent sanctions imposed by the US (with bipartisan support) on the export of advanced chips to China.

The specific aim was to cripple China's AI advance by denying it the advanced GPUs considered essential for any major AI breakthrough. During his tour of India last year, Sam Altman, the prevailing guru of OpenAI, had dismissed as "totally hopeless" any attempt to match the AI advances of the big US tech companies by building foundational AI models with small investments and much smaller teams. Almost in the same vein, India's tech guru Nandan Nilekani argued that India should not build basic AI models but only use them in its work, ceding the tech baton completely to the US. His opinion was hotly contested by Aravind Srinivas, co-founder and CEO of the AI company Perplexity.

Obviously, Sam Altman was wrong. Not only did DeepSeek create a model on a shoestring budget that can go toe-to-toe with companies that have spent hundreds of millions of dollars, but it also did this using hardware "designed" precisely to hamstring such advances. The H800 chips were developed by Nvidia specifically for the Chinese market and were supposed to rule out such AI advances. The tech world is waking up belatedly to the simple historical truth that it is difficult to stop advances in science and technology using just a bunch of trade restrictions.

The AI models we are discussing here are not the ChatGPT or DeepSeek chatbots that answer your questions, produce decent summaries and even draft research notes, all of which can be considered superior versions of a Google search. After having "ingested" (been fed) virtually all internet content, there is not much further that ChatGPT-style tools can be stretched to generate new insights. The new models developed by OpenAI and other leading US companies add reasoning models to the Large Language Models (LLMs) that underlie ChatGPT and its counterparts. These reasoning models are built using reinforcement learning. It has also been argued that for the holy grail of Artificial General Intelligence (AGI), the machine counterpart to biological intelligence, reasoning models are the way to go, even if AGI is not as close as Sam Altman and his AI tribe would have us believe. The new advances we are talking about are in these reasoning models, and here DeepSeek has been able to create models ahead of, or on par with, what the US digital behemoths can do. Or, as a news headline asks: Did China Just Eat America's AI Lunch?
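DeepSeek's published reports describe training their reasoning model largely through reinforcement learning, rewarding the model when its chain of reasoning reaches correct answers. As a toy illustration only, and not DeepSeek's actual algorithm, the core idea of reinforcement learning, trying actions, observing rewards and shifting towards whatever pays off, can be sketched in a few lines of Python:

```python
import random

def train_bandit(rewards, steps=5000, eps=0.1, seed=0):
    """Toy reinforcement learning: epsilon-greedy action-value learning.

    `rewards` maps each action to its true expected reward; the learner
    only observes noisy samples and must discover the best action.
    """
    rng = random.Random(seed)
    q = {a: 0.0 for a in rewards}   # estimated value of each action
    n = {a: 0 for a in rewards}     # times each action was tried
    for _ in range(steps):
        if rng.random() < eps:               # occasionally explore
            a = rng.choice(list(rewards))
        else:                                # otherwise exploit the best estimate
            a = max(q, key=q.get)
        r = rewards[a] + rng.gauss(0, 0.1)   # noisy reward signal
        n[a] += 1
        q[a] += (r - q[a]) / n[a]            # running average update
    return max(q, key=q.get)

# The behaviour that earns higher reward gets reinforced over time.
best = train_bandit({"guess": 0.2, "reason step-by-step": 0.9})
print(best)  # → reason step-by-step
```

The reward values and action names here are made up for illustration; the point is only that the system learns from a reward signal rather than from labelled examples, which is what distinguishes this training regime from the way base LLMs are built.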

What has shocked the tech world is not that China has matched the AI development of the US tech giants, but that a company worth only $8 billion, with no previous tech feats, has managed this at a fraction of the cost. It took just two months and under $6 million to build an AI model comparable to OpenAI's. On top of that, it did so using Nvidia's deliberately crippled H800 chips (built to conform to US restrictions on exporting hardware to China). For those who are deeply suspicious of any Chinese claims, DeepSeek has not only open-sourced its models but has published detailed papers documenting what its team has done.

So what is the company behind DeepSeek, and who are they? The people behind DeepSeek are what are called "quants" in the financial world: mathematics, modelling and programming specialists who work in the financial markets. Quants were held partly responsible for blowing up Wall Street in 2008, the subprime disaster that hit global stock markets; and though they were discredited by that meltdown, the world of finance cannot do without them. In China, the financial markets are more tightly controlled. The quant who set up DeepSeek is Liang Wenfeng. After a stumble in 2021 in which his funds lost about a third of their roughly $12 billion value, he decided to channel some of his money, and a team of his quants, into AI.

It is not that DeepSeek found some brand new mathematics to solve the problem of AI. Instead of just throwing money and computing power at the problem, they decided to do some clever engineering to build and release two new models. These models, writes Jeffrey Emanuel, a well-known techie familiar with the area, "have basically world-competitive performance levels on par with the best models from OpenAI and Anthropic (blowing past the Meta Llama3 models and other smaller open source model players such as Mistral). These models are called DeepSeek-V3 (answer to GPT-4o and Claude 3.5 Sonnet) and DeepSeek-R1 (answer to OpenAI's O1 model)." The price? At most 5% of what others have or would have spent. Emanuel's guesstimate is that DeepSeek is 45x-50x more efficient than other cutting-edge platforms.
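The arithmetic behind these percentages is straightforward. The under-$6 million DeepSeek figure comes from the reporting above; the $100-200 million range for comparable US training runs used below is an illustrative assumption, not a number from this article:

```python
# Back-of-the-envelope check of the cost claims.
# deepseek_cost is the reported ~$6M training cost; the rival run
# costs are assumed round numbers for illustration only.
deepseek_cost = 6e6

for rival_cost in (100e6, 200e6):
    share = deepseek_cost / rival_cost
    print(f"vs a ${rival_cost/1e6:.0f}M run: DeepSeek at {share:.0%} of the cost")
# → vs a $100M run: DeepSeek at 6% of the cost
# → vs a $200M run: DeepSeek at 3% of the cost
```

That 3-6% band is consistent with both the "3-5% of the cost" figure earlier in the piece and Emanuel's rougher 45x-50x efficiency guesstimate (1/45 is roughly 2%).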

Not only have the DeepSeek models been publicly released, they have been released as open source under an MIT license, with the model weights available on GitHub. Moreover, the company has published two detailed technical reports explaining each step of what it has done. So the models, the theory, and how the team analysed and solved the problems are all set down in a way that people can not only track and use, but can also reproduce and run on their own servers.

There are three major implications of the DeepSeek market shock for the digital world. One is that Nvidia, the major beneficiary of the AI boom, is in for a major correction of its stock price; that is already visible. The second is that many more players will now be willing to enter the AI race, knowing that the entry price is not as steep as the biggies had told them: the race is not necessarily won by the biggest, just as in animal evolution! The last is that technology sanctions do not work. They did not work against India in the nuclear and space sectors; nor have they worked against China's AI development.

And that is not all. If scaling up computing power is not the only way to improve models and gain a market edge in AI, do we need the huge data centres the AI industry was planning? For those who remember the development of microprocessors and the PC revolution, which ended the era of the huge rooms built for IBM mainframes in the 1960s, is the DeepSeek moment likely to provide a similar shock? It was the expectation of ever-bigger computing that led to Trump announcing OpenAI's $500 billion Stargate project on the second day of his new presidential term. Implicit in this vision was a large number of data centres housing very large arrays of powerful GPUs, almost entirely from Nvidia.

This brought into focus the issue of energy, as such data centres would also be huge energy guzzlers. The plan, which dovetails with Donald Trump's vision of "drill, baby, drill", was to use natural gas. That, of course, would cause a jump in US greenhouse gas (GHG) emissions. Without such an immediate surge in energy demand, natural gas in the US will find it difficult to compete with solar and wind energy, whose costs have dropped, and continue to drop, below that of natural gas. So not only has DeepSeek essentially deep-sixed the notion that bigger is better, it has also reduced the threat of a rapid increase in US greenhouse gas emissions.

As a well-known philosopher has said: "There are decades where nothing happens; and there are weeks where decades happen." This appears to be one of those moments. At least for AI.
