
DeepSeek's Deep Shock to the AI World

The tech world is waking up belatedly to the simple historical truth that it is difficult to stop advances using a bunch of trade restrictions.
Prabir Purkayastha

Image Courtesy: Flickr

The tech world was shocked when a little-known Chinese company released an AI (artificial intelligence) model called DeepSeek that appears to match OpenAI's most advanced models at a small fraction of their cost. The tech world has been buzzing for the last month, with leading US tech investors first following Nvidia's performance with bated breath and then bemoaning that AI's Sputnik moment, the release of DeepSeek's AI models, had wiped nearly a trillion dollars off the value of leading tech companies.

Interestingly, Nvidia, which manufactures high-end Graphics Processing Units (GPUs), took the biggest hit, losing nearly $600 billion in market value in one day. GPUs were originally developed for the parallel processing of image data, hence the name, but are now used for all kinds of parallel computational tasks, including training and running AI models.

The other eye-popping feature of this frugal Chinese innovation is not simply that DeepSeek built its most advanced models at 3-5% of the cost incurred by OpenAI, Anthropic, Google, Meta, etc. It is that the Chinese advance has come in spite of stringent sanctions imposed by the US (with bipartisan support) on the advanced chips that can be exported from the US to China.

The specific aim was to cripple China's AI advance by denying it the advanced GPUs that were thought essential for any major AI advance. Sam Altman, the prevailing guru of OpenAI, had declared during his 2023 tour of India that any attempt to match the AI advances of the big US tech companies was "totally hopeless", as the cost of the infrastructure for such computations was out of reach for anyone else. Almost in the same vein, India's tech guru Nandan Nilekani had argued that India should not build the basic AI models but only use them in its work, ceding the tech baton completely to the US, an opinion hotly contested by the co-founder and CEO of the AI company Perplexity.

Altman was obviously wrong. Not only did DeepSeek create a model on a shoestring budget that can go toe-to-toe with those of companies that have spent hundreds of millions of dollars, but it has also done so using hardware that was "designed" precisely to hamstring such advances. The H800 chips developed by Nvidia specifically for the Chinese market were supposed to prevent such AI advances, as they throttled communication speeds between GPUs and thereby slowed the training of models. The tech world is waking up belatedly to the simple historical truth that it is difficult to stop advances using just a bunch of trade restrictions.

The AI models we are discussing here are not the ChatGPT or DeepSeek chatbots that answer your questions, do some research and even create decent summaries, all of which can be seen as superior versions of Google Search or Amazon's Alexa. After having "ingested" (been fed with) virtually all internet content, there is not much further that ChatGPT-like tools can stretch to generate new insights.

The new models use Large Language Models, the basis of ChatGPT and its counterparts, but have also added reasoning capabilities via what the techies call reinforcement learning. It has also been argued that for the holy grail of Artificial General Intelligence, the machine counterpart to biological intelligence, reasoning models are the way to go, even if the goal is not as close as Altman and his AI tribe would have us believe.

The new advances that we are talking about are in the reasoning models, and DeepSeek has been able to create models ahead of or on par with what the US digital behemoths can do. As a news headline on DeepSeek's tools asks: Did China Just Eat America's AI Lunch?

What has shocked the tech world is not that China has matched the AI development of the US tech giants, but that a company worth only $8 billion, with no previous tech achievements, has managed this feat at a small fraction of the cost: it spent just two months and under $6 million to build an AI model comparable to OpenAI's, using Nvidia's crippled H800 chips (designed to conform with US restrictions on exporting hardware to China). For those who are deeply suspicious of any Chinese claims, DeepSeek has not only open-sourced the model but has also published detailed papers documenting what its team has done.

So, what is the company behind DeepSeek, and who are they? The people behind DeepSeek are a bunch of what the financial world calls "quants". Quants are mathematics, modelling and programming people who work in finance. They are held responsible for blowing up Wall Street in 2008, when the subprime disaster hit global markets. Though quants were partially discredited after the market meltdown of 2008, the world of finance cannot do without them. In China, they are more tightly controlled.

The quant who set up DeepSeek is Liang Wenfeng, who, after a stumble in which his funds lost about a third of their $12 billion value in 2012, decided to channel some of his money, along with his quants, into AI.

It is not that DeepSeek found some new insight to solve the problem of AI. Instead of just throwing money and computing power at the problem, it decided to do some clever engineering to build and release two new models. These models have been analysed by Jeffrey Emanuel, a well-known techie familiar with the area, among others. Emanuel writes that they "have basically world-competitive performance levels on par with the best models from OpenAI and Anthropic (blowing past the Meta Llama3 models and other smaller open source model players such as Mistral). These models are called DeepSeek-V3 (answer to GPT-4o and Claude 3.5 Sonnet) and DeepSeek-R1 (answer to OpenAI's O1 model)." The price? At most 5% of what others have spent or would have spent. Emanuel's guesstimate is that DeepSeek is 45x-50x more efficient than other cutting-edge platforms.

Not only have these models been released publicly, they have also been released as free and open-source software under an MIT license (developed by the Massachusetts Institute of Technology in the late 1980s), with the code available on GitHub. DeepSeek has also released two detailed technical reports explaining each step of what its team has done. So, the code, the theory, and how they analysed and solved the problems are all set down in a way that people can not only track and use what they have done, but can also reproduce it with their own code if they so desire.

Three major implications loom for all of us with the DeepSeek market shock. One is that Nvidia, the major beneficiary of the AI boom, is in for a major correction of its stock price. That is already visible. The second is that many more players will now be willing to enter the AI race, knowing that the entry price is not as steep as the biggies had told them, and that the race is not necessarily won by the biggest, just as it happened in animal evolution! The last is that technology sanctions don't work. They did not work against India in the nuclear and space sectors; nor have they worked against China's AI developments.

As a well-known philosopher said: "There are decades where nothing happens; and there are weeks where decades happen." This appears to be one of those moments.
