
Ambuj Tewari, University of Michigan
State-of-the-art artificial intelligence systems like OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude have captured the public imagination by producing fluent text in multiple languages in response to user prompts. Those companies have also captured headlines with the huge sums they’ve invested to build ever more powerful models.
An AI startup from China, DeepSeek, has upset expectations about how much money is needed to build the latest and greatest AIs. In the process, the company has cast doubt on the billions of dollars of investment by the big AI players.
I study machine learning. DeepSeek’s disruptive debut comes down not to any stunning technological breakthrough but to a time-honored practice: finding efficiencies. In a field that consumes vast computing resources, that has proved to be significant.
Where the costs are
Developing such powerful AI systems begins with building a large language model. A large language model predicts the next word given previous words. For example, if the beginning of a sentence is “The theory of relativity was discovered by Albert,” a large language model might predict that the next word is “Einstein.” Large language models are trained to become good at such predictions in a process called pretraining.
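As a toy illustration of that prediction step (the vocabulary and numbers below are made up, not from a real model), a language model assigns a score to every word it knows and converts those scores into probabilities for the next word:

```python
import numpy as np

# Toy illustration of next-word prediction after the prompt
# "The theory of relativity was discovered by Albert".
# The vocabulary and scores are made up; real models score
# tens of thousands of tokens using learned weights.
vocab = ["Einstein", "Newton", "Curie", "banana"]
logits = np.array([6.1, 2.3, 1.8, -4.0])  # hypothetical model scores

probs = np.exp(logits - logits.max())
probs /= probs.sum()  # softmax: turn scores into probabilities summing to 1

for word, p in zip(vocab, probs):
    print(f"{word:>8s}: {p:.3f}")  # "Einstein" gets nearly all the probability

# Pretraining repeatedly nudges the model's weights so that the actual
# next word in the training text gets higher probability.
```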
Pretraining requires a lot of data and computing power. The companies collect data by crawling the web and scanning books. Computing is usually powered by graphics processing units, or GPUs. Why graphics? It turns out that both computer graphics and the artificial neural networks that underlie large language models rely on the same area of mathematics known as linear algebra. Large language models internally store hundreds of billions of numbers called parameters or weights. It is these weights that are modified during pretraining.
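The linear algebra connection can be seen in miniature: the workhorse operation inside a neural network layer is a matrix multiplication between incoming activations and a weight matrix, exactly the kind of computation GPUs are built to accelerate. A sketch with toy sizes:

```python
import numpy as np

# The core computation in a neural network layer: activations from the
# previous layer multiplied by a weight matrix. Sizes here are tiny;
# a large language model's weight matrices hold billions of entries in total.
rng = np.random.default_rng(0)
hidden = rng.standard_normal(4)        # activations from the previous layer
weights = rng.standard_normal((4, 4))  # learned parameters ("weights")

output = hidden @ weights  # one linear-algebra step, GPU-friendly
print(output)

# Pretraining is gradient descent: each step slightly adjusts `weights`
# to make the model's next-word predictions more accurate.
```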
Pretraining is, however, not enough to yield a consumer product like ChatGPT. A pretrained large language model is usually not good at following human instructions. It might also not be aligned with human preferences. For example, it might output harmful or abusive language, both of which are present in text on the web.
The pretrained model therefore usually goes through additional stages of training. One such stage is instruction tuning, where the model is shown examples of human instructions and expected responses. After instruction tuning comes a stage called reinforcement learning from human feedback. In this stage, human annotators are shown multiple large language model responses to the same prompt. The annotators are then asked to point out which response they prefer, and the model is further trained so that the kinds of responses annotators prefer become more likely.
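One common way those comparisons become a training signal, used in many published reinforcement learning from human feedback pipelines (DeepSeek's exact recipe may differ), is a pairwise loss that pushes a reward model's score for the preferred response above its score for the rejected one. A minimal sketch:

```python
import numpy as np

# Pairwise (Bradley-Terry style) preference loss: given a reward model's
# scores for two responses to the same prompt, the loss is small when the
# human-preferred response already scores higher, and large otherwise.
def preference_loss(score_preferred: float, score_rejected: float) -> float:
    margin = score_preferred - score_rejected
    return float(np.log1p(np.exp(-margin)))  # -log sigmoid(margin)

print(preference_loss(2.0, -1.0))  # ~0.049: model agrees with the annotator
print(preference_loss(-1.0, 2.0))  # ~3.049: strong signal to adjust weights
```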
It is easy to see how costs add up when building an AI model: hiring top-quality AI talent, building a data center with thousands of GPUs, collecting data for pretraining, and running pretraining on GPUs. Additionally, there are costs involved in data collection and computation in the instruction tuning and reinforcement learning from human feedback stages.
All told, the cost of building a cutting-edge AI model can reach US$100 million. GPU training is a significant component of the total cost.
The expenditure does not stop when the model is ready. When the model is deployed and responds to user prompts, it uses additional computation known as test-time or inference-time compute. Test-time compute also requires GPUs. In December 2024, OpenAI announced a new phenomenon it saw with its latest model, o1: as test-time compute increased, the model got better at logical reasoning tasks such as math olympiad and competitive coding problems.
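The reason answering prompts consumes compute is that text is generated one token at a time, with a full pass through the network at each step. The sketch below stubs out the network with a hypothetical `next_token` placeholder; the point is that cost grows with the number of generation steps, and reasoning models spend extra steps "thinking" before answering:

```python
# Sketch of autoregressive generation. `next_token` is a hypothetical
# stand-in for a full forward pass through billions of weights.
def next_token(context: list[str]) -> str:
    return "step" if len(context) < 8 else "<end>"

def generate(prompt: list[str], max_steps: int) -> list[str]:
    context = list(prompt)
    for _ in range(max_steps):  # each iteration costs one forward pass
        token = next_token(context)
        if token == "<end>":
            break
        context.append(token)
    return context

# Raising max_steps raises test-time compute: models like o1 use the
# extra steps to produce intermediate reasoning before the final answer.
print(generate(["The", "answer", "is"], max_steps=16))
```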
Slimming down resource consumption
Thus it seemed that the path to building the best AI models in the world was to invest in more computation during both training and inference. But then DeepSeek entered the fray and bucked this trend.
Their V-series models, culminating in the V3 model, used a series of optimizations to make training cutting-edge AI models significantly more economical. Their technical report states that it cost them less than US$6 million to train V3. They admit that this figure does not include the costs of hiring the team, doing the research, trying out various ideas and collecting data. But $6 million is still an impressively small figure for training a model that rivals leading AI models developed at much higher cost.
The reduction in costs was not due to a single magic bullet. It was a combination of many smart engineering choices, including using fewer bits to represent model weights, innovations in the neural network architecture, and reduced communication overhead as data is passed between GPUs.
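The first of those choices is the easiest to illustrate. DeepSeek's technical report describes training with 8-bit (FP8) numbers; NumPy has no FP8 type, so this sketch contrasts 32-bit and 16-bit floats to show how fewer bits per weight translate directly into memory, and therefore communication, savings:

```python
import numpy as np

# Storing the same weight matrix at two precisions. Halving the bits per
# weight halves memory use and the traffic when data moves between GPUs,
# at the cost of some rounding error.
weights_fp32 = np.random.randn(1000, 1000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes / 1e6, "MB at 32 bits per weight")  # 4.0 MB
print(weights_fp16.nbytes / 1e6, "MB at 16 bits per weight")  # 2.0 MB
print("max rounding error:", np.abs(weights_fp32 - weights_fp16).max())
```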
Notably, due to U.S. export restrictions on China, the DeepSeek team did not have access to high-performance GPUs like the Nvidia H100. Instead they used Nvidia H800 GPUs, which Nvidia designed with lower performance to comply with U.S. export restrictions. Working within this limitation seems to have unleashed even more ingenuity from the DeepSeek team.
DeepSeek also innovated to make inference cheaper, reducing the cost of running the model. Moreover, they released a model called R1 that is comparable to OpenAI’s o1 model on reasoning tasks.
They released all the model weights for V3 and R1 publicly. Anyone can download and further improve or customize their models. Furthermore, DeepSeek released their models under the permissive MIT license, which allows others to use the models for personal, academic or commercial purposes with minimal restrictions.
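Because the weights are public, loading them follows the standard Hugging Face pattern. A sketch, assuming the `transformers` library and DeepSeek's public hub IDs; the full R1 model is far too large for ordinary hardware, so in practice you would point `model_id` at one of the much smaller distilled variants DeepSeek also released:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1"  # MIT-licensed open weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "The theory of relativity was discovered by Albert"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```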
Resetting expectations
DeepSeek has fundamentally altered the landscape of large AI models. An economically trained open-weights model is now on par with more expensive, closed models that require paid subscription plans.
The research community and the stock market will need some time to adjust to this new reality.
Ambuj Tewari, Professor of Statistics, University of Michigan
This article is republished from The Conversation under a Creative Commons license. Read the original article.
10 Best Selling Books About Artificial Intelligence
Life 3.0: Being Human in the Age of Artificial Intelligence by Max Tegmark
This book frames artificial intelligence as an evolution of “life” from biological organisms to engineered systems that can learn, plan, and potentially redesign themselves. It outlines practical AI governance questions – such as safety, economic disruption, and long-term control – while grounding the discussion in real machine learning capabilities and plausible future pathways.
Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
This book analyzes how an advanced artificial intelligence system could outperform humans across domains and why that shift could concentrate power in unstable ways. It maps scenarios for AI takeoff, AI safety failures, and governance responses, presenting the argument in a policy-oriented style rather than as a technical manual.
Human Compatible: Artificial Intelligence and the Problem of Control by Stuart Russell
This book argues that the central issue in modern AI is not capability but control: ensuring advanced systems pursue goals that reliably reflect human preferences. It introduces the alignment challenge in accessible terms, connecting AI research incentives, machine learning design choices, and real-world risk management.
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos
This book explains machine learning as the engine behind modern artificial intelligence and describes multiple “schools” of learning that drive practical AI systems. It connects concepts like pattern recognition, prediction, and optimization to everyday products and to broader societal effects such as automation and data-driven decision-making.
The Alignment Problem: Machine Learning and Human Values by Brian Christian
This book shows how machine learning systems can produce outcomes that diverge from human values even when designers have good intentions and ample data. It uses concrete cases – such as bias in automated decisions and failures in objective-setting – to illustrate why AI ethics and evaluation methods matter for real deployments.
Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell
This book separates marketing claims from technical reality by explaining what today’s AI can do, what it cannot do, and why general intelligence remains difficult. It provides a clear tour of core ideas in AI and machine learning while highlighting recurring limitations like brittleness, shortcut learning, and lack of common sense reasoning.
The Age of AI: And Our Human Future by Henry A. Kissinger, Eric Schmidt, and Daniel Huttenlocher
This book focuses on how artificial intelligence changes institutions that depend on human judgment, including national security, governance, and knowledge creation. It treats AI as a strategic technology, discussing how states and organizations may adapt when prediction, surveillance, and decision-support systems become pervasive.
AI Superpowers: China, Silicon Valley, and the New World Order by Kai-Fu Lee
This book compares the AI business ecosystems of the United States and China, emphasizing how data, talent, capital, and regulation shape competitive outcomes. It explains why applied machine learning and automation may reconfigure labor markets and geopolitical leverage, especially in consumer platforms and industrial applications.
Genius Makers: The Mavericks Who Brought AI to Google, Facebook, and the World by Cade Metz
This book tells the modern history of deep learning through the researchers, labs, and corporate rivalries that turned neural networks into mainstream AI. It shows how technical breakthroughs, compute scaling, and competitive pressure accelerated adoption, while also surfacing tensions around safety, concentration of power, and research openness.
The Coming Wave: AI, Power, and Our Future by Mustafa Suleyman and Michael Bhaskar
This book argues that advanced AI systems will diffuse quickly across economies and governments because they can automate cognitive work at scale and lower the cost of capability. It emphasizes containment and governance challenges, describing how AI policy, security controls, and institutional readiness may determine whether widespread deployment increases stability or amplifies systemic risk.