Executive Summary
For the last decade, breakthroughs in artificial intelligence (AI) have come like clockwork, driven to a significant extent by an exponentially growing demand for computing power ("compute" for short). One of the largest models, released in 2020, used 600,000 times more computing power than the noteworthy 2012 model that first popularized deep learning.
Key Insight: Deep learning will soon face a slowdown in its ability to consume ever more compute for at least three reasons: (1) training is expensive; (2) there is a limited supply of AI chips; and (3) training extremely large models generates traffic jams across many processors that are difficult to manage.
Progress towards increasingly powerful and generalizable AI is still possible, but it will require a partial re-orientation away from the dominant strategy of the past decade—more compute—towards other approaches.
Key Insights Summary
Compute Growth Is Unsustainable
The current growth rate for training the most compute-intensive models cannot be sustained. We estimate that the absolute upper limit of this trend's viability is at most a few years away, and the slowdown may have already begun.
Three Major Constraints
Deep learning faces three major constraints: exploding training costs, limited supply of AI chips, and parallelization bottlenecks that create "traffic jams" across processors.
Algorithmic Efficiency Gains Not Enough
Although algorithmic efficiency has been improving exponentially, the rate of improvement is not fast enough to make up for the loss of compute growth.
Shift to Application-Centric Approaches
Future progress will likely involve more incremental improvements to existing algorithms and a shift toward application-centric problems rather than simply scaling up compute usage.
Policy Focus on Talent Development
Continued AI progress requires shifting focus toward talent development, technical training for researchers, and promoting openness and access to large-scale models.
Alternative Computing Paradigms
Major overhauls like quantum computing or neuromorphic chips might one day allow for vast amounts of new compute, but these are unlikely to impact the compute demand trendline before it hits fundamental limits.
Introduction
In 2018, researchers at OpenAI attempted to quantify the rate at which the largest models in AI research were growing in terms of their demands for computing power. They found that prior to 2012, the amount of compute used to build a breakthrough model grew at roughly the same rate as Moore's law. Following the release of AlexNet in 2012, however, compute demands began climbing far faster, doubling every 3.4 months between 2012 and 2018.
This compute demand trend only considers the most compute-intensive models from the history of AI research. While most AI projects are much smaller than these large efforts, several of the most well-known breakthroughs of the last decade required record-breaking levels of compute to train.
Modern Compute Infrastructure
GPT-3 and similar models are the current state of the art in terms of computing appetite. Training GPT-3 in 2020 required a massive computing system that was effectively one of the five largest supercomputers in the world.
High-end AI supercomputers require special-purpose accelerators: Graphics Processing Units (GPUs) or Application-Specific Integrated Circuits (ASICs), such as Google's Tensor Processing Units (TPUs). These accelerators are specialized hardware chips optimized for performing the mathematical operations of machine learning.
| Processor Type | Uses in the AI Pipeline | Other Uses |
| --- | --- | --- |
| Central Processing Unit (CPU) | Small models can be directly trained on CPUs; necessary in larger models to coordinate training across GPUs or ASICs | Central unit of every computing device |
| Graphics Processing Unit (GPU) | Optimized for mathematical operations common in machine learning; can train models far quicker than CPUs | Video game graphics, cryptocurrency mining |
| Application-Specific Integrated Circuit (ASIC) | Designed specifically for AI matrix operations; can train models far quicker than CPUs | If designed specifically for AI, no major uses beyond the AI pipeline |
| Field Programmable Gate Array (FPGA) | Primarily used for model inference using trained AI models | Wide variety of applications, particularly embedded systems |
Projecting the Cost and Future of AI and Compute
One possible constraint on the growth of compute is expense. Using Google's TPUs as a baseline, we estimate that training GPT-3 would cost approximately $1.65 million if trained on TPUs performing continuously at their maximum speeds.
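As a rough illustration of how an estimate like this can be assembled (a back-of-the-envelope sketch, not the report's exact methodology), the calculation below assumes GPT-3's published training compute of roughly 3.14 × 10²³ FLOPs, a TPU v3 peak throughput of about 420 teraFLOPS per device, and an on-demand price of roughly $8 per device-hour. All three inputs are illustrative assumptions.

```python
# Back-of-the-envelope estimate of GPT-3's training cost on TPUs.
# All inputs are illustrative assumptions, not figures taken from the report.

TOTAL_TRAINING_FLOPS = 3.14e23   # GPT-3's published total training compute (FLOPs)
TPU_PEAK_FLOPS = 420e12          # assumed TPU v3 peak throughput (FLOP/s per device)
PRICE_PER_DEVICE_HOUR = 8.00     # assumed on-demand cloud price (USD per device-hour)

device_seconds = TOTAL_TRAINING_FLOPS / TPU_PEAK_FLOPS   # time at 100% utilization
device_hours = device_seconds / 3600
cost = device_hours * PRICE_PER_DEVICE_HOUR

print(f"Device-hours needed: {device_hours:,.0f}")
print(f"Estimated cost: ${cost:,.0f}")   # roughly $1.7 million under these assumptions
```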
By the end of 2021, the compute demand trendline predicted a model requiring just over one million petaFLOPS-days. Training such a model at Google Cloud's current prices would cost over $450 million.
While large, this cost is not out of reach for governments. However, the trendline quickly surpasses these benchmarks: training costs would match the National Ignition Facility by October 2022, the search for the Higgs boson by May 2023, and the Apollo program by October 2024. By 2026, the cost of training a single model would exceed total U.S. GDP.
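A simple extrapolation shows where a crossing date like this comes from. The sketch below assumes a training cost of about $450 million at the end of 2021, a cost that doubles every 3.4 months along with the compute demand trendline, and a U.S. GDP of roughly $23 trillion; these are illustrative assumptions and the report's own inputs may differ slightly.

```python
import math
from datetime import date, timedelta

# Illustrative extrapolation of training cost along the compute demand trendline.
# Assumed inputs (not the report's exact figures):
BASELINE_COST = 450e6               # ~$450M training cost at the end of 2021
BASELINE_DATE = date(2021, 12, 31)
DOUBLING_MONTHS = 3.4               # compute demand (and hence cost) doubling time
US_GDP = 23e12                      # ~$23 trillion

def months_until_cost(target):
    """Months after the baseline until the projected cost reaches `target`."""
    doublings = math.log2(target / BASELINE_COST)
    return doublings * DOUBLING_MONTHS

months = months_until_cost(US_GDP)
crossing = BASELINE_DATE + timedelta(days=30.44 * months)
print(f"Projected cost exceeds U.S. GDP after ~{months:.0f} months, around {crossing:%B %Y}")
# With these assumptions the crossing lands in 2026, consistent with the report.
```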
The Cost of Compute
The price of computation, measured in dollars per gigaFLOPS, has not decreased since 2017. Similarly, cloud GPU prices have remained constant for Amazon Web Services since at least 2017 and for Google Cloud since at least 2019.
During this period, manufacturers have improved performance by developing chips that perform less precise computations rather than simply more of them. Only so much precision can be shaved off before AI performance degrades, and these techniques are quickly approaching their practical limits.
Analysis: Even if compute per dollar doubled at the rapid pace of every two years, the point where training costs would exceed U.S. GDP is only delayed until May 2027—less than a year after it would be reached with no changes in the price of compute.
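To see why cheaper compute barely moves this date, consider the net growth rate: if cost doubles every 3.4 months along with compute demand while compute per dollar doubles every 24 months, cost still doubles roughly every 4 months. The sketch below reuses the illustrative assumptions from above (about $450 million at the end of 2021 and a $23 trillion GDP), which are not the report's exact inputs.

```python
import math

# Net cost doubling time when compute demand grows faster than the price of compute falls.
DEMAND_DOUBLING_MONTHS = 3.4      # compute demand doubling time
PRICE_HALVING_MONTHS = 24.0       # hypothetical compute-per-dollar doubling time

# Cost grows as 2^(t/3.4) / 2^(t/24); solve for the time needed for one net doubling.
net_doubling = 1 / (1 / DEMAND_DOUBLING_MONTHS - 1 / PRICE_HALVING_MONTHS)
print(f"Net cost doubling time: {net_doubling:.2f} months")        # ~3.96 months

doublings_to_gdp = math.log2(23e12 / 450e6)                        # same baseline as above
print(f"Months until cost exceeds GDP: {doublings_to_gdp * net_doubling:.0f}")
# Roughly 62 months from the end of 2021, i.e. sometime in 2027 under these
# assumptions, compared with roughly 53 months when the price of compute is flat.
```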
The Availability of Compute
Rather than fall, the price per computation may actually rise as demand outpaces supply. Excess demand is already pushing GPU street prices to two or three times their retail list prices.
Estimates for the number of existing AI accelerators are imprecise, but we estimate the total number of accelerators reaching datacenters annually to be somewhere in the ballpark of 35 million.
By the end of 2025, the compute demand trendline predicts that a single model would require the use of every GPU in every datacenter for a continuous period of three years in order to fully train.
Managing Massive Models
The only major increases in model size since GPT-3's release in 2020 have been Megatron-Turing NLG (530 billion parameters) and Gopher (280 billion parameters). The fact that these models fell below the projected compute demand trend line suggests that the trend may have already started to slow down.
For models over roughly one trillion parameters to be trained at all, researchers will have to overcome additional technical challenges: models are already getting too large to manage. The largest AI models no longer fit on a single processor, which means that even inference requires clusters of processors to function.
Parallelization for AI is not new, but today's models require more sophisticated approaches. The 530 billion parameter Megatron-Turing model used 4,480 GPUs in total, with each copy of the model stored across 280 GPUs.
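Some simple arithmetic illustrates why a model of this size cannot live on one processor and how the 280-way split relates to the full run. The figures below (16-bit weights and 80 GB of memory per GPU) are illustrative assumptions, not details taken from the report.

```python
# Why a 530-billion-parameter model must be split across many GPUs.
# Illustrative assumptions: 2 bytes per parameter (16-bit weights), 80 GB GPUs.

PARAMETERS = 530e9
BYTES_PER_PARAM = 2          # fp16/bf16 weights only; optimizer state needs far more
GPU_MEMORY_BYTES = 80e9      # memory of an assumed high-end accelerator

weights_bytes = PARAMETERS * BYTES_PER_PARAM
print(f"Weights alone: {weights_bytes / 1e12:.2f} TB")                     # ~1.06 TB

min_gpus_per_copy = weights_bytes / GPU_MEMORY_BYTES
print(f"Minimum GPUs just to hold the weights: {min_gpus_per_copy:.0f}")   # ~13

# In practice, activations, gradients, and optimizer state push the split far
# higher: Megatron-Turing stored each model copy across 280 GPUs, giving
# 4480 / 280 = 16 data-parallel copies training simultaneously.
copies = 4480 // 280
print(f"Data-parallel copies in the full 4,480-GPU run: {copies}")
```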
Critical Challenge: Splitting training across multiple processors means computation results must be passed between them. At large scales, communication creates traffic jams. Managing these flows is arguably the main impediment for continuing to scale up AI models.
Where Will Future Progress Come From?
If the rate of growth in compute demands is slowing, future AI progress cannot rely on just scaling up model sizes, and will instead have to come from doing more with more modest increases in compute.
Algorithmic efficiency improvements have been impressive but not sufficient to compensate for reduced compute growth. Making up for reduced ability to scale up compute would require finding major additional gains at a rate faster than researchers have already been achieving.
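One way to see the gap is to compare annual growth rates. The compute demand trend corresponds to roughly a tenfold increase per year, while measured algorithmic efficiency gains have historically doubled closer to every 16 months; that 16-month figure is an illustrative assumption drawn from published estimates rather than a number quoted from this report.

```python
# Comparing annual growth from compute scaling vs. algorithmic efficiency.
# The 16-month efficiency doubling time is an illustrative assumption.

COMPUTE_DOUBLING_MONTHS = 3.4
EFFICIENCY_DOUBLING_MONTHS = 16.0

compute_growth_per_year = 2 ** (12 / COMPUTE_DOUBLING_MONTHS)
efficiency_growth_per_year = 2 ** (12 / EFFICIENCY_DOUBLING_MONTHS)

print(f"Compute demand trend: ~{compute_growth_per_year:.0f}x more compute per year")
print(f"Algorithmic efficiency: ~{efficiency_growth_per_year:.1f}x per year")
# Roughly 11x vs. 1.7x: if compute scaling stalls, efficiency gains at their
# historical pace replace only a small fraction of the lost growth.
```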
Alternative approaches show promise:
- Mixture of Experts (MoE) methods allow for more parameters by combining many smaller expert networks and activating only a few of them per input, permitting larger aggregate models to be trained on less compute (see the sketch after this list).
- Application-centric approaches like AlphaFold demonstrate that revolutionary progress can be made without record-breaking compute levels.
- Fine-tuning foundation models for specific applications requires far less compute than training from scratch.
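To make the Mixture of Experts idea above concrete, the sketch below implements a minimal top-1 gated MoE layer in plain NumPy. It is a generic illustration of the technique, not one of the specific architectures discussed in the report: each input is routed to a single expert, so the total parameter count grows with the number of experts while the compute per input stays roughly that of one small network.

```python
import numpy as np

# Minimal top-1 Mixture of Experts layer: many expert networks, but each
# input is processed by only one of them, so compute per input stays small
# even as the total parameter count grows with the number of experts.

rng = np.random.default_rng(0)
d_model, n_experts = 64, 8

gate_w = rng.normal(size=(d_model, n_experts)) * 0.02     # router (gating) weights
experts = [rng.normal(size=(d_model, d_model)) * 0.02     # one weight matrix
           for _ in range(n_experts)]                     # per expert

def moe_forward(x):
    """x: (batch, d_model). Route each row to its highest-scoring expert."""
    scores = x @ gate_w                        # (batch, n_experts) routing logits
    chosen = scores.argmax(axis=1)             # top-1 expert index per input
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = chosen == e
        if mask.any():                         # only the chosen expert runs
            out[mask] = x[mask] @ experts[e]
    return out

tokens = rng.normal(size=(16, d_model))
print(moe_forward(tokens).shape)               # (16, 64)
```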
Major overhauls like quantum computing or neuromorphic chips are unlikely to impact the compute demand trendline before it hits fundamental limits.
Conclusion and Policy Recommendations
For nearly a decade, buying and using more compute each year has been a primary factor driving AI research beyond what was previously thought possible. This trend is likely to break soon due to exploding costs, chip shortages, and parallelization bottlenecks.
Future progress will likely rest far more on a shift towards efficiency in both algorithms and hardware rather than massive increases in compute usage. This has several implications for policymakers:
1. Shift Focus to Talent Development
Improving algorithmic efficiency and overcoming parallelization bottlenecks require significantly more human expertise than simply purchasing more compute. Policymakers should invest in AI education and make the U.S. an attractive destination for AI talent.
2. Support Researchers with Technical Training
Institutions like the National AI Research Resource should provide not just compute resources but also educational tools to help researchers build skills for innovating with efficient algorithms and better-scaling parallelization methods.
3. Promote Openness and Access
Policymakers should encourage owners of large foundation models to permit appropriately vetted researchers access to these models, ensuring AI remains a field where researchers of many backgrounds can contribute.