AI and Compute

How Much Longer Can Computing Power Drive Artificial Intelligence Progress?

CSET Issue Brief | January 2022

Executive Summary

For the last decade, breakthroughs in artificial intelligence (AI) have come like clockwork, driven to a significant extent by an exponentially growing demand for computing power ("compute" for short). One of the largest models, released in 2020, used 600,000 times more computing power than the noteworthy 2012 model that first popularized deep learning.

Key Finding: Deep learning will soon face a slowdown in its ability to consume ever more compute for at least three reasons: (1) training is expensive; (2) there is a limited supply of AI chips; and (3) training extremely large models generates traffic jams across many processors that are difficult to manage.

Progress towards increasingly powerful and generalizable AI is still possible, but it will require a partial re-orientation away from the dominant strategy of the past decade—more compute—towards other approaches. Improvements in hardware and algorithmic efficiency offer promise for continued advancement, even if they are unlikely to fully offset a slowdown in the growth of computing power usage.

Key Data Points

600,000x: compute increase from 2012 to 2020 models
3.4 months: doubling time for compute needs (2012-2018)
$4.6M: estimated cost to train GPT-3
35 million: AI accelerators reaching datacenters annually

Key Insights Summary

Compute Growth Is Unsustainable

The current growth rate for training the most compute-intensive models is unsustainable. The absolute upper limit of this trend's viability is at most a few years away, and the impending slowdown may have already begun.

Three Major Constraints

Deep learning faces three major constraints: (1) training costs are exploding; (2) there's a limited supply of AI chips; and (3) managing massive models creates parallelization bottlenecks.

Shift from Compute to Efficiency

Future AI progress will rely more on algorithmic innovation and efficiency improvements than on simply scaling up compute usage. The era of "brute-force" AI approaches is ending.

Algorithmic Efficiency Gains

While algorithms have been exponentially improving their efficiency, the rate of improvement is not fast enough to make up for a loss in compute growth. Additional major gains are needed.

Application-Focused Approaches

Researchers are likely to turn to approaches more focused on specific applications rather than the "brute-force" methods that undergirded much of the last decade of AI research.

Policy Implications

If continued AI advancement relies increasingly on improved algorithms and hardware designs, policy should focus on attracting, developing, and retaining talented researchers rather than simply outspending rivals on computing power.

Content Overview

Introduction

In the field of AI, not checking the news for a few months is enough to become "out of touch." Occasionally, this breakneck speed of development is driven by revolutionary theories or original ideas. More often, the newest state-of-the-art model relies on no new conceptual advances at all, just a larger neural network and more powerful computing systems than were used in previous attempts.

In 2018, researchers at OpenAI attempted to quantify the rate at which the largest models in AI research were growing in terms of their demands for computing power. They found that prior to 2012, the amount of compute used to build a breakthrough model grew at roughly the same rate as Moore's law. In 2012, however, the release of the image recognition system AlexNet sparked interest in deep learning methods, and compute demands began climbing far faster—doubling every 3.4 months between 2012 and 2018.
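
To make this pace concrete, the sketch below converts a doubling time into an overall growth factor and contrasts the post-2012 pace with a Moore's-law pace. It is illustrative only; the exact growth over any period depends on the endpoints used, so it is not the report's fitted trendline.

```python
# Illustrative only: converting a doubling time into an overall growth factor.
# The 3.4-month figure is the 2012-2018 doubling time cited above.

def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative increase in compute after the given number of months."""
    return 2 ** (months_elapsed / doubling_months)

print(f"{growth_factor(60, 3.4):,.0f}x")    # 5 years at a 3.4-month doubling: ~200,000x
print(f"{growth_factor(60, 24.0):,.1f}x")   # 5 years at a Moore's-law pace (~24 months): ~5.7x
```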

This compute demand trend only considers the most compute-intensive models from the history of AI research. The most impactful models are not necessarily the largest or the most compute-intensive. However, several of the most well-known breakthroughs of the last decade—from the first AI that could beat a human champion at Go to the first AI that could write news articles that humans mistook for human-authored text—required record-breaking levels of compute to train.

Modern Compute Infrastructure

GPT-3 and similar models such as the Chinese PanGu-alpha, Nvidia's Megatron-Turing NLG, and DeepMind's Gopher are the current state of the art in terms of computing appetite. Training GPT-3 in 2020 required a massive computing system that was effectively one of the five largest supercomputers in the world.

For large models like these, compute consumption is measured in petaFLOPS-days. One petaFLOPS-day is the number of computations that could be performed in one day by a computer capable of calculating a thousand trillion computations per second. For comparison, a standard laptop would need about a year to reach one petaFLOPS-day. That laptop would need several millennia to reach the 3,640 petaFLOPS-days it took to train GPT-3.
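
These comparisons can be checked with simple arithmetic. In the sketch below, the laptop speed is an assumption chosen to be consistent with the "about a year" figure; real laptops vary widely.

```python
# Back-of-the-envelope check of the petaFLOPS-day comparisons above.

SECONDS_PER_DAY = 86_400
PFLOPS_DAY_IN_FLOP = 1e15 * SECONDS_PER_DAY          # ~8.64e19 operations

laptop_flops = 2.7e12                                # assumed sustained laptop speed
days_for_one = PFLOPS_DAY_IN_FLOP / (laptop_flops * SECONDS_PER_DAY)
print(f"{days_for_one:.0f} days for one petaFLOPS-day")                 # ~370 days, about a year

gpt3_pflops_days = 3_640
print(f"{gpt3_pflops_days * days_for_one / 365:,.0f} years for GPT-3")  # several millennia
```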

High-end AI supercomputers require special-purpose accelerators such as Graphics Processing Units (GPUs) or Application-Specific Integrated Circuits (ASICs) like Google's Tensor Processing Units (TPUs) and Huawei's Ascend 910. These accelerators are specialized hardware chips optimized for the mathematical operations of machine learning.

Projecting the Cost and Future of AI and Compute

One possible constraint on the growth of compute is expense. Using Google's TPUs as a baseline to calculate the expected cost of compute, we find that training GPT-3 would cost approximately $1.65 million if trained on TPUs performing continuously at their maximum speeds (though more realistic estimates are around $4.6 million).
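
A minimal sketch of how such a cost estimate is assembled is shown below. The chip speed, hourly price, and utilization are illustrative assumptions rather than the report's exact inputs, so the outputs only match the figures above in order of magnitude.

```python
# Training cost = total FLOP / (peak chip FLOPS x utilization), converted to
# chip-hours and priced at a cloud hourly rate. All hardware figures are assumed.

SECONDS_PER_DAY = 86_400
total_flop = 3_640 * 1e15 * SECONDS_PER_DAY      # GPT-3: 3,640 petaFLOPS-days

chip_peak_flops = 123e12      # assumed peak speed of one accelerator (FLOP/s)
price_per_chip_hour = 2.00    # assumed cloud price (USD)

def training_cost(utilization: float) -> float:
    chip_seconds = total_flop / (chip_peak_flops * utilization)
    return chip_seconds / 3_600 * price_per_chip_hour

print(f"${training_cost(1.0):,.0f}")   # chips at full speed: order of the $1.65M figure
print(f"${training_cost(0.35):,.0f}")  # more realistic utilization: order of the $4.6M figure
```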

By the end of 2021, the compute demand trendline predicted a model requiring just over one million petaFLOPS-days. Training such a model at Google Cloud's current prices would cost over $450 million. While this is a large sticker price, it is not out of reach: governments have funded scientific projects costing billions of dollars.

However, the trendline quickly blows past these benchmarks too, costing as much as the National Ignition Facility ($3.5B) by October 2022, the search for the Higgs Boson ($13.25B) by May 2023, and surpassing the Apollo program in October 2024. By 2026, the training cost would exceed the total U.S. GDP.
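
Purely as an illustration, extrapolating the end-of-2021 figure forward at a 3.4-month doubling time reproduces the scale of these comparisons. The report's milestone dates come from its own fitted trendline and cost assumptions, so exact values will differ.

```python
# Illustrative extrapolation from the roughly $450 million end-of-2021 figure above.

base_cost = 450e6    # projected training cost at the end of 2021 (USD)

def projected_cost(months_after_2021: float, doubling_months: float = 3.4) -> float:
    return base_cost * 2 ** (months_after_2021 / doubling_months)

# Costs quickly reach the scale of the NIF, the Higgs search, Apollo, and U.S. GDP.
for label, months in [("Oct 2022", 10), ("May 2023", 17), ("Oct 2024", 34), ("mid-2026", 54)]:
    print(f"{label}: ~${projected_cost(months) / 1e9:,.0f} billion")
```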

Critical Finding: The compute demand trendline should be expected to break within two to three years at the latest, and certainly well before 2026—if it hasn't done so already.

The Cost of Compute

While our projections might seem pessimistic, one might object that the cost of compute is not fixed. To explore this, we considered historical trends in the cost of computing.

The price per gigaFLOPS of computing performance has not decreased since 2017. Similarly, cloud GPU prices have remained constant for Amazon Web Services since at least 2017 and for Google Cloud since at least 2019. Although more advanced chips have been introduced, they offer only five percent more FLOPS per dollar than Nvidia's V100, which was released in 2017.

During this period, manufacturers have improved performance largely by developing chips that perform less precise computations faster, rather than by performing more full-precision computations. For example, GPT-3 was trained using half-precision (16-bit) numbers, which require half as much memory and can be computed faster.
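
As a rough illustration of the memory savings, the sketch below compares 32-bit and 16-bit storage for a model with GPT-3's widely cited 175 billion parameters; the parameter count is used only as an example, not taken from this report.

```python
# Halving the bits per number halves the memory needed to store a model's weights.
import numpy as np

params = 175e9   # widely cited GPT-3 parameter count, used as an example
for dtype in (np.float32, np.float16):
    gigabytes = params * np.dtype(dtype).itemsize / 1e9
    print(f"{np.dtype(dtype).name}: ~{gigabytes:,.0f} GB of weights")
# float32: ~700 GB, float16: ~350 GB
```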

Even if we assume that compute per dollar is likely to double roughly every four years, or even every two years, the compute trendline still quickly becomes unsustainable before the end of the decade. Relaxing the assumption that compute per dollar is stable likely buys the original trendline only a few additional months of sustainability.
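
The arithmetic behind this point is simple: subtracting the much slower rate at which compute gets cheaper from the rate at which demand grows barely changes the cost doubling time, as the sketch below shows.

```python
# If compute demand doubles every 3.4 months while compute per dollar doubles only
# every 2-4 years, the *cost* of the trendline model still doubles every ~4 months.

def cost_doubling_months(demand_doubling: float = 3.4, price_halving: float = 48.0) -> float:
    # Cost grows as 2**(t/demand_doubling) / 2**(t/price_halving), so its
    # doubling time is 1 / (1/demand_doubling - 1/price_halving).
    return 1 / (1 / demand_doubling - 1 / price_halving)

print(f"{cost_doubling_months(price_halving=48):.1f} months")  # $/FLOP halves every 4 years -> ~3.7
print(f"{cost_doubling_months(price_halving=24):.1f} months")  # $/FLOP halves every 2 years -> ~4.0
```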

The Availability of Compute

Rather than fall, price per computation may actually rise as demand outpaces supply. Excess demand is already driving GPU prices to double or triple retail prices. Chip shortages are stalling the automotive industry and delaying products like iPhones, PlayStations, and Xboxes.

Estimates for the number of existing AI accelerators are imprecise. Once manufactured, most GPUs are used for non-AI applications such as personal computers, gaming, or cryptomining. The large clusters of accelerators needed to set AI compute records are mostly managed in datacenters.

We estimate the total number of accelerators reaching datacenters annually to be somewhere in the ballpark of 35 million. Following the conventional three-year lifespan for accelerators, we find that by the end of 2025, the compute demand trendline predicts that a single model would require the use of every GPU in every datacenter for a continuous period of three years in order to fully train.
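
The order-of-magnitude arithmetic behind this estimate is sketched below; both inputs come from the text, and the sketch ignores differences between accelerator types.

```python
accelerators_per_year = 35e6     # accelerators reaching datacenters annually
lifespan_years = 3               # conventional service life

in_service = accelerators_per_year * lifespan_years
print(f"~{in_service / 1e6:.0f} million accelerators in datacenters at any one time")

# The end-of-2025 claim amounts to needing all of these chips, running continuously
# for their entire three-year service life, to train a single trendline model.
```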

Managing Massive Models

The only major increases in model size since GPT-3's release in 2020 have been a 530 billion parameter model called Megatron-Turing NLG and a 280 billion parameter model called Gopher. The fact that these models fall below the projected compute demand trendline suggests that the trend may have already started to slow down.

For models over roughly one trillion parameters to be trained at all, researchers will have to overcome an additional series of technical challenges driven by a simple problem: models are already getting too large to manage. The largest AI models no longer fit on a single processor, which means that even inference requires clusters of processors to function.
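
A back-of-the-envelope calculation shows why a single processor no longer suffices: merely storing the weights of a 530-billion-parameter model at 16-bit precision takes roughly a terabyte. In the sketch below, the 80 GB per-accelerator memory figure is an assumption matching a current high-end GPU, not a number from the report.

```python
import math

params = 530e9
bytes_per_param = 2                       # 16-bit precision
weights_gb = params * bytes_per_param / 1e9
gpu_memory_gb = 80                        # assumed memory of one high-end accelerator

print(f"~{weights_gb:,.0f} GB of weights")                                         # ~1,060 GB
print(f"at least {math.ceil(weights_gb / gpu_memory_gb)} GPUs just to hold them")  # 14
```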

Parallelization for AI is not new, but current approaches require splitting the layers of a deep neural network across different processors and even splitting individual layers across processors. The 530 billion parameter Megatron-Turing model used 4,480 GPUs in total, with each copy of the model stored across 280 GPUs.
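
The sketch below shows how that GPU count decomposes. The 4,480 total and 280 GPUs per model copy come from the text; the 8 x 35 split within each copy is an assumed illustration of tensor parallelism (splitting individual layers) combined with pipeline parallelism (splitting groups of layers).

```python
total_gpus = 4_480
gpus_per_replica = 280

data_parallel_replicas = total_gpus // gpus_per_replica
print(f"{data_parallel_replicas} copies of the model trained in parallel")   # 16

tensor_parallel, pipeline_parallel = 8, 35          # assumed split: 8 x 35 = 280
print(tensor_parallel * pipeline_parallel == gpus_per_replica)               # True
```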

Managing these data flows so that traffic between processors does not grind to a halt is arguably the main impediment to continuing to scale up the size of AI models. Some experts question whether it is even possible to significantly increase parallelization for transformer models like the one used in GPT-3 beyond what has already been accomplished.

Where Will Future Progress Come From?

If the rate of growth in compute demands is already slowing down, then future progress in AI cannot rely on just continuing to scale up model sizes, and will instead have to come from doing more with more modest increases in compute.

Although algorithms have been exponentially improving their efficiency, the rate of improvement is not fast enough to make up for a loss in compute growth. The number of computations required to reach AlexNet's level of performance in 2018 was a mere 1/25th the number of computations that were required to reach the same level of performance in 2012. But over the same period, the compute demand trend covered a 300,000 times increase in compute usage.
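
Putting the two figures side by side shows the size of the gap; both numbers are taken from the text above.

```python
efficiency_gain = 25        # same performance with 1/25th the computations (2012-2018)
compute_growth = 300_000    # growth in compute used by record-setting models (2012-2018)

print(f"Efficiency covered ~{efficiency_gain / compute_growth:.4%} of the growth")  # <0.01%
print(f"Shortfall factor: {compute_growth / efficiency_gain:,.0f}x")                # ~12,000x
```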

New architectures like Mixture of Experts (MoE) methods allow for more parameters by combining many smaller models, each of which is individually less capable than a single large model. This approach permits models that are larger in the aggregate to be trained on less compute.
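
As a conceptual illustration only, not any specific system's architecture, the sketch below routes each input to one of several small expert networks: the total parameter count grows with the number of experts, while each input only exercises one expert's weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, d_hidden = 16, 4, 32

# Each "expert" is a small two-layer feed-forward network.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.1,
     rng.standard_normal((d_hidden, d_model)) * 0.1)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.1   # routing weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route each row of x to its top-1 expert and apply only that expert."""
    scores = x @ router                      # (batch, n_experts)
    choices = scores.argmax(axis=1)          # top-1 expert per input
    out = np.zeros_like(x)
    for e, (w1, w2) in enumerate(experts):
        mask = choices == e
        if mask.any():
            h = np.maximum(x[mask] @ w1, 0)  # ReLU hidden layer
            out[mask] = h @ w2
    return out

print(moe_forward(rng.standard_normal((8, d_model))).shape)  # (8, 16)
```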

More importantly, not all progress requires record-breaking levels of compute. AlphaFold is revolutionizing aspects of computational biochemistry and only required a few weeks of training on 16 TPUs—likely costing tens of thousands of dollars rather than the millions that were needed to train GPT-3.

Major overhauls of the computing paradigm like quantum computing or neuromorphic chips might one day allow for vast amounts of plentiful new compute. But these radically different approaches are still largely theoretical and are unlikely to make an impact before we project that the compute demand trendline will hit fundamental budgetary and supply availability limits.

Conclusion and Policy Recommendations

For nearly a decade, buying and using more compute each year has been a primary factor driving AI research beyond what was previously thought possible. This trend is likely to break soon. Although experts may disagree about which limitation is most critical, continued progress in AI will soon require addressing major structural challenges such as exploding costs, chip shortages, and parallelization bottlenecks.

Future progress will likely rest far more on a shift towards efficiency in both algorithms and hardware rather than massive increases in compute usage. In addition, we anticipate that the future of AI research will increasingly rely on tailoring algorithms, hardware, and approaches to sub-disciplines and applications.

Policy Recommendations:

  1. Shift focus towards talent development by increasing investment in AI education and competing to attract highly skilled immigrants.
  2. Support AI researchers with technical training, not just compute resources. The National AI Research Resource should help researchers build skills for efficient algorithms and better-scaling parallelization methods.
  3. Promote openness and access to large-scale models throughout the research community, especially for researchers who cannot train their own models.
