Why AI is progressing so quickly in 2025

And why we know the pace of advancement will continue


Today I want to show why the chatter in the legacy media about AI being in a hype cycle or bubble is simply wrong, and how understanding just a little about where advances in AI capability come from makes that clear. Continued AI acceleration in the coming months and years isn't guesswork; it's baked in. There are persistent, reliable processes behind the steady advances, as we'll see.

Stacked advances: "Scaling Laws" and Breakthroughs

The term “scaling law” describes certain processes in AI development, such as adding more compute, that have been shown to produce consistent capability gains across a broad range of increases to the input. The idea is: when we increase an input X (say, compute) by 10 times, we can expect model capability C to increase by some consistent factor, say 2 or 5 or 10 times. And we have good reason to expect that relationship to hold even when we increase X by 100 or 10,000 or 100 billion times. In other words, a scaling law is the expression of a practical technique for achieving compounding improvement in capabilities.
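Written out, a scaling law of this kind is just a power law. Here's a minimal sketch of the form, with an exponent chosen purely for illustration rather than measured from any real system:

```latex
% Minimal form of a scaling law: a capability (or loss) measure as a power of an input X.
% The exponent \alpha is illustrative, not a measured value.
C(X) \propto X^{\alpha}
% Example: with \alpha = 0.5, a 10x increase in X gives a 10^{0.5} \approx 3.2x
% increase in C, and a 100x increase gives 10x; the same multiplier applies at
% every scale, which is what makes the improvement compound.
```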

AI scaling laws are important because, unlike most scientific research programs with hazy future outcomes, they make the shape of progress to come in AI partly visible through the haze.

A lot of the "legitimate" debate over coming AI capabilities is actually debate over whether a particular scaling law is coming to an end, i.e. whether we have extracted all the "juice" out of it. But the more independent scaling-law routes to progress are shown to exist, the more confident we can be that consistent progress in capabilities will continue.

There are technical formulations of some of these laws,1 but the general idea can be captured simply: what are the known research processes that generate predictable and significant advances in model capability?
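For a concrete example of such a formulation, the Chinchilla paper (footnote 1) fits the pretraining loss of a transformer as a function of parameter count N and training tokens D; the constants below are quoted approximately from the paper:

```latex
% Chinchilla parametric loss (Hoffmann et al., 2022): pretraining loss as a
% function of model parameters N and training tokens D.
% Approximate reported fits:
%   E \approx 1.69, A \approx 406.4, B \approx 410.7,
%   \alpha \approx 0.34, \beta \approx 0.28
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Minimizing this loss under a fixed compute budget is what produces the paper's well-known recipe: scale parameters and training tokens roughly in proportion, at around 20 tokens per parameter.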

There are also less predictable routes to progress that don’t have scaling laws attached to them, but that have had tremendous periodic impact since the start of the deep learning revolution in the late 2000s. New model architectures like the transformer (the T in ChatGPT) or diffusion models (powering image generative models like Midjourney and Stable Diffusion) fit into this category.

Today, the basket of core approaches contains: advances in compute hardware, model architectures, increases in training data size, and inference-time scaling. Let's look at each of these and why we know, or expect, that each will continue.

Compute Capacity

Increases in AI compute capacity are actually driven by two processes:

  1. Raw technology advances: Chip logic, memory, and interconnect (think networking), and

  2. Increasing cluster deployment sizes

The first is a continuation2 of the compute performance improvements we've seen since the 1940s. The second is driven, and governed, by money and electric power generation capacity. Both of these drivers have a lot more juice left in them.
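As a back-of-the-envelope illustration of how the two drivers multiply together, here's a small sketch; the yearly growth rates are assumptions picked for the example, not measured figures:

```python
# Back-of-the-envelope sketch of how the two compute drivers compound.
# Both yearly growth rates are illustrative assumptions, not measured figures.

chip_perf_growth = 1.4      # assumed yearly gain from chip logic, memory, and interconnect
cluster_size_growth = 2.0   # assumed yearly growth in deployed cluster size (money + power)

compute = 1.0               # total training compute available, in arbitrary units
for year in range(1, 6):
    compute *= chip_perf_growth * cluster_size_growth
    print(f"year {year}: ~{compute:.0f}x the compute of year 0")
# Two modest-looking multipliers compound to roughly 172x after five years (2.8^5).
```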

On January 21st, OpenAI, SoftBank, and Oracle announced the creation of a new company called Stargate. The announcement was headlined by colossal investment numbers: $500 billion over the next 4 years, with the first $100 billion “being deployed now”, including into an in-progress data center and power project in Texas. When challenged with claims that “they don’t have the money”, Satya Nadella, CEO of Stargate technology partner Microsoft, had a mic-drop moment: “All I know is, I’m good for my $80 billion”. The point is, compute deployments are getting much more ambitious.

On the issue of new electricity generation, as I wrote in November: “Although the U.S. leads AI development now, the critical nature of new power station construction suggests China could have a decisive advantage”. This is still true, although there are indications that expanded U.S. electric generation capacity may be a priority for the new administration. The president issued an executive order titled “Declaring a National Energy Emergency”, partly aimed at bypassing regulatory barriers to the construction of energy infrastructure like power plants.

Inference-time Scaling

The youngest of the scaling processes at work, inference-time scaling covers anything that improves model performance by spending more compute while the model generates its answer, rather than at training time. OpenAI's o1 family of models was introduced in September 2024 with o1-preview and o1-mini. The full o1 was released in December and is available to some tiers of OpenAI users; o3 was announced just days later.

o1 is special because it is the first model family to use inference-time scaling to, in effect, think harder about its answers. By spending more compute while answering, it can exhibit greater capability in terms of reasoning and answer quality.
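OpenAI hasn't published how o1 works internally, so the sketch below is not o1's method; it only illustrates the simplest version of the general idea, best-of-n sampling with a majority vote (often called self-consistency), where spending more compute at answer time buys a more reliable answer. The `generate_answer` function is a hypothetical stand-in for any LLM call:

```python
import random
from collections import Counter

def generate_answer(question: str) -> str:
    """Hypothetical stand-in for a single stochastic LLM call."""
    # Pretend the model gets this question right 60% of the time.
    return "42" if random.random() < 0.6 else str(random.randint(0, 99))

def answer_with_more_inference_compute(question: str, n_samples: int) -> str:
    """Spend more inference compute (n_samples) for a more reliable answer,
    via majority vote over independent samples (self-consistency)."""
    votes = Counter(generate_answer(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# More samples means more inference-time compute, and better odds that the
# majority answer is correct.
for n in (1, 5, 25):
    print(n, answer_with_more_inference_compute("What is 6 * 7?", n))
```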

Just as some voices were again claiming we were "hitting a wall" with AI capability increases, progress actually accelerated: partly because the paradigms of increasing model training data (via synthetic data) and compute are not as dead as those voices claim, but mostly because inference-time scaling has been overlaid on the existing rate of progress as a new scaling law.

[Chart] One performance metric, but they all look like this. Source: Epoch AI

We’re just months into the inference-time scaling paradigm, yet it is already showing incredible progress, particularly in reasoning capability, which has been a weak point for LLMs.

Model Architecture Advances

Of all the scaling processes at work, model architecture improvement is the most unpredictable. As a result, in my view, it is undervalued as a source of future gains. The dynamic is something like the investor attitude to Apple in the iPod days. Paraphrasing: “Apple's success is hit-driven (new product category driven), and hits can't be predicted, so there will never be another hit and Apple is doomed”. A common take on AI model architectural advances (breakthroughs) is that since they can't be predicted, we should assume there won't be any.

I tend toward a more optimistic view of the prospects for architectural breakthroughs, for a couple of reasons.

First, we're still very early in our understanding of learning algorithms, and it's absurd to think that, say, the transformer is the ultimate architecture. The transformer is the breakthrough model architecture that gives ChatGPT its name (GPT = Generative Pre-trained Transformer) and has powered the capability growth of LLMs since its introduction by Google in 2017. Ever since the transformer revealed its potential, people have wondered when a successor might arrive. With Google's Titans architecture,3 the next breakthrough may be here. Titans introduces a promising new form of long-term memory and inference-time learning; we'll learn soon whether it lives up to expectations.

Second, we are just entering the era of AI self-improvement through the use of AI researchers (agents), expanding the pool of AI scientists beyond human PhDs. This will prove to be a powerful source of compounding advances to overlay on the rest.

One more thing: Hallucination rates keep dropping

This is an item that really doesn't get enough coverage. The idea that LLMs are unreliable due to “hallucinations”, and that the problem is not getting better, is widespread. Hallucination is often one of the first things the broader public hears about when first encountering LLMs. It is obviously a real problem, but it's getting better, and fairly quickly. The graph below isn't ordered by release date, but the vertical axis mostly tracks release date, running from older (bottom) to newer (top).
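For readers wondering what a “hallucination rate” actually measures in charts like this, here is a generic sketch of how such an evaluation is typically structured. It is not any specific benchmark's implementation; `summarize` and `is_supported_by` are naive placeholders standing in for the model under test and a factual-consistency checker:

```python
# Generic sketch of a hallucination-rate evaluation, not any specific benchmark.
# `summarize` and `is_supported_by` are naive placeholders: a real evaluation
# calls the LLM under test and uses human raters or a judge model.

def summarize(document: str) -> str:
    # Placeholder for the model under test (a real benchmark calls the LLM here).
    return document.split(".")[0] + "."

def is_supported_by(summary: str, document: str) -> bool:
    # Placeholder consistency check (real evaluations use raters or a judge model).
    return summary in document

def hallucination_rate(documents: list[str]) -> float:
    """Fraction of generated summaries containing claims the source doesn't support."""
    unsupported = sum(not is_supported_by(summarize(d), d) for d in documents)
    return unsupported / len(documents)

print(hallucination_rate(["The sky is blue. It rained today.", "Cats are mammals."]))
```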

That’s it for this week, enjoy your weekend!

Have feedback? Are there topics you’d like to see covered?
Reach out at: jeff @ roadtoartificia.com


1  A great example of a scaling law with a clear technical formulation is the Chinchilla scaling paper, which provides a recipe for choosing the (transformer) model size and training dataset size given a certain compute budget: Training Compute-Optimal Large Language Models - Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al.

2  A type of continuation, because most of the computer era was driven by Moore's Law, which no longer correctly describes the rate of semiconductor advances. Instead, a sub-Moore pace of transistor density improvement continues; more significantly, embarrassingly parallel workloads (like AI training and inference) and chip designs tuned for them allow other paths to improvement.

3  Titans: Learning to Memorize at Test Time - Ali Behrouz, Peilin Zhong, Vahab Mirrokni