When you use AI today, you almost certainly use it through an API. You send a prompt to someone else's infrastructure, their data center does the work, and you pay for the privilege. This is the default. It feels inevitable, like this is just how AI works.
But it's a business model, not a law of physics. And it's an intentional one.
The major AI providers have built their entire commercial strategy around centralized inference, backed by unprecedented levels of capital investment. The scale of that commitment makes the model self-reinforcing.
We've been here before
In the 1960s and 70s, computing meant mainframes. You rented time on someone else's machine. Your work lived on their infrastructure. The economics were simple: more usage, more cost.
The companies selling that compute genuinely believed this was the right model. In 1980, Gordon Bell, DEC's Vice President of Engineering, argued that people should have terminals at home connected to powerful remote computers. The real computing, he said, should happen somewhere else. Given what mainframes could do and what early PCs couldn't, it was a rational position. It also happened to align perfectly with DEC's business.
But the PC changed the economics in a way that the terminal model never could. When you owned the machine on your desk, your marginal cost of doing more work dropped to zero. One document or a thousand, the cost was the same.
The remote-compute model that Bell championed wasn't wrong at the time; it was a phase, not the destination.
A technology problem, not a compute one
The default assumption is that AI inference cost scales with use. The more you use it, the more you pay. And it's easier to let someone else manage the infrastructure, so you do. Both of these things are true today. But they're being treated as permanent features of AI, when they're really artifacts of where the technology is right now.
Treating inference cost as permanent frames it as a compute problem, something you can only keep paying for. But it's a technology problem, and technology problems get solved with focused R&D and a clear pathway to improvement.
The pathway is already visible. Inference costs are dropping roughly 10x per year for equivalent capability. Open-source models are closing the gap on proprietary ones with every release cycle. Models that required a data center 18 months ago now run on a laptop.
When local capability crosses the threshold for most tasks, the economics change in the same way the PC changed them. One task or a thousand, the cost is the same. No metering. No per-token billing. You stop rationing AI and just use it. That threshold is approaching faster than the current pricing models account for.
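To make the "no metering" point concrete, here is a minimal sketch of the break-even arithmetic. Every number in it is an illustrative assumption (blended API price per million tokens, hardware cost, amortization period), not a measurement; the point is the shape of the comparison, not the specific figures.

```python
# Illustrative break-even sketch: metered API pricing vs. a one-off local
# hardware purchase. All numbers below are assumptions for illustration only.

API_PRICE_PER_MTOK = 10.0    # assumed blended $/1M tokens (input + output)
HARDWARE_COST = 3_000.0      # assumed one-off cost of a capable local machine
AMORTIZATION_MONTHS = 36     # assumed useful life of that hardware

def monthly_cost(tokens_per_month: float) -> tuple[float, float]:
    """Return (api_cost, local_cost) in dollars for one month of usage."""
    api_cost = tokens_per_month / 1_000_000 * API_PRICE_PER_MTOK
    local_cost = HARDWARE_COST / AMORTIZATION_MONTHS  # flat, usage-independent
    return api_cost, local_cost

if __name__ == "__main__":
    for tokens in (1e6, 10e6, 100e6, 1e9):
        api, local = monthly_cost(tokens)
        print(f"{tokens / 1e6:>8.0f}M tokens/month   API ${api:>9.2f}   local ${local:>7.2f}")
```

Where the lines cross depends entirely on the assumed prices, and that is the point: the metered line scales with usage while the local line stays flat. The sketch deliberately ignores electricity and the capability gap, both of which matter in practice.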
The lock-in is by design
This is not an accident. The major cloud providers have built a self-reinforcing system designed to keep AI centralized.
It starts with investment. Microsoft has put billions into OpenAI. Amazon has invested $8 billion in Anthropic. Google has invested billions in both Anthropic and its own DeepMind. That money flows in as equity, and flows back out as cloud compute contracts. The AI companies build on the investors' infrastructure. The investors fill their data centers. The cycle continues.
Then there's the capital. The scale of investment is hard to overstate.
[Chart: combined capital expenditure vs. free cash flow for Amazon, Alphabet, Microsoft, and Meta, annual figures in $B. Sources: company filings, Platformonomics, CNBC, analyst estimates. 2026 reflects the midpoint of guidance; capex includes finance leases and may understate total infrastructure commitment due to off-balance-sheet lease arrangements.]
That capital needs to be justified, which means it needs workloads. Your workloads.
As models approach commodity status, the incentive sharpens. If the model isn't meaningfully better than the next one, the moat isn't intelligence. It's entrenchment. Keeping customers on the infrastructure, on the meter, is how a provider avoids being swapped out.
These aren't niche players betting on a trend. Seven of the ten most valuable companies in the S&P 500 are in tech or semiconductors. They have more capital, more market power, and more structural leverage than IBM ever had in the mainframe era. The pattern may be the same, but the incumbents are stronger this time. The move from centralized to distributed won't happen by accident.
The fine line between convenience and dependency
None of this makes cloud AI bad. The products are good. The provider experience is polished, the integration is simple, and the productized tooling around the models, from code generation to agentic workflows, is genuinely useful and proven through adoption. For frontier reasoning and complex multi-step tasks, cloud is a reasonable default right now, especially while the ecosystem is still maturing.
The business model also funds what comes next. Training the next generation of models costs hundreds of millions of dollars. That money comes from API revenue and cloud contracts. The current model isn't just distribution, it's the engine that pushes the frontier forward.
But convenience and dependency look the same until you try to leave. Your data travels to someone else's infrastructure for every interaction. Your costs are set by someone else's pricing. Your capabilities are shaped by someone else's product roadmap. When models are commoditizing while switching costs stay high, the value is accruing to the infrastructure provider, not to you.
Cheaper has a ceiling, but access compounds
Cutting costs on existing work has a ceiling. The real case for distributed AI isn't about making the same work cheaper; it's about what becomes possible when the constraint is removed entirely.
The printing press didn't make the same number of books cheaper. It created a world where exponentially more books existed, because the cost of producing one collapsed. Ideas that would never have been published suddenly were.
Local AI opens the same door. When running a model costs nothing extra, you stop asking "is this task worth an API call?" and start asking "what else could I use this for?" That shift, from rationing to exploring, is where new use cases get created. Not by the companies selling the infrastructure, but by the people using it.
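As a concrete illustration of that shift, here is a minimal sketch of pushing an arbitrary batch of tasks through a locally hosted model over an OpenAI-compatible endpoint. It assumes a local server such as Ollama is already running; the URL, port, and model name below are assumptions and will differ by setup.

```python
# Minimal sketch: run a batch of tasks against a locally hosted model.
# Assumes an OpenAI-compatible server is already running on this machine
# (the Ollama default port is used here); URL and model name are assumptions.
import requests

LOCAL_ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumed local server
MODEL = "llama3.1:8b"                                          # assumed local model

def ask(prompt: str) -> str:
    """Send one prompt to the local model and return the reply text."""
    resp = requests.post(
        LOCAL_ENDPOINT,
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# With zero marginal cost, "is this worth an API call?" stops being a question:
# run the model over everything and see what turns up.
documents = ["meeting notes...", "support ticket...", "draft blog post..."]
for doc in documents:
    print(ask(f"Summarize this in one sentence:\n\n{doc}"))
```

Nothing about that loop changes if the list has three items or thirty thousand, which is exactly the rationing-to-exploring shift described above.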
When capability is distributed, the value accrues to the people doing the work, not just the companies providing the tools.
Every major computing paradigm started centralized and moved to the edge. AI will not be the exception. The question is whether we build toward that future intentionally, or wait for the incumbents to decide when it's convenient, if at all.
What Else?
The open-source gap is shrinking. How fast is it actually closing, and when does "good enough" start to mean "better than what most companies need"?
Apple Silicon as a local AI platform. The most capable consumer hardware ever built is sitting on millions of desks, and almost nobody is framing it as a local AI play.
Local tooling needs its own moment. The local AI ecosystem needs its own version of the developer experience that cloud providers have spent years refining. Projects like OpenRouter are early signals, but the tooling gap is still the biggest barrier to adoption.
The procurement question. When does "we can run this ourselves" start showing up in enterprise contract negotiations? That's when the economics of this article stop being theoretical.