When you use AI today, you almost certainly use it through an API. You send a prompt to someone else's infrastructure, their data center does the work, and you pay for the privilege. This is the default. It feels inevitable, like this is just how AI works.
But it's a business model, not a law of physics. And it's an intentional one.
The major AI providers have built their entire commercial strategy around centralized inference, backed by unprecedented levels of capital investment. The scale of that commitment makes the model self-reinforcing.
We've been here before
In the 1960s and 70s, computing meant mainframes. You rented time on someone else's machine. Your work lived on their infrastructure. The economics were simple: more usage, more cost.
The companies selling that compute genuinely believed this was the right model. In 1980, Gordon Bell, DEC's Vice President of Engineering, argued that people should have terminals at home connected to powerful remote computers. The real computing, he said, should happen somewhere else. Given what mainframes could do and what early PCs couldn't, it was a rational position. It also happened to align perfectly with DEC's business.
But the PC changed the economics in a way that the terminal model never could. When you owned the machine on your desk, your marginal cost of doing more work dropped to zero. One document or a thousand, the cost was the same.
The remote-compute model that Bell championed wasn't wrong at the time; it was a phase, not the destination.
A technology problem, not a compute one
The default assumption is that AI inference cost scales with use. The more you use it, the more you pay. And it's easier to let someone else manage the infrastructure, so you do. Both of these things are true today. But they're being treated as permanent features of AI, when they're really artifacts of where the technology is right now.
Treating inference cost as permanent frames it as a compute problem, something you can only keep paying for. But it's a technology problem, and technology problems get solved with focused R&D and a clear pathway to improvement.
The pathway is already visible. Inference costs are dropping roughly 10x per year for equivalent capability. Open-source models are closing the gap on proprietary ones with every release cycle. Models that required a data center 18 months ago now run on a laptop.
When local capability crosses the threshold for most tasks, the economics change in the same way the PC changed them. One task or a thousand, the cost is the same. No metering. No per-token billing. You stop rationing AI and just use it. That threshold is approaching faster than the current pricing models account for.
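To make the "no metering" point concrete, here is a minimal sketch of the break-even arithmetic. Every number in it is an illustrative assumption (blended API price per million tokens, hardware cost, amortization period), not a measurement; the point is the shape of the comparison, not the specific figures.

```python
# Illustrative break-even sketch: metered API pricing vs. a one-off local
# hardware purchase. All numbers below are assumptions for illustration only.

API_PRICE_PER_MTOK = 10.0    # assumed blended $/1M tokens (input + output)
HARDWARE_COST = 3_000.0      # assumed one-off cost of a capable local machine
AMORTIZATION_MONTHS = 36     # assumed useful life of that hardware

def monthly_cost(tokens_per_month: float) -> tuple[float, float]:
    """Return (api_cost, local_cost) in dollars for one month of usage."""
    api_cost = tokens_per_month / 1_000_000 * API_PRICE_PER_MTOK
    local_cost = HARDWARE_COST / AMORTIZATION_MONTHS  # flat, usage-independent
    return api_cost, local_cost

if __name__ == "__main__":
    for tokens in (1e6, 10e6, 100e6, 1e9):
        api, local = monthly_cost(tokens)
        print(f"{tokens / 1e6:>8.0f}M tokens/month   API ${api:>9.2f}   local ${local:>7.2f}")
```

Where the lines cross depends entirely on the assumed prices, and that is the point: the metered line scales with usage while the local line stays flat. The sketch deliberately ignores electricity and the capability gap, both of which matter in practice.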
The lock-in is by design
This is not an accident. The major cloud providers have built a self-reinforcing system designed to keep AI centralized.
It starts with investment. Microsoft has put billions into OpenAI. Amazon has invested $8 billion in Anthropic. Google has invested billions in both Anthropic and its own DeepMind. That money flows in as equity, and flows back out as cloud compute contracts. The AI companies build on the investors' infrastructure. The investors fill their data centers. The cycle continues.
Then there's the capital. The scale of investment is hard to overstate.
[Chart: combined capital expenditure vs. free cash flow for Amazon, Alphabet, Microsoft, and Meta, annual figures in $B. Sources: company filings, Platformonomics, CNBC, analyst estimates. 2026 reflects the midpoint of guidance; capex includes finance leases and may understate total infrastructure commitment due to off-balance-sheet lease arrangements.]
That capital needs to be justified, which means it needs workloads. Your workloads.
As models approach commodity status, the incentive sharpens. If the model isn't meaningfully better than the next one, the moat isn't intelligence. It's entrenchment. Keeping customers on the infrastructure, on the meter, is how a provider avoids being swapped out.
These aren't niche players betting on a trend. Seven of the ten most valuable companies in the S&P 500 are in tech or semiconductors. They have more capital, more market power, and more structural leverage than IBM ever had in the mainframe era. The pattern may be the same, but the incumbents are stronger this time. The move from centralized to distributed won't happen by accident.
The fine line between convenience and dependency
None of this makes cloud AI bad. The products are good. The provider experience is polished, the integration is simple, and the productized tooling around the models, from code generation to agentic workflows, is genuinely useful and proven through adoption. For frontier reasoning and complex multi-step tasks, cloud is a reasonable default right now, especially while the ecosystem is still maturing.
The business model also funds what comes next. Training the next generation of models costs hundreds of millions of dollars. That money comes from API revenue and cloud contracts. The current model isn't just distribution, it's the engine that pushes the frontier forward.
But convenience and dependency look the same until you try to leave. Your data travels to someone else's infrastructure for every interaction. Your costs are set by someone else's pricing. Your capabilities are shaped by someone else's product roadmap. When models are commoditizing while switching costs stay high, the value is accruing to the infrastructure provider, not to you.
Cheaper has a ceiling, but access compounds
Cutting costs on existing work has a ceiling. The real case for distributed AI isn't about making the same work cheaper; it's about what becomes possible when the constraint is removed entirely.
The printing press didn't make the same number of books cheaper. It created a world where exponentially more books existed, because the cost of producing one collapsed. Ideas that would never have been published suddenly were.
Local AI opens the same door. When running a model costs nothing extra, you stop asking "is this task worth an API call?" and start asking "what else could I use this for?" That shift, from rationing to exploring, is where new use cases get created. Not by the companies selling the infrastructure, but by the people using it.
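As a concrete illustration of that shift, here is a minimal sketch of pushing an arbitrary batch of tasks through a locally hosted model over an OpenAI-compatible endpoint. It assumes a local server such as Ollama is already running; the URL, port, and model name below are assumptions and will differ by setup.

```python
# Minimal sketch: run a batch of tasks against a locally hosted model.
# Assumes an OpenAI-compatible server is already running on this machine
# (the Ollama default port is used here); URL and model name are assumptions.
import requests

LOCAL_ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumed local server
MODEL = "llama3.1:8b"                                          # assumed local model

def ask(prompt: str) -> str:
    """Send one prompt to the local model and return the reply text."""
    resp = requests.post(
        LOCAL_ENDPOINT,
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# With zero marginal cost, "is this worth an API call?" stops being a question:
# run the model over everything and see what turns up.
documents = ["meeting notes...", "support ticket...", "draft blog post..."]
for doc in documents:
    print(ask(f"Summarize this in one sentence:\n\n{doc}"))
```

Nothing about that loop changes if the list has three items or thirty thousand, which is exactly the rationing-to-exploring shift described above.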
When capability is distributed, the value accrues to the people doing the work, not just the companies providing the tools.
Every major computing paradigm started centralized and moved to the edge. AI will not be the exception. The question is whether we build toward that future intentionally, or wait for the incumbents to decide when it's convenient, if at all.
What Else?
The open-source gap is shrinking. How fast is it actually closing, and when does "good enough" start to mean "better than what most companies need"?
Apple Silicon as a local AI platform. The most capable consumer hardware ever built is sitting on millions of desks, and almost nobody is framing it as a local AI play.
Local tooling needs its own moment. The local AI ecosystem needs its own version of the developer experience that cloud providers have spent years refining. Projects like OpenRouter are early signals, but the tooling gap is still the biggest barrier to adoption.
The procurement question. When does "we can run this ourselves" start showing up in enterprise contract negotiations? That's when the economics of this article stop being theoretical.