Your Data Never Dies

The purpose fades. The data doesn't.

Michael Eggleton

March 2026

In 2016, hundreds of millions of people downloaded Pokémon Go and started walking around their neighbourhoods pointing their phones at buildings, parks, and street corners, trying to catch 'em all. They were also building one of the most comprehensive spatial datasets on the planet.

Catching more than we thought

A decade later, that data (30 billion images tagged with precise location metadata) is being used to train a "Large Geospatial Model" that powers autonomous delivery robots navigating city streets. Niantic, the company behind the game, spun out a division called Niantic Spatial to commercialise it. Their robots are already operating in Los Angeles, Chicago, Miami, and Helsinki.

The players who granted access to their phone’s camera and location weren't consenting to training navigation systems for delivery robots. That use case didn't exist yet. The consent was for a game. The value is being extracted by an entirely different industry, a decade later.

Hungry for more

The internet has been scraped. The publicly available text, images, and code that powered the first wave of large language models is largely exhausted as a novel training resource. The frontier of AI development now runs on proprietary, unseen, real-world data - the kind that doesn't exist on the open web. The kind that belongs to individuals, spread across their digital and physical lives.

This creates enormous pressure on any organisation sitting on a unique dataset. Health records, financial transactions, geospatial imagery, sensor data, clinical notes, behavioural patterns - these are the training sets that AI companies need. And the incentive to monetise them, or to acquire the companies that hold them, is increasing every quarter.

Health and financial data sit at the sharpest end of this. They're simultaneously the most sensitive (deeply personal, heavily regulated) and the most commercially valuable (proprietary, high signal, hard to replicate). Clinical datasets in particular - patient histories, diagnostic patterns, treatment outcomes - represent exactly the kind of novel, high-quality training data that the market is hungry for. The same is true for financial transaction data, insurance claims, and purchase records. The value locked inside these datasets is capturable by anyone who can access them - which makes the companies holding them acquisition targets, not just service providers.

A quiet exit

The hunger is the macro incentive. The quiet exit is the micro exposure: the everyday moments where data leaves your control and enters someone else's system.

Every document uploaded to a cloud platform, every conversation with an AI assistant, every file synced to a third-party service - these interactions create data that exists somewhere outside your direct control. The terms of service may say it won't be used for training. The architecture still creates exposure. The boundary between "operating the service" and "learning from your usage" is not as clean as most people assume.

This extends beyond the digital. At a previous employer, I was issued a corporate expense card through Float. The KYC (Know Your Customer) verification was handled by an external provider who required biometric data. There was no alternative verification method, and no opt-out. Biometric data was collected by a third-party provider I had no relationship with, simply to remove a small amount of friction for me - and rather more for the finance team.

KYC is a legitimate compliance requirement - but the biometric component was a step beyond what was necessary, and the lack of any alternative meant consent was effectively compulsory. My biometric data now sits in that provider's database. I have no control over their retention policy, their security posture, or what happens to it if they're acquired.

This is the pattern: individually reasonable requests that, in aggregate, create a sprawling footprint of personal data across dozens of third-party systems - most of which you didn't choose and can't monitor.

Organisations holding this data may have every intention of protecting it. But intentions don't survive market pressure forever.

End of life

Even if you trust the company holding your data today, companies don't last forever. They get acquired. They go bankrupt. They pivot. They get desperate. But the data almost always outlives the company that collected it.

In almost every privacy policy, user data, including PII, is explicitly classified as a business asset. That's not buried in fine print; it's the standard legal framework. When a company changes hands, the data changes hands with it. In a bankruptcy, it can be liquidated like furniture. Sometimes it becomes one of the few remaining valuable assets. The acquirer may have entirely different incentives, a different risk tolerance, and a different view of what that data is worth.

This is the time dimension that most people don't think about. Consent degrades over time - not because the original agreement was violated, but because the conditions that made it reasonable no longer apply. The company you trusted in 2016 may not be the entity holding your data in 2030. The regulatory environment may have shifted. The commercial incentive to monetise that data may have increased by orders of magnitude.

Health data is the clearest example. A health platform that collected patient data for clinical purposes may be acquired by a company whose primary interest is the dataset itself. The patients consented to care. They didn't consent to becoming training data for a model they'll never see. But the legal framework, as currently constructed, often allows exactly that transfer.

What else?

Of course, this isn't a call to stop using technology, or to treat every service as adversarial. Most of the companies collecting data are acting in good faith, within the current rules.

The problem is structural. The consent model was designed for a world where data had a single, understood purpose at the time of collection. We now live in a world where the most valuable use of data often hasn't been invented yet when it's first collected. The question isn't "do I trust this company?" It's "do I trust every company that might ever hold this data, under every market condition that might ever exist?"

That's a hard question to answer in the middle of whatever task is sitting in front of you today. Which raises a different question: why should the access last forever? What if the default was a different model, like single-use consent? Access scoped to a specific purpose, ending when the purpose does. Not because every actor is malicious, but because the conditions under which data was collected are never permanent. And in many cases, the need wasn't permanent either.
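
To make that concrete, here's a minimal sketch of what a purpose-scoped, expiring consent grant could look like. It's illustrative only - the class, field names, and purpose strings are assumptions, not an existing standard or API - but it captures the core idea: every access check is tied to both a named purpose and an expiry, so consent granted for one task can't quietly be reused for another.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative sketch only: names and structure are assumptions,
# not a real consent standard or library.

@dataclass(frozen=True)
class ConsentGrant:
    subject_id: str        # whose data this grant covers
    purpose: str           # the single purpose consented to, e.g. "kyc-verification"
    expires_at: datetime   # consent lapses automatically at this point

    def permits(self, requested_purpose: str, at: datetime | None = None) -> bool:
        """True only if the requested purpose matches and the grant hasn't expired."""
        now = at or datetime.now(timezone.utc)
        return requested_purpose == self.purpose and now < self.expires_at


# Usage: a grant issued for identity verification can't be reused for model training,
# and stops working once the purpose's window has passed.
grant = ConsentGrant(
    subject_id="user-123",
    purpose="kyc-verification",
    expires_at=datetime(2026, 4, 1, tzinfo=timezone.utc),
)

assert grant.permits("kyc-verification", at=datetime(2026, 3, 15, tzinfo=timezone.utc))
assert not grant.permits("model-training", at=datetime(2026, 3, 15, tzinfo=timezone.utc))
assert not grant.permits("kyc-verification", at=datetime(2030, 1, 1, tzinfo=timezone.utc))
```

The detail that matters isn't the implementation; it's that purpose and expiry are first-class parts of the grant, rather than an afterthought buried in a retention policy.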

For our future selves, a different default is worth building around.

