Testing “your data is safe with us”
How do you test the claims that every tech company makes?
Every AI and tech vendor says they protect your data, but almost none of them do it in the way you’d assume if you didn’t read the fine print.
This isn't about bad actors. It's about how AI systems and the cloud actually work, and the gap between what vendors say and what’s technically true. Here's how to think through the most common claims.
“We don't train on your data”
Read the T&Cs carefully and you'll typically find wording like “we may not use your data for training”, which is not the same thing as “we will not …” or “we do not …”. Training on user data is one of the primary mechanisms for improving AI models, and the incentive to do it is significant.
More importantly, most software vendors (think your CMS, accounting package, drafting software or quoting app) don’t own or control the underlying AI model. They resell access to a model from OpenAI, Anthropic, Google or another provider. What that model provider does with the data is largely outside the software vendor's control, regardless of what contract sits between them.
“All data is encrypted, so it's protected”
Almost every data transfer these days is encrypted in transit. Saying so is table stakes, not a privacy guarantee.
The relevant question is: what happens when the data arrives at its destination? It gets decrypted. It has to be, otherwise the model can’t process it. The vendor cannot claim both that the data stays encrypted and that the AI can act on it. At the point of processing, someone’s system can see it.
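The point about decryption can be sketched in a few lines of Python. This is a toy illustration, not real cryptography: the XOR cipher stands in for transport encryption such as TLS, and the names `client_send` and `server_process` and the sample message are invented for the example.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR each byte with a same-length random key.

    A stand-in for transport encryption. Do not use this for real security.
    """
    return bytes(d ^ k for d, k in zip(data, key))


def client_send(plaintext: str, key: bytes) -> bytes:
    # What travels over the network is ciphertext.
    return xor_bytes(plaintext.encode("utf-8"), key)


def server_process(ciphertext: bytes, key: bytes) -> dict:
    # The receiving system has to decrypt before it can do any work,
    # so at this point it holds the readable text.
    plaintext = xor_bytes(ciphertext, key).decode("utf-8")
    return {"seen_by_server": plaintext, "word_count": len(plaintext.split())}


message = "Client invoice 1042: total $18,400"  # hypothetical sample data
key = secrets.token_bytes(len(message.encode("utf-8")))

on_the_wire = client_send(message, key)
result = server_process(on_the_wire, key)
print(result["seen_by_server"])
```

The data was encrypted for the whole trip, and the processing system still ends up with the full plaintext. Encryption in transit says nothing about who can read the data once it arrives.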
“We use big tech cloud provider X, so you're covered”
What this often means is, “We don't run any AI ourselves. We’ve connected a third-party LLM to a cloud provider, and we're passing your data through both.”
It also means that if the cloud or model provider is a US company, the US CLOUD Act applies. American companies can be compelled to produce data stored anywhere in the world, including data in Australian data centres, when requested by US authorities. “Your data stays in Australia” is not the same as “your data is sovereign.”
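One way to picture the difference between residency and sovereignty is to list every system your data passes through and record two things for each: where the data physically sits, and which country's law governs the company operating it. The sketch below is a deliberate simplification of the claim above, not a legal test, and the `Hop` record, its fields and the example path are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    data_location: str  # country where the data physically sits
    operator_hq: str    # country whose law governs the operating company


def stays_onshore(path: list[Hop], country: str) -> bool:
    """True if every hop keeps the data physically in `country`."""
    return all(hop.data_location == country for hop in path)


def foreign_reach(path: list[Hop], country: str) -> list[str]:
    """Hops run by a company answerable to another country's authorities."""
    return [hop.name for hop in path if hop.operator_hq != country]


# Hypothetical path for an Australian customer of a typical AI-enabled app.
path = [
    Hop("vendor app", data_location="AU", operator_hq="AU"),
    Hop("cloud host", data_location="AU", operator_hq="US"),
    Hop("model provider", data_location="AU", operator_hq="US"),
]

print(stays_onshore(path, "AU"))   # True
print(foreign_reach(path, "AU"))   # ['cloud host', 'model provider']
```

In this example the vendor can truthfully say the data stays in Australia, yet two of the three hops are operated by companies subject to another jurisdiction.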
“We can't see your data”
If a vendor is building an application that operates on your data, their systems must be able to see it to do anything with it. If they genuinely can’t see it, they’re simply passing it to another provider who can. Either the vendor processes your data, or someone they’ve contracted with does. There is no third option where data gets acted on without being visible to any system.
How to evaluate a vendor
Find out:
Where does inference happen? On whose hardware, in which jurisdiction, under whose operational control?
Who owns the underlying model? If it’s not the vendor, what are the AI model provider’s data terms?
Is client data stored? For how long, where, and who has access?
What does “private” mean architecturally? Not in the marketing copy, but in the actual system design.
The vendors who can answer these questions clearly and specifically, without redirecting to a policy document, are the ones worth taking seriously. The ones who can’t, or who answer in abstractions, are telling you something important.
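If you are comparing several vendors, it can help to record their answers to the questions above in a fixed structure so the gaps stand out. The following is a minimal sketch only: the `VendorAnswers` record, its field names and the sample answers are assumptions made up for illustration, not a standard assessment format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VendorAnswers:
    # One field per question; None means the vendor gave no concrete answer.
    inference_location: Optional[str] = None    # whose hardware, which jurisdiction
    model_owner: Optional[str] = None           # who owns the model, and their data terms
    data_retention: Optional[str] = None        # what is stored, how long, who has access
    private_architecture: Optional[str] = None  # what "private" means in the system design


def unanswered(answers: VendorAnswers) -> list[str]:
    """Names of the questions that still have no specific answer."""
    return [
        f.name
        for f in fields(answers)
        if not (getattr(answers, f.name) or "").strip()
    ]


# Hypothetical vendor that answered two of the four questions.
example = VendorAnswers(
    inference_location="Vendor-operated GPUs in a Sydney data centre",
    data_retention="Prompts kept 30 days, then deleted",
)

print(unanswered(example))  # ['model_owner', 'private_architecture']
```

An answer that only points to a policy document would be left as `None` here, which is the point: the empty fields are what you follow up on.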
Genuine data sovereignty means your data never leaves an environment you control. Everything short of that is a compromise. Know what compromise you’re making before you make it.