Private AI 101

Private vs public AI

  • Private AI is an approach to AI designed with data protection at its core. Instead of sending sensitive data to public clouds and publicly hosted LLMs for training and processing, private AI keeps information within your infrastructure, whether that's on-prem, in a VPC, or inside secure containers.

    Private AI = Local AI

    Public AI = Cloud-hosted AI (publicly accessible LLM services)

  • Your data (inputs, prompts, attachments, uploads, outputs, answers, documents) never leaves the machine. Our solution works without any connection to the internet or cloud.
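    A minimal sketch of what this looks like in practice, assuming a model hosted locally with Ollama (the endpoint is Ollama's standard local API; the model name is illustrative):

      # Query a locally hosted LLM: the request goes to localhost only,
      # so prompts and answers never leave the machine.
      # Assumes an Ollama server on localhost:11434 with a model pulled;
      # "llama3" is an illustrative model name.
      import requests

      def ask_local_llm(prompt: str) -> str:
          resp = requests.post(
              "http://localhost:11434/api/generate",
              json={"model": "llama3", "prompt": prompt, "stream": False},
              timeout=120,
          )
          resp.raise_for_status()
          return resp.json()["response"]

      print(ask_local_llm("Summarise our Q3 contract terms."))

    Unplugging the network cable changes nothing here: the only dependency is the local model server.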

  • There are thousands of software providers who claim to keep data private, but what they really mean is that they de-identify personally identifiable information (PII) before the data goes online, to the LLM provider, or to the cloud. These techniques may address PII, but they do not protect other private, third-party, or confidential information or IP, as the sketch below shows.
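    A small sketch of that de-identification approach and its limits. The patterns below are illustrative, not a complete PII taxonomy:

      # Regex-based PII scrubbing: catches obvious identifiers, but
      # confidential project names, IP, and commercial terms pass
      # straight through to the external LLM.
      import re

      PII_PATTERNS = {
          "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
          "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
          "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
      }

      def redact(text: str) -> str:
          for label, pattern in PII_PATTERNS.items():
              text = pattern.sub(f"[{label}]", text)
          return text

      doc = "Contact jane@acme.com about Project Falcon's unreleased pricing."
      print(redact(doc))
      # -> "Contact [EMAIL] about Project Falcon's unreleased pricing."
      # The email is gone; the confidential project name and pricing
      # reference still leave your environment untouched.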

    Why private AI matters:

    1. Data ownership: private AI ensures that your sensitive corporate and personal data and IP remain within your control. This reduces the risks associated with sharing data with external entities.

    2. Regulatory compliance: private AI solutions make it far easier to comply with data protection laws such as GDPR, HIPAA, GLBA, and the Privacy Act. Adhering to these regulations is vital to avoid legal consequences.

    3. Tailored solutions: private AI allows for customised AI models tailored to your organisation's requirements. Unlike public AI, which may offer generic solutions, private AI enables you to develop models specifically optimised for your business needs.

    4. Cost effectiveness: private AI offers more predictable costs, especially at scale. By investing in your own AI infrastructure and data management, you can avoid the unpredictable expenses associated with public AI services.

    5. Strategic independence: with private AI, you maintain strategic independence over your AI initiatives. By controlling your data and AI models, you can make decisions that align with your organisation's long-term goals and objectives.

    6. Audit trail: your AI remains fully auditable and accountable, since every interaction can be logged inside your own environment (see the sketch below).
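    An illustrative sketch of such an audit trail: each prompt/response record is hash-chained to the previous one, so tampering is detectable. The file path and record fields are assumptions, not a prescribed schema:

      # Append-only, hash-chained audit log for AI interactions.
      import hashlib, json, time

      LOG_PATH = "ai_audit.log"

      def last_hash() -> str:
          # Hash of the most recent record, or a zero hash for a new log.
          try:
              with open(LOG_PATH) as f:
                  lines = f.read().splitlines()
              return json.loads(lines[-1])["hash"] if lines else "0" * 64
          except FileNotFoundError:
              return "0" * 64

      def log_interaction(user: str, prompt: str, response: str) -> None:
          record = {
              "ts": time.time(), "user": user,
              "prompt": prompt, "response": response,
              "prev": last_hash(),
          }
          # Chaining means editing any earlier record breaks every later hash.
          record["hash"] = hashlib.sha256(
              json.dumps(record, sort_keys=True).encode()
          ).hexdigest()
          with open(LOG_PATH, "a") as f:
              f.write(json.dumps(record) + "\n")

      log_interaction("alice", "Summarise the merger memo", "...")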

    How private AI works:

    1. All data intake and preprocessing stay in your environment.

    2. The model is trained locally using centralised or federated learning (a toy federated example follows this list).

    3. Privacy is built in from the start via encryption and anonymisation.

    4. The model is deployed in controlled, offline infrastructure according to your policies.

    5. Audit and accountability: full visibility of model behaviour.
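    A toy illustration of federated learning from step 2: each site trains on its own data, and only model weights, never raw data, leave the site. A plain numpy sketch, not any particular framework:

      import numpy as np

      def local_update(weights, X, y, lr=0.1):
          # One gradient step of linear regression on a site's own data.
          grad = 2 * X.T @ (X @ weights - y) / len(y)
          return weights - lr * grad

      def federated_round(weights, sites):
          # Each site updates locally; only the weights are shared.
          updates = [local_update(weights, X, y) for X, y in sites]
          return np.mean(updates, axis=0)  # FedAvg: average the site models

      rng = np.random.default_rng(0)
      sites = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
      w = np.zeros(3)
      for _ in range(50):
          w = federated_round(w, sites)
      print(w)  # a global model trained without pooling any raw data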

    Benefits:

    1. Data ownership: full control of your data.

    2. Built-in compliance: regulatory requirements are addressed by design.

    3. Faster, smarter AI: local inference gives low-latency responses grounded in your own data.

    4. Lower risk: no data leaks or shadow AI.

    5. Scalable architecture: adaptable for hybrid/multi-cloud.

    Trade-offs:

    1. More complicated IT integration.

    2. A more highly skilled technology team is required.

    3. Governance policies required.

    4. Infrastructure investment.

  • Zainode has created a web application where, after users submit their data (queries, documents, etc.), the data remains accessible exclusively within the client's infrastructure, visible only to them and the individuals they authorise.

    • Cost predictability is a key concern as enterprises move workloads from public to private cloud environments.

    • A desire for greater control and security is also leading enterprises to devote more resources to private cloud infrastructure; see this research from Broadcom for more detail.

    • It is theoretically possible for a cloud provider to view all data that passes through their systems.

    • The more important concerns with cloud providers are:

      1. Cost, particularly long-term cost stability (a rough break-even sketch follows the comparison table below).

      2. Availability of additional GPU compute when scaling needs arise.

      3. Vendor lock-in, which leaves you exposed when the two points above hit.

  • Public vs private AI at a glance:

                       Public      Private
      Data control     Limited     Full
      Privacy risk     High        Low
      Customisation    Low         High
      Deployment       Web         Local
      Compliance       Lacking     Built-in
      Cost             Variable    Fixed
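    A back-of-the-envelope sketch of the variable-versus-fixed cost row. Every number below is an illustrative assumption, not a quote from any vendor:

      TOKENS_PER_MONTH = 2_000_000_000    # assumed enterprise usage
      API_RATE_PER_1K_TOKENS = 0.02       # assumed blended public-API rate, USD
      PRIVATE_MONTHLY_COST = 20_000       # assumed amortised hardware + ops, USD

      api_monthly = TOKENS_PER_MONTH / 1_000 * API_RATE_PER_1K_TOKENS
      break_even = PRIVATE_MONTHLY_COST / API_RATE_PER_1K_TOKENS * 1_000

      print(f"Public API: ${api_monthly:,.0f}/month, grows with usage")  # $40,000
      print(f"Private AI: ${PRIVATE_MONTHLY_COST:,.0f}/month, flat")     # $20,000
      print(f"Break-even: {break_even:,.0f} tokens/month")               # 1 billion

    Under these assumptions the fixed private deployment wins once usage passes roughly a billion tokens a month; at lower volumes the public API is cheaper, which is exactly why predictability, not raw price, is the argument.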


Testing claims of data security from other vendors

  • In the fine print, most vendors say they "may not use the data for training". Taken as a promise, this is hard to credit, because learning from user data is the primary method for improving LLMs.

    Moreover, most software vendors don't own the LLM, so ultimately the LLM provider controls what data is used for training.

  • Vendors also like to point out that the data is encrypted. Virtually all data sent over the internet today is encrypted in transit, so on its own this is a non-claim.

    Whilst it's common for data to be encrypted before being sent, the data is decrypted at the final destination (the public LLM owner and/or application vendor). Vendors cannot claim that encrypted data is never decrypted and never visible to any system; if that were true, the system would not function at all.
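    A minimal sketch of that point, using symmetric encryption via the cryptography package's Fernet API. In real systems the key exchange happens inside TLS, but the conclusion is the same: the destination holds a key and sees plaintext:

      from cryptography.fernet import Fernet

      key = Fernet.generate_key()     # in practice negotiated via TLS
      client_side = Fernet(key)
      vendor_side = Fernet(key)       # the destination has the key too

      ciphertext = client_side.encrypt(b"confidential merger terms")
      # On the wire the payload is unreadable to eavesdroppers...
      plaintext = vendor_side.decrypt(ciphertext)
      # ...but the vendor/LLM provider must decrypt it to process it.
      print(plaintext.decode())       # -> confidential merger terms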

  • What they are really saying is: 'We don't actually run any of our AI ourselves, and most likely don't really understand AI, so we use this [big tech] LLM and cloud company. We also have no idea how they do data protection, so go look over their policies and leave us alone.'

  • The AI app/software vendor must be able to see the client data to operate on it in some way. If they can't see client data, then they are simply passing it on to another provider to operate on it. It is also extremely unlikely that a vendor's platform does not store client data: storage is required for the data to be surfaced in the app's UI or otherwise mutated in their platform.

    If an AI app uses a public LLM, the LLM provider must be able to see the data to produce answers from it or otherwise operate on it. Most companies using external LLM providers dance around this, but at the end of the day they are passing their clients' data on to third parties for processing.

  • This may be correct; however, it is widely known that public LLM providers share submitted data with third-party processors, partners, and law enforcement/regulators. They also use the data to train new models. See here.

  • Some vendors simply run public LLMs in the cloud and connect them to a basic chat app. This is more "private" than using the public LLM directly, but the client data still goes to the cloud and is processed by someone else, often a cloud owned by the same big tech company that operates the LLM. Thus it's not private AI, and it's definitely not sovereign AI.