What happens now to data in public AI?

  • 01 User submits chats, queries, and docs to a web-based public AI model

    Once the data is submitted, it’s visible to the public AI model owner, because it must be readable for processing. Data transmitted across the internet isn’t inherently visible to others; however, transmission does involve more third parties and increases the chance of interception.

  • 02 The public AI infra machine

    Most public AI companies use data submitted by users to “improve their services” and, unless the user explicitly requests otherwise, for “training purposes”. The public AI infra stack normally includes a backend OS, the LLM, model training, product improvement, and data storage. Note that most public AI services store user data, which creates more data breach risk.

  • 03 Third parties process much of users' data

    Most public AI companies send user data to many third-party processors. For example, OpenAI uses Microsoft, Cloudflare, CoreWeave, Oracle, Google, Snowflake, Salesforce, and many more, across numerous countries. OpenAI publishes a dynamic Sub-Processor List.

  • 04 The upshot

    User and company data, personal and confidential info, and IP are shared with many unknown third parties in different countries for commercial reasons that don’t benefit the user. This increases costs for the company and may lead to harm later from data misuse.