Generative AI doesn’t just process data; it often remembers it. Prompts, outputs, embeddings, and agent “memories” can all contain sensitive, personal, or strategically important information. That means the “Store” stage in DASUD must expand from databases and file systems to include these new artefacts and behaviours.
If you don’t govern storage and memory, GenAI can quietly become an uncontrolled archive of your most sensitive content.
Identify GenAI‑specific artefacts
Start by listing what your GenAI systems actually store:
- Prompt logs and conversation histories: the exact text users type, which can include secrets, personal data, or confidential documents.
- Generated outputs: cached answers, drafts, code snippets, or content saved for reuse or audit.
- Embeddings and vector databases: high‑dimensional representations of your documents and data used for retrieval.
- Agent memory: state that AI agents keep to personalise behaviour or remember past interactions.
- Safety and feedback logs: records of flagged content, safety interventions, or user feedback.
Each of these artefact types needs explicit decisions on access, retention, and protection.
Segment storage by sensitivity and purpose
Next, avoid treating all GenAI storage as one big bucket.
- Segment by environment: separate instances and stores for public, internal, and highly sensitive use cases. For example, your public marketing chatbot should not store data in the same environment as a healthcare triage assistant.
- Segment by tenant or domain: in multi‑tenant or multi‑department setups, ensure data isolation so one client or business unit cannot access another’s prompts, outputs, or knowledge.
Then apply role‑based access control:
- Limit who can access logs, vector stores, and agent memories.
- Require higher approvals or oversight for access to highly sensitive domains (e.g., HR, health, legal).
Segmentation gives you containment. If something goes wrong, the blast radius is smaller.
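The tenant isolation and role‑based rules above can be sketched as a single access check. This is a minimal illustration, not a real authorisation system: the role names, domain list, and approval flag are all assumptions you would replace with your own policy model.

```python
from dataclasses import dataclass

# Illustrative policy: domains that need extra sign-off (assumed names)
HIGH_SENSITIVITY_DOMAINS = {"hr", "health", "legal"}
APPROVED_ROLES = {"ml-engineer", "auditor", "privacy-officer"}

@dataclass
class AccessRequest:
    user_role: str          # e.g. "auditor" (hypothetical role names)
    user_tenant: str        # tenant the requesting user belongs to
    artefact_tenant: str    # tenant that owns the log / vector / memory
    domain: str             # business domain of the artefact
    has_elevated_approval: bool = False

def may_access(req: AccessRequest) -> bool:
    """Deny cross-tenant reads; require approval for sensitive domains."""
    if req.user_tenant != req.artefact_tenant:
        return False  # tenant isolation: no cross-tenant access at all
    if req.domain in HIGH_SENSITIVITY_DOMAINS and not req.has_elevated_approval:
        return False  # HR/health/legal artefacts need explicit sign-off
    return req.user_role in APPROVED_ROLES
```

The useful property of centralising the check like this is that the “blast radius” logic lives in one place and can be audited as policy, rather than being scattered across each store’s configuration.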
Govern prompt and output logging
Logging is essential for monitoring, debugging, and auditing—but it’s also a privacy and security risk.
Decide:
- What to log: capture enough metadata to understand behaviour (timestamps, user IDs or pseudonyms, system configuration, high‑level event types) without storing more raw content than necessary.
- How long to keep logs: set retention windows based on risk and regulation. High‑risk domains may need longer retention for audit; low‑risk use cases may justify shorter windows to minimise exposure.
- How to redact or anonymise: where possible, automatically remove identifiers or certain patterns (e.g., credit card numbers) from logs, or avoid logging full prompt/output text altogether in specific domains.
Communicate clearly to users what is logged and why—especially where personal or sensitive data may be involved.
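A redaction-and-pseudonymisation step of the kind described above can sit between the application and the log store. The patterns and hashing scheme below are illustrative assumptions, not a standard; real deployments would use vetted detection libraries and a managed salt.

```python
import hashlib
import re

# Assumed patterns for illustration only; production systems need
# far more robust PII detection than two regexes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def pseudonymise_user(user_id: str, salt: str = "rotate-me") -> str:
    """Stable pseudonym so log entries correlate without storing raw IDs."""
    return hashlib.sha256((salt + user_id).encode()).hexdigest()[:12]

def redact(text: str) -> str:
    text = CARD_PATTERN.sub("[REDACTED_CARD]", text)
    text = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)
    return text

def log_entry(user_id: str, prompt: str) -> dict:
    # Metadata-first logging: pseudonym, event type, redacted content
    return {
        "user": pseudonymise_user(user_id),
        "event": "chat.prompt",
        "prompt": redact(prompt),
    }
```

Because the pseudonym is deterministic, you can still trace one user’s activity across entries for audit purposes while keeping the raw identifier out of the log store; rotating the salt breaks that linkage when retention policy requires it.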
Embeddings and vector store governance
Embeddings and vector stores feel abstract, but they are essentially another way of storing your content.
You should define:
- What can be embedded: consider prohibiting direct embedding of raw personal data, or at least requiring explicit justification and strong protections. Encourage pre‑processing or anonymisation where feasible.
- How to handle updates and deletions: track which documents were embedded, with what IDs and versions. When a document is updated or removed (e.g., to reflect a policy change or a deletion request), have a process to re‑index or delete corresponding vectors.
Treat vector stores as part of your regulated data environment, not as a side‑channel.
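The update-and-deletion process above hinges on one thing: a mapping from source documents to the vector IDs derived from them. Here is a minimal sketch using an in‑memory index as a stand‑in for a real vector database; the class and method names are illustrative, not any particular product’s API.

```python
# Sketch: track which vectors came from which document version, so an
# update or deletion request can remove every derived vector.

class VectorIndex:
    def __init__(self):
        self._vectors = {}   # vector_id -> embedding
        self._by_doc = {}    # doc_id -> {"version": ..., "vector_ids": [...]}

    def index_document(self, doc_id, version, chunks, embed):
        """Re-indexing a document first drops its stale vectors."""
        self.delete_document(doc_id)
        vector_ids = []
        for i, chunk in enumerate(chunks):
            vid = f"{doc_id}:{version}:{i}"
            self._vectors[vid] = embed(chunk)
            vector_ids.append(vid)
        self._by_doc[doc_id] = {"version": version, "vector_ids": vector_ids}

    def delete_document(self, doc_id):
        """Honour a deletion request: remove every vector for this doc."""
        record = self._by_doc.pop(doc_id, None)
        if record:
            for vid in record["vector_ids"]:
                self._vectors.pop(vid, None)
```

Without this provenance record, a deletion request against the source document leaves orphaned vectors behind, which is exactly the “side‑channel” risk the section warns about.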
Agent memory and long‑term personalisation
For systems that use agent‑like behaviour or personalisation:
- Decide what may be remembered: preferences, interaction patterns, and generic context may be acceptable. Highly sensitive content, personal circumstances, or identifiers may not be.
- Distinguish between per‑user and global memory: per‑user memories should be isolated; global memories (shared learnings) must be carefully curated to avoid leaking one user’s data into another’s experience.
- Provide reset and control options: users should have a way to clear or reset their personal memory, and you should have technical guarantees that this genuinely removes their data from relevant stores.
Make it concrete
Build a “GenAI storage and memory matrix” that lists each artefact type, where it lives, who can access it, how long it’s kept, and how it’s protected. Use it to review one existing GenAI system, identify gaps, and then standardise your approach.
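As a starting point, the matrix can live as plain structured data that exports to a spreadsheet for review. Every value below (systems, retention windows, controls) is a placeholder to replace with your own answers; only the column set mirrors the article.

```python
import csv
import io

# Illustrative rows: one per artefact type from this section.
# Locations, retention windows, and protections are placeholders.
MATRIX = [
    {"artefact": "prompt logs", "location": "central log store",
     "access": "platform team + auditors", "retention": "90 days",
     "protection": "encryption at rest, PII redaction"},
    {"artefact": "generated outputs", "location": "app database",
     "access": "product team", "retention": "1 year",
     "protection": "encryption at rest"},
    {"artefact": "embeddings", "location": "vector database",
     "access": "retrieval service only", "retention": "until source deleted",
     "protection": "tenant-scoped indexes"},
    {"artefact": "agent memory", "location": "per-user memory store",
     "access": "agent runtime", "retention": "until user reset",
     "protection": "allow-listed categories"},
]

def to_csv(rows):
    """Export the matrix so it can be reviewed alongside a real system."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Reviewing one existing GenAI system against a sheet like this quickly surfaces the gaps: rows with no owner, no retention window, or no named protection are the ones to fix first.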
By bringing prompt logs, embeddings, and agent memories into your storage governance, you prevent GenAI systems from becoming hidden, uncontrolled data lakes—and you build trust that these powerful tools aren’t quietly hoarding more than they should.
If you’d like assistance or advice with your Data Governance implementation, or any other topic (Privacy, Cybersecurity, Ethics, AI and Product Management) please feel free to drop me an email here and I will endeavour to get back to you as soon as possible. Alternatively, you can reach out to me on LinkedIn and I will get back to you within the same day!