How do I handle GDPR right-to-be-forgotten across LLM SQL paths?

Question

Accepted Answer

GDPR Article 17 (right to erasure) requires that a data subject's personal data be deleted across all systems on valid request. LLM text-to-SQL pipelines complicate this in four places:

1. **The underlying database.** Standard `DELETE FROM users WHERE id = :sub`. Tombstone or anonymize per your data model. No LLM concern.
2. **The evidence log.** Query logs may contain personal data: the natural-language prompt may reference the subject by name, and the SQL may include their ID. Row counts and column names are metadata, not PII, but the prompt text often is. Two patterns:
   - **Pseudonymize subject IDs in logs** from the start, with the mapping stored in a separately-erasable table.
   - **Subject-indexed log entries** so erasure can scrub specific records.

   QueryShield supports a `subject_pseudonym` field plus a documented erasure procedure.
3. **The LLM provider's retained data.** If your prompts go to OpenAI/Anthropic/Google, those providers retain prompts per their DPA. Enterprise tiers usually offer zero-retention modes; otherwise, scrubbing subject identifiers from prompts before sending is the right control.
4. **Vector indexes / RAG embeddings.** If user data was embedded for retrieval, the embeddings must also be deleted. Maintain a subject-id → embedding-id map.

Standard procedural posture: erasure runs as a single workflow against (DB, evidence log, LLM provider deletion API, vector store), with each step verified before the ticket is closed. The ticket must close within one month of the request (Art. 12(3)).
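A minimal sketch of that workflow, assuming in-memory stand-ins for all four stores. Every name here (`users`, `pseudonym_keys`, `provider_retained`, `run_erasure`, and so on) is hypothetical. The `subject_pseudonym` key only mirrors the field name mentioned above and is not QueryShield's actual API. The provider step is a placeholder for whatever deletion mechanism your DPA provides.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict, List

# In-memory stand-ins for the four stores (all hypothetical).
users: Dict[str, dict] = {}
pseudonym_keys: Dict[str, bytes] = {}          # separately erasable mapping
evidence_log: List[dict] = []
provider_retained: Dict[str, List[str]] = {}   # placeholder for provider-side data
subject_embeddings: Dict[str, List[str]] = {}  # subject id -> embedding ids
vector_store: Dict[str, List[float]] = {}


def pseudonym(subject_id: str) -> str:
    """Derive a stable pseudonym from a per-subject random key."""
    key = pseudonym_keys.setdefault(subject_id, secrets.token_bytes(32))
    return hmac.new(key, subject_id.encode(), hashlib.sha256).hexdigest()[:16]


def log_query(subject_id: str, prompt: str, sql: str) -> None:
    """Record a query under the subject's pseudonym, never the raw ID."""
    evidence_log.append(
        {"subject_pseudonym": pseudonym(subject_id), "prompt": prompt, "sql": sql}
    )


# Each step erases, then returns True only if nothing remains for the subject.
def erase_db(subject_id: str) -> bool:
    users.pop(subject_id, None)
    return subject_id not in users


def erase_log(subject_id: str) -> bool:
    if subject_id not in pseudonym_keys:
        return True  # subject was never logged
    p = pseudonym(subject_id)
    evidence_log[:] = [e for e in evidence_log if e["subject_pseudonym"] != p]
    remaining = any(e["subject_pseudonym"] == p for e in evidence_log)
    del pseudonym_keys[subject_id]  # drop the mapping last
    return not remaining


def erase_provider(subject_id: str) -> bool:
    provider_retained.pop(subject_id, None)  # stand-in for a real deletion call
    return subject_id not in provider_retained


def erase_vectors(subject_id: str) -> bool:
    ids = subject_embeddings.pop(subject_id, [])
    for emb_id in ids:
        vector_store.pop(emb_id, None)
    return all(emb_id not in vector_store for emb_id in ids)


STEPS: Dict[str, Callable[[str], bool]] = {
    "db": erase_db,
    "evidence_log": erase_log,
    "llm_provider": erase_provider,
    "vector_store": erase_vectors,
}


def run_erasure(subject_id: str) -> Dict[str, bool]:
    """Run every step and report which ones verified clean."""
    return {name: step(subject_id) for name, step in STEPS.items()}


def can_close_ticket(report: Dict[str, bool]) -> bool:
    return all(report.values())
```

The log step resolves the pseudonym before deleting the mapping. Once the key is gone, the subject's entries can no longer be located. In a real system each step would also need retries and an audit record that does not itself contain the erased data.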