QueryShield

How do I handle GDPR right-to-be-forgotten across LLM SQL paths?

GDPR Article 17 (right to erasure) requires that a data subject's personal data be deleted across all systems on valid request. LLM text-to-SQL pipelines complicate this in four places:

1. The underlying database. Standard DELETE FROM users WHERE id = :sub. Tombstone or anonymize per your data model. No LLM concern.
2. The evidence log. Query logs may contain personal data: the natural-language prompt may reference the subject by name, and the SQL may include their ID. Row counts and column names are metadata, not PII, but the prompt text often is. Two patterns:
   - Pseudonymize subject IDs in logs from the start, with the mapping stored in a separately-erasable table.
   - Subject-indexed log entries so erasure can scrub specific records.
   QueryShield supports a subject_pseudonym field plus a documented erasure procedure.
3. The LLM provider's retained data. If your prompts go to OpenAI/Anthropic/Google, those providers retain prompts per their DPA. Enterprise tiers usually offer zero-retention modes; otherwise prompt-level scrubbing of subject identifiers before sending is the right control.
4. Vector indexes / RAG embeddings. If user data was embedded for retrieval, the embeddings must also be deleted. Maintain a subject-id → embedding-id map.
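
The first evidence-log pattern (pseudonymized IDs with a separately-erasable mapping) can be sketched as follows. This is a minimal illustration, not QueryShield's implementation: the table names subject_pseudonyms and evidence_log, the helper functions, and the use of SQLite are all assumptions made for the example; only the subject_pseudonym field name comes from the text above. The same scrub step is what you would apply to a prompt before sending it to a provider without zero retention.

```python
import secrets
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with hypothetical table names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subject_pseudonyms (
            subject_id TEXT PRIMARY KEY,
            pseudonym  TEXT UNIQUE NOT NULL
        );
        CREATE TABLE evidence_log (
            id                INTEGER PRIMARY KEY,
            subject_pseudonym TEXT,
            prompt            TEXT,
            sql_text          TEXT
        );
        """
    )
    return conn


def pseudonym_for(conn: sqlite3.Connection, subject_id: str) -> str:
    """Return the subject's pseudonym, creating a random one on first use."""
    row = conn.execute(
        "SELECT pseudonym FROM subject_pseudonyms WHERE subject_id = ?",
        (subject_id,),
    ).fetchone()
    if row:
        return row[0]
    # Random, not derived from the ID, so it cannot be recomputed later.
    pseudonym = "subj_" + secrets.token_hex(8)
    conn.execute(
        "INSERT INTO subject_pseudonyms (subject_id, pseudonym) VALUES (?, ?)",
        (subject_id, pseudonym),
    )
    return pseudonym


def scrub(text: str, identifiers: list[str], pseudonym: str) -> str:
    """Replace known identifiers with the pseudonym (naive substring match)."""
    for ident in identifiers:
        text = text.replace(ident, pseudonym)
    return text


def log_query(conn, subject_id, identifiers, prompt, sql_text) -> str:
    """Write a log entry that carries only the pseudonym, never the raw ID."""
    p = pseudonym_for(conn, subject_id)
    conn.execute(
        "INSERT INTO evidence_log (subject_pseudonym, prompt, sql_text) "
        "VALUES (?, ?, ?)",
        (p, scrub(prompt, identifiers, p), scrub(sql_text, identifiers, p)),
    )
    return p


def erase_subject(conn: sqlite3.Connection, subject_id: str) -> bool:
    """Delete the mapping row; existing log entries become unlinkable."""
    cur = conn.execute(
        "DELETE FROM subject_pseudonyms WHERE subject_id = ?", (subject_id,)
    )
    return cur.rowcount == 1
```

With this layout, erasure touches one row in the mapping table rather than rewriting the log. The substring replacement in scrub is deliberately simple and would miss misspellings or partial names, so free-text prompts may still need the second pattern (subject-indexed entries that can be scrubbed individually).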

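Standard procedural posture: erasure runs as a single workflow against (DB, evidence log, LLM provider deletion API, vector store), with each step verified before the ticket is closed. The response deadline is one month from receipt of the request (Art 12(3)), extendable by two further months for complex or numerous requests.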
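
A single workflow with per-step verification might look like the sketch below. Everything named here is hypothetical: ErasureStep, run_erasure, can_close_ticket, and the in-memory dictionaries standing in for the database and vector store. A real deployment would plug in the evidence-log procedure and the provider's deletion mechanism as further steps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ErasureStep:
    name: str
    erase: Callable[[str], None]   # performs the deletion for one system
    verify: Callable[[str], bool]  # returns True once nothing remains


def run_erasure(subject_id: str, steps: list[ErasureStep]) -> dict[str, bool]:
    """Run every step and record whether its deletion was verified."""
    results: dict[str, bool] = {}
    for step in steps:
        try:
            step.erase(subject_id)
            results[step.name] = bool(step.verify(subject_id))
        except Exception:
            # A failed step is recorded, not swallowed: the ticket stays open.
            results[step.name] = False
    return results


def can_close_ticket(results: dict[str, bool], steps: list[ErasureStep]) -> bool:
    """The ticket closes only when every step is verified."""
    return all(results.get(step.name, False) for step in steps)


# In-memory stand-ins for two of the four systems (illustrative data only).
users = {"u-1042": {"name": "Jane Doe"}}
embedding_ids_by_subject = {"u-1042": ["emb-1", "emb-2"]}
vector_store = {"emb-1": [0.1], "emb-2": [0.2], "emb-9": [0.9]}


def erase_db(subject_id: str) -> None:
    users.pop(subject_id, None)


def verify_db(subject_id: str) -> bool:
    return subject_id not in users


def erase_vectors(subject_id: str) -> None:
    # Use the subject-id -> embedding-id map to find what to delete.
    for embedding_id in embedding_ids_by_subject.get(subject_id, []):
        vector_store.pop(embedding_id, None)


def verify_vectors(subject_id: str) -> bool:
    # The map entry itself should be dropped once this returns True.
    ids = embedding_ids_by_subject.get(subject_id, [])
    return all(embedding_id not in vector_store for embedding_id in ids)


steps = [
    ErasureStep("database", erase_db, verify_db),
    ErasureStep("vector_store", erase_vectors, verify_vectors),
]
```

Calling run_erasure("u-1042", steps) returns a per-system result map that can be attached to the ticket as evidence. Keeping erase and verify separate means a step whose deletion call succeeds but whose check fails still blocks closure.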