How does PCI DSS apply to AI text-to-SQL pipelines?
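PCI DSS v4.0 (now v4.0.1; its future-dated requirements became mandatory on 31 March 2025) governs any system that stores, processes, or transmits cardholder data (CHD) or sensitive authentication data (SAD). It also covers systems that connect to that environment or can affect its security. An LLM text-to-SQL pipeline that queries a CHD-adjacent database is in scope if the agent can read or return CHD or SAD. Relevant requirements: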
- Req 3: Protect stored CHD. PAN must be rendered unreadable wherever it is stored (3.5.1) and masked when displayed (3.4.1), so the agent should only ever see truncated or masked values exposed through a view. An AST policy enforces that the agent's SQL never projects `pan_full` and reads only the masked `pan_last4` view.
- Req 7: Restrict access to need-to-know. Give each agent its own role with access only to the masked CHD views; the QueryShield policy file is the documented control.
- Req 8: Identify and authenticate access. Every agent has its own identity, every request carries an authenticated subject, and the evidence log ties both to the SQL that ran.
- Req 10: Log and monitor all access to CHD. Keep a tamper-evident audit log of every query against CHD tables. Retain at least 12 months of history, with the most recent three months immediately available for analysis (10.5.1).
- Req 11.4: Penetration testing. Tests run at least once every 12 months and after any significant change must cover the LLM-to-SQL path; QueryShield publishes a red-team test suite (prompt injection → SQL).
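The column restriction in Req 3 and Req 7 can be sketched with SQLite's authorizer hook, which Python's standard library exposes as `Connection.set_authorizer`. This is a swapped-in technique, not QueryShield's API. Instead of walking an AST, SQLite consults the callback for every column a statement reads while it compiles that statement. A denied read makes the statement fail before it executes. The tables `cards` and `cards_masked`, the `ALLOWED_READS` allowlist, and the helper functions are hypothetical. `cards_masked` is modeled as a plain table rather than a view to keep the sketch small.

```python
import sqlite3

# Hypothetical allowlist: the only (table, column) pairs the agent role may read.
ALLOWED_READS = {
    ("cards_masked", "card_token"),
    ("cards_masked", "pan_last4"),
}


def agent_authorizer(action, arg1, arg2, db_name, source):
    """Called by SQLite while it compiles each statement.

    For SQLITE_READ, arg1 is the table name and arg2 the column name.
    Anything not explicitly allowed (writes, DDL, PRAGMA, other columns) is denied.
    """
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and (arg1, arg2) in ALLOWED_READS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def open_agent_connection():
    """In-memory demo database; the fixture is created before the hook is installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cards (card_token TEXT, pan_full TEXT);
        CREATE TABLE cards_masked (card_token TEXT, pan_last4 TEXT);
        INSERT INTO cards VALUES ('tok_1', '4111111111111111');
        INSERT INTO cards_masked VALUES ('tok_1', '1111');
        """
    )
    conn.set_authorizer(agent_authorizer)
    return conn


def run_agent_sql(conn, sql):
    """Run LLM-generated SQL; return rows, or None if SQLite refused to compile it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError:
        return None


if __name__ == "__main__":
    conn = open_agent_connection()
    print(run_agent_sql(conn, "SELECT card_token, pan_last4 FROM cards_masked"))
    print(run_agent_sql(conn, "SELECT pan_full FROM cards"))
```

The callback denies by default, so writes, DDL, and any column not on the list are refused. This mirrors the "masked view only" rule. A real deployment would likely pair an in-process hook like this with database-side roles and grants rather than rely on it alone.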
Realistic posture: keep the LLM out of CHD scope wherever possible by having it work with tokenized references instead of PANs. Where it must touch CHD-adjacent data, enforce least privilege at the AST layer with QueryShield, masked views, and comprehensive logging. The Qualified Security Assessor's (QSA's) first question will be "show me the access control list for this agent", and your policy file *is* the answer.
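For the logging side (Req 8 and Req 10), a tamper-evident evidence log can be sketched as a hash chain. Each entry records the agent identity, the authenticated subject, and the SQL, and commits to the SHA-256 hash of the previous entry. Editing, reordering, or removing any entry before the last one therefore breaks verification. The `EvidenceLog` class and its field names are hypothetical and are not QueryShield's log format. PCI DSS requires protecting audit logs from modification but does not prescribe this particular scheme.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys) of an entry body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class EvidenceLog:
    """Hypothetical append-only evidence log; each entry commits to its predecessor."""

    entries: list = field(default_factory=list)

    def append(self, agent_id: str, subject: str, sql: str, ts: str) -> str:
        """Record who (agent + subject) ran which SQL, chained to the prior entry."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"agent_id": agent_id, "subject": subject, "sql": sql, "ts": ts, "prev": prev}
        digest = _digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash and link; False if an entry was altered, reordered, or dropped mid-chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body.get("prev") != prev or _digest(body) != entry.get("hash"):
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = EvidenceLog()
    log.append("agent-billing", "user:alice", "SELECT pan_last4 FROM cards_masked", "2025-04-01T12:00:00Z")
    print(log.verify())
```

A chain only proves internal consistency. Someone who can rewrite the whole file can recompute every hash, and truncating the tail goes unnoticed. Periodically copying the latest hash to write-once storage outside the agent's reach narrows both gaps.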