What evidence logging do I need for LLM SQL incident response?

Question

Accepted Answer

When (not if) something goes wrong, the evidence log is what lets you answer three questions: what did the agent do, who asked it to, and what data was touched?

Minimum fields for IR-grade logging:

- `event_id`: UUIDv7, for time-ordered correlation.
- `timestamp`: RFC 3339, UTC, nanosecond precision.
- `prev_hash`: SHA-256 of the previous entry (a tamper-evident hash chain; cf. Certificate Transparency).
- `entry_hash`: SHA-256 of this entry's canonical form, optionally signed with an HSM-held key for non-repudiation.
- `subject`: authenticated user ID, tenant, API key fingerprint.
- `agent`: agent ID, model + version, system-prompt hash.
- `request_id`: correlates the entry with the upstream LLM call and the downstream DB query ID.
- `prompt`: the natural-language input (PII-redacted if your policy says so; otherwise raw).
- `generated_sql`: the LLM's emitted SQL string, verbatim.
- `rewritten_sql`: the rewritten SQL, if QueryShield injected predicates.
- `decision`: `allow` / `reject` / `rewrite`, with a `rule_id` for forensic indexing.
- `tables_accessed`, `columns_accessed`: derived from the AST.
- `row_count`: result count only; **never log row contents** (logging PHI/CHD/PII in audit logs creates a second-order breach).
- `execution_ms`: DB-side latency.
- `error`: populated if execution failed downstream.

Ship the log to your SIEM (Datadog / Splunk / Elastic / Chronicle) and alert on:

- reject-rate spikes per agent
- table-allowlist-violation rule hits
- sleep-function denials
- multi-statement attempts

Retention: HIPAA six years; PCI DSS one year, with the most recent three months immediately available; SOC 2 typically one year.

A working IR runbook should be able to answer "did agent X exfiltrate `users.email` between time A and B" in under five minutes by replaying the log filtered on `tables_accessed CONTAINS 'users'` AND `columns_accessed CONTAINS 'email'`.
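To make the `prev_hash` / `entry_hash` chain and the replay filter concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: the helper names (`canonical`, `append_entry`, `verify_chain`, `replay`), the all-zero `GENESIS_HASH`, and the choice of sorted-key JSON as the "canonical form" are all hypothetical, the `event_id` values are placeholders rather than real UUIDv7s, and signing is left out.

```python
import hashlib
import json
from datetime import datetime

# Assumption: the first entry chains to a fixed all-zero hash.
GENESIS_HASH = "0" * 64


def canonical(entry: dict) -> bytes:
    """Deterministic serialization (assumed canonical form): sorted keys, compact JSON."""
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_entry(log: list, fields: dict) -> dict:
    """Link a new entry to the previous one, hash it, and append it to the log."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS_HASH
    entry = dict(fields, prev_hash=prev_hash)
    # entry_hash covers every field except entry_hash itself.
    entry["entry_hash"] = hashlib.sha256(canonical(entry)).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute each link and hash; return False on the first mismatch."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(canonical(body)).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


def replay(log: list, table: str, column: str, start: str, end: str) -> list:
    """Entries that touched table/column with start <= timestamp <= end."""
    t0, t1 = datetime.fromisoformat(start), datetime.fromisoformat(end)
    return [
        e for e in log
        if table in e["tables_accessed"]
        and column in e["columns_accessed"]
        and t0 <= datetime.fromisoformat(e["timestamp"]) <= t1
    ]


# Demo with placeholder values (not real UUIDv7s, second-precision timestamps).
log = []
append_entry(log, {
    "event_id": "evt-1", "timestamp": "2024-05-01T12:00:00+00:00",
    "agent": "agent-x", "decision": "allow",
    "tables_accessed": ["users"], "columns_accessed": ["email"], "row_count": 42,
})
append_entry(log, {
    "event_id": "evt-2", "timestamp": "2024-05-01T12:05:00+00:00",
    "agent": "agent-x", "decision": "allow",
    "tables_accessed": ["orders"], "columns_accessed": ["total"], "row_count": 7,
})
hits = replay(log, "users", "email",
              "2024-05-01T00:00:00+00:00", "2024-05-02T00:00:00+00:00")
print(verify_chain(log), [e["event_id"] for e in hits])
```

Editing or reordering any stored entry makes `verify_chain` fail, because every later `prev_hash` depends on it. The chain on its own does not reveal truncation of the most recent entries, which is one reason to sign or externally anchor the latest `entry_hash`.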