How do I test LLM SQL guardrails (red-team + unit tests)?

Question

Accepted Answer

Two complementary testing modes; run both in CI.

**Unit tests** (deterministic, fast, run on every PR):

- One test per policy rule: feed a known-bad SQL string, assert reject + rule ID.
- One test per allowed query shape: assert allow.
- One test per rewrite rule: assert AST-equivalent output with the required predicate injected.
- Parser dialect tests: same logical query in Postgres / MySQL / Snowflake / BigQuery dialects all reach the same decision.
- Multi-statement, comment-injection, UNION, CTE, and subquery test cases: at least one per OWASP LLM Top 10 entry that's in scope.

**Red-team / adversarial tests** (probabilistic, run nightly or pre-release):

- A prompt-injection corpus (Lakera Gandalf, PromptInject, custom retrieved-content payloads) is run through your actual LLM → SQL pipeline; the QueryShield decision is checked.
- A semantic-injection corpus: natural-language prompts designed to elicit destructive valid SQL ("clean up old test users", "show me everyone's info"). Assert reject.
- Differential testing: feed the same NL prompt to two different LLMs (Claude, GPT-4) and assert the guardrail catches both.
- Mutation testing: take a known-good SQL, mutate comments/whitespace/casing/Unicode; assert the parser normalizes correctly.

QueryShield ships `queryshield-test` with the open red-team corpus + CI integration; results feed an SLI ("reject rate on red-team corpus must be ≥99.5%") that gates deploys.
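To make the unit-test, mutation-test, and SLI ideas concrete, here is a minimal sketch in plain Python. It does not use QueryShield's API, which isn't shown here: `check_sql`, `Decision`, `mutations`, `reject_rate`, and the rule IDs are hypothetical stand-ins, and the regex-based checker is a toy in place of a real SQL parser. Unicode mutations are left out for brevity.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    rule_id: Optional[str] = None  # hypothetical rule identifier


def normalize(sql: str) -> str:
    """Toy normalizer: strip comments, collapse whitespace, upper-case."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)  # block comments
    sql = re.sub(r"--[^\n]*", " ", sql)               # line comments
    return re.sub(r"\s+", " ", sql).strip().upper()


def check_sql(sql: str) -> Decision:
    """Stand-in guardrail (assumption): a real one would inspect a parsed AST."""
    norm = normalize(sql).rstrip("; ")  # tolerate a single trailing semicolon
    if ";" in norm:
        return Decision(False, "MULTI_STATEMENT")
    if re.match(r"(DROP|TRUNCATE|ALTER)\b", norm):
        return Decision(False, "DDL_FORBIDDEN")
    if re.match(r"(DELETE|UPDATE)\b", norm) and " WHERE " not in f" {norm} ":
        return Decision(False, "UNBOUNDED_WRITE")
    return Decision(True)


def mutations(sql: str) -> Iterator[str]:
    """Casing, whitespace, and comment variants that should not change the decision."""
    yield sql.lower()
    yield sql.upper()
    yield sql.replace(" ", "\n\t")
    yield sql.replace(" ", " /* x */ ")
    yield f"-- leading comment\n{sql}"
    yield sql + " ;"


# One entry per policy rule: known-bad SQL plus the rule ID it must trigger.
REJECT_CASES = [
    ("DROP TABLE users", "DDL_FORBIDDEN"),
    ("DELETE FROM users", "UNBOUNDED_WRITE"),
    ("SELECT 1; DROP TABLE users", "MULTI_STATEMENT"),
]
# One entry per allowed query shape.
ALLOW_CASES = [
    "SELECT id FROM users WHERE id = 1",
    "DELETE FROM users WHERE id = 1",
]


def test_reject_rules() -> None:
    for sql, rule_id in REJECT_CASES:
        assert check_sql(sql) == Decision(False, rule_id), sql


def test_allowed_shapes() -> None:
    for sql in ALLOW_CASES:
        assert check_sql(sql).allowed, sql


def test_mutations_preserve_decision() -> None:
    for sql in [s for s, _ in REJECT_CASES] + ALLOW_CASES:
        expected = check_sql(sql)
        for mutant in mutations(sql):
            assert check_sql(mutant) == expected, repr(mutant)


def reject_rate(corpus: List[str]) -> float:
    """Fraction of recorded malicious SQL outputs that the guardrail rejects."""
    return sum(not check_sql(sql).allowed for sql in corpus) / len(corpus)
```

The `test_*` functions follow pytest's naming convention, so they can run on every PR as written. For the nightly job, one option is to record the SQL your pipeline emits for each red-team prompt and fail the build when `reject_rate(recorded_sql) < 0.995`, which matches the SLI threshold quoted above.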