How do I test LLM SQL guardrails (red-team + unit tests)?
Two complementary testing modes; run both in CI.
Unit tests (deterministic, fast, run on every PR):
- One test per policy rule: feed a known-bad SQL string, assert reject + rule ID.
- One test per allowed query shape: assert allow.
- One test per rewrite rule: assert AST-equivalent output with the required predicate injected.
- Parser dialect tests: same logical query in Postgres / MySQL / Snowflake / BigQuery dialects all reach the same decision.
- Multi-statement, comment-injection, UNION, CTE, and subquery test cases — at least one per OWASP LLM Top 10 entry that's in scope.
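As a rough illustration of the first two bullet types above, here is a minimal sketch of table-driven unit tests. The `check_sql` function, the `Decision` type, and the rule IDs are hypothetical stand-ins written for this example, not the QueryShield API; the regex-based policy is a toy, and a real guardrail would decide on a parsed AST.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    rule_id: Optional[str] = None  # set only when a rule rejects the query


# Hypothetical toy policy: (rule ID, pattern). Checked in order, first match wins.
RULES = [
    ("NO_MULTI_STATEMENT", re.compile(r";\s*\S")),
    ("NO_COMMENTS", re.compile(r"--|/\*")),
    ("NO_DDL_DML", re.compile(r"\b(drop|delete|update|insert|alter|truncate)\b", re.I)),
    ("NO_UNION", re.compile(r"\bunion\b", re.I)),
]


def check_sql(sql: str) -> Decision:
    """Return the first matching rejection, or allow if no rule fires."""
    for rule_id, pattern in RULES:
        if pattern.search(sql):
            return Decision(False, rule_id)
    return Decision(True)


# One case per policy rule: known-bad SQL plus the rule ID expected to fire.
BAD_CASES = [
    ("SELECT 1; SELECT 2", "NO_MULTI_STATEMENT"),
    ("SELECT * FROM users -- WHERE tenant_id = 1", "NO_COMMENTS"),
    ("DROP TABLE users", "NO_DDL_DML"),
    ("SELECT name FROM users UNION SELECT secret FROM api_keys", "NO_UNION"),
]

# One case per allowed query shape.
ALLOWED_CASES = [
    "SELECT id, name FROM users WHERE tenant_id = 42",
    "select count(*) from orders where tenant_id = 42;",
]


def test_bad_sql_is_rejected_with_rule_id():
    for sql, expected_rule in BAD_CASES:
        decision = check_sql(sql)
        assert not decision.allowed, sql
        assert decision.rule_id == expected_rule, (sql, decision.rule_id)


def test_allowed_shapes_pass():
    for sql in ALLOWED_CASES:
        assert check_sql(sql).allowed, sql
```

Asserting on the rule ID, not just on the reject, keeps a test from passing because an unrelated rule happened to fire. Under pytest, the two case tables would map naturally onto `@pytest.mark.parametrize`.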
Red-team / adversarial tests (probabilistic, run nightly or pre-release):
- Run a prompt-injection corpus (Lakera Gandalf, PromptInject, custom retrieved-content payloads) through your actual LLM → SQL pipeline and check the QueryShield decision on every generated query.
- A semantic-injection corpus: natural-language prompts designed to elicit syntactically valid but destructive or over-broad SQL ("clean up old test users", "show me everyone's info"). Assert reject.
- Differential testing: feed the same NL prompt to two different LLMs (Claude, GPT-4) and assert the guardrail rejects the unsafe SQL from both, however each model phrases it.
- Mutation testing: take a known-good SQL query and mutate comments/whitespace/casing/Unicode; assert the parser normalizes every variant to the same form and the decision does not change.
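The mutation-testing bullet above can be sketched as follows. This is a toy under stated assumptions: `normalize` is a hypothetical string-level stand-in for comparing parsed ASTs, and `mutations` is a hand-picked set of variants, not the generator QueryShield uses. The Unicode variants swap ASCII spaces for other space characters, which NFKC folds back to a plain space.

```python
import re
import unicodedata
from typing import List


def normalize(sql: str) -> str:
    """Toy normalizer: a stand-in for comparing parsed ASTs."""
    sql = unicodedata.normalize("NFKC", sql)          # fold compatibility forms
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)  # drop block comments
    sql = re.sub(r"--[^\n]*", " ", sql)               # drop line comments
    sql = re.sub(r"\s+", " ", sql).strip()            # collapse whitespace
    return sql.lower()  # would corrupt string literals; fine for this sample


def mutations(sql: str) -> List[str]:
    """Return variants the guardrail should treat as the same query."""
    return [
        sql.upper(),                        # casing
        sql.lower(),
        sql.replace(" ", "\n\t "),          # whitespace
        sql.replace(" ", " /* noise */ "),  # block comments
        sql + " -- trailing comment",       # line comment
        sql.replace(" ", "\u00a0"),         # Unicode: no-break space
        sql.replace(" ", "\u3000"),         # Unicode: ideographic space
    ]


KNOWN_GOOD = "SELECT id, name FROM users WHERE tenant_id = 42"


def test_mutations_normalize_to_same_form():
    expected = normalize(KNOWN_GOOD)
    for variant in mutations(KNOWN_GOOD):
        assert normalize(variant) == expected, repr(variant)


def test_semantic_change_is_not_hidden():
    # A variant that changes meaning must not collapse to the same form.
    assert normalize(KNOWN_GOOD.replace("42", "43")) != normalize(KNOWN_GOOD)
```

The second test is there as a sanity check: a normalizer that maps everything to the same string would pass the first test trivially.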
QueryShield ships queryshield-test with the open red-team corpus + CI integration; results feed an SLI (reject rate on the red-team corpus) whose threshold, ≥99.5%, gates deploys.
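For illustration, a deploy gate on that reject rate could look roughly like the sketch below. The `CorpusResult` record and the `gate` function are hypothetical names for this example; they do not describe the actual output format or interface of queryshield-test. Only the 99.5% threshold comes from the text above.

```python
from dataclasses import dataclass
from typing import Iterable, List

REJECT_RATE_THRESHOLD = 0.995  # the >=99.5% figure stated above


@dataclass(frozen=True)
class CorpusResult:
    case_id: str    # hypothetical identifier of one red-team case
    rejected: bool  # True if the guardrail rejected the generated SQL


def reject_rate(results: Iterable[CorpusResult]) -> float:
    """Fraction of red-team cases the guardrail rejected."""
    items: List[CorpusResult] = list(results)
    if not items:
        # An empty corpus must fail loudly rather than pass vacuously.
        raise ValueError("red-team corpus produced no results")
    return sum(r.rejected for r in items) / len(items)


def gate(results: Iterable[CorpusResult],
         threshold: float = REJECT_RATE_THRESHOLD) -> bool:
    """Return True if the deploy may proceed."""
    return reject_rate(results) >= threshold


def make_results(total: int, missed: int) -> List[CorpusResult]:
    """Build a synthetic result set with `missed` cases that slipped through."""
    return [CorpusResult(f"case-{i}", rejected=(i >= missed)) for i in range(total)]
```

In CI, a wrapper script would exit non-zero when `gate(...)` returns False. With a 200-case corpus, one miss (199/200) still meets the threshold while two misses (198/200) do not, so corpus size determines how coarse the gate is.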