How do I block comment-based injection (`-- DROP TABLE...`) in LLM output?

Question

Accepted Answer

Comment-based injection exploits SQL's comment syntax (`--` line comments, `/* */` block comments, MySQL's `#`) either to smuggle payloads past naive validators or to hide intent from log review. Examples the LLM might emit (often steered by prompt injection in retrieved content):

```sql
SELECT * FROM orders WHERE id = 42; -- DROP TABLE users;
SELECT * FROM /* DROP TABLE users; */ orders;
SELECT * FROM orders /*! ; DROP TABLE users */; -- MySQL conditional comment
```

Regex blocklists fail here in two directions: they either miss a payload hidden in comment syntax (a regex that skips comments never sees the `DROP` inside a MySQL conditional comment, which the server does execute) or they over-block legitimate queries that contain the word `DROP` in a column alias or string literal.

AST-level defense:

1. **Real parser, real grammar.** libpg_query and sqlglot treat comments as non-executable: libpg_query's parse tree omits them, and sqlglot keeps them only as annotations attached to nodes. Either way, the `-- DROP TABLE users` after a `SELECT` never appears as a statement in the parsed AST. The question becomes: did the parser see one statement or two?
2. **Multi-statement detection.** Most parsers return a list of statements; reject when `len(statements) > 1`. This catches `SELECT ... ; DROP ...` cleanly.
3. **MySQL conditional-comment awareness.** `/*! ... */` is executable code in MySQL, not a comment. Use a MySQL-aware parser (sqlglot in MySQL dialect mode) so these are parsed as the statements they actually are, and confirm that yours does so; if it treats `/*!` as an ordinary comment, reject any query containing that sequence.
4. **Driver-level single-statement mode.** Set `allowMultiQueries=false`, `multi_statements=false`, or the equivalent for your driver. Belt and suspenders.

QueryShield runs a dialect-aware parse, a multi-statement check, and driver-level enforcement.
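The driver-level point can be illustrated with Python's built-in `sqlite3` module, used here only as a stand-in because its `execute()` refuses more than one statement per call. This is a minimal sketch, not QueryShield's implementation; `run_single_statement` is a hypothetical helper name, and MySQL or Postgres drivers expose the same kind of guard through their own settings.

```python
import sqlite3

# sqlite3 signals "more than one statement" with a different exception
# class depending on the Python version, so both are caught here.
MULTI_STATEMENT_ERRORS = (sqlite3.Warning, sqlite3.ProgrammingError)


def run_single_statement(conn, sql):
    """Run LLM-generated SQL through execute(), which rejects stacked statements.

    Hypothetical helper for illustration. It is a last line of defense and
    does not replace the parser-level checks.
    """
    try:
        return conn.execute(sql).fetchall()
    except MULTI_STATEMENT_ERRORS as exc:
        raise ValueError(f"rejected by driver: {exc}") from exc


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER)")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.execute("INSERT INTO orders VALUES (42)")

    # A trailing comment is inert: only the SELECT runs.
    print(run_single_statement(
        conn, "SELECT id FROM orders WHERE id = 42; -- DROP TABLE users;"))

    # A real second statement is refused before anything executes.
    try:
        run_single_statement(conn, "SELECT id FROM orders; DROP TABLE users")
    except ValueError as err:
        print(err)
```

The guard is deliberately coarse: it knows nothing about which statement types are allowed, so a lone `DROP TABLE users` would still pass. That is why it sits behind the dialect-aware parse and statement-type checks rather than replacing them.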