Data-Grounded Verification
Data-grounded verification checks whether an AI agent’s natural language answer accurately reflects the underlying data it was based on. Unlike standard hallucination detection (which uses LLM-as-judge), data-grounded verification compares claims directly against raw query results.
The Problem
AI agents that query databases (text-to-SQL, RAG-over-data) return natural language answers based on structured data. These answers can be wrong in ways no standard hallucination detector catches:
- Numeric misreading: The data says 18.1%, but the LLM says 23%
- Wrong aggregation: The LLM misidentifies which row was the maximum
- Fabricated data: The query returned no results, but the LLM invented a plausible number
- Incorrect conclusions: The LLM quoted a number correctly but drew the wrong conclusion from it
Standard verification approaches (cross-LLM consensus, LLM-as-judge) fail here because no judge LLM has access to the actual data.
How It Works
TruthVouch extracts the raw data from the LLM’s context (SQL queries and results are already present in most text-to-SQL agent prompts) and verifies the response against it:
- Extract data context — Parse SQL queries and raw results from the request messages (tool call outputs, code blocks, structured data)
- Extract claims — Identify numeric and qualitative claims in the response
- Verify each claim:
- Numeric claims → Deterministic arithmetic comparison (not LLM judgment). Example: “Response says Germany=23%, data row shows Germany=18.1%” → MISMATCH
- Qualitative claims → LLM faithfulness check against the data. Example: “Response says Germany had highest uplift, data shows France was highest” → MISMATCH
- Return per-claim verdicts with evidence and an overall grounding score
Key Design Decisions
No Database Connection Required
The raw data is already present in the LLM’s prompt context — we extract it from there. TruthVouch never connects to your database directly.
Deterministic Numeric Verification
Numeric claims are verified with arithmetic, not another LLM call. This eliminates the possibility of an LLM judge making the same mistake as the original LLM.
Non-Blocking (Fire-and-Forget)
Data verification runs asynchronously after the response is returned to the user. It does not add latency to the LLM response path. Results are stored in the audit trail and surface as alerts if contradictions are found.
Integration
Data-grounded verification is available through:
- Governance Gateway — Automatic (stage 19 of the pipeline). Enable per-client in gateway settings
- SDK —
client.verify.data_grounding()(Python, TypeScript, .NET) - MCP —
verify_data_groundingtool in Claude Code - CLI —
truthvouch verify-data - REST API —
POST /api/v1/governance/data-verification/verify
Use Cases
- Text-to-SQL agents: Verify that the natural language answer matches the SQL query results
- BI dashboard copilots: Ensure AI summaries of charts/tables are factually accurate
- RAG-over-data pipelines: Validate that retrieved data is faithfully represented in the response
- Data reporting agents: Catch numeric errors before they reach stakeholders