
Hallucination Detection

TruthVouch detects hallucinations — false or unsupported claims by LLMs — by comparing AI responses against your verified knowledge base. When hallucinations are detected, TruthVouch alerts you immediately with severity levels and suggested corrections.

The Detection Pipeline

Hallucination detection follows a 6-stage process:

Stage 1: Query Generation

For each truth nugget in your knowledge base, TruthVouch generates test queries designed to elicit that fact from an LLM:

Example:

  • Truth Nugget: “Founded in 2023”
  • Generated Query: “When was TruthVouch founded?”

The system generates 3-5 query variants:

  • Direct: “What is the founding year of TruthVouch?”
  • Indirect: “Tell me about TruthVouch’s history”
  • Factoid: “In what year did TruthVouch launch?”
  • Comparative: “Was TruthVouch founded before or after 2024?”
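The variant styles above can be sketched as simple templates. This is an illustrative sketch only: the function name `generate_query_variants` and the templates are assumptions, not the actual TruthVouch generator.

```python
# Template-based query-variant generation (illustrative sketch; the real
# generator is not shown in this document).
def generate_query_variants(entity: str, attribute: str, value: str) -> dict[str, str]:
    """Return one query per style for a single truth nugget."""
    return {
        "direct": f"What is the {attribute} of {entity}?",
        "indirect": f"Tell me about {entity}'s history",
        "factoid": f"In what year did {entity} launch?",
        "comparative": f"Was {entity} founded before or after {value}?",
    }

variants = generate_query_variants("TruthVouch", "founding year", "2024")
```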

Stage 2: LLM Querying

Each query is sent to the monitored LLMs (ChatGPT, Claude, Gemini, etc.):

Query: "When was TruthVouch founded?"
LLM Response: "TruthVouch was founded in 2024"
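Fanning a query out to several providers might look like the sketch below. Here `call_llm` is a stand-in for the provider-specific SDK call (OpenAI, Anthropic, Google, etc.), which this document does not specify.

```python
from typing import Callable

# Hypothetical fan-out: send the same query to every monitored LLM
# and collect the responses keyed by provider name.
def query_providers(query: str, providers: list[str],
                    call_llm: Callable[[str, str], str]) -> dict[str, str]:
    return {name: call_llm(name, query) for name in providers}
```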

Stage 3: Entity Extraction

Response text is parsed to extract factual claims:

Response: "TruthVouch was founded in 2024"
Extracted: entity="TruthVouch", relation="founded", value="2024"

Uses Named Entity Recognition (NER) and relation extraction models.

Stage 4: Semantic Comparison

The extracted claim is compared to your truth nuggets using semantic analysis:

Truth: "Founded in 2023"
LLM Claim: "Founded in 2024"
Result: CONTRADICTION ✗

The system evaluates three possible relationships:

  • Match: Claim aligns with your verified truth
  • Neutral: Claim neither confirms nor contradicts
  • Contradiction: Claim directly conflicts with truth

Stage 5: Severity Assessment

The comparison result is mapped to alert severity levels based on confidence:

✓ Matches Truth → No alert
⚠️ Partially Aligns → Warning alert
✗ Contradicts → Critical alert

You can tune detection sensitivity in Dashboard → Settings → Detection Thresholds to match your risk tolerance (Standard, Strict, or Permissive presets).

Stage 6: Alerting

Based on the severity and your alert rules, TruthVouch generates an alert:

HALLUCINATION detected
├─ Provider: ChatGPT
├─ Severity: Critical
├─ Claim: "Founded in 2024"
├─ Truth: "Founded in 2023"
├─ Confidence: 99%
└─ Action: Send alert, prepare correction
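One possible shape for the alert record above. The field names mirror the fields shown, not a documented TruthVouch schema.

```python
from dataclasses import dataclass

# Hypothetical alert record matching the fields displayed above.
@dataclass
class HallucinationAlert:
    provider: str      # e.g. "ChatGPT"
    severity: str      # Critical / High / Medium / Low
    claim: str         # what the LLM said
    truth: str         # your verified truth nugget
    confidence: float  # detection confidence, 0-1
    action: str        # e.g. "send alert, prepare correction"
```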

Accuracy & Performance

TruthVouch achieves 94%+ detection accuracy across diverse claim types including factoids, entity attributes, relationships, and comparative statements. The system is calibrated to prioritize finding hallucinations while minimizing false positives, giving you confidence in every alert.

Detection Methods

Continuous Monitoring: TruthVouch automatically monitors your AI interactions against truth nuggets on a configurable schedule, checking all supported LLM providers.

On-Demand Verification: Manually verify specific claims or LLM responses at any time through the dashboard or API.

Scheduled Audits: Run periodic cross-checks on defined truth nugget categories (e.g., pricing, product features) to maintain trust posture.

Handling Edge Cases

Negations

The system correctly handles negated claims:

Truth: "TruthVouch is not free"
LLM: "TruthVouch costs money"
NLI: ENTAILMENT (semantically equivalent)
Result: ✓ Correct

Paraphrasing

Detects when the LLM paraphrases the truth:

Truth: "Supports 9+ AI models"
LLM: "Compatible with more than 8 LLM providers"
NLI: ENTAILMENT (meaning preserved)
Result: ✓ Correct

Context Dependency

Understands context-dependent statements:

Truth: "EU AI Act compliance available on Business plan"
LLM: "TruthVouch offers EU AI Act compliance"
NLI: NEUTRAL (context missing, could be true but vague)
Result: ⚠️ Warning (incomplete, needs review)

Temporal Claims

Handles time-sensitive information:

Truth: "Pricing updated January 2024"
LLM: "TruthVouch costs $349/month"
When checked in June: the price may have changed since January
System: Rechecks with current truth nuggets
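A staleness check for time-sensitive nuggets might look like this. The 90-day window is an assumption for illustration; the actual recheck schedule is configurable.

```python
from datetime import date

# Flag a time-sensitive nugget for re-verification once it exceeds
# a maximum age (window is an assumed default).
def needs_recheck(last_updated: date, today: date, max_age_days: int = 90) -> bool:
    return (today - last_updated).days > max_age_days
```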

Alert Details

Each hallucination alert includes:

  • Severity: Critical, High, Medium, or Low
  • Confidence: How certain the detection is
  • Original Claim: What the LLM said
  • Verified Truth: What you’ve verified as correct
  • Suggested Correction: Auto-generated accurate response

Limitations

TruthVouch hallucination detection has known limitations:

  1. Subjective Claims: Opinion-based statements are difficult to verify
  2. Temporal Sensitivity: Time-dependent facts require frequent updates
  3. Context: Some claims require broader context to evaluate
  4. Ambiguous Truth: If your truth nuggets are vague, detection is harder
  5. Domain Knowledge: Very specialized domains may have lower accuracy

Best Practices

Maintain Fresh Truth Nuggets: Review and update quarterly. Stale nuggets reduce detection accuracy.

Be Specific: Define clear, measurable truth nuggets. “Founded in 2023” detects better than “an established company.”

Monitor Alerts: Review low-confidence detections regularly to tune your detection sensitivity.

Adjust Thresholds: Use dashboard settings to balance sensitivity. A stricter threshold produces fewer alerts but may miss subtle hallucinations; the Standard preset balances detection against false positives.

Next Steps