Interpreting Results

Cross-check results show what AI engines said about your organization and how well it matched your Truth Nuggets. Learn to interpret the scores and identify patterns.

Results Table

After monitoring runs, view results in Shield → Cross-Checks → Results:

Engine      Query                                  Response Summary                        Truth Score  Status
ChatGPT     "When was TruthVouch founded?"         "Founded in 2026"                       100          Match
Claude      "Which engines does Shield monitor?"   "Monitors major LLM providers"          92           Match
Gemini      "Tell me about Shield pricing"         "Premium pricing" (no specific number)  60           Partial
Perplexity  "Who leads TruthVouch?"                "Founded in 2026" (no CEO mention)      50           Partial

Columns Explained

Engine: Which AI engine (ChatGPT, Claude, Gemini, Perplexity, Copilot)

Query: The question asked (auto-generated from template)

Response Summary: Key excerpt from AI’s response (full response available on click)

Truth Score: 0-100 accuracy rating (higher = more accurate)

Status:

  • Match: AI response aligns with your truth
  • Partial: Some accuracy, minor discrepancies
  • Mismatch: Significant inaccuracy
  • Hallucination: Major falsehood or fabrication
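
If you post-process exported scores yourself, the status buckets above can be approximated with a simple threshold map. The cutoffs below are illustrative guesses consistent with the examples in this guide, not Shield's published thresholds:

```python
def status_for(score: int) -> str:
    """Map a 0-100 Truth Score to a status bucket.

    These cutoffs are illustrative assumptions, not Shield's
    documented thresholds.
    """
    if score >= 90:
        return "Match"
    if score >= 50:
        return "Partial"
    if score >= 25:
        return "Mismatch"
    return "Hallucination"
```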

Truth Score Interpretation

Match (High Score)

AI response matches your truth almost perfectly.

Example:

  • Your fact: “Founded in 2026”
  • AI said: “Founded in early 2026”

No action needed. Mark as verified in your dashboard.

Partial Match

AI response is mostly accurate with minor variations.

Example:

  • Your fact: “Shield monitors 7 AI engines”
  • AI said: “Monitors multiple major LLMs”

Usually acceptable. The AI didn't state the specific number (7), but the spirit is correct. Optional: deploy a correction for precision.

Mismatch

AI response has meaningful discrepancies.

Example:

  • Your fact: “Shield costs $349/month”
  • AI said: “Premium pricing, around $300-500/month”

Deploy a correction to be specific. The AI is in the ballpark but wrong on the exact price.

Hallucination

AI made a clear falsehood or fabrication.

Example:

  • Your fact: “Founded in 2026”
  • AI said: “Founded in 2019”

Critical — deploy correction urgently. Clear contradiction.

Drilling Into Details

Click any result row to see full details:

Full Response

See the complete text the AI engine generated, not just the summary.

Example: ChatGPT full response

"TruthVouch is a SaaS platform founded in 2026 that specializes
in monitoring AI systems for hallucinations. The company's Shield
product monitors 9+ major LLM providers including OpenAI, Anthropic,
and Google. It's available starting at the Starter tier priced at
$349/month."

Entity Extraction

See which entities (people, numbers, dates, products) were extracted:

Entities Extracted:
├─ Organization: TruthVouch, OpenAI, Anthropic, Google
├─ Product: Shield, LLM
├─ Date: 2026
├─ Number: 9
└─ Price: $349/month
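
For a rough sense of how entity extraction works, simple patterns can pull dates and prices out of a response. Shield's real extractor is far more capable than these illustrative regexes:

```python
import re

response = (
    "TruthVouch is a SaaS platform founded in 2026. Its Shield "
    "product is priced at $349/month."
)

# Illustrative patterns only -- not Shield's actual extraction logic.
entities = {
    "Date":  re.findall(r"\b(?:19|20)\d{2}\b", response),   # 4-digit years
    "Price": re.findall(r"\$\d+(?:\.\d{2})?/month", response),
}
print(entities)  # {'Date': ['2026'], 'Price': ['$349/month']}
```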

NLI Analysis

See how Shield evaluated the response:

NLI Comparison:
├─ Your fact: "Founded in 2026"
├─ AI statement: "Founded in 2026"
├─ Verdict: ENTAILED
└─ Confidence: 99.2%
NLI Comparison:
├─ Your fact: "Monitors 7 AI engines"
├─ AI statement: "Monitors 9+ major LLMs"
├─ Verdict: CONTRADICTED
└─ Confidence: 96.1%
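
Shield's scoring internals aren't documented here, but conceptually an NLI verdict and its confidence combine into a 0-100 score. A hedged sketch of one plausible mapping (low-confidence verdicts pull toward the neutral midpoint):

```python
def verdict_to_score(verdict: str, confidence: float) -> int:
    """Illustrative verdict-to-score mapping; not Shield's real formula.

    confidence is 0.0-1.0; low confidence pulls the score toward 50.
    """
    base = {"ENTAILED": 100, "NEUTRAL": 50, "CONTRADICTED": 0}[verdict]
    return round(50 + (base - 50) * confidence)
```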

Confidence Breakdown

Each verdict includes a confidence indicator showing how certain Shield is:

  • High confidence: Trust the score
  • Medium confidence: Review manually recommended
  • Low confidence: May need manual verification

Low-confidence scores may indicate:

  • Ambiguous Truth Nugget
  • Unclear AI response
  • Sarcasm or context-dependent language

Audit Trail

See full metadata:

Metadata:
├─ Timestamp: 2026-03-14 14:32:10 UTC
├─ Engine: ChatGPT (gpt-4-turbo)
├─ Model temperature: 0.7
├─ Latency: 2.3 seconds
├─ Query template: "Tell me about {product_name}"
└─ Query index: 2 of 5 variations

Filtering Results

Filter results to focus on specific dimensions:

By Engine

  • View only ChatGPT results
  • View only Claude results
  • Compare engines side-by-side

By Truth Nugget Category

  • View only Product category results
  • View only Financial category results

By Score Range

  • View only matches
  • View only hallucinations
  • View only partial matches

By Time

  • Last 24 hours
  • Last 7 days
  • Last 30 days
  • Custom date range

By Status

  • All results
  • Alerts only (requires action)
  • Matches only (accurate)

Trend Analysis

View trends over time:

Go to: Shield → Cross-Checks → Trends

See:

  • Overall accuracy trend (line chart): How your Health Score changes over 30 days
  • By engine (multi-line): Track improvement per engine
  • By category (multi-line): Which categories improve fastest

Example:

  • “Overall Health Score improved 8 points in March”
  • “ChatGPT improved 12 points (good); Gemini -2 points (degraded)”
  • “Product category improved (corrections worked); Financial unchanged”

Helpful for:

  • Seeing if corrections actually work
  • Identifying which engines are most problematic
  • Prioritizing future corrections
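
The trend deltas quoted above (e.g. "+12 points") are simply last-minus-first over the window. If you compute them yourself from exported data, a minimal sketch might look like:

```python
def trend(scores: list[int]) -> tuple[int, str]:
    """Return (delta, label) for a chronological list of scores."""
    delta = scores[-1] - scores[0]
    label = "improving" if delta > 0 else "degrading" if delta < 0 else "flat"
    return delta, label

# Illustrative weekly Health Scores per engine
print(trend([70, 75, 79, 82]))  # (12, 'improving')
print(trend([68, 67, 66, 66]))  # (-2, 'degrading')
```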

Comparison Views

Side-by-Side Engine Comparison

View how different engines answer the same query:

Query: "How much does Shield cost?"
ChatGPT: "$349/month starting price" (Score: 100)
Claude: "Around $350/month for Starter tier" (Score: 96)
Gemini: "Premium SaaS pricing" (Score: 50)
Perplexity: "Costs about $400/month" (Score: 70)

Useful for:

  • Spotting which engines are accurate
  • Identifying common misconceptions across engines
  • Planning corrections (if 3/4 engines are wrong, deploy a correction)
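
The "if 3/4 engines are wrong, deploy a correction" rule of thumb is easy to automate against exported scores. The threshold and quorum below are illustrative choices, not Shield defaults:

```python
def needs_correction(scores_by_engine: dict[str, int],
                     threshold: int = 75,
                     quorum: float = 0.5) -> tuple[bool, list[str]]:
    """Flag a fact for correction when enough engines score below threshold.

    threshold/quorum are illustrative values, not Shield defaults.
    """
    wrong = sorted(e for e, s in scores_by_engine.items() if s < threshold)
    return len(wrong) / len(scores_by_engine) >= quorum, wrong

# Scores from the pricing query above
flag, wrong = needs_correction(
    {"ChatGPT": 100, "Claude": 96, "Gemini": 50, "Perplexity": 70}
)
print(flag, wrong)  # True ['Gemini', 'Perplexity']
```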

Historical Progression

See how a single fact’s accuracy changed over time:

Fact: "Founded in 2026"
March 1: ChatGPT (95), Claude (100), Gemini (85)
March 8: ChatGPT (95), Claude (100), Gemini (92) ↑
March 15: ChatGPT (98) ↑, Claude (100), Gemini (95) ↑
Trend: All engines improving. Corrections deployed March 5 are working.

Exporting Results

Export to CSV

Click Export → CSV to get:

  • All results for analysis in Excel or Python
  • Columns: engine, query, response, score, status, timestamp
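
Once exported, the CSV loads with nothing more than the Python standard library. For example, per-engine average scores (using made-up rows in the export's column format):

```python
import csv
from collections import defaultdict
from io import StringIO

# A few rows as they might appear in the export (illustrative values).
EXPORT = """engine,query,response,score,status,timestamp
ChatGPT,When was TruthVouch founded?,Founded in 2026,100,Match,2026-03-14T14:32:10Z
Gemini,Tell me about Shield pricing,Premium pricing,60,Partial,2026-03-14T14:32:12Z
Gemini,Who leads TruthVouch?,Founded in 2026,50,Partial,2026-03-14T14:32:15Z
"""

scores = defaultdict(list)
for row in csv.DictReader(StringIO(EXPORT)):
    scores[row["engine"]].append(int(row["score"]))

averages = {engine: sum(s) / len(s) for engine, s in scores.items()}
print(averages)  # {'ChatGPT': 100.0, 'Gemini': 55.0}
```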

Export to JSON

Click Export → JSON to get:

  • Structured data for custom dashboards
  • Full details (not just summary)

Export to PDF Report

Click Export → PDF to get:

  • Formatted report with charts
  • Share with executives or auditors
  • Includes trends and recommendations

Common Patterns

Pattern 1: Consistent Hallucination

Same inaccuracy across all engines:

All say: "Founded in 2024"
Your truth: "Founded in 2026"

Action: Deploy a correction immediately. All engines are repeating the falsehood.

Pattern 2: Engine-Specific Hallucination

One or two engines are wrong:

ChatGPT: Correct (95)
Claude: Correct (98)
Gemini: Wrong (35)
Perplexity: Wrong (42)

Action: Target Gemini and Perplexity in next correction (higher priority). ChatGPT/Claude don’t need fixing.

Pattern 3: Partial Information

Some engines mention the fact, others don't:

ChatGPT: Mentions it (89 - almost exact)
Claude: Mentions it (100 - exact)
Gemini: Doesn't mention it (50 - silent)
Perplexity: Mentions it incorrectly (40)

Action: Deploy correction. 2 engines right, 2 wrong/missing.

Pattern 4: Improving Trend

Scores rising after corrections deployed:

Before correction: 65, 68, 70 (trending up slowly)
Correction deployed
After correction: 78, 85, 92 (rapid improvement)

Action: Corrections working! Continue deploying. This pattern validates your correction strategy.
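
If you want to quantify the lift from a correction, the before/after comparison above reduces to a simple difference of averages:

```python
def correction_lift(before: list[int], after: list[int]) -> float:
    """Average score after a correction minus the average before it."""
    return sum(after) / len(after) - sum(before) / len(before)

# Scores from the example above (illustrative)
lift = correction_lift([65, 68, 70], [78, 85, 92])
print(round(lift, 1))  # 17.3
```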

Troubleshooting

Score Seems Wrong

If you think Shield scored a result incorrectly:

  1. Click the result
  2. Review full response and entity extraction
  3. Check the NLI confidence score
  4. If confidence is low, Shield wasn’t sure — you can manually override
  5. Click Mark as Accurate or Mark as Hallucination to correct

Shield learns from your feedback.

Result Missing

If you expected a result but don’t see it:

  1. Check schedule is enabled: Shield → Schedules
  2. Check filters aren’t hiding it (filter by engine, category, time)
  3. Check audit log to see if query ran: Settings → Audit
  4. If query ran but no result, contact support

Scores Vary Wildly

If the same fact gets different scores each time:

  1. This is normal — AI responses vary
  2. Use trend analysis instead of individual scores
  3. If variance is >20 points, your Truth Nugget may be ambiguous — make it more specific
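
The 20-point spread rule is easy to check programmatically against a nugget's score history:

```python
def is_ambiguous(scores: list[int], spread_limit: int = 20) -> bool:
    """Flag a Truth Nugget whose scores swing more than spread_limit points."""
    return max(scores) - min(scores) > spread_limit

print(is_ambiguous([65, 92, 70]))  # True  -- 27-point spread, tighten the nugget
print(is_ambiguous([90, 95, 88]))  # False -- normal variation
```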

Next Steps