Interpreting Results
Cross-check results show what AI engines said about your organization and how well it matched your Truth Nuggets. Learn to interpret the scores and identify patterns.
Results Table
After monitoring runs, view results in Shield → Cross-Checks → Results:
| Engine | Query | Response Summary | Truth Score | Status |
|---|---|---|---|---|
| ChatGPT | "When was TruthVouch founded?" | "Founded in 2026" | 100 | Match |
| Claude | "Which engines does Shield monitor?" | "Monitors major LLM providers" | 92 | Match |
| Gemini | "Tell me about Shield pricing" | "Premium pricing" (no specific number) | 60 | Partial |
| Perplexity | "Who leads TruthVouch?" | "Founded in 2026" (no CEO mention) | 50 | Partial |
Columns Explained
Engine: Which AI engine (ChatGPT, Claude, Gemini, Perplexity, Copilot)
Query: The question asked (auto-generated from template)
Response Summary: Key excerpt from AI’s response (full response available on click)
Truth Score: 0-100 accuracy rating (higher = more accurate)
Status:
- Match: AI response aligns with your truth
- Partial: Some accuracy, minor discrepancies
- Mismatch: Significant inaccuracy
- Hallucination: Major falsehood or fabrication
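If you post-process exported results, the status buckets above can be reproduced from the Truth Score alone. A minimal Python sketch, with cutoffs inferred from the sample results table (Shield's actual thresholds are not documented here and may differ):

```python
def classify(score: int) -> str:
    """Map a 0-100 Truth Score to a status bucket.

    The cutoffs are illustrative guesses based on the sample
    results table; Shield's real thresholds are not published.
    """
    if score >= 90:
        return "Match"
    if score >= 50:
        return "Partial"
    if score >= 25:
        return "Mismatch"
    return "Hallucination"
```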
Truth Score Interpretation
Match (High Score)
AI response matches your truth almost perfectly.
Example:
- Your fact: “Founded in 2026”
- AI said: “Founded in early 2026”
No action needed. Mark as verified in your dashboard.
Partial Match
AI response is mostly accurate with minor variations.
Example:
- Your fact: “Shield monitors 7 AI engines”
- AI said: “Monitors multiple major LLMs”
Consider this acceptable. The AI didn’t mention the specific number (7), but the spirit is correct. Optional: deploy a correction for precision.
Mismatch
AI response has meaningful discrepancies.
Example:
- Your fact: “Shield costs $349/month”
- AI said: “Premium pricing, around $300-500/month”
Deploy a correction with the exact figure. The AI is in the ballpark but wrong on the precise price.
Hallucination
AI made a clear falsehood or fabrication.
Example:
- Your fact: “Founded in 2026”
- AI said: “Founded in 2019”
Critical: deploy a correction urgently. This is a clear contradiction.
Drilling Into Details
Click any result row to see full details:
Full Response
See the complete text the AI engine generated, not just the summary.
Example: ChatGPT full response
"TruthVouch is a SaaS platform founded in 2026 that specializes in monitoring AI systems for hallucinations. The company's Shield product monitors 9+ major LLM providers including OpenAI, Anthropic, and Google. It's available starting at the Starter tier priced at $349/month."

Entity Extraction
See which entities (people, numbers, dates, products) were extracted:
```
Entities Extracted:
├─ Organization: TruthVouch, OpenAI, Anthropic, Google
├─ Product: Shield, LLM
├─ Date: 2026
├─ Number: 9
└─ Price: $349/month
```

NLI Analysis
See how Shield evaluated the response:
```
NLI Comparison:
├─ Your fact: "Founded in 2026"
├─ AI statement: "Founded in 2026"
├─ Verdict: ENTAILED
└─ Confidence: 99.2%
```
```
NLI Comparison:
├─ Your fact: "Monitors 7 AI engines"
├─ AI statement: "Monitors 9+ major LLMs"
├─ Verdict: CONTRADICTED
└─ Confidence: 96.1%
```

Confidence Breakdown
Each verdict includes a confidence indicator showing how certain Shield is:
- High confidence: Trust the score
- Medium confidence: Review manually recommended
- Low confidence: May need manual verification
Low-confidence scores may indicate:
- Ambiguous Truth Nugget
- Unclear AI response
- Sarcasm or context-dependent language
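When triaging exported verdicts in bulk, the confidence bands above can drive an automated review queue. A hedged sketch in Python (the band boundaries are assumptions, not documented Shield values):

```python
def triage(confidence: float) -> str:
    """Suggest a follow-up action from an NLI confidence in [0, 1].

    Band boundaries below are illustrative assumptions.
    """
    if confidence >= 0.90:
        return "trust"   # high confidence: trust the score
    if confidence >= 0.70:
        return "review"  # medium: manual review recommended
    return "verify"      # low: needs manual verification
```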
Audit Trail
See full metadata:
```
Metadata:
├─ Timestamp: 2026-03-14 14:32:10 UTC
├─ Engine: ChatGPT (gpt-4-turbo)
├─ Model temperature: 0.7
├─ Latency: 2.3 seconds
├─ Query template: "Tell me about {product_name}"
└─ Query index: 2 of 5 variations
```

Filtering Results
Filter results to focus on specific subsets:
By Engine
- View only ChatGPT results
- View only Claude results
- Compare engines side-by-side
By Truth Nugget Category
- View only Product category results
- View only Financial category results
By Score Range
- View only matches
- View only hallucinations
- View only partial matches
By Time
- Last 24 hours
- Last 7 days
- Last 30 days
- Custom date range
By Status
- All results
- Alerts only (requires action)
- Matches only (accurate)
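The same filters can be applied offline to an exported result set. A small sketch (field names follow the documented CSV columns; the helper itself is hypothetical):

```python
def filter_results(rows, engine=None, status=None, min_score=None):
    """Filter exported result rows by engine, status, and score floor."""
    out = rows
    if engine is not None:
        out = [r for r in out if r["engine"] == engine]
    if status is not None:
        out = [r for r in out if r["status"] == status]
    if min_score is not None:
        out = [r for r in out if r["score"] >= min_score]
    return out

results = [
    {"engine": "ChatGPT", "status": "Match", "score": 100},
    {"engine": "Gemini", "status": "Partial", "score": 60},
    {"engine": "Gemini", "status": "Hallucination", "score": 20},
]
gemini_only = filter_results(results, engine="Gemini")
alerts = filter_results(results, status="Hallucination")
```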
Trend Analysis
View trends over time:
Go to: Shield → Cross-Checks → Trends
See:
- Overall accuracy trend (line chart): How your Health Score changes over 30 days
- By engine (multi-line): Track improvement per engine
- By category (multi-line): Which categories improve fastest
Example:
- “Overall Health Score improved 8 points in March”
- “ChatGPT improved 12 points (good); Gemini -2 points (degraded)”
- “Product category improved (corrections worked); Financial unchanged”
Helpful for:
- Seeing if corrections actually work
- Identifying which engines are most problematic
- Prioritizing future corrections
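The accuracy trend can also be recomputed from exported data, for example as the average score per day. A minimal sketch (field names follow the documented export columns):

```python
from collections import defaultdict
from statistics import mean

def daily_trend(rows):
    """Return (date, mean score) pairs in chronological order."""
    by_date = defaultdict(list)
    for r in rows:
        by_date[r["timestamp"][:10]].append(r["score"])  # date prefix of ISO timestamp
    return [(d, mean(scores)) for d, scores in sorted(by_date.items())]

rows = [
    {"timestamp": "2026-03-01T10:00:00Z", "score": 65},
    {"timestamp": "2026-03-01T11:00:00Z", "score": 75},
    {"timestamp": "2026-03-08T10:00:00Z", "score": 90},
]
trend = daily_trend(rows)
```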
Comparison Views
Side-by-Side Engine Comparison
View how different engines answer the same query:
Query: "How much does Shield cost?"
```
ChatGPT: "$349/month starting price" (Score: 100)
Claude: "Around $350/month for Starter tier" (Score: 96)
Gemini: "Premium SaaS pricing" (Score: 50)
Perplexity: "Costs about $400/month" (Score: 70)
```

Useful for:
- Spotting which engines are accurate
- Identifying common misconceptions across engines
- Planning corrections (if 3/4 engines are wrong, deploy a correction)
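The "3 of 4 engines wrong" rule of thumb is easy to automate over a per-engine score map. A hypothetical sketch (the threshold and quorum values are assumptions):

```python
def needs_correction(scores, threshold=70, quorum=0.5):
    """Flag a fact when more than `quorum` of engines score below `threshold`."""
    wrong = sum(1 for s in scores.values() if s < threshold)
    return wrong / len(scores) > quorum

# Per-engine scores for two example queries
pricing = {"ChatGPT": 100, "Claude": 96, "Gemini": 50, "Perplexity": 70}
founding = {"ChatGPT": 30, "Claude": 40, "Gemini": 95, "Perplexity": 20}
```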
Historical Progression
See how a single fact’s accuracy changed over time:
Fact: "Founded in 2026"
```
March 1:  ChatGPT (95), Claude (100), Gemini (85)
March 8:  ChatGPT (95), Claude (100), Gemini (92) ↑
March 15: ChatGPT (98) ↑, Claude (100), Gemini (95) ↑
```

Trend: All engines improving. Corrections deployed March 5 are working.

Exporting Results
Export to CSV
Click Export → CSV to get:
- All results for analysis in Excel or Python
- Columns: engine, query, response, score, status, timestamp
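A CSV export with those columns can be loaded with the Python standard library alone. A sketch using simulated data (a real file would come from Export → CSV):

```python
import csv
import io

# Simulated export using the documented columns.
sample = """engine,query,response,score,status,timestamp
ChatGPT,When was TruthVouch founded?,Founded in 2026,100,Match,2026-03-14T14:32:10Z
Gemini,Tell me about Shield pricing,Premium pricing,60,Partial,2026-03-14T14:33:02Z
"""

rows = list(csv.DictReader(io.StringIO(sample)))
matches = [r for r in rows if r["status"] == "Match"]
```

Note that `csv` reads every field as a string, so scores need an `int()` conversion before numeric analysis.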
Export to JSON
Click Export → JSON to get:
- Structured data for custom dashboards
- Full details (not just summary)
Export to PDF Report
Click Export → PDF to get:
- Formatted report with charts
- Share with executives or auditors
- Includes trends and recommendations
Common Patterns
Pattern 1: Consistent Hallucination
Same inaccuracy across all engines:
```
All engines say: "Founded in 2024"
Your truth: "Founded in 2026"
```

Action: Deploy a correction immediately. All engines believe the falsehood.
Pattern 2: Engine-Specific Hallucination
One or two engines are wrong:
```
ChatGPT: Correct (95)
Claude: Correct (98)
Gemini: Wrong (35)
Perplexity: Wrong (42)
```

Action: Target Gemini and Perplexity in the next correction (higher priority). ChatGPT and Claude don’t need fixing.
Pattern 3: Partial Information
Some engines mention fact, others don’t:
```
ChatGPT: Mentions (89 - almost right)
Claude: Mentions (100 - exact)
Gemini: Doesn't mention (50 - silent)
Perplexity: Mentions wrong (40 - wrong)
```

Action: Deploy a correction. Two engines are right; two are wrong or missing the fact.
Pattern 4: Improving Trend
Scores rising after corrections deployed:
```
Before correction: 65, 68, 70 (trending up slowly)
Correction deployed
After correction: 78, 85, 92 (rapid improvement)
```

Action: Corrections are working. Continue deploying; this pattern validates your correction strategy.
Troubleshooting
Score Seems Wrong
If you think Shield scored a result incorrectly:
- Click the result
- Review full response and entity extraction
- Check the NLI confidence score
- If confidence is low, Shield wasn’t sure — you can manually override
- Click Mark as Accurate or Mark as Hallucination to correct
Shield learns from your feedback.
Result Missing
If you expected a result but don’t see it:
- Check schedule is enabled: Shield → Schedules
- Check filters aren’t hiding it (filter by engine, category, time)
- Check audit log to see if query ran: Settings → Audit
- If query ran but no result, contact support
Scores Vary Wildly
If the same fact gets different scores each time:
- This is normal — AI responses vary
- Use trend analysis instead of individual scores
- If variance is >20 points, your Truth Nugget may be ambiguous — make it more specific
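The 20-point variance check can be scripted against repeated scores for the same fact. A minimal sketch:

```python
def is_ambiguous(scores, limit=20):
    """True when the spread between best and worst score exceeds `limit`,
    suggesting the underlying Truth Nugget may be ambiguous."""
    return max(scores) - min(scores) > limit

stable = [65, 68, 70]    # normal run-to-run variation
unstable = [50, 95, 72]  # spread of 45 points: tighten the nugget
```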
Next Steps
- How Cross-Checks Work — Technical deep dive
- Managing Alerts — Responding to results
- Dashboard Overview — Aggregate view