Knowledge Base
The Knowledge Base is your organization’s private knowledge layer for AI governance. Upload internal documents — policies, product specs, HR guidelines, technical documentation — and TruthVouch automatically extracts, chunks, embeds, and indexes them for real-time use in the governance pipeline.
Internal knowledge is never published externally. It stays within your platform, used only to ground and verify AI responses passing through the Truth Firewall.
How Does the Knowledge Base Differ from Truth Nuggets?
The Knowledge Base and Truth Nuggets serve different purposes within the same governance pipeline. Truth Nuggets are short, structured fact statements used by the Hallucination Shield to verify what external LLMs say about your organization publicly. The Knowledge Base holds rich internal documents and structured knowledge items used by the Truth Firewall to ground AI responses with your proprietary information.
| Feature | Truth Nuggets | Knowledge Base |
|---|---|---|
| Purpose | Verify external LLM accuracy about public facts | Ground the governance pipeline with internal knowledge |
| Data format | Short fact statements | Rich documents + structured knowledge items |
| Corrections | External: wiki updates, provider notifications, fact sheets | Internal: database update + re-embed only |
| Visibility | Can be published externally | Never leaves the platform |
| Used by | Hallucination Shield + Truth Firewall | Truth Firewall only |
| Sources | Manual entry, web monitoring | Document upload, auto-extraction, manual entry, Knowledge Connectors (Confluence, SharePoint, Google Drive) |
The key principle: all truth nuggets are knowledge, but not all knowledge is truth. A knowledge item can be promoted to a truth nugget when it should become a public fact, but the reverse is not needed — truth nuggets are already included in the governance pipeline’s combined knowledge.
Who Can Access the Knowledge Base?
The Knowledge Base is available on Professional, Business, and Enterprise subscription tiers. Starter tier users see the Knowledge Base navigation item but are shown an upgrade prompt when they click it.
All Knowledge Base API endpoints enforce tier checking. Requests from accounts below Professional tier receive a 403 Forbidden response.
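The tier gate amounts to an ordered comparison against the account's subscription level. A minimal sketch, assuming illustrative tier names and a hypothetical helper (not the actual API surface):

```python
from http import HTTPStatus

# Subscription tiers in ascending order (names assumed for illustration).
TIER_ORDER = ["starter", "professional", "business", "enterprise"]

def knowledge_base_status(account_tier: str) -> int:
    """Return the HTTP status a Knowledge Base endpoint would respond with."""
    if TIER_ORDER.index(account_tier) < TIER_ORDER.index("professional"):
        return HTTPStatus.FORBIDDEN  # 403 for accounts below Professional
    return HTTPStatus.OK  # tier check passed
```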
Uploading Documents
Navigate to AI Governance > Knowledge > Knowledge Base, then select the Documents tab.
Supported Formats
| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Text-based PDFs only (no OCR for scanned documents in v1) |
| Word | .docx | Office Open XML format |
| Plain Text | .txt | UTF-8 encoded |
| Markdown | .md | Preserves structure for better chunking |
| HTML | .html, .htm | Tags stripped, structure preserved |
Size limit: 50 MB per file.
Single Document Upload
- Click Upload Document.
- Select your file (drag and drop is supported).
- Fill in the metadata:
- Title (required): A descriptive name for the document.
- Description (optional): What this document contains.
- Department (optional): Organizational grouping (e.g., Engineering, Legal, HR).
- Classification: Choose from Internal, Confidential, or Restricted.
- Tags (optional): Comma-separated labels for filtering.
- Expires at (optional): Expiry date for policies or certifications.
- Click Upload.
The system processes your document immediately:
- Extracts the full text content
- Splits text into chunks (approximately 512 tokens each with overlap)
- Generates vector embeddings for each chunk
- Stores everything for semantic search and pipeline use
Once processing completes, the document status changes from Pending to Ready.
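The chunking step above can be pictured as a sliding window over the document's tokens. A minimal sketch: the 512-token size matches the documented default, while the overlap amount is an assumed value for illustration.

```python
def chunk_tokens(tokens: list, chunk_size: int = 512, overlap: int = 64) -> list:
    """Split a token sequence into overlapping chunks of ~chunk_size tokens."""
    chunks = []
    step = chunk_size - overlap  # each window starts `overlap` tokens early
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks
```

The overlap ensures that a fact straddling a chunk boundary still appears whole in at least one chunk, which improves semantic search recall.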
Bulk Upload
For uploading multiple documents at once:
- Click Bulk Upload.
- Select multiple files (up to 50 MB each).
- Click Upload All.
Bulk uploads are processed asynchronously in the background. Each document appears in the list with a Pending status and transitions to Ready as processing completes. Document titles default to the filename (without extension) — you can edit them afterward.
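If you script bulk uploads, you can poll each document until it leaves the processing states. A sketch under stated assumptions: `fetch_status` is a hypothetical callable (for example, wrapping a GET on the document) returning one of the documented statuses in lowercase.

```python
import time

def wait_until_ready(fetch_status, doc_id: str,
                     poll_seconds: float = 2.0, timeout: float = 300.0) -> str:
    """Poll a document until processing finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(doc_id)  # "pending" | "processing" | "ready" | "failed"
        if status in ("ready", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document {doc_id} still {status} after {timeout}s")
        time.sleep(poll_seconds)
```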
Document Statuses
| Status | Meaning |
|---|---|
| Pending | Queued for processing |
| Processing | Text extraction and embedding in progress |
| Ready | Fully processed and available in the pipeline |
| Failed | Processing encountered an error (see error message) |
| Archived | Manually archived, excluded from pipeline |
Knowledge Connectors
In addition to manual uploads, you can automatically sync documents from external systems using Knowledge Connectors. TruthVouch provides built-in connectors for Confluence, SharePoint, and Google Drive that pull documents on a configurable schedule (every 1h, 6h, 12h, or daily).
Connector-synced documents go through the same processing pipeline as manual uploads — text extraction, chunking, embedding, and optional auto-extraction of knowledge items. Documents synced by a connector display the source system type (e.g., “confluence”, “sharepoint”) in the document list.
Navigate to the Connectors tab (next to Documents and Knowledge Items) to set up and manage connectors. For full setup instructions, see the Knowledge Connectors guide.
Reviewing Extracted Text
After a document is processed, you can review the extracted text to verify accuracy.
- Open a document from the list.
- Click View Extracted Text.
- Review the full text that was extracted from your document.
This is particularly useful for PDFs and Word documents where formatting may affect extraction quality. If the extracted text is unsatisfactory, you can delete the document and re-upload a cleaner version.
You can also browse individual chunks by clicking View Chunks on the document detail view. Each chunk shows its content, token count, and metadata (page number for PDFs, section headings when detected).
Auto-Extracting Knowledge Items from Documents
One of the most powerful features of the Knowledge Base is LLM-powered auto-extraction. Instead of manually creating knowledge items one by one, you can have the system analyze a document and extract structured facts automatically.
How to Extract Facts
- Open a processed document (status must be Ready).
- Click Extract Facts.
- Optionally adjust the maximum number of facts to extract (default: 50).
- Wait for the extraction to complete.
The system uses an LLM to identify key factual statements in your document. For each extracted fact, you get:
- Fact statement: The structured knowledge claim
- Entity name: The subject of the fact (auto-detected)
- Suggested category: Automatic classification (policy, product, technical, etc.)
- Confidence score: How confident the system is in the extraction (0.0 to 1.0)
- Source excerpt: The relevant passage from the original document
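The fields above map naturally onto a small record type. A minimal sketch (the class, field names, and 0.7 pre-selection threshold are illustrative assumptions, not the platform's internal schema):

```python
from dataclasses import dataclass

@dataclass
class ExtractedFact:
    fact_statement: str       # the structured knowledge claim
    entity_name: str          # auto-detected subject of the fact
    suggested_category: str   # e.g. "policy", "product", "technical"
    confidence: float         # 0.0 to 1.0
    source_excerpt: str       # relevant passage from the original document

def preselect(facts, min_confidence=0.7):
    """Pre-tick high-confidence facts for reviewer convenience.

    Every fact still requires explicit human approval before it
    becomes a knowledge item (human-in-the-loop).
    """
    return [f for f in facts if f.confidence >= min_confidence]
```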
Reviewing and Approving Extracted Facts
Extracted facts are presented for your review — they are not automatically added to the Knowledge Base. This human-in-the-loop design ensures quality.
- Review each extracted fact in the extraction dialog.
- Edit fact statements or categories as needed.
- Deselect any facts you don’t want to keep.
- Click Add Selected to create knowledge items from the approved facts.
The approved facts are created as knowledge items in bulk, with embeddings generated automatically.
Managing Knowledge Items
Knowledge items are structured fact statements — the atomic units of your internal knowledge. They can be created manually, extracted from documents, or both.
Creating Knowledge Items Manually
- Navigate to the Knowledge Items tab.
- Click Add Knowledge Item.
- Enter:
- Fact Statement (required): A clear, concise factual claim. Example: “Our standard SLA guarantees 99.9% uptime for Enterprise customers.”
- Entity Name (optional): The subject. Example: “Enterprise SLA.”
- Category (required): One of the 10 available categories.
- Tags (optional): For filtering and organization.
- Click Create.
An embedding is generated automatically so the item is immediately searchable and available in the governance pipeline.
Knowledge Item Categories
There are 10 categories for organizing knowledge items:
| Category | Use For |
|---|---|
| Policy | Company policies, guidelines, standards |
| Procedure | Step-by-step processes, workflows |
| Product | Product features, capabilities, specifications |
| Technical | Architecture, infrastructure, technical details |
| Legal | Contracts, terms of service, regulatory requirements |
| Financial | Pricing, budgets, financial metrics |
| HR | Benefits, hiring processes, employee policies |
| Security | Security practices, incident response, access controls |
| General | Cross-functional or uncategorized knowledge |
| Other | Anything that doesn’t fit the above |
Editing and Deleting Items
To edit: Click on a knowledge item, modify the fact statement, entity name, category, or tags, then save. If the fact statement changes, the embedding is automatically regenerated.
To delete: Click the delete action on a knowledge item. This performs a soft delete — the item is deactivated and removed from pipeline retrieval, but retained for audit purposes.
Promoting Knowledge Items to Truth Nuggets
When an internal knowledge item should become a public fact — one that the Hallucination Shield uses to verify external LLMs — you can promote it to a truth nugget.
When to Promote
Promote a knowledge item when:
- The fact is publicly known and should be verified across external AI systems
- You want the Hallucination Shield to detect when external LLMs get this fact wrong
- The information is no longer sensitive or internal-only
Do not promote when:
- The fact contains proprietary or confidential information
- The knowledge is only relevant for internal AI grounding
- The fact is likely to change frequently
How to Promote
- Open a knowledge item.
- Click Promote to Truth Nugget.
- Optionally adjust the fact statement, category, or source type for the new truth nugget.
- Confirm the promotion.
The system creates a new truth nugget linked to the original knowledge item. The knowledge item remains active and unchanged — promotion is additive, not a move.
How Knowledge Is Used in the Governance Pipeline
The governance pipeline (Truth Firewall) queries the Knowledge Base in four stages to ground and verify AI interactions.
Stage 1: Input Truth Scan
When a prompt enters the pipeline, it is scanned against both truth nuggets and the Knowledge Base. The system embeds the input text and performs a semantic similarity search across:
- Truth nugget embeddings
- Knowledge document chunk embeddings
- Knowledge item embeddings
Matches above the similarity threshold are attached to the request context. If any matches indicate a factual contradiction, the request is flagged (but not blocked — the truth scan is advisory).
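The scan described above can be sketched as a cosine-similarity pass over the combined index. The 0.75 threshold and tuple layout are illustrative assumptions, not platform defaults:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def input_truth_scan(query_vec, indexed, threshold=0.75):
    """Advisory scan: return matches above the threshold, best first.

    `indexed` holds (source_type, content, vector) tuples drawn from
    truth nuggets, document chunks, and knowledge items.
    """
    scored = [(src, content, cosine_similarity(query_vec, vec))
              for src, content, vec in indexed]
    return sorted((m for m in scored if m[2] >= threshold),
                  key=lambda m: -m[2])
```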
Stage 2: Context Injection
Before the prompt is forwarded to the LLM, the pipeline enriches it with relevant knowledge. Retrieved content is injected into the system prompt using clear labels:
```
[Verified Facts] Company X was founded in 2020 and is headquartered in New York.
[Internal Knowledge] Our standard Enterprise SLA guarantees 99.9% uptime. -- Source: Enterprise Service Agreement
```

This grounding helps the LLM generate responses that are consistent with your organization's verified knowledge.
Stage 3: Output Truth Scan
After the LLM responds, the output is scanned against the same combined knowledge base. If the response contradicts a known fact (from either truth nuggets or internal knowledge), the contradiction is recorded in the governance report.
When auto-correction detection is enabled, contradictions between the LLM response and knowledge chunks can automatically create pending internal corrections for review.
Stage 4: Truth Nugget Screener (Sentinel Traffic)
For traffic originating from Sentinel agents (employee AI tool monitoring), extracted claims from LLM responses are checked against both truth nuggets and knowledge items. Potential mismatches are recorded for review.
All four stages include source attribution — each match identifies whether it came from a truth nugget, document chunk, or knowledge item.
Internal vs External Corrections
TruthVouch separates corrections into two distinct scopes to prevent internal knowledge from leaking externally.
External Corrections (Truth Nugget Scope)
External corrections are triggered when the Hallucination Shield detects that an external LLM is stating something incorrectly about a public fact. These corrections are managed from Hallucination Shield > Corrections and can be deployed via:
- Neural fact sheets
- Provider notifications
- Web presence updates
Internal Corrections (Knowledge Base Scope)
Internal corrections target knowledge base content — a document chunk that contains outdated or incorrect information. These are managed from a separate screen: AI Governance > Knowledge > Knowledge Corrections.
When an internal correction is approved, the system:
- Updates the chunk content with the corrected text
- Increments the chunk version number
- Regenerates the vector embedding for the updated content
- Records the correction in the audit trail
The next time the governance pipeline retrieves that chunk, it uses the corrected version. Nothing is published externally.
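The approval steps above can be sketched as a single in-place update. This is a minimal illustration, assuming a hypothetical `embed` function standing in for the real embedding service and a plain list as the audit trail:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    content: str
    version: int = 1
    embedding: list = field(default_factory=list)

def apply_internal_correction(chunk: Chunk, corrected_text: str,
                              embed, audit_trail: list) -> Chunk:
    """Apply an approved internal correction to a chunk in place."""
    chunk.content = corrected_text           # update the chunk content
    chunk.version += 1                       # increment the chunk version
    chunk.embedding = embed(corrected_text)  # regenerate the vector embedding
    audit_trail.append({"chunk_id": chunk.chunk_id,
                        "new_version": chunk.version})  # record in audit trail
    return chunk
```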
Scope Badges
Corrections display a scope badge in the UI:
- External (blue): Deploys to external channels (wiki, providers, web)
- Internal (green): Updates the knowledge base only
This visual separation prevents accidental cross-scope operations.
Creating an Internal Correction
- Navigate to AI Governance > Knowledge > Knowledge Corrections.
- Click New Correction.
- Select the target document chunk.
- Enter the corrected content and optional notes.
- Submit the correction (creates it in Pending status).
- A reviewer approves the correction, which triggers the update pipeline.
Classification Levels
Documents can be tagged with a classification level to indicate sensitivity:
| Level | Description |
|---|---|
| Internal (default) | Standard internal information, accessible to all team members |
| Confidential | Sensitive business information, limited distribution |
| Restricted | Highly sensitive — trade secrets, financial data, legal matters |
In the current version, classification levels are labels only — they do not enforce access control. All users with Knowledge Base access can view all documents regardless of classification. Future versions may add classification-based access restrictions.
Document Expiry
Documents can have an optional expiry date, useful for content with a defined lifecycle:
- Policies that are reviewed annually
- Certifications with renewal dates (e.g., SOC 2, ISO 27001)
- Contracts or agreements with end dates
- Regulatory filings with compliance deadlines
Setting an Expiry Date
Set the Expires at field during upload or edit the document metadata afterward. The document list supports filtering by expiry status, and expiring documents display a visual badge.
In the current version, expired documents are not automatically archived. They remain active in the pipeline until manually archived. The expiry badge serves as a visual reminder to review and update the content.
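The badge logic reduces to two date comparisons. A minimal sketch, assuming a 30-day warning window (the actual window is not specified here):

```python
from datetime import date, timedelta

def expiry_badge(expires_at, today, warn_days=30):
    """Return the badge to display for a document's expiry status.

    Expired documents stay active in the pipeline; the badge is only
    a visual reminder to review and update the content.
    """
    if expires_at is None:
        return None  # no expiry set
    if expires_at < today:
        return "expired"
    if expires_at <= today + timedelta(days=warn_days):
        return "expiring-soon"
    return None
```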
Retrieval Tracking
The Knowledge Base tracks how your content is being used by the governance pipeline. Each document chunk and knowledge item records:
- Retrieval count: How many times it has been retrieved during pipeline execution
- Last retrieved at: The timestamp of the most recent retrieval
This data helps you understand:
- Which documents are actively contributing to AI grounding
- Which knowledge items are being matched most frequently
- Whether uploaded content is actually being used (low retrieval counts may indicate the content isn’t relevant to your AI traffic)
Retrieval tracking is updated in real time as the pipeline processes requests. The tracking updates are non-blocking — they never add latency to the governance pipeline.
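One common way to keep tracking off the hot path is a fire-and-forget queue drained by a background worker. This is a sketch of the pattern, not TruthVouch's actual implementation:

```python
import queue
import threading

retrievals = queue.Queue()

def record_retrieval(item_id: str) -> None:
    """Called from the pipeline hot path: enqueue and return immediately,
    so tracking never adds latency to request processing."""
    retrievals.put_nowait(item_id)

def tracking_worker(counts: dict, stop: threading.Event) -> None:
    """Background worker that drains the queue and updates counts."""
    while not stop.is_set() or not retrievals.empty():
        try:
            item_id = retrievals.get(timeout=0.05)
        except queue.Empty:
            continue
        counts[item_id] = counts.get(item_id, 0) + 1
```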
Semantic Search
The Knowledge Base provides semantic search across all three knowledge sources: document chunks, knowledge items, and truth nuggets.
Using Semantic Search
- Navigate to the Knowledge Base.
- Use the search bar or the dedicated search endpoint.
- Enter a natural language query.
- Results are ranked by semantic similarity, with each result showing:
- The matched content
- The source type (document chunk, knowledge item, or truth nugget)
- The similarity score
- Source attribution (document title, entity name, category)
Semantic search uses the same vector embeddings that power the governance pipeline, so search results accurately reflect what the pipeline would retrieve for a similar query.
Tier Requirements
| Tier | Knowledge Base Access |
|---|---|
| Starter | Not available — upgrade prompt shown |
| Professional | Full access |
| Business | Full access |
| Enterprise | Full access |
The tier check is enforced at the API level. All Knowledge Base endpoints require Professional tier or above. The frontend shows an upgrade prompt for Starter tier accounts.
Next Steps
- Knowledge Connectors — Automatically sync from Confluence, SharePoint, and Google Drive
- Governance Overview — Understand the full governance platform
- How the Firewall Works — Learn about the 17-stage pipeline that uses your knowledge
- Audit Trail — Review how knowledge corrections are logged
- Board Reports — Generate compliance documentation