Knowledge Base

The Knowledge Base is your organization’s private knowledge layer for AI governance. Upload internal documents — policies, product specs, HR guidelines, technical documentation — and TruthVouch automatically extracts, chunks, embeds, and indexes them for real-time use in the governance pipeline.

Internal knowledge is never published externally. It stays within your platform, used only to ground and verify AI responses passing through the Truth Firewall.

How Does the Knowledge Base Differ from Truth Nuggets?

The Knowledge Base and Truth Nuggets serve different purposes within the same governance pipeline. Truth Nuggets are short, structured fact statements used by the Hallucination Shield to verify what external LLMs say about your organization publicly. The Knowledge Base holds rich internal documents and structured knowledge items used by the Truth Firewall to ground AI responses with your proprietary information.

| Feature | Truth Nuggets | Knowledge Base |
| --- | --- | --- |
| Purpose | Verify external LLM accuracy about public facts | Ground the governance pipeline with internal knowledge |
| Data format | Short fact statements | Rich documents + structured knowledge items |
| Corrections | External: wiki updates, provider notifications, fact sheets | Internal: database update + re-embed only |
| Visibility | Can be published externally | Never leaves the platform |
| Used by | Hallucination Shield + Truth Firewall | Truth Firewall only |
| Sources | Manual entry, web monitoring | Document upload, auto-extraction, manual entry, Knowledge Connectors (Confluence, SharePoint, Google Drive) |

The key principle: all truth nuggets are knowledge, but not all knowledge is truth. A knowledge item can be promoted to a truth nugget when it should become a public fact, but the reverse is not needed — truth nuggets are already included in the governance pipeline’s combined knowledge.

Who Can Access the Knowledge Base?

The Knowledge Base is available on Professional, Business, and Enterprise subscription tiers. Starter tier users see the Knowledge Base navigation item but are shown an upgrade prompt when they click it.

All Knowledge Base API endpoints enforce tier checking. Requests from accounts below Professional tier receive a 403 Forbidden response.
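The tier gate can be pictured with a minimal sketch. This is illustrative only — the tier names mirror the documentation, but the function and return shape are hypothetical, not TruthVouch's actual implementation:

```python
# Hypothetical sketch of the tier check described above; the function name
# and (status, message) return shape are illustrative, not TruthVouch's API.
ALLOWED_TIERS = {"professional", "business", "enterprise"}

def check_knowledge_base_access(account_tier: str) -> tuple[int, str]:
    """Return an (HTTP status, message) pair for a Knowledge Base request."""
    if account_tier.lower() in ALLOWED_TIERS:
        return 200, "OK"
    # Starter (and any unknown tier) is rejected before the handler runs.
    return 403, "Forbidden: Knowledge Base requires Professional tier or above"

print(check_knowledge_base_access("starter")[0])   # 403
print(check_knowledge_base_access("business")[0])  # 200
```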

Uploading Documents

Navigate to AI Governance > Knowledge > Knowledge Base, then select the Documents tab.

Supported Formats

| Format | Extensions | Notes |
| --- | --- | --- |
| PDF | .pdf | Text-based PDFs only (no OCR for scanned documents in v1) |
| Word | .docx | Office Open XML format |
| Plain Text | .txt | UTF-8 encoded |
| Markdown | .md | Preserves structure for better chunking |
| HTML | .html, .htm | Tags stripped, structure preserved |

Size limit: 50 MB per file.

Single Document Upload

  1. Click Upload Document.
  2. Select your file (drag and drop is supported).
  3. Fill in the metadata:
    • Title (required): A descriptive name for the document.
    • Description (optional): What this document contains.
    • Department (optional): Organizational grouping (e.g., Engineering, Legal, HR).
    • Classification: Choose from Internal, Confidential, or Restricted.
    • Tags (optional): Comma-separated labels for filtering.
    • Expires at (optional): Expiry date for policies or certifications.
  4. Click Upload.

The system processes your document immediately:

  • Extracts the full text content
  • Splits text into chunks (approximately 512 tokens each with overlap)
  • Generates vector embeddings for each chunk
  • Stores everything for semantic search and pipeline use

Once processing completes, the document status changes from Pending to Ready.
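The chunking step above can be sketched as follows. This is a toy illustration that splits on whitespace "tokens" in place of a real tokenizer; the sizes mirror the ~512-token chunks with overlap described above, but the exact overlap value is an assumption:

```python
# Illustrative sketch of overlapping chunking; whitespace tokens stand in
# for a real tokenizer, and the 64-token overlap is an assumed value.
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    tokens = text.split()
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_text(doc)
print(len(chunks))  # 3 overlapping chunks of up to 512 tokens
```

The overlap ensures a fact that straddles a chunk boundary still appears whole in at least one chunk.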

Bulk Upload

For uploading multiple documents at once:

  1. Click Bulk Upload.
  2. Select multiple files (up to 50 MB each).
  3. Click Upload All.

Bulk uploads are processed asynchronously in the background. Each document appears in the list with a Pending status and transitions to Ready as processing completes. Document titles default to the filename (without extension) — you can edit them afterward.

Document Statuses

| Status | Meaning |
| --- | --- |
| Pending | Queued for processing |
| Processing | Text extraction and embedding in progress |
| Ready | Fully processed and available in the pipeline |
| Failed | Processing encountered an error (see error message) |
| Archived | Manually archived, excluded from pipeline |

Knowledge Connectors

In addition to manual uploads, you can automatically sync documents from external systems using Knowledge Connectors. TruthVouch provides built-in connectors for Confluence, SharePoint, and Google Drive that pull documents on a configurable schedule (every 1h, 6h, 12h, or daily).

Connector-synced documents go through the same processing pipeline as manual uploads — text extraction, chunking, embedding, and optional auto-extraction of knowledge items. Documents synced by a connector display the source system type (e.g., “confluence”, “sharepoint”) in the document list.

Navigate to the Connectors tab (next to Documents and Knowledge Items) to set up and manage connectors. For full setup instructions, see the Knowledge Connectors guide.

Reviewing Extracted Text

After a document is processed, you can review the extracted text to verify accuracy.

  1. Open a document from the list.
  2. Click View Extracted Text.
  3. Review the full text that was extracted from your document.

This is particularly useful for PDFs and Word documents where formatting may affect extraction quality. If the extracted text is unsatisfactory, you can delete the document and re-upload a cleaner version.

You can also browse individual chunks by clicking View Chunks on the document detail view. Each chunk shows its content, token count, and metadata (page number for PDFs, section headings when detected).

Auto-Extracting Knowledge Items from Documents

One of the most powerful features of the Knowledge Base is LLM-powered auto-extraction. Instead of manually creating knowledge items one by one, you can have the system analyze a document and extract structured facts automatically.

How to Extract Facts

  1. Open a processed document (status must be Ready).
  2. Click Extract Facts.
  3. Optionally adjust the maximum number of facts to extract (default: 50).
  4. Wait for the extraction to complete.

The system uses an LLM to identify key factual statements in your document. For each extracted fact, you get:

  • Fact statement: The structured knowledge claim
  • Entity name: The subject of the fact (auto-detected)
  • Suggested category: Automatic classification (policy, product, technical, etc.)
  • Confidence score: How confident the system is in the extraction (0.0 to 1.0)
  • Source excerpt: The relevant passage from the original document
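One extracted fact can be pictured as a small record. The field names below are illustrative stand-ins for the five properties listed above, not TruthVouch's actual API schema:

```python
from dataclasses import dataclass

# Hypothetical shape of one extracted fact; field names are illustrative,
# mirroring the five properties described above.
@dataclass
class ExtractedFact:
    fact_statement: str       # the structured knowledge claim
    entity_name: str          # auto-detected subject
    suggested_category: str   # e.g. "policy", "product", "technical"
    confidence: float         # 0.0 to 1.0
    source_excerpt: str       # relevant passage from the original document

fact = ExtractedFact(
    fact_statement="Standard SLA guarantees 99.9% uptime for Enterprise customers.",
    entity_name="Enterprise SLA",
    suggested_category="legal",
    confidence=0.92,
    source_excerpt="our Enterprise agreement commits to 99.9% uptime",
)
print(fact.confidence)  # 0.92
```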

Reviewing and Approving Extracted Facts

Extracted facts are presented for your review — they are not automatically added to the Knowledge Base. This human-in-the-loop design ensures quality.

  1. Review each extracted fact in the extraction dialog.
  2. Edit fact statements or categories as needed.
  3. Deselect any facts you don’t want to keep.
  4. Click Add Selected to create knowledge items from the approved facts.

The approved facts are created as knowledge items in bulk, with embeddings generated automatically.

Managing Knowledge Items

Knowledge items are structured fact statements — the atomic units of your internal knowledge. They can be created manually, extracted from documents, or both.

Creating Knowledge Items Manually

  1. Navigate to the Knowledge Items tab.
  2. Click Add Knowledge Item.
  3. Enter:
    • Fact Statement (required): A clear, concise factual claim. Example: “Our standard SLA guarantees 99.9% uptime for Enterprise customers.”
    • Entity Name (optional): The subject. Example: “Enterprise SLA.”
    • Category (required): One of the 10 available categories.
    • Tags (optional): For filtering and organization.
  4. Click Create.

An embedding is generated automatically so the item is immediately searchable and available in the governance pipeline.

Knowledge Item Categories

There are 10 categories for organizing knowledge items:

| Category | Use For |
| --- | --- |
| Policy | Company policies, guidelines, standards |
| Procedure | Step-by-step processes, workflows |
| Product | Product features, capabilities, specifications |
| Technical | Architecture, infrastructure, technical details |
| Legal | Contracts, terms of service, regulatory requirements |
| Financial | Pricing, budgets, financial metrics |
| HR | Benefits, hiring processes, employee policies |
| Security | Security practices, incident response, access controls |
| General | Cross-functional or uncategorized knowledge |
| Other | Anything that doesn’t fit the above |

Editing and Deleting Items

To edit: Click on a knowledge item, modify the fact statement, entity name, category, or tags, then save. If the fact statement changes, the embedding is automatically regenerated.

To delete: Click the delete action on a knowledge item. This performs a soft delete — the item is deactivated and removed from pipeline retrieval, but retained for audit purposes.
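The soft-delete behavior can be sketched in a few lines. The item shape is illustrative; the point is that deletion only flips a flag, so retrieval skips the item while the record survives for audit:

```python
# Minimal sketch of soft delete: the item is deactivated, not removed.
# The dict shape is an illustrative stand-in for the stored record.
def soft_delete(item: dict) -> dict:
    item["active"] = False
    return item

def retrievable(items: list[dict]) -> list[dict]:
    # Pipeline retrieval only considers active items.
    return [i for i in items if i.get("active", True)]

items = [{"id": 1, "active": True}, {"id": 2, "active": True}]
soft_delete(items[1])
print([i["id"] for i in retrievable(items)])  # [1]
print(len(items))                             # still 2, retained for audit
```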

Promoting Knowledge Items to Truth Nuggets

When an internal knowledge item should become a public fact — one that the Hallucination Shield uses to verify external LLMs — you can promote it to a truth nugget.

When to Promote

Promote a knowledge item when:

  • The fact is publicly known and should be verified across external AI systems
  • You want the Hallucination Shield to detect when external LLMs get this fact wrong
  • The information is no longer sensitive or internal-only

Do not promote when:

  • The fact contains proprietary or confidential information
  • The knowledge is only relevant for internal AI grounding
  • The fact is likely to change frequently

How to Promote

  1. Open a knowledge item.
  2. Click Promote to Truth Nugget.
  3. Optionally adjust the fact statement, category, or source type for the new truth nugget.
  4. Confirm the promotion.

The system creates a new truth nugget linked to the original knowledge item. The knowledge item remains active and unchanged — promotion is additive, not a move.
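The additive nature of promotion can be sketched as follows. Record shapes and field names are illustrative; what matters is that a new, linked truth nugget is created while the original knowledge item is left untouched:

```python
import itertools

# Sketch of additive promotion: create a linked truth nugget, leave the
# knowledge item unchanged. Field names are illustrative.
_nugget_ids = itertools.count(1)

def promote_to_truth_nugget(item: dict, nuggets: list[dict]) -> dict:
    nugget = {
        "id": next(_nugget_ids),
        "fact_statement": item["fact_statement"],
        "source_knowledge_item_id": item["id"],  # link back to the original
    }
    nuggets.append(nugget)
    return nugget

nuggets: list[dict] = []
item = {"id": 7, "fact_statement": "Founded in 2020.", "active": True}
nugget = promote_to_truth_nugget(item, nuggets)
print(nugget["source_knowledge_item_id"])  # 7
print(item["active"])                      # True, original unchanged
```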

How Knowledge Is Used in the Governance Pipeline

The governance pipeline (Truth Firewall) queries the Knowledge Base in four stages to ground and verify AI interactions.

Stage 1: Input Truth Scan

When a prompt enters the pipeline, it is scanned against both truth nuggets and the Knowledge Base. The system embeds the input text and performs a semantic similarity search across:

  • Truth nugget embeddings
  • Knowledge document chunk embeddings
  • Knowledge item embeddings

Matches above the similarity threshold are attached to the request context. If any matches indicate a factual contradiction, the request is flagged (but not blocked — the truth scan is advisory).
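The scan above amounts to a thresholded similarity search. A toy sketch, with hand-made vectors standing in for real embeddings and an assumed threshold value:

```python
import math

# Toy sketch of the input truth scan: score the query embedding against
# every stored embedding and keep matches above a threshold. Vectors and
# the 0.8 threshold are illustrative, not production values.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def matches_above_threshold(query, store, threshold=0.8):
    scored = [(source, cosine(query, vec)) for source, vec in store]
    return [(source, s) for source, s in scored if s >= threshold]

store = [
    ("truth_nugget:founding", [0.9, 0.1, 0.0]),
    ("chunk:sla-doc-p3",      [0.1, 0.9, 0.2]),
    ("knowledge_item:sla",    [0.2, 0.9, 0.1]),
]
query = [0.15, 0.95, 0.1]
for source, score in matches_above_threshold(query, store):
    print(source, round(score, 3))
```

Note that every match carries its source label, which is how the pipeline attributes results to a truth nugget, chunk, or knowledge item.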

Stage 2: Context Injection

Before the prompt is forwarded to the LLM, the pipeline enriches it with relevant knowledge. Retrieved content is injected into the system prompt using clear labels:

[Verified Facts]
Company X was founded in 2020 and is headquartered in New York.
[Internal Knowledge]
Our standard Enterprise SLA guarantees 99.9% uptime. -- Source: Enterprise Service Agreement

This grounding helps the LLM generate responses that are consistent with your organization’s verified knowledge.
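The injection step can be sketched as a small formatter that renders retrieved matches under the labels shown above. The function name and record shapes are illustrative:

```python
# Sketch of context injection: render retrieved matches under the labeled
# sections shown above, ready to prepend to the system prompt. The function
# and input shapes are illustrative stand-ins.
def build_grounding_block(nuggets: list[str],
                          internal: list[tuple[str, str]]) -> str:
    lines = []
    if nuggets:
        lines.append("[Verified Facts]")
        lines.extend(nuggets)
    if internal:
        lines.append("[Internal Knowledge]")
        lines.extend(f"{fact} -- Source: {source}" for fact, source in internal)
    return "\n".join(lines)

block = build_grounding_block(
    nuggets=["Company X was founded in 2020 and is headquartered in New York."],
    internal=[("Our standard Enterprise SLA guarantees 99.9% uptime.",
               "Enterprise Service Agreement")],
)
print(block)
```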

Stage 3: Output Truth Scan

After the LLM responds, the output is scanned against the same combined knowledge base. If the response contradicts a known fact (from either truth nuggets or internal knowledge), the contradiction is recorded in the governance report.

When auto-correction detection is enabled, contradictions between the LLM response and knowledge chunks can automatically create pending internal corrections for review.

Stage 4: Truth Nugget Screener (Sentinel Traffic)

For traffic originating from Sentinel agents (employee AI tool monitoring), extracted claims from LLM responses are checked against both truth nuggets and knowledge items. Potential mismatches are recorded for review.

All four stages include source attribution — each match identifies whether it came from a truth nugget, document chunk, or knowledge item.

Internal vs External Corrections

TruthVouch separates corrections into two distinct scopes to prevent internal knowledge from leaking externally.

External Corrections (Truth Nugget Scope)

External corrections are triggered when the Hallucination Shield detects that an external LLM is stating something incorrectly about a public fact. These corrections are managed from Hallucination Shield > Corrections and can be deployed via:

  • Neural fact sheets
  • Provider notifications
  • Web presence updates

Internal Corrections (Knowledge Base Scope)

Internal corrections target knowledge base content — a document chunk that contains outdated or incorrect information. These are managed from a separate screen: AI Governance > Knowledge > Knowledge Corrections.

When an internal correction is approved, the system:

  1. Updates the chunk content with the corrected text
  2. Increments the chunk version number
  3. Regenerates the vector embedding for the updated content
  4. Records the correction in the audit trail

The next time the governance pipeline retrieves that chunk, it uses the corrected version. Nothing is published externally.
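The four approval steps can be sketched end to end. The chunk record and the `embed()` stub are illustrative stand-ins for TruthVouch internals:

```python
import datetime

# Sketch of the four approval steps listed above; the chunk dict and
# embed() stub are illustrative, not TruthVouch's actual internals.
def embed(text: str) -> list[float]:
    return [float(len(text))]  # placeholder for a real embedding model

def approve_correction(chunk: dict, corrected_text: str, audit: list[dict]) -> dict:
    chunk["content"] = corrected_text              # 1. update chunk content
    chunk["version"] += 1                          # 2. increment version
    chunk["embedding"] = embed(corrected_text)     # 3. regenerate embedding
    audit.append({                                 # 4. record in audit trail
        "chunk_id": chunk["id"],
        "new_version": chunk["version"],
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return chunk

audit: list[dict] = []
chunk = {"id": 42, "content": "SLA is 99.5% uptime.", "version": 1, "embedding": []}
approve_correction(chunk, "SLA is 99.9% uptime.", audit)
print(chunk["version"], len(audit))  # 2 1
```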

Scope Badges

Corrections display a scope badge in the UI:

  • External (blue): Deploys to external channels (wiki, providers, web)
  • Internal (green): Updates the knowledge base only

This visual separation prevents accidental cross-scope operations.

Creating an Internal Correction

  1. Navigate to AI Governance > Knowledge > Knowledge Corrections.
  2. Click New Correction.
  3. Select the target document chunk.
  4. Enter the corrected content and optional notes.
  5. Submit the correction (creates it in Pending status).
  6. A reviewer approves the correction, which triggers the update pipeline.

Classification Levels

Documents can be tagged with a classification level to indicate sensitivity:

| Level | Description |
| --- | --- |
| Internal (default) | Standard internal information, accessible to all team members |
| Confidential | Sensitive business information, limited distribution |
| Restricted | Highly sensitive — trade secrets, financial data, legal matters |

In the current version, classification levels are labels only — they do not enforce access control. All users with Knowledge Base access can view all documents regardless of classification. Future versions may add classification-based access restrictions.

Document Expiry

Documents can have an optional expiry date, useful for content with a defined lifecycle:

  • Policies that are reviewed annually
  • Certifications with renewal dates (e.g., SOC 2, ISO 27001)
  • Contracts or agreements with end dates
  • Regulatory filings with compliance deadlines

Setting an Expiry Date

Set the Expires at field during upload or edit the document metadata afterward. The document list supports filtering by expiry status, and expiring documents display a visual badge.

In the current version, expired documents are not automatically archived. They remain active in the pipeline until manually archived. The expiry badge serves as a visual reminder to review and update the content.

Retrieval Tracking

The Knowledge Base tracks how your content is being used by the governance pipeline. Each document chunk and knowledge item records:

  • Retrieval count: How many times it has been retrieved during pipeline execution
  • Last retrieved at: The timestamp of the most recent retrieval

This data helps you understand:

  • Which documents are actively contributing to AI grounding
  • Which knowledge items are being matched most frequently
  • Whether uploaded content is actually being used (low retrieval counts may indicate the content isn’t relevant to your AI traffic)

Retrieval tracking is updated in real time as the pipeline processes requests. The tracking updates are non-blocking — they never add latency to the governance pipeline.
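One common way to make such updates non-blocking is a producer-consumer queue: the pipeline enqueues an event and returns immediately, while a background worker applies the counter update. This is an illustrative pattern, not necessarily TruthVouch's actual mechanism:

```python
import queue
import threading

# Illustrative non-blocking tracking pattern: the pipeline enqueues an
# event and moves on; a background worker applies the counter update.
counts: dict[str, int] = {}
events: queue.Queue = queue.Queue()

def tracker() -> None:
    while True:
        item_id = events.get()
        if item_id is None:  # sentinel: shut down the worker
            break
        counts[item_id] = counts.get(item_id, 0) + 1

worker = threading.Thread(target=tracker, daemon=True)
worker.start()

for _ in range(3):
    events.put("chunk:42")   # pipeline side: enqueue, return immediately
events.put(None)
worker.join()
print(counts)  # {'chunk:42': 3}
```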

Semantic Search

The Knowledge Base provides semantic search across all three knowledge sources: document chunks, knowledge items, and truth nuggets.

  1. Navigate to the Knowledge Base.
  2. Use the search bar or the dedicated search endpoint.
  3. Enter a natural language query.
  4. Results are ranked by semantic similarity, with each result showing:
    • The matched content
    • The source type (document chunk, knowledge item, or truth nugget)
    • The similarity score
    • Source attribution (document title, entity name, category)

Semantic search uses the same vector embeddings that power the governance pipeline, so search results accurately reflect what the pipeline would retrieve for a similar query.

Tier Requirements

| Tier | Knowledge Base Access |
| --- | --- |
| Starter | Not available — upgrade prompt shown |
| Professional | Full access |
| Business | Full access |
| Enterprise | Full access |

The tier check is enforced at the API level. All Knowledge Base endpoints require Professional tier or above. The frontend shows an upgrade prompt for Starter tier accounts.

Next Steps