Knowledge Connectors

Knowledge Connectors automatically synchronize documents from external systems into your Knowledge Base. Instead of manually uploading documents, connectors pull content from Confluence, SharePoint, and Google Drive on a configurable schedule, keeping your governance pipeline grounded in up-to-date organizational knowledge.

Connectors perform one-way sync: external system to Knowledge Base. Documents are processed through the same pipeline as manual uploads — text extraction, chunking, embedding generation, and optional auto-extraction of structured knowledge items.

Who Can Use Knowledge Connectors?

Knowledge Connectors are available on Professional, Business, and Enterprise subscription tiers — the same tiers that have access to the Knowledge Base. Starter tier accounts see an upgrade prompt.

All connector API endpoints enforce tier checking. Requests from accounts below Professional tier receive a 403 Forbidden response.
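A minimal sketch of the tier gate described above, assuming the account's tier is available as a plain string. The function name `check_connector_access` and the lowercase tier strings are hypothetical; this illustrates the rule, not TruthVouch's actual middleware.

```python
from http import HTTPStatus

# Tiers with Knowledge Base access (from the text above); lowercase names are hypothetical.
CONNECTOR_TIERS = {"professional", "business", "enterprise"}


def check_connector_access(tier: str) -> HTTPStatus:
    """Return the status a connector endpoint would respond with for this tier."""
    if tier.strip().lower() in CONNECTOR_TIERS:
        return HTTPStatus.OK
    return HTTPStatus.FORBIDDEN  # Starter and unknown tiers are rejected
```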

Available Connectors

TruthVouch provides three built-in connectors:

Connector	Source System	Auth Methods	Scope	What It Syncs
Confluence	Atlassian Confluence Cloud	API token (email + token) or OAuth 2.0	Spaces	Pages (HTML content), page labels
SharePoint	Microsoft 365 SharePoint	OAuth 2.0 (Azure AD client credentials)	Sites and document libraries	DOCX, PDF, TXT, XLSX, PPTX files
Google Drive	Google Workspace	Service account JSON key or OAuth 2.0	Shared drives and folders	Google Docs, Sheets, Slides (exported as text), PDF, DOCX, TXT

Setting Up a Connector

Navigate to AI Governance > Knowledge > Knowledge Base, then select the Connectors tab (the third tab after Documents and Knowledge Items).

Step 1: Open the Add Connector Wizard

Click Add Connector in the top-right corner. A multi-step wizard opens to guide you through the setup.

Step 2: Select Connector Type

Choose from three connector cards:

Confluence — Sync Confluence Cloud spaces and pages
SharePoint — Sync SharePoint Online document libraries
Google Drive — Sync Google Workspace shared drives and folders

Click the card for the system you want to connect, then click Next.

Step 3: Configure Authentication

Enter the credentials required for your chosen connector type. Each connector has specific authentication requirements — see the detailed setup sections below.

Click Test Connection to verify that TruthVouch can reach your external system with the provided credentials. The test performs a read-only API call (listing one space, site, or drive) to confirm access.

If the test fails, you see a specific error message — common issues include expired tokens, insufficient permissions, or incorrect base URLs. Fix the issue and test again before proceeding.

Step 4: Select Scope

After a successful connection test, the wizard loads available scopes from your external system:

Confluence: A list of spaces you have access to
SharePoint: A two-level hierarchy of sites and their document libraries
Google Drive: Shared drives and their root-level folders

Select the specific spaces, sites/libraries, or drives/folders you want to sync. Only selected scopes are included in sync operations.

Step 5: Configure Sync Settings

Name: A descriptive name for this connector (e.g., “Engineering Confluence” or “HR Policies SharePoint”).
Sync interval: How often the connector should check for changes. Options: every 1 hour, 6 hours (default), 12 hours, or daily.
Auto-extract facts: When enabled, newly synced documents automatically trigger LLM-powered extraction of structured knowledge items. Extracted items are created directly in the Knowledge Base. When disabled (default), documents are ingested and chunked but knowledge items must be extracted manually.
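The settings above can be modeled as a small validated record. This is a sketch only: `SyncSettings` and the interval keys (`"1h"`, `"6h"`, `"12h"`, `"daily"`) are hypothetical names, while the choices and defaults (6 hours, auto-extraction off) come from the list above.

```python
from dataclasses import dataclass
from datetime import timedelta

# Interval choices from the wizard; the dictionary keys are hypothetical labels.
SYNC_INTERVALS = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "12h": timedelta(hours=12),
    "daily": timedelta(days=1),
}


@dataclass
class SyncSettings:
    name: str
    interval: str = "6h"        # default interval per the wizard
    auto_extract: bool = False  # auto-extraction is off by default

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("connector name must not be empty")
        if self.interval not in SYNC_INTERVALS:
            raise ValueError(f"unsupported sync interval: {self.interval!r}")

    @property
    def interval_delta(self) -> timedelta:
        return SYNC_INTERVALS[self.interval]
```

For example, `SyncSettings("HR Policies SharePoint", interval="daily", auto_extract=True)` describes a daily connector with auto-extraction turned on.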

Step 6: Review and Create

The final step shows a summary of your configuration: connector type, authentication mode, selected scopes, sync schedule, and auto-extraction setting.

Click Create to save the connector. The credentials are encrypted at rest using AES-256-GCM before being stored.

After creation, you can click Start Initial Sync to trigger the first sync immediately, or wait for the next scheduled run.

Confluence Setup

Requirements

An Atlassian Cloud account with access to the Confluence instance you want to sync
Either an API token or an OAuth 2.0 access token

Authentication: API Token

This is the simpler method, recommended for getting started.

Go to https://id.atlassian.com/manage-profile/security/api-tokens.
Click Create API token.
Give it a label (e.g., “TruthVouch Knowledge Connector”) and click Create.
Copy the generated token.
In the TruthVouch wizard, enter:
- Base URL: Your Confluence instance URL (e.g., https://your-company.atlassian.net)
- Email: The email address associated with your Atlassian account
- API Token: The token you just created

The connector uses Basic authentication with your email and API token.
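At the HTTP level, Basic authentication (RFC 7617) sends `email:api_token`, Base64-encoded, in the `Authorization` header. The helper below is a hypothetical illustration of that encoding, with placeholder credentials:

```python
import base64


def confluence_basic_auth_header(email: str, api_token: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from an Atlassian email and API token."""
    credentials = f"{email}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


headers = confluence_basic_auth_header("jane@your-company.com", "example-token")
```

With the `requests` library, passing `auth=(email, api_token)` to a request produces the same header.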

Authentication: OAuth 2.0

For production deployments where you want app-level access without tying the connector to a personal account:

Go to the Atlassian Developer Console and create an OAuth 2.0 app.
Configure the required scopes: read:confluence-space.summary, read:confluence-content.all.
Complete the app's OAuth authorization flow to obtain an access token, then copy it.
In the TruthVouch wizard, set the auth mode to OAuth and enter the access token.

What Gets Synced

Confluence pages within the selected spaces — page content is fetched in Confluence storage format (XHTML) and delivered as HTML for processing
Page labels — attached as metadata for use as tags on knowledge items
Pages are synced based on modification date. On each incremental sync, only pages modified since the last sync are fetched
The connector handles Confluence’s cursor-based pagination automatically
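The pagination and incremental-fetch behavior can be sketched with an injected page fetcher, so the loop runs without a network. `iterate_pages`, `modified_since`, the `(items, next_cursor)` return shape, and the `modified` field are assumptions made for illustration; they are not Confluence's response format or the connector's code. The 1-second pause between calls mirrors the rate limiting described next.

```python
import time
from datetime import datetime
from typing import Callable, Iterator, Optional

# Hypothetical fetcher shape: given a cursor (None for the first page),
# return (items_on_this_page, next_cursor); next_cursor is None on the last page.
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iterate_pages(fetch_page: FetchPage, delay_seconds: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield items from every page, pausing between consecutive page requests."""
    cursor: Optional[str] = None
    first_call = True
    while True:
        if not first_call:
            sleep(delay_seconds)  # fixed delay between paginated calls
        first_call = False
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return


def modified_since(pages: list[dict], last_sync: Optional[datetime]) -> list[dict]:
    """Keep pages modified after last_sync; an initial sync (None) keeps all pages."""
    if last_sync is None:
        return list(pages)
    return [page for page in pages if page["modified"] > last_sync]
```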

Note: Confluence page attachments (for example, PDF or DOCX files attached to a page) are not synced in the current version. Only the page body is processed; upload attachments manually if needed.

Rate Limiting

The connector includes a 1-second delay between consecutive paginated API calls and handles HTTP 429 (Too Many Requests) responses by respecting the Retry-After header, with a 30-second default wait when no header is present.
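`Retry-After` may carry either a number of seconds or an HTTP-date (RFC 9110), so a client has to handle both. A minimal sketch of the wait calculation, assuming the 30-second fallback stated above; `retry_after_seconds` is a hypothetical helper, and a plain `dict` stands in for the (normally case-insensitive) response headers:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

DEFAULT_RETRY_SECONDS = 30.0  # used when the response has no Retry-After header


def retry_after_seconds(headers: Mapping[str, str],
                        now: Optional[datetime] = None) -> float:
    """Seconds to wait after an HTTP 429, honoring Retry-After in either form."""
    value = headers.get("Retry-After")
    if value is None:
        return DEFAULT_RETRY_SECONDS
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_RETRY_SECONDS  # unparseable header: fall back to the default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - current).total_seconds())
```

The same calculation applies to the SharePoint and Google Drive connectors, which use the same 30-second default.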

SharePoint Setup

Requirements

A Microsoft 365 tenant with SharePoint Online
An Azure AD app registration with appropriate permissions

Authentication: Azure AD App Registration

SharePoint uses the OAuth 2.0 client credentials flow via Microsoft Graph API.

Go to the Azure Portal > Azure Active Directory (now Microsoft Entra ID) > App registrations > New registration.
Name the app (e.g., “TruthVouch Knowledge Connector”).
Under API permissions, add the following Application permissions for Microsoft Graph:
- Sites.Read.All — read all SharePoint site content
- Files.Read.All — read all files in SharePoint document libraries
Click Grant admin consent for the permissions.
Under Certificates & secrets, create a new client secret and copy the value.
Note the Application (client) ID from the app overview page.
Note the Directory (tenant) ID from the app overview page.
In the TruthVouch wizard, enter:
- Tenant ID: Your Azure AD tenant ID
- Client ID: The app registration’s client ID
- Client Secret: The client secret value you created

The connector acquires a Bearer token using the client credentials flow and caches it for the token’s lifetime to avoid unnecessary round-trips.
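A sketch of the token request and caching, assuming the Microsoft identity platform v2.0 token endpoint and the `https://graph.microsoft.com/.default` scope that client-credentials apps use with Microsoft Graph. `CachedToken`, `client_credentials_form`, and the 60-second early-refresh margin are hypothetical; the HTTP POST itself is injected as `fetch` so the sketch runs offline:

```python
import time
from typing import Callable, Optional

# Microsoft identity platform v2.0 token endpoint; fill in the tenant ID.
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


def client_credentials_form(client_id: str, client_secret: str) -> dict[str, str]:
    """Form fields for a client credentials token request against Microsoft Graph."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    }


class CachedToken:
    """Reuse a bearer token until shortly before it expires (hypothetical helper)."""

    def __init__(self, fetch: Callable[[], tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 refresh_margin: float = 60.0) -> None:
        self._fetch = fetch            # returns (access_token, expires_in_seconds)
        self._clock = clock
        self._margin = refresh_margin  # refresh early so a token never expires mid-call
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in - self._margin
        return self._token
```

In production, `fetch` would POST `client_credentials_form(...)` to `TOKEN_URL.format(tenant_id=...)` and return the response's `access_token` and `expires_in` fields.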

What Gets Synced

Files from selected SharePoint document libraries: DOCX, PDF, TXT, XLSX, PPTX formats
The connector uses Microsoft Graph delta queries for efficient incremental sync — only changed files are fetched on subsequent syncs
Delta tokens are persisted between sync runs so each sync picks up exactly where the previous one left off
If a delta token expires (HTTP 410 Gone), the connector automatically falls back to a full sync for that drive
Deleted files are detected via the Graph delta response and the corresponding documents are archived in TruthVouch
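The delta flow can be sketched as a loop that follows `@odata.nextLink` pages until Graph returns an `@odata.deltaLink`, which is the value stored for the next run. Items carrying a `deleted` facet are the removals. The HTTP call is injected as `get`, and `sync_drive_delta` is a hypothetical name; this illustrates the protocol, not the connector's implementation:

```python
from typing import Callable, Optional

# get(url) -> (status_code, decoded_json_body); injected so the sketch runs offline.
HttpGet = Callable[[str], tuple[int, dict]]

GRAPH = "https://graph.microsoft.com/v1.0"


def sync_drive_delta(get: HttpGet, drive_id: str,
                     delta_link: Optional[str]) -> tuple[list[dict], list[dict], str]:
    """Follow a drive delta query to completion.

    Returns (changed_items, deleted_items, new_delta_link). A stored delta link
    resumes where the previous run stopped; on HTTP 410 it is discarded and the
    query restarts from the beginning, which amounts to a full sync.
    """
    start = f"{GRAPH}/drives/{drive_id}/root/delta"
    url = delta_link or start
    changed: list[dict] = []
    deleted: list[dict] = []
    while True:
        status, body = get(url)
        if status == 410 and url != start:
            url, changed, deleted = start, [], []  # expired token: full resync
            continue
        if status != 200:
            raise RuntimeError(f"delta query failed with HTTP {status}")
        for item in body.get("value", []):
            (deleted if "deleted" in item else changed).append(item)
        if "@odata.nextLink" in body:
            url = body["@odata.nextLink"]  # more pages in this round
        else:
            return changed, deleted, body["@odata.deltaLink"]  # persist for next run
```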

Rate Limiting

The connector includes a 200ms delay between consecutive Graph API calls and handles HTTP 429 responses by respecting the Retry-After header, with a 30-second default wait.

Google Drive Setup

Requirements

A Google Workspace account
Either a service account JSON key or an OAuth 2.0 access token

Authentication: Service Account

Recommended for automated sync without user interaction.

Go to the Google Cloud Console > IAM & Admin > Service Accounts.
Create a new service account (e.g., “truthvouch-knowledge-connector”).
Grant the service account access to each shared drive you want to sync by adding the service account's email address as a member of that shared drive.
Under Keys, create a new JSON key and download it.
Enable the Google Drive API in the Google Cloud Console under APIs & Services.
In the TruthVouch wizard, paste the entire contents of the JSON key file into the Access Token field.

The connector detects the JSON key format automatically and generates short-lived Bearer tokens via JWT assertion.
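A sketch of the two pieces described above: recognizing a pasted service account key, and building the JWT claim set that Google's token endpoint expects for the JWT bearer grant. The function names and the read-only Drive scope are assumptions (the connector's actual scope is not documented here), and the claims still have to be RS256-signed with the key's `private_key` before they can be exchanged for a token:

```python
import json
import time
from typing import Optional

GOOGLE_TOKEN_URI = "https://oauth2.googleapis.com/token"
# Assumed scope: read-only Drive access is enough for a one-way sync.
DRIVE_READONLY_SCOPE = "https://www.googleapis.com/auth/drive.readonly"


def parse_service_account_key(credential: str) -> Optional[dict]:
    """Return the parsed key if the pasted value is a service account JSON key, else None."""
    try:
        data = json.loads(credential)
    except ValueError:
        return None  # not JSON, so treat the value as a plain OAuth access token
    if (isinstance(data, dict)
            and data.get("type") == "service_account"
            and "client_email" in data
            and "private_key" in data):
        return data
    return None


def jwt_claims(key: dict, now: Optional[int] = None) -> dict:
    """Claim set for the JWT bearer grant; it must still be RS256-signed with key["private_key"]."""
    issued_at = int(time.time()) if now is None else now
    return {
        "iss": key["client_email"],
        "scope": DRIVE_READONLY_SCOPE,
        "aud": key.get("token_uri", GOOGLE_TOKEN_URI),
        "iat": issued_at,
        "exp": issued_at + 3600,  # Google caps assertion lifetime at one hour
    }
```

In practice a library such as `google-auth` handles signing and the token exchange; `service_account.Credentials.from_service_account_info(info, scopes=[...])` accepts the parsed key directly.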

Authentication: OAuth 2.0

For setups where you prefer user-delegated access:

Configure an OAuth consent screen in the Google Cloud Console.
Create OAuth 2.0 credentials (Web application type).
Obtain an access token through the OAuth flow.
Enter the access token in the TruthVouch wizard.

What Gets Synced

Files from selected shared drives: Google Docs, Sheets, and Slides (exported as plain text), PDF, DOCX, and TXT files
The connector uses the Google Drive Changes API with page tokens for efficient incremental sync
Page tokens are persisted between syncs so each run processes only new changes
Google-native formats (Docs, Sheets, Slides) are exported as text using the Google Drive export endpoint — no separate conversion step is needed
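A sketch of the page-token loop, assuming `list_changes` wraps a `changes.list` call and returns the decoded JSON page. Within one run the loop follows `nextPageToken`; the final page carries `newStartPageToken`, which is the value to persist. The very first token would come from `changes.getStartPageToken`. `drain_changes` is a hypothetical name:

```python
from typing import Callable

# list_changes(page_token) -> one decoded changes.list response page (injected).
ListChanges = Callable[[str], dict]


def drain_changes(list_changes: ListChanges, saved_token: str) -> tuple[list[dict], str]:
    """Collect all changes since saved_token and return the token to persist next."""
    changes: list[dict] = []
    token = saved_token
    while True:
        page = list_changes(token)
        changes.extend(page.get("changes", []))
        if "nextPageToken" in page:
            token = page["nextPageToken"]  # more pages in this batch
        else:
            return changes, page["newStartPageToken"]  # start point for the next sync
```

For shared drives, the real `changes.list` call also needs parameters such as `driveId`, `includeItemsFromAllDrives`, and `supportsAllDrives`.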

Rate Limiting

The connector includes a 100ms delay between API calls and handles HTTP 403/429 rate-limit responses with a 30-second retry, respecting the Retry-After header when present.

Sync Scheduling

Interval Options

Interval	Use Case
Every 1 hour	Fast-changing documentation, active project wikis
Every 6 hours (default)	Standard organizational docs, policies
Every 12 hours	Stable reference documentation
Daily	Infrequently updated content, archival material

The sync scheduler checks every minute for connectors that are due for their next sync. When a connector’s interval has elapsed since its last sync, a sync job is queued for background execution.
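The scheduler's per-tick decision can be sketched as follows. `ConnectorState`, `is_due`, and `due_connectors` are hypothetical; the sketch skips paused connectors as described under Pause and Resume below, treats disabled connectors the same way as an assumption, and leaves queueing the background job to the caller:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SKIPPED_BY_SCHEDULER = {"paused", "disabled"}  # "disabled" is an assumption


@dataclass
class ConnectorState:
    status: str                      # e.g. "active", "paused", "error", "disabled"
    interval: timedelta
    last_sync: Optional[datetime] = None


def is_due(connector: ConnectorState, now: datetime) -> bool:
    """True when this connector's scheduled sync should be queued on this tick."""
    if connector.status in SKIPPED_BY_SCHEDULER:
        return False
    if connector.last_sync is None:
        return True                  # never synced yet
    return now - connector.last_sync >= connector.interval


def due_connectors(connectors: dict[str, ConnectorState], now: datetime) -> list[str]:
    """IDs to enqueue on a once-a-minute scheduler tick."""
    return [cid for cid, c in connectors.items() if is_due(c, now)]
```

A manual Sync Now would bypass `is_due` entirely and queue the job directly.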

Manual Sync

You can trigger a sync at any time regardless of the schedule:

Open the Connectors tab.
Find the connector in the list.
Click the Sync Now action (or open the connector detail panel and click Sync Now).

The sync runs in the background. You can continue using the platform while it processes.

Pause and Resume

To temporarily stop a connector from syncing on its schedule:

Click the Pause action on a connector.
The connector status changes to paused (amber badge).
Scheduled syncs are skipped while paused. You can still trigger manual syncs.
Click Resume to reactivate the schedule.

Sync Process

Each sync follows this process:

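Fetch changes — The connector queries the external system for documents modified since the last sync. Confluence compares page modification dates against the last sync timestamp, while SharePoint and Google Drive resume from their stored delta or page token. For the initial sync (no previous timestamp or token), all documents in scope are fetched.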
Change detection — For each document, the connector compares the external system’s last-modified timestamp against the stored value. Documents that haven’t changed are skipped.
Content download — Changed or new documents are downloaded from the external system.
Document processing — Each document goes through the same pipeline as manual uploads: text extraction, chunking (approximately 512 tokens with overlap), and embedding generation.
Auto-extraction (if enabled) — Newly synced documents are analyzed by an LLM to identify structured knowledge items (facts, policies, procedures).
Deletion handling — Documents deleted in the external system are archived in TruthVouch (soft delete). They are removed from the governance pipeline but retained for audit purposes.
Sync log — Results are recorded: documents added, updated, archived, facts extracted, and any per-document errors.
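Step 4 mentions chunks of roughly 512 tokens with overlap. A minimal sliding-window sketch, assuming a 64-token overlap (the actual overlap is not documented) and using whitespace-split words as a stand-in for the embedding model's tokenizer; `chunk_tokens` is a hypothetical name:

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split a token sequence into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

For example, `chunk_tokens(text.split())` splits extracted text into overlapping windows ready for embedding.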

Single document failures do not abort the entire sync. If one document fails to process (e.g., unsupported format, extraction error), the error is logged and the sync continues with remaining documents.
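The change detection, deletion handling, and per-document error isolation above can be combined into one loop. This is an illustrative sketch: `RemoteDoc`, `SyncLog`, `run_sync`, and the injected `ingest`/`archive` callables are hypothetical, and deletions are assumed to arrive as flagged change records:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable


@dataclass
class RemoteDoc:
    external_id: str
    modified: datetime
    deleted: bool = False


@dataclass
class SyncLog:
    added: int = 0
    updated: int = 0
    archived: int = 0
    skipped: int = 0
    errors: dict[str, str] = field(default_factory=dict)


def run_sync(changes: list[RemoteDoc], stored: dict[str, datetime],
             ingest: Callable[[RemoteDoc], None],
             archive: Callable[[str], None]) -> SyncLog:
    """Apply one batch of changes; a failing document is logged, not fatal."""
    log = SyncLog()
    for doc in changes:
        try:
            if doc.deleted:
                if doc.external_id in stored:
                    archive(doc.external_id)  # soft delete, retained for audit
                    del stored[doc.external_id]
                    log.archived += 1
                continue
            previous = stored.get(doc.external_id)
            if previous is not None and previous >= doc.modified:
                log.skipped += 1              # unchanged since the last sync
                continue
            ingest(doc)  # download, extract, chunk, embed (and auto-extract if enabled)
            stored[doc.external_id] = doc.modified
            if previous is None:
                log.added += 1
            else:
                log.updated += 1
        except Exception as exc:              # isolate per-document failures
            log.errors[doc.external_id] = str(exc)
    return log
```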

Monitoring Sync Status

Connector List View

The Connectors tab shows a table with all configured connectors:

Column	Description
Name	The connector’s display name
Type	Confluence, SharePoint, or Google Drive (with icon)
Status	Current status badge: active (green), paused (amber), error (red), disabled (gray)
Last Sync	When the last sync completed and its result (success, partial, failed)
Documents Synced	Number of documents processed in the last sync
Next Sync	Estimated time until the next scheduled sync

Connector Detail Panel

Click a connector row to open the detail panel, which shows:

Configuration summary: Type, scope, schedule, auto-extract setting
Sync history: A chronological list of recent sync runs with timing, document counts (added/updated/archived), facts extracted, and status
Failed documents: An expandable list showing which documents failed during sync and the specific error message for each
Actions: Edit, Sync Now, Pause/Resume, Delete

Status Indicators

Status	Meaning
Active (green)	Connector is running on schedule. Last sync completed successfully.
Paused (amber)	Connector is temporarily paused. Scheduled syncs are skipped.
Error (red)	The last sync failed or authentication has expired. Check the error message and re-authenticate if needed.
Disabled (gray)	Connector has been manually disabled.

When a connector enters the error state (e.g., due to an expired OAuth token or revoked permissions), it displays a descriptive error message. Common resolutions include re-authenticating or checking that the API permissions haven’t been revoked.

Supported File Types

Source	Supported Formats
Confluence	Page content (HTML). Page attachments are not synced in the current version.
SharePoint	DOCX, PDF, TXT, XLSX, PPTX. Other file types are skipped.
Google Drive	Google Docs, Google Sheets, Google Slides (exported as text), PDF, DOCX, TXT. Other file types are skipped.

Size limit: Files larger than 50 MB are skipped. This matches the Knowledge Base’s per-file upload limit. Skipped files are logged in the sync history details.
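The type and size rules can be expressed as a single pre-download check. A sketch, assuming 50 MB means 50 * 1024 * 1024 bytes and that Google-native files are recognized by their `application/vnd.google-apps.*` MIME types; a size of `None` is treated as unknown. `skip_reason` and the source keys are hypothetical, and Confluence is omitted because it syncs page bodies rather than files:

```python
from pathlib import PurePosixPath
from typing import Optional

MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB, assuming binary megabytes

# Source keys are hypothetical; the extension lists come from the table above.
SUPPORTED_EXTENSIONS = {
    "sharepoint": {".docx", ".pdf", ".txt", ".xlsx", ".pptx"},
    "google_drive": {".pdf", ".docx", ".txt"},
}
GOOGLE_NATIVE_TYPES = {
    "application/vnd.google-apps.document",      # Google Docs
    "application/vnd.google-apps.spreadsheet",   # Google Sheets
    "application/vnd.google-apps.presentation",  # Google Slides
}


def skip_reason(source: str, name: str, size: Optional[int],
                mime_type: str = "") -> Optional[str]:
    """Return why a file would be skipped, or None if it should be synced."""
    if size is not None and size > MAX_FILE_BYTES:
        return "file exceeds 50 MB limit"
    if source == "google_drive" and mime_type in GOOGLE_NATIVE_TYPES:
        return None  # exported as text, so no extension check applies
    extension = PurePosixPath(name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS.get(source, set()):
        return f"unsupported file type: {extension or '(none)'}"
    return None
```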

Credential Security

Connector credentials (API tokens, OAuth tokens, client secrets, service account keys) are encrypted at rest using AES-256-GCM before being stored in the database. The encryption uses the platform’s existing field encryption service with key versioning.

Credentials are:

Encrypted before persistence — never stored in plaintext
Decrypted only at sync time — only the background sync worker decrypts credentials when executing a sync
Never logged — credential values are excluded from all log output
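One common way to keep secrets out of logs is to redact connector configuration before it is ever passed to a logger. A sketch, assuming hypothetical field names such as `api_token` and `client_secret` (the real configuration schema is not documented here):

```python
from typing import Any

# Hypothetical credential field names; extend to match the real schema.
SECRET_FIELDS = {"api_token", "access_token", "client_secret", "service_account_key"}


def redact(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy safe for logging, masking credential values in nested dicts too."""
    safe: dict[str, Any] = {}
    for key, value in config.items():
        if key.lower() in SECRET_FIELDS:
            safe[key] = "***"
        elif isinstance(value, dict):
            safe[key] = redact(value)
        else:
            safe[key] = value
    return safe
```

The original dictionary is left untouched, so the sync worker can still use the decrypted values while only the redacted copy reaches log output.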

When you delete a connector, the encrypted credentials are permanently removed along with the configuration record. Documents that were synced by the connector remain in the Knowledge Base with their connector_id reference set to null.

Auto-Extraction of Knowledge Items

When auto-extraction is enabled on a connector, each newly synced document automatically triggers the same LLM-powered fact extraction used by the manual Extract Facts feature on the Knowledge Base.

For each document, the system:

Analyzes the document content using an LLM
Identifies structured facts, policies, procedures, and key claims
Creates knowledge items with categories, entity names, and confidence scores
Generates embeddings for each new item

Note: Unlike the manual extraction workflow (which presents facts for review before committing), auto-extracted items from connectors are created directly. Review newly synced knowledge items periodically to ensure quality.

Troubleshooting

Authentication Errors

“Authentication failed” or HTTP 401:

Confluence: Verify your email and API token are correct. API tokens expire when your Atlassian password changes.
SharePoint: Verify the tenant ID, client ID, and client secret. Client secrets expire on the date chosen when the secret was created in Azure AD; if that date has passed, create a new secret and update the connector. Ensure admin consent has been granted for the required permissions.
Google Drive: Verify the service account JSON key is complete (not truncated). Ensure the Google Drive API is enabled in the Cloud Console. Check that the service account has been granted access to the shared drives.

“Access denied” or HTTP 403:

Confluence: Ensure the account has read access to the selected spaces.
SharePoint: Ensure the app registration has Sites.Read.All and Files.Read.All permissions with admin consent.
Google Drive: Ensure the service account is added as a member of the shared drives you want to sync.

Sync Failures

“FetchChanges failed”:

Check network connectivity to the external system. If your TruthVouch instance is behind a firewall, ensure outbound HTTPS access to the external system’s API endpoints.
Check the connector’s error message in the detail panel for a specific error.

Partial sync (some documents processed, some failed):

Open the connector detail panel and expand the Failed documents section.
Common per-document failures include: unsupported file format, file exceeds 50 MB limit, or temporary API errors.
The connector continues processing remaining documents even when individual documents fail.

Stale delta token (SharePoint):

If SharePoint returns HTTP 410 (Gone), the delta token has expired. The connector automatically falls back to a full sync for the affected drive and acquires a new delta token.

Rate Limiting

All three connectors implement built-in rate limiting to stay within provider API quotas:

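Confluence: 1-second delay between paginated calls; HTTP 429 handled with the Retry-After header (30-second default when absent)
SharePoint: 200ms delay between Graph API calls; HTTP 429 handled with Retry-After (30-second default when absent)
Google Drive: 100ms delay between API calls; HTTP 403/429 rate-limit responses retried after 30 seconds, or after the Retry-After value when present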
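The three policies can be captured as data, which also makes one subtlety explicit: a Google Drive 403 is only a rate-limit signal when its error reason says so (for example `rateLimitExceeded` or `userRateLimitExceeded`); other 403s are permission errors like those listed under Authentication Errors. Whether TruthVouch makes this distinction is not stated, so treat `RateLimitPolicy` and `is_rate_limited` as a hypothetical sketch:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitPolicy:
    delay_between_calls: float         # seconds between consecutive API calls
    throttle_statuses: frozenset[int]  # HTTP statuses that may signal throttling
    default_wait: float = 30.0         # wait when no Retry-After header is present


POLICIES = {
    "confluence": RateLimitPolicy(1.0, frozenset({429})),
    "sharepoint": RateLimitPolicy(0.2, frozenset({429})),
    "google_drive": RateLimitPolicy(0.1, frozenset({403, 429})),
}

# Drive error reasons that mark a 403 as throttling rather than missing access.
DRIVE_RATE_LIMIT_REASONS = {"rateLimitExceeded", "userRateLimitExceeded"}


def is_rate_limited(source: str, status: int, reason: str = "") -> bool:
    """True when a response should be retried after a wait instead of failing."""
    if status not in POLICIES[source].throttle_statuses:
        return False
    if source == "google_drive" and status == 403:
        return reason in DRIVE_RATE_LIMIT_REASONS  # other 403s are permission errors
    return True
```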

If you consistently hit rate limits, consider increasing the sync interval or reducing the scope to fewer spaces/sites/drives.

Unsupported File Types

Files with unsupported extensions are skipped without failing the sync. Each skipped file is recorded in the sync log details. If you need to process an unsupported file type, download and convert it manually, then upload it through the Knowledge Base's standard upload feature.

Next Steps

Knowledge Base — Full documentation on documents, knowledge items, and the governance pipeline
Governance Overview — Understand the full governance platform
Truth Firewall — Learn how knowledge feeds the 17-stage pipeline
Audit Trail — Review how synced documents are logged