Knowledge Connectors
Knowledge Connectors automatically synchronize documents from external systems into your Knowledge Base. Instead of manually uploading documents, connectors pull content from Confluence, SharePoint, and Google Drive on a configurable schedule, keeping your governance pipeline grounded with up-to-date organizational knowledge.
Connectors perform one-way sync: external system to Knowledge Base. Documents are processed through the same pipeline as manual uploads — text extraction, chunking, embedding generation, and optional auto-extraction of structured knowledge items.
Who Can Use Knowledge Connectors?
Knowledge Connectors are available on Professional, Business, and Enterprise subscription tiers — the same tiers that have access to the Knowledge Base. Starter tier accounts see an upgrade prompt.
All connector API endpoints enforce tier checking. Requests from accounts below Professional tier receive a 403 Forbidden response.
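As a rough illustration of the tier check described above, the sketch below maps a subscription tier to the status a connector endpoint would return. The tier names come from this page; the ranking table and the `connector_access_status` helper are hypothetical and are not part of the TruthVouch API.

```python
from http import HTTPStatus

# Tier names are from this page; the numeric ranking is an illustrative assumption.
TIER_RANK = {"starter": 0, "professional": 1, "business": 2, "enterprise": 3}
MIN_CONNECTOR_TIER = "professional"


def connector_access_status(tier: str) -> HTTPStatus:
    """Return the HTTP status a connector endpoint would answer with for a tier."""
    rank = TIER_RANK.get(tier.lower(), -1)  # unknown tiers count as below Professional
    if rank < TIER_RANK[MIN_CONNECTOR_TIER]:
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK
```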
Available Connectors
TruthVouch provides three built-in connectors:
| Connector | Source System | Auth Methods | Scope | What It Syncs |
|---|---|---|---|---|
| Confluence | Atlassian Confluence Cloud | API token (email + token) or OAuth 2.0 | Spaces | Pages (HTML content), page labels |
| SharePoint | Microsoft 365 SharePoint | OAuth 2.0 (Azure AD client credentials) | Sites and document libraries | DOCX, PDF, TXT, XLSX, PPTX files |
| Google Drive | Google Workspace | Service account JSON key or OAuth 2.0 | Shared drives and folders | Google Docs, Sheets, Slides (exported as text), PDF, DOCX, TXT |
Setting Up a Connector
Navigate to AI Governance > Knowledge > Knowledge Base, then select the Connectors tab (the third tab after Documents and Knowledge Items).
Step 1: Open the Add Connector Wizard
Click Add Connector in the top-right corner. A multi-step wizard opens to guide you through the setup.
Step 2: Select Connector Type
Choose from three connector cards:
- Confluence — Sync Confluence Cloud spaces and pages
- SharePoint — Sync SharePoint Online document libraries
- Google Drive — Sync Google Workspace shared drives and folders
Click the card for the system you want to connect, then click Next.
Step 3: Configure Authentication
Enter the credentials required for your chosen connector type. Each connector has specific authentication requirements — see the detailed setup sections below.
Click Test Connection to verify that TruthVouch can reach your external system with the provided credentials. The test performs a read-only API call (listing one space, site, or drive) to confirm access.
If the test fails, you see a specific error message — common issues include expired tokens, insufficient permissions, or incorrect base URLs. Fix the issue and test again before proceeding.
Step 4: Select Scope
After a successful connection test, the wizard loads available scopes from your external system:
- Confluence: A list of spaces you have access to
- SharePoint: A two-level hierarchy of sites and their document libraries
- Google Drive: Shared drives and their root-level folders
Select the specific spaces, sites/libraries, or drives/folders you want to sync. Only selected scopes are included in sync operations.
Step 5: Configure Sync Settings
- Name: A descriptive name for this connector (e.g., “Engineering Confluence” or “HR Policies SharePoint”).
- Sync interval: How often the connector should check for changes. Options: every 1 hour, 6 hours (default), 12 hours, or daily.
- Auto-extract facts: When enabled, newly synced documents automatically trigger LLM-powered extraction of structured knowledge items. Extracted items are created directly in the Knowledge Base. When disabled (default), documents are ingested and chunked but knowledge items must be extracted manually.
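The Step 5 fields and their defaults can be pictured as a small validated record. This is only a sketch: the `SyncSettings` class and its field names are hypothetical, and the interval options are written in hours on the assumption that "daily" means 24 hours.

```python
from dataclasses import dataclass

# Interval choices offered by the wizard, in hours (daily is assumed to be 24).
ALLOWED_INTERVAL_HOURS = (1, 6, 12, 24)


@dataclass(frozen=True)
class SyncSettings:
    """Hypothetical container for the Step 5 fields; not TruthVouch's actual schema."""
    name: str
    sync_interval_hours: int = 6      # wizard default
    auto_extract_facts: bool = False  # wizard default

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("name must not be empty")
        if self.sync_interval_hours not in ALLOWED_INTERVAL_HOURS:
            raise ValueError(f"sync interval must be one of {ALLOWED_INTERVAL_HOURS} hours")


settings = SyncSettings(name="Engineering Confluence")
```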
Step 6: Review and Create
The final step shows a summary of your configuration: connector type, authentication mode, selected scopes, sync schedule, and auto-extraction setting.
Click Create to save the connector. The credentials are encrypted at rest using AES-256-GCM before being stored.
After creation, you can click Start Initial Sync to trigger the first sync immediately, or wait for the next scheduled run.
Confluence Setup
Requirements
- An Atlassian Cloud account with access to the Confluence instance you want to sync
- Either an API token or an OAuth 2.0 access token
Authentication: API Token
This is the simpler method, recommended for getting started.
- Go to https://id.atlassian.com/manage-profile/security/api-tokens.
- Click Create API token.
- Give it a label (e.g., “TruthVouch Knowledge Connector”) and click Create.
- Copy the generated token.
- In the TruthVouch wizard, enter:
- Base URL: Your Confluence instance URL (e.g., https://your-company.atlassian.net)
- Email: The email address associated with your Atlassian account
- API Token: The token you just created
The connector uses Basic authentication with your email and API token.
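For reference, HTTP Basic authentication encodes the email and API token as a single Base64 string. The sketch below shows how such a header is built; the `basic_auth_header` helper and the placeholder credentials are illustrative only, and the connector builds this header for you.

```python
import base64


def basic_auth_header(email: str, api_token: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from an email and API token."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder values, not real credentials.
headers = basic_auth_header("user@example.com", "example-token")
```

Base64 is an encoding, not encryption, so a header like this should only ever travel over HTTPS.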
Authentication: OAuth 2.0
For production deployments where you want app-level access without tying the connector to a personal account:
- Go to the Atlassian Developer Console and create an OAuth 2.0 app.
- Configure the required scopes: read:confluence-space.summary, read:confluence-content.all.
- Generate and copy the OAuth access token.
- In the TruthVouch wizard, set the auth mode to OAuth and enter the access token.
What Gets Synced
- Confluence pages within the selected spaces — page content is fetched in Confluence storage format (XHTML) and delivered as HTML for processing
- Page labels — attached as metadata for use as tags on knowledge items
- Pages are synced based on modification date. On each incremental sync, only pages modified since the last sync are fetched
- The connector handles Confluence’s cursor-based pagination automatically
Note: Confluence page attachments (PDF, DOCX files attached to pages) are not synced in the current version. Only the page body content is processed. Upload attachments manually if needed.
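Cursor-based pagination, as mentioned in the list above, generally means following a "next" cursor until the API stops returning one. A minimal sketch of that loop follows. The `fetch_page` callable is a hypothetical stand-in for the actual HTTP request, and the pause between pages is a parameter rather than a statement about the connector's internals.

```python
import time
from typing import Callable, Iterator, Optional

# fetch_page is a hypothetical callable: given a cursor (None for the first page),
# it returns (items, next_cursor), with next_cursor set to None on the last page.
Page = tuple[list[dict], Optional[str]]


def iter_pages(
    fetch_page: Callable[[Optional[str]], Page],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every item across all pages, pausing between consecutive page requests."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return
        sleep(delay_seconds)  # pause only when another page will be requested
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.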
Rate Limiting
The connector includes a 1-second delay between consecutive paginated API calls and handles HTTP 429 (Too Many Requests) responses by respecting the Retry-After header, with a 30-second default wait when no header is present.
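The Retry-After handling described above can be sketched as a small function that chooses a wait time. The header may carry either a number of seconds or an HTTP date. The `retry_wait_seconds` helper is hypothetical, and treating an unparseable header the same as a missing one is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

DEFAULT_WAIT_SECONDS = 30.0  # fallback from the text when no Retry-After header is sent


def retry_wait_seconds(headers: Mapping[str, str], now: Optional[datetime] = None) -> float:
    """Return how long to wait after a 429, honoring Retry-After when it is usable."""
    # Assumes the mapping uses the exact key "Retry-After"; real HTTP clients
    # usually expose case-insensitive header lookups.
    value = headers.get("Retry-After")
    if value is None:
        return DEFAULT_WAIT_SECONDS
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_WAIT_SECONDS  # assumption: unparseable values use the default
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (target - now).total_seconds())
```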
SharePoint Setup
Requirements
- A Microsoft 365 tenant with SharePoint Online
- An Azure AD app registration with appropriate permissions
Authentication: Azure AD App Registration
SharePoint uses the OAuth 2.0 client credentials flow via Microsoft Graph API.
- Go to the Azure Portal > Azure Active Directory > App registrations > New registration.
- Name the app (e.g., “TruthVouch Knowledge Connector”).
- Under API permissions, add the following Application permissions for Microsoft Graph:
- Sites.Read.All: read all SharePoint site content
- Files.Read.All: read all files in SharePoint document libraries
- Click Grant admin consent for the permissions.
- Under Certificates & secrets, create a new client secret and copy the value.
- Note the Application (client) ID from the app overview page.
- Note the Directory (tenant) ID from the app overview page.
- In the TruthVouch wizard, enter:
- Tenant ID: Your Azure AD tenant ID
- Client ID: The app registration’s client ID
- Client Secret: The client secret value you created
The connector acquires a Bearer token using the client credentials flow and caches it for the token’s lifetime to avoid unnecessary round-trips.
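Caching a token for its lifetime usually amounts to remembering when it expires and refetching only after that point. The sketch below illustrates the idea. The `TokenCache` class, the `fetch_token` callable, and the 60-second safety margin are all assumptions, not details of the connector.

```python
import time
from typing import Callable, Optional, Tuple

# fetch_token is a hypothetical callable that performs the client credentials
# request and returns (access_token, expires_in_seconds).
TokenFetcher = Callable[[], Tuple[str, float]]


class TokenCache:
    """Reuse a bearer token until shortly before it expires."""

    def __init__(self, fetch_token: TokenFetcher,
                 clock: Callable[[], float] = time.monotonic,
                 safety_margin: float = 60.0) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin = safety_margin  # assumed margin: refresh slightly early
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = now + expires_in - self._safety_margin
        return self._token
```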
What Gets Synced
- Files from selected SharePoint document libraries: DOCX, PDF, TXT, XLSX, PPTX formats
- The connector uses Microsoft Graph delta queries for efficient incremental sync — only changed files are fetched on subsequent syncs
- Delta tokens are persisted between sync runs so each sync picks up exactly where the previous one left off
- If a delta token expires (HTTP 410 Gone), the connector automatically falls back to a full sync for that drive
- Deleted files are detected via the Graph delta response and the corresponding documents are archived in TruthVouch
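The delta-token behavior in the list above, including the fallback to a full sync on HTTP 410, can be sketched as follows. The `fetch_delta` callable and the `DeltaTokenExpired` exception are hypothetical stand-ins for the Graph request and its 410 response. The sketch shows the control flow only.

```python
from typing import Callable, Optional, Tuple


class DeltaTokenExpired(Exception):
    """Raised by the hypothetical fetcher when the API answers HTTP 410 Gone."""


# fetch_delta(token) returns (changes, new_token); token=None requests a full listing.
DeltaFetcher = Callable[[Optional[str]], Tuple[list, str]]


def sync_drive(fetch_delta: DeltaFetcher, stored_token: Optional[str]) -> Tuple[list, str, bool]:
    """Return (changes, token_to_persist, was_full_sync)."""
    try:
        changes, new_token = fetch_delta(stored_token)
        return changes, new_token, stored_token is None
    except DeltaTokenExpired:
        # The stored token is no longer valid: start over with a full sync.
        changes, new_token = fetch_delta(None)
        return changes, new_token, True
```

The returned token is what would be persisted for the next run, so that the following sync resumes from this point.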
Rate Limiting
The connector includes a 200ms delay between consecutive Graph API calls and handles HTTP 429 responses by respecting the Retry-After header, with a 30-second default wait.
Google Drive Setup
Requirements
- A Google Workspace account
- Either a service account JSON key or an OAuth 2.0 access token
Authentication: Service Account
Recommended for automated sync without user interaction.
- Go to the Google Cloud Console > IAM & Admin > Service Accounts.
- Create a new service account (e.g., “truthvouch-knowledge-connector”).
- Grant the service account access to the shared drives you want to sync by adding it as a member in each drive’s settings.
- Under Keys, create a new JSON key and download it.
- Enable the Google Drive API in the Google Cloud Console under APIs & Services.
- In the TruthVouch wizard, paste the entire contents of the JSON key file into the Access Token field.
The connector detects the JSON key format automatically and generates short-lived Bearer tokens via JWT assertion.
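One plausible way to distinguish a pasted service account key from a plain access token is to check whether the value parses as JSON with the fields a Google key file contains. The sketch below shows that check. The `is_service_account_key` helper is hypothetical, and the connector's actual detection logic may differ.

```python
import json

# Fields present in a Google service account key file.
REQUIRED_KEY_FIELDS = {"type", "client_email", "private_key", "token_uri"}


def is_service_account_key(credential: str) -> bool:
    """Return True if the pasted credential looks like a service account JSON key."""
    try:
        data = json.loads(credential)
    except json.JSONDecodeError:
        return False  # opaque access tokens and truncated keys are not valid JSON
    return (
        isinstance(data, dict)
        and data.get("type") == "service_account"
        and REQUIRED_KEY_FIELDS.issubset(data)
    )
```

A truncated key fails the JSON parse, which is one reason an incomplete paste shows up as an authentication error.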
Authentication: OAuth 2.0
For setups where you prefer user-delegated access:
- Configure an OAuth consent screen in the Google Cloud Console.
- Create OAuth 2.0 credentials (Web application type).
- Obtain an access token through the OAuth flow.
- Enter the access token in the TruthVouch wizard.
What Gets Synced
- Files from selected shared drives: Google Docs, Sheets, and Slides (exported as plain text), PDF, DOCX, and TXT files
- The connector uses the Google Drive Changes API with page tokens for efficient incremental sync
- Page tokens are persisted between syncs so each run processes only new changes
- Google-native formats (Docs, Sheets, Slides) are exported as text using the Google Drive export endpoint — no separate conversion step is needed
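Exporting a Google-native file means asking the Drive export endpoint for a specific target format. The mapping below is a sketch of how that choice could look. The source MIME types are Google's identifiers for Docs, Sheets, and Slides, and the targets are formats the export endpoint supports for each. Which targets the connector actually requests is not stated here, so treat them as assumptions.

```python
from typing import Optional

# Google-native MIME types mapped to a text export format (assumed targets).
# Note that CSV export of a spreadsheet covers only its first sheet.
EXPORT_MIME_TYPES = {
    "application/vnd.google-apps.document": "text/plain",
    "application/vnd.google-apps.spreadsheet": "text/csv",
    "application/vnd.google-apps.presentation": "text/plain",
}


def export_mime_type(file_mime_type: str) -> Optional[str]:
    """Return the export format for a Google-native file, or None to download as-is."""
    return EXPORT_MIME_TYPES.get(file_mime_type)
```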
Rate Limiting
The connector includes a 100ms delay between API calls and handles HTTP 403 and 429 rate-limit responses by waiting for the duration given in the Retry-After header, or 30 seconds when no header is present.
Sync Scheduling
Interval Options
| Interval | Use Case |
|---|---|
| Every 1 hour | Fast-changing documentation, active project wikis |
| Every 6 hours (default) | Standard organizational docs, policies |
| Every 12 hours | Stable reference documentation |
| Daily | Infrequently updated content, archival material |
The sync scheduler checks every minute for connectors that are due for their next sync. When a connector’s interval has elapsed since its last sync, a sync job is queued for background execution.
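The due check the scheduler performs each minute reduces to a comparison between the elapsed time and the configured interval. A minimal sketch follows. The `is_sync_due` helper is hypothetical, and treating a connector that has never synced as immediately due is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_sync_due(last_sync: Optional[datetime], interval_hours: int, now: datetime) -> bool:
    """Return True when the interval has elapsed since the last sync."""
    if last_sync is None:
        return True  # assumption: a connector with no previous sync is due right away
    return now - last_sync >= timedelta(hours=interval_hours)
```

Because the check runs once a minute, a sync would start at most about a minute after its interval elapses.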
Manual Sync
You can trigger a sync at any time regardless of the schedule:
- Open the Connectors tab.
- Find the connector in the list.
- Click the Sync Now action (or open the connector detail panel and click Sync Now).
The sync runs in the background. You can continue using the platform while it processes.
Pause and Resume
To temporarily stop a connector from syncing on its schedule:
- Click the Pause action on a connector.
- The connector status changes to paused (amber badge).
- Scheduled syncs are skipped while paused. You can still trigger manual syncs.
- Click Resume to reactivate the schedule.
Sync Process
Each sync follows this process:
- Fetch changes: The connector queries the external system for documents changed since the last sync, using a modification date (Confluence), a delta token (SharePoint), or a page token (Google Drive). For the initial sync, when no previous sync state exists, all documents in scope are fetched.
- Change detection: For each document, the connector compares the external system’s last-modified timestamp against the stored value. Documents that haven’t changed are skipped.
- Content download: Changed or new documents are downloaded from the external system.
- Document processing: Each document goes through the same pipeline as manual uploads (text extraction, chunking into segments of approximately 512 tokens with overlap, and embedding generation).
- Auto-extraction (if enabled): Processed documents are analyzed by an LLM to identify structured knowledge items (facts, policies, procedures).
- Deletion handling: Documents deleted in the external system are archived in TruthVouch (soft delete). They are removed from the governance pipeline but retained for audit purposes.
- Sync log: The results of the run are recorded, including documents added, updated, and archived, facts extracted, and any per-document errors.
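To make the chunking step concrete, the sketch below splits a token list into overlapping windows. The 512-token size comes from the description above. The 64-token overlap is an assumed value, since the actual overlap is not stated, and the `chunk_tokens` helper is hypothetical.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of at most `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, provided it is shorter than the overlap.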
Single document failures do not abort the entire sync. If one document fails to process (e.g., unsupported format, extraction error), the error is logged and the sync continues with remaining documents.
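The per-document error isolation described above is typically a try/except around each document rather than around the whole run. A sketch under that assumption follows. The `SyncResult` and `run_sync` names are hypothetical, and the mapping onto the success, partial, and failed results is an illustrative guess.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class SyncResult:
    processed: int = 0
    errors: dict[str, str] = field(default_factory=dict)  # document id -> error message

    @property
    def status(self) -> str:
        # Assumed mapping onto the success / partial / failed results shown in the UI.
        if not self.errors:
            return "success"
        return "partial" if self.processed else "failed"


def run_sync(doc_ids: Iterable[str], process: Callable[[str], None]) -> SyncResult:
    """Process each document, recording failures instead of aborting the run."""
    result = SyncResult()
    for doc_id in doc_ids:
        try:
            process(doc_id)
            result.processed += 1
        except Exception as exc:  # isolate the failure to this document
            result.errors[doc_id] = str(exc)
    return result
```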
Monitoring Sync Status
Connector List View
The Connectors tab shows a table with all configured connectors:
| Column | Description |
|---|---|
| Name | The connector’s display name |
| Type | Confluence, SharePoint, or Google Drive (with icon) |
| Status | Current status badge: active (green), paused (amber), error (red), disabled (gray) |
| Last Sync | When the last sync completed and its result (success, partial, failed) |
| Documents Synced | Number of documents processed in the last sync |
| Next Sync | Estimated time until the next scheduled sync |
Connector Detail Panel
Click a connector row to open the detail panel, which shows:
- Configuration summary: Type, scope, schedule, auto-extract setting
- Sync history: A chronological list of recent sync runs with timing, document counts (added/updated/archived), facts extracted, and status
- Failed documents: An expandable list showing which documents failed during sync and the specific error message for each
- Actions: Edit, Sync Now, Pause/Resume, Delete
Status Indicators
| Status | Meaning |
|---|---|
| Active (green) | Connector is running on schedule. Last sync completed successfully. |
| Paused (amber) | Connector is temporarily paused. Scheduled syncs are skipped. |
| Error (red) | The last sync failed or authentication has expired. Check the error message and re-authenticate if needed. |
| Disabled (gray) | Connector has been manually disabled. |
When a connector enters the error state (e.g., due to an expired OAuth token or revoked permissions), it displays a descriptive error message. Common resolutions include re-authenticating or checking that the API permissions haven’t been revoked.
Supported File Types
| Source | Supported Formats |
|---|---|
| Confluence | Page content (HTML). Page attachments are not synced in the current version. |
| SharePoint | DOCX, PDF, TXT, XLSX, PPTX. Other file types are skipped. |
| Google Drive | Google Docs, Google Sheets, Google Slides (exported as text), PDF, DOCX, TXT. Other file types are skipped. |
Size limit: Files larger than 50 MB are skipped. This matches the Knowledge Base’s per-file upload limit. Skipped files are logged in the sync history details.
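The format and size rules above can be expressed as a simple pre-download filter. The sketch below is illustrative. The `skip_reason` helper is hypothetical, and it assumes "50 MB" means 50 × 1024 × 1024 bytes and that files are matched by extension.

```python
from pathlib import PurePath
from typing import Optional

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumes "50 MB" means 50 MiB

# Downloadable extensions per source, from the table above. Google-native
# files have no extension and would need a separate MIME-type check.
SUPPORTED_EXTENSIONS = {
    "sharepoint": {".docx", ".pdf", ".txt", ".xlsx", ".pptx"},
    "google_drive": {".pdf", ".docx", ".txt"},
}


def skip_reason(source: str, filename: str, size_bytes: int) -> Optional[str]:
    """Return why a file would be skipped, or None if it qualifies for sync."""
    if PurePath(filename).suffix.lower() not in SUPPORTED_EXTENSIONS[source]:
        return "unsupported file type"
    if size_bytes > MAX_FILE_BYTES:
        return "file exceeds 50 MB limit"
    return None
```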
Credential Security
Connector credentials (API tokens, OAuth tokens, client secrets, service account keys) are encrypted at rest using AES-256-GCM before being stored in the database. The encryption uses the platform’s existing field encryption service with key versioning.
Credentials are:
- Encrypted before persistence — never stored in plaintext
- Decrypted only at sync time — only the background sync worker decrypts credentials when executing a sync
- Never logged — credential values are excluded from all log output
When you delete a connector, the encrypted credentials are permanently removed along with the configuration record. Documents that were synced by the connector remain in the Knowledge Base with their connector_id reference set to null.
Auto-Extraction of Knowledge Items
When auto-extraction is enabled on a connector, each newly synced document automatically triggers the same LLM-powered fact extraction used by the manual Extract Facts feature in the Knowledge Base.
For each document, the system:
- Analyzes the document content using an LLM
- Identifies structured facts, policies, procedures, and key claims
- Creates knowledge items with categories, entity names, and confidence scores
- Generates embeddings for each new item
Note: Unlike the manual extraction workflow (which presents facts for review before committing), auto-extracted items from connectors are created directly. Review newly synced knowledge items periodically to ensure quality.
Troubleshooting
Authentication Errors
“Authentication failed” or HTTP 401:
- Confluence: Verify your email and API token are correct. API tokens expire when your Atlassian password changes.
- SharePoint: Verify the tenant ID, client ID, and client secret. Client secrets expire on the schedule you configured in Azure AD (default: 2 years). Ensure admin consent has been granted for the required permissions.
- Google Drive: Verify the service account JSON key is complete (not truncated). Ensure the Google Drive API is enabled in the Cloud Console. Check that the service account has been granted access to the shared drives.
“Access denied” or HTTP 403:
- Confluence: Ensure the account has read access to the selected spaces.
- SharePoint: Ensure the app registration has Sites.Read.All and Files.Read.All permissions with admin consent.
- Google Drive: Ensure the service account is added as a member of the shared drives you want to sync.
Sync Failures
“FetchChanges failed”:
- Check network connectivity to the external system. If your TruthVouch instance is behind a firewall, ensure outbound HTTPS access to the external system’s API endpoints.
- Check the connector’s error message in the detail panel for a specific error.
Partial sync (some documents processed, some failed):
- Open the connector detail panel and expand the Failed documents section.
- Common per-document failures include: unsupported file format, file exceeds 50 MB limit, or temporary API errors.
- The connector continues processing remaining documents even when individual documents fail.
Stale delta token (SharePoint):
- If SharePoint returns HTTP 410 (Gone), the delta token has expired. The connector automatically falls back to a full sync for the affected drive and acquires a new delta token.
Rate Limiting
All three connectors implement built-in rate limiting to stay within provider API quotas:
- Confluence: 1-second delay between paginated calls; 429 response handled with Retry-After header
- SharePoint: 200ms delay between Graph API calls; 429 handled with Retry-After
- Google Drive: 100ms delay between API calls; 403/429 handled with 30-second retry
If you consistently hit rate limits, consider increasing the sync interval or reducing the scope to fewer spaces/sites/drives.
Unsupported File Types
Files with unsupported extensions are skipped during sync without causing it to fail. Each skipped file is recorded in the sync log details. If you need to process a file type that isn’t supported, download and convert it manually, then upload it through the Knowledge Base’s standard upload feature.
Next Steps
- Knowledge Base — Full documentation on documents, knowledge items, and the governance pipeline
- Governance Overview — Understand the full governance platform
- Truth Firewall — Learn how knowledge feeds the 17-stage pipeline
- Audit Trail — Review how synced documents are logged