StellarBase Platform

Knowledge Base

The unified data layer. Every connected source becomes part of one searchable, semantically linked corpus, without moving the underlying data.

What it is

A knowledge base in StellarBase is not a data lake or a document warehouse. It’s a semantic layer that sits on top of your existing systems. When you connect Google Drive, StellarBase does not copy every file into its own storage. Instead, it parses, embeds, indexes, and links each file, then answers queries from those indexes while the underlying files stay where they are.

Original documents are fetched on demand, only when a user needs to see the source content. Previews are rendered by StellarBase, but the source of truth remains your system.
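To make the index-not-copy model concrete, here is a minimal sketch in which the knowledge base keeps only derived text plus a pointer back to the source, answers searches from that index alone, and calls the source system only when someone opens the original. `IndexEntry`, `KnowledgeIndex`, and the `fetch` callback are hypothetical names for illustration, not StellarBase APIs, and the substring match stands in for real retrieval.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IndexEntry:
    """Derived data the knowledge base stores; the original file is not copied."""
    source_uri: str       # pointer back to the source system (e.g. a Drive path)
    extracted_text: str   # parsed text used for search


class KnowledgeIndex:
    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # Hypothetical: `fetch` stands in for a connector that reads bytes from the source.
        self._fetch = fetch
        self._entries: List[IndexEntry] = []

    def add(self, entry: IndexEntry) -> None:
        self._entries.append(entry)

    def search(self, term: str) -> List[IndexEntry]:
        # Answered entirely from the index: the source system is never contacted.
        needle = term.lower()
        return [e for e in self._entries if needle in e.extracted_text.lower()]

    def open_original(self, entry: IndexEntry) -> bytes:
        # Only when a user asks for the original do we go back to the source of truth.
        return self._fetch(entry.source_uri)
```

Passing a fake `fetch` that records its calls is an easy way to confirm that `search` never touches the source system, while `open_original` does.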

Structure

Bases

A Base is the top-level container. It has its own data, its own agents, its own permissions, and its own audit log. Typical patterns:

  • A law firm has one Base per matter (isolation for privilege)
  • A hospital has one Base per department (oncology, cardiology, rare disease)
  • A manufacturer has one Base per plant (process IP stays contained)
  • A research lab has one Base per project

Bases can share nothing or share everything, depending on your permissions model. See Auth & RBAC.

Collections

Inside a Base, collections group related sources. A “Contracts” collection might include a VDR connector, an iManage folder, and three Zotero libraries. Collections can be tagged, searched independently, and scoped to specific agents.

Documents

Each document is a single ingested unit — a PDF, an email thread, a Slack channel export, a row-group from a database. Documents carry:

  • Content — parsed text, tables, figures, metadata
  • Provenance — source system, path, author, timestamp, version
  • Embeddings — vector representations for semantic search
  • Entities — people, organizations, locations, amounts, dates (resolved by DSM)
  • Classifications — labels assigned automatically or by your custom classifier
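The Base, collection, and document hierarchy and the per-document fields listed above can be pictured as plain data classes. This is a hypothetical sketch for orientation only: the class and field names are our own, not StellarBase’s schema, and a Base’s agents, permissions, and audit log are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Provenance:
    source_system: str   # e.g. "google_drive"
    path: str
    author: str
    timestamp: datetime
    version: str


@dataclass
class Entity:
    kind: str   # person, organization, location, amount, or date
    text: str


@dataclass
class Document:
    content: str   # parsed text (tables, figures, metadata omitted in this sketch)
    provenance: Provenance
    embedding: List[float] = field(default_factory=list)
    entities: List[Entity] = field(default_factory=list)
    classifications: List[str] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    tags: List[str] = field(default_factory=list)
    documents: List[Document] = field(default_factory=list)


@dataclass
class Base:
    name: str
    collections: Dict[str, Collection] = field(default_factory=dict)

    def all_documents(self) -> List[Document]:
        # Flatten every collection in this Base; nothing crosses into other Bases.
        return [d for c in self.collections.values() for d in c.documents]
```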

How data gets in

Three mechanisms:

Connectors (continuous sync)

The primary path. You configure a connector once; it syncs continuously. When a file is added, updated, or deleted in the source system, the knowledge base reflects the change within minutes. See Data Sources for the full list.
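Conceptually, continuous sync turns the source system’s add, update, and delete notifications into upserts and removals against the index. The sketch below assumes a hypothetical `ChangeEvent` feed and uses a plain dict (path to parsed text) as the index; how StellarBase actually receives changes (webhooks, polling, change tokens) is not described on this page and likely varies by connector.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class ChangeKind(Enum):
    ADDED = "added"
    UPDATED = "updated"
    DELETED = "deleted"


@dataclass(frozen=True)
class ChangeEvent:
    kind: ChangeKind
    path: str
    text: str = ""   # new parsed content for ADDED/UPDATED; unused for DELETED


def apply_changes(index: Dict[str, str], events: Iterable[ChangeEvent]) -> Dict[str, str]:
    """Mirror source-side changes into the index, processing events in order."""
    for event in events:
        if event.kind is ChangeKind.DELETED:
            index.pop(event.path, None)  # tolerate deletes for paths never indexed
        else:
            # Adds and updates are both upserts: replace the entry with fresh content.
            index[event.path] = event.text
    return index
```

Treating adds and updates identically keeps the sync idempotent: replaying the same event twice leaves the index unchanged.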

Upload (one-shot)

For things that don’t live in a connected system. Drag a folder of PDFs into the UI, or POST them to the ingest API. One-time ingestion, no continuous sync.

Offline data packs (air-gapped)

For air-gapped deployments, StellarBase accepts signed data packs — pre-built corpora you load once and refresh quarterly (legal codes, regulations, scientific literature mirrors). See Air-gapped.
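This page does not describe the pack format, so the following is only a sketch of what “signed” typically buys you: the loader refuses a pack unless its manifest of per-file SHA-256 digests is authentic and every file matches it. HMAC-SHA256 with a shared key is used here only because it ships in Python’s standard library; a real publisher would more likely use a public-key signature so that verifiers never hold the signing key. All function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict


def build_manifest(files: Dict[str, bytes]) -> bytes:
    """Canonical JSON listing each file's SHA-256 digest."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    return json.dumps(digests, sort_keys=True).encode("utf-8")


def sign_manifest(manifest: bytes, key: bytes) -> str:
    # Stand-in for the publisher's signature: HMAC-SHA256 with a shared key.
    return hmac.new(key, manifest, hashlib.sha256).hexdigest()


def verify_pack(files: Dict[str, bytes], manifest: bytes, signature: str, key: bytes) -> bool:
    """Accept the pack only if the manifest is authentic and every file matches it."""
    expected = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected, signature):
        return False  # manifest was altered or signed with a different key
    # Rebuilding the manifest catches altered, missing, or extra files.
    return manifest == build_manifest(files)
```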

How documents are processed

Every ingested document is parsed, embedded for semantic search, scanned for entities by DSM, and linked into the unified graph — all automatically on ingest. The output is a single hybrid index combining vector, keyword, and structured retrieval.

You don’t configure the pipeline for typical use; advanced users can override defaults.
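This page doesn’t say how the vector, keyword, and structured signals are combined. One common technique is to apply structured metadata as a hard filter and then merge the vector and keyword rankings with reciprocal rank fusion (RRF). The sketch below does exactly that, assuming RRF is an acceptable stand-in; the toy keyword score, the `k = 60` constant, and the document dict layout are illustrative choices, not StellarBase defaults.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> int:
    # Crude keyword signal: number of query terms present in the text.
    words = set(text.lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def hybrid_search(docs: List[Dict], query: str, query_vec: Sequence[float],
                  filters: Dict[str, str], k: int = 60) -> List[str]:
    """Structured filter, then fuse vector and keyword rankings with RRF."""
    # 1. Structured retrieval: keep documents whose metadata matches every filter.
    pool = [d for d in docs if all(d["meta"].get(f) == v for f, v in filters.items())]
    # 2. Rank the filtered pool separately by each signal.
    by_vector = sorted(pool, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    by_keyword = sorted(pool, key=lambda d: keyword_score(query, d["text"]), reverse=True)
    # 3. RRF: each ranking contributes 1 / (k + rank); the highest total wins.
    fused: Dict[str, float] = {}
    for ranking in (by_vector, by_keyword):
        for rank, d in enumerate(ranking, start=1):
            fused[d["id"]] = fused.get(d["id"], 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

RRF works on ranks rather than raw scores, so cosine similarities and keyword counts never need to be put on a common scale.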

Deduplication

The same document can arrive via multiple sources: a paper on PubMed, an arXiv preprint, a file in your Zotero library, a PDF in your Drive. StellarBase merges duplicates into a single canonical record with all versions preserved. DOIs, ORCID iDs, and other identifiers are resolved automatically, and citation counts are aggregated across versions.
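A simplified sketch of identifier-based merging, assuming the DOI is the match key and falling back to a normalized title when no DOI exists. Real resolution is harder (an arXiv preprint and its published version often carry different DOIs, which requires an external lookup), and summing citation counts across versions is our assumption about what “aggregated” means. `SourceRecord`, `CanonicalRecord`, and `merge_duplicates` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceRecord:
    source: str          # e.g. "pubmed", "arxiv", "zotero", "drive"
    title: str
    doi: Optional[str]
    citations: int = 0


@dataclass
class CanonicalRecord:
    key: str
    versions: List[SourceRecord] = field(default_factory=list)  # every copy is kept

    @property
    def citation_count(self) -> int:
        # Assumption: per-version counts are summed.
        return sum(v.citations for v in self.versions)


def normalize_doi(doi: str) -> str:
    # DOI names are case-insensitive; also strip common URL and "doi:" prefixes.
    d = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if d.startswith(prefix):
            d = d[len(prefix):]
    return d


def merge_duplicates(records: List[SourceRecord]) -> List[CanonicalRecord]:
    merged: Dict[str, CanonicalRecord] = {}
    for r in records:
        # Fall back to a whitespace- and case-normalized title (a weak heuristic).
        key = normalize_doi(r.doi) if r.doi else "title:" + " ".join(r.title.lower().split())
        merged.setdefault(key, CanonicalRecord(key)).versions.append(r)
    return list(merged.values())
```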

Storage & costs

Storage in StellarBase consists of the indexes (a small fraction of the original size) plus cached previews. The underlying documents are not duplicated; they’re fetched from the source system on demand. This keeps storage costs predictable even for large corpora.

Component                                 Size relative to source
Extracted text + metadata                 ~5–10%
Vector embeddings                         ~1–3%
Entity + graph tables                     <1%
Preview cache (thumbnails, page images)   ~10–20%
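The ratios in the table can be turned into a rough capacity estimate. A hypothetical helper that sums the low and high ends of each range (treating “<1%” as 0 to 1%) might look like this:

```python
from typing import Dict, Tuple

# Ratios from the storage table, as (low, high) fractions of source size.
# "<1%" is modelled as (0%, 1%) for this estimate.
RATIOS: Dict[str, Tuple[float, float]] = {
    "extracted_text_and_metadata": (0.05, 0.10),
    "vector_embeddings": (0.01, 0.03),
    "entity_and_graph_tables": (0.00, 0.01),
    "preview_cache": (0.10, 0.20),
}


def estimate_storage_gb(source_gb: float) -> Tuple[float, float]:
    """Return a (low, high) estimate of knowledge-base storage for source_gb of originals."""
    low = sum(lo for lo, _ in RATIOS.values()) * source_gb
    high = sum(hi for _, hi in RATIOS.values()) * source_gb
    return low, high
```

For a 1,000 GB corpus this gives roughly 160 to 340 GB, with the preview cache as the largest share; the originals are not counted because they stay in the source system. Actual ratios will vary with the document mix.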

Permissions

Every Base, collection, and document respects your RBAC policy. An agent configured for the “Corporate finance” Base can’t see documents in “Private banking”, even if both Bases are stored in the same Postgres database. See Auth & RBAC.
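A minimal sketch of Base-level scoping, assuming each agent carries an explicit set of granted Bases. Both documents below sit in the same list (standing in for the shared Postgres database), yet the filter keys on the Base a document belongs to, not on where it is stored. Real RBAC policies (collection- and document-level rules, roles, inheritance) are richer; `Agent`, `Doc`, and `visible_to` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Agent:
    name: str
    allowed_bases: FrozenSet[str]   # hypothetical: Bases this agent has been granted


@dataclass(frozen=True)
class Doc:
    doc_id: str
    base: str
    collection: str


def visible_to(agent: Agent, docs: List[Doc]) -> List[Doc]:
    """Apply Base scoping before retrieval; physical storage location is irrelevant."""
    return [d for d in docs if d.base in agent.allowed_bases]
```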
