StellarBase Platform

Search & Discovery

Natural-language retrieval across every connected source. Hybrid (semantic + keyword + structured), cross-lingual, permission-aware, citation-bound.

What “search” covers

Three layered capabilities, all exposed through the same UI and API:

  • Retrieval — given a query, return ranked passages from your corpus
  • Question answering — given a question, return an answer with citations to source passages
  • Discovery — exploration without a specific query: clusters, recent activity, related content

Hybrid retrieval

Three retrieval signals combine in every query:

Signal       What it captures
Semantic     Meaning: “low-cost airlines” matches “budget carriers”
Keyword      Exact terms: names, IDs, technical jargon
Structured   Metadata, entity, date, and author filters

Pure semantic search misses exact-match queries (a specific contract ID, a specific gene name). Pure keyword search misses paraphrase. Hybrid is robust to both.
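This page does not say how StellarBase merges the three signals. One widely used technique is reciprocal rank fusion (RRF), sketched below under that assumption: a structured filter first narrows the candidates, then the semantic and keyword rankings are fused so that a document ranked well by either signal rises. The function names and the `k=60` constant are illustrative choices, not the platform's implementation.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list (RRF).

    Each document earns 1 / (k + rank) from every list it appears in;
    k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

def hybrid_search(semantic_ranking, keyword_ranking, metadata, allowed):
    """Hypothetical hybrid query: structured filter, then fuse semantic + keyword."""
    def passes(ranking):
        # Structured signal: keep only documents whose metadata satisfies `allowed`.
        return [d for d in ranking if allowed(metadata[d])]
    return reciprocal_rank_fusion([passes(semantic_ranking), passes(keyword_ranking)])

# Toy rankings: semantic finds the paraphrase, keyword finds the exact contract ID.
semantic = ["doc-budget", "doc-lowcost", "doc-misc"]
keyword = ["doc-contract-4471", "doc-lowcost"]
metadata = {
    "doc-budget": {"year": 2024},
    "doc-lowcost": {"year": 2023},
    "doc-misc": {"year": 2019},
    "doc-contract-4471": {"year": 2024},
}
result = hybrid_search(semantic, keyword, metadata, lambda m: m["year"] >= 2023)
# → ['doc-lowcost', 'doc-budget', 'doc-contract-4471']
```

The document found by both signals ranks first, while the exact-ID match and the paraphrase match both survive, which is the robustness the paragraph above describes.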

Cross-lingual

A query in Czech finds relevant German, French, and Polish results. Multilingual embeddings put content from every language into a shared vector space. See Multilingual.
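A minimal sketch of what a shared vector space buys you, assuming cosine similarity as the distance measure. The three-dimensional vectors are invented toy values standing in for a real multilingual embedding model; the point is only that ranking happens on vectors, so the query's language never enters the comparison.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vectors (invented) standing in for multilingual embeddings: translations of
# the same idea sit close together regardless of language.
corpus = {
    ("de", "Billigfluggesellschaften senken die Preise"): [0.90, 0.10, 0.05],
    ("fr", "Les compagnies à bas prix baissent leurs tarifs"): [0.88, 0.12, 0.07],
    ("pl", "Nowe przepisy dotyczące ochrony danych"): [0.05, 0.20, 0.95],
}

def search(query_vec, corpus, top_k=2):
    """Rank every document by similarity to the query vector, whatever its language."""
    ranked = sorted(corpus.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [key for key, _ in ranked[:top_k]]

# Czech query "nízkonákladové aerolinky" (low-cost airlines), embedded into the same space.
czech_query = [0.89, 0.11, 0.06]
hits = search(czech_query, corpus)
```

The German and French passages about low-cost airlines come back; the Polish passage on data protection does not.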

Permission-aware

Search respects every layer of permissions. Documents a user can’t read won’t appear in results — even if they’re semantically perfect matches. This is enforced at the index layer, not as a post-filter, so there’s no information leak about what exists.
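The difference between enforcing permissions at the index layer and post-filtering can be sketched in a few lines. This is an illustration of the general idea, not StellarBase code: the `Doc` type, group names, and precomputed scores are hypothetical. Post-filtering ranks everything and then hides documents, so the result page comes back short and the total count betrays that something was hidden; pre-filtering restricts the candidate set before scoring and counting.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    score: float         # relevance to the current query (precomputed for the sketch)
    readers: frozenset   # hypothetical ACL: groups allowed to read this document

def post_filter_search(docs, user_groups, top_k):
    """Anti-pattern: rank everything, then hide what the user can't read."""
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)
    visible = [d.doc_id for d in ranked[:top_k] if d.readers & user_groups]
    # The total counts every match, including unreadable ones: an information leak.
    return visible, len(ranked)

def pre_filter_search(docs, user_groups, top_k):
    """Restrict candidates to readable documents before scoring and counting."""
    readable = [d for d in docs if d.readers & user_groups]
    ranked = sorted(readable, key=lambda d: d.score, reverse=True)
    return [d.doc_id for d in ranked[:top_k]], len(ranked)

docs = [
    Doc("merger-memo", 0.97, frozenset({"legal"})),
    Doc("q3-report", 0.80, frozenset({"all-staff"})),
    Doc("handbook", 0.60, frozenset({"all-staff"})),
]
user = frozenset({"all-staff"})
post = post_filter_search(docs, user, top_k=2)  # (['q3-report'], 3)
pre = pre_filter_search(docs, user, top_k=2)    # (['q3-report', 'handbook'], 2)
```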

Re-ranking

After initial retrieval, a reranker re-scores the candidates for relevance, which typically yields a sharper top-10 than embedding similarity alone. Reranking is automatic on agent and chat queries and configurable for direct API access.
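The two-stage pattern (cheap first-stage retrieval, then a more careful rescoring of the short candidate list) looks roughly like this. The reranker used by StellarBase is not described here; the scorer below is a deliberately crude stand-in (fraction of query terms present in the passage) where a real system would typically use a learned model such as a cross-encoder. Function names are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word set of a text."""
    return set(re.findall(r"\w+", text.lower()))

def overlap_score(query, passage):
    """Toy stand-in for a learned reranker: fraction of query terms found in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0

def retrieve_then_rerank(query, candidates, scorer=overlap_score, top_k=10):
    """Second stage: rescore first-stage candidates and keep the best top_k."""
    rescored = sorted(candidates, key=lambda p: scorer(query, p), reverse=True)
    return rescored[:top_k]

# Candidates in first-stage (embedding-similarity) order.
candidates = [
    "Airline pricing overview for 2023",
    "Termination clause of contract 4471",
    "Contract 4471 termination notice period is 90 days",
]
top = retrieve_then_rerank("contract 4471 termination notice", candidates, top_k=2)
```

Because the scorer is pluggable, swapping the toy function for a real model changes the quality of the ordering but not the shape of the pipeline.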

Filters

Combine a semantic query with structured filters in a single request:

  • Source: only documents from a specific connector or collection
  • Date: published after / before / between
  • Language: only documents in specific languages
  • Entity: documents mentioning a specific person / organization / project
  • Type: PDFs only, emails only, etc.
  • Custom metadata: any field you’ve added during ingest
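A sketch of how the filter types above compose, assuming each filter is an independent constraint that every candidate must satisfy (logical AND). The key names (`published_after`, `custom`, and so on) and the document fields are hypothetical, not the StellarBase query schema; an absent key simply imposes no constraint.

```python
from datetime import date

def matches(doc, f):
    """True if a document's metadata passes every filter present in `f`.

    Keys are hypothetical; absent keys impose no constraint, so filters combine freely.
    """
    checks = (
        "source" not in f or doc["source"] in f["source"],
        "published_after" not in f or doc["published"] > f["published_after"],
        "published_before" not in f or doc["published"] < f["published_before"],
        "language" not in f or doc["language"] in f["language"],
        "entity" not in f or f["entity"] in doc["entities"],
        "type" not in f or doc["type"] in f["type"],
        "custom" not in f or all(doc["meta"].get(k) == v for k, v in f["custom"].items()),
    )
    return all(checks)

filters = {
    "source": {"sharepoint-legal"},
    "published_after": date(2023, 1, 1),
    "language": {"de", "fr"},
    "entity": "Acme GmbH",
    "type": {"pdf"},
    "custom": {"matter_id": "M-102"},   # a field added during ingest
}
docs = [
    {"id": "memo-de", "source": "sharepoint-legal", "published": date(2024, 3, 1),
     "language": "de", "entities": {"Acme GmbH"}, "type": "pdf", "meta": {"matter_id": "M-102"}},
    {"id": "memo-en", "source": "sharepoint-legal", "published": date(2024, 3, 1),
     "language": "en", "entities": {"Acme GmbH"}, "type": "pdf", "meta": {"matter_id": "M-102"}},
]
# Only documents that pass the filters go on to semantic ranking.
candidates = [d["id"] for d in docs if matches(d, filters)]
```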

Question answering

For natural-language questions, search retrieves relevant passages, then an LLM synthesizes an answer grounded in those passages. The answer always cites — click any citation to jump to the source. If no relevant passages exist, the system says so rather than hallucinating.

QA is the default behaviour in chat. For direct retrieval (passages only, no synthesized answer), use the separate retrieval mode.
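A minimal sketch of the grounding step that sits between retrieval and synthesis, assuming a relevance cutoff decides whether to answer at all. The `min_score` threshold, the prompt wording, and the `Passage` type are hypothetical, and the LLM call itself is left out: the function returns the prompt and citation list a model would receive, or `(None, [])` so the caller can say no relevant passages exist instead of letting the model guess.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    source: str    # where a citation should jump to
    text: str
    score: float   # retrieval relevance

def build_grounded_prompt(question, passages, min_score=0.5, max_passages=5):
    """Return (prompt, citations), or (None, []) when nothing is relevant enough.

    min_score is a hypothetical cutoff; below it the system abstains.
    """
    relevant = sorted(
        (p for p in passages if p.score >= min_score),
        key=lambda p: p.score,
        reverse=True,
    )[:max_passages]
    if not relevant:
        return None, []
    # Number passages so the model can cite them as [n]; citations[n-1] maps back to the source.
    numbered = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(relevant, start=1))
    prompt = (
        "Answer using only the numbered passages below and cite them as [n]. "
        "If they do not contain the answer, say so.\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )
    return prompt, [p.source for p in relevant]

passages = [
    Passage("policy.pdf#p4", "Remote staff may claim up to two monitors.", 0.82),
    Passage("faq.html", "The cafeteria opens at 8am.", 0.12),
]
prompt, cites = build_grounded_prompt("How many monitors can remote staff claim?", passages)
```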

Discovery views

Beyond explicit search, the platform offers exploratory views:

  • Recent — what’s been added or modified recently
  • Trending — what your team has been searching / accessing
  • Related — given a document, what else is conceptually adjacent
  • Topic clusters — auto-grouped documents by theme
  • Citation graph — for academic / legal corpora, the network of who-cites-whom
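As an illustration of one of these views, “Trending” can be approximated by counting accesses inside a sliding time window. The event format, the seven-day window, and the function name are assumptions for the sketch; the page does not say how StellarBase weights or windows activity.

```python
from collections import Counter
from datetime import datetime, timedelta

def trending(events, now, window=timedelta(days=7), top_n=3):
    """Rank documents by how often they were accessed within the window.

    `events` is an iterable of (timestamp, doc_id) pairs, e.g. from a
    hypothetical access/search log.
    """
    cutoff = now - window
    counts = Counter(doc_id for ts, doc_id in events if ts >= cutoff)
    return [doc_id for doc_id, _ in counts.most_common(top_n)]

now = datetime(2024, 6, 10)
events = [
    (datetime(2024, 6, 9), "pricing-deck"),
    (datetime(2024, 6, 8), "pricing-deck"),
    (datetime(2024, 6, 7), "hiring-plan"),
    # Heavily accessed, but outside the window, so not trending.
    (datetime(2024, 5, 1), "old-roadmap"),
    (datetime(2024, 5, 2), "old-roadmap"),
    (datetime(2024, 5, 3), "old-roadmap"),
]
top = trending(events, now)  # ['pricing-deck', 'hiring-plan']
```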

Saved searches

Save a query for one-click re-execution. Combined with notifications, a saved search alerts you when new documents match it, which is useful for monitoring incoming literature, tracking a regulatory topic, or watching a counterparty.
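The alerting loop behind saved searches can be sketched as: when a batch of documents is ingested, test each one against every saved query and alert once per new match. The `SavedSearch` type, the all-terms-must-appear matcher, and the `seen` bookkeeping are hypothetical simplifications; a real saved search would re-run the full hybrid query.

```python
import re
from dataclasses import dataclass, field

@dataclass
class SavedSearch:
    name: str
    owner: str
    terms: frozenset                         # toy matcher: all terms must appear
    seen: set = field(default_factory=set)   # doc IDs already alerted on

def notify_new_matches(saved_searches, new_docs):
    """Return (owner, search name, doc_id) alerts for newly ingested documents.

    Each document triggers at most one alert per saved search.
    """
    alerts = []
    for s in saved_searches:
        for doc_id, text in new_docs.items():
            words = set(re.findall(r"\w+", text.lower()))
            if doc_id not in s.seen and s.terms <= words:
                s.seen.add(doc_id)
                alerts.append((s.owner, s.name, doc_id))
    return alerts

watch = SavedSearch("EU AI Act", "dana", frozenset({"ai", "act"}))
batch = {"doc-1": "Draft guidance on the AI Act", "doc-2": "Quarterly earnings call"}
first = notify_new_matches([watch], batch)   # [('dana', 'EU AI Act', 'doc-1')]
second = notify_new_matches([watch], batch)  # [] because doc-1 was already alerted
```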

Search via API

Programmatic access for integration with other systems. The API is standard REST and returns ranked results with provenance and confidence scores. See API Reference.
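For orientation only, this is what building a REST search call might look like with the Python standard library. The endpoint URL, the bearer-token header, and the payload fields (`query`, `top_k`, `rerank`, `filters`) are all hypothetical placeholders; the real request shape is defined in the API Reference. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical endpoint; the real URL and schema are in the API Reference.
API_URL = "https://stellarbase.example.com/api/v1/search"

def build_search_request(query, token, top_k=10, rerank=True, filters=None):
    """Build (but do not send) a JSON POST request for a ranked-retrieval call.

    All payload field names are assumptions for illustration.
    """
    payload = {"query": query, "top_k": top_k, "rerank": rerank, "filters": filters or {}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("budget carriers", token="YOUR_TOKEN",
                           filters={"language": ["de", "fr"]})
# Sending it would be: with urllib.request.urlopen(req) as resp: results = json.load(resp)
```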

Performance

Operation                                     Typical latency
Hybrid retrieval (top-10 from < 1M docs)      < 200 ms
Hybrid retrieval (top-10 from 100M docs)      < 600 ms
Question answering (with synthesis)           1–5 s, depending on model
Faceted exploration                           < 300 ms

Common patterns

Find supporting evidence for a claim

Pose the claim as a question. The QA flow returns the strongest supporting passages with citations.

Find counter-evidence

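Negation often confuses naive search: a claim and its negation share almost all of their words, so a contradicting passage can look like a near-perfect match. Use the structured filter contradicts:<claim> for explicit counter-evidence retrieval.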
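To see why negation trips up similarity-based retrieval, compare a claim with its negation using token-set overlap (Jaccard similarity), a crude proxy for how a naive retriever sees text. The sentences are invented examples; the sketch shows only the failure mode, not how the contradicts: filter works internally.

```python
import re

def jaccard(a, b):
    """Token-set overlap: a crude proxy for how similar two texts look to a naive retriever."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb)

claim = "Drug X is safe for children"
counter = "Drug X is not safe for children"
unrelated = "Drug X dosage for adults"

sim_counter = jaccard(claim, counter)      # 6/7 ≈ 0.857: the contradiction looks like a near-duplicate
sim_unrelated = jaccard(claim, unrelated)  # 3/8 = 0.375
```

A similarity score alone cannot tell "agrees with" from "disagrees with"; that distinction has to come from an explicit signal such as the contradicts: filter.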

Track a topic over time

Save the query, enable notifications, get a weekly digest of new matches.

Compare two perspectives

Run the same query against two different collections (legal team’s notes vs. opposing counsel’s submissions, your protocol vs. a published guideline) and view side-by-side.
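The side-by-side view amounts to running one query twice and pairing the results rank by rank. In the sketch below, `search` is any function returning ranked passages (a hypothetical stand-in for the platform's search call), and the toy keyword matcher and passages exist only to make the example runnable.

```python
from itertools import zip_longest

def compare_collections(query, search, left, right, top_k=3):
    """Run one query against two collections and pair results rank by rank.

    `search(query, collection, top_k)` is a hypothetical search function.
    """
    left_hits = search(query, left, top_k)
    right_hits = search(query, right, top_k)
    return list(zip_longest(left_hits, right_hits, fillvalue="(no result)"))

def keyword_search(query, collection, top_k):
    """Toy search: passages containing every query term (substring match), in collection order."""
    words = query.lower().split()
    return [p for p in collection if all(w in p.lower() for w in words)][:top_k]

our_notes = ["Notice period was waived by email", "Notice period is 90 days"]
their_submission = ["Notice period was never waived"]
rows = compare_collections("notice period", keyword_search, our_notes, their_submission)

for ours, theirs in rows:
    print(f"{ours:<40} | {theirs}")
```

Pairing by rank keeps the comparison aligned even when one collection returns fewer matches; the shorter side is padded rather than silently dropped.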

Related

  • DSM — the entity graph that powers structured queries
  • Multilingual — cross-language details
  • Chat — conversational search
  • API Reference — programmatic access