Search & Discovery
Natural-language retrieval across every connected source. Hybrid (semantic + keyword + structured), cross-lingual, permission-aware, citation-bound.
What “search” covers
Three layered capabilities, all exposed through the same UI and API:
- Retrieval — given a query, return ranked passages from your corpus
- Question answering — given a question, return an answer with citations to source passages
- Discovery — exploration without a specific query: clusters, recent activity, related content
Hybrid retrieval
Three retrieval signals combine in every query:
| Signal | What it captures |
|---|---|
| Semantic | Meaning — “low-cost airlines” matches “budget carriers” |
| Keyword | Exact terms — names, IDs, technical jargon |
| Structured | Filters on metadata, entities, dates, and authors |
Pure semantic search misses exact-match queries (a specific contract ID or gene name). Pure keyword search misses paraphrase. Hybrid retrieval handles both.
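One common way to combine several ranked lists is reciprocal rank fusion (RRF). The sketch below is illustrative only: this page does not say which fusion method the platform uses, and the function name, the constant `k=60`, and the document IDs are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each list contributes 1 / (k + rank) for every document it contains,
    so a document ranked well by several signals rises to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical per-signal rankings for one query.
semantic = ["doc_budget_carriers", "doc_airline_pricing", "doc_contract_4417"]
keyword = ["doc_contract_4417", "doc_airline_pricing"]
structured = ["doc_contract_4417", "doc_budget_carriers"]

fused = reciprocal_rank_fusion([semantic, keyword, structured])
print(fused)
```

In this toy input the contract document wins because two signals rank it first, even though the semantic signal alone places it last.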
Cross-lingual
A query in Czech finds relevant German, French, and Polish results. Multilingual embeddings put content from every language into a shared vector space. See Multilingual.
Permission-aware
Search respects every layer of permissions. Documents a user can’t read won’t appear in results, even if they’re semantically perfect matches. This is enforced at the index layer, not as a post-filter, so results never leak information about which documents exist.
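The difference between index-layer enforcement and a post-filter can be sketched with a toy in-memory index. Everything here is hypothetical (the `IndexedDoc` type, the group names, the term-overlap scoring); it only shows the ordering of the steps, with the permission check applied before any scoring happens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical ACL: groups allowed to read the doc


def search(index, query, user_groups, top_k=10):
    """Toy permission-aware search over an in-memory index."""
    terms = set(query.lower().split())
    # Permission check first: unreadable documents never become candidates,
    # so they cannot influence scores, result counts, or ranking.
    candidates = [d for d in index if d.allowed_groups & user_groups]
    scored = []
    for doc in candidates:
        overlap = len(terms & set(doc.text.lower().split()))
        if overlap:
            scored.append((overlap, doc.doc_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


index = [
    IndexedDoc("hr-1", "salary review merger plan", frozenset({"hr"})),
    IndexedDoc("eng-1", "merger integration plan for engineering", frozenset({"eng", "hr"})),
    IndexedDoc("pub-1", "public holiday calendar", frozenset({"everyone"})),
]

eng_results = search(index, "merger plan", frozenset({"eng"}))
hr_results = search(index, "merger plan", frozenset({"hr"}))
print(eng_results, hr_results)
```

A post-filter would instead rank all documents and drop the unreadable ones afterwards, which can reveal their existence through short result pages or shifted scores.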
Re-ranking
After initial retrieval, a reranker re-scores the candidates for relevance, which yields a sharper top 10 than embedding similarity alone. Reranking is automatic for agent and chat queries and configurable for direct API access.
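The two-stage shape (retrieve candidates, then re-score them) can be sketched as follows. The `toy_relevance` scorer is a stand-in invented for this example; a production reranker would typically be a learned model, and nothing here reflects the platform's actual reranker.

```python
def rerank(query, candidates, score_fn, top_n=10):
    """Re-score (doc_id, text) candidates and return the best doc IDs."""
    scored = [(score_fn(query, text), doc_id) for doc_id, text in candidates]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


def toy_relevance(query, text):
    """Stand-in scorer: query-term coverage plus a bonus for the exact phrase."""
    query_terms = query.lower().split()
    text_lower = text.lower()
    text_terms = set(text_lower.split())
    coverage = sum(term in text_terms for term in query_terms) / len(query_terms)
    phrase_bonus = 0.5 if query.lower() in text_lower else 0.0
    return coverage + phrase_bonus


# Hypothetical first-stage candidates, in their initial retrieval order.
candidates = [
    ("a", "Notice periods vary by contract type"),
    ("b", "The termination notice period is 90 days"),
    ("c", "Termination of employment requires written notice"),
]

top = rerank("termination notice period", candidates, toy_relevance, top_n=2)
print(top)
```

Keeping the scorer as a parameter means the first stage can stay cheap while the expensive scoring only runs on a small candidate set.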
Filters
Combine a semantic query with structured filters in one request:
- Source: only documents from a specific connector or collection
- Date: published after / before / between
- Language: only documents in specific languages
- Entity: documents mentioning a specific person / organization / project
- Type: PDFs only, emails only, etc.
- Custom metadata: any field you’ve added during ingest
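The filter types listed above can be modelled as a single predicate that narrows the candidate set before semantic ranking runs. This is a minimal sketch: the `SearchFilters` fields, the document dictionary layout, the inclusive date bounds, and the "all listed entities must be mentioned" rule are assumptions made for illustration, not the platform's schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SearchFilters:
    # All field names are hypothetical; None means "no restriction".
    sources: Optional[set] = None
    languages: Optional[set] = None
    doc_types: Optional[set] = None
    entities: Optional[set] = None
    published_from: Optional[date] = None  # inclusive (assumption)
    published_to: Optional[date] = None    # inclusive (assumption)
    metadata: dict = field(default_factory=dict)


def matches(doc: dict, f: SearchFilters) -> bool:
    """Return True if the document satisfies every active filter."""
    if f.sources is not None and doc["source"] not in f.sources:
        return False
    if f.languages is not None and doc["language"] not in f.languages:
        return False
    if f.doc_types is not None and doc["type"] not in f.doc_types:
        return False
    # Assumption: every listed entity must be mentioned in the document.
    if f.entities is not None and not f.entities <= doc["entities"]:
        return False
    if f.published_from is not None and doc["published"] < f.published_from:
        return False
    if f.published_to is not None and doc["published"] > f.published_to:
        return False
    return all(doc["metadata"].get(k) == v for k, v in f.metadata.items())


docs = [
    {"id": "d1", "source": "contracts-drive", "language": "de", "type": "pdf",
     "entities": {"Acme GmbH"}, "published": date(2024, 3, 1),
     "metadata": {"matter": "M-17"}},
    {"id": "d2", "source": "mailbox", "language": "de", "type": "email",
     "entities": {"Acme GmbH"}, "published": date(2024, 5, 9),
     "metadata": {"matter": "M-17"}},
    {"id": "d3", "source": "contracts-drive", "language": "fr", "type": "pdf",
     "entities": {"Acme GmbH", "Jane Doe"}, "published": date(2023, 11, 20),
     "metadata": {"matter": "M-02"}},
]

filters = SearchFilters(
    doc_types={"pdf"},
    entities={"Acme GmbH"},
    published_from=date(2024, 1, 1),
    metadata={"matter": "M-17"},
)
allowed_ids = [d["id"] for d in docs if matches(d, filters)]
print(allowed_ids)
```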
Question answering
For natural-language questions, search retrieves relevant passages, then an LLM synthesizes an answer grounded in those passages. Every answer cites its sources; click any citation to jump to the source passage. If no relevant passages exist, the system says so rather than hallucinating.
QA is the default behaviour in chat. For direct retrieval (just give me passages, don’t synthesize), there’s a separate retrieval mode.
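The QA flow described above (retrieve, synthesize from the retrieved passages only, cite, and decline when nothing relevant is found) can be sketched like this. The `Passage` type, the `min_score` threshold, and `stub_synthesize` are hypothetical; the stub stands in for an LLM call and simply echoes the first passage.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str
    text: str
    score: float  # retrieval relevance in [0, 1] (assumed scale)


def answer_question(question, passages, synthesize, min_score=0.5):
    """Answer only from sufficiently relevant passages; otherwise say so."""
    relevant = [p for p in passages if p.score >= min_score]
    if not relevant:
        # Declining is preferable to letting the model answer unsupported.
        return {"answer": None, "citations": [], "note": "No relevant passages found."}
    numbered = list(enumerate(relevant, start=1))
    context = "\n".join(f"[{i}] {p.text}" for i, p in numbered)
    answer = synthesize(question, context)
    citations = [{"marker": i, "doc_id": p.doc_id} for i, p in numbered]
    return {"answer": answer, "citations": citations, "note": None}


def stub_synthesize(question, context):
    """Placeholder for an LLM call: repeats the first passage with its marker."""
    marker, _, text = context.splitlines()[0].partition(" ")
    return f"{text} {marker}"


passages = [
    Passage("contract-12", "The notice period is 90 days.", 0.91),
    Passage("memo-3", "Lunch menu for Friday.", 0.12),
]

grounded = answer_question("What is the notice period?", passages, stub_synthesize)
ungrounded = answer_question("What is the notice period?", passages[1:], stub_synthesize)
print(grounded["answer"], grounded["citations"])
print(ungrounded["note"])
```

Retrieval mode corresponds to stopping after the `relevant` list is built and returning the passages themselves.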
Discovery views
Beyond explicit search, the platform offers exploratory views:
- Recent — what’s been added or modified recently
- Trending — what your team has been searching / accessing
- Related — given a document, what else is conceptually adjacent
- Topic clusters — auto-grouped documents by theme
- Citation graph — for academic / legal corpora, the network of who-cites-whom
Saved searches
Save a query for one-click re-execution. Combined with notifications, you get alerts when new documents match a saved query, which is useful for monitoring incoming literature, tracking a regulatory topic, or watching a counterparty.
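Conceptually, alerting means checking newly ingested documents against each saved query. The sketch below uses a simple "all query terms present" matcher as a stand-in for the real hybrid search; the function names and the saved-search names are invented for illustration.

```python
def all_terms_match(query, text):
    """Stand-in matcher: every query term must appear in the text."""
    return set(query.lower().split()) <= set(text.lower().split())


def alerts_for_new_documents(saved_searches, new_docs, matcher=all_terms_match):
    """Map each saved-search name to the IDs of new documents that match it."""
    alerts = {}
    for name, query in saved_searches.items():
        hits = [doc["id"] for doc in new_docs if matcher(query, doc["text"])]
        if hits:
            alerts[name] = hits
    return alerts


# Hypothetical saved searches and freshly ingested documents.
saved = {"gdpr-fines": "gdpr fine", "crispr-trials": "crispr clinical trial"}
new_docs = [
    {"id": "n1", "text": "Regulator issues record GDPR fine to retailer"},
    {"id": "n2", "text": "Quarterly earnings call transcript"},
]

alerts = alerts_for_new_documents(saved, new_docs)
print(alerts)
```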
Search via API
Programmatic access for integration with other systems. The API is standard REST and returns ranked results with provenance and confidence scores. See API Reference.
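As a rough idea of what a REST search call could look like, the sketch below builds (but does not send) a JSON POST request with the Python standard library. The base URL, the `/search` path, the bearer-token header, and every payload field are placeholders; the real endpoint and schema are defined in the API Reference.

```python
import json
import urllib.request


def build_search_request(base_url, api_key, query, top_k=10, filters=None, rerank=True):
    """Build a JSON POST request for a hypothetical search endpoint."""
    payload = {
        "query": query,
        "top_k": top_k,            # hypothetical field name
        "rerank": rerank,          # hypothetical field name
        "filters": filters or {},  # hypothetical field name
    }
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request(
    "https://api.example.invalid/v1",  # placeholder host, not a real endpoint
    "YOUR_API_KEY",
    "termination notice period",
    top_k=5,
    filters={"language": ["de", "fr"]},
)
print(req.get_method(), req.full_url)
# To send against a real endpoint: urllib.request.urlopen(req)
```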
Performance
| Operation | Typical latency |
|---|---|
| Hybrid retrieval (top-10 from < 1M docs) | < 200 ms |
| Hybrid retrieval (top-10 from 100M docs) | < 600 ms |
| Question answering (with synthesis) | 1–5 s, depending on model |
| Faceted exploration | < 300 ms |
Common patterns
Find supporting evidence for a claim
Pose the claim as a question. The QA flow returns the strongest supporting passages with citations.
Find counter-evidence
Negation often confuses naive search. Use the structured filter contradicts:<claim> for explicit counter-evidence retrieval.
Track a topic over time
Save the query, enable notifications, and get a weekly digest of new matches.
Compare two perspectives
Run the same query against two different collections (legal team’s notes vs. opposing counsel’s submissions, your protocol vs. a published guideline) and view side-by-side.
Related
- DSM — the entity graph that powers structured queries
- Multilingual — cross-language details
- Chat — conversational search
- API Reference — programmatic access
