AI Agents

A scoped AI specialist you configure. The agent's system prompt is where your domain expertise lives. Its tools, knowledge scope, and output schema make it predictable. Every answer is cited.

What an agent is

An agent is a configuration, not a fixed model. Five pieces define it:

Piece	What it determines
Name + role	How users invoke it; what it presents itself as
System prompt	Behaviour, constraints, your playbook
Knowledge scope	Which collections it can read
Tool allowlist	What actions it can take
Output schema	Whether responses must follow a fixed structure
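
To make the five pieces concrete, here is a minimal sketch of an agent expressed as plain data. The `AgentConfig` class, its field names, and the collection and tool identifiers are all hypothetical illustrations, not the platform's actual configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the five pieces; not the platform's real schema."""
    name: str                                # how users invoke the agent
    role: str                                # what it presents itself as
    system_prompt: str                       # behaviour, constraints, playbook
    knowledge_scope: tuple[str, ...] = ()    # collections it can read
    tool_allowlist: tuple[str, ...] = ()     # actions it can take
    output_schema: Optional[dict] = None     # None means free-form responses


# Illustrative values only; the collection and tool names are made up.
nda_reviewer = AgentConfig(
    name="nda-reviewer",
    role="NDA Reviewer",
    system_prompt="You are an NDA reviewer. Flag non-standard clauses and cite each one.",
    knowledge_scope=("nda-templates", "legal-playbook"),
    tool_allowlist=("knowledge_base_search", "document_fetch"),
    output_schema={"required": ["risk_level", "flag_text", "citation"]},
)
```

Treating the agent as data rather than code is what makes it easy to review, diff, and test.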

The system prompt is the asset

Most of your domain expertise (your firm’s playbook, your hospital’s protocols, your senior engineer’s heuristics) is encoded in the system prompt. This is the most valuable thing you create on the platform.

Good prompts share a structure:

Role definition — “You are a [specific role].”
Goal — what success looks like for one task
Inputs — what the agent will receive
Required outputs — exact fields, format, citation requirements
Rules / heuristics — the rules from your playbook, named explicitly
Edge cases — what to do when input is ambiguous, missing, conflicting
Style — formal / informal, language, brevity
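
The structure above can be sketched as a small template builder that refuses to produce a prompt when a section is missing. The function name, the section labels, and the example rules are assumptions chosen for illustration; real playbook rules would come from your own domain experts.

```python
SECTION_ORDER = [
    "Role", "Goal", "Inputs", "Required outputs",
    "Rules", "Edge cases", "Style",
]


def build_system_prompt(sections: dict[str, str]) -> str:
    """Assemble a prompt in a fixed order, refusing to skip any section."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError("missing prompt sections: " + ", ".join(missing))
    return "\n\n".join(f"{name}:\n{sections[name].strip()}" for name in SECTION_ORDER)


# Illustrative content only; these rules are invented for the example.
nda_prompt = build_system_prompt({
    "Role": "You are an NDA reviewer for a commercial law firm.",
    "Goal": "Identify clauses that deviate from the firm's standard NDA.",
    "Inputs": "One NDA as plain text.",
    "Required outputs": "One row per flagged clause: risk_level, flag_text, citation.",
    "Rules": "Flag any non-compete clause. Flag terms longer than five years.",
    "Edge cases": "If the governing law is missing, say so instead of guessing.",
    "Style": "Formal, concise, English.",
})
```

Failing loudly on a missing section is a deliberate choice here: an absent "Edge cases" block is easy to overlook in review and hard to diagnose later.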

Resist the urge to write one giant agent that does everything. Specialists outperform generalists. Three focused agents (NDA Reviewer, MSA Reviewer, Employment Reviewer) beat one “Contract Reviewer” that tries to be all three.

Knowledge scope

Limit each agent to the collections it actually needs. A research agent doesn’t need access to HR records. Scoping matters for:

Quality — agents perform better on focused corpora
Privacy — even authorized users may want strict data partitioning
Cost — smaller scope means smaller retrieval payloads
Speed — less to search

Scopes can be defined at Base, collection, or document-tag level.
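
As a rough illustration of the three scope levels, the sketch below checks whether a document falls inside an agent's scope. The `Document` and `Scope` types are hypothetical, and the rule that any single matching scope grants read access is an assumption; the platform may combine scopes differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    base: str
    collection: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class Scope:
    level: str  # "base", "collection", or "tag"
    value: str


def in_scope(doc: Document, scopes: list) -> bool:
    """True if any scope entry grants read access (assumed union semantics)."""
    for scope in scopes:
        if scope.level == "base" and doc.base == scope.value:
            return True
        if scope.level == "collection" and doc.collection == scope.value:
            return True
        if scope.level == "tag" and scope.value in doc.tags:
            return True
    return False


# A research agent scoped to one collection plus one tag; names are made up.
research_scopes = [Scope("collection", "papers"), Scope("tag", "public")]
paper = Document(base="science", collection="papers")
hr_record = Document(base="people", collection="hr-records",
                     tags=frozenset({"confidential"}))
```

With these definitions, `in_scope(paper, research_scopes)` is true and `in_scope(hr_record, research_scopes)` is false, which mirrors the research-agent example above.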

Tools

Tools are functions an agent can invoke. The platform ships with built-ins covering the basics — knowledge-base search, single-document search, full-document fetch, OCR on attached files, entity and graph lookup, and structured-output emission against a schema.

Beyond built-ins, register your own tools — REST endpoints, custom Python scripts, your own ML models, calendar APIs, anything addressable. See Zero-Trust for how tool allowlists work.
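
The idea of registering a custom tool and gating it behind an allowlist can be sketched in a few lines. Everything here is hypothetical: `register_tool`, `invoke_tool`, and the `calendar_lookup` tool are illustrative names, and the platform's real registration mechanism is not shown in this page.

```python
from typing import Callable

# Hypothetical in-process registry mapping tool names to callables.
TOOL_REGISTRY: dict[str, Callable[..., object]] = {}


def register_tool(name: str):
    """Decorator that adds a function to the registry under the given name."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("calendar_lookup")
def calendar_lookup(date: str) -> list[str]:
    # Stand-in for a real calendar API call.
    return [f"no events on {date}"]


def invoke_tool(allowlist: set[str], name: str, **kwargs):
    """Run a tool only if the agent's allowlist names it."""
    if name not in allowlist:
        raise PermissionError(f"tool {name!r} is not on this agent's allowlist")
    if name not in TOOL_REGISTRY:
        raise KeyError(f"tool {name!r} is not registered")
    return TOOL_REGISTRY[name](**kwargs)
```

Checking the allowlist before looking up the registry means an agent cannot even probe which tools exist outside its allowance.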

Output schema (optional but recommended)

For agents whose output feeds downstream systems (CSV exports, database inserts, workflow steps), define an output schema. The agent’s response must conform to it. Examples:

Risk register row: id, type, counterparty, jurisdiction, risk_level, flag_text, citation
Treatment option: regimen, evidence_grade, contraindications, citation_url
Lit-review entry: paper_id, relevance_score, summary, methods, limitations

Schema enforcement uses native structured-output features of modern LLMs. The agent will retry up to N times if its output doesn’t conform; if it still doesn’t, the workflow flags the row for human review.
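
The retry-then-flag behaviour can be sketched as follows, using the risk register row from the examples above. This is a plain-Python stand-in for validation, not the native structured-output mechanism the platform relies on; the function names, the `max_attempts` parameter (standing in for N), and the assumption that every field is a string are all illustrative.

```python
import json
from typing import Callable

# Field names come from the risk register example; the string types are assumed.
RISK_ROW_FIELDS = {
    "id": str, "type": str, "counterparty": str, "jurisdiction": str,
    "risk_level": str, "flag_text": str, "citation": str,
}


def conforms(raw: str, fields: dict[str, type]) -> bool:
    """True if raw is a JSON object with exactly the expected typed fields."""
    try:
        row = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(row, dict)
        and set(row) == set(fields)
        and all(isinstance(row[key], kind) for key, kind in fields.items())
    )


def run_with_retries(generate: Callable[[], str], fields: dict[str, type],
                     max_attempts: int = 3) -> dict:
    """Call the agent until its output conforms, else flag for human review."""
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        if conforms(raw, fields):
            return {"status": "ok", "row": json.loads(raw), "attempts": attempt}
    return {"status": "needs_human_review", "row": None, "attempts": max_attempts}
```

Here `generate` is any zero-argument callable that returns the agent's raw text, which keeps the sketch independent of a particular model client.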

Multi-agent workflows

Agents can call other agents. A “Tumor Board Coordinator” agent might invoke “Case Summarizer”, “Guideline Agent”, and “Literature Agent” in sequence, then synthesize their outputs into a final packet.

This composition is preferred over one mega-agent. Each component agent is testable in isolation; the orchestration is its own concern.
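
A minimal sketch of this composition, with each agent modelled as a plain function from text to text. The stub agents and the `coordinate` function are hypothetical stand-ins; a real coordinator would call configured agents, and it is an assumption here that every component receives the same case input.

```python
from typing import Callable

Agent = Callable[[str], str]


# Stub component agents; each would be a separately configured agent in practice.
def case_summarizer(case: str) -> str:
    return f"summary of {case}"


def guideline_agent(case: str) -> str:
    return f"guidelines for {case}"


def literature_agent(case: str) -> str:
    return f"literature on {case}"


def coordinate(case: str, steps: list[tuple[str, Agent]]) -> dict[str, str]:
    """Run each component agent in order, then combine their outputs."""
    packet = {name: agent(case) for name, agent in steps}
    packet["final"] = "\n".join(f"[{name}] {packet[name]}" for name, _ in steps)
    return packet


packet = coordinate("case-17", [
    ("Case Summarizer", case_summarizer),
    ("Guideline Agent", guideline_agent),
    ("Literature Agent", literature_agent),
])
```

Because the coordinator only depends on the `Agent` signature, each component can be swapped for a stub in tests, which is what makes the pieces testable in isolation.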

Versioning

🛠 On the roadmap. Agent versioning — rollback and side-by-side comparison — is on the roadmap; no firm date yet.

Once versioning ships, every change to an agent’s prompt or configuration will be versioned. You will be able to roll back instantly if a new version regresses, and compare two versions side-by-side on a held-out test set.

Testing

Build a test set of representative inputs (5–50 cases). Run the agent against the set after every prompt change. Track quality with a simple rubric:

Did it produce the expected output structure?
Did it cite correctly?
Did it follow the playbook rules?
Did it flag edge cases appropriately?

Test sets prevent prompt-tuning from regressing previously-working cases.
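
One way to turn the rubric into a repeatable check is to express each rubric question as a function over the agent's output and track the pass rate per question. The sketch below is illustrative: the rubric entries, the `fake_agent` stub, and the output field names are assumptions, and only two of the four rubric questions are shown.

```python
from typing import Callable

Check = Callable[[dict], bool]

# Two rubric questions expressed as checks; the field names are assumed.
RUBRIC: dict[str, Check] = {
    "structure": lambda out: {"risk_level", "flag_text", "citation"} <= set(out),
    "cited": lambda out: bool(out.get("citation")),
}


def run_test_set(agent: Callable[[str], dict], cases: list[str],
                 rubric: dict[str, Check]) -> dict[str, float]:
    """Return the pass rate per rubric item across all cases."""
    if not cases:
        raise ValueError("test set is empty")
    passed = {name: 0 for name in rubric}
    for case in cases:
        output = agent(case)
        for name, check in rubric.items():
            passed[name] += bool(check(output))
    return {name: passed[name] / len(cases) for name in rubric}


def regressions(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Rubric items whose pass rate dropped after a prompt change."""
    return sorted(name for name in before if after[name] < before[name])


def fake_agent(text: str) -> dict:
    # Stub agent: only cites when the input mentions a clause.
    output = {"risk_level": "low", "flag_text": text}
    if "clause" in text:
        output["citation"] = "doc-1#4"
    return output


scores = run_test_set(fake_agent, ["clause 4 is unusual", "no issues"], RUBRIC)
```

Comparing the scores from before and after a prompt edit with `regressions` gives a quick signal of which rubric item got worse.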

Cost & latency

Agent cost is dominated by LLM tokens. Tips:

Tight knowledge scope reduces retrieval payload, which reduces context size
Use smaller models (for example, GPT-OSS 120B rather than Qwen 3.5 397B) where quality permits
Cache deterministic sub-results
Use smart routing to pick the cheapest model that meets your quality bar
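
The routing tip amounts to a simple selection rule: among the models that clear your quality bar, take the cheapest. The sketch below assumes you already have a quality score per model from your own test set. The model names, costs, and scores are placeholders invented for the example, not measurements, and `pick_model` is a hypothetical helper rather than the platform's router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float  # placeholder units
    quality: float             # pass rate on your own test set, 0.0 to 1.0


def pick_model(options: list[ModelOption], quality_bar: float) -> ModelOption:
    """Return the cheapest model whose measured quality meets the bar."""
    eligible = [m for m in options if m.quality >= quality_bar]
    if not eligible:
        raise ValueError(f"no model reaches quality bar {quality_bar}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


# Placeholder numbers for illustration only.
OPTIONS = [
    ModelOption("small-model", cost_per_1k_tokens=0.2, quality=0.86),
    ModelOption("large-model", cost_per_1k_tokens=1.0, quality=0.94),
]
```

With these placeholder values, a bar of 0.85 selects `small-model` and a bar of 0.90 selects `large-model`.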

Common mistakes

Vague prompts — “review this and tell me if it’s good” → agent doesn’t know what good means
Missing edge cases — agent confidently produces nonsense for inputs you didn’t anticipate
No output schema — agents drift in format over time, breaking downstream parsers
Too many tools — agents pick wrong tools when many are available; allowlist tightly
No test set — every prompt edit becomes risky

Related

Workflows: running agents on a schedule
Chat: invoking agents conversationally
LLMs: picking the right model for an agent
Zero-Trust: tool security
