AI Agents

A scoped AI specialist you configure. The agent's system prompt is where your domain expertise lives. Its tools, knowledge scope, and output schema make it predictable. Every answer is cited.

What an agent is

An agent is a configuration, not a fixed model. Five pieces define it:

Piece             What it determines
Name + role       How users invoke it; what it presents itself as
System prompt     Behaviour, constraints, your playbook
Knowledge scope   Which collections it can read
Tool allowlist    What actions it can take
Output schema     Whether responses must follow a fixed structure
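
To make the five pieces concrete, here is a minimal sketch of an agent expressed as plain data. The `AgentConfig` class, its field names, and the example values are assumptions for illustration; they are not the StellarBase configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the five pieces that define an agent."""
    name: str                                 # how users invoke it
    role: str                                 # what it presents itself as
    system_prompt: str                        # behaviour, constraints, playbook
    knowledge_scope: tuple[str, ...] = ()     # collections it can read
    tool_allowlist: tuple[str, ...] = ()      # actions it can take
    output_schema: Optional[dict] = None      # None means free-form responses


# Example values are made up for illustration.
nda_reviewer = AgentConfig(
    name="nda-reviewer",
    role="NDA Reviewer",
    system_prompt="You are an NDA reviewer. Flag clauses that deviate from the playbook.",
    knowledge_scope=("contracts/nda",),
    tool_allowlist=("kb_search", "fetch_document"),
    output_schema={"required": ["risk_level", "flag_text", "citation"]},
)
```

Treating the agent as data rather than code is what makes it easy to diff, version, and test.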

The system prompt is the asset

Most of your domain expertise (your firm’s playbook, your hospital’s protocols, your senior engineer’s heuristics) is encoded in the system prompt. It is the most valuable thing you create on the platform.

Good prompts share a structure:

  • Role definition — “You are a [specific role].”
  • Goal — what success looks like for one task
  • Inputs — what the agent will receive
  • Required outputs — exact fields, format, citation requirements
  • Rules / heuristics — the rules from your playbook, named explicitly
  • Edge cases — what to do when input is ambiguous, missing, or conflicting
  • Style — formal / informal, language, brevity
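
The structure above can be enforced mechanically. The following sketch assembles a prompt from named sections and refuses to build one if a section is missing. The section names mirror the list; `build_system_prompt` and `SECTION_ORDER` are hypothetical helpers, not part of the platform.

```python
# Section names follow the list above; the helper itself is an illustrative assumption.
SECTION_ORDER = [
    "Role", "Goal", "Inputs", "Required outputs", "Rules", "Edge cases", "Style",
]


def build_system_prompt(sections: dict[str, str]) -> str:
    """Join the sections in a fixed order, failing loudly if any is empty or absent."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing prompt sections: {', '.join(missing)}")
    return "\n\n".join(
        f"{name}:\n{sections[name].strip()}" for name in SECTION_ORDER
    )


example_prompt = build_system_prompt({
    "Role": "You are an NDA reviewer for a commercial law firm.",
    "Goal": "Identify clauses that deviate from the firm's NDA playbook.",
    "Inputs": "One NDA document, provided as text.",
    "Required outputs": "One row per flagged clause, each with a citation.",
    "Rules": "Flag any non-compete clause. Flag terms longer than the playbook allows.",
    "Edge cases": "If the governing law is missing, say so instead of guessing.",
    "Style": "Formal, concise, English.",
})
```

A fixed section order also makes prompt diffs between versions easier to read.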

Resist the urge to write one giant agent that does everything. Specialists outperform generalists. Three focused agents (NDA Reviewer, MSA Reviewer, Employment Reviewer) beat one “Contract Reviewer” that tries to be all three.

Knowledge scope

Limit each agent to the collections it actually needs. A research agent doesn’t need access to HR records. Scoping matters for:

  • Quality — agents perform better on focused corpora
  • Privacy — even authorized users may want strict data partitioning
  • Cost — smaller scope means smaller retrieval payloads
  • Speed — less to search

Scopes can be defined at the Base, collection, or document-tag level.
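
As a rough illustration of the three scope levels, the sketch below checks whether a document is readable under a scope. The `Scope` and `Document` types are hypothetical, and the rule that a grant at any one level is sufficient is an assumption; the platform may combine levels differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    base: str
    collection: str
    tags: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Scope:
    """Hypothetical scope: grants at Base, collection, or document-tag level."""
    bases: frozenset[str] = frozenset()
    collections: frozenset[str] = frozenset()
    tags: frozenset[str] = frozenset()

    def allows(self, doc: Document) -> bool:
        # Assumption: a match at any level grants read access; an empty scope denies all.
        return (
            doc.base in self.bases
            or doc.collection in self.collections
            or bool(self.tags & doc.tags)
        )


research_scope = Scope(collections=frozenset({"papers"}), tags=frozenset({"public"}))
paper = Document(base="science", collection="papers")
hr_record = Document(base="people", collection="hr-records", tags=frozenset({"confidential"}))
```

Here `research_scope.allows(paper)` is true and `research_scope.allows(hr_record)` is false, which matches the example of a research agent that has no access to HR records.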

Tools

Tools are functions an agent can invoke. The platform ships with built-ins covering the basics — knowledge-base search, single-document search, full-document fetch, OCR on attached files, entity and graph lookup, and structured-output emission against a schema.

Beyond built-ins, register your own tools — REST endpoints, custom Python scripts, your own ML models, calendar APIs, anything addressable. See Zero-Trust for how tool allowlists work.
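
The idea of an allowlist can be sketched as a gate in front of a tool registry. Everything named here (`register_tool`, `invoke_tool`, the tool names, and the stub bodies) is an assumption made for illustration and does not reflect how StellarBase registers or dispatches tools.

```python
from typing import Callable

# Hypothetical in-process registry mapping tool names to callables.
TOOL_REGISTRY: dict[str, Callable[..., object]] = {}


def register_tool(name: str):
    """Decorator that records a function in the registry under the given name."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("kb_search")
def kb_search(query: str) -> list[str]:
    return [f"stub result for {query!r}"]   # stand-in for a real search


@register_tool("send_email")
def send_email(to: str, body: str) -> str:
    return f"sent to {to}"                  # stand-in for a real side effect


def invoke_tool(allowlist: set[str], name: str, **kwargs):
    """Run a tool only if the calling agent's allowlist names it."""
    if name not in allowlist:
        raise PermissionError(f"tool {name!r} is not on this agent's allowlist")
    if name not in TOOL_REGISTRY:
        raise KeyError(f"tool {name!r} is not registered")
    return TOOL_REGISTRY[name](**kwargs)
```

With an allowlist of `{"kb_search"}`, a call to `kb_search` succeeds, while a call to `send_email` raises `PermissionError` even though the tool is registered.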

Output schema (optional but recommended)

For agents whose output feeds downstream systems (CSV exports, database inserts, workflow steps), define an output schema. The agent’s response must conform to it. Examples:

  • Risk register row: id, type, counterparty, jurisdiction, risk_level, flag_text, citation
  • Treatment option: regimen, evidence_grade, contraindications, citation_url
  • Lit-review entry: paper_id, relevance_score, summary, methods, limitations

Schema enforcement uses native structured-output features of modern LLMs. The agent will retry up to N times if its output doesn’t conform; if it still doesn’t, the workflow flags the row for human review.
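
The retry-then-flag behaviour can be sketched as follows. Note that this sketch validates the output after generation with a simple required-fields check, which is a stand-in for the native structured-output enforcement described above. The field list comes from the risk register example; `run_with_retries`, `max_retries`, and the status strings are hypothetical.

```python
import json
from typing import Callable, Optional

# Fields from the "Risk register row" example above.
REQUIRED_FIELDS = (
    "id", "type", "counterparty", "jurisdiction",
    "risk_level", "flag_text", "citation",
)


def conforms(raw: str) -> Optional[dict]:
    """Return the parsed row if it is a JSON object with every required field."""
    try:
        row = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(row, dict) and all(field in row for field in REQUIRED_FIELDS):
        return row
    return None


def run_with_retries(generate: Callable[[], str], max_retries: int = 2) -> dict:
    """Call the model once, then retry up to max_retries times on non-conforming output."""
    for _ in range(1 + max_retries):
        row = conforms(generate())
        if row is not None:
            return {"status": "ok", "row": row}
    # Still non-conforming: hand the row to a person instead of a downstream parser.
    return {"status": "needs_human_review", "row": None}
```

Here `generate` stands for whatever function calls the model; passing it in keeps the retry logic testable without a live LLM.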

Multi-agent workflows

Agents can call other agents. A “Tumor Board Coordinator” agent might invoke “Case Summarizer”, “Guideline Agent”, and “Literature Agent” in sequence, then synthesize their outputs into a final packet.

This composition is preferred over one mega-agent. Each component agent is testable in isolation; the orchestration is its own concern.
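
A minimal sketch of this kind of composition, with each agent modelled as a plain function from text to text. The agent names come from the example above; the function bodies are stubs, and `coordinator` is a hypothetical orchestration helper rather than a platform API.

```python
from typing import Callable

Agent = Callable[[str], str]


# Stub specialists; real ones would call an LLM with their own prompt and scope.
def case_summarizer(case: str) -> str:
    return f"summary of {case}"


def guideline_agent(case: str) -> str:
    return f"guidelines for {case}"


def literature_agent(case: str) -> str:
    return f"literature on {case}"


def coordinator(case: str, specialists: dict[str, Agent]) -> dict[str, str]:
    """Invoke each specialist in order, then combine their outputs into one packet."""
    packet = {name: agent(case) for name, agent in specialists.items()}
    # Stand-in for the synthesis step, which would normally be another LLM call.
    packet["synthesis"] = " | ".join(packet[name] for name in specialists)
    return packet


packet = coordinator("case-17", {
    "Case Summarizer": case_summarizer,
    "Guideline Agent": guideline_agent,
    "Literature Agent": literature_agent,
})
```

Because each specialist is an ordinary callable, it can be swapped for a stub when testing the orchestration on its own.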

Versioning

Every change to an agent’s prompt or configuration is versioned. Roll back instantly if a new version regresses. Compare two versions side-by-side on a held-out test set.
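
Conceptually, versioning with rollback behaves like an append-only history plus a pointer to the active entry. The `AgentVersions` class below is a hypothetical in-memory sketch of that idea, not the platform's storage model.

```python
class AgentVersions:
    """Append-only prompt history with a movable 'active' pointer (illustrative only)."""

    def __init__(self, initial_prompt: str):
        self._history = [initial_prompt]
        self._active = 0

    def publish(self, prompt: str) -> int:
        """Store a new version, make it active, and return its version number."""
        self._history.append(prompt)
        self._active = len(self._history) - 1
        return self._active

    def rollback(self, version: int) -> None:
        """Re-activate an earlier version without deleting anything."""
        if not 0 <= version < len(self._history):
            raise IndexError(f"no such version: {version}")
        self._active = version

    @property
    def active_prompt(self) -> str:
        return self._history[self._active]


versions = AgentVersions("You are an NDA reviewer.")
v1 = versions.publish("You are an NDA reviewer. Always cite the clause number.")
versions.rollback(0)   # the new version regressed, so return to the original
```

Rollback only moves the pointer, so the regressed version stays available for side-by-side comparison.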

Testing

Build a test set of representative inputs (5–50 cases). Run the agent against the set after every prompt change. Track quality with a simple rubric:

  • Did it produce the expected output structure?
  • Did it cite correctly?
  • Did it follow the playbook rules?
  • Did it flag edge cases appropriately?

Test sets keep prompt tuning from regressing cases that previously worked.
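
A regression check over a test set might look like the sketch below. It automates only the first two rubric questions (structure and citation); playbook and edge-case checks usually need case-specific logic or a human grader. The case format, the `RUBRIC` checks, and both stub agents are assumptions for illustration.

```python
from typing import Callable

Agent = Callable[[str], dict]

# Each check takes (case, output) and returns True on pass.
RUBRIC = {
    "structure": lambda case, out: set(case["expected_fields"]) <= set(out),
    "cited": lambda case, out: bool(out.get("citation")),
}


def run_test_set(agent: Agent, cases: list[dict]) -> dict[str, dict[str, bool]]:
    """Run the agent on every case and record the result of each rubric check."""
    results = {}
    for case in cases:
        out = agent(case["input"])
        results[case["id"]] = {name: check(case, out) for name, check in RUBRIC.items()}
    return results


def regressions(before: dict, after: dict) -> list[str]:
    """Case ids that fully passed before the prompt change and no longer do."""
    failed = {"missing": False}
    return sorted(
        cid for cid, checks in before.items()
        if all(checks.values()) and not all(after.get(cid, failed).values())
    )


CASES = [
    {"id": "nda-001", "input": "mutual NDA, 2-year term", "expected_fields": ["risk_level", "citation"]},
    {"id": "nda-002", "input": "one-way NDA, no term", "expected_fields": ["risk_level", "citation"]},
]


def agent_v1(text: str) -> dict:
    return {"risk_level": "low", "citation": "doc#1"}


def agent_v2(text: str) -> dict:
    # Simulated regression: the new prompt drops the citation when no term is stated.
    if "no term" in text:
        return {"risk_level": "high"}
    return {"risk_level": "low", "citation": "doc#1"}


print(regressions(run_test_set(agent_v1, CASES), run_test_set(agent_v2, CASES)))
# → ['nda-002']
```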

Cost & latency

Agent cost is dominated by LLM tokens. Tips:

  • Tight knowledge scope reduces retrieval payload, which reduces context size
  • Use smaller models (for example, GPT-OSS 120B rather than Qwen 3.5 397B) where quality permits
  • Cache deterministic sub-results
  • Use smart routing to pick the cheapest model that meets your quality bar
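
Two of these tips, routing and caching, can be sketched in a few lines. The model names, prices, and quality scores below are placeholder values invented for the example, and `pick_model` is a hypothetical helper; the platform's smart routing may work quite differently.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float   # placeholder price, not a real quote
    quality: float              # score on your own test set, from 0.0 to 1.0


def pick_model(options: list[ModelOption], quality_bar: float) -> ModelOption:
    """Return the cheapest model whose measured quality meets the bar."""
    eligible = [m for m in options if m.quality >= quality_bar]
    if not eligible:
        raise ValueError(f"no model meets quality bar {quality_bar}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


@lru_cache(maxsize=1024)
def normalize_jurisdiction(raw: str) -> str:
    """A deterministic sub-result: same input, same output, so it is safe to cache."""
    return raw.strip().upper()


# Hypothetical options with made-up numbers.
OPTIONS = [
    ModelOption("small-model", cost_per_1k_tokens=0.2, quality=0.82),
    ModelOption("large-model", cost_per_1k_tokens=1.0, quality=0.93),
]
```

With these placeholder numbers, a quality bar of 0.8 selects `small-model`, and a bar of 0.9 forces the more expensive `large-model`.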

Common mistakes

  • Vague prompts — “review this and tell me if it’s good” → agent doesn’t know what “good” means
  • Missing edge cases — agent confidently produces nonsense for inputs you didn’t anticipate
  • No output schema — agents drift in format over time, breaking downstream parsers
  • Too many tools — agents pick wrong tools when many are available; allowlist tightly
  • No test set — every prompt edit becomes risky

Related

  • Workflows — running agents on a schedule
  • Chat — invoking agents conversationally
  • LLMs — picking the right model for an agent
  • Zero-Trust — tool security