Smart Routing

Pick the cheapest / fastest / highest-quality model that meets your constraints, automatically. You describe what you care about; we pick.

Why smart routing

StellarCloud offers several LLMs with different price-quality-latency trade-offs. Hardcoding a specific model in every agent locks you in:

Cost goes up because you’re over-provisioning for simple tasks
Quality suffers because you’re using a small model for hard queries
You have to rewrite dozens of configs when a better model appears

Smart routing solves this by picking per-request based on your declared preferences.

How it works

Instead of pinning a specific model, you specify a routing policy ("model": "@route:cheap-first"). The router looks at your request, weighs it against your policy’s constraints, picks the model that best satisfies them, and returns the completion plus metadata about what was used.

Built-in policies

Policy	Behaviour
`@route:cheap-first`	Pick the cheapest model that can plausibly handle the task. Escalate only if quality gates fail.
`@route:fast-first`	Pick the lowest-latency model. Trades quality for speed.
`@route:best`	Always pick the highest-quality model. Use when cost is secondary.
`@route:balanced`	Default policy. A sensible mid-tier choice unless the request needs frontier reasoning, long context, or strong multilingual handling.
`@route:code`	Optimised for code generation and tool-calling.
`@route:long-context`	Pick the model with the largest context window.

Custom policies

Define your own routing rules. Dimensions you can constrain:

Budget — max cost per request, max daily / monthly spend
Latency — P50 / P95 target
Quality floor — a minimum benchmark score (from an internal eval set)
Context size — route by expected prompt length
Language — route certain languages to specific models
Model allowlist — restrict to a subset of models
Fallback chain — if primary fails or exceeds SLO, try next

Fallbacks

Every routing decision has an implicit fallback chain. If the chosen model is down, rate-limited, or returns an error, the router tries the next eligible model. You always get a response (or a definitive error) — you never see transient provider outages.

Observability

The response metadata tells you:

Which model was chosen
Why (policy + which constraint dominated)
Alternatives considered
Estimated cost vs. actual

Dashboards show distribution of model usage over time — useful for cost optimization and for spotting cases where a policy is under-routing or over-routing.

When not to use smart routing

Regulatory fixation — if compliance requires a specific model for audit reasons, hardcode it
Reproducibility — for workflows where the exact model is part of the “recipe” (e.g. a reproducible research pipeline), pick explicitly
Fine-grained benchmarking — while A/B testing a model for a specific task, pick explicitly

Evaluating models

For your own workloads, we provide an eval harness:

Build a test set from your real queries
Run all candidate models against it
Score with a metric of your choice (exact match, LLM-as-judge, human labels)
See which model wins on your specific workload

Use the results to tune your routing policy — or to justify hardcoding.