StellarBase
Docs StellarCloud Large Language Models
StellarCloud

Large Language Models

Open-weights generative models served from EU GPUs. Different sizes, different strengths — pick one explicitly, or let smart routing decide.

Available models

GPT-OSS 120B

Open-weights LLM released by OpenAI. Broad general-purpose capability, instruction-tuned, reliable for most workloads. Apache 2.0 licence. 128K context.

  • Best for: general question-answering, summarisation, most agent workflows, drafting
  • Price: €0.20 / 1M input tokens · €0.80 / 1M output tokens
  • Latency: depends on output length; fastest of the three on short responses

Devstral 2

Mistral’s coding-focused model. Strong on software engineering tasks, tool use, agentic workflows with multi-step reasoning. Apache 2.0. 128K context.

  • Best for: code generation + review, technical writing, workflows that invoke many tools, structured output
  • Price: €0.50 / 1M input tokens · €2.00 / 1M output tokens
  • Latency: moderate. Strong tool-calling compliance reduces retry overhead.

Qwen 3.5 397B A17B

Frontier-tier open-weights model. Multilingual reasoning, long context, top-end quality. 256K context.

  • Best for: complex reasoning, multilingual workloads, long-context analysis (full contracts, long papers), anywhere you’d use GPT-4
  • Price: €0.70 / 1M input tokens · €3.80 / 1M output tokens
  • Latency: highest. The model is the largest; responses take longest.

Choosing between them

You care about…Pick
Cheapest inferenceGPT-OSS 120B
Fastest response on short promptsGPT-OSS 120B
Best coding / tool-calling qualityDevstral 2
Long-context (> 128K tokens)Qwen 3.5 397B
Best multilingual (EU languages)Qwen 3.5 397B
Best reasoning on hard problemsQwen 3.5 397B
Budget-balanced defaultSmart routing

Context windows

A “context window” is the maximum combined size of input + output. Larger is not always better — longer contexts cost more and are slower. Typical guidance:

  • Short interactions (chat, Q&A): any model at its default context is fine
  • Document analysis (one contract, one paper): 128K is enough for 300+ pages
  • Corpus analysis (multiple documents, long conversations): 256K (Qwen 3.5) or retrieval-augmented

Because StellarBase agents do retrieval-augmented generation by default, you rarely need to pass huge context — the knowledge base serves the relevant passages and the model operates on a curated subset. Long-context models are still useful for specific workflows (summarising a full regulation, analysing a 500-page monograph).

Structured output

All three models support structured output (JSON Schema or Pydantic-like definitions). Devstral 2 has the strongest schema compliance — we recommend it for workflows that must produce strict outputs (CSV rows, database inserts, fields for external systems).

Streaming

Token-by-token streaming is supported on all three models. Default for chat-like workloads, optional for agent workflows where you want the full response before acting.

Tool use / function calling

All three support OpenAI-compatible tool calling. Devstral 2 has the most reliable tool-calling behaviour across complex multi-step workflows.

Fine-tuning

Not currently offered as a managed service for these models. For customisation, use:

  • System prompts — for behavioural tuning
  • Retrieval — for knowledge adaptation (almost always the right answer)
  • On-premise deployment — if you need full weight control, self-host the model in your infrastructure and fine-tune there

Commercial LLMs (GPT-4, Claude, Gemini)

Not hosted on StellarCloud — they’re proprietary and hosted by their vendors. You can still use them from StellarBase via StellarGate, which anonymizes your prompts before forwarding them to the vendor.

Related