On-Premise Deployment

Run the full platform in your own infrastructure. You get the same UX and APIs as the managed cloud, but you own the data and the operations.

Availability: This deployment mode is offered to Business & Enterprise customers (sales-led). The managed EU cloud is the self-serve path; feature availability follows the platform roadmap.

When to go on-premise

Regulatory requirement: data cannot leave your data center
IP sensitivity: process parameters, trade secrets, privileged information
Volume economics: per-token pricing becomes expensive past a threshold
Integration tightness: you need the platform inside the same network as key internal systems
Latency: sub-millisecond response to internal users

For zero-internet-egress deployments, see Air-gapped. For a mix of managed + on-prem, see Hybrid.

What ships

Container images

Signed OCI images for the full StellarBase service set, imported into your private registry. See Docker & Kubernetes for the operational shape.

Helm chart

Kubernetes deployment with sensible defaults. Customisable values.yaml. Covers ingress, TLS, persistent volumes, resource limits, autoscaling rules.

Docker Compose (alternative)

For single-node or small multi-node deployments where Kubernetes would be overkill. Production-grade but less scalable.

Models

StellarCloud model weights packaged as signed tarballs. Choose which to install at deploy time: the LLMs alone can take ~240 GB, while the specialized models are much smaller (~4 GB).

Tooling

Admin CLI, health-check scripts, backup / restore utilities, migration scripts, monitoring setup.

Infrastructure requirements

Compute

Scale	Nodes	vCPU	RAM
Pilot (< 50 users)	3	24	96 GB
Department (< 500 users)	6	96	384 GB
Enterprise (< 5,000 users)	12+	200+	1 TB+
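
The compute table can be read as a lookup from expected user count to a sizing tier. The following is a minimal Python sketch, assuming the user counts in the table are strict upper bounds and that anything at or above 5,000 users needs a custom sizing exercise. The names Tier, TIERS and pick_tier are illustrative and not part of the platform tooling.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    max_users: int  # exclusive upper bound taken from the table
    nodes: str
    vcpu: str
    ram: str


# Values copied from the compute table; "+" marks an open-ended minimum.
TIERS = [
    Tier("Pilot", 50, "3", "24", "96 GB"),
    Tier("Department", 500, "6", "96", "384 GB"),
    Tier("Enterprise", 5000, "12+", "200+", "1 TB+"),
]


def pick_tier(users: int) -> Optional[Tier]:
    """Return the smallest tier whose user bound exceeds `users`, else None."""
    if users < 0:
        raise ValueError("users must be non-negative")
    for tier in TIERS:
        if users < tier.max_users:
            return tier
    return None  # beyond the table: assumed to need custom sizing


tier = pick_tier(120)
print(tier.name, tier.nodes, tier.vcpu, tier.ram)
```

Treat the result as a starting point for the planning phase rather than a final specification.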

GPU (for LLMs)

Workload	Minimum	Recommended
Specialized models only	1x L4	2x L4 / L40S
+ GPT-OSS 120B	2x H100	4x H100
+ Qwen 3.5 397B	8x H100	8x H200

You don’t need to host all LLMs — you choose. Some customers host only the specialized models and route LLM calls to StellarCloud (if connected) or commercial providers via StellarGate.
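
The split between locally hosted models and externally routed LLM calls amounts to a small routing decision. The sketch below is an assumption about how such a rule could look, not the platform's actual routing logic: the function name, the model identifiers and the "local" / "external" labels are all hypothetical. "external" stands for StellarCloud or a commercial provider reached over controlled egress.

```python
from typing import Set


def route_model(model: str, hosted_locally: Set[str], egress_allowed: bool) -> str:
    """Decide where a request for `model` would be served.

    Returns "local" for models installed on-premise and "external" for
    models reached over controlled egress. Both labels are illustrative.
    """
    if model in hosted_locally:
        return "local"
    if egress_allowed:
        return "external"
    # Air-gapped case: no egress, so only locally installed models work.
    raise LookupError(f"{model!r} is not hosted locally and egress is disabled")


# Hypothetical model identifiers, used only for the example.
hosted = {"specialized-extractor"}
print(route_model("specialized-extractor", hosted, egress_allowed=True))
print(route_model("gpt-oss-120b", hosted, egress_allowed=True))
```

In an air-gapped deployment the same call with egress_allowed=False raises an error for any model that was not installed at deploy time, which is why model selection belongs in the planning phase.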

Storage

Component	Typical size
Postgres (metadata, config)	100 GB (pilot) → 5 TB (enterprise)
Object storage (documents, embeddings)	Depends on corpus — plan ~15% of corpus size
Model weights	~4 GB (specialized only) → 250 GB (with large LLMs)
Logs + audit	100 GB / month for typical workloads
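
The storage rows combine into a rough capacity estimate. The sketch below is a back-of-the-envelope calculation, assuming the 15% figure means object storage of about 0.15 times the corpus size and that logs grow linearly at 100 GB per month. The function name and the example inputs (a 2 TB corpus, 12 months of log retention) are illustrative, not recommendations.

```python
from typing import Dict


def estimate_storage_gb(
    corpus_gb: float,
    postgres_gb: float,
    model_weights_gb: float,
    log_retention_months: int,
    logs_gb_per_month: float = 100.0,    # "typical workloads" row
    object_storage_ratio: float = 0.15,  # "~15% of corpus size" row
) -> Dict[str, float]:
    """Sum the storage components from the table into one estimate (GB)."""
    parts = {
        "postgres": postgres_gb,
        "object_storage": corpus_gb * object_storage_ratio,
        "model_weights": model_weights_gb,
        "logs": logs_gb_per_month * log_retention_months,
    }
    parts["total"] = sum(parts.values())
    return parts


# Example: pilot-sized Postgres, specialized models only, one year of logs.
estimate = estimate_storage_gb(
    corpus_gb=2000, postgres_gb=100, model_weights_gb=4, log_retention_months=12
)
for component, size in estimate.items():
    print(f"{component:<15} {size:8.0f} GB")
```

With these inputs the log volume dominates, so the retention period is worth settling early.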

Network

Internal network with low-latency connectivity between nodes (10 GbE recommended)
TLS-terminating ingress (we provide the config)
Egress policy — none required for air-gapped; controlled egress for connected deployments

Identity

Integration with your IdP (SAML, OIDC, LDAP/AD). Local users supported for emergencies.

Deployment process

Planning (Week 1) — capacity sizing, network topology, security review
Infrastructure (Week 2) — your team provisions Kubernetes, Postgres, object storage, GPUs
Install (Week 3) — Helm deployment, configuration, first login
Identity + connectors (Week 3–4) — SSO integration, first data sources
Pilot (Week 4–6) — first users, validation, security review
Go-live (Week 6–8) — full rollout

A straightforward deployment typically takes 6–8 weeks. Regulated or air-gapped environments add 4–8 weeks for security review and certification.
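
To turn the week numbers into calendar dates for a project plan, the phases can be laid out from a chosen start date. This is a small sketch, assuming each week runs seven days from the start date and that overlapping phases (for example Install and Identity + connectors in week 3) run in parallel. The start date is arbitrary and the schedule function is illustrative.

```python
from datetime import date, timedelta
from typing import List, Tuple

# (phase, first week, last week), taken from the deployment process list.
PHASES = [
    ("Planning", 1, 1),
    ("Infrastructure", 2, 2),
    ("Install", 3, 3),
    ("Identity + connectors", 3, 4),
    ("Pilot", 4, 6),
    ("Go-live", 6, 8),
]


def schedule(start: date) -> List[Tuple[str, date, date]]:
    """Map each phase to inclusive start and end dates."""
    rows = []
    for name, first_week, last_week in PHASES:
        begins = start + timedelta(weeks=first_week - 1)
        ends = start + timedelta(weeks=last_week) - timedelta(days=1)
        rows.append((name, begins, ends))
    return rows


for name, begins, ends in schedule(date(2025, 1, 6)):
    print(f"{name:<22} {begins} .. {ends}")
```

For regulated or air-gapped deployments, the additional 4–8 weeks would extend the end of the plan accordingly.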

Updates

Updates ship as signed Helm chart bumps or image updates. Cadence:

Security patches — within 14 days of release (critical patches within 48 hours)
Minor releases — quarterly
Major releases — annually

You control when to apply. Blue-green deployments via Helm provide zero-downtime upgrades. Roll back with a single Helm command.
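
As an illustration of the rollback step, the sketch below builds a standard helm rollback invocation from Python. The release name and namespace ("stellarbase") are assumptions for the example, so substitute the values used in your installation. Nothing is executed unless run_rollback is called.

```python
import subprocess
from typing import List, Optional


def rollback_command(
    release: str, namespace: str, revision: Optional[int] = None
) -> List[str]:
    """Build a `helm rollback` command line.

    When `revision` is omitted, Helm rolls back to the previous revision.
    """
    cmd = ["helm", "rollback", release]
    if revision is not None:
        cmd.append(str(revision))
    # --wait blocks until the rolled-back resources report ready.
    cmd += ["--namespace", namespace, "--wait"]
    return cmd


def run_rollback(release: str, namespace: str, revision: Optional[int] = None) -> int:
    """Execute the rollback; raises CalledProcessError if Helm fails."""
    return subprocess.run(
        rollback_command(release, namespace, revision), check=True
    ).returncode


# Hypothetical release and namespace names; printed, not executed.
print(" ".join(rollback_command("stellarbase", "stellarbase", revision=7)))
```

Running helm history for the release beforehand shows which revision numbers are available as rollback targets.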

Support model

Named engineers are assigned to your deployment. Communication runs through:

Your preferred channel (email, Slack, Teams, phone)
Health-check data you share on request
Remote support (if permitted) via your controlled channels
On-site engineering support for installation + annual review (Business tier)

Monitoring

An integrated Prometheus / Grafana stack provides dashboards for:

Application health (request rate, latency, errors)
Ingestion pipeline (throughput, lag, failures)
LLM inference (tokens, GPU utilization, queue depth)
Storage (disk, backup status)
Security (auth failures, policy violations)

Metrics exportable to your existing observability stack (Datadog, New Relic, etc.).
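
Besides exporting to another observability stack, the same metrics can be read directly through the standard Prometheus HTTP API. The sketch below issues an instant query and parses the response. The Prometheus address and the metric name http_requests_total are assumptions for illustration, since the actual metric names exposed by the platform are not listed here.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def parse_instant_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's value is [unix_timestamp, "value as string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 10.0):
    """Run the query against a live Prometheus (not called in this example)."""
    with urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))


# Hypothetical address and metric name.
print(query_url("http://prometheus:9090", "sum(rate(http_requests_total[5m]))"))

# A hand-written sample response in the documented API shape.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"job": "api"}, "value": [1700000000, "12.5"]}],
    },
}
print(parse_instant_vector(sample))
```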

Backup & recovery

Postgres: continuous backup to object storage
Object storage: versioning + cross-bucket replication (your choice of target)
Point-in-time recovery tested quarterly
Configuration / IaC in your Git

Licensing

Annual licence based on:

Number of users / Bases
Models included (some large LLMs have separate licence terms)
Support tier

Per-token pricing does not apply to self-hosted deployments: requests are unlimited within your licence.
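
Because the licence is flat, the volume threshold at which self-hosting becomes cheaper than per-token pricing is a simple break-even calculation. The sketch below shows the arithmetic only: the cost and price figures are placeholders, not actual StellarBase prices, and the annual cost is assumed to include licence, infrastructure and operations.

```python
def break_even_tokens_per_year(
    annual_self_hosted_cost: float, price_per_million_tokens: float
) -> float:
    """Tokens per year above which a flat annual cost beats per-token pricing."""
    if price_per_million_tokens <= 0:
        raise ValueError("price_per_million_tokens must be positive")
    return annual_self_hosted_cost / price_per_million_tokens * 1_000_000


# Placeholder figures: 500,000 per year all-in, 2.00 per million tokens.
tokens = break_even_tokens_per_year(500_000, 2.0)
print(f"Break-even at about {tokens / 1e9:.0f} billion tokens per year")
```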