On-Premise Deployment
The full platform in your own infrastructure. Same features, same UX, same APIs as managed cloud — but you own the data and the operations.
When to go on-premise
- Regulatory requirement: data cannot leave your data centre
- IP sensitivity: process parameters, trade secrets, privileged information
- Volume economics: per-token pricing becomes expensive past a threshold
- Integration tightness: you need the platform inside the same network as key internal systems
- Latency: sub-millisecond response to internal users
For zero-internet-egress deployments, see Air-gapped. For a mix of managed + on-prem, see Hybrid.
What ships
Container images
Signed OCI images for the full StellarBase service set, imported into your private registry. See Docker & Kubernetes for the operational shape.
Helm chart
Kubernetes deployment with sensible defaults. Customisable values.yaml. Covers ingress, TLS, persistent volumes, resource limits, autoscaling rules.
Docker Compose (alternative)
For single-node or small multi-node deployments where Kubernetes would be overkill. Production-grade but less scalable.
Models
StellarCloud model weights packaged as signed tarballs. Choose which to install at deploy time: LLMs alone can be ~240 GB, while specialized models are much smaller.
Tooling
Admin CLI, health-check scripts, backup / restore utilities, migration scripts, monitoring setup.
Infrastructure requirements
Compute
| Scale | Nodes | vCPU | RAM |
|---|---|---|---|
| Pilot (< 50 users) | 3 | 24 | 96 GB |
| Department (< 500 users) | 6 | 96 | 384 GB |
| Enterprise (< 5,000 users) | 12+ | 200+ | 1 TB+ |
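The sizing table above can be read as a simple lookup. The sketch below is illustrative only: `ComputeTier` and `pick_tier` are hypothetical names, not part of the Admin CLI, and the user bounds are treated as exclusive because the table writes them with "<".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeTier:
    name: str
    max_users: int  # exclusive upper bound, as written in the table
    nodes: str
    vcpu: str
    ram: str


# Rows copied from the compute table above.
TIERS = [
    ComputeTier("Pilot", 50, "3", "24", "96 GB"),
    ComputeTier("Department", 500, "6", "96", "384 GB"),
    ComputeTier("Enterprise", 5000, "12+", "200+", "1 TB+"),
]


def pick_tier(users: int) -> ComputeTier:
    """Return the smallest tier whose user bound exceeds `users`."""
    if users < 0:
        raise ValueError("users must be non-negative")
    for tier in TIERS:
        if users < tier.max_users:
            return tier
    raise ValueError("user count is beyond the published table")


print(pick_tier(120).name)  # Department
```

A user count that sits close to a tier boundary is probably worth raising during the planning week rather than resolving by lookup.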
GPU (for LLMs)
| Workload | Minimum | Recommended |
|---|---|---|
| Specialized models only | 1x L4 | 2x L4 / L40S |
| + GPT-OSS 120B | 2x H100 | 4x H100 |
| + Qwen 3.5 397B | 8x H100 | 8x H200 |
You don’t need to host all LLMs — you choose. Some customers host only the specialized models and route LLM calls to StellarCloud (if connected) or commercial providers via StellarGate.
Storage
| Component | Typical size |
|---|---|
| Postgres (metadata, config) | 100 GB (pilot) → 5 TB (enterprise) |
| Object storage (documents, embeddings) | Depends on corpus — plan ~15% of corpus size |
| Model weights | ~4 GB (specialized only) → 250 GB (with large LLMs) |
| Logs + audit | 100 GB / month for typical workloads |
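As a rough worked example, the figures in the storage table can be combined into a first-pass capacity estimate. This is a sketch under stated assumptions: `estimate_storage_gb` is a hypothetical helper, the log retention period is an input you choose (the table gives only a monthly rate), and the Postgres size is passed in because the table gives a range rather than a formula.

```python
def estimate_storage_gb(corpus_gb, log_retention_months, large_llms, postgres_gb=100.0):
    """First-pass storage plan in GB, using the 'Typical size' figures above."""
    if corpus_gb < 0 or log_retention_months < 0:
        raise ValueError("sizes and retention must be non-negative")
    plan = {
        # 100 GB (pilot) up to 5 TB (enterprise); the caller picks a value.
        "postgres": float(postgres_gb),
        # "plan ~15% of corpus size"
        "object_storage": corpus_gb * 15 / 100,
        # ~4 GB for specialized models only, 250 GB with large LLMs
        "model_weights": 250.0 if large_llms else 4.0,
        # 100 GB per month for typical workloads; retention is an assumption
        "logs_audit": 100.0 * log_retention_months,
    }
    plan["total"] = sum(plan.values())
    return plan


plan = estimate_storage_gb(corpus_gb=2000, log_retention_months=12, large_llms=True)
print(plan["total"])  # 1850.0
```

In this example the retained logs (1,200 GB) dominate the total, so the retention period you choose matters more than the corpus size.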
Network
- Internal network with low-latency connectivity between nodes (10 GbE recommended)
- TLS-terminating ingress (we provide the config)
- Egress policy: no egress required for air-gapped deployments; controlled egress for connected deployments
Identity
Integration with your IdP (SAML, OIDC, LDAP/AD). Local users supported for emergencies.
Deployment process
- Planning (Week 1): capacity sizing, network topology, security review
- Infrastructure (Week 2): your team provisions Kubernetes, Postgres, object storage, GPUs
- Install (Week 3): Helm deployment, configuration, first login
- Identity + connectors (Weeks 3–4): SSO integration, first data sources
- Pilot (Weeks 4–6): first users, validation, security review
- Go-live (Weeks 6–8): full rollout
A straightforward deployment typically takes 6–8 weeks. Regulated or air-gapped deployments add 4–8 weeks for security review and certification.
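The timeline arithmetic can be sketched as a small date calculation: 6–8 weeks for the base case, and 10–16 weeks once the additional 4–8 weeks for regulated or air-gapped deployments are included. `go_live_window` is a hypothetical helper for planning purposes, not a commitment to specific dates.

```python
from datetime import date, timedelta

BASE_WEEKS = (6, 8)             # straightforward deployment
REGULATED_EXTRA_WEEKS = (4, 8)  # added for regulated / air-gapped


def go_live_window(start: date, regulated: bool = False):
    """Return the (earliest, latest) go-live dates for a given start date."""
    earliest, latest = BASE_WEEKS
    if regulated:
        earliest += REGULATED_EXTRA_WEEKS[0]
        latest += REGULATED_EXTRA_WEEKS[1]
    return start + timedelta(weeks=earliest), start + timedelta(weeks=latest)


first, last = go_live_window(date(2025, 1, 6))
print(first, last)  # 2025-02-17 2025-03-03
```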
Updates
Updates ship as signed Helm chart bumps or image updates. Cadence:
- Security patches — within 14 days of release (critical patches within 48 hours)
- Minor releases — quarterly
- Major releases — annually
You control when to apply. Blue-green deployments via Helm provide zero-downtime upgrades. Roll back with a single Helm command.
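For illustration, the single-command rollback could be wrapped as follows. The sketch only builds the command line for the standard `helm rollback RELEASE [REVISION]` command; the release name `stellarbase` and namespace `stellar` are placeholders, so substitute the values from your own installation. When no revision is given, Helm rolls back to the previous release.

```python
import shlex
from typing import List, Optional


def helm_rollback_cmd(
    release: str,
    revision: Optional[int] = None,
    namespace: Optional[str] = None,
) -> List[str]:
    """Build the argv for `helm rollback`; nothing is executed here."""
    cmd = ["helm", "rollback", release]
    if revision is not None:
        cmd.append(str(revision))
    if namespace:
        cmd += ["--namespace", namespace]
    # --wait blocks until the rolled-back resources report ready.
    cmd.append("--wait")
    return cmd


# Placeholder release name, revision, and namespace.
print(shlex.join(helm_rollback_cmd("stellarbase", 41, "stellar")))
# helm rollback stellarbase 41 --namespace stellar --wait
```

Running `helm history` for the release first shows which revision numbers are available as rollback targets.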
Support model
Named engineers assigned to your deployment. Communication via:
- Your preferred channel (email, Slack, Teams, phone)
- Health-check data you share on request
- Remote support (if permitted) via your controlled channels
- On-site engineering support for installation + annual review (business tier)
Monitoring
Prometheus / Grafana stack integrated. Dashboards for:
- Application health (request rate, latency, errors)
- Ingestion pipeline (throughput, lag, failures)
- LLM inference (tokens, GPU utilization, queue depth)
- Storage (disk, backup status)
- Security (auth failures, policy violations)
Metrics exportable to your existing observability stack (Datadog, New Relic, etc.).
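If you want to pull a figure such as the application error ratio into your own tooling, the Prometheus HTTP API (`/api/v1/query`) accepts a PromQL expression. The sketch below only builds the query and URL. The metric name `http_requests_total` and its `status` label are assumptions, since the metric names the platform actually exposes are not listed here, and `prometheus:9090` is a placeholder address.

```python
from urllib.parse import urlencode


def error_ratio_query(metric: str = "http_requests_total", window: str = "5m") -> str:
    """PromQL for the share of requests answered with a 5xx status.

    The metric and label names are assumptions; check your dashboards
    for the names actually in use.
    """
    errors = f'sum(rate({metric}{{status=~"5.."}}[{window}]))'
    total = f"sum(rate({metric}[{window}]))"
    return f"{errors} / {total}"


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Placeholder Prometheus address.
print(instant_query_url("http://prometheus:9090", error_ratio_query()))
```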
Backup & recovery
- Postgres: continuous backup to object storage
- Object storage: versioning + cross-bucket replication (your choice of target)
- Point-in-time recovery tested quarterly
- Configuration / IaC in your Git
Licensing
Annual licence based on:
- Number of users / Bases
- Models included (some large LLMs have separate licence terms)
- Support tier
Per-token pricing does not apply to self-hosted deployments: requests are unlimited within your licence.
Related
- Deployment Overview
- Docker & Kubernetes — technical details
- Air-gapped
- Hybrid
