Deployment

Docker & Kubernetes

Self-hosted deployments ship as standard cloud-native primitives: Helm charts and Compose files, both GPU-aware, with the operational surface a platform team expects.

Availability: Self-hosted (Docker / Kubernetes) deployments are offered to Business & Enterprise customers (sales-led). The managed EU cloud is the self-serve path; feature availability follows the platform roadmap.

Deployment options

Option	When to use
Docker Compose	Single-node pilots, development environments, small teams
Helm on Kubernetes	Production, multi-node, auto-scaling
Operator (K8s)	Advanced — CRD-based lifecycle management
Bare binary + systemd	Minimal-dependency environments

What you’re deploying

StellarBase ships as a set of stateless application services plus a small number of stateful dependencies. Stateless services scale horizontally; stateful components are industry-standard and either bundled or pluggable with managed offerings.

The architecture is intentionally boring: cloud-native primitives, a Helm umbrella chart, no exotic runtime. If your team operates Kubernetes, they already know how to operate StellarBase.

Stateful dependencies

Component	Role
PostgreSQL 14+	Metadata, config, audit log
Redis 7+	Queues, rate limits, ephemeral state
Object storage (S3-compatible)	Documents, model weights, backups
Vector index	Semantic search

All four are replaceable with managed offerings (RDS, ElastiCache, managed MinIO, etc.) or with your existing internal infrastructure. Sizing is workload-dependent — we provide reference values per tier in the Helm chart.

Kubernetes requirements

Kubernetes 1.27+
Ingress controller (NGINX, Traefik, Istio — your choice)
StorageClass for persistent volumes (SSD recommended)
NVIDIA GPU Operator for GPU workloads
cert-manager for TLS (or your own cert pipeline)
Metrics-server for HPA
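
The version floor above can be checked in a pre-flight script. The following is a minimal sketch, assuming the cluster reports its version in the usual `v1.29.3` or `v1.27.8-eks-abc123` form; the function names are illustrative and not part of any StellarBase tooling.

```python
import re

MIN_K8S = (1, 27)  # minimum from the requirements list


def parse_k8s_version(git_version: str) -> tuple:
    """Extract (major, minor) from strings such as 'v1.29.3' or 'v1.27.8-eks-abc123'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognised Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(git_version: str, minimum: tuple = MIN_K8S) -> bool:
    """Compare numerically, so that 1.9 correctly sorts below 1.27."""
    return parse_k8s_version(git_version) >= minimum
```

Comparing integer tuples rather than strings avoids the common mistake of treating `1.9` as newer than `1.27`.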

Helm chart structure

Single umbrella chart with sub-charts per service. The values file is the primary configuration surface. Common customisations:

Replica counts and resource requests / limits
GPU selector for inference pods
Storage class and sizes
Secrets source (K8s secrets, Vault, AWS Secrets Manager)
Ingress hostnames and TLS
Model choices (which LLMs to install)

We provide opinionated defaults for three tiers (pilot, department, enterprise). Override anything.
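
To illustrate how a tier default and an override combine, here is a minimal sketch of the layering behaviour Helm applies to values files: nested maps are merged key by key, while scalars and lists in the override replace the default wholesale. The keys shown (`api`, `inference`, `gpu.count`) are hypothetical and do not reflect the real chart schema.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Return base with override layered on top; neither input is mutated."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested maps merge recursively.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists replace the default entirely.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical "pilot" tier defaults and a site-specific override.
pilot_defaults = {
    "api": {"replicas": 1, "resources": {"requests": {"cpu": "500m", "memory": "1Gi"}}},
    "inference": {"gpu": {"count": 1}},
}
overrides = {"api": {"replicas": 3}, "inference": {"gpu": {"count": 2}}}

effective = merge_values(pilot_defaults, overrides)
```

Only the keys named in the override change; the CPU and memory requests from the tier default carry through untouched.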

GPU scheduling

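Inference pods declare GPU requirements as resource requests. The NVIDIA GPU Operator exposes GPUs to the cluster as schedulable resources, and the Kubernetes scheduler places the pods onto matching nodes. Multi-GPU inference is handled automatically: you size the pool, we manage placement.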

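For multi-tenant isolation, partition GPUs with MIG (Multi-Instance GPU) on H100 / H200 and assign instances to different tenants. MIG provides hardware-level memory and fault isolation between instances; plain time-slicing shares the whole GPU and does not.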
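
As a sketch of what a GPU requirement looks like in a pod spec, the helper below builds the relevant fragment. `nvidia.com/gpu` is the extended resource name advertised by the NVIDIA device plugin, and `nvidia.com/gpu.product` is a node label set by GPU feature discovery; where these values live in the StellarBase chart is an assumption, and the product string is only an example.

```python
from typing import Optional


def gpu_pod_overrides(gpu_count: int, gpu_product: Optional[str] = None) -> dict:
    """Build the resources / nodeSelector fragment for an inference pod."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    spec = {
        # GPUs are requested as limits; Kubernetes does not overcommit them.
        "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
    }
    if gpu_product is not None:
        # Optional: pin to nodes carrying a specific GPU model label.
        spec["nodeSelector"] = {"nvidia.com/gpu.product": gpu_product}
    return spec


# Example: two GPUs, restricted to nodes labelled with a given product name.
fragment = gpu_pod_overrides(2, gpu_product="NVIDIA-H100-80GB-HBM3")
```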

Scaling policies

Horizontal Pod Autoscaler: stateless services scale on CPU plus custom metrics (queue depth, request rate). Custom metrics require a metrics adapter (for example Prometheus Adapter) in addition to metrics-server.
Vertical Pod Autoscaler: recommended for learning correct resource sizing. Start in recommendation-only mode (updateMode "Off") and apply the recommendations after a week.
Cluster autoscaler: for cloud K8s (EKS, GKE, AKS), scales nodes up / down with workload.
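
To reason about how the Horizontal Pod Autoscaler reacts to queue depth or request rate, it helps to work through its documented scaling rule: desired replicas = ceil(current replicas × current metric / target metric), with no change while the ratio is within a tolerance (0.1 by default), and the largest proposal winning when several metrics are configured. The sketch below models that rule only; stabilisation windows and scaling policies are left out, and the replica bounds are illustrative.

```python
import math


def desired_replicas(current_replicas, metrics, tolerance=0.1,
                     min_replicas=1, max_replicas=10):
    """metrics is a list of (current_value, target_value) pairs, one per metric."""
    if not metrics:
        raise ValueError("at least one metric is required")
    proposals = []
    for current_value, target_value in metrics:
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to target: this metric proposes no change.
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # With several metrics, the largest proposed replica count wins.
    desired = max(proposals)
    return max(min_replicas, min(max_replicas, desired))
```

For example, four replicas at 90% CPU against a 60% target yield six replicas, while a low CPU reading cannot scale the service in if queue depth is still twice its target.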

Networking

Service mesh

A service mesh (Istio or Linkerd) is recommended for mTLS between services. We provide PeerAuthentication and AuthorizationPolicy manifests (Istio resource kinds).

Ingress

Single ingress per environment. TLS via cert-manager or your own certificates. WAF optional (recommended for internet-exposed deployments).

Egress

Default: deny all egress, with an explicit allowlist for required destinations (LLM providers, connector targets). For air-gapped deployments, block all egress with no allowlist.
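
The default-deny policy can be modelled in a few lines, which is useful for reviewing an allowlist before it is enforced. This is a sketch of the decision logic only, with hypothetical hostnames. Actual enforcement happens in the network layer, and standard Kubernetes NetworkPolicy matches IP ranges rather than hostnames, so hostname allowlists typically rely on a mesh egress gateway or a CNI that supports FQDN rules.

```python
def is_egress_allowed(host: str, allowlist: set, air_gapped: bool = False) -> bool:
    """Default deny: a host is reachable only if an allowlist entry matches it."""
    if air_gapped:
        # Air-gapped deployments block everything, allowlist or not.
        return False
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            # "*.example.com" matches subdomains, not the bare domain.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False


# Hypothetical allowlist: one LLM provider endpoint and one connector domain.
ALLOWLIST = {"llm.example.com", "*.connectors.example.org"}
```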

Observability

Prometheus metrics on every service
OpenTelemetry traces for request-level debugging
Structured JSON logs, shippable to Loki / Elasticsearch / your SIEM
Grafana dashboards bundled in the Helm chart
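
For custom sidecars or connectors that should log in the same shape, structured JSON logging can be sketched with the Python standard library as below. The field names (`ts`, `level`, `logger`, `message`, `correlation_id`) are assumptions for illustration and may differ from what StellarBase services actually emit.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Present only when the caller passes extra={"correlation_id": ...}.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("example.connector")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("indexed %d documents", 3, extra={"correlation_id": "req-123"})
```

One JSON object per line keeps the output directly ingestible by Loki, Elasticsearch, or a SIEM without a custom parser.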

Backup & restore

Postgres: continuous backup to object storage with point-in-time recovery, tested quarterly.

Object storage: versioning enabled. Cross-region replication optional.

Config: IaC in your Git. The Helm chart + values.yaml is the source of truth.

Upgrades

Rolling updates via helm upgrade. Blue-green option for zero-downtime in production. Database migrations run automatically on startup.

Always upgrade in lower environments first. Read the release notes — minor versions occasionally require a brief read-only window for certain migrations.
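
A deployment pipeline can flag upgrades that cross a minor or major version boundary, since those are the ones that may involve a read-only window. The sketch below assumes releases use three-part `MAJOR.MINOR.PATCH` version strings, which is an assumption about the versioning scheme; it does not replace reading the release notes.

```python
def parse_version(version: str) -> tuple:
    """Parse 'v2.4.0' or '2.4.0' into (2, 4, 0)."""
    parts = version.lstrip("v").split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return tuple(int(part) for part in parts)


def upgrade_kind(current: str, target: str) -> str:
    """Classify an upgrade as 'major', 'minor', 'patch', or 'none'."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        return "none"  # same version or a downgrade
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    return "patch"


def may_need_read_only_window(current: str, target: str) -> bool:
    """True when the upgrade crosses a minor or major boundary."""
    return upgrade_kind(current, target) in {"major", "minor"}
```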

Debugging

Health endpoints — every service exposes /healthz and /readyz
Correlation IDs — track a single request across services via X-Correlation-Id
Audit log — operational actions visible in the admin UI
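
The health endpoints and correlation header can be combined in a small probe script. This is a sketch under two assumptions: that a healthy service answers /healthz and /readyz with HTTP 200, and that services honour a caller-supplied X-Correlation-Id rather than always generating their own. The base URL in the usage note is hypothetical.

```python
import urllib.error
import urllib.request
import uuid


def correlation_headers(correlation_id=None) -> dict:
    """Return request headers carrying a fresh or caller-supplied correlation ID."""
    return {"X-Correlation-Id": correlation_id or str(uuid.uuid4())}


def probe(base_url: str, path: str = "/healthz", timeout: float = 2.0):
    """Return (ok, correlation_id) so a failing probe can be traced in the logs."""
    headers = correlation_headers()
    request = urllib.request.Request(base_url.rstrip("/") + path, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200, headers["X-Correlation-Id"]
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response all count as unhealthy.
        return False, headers["X-Correlation-Id"]
```

Calling `probe("http://api.stellarbase.internal:8080", "/readyz")` returns the health result together with the ID to search for in logs and traces.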

Docker Compose (for pilots)

Simpler for small deployments:

Single-host, services as containers
Same images as K8s
Production-grade but limited to vertical scaling on one machine
Good for pilots up to ~50 users