Docker & Kubernetes
Self-hosted deployments ship as standard cloud-native primitives: GPU-aware Helm charts and Compose files, with the operational surface a platform team expects.
Deployment options
| Option | When to use |
|---|---|
| Docker Compose | Single-node pilots, development environments, small teams |
| Helm on Kubernetes | Production, multi-node, auto-scaling |
| Operator (K8s) | Advanced — CRD-based lifecycle management |
| Bare binary + systemd | Minimal-dependency environments |
What you’re deploying
StellarBase ships as a set of stateless application services plus a small number of stateful dependencies. Stateless services scale horizontally; stateful components are industry-standard and either bundled or pluggable with managed offerings.
The architecture is intentionally boring: cloud-native primitives, a Helm umbrella chart, no exotic runtime. If your team operates Kubernetes, they already know how to operate StellarBase.
Stateful dependencies
| Component | Role |
|---|---|
| PostgreSQL 14+ | Metadata, config, audit log |
| Redis 7+ | Queues, rate limits, ephemeral state |
| Object storage (S3-compatible) | Documents, model weights, backups |
| Vector index | Semantic search |
All four are replaceable with managed offerings (RDS, ElastiCache, managed MinIO, etc.) or with your existing internal infrastructure. Sizing is workload-dependent — we provide reference values per tier in the Helm chart.
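As a rough illustration of swapping bundled dependencies for managed ones, a values override might look like the sketch below. Every key name here is an assumption made for illustration, as are the hostnames and the Secret name; consult the chart's own values file for the real structure.

```yaml
# Hypothetical values.yaml fragment. Key names, hosts, and the Secret name
# are illustrative placeholders, not the chart's documented schema.
postgresql:
  enabled: false                      # skip the bundled PostgreSQL
externalPostgresql:
  host: stellarbase-db.example.internal     # e.g. an RDS endpoint
  port: 5432
  database: stellarbase
  existingSecret: stellarbase-db-credentials  # Secret holding username/password
redis:
  enabled: false                      # skip the bundled Redis
externalRedis:
  host: stellarbase-cache.example.internal  # e.g. an ElastiCache endpoint
  port: 6379
objectStorage:
  endpoint: https://s3.example.internal     # any S3-compatible endpoint
  bucket: stellarbase-data
```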
Kubernetes requirements
- Kubernetes 1.27+
- Ingress controller (NGINX, Traefik, Istio — your choice)
- StorageClass for persistent volumes (SSD recommended)
- NVIDIA GPU Operator for GPU workloads
- cert-manager for TLS (or your own cert pipeline)
- Metrics-server for HPA
Helm chart structure
Single umbrella chart with sub-charts per service. The values file is the primary configuration surface. Common customisations:
- Replica counts and resource requests / limits
- GPU selector for inference pods
- Storage class and sizes
- Secrets source (K8s secrets, Vault, AWS Secrets Manager)
- Ingress hostnames and TLS
- Model choices (which LLMs to install)
We provide opinionated defaults for three tiers (pilot, department, enterprise). Override anything.
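A minimal sketch of an override file covering the customisations listed above. The key names are assumed for illustration and may not match the chart; only the Kubernetes resource names (such as nvidia.com/gpu) and the nvidia.com/gpu.present node label, which GPU feature discovery sets on GPU nodes, are standard.

```yaml
# Hypothetical override file (for example values-prod.yaml).
# All keys are assumed names; the real chart may organise them differently.
api:
  replicaCount: 3
  resources:
    requests: { cpu: "500m", memory: 1Gi }
    limits: { cpu: "2", memory: 4Gi }
inference:
  nodeSelector:
    nvidia.com/gpu.present: "true"    # schedule only onto GPU nodes
  resources:
    limits:
      nvidia.com/gpu: 1               # one GPU per inference pod
persistence:
  storageClass: fast-ssd              # placeholder StorageClass name
  size: 100Gi
ingress:
  host: stellarbase.example.com       # placeholder hostname
  tls:
    secretName: stellarbase-tls
```

Keeping overrides in a separate file per environment, layered on top of a tier default, keeps the diff against the shipped defaults small and reviewable.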
GPU scheduling
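Inference pods declare their GPU requirements as resource requests. The NVIDIA GPU Operator installs the drivers and device plugin that advertise GPUs on each node, and the Kubernetes scheduler places pods onto nodes with matching capacity. Multi-GPU inference is handled automatically: you size the pool, we manage placement.
For multi-tenant isolation, use MIG partitioning (supported on H100 / H200) to assign hardware-isolated GPU slices to different tenants. MIG is distinct from GPU time-slicing, which shares a GPU without memory or fault isolation.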
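For reference, this is roughly how a pod requests a GPU once the device plugin is running. The pod name, image, and node label value are placeholders; the MIG resource name depends on the MIG strategy and profile configured on your nodes, so treat it as an example only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-example             # hypothetical name
spec:
  nodeSelector:
    # Assumed label value; check your nodes with
    # `kubectl get nodes --show-labels` before relying on it.
    nvidia.com/gpu.product: NVIDIA-H100-80GB-HBM3
  containers:
    - name: inference
      image: registry.example.com/stellarbase/inference:1.0.0  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1           # one full GPU
          # With MIG exposed under the "mixed" strategy, request a slice
          # instead of a full GPU, for example:
          # nvidia.com/mig-1g.10gb: 1
```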
Scaling policies
- Horizontal Pod Autoscaler — stateless services scale on CPU + custom metrics (queue depth, request rate).
- Vertical Pod Autoscaler — recommended for learning correct resource sizing. Start in recommendation mode, apply after a week.
- Cluster autoscaler — for cloud K8s (EKS, GKE, AKS), scales nodes up / down with workload.
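The first two policies can be sketched as the manifests below. The Deployment name api and the metric name queue_depth are assumptions; note that metrics-server only supplies CPU and memory, so a per-pod custom metric also needs a custom metrics adapter (such as Prometheus Adapter) installed in the cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                         # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70      # scale out above 70% average CPU
    - type: Pods
      pods:
        metric:
          name: queue_depth           # assumed custom metric name
        target:
          type: AverageValue
          averageValue: "30"          # target items queued per pod
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"                 # recommendation only, never evicts pods
```

With updateMode set to "Off", the VPA only publishes recommendations, so it does not compete with the HPA over the same pods.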
Networking
Service mesh
Istio or Linkerd recommended for mTLS between services. We provide PeerAuthentication and AuthorizationPolicy manifests.
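For orientation, Istio manifests of this kind typically look like the sketch below. This is not the shipped manifest set: the namespace, workload label, and service account are assumed names.

```yaml
# Require mTLS for every workload in the namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: stellarbase              # hypothetical namespace
spec:
  mtls:
    mode: STRICT
---
# Allow only the gateway's service account to call the api workload.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-gateway-to-api
  namespace: stellarbase
spec:
  selector:
    matchLabels:
      app: api                        # assumed workload label
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/stellarbase/sa/gateway"]  # assumed service account
```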
Ingress
Single ingress per environment. TLS via cert-manager or your own certificates. WAF optional (recommended for internet-exposed deployments).
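A minimal sketch of a single TLS-terminating Ingress with a cert-manager-issued certificate. The hostname, ClusterIssuer name, ingress class, and backend Service are placeholders for illustration.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stellarbase
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod  # assumed ClusterIssuer name
spec:
  ingressClassName: nginx             # match your ingress controller
  tls:
    - hosts: [stellarbase.example.com]
      secretName: stellarbase-tls     # cert-manager writes the certificate here
  rules:
    - host: stellarbase.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gateway         # hypothetical Service name
                port:
                  number: 80
```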
Egress
Default: deny all egress, with an explicit allowlist for required destinations (LLM providers, connector targets). For air-gapped deployments, deny all egress with no allowlist.
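One way to express a deny-by-default egress posture is with standard NetworkPolicy objects, sketched below under assumed names. The namespace, pod label, and CIDR (a documentation range) are placeholders. Standard NetworkPolicy matches IP ranges rather than hostnames, so allowlisting a provider by domain usually needs CNI-specific policies or an egress gateway.

```yaml
# Deny all egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: stellarbase              # hypothetical namespace
spec:
  podSelector: {}
  policyTypes: [Egress]
---
# Allow DNS plus HTTPS to one external range for selected pods only.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-and-llm-provider
  namespace: stellarbase
spec:
  podSelector:
    matchLabels:
      app: inference-gateway          # assumed pod label
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system  # cluster DNS
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24      # placeholder provider range
      ports:
        - { protocol: TCP, port: 443 }
```

For an air-gapped deployment, the second policy would keep only the DNS rule or be dropped entirely.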
Observability
- Prometheus metrics on every service
- OpenTelemetry traces for request-level debugging
- Structured JSON logs, shippable to Loki / Elasticsearch / your SIEM
- Grafana dashboards bundled in the Helm chart
Backup & restore
Postgres: continuous backup to object storage with point-in-time recovery, tested quarterly.
Object storage: versioning enabled. Cross-region replication optional.
Config: IaC in your Git. The Helm chart + values.yaml is the source of truth.
Upgrades
Rolling updates via helm upgrade. Blue-green option for zero-downtime in production. Database migrations run automatically on startup.
Always upgrade in lower environments first. Read the release notes — minor versions occasionally require a brief read-only window for certain migrations.
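For context, a rolling update that never drops below full capacity is usually configured on the Deployment as in this sketch. The names and image tag are placeholders, and the chart's actual strategy settings may differ.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                           # hypothetical Deployment name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0               # never remove a pod before its replacement is ready
      maxSurge: 1                     # add at most one extra pod during the rollout
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/stellarbase/api:1.1.0  # placeholder image tag
```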
Debugging
- Health endpoints: every service exposes /healthz and /readyz
- Correlation IDs: track a single request across services via X-Correlation-Id
- Audit log: operational actions visible in the admin UI
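The health endpoints map naturally onto Kubernetes probes. The container fragment below is a sketch that assumes /healthz is the liveness check, /readyz is the readiness check, and the service listens on port 8080; none of these details are confirmed here.

```yaml
# Fragment of a pod spec; container name, image, and port are assumptions.
containers:
  - name: api
    image: registry.example.com/stellarbase/api:1.0.0  # placeholder image
    ports:
      - containerPort: 8080
    livenessProbe:                    # restart the container if this fails
      httpGet: { path: /healthz, port: 8080 }
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:                   # remove the pod from Service endpoints if this fails
      httpGet: { path: /readyz, port: 8080 }
      periodSeconds: 5
```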
Docker Compose (for pilots)
Simpler for small deployments:
- Single-host, services as containers
- Same images as K8s
- Production-grade but limited to vertical scaling on one machine
- Good for pilots up to ~50 users
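A single-host layout of this kind might look like the Compose sketch below. Service names, image references, ports, and environment variable names are all placeholders rather than the shipped Compose file. The GPU reservation uses standard Compose syntax and requires the NVIDIA Container Toolkit on the host.

```yaml
# Hypothetical docker-compose.yml; images and variable names are illustrative.
services:
  api:
    image: registry.example.com/stellarbase/api:1.0.0
    ports: ["8080:8080"]
    environment:
      DATABASE_URL: postgres://stellarbase:${POSTGRES_PASSWORD}@postgres:5432/stellarbase
      REDIS_URL: redis://redis:6379/0
    depends_on: [postgres, redis]
  inference:
    image: registry.example.com/stellarbase/inference:1.0.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1                # reserve one GPU for this container
              capabilities: [gpu]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: stellarbase
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}   # supplied via .env or the shell
      POSTGRES_DB: stellarbase
    volumes: ["pgdata:/var/lib/postgresql/data"]
  redis:
    image: redis:7
volumes:
  pgdata:                             # named volume so data survives container restarts
```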
