Product Apr 10, 2026 4 min read

StellarCloud: One API for Every AI Model

Access GPT-4, Claude, Gemini, Mistral, Llama, and your own models through a single API. Smart routing picks the best model for every query.

Why a Unified API Matters

The AI landscape is fragmented. Every model provider has its own API format, authentication scheme, rate limits, error codes, and pricing model. If you want to use GPT-4 for one task, Claude for another, and Llama for cost-sensitive workloads, you’re managing three different integrations, three sets of credentials, and three billing dashboards.

This isn’t just an engineering inconvenience. It creates real organizational problems:

Vendor lock-in — switching models means rewriting integration code, updating error handling, and retraining teams
Inconsistent monitoring — usage, costs, and performance are scattered across multiple dashboards with no unified view
Slow experimentation — testing a new model requires a new integration, which means weeks of work instead of minutes
Security sprawl — each provider needs its own API key management, access controls, and audit trail

StellarCloud eliminates all of this. One API key, one format, one dashboard. Swap models by changing a single string parameter.
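To make the single-parameter swap concrete, here is a minimal sketch that builds the same request for two different models. The `buildChatRequest` helper is hypothetical and not part of any StellarCloud SDK; the payload shape is assumed to follow the chat completions format used in the curl examples in this post.

```javascript
// Hypothetical helper: builds a chat request body in the OpenAI-style
// format used by the examples in this post. Only `model` varies.
function buildChatRequest(model, messages) {
  return { model, messages };
}

const sharedMessages = [{ role: 'user', content: 'Explain quantum computing' }];

// Model identifiers taken from the examples later in this post.
const modelsToTry = ['gpt-4o', 'claude-sonnet-4-20250514'];

// Same messages, different model string: nothing else in the request changes.
const requests = modelsToTry.map((model) => buildChatRequest(model, sharedMessages));

for (const request of requests) {
  console.log(JSON.stringify(request));
}
```

Each request body differs only in its `model` field, which is the point of a unified format.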

Smart Routing

Not every query needs GPT-4. A simple translation doesn’t require the same model as a complex legal analysis. But today, most teams either use the same model for everything (overpaying for simple tasks) or manually route queries (adding complexity and latency).

StellarCloud’s smart routing solves this automatically. Set model: "auto" and define your optimization targets — cost, latency, quality, or a balance of all three. The router analyzes query complexity in real time and picks the optimal model.

curl https://api.stellarbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sb_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "Translate this to French: Hello"}
    ],
    "routing": {
      "optimize": "cost",
      "max_latency_ms": 2000
    }
  }'

The routing engine uses a lightweight classifier that adds less than 10ms of overhead. It considers query length, complexity signals, language, and your configured constraints. Over time, it learns from your usage patterns to make better routing decisions.
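The routed request from the curl example can also be assembled in code. The sketch below is illustrative only: `buildAutoRoutedRequest` is a hypothetical helper, and of the four target strings only `"cost"` appears in this post, so `"latency"`, `"quality"`, and `"balanced"` are assumed values.

```javascript
// Optimization targets named in this post. Only "cost" is confirmed by the
// curl example; the other string values are assumptions.
const OPTIMIZE_TARGETS = ['cost', 'latency', 'quality', 'balanced'];

// Hypothetical helper: builds a request body that asks for automatic routing.
function buildAutoRoutedRequest(messages, { optimize = 'balanced', maxLatencyMs } = {}) {
  if (!OPTIMIZE_TARGETS.includes(optimize)) {
    throw new Error(`Unknown optimization target: ${optimize}`);
  }
  const routing = { optimize };
  // Only include the latency cap when the caller sets one.
  if (maxLatencyMs !== undefined) {
    routing.max_latency_ms = maxLatencyMs;
  }
  return { model: 'auto', messages, routing };
}

const routedBody = buildAutoRoutedRequest(
  [{ role: 'user', content: 'Translate this to French: Hello' }],
  { optimize: 'cost', maxLatencyMs: 2000 }
);
console.log(JSON.stringify(routedBody, null, 2));
```

Validating the target on the client side catches typos before a request is sent.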

Routing Strategies

Cost optimization — routes to the cheapest model that meets quality thresholds. Typical savings: 40-60% versus always using a frontier model
Latency optimization — picks the fastest model that meets quality requirements. Critical for real-time applications
Quality optimization — always routes to the highest-capability model, with fallback ordering
Balanced — weighted combination of cost, latency, and quality. The default for most workloads
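As a rough illustration of how these four strategies differ, the sketch below picks a model from a small catalog. Everything in it is hypothetical: the model names, the cost, latency, and quality numbers, the quality threshold, and the equal weights in the balanced case are invented for the example. It is not StellarCloud's routing engine, and it does not model fallback ordering.

```javascript
// Hypothetical catalog: all names and numbers are made up for illustration.
const catalog = [
  { name: 'small-model', costPerMTok: 0.2, latencyMs: 300, quality: 0.6 },
  { name: 'mid-model', costPerMTok: 1.0, latencyMs: 600, quality: 0.8 },
  { name: 'frontier-model', costPerMTok: 10.0, latencyMs: 1500, quality: 0.95 },
];

// Returns the candidate with the lowest value of key(model); ties keep the first.
function minBy(models, key) {
  return models.reduce((best, m) => (key(m) < key(best) ? m : best));
}

function pickModel(strategy, { minQuality = 0.7, maxLatencyMs = Infinity } = {}) {
  // The quality threshold and latency cap apply to every strategy.
  const eligible = catalog.filter(
    (m) => m.quality >= minQuality && m.latencyMs <= maxLatencyMs
  );
  if (eligible.length === 0) {
    throw new Error('No model satisfies the constraints');
  }
  switch (strategy) {
    case 'cost':
      return minBy(eligible, (m) => m.costPerMTok);
    case 'latency':
      return minBy(eligible, (m) => m.latencyMs);
    case 'quality':
      return minBy(eligible, (m) => -m.quality);
    case 'balanced': {
      // Scale cost and latency by the largest eligible value, then add the
      // quality shortfall. Equal weights are an assumption.
      const maxCost = Math.max(...eligible.map((m) => m.costPerMTok));
      const maxLatency = Math.max(...eligible.map((m) => m.latencyMs));
      return minBy(
        eligible,
        (m) => m.costPerMTok / maxCost + m.latencyMs / maxLatency + (1 - m.quality)
      );
    }
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
}

console.log(pickModel('cost').name);
console.log(pickModel('quality').name);
```

With the default threshold of 0.7, the cheapest model is excluded, which mirrors the idea of routing to the cheapest model that still meets a quality bar.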

EU Data Residency

All StellarCloud infrastructure runs in EU data centers. Your prompts, responses, and usage data are never stored outside the European Union. This is a hard guarantee, not a best-effort policy.

For models that don’t offer EU endpoints natively (like some OpenAI models), StellarCloud routes through our EU proxy layer. The prompt leaves our EU infrastructure, hits the model’s API, and the response returns to our EU infrastructure — but your data is never stored outside the EU. Prompts and responses are held in memory only for the duration of the request.

This makes StellarCloud compliant with GDPR data residency requirements without any additional configuration on your part.

Pricing

StellarCloud uses transparent per-token pricing with no markup on open-source models and a 5% infrastructure fee on commercial models. You pay for exactly what you use: no monthly minimums, no seat licenses, no hidden fees.

Open-source models (Llama, Mistral, Mixtral) — at cost, we don’t add any margin
Commercial models (GPT-4, Claude, Gemini) — provider price + 5% infrastructure fee
Smart routing — no additional charge, included with every plan
Volume discounts — automatic tiered pricing that kicks in at 10M tokens/month

Every request is logged with exact token counts and costs in your dashboard, broken down by model, team, and project.
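The fee structure above reduces to simple arithmetic. The sketch below works through it under stated assumptions: the two model entries and their per-million-token prices are hypothetical, input and output tokens are priced the same for simplicity, and volume discounts are left out because the tier rates are not given here.

```javascript
// Fee schedule as described in this post: open-source models at cost,
// commercial models at provider price plus a 5% infrastructure fee.
const INFRASTRUCTURE_FEE = 0.05;

// Hypothetical provider prices in USD per 1M tokens; real prices will differ.
const exampleModels = {
  'open-model': { pricePerMTok: 0.5, commercial: false },
  'commercial-model': { pricePerMTok: 10, commercial: true },
};

function estimateCostUsd(modelName, tokens) {
  const model = exampleModels[modelName];
  if (!model) throw new Error(`Unknown model: ${modelName}`);
  const providerCost = (tokens / 1_000_000) * model.pricePerMTok;
  // The 5% fee applies only to commercial models.
  return model.commercial ? providerCost * (1 + INFRASTRUCTURE_FEE) : providerCost;
}

console.log(estimateCostUsd('open-model', 2_000_000).toFixed(2)); // 1.00
console.log(estimateCostUsd('commercial-model', 2_000_000).toFixed(2)); // 21.00
```

For 2M tokens on the hypothetical commercial model, the provider cost is $20.00 and the 5% fee adds $1.00.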

Supported Models

StellarCloud currently supports over 30 models across six providers:

OpenAI — GPT-4o, GPT-4o mini, GPT-4 Turbo, o1, o1-mini
Anthropic — Claude Opus 4, Claude Sonnet 4, Claude Haiku 3.5
Google — Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash
Mistral — Mistral Large, Mistral Medium, Mistral Small, Codestral
Meta — Llama 3.3 70B, Llama 3.1 8B, Llama 3.1 405B
Custom — bring your own model via GGUF upload or vLLM endpoint

New models are typically available within 48 hours of public release. We run compatibility tests against the unified API format before enabling any new model.

Getting Started

You can start using StellarCloud in under two minutes. Here’s the quickest path:

Step 1: Get Your API Key

Sign up at app.stellarbase.ai and generate an API key from the StellarCloud section. No credit card required for the free tier (100K tokens/month).

Step 2: Make Your First Call

StellarCloud is OpenAI-compatible, so you can use any existing OpenAI client library. Here’s a simple curl request:

curl https://api.stellarbase.ai/v1/chat/completions \
  -H "Authorization: Bearer sb_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ]
  }'
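The same call can be made from JavaScript without any client library. This is a minimal sketch: `chatCompletion` is a hypothetical helper, the endpoint and headers are copied from the curl request above, and the response shape is assumed to match the OpenAI chat completions format because the API is described as OpenAI-compatible.

```javascript
// Sends a chat completion request to the endpoint used in the curl example.
// `fetchImpl` is injectable so the function can be exercised without network
// access; it defaults to the global fetch available in Node 18+ and browsers.
async function chatCompletion(apiKey, body, fetchImpl = fetch) {
  const response = await fetchImpl('https://api.stellarbase.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    // The error body format is not documented in this post, so only the
    // HTTP status is reported.
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (requires a real key and network access):
// const data = await chatCompletion('sb_your_key', {
//   model: 'gpt-4o',
//   messages: [{ role: 'user', content: 'Explain quantum computing' }],
// });
// console.log(data.choices[0].message.content);
```

Passing the fetch implementation as a parameter keeps the helper easy to test with a stub.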

Step 3: Use Our SDK (Optional)

For a better developer experience, install the StellarBase SDK:

npm install @stellarbase/sdk

import StellarCloud from '@stellarbase/sdk';

const cloud = new StellarCloud('sb_your_key');

const response = await cloud.chat({
	model: 'claude-sonnet-4-20250514',
	messages: [{ role: 'user', content: 'Summarize this document...' }]
});

console.log(response.choices[0].message.content);

That’s it. You now have access to every major AI model through a single API. Switch models, enable smart routing, and monitor usage — all from one place. Visit stellarbase.ai/inference to learn more.