Specialized Models
Small, fast, purpose-built models for language detection, lemmatization, entity recognition and linking, document parsing, embeddings, and re-ranking. These are the workhorses that power StellarBase internally, and they're all available as APIs.
Text processing
Language Detection
Identifies the language of a text across 1,000+ languages, including rare and low-resource ones. Lightweight, fast, CPU-only.
- Use case: route multilingual corpora to the right downstream model; tag content in mixed-language pipelines
- Accuracy: > 99% on well-formed text of 100+ characters
- Unit: per 1M requests
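The routing use case can be sketched as below. This is a minimal illustration, not the StellarBase client: `detect` stands in for whatever call returns a language code, `fake_detect` is a toy replacement that exists only to make the sketch runnable, and the pipeline names in `ROUTES` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical mapping from language code to downstream pipeline name.
ROUTES: Dict[str, str] = {"cs": "czech-pipeline", "pl": "polish-pipeline"}
DEFAULT_ROUTE = "multilingual-pipeline"


def route_by_language(
    texts: Iterable[str],
    detect: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Group texts by the pipeline chosen for their detected language."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        lang = detect(text)  # assumed to return a code such as "cs"
        batches[ROUTES.get(lang, DEFAULT_ROUTE)].append(text)
    return dict(batches)


def fake_detect(text: str) -> str:
    """Toy detector: looks only at the first word. Not a real model."""
    greetings = {"ahoj": "cs", "witaj": "pl"}
    words = text.lower().split()
    return greetings.get(words[0], "und") if words else "und"
```

Texts whose language has no dedicated route fall through to `DEFAULT_ROUTE`, so an unexpected language never drops a document.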
Lemmatization
Reduces inflected words to their base forms across 60+ languages. Useful for keyword search, topic modeling, and improving downstream NER performance.
- Use case: search over inflected languages (Czech, Polish, Hungarian, Finnish)
- Unit: per 1K documents
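To see why lemmatization helps search over inflected languages, consider the sketch below. It indexes documents by lemma, so a query in one inflected form matches documents that use another. The `lemmatize` function is a stand-in backed by a tiny hand-written table of Czech forms of "hrad" (castle); a real pipeline would call a lemmatizer instead.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Tiny hand-written lemma table (Czech forms of "hrad", castle).
# A stand-in for a real lemmatizer call; illustrative only.
LEMMAS: Dict[str, str] = {"hradu": "hrad", "hradem": "hrad", "hrady": "hrad"}


def lemmatize(token: str) -> str:
    """Return the base form if known, else the lowercased token."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(docs: List[str]) -> Dict[str, Set[int]]:
    """Map each lemma to the ids of the documents that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for token in doc.split():
            index[lemmatize(token)].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of documents containing every lemma in the query."""
    hits = [index.get(lemmatize(t), set()) for t in query.split()]
    return set.intersection(*hits) if hits else set()
```

With this index, the query "hrady" finds documents containing "hradu" or "hradem", which a plain exact-token index would miss.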
Zero-shot Named Entity Recognition
You specify the entity types you want (“person, medication, trial_id”) and the model finds them — no fine-tuning required.
- Use case: custom entity extraction on domain-specific corpora
- Strengths: flexibility (any entity type via prompt), multilingual
- Unit: per 1M tokens
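As a rough sketch of the zero-shot workflow: the caller sends the text together with the entity types it wants, then groups the returned spans by type. The field names (`text`, `entity_types`) and the span format (`type`, `start`, `end` as character offsets) are assumptions made for illustration, not the documented request or response schema.

```python
import json
from typing import Dict, List


def build_ner_request(text: str, entity_types: List[str]) -> str:
    """Serialize a request body. Field names are hypothetical."""
    if not entity_types:
        raise ValueError("at least one entity type is required")
    return json.dumps({"text": text, "entity_types": entity_types})


def group_by_type(text: str, spans: List[dict]) -> Dict[str, List[str]]:
    """Collect surface strings per entity type from assumed character spans."""
    grouped: Dict[str, List[str]] = {}
    for span in spans:
        surface = text[span["start"]:span["end"]]
        grouped.setdefault(span["type"], []).append(surface)
    return grouped
```

Changing the entity types means changing only the list passed to `build_ner_request`; nothing is retrained.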
Entity Linking
Takes detected entities and links them to canonical identifiers in Wikidata, your internal knowledge base, or any custom graph.
- Use case: resolve “Dr. Nováková” to a specific person in your HR directory
- Unit: per 1K documents
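The simplest form of linking is a lookup of a normalized mention in an alias table, sketched below. This is a deliberate simplification: a real linker also uses context to choose between candidates that share a name. The directory entries and the `hr:emp-1042` identifier are hypothetical.

```python
import unicodedata
from typing import Dict, Optional

TITLES = {"dr.", "dr", "prof.", "prof", "mr.", "ms.", "mrs."}

# Hypothetical alias table: normalized mention -> canonical identifier.
ALIASES: Dict[str, str] = {
    "novakova": "hr:emp-1042",
    "jana novakova": "hr:emp-1042",
}


def normalize(mention: str) -> str:
    """Strip diacritics and titles, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", mention)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = [w for w in plain.lower().split() if w not in TITLES]
    return " ".join(words)


def link(mention: str, aliases: Dict[str, str] = ALIASES) -> Optional[str]:
    """Return the canonical id for a mention, or None if it is unknown."""
    return aliases.get(normalize(mention))
```

Returning `None` for unknown mentions keeps unlinked entities visible to the caller instead of forcing a wrong match.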
Document processing
StellarOCR
Composite document-processing engine: one endpoint that returns text, layout, tables, figures, and math formulas from any PDF or image. See the StellarOCR docs for details.
- Unit: per 1K pages
- Billing: flat rate regardless of how many internal models run
Embeddings
Multilingual Embeddings — Fast
Fast multilingual embeddings for large-scale workloads.
- Use case: large-scale semantic search, clustering, duplicate detection
- Unit: per 1M tokens
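Semantic search and duplicate detection both reduce to comparing vectors. The sketch below uses cosine similarity in plain Python and assumes the vectors have already been obtained from an embeddings call. The `0.95` duplicate threshold is an arbitrary illustrative value that would need tuning on real data.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], corpus: List[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (index, score) for the k corpus vectors most similar to query."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def near_duplicates(
    corpus: List[Sequence[float]], threshold: float = 0.95
) -> List[Tuple[int, int]]:
    """Return index pairs whose similarity meets the (assumed) threshold."""
    return [
        (i, j)
        for i in range(len(corpus))
        for j in range(i + 1, len(corpus))
        if cosine(corpus[i], corpus[j]) >= threshold
    ]
```

The pairwise loop in `near_duplicates` is quadratic in corpus size, so at large scale an approximate nearest-neighbour index would normally replace it.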
Multilingual Embeddings — High-recall
Higher-quality multilingual embeddings where recall matters most.
- Use case: retrieval for mission-critical workloads where a few extra points of recall matter
- Unit: per 1M tokens
Long-context Embeddings
Multilingual embeddings tuned for long passages and whole-document retrieval.
- Use case: embed an entire contract or paper as one vector
- Unit: per 1M tokens
Image Embeddings
Visual embeddings that work directly on pixels — no text captions or labels needed.
- Use case: image search, duplicate detection, cross-modal retrieval
- Unit: per 1K images
Retrieval
Reranker
Given a query and a set of candidate passages, scores each passage for relevance to the query. Yields a sharper ranking than embedding similarity alone.
- Use case: re-rank the top results from a retrieval step before handing them to the LLM
- Unit: per 1K searches
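The retrieve-then-rerank pattern can be sketched as follows. The `score` argument stands in for a call to a reranking model, and `overlap_score` is a toy word-overlap scorer included only so the example runs; neither reflects the actual Reranker API.

```python
from typing import Callable, List


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[str]:
    """Order candidates by relevance score, highest first, and keep top_n."""
    ranked = sorted(candidates, key=lambda p: score(query, p), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, passage: str) -> float:
    """Toy scorer: fraction of query words that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0
```

In practice the first retrieval stage returns a generous candidate set, and only the `top_n` reranked passages go into the LLM prompt.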
Choosing
| Task | Capability |
|---|---|
| Detect language | Language Detection |
| Reduce words to base forms | Lemmatization |
| Extract entities (any type) | Zero-shot NER |
| Link entities to a KB | Entity Linking |
| Text → vector (fast, cheap) | Multilingual Embeddings — Fast |
| Text → vector (best recall) | Multilingual Embeddings — High-recall |
| Long documents → single vector | Long-context Embeddings |
| Image → vector | Image Embeddings |
| Re-rank search results | Reranker |
| Parse any document | StellarOCR |
Pricing
Per-unit pricing in EUR, no tiers, no minimums. See the full pricing table.
Self-hosted
Every model on this page runs inside the StellarBase on-premise bundle. For air-gapped deployments, model weights ship as signed data packs and update on your schedule. See On-Premise.
Custom models
Bring your own: Hugging Face endpoints, REST APIs, gRPC services, or local checkpoints. Register a model once as a tool, and agents and workflows can call it. See Agents.
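The register-once idea can be illustrated with a minimal name-to-callable registry. This is a generic sketch of the pattern, not the StellarBase registration API; `ToolRegistry` and its methods are hypothetical names.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Minimal name -> callable registry (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        """Register a callable under a unique name."""
        if name in self._tools:
            raise ValueError(f"tool {name!r} is already registered")
        self._tools[name] = fn

    def call(self, name: str, payload: str) -> str:
        """Invoke a registered tool by name."""
        if name not in self._tools:
            raise KeyError(f"unknown tool {name!r}")
        return self._tools[name](payload)

    def names(self) -> List[str]:
        """List registered tool names in sorted order."""
        return sorted(self._tools)
```

Any wrapper around a REST, gRPC, or local model can be registered this way as long as it exposes the same call signature, so callers depend only on the tool name.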
