Multilingual

European knowledge lives in 24+ languages. Your search shouldn't stop at English. StellarBase ingests, indexes, and retrieves across languages by default — no configuration required.

What “multilingual” means here

Three distinct capabilities, often conflated:

1. Ingestion in any language

Every document is processed regardless of its language. Language is detected automatically — you don’t have to tag sources. OCR, layout detection, and entity extraction all work across 24+ European languages out of the box.

2. Cross-lingual retrieval

A query in Czech finds relevant German, French, and Polish documents. This is not a translation layer; it is a shared embedding space. Multilingual embedding models encode content from any language into the same vector space, so semantic matches work across the corpus regardless of the language each document was written in.
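To make the idea of a shared vector space concrete, here is a minimal sketch. The three-dimensional vectors are invented for illustration and do not come from any real embedding model; the point is only that ranking by cosine similarity ignores which language a passage is in.

```python
import math

# Hand-made vectors standing in for the output of a multilingual embedder.
# The numbers are illustrative assumptions, not real model output.
CORPUS = {
    ("de", "Datenschutzrichtlinie für Patientendaten"): [0.9, 0.1, 0.0],
    ("fr", "Politique de confidentialité des données patients"): [0.8, 0.2, 0.1],
    ("pl", "Przepis na pierogi"): [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank(query_vec, corpus):
    """Return (lang, text) keys sorted by descending similarity to the query."""
    return sorted(corpus, key=lambda key: cosine(query_vec, corpus[key]), reverse=True)


# Pretend this is the embedding of a Czech query about patient data privacy.
czech_query_vec = [0.85, 0.15, 0.05]
ranking = rank(czech_query_vec, CORPUS)

for lang, text in ranking:
    print(lang, text)
```

In this toy corpus the German and French privacy documents rank above the Polish recipe, even though none of them share a language with the query.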

3. Inline translation for reading

When you click through to a document in a language you don’t speak, an inline translation appears alongside the original. The original passage is always preserved for citation — translation is a reading aid, not a replacement.

Supported languages

Full first-class support for the following, including OCR, entity extraction, cross-lingual embeddings, and translation:

Region	Languages
Central + Eastern Europe	Czech, Slovak, Polish, Hungarian, Romanian, Bulgarian, Slovenian, Croatian, Serbian
Western Europe	English, German, French, Italian, Spanish, Portuguese, Dutch
Nordic	Swedish, Norwegian, Danish, Finnish
Baltic	Estonian, Latvian, Lithuanian
Other European	Greek, Maltese, Irish

Lower-resource languages (handled but with potentially reduced accuracy): Icelandic, Luxembourgish, Welsh, Basque, Catalan, Galician.

Beyond European: global language support via the same embedding models (Arabic, Mandarin, Japanese, Hindi, Turkish, Hebrew, Russian, Ukrainian). OCR accuracy varies for non-Latin scripts.

How it works in practice

Search

When you type a query, StellarBase:

1. Detects your query language
2. Embeds the query using the multilingual embedder
3. Retrieves the top-K passages from the entire corpus (any language)
4. Presents results with language badges (de, fr, cs, etc.)
5. Offers inline translation for hits in a language other than the query's
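The search steps above can be sketched end to end as follows. This is an illustrative outline only: `detect_language`, `embed`, `search`, and the `Hit` record are hypothetical names, and the word-lookup detector and two-concept "embedder" are toy stand-ins for the real components, which this page does not specify.

```python
from dataclasses import dataclass

# Toy lexicon mapping words from several languages onto shared concept slots.
# It stands in for a multilingual embedder (assumption for this sketch).
CONCEPTS = {
    "smlouva": 0, "vertrag": 0, "contrat": 0, "contract": 0,
    "faktura": 1, "rechnung": 1, "facture": 1, "invoice": 1,
}
LANG_MARKERS = {"smlouva": "cs", "vertrag": "de", "contrat": "fr"}


@dataclass
class Hit:
    text: str
    lang: str                # language badge shown next to the result
    score: float
    offer_translation: bool  # True when the hit is not in the query language


def detect_language(text):
    """Stub detector keyed on a few marker words; defaults to English."""
    for word in text.lower().split():
        if word in LANG_MARKERS:
            return LANG_MARKERS[word]
    return "en"


def embed(text):
    """Stub embedder: counts how often each shared concept appears."""
    vec = [0.0, 0.0]
    for word in text.lower().split():
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec


def search(query, corpus, k=2):
    """Detect, embed, score every passage, and return the top-k hits."""
    query_lang = detect_language(query)
    q = embed(query)
    hits = []
    for doc_lang, text in corpus:
        score = sum(a * b for a, b in zip(q, embed(text)))
        hits.append(Hit(text, doc_lang, score, doc_lang != query_lang))
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[:k]


corpus = [
    ("de", "Vertrag mit Lieferant"),
    ("fr", "Contrat de service"),
    ("pl", "Faktura VAT"),
]
hits = search("smlouva", corpus)
for h in hits:
    print(h.lang, h.text, "(translation offered)" if h.offer_translation else "")
```

The corpus entries carry a language tag that is assumed to have been assigned at ingestion time, so only the query needs detection at search time.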

Agents

Agents work in any language. A single agent can read Czech input, pull German source material, and produce an English summary, all in one turn. Citations preserve the original language; the agent’s output adapts to your preference or to the Base default.

Entity resolution

“Jana Nováková”, “J. Novák”, and “Novak, Jana” resolve to the same person — even across documents in different languages. The DSM engine normalizes transliteration, diacritics, and common variants.
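As a rough illustration of this kind of normalization (not the DSM engine's actual method, which is not described here), the sketch below folds diacritics with Python's standard `unicodedata` module, handles "Surname, Given" ordering, and applies one hypothetical rule that treats the Czech and Slovak feminine suffix "-ová" as a surname variant. The function names are made up for this example.

```python
import unicodedata


def fold_diacritics(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(name):
    """Reduce a personal name to a coarse (surname, given-name initial) key."""
    folded = fold_diacritics(name).casefold()
    if "," in folded:
        # "Surname, Given" ordering
        surname, given = [part.strip() for part in folded.split(",", 1)]
    else:
        # "Given Surname" ordering
        parts = folded.split()
        given, surname = parts[0], parts[-1]
    # Hypothetical variant rule: strip the feminine "-ova" surname suffix.
    if surname.endswith("ova") and len(surname) > 3:
        surname = surname[:-3]
    return (surname, given[0])


for variant in ["Jana Nováková", "J. Novák", "Novak, Jana"]:
    print(variant, "->", name_key(variant))
```

All three variants map to the same key here. A key this coarse would also merge different people who share a surname and initial, so a real resolver would need additional evidence such as co-occurring entities, dates, or document context.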

Multilingual OCR

StellarOCR detects language per region within a document, so a page with English body text and a Latin footer, or a Czech patient record with German drug names, is handled correctly. See StellarOCR.

Translation tools

Two options depending on sensitivity:

Local translation: open-source translation models run inside your deployment. Private, with no external call.
External translation: DeepL, Google Translate, or Amazon Translate via a connector. Faster for low-resource languages, but requires StellarGate anonymization first if the content is sensitive.
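The choice between the two options can be thought of as a small routing decision. The sketch below is an assumption about how such a policy might be expressed; the function and step names are hypothetical and do not correspond to a documented StellarBase API.

```python
def plan_translation(sensitive, prefer_external):
    """Return the ordered steps for translating one passage.

    Step names are illustrative labels, not real StellarBase calls.
    """
    if not prefer_external:
        # Local models run inside the deployment, so no extra step is needed.
        return ["translate_local"]
    steps = []
    if sensitive:
        # Sensitive content is anonymized before it leaves the deployment.
        steps.append("anonymize")
    steps.append("translate_external")
    return steps


print(plan_translation(sensitive=True, prefer_external=True))
print(plan_translation(sensitive=False, prefer_external=True))
print(plan_translation(sensitive=True, prefer_external=False))
```

Expressing the plan as data keeps the anonymization requirement visible and easy to check before any external call is made.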

Localization of the UI

The StellarBase UI itself is available in Czech, English, German, French, Italian, Spanish, Polish, Slovak, Hungarian, and Dutch. Per-user language preference is respected. Additional UI languages are added based on customer demand.

StellarOCR — multilingual document processing
Search — cross-lingual retrieval details
Specialized Models — the embedding models used