StellarOCR
One endpoint that turns any document (PDF, photo, scan, or other image) into clean, structured output. Text, layout, tables, figures, and math formulas are all preserved.
What StellarOCR does
StellarOCR is a composite document-processing engine. A single upload of a PDF or image returns:
- Structured text with reading order preserved
- Layout hierarchy — headers, paragraphs, lists, captions
- Tables as structured rows/columns (not flattened to text)
- Figures extracted as separate images with captions linked
- Math formulas converted to LaTeX
- Language detected per region
- Bounding boxes for every element, for click-to-locate in the original
When it’s used
Automatically, on every document that enters the knowledge base. You don’t invoke it directly in normal usage — the ingestion pipeline does. When someone uploads a contract, a paper, a scanned invoice, or a photo of a whiteboard, StellarOCR handles the extraction before downstream agents see the content.
It’s also available as a standalone API on StellarCloud, if you want to use it outside StellarBase.
Why one endpoint
OCR and document understanding traditionally require a pipeline: layout detector → text recognizer → table recognizer → formula recognizer → stitching. StellarOCR orchestrates that pipeline for you. You send a PDF; you get structured output. One endpoint, one bill.
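To make the pipeline described above concrete, here is a minimal sketch of the kind of orchestration a single endpoint hides from the caller. It is not StellarOCR's implementation: `Region`, `detect_layout`, `RECOGNIZERS`, and `process_page` are hypothetical names, the recognizers are stand-in stubs, and the top-to-bottom sort is a deliberately naive substitute for real reading-order stitching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    kind: str                          # "text", "table" or "formula"
    bbox: Tuple[int, int, int, int]    # (x0, y0, x1, y1) in page pixels
    content: str = ""


def detect_layout(page_image: bytes) -> List[Region]:
    """Stand-in layout detector: returns fixed regions for illustration."""
    return [
        Region("table", (50, 200, 550, 400)),
        Region("text", (50, 50, 550, 100)),
        Region("formula", (50, 120, 550, 180)),
    ]


# One stub recognizer per region kind (hypothetical; real models go here).
RECOGNIZERS: Dict[str, Callable[[bytes, Region], str]] = {
    "text": lambda image, region: "recognized text",
    "table": lambda image, region: "recognized table",
    "formula": lambda image, region: r"\frac{a}{b}",
}


def process_page(page_image: bytes) -> List[Region]:
    """Layout detection, per-region recognition, then stitching."""
    regions = detect_layout(page_image)
    for region in regions:
        region.content = RECOGNIZERS[region.kind](page_image, region)
    # Naive stitching: order regions top-to-bottom, then left-to-right.
    return sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))


stitched = process_page(b"")
kinds_in_order = [region.kind for region in stitched]
```

Each stage in a hand-built pipeline like this is a separate model to host, version, and pay for, which is the overhead the single endpoint is meant to remove.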
What it handles well
Born-digital PDFs
Text is extracted directly from the file, so no OCR is needed for the glyphs themselves. StellarOCR still runs layout detection to preserve tables and figures.
Scanned PDFs
Typical office scans, legal docs, old academic papers. Handles mixed orientations, skew, moderate noise. Multi-column layouts preserved.
Photos of documents
Phone photos of contracts, whiteboards, handwritten notes, book pages. Perspective correction and illumination normalization run automatically.
Complex tables
Merged cells, nested headers, multi-page tables with repeated headers — all reconstructed as proper table structure, not flattened into text.
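As an illustration of what "proper table structure" can mean for merged cells, the sketch below expands a list of cells with row and column spans into a dense grid. The cell fields (`row`, `col`, `rowspan`, `colspan`, `text`) and the `expand_table` helper are assumptions made for this example, not the documented StellarOCR output schema.

```python
from typing import Dict, List, Optional


def expand_table(cells: List[Dict], n_rows: int, n_cols: int) -> List[List[Optional[str]]]:
    """Copy each cell's text into every grid position its span covers."""
    grid: List[List[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]
    for cell in cells:
        row_end = cell["row"] + cell.get("rowspan", 1)
        col_end = cell["col"] + cell.get("colspan", 1)
        for r in range(cell["row"], row_end):
            for c in range(cell["col"], col_end):
                grid[r][c] = cell["text"]
    return grid


# Hypothetical table with a nested header: "Revenue" spans two columns,
# "Region" spans two header rows.
cells = [
    {"row": 0, "col": 0, "text": "Region", "rowspan": 2},
    {"row": 0, "col": 1, "text": "Revenue", "colspan": 2},
    {"row": 1, "col": 1, "text": "Q1"},
    {"row": 1, "col": 2, "text": "Q2"},
    {"row": 2, "col": 0, "text": "EMEA"},
    {"row": 2, "col": 1, "text": "10"},
    {"row": 2, "col": 2, "text": "12"},
]

grid = expand_table(cells, n_rows=3, n_cols=3)
```

With spans kept explicit, a downstream consumer can still tell that "Q1" and "Q2" both sit under "Revenue", which is lost once a table is flattened to text.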
Math
Inline and display equations are converted to LaTeX. This works for typeset math and, with somewhat lower accuracy, for handwritten formulas.
Mixed-language documents
EU regulations in five languages side-by-side, patient charts with Czech running text and Latin drug names — all handled without configuration. See Multilingual.
What it struggles with
Set realistic expectations for the following:
- Extreme cursive handwriting: if a human expert would struggle to read it, StellarOCR will too. Expect lower accuracy.
- Very low-resolution scans: below 150 DPI, expect typical OCR failure modes such as dropped or confused characters.
- Heavy creative typography: stylized fonts, extreme kerning, decorative characters. Marketing materials can be tricky.
- Diagrams with implicit semantics: flowcharts, ER diagrams, org charts. Shapes and arrows are detected, but the relationships between them may need manual review.
For these cases, StellarOCR produces confidence scores you can filter on. Low-confidence regions can be routed to human-in-the-loop review.
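A minimal sketch of the filtering step described above, assuming each extracted block carries a numeric `confidence` field between 0 and 1. That field name, the 0.8 threshold, and the `split_by_confidence` helper are all hypothetical; check the actual response schema before relying on them.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    blocks: List[Dict], threshold: float = 0.8
) -> Tuple[List[Dict], List[Dict]]:
    """Separate blocks into auto-accepted and needs-human-review lists."""
    accepted: List[Dict] = []
    needs_review: List[Dict] = []
    for block in blocks:
        # A block with no score is treated as low confidence (assumption).
        if block.get("confidence", 0.0) >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)
    return accepted, needs_review


# Hypothetical blocks, for illustration only.
blocks = [
    {"type": "paragraph", "text": "Typed clause", "confidence": 0.97},
    {"type": "paragraph", "text": "Cursive margin note", "confidence": 0.41},
    {"type": "figure", "text": "Flowchart"},
]

accepted, needs_review = split_by_confidence(blocks)
```

The threshold is a trade-off: raising it sends more regions to reviewers, lowering it lets more uncertain text through unreviewed.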
Under the hood
StellarOCR is a composition of specialized models — layout detector, text recognizer, table recognizer, math recognizer. The specific models are an implementation detail that can change without breaking your integration. You get one API, one billing meter, one SLA.
StellarOCR is billed as a single unit (per 1K pages) under StellarCloud pricing. You don’t pay separately for each underlying model.
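For budgeting, per-1K-page billing reduces to simple arithmetic. The sketch below assumes pro-rata charging with no rounding up to whole units and uses a placeholder rate; neither the rounding rule nor the price comes from this page, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(pages: int, price_per_1k_pages: float) -> float:
    """Estimate cost assuming pro-rata billing per 1,000 pages (assumption)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / 1000 * price_per_1k_pages


# 2,500 pages at a placeholder rate of 4.0 per 1K pages.
estimated = estimate_cost(2500, 4.0)
```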
Output format
The default output is a structured JSON document with:
- Page-by-page breakdown
- Each block tagged with its type (heading, paragraph, table, figure, formula)
- Reading order preserved
- Bounding boxes for click-through
- Optional Markdown export for simpler downstream consumption
For knowledge-base ingestion, the output is consumed directly by the indexing pipeline — you never see the JSON unless you call the API directly.
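If you do call the API directly, consuming the JSON might look roughly like the sketch below. The field names (`pages`, `page`, `blocks`, `type`, `text`, `latex`, `bbox`) and the sample document are invented for illustration and are not the documented schema. The `block_at` helper shows how bounding boxes can support click-to-locate.

```python
import json
from typing import Dict, Optional

# Hypothetical response body; the real schema may differ.
SAMPLE = """
{
  "pages": [
    {
      "page": 1,
      "blocks": [
        {"type": "heading", "text": "Invoice 0042", "bbox": [40, 30, 560, 70]},
        {"type": "paragraph", "text": "Payment is due in 30 days.", "bbox": [40, 90, 560, 140]},
        {"type": "formula", "latex": "E = mc^2", "bbox": [40, 160, 560, 200]}
      ]
    }
  ]
}
"""


def block_at(doc: Dict, page_number: int, x: float, y: float) -> Optional[Dict]:
    """Return the first block on the page whose bounding box contains (x, y)."""
    for page in doc["pages"]:
        if page["page"] != page_number:
            continue
        for block in page["blocks"]:
            x0, y0, x1, y1 = block["bbox"]
            if x0 <= x <= x1 and y0 <= y <= y1:
                return block
    return None


doc = json.loads(SAMPLE)

# Blocks are assumed to be listed in reading order within each page.
types_in_order = [b["type"] for p in doc["pages"] for b in p["blocks"]]

# Simulate a click at (100, 100) on page 1.
clicked = block_at(doc, page_number=1, x=100, y=100)
```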
Related
- Supported Formats — the full list of file types
- Multilingual — cross-language OCR behaviour
- StellarCloud API — using OCR as a standalone API
