StellarOCR

One endpoint that turns any document (PDF, photo, scan, or other image) into clean, structured output, with text, layout, tables, figures, and math formulas all preserved.

What StellarOCR does

StellarOCR is a composite document-processing engine. A single upload of a PDF or image returns:

  • Structured text with reading order preserved
  • Layout hierarchy — headers, paragraphs, lists, captions
  • Tables as structured rows/columns (not flattened to text)
  • Figures extracted as separate images with captions linked
  • Math formulas converted to LaTeX
  • Language detected per region
  • Bounding boxes for every element, for click-to-locate in the original
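
Click-to-locate with bounding boxes comes down to a hit test. Here is a minimal sketch, assuming each element carries a box of (x0, y0, x1, y1) in page coordinates with the origin at the top-left. The `Element` type and its field names are hypothetical, not the documented StellarOCR schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Element:
    """Hypothetical element record; not the documented StellarOCR schema."""
    kind: str                                # e.g. "paragraph", "table"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def area(e: Element) -> float:
    x0, y0, x1, y1 = e.bbox
    return (x1 - x0) * (y1 - y0)


def element_at(elements: Sequence[Element], x: float, y: float) -> Optional[Element]:
    """Return the smallest element whose box contains the point, or None."""
    hits = [e for e in elements
            if e.bbox[0] <= x <= e.bbox[2] and e.bbox[1] <= y <= e.bbox[3]]
    return min(hits, key=area, default=None)


# Made-up sample page: a paragraph box nested inside a table box.
page = [
    Element("table", (50, 100, 550, 400)),
    Element("paragraph", (60, 110, 300, 140)),
]
```

Preferring the smallest containing box means a click inside a table cell resolves to the cell content rather than the whole table. A viewer would also need to scale click coordinates from screen space to page space before calling `element_at`.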

When it’s used

Automatically, on every document that enters the knowledge base. You don’t invoke it directly in normal usage — the ingestion pipeline does. When someone uploads a contract, a paper, a scanned invoice, or a photo of a whiteboard, StellarOCR handles the extraction before downstream agents see the content.

It’s also available as a standalone API on StellarCloud, if you want to use it outside StellarBase.

Why one endpoint

OCR and document understanding traditionally require a pipeline: layout detector → text recognizer → table recognizer → formula recognizer → stitching. StellarOCR orchestrates that pipeline for you. You send a PDF; you get structured output. One endpoint, one bill.
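
To make the contrast concrete, here is a rough sketch of the glue code a hand-built pipeline needs. Every function below is a hypothetical stub standing in for a separate model, and the region format is an assumption; none of this is StellarOCR's actual implementation or API.

```python
from typing import Callable

Region = dict  # hypothetical shape: {"kind": str, "bbox": tuple}


def detect_layout(page_image: bytes) -> list[Region]:
    """Stub layout detector: returns regions in reading order."""
    return [
        {"kind": "text", "bbox": (0, 0, 100, 20)},
        {"kind": "table", "bbox": (0, 30, 100, 80)},
        {"kind": "formula", "bbox": (0, 90, 100, 110)},
    ]


def recognize_text(page_image: bytes, region: Region) -> dict:
    return {"text": "stub text"}


def recognize_table(page_image: bytes, region: Region) -> dict:
    return {"rows": [["stub", "cell"]]}


def recognize_formula(page_image: bytes, region: Region) -> dict:
    return {"latex": r"x^2"}


# One recognizer per region kind; each would be a separate model to host.
RECOGNIZERS: dict[str, Callable[[bytes, Region], dict]] = {
    "text": recognize_text,
    "table": recognize_table,
    "formula": recognize_formula,
}


def process_page(page_image: bytes) -> list[dict]:
    """Run layout detection, dispatch each region, and stitch the results."""
    blocks = []
    for region in detect_layout(page_image):
        recognizer = RECOGNIZERS[region["kind"]]
        blocks.append({**region, **recognizer(page_image, region)})
    return blocks
```

Even in stub form, the caller owns the dispatch table, the ordering, and the merge step. Those are the pieces a single endpoint takes off your hands.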

What it handles well

Born-digital PDFs

Text extractable directly — no OCR needed for the glyphs themselves. StellarOCR still runs layout detection to preserve tables and figures.

Scanned PDFs

Typical office scans, legal docs, old academic papers. Handles mixed orientations, skew, moderate noise. Multi-column layouts preserved.

Photos of documents

Phone photos of contracts, whiteboards, handwritten notes, book pages. Perspective correction and illumination normalization run automatically.

Complex tables

Merged cells, nested headers, multi-page tables with repeated headers — all reconstructed as proper table structure, not flattened into text.
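
One way to picture "proper table structure" is a list of cells with spans that can be expanded into a rectangular grid. Below is a minimal sketch, assuming a hypothetical cell format with `row`, `col`, `row_span`, and `col_span` fields. The real StellarOCR table schema may differ, and the sample values are made up.

```python
def expand_to_grid(cells: list[dict], n_rows: int, n_cols: int) -> list[list[str]]:
    """Copy each cell's text into every grid position its span covers."""
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell["row"], cell["row"] + cell.get("row_span", 1)):
            for c in range(cell["col"], cell["col"] + cell.get("col_span", 1)):
                grid[r][c] = cell["text"]
    return grid


# Nested header: "Revenue" spans two columns, "Region" spans two rows.
cells = [
    {"row": 0, "col": 0, "text": "Region", "row_span": 2},
    {"row": 0, "col": 1, "text": "Revenue", "col_span": 2},
    {"row": 1, "col": 1, "text": "2023"},
    {"row": 1, "col": 2, "text": "2024"},
    {"row": 2, "col": 0, "text": "EMEA"},
    {"row": 2, "col": 1, "text": "1.2"},
    {"row": 2, "col": 2, "text": "1.5"},
]
grid = expand_to_grid(cells, n_rows=3, n_cols=3)
```

Keeping the spans in the source data and expanding only on demand preserves the merged-cell information that a flattened text dump would lose.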

Math

Inline and display equations converted to LaTeX. Works for typeset math and, with somewhat lower accuracy, for handwritten formulas.

Mixed-language documents

EU regulations in five languages side-by-side, patient charts with Czech running text and Latin drug names — all handled without configuration. See Multilingual.

What it struggles with

Set realistic expectations for the following:

  • Extreme cursive handwriting: if a human expert would struggle to read it, StellarOCR will too. Expect lower accuracy.
  • Very low-resolution scans: below 150 DPI, expect the usual OCR failure modes, such as misread or merged characters.
  • Heavy creative typography: stylized fonts, extreme kerning, decorative characters. Marketing materials can be tricky.
  • Diagrams with implicit semantics: flowcharts, ER diagrams, org charts. Shapes and arrows are detected, but the relationships between them may need manual review.

For these cases, StellarOCR produces confidence scores you can filter on. Low-confidence regions can be routed to human-in-the-loop review.
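
Routing on confidence can be as simple as a partition over the returned blocks. This is a minimal sketch, assuming each block carries a `confidence` value between 0 and 1. The field name, the score range, and the 0.8 threshold are all assumptions, not documented StellarOCR behavior.

```python
def split_by_confidence(blocks: list[dict], threshold: float = 0.8):
    """Partition blocks into (accepted, needs_review) by confidence score."""
    accepted, needs_review = [], []
    for block in blocks:
        # Treat a missing score as low confidence so it is never silently accepted.
        if block.get("confidence", 0.0) >= threshold:
            accepted.append(block)
        else:
            needs_review.append(block)
    return accepted, needs_review


# Made-up sample blocks for illustration.
blocks = [
    {"id": "b1", "type": "paragraph", "confidence": 0.97},
    {"id": "b2", "type": "formula", "confidence": 0.55},
    {"id": "b3", "type": "table"},  # no score reported
]
accepted, needs_review = split_by_confidence(blocks)
```

In practice the threshold is worth tuning per block type, since a misread formula and a misread paragraph rarely cost the same to get wrong.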

Under the hood

StellarOCR is a composition of specialized models — layout detector, text recognizer, table recognizer, math recognizer. The specific models are an implementation detail that can change without breaking your integration. You get one API, one billing meter, one SLA.

StellarOCR is billed as a single unit, per 1K pages, at StellarCloud pricing. You don’t pay separately for each underlying model.

Output format

The default output is a structured JSON document with:

  • Page-by-page breakdown
  • Each block tagged with its type (heading, paragraph, table, figure, formula)
  • Reading order preserved
  • Bounding boxes for click-through
  • Markdown export available as an optional alternative, for simpler downstream consumption

For knowledge-base ingestion, the output is consumed directly by the indexing pipeline — you never see the JSON unless you call the API directly.
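
If you do call the API directly, consuming the output mostly means walking pages and blocks in order. Here is a minimal sketch, assuming a hypothetical shape (`pages`, each holding `blocks` with a `type` and type-specific fields). Check the API reference for the real field names. The Markdown rendering below is an illustration only and is not the service's own Markdown export.

```python
import json

# Hypothetical response shape with made-up content.
SAMPLE = json.loads("""
{
  "pages": [
    {"number": 1, "blocks": [
      {"type": "heading", "text": "Results", "bbox": [50, 40, 300, 70]},
      {"type": "paragraph", "text": "Accuracy improved.", "bbox": [50, 80, 550, 120]},
      {"type": "formula", "latex": "E = mc^2", "bbox": [50, 130, 200, 160]},
      {"type": "table", "rows": [["a", "b"], ["1", "2"]], "bbox": [50, 170, 550, 260]}
    ]}
  ]
}
""")


def block_to_markdown(block: dict) -> str:
    kind = block["type"]
    if kind == "heading":
        return "## " + block["text"]
    if kind == "formula":
        return "$$" + block["latex"] + "$$"
    if kind == "table":
        header, *body = block["rows"]
        lines = ["| " + " | ".join(header) + " |",
                 "| " + " | ".join("---" for _ in header) + " |"]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return "\n".join(lines)
    # Paragraphs and any unrecognized type fall back to plain text.
    return block.get("text", "")


def to_markdown(doc: dict) -> str:
    """Render blocks in page order, then block order, separated by blank lines."""
    parts = [block_to_markdown(b) for page in doc["pages"] for b in page["blocks"]]
    return "\n\n".join(parts)


markdown = to_markdown(SAMPLE)
```

Because blocks arrive already typed and in reading order, the consumer needs no layout logic of its own, only a per-type renderer.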
