StellarBase
StellarBase Platform

Dynamic Semantic Module (DSM)

The proprietary engine that builds a living semantic graph across your data — entities, relationships, references — without manual tagging and without depending on an LLM at query time.

What DSM does

When a document enters the knowledge base, DSM processes it to extract:

  • Entities — people, organizations, places, amounts, dates, identifiers, custom types
  • Relationships — typed edges between entities (Acme employs Smith; Smith authored Paper-7; Paper-7 cites Paper-3)
  • References — cross-document mentions resolved to canonical IDs
  • Concepts — abstract topics each document covers
  • Provenance — every fact traced to the document + passage that asserts it

The result is a queryable graph that grows as your corpus grows.
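As a rough illustration of the shape of such a graph, the sketch below models entities, typed relationships, and provenance as plain Python dataclasses. All class, field, and ID names here are hypothetical and chosen for illustration; they are not DSM's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    document_id: str  # which document asserts the fact
    passage: str      # the passage the fact was extracted from


@dataclass(frozen=True)
class Entity:
    entity_id: str    # canonical ID that cross-document references resolve to
    entity_type: str  # e.g. "person", "org"
    name: str


@dataclass(frozen=True)
class Relationship:
    source_id: str
    relation: str     # typed edge label, e.g. "employed_by"
    target_id: str
    provenance: Provenance


@dataclass
class Graph:
    entities: dict[str, Entity] = field(default_factory=dict)
    edges: list[Relationship] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def add_edge(self, edge: Relationship) -> None:
        # Reject edges whose endpoints are not known entities.
        if edge.source_id not in self.entities or edge.target_id not in self.entities:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def outgoing(self, entity_id: str, relation: str) -> list[Relationship]:
        # All edges of the given type that start at the given entity.
        return [e for e in self.edges
                if e.source_id == entity_id and e.relation == relation]


# Hypothetical sample data mirroring the "Acme employs Smith" example.
graph = Graph()
graph.add_entity(Entity("person:smith", "person", "Smith"))
graph.add_entity(Entity("org:acme", "org", "Acme"))
graph.add_edge(Relationship(
    "person:smith", "employed_by", "org:acme",
    Provenance("doc-001", "Smith joined Acme as an engineer."),
))
```

Keeping provenance on the edge itself, rather than in a side table, means every relationship can be traced back to its source without a second lookup.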

How it differs from RAG

Most “AI search” systems use retrieval-augmented generation (RAG): embed everything, retrieve nearest neighbours, hand them to an LLM. That works for simple Q&A but breaks down when you need:

  • Counting (“how many contracts mention CoC clauses?”)
  • Aggregation (“sum of damages claimed in the last quarter”)
  • Multi-hop reasoning (“which suppliers of company X were also flagged in audit Y?”)
  • Time-bounded queries (“papers citing Smith before 2023”)
  • Audit (“why did the agent claim X?”)

DSM provides the structure that makes those queries answerable. The LLM (when used) operates on the graph’s output, not the raw passages — so it has less room to hallucinate.
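To make the contrast concrete, here is a minimal sketch of why counting and multi-hop questions become straightforward once facts are stored as typed edges. The edge list, relation names, and helper functions are invented for illustration and say nothing about how DSM stores or queries its graph.

```python
# Hypothetical edges in (source, relation, target) form.
edges = [
    ("supplier:a", "supplies", "company:x"),
    ("supplier:b", "supplies", "company:x"),
    ("supplier:c", "supplies", "company:z"),
    ("audit:y", "flagged", "supplier:b"),
    ("audit:y", "flagged", "supplier:c"),
    ("contract:1", "mentions", "clause:coc"),
    ("contract:2", "mentions", "clause:coc"),
    ("contract:3", "mentions", "clause:nda"),
]


def sources(relation, target):
    """Entities that point at `target` through `relation`."""
    return {s for s, r, t in edges if r == relation and t == target}


def targets(source, relation):
    """Entities that `source` points at through `relation`."""
    return {t for s, r, t in edges if s == source and r == relation}


# Counting: an exact set size, not an estimate from the top-k retrieved passages.
coc_contract_count = len(sources("mentions", "clause:coc"))

# Multi-hop: suppliers of company X that were also flagged in audit Y.
flagged_suppliers_of_x = sources("supplies", "company:x") & targets("audit:y", "flagged")

print(coc_contract_count)       # 2
print(flagged_suppliers_of_x)   # {'supplier:b'}
```

Nearest-neighbour retrieval sees only the passages it returns, so it cannot guarantee an exact count or a complete intersection; a set operation over edges can.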

Entity resolution

“Jana Nováková”, “J. Novák”, “novakova@acme.cz”, and “Mrs. Novákova” can all refer to the same person. DSM resolves them automatically, combining textual signals with external identifiers (ORCID, DOI, public corporate registries) where available.

Automatic resolution is not infallible: sometimes two real people share a name and DSM merges them. You can override resolution decisions manually. The merge UI lets you split or merge entities, with a full audit trail.
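The bookkeeping behind merge and split overrides can be sketched as a mapping from mentions to canonical entity IDs plus an append-only audit log. This is only an illustration of the idea: the class, method names, and reasons are hypothetical, and the sketch does not show how DSM decides which mentions to merge automatically.

```python
class EntityResolver:
    """Toy mention-to-entity mapping with an audit trail (illustrative only)."""

    def __init__(self):
        self.entity_of = {}   # mention text -> canonical entity ID
        self.audit_log = []   # (action, mentions, reason), in order of application
        self._counter = 0

    def _new_id(self):
        self._counter += 1
        return f"entity-{self._counter}"

    def add_mention(self, mention):
        # A new mention starts out as its own entity.
        if mention not in self.entity_of:
            self.entity_of[mention] = self._new_id()
        return self.entity_of[mention]

    def merge(self, a, b, reason):
        keep, drop = self.add_mention(a), self.add_mention(b)
        # Repoint every mention of the dropped entity at the kept one.
        for mention, entity_id in self.entity_of.items():
            if entity_id == drop:
                self.entity_of[mention] = keep
        self.audit_log.append(("merge", (a, b), reason))

    def split(self, mention, reason):
        # Detach one mention into a fresh entity of its own.
        self.entity_of[mention] = self._new_id()
        self.audit_log.append(("split", (mention,), reason))


resolver = EntityResolver()
resolver.merge("Jana Nováková", "novakova@acme.cz", "email matches name")
resolver.merge("Jana Nováková", "J. Novák", "similar surname")
resolver.merge("Jana Nováková", "Mrs. Novákova", "similar surname")

# Manual override: this mention turned out to be a different person.
resolver.split("J. Novák", "reviewer: different person with a similar name")
```

Because the log records every merge and split with its reason, any resolution decision can be reviewed or reversed later.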

Custom entity types

Out of the box, DSM extracts ~15 standard entity types (person, org, place, amount, etc.). For your domain-specific entities (LEGAL_CASE, CONTRACT_ID, GRAVE_ID, CLINICAL_TRIAL, ASSET_TAG), define them once and DSM extracts them across the corpus.

Definition methods:

  • Regex — for structured ID patterns (e.g. INV-\d{6})
  • Examples — provide 5–20 sample entities, DSM learns to find more
  • Dictionary — provide a known list, DSM finds them deterministically
  • Custom model — plug in your own classifier as a tool
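The two deterministic methods, regex and dictionary, can be approximated with Python's standard `re` module. The sketch below is an assumption about what such extractors might look like, not DSM's definition format: the function names, the `INVOICE_ID` type, and the sample text are hypothetical, while the `INV-\d{6}` pattern and the `ASSET_TAG` type come from the examples above.

```python
import re

# Structured ID pattern from the regex example; \b keeps it from matching inside longer tokens.
INVOICE_PATTERN = re.compile(r"\bINV-\d{6}\b")


def extract_by_regex(text, pattern, entity_type):
    """Return (entity_type, matched_text, start_offset) for every pattern match."""
    return [(entity_type, m.group(0), m.start()) for m in pattern.finditer(text)]


def extract_by_dictionary(text, terms, entity_type):
    """Return (entity_type, term, start_offset) for every literal occurrence of a known term."""
    hits = []
    for term in terms:
        # re.escape treats the term as literal text, not as a pattern.
        for m in re.finditer(re.escape(term), text):
            hits.append((entity_type, term, m.start()))
    return sorted(hits, key=lambda hit: hit[2])


text = "Invoice INV-004211 covers ASSET-17 and ASSET-9; INV-42 is malformed."

regex_hits = extract_by_regex(text, INVOICE_PATTERN, "INVOICE_ID")
dict_hits = extract_by_dictionary(text, ["ASSET-17", "ASSET-9"], "ASSET_TAG")

print(regex_hits)                    # [('INVOICE_ID', 'INV-004211', 8)]
print([hit[1] for hit in dict_hits]) # ['ASSET-17', 'ASSET-9']
```

Note that `INV-42` is skipped because it does not have six digits. Both methods are deterministic, which is why they suit identifiers with a fixed format or a closed list of known values.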

Relationship types

Edges between entities are typed. Default vocabulary:

Relationship          Example
employed_by           Smith → Acme
authored              Smith → Paper-7
cited_by / cites      Paper-3 → Paper-7
located_in            Acme HQ → Prague
signed_by / signed    Smith → Contract-42
related_to            generic fallback when a more specific type can’t be inferred

Custom relationship types: define them like custom entity types.

Multilingual graph

Entities resolve across languages. “Jan Novák”, “Jano Novák”, and “Yan Novak” map to the same node. The graph is language-agnostic — facts about an entity from a Czech document are queryable from an English query, and vice versa.

Provenance

Every node and edge in the graph carries provenance: which document, which passage, which timestamp. When an agent claims “Smith employed by Acme”, you can click through to the exact sentence in the exact document where that’s asserted. If the source document changes, the graph updates accordingly.
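One simple way to picture this behaviour is a store in which every fact carries its document, passage, and timestamp, and re-ingesting a document replaces whatever that document previously asserted. The sketch below works under that assumption; the class, method names, and sample data are hypothetical and do not describe DSM's internals.

```python
class ProvenanceStore:
    """Toy fact store keyed by source document (illustrative only)."""

    def __init__(self):
        self.facts = []

    def ingest(self, document_id, timestamp, extracted):
        """Record facts from one document.

        `extracted` is a list of (subject, relation, object, passage) tuples.
        Facts previously attributed to the same document are dropped first,
        so a changed document cannot leave stale assertions behind.
        """
        self.facts = [f for f in self.facts if f["document_id"] != document_id]
        for subject, relation, obj, passage in extracted:
            self.facts.append({
                "triple": (subject, relation, obj),
                "document_id": document_id,
                "passage": passage,
                "timestamp": timestamp,
            })

    def why(self, triple):
        """Return (document_id, passage, timestamp) for every source asserting the triple."""
        return [(f["document_id"], f["passage"], f["timestamp"])
                for f in self.facts if f["triple"] == triple]


store = ProvenanceStore()
claim = ("Smith", "employed_by", "Acme")

store.ingest("doc-1", "2024-01-10T09:00:00Z",
             [("Smith", "employed_by", "Acme", "Smith joined Acme in 2020.")])
before = store.why(claim)

# The document is edited and no longer makes the claim.
store.ingest("doc-1", "2024-02-01T09:00:00Z", [])
after = store.why(claim)
```

Here `before` holds one supporting passage and `after` is empty, which is the "click through to the exact sentence" lookup followed by the graph tracking a changed source.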

Versioning & time

The graph is time-aware. Facts have validity ranges:

  • “Smith employed by Acme [2020 – 2023]”
  • “Smith employed by Globex [2023 – present]”

Queries can filter by time: “who worked at Acme in 2022?” is answered against the graph as it stood at that point in time. This is useful for due diligence, litigation prep, and historical analysis.
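A validity range can be modelled as a pair of bounds on each fact, with an open upper bound standing for “present”. The following is a minimal sketch using year granularity and inclusive bounds, both of which are assumptions made here for brevity; the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimedFact:
    subject: str
    relation: str
    obj: str
    valid_from: int           # first year the fact holds
    valid_to: Optional[int]   # last year the fact holds; None means "present"

    def holds_in(self, year: int) -> bool:
        # Inclusive on both ends (an assumption of this sketch).
        return self.valid_from <= year and (self.valid_to is None or year <= self.valid_to)


facts = [
    TimedFact("Smith", "employed_by", "Acme", 2020, 2023),
    TimedFact("Smith", "employed_by", "Globex", 2023, None),
]


def employed_at(org: str, year: int) -> list[str]:
    """Who worked at `org` during `year`, according to the validity ranges."""
    return [f.subject for f in facts
            if f.relation == "employed_by" and f.obj == org and f.holds_in(year)]


print(employed_at("Acme", 2022))    # ['Smith']
print(employed_at("Globex", 2022))  # []
```

With inclusive year bounds, a transition year such as 2023 matches both employers; finer-grained timestamps would be needed to separate them.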

Querying the graph

Three layers of access:

  • Search UI — natural language, results enriched with graph context
  • Agents — agents can call graph queries as a tool
  • Direct query — for power users, a graph query language is exposed

Performance

The DSM graph is built incrementally as documents are ingested. New documents propagate to the graph within minutes. Queries against the graph return in milliseconds for typical workloads (graphs up to a few hundred million nodes).

Storage

The DSM graph is managed alongside StellarBase’s standard data plane. Backup, replication, and recovery follow the same policies as the rest of the platform. See Deployment.
