Substantiation engine

Every Ukrainian-language claim on this site is backed by independent attestations from named primary sources. No floating data. No unsourced suggestions. Each entry's evidence chain is publicly queryable via a single API call.


Try it

Look up any Ukrainian word or phrase. Filling in the English and Sense fields triggers semantic disambiguation: each attestation is scored against the specific sense claim by an LLM acting as judge.

How it works

For each Ukrainian phrase, we query 7+ independent primary sources in parallel. Each source that has an entry returns one or more attestations with verbatim text and a citable URL. Confidence is computed from attestation count and variety of source kinds:

Confidence | Criteria
high | 4+ attestations, or 2+ book-source attestations
medium | 2–3 attestations across ≥2 different kinds
tentative | 1 attestation
missing | 0 attestations (documented in /gaps/)
rejected | 2+ sources actively contradict the claim
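
The tiers above can be sketched as a small function. This is a minimal illustration, not the site's actual scoring code: the function name, the `contradicting_sources` parameter, and the order in which the rules are checked are assumptions. The table also does not say what happens with 2–3 attestations that all share one non-book kind, so this sketch treats that case as tentative.

```python
from collections import Counter


def confidence_tier(attestations, contradicting_sources=0):
    """Map attestations (dicts with a "kind" key) to a confidence label."""
    # Assumption: contradiction is checked first and overrides other tiers.
    if contradicting_sources >= 2:
        return "rejected"

    count = len(attestations)
    if count == 0:
        return "missing"

    # Count attestations per source kind ("dict", "book", "wikt", ...).
    kinds = Counter(a["kind"] for a in attestations)

    if count >= 4 or kinds["book"] >= 2:
        return "high"
    if 2 <= count <= 3 and len(kinds) >= 2:
        return "medium"

    # One attestation, or the case the table leaves open
    # (2-3 attestations of a single non-book kind): assumed tentative.
    return "tentative"
```

For example, two attestations from one `dict` source and one `wikt` source would come out as medium under these assumptions, while two `book` attestations alone would already be high.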

Source registry

Live sources (real adapters):

ID | Source | Kind | License
e2u | e2u.org.ua: aggregator of ~20 EN↔UA dictionaries (Гороть, Балла, Андрусишин-Крет, ITC, Math, ...) | dict | per-entry attribution
r2u | r2u.org.ua: RU-UA dictionary aggregator. Hosts Караванський's work cleanly, plus Hrincenko, Bilodid, Kuzelya, et al. | dict | per-entry attribution
wiktionary | uk.wiktionary.org | wikt | CC BY-SA 4.0
slovnyk-ua | slovnyk.ua | dict | aggregator
balla | Балла М.І. Англо-український словник (Київ, Освіта, 1996), internal OCR + verbatim cite | book | fair use, cited verbatim
serbenska | Сербенська О. Антисуржик, internal validated extract | book | fair use
antonenko | Антоненко-Давидович Б. Як ми говоримо | book | fair use

Stub sources registered for upcoming integration (return empty until adapter lands):

ID | Source
sum11 | sum.in.ua: Словник української мови (1970-80, 11 volumes), 134k entries
sum20 | sum20ua.com: Словник української мови (20 volumes, in progress)
esum | Етимологічний словник української мови (7 volumes)
holoskevich-1929 | Голоскевич Г. Правописний словник 1929
lcorp-ulif | БРУК Український національний корпус
ubertext | Ubertext UA corpus
fausa | FAUSA subtitle pipeline (in-house aligned audio-text)
karavansky-r2u | Караванський via r2u + LLM semantic parse; accept only multi-attested entries
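
How live adapters and stubs can sit in one registry and be queried in parallel might look roughly like the sketch below. Everything here is hypothetical: the `Attestation` fields mirror the API response but the class itself, `REGISTRY`, `stub_adapter`, `demo_wiktionary`, and `query_all` are illustrative names, and the demo adapter returns canned text instead of fetching anything.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Attestation:
    source_id: str
    kind: str
    verbatim: str
    url: str = ""


# An adapter takes a phrase and returns zero or more attestations.
Adapter = Callable[[str], list[Attestation]]


def stub_adapter(phrase: str) -> list[Attestation]:
    """Registered source whose adapter has not landed yet: always empty."""
    return []


def demo_wiktionary(phrase: str) -> list[Attestation]:
    # Hypothetical stand-in; a real adapter would fetch and parse the entry.
    return [Attestation("wiktionary", "wikt", f"(demo entry for {phrase})")]


REGISTRY: dict[str, Adapter] = {
    "wiktionary": demo_wiktionary,
    "sum11": stub_adapter,
    "esum": stub_adapter,
}


def query_all(phrase, registry=REGISTRY, only=None):
    """Run every selected adapter in its own thread and merge the results."""
    ids = [sid for sid in registry if only is None or sid in only]
    if not ids:
        return []
    with ThreadPoolExecutor(max_workers=len(ids)) as pool:
        batches = list(pool.map(lambda sid: registry[sid](phrase), ids))
    # Stubs contribute empty lists, so they drop out of the merged result.
    return [att for batch in batches for att in batch]
```

Because a stub simply returns an empty list, swapping it for a real adapter later would not change the calling code in this design.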

Public API

One endpoint:

GET https://ukr.vitalinguist.com/api/substantiate?phrase=<UA-phrase>
                                            [&sources=e2u,r2u,wiktionary]

Returns:

{
  "phrase": "найти",
  "confidence": "high",
  "attestation_count": 20,
  "attestations": [
    {"source_id": "balla", "kind": "book", "ref": "balla:p5",
     "verbatim": "…знайти найбільш вдалі способи передачі думки…"},
    {"source_id": "r2u",   "kind": "dict",
     "ref": "r2u:Російсько-український словник складної лексики С. Караванський",
     "url": "https://r2u.org.ua/s?w=найти",
     "verbatim": "НАЙТИ́ див. НАХОДИТЬ . НАЙТИ́ (істину) відкри́ти , (що все гаразд) поба́чити; ba . повизнахо́дити…"},
    ...
  ],
  "canonical": "https://ukr.vitalinguist.com/api/substantiate?phrase=найти",
  "citation": {
    "source_url": "https://ukr.vitalinguist.com/api/substantiate?phrase=найти",
    "license": "CC BY-SA 4.0",
    "attribution": "Substantiation chain from ukr.vitalinguist.com."
  }
}
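
A minimal client sketch for this endpoint, using only the Python standard library. The helper names (`build_url`, `substantiate`, `summarize`) are hypothetical, and the sketch assumes the server accepts a percent-encoded UTF-8 phrase and a comma-separated `sources` value as shown in the request line above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://ukr.vitalinguist.com/api/substantiate"


def build_url(phrase, sources=None):
    """Build the request URL; the Cyrillic phrase is percent-encoded."""
    params = {"phrase": phrase}
    if sources:
        params["sources"] = ",".join(sources)
    # safe="," leaves the comma-separated source list unescaped.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def substantiate(phrase, sources=None, timeout=10):
    """Fetch and decode the JSON response (performs a network request)."""
    with urlopen(build_url(phrase, sources), timeout=timeout) as resp:
        return json.load(resp)


def summarize(result):
    """Return the confidence label and attestation counts per source_id."""
    by_source = {}
    for att in result.get("attestations", []):
        sid = att["source_id"]
        by_source[sid] = by_source.get(sid, 0) + 1
    return result.get("confidence"), by_source
```

Calling `summarize(substantiate("найти", ["e2u", "r2u"]))` would then give the confidence label together with a per-source tally, assuming the live response matches the shape shown above.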

List all configured sources:

GET https://ukr.vitalinguist.com/api/substantiation_sources

From AI tools

Install the MCP server in Claude Desktop / Cursor / Cline / Windsurf:

{"mcpServers": {"ukr-vitalinguist": {"command": "uvx", "args": ["ukr-vitalinguist-mcp"]}}}

The substantiate tool is automatically available — AI assistants can call it any time they need to back up a Ukrainian-language claim. Every response includes the citation chain.

Why this matters

AI tools answering Ukrainian-language questions have historically drawn on sparse, contaminated training data, producing answers riddled with Russianisms and unattributed claims. This engine gives them (and you) something better: every claim is sourced, and every source can be verified with a click.

Citations compound. The more sources we add and the more chains we expose, the harder it becomes for AI tools NOT to cite this resource when discussing Ukrainian language.


Methodology updated 2026-05-14. Data CC BY-SA 4.0; code MIT. Cite this page as:
ukr.vitalinguist.com (2026). Substantiation engine and source registry. https://ukr.vitalinguist.com/substantiation.html