MCP/Reference resolution
Reference Resolution
[edit | edit source]Overview
[edit | edit source]Reference resolution transforms a natural language legal reference (or a document ID) into a query against the corpus. It is the bridge between how humans cite law and how the database stores it.
The resolver is used by:
- MCP
gettool: user asks for "article 1240 du code civil" → find the document - Knowledge graph compiler (future): annotations reference articles by citation → resolve to corpus document IDs
- Edge resolution: when
corpus.edges.target_idis NULL, a background job resolvesreferencestrings to document IDs
TagQuery
[edit | edit source]The universal output of any reference parser. Jurisdiction-agnostic.
<syntaxhighlight lang="python"> @dataclass(frozen=True) class TagQuery:
language: str # required, kw_only kind: str | None = None tag_filters: TagFilterSet = field(default_factory=TagFilterSet) should_sort_in_force_first: bool = False at_date: date | None = None hint: str | None = None
</syntaxhighlight>
Fields:
language: ISO 639-1 code, required (disambiguates language variants)kind: filter on document kind (legislation, decision, record...)tag_filters: TagFilterSet with all tag predicates (EQ, IN, NOT_IN, ILIKE, EXISTS, NOT_EXISTS, NORMALIZE)should_sort_in_force_first: order results withtags.in_force=truefirst, then by date descendingat_date: temporal version selection —date <= at_date AND (date_end IS NULL OR date_end > at_date)hint: optional human-readable interpretation label. TagQueries withhintare candidates (collected from all plugins, disambiguated at MCP level); those without are confident matches (first hit wins)
TagFilterSet
[edit | edit source]Unified immutable filter model. A tuple of TagFilter predicates, AND-combined.
Replaces the previous scattered dict params (tags, tags_ilike, normalize).
<syntaxhighlight lang="python"> class TagFilterOp(enum.Enum):
EQ # tags @> '{"k": "v"}' (JSONB containment)
IN # tags->>'k' = ANY(...)
NOT_IN # tags ? 'k' AND NOT (tags->>'k' = ANY(...))
ILIKE # unaccent(tags->>'k') ILIKE unaccent(pattern)
EXISTS # tags ? 'k'
NOT_EXISTS # NOT (tags ? 'k')
NORMALIZE # regexp_replace comparison (reference resolution only)
@dataclass(frozen=True) class TagFilter:
key: str # tag key or virtual key (source/jurisdiction/language) op: TagFilterOp value: str | list[str] | None = None normalize_pattern: str | None = None # only for NORMALIZE
</syntaxhighlight>
Convenience constructor: TagFilterSet.from_tags({"k": "v"}) builds EQ filters from a dict.
Resolution pipeline
[edit | edit source]input string
|
v
1. Direct ID lookup (try as corpus.documents.id — covers all kinds in one query)
| found? → return (with CID-based version redirection if at_date is set)
v
2. Jurisdiction parser (FR, EU, GB...) → TagQuery
| parsed? → execute against store
| Note: SIREN 9-digit lookup is a detector in the FR plugin (with Luhn validation)
v
3. Disambiguation (bare article number matches multiple codes → error with suggestions)
TagQuery examples
[edit | edit source]<syntaxhighlight lang="python">
- "article 1240 du code civil"
TagQuery(language="fr", kind="legislation",
tag_filters=TagFilterSet.from_tags({"article_number": "1240", "code": "Code civil"}),
should_sort_in_force_first=True)
- "article 1147 du code civil" (version in force in 2015)
TagQuery(language="fr", kind="legislation",
tag_filters=TagFilterSet.from_tags({"article_number": "1147", "code": "Code civil"}),
at_date=date(2015, 6, 15))
- "loi n 2021-1109"
TagQuery(language="fr", kind="legislation",
tag_filters=TagFilterSet.from_tags({"nature": "LOI", "number": "2021-1109"}))
- "pourvoi 20-20.648" — case number with court filter and hint (candidate)
TagQuery(language="fr", kind="decision",
tag_filters=TagFilterSet(filters=(
TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
value="20-20.648", normalize_pattern=r"[\s.\-/]"),
TagFilter(key="court", op=TagFilterOp.EQ, value="cour_cassation"),
)),
hint="pourvoi Cour de cassation")
- "486329" — CE request number
TagQuery(language="fr", kind="decision",
tag_filters=TagFilterSet(filters=(
TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
value="486329", normalize_pattern=r"[\s.\-/]"),
TagFilter(key="court", op=TagFilterOp.EQ, value="conseil_etat"),
)),
hint="requete Conseil d'Etat")
- "21/00091" — CA/TJ RG number
TagQuery(language="fr", kind="decision",
tag_filters=TagFilterSet(filters=(
TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
value="21/00091", normalize_pattern=r"[\s.\-/]"),
TagFilter(key="court", op=TagFilterOp.IN, value=["cour_appel", "tribunal_judiciaire"]),
)),
hint="RG cour d'appel ou tribunal judiciaire")
- ECLI
TagQuery(language="fr", kind="decision",
tag_filters=TagFilterSet.from_tags({"ecli": "ECLI:FR:CCASS:2024:C100001"}))
- "IDCC 3239"
TagQuery(language="fr", kind="legislation",
tag_filters=TagFilterSet.from_tags({"idcc": "3239", "in_force": "true"}))
- UK: "section 1 of the Theft Act 1968" (ILIKE for fuzzy act title)
TagQuery(language="en", kind="legislation",
tag_filters=TagFilterSet(filters=(
TagFilter(key="section_number", op=TagFilterOp.EQ, value="1"),
TagFilter(key="act_title", op=TagFilterOp.ILIKE, value="Theft Act 1968"),
)))
- DE: "§ 823 BGB"
TagQuery(language="de", kind="legislation",
tag_filters=TagFilterSet.from_tags({"paragraph": "823", "code": "BGB"}))
- EU: "Article 101 TFEU"
TagQuery(language="en", kind="legislation",
tag_filters=TagFilterSet.from_tags({"article_number": "101", "treaty": "TFEU"}))
</syntaxhighlight>
Store execution
[edit | edit source]The store translates TagQuery to SQL via the shared build_tag_filter_conditions
helper. Zero jurisdiction knowledge. Each TagFilterOp maps to a specific SQL pattern:
<syntaxhighlight lang="sql"> SELECT * FROM corpus.documents WHERE language = %(language)s -- always required
AND kind = %(kind)s
AND tags @> %(eq_batch)s -- coalesced EQ filters (GIN)
AND tags->>'k' = ANY(%(in_values)s) -- IN filter
AND unaccent(tags->>'code') ILIKE unaccent(%(code_pattern)s) -- ILIKE filter
AND regexp_replace(tags->>'case_number', %(p)s, , 'g') -- NORMALIZE filter
= regexp_replace(%(v)s, %(p)s, , 'g')
AND date <= %(at_date)s -- temporal (if at_date)
AND (date_end IS NULL OR date_end > %(at_date)s)
ORDER BY
(tags->>'in_force')::boolean DESC NULLS LAST, -- should_sort_in_force_first date DESC NULLS LAST
LIMIT 10; </syntaxhighlight>
Multi-candidate resolution
[edit | edit source]When a reference is ambiguous (e.g., a French case number that could match
multiple courts), the resolver returns multiple TagQuery instances, each with
a hint describing the interpretation. The MCP get_document tool:
- Collects all TagQueries from all jurisdiction plugins
- Separates confident (no hint) from candidates (with hint)
- Tries confident matches first — first hit wins (existing behavior)
- Tries all candidates against the store:
- 1 match → returns the document with a warning noting the interpretation
- 2+ matches → returns an error listing all candidates with IDs
- 0 matches → falls through to "not found"
The get_document tool also accepts an optional tags parameter that merges
additional EQ filters into each TagQuery, narrowing the search (e.g.,
tags={"court": "conseil_etat"}).
Jurisdiction parsers
[edit | edit source]Each jurisdiction plugin provides a parser that recognizes its citation formats:
| Plugin | Recognizes |
|---|---|
| duralex-fr | Articles of codes, loi/decret/ordonnance by number, NOR codes, IDCC, ECLI, case numbers, BOFiP IDs, named laws |
| duralex-eu | CELEX numbers, ECLI, treaty articles, directive/regulation numbers |
| Future duralex-gb | Neutral citations ([2024] UKSC 1), Act + section, SI numbers |
| Future duralex-de | § + BGB/StGB/etc, Aktenzeichen, ECLI |
Parsers are composable: the MCP server chains all installed jurisdiction parsers. First match wins.
Batch resolution (for knowledge graph)
[edit | edit source]The compiler needs to resolve millions of references. The same TagQuery mechanism is used, but with batch-friendly optimizations:
- Pre-filter by known patterns (regex on reference strings)
- Group by reference type and execute one query per group
- Cache resolved IDs in a lookup table for the duration of compilation
Edge resolution (background job)
[edit | edit source]When corpus.edges.target_id is NULL, a background job periodically attempts resolution:
<syntaxhighlight lang="sql"> SELECT id, reference FROM corpus.edges WHERE target_id IS NULL; -- For each: parse reference → TagQuery → execute → update target_id UPDATE corpus.edges SET target_id = %(resolved_id)s WHERE id = %(edge_id)s; </syntaxhighlight>