MCP/Reference resolution

From Dura Lex Wiki
Revision as of 02:06, 23 April 2026 by Nicolas (Create MCP/Reference resolution page from REFERENCE-RESOLUTION.md via create-page on MediaWiki MCP Server)

Reference Resolution

Overview

Reference resolution transforms a natural language legal reference (or a document ID) into a query against the corpus. It is the bridge between how humans cite law and how the database stores it.

The resolver is used by:

  • MCP get tool: a user asks for "article 1240 du code civil" → find the document
  • Knowledge graph compiler (future): annotations reference articles by citation → resolve to corpus document IDs
  • Edge resolution: when corpus.edges.target_id is NULL, a background job resolves reference strings to document IDs

TagQuery

TagQuery is the universal, jurisdiction-agnostic output of every reference parser.

<syntaxhighlight lang="python">
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class TagQuery:
    language: str = field(kw_only=True)  # required, keyword-only
    kind: str | None = None
    tag_filters: TagFilterSet = field(default_factory=TagFilterSet)  # TagFilterSet: see next section
    should_sort_in_force_first: bool = False
    at_date: date | None = None
    hint: str | None = None
</syntaxhighlight>

Fields:

  • language: ISO 639-1 code, required (disambiguates language variants)
  • kind: filter on document kind (legislation, decision, record...)
  • tag_filters: TagFilterSet with all tag predicates (EQ, IN, NOT_IN, ILIKE, EXISTS, NOT_EXISTS, NORMALIZE)
  • should_sort_in_force_first: order results with tags.in_force=true first, then by date descending
  • at_date: temporal version selection — date <= at_date AND (date_end IS NULL OR date_end > at_date)
  • hint: optional human-readable interpretation label. TagQueries with hint are candidates (collected from all plugins, disambiguated at MCP level); those without are confident matches (first hit wins)

TagFilterSet

TagFilterSet is the unified, immutable filter model: a tuple of TagFilter predicates combined with AND. It replaces the previously scattered dict parameters (tags, tags_ilike, normalize).

<syntaxhighlight lang="python">
import enum
from dataclasses import dataclass


class TagFilterOp(enum.Enum):
    EQ = enum.auto()          # tags @> '{"k": "v"}' (JSONB containment)
    IN = enum.auto()          # tags->>'k' = ANY(...)
    NOT_IN = enum.auto()      # tags ? 'k' AND NOT (tags->>'k' = ANY(...))
    ILIKE = enum.auto()       # unaccent(tags->>'k') ILIKE unaccent(pattern)
    EXISTS = enum.auto()      # tags ? 'k'
    NOT_EXISTS = enum.auto()  # NOT (tags ? 'k')
    NORMALIZE = enum.auto()   # regexp_replace comparison (reference resolution only)


@dataclass(frozen=True)
class TagFilter:
    key: str                              # tag key or virtual key (source/jurisdiction/language)
    op: TagFilterOp
    value: str | list[str] | None = None
    normalize_pattern: str | None = None  # only for NORMALIZE
</syntaxhighlight>

Convenience constructor: TagFilterSet.from_tags({"k": "v"}) builds EQ filters from a dict.

Resolution pipeline

input string
    |
    v
1. Direct ID lookup (try as corpus.documents.id — covers all kinds in one query)
    |  found? → return (with CID-based version redirection if at_date is set)
    v
2. Jurisdiction parser (FR, EU, GB...) → TagQuery
    |  parsed? → execute against store
    |  Note: SIREN 9-digit lookup is a detector in the FR plugin (with Luhn validation)
    v
3. Disambiguation (bare article number matches multiple codes → error with suggestions)
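Step 2 notes that the FR plugin detects 9-digit SIREN numbers with Luhn validation. Here is a minimal sketch of that checksum. The function name `is_valid_siren` is hypothetical, and tolerating spaces ("443 061 841") is an assumption about how users type SIRENs.

```python
def is_valid_siren(candidate: str) -> bool:
    """Return True if candidate is a 9-digit SIREN whose Luhn checksum is valid."""
    digits = candidate.replace(" ", "")  # assumption: accept space-grouped input
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk right to left; double every second digit, folding results above 9.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0
```

For example, `is_valid_siren("443061841")` passes, while changing the last digit to 2 breaks the checksum.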

TagQuery examples

<syntaxhighlight lang="python">
from datetime import date

# "article 1240 du code civil"
TagQuery(language="fr", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"article_number": "1240", "code": "Code civil"}),
         should_sort_in_force_first=True)

# "article 1147 du code civil" (version in force in 2015)
TagQuery(language="fr", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"article_number": "1147", "code": "Code civil"}),
         at_date=date(2015, 6, 15))

# "loi n 2021-1109"
TagQuery(language="fr", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"nature": "LOI", "number": "2021-1109"}))

# "pourvoi 20-20.648": case number with court filter and hint (candidate)
TagQuery(language="fr", kind="decision",
         tag_filters=TagFilterSet(filters=(
             TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                       value="20-20.648", normalize_pattern=r"[\s.\-/]"),
             TagFilter(key="court", op=TagFilterOp.EQ, value="cour_cassation"),
         )),
         hint="pourvoi Cour de cassation")

# "486329": CE request number
TagQuery(language="fr", kind="decision",
         tag_filters=TagFilterSet(filters=(
             TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                       value="486329", normalize_pattern=r"[\s.\-/]"),
             TagFilter(key="court", op=TagFilterOp.EQ, value="conseil_etat"),
         )),
         hint="requete Conseil d'Etat")

# "21/00091": CA/TJ RG number
TagQuery(language="fr", kind="decision",
         tag_filters=TagFilterSet(filters=(
             TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                       value="21/00091", normalize_pattern=r"[\s.\-/]"),
             TagFilter(key="court", op=TagFilterOp.IN, value=["cour_appel", "tribunal_judiciaire"]),
         )),
         hint="RG cour d'appel ou tribunal judiciaire")

# ECLI
TagQuery(language="fr", kind="decision",
         tag_filters=TagFilterSet.from_tags({"ecli": "ECLI:FR:CCASS:2024:C100001"}))

# "IDCC 3239"
TagQuery(language="fr", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"idcc": "3239", "in_force": "true"}))

# UK: "section 1 of the Theft Act 1968" (ILIKE for fuzzy act title)
TagQuery(language="en", kind="legislation",
         tag_filters=TagFilterSet(filters=(
             TagFilter(key="section_number", op=TagFilterOp.EQ, value="1"),
             TagFilter(key="act_title", op=TagFilterOp.ILIKE, value="Theft Act 1968"),
         )))

# DE: "§ 823 BGB"
TagQuery(language="de", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"paragraph": "823", "code": "BGB"}))

# EU: "Article 101 TFEU"
TagQuery(language="en", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"article_number": "101", "treaty": "TFEU"}))
</syntaxhighlight>
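The NORMALIZE filters in the case-number examples strip separators from both sides before comparing. The sketch below is a Python approximation of `regexp_replace(x, pattern, '', 'g')` for the `[\s.\-/]` pattern used above. `normalize` and `normalized_equal` are illustrative names, and the equivalence with PostgreSQL's regex engine is assumed only for this simple character class.

```python
import re

CASE_NUMBER_PATTERN = r"[\s.\-/]"  # separator class from the examples above


def normalize(value: str, pattern: str = CASE_NUMBER_PATTERN) -> str:
    # Remove every separator, like regexp_replace(value, pattern, '', 'g').
    return re.sub(pattern, "", value)


def normalized_equal(stored: str, query: str, pattern: str = CASE_NUMBER_PATTERN) -> bool:
    # Both sides are normalized, so "20-20.648" and "20 20 648" compare equal.
    return normalize(stored, pattern) == normalize(query, pattern)
```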

Store execution

The store translates a TagQuery into SQL through the shared build_tag_filter_conditions helper and has no jurisdiction-specific knowledge. Each TagFilterOp maps to a specific SQL pattern:

<syntaxhighlight lang="sql">
SELECT *
FROM corpus.documents
WHERE language = %(language)s                                    -- always required
  AND kind = %(kind)s
  AND tags @> %(eq_batch)s                                       -- coalesced EQ filters (GIN)
  AND tags->>'k' = ANY(%(in_values)s)                            -- IN filter
  AND unaccent(tags->>'code') ILIKE unaccent(%(code_pattern)s)   -- ILIKE filter
  AND regexp_replace(tags->>'case_number', %(p)s, '', 'g')       -- NORMALIZE filter
      = regexp_replace(%(v)s, %(p)s, '', 'g')
  AND date <= %(at_date)s                                        -- temporal (if at_date)
  AND (date_end IS NULL OR date_end > %(at_date)s)
ORDER BY
  (tags->>'in_force')::boolean DESC NULLS LAST,                  -- should_sort_in_force_first
  date DESC NULLS LAST
LIMIT 10;
</syntaxhighlight>

Multi-candidate resolution

When a reference is ambiguous (e.g., a French case number that could match multiple courts), the resolver returns multiple TagQuery instances, each with a hint describing the interpretation. The MCP get_document tool:

  1. Collects all TagQueries from all jurisdiction plugins
  2. Separates confident (no hint) from candidates (with hint)
  3. Tries confident matches first — first hit wins (existing behavior)
  4. Tries all candidates against the store:
    • 1 match → returns the document with a warning noting the interpretation
    • 2+ matches → returns an error listing all candidates with IDs
    • 0 matches → falls through to "not found"

The get_document tool also accepts an optional tags parameter that merges additional EQ filters into each TagQuery, narrowing the search (e.g., tags={"court": "conseil_etat"}).

Jurisdiction parsers

Each jurisdiction plugin provides a parser that recognizes its citation formats:

{| class="wikitable"
! Plugin !! Recognizes
|-
| duralex-fr || Articles of codes, loi/decret/ordonnance by number, NOR codes, IDCC, ECLI, case numbers, BOFiP IDs, named laws
|-
| duralex-eu || CELEX numbers, ECLI, treaty articles, directive/regulation numbers
|-
| duralex-gb (future) || Neutral citations ([2024] UKSC 1), Act + section, SI numbers
|-
| duralex-de (future) || § + BGB/StGB/etc., Aktenzeichen, ECLI
|}

Parsers are composable: the MCP server chains all installed jurisdiction parsers. First match wins.
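Here is a toy illustration of "first match wins" chaining. The two regexes and the dict result shape are hypothetical simplifications: real plugins return TagQuery objects and cover far more formats. Only the chaining logic is the point.

```python
import re
from typing import Callable, Optional

# A parser returns TagQuery-like fields as a dict, or None if it does not recognize the input.
Parser = Callable[[str], Optional[dict]]

FR_CODE_CIVIL_ARTICLE = re.compile(r"^article\s+(\d+)\s+du\s+code\s+civil$", re.IGNORECASE)
EU_TREATY_ARTICLE = re.compile(r"^article\s+(\d+)\s+(TFEU|TEU)$", re.IGNORECASE)


def parse_fr(reference: str) -> Optional[dict]:
    m = FR_CODE_CIVIL_ARTICLE.match(reference.strip())
    if m is None:
        return None
    return {"language": "fr", "kind": "legislation",
            "tags": {"article_number": m.group(1), "code": "Code civil"}}


def parse_eu(reference: str) -> Optional[dict]:
    m = EU_TREATY_ARTICLE.match(reference.strip())
    if m is None:
        return None
    return {"language": "en", "kind": "legislation",
            "tags": {"article_number": m.group(1), "treaty": m.group(2).upper()}}


def chain_parsers(parsers: list[Parser]) -> Parser:
    def parse(reference: str) -> Optional[dict]:
        for parser in parsers:  # installation order decides precedence
            result = parser(reference)
            if result is not None:
                return result  # first match wins
        return None
    return parse
```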

Batch resolution (for knowledge graph)

The compiler needs to resolve millions of references. The same TagQuery mechanism is used, but with batch-friendly optimizations:

  • Pre-filter by known patterns (regex on reference strings)
  • Group by reference type and execute one query per group
  • Cache resolved IDs in a lookup table for the duration of compilation

Edge resolution (background job)

When corpus.edges.target_id is NULL, a background job periodically attempts resolution:

<syntaxhighlight lang="sql">
SELECT id, reference FROM corpus.edges WHERE target_id IS NULL;

-- For each row: parse reference → TagQuery → execute → update target_id
UPDATE corpus.edges SET target_id = %(resolved_id)s WHERE id = %(edge_id)s;
</syntaxhighlight>
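The background job's loop can be sketched as below. `resolve` and `update` are hypothetical callables. They stand in for the parse → TagQuery → execute chain and for the UPDATE statement, and integer edge IDs are an assumption. Unresolved edges keep a NULL target_id and are simply retried on the next run.

```python
from typing import Callable, Iterable, Optional


def resolve_pending_edges(
    pending: Iterable[tuple[int, str]],         # rows from: SELECT id, reference ... WHERE target_id IS NULL
    resolve: Callable[[str], Optional[str]],    # parse → TagQuery → execute; a document ID or None
    update: Callable[[int, str], None],         # runs the UPDATE corpus.edges ... statement
) -> tuple[int, int]:
    """Return (resolved_count, unresolved_count) for one pass of the job."""
    resolved = unresolved = 0
    for edge_id, reference in pending:
        doc_id = resolve(reference)
        if doc_id is None:
            unresolved += 1  # left NULL; retried on the next run
            continue
        update(edge_id, doc_id)
        resolved += 1
    return resolved, unresolved
```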