Reference Resolution

Overview

Reference resolution transforms a natural language legal reference (or a document ID) into a query against the corpus. It is the bridge between how humans cite law and how the database stores it.

The resolver is used by:

MCP get tool: user asks for "article 1240 du code civil" → find the document
Knowledge graph compiler (future): annotations reference articles by citation → resolve to corpus document IDs
Edge resolution: when corpus.edges.target_id is NULL, a background job resolves reference strings to document IDs

TagQuery

The universal output of any reference parser. Jurisdiction-agnostic.

<syntaxhighlight lang="python">
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class TagQuery:
    language: str  # required, kw_only
    kind: str | None = None
    tag_filters: TagFilterSet = field(default_factory=TagFilterSet)
    should_sort_in_force_first: bool = False
    at_date: date | None = None
    hint: str | None = None
</syntaxhighlight>

Fields:

language: ISO 639-1 code, required (disambiguates language variants)
kind: filter on document kind (legislation, decision, record...)
tag_filters: TagFilterSet with all tag predicates (EQ, IN, NOT_IN, ILIKE, EXISTS, NOT_EXISTS, NORMALIZE)
should_sort_in_force_first: order results with tags.in_force=true first, then by date descending
at_date: temporal version selection — date <= at_date AND (date_end IS NULL OR date_end > at_date)
hint: optional human-readable interpretation label. TagQueries with hint are candidates (collected from all plugins, disambiguated at MCP level); those without are confident matches (first hit wins)

TagFilterSet

Unified immutable filter model. A tuple of TagFilter predicates, AND-combined. Replaces the previous scattered dict params (tags, tags_ilike, normalize).

<syntaxhighlight lang="python">
import enum
from dataclasses import dataclass

class TagFilterOp(enum.Enum):
    # Member values are not given in this page; enum.auto() is a placeholder.
    EQ = enum.auto()          # tags @> '{"k": "v"}' (JSONB containment)
    IN = enum.auto()          # tags->>'k' = ANY(...)
    NOT_IN = enum.auto()      # tags ? 'k' AND NOT (tags->>'k' = ANY(...))
    ILIKE = enum.auto()       # unaccent(tags->>'k') ILIKE unaccent(pattern)
    EXISTS = enum.auto()      # tags ? 'k'
    NOT_EXISTS = enum.auto()  # NOT (tags ? 'k')
    NORMALIZE = enum.auto()   # regexp_replace comparison (reference resolution only)

@dataclass(frozen=True)
class TagFilter:
    key: str                              # tag key or virtual key (source/jurisdiction/language)
    op: TagFilterOp
    value: str | list[str] | None = None
    normalize_pattern: str | None = None  # only for NORMALIZE
</syntaxhighlight>

Convenience constructor: TagFilterSet.from_tags({"k": "v"}) builds EQ filters from a dict.

Resolution pipeline

input string
    |
    v
1. Direct ID lookup (try as corpus.documents.id — covers all kinds in one query)
    |  found? → return (with CID-based version redirection if at_date is set)
    v
2. Jurisdiction parser (FR, EU, GB...) → TagQuery
    |  parsed? → execute against store
    |  Note: SIREN 9-digit lookup is a detector in the FR plugin (with Luhn validation)
    v
3. Disambiguation (bare article number matches multiple codes → error with suggestions)
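
The three steps can be sketched over an in-memory corpus. This is a simplified illustration, not the resolver itself: the document IDs, the toy French parser, the exception names and the dict-based query are all hypothetical, and the CID-based version redirection of step 1 is left out.

```python
import re
from typing import Callable, Optional

# Hypothetical in-memory corpus; IDs and tags are illustrative only.
DOCS = [
    {"id": "doc-a", "tags": {"article_number": "1240", "code": "Code civil"}},
    {"id": "doc-b", "tags": {"article_number": "12", "code": "Code civil"}},
    {"id": "doc-c", "tags": {"article_number": "12", "code": "Code pénal"}},
]

Parser = Callable[[str], Optional[dict]]


class AmbiguousReference(Exception):
    pass


class ReferenceNotFound(Exception):
    pass


def parse_fr_article(ref: str) -> Optional[dict]:
    """Toy parser: 'article N', optionally followed by 'du <code name>'."""
    m = re.fullmatch(r"article\s+(\S+)(?:\s+du\s+(.+))?", ref.strip(), re.IGNORECASE)
    if m is None:
        return None
    tags = {"article_number": m.group(1)}
    if m.group(2):
        tags["code"] = m.group(2).strip().capitalize()
    return tags


def run_query(tags: dict) -> list[dict]:
    # A document matches when it carries every requested tag (AND semantics).
    return [d for d in DOCS if tags.items() <= d["tags"].items()]


def resolve(ref: str, parsers: list[Parser]) -> dict:
    # Step 1: direct ID lookup.
    for doc in DOCS:
        if doc["id"] == ref:
            return doc
    # Step 2: the first parser that recognizes the string produces the query.
    for parse in parsers:
        tags = parse(ref)
        if tags is None:
            continue
        hits = run_query(tags)
        codes = sorted({d["tags"]["code"] for d in hits})
        # Step 3: a bare article number spanning several codes is ambiguous.
        if len(codes) > 1:
            raise AmbiguousReference(f"{ref!r} matches several codes: {', '.join(codes)}")
        if hits:
            return hits[0]
    raise ReferenceNotFound(ref)


print(resolve("doc-b", [parse_fr_article])["id"])                       # doc-b
print(resolve("article 1240 du code civil", [parse_fr_article])["id"])  # doc-a
try:
    resolve("article 12", [parse_fr_article])
except AmbiguousReference as exc:
    print(exc)
```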

TagQuery examples

<syntaxhighlight lang="python">
# "article 1240 du code civil"
TagQuery(language="fr", kind="legislation",
    tag_filters=TagFilterSet.from_tags({"article_number": "1240", "code": "Code civil"}),
    should_sort_in_force_first=True)

# "article 1147 du code civil" (version in force in 2015)
TagQuery(language="fr", kind="legislation",
    tag_filters=TagFilterSet.from_tags({"article_number": "1147", "code": "Code civil"}),
    at_date=date(2015, 6, 15))

# "loi n 2021-1109"
TagQuery(language="fr", kind="legislation",
    tag_filters=TagFilterSet.from_tags({"nature": "LOI", "number": "2021-1109"}))

# "pourvoi 20-20.648": case number with court filter and hint (candidate)
TagQuery(language="fr", kind="decision",
    tag_filters=TagFilterSet(filters=(
        TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                  value="20-20.648", normalize_pattern=r"[\s.\-/]"),
        TagFilter(key="court", op=TagFilterOp.EQ, value="cour_cassation"),
    )),
    hint="pourvoi Cour de cassation")

# "486329": CE request number
TagQuery(language="fr", kind="decision",
    tag_filters=TagFilterSet(filters=(
        TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                  value="486329", normalize_pattern=r"[\s.\-/]"),
        TagFilter(key="court", op=TagFilterOp.EQ, value="conseil_etat"),
    )),
    hint="requete Conseil d'Etat")

# "21/00091": CA/TJ RG number
TagQuery(language="fr", kind="decision",
    tag_filters=TagFilterSet(filters=(
        TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                  value="21/00091", normalize_pattern=r"[\s.\-/]"),
        TagFilter(key="court", op=TagFilterOp.IN, value=["cour_appel", "tribunal_judiciaire"]),
    )),
    hint="RG cour d'appel ou tribunal judiciaire")

# ECLI
TagQuery(language="fr", kind="decision",
    tag_filters=TagFilterSet.from_tags({"ecli": "ECLI:FR:CCASS:2024:C100001"}))

# "IDCC 3239"
TagQuery(language="fr", kind="legislation",
    tag_filters=TagFilterSet.from_tags({"idcc": "3239", "in_force": "true"}))

# UK: "section 1 of the Theft Act 1968" (ILIKE for fuzzy act title)
TagQuery(language="en", kind="legislation",
    tag_filters=TagFilterSet(filters=(
        TagFilter(key="section_number", op=TagFilterOp.EQ, value="1"),
        TagFilter(key="act_title", op=TagFilterOp.ILIKE, value="Theft Act 1968"),
    )))

# DE: "§ 823 BGB"
TagQuery(language="de", kind="legislation",
    tag_filters=TagFilterSet.from_tags({"paragraph": "823", "code": "BGB"}))

# EU: "Article 101 TFEU"
TagQuery(language="en", kind="legislation",
    tag_filters=TagFilterSet.from_tags({"article_number": "101", "treaty": "TFEU"}))
</syntaxhighlight>
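
The NORMALIZE pattern used in the case-number examples strips whitespace, dots, hyphens and slashes from both sides before comparing. A small sketch of the same comparison in Python; the helper name normalized_equal is hypothetical, and Python's re module stands in for PostgreSQL's regexp_replace (the two regex dialects differ in general, but agree on this character class):

```python
import re

NORMALIZE_PATTERN = r"[\s.\-/]"  # same pattern as in the TagQuery examples


def normalized_equal(stored: str, queried: str, pattern: str = NORMALIZE_PATTERN) -> bool:
    """Compare two strings after deleting every character matched by pattern."""
    return re.sub(pattern, "", stored) == re.sub(pattern, "", queried)


print(normalized_equal("20-20.648", "2020648"))    # True
print(normalized_equal("20 20 648", "20-20.648"))  # True
print(normalized_equal("21/00091", "21-00091"))    # True
print(normalized_equal("21/00091", "21/00092"))    # False
```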

Store execution

The store translates TagQuery to SQL via the shared build_tag_filter_conditions helper. Zero jurisdiction knowledge. Each TagFilterOp maps to a specific SQL pattern:

<syntaxhighlight lang="sql">
SELECT * FROM corpus.documents
WHERE language = %(language)s                                       -- always required
  AND kind = %(kind)s
  AND tags @> %(eq_batch)s                                          -- coalesced EQ filters (GIN)
  AND tags->>'k' = ANY(%(in_values)s)                               -- IN filter
  AND unaccent(tags->>'code') ILIKE unaccent(%(code_pattern)s)      -- ILIKE filter
  AND regexp_replace(tags->>'case_number', %(p)s, '', 'g')          -- NORMALIZE filter
      = regexp_replace(%(v)s, %(p)s, '', 'g')
  AND date <= %(at_date)s                                           -- temporal (if at_date)
  AND (date_end IS NULL OR date_end > %(at_date)s)
ORDER BY
  (tags->>'in_force')::boolean DESC NULLS LAST,                     -- should_sort_in_force_first
  date DESC NULLS LAST
LIMIT 10;
</syntaxhighlight>

Multi-candidate resolution

When a reference is ambiguous (e.g., a French case number that could match multiple courts), the resolver returns multiple TagQuery instances, each with a hint describing the interpretation. The MCP get_document tool:

1. Collects all TagQueries from all jurisdiction plugins
2. Separates confident (no hint) from candidates (with hint)
3. Tries confident matches first: first hit wins (existing behavior)
4. Tries all candidates against the store:
- 1 match → returns the document with a warning noting the interpretation
- 2+ matches → returns an error listing all candidates with IDs
- 0 matches → falls through to "not found"

The get_document tool also accepts an optional tags parameter that merges additional EQ filters into each TagQuery, narrowing the search (e.g., tags={"court": "conseil_etat"}).

Jurisdiction parsers

Each jurisdiction plugin provides a parser that recognizes its citation formats:

Plugin	Recognizes
duralex-fr	Articles of codes, loi/decret/ordonnance by number, NOR codes, IDCC, ECLI, case numbers, BOFiP IDs, named laws
duralex-eu	CELEX numbers, ECLI, treaty articles, directive/regulation numbers
Future duralex-gb	Neutral citations ([2024] UKSC 1), Act + section, SI numbers
Future duralex-de	§ + BGB/StGB/etc, Aktenzeichen, ECLI

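Parsers are composable: the MCP server chains all installed jurisdiction parsers. Among confident TagQueries (no hint), the first one that returns a document wins; TagQueries carrying a hint are collected from every plugin and disambiguated as described under Multi-candidate resolution.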
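
A minimal sketch of chaining, assuming each parser is a callable that returns a tag dict or None. The two toy parsers and their regexes are hypothetical and far narrower than real plugins; the tag keys are the ones used in the DE and EU examples of this page.

```python
import re
from typing import Callable, Optional

Parser = Callable[[str], Optional[dict]]


def parse_de_paragraph(ref: str) -> Optional[dict]:
    """Toy parser for '§ 823 BGB'-style citations."""
    m = re.fullmatch(r"§\s*(\d+[a-z]?)\s+([A-Za-z]+)", ref.strip())
    return {"paragraph": m.group(1), "code": m.group(2)} if m else None


def parse_eu_treaty_article(ref: str) -> Optional[dict]:
    """Toy parser for 'Article 101 TFEU'-style citations."""
    m = re.fullmatch(r"Article\s+(\d+)\s+(TFEU|TEU)", ref.strip())
    return {"article_number": m.group(1), "treaty": m.group(2)} if m else None


def first_match(ref: str, parsers: dict[str, Parser]) -> Optional[tuple[str, dict]]:
    # Parsers are tried in registration order; the first one that recognizes the string wins.
    for plugin_name, parse in parsers.items():
        tags = parse(ref)
        if tags is not None:
            return plugin_name, tags
    return None


PARSERS = {"duralex-eu": parse_eu_treaty_article, "duralex-de": parse_de_paragraph}

print(first_match("§ 823 BGB", PARSERS))         # ('duralex-de', {'paragraph': '823', 'code': 'BGB'})
print(first_match("Article 101 TFEU", PARSERS))  # ('duralex-eu', {'article_number': '101', 'treaty': 'TFEU'})
print(first_match("something else", PARSERS))    # None
```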

Batch resolution (for knowledge graph)

The compiler needs to resolve millions of references. The same TagQuery mechanism is used, but with batch-friendly optimizations:

Pre-filter by known patterns (regex on reference strings)
Group by reference type and execute one query per group
Cache resolved IDs in a lookup table for the duration of compilation

Edge resolution (background job)

When corpus.edges.target_id is NULL, a background job periodically attempts resolution:

<syntaxhighlight lang="sql">
SELECT id, reference FROM corpus.edges WHERE target_id IS NULL;
-- For each: parse reference → TagQuery → execute → update target_id
UPDATE corpus.edges SET target_id = %(resolved_id)s WHERE id = %(edge_id)s;
</syntaxhighlight>
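
One pass of such a job can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: a plain edges table in an in-memory SQLite database stands in for corpus.edges, the resolve callback stands in for the parse, TagQuery and execute steps, and the function name and document ID are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def resolve_pending_edges(conn: sqlite3.Connection,
                          resolve: Callable[[str], Optional[str]]) -> int:
    """Run one pass over unresolved edges; return how many were updated."""
    rows = conn.execute(
        "SELECT id, reference FROM edges WHERE target_id IS NULL"
    ).fetchall()
    updated = 0
    for edge_id, reference in rows:
        target = resolve(reference)
        if target is None:
            continue  # stays NULL and is retried on the next pass
        conn.execute(
            "UPDATE edges SET target_id = :resolved_id WHERE id = :edge_id",
            {"resolved_id": target, "edge_id": edge_id},
        )
        updated += 1
    conn.commit()
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (id INTEGER PRIMARY KEY, reference TEXT, target_id TEXT)")
conn.executemany(
    "INSERT INTO edges (reference) VALUES (?)",
    [("article 1240 du code civil",), ("unparseable reference",)],
)

known = {"article 1240 du code civil": "doc-1240"}  # made-up ID
print(resolve_pending_edges(conn, known.get))  # 1
print(resolve_pending_edges(conn, known.get))  # 0
```

Edges that cannot be resolved keep target_id NULL, so a later pass picks them up again once the corpus or the parsers have changed.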
