= Reference Resolution =

== Overview ==

Reference resolution transforms a natural-language legal reference (or a document ID) into a query against the corpus. It is the bridge between how humans cite law and how the database stores it.

The resolver is used by:
* '''MCP <code>get</code> tool''': the user asks for "article 1240 du code civil" → find the document
* '''Knowledge graph compiler''' (future): annotations reference articles by citation → resolve to corpus document IDs
* '''Edge resolution''': when <code>corpus.edges.target_id</code> is NULL, a background job resolves <code>reference</code> strings to document IDs

== TagQuery ==

The universal output of any reference parser. Jurisdiction-agnostic.

<syntaxhighlight lang="python">
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class TagQuery:
    language: str  # required, kw_only
    kind: str | None = None
    tag_filters: TagFilterSet = field(default_factory=TagFilterSet)
    should_sort_in_force_first: bool = False
    at_date: date | None = None
    hint: str | None = None
</syntaxhighlight>

Fields:
* <code>language</code>: ISO 639-1 code, required (disambiguates language variants)
* <code>kind</code>: filter on document kind (legislation, decision, record...)
* <code>tag_filters</code>: TagFilterSet with all tag predicates (EQ, IN, NOT_IN, ILIKE, EXISTS, NOT_EXISTS, NORMALIZE)
* <code>should_sort_in_force_first</code>: order results with <code>tags.in_force=true</code> first, then by date descending
* <code>at_date</code>: temporal version selection, applied as <code>date <= at_date AND (date_end IS NULL OR date_end > at_date)</code>
* <code>hint</code>: optional human-readable interpretation label. TagQueries '''with''' <code>hint</code> are ''candidates'' (collected from all plugins, disambiguated at MCP level); those '''without''' are ''confident matches'' (first hit wins)

== TagFilterSet ==

Unified immutable filter model. A tuple of <code>TagFilter</code> predicates, AND-combined.
Replaces the previous scattered dict params (<code>tags</code>, <code>tags_ilike</code>, <code>normalize</code>).

<syntaxhighlight lang="python">
import enum
from dataclasses import dataclass

class TagFilterOp(enum.Enum):
    # Member values are not specified here; enum.auto() is a placeholder.
    EQ = enum.auto()          # tags @> '{"k": "v"}' (JSONB containment)
    IN = enum.auto()          # tags->>'k' = ANY(...)
    NOT_IN = enum.auto()      # tags ? 'k' AND NOT (tags->>'k' = ANY(...))
    ILIKE = enum.auto()       # unaccent(tags->>'k') ILIKE unaccent(pattern)
    EXISTS = enum.auto()      # tags ? 'k'
    NOT_EXISTS = enum.auto()  # NOT (tags ? 'k')
    NORMALIZE = enum.auto()   # regexp_replace comparison (reference resolution only)

@dataclass(frozen=True)
class TagFilter:
    key: str  # tag key or virtual key (source/jurisdiction/language)
    op: TagFilterOp
    value: str | list[str] | None = None
    normalize_pattern: str | None = None  # only for NORMALIZE
</syntaxhighlight>

Convenience constructor: <code>TagFilterSet.from_tags({"k": "v"})</code> builds EQ filters from a dict.

== Resolution pipeline ==

<pre>
input string
    |
    v
1. Direct ID lookup (try as corpus.documents.id; covers all kinds in one query)
    | found? → return (with CID-based version redirection if at_date is set)
    v
2. Jurisdiction parser (FR, EU, GB...) → TagQuery
    | parsed? → execute against store
    | Note: SIREN 9-digit lookup is a detector in the FR plugin (with Luhn validation)
    v
3. Disambiguation (bare article number matches multiple codes → error with suggestions)
</pre>

== TagQuery examples ==

<syntaxhighlight lang="python">
# "article 1240 du code civil"
TagQuery(language="fr", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"article_number": "1240", "code": "Code civil"}),
         should_sort_in_force_first=True)

# "article 1147 du code civil" (version in force in 2015)
TagQuery(language="fr", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"article_number": "1147", "code": "Code civil"}),
         at_date=date(2015, 6, 15))

# "loi n 2021-1109"
TagQuery(language="fr", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"nature": "LOI", "number": "2021-1109"}))

# "pourvoi 20-20.648": case number with court filter and hint (candidate)
TagQuery(language="fr", kind="decision",
         tag_filters=TagFilterSet(filters=(
             TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                       value="20-20.648", normalize_pattern=r"[\s.\-/]"),
             TagFilter(key="court", op=TagFilterOp.EQ, value="cour_cassation"),
         )),
         hint="pourvoi Cour de cassation")

# "486329": CE request number
TagQuery(language="fr", kind="decision",
         tag_filters=TagFilterSet(filters=(
             TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                       value="486329", normalize_pattern=r"[\s.\-/]"),
             TagFilter(key="court", op=TagFilterOp.EQ, value="conseil_etat"),
         )),
         hint="requete Conseil d'Etat")

# "21/00091": CA/TJ RG number
TagQuery(language="fr", kind="decision",
         tag_filters=TagFilterSet(filters=(
             TagFilter(key="case_number", op=TagFilterOp.NORMALIZE,
                       value="21/00091", normalize_pattern=r"[\s.\-/]"),
             TagFilter(key="court", op=TagFilterOp.IN,
                       value=["cour_appel", "tribunal_judiciaire"]),
         )),
         hint="RG cour d'appel ou tribunal judiciaire")

# ECLI
TagQuery(language="fr", kind="decision",
         tag_filters=TagFilterSet.from_tags({"ecli": "ECLI:FR:CCASS:2024:C100001"}))

# "IDCC 3239"
TagQuery(language="fr", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"idcc": "3239", "in_force": "true"}))

# UK: "section 1 of the Theft Act 1968" (ILIKE for fuzzy act title)
TagQuery(language="en", kind="legislation",
         tag_filters=TagFilterSet(filters=(
             TagFilter(key="section_number", op=TagFilterOp.EQ, value="1"),
             TagFilter(key="act_title", op=TagFilterOp.ILIKE, value="Theft Act 1968"),
         )))

# DE: "§ 823 BGB"
TagQuery(language="de", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"paragraph": "823", "code": "BGB"}))

# EU: "Article 101 TFEU"
TagQuery(language="en", kind="legislation",
         tag_filters=TagFilterSet.from_tags({"article_number": "101", "treaty": "TFEU"}))
</syntaxhighlight>

== Store execution ==

The store translates a TagQuery to SQL via the shared <code>build_tag_filter_conditions</code> helper. It has zero jurisdiction knowledge. Each TagFilterOp maps to a specific SQL pattern:

<syntaxhighlight lang="sql">
SELECT * FROM corpus.documents
WHERE language = %(language)s                                  -- always required
  AND kind = %(kind)s
  AND tags @> %(eq_batch)s                                     -- coalesced EQ filters (GIN)
  AND tags->>'k' = ANY(%(in_values)s)                          -- IN filter
  AND unaccent(tags->>'code') ILIKE unaccent(%(code_pattern)s) -- ILIKE filter
  AND regexp_replace(tags->>'case_number', %(p)s, '', 'g')     -- NORMALIZE filter
      = regexp_replace(%(v)s, %(p)s, '', 'g')
  AND date <= %(at_date)s                                      -- temporal (if at_date)
  AND (date_end IS NULL OR date_end > %(at_date)s)
ORDER BY (tags->>'in_force')::boolean DESC NULLS LAST,         -- should_sort_in_force_first
         date DESC NULLS LAST
LIMIT 10;
</syntaxhighlight>

== Multi-candidate resolution ==

When a reference is ambiguous (e.g., a French case number that could match multiple courts), the resolver returns multiple TagQuery instances, each with a <code>hint</code> describing the interpretation.
The MCP <code>get_document</code> tool:
# Collects all TagQueries from all jurisdiction plugins
# Separates '''confident''' (no hint) from '''candidates''' (with hint)
# Tries confident matches first: first hit wins (existing behavior)
# Tries all candidates against the store:
#* 1 match → returns the document with a warning noting the interpretation
#* 2+ matches → returns an error listing all candidates with IDs
#* 0 matches → falls through to "not found"

The <code>get_document</code> tool also accepts an optional <code>tags</code> parameter that merges additional EQ filters into each TagQuery, narrowing the search (e.g., <code>tags={"court": "conseil_etat"}</code>).

== Jurisdiction parsers ==

Each jurisdiction plugin provides a parser that recognizes its citation formats:

{| class="wikitable"
! Plugin !! Recognizes
|-
| duralex-fr || Articles of codes, loi/decret/ordonnance by number, NOR codes, IDCC, ECLI, case numbers, BOFiP IDs, named laws
|-
| duralex-eu || CELEX numbers, ECLI, treaty articles, directive/regulation numbers
|-
| Future duralex-gb || Neutral citations ([2024] UKSC 1), Act + section, SI numbers
|-
| Future duralex-de || § + BGB/StGB/etc, Aktenzeichen, ECLI
|}

Parsers are composable: the MCP server chains all installed jurisdiction parsers. Among confident matches (TagQueries without a <code>hint</code>), the first match wins; hinted candidates are disambiguated as described under Multi-candidate resolution.

== Batch resolution (for knowledge graph) ==

The compiler needs to resolve millions of references.
The same TagQuery mechanism is used, but with batch-friendly optimizations:
* Pre-filter by known patterns (regex on reference strings)
* Group by reference type and execute one query per group
* Cache resolved IDs in a lookup table for the duration of compilation

== Edge resolution (background job) ==

When <code>corpus.edges.target_id</code> is NULL, a background job periodically attempts resolution:

<syntaxhighlight lang="sql">
SELECT id, reference FROM corpus.edges WHERE target_id IS NULL;

-- For each: parse reference → TagQuery → execute → update target_id
UPDATE corpus.edges SET target_id = %(resolved_id)s WHERE id = %(edge_id)s;
</syntaxhighlight>

[[Category:MCP]]
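
The edge-resolution loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual job. The <code>Edge</code> class, <code>resolve_pending_edges</code>, and the <code>resolve</code> callable are hypothetical names, the document IDs are made up, and the in-memory list stands in for <code>corpus.edges</code>. The per-run cache follows the lookup-table idea from the batch resolution section.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Edge:
    """In-memory stand-in for a row of corpus.edges (hypothetical shape)."""
    id: int
    reference: str
    target_id: Optional[str] = None


def resolve_pending_edges(
    edges: list[Edge],
    resolve: Callable[[str], Optional[str]],
) -> int:
    """Fill target_id on unresolved edges; return how many were updated."""
    cache: dict[str, Optional[str]] = {}  # one lookup per distinct reference
    updated = 0
    for edge in edges:
        if edge.target_id is not None:
            continue  # mirrors WHERE target_id IS NULL
        if edge.reference not in cache:
            cache[edge.reference] = resolve(edge.reference)
        resolved_id = cache[edge.reference]
        if resolved_id is None:
            continue  # stays unresolved, so a later run can retry
        edge.target_id = resolved_id  # mirrors the UPDATE statement
        updated += 1
    return updated


# Toy resolver: a dict lookup standing in for parse -> TagQuery -> store.
# The document IDs below are invented for the example.
KNOWN = {"article 1240 du code civil": "doc-a", "loi n 2021-1109": "doc-b"}

edges = [
    Edge(1, "article 1240 du code civil"),
    Edge(2, "unparseable reference"),
    Edge(3, "article 1240 du code civil"),
    Edge(4, "loi n 2021-1109", target_id="doc-b"),
]
updated_count = resolve_pending_edges(edges, KNOWN.get)
```

Leaving failed references untouched keeps the job idempotent: a later run picks them up again once a new parser or new corpus documents make them resolvable.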