Architecture

From Dura Lex Wiki
Revision as of 02:18, 23 April 2026 by Nicolas (talk | contribs) (Architecture page — from spec/ARCHITECTURE.md, full faithful conversion (via create-page on MediaWiki MCP Server))

What Dura Lex is

Dura Lex is an open-source platform for international legal data. The code is released under the MIT license and the data under ODbL.

It structures public legal data (legislation, case law, company registries, administrative guidance) into a jurisdiction-agnostic corpus that AI agents can query via MCP and humans can browse via a web portal.

OS metaphor

OS concept      | Dura Lex equivalent
Kernel          | duralex: protocols, data models, unified schema
Drivers         | Country packages (duralex-fr, duralex-eu, future duralex-gb, duralex-de)
File system     | Document IDs, jurisdiction-scoped URIs
System calls    | search(), get(), browse(), guidelines(), quality_check()
Package manager | Compiled SQLite packages for the knowledge graph
Applications    | duralex-mcp (MCP server), duralex-portal (web), duralex-graph (future)

Package overview (8 packages, 6 repos)

Package           | Repo                   | Role | Depends on
duralex           | duralex/               | Core protocols, data models, temporal, corpus schema, DocumentStore, SearchEngine, FTS. Specs + docs. | nothing
duralex-fr        | duralex-jurisdictions/ | French jurisdiction plugin: tag schema, formatters, reference resolver, synonym thesaurus. | duralex
duralex-eu        | duralex-jurisdictions/ | EU jurisdiction plugin: EUR-Lex, CJEU, ECHR references. | duralex
duralex-ingest    | duralex-ingest/        | Schema DDL, universal BatchWriter, state management, HTML sanitizer. | duralex
duralex-ingest-fr | duralex-ingest/        | French parsers: DILA, Judilibre, BODACC, RNE, BOFiP, CADA, CE, CNIL. | duralex, ingest, fr
duralex-ingest-eu | duralex-ingest/        | EU parsers: EUR-Lex (Cellar), CJEU, ECHR. | duralex, ingest
duralex-mcp       | duralex-mcp/           | MCP server: 5 tools (search, get, browse, guidelines, quality_check). Plugin discovery. Docker infra. | duralex, fr/eu
duralex-portal    | duralex-portal/        | Web portal for human consultation of the corpus. | duralex

Future: duralex-graph (knowledge graph compiler, separate repo).

Dependency graph

duralex                           no dependencies
     |
     +----> duralex-fr            depends on: duralex
     +----> duralex-eu            depends on: duralex
     |
     +----> duralex-ingest        depends on: duralex
     |           |
     |           +----> duralex-ingest-fr    depends on: duralex, ingest, fr
     |           +----> duralex-ingest-eu    depends on: duralex, ingest
     |
     +----> duralex-mcp           depends on: duralex, fr, eu (via plugin discovery)
     +----> duralex-portal        depends on: duralex
     +----> duralex-graph         depends on: duralex (future)
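
The graph above can be checked mechanically. The following is a minimal sketch, not part of the Dura Lex codebase: it transcribes the edges into a dictionary and uses the standard-library graphlib module to derive an order in which each package comes after everything it depends on. The names DEPENDENCIES and install_order are hypothetical.

```python
from graphlib import TopologicalSorter

# Package -> packages it depends on, transcribed from the graph above.
# duralex-graph is marked "future" in the source; it is listed for completeness.
DEPENDENCIES: dict[str, set[str]] = {
    "duralex": set(),
    "duralex-fr": {"duralex"},
    "duralex-eu": {"duralex"},
    "duralex-ingest": {"duralex"},
    "duralex-ingest-fr": {"duralex", "duralex-ingest", "duralex-fr"},
    "duralex-ingest-eu": {"duralex", "duralex-ingest"},
    "duralex-mcp": {"duralex", "duralex-fr", "duralex-eu"},
    "duralex-portal": {"duralex"},
    "duralex-graph": {"duralex"},
}


def install_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every package follows its dependencies.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = install_order(DEPENDENCIES)
```

Because duralex is the only package with no dependencies, it always comes first in the resulting order. A cycle introduced by a later change would surface as a CycleError rather than a silent misordering.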

Two PostgreSQL schemas

corpus.*    -- source documents, citations, tag stats, source metadata
graph.*     -- concepts, annotations, compiled edges, compilation metadata (future)

The two schemas live in the same database but have different lifecycles:

  • corpus: source data, precious (days to re-ingest), stable after ingestion.
  • graph: compiled knowledge, reproducible (hours to recompile), write-heavy during compilation.

Benefits: VACUUM/ANALYZE and backup strategies can be handled separately for each schema, and DROP SCHEMA graph CASCADE allows a full recompile without touching corpus.
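
As an illustration of the recompile path, here is a small sketch of a helper that produces the PostgreSQL statements for resetting the graph schema and refuses to do the same for corpus. The helper name reset_schema_statements and the PROTECTED_SCHEMAS set are assumptions made for this example; only the DROP SCHEMA ... CASCADE idea comes from the text above.

```python
# Schemas that tooling must never drop (assumption: only "corpus").
PROTECTED_SCHEMAS = frozenset({"corpus"})


def reset_schema_statements(schema: str) -> list[str]:
    """Return the PostgreSQL statements that drop and recreate one schema."""
    # Schema names cannot be bound as query parameters, so only plain
    # identifiers are accepted before interpolating into SQL text.
    if not schema.isidentifier():
        raise ValueError(f"not a plain identifier: {schema!r}")
    if schema in PROTECTED_SCHEMAS:
        raise ValueError(f"refusing to drop protected schema {schema!r}")
    return [
        f"DROP SCHEMA IF EXISTS {schema} CASCADE",
        f"CREATE SCHEMA {schema}",
    ]


statements = reset_schema_statements("graph")
```

The statements are returned rather than executed so that the caller decides which connection and transaction to run them in.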

Core composition pattern

Three levels:

  1. Core protocol (duralex): defines WHAT can be done (DocumentStore protocol, SearchEngine protocol).
  2. Country implementation (duralex-fr): implements HOW for a specific jurisdiction (formatters, reference resolvers, tag schemas).
  3. Application (duralex-mcp): composes protocols + implementations at startup.
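
The three levels above could be sketched with typing.Protocol as follows. This is an illustration only: DocumentStore is named in the source, but its get signature, the Document fields, the Formatter protocol, the document ID format, and the InMemoryStore and Application classes are all assumptions made so the example runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Document:
    """Minimal stand-in for a corpus document (fields are assumptions)."""
    document_id: str
    jurisdiction: str
    title: str


class DocumentStore(Protocol):
    """Level 1: the core states WHAT can be done."""
    def get(self, document_id: str) -> Document | None: ...


class Formatter(Protocol):
    """Level 1: the shape a jurisdiction formatter must have (hypothetical)."""
    def format_title(self, document: Document) -> str: ...


class FrenchFormatter:
    """Level 2: a jurisdiction package decides HOW (illustrative only)."""
    def format_title(self, document: Document) -> str:
        return f"[FR] {document.title}"


class InMemoryStore:
    """Toy store so the sketch runs without a database."""
    def __init__(self, documents: list[Document]) -> None:
        self._by_id = {d.document_id: d for d in documents}

    def get(self, document_id: str) -> Document | None:
        return self._by_id.get(document_id)


class Application:
    """Level 3: composes a store and per-jurisdiction formatters at startup."""
    def __init__(self, store: DocumentStore, formatters: dict[str, Formatter]) -> None:
        self._store = store
        self._formatters = formatters

    def get(self, document_id: str) -> str:
        document = self._store.get(document_id)
        if document is None:
            raise KeyError(document_id)
        return self._formatters[document.jurisdiction].format_title(document)


app = Application(
    store=InMemoryStore([Document("fr/code-civil/1240", "fr", "Code civil, article 1240")]),
    formatters={"fr": FrenchFormatter()},
)
rendered = app.get("fr/code-civil/1240")
```

Neither FrenchFormatter nor InMemoryStore inherits from a core class. They satisfy the protocols structurally, so the core never has to import a jurisdiction package.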

Reference resolution uses TagQuery as the universal interface between jurisdiction parsers and the store. This is a core architectural decision — all reference resolution across all jurisdictions flows through TagQuery. There are no typed reference classes per jurisdiction.

  • Core defines: TagQuery dataclass (language, kind, tag_filters: TagFilterSet, should_sort_in_force_first, at_date) and TagFilterSet (immutable tuple of TagFilter predicates with operators EQ/IN/NOT_IN/ILIKE/EXISTS/NOT_EXISTS/NORMALIZE). See MCP/Reference resolution.
  • Jurisdiction plugin (e.g., duralex-fr): parses "article 1240 du code civil" into TagQuery(language="fr", kind="legislation", tag_filters=TagFilterSet.from_tags({"article_number": "1240", "code": "Code civil"})).
  • Store: translates TagQuery to SQL via the shared build_tag_filter_conditions builder (generic, zero jurisdiction knowledge).
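
The flow described in the three bullets above might look like the sketch below. The class, field, and function names (TagQuery, TagFilterSet, TagFilter, from_tags, build_tag_filter_conditions) come from the text. Everything else is an assumption for illustration: the field defaults, the reduced operator set, the resolve_french_reference function with its deliberately narrow regular expression, the JSONB column named tags, and the %s placeholder style.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Operator(Enum):
    EQ = "eq"
    IN = "in"  # the source lists more operators; two are enough here


@dataclass(frozen=True)
class TagFilter:
    key: str
    operator: Operator
    value: object = None


@dataclass(frozen=True)
class TagFilterSet:
    filters: tuple[TagFilter, ...] = ()

    @classmethod
    def from_tags(cls, tags: dict[str, str]) -> TagFilterSet:
        """Build one EQ filter per tag."""
        return cls(tuple(TagFilter(k, Operator.EQ, v) for k, v in tags.items()))


@dataclass(frozen=True)
class TagQuery:
    language: str
    kind: str
    tag_filters: TagFilterSet
    should_sort_in_force_first: bool = False  # default is an assumption
    at_date: date | None = None               # default is an assumption


# Hypothetical pattern covering only "article <number> du code <name>".
_ARTICLE = re.compile(r"article\s+(?P<number>[\w-]+)\s+du\s+(?P<code>code\s+.+)", re.IGNORECASE)


def resolve_french_reference(text: str) -> TagQuery | None:
    """Jurisdiction side: parse a citation string into a TagQuery."""
    match = _ARTICLE.fullmatch(text.strip())
    if match is None:
        return None
    code = match["code"].strip()
    code = code[0].upper() + code[1:].lower()  # "code civil" -> "Code civil"
    tags = {"article_number": match["number"], "code": code}
    return TagQuery("fr", "legislation", TagFilterSet.from_tags(tags))


def build_tag_filter_conditions(tag_filters: TagFilterSet) -> tuple[list[str], list[object]]:
    """Store side: SQL fragments plus bound parameters, no jurisdiction knowledge."""
    conditions: list[str] = []
    params: list[object] = []
    for tag_filter in tag_filters.filters:
        if tag_filter.operator is Operator.EQ:
            conditions.append("tags ->> %s = %s")
            params += [tag_filter.key, tag_filter.value]
        elif tag_filter.operator is Operator.IN:
            conditions.append("tags ->> %s = ANY(%s)")
            params += [tag_filter.key, list(tag_filter.value)]
    return conditions, params


query = resolve_french_reference("article 1240 du code civil")
conditions, params = build_tag_filter_conditions(query.tag_filters)
```

The builder only ever sees keys, operators, and values, which is what lets the same store code serve every jurisdiction. Tag values travel as bound parameters and are never interpolated into the SQL text.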

MCP tools: 5

The MCP server exposes five tools: search, get, browse, guidelines, and quality_check. Each tool is jurisdiction-agnostic. The jurisdiction plugins provide tag schemas, formatters, and reference resolvers, not tools.

Language boundary

  • Code (protocols, classes, functions, variables, docstrings): English.
  • Content (concept names, article text, court names, legal vocabulary): jurisdiction language.