Architecture
What Dura Lex is
An open-source platform for international legal data. The code is released under the MIT license and the data under ODbL.
It structures public legal data (legislation, case law, company registries, administrative guidance) into a jurisdiction-agnostic corpus, queryable by AI agents via MCP and by humans via a web portal.
OS metaphor
| OS concept | Dura Lex equivalent |
|---|---|
| Kernel | duralex — protocols, data models, unified schema |
| Drivers | Country packages (duralex-fr, duralex-eu, future duralex-gb, duralex-de) |
| File system | Document IDs, jurisdiction-scoped URIs |
| System calls | search(), get(), browse(), guidelines(), quality_check() |
| Package manager | Compiled SQLite packages for the knowledge graph |
| Applications | duralex-mcp (MCP server), duralex-portal (web), duralex-graph (future) |
Package overview (8 packages, 6 repos)
| Package | Repo | Role | Depends on |
|---|---|---|---|
| duralex | duralex/ | Core protocols, data models, temporal, corpus schema, DocumentStore, SearchEngine, FTS. Specs + docs. | nothing |
| duralex-fr | duralex-jurisdictions/ | French jurisdiction plugin: tag schema, formatters, reference resolver, synonym thesaurus. | duralex |
| duralex-eu | duralex-jurisdictions/ | EU jurisdiction plugin: EUR-Lex, CJEU, ECHR references. | duralex |
| duralex-ingest | duralex-ingest/ | Schema DDL, universal BatchWriter, state management, HTML sanitizer. | duralex |
| duralex-ingest-fr | duralex-ingest/ | French parsers: DILA, Judilibre, BODACC, RNE, BOFiP, CADA, CE, CNIL. | duralex, ingest, fr |
| duralex-ingest-eu | duralex-ingest/ | EU parsers: EUR-Lex (Cellar), CJEU, ECHR. | duralex, ingest |
| duralex-mcp | duralex-mcp/ | MCP server: 5 tools (search, get, browse, guidelines, quality_check). Plugin discovery. Docker infra. | duralex, fr/eu |
| duralex-portal | duralex-portal/ | Web portal for human consultation of the corpus. | duralex |
Future: duralex-graph (knowledge graph compiler, separate repo).
Dependency graph
duralex                              no dependencies
  |
  +----> duralex-fr                  depends on: duralex
  +----> duralex-eu                  depends on: duralex
  |
  +----> duralex-ingest              depends on: duralex
  |        |
  |        +----> duralex-ingest-fr  depends on: duralex, ingest, fr
  |        +----> duralex-ingest-eu  depends on: duralex, ingest
  |
  +----> duralex-mcp                 depends on: duralex, fr, eu (via plugin discovery)
  +----> duralex-portal              depends on: duralex
  +----> duralex-graph               depends on: duralex (future)
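The graph above can be checked mechanically. The following is a minimal sketch, not part of Dura Lex itself: it transcribes the edges into a plain dictionary (expanding the short names "ingest", "fr" and "eu" to full package names, and leaving out the future duralex-graph) and uses the standard-library `graphlib` module to derive one valid build order. The `DEPENDENCIES` and `install_order` names are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Package -> direct dependencies, transcribed from the dependency graph.
# Hypothetical helper data; duralex-graph (future) is omitted.
DEPENDENCIES = {
    "duralex": set(),
    "duralex-fr": {"duralex"},
    "duralex-eu": {"duralex"},
    "duralex-ingest": {"duralex"},
    "duralex-ingest-fr": {"duralex", "duralex-ingest", "duralex-fr"},
    "duralex-ingest-eu": {"duralex", "duralex-ingest"},
    "duralex-mcp": {"duralex", "duralex-fr", "duralex-eu"},
    "duralex-portal": {"duralex"},
}


def install_order(dependencies):
    """Return one order in which every package comes after its dependencies."""
    # TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(dependencies).static_order())


order = install_order(DEPENDENCIES)
print(order)
```

Because the core package has no dependencies, it always comes first; the relative order of the remaining packages may vary between runs as long as each follows its dependencies.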
Two PostgreSQL schemas
corpus.* -- source documents, citations, tag stats, source metadata
graph.* -- concepts, annotations, compiled edges, compilation metadata (future)
The two schemas live in the same database but have different lifecycles:
- corpus: source data, precious (days to re-ingest), stable after ingestion.
- graph: compiled knowledge, reproducible (hours to recompile), write-heavy during compilation.
Benefits: separate VACUUM/ANALYZE, separate backup strategies, DROP SCHEMA graph CASCADE to recompile without touching corpus.
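The recompile-without-touching-corpus benefit can be sketched as a small helper that produces the statements to reset the compiled schema. This is an illustrative assumption, not Dura Lex code: the `graph_reset_statements` name and the guard against dropping `corpus` are hypothetical, and the function only builds SQL strings rather than executing them.

```python
def graph_reset_statements(schema: str = "graph") -> list[str]:
    """Build the PostgreSQL statements that discard and recreate a compiled schema.

    Hypothetical helper: the corpus schema is expensive to re-ingest,
    so this sketch refuses to generate a DROP for it.
    """
    if not schema.isidentifier():
        raise ValueError(f"not a plain SQL identifier: {schema!r}")
    if schema == "corpus":
        raise ValueError("refusing to drop the corpus schema")
    return [
        f"DROP SCHEMA IF EXISTS {schema} CASCADE",
        f"CREATE SCHEMA {schema}",
    ]


statements = graph_reset_statements()
print(";\n".join(statements) + ";")
```

Running both statements before a compilation gives the compiler an empty graph schema while every corpus table stays in place.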
Core composition pattern
Three levels:
- Core protocol (duralex): defines WHAT can be done (DocumentStore protocol, SearchEngine protocol).
- Country implementation (duralex-fr): implements HOW for a specific jurisdiction (formatters, reference resolvers, tag schemas).
- Application (duralex-mcp): composes protocols + implementations at startup.
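The three levels can be illustrated with a minimal sketch using `typing.Protocol`. Only the `DocumentStore` name comes from the description above; the method names, the `Formatter` protocol, the toy in-memory store, the `Application` class and the document ID `"fr/1240"` are all assumptions made for illustration and do not reflect the actual Dura Lex interfaces.

```python
from dataclasses import dataclass
from typing import Protocol


class DocumentStore(Protocol):
    """Core level: WHAT a store can do (method name is an assumption)."""

    def get(self, document_id: str) -> dict: ...


class Formatter(Protocol):
    """What a jurisdiction plugin supplies (hypothetical protocol)."""

    def format_title(self, document: dict) -> str: ...


class FrenchFormatter:
    """Country level: HOW, for one jurisdiction. Illustrative only."""

    def format_title(self, document: dict) -> str:
        return f"Article {document['article_number']} du {document['code']}"


class InMemoryStore:
    """Toy store standing in for a database-backed implementation."""

    def __init__(self, documents: dict[str, dict]) -> None:
        self._documents = documents

    def get(self, document_id: str) -> dict:
        return self._documents[document_id]


@dataclass
class Application:
    """Application level: composes a store with per-jurisdiction formatters."""

    store: DocumentStore
    formatters: dict[str, Formatter]

    def show(self, document_id: str) -> str:
        document = self.store.get(document_id)
        return self.formatters[document["jurisdiction"]].format_title(document)


# Composition happens once, at startup.
app = Application(
    store=InMemoryStore(
        {"fr/1240": {"jurisdiction": "fr", "article_number": "1240", "code": "Code civil"}}
    ),
    formatters={"fr": FrenchFormatter()},
)
title = app.show("fr/1240")
print(title)
```

The application code never imports a French-specific type: it depends only on the protocols, and the jurisdiction-specific behaviour is supplied when the application is assembled.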
Reference resolution uses TagQuery as the universal interface between jurisdiction parsers and the store. This is a core architectural decision — all reference resolution across all jurisdictions flows through TagQuery. There are no typed reference classes per jurisdiction.
- Core defines: TagQuery dataclass (language, kind, tag_filters: TagFilterSet, should_sort_in_force_first, at_date) and TagFilterSet (immutable tuple of TagFilter predicates with operators EQ/IN/NOT_IN/ILIKE/EXISTS/NOT_EXISTS/NORMALIZE). See MCP/Reference resolution.
- Jurisdiction plugin (e.g., duralex-fr): parses "article 1240 du code civil" into TagQuery(language="fr", kind="legislation", tag_filters=TagFilterSet.from_tags({"article_number": "1240", "code": "Code civil"})).
- Store: translates TagQuery to SQL via the shared build_tag_filter_conditions builder (generic, zero jurisdiction knowledge).
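The flow from parsed reference to SQL can be sketched as follows. The class names, field names and the `from_tags` and `build_tag_filter_conditions` names come from the description above; everything else is an assumption: the default values, the `Op` enum, the subset of operators handled, the JSONB column named `tags`, and the `%s` placeholder style. The real implementation may differ in all of these.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Any, Optional


class Op(Enum):
    # Subset only; the source also lists NOT_IN, ILIKE, NOT_EXISTS, NORMALIZE.
    EQ = "eq"
    IN = "in"
    EXISTS = "exists"


@dataclass(frozen=True)
class TagFilter:
    key: str
    op: Op
    value: Any = None


@dataclass(frozen=True)
class TagFilterSet:
    filters: tuple[TagFilter, ...] = ()

    @classmethod
    def from_tags(cls, tags: dict[str, str]) -> "TagFilterSet":
        # Assumption: a plain tag mapping means one equality filter per tag.
        return cls(tuple(TagFilter(k, Op.EQ, v) for k, v in tags.items()))


@dataclass(frozen=True)
class TagQuery:
    language: str
    kind: str
    tag_filters: TagFilterSet
    should_sort_in_force_first: bool = True  # default is an assumption
    at_date: Optional[date] = None           # default is an assumption


def build_tag_filter_conditions(filters: TagFilterSet) -> tuple[list[str], list[Any]]:
    """Translate filters into SQL fragments plus bound parameters.

    Assumes a JSONB column named "tags"; no jurisdiction knowledge is needed.
    """
    conditions: list[str] = []
    params: list[Any] = []
    for f in filters.filters:
        if f.op is Op.EQ:
            conditions.append("tags ->> %s = %s")
            params += [f.key, f.value]
        elif f.op is Op.IN:
            conditions.append("tags ->> %s = ANY(%s)")
            params += [f.key, list(f.value)]
        elif f.op is Op.EXISTS:
            conditions.append("tags ->> %s IS NOT NULL")
            params.append(f.key)
        else:
            raise NotImplementedError(f.op)
    return conditions, params


# What a French plugin might emit for "article 1240 du code civil".
query = TagQuery(
    language="fr",
    kind="legislation",
    tag_filters=TagFilterSet.from_tags({"article_number": "1240", "code": "Code civil"}),
)
conditions, params = build_tag_filter_conditions(query.tag_filters)
print(" AND ".join(conditions), params)
```

Keeping values in a separate parameter list, rather than interpolating them into the SQL text, lets the store pass user-derived tag values to the database driver safely.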
MCP tools: 5
The five tools are search, get, browse, guidelines, and quality_check. Each tool is jurisdiction-agnostic: the jurisdiction plugins provide tag schemas, formatters, and reference resolvers, not tools.
Language boundary
- Code (protocols, classes, functions, variables, docstrings): English.
- Content (concept names, article text, court names, legal vocabulary): jurisdiction language.