Philosophy: Difference between revisions

From Dura Lex Wiki
Enrich with paper content: justice gap, hallucination rates, TA Grenoble, 80+ MCP analysis, credibility by audit, legal boundary (via update-page on MediaWiki MCP Server)
Rewrite: less paper verbatim, add legal privilege, aspirational tone, international scope, not France-centric (via update-page on MediaWiki MCP Server)
 
Line 1: Line 1:
The founding argument of Dura Lex.
== The reality ==


== The access to justice gap ==
Citizens, lawyers, and organizations increasingly use AI to handle legal questions. This trend will not reverse. AI is becoming a primary interface to the law, worldwide.


5.1 billion people worldwide have unmet legal needs (World Justice Project, 2019). The economic cost is estimated at 0.5–3% of GDP (OECD, 2016). Access to law is not a niche problem — it is a structural deficit affecting the majority of the world's population.
The question is not whether people will use AI for law. They already do. The question is whether they will do it '''safely'''.
 
In France, 31% of citizens have given up asserting their rights (Défenseur des droits, 2017–2020). Only 11% consult a lawyer as a first resort; 40% turn to the internet. Between legal aid (capped at €12,957/year) and lawyer fees (€300/hour average), a vast ''missing middle'' has access to neither.
 
The problem extends to organizations. A company operating across jurisdictions pays law firms in each country for often recurring questions — regulatory compliance, supplier disputes, local labor law. The cost is massive, quality is hard to verify, and legal knowledge does not capitalize from one case to the next. Worse: sensitive data — contracts, litigation strategy, due diligence — is sent to third-party providers with no real control over its processing.
 
== The technological dead-ends ==
 
Existing solutions do not bridge this gap:
 
* '''Document search''' retrieves texts but does not reason. The user must already know what to look for.
* '''Judicial prediction''' promises success probabilities — but the Court of Appeal of Rennes concluded in 2017 that one such system provided "no added value". The French Ministry of Justice's DataJust project was abandoned after two years.
* '''Legal chatbots hallucinate.''' Even augmented by RAG (Retrieval-Augmented Generation), hallucination rates remain 17% (Lexis+ AI) to 43% (GPT-4) according to Magesh et al. (2025). Another study measures up to 88% of invented citations (Dahl et al., 2024).
* '''Legislative computation''' (Catala, OpenFisca) formalizes schedules and calculations, but does not qualify a legal situation. It answers "how much?", not "which law applies and with what arguments?".
 
There is a missing layer between raw legal data and reasoning: an open foundation, structured for the machine, useful to citizens and organizations alike.


== The problem: black boxes ==
== The problem ==


Citizens and lawyers increasingly use AI to handle legal questions — drafting contracts, researching case law, understanding their rights, preparing arguments. This trend will not reverse. AI is becoming a primary interface to the law.
=== Black boxes ===
 
The question is not whether people will use AI for law. They already do. The question is whether they will do it '''safely'''.


Current legal AI tools are black boxes. The data they rely on is opaque. Their reasoning is not auditable. Their confidentiality commitments are neither provable nor verifiable.
Current legal AI tools are black boxes. The data they rely on is opaque. Their reasoning is not auditable. Their confidentiality commitments are neither provable nor verifiable.
Line 31: Line 14:
''« Les engagements de confidentialité des fournisseurs de solutions d'IA sont ni prouvables ni vérifiables. »''
''« Les engagements de confidentialité des fournisseurs de solutions d'IA sont ni prouvables ni vérifiables. »''


— CNB, ''Guide de la déontologie et de l'intelligence artificielle'', adopted March 13, 2026
French National Bar Council (CNB), ''Guide de la déontologie et de l'intelligence artificielle'', March 2026
</blockquote>
</blockquote>


The French National Bar Council (CNB) states that sharing client data with external AI systems may breach professional secrecy obligations (''secret professionnel''). No current commercial legal AI offers full auditability of its data sources, processing pipeline, or reasoning chain.
=== No legal privilege ===
 
Attorney-client privilege does not extend to AI conversations. When a lawyer, a company, or an individual consults a cloud AI about a legal matter, those conversations are stored on someone else's servers. They can be:


The risk is not theoretical. The Administrative Court of Grenoble rendered in December 2025 the first French decisions sanctioning AI-generated legal filings, described as "anything but legally framed". Eight decisions followed in weeks, including the first targeting a lawyer (TJ Périgueux, December 2025). The CNB adopted its deontological guide on generative AI in direct response.
* '''Seized''' in court proceedings or by regulators
* '''Subpoenaed''' via domestic or foreign discovery (Cloud Act, FISA)
* '''Exposed''' in a data breach
* '''Used''' for profiling or training, depending on the provider's terms


Disciplinary sanctions, civil liability, and criminal exposure under articles 226-13 and 226-14 of the French Penal Code.
There is no privilege, no secrecy, no control. Every cloud-hosted legal conversation is a liability.


<blockquote>
=== Hallucination ===
'''References:''' CNB, ''Guide de la déontologie et de l'intelligence artificielle'', March 2026. CNB, ''Guide pratique d'utilisation des systèmes d'IAG'', September 2024. CNB, ''Grille de lecture — Intelligence artificielle'', June 2025. TA Grenoble, 3 December 2025, n°2510860. TJ Périgueux, 18 December 2025, n°23/00452.
 
</blockquote>
LLMs hallucinate on legal content. Even augmented by RAG, hallucination rates remain 17% (Lexis+ AI) to 43% (GPT-4) according to recent empirical studies. Up to 88% of legal citations can be invented. Courts have already started sanctioning AI-generated legal filings.


== The mission ==
== The mission ==


Dura Lex does not aim to prevent AI usage in law — it aims to make it '''safe'''.
Dura Lex does not aim to prevent AI usage in law — it aims to make it '''safe'''. Our goal:
 
Four pillars:


; Safety
; Safety
: Strict guidelines, quality checks, content quality levels on every document. The system never hides uncertainty — it expresses it. Every document carries its reliability level. Every gap in coverage is flagged. A <code>quality_check</code> tool lets the AI self-audit its own response against the corpus.
: Strict guidelines, quality checks, content quality tracking on every document. The system should never hide uncertainty — it should express it. Gaps in coverage should be flagged, not concealed.


; Transparency
; Transparency
: Everything is traceable and auditable. Every document has a provenance. Every enrichment is tagged with its method and confidence level. Every reasoning path can be verified against the source. <code>content_quality</code> shows document reliability. <code>needs_review</code> flags anomalies. <code>translation_quality</code> distinguishes official from machine translations.
: Everything traceable and auditable, from source data to final answer. Every document carries its provenance. Every enrichment carries its method and confidence. This is where we are heading — not every link in the chain is fully auditable today, but the architecture is designed for it and we are building toward it.


; Sovereignty
; Sovereignty
: The entire stack can run on-premise, on sovereign European infrastructure, or fully air-gapped. No dependency on foreign cloud providers. No data leaves without explicit choice. The law comes to your data — not your data to someone else's cloud.
: The entire stack deployable on-premise, on sovereign infrastructure, or fully air-gapped. The law comes to the user's data — not the user's data to someone else's cloud. No dependency on foreign providers.


; Professional secrecy
; Professional secrecy
: Conversations, queries, and research stay under the user's control. Multiple privacy modes from standard to air-gapped. Designed for the requirements of ''secret professionnel'' as defined by the CNB.
: Conversations, queries, and research under the user's control. The architecture is designed so that sensitive data never needs to leave the user's perimeter.
 
== The answer: an open operating system for legal data ==
 
Dura Lex is architected as an '''operating system for law''':


== The answer: digital commons ==
* A '''jurisdiction-agnostic kernel''' — protocols, data types, URI schema, independent of any country
* '''Jurisdiction drivers''' — one plugin per legal system (France and EU today, designed for any country)
* A '''robust ingestion pipeline''' — structured, versioned, reproducible
* '''Services''' — MCP server for AI agents, web portal for humans, full-text search with per-language stemming


Dura Lex is the opposite of a black box.
France is the first implementation, not the scope. The schema follows the OpenStreetMap model: a single <code>documents</code> table with JSONB tags. Six universal structural kinds (<code>legislation</code>, <code>decision</code>, <code>record</code>, <code>notice</code>, <code>section</code>, <code>chunk</code>) cover every document type we have encountered across 25+ jurisdictions tested. Legal categories live in tags, not in the schema. Adding a jurisdiction requires zero schema migration.


=== Open source, open data ===
=== Open source, open data ===
Line 72: Line 65:
! Component !! License !! Rationale
! Component !! License !! Rationale
|-
|-
| Software (all packages) || '''MIT''' || Maximum adoption — anyone can use, fork, embed, commercialize without restriction
| Software || '''MIT''' || Anyone can use, fork, embed, commercialize without restriction
|-
|-
| Enriched data (corpus, edges, annotations) || '''ODbL''' || Share-alike for data — improvements flow back to the commons
| Enriched data || '''ODbL''' || Share-alike — improvements flow back to the commons, no one can close the data
|-
|-
| Raw source data || Per-source (Licence Ouverte 2.0, CC0) || Government open data — already public
| Raw source data || Per-source (Licence Ouverte, CC0, etc.) || Government open data
|}
|}


This is the OpenStreetMap model: permissive code, copyleft data. The ecosystem grows because everyone can build on it. The data stays open because no one can close it.
The OpenStreetMap model: permissive code, copyleft data.


=== Credibility by audit, not by reputation ===
=== Credibility by audit, not by reputation ===


Traditional legal publishing relies on curation: editorial committees select, rank, interpret. This work has immense value but it implies a filter. A minority interpretation, however legally founded, may not be retained. An emerging jurisprudential trend may fly under the radar.
Traditional legal publishing relies on editorial curation — selection, ranking, interpretation. This work has immense value, but it implies a filter.
 
Dura Lex discards nothing. 3.4 million decisions are there, with their contradictions, their tensions, their minority positions. The system does not make editorial judgments about what deserves to be seen — it structures everything, and lets formal reasoning surface what is relevant for a given situation.
 
Authority does not come from a name on the cover — it comes from '''traceability''': every assertion points to its source, every reasoning is reproducible, every conclusion is auditable. The data is the proof.
 
=== Auditability ===
 
Every link in the chain is visible and verifiable:
 
* Every document carries its '''content quality''' level — from raw OCR to jurist-reviewed
* Every edge (cross-reference, amendment, citation) carries its '''provenance'''
* Every translation is tagged with its '''method''' — official, machine, human-reviewed
* Safety '''guidelines''' are loaded before every research session
* The AI can run a '''quality check''' against the corpus after answering


This traceability makes the system natively compliant with the EU AI Act (mandatory traceability, Article 53) and Article 33 of the French law of March 23, 2019 (sourced arguments, not judicial predictions).
Our approach: discard nothing, structure everything, and make every assertion traceable to its source. Authority comes from '''traceability''', not from a name on the cover.


=== Doubt is always expressed ===
=== Doubt is always expressed ===


The system never pretends to certainty it does not have. Missing data, low-quality OCR, incomplete temporal coverage, untested jurisdictions — all are surfaced, never hidden.
The system should never pretend to certainty it does not have. Missing data, low-quality sources, incomplete coverage, untested jurisdictions — all should be surfaced, never hidden.
 
When a tool tells you "here is the answer" without showing you where it looked, what it found, and what it might have missed — that is a black box.
 
When every step is inspectable, every limitation is stated, and every source is cited — that is a digital common.
 
=== Unique positioning ===


We have identified and analyzed over 80 legal MCP servers across 40+ jurisdictions. The majority are API relays: they forward queries to Légifrance, CourtListener, or EUR-Lex with minimal tool descriptions and no behavioral framing.
A tool that tells you "here is the answer" without showing where it looked, what it found, and what it might have missed — that is a black box.


Dura Lex is the only project with a mandatory <code>safety_guidelines</code> tool — a call the model must make before any research, injecting rules of conduct and jurisdictional specifics. No other project has an equivalent. We are also the only project with a quality feedback mechanism allowing the model to report issues in the data.
A tool where every step is inspectable, every limitation is stated, and every source is cited — that is a digital common.


=== Legal boundary ===
=== Unique in the landscape ===


Dura Lex provides '''documentary legal information''', which is explicitly permitted by Article 66-1 of the French law of December 31, 1971. It does not provide personalized legal advice. This boundary is built into the architecture itself: the server provides sources and safety rules, not conclusions.
We have analyzed 80+ legal MCP servers across 40+ jurisdictions. The vast majority are simple API relays with no behavioral framing. Dura Lex is the only project with mandatory safety guidelines injected before every research session, and the only one with a quality feedback mechanism allowing the AI to report issues in the data.

Latest revision as of 01:41, 23 April 2026

The reality

Citizens, lawyers, and organizations increasingly use AI to handle legal questions. This trend will not reverse. AI is becoming a primary interface to the law, worldwide.

The question is not whether people will use AI for law. They already do. The question is whether they will do it safely.

The problem

Black boxes

Current legal AI tools are black boxes. The data they rely on is opaque. Their reasoning is not auditable. Their confidentiality commitments are neither provable nor verifiable.

« Les engagements de confidentialité des fournisseurs de solutions d'IA sont ni prouvables ni vérifiables. » ("The confidentiality commitments of AI solution providers are neither provable nor verifiable.")

— French National Bar Council (CNB), Guide de la déontologie et de l'intelligence artificielle, March 2026

No legal privilege

Attorney-client privilege does not extend to AI conversations. When a lawyer, a company, or an individual consults a cloud AI about a legal matter, those conversations are stored on someone else's servers. They can be:

  • Seized in court proceedings or by regulators
  • Subpoenaed via domestic or foreign discovery (Cloud Act, FISA)
  • Exposed in a data breach
  • Used for profiling or training, depending on the provider's terms

There is no privilege, no secrecy, no control. Every cloud-hosted legal conversation is a liability.

Hallucination

LLMs hallucinate on legal content. Even when augmented by Retrieval-Augmented Generation (RAG), hallucination rates range from 17% (Lexis+ AI) to 43% (GPT-4) according to Magesh et al. (2025), and another study measures up to 88% of invented citations (Dahl et al., 2024). Courts have already started sanctioning AI-generated legal filings.

The mission

Dura Lex does not aim to prevent AI usage in law — it aims to make it safe. Our goal:

Safety
Strict guidelines, quality checks, content quality tracking on every document. The system should never hide uncertainty — it should express it. Gaps in coverage should be flagged, not concealed.
Transparency
Everything traceable and auditable, from source data to final answer. Every document carries its provenance. Every enrichment carries its method and confidence. This is where we are heading — not every link in the chain is fully auditable today, but the architecture is designed for it and we are building toward it.
Sovereignty
The entire stack deployable on-premise, on sovereign infrastructure, or fully air-gapped. The law comes to the user's data — not the user's data to someone else's cloud. No dependency on foreign providers.
Professional secrecy
Conversations, queries, and research under the user's control. The architecture is designed so that sensitive data never needs to leave the user's perimeter.
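
As a minimal sketch of what "every enrichment carries its method and confidence" could look like, the Python below models a single enrichment record and flags low-confidence ones for review. The class `Enrichment`, the helper `needs_review`, the example URI, and the 0.8 threshold are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrichment:
    """One annotation attached to a document (hypothetical shape)."""
    document_uri: str   # which document the enrichment belongs to
    source: str         # provenance: where the underlying data came from
    method: str         # how it was produced, e.g. "regex", "llm", "jurist"
    confidence: float   # 0.0 to 1.0, as stated by the producing method


def needs_review(e: Enrichment, threshold: float = 0.8) -> bool:
    """Flag enrichments whose stated confidence is below an assumed threshold."""
    return e.confidence < threshold


# Placeholder values, for illustration only.
edge = Enrichment("urn:example:fr:decision:1", "official open data dump", "llm", 0.55)
print(needs_review(edge))
```

Keeping method and confidence on the record itself, rather than in a separate log, means a reader of any single enrichment can see how far to trust it.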

The answer: an open operating system for legal data

Dura Lex is architected as an operating system for law:

  • A jurisdiction-agnostic kernel — protocols, data types, URI schema, independent of any country
  • Jurisdiction drivers — one plugin per legal system (France and EU today, designed for any country)
  • A robust ingestion pipeline — structured, versioned, reproducible
  • Services — MCP server for AI agents, web portal for humans, full-text search with per-language stemming

France is the first implementation, not the scope. The schema follows the OpenStreetMap model: a single documents table with JSONB tags. Six universal structural kinds (legislation, decision, record, notice, section, chunk) cover every document type we have encountered across 25+ jurisdictions tested. Legal categories live in tags, not in the schema. Adding a jurisdiction requires zero schema migration.
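
The "single table, six kinds, everything else in tags" idea can be sketched as follows. This is an in-memory illustration under stated assumptions: the field names `uri` and `tags`, the helper `find`, and the example tag keys are hypothetical, and a plain `dict` stands in for the JSONB column.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(str, Enum):
    """The six structural kinds named in the text."""
    LEGISLATION = "legislation"
    DECISION = "decision"
    RECORD = "record"
    NOTICE = "notice"
    SECTION = "section"
    CHUNK = "chunk"


@dataclass
class Document:
    """One row of the single documents table (field names are assumptions)."""
    uri: str
    kind: Kind
    tags: dict[str, str] = field(default_factory=dict)  # stands in for JSONB


def find(docs: list[Document], **wanted: str) -> list[Document]:
    """Return documents whose tags contain every requested key/value pair."""
    return [d for d in docs if all(d.tags.get(k) == v for k, v in wanted.items())]


docs = [
    Document("urn:example:fr:1", Kind.DECISION, {"jurisdiction": "fr", "court": "example-court"}),
    Document("urn:example:de:1", Kind.DECISION, {"jurisdiction": "de", "category": "labor"}),
]
# A new jurisdiction only introduces new tag values; the Document shape is unchanged.
print([d.uri for d in find(docs, jurisdiction="de")])
```

Because legal categories are ordinary tag values, the second document can carry a `category` tag that the first one lacks without any change to the record type.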

Open source, open data

{|
! Component !! License !! Rationale
|-
| Software || MIT || Anyone can use, fork, embed, commercialize without restriction
|-
| Enriched data || ODbL || Share-alike: improvements flow back to the commons, no one can close the data
|-
| Raw source data || Per-source (Licence Ouverte, CC0, etc.) || Government open data
|}

The OpenStreetMap model: permissive code, copyleft data.

Credibility by audit, not by reputation

Traditional legal publishing relies on editorial curation — selection, ranking, interpretation. This work has immense value, but it implies a filter.

Our approach: discard nothing, structure everything, and make every assertion traceable to its source. Authority comes from traceability, not from a name on the cover.

Doubt is always expressed

The system should never pretend to certainty it does not have. Missing data, low-quality sources, incomplete coverage, untested jurisdictions — all should be surfaced, never hidden.

A tool that tells you "here is the answer" without showing where it looked, what it found, and what it might have missed — that is a black box.

A tool where every step is inspectable, every limitation is stated, and every source is cited — that is a digital common.
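
One way to picture the difference is a result type that cannot be rendered without its sources and limitations. The sketch below is a hypothetical illustration: the `Answer` class, the `render` function, and the message wording are assumptions, not part of Dura Lex.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """A research result that carries its own caveats (hypothetical shape)."""
    text: str
    sources: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)


def render(answer: Answer) -> str:
    """Format an answer so sources and limitations are always visible."""
    if not answer.sources:
        # Never present an unsourced statement as a finding.
        return "No source found in the corpus; no answer can be given."
    lines = [answer.text, "Sources:"]
    lines += [f"  - {s}" for s in answer.sources]
    lines.append("Limitations:")
    lines += [f"  - {item}" for item in answer.limitations] or ["  - none recorded"]
    return "\n".join(lines)


# Placeholder content, for illustration only.
print(render(Answer(
    "Placeholder summary of the cited provision.",
    sources=["urn:example:fr:legislation:1"],
    limitations=["temporal coverage of this source is incomplete"],
)))
```

The design choice is that the limitations section is printed even when empty, so its absence can never be mistaken for certainty.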

Unique in the landscape

We have analyzed 80+ legal MCP servers across 40+ jurisdictions. The vast majority are simple API relays with no behavioral framing. Dura Lex is the only project with mandatory safety guidelines injected before every research session, and the only one with a quality feedback mechanism allowing the AI to report issues in the data.
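
The "mandatory guidelines before research" behaviour can be sketched as a session that refuses to search until the guidelines call has been made. This is a toy model under stated assumptions: `ResearchSession`, `GuidelinesNotLoaded`, `search`, `report_issue`, and the rule texts are hypothetical, and the code does not use or describe the real MCP server interface.

```python
from typing import Optional


class GuidelinesNotLoaded(RuntimeError):
    """Raised when research is attempted before the guidelines call."""


class ResearchSession:
    """Toy session gated on a guidelines call (hypothetical interface)."""

    def __init__(self) -> None:
        self._guidelines: Optional[list[str]] = None
        self.reports: list[str] = []

    def safety_guidelines(self, jurisdiction: str) -> list[str]:
        """Load rules of conduct; the rule texts here are placeholders."""
        self._guidelines = [
            "Cite a source for every assertion.",
            f"State coverage gaps for jurisdiction '{jurisdiction}'.",
        ]
        return self._guidelines

    def search(self, query: str) -> str:
        """Stand-in for a corpus search; refuses to run without guidelines."""
        if self._guidelines is None:
            raise GuidelinesNotLoaded("call safety_guidelines() before search()")
        return f"results for: {query}"

    def report_issue(self, document_uri: str, problem: str) -> None:
        """Quality feedback: record a data issue noticed during research."""
        self.reports.append(f"{document_uri}: {problem}")


session = ResearchSession()
session.safety_guidelines("fr")
print(session.search("notice period"))
```

Enforcing the order in the session object, rather than relying on the model to remember it, is what makes the guidelines mandatory in this sketch.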