Philosophy

From Dura Lex Wiki
 
The founding argument of Dura Lex.


Latest revision as of 01:41, 23 April 2026

The reality

Citizens, lawyers, and organizations increasingly use AI to handle legal questions. This trend will not reverse. AI is becoming a primary interface to the law, worldwide.

The question is not whether people will use AI for law. They already do. The question is whether they will do it safely.

The problem

Black boxes

Current legal AI tools are black boxes. The data they rely on is opaque. Their reasoning is not auditable. Their confidentiality commitments are neither provable nor verifiable.

« The confidentiality commitments of AI solution providers are neither provable nor verifiable. »

— French National Bar Council (CNB), Guide de la déontologie et de l'intelligence artificielle, March 2026

No legal privilege

Attorney-client privilege does not extend to AI conversations. When a lawyer, a company, or an individual consults a cloud AI about a legal matter, those conversations are stored on someone else's servers. They can be:

  • Seized in court proceedings or by regulators
  • Subpoenaed via domestic or foreign discovery (CLOUD Act, FISA)
  • Exposed in a data breach
  • Used for profiling or training, depending on the provider's terms

There is no privilege, no secrecy, no control. Every cloud-hosted legal conversation is a liability.

Hallucination

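LLMs hallucinate on legal content. Recent empirical studies report the following rates:

  • Legal research tools built on retrieval-augmented generation (RAG) hallucinate on about 17% of queries (Lexis+ AI).
  • A general-purpose model such as GPT-4 reaches 43%.
  • On some types of verifiable questions about case law, measured rates run as high as 88%.

Courts have already sanctioned lawyers for filings containing AI-fabricated citations.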

The mission

Dura Lex does not aim to prevent the use of AI in law; it aims to make it safe. It rests on four pillars:

Safety
Strict guidelines, quality checks, content quality tracking on every document. The system should never hide uncertainty — it should express it. Gaps in coverage should be flagged, not concealed.
Transparency
Everything traceable and auditable, from source data to final answer. Every document carries its provenance. Every enrichment carries its method and confidence. This is where we are heading — not every link in the chain is fully auditable today, but the architecture is designed for it and we are building toward it.
Sovereignty
The entire stack is deployable on-premises, on sovereign infrastructure, or fully air-gapped. The law comes to the user's data, not the user's data to someone else's cloud. No dependency on foreign providers.
Professional secrecy
Conversations, queries, and research under the user's control. The architecture is designed so that sensitive data never needs to leave the user's perimeter.
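
The Transparency pillar says every enrichment carries its method and confidence. That idea can be sketched as a small data type. This is a minimal illustration, not Dura Lex's implementation. The names `Enrichment`, `Method` and `low_confidence`, the three method values, the document IDs and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    """How an enrichment was produced (hypothetical vocabulary)."""
    OFFICIAL = "official"
    MACHINE = "machine"
    HUMAN_REVIEWED = "human_reviewed"


@dataclass(frozen=True)
class Enrichment:
    """A derived fact (e.g. a cross-reference edge) plus its provenance."""
    value: str         # the derived fact, e.g. "cites:doc-42"
    source_id: str     # document the fact was extracted from
    method: Method     # how it was produced
    confidence: float  # 0.0 (guess) to 1.0 (certain)

    def __post_init__(self) -> None:
        # Reject out-of-range confidence instead of silently storing it.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def low_confidence(enrichments: list[Enrichment], threshold: float = 0.8) -> list[Enrichment]:
    """Enrichments to flag to the user rather than present as settled."""
    return [
        e for e in enrichments
        if e.confidence < threshold or e.method is Method.MACHINE
    ]


edges = [
    Enrichment("cites:doc-42", "doc-7", Method.OFFICIAL, 1.0),
    Enrichment("amends:doc-9", "doc-7", Method.MACHINE, 0.95),
    Enrichment("cites:doc-13", "doc-8", Method.HUMAN_REVIEWED, 0.6),
]
flagged = low_confidence(edges)
print([e.value for e in flagged])  # ['amends:doc-9', 'cites:doc-13']
```

Treating every machine-produced edge as "flag by default" is one possible policy among several. A real system might instead rank by confidence alone.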

The answer: an open operating system for legal data

Dura Lex is architected as an operating system for law:

  • A jurisdiction-agnostic kernel — protocols, data types, URI schema, independent of any country
  • Jurisdiction drivers — one plugin per legal system (France and EU today, designed for any country)
  • A robust ingestion pipeline — structured, versioned, reproducible
  • Services — MCP server for AI agents, web portal for humans, full-text search with per-language stemming
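
The split between a jurisdiction-agnostic kernel and per-country drivers can be sketched as a plugin contract. This sketch assumes the names `JurisdictionDriver`, `fetch`, `normalize`, `register` and `ingest`, plus the toy driver and its "xx" code; none of them come from the Dura Lex codebase. The point is that the kernel-side `ingest` never changes when a country is added.

```python
from typing import Iterable, Protocol


class JurisdictionDriver(Protocol):
    """Hypothetical plugin contract: the kernel knows nothing about any one country."""
    code: str  # e.g. "fr", "eu"

    def fetch(self) -> Iterable[dict]:
        """Yield raw source records from an official open-data feed."""
        ...

    def normalize(self, raw: dict) -> dict:
        """Map a raw record onto the kernel's universal document shape."""
        ...


_DRIVERS: dict[str, JurisdictionDriver] = {}


def register(driver: JurisdictionDriver) -> None:
    if driver.code in _DRIVERS:
        raise ValueError(f"driver already registered: {driver.code}")
    _DRIVERS[driver.code] = driver


def ingest(code: str) -> list[dict]:
    """Kernel-side ingestion: identical for every jurisdiction."""
    driver = _DRIVERS[code]
    return [driver.normalize(raw) for raw in driver.fetch()]


class ToyDriver:
    """Stand-in driver with inline data; a real one would read a government feed."""
    code = "xx"

    def fetch(self):
        yield {"title": "Example Act", "type": "act"}

    def normalize(self, raw):
        return {"kind": "legislation", "tags": {"jurisdiction": self.code, "title": raw["title"]}}


register(ToyDriver())
print(ingest("xx"))  # [{'kind': 'legislation', 'tags': {'jurisdiction': 'xx', 'title': 'Example Act'}}]
```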

France is the first implementation, not the scope. The schema follows the OpenStreetMap model: a single documents table with JSONB tags. Six universal structural kinds (legislation, decision, record, notice, section, chunk) cover every document type encountered so far across the 25+ jurisdictions tested. Legal categories live in tags, not in the schema, so adding a jurisdiction requires no schema migration.
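
The "one table, structural kinds plus free-form tags" model can be illustrated in memory. This is a minimal sketch, assuming a plain Python dict in place of the PostgreSQL JSONB column. The `find` helper only approximates, in spirit, what a JSONB containment (`@>`) filter would do. The document IDs and tag keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# The six structural kinds; every legal category lives in tags instead.
KINDS = frozenset({"legislation", "decision", "record", "notice", "section", "chunk"})


@dataclass
class Document:
    doc_id: str
    kind: str
    tags: dict[str, str] = field(default_factory=dict)  # plays the role of the JSONB column

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown structural kind: {self.kind}")


def find(docs: list[Document], kind: Optional[str] = None, **tags: str) -> list[Document]:
    """Match on kind plus tag equality, similar in spirit to a JSONB containment filter."""
    return [
        d for d in docs
        if (kind is None or d.kind == kind)
        and all(d.tags.get(k) == v for k, v in tags.items())
    ]


docs = [
    Document("fr/code-civil/art-1240", "legislation", {"jurisdiction": "fr", "category": "code_article"}),
    Document("fr/cass/2024-001", "decision", {"jurisdiction": "fr", "court": "cour_de_cassation"}),
    # A new jurisdiction brings its own categories as tag values; the table is unchanged.
    Document("de/bgb/823", "legislation", {"jurisdiction": "de", "category": "gesetz_paragraph"}),
]

print([d.doc_id for d in find(docs, kind="legislation")])  # ['fr/code-civil/art-1240', 'de/bgb/823']
print([d.doc_id for d in find(docs, jurisdiction="de")])   # ['de/bgb/823']
```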

Open source, open data

Component | License | Rationale
Software | MIT | Anyone can use, fork, embed, and commercialize it without restriction
Enriched data | ODbL | Share-alike: improvements flow back to the commons, and no one can close the data
Raw source data | Per-source (Licence Ouverte, CC0, etc.) | Government open data

The OpenStreetMap model: permissive code, copyleft data.

Credibility by audit, not by reputation

Traditional legal publishing relies on editorial curation — selection, ranking, interpretation. This work has immense value, but it implies a filter.

Our approach: discard nothing, structure everything, and make every assertion traceable to its source. Authority comes from traceability, not from a name on the cover.
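
"Every assertion traceable to its source" suggests a mechanical check: each statement in an answer must cite at least one document, and every cited document must exist in the corpus. The sketch below is a hypothetical illustration of that check. `Assertion`, `untraceable` and the corpus IDs are assumptions, not the project's actual quality-check tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    text: str
    source_ids: tuple[str, ...]  # documents the statement relies on


def untraceable(assertions: list[Assertion], corpus_ids: set[str]) -> list[tuple[str, list[str]]]:
    """Return (text, missing_ids) for every assertion that cannot be traced to the corpus."""
    problems = []
    for a in assertions:
        missing = [s for s in a.source_ids if s not in corpus_ids]
        # No source at all is as untraceable as a source that does not exist.
        if not a.source_ids or missing:
            problems.append((a.text, missing))
    return problems


corpus = {"fr/code-civil/art-1240", "fr/code-civil/art-1241"}
answer = [
    Assertion("Claim grounded in article 1240.", ("fr/code-civil/art-1240",)),
    Assertion("Claim citing a ruling absent from the corpus.", ("fr/cass/0000-000",)),
    Assertion("Claim with no source at all.", ()),
]
for text, missing in untraceable(answer, corpus):
    print(text, missing)
```

This catches invented or missing citations, but not a real citation used to support a claim it does not make. That would need a comparison against the cited text itself.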

Doubt is always expressed

The system should never pretend to certainty it does not have. Missing data, low-quality sources, incomplete coverage, untested jurisdictions — all should be surfaced, never hidden.

A tool that tells you "here is the answer" without showing where it looked, what it found, and what it might have missed — that is a black box.

A tool where every step is inspectable, every limitation is stated, and every source is cited — that is a digital common.
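
Surfacing missing data, low-quality sources and uncovered jurisdictions can be as simple as collecting limitations and attaching them to the answer. This is a minimal sketch, assuming coverage is known per jurisdiction and per year and low-quality documents are already identified. The function names, parameters and message wording are hypothetical.

```python
from typing import Iterable


def caveats(
    jurisdiction: str,
    covered_jurisdictions: set[str],
    requested_years: Iterable[int],
    covered_years: Iterable[int],
    low_quality_ids: Iterable[str],
) -> list[str]:
    """Collect every known limitation instead of silently dropping it."""
    notes = []
    if jurisdiction not in covered_jurisdictions:
        notes.append(f"Jurisdiction {jurisdiction!r} is not covered; nothing was searched.")
    missing = sorted(set(requested_years) - set(covered_years))
    if missing:
        notes.append(f"No data for years: {missing}.")
    for doc_id in low_quality_ids:
        notes.append(f"{doc_id} comes from low-quality OCR; check the official text.")
    return notes


def render(answer: str, notes: list[str]) -> str:
    """Append any limitations to the answer text."""
    if not notes:
        return answer
    return answer + "\n\nLimitations:\n" + "\n".join(f"- {n}" for n in notes)


notes = caveats("fr", {"fr", "eu"}, range(2018, 2023), range(2020, 2026), ["doc-17"])
print(render("Draft answer.", notes))
```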

Unique in the landscape

We have analyzed 80+ legal MCP servers across 40+ jurisdictions. The vast majority are simple API relays with no behavioral framing. Within that landscape, Dura Lex is the only project with mandatory safety guidelines injected before every research session. It is also the only one with a quality feedback mechanism allowing the AI to report issues in the data.
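
The two mechanisms named above are guidelines loaded before any research and a channel for the AI to report data problems. Their ordering constraint can be sketched without any MCP SDK. This is a hypothetical illustration: `ResearchSession`, `start`, `search` and `report_issue` are invented names. The search result is a placeholder, and the real server may enforce the order quite differently.

```python
class ResearchSession:
    """Hypothetical session wrapper: search is refused until guidelines are loaded."""

    def __init__(self, guidelines: str) -> None:
        self._guidelines = guidelines
        self._started = False
        self.reports: list[dict[str, str]] = []

    def start(self) -> str:
        # The returned text stands for what would be injected into the model's context.
        self._started = True
        return self._guidelines

    def search(self, query: str) -> list[str]:
        if not self._started:
            raise RuntimeError("safety guidelines must be loaded before any search")
        return [f"result for {query!r}"]  # placeholder for a real corpus search

    def report_issue(self, doc_id: str, problem: str) -> None:
        """Feedback channel: the AI flags a data problem for later human review."""
        self.reports.append({"doc_id": doc_id, "problem": problem})


session = ResearchSession("Cite every source. State uncertainty. Flag coverage gaps.")
try:
    session.search("liability")
except RuntimeError as err:
    print(err)  # safety guidelines must be loaded before any search
session.start()
print(session.search("liability"))  # ["result for 'liability'"]
session.report_issue("doc-17", "OCR garbled in paragraph 3")
print(session.reports)
```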