Nicolas: Import from duralex/spec/GRAPH.md — faithful conversion to wikitext (via create-page on MediaWiki MCP Server)

2026-04-23T02:12:47Z

= Knowledge Graph (Draft) =

This document is a preliminary sketch. The graph layer will be refined when implementation begins. See the legalscript spec (separate repository) for the full vision.

== Scope ==

The knowledge graph is a separate system built on top of the corpus. It is not part of the corpus schema. It lives in its own PostgreSQL schema (<code>graph.*</code>) and is compiled into self-contained SQLite packages for runtime navigation.

== Relationship to corpus ==

<pre>
corpus.*                      graph.*                        compiled/
documents (source texts)  →   concepts (legal concepts)   →  fr.civil.sqlite
edges (citations)         →   annotations (LLM/human)     →  fr.travail.sqlite
                          →   edges (concept relations)   →  fr.conso.sqlite
                          →   compilations (metadata)
</pre>

* The corpus stores source data (legislation, decisions, guidance). It is precious: re-ingesting it takes days.
* The graph stores derived knowledge (concepts, annotations). It is reproducible: recompiling it takes hours.
* Compiled packages are self-contained SQLite files for runtime use, with no PostgreSQL dependency at query time.
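
As a rough illustration of the last point, the sketch below writes one thematic subtree into a standalone SQLite database using Python's standard <code>sqlite3</code> module. The function name <code>compile_package</code>, the reduced column set, the sample rows, and the rule of selecting concepts by ID prefix are all assumptions made for this example; the actual package layout is not defined yet.

<syntaxhighlight lang="python">
import sqlite3

# Hypothetical stand-in for rows read from graph.concepts in PostgreSQL:
# (id, jurisdiction, parent_id, title). The values are illustrative only.
CONCEPTS = [
    ("fr.civil.contrat", "fr", None, "Contrat"),
    ("fr.civil.contrat.formation", "fr", "fr.civil.contrat", "Formation"),
    ("fr.travail.contrat", "fr", None, "Contrat de travail"),
]


def compile_package(path, prefix, concepts):
    """Copy the concept `prefix` and everything below it into a SQLite file."""
    conn = sqlite3.connect(path)
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE concepts ("
            " id TEXT PRIMARY KEY,"
            " jurisdiction TEXT NOT NULL,"
            " parent_id TEXT,"
            " title TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO concepts VALUES (?, ?, ?, ?)",
            [c for c in concepts if c[0] == prefix or c[0].startswith(prefix + ".")],
        )
    return conn


# ":memory:" keeps the example side-effect free; a real run would pass a
# file path such as "fr.civil.sqlite".
package = compile_package(":memory:", "fr.civil", CONCEPTS)
compiled_ids = [row[0] for row in package.execute("SELECT id FROM concepts ORDER BY id")]
print(compiled_ids)
</syntaxhighlight>

The resulting database can be opened with any SQLite client and queried without a PostgreSQL connection, which is the property the compiled packages are meant to have.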

== Graph schema (draft) ==

<syntaxhighlight lang="sql">
CREATE SCHEMA IF NOT EXISTS graph;

CREATE TABLE graph.concepts (
    id                text PRIMARY KEY,   -- thematic path: fr.civil.contrat.formation.consentement.vice.dol
    jurisdiction      text NOT NULL,
    parent_id         text,               -- parent concept (extends)
    title             text NOT NULL,
    concept_type      text NOT NULL,      -- qualifiable, standard_ouvert, principe_directeur, procedural, bareme
    defining_articles jsonb,              -- references to corpus documents
    metadata          jsonb DEFAULT '{}',
    created_at        timestamptz DEFAULT now()
);

CREATE TABLE graph.annotations (
    id              bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    doc_id          text NOT NULL,        -- references corpus.documents.id
    annotation_type text NOT NULL,        -- structure, defines, illustrates, condition, qualify, framework...
    concept_id      text,                 -- references graph.concepts.id
    version         int NOT NULL,
    parent_version  int,
    method          text NOT NULL,        -- stub, llm, human, jurist
    confidence      text NOT NULL,        -- stub, memory_only, source_checked, cross_validated, disputed
    author          text,
    prompt_hash     text,
    content         jsonb NOT NULL,
    created_at      timestamptz DEFAULT now()
);

-- PostgreSQL does not allow function calls in table-level UNIQUE constraints.
-- Use a unique index instead:
CREATE UNIQUE INDEX idx_ann_unique
    ON graph.annotations (doc_id, annotation_type, coalesce(concept_id, ''), version);

CREATE TABLE graph.edges (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    source_id  text NOT NULL,
    target_id  text NOT NULL,
    kind       text NOT NULL,
    properties jsonb DEFAULT '{}',
    UNIQUE (source_id, target_id, kind)
);

CREATE TABLE graph.compilations (
    id            text PRIMARY KEY,
    version       int NOT NULL,
    compiled_at   timestamptz NOT NULL,
    source_commit text,
    quality       jsonb,
    dependencies  jsonb,
    artifact_path text
);
</syntaxhighlight>
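
The <code>coalesce(concept_id, '')</code> expression in <code>idx_ann_unique</code> matters because a plain unique index treats NULLs as distinct, so two annotations with the same <code>doc_id</code>, <code>annotation_type</code> and <code>version</code> but no concept would both be accepted. The sketch below reproduces the behaviour with Python's <code>sqlite3</code> module, used here only as a runnable stand-in for PostgreSQL (SQLite also supports unique indexes on expressions). The reduced table, the helper <code>try_insert</code> and the sample values are assumptions for illustration.

<syntaxhighlight lang="python">
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reduced copy of graph.annotations: only the columns in the unique key.
    CREATE TABLE annotations (
        id              INTEGER PRIMARY KEY,
        doc_id          TEXT NOT NULL,
        annotation_type TEXT NOT NULL,
        concept_id      TEXT,
        version         INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX idx_ann_unique
        ON annotations (doc_id, annotation_type, coalesce(concept_id, ''), version);
    """
)

INSERT_SQL = (
    "INSERT INTO annotations (doc_id, annotation_type, concept_id, version) "
    "VALUES (?, ?, ?, ?)"
)


def try_insert(row):
    """Return True if the row is accepted, False if the unique index rejects it."""
    try:
        with conn:
            conn.execute(INSERT_SQL, row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(("doc-1", "structure", None, 1)),  # first row for this key
    try_insert(("doc-1", "structure", None, 1)),  # same key, concept_id still NULL
    try_insert(("doc-1", "structure", None, 2)),  # same key, next version
]
print(results)  # [True, False, True]
</syntaxhighlight>

Without the <code>coalesce</code>, the second insert would succeed and the table would hold two indistinguishable version 1 rows.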

== Why separate from corpus ==

The graph has different structural needs:
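* Annotations carry versioning (<code>version</code> + <code>parent_version</code>), <code>confidence</code>, <code>method</code> and <code>prompt_hash</code> as first-class columns
* Concepts have <code>concept_type</code> and <code>defining_articles</code>
* No <code>body</code>/<code>body_search</code>/<code>content_fts</code>/<code>language</code>: annotations are structured JSONB, not searchable text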
* Different write patterns: corpus is append-mostly, graph is recompiled in bulk

Storing annotations in <code>corpus.documents</code> would leave 5 of its 14 columns always NULL. That is not a shared schema; it is a forced fit.
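
To make the versioning columns concrete, here is a minimal Python sketch of how <code>version</code> and <code>parent_version</code> could be read back. It assumes that <code>parent_version</code> points at an earlier version of the same (<code>doc_id</code>, <code>annotation_type</code>, <code>concept_id</code>) key, which this draft does not state explicitly. The helper names and sample rows are hypothetical.

<syntaxhighlight lang="python">
# Hypothetical annotation rows, reduced to the versioning columns plus method.
annotations = [
    {"doc_id": "doc-1", "annotation_type": "defines", "concept_id": "c-1",
     "version": 1, "parent_version": None, "method": "stub"},
    {"doc_id": "doc-1", "annotation_type": "defines", "concept_id": "c-1",
     "version": 2, "parent_version": 1, "method": "llm"},
    {"doc_id": "doc-1", "annotation_type": "defines", "concept_id": "c-1",
     "version": 3, "parent_version": 2, "method": "human"},
    {"doc_id": "doc-2", "annotation_type": "structure", "concept_id": None,
     "version": 1, "parent_version": None, "method": "stub"},
]


def key(annotation):
    """Mirror the unique index: a missing concept_id is treated as ''."""
    return (annotation["doc_id"], annotation["annotation_type"],
            annotation["concept_id"] or "")


def latest(rows):
    """Return the highest-version annotation for each key."""
    best = {}
    for row in rows:
        k = key(row)
        if k not in best or row["version"] > best[k]["version"]:
            best[k] = row
    return best


def lineage(rows, start):
    """Follow parent_version links from `start` back to the first version."""
    by_version = {(key(r), r["version"]): r for r in rows}
    chain = []
    current = start
    while current is not None:
        chain.append((current["version"], current["method"]))
        parent = current["parent_version"]
        current = by_version.get((key(current), parent)) if parent is not None else None
    return chain


head = latest(annotations)[("doc-1", "defines", "c-1")]
history = lineage(annotations, head)
print(history)  # [(3, 'human'), (2, 'llm'), (1, 'stub')]
</syntaxhighlight>

Keeping these fields as plain columns, rather than inside the JSONB <code>content</code>, is what makes this kind of lookup a simple filter or join.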

== Corpus guarantees for the graph ==

The corpus schema guarantees properties the graph depends on:

# '''Stable IDs''' — corpus.documents.id never changes after ingestion. The graph references documents by ID.
# '''Permanent article identity''' — <code>tags.cid</code> groups temporal versions of the same article. The graph needs this to link concepts to articles across renumbering.
# '''Immutable body''' — <code>body</code> does not change after initial ingestion. Future annotation anchoring (character offsets) depends on this stability.
# '''No reverse dependency''' — corpus never references graph. The graph depends on corpus, not the inverse.
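
The second guarantee can be sketched as follows: given a concept that names an article, <code>tags.cid</code> lets the graph recover every temporal version of that article. The example assumes that <code>defining_articles</code> stores <code>cid</code> values, which is only one possible reading of "references to corpus documents". The document IDs, <code>cid</code> values and function names are made up for illustration.

<syntaxhighlight lang="python">
from collections import defaultdict

# Hypothetical corpus rows: two temporal versions of one article, plus another article.
documents = [
    {"id": "art-100-v1", "tags": {"cid": "art-100"}},
    {"id": "art-100-v2", "tags": {"cid": "art-100"}},
    {"id": "art-200-v1", "tags": {"cid": "art-200"}},
]

# Hypothetical concept row; assumes defining_articles holds cid values.
concept = {
    "id": "fr.civil.contrat.formation.consentement.vice.dol",
    "defining_articles": ["art-100"],
}


def versions_by_cid(docs):
    """Group document IDs by their permanent article identity (tags.cid)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["tags"]["cid"]].append(doc["id"])
    return groups


def documents_for_concept(concept_row, docs):
    """Return every document version behind the concept's defining articles."""
    groups = versions_by_cid(docs)
    return [doc_id
            for cid in concept_row["defining_articles"]
            for doc_id in groups.get(cid, [])]


linked = documents_for_concept(concept, documents)
print(linked)  # ['art-100-v1', 'art-100-v2']
</syntaxhighlight>

Because document IDs are stable and the corpus never points back at the graph, a lookup like this can be rebuilt from scratch on every recompilation.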

[[Category:Corpus]]
