<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.dura-lex.org/index.php?action=history&amp;feed=atom&amp;title=Corpus%2FGraph</id>
	<title>Corpus/Graph - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.dura-lex.org/index.php?action=history&amp;feed=atom&amp;title=Corpus%2FGraph"/>
	<link rel="alternate" type="text/html" href="https://wiki.dura-lex.org/index.php?title=Corpus/Graph&amp;action=history"/>
	<updated>2026-04-23T05:37:26Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://wiki.dura-lex.org/index.php?title=Corpus/Graph&amp;diff=53&amp;oldid=prev</id>
		<title>Nicolas: Import from duralex/spec/GRAPH.md — faithful conversion to wikitext (via create-page on MediaWiki MCP Server)</title>
		<link rel="alternate" type="text/html" href="https://wiki.dura-lex.org/index.php?title=Corpus/Graph&amp;diff=53&amp;oldid=prev"/>
		<updated>2026-04-23T02:12:47Z</updated>

		<summary type="html">&lt;p&gt;Import from duralex/spec/GRAPH.md — faithful conversion to wikitext (via create-page on MediaWiki MCP Server)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;= Knowledge Graph (Draft) =&lt;br /&gt;
&lt;br /&gt;
This document is a preliminary sketch. The graph layer will be refined when implementation begins. See the legalscript spec (separate repository) for the full vision.&lt;br /&gt;
&lt;br /&gt;
== Scope ==&lt;br /&gt;
&lt;br /&gt;
The knowledge graph is a separate system built on top of the corpus. It is not part of the corpus schema: it lives in its own PostgreSQL schema (&amp;lt;code&amp;gt;graph.*&amp;lt;/code&amp;gt;) and is compiled into self-contained SQLite packages for runtime navigation.&lt;br /&gt;
&lt;br /&gt;
== Relationship to corpus ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
corpus.*                              graph.*                          compiled/&lt;br /&gt;
  documents (source texts)     →      concepts (legal concepts)     →  fr.civil.sqlite&lt;br /&gt;
  edges (citations)            →      annotations (LLM/human)       →  fr.travail.sqlite&lt;br /&gt;
                               →      edges (concept relations)     →  fr.conso.sqlite&lt;br /&gt;
                               →      compilations (metadata)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The corpus stores source data (legislation, decisions, guidance). It is precious: re-ingesting it takes days.&lt;br /&gt;
* The graph stores derived knowledge (concepts, annotations). It is reproducible: recompiling it takes hours.&lt;br /&gt;
* Compiled packages are self-contained SQLite files for runtime use, with no PostgreSQL dependency at query time.&lt;br /&gt;
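&lt;br /&gt;
The compile step is not specified in this draft. The following is only a minimal sketch of the idea, assuming that a package collects every concept whose thematic path falls under a prefix such as &amp;lt;code&amp;gt;fr.civil&amp;lt;/code&amp;gt;. The function name &amp;lt;code&amp;gt;compile_package&amp;lt;/code&amp;gt;, the SQLite table layout and the sample rows are hypothetical, and plain dictionaries stand in for rows read from &amp;lt;code&amp;gt;graph.concepts&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Made-up sample rows standing in for graph.concepts (assumption, not real data).
SAMPLE_CONCEPTS = [
    {"id": "fr.civil.contrat", "parent_id": None,
     "title": "Contrat", "concept_type": "qualifiable"},
    {"id": "fr.civil.contrat.formation.consentement.vice.dol",
     "parent_id": "fr.civil.contrat",
     "title": "Dol", "concept_type": "qualifiable"},
    {"id": "fr.travail.licenciement", "parent_id": None,
     "title": "Licenciement", "concept_type": "procedural"},
]


def compile_package(concepts, prefix, out_dir):
    """Write the concepts under `prefix` into a standalone SQLite file.

    Hypothetical helper: the real compiler and its output layout are
    not described in the draft.
    """
    out_path = Path(out_dir) / f"{prefix}.sqlite"
    if out_path.exists():
        out_path.unlink()  # rebuild from scratch instead of patching in place

    # Keep the prefix itself and everything below it in the thematic path.
    selected = [
        c for c in concepts
        if c["id"] == prefix or c["id"].startswith(prefix + ".")
    ]

    conn = sqlite3.connect(str(out_path))
    try:
        conn.execute(
            "CREATE TABLE concepts ("
            " id TEXT PRIMARY KEY,"
            " parent_id TEXT,"
            " title TEXT NOT NULL,"
            " concept_type TEXT NOT NULL,"
            " defining_articles TEXT)"  # JSON serialized as text
        )
        conn.executemany(
            "INSERT INTO concepts VALUES (?, ?, ?, ?, ?)",
            [
                (c["id"], c.get("parent_id"), c["title"], c["concept_type"],
                 json.dumps(c.get("defining_articles", [])))
                for c in selected
            ],
        )
        conn.commit()
    finally:
        conn.close()
    return out_path


# Usage: build fr.civil.sqlite in a temporary directory and query it
# with nothing but the sqlite3 module (no PostgreSQL connection).
package = compile_package(SAMPLE_CONCEPTS, "fr.civil", tempfile.mkdtemp())
reader = sqlite3.connect(str(package))
print(reader.execute("SELECT count(*) FROM concepts").fetchone()[0])
reader.close()
```
&lt;br /&gt;
Rebuilding the file from scratch on every run matches the idea that the graph is reproducible and recompiled in bulk. A real compiler would presumably also export annotations and edges and record the run in &amp;lt;code&amp;gt;graph.compilations&amp;lt;/code&amp;gt;.&lt;br /&gt;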
&lt;br /&gt;
== Graph schema (draft) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sql&amp;quot;&amp;gt;&lt;br /&gt;
CREATE SCHEMA IF NOT EXISTS graph;&lt;br /&gt;
&lt;br /&gt;
CREATE TABLE graph.concepts (&lt;br /&gt;
    id                text PRIMARY KEY,    -- thematic path: fr.civil.contrat.formation.consentement.vice.dol&lt;br /&gt;
    jurisdiction      text NOT NULL,&lt;br /&gt;
    parent_id         text,                -- parent concept (extends)&lt;br /&gt;
    title             text NOT NULL,&lt;br /&gt;
    concept_type      text NOT NULL,       -- qualifiable, standard_ouvert, principe_directeur, procedural, bareme&lt;br /&gt;
    defining_articles jsonb,               -- references to corpus documents&lt;br /&gt;
    metadata          jsonb DEFAULT &amp;#039;{}&amp;#039;,&lt;br /&gt;
    created_at        timestamptz DEFAULT now()&lt;br /&gt;
);&lt;br /&gt;
&lt;br /&gt;
CREATE TABLE graph.annotations (&lt;br /&gt;
    id              bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,&lt;br /&gt;
    doc_id          text NOT NULL,         -- references corpus.documents.id&lt;br /&gt;
    annotation_type text NOT NULL,         -- structure, defines, illustrates, condition, qualify, framework...&lt;br /&gt;
    concept_id      text,                  -- references graph.concepts.id&lt;br /&gt;
    version         int NOT NULL,&lt;br /&gt;
    parent_version  int,&lt;br /&gt;
    method          text NOT NULL,         -- stub, llm, human, jurist&lt;br /&gt;
    confidence      text NOT NULL,         -- stub, memory_only, source_checked, cross_validated, disputed&lt;br /&gt;
    author          text,&lt;br /&gt;
    prompt_hash     text,&lt;br /&gt;
    content         jsonb NOT NULL,&lt;br /&gt;
    created_at      timestamptz DEFAULT now()&lt;br /&gt;
);&lt;br /&gt;
&lt;br /&gt;
-- PostgreSQL does not allow function calls in table-level UNIQUE constraints.&lt;br /&gt;
-- Use a unique index instead:&lt;br /&gt;
CREATE UNIQUE INDEX idx_ann_unique&lt;br /&gt;
    ON graph.annotations (doc_id, annotation_type, coalesce(concept_id, &amp;#039;&amp;#039;), version);&lt;br /&gt;
&lt;br /&gt;
CREATE TABLE graph.edges (&lt;br /&gt;
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,&lt;br /&gt;
    source_id   text NOT NULL,&lt;br /&gt;
    target_id   text NOT NULL,&lt;br /&gt;
    kind        text NOT NULL,&lt;br /&gt;
    properties  jsonb DEFAULT &amp;#039;{}&amp;#039;,&lt;br /&gt;
    UNIQUE (source_id, target_id, kind)&lt;br /&gt;
);&lt;br /&gt;
&lt;br /&gt;
CREATE TABLE graph.compilations (&lt;br /&gt;
    id            text PRIMARY KEY,&lt;br /&gt;
    version       int NOT NULL,&lt;br /&gt;
    compiled_at   timestamptz NOT NULL,&lt;br /&gt;
    source_commit text,&lt;br /&gt;
    quality       jsonb,&lt;br /&gt;
    dependencies  jsonb,&lt;br /&gt;
    artifact_path text&lt;br /&gt;
);&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
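&lt;br /&gt;
The draft does not say why &amp;lt;code&amp;gt;idx_ann_unique&amp;lt;/code&amp;gt; wraps &amp;lt;code&amp;gt;concept_id&amp;lt;/code&amp;gt; in &amp;lt;code&amp;gt;coalesce&amp;lt;/code&amp;gt;. A likely reason is that NULL values count as distinct in a unique index, so two annotations with no concept would otherwise never collide. The sketch below reproduces the index on a reduced annotations table in an in-memory SQLite database, which treats NULLs the same way and also supports indexes on expressions. It is an illustration only: the helper names and the &amp;lt;code&amp;gt;doc-1&amp;lt;/code&amp;gt; identifier are hypothetical, and SQLite stands in for PostgreSQL.&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3


def build_demo_db():
    """Create a reduced annotations table with the draft's unique index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotations ("
        " id INTEGER PRIMARY KEY,"
        " doc_id TEXT NOT NULL,"
        " annotation_type TEXT NOT NULL,"
        " concept_id TEXT,"
        " version INTEGER NOT NULL)"
    )
    # Same key as idx_ann_unique: a NULL concept_id is folded to ''
    # so that it takes part in the uniqueness check.
    conn.execute(
        "CREATE UNIQUE INDEX idx_ann_unique ON annotations"
        " (doc_id, annotation_type, coalesce(concept_id, ''), version)"
    )
    return conn


def try_insert(conn, doc_id, annotation_type, concept_id, version):
    """Return True if the row was accepted, False if the index rejected it."""
    try:
        conn.execute(
            "INSERT INTO annotations"
            " (doc_id, annotation_type, concept_id, version)"
            " VALUES (?, ?, ?, ?)",
            (doc_id, annotation_type, concept_id, version),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
first = try_insert(conn, "doc-1", "structure", None, 1)
duplicate = try_insert(conn, "doc-1", "structure", None, 1)
next_version = try_insert(conn, "doc-1", "structure", None, 2)
print(first, duplicate, next_version)
```
&lt;br /&gt;
The second insert is rejected because it repeats the same document, annotation type, empty concept and version. The third is accepted because only the version differs, which is how a revised annotation would coexist with its predecessor.&lt;br /&gt;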
&lt;br /&gt;
== Why separate from corpus ==&lt;br /&gt;
&lt;br /&gt;
The graph has different structural needs:&lt;br /&gt;
* Annotations have versioning (version + parent_version), confidence, method, prompt_hash as first-class columns&lt;br /&gt;
* Concepts have concept_type and defining_articles&lt;br /&gt;
* No body/body_search/content_fts/language — annotations are structured JSONB, not searchable text&lt;br /&gt;
* Different write patterns: corpus is append-mostly, graph is recompiled in bulk&lt;br /&gt;
&lt;br /&gt;
Forcing annotations into corpus.documents would leave 5 of 14 columns always NULL. That would be a forced fit, not a sound design.&lt;br /&gt;
&lt;br /&gt;
== Corpus guarantees for the graph ==&lt;br /&gt;
&lt;br /&gt;
The corpus schema guarantees properties the graph depends on:&lt;br /&gt;
&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Stable IDs&amp;#039;&amp;#039;&amp;#039; — corpus.documents.id never changes after ingestion. The graph references documents by ID.&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Permanent article identity&amp;#039;&amp;#039;&amp;#039; — &amp;lt;code&amp;gt;tags.cid&amp;lt;/code&amp;gt; groups temporal versions of the same article. The graph needs this to link concepts to articles across renumbering.&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Immutable body&amp;#039;&amp;#039;&amp;#039; — &amp;lt;code&amp;gt;body&amp;lt;/code&amp;gt; does not change after initial ingestion. Future annotation anchoring (character offsets) depends on this stability.&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;No reverse dependency&amp;#039;&amp;#039;&amp;#039; — corpus never references graph. The graph depends on corpus, not the inverse.&lt;br /&gt;
&lt;br /&gt;
[[Category:Corpus]]&lt;/div&gt;</summary>
		<author><name>Nicolas</name></author>
	</entry>
</feed>