= Knowledge Graph (Draft) =

This document is a preliminary sketch. The graph layer will be refined when implementation begins. See the legalscript spec (separate repository) for the full vision.

== Scope ==

The knowledge graph is a separate system built on top of the corpus. It is not part of the corpus schema. It lives in its own PostgreSQL schema (<code>graph.*</code>) and compiles into self-contained SQLite packages for runtime navigation.

== Relationship to corpus ==

<pre>
corpus.*                    graph.*                       compiled/
documents (source texts)  | concepts (legal concepts)   | fr.civil.sqlite
edges (citations)         | annotations (LLM/human)     | fr.travail.sqlite
                          | edges (concept relations)   | fr.conso.sqlite
                          | compilations (metadata)
</pre>

* The corpus stores source data (legislation, decisions, guidance). It is precious: re-ingesting it takes days.
* The graph stores derived knowledge (concepts, annotations). It is reproducible: recompiling it takes hours.
* Compiled packages are self-contained SQLite files for runtime use. There is no PostgreSQL dependency at query time.

== Graph schema (draft) ==

<syntaxhighlight lang="sql">
CREATE SCHEMA IF NOT EXISTS graph;

CREATE TABLE graph.concepts (
    id                text PRIMARY KEY,   -- thematic path: fr.civil.contrat.formation.consentement.vice.dol
    jurisdiction      text NOT NULL,
    parent_id         text,               -- parent concept (extends)
    title             text NOT NULL,
    concept_type      text NOT NULL,      -- qualifiable, standard_ouvert, principe_directeur, procedural, bareme
    defining_articles jsonb,              -- references to corpus documents
    metadata          jsonb DEFAULT '{}',
    created_at        timestamptz DEFAULT now()
);

CREATE TABLE graph.annotations (
    id              bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    doc_id          text NOT NULL,        -- references corpus.documents.id
    annotation_type text NOT NULL,        -- structure, defines, illustrates, condition, qualify, framework...
    concept_id      text,                 -- references graph.concepts.id
    version         int NOT NULL,
    parent_version  int,
    method          text NOT NULL,        -- stub, llm, human, jurist
    confidence      text NOT NULL,        -- stub, memory_only, source_checked, cross_validated, disputed
    author          text,
    prompt_hash     text,
    content         jsonb NOT NULL,
    created_at      timestamptz DEFAULT now()
);

-- PostgreSQL does not allow expressions such as coalesce() in table-level
-- UNIQUE constraints. Use a unique index instead, so that rows with a NULL
-- concept_id are still deduplicated:
CREATE UNIQUE INDEX idx_ann_unique
    ON graph.annotations (doc_id, annotation_type, coalesce(concept_id, ''), version);

CREATE TABLE graph.edges (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    source_id  text NOT NULL,
    target_id  text NOT NULL,
    kind       text NOT NULL,
    properties jsonb DEFAULT '{}',
    UNIQUE (source_id, target_id, kind)
);

CREATE TABLE graph.compilations (
    id            text PRIMARY KEY,
    version       int NOT NULL,
    compiled_at   timestamptz NOT NULL,
    source_commit text,
    quality       jsonb,
    dependencies  jsonb,
    artifact_path text
);
</syntaxhighlight>

== Why separate from corpus ==

The graph has different structural needs:

* Annotations carry versioning (<code>version</code> + <code>parent_version</code>), <code>confidence</code>, <code>method</code> and <code>prompt_hash</code> as first-class columns.
* Concepts have <code>concept_type</code> and <code>defining_articles</code>.
* There is no <code>body</code>, <code>body_search</code>, <code>content_fts</code> or <code>language</code>: annotations are structured JSONB, not searchable text.
* The write patterns differ: the corpus is append-mostly, while the graph is recompiled in bulk.

Forcing annotations into <code>corpus.documents</code> would leave 5 of 14 columns always NULL. That is not a good fit; it forces one model onto a table designed for another.

== Corpus guarantees for the graph ==

The corpus schema guarantees properties the graph depends on:

# '''Stable IDs''': <code>corpus.documents.id</code> never changes after ingestion. The graph references documents by ID.
# '''Permanent article identity''': <code>tags.cid</code> groups temporal versions of the same article. The graph needs this to link concepts to articles across renumbering.
# '''Immutable body''': <code>body</code> does not change after initial ingestion. Future annotation anchoring (character offsets) depends on this stability.
# '''No reverse dependency''': the corpus never references the graph. The graph depends on the corpus, not the inverse.

[[Category:Corpus]]
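
== Example: compiling and navigating a package (sketch) ==

The draft says the graph compiles into self-contained SQLite packages that are navigated without PostgreSQL. The following is a minimal sketch of that idea, not the actual compiler. Everything in it is an assumption made for illustration: the function names <code>compile_package</code> and <code>ancestors</code>, the reduced column set of the SQLite tables, the sample concepts and titles, and the edge kinds. A real compile step would read from <code>graph.*</code> in PostgreSQL; in-memory lists stand in for it here so the example runs on its own.

<syntaxhighlight lang="python">
import sqlite3

# Hypothetical sample rows standing in for graph.concepts and graph.edges.
CONCEPTS = [
    {"id": "fr.civil", "parent_id": None,
     "title": "Droit civil", "concept_type": "principe_directeur"},
    {"id": "fr.civil.contrat", "parent_id": "fr.civil",
     "title": "Contrat", "concept_type": "qualifiable"},
    {"id": "fr.civil.contrat.formation", "parent_id": "fr.civil.contrat",
     "title": "Formation du contrat", "concept_type": "qualifiable"},
    {"id": "fr.travail.contrat", "parent_id": None,
     "title": "Contrat de travail", "concept_type": "qualifiable"},
]
EDGES = [  # (source_id, target_id, kind); the kinds are made up
    ("fr.civil.contrat.formation", "fr.civil.contrat", "extends"),
    ("fr.travail.contrat", "fr.civil.contrat", "cites"),
]


def compile_package(concepts, edges, prefix, out_path=":memory:"):
    """Copy the concepts under `prefix`, and the edges between them, into SQLite."""
    selected = [c for c in concepts
                if c["id"] == prefix or c["id"].startswith(prefix + ".")]
    ids = {c["id"] for c in selected}
    db = sqlite3.connect(out_path)
    db.executescript("""
        CREATE TABLE concepts (
            id           TEXT PRIMARY KEY,
            parent_id    TEXT,
            title        TEXT NOT NULL,
            concept_type TEXT NOT NULL
        );
        CREATE TABLE edges (
            source_id TEXT NOT NULL,
            target_id TEXT NOT NULL,
            kind      TEXT NOT NULL,
            UNIQUE (source_id, target_id, kind)
        );
    """)
    db.executemany(
        "INSERT INTO concepts VALUES (:id, :parent_id, :title, :concept_type)",
        selected,
    )
    # Keep only edges whose two endpoints are both inside the package,
    # so the package stays self-contained.
    db.executemany(
        "INSERT INTO edges VALUES (?, ?, ?)",
        [(s, t, k) for (s, t, k) in edges if s in ids and t in ids],
    )
    db.commit()
    return db


def ancestors(db, concept_id):
    """Return the concept and its parents, nearest first, via parent_id links."""
    rows = db.execute("""
        WITH RECURSIVE chain(id, parent_id, depth) AS (
            SELECT id, parent_id, 0 FROM concepts WHERE id = ?
            UNION ALL
            SELECT c.id, c.parent_id, chain.depth + 1
            FROM concepts c JOIN chain ON c.id = chain.parent_id
        )
        SELECT id FROM chain ORDER BY depth
    """, (concept_id,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    package = compile_package(CONCEPTS, EDGES, "fr.civil")
    print(ancestors(package, "fr.civil.contrat.formation"))
</syntaxhighlight>

Dropping edges that cross the package boundary is one possible design choice. The <code>dependencies</code> column of <code>graph.compilations</code> suggests that cross-package links may instead be recorded and resolved at load time, but the draft does not specify this.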