== Schema: corpus.edges ==

<syntaxhighlight lang="sql">
CREATE TABLE corpus.edges (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    source_id   text NOT NULL,
    target_id   text,
    kind        text NOT NULL,
    reference   text,
    properties  jsonb DEFAULT '{}',
    UNIQUE (source_id, target_id, kind)
);

CREATE INDEX idx_edge_source ON corpus.edges (source_id);
CREATE INDEX idx_edge_target ON corpus.edges (target_id) WHERE target_id IS NOT NULL;
CREATE INDEX idx_edge_kind ON corpus.edges (kind);
</syntaxhighlight>

=== Column definitions ===

'''id''' -- Synthetic identity, <code>bigint GENERATED ALWAYS AS IDENTITY</code>. Edges have no natural key.

'''source_id''' -- The document that originates the relationship. Not a foreign key.

'''target_id''' -- The document that receives the relationship. Nullable: an unresolved reference (a citation to a document not yet ingested, or a reference to an external system) has <code>target_id = NULL</code>, and <code>reference</code> holds the raw citation text. Once the target is ingested or resolved, <code>target_id</code> is backfilled.

'''kind''' -- Relationship type. Open vocabulary. Examples: <code>cites</code>, <code>amends</code>, <code>repeals</code>, <code>implements</code>, <code>transposes</code>, <code>overrules</code>, <code>affirms</code>, <code>commences</code>, <code>language_variant</code>. See [[Corpus/Edge types|EDGE-TYPES]] for the full taxonomy.

'''reference''' -- Raw citation text as found in the source document. Preserved even after resolution for traceability. Example: <code>"article L. 121-1 du code de la consommation"</code>.

'''properties''' -- JSONB metadata on the relationship. Standard provenance keys (see ADR: edge pipeline architecture, design-decisions/2026-04-22-edge-pipeline-architecture.md):
* <code>extraction</code>: <code>"publisher_provided"</code> (the source has a structured link) or <code>"id_resolved"</code> (a textual reference resolved by the reference resolver).
* <code>source_type</code>: identifies the source system (<code>legi_lien</code>, <code>cellar_sparql</code>, <code>hudoc_appno</code>, <code>cellar_nim</code>).
* <code>raw_link_type</code>: source-native link type preserved for traceability (e.g., <code>"TRANSPOSITION"</code>, <code>"cdm:resource_legal_amends_resource_legal"</code>).

Type-specific examples:
* <code>{"scope": "partial", "articles": ["1", "2", "3"]}</code> for partial amendments.
* <code>{"commencement_date": "2024-01-01", "territorial_extent": "england_and_wales"}</code> for UK effects.
* <code>{"reservation": "sous reserve de..."}</code> for conditional constitutionality decisions.

=== Quarantine model ===

Edges with unresolved targets are stored with <code>target_id = NULL</code> and the raw citation in <code>reference</code>. They are visible to MCP and queryable for monitoring. Because a standard <code>UNIQUE</code> constraint treats NULLs as distinct, the table-level <code>UNIQUE (source_id, target_id, kind)</code> does not deduplicate these rows. The partial unique index <code>idx_edge_unique_without_target ON (source_id, reference, kind) WHERE target_id IS NULL</code> prevents duplicate quarantine entries.

'''Ownership:''' an edge is owned by the jurisdiction prefix of its <code>source_id</code>. Drop-and-reingest for jurisdiction X deletes the edges where <code>source_id LIKE '{x}.%'</code>. Cross-jurisdiction quarantine edges (e.g., EU NIM → FR target) survive the target jurisdiction's reingest, because they are owned by the source jurisdiction.

'''Reconciliation:''' after each jurisdiction ingest, a post-ingest pass attempts to resolve quarantine edges whose <code>reference</code> matches the freshly ingested jurisdiction. If the reference resolves, <code>target_id</code> is updated; otherwise the edge remains in quarantine.

=== Why no foreign keys ===

At 50M+ edges, FK checks on every insert become a measurable bottleneck. Sources are ingested independently and in parallel: a French decision citing an EU directive may be ingested before the directive itself. FKs would either constrain ingestion order or require two-pass inserts. Orphan cleanup runs periodically (post-ingest batch job), not on every write.

----
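The quarantine lifecycle described above can be sketched in PostgreSQL as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the index DDL is reconstructed from the prose description, the identifiers <code>fr.example_decision</code> and <code>fr.example_article</code> are hypothetical, and resolving by exact match on <code>reference</code> stands in for whatever logic the reference resolver really applies.

<syntaxhighlight lang="sql">
-- Partial unique index named in "Quarantine model".
-- The exact DDL is an assumption reconstructed from the description.
CREATE UNIQUE INDEX idx_edge_unique_without_target
    ON corpus.edges (source_id, reference, kind)
    WHERE target_id IS NULL;

-- Insert an unresolved (quarantine) edge. Re-running the same statement
-- is a no-op thanks to the partial unique index above.
-- 'fr.example_decision' is a hypothetical identifier.
INSERT INTO corpus.edges (source_id, target_id, kind, reference)
VALUES ('fr.example_decision', NULL, 'cites',
        'article L. 121-1 du code de la consommation')
ON CONFLICT (source_id, reference, kind) WHERE target_id IS NULL
DO NOTHING;

-- Monitoring: count quarantine edges per relationship type.
SELECT kind, count(*) AS unresolved
FROM corpus.edges
WHERE target_id IS NULL
GROUP BY kind
ORDER BY unresolved DESC;

-- Reconciliation: backfill target_id once the target is known.
-- Exact matching on reference is a simplification, and
-- 'fr.example_article' is a hypothetical identifier.
-- The update fails if a resolved edge with the same
-- (source_id, target_id, kind) already exists.
UPDATE corpus.edges
SET target_id = 'fr.example_article'
WHERE target_id IS NULL
  AND reference = 'article L. 121-1 du code de la consommation';

-- Ownership: drop-and-reingest for jurisdiction 'fr' removes only the
-- edges whose source_id carries that prefix. Quarantine edges owned by
-- other jurisdictions are untouched.
DELETE FROM corpus.edges
WHERE source_id LIKE 'fr.%';
</syntaxhighlight>

The <code>reference</code> column is left unchanged by the reconciliation update, which is consistent with the column definition stating that the raw citation is preserved after resolution.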