Development/Python

From Dura Lex Wiki
Revision as of 02:04, 23 April 2026 by Nicolas (talk | contribs) (Create Python conventions page from coding-conventions/PYTHON.md (via create-page on MediaWiki MCP Server))

Python Conventions


All duralex-* packages follow these conventions. No exceptions.

Naming


Casing (PEP 8)

{| class="wikitable"
! Element !! Case !! Example
|-
| Functions, methods, variables || snake_case || parse_legislation_article()
|-
| Classes || PascalCase || LegislationArticle
|-
| Constants || UPPER_SNAKE || ALLOWED_TABLE_NAMES
|-
| Enum members || UPPER_SNAKE || Confidence.SOURCE_CHECKED
|-
| Enum string values || lowercase || "source_checked"
|-
| Module filenames || snake_case || connection_pool.py
|}

No abbreviations


The code is written and maintained by AI. The AI does not tire of typing. Full words always.

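{| class="wikitable"
! Wrong !! Right
|-
| ref || reference
|-
| leg || legislation
|-
| dec || decision
|-
| doc || document
|-
| fts || full_text_search
|-
| tbl || table
|-
| q || query
|-
| lim || limit
|-
| flt || filter
|-
| el || element
|-
| ctx || context
|-
| conn || connection
|-
| cfg || configuration
|-
| num || number
|-
| idx || index
|-
| val || value
|}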
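A rule like this can be checked mechanically. The following is a minimal sketch, not an existing duralex tool: the helper name `find_abbreviated_names` and the constant `BANNED_ABBREVIATIONS` are hypothetical, and only a subset of the table is included. It walks a module's syntax tree and flags identifiers whose snake_case parts match a banned abbreviation; PascalCase class names are not split, so they would need separate handling.

```python
import ast

# Hypothetical subset of the table above: banned abbreviation -> full word.
BANNED_ABBREVIATIONS: dict[str, str] = {
    "ref": "reference",
    "doc": "document",
    "tbl": "table",
    "ctx": "context",
    "conn": "connection",
    "cfg": "configuration",
    "idx": "index",
    "val": "value",
}


def find_abbreviated_names(source_code: str) -> list[tuple[str, str]]:
    """Return sorted (identifier, banned_part) pairs found in the source code."""
    findings: set[tuple[str, str]] = set()
    for node in ast.walk(ast.parse(source_code)):
        if isinstance(node, ast.Name):
            identifier = node.id
        elif isinstance(node, ast.arg):
            identifier = node.arg
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            identifier = node.name
        else:
            continue
        # Split snake_case identifiers into their word parts.
        for name_part in identifier.lower().split("_"):
            if name_part in BANNED_ABBREVIATIONS:
                findings.add((identifier, name_part))
    return sorted(findings)
```

Each finding pairs the offending identifier with the abbreviated part, so a reviewer (or an AI author) can look up the full word in the table.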

Qualified names


Single-word names are ambiguous. Always qualify with the domain.

{| class="wikitable"
! Wrong !! Right !! Why
|-
| query || search_query || Could be SQL, HTTP, FTS...
|-
| text || article_text || Could be anything
|-
| content || html_content || What kind?
|-
| result || search_result || Result of what?
|-
| data || decision_data || Meaningless alone
|-
| items || matched_articles || What items?
|-
| response || search_response || From where?
|-
| path || file_path or concept_path || Filesystem? URI?
|}

Booleans read as phrases


A boolean variable or parameter must read as a true/false statement.

<syntaxhighlight lang="python">
# Wrong
active = True
force = False
recursive = True

# Right
is_in_force = True
should_force_refresh = False
is_recursive = True
has_been_verified = False
should_include_repealed = True
</syntaxhighlight>

Classes: named for what they ARE


<syntaxhighlight lang="python">
class LegislationArticle: ...
class CaseLawDecision: ...
class ResolvedReference: ...
class AnnotationEnvelope: ...
class ConceptDefinition: ...
class SearchFilters: ...
class SearchResults: ...
class CompiledPackage: ...
</syntaxhighlight>

Methods: verb + explicit object


<syntaxhighlight lang="python">
def parse_legislation_article(xml_path: Path) -> LegislationArticle: ...
def resolve_legal_reference(raw_text: str) -> list[ResolvedReference]: ...
def search_full_text(search_query: str, filters: SearchFilters) -> SearchResults: ...
def compile_domain_package(domain: str) -> CompiledPackage: ...
def sanitize_html_content(raw_html: str) -> str: ...
def extract_text_content(element: Element, xpath: str) -> str | None: ...
def validate_date_range(date_from: str | None, date_to: str | None) -> None: ...
</syntaxhighlight>

Protocols: named for the capability


<syntaxhighlight lang="python">
class LegislationParser(Protocol): ...
class ReferenceResolver(Protocol): ...
class SearchEngine(Protocol): ...
class VersionSelector(Protocol): ...
class DecisionDownloader(Protocol): ...
</syntaxhighlight>
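A Protocol names a capability, and any class with matching methods satisfies it without inheriting from it. The sketch below illustrates this with `DecisionDownloader`; the method name `download_decision`, its signature, and the `InMemoryDecisionDownloader` class are assumptions made for the example, since the page does not show the protocol bodies.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DecisionDownloader(Protocol):
    """Capability: fetch the raw bytes of a decision (hypothetical method)."""

    def download_decision(self, decision_identifier: str) -> bytes: ...


class InMemoryDecisionDownloader:
    """Satisfies DecisionDownloader structurally; no inheritance involved."""

    def __init__(self, stored_decisions: dict[str, bytes]):
        self.stored_decisions = stored_decisions

    def download_decision(self, decision_identifier: str) -> bytes:
        return self.stored_decisions[decision_identifier]
```

With `runtime_checkable`, `isinstance(downloader, DecisionDownloader)` only checks that the method exists, not its signature, so a static type checker remains the main line of defense.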

Enums


<syntaxhighlight lang="python">
class ConceptType(Enum):
    QUALIFIABLE = "qualifiable"
    OPEN_STANDARD = "open_standard"
    GUIDING_PRINCIPLE = "guiding_principle"
    PROCEDURAL = "procedural"
    SCALE = "scale"


class Confidence(Enum):
    STUB = "stub"
    MEMORY_ONLY = "memory_only"
    SOURCE_CHECKED = "source_checked"
    CROSS_VALIDATED = "cross_validated"


class Outcome(Enum):
    QUALIFIED = "qualified"
    NOT_QUALIFIED = "not_qualified"
    VALIDATED = "validated"
    INVALIDATED = "invalidated"
    PROCEDURAL = "procedural"
    MOOT = "moot"
</syntaxhighlight>
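Keeping each string value equal to the lowercase member name makes round-tripping through storage predictable. A small sketch of that round trip, reusing the `Confidence` enum from above; the helper `parse_confidence_value` is a hypothetical name introduced for the example.

```python
from enum import Enum


class Confidence(Enum):
    STUB = "stub"
    MEMORY_ONLY = "memory_only"
    SOURCE_CHECKED = "source_checked"
    CROSS_VALIDATED = "cross_validated"


def parse_confidence_value(stored_value: str) -> Confidence:
    """Convert a stored string back into a member; raises ValueError if unknown."""
    return Confidence(stored_value)
```

Calling `Confidence(...)` with an unknown string raises `ValueError`, which fits the error-handling rule further down: a bad value fails loudly instead of falling back to a default.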

Architecture patterns


Dependency injection


Dependencies are passed explicitly. No global singletons, no module-level mutable state.

<syntaxhighlight lang="python">
# Wrong
class SearchEngine:
    def __init__(self):
        self._pool = _get_global_pool()
        self._load_cache()


# Right
class FullTextSearchEngine:
    def __init__(self, connection_pool: ConnectionPool):
        self.connection_pool = connection_pool
</syntaxhighlight>
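One practical payoff of explicit injection is that a test can pass in a stand-in dependency. The sketch below assumes a hypothetical `fetch_rows` method on `ConnectionPool` and a hypothetical `InMemoryConnectionPool`; the real pool interface is not shown on this page.

```python
from typing import Protocol


class ConnectionPool(Protocol):
    """Assumed interface; the real pool API is not documented here."""

    def fetch_rows(self, search_query: str) -> list[str]: ...


class FullTextSearchEngine:
    def __init__(self, connection_pool: ConnectionPool):
        # Store the dependency only; no I/O happens here.
        self.connection_pool = connection_pool

    def search_full_text(self, search_query: str) -> list[str]:
        return self.connection_pool.fetch_rows(search_query)


class InMemoryConnectionPool:
    """Test double: filters a fixed list instead of querying a database."""

    def __init__(self, stored_rows: list[str]):
        self.stored_rows = stored_rows

    def fetch_rows(self, search_query: str) -> list[str]:
        return [row for row in self.stored_rows if search_query in row]
```

Because the engine never reaches for a global, swapping the pool requires no patching or monkeypatching.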

No side effects in __init__


Constructors store parameters. They do not open connections, load caches, or perform I/O.

<syntaxhighlight lang="python">
# Wrong
class FrenchReferenceResolver:
    def __init__(self, connection_pool: ConnectionPool):
        self.connection_pool = connection_pool
        self._code_cache = self._load_code_cache()  # I/O in __init__


# Right
class FrenchReferenceResolver:
    def __init__(self, connection_pool: ConnectionPool):
        self.connection_pool = connection_pool
        self._code_cache: dict[str, str] | None = None

    def _ensure_code_cache(self) -> dict[str, str]:
        if self._code_cache is None:
            self._code_cache = self._load_code_cache()
        return self._code_cache
</syntaxhighlight>

Composition over inheritance


Core libraries define Protocol interfaces. Country packages and plugins provide implementations. Applications compose them.

<syntaxhighlight lang="python">
# duralex -- defines the interface
class ReferenceResolver(Protocol):
    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]: ...


class CompositeReferenceResolver:
    """Chains multiple resolvers. First match wins."""

    def __init__(self, resolvers: list[ReferenceResolver]):
        self.resolvers = resolvers

    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]:
        for resolver in self.resolvers:
            if results := resolver.resolve_legal_reference(raw_text):
                return results
        return []


# duralex-fr -- implements for France
class FrenchLegalReferenceResolver:
    """French legal references: articles, lois, pourvois, ECLI."""
    ...


# Application -- composes at startup
resolver = CompositeReferenceResolver([
    FrenchLegalReferenceResolver(connection_pool=pool),
    SireneCompanyResolver(),
])
</syntaxhighlight>
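To see the "first match wins" behavior in isolation, here is a self-contained sketch. `KeywordReferenceResolver` and the `example.*` URIs are hypothetical stand-ins for real country resolvers, and `ResolvedReference` is reduced to a single `uri` field for brevity.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ResolvedReference:
    uri: str


class ReferenceResolver(Protocol):
    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]: ...


class CompositeReferenceResolver:
    """Chains multiple resolvers. First match wins."""

    def __init__(self, resolvers: list[ReferenceResolver]):
        self.resolvers = resolvers

    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]:
        for resolver in self.resolvers:
            if results := resolver.resolve_legal_reference(raw_text):
                return results
        return []


class KeywordReferenceResolver:
    """Hypothetical stub: matches when its keyword occurs in the text."""

    def __init__(self, keyword: str, uri: str):
        self.keyword = keyword
        self.uri = uri

    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]:
        if self.keyword in raw_text:
            return [ResolvedReference(uri=self.uri)]
        return []
```

The order of the list passed to `CompositeReferenceResolver` is the priority order, so the application decides precedence at composition time rather than inside any resolver.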

One module = one concept


A Python file should contain one coherent concept. If you need a table of contents to navigate the file, split it.

{| class="wikitable"
! Wrong !! Right
|-
| db.py (1400 lines: pool + CRUD + FTS + ingest + dedup + browse) || connection_pool.py, full_text_search.py, ingest_state.py, browse_structure.py
|-
| validation.py (filters + jurisdiction + pagination + dates + courts) || search_filters.py, court_classification.py
|}

Target: under 300 lines per file. Hard limit: 500 lines.
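The two thresholds lend themselves to an automated check. A minimal sketch, assuming hypothetical helper names (`classify_module_length`, `report_oversized_modules`) and status strings; it reads "under 300" as 299 lines or fewer and treats exactly 500 lines as still within the hard limit.

```python
from pathlib import Path

TARGET_LINE_COUNT = 300  # Files should stay under this.
HARD_LINE_LIMIT = 500    # Files must not exceed this.


def classify_module_length(line_count: int) -> str:
    """Classify a line count against the target and the hard limit."""
    if line_count > HARD_LINE_LIMIT:
        return "over_hard_limit"
    if line_count >= TARGET_LINE_COUNT:
        return "over_target"
    return "within_target"


def report_oversized_modules(package_directory: Path) -> dict[str, str]:
    """Map each oversized module (path relative to the package) to its status."""
    oversized_modules: dict[str, str] = {}
    for module_path in sorted(package_directory.rglob("*.py")):
        module_text = module_path.read_text(encoding="utf-8")
        length_status = classify_module_length(len(module_text.splitlines()))
        if length_status != "within_target":
            relative_path = str(module_path.relative_to(package_directory))
            oversized_modules[relative_path] = length_status
    return oversized_modules
```

A continuous-integration job could fail the build on any `over_hard_limit` entry and merely warn on `over_target`.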

Type annotations


Every function signature is fully annotated. An auditor reads signatures before reading bodies.

<syntaxhighlight lang="python">
# Wrong
def search(query, table, limit=20):
    ...


# Right
def search_full_text(
    search_query: str,
    table_name: str,
    result_limit: int = 20,
    date_from: date | None = None,
    date_to: date | None = None,
) -> SearchResults:
    ...
</syntaxhighlight>

Use | union syntax (Python 3.10+), not Optional or Union.

Docstrings


Every public class and function has a docstring. Docstrings include Examples blocks -- AI reads examples first to understand expected behavior.

<syntaxhighlight lang="python">
def resolve_legal_reference(raw_text: str) -> list[ResolvedReference]:
    """Parse a legal citation string into structured references.

    Runs a pipeline of detectors in priority order (most specific first).
    First match wins. Returns empty list if no pattern matches.

    Args:
        raw_text: A French legal citation in natural language.

    Returns:
        List of resolved references with canonical URIs.

    Examples:
        >>> resolve_legal_reference("article 1240 du code civil")
        [ResolvedReference(uri="fr.law.code.civil.article-1240")]
        >>> resolve_legal_reference("loi n° 85-677")
        [ResolvedReference(uri="fr.law.loi.85-677")]
        >>> resolve_legal_reference("bonjour")
        []
    """
</syntaxhighlight>
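Examples blocks written in `>>>` form can double as tests through the standard `doctest` module. The sketch below shows one way to run the examples of a single function; `collapse_whitespace` and `run_docstring_examples` are hypothetical names chosen for the illustration, and whether Dura Lex actually runs doctests is not stated on this page.

```python
import doctest
from collections.abc import Callable


def collapse_whitespace(raw_text: str) -> str:
    """Collapse runs of whitespace into single spaces.

    Examples:
        >>> collapse_whitespace("article   1240  du code civil")
        'article 1240 du code civil'
    """
    return " ".join(raw_text.split())


def run_docstring_examples(documented_function: Callable[..., object]) -> doctest.TestResults:
    """Run the docstring examples of one function and return the tallies."""
    example_finder = doctest.DocTestFinder()
    example_runner = doctest.DocTestRunner(verbose=False)
    # Expose the function under its own name so the examples can call it.
    example_globals = {documented_function.__name__: documented_function}
    for docstring_test in example_finder.find(documented_function, globs=example_globals):
        example_runner.run(docstring_test)
    return example_runner.summarize(verbose=False)
```

The returned `TestResults` carries `failed` and `attempted` counts, so a stale example shows up as a failure rather than as silently misleading documentation.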

Error handling


Errors are explicit. Never swallowed. Never hidden behind a generic fallback.

<syntaxhighlight lang="python">
# Wrong
try:
    result = parse_article(path)
except Exception:
    result = None

# Right
try:
    result = parse_article(path)
except FileNotFoundError:
    raise ArticleNotFoundError(article_id=article_id, path=path) from None
except etree.XMLSyntaxError as error:
    raise ArticleParseError(article_id=article_id, detail=str(error)) from error
</syntaxhighlight>

Custom exception classes inherit from a common base:

<syntaxhighlight lang="python">
class DuralexError(Exception):
    """Base exception for all Dura Lex errors."""


class ArticleNotFoundError(DuralexError):
    """Raised when a legislation article file does not exist on disk."""


class ArticleParseError(DuralexError):
    """Raised when a legislation article XML file cannot be parsed."""


class ReferenceResolutionError(DuralexError):
    """Raised when a legal reference is ambiguous or malformed."""
</syntaxhighlight>
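The raise statements shown earlier pass keyword arguments (`article_id`, `path`, `detail`), which a plain `Exception` subclass rejects with a `TypeError`. One possible way to support them is for each exception to define its own constructor, as sketched below. The attribute names follow the raise calls; the message formats and the `load_article_text` helper are assumptions made for the example.

```python
from pathlib import Path


class DuralexError(Exception):
    """Base exception for all Dura Lex errors."""


class ArticleNotFoundError(DuralexError):
    """Raised when a legislation article file does not exist on disk."""

    def __init__(self, article_id: str, path: Path):
        super().__init__(f"Article {article_id} not found at {path}")
        self.article_id = article_id
        self.path = path


class ArticleParseError(DuralexError):
    """Raised when a legislation article XML file cannot be parsed."""

    def __init__(self, article_id: str, detail: str):
        super().__init__(f"Article {article_id} could not be parsed: {detail}")
        self.article_id = article_id
        self.detail = detail


def load_article_text(article_id: str, path: Path) -> str:
    """Hypothetical loader that translates a low-level error into a domain error."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        raise ArticleNotFoundError(article_id=article_id, path=path) from None
```

Callers that want to handle every library failure uniformly can catch `DuralexError`, while the structured attributes remain available for logging or retries.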

Language


All code is in English. Variable names, function names, class names, docstrings, comments, error messages -- everything.

Content is in the jurisdiction's language. Concept names (fr.civil.contrat.formation.consentement.vice.dol), article text, legal vocabulary, court names -- these are in French (or the local language of the jurisdiction).

The boundary is clear: code structure is English, data values are local.
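A small sketch of that boundary: identifiers, docstrings, and messages are English, while the values carried by the objects stay in the jurisdiction's language. The field names `concept_path` and `display_label` and the function `describe_concept` are assumptions for the illustration; the concept path is the one quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptDefinition:
    concept_path: str   # Data value: stays in the jurisdiction's language.
    display_label: str  # Data value: stays in the jurisdiction's language.


def describe_concept(concept_definition: ConceptDefinition) -> str:
    """Build an English log line around local-language data values."""
    return (
        f"Concept {concept_definition.concept_path} "
        f"labeled {concept_definition.display_label!r}"
    )


fraud_concept = ConceptDefinition(
    concept_path="fr.civil.contrat.formation.consentement.vice.dol",
    display_label="dol",
)
```

The variable is named `fraud_concept` in English even though the value it holds, `dol`, is French legal vocabulary.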