Python Conventions
All duralex-* packages follow these conventions. No exceptions.
Naming
Casing (PEP 8)
| Element | Case | Example |
|---|---|---|
| Functions, methods, variables | snake_case | parse_legislation_article() |
| Classes | PascalCase | LegislationArticle |
| Constants | UPPER_SNAKE | ALLOWED_TABLE_NAMES |
| Enum members | UPPER_SNAKE | Confidence.SOURCE_CHECKED |
| Enum string values | lowercase | "source_checked" |
| Module filenames | snake_case | connection_pool.py |
No abbreviations
The code is written and maintained by AI. The AI does not tire of typing. Full words always.
| Wrong | Right |
|---|---|
| ref | reference |
| leg | legislation |
| dec | decision |
| doc | document |
| fts | full_text_search |
| tbl | table |
| q | query |
| lim | limit |
| flt | filter |
| el | element |
| ctx | context |
| conn | connection |
| cfg | configuration |
| num | number |
| idx | index |
| val | value |
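The table above can be checked mechanically. The following is a minimal sketch built on the standard `ast` module, not part of any duralex tooling: the function name `find_abbreviated_names` and the constant `FORBIDDEN_ABBREVIATIONS` are hypothetical, and the set simply mirrors the Wrong column.

```python
import ast

# Hypothetical set mirroring the "Wrong" column of the table above.
FORBIDDEN_ABBREVIATIONS = frozenset({
    "ref", "leg", "dec", "doc", "fts", "tbl", "q", "lim",
    "flt", "el", "ctx", "conn", "cfg", "num", "idx", "val",
})


def find_abbreviated_names(source_code: str) -> list[tuple[int, str]]:
    """Return sorted (line_number, identifier) pairs that contain a forbidden part."""
    findings: set[tuple[int, str]] = set()
    for node in ast.walk(ast.parse(source_code)):
        if isinstance(node, ast.Name):
            identifier = node.id
        elif isinstance(node, ast.arg):
            identifier = node.arg
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            identifier = node.name
        else:
            continue
        # Split snake_case so that "load_cfg" is caught but "configuration" is not.
        name_parts = identifier.lower().split("_")
        if any(name_part in FORBIDDEN_ABBREVIATIONS for name_part in name_parts):
            findings.add((node.lineno, identifier))
    return sorted(findings)
```

The check only looks at variable, parameter, and function names. Class names, attributes, and imports would need additional node types.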
Qualified names
Single-word names are ambiguous. Always qualify with the domain.
| Wrong | Right | Why |
|---|---|---|
| query | search_query | Could be SQL, HTTP, FTS... |
| text | article_text | Could be anything |
| content | html_content | What kind? |
| result | search_result | Result of what? |
| data | decision_data | Meaningless alone |
| items | matched_articles | What items? |
| response | search_response | From where? |
| path | file_path or concept_path | Filesystem? URI? |
Booleans read as phrases
A boolean variable or parameter must read as a true/false statement.
<syntaxhighlight lang="python">
# Wrong
active = True
force = False
recursive = True

# Right
is_in_force = True
should_force_refresh = False
is_recursive = True
has_been_verified = False
should_include_repealed = True
</syntaxhighlight>
Classes: named for what they ARE
<syntaxhighlight lang="python">
class LegislationArticle: ...
class CaseLawDecision: ...
class ResolvedReference: ...
class AnnotationEnvelope: ...
class ConceptDefinition: ...
class SearchFilters: ...
class SearchResults: ...
class CompiledPackage: ...
</syntaxhighlight>
Methods: verb + explicit object
<syntaxhighlight lang="python">
def parse_legislation_article(xml_path: Path) -> LegislationArticle: ...
def resolve_legal_reference(raw_text: str) -> list[ResolvedReference]: ...
def search_full_text(query: str, filters: SearchFilters) -> SearchResults: ...
def compile_domain_package(domain: str) -> CompiledPackage: ...
def sanitize_html_content(raw_html: str) -> str: ...
def extract_text_content(element: Element, xpath: str) -> str | None: ...
def validate_date_range(date_from: str | None, date_to: str | None) -> None: ...
</syntaxhighlight>
Protocols: named for the capability
<syntaxhighlight lang="python">
from typing import Protocol

class LegislationParser(Protocol): ...
class ReferenceResolver(Protocol): ...
class SearchEngine(Protocol): ...
class VersionSelector(Protocol): ...
class DecisionDownloader(Protocol): ...
</syntaxhighlight>
Enums
<syntaxhighlight lang="python">
from enum import Enum

class ConceptType(Enum):
    QUALIFIABLE = "qualifiable"
    OPEN_STANDARD = "open_standard"
    GUIDING_PRINCIPLE = "guiding_principle"
    PROCEDURAL = "procedural"
    SCALE = "scale"

class Confidence(Enum):
    STUB = "stub"
    MEMORY_ONLY = "memory_only"
    SOURCE_CHECKED = "source_checked"
    CROSS_VALIDATED = "cross_validated"

class Outcome(Enum):
    QUALIFIED = "qualified"
    NOT_QUALIFIED = "not_qualified"
    VALIDATED = "validated"
    INVALIDATED = "invalidated"
    PROCEDURAL = "procedural"
    MOOT = "moot"
</syntaxhighlight>
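The UPPER_SNAKE member and its lowercase string value convert in both directions with the standard `Enum` API. A short sketch using the `Confidence` enum from above; the variable names are illustrative only.

```python
from enum import Enum


class Confidence(Enum):
    STUB = "stub"
    MEMORY_ONLY = "memory_only"
    SOURCE_CHECKED = "source_checked"
    CROSS_VALIDATED = "cross_validated"


# Member -> stored string, for example when writing to a database column or JSON field.
stored_confidence_value = Confidence.SOURCE_CHECKED.value

# Stored string -> member. An unknown string raises ValueError instead of passing silently.
restored_confidence = Confidence(stored_confidence_value)
```

Lookup by value is case-sensitive, so `Confidence("SOURCE_CHECKED")` raises `ValueError`. Use `Confidence["SOURCE_CHECKED"]` to look up by member name.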
Architecture patterns
Dependency injection
Dependencies are passed explicitly. No global singletons, no module-level mutable state.
<syntaxhighlight lang="python">
# Wrong
class SearchEngine:
    def __init__(self):
        self._pool = _get_global_pool()
        self._load_cache()

# Right
class FullTextSearchEngine:
    def __init__(self, connection_pool: ConnectionPool):
        self.connection_pool = connection_pool
</syntaxhighlight>
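The practical payoff of explicit injection is that a test can pass a stand-in. The sketch below assumes a minimal `ConnectionPool` interface with a single `execute_query` method. That method, the SQL statement, `count_matching_articles`, and `InMemoryConnectionPool` are all hypothetical, since the real pool API is not shown on this page.

```python
from typing import Protocol


class ConnectionPool(Protocol):
    """Hypothetical minimal interface, assumed for this example only."""

    def execute_query(self, sql_statement: str, parameters: tuple) -> list[tuple]: ...


class FullTextSearchEngine:
    def __init__(self, connection_pool: ConnectionPool):
        self.connection_pool = connection_pool

    def count_matching_articles(self, search_query: str) -> int:
        count_rows = self.connection_pool.execute_query(
            "SELECT count(*) FROM articles WHERE article_text LIKE ?",
            (f"%{search_query}%",),
        )
        return count_rows[0][0]


class InMemoryConnectionPool:
    """Test double: returns canned rows and records what was executed."""

    def __init__(self, canned_rows: list[tuple]):
        self.canned_rows = canned_rows
        self.executed_statements: list[str] = []

    def execute_query(self, sql_statement: str, parameters: tuple) -> list[tuple]:
        self.executed_statements.append(sql_statement)
        return self.canned_rows


fake_connection_pool = InMemoryConnectionPool(canned_rows=[(3,)])
search_engine = FullTextSearchEngine(connection_pool=fake_connection_pool)
matching_article_count = search_engine.count_matching_articles("responsabilité")
```

No database, no monkeypatching of globals: the engine never learns that the pool is fake.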
No side effects in __init__
Constructors store parameters. They do not open connections, load caches, or perform I/O.
<syntaxhighlight lang="python">
# Wrong
class FrenchReferenceResolver:
    def __init__(self, connection_pool: ConnectionPool):
        self.connection_pool = connection_pool
        self._code_cache = self._load_code_cache()  # I/O in __init__

# Right
class FrenchReferenceResolver:
    def __init__(self, connection_pool: ConnectionPool):
        self.connection_pool = connection_pool
        self._code_cache: dict[str, str] | None = None

    def _ensure_code_cache(self) -> dict[str, str]:
        if self._code_cache is None:
            self._code_cache = self._load_code_cache()
        return self._code_cache
</syntaxhighlight>
Composition over inheritance
Core libraries define Protocol interfaces. Country packages and plugins provide implementations. Applications compose them.
<syntaxhighlight lang="python">
# duralex -- defines the interface
class ReferenceResolver(Protocol):
    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]: ...

class CompositeReferenceResolver:
    """Chains multiple resolvers. First match wins."""

    def __init__(self, resolvers: list[ReferenceResolver]):
        self.resolvers = resolvers

    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]:
        for resolver in self.resolvers:
            if results := resolver.resolve_legal_reference(raw_text):
                return results
        return []

# duralex-fr -- implements for France
class FrenchLegalReferenceResolver:
    """French legal references: articles, lois, pourvois, ECLI."""
    ...

# Application -- composes at startup
resolver = CompositeReferenceResolver([
    FrenchLegalReferenceResolver(connection_pool=pool),
    SireneCompanyResolver(),
])
</syntaxhighlight>
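A self-contained version of the composition above, runnable without any duralex package. The two stub resolvers (`CivilCodeArticleResolver`, `NumberedLawResolver`) and their regular expressions are hypothetical and far simpler than real resolvers. The single-field `ResolvedReference` is also an assumption. The URI shapes follow the docstring examples elsewhere on this page.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ResolvedReference:
    uri: str  # assumed single field; the real class may carry more


class ReferenceResolver(Protocol):
    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]: ...


class CompositeReferenceResolver:
    """Chains multiple resolvers. First match wins."""

    def __init__(self, resolvers: list[ReferenceResolver]):
        self.resolvers = resolvers

    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]:
        for resolver in self.resolvers:
            if results := resolver.resolve_legal_reference(raw_text):
                return results
        return []


class CivilCodeArticleResolver:
    """Hypothetical stub: recognises only 'article N du code civil'."""

    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]:
        citation_match = re.fullmatch(r"article (\d+) du code civil", raw_text.strip().lower())
        if citation_match is None:
            return []
        return [ResolvedReference(uri=f"fr.law.code.civil.article-{citation_match.group(1)}")]


class NumberedLawResolver:
    """Hypothetical stub: recognises only 'loi n° NN-NNN'."""

    def resolve_legal_reference(self, raw_text: str) -> list[ResolvedReference]:
        citation_match = re.fullmatch(r"loi n° (\d+-\d+)", raw_text.strip().lower())
        if citation_match is None:
            return []
        return [ResolvedReference(uri=f"fr.law.loi.{citation_match.group(1)}")]


composite_resolver = CompositeReferenceResolver(
    [CivilCodeArticleResolver(), NumberedLawResolver()]
)
civil_code_references = composite_resolver.resolve_legal_reference("article 1240 du code civil")
unmatched_references = composite_resolver.resolve_legal_reference("bonjour")
```

Order in the list is the priority order: put the most specific resolver first, because the composite stops at the first non-empty result.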
One module = one concept
A Python file should contain one coherent concept. If you need a table of contents to navigate the file, split it.
| Wrong | Right |
|---|---|
| db.py (1400 lines: pool + CRUD + FTS + ingest + dedup + browse) | connection_pool.py, full_text_search.py, ingest_state.py, browse_structure.py |
| validation.py (filters + jurisdiction + pagination + dates + courts) | search_filters.py, court_classification.py |
Target: under 300 lines per file. Hard limit: 500 lines.
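The two thresholds are easy to check in a script. A possible sketch follows; the function names and status strings are hypothetical. It assumes that "under 300" means a 300-line file already misses the target, and that a file of exactly 500 lines is still within the hard limit.

```python
from pathlib import Path

TARGET_LINE_COUNT = 300
HARD_LIMIT_LINE_COUNT = 500


def classify_module_length(line_count: int) -> str:
    """Map a line count to a status string (assumed boundaries, see lead-in)."""
    if line_count > HARD_LIMIT_LINE_COUNT:
        return "over_hard_limit"
    if line_count >= TARGET_LINE_COUNT:
        return "over_target"
    return "within_target"


def find_oversized_modules(package_directory: Path) -> dict[Path, str]:
    """Return every .py file under the directory that is not within the target."""
    oversized_modules: dict[Path, str] = {}
    for module_path in sorted(package_directory.rglob("*.py")):
        with module_path.open(encoding="utf-8") as module_file:
            line_count = sum(1 for _ in module_file)
        length_status = classify_module_length(line_count)
        if length_status != "within_target":
            oversized_modules[module_path] = length_status
    return oversized_modules
```

Line count is a proxy. The actual rule is one concept per module, so a 200-line file mixing two concepts still needs splitting.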
Type annotations
Every function signature is fully annotated. An auditor reads signatures before reading bodies.
<syntaxhighlight lang="python">
# Wrong
def search(query, table, limit=20):
    ...

# Right
def search_full_text(
    search_query: str,
    table_name: str,
    result_limit: int = 20,
    date_from: date | None = None,
    date_to: date | None = None,
) -> SearchResults:
    ...
</syntaxhighlight>
Use | union syntax (Python 3.10+), not Optional or Union.
Docstrings
Every public class and function has a docstring. Docstrings include Examples blocks -- AI reads examples first to understand expected behavior.
<syntaxhighlight lang="python">
def resolve_legal_reference(raw_text: str) -> list[ResolvedReference]:
    """Parse a legal citation string into structured references.

    Runs a pipeline of detectors in priority order (most specific first).
    First match wins. Returns empty list if no pattern matches.

    Args:
        raw_text: A French legal citation in natural language.

    Returns:
        List of resolved references with canonical URIs.

    Examples:
        >>> resolve_legal_reference("article 1240 du code civil")
        [ResolvedReference(uri="fr.law.code.civil.article-1240")]
        >>> resolve_legal_reference("loi n° 85-677")
        [ResolvedReference(uri="fr.law.loi.85-677")]
        >>> resolve_legal_reference("bonjour")
        []
    """
</syntaxhighlight>
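Examples blocks written in `>>>` form can also be executed with the standard `doctest` module, which keeps them from drifting out of date. This page does not say whether duralex does this, so treat the following as one possible sketch. Both `normalize_citation_whitespace` and `count_failing_examples` are hypothetical names introduced for the example.

```python
import doctest
from collections.abc import Callable


def normalize_citation_whitespace(raw_text: str) -> str:
    """Collapse runs of whitespace in a citation string.

    Examples:
        >>> normalize_citation_whitespace("article  1240   du code civil")
        'article 1240 du code civil'
        >>> normalize_citation_whitespace("   ")
        ''
    """
    return " ".join(raw_text.split())


def count_failing_examples(documented_function: Callable) -> int:
    """Run the docstring examples of one function and return how many fail."""
    example_finder = doctest.DocTestFinder()
    example_runner = doctest.DocTestRunner(verbose=False)
    # Pass the function explicitly so the examples can call it by name.
    example_globals = {documented_function.__name__: documented_function}
    failed_example_count = 0
    for example_group in example_finder.find(documented_function, globs=example_globals):
        failed_example_count += example_runner.run(example_group).failed
    return failed_example_count


failing_example_count = count_failing_examples(normalize_citation_whitespace)
```

doctest compares the printed `repr` character for character, so the expected output must match exactly, quote style included.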
Error handling
Errors are explicit. Never swallowed. Never hidden behind a generic fallback.
<syntaxhighlight lang="python">
# Wrong
try:
    result = parse_article(path)
except Exception:
    result = None

# Right
try:
    result = parse_article(path)
except FileNotFoundError:
    raise ArticleNotFoundError(article_id=article_id, path=path) from None
except etree.XMLSyntaxError as error:
    raise ArticleParseError(article_id=article_id, detail=str(error)) from error
</syntaxhighlight>
Custom exception classes inherit from a common base:
<syntaxhighlight lang="python">
class DuralexError(Exception):
    """Base exception for all Dura Lex errors."""

class ArticleNotFoundError(DuralexError):
    """Raised when a legislation article file does not exist on disk."""

class ArticleParseError(DuralexError):
    """Raised when a legislation article XML file cannot be parsed."""

class ReferenceResolutionError(DuralexError):
    """Raised when a legal reference is ambiguous or malformed."""
</syntaxhighlight>
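The class bodies above show only docstrings, yet the raise statements earlier pass keyword arguments such as `article_id=` and `detail=`. A plain `Exception` subclass rejects keyword arguments, so each class needs a constructor. The sketch below assumes what those constructors might look like; the attribute names and message formats are guesses. It also swaps lxml for the standard-library `xml.etree.ElementTree` (whose error type is `ParseError`) so that it runs without third-party packages. `parse_article_file` is a hypothetical function.

```python
import xml.etree.ElementTree as ElementTree
from pathlib import Path


class DuralexError(Exception):
    """Base exception for all Dura Lex errors."""


class ArticleNotFoundError(DuralexError):
    """Raised when a legislation article file does not exist on disk."""

    def __init__(self, article_id: str, path: Path):
        super().__init__(f"Article {article_id} not found at {path}")
        self.article_id = article_id
        self.path = path


class ArticleParseError(DuralexError):
    """Raised when a legislation article XML file cannot be parsed."""

    def __init__(self, article_id: str, detail: str):
        super().__init__(f"Article {article_id} could not be parsed: {detail}")
        self.article_id = article_id
        self.detail = detail


def parse_article_file(article_id: str, path: Path) -> ElementTree.Element:
    try:
        xml_text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # "from None": the low-level error adds nothing the domain error lacks.
        raise ArticleNotFoundError(article_id=article_id, path=path) from None
    try:
        return ElementTree.fromstring(xml_text)
    except ElementTree.ParseError as error:
        # "from error": keep the parser's diagnosis attached as __cause__.
        raise ArticleParseError(article_id=article_id, detail=str(error)) from error
```

Callers that want to handle any library failure catch `DuralexError`. Callers that can recover from one specific case catch the subclass.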
Language
All code is in English. Variable names, function names, class names, docstrings, comments, error messages -- everything.
Content is in the jurisdiction's language. Concept names (fr.civil.contrat.formation.consentement.vice.dol), article text, legal vocabulary, court names -- these are in French (or the local language of the jurisdiction).
The boundary is clear: code structure is English, data values are local.
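A small illustration of that boundary. The `ConceptDefinition` fields shown here (`concept_path`, `display_label`) are assumptions for the example; only the class name and the concept path come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptDefinition:
    # Hypothetical fields: identifiers in English, whatever language the values use.
    concept_path: str
    display_label: str


# Data values stay in the jurisdiction's language (French here).
dol_concept = ConceptDefinition(
    concept_path="fr.civil.contrat.formation.consentement.vice.dol",
    display_label="Dol",
)

# Code structure stays in English, including derived names.
jurisdiction_code = dol_concept.concept_path.split(".")[0]
```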