Development/Testing
Testing Conventions
[edit | edit source]All duralex-* packages follow these conventions for test organization, naming, and execution.
Directory structure
[edit | edit source]Tests mirror the src/duralex/ structure:
tests/
├── conftest.py # shared fixtures for this repo
├── data/ # mirrors src/duralex/data/
│ ├── test_models.py
│ └── test_schema.py
├── temporal/ # mirrors src/duralex/temporal/
│ └── test_temporal_version.py
├── store/ # mirrors src/duralex/corpus/store/
│ ├── test_full_text_search_configuration.py
│ └── test_full_text_search_query_builder.py
├── integration/ # tests requiring external services
│ └── test_postgres_document_store.py
└── fixtures/ # real data files for tests
├── fr.legiarti000006436298.xml
├── fr.cetatext000046783006.xml
└── 5fdcdddd994f0448aad44c0b.json
No __init__.py in test directories. Use --import-mode=importlib instead.
Test naming
[edit | edit source]Pattern: test_{what}_{scenario}
<syntaxhighlight lang="python">
- What is being tested + specific scenario
def test_resolve_legal_reference_article_of_code(): def test_resolve_legal_reference_empty_string(): def test_resolve_legal_reference_unknown_pattern(): def test_resolve_legal_reference_pourvoi_number():
def test_parse_legislation_article_basic_fields(): def test_parse_legislation_article_missing_content(): def test_parse_legislation_article_multiple_versions():
def test_sanitize_html_content_strips_script_tags(): def test_sanitize_html_content_preserves_table_structure(): def test_sanitize_html_content_empty_input(): </syntaxhighlight>
The test name alone must tell an auditor what is being verified, without reading the body.
Test body structure
[edit | edit source]Each test follows a clear three-section structure. Sections are separated by blank lines, not comments.
<syntaxhighlight lang="python"> def test_resolve_legal_reference_article_of_code():
"""'article 1240 du code civil' resolves to fr.law.code.civil.article-1240.""" resolver = FrenchLegalReferenceResolver()
results = resolver.resolve_legal_reference("article 1240 du code civil")
assert len(results) == 1 assert results[0].uri == "fr.law.code.civil.article-1240"
</syntaxhighlight>
- Line 1: Setup (create objects, prepare inputs)
- Line 2: Act (call the function under test)
- Line 3: Assert (verify the result)
One assertion per concept. Multiple assert statements are fine when they verify different aspects of the same result.
Fixtures
[edit | edit source]Factory fixtures for data objects
[edit | edit source]When tests need multiple instances with variations, use factory fixtures:
<syntaxhighlight lang="python"> @pytest.fixture def make_legislation_article():
"""Factory: creates LegislationArticle instances with sensible defaults."""
def _make(
article_id: str = "fr.legiarti000006436298",
article_number: str = "1240",
code_name: str = "code civil",
is_in_force: bool = True,
**overrides,
) -> LegislationArticle:
defaults = {
"article_id": article_id,
"article_number": article_number,
"code_name": code_name,
"is_in_force": is_in_force,
"html_content": "
Tout fait quelconque de l'homme...
",
"plain_text_content": "Tout fait quelconque de l'homme...",
}
defaults.update(overrides)
return LegislationArticle(**defaults)
return _make
</syntaxhighlight>
Real data fixtures
[edit | edit source]Test fixtures use real files from public institutional data sources (DILA, Judilibre, EUR-Lex). Not invented mocks. This is consistent with the project's commitment to transparency and auditability.
<syntaxhighlight lang="python"> FIXTURES_DIR = Path(__file__).parent / "fixtures"
@pytest.fixture def sample_legi_article_path() -> Path:
"""Real LEGI XML article: article 1240 du code civil.""" return FIXTURES_DIR / "fr.legiarti000006436298.xml"
@pytest.fixture def sample_jade_decision_path() -> Path:
"""Real JADE XML decision: CE, 13 dec 2022, n 462274.""" return FIXTURES_DIR / "fr.cetatext000046783006.xml"
</syntaxhighlight>
Session-scoped fixtures for expensive resources
[edit | edit source]<syntaxhighlight lang="python"> @pytest.fixture(scope="session") def database_connection():
"""PostgreSQL connection via testcontainers. Shared across all tests in session."""
with PostgresContainer("postgres:17") as postgres:
pool = ConnectionPool(postgres.get_connection_url())
yield pool
pool.close()
</syntaxhighlight>
Markers
[edit | edit source]Two standard markers, defined in every repo's pyproject.toml:
<syntaxhighlight lang="python"> @pytest.mark.slow def test_parse_all_code_civil_articles():
"""Parse all 2000+ articles of Code civil. Verifies no crash.""" ...
@pytest.mark.integration def test_full_text_search_returns_ranked_results(database_connection):
"""FTS with real PostgreSQL, real French stemming, real data.""" ...
</syntaxhighlight>
Default pytest invocation excludes both:
<syntaxhighlight lang="bash"> pytest # unit tests only (fast) pytest -m integration # integration only pytest -m "not slow" # everything except slow pytest -m "" # everything including slow + integration </syntaxhighlight>
Parametrize for data-driven tests
[edit | edit source]Use @pytest.mark.parametrize for testing multiple inputs against the same logic:
<syntaxhighlight lang="python"> @pytest.mark.parametrize("raw_input, expected_uri", [
("article 1240 du code civil", "fr.law.code.civil.article-1240"),
("article L. 442-1 du code de commerce", "fr.law.code.commerce.article-l442-1"),
("loi n° 85-677", "fr.law.loi.85-677"),
("24-14.340", "fr.court.pourvoi.24-14.340"),
]) def test_resolve_legal_reference_known_patterns(raw_input, expected_uri):
resolver = FrenchLegalReferenceResolver() results = resolver.resolve_legal_reference(raw_input) assert results[0].uri == expected_uri
</syntaxhighlight>
What NOT to test
[edit | edit source]- Private implementation details (functions starting with
_) - Third-party library behavior (lxml, psycopg, nh3)
- Configuration values
pyproject.toml reference
[edit | edit source]<syntaxhighlight lang="toml"> [tool.pytest.ini_options] testpaths = ["tests"] addopts = [
"--strict-config", "--strict-markers", "--import-mode=importlib", "-m not slow and not integration",
] xfail_strict = true consider_namespace_packages = true filterwarnings = ["error"] markers = [
"slow: tests with extended runtime", "integration: requires PostgreSQL (testcontainers)",
] </syntaxhighlight>