Editing Development/Testing (section)

= Testing Conventions =

All duralex-* packages follow these conventions for test organization, naming, and execution.

== Directory structure ==

Tests mirror the <code>src/duralex/</code> structure:

<pre>
tests/
├── conftest.py                      # shared fixtures for this repo
├── data/                            # mirrors src/duralex/data/
│   ├── test_models.py
│   └── test_schema.py
├── temporal/                        # mirrors src/duralex/temporal/
│   └── test_temporal_version.py
├── store/                           # mirrors src/duralex/corpus/store/
│   ├── test_full_text_search_configuration.py
│   └── test_full_text_search_query_builder.py
├── integration/                     # tests requiring external services
│   └── test_postgres_document_store.py
└── fixtures/                        # real data files for tests
    ├── fr.legiarti000006436298.xml
    ├── fr.cetatext000046783006.xml
    └── 5fdcdddd994f0448aad44c0b.json
</pre>

No <code>__init__.py</code> in test directories. Use <code>--import-mode=importlib</code> instead.

== Test naming ==

Pattern: <code>test_{what}_{scenario}</code>

<syntaxhighlight lang="python">
# What is being tested + specific scenario
def test_resolve_legal_reference_article_of_code():
def test_resolve_legal_reference_empty_string():
def test_resolve_legal_reference_unknown_pattern():
def test_resolve_legal_reference_pourvoi_number():

def test_parse_legislation_article_basic_fields():
def test_parse_legislation_article_missing_content():
def test_parse_legislation_article_multiple_versions():

def test_sanitize_html_content_strips_script_tags():
def test_sanitize_html_content_preserves_table_structure():
def test_sanitize_html_content_empty_input():
</syntaxhighlight>

The test name alone must tell an auditor what is being verified, without reading the body.

== Test body structure ==

Each test follows a clear three-section structure. Sections are separated by blank lines, not comments.

<syntaxhighlight lang="python">
def test_resolve_legal_reference_article_of_code():
    """'article 1240 du code civil' resolves to fr.law.code.civil.article-1240."""
    resolver = FrenchLegalReferenceResolver()

    results = resolver.resolve_legal_reference("article 1240 du code civil")

    assert len(results) == 1
    assert results[0].uri == "fr.law.code.civil.article-1240"
</syntaxhighlight>

* '''Line 1:''' Setup (create objects, prepare inputs)
* '''Line 2:''' Act (call the function under test)
* '''Line 3:''' Assert (verify the result)

One assertion per concept. Multiple <code>assert</code> statements are fine when they verify different aspects of the same result.

== Fixtures ==

=== Factory fixtures for data objects ===

When tests need multiple instances with variations, use factory fixtures:

<syntaxhighlight lang="python">
@pytest.fixture
def make_legislation_article():
    """Factory: creates LegislationArticle instances with sensible defaults."""
    def _make(
        article_id: str = "fr.legiarti000006436298",
        article_number: str = "1240",
        code_name: str = "code civil",
        is_in_force: bool = True,
        **overrides,
    ) -> LegislationArticle:
        defaults = {
            "article_id": article_id,
            "article_number": article_number,
            "code_name": code_name,
            "is_in_force": is_in_force,
            "html_content": "<p>Tout fait quelconque de l'homme...</p>",
            "plain_text_content": "Tout fait quelconque de l'homme...",
        }
        defaults.update(overrides)
        return LegislationArticle(**defaults)
    return _make
</syntaxhighlight>

=== Real data fixtures ===

Test fixtures use '''real files''' from public institutional data sources (DILA, Judilibre, EUR-Lex). Not invented mocks. This is consistent with the project's commitment to transparency and auditability.

<syntaxhighlight lang="python">
FIXTURES_DIR = Path(__file__).parent / "fixtures"

@pytest.fixture
def sample_legi_article_path() -> Path:
    """Real LEGI XML article: article 1240 du code civil."""
    return FIXTURES_DIR / "fr.legiarti000006436298.xml"

@pytest.fixture
def sample_jade_decision_path() -> Path:
    """Real JADE XML decision: CE, 13 dec 2022, n 462274."""
    return FIXTURES_DIR / "fr.cetatext000046783006.xml"
</syntaxhighlight>

=== Session-scoped fixtures for expensive resources ===

<syntaxhighlight lang="python">
@pytest.fixture(scope="session")
def database_connection():
    """PostgreSQL connection via testcontainers. Shared across all tests in session."""
    with PostgresContainer("postgres:17") as postgres:
        pool = ConnectionPool(postgres.get_connection_url())
        yield pool
        pool.close()
</syntaxhighlight>

== Markers ==

Two standard markers, defined in every repo's <code>pyproject.toml</code>:

<syntaxhighlight lang="python">
@pytest.mark.slow
def test_parse_all_code_civil_articles():
    """Parse all 2000+ articles of Code civil. Verifies no crash."""
    ...

@pytest.mark.integration
def test_full_text_search_returns_ranked_results(database_connection):
    """FTS with real PostgreSQL, real French stemming, real data."""
    ...
</syntaxhighlight>

Default <code>pytest</code> invocation excludes both:

<syntaxhighlight lang="bash">
pytest                         # unit tests only (fast)
pytest -m integration          # integration only
pytest -m "not slow"           # everything except slow
pytest -m ""                   # everything including slow + integration
</syntaxhighlight>

== Parametrize for data-driven tests ==

Use <code>@pytest.mark.parametrize</code> for testing multiple inputs against the same logic:

<syntaxhighlight lang="python">
@pytest.mark.parametrize("raw_input, expected_uri", [
    ("article 1240 du code civil", "fr.law.code.civil.article-1240"),
    ("article L. 442-1 du code de commerce", "fr.law.code.commerce.article-l442-1"),
    ("loi n° 85-677", "fr.law.loi.85-677"),
    ("24-14.340", "fr.court.pourvoi.24-14.340"),
])
def test_resolve_legal_reference_known_patterns(raw_input, expected_uri):
    resolver = FrenchLegalReferenceResolver()
    results = resolver.resolve_legal_reference(raw_input)
    assert results[0].uri == expected_uri
</syntaxhighlight>

== What NOT to test ==

* Private implementation details (functions starting with <code>_</code>)
* Third-party library behavior (lxml, psycopg, nh3)
* Configuration values

== pyproject.toml reference ==

<syntaxhighlight lang="toml">
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = [
    "--strict-config",
    "--strict-markers",
    "--import-mode=importlib",
    "-m not slow and not integration",
]
xfail_strict = true
consider_namespace_packages = true
filterwarnings = ["error"]
markers = [
    "slow: tests with extended runtime",
    "integration: requires PostgreSQL (testcontainers)",
]
</syntaxhighlight>

[[Category:Development]]