Development/Testing

From Dura Lex Wiki
Jump to navigation Jump to search

Testing Conventions

[edit | edit source]

All duralex-* packages follow these conventions for test organization, naming, and execution.

Directory structure

[edit | edit source]

Tests mirror the src/duralex/ structure:

tests/
├── conftest.py                      # shared fixtures for this repo
├── data/                            # mirrors src/duralex/data/
│   ├── test_models.py
│   └── test_schema.py
├── temporal/                        # mirrors src/duralex/temporal/
│   └── test_temporal_version.py
├── store/                           # mirrors src/duralex/corpus/store/
│   ├── test_full_text_search_configuration.py
│   └── test_full_text_search_query_builder.py
├── integration/                     # tests requiring external services
│   └── test_postgres_document_store.py
└── fixtures/                        # real data files for tests
    ├── fr.legiarti000006436298.xml
    ├── fr.cetatext000046783006.xml
    └── 5fdcdddd994f0448aad44c0b.json

No __init__.py in test directories. Use --import-mode=importlib instead.

Test naming

[edit | edit source]

Pattern: test_{what}_{scenario}

<syntaxhighlight lang="python">

  1. What is being tested + specific scenario

def test_resolve_legal_reference_article_of_code(): def test_resolve_legal_reference_empty_string(): def test_resolve_legal_reference_unknown_pattern(): def test_resolve_legal_reference_pourvoi_number():

def test_parse_legislation_article_basic_fields(): def test_parse_legislation_article_missing_content(): def test_parse_legislation_article_multiple_versions():

def test_sanitize_html_content_strips_script_tags(): def test_sanitize_html_content_preserves_table_structure(): def test_sanitize_html_content_empty_input(): </syntaxhighlight>

The test name alone must tell an auditor what is being verified, without reading the body.

Test body structure

[edit | edit source]

Each test follows a clear three-section structure. Sections are separated by blank lines, not comments.

<syntaxhighlight lang="python"> def test_resolve_legal_reference_article_of_code():

   """'article 1240 du code civil' resolves to fr.law.code.civil.article-1240."""
   resolver = FrenchLegalReferenceResolver()
   results = resolver.resolve_legal_reference("article 1240 du code civil")
   assert len(results) == 1
   assert results[0].uri == "fr.law.code.civil.article-1240"

</syntaxhighlight>

  • Line 1: Setup (create objects, prepare inputs)
  • Line 2: Act (call the function under test)
  • Line 3: Assert (verify the result)

One assertion per concept. Multiple assert statements are fine when they verify different aspects of the same result.

Fixtures

[edit | edit source]

Factory fixtures for data objects

[edit | edit source]

When tests need multiple instances with variations, use factory fixtures:

<syntaxhighlight lang="python"> @pytest.fixture def make_legislation_article():

   """Factory: creates LegislationArticle instances with sensible defaults."""
   def _make(
       article_id: str = "fr.legiarti000006436298",
       article_number: str = "1240",
       code_name: str = "code civil",
       is_in_force: bool = True,
       **overrides,
   ) -> LegislationArticle:
       defaults = {
           "article_id": article_id,
           "article_number": article_number,
           "code_name": code_name,
           "is_in_force": is_in_force,

"html_content": "

Tout fait quelconque de l'homme...

",

           "plain_text_content": "Tout fait quelconque de l'homme...",
       }
       defaults.update(overrides)
       return LegislationArticle(**defaults)
   return _make

</syntaxhighlight>

Real data fixtures

[edit | edit source]

Test fixtures use real files from public institutional data sources (DILA, Judilibre, EUR-Lex). Not invented mocks. This is consistent with the project's commitment to transparency and auditability.

<syntaxhighlight lang="python"> FIXTURES_DIR = Path(__file__).parent / "fixtures"

@pytest.fixture def sample_legi_article_path() -> Path:

   """Real LEGI XML article: article 1240 du code civil."""
   return FIXTURES_DIR / "fr.legiarti000006436298.xml"

@pytest.fixture def sample_jade_decision_path() -> Path:

   """Real JADE XML decision: CE, 13 dec 2022, n 462274."""
   return FIXTURES_DIR / "fr.cetatext000046783006.xml"

</syntaxhighlight>

Session-scoped fixtures for expensive resources

[edit | edit source]

<syntaxhighlight lang="python"> @pytest.fixture(scope="session") def database_connection():

   """PostgreSQL connection via testcontainers. Shared across all tests in session."""
   with PostgresContainer("postgres:17") as postgres:
       pool = ConnectionPool(postgres.get_connection_url())
       yield pool
       pool.close()

</syntaxhighlight>

Markers

[edit | edit source]

Two standard markers, defined in every repo's pyproject.toml:

<syntaxhighlight lang="python"> @pytest.mark.slow def test_parse_all_code_civil_articles():

   """Parse all 2000+ articles of Code civil. Verifies no crash."""
   ...

@pytest.mark.integration def test_full_text_search_returns_ranked_results(database_connection):

   """FTS with real PostgreSQL, real French stemming, real data."""
   ...

</syntaxhighlight>

Default pytest invocation excludes both:

<syntaxhighlight lang="bash"> pytest # unit tests only (fast) pytest -m integration # integration only pytest -m "not slow" # everything except slow pytest -m "" # everything including slow + integration </syntaxhighlight>

Parametrize for data-driven tests

[edit | edit source]

Use @pytest.mark.parametrize for testing multiple inputs against the same logic:

<syntaxhighlight lang="python"> @pytest.mark.parametrize("raw_input, expected_uri", [

   ("article 1240 du code civil", "fr.law.code.civil.article-1240"),
   ("article L. 442-1 du code de commerce", "fr.law.code.commerce.article-l442-1"),
   ("loi n° 85-677", "fr.law.loi.85-677"),
   ("24-14.340", "fr.court.pourvoi.24-14.340"),

]) def test_resolve_legal_reference_known_patterns(raw_input, expected_uri):

   resolver = FrenchLegalReferenceResolver()
   results = resolver.resolve_legal_reference(raw_input)
   assert results[0].uri == expected_uri

</syntaxhighlight>

What NOT to test

[edit | edit source]
  • Private implementation details (functions starting with _)
  • Third-party library behavior (lxml, psycopg, nh3)
  • Configuration values

pyproject.toml reference

[edit | edit source]

<syntaxhighlight lang="toml"> [tool.pytest.ini_options] testpaths = ["tests"] addopts = [

   "--strict-config",
   "--strict-markers",
   "--import-mode=importlib",
   "-m not slow and not integration",

] xfail_strict = true consider_namespace_packages = true filterwarnings = ["error"] markers = [

   "slow: tests with extended runtime",
   "integration: requires PostgreSQL (testcontainers)",

] </syntaxhighlight>