Python Tests Development Guide

Overview

The test suite (tests/) provides comprehensive coverage for the NoteFlow Python backend. Tests are organized to mirror the source structure and enforce quality gates through baseline comparison.

Architecture: pytest + pytest-asyncio + testcontainers for integration


Directory Structure

tests/
├── application/          # Service and use case tests
├── benchmarks/          # Performance benchmarking tests
├── config/              # Configuration tests
├── domain/              # Domain entity and value object tests
├── fixtures/            # Test data files (audio samples)
│   └── audio/           # Audio fixture files
├── grpc/                # gRPC servicer and streaming tests
├── infrastructure/      # Infrastructure adapter tests
│   ├── asr/             # Speech-to-text engine tests
│   ├── audio/           # Audio processing tests
│   ├── auth/            # Authentication/OAuth tests
│   ├── calendar/        # Calendar integration tests
│   ├── diarization/     # Speaker diarization tests
│   ├── export/          # Export functionality tests
│   ├── gpu/             # GPU hardware detection tests
│   ├── metrics/         # Metrics and observability tests
│   ├── ner/             # Named Entity Recognition tests
│   ├── observability/   # OpenTelemetry instrumentation tests
│   ├── persistence/     # Database ORM tests
│   ├── security/        # Encryption/crypto tests
│   ├── summarization/   # Summary generation tests
│   ├── triggers/        # Window trigger detection tests
│   └── webhooks/        # Webhook delivery tests
├── integration/         # Full system integration tests (PostgreSQL)
├── quality/             # Code quality and static analysis tests
│   ├── _detectors/      # Quality rule detector implementations
│   └── baselines.json   # Frozen violation baseline
├── stress/              # Stress, concurrency, and fuzz tests
└── conftest.py          # Root-level shared fixtures

Running Tests

# All tests
pytest

# Skip slow tests (model loading)
pytest -m "not slow"

# Integration tests only
pytest -m integration

# Quality gate checks
pytest tests/quality/

# Stress/fuzz tests
pytest tests/stress/

# Specific test file
pytest tests/domain/test_meeting.py

# With coverage
pytest --cov=src/noteflow --cov-report=html

Pytest Markers

| Marker | Purpose |
| --- | --- |
| `@pytest.mark.slow` | Model loading, GPU operations |
| `@pytest.mark.integration` | External services (PostgreSQL) |
| `@pytest.mark.stress` | Stress and concurrency tests |

Usage:

@pytest.mark.slow
@pytest.mark.parametrize("model_size", ["tiny", "base"])
def test_asr_model_loading(model_size: str) -> None:
    ...

Key Fixtures

Root Fixtures (conftest.py)

| Fixture | Scope | Purpose |
| --- | --- | --- |
| `mock_optional_extras` | session | Mocks openai, anthropic, ollama (at collection time) |
| `reset_context_vars` | function | Isolates logging context variables |
| `mock_uow` | function | Full UnitOfWork mock with all repos |
| `meeting_id` | function | MeetingId (UUID-based) |
| `sample_meeting` | function | Meeting in CREATED state |
| `recording_meeting` | function | Meeting in RECORDING state |
| `sample_rate` | function | 16000 (DEFAULT_SAMPLE_RATE) |
| `crypto` | function | AesGcmCryptoBox with InMemoryKeyStore |
| `meetings_dir` | function | Temporary meetings directory |
| `webhook_config` | function | WebhookConfig for MEETING_COMPLETED |
| `mock_grpc_context` | function | Mock gRPC ServicerContext |
| `mock_asr_engine` | function | Mock ASR engine |
| `memory_servicer` | function | In-memory NoteFlowServicer |

Utility Functions (not fixtures):

  • approx_float(expected, rel, abs) — Type-safe pytest.approx wrapper (see the sketch below)
  • approx_sequence(expected, rel, abs) — Float sequence comparison
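A minimal usage sketch for the float helper, assuming a hypothetical normalize_confidence function under test and illustrative tolerances:

def test_confidence_is_normalized() -> None:
    score = normalize_confidence(0.4999999)  # hypothetical helper under test
    # approx_float wraps pytest.approx with explicit relative/absolute tolerances
    assert score == approx_float(0.5, rel=1e-6, abs=1e-9)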

Integration Fixtures (integration/conftest.py)

| Fixture | Purpose |
| --- | --- |
| `session_factory` | PostgreSQL async_sessionmaker (testcontainers) |
| `session` | Individual AsyncSession with rollback |
| `persisted_meeting` | Created and persisted Meeting |
| `stopped_meeting_with_segments` | Meeting in STOPPED state with speaker segments |
| `audio_fixture_path` | Path to sample_discord.wav |
| `audio_samples` | Normalized float32 array |
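A typical integration test combines these fixtures with the integration marker. The sketch below assumes the MeetingModel ORM class and a directly comparable id column, which may differ from the real schema:

from sqlalchemy import select


@pytest.mark.integration
async def test_persisted_meeting_is_queryable(
    session: AsyncSession, persisted_meeting: Meeting
) -> None:
    # Query through the real PostgreSQL session provided by testcontainers
    result = await session.execute(
        select(MeetingModel).where(MeetingModel.id == persisted_meeting.id)
    )
    assert result.scalar_one_or_none() is not None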

Quality Fixtures (quality/)

| Fixture | Purpose |
| --- | --- |
| `baselines.json` | Frozen violation counts by rule |
| `_baseline.py` | Baseline comparison utilities |
| `_detectors/` | AST-based detection implementations |

Testing Patterns

1. Declarative Tests (No Loops)

CRITICAL: Test functions must NOT contain loops. Use parametrize:

# ✅ CORRECT: Parametrized test
@pytest.mark.parametrize(
    ("text", "expected_category"),
    [
        ("John Smith", EntityCategory.PERSON),
        ("Apple Inc.", EntityCategory.COMPANY),
        ("Python", EntityCategory.TECHNICAL),
    ],
)
def test_extract_entity_category(text: str, expected_category: EntityCategory) -> None:
    result = extract_entity(text)
    assert result.category == expected_category


# ❌ WRONG: Loop in test
def test_extract_entity_categories() -> None:
    test_cases = [("John Smith", EntityCategory.PERSON), ...]
    for text, expected in test_cases:  # FORBIDDEN
        result = extract_entity(text)
        assert result.category == expected

2. Async Testing

# Async fixtures
@pytest.fixture
async def persisted_meeting(session: AsyncSession) -> Meeting:
    meeting = Meeting.create(...)
    session.add(MeetingModel.from_entity(meeting))
    await session.commit()
    return meeting


# Async tests (auto-marked by pytest-asyncio)
async def test_fetch_meeting(uow: UnitOfWork, persisted_meeting: Meeting) -> None:
    result = await uow.meetings.get(persisted_meeting.id)
    assert result is not None

3. Mock Strategies

Repository Mocks:

@pytest.fixture
def mock_meetings_repo() -> AsyncMock:
    repo = AsyncMock(spec=MeetingRepository)
    repo.get = AsyncMock(return_value=None)
    repo.create = AsyncMock()
    return repo

Service Mocks with Side Effects:

@pytest.fixture
def mock_executor(captured_payloads: list) -> AsyncMock:
    async def capture_delivery(config, event_type, payload):
        captured_payloads.append(payload)
        return WebhookDelivery(status_code=200, ...)

    executor = AsyncMock(spec=WebhookExecutor)
    executor.deliver = AsyncMock(side_effect=capture_delivery)
    return executor
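A test can then drive the code under test and assert on what the mock captured. A sketch, assuming a hypothetical notify_meeting_completed entry point and a captured_payloads list fixture:

async def test_delivery_is_recorded(
    mock_executor: AsyncMock,
    captured_payloads: list,
    webhook_config: WebhookConfig,
    sample_meeting: Meeting,
) -> None:
    # notify_meeting_completed is a hypothetical service entry point
    await notify_meeting_completed(mock_executor, webhook_config, sample_meeting)
    # The side effect recorded exactly one delivery payload
    assert len(captured_payloads) == 1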

4. Quality Gate Pattern

Tests in tests/quality/ use baseline comparison:

def test_no_high_complexity_functions() -> None:
    violations = collect_high_complexity(parse_errors=[])
    assert_no_new_violations("high_complexity", violations)

Behavior:

  • Fails if NEW violations are introduced
  • Passes if violations match or decrease from baseline
  • Baseline is frozen in baselines.json

5. Integration Test Setup

@pytest.fixture(scope="session")
async def session_factory() -> AsyncGenerator[async_sessionmaker, None]:
    _, database_url = get_or_create_container()  # testcontainers
    engine = create_test_engine(database_url)
    async with engine.begin() as conn:
        await initialize_test_schema(conn)
    yield create_test_session_factory(engine)
    async with engine.begin() as conn:
        await cleanup_test_schema(conn)
    await engine.dispose()
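The per-test session fixture builds on the factory and discards leftover state after each test. This is a sketch of that shape, assuming rollback-based isolation; the real fixture may instead wrap each test in an outer transaction:

@pytest.fixture
async def session(
    session_factory: async_sessionmaker,
) -> AsyncGenerator[AsyncSession, None]:
    async with session_factory() as db_session:
        try:
            yield db_session
        finally:
            # Discard uncommitted changes left by the test
            await db_session.rollback()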

Quality Tests (tests/quality/)

Quality Rules

| Test File | Rule | Description |
| --- | --- | --- |
| `test_code_smells.py` | high_complexity | Cyclomatic complexity > threshold |
| `test_code_smells.py` | god_class | Classes with too many methods |
| `test_code_smells.py` | deep_nesting | Nesting depth > 7 levels |
| `test_code_smells.py` | long_method | Methods > 50 lines |
| `test_test_smells.py` | test_loop | Loops in test assertions |
| `test_test_smells.py` | test_conditional | Conditionals in assertions |
| `test_magic_values.py` | magic_number | Unextracted numeric constants |
| `test_duplicate_code.py` | code_duplication | Repeated code blocks |
| `test_stale_code.py` | dead_code | Unreachable code |
| `test_unnecessary_wrappers.py` | thin_wrapper | Unnecessary wrapper functions |
| `test_decentralized_helpers.py` | helper_sprawl | Unconsolidated helpers |
| `test_baseline_self.py` | baseline_integrity | Baseline file validation |
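Each rule is backed by a small AST-based detector under _detectors/. The following is an illustrative sketch of how a deep-nesting check might work, not the project's actual implementation:

import ast


def find_deeply_nested_functions(source: str, max_depth: int = 7) -> list[str]:
    """Return names of functions whose block nesting exceeds max_depth (sketch)."""
    nested_blocks = (ast.If, ast.For, ast.While, ast.With, ast.Try)

    def depth(node: ast.AST, current: int = 0) -> int:
        # Recurse into children, counting each nested block statement along the path
        child_depths = [
            depth(child, current + isinstance(child, nested_blocks))
            for child in ast.iter_child_nodes(node)
        ]
        return max(child_depths, default=current)

    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and depth(node) > max_depth
    ]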

Baseline System

# _baseline.py
@dataclass
class Violation:
    rule: str
    relative_path: str
    identifier: str
    detail: str | None = None

    @property
    def stable_id(self) -> str:
        """Unique identifier: rule|relative_path|identifier[|detail]"""
        ...

def assert_no_new_violations(rule: str, violations: list[Violation]) -> None:
    """Compares current violations against frozen baseline."""
    baseline = load_baseline()
    result = compare_violations(baseline[rule], violations)
    assert result.passed, f"New violations: {result.new_violations}"
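The comparison itself reduces to set arithmetic over stable IDs. A condensed sketch of the idea, with a hypothetical ComparisonResult shape (the real types and signatures may differ):

from dataclasses import dataclass, field


@dataclass
class ComparisonResult:
    passed: bool
    new_violations: set[str] = field(default_factory=set)


def compare_violations(
    baseline_ids: list[str], current: list[Violation]
) -> ComparisonResult:
    current_ids = {violation.stable_id for violation in current}
    new_ids = current_ids - set(baseline_ids)  # anything not in the frozen baseline
    # Fixed violations simply disappear; only newly introduced ones fail the gate
    return ComparisonResult(passed=not new_ids, new_violations=new_ids)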

Running Quality Checks

# All quality tests
pytest tests/quality/

# Specific rule
pytest tests/quality/test_code_smells.py

# Update baseline (REQUIRES APPROVAL)
# Do NOT do this without explicit permission

Fixture Scope Guidelines

| Scope | Use When |
| --- | --- |
| session | Expensive setup (DB containers, ML models) |
| module | Shared across a test module (e.g., NER engine) |
| function | Per-test isolation (default) |
| autouse=True | Auto-injected into every test |
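For example, an expensive engine can be shared at module scope while an autouse fixture performs per-test cleanup. A sketch with hypothetical names (NerEngine, clear_entity_cache, and the model name are illustrative):

from collections.abc import Iterator

import pytest


@pytest.fixture(scope="module")
def ner_engine() -> NerEngine:
    """Load the NER model once per test module."""
    return NerEngine.load("en_core_web_sm")  # hypothetical engine; model name is illustrative


@pytest.fixture(autouse=True)
def _clear_entity_cache() -> Iterator[None]:
    """Runs around every test without being requested explicitly."""
    yield
    clear_entity_cache()  # hypothetical cleanup helper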

Adding New Tests

1. Choose Location

Mirror the source structure:

  • src/noteflow/domain/entities/meeting.py → tests/domain/test_meeting.py
  • src/noteflow/infrastructure/ner/engine.py → tests/infrastructure/ner/test_engine.py

2. Use Existing Fixtures

Check conftest.py files for reusable fixtures before creating new ones.

3. Follow Patterns

# tests/<module>/test_my_feature.py
import pytest
from noteflow.<module> import MyFeature


class TestMyFeature:
    """Tests for MyFeature class."""

    def test_basic_operation(self, sample_meeting: Meeting) -> None:
        """Test basic operation with sample meeting."""
        feature = MyFeature()
        result = feature.process(sample_meeting)
        assert result.success is True

    @pytest.mark.parametrize(
        ("input_value", "expected"),
        [
            ("valid", True),
            ("invalid", False),
        ],
    )
    def test_validation(self, input_value: str, expected: bool) -> None:
        """Test validation with various inputs."""
        result = MyFeature.validate(input_value)
        assert result == expected

    @pytest.mark.slow
    def test_with_model_loading(self) -> None:
        """Test requiring ML model loading."""
        ...

4. Add Fixtures to conftest.py

# tests/<module>/conftest.py
import pytest
from noteflow.<module> import MyFeature


@pytest.fixture
def my_feature() -> MyFeature:
    """Configured MyFeature instance."""
    return MyFeature(config=test_config)  # test_config: shared config defined elsewhere in this conftest

Forbidden Patterns

Loops in Test Assertions

# WRONG
def test_items() -> None:
    for item in items:
        assert item.valid

# RIGHT: Use parametrize
@pytest.mark.parametrize("item", items)
def test_item_valid(item: Item) -> None:
    assert item.valid

Conditionals in Assertions

# WRONG
def test_result() -> None:
    if condition:
        assert result == expected_a
    else:
        assert result == expected_b

# RIGHT: Separate tests or parametrize
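# For example (compute_result, expected_a, and expected_b are illustrative):
@pytest.mark.parametrize(
    ("condition", "expected"),
    [
        (True, expected_a),
        (False, expected_b),
    ],
)
def test_result(condition: bool, expected: object) -> None:
    result = compute_result(condition)  # hypothetical function under test
    assert result == expected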

Modifying Quality Baselines

NEVER add entries to baselines.json without explicit approval. Fix the actual code violation instead.

Using print() for Debugging

# WRONG
def test_something() -> None:
    print(f"Debug: {result}")  # Will be captured, not shown

# RIGHT: Use pytest's capsys or assert messages
def test_something(capsys: pytest.CaptureFixture[str]) -> None:
    ...
    captured = capsys.readouterr()
    assert "expected output" in captured.out

Key Files Reference

| File | Purpose |
| --- | --- |
| `conftest.py` | Root fixtures (524 lines) |
| `quality/baselines.json` | Frozen violation baseline |
| `quality/_baseline.py` | Baseline comparison utilities |
| `quality/_detectors/` | AST-based detectors |
| `integration/conftest.py` | PostgreSQL testcontainers setup |
| `stress/conftest.py` | Stress test fixtures |
| `fixtures/audio/` | Audio sample files |

See Also

  • /src/noteflow/CLAUDE.md — Python backend standards
  • /pyproject.toml — pytest configuration
  • /support/ — Test support utilities (non-test code)