noteflow/docs/triage.md
Travis Vasceannie 0c1dbb362f feat: add structured logging for persistence and export operations
- Introduced logging for CRUD operations in repositories to enhance visibility into database interactions.
- Implemented timing logs for BaseRepository and UnitOfWork lifecycle events to track performance.
- Added export logging for size and duration without exposing sensitive content.
- Promoted logging levels for specific operations to improve clarity and reduce noise in logs.
- Established a framework for consistent logging practices across persistence and export functionalities.
2026-01-14 01:18:44 +00:00

Technical Debt Triage

This document tracks known issues, technical debt, and areas needing improvement.


Insufficient Logging - Comprehensive Audit

Discovered: 2025-12-31
Last Verified: 2026-01-03
Sprint Docs:

  • docs/sprints/sprint_logging_gap_remediation_p1.md
  • docs/sprints/sprint_logging_gap_remediation_p2.md

Impact: Cost about 1 hour of debugging when a silent 120-second Ollama timeout looked like a migration hang.
Total Issues Found: 100+


1. Network/External Service Connections

1.1 CRITICAL: Ollama Availability Check - Silent 120s Timeout

File: src/noteflow/infrastructure/summarization/ollama_provider.py:101-115

@property
def is_available(self) -> bool:
    try:
        client = self._get_client()
        client.list()  # Silent 120-second timeout!
        return True
    except (ConnectionError, TimeoutError, ...):
        return False

Status (2026-01-03): Resolved — log_timing added around availability check.

Fix:

@property
def is_available(self) -> bool:
    try:
        logger.info("Checking Ollama availability at %s (timeout: %.0fs)...", self._host, self._timeout)
        client = self._get_client()
        client.list()
        logger.info("Ollama server is available")
        return True
    except TimeoutError:
        logger.warning("Ollama server timeout at %s after %.0fs", self._host, self._timeout)
        return False
    except (ConnectionError, RuntimeError, OSError) as e:
        logger.debug("Ollama server unreachable at %s: %s", self._host, e)
        return False

1.2 Cloud Summarization API Calls - No Request Logging

File: src/noteflow/infrastructure/summarization/cloud_provider.py:238-282

def _call_openai(self, user_prompt: str, system_prompt: str) -> tuple[str, int | None]:
    try:
        response = client.chat.completions.create(...)  # No timing logged
    except TimeoutError as e:
        raise SummarizationTimeoutError(...)  # No duration logged

Status (2026-01-03): Resolved — log_timing wraps OpenAI/Anthropic calls and response logging added.

Fix: Add logger.info("Initiating OpenAI API call: model=%s", self._model) before the call, and log the duration after it.
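
The status notes in this document refer to a log_timing helper, but its implementation is not shown here. A minimal sketch of what such a helper could look like follows; the signature, the log format, and the summarize wrapper are assumptions for illustration, not NoteFlow's actual code.

```python
import logging
import time
from collections.abc import Callable, Iterator
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def log_timing(operation: str, **context: object) -> Iterator[None]:
    """Log start, completion, and failure of a block together with its duration.

    Hypothetical stand-in for the project's log_timing helper.
    """
    ctx = " ".join(f"{key}={value}" for key, value in context.items())
    logger.info("Starting %s: %s", operation, ctx)
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Log the exception type only; the message may contain request content.
        logger.error("%s failed after %.2fms: %s", operation, elapsed_ms, type(exc).__name__)
        raise
    else:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s completed: duration_ms=%.2f", operation, elapsed_ms)


def summarize(call_model: Callable[[], str], model: str) -> str:
    # call_model is a placeholder for the real provider call.
    with log_timing("cloud summarization call", model=model):
        return call_model()
```

Only the model name and the duration are logged here, so prompt and response text stay out of the logs.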


1.3 Google Calendar API - No Request Logging

File: src/noteflow/infrastructure/calendar/google_adapter.py:76-91

async with httpx.AsyncClient() as client:
    response = await client.get(url, params=params, headers=headers)  # No logging

Status (2026-01-03): Resolved — request timing logged via log_timing.

Fix: Log request start, duration, and response status.


1.4 OAuth Token Refresh - Missing Timing

File: src/noteflow/infrastructure/calendar/oauth_manager.py:211-222

async def refresh_tokens(...) -> OAuthTokens:
    response = await client.post(token_url, data=data)  # No timing

Status (2026-01-03): Resolved — refresh timing logged via log_timing.


1.5 Webhook Delivery - Missing Initial Request Log

File: src/noteflow/infrastructure/webhooks/executor.py:107-237

async def deliver(...) -> WebhookDelivery:
    for attempt in range(1, max_retries + 1):
        _logger.debug("Webhook delivery attempt %d/%d", attempt, max_retries)  # DEBUG only

Status (2026-01-03): Resolved — info log at delivery start + completion.


1.6 Database Connection Creation - No Logging

File: src/noteflow/infrastructure/persistence/database.py:85-116

def create_engine_and_session_factory(...):
    engine = sa_create_async_engine(database_url, pool_size=pool_size, ...)
    # No logging of connection parameters

Status (2026-01-03): Resolved — engine creation logged with masked URL.
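
How the URL is masked is not shown in this document. One possible approach, sketched with the standard library only, replaces the password part of the URL before it reaches a log line. The function name mask_database_url and the example URL are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_database_url(database_url: str) -> str:
    """Return the URL with any password replaced by '***'."""
    parts = urlsplit(database_url)
    userinfo, at, hostinfo = parts.netloc.rpartition("@")
    if not at or ":" not in userinfo:
        # No credentials, or a username without a password: nothing to hide.
        return database_url
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}:***@{hostinfo}"))


# Hypothetical connection string, for illustration only.
masked = mask_database_url("postgresql+asyncpg://noteflow:s3cret@localhost:5432/noteflow")
```

If the URL is already available as a SQLAlchemy URL object, its render_as_string(hide_password=True) method serves the same purpose.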


1.7 Rust gRPC Client Connection - No Tracing

File: client/src-tauri/src/grpc/client/core.rs:174-197

async fn perform_connect(&self) -> Result<ServerInfo> {
    let channel = endpoint.connect().await  // No tracing before/after
        .map_err(|e| Error::Connection(...))?;

Status (2026-01-03): Not implemented — see P1 sprint.


2. Blocking/Long-Running Operations

2.1 NER Service - Silent Model Warmup

File: src/noteflow/application/services/ner_service.py:185-197

await loop.run_in_executor(
    None,
    lambda: self._ner_engine.extract("warm up"),  # No logging
)

Status (2026-01-03): Not implemented — see P1 sprint.


2.2 ASR Transcription - No Duration Logging

File: src/noteflow/infrastructure/asr/engine.py:156-177

async def transcribe_async(...) -> list[AsrResult]:
    return await loop.run_in_executor(None, ...)  # No timing

Status (2026-01-03): Not implemented — see P1 sprint.
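
For the executor-based calls in 2.1 and 2.2, timing could be added with a small wrapper around run_in_executor. The following is a sketch under assumptions: the helper name run_blocking_with_timing and the log format are illustrative and do not come from the codebase.

```python
import asyncio
import logging
import time
from collections.abc import Callable
from typing import TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


async def run_blocking_with_timing(operation: str, func: Callable[[], T]) -> T:
    """Run a blocking callable in the default executor and log its duration."""
    loop = asyncio.get_running_loop()
    logger.info("Starting %s in executor", operation)
    start = time.perf_counter()
    try:
        result = await loop.run_in_executor(None, func)
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.exception("%s failed after %.2fms", operation, elapsed_ms)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("%s completed: duration_ms=%.2f", operation, elapsed_ms)
    return result
```

A warmup call would then read `await run_blocking_with_timing("NER warmup", lambda: engine.extract("warm up"))`, where engine stands for the NER engine instance.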


2.3 Diarization - Missing Blocking Operation Logging

File: src/noteflow/infrastructure/diarization/engine.py:299-347

def diarize_full(...) -> Sequence[SpeakerTurn]:
    logger.debug("Running offline diarization on %.2fs audio", ...)  # DEBUG only
    annotation = self._offline_pipeline(waveform, ...)  # No end logging

Status (2026-01-03): Resolved — log_timing wraps diarization.


2.4 Diarization Job Timeout - No Pre-Timeout Context

File: src/noteflow/grpc/_mixins/diarization/_jobs.py:173-186

async with asyncio.timeout(DIARIZATION_TIMEOUT_SECONDS):
    updated_count = await self.refine_speaker_diarization(...)
# No logging of timeout value before entering block

Status (2026-01-03): Resolved — timeout value logged in job handler.


3. Error Handling - Silent Failures

3.1 Silent ValueError Returns

Files:

  • src/noteflow/grpc/_mixins/meeting.py:64-67 - workspace UUID parse
  • src/noteflow/grpc/_mixins/converters.py:76-79 - meeting ID parse
  • src/noteflow/grpc/_mixins/diarization/_jobs.py:84-87 - meeting ID validation
  • src/noteflow/infrastructure/triggers/calendar.py:141-144 - datetime parse

try:
    UUID(workspace_id)
except ValueError:
    return None  # Silent failure, no logging

Status (2026-01-03): Not implemented — add WARN + redaction (P1 sprint).


3.2 Silent Settings Fallbacks

Files:

  • src/noteflow/infrastructure/webhooks/executor.py:56-65
  • src/noteflow/infrastructure/summarization/ollama_provider.py:44-48
  • src/noteflow/infrastructure/summarization/cloud_provider.py:48-52
  • src/noteflow/grpc/_mixins/diarization_job.py:63-66

except Exception:
    return DEFAULT_VALUES  # No logging that fallback occurred

Status (2026-01-03): Not implemented — add warning logs (P1 sprint).
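
A fallback that announces itself might be sketched as follows. DEFAULT_TIMEOUT_SECONDS, load_timeout_seconds, and the read_setting callable are hypothetical names; the real settings access in these files is not shown in this document.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)

DEFAULT_TIMEOUT_SECONDS = 30.0  # hypothetical default, for illustration only


def load_timeout_seconds(read_setting: Callable[[], float]) -> float:
    """Read a setting, warning when the default is used instead."""
    try:
        return read_setting()
    except Exception as exc:
        logger.warning(
            "Settings unavailable (%s); falling back to timeout=%.1fs",
            type(exc).__name__,
            DEFAULT_TIMEOUT_SECONDS,
        )
        return DEFAULT_TIMEOUT_SECONDS
```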


3.3 gRPC Client Stub Unavailable - Silent Returns

Files: src/noteflow/grpc/_client_mixins/*.py (multiple locations)

if not self._stub:
    return None  # No logging of connection issue

Status (2026-01-03): Not implemented — add rate-limited warn log (P1 sprint).
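
Because these stub checks can run on every client call, an unthrottled warning would flood the log. One way to rate-limit it is sketched below; the class name, the 30-second default interval, and the injectable clock are assumptions.

```python
import logging
import time
from collections.abc import Callable

logger = logging.getLogger(__name__)


class RateLimitedWarning:
    """Emit a warning at most once per interval and count what was suppressed."""

    def __init__(self, min_interval_s: float = 30.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_emit = None  # monotonic timestamp of the last emitted warning
        self._suppressed = 0

    def warn(self, message: str, *args: object) -> bool:
        """Return True if the warning was emitted, False if it was suppressed."""
        now = self._clock()
        if self._last_emit is not None and now - self._last_emit < self._min_interval_s:
            self._suppressed += 1
            return False
        logger.warning(message + " (suppressed %d similar)", *args, self._suppressed)
        self._last_emit = now
        self._suppressed = 0
        return True
```

A client mixin could hold one instance per warning site and call `limiter.warn("gRPC stub unavailable")` before returning None.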


4. State Transitions and Lifecycle

4.1 Meeting State Changes Not Logged

Status (2026-01-03): Resolved — meeting service logs transitions.


4.2 Diarization Job State - Missing Previous State

File: src/noteflow/grpc/_mixins/diarization/_jobs.py:147-171

await repo.diarization_jobs.update_status(job_id, JOB_STATUS_RUNNING, ...)

Status (2026-01-03): Resolved - state transitions logged.

4.3 Segmenter State Machine - No Transition Logging

File: src/noteflow/infrastructure/asr/segmenter.py:121-127

if is_speech:
    self._state = SegmenterState.SPEECH  # No logging of IDLE -> SPEECH

Status (2026-01-03): Not implemented — see P1 sprint.
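
Since the segmenter processes audio frame by frame, transition logging is only useful if it fires on an actual state change. A sketch of that idea follows; the enum values, the _set_state helper, and the DEBUG level are assumptions, and the real Segmenter class has more state than is modelled here.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class SegmenterState(Enum):
    IDLE = "idle"
    SPEECH = "speech"


class Segmenter:
    """Reduced stand-in for the real segmenter, showing only state handling."""

    def __init__(self) -> None:
        self._state = SegmenterState.IDLE

    def _set_state(self, new_state: SegmenterState) -> bool:
        """Switch state and log the transition; return False if nothing changed."""
        if new_state is self._state:
            return False
        logger.debug("Segmenter state: %s -> %s", self._state.name, new_state.name)
        self._state = new_state
        return True
```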


4.4 Stream Cleanup - No Logging

File: src/noteflow/grpc/_mixins/streaming/_cleanup.py:14-34

def cleanup_stream_resources(host, meeting_id):
    # Multiple cleanup operations, no completion log
    host._active_streams.discard(meeting_id)

4.5 Diarization Session Close - DEBUG Only

File: src/noteflow/infrastructure/diarization/session.py:145-159

Status (2026-01-03): Not implemented — see P2 sprint.

def close(self) -> None:
    logger.debug("Session %s closed", self.meeting_id)  # Should be INFO

4.6 Background Task Spawning - No Task ID

File: src/noteflow/grpc/_mixins/diarization/_jobs.py:130-132

Status (2026-01-03): Not implemented — see P2 sprint.

task = asyncio.create_task(self._run_diarization_job(job_id, num_speakers))
self._diarization_tasks[job_id] = task  # No logging of task creation
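
Task creation and completion could be logged with a named task and a done callback, as sketched below. The helper name spawn_logged_task and the task naming scheme are assumptions.

```python
import asyncio
import logging
from collections.abc import Coroutine
from typing import Any

logger = logging.getLogger(__name__)


def spawn_logged_task(coro: Coroutine[Any, Any, Any], name: str) -> asyncio.Task:
    """Create a named task and log its creation and final outcome."""
    task = asyncio.create_task(coro, name=name)
    logger.info("Spawned background task: name=%s", task.get_name())

    def _on_done(done: asyncio.Task) -> None:
        if done.cancelled():
            logger.warning("Background task cancelled: name=%s", done.get_name())
        elif done.exception() is not None:
            logger.error("Background task failed: name=%s error=%r", done.get_name(), done.exception())
        else:
            logger.info("Background task finished: name=%s", done.get_name())

    task.add_done_callback(_on_done)
    return task
```

Using a name such as f"diarization-job-{job_id}" ties the task's log lines back to the job record.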

4.7 Audio Flush Thread - No Start/End Logging

File: src/noteflow/infrastructure/audio/writer.py:135-157

Status (2026-01-03): Not implemented — see P2 sprint.

self._flush_thread.start()  # No logging
# ...
def _periodic_flush_loop(self):
    while not self._stop_flush.wait(...):
        # No entry/exit logging for loop

5. Database Operations

5.1 BaseRepository - No Query Timing

File: src/noteflow/infrastructure/persistence/repositories/_base.py

Status (2026-01-03): Not implemented — see P2 sprint.

All methods (_execute_scalar, _execute_scalars, _add_and_flush, _delete_and_flush, _add_all_and_flush, _execute_update, _execute_delete) have no timing or logging.
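
One low-touch way to add timing to these helpers is a decorator applied to each async method. The sketch below uses a hypothetical log_query_timing decorator and a DemoRepository stand-in, because the real BaseRepository is not shown in this document.

```python
import asyncio
import functools
import logging
import time

logger = logging.getLogger(__name__)


def log_query_timing(func):
    """Decorator for async repository helpers: log the duration at DEBUG."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Method name and duration only; no SQL text or row data.
            logger.debug("%s duration_ms=%.2f", func.__qualname__, elapsed_ms)

    return wrapper


class DemoRepository:
    """Stand-in for a repository class, used only to show the decorator."""

    @log_query_timing
    async def _execute_scalar(self, value: int) -> int:
        await asyncio.sleep(0)  # placeholder for the awaited database call
        return value
```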


5.2 Unit of Work - No Transaction Logging

File: src/noteflow/infrastructure/persistence/unit_of_work.py:220-296

Status (2026-01-03): Not implemented — see P2 sprint.


5.3 Repository CRUD Operations - No Logging

Files:

  • meeting_repo.py - create, update, delete, list_all
  • segment_repo.py - add_batch, update_embedding, update_speaker
  • summary_repo.py - save (upsert with cascades)
  • diarization_job_repo.py - create, mark_running_as_failed, prune_completed
  • entity_repo.py - save_batch, delete_by_meeting
  • webhook_repo.py - create, add_delivery
  • integration_repo.py - set_secrets
  • usage_event_repo.py - add_batch, delete_before
  • preferences_repo.py - set_bulk

Status (2026-01-03): Not implemented — see P2 sprint.


6. File System Operations

6.1 Meeting Directory Creation - Not Logged

File: src/noteflow/infrastructure/audio/writer.py:109-111

Status (2026-01-03): Resolved — audio writer open logs meeting and dir.

self._meeting_dir.mkdir(parents=True, exist_ok=True)  # No logging

6.2 Manifest Read/Write - Not Logged

File: src/noteflow/infrastructure/audio/writer.py:122-123

Status (2026-01-03): Partially implemented — open logged, manifest write still unlogged (P2 sprint).

manifest_path.write_text(json.dumps(manifest, indent=2))  # No logging

6.3 Asset Deletion - Silent No-Op

File: src/noteflow/infrastructure/persistence/repositories/asset_repo.py:49-51

Status (2026-01-03): Not implemented — see P2 sprint.

if meeting_dir.exists():
    shutil.rmtree(meeting_dir)
    logger.info("Deleted meeting assets at %s", meeting_dir)
# No log when directory doesn't exist
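
A version that also reports the no-op case might look like this. The function name delete_meeting_assets, the boolean return value, and the DEBUG level for the missing-directory branch are assumptions.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def delete_meeting_assets(meeting_dir: Path) -> bool:
    """Delete a meeting's asset directory; return True if anything was removed."""
    if not meeting_dir.exists():
        logger.debug("No meeting assets to delete at %s", meeting_dir)
        return False
    shutil.rmtree(meeting_dir)
    logger.info("Deleted meeting assets at %s", meeting_dir)
    return True
```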

7. Export Operations

7.1 PDF Export - No Timing

File: src/noteflow/infrastructure/export/pdf.py:161-186

def export(self, meeting, segments) -> bytes:
    pdf_bytes = weasy_html(string=html_content).write_pdf()  # No timing
    return pdf_bytes

Status (2026-01-03): Not implemented — see P2 sprint.
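
Export logging can record size and duration without touching the exported content. A minimal sketch, assuming a hypothetical export_with_logging wrapper around whatever callable produces the bytes:

```python
import logging
import time
from collections.abc import Callable

logger = logging.getLogger(__name__)


def export_with_logging(fmt: str, render: Callable[[], bytes]) -> bytes:
    """Run an export and log its format, output size, and duration."""
    start = time.perf_counter()
    payload = render()
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Only size and duration are logged, never the exported content itself.
    logger.info(
        "Export completed: format=%s size_bytes=%d duration_ms=%.2f",
        fmt,
        len(payload),
        elapsed_ms,
    )
    return payload
```

In the PDF exporter, render would wrap the write_pdf() call shown above.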


7.2 Markdown/HTML Export - No Logging

Files: markdown.py:37-89, html.py:158-187

Status (2026-01-03): Not implemented — see P2 sprint.

No logging of export operations.


8. Initialization Sequences

8.1 Lazy Model Loading - Not Logged at Load Time

Files:

  • NerEngine._ensure_loaded() - spaCy model load
  • DiarizationEngine - pyannote model load
  • OllamaSummarizer._get_client() - client creation

Status (2026-01-03): Partially implemented — some model loads logged, NER warmup not logged (P1 sprint).


8.2 Singleton Creation - Silent

File: src/noteflow/infrastructure/metrics/collector.py:168-178

Status (2026-01-03): Not implemented — out of P1/P2 scope unless needed.

def get_metrics_collector() -> MetricsCollector:
    global _metrics_collector
    if _metrics_collector is None:
        _metrics_collector = MetricsCollector()  # No logging
    return _metrics_collector

8.3 Provider Registration - DEBUG Level

File: src/noteflow/application/services/summarization_service.py:119-127

Status (2026-01-03): Partially implemented — still debug in factory registration; consider if INFO needed.

def register_provider(self, mode, provider):
    logger.debug("Registered %s provider", mode.value)  # Should be INFO at startup

Summary Statistics

| Category | Issue Count | Severity |
|---|---|---|
| Network/External Services | 7 | CRITICAL (mostly resolved) |
| Blocking/Long-Running | 4 | HIGH (partially unresolved) |
| Error Handling | 10+ | HIGH (partially unresolved) |
| State Transitions | 7 | MEDIUM (partially unresolved) |
| Database Operations | 30+ | MEDIUM (unresolved) |
| File System | 3 | LOW (partially unresolved) |
| Export | 3 | LOW (unresolved) |
| Initialization | 5 | MEDIUM (partially unresolved) |
| Total | 100+ | - |

For all async/blocking operations:

logger.info("Starting <operation>: context=%s", context)
start = time.perf_counter()
try:
    result = await some_operation()
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("<Operation> completed: result_count=%d, duration_ms=%.2f", len(result), elapsed_ms)
except TimeoutError:
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.error("<Operation> timeout after %.2fms", elapsed_ms)
    raise
except Exception as e:
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.error("<Operation> failed after %.2fms: %s", elapsed_ms, e)
    raise

Priority Fixes

P0 - Fix Immediately

  1. (Resolved) Ollama is_available timeout logging
  2. (Resolved) Summarization factory timing
  3. (Resolved) Database migration progress logging

P1 - Fix This Sprint

  1. (Resolved) All external HTTP calls (calendar, OAuth, webhooks)
  2. All run_in_executor calls (ASR, NER, diarization)
  3. Silent ValueError returns

P2 - Fix Next Sprint

  1. Repository CRUD logging
  2. State transition logging (segmenter + diarization session)
  3. Background task lifecycle logging

Resolved Issues

  • Server-side state volatility → Diarization jobs persisted to DB
  • Hardcoded directory paths → asset_path column added to meetings
  • Synchronous blocking in async gRPC → run_in_executor for diarization
  • Summarization consent not persisted → Stored in user_preferences table
  • VU meter update throttling → 20fps throttle implemented
  • Webhook infrastructure missing → Full webhook subsystem implemented
  • Integration/OAuth token storage → IntegrationSecretModel for secure storage