chore: refactor docker-compose configuration and enhance logging

- Renamed the PostgreSQL service to 'db' for clarity in the docker-compose file.
- Updated the database connection URL to reflect the new service name.
- Added 'extra_hosts' configuration to facilitate communication with the host from within the containers.
- Introduced logging enhancements in the Ollama provider to improve visibility during configuration loading and client creation.

All quality checks pass.
2025-12-31 06:41:33 -05:00
parent e23c0555e2
commit bbc88ed10b
3 changed files with 550 additions and 5 deletions


@@ -2,7 +2,7 @@ services:
   # =============================================================================
   # Infrastructure Services
   # =============================================================================
-  postgres:
+  db:
     container_name: noteflow-postgres
     image: pgvector/pgvector:pg15
     restart: unless-stopped
@@ -88,16 +88,18 @@ services:
     restart: unless-stopped
     ports:
       - "50051:50051"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
     env_file:
       - .env
     environment:
-      NOTEFLOW_DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-noteflow}:${POSTGRES_PASSWORD:-noteflow}@postgres:5432/${POSTGRES_DB:-noteflow}
+      NOTEFLOW_DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-noteflow}:${POSTGRES_PASSWORD:-noteflow}@db:5432/${POSTGRES_DB:-noteflow}
       NOTEFLOW_REDIS_URL: redis://redis:6379/0
       NOTEFLOW_QDRANT_URL: http://qdrant:6333
     volumes:
       - .:/workspace
     depends_on:
-      postgres:
+      db:
         condition: service_healthy
       redis:
         condition: service_healthy
@@ -118,16 +120,18 @@ services:
     restart: unless-stopped
     ports:
       - "50051:50051"
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
     env_file:
       - .env
     environment:
-      NOTEFLOW_DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-noteflow}:${POSTGRES_PASSWORD:-noteflow}@postgres:5432/${POSTGRES_DB:-noteflow}
+      NOTEFLOW_DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-noteflow}:${POSTGRES_PASSWORD:-noteflow}@db:5432/${POSTGRES_DB:-noteflow}
       NOTEFLOW_REDIS_URL: redis://redis:6379/0
       NOTEFLOW_QDRANT_URL: http://qdrant:6333
     volumes:
       - .:/workspace
     depends_on:
-      postgres:
+      db:
         condition: service_healthy
       redis:
         condition: service_healthy


@@ -0,0 +1,532 @@
# Technical Debt Triage
This document tracks known issues, technical debt, and areas needing improvement.
---
## Insufficient Logging - Comprehensive Audit
**Discovered:** 2025-12-31
**Impact:** Caused ~1 hour of debugging when an Ollama 120s timeout appeared to be a migration hang
**Total Issues Found:** 100+
---
## 1. Network/External Service Connections
### 1.1 CRITICAL: Ollama Availability Check - Silent 120s Timeout
**File:** `src/noteflow/infrastructure/summarization/ollama_provider.py:101-115`
```python
@property
def is_available(self) -> bool:
    try:
        client = self._get_client()
        client.list()  # Silent 120-second timeout!
        return True
    except (ConnectionError, TimeoutError, ...):
        return False
```
**Problem:** No logging before/during availability check. Default timeout 120s causes invisible hangs.
**Fix:**
```python
@property
def is_available(self) -> bool:
    try:
        logger.info("Checking Ollama availability at %s (timeout: %.0fs)...", self._host, self._timeout)
        client = self._get_client()
        client.list()
        logger.info("Ollama server is available")
        return True
    except TimeoutError:
        logger.warning("Ollama server timeout at %s after %.0fs", self._host, self._timeout)
        return False
    except (ConnectionError, RuntimeError, OSError) as e:
        logger.debug("Ollama server unreachable at %s: %s", self._host, e)
        return False
```
---
### 1.2 Cloud Summarization API Calls - No Request Logging
**File:** `src/noteflow/infrastructure/summarization/cloud_provider.py:238-282`
```python
def _call_openai(self, user_prompt: str, system_prompt: str) -> tuple[str, int | None]:
    try:
        response = client.chat.completions.create(...)  # No timing logged
    except TimeoutError as e:
        raise SummarizationTimeoutError(...)  # No duration logged
```
**Problem:** No logging before API call, no timing, no model/URL context.
**Fix:** Add `logger.info("Initiating OpenAI API call: model=%s", self._model)` before call, log duration after.
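The fix can be sketched as a small timing wrapper (a sketch, not existing project code; `client` and `self._model` in the commented call site come from the provider):

```python
import logging
import time

logger = logging.getLogger(__name__)


def call_with_timing(label: str, func, *args, **kwargs):
    """Run func, logging start, duration, and duration on failure."""
    logger.info("Initiating %s", label)
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    except Exception:
        logger.error("%s failed after %.0f ms", label, (time.perf_counter() - start) * 1000)
        raise
    logger.info("%s completed in %.0f ms", label, (time.perf_counter() - start) * 1000)
    return result


# Hypothetical call site inside _call_openai:
# response = call_with_timing(
#     f"OpenAI chat.completions (model={self._model})",
#     client.chat.completions.create,
#     model=self._model, messages=messages,
# )
```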
---
### 1.3 Google Calendar API - No Request Logging
**File:** `src/noteflow/infrastructure/calendar/google_adapter.py:76-91`
```python
async with httpx.AsyncClient() as client:
    response = await client.get(url, params=params, headers=headers)  # No logging
```
**Problem:** HTTP request made without logging URL, timing, or response status.
**Fix:** Log request start, duration, and response status.
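One low-touch option is to log centrally via httpx's documented `event_hooks` parameter (a sketch; the hook signatures and `Request.extensions` dict are httpx APIs, the wiring shown in comments is assumed):

```python
import logging
import time

logger = logging.getLogger(__name__)


async def log_request(request) -> None:
    # Stash the start time on the request; httpx carries `extensions` through.
    request.extensions["start_time"] = time.perf_counter()
    logger.info("HTTP %s %s", request.method, request.url)


async def log_response(response) -> None:
    start = response.request.extensions.get("start_time")
    elapsed_ms = (time.perf_counter() - start) * 1000 if start else -1.0
    logger.info(
        "HTTP %s %s -> %s (%.0f ms)",
        response.request.method, response.request.url,
        response.status_code, elapsed_ms,
    )

# Assumed wiring at the call site:
# async with httpx.AsyncClient(
#     event_hooks={"request": [log_request], "response": [log_response]}
# ) as client:
#     response = await client.get(url, params=params, headers=headers)
```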
---
### 1.4 OAuth Token Refresh - Missing Timing
**File:** `src/noteflow/infrastructure/calendar/oauth_manager.py:211-222`
```python
async def refresh_tokens(...) -> OAuthTokens:
    response = await client.post(token_url, data=data)  # No timing
```
**Problem:** Token refresh (network call) has no timing info.
---
### 1.5 Webhook Delivery - Missing Initial Request Log
**File:** `src/noteflow/infrastructure/webhooks/executor.py:107-237`
```python
async def deliver(...) -> WebhookDelivery:
    for attempt in range(1, max_retries + 1):
        _logger.debug("Webhook delivery attempt %d/%d", attempt, max_retries)  # DEBUG only
```
**Problem:** Only debug-level for attempts. No INFO log at delivery start.
---
### 1.6 Database Connection Creation - No Logging
**File:** `src/noteflow/infrastructure/persistence/database.py:85-116`
```python
def create_engine_and_session_factory(...):
    engine = sa_create_async_engine(database_url, pool_size=pool_size, ...)
    # No logging of connection parameters
```
**Problem:** Database engine creation not logged with connection details.
---
### 1.7 Rust gRPC Client Connection - No Tracing
**File:** `client/src-tauri/src/grpc/client/core.rs:174-197`
```rust
async fn perform_connect(&self) -> Result<ServerInfo> {
    let channel = endpoint.connect().await  // No tracing before/after
        .map_err(|e| Error::Connection(...))?;
```
**Problem:** No tracing before connection attempt or on success with duration.
---
## 2. Blocking/Long-Running Operations
### 2.1 NER Service - Silent Model Warmup
**File:** `src/noteflow/application/services/ner_service.py:185-197`
```python
await loop.run_in_executor(
    None,
    lambda: self._ner_engine.extract("warm up"),  # No logging
)
```
**Problem:** Blocking model warmup in executor without timing.
---
### 2.2 ASR Transcription - No Duration Logging
**File:** `src/noteflow/infrastructure/asr/engine.py:156-177`
```python
async def transcribe_async(...) -> list[AsrResult]:
    return await loop.run_in_executor(None, ...)  # No timing
```
**Problem:** Transcription duration and real-time factor not logged.
---
### 2.3 Diarization - Missing Blocking Operation Logging
**File:** `src/noteflow/infrastructure/diarization/engine.py:299-347`
```python
def diarize_full(...) -> Sequence[SpeakerTurn]:
    logger.debug("Running offline diarization on %.2fs audio", ...)  # DEBUG only
    annotation = self._offline_pipeline(waveform, ...)  # No end logging
```
**Problem:** Multi-minute operation only has DEBUG log, no completion timing.
---
### 2.4 Diarization Job Timeout - No Pre-Timeout Context
**File:** `src/noteflow/grpc/_mixins/diarization/_jobs.py:173-186`
```python
async with asyncio.timeout(DIARIZATION_TIMEOUT_SECONDS):
    updated_count = await self.refine_speaker_diarization(...)
    # No logging of timeout value before entering block
```
**Problem:** When timeout occurs, no record of configured timeout value.
---
## 3. Error Handling - Silent Failures
### 3.1 Silent ValueError Returns
**Files:**
- `src/noteflow/grpc/_mixins/meeting.py:64-67` - workspace UUID parse
- `src/noteflow/grpc/_mixins/converters.py:76-79` - meeting ID parse
- `src/noteflow/grpc/_mixins/diarization/_jobs.py:84-87` - meeting ID validation
- `src/noteflow/infrastructure/triggers/calendar.py:141-144` - datetime parse
```python
try:
    UUID(workspace_id)
except ValueError:
    return None  # Silent failure, no logging
```
**Problem:** Invalid input silently returns None without any trace.
---
### 3.2 Silent Settings Fallbacks
**Files:**
- `src/noteflow/infrastructure/webhooks/executor.py:56-65`
- `src/noteflow/infrastructure/summarization/ollama_provider.py:44-48`
- `src/noteflow/infrastructure/summarization/cloud_provider.py:48-52`
- `src/noteflow/grpc/_mixins/diarization_job.py:63-66`
```python
except Exception:
    return DEFAULT_VALUES  # No logging that fallback occurred
```
**Problem:** Settings load failures silently fall back to defaults.
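The fallback should stay, but announce itself. A sketch (the `_load_settings` stand-in mimics a loader that fails without full config; default values taken from section 1.1):

```python
import logging

logger = logging.getLogger(__name__)

_DEFAULTS = ("http://localhost:11434", 120.0, 0.3)  # host, timeout, temperature


def _load_settings() -> tuple[str, float, float]:
    # Stand-in for the real settings loader, which may raise in unit tests.
    raise RuntimeError("config unavailable")


def get_ollama_settings() -> tuple[str, float, float]:
    try:
        return _load_settings()
    except Exception as exc:
        logger.warning("Failed to load Ollama settings (%s); using defaults %s", exc, _DEFAULTS)
        return _DEFAULTS
```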
---
### 3.3 gRPC Client Stub Unavailable - Silent Returns
**Files:** `src/noteflow/grpc/_client_mixins/*.py` (multiple locations)
```python
if not self._stub:
    return None  # No logging of connection issue
```
**Problem:** When gRPC stub unavailable, methods silently return None.
---
## 4. State Transitions and Lifecycle
### 4.1 Meeting State Changes Not Logged
**Problem:** When meetings transition between states (CREATED → RECORDING → STOPPED → COMPLETED), no structured log shows the transition with old/new state.
---
### 4.2 Diarization Job State - Missing Previous State
**File:** `src/noteflow/grpc/_mixins/diarization/_jobs.py:147-171`
```python
await repo.diarization_jobs.update_status(job_id, JOB_STATUS_RUNNING, ...)
# Doesn't log previous state
```
---
### 4.3 Segmenter State Machine - No Transition Logging
**File:** `src/noteflow/infrastructure/asr/segmenter.py:121-127`
```python
if is_speech:
    self._state = SegmenterState.SPEECH  # No logging of IDLE -> SPEECH
```
**Problem:** VAD state machine transitions are silent.
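Transition logging can be centralized in one setter so every IDLE -> SPEECH change (and back) is recorded exactly once (a sketch; the real `SegmenterState` has more states than shown):

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class SegmenterState(Enum):
    IDLE = "idle"
    SPEECH = "speech"


class Segmenter:
    def __init__(self) -> None:
        self._state = SegmenterState.IDLE

    def _transition(self, new_state: SegmenterState) -> None:
        # Log only on actual change, so per-frame no-ops stay quiet.
        if new_state is not self._state:
            logger.debug("VAD transition: %s -> %s", self._state.value, new_state.value)
        self._state = new_state

    def on_frame(self, is_speech: bool) -> None:
        self._transition(SegmenterState.SPEECH if is_speech else SegmenterState.IDLE)
```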
---
### 4.4 Stream Cleanup - No Logging
**File:** `src/noteflow/grpc/_mixins/streaming/_cleanup.py:14-34`
```python
def cleanup_stream_resources(host, meeting_id):
    # Multiple cleanup operations, no completion log
    host._active_streams.discard(meeting_id)
```
---
### 4.5 Diarization Session Close - DEBUG Only
**File:** `src/noteflow/infrastructure/diarization/session.py:145-159`
```python
def close(self) -> None:
    logger.debug("Session %s closed", self.meeting_id)  # Should be INFO
```
---
### 4.6 Background Task Spawning - No Task ID
**File:** `src/noteflow/grpc/_mixins/diarization/_jobs.py:130-132`
```python
task = asyncio.create_task(self._run_diarization_job(job_id, num_speakers))
self._diarization_tasks[job_id] = task # No logging of task creation
```
---
### 4.7 Audio Flush Thread - No Start/End Logging
**File:** `src/noteflow/infrastructure/audio/writer.py:135-157`
```python
self._flush_thread.start()  # No logging
# ...
def _periodic_flush_loop(self):
    while not self._stop_flush.wait(...):
        # No entry/exit logging for loop
```
---
## 5. Database Operations
### 5.1 BaseRepository - No Query Timing
**File:** `src/noteflow/infrastructure/persistence/repositories/_base.py`
All methods (`_execute_scalar`, `_execute_scalars`, `_add_and_flush`, `_delete_and_flush`, `_add_all_and_flush`, `_execute_update`, `_execute_delete`) have no timing or logging.
---
### 5.2 Unit of Work - No Transaction Logging
**File:** `src/noteflow/infrastructure/persistence/unit_of_work.py:220-296`
- `__aenter__`: No session creation log
- `__aexit__`: No rollback logging on exception
- `commit()`: No commit log
- `rollback()`: No rollback log
---
### 5.3 Repository CRUD Operations - No Logging
**Files:**
- `meeting_repo.py` - create, update, delete, list_all
- `segment_repo.py` - add_batch, update_embedding, update_speaker
- `summary_repo.py` - save (upsert with cascades)
- `diarization_job_repo.py` - create, mark_running_as_failed, prune_completed
- `entity_repo.py` - save_batch, delete_by_meeting
- `webhook_repo.py` - create, add_delivery
- `integration_repo.py` - set_secrets
- `usage_event_repo.py` - add_batch, delete_before
- `preferences_repo.py` - set_bulk
**Problem:** Most repository operations have no logging of what was created/updated/deleted.
---
## 6. File System Operations
### 6.1 Meeting Directory Creation - Not Logged
**File:** `src/noteflow/infrastructure/audio/writer.py:109-111`
```python
self._meeting_dir.mkdir(parents=True, exist_ok=True) # No logging
```
---
### 6.2 Manifest Read/Write - Not Logged
**File:** `src/noteflow/infrastructure/audio/writer.py:122-123`
```python
manifest_path.write_text(json.dumps(manifest, indent=2)) # No logging
```
---
### 6.3 Asset Deletion - Silent No-Op
**File:** `src/noteflow/infrastructure/persistence/repositories/asset_repo.py:49-51`
```python
if meeting_dir.exists():
    shutil.rmtree(meeting_dir)
    logger.info("Deleted meeting assets at %s", meeting_dir)
# No log when directory doesn't exist
```
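A one-branch addition makes the no-op visible (a sketch of the repository method, signature assumed):

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def delete_meeting_assets(meeting_dir: Path) -> None:
    if meeting_dir.exists():
        shutil.rmtree(meeting_dir)
        logger.info("Deleted meeting assets at %s", meeting_dir)
    else:
        # Previously a silent no-op; now leaves a trace.
        logger.info("No meeting assets to delete at %s", meeting_dir)
```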
---
## 7. Export Operations
### 7.1 PDF Export - No Timing
**File:** `src/noteflow/infrastructure/export/pdf.py:161-186`
```python
def export(self, meeting, segments) -> bytes:
    pdf_bytes = weasy_html(string=html_content).write_pdf()  # No timing
    return pdf_bytes
```
**Problem:** PDF generation can be slow; no timing or size logged.
---
### 7.2 Markdown/HTML Export - No Logging
**Files:** `markdown.py:37-89`, `html.py:158-187`
No logging of export operations.
---
## 8. Initialization Sequences
### 8.1 Lazy Model Loading - Not Logged at Load Time
**Files:**
- `NerEngine._ensure_loaded()` - spaCy model load
- `DiarizationEngine` - pyannote model load
- `OllamaSummarizer._get_client()` - client creation
**Problem:** Lazy initialization happens without logging.
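Each of these lazy paths can log at first use with a small guard (a sketch; `_load` stands in for the real spaCy/pyannote/client initialization):

```python
import logging
import time

logger = logging.getLogger(__name__)


class LazyEngine:
    def __init__(self) -> None:
        self._model = None

    def _ensure_loaded(self):
        if self._model is None:
            logger.info("Loading model (first use)...")
            start = time.perf_counter()
            self._model = self._load()
            logger.info("Model loaded in %.1fs", time.perf_counter() - start)
        return self._model

    def _load(self):
        return object()  # stand-in for the real (slow) load
```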
---
### 8.2 Singleton Creation - Silent
**File:** `src/noteflow/infrastructure/metrics/collector.py:168-178`
```python
def get_metrics_collector() -> MetricsCollector:
    global _metrics_collector
    if _metrics_collector is None:
        _metrics_collector = MetricsCollector()  # No logging
    return _metrics_collector
```
---
### 8.3 Provider Registration - DEBUG Level
**File:** `src/noteflow/application/services/summarization_service.py:119-127`
```python
def register_provider(self, mode, provider):
    logger.debug("Registered %s provider", mode.value)  # Should be INFO at startup
```
---
## Summary Statistics
| Category | Issue Count | Severity |
|----------|-------------|----------|
| Network/External Services | 7 | CRITICAL |
| Blocking/Long-Running | 4 | HIGH |
| Error Handling | 10+ | HIGH |
| State Transitions | 7 | MEDIUM |
| Database Operations | 30+ | MEDIUM |
| File System | 3 | LOW |
| Export | 3 | LOW |
| Initialization | 5 | MEDIUM |
| **Total** | **100+** | - |
---
## Recommended Logging Pattern
For all async/blocking operations:
```python
logger.info("Starting <operation>: context=%s", context)
start = time.perf_counter()
try:
    result = await some_operation()
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("<Operation> completed: result_count=%d, duration_ms=%.2f", len(result), elapsed_ms)
except TimeoutError:
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.error("<Operation> timeout after %.2fms", elapsed_ms)
    raise
except Exception as e:
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.error("<Operation> failed after %.2fms: %s", elapsed_ms, e)
    raise
```
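To avoid repeating this boilerplate at every call site, the pattern can be packaged once as a context manager (a sketch, not existing project code):

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def log_operation(name: str, **context):
    """Log start, completion timing, and failure timing for a block of work."""
    logger.info("Starting %s: %s", name, context)
    start = time.perf_counter()
    try:
        yield
    except Exception as e:
        logger.error("%s failed after %.2fms: %s", name, (time.perf_counter() - start) * 1000, e)
        raise
    logger.info("%s completed in %.2fms", name, (time.perf_counter() - start) * 1000)
```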
---
## Priority Fixes
### P0 - Fix Immediately
1. Ollama `is_available` timeout logging
2. Summarization factory timing
3. Database migration progress logging
### P1 - Fix This Sprint
4. All external HTTP calls (calendar, OAuth, webhooks)
5. All `run_in_executor` calls (ASR, NER, diarization)
6. Silent ValueError returns
### P2 - Fix Next Sprint
7. Repository CRUD logging
8. State transition logging
9. Background task lifecycle logging
---
## Resolved Issues
- ~~Server-side state volatility~~ → Diarization jobs persisted to DB
- ~~Hardcoded directory paths~~ → `asset_path` column added to meetings
- ~~Synchronous blocking in async gRPC~~ → `run_in_executor` for diarization
- ~~Summarization consent not persisted~~ → Stored in `user_preferences` table
- ~~VU meter update throttling~~ → 20fps throttle implemented
- ~~Webhook infrastructure missing~~ → Full webhook subsystem implemented
- ~~Integration/OAuth token storage~~ → `IntegrationSecretModel` for secure storage


@@ -3,12 +3,15 @@
 from __future__ import annotations
 import asyncio
 import logging
+import os
+import time
 from datetime import UTC, datetime
 from typing import TYPE_CHECKING
 from noteflow.domain.entities import Summary
+logger = logging.getLogger(__name__)
 from noteflow.domain.summarization import (
     InvalidResponseError,
     ProviderUnavailableError,
@@ -45,6 +48,10 @@ def _get_ollama_settings() -> tuple[str, float, float]:
     # - Settings may fail to load in unit tests without full config
     # - Use sensible defaults for local Ollama server
     except Exception:
+        logger.warning(
+            "Failed to load Ollama settings, using defaults: "
+            "host=http://localhost:11434, timeout=120s, temperature=0.3"
+        )
         return ("http://localhost:11434", 120.0, 0.3)
@@ -85,8 +92,10 @@ class OllamaSummarizer:
         try:
             import ollama
+            logger.debug("Creating Ollama client for host=%s", self._host)
             self._client = ollama.Client(host=self._host)
         except ImportError as e:
+            logger.error("Ollama package not installed")
             raise ProviderUnavailableError(
                 "ollama package not installed. Install with: pip install ollama"
             ) from e