refactor: enhance audio capture and playback handling with optional sounddevice support

- Refactored audio capture and playback components to improve handling of the optional `sounddevice` module, allowing for better compatibility in environments lacking PortAudio.
- Introduced a new `sounddevice_support.py` module to centralize the management of sounddevice interactions and error handling.
- Updated the `SoundDeviceCapture` and `SoundDevicePlayback` classes to utilize the new support module, ensuring consistent stream management and error logging.
- Improved the `generate_dek` method in the crypto module to enforce key size validation, enhancing security and reliability.
- Enhanced the key file existence check in the keystore to include validation of the key content, improving robustness against invalid key files.
This commit is contained in:
2026-01-03 12:50:00 +00:00
parent c8bcc0b609
commit 832a691dd7
14 changed files with 1167 additions and 68 deletions

View File

@@ -13,12 +13,18 @@ Analysis of the gRPC API contracts, state management, and streaming operations r
| [SPRINT-GAP-003](./sprint-gap-003-error-handling.md) | Error Handling Mismatches | Medium | M |
| [SPRINT-GAP-004](./sprint-gap-004-diarization-lifecycle.md) | Diarization Job Lifecycle | Medium | S |
| [SPRINT-GAP-005](./sprint-gap-005-entity-resource-leak.md) | Entity Mixin Resource Leak | High | S |
| [SPRINT-GAP-006](./sprint-gap-006-connection-bootstrapping.md) | Connection Bootstrapping | High | M |
| [SPRINT-GAP-007](./sprint-gap-007-simulation-mode-clarity.md) | Simulation Mode Clarity | Medium | S |
| [SPRINT-GAP-008](./sprint-gap-008-server-address-consistency.md) | Server Address Consistency | Medium | M |
| [SPRINT-GAP-009](./sprint-gap-009-event-bridge-contracts.md) | Event Bridge Contracts | Medium | S |
| [SPRINT-GAP-010](./sprint-gap-010-identity-and-rpc-logging.md) | Identity + RPC Logging | Medium | M |
## Priority Matrix
### Critical (Address Immediately)
- **SPRINT-GAP-001**: Audio streaming fire-and-forget can cause data loss
- **SPRINT-GAP-005**: Entity mixin context manager misuse causes resource leaks
- **SPRINT-GAP-006**: Recording can start without a connected gRPC client
### High Priority
- **SPRINT-GAP-002**: Meeting cache invalidation prevents stale data
@@ -26,6 +32,10 @@ Analysis of the gRPC API contracts, state management, and streaming operations r
### Medium Priority
- **SPRINT-GAP-004**: Diarization polling resilience improvements
- **SPRINT-GAP-007**: Simulated paths need explicit UX and safety rails
- **SPRINT-GAP-008**: Default server addressing can be mis-pointed in Docker setups
- **SPRINT-GAP-009**: Event bridge should initialize before connection
- **SPRINT-GAP-010**: Identity metadata and per-RPC logging not wired
## Cross-Cutting Concerns

View File

@@ -0,0 +1,127 @@
# Sprint GAP-006: Connection Bootstrapping and Streaming Guardrails
> **Size**: M | **Owner**: Client (Rust) + Frontend (TypeScript) | **Prerequisites**: None
> **Phase**: Gaps - Wiring Reliability
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verified in code; implementation scope needs owner confirmation.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Behavior when `start_recording` is invoked while disconnected | Pending | Decide auto-connect vs hard fail with UX prompt |
| **B2** | Source of server URL for auto-connect in Rust | Pending | Use cached preferences or last successful URL |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | `start_recording` does not guarantee an active gRPC connection | Add auto-connect or preflight check |
| G2 | UI does not explicitly force connect before starting streaming | Add guarded connect attempt in recording flow |
### Prerequisite Verification
| Prerequisite | Status | Notes |
|--------------|--------|-------|
| Tauri connect command exists | ✅ | `client/src-tauri/src/commands/connection.rs` |
| Streaming setup requires connected client | ✅ | `client/src-tauri/src/grpc/streaming.rs` |
---
## Validation Status (2026-01-03)
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Rust auto-connect in `start_recording` | Not implemented | `start_recording` calls `start_streaming` without ensuring connection |
| Frontend preflight connect on recording start | Not implemented | Recording flow assumes connection state; no explicit connect call |
### PARTIALLY IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| App bootstrap connect attempt | Partial | `initializeAPI()` attempts `connect`, but failures fall back to cached mode |
**Downstream impact**: Recording can fail locally with `NotConnected`, and the backend sees no traffic, producing no logs.
---
## Objective
Ensure recording/streaming always attempts a gRPC connection before transmitting audio, and provide clear UX for connection failures so the backend is reached when expected.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Auto-connect in Rust | Yes, if disconnected | Prevents silent failures when UI misses connect |
| UI preflight | Attempt connect on record start when disconnected | Keeps behavior explicit and surfaces errors to users |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| `start_recording` command | `client/src-tauri/src/commands/recording/mod.rs` | Entry point to hook auto-connect |
| `connect` command + events | `client/src-tauri/src/commands/connection.rs` | Reusable connect logic + error emission |
| Stream manager uses `get_client()` | `client/src-tauri/src/grpc/streaming.rs` | Fails fast when disconnected |
| Client `get_client()` returns `NotConnected` | `client/src-tauri/src/grpc/client/core.rs` | Root cause of silent failure |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Client Layer (Rust)** | | |
| Auto-connect in `start_recording` when disconnected | M | Use stored server URL or default config; reuse connect path |
| Emit clear error when auto-connect fails | S | Reuse existing error event payloads |
| **Client Layer (TypeScript)** | | |
| Preflight connect when recording starts and not connected | S | Use existing API `connect` and toast on failure |
| Surface server URL origin (preferences vs default) | S | Include in error copy for debugging |
**Total Effort**: M (2-4 hours)
---
## Deliverables
### Client
- [ ] `client/src-tauri/src/commands/recording/mod.rs` — auto-connect before streaming
- [ ] `client/src-tauri/src/grpc/client/core.rs` — helper to expose last server URL for auto-connect
- [ ] `client/src/pages/Recording.tsx` — preflight connect and better failure messaging
---
## Test Strategy
### Core test cases
- **Rust**: unit test that `start_recording` attempts connect when disconnected
- **TS**: recording start triggers `connect` when `isConnected` is false
- **TS**: failure path shows offline toast and does not start stream
---
## Quality Gates
- [ ] No silent `NotConnected` failures in recording path
- [ ] Connection failure emits `ERROR` event with category and retryable flag
- [ ] `npm run test` passes for recording flow tests
- [ ] `npm run lint:rs` passes
---
## Post-Sprint
- [ ] Consider auto-retry policy for transient connection failures
- [ ] Add connection metrics (attempt count, time to connect)

View File

@@ -0,0 +1,117 @@
# Sprint GAP-007: Simulation Mode Clarity and Safety Rails
> **Size**: S | **Owner**: Frontend (TypeScript) | **Prerequisites**: None
> **Phase**: Gaps - Mode Transparency
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verified in code; scope depends on product intent for simulation in Tauri.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Should simulation be allowed in Tauri production builds? | Pending | Decide if developer-only or always available |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | API mode (mock vs tauri vs cached) is not obvious in UI | Add global mode indicator + console log |
| G2 | Simulation can bypass backend without explicit user acknowledgement | Require explicit toggle confirmation and persistent badge |
---
## Validation Status (2026-01-03)
### PARTIALLY IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Simulation default off | Implemented | `simulate_transcription` defaults to `false` |
| Recording simulation path | Implemented | Recording uses `mockAPI` when simulate + disconnected |
| Simulation badge | Implemented | `SimulationIndicator` renders when enabled |
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Global API mode indicator | Not implemented | No universal badge for mock/cached/tauri |
| Explicit confirmation on enabling simulation | Not implemented | Toggle is immediate |
**Downstream impact**: Users can record in simulated mode and expect backend activity, causing confusion when backend logs show nothing.
---
## Objective
Make simulation and API modes unambiguous so users understand when the backend is bypassed, preventing false expectations about backend activity.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Simulation availability | Developer-only in production builds | Reduces accidental backend bypass |
| Mode visibility | Global badge + toast on toggle | Clear, persistent feedback |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Default `simulate_transcription` | `client/src/lib/preferences.ts` | Starts disabled |
| Simulation indicator | `client/src/components/simulation-indicator.tsx` | Reuse as global badge |
| Recording simulation logic | `client/src/pages/Recording.tsx` | Hook point for warning and gating |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Client Layer (TypeScript)** | | |
| Add global API mode badge (mock/cached/tauri) | S | Reuse connection state + indicator styling |
| Add explicit confirmation when enabling simulation | S | Modal or toast with confirm |
| Log active adapter on init | S | Console + debug overlay for Tauri |
**Total Effort**: S (1-2 hours)
---
## Deliverables
### Client
- [ ] `client/src/contexts/connection-context.tsx` — expose adapter/mode to UI
- [ ] `client/src/components/` — global mode badge component
- [ ] `client/src/pages/Settings.tsx` — confirmation for simulation toggle
- [ ] `client/src/pages/Recording.tsx` — include mode badge in recording header
---
## Test Strategy
### Core test cases
- **UI**: mode badge reflects mock/cached/tauri states
- **UI**: enabling simulation requires confirmation and persists badge
- **UI**: recording view shows simulation status when active
---
## Quality Gates
- [ ] Simulation state is visible on all recording screens
- [ ] Simulation toggle requires explicit confirmation
- [ ] `npm run test` passes for affected UI tests
---
## Post-Sprint
- [ ] Add telemetry event for simulation toggle usage

View File

@@ -0,0 +1,125 @@
# Sprint GAP-008: Server Address Consistency and Docker Reachability
> **Size**: M | **Owner**: Backend (Python) + Client (Rust) + Frontend (TypeScript) | **Prerequisites**: None
> **Phase**: Gaps - Connectivity
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verified in code; need decision on default bind address in Docker.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Default server bind address (`[::]` vs `0.0.0.0`) | Pending | Decide cross-platform Docker default |
| **B2** | Preferred client default host (`localhost` vs `127.0.0.1`) | Pending | Decide IPv4-first strategy for desktop client |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Defaults are scattered across Python, Rust, and TS | Centralize and document source-of-truth |
| G2 | UI does not show server URL origin (env vs prefs) | Surface full URL and source |
---
## Validation Status (2026-01-03)
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Unified server address defaults | Not implemented | Defaults exist in TS + Rust separately |
| UI display of effective server URL | Not implemented | No UI indicator of current URL or source |
### PARTIALLY IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Backend bind address | Partial | Server binds to `[::]:port`, configurable port only |
| Client env override | Implemented | Rust reads `NOTEFLOW_SERVER_ADDRESS` |
**Downstream impact**: Desktop clients can connect to a different address than expected in Docker setups, and failures are hard to diagnose.
---
## Objective
Make server address behavior explicit and consistent across Python, Rust, and TypeScript layers, and provide UI visibility into the effective URL and its source.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Server bind default | `0.0.0.0` in Docker or configurable via env | Improves IPv4 port publishing reliability |
| Client default host | Prefer `127.0.0.1` for desktop | Avoid IPv6 localhost ambiguity |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Backend default port | `src/noteflow/config/settings/_main.py` | Port is consistent at 50051 |
| Backend bind address | `src/noteflow/grpc/server.py` | Currently `[::]:port` |
| Rust default address + env | `client/src-tauri/src/config.rs` | Uses `NOTEFLOW_SERVER_ADDRESS` |
| TS defaults | `client/src/lib/config/server.ts` | Uses `localhost:50051` |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Backend (Python)** | | |
| Add configurable bind address (env + config) | M | Default to `0.0.0.0` when running in Docker |
| **Client (Rust)** | | |
| Align default address with TS defaults | S | Prefer IPv4 loopback |
| **Frontend (TypeScript)** | | |
| Show effective server URL and source | S | Preferences vs env vs default |
| Document connection behavior in settings UI | S | Include Docker port mapping reminder |
**Total Effort**: M (2-4 hours)
---
## Deliverables
### Backend
- [ ] `src/noteflow/grpc/server.py` — bind address configurable via settings/env
- [ ] `src/noteflow/config/settings.py` — add bind address setting
### Client
- [ ] `client/src-tauri/src/config.rs` — align defaults, expose effective URL
- [ ] `client/src/lib/config/server.ts` — update defaults and docs
- [ ] `client/src/pages/Settings.tsx` — show effective URL + source
---
## Test Strategy
### Core test cases
- **Python**: server binds to configured address (unit test with config)
- **Rust**: default server URL matches TS defaults
- **TS**: settings UI displays URL origin accurately
---
## Quality Gates
- [ ] Defaults aligned across Python, Rust, and TypeScript
- [ ] Effective URL visible in UI
- [ ] Docs mention Docker port publish requirement
---
## Post-Sprint
- [ ] Add connection troubleshooting checklist to docs/observability.md

View File

@@ -0,0 +1,117 @@
# Sprint GAP-009: Event Bridge Initialization and Contract Guarantees
> **Size**: S | **Owner**: Frontend (TypeScript) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Event Wiring
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verified in code; needs confirmation on desired bridge startup timing.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Should event bridge start before connection? | Pending | Recommend yes to capture early events |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Event bridge starts only after successful connect | Initialize on app boot for Tauri |
| G2 | Event name sources are split across TS + Rust | Enforce single canonical source and tests |
---
## Validation Status (2026-01-03)
### PARTIALLY IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Event names centralized | Implemented | Rust uses `event_names`, TS uses `TauriEvents` |
| Event bridge | Partial | Started after connect in `initializeAPI` |
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Early event bridge init | Not implemented | Disconnected mode skips bridge startup |
| Contract cross-check tests | Not implemented | No explicit TS<->Rust contract validation |
**Downstream impact**: Early connection/error events may be missed or not forwarded to the frontend when connection fails.
---
## Objective
Ensure the Tauri event bridge is always initialized in desktop mode and keep event names in sync across Rust and TypeScript.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Bridge initialization timing | At app boot in Tauri | Guarantees early events are captured |
| Contract validation | Add a TS test to validate event list | Prevent silent drift |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Rust event names | `client/src-tauri/src/events/mod.rs` | Canonical Rust constants |
| TS event names | `client/src/api/tauri-constants.ts` | Canonical TS constants |
| Event bridge implementation | `client/src/lib/tauri-events.ts` | Hook point for init timing |
| API init flow | `client/src/api/index.ts` | Controls when bridge starts |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Client Layer (TypeScript)** | | |
| Start event bridge during app boot in Tauri mode | S | Move `startTauriEventBridge` earlier |
| Add contract test for event name parity | S | Compare TS constants against expected list |
| **Client Layer (Rust)** | | |
| Add doc comment that event names are canonical | S | Small clarity change |
**Total Effort**: S (1-2 hours)
---
## Deliverables
### Client
- [ ] `client/src/api/index.ts` — initialize event bridge before connect
- [ ] `client/src/lib/tauri-events.ts` — guard against double-init
- [ ] `client/src/api/tauri-constants.ts` — add test coverage for event list
- [ ] `client/src-tauri/src/events/mod.rs` — document canonical event names
---
## Test Strategy
### Core test cases
- **TS**: event bridge initializes in Tauri mode even when disconnected
- **TS**: event list contract test fails if names drift
---
## Quality Gates
- [ ] Event bridge runs before first connection attempt
- [ ] Event names remain aligned across TS and Rust
- [ ] `npm run test` passes
---
## Post-Sprint
- [ ] Consider generating TS constants from Rust event list

View File

@@ -0,0 +1,118 @@
# Sprint GAP-010: Identity Metadata and Per-RPC Logging
> **Size**: M | **Owner**: Backend (Python) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Observability and Identity
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verified in code; requires decision on identity metadata requirements.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Is identity metadata required for all RPCs? | Pending | Decide if headers are optional or mandatory |
| **B2** | Where to source user/workspace IDs in Tauri client | Pending | Use local identity commands or preferences |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Identity interceptor exists but is not registered | Add to gRPC server interceptors |
| G2 | Tauri client does not attach identity metadata | Add tonic interceptor |
| G3 | No per-RPC logging | Add server-side logging interceptor |
---
## Validation Status (2026-01-03)
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Server identity interceptor wiring | Not implemented | Interceptor defined but unused |
| Client metadata injection | Not implemented | Tonic interceptor not configured |
| Per-RPC logging | Not implemented | Server lacks request logging interceptor |
**Downstream impact**: Request logs lack user/workspace context, and backend activity can appear invisible when service methods do not log.
---
## Objective
Provide consistent identity metadata across RPCs and ensure every request is logged with method, duration, status, and peer information.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Identity metadata | Optional but attached by default | Avoid breaking older clients while improving logs |
| Logging fields | method, status, duration, peer, request_id | Minimum for traceability |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Identity interceptor | `src/noteflow/grpc/interceptors/identity.py` | Ready to register with server |
| gRPC server construction | `src/noteflow/grpc/server.py` | Hook point for interceptors |
| Tonic interceptor support | `client/src-tauri/src/grpc/noteflow.rs` | Client can wrap `NoteFlowServiceClient` |
| Local identity commands | `client/src-tauri/src/commands/identity.rs` | Source of user/workspace IDs |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Backend (Python)** | | |
| Register identity interceptor on gRPC server | S | Add to `grpc.aio.server` interceptors |
| Add request logging interceptor | M | Log method, status, duration, peer |
| **Client (Rust)** | | |
| Add tonic interceptor to inject metadata | M | Use request ID + local identity |
| Ensure request ID generation when absent | S | Align with backend expectations |
**Total Effort**: M (2-4 hours)
---
## Deliverables
### Backend
- [ ] `src/noteflow/grpc/server.py` — register interceptors
- [ ] `src/noteflow/grpc/interceptors/` — add request logging interceptor
### Client
- [ ] `client/src-tauri/src/grpc/client/core.rs` — attach tonic interceptor
- [ ] `client/src-tauri/src/state/` — expose identity context for metadata
---
## Test Strategy
### Core test cases
- **Python**: interceptor sets context vars from metadata
- **Python**: per-RPC logs emitted for a sample method
- **Rust**: interceptor attaches metadata headers on request
---
## Quality Gates
- [ ] All RPCs include request_id in logs
- [ ] Identity metadata present when available
- [ ] No change required to proto schema
---
## Post-Sprint
- [ ] Add correlation ID propagation to frontend logs

View File

@@ -0,0 +1,168 @@
# Sprint: Logging Gap Remediation (P1 - Runtime/Inputs)
> **Size**: M | **Owner**: Platform | **Prerequisites**: log_timing + get_logger already in place
> **Phase**: Observability - Runtime Diagnostics
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verification complete, scope needs owner/priority confirmation.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log level policy for invalid input (warn vs info vs debug) | ✅ | WARN with redaction |
| **B2** | PII redaction rules for UUIDs and URLs in logs | Pending | Align with security guidance |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Stub-missing logs could be noisy in gRPC client mixins | Add rate-limited or once-per-session logging |
| G2 | Timing vs. count metrics for long-running CPU tasks | Standardize on `log_timing` + optional result_count |
### Prerequisite Verification
| Prerequisite | Status | Notes |
|--------------|--------|-------|
| `log_timing` helper available | ✅ | `src/noteflow/infrastructure/logging/timing.py` |
| `log_state_transition` available | ✅ | `src/noteflow/infrastructure/logging/transitions.py` |
---
## Validation Status (2026-01-03)
### RESOLVED SINCE TRIAGE
| Component | Status | Notes |
|-----------|--------|-------|
| Ollama availability logging | Resolved | `src/noteflow/infrastructure/summarization/ollama_provider.py` uses `log_timing` |
| Cloud LLM API timing/logging | Resolved | `src/noteflow/infrastructure/summarization/cloud_provider.py` uses `log_timing` |
| Google Calendar request timing | Resolved | `src/noteflow/infrastructure/calendar/google_adapter.py` uses `log_timing` |
| OAuth refresh timing | Resolved | `src/noteflow/infrastructure/calendar/oauth_manager.py` uses `log_timing` |
| Webhook delivery start/finish | Resolved | `src/noteflow/infrastructure/webhooks/executor.py` info logs |
| Database engine + migrations | Resolved | `src/noteflow/infrastructure/persistence/database.py` info logs |
| Diarization full timing | Resolved | `src/noteflow/infrastructure/diarization/engine.py` uses `log_timing` |
| Diarization job timeout logging | Resolved | `src/noteflow/grpc/_mixins/diarization/_status.py` |
| Meeting state transitions | Resolved | `src/noteflow/application/services/meeting_service.py` |
| Streaming cleanup | Resolved | `src/noteflow/grpc/_mixins/streaming/_cleanup.py` |
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| NER warmup timing/logs | Not implemented | `src/noteflow/application/services/ner_service.py` uses `run_in_executor` without logs |
| ASR `transcribe_async` timing | Not implemented | `src/noteflow/infrastructure/asr/engine.py` lacks duration/RTF logs |
| Segmenter state transitions | Not implemented | `src/noteflow/infrastructure/asr/segmenter.py` no transition logs |
| Silent UUID parsing (workspace) | Not implemented | `src/noteflow/grpc/_mixins/meeting.py` returns None on ValueError |
| Silent meeting-id parsing | Not implemented | `src/noteflow/grpc/_mixins/converters/_id_parsing.py` returns None on ValueError |
| Silent calendar datetime parsing | Not implemented | `src/noteflow/infrastructure/triggers/calendar.py` returns None on ValueError |
| Settings fallback logging | Not implemented | `_get_llm_settings`, `_get_webhook_settings`, `diarization_job_ttl_seconds` |
| gRPC client stub missing logs | Not implemented | `src/noteflow/grpc/_client_mixins/*.py` return None silently |
| Rust gRPC connection tracing | Not implemented | `client/src-tauri/src/grpc/client/core.rs` no start/finish timing |
**Downstream impact**: Runtime visibility gaps for user-facing latency, failure diagnosis, and client connection issues.
---
## Objective
Close remaining high-impact logging gaps for runtime operations and input validation to reduce debugging time and improve failure diagnosis across Python gRPC services and the Tauri client.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Timing utility** | Use `log_timing` | Consistent duration metrics and structured fields |
| **Invalid input logging** | Warn-level with redaction | Catch client errors without leaking sensitive data |
| **Stub-missing logging** | Rate-limited (once per client instance) | Avoid log spam while preserving visibility |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| `log_timing` helper | `src/noteflow/infrastructure/logging/timing.py` | Use for executor + network timing |
| `log_state_transition` | `src/noteflow/infrastructure/logging/transitions.py` | Reuse for state-machine transitions |
| Existing log_timing usage | `ollama_provider.py`, `cloud_provider.py`, `google_adapter.py` | Follow established patterns |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Application Layer** | | |
| Add NER warmup + extraction timing logs | S | Use `log_timing` around `run_in_executor` |
| **Infrastructure Layer** | | |
| Add ASR `transcribe_async` duration + RTF logging | M | Include audio duration and model size |
| Add segmenter state transition logs | S | Use `log_state_transition` or structured info logs |
| Add settings fallback warning logs | S | `_get_llm_settings`, `_get_webhook_settings`, `diarization_job_ttl_seconds` |
| **API Layer** | | |
| Log invalid workspace UUID parsing (WARN + redaction) | S | `src/noteflow/grpc/_mixins/meeting.py` |
| Log invalid meeting_id parsing (WARN + redaction) | S | `src/noteflow/grpc/_mixins/converters/_id_parsing.py` |
| Log calendar datetime parse failures (WARN + redaction) | S | `src/noteflow/infrastructure/triggers/calendar.py` |
| gRPC client mixins log missing stub (rate-limited) | S | `src/noteflow/grpc/_client_mixins/*.py` |
| **Client Layer** | | |
| Add tracing for gRPC connect attempts | S | `client/src-tauri/src/grpc/client/core.rs` |
**Total Effort**: M (2-4 hours)
---
## Deliverables
### Backend
**Application Layer**:
- [ ] `src/noteflow/application/services/ner_service.py` — add warmup/extraction timing logs
**Infrastructure Layer**:
- [ ] `src/noteflow/infrastructure/asr/engine.py` — log transcription duration + RTF
- [ ] `src/noteflow/infrastructure/asr/segmenter.py` — log state transitions
- [ ] `src/noteflow/infrastructure/summarization/cloud_provider.py` — log settings fallback
- [ ] `src/noteflow/infrastructure/webhooks/executor.py` — log settings fallback
**API Layer**:
- [ ] `src/noteflow/grpc/_mixins/meeting.py` — log invalid workspace UUID parse (WARN + redaction)
- [ ] `src/noteflow/grpc/_mixins/converters/_id_parsing.py` — log invalid meeting_id parse (WARN + redaction)
- [ ] `src/noteflow/infrastructure/triggers/calendar.py` — log datetime parse errors (WARN + redaction)
- [ ] `src/noteflow/grpc/_client_mixins/*.py` — log missing stub (rate-limited)
- [ ] `src/noteflow/grpc/_mixins/diarization_job.py` — log settings fallback
### Client
- [ ] `client/src-tauri/src/grpc/client/core.rs` — log connection attempt duration + endpoint
---
## Test Strategy
### Core test cases
- **Application**: `caplog` validates NER warmup logs appear when lazy-load path is taken
- **Infrastructure**: `caplog` validates ASR timing log fields include duration and audio length
- **API**: invalid UUID parsing emits warning and aborts/returns safely
- **Client**: basic unit test or log snapshot for connection start/failure paths
---
## Quality Gates
- [ ] Added logs use structured fields and follow existing logging patterns
- [ ] No new `# type: ignore` or `Any` introduced
- [ ] Targeted tests for new logging paths where practical
- [ ] `ruff check` + `mypy` pass (backend)
- [ ] `npm run lint:rs` pass (client)
---
## Post-Sprint
- [ ] Evaluate if logging should be sampled for high-frequency segmenter transitions
- [ ] Consider centralized log suppression for repeated invalid client inputs

View File

@@ -0,0 +1,144 @@
# Sprint: Logging Gap Remediation (P2 - Persistence/Exports)
> **Size**: L | **Owner**: Platform | **Prerequisites**: P1 logging gaps resolved
> **Phase**: Observability - Data & Lifecycle
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verification complete, scope needs prioritization.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log volume for repository CRUD operations | Pending | Decide sampling/level policy |
| **B2** | Sensitive data in repository logs | Pending | Redaction and field allowlist |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Consistent DB timing strategy across BaseRepository and UoW | Add `log_timing` helpers or per-method timing |
| G2 | Export logs should include size without dumping content | Log byte count + segment count only |
### Prerequisite Verification
| Prerequisite | Status | Notes |
|--------------|--------|-------|
| Logging helpers available | ✅ | `log_timing`, `get_logger` |
| State transition logger | ✅ | `log_state_transition` |
---
## Validation Status (2026-01-03)
### PARTIALLY IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| DB migrations lifecycle logs | Partial | Migration start/end logged; repo/UoW still silent |
| Audio writer open logging | Partial | Open/flush errors logged, but thread lifecycle unlogged |
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| BaseRepository query timing | Not implemented | `src/noteflow/infrastructure/persistence/repositories/_base.py` |
| UnitOfWork lifecycle logs | Not implemented | `src/noteflow/infrastructure/persistence/unit_of_work.py` |
| Repository CRUD logging | Not implemented | `meeting_repo.py`, `segment_repo.py`, `summary_repo.py`, etc. |
| Asset deletion no-op logging | Not implemented | `src/noteflow/infrastructure/persistence/repositories/asset_repo.py` |
| Export timing/logging | Not implemented | `pdf.py`, `markdown.py`, `html.py` |
| Diarization session close log level | Not implemented | `src/noteflow/infrastructure/diarization/session.py` uses debug |
| Background task lifecycle logs | Not implemented | `src/noteflow/grpc/_mixins/diarization/_jobs.py` task creation missing |
**Downstream impact**: Limited visibility into DB performance, export latency, and lifecycle cleanup.
---
## Objective
Add structured logging for persistence, export, and lifecycle operations so DB performance issues and long-running exports are diagnosable without ad-hoc debugging.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Repository logging level** | INFO for mutations, DEBUG for reads | Avoid log noise while capturing state changes |
| **Timing strategy** | `log_timing` around DB write batches | Consistent duration metrics without per-row spam |
| **Export logging** | Log sizes and durations only | Avoid dumping user content |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Migration logging | `src/noteflow/infrastructure/persistence/database.py` | Reuse for DB lifecycle logs |
| Log helpers | `src/noteflow/infrastructure/logging/*` | Standardize on structured logging |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Infrastructure Layer** | | |
| Add BaseRepository timing wrappers | M | `_execute_*` methods emit duration |
| Add UnitOfWork lifecycle logs | S | __aenter__/commit/rollback/exit |
| Add CRUD mutation logs in repositories | L | Create/Update/Delete summary logs |
| Add asset deletion no-op log | S | log when directory missing |
| Add export timing logs | M | PDF/Markdown/HTML export duration + size |
| Promote diarization session close to INFO | S | `session.py` |
| Log diarization job task creation | S | `grpc/_mixins/diarization/_jobs.py` |
| Add audio flush thread lifecycle logs | S | `infrastructure/audio/writer.py` |
**Total Effort**: L (4-8 hours)
---
## Deliverables
### Backend
**Infrastructure Layer**:
- [ ] `src/noteflow/infrastructure/persistence/repositories/_base.py` — timing logs for DB operations
- [ ] `src/noteflow/infrastructure/persistence/unit_of_work.py` — session/commit/rollback logs
- [ ] `src/noteflow/infrastructure/persistence/repositories/*_repo.py` — mutation logging
- [ ] `src/noteflow/infrastructure/persistence/repositories/asset_repo.py` — no-op delete log
- [ ] `src/noteflow/infrastructure/export/pdf.py` — duration + byte-size log
- [ ] `src/noteflow/infrastructure/export/markdown.py` — export count log
- [ ] `src/noteflow/infrastructure/export/html.py` — export count log
- [ ] `src/noteflow/infrastructure/diarization/session.py` — info-level close log
- [ ] `src/noteflow/grpc/_mixins/diarization/_jobs.py` — background task creation log
- [ ] `src/noteflow/infrastructure/audio/writer.py` — flush thread lifecycle logs
---
## Test Strategy
### Core test cases
- **Repositories**: `caplog` validates mutation logging for create/update/delete
- **UnitOfWork**: log emitted on commit/rollback paths
- **Exports**: ensure logs include duration and output size (bytes/segments)
- **Lifecycle**: diarization session close emits info log
---
## Quality Gates
- [ ] Logging includes structured fields and avoids payload content
- [ ] No new `# type: ignore` or `Any` introduced
- [ ] `pytest` passes for touched modules
- [ ] `ruff check` + `mypy` pass
---
## Post-Sprint
- [ ] Assess performance impact of repo timing logs
- [ ] Consider opt-in logging for high-volume read paths