refactor: update linting and logging configurations

- Adjusted linting settings in `basedpyright.lint.json` to improve performance metrics.
- Updated `biome.json` to reflect changes in the number of unchanged files.
- Removed outdated sprint documentation files and added new plans for logging centralization and quality suite hardening.
- Enhanced logging functionality across various services to improve observability and error tracking.
- Introduced new test cases for gRPC interceptors and observability mixins to ensure robust error handling and logging.
This commit is contained in:
2026-01-03 23:34:23 +00:00
parent 1a55636b3d
commit bde264131f
136 changed files with 11209 additions and 2471 deletions

File diff suppressed because it is too large Load Diff

View File

@@ -1 +1 @@
{"summary":{"changed":0,"unchanged":300,"matches":0,"duration":{"secs":0,"nanos":63487663},"scannerDuration":{"secs":0,"nanos":2321824},"errors":0,"warnings":0,"infos":0,"skipped":0,"suggestedFixesSkipped":0,"diagnosticsNotPrinted":0},"diagnostics":[],"command":"lint"}
{"summary":{"changed":0,"unchanged":301,"matches":0,"duration":{"secs":0,"nanos":80436438},"scannerDuration":{"secs":0,"nanos":2575466},"errors":0,"warnings":0,"infos":0,"skipped":0,"suggestedFixesSkipped":0,"diagnosticsNotPrinted":0},"diagnostics":[],"command":"lint"}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,90 @@
# Sprint GAP-009: Event Bridge Initialization and Contract Guarantees
> **Size**: S | **Owner**: Frontend (TypeScript) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Event Wiring
> **Status**: ✅ COMPLETED (2026-01-03)
---
## Summary
Ensured the Tauri event bridge is always initialized in desktop mode before connection attempts, and added contract validation tests to keep event names synchronized across Rust and TypeScript.
### Key Changes
| Component | Before | After |
|-----------|--------|-------|
| Event bridge timing | After successful connect | Before connect (captures early events) |
| AUDIO_WARNING event | Missing in TypeScript | Synchronized across TS and Rust |
| Contract validation | None | Test validates event name parity |
| Documentation | Minimal | Rust `event_names` module documents sync process |
---
## Resolved Issues
| ID | Issue | Resolution |
|----|-------|------------|
| **B1** | Should event bridge start before connection? | Yes - moved to before connect in `initializeAPI()` |
| **G1** | Event bridge starts only after successful connect | Fixed - now starts before connection attempt |
| **G2** | Event name sources split across TS + Rust | Contract test enforces synchronization |
---
## Implementation Details
### Client (TypeScript)
| File | Change |
|------|--------|
| `client/src/api/index.ts` | Moved `startTauriEventBridge()` before `connect()` call |
| `client/src/api/tauri-constants.ts` | Added `AUDIO_WARNING` event, added sync documentation |
| `client/src/lib/tauri-events.ts` | Added `AudioWarningEvent` interface, subscribe to `AUDIO_WARNING` |
| `client/src/api/tauri-constants.test.ts` | New contract test validating event name parity |
### Client (Rust)
| File | Change |
|------|--------|
| `client/src-tauri/src/events/mod.rs` | Added comprehensive documentation for `event_names` module |
---
## Deliverables
- [x] `client/src/api/index.ts` — initialize event bridge before connect
- [x] `client/src/lib/tauri-events.ts` — added AUDIO_WARNING event support
- [x] `client/src/api/tauri-constants.ts` — added AUDIO_WARNING, sync documentation
- [x] `client/src/api/tauri-constants.test.ts` — new contract test (4 test cases)
- [x] `client/src-tauri/src/events/mod.rs` — documented canonical event names
---
## Test Results
```
src/api/tauri-constants.test.ts
✓ contains all expected Rust event names
✓ does not contain extra events not in Rust
✓ has event values matching their keys (self-consistency)
✓ has exactly 14 events matching Rust
Test Files: 64 passed
Tests: 594 passed
```
---
## Quality Gates
- [x] Event bridge runs before first connection attempt
- [x] Event names aligned across TS and Rust (14 events each)
- [x] `npm run test` passes
- [x] `npm run type-check` passes
- [x] `npm run lint` passes
---
## Post-Sprint
- [ ] Consider generating TS constants from Rust event list

View File

@@ -0,0 +1,126 @@
# Sprint GAP-010: Identity Metadata and Per-RPC Logging
> **Size**: M | **Owner**: Backend (Python) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Observability and Identity
> **Status**: ✅ COMPLETE (2026-01-03)
---
## Open Issues & Prerequisites
> ✅ **Completed**: 2026-01-03 — All blocking issues resolved, implementation complete.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Is identity metadata required for all RPCs? | ✅ Resolved | **Required** — x-request-id header mandatory, returns UNAUTHENTICATED if missing |
| **B2** | Where to source user/workspace IDs in Tauri client | ✅ Resolved | Use identity commands pattern (DEFAULT_USER_ID/DEFAULT_WORKSPACE_ID) |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Identity interceptor exists but is not registered | ✅ Added to gRPC server interceptors |
| G2 | Tauri client does not attach identity metadata | ✅ Tonic IdentityInterceptor added |
| G3 | No per-RPC logging | ✅ RequestLoggingInterceptor added |
---
## Validation Status (2026-01-03)
### ✅ IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Server identity interceptor wiring | ✅ Complete | Registered in `server.py`, rejects missing x-request-id |
| Client metadata injection | ✅ Complete | `IdentityInterceptor` injects headers on every request |
| Per-RPC logging | ✅ Complete | `RequestLoggingInterceptor` logs at INFO level |
**Result**: All RPCs now include identity context in logs, and every request is logged with method, status, duration_ms, peer, and request_id.
---
## Objective
Provide consistent identity metadata across RPCs and ensure every request is logged with method, duration, status, and peer information.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Identity metadata | **Required** (x-request-id mandatory) | Ensures all requests are traceable; returns UNAUTHENTICATED if missing |
| Rejection status | gRPC UNAUTHENTICATED | Standard status code for missing authentication context |
| Logging level | INFO | Visible in normal operation without debug mode |
| Logging fields | method, status, duration_ms, peer, request_id | Minimum for traceability |
| Client identity source | Identity commands pattern | Uses DEFAULT_USER_ID/DEFAULT_WORKSPACE_ID constants |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Identity interceptor | `src/noteflow/grpc/interceptors/identity.py` | Ready to register with server |
| gRPC server construction | `src/noteflow/grpc/server.py` | Hook point for interceptors |
| Tonic interceptor support | `client/src-tauri/src/grpc/noteflow.rs` | Client can wrap `NoteFlowServiceClient` |
| Local identity commands | `client/src-tauri/src/commands/identity.rs` | Source of user/workspace IDs |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Backend (Python)** | | |
| Register identity interceptor on gRPC server | S | Add to `grpc.aio.server` interceptors |
| Add request logging interceptor | M | Log method, status, duration, peer |
| **Client (Rust)** | | |
| Add tonic interceptor to inject metadata | M | Use request ID + local identity |
| Ensure request ID generation when absent | S | Align with backend expectations |
**Total Effort**: M (2-4 hours)
---
## Deliverables
### Backend
- [x] `src/noteflow/grpc/server.py` — register interceptors (RequestLoggingInterceptor + IdentityInterceptor)
- [x] `src/noteflow/grpc/interceptors/logging.py` — new RequestLoggingInterceptor
- [x] `src/noteflow/grpc/interceptors/identity.py` — updated to require x-request-id, reject with UNAUTHENTICATED
- [x] `tests/grpc/test_interceptors.py` — comprehensive interceptor tests
### Client
- [x] `client/src-tauri/src/grpc/client/core.rs` — IdentityInterceptor + InterceptedClient type alias
- [x] `client/src-tauri/src/grpc/client/mod.rs` — export IdentityInterceptor and InterceptedClient
---
## Test Strategy
### Core test cases
- [x] **Python**: interceptor sets context vars from metadata (`test_sets_context_vars_from_metadata`)
- [x] **Python**: interceptor rejects missing x-request-id with UNAUTHENTICATED (`test_rejects_missing_request_id_with_unauthenticated`)
- [x] **Python**: per-RPC logs emitted with method, status, duration_ms, peer, request_id (`test_logs_rpc_completion`)
- [x] **Python**: error status logged on exception (`test_logs_error_status_on_exception`)
- [x] **Rust**: interceptor attaches metadata headers on request (IdentityInterceptor implementation)
---
## Quality Gates
- [x] All RPCs include request_id in logs (enforced by IdentityInterceptor)
- [x] Identity metadata present when available (injected by client interceptor)
- [x] No change required to proto schema (metadata headers only)
---
## Post-Sprint
- [ ] Add correlation ID propagation to frontend logs

View File

@@ -0,0 +1,160 @@
# Sprint: Logging Gap Remediation (P1 - Runtime/Inputs)
> **Status**: ✅ **COMPLETE** (2026-01-03)
> **Size**: M | **Owner**: Platform | **Prerequisites**: log_timing + get_logger already in place
> **Phase**: Observability - Runtime Diagnostics
---
## Open Issues & Prerequisites
> ✅ **Completed**: 2026-01-03 — All items implemented and verified.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log level policy for invalid input (warn vs info vs debug) | ✅ | WARN with redaction |
| **B2** | PII redaction rules for UUIDs and URLs in logs | ✅ | UUID truncation implemented in meeting.py, project_id, workspace_id |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Stub-missing logs could be noisy in gRPC client mixins | Resolved via rate-limited `warn_stub_missing` |
| G2 | Timing vs. count metrics for long-running CPU tasks | Resolved via `log_timing` usage + context fields |
### Prerequisite Verification
| Prerequisite | Status | Notes |
|--------------|--------|-------|
| `log_timing` helper available | ✅ | `src/noteflow/infrastructure/logging/timing.py` |
| `log_state_transition` available | ✅ | `src/noteflow/infrastructure/logging/transitions.py` |
---
## Validation Status (2026-01-03)
### RESOLVED SINCE TRIAGE
| Component | Status | Notes |
|-----------|--------|-------|
| Ollama availability logging | Resolved | `src/noteflow/infrastructure/summarization/ollama_provider.py` uses `log_timing` |
| Cloud LLM API timing/logging | Resolved | `src/noteflow/infrastructure/summarization/cloud_provider.py` uses `log_timing` |
| Google Calendar request timing | Resolved | `src/noteflow/infrastructure/calendar/google_adapter.py` uses `log_timing` |
| OAuth refresh timing | Resolved | `src/noteflow/infrastructure/calendar/oauth_manager.py` uses `log_timing` |
| Webhook delivery start/finish | Resolved | `src/noteflow/infrastructure/webhooks/executor.py` info logs |
| Database engine + migrations | Resolved | `src/noteflow/infrastructure/persistence/database.py` info logs |
| Diarization full timing | Resolved | `src/noteflow/infrastructure/diarization/engine.py` uses `log_timing` |
| Diarization job timeout logging | Resolved | `src/noteflow/grpc/_mixins/diarization/_status.py` |
| Meeting state transitions | Resolved | `src/noteflow/application/services/meeting_service.py` |
| Streaming cleanup | Resolved | `src/noteflow/grpc/_mixins/streaming/_cleanup.py` |
| NER warmup + extraction timing | Resolved | `src/noteflow/application/services/ner_service.py` uses `log_timing` |
| ASR `transcribe_async` timing + context | Resolved | `src/noteflow/infrastructure/asr/engine.py` uses `log_timing` |
| Invalid meeting_id parsing logs | Resolved | `src/noteflow/grpc/_mixins/converters/_id_parsing.py` warns w/ truncation |
| Calendar datetime parse warnings | Resolved | `src/noteflow/infrastructure/triggers/calendar.py` warns w/ truncation |
| Settings fallback logs | Resolved | `_get_llm_settings`, `_get_webhook_settings`, `diarization_job_ttl_seconds` |
| gRPC client stub missing logs | Resolved | `_client_mixins/*` use `get_client_rate_limiter()` |
| Rust gRPC connection tracing | Resolved | `client/src-tauri/src/grpc/client/core.rs` logs connect timing |
### IMPLEMENTED THIS SPRINT
| Component | Status | Notes |
|-----------|--------|-------|
| Segmenter state transitions | ✅ Complete | `src/noteflow/infrastructure/asr/segmenter.py` uses `logger.debug("segmenter_state_transition", ...)` |
| Workspace UUID parsing warning | ✅ Complete | `src/noteflow/grpc/_mixins/meeting.py` logs with redaction on ValueError |
**Status**: All gaps resolved. No downstream visibility issues remain.
---
## Objective
Close remaining high-impact logging gaps for runtime operations and input validation to reduce debugging time and improve failure diagnosis across Python gRPC services and the Tauri client.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Timing utility** | Use `log_timing` | Consistent duration metrics and structured fields |
| **Invalid input logging** | Warn-level with redaction | Catch client errors without leaking sensitive data |
| **Stub-missing logging** | Rate-limited (once per client instance) | Avoid log spam while preserving visibility |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| `log_timing` helper | `src/noteflow/infrastructure/logging/timing.py` | Use for executor + network timing |
| `log_state_transition` | `src/noteflow/infrastructure/logging/transitions.py` | Reuse for state-machine transitions |
| Existing log_timing usage | `ollama_provider.py`, `cloud_provider.py`, `google_adapter.py` | Follow established patterns |
---
## Scope
| Task | Effort | Status |
|------|--------|--------|
| **Infrastructure Layer** | | |
| Add segmenter state transition logs | S | ✅ Already implemented (debug-level `segmenter_state_transition`) |
| **API Layer** | | |
| Log invalid workspace UUID parsing (WARN + redaction) | S | ✅ Implemented |
| **Client Layer** | | |
| (None) | | N/A |
**Total Effort**: ✅ Complete
---
## Deliverables
### Backend
**Application Layer**:
- [x] `src/noteflow/application/services/ner_service.py` — warmup/extraction timing logs present
**Infrastructure Layer**:
- [x] `src/noteflow/infrastructure/asr/engine.py` — transcription timing logs present
- [x] `src/noteflow/infrastructure/asr/segmenter.py` — state transitions logged with `segmenter_state_transition`
- [x] `src/noteflow/infrastructure/summarization/cloud_provider.py` — settings fallback logs present
- [x] `src/noteflow/infrastructure/webhooks/executor.py` — settings fallback logs present
**API Layer**:
- [x] `src/noteflow/grpc/_mixins/meeting.py` — invalid workspace UUID parse logs (WARN + redaction)
- [x] `src/noteflow/grpc/_mixins/converters/_id_parsing.py` — invalid meeting_id parse logs present
- [x] `src/noteflow/infrastructure/triggers/calendar.py` — datetime parse warnings present
- [x] `src/noteflow/grpc/_client_mixins/*.py` — stub-missing logs present (rate-limited)
- [x] `src/noteflow/grpc/_mixins/diarization_job.py` — settings fallback logs present
### Client
- [x] `client/src-tauri/src/grpc/client/core.rs` — connection timing logs present
---
## Test Strategy
### Core test cases
- **Infrastructure**: `caplog` validates segmenter transition logs emit on state changes
- **API**: invalid workspace UUID parsing emits warning and returns safely
---
## Quality Gates
- [x] Added logs use structured fields and follow existing logging patterns
- [x] No new `# type: ignore` or `Any` introduced
- [x] Targeted tests for new logging paths where practical
- [x] `ruff check` + `mypy` pass (backend)
- [x] `npm run lint:rs` pass (client)
---
## Post-Sprint
- [ ] Evaluate if logging should be sampled for high-frequency segmenter transitions
- [ ] Consider centralized log suppression for repeated invalid client inputs

View File

@@ -0,0 +1,196 @@
# Sprint: Logging Gap Remediation (P2 - Persistence/Exports)
> **Size**: L | **Owner**: Platform | **Prerequisites**: P1 logging gaps resolved
> **Phase**: Observability - Data & Lifecycle
> **Status**: ✅ COMPLETE (2026-01-03)
---
## Open Issues & Prerequisites
> ✅ **Completed**: 2026-01-03 — All P2 logging gaps implemented.
### Blocking Issues (Resolved)
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log volume for repository CRUD operations | ✅ Resolved | INFO for mutations, DEBUG for reads |
| **B2** | Sensitive data in repository logs | ✅ Resolved | Log IDs and counts only, no content |
### Design Gaps (Addressed)
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Consistent DB timing strategy across BaseRepository and UoW | ✅ Added timing to `_execute_*` and flush methods |
| G2 | Export logs should include size without dumping content | ✅ Log byte count + segment count + duration_ms |
### Prerequisite Verification
| Prerequisite | Status | Notes |
|--------------|--------|-------|
| Logging helpers available | ✅ | `log_timing`, `get_logger` |
| State transition logger | ✅ | `log_state_transition` |
---
## Validation Status (2026-01-03)
### ✅ FULLY IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| BaseRepository query timing | ✅ Implemented | `_execute_*` and flush methods timed |
| UnitOfWork lifecycle logs | ✅ Implemented | `__aenter__`, commit, rollback, `__aexit__` |
| Repository CRUD logging | ✅ Implemented | All repos: meeting, segment, summary, annotation, webhook, etc. |
| Asset deletion no-op logging | ✅ Implemented | `assets_delete_skipped_not_found` log |
| Export timing/logging | ✅ Implemented | HTML, Markdown, PDF with duration_ms + size_bytes |
| Diarization session close log level | ✅ Implemented | Promoted to INFO level |
| Background task lifecycle logs | ✅ Implemented | `diarization_task_created` log |
| Audio writer flush thread | ✅ Implemented | `flush_thread_started`, `flush_thread_stopped` |
**Downstream impact**: Full visibility into DB performance, export latency, and lifecycle cleanup.
---
## Objective
Add structured logging for persistence, export, and lifecycle operations so DB performance issues and long-running exports are diagnosable without ad-hoc debugging.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Repository logging level** | INFO for mutations, DEBUG for reads | Avoid log noise while capturing state changes |
| **Timing strategy** | `log_timing` around DB write batches | Consistent duration metrics without per-row spam |
| **Export logging** | Log sizes and durations only | Avoid dumping user content |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Migration logging | `src/noteflow/infrastructure/persistence/database.py` | Reuse for DB lifecycle logs |
| Log helpers | `src/noteflow/infrastructure/logging/*` | Standardize on structured logging |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Infrastructure Layer** | | |
| Add BaseRepository timing wrappers | M | `_execute_*` methods emit duration |
| Add UnitOfWork lifecycle logs | S | __aenter__/commit/rollback/exit |
| Add CRUD mutation logs in repositories | L | Create/Update/Delete summary logs |
| Add asset deletion no-op log | S | log when directory missing |
| Add export timing logs | M | PDF/Markdown/HTML export duration + size |
| Promote diarization session close to INFO | S | `session.py` |
| Log diarization job task creation | S | `grpc/_mixins/diarization/_jobs.py` |
| Add audio flush thread lifecycle logs | S | `infrastructure/audio/writer.py` |
**Total Effort**: L (4-8 hours)
---
## Deliverables
### Backend
**Infrastructure Layer**:
- [x] `src/noteflow/infrastructure/persistence/repositories/_base.py` — timing logs for DB operations
- [x] `src/noteflow/infrastructure/persistence/unit_of_work.py` — session/commit/rollback logs
- [x] `src/noteflow/infrastructure/persistence/repositories/*_repo.py` — mutation logging
- [x] `src/noteflow/infrastructure/persistence/repositories/asset_repo.py` — no-op delete log
- [x] `src/noteflow/infrastructure/export/pdf.py` — duration + byte-size log
- [x] `src/noteflow/infrastructure/export/markdown.py` — export count log
- [x] `src/noteflow/infrastructure/export/html.py` — export count log
- [x] `src/noteflow/infrastructure/diarization/session.py` — info-level close log
- [x] `src/noteflow/grpc/_mixins/diarization/_jobs.py` — background task creation log
- [x] `src/noteflow/infrastructure/audio/writer.py` — flush thread lifecycle logs
---
## Test Strategy
### Core test cases
- **Repositories**: `caplog` validates mutation logging for create/update/delete
- **UnitOfWork**: log emitted on commit/rollback paths
- **Exports**: ensure logs include duration and output size (bytes/segments)
- **Lifecycle**: diarization session close emits info log
---
## Quality Gates
- [x] Logging includes structured fields and avoids payload content
- [x] No new `# type: ignore` or `Any` introduced
- [x] `pytest` passes for touched modules
- [x] `basedpyright src/` passes (0 errors)
---
## Post-Sprint
- [ ] Assess performance impact of repo timing logs
- [ ] Consider opt-in logging for high-volume read paths
---
## Implementation Summary (2026-01-03)
### Logging Events Added
| Module | Event Names |
|--------|------------|
| BaseRepository | `db_execute_scalar`, `db_execute_scalars`, `db_add_and_flush`, `db_delete_and_flush` |
| UnitOfWork | `uow_session_started`, `uow_transaction_committed`, `uow_transaction_rolled_back`, `uow_session_closed` |
| MeetingRepository | `meeting_created`, `meeting_updated`, `meeting_deleted` |
| SegmentRepository | `segment_added`, `segments_batch_added` |
| SummaryRepository | `summary_created`, `summary_updated`, `summary_deleted` |
| AnnotationRepository | `annotation_added`, `annotation_updated`, `annotation_deleted` |
| PreferencesRepository | `preference_created`, `preference_updated`, `preference_deleted` |
| WebhookRepository | `webhook_created`, `webhook_updated`, `webhook_delivery_recorded` |
| DiarizationJobRepository | `diarization_job_created`, `diarization_job_status_updated`, etc. |
| EntityRepository | `entity_saved`, `entities_batch_saved`, `entities_deleted_by_meeting` |
| IntegrationRepository | `integration_created`, `integration_updated`, `sync_run_created` |
| AssetRepository | `assets_deleted`, `assets_delete_skipped_not_found` |
| HtmlExporter | `html_exported` (with segment_count, size_bytes, duration_ms) |
| MarkdownExporter | `markdown_exported` (with segment_count, size_bytes, duration_ms) |
| PdfExporter | `pdf_exported` (with segment_count, size_bytes, duration_ms) |
| DiarizationSession | `diarization_session_closed` (promoted to INFO) |
| DiarizationJobs | `diarization_task_created` |
| AudioWriter | `flush_thread_started`, `flush_thread_stopped`, `flush_thread_timeout` |
### Tests Added
- `tests/infrastructure/persistence/test_logging_persistence.py` — 16 tests verifying logger configuration across all modified modules
### Implementation Notes
**Completion**: 100% (10/10 deliverables)
**Design principles followed**:
- No new wrapper classes or compatibility layers
- Direct use of existing `get_logger(__name__)` pattern
- Inline `time.perf_counter()` for timing (no helper abstraction)
- Structured logging with keyword args only (no format strings)
- Log IDs and metrics only, never content/payloads
**Lines of code added per file** (approximate):
- `_base.py`: +25 lines (timing around 5 methods)
- `unit_of_work.py`: +20 lines (lifecycle events)
- `*_repo.py` (10 files): ~8-15 lines each
- Export files (3): ~12 lines each
- `session.py`: 1 line change (DEBUG → INFO)
- `_jobs.py`: +6 lines
- `writer.py`: +8 lines
**No bloat introduced**:
- No new classes, protocols, or abstract types
- No new dependencies
- No configuration changes required
- Zero runtime overhead when log level disabled

View File

@@ -1,117 +0,0 @@
# Sprint GAP-009: Event Bridge Initialization and Contract Guarantees
> **Size**: S | **Owner**: Frontend (TypeScript) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Event Wiring
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verified in code; needs confirmation on desired bridge startup timing.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Should event bridge start before connection? | Pending | Recommend yes to capture early events |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Event bridge starts only after successful connect | Initialize on app boot for Tauri |
| G2 | Event name sources are split across TS + Rust | Enforce single canonical source and tests |
---
## Validation Status (2026-01-03)
### PARTIALLY IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Event names centralized | Implemented | Rust uses `event_names`, TS uses `TauriEvents` |
| Event bridge | Partial | Started after connect in `initializeAPI` |
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Early event bridge init | Not implemented | Disconnected mode skips bridge startup |
| Contract cross-check tests | Not implemented | No explicit TS<->Rust contract validation |
**Downstream impact**: Early connection/error events may be missed or not forwarded to the frontend when connection fails.
---
## Objective
Ensure the Tauri event bridge is always initialized in desktop mode and keep event names in sync across Rust and TypeScript.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Bridge initialization timing | At app boot in Tauri | Guarantees early events are captured |
| Contract validation | Add a TS test to validate event list | Prevent silent drift |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Rust event names | `client/src-tauri/src/events/mod.rs` | Canonical Rust constants |
| TS event names | `client/src/api/tauri-constants.ts` | Canonical TS constants |
| Event bridge implementation | `client/src/lib/tauri-events.ts` | Hook point for init timing |
| API init flow | `client/src/api/index.ts` | Controls when bridge starts |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Client Layer (TypeScript)** | | |
| Start event bridge during app boot in Tauri mode | S | Move `startTauriEventBridge` earlier |
| Add contract test for event name parity | S | Compare TS constants against expected list |
| **Client Layer (Rust)** | | |
| Add doc comment that event names are canonical | S | Small clarity change |
**Total Effort**: S (1-2 hours)
---
## Deliverables
### Client
- [ ] `client/src/api/index.ts` — initialize event bridge before connect
- [ ] `client/src/lib/tauri-events.ts` — guard against double-init
- [ ] `client/src/api/tauri-constants.ts` — add test coverage for event list
- [ ] `client/src-tauri/src/events/mod.rs` — document canonical event names
---
## Test Strategy
### Core test cases
- **TS**: event bridge initializes in Tauri mode even when disconnected
- **TS**: event list contract test fails if names drift
---
## Quality Gates
- [ ] Event bridge runs before first connection attempt
- [ ] Event names remain aligned across TS and Rust
- [ ] `npm run test` passes
---
## Post-Sprint
- [ ] Consider generating TS constants from Rust event list

View File

@@ -1,118 +0,0 @@
# Sprint GAP-010: Identity Metadata and Per-RPC Logging
> **Size**: M | **Owner**: Backend (Python) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Observability and Identity
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verified in code; requires decision on identity metadata requirements.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Is identity metadata required for all RPCs? | Pending | Decide if headers are optional or mandatory |
| **B2** | Where to source user/workspace IDs in Tauri client | Pending | Use local identity commands or preferences |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Identity interceptor exists but is not registered | Add to gRPC server interceptors |
| G2 | Tauri client does not attach identity metadata | Add tonic interceptor |
| G3 | No per-RPC logging | Add server-side logging interceptor |
---
## Validation Status (2026-01-03)
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| Server identity interceptor wiring | Not implemented | Interceptor defined but unused |
| Client metadata injection | Not implemented | Tonic interceptor not configured |
| Per-RPC logging | Not implemented | Server lacks request logging interceptor |
**Downstream impact**: Request logs lack user/workspace context, and backend activity can appear invisible when service methods do not log.
---
## Objective
Provide consistent identity metadata across RPCs and ensure every request is logged with method, duration, status, and peer information.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Identity metadata | Optional but attached by default | Avoid breaking older clients while improving logs |
| Logging fields | method, status, duration, peer, request_id | Minimum for traceability |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Identity interceptor | `src/noteflow/grpc/interceptors/identity.py` | Ready to register with server |
| gRPC server construction | `src/noteflow/grpc/server.py` | Hook point for interceptors |
| Tonic interceptor support | `client/src-tauri/src/grpc/noteflow.rs` | Client can wrap `NoteFlowServiceClient` |
| Local identity commands | `client/src-tauri/src/commands/identity.rs` | Source of user/workspace IDs |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Backend (Python)** | | |
| Register identity interceptor on gRPC server | S | Add to `grpc.aio.server` interceptors |
| Add request logging interceptor | M | Log method, status, duration, peer |
| **Client (Rust)** | | |
| Add tonic interceptor to inject metadata | M | Use request ID + local identity |
| Ensure request ID generation when absent | S | Align with backend expectations |
**Total Effort**: M (2-4 hours)
---
## Deliverables
### Backend
- [ ] `src/noteflow/grpc/server.py` — register interceptors
- [ ] `src/noteflow/grpc/interceptors/` — add request logging interceptor
### Client
- [ ] `client/src-tauri/src/grpc/client/core.rs` — attach tonic interceptor
- [ ] `client/src-tauri/src/state/` — expose identity context for metadata
---
## Test Strategy
### Core test cases
- **Python**: interceptor sets context vars from metadata
- **Python**: per-RPC logs emitted for a sample method
- **Rust**: interceptor attaches metadata headers on request
---
## Quality Gates
- [ ] All RPCs include request_id in logs
- [ ] Identity metadata present when available
- [ ] No change required to proto schema
---
## Post-Sprint
- [ ] Add correlation ID propagation to frontend logs

View File

@@ -1,168 +0,0 @@
# Sprint: Logging Gap Remediation (P1 - Runtime/Inputs)
> **Size**: M | **Owner**: Platform | **Prerequisites**: log_timing + get_logger already in place
> **Phase**: Observability - Runtime Diagnostics
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verification complete, scope needs owner/priority confirmation.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log level policy for invalid input (warn vs info vs debug) | ✅ | WARN with redaction |
| **B2** | PII redaction rules for UUIDs and URLs in logs | Pending | Align with security guidance |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Stub-missing logs could be noisy in gRPC client mixins | Add rate-limited or once-per-session logging |
| G2 | Timing vs. count metrics for long-running CPU tasks | Standardize on `log_timing` + optional result_count |
### Prerequisite Verification
| Prerequisite | Status | Notes |
|--------------|--------|-------|
| `log_timing` helper available | ✅ | `src/noteflow/infrastructure/logging/timing.py` |
| `log_state_transition` available | ✅ | `src/noteflow/infrastructure/logging/transitions.py` |
---
## Validation Status (2026-01-03)
### RESOLVED SINCE TRIAGE
| Component | Status | Notes |
|-----------|--------|-------|
| Ollama availability logging | Resolved | `src/noteflow/infrastructure/summarization/ollama_provider.py` uses `log_timing` |
| Cloud LLM API timing/logging | Resolved | `src/noteflow/infrastructure/summarization/cloud_provider.py` uses `log_timing` |
| Google Calendar request timing | Resolved | `src/noteflow/infrastructure/calendar/google_adapter.py` uses `log_timing` |
| OAuth refresh timing | Resolved | `src/noteflow/infrastructure/calendar/oauth_manager.py` uses `log_timing` |
| Webhook delivery start/finish | Resolved | `src/noteflow/infrastructure/webhooks/executor.py` info logs |
| Database engine + migrations | Resolved | `src/noteflow/infrastructure/persistence/database.py` info logs |
| Diarization full timing | Resolved | `src/noteflow/infrastructure/diarization/engine.py` uses `log_timing` |
| Diarization job timeout logging | Resolved | `src/noteflow/grpc/_mixins/diarization/_status.py` |
| Meeting state transitions | Resolved | `src/noteflow/application/services/meeting_service.py` |
| Streaming cleanup | Resolved | `src/noteflow/grpc/_mixins/streaming/_cleanup.py` |
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| NER warmup timing/logs | Not implemented | `src/noteflow/application/services/ner_service.py` uses `run_in_executor` without logs |
| ASR `transcribe_async` timing | Not implemented | `src/noteflow/infrastructure/asr/engine.py` lacks duration/RTF logs |
| Segmenter state transitions | Not implemented | `src/noteflow/infrastructure/asr/segmenter.py` no transition logs |
| Silent UUID parsing (workspace) | Not implemented | `src/noteflow/grpc/_mixins/meeting.py` returns None on ValueError |
| Silent meeting-id parsing | Not implemented | `src/noteflow/grpc/_mixins/converters/_id_parsing.py` returns None on ValueError |
| Silent calendar datetime parsing | Not implemented | `src/noteflow/infrastructure/triggers/calendar.py` returns None on ValueError |
| Settings fallback logging | Not implemented | `_get_llm_settings`, `_get_webhook_settings`, `diarization_job_ttl_seconds` |
| gRPC client stub missing logs | Not implemented | `src/noteflow/grpc/_client_mixins/*.py` return None silently |
| Rust gRPC connection tracing | Not implemented | `client/src-tauri/src/grpc/client/core.rs` no start/finish timing |
**Downstream impact**: Runtime visibility gaps for user-facing latency, failure diagnosis, and client connection issues.
---
## Objective
Close remaining high-impact logging gaps for runtime operations and input validation to reduce debugging time and improve failure diagnosis across Python gRPC services and the Tauri client.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Timing utility** | Use `log_timing` | Consistent duration metrics and structured fields |
| **Invalid input logging** | Warn-level with redaction | Catch client errors without leaking sensitive data |
| **Stub-missing logging** | Rate-limited (once per client instance) | Avoid log spam while preserving visibility |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| `log_timing` helper | `src/noteflow/infrastructure/logging/timing.py` | Use for executor + network timing |
| `log_state_transition` | `src/noteflow/infrastructure/logging/transitions.py` | Reuse for state-machine transitions |
| Existing log_timing usage | `ollama_provider.py`, `cloud_provider.py`, `google_adapter.py` | Follow established patterns |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Application Layer** | | |
| Add NER warmup + extraction timing logs | S | Use `log_timing` around `run_in_executor` |
| **Infrastructure Layer** | | |
| Add ASR `transcribe_async` duration + RTF logging | M | Include audio duration and model size |
| Add segmenter state transition logs | S | Use `log_state_transition` or structured info logs |
| Add settings fallback warning logs | S | `_get_llm_settings`, `_get_webhook_settings`, `diarization_job_ttl_seconds` |
| **API Layer** | | |
| Log invalid workspace UUID parsing (WARN + redaction) | S | `src/noteflow/grpc/_mixins/meeting.py` |
| Log invalid meeting_id parsing (WARN + redaction) | S | `src/noteflow/grpc/_mixins/converters/_id_parsing.py` |
| Log calendar datetime parse failures (WARN + redaction) | S | `src/noteflow/infrastructure/triggers/calendar.py` |
| gRPC client mixins log missing stub (rate-limited) | S | `src/noteflow/grpc/_client_mixins/*.py` |
| **Client Layer** | | |
| Add tracing for gRPC connect attempts | S | `client/src-tauri/src/grpc/client/core.rs` |
**Total Effort**: M (2-4 hours)
---
## Deliverables
### Backend
**Application Layer**:
- [ ] `src/noteflow/application/services/ner_service.py` — add warmup/extraction timing logs
**Infrastructure Layer**:
- [ ] `src/noteflow/infrastructure/asr/engine.py` — log transcription duration + RTF
- [ ] `src/noteflow/infrastructure/asr/segmenter.py` — log state transitions
- [ ] `src/noteflow/infrastructure/summarization/cloud_provider.py` — log settings fallback
- [ ] `src/noteflow/infrastructure/webhooks/executor.py` — log settings fallback
**API Layer**:
- [ ] `src/noteflow/grpc/_mixins/meeting.py` — log invalid workspace UUID parse (WARN + redaction)
- [ ] `src/noteflow/grpc/_mixins/converters/_id_parsing.py` — log invalid meeting_id parse (WARN + redaction)
- [ ] `src/noteflow/infrastructure/triggers/calendar.py` — log datetime parse errors (WARN + redaction)
- [ ] `src/noteflow/grpc/_client_mixins/*.py` — log missing stub (rate-limited)
- [ ] `src/noteflow/grpc/_mixins/diarization_job.py` — log settings fallback
### Client
- [ ] `client/src-tauri/src/grpc/client/core.rs` — log connection attempt duration + endpoint
---
## Test Strategy
### Core test cases
- **Application**: `caplog` validates NER warmup logs appear when lazy-load path is taken
- **Infrastructure**: `caplog` validates ASR timing log fields include duration and audio length
- **API**: invalid UUID parsing emits warning and aborts/returns safely
- **Client**: basic unit test or log snapshot for connection start/failure paths
---
## Quality Gates
- [ ] Added logs use structured fields and follow existing logging patterns
- [ ] No new `# type: ignore` or `Any` introduced
- [ ] Targeted tests for new logging paths where practical
- [ ] `ruff check` + `mypy` pass (backend)
- [ ] `npm run lint:rs` pass (client)
---
## Post-Sprint
- [ ] Evaluate if logging should be sampled for high-frequency segmenter transitions
- [ ] Consider centralized log suppression for repeated invalid client inputs

View File

@@ -1,144 +0,0 @@
# Sprint: Logging Gap Remediation (P2 - Persistence/Exports)
> **Size**: L | **Owner**: Platform | **Prerequisites**: P1 logging gaps resolved
> **Phase**: Observability - Data & Lifecycle
---
## Open Issues & Prerequisites
> ⚠️ **Review Date**: 2026-01-03 — Verification complete, scope needs prioritization.
### Blocking Issues
| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log volume for repository CRUD operations | Pending | Decide sampling/level policy |
| **B2** | Sensitive data in repository logs | Pending | Redaction and field allowlist |
### Design Gaps to Address
| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Consistent DB timing strategy across BaseRepository and UoW | Add `log_timing` helpers or per-method timing |
| G2 | Export logs should include size without dumping content | Log byte count + segment count only |
### Prerequisite Verification
| Prerequisite | Status | Notes |
|--------------|--------|-------|
| Logging helpers available | ✅ | `log_timing`, `get_logger` |
| State transition logger | ✅ | `log_state_transition` |
---
## Validation Status (2026-01-03)
### PARTIALLY IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| DB migrations lifecycle logs | Partial | Migration start/end logged; repo/UoW still silent |
| Audio writer open logging | Partial | Open/flush errors logged, but thread lifecycle unlogged |
### NOT IMPLEMENTED
| Component | Status | Notes |
|-----------|--------|-------|
| BaseRepository query timing | Not implemented | `src/noteflow/infrastructure/persistence/repositories/_base.py` |
| UnitOfWork lifecycle logs | Not implemented | `src/noteflow/infrastructure/persistence/unit_of_work.py` |
| Repository CRUD logging | Not implemented | `meeting_repo.py`, `segment_repo.py`, `summary_repo.py`, etc. |
| Asset deletion no-op logging | Not implemented | `src/noteflow/infrastructure/persistence/repositories/asset_repo.py` |
| Export timing/logging | Not implemented | `pdf.py`, `markdown.py`, `html.py` |
| Diarization session close log level | Not implemented | `src/noteflow/infrastructure/diarization/session.py` uses debug |
| Background task lifecycle logs | Not implemented | `src/noteflow/grpc/_mixins/diarization/_jobs.py` task creation missing |
**Downstream impact**: Limited visibility into DB performance, export latency, and lifecycle cleanup.
---
## Objective
Add structured logging for persistence, export, and lifecycle operations so DB performance issues and long-running exports are diagnosable without ad-hoc debugging.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Repository logging level** | INFO for mutations, DEBUG for reads | Avoid log noise while capturing state changes |
| **Timing strategy** | `log_timing` around DB write batches | Consistent duration metrics without per-row spam |
| **Export logging** | Log sizes and durations only | Avoid dumping user content |
---
## What Already Exists
| Asset | Location | Implication |
|-------|----------|-------------|
| Migration logging | `src/noteflow/infrastructure/persistence/database.py` | Reuse for DB lifecycle logs |
| Log helpers | `src/noteflow/infrastructure/logging/*` | Standardize on structured logging |
---
## Scope
| Task | Effort | Notes |
|------|--------|-------|
| **Infrastructure Layer** | | |
| Add BaseRepository timing wrappers | M | `_execute_*` methods emit duration |
| Add UnitOfWork lifecycle logs | S | __aenter__/commit/rollback/exit |
| Add CRUD mutation logs in repositories | L | Create/Update/Delete summary logs |
| Add asset deletion no-op log | S | log when directory missing |
| Add export timing logs | M | PDF/Markdown/HTML export duration + size |
| Promote diarization session close to INFO | S | `session.py` |
| Log diarization job task creation | S | `grpc/_mixins/diarization/_jobs.py` |
| Add audio flush thread lifecycle logs | S | `infrastructure/audio/writer.py` |
**Total Effort**: L (4-8 hours)
---
## Deliverables
### Backend
**Infrastructure Layer**:
- [ ] `src/noteflow/infrastructure/persistence/repositories/_base.py` — timing logs for DB operations
- [ ] `src/noteflow/infrastructure/persistence/unit_of_work.py` — session/commit/rollback logs
- [ ] `src/noteflow/infrastructure/persistence/repositories/*_repo.py` — mutation logging
- [ ] `src/noteflow/infrastructure/persistence/repositories/asset_repo.py` — no-op delete log
- [ ] `src/noteflow/infrastructure/export/pdf.py` — duration + byte-size log
- [ ] `src/noteflow/infrastructure/export/markdown.py` — export count log
- [ ] `src/noteflow/infrastructure/export/html.py` — export count log
- [ ] `src/noteflow/infrastructure/diarization/session.py` — info-level close log
- [ ] `src/noteflow/grpc/_mixins/diarization/_jobs.py` — background task creation log
- [ ] `src/noteflow/infrastructure/audio/writer.py` — flush thread lifecycle logs
---
## Test Strategy
### Core test cases
- **Repositories**: `caplog` validates mutation logging for create/update/delete
- **UnitOfWork**: log emitted on commit/rollback paths
- **Exports**: ensure logs include duration and output size (bytes/segments)
- **Lifecycle**: diarization session close emits info log
---
## Quality Gates
- [ ] Logging includes structured fields and avoids payload content
- [ ] No new `# type: ignore` or `Any` introduced
- [ ] `pytest` passes for touched modules
- [ ] `ruff check` + `mypy` pass
---
## Post-Sprint
- [ ] Assess performance impact of repo timing logs
- [ ] Consider opt-in logging for high-volume read paths

View File

@@ -95,9 +95,7 @@ class UsageEvent:
event_type: str,
metrics: UsageMetrics,
*,
meeting_id: str | None = None,
success: bool = True,
error_code: str | None = None,
context: UsageEventContext | None = None,
attributes: dict[str, object] | None = None,
) -> UsageEvent:
"""Create usage event from metrics object.
@@ -105,28 +103,36 @@ class UsageEvent:
Args:
event_type: Event type identifier.
metrics: Provider/model metrics.
meeting_id: Associated meeting ID.
success: Whether the operation succeeded.
error_code: Error code if failed.
context: Context fields for the event.
attributes: Additional context attributes.
Returns:
New UsageEvent instance.
"""
resolved_context = context or UsageEventContext()
return cls(
event_type=event_type,
meeting_id=meeting_id,
meeting_id=resolved_context.meeting_id,
provider_name=metrics.provider_name,
model_name=metrics.model_name,
tokens_input=metrics.tokens_input,
tokens_output=metrics.tokens_output,
latency_ms=metrics.latency_ms,
success=success,
error_code=error_code,
success=resolved_context.success,
error_code=resolved_context.error_code,
attributes=attributes or {},
)
@dataclass(frozen=True, slots=True)
class UsageEventContext:
"""Common context fields for usage events."""
meeting_id: str | None = None
success: bool = True
error_code: str | None = None
class UsageEventSink(Protocol):
"""Protocol for usage event emission.
@@ -147,9 +153,7 @@ class UsageEventSink(Protocol):
event_type: str,
metrics: UsageMetrics | None = None,
*,
meeting_id: str | None = None,
success: bool = True,
error_code: str | None = None,
context: UsageEventContext | None = None,
**attributes: object,
) -> None:
"""Convenience method to record a usage event with common fields.
@@ -157,9 +161,7 @@ class UsageEventSink(Protocol):
Args:
event_type: Event type identifier.
metrics: Optional provider/model metrics.
meeting_id: Associated meeting ID.
success: Whether the operation succeeded.
error_code: Error code if failed.
context: Context fields for the event.
**attributes: Additional context attributes.
"""
...
@@ -176,9 +178,7 @@ class NullUsageEventSink:
event_type: str,
metrics: UsageMetrics | None = None,
*,
meeting_id: str | None = None,
success: bool = True,
error_code: str | None = None,
context: UsageEventContext | None = None,
**attributes: object,
) -> None:
"""Discard the event."""

View File

@@ -0,0 +1,42 @@
"""Meeting service shared types."""
from __future__ import annotations
from dataclasses import dataclass, field
from noteflow.domain.entities import WordTiming
@dataclass(frozen=True, slots=True)
class SegmentData:
"""Data for creating a transcript segment.
Groups segment parameters to reduce parameter count in service methods.
"""
segment_id: int
"""Segment sequence number."""
text: str
"""Transcript text."""
start_time: float
"""Start time in seconds."""
end_time: float
"""End time in seconds."""
words: list[WordTiming] = field(default_factory=list)
"""Optional word-level timing."""
language: str = "en"
"""Detected language code."""
language_confidence: float = 0.0
"""Language detection confidence."""
avg_logprob: float = 0.0
"""Average log probability."""
no_speech_prob: float = 0.0
"""No-speech probability."""

View File

@@ -6,7 +6,8 @@ Uses existing Integration entity and IntegrationRepository for persistence.
from __future__ import annotations
from typing import TYPE_CHECKING
from datetime import datetime
from typing import TYPE_CHECKING, TypedDict, Unpack
from uuid import UUID
from noteflow.config.constants import ERR_TOKEN_REFRESH_PREFIX, OAUTH_FIELD_ACCESS_TOKEN
@@ -23,6 +24,14 @@ from noteflow.infrastructure.calendar.oauth_manager import OAuthError
from noteflow.infrastructure.calendar.outlook_adapter import OutlookCalendarError
from noteflow.infrastructure.logging import get_logger
class _CalendarServiceDepsKwargs(TypedDict, total=False):
"""Optional dependency overrides for CalendarService."""
oauth_manager: OAuthManager
google_adapter: GoogleCalendarAdapter
outlook_adapter: OutlookCalendarAdapter
if TYPE_CHECKING:
from collections.abc import Callable
@@ -53,21 +62,20 @@ class CalendarService:
self,
uow_factory: Callable[[], UnitOfWork],
settings: CalendarIntegrationSettings,
oauth_manager: OAuthManager | None = None,
google_adapter: GoogleCalendarAdapter | None = None,
outlook_adapter: OutlookCalendarAdapter | None = None,
**kwargs: Unpack[_CalendarServiceDepsKwargs],
) -> None:
"""Initialize calendar service.
Args:
uow_factory: Factory function returning UnitOfWork instances.
settings: Calendar settings with OAuth credentials.
oauth_manager: Optional OAuth manager (created from settings if not provided).
google_adapter: Optional Google adapter (created if not provided).
outlook_adapter: Optional Outlook adapter (created if not provided).
**kwargs: Optional dependency overrides.
"""
self._uow_factory = uow_factory
self._settings = settings
oauth_manager = kwargs.get("oauth_manager")
google_adapter = kwargs.get("google_adapter")
outlook_adapter = kwargs.get("outlook_adapter")
self._oauth_manager = oauth_manager or OAuthManager(settings)
self._google_adapter = google_adapter or GoogleCalendarAdapter()
self._outlook_adapter = outlook_adapter or OutlookCalendarAdapter()
@@ -126,8 +134,22 @@ class CalendarService:
"""
oauth_provider = self._parse_provider(provider)
tokens = await self._exchange_tokens(oauth_provider, code, state)
email = await self._fetch_provider_email(oauth_provider, tokens.access_token)
integration_id = await self._store_calendar_integration(provider, email, tokens)
logger.info("Completed OAuth for provider=%s, email=%s", provider, email)
return integration_id
async def _exchange_tokens(
self,
oauth_provider: OAuthProvider,
code: str,
state: str,
) -> OAuthTokens:
"""Exchange authorization code for tokens."""
try:
tokens = await self._oauth_manager.complete_auth(
return await self._oauth_manager.complete_auth(
provider=oauth_provider,
code=code,
state=state,
@@ -135,13 +157,24 @@ class CalendarService:
except OAuthError as e:
raise CalendarServiceError(f"OAuth failed: {e}") from e
# Get user email from provider
async def _fetch_provider_email(
self,
oauth_provider: OAuthProvider,
access_token: str,
) -> str:
"""Fetch the account email for a provider."""
try:
email = await self._fetch_account_email(oauth_provider, tokens.access_token)
return await self._fetch_account_email(oauth_provider, access_token)
except (GoogleCalendarError, OutlookCalendarError) as e:
raise CalendarServiceError(f"Failed to get user email: {e}") from e
# Persist integration and tokens
async def _store_calendar_integration(
self,
provider: str,
email: str,
tokens: OAuthTokens,
) -> UUID:
"""Persist calendar integration and encrypted tokens."""
async with self._uow_factory() as uow:
integration = await uow.integrations.get_by_provider(
provider=provider,
@@ -162,18 +195,13 @@ class CalendarService:
integration.connect(provider_email=email)
await uow.integrations.update(integration)
# Store encrypted tokens
await uow.integrations.set_secrets(
integration_id=integration.id,
secrets=tokens.to_secrets_dict(),
)
await uow.commit()
# Capture ID before leaving context manager
integration_id = integration.id
logger.info("Completed OAuth for provider=%s, email=%s", provider, email)
return integration_id
return integration.id
async def get_connection_status(self, provider: str) -> OAuthConnectionInfo:
"""Get OAuth connection status for a provider.
@@ -198,17 +226,7 @@ class CalendarService:
# Check token expiry
secrets = await uow.integrations.get_secrets(integration.id)
expires_at = None
status = self._map_integration_status(integration.status)
if secrets and integration.is_connected:
try:
tokens = OAuthTokens.from_secrets_dict(secrets)
expires_at = tokens.expires_at
if tokens.is_expired():
status = "expired"
except (KeyError, ValueError):
status = IntegrationStatus.ERROR.value
status, expires_at = self._resolve_connection_status(integration, secrets)
return OAuthConnectionInfo(
provider=provider,
@@ -418,3 +436,23 @@ class CalendarService:
def _map_integration_status(status: IntegrationStatus) -> str:
"""Map IntegrationStatus to connection status string."""
return status.value if status in IntegrationStatus else IntegrationStatus.DISCONNECTED.value
@staticmethod
def _resolve_connection_status(
integration: Integration,
secrets: dict[str, str] | None,
) -> tuple[str, datetime | None]:
"""Resolve connection status and expiration time from stored secrets."""
status = CalendarService._map_integration_status(integration.status)
if not secrets or not integration.is_connected:
return status, None
try:
tokens = OAuthTokens.from_secrets_dict(secrets)
except (KeyError, ValueError):
return IntegrationStatus.ERROR.value, None
expires_at = tokens.expires_at
if tokens.is_expired():
return "expired", expires_at
return status, expires_at

View File

@@ -172,44 +172,72 @@ class ExportService:
format=fmt.value if fmt else "inferred",
)
# Determine format from extension if not provided
if fmt is None:
fmt = self.infer_format_from_extension(output_path.suffix)
logger.debug(
"Format inferred from extension",
extension=output_path.suffix,
inferred_format=fmt.value,
)
fmt = self._resolve_export_format(output_path, fmt)
content = await self.export_transcript(meeting_id, fmt)
# Ensure correct extension
exporter = self.get_exporter(fmt)
original_path = output_path
if output_path.suffix != exporter.file_extension:
output_path = output_path.with_suffix(exporter.file_extension)
logger.debug(
"Adjusted file extension",
original_path=str(original_path),
adjusted_path=str(output_path),
expected_extension=exporter.file_extension,
)
output_path = self._ensure_output_extension(output_path, exporter)
file_size = self._write_export_content(output_path, content, meeting_id, fmt)
logger.info(
"File export completed",
meeting_id=str(meeting_id),
output_path=str(output_path),
format=fmt.value,
file_size_bytes=file_size,
)
return output_path
def _resolve_export_format(
self,
output_path: Path,
fmt: ExportFormat | None,
) -> ExportFormat:
"""Resolve export format, inferring from file extension if needed."""
if fmt is not None:
return fmt
inferred = self.infer_format_from_extension(output_path.suffix)
logger.debug(
"Format inferred from extension",
extension=output_path.suffix,
inferred_format=inferred.value,
)
return inferred
def _ensure_output_extension(
self,
output_path: Path,
exporter: TranscriptExporter,
) -> Path:
"""Ensure output path uses the exporter's extension."""
if output_path.suffix == exporter.file_extension:
return output_path
adjusted_path = output_path.with_suffix(exporter.file_extension)
logger.debug(
"Adjusted file extension",
original_path=str(output_path),
adjusted_path=str(adjusted_path),
expected_extension=exporter.file_extension,
)
return adjusted_path
def _write_export_content(
self,
output_path: Path,
content: str | bytes,
meeting_id: MeetingId,
fmt: ExportFormat,
) -> int:
"""Write export content to disk and return file size."""
output_path.parent.mkdir(parents=True, exist_ok=True)
try:
if isinstance(content, bytes):
output_path.write_bytes(content)
else:
output_path.write_text(content, encoding="utf-8")
file_size = output_path.stat().st_size
logger.info(
"File export completed",
meeting_id=str(meeting_id),
output_path=str(output_path),
format=fmt.value,
file_size_bytes=file_size,
)
return output_path.stat().st_size
except OSError as exc:
logger.error(
"File write failed",
@@ -219,8 +247,6 @@ class ExportService:
)
raise
return output_path
def infer_format_from_extension(self, extension: str) -> ExportFormat:
"""Infer export format from file extension.

View File

@@ -6,22 +6,14 @@ Orchestrates meeting-related use cases with persistence.
from __future__ import annotations
from collections.abc import Sequence
from dataclasses import dataclass, field
from datetime import UTC, datetime
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, NotRequired, Required, TypedDict, Unpack
from noteflow.domain.entities import (
ActionItem,
Annotation,
KeyPoint,
Meeting,
Segment,
Summary,
WordTiming,
)
from noteflow.domain.entities import ActionItem, Annotation, KeyPoint, Meeting, Segment, Summary
from noteflow.domain.value_objects import AnnotationId, AnnotationType, MeetingId
from noteflow.infrastructure.logging import get_logger, log_state_transition
from ._meeting_types import SegmentData
if TYPE_CHECKING:
from collections.abc import Sequence as SequenceType
@@ -31,39 +23,24 @@ if TYPE_CHECKING:
logger = get_logger(__name__)
@dataclass(frozen=True, slots=True)
class SegmentData:
"""Data for creating a transcript segment.
class _SummarySaveKwargs(TypedDict, total=False):
"""Optional summary fields for save_summary."""
Groups segment parameters to reduce parameter count in service methods.
"""
key_points: list[KeyPoint] | None
action_items: list[ActionItem] | None
provider_name: str
model_name: str
segment_id: int
"""Segment sequence number."""
text: str
"""Transcript text."""
class _AnnotationCreateKwargs(TypedDict):
"""Required fields for creating an annotation."""
start_time: float
"""Start time in seconds."""
end_time: float
"""End time in seconds."""
words: list[WordTiming] = field(default_factory=list)
"""Optional word-level timing."""
language: str = "en"
"""Detected language code."""
language_confidence: float = 0.0
"""Language detection confidence."""
avg_logprob: float = 0.0
"""Average log probability."""
no_speech_prob: float = 0.0
"""No-speech probability."""
meeting_id: Required[MeetingId]
annotation_type: Required[AnnotationType]
text: Required[str]
start_time: Required[float]
end_time: Required[float]
segment_ids: NotRequired[list[int] | None]
class MeetingService:
@@ -338,29 +315,27 @@ class MeetingService:
self,
meeting_id: MeetingId,
executive_summary: str,
key_points: list[KeyPoint] | None = None,
action_items: list[ActionItem] | None = None,
provider_name: str = "",
model_name: str = "",
**kwargs: Unpack[_SummarySaveKwargs],
) -> Summary:
"""Save or update a meeting summary.
Args:
meeting_id: Meeting identifier.
executive_summary: Executive summary text.
key_points: List of key points.
action_items: List of action items.
provider_name: Name of the provider that generated the summary.
model_name: Name of the model that generated the summary.
**kwargs: Optional summary fields (key_points, action_items, provider_name, model_name).
Returns:
Saved summary.
"""
key_points = kwargs.get("key_points") or []
action_items = kwargs.get("action_items") or []
provider_name = kwargs.get("provider_name", "")
model_name = kwargs.get("model_name", "")
summary = Summary(
meeting_id=meeting_id,
executive_summary=executive_summary,
key_points=key_points or [],
action_items=action_items or [],
key_points=key_points,
action_items=action_items,
generated_at=datetime.now(UTC),
provider_name=provider_name,
model_name=model_name,
@@ -386,28 +361,24 @@ class MeetingService:
async def add_annotation(
self,
meeting_id: MeetingId,
annotation_type: AnnotationType,
text: str,
start_time: float,
end_time: float,
segment_ids: list[int] | None = None,
**kwargs: Unpack[_AnnotationCreateKwargs],
) -> Annotation:
"""Add an annotation to a meeting.
Args:
meeting_id: Meeting identifier.
annotation_type: Type of annotation.
text: Annotation text.
start_time: Start time in seconds.
end_time: End time in seconds.
segment_ids: Optional list of linked segment IDs.
**kwargs: Annotation fields.
Returns:
Added annotation.
"""
from uuid import uuid4
meeting_id = kwargs["meeting_id"]
annotation_type = kwargs["annotation_type"]
text = kwargs["text"]
start_time = kwargs["start_time"]
end_time = kwargs["end_time"]
segment_ids = kwargs.get("segment_ids") or []
annotation = Annotation(
id=AnnotationId(uuid4()),
meeting_id=meeting_id,
@@ -415,7 +386,7 @@ class MeetingService:
text=text,
start_time=start_time,
end_time=end_time,
segment_ids=segment_ids or [],
segment_ids=segment_ids,
)
async with self._uow:

View File

@@ -13,7 +13,7 @@ from typing import TYPE_CHECKING
from noteflow.config.constants import ERROR_MSG_MEETING_PREFIX
from noteflow.config.settings import get_feature_flags
from noteflow.domain.entities.named_entity import NamedEntity
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.logging import get_logger, log_timing
if TYPE_CHECKING:
from collections.abc import Callable, Sequence
@@ -176,24 +176,41 @@ class NerService:
List of extracted entities.
"""
async with self._extraction_lock:
# Ensure model is loaded (thread-safe)
if not self._ner_engine.is_ready():
async with self._model_load_lock:
if not self._ner_engine.is_ready():
# Warm up model with a simple extraction
loop = asyncio.get_running_loop()
await loop.run_in_executor(
None,
lambda: self._ner_engine.extract("warm up"),
)
await self._ensure_model_ready()
return await self._extract_entities(segments)
# Extract entities in executor (CPU-bound)
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
async def _ensure_model_ready(self) -> None:
"""Ensure the NER model is loaded and warmed up safely."""
if self._ner_engine.is_ready():
return
async with self._model_load_lock:
if self._ner_engine.is_ready():
return
await self._warmup_model()
async def _warmup_model(self) -> None:
"""Warm up the NER model with a simple extraction."""
loop = asyncio.get_running_loop()
with log_timing("ner_warmup"):
await loop.run_in_executor(
None,
lambda: self._ner_engine.extract("warm up"),
)
async def _extract_entities(
self,
segments: list[tuple[int, str]],
) -> list[NamedEntity]:
"""Extract entities in an executor (CPU-bound)."""
loop = asyncio.get_running_loop()
segment_count = len(segments)
with log_timing("ner_extraction", segment_count=segment_count):
entities = await loop.run_in_executor(
None,
self._ner_engine.extract_from_segments,
segments,
)
return entities
async def get_entities(self, meeting_id: MeetingId) -> Sequence[NamedEntity]:
"""Get cached entities for a meeting (no extraction).

View File

@@ -2,7 +2,7 @@
from __future__ import annotations
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, TypedDict, Unpack
from uuid import UUID, uuid4
from noteflow.domain.entities.project import Project, ProjectSettings, slugify
@@ -13,6 +13,31 @@ from ._types import ProjectCrudRepositoryProvider
if TYPE_CHECKING:
from collections.abc import Sequence
class _ProjectCreateKwargs(TypedDict, total=False):
"""Optional fields for project creation."""
slug: str | None
description: str | None
settings: ProjectSettings | None
class _ProjectListKwargs(TypedDict, total=False):
"""Optional fields for listing projects."""
include_archived: bool
limit: int
offset: int
class _ProjectUpdateKwargs(TypedDict, total=False):
"""Optional fields for project updates."""
name: str | None
slug: str | None
description: str | None
settings: ProjectSettings | None
logger = get_logger(__name__)
@@ -24,9 +49,7 @@ class ProjectCrudMixin:
uow: ProjectCrudRepositoryProvider,
workspace_id: UUID,
name: str,
slug: str | None = None,
description: str | None = None,
settings: ProjectSettings | None = None,
**kwargs: Unpack[_ProjectCreateKwargs],
) -> Project:
"""Create a new project in a workspace.
@@ -34,9 +57,7 @@ class ProjectCrudMixin:
uow: Unit of work for database access.
workspace_id: Parent workspace UUID.
name: Project name.
slug: Optional URL slug (auto-generated from name if not provided).
description: Optional project description.
settings: Optional project settings.
**kwargs: Optional fields (slug, description, settings).
Returns:
Created project.
@@ -49,6 +70,9 @@ class ProjectCrudMixin:
raise NotImplementedError(msg)
project_id = uuid4()
slug = kwargs.get("slug")
description = kwargs.get("description")
settings = kwargs.get("settings")
generated_slug = slug or slugify(name)
project = await uow.projects.create(
@@ -124,18 +148,14 @@ class ProjectCrudMixin:
self,
uow: ProjectCrudRepositoryProvider,
workspace_id: UUID,
include_archived: bool = False,
limit: int = 50,
offset: int = 0,
**kwargs: Unpack[_ProjectListKwargs],
) -> Sequence[Project]:
"""List projects in a workspace.
Args:
uow: Unit of work for database access.
workspace_id: Workspace UUID.
include_archived: Whether to include archived projects.
limit: Maximum projects to return.
offset: Pagination offset.
**kwargs: Optional filters (include_archived, limit, offset).
Returns:
List of projects.
@@ -143,6 +163,9 @@ class ProjectCrudMixin:
if not uow.supports_projects:
return []
include_archived = kwargs.get("include_archived", False)
limit = kwargs.get("limit", 50)
offset = kwargs.get("offset", 0)
return await uow.projects.list_for_workspace(
workspace_id,
include_archived=include_archived,
@@ -154,20 +177,14 @@ class ProjectCrudMixin:
self,
uow: ProjectCrudRepositoryProvider,
project_id: UUID,
name: str | None = None,
slug: str | None = None,
description: str | None = None,
settings: ProjectSettings | None = None,
**kwargs: Unpack[_ProjectUpdateKwargs],
) -> Project | None:
"""Update a project.
Args:
uow: Unit of work for database access.
project_id: Project UUID.
name: New name (optional).
slug: New slug (optional).
description: New description (optional).
settings: New settings (optional).
**kwargs: Optional updates (name, slug, description, settings).
Returns:
Updated project if found, None otherwise.
@@ -179,6 +196,11 @@ class ProjectCrudMixin:
if not project:
return None
name = kwargs.get("name")
slug = kwargs.get("slug")
description = kwargs.get("description")
settings = kwargs.get("settings")
if name is not None:
project.update_name(name)
if slug is not None:

View File

@@ -48,6 +48,14 @@ class RecoveryResult:
return self.meetings_recovered + self.diarization_jobs_failed
@dataclass(frozen=True)
class _RecoveryValidation:
"""Result of meeting recovery checks."""
is_valid: bool
previous_state: MeetingState
class RecoveryService:
"""Recover meetings from crash states on server startup.
@@ -170,40 +178,15 @@ class RecoveryService:
recovery_time = datetime.now(UTC).isoformat()
for meeting in meetings:
previous_state = meeting.state
meeting.mark_error()
log_state_transition(
"meeting",
str(meeting.id),
previous_state,
meeting.state,
reason="crash_recovery",
)
# Add crash recovery metadata
meeting.metadata["crash_recovered"] = "true"
meeting.metadata["crash_recovery_time"] = recovery_time
meeting.metadata["crash_previous_state"] = previous_state.name
# Validate audio files if configured
validation = self.validate_meeting_audio(meeting)
meeting.metadata["audio_valid"] = str(validation.is_valid).lower()
validation = self._recover_meeting(meeting, recovery_time)
if not validation.is_valid:
audio_failures += 1
meeting.metadata["audio_error"] = validation.error_message or "unknown"
logger.warning(
"Audio validation failed for meeting %s: %s",
meeting.id,
validation.error_message,
)
await self._uow.meetings.update(meeting)
recovered.append(meeting)
logger.info(
"Recovered crashed meeting: id=%s, previous_state=%s, audio_valid=%s",
meeting.id,
previous_state,
validation.previous_state,
validation.is_valid,
)
@@ -215,6 +198,40 @@ class RecoveryService:
)
return recovered, audio_failures
def _recover_meeting(
self, meeting: Meeting, recovery_time: str
) -> _RecoveryValidation:
"""Apply crash recovery updates to a single meeting."""
previous_state = meeting.state
meeting.mark_error()
log_state_transition(
"meeting",
str(meeting.id),
previous_state,
meeting.state,
reason="crash_recovery",
)
meeting.metadata["crash_recovered"] = "true"
meeting.metadata["crash_recovery_time"] = recovery_time
meeting.metadata["crash_previous_state"] = previous_state.name
validation = self.validate_meeting_audio(meeting)
meeting.metadata["audio_valid"] = str(validation.is_valid).lower()
if not validation.is_valid:
meeting.metadata["audio_error"] = validation.error_message or "unknown"
logger.warning(
"Audio validation failed for meeting %s: %s",
meeting.id,
validation.error_message,
)
return _RecoveryValidation(
is_valid=validation.is_valid,
previous_state=previous_state,
)
async def count_crashed_meetings(self) -> int:
"""Count meetings currently in crash states.

View File

@@ -7,7 +7,7 @@ from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, TypedDict, Unpack
from noteflow.application.observability.ports import (
NullUsageEventSink,
@@ -40,6 +40,15 @@ if TYPE_CHECKING:
logger = get_logger(__name__)
class _SummarizationOptionsKwargs(TypedDict, total=False):
"""Optional overrides for summarization behavior."""
mode: SummarizationMode | None
max_key_points: int | None
max_action_items: int | None
style_prompt: str | None
class SummarizationMode(Enum):
"""Available summarization modes."""
@@ -182,20 +191,14 @@ class SummarizationService:
self,
meeting_id: MeetingId,
segments: Sequence[Segment],
mode: SummarizationMode | None = None,
max_key_points: int | None = None,
max_action_items: int | None = None,
style_prompt: str | None = None,
**kwargs: Unpack[_SummarizationOptionsKwargs],
) -> SummarizationServiceResult:
"""Generate evidence-linked summary for meeting transcript.
Args:
meeting_id: The meeting ID.
segments: Transcript segments to summarize.
mode: Override default mode (None uses settings default).
max_key_points: Override default max key points.
max_action_items: Override default max action items.
style_prompt: Optional style instruction to prepend to system prompt.
**kwargs: Optional overrides (mode, max_key_points, max_action_items, style_prompt).
Returns:
SummarizationServiceResult with summary and verification.
@@ -204,6 +207,11 @@ class SummarizationService:
SummarizationError: If summarization fails and no fallback available.
ProviderUnavailableError: If no provider is available for the mode.
"""
mode = kwargs.get("mode")
max_key_points = kwargs.get("max_key_points")
max_action_items = kwargs.get("max_action_items")
style_prompt = kwargs.get("style_prompt")
target_mode = mode or self.settings.default_mode
provider, actual_mode = self._get_provider_with_fallback(target_mode)
fallback_used = actual_mode != target_mode

View File

@@ -203,29 +203,7 @@ class WebhookService:
try:
delivery = await self._executor.deliver(config, event_type, payload)
deliveries.append(delivery)
if delivery.succeeded:
_logger.info(
"Webhook delivered: %s -> %s (status=%d)",
event_type.value,
config.url,
delivery.status_code,
)
elif delivery.attempt_count > 0:
_logger.warning(
"Webhook failed: %s -> %s (error=%s)",
event_type.value,
config.url,
delivery.error_message,
)
else:
_logger.debug(
"Webhook skipped: %s -> %s (reason=%s)",
event_type.value,
config.url,
delivery.error_message,
)
self._log_delivery(event_type, config.url, delivery)
# INTENTIONAL BROAD HANDLER: Fire-and-forget webhook delivery
# - Webhook failures must never block calling code
# - All exceptions logged but suppressed
@@ -234,6 +212,37 @@ class WebhookService:
return deliveries
@staticmethod
def _log_delivery(
event_type: WebhookEventType,
url: str,
delivery: WebhookDelivery,
) -> None:
if delivery.succeeded:
_logger.info(
"Webhook delivered: %s -> %s (status=%d)",
event_type.value,
url,
delivery.status_code,
)
return
if delivery.attempt_count > 0:
_logger.warning(
"Webhook failed: %s -> %s (error=%s)",
event_type.value,
url,
delivery.error_message,
)
return
_logger.debug(
"Webhook skipped: %s -> %s (reason=%s)",
event_type.value,
url,
delivery.error_message,
)
async def close(self) -> None:
"""Clean up resources."""
await self._executor.close()

View File

@@ -10,7 +10,7 @@ from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime
from enum import StrEnum
from typing import Self, cast
from typing import NotRequired, Required, Self, TypedDict, Unpack, cast
from uuid import UUID, uuid4
from noteflow.domain.utils.time import utc_now
@@ -192,6 +192,28 @@ class OidcProviderCreateParams:
"""Whether to require email verification."""
@dataclass(frozen=True, slots=True)
class OidcProviderRegistration:
"""Required fields for registering an OIDC provider."""
workspace_id: UUID
name: str
issuer_url: str
client_id: str
client_secret: str | None = None
preset: OidcProviderPreset = OidcProviderPreset.CUSTOM
class _OidcProviderCreateKwargs(TypedDict):
"""Keyword arguments for OidcProviderConfig.create."""
workspace_id: Required[UUID]
name: Required[str]
issuer_url: Required[str]
client_id: Required[str]
params: NotRequired[OidcProviderCreateParams | None]
@dataclass
class OidcProviderConfig:
"""OIDC provider configuration.
@@ -231,24 +253,21 @@ class OidcProviderConfig:
@classmethod
def create(
cls,
workspace_id: UUID,
name: str,
issuer_url: str,
client_id: str,
params: OidcProviderCreateParams | None = None,
**kwargs: Unpack[_OidcProviderCreateKwargs],
) -> OidcProviderConfig:
"""Create a new OIDC provider configuration.
Args:
workspace_id: Workspace this provider belongs to.
name: Display name for the provider.
issuer_url: OIDC issuer URL (base URL for discovery).
client_id: OAuth client ID.
params: Optional creation parameters (preset, scopes, etc.).
**kwargs: Provider creation fields.
Returns:
New OidcProviderConfig instance.
"""
workspace_id = kwargs["workspace_id"]
name = kwargs["name"]
issuer_url = kwargs["issuer_url"]
client_id = kwargs["client_id"]
params = kwargs.get("params")
p = params or OidcProviderCreateParams()
now = utc_now()
return cls(

View File

@@ -4,7 +4,7 @@ from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, NotRequired, Required, TypedDict, Unpack
from uuid import UUID, uuid4
if TYPE_CHECKING:
@@ -53,6 +53,16 @@ class EntityCategory(Enum):
raise ValueError(f"Invalid entity category: {value}") from e
class _NamedEntityCreateKwargs(TypedDict):
"""Keyword arguments for NamedEntity.create."""
text: Required[str]
category: Required[EntityCategory]
segment_ids: Required[list[int]]
confidence: Required[float]
meeting_id: NotRequired[MeetingId | None]
@dataclass
class NamedEntity:
"""A named entity extracted from a meeting transcript.
@@ -85,11 +95,7 @@ class NamedEntity:
@classmethod
def create(
cls,
text: str,
category: EntityCategory,
segment_ids: list[int],
confidence: float,
meeting_id: MeetingId | None = None,
**kwargs: Unpack[_NamedEntityCreateKwargs],
) -> NamedEntity:
"""Create a new named entity with validation and normalization.
@@ -97,11 +103,7 @@ class NamedEntity:
and confidence validation before entity construction.
Args:
text: The entity text as it appears in transcript.
category: Classification category.
segment_ids: Segments where entity appears (will be deduplicated and sorted).
confidence: Extraction confidence (0.0-1.0).
meeting_id: Optional meeting association.
**kwargs: Named entity fields.
Returns:
New NamedEntity instance with normalized fields.
@@ -110,6 +112,12 @@ class NamedEntity:
ValueError: If text is empty or confidence is out of range.
"""
# Validate required text
text = kwargs["text"]
category = kwargs["category"]
segment_ids = kwargs["segment_ids"]
confidence = kwargs["confidence"]
meeting_id = kwargs.get("meeting_id")
stripped_text = text.strip()
if not stripped_text:
raise ValueError("Entity text cannot be empty")

View File

@@ -7,7 +7,7 @@ from __future__ import annotations
from collections.abc import Sequence
from datetime import datetime
from typing import TYPE_CHECKING, Protocol
from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
from noteflow.config.constants import ERR_SERVER_RESTARTED
@@ -19,6 +19,15 @@ if TYPE_CHECKING:
)
class DiarizationStatusKwargs(TypedDict, total=False):
"""Optional fields for diarization job status updates."""
segments_updated: int | None
speaker_ids: list[str] | None
error_message: str | None
started_at: datetime | None
class DiarizationJobRepository(Protocol):
"""Repository protocol for DiarizationJob operations.
@@ -52,21 +61,14 @@ class DiarizationJobRepository(Protocol):
self,
job_id: str,
status: int,
*,
segments_updated: int | None = None,
speaker_ids: list[str] | None = None,
error_message: str | None = None,
started_at: datetime | None = None,
**kwargs: Unpack[DiarizationStatusKwargs],
) -> bool:
"""Update job status and optional fields.
Args:
job_id: Job identifier.
status: New status value.
segments_updated: Optional segments count.
speaker_ids: Optional speaker IDs list.
error_message: Optional error message.
started_at: Optional job start timestamp.
**kwargs: Optional update fields.
Returns:
True if job was updated, False if not found.

View File

@@ -3,7 +3,8 @@
from __future__ import annotations
from collections.abc import Sequence
from typing import TYPE_CHECKING, Protocol
from dataclasses import dataclass
from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
if TYPE_CHECKING:
from uuid import UUID
@@ -11,6 +12,25 @@ if TYPE_CHECKING:
from noteflow.domain.entities.project import Project, ProjectSettings
@dataclass(frozen=True, slots=True)
class ProjectCreateOptions:
"""Optional parameters for project creation."""
slug: str | None = None
description: str | None = None
is_default: bool = False
settings: ProjectSettings | None = None
class ProjectCreateKwargs(TypedDict, total=False):
"""Legacy keyword arguments for project creation."""
slug: str | None
description: str | None
is_default: bool
settings: ProjectSettings | None
class ProjectRepository(Protocol):
"""Repository protocol for Project operations."""
@@ -60,10 +80,7 @@ class ProjectRepository(Protocol):
project_id: UUID,
workspace_id: UUID,
name: str,
slug: str | None = None,
description: str | None = None,
is_default: bool = False,
settings: ProjectSettings | None = None,
**kwargs: Unpack[ProjectCreateKwargs],
) -> Project:
"""Create a new project.
@@ -71,10 +88,7 @@ class ProjectRepository(Protocol):
project_id: UUID for the new project.
workspace_id: Parent workspace UUID.
name: Project name.
slug: Optional URL slug.
description: Optional description.
is_default: Whether this is the workspace's default project.
settings: Optional project settings.
**kwargs: Optional creation settings.
Returns:
Created project.

View File

@@ -3,7 +3,7 @@
from __future__ import annotations
from collections.abc import Sequence
from typing import TYPE_CHECKING, Protocol
from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
if TYPE_CHECKING:
from uuid import UUID
@@ -16,6 +16,14 @@ if TYPE_CHECKING:
)
class WorkspaceCreateKwargs(TypedDict, total=False):
"""Optional workspace creation fields."""
slug: str | None
is_default: bool
settings: WorkspaceSettings | None
class WorkspaceRepository(Protocol):
"""Repository protocol for Workspace operations."""
@@ -57,9 +65,7 @@ class WorkspaceRepository(Protocol):
workspace_id: UUID,
name: str,
owner_id: UUID,
slug: str | None = None,
is_default: bool = False,
settings: WorkspaceSettings | None = None,
**kwargs: Unpack[WorkspaceCreateKwargs],
) -> Workspace:
"""Create a new workspace.
@@ -67,9 +73,7 @@ class WorkspaceRepository(Protocol):
workspace_id: UUID for the new workspace.
name: Workspace name.
owner_id: User UUID of the owner.
slug: Optional URL slug.
is_default: Whether this is the user's default workspace.
settings: Optional workspace settings.
**kwargs: Optional fields (slug, is_default, settings).
Returns:
Created workspace.

View File

@@ -7,7 +7,7 @@ from __future__ import annotations
from collections.abc import Sequence
from datetime import datetime
from typing import TYPE_CHECKING, Protocol
from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
if TYPE_CHECKING:
from uuid import UUID
@@ -16,6 +16,16 @@ if TYPE_CHECKING:
from noteflow.domain.value_objects import AnnotationId, MeetingId, MeetingState
class MeetingListKwargs(TypedDict, total=False):
"""Optional arguments for listing meetings."""
states: list[MeetingState] | None
limit: int
offset: int
sort_desc: bool
project_id: UUID | None
class MeetingRepository(Protocol):
"""Repository protocol for Meeting aggregate operations."""
@@ -68,20 +78,12 @@ class MeetingRepository(Protocol):
async def list_all(
self,
states: list[MeetingState] | None = None,
limit: int = 100,
offset: int = 0,
sort_desc: bool = True,
project_id: UUID | None = None,
**kwargs: Unpack[MeetingListKwargs],
) -> tuple[Sequence[Meeting], int]:
"""List meetings with optional filtering.
Args:
states: Optional list of states to filter by.
limit: Maximum number of meetings to return.
offset: Number of meetings to skip.
sort_desc: Sort by created_at descending if True.
project_id: Optional project scope filter.
**kwargs: Optional filters (states, limit, offset, sort_desc, project_id).
Returns:
Tuple of (meetings list, total count matching filter).

View File

@@ -9,7 +9,7 @@ from __future__ import annotations
from dataclasses import asdict, dataclass, field
from datetime import datetime
from enum import Enum
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, NotRequired, Required, TypedDict, Unpack
from uuid import UUID, uuid4
from noteflow.domain.utils.time import utc_now
@@ -88,29 +88,23 @@ class WebhookConfig:
@classmethod
def create(
cls,
workspace_id: UUID,
url: str,
events: list[WebhookEventType],
*,
name: str = "Webhook",
secret: str | None = None,
timeout_ms: int = DEFAULT_WEBHOOK_TIMEOUT_MS,
max_retries: int = DEFAULT_WEBHOOK_MAX_RETRIES,
**kwargs: Unpack["WebhookConfigCreateKwargs"],
) -> WebhookConfig:
"""Create a new webhook configuration.
Args:
workspace_id: Workspace UUID.
url: Target URL for delivery.
events: List of event types to subscribe.
name: Display name.
secret: Optional HMAC signing secret.
timeout_ms: Request timeout in milliseconds.
max_retries: Maximum retry attempts.
**kwargs: Webhook config fields.
Returns:
New WebhookConfig with generated ID and timestamps.
"""
workspace_id = kwargs["workspace_id"]
url = kwargs["url"]
events = kwargs["events"]
name = kwargs.get("name", "Webhook")
secret = kwargs.get("secret")
timeout_ms = kwargs.get("timeout_ms", DEFAULT_WEBHOOK_TIMEOUT_MS)
max_retries = kwargs.get("max_retries", DEFAULT_WEBHOOK_MAX_RETRIES)
now = utc_now()
return cls(
id=uuid4(),
@@ -137,6 +131,28 @@ class WebhookConfig:
return event_type in self.events
@dataclass(frozen=True, slots=True)
class WebhookConfigCreateOptions:
"""Optional parameters for webhook config creation."""
name: str = "Webhook"
secret: str | None = None
timeout_ms: int = DEFAULT_WEBHOOK_TIMEOUT_MS
max_retries: int = DEFAULT_WEBHOOK_MAX_RETRIES
class WebhookConfigCreateKwargs(TypedDict):
"""Keyword arguments for webhook config creation."""
workspace_id: Required[UUID]
url: Required[str]
events: Required[list[WebhookEventType]]
name: NotRequired[str]
secret: NotRequired[str | None]
timeout_ms: NotRequired[int]
max_retries: NotRequired[int]
@dataclass(frozen=True, slots=True)
class DeliveryResult:
"""Result of a webhook delivery attempt.

146
src/noteflow/grpc/_cli.py Normal file
View File

@@ -0,0 +1,146 @@
"""CLI helpers for the gRPC server entrypoint."""
from __future__ import annotations
import argparse
from typing import TYPE_CHECKING
from noteflow.config.constants import DEFAULT_GRPC_PORT
from noteflow.infrastructure.asr.engine import VALID_MODEL_SIZES
from noteflow.infrastructure.logging import get_logger
from ._config import (
DEFAULT_BIND_ADDRESS,
DEFAULT_MODEL,
AsrConfig,
DiarizationConfig,
GrpcServerConfig,
)
logger = get_logger(__name__)
if TYPE_CHECKING:
from noteflow.config.settings import Settings
def parse_args() -> argparse.Namespace:
"""Parse command-line arguments for the gRPC server."""
parser = argparse.ArgumentParser(description="NoteFlow gRPC Server")
parser.add_argument(
"-p",
"--port",
type=int,
default=DEFAULT_GRPC_PORT,
help=f"Port to listen on (default: {DEFAULT_GRPC_PORT})",
)
parser.add_argument(
"-m",
"--model",
type=str,
default=DEFAULT_MODEL,
choices=list(VALID_MODEL_SIZES),
help=f"ASR model size (default: {DEFAULT_MODEL})",
)
parser.add_argument(
"-d",
"--device",
type=str,
default="cpu",
choices=["cpu", "cuda"],
help="ASR device (default: cpu)",
)
parser.add_argument(
"-c",
"--compute-type",
type=str,
default="int8",
choices=["int8", "float16", "float32"],
help="ASR compute type (default: int8)",
)
parser.add_argument(
"--database-url",
type=str,
default=None,
help="PostgreSQL database URL (overrides NOTEFLOW_DATABASE_URL)",
)
parser.add_argument(
"-v",
"--verbose",
action="store_true",
help="Enable verbose logging",
)
parser.add_argument(
"--diarization",
action="store_true",
help="Enable speaker diarization (requires pyannote.audio)",
)
parser.add_argument(
"--diarization-hf-token",
type=str,
default=None,
help="HuggingFace token for pyannote models (overrides NOTEFLOW_DIARIZATION_HF_TOKEN)",
)
parser.add_argument(
"--diarization-device",
type=str,
default="auto",
choices=["auto", "cpu", "cuda", "mps"],
help="Device for diarization (default: auto)",
)
return parser.parse_args()
def build_config_from_args(args: argparse.Namespace, settings: Settings | None) -> GrpcServerConfig:
"""Build server configuration from CLI arguments and settings.
CLI arguments take precedence over environment settings.
"""
database_url = args.database_url
if not database_url and settings:
database_url = str(settings.database_url)
if not database_url:
logger.warning("No database URL configured, running in-memory mode")
diarization_enabled = args.diarization
diarization_hf_token = args.diarization_hf_token
diarization_device = args.diarization_device
diarization_streaming_latency: float | None = None
diarization_min_speakers: int | None = None
diarization_max_speakers: int | None = None
diarization_refinement_enabled = True
if settings and not diarization_enabled:
diarization_enabled = settings.diarization_enabled
if settings and not diarization_hf_token:
diarization_hf_token = settings.diarization_hf_token
if settings and diarization_device == "auto":
diarization_device = settings.diarization_device
if settings:
diarization_streaming_latency = settings.diarization_streaming_latency
diarization_min_speakers = settings.diarization_min_speakers
diarization_max_speakers = settings.diarization_max_speakers
diarization_refinement_enabled = settings.diarization_refinement_enabled
bind_address = DEFAULT_BIND_ADDRESS
if settings:
bind_address = settings.grpc_bind_address
return GrpcServerConfig(
port=args.port,
bind_address=bind_address,
asr=AsrConfig(
model=args.model,
device=args.device,
compute_type=args.compute_type,
),
database_url=database_url,
diarization=DiarizationConfig(
enabled=diarization_enabled,
hf_token=diarization_hf_token,
device=diarization_device,
streaming_latency=diarization_streaming_latency,
min_speakers=diarization_min_speakers,
max_speakers=diarization_max_speakers,
refinement_enabled=diarization_refinement_enabled,
),
)

View File

@@ -2,19 +2,41 @@
from __future__ import annotations
from typing import TYPE_CHECKING, cast
from typing import TYPE_CHECKING, NotRequired, Required, TypedDict, Unpack, cast
import grpc
from noteflow.grpc._client_mixins.converters import annotation_type_to_proto, proto_to_annotation_info
from noteflow.grpc._types import AnnotationInfo
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
class _AnnotationCreateKwargs(TypedDict):
"""Keyword arguments for creating an annotation."""
meeting_id: Required[str]
annotation_type: Required[str]
text: Required[str]
start_time: Required[float]
end_time: Required[float]
segment_ids: NotRequired[list[int] | None]
class _AnnotationUpdateKwargs(TypedDict, total=False):
"""Keyword arguments for updating an annotation."""
annotation_type: str | None
text: str | None
start_time: float | None
end_time: float | None
segment_ids: list[int] | None
logger = get_logger(__name__)
_rate_limiter = get_client_rate_limiter()
RpcError = cast(type[Exception], getattr(grpc, "RpcError", Exception))
@@ -23,30 +45,27 @@ class AnnotationClientMixin:
def add_annotation(
self: ClientHost,
meeting_id: str,
annotation_type: str,
text: str,
start_time: float,
end_time: float,
segment_ids: list[int] | None = None,
**kwargs: Unpack[_AnnotationCreateKwargs],
) -> AnnotationInfo | None:
"""Add an annotation to a meeting.
Args:
meeting_id: Meeting ID.
annotation_type: Type of annotation (action_item, decision, note).
text: Annotation text.
start_time: Start time in seconds.
end_time: End time in seconds.
segment_ids: Optional list of linked segment IDs.
**kwargs: Annotation fields.
Returns:
AnnotationInfo or None if request fails.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("add_annotation")
return None
try:
meeting_id = kwargs["meeting_id"]
annotation_type = kwargs["annotation_type"]
text = kwargs["text"]
start_time = kwargs["start_time"]
end_time = kwargs["end_time"]
segment_ids = kwargs.get("segment_ids") or []
proto_type = annotation_type_to_proto(annotation_type)
request = noteflow_pb2.AddAnnotationRequest(
meeting_id=meeting_id,
@@ -54,7 +73,7 @@ class AnnotationClientMixin:
text=text,
start_time=start_time,
end_time=end_time,
segment_ids=segment_ids or [],
segment_ids=segment_ids,
)
response = self.stub.AddAnnotation(request)
return proto_to_annotation_info(response)
@@ -72,6 +91,7 @@ class AnnotationClientMixin:
AnnotationInfo or None if not found.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("get_annotation")
return None
try:
@@ -99,6 +119,7 @@ class AnnotationClientMixin:
List of AnnotationInfo.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("list_annotations")
return []
try:
@@ -117,29 +138,27 @@ class AnnotationClientMixin:
def update_annotation(
self: ClientHost,
annotation_id: str,
annotation_type: str | None = None,
text: str | None = None,
start_time: float | None = None,
end_time: float | None = None,
segment_ids: list[int] | None = None,
**kwargs: Unpack[_AnnotationUpdateKwargs],
) -> AnnotationInfo | None:
"""Update an existing annotation.
Args:
annotation_id: Annotation ID.
annotation_type: Optional new type.
text: Optional new text.
start_time: Optional new start time.
end_time: Optional new end time.
segment_ids: Optional new segment IDs.
**kwargs: Optional annotation fields.
Returns:
Updated AnnotationInfo or None if request fails.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("update_annotation")
return None
try:
annotation_type = kwargs.get("annotation_type")
text = kwargs.get("text")
start_time = kwargs.get("start_time")
end_time = kwargs.get("end_time")
segment_ids = kwargs.get("segment_ids")
proto_type = (
annotation_type_to_proto(annotation_type)
if annotation_type
@@ -169,6 +188,7 @@ class AnnotationClientMixin:
True if deleted successfully.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("delete_annotation")
return False
try:

View File

@@ -9,12 +9,13 @@ import grpc
from noteflow.grpc._client_mixins.converters import job_status_to_str
from noteflow.grpc._types import DiarizationResult, RenameSpeakerResult
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
logger = get_logger(__name__)
_rate_limiter = get_client_rate_limiter()
class DiarizationClientMixin:
@@ -38,6 +39,7 @@ class DiarizationClientMixin:
DiarizationResult with job status or None if request fails.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("refine_speaker_diarization")
return None
try:
@@ -71,6 +73,7 @@ class DiarizationClientMixin:
DiarizationResult with current status or None if request fails.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("get_diarization_job_status")
return None
try:
@@ -105,6 +108,7 @@ class DiarizationClientMixin:
RenameSpeakerResult or None if request fails.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("rename_speaker")
return None
try:

View File

@@ -9,12 +9,13 @@ import grpc
from noteflow.grpc._client_mixins.converters import export_format_to_proto
from noteflow.grpc._types import ExportResult
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
logger = get_logger(__name__)
_rate_limiter = get_client_rate_limiter()
class ExportClientMixin:
@@ -35,6 +36,7 @@ class ExportClientMixin:
ExportResult or None if request fails.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("export_transcript")
return None
try:

View File

@@ -9,12 +9,13 @@ import grpc
from noteflow.grpc._client_mixins.converters import proto_to_meeting_info
from noteflow.grpc._types import MeetingInfo, TranscriptSegment
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
logger = get_logger(__name__)
_rate_limiter = get_client_rate_limiter()
class MeetingClientMixin:
@@ -30,6 +31,7 @@ class MeetingClientMixin:
MeetingInfo or None if request fails.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("create_meeting")
return None
try:
@@ -50,6 +52,7 @@ class MeetingClientMixin:
Updated MeetingInfo or None if request fails.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("stop_meeting")
return None
try:
@@ -70,6 +73,7 @@ class MeetingClientMixin:
MeetingInfo or None if not found.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("get_meeting")
return None
try:
@@ -94,6 +98,7 @@ class MeetingClientMixin:
List of TranscriptSegment or empty list if not found.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("get_meeting_segments")
return []
try:
@@ -131,6 +136,7 @@ class MeetingClientMixin:
List of MeetingInfo.
"""
if not self.stub:
_rate_limiter.warn_stub_missing("list_meetings")
return []
try:

View File

@@ -152,3 +152,7 @@ class ClientHost(Protocol):
def stop_streaming(self) -> None:
"""Stop streaming audio."""
...
def handle_stream_response(self, response: ProtoTranscriptUpdate) -> None:
"""Handle a single transcript update from the stream."""
...

View File

@@ -14,15 +14,16 @@ from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.grpc._config import STREAMING_CONFIG
from noteflow.grpc._types import ConnectionCallback, TranscriptCallback, TranscriptSegment
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
if TYPE_CHECKING:
import numpy as np
from numpy.typing import NDArray
from noteflow.grpc._client_mixins.protocols import ClientHost
from noteflow.grpc._client_mixins.protocols import ClientHost, ProtoTranscriptUpdate
logger = get_logger(__name__)
_rate_limiter = get_client_rate_limiter()
class StreamingClientMixin:
@@ -46,7 +47,7 @@ class StreamingClientMixin:
True if streaming started.
"""
if not self.stub:
logger.error("Not connected")
_rate_limiter.warn_stub_missing("start_streaming")
return False
if self.stream_thread and self.stream_thread.is_alive():
@@ -144,35 +145,42 @@ class StreamingClientMixin:
for response in responses:
if self.stop_streaming_event.is_set():
break
if response.update_type == noteflow_pb2.UPDATE_TYPE_FINAL:
segment = TranscriptSegment(
segment_id=response.segment.segment_id,
text=response.segment.text,
start_time=response.segment.start_time,
end_time=response.segment.end_time,
language=response.segment.language,
is_final=True,
speaker_id=response.segment.speaker_id,
speaker_confidence=response.segment.speaker_confidence,
)
self.notify_transcript(segment)
elif response.update_type == noteflow_pb2.UPDATE_TYPE_PARTIAL:
segment = TranscriptSegment(
segment_id=0,
text=response.partial_text,
start_time=0,
end_time=0,
language="",
is_final=False,
)
self.notify_transcript(segment)
self.handle_stream_response(response)
except grpc.RpcError as e:
logger.error("Stream error: %s", e)
self.notify_connection(False, f"Stream error: {e}")
def handle_stream_response(
self: ClientHost,
response: ProtoTranscriptUpdate,
) -> None:
"""Handle a single transcript update from the stream."""
if response.update_type == noteflow_pb2.UPDATE_TYPE_FINAL:
segment = TranscriptSegment(
segment_id=response.segment.segment_id,
text=response.segment.text,
start_time=response.segment.start_time,
end_time=response.segment.end_time,
language=response.segment.language,
is_final=True,
speaker_id=response.segment.speaker_id,
speaker_confidence=response.segment.speaker_confidence,
)
self.notify_transcript(segment)
return
if response.update_type == noteflow_pb2.UPDATE_TYPE_PARTIAL:
segment = TranscriptSegment(
segment_id=0,
text=response.partial_text,
start_time=0,
end_time=0,
language="",
is_final=False,
)
self.notify_transcript(segment)
def notify_transcript(self: ClientHost, segment: TranscriptSegment) -> None:
"""Notify transcript callback.

View File

@@ -89,22 +89,28 @@ class GrpcServerConfig:
database_url: str | None = None
diarization: DiarizationConfig = field(default_factory=DiarizationConfig)
@dataclass(frozen=True, slots=True)
class Args:
"""Flat arguments for constructing a GrpcServerConfig."""
port: int
asr_model: str
asr_device: str
asr_compute_type: str
bind_address: str = DEFAULT_BIND_ADDRESS
database_url: str | None = None
diarization_enabled: bool = False
diarization_hf_token: str | None = None
diarization_device: str = DEFAULT_DIARIZATION_DEVICE
diarization_streaming_latency: float | None = None
diarization_min_speakers: int | None = None
diarization_max_speakers: int | None = None
diarization_refinement_enabled: bool = True
@classmethod
def from_args(
cls,
port: int,
asr_model: str,
asr_device: str,
asr_compute_type: str,
bind_address: str = DEFAULT_BIND_ADDRESS,
database_url: str | None = None,
diarization_enabled: bool = False,
diarization_hf_token: str | None = None,
diarization_device: str = DEFAULT_DIARIZATION_DEVICE,
diarization_streaming_latency: float | None = None,
diarization_min_speakers: int | None = None,
diarization_max_speakers: int | None = None,
diarization_refinement_enabled: bool = True,
args: Args,
) -> GrpcServerConfig:
"""Create config from flat argument values.
@@ -112,22 +118,22 @@ class GrpcServerConfig:
run_server() signature to structured configuration.
"""
return cls(
port=port,
bind_address=bind_address,
port=args.port,
bind_address=args.bind_address,
asr=AsrConfig(
model=asr_model,
device=asr_device,
compute_type=asr_compute_type,
model=args.asr_model,
device=args.asr_device,
compute_type=args.asr_compute_type,
),
database_url=database_url,
database_url=args.database_url,
diarization=DiarizationConfig(
enabled=diarization_enabled,
hf_token=diarization_hf_token,
device=diarization_device,
streaming_latency=diarization_streaming_latency,
min_speakers=diarization_min_speakers,
max_speakers=diarization_max_speakers,
refinement_enabled=diarization_refinement_enabled,
enabled=args.diarization_enabled,
hf_token=args.diarization_hf_token,
device=args.diarization_device,
streaming_latency=args.diarization_streaming_latency,
min_speakers=args.diarization_min_speakers,
max_speakers=args.diarization_max_speakers,
refinement_enabled=args.diarization_refinement_enabled,
),
)

View File

@@ -1,5 +1,6 @@
"""gRPC service mixins for NoteFlowServicer."""
from ._types import GrpcContext, GrpcStatusContext
from .annotation import AnnotationMixin
from .calendar import CalendarMixin
from .diarization import DiarizationMixin
@@ -17,6 +18,8 @@ from .sync import SyncMixin
from .webhooks import WebhooksMixin
__all__ = [
"GrpcContext",
"GrpcStatusContext",
"AnnotationMixin",
"CalendarMixin",
"DiarizationJobMixin",

View File

@@ -6,10 +6,20 @@ from typing import TYPE_CHECKING
from uuid import UUID
from noteflow.domain.value_objects import AnnotationId, MeetingId
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from ..errors import AbortableContext
logger = get_logger(__name__)
def _truncate_for_log(value: str, max_len: int = 8) -> str:
"""Truncate a value for safe logging (PII redaction)."""
if len(value) > max_len:
return f"{value[:max_len]}..."
return value
def parse_meeting_id(meeting_id_str: str) -> MeetingId:
"""Parse string to MeetingId.
@@ -48,6 +58,11 @@ async def parse_meeting_id_or_abort(
try:
return MeetingId(UUID(meeting_id_str))
except ValueError:
logger.warning(
"invalid_meeting_id_format",
meeting_id_truncated=_truncate_for_log(meeting_id_str),
meeting_id_length=len(meeting_id_str),
)
await abort_invalid_argument(context, "Invalid meeting_id")
@@ -65,6 +80,11 @@ def parse_meeting_id_or_none(meeting_id_str: str) -> MeetingId | None:
try:
return MeetingId(UUID(meeting_id_str))
except ValueError:
logger.warning(
"invalid_meeting_id_format",
meeting_id_truncated=_truncate_for_log(meeting_id_str),
meeting_id_length=len(meeting_id_str),
)
return None

View File

@@ -175,6 +175,12 @@ class JobsMixin(JobStatusMixin):
num_speakers = request.num_speakers or None
task = asyncio.create_task(self.run_diarization_job(job_id, num_speakers))
self.diarization_tasks[job_id] = task
logger.info(
"diarization_task_created",
job_id=job_id,
meeting_id=request.meeting_id,
num_speakers=num_speakers,
)
return noteflow_pb2.RefineSpeakerDiarizationResponse(
segments_updated=0, job_id=job_id, status=noteflow_pb2.JOB_STATUS_QUEUED

View File

@@ -14,6 +14,9 @@ from ..converters import parse_meeting_id_or_none
from ._speaker import apply_speaker_to_segment
if TYPE_CHECKING:
from noteflow.domain.entities import Segment
from noteflow.domain.ports.unit_of_work import UnitOfWork
from ..protocols import ServicerHost
logger = get_logger(__name__)
@@ -80,15 +83,24 @@ class RefinementMixin:
async with self.create_repository_provider() as repo:
segments = await repo.segments.get_by_meeting(parsed_meeting_id)
for segment in segments:
if apply_speaker_to_segment(segment, turns):
# For DB segments with db_id, use update_speaker
if segment.db_id is not None:
await repo.segments.update_speaker(
segment.db_id,
segment.speaker_id,
segment.speaker_confidence,
)
updated_count += 1
if not apply_speaker_to_segment(segment, turns):
continue
await _persist_speaker_update(repo, segment)
updated_count += 1
await repo.commit()
return updated_count
async def _persist_speaker_update(
repo: UnitOfWork,
segment: Segment,
) -> None:
"""Persist speaker update for a segment if it has a DB identity."""
if segment.db_id is None:
return
await repo.segments.update_speaker(
segment.db_id,
segment.speaker_id,
segment.speaker_confidence,
)

View File

@@ -15,6 +15,8 @@ from .._types import GrpcContext
if TYPE_CHECKING:
from collections.abc import Sequence
from noteflow.domain.ports.unit_of_work import UnitOfWork
from ..protocols import ServicerHost
@@ -97,18 +99,10 @@ class SpeakerMixin:
segments = await repo.segments.get_by_meeting(meeting_id)
for segment in segments:
if segment.speaker_id == request.old_speaker_id:
# For DB segments with db_id, use update_speaker
if segment.db_id is not None:
await repo.segments.update_speaker(
segment.db_id,
request.new_speaker_name,
segment.speaker_confidence,
)
else:
# Memory segments: update directly
segment.speaker_id = request.new_speaker_name
updated_count += 1
if segment.speaker_id != request.old_speaker_id:
continue
await _apply_speaker_rename(repo, segment, request.new_speaker_name)
updated_count += 1
await repo.commit()
@@ -116,3 +110,19 @@ class SpeakerMixin:
segments_updated=updated_count,
success=updated_count > 0,
)
async def _apply_speaker_rename(
repo: UnitOfWork,
segment: Segment,
new_speaker_name: str,
) -> None:
"""Persist speaker rename for a segment."""
if segment.db_id is not None:
await repo.segments.update_speaker(
segment.db_id,
new_speaker_name,
segment.speaker_confidence,
)
return
segment.speaker_id = new_speaker_name

View File

@@ -3,6 +3,7 @@
from __future__ import annotations
import asyncio
from dataclasses import dataclass
from functools import partial
from typing import TYPE_CHECKING
@@ -14,6 +15,9 @@ from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.repositories import StreamingTurn
if TYPE_CHECKING:
from noteflow.grpc.stream_state import MeetingStreamState
from noteflow.infrastructure.diarization import DiarizationSession
from ..protocols import ServicerHost
logger = get_logger(__name__)
@@ -46,33 +50,75 @@ class StreamingDiarizationMixin:
loop = asyncio.get_running_loop()
session = await self.ensure_diarization_session(meeting_id, state, loop)
if session is None:
return
context = DiarizationChunkContext(meeting_id=meeting_id, state=state)
new_turns = await self.process_diarization_chunk(
context,
session,
audio,
loop,
)
if not new_turns:
return
# Populate diarization_turns for compatibility with maybe_assign_speaker
state.diarization_turns.extend(new_turns)
state.diarization_stream_time = session.stream_time
# Persist turns immediately for crash resilience (DB only)
await self.persist_streaming_turns(meeting_id, list(new_turns))
async def ensure_diarization_session(
self: ServicerHost,
meeting_id: str,
state: MeetingStreamState,
loop: asyncio.AbstractEventLoop,
) -> DiarizationSession | None:
"""Return an initialized diarization session or None on failure."""
# Get or create per-meeting session under lock
async with self.diarization_lock:
session = state.diarization_session
if session is None:
try:
session = await loop.run_in_executor(
None,
self.diarization_engine.create_streaming_session,
meeting_id,
)
prior_turns = state.diarization_turns
prior_stream_time = state.diarization_stream_time
if prior_turns or prior_stream_time:
session.restore(prior_turns, stream_time=prior_stream_time)
state.diarization_session = session
except (RuntimeError, ValueError) as exc:
logger.warning(
"Streaming diarization disabled for meeting %s: %s",
meeting_id,
exc,
)
state.diarization_streaming_failed = True
return
if session is not None:
return session
# Guard: diarization_engine checked by caller (process_streaming_diarization)
engine = self.diarization_engine
if engine is None:
return None
try:
session = await loop.run_in_executor(
None,
engine.create_streaming_session,
meeting_id,
)
prior_turns = state.diarization_turns
prior_stream_time = state.diarization_stream_time
if prior_turns or prior_stream_time:
session.restore(prior_turns, stream_time=prior_stream_time)
state.diarization_session = session
return session
except (RuntimeError, ValueError) as exc:
logger.warning(
"Streaming diarization disabled for meeting %s: %s",
meeting_id,
exc,
)
state.diarization_streaming_failed = True
return None
async def process_diarization_chunk(
self: ServicerHost,
context: "DiarizationChunkContext",
session: DiarizationSession,
audio: NDArray[np.float32],
loop: asyncio.AbstractEventLoop,
) -> list[SpeakerTurn] | None:
"""Process a diarization chunk, returning new turns or None on failure."""
# Process chunk in thread pool (outside lock for parallelism)
try:
new_turns = await loop.run_in_executor(
turns = await loop.run_in_executor(
None,
partial(
session.process_chunk,
@@ -80,22 +126,23 @@ class StreamingDiarizationMixin:
sample_rate=self.DEFAULT_SAMPLE_RATE,
),
)
return list(turns)
except (RuntimeError, OSError) as exc:
logger.warning(
"Streaming diarization failed for meeting %s: %s",
meeting_id,
context.meeting_id,
exc,
)
state.diarization_streaming_failed = True
return
context.state.diarization_streaming_failed = True
return None
# Populate diarization_turns for compatibility with maybe_assign_speaker
if new_turns:
state.diarization_turns.extend(new_turns)
state.diarization_stream_time = session.stream_time
# Persist turns immediately for crash resilience (DB only)
await self.persist_streaming_turns(meeting_id, list(new_turns))
@dataclass(frozen=True, slots=True)
class DiarizationChunkContext:
"""Context for processing a diarization chunk."""
meeting_id: str
state: MeetingStreamState
async def persist_streaming_turns(
self: ServicerHost,

View File

@@ -75,7 +75,12 @@ class DiarizationJobMixin:
return float(settings.diarization_job_ttl_hours * SECONDS_PER_HOUR)
# INTENTIONAL BROAD HANDLER: Fallback for testing environments
# - Settings may fail to load in unit tests without full config
except Exception:
except Exception as exc:
logger.warning(
"diarization_ttl_settings_fallback",
error_type=type(exc).__name__,
fallback_ttl_seconds=_DEFAULT_JOB_TTL_SECONDS,
)
return _DEFAULT_JOB_TTL_SECONDS
async def prune_diarization_jobs(self: DiarizationJobServicer) -> None:
@@ -123,33 +128,7 @@ class DiarizationJobMixin:
await abort_not_found(context, "Diarization job", request.job_id)
raise # Unreachable but helps type checker
# Calculate progress percentage (time-based for running jobs)
progress_percent = 0.0
if job.status == noteflow_pb2.JOB_STATUS_COMPLETED:
progress_percent = 100.0
elif job.status == noteflow_pb2.JOB_STATUS_RUNNING and job.started_at is not None:
# All datetimes should now be timezone-aware UTC.
now = utc_now()
# Ensure started_at is also aware; should be UTC from repository.
started: datetime = job.started_at
elapsed = (now - started).total_seconds()
audio_duration = job.audio_duration_seconds
if audio_duration is not None and audio_duration > 0:
# ~10 seconds processing per 60 seconds audio
estimated_duration = audio_duration * 0.17
progress_percent = min(95.0, (elapsed / estimated_duration) * 100)
else:
# Fallback: assume 2 minutes total
progress_percent = min(95.0, (elapsed / 120) * 100)
return noteflow_pb2.DiarizationJobStatus(
job_id=job.job_id,
status=int(job.status),
segments_updated=job.segments_updated,
speaker_ids=job.speaker_ids,
error_message=job.error_message,
progress_percent=progress_percent,
)
return _build_job_status(job)
async def CancelDiarizationJob(
self: DiarizationJobServicer,
@@ -226,30 +205,38 @@ class DiarizationJobMixin:
active_jobs = await repo.diarization_jobs.get_all_active()
for job in active_jobs:
# Calculate progress percentage (time-based for running jobs)
progress_percent = 0.0
if job.status == noteflow_pb2.JOB_STATUS_RUNNING and job.started_at is not None:
now = utc_now()
started: datetime = job.started_at
elapsed = (now - started).total_seconds()
audio_duration = job.audio_duration_seconds
if audio_duration is not None and audio_duration > 0:
# ~10 seconds processing per 60 seconds audio
estimated_duration = audio_duration * 0.17
progress_percent = min(95.0, (elapsed / estimated_duration) * 100)
else:
# Fallback: assume 2 minutes total
progress_percent = min(95.0, (elapsed / 120) * 100)
job_status = noteflow_pb2.DiarizationJobStatus(
job_id=job.job_id,
status=int(job.status),
segments_updated=job.segments_updated,
speaker_ids=job.speaker_ids,
error_message=job.error_message,
progress_percent=progress_percent,
)
response.jobs.append(job_status)
response.jobs.append(_build_job_status(job))
logger.debug("Returning %d active diarization jobs", len(response.jobs))
return response
def _build_job_status(job: DiarizationJob) -> noteflow_pb2.DiarizationJobStatus:
"""Build proto status from a diarization job."""
return noteflow_pb2.DiarizationJobStatus(
job_id=job.job_id,
status=int(job.status),
segments_updated=job.segments_updated,
speaker_ids=job.speaker_ids,
error_message=job.error_message,
progress_percent=_calculate_progress_percent(job),
)
def _calculate_progress_percent(job: DiarizationJob) -> float:
"""Calculate progress percentage for a diarization job."""
if job.status == noteflow_pb2.JOB_STATUS_COMPLETED:
return 100.0
if job.status != noteflow_pb2.JOB_STATUS_RUNNING or job.started_at is None:
return 0.0
now = utc_now()
started: datetime = job.started_at
elapsed = (now - started).total_seconds()
audio_duration = job.audio_duration_seconds
if audio_duration is not None and audio_duration > 0:
# ~10 seconds processing per 60 seconds audio
estimated_duration = audio_duration * 0.17
return min(95.0, (elapsed / estimated_duration) * 100)
# Fallback: assume 2 minutes total
return min(95.0, (elapsed / 120) * 100)

View File

@@ -135,6 +135,12 @@ async def _resolve_active_project_id(
try:
workspace_uuid = UUID(workspace_id)
except ValueError:
truncated = workspace_id[:8] + "..." if len(workspace_id) > 8 else workspace_id
logger.warning(
"resolve_active_project: invalid workspace_id format",
workspace_id_truncated=truncated,
workspace_id_length=len(workspace_id),
)
return None
_, active_project = await host.project_service.get_active_project(
@@ -282,6 +288,12 @@ class MeetingMixin:
try:
project_id = UUID(request.project_id)
except ValueError:
truncated = request.project_id[:8] + "..." if len(request.project_id) > 8 else request.project_id
logger.warning(
"ListMeetings: invalid project_id format",
project_id_truncated=truncated,
project_id_length=len(request.project_id),
)
await abort_invalid_argument(context, f"{ERROR_INVALID_PROJECT_ID_PREFIX}{request.project_id}")
async with self.create_repository_provider() as repo:

View File

@@ -7,7 +7,12 @@ from typing import Protocol, cast
from uuid import UUID
from noteflow.config.constants import ERROR_INVALID_WORKSPACE_ID_FORMAT
from noteflow.domain.auth.oidc import ClaimMapping, OidcProviderConfig, OidcProviderPreset
from noteflow.domain.auth.oidc import (
ClaimMapping,
OidcProviderConfig,
OidcProviderPreset,
OidcProviderRegistration,
)
from noteflow.infrastructure.auth.oidc_discovery import OidcDiscoveryError
from noteflow.infrastructure.auth.oidc_registry import (
PROVIDER_PRESETS,
@@ -148,7 +153,7 @@ class OidcMixin:
# Register provider
oidc_service = self.get_oidc_service()
try:
provider, warnings = await oidc_service.register_provider(
registration = OidcProviderRegistration(
workspace_id=workspace_id,
name=request.name,
issuer_url=request.issuer_url,
@@ -160,6 +165,7 @@ class OidcMixin:
),
preset=preset,
)
provider, warnings = await oidc_service.register_provider(registration)
_apply_custom_provider_config(
provider,

View File

@@ -30,6 +30,7 @@ if TYPE_CHECKING:
from noteflow.infrastructure.auth.oidc_registry import OidcAuthService
from noteflow.infrastructure.diarization import (
DiarizationEngine,
DiarizationSession,
SpeakerTurn,
)
from noteflow.infrastructure.persistence.repositories import DiarizationJob
@@ -43,15 +44,12 @@ if TYPE_CHECKING:
from ..proto import noteflow_pb2
from ..stream_state import MeetingStreamState
from ._types import GrpcContext, GrpcStatusContext
from .diarization._streaming import DiarizationChunkContext
from .streaming._types import StreamSessionInit
class ServicerHost(Protocol):
"""Protocol defining shared state and methods for service mixins.
All mixins should type-hint `self` as `ServicerHost` to access these
attributes and methods from the host NoteFlowServicer class.
"""
class _ServicerState(Protocol):
"""Shared state required by service mixins."""
# Configuration
session_factory: async_sessionmaker[AsyncSession] | None
@@ -107,6 +105,13 @@ class ServicerHost(Protocol):
PARTIAL_CADENCE_SECONDS: Final[float]
MIN_PARTIAL_AUDIO_SECONDS: Final[float]
# OIDC service
oidc_service: OidcAuthService | None
class _ServicerCoreMethods(Protocol):
"""Core helper methods shared across mixins."""
@property
def diarization_job_ttl_seconds(self) -> float:
"""Return diarization job TTL from settings."""
@@ -125,15 +130,7 @@ class ServicerHost(Protocol):
...
def create_repository_provider(self) -> UnitOfWork:
"""Create a repository provider (database or memory backed).
Returns a UnitOfWork implementation appropriate for the current
configuration. Use this for operations that can work with either
backend, eliminating the need for if/else branching.
Returns:
SqlAlchemyUnitOfWork if database configured, MemoryUnitOfWork otherwise.
"""
"""Create a repository provider (database or memory backed)."""
...
def next_segment_id(self, meeting_id: str, fallback: int = 0) -> int:
@@ -149,11 +146,7 @@ class ServicerHost(Protocol):
...
def get_stream_state(self, meeting_id: str) -> MeetingStreamState | None:
"""Get consolidated streaming state for a meeting.
Returns None if meeting has no active stream state.
Single lookup replaces 13+ dict accesses in hot paths.
"""
"""Get consolidated streaming state for a meeting."""
...
def ensure_meeting_dek(self, meeting: Meeting) -> tuple[bytes, bytes, bool]:
@@ -178,7 +171,14 @@ class ServicerHost(Protocol):
"""Close and remove the audio writer for a meeting."""
...
# Diarization mixin methods (for internal cross-references)
def get_oidc_service(self) -> OidcAuthService:
"""Get or create the OIDC auth service."""
...
class _ServicerDiarizationMethods(Protocol):
"""Diarization helpers used by streaming and job mixins."""
async def prune_diarization_jobs(self) -> None:
"""Prune expired diarization jobs from in-memory cache."""
...
@@ -215,7 +215,6 @@ class ServicerHost(Protocol):
"""Run post-meeting speaker diarization refinement."""
...
# Diarization job management methods
async def update_job_completed(
self,
job_id: str,
@@ -278,19 +277,37 @@ class ServicerHost(Protocol):
"""Process audio chunk for streaming diarization (best-effort)."""
...
# Webhook methods
async def ensure_diarization_session(
self,
meeting_id: str,
state: MeetingStreamState,
loop: asyncio.AbstractEventLoop,
) -> DiarizationSession | None:
"""Return an initialized diarization session or None on failure."""
...
async def process_diarization_chunk(
self,
context: DiarizationChunkContext,
session: DiarizationSession,
audio: NDArray[np.float32],
loop: asyncio.AbstractEventLoop,
) -> list[SpeakerTurn] | None:
"""Process a diarization chunk, returning new turns or None on failure."""
...
class _ServicerWebhookMethods(Protocol):
"""Webhook helpers."""
async def fire_stop_webhooks(self, meeting: Meeting) -> None:
"""Trigger webhooks for meeting stop (fire-and-forget)."""
...
# OIDC service
oidc_service: OidcAuthService | None
def get_oidc_service(self) -> OidcAuthService:
"""Get or create the OIDC auth service."""
...
class _ServicerPreferencesMethods(Protocol):
"""Preferences helpers."""
# Preferences methods
async def decode_and_validate_prefs(
self,
request: noteflow_pb2.SetPreferencesRequest,
@@ -309,7 +326,10 @@ class ServicerHost(Protocol):
"""Apply preferences based on merge mode."""
...
# Streaming methods
class _ServicerStreamingMethods(Protocol):
"""Streaming helpers."""
async def init_stream_for_meeting(
self,
meeting_id: str,
@@ -334,7 +354,20 @@ class ServicerHost(Protocol):
"""Flush remaining audio from segmenter at stream end."""
...
# Summarization methods
async def prepare_stream_chunk(
self,
current_meeting_id: str | None,
initialized_meeting_id: str | None,
chunk: noteflow_pb2.AudioChunk,
context: GrpcContext,
) -> tuple[str, str | None] | None:
"""Validate and initialize streaming state for a chunk."""
...
class _ServicerSummarizationMethods(Protocol):
"""Summarization helpers."""
async def summarize_or_placeholder(
self,
meeting_id: MeetingId,
@@ -352,7 +385,10 @@ class ServicerHost(Protocol):
"""Generate a lightweight placeholder summary when summarization fails."""
...
# Sync mixin methods
class _ServicerSyncMethods(Protocol):
"""Sync helpers."""
def ensure_sync_runs_cache(self) -> dict[UUID, SyncRun]:
"""Ensure the sync runs cache exists."""
...
@@ -404,3 +440,18 @@ class ServicerHost(Protocol):
) -> SyncRun | None:
"""Mark sync run as failed with error message."""
...
class ServicerHost(
_ServicerState,
_ServicerCoreMethods,
_ServicerDiarizationMethods,
_ServicerWebhookMethods,
_ServicerPreferencesMethods,
_ServicerStreamingMethods,
_ServicerSummarizationMethods,
_ServicerSyncMethods,
Protocol,
):
"""Protocol defining shared state and methods for service mixins."""
pass

View File

@@ -60,23 +60,15 @@ class StreamingMixin:
try:
async for chunk in request_iterator:
meeting_id = chunk.meeting_id
if not meeting_id:
await abort_invalid_argument(context, "meeting_id required")
# Initialize stream on first chunk
if current_meeting_id is None:
# Track meeting_id BEFORE init to guarantee cleanup on any exception
# (cleanup_stream_resources is idempotent, safe to call even if init aborts)
initialized_meeting_id = meeting_id
init_result = await self.init_stream_for_meeting(meeting_id, context)
if init_result is None:
return # Error already sent via context.abort
current_meeting_id = meeting_id
elif meeting_id != current_meeting_id:
await abort_invalid_argument(
context, "Stream may only contain a single meeting_id"
)
prep = await self.prepare_stream_chunk(
current_meeting_id,
initialized_meeting_id,
chunk,
context,
)
if prep is None:
return # Error already sent via context.abort
current_meeting_id, initialized_meeting_id = prep
# Check for stop request (graceful shutdown from StopMeeting)
if current_meeting_id in self.stop_requested:
@@ -100,6 +92,33 @@ class StreamingMixin:
if cleanup_meeting := current_meeting_id or initialized_meeting_id:
cleanup_stream_resources(self, cleanup_meeting)
async def prepare_stream_chunk(
self: ServicerHost,
current_meeting_id: str | None,
initialized_meeting_id: str | None,
chunk: noteflow_pb2.AudioChunk,
context: GrpcContext,
) -> tuple[str, str | None] | None:
"""Validate and initialize streaming state for a chunk."""
meeting_id = chunk.meeting_id
if not meeting_id:
await abort_invalid_argument(context, "meeting_id required")
return None
if current_meeting_id is None:
# Track meeting_id BEFORE init to guarantee cleanup on any exception
initialized_meeting_id = meeting_id
init_result = await self.init_stream_for_meeting(meeting_id, context)
if init_result is None:
return None
return meeting_id, initialized_meeting_id
if meeting_id != current_meeting_id:
await abort_invalid_argument(context, "Stream may only contain a single meeting_id")
return None
return current_meeting_id, initialized_meeting_id
async def init_stream_for_meeting(
self: ServicerHost,
meeting_id: str,

View File

@@ -209,16 +209,21 @@ def decrement_pending_chunks(host: ServicerHost, meeting_id: str) -> None:
Call this after ASR processing completes for a segment.
"""
if hasattr(host, "_pending_chunks") and meeting_id in host.pending_chunks:
# Decrement by ACK_CHUNK_INTERVAL since we process in batches
host.pending_chunks[meeting_id] = max(
0, host.pending_chunks[meeting_id] - ACK_CHUNK_INTERVAL
)
if receipt_times := host.chunk_receipt_times.get(meeting_id):
# Remove timestamps corresponding to processed chunks
for _ in range(min(ACK_CHUNK_INTERVAL, len(receipt_times))):
if receipt_times:
receipt_times.popleft()
if not hasattr(host, "_pending_chunks"):
return
if meeting_id not in host.pending_chunks:
return
# Decrement by ACK_CHUNK_INTERVAL since we process in batches
host.pending_chunks[meeting_id] = max(
0, host.pending_chunks[meeting_id] - ACK_CHUNK_INTERVAL
)
receipt_times = host.chunk_receipt_times.get(meeting_id)
if not receipt_times:
return
# Remove timestamps corresponding to processed chunks
for _ in range(min(ACK_CHUNK_INTERVAL, len(receipt_times))):
receipt_times.popleft()
def _convert_audio_format(

View File

@@ -138,24 +138,8 @@ class StreamSessionManager:
Returns:
Initialization result, or None if error was sent.
"""
# Atomic check-and-add protected by lock with timeout to prevent deadlock
try:
async with asyncio.timeout(STREAM_INIT_LOCK_TIMEOUT_SECONDS):
async with host.stream_init_lock:
if meeting_id in host.active_streams:
await abort_failed_precondition(
context, f"{ERROR_MSG_MEETING_PREFIX}{meeting_id} already streaming"
)
host.active_streams.add(meeting_id)
except TimeoutError:
logger.error(
"Stream initialization lock timeout for meeting %s after %.1fs",
meeting_id,
STREAM_INIT_LOCK_TIMEOUT_SECONDS,
)
await abort_failed_precondition(
context, "Stream initialization timed out - server may be overloaded"
)
if not await StreamSessionManager._reserve_stream_slot(host, meeting_id, context):
return None
init_result = await StreamSessionManager._init_stream_session(host, meeting_id)
@@ -166,6 +150,48 @@ class StreamSessionManager:
return init_result
@staticmethod
async def _reserve_stream_slot(
host: ServicerHost,
meeting_id: str,
context: GrpcContext,
) -> bool:
"""Reserve the meeting for streaming or abort on conflict."""
# Atomic check-and-add protected by lock with timeout to prevent deadlock
try:
async with asyncio.timeout(STREAM_INIT_LOCK_TIMEOUT_SECONDS):
reserved = await StreamSessionManager._try_reserve_stream_slot(
host,
meeting_id,
context,
)
except TimeoutError:
logger.error(
"Stream initialization lock timeout for meeting %s after %.1fs",
meeting_id,
STREAM_INIT_LOCK_TIMEOUT_SECONDS,
)
await abort_failed_precondition(
context, "Stream initialization timed out - server may be overloaded"
)
return False
return reserved
@staticmethod
async def _try_reserve_stream_slot(
host: ServicerHost,
meeting_id: str,
context: GrpcContext,
) -> bool:
async with host.stream_init_lock:
if meeting_id in host.active_streams:
await abort_failed_precondition(
context, f"{ERROR_MSG_MEETING_PREFIX}{meeting_id} already streaming"
)
return False
host.active_streams.add(meeting_id)
return True
@staticmethod
async def _init_stream_session(
host: ServicerHost,

View File

@@ -0,0 +1,184 @@
"""Type stubs for NoteFlowServicer mixin methods (type checking only)."""
from __future__ import annotations
from collections.abc import AsyncIterator
from typing import Protocol
from ._mixins._types import GrpcContext, GrpcStatusContext
from .proto import noteflow_pb2
class _StreamingStubs(Protocol):
"""Streaming mixin stubs."""
def StreamTranscription(
self,
request_iterator: AsyncIterator[noteflow_pb2.AudioChunk],
context: GrpcContext,
) -> AsyncIterator[noteflow_pb2.TranscriptUpdate]: ...
class _CalendarStubs(Protocol):
"""Calendar mixin stubs."""
async def GetCalendarProviders(
self,
request: noteflow_pb2.GetCalendarProvidersRequest,
context: GrpcContext,
) -> noteflow_pb2.GetCalendarProvidersResponse: ...
async def InitiateOAuth(
self,
request: noteflow_pb2.InitiateOAuthRequest,
context: GrpcContext,
) -> noteflow_pb2.InitiateOAuthResponse: ...
async def CompleteOAuth(
self,
request: noteflow_pb2.CompleteOAuthRequest,
context: GrpcContext,
) -> noteflow_pb2.CompleteOAuthResponse: ...
async def GetOAuthConnectionStatus(
self,
request: noteflow_pb2.GetOAuthConnectionStatusRequest,
context: GrpcContext,
) -> noteflow_pb2.GetOAuthConnectionStatusResponse: ...
async def DisconnectOAuth(
self,
request: noteflow_pb2.DisconnectOAuthRequest,
context: GrpcContext,
) -> noteflow_pb2.DisconnectOAuthResponse: ...
class _SummarizationStubs(Protocol):
"""Summarization mixin stubs."""
async def GetCloudConsentStatus(
self,
request: noteflow_pb2.GetCloudConsentStatusRequest,
context: GrpcContext,
) -> noteflow_pb2.GetCloudConsentStatusResponse: ...
async def GrantCloudConsent(
self,
request: noteflow_pb2.GrantCloudConsentRequest,
context: GrpcContext,
) -> noteflow_pb2.GrantCloudConsentResponse: ...
async def RevokeCloudConsent(
self,
request: noteflow_pb2.RevokeCloudConsentRequest,
context: GrpcContext,
) -> noteflow_pb2.RevokeCloudConsentResponse: ...
async def GenerateSummary(
self,
request: noteflow_pb2.GenerateSummaryRequest,
context: GrpcContext,
) -> noteflow_pb2.Summary: ...
class _SyncStubs(Protocol):
"""Sync mixin stubs."""
async def StartIntegrationSync(
self,
request: noteflow_pb2.StartIntegrationSyncRequest,
context: GrpcContext,
) -> noteflow_pb2.StartIntegrationSyncResponse: ...
async def GetSyncStatus(
self,
request: noteflow_pb2.GetSyncStatusRequest,
context: GrpcContext,
) -> noteflow_pb2.GetSyncStatusResponse: ...
async def ListSyncHistory(
self,
request: noteflow_pb2.ListSyncHistoryRequest,
context: GrpcContext,
) -> noteflow_pb2.ListSyncHistoryResponse: ...
async def GetUserIntegrations(
self,
request: noteflow_pb2.GetUserIntegrationsRequest,
context: GrpcContext,
) -> noteflow_pb2.GetUserIntegrationsResponse: ...
class _DiarizationStubs(Protocol):
"""Diarization mixin stubs."""
async def RefineSpeakerDiarization(
self,
request: noteflow_pb2.RefineSpeakerDiarizationRequest,
context: GrpcStatusContext,
) -> noteflow_pb2.RefineSpeakerDiarizationResponse: ...
async def RenameSpeaker(
self,
request: noteflow_pb2.RenameSpeakerRequest,
context: GrpcContext,
) -> noteflow_pb2.RenameSpeakerResponse: ...
async def GetDiarizationJobStatus(
self,
request: noteflow_pb2.GetDiarizationJobStatusRequest,
context: GrpcContext,
) -> noteflow_pb2.DiarizationJobStatus: ...
async def CancelDiarizationJob(
self,
request: noteflow_pb2.CancelDiarizationJobRequest,
context: GrpcContext,
) -> noteflow_pb2.CancelDiarizationJobResponse: ...
class _WebhookStubs(Protocol):
"""Webhook mixin stubs."""
async def RegisterWebhook(
self,
request: noteflow_pb2.RegisterWebhookRequest,
context: GrpcContext,
) -> noteflow_pb2.WebhookConfigProto: ...
async def ListWebhooks(
self,
request: noteflow_pb2.ListWebhooksRequest,
context: GrpcContext,
) -> noteflow_pb2.ListWebhooksResponse: ...
async def UpdateWebhook(
self,
request: noteflow_pb2.UpdateWebhookRequest,
context: GrpcContext,
) -> noteflow_pb2.WebhookConfigProto: ...
async def DeleteWebhook(
self,
request: noteflow_pb2.DeleteWebhookRequest,
context: GrpcContext,
) -> noteflow_pb2.DeleteWebhookResponse: ...
async def GetWebhookDeliveries(
self,
request: noteflow_pb2.GetWebhookDeliveriesRequest,
context: GrpcContext,
) -> noteflow_pb2.GetWebhookDeliveriesResponse: ...
class NoteFlowServicerStubs(
_StreamingStubs,
_CalendarStubs,
_SummarizationStubs,
_SyncStubs,
_DiarizationStubs,
_WebhookStubs,
Protocol,
):
"""Composite protocol for NoteFlow servicer mixin stubs."""
pass

View File

@@ -8,6 +8,7 @@ from typing import TYPE_CHECKING, Final
import grpc
from noteflow.grpc import _types
from noteflow.grpc._client_mixins import (
AnnotationClientMixin,
DiarizationClientMixin,
@@ -16,7 +17,15 @@ from noteflow.grpc._client_mixins import (
StreamingClientMixin,
)
from noteflow.grpc._config import STREAMING_CONFIG
from noteflow.grpc import _types
from noteflow.grpc._types import (
AnnotationInfo,
DiarizationResult,
ExportResult,
MeetingInfo,
RenameSpeakerResult,
ServerInfo,
TranscriptSegment,
)
from noteflow.grpc.proto import noteflow_pb2, noteflow_pb2_grpc
from noteflow.infrastructure.logging import get_logger
@@ -24,10 +33,22 @@ if TYPE_CHECKING:
import numpy as np
from numpy.typing import NDArray
from noteflow.grpc._client_mixins.protocols import NoteFlowServiceStubProtocol
from noteflow.grpc._client_mixins.protocols import ClientHost, NoteFlowServiceStubProtocol
logger = get_logger(__name__)
# Re-export types for public API (used by grpc/__init__.py)
__all__ = [
"AnnotationInfo",
"DiarizationResult",
"ExportResult",
"MeetingInfo",
"NoteFlowClient",
"RenameSpeakerResult",
"ServerInfo",
"TranscriptSegment",
]
DEFAULT_SERVER: Final[str] = "localhost:50051"
CHUNK_TIMEOUT: Final[float] = 0.1 # Timeout for getting chunks from queue
@@ -144,7 +165,9 @@ class NoteFlowClient(
def disconnect(self) -> None:
"""Disconnect from the server."""
self.stop_streaming()
# Type assertion: NoteFlowClient implements ClientHost protocol
client: ClientHost = self
client.stop_streaming()
if self._channel is not None:
self._channel.close()

View File

@@ -1,12 +1,22 @@
"""gRPC interceptors for NoteFlow.
Provide cross-cutting concerns for RPC calls:
- Identity context propagation
- Request tracing
- Identity context propagation and validation
- Per-RPC request logging with timing
"""
from noteflow.grpc.interceptors.identity import IdentityInterceptor
from noteflow.grpc.interceptors.identity import (
METADATA_REQUEST_ID,
METADATA_USER_ID,
METADATA_WORKSPACE_ID,
IdentityInterceptor,
)
from noteflow.grpc.interceptors.logging import RequestLoggingInterceptor
__all__ = [
"METADATA_REQUEST_ID",
"METADATA_USER_ID",
"METADATA_WORKSPACE_ID",
"IdentityInterceptor",
"RequestLoggingInterceptor",
]

View File

@@ -2,6 +2,9 @@
Populate identity context (request ID, user ID, workspace ID) for RPC calls
by extracting from metadata and setting context variables.
Identity metadata is REQUIRED for all RPCs. Requests missing the x-request-id
header are rejected with UNAUTHENTICATED status.
"""
from __future__ import annotations
@@ -13,7 +16,6 @@ import grpc
from grpc import aio
from noteflow.infrastructure.logging import (
generate_request_id,
get_logger,
request_id_var,
user_id_var,
@@ -27,6 +29,9 @@ METADATA_REQUEST_ID = "x-request-id"
METADATA_USER_ID = "x-user-id"
METADATA_WORKSPACE_ID = "x-workspace-id"
# Error messages
_ERR_MISSING_REQUEST_ID = "Missing required x-request-id header"
_TRequest = TypeVar("_TRequest")
_TResponse = TypeVar("_TResponse")
@@ -37,15 +42,18 @@ def _coerce_metadata_value(value: str | bytes) -> str:
class IdentityInterceptor(aio.ServerInterceptor):
"""Interceptor that populates identity context for RPC calls.
"""Interceptor that validates and populates identity context for RPC calls.
Extract user and workspace identifiers from gRPC metadata and
set them as context variables for use throughout the request.
Identity metadata is REQUIRED. Requests missing x-request-id are rejected
with UNAUTHENTICATED status.
Metadata keys:
- x-request-id: Correlation ID for request tracing
- x-user-id: User identifier
- x-workspace-id: Workspace identifier for tenant scoping
- x-request-id: Correlation ID for request tracing (REQUIRED)
- x-user-id: User identifier (optional)
- x-workspace-id: Workspace identifier for tenant scoping (optional)
"""
async def intercept_service(
@@ -56,7 +64,7 @@ class IdentityInterceptor(aio.ServerInterceptor):
],
handler_call_details: grpc.HandlerCallDetails,
) -> grpc.RpcMethodHandler[_TRequest, _TResponse]:
"""Intercept incoming RPC calls to set identity context.
"""Intercept incoming RPC calls to validate and set identity context.
Args:
continuation: The next interceptor or handler.
@@ -64,19 +72,25 @@ class IdentityInterceptor(aio.ServerInterceptor):
Returns:
The RPC handler for this call.
Raises:
grpc.RpcError: UNAUTHENTICATED if x-request-id header is missing.
"""
# Generate or extract request ID
metadata = dict(handler_call_details.invocation_metadata or [])
# Validate required x-request-id header
request_id_value = metadata.get(METADATA_REQUEST_ID)
request_id = (
_coerce_metadata_value(request_id_value)
if request_id_value is not None
else generate_request_id()
)
if request_id_value is None:
logger.warning(
"Rejecting RPC: missing x-request-id header",
method=handler_call_details.method,
)
return _create_unauthenticated_handler(_ERR_MISSING_REQUEST_ID)
request_id = _coerce_metadata_value(request_id_value)
request_id_var.set(request_id)
# Extract user and workspace IDs from metadata
# Extract optional user and workspace IDs from metadata
if user_id_value := metadata.get(METADATA_USER_ID):
user_id_var.set(_coerce_metadata_value(user_id_value))
@@ -92,3 +106,29 @@ class IdentityInterceptor(aio.ServerInterceptor):
)
return await continuation(handler_call_details)
def _create_unauthenticated_handler[TRequest, TResponse](
message: str,
) -> grpc.RpcMethodHandler[TRequest, TResponse]:
"""Create a handler that rejects with UNAUTHENTICATED status.
Args:
message: Error message to include in the response.
Returns:
A gRPC method handler that rejects all requests.
"""
async def reject_unary_unary(
request: TRequest,
context: aio.ServicerContext[TRequest, TResponse],
) -> TResponse:
await context.abort(grpc.StatusCode.UNAUTHENTICATED, message)
raise AssertionError("Unreachable after abort")
return grpc.unary_unary_rpc_method_handler(
reject_unary_unary,
request_deserializer=None,
response_serializer=None,
)

View File

@@ -0,0 +1,304 @@
"""Request logging interceptor for gRPC calls.
Log every RPC call with method, status, duration, peer, and request context
at INFO level for production observability and traceability.
"""
from __future__ import annotations
import time
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import TypeVar, cast
import grpc
from grpc import aio
from noteflow.infrastructure.logging import get_logger, get_request_id
logger = get_logger(__name__)
# TypeVars required for ServerInterceptor.intercept_service compatibility
_TRequest = TypeVar("_TRequest")
_TResponse = TypeVar("_TResponse")
class RequestLoggingInterceptor(aio.ServerInterceptor):
"""Interceptor that logs all RPC calls with timing and status.
Logs at INFO level for every request with:
- method: Full RPC method name (e.g., /noteflow.NoteFlowService/GetMeeting)
- status: gRPC status code (OK, NOT_FOUND, etc.)
- duration_ms: Request processing time in milliseconds
- peer: Client peer address
- request_id: Correlation ID from identity context
"""
async def intercept_service(
self,
continuation: Callable[
[grpc.HandlerCallDetails],
Awaitable[grpc.RpcMethodHandler[_TRequest, _TResponse]],
],
handler_call_details: grpc.HandlerCallDetails,
) -> grpc.RpcMethodHandler[_TRequest, _TResponse]:
"""Intercept incoming RPC calls to log request timing and status.
Args:
continuation: The next interceptor or handler.
handler_call_details: Details about the RPC call.
Returns:
Wrapped RPC handler that logs on completion.
"""
handler = await continuation(handler_call_details)
method = handler_call_details.method
# Return wrapped handler that logs on completion
return _create_logging_handler(handler, method)
def _create_logging_handler[TRequest, TResponse](
handler: grpc.RpcMethodHandler[TRequest, TResponse],
method: str,
) -> grpc.RpcMethodHandler[TRequest, TResponse]:
"""Wrap an RPC handler to add request logging.
Args:
handler: Original RPC handler.
method: Full RPC method name.
Returns:
Wrapped handler with logging.
"""
# Cast required: gRPC stub types don't fully express the generic Callable signatures
# for handler attributes, causing basedpyright to infer partially unknown types.
if handler.unary_unary is not None:
return grpc.unary_unary_rpc_method_handler(
cast(
Callable[
[TRequest, aio.ServicerContext[TRequest, TResponse]],
Awaitable[TResponse],
],
_wrap_unary_unary(handler.unary_unary, method),
),
request_deserializer=handler.request_deserializer,
response_serializer=handler.response_serializer,
)
if handler.unary_stream is not None:
return grpc.unary_stream_rpc_method_handler(
cast(
Callable[
[TRequest, aio.ServicerContext[TRequest, TResponse]],
AsyncIterator[TResponse],
],
_wrap_unary_stream(handler.unary_stream, method),
),
request_deserializer=handler.request_deserializer,
response_serializer=handler.response_serializer,
)
if handler.stream_unary is not None:
return grpc.stream_unary_rpc_method_handler(
cast(
Callable[
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
Awaitable[TResponse],
],
_wrap_stream_unary(handler.stream_unary, method),
),
request_deserializer=handler.request_deserializer,
response_serializer=handler.response_serializer,
)
if handler.stream_stream is not None:
return grpc.stream_stream_rpc_method_handler(
cast(
Callable[
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
AsyncIterator[TResponse],
],
_wrap_stream_stream(handler.stream_stream, method),
),
request_deserializer=handler.request_deserializer,
response_serializer=handler.response_serializer,
)
# Fallback: return original handler if type unknown
return handler
def _log_request(
method: str,
status: str,
duration_ms: float,
peer: str | None,
) -> None:
"""Log RPC request completion at INFO level.
Args:
method: Full RPC method name.
status: gRPC status code name.
duration_ms: Request duration in milliseconds.
peer: Client peer address.
"""
request_id = get_request_id()
logger.info(
"RPC completed",
method=method,
status=status,
duration_ms=round(duration_ms, 2),
peer=peer,
request_id=request_id,
)
def _get_peer[TRequest, TResponse](
context: aio.ServicerContext[TRequest, TResponse],
) -> str | None:
"""Extract peer address from context safely.
Args:
context: gRPC servicer context.
Returns:
Peer address string or None.
"""
try:
return context.peer()
except (AttributeError, RuntimeError):
return None
def _wrap_unary_unary[TRequest, TResponse](
handler: Callable[
[TRequest, aio.ServicerContext[TRequest, TResponse]],
Awaitable[TResponse],
],
method: str,
) -> Callable[
[TRequest, aio.ServicerContext[TRequest, TResponse]],
Awaitable[TResponse],
]:
"""Wrap unary-unary handler with logging."""
async def wrapper(
request: TRequest,
context: aio.ServicerContext[TRequest, TResponse],
) -> TResponse:
start = time.perf_counter()
peer = _get_peer(context)
status = "OK"
try:
return await handler(request, context)
except grpc.RpcError as e:
status = e.code().name if hasattr(e, "code") else "UNKNOWN"
raise
except Exception:
status = "INTERNAL"
raise
finally:
duration_ms = (time.perf_counter() - start) * 1000
_log_request(method, status, duration_ms, peer)
return wrapper
def _wrap_unary_stream[TRequest, TResponse](
handler: Callable[
[TRequest, aio.ServicerContext[TRequest, TResponse]],
AsyncIterator[TResponse],
],
method: str,
) -> Callable[
[TRequest, aio.ServicerContext[TRequest, TResponse]],
AsyncIterator[TResponse],
]:
"""Wrap unary-stream handler with logging."""
async def wrapper(
request: TRequest,
context: aio.ServicerContext[TRequest, TResponse],
) -> AsyncIterator[TResponse]:
start = time.perf_counter()
peer = _get_peer(context)
status = "OK"
try:
async for response in handler(request, context):
yield response
except grpc.RpcError as e:
status = e.code().name if hasattr(e, "code") else "UNKNOWN"
raise
except Exception:
status = "INTERNAL"
raise
finally:
duration_ms = (time.perf_counter() - start) * 1000
_log_request(method, status, duration_ms, peer)
return wrapper
def _wrap_stream_unary[TRequest, TResponse](
handler: Callable[
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
Awaitable[TResponse],
],
method: str,
) -> Callable[
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
Awaitable[TResponse],
]:
"""Wrap stream-unary handler with logging."""
async def wrapper(
request_iterator: AsyncIterator[TRequest],
context: aio.ServicerContext[TRequest, TResponse],
) -> TResponse:
start = time.perf_counter()
peer = _get_peer(context)
status = "OK"
try:
return await handler(request_iterator, context)
except grpc.RpcError as e:
status = e.code().name if hasattr(e, "code") else "UNKNOWN"
raise
except Exception:
status = "INTERNAL"
raise
finally:
duration_ms = (time.perf_counter() - start) * 1000
_log_request(method, status, duration_ms, peer)
return wrapper
def _wrap_stream_stream[TRequest, TResponse](
handler: Callable[
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
AsyncIterator[TResponse],
],
method: str,
) -> Callable[
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
AsyncIterator[TResponse],
]:
"""Wrap stream-stream handler with logging."""
async def wrapper(
request_iterator: AsyncIterator[TRequest],
context: aio.ServicerContext[TRequest, TResponse],
) -> AsyncIterator[TResponse]:
start = time.perf_counter()
peer = _get_peer(context)
status = "OK"
try:
async for response in handler(request_iterator, context):
yield response
except grpc.RpcError as e:
status = e.code().name if hasattr(e, "code") else "UNKNOWN"
raise
except Exception:
status = "INTERNAL"
raise
finally:
duration_ms = (time.perf_counter() - start) * 1000
_log_request(method, status, duration_ms, peer)
return wrapper

View File

@@ -7,17 +7,17 @@ Used as fallback when no database is configured.
from __future__ import annotations
import threading
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Unpack
from noteflow.config.constants import ERROR_MSG_MEETING_PREFIX
from noteflow.domain.entities import Meeting, Segment, Summary
from noteflow.domain.value_objects import MeetingState
from noteflow.domain.ports.repositories.transcript import MeetingListKwargs
from noteflow.infrastructure.persistence.memory.repositories import (
InMemoryIntegrationRepository,
)
if TYPE_CHECKING:
from collections.abc import Sequence
from datetime import datetime
@@ -91,24 +91,22 @@ class MeetingStore:
def list_all(
self,
states: Sequence[MeetingState] | None = None,
limit: int = 100,
offset: int = 0,
sort_desc: bool = True,
project_id: str | None = None,
**kwargs: Unpack["MeetingListKwargs"],
) -> tuple[list[Meeting], int]:
"""List meetings with optional filtering.
Args:
states: Filter by these states (all if None).
limit: Max meetings per page.
offset: Pagination offset.
sort_desc: Sort by created_at descending.
**kwargs: Optional filters (states, limit, offset, sort_desc).
Returns:
Tuple of (paginated meeting list, total matching count).
"""
with self._lock:
states = kwargs.get("states")
limit = kwargs.get("limit", 100)
offset = kwargs.get("offset", 0)
sort_desc = kwargs.get("sort_desc", True)
project_id = kwargs.get("project_id")
meetings = list(self._meetings.values())
# Filter by state

View File

@@ -2,12 +2,11 @@
from __future__ import annotations
import argparse
import asyncio
import os
import signal
import time
from typing import TYPE_CHECKING, cast
from typing import TYPE_CHECKING, TypedDict, Unpack, cast
import grpc.aio
from pydantic import ValidationError
@@ -15,19 +14,12 @@ from pydantic import ValidationError
from noteflow.config.constants import DEFAULT_GRPC_PORT, SETTING_CLOUD_CONSENT_GRANTED
from noteflow.config.settings import get_feature_flags, get_settings
from noteflow.infrastructure.asr import FasterWhisperEngine
from noteflow.infrastructure.asr.engine import VALID_MODEL_SIZES
from noteflow.infrastructure.logging import LoggingConfig, configure_logging, get_logger
from noteflow.infrastructure.persistence.unit_of_work import SqlAlchemyUnitOfWork
from noteflow.infrastructure.summarization import create_summarization_service
from ._config import (
DEFAULT_BIND_ADDRESS,
DEFAULT_MODEL,
AsrConfig,
DiarizationConfig,
GrpcServerConfig,
ServicesConfig,
)
from ._cli import build_config_from_args, parse_args
from ._config import DEFAULT_BIND_ADDRESS, AsrConfig, GrpcServerConfig, ServicesConfig
from ._startup import (
create_calendar_service,
create_diarization_engine,
@@ -37,6 +29,7 @@ from ._startup import (
print_startup_banner,
setup_summarization_with_consent,
)
from .interceptors import IdentityInterceptor, RequestLoggingInterceptor
from .proto import noteflow_pb2_grpc
from .service import NoteFlowServicer
@@ -45,6 +38,17 @@ if TYPE_CHECKING:
from noteflow.config.settings import Settings
class _ServerInitKwargs(TypedDict, total=False):
"""Optional initialization parameters for NoteFlowServer."""
port: int
bind_address: str
asr: AsrConfig | None
session_factory: async_sessionmaker[AsyncSession] | None
db_engine: AsyncEngine | None
services: ServicesConfig | None
logger = get_logger(__name__)
@@ -53,32 +57,27 @@ class NoteFlowServer:
def __init__(
self,
port: int = DEFAULT_GRPC_PORT,
bind_address: str = DEFAULT_BIND_ADDRESS,
asr: AsrConfig | None = None,
session_factory: async_sessionmaker[AsyncSession] | None = None,
db_engine: AsyncEngine | None = None,
services: ServicesConfig | None = None,
**kwargs: Unpack[_ServerInitKwargs],
) -> None:
"""Initialize the server.
Args:
port: Port to listen on.
bind_address: Address to bind to (0.0.0.0 for all interfaces, 127.0.0.1 for localhost).
asr: ASR engine configuration (defaults to AsrConfig()).
session_factory: Optional async session factory for database.
db_engine: Optional database engine for lifecycle management.
services: Optional services configuration grouping all optional services.
**kwargs: Optional server initialization parameters.
"""
port = kwargs.get("port", DEFAULT_GRPC_PORT)
bind_address = kwargs.get("bind_address", DEFAULT_BIND_ADDRESS)
asr = kwargs.get("asr") or AsrConfig()
session_factory = kwargs.get("session_factory")
db_engine = kwargs.get("db_engine")
services = kwargs.get("services") or ServicesConfig()
self._port = port
self._bind_address = bind_address
asr = asr or AsrConfig()
self._asr_model = asr.model
self._asr_device = asr.device
self._asr_compute_type = asr.compute_type
self._session_factory = session_factory
self._db_engine = db_engine
services = services or ServicesConfig()
self._summarization_service = services.summarization_service
self._diarization_engine = services.diarization_engine
self._diarization_refinement_enabled = services.diarization_refinement_enabled
@@ -93,76 +92,13 @@ class NoteFlowServer:
"""Start the async gRPC server."""
logger.info("Starting NoteFlow gRPC server (async)...")
# Create ASR engine
logger.info(
"Loading ASR model '%s' on %s (%s)...",
self._asr_model,
self._asr_device,
self._asr_compute_type,
)
start_time = time.perf_counter()
asr_engine = FasterWhisperEngine(
compute_type=self._asr_compute_type,
device=self._asr_device,
)
asr_engine.load_model(self._asr_model)
load_time = time.perf_counter() - start_time
logger.info("ASR model loaded in %.2f seconds", load_time)
# Lazy-create summarization service if not provided
if self._summarization_service is None:
self._summarization_service = create_summarization_service()
logger.info("Summarization service initialized (default factory)")
# Lazy-create project service if not provided (requires database)
if self._project_service is None and self._session_factory is not None:
from noteflow.application.services.project_service import ProjectService
self._project_service = ProjectService()
logger.info("Project service initialized")
# Wire consent persistence if database is available
await self._wire_consent_persistence()
# Create servicer with session factory and services config
self._servicer = NoteFlowServicer(
asr_engine=asr_engine,
session_factory=self._session_factory,
services=ServicesConfig(
summarization_service=self._summarization_service,
diarization_engine=self._diarization_engine,
diarization_refinement_enabled=self._diarization_refinement_enabled,
ner_service=self._ner_service,
calendar_service=self._calendar_service,
webhook_service=self._webhook_service,
project_service=self._project_service,
),
)
# Recover orphaned diarization jobs from previous instance
asr_engine = self._load_asr_engine()
await self._ensure_services()
self._servicer = self._build_servicer(asr_engine)
await self._recover_orphaned_jobs()
# Create async gRPC server
self._server = grpc.aio.server(
options=[
("grpc.max_send_message_length", 100 * 1024 * 1024), # 100MB
("grpc.max_receive_message_length", 100 * 1024 * 1024),
],
)
# Register service
noteflow_pb2_grpc.add_NoteFlowServiceServicer_to_server(
cast(noteflow_pb2_grpc.NoteFlowServiceServicer, self._servicer),
self._server,
)
# Bind to port
address = f"{self._bind_address}:{self._port}"
self._server.add_insecure_port(address)
# Start server
self._server = self._create_server()
address = self._bind_server(self._server)
await self._server.start()
logger.info("Server listening on %s", address)
@@ -192,6 +128,78 @@ class NoteFlowServer:
if self._server:
await self._server.wait_for_termination()
def _load_asr_engine(self) -> FasterWhisperEngine:
"""Create and load the ASR engine."""
logger.info(
"Loading ASR model '%s' on %s (%s)...",
self._asr_model,
self._asr_device,
self._asr_compute_type,
)
start_time = time.perf_counter()
asr_engine = FasterWhisperEngine(
compute_type=self._asr_compute_type,
device=self._asr_device,
)
asr_engine.load_model(self._asr_model)
load_time = time.perf_counter() - start_time
logger.info("ASR model loaded in %.2f seconds", load_time)
return asr_engine
async def _ensure_services(self) -> None:
"""Initialize optional services and wire persistence hooks."""
if self._summarization_service is None:
self._summarization_service = create_summarization_service()
logger.info("Summarization service initialized (default factory)")
if self._project_service is None and self._session_factory is not None:
from noteflow.application.services.project_service import ProjectService
self._project_service = ProjectService()
logger.info("Project service initialized")
await self._wire_consent_persistence()
def _build_servicer(self, asr_engine: FasterWhisperEngine) -> NoteFlowServicer:
"""Construct the gRPC servicer instance."""
return NoteFlowServicer(
asr_engine=asr_engine,
session_factory=self._session_factory,
services=ServicesConfig(
summarization_service=self._summarization_service,
diarization_engine=self._diarization_engine,
diarization_refinement_enabled=self._diarization_refinement_enabled,
ner_service=self._ner_service,
calendar_service=self._calendar_service,
webhook_service=self._webhook_service,
project_service=self._project_service,
),
)
@staticmethod
def _create_server() -> grpc.aio.Server:
"""Create async gRPC server with interceptors and limits."""
return grpc.aio.server(
interceptors=[
RequestLoggingInterceptor(),
IdentityInterceptor(),
],
options=[
("grpc.max_send_message_length", 100 * 1024 * 1024), # 100MB
("grpc.max_receive_message_length", 100 * 1024 * 1024),
],
)
def _bind_server(self, server: grpc.aio.Server) -> str:
"""Register servicer and bind the server to the configured address."""
noteflow_pb2_grpc.add_NoteFlowServiceServicer_to_server(
cast(noteflow_pb2_grpc.NoteFlowServiceServicer, self._servicer),
server,
)
address = f"{self._bind_address}:{self._port}"
server.add_insecure_port(address)
return address
async def _recover_orphaned_jobs(self) -> None:
"""Mark orphaned diarization jobs as failed on startup.
@@ -283,35 +291,10 @@ async def run_server_with_config(config: GrpcServerConfig) -> None:
Args:
config: Complete server configuration.
"""
# Initialize database if configured
session_factory: async_sessionmaker[AsyncSession] | None = None
db_engine: AsyncEngine | None = None
if config.database_url:
db_engine, session_factory = await init_database_and_recovery(config.database_url)
# Create summarization service and configure cloud consent
summarization_service = create_summarization_service()
logger.info("Summarization service initialized")
cloud_llm_provider: str | None = None
session_factory, db_engine = await _init_db(config.database_url)
settings = get_settings()
if session_factory:
cloud_llm_provider = await setup_summarization_with_consent(
session_factory, summarization_service, settings
)
# Create optional services based on configuration
ner_service = create_ner_service(session_factory, settings)
calendar_service = await create_calendar_service(session_factory, settings)
diarization_engine = create_diarization_engine(config.diarization)
webhook_service = await create_webhook_service(session_factory, settings) if session_factory else None
# Log warning if webhooks enabled but no database
if get_feature_flags().webhooks_enabled and not session_factory:
logger.warning(
"Webhooks feature enabled but no database configured. "
"Webhooks require database for configuration persistence."
)
services, cloud_llm_provider = await _create_services(config, session_factory, settings)
_warn_webhooks_without_db(session_factory)
server = NoteFlowServer(
port=config.port,
@@ -319,18 +302,78 @@ async def run_server_with_config(config: GrpcServerConfig) -> None:
asr=config.asr,
session_factory=session_factory,
db_engine=db_engine,
services=ServicesConfig(
summarization_service=summarization_service,
diarization_engine=diarization_engine,
diarization_refinement_enabled=config.diarization.refinement_enabled,
ner_service=ner_service,
calendar_service=calendar_service,
webhook_service=webhook_service,
),
services=services,
)
# Set up graceful shutdown
loop = asyncio.get_running_loop()
shutdown_event = _register_shutdown_handlers(asyncio.get_running_loop())
try:
await server.start()
print_startup_banner(
config,
services.diarization_engine,
cloud_llm_provider,
services.calendar_service,
services.webhook_service,
)
await shutdown_event.wait()
finally:
if services.webhook_service is not None:
await services.webhook_service.close()
await server.stop()
async def _init_db(
database_url: str | None,
) -> tuple[async_sessionmaker[AsyncSession] | None, AsyncEngine | None]:
"""Initialize database and return session factory and engine."""
if database_url:
db_engine, session_factory = await init_database_and_recovery(database_url)
return session_factory, db_engine
return None, None
async def _create_services(
config: GrpcServerConfig,
session_factory: async_sessionmaker[AsyncSession] | None,
settings: Settings,
) -> tuple[ServicesConfig, str | None]:
"""Create optional services based on configuration."""
summarization_service = create_summarization_service()
logger.info("Summarization service initialized")
cloud_llm_provider: str | None = None
if session_factory:
cloud_llm_provider = await setup_summarization_with_consent(
session_factory, summarization_service, settings
)
services = ServicesConfig(
summarization_service=summarization_service,
diarization_engine=create_diarization_engine(config.diarization),
diarization_refinement_enabled=config.diarization.refinement_enabled,
ner_service=create_ner_service(session_factory, settings),
calendar_service=await create_calendar_service(session_factory, settings),
webhook_service=await create_webhook_service(session_factory, settings)
if session_factory
else None,
)
return services, cloud_llm_provider
def _warn_webhooks_without_db(
session_factory: async_sessionmaker[AsyncSession] | None,
) -> None:
"""Log warning if webhooks are enabled without a database."""
if get_feature_flags().webhooks_enabled and not session_factory:
logger.warning(
"Webhooks feature enabled but no database configured. "
"Webhooks require database for configuration persistence."
)
def _register_shutdown_handlers(loop: asyncio.AbstractEventLoop) -> asyncio.Event:
"""Register signal handlers to trigger a shutdown event."""
shutdown_event = asyncio.Event()
def signal_handler() -> None:
@@ -339,159 +382,12 @@ async def run_server_with_config(config: GrpcServerConfig) -> None:
for sig in (signal.SIGINT, signal.SIGTERM):
loop.add_signal_handler(sig, signal_handler)
try:
await server.start()
print_startup_banner(
config,
diarization_engine,
cloud_llm_provider,
calendar_service,
webhook_service,
)
await shutdown_event.wait()
finally:
if webhook_service is not None:
await webhook_service.close()
await server.stop()
def _parse_args() -> argparse.Namespace:
"""Parse command-line arguments for the gRPC server."""
parser = argparse.ArgumentParser(description="NoteFlow gRPC Server")
parser.add_argument(
"-p",
"--port",
type=int,
default=DEFAULT_GRPC_PORT,
help=f"Port to listen on (default: {DEFAULT_GRPC_PORT})",
)
parser.add_argument(
"-m",
"--model",
type=str,
default=DEFAULT_MODEL,
choices=list(VALID_MODEL_SIZES),
help=f"ASR model size (default: {DEFAULT_MODEL})",
)
parser.add_argument(
"-d",
"--device",
type=str,
default="cpu",
choices=["cpu", "cuda"],
help="ASR device (default: cpu)",
)
parser.add_argument(
"-c",
"--compute-type",
type=str,
default="int8",
choices=["int8", "float16", "float32"],
help="ASR compute type (default: int8)",
)
parser.add_argument(
"--database-url",
type=str,
default=None,
help="PostgreSQL database URL (overrides NOTEFLOW_DATABASE_URL)",
)
parser.add_argument(
"-v",
"--verbose",
action="store_true",
help="Enable verbose logging",
)
parser.add_argument(
"--diarization",
action="store_true",
help="Enable speaker diarization (requires pyannote.audio)",
)
parser.add_argument(
"--diarization-hf-token",
type=str,
default=None,
help="HuggingFace token for pyannote models (overrides NOTEFLOW_DIARIZATION_HF_TOKEN)",
)
parser.add_argument(
"--diarization-device",
type=str,
default="auto",
choices=["auto", "cpu", "cuda", "mps"],
help="Device for diarization (default: auto)",
)
return parser.parse_args()
def _build_config(args: argparse.Namespace, settings: Settings | None) -> GrpcServerConfig:
"""Build server configuration from CLI arguments and settings.
CLI arguments take precedence over environment settings.
Args:
args: Parsed command-line arguments.
settings: Optional application settings from environment.
Returns:
Complete server configuration.
"""
# Database URL: args override settings
database_url = args.database_url
if not database_url and settings:
database_url = str(settings.database_url)
if not database_url:
logger.warning("No database URL configured, running in-memory mode")
# Diarization config: args override settings
diarization_enabled = args.diarization
diarization_hf_token = args.diarization_hf_token
diarization_device = args.diarization_device
diarization_streaming_latency: float | None = None
diarization_min_speakers: int | None = None
diarization_max_speakers: int | None = None
diarization_refinement_enabled = True
if settings and not diarization_enabled:
diarization_enabled = settings.diarization_enabled
if settings and not diarization_hf_token:
diarization_hf_token = settings.diarization_hf_token
if settings and diarization_device == "auto":
diarization_device = settings.diarization_device
if settings:
diarization_streaming_latency = settings.diarization_streaming_latency
diarization_min_speakers = settings.diarization_min_speakers
diarization_max_speakers = settings.diarization_max_speakers
diarization_refinement_enabled = settings.diarization_refinement_enabled
# Bind address: settings override default (0.0.0.0)
bind_address = DEFAULT_BIND_ADDRESS
if settings:
bind_address = settings.grpc_bind_address
return GrpcServerConfig(
port=args.port,
bind_address=bind_address,
asr=AsrConfig(
model=args.model,
device=args.device,
compute_type=args.compute_type,
),
database_url=database_url,
diarization=DiarizationConfig(
enabled=diarization_enabled,
hf_token=diarization_hf_token,
device=diarization_device,
streaming_latency=diarization_streaming_latency,
min_speakers=diarization_min_speakers,
max_speakers=diarization_max_speakers,
refinement_enabled=diarization_refinement_enabled,
),
)
return shutdown_event
def main() -> None:
"""Entry point for NoteFlow gRPC server."""
args = _parse_args()
args = parse_args()
# Configure centralized logging with structlog
# Get log_format from env before full settings load (logging needed to report load errors)
@@ -507,7 +403,7 @@ def main() -> None:
settings = None
# Build configuration and run server
config = _build_config(args, settings)
config = build_config_from_args(args, settings)
asyncio.run(run_server_with_config(config))

View File

@@ -28,6 +28,7 @@ from noteflow.infrastructure.security.keystore import KeyringKeyStore
from ._config import ServicesConfig
from ._mixins import (
AnnotationMixin,
GrpcContext,
CalendarMixin,
DiarizationJobMixin,
DiarizationMixin,
@@ -48,14 +49,12 @@ from .proto import noteflow_pb2, noteflow_pb2_grpc
from .stream_state import MeetingStreamState
if TYPE_CHECKING:
from collections.abc import AsyncIterator
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker
from noteflow.infrastructure.asr import FasterWhisperEngine
from noteflow.infrastructure.auth.oidc_registry import OidcAuthService
from ._mixins._types import GrpcContext
from ._service_stubs import NoteFlowServicerStubs
logger = get_logger(__name__)
@@ -64,6 +63,13 @@ if TYPE_CHECKING:
else:
_GrpcBaseServicer = noteflow_pb2_grpc.NoteFlowServiceServicer
# Empty class to satisfy MRO - cannot use `object` directly as it conflicts
# with NoteFlowServiceServicer's inheritance from object
class NoteFlowServicerStubs:
"""Runtime placeholder for type stubs (empty at runtime)."""
pass
class NoteFlowServicer(
StreamingMixin,
@@ -82,6 +88,7 @@ class NoteFlowServicer(
OidcMixin,
ProjectMixin,
ProjectMembershipMixin,
NoteFlowServicerStubs,
_GrpcBaseServicer,
):
"""Async gRPC service implementation for NoteFlow with PostgreSQL persistence.
@@ -90,153 +97,7 @@ class NoteFlowServicer(
use `self: Protocol` annotations.
"""
# Type stubs for mixin methods (fixes type inference when mixins use `self: Protocol`)
if TYPE_CHECKING:
# StreamingMixin (test_streaming_real_pipeline.py, test_e2e_streaming.py)
def StreamTranscription(
self,
request_iterator: AsyncIterator[noteflow_pb2.AudioChunk],
context: GrpcContext,
) -> AsyncIterator[noteflow_pb2.TranscriptUpdate]: ...
# CalendarMixin (test_oauth.py)
async def GetCalendarProviders(
self,
request: noteflow_pb2.GetCalendarProvidersRequest,
context: GrpcContext,
) -> noteflow_pb2.GetCalendarProvidersResponse: ...
async def InitiateOAuth(
self,
request: noteflow_pb2.InitiateOAuthRequest,
context: GrpcContext,
) -> noteflow_pb2.InitiateOAuthResponse: ...
async def CompleteOAuth(
self,
request: noteflow_pb2.CompleteOAuthRequest,
context: GrpcContext,
) -> noteflow_pb2.CompleteOAuthResponse: ...
async def GetOAuthConnectionStatus(
self,
request: noteflow_pb2.GetOAuthConnectionStatusRequest,
context: GrpcContext,
) -> noteflow_pb2.GetOAuthConnectionStatusResponse: ...
async def DisconnectOAuth(
self,
request: noteflow_pb2.DisconnectOAuthRequest,
context: GrpcContext,
) -> noteflow_pb2.DisconnectOAuthResponse: ...
# Type stubs for SummarizationMixin methods (test_cloud_consent.py, test_generate_summary.py)
async def GetCloudConsentStatus(
self,
request: noteflow_pb2.GetCloudConsentStatusRequest,
context: GrpcContext,
) -> noteflow_pb2.GetCloudConsentStatusResponse: ...
async def GrantCloudConsent(
self,
request: noteflow_pb2.GrantCloudConsentRequest,
context: GrpcContext,
) -> noteflow_pb2.GrantCloudConsentResponse: ...
async def RevokeCloudConsent(
self,
request: noteflow_pb2.RevokeCloudConsentRequest,
context: GrpcContext,
) -> noteflow_pb2.RevokeCloudConsentResponse: ...
async def GenerateSummary(
self,
request: noteflow_pb2.GenerateSummaryRequest,
context: GrpcContext,
) -> noteflow_pb2.Summary: ...
# Type stubs for SyncMixin methods (test_sync_orchestration.py)
async def StartIntegrationSync(
self,
request: noteflow_pb2.StartIntegrationSyncRequest,
context: GrpcContext,
) -> noteflow_pb2.StartIntegrationSyncResponse: ...
async def GetSyncStatus(
self,
request: noteflow_pb2.GetSyncStatusRequest,
context: GrpcContext,
) -> noteflow_pb2.GetSyncStatusResponse: ...
async def ListSyncHistory(
self,
request: noteflow_pb2.ListSyncHistoryRequest,
context: GrpcContext,
) -> noteflow_pb2.ListSyncHistoryResponse: ...
async def GetUserIntegrations(
self,
request: noteflow_pb2.GetUserIntegrationsRequest,
context: GrpcContext,
) -> noteflow_pb2.GetUserIntegrationsResponse: ...
# Type stubs for DiarizationMixin methods (test_diarization_mixin.py, test_diarization_refine.py)
async def RefineSpeakerDiarization(
self,
request: noteflow_pb2.RefineSpeakerDiarizationRequest,
context: GrpcContext,
) -> noteflow_pb2.RefineSpeakerDiarizationResponse: ...
# Type stubs for SpeakerMixin methods (test_diarization_mixin.py)
async def RenameSpeaker(
self,
request: noteflow_pb2.RenameSpeakerRequest,
context: GrpcContext,
) -> noteflow_pb2.RenameSpeakerResponse: ...
# Type stubs for DiarizationJobMixin methods (test_diarization_mixin.py, test_diarization_cancel.py)
async def GetDiarizationJobStatus(
self,
request: noteflow_pb2.GetDiarizationJobStatusRequest,
context: GrpcContext,
) -> noteflow_pb2.DiarizationJobStatus: ...
async def CancelDiarizationJob(
self,
request: noteflow_pb2.CancelDiarizationJobRequest,
context: GrpcContext,
) -> noteflow_pb2.CancelDiarizationJobResponse: ...
# Type stubs for WebhooksMixin methods (test_webhooks_mixin.py)
async def RegisterWebhook(
self,
request: noteflow_pb2.RegisterWebhookRequest,
context: GrpcContext,
) -> noteflow_pb2.WebhookConfigProto: ...
async def ListWebhooks(
self,
request: noteflow_pb2.ListWebhooksRequest,
context: GrpcContext,
) -> noteflow_pb2.ListWebhooksResponse: ...
async def UpdateWebhook(
self,
request: noteflow_pb2.UpdateWebhookRequest,
context: GrpcContext,
) -> noteflow_pb2.WebhookConfigProto: ...
async def DeleteWebhook(
self,
request: noteflow_pb2.DeleteWebhookRequest,
context: GrpcContext,
) -> noteflow_pb2.DeleteWebhookResponse: ...
async def GetWebhookDeliveries(
self,
request: noteflow_pb2.GetWebhookDeliveriesRequest,
context: GrpcContext,
) -> noteflow_pb2.GetWebhookDeliveriesResponse: ...
# Type stubs now live in _service_stubs.py to keep module size down.
VERSION: Final[str] = __version__
MAX_CHUNK_SIZE: Final[int] = 1024 * 1024 # 1MB
@@ -542,7 +403,6 @@ class NoteFlowServicer(
any running jobs as failed in the database.
"""
logger.info("Shutting down servicer...")
# Cancel in-flight diarization tasks
cancelled_job_ids = list(self.diarization_tasks.keys())
for job_id, task in list(self.diarization_tasks.items()):

View File

@@ -7,7 +7,7 @@ from __future__ import annotations
import asyncio
from collections.abc import Iterable, Iterator
from typing import TYPE_CHECKING, Final, Protocol, cast
from typing import TYPE_CHECKING, Final, Protocol, TypedDict, Unpack, cast
from noteflow.infrastructure.logging import get_logger, log_timing
@@ -16,6 +16,15 @@ if TYPE_CHECKING:
from numpy.typing import NDArray
class _WhisperTranscribeKwargs(TypedDict, total=False):
"""Keyword arguments supported by WhisperModel.transcribe."""
language: str | None
word_timestamps: bool
beam_size: int
vad_filter: bool
class _WhisperWord(Protocol):
word: str
start: float
@@ -41,11 +50,7 @@ class _WhisperModel(Protocol):
def transcribe(
self,
audio: NDArray[np.float32],
*,
language: str | None = None,
word_timestamps: bool = ...,
beam_size: int = ...,
vad_filter: bool = ...,
**kwargs: Unpack[_WhisperTranscribeKwargs],
) -> tuple[Iterable[_WhisperSegment], _WhisperInfo]: ...
from noteflow.infrastructure.asr.dto import AsrResult, WordTiming
@@ -209,13 +214,35 @@ class FasterWhisperEngine:
Returns:
List of AsrResult segments with word-level timestamps.
"""
# Calculate audio duration for timing context
sample_count = len(audio)
# Assume 16kHz sample rate (standard for Whisper)
audio_duration_seconds = sample_count / 16000.0
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
None,
self._transcribe_to_list,
audio,
language,
)
with log_timing(
"asr_transcribe",
audio_duration_seconds=round(audio_duration_seconds, 2),
sample_count=sample_count,
model_size=self._model_size,
):
results = await loop.run_in_executor(
None,
self._transcribe_to_list,
audio,
language,
)
# Log real-time factor (RTF) for performance monitoring
# RTF < 1.0 means faster than real-time
if audio_duration_seconds > 0:
logger.debug(
"asr_transcribe_rtf",
segment_count=len(results),
audio_duration_seconds=round(audio_duration_seconds, 2),
)
return results
def _transcribe_to_list(
self,

View File

@@ -14,10 +14,13 @@ import numpy as np
from numpy.typing import NDArray
from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Iterator
logger = get_logger(__name__)
class SegmenterState(Enum):
"""Segmenter state machine states."""
@@ -150,9 +153,17 @@ class Segmenter:
"""Handle audio in IDLE state."""
if is_speech:
# Speech started - transition to SPEECH state
old_state = self._state
self._state = SegmenterState.SPEECH
self._speech_start_time = chunk_start
logger.debug(
"segmenter_state_transition",
from_state=old_state.name,
to_state=self._state.name,
stream_time=round(self._stream_time, 2),
)
# Capture how much pre-speech audio we are including (O(1) lookup).
self._leading_duration = self._leading_buffer_samples / self.config.sample_rate
@@ -194,15 +205,29 @@ class Segmenter:
else:
# Speech ended - transition to TRAILING
# Start trailing buffer with this silent chunk
old_state = self._state
self._state = SegmenterState.TRAILING
self._trailing_buffer = [audio]
self._trailing_duration = chunk_duration
logger.debug(
"segmenter_state_transition",
from_state=old_state.name,
to_state=self._state.name,
stream_time=round(self._stream_time, 2),
)
# Check if already past trailing threshold
if self._trailing_duration >= self.config.trailing_silence:
segment = self._emit_segment()
if segment is not None:
yield segment
logger.debug(
"segmenter_state_transition",
from_state=SegmenterState.TRAILING.name,
to_state=SegmenterState.IDLE.name,
stream_time=round(self._stream_time, 2),
)
self._state = SegmenterState.IDLE
def _handle_trailing(
@@ -219,7 +244,15 @@ class Segmenter:
self._speech_buffer.append(audio)
self._trailing_buffer.clear()
self._trailing_duration = 0.0
old_state = self._state
self._state = SegmenterState.SPEECH
logger.debug(
"segmenter_state_transition",
from_state=old_state.name,
to_state=self._state.name,
stream_time=round(self._stream_time, 2),
)
else:
# Still silence - accumulate trailing
self._trailing_buffer.append(audio)
@@ -230,6 +263,12 @@ class Segmenter:
segment = self._emit_segment()
if segment is not None:
yield segment
logger.debug(
"segmenter_state_transition",
from_state=SegmenterState.TRAILING.name,
to_state=SegmenterState.IDLE.name,
stream_time=round(self._stream_time, 2),
)
self._state = SegmenterState.IDLE
def _update_leading_buffer(self, audio: NDArray[np.float32]) -> None:

View File

@@ -7,12 +7,13 @@ from __future__ import annotations
import time
from collections.abc import Callable, Mapping
from typing import TYPE_CHECKING, cast
from typing import TYPE_CHECKING, Unpack, cast
import numpy as np
from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.infrastructure.audio.dto import AudioDeviceInfo, AudioFrameCallback
from noteflow.infrastructure.audio.protocols import AudioCaptureStartKwargs
from noteflow.infrastructure.audio.sounddevice_support import (
InputStreamLike,
MissingSoundDevice,
@@ -110,18 +111,14 @@ class SoundDeviceCapture:
self,
device_id: int | None,
on_frames: AudioFrameCallback,
sample_rate: int = DEFAULT_SAMPLE_RATE,
channels: int = 1,
chunk_duration_ms: int = 100,
**kwargs: Unpack[AudioCaptureStartKwargs],
) -> None:
"""Start capturing audio from the specified device.
Args:
device_id: Device ID to capture from, or None for default device.
on_frames: Callback receiving (frames, timestamp) for each chunk.
sample_rate: Sample rate in Hz (default 16kHz for ASR).
channels: Number of channels (default 1 for mono).
chunk_duration_ms: Duration of each audio chunk in milliseconds.
**kwargs: Optional capture settings (sample_rate, channels, chunk_duration_ms).
Raises:
RuntimeError: If already capturing.
@@ -130,6 +127,10 @@ class SoundDeviceCapture:
if self._stream is not None:
raise RuntimeError("Already capturing audio")
sample_rate = kwargs.get("sample_rate", DEFAULT_SAMPLE_RATE)
channels = kwargs.get("channels", 1)
chunk_duration_ms = kwargs.get("chunk_duration_ms", 100)
self._callback = on_frames
self._device_id = device_id
self._sample_rate = sample_rate

View File

@@ -5,9 +5,7 @@ Define Protocol interfaces for audio capture, level metering, and buffering.
from __future__ import annotations
from typing import TYPE_CHECKING, Protocol
from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
if TYPE_CHECKING:
import numpy as np
@@ -20,6 +18,14 @@ if TYPE_CHECKING:
)
class AudioCaptureStartKwargs(TypedDict, total=False):
"""Optional parameters for AudioCapture.start."""
sample_rate: int
channels: int
chunk_duration_ms: int
class AudioCapture(Protocol):
"""Protocol for audio input capture.
@@ -39,18 +45,14 @@ class AudioCapture(Protocol):
self,
device_id: int | None,
on_frames: AudioFrameCallback,
sample_rate: int = DEFAULT_SAMPLE_RATE,
channels: int = 1,
chunk_duration_ms: int = 100,
**kwargs: Unpack[AudioCaptureStartKwargs],
) -> None:
"""Start capturing audio from the specified device.
Args:
device_id: Device ID to capture from, or None for default device.
on_frames: Callback receiving (frames, timestamp) for each chunk.
sample_rate: Sample rate in Hz (default 16kHz for ASR).
channels: Number of channels (default 1 for mono).
chunk_duration_ms: Duration of each audio chunk in milliseconds.
**kwargs: Optional capture settings (sample_rate, channels, chunk_duration_ms).
Raises:
RuntimeError: If already capturing.

View File

@@ -7,7 +7,7 @@ import json
import threading
from datetime import UTC, datetime
from pathlib import Path
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, TypedDict, Unpack
import numpy as np
@@ -24,6 +24,13 @@ if TYPE_CHECKING:
from noteflow.infrastructure.security.crypto import AesGcmCryptoBox
class _AudioWriterOpenKwargs(TypedDict, total=False):
"""Optional parameters for opening a meeting writer."""
sample_rate: int
asset_path: str | None
logger = get_logger(__name__)
@@ -81,8 +88,7 @@ class MeetingAudioWriter:
meeting_id: str,
dek: bytes,
wrapped_dek: bytes,
sample_rate: int = DEFAULT_SAMPLE_RATE,
asset_path: str | None = None,
**kwargs: Unpack[_AudioWriterOpenKwargs],
) -> None:
"""Open meeting for audio writing.
@@ -92,54 +98,21 @@ class MeetingAudioWriter:
meeting_id: Meeting UUID string.
dek: Unwrapped data encryption key (32 bytes).
wrapped_dek: Encrypted DEK to store in manifest.
sample_rate: Audio sample rate (default 16000 Hz).
asset_path: Relative path for audio storage (defaults to meeting_id).
This allows meetings_dir to change without orphaning files.
**kwargs: Optional settings (sample_rate, asset_path).
Raises:
RuntimeError: If already open.
OSError: If directory creation fails.
"""
if self._asset_writer is not None:
raise RuntimeError("Writer already open")
sample_rate = kwargs.get("sample_rate", DEFAULT_SAMPLE_RATE)
asset_path = kwargs.get("asset_path")
# Use asset_path if provided, otherwise default to meeting_id
storage_path = asset_path or meeting_id
# Create meeting directory
self._meeting_dir = self._meetings_dir / storage_path
self._meeting_dir.mkdir(parents=True, exist_ok=True)
# Write manifest.json
manifest = {
"meeting_id": meeting_id,
"created_at": datetime.now(UTC).isoformat(),
"sample_rate": sample_rate,
"channels": 1,
"format": "pcm16",
"wrapped_dek": wrapped_dek.hex(), # Store as hex string
}
manifest_path = self._meeting_dir / "manifest.json"
manifest_path.write_text(json.dumps(manifest, indent=2))
# Open encrypted audio file
audio_path = self._meeting_dir / "audio.enc"
self._asset_writer = ChunkedAssetWriter(self._crypto)
self._asset_writer.open(audio_path, dek)
self._sample_rate = sample_rate
self._chunk_count = 0
self._write_count = 0
self._buffer = io.BytesIO()
# Start periodic flush thread for crash resilience
self._stop_flush.clear()
self._flush_thread = threading.Thread(
target=self._periodic_flush_loop,
name=f"AudioFlush-{meeting_id[:8]}",
daemon=True,
)
self._flush_thread.start()
self._ensure_closed()
self._initialize_meeting_dir(meeting_id, asset_path)
self._write_manifest(meeting_id, wrapped_dek, sample_rate)
self._open_audio_file(dek)
self._reset_state(sample_rate)
self._start_flush_thread(meeting_id)
logger.info(
"Opened audio writer: meeting=%s, dir=%s, buffer_size=%d",
@@ -148,6 +121,63 @@ class MeetingAudioWriter:
self._buffer_size,
)
def _ensure_closed(self) -> None:
"""Raise if writer is already open."""
if self._asset_writer is not None:
raise RuntimeError("Writer already open")
def _initialize_meeting_dir(self, meeting_id: str, asset_path: str | None) -> None:
"""Create the meeting directory for this session."""
storage_path = asset_path or meeting_id
self._meeting_dir = self._meetings_dir / storage_path
self._meeting_dir.mkdir(parents=True, exist_ok=True)
def _write_manifest(
self,
meeting_id: str,
wrapped_dek: bytes,
sample_rate: int,
) -> None:
"""Write the manifest.json metadata file."""
if self._meeting_dir is None:
raise RuntimeError("Meeting directory not initialized")
manifest = {
"meeting_id": meeting_id,
"created_at": datetime.now(UTC).isoformat(),
"sample_rate": sample_rate,
"channels": 1,
"format": "pcm16",
"wrapped_dek": wrapped_dek.hex(),
}
manifest_path = self._meeting_dir / "manifest.json"
manifest_path.write_text(json.dumps(manifest, indent=2))
def _open_audio_file(self, dek: bytes) -> None:
"""Open the encrypted audio file for writing."""
if self._meeting_dir is None:
raise RuntimeError("Meeting directory not initialized")
audio_path = self._meeting_dir / "audio.enc"
self._asset_writer = ChunkedAssetWriter(self._crypto)
self._asset_writer.open(audio_path, dek)
def _reset_state(self, sample_rate: int) -> None:
"""Reset internal counters and buffers for a new meeting."""
self._sample_rate = sample_rate
self._chunk_count = 0
self._write_count = 0
self._buffer = io.BytesIO()
def _start_flush_thread(self, meeting_id: str) -> None:
"""Start periodic flush thread for crash resilience."""
self._stop_flush.clear()
self._flush_thread = threading.Thread(
target=self._periodic_flush_loop,
name=f"AudioFlush-{meeting_id[:8]}",
daemon=True,
)
self._flush_thread.start()
logger.info("flush_thread_started", meeting_id=meeting_id)
def _periodic_flush_loop(self) -> None:
"""Background thread: periodically flush buffer for crash resilience."""
while not self._stop_flush.wait(timeout=PERIODIC_FLUSH_INTERVAL_SECONDS):
@@ -243,7 +273,9 @@ class MeetingAudioWriter:
if self._flush_thread is not None:
self._flush_thread.join(timeout=3.0)
if self._flush_thread.is_alive():
logger.warning("Audio flush thread did not stop within timeout")
logger.warning("flush_thread_timeout", message="Audio flush thread did not stop within timeout")
else:
logger.info("flush_thread_stopped")
self._flush_thread = None
if self._asset_writer is not None:

View File

@@ -15,6 +15,7 @@ from noteflow.domain.auth.oidc import (
OidcProviderConfig,
OidcProviderCreateParams,
OidcProviderPreset,
OidcProviderRegistration,
)
from noteflow.infrastructure.auth.oidc_discovery import (
OidcDiscoveryClient,
@@ -186,21 +187,15 @@ class OidcProviderRegistry:
async def create_provider(
self,
workspace_id: UUID,
name: str,
issuer_url: str,
client_id: str,
params: OidcProviderCreateParams | None = None,
registration: OidcProviderRegistration,
*,
params: OidcProviderCreateParams | None = None,
auto_discover: bool = True,
) -> OidcProviderConfig:
"""Create and configure a new OIDC provider.
Args:
workspace_id: Workspace this provider belongs to.
name: Display name for the provider.
issuer_url: OIDC issuer URL.
client_id: OAuth client ID.
registration: Provider registration details.
params: Optional creation parameters (preset, scopes, etc.).
auto_discover: Whether to fetch discovery document.
@@ -223,10 +218,10 @@ class OidcProviderRegistry:
)
provider = OidcProviderConfig.create(
workspace_id=workspace_id,
name=name,
issuer_url=issuer_url,
client_id=client_id,
workspace_id=registration.workspace_id,
name=registration.name,
issuer_url=registration.issuer_url,
client_id=registration.client_id,
params=effective_params,
)
@@ -350,24 +345,14 @@ class OidcAuthService:
async def register_provider(
self,
workspace_id: UUID,
name: str,
issuer_url: str,
client_id: str,
client_secret: str | None = None,
registration: OidcProviderRegistration,
*,
preset: OidcProviderPreset = OidcProviderPreset.CUSTOM,
uow: UnitOfWork | None = None,
) -> tuple[OidcProviderConfig, list[str]]:
"""Register a new OIDC provider with validation.
Args:
workspace_id: Workspace this provider belongs to.
name: Display name for the provider.
issuer_url: OIDC issuer URL.
client_id: OAuth client ID.
client_secret: Optional client secret (for confidential clients).
preset: Provider preset.
registration: Provider registration details.
uow: Unit of work for persistence.
Returns:
@@ -377,17 +362,14 @@ class OidcAuthService:
OidcDiscoveryError: If discovery fails.
"""
provider = await self._registry.create_provider(
workspace_id=workspace_id,
name=name,
issuer_url=issuer_url,
client_id=client_id,
params=OidcProviderCreateParams(preset=preset),
registration,
params=OidcProviderCreateParams(preset=registration.preset),
)
warnings = await self._registry.validate_provider(provider)
# Store client secret securely if provided
if client_secret and uow:
if registration.client_secret and uow:
# Would store in IntegrationSecretModel
logger.info("Client secret provided for provider %s", provider.id)

View File

@@ -238,14 +238,18 @@ class GoogleCalendarAdapter(CalendarPort):
def _extract_meeting_url(self, item: _GoogleEvent) -> str | None:
"""Extract video meeting URL from event data."""
if hangout_link := item.get("hangoutLink"):
hangout_link = item.get("hangoutLink")
if hangout_link:
return hangout_link
if conference_data := item.get("conferenceData"):
entry_points = conference_data.get("entryPoints", [])
for entry in entry_points:
if entry.get("entryPointType") == "video":
if uri := entry.get("uri"):
return uri
conference_data = item.get("conferenceData")
if not conference_data:
return None
entry_points = conference_data.get("entryPoints", [])
for entry in entry_points:
uri = entry.get("uri")
if entry.get("entryPointType") == "video" and uri:
return uri
return None

View File

@@ -5,6 +5,7 @@ Implements CalendarPort for Outlook using Microsoft Graph API.
from __future__ import annotations
from dataclasses import dataclass
from datetime import UTC, datetime, timedelta
from typing import Final, TypedDict, cast
@@ -32,6 +33,16 @@ MAX_ERROR_BODY_LENGTH: Final[int] = 500
GRAPH_API_MAX_PAGE_SIZE: Final[int] = 100 # Graph API maximum
@dataclass(frozen=True, slots=True)
class _OutlookEventQuery:
"""Query parameters for fetching calendar events."""
start_time: str
end_time: str
hours_ahead: int
limit: int
class _OutlookDateTime(TypedDict, total=False):
dateTime: str
timeZone: str
@@ -139,22 +150,6 @@ class OutlookCalendarAdapter(CalendarPort):
"Prefer": 'outlook.timezone="UTC"',
}
# Initial page request
page_size = min(limit, GRAPH_API_MAX_PAGE_SIZE)
url: str | None = f"{self.GRAPH_API_BASE}/me/calendarView"
params: dict[str, str | int] | None = {
"startDateTime": start_time,
"endDateTime": end_time,
"$top": page_size,
"$orderby": "start/dateTime",
"$select": (
"id,subject,start,end,location,bodyPreview,"
"attendees,isAllDay,seriesMasterId,onlineMeeting,onlineMeetingUrl"
),
}
all_events: list[CalendarEventInfo] = []
with log_timing(
"outlook_calendar_list_events",
hours_ahead=hours_ahead,
@@ -164,38 +159,17 @@ class OutlookCalendarAdapter(CalendarPort):
timeout=httpx.Timeout(GRAPH_API_TIMEOUT),
limits=httpx.Limits(max_connections=MAX_CONNECTIONS),
) as client:
while url is not None:
response = await client.get(url, params=params, headers=headers)
if response.status_code == HTTP_STATUS_UNAUTHORIZED:
raise OutlookCalendarError(ERR_TOKEN_EXPIRED)
if response.status_code != HTTP_STATUS_OK:
error_body = _truncate_error_body(response.text)
logger.error("Microsoft Graph API error: %s", error_body)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_body}")
data_value = response.json()
if not isinstance(data_value, dict):
logger.warning("Unexpected Microsoft Graph response payload")
break
data = cast(_OutlookEventsResponse, data_value)
items = data.get("value", [])
for item in items:
all_events.append(self._parse_event(item))
if len(all_events) >= limit:
logger.info(
"outlook_calendar_events_fetched",
event_count=len(all_events),
hours_ahead=hours_ahead,
)
return all_events
# Check for next page
next_link = data.get("@odata.nextLink") or data.get("@odata_nextLink")
url = str(next_link) if isinstance(next_link, str) else None
params = None # nextLink includes query params
query = _OutlookEventQuery(
start_time=start_time,
end_time=end_time,
hours_ahead=hours_ahead,
limit=limit,
)
all_events = await self._fetch_events(
client,
headers,
query,
)
logger.info(
"outlook_calendar_events_fetched",
@@ -204,6 +178,75 @@ class OutlookCalendarAdapter(CalendarPort):
)
return all_events
async def _fetch_events(
self,
client: httpx.AsyncClient,
headers: dict[str, str],
query: _OutlookEventQuery,
) -> list[CalendarEventInfo]:
"""Fetch events with pagination handling."""
page_size = min(query.limit, GRAPH_API_MAX_PAGE_SIZE)
url: str | None = f"{self.GRAPH_API_BASE}/me/calendarView"
params: dict[str, str | int] | None = {
"startDateTime": query.start_time,
"endDateTime": query.end_time,
"$top": page_size,
"$orderby": "start/dateTime",
"$select": (
"id,subject,start,end,location,bodyPreview,"
"attendees,isAllDay,seriesMasterId,onlineMeeting,onlineMeetingUrl"
),
}
all_events: list[CalendarEventInfo] = []
while url is not None:
response = await client.get(url, params=params, headers=headers)
self._raise_for_status(response)
parsed = self._parse_events_response(response)
if parsed is None:
break
items, next_url = parsed
for item in items:
all_events.append(self._parse_event(item))
if len(all_events) >= query.limit:
logger.info(
"outlook_calendar_events_fetched",
event_count=len(all_events),
hours_ahead=query.hours_ahead,
)
return all_events
url = next_url
params = None # nextLink includes query params
return all_events
@staticmethod
def _raise_for_status(response: httpx.Response) -> None:
"""Raise OutlookCalendarError on non-success responses."""
if response.status_code == HTTP_STATUS_UNAUTHORIZED:
raise OutlookCalendarError(ERR_TOKEN_EXPIRED)
if response.status_code != HTTP_STATUS_OK:
error_body = _truncate_error_body(response.text)
logger.error("Microsoft Graph API error: %s", error_body)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_body}")
@staticmethod
def _parse_events_response(
response: httpx.Response,
) -> tuple[list[_OutlookEvent], str | None] | None:
"""Parse event payload and next link from the response."""
data_value = response.json()
if not isinstance(data_value, dict):
logger.warning("Unexpected Microsoft Graph response payload")
return None
data = cast(_OutlookEventsResponse, data_value)
items = data.get("value", [])
next_link = data.get("@odata.nextLink") or data.get("@odata_nextLink")
next_url = str(next_link) if isinstance(next_link, str) else None
return items, next_url
async def get_user_email(self, access_token: str) -> str:
"""Get authenticated user's email address.

View File

@@ -10,7 +10,7 @@ from __future__ import annotations
import os
from collections.abc import Mapping, Sequence
from typing import TYPE_CHECKING, Protocol, Self, cast
from typing import TYPE_CHECKING, Protocol, Self, TypedDict, Unpack, cast
from noteflow.config.constants import DEFAULT_SAMPLE_RATE, ERR_HF_TOKEN_REQUIRED
from noteflow.infrastructure.diarization.dto import SpeakerTurn
@@ -50,6 +50,15 @@ class _OfflinePipeline(Protocol):
class _TorchModule(Protocol):
def from_numpy(self, ndarray: NDArray[np.float32]) -> Tensor: ...
class _DiarizationEngineKwargs(TypedDict, total=False):
"""Optional diarization engine settings."""
hf_token: str | None
streaming_latency: float
min_speakers: int
max_speakers: int
logger = get_logger(__name__)
@@ -63,21 +72,19 @@ class DiarizationEngine:
def __init__(
self,
device: str = "auto",
hf_token: str | None = None,
streaming_latency: float = 0.5,
min_speakers: int = 1,
max_speakers: int = 10,
**kwargs: Unpack[_DiarizationEngineKwargs],
) -> None:
"""Initialize the diarization engine.
Args:
device: Device to use ("auto", "cpu", "cuda", "mps").
"auto" selects CUDA > MPS > CPU based on availability.
hf_token: HuggingFace token for pyannote model access.
streaming_latency: Latency for streaming diarization in seconds.
min_speakers: Minimum expected speakers for offline diarization.
max_speakers: Maximum expected speakers for offline diarization.
**kwargs: Optional settings (hf_token, streaming_latency, min_speakers, max_speakers).
"""
hf_token = kwargs.get("hf_token")
streaming_latency = kwargs.get("streaming_latency", 0.5)
min_speakers = kwargs.get("min_speakers", 1)
max_speakers = kwargs.get("max_speakers", 10)
self._device_preference = device
self._device: str | None = None
self._hf_token = hf_token

View File

@@ -156,7 +156,7 @@ class DiarizationSession:
self._turns.clear()
# Explicitly release pipeline reference to allow GC and GPU memory release
self._pipeline = None
logger.debug("Session %s closed", self.meeting_id)
logger.info("diarization_session_closed", meeting_id=self.meeting_id)
@property
def stream_time(self) -> float:

View File

@@ -5,6 +5,7 @@ Export meeting transcripts to HTML format.
from __future__ import annotations
import time
from datetime import datetime
from typing import TYPE_CHECKING
@@ -14,6 +15,7 @@ from noteflow.infrastructure.export._formatting import (
format_datetime,
format_timestamp,
)
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Sequence
@@ -22,6 +24,8 @@ if TYPE_CHECKING:
from noteflow.domain.entities.segment import Segment
from noteflow.domain.entities.summary import Summary
logger = get_logger(__name__)
# CSS styles for print-friendly HTML output
_HTML_STYLES = """
@@ -174,6 +178,7 @@ class HtmlExporter:
Returns:
HTML-formatted transcript string.
"""
start = time.perf_counter()
content_parts: list[str] = [f"<h1>{escape_html(meeting.title)}</h1>"]
content_parts.extend(_build_metadata_html(meeting, len(segments)))
content_parts.extend(_build_transcript_html(segments))
@@ -189,4 +194,13 @@ class HtmlExporter:
)
)
content = "\n".join(content_parts)
return _build_html_document(title=escape_html(meeting.title), content=content)
result = _build_html_document(title=escape_html(meeting.title), content=content)
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info(
"html_exported",
meeting_id=str(meeting.id),
segment_count=len(segments),
size_bytes=len(result.encode("utf-8")),
duration_ms=round(elapsed_ms, 2),
)
return result

View File

@@ -5,10 +5,12 @@ Export meeting transcripts to Markdown format.
from __future__ import annotations
import time
from datetime import datetime
from typing import TYPE_CHECKING
from noteflow.infrastructure.export._formatting import format_datetime, format_timestamp
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Sequence
@@ -16,6 +18,8 @@ if TYPE_CHECKING:
from noteflow.domain.entities.meeting import Meeting
from noteflow.domain.entities.segment import Segment
logger = get_logger(__name__)
class MarkdownExporter:
"""Export meeting transcripts to Markdown format.
@@ -48,6 +52,7 @@ class MarkdownExporter:
Returns:
Markdown-formatted transcript string.
"""
start = time.perf_counter()
lines: list[str] = [
f"# {meeting.title}",
"",
@@ -86,4 +91,13 @@ class MarkdownExporter:
lines.append("---")
lines.append(f"*Exported from NoteFlow on {format_datetime(datetime.now())}*")
return "\n".join(lines)
result = "\n".join(lines)
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info(
"markdown_exported",
meeting_id=str(meeting.id),
segment_count=len(segments),
size_bytes=len(result.encode("utf-8")),
duration_ms=round(elapsed_ms, 2),
)
return result

View File

@@ -5,6 +5,7 @@ Export meeting transcripts to PDF format.
from __future__ import annotations
import time
from typing import TYPE_CHECKING, Protocol, cast
from noteflow.config.constants import EXPORT_EXT_PDF
@@ -13,6 +14,7 @@ from noteflow.infrastructure.export._formatting import (
format_datetime,
format_timestamp,
)
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Sequence
@@ -20,6 +22,8 @@ if TYPE_CHECKING:
from noteflow.domain.entities.meeting import Meeting
from noteflow.domain.entities.segment import Segment
logger = get_logger(__name__)
class _WeasyHTMLProtocol(Protocol):
"""Protocol for weasyprint HTML class."""
@@ -175,6 +179,7 @@ class PdfExporter:
Raises:
RuntimeError: If weasyprint is not installed.
"""
start = time.perf_counter()
weasy_html = _get_weasy_html()
if weasy_html is None:
raise RuntimeError(
@@ -183,6 +188,14 @@ class PdfExporter:
html_content = self.build_html(meeting, segments)
pdf_bytes: bytes = weasy_html(string=html_content).write_pdf()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info(
"pdf_exported",
meeting_id=str(meeting.id),
segment_count=len(segments),
size_bytes=len(pdf_bytes),
duration_ms=round(elapsed_ms, 2),
)
return pdf_bytes
def build_html(self, meeting: Meeting, segments: Sequence[Segment]) -> str:

View File

@@ -28,10 +28,18 @@ from .structured import (
user_id_var,
workspace_id_var,
)
from .rate_limit import (
DEFAULT_RATE_LIMIT_SECONDS,
RateLimitedLogger,
get_client_rate_limiter,
)
from .timing import log_timing
from .transitions import log_state_transition
__all__ = [
"DEFAULT_RATE_LIMIT_SECONDS",
"RateLimitedLogger",
"get_client_rate_limiter",
"LogBuffer",
"LogBufferHandler",
"LogEntry",

View File

@@ -67,27 +67,35 @@ def add_otel_trace_context(
Returns:
Updated event dictionary with trace context.
"""
if trace_context := _extract_otel_context():
trace_id, span_id, parent_id = trace_context
event_dict[_TRACE_ID] = format(trace_id, _HEX_32)
event_dict[_SPAN_ID] = format(span_id, _HEX_16)
if parent_id is not None:
event_dict[_PARENT_SPAN_ID] = format(parent_id, _HEX_16)
return event_dict
def _extract_otel_context() -> tuple[int, int, int | None] | None:
"""Return OpenTelemetry trace/span IDs if available."""
try:
from opentelemetry import trace
span = trace.get_current_span()
if span.is_recording():
ctx = span.get_span_context()
if ctx.is_valid:
event_dict[_TRACE_ID] = format(ctx.trace_id, _HEX_32)
event_dict[_SPAN_ID] = format(ctx.span_id, _HEX_16)
# Parent span ID if available
parent = getattr(span, "parent", None)
if parent is not None:
parent_ctx = getattr(parent, _SPAN_ID, None)
if parent_ctx is not None:
event_dict[_PARENT_SPAN_ID] = format(parent_ctx, _HEX_16)
except ImportError:
pass
return None
try:
span = trace.get_current_span()
if not span.is_recording():
return None
ctx = span.get_span_context()
if not ctx.is_valid:
return None
parent = getattr(span, "parent", None)
parent_ctx = getattr(parent, _SPAN_ID, None) if parent is not None else None
return ctx.trace_id, ctx.span_id, parent_ctx
except (AttributeError, TypeError):
# Graceful degradation for edge cases
pass
return event_dict
return None
def build_processor_chain(config: LoggingConfig) -> Sequence[Processor]:

View File

@@ -0,0 +1,106 @@
"""Rate-limited logging utilities.
Provide helpers to prevent log spam for repetitive conditions.
"""
from __future__ import annotations
import time
from typing import Final
from .config import get_logger
# Default rate limit interval (60 seconds)
DEFAULT_RATE_LIMIT_SECONDS: Final[float] = 60.0
class RateLimitedLogger:
"""Logger that rate-limits messages by operation key.
Prevents log spam by only logging each unique key once per interval.
Example:
rate_limited = RateLimitedLogger()
# This will log at most once per 60 seconds for "create_meeting"
rate_limited.warn_stub_missing("create_meeting")
"""
def __init__(
self,
interval_seconds: float = DEFAULT_RATE_LIMIT_SECONDS,
logger_name: str | None = None,
) -> None:
"""Initialize rate-limited logger.
Args:
interval_seconds: Minimum time between logs for same key.
logger_name: Optional logger name (defaults to rate_limit module).
"""
self._interval = interval_seconds
self._last_logged: dict[str, float] = {}
self._logger = get_logger(logger_name or __name__)
def _should_log(self, key: str) -> bool:
"""Check if enough time has passed to log this key."""
now = time.monotonic()
last = self._last_logged.get(key)
if last is None or (now - last) >= self._interval:
self._last_logged[key] = now
return True
return False
def warn_stub_missing(self, operation: str) -> None:
"""Log a warning that gRPC stub is not available (rate-limited).
Args:
operation: Name of the operation being attempted.
"""
key = f"stub_missing:{operation}"
if self._should_log(key):
self._logger.warning(
"grpc_stub_not_available",
operation=operation,
hint="Client not connected to server",
)
def warn(self, key: str, message: str, **context: str | int | float | None) -> None:
"""Log a rate-limited warning.
Args:
key: Unique key for rate limiting.
message: Log message/event name.
**context: Additional context fields.
"""
if self._should_log(key):
ctx = {k: v for k, v in context.items() if v is not None}
self._logger.warning(message, **ctx)
def reset(self, key: str | None = None) -> None:
"""Reset rate limit state.
Args:
key: Specific key to reset, or None to reset all.
"""
if key is None:
self._last_logged.clear()
else:
self._last_logged.pop(key, None)
# Module-level singleton for client mixin use
_client_rate_limiter: RateLimitedLogger | None = None
def get_client_rate_limiter() -> RateLimitedLogger:
"""Get the shared rate-limited logger for gRPC client operations.
Returns:
Singleton RateLimitedLogger instance.
"""
global _client_rate_limiter
if _client_rate_limiter is None:
_client_rate_limiter = RateLimitedLogger(
interval_seconds=DEFAULT_RATE_LIMIT_SECONDS,
logger_name="noteflow.grpc.client",
)
return _client_rate_limiter

View File

@@ -17,6 +17,33 @@ from .config import get_logger
P = ParamSpec("P")
R = TypeVar("R")
T = TypeVar("T") # Used by _wrap_async for coroutine inner type
def _wrap_async(
func: Callable[P, Coroutine[object, object, T]],
operation: str,
context: dict[str, str | int | float | None],
) -> Callable[P, Coroutine[object, object, T]]:
@functools.wraps(func)
async def async_wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
with log_timing(operation, **context):
return await func(*args, **kwargs)
return async_wrapper
def _wrap_sync(
func: Callable[P, R],
operation: str,
context: dict[str, str | int | float | None],
) -> Callable[P, R]:
@functools.wraps(func)
def sync_wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
with log_timing(operation, **context):
return func(*args, **kwargs)
return sync_wrapper
@contextmanager
@@ -94,28 +121,10 @@ def timed(
func: Callable[P, R],
) -> Callable[P, R]:
if asyncio.iscoroutinefunction(func):
@functools.wraps(func)
async def async_wrapper(
*args: P.args, **kwargs: P.kwargs
) -> R:
with log_timing(operation, **context):
# Cast required: iscoroutinefunction narrows but type checker
# cannot propagate this to the return type of func()
coro = cast(Coroutine[object, object, R], func(*args, **kwargs))
return await coro
# Cast required: async wrapper must be returned as Callable[P, R]
# but wraps() preserves async signature which doesn't match R directly
return cast(Callable[P, R], async_wrapper)
else:
@functools.wraps(func)
def sync_wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
with log_timing(operation, **context):
return func(*args, **kwargs)
return sync_wrapper
# Cast required: asyncio.iscoroutinefunction provides runtime narrowing
# but type checker cannot propagate this to the generic R type
async_func = cast(Callable[P, Coroutine[object, object, R]], func)
return cast(Callable[P, R], _wrap_async(async_func, operation, context))
return _wrap_sync(func, operation, context)
return decorator

View File

@@ -9,11 +9,12 @@ import asyncio
import logging
from collections import deque
from threading import Lock
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, cast
from noteflow.application.observability.ports import (
NullUsageEventSink,
UsageEvent,
UsageEventContext,
UsageEventSink,
UsageMetrics,
)
@@ -31,6 +32,19 @@ if TYPE_CHECKING:
logger = get_logger(__name__)
def _extract_event_context(
context: UsageEventContext | None, attributes: dict[str, object]
) -> tuple[UsageEventContext, dict[str, object]]:
"""Extract context fields from attributes when not provided explicitly."""
if context is not None:
return context, attributes
meeting_id = cast(str | None, attributes.pop("meeting_id", None))
success = cast(bool, attributes.pop("success", True))
error_code = cast(str | None, attributes.pop("error_code", None))
return UsageEventContext(meeting_id=meeting_id, success=success, error_code=error_code), attributes
class LoggingUsageEventSink:
"""Usage event sink that logs events.
@@ -66,20 +80,27 @@ class LoggingUsageEventSink:
event_type: str,
metrics: UsageMetrics | None = None,
*,
meeting_id: str | None = None,
success: bool = True,
error_code: str | None = None,
context: UsageEventContext | None = None,
**attributes: object,
) -> None:
"""Log a simple usage event."""
m = metrics or UsageMetrics()
self.record(UsageEvent(
event_type=event_type, meeting_id=meeting_id,
provider_name=m.provider_name, model_name=m.model_name,
tokens_input=m.tokens_input, tokens_output=m.tokens_output,
latency_ms=m.latency_ms, success=success,
error_code=error_code, attributes=dict(attributes),
))
attrs = dict(attributes)
resolved_context, attrs = _extract_event_context(context, attrs)
self.record(
UsageEvent(
event_type=event_type,
meeting_id=resolved_context.meeting_id,
provider_name=m.provider_name,
model_name=m.model_name,
tokens_input=m.tokens_input,
tokens_output=m.tokens_output,
latency_ms=m.latency_ms,
success=resolved_context.success,
error_code=resolved_context.error_code,
attributes=attrs,
)
)
def _build_event_attributes(event: UsageEvent) -> dict[str, str | int | float | bool]:
@@ -178,20 +199,27 @@ class OtelUsageEventSink:
event_type: str,
metrics: UsageMetrics | None = None,
*,
meeting_id: str | None = None,
success: bool = True,
error_code: str | None = None,
context: UsageEventContext | None = None,
**attributes: object,
) -> None:
"""Record a simple usage event to current span."""
m = metrics or UsageMetrics()
self.record(UsageEvent(
event_type=event_type, meeting_id=meeting_id,
provider_name=m.provider_name, model_name=m.model_name,
tokens_input=m.tokens_input, tokens_output=m.tokens_output,
latency_ms=m.latency_ms, success=success,
error_code=error_code, attributes=dict(attributes),
))
attrs = dict(attributes)
resolved_context, attrs = _extract_event_context(context, attrs)
self.record(
UsageEvent(
event_type=event_type,
meeting_id=resolved_context.meeting_id,
provider_name=m.provider_name,
model_name=m.model_name,
tokens_input=m.tokens_input,
tokens_output=m.tokens_output,
latency_ms=m.latency_ms,
success=resolved_context.success,
error_code=resolved_context.error_code,
attributes=attrs,
)
)
class BufferedDatabaseUsageEventSink:
@@ -248,20 +276,27 @@ class BufferedDatabaseUsageEventSink:
event_type: str,
metrics: UsageMetrics | None = None,
*,
meeting_id: str | None = None,
success: bool = True,
error_code: str | None = None,
context: UsageEventContext | None = None,
**attributes: object,
) -> None:
"""Buffer a simple usage event."""
m = metrics or UsageMetrics()
self.record(UsageEvent(
event_type=event_type, meeting_id=meeting_id,
provider_name=m.provider_name, model_name=m.model_name,
tokens_input=m.tokens_input, tokens_output=m.tokens_output,
latency_ms=m.latency_ms, success=success,
error_code=error_code, attributes=dict(attributes),
))
attrs = dict(attributes)
resolved_context, attrs = _extract_event_context(context, attrs)
self.record(
UsageEvent(
event_type=event_type,
meeting_id=resolved_context.meeting_id,
provider_name=m.provider_name,
model_name=m.model_name,
tokens_input=m.tokens_input,
tokens_output=m.tokens_output,
latency_ms=m.latency_ms,
success=resolved_context.success,
error_code=resolved_context.error_code,
attributes=attrs,
)
)
def _schedule_flush(self) -> None:
"""Schedule an async flush on the event loop."""

View File

@@ -348,34 +348,20 @@ async def _handle_tables_without_alembic(
) -> None:
"""Handle case where tables exist but Alembic version doesn't."""
critical_tables = ["meetings", "segments", "diarization_jobs", "user_preferences"]
missing_tables: list[str] = []
async with session_factory() as session:
for table_name in critical_tables:
if not await _table_exists(session, table_name):
missing_tables.append(table_name)
missing_tables = await _find_missing_tables(session_factory, critical_tables)
if missing_tables:
logger.warning(
"Tables exist but missing critical tables: %s. Creating missing tables...",
", ".join(missing_tables),
)
if "user_preferences" in missing_tables:
async with session_factory() as session:
if await _create_user_preferences_table(session):
logger.info("Successfully created user_preferences table")
await _create_user_preferences_if_missing(session_factory, missing_tables)
logger.info("Stamping database after creating missing tables...")
await _stamp_database_async(database_url)
logger.info("Database schema ready (created missing tables and stamped)")
return
# Safety check for user_preferences
async with session_factory() as session:
if not await _table_exists(session, "user_preferences"):
logger.warning("user_preferences table missing despite check, creating it...")
await _create_user_preferences_table(session)
logger.info("Created user_preferences table (safety check)")
await _ensure_user_preferences_table(session_factory)
logger.info(
"Tables exist (%d) but Alembic version table missing, stamping database...",
@@ -385,6 +371,43 @@ async def _handle_tables_without_alembic(
logger.info("Database schema ready (stamped from schema.sql)")
async def _find_missing_tables(
session_factory: async_sessionmaker[AsyncSession],
critical_tables: list[str],
) -> list[str]:
"""Return list of critical tables missing from the database."""
missing_tables: list[str] = []
async with session_factory() as session:
for table_name in critical_tables:
if not await _table_exists(session, table_name):
missing_tables.append(table_name)
return missing_tables
async def _create_user_preferences_if_missing(
session_factory: async_sessionmaker[AsyncSession],
missing_tables: list[str],
) -> None:
"""Create user_preferences table when listed as missing."""
if "user_preferences" not in missing_tables:
return
async with session_factory() as session:
if await _create_user_preferences_table(session):
logger.info("Successfully created user_preferences table")
async def _ensure_user_preferences_table(
session_factory: async_sessionmaker[AsyncSession],
) -> None:
"""Create user_preferences table if still missing."""
async with session_factory() as session:
if await _table_exists(session, "user_preferences"):
return
logger.warning("user_preferences table missing despite check, creating it...")
await _create_user_preferences_table(session)
logger.info("Created user_preferences table (safety check)")
async def _handle_alembic_with_tables(
session_factory: async_sessionmaker[AsyncSession],
database_url: str,

View File

@@ -8,11 +8,11 @@ from __future__ import annotations
from collections.abc import Sequence
from datetime import datetime
from typing import TYPE_CHECKING
from uuid import UUID
from typing import TYPE_CHECKING, Unpack
from noteflow.domain.entities import Meeting, Segment, Summary
from noteflow.domain.value_objects import MeetingId, MeetingState
from noteflow.domain.ports.repositories.transcript import MeetingListKwargs
if TYPE_CHECKING:
from noteflow.grpc.meeting_store import MeetingStore
@@ -43,15 +43,22 @@ class MemoryMeetingRepository:
async def list_all(
self,
states: list[MeetingState] | None = None,
limit: int = 100,
offset: int = 0,
sort_desc: bool = True,
project_id: UUID | None = None,
**kwargs: Unpack[MeetingListKwargs],
) -> tuple[Sequence[Meeting], int]:
"""List meetings via in-memory store with optional state filtering."""
states = kwargs.get("states")
limit = kwargs.get("limit", 100)
offset = kwargs.get("offset", 0)
sort_desc = kwargs.get("sort_desc", True)
project_id = kwargs.get("project_id")
project_filter = str(project_id) if project_id else None
return self._store.list_all(states, limit, offset, sort_desc, project_filter)
return self._store.list_all(
states=states,
limit=limit,
offset=offset,
sort_desc=sort_desc,
project_id=project_filter,
)
async def count_by_state(self, state: MeetingState) -> int:
"""Count meetings in a specific state."""

View File

@@ -7,7 +7,7 @@ operations requiring database persistence.
from __future__ import annotations
from collections.abc import Sequence
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Unpack
from uuid import UUID
_ERR_USERS_DB = "Users require database persistence"
@@ -19,8 +19,8 @@ if TYPE_CHECKING:
Workspace,
WorkspaceMembership,
WorkspaceRole,
WorkspaceSettings,
)
from noteflow.domain.ports.repositories.identity._workspace import WorkspaceCreateKwargs
class UnsupportedUserRepository:
@@ -86,9 +86,7 @@ class UnsupportedWorkspaceRepository:
workspace_id: UUID,
name: str,
owner_id: UUID,
slug: str | None = None,
is_default: bool = False,
settings: WorkspaceSettings | None = None,
**kwargs: Unpack[WorkspaceCreateKwargs],
) -> Workspace:
"""Not supported in memory mode."""
raise NotImplementedError(_ERR_WORKSPACES_DB)

View File

@@ -7,13 +7,14 @@ operations requiring database persistence.
from __future__ import annotations
from collections.abc import Sequence
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Unpack
from uuid import UUID
_ERR_PROJECTS_DB = "Projects require database persistence"
if TYPE_CHECKING:
from noteflow.domain.entities.project import Project, ProjectSettings
from noteflow.domain.entities.project import Project
from noteflow.domain.ports.repositories.identity._project import ProjectCreateKwargs
from noteflow.domain.identity import ProjectMembership, ProjectRole
@@ -47,10 +48,7 @@ class UnsupportedProjectRepository:
project_id: UUID,
workspace_id: UUID,
name: str,
slug: str | None = None,
description: str | None = None,
is_default: bool = False,
settings: ProjectSettings | None = None,
**kwargs: Unpack[ProjectCreateKwargs],
) -> Project:
"""Not supported in memory mode."""
raise NotImplementedError(_ERR_PROJECTS_DB)

View File

@@ -7,11 +7,11 @@ operations requiring database persistence.
from __future__ import annotations
from collections.abc import Sequence
from datetime import datetime
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Unpack
from uuid import UUID
from noteflow.config.constants import ERR_SERVER_RESTARTED
from noteflow.domain.ports.repositories.background import DiarizationStatusKwargs
_ERR_ANNOTATIONS_DB = "Annotations require database persistence"
_ERR_DIARIZATION_DB = "Diarization jobs require database persistence"
@@ -87,11 +87,7 @@ class UnsupportedDiarizationJobRepository:
self,
job_id: str,
status: int,
*,
segments_updated: int | None = None,
speaker_ids: list[str] | None = None,
error_message: str | None = None,
started_at: datetime | None = None,
**kwargs: Unpack[DiarizationStatusKwargs],
) -> bool:
"""Not supported in memory mode."""
raise NotImplementedError(_ERR_DIARIZATION_DB)

View File

@@ -8,6 +8,7 @@ Mixins require class attributes to be defined by the implementing class:
from __future__ import annotations
import time
from collections.abc import Sequence
from typing import TYPE_CHECKING, TypeVar, cast
from uuid import UUID
@@ -16,10 +17,14 @@ from sqlalchemy import select
from sqlalchemy.engine import CursorResult
from sqlalchemy.ext.asyncio import AsyncSession
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.sql import Delete, Select, Update
logger = get_logger(__name__)
TModel = TypeVar("TModel", bound="DeclarativeBase")
TExists = TypeVar("TExists")
TEntity = TypeVar("TEntity")
@@ -72,8 +77,12 @@ class BaseRepository:
Returns:
Single model instance or None if not found.
"""
start = time.perf_counter()
result = await self._session.execute(stmt)
return result.scalar_one_or_none()
row = result.scalar_one_or_none()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.debug("db_execute_scalar", duration_ms=round(elapsed_ms, 2), found=row is not None)
return row
async def _execute_scalars(
self,
@@ -87,8 +96,12 @@ class BaseRepository:
Returns:
List of model instances.
"""
start = time.perf_counter()
result = await self._session.execute(stmt)
return list(result.scalars().all())
rows = list(result.scalars().all())
elapsed_ms = (time.perf_counter() - start) * 1000
logger.debug("db_execute_scalars", duration_ms=round(elapsed_ms, 2), count=len(rows))
return rows
async def _add_and_flush(self, model: TModel) -> TModel:
"""Add model to session and flush.
@@ -99,8 +112,11 @@ class BaseRepository:
Returns:
The persisted model with generated fields populated.
"""
start = time.perf_counter()
self._session.add(model)
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_add_and_flush", duration_ms=round(elapsed_ms, 2))
return model
async def _delete_and_flush(self, model: object) -> None:
@@ -109,8 +125,11 @@ class BaseRepository:
Args:
model: ORM model instance to delete.
"""
start = time.perf_counter()
await self._session.delete(model)
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_delete_and_flush", duration_ms=round(elapsed_ms, 2))
async def _add_all_and_flush(self, models: list[TModel]) -> list[TModel]:
"""Add multiple models to session and flush once.
@@ -123,8 +142,11 @@ class BaseRepository:
Returns:
The persisted models with generated fields populated.
"""
start = time.perf_counter()
self._session.add_all(models)
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_add_all_and_flush", duration_ms=round(elapsed_ms, 2), count=len(models))
return models
async def _execute_count(self, stmt: Select[tuple[int]]) -> int:
@@ -136,8 +158,12 @@ class BaseRepository:
Returns:
Integer count value.
"""
start = time.perf_counter()
result = await self._session.execute(stmt)
return result.scalar_one()
count = result.scalar_one()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.debug("db_execute_count", duration_ms=round(elapsed_ms, 2), count=count)
return count
async def _execute_exists(self, stmt: Select[tuple[TExists]]) -> bool:
"""Check if any rows match the query.
@@ -150,8 +176,12 @@ class BaseRepository:
Returns:
True if at least one row exists.
"""
start = time.perf_counter()
result = await self._session.execute(stmt.limit(1))
return result.scalar() is not None
exists = result.scalar() is not None
elapsed_ms = (time.perf_counter() - start) * 1000
logger.debug("db_execute_exists", duration_ms=round(elapsed_ms, 2), exists=exists)
return exists
async def _update_fields(
self,
@@ -167,9 +197,12 @@ class BaseRepository:
Returns:
The updated model.
"""
start = time.perf_counter()
for key, value in fields.items():
setattr(model, key, value)
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_update_fields", duration_ms=round(elapsed_ms, 2), field_count=len(fields))
return model
async def _execute_update(self, stmt: Update) -> int:
@@ -181,8 +214,11 @@ class BaseRepository:
Returns:
Number of rows affected.
"""
start = time.perf_counter()
result = cast(CursorResult[tuple[()]], await self._session.execute(stmt))
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_execute_update", duration_ms=round(elapsed_ms, 2), rows_affected=result.rowcount)
return result.rowcount
async def _execute_delete(self, stmt: Delete) -> int:
@@ -194,8 +230,11 @@ class BaseRepository:
Returns:
Number of rows deleted.
"""
start = time.perf_counter()
result = cast(CursorResult[tuple[()]], await self._session.execute(stmt))
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_execute_delete", duration_ms=round(elapsed_ms, 2), rows_deleted=result.rowcount)
return result.rowcount

View File

@@ -11,12 +11,15 @@ from sqlalchemy import and_, delete, or_, select
from noteflow.domain.entities import Annotation
from noteflow.domain.value_objects import AnnotationId
from noteflow.infrastructure.converters import OrmConverter
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import AnnotationModel
from noteflow.infrastructure.persistence.repositories._base import BaseRepository
if TYPE_CHECKING:
from noteflow.domain.value_objects import MeetingId
logger = get_logger(__name__)
class SqlAlchemyAnnotationRepository(BaseRepository):
"""SQLAlchemy implementation of AnnotationRepository."""
@@ -45,6 +48,12 @@ class SqlAlchemyAnnotationRepository(BaseRepository):
self._session.add(model)
await self._session.flush()
annotation.db_id = model.id
logger.info(
"annotation_added",
annotation_id=str(annotation.id),
meeting_id=str(annotation.meeting_id),
annotation_type=annotation.annotation_type.value,
)
return annotation
async def get(self, annotation_id: AnnotationId) -> Annotation | None:
@@ -158,6 +167,11 @@ class SqlAlchemyAnnotationRepository(BaseRepository):
model.segment_ids = annotation.segment_ids
await self._session.flush()
logger.info(
"annotation_updated",
annotation_id=str(annotation.id),
annotation_type=annotation.annotation_type.value,
)
return annotation
async def delete(self, annotation_id: AnnotationId) -> bool:
@@ -179,4 +193,5 @@ class SqlAlchemyAnnotationRepository(BaseRepository):
await self._session.execute(delete(AnnotationModel).where(AnnotationModel.id == model.id))
await self._session.flush()
logger.info("annotation_deleted", annotation_id=str(annotation_id))
return True

View File

@@ -48,4 +48,10 @@ class FileSystemAssetRepository(AssetRepository):
if meeting_dir.exists():
shutil.rmtree(meeting_dir)
logger.info("Deleted meeting assets at %s", meeting_dir)
logger.info("assets_deleted", meeting_id=str(meeting_id), path=str(meeting_dir))
else:
logger.debug(
"assets_delete_skipped_not_found",
meeting_id=str(meeting_id),
path=str(meeting_dir),
)

View File

@@ -3,20 +3,24 @@
from collections.abc import Sequence
from dataclasses import dataclass, field
from datetime import UTC, datetime
from typing import Final
from typing import Final, Unpack
from uuid import UUID
from sqlalchemy import delete, select, update
from sqlalchemy.exc import IntegrityError
from noteflow.config.constants import ERR_SERVER_RESTARTED
from noteflow.domain.ports.repositories.background import DiarizationStatusKwargs
from noteflow.domain.utils.time import utc_now
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import (
DiarizationJobModel,
StreamingDiarizationTurnModel,
)
from noteflow.infrastructure.persistence.repositories._base import BaseRepository
logger = get_logger(__name__)
# Job status constants (mirrors proto enum)
JOB_STATUS_UNSPECIFIED: Final[int] = 0
JOB_STATUS_QUEUED: Final[int] = 1
@@ -103,6 +107,12 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
except IntegrityError as exc:
msg = f"DiarizationJob {job.job_id} already exists"
raise ValueError(msg) from exc
logger.info(
"diarization_job_created",
job_id=job.job_id,
meeting_id=job.meeting_id,
status=job.status,
)
return job
async def get(self, job_id: str) -> DiarizationJob | None:
@@ -123,21 +133,14 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
self,
job_id: str,
status: int,
*,
segments_updated: int | None = None,
speaker_ids: list[str] | None = None,
error_message: str | None = None,
started_at: datetime | None = None,
**kwargs: Unpack[DiarizationStatusKwargs],
) -> bool:
"""Update job status and optional fields.
Args:
job_id: Job identifier.
status: New status value.
segments_updated: Optional segments count.
speaker_ids: Optional speaker IDs list.
error_message: Optional error message.
started_at: Optional job start timestamp.
**kwargs: Optional update fields.
Returns:
True if job was updated, False if not found.
@@ -146,6 +149,11 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
"status": status,
"updated_at": utc_now(),
}
segments_updated = kwargs.get("segments_updated")
speaker_ids = kwargs.get("speaker_ids")
error_message = kwargs.get("error_message")
started_at = kwargs.get("started_at")
if segments_updated is not None:
values["segments_updated"] = segments_updated
if speaker_ids is not None:
@@ -157,6 +165,8 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
stmt = update(DiarizationJobModel).where(DiarizationJobModel.id == job_id).values(**values)
rowcount = await self._execute_update(stmt)
if rowcount > 0:
logger.info("diarization_job_status_updated", job_id=job_id, status=status)
return rowcount > 0
async def list_for_meeting(self, meeting_id: str) -> Sequence[DiarizationJob]:
@@ -235,7 +245,10 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
updated_at=utc_now(),
)
)
return await self._execute_update(stmt)
count = await self._execute_update(stmt)
if count > 0:
logger.info("diarization_jobs_marked_failed", count=count)
return count
async def prune_completed(self, ttl_seconds: float) -> int:
"""Delete completed/failed jobs older than TTL.
@@ -255,7 +268,10 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
),
DiarizationJobModel.updated_at < cutoff_dt,
)
return await self._execute_delete(stmt)
count = await self._execute_delete(stmt)
if count > 0:
logger.info("diarization_jobs_pruned", count=count, ttl_seconds=ttl_seconds)
return count
# Streaming diarization turn methods
@@ -286,6 +302,7 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
self._session.add(model)
await self._session.flush()
logger.debug("streaming_turns_added", meeting_id=meeting_id, count=len(turns))
return len(turns)
async def get_streaming_turns(self, meeting_id: str) -> list[StreamingTurn]:
@@ -329,4 +346,7 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
stmt = delete(StreamingDiarizationTurnModel).where(
StreamingDiarizationTurnModel.meeting_id == UUID(meeting_id)
)
return await self._execute_delete(stmt)
count = await self._execute_delete(stmt)
if count > 0:
logger.info("streaming_turns_cleared", meeting_id=meeting_id, count=count)
return count

View File

@@ -10,6 +10,7 @@ from sqlalchemy import delete, select
from noteflow.domain.entities.named_entity import EntityCategory, NamedEntity
from noteflow.infrastructure.converters.ner_converters import NerConverter
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import NamedEntityModel
from noteflow.infrastructure.persistence.repositories._base import (
BaseRepository,
@@ -20,6 +21,8 @@ from noteflow.infrastructure.persistence.repositories._base import (
if TYPE_CHECKING:
from noteflow.domain.value_objects import MeetingId
logger = get_logger(__name__)
class SqlAlchemyEntityRepository(
BaseRepository,
@@ -53,6 +56,12 @@ class SqlAlchemyEntityRepository(
merged = await self._session.merge(model)
await self._session.flush()
entity.db_id = merged.id
logger.info(
"entity_saved",
entity_id=str(entity.id),
meeting_id=str(entity.meeting_id),
category=entity.category.value,
)
return entity
async def save_batch(self, entities: Sequence[NamedEntity]) -> Sequence[NamedEntity]:
@@ -74,6 +83,12 @@ class SqlAlchemyEntityRepository(
entity.db_id = merged.id
await self._session.flush()
if entities:
logger.info(
"entities_batch_saved",
meeting_id=str(entities[0].meeting_id),
count=len(entities),
)
return entities
async def get(self, entity_id: UUID) -> NamedEntity | None:
@@ -116,7 +131,10 @@ class SqlAlchemyEntityRepository(
stmt = delete(NamedEntityModel).where(
NamedEntityModel.meeting_id == UUID(str(meeting_id))
)
return await self._execute_delete(stmt)
count = await self._execute_delete(stmt)
if count > 0:
logger.info("entities_deleted_by_meeting", meeting_id=str(meeting_id), count=count)
return count
async def update_pinned(self, entity_id: UUID, is_pinned: bool) -> bool:
"""Update the pinned status of an entity.
@@ -136,6 +154,7 @@ class SqlAlchemyEntityRepository(
model.is_pinned = is_pinned
await self._session.flush()
logger.info("entity_pinned_updated", entity_id=str(entity_id), is_pinned=is_pinned)
return True
async def exists_for_meeting(self, meeting_id: MeetingId) -> bool:
@@ -184,6 +203,7 @@ class SqlAlchemyEntityRepository(
model.category = EntityCategory.from_string(category).value
await self._session.flush()
logger.info("entity_updated", entity_id=str(entity_id))
return NerConverter.orm_to_domain(model)
async def delete(self, entity_id: UUID) -> bool:

View File

@@ -3,7 +3,7 @@
from __future__ import annotations
from collections.abc import Sequence
from typing import cast
from typing import Unpack, cast
from uuid import UUID
from sqlalchemy import and_, func, select
@@ -19,11 +19,10 @@ from noteflow.config.constants import (
RULE_FIELD_TEMPLATE_ID,
RULE_FIELD_TRIGGER_RULES,
)
from noteflow.domain.entities.project import (
ExportRules,
Project,
ProjectSettings,
TriggerRules,
from noteflow.domain.entities.project import ExportRules, Project, ProjectSettings, TriggerRules
from noteflow.domain.ports.repositories.identity._project import (
ProjectCreateKwargs,
ProjectCreateOptions,
)
from noteflow.domain.value_objects import ExportFormat
from noteflow.infrastructure.persistence.models import ProjectModel
@@ -256,10 +255,7 @@ class SqlAlchemyProjectRepository(
project_id: UUID,
workspace_id: UUID,
name: str,
slug: str | None = None,
description: str | None = None,
is_default: bool = False,
settings: ProjectSettings | None = None,
**kwargs: Unpack[ProjectCreateKwargs],
) -> Project:
"""Create a new project.
@@ -267,22 +263,20 @@ class SqlAlchemyProjectRepository(
project_id: UUID for the new project.
workspace_id: Parent workspace UUID.
name: Project name.
slug: Optional URL slug.
description: Optional description.
is_default: Whether this is the workspace's default project.
settings: Optional project settings.
**kwargs: Optional creation settings.
Returns:
Created project.
"""
settings_dict = self._settings_to_dict(settings) if settings else {}
merged = _merge_project_create_options(kwargs)
settings_dict = self._settings_to_dict(merged.settings) if merged.settings else {}
model = ProjectModel(
id=project_id,
workspace_id=workspace_id,
name=name,
slug=slug,
description=description,
is_default=is_default,
slug=merged.slug,
description=merged.description,
is_default=merged.is_default,
settings=settings_dict,
metadata_={},
)
@@ -427,3 +421,11 @@ class SqlAlchemyProjectRepository(
stmt = select(func.count()).select_from(ProjectModel).where(and_(*conditions))
result = await self._session.execute(stmt)
return result.scalar() or 0
def _merge_project_create_options(kwargs: ProjectCreateKwargs) -> ProjectCreateOptions:
"""Normalize project creation options from keyword args."""
return ProjectCreateOptions(
slug=kwargs.get("slug"),
description=kwargs.get("description"),
is_default=kwargs.get("is_default", False),
settings=kwargs.get("settings"),
)

View File

@@ -3,7 +3,7 @@
from __future__ import annotations
from collections.abc import Sequence
from typing import cast
from typing import Unpack, cast
from uuid import UUID
from sqlalchemy import and_, select
@@ -26,6 +26,7 @@ from noteflow.domain.identity import (
WorkspaceRole,
WorkspaceSettings,
)
from noteflow.domain.ports.repositories.identity._workspace import WorkspaceCreateKwargs
from noteflow.domain.value_objects import ExportFormat
from noteflow.infrastructure.persistence.models import (
DEFAULT_WORKSPACE_ID,
@@ -247,9 +248,7 @@ class SqlAlchemyWorkspaceRepository(
workspace_id: UUID,
name: str,
owner_id: UUID,
slug: str | None = None,
is_default: bool = False,
settings: WorkspaceSettings | None = None,
**kwargs: Unpack[WorkspaceCreateKwargs],
) -> Workspace:
"""Create a new workspace with owner membership.
@@ -257,13 +256,14 @@ class SqlAlchemyWorkspaceRepository(
workspace_id: UUID for the new workspace.
name: Workspace name.
owner_id: User UUID of the owner.
slug: Optional URL slug.
is_default: Whether this is the user's default workspace.
settings: Optional workspace settings.
**kwargs: Optional fields (slug, is_default, settings).
Returns:
Created workspace.
"""
slug = kwargs.get("slug")
is_default = kwargs.get("is_default", False)
settings = kwargs.get("settings")
model = WorkspaceModel(
id=workspace_id,
name=name,

View File

@@ -12,6 +12,7 @@ from noteflow.infrastructure.converters.integration_converters import (
IntegrationConverter,
SyncRunConverter,
)
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models.integrations import (
IntegrationModel,
IntegrationSecretModel,
@@ -23,6 +24,8 @@ from noteflow.infrastructure.persistence.repositories._base import (
GetByIdMixin,
)
logger = get_logger(__name__)
class SqlAlchemyIntegrationRepository(
BaseRepository,
@@ -98,6 +101,12 @@ class SqlAlchemyIntegrationRepository(
kwargs = IntegrationConverter.to_orm_kwargs(integration)
model = IntegrationModel(**kwargs)
await self._add_and_flush(model)
logger.info(
"integration_created",
integration_id=str(model.id),
integration_type=model.type,
name=model.name,
)
return IntegrationConverter.orm_to_domain(model)
async def update(self, integration: Integration) -> Integration:
@@ -128,6 +137,11 @@ class SqlAlchemyIntegrationRepository(
model.updated_at = integration.updated_at
await self._session.flush()
logger.info(
"integration_updated",
integration_id=str(integration.id),
status=integration.status.value,
)
return IntegrationConverter.orm_to_domain(model)
async def delete(self, integration_id: UUID) -> bool:
@@ -195,6 +209,11 @@ class SqlAlchemyIntegrationRepository(
for key, value in secrets.items()
]
await self._add_all_and_flush(models)
logger.info(
"integration_secrets_updated",
integration_id=str(integration_id),
secret_count=len(secrets),
)
async def list_by_type(self, integration_type: str) -> Sequence[Integration]:
"""List integrations by type.
@@ -237,6 +256,11 @@ class SqlAlchemyIntegrationRepository(
kwargs = SyncRunConverter.to_orm_kwargs(sync_run)
model = IntegrationSyncRunModel(**kwargs)
await self._add_and_flush(model)
logger.info(
"sync_run_created",
sync_run_id=str(model.id),
integration_id=str(sync_run.integration_id),
)
return SyncRunConverter.orm_to_domain(model)
async def get_sync_run(self, sync_run_id: UUID) -> SyncRun | None:
@@ -282,6 +306,12 @@ class SqlAlchemyIntegrationRepository(
model.stats = sync_run.stats
await self._session.flush()
logger.info(
"sync_run_updated",
sync_run_id=str(sync_run.id),
status=sync_run.status.value,
duration_ms=sync_run.duration_ms,
)
return SyncRunConverter.orm_to_domain(model)
async def list_sync_runs(

View File

@@ -2,17 +2,22 @@
from collections.abc import Sequence
from datetime import datetime
from typing import Unpack
from uuid import UUID
from sqlalchemy import func, select
from noteflow.config.constants import ERROR_MSG_MEETING_PREFIX
from noteflow.domain.entities import Meeting
from noteflow.domain.ports.repositories.transcript import MeetingListKwargs
from noteflow.domain.value_objects import MeetingId, MeetingState
from noteflow.infrastructure.converters import OrmConverter
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import MeetingModel
from noteflow.infrastructure.persistence.repositories._base import BaseRepository
logger = get_logger(__name__)
class SqlAlchemyMeetingRepository(BaseRepository):
"""SQLAlchemy implementation of MeetingRepository."""
@@ -41,6 +46,7 @@ class SqlAlchemyMeetingRepository(BaseRepository):
)
self._session.add(model)
await self._session.flush()
logger.info("meeting_created", meeting_id=str(meeting.id))
return meeting
async def get(self, meeting_id: MeetingId) -> Meeting | None:
@@ -91,6 +97,7 @@ class SqlAlchemyMeetingRepository(BaseRepository):
meeting.version = model.version
await self._session.flush()
logger.info("meeting_updated", meeting_id=str(meeting.id), version=meeting.version)
return meeting
async def delete(self, meeting_id: MeetingId) -> bool:
@@ -106,30 +113,31 @@ class SqlAlchemyMeetingRepository(BaseRepository):
model = await self._execute_scalar(stmt)
if model is None:
logger.debug("meeting_delete_not_found", meeting_id=str(meeting_id))
return False
await self._delete_and_flush(model)
logger.info("meeting_deleted", meeting_id=str(meeting_id))
return True
async def list_all(
self,
states: list[MeetingState] | None = None,
limit: int = 100,
offset: int = 0,
sort_desc: bool = True,
project_id: UUID | None = None,
**kwargs: Unpack[MeetingListKwargs],
) -> tuple[Sequence[Meeting], int]:
"""List meetings with optional filtering.
Args:
states: Optional list of states to filter by.
limit: Maximum number of meetings to return.
offset: Number of meetings to skip.
sort_desc: Sort by created_at descending if True.
**kwargs: Optional filters (states, limit, offset, sort_desc, project_id).
Returns:
Tuple of (meetings list, total count matching filter).
"""
states = kwargs.get("states")
limit = kwargs.get("limit", 100)
offset = kwargs.get("offset", 0)
sort_desc = kwargs.get("sort_desc", True)
project_id = kwargs.get("project_id")
# Build base query
stmt = select(MeetingModel)

View File

@@ -8,9 +8,12 @@ from datetime import datetime
from sqlalchemy import func, select
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import UserPreferencesModel
from noteflow.infrastructure.persistence.repositories._base import BaseRepository
logger = get_logger(__name__)
@dataclass(frozen=True)
class PreferenceWithMetadata:
@@ -77,10 +80,12 @@ class SqlAlchemyPreferencesRepository(BaseRepository):
if model is None:
model = UserPreferencesModel(key=key, value={"value": value})
self._session.add(model)
await self._session.flush()
logger.info("preference_created", key=key)
else:
model.value = {"value": value}
await self._session.flush()
await self._session.flush()
logger.info("preference_updated", key=key)
async def delete(self, key: str) -> bool:
"""Delete a preference.
@@ -97,6 +102,7 @@ class SqlAlchemyPreferencesRepository(BaseRepository):
return False
await self._delete_and_flush(model)
logger.info("preference_deleted", key=key)
return True
async def get_all(self, keys: list[str] | None = None) -> dict[str, object]:

View File

@@ -8,9 +8,12 @@ from sqlalchemy import func, select, update
from noteflow.domain.entities import Segment
from noteflow.domain.value_objects import MeetingId
from noteflow.infrastructure.converters import OrmConverter
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import SegmentModel, WordTimingModel
from noteflow.infrastructure.persistence.repositories._base import BaseRepository
logger = get_logger(__name__)
class SqlAlchemySegmentRepository(BaseRepository):
"""SQLAlchemy implementation of SegmentRepository."""
@@ -66,6 +69,7 @@ class SqlAlchemySegmentRepository(BaseRepository):
# Update segment with db_id
segment.db_id = model.id
segment.meeting_id = meeting_id
logger.info("segment_added", meeting_id=str(meeting_id), segment_id=segment.segment_id)
return segment
async def add_batch(
@@ -102,6 +106,7 @@ class SqlAlchemySegmentRepository(BaseRepository):
segment.db_id = model.id
segment.meeting_id = meeting_id
logger.info("segments_batch_added", meeting_id=str(meeting_id), count=len(segments))
return list(segments)
async def get_by_meeting(

Some files were not shown because too many files have changed in this diff Show More