refactor: update linting and logging configurations
- Adjusted linting settings in `basedpyright.lint.json` to improve performance metrics.
- Updated `biome.json` to reflect changes in the number of unchanged files.
- Removed outdated sprint documentation files and added new plans for logging centralization and quality suite hardening.
- Enhanced logging functionality across various services to improve observability and error tracking.
- Introduced new test cases for gRPC interceptors and observability mixins to ensure robust error handling and logging.
File diff suppressed because it is too large
@@ -1 +1 @@
-{"summary":{"changed":0,"unchanged":300,"matches":0,"duration":{"secs":0,"nanos":63487663},"scannerDuration":{"secs":0,"nanos":2321824},"errors":0,"warnings":0,"infos":0,"skipped":0,"suggestedFixesSkipped":0,"diagnosticsNotPrinted":0},"diagnostics":[],"command":"lint"}
+{"summary":{"changed":0,"unchanged":301,"matches":0,"duration":{"secs":0,"nanos":80436438},"scannerDuration":{"secs":0,"nanos":2575466},"errors":0,"warnings":0,"infos":0,"skipped":0,"suggestedFixesSkipped":0,"diagnosticsNotPrinted":0},"diagnostics":[],"command":"lint"}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,90 @@
# Sprint GAP-009: Event Bridge Initialization and Contract Guarantees

> **Size**: S | **Owner**: Frontend (TypeScript) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Event Wiring
> **Status**: ✅ COMPLETED (2026-01-03)

---

## Summary

Ensured the Tauri event bridge is always initialized in desktop mode before connection attempts, and added contract validation tests to keep event names synchronized across Rust and TypeScript.

### Key Changes

| Component | Before | After |
|-----------|--------|-------|
| Event bridge timing | After successful connect | Before connect (captures early events) |
| AUDIO_WARNING event | Missing in TypeScript | Synchronized across TS and Rust |
| Contract validation | None | Test validates event name parity |
| Documentation | Minimal | Rust `event_names` module documents sync process |

---

## Resolved Issues

| ID | Issue | Resolution |
|----|-------|------------|
| **B1** | Should event bridge start before connection? | Yes - moved to before connect in `initializeAPI()` |
| **G1** | Event bridge starts only after successful connect | Fixed - now starts before connection attempt |
| **G2** | Event name sources split across TS + Rust | Contract test enforces synchronization |

---

## Implementation Details

### Client (TypeScript)

| File | Change |
|------|--------|
| `client/src/api/index.ts` | Moved `startTauriEventBridge()` before `connect()` call |
| `client/src/api/tauri-constants.ts` | Added `AUDIO_WARNING` event, added sync documentation |
| `client/src/lib/tauri-events.ts` | Added `AudioWarningEvent` interface, subscribe to `AUDIO_WARNING` |
| `client/src/api/tauri-constants.test.ts` | New contract test validating event name parity |

### Client (Rust)

| File | Change |
|------|--------|
| `client/src-tauri/src/events/mod.rs` | Added comprehensive documentation for `event_names` module |

---

## Deliverables

- [x] `client/src/api/index.ts` — initialize event bridge before connect
- [x] `client/src/lib/tauri-events.ts` — added AUDIO_WARNING event support
- [x] `client/src/api/tauri-constants.ts` — added AUDIO_WARNING, sync documentation
- [x] `client/src/api/tauri-constants.test.ts` — new contract test (4 test cases)
- [x] `client/src-tauri/src/events/mod.rs` — documented canonical event names

---

## Test Results

```
src/api/tauri-constants.test.ts
  ✓ contains all expected Rust event names
  ✓ does not contain extra events not in Rust
  ✓ has event values matching their keys (self-consistency)
  ✓ has exactly 14 events matching Rust

Test Files: 64 passed
Tests: 594 passed
```

---

## Quality Gates

- [x] Event bridge runs before first connection attempt
- [x] Event names aligned across TS and Rust (14 events each)
- [x] `npm run test` passes
- [x] `npm run type-check` passes
- [x] `npm run lint` passes

---

## Post-Sprint

- [ ] Consider generating TS constants from Rust event list
@@ -0,0 +1,126 @@
# Sprint GAP-010: Identity Metadata and Per-RPC Logging

> **Size**: M | **Owner**: Backend (Python) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Observability and Identity
> **Status**: ✅ COMPLETE (2026-01-03)

---

## Open Issues & Prerequisites

> ✅ **Completed**: 2026-01-03 — All blocking issues resolved, implementation complete.

### Blocking Issues

| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Is identity metadata required for all RPCs? | ✅ Resolved | **Required** — x-request-id header mandatory, returns UNAUTHENTICATED if missing |
| **B2** | Where to source user/workspace IDs in Tauri client | ✅ Resolved | Use identity commands pattern (DEFAULT_USER_ID/DEFAULT_WORKSPACE_ID) |

### Design Gaps to Address

| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Identity interceptor exists but is not registered | ✅ Added to gRPC server interceptors |
| G2 | Tauri client does not attach identity metadata | ✅ Tonic IdentityInterceptor added |
| G3 | No per-RPC logging | ✅ RequestLoggingInterceptor added |

---

## Validation Status (2026-01-03)

### ✅ IMPLEMENTED

| Component | Status | Notes |
|-----------|--------|-------|
| Server identity interceptor wiring | ✅ Complete | Registered in `server.py`, rejects missing x-request-id |
| Client metadata injection | ✅ Complete | `IdentityInterceptor` injects headers on every request |
| Per-RPC logging | ✅ Complete | `RequestLoggingInterceptor` logs at INFO level |

**Result**: All RPCs now include identity context in logs, and every request is logged with method, status, duration_ms, peer, and request_id.

---

## Objective

Provide consistent identity metadata across RPCs and ensure every request is logged with method, duration, status, and peer information.

---

## Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Identity metadata | **Required** (x-request-id mandatory) | Ensures all requests are traceable; returns UNAUTHENTICATED if missing |
| Rejection status | gRPC UNAUTHENTICATED | Standard status code for missing authentication context |
| Logging level | INFO | Visible in normal operation without debug mode |
| Logging fields | method, status, duration_ms, peer, request_id | Minimum for traceability |
| Client identity source | Identity commands pattern | Uses DEFAULT_USER_ID/DEFAULT_WORKSPACE_ID constants |
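
The rejection contract in the table above maps onto a small `grpc.aio` server interceptor. The sketch below is illustrative only: the real `IdentityInterceptor` in `src/noteflow/grpc/interceptors/identity.py` is not shown in this commit view, so class and message names here are assumptions.

```python
# Hedged sketch of "reject missing x-request-id with UNAUTHENTICATED".
# Not the project's actual IdentityInterceptor.
import grpc
from grpc.aio import ServerInterceptor


class RequireRequestIdSketch(ServerInterceptor):
    """Rejects any RPC that arrives without x-request-id metadata."""

    async def intercept_service(self, continuation, handler_call_details):
        metadata = dict(handler_call_details.invocation_metadata or ())
        if "x-request-id" not in metadata:

            async def abort(request: object, context: grpc.aio.ServicerContext) -> None:
                await context.abort(
                    grpc.StatusCode.UNAUTHENTICATED,
                    "missing required x-request-id metadata",
                )

            # Unary-unary shown for brevity; streaming RPCs would need the
            # matching stream method handlers.
            return grpc.unary_unary_rpc_method_handler(abort)
        return await continuation(handler_call_details)
```
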
---

## What Already Exists

| Asset | Location | Implication |
|-------|----------|-------------|
| Identity interceptor | `src/noteflow/grpc/interceptors/identity.py` | Ready to register with server |
| gRPC server construction | `src/noteflow/grpc/server.py` | Hook point for interceptors |
| Tonic interceptor support | `client/src-tauri/src/grpc/noteflow.rs` | Client can wrap `NoteFlowServiceClient` |
| Local identity commands | `client/src-tauri/src/commands/identity.rs` | Source of user/workspace IDs |

---

## Scope

| Task | Effort | Notes |
|------|--------|-------|
| **Backend (Python)** | | |
| Register identity interceptor on gRPC server | S | Add to `grpc.aio.server` interceptors |
| Add request logging interceptor | M | Log method, status, duration, peer |
| **Client (Rust)** | | |
| Add tonic interceptor to inject metadata | M | Use request ID + local identity |
| Ensure request ID generation when absent | S | Align with backend expectations |

**Total Effort**: M (2-4 hours)
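
For the request-logging task in the scope table above, a server-side timing wrapper could look like the following sketch. It assumes `get_logger` returns a structlog-style logger that accepts keyword fields; the real `RequestLoggingInterceptor` in `src/noteflow/grpc/interceptors/logging.py` may be structured differently.

```python
# Hedged sketch of per-RPC logging: method, status, duration_ms, peer,
# request_id. Unary-unary only; not the project's actual interceptor.
import time

import grpc
from grpc.aio import ServerInterceptor

from noteflow.infrastructure.logging import get_logger

logger = get_logger(__name__)


class RequestLoggingSketch(ServerInterceptor):
    async def intercept_service(self, continuation, handler_call_details):
        handler = await continuation(handler_call_details)
        if handler is None or handler.unary_unary is None:
            return handler  # streaming variants would need the same wrapping

        method = handler_call_details.method
        request_id = dict(handler_call_details.invocation_metadata or ()).get("x-request-id")
        inner = handler.unary_unary

        async def timed(request, context):
            start = time.perf_counter()
            status = "OK"
            try:
                return await inner(request, context)
            except BaseException:
                status = "ERROR"
                raise
            finally:
                # One structured line per call, emitted on success and failure.
                logger.info(
                    "rpc_completed",
                    method=method,
                    status=status,
                    duration_ms=round((time.perf_counter() - start) * 1000, 2),
                    peer=context.peer(),
                    request_id=request_id,
                )

        return grpc.unary_unary_rpc_method_handler(
            timed,
            request_deserializer=handler.request_deserializer,
            response_serializer=handler.response_serializer,
        )
```
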
---

## Deliverables

### Backend

- [x] `src/noteflow/grpc/server.py` — register interceptors (RequestLoggingInterceptor + IdentityInterceptor)
- [x] `src/noteflow/grpc/interceptors/logging.py` — new RequestLoggingInterceptor
- [x] `src/noteflow/grpc/interceptors/identity.py` — updated to require x-request-id, reject with UNAUTHENTICATED
- [x] `tests/grpc/test_interceptors.py` — comprehensive interceptor tests

### Client

- [x] `client/src-tauri/src/grpc/client/core.rs` — IdentityInterceptor + InterceptedClient type alias
- [x] `client/src-tauri/src/grpc/client/mod.rs` — export IdentityInterceptor and InterceptedClient

---

## Test Strategy

### Core test cases

- [x] **Python**: interceptor sets context vars from metadata (`test_sets_context_vars_from_metadata`)
- [x] **Python**: interceptor rejects missing x-request-id with UNAUTHENTICATED (`test_rejects_missing_request_id_with_unauthenticated`)
- [x] **Python**: per-RPC logs emitted with method, status, duration_ms, peer, request_id (`test_logs_rpc_completion`)
- [x] **Python**: error status logged on exception (`test_logs_error_status_on_exception`)
- [x] **Rust**: interceptor attaches metadata headers on request (IdentityInterceptor implementation)

---

## Quality Gates

- [x] All RPCs include request_id in logs (enforced by IdentityInterceptor)
- [x] Identity metadata present when available (injected by client interceptor)
- [x] No change required to proto schema (metadata headers only)

---

## Post-Sprint

- [ ] Add correlation ID propagation to frontend logs
@@ -0,0 +1,160 @@
# Sprint: Logging Gap Remediation (P1 - Runtime/Inputs)

> **Status**: ✅ **COMPLETE** (2026-01-03)
> **Size**: M | **Owner**: Platform | **Prerequisites**: log_timing + get_logger already in place
> **Phase**: Observability - Runtime Diagnostics

---

## Open Issues & Prerequisites

> ✅ **Completed**: 2026-01-03 — All items implemented and verified.

### Blocking Issues

| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log level policy for invalid input (warn vs info vs debug) | ✅ | WARN with redaction |
| **B2** | PII redaction rules for UUIDs and URLs in logs | ✅ | UUID truncation implemented in `meeting.py` for `project_id` and `workspace_id` |

### Design Gaps to Address

| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Stub-missing logs could be noisy in gRPC client mixins | Resolved via rate-limited `warn_stub_missing` (see sketch below) |
| G2 | Timing vs. count metrics for long-running CPU tasks | Resolved via `log_timing` usage + context fields |
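
A minimal once-per-key suppression helper captures the spirit of the G1 resolution. This is a hedged sketch: the real `warn_stub_missing` and `get_client_rate_limiter()` helpers are not shown in this commit view, so names and signatures here are assumptions.

```python
# Sketch of once-per-key warning suppression for missing gRPC stubs.
from noteflow.infrastructure.logging import get_logger

logger = get_logger(__name__)


class OncePerKeyLimiter:
    """Allows the first emission for each key, drops repeats."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def should_log(self, key: str) -> bool:
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


_limiter = OncePerKeyLimiter()


def warn_stub_missing(method: str) -> None:
    """Warn once per method when the gRPC stub is not connected."""
    if _limiter.should_log(method):
        logger.warning("grpc_stub_missing", method=method)
```
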
### Prerequisite Verification

| Prerequisite | Status | Notes |
|--------------|--------|-------|
| `log_timing` helper available | ✅ | `src/noteflow/infrastructure/logging/timing.py` |
| `log_state_transition` available | ✅ | `src/noteflow/infrastructure/logging/transitions.py` |

---

## Validation Status (2026-01-03)

### RESOLVED SINCE TRIAGE

| Component | Status | Notes |
|-----------|--------|-------|
| Ollama availability logging | Resolved | `src/noteflow/infrastructure/summarization/ollama_provider.py` uses `log_timing` |
| Cloud LLM API timing/logging | Resolved | `src/noteflow/infrastructure/summarization/cloud_provider.py` uses `log_timing` |
| Google Calendar request timing | Resolved | `src/noteflow/infrastructure/calendar/google_adapter.py` uses `log_timing` |
| OAuth refresh timing | Resolved | `src/noteflow/infrastructure/calendar/oauth_manager.py` uses `log_timing` |
| Webhook delivery start/finish | Resolved | `src/noteflow/infrastructure/webhooks/executor.py` info logs |
| Database engine + migrations | Resolved | `src/noteflow/infrastructure/persistence/database.py` info logs |
| Diarization full timing | Resolved | `src/noteflow/infrastructure/diarization/engine.py` uses `log_timing` |
| Diarization job timeout logging | Resolved | `src/noteflow/grpc/_mixins/diarization/_status.py` |
| Meeting state transitions | Resolved | `src/noteflow/application/services/meeting_service.py` |
| Streaming cleanup | Resolved | `src/noteflow/grpc/_mixins/streaming/_cleanup.py` |
| NER warmup + extraction timing | Resolved | `src/noteflow/application/services/ner_service.py` uses `log_timing` |
| ASR `transcribe_async` timing + context | Resolved | `src/noteflow/infrastructure/asr/engine.py` uses `log_timing` |
| Invalid meeting_id parsing logs | Resolved | `src/noteflow/grpc/_mixins/converters/_id_parsing.py` warns w/ truncation |
| Calendar datetime parse warnings | Resolved | `src/noteflow/infrastructure/triggers/calendar.py` warns w/ truncation |
| Settings fallback logs | Resolved | `_get_llm_settings`, `_get_webhook_settings`, `diarization_job_ttl_seconds` |
| gRPC client stub missing logs | Resolved | `_client_mixins/*` use `get_client_rate_limiter()` |
| Rust gRPC connection tracing | Resolved | `client/src-tauri/src/grpc/client/core.rs` logs connect timing |

### IMPLEMENTED THIS SPRINT

| Component | Status | Notes |
|-----------|--------|-------|
| Segmenter state transitions | ✅ Complete | `src/noteflow/infrastructure/asr/segmenter.py` uses `logger.debug("segmenter_state_transition", ...)` |
| Workspace UUID parsing warning | ✅ Complete | `src/noteflow/grpc/_mixins/meeting.py` logs with redaction on ValueError |

**Status**: All gaps resolved. No downstream visibility issues remain.
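
The workspace-UUID warning above follows the WARN-with-redaction policy. A hedged sketch of that shape — the helper name, event name, and truncation width are illustrative; the actual code lives in `src/noteflow/grpc/_mixins/meeting.py`:

```python
from uuid import UUID

from noteflow.infrastructure.logging import get_logger

logger = get_logger(__name__)


def parse_workspace_id(raw: str) -> UUID | None:
    """Return the parsed UUID, or None after logging a redacted warning."""
    try:
        return UUID(raw)
    except ValueError:
        # Log only a truncated prefix so a malformed (possibly sensitive)
        # value never lands in logs verbatim.
        logger.warning("invalid_workspace_id", value_prefix=raw[:8])
        return None
```
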
---

## Objective

Close remaining high-impact logging gaps for runtime operations and input validation to reduce debugging time and improve failure diagnosis across Python gRPC services and the Tauri client.

---

## Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Timing utility** | Use `log_timing` | Consistent duration metrics and structured fields |
| **Invalid input logging** | Warn-level with redaction | Catch client errors without leaking sensitive data |
| **Stub-missing logging** | Rate-limited (once per client instance) | Avoid log spam while preserving visibility |
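
The `log_timing` helper these decisions standardize on lives in `src/noteflow/infrastructure/logging/timing.py`; its exact signature is not shown in this commit view, so the context-manager shape below is an assumption sketched from how it is used elsewhere in this plan.

```python
# Hedged sketch of the log_timing pattern, not the project's actual helper.
import time
from collections.abc import Iterator
from contextlib import contextmanager

from noteflow.infrastructure.logging import get_logger

logger = get_logger(__name__)


@contextmanager
def log_timing(event: str, **fields: object) -> Iterator[None]:
    """Emit one structured log line with duration_ms when the block exits."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info(
            event,
            duration_ms=round((time.perf_counter() - start) * 1000, 2),
            **fields,
        )


# Usage, e.g. around an ASR call (field values illustrative):
# with log_timing("transcribe_async", audio_seconds=12.8, model="small"):
#     segments = engine.transcribe(chunk)
```
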
---

## What Already Exists

| Asset | Location | Implication |
|-------|----------|-------------|
| `log_timing` helper | `src/noteflow/infrastructure/logging/timing.py` | Use for executor + network timing |
| `log_state_transition` | `src/noteflow/infrastructure/logging/transitions.py` | Reuse for state-machine transitions |
| Existing log_timing usage | `ollama_provider.py`, `cloud_provider.py`, `google_adapter.py` | Follow established patterns |

---

## Scope

| Task | Effort | Status |
|------|--------|--------|
| **Infrastructure Layer** | | |
| Add segmenter state transition logs | S | ✅ Already implemented (debug-level `segmenter_state_transition`) |
| **API Layer** | | |
| Log invalid workspace UUID parsing (WARN + redaction) | S | ✅ Implemented |
| **Client Layer** | | |
| (None) | | N/A |

**Total Effort**: ✅ Complete

---

## Deliverables

### Backend

**Application Layer**:
- [x] `src/noteflow/application/services/ner_service.py` — warmup/extraction timing logs present

**Infrastructure Layer**:
- [x] `src/noteflow/infrastructure/asr/engine.py` — transcription timing logs present
- [x] `src/noteflow/infrastructure/asr/segmenter.py` — state transitions logged with `segmenter_state_transition`
- [x] `src/noteflow/infrastructure/summarization/cloud_provider.py` — settings fallback logs present
- [x] `src/noteflow/infrastructure/webhooks/executor.py` — settings fallback logs present

**API Layer**:
- [x] `src/noteflow/grpc/_mixins/meeting.py` — invalid workspace UUID parse logs (WARN + redaction)
- [x] `src/noteflow/grpc/_mixins/converters/_id_parsing.py` — invalid meeting_id parse logs present
- [x] `src/noteflow/infrastructure/triggers/calendar.py` — datetime parse warnings present
- [x] `src/noteflow/grpc/_client_mixins/*.py` — stub-missing logs present (rate-limited)
- [x] `src/noteflow/grpc/_mixins/diarization_job.py` — settings fallback logs present

### Client

- [x] `client/src-tauri/src/grpc/client/core.rs` — connection timing logs present

---

## Test Strategy

### Core test cases

- **Infrastructure**: `caplog` validates segmenter transition logs emit on state changes
- **API**: invalid workspace UUID parsing emits warning and returns safely (see test sketch below)
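
A minimal pytest shape for the API case above, assuming a `parse_workspace_id`-style helper (like the hedged sketch earlier in this plan) and that structured logs are routed through stdlib `logging` so `caplog` can see them:

```python
import logging

import pytest

from noteflow.grpc._mixins.meeting import parse_workspace_id  # hypothetical name


def test_invalid_workspace_uuid_warns(caplog: pytest.LogCaptureFixture) -> None:
    with caplog.at_level(logging.WARNING):
        # The helper should swallow the ValueError and return None.
        assert parse_workspace_id("not-a-uuid") is None
    assert any("invalid_workspace_id" in message for message in caplog.messages)
```
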
---

## Quality Gates

- [x] Added logs use structured fields and follow existing logging patterns
- [x] No new `# type: ignore` or `Any` introduced
- [x] Targeted tests for new logging paths where practical
- [x] `ruff check` + `mypy` pass (backend)
- [x] `npm run lint:rs` pass (client)

---

## Post-Sprint

- [ ] Evaluate if logging should be sampled for high-frequency segmenter transitions
- [ ] Consider centralized log suppression for repeated invalid client inputs
@@ -0,0 +1,196 @@
# Sprint: Logging Gap Remediation (P2 - Persistence/Exports)

> **Size**: L | **Owner**: Platform | **Prerequisites**: P1 logging gaps resolved
> **Phase**: Observability - Data & Lifecycle
> **Status**: ✅ COMPLETE (2026-01-03)

---

## Open Issues & Prerequisites

> ✅ **Completed**: 2026-01-03 — All P2 logging gaps implemented.

### Blocking Issues (Resolved)

| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log volume for repository CRUD operations | ✅ Resolved | INFO for mutations, DEBUG for reads |
| **B2** | Sensitive data in repository logs | ✅ Resolved | Log IDs and counts only, no content |

### Design Gaps (Addressed)

| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Consistent DB timing strategy across BaseRepository and UoW | ✅ Added timing to `_execute_*` and flush methods |
| G2 | Export logs should include size without dumping content | ✅ Log byte count + segment count + duration_ms |

### Prerequisite Verification

| Prerequisite | Status | Notes |
|--------------|--------|-------|
| Logging helpers available | ✅ | `log_timing`, `get_logger` |
| State transition logger | ✅ | `log_state_transition` |

---

## Validation Status (2026-01-03)

### ✅ FULLY IMPLEMENTED

| Component | Status | Notes |
|-----------|--------|-------|
| BaseRepository query timing | ✅ Implemented | `_execute_*` and flush methods timed |
| UnitOfWork lifecycle logs | ✅ Implemented | `__aenter__`, commit, rollback, `__aexit__` |
| Repository CRUD logging | ✅ Implemented | All repos: meeting, segment, summary, annotation, webhook, etc. |
| Asset deletion no-op logging | ✅ Implemented | `assets_delete_skipped_not_found` log |
| Export timing/logging | ✅ Implemented | HTML, Markdown, PDF with duration_ms + size_bytes |
| Diarization session close log level | ✅ Implemented | Promoted to INFO level |
| Background task lifecycle logs | ✅ Implemented | `diarization_task_created` log |
| Audio writer flush thread | ✅ Implemented | `flush_thread_started`, `flush_thread_stopped` |

**Downstream impact**: Full visibility into DB performance, export latency, and lifecycle cleanup.

---

## Objective

Add structured logging for persistence, export, and lifecycle operations so DB performance issues and long-running exports are diagnosable without ad-hoc debugging.

---

## Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Repository logging level** | INFO for mutations, DEBUG for reads | Avoid log noise while capturing state changes |
| **Timing strategy** | `log_timing` around DB write batches | Consistent duration metrics without per-row spam |
| **Export logging** | Log sizes and durations only | Avoid dumping user content |
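
The first two decisions translate into very small log sites. The sketch below is a hedged illustration of the policy (INFO for mutations, IDs and counts only), not the actual repository code in this commit; the persistence call is stubbed out.

```python
from noteflow.infrastructure.logging import get_logger

logger = get_logger(__name__)


class SegmentRepoSketch:
    """Illustrative slice of a repository; persistence is stubbed out."""

    async def _flush_batch(self, segments: list[object]) -> None:
        """Stand-in for the real SQLAlchemy add-all + flush."""

    async def add_batch(self, meeting_id: str, segments: list[object]) -> None:
        await self._flush_batch(segments)
        # INFO because this is a mutation; reads would use DEBUG.
        # IDs and counts only — never transcript text or payloads.
        logger.info(
            "segments_batch_added",
            meeting_id=meeting_id,
            count=len(segments),
        )
```
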
---

## What Already Exists

| Asset | Location | Implication |
|-------|----------|-------------|
| Migration logging | `src/noteflow/infrastructure/persistence/database.py` | Reuse for DB lifecycle logs |
| Log helpers | `src/noteflow/infrastructure/logging/*` | Standardize on structured logging |

---

## Scope

| Task | Effort | Notes |
|------|--------|-------|
| **Infrastructure Layer** | | |
| Add BaseRepository timing wrappers | M | `_execute_*` methods emit duration |
| Add UnitOfWork lifecycle logs | S | `__aenter__`/commit/rollback/exit |
| Add CRUD mutation logs in repositories | L | Create/Update/Delete summary logs |
| Add asset deletion no-op log | S | Log when directory missing |
| Add export timing logs | M | PDF/Markdown/HTML export duration + size |
| Promote diarization session close to INFO | S | `session.py` |
| Log diarization job task creation | S | `grpc/_mixins/diarization/_jobs.py` |
| Add audio flush thread lifecycle logs | S | `infrastructure/audio/writer.py` |

**Total Effort**: L (4-8 hours)

---

## Deliverables

### Backend

**Infrastructure Layer**:
- [x] `src/noteflow/infrastructure/persistence/repositories/_base.py` — timing logs for DB operations
- [x] `src/noteflow/infrastructure/persistence/unit_of_work.py` — session/commit/rollback logs
- [x] `src/noteflow/infrastructure/persistence/repositories/*_repo.py` — mutation logging
- [x] `src/noteflow/infrastructure/persistence/repositories/asset_repo.py` — no-op delete log
- [x] `src/noteflow/infrastructure/export/pdf.py` — duration + byte-size log
- [x] `src/noteflow/infrastructure/export/markdown.py` — export count log
- [x] `src/noteflow/infrastructure/export/html.py` — export count log
- [x] `src/noteflow/infrastructure/diarization/session.py` — info-level close log
- [x] `src/noteflow/grpc/_mixins/diarization/_jobs.py` — background task creation log
- [x] `src/noteflow/infrastructure/audio/writer.py` — flush thread lifecycle logs

---

## Test Strategy

### Core test cases

- **Repositories**: `caplog` validates mutation logging for create/update/delete
- **UnitOfWork**: log emitted on commit/rollback paths
- **Exports**: ensure logs include duration and output size (bytes/segments)
- **Lifecycle**: diarization session close emits info log

---

## Quality Gates

- [x] Logging includes structured fields and avoids payload content
- [x] No new `# type: ignore` or `Any` introduced
- [x] `pytest` passes for touched modules
- [x] `basedpyright src/` passes (0 errors)

---

## Post-Sprint

- [ ] Assess performance impact of repo timing logs
- [ ] Consider opt-in logging for high-volume read paths

---

## Implementation Summary (2026-01-03)

### Logging Events Added

| Module | Event Names |
|--------|------------|
| BaseRepository | `db_execute_scalar`, `db_execute_scalars`, `db_add_and_flush`, `db_delete_and_flush` |
| UnitOfWork | `uow_session_started`, `uow_transaction_committed`, `uow_transaction_rolled_back`, `uow_session_closed` |
| MeetingRepository | `meeting_created`, `meeting_updated`, `meeting_deleted` |
| SegmentRepository | `segment_added`, `segments_batch_added` |
| SummaryRepository | `summary_created`, `summary_updated`, `summary_deleted` |
| AnnotationRepository | `annotation_added`, `annotation_updated`, `annotation_deleted` |
| PreferencesRepository | `preference_created`, `preference_updated`, `preference_deleted` |
| WebhookRepository | `webhook_created`, `webhook_updated`, `webhook_delivery_recorded` |
| DiarizationJobRepository | `diarization_job_created`, `diarization_job_status_updated`, etc. |
| EntityRepository | `entity_saved`, `entities_batch_saved`, `entities_deleted_by_meeting` |
| IntegrationRepository | `integration_created`, `integration_updated`, `sync_run_created` |
| AssetRepository | `assets_deleted`, `assets_delete_skipped_not_found` |
| HtmlExporter | `html_exported` (with segment_count, size_bytes, duration_ms) |
| MarkdownExporter | `markdown_exported` (with segment_count, size_bytes, duration_ms) |
| PdfExporter | `pdf_exported` (with segment_count, size_bytes, duration_ms) |
| DiarizationSession | `diarization_session_closed` (promoted to INFO) |
| DiarizationJobs | `diarization_task_created` |
| AudioWriter | `flush_thread_started`, `flush_thread_stopped`, `flush_thread_timeout` |
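
The three `*_exported` events share one shape — segment_count, size_bytes, duration_ms, and no document content. A hedged sketch of that shape (the function name is illustrative; the real exporter APIs are not shown in this commit view):

```python
import time

from noteflow.infrastructure.logging import get_logger

logger = get_logger(__name__)


def export_markdown_sketch(segments: list[str]) -> bytes:
    start = time.perf_counter()
    payload = "\n\n".join(segments).encode("utf-8")
    # Sizes and counts only; the rendered document itself is never logged.
    logger.info(
        "markdown_exported",
        segment_count=len(segments),
        size_bytes=len(payload),
        duration_ms=round((time.perf_counter() - start) * 1000, 2),
    )
    return payload
```
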
### Tests Added

- `tests/infrastructure/persistence/test_logging_persistence.py` — 16 tests verifying logger configuration across all modified modules

### Implementation Notes

**Completion**: 100% (10/10 deliverables)

**Design principles followed**:
- No new wrapper classes or compatibility layers
- Direct use of existing `get_logger(__name__)` pattern
- Inline `time.perf_counter()` for timing (no helper abstraction)
- Structured logging with keyword args only (no format strings)
- Log IDs and metrics only, never content/payloads
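
Under those principles, a timed read in the style of `db_execute_scalar` reduces to inline `perf_counter` plus one structured DEBUG line. A hedged sketch — the real method body in `repositories/_base.py` is not part of this view:

```python
from __future__ import annotations

import time

from sqlalchemy import Select
from sqlalchemy.ext.asyncio import AsyncSession

from noteflow.infrastructure.logging import get_logger

logger = get_logger(__name__)


async def execute_scalar_sketch(
    session: AsyncSession, stmt: Select[tuple[object]]
) -> object | None:
    """Illustrative stand-in for a BaseRepository `_execute_scalar`-style read."""
    start = time.perf_counter()
    result = await session.execute(stmt)
    value = result.scalar_one_or_none()
    # Reads log at DEBUG per the P2 level policy; no row content is logged.
    logger.debug(
        "db_execute_scalar",
        duration_ms=round((time.perf_counter() - start) * 1000, 2),
    )
    return value
```
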
**Lines of code added per file** (approximate):
- `_base.py`: +25 lines (timing around 5 methods)
- `unit_of_work.py`: +20 lines (lifecycle events)
- `*_repo.py` (10 files): ~8-15 lines each
- Export files (3): ~12 lines each
- `session.py`: 1 line change (DEBUG → INFO)
- `_jobs.py`: +6 lines
- `writer.py`: +8 lines

**No bloat introduced**:
- No new classes, protocols, or abstract types
- No new dependencies
- No configuration changes required
- Zero runtime overhead when log level disabled
@@ -1,117 +0,0 @@
# Sprint GAP-009: Event Bridge Initialization and Contract Guarantees

> **Size**: S | **Owner**: Frontend (TypeScript) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Event Wiring

---

## Open Issues & Prerequisites

> ⚠️ **Review Date**: 2026-01-03 — Verified in code; needs confirmation on desired bridge startup timing.

### Blocking Issues

| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Should event bridge start before connection? | Pending | Recommend yes to capture early events |

### Design Gaps to Address

| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Event bridge starts only after successful connect | Initialize on app boot for Tauri |
| G2 | Event name sources are split across TS + Rust | Enforce single canonical source and tests |

---

## Validation Status (2026-01-03)

### PARTIALLY IMPLEMENTED

| Component | Status | Notes |
|-----------|--------|-------|
| Event names centralized | Implemented | Rust uses `event_names`, TS uses `TauriEvents` |
| Event bridge | Partial | Started after connect in `initializeAPI` |

### NOT IMPLEMENTED

| Component | Status | Notes |
|-----------|--------|-------|
| Early event bridge init | Not implemented | Disconnected mode skips bridge startup |
| Contract cross-check tests | Not implemented | No explicit TS<->Rust contract validation |

**Downstream impact**: Early connection/error events may be missed or not forwarded to the frontend when connection fails.

---

## Objective

Ensure the Tauri event bridge is always initialized in desktop mode and keep event names in sync across Rust and TypeScript.

---

## Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Bridge initialization timing | At app boot in Tauri | Guarantees early events are captured |
| Contract validation | Add a TS test to validate event list | Prevent silent drift |

---

## What Already Exists

| Asset | Location | Implication |
|-------|----------|-------------|
| Rust event names | `client/src-tauri/src/events/mod.rs` | Canonical Rust constants |
| TS event names | `client/src/api/tauri-constants.ts` | Canonical TS constants |
| Event bridge implementation | `client/src/lib/tauri-events.ts` | Hook point for init timing |
| API init flow | `client/src/api/index.ts` | Controls when bridge starts |

---

## Scope

| Task | Effort | Notes |
|------|--------|-------|
| **Client Layer (TypeScript)** | | |
| Start event bridge during app boot in Tauri mode | S | Move `startTauriEventBridge` earlier |
| Add contract test for event name parity | S | Compare TS constants against expected list |
| **Client Layer (Rust)** | | |
| Add doc comment that event names are canonical | S | Small clarity change |

**Total Effort**: S (1-2 hours)

---

## Deliverables

### Client

- [ ] `client/src/api/index.ts` — initialize event bridge before connect
- [ ] `client/src/lib/tauri-events.ts` — guard against double-init
- [ ] `client/src/api/tauri-constants.ts` — add test coverage for event list
- [ ] `client/src-tauri/src/events/mod.rs` — document canonical event names

---

## Test Strategy

### Core test cases

- **TS**: event bridge initializes in Tauri mode even when disconnected
- **TS**: event list contract test fails if names drift

---

## Quality Gates

- [ ] Event bridge runs before first connection attempt
- [ ] Event names remain aligned across TS and Rust
- [ ] `npm run test` passes

---

## Post-Sprint

- [ ] Consider generating TS constants from Rust event list
@@ -1,118 +0,0 @@
# Sprint GAP-010: Identity Metadata and Per-RPC Logging

> **Size**: M | **Owner**: Backend (Python) + Client (Rust) | **Prerequisites**: None
> **Phase**: Gaps - Observability and Identity

---

## Open Issues & Prerequisites

> ⚠️ **Review Date**: 2026-01-03 — Verified in code; requires decision on identity metadata requirements.

### Blocking Issues

| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Is identity metadata required for all RPCs? | Pending | Decide if headers are optional or mandatory |
| **B2** | Where to source user/workspace IDs in Tauri client | Pending | Use local identity commands or preferences |

### Design Gaps to Address

| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Identity interceptor exists but is not registered | Add to gRPC server interceptors |
| G2 | Tauri client does not attach identity metadata | Add tonic interceptor |
| G3 | No per-RPC logging | Add server-side logging interceptor |

---

## Validation Status (2026-01-03)

### NOT IMPLEMENTED

| Component | Status | Notes |
|-----------|--------|-------|
| Server identity interceptor wiring | Not implemented | Interceptor defined but unused |
| Client metadata injection | Not implemented | Tonic interceptor not configured |
| Per-RPC logging | Not implemented | Server lacks request logging interceptor |

**Downstream impact**: Request logs lack user/workspace context, and backend activity can appear invisible when service methods do not log.

---

## Objective

Provide consistent identity metadata across RPCs and ensure every request is logged with method, duration, status, and peer information.

---

## Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Identity metadata | Optional but attached by default | Avoid breaking older clients while improving logs |
| Logging fields | method, status, duration, peer, request_id | Minimum for traceability |

---

## What Already Exists

| Asset | Location | Implication |
|-------|----------|-------------|
| Identity interceptor | `src/noteflow/grpc/interceptors/identity.py` | Ready to register with server |
| gRPC server construction | `src/noteflow/grpc/server.py` | Hook point for interceptors |
| Tonic interceptor support | `client/src-tauri/src/grpc/noteflow.rs` | Client can wrap `NoteFlowServiceClient` |
| Local identity commands | `client/src-tauri/src/commands/identity.rs` | Source of user/workspace IDs |

---

## Scope

| Task | Effort | Notes |
|------|--------|-------|
| **Backend (Python)** | | |
| Register identity interceptor on gRPC server | S | Add to `grpc.aio.server` interceptors |
| Add request logging interceptor | M | Log method, status, duration, peer |
| **Client (Rust)** | | |
| Add tonic interceptor to inject metadata | M | Use request ID + local identity |
| Ensure request ID generation when absent | S | Align with backend expectations |

**Total Effort**: M (2-4 hours)

---

## Deliverables

### Backend

- [ ] `src/noteflow/grpc/server.py` — register interceptors
- [ ] `src/noteflow/grpc/interceptors/` — add request logging interceptor

### Client

- [ ] `client/src-tauri/src/grpc/client/core.rs` — attach tonic interceptor
- [ ] `client/src-tauri/src/state/` — expose identity context for metadata

---

## Test Strategy

### Core test cases

- **Python**: interceptor sets context vars from metadata
- **Python**: per-RPC logs emitted for a sample method
- **Rust**: interceptor attaches metadata headers on request

---

## Quality Gates

- [ ] All RPCs include request_id in logs
- [ ] Identity metadata present when available
- [ ] No change required to proto schema

---

## Post-Sprint

- [ ] Add correlation ID propagation to frontend logs
@@ -1,168 +0,0 @@
# Sprint: Logging Gap Remediation (P1 - Runtime/Inputs)

> **Size**: M | **Owner**: Platform | **Prerequisites**: log_timing + get_logger already in place
> **Phase**: Observability - Runtime Diagnostics

---

## Open Issues & Prerequisites

> ⚠️ **Review Date**: 2026-01-03 — Verification complete, scope needs owner/priority confirmation.

### Blocking Issues

| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log level policy for invalid input (warn vs info vs debug) | ✅ | WARN with redaction |
| **B2** | PII redaction rules for UUIDs and URLs in logs | Pending | Align with security guidance |

### Design Gaps to Address

| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Stub-missing logs could be noisy in gRPC client mixins | Add rate-limited or once-per-session logging |
| G2 | Timing vs. count metrics for long-running CPU tasks | Standardize on `log_timing` + optional result_count |

### Prerequisite Verification

| Prerequisite | Status | Notes |
|--------------|--------|-------|
| `log_timing` helper available | ✅ | `src/noteflow/infrastructure/logging/timing.py` |
| `log_state_transition` available | ✅ | `src/noteflow/infrastructure/logging/transitions.py` |

---

## Validation Status (2026-01-03)

### RESOLVED SINCE TRIAGE

| Component | Status | Notes |
|-----------|--------|-------|
| Ollama availability logging | Resolved | `src/noteflow/infrastructure/summarization/ollama_provider.py` uses `log_timing` |
| Cloud LLM API timing/logging | Resolved | `src/noteflow/infrastructure/summarization/cloud_provider.py` uses `log_timing` |
| Google Calendar request timing | Resolved | `src/noteflow/infrastructure/calendar/google_adapter.py` uses `log_timing` |
| OAuth refresh timing | Resolved | `src/noteflow/infrastructure/calendar/oauth_manager.py` uses `log_timing` |
| Webhook delivery start/finish | Resolved | `src/noteflow/infrastructure/webhooks/executor.py` info logs |
| Database engine + migrations | Resolved | `src/noteflow/infrastructure/persistence/database.py` info logs |
| Diarization full timing | Resolved | `src/noteflow/infrastructure/diarization/engine.py` uses `log_timing` |
| Diarization job timeout logging | Resolved | `src/noteflow/grpc/_mixins/diarization/_status.py` |
| Meeting state transitions | Resolved | `src/noteflow/application/services/meeting_service.py` |
| Streaming cleanup | Resolved | `src/noteflow/grpc/_mixins/streaming/_cleanup.py` |

### NOT IMPLEMENTED

| Component | Status | Notes |
|-----------|--------|-------|
| NER warmup timing/logs | Not implemented | `src/noteflow/application/services/ner_service.py` uses `run_in_executor` without logs |
| ASR `transcribe_async` timing | Not implemented | `src/noteflow/infrastructure/asr/engine.py` lacks duration/RTF logs |
| Segmenter state transitions | Not implemented | `src/noteflow/infrastructure/asr/segmenter.py` no transition logs |
| Silent UUID parsing (workspace) | Not implemented | `src/noteflow/grpc/_mixins/meeting.py` returns None on ValueError |
| Silent meeting-id parsing | Not implemented | `src/noteflow/grpc/_mixins/converters/_id_parsing.py` returns None on ValueError |
| Silent calendar datetime parsing | Not implemented | `src/noteflow/infrastructure/triggers/calendar.py` returns None on ValueError |
| Settings fallback logging | Not implemented | `_get_llm_settings`, `_get_webhook_settings`, `diarization_job_ttl_seconds` |
| gRPC client stub missing logs | Not implemented | `src/noteflow/grpc/_client_mixins/*.py` return None silently |
| Rust gRPC connection tracing | Not implemented | `client/src-tauri/src/grpc/client/core.rs` no start/finish timing |

**Downstream impact**: Runtime visibility gaps for user-facing latency, failure diagnosis, and client connection issues.

---

## Objective

Close remaining high-impact logging gaps for runtime operations and input validation to reduce debugging time and improve failure diagnosis across Python gRPC services and the Tauri client.

---

## Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Timing utility** | Use `log_timing` | Consistent duration metrics and structured fields |
| **Invalid input logging** | Warn-level with redaction | Catch client errors without leaking sensitive data |
| **Stub-missing logging** | Rate-limited (once per client instance) | Avoid log spam while preserving visibility |

---

## What Already Exists

| Asset | Location | Implication |
|-------|----------|-------------|
| `log_timing` helper | `src/noteflow/infrastructure/logging/timing.py` | Use for executor + network timing |
| `log_state_transition` | `src/noteflow/infrastructure/logging/transitions.py` | Reuse for state-machine transitions |
| Existing log_timing usage | `ollama_provider.py`, `cloud_provider.py`, `google_adapter.py` | Follow established patterns |

---

## Scope

| Task | Effort | Notes |
|------|--------|-------|
| **Application Layer** | | |
| Add NER warmup + extraction timing logs | S | Use `log_timing` around `run_in_executor` |
| **Infrastructure Layer** | | |
| Add ASR `transcribe_async` duration + RTF logging | M | Include audio duration and model size |
| Add segmenter state transition logs | S | Use `log_state_transition` or structured info logs |
| Add settings fallback warning logs | S | `_get_llm_settings`, `_get_webhook_settings`, `diarization_job_ttl_seconds` |
| **API Layer** | | |
| Log invalid workspace UUID parsing (WARN + redaction) | S | `src/noteflow/grpc/_mixins/meeting.py` |
| Log invalid meeting_id parsing (WARN + redaction) | S | `src/noteflow/grpc/_mixins/converters/_id_parsing.py` |
| Log calendar datetime parse failures (WARN + redaction) | S | `src/noteflow/infrastructure/triggers/calendar.py` |
| gRPC client mixins log missing stub (rate-limited) | S | `src/noteflow/grpc/_client_mixins/*.py` |
| **Client Layer** | | |
| Add tracing for gRPC connect attempts | S | `client/src-tauri/src/grpc/client/core.rs` |

**Total Effort**: M (2-4 hours)

---

## Deliverables

### Backend

**Application Layer**:
- [ ] `src/noteflow/application/services/ner_service.py` — add warmup/extraction timing logs

**Infrastructure Layer**:
- [ ] `src/noteflow/infrastructure/asr/engine.py` — log transcription duration + RTF
- [ ] `src/noteflow/infrastructure/asr/segmenter.py` — log state transitions
- [ ] `src/noteflow/infrastructure/summarization/cloud_provider.py` — log settings fallback
- [ ] `src/noteflow/infrastructure/webhooks/executor.py` — log settings fallback

**API Layer**:
- [ ] `src/noteflow/grpc/_mixins/meeting.py` — log invalid workspace UUID parse (WARN + redaction)
- [ ] `src/noteflow/grpc/_mixins/converters/_id_parsing.py` — log invalid meeting_id parse (WARN + redaction)
- [ ] `src/noteflow/infrastructure/triggers/calendar.py` — log datetime parse errors (WARN + redaction)
- [ ] `src/noteflow/grpc/_client_mixins/*.py` — log missing stub (rate-limited)
- [ ] `src/noteflow/grpc/_mixins/diarization_job.py` — log settings fallback

### Client

- [ ] `client/src-tauri/src/grpc/client/core.rs` — log connection attempt duration + endpoint

---

## Test Strategy

### Core test cases

- **Application**: `caplog` validates NER warmup logs appear when lazy-load path is taken
- **Infrastructure**: `caplog` validates ASR timing log fields include duration and audio length
- **API**: invalid UUID parsing emits warning and aborts/returns safely
- **Client**: basic unit test or log snapshot for connection start/failure paths

---

## Quality Gates

- [ ] Added logs use structured fields and follow existing logging patterns
- [ ] No new `# type: ignore` or `Any` introduced
- [ ] Targeted tests for new logging paths where practical
- [ ] `ruff check` + `mypy` pass (backend)
- [ ] `npm run lint:rs` pass (client)

---

## Post-Sprint

- [ ] Evaluate if logging should be sampled for high-frequency segmenter transitions
- [ ] Consider centralized log suppression for repeated invalid client inputs
@@ -1,144 +0,0 @@
# Sprint: Logging Gap Remediation (P2 - Persistence/Exports)

> **Size**: L | **Owner**: Platform | **Prerequisites**: P1 logging gaps resolved
> **Phase**: Observability - Data & Lifecycle

---

## Open Issues & Prerequisites

> ⚠️ **Review Date**: 2026-01-03 — Verification complete, scope needs prioritization.

### Blocking Issues

| ID | Issue | Status | Resolution |
|----|-------|--------|------------|
| **B1** | Log volume for repository CRUD operations | Pending | Decide sampling/level policy |
| **B2** | Sensitive data in repository logs | Pending | Redaction and field allowlist |

### Design Gaps to Address

| ID | Gap | Resolution |
|----|-----|------------|
| G1 | Consistent DB timing strategy across BaseRepository and UoW | Add `log_timing` helpers or per-method timing |
| G2 | Export logs should include size without dumping content | Log byte count + segment count only |

### Prerequisite Verification

| Prerequisite | Status | Notes |
|--------------|--------|-------|
| Logging helpers available | ✅ | `log_timing`, `get_logger` |
| State transition logger | ✅ | `log_state_transition` |

---

## Validation Status (2026-01-03)

### PARTIALLY IMPLEMENTED

| Component | Status | Notes |
|-----------|--------|-------|
| DB migrations lifecycle logs | Partial | Migration start/end logged; repo/UoW still silent |
| Audio writer open logging | Partial | Open/flush errors logged, but thread lifecycle unlogged |

### NOT IMPLEMENTED

| Component | Status | Notes |
|-----------|--------|-------|
| BaseRepository query timing | Not implemented | `src/noteflow/infrastructure/persistence/repositories/_base.py` |
| UnitOfWork lifecycle logs | Not implemented | `src/noteflow/infrastructure/persistence/unit_of_work.py` |
| Repository CRUD logging | Not implemented | `meeting_repo.py`, `segment_repo.py`, `summary_repo.py`, etc. |
| Asset deletion no-op logging | Not implemented | `src/noteflow/infrastructure/persistence/repositories/asset_repo.py` |
| Export timing/logging | Not implemented | `pdf.py`, `markdown.py`, `html.py` |
| Diarization session close log level | Not implemented | `src/noteflow/infrastructure/diarization/session.py` uses debug |
| Background task lifecycle logs | Not implemented | `src/noteflow/grpc/_mixins/diarization/_jobs.py` task creation missing |

**Downstream impact**: Limited visibility into DB performance, export latency, and lifecycle cleanup.

---

## Objective

Add structured logging for persistence, export, and lifecycle operations so DB performance issues and long-running exports are diagnosable without ad-hoc debugging.

---

## Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Repository logging level** | INFO for mutations, DEBUG for reads | Avoid log noise while capturing state changes |
| **Timing strategy** | `log_timing` around DB write batches | Consistent duration metrics without per-row spam |
| **Export logging** | Log sizes and durations only | Avoid dumping user content |

---

## What Already Exists

| Asset | Location | Implication |
|-------|----------|-------------|
| Migration logging | `src/noteflow/infrastructure/persistence/database.py` | Reuse for DB lifecycle logs |
| Log helpers | `src/noteflow/infrastructure/logging/*` | Standardize on structured logging |

---

## Scope

| Task | Effort | Notes |
|------|--------|-------|
| **Infrastructure Layer** | | |
| Add BaseRepository timing wrappers | M | `_execute_*` methods emit duration |
| Add UnitOfWork lifecycle logs | S | `__aenter__`/commit/rollback/exit |
| Add CRUD mutation logs in repositories | L | Create/Update/Delete summary logs |
| Add asset deletion no-op log | S | Log when directory missing |
| Add export timing logs | M | PDF/Markdown/HTML export duration + size |
| Promote diarization session close to INFO | S | `session.py` |
| Log diarization job task creation | S | `grpc/_mixins/diarization/_jobs.py` |
| Add audio flush thread lifecycle logs | S | `infrastructure/audio/writer.py` |

**Total Effort**: L (4-8 hours)

---

## Deliverables

### Backend

**Infrastructure Layer**:
- [ ] `src/noteflow/infrastructure/persistence/repositories/_base.py` — timing logs for DB operations
- [ ] `src/noteflow/infrastructure/persistence/unit_of_work.py` — session/commit/rollback logs
- [ ] `src/noteflow/infrastructure/persistence/repositories/*_repo.py` — mutation logging
- [ ] `src/noteflow/infrastructure/persistence/repositories/asset_repo.py` — no-op delete log
- [ ] `src/noteflow/infrastructure/export/pdf.py` — duration + byte-size log
- [ ] `src/noteflow/infrastructure/export/markdown.py` — export count log
- [ ] `src/noteflow/infrastructure/export/html.py` — export count log
- [ ] `src/noteflow/infrastructure/diarization/session.py` — info-level close log
- [ ] `src/noteflow/grpc/_mixins/diarization/_jobs.py` — background task creation log
- [ ] `src/noteflow/infrastructure/audio/writer.py` — flush thread lifecycle logs

---

## Test Strategy

### Core test cases

- **Repositories**: `caplog` validates mutation logging for create/update/delete
- **UnitOfWork**: log emitted on commit/rollback paths
- **Exports**: ensure logs include duration and output size (bytes/segments)
- **Lifecycle**: diarization session close emits info log

---

## Quality Gates

- [ ] Logging includes structured fields and avoids payload content
- [ ] No new `# type: ignore` or `Any` introduced
- [ ] `pytest` passes for touched modules
- [ ] `ruff check` + `mypy` pass

---

## Post-Sprint

- [ ] Assess performance impact of repo timing logs
- [ ] Consider opt-in logging for high-volume read paths
@@ -95,9 +95,7 @@ class UsageEvent:
         event_type: str,
         metrics: UsageMetrics,
         *,
-        meeting_id: str | None = None,
-        success: bool = True,
-        error_code: str | None = None,
+        context: UsageEventContext | None = None,
         attributes: dict[str, object] | None = None,
     ) -> UsageEvent:
         """Create usage event from metrics object.
@@ -105,28 +103,36 @@
         Args:
             event_type: Event type identifier.
             metrics: Provider/model metrics.
-            meeting_id: Associated meeting ID.
-            success: Whether the operation succeeded.
-            error_code: Error code if failed.
+            context: Context fields for the event.
            attributes: Additional context attributes.

         Returns:
             New UsageEvent instance.
         """
+        resolved_context = context or UsageEventContext()
         return cls(
             event_type=event_type,
-            meeting_id=meeting_id,
+            meeting_id=resolved_context.meeting_id,
             provider_name=metrics.provider_name,
             model_name=metrics.model_name,
             tokens_input=metrics.tokens_input,
             tokens_output=metrics.tokens_output,
             latency_ms=metrics.latency_ms,
-            success=success,
-            error_code=error_code,
+            success=resolved_context.success,
+            error_code=resolved_context.error_code,
             attributes=attributes or {},
         )


+@dataclass(frozen=True, slots=True)
+class UsageEventContext:
+    """Common context fields for usage events."""
+
+    meeting_id: str | None = None
+    success: bool = True
+    error_code: str | None = None
+
+
 class UsageEventSink(Protocol):
     """Protocol for usage event emission.

@@ -147,9 +153,7 @@ class UsageEventSink(Protocol):
         event_type: str,
         metrics: UsageMetrics | None = None,
         *,
-        meeting_id: str | None = None,
-        success: bool = True,
-        error_code: str | None = None,
+        context: UsageEventContext | None = None,
         **attributes: object,
     ) -> None:
         """Convenience method to record a usage event with common fields.
@@ -157,9 +161,7 @@ class UsageEventSink(Protocol):
         Args:
             event_type: Event type identifier.
             metrics: Optional provider/model metrics.
-            meeting_id: Associated meeting ID.
-            success: Whether the operation succeeded.
-            error_code: Error code if failed.
+            context: Context fields for the event.
             **attributes: Additional context attributes.
         """
         ...
@@ -176,9 +178,7 @@ class NullUsageEventSink:
         event_type: str,
         metrics: UsageMetrics | None = None,
         *,
-        meeting_id: str | None = None,
-        success: bool = True,
-        error_code: str | None = None,
+        context: UsageEventContext | None = None,
        **attributes: object,
     ) -> None:
         """Discard the event."""
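A hedged before/after for call sites of the refactor above. The factory's name and the module path are assumptions (both are elided in this hunk); the field names come from the diff itself.

```python
# Hypothetical import path and factory name — neither appears in this diff.
from noteflow.application.usage import UsageEvent, UsageEventContext, UsageMetrics

metrics = UsageMetrics(
    provider_name="ollama",  # field names taken from the hunk above
    model_name="llama3",
    tokens_input=512,
    tokens_output=128,
    latency_ms=1840.0,
)

# Before: loose meeting_id/success/error_code kwargs on every call.
# After: one frozen context object carries the common fields.
event = UsageEvent.from_metrics(  # assumed classmethod name
    "summary.generated",
    metrics,
    context=UsageEventContext(
        meeting_id="mtg_123",
        success=False,
        error_code="LLM_TIMEOUT",
    ),
)
```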
src/noteflow/application/services/_meeting_types.py (new file, 42 lines)
@@ -0,0 +1,42 @@
|
||||
"""Meeting service shared types."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from noteflow.domain.entities import WordTiming
|
||||
|
||||
|
||||
@dataclass(frozen=True, slots=True)
|
||||
class SegmentData:
|
||||
"""Data for creating a transcript segment.
|
||||
|
||||
Groups segment parameters to reduce parameter count in service methods.
|
||||
"""
|
||||
|
||||
segment_id: int
|
||||
"""Segment sequence number."""
|
||||
|
||||
text: str
|
||||
"""Transcript text."""
|
||||
|
||||
start_time: float
|
||||
"""Start time in seconds."""
|
||||
|
||||
end_time: float
|
||||
"""End time in seconds."""
|
||||
|
||||
words: list[WordTiming] = field(default_factory=list)
|
||||
"""Optional word-level timing."""
|
||||
|
||||
language: str = "en"
|
||||
"""Detected language code."""
|
||||
|
||||
language_confidence: float = 0.0
|
||||
"""Language detection confidence."""
|
||||
|
||||
avg_logprob: float = 0.0
|
||||
"""Average log probability."""
|
||||
|
||||
no_speech_prob: float = 0.0
|
||||
"""No-speech probability."""
|
||||
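A usage sketch for the new value object, assuming the module path from the header above; the field values are invented for illustration:

```python
from noteflow.application.services._meeting_types import SegmentData

# One frozen parameter object replaces nine positional/keyword arguments.
segment = SegmentData(
    segment_id=7,
    text="Let's circle back on the Q3 budget.",
    start_time=12.4,
    end_time=15.9,
    words=[],  # WordTiming entries, when word-level timing is available
    language="en",
    language_confidence=0.93,
)
print(segment.end_time - segment.start_time)  # segment duration in seconds
```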
@@ -6,7 +6,8 @@ Uses existing Integration entity and IntegrationRepository for persistence.
 
 from __future__ import annotations
 
-from typing import TYPE_CHECKING
+from datetime import datetime
+from typing import TYPE_CHECKING, TypedDict, Unpack
 from uuid import UUID
 
 from noteflow.config.constants import ERR_TOKEN_REFRESH_PREFIX, OAUTH_FIELD_ACCESS_TOKEN
@@ -23,6 +24,14 @@ from noteflow.infrastructure.calendar.oauth_manager import OAuthError
 from noteflow.infrastructure.calendar.outlook_adapter import OutlookCalendarError
 from noteflow.infrastructure.logging import get_logger
 
+
+class _CalendarServiceDepsKwargs(TypedDict, total=False):
+    """Optional dependency overrides for CalendarService."""
+
+    oauth_manager: OAuthManager
+    google_adapter: GoogleCalendarAdapter
+    outlook_adapter: OutlookCalendarAdapter
+
+
 if TYPE_CHECKING:
     from collections.abc import Callable
@@ -53,21 +62,20 @@ class CalendarService:
         self,
         uow_factory: Callable[[], UnitOfWork],
         settings: CalendarIntegrationSettings,
-        oauth_manager: OAuthManager | None = None,
-        google_adapter: GoogleCalendarAdapter | None = None,
-        outlook_adapter: OutlookCalendarAdapter | None = None,
+        **kwargs: Unpack[_CalendarServiceDepsKwargs],
     ) -> None:
         """Initialize calendar service.
 
         Args:
             uow_factory: Factory function returning UnitOfWork instances.
             settings: Calendar settings with OAuth credentials.
-            oauth_manager: Optional OAuth manager (created from settings if not provided).
-            google_adapter: Optional Google adapter (created if not provided).
-            outlook_adapter: Optional Outlook adapter (created if not provided).
+            **kwargs: Optional dependency overrides.
         """
         self._uow_factory = uow_factory
         self._settings = settings
+        oauth_manager = kwargs.get("oauth_manager")
+        google_adapter = kwargs.get("google_adapter")
+        outlook_adapter = kwargs.get("outlook_adapter")
         self._oauth_manager = oauth_manager or OAuthManager(settings)
         self._google_adapter = google_adapter or GoogleCalendarAdapter()
         self._outlook_adapter = outlook_adapter or OutlookCalendarAdapter()
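The `TypedDict` + `Unpack` pattern above recurs throughout this commit. A self-contained sketch of the mechanics (Python 3.11+; all names here are hypothetical, not from the codebase):

```python
from typing import TypedDict, Unpack

class _GreeterDepsKwargs(TypedDict, total=False):
    """total=False: every key may be omitted by callers."""
    prefix: str

class Greeter:
    def __init__(self, name: str, **kwargs: Unpack[_GreeterDepsKwargs]) -> None:
        # kwargs.get() reproduces the old `param: str | None = None` defaults
        self._prefix = kwargs.get("prefix") or "Hello"
        self._name = name

    def greet(self) -> str:
        return f"{self._prefix}, {self._name}!"

print(Greeter("Ada", prefix="Hi").greet())  # type checkers validate key names and types
```

The payoff is a shorter positional signature while keeping keyword arguments statically checked, which is presumably why the linting-focused refactor favors it.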
@@ -126,8 +134,22 @@ class CalendarService:
         """
         oauth_provider = self._parse_provider(provider)
 
+        tokens = await self._exchange_tokens(oauth_provider, code, state)
+        email = await self._fetch_provider_email(oauth_provider, tokens.access_token)
+        integration_id = await self._store_calendar_integration(provider, email, tokens)
+
+        logger.info("Completed OAuth for provider=%s, email=%s", provider, email)
+        return integration_id
+
+    async def _exchange_tokens(
+        self,
+        oauth_provider: OAuthProvider,
+        code: str,
+        state: str,
+    ) -> OAuthTokens:
+        """Exchange authorization code for tokens."""
         try:
-            tokens = await self._oauth_manager.complete_auth(
+            return await self._oauth_manager.complete_auth(
                 provider=oauth_provider,
                 code=code,
                 state=state,
@@ -135,13 +157,24 @@ class CalendarService:
         except OAuthError as e:
             raise CalendarServiceError(f"OAuth failed: {e}") from e
 
-        # Get user email from provider
+    async def _fetch_provider_email(
+        self,
+        oauth_provider: OAuthProvider,
+        access_token: str,
+    ) -> str:
+        """Fetch the account email for a provider."""
         try:
-            email = await self._fetch_account_email(oauth_provider, tokens.access_token)
+            return await self._fetch_account_email(oauth_provider, access_token)
         except (GoogleCalendarError, OutlookCalendarError) as e:
             raise CalendarServiceError(f"Failed to get user email: {e}") from e
 
-        # Persist integration and tokens
+    async def _store_calendar_integration(
+        self,
+        provider: str,
+        email: str,
+        tokens: OAuthTokens,
+    ) -> UUID:
+        """Persist calendar integration and encrypted tokens."""
         async with self._uow_factory() as uow:
             integration = await uow.integrations.get_by_provider(
                 provider=provider,
@@ -162,18 +195,13 @@ class CalendarService:
             integration.connect(provider_email=email)
             await uow.integrations.update(integration)
 
-            # Store encrypted tokens
             await uow.integrations.set_secrets(
                 integration_id=integration.id,
                 secrets=tokens.to_secrets_dict(),
             )
             await uow.commit()
 
-            # Capture ID before leaving context manager
-            integration_id = integration.id
-
-        logger.info("Completed OAuth for provider=%s, email=%s", provider, email)
-        return integration_id
+            return integration.id
 
     async def get_connection_status(self, provider: str) -> OAuthConnectionInfo:
         """Get OAuth connection status for a provider.
@@ -198,17 +226,7 @@ class CalendarService:
 
             # Check token expiry
             secrets = await uow.integrations.get_secrets(integration.id)
-            expires_at = None
-            status = self._map_integration_status(integration.status)
-
-            if secrets and integration.is_connected:
-                try:
-                    tokens = OAuthTokens.from_secrets_dict(secrets)
-                    expires_at = tokens.expires_at
-                    if tokens.is_expired():
-                        status = "expired"
-                except (KeyError, ValueError):
-                    status = IntegrationStatus.ERROR.value
+            status, expires_at = self._resolve_connection_status(integration, secrets)
 
             return OAuthConnectionInfo(
                 provider=provider,
@@ -418,3 +436,23 @@ class CalendarService:
     def _map_integration_status(status: IntegrationStatus) -> str:
         """Map IntegrationStatus to connection status string."""
         return status.value if status in IntegrationStatus else IntegrationStatus.DISCONNECTED.value
+
+    @staticmethod
+    def _resolve_connection_status(
+        integration: Integration,
+        secrets: dict[str, str] | None,
+    ) -> tuple[str, datetime | None]:
+        """Resolve connection status and expiration time from stored secrets."""
+        status = CalendarService._map_integration_status(integration.status)
+        if not secrets or not integration.is_connected:
+            return status, None
+
+        try:
+            tokens = OAuthTokens.from_secrets_dict(secrets)
+        except (KeyError, ValueError):
+            return IntegrationStatus.ERROR.value, None
+
+        expires_at = tokens.expires_at
+        if tokens.is_expired():
+            return "expired", expires_at
+        return status, expires_at
@@ -172,44 +172,72 @@ class ExportService:
             format=fmt.value if fmt else "inferred",
         )
 
-        # Determine format from extension if not provided
-        if fmt is None:
-            fmt = self.infer_format_from_extension(output_path.suffix)
-            logger.debug(
-                "Format inferred from extension",
-                extension=output_path.suffix,
-                inferred_format=fmt.value,
-            )
-
+        fmt = self._resolve_export_format(output_path, fmt)
         content = await self.export_transcript(meeting_id, fmt)
 
-        # Ensure correct extension
         exporter = self.get_exporter(fmt)
-        original_path = output_path
-        if output_path.suffix != exporter.file_extension:
-            output_path = output_path.with_suffix(exporter.file_extension)
-            logger.debug(
-                "Adjusted file extension",
-                original_path=str(original_path),
-                adjusted_path=str(output_path),
-                expected_extension=exporter.file_extension,
-            )
+        output_path = self._ensure_output_extension(output_path, exporter)
+        file_size = self._write_export_content(output_path, content, meeting_id, fmt)
+
+        logger.info(
+            "File export completed",
+            meeting_id=str(meeting_id),
+            output_path=str(output_path),
+            format=fmt.value,
+            file_size_bytes=file_size,
+        )
 
         return output_path
 
+    def _resolve_export_format(
+        self,
+        output_path: Path,
+        fmt: ExportFormat | None,
+    ) -> ExportFormat:
+        """Resolve export format, inferring from file extension if needed."""
+        if fmt is not None:
+            return fmt
+
+        inferred = self.infer_format_from_extension(output_path.suffix)
+        logger.debug(
+            "Format inferred from extension",
+            extension=output_path.suffix,
+            inferred_format=inferred.value,
+        )
+        return inferred
+
+    def _ensure_output_extension(
+        self,
+        output_path: Path,
+        exporter: TranscriptExporter,
+    ) -> Path:
+        """Ensure output path uses the exporter's extension."""
+        if output_path.suffix == exporter.file_extension:
+            return output_path
+
+        adjusted_path = output_path.with_suffix(exporter.file_extension)
+        logger.debug(
+            "Adjusted file extension",
+            original_path=str(output_path),
+            adjusted_path=str(adjusted_path),
+            expected_extension=exporter.file_extension,
+        )
+        return adjusted_path
+
+    def _write_export_content(
+        self,
+        output_path: Path,
+        content: str | bytes,
+        meeting_id: MeetingId,
+        fmt: ExportFormat,
+    ) -> int:
+        """Write export content to disk and return file size."""
         output_path.parent.mkdir(parents=True, exist_ok=True)
         try:
             if isinstance(content, bytes):
                 output_path.write_bytes(content)
             else:
                 output_path.write_text(content, encoding="utf-8")
 
-            file_size = output_path.stat().st_size
-            logger.info(
-                "File export completed",
-                meeting_id=str(meeting_id),
-                output_path=str(output_path),
-                format=fmt.value,
-                file_size_bytes=file_size,
-            )
+            return output_path.stat().st_size
         except OSError as exc:
             logger.error(
                 "File write failed",
@@ -219,8 +247,6 @@ class ExportService:
             )
             raise
 
-        return output_path
-
     def infer_format_from_extension(self, extension: str) -> ExportFormat:
         """Infer export format from file extension.
 
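The `_ensure_output_extension` helper above leans on `pathlib.Path.with_suffix` for the rename. A tiny standalone sketch of that normalization step (file names invented):

```python
from pathlib import Path

def ensure_extension(path: Path, expected: str) -> Path:
    """Return path unchanged if it already has the exporter's suffix."""
    return path if path.suffix == expected else path.with_suffix(expected)

print(ensure_extension(Path("notes.txt"), ".md"))  # notes.md
print(ensure_extension(Path("notes.md"), ".md"))   # notes.md (unchanged)
```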
@@ -6,22 +6,14 @@ Orchestrates meeting-related use cases with persistence.
 from __future__ import annotations
 
 from collections.abc import Sequence
-from dataclasses import dataclass, field
 from datetime import UTC, datetime
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, NotRequired, Required, TypedDict, Unpack
 
-from noteflow.domain.entities import (
-    ActionItem,
-    Annotation,
-    KeyPoint,
-    Meeting,
-    Segment,
-    Summary,
-    WordTiming,
-)
+from noteflow.domain.entities import ActionItem, Annotation, KeyPoint, Meeting, Segment, Summary
 from noteflow.domain.value_objects import AnnotationId, AnnotationType, MeetingId
 from noteflow.infrastructure.logging import get_logger, log_state_transition
 
+from ._meeting_types import SegmentData
 if TYPE_CHECKING:
     from collections.abc import Sequence as SequenceType
 
@@ -31,39 +23,24 @@ if TYPE_CHECKING:
 logger = get_logger(__name__)
 
 
-@dataclass(frozen=True, slots=True)
-class SegmentData:
-    """Data for creating a transcript segment.
-
-    Groups segment parameters to reduce parameter count in service methods.
-    """
-
-    segment_id: int
-    """Segment sequence number."""
-
-    text: str
-    """Transcript text."""
-
-    start_time: float
-    """Start time in seconds."""
-
-    end_time: float
-    """End time in seconds."""
-
-    words: list[WordTiming] = field(default_factory=list)
-    """Optional word-level timing."""
-
-    language: str = "en"
-    """Detected language code."""
-
-    language_confidence: float = 0.0
-    """Language detection confidence."""
-
-    avg_logprob: float = 0.0
-    """Average log probability."""
-
-    no_speech_prob: float = 0.0
-    """No-speech probability."""
+class _SummarySaveKwargs(TypedDict, total=False):
+    """Optional summary fields for save_summary."""
+
+    key_points: list[KeyPoint] | None
+    action_items: list[ActionItem] | None
+    provider_name: str
+    model_name: str
+
+
+class _AnnotationCreateKwargs(TypedDict):
+    """Required fields for creating an annotation."""
+
+    meeting_id: Required[MeetingId]
+    annotation_type: Required[AnnotationType]
+    text: Required[str]
+    start_time: Required[float]
+    end_time: Required[float]
+    segment_ids: NotRequired[list[int] | None]
 
 
 class MeetingService:
@@ -338,29 +315,27 @@ class MeetingService:
         self,
         meeting_id: MeetingId,
         executive_summary: str,
-        key_points: list[KeyPoint] | None = None,
-        action_items: list[ActionItem] | None = None,
-        provider_name: str = "",
-        model_name: str = "",
+        **kwargs: Unpack[_SummarySaveKwargs],
     ) -> Summary:
         """Save or update a meeting summary.
 
         Args:
             meeting_id: Meeting identifier.
             executive_summary: Executive summary text.
-            key_points: List of key points.
-            action_items: List of action items.
-            provider_name: Name of the provider that generated the summary.
-            model_name: Name of the model that generated the summary.
+            **kwargs: Optional summary fields (key_points, action_items, provider_name, model_name).
 
         Returns:
             Saved summary.
         """
+        key_points = kwargs.get("key_points") or []
+        action_items = kwargs.get("action_items") or []
+        provider_name = kwargs.get("provider_name", "")
+        model_name = kwargs.get("model_name", "")
         summary = Summary(
             meeting_id=meeting_id,
             executive_summary=executive_summary,
-            key_points=key_points or [],
-            action_items=action_items or [],
+            key_points=key_points,
+            action_items=action_items,
             generated_at=datetime.now(UTC),
             provider_name=provider_name,
             model_name=model_name,
@@ -386,28 +361,24 @@ class MeetingService:
 
     async def add_annotation(
         self,
-        meeting_id: MeetingId,
-        annotation_type: AnnotationType,
-        text: str,
-        start_time: float,
-        end_time: float,
-        segment_ids: list[int] | None = None,
+        **kwargs: Unpack[_AnnotationCreateKwargs],
    ) -> Annotation:
         """Add an annotation to a meeting.
 
         Args:
-            meeting_id: Meeting identifier.
-            annotation_type: Type of annotation.
-            text: Annotation text.
-            start_time: Start time in seconds.
-            end_time: End time in seconds.
-            segment_ids: Optional list of linked segment IDs.
+            **kwargs: Annotation fields.
 
         Returns:
             Added annotation.
         """
         from uuid import uuid4
 
+        meeting_id = kwargs["meeting_id"]
+        annotation_type = kwargs["annotation_type"]
+        text = kwargs["text"]
+        start_time = kwargs["start_time"]
+        end_time = kwargs["end_time"]
+        segment_ids = kwargs.get("segment_ids") or []
         annotation = Annotation(
             id=AnnotationId(uuid4()),
             meeting_id=meeting_id,
@@ -415,7 +386,7 @@ class MeetingService:
             text=text,
             start_time=start_time,
             end_time=end_time,
-            segment_ids=segment_ids or [],
+            segment_ids=segment_ids,
         )
 
         async with self._uow:
 
@@ -13,7 +13,7 @@ from typing import TYPE_CHECKING
 from noteflow.config.constants import ERROR_MSG_MEETING_PREFIX
 from noteflow.config.settings import get_feature_flags
 from noteflow.domain.entities.named_entity import NamedEntity
-from noteflow.infrastructure.logging import get_logger
+from noteflow.infrastructure.logging import get_logger, log_timing
 
 if TYPE_CHECKING:
     from collections.abc import Callable, Sequence
@@ -176,24 +176,41 @@ class NerService:
             List of extracted entities.
         """
         async with self._extraction_lock:
-            # Ensure model is loaded (thread-safe)
-            if not self._ner_engine.is_ready():
-                async with self._model_load_lock:
-                    if not self._ner_engine.is_ready():
-                        # Warm up model with a simple extraction
-                        loop = asyncio.get_running_loop()
-                        await loop.run_in_executor(
-                            None,
-                            lambda: self._ner_engine.extract("warm up"),
-                        )
+            await self._ensure_model_ready()
+            return await self._extract_entities(segments)
 
-            # Extract entities in executor (CPU-bound)
-            loop = asyncio.get_running_loop()
-            return await loop.run_in_executor(
+    async def _ensure_model_ready(self) -> None:
+        """Ensure the NER model is loaded and warmed up safely."""
+        if self._ner_engine.is_ready():
+            return
+        async with self._model_load_lock:
+            if self._ner_engine.is_ready():
+                return
+            await self._warmup_model()
+
+    async def _warmup_model(self) -> None:
+        """Warm up the NER model with a simple extraction."""
+        loop = asyncio.get_running_loop()
+        with log_timing("ner_warmup"):
+            await loop.run_in_executor(
+                None,
+                lambda: self._ner_engine.extract("warm up"),
+            )
+
+    async def _extract_entities(
+        self,
+        segments: list[tuple[int, str]],
+    ) -> list[NamedEntity]:
+        """Extract entities in an executor (CPU-bound)."""
+        loop = asyncio.get_running_loop()
+        segment_count = len(segments)
+        with log_timing("ner_extraction", segment_count=segment_count):
+            entities = await loop.run_in_executor(
                 None,
                 self._ner_engine.extract_from_segments,
                 segments,
             )
+        return entities
 
     async def get_entities(self, meeting_id: MeetingId) -> Sequence[NamedEntity]:
         """Get cached entities for a meeting (no extraction).
 
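`_ensure_model_ready` above is a double-checked locking idiom: a lock-free fast path, then a re-check under the lock so concurrent first callers warm the model only once. A self-contained sketch of the pattern (all names here are illustrative, not the NerService API):

```python
import asyncio

class LazyEngine:
    """Double-checked async initialization, as in _ensure_model_ready."""

    def __init__(self) -> None:
        self.warmups = 0
        self._ready = False
        self._lock = asyncio.Lock()

    async def ensure_ready(self) -> None:
        if self._ready:            # fast path: no lock once warm
            return
        async with self._lock:     # serialize concurrent first calls
            if self._ready:        # re-check: a peer may have warmed it already
                return
            await asyncio.sleep(0) # stands in for run_in_executor(warm-up)
            self.warmups += 1
            self._ready = True

async def main() -> None:
    engine = LazyEngine()
    await asyncio.gather(*(engine.ensure_ready() for _ in range(8)))
    print(engine.warmups)  # 1: the warm-up ran exactly once

asyncio.run(main())
```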
@@ -2,7 +2,7 @@
 
 from __future__ import annotations
 
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, TypedDict, Unpack
 from uuid import UUID, uuid4
 
 from noteflow.domain.entities.project import Project, ProjectSettings, slugify
@@ -13,6 +13,31 @@ from ._types import ProjectCrudRepositoryProvider
 if TYPE_CHECKING:
     from collections.abc import Sequence
 
+
+class _ProjectCreateKwargs(TypedDict, total=False):
+    """Optional fields for project creation."""
+
+    slug: str | None
+    description: str | None
+    settings: ProjectSettings | None
+
+
+class _ProjectListKwargs(TypedDict, total=False):
+    """Optional fields for listing projects."""
+
+    include_archived: bool
+    limit: int
+    offset: int
+
+
+class _ProjectUpdateKwargs(TypedDict, total=False):
+    """Optional fields for project updates."""
+
+    name: str | None
+    slug: str | None
+    description: str | None
+    settings: ProjectSettings | None
+
 logger = get_logger(__name__)
 
 
@@ -24,9 +49,7 @@ class ProjectCrudMixin:
         uow: ProjectCrudRepositoryProvider,
         workspace_id: UUID,
         name: str,
-        slug: str | None = None,
-        description: str | None = None,
-        settings: ProjectSettings | None = None,
+        **kwargs: Unpack[_ProjectCreateKwargs],
     ) -> Project:
         """Create a new project in a workspace.
 
@@ -34,9 +57,7 @@ class ProjectCrudMixin:
             uow: Unit of work for database access.
             workspace_id: Parent workspace UUID.
             name: Project name.
-            slug: Optional URL slug (auto-generated from name if not provided).
-            description: Optional project description.
-            settings: Optional project settings.
+            **kwargs: Optional fields (slug, description, settings).
 
         Returns:
             Created project.
@@ -49,6 +70,9 @@ class ProjectCrudMixin:
             raise NotImplementedError(msg)
 
         project_id = uuid4()
+        slug = kwargs.get("slug")
+        description = kwargs.get("description")
+        settings = kwargs.get("settings")
         generated_slug = slug or slugify(name)
 
         project = await uow.projects.create(
@@ -124,18 +148,14 @@ class ProjectCrudMixin:
         self,
         uow: ProjectCrudRepositoryProvider,
         workspace_id: UUID,
-        include_archived: bool = False,
-        limit: int = 50,
-        offset: int = 0,
+        **kwargs: Unpack[_ProjectListKwargs],
     ) -> Sequence[Project]:
         """List projects in a workspace.
 
         Args:
             uow: Unit of work for database access.
             workspace_id: Workspace UUID.
-            include_archived: Whether to include archived projects.
-            limit: Maximum projects to return.
-            offset: Pagination offset.
+            **kwargs: Optional filters (include_archived, limit, offset).
 
         Returns:
             List of projects.
@@ -143,6 +163,9 @@ class ProjectCrudMixin:
         if not uow.supports_projects:
             return []
 
+        include_archived = kwargs.get("include_archived", False)
+        limit = kwargs.get("limit", 50)
+        offset = kwargs.get("offset", 0)
         return await uow.projects.list_for_workspace(
             workspace_id,
             include_archived=include_archived,
@@ -154,20 +177,14 @@ class ProjectCrudMixin:
         self,
         uow: ProjectCrudRepositoryProvider,
         project_id: UUID,
-        name: str | None = None,
-        slug: str | None = None,
-        description: str | None = None,
-        settings: ProjectSettings | None = None,
+        **kwargs: Unpack[_ProjectUpdateKwargs],
     ) -> Project | None:
         """Update a project.
 
         Args:
             uow: Unit of work for database access.
             project_id: Project UUID.
-            name: New name (optional).
-            slug: New slug (optional).
-            description: New description (optional).
-            settings: New settings (optional).
+            **kwargs: Optional updates (name, slug, description, settings).
 
         Returns:
             Updated project if found, None otherwise.
@@ -179,6 +196,11 @@ class ProjectCrudMixin:
         if not project:
             return None
 
+        name = kwargs.get("name")
+        slug = kwargs.get("slug")
+        description = kwargs.get("description")
+        settings = kwargs.get("settings")
+
         if name is not None:
             project.update_name(name)
         if slug is not None:
 
@@ -48,6 +48,14 @@ class RecoveryResult:
         return self.meetings_recovered + self.diarization_jobs_failed
 
 
+@dataclass(frozen=True)
+class _RecoveryValidation:
+    """Result of meeting recovery checks."""
+
+    is_valid: bool
+    previous_state: MeetingState
+
+
 class RecoveryService:
     """Recover meetings from crash states on server startup.
 
@@ -170,40 +178,15 @@ class RecoveryService:
         recovery_time = datetime.now(UTC).isoformat()
 
         for meeting in meetings:
-            previous_state = meeting.state
-            meeting.mark_error()
-            log_state_transition(
-                "meeting",
-                str(meeting.id),
-                previous_state,
-                meeting.state,
-                reason="crash_recovery",
-            )
-
-            # Add crash recovery metadata
-            meeting.metadata["crash_recovered"] = "true"
-            meeting.metadata["crash_recovery_time"] = recovery_time
-            meeting.metadata["crash_previous_state"] = previous_state.name
-
-            # Validate audio files if configured
-            validation = self.validate_meeting_audio(meeting)
-            meeting.metadata["audio_valid"] = str(validation.is_valid).lower()
+            validation = self._recover_meeting(meeting, recovery_time)
             if not validation.is_valid:
                 audio_failures += 1
-                meeting.metadata["audio_error"] = validation.error_message or "unknown"
-                logger.warning(
-                    "Audio validation failed for meeting %s: %s",
-                    meeting.id,
-                    validation.error_message,
-                )
 
             await self._uow.meetings.update(meeting)
             recovered.append(meeting)
 
             logger.info(
                 "Recovered crashed meeting: id=%s, previous_state=%s, audio_valid=%s",
                 meeting.id,
-                previous_state,
+                validation.previous_state,
                 validation.is_valid,
             )
 
@@ -215,6 +198,40 @@ class RecoveryService:
         )
         return recovered, audio_failures
 
+    def _recover_meeting(
+        self, meeting: Meeting, recovery_time: str
+    ) -> _RecoveryValidation:
+        """Apply crash recovery updates to a single meeting."""
+        previous_state = meeting.state
+        meeting.mark_error()
+        log_state_transition(
+            "meeting",
+            str(meeting.id),
+            previous_state,
+            meeting.state,
+            reason="crash_recovery",
+        )
+
+        meeting.metadata["crash_recovered"] = "true"
+        meeting.metadata["crash_recovery_time"] = recovery_time
+        meeting.metadata["crash_previous_state"] = previous_state.name
+
+        validation = self.validate_meeting_audio(meeting)
+        meeting.metadata["audio_valid"] = str(validation.is_valid).lower()
+        if not validation.is_valid:
+            meeting.metadata["audio_error"] = validation.error_message or "unknown"
+            logger.warning(
+                "Audio validation failed for meeting %s: %s",
+                meeting.id,
+                validation.error_message,
+            )
+
+        return _RecoveryValidation(
+            is_valid=validation.is_valid,
+            previous_state=previous_state,
+        )
+
 
     async def count_crashed_meetings(self) -> int:
         """Count meetings currently in crash states.
 
@@ -7,7 +7,7 @@ from __future__ import annotations
 
 from dataclasses import dataclass, field
 from enum import Enum
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, TypedDict, Unpack
 
 from noteflow.application.observability.ports import (
     NullUsageEventSink,
@@ -40,6 +40,15 @@ if TYPE_CHECKING:
 logger = get_logger(__name__)
 
 
+class _SummarizationOptionsKwargs(TypedDict, total=False):
+    """Optional overrides for summarization behavior."""
+
+    mode: SummarizationMode | None
+    max_key_points: int | None
+    max_action_items: int | None
+    style_prompt: str | None
+
+
 class SummarizationMode(Enum):
     """Available summarization modes."""
 
@@ -182,20 +191,14 @@ class SummarizationService:
         self,
         meeting_id: MeetingId,
         segments: Sequence[Segment],
-        mode: SummarizationMode | None = None,
-        max_key_points: int | None = None,
-        max_action_items: int | None = None,
-        style_prompt: str | None = None,
+        **kwargs: Unpack[_SummarizationOptionsKwargs],
     ) -> SummarizationServiceResult:
         """Generate evidence-linked summary for meeting transcript.
 
         Args:
             meeting_id: The meeting ID.
             segments: Transcript segments to summarize.
-            mode: Override default mode (None uses settings default).
-            max_key_points: Override default max key points.
-            max_action_items: Override default max action items.
-            style_prompt: Optional style instruction to prepend to system prompt.
+            **kwargs: Optional overrides (mode, max_key_points, max_action_items, style_prompt).
 
         Returns:
             SummarizationServiceResult with summary and verification.
@@ -204,6 +207,11 @@ class SummarizationService:
             SummarizationError: If summarization fails and no fallback available.
             ProviderUnavailableError: If no provider is available for the mode.
         """
+        mode = kwargs.get("mode")
+        max_key_points = kwargs.get("max_key_points")
+        max_action_items = kwargs.get("max_action_items")
+        style_prompt = kwargs.get("style_prompt")
+
         target_mode = mode or self.settings.default_mode
         provider, actual_mode = self._get_provider_with_fallback(target_mode)
         fallback_used = actual_mode != target_mode
@@ -203,29 +203,7 @@ class WebhookService:
             try:
                 delivery = await self._executor.deliver(config, event_type, payload)
                 deliveries.append(delivery)
-
-                if delivery.succeeded:
-                    _logger.info(
-                        "Webhook delivered: %s -> %s (status=%d)",
-                        event_type.value,
-                        config.url,
-                        delivery.status_code,
-                    )
-                elif delivery.attempt_count > 0:
-                    _logger.warning(
-                        "Webhook failed: %s -> %s (error=%s)",
-                        event_type.value,
-                        config.url,
-                        delivery.error_message,
-                    )
-                else:
-                    _logger.debug(
-                        "Webhook skipped: %s -> %s (reason=%s)",
-                        event_type.value,
-                        config.url,
-                        delivery.error_message,
-                    )
-
+                self._log_delivery(event_type, config.url, delivery)
             # INTENTIONAL BROAD HANDLER: Fire-and-forget webhook delivery
             # - Webhook failures must never block calling code
             # - All exceptions logged but suppressed
@@ -234,6 +212,37 @@ class WebhookService:
 
         return deliveries
 
+    @staticmethod
+    def _log_delivery(
+        event_type: WebhookEventType,
+        url: str,
+        delivery: WebhookDelivery,
+    ) -> None:
+        if delivery.succeeded:
+            _logger.info(
+                "Webhook delivered: %s -> %s (status=%d)",
+                event_type.value,
+                url,
+                delivery.status_code,
+            )
+            return
+
+        if delivery.attempt_count > 0:
+            _logger.warning(
+                "Webhook failed: %s -> %s (error=%s)",
+                event_type.value,
+                url,
+                delivery.error_message,
+            )
+            return
+
+        _logger.debug(
+            "Webhook skipped: %s -> %s (reason=%s)",
+            event_type.value,
+            url,
+            delivery.error_message,
+        )
+
     async def close(self) -> None:
         """Clean up resources."""
         await self._executor.close()
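The "INTENTIONAL BROAD HANDLER" comment above documents a fire-and-forget policy: webhook failures are logged but never propagate to the caller. A stripped-down sketch of that policy, with all names invented for illustration:

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("webhooks")

async def deliver(url: str) -> None:
    raise ConnectionError("target unreachable")  # simulated delivery failure

async def emit_all(urls: list[str]) -> None:
    for url in urls:
        try:
            await deliver(url)
        except Exception:  # intentional broad handler: log, never propagate
            logger.exception("Webhook delivery failed: %s", url)

asyncio.run(emit_all(["https://example.invalid/hook"]))  # completes despite the error
```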
@@ -10,7 +10,7 @@ from __future__ import annotations
 from dataclasses import dataclass, field
 from datetime import datetime
 from enum import StrEnum
-from typing import Self, cast
+from typing import NotRequired, Required, Self, TypedDict, Unpack, cast
 from uuid import UUID, uuid4
 
 from noteflow.domain.utils.time import utc_now
@@ -192,6 +192,28 @@ class OidcProviderCreateParams:
     """Whether to require email verification."""
 
 
+@dataclass(frozen=True, slots=True)
+class OidcProviderRegistration:
+    """Required fields for registering an OIDC provider."""
+
+    workspace_id: UUID
+    name: str
+    issuer_url: str
+    client_id: str
+    client_secret: str | None = None
+    preset: OidcProviderPreset = OidcProviderPreset.CUSTOM
+
+
+class _OidcProviderCreateKwargs(TypedDict):
+    """Keyword arguments for OidcProviderConfig.create."""
+
+    workspace_id: Required[UUID]
+    name: Required[str]
+    issuer_url: Required[str]
+    client_id: Required[str]
+    params: NotRequired[OidcProviderCreateParams | None]
+
+
 @dataclass
 class OidcProviderConfig:
     """OIDC provider configuration.
@@ -231,24 +253,21 @@ class OidcProviderConfig:
     @classmethod
     def create(
         cls,
-        workspace_id: UUID,
-        name: str,
-        issuer_url: str,
-        client_id: str,
-        params: OidcProviderCreateParams | None = None,
+        **kwargs: Unpack[_OidcProviderCreateKwargs],
     ) -> OidcProviderConfig:
         """Create a new OIDC provider configuration.
 
         Args:
-            workspace_id: Workspace this provider belongs to.
-            name: Display name for the provider.
-            issuer_url: OIDC issuer URL (base URL for discovery).
-            client_id: OAuth client ID.
-            params: Optional creation parameters (preset, scopes, etc.).
+            **kwargs: Provider creation fields.
 
         Returns:
             New OidcProviderConfig instance.
         """
+        workspace_id = kwargs["workspace_id"]
+        name = kwargs["name"]
+        issuer_url = kwargs["issuer_url"]
+        client_id = kwargs["client_id"]
+        params = kwargs.get("params")
         p = params or OidcProviderCreateParams()
         now = utc_now()
         return cls(
@@ -4,7 +4,7 @@ from __future__ import annotations
 
 from dataclasses import dataclass, field
 from enum import Enum
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, NotRequired, Required, TypedDict, Unpack
 from uuid import UUID, uuid4
 
 if TYPE_CHECKING:
@@ -53,6 +53,16 @@ class EntityCategory(Enum):
             raise ValueError(f"Invalid entity category: {value}") from e
 
 
+class _NamedEntityCreateKwargs(TypedDict):
+    """Keyword arguments for NamedEntity.create."""
+
+    text: Required[str]
+    category: Required[EntityCategory]
+    segment_ids: Required[list[int]]
+    confidence: Required[float]
+    meeting_id: NotRequired[MeetingId | None]
+
+
 @dataclass
 class NamedEntity:
     """A named entity extracted from a meeting transcript.
@@ -85,11 +95,7 @@ class NamedEntity:
     @classmethod
     def create(
         cls,
-        text: str,
-        category: EntityCategory,
-        segment_ids: list[int],
-        confidence: float,
-        meeting_id: MeetingId | None = None,
+        **kwargs: Unpack[_NamedEntityCreateKwargs],
     ) -> NamedEntity:
         """Create a new named entity with validation and normalization.
 
@@ -97,11 +103,7 @@ class NamedEntity:
         and confidence validation before entity construction.
 
         Args:
-            text: The entity text as it appears in transcript.
-            category: Classification category.
-            segment_ids: Segments where entity appears (will be deduplicated and sorted).
-            confidence: Extraction confidence (0.0-1.0).
-            meeting_id: Optional meeting association.
+            **kwargs: Named entity fields.
 
         Returns:
             New NamedEntity instance with normalized fields.
@@ -110,6 +112,12 @@ class NamedEntity:
             ValueError: If text is empty or confidence is out of range.
         """
         # Validate required text
+        text = kwargs["text"]
+        category = kwargs["category"]
+        segment_ids = kwargs["segment_ids"]
+        confidence = kwargs["confidence"]
+        meeting_id = kwargs.get("meeting_id")
+
         stripped_text = text.strip()
         if not stripped_text:
             raise ValueError("Entity text cannot be empty")
 
@@ -7,7 +7,7 @@ from __future__ import annotations
 
 from collections.abc import Sequence
 from datetime import datetime
-from typing import TYPE_CHECKING, Protocol
+from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
 
 from noteflow.config.constants import ERR_SERVER_RESTARTED
 
@@ -19,6 +19,15 @@ if TYPE_CHECKING:
     )
 
 
+class DiarizationStatusKwargs(TypedDict, total=False):
+    """Optional fields for diarization job status updates."""
+
+    segments_updated: int | None
+    speaker_ids: list[str] | None
+    error_message: str | None
+    started_at: datetime | None
+
+
 class DiarizationJobRepository(Protocol):
     """Repository protocol for DiarizationJob operations.
 
@@ -52,21 +61,14 @@ class DiarizationJobRepository(Protocol):
         self,
         job_id: str,
         status: int,
-        *,
-        segments_updated: int | None = None,
-        speaker_ids: list[str] | None = None,
-        error_message: str | None = None,
-        started_at: datetime | None = None,
+        **kwargs: Unpack[DiarizationStatusKwargs],
     ) -> bool:
         """Update job status and optional fields.
 
         Args:
             job_id: Job identifier.
             status: New status value.
-            segments_updated: Optional segments count.
-            speaker_ids: Optional speaker IDs list.
-            error_message: Optional error message.
-            started_at: Optional job start timestamp.
+            **kwargs: Optional update fields.
 
         Returns:
             True if job was updated, False if not found.
 
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Sequence
|
||||
from typing import TYPE_CHECKING, Protocol
|
||||
from dataclasses import dataclass
|
||||
from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from uuid import UUID
|
||||
@@ -11,6 +12,25 @@ if TYPE_CHECKING:
|
||||
from noteflow.domain.entities.project import Project, ProjectSettings
|
||||
|
||||
|
||||
@dataclass(frozen=True, slots=True)
|
||||
class ProjectCreateOptions:
|
||||
"""Optional parameters for project creation."""
|
||||
|
||||
slug: str | None = None
|
||||
description: str | None = None
|
||||
is_default: bool = False
|
||||
settings: ProjectSettings | None = None
|
||||
|
||||
|
||||
class ProjectCreateKwargs(TypedDict, total=False):
|
||||
"""Legacy keyword arguments for project creation."""
|
||||
|
||||
slug: str | None
|
||||
description: str | None
|
||||
is_default: bool
|
||||
settings: ProjectSettings | None
|
||||
|
||||
|
||||
class ProjectRepository(Protocol):
|
||||
"""Repository protocol for Project operations."""
|
||||
|
||||
@@ -60,10 +80,7 @@ class ProjectRepository(Protocol):
|
||||
project_id: UUID,
|
||||
workspace_id: UUID,
|
||||
name: str,
|
||||
slug: str | None = None,
|
||||
description: str | None = None,
|
||||
is_default: bool = False,
|
||||
settings: ProjectSettings | None = None,
|
||||
**kwargs: Unpack[ProjectCreateKwargs],
|
||||
) -> Project:
|
||||
"""Create a new project.
|
||||
|
||||
@@ -71,10 +88,7 @@ class ProjectRepository(Protocol):
|
||||
project_id: UUID for the new project.
|
||||
workspace_id: Parent workspace UUID.
|
||||
name: Project name.
|
||||
slug: Optional URL slug.
|
||||
description: Optional description.
|
||||
is_default: Whether this is the workspace's default project.
|
||||
settings: Optional project settings.
|
||||
**kwargs: Optional creation settings.
|
||||
|
||||
Returns:
|
||||
Created project.
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Sequence
|
||||
from typing import TYPE_CHECKING, Protocol
|
||||
from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from uuid import UUID
|
||||
@@ -16,6 +16,14 @@ if TYPE_CHECKING:
|
||||
)
|
||||
|
||||
|
||||
class WorkspaceCreateKwargs(TypedDict, total=False):
|
||||
"""Optional workspace creation fields."""
|
||||
|
||||
slug: str | None
|
||||
is_default: bool
|
||||
settings: WorkspaceSettings | None
|
||||
|
||||
|
||||
class WorkspaceRepository(Protocol):
|
||||
"""Repository protocol for Workspace operations."""
|
||||
|
||||
@@ -57,9 +65,7 @@ class WorkspaceRepository(Protocol):
|
||||
workspace_id: UUID,
|
||||
name: str,
|
||||
owner_id: UUID,
|
||||
slug: str | None = None,
|
||||
is_default: bool = False,
|
||||
settings: WorkspaceSettings | None = None,
|
||||
**kwargs: Unpack[WorkspaceCreateKwargs],
|
||||
) -> Workspace:
|
||||
"""Create a new workspace.
|
||||
|
||||
@@ -67,9 +73,7 @@ class WorkspaceRepository(Protocol):
|
||||
workspace_id: UUID for the new workspace.
|
||||
name: Workspace name.
|
||||
owner_id: User UUID of the owner.
|
||||
slug: Optional URL slug.
|
||||
is_default: Whether this is the user's default workspace.
|
||||
settings: Optional workspace settings.
|
||||
**kwargs: Optional fields (slug, is_default, settings).
|
||||
|
||||
Returns:
|
||||
Created workspace.
|
||||
|
||||
@@ -7,7 +7,7 @@ from __future__ import annotations
|
||||
|
||||
from collections.abc import Sequence
|
||||
from datetime import datetime
|
||||
from typing import TYPE_CHECKING, Protocol
|
||||
from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from uuid import UUID
|
||||
@@ -16,6 +16,16 @@ if TYPE_CHECKING:
|
||||
from noteflow.domain.value_objects import AnnotationId, MeetingId, MeetingState
|
||||
|
||||
|
||||
class MeetingListKwargs(TypedDict, total=False):
|
||||
"""Optional arguments for listing meetings."""
|
||||
|
||||
states: list[MeetingState] | None
|
||||
limit: int
|
||||
offset: int
|
||||
sort_desc: bool
|
||||
project_id: UUID | None
|
||||
|
||||
|
||||
class MeetingRepository(Protocol):
|
||||
"""Repository protocol for Meeting aggregate operations."""
|
||||
|
||||
@@ -68,20 +78,12 @@ class MeetingRepository(Protocol):
|
||||
|
||||
async def list_all(
|
||||
self,
|
||||
states: list[MeetingState] | None = None,
|
||||
limit: int = 100,
|
||||
offset: int = 0,
|
||||
sort_desc: bool = True,
|
||||
project_id: UUID | None = None,
|
||||
**kwargs: Unpack[MeetingListKwargs],
|
||||
) -> tuple[Sequence[Meeting], int]:
|
||||
"""List meetings with optional filtering.
|
||||
|
||||
Args:
|
||||
states: Optional list of states to filter by.
|
||||
limit: Maximum number of meetings to return.
|
||||
offset: Number of meetings to skip.
|
||||
sort_desc: Sort by created_at descending if True.
|
||||
project_id: Optional project scope filter.
|
||||
**kwargs: Optional filters (states, limit, offset, sort_desc, project_id).
|
||||
|
||||
Returns:
|
||||
Tuple of (meetings list, total count matching filter).
|
||||
|
||||
@@ -9,7 +9,7 @@ from __future__ import annotations
|
||||
from dataclasses import asdict, dataclass, field
|
||||
from datetime import datetime
|
||||
from enum import Enum
|
||||
from typing import TYPE_CHECKING
|
||||
from typing import TYPE_CHECKING, NotRequired, Required, TypedDict, Unpack
|
||||
from uuid import UUID, uuid4
|
||||
|
||||
from noteflow.domain.utils.time import utc_now
|
||||
@@ -88,29 +88,23 @@ class WebhookConfig:
|
||||
@classmethod
|
||||
def create(
|
||||
cls,
|
||||
workspace_id: UUID,
|
||||
url: str,
|
||||
events: list[WebhookEventType],
|
||||
*,
|
||||
name: str = "Webhook",
|
||||
secret: str | None = None,
|
||||
timeout_ms: int = DEFAULT_WEBHOOK_TIMEOUT_MS,
|
||||
max_retries: int = DEFAULT_WEBHOOK_MAX_RETRIES,
|
||||
**kwargs: Unpack["WebhookConfigCreateKwargs"],
|
||||
) -> WebhookConfig:
|
||||
"""Create a new webhook configuration.
|
||||
|
||||
Args:
|
||||
workspace_id: Workspace UUID.
|
||||
url: Target URL for delivery.
|
||||
events: List of event types to subscribe.
|
||||
name: Display name.
|
||||
secret: Optional HMAC signing secret.
|
||||
timeout_ms: Request timeout in milliseconds.
|
||||
max_retries: Maximum retry attempts.
|
||||
**kwargs: Webhook config fields.
|
||||
|
||||
Returns:
|
||||
New WebhookConfig with generated ID and timestamps.
|
||||
"""
|
||||
workspace_id = kwargs["workspace_id"]
|
||||
url = kwargs["url"]
|
||||
events = kwargs["events"]
|
||||
name = kwargs.get("name", "Webhook")
|
||||
secret = kwargs.get("secret")
|
||||
timeout_ms = kwargs.get("timeout_ms", DEFAULT_WEBHOOK_TIMEOUT_MS)
|
||||
max_retries = kwargs.get("max_retries", DEFAULT_WEBHOOK_MAX_RETRIES)
|
||||
now = utc_now()
|
||||
return cls(
|
||||
id=uuid4(),
|
||||
@@ -137,6 +131,28 @@ class WebhookConfig:
|
||||
return event_type in self.events
|
||||
|
||||
|
||||
@dataclass(frozen=True, slots=True)
|
||||
class WebhookConfigCreateOptions:
|
||||
"""Optional parameters for webhook config creation."""
|
||||
|
||||
name: str = "Webhook"
|
||||
secret: str | None = None
|
||||
timeout_ms: int = DEFAULT_WEBHOOK_TIMEOUT_MS
|
||||
max_retries: int = DEFAULT_WEBHOOK_MAX_RETRIES
|
||||
|
||||
|
||||
class WebhookConfigCreateKwargs(TypedDict):
|
||||
"""Keyword arguments for webhook config creation."""
|
||||
|
||||
workspace_id: Required[UUID]
|
||||
url: Required[str]
|
||||
events: Required[list[WebhookEventType]]
|
||||
name: NotRequired[str]
|
||||
secret: NotRequired[str | None]
|
||||
timeout_ms: NotRequired[int]
|
||||
max_retries: NotRequired[int]
|
||||
|
||||
|
||||
@dataclass(frozen=True, slots=True)
|
||||
class DeliveryResult:
|
||||
"""Result of a webhook delivery attempt.
|
||||
|
||||
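Unlike the `total=False` TypedDicts elsewhere in this commit, `WebhookConfigCreateKwargs` mixes `Required` and `NotRequired` keys. A self-contained sketch of that variant (Python 3.11+; the function and field names here are illustrative only):

```python
from typing import NotRequired, Required, TypedDict, Unpack
from uuid import UUID, uuid4

class _CreateKwargs(TypedDict):
    """Required keys must be passed; NotRequired keys may be omitted."""
    workspace_id: Required[UUID]
    url: Required[str]
    name: NotRequired[str]

def create(**kwargs: Unpack[_CreateKwargs]) -> str:
    name = kwargs.get("name", "Webhook")  # runtime default for the optional key
    return f"{name} -> {kwargs['url']} ({kwargs['workspace_id']})"

# Omitting `url` here would be a static type error, not just a runtime KeyError.
print(create(workspace_id=uuid4(), url="https://example.invalid/hook"))
```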
src/noteflow/grpc/_cli.py (new file, 146 lines)
@@ -0,0 +1,146 @@
+"""CLI helpers for the gRPC server entrypoint."""
+
+from __future__ import annotations
+
+import argparse
+from typing import TYPE_CHECKING
+
+from noteflow.config.constants import DEFAULT_GRPC_PORT
+from noteflow.infrastructure.asr.engine import VALID_MODEL_SIZES
+from noteflow.infrastructure.logging import get_logger
+
+from ._config import (
+    DEFAULT_BIND_ADDRESS,
+    DEFAULT_MODEL,
+    AsrConfig,
+    DiarizationConfig,
+    GrpcServerConfig,
+)
+
+logger = get_logger(__name__)
+
+if TYPE_CHECKING:
+    from noteflow.config.settings import Settings
+
+
+def parse_args() -> argparse.Namespace:
+    """Parse command-line arguments for the gRPC server."""
+    parser = argparse.ArgumentParser(description="NoteFlow gRPC Server")
+    parser.add_argument(
+        "-p",
+        "--port",
+        type=int,
+        default=DEFAULT_GRPC_PORT,
+        help=f"Port to listen on (default: {DEFAULT_GRPC_PORT})",
+    )
+    parser.add_argument(
+        "-m",
+        "--model",
+        type=str,
+        default=DEFAULT_MODEL,
+        choices=list(VALID_MODEL_SIZES),
+        help=f"ASR model size (default: {DEFAULT_MODEL})",
+    )
+    parser.add_argument(
+        "-d",
+        "--device",
+        type=str,
+        default="cpu",
+        choices=["cpu", "cuda"],
+        help="ASR device (default: cpu)",
+    )
+    parser.add_argument(
+        "-c",
+        "--compute-type",
+        type=str,
+        default="int8",
+        choices=["int8", "float16", "float32"],
+        help="ASR compute type (default: int8)",
+    )
+    parser.add_argument(
+        "--database-url",
+        type=str,
+        default=None,
+        help="PostgreSQL database URL (overrides NOTEFLOW_DATABASE_URL)",
+    )
+    parser.add_argument(
+        "-v",
+        "--verbose",
+        action="store_true",
+        help="Enable verbose logging",
+    )
+    parser.add_argument(
+        "--diarization",
+        action="store_true",
+        help="Enable speaker diarization (requires pyannote.audio)",
+    )
+    parser.add_argument(
+        "--diarization-hf-token",
+        type=str,
+        default=None,
+        help="HuggingFace token for pyannote models (overrides NOTEFLOW_DIARIZATION_HF_TOKEN)",
+    )
+    parser.add_argument(
+        "--diarization-device",
+        type=str,
+        default="auto",
+        choices=["auto", "cpu", "cuda", "mps"],
+        help="Device for diarization (default: auto)",
+    )
+    return parser.parse_args()
+
+
+def build_config_from_args(args: argparse.Namespace, settings: Settings | None) -> GrpcServerConfig:
+    """Build server configuration from CLI arguments and settings.
+
+    CLI arguments take precedence over environment settings.
+    """
+    database_url = args.database_url
+    if not database_url and settings:
+        database_url = str(settings.database_url)
+    if not database_url:
+        logger.warning("No database URL configured, running in-memory mode")
+
+    diarization_enabled = args.diarization
+    diarization_hf_token = args.diarization_hf_token
+    diarization_device = args.diarization_device
+    diarization_streaming_latency: float | None = None
+    diarization_min_speakers: int | None = None
+    diarization_max_speakers: int | None = None
+    diarization_refinement_enabled = True
+
+    if settings and not diarization_enabled:
+        diarization_enabled = settings.diarization_enabled
+    if settings and not diarization_hf_token:
+        diarization_hf_token = settings.diarization_hf_token
+    if settings and diarization_device == "auto":
+        diarization_device = settings.diarization_device
+    if settings:
+        diarization_streaming_latency = settings.diarization_streaming_latency
+        diarization_min_speakers = settings.diarization_min_speakers
+        diarization_max_speakers = settings.diarization_max_speakers
+        diarization_refinement_enabled = settings.diarization_refinement_enabled
+
+    bind_address = DEFAULT_BIND_ADDRESS
+    if settings:
+        bind_address = settings.grpc_bind_address
+
+    return GrpcServerConfig(
+        port=args.port,
+        bind_address=bind_address,
+        asr=AsrConfig(
+            model=args.model,
+            device=args.device,
+            compute_type=args.compute_type,
+        ),
+        database_url=database_url,
+        diarization=DiarizationConfig(
+            enabled=diarization_enabled,
+            hf_token=diarization_hf_token,
+            device=diarization_device,
+            streaming_latency=diarization_streaming_latency,
+            min_speakers=diarization_min_speakers,
+            max_speakers=diarization_max_speakers,
+            refinement_enabled=diarization_refinement_enabled,
+        ),
+    )
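A hypothetical entrypoint showing how the two helpers above might be wired together; the actual server module is not part of this diff, and `Settings()` loading NOTEFLOW_* environment variables is an assumption:

```python
from noteflow.config.settings import Settings
from noteflow.grpc._cli import build_config_from_args, parse_args

def main() -> None:
    args = parse_args()
    settings = Settings()  # assumed: reads NOTEFLOW_* environment variables
    config = build_config_from_args(args, settings)  # CLI flags win over env
    print(config.port, config.asr.model, config.diarization.enabled)

if __name__ == "__main__":
    main()
```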
@@ -2,19 +2,41 @@
 
 from __future__ import annotations
 
-from typing import TYPE_CHECKING, cast
+from typing import TYPE_CHECKING, NotRequired, Required, TypedDict, Unpack, cast
 
 import grpc
 
 from noteflow.grpc._client_mixins.converters import annotation_type_to_proto, proto_to_annotation_info
 from noteflow.grpc._types import AnnotationInfo
 from noteflow.grpc.proto import noteflow_pb2
-from noteflow.infrastructure.logging import get_logger
+from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
 
 if TYPE_CHECKING:
     from noteflow.grpc._client_mixins.protocols import ClientHost
 
+
+class _AnnotationCreateKwargs(TypedDict):
+    """Keyword arguments for creating an annotation."""
+
+    meeting_id: Required[str]
+    annotation_type: Required[str]
+    text: Required[str]
+    start_time: Required[float]
+    end_time: Required[float]
+    segment_ids: NotRequired[list[int] | None]
+
+
+class _AnnotationUpdateKwargs(TypedDict, total=False):
+    """Keyword arguments for updating an annotation."""
+
+    annotation_type: str | None
+    text: str | None
+    start_time: float | None
+    end_time: float | None
+    segment_ids: list[int] | None
+
+
 logger = get_logger(__name__)
+_rate_limiter = get_client_rate_limiter()
 RpcError = cast(type[Exception], getattr(grpc, "RpcError", Exception))
 
 
@@ -23,30 +45,27 @@ class AnnotationClientMixin:
 
     def add_annotation(
         self: ClientHost,
-        meeting_id: str,
-        annotation_type: str,
-        text: str,
-        start_time: float,
-        end_time: float,
-        segment_ids: list[int] | None = None,
+        **kwargs: Unpack[_AnnotationCreateKwargs],
     ) -> AnnotationInfo | None:
         """Add an annotation to a meeting.
 
         Args:
-            meeting_id: Meeting ID.
-            annotation_type: Type of annotation (action_item, decision, note).
-            text: Annotation text.
-            start_time: Start time in seconds.
-            end_time: End time in seconds.
-            segment_ids: Optional list of linked segment IDs.
+            **kwargs: Annotation fields.
 
         Returns:
             AnnotationInfo or None if request fails.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("add_annotation")
             return None
 
         try:
+            meeting_id = kwargs["meeting_id"]
+            annotation_type = kwargs["annotation_type"]
+            text = kwargs["text"]
+            start_time = kwargs["start_time"]
+            end_time = kwargs["end_time"]
+            segment_ids = kwargs.get("segment_ids") or []
             proto_type = annotation_type_to_proto(annotation_type)
             request = noteflow_pb2.AddAnnotationRequest(
                 meeting_id=meeting_id,
@@ -54,7 +73,7 @@ class AnnotationClientMixin:
                 text=text,
                 start_time=start_time,
                 end_time=end_time,
-                segment_ids=segment_ids or [],
+                segment_ids=segment_ids,
             )
             response = self.stub.AddAnnotation(request)
             return proto_to_annotation_info(response)
@@ -72,6 +91,7 @@ class AnnotationClientMixin:
             AnnotationInfo or None if not found.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("get_annotation")
             return None
 
         try:
@@ -99,6 +119,7 @@ class AnnotationClientMixin:
             List of AnnotationInfo.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("list_annotations")
             return []
 
         try:
@@ -117,29 +138,27 @@ class AnnotationClientMixin:
     def update_annotation(
         self: ClientHost,
         annotation_id: str,
-        annotation_type: str | None = None,
-        text: str | None = None,
-        start_time: float | None = None,
-        end_time: float | None = None,
-        segment_ids: list[int] | None = None,
+        **kwargs: Unpack[_AnnotationUpdateKwargs],
     ) -> AnnotationInfo | None:
         """Update an existing annotation.
 
         Args:
             annotation_id: Annotation ID.
-            annotation_type: Optional new type.
-            text: Optional new text.
-            start_time: Optional new start time.
-            end_time: Optional new end time.
-            segment_ids: Optional new segment IDs.
+            **kwargs: Optional annotation fields.
 
         Returns:
             Updated AnnotationInfo or None if request fails.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("update_annotation")
             return None
 
         try:
+            annotation_type = kwargs.get("annotation_type")
+            text = kwargs.get("text")
+            start_time = kwargs.get("start_time")
+            end_time = kwargs.get("end_time")
+            segment_ids = kwargs.get("segment_ids")
             proto_type = (
                 annotation_type_to_proto(annotation_type)
                 if annotation_type
@@ -169,6 +188,7 @@ class AnnotationClientMixin:
             True if deleted successfully.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("delete_annotation")
             return False
 
         try:
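This diff only shows `get_client_rate_limiter()` being consumed, not its body, so the following is a guessed-at sketch of what a de-duplicating warner for repeated "stub missing" calls might look like; every name and the window semantics here are assumptions:

```python
import logging
import time

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("grpc.client")

class WarnRateLimiter:
    """Emit at most one warning per method within a time window."""

    def __init__(self, interval_s: float = 30.0) -> None:
        self._interval_s = interval_s
        self._last_warn: dict[str, float] = {}

    def warn_stub_missing(self, method: str) -> None:
        now = time.monotonic()
        if now - self._last_warn.get(method, float("-inf")) < self._interval_s:
            return  # drop repeats inside the window
        self._last_warn[method] = now
        logger.warning("Not connected; dropping %s call", method)

limiter = WarnRateLimiter()
for _ in range(3):
    limiter.warn_stub_missing("add_annotation")  # logs only once
```

Whatever the real implementation, the design intent visible in the hunks is clear: replace unbounded `logger.error("Not connected")` spam with throttled, per-method warnings.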
@@ -9,12 +9,13 @@ import grpc
 from noteflow.grpc._client_mixins.converters import job_status_to_str
 from noteflow.grpc._types import DiarizationResult, RenameSpeakerResult
 from noteflow.grpc.proto import noteflow_pb2
-from noteflow.infrastructure.logging import get_logger
+from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
 
 if TYPE_CHECKING:
     from noteflow.grpc._client_mixins.protocols import ClientHost
 
 logger = get_logger(__name__)
+_rate_limiter = get_client_rate_limiter()
 
 
 class DiarizationClientMixin:
@@ -38,6 +39,7 @@ class DiarizationClientMixin:
             DiarizationResult with job status or None if request fails.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("refine_speaker_diarization")
             return None
 
         try:
@@ -71,6 +73,7 @@ class DiarizationClientMixin:
             DiarizationResult with current status or None if request fails.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("get_diarization_job_status")
             return None
 
         try:
@@ -105,6 +108,7 @@ class DiarizationClientMixin:
             RenameSpeakerResult or None if request fails.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("rename_speaker")
             return None
 
         try:
 
@@ -9,12 +9,13 @@ import grpc
 from noteflow.grpc._client_mixins.converters import export_format_to_proto
 from noteflow.grpc._types import ExportResult
 from noteflow.grpc.proto import noteflow_pb2
-from noteflow.infrastructure.logging import get_logger
+from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
 
 if TYPE_CHECKING:
     from noteflow.grpc._client_mixins.protocols import ClientHost
 
 logger = get_logger(__name__)
+_rate_limiter = get_client_rate_limiter()
 
 
 class ExportClientMixin:
@@ -35,6 +36,7 @@ class ExportClientMixin:
             ExportResult or None if request fails.
         """
         if not self.stub:
+            _rate_limiter.warn_stub_missing("export_transcript")
             return None
 
         try:
 
@@ -9,12 +9,13 @@ import grpc
|
||||
from noteflow.grpc._client_mixins.converters import proto_to_meeting_info
|
||||
from noteflow.grpc._types import MeetingInfo, TranscriptSegment
|
||||
from noteflow.grpc.proto import noteflow_pb2
|
||||
from noteflow.infrastructure.logging import get_logger
|
||||
from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from noteflow.grpc._client_mixins.protocols import ClientHost
|
||||
|
||||
logger = get_logger(__name__)
|
||||
_rate_limiter = get_client_rate_limiter()
|
||||
|
||||
|
||||
class MeetingClientMixin:
|
||||
@@ -30,6 +31,7 @@ class MeetingClientMixin:
|
||||
MeetingInfo or None if request fails.
|
||||
"""
|
||||
if not self.stub:
|
||||
_rate_limiter.warn_stub_missing("create_meeting")
|
||||
return None
|
||||
|
||||
try:
|
||||
@@ -50,6 +52,7 @@ class MeetingClientMixin:
|
||||
Updated MeetingInfo or None if request fails.
|
||||
"""
|
||||
if not self.stub:
|
||||
_rate_limiter.warn_stub_missing("stop_meeting")
|
||||
return None
|
||||
|
||||
try:
|
||||
@@ -70,6 +73,7 @@ class MeetingClientMixin:
|
||||
MeetingInfo or None if not found.
|
||||
"""
|
||||
if not self.stub:
|
||||
_rate_limiter.warn_stub_missing("get_meeting")
|
||||
return None
|
||||
|
||||
try:
|
||||
@@ -94,6 +98,7 @@ class MeetingClientMixin:
|
||||
List of TranscriptSegment or empty list if not found.
|
||||
"""
|
||||
if not self.stub:
|
||||
_rate_limiter.warn_stub_missing("get_meeting_segments")
|
||||
return []
|
||||
|
||||
try:
|
||||
@@ -131,6 +136,7 @@ class MeetingClientMixin:
|
||||
List of MeetingInfo.
|
||||
"""
|
||||
if not self.stub:
|
||||
_rate_limiter.warn_stub_missing("list_meetings")
|
||||
return []
|
||||
|
||||
try:
|
||||
|
||||
@@ -152,3 +152,7 @@ class ClientHost(Protocol):
|
||||
def stop_streaming(self) -> None:
|
||||
"""Stop streaming audio."""
|
||||
...
|
||||
|
||||
def handle_stream_response(self, response: ProtoTranscriptUpdate) -> None:
|
||||
"""Handle a single transcript update from the stream."""
|
||||
...
|
||||
|
||||
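The `ClientHost` protocol gains `handle_stream_response` so the streaming mixin below can call it through a `self: ClientHost` annotation. As a minimal, generic illustration of this Protocol-typed `self` pattern (plain Python, not NoteFlow-specific names):

# Minimal sketch of the Protocol-typed `self` pattern used by these mixins.
from typing import Protocol


class Host(Protocol):
    def greet(self) -> str: ...


class GreetingMixin:
    # Type checkers verify the final class satisfies Host; at runtime this
    # is an ordinary method call on whatever class mixes this in.
    def greet_loudly(self: Host) -> str:
        return self.greet().upper()


class App(GreetingMixin):
    def greet(self) -> str:
        return "hello"


assert App().greet_loudly() == "HELLO"

The mixin never inherits from the host; it only declares, via the annotation, what the host must provide.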
@@ -14,15 +14,16 @@ from noteflow.config.constants import DEFAULT_SAMPLE_RATE

from noteflow.grpc._config import STREAMING_CONFIG
from noteflow.grpc._types import ConnectionCallback, TranscriptCallback, TranscriptSegment
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.logging import get_client_rate_limiter, get_logger

if TYPE_CHECKING:
    import numpy as np
    from numpy.typing import NDArray

    from noteflow.grpc._client_mixins.protocols import ClientHost
    from noteflow.grpc._client_mixins.protocols import ClientHost, ProtoTranscriptUpdate

logger = get_logger(__name__)
_rate_limiter = get_client_rate_limiter()


class StreamingClientMixin:

@@ -46,7 +47,7 @@
            True if streaming started.
        """
        if not self.stub:
            logger.error("Not connected")
            _rate_limiter.warn_stub_missing("start_streaming")
            return False

        if self.stream_thread and self.stream_thread.is_alive():

@@ -144,35 +145,42 @@
            for response in responses:
                if self.stop_streaming_event.is_set():
                    break

                if response.update_type == noteflow_pb2.UPDATE_TYPE_FINAL:
                    segment = TranscriptSegment(
                        segment_id=response.segment.segment_id,
                        text=response.segment.text,
                        start_time=response.segment.start_time,
                        end_time=response.segment.end_time,
                        language=response.segment.language,
                        is_final=True,
                        speaker_id=response.segment.speaker_id,
                        speaker_confidence=response.segment.speaker_confidence,
                    )
                    self.notify_transcript(segment)

                elif response.update_type == noteflow_pb2.UPDATE_TYPE_PARTIAL:
                    segment = TranscriptSegment(
                        segment_id=0,
                        text=response.partial_text,
                        start_time=0,
                        end_time=0,
                        language="",
                        is_final=False,
                    )
                    self.notify_transcript(segment)
                self.handle_stream_response(response)

        except grpc.RpcError as e:
            logger.error("Stream error: %s", e)
            self.notify_connection(False, f"Stream error: {e}")

    def handle_stream_response(
        self: ClientHost,
        response: ProtoTranscriptUpdate,
    ) -> None:
        """Handle a single transcript update from the stream."""
        if response.update_type == noteflow_pb2.UPDATE_TYPE_FINAL:
            segment = TranscriptSegment(
                segment_id=response.segment.segment_id,
                text=response.segment.text,
                start_time=response.segment.start_time,
                end_time=response.segment.end_time,
                language=response.segment.language,
                is_final=True,
                speaker_id=response.segment.speaker_id,
                speaker_confidence=response.segment.speaker_confidence,
            )
            self.notify_transcript(segment)
            return

        if response.update_type == noteflow_pb2.UPDATE_TYPE_PARTIAL:
            segment = TranscriptSegment(
                segment_id=0,
                text=response.partial_text,
                start_time=0,
                end_time=0,
                language="",
                is_final=False,
            )
            self.notify_transcript(segment)

    def notify_transcript(self: ClientHost, segment: TranscriptSegment) -> None:
        """Notify transcript callback.
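The hunk above pulls the inline final/partial branching out of the stream loop into `handle_stream_response`, which makes the dispatch logic unit-testable without a live gRPC stream. A generic sketch of why this extraction helps (illustrative names, not the NoteFlow types):

# Generic sketch: per-item handling extracted from a stream loop is
# testable in isolation; early returns mirror handle_stream_response above.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Update:
    is_final: bool
    text: str


def handle_update(update: Update, emit: Callable[[str], None]) -> None:
    if update.is_final:
        emit(update.text.strip())
        return
    emit(update.text)


seen: list[str] = []
handle_update(Update(is_final=True, text=" done "), seen.append)
assert seen == ["done"]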
@@ -89,22 +89,28 @@ class GrpcServerConfig:
    database_url: str | None = None
    diarization: DiarizationConfig = field(default_factory=DiarizationConfig)

    @dataclass(frozen=True, slots=True)
    class Args:
        """Flat arguments for constructing a GrpcServerConfig."""

        port: int
        asr_model: str
        asr_device: str
        asr_compute_type: str
        bind_address: str = DEFAULT_BIND_ADDRESS
        database_url: str | None = None
        diarization_enabled: bool = False
        diarization_hf_token: str | None = None
        diarization_device: str = DEFAULT_DIARIZATION_DEVICE
        diarization_streaming_latency: float | None = None
        diarization_min_speakers: int | None = None
        diarization_max_speakers: int | None = None
        diarization_refinement_enabled: bool = True

    @classmethod
    def from_args(
        cls,
        port: int,
        asr_model: str,
        asr_device: str,
        asr_compute_type: str,
        bind_address: str = DEFAULT_BIND_ADDRESS,
        database_url: str | None = None,
        diarization_enabled: bool = False,
        diarization_hf_token: str | None = None,
        diarization_device: str = DEFAULT_DIARIZATION_DEVICE,
        diarization_streaming_latency: float | None = None,
        diarization_min_speakers: int | None = None,
        diarization_max_speakers: int | None = None,
        diarization_refinement_enabled: bool = True,
        args: Args,
    ) -> GrpcServerConfig:
        """Create config from flat argument values.

@@ -112,22 +118,22 @@ class GrpcServerConfig:
        run_server() signature to structured configuration.
        """
        return cls(
            port=port,
            bind_address=bind_address,
            port=args.port,
            bind_address=args.bind_address,
            asr=AsrConfig(
                model=asr_model,
                device=asr_device,
                compute_type=asr_compute_type,
                model=args.asr_model,
                device=args.asr_device,
                compute_type=args.asr_compute_type,
            ),
            database_url=database_url,
            database_url=args.database_url,
            diarization=DiarizationConfig(
                enabled=diarization_enabled,
                hf_token=diarization_hf_token,
                device=diarization_device,
                streaming_latency=diarization_streaming_latency,
                min_speakers=diarization_min_speakers,
                max_speakers=diarization_max_speakers,
                refinement_enabled=diarization_refinement_enabled,
                enabled=args.diarization_enabled,
                hf_token=args.diarization_hf_token,
                device=args.diarization_device,
                streaming_latency=args.diarization_streaming_latency,
                min_speakers=args.diarization_min_speakers,
                max_speakers=args.diarization_max_speakers,
                refinement_enabled=args.diarization_refinement_enabled,
            ),
        )
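With the `Args` refactor, call sites construct one frozen object instead of threading thirteen positional parameters through `from_args`. A minimal usage sketch, assuming the nested `Args` placement shown in the hunk (field values here are illustrative, not project defaults):

# Illustrative values only; real callers pass CLI/settings-derived arguments.
args = GrpcServerConfig.Args(
    port=50051,
    asr_model="base",
    asr_device="cpu",
    asr_compute_type="int8",
    diarization_enabled=False,
)
config = GrpcServerConfig.from_args(args)
assert config.port == 50051

Because `Args` is frozen with slots, a typo'd field name fails loudly at construction time rather than silently landing in a kwargs dict.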
@@ -1,5 +1,6 @@
"""gRPC service mixins for NoteFlowServicer."""

from ._types import GrpcContext, GrpcStatusContext
from .annotation import AnnotationMixin
from .calendar import CalendarMixin
from .diarization import DiarizationMixin

@@ -17,6 +18,8 @@ from .sync import SyncMixin
from .webhooks import WebhooksMixin

__all__ = [
    "GrpcContext",
    "GrpcStatusContext",
    "AnnotationMixin",
    "CalendarMixin",
    "DiarizationJobMixin",

@@ -6,10 +6,20 @@ from typing import TYPE_CHECKING
from uuid import UUID

from noteflow.domain.value_objects import AnnotationId, MeetingId
from noteflow.infrastructure.logging import get_logger

if TYPE_CHECKING:
    from ..errors import AbortableContext

logger = get_logger(__name__)


def _truncate_for_log(value: str, max_len: int = 8) -> str:
    """Truncate a value for safe logging (PII redaction)."""
    if len(value) > max_len:
        return f"{value[:max_len]}..."
    return value


def parse_meeting_id(meeting_id_str: str) -> MeetingId:
    """Parse string to MeetingId.

@@ -48,6 +58,11 @@ async def parse_meeting_id_or_abort(
    try:
        return MeetingId(UUID(meeting_id_str))
    except ValueError:
        logger.warning(
            "invalid_meeting_id_format",
            meeting_id_truncated=_truncate_for_log(meeting_id_str),
            meeting_id_length=len(meeting_id_str),
        )
        await abort_invalid_argument(context, "Invalid meeting_id")

@@ -65,6 +80,11 @@ def parse_meeting_id_or_none(meeting_id_str: str) -> MeetingId | None:
    try:
        return MeetingId(UUID(meeting_id_str))
    except ValueError:
        logger.warning(
            "invalid_meeting_id_format",
            meeting_id_truncated=_truncate_for_log(meeting_id_str),
            meeting_id_length=len(meeting_id_str),
        )
        return None
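The `_truncate_for_log` helper above redacts identifiers before they reach structured logs, keeping enough of a prefix to correlate reports without exposing the full value. Its behavior on a UUID-like string, as a quick worked example taken directly from the definition:

# Behavior of _truncate_for_log as defined above.
assert _truncate_for_log("3f2b9c1e-77aa-4a0d-9a51-2f1d0c9e8b7a") == "3f2b9c1e..."
assert _truncate_for_log("short") == "short"  # at or under max_len: unchanged
assert _truncate_for_log("abcdefghij", max_len=4) == "abcd..."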
@@ -175,6 +175,12 @@ class JobsMixin(JobStatusMixin):
        num_speakers = request.num_speakers or None
        task = asyncio.create_task(self.run_diarization_job(job_id, num_speakers))
        self.diarization_tasks[job_id] = task
        logger.info(
            "diarization_task_created",
            job_id=job_id,
            meeting_id=request.meeting_id,
            num_speakers=num_speakers,
        )

        return noteflow_pb2.RefineSpeakerDiarizationResponse(
            segments_updated=0, job_id=job_id, status=noteflow_pb2.JOB_STATUS_QUEUED

@@ -14,6 +14,9 @@ from ..converters import parse_meeting_id_or_none
from ._speaker import apply_speaker_to_segment

if TYPE_CHECKING:
    from noteflow.domain.entities import Segment
    from noteflow.domain.ports.unit_of_work import UnitOfWork

    from ..protocols import ServicerHost

logger = get_logger(__name__)

@@ -80,15 +83,24 @@ class RefinementMixin:
        async with self.create_repository_provider() as repo:
            segments = await repo.segments.get_by_meeting(parsed_meeting_id)
            for segment in segments:
                if apply_speaker_to_segment(segment, turns):
                    # For DB segments with db_id, use update_speaker
                    if segment.db_id is not None:
                        await repo.segments.update_speaker(
                            segment.db_id,
                            segment.speaker_id,
                            segment.speaker_confidence,
                        )
                    updated_count += 1
                if not apply_speaker_to_segment(segment, turns):
                    continue
                await _persist_speaker_update(repo, segment)
                updated_count += 1
            await repo.commit()

        return updated_count


async def _persist_speaker_update(
    repo: UnitOfWork,
    segment: Segment,
) -> None:
    """Persist speaker update for a segment if it has a DB identity."""
    if segment.db_id is None:
        return
    await repo.segments.update_speaker(
        segment.db_id,
        segment.speaker_id,
        segment.speaker_confidence,
    )

@@ -15,6 +15,8 @@ from .._types import GrpcContext

if TYPE_CHECKING:
    from collections.abc import Sequence

    from noteflow.domain.ports.unit_of_work import UnitOfWork

    from ..protocols import ServicerHost

@@ -97,18 +99,10 @@ class SpeakerMixin:
            segments = await repo.segments.get_by_meeting(meeting_id)

            for segment in segments:
                if segment.speaker_id == request.old_speaker_id:
                    # For DB segments with db_id, use update_speaker
                    if segment.db_id is not None:
                        await repo.segments.update_speaker(
                            segment.db_id,
                            request.new_speaker_name,
                            segment.speaker_confidence,
                        )
                    else:
                        # Memory segments: update directly
                        segment.speaker_id = request.new_speaker_name
                    updated_count += 1
                if segment.speaker_id != request.old_speaker_id:
                    continue
                await _apply_speaker_rename(repo, segment, request.new_speaker_name)
                updated_count += 1

            await repo.commit()

@@ -116,3 +110,19 @@ class SpeakerMixin:
            segments_updated=updated_count,
            success=updated_count > 0,
        )


async def _apply_speaker_rename(
    repo: UnitOfWork,
    segment: Segment,
    new_speaker_name: str,
) -> None:
    """Persist speaker rename for a segment."""
    if segment.db_id is not None:
        await repo.segments.update_speaker(
            segment.db_id,
            new_speaker_name,
            segment.speaker_confidence,
        )
        return
    segment.speaker_id = new_speaker_name
@@ -3,6 +3,7 @@

from __future__ import annotations

import asyncio
from dataclasses import dataclass
from functools import partial
from typing import TYPE_CHECKING

@@ -14,6 +15,9 @@ from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.repositories import StreamingTurn

if TYPE_CHECKING:
    from noteflow.grpc.stream_state import MeetingStreamState
    from noteflow.infrastructure.diarization import DiarizationSession

    from ..protocols import ServicerHost

logger = get_logger(__name__)

@@ -46,33 +50,75 @@ class StreamingDiarizationMixin:

        loop = asyncio.get_running_loop()

        session = await self.ensure_diarization_session(meeting_id, state, loop)
        if session is None:
            return

        context = DiarizationChunkContext(meeting_id=meeting_id, state=state)
        new_turns = await self.process_diarization_chunk(
            context,
            session,
            audio,
            loop,
        )
        if not new_turns:
            return

        # Populate diarization_turns for compatibility with maybe_assign_speaker
        state.diarization_turns.extend(new_turns)
        state.diarization_stream_time = session.stream_time

        # Persist turns immediately for crash resilience (DB only)
        await self.persist_streaming_turns(meeting_id, list(new_turns))

    async def ensure_diarization_session(
        self: ServicerHost,
        meeting_id: str,
        state: MeetingStreamState,
        loop: asyncio.AbstractEventLoop,
    ) -> DiarizationSession | None:
        """Return an initialized diarization session or None on failure."""
        # Get or create per-meeting session under lock
        async with self.diarization_lock:
            session = state.diarization_session
            if session is None:
                try:
                    session = await loop.run_in_executor(
                        None,
                        self.diarization_engine.create_streaming_session,
                        meeting_id,
                    )
                    prior_turns = state.diarization_turns
                    prior_stream_time = state.diarization_stream_time
                    if prior_turns or prior_stream_time:
                        session.restore(prior_turns, stream_time=prior_stream_time)
                    state.diarization_session = session
                except (RuntimeError, ValueError) as exc:
                    logger.warning(
                        "Streaming diarization disabled for meeting %s: %s",
                        meeting_id,
                        exc,
                    )
                    state.diarization_streaming_failed = True
                    return
            if session is not None:
                return session
            # Guard: diarization_engine checked by caller (process_streaming_diarization)
            engine = self.diarization_engine
            if engine is None:
                return None
            try:
                session = await loop.run_in_executor(
                    None,
                    engine.create_streaming_session,
                    meeting_id,
                )
                prior_turns = state.diarization_turns
                prior_stream_time = state.diarization_stream_time
                if prior_turns or prior_stream_time:
                    session.restore(prior_turns, stream_time=prior_stream_time)
                state.diarization_session = session
                return session
            except (RuntimeError, ValueError) as exc:
                logger.warning(
                    "Streaming diarization disabled for meeting %s: %s",
                    meeting_id,
                    exc,
                )
                state.diarization_streaming_failed = True
                return None

    async def process_diarization_chunk(
        self: ServicerHost,
        context: "DiarizationChunkContext",
        session: DiarizationSession,
        audio: NDArray[np.float32],
        loop: asyncio.AbstractEventLoop,
    ) -> list[SpeakerTurn] | None:
        """Process a diarization chunk, returning new turns or None on failure."""
        # Process chunk in thread pool (outside lock for parallelism)
        try:
            new_turns = await loop.run_in_executor(
            turns = await loop.run_in_executor(
                None,
                partial(
                    session.process_chunk,

@@ -80,22 +126,23 @@ class StreamingDiarizationMixin:
                    sample_rate=self.DEFAULT_SAMPLE_RATE,
                ),
            )
            return list(turns)
        except (RuntimeError, OSError) as exc:
            logger.warning(
                "Streaming diarization failed for meeting %s: %s",
                meeting_id,
                context.meeting_id,
                exc,
            )
            state.diarization_streaming_failed = True
            return
            context.state.diarization_streaming_failed = True
            return None

        # Populate diarization_turns for compatibility with maybe_assign_speaker
        if new_turns:
            state.diarization_turns.extend(new_turns)
            state.diarization_stream_time = session.stream_time

            # Persist turns immediately for crash resilience (DB only)
            await self.persist_streaming_turns(meeting_id, list(new_turns))


@dataclass(frozen=True, slots=True)
class DiarizationChunkContext:
    """Context for processing a diarization chunk."""

    meeting_id: str
    state: MeetingStreamState

    async def persist_streaming_turns(
        self: ServicerHost,
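The chunk-processing hunk above runs the blocking `session.process_chunk` in a thread pool via `loop.run_in_executor`, binding keyword arguments with `functools.partial` because `run_in_executor` itself accepts only positional arguments. A generic, runnable sketch of that pattern (illustrative names; `blocking_resample` is a stand-in for CPU-bound work):

# Generic sketch: offload a blocking call to the default thread pool with
# keyword arguments bound via functools.partial (the pattern used above).
import asyncio
from functools import partial


def blocking_resample(chunk: bytes, *, sample_rate: int) -> int:
    return len(chunk) * sample_rate  # stand-in for CPU-bound work


async def main() -> None:
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(
        None,  # default ThreadPoolExecutor
        partial(blocking_resample, b"audio", sample_rate=16000),
    )
    print(result)  # 80000


asyncio.run(main())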
@@ -75,7 +75,12 @@ class DiarizationJobMixin:
            return float(settings.diarization_job_ttl_hours * SECONDS_PER_HOUR)
        # INTENTIONAL BROAD HANDLER: Fallback for testing environments
        # - Settings may fail to load in unit tests without full config
        except Exception:
        except Exception as exc:
            logger.warning(
                "diarization_ttl_settings_fallback",
                error_type=type(exc).__name__,
                fallback_ttl_seconds=_DEFAULT_JOB_TTL_SECONDS,
            )
            return _DEFAULT_JOB_TTL_SECONDS

    async def prune_diarization_jobs(self: DiarizationJobServicer) -> None:

@@ -123,33 +128,7 @@ class DiarizationJobMixin:
            await abort_not_found(context, "Diarization job", request.job_id)
            raise  # Unreachable but helps type checker

        # Calculate progress percentage (time-based for running jobs)
        progress_percent = 0.0
        if job.status == noteflow_pb2.JOB_STATUS_COMPLETED:
            progress_percent = 100.0
        elif job.status == noteflow_pb2.JOB_STATUS_RUNNING and job.started_at is not None:
            # All datetimes should now be timezone-aware UTC.
            now = utc_now()
            # Ensure started_at is also aware; should be UTC from repository.
            started: datetime = job.started_at
            elapsed = (now - started).total_seconds()
            audio_duration = job.audio_duration_seconds
            if audio_duration is not None and audio_duration > 0:
                # ~10 seconds processing per 60 seconds audio
                estimated_duration = audio_duration * 0.17
                progress_percent = min(95.0, (elapsed / estimated_duration) * 100)
            else:
                # Fallback: assume 2 minutes total
                progress_percent = min(95.0, (elapsed / 120) * 100)

        return noteflow_pb2.DiarizationJobStatus(
            job_id=job.job_id,
            status=int(job.status),
            segments_updated=job.segments_updated,
            speaker_ids=job.speaker_ids,
            error_message=job.error_message,
            progress_percent=progress_percent,
        )
        return _build_job_status(job)

    async def CancelDiarizationJob(
        self: DiarizationJobServicer,

@@ -226,30 +205,38 @@ class DiarizationJobMixin:
            active_jobs = await repo.diarization_jobs.get_all_active()

            for job in active_jobs:
                # Calculate progress percentage (time-based for running jobs)
                progress_percent = 0.0
                if job.status == noteflow_pb2.JOB_STATUS_RUNNING and job.started_at is not None:
                    now = utc_now()
                    started: datetime = job.started_at
                    elapsed = (now - started).total_seconds()
                    audio_duration = job.audio_duration_seconds
                    if audio_duration is not None and audio_duration > 0:
                        # ~10 seconds processing per 60 seconds audio
                        estimated_duration = audio_duration * 0.17
                        progress_percent = min(95.0, (elapsed / estimated_duration) * 100)
                    else:
                        # Fallback: assume 2 minutes total
                        progress_percent = min(95.0, (elapsed / 120) * 100)

                job_status = noteflow_pb2.DiarizationJobStatus(
                    job_id=job.job_id,
                    status=int(job.status),
                    segments_updated=job.segments_updated,
                    speaker_ids=job.speaker_ids,
                    error_message=job.error_message,
                    progress_percent=progress_percent,
                )
                response.jobs.append(job_status)
                response.jobs.append(_build_job_status(job))

        logger.debug("Returning %d active diarization jobs", len(response.jobs))
        return response


def _build_job_status(job: DiarizationJob) -> noteflow_pb2.DiarizationJobStatus:
    """Build proto status from a diarization job."""
    return noteflow_pb2.DiarizationJobStatus(
        job_id=job.job_id,
        status=int(job.status),
        segments_updated=job.segments_updated,
        speaker_ids=job.speaker_ids,
        error_message=job.error_message,
        progress_percent=_calculate_progress_percent(job),
    )


def _calculate_progress_percent(job: DiarizationJob) -> float:
    """Calculate progress percentage for a diarization job."""
    if job.status == noteflow_pb2.JOB_STATUS_COMPLETED:
        return 100.0
    if job.status != noteflow_pb2.JOB_STATUS_RUNNING or job.started_at is None:
        return 0.0

    now = utc_now()
    started: datetime = job.started_at
    elapsed = (now - started).total_seconds()
    audio_duration = job.audio_duration_seconds
    if audio_duration is not None and audio_duration > 0:
        # ~10 seconds processing per 60 seconds audio
        estimated_duration = audio_duration * 0.17
        return min(95.0, (elapsed / estimated_duration) * 100)
    # Fallback: assume 2 minutes total
    return min(95.0, (elapsed / 120) * 100)
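The 0.17 factor in `_calculate_progress_percent` encodes the "~10 seconds of processing per 60 seconds of audio" estimate (10/60 ≈ 0.17). For a 600-second recording, the heuristic works out as follows:

# Worked example of _calculate_progress_percent's time-based estimate.
audio_duration = 600.0                      # 10 minutes of audio
estimated_duration = audio_duration * 0.17  # ≈ 102 s of expected processing
elapsed = 51.0                              # job has been running for 51 s
progress = min(95.0, (elapsed / estimated_duration) * 100)
print(progress)  # 50.0 -- capped at 95.0 until the job actually completes

The 95% cap means a slow job never falsely reports completion; only a COMPLETED status yields 100.0.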
@@ -135,6 +135,12 @@ async def _resolve_active_project_id(
    try:
        workspace_uuid = UUID(workspace_id)
    except ValueError:
        truncated = workspace_id[:8] + "..." if len(workspace_id) > 8 else workspace_id
        logger.warning(
            "resolve_active_project: invalid workspace_id format",
            workspace_id_truncated=truncated,
            workspace_id_length=len(workspace_id),
        )
        return None

    _, active_project = await host.project_service.get_active_project(

@@ -282,6 +288,12 @@ class MeetingMixin:
        try:
            project_id = UUID(request.project_id)
        except ValueError:
            truncated = request.project_id[:8] + "..." if len(request.project_id) > 8 else request.project_id
            logger.warning(
                "ListMeetings: invalid project_id format",
                project_id_truncated=truncated,
                project_id_length=len(request.project_id),
            )
            await abort_invalid_argument(context, f"{ERROR_INVALID_PROJECT_ID_PREFIX}{request.project_id}")

        async with self.create_repository_provider() as repo:

@@ -7,7 +7,12 @@ from typing import Protocol, cast
from uuid import UUID

from noteflow.config.constants import ERROR_INVALID_WORKSPACE_ID_FORMAT
from noteflow.domain.auth.oidc import ClaimMapping, OidcProviderConfig, OidcProviderPreset
from noteflow.domain.auth.oidc import (
    ClaimMapping,
    OidcProviderConfig,
    OidcProviderPreset,
    OidcProviderRegistration,
)
from noteflow.infrastructure.auth.oidc_discovery import OidcDiscoveryError
from noteflow.infrastructure.auth.oidc_registry import (
    PROVIDER_PRESETS,

@@ -148,7 +153,7 @@ class OidcMixin:
        # Register provider
        oidc_service = self.get_oidc_service()
        try:
            provider, warnings = await oidc_service.register_provider(
            registration = OidcProviderRegistration(
                workspace_id=workspace_id,
                name=request.name,
                issuer_url=request.issuer_url,

@@ -160,6 +165,7 @@ class OidcMixin:
                ),
                preset=preset,
            )
            provider, warnings = await oidc_service.register_provider(registration)

            _apply_custom_provider_config(
                provider,

@@ -30,6 +30,7 @@ if TYPE_CHECKING:
    from noteflow.infrastructure.auth.oidc_registry import OidcAuthService
    from noteflow.infrastructure.diarization import (
        DiarizationEngine,
        DiarizationSession,
        SpeakerTurn,
    )
    from noteflow.infrastructure.persistence.repositories import DiarizationJob

@@ -43,15 +44,12 @@ if TYPE_CHECKING:
    from ..proto import noteflow_pb2
    from ..stream_state import MeetingStreamState
    from ._types import GrpcContext, GrpcStatusContext
    from .diarization._streaming import DiarizationChunkContext
    from .streaming._types import StreamSessionInit


class ServicerHost(Protocol):
    """Protocol defining shared state and methods for service mixins.

    All mixins should type-hint `self` as `ServicerHost` to access these
    attributes and methods from the host NoteFlowServicer class.
    """
class _ServicerState(Protocol):
    """Shared state required by service mixins."""

    # Configuration
    session_factory: async_sessionmaker[AsyncSession] | None

@@ -107,6 +105,13 @@ class ServicerHost(Protocol):
    PARTIAL_CADENCE_SECONDS: Final[float]
    MIN_PARTIAL_AUDIO_SECONDS: Final[float]

    # OIDC service
    oidc_service: OidcAuthService | None


class _ServicerCoreMethods(Protocol):
    """Core helper methods shared across mixins."""

    @property
    def diarization_job_ttl_seconds(self) -> float:
        """Return diarization job TTL from settings."""

@@ -125,15 +130,7 @@ class ServicerHost(Protocol):
        ...

    def create_repository_provider(self) -> UnitOfWork:
        """Create a repository provider (database or memory backed).

        Returns a UnitOfWork implementation appropriate for the current
        configuration. Use this for operations that can work with either
        backend, eliminating the need for if/else branching.

        Returns:
            SqlAlchemyUnitOfWork if database configured, MemoryUnitOfWork otherwise.
        """
        """Create a repository provider (database or memory backed)."""
        ...

    def next_segment_id(self, meeting_id: str, fallback: int = 0) -> int:

@@ -149,11 +146,7 @@ class ServicerHost(Protocol):
        ...

    def get_stream_state(self, meeting_id: str) -> MeetingStreamState | None:
        """Get consolidated streaming state for a meeting.

        Returns None if meeting has no active stream state.
        Single lookup replaces 13+ dict accesses in hot paths.
        """
        """Get consolidated streaming state for a meeting."""
        ...

    def ensure_meeting_dek(self, meeting: Meeting) -> tuple[bytes, bytes, bool]:

@@ -178,7 +171,14 @@ class ServicerHost(Protocol):
        """Close and remove the audio writer for a meeting."""
        ...

    # Diarization mixin methods (for internal cross-references)
    def get_oidc_service(self) -> OidcAuthService:
        """Get or create the OIDC auth service."""
        ...


class _ServicerDiarizationMethods(Protocol):
    """Diarization helpers used by streaming and job mixins."""

    async def prune_diarization_jobs(self) -> None:
        """Prune expired diarization jobs from in-memory cache."""
        ...

@@ -215,7 +215,6 @@ class ServicerHost(Protocol):
        """Run post-meeting speaker diarization refinement."""
        ...

    # Diarization job management methods
    async def update_job_completed(
        self,
        job_id: str,

@@ -278,19 +277,37 @@ class ServicerHost(Protocol):
        """Process audio chunk for streaming diarization (best-effort)."""
        ...

    # Webhook methods
    async def ensure_diarization_session(
        self,
        meeting_id: str,
        state: MeetingStreamState,
        loop: asyncio.AbstractEventLoop,
    ) -> DiarizationSession | None:
        """Return an initialized diarization session or None on failure."""
        ...

    async def process_diarization_chunk(
        self,
        context: DiarizationChunkContext,
        session: DiarizationSession,
        audio: NDArray[np.float32],
        loop: asyncio.AbstractEventLoop,
    ) -> list[SpeakerTurn] | None:
        """Process a diarization chunk, returning new turns or None on failure."""
        ...


class _ServicerWebhookMethods(Protocol):
    """Webhook helpers."""

    async def fire_stop_webhooks(self, meeting: Meeting) -> None:
        """Trigger webhooks for meeting stop (fire-and-forget)."""
        ...

    # OIDC service
    oidc_service: OidcAuthService | None

    def get_oidc_service(self) -> OidcAuthService:
        """Get or create the OIDC auth service."""
        ...

class _ServicerPreferencesMethods(Protocol):
    """Preferences helpers."""

    # Preferences methods
    async def decode_and_validate_prefs(
        self,
        request: noteflow_pb2.SetPreferencesRequest,

@@ -309,7 +326,10 @@ class ServicerHost(Protocol):
        """Apply preferences based on merge mode."""
        ...

    # Streaming methods

class _ServicerStreamingMethods(Protocol):
    """Streaming helpers."""

    async def init_stream_for_meeting(
        self,
        meeting_id: str,

@@ -334,7 +354,20 @@ class ServicerHost(Protocol):
        """Flush remaining audio from segmenter at stream end."""
        ...

    # Summarization methods
    async def prepare_stream_chunk(
        self,
        current_meeting_id: str | None,
        initialized_meeting_id: str | None,
        chunk: noteflow_pb2.AudioChunk,
        context: GrpcContext,
    ) -> tuple[str, str | None] | None:
        """Validate and initialize streaming state for a chunk."""
        ...


class _ServicerSummarizationMethods(Protocol):
    """Summarization helpers."""

    async def summarize_or_placeholder(
        self,
        meeting_id: MeetingId,

@@ -352,7 +385,10 @@ class ServicerHost(Protocol):
        """Generate a lightweight placeholder summary when summarization fails."""
        ...

    # Sync mixin methods

class _ServicerSyncMethods(Protocol):
    """Sync helpers."""

    def ensure_sync_runs_cache(self) -> dict[UUID, SyncRun]:
        """Ensure the sync runs cache exists."""
        ...

@@ -404,3 +440,18 @@ class ServicerHost(Protocol):
    ) -> SyncRun | None:
        """Mark sync run as failed with error message."""
        ...


class ServicerHost(
    _ServicerState,
    _ServicerCoreMethods,
    _ServicerDiarizationMethods,
    _ServicerWebhookMethods,
    _ServicerPreferencesMethods,
    _ServicerStreamingMethods,
    _ServicerSummarizationMethods,
    _ServicerSyncMethods,
    Protocol,
):
    """Protocol defining shared state and methods for service mixins."""
    pass
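Splitting the monolithic `ServicerHost` into focused sub-protocols keeps each mixin's dependency surface explicit, while the composite still type-checks as one host. A minimal sketch of the composition rule this relies on (generic Python; names are illustrative):

# Generic sketch: protocols compose by inheritance, and a concrete class
# satisfies the composite structurally -- no registration or subclassing needed.
from typing import Protocol


class _Reader(Protocol):
    def read(self) -> str: ...


class _Writer(Protocol):
    def write(self, data: str) -> None: ...


class ReadWriteHost(_Reader, _Writer, Protocol):
    """Composite protocol, mirroring the ServicerHost pattern above."""


class FileLike:
    def __init__(self) -> None:
        self._buf = ""

    def read(self) -> str:
        return self._buf

    def write(self, data: str) -> None:
        self._buf += data


host: ReadWriteHost = FileLike()  # accepted structurally by type checkers

Note the explicit `Protocol` base on the composite: without it, the composite would become a concrete ABC rather than a structural type.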
@@ -60,23 +60,15 @@ class StreamingMixin:

        try:
            async for chunk in request_iterator:
                meeting_id = chunk.meeting_id
                if not meeting_id:
                    await abort_invalid_argument(context, "meeting_id required")

                # Initialize stream on first chunk
                if current_meeting_id is None:
                    # Track meeting_id BEFORE init to guarantee cleanup on any exception
                    # (cleanup_stream_resources is idempotent, safe to call even if init aborts)
                    initialized_meeting_id = meeting_id
                    init_result = await self.init_stream_for_meeting(meeting_id, context)
                    if init_result is None:
                        return  # Error already sent via context.abort
                    current_meeting_id = meeting_id
                elif meeting_id != current_meeting_id:
                    await abort_invalid_argument(
                        context, "Stream may only contain a single meeting_id"
                    )
                prep = await self.prepare_stream_chunk(
                    current_meeting_id,
                    initialized_meeting_id,
                    chunk,
                    context,
                )
                if prep is None:
                    return  # Error already sent via context.abort
                current_meeting_id, initialized_meeting_id = prep

                # Check for stop request (graceful shutdown from StopMeeting)
                if current_meeting_id in self.stop_requested:

@@ -100,6 +92,33 @@ class StreamingMixin:
        if cleanup_meeting := current_meeting_id or initialized_meeting_id:
            cleanup_stream_resources(self, cleanup_meeting)

    async def prepare_stream_chunk(
        self: ServicerHost,
        current_meeting_id: str | None,
        initialized_meeting_id: str | None,
        chunk: noteflow_pb2.AudioChunk,
        context: GrpcContext,
    ) -> tuple[str, str | None] | None:
        """Validate and initialize streaming state for a chunk."""
        meeting_id = chunk.meeting_id
        if not meeting_id:
            await abort_invalid_argument(context, "meeting_id required")
            return None

        if current_meeting_id is None:
            # Track meeting_id BEFORE init to guarantee cleanup on any exception
            initialized_meeting_id = meeting_id
            init_result = await self.init_stream_for_meeting(meeting_id, context)
            if init_result is None:
                return None
            return meeting_id, initialized_meeting_id

        if meeting_id != current_meeting_id:
            await abort_invalid_argument(context, "Stream may only contain a single meeting_id")
            return None

        return current_meeting_id, initialized_meeting_id

    async def init_stream_for_meeting(
        self: ServicerHost,
        meeting_id: str,

@@ -209,16 +209,21 @@ def decrement_pending_chunks(host: ServicerHost, meeting_id: str) -> None:

    Call this after ASR processing completes for a segment.
    """
    if hasattr(host, "_pending_chunks") and meeting_id in host.pending_chunks:
        # Decrement by ACK_CHUNK_INTERVAL since we process in batches
        host.pending_chunks[meeting_id] = max(
            0, host.pending_chunks[meeting_id] - ACK_CHUNK_INTERVAL
        )
        if receipt_times := host.chunk_receipt_times.get(meeting_id):
            # Remove timestamps corresponding to processed chunks
            for _ in range(min(ACK_CHUNK_INTERVAL, len(receipt_times))):
                if receipt_times:
                    receipt_times.popleft()
    if not hasattr(host, "_pending_chunks"):
        return
    if meeting_id not in host.pending_chunks:
        return

    # Decrement by ACK_CHUNK_INTERVAL since we process in batches
    host.pending_chunks[meeting_id] = max(
        0, host.pending_chunks[meeting_id] - ACK_CHUNK_INTERVAL
    )
    receipt_times = host.chunk_receipt_times.get(meeting_id)
    if not receipt_times:
        return
    # Remove timestamps corresponding to processed chunks
    for _ in range(min(ACK_CHUNK_INTERVAL, len(receipt_times))):
        receipt_times.popleft()


def _convert_audio_format(

@@ -138,24 +138,8 @@ class StreamSessionManager:

        Returns:
            Initialization result, or None if error was sent.
        """
        # Atomic check-and-add protected by lock with timeout to prevent deadlock
        try:
            async with asyncio.timeout(STREAM_INIT_LOCK_TIMEOUT_SECONDS):
                async with host.stream_init_lock:
                    if meeting_id in host.active_streams:
                        await abort_failed_precondition(
                            context, f"{ERROR_MSG_MEETING_PREFIX}{meeting_id} already streaming"
                        )
                    host.active_streams.add(meeting_id)
        except TimeoutError:
            logger.error(
                "Stream initialization lock timeout for meeting %s after %.1fs",
                meeting_id,
                STREAM_INIT_LOCK_TIMEOUT_SECONDS,
            )
            await abort_failed_precondition(
                context, "Stream initialization timed out - server may be overloaded"
            )
        if not await StreamSessionManager._reserve_stream_slot(host, meeting_id, context):
            return None

        init_result = await StreamSessionManager._init_stream_session(host, meeting_id)

@@ -166,6 +150,48 @@ class StreamSessionManager:

        return init_result

    @staticmethod
    async def _reserve_stream_slot(
        host: ServicerHost,
        meeting_id: str,
        context: GrpcContext,
    ) -> bool:
        """Reserve the meeting for streaming or abort on conflict."""
        # Atomic check-and-add protected by lock with timeout to prevent deadlock
        try:
            async with asyncio.timeout(STREAM_INIT_LOCK_TIMEOUT_SECONDS):
                reserved = await StreamSessionManager._try_reserve_stream_slot(
                    host,
                    meeting_id,
                    context,
                )
        except TimeoutError:
            logger.error(
                "Stream initialization lock timeout for meeting %s after %.1fs",
                meeting_id,
                STREAM_INIT_LOCK_TIMEOUT_SECONDS,
            )
            await abort_failed_precondition(
                context, "Stream initialization timed out - server may be overloaded"
            )
            return False
        return reserved

    @staticmethod
    async def _try_reserve_stream_slot(
        host: ServicerHost,
        meeting_id: str,
        context: GrpcContext,
    ) -> bool:
        async with host.stream_init_lock:
            if meeting_id in host.active_streams:
                await abort_failed_precondition(
                    context, f"{ERROR_MSG_MEETING_PREFIX}{meeting_id} already streaming"
                )
                return False
            host.active_streams.add(meeting_id)
            return True

    @staticmethod
    async def _init_stream_session(
        host: ServicerHost,
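The lock-with-deadline pattern in `_reserve_stream_slot` (an `asyncio.timeout` wrapping a lock acquisition) converts a potential deadlock into a bounded failure that can be reported to the client. A generic, runnable sketch of the pattern (Python 3.11+; names are illustrative):

# Generic sketch of a bounded lock acquisition (requires Python 3.11+).
import asyncio

LOCK_TIMEOUT_SECONDS = 5.0
_slots_lock = asyncio.Lock()
_active: set[str] = set()


async def reserve(key: str) -> bool:
    """Atomically reserve `key`, failing fast if the lock is starved."""
    try:
        async with asyncio.timeout(LOCK_TIMEOUT_SECONDS):
            async with _slots_lock:
                if key in _active:
                    return False  # already reserved
                _active.add(key)
                return True
    except TimeoutError:
        # Bounded failure instead of waiting forever on a starved lock.
        return False


print(asyncio.run(reserve("meeting-1")))  # True

The check-and-add must stay inside the lock so two concurrent streams for the same meeting cannot both pass the membership test.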
src/noteflow/grpc/_service_stubs.py (new file, 184 lines)
@@ -0,0 +1,184 @@
"""Type stubs for NoteFlowServicer mixin methods (type checking only)."""

from __future__ import annotations

from collections.abc import AsyncIterator
from typing import Protocol

from ._mixins._types import GrpcContext, GrpcStatusContext
from .proto import noteflow_pb2


class _StreamingStubs(Protocol):
    """Streaming mixin stubs."""

    def StreamTranscription(
        self,
        request_iterator: AsyncIterator[noteflow_pb2.AudioChunk],
        context: GrpcContext,
    ) -> AsyncIterator[noteflow_pb2.TranscriptUpdate]: ...


class _CalendarStubs(Protocol):
    """Calendar mixin stubs."""

    async def GetCalendarProviders(
        self,
        request: noteflow_pb2.GetCalendarProvidersRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.GetCalendarProvidersResponse: ...

    async def InitiateOAuth(
        self,
        request: noteflow_pb2.InitiateOAuthRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.InitiateOAuthResponse: ...

    async def CompleteOAuth(
        self,
        request: noteflow_pb2.CompleteOAuthRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.CompleteOAuthResponse: ...

    async def GetOAuthConnectionStatus(
        self,
        request: noteflow_pb2.GetOAuthConnectionStatusRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.GetOAuthConnectionStatusResponse: ...

    async def DisconnectOAuth(
        self,
        request: noteflow_pb2.DisconnectOAuthRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.DisconnectOAuthResponse: ...


class _SummarizationStubs(Protocol):
    """Summarization mixin stubs."""

    async def GetCloudConsentStatus(
        self,
        request: noteflow_pb2.GetCloudConsentStatusRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.GetCloudConsentStatusResponse: ...

    async def GrantCloudConsent(
        self,
        request: noteflow_pb2.GrantCloudConsentRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.GrantCloudConsentResponse: ...

    async def RevokeCloudConsent(
        self,
        request: noteflow_pb2.RevokeCloudConsentRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.RevokeCloudConsentResponse: ...

    async def GenerateSummary(
        self,
        request: noteflow_pb2.GenerateSummaryRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.Summary: ...


class _SyncStubs(Protocol):
    """Sync mixin stubs."""

    async def StartIntegrationSync(
        self,
        request: noteflow_pb2.StartIntegrationSyncRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.StartIntegrationSyncResponse: ...

    async def GetSyncStatus(
        self,
        request: noteflow_pb2.GetSyncStatusRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.GetSyncStatusResponse: ...

    async def ListSyncHistory(
        self,
        request: noteflow_pb2.ListSyncHistoryRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.ListSyncHistoryResponse: ...

    async def GetUserIntegrations(
        self,
        request: noteflow_pb2.GetUserIntegrationsRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.GetUserIntegrationsResponse: ...


class _DiarizationStubs(Protocol):
    """Diarization mixin stubs."""

    async def RefineSpeakerDiarization(
        self,
        request: noteflow_pb2.RefineSpeakerDiarizationRequest,
        context: GrpcStatusContext,
    ) -> noteflow_pb2.RefineSpeakerDiarizationResponse: ...

    async def RenameSpeaker(
        self,
        request: noteflow_pb2.RenameSpeakerRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.RenameSpeakerResponse: ...

    async def GetDiarizationJobStatus(
        self,
        request: noteflow_pb2.GetDiarizationJobStatusRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.DiarizationJobStatus: ...

    async def CancelDiarizationJob(
        self,
        request: noteflow_pb2.CancelDiarizationJobRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.CancelDiarizationJobResponse: ...


class _WebhookStubs(Protocol):
    """Webhook mixin stubs."""

    async def RegisterWebhook(
        self,
        request: noteflow_pb2.RegisterWebhookRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.WebhookConfigProto: ...

    async def ListWebhooks(
        self,
        request: noteflow_pb2.ListWebhooksRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.ListWebhooksResponse: ...

    async def UpdateWebhook(
        self,
        request: noteflow_pb2.UpdateWebhookRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.WebhookConfigProto: ...

    async def DeleteWebhook(
        self,
        request: noteflow_pb2.DeleteWebhookRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.DeleteWebhookResponse: ...

    async def GetWebhookDeliveries(
        self,
        request: noteflow_pb2.GetWebhookDeliveriesRequest,
        context: GrpcContext,
    ) -> noteflow_pb2.GetWebhookDeliveriesResponse: ...


class NoteFlowServicerStubs(
    _StreamingStubs,
    _CalendarStubs,
    _SummarizationStubs,
    _SyncStubs,
    _DiarizationStubs,
    _WebhookStubs,
    Protocol,
):
    """Composite protocol for NoteFlow servicer mixin stubs."""
    pass
@@ -8,6 +8,7 @@ from typing import TYPE_CHECKING, Final

import grpc

from noteflow.grpc import _types
from noteflow.grpc._client_mixins import (
    AnnotationClientMixin,
    DiarizationClientMixin,

@@ -16,7 +17,15 @@ from noteflow.grpc._client_mixins import (
    StreamingClientMixin,
)
from noteflow.grpc._config import STREAMING_CONFIG
from noteflow.grpc import _types
from noteflow.grpc._types import (
    AnnotationInfo,
    DiarizationResult,
    ExportResult,
    MeetingInfo,
    RenameSpeakerResult,
    ServerInfo,
    TranscriptSegment,
)
from noteflow.grpc.proto import noteflow_pb2, noteflow_pb2_grpc
from noteflow.infrastructure.logging import get_logger

@@ -24,10 +33,22 @@ if TYPE_CHECKING:
    import numpy as np
    from numpy.typing import NDArray

    from noteflow.grpc._client_mixins.protocols import NoteFlowServiceStubProtocol
    from noteflow.grpc._client_mixins.protocols import ClientHost, NoteFlowServiceStubProtocol

logger = get_logger(__name__)

# Re-export types for public API (used by grpc/__init__.py)
__all__ = [
    "AnnotationInfo",
    "DiarizationResult",
    "ExportResult",
    "MeetingInfo",
    "NoteFlowClient",
    "RenameSpeakerResult",
    "ServerInfo",
    "TranscriptSegment",
]

DEFAULT_SERVER: Final[str] = "localhost:50051"
CHUNK_TIMEOUT: Final[float] = 0.1  # Timeout for getting chunks from queue

@@ -144,7 +165,9 @@ class NoteFlowClient(

    def disconnect(self) -> None:
        """Disconnect from the server."""
        self.stop_streaming()
        # Type assertion: NoteFlowClient implements ClientHost protocol
        client: ClientHost = self
        client.stop_streaming()

        if self._channel is not None:
            self._channel.close()

@@ -1,12 +1,22 @@
"""gRPC interceptors for NoteFlow.

Provide cross-cutting concerns for RPC calls:
- Identity context propagation
- Request tracing
- Identity context propagation and validation
- Per-RPC request logging with timing
"""

from noteflow.grpc.interceptors.identity import IdentityInterceptor
from noteflow.grpc.interceptors.identity import (
    METADATA_REQUEST_ID,
    METADATA_USER_ID,
    METADATA_WORKSPACE_ID,
    IdentityInterceptor,
)
from noteflow.grpc.interceptors.logging import RequestLoggingInterceptor

__all__ = [
    "METADATA_REQUEST_ID",
    "METADATA_USER_ID",
    "METADATA_WORKSPACE_ID",
    "IdentityInterceptor",
    "RequestLoggingInterceptor",
]

@@ -2,6 +2,9 @@

Populate identity context (request ID, user ID, workspace ID) for RPC calls
by extracting from metadata and setting context variables.

Identity metadata is REQUIRED for all RPCs. Requests missing the x-request-id
header are rejected with UNAUTHENTICATED status.
"""

from __future__ import annotations

@@ -13,7 +16,6 @@ import grpc
from grpc import aio

from noteflow.infrastructure.logging import (
    generate_request_id,
    get_logger,
    request_id_var,
    user_id_var,

@@ -27,6 +29,9 @@ METADATA_REQUEST_ID = "x-request-id"
METADATA_USER_ID = "x-user-id"
METADATA_WORKSPACE_ID = "x-workspace-id"

# Error messages
_ERR_MISSING_REQUEST_ID = "Missing required x-request-id header"

_TRequest = TypeVar("_TRequest")
_TResponse = TypeVar("_TResponse")

@@ -37,15 +42,18 @@ def _coerce_metadata_value(value: str | bytes) -> str:


class IdentityInterceptor(aio.ServerInterceptor):
    """Interceptor that populates identity context for RPC calls.
    """Interceptor that validates and populates identity context for RPC calls.

    Extract user and workspace identifiers from gRPC metadata and
    set them as context variables for use throughout the request.

    Identity metadata is REQUIRED. Requests missing x-request-id are rejected
    with UNAUTHENTICATED status.

    Metadata keys:
    - x-request-id: Correlation ID for request tracing
    - x-user-id: User identifier
    - x-workspace-id: Workspace identifier for tenant scoping
    - x-request-id: Correlation ID for request tracing (REQUIRED)
    - x-user-id: User identifier (optional)
    - x-workspace-id: Workspace identifier for tenant scoping (optional)
    """

    async def intercept_service(

@@ -56,7 +64,7 @@ class IdentityInterceptor(aio.ServerInterceptor):
        ],
        handler_call_details: grpc.HandlerCallDetails,
    ) -> grpc.RpcMethodHandler[_TRequest, _TResponse]:
        """Intercept incoming RPC calls to set identity context.
        """Intercept incoming RPC calls to validate and set identity context.

        Args:
            continuation: The next interceptor or handler.

@@ -64,19 +72,25 @@ class IdentityInterceptor(aio.ServerInterceptor):

        Returns:
            The RPC handler for this call.

        Raises:
            grpc.RpcError: UNAUTHENTICATED if x-request-id header is missing.
        """
        # Generate or extract request ID
        metadata = dict(handler_call_details.invocation_metadata or [])

        # Validate required x-request-id header
        request_id_value = metadata.get(METADATA_REQUEST_ID)
        request_id = (
            _coerce_metadata_value(request_id_value)
            if request_id_value is not None
            else generate_request_id()
        )
        if request_id_value is None:
            logger.warning(
                "Rejecting RPC: missing x-request-id header",
                method=handler_call_details.method,
            )
            return _create_unauthenticated_handler(_ERR_MISSING_REQUEST_ID)

        request_id = _coerce_metadata_value(request_id_value)
        request_id_var.set(request_id)

        # Extract user and workspace IDs from metadata
        # Extract optional user and workspace IDs from metadata
        if user_id_value := metadata.get(METADATA_USER_ID):
            user_id_var.set(_coerce_metadata_value(user_id_value))

@@ -92,3 +106,29 @@ class IdentityInterceptor(aio.ServerInterceptor):
        )

        return await continuation(handler_call_details)


def _create_unauthenticated_handler[TRequest, TResponse](
    message: str,
) -> grpc.RpcMethodHandler[TRequest, TResponse]:
    """Create a handler that rejects with UNAUTHENTICATED status.

    Args:
        message: Error message to include in the response.

    Returns:
        A gRPC method handler that rejects all requests.
    """

    async def reject_unary_unary(
        request: TRequest,
        context: aio.ServicerContext[TRequest, TResponse],
    ) -> TResponse:
        await context.abort(grpc.StatusCode.UNAUTHENTICATED, message)
        raise AssertionError("Unreachable after abort")

    return grpc.unary_unary_rpc_method_handler(
        reject_unary_unary,
        request_deserializer=None,
        response_serializer=None,
    )
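Because x-request-id is now mandatory, every caller must attach it as gRPC metadata or receive UNAUTHENTICATED. A hedged client-side sketch using the plain grpc API: the service name and GetMeeting RPC come from this diff's logging docstring (/noteflow.NoteFlowService/GetMeeting), while the stub class and request type names are assumptions about the generated module:

# Sketch: attaching the required identity metadata to a call.
import uuid
import grpc

from noteflow.grpc.proto import noteflow_pb2, noteflow_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = noteflow_pb2_grpc.NoteFlowServiceStub(channel)  # assumed generated stub name

metadata = [
    ("x-request-id", str(uuid.uuid4())),  # REQUIRED; omitting it yields UNAUTHENTICATED
    ("x-user-id", "user-123"),            # optional
    ("x-workspace-id", "ws-456"),         # optional
]
response = stub.GetMeeting(
    noteflow_pb2.GetMeetingRequest(meeting_id="..."),  # assumed request message name
    metadata=metadata,
)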
src/noteflow/grpc/interceptors/logging.py (new file, 304 lines)
@@ -0,0 +1,304 @@
"""Request logging interceptor for gRPC calls.

Log every RPC call with method, status, duration, peer, and request context
at INFO level for production observability and traceability.
"""

from __future__ import annotations

import time
from collections.abc import AsyncIterator, Awaitable, Callable
from typing import TypeVar, cast

import grpc
from grpc import aio

from noteflow.infrastructure.logging import get_logger, get_request_id

logger = get_logger(__name__)

# TypeVars required for ServerInterceptor.intercept_service compatibility
_TRequest = TypeVar("_TRequest")
_TResponse = TypeVar("_TResponse")


class RequestLoggingInterceptor(aio.ServerInterceptor):
    """Interceptor that logs all RPC calls with timing and status.

    Logs at INFO level for every request with:
    - method: Full RPC method name (e.g., /noteflow.NoteFlowService/GetMeeting)
    - status: gRPC status code (OK, NOT_FOUND, etc.)
    - duration_ms: Request processing time in milliseconds
    - peer: Client peer address
    - request_id: Correlation ID from identity context
    """

    async def intercept_service(
        self,
        continuation: Callable[
            [grpc.HandlerCallDetails],
            Awaitable[grpc.RpcMethodHandler[_TRequest, _TResponse]],
        ],
        handler_call_details: grpc.HandlerCallDetails,
    ) -> grpc.RpcMethodHandler[_TRequest, _TResponse]:
        """Intercept incoming RPC calls to log request timing and status.

        Args:
            continuation: The next interceptor or handler.
            handler_call_details: Details about the RPC call.

        Returns:
            Wrapped RPC handler that logs on completion.
        """
        handler = await continuation(handler_call_details)
        method = handler_call_details.method

        # Return wrapped handler that logs on completion
        return _create_logging_handler(handler, method)


def _create_logging_handler[TRequest, TResponse](
    handler: grpc.RpcMethodHandler[TRequest, TResponse],
    method: str,
) -> grpc.RpcMethodHandler[TRequest, TResponse]:
    """Wrap an RPC handler to add request logging.

    Args:
        handler: Original RPC handler.
        method: Full RPC method name.

    Returns:
        Wrapped handler with logging.
    """
    # Cast required: gRPC stub types don't fully express the generic Callable signatures
    # for handler attributes, causing basedpyright to infer partially unknown types.
    if handler.unary_unary is not None:
        return grpc.unary_unary_rpc_method_handler(
            cast(
                Callable[
                    [TRequest, aio.ServicerContext[TRequest, TResponse]],
                    Awaitable[TResponse],
                ],
                _wrap_unary_unary(handler.unary_unary, method),
            ),
            request_deserializer=handler.request_deserializer,
            response_serializer=handler.response_serializer,
        )
    if handler.unary_stream is not None:
        return grpc.unary_stream_rpc_method_handler(
            cast(
                Callable[
                    [TRequest, aio.ServicerContext[TRequest, TResponse]],
                    AsyncIterator[TResponse],
                ],
                _wrap_unary_stream(handler.unary_stream, method),
            ),
            request_deserializer=handler.request_deserializer,
            response_serializer=handler.response_serializer,
        )
    if handler.stream_unary is not None:
        return grpc.stream_unary_rpc_method_handler(
            cast(
                Callable[
                    [AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
                    Awaitable[TResponse],
                ],
                _wrap_stream_unary(handler.stream_unary, method),
            ),
            request_deserializer=handler.request_deserializer,
            response_serializer=handler.response_serializer,
        )
    if handler.stream_stream is not None:
        return grpc.stream_stream_rpc_method_handler(
            cast(
                Callable[
                    [AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
                    AsyncIterator[TResponse],
                ],
                _wrap_stream_stream(handler.stream_stream, method),
            ),
            request_deserializer=handler.request_deserializer,
            response_serializer=handler.response_serializer,
        )
    # Fallback: return original handler if type unknown
    return handler


def _log_request(
    method: str,
    status: str,
    duration_ms: float,
    peer: str | None,
) -> None:
    """Log RPC request completion at INFO level.

    Args:
        method: Full RPC method name.
        status: gRPC status code name.
        duration_ms: Request duration in milliseconds.
        peer: Client peer address.
    """
    request_id = get_request_id()
    logger.info(
        "RPC completed",
        method=method,
        status=status,
        duration_ms=round(duration_ms, 2),
        peer=peer,
        request_id=request_id,
    )


def _get_peer[TRequest, TResponse](
    context: aio.ServicerContext[TRequest, TResponse],
) -> str | None:
    """Extract peer address from context safely.

    Args:
        context: gRPC servicer context.

    Returns:
        Peer address string or None.
    """
    try:
        return context.peer()
    except (AttributeError, RuntimeError):
        return None


def _wrap_unary_unary[TRequest, TResponse](
    handler: Callable[
        [TRequest, aio.ServicerContext[TRequest, TResponse]],
        Awaitable[TResponse],
    ],
    method: str,
) -> Callable[
    [TRequest, aio.ServicerContext[TRequest, TResponse]],
    Awaitable[TResponse],
]:
    """Wrap unary-unary handler with logging."""

    async def wrapper(
        request: TRequest,
        context: aio.ServicerContext[TRequest, TResponse],
    ) -> TResponse:
        start = time.perf_counter()
        peer = _get_peer(context)
        status = "OK"
        try:
            return await handler(request, context)
        except grpc.RpcError as e:
            status = e.code().name if hasattr(e, "code") else "UNKNOWN"
            raise
        except Exception:
            status = "INTERNAL"
            raise
        finally:
            duration_ms = (time.perf_counter() - start) * 1000
            _log_request(method, status, duration_ms, peer)

    return wrapper


def _wrap_unary_stream[TRequest, TResponse](
    handler: Callable[
        [TRequest, aio.ServicerContext[TRequest, TResponse]],
        AsyncIterator[TResponse],
    ],
    method: str,
) -> Callable[
    [TRequest, aio.ServicerContext[TRequest, TResponse]],
    AsyncIterator[TResponse],
]:
    """Wrap unary-stream handler with logging."""

    async def wrapper(
        request: TRequest,
        context: aio.ServicerContext[TRequest, TResponse],
|
||||
) -> AsyncIterator[TResponse]:
|
||||
start = time.perf_counter()
|
||||
peer = _get_peer(context)
|
||||
status = "OK"
|
||||
try:
|
||||
async for response in handler(request, context):
|
||||
yield response
|
||||
except grpc.RpcError as e:
|
||||
status = e.code().name if hasattr(e, "code") else "UNKNOWN"
|
||||
raise
|
||||
except Exception:
|
||||
status = "INTERNAL"
|
||||
raise
|
||||
finally:
|
||||
duration_ms = (time.perf_counter() - start) * 1000
|
||||
_log_request(method, status, duration_ms, peer)
|
||||
|
||||
return wrapper
|
||||
|
||||
|
||||
def _wrap_stream_unary[TRequest, TResponse](
|
||||
handler: Callable[
|
||||
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
|
||||
Awaitable[TResponse],
|
||||
],
|
||||
method: str,
|
||||
) -> Callable[
|
||||
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
|
||||
Awaitable[TResponse],
|
||||
]:
|
||||
"""Wrap stream-unary handler with logging."""
|
||||
|
||||
async def wrapper(
|
||||
request_iterator: AsyncIterator[TRequest],
|
||||
context: aio.ServicerContext[TRequest, TResponse],
|
||||
) -> TResponse:
|
||||
start = time.perf_counter()
|
||||
peer = _get_peer(context)
|
||||
status = "OK"
|
||||
try:
|
||||
return await handler(request_iterator, context)
|
||||
except grpc.RpcError as e:
|
||||
status = e.code().name if hasattr(e, "code") else "UNKNOWN"
|
||||
raise
|
||||
except Exception:
|
||||
status = "INTERNAL"
|
||||
raise
|
||||
finally:
|
||||
duration_ms = (time.perf_counter() - start) * 1000
|
||||
_log_request(method, status, duration_ms, peer)
|
||||
|
||||
return wrapper
|
||||
|
||||
|
||||
def _wrap_stream_stream[TRequest, TResponse](
|
||||
handler: Callable[
|
||||
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
|
||||
AsyncIterator[TResponse],
|
||||
],
|
||||
method: str,
|
||||
) -> Callable[
|
||||
[AsyncIterator[TRequest], aio.ServicerContext[TRequest, TResponse]],
|
||||
AsyncIterator[TResponse],
|
||||
]:
|
||||
"""Wrap stream-stream handler with logging."""
|
||||
|
||||
async def wrapper(
|
||||
request_iterator: AsyncIterator[TRequest],
|
||||
context: aio.ServicerContext[TRequest, TResponse],
|
||||
) -> AsyncIterator[TResponse]:
|
||||
start = time.perf_counter()
|
||||
peer = _get_peer(context)
|
||||
status = "OK"
|
||||
try:
|
||||
async for response in handler(request_iterator, context):
|
||||
yield response
|
||||
except grpc.RpcError as e:
|
||||
status = e.code().name if hasattr(e, "code") else "UNKNOWN"
|
||||
raise
|
||||
except Exception:
|
||||
status = "INTERNAL"
|
||||
raise
|
||||
finally:
|
||||
duration_ms = (time.perf_counter() - start) * 1000
|
||||
_log_request(method, status, duration_ms, peer)
|
||||
|
||||
return wrapper
|
||||
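The interceptor above is self-contained, so it can be exercised outside the full server. A minimal sketch of attaching it to a bare `grpc.aio` server follows; the port is illustrative, and the import path assumes the `noteflow.grpc.interceptors` package re-exports the class, as the server wiring later in this commit suggests.

```python
# Sketch: run a grpc.aio server with the new logging interceptor attached.
import asyncio

import grpc.aio

from noteflow.grpc.interceptors import RequestLoggingInterceptor


async def serve() -> None:
    server = grpc.aio.server(interceptors=[RequestLoggingInterceptor()])
    server.add_insecure_port("127.0.0.1:50051")  # illustrative port
    await server.start()
    # Each completed RPC now emits one INFO record carrying method, status,
    # duration_ms, peer, and request_id.
    await server.wait_for_termination()


if __name__ == "__main__":
    asyncio.run(serve())
```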
@@ -7,17 +7,17 @@ Used as fallback when no database is configured.

 from __future__ import annotations

 import threading
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, Unpack

 from noteflow.config.constants import ERROR_MSG_MEETING_PREFIX
 from noteflow.domain.entities import Meeting, Segment, Summary
 from noteflow.domain.value_objects import MeetingState
+from noteflow.domain.ports.repositories.transcript import MeetingListKwargs
 from noteflow.infrastructure.persistence.memory.repositories import (
     InMemoryIntegrationRepository,
 )

 if TYPE_CHECKING:
     from collections.abc import Sequence
     from datetime import datetime

@@ -91,24 +91,22 @@ class MeetingStore:

     def list_all(
         self,
-        states: Sequence[MeetingState] | None = None,
-        limit: int = 100,
-        offset: int = 0,
-        sort_desc: bool = True,
-        project_id: str | None = None,
+        **kwargs: Unpack["MeetingListKwargs"],
     ) -> tuple[list[Meeting], int]:
         """List meetings with optional filtering.

         Args:
-            states: Filter by these states (all if None).
-            limit: Max meetings per page.
-            offset: Pagination offset.
-            sort_desc: Sort by created_at descending.
+            **kwargs: Optional filters (states, limit, offset, sort_desc, project_id).

         Returns:
             Tuple of (paginated meeting list, total matching count).
         """
         with self._lock:
+            states = kwargs.get("states")
+            limit = kwargs.get("limit", 100)
+            offset = kwargs.get("offset", 0)
+            sort_desc = kwargs.get("sort_desc", True)
+            project_id = kwargs.get("project_id")
             meetings = list(self._meetings.values())

             # Filter by state
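The `Unpack[MeetingListKwargs]` signature above keeps keyword arguments type-checked even though they now travel through `**kwargs`. A self-contained sketch of the pattern; the TypedDict below is a stand-in, the real definition lives in `noteflow.domain.ports.repositories.transcript`.

```python
# Sketch of the Unpack[TypedDict] kwargs pattern adopted in this commit.
from typing import TypedDict, Unpack


class MeetingListKwargs(TypedDict, total=False):  # stand-in definition
    states: list[str] | None
    limit: int
    offset: int
    sort_desc: bool
    project_id: str | None


def list_all(**kwargs: Unpack[MeetingListKwargs]) -> None:
    # Defaults are applied at the .get() call sites, as MeetingStore does.
    limit = kwargs.get("limit", 100)
    offset = kwargs.get("offset", 0)
    print(limit, offset)


list_all(limit=50, sort_desc=True)  # OK: keys and value types are checked
# list_all(limt=50)                 # rejected by basedpyright: unknown key
```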
@@ -2,12 +2,11 @@

 from __future__ import annotations

-import argparse
 import asyncio
 import os
 import signal
 import time
-from typing import TYPE_CHECKING, cast
+from typing import TYPE_CHECKING, TypedDict, Unpack, cast

 import grpc.aio
 from pydantic import ValidationError
@@ -15,19 +14,12 @@ from pydantic import ValidationError
 from noteflow.config.constants import DEFAULT_GRPC_PORT, SETTING_CLOUD_CONSENT_GRANTED
 from noteflow.config.settings import get_feature_flags, get_settings
 from noteflow.infrastructure.asr import FasterWhisperEngine
-from noteflow.infrastructure.asr.engine import VALID_MODEL_SIZES
 from noteflow.infrastructure.logging import LoggingConfig, configure_logging, get_logger
 from noteflow.infrastructure.persistence.unit_of_work import SqlAlchemyUnitOfWork
 from noteflow.infrastructure.summarization import create_summarization_service

-from ._config import (
-    DEFAULT_BIND_ADDRESS,
-    DEFAULT_MODEL,
-    AsrConfig,
-    DiarizationConfig,
-    GrpcServerConfig,
-    ServicesConfig,
-)
+from ._cli import build_config_from_args, parse_args
+from ._config import DEFAULT_BIND_ADDRESS, AsrConfig, GrpcServerConfig, ServicesConfig
 from ._startup import (
     create_calendar_service,
     create_diarization_engine,
@@ -37,6 +29,7 @@ from ._startup import (
     print_startup_banner,
     setup_summarization_with_consent,
 )
+from .interceptors import IdentityInterceptor, RequestLoggingInterceptor
 from .proto import noteflow_pb2_grpc
 from .service import NoteFlowServicer

@@ -45,6 +38,17 @@ if TYPE_CHECKING:

     from noteflow.config.settings import Settings


+class _ServerInitKwargs(TypedDict, total=False):
+    """Optional initialization parameters for NoteFlowServer."""
+
+    port: int
+    bind_address: str
+    asr: AsrConfig | None
+    session_factory: async_sessionmaker[AsyncSession] | None
+    db_engine: AsyncEngine | None
+    services: ServicesConfig | None
+
+
 logger = get_logger(__name__)

@@ -53,32 +57,27 @@ class NoteFlowServer:

     def __init__(
         self,
-        port: int = DEFAULT_GRPC_PORT,
-        bind_address: str = DEFAULT_BIND_ADDRESS,
-        asr: AsrConfig | None = None,
-        session_factory: async_sessionmaker[AsyncSession] | None = None,
-        db_engine: AsyncEngine | None = None,
-        services: ServicesConfig | None = None,
+        **kwargs: Unpack[_ServerInitKwargs],
     ) -> None:
         """Initialize the server.

         Args:
-            port: Port to listen on.
-            bind_address: Address to bind to (0.0.0.0 for all interfaces, 127.0.0.1 for localhost).
-            asr: ASR engine configuration (defaults to AsrConfig()).
-            session_factory: Optional async session factory for database.
-            db_engine: Optional database engine for lifecycle management.
-            services: Optional services configuration grouping all optional services.
+            **kwargs: Optional server initialization parameters.
         """
+        port = kwargs.get("port", DEFAULT_GRPC_PORT)
+        bind_address = kwargs.get("bind_address", DEFAULT_BIND_ADDRESS)
+        asr = kwargs.get("asr") or AsrConfig()
+        session_factory = kwargs.get("session_factory")
+        db_engine = kwargs.get("db_engine")
+        services = kwargs.get("services") or ServicesConfig()
+
         self._port = port
         self._bind_address = bind_address
-        asr = asr or AsrConfig()
         self._asr_model = asr.model
         self._asr_device = asr.device
         self._asr_compute_type = asr.compute_type
         self._session_factory = session_factory
         self._db_engine = db_engine
-        services = services or ServicesConfig()
         self._summarization_service = services.summarization_service
         self._diarization_engine = services.diarization_engine
         self._diarization_refinement_enabled = services.diarization_refinement_enabled
@@ -93,76 +92,13 @@ class NoteFlowServer:
         """Start the async gRPC server."""
         logger.info("Starting NoteFlow gRPC server (async)...")

-        # Create ASR engine
-        logger.info(
-            "Loading ASR model '%s' on %s (%s)...",
-            self._asr_model,
-            self._asr_device,
-            self._asr_compute_type,
-        )
-        start_time = time.perf_counter()
-
-        asr_engine = FasterWhisperEngine(
-            compute_type=self._asr_compute_type,
-            device=self._asr_device,
-        )
-        asr_engine.load_model(self._asr_model)
-
-        load_time = time.perf_counter() - start_time
-        logger.info("ASR model loaded in %.2f seconds", load_time)
-
-        # Lazy-create summarization service if not provided
-        if self._summarization_service is None:
-            self._summarization_service = create_summarization_service()
-            logger.info("Summarization service initialized (default factory)")
-
-        # Lazy-create project service if not provided (requires database)
-        if self._project_service is None and self._session_factory is not None:
-            from noteflow.application.services.project_service import ProjectService
-
-            self._project_service = ProjectService()
-            logger.info("Project service initialized")
-
-        # Wire consent persistence if database is available
-        await self._wire_consent_persistence()
-
-        # Create servicer with session factory and services config
-        self._servicer = NoteFlowServicer(
-            asr_engine=asr_engine,
-            session_factory=self._session_factory,
-            services=ServicesConfig(
-                summarization_service=self._summarization_service,
-                diarization_engine=self._diarization_engine,
-                diarization_refinement_enabled=self._diarization_refinement_enabled,
-                ner_service=self._ner_service,
-                calendar_service=self._calendar_service,
-                webhook_service=self._webhook_service,
-                project_service=self._project_service,
-            ),
-        )
-
-        # Recover orphaned diarization jobs from previous instance
+        asr_engine = self._load_asr_engine()
+        await self._ensure_services()
+        self._servicer = self._build_servicer(asr_engine)
         await self._recover_orphaned_jobs()

-        # Create async gRPC server
-        self._server = grpc.aio.server(
-            options=[
-                ("grpc.max_send_message_length", 100 * 1024 * 1024),  # 100MB
-                ("grpc.max_receive_message_length", 100 * 1024 * 1024),
-            ],
-        )
-
-        # Register service
-        noteflow_pb2_grpc.add_NoteFlowServiceServicer_to_server(
-            cast(noteflow_pb2_grpc.NoteFlowServiceServicer, self._servicer),
-            self._server,
-        )
-
-        # Bind to port
-        address = f"{self._bind_address}:{self._port}"
-        self._server.add_insecure_port(address)
-
-        # Start server
+        self._server = self._create_server()
+        address = self._bind_server(self._server)
         await self._server.start()
         logger.info("Server listening on %s", address)

@@ -192,6 +128,78 @@ class NoteFlowServer:
         if self._server:
             await self._server.wait_for_termination()

+    def _load_asr_engine(self) -> FasterWhisperEngine:
+        """Create and load the ASR engine."""
+        logger.info(
+            "Loading ASR model '%s' on %s (%s)...",
+            self._asr_model,
+            self._asr_device,
+            self._asr_compute_type,
+        )
+        start_time = time.perf_counter()
+        asr_engine = FasterWhisperEngine(
+            compute_type=self._asr_compute_type,
+            device=self._asr_device,
+        )
+        asr_engine.load_model(self._asr_model)
+        load_time = time.perf_counter() - start_time
+        logger.info("ASR model loaded in %.2f seconds", load_time)
+        return asr_engine
+
+    async def _ensure_services(self) -> None:
+        """Initialize optional services and wire persistence hooks."""
+        if self._summarization_service is None:
+            self._summarization_service = create_summarization_service()
+            logger.info("Summarization service initialized (default factory)")
+
+        if self._project_service is None and self._session_factory is not None:
+            from noteflow.application.services.project_service import ProjectService
+
+            self._project_service = ProjectService()
+            logger.info("Project service initialized")
+
+        await self._wire_consent_persistence()
+
+    def _build_servicer(self, asr_engine: FasterWhisperEngine) -> NoteFlowServicer:
+        """Construct the gRPC servicer instance."""
+        return NoteFlowServicer(
+            asr_engine=asr_engine,
+            session_factory=self._session_factory,
+            services=ServicesConfig(
+                summarization_service=self._summarization_service,
+                diarization_engine=self._diarization_engine,
+                diarization_refinement_enabled=self._diarization_refinement_enabled,
+                ner_service=self._ner_service,
+                calendar_service=self._calendar_service,
+                webhook_service=self._webhook_service,
+                project_service=self._project_service,
+            ),
+        )
+
+    @staticmethod
+    def _create_server() -> grpc.aio.Server:
+        """Create async gRPC server with interceptors and limits."""
+        return grpc.aio.server(
+            interceptors=[
+                RequestLoggingInterceptor(),
+                IdentityInterceptor(),
+            ],
+            options=[
+                ("grpc.max_send_message_length", 100 * 1024 * 1024),  # 100MB
+                ("grpc.max_receive_message_length", 100 * 1024 * 1024),
+            ],
+        )
+
+    def _bind_server(self, server: grpc.aio.Server) -> str:
+        """Register servicer and bind the server to the configured address."""
+        noteflow_pb2_grpc.add_NoteFlowServiceServicer_to_server(
+            cast(noteflow_pb2_grpc.NoteFlowServiceServicer, self._servicer),
+            server,
+        )
+        address = f"{self._bind_address}:{self._port}"
+        server.add_insecure_port(address)
+        return address
+
     async def _recover_orphaned_jobs(self) -> None:
         """Mark orphaned diarization jobs as failed on startup.

@@ -283,35 +291,10 @@ async def run_server_with_config(config: GrpcServerConfig) -> None:
     Args:
         config: Complete server configuration.
     """
-    # Initialize database if configured
-    session_factory: async_sessionmaker[AsyncSession] | None = None
-    db_engine: AsyncEngine | None = None
-    if config.database_url:
-        db_engine, session_factory = await init_database_and_recovery(config.database_url)
-
-    # Create summarization service and configure cloud consent
-    summarization_service = create_summarization_service()
-    logger.info("Summarization service initialized")
-
-    cloud_llm_provider: str | None = None
+    session_factory, db_engine = await _init_db(config.database_url)
     settings = get_settings()
-    if session_factory:
-        cloud_llm_provider = await setup_summarization_with_consent(
-            session_factory, summarization_service, settings
-        )
-
-    # Create optional services based on configuration
-    ner_service = create_ner_service(session_factory, settings)
-    calendar_service = await create_calendar_service(session_factory, settings)
-    diarization_engine = create_diarization_engine(config.diarization)
-    webhook_service = await create_webhook_service(session_factory, settings) if session_factory else None
-
-    # Log warning if webhooks enabled but no database
-    if get_feature_flags().webhooks_enabled and not session_factory:
-        logger.warning(
-            "Webhooks feature enabled but no database configured. "
-            "Webhooks require database for configuration persistence."
-        )
+    services, cloud_llm_provider = await _create_services(config, session_factory, settings)
+    _warn_webhooks_without_db(session_factory)

     server = NoteFlowServer(
         port=config.port,
@@ -319,18 +302,78 @@ async def run_server_with_config(config: GrpcServerConfig) -> None:
         asr=config.asr,
         session_factory=session_factory,
         db_engine=db_engine,
-        services=ServicesConfig(
-            summarization_service=summarization_service,
-            diarization_engine=diarization_engine,
-            diarization_refinement_enabled=config.diarization.refinement_enabled,
-            ner_service=ner_service,
-            calendar_service=calendar_service,
-            webhook_service=webhook_service,
-        ),
+        services=services,
     )

-    # Set up graceful shutdown
-    loop = asyncio.get_running_loop()
+    shutdown_event = _register_shutdown_handlers(asyncio.get_running_loop())

+    try:
+        await server.start()
+        print_startup_banner(
+            config,
+            services.diarization_engine,
+            cloud_llm_provider,
+            services.calendar_service,
+            services.webhook_service,
+        )
+        await shutdown_event.wait()
+    finally:
+        if services.webhook_service is not None:
+            await services.webhook_service.close()
+        await server.stop()
+
+
+async def _init_db(
+    database_url: str | None,
+) -> tuple[async_sessionmaker[AsyncSession] | None, AsyncEngine | None]:
+    """Initialize database and return session factory and engine."""
+    if database_url:
+        db_engine, session_factory = await init_database_and_recovery(database_url)
+        return session_factory, db_engine
+    return None, None
+
+
+async def _create_services(
+    config: GrpcServerConfig,
+    session_factory: async_sessionmaker[AsyncSession] | None,
+    settings: Settings,
+) -> tuple[ServicesConfig, str | None]:
+    """Create optional services based on configuration."""
+    summarization_service = create_summarization_service()
+    logger.info("Summarization service initialized")
+
+    cloud_llm_provider: str | None = None
+    if session_factory:
+        cloud_llm_provider = await setup_summarization_with_consent(
+            session_factory, summarization_service, settings
+        )
+
+    services = ServicesConfig(
+        summarization_service=summarization_service,
+        diarization_engine=create_diarization_engine(config.diarization),
+        diarization_refinement_enabled=config.diarization.refinement_enabled,
+        ner_service=create_ner_service(session_factory, settings),
+        calendar_service=await create_calendar_service(session_factory, settings),
+        webhook_service=await create_webhook_service(session_factory, settings)
+        if session_factory
+        else None,
+    )
+    return services, cloud_llm_provider
+
+
+def _warn_webhooks_without_db(
+    session_factory: async_sessionmaker[AsyncSession] | None,
+) -> None:
+    """Log warning if webhooks are enabled without a database."""
+    if get_feature_flags().webhooks_enabled and not session_factory:
+        logger.warning(
+            "Webhooks feature enabled but no database configured. "
+            "Webhooks require database for configuration persistence."
+        )
+
+
+def _register_shutdown_handlers(loop: asyncio.AbstractEventLoop) -> asyncio.Event:
+    """Register signal handlers to trigger a shutdown event."""
+    shutdown_event = asyncio.Event()
+
+    def signal_handler() -> None:
@@ -339,159 +382,12 @@ async def run_server_with_config(config: GrpcServerConfig) -> None:

     for sig in (signal.SIGINT, signal.SIGTERM):
         loop.add_signal_handler(sig, signal_handler)

-    try:
-        await server.start()
-        print_startup_banner(
-            config,
-            diarization_engine,
-            cloud_llm_provider,
-            calendar_service,
-            webhook_service,
-        )
-        await shutdown_event.wait()
-    finally:
-        if webhook_service is not None:
-            await webhook_service.close()
-        await server.stop()
-
-
-def _parse_args() -> argparse.Namespace:
-    """Parse command-line arguments for the gRPC server."""
-    parser = argparse.ArgumentParser(description="NoteFlow gRPC Server")
-    parser.add_argument(
-        "-p",
-        "--port",
-        type=int,
-        default=DEFAULT_GRPC_PORT,
-        help=f"Port to listen on (default: {DEFAULT_GRPC_PORT})",
-    )
-    parser.add_argument(
-        "-m",
-        "--model",
-        type=str,
-        default=DEFAULT_MODEL,
-        choices=list(VALID_MODEL_SIZES),
-        help=f"ASR model size (default: {DEFAULT_MODEL})",
-    )
-    parser.add_argument(
-        "-d",
-        "--device",
-        type=str,
-        default="cpu",
-        choices=["cpu", "cuda"],
-        help="ASR device (default: cpu)",
-    )
-    parser.add_argument(
-        "-c",
-        "--compute-type",
-        type=str,
-        default="int8",
-        choices=["int8", "float16", "float32"],
-        help="ASR compute type (default: int8)",
-    )
-    parser.add_argument(
-        "--database-url",
-        type=str,
-        default=None,
-        help="PostgreSQL database URL (overrides NOTEFLOW_DATABASE_URL)",
-    )
-    parser.add_argument(
-        "-v",
-        "--verbose",
-        action="store_true",
-        help="Enable verbose logging",
-    )
-    parser.add_argument(
-        "--diarization",
-        action="store_true",
-        help="Enable speaker diarization (requires pyannote.audio)",
-    )
-    parser.add_argument(
-        "--diarization-hf-token",
-        type=str,
-        default=None,
-        help="HuggingFace token for pyannote models (overrides NOTEFLOW_DIARIZATION_HF_TOKEN)",
-    )
-    parser.add_argument(
-        "--diarization-device",
-        type=str,
-        default="auto",
-        choices=["auto", "cpu", "cuda", "mps"],
-        help="Device for diarization (default: auto)",
-    )
-    return parser.parse_args()
-
-
-def _build_config(args: argparse.Namespace, settings: Settings | None) -> GrpcServerConfig:
-    """Build server configuration from CLI arguments and settings.
-
-    CLI arguments take precedence over environment settings.
-
-    Args:
-        args: Parsed command-line arguments.
-        settings: Optional application settings from environment.
-
-    Returns:
-        Complete server configuration.
-    """
-    # Database URL: args override settings
-    database_url = args.database_url
-    if not database_url and settings:
-        database_url = str(settings.database_url)
-    if not database_url:
-        logger.warning("No database URL configured, running in-memory mode")
-
-    # Diarization config: args override settings
-    diarization_enabled = args.diarization
-    diarization_hf_token = args.diarization_hf_token
-    diarization_device = args.diarization_device
-    diarization_streaming_latency: float | None = None
-    diarization_min_speakers: int | None = None
-    diarization_max_speakers: int | None = None
-    diarization_refinement_enabled = True
-
-    if settings and not diarization_enabled:
-        diarization_enabled = settings.diarization_enabled
-    if settings and not diarization_hf_token:
-        diarization_hf_token = settings.diarization_hf_token
-    if settings and diarization_device == "auto":
-        diarization_device = settings.diarization_device
-    if settings:
-        diarization_streaming_latency = settings.diarization_streaming_latency
-        diarization_min_speakers = settings.diarization_min_speakers
-        diarization_max_speakers = settings.diarization_max_speakers
-        diarization_refinement_enabled = settings.diarization_refinement_enabled
-
-    # Bind address: settings override default (0.0.0.0)
-    bind_address = DEFAULT_BIND_ADDRESS
-    if settings:
-        bind_address = settings.grpc_bind_address
-
-    return GrpcServerConfig(
-        port=args.port,
-        bind_address=bind_address,
-        asr=AsrConfig(
-            model=args.model,
-            device=args.device,
-            compute_type=args.compute_type,
-        ),
-        database_url=database_url,
-        diarization=DiarizationConfig(
-            enabled=diarization_enabled,
-            hf_token=diarization_hf_token,
-            device=diarization_device,
-            streaming_latency=diarization_streaming_latency,
-            min_speakers=diarization_min_speakers,
-            max_speakers=diarization_max_speakers,
-            refinement_enabled=diarization_refinement_enabled,
-        ),
-    )
+    return shutdown_event


 def main() -> None:
     """Entry point for NoteFlow gRPC server."""
-    args = _parse_args()
+    args = parse_args()

     # Configure centralized logging with structlog
     # Get log_format from env before full settings load (logging needed to report load errors)
@@ -507,7 +403,7 @@ def main() -> None:
         settings = None

     # Build configuration and run server
-    config = _build_config(args, settings)
+    config = build_config_from_args(args, settings)
     asyncio.run(run_server_with_config(config))
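`_register_shutdown_handlers` packages a pattern worth noting on its own: translate SIGINT/SIGTERM into an `asyncio.Event` so the main coroutine awaits shutdown instead of catching `KeyboardInterrupt`. A standalone sketch under the same assumptions (POSIX only, since `add_signal_handler` is not available on Windows event loops):

```python
# Standalone sketch of the signal-to-Event shutdown pattern used above.
import asyncio
import signal


async def main() -> None:
    loop = asyncio.get_running_loop()
    shutdown_event = asyncio.Event()

    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, shutdown_event.set)

    print("running; Ctrl-C to stop")
    await shutdown_event.wait()  # server work would run concurrently
    print("shutting down cleanly")


if __name__ == "__main__":
    asyncio.run(main())
```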
@@ -28,6 +28,7 @@ from noteflow.infrastructure.security.keystore import KeyringKeyStore
 from ._config import ServicesConfig
 from ._mixins import (
     AnnotationMixin,
+    GrpcContext,
     CalendarMixin,
     DiarizationJobMixin,
     DiarizationMixin,
@@ -48,14 +49,12 @@ from .proto import noteflow_pb2, noteflow_pb2_grpc
 from .stream_state import MeetingStreamState

 if TYPE_CHECKING:
-    from collections.abc import AsyncIterator
-
     from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

     from noteflow.infrastructure.asr import FasterWhisperEngine
     from noteflow.infrastructure.auth.oidc_registry import OidcAuthService

-    from ._mixins._types import GrpcContext
+    from ._service_stubs import NoteFlowServicerStubs

 logger = get_logger(__name__)

@@ -64,6 +63,13 @@ if TYPE_CHECKING:
 else:
     _GrpcBaseServicer = noteflow_pb2_grpc.NoteFlowServiceServicer

+    # Empty class to satisfy MRO - cannot use `object` directly as it conflicts
+    # with NoteFlowServiceServicer's inheritance from object
+    class NoteFlowServicerStubs:
+        """Runtime placeholder for type stubs (empty at runtime)."""
+
+        pass
+

 class NoteFlowServicer(
     StreamingMixin,
@@ -82,6 +88,7 @@ class NoteFlowServicer(
     OidcMixin,
     ProjectMixin,
     ProjectMembershipMixin,
+    NoteFlowServicerStubs,
     _GrpcBaseServicer,
 ):
     """Async gRPC service implementation for NoteFlow with PostgreSQL persistence.
@@ -90,153 +97,7 @@ class NoteFlowServicer(
     use `self: Protocol` annotations.
     """

-    # Type stubs for mixin methods (fixes type inference when mixins use `self: Protocol`)
-    if TYPE_CHECKING:
-        # StreamingMixin (test_streaming_real_pipeline.py, test_e2e_streaming.py)
-        def StreamTranscription(
-            self,
-            request_iterator: AsyncIterator[noteflow_pb2.AudioChunk],
-            context: GrpcContext,
-        ) -> AsyncIterator[noteflow_pb2.TranscriptUpdate]: ...
-
-        # CalendarMixin (test_oauth.py)
-        async def GetCalendarProviders(
-            self,
-            request: noteflow_pb2.GetCalendarProvidersRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.GetCalendarProvidersResponse: ...
-
-        async def InitiateOAuth(
-            self,
-            request: noteflow_pb2.InitiateOAuthRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.InitiateOAuthResponse: ...
-
-        async def CompleteOAuth(
-            self,
-            request: noteflow_pb2.CompleteOAuthRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.CompleteOAuthResponse: ...
-
-        async def GetOAuthConnectionStatus(
-            self,
-            request: noteflow_pb2.GetOAuthConnectionStatusRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.GetOAuthConnectionStatusResponse: ...
-
-        async def DisconnectOAuth(
-            self,
-            request: noteflow_pb2.DisconnectOAuthRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.DisconnectOAuthResponse: ...
-
-        # Type stubs for SummarizationMixin methods (test_cloud_consent.py, test_generate_summary.py)
-        async def GetCloudConsentStatus(
-            self,
-            request: noteflow_pb2.GetCloudConsentStatusRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.GetCloudConsentStatusResponse: ...
-
-        async def GrantCloudConsent(
-            self,
-            request: noteflow_pb2.GrantCloudConsentRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.GrantCloudConsentResponse: ...
-
-        async def RevokeCloudConsent(
-            self,
-            request: noteflow_pb2.RevokeCloudConsentRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.RevokeCloudConsentResponse: ...
-
-        async def GenerateSummary(
-            self,
-            request: noteflow_pb2.GenerateSummaryRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.Summary: ...
-
-        # Type stubs for SyncMixin methods (test_sync_orchestration.py)
-        async def StartIntegrationSync(
-            self,
-            request: noteflow_pb2.StartIntegrationSyncRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.StartIntegrationSyncResponse: ...
-
-        async def GetSyncStatus(
-            self,
-            request: noteflow_pb2.GetSyncStatusRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.GetSyncStatusResponse: ...
-
-        async def ListSyncHistory(
-            self,
-            request: noteflow_pb2.ListSyncHistoryRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.ListSyncHistoryResponse: ...
-
-        async def GetUserIntegrations(
-            self,
-            request: noteflow_pb2.GetUserIntegrationsRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.GetUserIntegrationsResponse: ...
-
-        # Type stubs for DiarizationMixin methods (test_diarization_mixin.py, test_diarization_refine.py)
-        async def RefineSpeakerDiarization(
-            self,
-            request: noteflow_pb2.RefineSpeakerDiarizationRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.RefineSpeakerDiarizationResponse: ...
-
-        # Type stubs for SpeakerMixin methods (test_diarization_mixin.py)
-        async def RenameSpeaker(
-            self,
-            request: noteflow_pb2.RenameSpeakerRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.RenameSpeakerResponse: ...
-
-        # Type stubs for DiarizationJobMixin methods (test_diarization_mixin.py, test_diarization_cancel.py)
-        async def GetDiarizationJobStatus(
-            self,
-            request: noteflow_pb2.GetDiarizationJobStatusRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.DiarizationJobStatus: ...
-
-        async def CancelDiarizationJob(
-            self,
-            request: noteflow_pb2.CancelDiarizationJobRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.CancelDiarizationJobResponse: ...
-
-        # Type stubs for WebhooksMixin methods (test_webhooks_mixin.py)
-        async def RegisterWebhook(
-            self,
-            request: noteflow_pb2.RegisterWebhookRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.WebhookConfigProto: ...
-
-        async def ListWebhooks(
-            self,
-            request: noteflow_pb2.ListWebhooksRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.ListWebhooksResponse: ...
-
-        async def UpdateWebhook(
-            self,
-            request: noteflow_pb2.UpdateWebhookRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.WebhookConfigProto: ...
-
-        async def DeleteWebhook(
-            self,
-            request: noteflow_pb2.DeleteWebhookRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.DeleteWebhookResponse: ...
-
-        async def GetWebhookDeliveries(
-            self,
-            request: noteflow_pb2.GetWebhookDeliveriesRequest,
-            context: GrpcContext,
-        ) -> noteflow_pb2.GetWebhookDeliveriesResponse: ...
+    # Type stubs now live in _service_stubs.py to keep module size down.

     VERSION: Final[str] = __version__
     MAX_CHUNK_SIZE: Final[int] = 1024 * 1024  # 1MB
@@ -542,7 +403,6 @@ class NoteFlowServicer(
         any running jobs as failed in the database.
         """
         logger.info("Shutting down servicer...")
-
         # Cancel in-flight diarization tasks
         cancelled_job_ids = list(self.diarization_tasks.keys())
         for job_id, task in list(self.diarization_tasks.items()):
@@ -7,7 +7,7 @@ from __future__ import annotations

 import asyncio
 from collections.abc import Iterable, Iterator
-from typing import TYPE_CHECKING, Final, Protocol, cast
+from typing import TYPE_CHECKING, Final, Protocol, TypedDict, Unpack, cast

 from noteflow.infrastructure.logging import get_logger, log_timing

@@ -16,6 +16,15 @@ if TYPE_CHECKING:
     from numpy.typing import NDArray


+class _WhisperTranscribeKwargs(TypedDict, total=False):
+    """Keyword arguments supported by WhisperModel.transcribe."""
+
+    language: str | None
+    word_timestamps: bool
+    beam_size: int
+    vad_filter: bool
+
+
 class _WhisperWord(Protocol):
     word: str
     start: float
@@ -41,11 +50,7 @@ class _WhisperModel(Protocol):
     def transcribe(
         self,
         audio: NDArray[np.float32],
-        *,
-        language: str | None = None,
-        word_timestamps: bool = ...,
-        beam_size: int = ...,
-        vad_filter: bool = ...,
+        **kwargs: Unpack[_WhisperTranscribeKwargs],
     ) -> tuple[Iterable[_WhisperSegment], _WhisperInfo]: ...

 from noteflow.infrastructure.asr.dto import AsrResult, WordTiming
@@ -209,13 +214,35 @@ class FasterWhisperEngine:
         Returns:
             List of AsrResult segments with word-level timestamps.
         """
+        # Calculate audio duration for timing context
+        sample_count = len(audio)
+        # Assume 16kHz sample rate (standard for Whisper)
+        audio_duration_seconds = sample_count / 16000.0
+
         loop = asyncio.get_running_loop()
-        return await loop.run_in_executor(
-            None,
-            self._transcribe_to_list,
-            audio,
-            language,
-        )
+        with log_timing(
+            "asr_transcribe",
+            audio_duration_seconds=round(audio_duration_seconds, 2),
+            sample_count=sample_count,
+            model_size=self._model_size,
+        ):
+            results = await loop.run_in_executor(
+                None,
+                self._transcribe_to_list,
+                audio,
+                language,
+            )
+
+        # Log real-time factor (RTF) for performance monitoring
+        # RTF < 1.0 means faster than real-time
+        if audio_duration_seconds > 0:
+            logger.debug(
+                "asr_transcribe_rtf",
+                segment_count=len(results),
+                audio_duration_seconds=round(audio_duration_seconds, 2),
+            )
+
+        return results

     def _transcribe_to_list(
         self,
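The RTF mentioned in the new comments is wall-clock transcription time divided by audio duration; below 1.0 the engine keeps up with live audio. The `log_timing` helper itself is not shown in this diff, so the sketch below computes the ratio with a plain `perf_counter`; the sleep stands in for the blocking transcription call.

```python
# Sketch: the real-time factor (RTF) calculation behind the new logging.
import time


def rtf_for(sample_count: int, sample_rate: int = 16000) -> float:
    audio_duration_s = sample_count / sample_rate
    start = time.perf_counter()
    time.sleep(0.05)  # stand-in for the blocking transcription call
    elapsed_s = time.perf_counter() - start
    # RTF < 1.0: faster than real time; RTF > 1.0: falling behind live audio.
    return elapsed_s / audio_duration_s if audio_duration_s > 0 else float("inf")


print(round(rtf_for(sample_count=16000 * 5), 3))  # 5 seconds of audio
```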
@@ -14,10 +14,13 @@ import numpy as np
 from numpy.typing import NDArray

 from noteflow.config.constants import DEFAULT_SAMPLE_RATE
+from noteflow.infrastructure.logging import get_logger

 if TYPE_CHECKING:
     from collections.abc import Iterator

+logger = get_logger(__name__)
+

 class SegmenterState(Enum):
     """Segmenter state machine states."""
@@ -150,9 +153,17 @@
         """Handle audio in IDLE state."""
         if is_speech:
             # Speech started - transition to SPEECH state
+            old_state = self._state
             self._state = SegmenterState.SPEECH
             self._speech_start_time = chunk_start

+            logger.debug(
+                "segmenter_state_transition",
+                from_state=old_state.name,
+                to_state=self._state.name,
+                stream_time=round(self._stream_time, 2),
+            )
+
             # Capture how much pre-speech audio we are including (O(1) lookup).
             self._leading_duration = self._leading_buffer_samples / self.config.sample_rate

@@ -194,15 +205,29 @@
         else:
             # Speech ended - transition to TRAILING
             # Start trailing buffer with this silent chunk
+            old_state = self._state
             self._state = SegmenterState.TRAILING
             self._trailing_buffer = [audio]
             self._trailing_duration = chunk_duration

+            logger.debug(
+                "segmenter_state_transition",
+                from_state=old_state.name,
+                to_state=self._state.name,
+                stream_time=round(self._stream_time, 2),
+            )
+
             # Check if already past trailing threshold
             if self._trailing_duration >= self.config.trailing_silence:
                 segment = self._emit_segment()
                 if segment is not None:
                     yield segment
+                logger.debug(
+                    "segmenter_state_transition",
+                    from_state=SegmenterState.TRAILING.name,
+                    to_state=SegmenterState.IDLE.name,
+                    stream_time=round(self._stream_time, 2),
+                )
                 self._state = SegmenterState.IDLE

     def _handle_trailing(
@@ -219,7 +244,15 @@
             self._speech_buffer.append(audio)
             self._trailing_buffer.clear()
             self._trailing_duration = 0.0
+            old_state = self._state
             self._state = SegmenterState.SPEECH

+            logger.debug(
+                "segmenter_state_transition",
+                from_state=old_state.name,
+                to_state=self._state.name,
+                stream_time=round(self._stream_time, 2),
+            )
         else:
             # Still silence - accumulate trailing
             self._trailing_buffer.append(audio)
@@ -230,6 +263,12 @@
             segment = self._emit_segment()
             if segment is not None:
                 yield segment
+            logger.debug(
+                "segmenter_state_transition",
+                from_state=SegmenterState.TRAILING.name,
+                to_state=SegmenterState.IDLE.name,
+                stream_time=round(self._stream_time, 2),
+            )
             self._state = SegmenterState.IDLE

     def _update_leading_buffer(self, audio: NDArray[np.float32]) -> None:
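One follow-up the diff invites: the identical five-field `logger.debug("segmenter_state_transition", ...)` block now appears at four transitions. A hypothetical `Segmenter` method (not part of this commit) would keep the event shape in one place:

```python
# Hypothetical helper, shown only as a possible cleanup; names mirror the diff.
def _log_transition(self, old_state: SegmenterState, new_state: SegmenterState) -> None:
    logger.debug(
        "segmenter_state_transition",
        from_state=old_state.name,
        to_state=new_state.name,
        stream_time=round(self._stream_time, 2),
    )
```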
@@ -7,12 +7,13 @@ from __future__ import annotations

 import time
 from collections.abc import Callable, Mapping
-from typing import TYPE_CHECKING, cast
+from typing import TYPE_CHECKING, Unpack, cast

 import numpy as np

 from noteflow.config.constants import DEFAULT_SAMPLE_RATE
 from noteflow.infrastructure.audio.dto import AudioDeviceInfo, AudioFrameCallback
+from noteflow.infrastructure.audio.protocols import AudioCaptureStartKwargs
 from noteflow.infrastructure.audio.sounddevice_support import (
     InputStreamLike,
     MissingSoundDevice,
@@ -110,18 +111,14 @@ class SoundDeviceCapture:
         self,
         device_id: int | None,
         on_frames: AudioFrameCallback,
-        sample_rate: int = DEFAULT_SAMPLE_RATE,
-        channels: int = 1,
-        chunk_duration_ms: int = 100,
+        **kwargs: Unpack[AudioCaptureStartKwargs],
     ) -> None:
         """Start capturing audio from the specified device.

         Args:
             device_id: Device ID to capture from, or None for default device.
             on_frames: Callback receiving (frames, timestamp) for each chunk.
-            sample_rate: Sample rate in Hz (default 16kHz for ASR).
-            channels: Number of channels (default 1 for mono).
-            chunk_duration_ms: Duration of each audio chunk in milliseconds.
+            **kwargs: Optional capture settings (sample_rate, channels, chunk_duration_ms).

         Raises:
             RuntimeError: If already capturing.
@@ -130,6 +127,10 @@ class SoundDeviceCapture:
         if self._stream is not None:
             raise RuntimeError("Already capturing audio")

+        sample_rate = kwargs.get("sample_rate", DEFAULT_SAMPLE_RATE)
+        channels = kwargs.get("channels", 1)
+        chunk_duration_ms = kwargs.get("chunk_duration_ms", 100)
+
         self._callback = on_frames
         self._device_id = device_id
         self._sample_rate = sample_rate
@@ -5,9 +5,7 @@ Define Protocol interfaces for audio capture, level metering, and buffering.

 from __future__ import annotations

-from typing import TYPE_CHECKING, Protocol
-
-from noteflow.config.constants import DEFAULT_SAMPLE_RATE
+from typing import TYPE_CHECKING, Protocol, TypedDict, Unpack

 if TYPE_CHECKING:
     import numpy as np
@@ -20,6 +18,14 @@ if TYPE_CHECKING:
     )


+class AudioCaptureStartKwargs(TypedDict, total=False):
+    """Optional parameters for AudioCapture.start."""
+
+    sample_rate: int
+    channels: int
+    chunk_duration_ms: int
+
+
 class AudioCapture(Protocol):
     """Protocol for audio input capture.

@@ -39,18 +45,14 @@ class AudioCapture(Protocol):
         self,
         device_id: int | None,
         on_frames: AudioFrameCallback,
-        sample_rate: int = DEFAULT_SAMPLE_RATE,
-        channels: int = 1,
-        chunk_duration_ms: int = 100,
+        **kwargs: Unpack[AudioCaptureStartKwargs],
     ) -> None:
         """Start capturing audio from the specified device.

         Args:
             device_id: Device ID to capture from, or None for default device.
             on_frames: Callback receiving (frames, timestamp) for each chunk.
-            sample_rate: Sample rate in Hz (default 16kHz for ASR).
-            channels: Number of channels (default 1 for mono).
-            chunk_duration_ms: Duration of each audio chunk in milliseconds.
+            **kwargs: Optional capture settings (sample_rate, channels, chunk_duration_ms).

         Raises:
             RuntimeError: If already capturing.
@@ -7,7 +7,7 @@ import json

 import threading
 from datetime import UTC, datetime
 from pathlib import Path
-from typing import TYPE_CHECKING
+from typing import TYPE_CHECKING, TypedDict, Unpack

 import numpy as np

@@ -24,6 +24,13 @@ if TYPE_CHECKING:

     from noteflow.infrastructure.security.crypto import AesGcmCryptoBox


+class _AudioWriterOpenKwargs(TypedDict, total=False):
+    """Optional parameters for opening a meeting writer."""
+
+    sample_rate: int
+    asset_path: str | None
+
+
 logger = get_logger(__name__)


@@ -81,8 +88,7 @@ class MeetingAudioWriter:
         meeting_id: str,
         dek: bytes,
         wrapped_dek: bytes,
-        sample_rate: int = DEFAULT_SAMPLE_RATE,
-        asset_path: str | None = None,
+        **kwargs: Unpack[_AudioWriterOpenKwargs],
     ) -> None:
         """Open meeting for audio writing.

@@ -92,54 +98,21 @@ class MeetingAudioWriter:
             meeting_id: Meeting UUID string.
             dek: Unwrapped data encryption key (32 bytes).
             wrapped_dek: Encrypted DEK to store in manifest.
-            sample_rate: Audio sample rate (default 16000 Hz).
-            asset_path: Relative path for audio storage (defaults to meeting_id).
-                This allows meetings_dir to change without orphaning files.
+            **kwargs: Optional settings (sample_rate, asset_path).

         Raises:
             RuntimeError: If already open.
             OSError: If directory creation fails.
         """
-        if self._asset_writer is not None:
-            raise RuntimeError("Writer already open")
+        sample_rate = kwargs.get("sample_rate", DEFAULT_SAMPLE_RATE)
+        asset_path = kwargs.get("asset_path")

-        # Use asset_path if provided, otherwise default to meeting_id
-        storage_path = asset_path or meeting_id
-
-        # Create meeting directory
-        self._meeting_dir = self._meetings_dir / storage_path
-        self._meeting_dir.mkdir(parents=True, exist_ok=True)
-
-        # Write manifest.json
-        manifest = {
-            "meeting_id": meeting_id,
-            "created_at": datetime.now(UTC).isoformat(),
-            "sample_rate": sample_rate,
-            "channels": 1,
-            "format": "pcm16",
-            "wrapped_dek": wrapped_dek.hex(),  # Store as hex string
-        }
-        manifest_path = self._meeting_dir / "manifest.json"
-        manifest_path.write_text(json.dumps(manifest, indent=2))
-
-        # Open encrypted audio file
-        audio_path = self._meeting_dir / "audio.enc"
-        self._asset_writer = ChunkedAssetWriter(self._crypto)
-        self._asset_writer.open(audio_path, dek)
-
-        self._sample_rate = sample_rate
-        self._chunk_count = 0
-        self._write_count = 0
-        self._buffer = io.BytesIO()
-
-        # Start periodic flush thread for crash resilience
-        self._stop_flush.clear()
-        self._flush_thread = threading.Thread(
-            target=self._periodic_flush_loop,
-            name=f"AudioFlush-{meeting_id[:8]}",
-            daemon=True,
-        )
-        self._flush_thread.start()
+        self._ensure_closed()
+        self._initialize_meeting_dir(meeting_id, asset_path)
+        self._write_manifest(meeting_id, wrapped_dek, sample_rate)
+        self._open_audio_file(dek)
+        self._reset_state(sample_rate)
+        self._start_flush_thread(meeting_id)

         logger.info(
             "Opened audio writer: meeting=%s, dir=%s, buffer_size=%d",
@@ -148,6 +121,63 @@ class MeetingAudioWriter:
             self._buffer_size,
         )

+    def _ensure_closed(self) -> None:
+        """Raise if writer is already open."""
+        if self._asset_writer is not None:
+            raise RuntimeError("Writer already open")
+
+    def _initialize_meeting_dir(self, meeting_id: str, asset_path: str | None) -> None:
+        """Create the meeting directory for this session."""
+        storage_path = asset_path or meeting_id
+        self._meeting_dir = self._meetings_dir / storage_path
+        self._meeting_dir.mkdir(parents=True, exist_ok=True)
+
+    def _write_manifest(
+        self,
+        meeting_id: str,
+        wrapped_dek: bytes,
+        sample_rate: int,
+    ) -> None:
+        """Write the manifest.json metadata file."""
+        if self._meeting_dir is None:
+            raise RuntimeError("Meeting directory not initialized")
+        manifest = {
+            "meeting_id": meeting_id,
+            "created_at": datetime.now(UTC).isoformat(),
+            "sample_rate": sample_rate,
+            "channels": 1,
+            "format": "pcm16",
+            "wrapped_dek": wrapped_dek.hex(),
+        }
+        manifest_path = self._meeting_dir / "manifest.json"
+        manifest_path.write_text(json.dumps(manifest, indent=2))
+
+    def _open_audio_file(self, dek: bytes) -> None:
+        """Open the encrypted audio file for writing."""
+        if self._meeting_dir is None:
+            raise RuntimeError("Meeting directory not initialized")
+        audio_path = self._meeting_dir / "audio.enc"
+        self._asset_writer = ChunkedAssetWriter(self._crypto)
+        self._asset_writer.open(audio_path, dek)
+
+    def _reset_state(self, sample_rate: int) -> None:
+        """Reset internal counters and buffers for a new meeting."""
+        self._sample_rate = sample_rate
+        self._chunk_count = 0
+        self._write_count = 0
+        self._buffer = io.BytesIO()
+
+    def _start_flush_thread(self, meeting_id: str) -> None:
+        """Start periodic flush thread for crash resilience."""
+        self._stop_flush.clear()
+        self._flush_thread = threading.Thread(
+            target=self._periodic_flush_loop,
+            name=f"AudioFlush-{meeting_id[:8]}",
+            daemon=True,
+        )
+        self._flush_thread.start()
+        logger.info("flush_thread_started", meeting_id=meeting_id)
+
     def _periodic_flush_loop(self) -> None:
         """Background thread: periodically flush buffer for crash resilience."""
         while not self._stop_flush.wait(timeout=PERIODIC_FLUSH_INTERVAL_SECONDS):

@@ -243,7 +273,9 @@ class MeetingAudioWriter:
         if self._flush_thread is not None:
             self._flush_thread.join(timeout=3.0)
             if self._flush_thread.is_alive():
-                logger.warning("Audio flush thread did not stop within timeout")
+                logger.warning("flush_thread_timeout", message="Audio flush thread did not stop within timeout")
+            else:
+                logger.info("flush_thread_stopped")
             self._flush_thread = None

         if self._asset_writer is not None:
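The flush thread's loop is a compact idiom: `threading.Event.wait(timeout=...)` serves as both the sleep between flushes and the stop signal, which is why `close()` can observe shutdown within one interval. A standalone sketch with an illustrative interval:

```python
# Sketch of the Event.wait-based periodic flush loop used by the writer.
import threading

stop = threading.Event()


def periodic_flush(interval_s: float = 5.0) -> None:
    # wait() returns False on timeout (time to flush) and True once
    # stop.set() is called, which ends the loop promptly.
    while not stop.wait(timeout=interval_s):
        print("flushing buffer to disk")


worker = threading.Thread(target=periodic_flush, daemon=True)
worker.start()
# ... later, on close():
stop.set()
worker.join(timeout=3.0)
```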
@@ -15,6 +15,7 @@ from noteflow.domain.auth.oidc import (
     OidcProviderConfig,
     OidcProviderCreateParams,
     OidcProviderPreset,
+    OidcProviderRegistration,
 )
 from noteflow.infrastructure.auth.oidc_discovery import (
     OidcDiscoveryClient,
@@ -186,21 +187,15 @@ class OidcProviderRegistry:

     async def create_provider(
         self,
-        workspace_id: UUID,
-        name: str,
-        issuer_url: str,
-        client_id: str,
-        params: OidcProviderCreateParams | None = None,
+        registration: OidcProviderRegistration,
         *,
+        params: OidcProviderCreateParams | None = None,
         auto_discover: bool = True,
     ) -> OidcProviderConfig:
         """Create and configure a new OIDC provider.

         Args:
-            workspace_id: Workspace this provider belongs to.
-            name: Display name for the provider.
-            issuer_url: OIDC issuer URL.
-            client_id: OAuth client ID.
+            registration: Provider registration details.
             params: Optional creation parameters (preset, scopes, etc.).
             auto_discover: Whether to fetch discovery document.

@@ -223,10 +218,10 @@ class OidcProviderRegistry:
         )

         provider = OidcProviderConfig.create(
-            workspace_id=workspace_id,
-            name=name,
-            issuer_url=issuer_url,
-            client_id=client_id,
+            workspace_id=registration.workspace_id,
+            name=registration.name,
+            issuer_url=registration.issuer_url,
+            client_id=registration.client_id,
             params=effective_params,
         )

@@ -350,24 +345,14 @@ class OidcAuthService:

     async def register_provider(
         self,
-        workspace_id: UUID,
-        name: str,
-        issuer_url: str,
-        client_id: str,
-        client_secret: str | None = None,
+        registration: OidcProviderRegistration,
         *,
-        preset: OidcProviderPreset = OidcProviderPreset.CUSTOM,
         uow: UnitOfWork | None = None,
     ) -> tuple[OidcProviderConfig, list[str]]:
         """Register a new OIDC provider with validation.

         Args:
-            workspace_id: Workspace this provider belongs to.
-            name: Display name for the provider.
-            issuer_url: OIDC issuer URL.
-            client_id: OAuth client ID.
-            client_secret: Optional client secret (for confidential clients).
-            preset: Provider preset.
+            registration: Provider registration details.
             uow: Unit of work for persistence.

         Returns:
@@ -377,17 +362,14 @@ class OidcAuthService:
             OidcDiscoveryError: If discovery fails.
         """
         provider = await self._registry.create_provider(
-            workspace_id=workspace_id,
-            name=name,
-            issuer_url=issuer_url,
-            client_id=client_id,
-            params=OidcProviderCreateParams(preset=preset),
+            registration,
+            params=OidcProviderCreateParams(preset=registration.preset),
         )

         warnings = await self._registry.validate_provider(provider)

         # Store client secret securely if provided
-        if client_secret and uow:
+        if registration.client_secret and uow:
             # Would store in IntegrationSecretModel
             logger.info("Client secret provided for provider %s", provider.id)
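Collapsing four positional arguments into `OidcProviderRegistration` means new fields such as `preset` or `client_secret` no longer ripple through every signature. A hedged sketch of the new call shape; the field names are inferred from the `registration.*` accesses in this diff, not from the dataclass definition itself:

```python
# Sketch: registering a provider via the new parameter object.
# Field names are inferred from registration.* accesses in this diff.
from uuid import uuid4

from noteflow.domain.auth.oidc import OidcProviderPreset, OidcProviderRegistration

registration = OidcProviderRegistration(
    workspace_id=uuid4(),
    name="Acme SSO",
    issuer_url="https://id.example.com",
    client_id="noteflow-desktop",
    client_secret=None,
    preset=OidcProviderPreset.CUSTOM,
)
# provider, warnings = await auth_service.register_provider(registration)
```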
@@ -238,14 +238,18 @@ class GoogleCalendarAdapter(CalendarPort):

def _extract_meeting_url(self, item: _GoogleEvent) -> str | None:
"""Extract video meeting URL from event data."""
if hangout_link := item.get("hangoutLink"):
hangout_link = item.get("hangoutLink")
if hangout_link:
return hangout_link

if conference_data := item.get("conferenceData"):
entry_points = conference_data.get("entryPoints", [])
for entry in entry_points:
if entry.get("entryPointType") == "video":
if uri := entry.get("uri"):
return uri
conference_data = item.get("conferenceData")
if not conference_data:
return None

entry_points = conference_data.get("entryPoints", [])
for entry in entry_points:
uri = entry.get("uri")
if entry.get("entryPointType") == "video" and uri:
return uri

return None

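The Google adapter hunk trades nested walrus expressions for early returns. A standalone sketch of the resulting control flow, using a plain dict in place of the adapter's `_GoogleEvent` type (`hangoutLink`, `conferenceData`, and `entryPoints` are real Google Calendar API event fields):

```python
def extract_meeting_url(item: dict) -> str | None:
    """Return the first video meeting URL found in a Google Calendar event."""
    hangout_link = item.get("hangoutLink")
    if hangout_link:
        return hangout_link

    conference_data = item.get("conferenceData")
    if not conference_data:
        return None

    for entry in conference_data.get("entryPoints", []):
        uri = entry.get("uri")
        if entry.get("entryPointType") == "video" and uri:
            return uri
    return None


# Abridged event shape from the Calendar API's Events resource:
event = {"conferenceData": {"entryPoints": [{"entryPointType": "video", "uri": "https://meet.google.com/abc"}]}}
assert extract_meeting_url(event) == "https://meet.google.com/abc"
```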
@@ -5,6 +5,7 @@ Implements CalendarPort for Outlook using Microsoft Graph API.

from __future__ import annotations

from dataclasses import dataclass
from datetime import UTC, datetime, timedelta
from typing import Final, TypedDict, cast

@@ -32,6 +33,16 @@ MAX_ERROR_BODY_LENGTH: Final[int] = 500
GRAPH_API_MAX_PAGE_SIZE: Final[int] = 100 # Graph API maximum


@dataclass(frozen=True, slots=True)
class _OutlookEventQuery:
"""Query parameters for fetching calendar events."""

start_time: str
end_time: str
hours_ahead: int
limit: int


class _OutlookDateTime(TypedDict, total=False):
dateTime: str
timeZone: str
@@ -139,22 +150,6 @@ class OutlookCalendarAdapter(CalendarPort):
"Prefer": 'outlook.timezone="UTC"',
}

# Initial page request
page_size = min(limit, GRAPH_API_MAX_PAGE_SIZE)
url: str | None = f"{self.GRAPH_API_BASE}/me/calendarView"
params: dict[str, str | int] | None = {
"startDateTime": start_time,
"endDateTime": end_time,
"$top": page_size,
"$orderby": "start/dateTime",
"$select": (
"id,subject,start,end,location,bodyPreview,"
"attendees,isAllDay,seriesMasterId,onlineMeeting,onlineMeetingUrl"
),
}

all_events: list[CalendarEventInfo] = []

with log_timing(
"outlook_calendar_list_events",
hours_ahead=hours_ahead,
@@ -164,38 +159,17 @@ class OutlookCalendarAdapter(CalendarPort):
timeout=httpx.Timeout(GRAPH_API_TIMEOUT),
limits=httpx.Limits(max_connections=MAX_CONNECTIONS),
) as client:
while url is not None:
response = await client.get(url, params=params, headers=headers)

if response.status_code == HTTP_STATUS_UNAUTHORIZED:
raise OutlookCalendarError(ERR_TOKEN_EXPIRED)

if response.status_code != HTTP_STATUS_OK:
error_body = _truncate_error_body(response.text)
logger.error("Microsoft Graph API error: %s", error_body)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_body}")

data_value = response.json()
if not isinstance(data_value, dict):
logger.warning("Unexpected Microsoft Graph response payload")
break
data = cast(_OutlookEventsResponse, data_value)
items = data.get("value", [])

for item in items:
all_events.append(self._parse_event(item))
if len(all_events) >= limit:
logger.info(
"outlook_calendar_events_fetched",
event_count=len(all_events),
hours_ahead=hours_ahead,
)
return all_events

# Check for next page
next_link = data.get("@odata.nextLink") or data.get("@odata_nextLink")
url = str(next_link) if isinstance(next_link, str) else None
params = None # nextLink includes query params
query = _OutlookEventQuery(
start_time=start_time,
end_time=end_time,
hours_ahead=hours_ahead,
limit=limit,
)
all_events = await self._fetch_events(
client,
headers,
query,
)

logger.info(
"outlook_calendar_events_fetched",
@@ -204,6 +178,75 @@ class OutlookCalendarAdapter(CalendarPort):
)
return all_events

async def _fetch_events(
self,
client: httpx.AsyncClient,
headers: dict[str, str],
query: _OutlookEventQuery,
) -> list[CalendarEventInfo]:
"""Fetch events with pagination handling."""
page_size = min(query.limit, GRAPH_API_MAX_PAGE_SIZE)
url: str | None = f"{self.GRAPH_API_BASE}/me/calendarView"
params: dict[str, str | int] | None = {
"startDateTime": query.start_time,
"endDateTime": query.end_time,
"$top": page_size,
"$orderby": "start/dateTime",
"$select": (
"id,subject,start,end,location,bodyPreview,"
"attendees,isAllDay,seriesMasterId,onlineMeeting,onlineMeetingUrl"
),
}
all_events: list[CalendarEventInfo] = []

while url is not None:
response = await client.get(url, params=params, headers=headers)
self._raise_for_status(response)
parsed = self._parse_events_response(response)
if parsed is None:
break
items, next_url = parsed

for item in items:
all_events.append(self._parse_event(item))
if len(all_events) >= query.limit:
logger.info(
"outlook_calendar_events_fetched",
event_count=len(all_events),
hours_ahead=query.hours_ahead,
)
return all_events

url = next_url
params = None # nextLink includes query params

return all_events

@staticmethod
def _raise_for_status(response: httpx.Response) -> None:
"""Raise OutlookCalendarError on non-success responses."""
if response.status_code == HTTP_STATUS_UNAUTHORIZED:
raise OutlookCalendarError(ERR_TOKEN_EXPIRED)
if response.status_code != HTTP_STATUS_OK:
error_body = _truncate_error_body(response.text)
logger.error("Microsoft Graph API error: %s", error_body)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_body}")

@staticmethod
def _parse_events_response(
response: httpx.Response,
) -> tuple[list[_OutlookEvent], str | None] | None:
"""Parse event payload and next link from the response."""
data_value = response.json()
if not isinstance(data_value, dict):
logger.warning("Unexpected Microsoft Graph response payload")
return None
data = cast(_OutlookEventsResponse, data_value)
items = data.get("value", [])
next_link = data.get("@odata.nextLink") or data.get("@odata_nextLink")
next_url = str(next_link) if isinstance(next_link, str) else None
return items, next_url

async def get_user_email(self, access_token: str) -> str:
"""Get authenticated user's email address.

|
||||
|
||||
import os
|
||||
from collections.abc import Mapping, Sequence
|
||||
from typing import TYPE_CHECKING, Protocol, Self, cast
|
||||
from typing import TYPE_CHECKING, Protocol, Self, TypedDict, Unpack, cast
|
||||
|
||||
from noteflow.config.constants import DEFAULT_SAMPLE_RATE, ERR_HF_TOKEN_REQUIRED
|
||||
from noteflow.infrastructure.diarization.dto import SpeakerTurn
|
||||
@@ -50,6 +50,15 @@ class _OfflinePipeline(Protocol):
|
||||
class _TorchModule(Protocol):
|
||||
def from_numpy(self, ndarray: NDArray[np.float32]) -> Tensor: ...
|
||||
|
||||
|
||||
class _DiarizationEngineKwargs(TypedDict, total=False):
|
||||
"""Optional diarization engine settings."""
|
||||
|
||||
hf_token: str | None
|
||||
streaming_latency: float
|
||||
min_speakers: int
|
||||
max_speakers: int
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
|
||||
@@ -63,21 +72,19 @@ class DiarizationEngine:
|
||||
def __init__(
|
||||
self,
|
||||
device: str = "auto",
|
||||
hf_token: str | None = None,
|
||||
streaming_latency: float = 0.5,
|
||||
min_speakers: int = 1,
|
||||
max_speakers: int = 10,
|
||||
**kwargs: Unpack[_DiarizationEngineKwargs],
|
||||
) -> None:
|
||||
"""Initialize the diarization engine.
|
||||
|
||||
Args:
|
||||
device: Device to use ("auto", "cpu", "cuda", "mps").
|
||||
"auto" selects CUDA > MPS > CPU based on availability.
|
||||
hf_token: HuggingFace token for pyannote model access.
|
||||
streaming_latency: Latency for streaming diarization in seconds.
|
||||
min_speakers: Minimum expected speakers for offline diarization.
|
||||
max_speakers: Maximum expected speakers for offline diarization.
|
||||
**kwargs: Optional settings (hf_token, streaming_latency, min_speakers, max_speakers).
|
||||
"""
|
||||
hf_token = kwargs.get("hf_token")
|
||||
streaming_latency = kwargs.get("streaming_latency", 0.5)
|
||||
min_speakers = kwargs.get("min_speakers", 1)
|
||||
max_speakers = kwargs.get("max_speakers", 10)
|
||||
self._device_preference = device
|
||||
self._device: str | None = None
|
||||
self._hf_token = hf_token
|
||||
|
||||
@@ -156,7 +156,7 @@ class DiarizationSession:
|
||||
self._turns.clear()
|
||||
# Explicitly release pipeline reference to allow GC and GPU memory release
|
||||
self._pipeline = None
|
||||
logger.debug("Session %s closed", self.meeting_id)
|
||||
logger.info("diarization_session_closed", meeting_id=self.meeting_id)
|
||||
|
||||
@property
|
||||
def stream_time(self) -> float:
|
||||
|
||||
@@ -5,6 +5,7 @@ Export meeting transcripts to HTML format.
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import time
|
||||
from datetime import datetime
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
@@ -14,6 +15,7 @@ from noteflow.infrastructure.export._formatting import (
|
||||
format_datetime,
|
||||
format_timestamp,
|
||||
)
|
||||
from noteflow.infrastructure.logging import get_logger
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
@@ -22,6 +24,8 @@ if TYPE_CHECKING:
|
||||
from noteflow.domain.entities.segment import Segment
|
||||
from noteflow.domain.entities.summary import Summary
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
|
||||
# CSS styles for print-friendly HTML output
|
||||
_HTML_STYLES = """
|
||||
@@ -174,6 +178,7 @@ class HtmlExporter:
|
||||
Returns:
|
||||
HTML-formatted transcript string.
|
||||
"""
|
||||
start = time.perf_counter()
|
||||
content_parts: list[str] = [f"<h1>{escape_html(meeting.title)}</h1>"]
|
||||
content_parts.extend(_build_metadata_html(meeting, len(segments)))
|
||||
content_parts.extend(_build_transcript_html(segments))
|
||||
@@ -189,4 +194,13 @@ class HtmlExporter:
|
||||
)
|
||||
)
|
||||
content = "\n".join(content_parts)
|
||||
return _build_html_document(title=escape_html(meeting.title), content=content)
|
||||
result = _build_html_document(title=escape_html(meeting.title), content=content)
|
||||
elapsed_ms = (time.perf_counter() - start) * 1000
|
||||
logger.info(
|
||||
"html_exported",
|
||||
meeting_id=str(meeting.id),
|
||||
segment_count=len(segments),
|
||||
size_bytes=len(result.encode("utf-8")),
|
||||
duration_ms=round(elapsed_ms, 2),
|
||||
)
|
||||
return result
|
||||
|
||||
@@ -5,10 +5,12 @@ Export meeting transcripts to Markdown format.
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import time
|
||||
from datetime import datetime
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
from noteflow.infrastructure.export._formatting import format_datetime, format_timestamp
|
||||
from noteflow.infrastructure.logging import get_logger
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
@@ -16,6 +18,8 @@ if TYPE_CHECKING:
|
||||
from noteflow.domain.entities.meeting import Meeting
|
||||
from noteflow.domain.entities.segment import Segment
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
|
||||
class MarkdownExporter:
|
||||
"""Export meeting transcripts to Markdown format.
|
||||
@@ -48,6 +52,7 @@ class MarkdownExporter:
|
||||
Returns:
|
||||
Markdown-formatted transcript string.
|
||||
"""
|
||||
start = time.perf_counter()
|
||||
lines: list[str] = [
|
||||
f"# {meeting.title}",
|
||||
"",
|
||||
@@ -86,4 +91,13 @@ class MarkdownExporter:
|
||||
lines.append("---")
|
||||
lines.append(f"*Exported from NoteFlow on {format_datetime(datetime.now())}*")
|
||||
|
||||
return "\n".join(lines)
|
||||
result = "\n".join(lines)
|
||||
elapsed_ms = (time.perf_counter() - start) * 1000
|
||||
logger.info(
|
||||
"markdown_exported",
|
||||
meeting_id=str(meeting.id),
|
||||
segment_count=len(segments),
|
||||
size_bytes=len(result.encode("utf-8")),
|
||||
duration_ms=round(elapsed_ms, 2),
|
||||
)
|
||||
return result
|
||||
|
||||
@@ -5,6 +5,7 @@ Export meeting transcripts to PDF format.
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import time
|
||||
from typing import TYPE_CHECKING, Protocol, cast
|
||||
|
||||
from noteflow.config.constants import EXPORT_EXT_PDF
|
||||
@@ -13,6 +14,7 @@ from noteflow.infrastructure.export._formatting import (
|
||||
format_datetime,
|
||||
format_timestamp,
|
||||
)
|
||||
from noteflow.infrastructure.logging import get_logger
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from collections.abc import Sequence
|
||||
@@ -20,6 +22,8 @@ if TYPE_CHECKING:
|
||||
from noteflow.domain.entities.meeting import Meeting
|
||||
from noteflow.domain.entities.segment import Segment
|
||||
|
||||
logger = get_logger(__name__)
|
||||
|
||||
|
||||
class _WeasyHTMLProtocol(Protocol):
|
||||
"""Protocol for weasyprint HTML class."""
|
||||
@@ -175,6 +179,7 @@ class PdfExporter:
|
||||
Raises:
|
||||
RuntimeError: If weasyprint is not installed.
|
||||
"""
|
||||
start = time.perf_counter()
|
||||
weasy_html = _get_weasy_html()
|
||||
if weasy_html is None:
|
||||
raise RuntimeError(
|
||||
@@ -183,6 +188,14 @@ class PdfExporter:
|
||||
|
||||
html_content = self.build_html(meeting, segments)
|
||||
pdf_bytes: bytes = weasy_html(string=html_content).write_pdf()
|
||||
elapsed_ms = (time.perf_counter() - start) * 1000
|
||||
logger.info(
|
||||
"pdf_exported",
|
||||
meeting_id=str(meeting.id),
|
||||
segment_count=len(segments),
|
||||
size_bytes=len(pdf_bytes),
|
||||
duration_ms=round(elapsed_ms, 2),
|
||||
)
|
||||
return pdf_bytes
|
||||
|
||||
def build_html(self, meeting: Meeting, segments: Sequence[Segment]) -> str:
|
||||
|
||||
@@ -28,10 +28,18 @@ from .structured import (
|
||||
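All three exporters now share one instrumentation shape: capture `time.perf_counter()` on entry, build the artifact, then emit a single structured event carrying size and duration. A condensed sketch of that pattern with stdlib logging standing in for the project's `get_logger`:

```python
import logging
import time

logger = logging.getLogger("noteflow.export")  # stand-in for get_logger(__name__)


def export_markdown(title: str, segments: list[str]) -> str:
    start = time.perf_counter()
    result = "\n".join([f"# {title}", *segments])
    elapsed_ms = (time.perf_counter() - start) * 1000
    # One event per export, with enough context to spot slow or oversized runs.
    logger.info(
        "markdown_exported size_bytes=%d duration_ms=%.2f segment_count=%d",
        len(result.encode("utf-8")), elapsed_ms, len(segments),
    )
    return result
```

Measuring `len(result.encode("utf-8"))` rather than `len(result)` reports bytes on the wire, not characters, which matters for transcripts with non-ASCII content.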
user_id_var,
|
||||
workspace_id_var,
|
||||
)
|
||||
from .rate_limit import (
|
||||
DEFAULT_RATE_LIMIT_SECONDS,
|
||||
RateLimitedLogger,
|
||||
get_client_rate_limiter,
|
||||
)
|
||||
from .timing import log_timing
|
||||
from .transitions import log_state_transition
|
||||
|
||||
__all__ = [
|
||||
"DEFAULT_RATE_LIMIT_SECONDS",
|
||||
"RateLimitedLogger",
|
||||
"get_client_rate_limiter",
|
||||
"LogBuffer",
|
||||
"LogBufferHandler",
|
||||
"LogEntry",
|
||||
|
||||
@@ -67,27 +67,35 @@ def add_otel_trace_context(
|
||||
Returns:
|
||||
Updated event dictionary with trace context.
|
||||
"""
|
||||
if trace_context := _extract_otel_context():
|
||||
trace_id, span_id, parent_id = trace_context
|
||||
event_dict[_TRACE_ID] = format(trace_id, _HEX_32)
|
||||
event_dict[_SPAN_ID] = format(span_id, _HEX_16)
|
||||
if parent_id is not None:
|
||||
event_dict[_PARENT_SPAN_ID] = format(parent_id, _HEX_16)
|
||||
return event_dict
|
||||
|
||||
|
||||
def _extract_otel_context() -> tuple[int, int, int | None] | None:
|
||||
"""Return OpenTelemetry trace/span IDs if available."""
|
||||
try:
|
||||
from opentelemetry import trace
|
||||
|
||||
span = trace.get_current_span()
|
||||
if span.is_recording():
|
||||
ctx = span.get_span_context()
|
||||
if ctx.is_valid:
|
||||
event_dict[_TRACE_ID] = format(ctx.trace_id, _HEX_32)
|
||||
event_dict[_SPAN_ID] = format(ctx.span_id, _HEX_16)
|
||||
# Parent span ID if available
|
||||
parent = getattr(span, "parent", None)
|
||||
if parent is not None:
|
||||
parent_ctx = getattr(parent, _SPAN_ID, None)
|
||||
if parent_ctx is not None:
|
||||
event_dict[_PARENT_SPAN_ID] = format(parent_ctx, _HEX_16)
|
||||
except ImportError:
|
||||
pass
|
||||
return None
|
||||
|
||||
try:
|
||||
span = trace.get_current_span()
|
||||
if not span.is_recording():
|
||||
return None
|
||||
ctx = span.get_span_context()
|
||||
if not ctx.is_valid:
|
||||
return None
|
||||
parent = getattr(span, "parent", None)
|
||||
parent_ctx = getattr(parent, _SPAN_ID, None) if parent is not None else None
|
||||
return ctx.trace_id, ctx.span_id, parent_ctx
|
||||
except (AttributeError, TypeError):
|
||||
# Graceful degradation for edge cases
|
||||
pass
|
||||
return event_dict
|
||||
return None
|
||||
|
||||
|
||||
def build_processor_chain(config: LoggingConfig) -> Sequence[Processor]:
|
||||
|
||||
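After this hunk, `add_otel_trace_context` is a thin structlog processor over `_extract_otel_context`. A sketch of how such a processor slots into a chain, assuming `structlog` and `opentelemetry` are installed (parent-span handling from the diff is omitted for brevity; the `trace_id`/`span_id` key names stand in for the module's `_TRACE_ID`/`_SPAN_ID` constants):

```python
import structlog
from opentelemetry import trace


def add_otel_trace_context(logger, method_name, event_dict):
    """structlog processor: stamp the active trace/span IDs onto each event."""
    span = trace.get_current_span()
    ctx = span.get_span_context()
    if span.is_recording() and ctx.is_valid:
        event_dict["trace_id"] = format(ctx.trace_id, "032x")  # 32 hex chars, as in _HEX_32
        event_dict["span_id"] = format(ctx.span_id, "016x")    # 16 hex chars, as in _HEX_16
    return event_dict


structlog.configure(
    processors=[add_otel_trace_context, structlog.processors.JSONRenderer()],
)
structlog.get_logger().info("meeting_started")  # carries trace_id/span_id when a span is active
```

Because the processor returns early when no span is recording, log calls outside a trace cost almost nothing.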
src/noteflow/infrastructure/logging/rate_limit.py (new file, 106 lines)
@@ -0,0 +1,106 @@
"""Rate-limited logging utilities.
|
||||
|
||||
Provide helpers to prevent log spam for repetitive conditions.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import time
|
||||
from typing import Final
|
||||
|
||||
from .config import get_logger
|
||||
|
||||
# Default rate limit interval (60 seconds)
|
||||
DEFAULT_RATE_LIMIT_SECONDS: Final[float] = 60.0
|
||||
|
||||
|
||||
class RateLimitedLogger:
|
||||
"""Logger that rate-limits messages by operation key.
|
||||
|
||||
Prevents log spam by only logging each unique key once per interval.
|
||||
|
||||
Example:
|
||||
rate_limited = RateLimitedLogger()
|
||||
# This will log at most once per 60 seconds for "create_meeting"
|
||||
rate_limited.warn_stub_missing("create_meeting")
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
interval_seconds: float = DEFAULT_RATE_LIMIT_SECONDS,
|
||||
logger_name: str | None = None,
|
||||
) -> None:
|
||||
"""Initialize rate-limited logger.
|
||||
|
||||
Args:
|
||||
interval_seconds: Minimum time between logs for same key.
|
||||
logger_name: Optional logger name (defaults to rate_limit module).
|
||||
"""
|
||||
self._interval = interval_seconds
|
||||
self._last_logged: dict[str, float] = {}
|
||||
self._logger = get_logger(logger_name or __name__)
|
||||
|
||||
def _should_log(self, key: str) -> bool:
|
||||
"""Check if enough time has passed to log this key."""
|
||||
now = time.monotonic()
|
||||
last = self._last_logged.get(key)
|
||||
if last is None or (now - last) >= self._interval:
|
||||
self._last_logged[key] = now
|
||||
return True
|
||||
return False
|
||||
|
||||
def warn_stub_missing(self, operation: str) -> None:
|
||||
"""Log a warning that gRPC stub is not available (rate-limited).
|
||||
|
||||
Args:
|
||||
operation: Name of the operation being attempted.
|
||||
"""
|
||||
key = f"stub_missing:{operation}"
|
||||
if self._should_log(key):
|
||||
self._logger.warning(
|
||||
"grpc_stub_not_available",
|
||||
operation=operation,
|
||||
hint="Client not connected to server",
|
||||
)
|
||||
|
||||
def warn(self, key: str, message: str, **context: str | int | float | None) -> None:
|
||||
"""Log a rate-limited warning.
|
||||
|
||||
Args:
|
||||
key: Unique key for rate limiting.
|
||||
message: Log message/event name.
|
||||
**context: Additional context fields.
|
||||
"""
|
||||
if self._should_log(key):
|
||||
ctx = {k: v for k, v in context.items() if v is not None}
|
||||
self._logger.warning(message, **ctx)
|
||||
|
||||
def reset(self, key: str | None = None) -> None:
|
||||
"""Reset rate limit state.
|
||||
|
||||
Args:
|
||||
key: Specific key to reset, or None to reset all.
|
||||
"""
|
||||
if key is None:
|
||||
self._last_logged.clear()
|
||||
else:
|
||||
self._last_logged.pop(key, None)
|
||||
|
||||
|
||||
# Module-level singleton for client mixin use
|
||||
_client_rate_limiter: RateLimitedLogger | None = None
|
||||
|
||||
|
||||
def get_client_rate_limiter() -> RateLimitedLogger:
|
||||
"""Get the shared rate-limited logger for gRPC client operations.
|
||||
|
||||
Returns:
|
||||
Singleton RateLimitedLogger instance.
|
||||
"""
|
||||
global _client_rate_limiter
|
||||
if _client_rate_limiter is None:
|
||||
_client_rate_limiter = RateLimitedLogger(
|
||||
interval_seconds=DEFAULT_RATE_LIMIT_SECONDS,
|
||||
logger_name="noteflow.grpc.client",
|
||||
)
|
||||
return _client_rate_limiter
|
||||
@@ -17,6 +17,33 @@ from .config import get_logger

P = ParamSpec("P")
R = TypeVar("R")
T = TypeVar("T") # Used by _wrap_async for coroutine inner type


def _wrap_async(
func: Callable[P, Coroutine[object, object, T]],
operation: str,
context: dict[str, str | int | float | None],
) -> Callable[P, Coroutine[object, object, T]]:
@functools.wraps(func)
async def async_wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
with log_timing(operation, **context):
return await func(*args, **kwargs)

return async_wrapper


def _wrap_sync(
func: Callable[P, R],
operation: str,
context: dict[str, str | int | float | None],
) -> Callable[P, R]:
@functools.wraps(func)
def sync_wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
with log_timing(operation, **context):
return func(*args, **kwargs)

return sync_wrapper


@contextmanager
@@ -94,28 +121,10 @@ def timed(
func: Callable[P, R],
) -> Callable[P, R]:
if asyncio.iscoroutinefunction(func):

@functools.wraps(func)
async def async_wrapper(
*args: P.args, **kwargs: P.kwargs
) -> R:
with log_timing(operation, **context):
# Cast required: iscoroutinefunction narrows but type checker
# cannot propagate this to the return type of func()
coro = cast(Coroutine[object, object, R], func(*args, **kwargs))
return await coro

# Cast required: async wrapper must be returned as Callable[P, R]
# but wraps() preserves async signature which doesn't match R directly
return cast(Callable[P, R], async_wrapper)
else:

@functools.wraps(func)
def sync_wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
with log_timing(operation, **context):
return func(*args, **kwargs)

return sync_wrapper
# Cast required: asyncio.iscoroutinefunction provides runtime narrowing
# but type checker cannot propagate this to the generic R type
async_func = cast(Callable[P, Coroutine[object, object, R]], func)
return cast(Callable[P, R], _wrap_async(async_func, operation, context))
return _wrap_sync(func, operation, context)

return decorator

|
||||
import logging
|
||||
from collections import deque
|
||||
from threading import Lock
|
||||
from typing import TYPE_CHECKING
|
||||
from typing import TYPE_CHECKING, cast
|
||||
|
||||
from noteflow.application.observability.ports import (
|
||||
NullUsageEventSink,
|
||||
UsageEvent,
|
||||
UsageEventContext,
|
||||
UsageEventSink,
|
||||
UsageMetrics,
|
||||
)
|
||||
@@ -31,6 +32,19 @@ if TYPE_CHECKING:
|
||||
logger = get_logger(__name__)
|
||||
|
||||
|
||||
def _extract_event_context(
|
||||
context: UsageEventContext | None, attributes: dict[str, object]
|
||||
) -> tuple[UsageEventContext, dict[str, object]]:
|
||||
"""Extract context fields from attributes when not provided explicitly."""
|
||||
if context is not None:
|
||||
return context, attributes
|
||||
|
||||
meeting_id = cast(str | None, attributes.pop("meeting_id", None))
|
||||
success = cast(bool, attributes.pop("success", True))
|
||||
error_code = cast(str | None, attributes.pop("error_code", None))
|
||||
return UsageEventContext(meeting_id=meeting_id, success=success, error_code=error_code), attributes
|
||||
|
||||
|
||||
class LoggingUsageEventSink:
|
||||
"""Usage event sink that logs events.
|
||||
|
||||
@@ -66,20 +80,27 @@ class LoggingUsageEventSink:
|
||||
event_type: str,
|
||||
metrics: UsageMetrics | None = None,
|
||||
*,
|
||||
meeting_id: str | None = None,
|
||||
success: bool = True,
|
||||
error_code: str | None = None,
|
||||
context: UsageEventContext | None = None,
|
||||
**attributes: object,
|
||||
) -> None:
|
||||
"""Log a simple usage event."""
|
||||
m = metrics or UsageMetrics()
|
||||
self.record(UsageEvent(
|
||||
event_type=event_type, meeting_id=meeting_id,
|
||||
provider_name=m.provider_name, model_name=m.model_name,
|
||||
tokens_input=m.tokens_input, tokens_output=m.tokens_output,
|
||||
latency_ms=m.latency_ms, success=success,
|
||||
error_code=error_code, attributes=dict(attributes),
|
||||
))
|
||||
attrs = dict(attributes)
|
||||
resolved_context, attrs = _extract_event_context(context, attrs)
|
||||
self.record(
|
||||
UsageEvent(
|
||||
event_type=event_type,
|
||||
meeting_id=resolved_context.meeting_id,
|
||||
provider_name=m.provider_name,
|
||||
model_name=m.model_name,
|
||||
tokens_input=m.tokens_input,
|
||||
tokens_output=m.tokens_output,
|
||||
latency_ms=m.latency_ms,
|
||||
success=resolved_context.success,
|
||||
error_code=resolved_context.error_code,
|
||||
attributes=attrs,
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
def _build_event_attributes(event: UsageEvent) -> dict[str, str | int | float | bool]:
|
||||
@@ -178,20 +199,27 @@ class OtelUsageEventSink:
|
||||
event_type: str,
|
||||
metrics: UsageMetrics | None = None,
|
||||
*,
|
||||
meeting_id: str | None = None,
|
||||
success: bool = True,
|
||||
error_code: str | None = None,
|
||||
context: UsageEventContext | None = None,
|
||||
**attributes: object,
|
||||
) -> None:
|
||||
"""Record a simple usage event to current span."""
|
||||
m = metrics or UsageMetrics()
|
||||
self.record(UsageEvent(
|
||||
event_type=event_type, meeting_id=meeting_id,
|
||||
provider_name=m.provider_name, model_name=m.model_name,
|
||||
tokens_input=m.tokens_input, tokens_output=m.tokens_output,
|
||||
latency_ms=m.latency_ms, success=success,
|
||||
error_code=error_code, attributes=dict(attributes),
|
||||
))
|
||||
attrs = dict(attributes)
|
||||
resolved_context, attrs = _extract_event_context(context, attrs)
|
||||
self.record(
|
||||
UsageEvent(
|
||||
event_type=event_type,
|
||||
meeting_id=resolved_context.meeting_id,
|
||||
provider_name=m.provider_name,
|
||||
model_name=m.model_name,
|
||||
tokens_input=m.tokens_input,
|
||||
tokens_output=m.tokens_output,
|
||||
latency_ms=m.latency_ms,
|
||||
success=resolved_context.success,
|
||||
error_code=resolved_context.error_code,
|
||||
attributes=attrs,
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
class BufferedDatabaseUsageEventSink:
|
||||
@@ -248,20 +276,27 @@ class BufferedDatabaseUsageEventSink:
|
||||
event_type: str,
|
||||
metrics: UsageMetrics | None = None,
|
||||
*,
|
||||
meeting_id: str | None = None,
|
||||
success: bool = True,
|
||||
error_code: str | None = None,
|
||||
context: UsageEventContext | None = None,
|
||||
**attributes: object,
|
||||
) -> None:
|
||||
"""Buffer a simple usage event."""
|
||||
m = metrics or UsageMetrics()
|
||||
self.record(UsageEvent(
|
||||
event_type=event_type, meeting_id=meeting_id,
|
||||
provider_name=m.provider_name, model_name=m.model_name,
|
||||
tokens_input=m.tokens_input, tokens_output=m.tokens_output,
|
||||
latency_ms=m.latency_ms, success=success,
|
||||
error_code=error_code, attributes=dict(attributes),
|
||||
))
|
||||
attrs = dict(attributes)
|
||||
resolved_context, attrs = _extract_event_context(context, attrs)
|
||||
self.record(
|
||||
UsageEvent(
|
||||
event_type=event_type,
|
||||
meeting_id=resolved_context.meeting_id,
|
||||
provider_name=m.provider_name,
|
||||
model_name=m.model_name,
|
||||
tokens_input=m.tokens_input,
|
||||
tokens_output=m.tokens_output,
|
||||
latency_ms=m.latency_ms,
|
||||
success=resolved_context.success,
|
||||
error_code=resolved_context.error_code,
|
||||
attributes=attrs,
|
||||
)
|
||||
)
|
||||
|
||||
def _schedule_flush(self) -> None:
|
||||
"""Schedule an async flush on the event loop."""
|
||||
|
||||
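All three sinks now accept an optional `UsageEventContext` and, for older callers, fall back to popping `meeting_id`/`success`/`error_code` out of `**attributes`. A runnable sketch of that fallback; the `UsageEventContext` shape here is assumed from the constructor call in the diff:

```python
from dataclasses import dataclass
from typing import cast


@dataclass(frozen=True)
class UsageEventContext:
    # Mirror of the context object referenced in the diff (shape assumed).
    meeting_id: str | None = None
    success: bool = True
    error_code: str | None = None


def extract_event_context(
    context: UsageEventContext | None, attributes: dict[str, object]
) -> tuple[UsageEventContext, dict[str, object]]:
    """New-style callers pass `context`; legacy flat attributes still work."""
    if context is not None:
        return context, attributes
    meeting_id = cast(str | None, attributes.pop("meeting_id", None))
    success = cast(bool, attributes.pop("success", True))
    error_code = cast(str | None, attributes.pop("error_code", None))
    return UsageEventContext(meeting_id, success, error_code), attributes


ctx, attrs = extract_event_context(None, {"meeting_id": "m-1", "template": "default"})
assert ctx.meeting_id == "m-1" and attrs == {"template": "default"}
```

Popping the recognized keys out of `attributes` keeps them from being double-reported in the event's free-form attribute bag.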
@@ -348,34 +348,20 @@ async def _handle_tables_without_alembic(
) -> None:
"""Handle case where tables exist but Alembic version doesn't."""
critical_tables = ["meetings", "segments", "diarization_jobs", "user_preferences"]
missing_tables: list[str] = []

async with session_factory() as session:
for table_name in critical_tables:
if not await _table_exists(session, table_name):
missing_tables.append(table_name)
missing_tables = await _find_missing_tables(session_factory, critical_tables)

if missing_tables:
logger.warning(
"Tables exist but missing critical tables: %s. Creating missing tables...",
", ".join(missing_tables),
)
if "user_preferences" in missing_tables:
async with session_factory() as session:
if await _create_user_preferences_table(session):
logger.info("Successfully created user_preferences table")

await _create_user_preferences_if_missing(session_factory, missing_tables)
logger.info("Stamping database after creating missing tables...")
await _stamp_database_async(database_url)
logger.info("Database schema ready (created missing tables and stamped)")
return

# Safety check for user_preferences
async with session_factory() as session:
if not await _table_exists(session, "user_preferences"):
logger.warning("user_preferences table missing despite check, creating it...")
await _create_user_preferences_table(session)
logger.info("Created user_preferences table (safety check)")
await _ensure_user_preferences_table(session_factory)

logger.info(
"Tables exist (%d) but Alembic version table missing, stamping database...",
@@ -385,6 +371,43 @@ async def _handle_tables_without_alembic(
logger.info("Database schema ready (stamped from schema.sql)")


async def _find_missing_tables(
session_factory: async_sessionmaker[AsyncSession],
critical_tables: list[str],
) -> list[str]:
"""Return list of critical tables missing from the database."""
missing_tables: list[str] = []
async with session_factory() as session:
for table_name in critical_tables:
if not await _table_exists(session, table_name):
missing_tables.append(table_name)
return missing_tables


async def _create_user_preferences_if_missing(
session_factory: async_sessionmaker[AsyncSession],
missing_tables: list[str],
) -> None:
"""Create user_preferences table when listed as missing."""
if "user_preferences" not in missing_tables:
return
async with session_factory() as session:
if await _create_user_preferences_table(session):
logger.info("Successfully created user_preferences table")


async def _ensure_user_preferences_table(
session_factory: async_sessionmaker[AsyncSession],
) -> None:
"""Create user_preferences table if still missing."""
async with session_factory() as session:
if await _table_exists(session, "user_preferences"):
return
logger.warning("user_preferences table missing despite check, creating it...")
await _create_user_preferences_table(session)
logger.info("Created user_preferences table (safety check)")


async def _handle_alembic_with_tables(
session_factory: async_sessionmaker[AsyncSession],
database_url: str,

@@ -8,11 +8,11 @@ from __future__ import annotations

from collections.abc import Sequence
from datetime import datetime
from typing import TYPE_CHECKING
from uuid import UUID
from typing import TYPE_CHECKING, Unpack

from noteflow.domain.entities import Meeting, Segment, Summary
from noteflow.domain.value_objects import MeetingId, MeetingState
from noteflow.domain.ports.repositories.transcript import MeetingListKwargs

if TYPE_CHECKING:
from noteflow.grpc.meeting_store import MeetingStore
@@ -43,15 +43,22 @@ class MemoryMeetingRepository:

async def list_all(
self,
states: list[MeetingState] | None = None,
limit: int = 100,
offset: int = 0,
sort_desc: bool = True,
project_id: UUID | None = None,
**kwargs: Unpack[MeetingListKwargs],
) -> tuple[Sequence[Meeting], int]:
"""List meetings via in-memory store with optional state filtering."""
states = kwargs.get("states")
limit = kwargs.get("limit", 100)
offset = kwargs.get("offset", 0)
sort_desc = kwargs.get("sort_desc", True)
project_id = kwargs.get("project_id")
project_filter = str(project_id) if project_id else None
return self._store.list_all(states, limit, offset, sort_desc, project_filter)
return self._store.list_all(
states=states,
limit=limit,
offset=offset,
sort_desc=sort_desc,
project_id=project_filter,
)

async def count_by_state(self, state: MeetingState) -> int:
"""Count meetings in a specific state."""

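`list_all` collapses its five filter parameters into a single `MeetingListKwargs` TypedDict. A sketch of what that port-level type likely looks like, with the field set inferred from the `kwargs.get()` calls above (`MeetingState` replaced by `str` here to keep the sketch self-contained):

```python
from typing import TypedDict, Unpack
from uuid import UUID


class MeetingListKwargs(TypedDict, total=False):
    # Field set inferred from the kwargs.get() calls in the diff.
    states: list[str] | None
    limit: int
    offset: int
    sort_desc: bool
    project_id: UUID | None


async def list_all(**kwargs: Unpack[MeetingListKwargs]) -> tuple[list, int]:
    limit = kwargs.get("limit", 100)
    offset = kwargs.get("offset", 0)
    return [], 0  # placeholder body


# await list_all(limit=20, sort_desc=False)  # type-checked call
```

Because the TypedDict lives on the repository port, the SQL and in-memory implementations share one filter contract instead of drifting apart parameter by parameter.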
@@ -7,7 +7,7 @@ operations requiring database persistence.
from __future__ import annotations

from collections.abc import Sequence
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Unpack
from uuid import UUID

_ERR_USERS_DB = "Users require database persistence"
@@ -19,8 +19,8 @@ if TYPE_CHECKING:
Workspace,
WorkspaceMembership,
WorkspaceRole,
WorkspaceSettings,
)
from noteflow.domain.ports.repositories.identity._workspace import WorkspaceCreateKwargs


class UnsupportedUserRepository:
@@ -86,9 +86,7 @@ class UnsupportedWorkspaceRepository:
workspace_id: UUID,
name: str,
owner_id: UUID,
slug: str | None = None,
is_default: bool = False,
settings: WorkspaceSettings | None = None,
**kwargs: Unpack[WorkspaceCreateKwargs],
) -> Workspace:
"""Not supported in memory mode."""
raise NotImplementedError(_ERR_WORKSPACES_DB)

@@ -7,13 +7,14 @@ operations requiring database persistence.
from __future__ import annotations

from collections.abc import Sequence
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Unpack
from uuid import UUID

_ERR_PROJECTS_DB = "Projects require database persistence"

if TYPE_CHECKING:
from noteflow.domain.entities.project import Project, ProjectSettings
from noteflow.domain.entities.project import Project
from noteflow.domain.ports.repositories.identity._project import ProjectCreateKwargs
from noteflow.domain.identity import ProjectMembership, ProjectRole


@@ -47,10 +48,7 @@ class UnsupportedProjectRepository:
project_id: UUID,
workspace_id: UUID,
name: str,
slug: str | None = None,
description: str | None = None,
is_default: bool = False,
settings: ProjectSettings | None = None,
**kwargs: Unpack[ProjectCreateKwargs],
) -> Project:
"""Not supported in memory mode."""
raise NotImplementedError(_ERR_PROJECTS_DB)

@@ -7,11 +7,11 @@ operations requiring database persistence.
from __future__ import annotations

from collections.abc import Sequence
from datetime import datetime
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Unpack
from uuid import UUID

from noteflow.config.constants import ERR_SERVER_RESTARTED
from noteflow.domain.ports.repositories.background import DiarizationStatusKwargs

_ERR_ANNOTATIONS_DB = "Annotations require database persistence"
_ERR_DIARIZATION_DB = "Diarization jobs require database persistence"
@@ -87,11 +87,7 @@ class UnsupportedDiarizationJobRepository:
self,
job_id: str,
status: int,
*,
segments_updated: int | None = None,
speaker_ids: list[str] | None = None,
error_message: str | None = None,
started_at: datetime | None = None,
**kwargs: Unpack[DiarizationStatusKwargs],
) -> bool:
"""Not supported in memory mode."""
raise NotImplementedError(_ERR_DIARIZATION_DB)

@@ -8,6 +8,7 @@ Mixins require class attributes to be defined by the implementing class:

from __future__ import annotations

import time
from collections.abc import Sequence
from typing import TYPE_CHECKING, TypeVar, cast
from uuid import UUID
@@ -16,10 +17,14 @@ from sqlalchemy import select
from sqlalchemy.engine import CursorResult
from sqlalchemy.ext.asyncio import AsyncSession

from noteflow.infrastructure.logging import get_logger

if TYPE_CHECKING:
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.sql import Delete, Select, Update

logger = get_logger(__name__)

TModel = TypeVar("TModel", bound="DeclarativeBase")
TExists = TypeVar("TExists")
TEntity = TypeVar("TEntity")
@@ -72,8 +77,12 @@ class BaseRepository:
Returns:
Single model instance or None if not found.
"""
start = time.perf_counter()
result = await self._session.execute(stmt)
return result.scalar_one_or_none()
row = result.scalar_one_or_none()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.debug("db_execute_scalar", duration_ms=round(elapsed_ms, 2), found=row is not None)
return row

async def _execute_scalars(
self,
@@ -87,8 +96,12 @@ class BaseRepository:
Returns:
List of model instances.
"""
start = time.perf_counter()
result = await self._session.execute(stmt)
return list(result.scalars().all())
rows = list(result.scalars().all())
elapsed_ms = (time.perf_counter() - start) * 1000
logger.debug("db_execute_scalars", duration_ms=round(elapsed_ms, 2), count=len(rows))
return rows

async def _add_and_flush(self, model: TModel) -> TModel:
"""Add model to session and flush.
@@ -99,8 +112,11 @@ class BaseRepository:
Returns:
The persisted model with generated fields populated.
"""
start = time.perf_counter()
self._session.add(model)
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_add_and_flush", duration_ms=round(elapsed_ms, 2))
return model

async def _delete_and_flush(self, model: object) -> None:
@@ -109,8 +125,11 @@ class BaseRepository:
Args:
model: ORM model instance to delete.
"""
start = time.perf_counter()
await self._session.delete(model)
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_delete_and_flush", duration_ms=round(elapsed_ms, 2))

async def _add_all_and_flush(self, models: list[TModel]) -> list[TModel]:
"""Add multiple models to session and flush once.
@@ -123,8 +142,11 @@ class BaseRepository:
Returns:
The persisted models with generated fields populated.
"""
start = time.perf_counter()
self._session.add_all(models)
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_add_all_and_flush", duration_ms=round(elapsed_ms, 2), count=len(models))
return models

async def _execute_count(self, stmt: Select[tuple[int]]) -> int:
@@ -136,8 +158,12 @@ class BaseRepository:
Returns:
Integer count value.
"""
start = time.perf_counter()
result = await self._session.execute(stmt)
return result.scalar_one()
count = result.scalar_one()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.debug("db_execute_count", duration_ms=round(elapsed_ms, 2), count=count)
return count

async def _execute_exists(self, stmt: Select[tuple[TExists]]) -> bool:
"""Check if any rows match the query.
@@ -150,8 +176,12 @@ class BaseRepository:
Returns:
True if at least one row exists.
"""
start = time.perf_counter()
result = await self._session.execute(stmt.limit(1))
return result.scalar() is not None
exists = result.scalar() is not None
elapsed_ms = (time.perf_counter() - start) * 1000
logger.debug("db_execute_exists", duration_ms=round(elapsed_ms, 2), exists=exists)
return exists

async def _update_fields(
self,
@@ -167,9 +197,12 @@ class BaseRepository:
Returns:
The updated model.
"""
start = time.perf_counter()
for key, value in fields.items():
setattr(model, key, value)
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_update_fields", duration_ms=round(elapsed_ms, 2), field_count=len(fields))
return model

async def _execute_update(self, stmt: Update) -> int:
@@ -181,8 +214,11 @@ class BaseRepository:
Returns:
Number of rows affected.
"""
start = time.perf_counter()
result = cast(CursorResult[tuple[()]], await self._session.execute(stmt))
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_execute_update", duration_ms=round(elapsed_ms, 2), rows_affected=result.rowcount)
return result.rowcount

async def _execute_delete(self, stmt: Delete) -> int:
@@ -194,8 +230,11 @@ class BaseRepository:
Returns:
Number of rows deleted.
"""
start = time.perf_counter()
result = cast(CursorResult[tuple[()]], await self._session.execute(stmt))
await self._session.flush()
elapsed_ms = (time.perf_counter() - start) * 1000
logger.info("db_execute_delete", duration_ms=round(elapsed_ms, 2), rows_deleted=result.rowcount)
return result.rowcount

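Every `_execute_*` helper gains the same three lines: a `perf_counter()` start, an elapsed computation, and one structured event with the operation's outcome. If this keeps spreading, a small context manager would keep it in one place; the following is purely illustrative and not part of the diff:

```python
import time
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def timed_db_op(logger, event: str, **fields: object) -> Iterator[None]:
    """Emit one structured event with duration_ms after the block completes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # structlog-style keyword logging, matching the calls in the diff.
        logger.info(event, duration_ms=round(elapsed_ms, 2), **fields)


# Inside an async repository method:
# with timed_db_op(logger, "db_add_and_flush"):
#     self._session.add(model)
#     await self._session.flush()
```

The `try/finally` shape also records timings for operations that raise, which the inline version in the diff does not.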
@@ -11,12 +11,15 @@ from sqlalchemy import and_, delete, or_, select
from noteflow.domain.entities import Annotation
from noteflow.domain.value_objects import AnnotationId
from noteflow.infrastructure.converters import OrmConverter
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import AnnotationModel
from noteflow.infrastructure.persistence.repositories._base import BaseRepository

if TYPE_CHECKING:
from noteflow.domain.value_objects import MeetingId

logger = get_logger(__name__)


class SqlAlchemyAnnotationRepository(BaseRepository):
"""SQLAlchemy implementation of AnnotationRepository."""
@@ -45,6 +48,12 @@ class SqlAlchemyAnnotationRepository(BaseRepository):
self._session.add(model)
await self._session.flush()
annotation.db_id = model.id
logger.info(
"annotation_added",
annotation_id=str(annotation.id),
meeting_id=str(annotation.meeting_id),
annotation_type=annotation.annotation_type.value,
)
return annotation

async def get(self, annotation_id: AnnotationId) -> Annotation | None:
@@ -158,6 +167,11 @@ class SqlAlchemyAnnotationRepository(BaseRepository):
model.segment_ids = annotation.segment_ids

await self._session.flush()
logger.info(
"annotation_updated",
annotation_id=str(annotation.id),
annotation_type=annotation.annotation_type.value,
)
return annotation

async def delete(self, annotation_id: AnnotationId) -> bool:
@@ -179,4 +193,5 @@ class SqlAlchemyAnnotationRepository(BaseRepository):

await self._session.execute(delete(AnnotationModel).where(AnnotationModel.id == model.id))
await self._session.flush()
logger.info("annotation_deleted", annotation_id=str(annotation_id))
return True

@@ -48,4 +48,10 @@ class FileSystemAssetRepository(AssetRepository):

if meeting_dir.exists():
shutil.rmtree(meeting_dir)
logger.info("Deleted meeting assets at %s", meeting_dir)
logger.info("assets_deleted", meeting_id=str(meeting_id), path=str(meeting_dir))
else:
logger.debug(
"assets_delete_skipped_not_found",
meeting_id=str(meeting_id),
path=str(meeting_dir),
)

@@ -3,20 +3,24 @@
from collections.abc import Sequence
from dataclasses import dataclass, field
from datetime import UTC, datetime
from typing import Final
from typing import Final, Unpack
from uuid import UUID

from sqlalchemy import delete, select, update
from sqlalchemy.exc import IntegrityError

from noteflow.config.constants import ERR_SERVER_RESTARTED
from noteflow.domain.ports.repositories.background import DiarizationStatusKwargs
from noteflow.domain.utils.time import utc_now
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import (
DiarizationJobModel,
StreamingDiarizationTurnModel,
)
from noteflow.infrastructure.persistence.repositories._base import BaseRepository

logger = get_logger(__name__)

# Job status constants (mirrors proto enum)
JOB_STATUS_UNSPECIFIED: Final[int] = 0
JOB_STATUS_QUEUED: Final[int] = 1
@@ -103,6 +107,12 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
except IntegrityError as exc:
msg = f"DiarizationJob {job.job_id} already exists"
raise ValueError(msg) from exc
logger.info(
"diarization_job_created",
job_id=job.job_id,
meeting_id=job.meeting_id,
status=job.status,
)
return job

async def get(self, job_id: str) -> DiarizationJob | None:
@@ -123,21 +133,14 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
self,
job_id: str,
status: int,
*,
segments_updated: int | None = None,
speaker_ids: list[str] | None = None,
error_message: str | None = None,
started_at: datetime | None = None,
**kwargs: Unpack[DiarizationStatusKwargs],
) -> bool:
"""Update job status and optional fields.

Args:
job_id: Job identifier.
status: New status value.
segments_updated: Optional segments count.
speaker_ids: Optional speaker IDs list.
error_message: Optional error message.
started_at: Optional job start timestamp.
**kwargs: Optional update fields.

Returns:
True if job was updated, False if not found.
@@ -146,6 +149,11 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
"status": status,
"updated_at": utc_now(),
}
segments_updated = kwargs.get("segments_updated")
speaker_ids = kwargs.get("speaker_ids")
error_message = kwargs.get("error_message")
started_at = kwargs.get("started_at")

if segments_updated is not None:
values["segments_updated"] = segments_updated
if speaker_ids is not None:
@@ -157,6 +165,8 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):

stmt = update(DiarizationJobModel).where(DiarizationJobModel.id == job_id).values(**values)
rowcount = await self._execute_update(stmt)
if rowcount > 0:
logger.info("diarization_job_status_updated", job_id=job_id, status=status)
return rowcount > 0

async def list_for_meeting(self, meeting_id: str) -> Sequence[DiarizationJob]:
@@ -235,7 +245,10 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
updated_at=utc_now(),
)
)
return await self._execute_update(stmt)
count = await self._execute_update(stmt)
if count > 0:
logger.info("diarization_jobs_marked_failed", count=count)
return count

async def prune_completed(self, ttl_seconds: float) -> int:
"""Delete completed/failed jobs older than TTL.
@@ -255,7 +268,10 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
),
DiarizationJobModel.updated_at < cutoff_dt,
)
return await self._execute_delete(stmt)
count = await self._execute_delete(stmt)
if count > 0:
logger.info("diarization_jobs_pruned", count=count, ttl_seconds=ttl_seconds)
return count

# Streaming diarization turn methods

@@ -286,6 +302,7 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
self._session.add(model)

await self._session.flush()
logger.debug("streaming_turns_added", meeting_id=meeting_id, count=len(turns))
return len(turns)

async def get_streaming_turns(self, meeting_id: str) -> list[StreamingTurn]:
@@ -329,4 +346,7 @@ class SqlAlchemyDiarizationJobRepository(BaseRepository):
stmt = delete(StreamingDiarizationTurnModel).where(
StreamingDiarizationTurnModel.meeting_id == UUID(meeting_id)
)
return await self._execute_delete(stmt)
count = await self._execute_delete(stmt)
if count > 0:
logger.info("streaming_turns_cleared", meeting_id=meeting_id, count=count)
return count

@@ -10,6 +10,7 @@ from sqlalchemy import delete, select

from noteflow.domain.entities.named_entity import EntityCategory, NamedEntity
from noteflow.infrastructure.converters.ner_converters import NerConverter
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import NamedEntityModel
from noteflow.infrastructure.persistence.repositories._base import (
BaseRepository,
@@ -20,6 +21,8 @@ from noteflow.infrastructure.persistence.repositories._base import (
if TYPE_CHECKING:
from noteflow.domain.value_objects import MeetingId

logger = get_logger(__name__)


class SqlAlchemyEntityRepository(
BaseRepository,
@@ -53,6 +56,12 @@ class SqlAlchemyEntityRepository(
merged = await self._session.merge(model)
await self._session.flush()
entity.db_id = merged.id
logger.info(
"entity_saved",
entity_id=str(entity.id),
meeting_id=str(entity.meeting_id),
category=entity.category.value,
)
return entity

async def save_batch(self, entities: Sequence[NamedEntity]) -> Sequence[NamedEntity]:
@@ -74,6 +83,12 @@ class SqlAlchemyEntityRepository(
entity.db_id = merged.id

await self._session.flush()
if entities:
logger.info(
"entities_batch_saved",
meeting_id=str(entities[0].meeting_id),
count=len(entities),
)
return entities

async def get(self, entity_id: UUID) -> NamedEntity | None:
@@ -116,7 +131,10 @@ class SqlAlchemyEntityRepository(
stmt = delete(NamedEntityModel).where(
NamedEntityModel.meeting_id == UUID(str(meeting_id))
)
return await self._execute_delete(stmt)
count = await self._execute_delete(stmt)
if count > 0:
logger.info("entities_deleted_by_meeting", meeting_id=str(meeting_id), count=count)
return count

async def update_pinned(self, entity_id: UUID, is_pinned: bool) -> bool:
"""Update the pinned status of an entity.
@@ -136,6 +154,7 @@ class SqlAlchemyEntityRepository(

model.is_pinned = is_pinned
await self._session.flush()
logger.info("entity_pinned_updated", entity_id=str(entity_id), is_pinned=is_pinned)
return True

async def exists_for_meeting(self, meeting_id: MeetingId) -> bool:
@@ -184,6 +203,7 @@ class SqlAlchemyEntityRepository(
model.category = EntityCategory.from_string(category).value

await self._session.flush()
logger.info("entity_updated", entity_id=str(entity_id))
return NerConverter.orm_to_domain(model)

async def delete(self, entity_id: UUID) -> bool:

@@ -3,7 +3,7 @@
from __future__ import annotations

from collections.abc import Sequence
from typing import cast
from typing import Unpack, cast
from uuid import UUID

from sqlalchemy import and_, func, select
@@ -19,11 +19,10 @@ from noteflow.config.constants import (
RULE_FIELD_TEMPLATE_ID,
RULE_FIELD_TRIGGER_RULES,
)
from noteflow.domain.entities.project import (
ExportRules,
Project,
ProjectSettings,
TriggerRules,
from noteflow.domain.entities.project import ExportRules, Project, ProjectSettings, TriggerRules
from noteflow.domain.ports.repositories.identity._project import (
ProjectCreateKwargs,
ProjectCreateOptions,
)
from noteflow.domain.value_objects import ExportFormat
from noteflow.infrastructure.persistence.models import ProjectModel
@@ -256,10 +255,7 @@ class SqlAlchemyProjectRepository(
project_id: UUID,
workspace_id: UUID,
name: str,
slug: str | None = None,
description: str | None = None,
is_default: bool = False,
settings: ProjectSettings | None = None,
**kwargs: Unpack[ProjectCreateKwargs],
) -> Project:
"""Create a new project.

@@ -267,22 +263,20 @@ class SqlAlchemyProjectRepository(
project_id: UUID for the new project.
workspace_id: Parent workspace UUID.
name: Project name.
slug: Optional URL slug.
description: Optional description.
is_default: Whether this is the workspace's default project.
settings: Optional project settings.
**kwargs: Optional creation settings.

Returns:
Created project.
"""
settings_dict = self._settings_to_dict(settings) if settings else {}
merged = _merge_project_create_options(kwargs)
settings_dict = self._settings_to_dict(merged.settings) if merged.settings else {}
model = ProjectModel(
id=project_id,
workspace_id=workspace_id,
name=name,
slug=slug,
description=description,
is_default=is_default,
slug=merged.slug,
description=merged.description,
is_default=merged.is_default,
settings=settings_dict,
metadata_={},
)
@@ -427,3 +421,11 @@ class SqlAlchemyProjectRepository(
stmt = select(func.count()).select_from(ProjectModel).where(and_(*conditions))
result = await self._session.execute(stmt)
return result.scalar() or 0
def _merge_project_create_options(kwargs: ProjectCreateKwargs) -> ProjectCreateOptions:
"""Normalize project creation options from keyword args."""
return ProjectCreateOptions(
slug=kwargs.get("slug"),
description=kwargs.get("description"),
is_default=kwargs.get("is_default", False),
settings=kwargs.get("settings"),
)

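`create` now funnels `**kwargs` through `_merge_project_create_options`, so the defaults live in one place instead of being repeated in every implementation's signature. A runnable sketch of the options object and merge step; both shapes are assumed from the usage in the diff (`ProjectSettings` replaced by `object` to stay self-contained):

```python
from dataclasses import dataclass
from typing import TypedDict


@dataclass(frozen=True)
class ProjectCreateOptions:
    # Shape assumed from _merge_project_create_options in the diff.
    slug: str | None = None
    description: str | None = None
    is_default: bool = False
    settings: object | None = None  # ProjectSettings in the real code


class ProjectCreateKwargs(TypedDict, total=False):
    slug: str | None
    description: str | None
    is_default: bool
    settings: object | None


def merge_project_create_options(kwargs: ProjectCreateKwargs) -> ProjectCreateOptions:
    """Normalize keyword args into one immutable options value."""
    return ProjectCreateOptions(
        slug=kwargs.get("slug"),
        description=kwargs.get("description"),
        is_default=kwargs.get("is_default", False),
        settings=kwargs.get("settings"),
    )


assert merge_project_create_options({}).is_default is False
```

The TypedDict keeps the port's keyword surface type-checked, while the frozen dataclass gives the repository a single normalized value to pass around.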
@@ -3,7 +3,7 @@
 from __future__ import annotations
 
 from collections.abc import Sequence
-from typing import cast
+from typing import Unpack, cast
 from uuid import UUID
 
 from sqlalchemy import and_, select
@@ -26,6 +26,7 @@ from noteflow.domain.identity import (
     WorkspaceRole,
     WorkspaceSettings,
 )
+from noteflow.domain.ports.repositories.identity._workspace import WorkspaceCreateKwargs
 from noteflow.domain.value_objects import ExportFormat
 from noteflow.infrastructure.persistence.models import (
     DEFAULT_WORKSPACE_ID,
@@ -247,9 +248,7 @@ class SqlAlchemyWorkspaceRepository(
         workspace_id: UUID,
         name: str,
         owner_id: UUID,
-        slug: str | None = None,
-        is_default: bool = False,
-        settings: WorkspaceSettings | None = None,
+        **kwargs: Unpack[WorkspaceCreateKwargs],
     ) -> Workspace:
         """Create a new workspace with owner membership.
 
@@ -257,13 +256,14 @@ class SqlAlchemyWorkspaceRepository(
             workspace_id: UUID for the new workspace.
             name: Workspace name.
             owner_id: User UUID of the owner.
-            slug: Optional URL slug.
-            is_default: Whether this is the user's default workspace.
-            settings: Optional workspace settings.
+            **kwargs: Optional fields (slug, is_default, settings).
 
         Returns:
             Created workspace.
         """
+        slug = kwargs.get("slug")
+        is_default = kwargs.get("is_default", False)
+        settings = kwargs.get("settings")
         model = WorkspaceModel(
             id=workspace_id,
             name=name,
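A hypothetical call site for the reshaped workspace `create()` signature (the repository instance and surrounding wiring are assumed; only the signature shape comes from this diff):

```python
from uuid import uuid4


async def onboard(repo) -> None:  # repo: SqlAlchemyWorkspaceRepository, injected elsewhere
    # Required identifiers stay explicit parameters; optional fields ride
    # through **kwargs and are checked against WorkspaceCreateKwargs.
    workspace = await repo.create(
        workspace_id=uuid4(),
        name="Research",
        owner_id=uuid4(),
        is_default=True,
    )
    print(workspace.id)  # assuming the Workspace entity exposes .id
```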
@@ -12,6 +12,7 @@ from noteflow.infrastructure.converters.integration_converters import (
     IntegrationConverter,
     SyncRunConverter,
 )
+from noteflow.infrastructure.logging import get_logger
 from noteflow.infrastructure.persistence.models.integrations import (
     IntegrationModel,
     IntegrationSecretModel,
@@ -23,6 +24,8 @@ from noteflow.infrastructure.persistence.repositories._base import (
     GetByIdMixin,
 )
 
+logger = get_logger(__name__)
+
 
 class SqlAlchemyIntegrationRepository(
     BaseRepository,
@@ -98,6 +101,12 @@ class SqlAlchemyIntegrationRepository(
         kwargs = IntegrationConverter.to_orm_kwargs(integration)
         model = IntegrationModel(**kwargs)
         await self._add_and_flush(model)
+        logger.info(
+            "integration_created",
+            integration_id=str(model.id),
+            integration_type=model.type,
+            name=model.name,
+        )
         return IntegrationConverter.orm_to_domain(model)
 
     async def update(self, integration: Integration) -> Integration:
@@ -128,6 +137,11 @@ class SqlAlchemyIntegrationRepository(
         model.updated_at = integration.updated_at
 
         await self._session.flush()
+        logger.info(
+            "integration_updated",
+            integration_id=str(integration.id),
+            status=integration.status.value,
+        )
         return IntegrationConverter.orm_to_domain(model)
 
     async def delete(self, integration_id: UUID) -> bool:
@@ -195,6 +209,11 @@ class SqlAlchemyIntegrationRepository(
             for key, value in secrets.items()
         ]
         await self._add_all_and_flush(models)
+        logger.info(
+            "integration_secrets_updated",
+            integration_id=str(integration_id),
+            secret_count=len(secrets),
+        )
 
     async def list_by_type(self, integration_type: str) -> Sequence[Integration]:
         """List integrations by type.
@@ -237,6 +256,11 @@ class SqlAlchemyIntegrationRepository(
         kwargs = SyncRunConverter.to_orm_kwargs(sync_run)
         model = IntegrationSyncRunModel(**kwargs)
         await self._add_and_flush(model)
+        logger.info(
+            "sync_run_created",
+            sync_run_id=str(model.id),
+            integration_id=str(sync_run.integration_id),
+        )
         return SyncRunConverter.orm_to_domain(model)
 
     async def get_sync_run(self, sync_run_id: UUID) -> SyncRun | None:
@@ -282,6 +306,12 @@ class SqlAlchemyIntegrationRepository(
         model.stats = sync_run.stats
 
         await self._session.flush()
+        logger.info(
+            "sync_run_updated",
+            sync_run_id=str(sync_run.id),
+            status=sync_run.status.value,
+            duration_ms=sync_run.duration_ms,
+        )
         return SyncRunConverter.orm_to_domain(model)
 
     async def list_sync_runs(
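The keyword-argument call style (`logger.info("event_name", key=value)`) matches structlog bound loggers. A sketch of the equivalent standalone setup, assuming `get_logger` wraps structlog, which this diff does not confirm:

```python
import structlog

# Configure once at process start; JSON rendering suits log pipelines.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

logger = structlog.get_logger(__name__)

# Emits {"event": "sync_run_updated", "sync_run_id": "abc123",
#        "status": "completed", "duration_ms": 1240, "level": "info", ...}
logger.info("sync_run_updated", sync_run_id="abc123", status="completed", duration_ms=1240)
```

Note the flush-then-log ordering in the hunks above: SQLAlchemy only populates `model.id` and other server-generated defaults on flush, so logging earlier would record incomplete state.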
@@ -2,17 +2,22 @@
 
 from collections.abc import Sequence
 from datetime import datetime
+from typing import Unpack
 from uuid import UUID
 
 from sqlalchemy import func, select
 
 from noteflow.config.constants import ERROR_MSG_MEETING_PREFIX
 from noteflow.domain.entities import Meeting
+from noteflow.domain.ports.repositories.transcript import MeetingListKwargs
 from noteflow.domain.value_objects import MeetingId, MeetingState
 from noteflow.infrastructure.converters import OrmConverter
+from noteflow.infrastructure.logging import get_logger
 from noteflow.infrastructure.persistence.models import MeetingModel
 from noteflow.infrastructure.persistence.repositories._base import BaseRepository
 
+logger = get_logger(__name__)
+
 
 class SqlAlchemyMeetingRepository(BaseRepository):
     """SQLAlchemy implementation of MeetingRepository."""
@@ -41,6 +46,7 @@ class SqlAlchemyMeetingRepository(BaseRepository):
         )
         self._session.add(model)
         await self._session.flush()
+        logger.info("meeting_created", meeting_id=str(meeting.id))
         return meeting
 
     async def get(self, meeting_id: MeetingId) -> Meeting | None:
@@ -91,6 +97,7 @@ class SqlAlchemyMeetingRepository(BaseRepository):
         meeting.version = model.version
 
         await self._session.flush()
+        logger.info("meeting_updated", meeting_id=str(meeting.id), version=meeting.version)
         return meeting
 
     async def delete(self, meeting_id: MeetingId) -> bool:
@@ -106,30 +113,31 @@ class SqlAlchemyMeetingRepository(BaseRepository):
         model = await self._execute_scalar(stmt)
 
         if model is None:
+            logger.debug("meeting_delete_not_found", meeting_id=str(meeting_id))
             return False
 
         await self._delete_and_flush(model)
+        logger.info("meeting_deleted", meeting_id=str(meeting_id))
         return True
 
     async def list_all(
         self,
-        states: list[MeetingState] | None = None,
-        limit: int = 100,
-        offset: int = 0,
-        sort_desc: bool = True,
-        project_id: UUID | None = None,
+        **kwargs: Unpack[MeetingListKwargs],
     ) -> tuple[Sequence[Meeting], int]:
         """List meetings with optional filtering.
 
         Args:
-            states: Optional list of states to filter by.
-            limit: Maximum number of meetings to return.
-            offset: Number of meetings to skip.
-            sort_desc: Sort by created_at descending if True.
+            **kwargs: Optional filters (states, limit, offset, sort_desc, project_id).
 
         Returns:
             Tuple of (meetings list, total count matching filter).
         """
+        states = kwargs.get("states")
+        limit = kwargs.get("limit", 100)
+        offset = kwargs.get("offset", 0)
+        sort_desc = kwargs.get("sort_desc", True)
+        project_id = kwargs.get("project_id")
+
         # Build base query
         stmt = select(MeetingModel)
 
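A hypothetical call site for the kwargs-based `list_all` (the repository instance is assumed; filter names come from the docstring above):

```python
from uuid import UUID


async def recent_meetings(repo, project_id: UUID):
    # Omitted keys fall back to the kwargs.get() defaults above
    # (limit=100, offset=0, sort_desc=True, no state/project filter).
    meetings, total = await repo.list_all(
        project_id=project_id,
        limit=10,
        sort_desc=True,
    )
    return meetings, total
```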
@@ -8,9 +8,12 @@ from datetime import datetime
 
 from sqlalchemy import func, select
 
+from noteflow.infrastructure.logging import get_logger
 from noteflow.infrastructure.persistence.models import UserPreferencesModel
 from noteflow.infrastructure.persistence.repositories._base import BaseRepository
 
+logger = get_logger(__name__)
+
 
 @dataclass(frozen=True)
 class PreferenceWithMetadata:
@@ -77,10 +80,12 @@ class SqlAlchemyPreferencesRepository(BaseRepository):
         if model is None:
             model = UserPreferencesModel(key=key, value={"value": value})
             self._session.add(model)
+            await self._session.flush()
+            logger.info("preference_created", key=key)
         else:
             model.value = {"value": value}
-
-        await self._session.flush()
+            await self._session.flush()
+            logger.info("preference_updated", key=key)
 
     async def delete(self, key: str) -> bool:
         """Delete a preference.
@@ -97,6 +102,7 @@ class SqlAlchemyPreferencesRepository(BaseRepository):
             return False
 
         await self._delete_and_flush(model)
+        logger.info("preference_deleted", key=key)
         return True
 
     async def get_all(self, keys: list[str] | None = None) -> dict[str, object]:
@@ -8,9 +8,12 @@ from sqlalchemy import func, select, update
 from noteflow.domain.entities import Segment
 from noteflow.domain.value_objects import MeetingId
 from noteflow.infrastructure.converters import OrmConverter
+from noteflow.infrastructure.logging import get_logger
 from noteflow.infrastructure.persistence.models import SegmentModel, WordTimingModel
 from noteflow.infrastructure.persistence.repositories._base import BaseRepository
 
+logger = get_logger(__name__)
+
 
 class SqlAlchemySegmentRepository(BaseRepository):
     """SQLAlchemy implementation of SegmentRepository."""
@@ -66,6 +69,7 @@ class SqlAlchemySegmentRepository(BaseRepository):
         # Update segment with db_id
         segment.db_id = model.id
         segment.meeting_id = meeting_id
+        logger.info("segment_added", meeting_id=str(meeting_id), segment_id=segment.segment_id)
         return segment
 
     async def add_batch(
@@ -102,6 +106,7 @@ class SqlAlchemySegmentRepository(BaseRepository):
             segment.db_id = model.id
             segment.meeting_id = meeting_id
 
+        logger.info("segments_batch_added", meeting_id=str(meeting_id), count=len(segments))
        return list(segments)
 
     async def get_by_meeting(
Some files were not shown because too many files have changed in this diff.