chore: update logging configuration and enhance project structure

- Added new logging configuration to improve observability across various services.
- Introduced a `.repomixignore` file to exclude unnecessary files from Repomix output.
- Updated `pyproject.toml` to include additional paths for script discovery.
- Refreshed submodule references for the client to ensure compatibility with recent changes.

All quality checks pass.
2025-12-31 15:23:57 +00:00
parent bbc88ed10b
commit 96ed391a7c
147 changed files with 9879 additions and 1031 deletions

View File

@@ -9,7 +9,7 @@ conditions:
pattern: tests?/.*\.py$
- field: new_text
operator: regex_match
pattern: \b(for|while)\s+[^:]+:[\s\S]*?(assert|pytest\.raises)|if\s+[^:]+:[\s\S]*?(assert|pytest\.raises)
pattern: \b(for|while|if)\s+[^:]+:[\s\S]*?(assert|pytest\.raises)
---
🚫 **Test Quality Violation: Loops or Conditionals in Tests**

View File

@@ -1,4 +1,116 @@
# Add patterns to ignore here, one per line
# Example:
# *.log
# tmp/
# Generated protobuf files (large, auto-generated)
**/*_pb2.py
**/*_pb2_grpc.py
**/*.pb2.py
**/*.pb2_grpc.py
# Lock files (very large, not needed for code understanding)
uv.lock
**/Cargo.lock
**/package-lock.json
**/bun.lockb
**/yarn.lock
**/*.lock
# Binary/image files
**/*.png
**/*.jpg
**/*.jpeg
**/*.gif
**/*.ico
**/*.svg
**/*.icns
**/*.webp
client/app-icon.png
client/public/favicon.ico
client/public/placeholder.svg
# Build artifacts and generated code
**/target/
**/gen/
**/dist/
**/build/
**/.vite/
**/node_modules/
**/__pycache__/
**/*.egg-info/
**/.pytest_cache/
**/.mypy_cache/
**/.ruff_cache/
**/coverage/
**/htmlcov/
**/playwright-report/
**/test-results/
# Documentation (verbose, can be referenced separately)
docs/
**/*.md
!README.md
!AGENTS.md
!CLAUDE.md
# Benchmark files
.benchmarks/
**/*.json
!package.json
!tsconfig*.json
!biome.json
!components.json
!compose.yaml
!alembic.ini
!pyproject.toml
!repomix.config.json
# Large API spec file
noteflow-api-spec.json
# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*.swn
*.code-workspace
# Temporary and scratch files
*.tmp
*.temp
scratch.md
repomix-output.md
# Environment files
.env
.env.*
!.env.example
# Logs
logs/
*.log
# Spikes (experimental code)
spikes/
# Python virtual environment
.venv/
venv/
env/
# OS files
.DS_Store
._*
.Spotlight-V100
.Trashes
Thumbs.db
ehthumbs.db
*~
# Git files
.git/
.gitmodules
# Claude/Serena project files (internal tooling)
.claude/
.serena/
# Dev container configs (not needed for code understanding)
.devcontainer/

client

Submodule client updated: d85f9edd6d...4e52a319fb

View File

@@ -0,0 +1,415 @@
# NoteFlow Spec Validation (2025-12-31)
This document validates the previous spec review against the current repository state and
adds concrete evidence with file locations and excerpts. Locations are given as
`path:line` within this repo.
## Corrections vs prior spec (validated)
- Background tasks: diarization jobs are already tracked and cancelled on shutdown; the
untracked task issue is specific to integration sync tasks.
- Webhook executor already uses per-request timeouts and truncates response bodies; gaps
are delivery-id tracking and connection limits, not retry logic itself.
- Outlook adapter error handling is synchronous-safe with `response.text`, but lacks
explicit timeouts, pagination, and bounded error body logging.
---
## High-impact findings (confirmed/updated)
### 1) Timestamp representations are inconsistent across the gRPC schema
Status: Confirmed.
Evidence:
- `src/noteflow/grpc/proto/noteflow.proto:217`
```proto
// Creation timestamp (Unix epoch seconds)
double created_at = 4;
```
- `src/noteflow/grpc/proto/noteflow.proto:745`
```proto
// Start time (Unix timestamp seconds)
int64 start_time = 3;
```
- `src/noteflow/grpc/proto/noteflow.proto:1203`
```proto
// Start timestamp (ISO 8601)
string started_at = 7;
```
- `src/noteflow/grpc/proto/noteflow.proto:149`
```proto
// Server-side processing timestamp
double server_timestamp = 5;
```
Why it matters:
- Multiple time encodings (double seconds, int64 seconds, ISO strings) force
per-field conversions and increase client/server mismatch risk.
Recommendations:
- Standardize absolute time to `google.protobuf.Timestamp` and durations to
`google.protobuf.Duration` in new fields or v2 messages.
- Keep legacy fields for backward compatibility and deprecate them in comments.
- Provide helper conversions in `src/noteflow/grpc/_mixins/converters.py` to reduce
repeated ad-hoc conversions.
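A minimal sketch of the kind of conversion helpers meant here, assuming `google.protobuf.Timestamp` is adopted (function names are illustrative, not the existing `converters.py` API):
```python
from datetime import UTC, datetime

from google.protobuf.timestamp_pb2 import Timestamp


def datetime_to_proto(dt: datetime) -> Timestamp:
    """Convert an aware datetime to a protobuf Timestamp."""
    ts = Timestamp()
    ts.FromDatetime(dt.astimezone(UTC))
    return ts


def epoch_seconds_to_proto(seconds: float) -> Timestamp:
    """Convert legacy Unix-epoch-seconds fields to a protobuf Timestamp."""
    return datetime_to_proto(datetime.fromtimestamp(seconds, tz=UTC))


def proto_to_epoch_seconds(ts: Timestamp) -> float:
    """Convert a protobuf Timestamp back to legacy epoch seconds."""
    return ts.ToDatetime(tzinfo=UTC).timestamp()
```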
---
### 2) UpdateAnnotation uses sentinel defaults with no presence tracking
Status: Confirmed.
Evidence:
- `src/noteflow/grpc/proto/noteflow.proto:502`
```proto
message UpdateAnnotationRequest {
// Updated type (optional, keeps existing if not set)
AnnotationType annotation_type = 2;
// Updated text (optional, keeps existing if empty)
string text = 3;
// Updated start time (optional, keeps existing if 0)
double start_time = 4;
// Updated end time (optional, keeps existing if 0)
double end_time = 5;
// Updated segment IDs (replaces existing)
repeated int32 segment_ids = 6;
}
```
- `src/noteflow/grpc/_mixins/annotation.py:127`
```python
# Update fields if provided
if request.annotation_type != noteflow_pb2.ANNOTATION_TYPE_UNSPECIFIED:
annotation.annotation_type = proto_to_annotation_type(request.annotation_type)
if request.text:
annotation.text = request.text
if request.start_time > 0:
annotation.start_time = request.start_time
if request.end_time > 0:
annotation.end_time = request.end_time
if request.segment_ids:
annotation.segment_ids = list(request.segment_ids)
```
- Contrast: presence-aware optional fields already exist elsewhere:
`src/noteflow/grpc/proto/noteflow.proto:973`
```proto
message UpdateWebhookRequest {
// Updated URL (optional)
optional string url = 2;
// Updated name (optional)
optional string name = 4;
// Updated enabled status (optional)
optional bool enabled = 6;
}
```
Why it matters:
- You cannot clear text to an empty string or set a time to 0 intentionally.
- `segment_ids` cannot be cleared because an empty list is treated as "no update".
Recommendations:
- Introduce a patch-style request with `google.protobuf.FieldMask` (or `optional` fields)
and keep the legacy fields for backward compatibility.
- If you keep legacy fields, add explicit `clear_*` flags for fields that need clearing.
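A hedged sketch of the server-side handling once presence is available, assuming the request gains proto3 `optional` scalars plus a `clear_segment_ids` flag (names mirror the existing message but are illustrative):
```python
def apply_annotation_patch(annotation, request) -> None:
    """Apply only the fields the client explicitly set."""
    if request.HasField("text"):        # empty string becomes a legal update
        annotation.text = request.text
    if request.HasField("start_time"):  # 0.0 becomes a legal update
        annotation.start_time = request.start_time
    if request.HasField("end_time"):
        annotation.end_time = request.end_time
    if request.clear_segment_ids:       # explicit clear flag for the repeated field
        annotation.segment_ids = []
    elif request.segment_ids:
        annotation.segment_ids = list(request.segment_ids)
```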
---
### 3) TranscriptUpdate payload is ambiguous without `oneof`
Status: Confirmed.
Evidence:
- `src/noteflow/grpc/proto/noteflow.proto:136`
```proto
message TranscriptUpdate {
string meeting_id = 1;
UpdateType update_type = 2;
string partial_text = 3;
FinalSegment segment = 4;
double server_timestamp = 5;
}
```
Why it matters:
- The schema allows both `partial_text` and `segment` or neither, even when
`update_type` implies one payload. Clients must defensively branch.
Recommendations:
- Add a new `TranscriptUpdateV2` with `oneof payload { PartialTranscript partial = 4; FinalSegment segment = 5; }`
and a new RPC (e.g., `StreamTranscriptionV2`) to avoid breaking existing clients.
- Prefer `google.protobuf.Timestamp` for `server_timestamp` in the v2 message.
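Once a `oneof` payload exists, clients can branch on `WhichOneof` instead of checking both fields defensively; a sketch against the hypothetical `TranscriptUpdateV2` (field access and handling are placeholders):
```python
def handle_update(update) -> None:
    """Dispatch on the oneof payload of a hypothetical TranscriptUpdateV2."""
    kind = update.WhichOneof("payload")
    if kind == "partial":
        print("partial:", update.partial.text)  # placeholder handling
    elif kind == "segment":
        print("segment:", update.segment.text)  # placeholder handling
    # kind is None when no payload was set; nothing to render
```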
---
### 4) Background task tracking is inconsistent
Status: Partially confirmed.
Evidence (tracked + cancelled diarization tasks):
- `src/noteflow/grpc/_mixins/diarization/_jobs.py:130`
```python
# Create background task and store reference for potential cancellation
task = asyncio.create_task(self._run_diarization_job(job_id, num_speakers))
self._diarization_tasks[job_id] = task
```
- `src/noteflow/grpc/service.py:445`
```python
for job_id, task in list(self._diarization_tasks.items()):
if not task.done():
task.cancel()
with contextlib.suppress(asyncio.CancelledError):
await task
```
Evidence (untracked sync tasks):
- `src/noteflow/grpc/_mixins/sync.py:109`
```python
sync_task = asyncio.create_task(
self._perform_sync(integration_id, sync_run.id, str(provider)),
name=f"sync-{sync_run.id}",
)
# Add callback to clean up on completion
sync_task.add_done_callback(lambda _: None)
```
Why it matters:
- Sync tasks are not stored for cancellation on shutdown and exceptions are not
centrally observed (even if `_perform_sync` handles most errors).
Recommendations:
- Add a shared background-task registry (or a `TaskGroup`) in the servicer and
register sync tasks so they can be cancelled on shutdown.
- Use a done-callback that logs uncaught exceptions and removes the task from the registry.
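A minimal sketch of such a registry, assuming plain asyncio (the class name and wiring are illustrative):
```python
import asyncio
import logging
from collections.abc import Coroutine

logger = logging.getLogger(__name__)


class BackgroundTaskRegistry:
    """Track fire-and-forget tasks so they can be observed and cancelled."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task[None]] = set()

    def spawn(self, coro: Coroutine[object, object, None], *, name: str) -> asyncio.Task[None]:
        """Create a task, keep a reference, and observe its outcome."""
        task = asyncio.create_task(coro, name=name)
        self._tasks.add(task)
        task.add_done_callback(self._on_done)
        return task

    def _on_done(self, task: asyncio.Task[None]) -> None:
        self._tasks.discard(task)
        if not task.cancelled() and task.exception() is not None:
            logger.error("Background task %s failed", task.get_name(), exc_info=task.exception())

    async def shutdown(self) -> None:
        """Cancel all tracked tasks and wait for them to finish."""
        for task in list(self._tasks):
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
```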
---
### 5) Segmenter leading buffer uses O(n) `pop(0)` in a hot path
Status: Confirmed.
Evidence:
- `src/noteflow/infrastructure/asr/segmenter.py:233`
```python
while total_duration > self.config.leading_buffer and self._leading_buffer:
removed = self._leading_buffer.pop(0)
self._leading_buffer_samples -= len(removed)
```
Why it matters:
- `pop(0)` shifts the list each time, causing O(n) behavior under sustained audio streaming.
Recommendations:
- Replace the list with `collections.deque` and use `popleft()` for O(1) removals.
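A sketch of the deque-backed buffer, assuming chunks are stored as objects whose `len()` is the sample count (class and field names are illustrative):
```python
from collections import deque


class LeadingBuffer:
    """Deque-backed leading buffer with O(1) trimming from the front."""

    def __init__(self, max_samples: int) -> None:
        self._chunks: deque[bytes] = deque()
        self._samples = 0
        self._max_samples = max_samples

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._samples += len(chunk)
        while self._samples > self._max_samples and self._chunks:
            removed = self._chunks.popleft()  # O(1) vs. list.pop(0)'s O(n)
            self._samples -= len(removed)
```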
---
### 6) ChunkedAssetReader lacks strict bounds checks for chunk framing
Status: Partially confirmed.
Evidence:
- `src/noteflow/infrastructure/security/crypto.py:279`
```python
length_bytes = self._handle.read(4)
if len(length_bytes) < 4:
break # End of file
chunk_length = struct.unpack(">I", length_bytes)[0]
chunk_data = self._handle.read(chunk_length)
if len(chunk_data) < chunk_length:
raise ValueError("Truncated chunk")
nonce = chunk_data[:NONCE_SIZE]
ciphertext = chunk_data[NONCE_SIZE:-TAG_SIZE]
tag = chunk_data[-TAG_SIZE:]
```
Why it matters:
- There is no explicit guard for `chunk_length < NONCE_SIZE + TAG_SIZE`, which can
create invalid slices and decryption failures.
- A short read of the 1-byte version header in `open()` is not checked before unpacking.
Recommendations:
- Add a `read_exact()` helper and validate `chunk_length >= NONCE_SIZE + TAG_SIZE`.
- Treat partial length headers as errors (or explicitly document EOF behavior).
- Consider optional AAD (chunk index/version) to detect reordering if needed.
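A sketch of the bounds-checked framing read, with `NONCE_SIZE`/`TAG_SIZE` values assumed for illustration:
```python
import struct
from typing import BinaryIO

NONCE_SIZE = 12  # assumed for illustration
TAG_SIZE = 16    # assumed for illustration


def read_exact(handle: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes or raise on truncation."""
    data = handle.read(size)
    if len(data) != size:
        raise ValueError(f"Truncated read: expected {size} bytes, got {len(data)}")
    return data


def read_chunk(handle: BinaryIO) -> tuple[bytes, bytes, bytes] | None:
    """Read one length-prefixed chunk, validating framing bounds."""
    length_bytes = handle.read(4)
    if not length_bytes:
        return None  # clean EOF
    if len(length_bytes) < 4:
        raise ValueError("Truncated chunk length header")
    chunk_length = struct.unpack(">I", length_bytes)[0]
    if chunk_length < NONCE_SIZE + TAG_SIZE:
        raise ValueError(f"Chunk too short for nonce and tag: {chunk_length}")
    chunk_data = read_exact(handle, chunk_length)
    nonce = chunk_data[:NONCE_SIZE]
    ciphertext = chunk_data[NONCE_SIZE:-TAG_SIZE]
    tag = chunk_data[-TAG_SIZE:]
    return nonce, ciphertext, tag
```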
---
## Medium-priority, but worth fixing
### 7) gRPC size limits are defined in multiple places
Status: Confirmed.
Evidence:
- `src/noteflow/grpc/service.py:86`
```python
MAX_CHUNK_SIZE: Final[int] = 1024 * 1024 # 1MB
```
- `src/noteflow/config/constants.py:27`
```python
MAX_GRPC_MESSAGE_SIZE: Final[int] = 100 * 1024 * 1024
```
- `src/noteflow/grpc/server.py:158`
```python
self._server = grpc.aio.server(
options=[
("grpc.max_send_message_length", 100 * 1024 * 1024),
("grpc.max_receive_message_length", 100 * 1024 * 1024),
],
)
```
Why it matters:
- Multiple sources of truth can drift, and the service advertises `MAX_CHUNK_SIZE`
  without enforcing it in the streaming path.
Recommendations:
- Move message size and chunk size into `Settings` and use them consistently in
`server.py` and `service.py`.
- Enforce chunk size in streaming handlers and surface the same value in `ServerInfo`.
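One possible shape for a single source of truth, assuming a small settings object consumed by both `server.py` and `service.py` (the class is illustrative, not the existing `Settings`):
```python
from dataclasses import dataclass
from typing import Final

MB: Final[int] = 1024 * 1024


@dataclass(frozen=True)
class GrpcLimits:
    """Single place to define gRPC message and chunk sizes."""

    max_message_bytes: int = 100 * MB
    max_chunk_bytes: int = 1 * MB

    def server_options(self) -> list[tuple[str, int]]:
        """Options to pass to grpc.aio.server()."""
        return [
            ("grpc.max_send_message_length", self.max_message_bytes),
            ("grpc.max_receive_message_length", self.max_message_bytes),
        ]
```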
---
### 8) Outlook adapter lacks explicit timeouts and pagination handling
Status: Confirmed.
Evidence:
- `src/noteflow/infrastructure/calendar/outlook_adapter.py:81`
```python
async with httpx.AsyncClient() as client:
response = await client.get(url, params=params, headers=headers)
if response.status_code != HTTP_STATUS_OK:
error_msg = response.text
logger.error("Microsoft Graph API error: %s", error_msg)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_msg}")
```
Why it matters:
- No explicit timeouts or connection limits are set.
- Graph API frequently paginates via `@odata.nextLink`.
- Error bodies are logged in full (could be large).
Recommendations:
- Configure `httpx.AsyncClient(timeout=..., limits=httpx.Limits(...))`.
- Implement pagination with `@odata.nextLink` to honor `limit` correctly.
- Truncate error bodies before logging and raise a bounded error message.
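A sketch of the recommended client configuration and pagination loop, assuming Microsoft Graph's standard `value`/`@odata.nextLink` response shape (timeout, pool, and truncation values are illustrative):
```python
import httpx

GRAPH_TIMEOUT = httpx.Timeout(10.0, connect=5.0)
GRAPH_LIMITS = httpx.Limits(max_connections=10, max_keepalive_connections=5)
MAX_ERROR_BODY = 512  # truncate logged/raised error bodies


async def fetch_events(url: str, headers: dict[str, str], limit: int) -> list[dict[str, object]]:
    """Fetch calendar events, following @odata.nextLink pages up to `limit`."""
    events: list[dict[str, object]] = []
    async with httpx.AsyncClient(timeout=GRAPH_TIMEOUT, limits=GRAPH_LIMITS) as client:
        next_url: str | None = url
        while next_url and len(events) < limit:
            response = await client.get(next_url, headers=headers)
            if response.status_code != 200:
                body = response.text[:MAX_ERROR_BODY]
                raise RuntimeError(f"Microsoft Graph API error: {body}")
            payload = response.json()
            events.extend(payload.get("value", []))
            next_url = payload.get("@odata.nextLink")
    return events[:limit]
```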
---
### 9) Webhook executor: delivery ID is not recorded, and client limits are missing
Status: Partially confirmed.
Evidence:
- `src/noteflow/infrastructure/webhooks/executor.py:255`
```python
delivery_id = str(uuid4())
headers = {
HTTP_HEADER_WEBHOOK_DELIVERY: delivery_id,
HTTP_HEADER_WEBHOOK_TIMESTAMP: timestamp,
}
```
- `src/noteflow/infrastructure/webhooks/executor.py:306`
```python
return WebhookDelivery(
id=uuid4(),
webhook_id=config.id,
event_type=event_type,
...
)
```
- Client is initialized without limits:
`src/noteflow/infrastructure/webhooks/executor.py:103`
```python
self._client = httpx.AsyncClient(timeout=self._timeout)
```
Why it matters:
- The delivery ID sent to recipients is not stored in delivery records, making
correlation harder.
- Connection pooling limits are unspecified.
Recommendations:
- Reuse `delivery_id` as `WebhookDelivery.id` or add a dedicated field to persist it.
- Add `httpx.Limits` (max connections/keepalive) and consider retrying with `Retry-After`
for 429s.
- Include `delivery_id` in logs and any audit trail fields.
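A sketch of the two changes; the header name and pool sizes are illustrative values, not the existing constants:
```python
from uuid import uuid4

import httpx

# Reuse one ID for the outgoing header and the persisted record.
delivery_id = uuid4()
headers = {"X-Webhook-Delivery": str(delivery_id)}  # header name illustrative
# ... send the request, then persist the same ID ...
# delivery = WebhookDelivery(id=delivery_id, ...)

# Bound the shared client's connection pool.
client = httpx.AsyncClient(
    timeout=httpx.Timeout(30.0),
    limits=httpx.Limits(max_connections=20, max_keepalive_connections=10),
)
```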
---
### 10) OpenTelemetry exporter uses `insecure=True`
Status: Confirmed.
Evidence:
- `src/noteflow/infrastructure/observability/otel.py:99`
```python
otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)
```
Why it matters:
- TLS is disabled unconditionally when OTLP is configured, even in production.
Recommendations:
- Make `insecure` a settings flag or infer it from the endpoint scheme.
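A sketch of the settings-driven/inferred variant (parameter name and inference rule are illustrative):
```python
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter


def build_exporter(endpoint: str, *, insecure: bool | None = None) -> OTLPSpanExporter:
    """Build the OTLP exporter, inferring TLS from the scheme unless overridden.

    `insecure` would come from settings; None means infer from the endpoint.
    """
    if insecure is None:
        insecure = endpoint.startswith("http://")  # explicit http:// -> plaintext
    return OTLPSpanExporter(endpoint=endpoint, insecure=insecure)
```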
---
## Cross-cutting design recommendations
### 11) Replace stringly-typed statuses with enums in proto
Status: Confirmed.
Evidence:
- `src/noteflow/grpc/proto/noteflow.proto:1191`
```proto
// Status: "running", "success", "error"
string status = 3;
```
- `src/noteflow/grpc/proto/noteflow.proto:856`
```proto
// Connection status: disconnected, connected, error
string status = 2;
```
Why it matters:
- Clients must match string literals and risk typos or unsupported values.
Recommendations:
- Introduce enums (e.g., `SyncRunStatus`, `OAuthConnectionStatus`) with explicit values
and migrate clients gradually via new fields or v2 messages.
---
### 12) Test targets to cover the highest-risk changes
Status: Recommendation.
Existing coverage highlights:
- Segmenter fuzz tests already exist: `tests/stress/test_segmenter_fuzz.py`.
- Crypto chunk reader integrity tests exist: `tests/stress/test_audio_integrity.py`.
Suggested additions:
- A gRPC proto-level test for patch semantics on `UpdateAnnotation` once a mask/optional
field approach is introduced.
- A sync task lifecycle test that asserts background tasks are cancelled on shutdown.
- An Outlook adapter test that simulates `@odata.nextLink` pagination.
---
## Small, low-risk cleanup opportunities
- Consider replacing `Delete*Response { bool success }` in new RPCs with
`google.protobuf.Empty` to reduce payload variability.
- Audit other timestamp fields (`double` vs `int64` vs `string`) and normalize when
introducing new API versions.

View File

@@ -42,14 +42,61 @@ npm run quality:all # TS + Rust quality
### Code Limits
| Metric | Soft Limit | Hard Limit | Location |
|--------|------------|------------|----------|
| Module lines | 500 | 750 | `test_code_smells.py` |
| Function lines | 50 (tests), 75 (src) | | `test_code_smells.py` |
| Function complexity | 15 | | `test_code_smells.py` |
| Parameters | 7 | | `test_code_smells.py` |
| Class methods | 20 | | `test_code_smells.py` |
| Nesting depth | 5 | | `test_code_smells.py` |
| Metric | Threshold | Max Violations | Location |
|--------|-----------|----------------|----------|
| Module lines (soft) | 500 | 5 | `test_code_smells.py` |
| Module lines (hard) | 750 | 0 | `test_code_smells.py` |
| Function lines (src) | 75 | 7 | `test_code_smells.py` |
| Function lines (tests) | 50 | 3 | `test_test_smells.py` |
| Function complexity | 15 | 2 | `test_code_smells.py` |
| Parameters | 7 | 35 | `test_code_smells.py` |
| Class methods | 20 | 1 | `test_code_smells.py` |
| Class lines (god class) | 500 | 1 | `test_code_smells.py` |
| Nesting depth | 5 | 2 | `test_code_smells.py` |
| Feature envy | 5+ accesses | 5 | `test_code_smells.py` |
### Magic Values & Literals (`test_magic_values.py`)
| Rule | Max Allowed | Target | Description |
|------|-------------|--------|-------------|
| Magic numbers (>100) | 10 | 0 | Use named constants |
| Repeated string literals | 30 | 0 | Extract to constants |
| Hardcoded paths | 0 | 0 | Use Path objects/config |
| Hardcoded credentials | 0 | 0 | Use env vars/secrets |
### Stale Code (`test_stale_code.py`)
| Rule | Max Allowed | Target | Description |
|------|-------------|--------|-------------|
| Stale TODO/FIXME comments | 10 | 0 | Address or remove |
| Commented-out code blocks | 0 | 0 | Remove dead code |
| Unused imports | 5 | 0 | Remove or use |
| Unreachable code | 0 | 0 | Remove dead paths |
| Deprecated patterns | 5 | 0 | Modernize code |
### Duplicate Code (`test_duplicate_code.py`)
| Rule | Max Allowed | Target | Description |
|------|-------------|--------|-------------|
| Duplicate function bodies | 1 | 0 | Extract shared functions |
| Repeated code patterns | 177 | 50 | Refactor to reduce duplication |
### Unnecessary Wrappers (`test_unnecessary_wrappers.py`)
| Rule | Max Allowed | Target | Description |
|------|-------------|--------|-------------|
| Trivial wrapper functions | varies | 0 | Remove or add value |
| Alias imports | varies | 0 | Import directly |
| Redundant type aliases | 2 | 0 | Use original types |
| Passthrough classes | 1 | 0 | Flatten hierarchy |
### Decentralized Helpers (`test_decentralized_helpers.py`)
| Rule | Max Allowed | Target | Description |
|------|-------------|--------|-------------|
| Scattered helper functions | 15 | 5 | Consolidate to utils |
| Utility modules not centralized | 0 | 0 | Move to shared location |
| Duplicate helper implementations | 25 | 0 | Deduplicate |
### Test Requirements
@@ -57,27 +104,33 @@ npm run quality:all # TS + Rust quality
| Rule | Max Allowed | Target | File |
|------|-------------|--------|------|
| Assertion roulette (>3 assertions without msg) | 25 | 0 | `test_test_smells.py` |
| Conditional test logic | 15 | 0 | `test_test_smells.py` |
| Assertion roulette (>3 assertions without msg) | 50 | 0 | `test_test_smells.py` |
| Conditional test logic | 40 | 0 | `test_test_smells.py` |
| Empty tests | 0 | 0 | `test_test_smells.py` |
| Sleepy tests (time.sleep) | 3 | 0 | `test_test_smells.py` |
| Tests without assertions | 3 | 0 | `test_test_smells.py` |
| Tests without assertions | 5 | 0 | `test_test_smells.py` |
| Redundant assertions | 0 | 0 | `test_test_smells.py` |
| Print statements in tests | 3 | 0 | `test_test_smells.py` |
| Print statements in tests | 5 | 0 | `test_test_smells.py` |
| Skipped tests without reason | 0 | 0 | `test_test_smells.py` |
| Exception handling (try/except) | 3 | 0 | `test_test_smells.py` |
| Magic numbers in assertions | 25 | 10 | `test_test_smells.py` |
| Duplicate test names | 5 | 0 | `test_test_smells.py` |
| Exception handling (broad try/except) | 3 | 0 | `test_test_smells.py` |
| Magic numbers in assertions | 50 | 10 | `test_test_smells.py` |
| Sensitive equality (str/repr compare) | 10 | 0 | `test_test_smells.py` |
| Eager tests (>10 method calls) | 10 | 0 | `test_test_smells.py` |
| Duplicate test names | 15 | 0 | `test_test_smells.py` |
| Hardcoded test data paths | 0 | 0 | `test_test_smells.py` |
| Long test methods (>50 lines) | 3 | 0 | `test_test_smells.py` |
| unittest-style assertions | 0 | 0 | `test_test_smells.py` |
| Fixtures without type hints | 5 | 0 | `test_test_smells.py` |
| Unused fixture parameters | 3 | 0 | `test_test_smells.py` |
| pytest.raises without match= | 20 | 0 | `test_test_smells.py` |
| Session fixtures with mutation | 0 | 0 | `test_test_smells.py` |
| Fixtures without type hints | 10 | 0 | `test_test_smells.py` |
| Unused fixture parameters | 5 | 0 | `test_test_smells.py` |
| Fixtures with wrong scope | 5 | 0 | `test_test_smells.py` |
| Conftest fixture duplication | 0 | 0 | `test_test_smells.py` |
| pytest.raises without match= | 50 | 0 | `test_test_smells.py` |
| Cross-file fixture duplicates | 0 | 0 | `test_test_smells.py` |
**Reduction schedule**:
- After each sprint, reduce non-zero thresholds by 20% (rounded down)
- Goal: All thresholds at target values by Sprint 6
- Goal: All thresholds at target values by Sprint 8
### Docstring Requirements
@@ -134,12 +187,13 @@ npm run quality:all # TS + Rust quality
| Repeated strings | >3 occurrences | Extract to constants |
| TODO/FIXME comments | >10 | Address or remove |
| Long functions | >100 lines | Split into helpers |
| Deep nesting | >5 levels (20 spaces) | Flatten control flow |
| Deep nesting | >7 levels (28 spaces) | Flatten control flow |
| unwrap() calls | >20 | Use ? or expect() |
| clone() per file | >10 | Review ownership |
| clone() per file | >10 suspicious | Review ownership (excludes Arc::clone, handles) |
| Parameters | >5 | Use struct/builder |
| Duplicate error messages | >2 | Use error enum |
| File size | >500 lines | Split module |
| Scattered helpers | >10 files | Consolidate format_/parse_/convert_ functions |
### Clippy Enforcement

View File

@@ -0,0 +1,503 @@
# Sprint: Centralized Logging Infrastructure
| Attribute | Value |
|-----------|-------|
| **Size** | L (Large) |
| **Phase** | Infrastructure |
| **Prerequisites** | None |
| **Owner** | TBD |
---
## Open Issues
| Issue | Blocking? | Resolution Path |
|-------|-----------|-----------------|
| LogBuffer integration | No | Adapt LogBuffer to consume structlog events |
| CLI modules use Rich Console | No | Ensure no conflicts with structlog Rich renderer |
---
## Validation Status
| Component | Exists | Notes |
|-----------|--------|-------|
| `LogBuffer` / `LogBufferHandler` | Yes | Needs adaptation for structlog |
| `get_logging_context()` | Yes | Context vars for request_id, user_id, workspace_id |
| OTEL trace context capture | Yes | `_get_current_trace_context()` in log_buffer.py |
| Rich dependency | Yes | `rich>=14.2.0` in pyproject.toml |
| structlog dependency | No | Must add to pyproject.toml |
---
## Objective
Centralize NoteFlow's logging infrastructure using **structlog** with dual output: **Rich console rendering** for development and **JSON output** for observability/OTEL integration. Migrate all 71 existing files from stdlib `logging` to structlog while preserving existing context propagation and OTEL trace correlation.
---
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| **Library** | structlog + Rich | Structured logging with context binding; Rich console renderer included |
| **Output Strategy** | Dual simultaneous | JSON to file/collector AND Rich to console always |
| **Context Handling** | Auto-inject + override | Leverage existing context vars; allow per-call extras |
| **Migration Scope** | Full migration | Convert all 71 files to `structlog.get_logger()` |
| **stdlib Bridge** | Yes | Use `structlog.stdlib` for seamless integration |
---
## What Already Exists
### Reusable Assets
| Asset | Location | Reuse Strategy |
|-------|----------|----------------|
| Context variables | `infrastructure/logging/structured.py` | Inject via structlog processor |
| LogEntry dataclass | `infrastructure/logging/log_buffer.py` | Adapt as structlog processor output |
| LogBuffer | `infrastructure/logging/log_buffer.py` | Create structlog processor that feeds LogBuffer |
| OTEL trace extraction | `infrastructure/logging/log_buffer.py:132-156` | Convert to structlog processor |
| Observability setup | `infrastructure/observability/otel.py` | Integrate with structlog OTEL processor |
### Current Logging Patterns
```python
# Pattern 1: Module-level logger (71 files)
import logging
logger = logging.getLogger(__name__)
logger.info("message", extra={...})
# Pattern 2: %-style formatting (widespread)
logger.warning("Failed to process %s: %s", item_id, error)
# Pattern 3: Exception logging
logger.exception("Operation failed")
# Pattern 4: CLI modules (2 files)
logging.basicConfig(level=logging.INFO)
```
---
## Scope
### Task Breakdown
| Task | Effort | Description |
|------|--------|-------------|
| **T1: Core Configuration** | M | Create `configure_logging()` with dual output |
| **T2: Structlog Processors** | M | Build processor chain (context, OTEL, timestamps) |
| **T3: Rich Renderer Integration** | S | Configure structlog's ConsoleRenderer with Rich |
| **T4: JSON Renderer** | S | Configure JSONRenderer for observability |
| **T5: LogBuffer Processor** | M | Create processor that feeds existing LogBuffer |
| **T6: Context Injection Processor** | S | Processor using `get_logging_context()` |
| **T7: OTEL Span Processor** | S | Extract trace_id/span_id from current span |
| **T8: Entry Point Updates** | S | Update `grpc/server.py`, CLI entry points |
| **T9: Migration Script** | M | AST-based migration of 71 files |
| **T10: File Migration (Batch 1)** | L | Migrate application/services (12 files) |
| **T11: File Migration (Batch 2)** | L | Migrate infrastructure/* (35 files) |
| **T12: File Migration (Batch 3)** | L | Migrate grpc/* (24 files) |
| **T13: Test Updates** | M | Update test fixtures and assertions |
| **T14: Documentation** | S | Update CLAUDE.md and add logging guide |
**Total Effort:** XL (spans multiple sessions)
---
## Architecture
### Processor Chain
```
┌─────────────────────────────────────────────────────────────────────────┐
│ Structlog Processor Chain │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. filter_by_level ─► Skip if level too low │
│ 2. add_logger_name ─► Add logger name to event │
│ 3. add_log_level ─► Add level string │
│ 4. PositionalArgumentsFormatter ─► Handle %-style formatting │
│ 5. TimeStamper(fmt="iso") ─► ISO 8601 timestamp │
│ 6. add_noteflow_context ─► request_id, user_id, workspace_id │
│ 7. add_otel_trace_context ─► trace_id, span_id, parent_span_id │
│ 8. CallsiteParameterAdder ─► filename, func_name, lineno │
│ 9. StackInfoRenderer ─► Stack traces if requested │
│ 10. format_exc_info ─► Exception formatting │
│ 11. UnicodeDecoder ─► Decode bytes to str │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Rich Console │ │ JSON File/OTLP │ │
│ │ (dev.ConsoleRenderer)│ │ (JSONRenderer) │ │
│ └──────────┬──────────┘ └──────────┬──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ StreamHandler FileHandler / LogBuffer │
│ (stderr, TTY) (noteflow.log / in-memory) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
```
### Module Structure
```
infrastructure/logging/
├── __init__.py # Public API exports
├── config.py # NEW: configure_logging(), LoggingConfig
├── processors.py # NEW: Custom processors (context, OTEL, LogBuffer)
├── handlers.py # NEW: Dual-output handler configuration
├── structured.py # KEEP: Context variables (minimal changes)
└── log_buffer.py # ADAPT: LogBuffer processor integration
```
---
## Domain Model
### LoggingConfig (New)
```python
@dataclass(frozen=True, slots=True)
class LoggingConfig:
"""Configuration for centralized logging."""
level: str = "INFO"
json_file: Path | None = None # None = no file output
enable_console: bool = True
enable_json_console: bool = False # Force JSON even on TTY
enable_log_buffer: bool = True
enable_otel_context: bool = True
enable_noteflow_context: bool = True
console_colors: bool = True # Auto-detect TTY if None
```
### Custom Processors (New)
```python
def add_noteflow_context(
logger: WrappedLogger,
method_name: str,
event_dict: EventDict,
) -> EventDict:
"""Inject request_id, user_id, workspace_id from context vars."""
ctx = get_logging_context()
for key, value in ctx.items():
if value is not None and key not in event_dict:
event_dict[key] = value
return event_dict
def add_otel_trace_context(
logger: WrappedLogger,
method_name: str,
event_dict: EventDict,
) -> EventDict:
"""Inject OpenTelemetry trace/span IDs if available."""
try:
from opentelemetry import trace
span = trace.get_current_span()
if span.is_recording():
ctx = span.get_span_context()
event_dict["trace_id"] = format(ctx.trace_id, "032x")
event_dict["span_id"] = format(ctx.span_id, "016x")
parent = getattr(span, "parent", None)
if parent:
event_dict["parent_span_id"] = format(parent.span_id, "016x")
except ImportError:
pass
return event_dict
def log_buffer_processor(
logger: WrappedLogger,
method_name: str,
event_dict: EventDict,
) -> EventDict:
"""Feed structured event to LogBuffer for UI streaming."""
buffer = get_log_buffer()
buffer.append(
LogEntry(
timestamp=event_dict.get("timestamp", datetime.now(UTC)),
level=event_dict.get("level", "info"),
source=event_dict.get("logger", ""),
message=event_dict.get("event", ""),
details={k: str(v) for k, v in event_dict.items()
if k not in ("timestamp", "level", "logger", "event")},
trace_id=event_dict.get("trace_id"),
span_id=event_dict.get("span_id"),
)
)
return event_dict
```
---
## Configuration API
### Primary Entry Point
```python
# infrastructure/logging/config.py
def configure_logging(
config: LoggingConfig | None = None,
*,
level: str = "INFO",
json_file: Path | None = None,
) -> None:
"""Configure centralized logging with dual output.
Call once at application startup (e.g., in grpc/server.py main()).
Args:
config: Full configuration object, or use keyword args.
level: Log level (DEBUG, INFO, WARNING, ERROR).
json_file: Optional path for JSON log file.
"""
if config is None:
config = LoggingConfig(level=level, json_file=json_file)
shared_processors = _build_processor_chain(config)
# Configure structlog
structlog.configure(
processors=shared_processors + [
structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
],
wrapper_class=structlog.stdlib.BoundLogger,
logger_factory=structlog.stdlib.LoggerFactory(),
cache_logger_on_first_use=True,
)
# Configure stdlib logging handlers
_configure_handlers(config, shared_processors)
```
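A minimal sketch of what `_configure_handlers()` could look like, assuming structlog's `ProcessorFormatter` bridges the shared processors into stdlib handlers (the wiring details are illustrative, not the final implementation):
```python
import logging

import structlog


def _configure_handlers(config: LoggingConfig, shared_processors: list) -> None:
    """Attach a console handler and an optional JSON file handler to the root logger."""
    root = logging.getLogger()
    root.setLevel(config.level)
    root.handlers.clear()

    if config.enable_console:
        console_formatter = structlog.stdlib.ProcessorFormatter(
            processors=[
                structlog.stdlib.ProcessorFormatter.remove_processors_meta,
                structlog.dev.ConsoleRenderer(colors=config.console_colors),
            ],
            foreign_pre_chain=shared_processors,
        )
        console_handler = logging.StreamHandler()  # stderr by default
        console_handler.setFormatter(console_formatter)
        root.addHandler(console_handler)

    if config.json_file is not None:
        json_formatter = structlog.stdlib.ProcessorFormatter(
            processors=[
                structlog.stdlib.ProcessorFormatter.remove_processors_meta,
                structlog.processors.JSONRenderer(),
            ],
            foreign_pre_chain=shared_processors,
        )
        file_handler = logging.FileHandler(config.json_file)
        file_handler.setFormatter(json_formatter)
        root.addHandler(file_handler)
```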
### Usage After Migration
```python
# Before (current)
import logging
logger = logging.getLogger(__name__)
logger.info("Processing meeting %s", meeting_id)
# After (migrated)
import structlog
logger = structlog.get_logger()
logger.info("processing_meeting", meeting_id=meeting_id)
# Or with bound context
logger = structlog.get_logger().bind(meeting_id=meeting_id)
logger.info("processing_started")
logger.info("processing_completed", segments=42)
```
---
## Migration Strategy
### Phase 1: Infrastructure (T1-T8)
1. Add `structlog>=24.0` to `pyproject.toml`
2. Create `infrastructure/logging/config.py` with `configure_logging()`
3. Create `infrastructure/logging/processors.py` with custom processors
4. Create `infrastructure/logging/handlers.py` for handler setup
5. Update entry points to call `configure_logging()`
### Phase 2: Automated Migration (T9)
Create AST-based migration script:
```python
# scripts/migrate_logging.py
"""Migrate stdlib logging to structlog.
Transforms:
import logging
logger = logging.getLogger(__name__)
logger.info("message %s", arg)
To:
import structlog
logger = structlog.get_logger()
logger.info("message", arg=arg)
"""
```
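A minimal sketch of the discovery/reporting half of such a script using the stdlib `ast` module; the actual rewrite step (and the edge cases it must handle) is omitted:
```python
"""Report stdlib logging call sites that need migration (sketch)."""
import ast
import sys
from pathlib import Path

LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def find_log_calls(path: Path) -> list[tuple[int, str]]:
    """Return (line, source) for logger.<level>(...) calls with positional args."""
    tree = ast.parse(path.read_text(), filename=str(path))
    hits: list[tuple[int, str]] = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Attribute):
            continue
        if node.func.attr in LOG_METHODS and len(node.args) > 1:
            hits.append((node.lineno, ast.unparse(node)))
    return hits


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        for lineno, source in find_log_calls(Path(arg)):
            print(f"{arg}:{lineno}: {source}")
```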
### Phase 3: Batch Migration (T10-T12)
| Batch | Files | Priority | Notes |
|-------|-------|----------|-------|
| **Batch 1** | `application/services/*` (12) | High | Core business logic |
| **Batch 2** | `infrastructure/*` (35) | Medium | Infrastructure adapters |
| **Batch 3** | `grpc/*` (24) | High | API layer, interceptors |
### Rollback Strategy
- Keep stdlib logging configured as structlog backend
- If issues arise, revert `configure_logging()` to stdlib-only mode
- Migration is reversible via git; no database changes
---
## Deliverables
### New Files
- [ ] `src/noteflow/infrastructure/logging/config.py`
- [ ] `src/noteflow/infrastructure/logging/processors.py`
- [ ] `src/noteflow/infrastructure/logging/handlers.py`
- [ ] `scripts/migrate_logging.py`
- [ ] `docs/guides/logging.md`
### Modified Files
- [ ] `pyproject.toml` — add structlog dependency
- [ ] `src/noteflow/infrastructure/logging/__init__.py` — export new API
- [ ] `src/noteflow/infrastructure/logging/log_buffer.py` — adapt for structlog
- [ ] `src/noteflow/grpc/server.py` — call `configure_logging()`
- [ ] `src/noteflow/cli/retention.py` — remove `basicConfig`, use structlog
- [ ] `src/noteflow/cli/models.py` — remove `basicConfig`, use structlog
- [ ] 71 files with `logging.getLogger()` — migrate to structlog
### Tests
- [ ] `tests/infrastructure/logging/test_config.py`
- [ ] `tests/infrastructure/logging/test_processors.py`
- [ ] `tests/infrastructure/logging/test_handlers.py`
- [ ] Update existing tests that assert on log output
---
## Test Strategy
### Unit Tests
```python
# tests/infrastructure/logging/test_processors.py
@pytest.fixture
def mock_context_vars(monkeypatch):
"""Set up context variables for testing."""
monkeypatch.setattr("noteflow.infrastructure.logging.structured.request_id_var",
ContextVar("request_id", default="test-req-123"))
# ...
def test_add_noteflow_context_injects_request_id(mock_context_vars):
"""Verify context vars are injected into event dict."""
event_dict = {"event": "test"}
result = add_noteflow_context(None, "info", event_dict)
assert result["request_id"] == "test-req-123"
def test_add_otel_trace_context_graceful_without_otel():
"""Verify processor works when OpenTelemetry not installed."""
event_dict = {"event": "test"}
result = add_otel_trace_context(None, "info", event_dict)
assert "trace_id" not in result # Graceful degradation
@pytest.mark.parametrize("level,expected", [
("DEBUG", True),
("INFO", True),
("WARNING", True),
("ERROR", True),
])
def test_configure_logging_accepts_all_levels(level, expected):
"""Verify all log levels are accepted."""
config = LoggingConfig(level=level)
configure_logging(config)
# Assert no exception raised
```
### Integration Tests
```python
# tests/infrastructure/logging/test_integration.py
@pytest.mark.integration
def test_dual_output_produces_both_formats(tmp_path, capsys):
"""Verify console and JSON outputs are produced simultaneously."""
json_file = tmp_path / "test.log"
configure_logging(LoggingConfig(
level="INFO",
json_file=json_file,
enable_console=True,
))
logger = structlog.get_logger("test")
logger.info("test_event", key="value")
# Verify console output (Rich formatted)
captured = capsys.readouterr()
assert "test_event" in captured.err
# Verify JSON file output
with open(json_file) as f:
log_line = json.loads(f.readline())
assert log_line["event"] == "test_event"
assert log_line["key"] == "value"
```
---
## Quality Gates
### Exit Criteria
- [ ] All 71 files migrated to structlog
- [ ] Dual output working (Rich console + JSON)
- [ ] Context variables auto-injected (request_id, user_id, workspace_id)
- [ ] OTEL trace/span IDs appear in logs when tracing enabled
- [ ] LogBuffer receives structured events for UI streaming
- [ ] No `logging.basicConfig()` calls remain
- [ ] All tests pass
- [ ] `ruff check` and `basedpyright` pass
- [ ] Documentation updated
### Performance Requirements
- Log emission overhead < 10μs per call
- No blocking I/O in hot paths (async file writes)
- Memory-bounded LogBuffer (existing 1000-entry limit)
---
## Dependencies
### New Dependencies
```toml
# pyproject.toml
[project]
dependencies = [
# ... existing ...
"structlog>=24.0",
]
```
### Compatibility Notes
- structlog 24.0+ required for `ProcessorFormatter` improvements
- Rich 14.2.0 already installed (compatible)
- OpenTelemetry integration optional (graceful degradation)
---
## Risks & Mitigations
| Risk | Impact | Mitigation |
|------|--------|------------|
| Migration breaks existing log parsing | Medium | Maintain JSON schema compatibility |
| Performance regression | Low | Benchmark before/after; structlog is fast |
| Rich console conflicts with existing CLI usage | Low | CLI modules already use Rich; test integration |
| OTEL context not propagating | Medium | Integration tests with mock tracer |
---
## References
- [structlog Documentation](https://www.structlog.org/)
- [structlog + Rich Integration](https://www.structlog.org/en/stable/console-output.html)
- [structlog + OTEL](https://www.structlog.org/en/stable/frameworks.html#opentelemetry)
- [Existing LogBuffer Implementation](../src/noteflow/infrastructure/logging/log_buffer.py)

View File

@@ -0,0 +1,744 @@
# Centralized Logging Migration - Agent-Driven Execution Plan
> **Sprint Reference**: `docs/sprints/sprint_logging_centralization.md`
> **Technical Debt**: `docs/triage.md`
> **Quality Gates**: `docs/sprints/QUALITY_STANDARDS.md`
---
## Executive Summary
This plan orchestrates the migration from stdlib `logging` to `structlog` using specialized agents for discovery, validation, and implementation. Each phase is designed for parallel execution where possible.
---
## ⚠️ CRITICAL: Quality Enforcement Rules
### ABSOLUTE PROHIBITIONS
1. **NEVER modify quality test thresholds** - If violations exceed thresholds, FIX THE CODE, not the tests
2. **NEVER claim errors are "preexisting"** without baseline proof - Capture baselines BEFORE any changes
3. **NEVER batch quality checks** - Run `make quality-py` after EVERY file modification
4. **NEVER skip quality gates** to "fix later" - All code must pass before proceeding
### MANDATORY QUALITY CHECKPOINTS
**After EVERY code change** (not "at the end of a phase"):
```bash
# Run after EACH file edit - no exceptions
make quality-py
```
This runs:
- `ruff check .` — Linting (ALL code, not just tests)
- `basedpyright` — Type checking (ALL code, not just tests)
- `pytest tests/quality/ -q` — Code smell detection (ALL code)
### BASELINE CAPTURE (Required Before Phase 2)
```bash
# Capture baseline BEFORE any migration work
make quality-py 2>&1 | tee /tmp/quality_baseline_$(date +%Y%m%d_%H%M%S).log
# Record current threshold violations
pytest tests/quality/ -v --tb=no | grep -E "(PASSED|FAILED|violations)" > /tmp/threshold_baseline.txt
```
Any NEW violations introduced during migration are the **agent's responsibility** and must be fixed immediately.
### CODE QUALITY STANDARDS (Apply to ALL Code)
These apply to **infrastructure modules, services, gRPC handlers, processors** — not just tests:
| Rule | Applies To | Enforcement |
|------|-----------|-------------|
| No `# type: ignore` | ALL Python code | `basedpyright` |
| No `Any` type | ALL Python code | `basedpyright` |
| Union syntax `str \| None` | ALL Python code | `ruff UP` |
| Module < 500 lines (soft) | ALL modules | `tests/quality/test_code_smells.py` |
| Module < 750 lines (hard) | ALL modules | `tests/quality/test_code_smells.py` |
| Function < 75 lines | ALL functions | `tests/quality/test_code_smells.py` |
| Complexity < 15 | ALL functions | `tests/quality/test_code_smells.py` |
| Parameters ≤ 7 | ALL functions | `tests/quality/test_code_smells.py` |
| No magic numbers > 100 | ALL code | `tests/quality/test_magic_values.py` |
| No hardcoded paths | ALL code | `tests/quality/test_magic_values.py` |
| No repeated string literals | ALL code | `tests/quality/test_magic_values.py` |
| No stale TODO/FIXME | ALL code | `tests/quality/test_stale_code.py` |
| No commented-out code | ALL code | `tests/quality/test_stale_code.py` |
---
## Phase 0: Pre-Flight Validation
### Agent Task: Dependency Audit
| Agent | Purpose | Deliverable |
|-------|---------|-------------|
| `Explore` | Verify structlog compatibility with existing Rich usage | Compatibility report |
| `Explore` | Locate all `logging.basicConfig()` calls | File list with line numbers |
| `Explore` | Find LogBuffer integration points | Integration map |
**Commands to validate:**
```bash
# Verify Rich is installed
python -c "import rich; print(rich.__version__)"
# Dry-run structlog install
uv pip install --dry-run "structlog>=24.0"
```
---
## Phase 1: Discovery & Target Mapping
### 1.1 Agent: Locate All Logging Usage
**Objective**: Build comprehensive map of all 71+ files using stdlib logging.
**Agent Type**: `Explore` (thorough mode)
**Queries**:
1. "Find all files with `import logging` in src/noteflow/"
2. "Find all `logging.getLogger(__name__)` patterns"
3. "Find all `logger.info/debug/warning/error/exception` calls with their argument patterns"
4. "Identify %-style formatting vs f-string usage in log calls"
**Expected Output Structure**:
```yaml
discovery:
files_with_logging: 71
patterns:
module_logger: 68 # logger = logging.getLogger(__name__)
basic_config: 2 # logging.basicConfig()
percent_style: 45 # logger.info("msg %s", arg)
fstring_style: 23 # logger.info(f"msg {arg}")
exception_calls: 12 # logger.exception()
by_layer:
application: 12
infrastructure: 35
grpc: 24
```
### 1.2 Agent: Map Critical Logging Gaps (from triage.md)
**Agent Type**: `Explore` (thorough mode)
**Objective**: Validate each issue in triage.md still exists and capture exact locations.
**Target Categories**:
| Category | File Pattern | Agent Query |
|----------|--------------|-------------|
| Network/External | `*_provider.py`, `*_adapter.py` | "Find async HTTP calls without timing logs" |
| Blocking Ops | `*_engine.py`, `*_service.py` | "Find `run_in_executor` calls without duration logging" |
| Silent Failures | `repositories/*.py` | "Find try/except blocks that return None without logging" |
| State Transitions | `_mixins/*.py` | "Find state assignments without transition logs" |
| DB Operations | `repositories/*.py`, `unit_of_work.py` | "Find commit/rollback without logging" |
### 1.3 Agent: Context Variable Analysis
**Agent Type**: `feature-dev:code-explorer`
**Objective**: Trace `get_logging_context()` usage for processor design.
**Tasks**:
1. Find all `request_id_var`, `user_id_var`, `workspace_id_var` usages
2. Map where context is SET vs where it's READ
3. Identify any gaps in context propagation
---
## Phase 2: Infrastructure Implementation
### 2.1 Create Core Configuration Module
**File**: `src/noteflow/infrastructure/logging/config.py`
**Agent Type**: `feature-dev:code-architect`
**Design Constraints** (from QUALITY_STANDARDS.md):
- No `Any` types
- No `# type: ignore` without justification
- All public functions must have return type annotations
- Docstrings written imperatively
**Implementation Spec**:
```python
"""Centralized logging configuration with dual output.
Configures structlog with Rich console + JSON file output.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING
import structlog
if TYPE_CHECKING:
from collections.abc import Sequence
from structlog.typing import Processor
@dataclass(frozen=True, slots=True)
class LoggingConfig:
"""Configuration for centralized logging."""
level: str = "INFO"
json_file: Path | None = None
enable_console: bool = True
enable_json_console: bool = False
enable_log_buffer: bool = True
enable_otel_context: bool = True
enable_noteflow_context: bool = True
console_colors: bool = True
def configure_logging(
config: LoggingConfig | None = None,
*,
level: str = "INFO",
json_file: Path | None = None,
) -> None:
"""Configure centralized logging with dual output.
Call once at application startup.
Args:
config: Full configuration object, or use keyword args.
level: Log level (DEBUG, INFO, WARNING, ERROR).
json_file: Optional path for JSON log file.
"""
...
```
### 2.2 Create Custom Processors Module
**File**: `src/noteflow/infrastructure/logging/processors.py`
**Agent Type**: `feature-dev:code-architect`
**Processors to Implement**:
| Processor | Source | Purpose |
|-----------|--------|---------|
| `add_noteflow_context` | New | Inject request_id, user_id, workspace_id |
| `add_otel_trace_context` | Adapt from `log_buffer.py:132-156` | Inject trace_id, span_id |
| `log_buffer_processor` | New | Feed events to existing LogBuffer |
**Quality Requirements**:
- Each processor must be a pure function
- Must handle missing context gracefully (no exceptions)
- Must include type annotations for all parameters
### 2.3 Create Handlers Module
**File**: `src/noteflow/infrastructure/logging/handlers.py`
**Agent Type**: `feature-dev:code-architect`
**Responsibilities**:
- Configure Rich ConsoleRenderer for TTY
- Configure JSONRenderer for file/OTEL
- Wire both to stdlib logging handlers
### 2.4 Adapt LogBuffer
**File**: `src/noteflow/infrastructure/logging/log_buffer.py`
**Agent Type**: `feature-dev:code-reviewer` (review current implementation first)
**Changes Required**:
1. Create structlog processor that feeds LogBuffer
2. Convert `LogEntry` creation to use structlog event_dict
3. Preserve existing `_get_current_trace_context()` logic
---
## Phase 3: Entry Point Integration
### 3.1 Agent: Locate Entry Points
**Agent Type**: `Explore`
**Query**: "Find all main() functions and startup initialization in src/noteflow/"
**Expected Entry Points**:
- `src/noteflow/grpc/server.py` - Main server
- `src/noteflow/cli/retention.py` - CLI tool
- `src/noteflow/cli/models.py` - CLI tool
### 3.2 Integration Tasks
| File | Change | Validation |
|------|--------|------------|
| `grpc/server.py` | Add `configure_logging()` call before server start | Server logs in both formats |
| `cli/retention.py` | Remove `basicConfig()`, add `configure_logging()` | CLI logs correctly |
| `cli/models.py` | Remove `basicConfig()`, add `configure_logging()` | CLI logs correctly |
---
## Phase 4: Automated Migration Script
### 4.1 Migration Script Design
**File**: `scripts/migrate_logging.py`
**Agent Type**: `feature-dev:code-architect`
**Transformations**:
```python
# Transform 1: Import statement
# Before: import logging
# After: import structlog
# Transform 2: Logger creation
# Before: logger = logging.getLogger(__name__)
# After: logger = structlog.get_logger()
# Transform 3: %-style formatting
# Before: logger.info("Processing %s for %s", item_id, user_id)
# After: logger.info("processing", item_id=item_id, user_id=user_id)
# Transform 4: Exception logging
# Before: logger.exception("Failed to process")
# After: logger.exception("processing_failed")
```
**Quality Requirements**:
- Must preserve semantic meaning
- Must handle all patterns found in Phase 1
- Must be idempotent (safe to run multiple times)
- Must generate report of changes
### 4.2 Validation Agent
**Agent Type**: `agent-code-quality`
**Post-Migration Checks**:
1. Run `ruff check` on migrated files
2. Run `basedpyright` on migrated files
3. Verify no `import logging` remains (except stdlib bridge)
4. Verify all `logger.` calls use keyword arguments
---
## Phase 5: Batch Migration Execution
### 5.1 Batch 1: Application Services (12 files)
**Agent Type**: `agent-python-executor`
**Files** (to be confirmed by discovery agent):
```
src/noteflow/application/services/
├── meeting_service.py
├── recovery_service.py
├── export_service.py
├── summarization_service.py
├── trigger_service.py
├── webhook_service.py
├── calendar_service.py
├── retention_service.py
├── ner_service.py
└── ...
```
**⚠️ Execution Strategy (PER-FILE, NOT PER-BATCH)**:
```bash
# For EACH file in the batch:
# 1. Migrate ONE file
python scripts/migrate_logging.py src/noteflow/application/services/meeting_service.py
# 2. IMMEDIATELY run quality check
make quality-py
# 3. If NEW violations introduced:
# - FIX THEM NOW
# - Re-run make quality-py
# - Do NOT proceed until clean
# 4. Only then migrate next file
python scripts/migrate_logging.py src/noteflow/application/services/recovery_service.py
make quality-py
# ... repeat for each file
# 5. After ALL files in batch pass individually:
pytest tests/application/ -v
```
**PROHIBITED**: Running migration script on entire batch then checking quality once
### 5.2 Batch 2: Infrastructure (35 files)
**Agent Type**: `agent-python-executor`
**Subdirectories**:
- `audio/` - capture, writer, playback
- `asr/` - engine, segmenter
- `diarization/` - engine, session
- `summarization/` - providers, parsing
- `persistence/` - database, repositories, unit_of_work
- `triggers/` - calendar, audio, app
- `webhooks/` - executor
- `calendar/` - adapters, oauth
- `ner/` - engine
- `export/` - markdown, html, pdf
- `security/` - keystore
- `observability/` - otel
**⚠️ Same per-file workflow as Batch 1:**
```bash
# For EACH of the 35 files:
# 1. Migrate ONE file
# 2. make quality-py
# 3. Fix any NEW violations
# 4. Proceed only when clean
```
### 5.3 Batch 3: gRPC Layer (24 files)
**Agent Type**: `agent-python-executor`
**Components**:
- `server.py`, `service.py`, `client.py`
- `_mixins/` - all mixins
- `interceptors/` - identity interceptor
- `_client_mixins/` - client mixins
**⚠️ Same per-file workflow as Batch 1:**
```bash
# For EACH of the 24 files:
# 1. Migrate ONE file
# 2. make quality-py
# 3. Fix any NEW violations
# 4. Proceed only when clean
```
### Migration Abort Conditions
**STOP IMMEDIATELY if any of these occur:**
1. **Threshold modification detected** - Any change to `tests/quality/*.py` threshold values
2. **Cumulative violations > 5** - Too many unfixed violations accumulating
3. **Type errors without fix** - `basedpyright` errors not immediately addressed
4. **Baseline not captured** - Starting migration without `/tmp/quality_baseline.log`
**Recovery**: Revert all changes since last known-good state, re-capture baseline, restart
---
## Phase 6: Test Updates & Validation
### 6.1 Agent: Update Test Fixtures
**Agent Type**: `agent-testing-architect`
**Tasks**:
1. Create `tests/infrastructure/logging/conftest.py` with shared fixtures
2. Update any tests asserting on log output
3. Add integration tests for dual output
**Fixture Requirements** (per QUALITY_STANDARDS.md):
```python
@pytest.fixture
def logging_config() -> LoggingConfig:
"""Provide test logging configuration."""
return LoggingConfig(
level="DEBUG",
enable_console=False, # Suppress console in tests
enable_log_buffer=True,
)
```
### 6.2 Agent: Write Unit Tests
**Agent Type**: `agent-testing-architect`
**Test Files to Create**:
- `tests/infrastructure/logging/test_config.py`
- `tests/infrastructure/logging/test_processors.py`
- `tests/infrastructure/logging/test_handlers.py`
**Test Requirements** (per QUALITY_STANDARDS.md):
- No loops in tests
- No conditionals in tests
- Use `pytest.mark.parametrize` for multiple cases
- Use `pytest.param` with descriptive IDs
- All fixtures must have type hints
### 6.3 Quality Gate Execution
**Commands**:
```bash
# Run quality checks
pytest tests/quality/ -v
# Run new logging tests
pytest tests/infrastructure/logging/ -v
# Run full test suite
pytest -m "not integration" -v
# Type checking
basedpyright src/noteflow/infrastructure/logging/
```
---
## Phase 7: Documentation & Cleanup
### 7.1 Documentation Updates
| File | Change |
|------|--------|
| `CLAUDE.md` | Add logging configuration section |
| `docs/guides/logging.md` | Create usage guide (NEW) |
| `docs/triage.md` | Mark resolved issues |
### 7.2 Cleanup Tasks
**Agent Type**: `Explore`
**Verification Queries**:
1. "Confirm no `logging.basicConfig()` calls remain"
2. "Confirm no `logging.getLogger(__name__)` patterns remain"
3. "Confirm all files use `structlog.get_logger()`"
---
## Execution Order & Dependencies
```mermaid
graph TD
P0[Phase 0: Pre-Flight] --> P1[Phase 1: Discovery]
P1 --> P2[Phase 2: Infrastructure]
P2 --> P3[Phase 3: Entry Points]
P3 --> P4[Phase 4: Migration Script]
P4 --> P5A[Phase 5.1: App Services]
P5A --> P5B[Phase 5.2: Infrastructure]
P5B --> P5C[Phase 5.3: gRPC]
P5C --> P6[Phase 6: Tests]
P6 --> P7[Phase 7: Docs]
```
**Parallelization Opportunities**:
- Phase 2 modules (config.py, processors.py, handlers.py) can be developed in parallel
- Batch migrations can be parallelized across different directories
- Test writing can happen in parallel with Phase 5 batches
---
## Agent Orchestration Protocol
### MANDATORY: Quality Gate After Every Edit
**Every agent MUST run this after each file modification:**
```bash
# IMMEDIATE - after every file edit
make quality-py
# If ANY failure:
# 1. FIX THE VIOLATION IMMEDIATELY
# 2. Do NOT proceed to next file
# 3. Do NOT claim "preexisting" without baseline proof
```
### Agent Workflow Pattern
```
┌─────────────────────────────────────────────────────────────┐
│ FOR EACH FILE MODIFICATION: │
│ │
│ 1. Read current file state │
│ 2. Make edit │
│ 3. Run: make quality-py │
│ 4. IF FAIL → Fix immediately, go to step 3 │
│ 5. IF PASS → Proceed to next edit │
│ │
│ NEVER: Skip step 3-4 │
│ NEVER: Batch multiple edits before quality check │
│ NEVER: Change threshold values in tests/quality/ │
└─────────────────────────────────────────────────────────────┘
```
### Discovery Phase
```bash
# Capture baseline FIRST
make quality-py 2>&1 | tee /tmp/quality_baseline.log
# Launch exploration agents in parallel
# Agent: Explore (thorough)
# Queries:
# - "Find all files with 'import logging' in src/noteflow/"
# - "Map logging patterns by file type"
# - "Find silent error handlers returning None"
```
### Implementation Phase (Per-File Quality Gates)
```bash
# For EACH new file created:
# Step 1: Create file
# Step 2: IMMEDIATELY run quality check
make quality-py
# Step 3: If violations, fix before creating next file
# Step 4: Only proceed when clean
# Example sequence for config.py:
# - Write config.py
# - make quality-py ← MUST PASS
# - Write processors.py
# - make quality-py ← MUST PASS
# - Write handlers.py
# - make quality-py ← MUST PASS
```
### Migration Phase (Per-File Quality Gates)
```bash
# For EACH migrated file:
# Step 1: Migrate single file
# Step 2: IMMEDIATELY run quality check
make quality-py
# Step 3: If new violations introduced, fix before next file
# Compare against baseline to identify NEW vs preexisting
# Example: Migrating meeting_service.py
# - Edit meeting_service.py (logging → structlog)
# - make quality-py
# - If new violations: FIX THEM
# - Only then proceed to next service file
```
### Continuous Validation Commands
```bash
# Run continuously during development
watch -n 30 'make quality-py'
# Or after each save (if using editor hooks)
# VSCode: tasks.json with "runOn": "save"
```
---
## Risk Mitigation
| Risk | Mitigation | Agent Responsibility |
|------|------------|---------------------|
| Migration breaks existing log parsing | Maintain JSON schema compatibility | `agent-code-quality` |
| Rich console conflicts | Test CLI integration early | `Explore` |
| OTEL context not propagating | Integration tests with mock tracer | `agent-testing-architect` |
| Performance regression | Benchmark before/after | `agent-feasibility` |
---
## Success Criteria
- [ ] All 71 files migrated to structlog
- [ ] Zero `logging.basicConfig()` calls remain
- [ ] Zero `logging.getLogger(__name__)` patterns remain
- [ ] Dual output working (Rich console + JSON)
- [ ] Context variables auto-injected
- [ ] OTEL trace/span IDs appear when tracing enabled
- [ ] LogBuffer receives structured events
- [ ] All quality checks pass (`pytest tests/quality/`)
- [ ] All type checks pass (`basedpyright`)
- [ ] Documentation updated
---
## Appendix A: Quality Compliance Checklist
Per `docs/sprints/QUALITY_STANDARDS.md`:
### 🚫 PROHIBITED ACTIONS (Violation = Immediate Rollback)
- [ ] **NEVER** modify threshold values in `tests/quality/*.py`
- [ ] **NEVER** add `# type: ignore` without explicit user approval
- [ ] **NEVER** use `Any` type
- [ ] **NEVER** skip `make quality-py` after a file edit
- [ ] **NEVER** blame "preexisting issues" without baseline comparison
### ALL CODE Requirements (Not Just Tests)
These apply to `config.py`, `processors.py`, `handlers.py`, AND all migrated files:
| Requirement | Check Command | Applies To |
|-------------|---------------|------------|
| No `# type: ignore` | `basedpyright` | ALL `.py` files |
| No `Any` types | `basedpyright` | ALL `.py` files |
| Union syntax `X \| None` | `ruff check` | ALL `.py` files |
| Module < 500 lines | `pytest tests/quality/` | ALL modules |
| Function < 75 lines | `pytest tests/quality/` | ALL functions |
| Complexity < 15 | `pytest tests/quality/` | ALL functions |
| Parameters ≤ 7 | `pytest tests/quality/` | ALL functions |
| No magic numbers | `pytest tests/quality/` | ALL code |
| No hardcoded paths | `pytest tests/quality/` | ALL code |
| Docstrings imperative | Manual review | ALL public APIs |
### Test-Specific Requirements
These apply ONLY to test files in `tests/`:
- [ ] No loops around assertions
- [ ] No conditionals around assertions
- [ ] `pytest.mark.parametrize` for multiple cases
- [ ] `pytest.raises` with `match=` parameter
- [ ] All fixtures have type hints
- [ ] Fixtures in conftest.py (not duplicated)
### Per-Edit Verification Workflow
```bash
# After EVERY edit (not batched):
make quality-py
# Expected output for clean code:
# === Ruff (Python Lint) ===
# All checks passed!
# === Basedpyright ===
# 0 errors, 0 warnings, 0 informations
# === Python Test Quality ===
# XX passed in X.XXs
# If ANY failure: FIX IMMEDIATELY before next edit
```
---
## Appendix B: Threshold Values (READ-ONLY Reference)
**⚠️ These values are READ-ONLY. Agents MUST NOT modify them.**
From `tests/quality/test_code_smells.py`:
```python
# DO NOT CHANGE THESE VALUES
MODULE_SOFT_LIMIT = 500
MODULE_HARD_LIMIT = 750
FUNCTION_LINE_LIMIT = 75
COMPLEXITY_LIMIT = 15
PARAMETER_LIMIT = 7
```
From `tests/quality/test_magic_values.py`:
```python
# DO NOT CHANGE THESE VALUES
MAX_MAGIC_NUMBERS = 10
MAX_REPEATED_STRINGS = 30
MAX_HARDCODED_PATHS = 0
```
If your code exceeds these limits, **refactor the code**, not the thresholds.

View File

@@ -0,0 +1,904 @@
# Sprint: Quality Suite Hardening
## Overview
**Goal**: Transform the quality test suite from threshold-based enforcement to baseline-based enforcement, fix detection holes, and add self-tests to prevent regression.
**Priority**: High - Quality gates are the primary defense against technical debt creep
**Estimated Effort**: Medium (2-3 days)
## Problem Statement
The current quality suite has several weaknesses that make it "gameable" or prone to silent failures:
1. **Threshold-Based Enforcement**: Using `max_allowed = N` caps that drift over time
2. **Silent Parse Failures**: `except SyntaxError: continue` hides unparseable files
3. **Detection Holes**: Some rules are compiled but not applied (skipif), or have logic bugs (hardcoded paths)
4. **Allowlist Maintenance Sink**: Magic values allowlists will grow unbounded
5. **Inconsistent File Discovery**: Multiple `find_python_files()` implementations with different excludes
6. **No Self-Tests**: Quality detectors can silently degrade
## Proposed Architecture
### 1. Baseline-Based Enforcement System
Replace all `assert len(violations) <= N` with baseline comparison:
```
tests/quality/
├── __init__.py
├── _baseline.py # NEW: Baseline loading and comparison
├── baselines.json # NEW: Frozen violation snapshots
├── test_baseline_self.py # NEW: Self-tests for baseline system
├── test_code_smells.py # MODIFIED: Use baseline enforcement
├── test_stale_code.py # MODIFIED: Use baseline enforcement
├── test_test_smells.py # MODIFIED: Use baseline enforcement
├── test_magic_values.py # MODIFIED: Use baseline enforcement
├── test_duplicate_code.py # MODIFIED: Use baseline enforcement
├── test_unnecessary_wrappers.py # MODIFIED: Use baseline enforcement
├── test_decentralized_helpers.py # MODIFIED: Use baseline enforcement
└── _helpers.py # NEW: Centralized file discovery
```
### 2. Stable Violation IDs
Violation IDs must be stable across refactors (avoid line-number-only IDs):
| Rule Category | ID Format |
|---------------|-----------|
| Function-level | `rule|relative_path|function_name` |
| Class-level | `rule|relative_path|class_name` |
| Line-level | `rule|relative_path|content_hash` |
| Wrapper | `thin_wrapper|relative_path|function_name|wrapped_call` |
### 3. Baseline JSON Structure
```json
{
"schema_version": 1,
"generated_at": "2025-12-31T00:00:00Z",
"rules": {
"high_complexity": [
"src/noteflow/infrastructure/summarization/_parsing.py|parse_llm_response",
"src/noteflow/grpc/_mixins/streaming/_mixin.py|StreamTranscription"
],
"thin_wrapper": [
"src/noteflow/config/settings.py|get_settings|_load_settings"
],
"stale_todo": [
"src/noteflow/grpc/service.py|hash:abc123"
]
}
}
```
## Implementation Plan
### Phase 1: Foundation (Day 1)
#### Task 1.1: Create Baseline Infrastructure
**File**: `tests/quality/_baseline.py`
```python
"""Baseline-based quality enforcement infrastructure.
This module provides the foundation for "no new debt" quality gates.
Instead of allowing N violations, we compare against a frozen baseline
of existing violations. Any new violation fails immediately.
"""
from __future__ import annotations
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
BASELINE_PATH = Path(__file__).parent / "baselines.json"
SCHEMA_VERSION = 1
@dataclass(frozen=True)
class Violation:
"""Represents a quality rule violation with stable identity."""
rule: str
relative_path: str
identifier: str # function/class name or content hash
detail: str = "" # optional detail (wrapped call, metric value, etc.)
@property
def stable_id(self) -> str:
"""Generate stable ID for baseline comparison."""
parts = [self.rule, self.relative_path, self.identifier]
if self.detail:
parts.append(self.detail)
return "|".join(parts)
def __str__(self) -> str:
"""Human-readable representation."""
return f"{self.relative_path}:{self.identifier} [{self.rule}]"
@dataclass
class BaselineResult:
"""Result of baseline comparison."""
new_violations: list[Violation]
fixed_violations: list[str] # IDs that were in baseline but not found
current_count: int
baseline_count: int
@property
def passed(self) -> bool:
"""True if no new violations introduced."""
return len(self.new_violations) == 0
def load_baseline() -> dict[str, set[str]]:
"""Load baseline violations from JSON file."""
if not BASELINE_PATH.exists():
return {}
data = json.loads(BASELINE_PATH.read_text(encoding="utf-8"))
# Version check
if data.get("schema_version", 0) != SCHEMA_VERSION:
raise ValueError(
f"Baseline schema version mismatch: "
f"expected {SCHEMA_VERSION}, got {data.get('schema_version')}"
)
return {rule: set(ids) for rule, ids in data.get("rules", {}).items()}
def save_baseline(violations_by_rule: dict[str, list[Violation]]) -> None:
"""Save current violations as new baseline.
This should only be called manually when intentionally updating the baseline.
"""
data = {
"schema_version": SCHEMA_VERSION,
"generated_at": datetime.now(timezone.utc).isoformat(),
"rules": {
rule: sorted(v.stable_id for v in violations)
for rule, violations in violations_by_rule.items()
}
}
BASELINE_PATH.write_text(
json.dumps(data, indent=2, sort_keys=True) + "\n",
encoding="utf-8"
)
def assert_no_new_violations(
rule: str,
current_violations: list[Violation],
*,
max_new_allowed: int = 0,
) -> BaselineResult:
"""Assert no new violations beyond the frozen baseline.
Args:
rule: The rule name (e.g., "high_complexity", "thin_wrapper")
current_violations: List of violations found in current scan
max_new_allowed: Allow up to N new violations (default 0)
Returns:
BaselineResult with comparison details
Raises:
AssertionError: If new violations exceed max_new_allowed
"""
baseline = load_baseline()
allowed_ids = baseline.get(rule, set())
current_ids = {v.stable_id for v in current_violations}
new_ids = current_ids - allowed_ids
fixed_ids = allowed_ids - current_ids
new_violations = [v for v in current_violations if v.stable_id in new_ids]
result = BaselineResult(
new_violations=sorted(new_violations, key=lambda v: v.stable_id),
fixed_violations=sorted(fixed_ids),
current_count=len(current_violations),
baseline_count=len(allowed_ids),
)
if len(new_violations) > max_new_allowed:
message_parts = [
f"[{rule}] {len(new_violations)} NEW violations introduced "
f"(baseline: {len(allowed_ids)}, current: {len(current_violations)}):",
]
for v in new_violations[:20]:
message_parts.append(f" + {v}")
if fixed_ids:
message_parts.append(f"\nFixed {len(fixed_ids)} violations (can update baseline):")
for fid in list(fixed_ids)[:5]:
message_parts.append(f" - {fid}")
raise AssertionError("\n".join(message_parts))
return result
def content_hash(content: str, length: int = 8) -> str:
"""Generate short hash of content for stable line-level IDs."""
return hashlib.sha256(content.encode()).hexdigest()[:length]
```
#### Task 1.2: Create Centralized File Discovery
**File**: `tests/quality/_helpers.py`
```python
"""Centralized helpers for quality tests.
All quality tests should use these helpers to ensure consistent
file discovery and avoid gaps in coverage.
"""
from __future__ import annotations
import ast
from pathlib import Path
# Root paths
PROJECT_ROOT = Path(__file__).parent.parent.parent
SRC_ROOT = PROJECT_ROOT / "src" / "noteflow"
TESTS_ROOT = PROJECT_ROOT / "tests"
# Excluded patterns (generated code)
GENERATED_PATTERNS = {"*_pb2.py", "*_pb2_grpc.py", "*_pb2.pyi"}
# Excluded directories
EXCLUDED_DIRS = {".venv", "__pycache__", "node_modules", ".git"}
def find_source_files(
root: Path = SRC_ROOT,
*,
include_tests: bool = False,
include_conftest: bool = False,
include_migrations: bool = False,
include_quality: bool = False,
) -> list[Path]:
"""Find Python source files with consistent exclusions.
Args:
root: Root directory to search
include_tests: Include test files (test_*.py)
include_conftest: Include conftest.py files
include_migrations: Include Alembic migration files
include_quality: Include tests/quality/ files
Returns:
List of Path objects for matching files
"""
files: list[Path] = []
for py_file in root.rglob("*.py"):
# Skip excluded directories
if any(d in py_file.parts for d in EXCLUDED_DIRS):
continue
# Skip generated files
if any(py_file.match(p) for p in GENERATED_PATTERNS):
continue
# Skip conftest unless included
if not include_conftest and py_file.name == "conftest.py":
continue
# Skip migrations unless included
if not include_migrations and "migrations" in py_file.parts:
continue
# Skip tests unless included
if not include_tests and "tests" in py_file.parts:
continue
# Skip quality tests unless included (prevents recursion)
if not include_quality and "quality" in py_file.parts:
continue
files.append(py_file)
return sorted(files)
def find_test_files(
root: Path = TESTS_ROOT,
*,
include_quality: bool = False,
) -> list[Path]:
"""Find test files with consistent exclusions.
Args:
root: Root directory to search
include_quality: Include tests/quality/ files
Returns:
List of test file paths
"""
files: list[Path] = []
for py_file in root.rglob("test_*.py"):
# Skip excluded directories
if any(d in py_file.parts for d in EXCLUDED_DIRS):
continue
# Skip quality tests unless included
if not include_quality and "quality" in py_file.parts:
continue
files.append(py_file)
return sorted(files)
def parse_file_safe(file_path: Path) -> tuple[ast.AST | None, str | None]:
"""Parse a Python file, returning AST or error message.
Unlike bare `ast.parse`, this never silently fails.
Returns:
(ast, None) on success
(None, error_message) on failure
"""
try:
source = file_path.read_text(encoding="utf-8")
tree = ast.parse(source)
return tree, None
except SyntaxError as e:
return None, f"{file_path}: SyntaxError at line {e.lineno}: {e.msg}"
except Exception as e:
return None, f"{file_path}: {type(e).__name__}: {e}"
def relative_path(file_path: Path) -> str:
    """Get path relative to the project root for stable IDs."""
    try:
        return str(file_path.relative_to(PROJECT_ROOT))
    except ValueError:
        return str(file_path)
```
#### Task 1.3: Create Self-Tests for Quality Infrastructure
**File**: `tests/quality/test_baseline_self.py`
```python
"""Self-tests for quality infrastructure.
These tests ensure the quality detectors themselves work correctly.
This prevents the quality suite from silently degrading.
"""
from __future__ import annotations
import ast
from pathlib import Path
from tests.quality._baseline import Violation, content_hash
from tests.quality._helpers import parse_file_safe
class TestParseFileSafe:
"""Tests for safe file parsing."""
def test_valid_python_parses(self, tmp_path: Path) -> None:
"""Valid Python code should parse successfully."""
file = tmp_path / "valid.py"
file.write_text("def foo(): pass\n")
tree, error = parse_file_safe(file)
assert tree is not None
assert error is None
def test_syntax_error_returns_message(self, tmp_path: Path) -> None:
"""Syntax errors should return descriptive message, not raise."""
file = tmp_path / "invalid.py"
file.write_text("def foo(\n") # Incomplete
tree, error = parse_file_safe(file)
assert tree is None
assert error is not None
assert "SyntaxError" in error
class TestViolation:
"""Tests for Violation dataclass."""
def test_stable_id_format(self) -> None:
"""Stable ID should include all components."""
v = Violation(
rule="thin_wrapper",
relative_path="src/foo.py",
identifier="my_func",
detail="wrapped_call",
)
assert v.stable_id == "thin_wrapper|src/foo.py|my_func|wrapped_call"
def test_stable_id_without_detail(self) -> None:
"""Stable ID should work without detail."""
v = Violation(
rule="high_complexity",
relative_path="src/bar.py",
identifier="complex_func",
)
assert v.stable_id == "high_complexity|src/bar.py|complex_func"
class TestContentHash:
"""Tests for content hashing."""
def test_same_content_same_hash(self) -> None:
"""Same content should produce same hash."""
content = "# TODO: fix this"
assert content_hash(content) == content_hash(content)
def test_different_content_different_hash(self) -> None:
"""Different content should produce different hash."""
assert content_hash("foo") != content_hash("bar")
# =============================================================================
# Detector Self-Tests
# =============================================================================
class TestSkipifDetection:
"""Self-tests for skipif detection (prevents the hole we found)."""
def test_detects_skip_without_reason(self) -> None:
"""Should detect @pytest.mark.skip without reason."""
code = '''
@pytest.mark.skip
def test_something():
pass
'''
# This is what the detector should catch
import re
skip_pattern = re.compile(r"@pytest\.mark\.skip\s*(?:\(\s*\))?$", re.MULTILINE)
matches = skip_pattern.findall(code)
assert len(matches) == 1
def test_detects_skip_with_empty_parens(self) -> None:
"""Should detect @pytest.mark.skip() with empty parens."""
code = "@pytest.mark.skip()\ndef test_foo(): pass"
import re
skip_pattern = re.compile(r"@pytest\.mark\.skip\s*(?:\(\s*\))?$", re.MULTILINE)
assert skip_pattern.search(code) is not None
    def test_detects_skipif_without_reason(self) -> None:
        """Should detect @pytest.mark.skipif without a reason keyword."""
        import re
        # reason= lives inside the call parentheses, so the check is a negative
        # lookahead right after the opening paren.
        skipif_pattern = re.compile(
            r"@pytest\.mark\.skipif\s*\((?![^)]*\breason\s*=)[^)]*\)",
            re.MULTILINE,
        )
        # The current code compiles a skipif pattern but never uses it - this is the bug!
        # This test validates what SHOULD happen.
        code = '@pytest.mark.skipif(sys.platform == "win32")\ndef test_foo(): pass'
        assert skipif_pattern.search(code) is not None
        # When reason= is present, the decorator must not be flagged.
        code_with_reason = '@pytest.mark.skipif(sys.platform == "win32", reason="Windows")'
        assert skipif_pattern.search(code_with_reason) is None
class TestHardcodedPathDetection:
"""Self-tests for hardcoded path detection (fixes the split bug)."""
def test_detects_home_path(self) -> None:
"""Should detect /home/user paths."""
import re
pattern = r'["\']\/(?:home|usr|var|etc|opt|tmp)\/\w+'
line = 'PATH = "/home/user/data"'
assert re.search(pattern, line) is not None
    def test_ignores_path_in_comment(self) -> None:
        """Should ignore paths that appear after a # comment."""
        import re
        pattern = r'["\']\/(?:home|usr|var|etc|opt|tmp)\/\w+'
        line = '# Example: PATH = "/home/user/data"'
        # The regex still matches inside the comment...
        match = re.search(pattern, line)
        assert match is not None
        # ...so the detector must compare positions; the bug is that
        # line.split(pattern) treats the regex as a literal string.
        comment_pos = line.find("#")
        assert comment_pos != -1
        assert comment_pos < match.start(), "Path after # must be ignored"
def test_detects_path_with_inline_comment_after(self) -> None:
"""Path before inline comment should still be detected."""
import re
pattern = r'["\']\/(?:home|usr|var|etc|opt|tmp)\/\w+'
line = 'PATH = "/home/user/thing" # legit comment'
match = re.search(pattern, line)
assert match is not None
# Comment is AFTER the match, so this should be flagged
comment_pos = line.find("#")
assert comment_pos > match.start(), "Comment should be after the path"
class TestThinWrapperDetection:
"""Self-tests for thin wrapper detection."""
def test_detects_simple_passthrough(self) -> None:
"""Should detect simple return-only wrappers."""
code = '''
def wrapper():
return wrapped()
'''
tree = ast.parse(code)
func = tree.body[0]
assert isinstance(func, ast.FunctionDef)
# The body has one statement (Return with Call)
assert len(func.body) == 1
stmt = func.body[0]
assert isinstance(stmt, ast.Return)
assert isinstance(stmt.value, ast.Call)
def test_detects_await_passthrough(self) -> None:
"""Should detect async return await wrappers."""
code = '''
async def wrapper():
return await wrapped()
'''
tree = ast.parse(code)
func = tree.body[0]
assert isinstance(func, ast.AsyncFunctionDef)
stmt = func.body[0]
assert isinstance(stmt, ast.Return)
# The value is Await wrapping Call
assert isinstance(stmt.value, ast.Await)
assert isinstance(stmt.value.value, ast.Call)
def test_ignores_wrapper_with_logic(self) -> None:
"""Should ignore wrappers that add logic."""
code = '''
def wrapper(x):
if x:
return wrapped()
return None
'''
tree = ast.parse(code)
        func = tree.body[0]
        assert isinstance(func, ast.FunctionDef)
        # Multiple statements = not a thin wrapper
        assert len(func.body) > 1
```
### Phase 2: Fix Detection Holes (Day 1-2)
#### Task 2.1: Fix skipif Detection Bug
**File**: `tests/quality/test_test_smells.py`
The current code compiles the skipif pattern but never uses it:
```python
# CURRENT (broken):
skip_pattern = re.compile(r"@pytest\.mark\.skip\s*(?:\(\s*\))?$", re.MULTILINE)
re.compile( # <-- compiled but NOT assigned!
r"@pytest\.mark\.skipif\s*\([^)]*\)\s*$", re.MULTILINE
)
# FIXED:
skip_pattern = re.compile(r"@pytest\.mark\.skip\s*(?:\(\s*\))?$", re.MULTILINE)
skipif_pattern = re.compile(
    # reason= sits inside the call parentheses, so check there, not after the closing paren
    r"@pytest\.mark\.skipif\s*\((?![^)]*\breason\s*=)[^)]*\)",
    re.MULTILINE,
)
```
Then use both patterns in the detection loop.
#### Task 2.2: Fix Hardcoded Path Detection Bug
**File**: `tests/quality/test_magic_values.py`
The current code has a logic bug with `line.split(pattern)`:
```python
# CURRENT (broken):
if re.search(pattern, line):
if "test" not in line.lower() and "#" not in line.split(pattern)[0]:
# line.split(pattern) splits on LITERAL string, not regex!
violations.append(...)
# FIXED:
match = re.search(pattern, line)
if match:
# Check if # appears BEFORE the match
comment_pos = line.find("#")
if comment_pos != -1 and comment_pos < match.start():
continue # Path is in comment, skip
if "test" not in line.lower():
violations.append(...)
```
#### Task 2.3: Fix Silent SyntaxError Handling
Replace all `except SyntaxError: continue` with error collection:
```python
# CURRENT (silent failure):
for py_file in find_python_files(src_root):
source = py_file.read_text(encoding="utf-8")
try:
tree = ast.parse(source)
except SyntaxError:
continue # <-- Silent skip!
# FIXED (fail loudly):
from tests.quality._helpers import parse_file_safe
parse_errors: list[str] = []
for py_file in find_python_files(src_root):
tree, error = parse_file_safe(py_file)
if error:
parse_errors.append(error)
continue
# ... process tree ...
# At the end of the test:
assert not parse_errors, (
f"Quality scan hit {len(parse_errors)} parse error(s):\n"
+ "\n".join(parse_errors)
)
```
### Phase 3: Migrate Tests to Baseline (Day 2)
#### Task 3.1: Migrate High-Impact Tests First
Priority order (highest gaming risk first):
1. `test_no_stale_todos` - Easy to add TODOs
2. `test_no_trivial_wrapper_functions` - High cap (42)
3. `test_no_high_complexity_functions` - Complexity creep
4. `test_no_long_parameter_lists` - High cap (35)
5. `test_no_repeated_code_patterns` - Very high cap (177)
Example migration for `test_no_stale_todos`:
```python
# BEFORE:
def test_no_stale_todos() -> None:
# ... detection logic ...
max_allowed = 10
assert len(stale_comments) <= max_allowed, ...
# AFTER:
from tests.quality._baseline import Violation, assert_no_new_violations, content_hash
from tests.quality._helpers import find_source_files, relative_path
def test_no_stale_todos() -> None:
    violations: list[Violation] = []
    for py_file in find_source_files():
        lines = py_file.read_text(encoding="utf-8").splitlines()
        rel_path = relative_path(py_file)
        for line in lines:
            match = stale_pattern.search(line)
            if match:
                tag = match.group(1).upper()
                violations.append(
                    Violation(
                        rule="stale_todo",
                        relative_path=rel_path,
                        # Hash the line content, not its number, so the ID stays
                        # stable when surrounding code moves.
                        identifier=content_hash(line.strip()),
                        detail=tag,
                    )
                )
    assert_no_new_violations("stale_todo", violations)
```
#### Task 3.2: Generate Initial Baseline
After migrating all tests, generate the baseline:
```bash
# Run with special env var to generate baseline
QUALITY_GENERATE_BASELINE=1 pytest tests/quality/ -v
# Or use a management script
python -m tests.quality._baseline --generate
```
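Neither command exists yet; one possible shape — a sketch only — is an env-var switch in the quality helpers, with `enforce()` and `write_collected_baseline()` as illustrative names (the latter called from a `pytest_sessionfinish` hook in `tests/quality/conftest.py`):
```python
import os

from tests.quality._baseline import Violation, assert_no_new_violations, save_baseline

_COLLECTED: dict[str, list[Violation]] = {}


def enforce(rule: str, violations: list[Violation]) -> None:
    """Collect violations when regenerating; otherwise enforce the baseline."""
    if os.environ.get("QUALITY_GENERATE_BASELINE") == "1":
        _COLLECTED[rule] = violations
        return
    assert_no_new_violations(rule, violations)


def write_collected_baseline() -> None:
    """Write the collected snapshot; call from pytest_sessionfinish when regenerating."""
    if _COLLECTED:
        save_baseline(_COLLECTED)
```
Tests that want regeneration support would call `enforce(...)` in place of `assert_no_new_violations(...)`.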
### Phase 4: Advanced Improvements (Day 3)
#### Task 4.1: Replace Magic Value Allowlists with "Must Be Named" Rule
Instead of maintaining `ALLOWED_NUMBERS` and `ALLOWED_STRINGS`, use:
```python
def test_no_repeated_literals() -> None:
"""Detect literals that repeat and should be constants.
    Rule: Any literal that appears three or more times (in one module or
    across modules) should be promoted to a named constant.
Universal exceptions (0, 1, -1, "utf-8", HTTP verbs) are allowed.
"""
UNIVERSAL_EXCEPTIONS = {
0, 1, 2, -1, # Universal integers
0.0, 1.0, # Universal floats
"", " ", "\n", "\t", # Universal strings
"utf-8", "utf-8",
"GET", "POST", "PUT", "DELETE", "PATCH",
}
literal_occurrences: dict[object, list[Violation]] = defaultdict(list)
for py_file in find_source_files():
# ... collect literals ...
for node in ast.walk(tree):
if isinstance(node, ast.Constant):
value = node.value
if value in UNIVERSAL_EXCEPTIONS:
continue
if isinstance(value, str) and len(value) < 3:
continue # Short strings OK
# ... add to occurrences ...
# Flag any literal appearing 3+ times
violations = []
for value, occurrences in literal_occurrences.items():
if len(occurrences) >= 3:
violations.extend(occurrences)
assert_no_new_violations("repeated_literal", violations)
```
#### Task 4.2: Add CODEOWNERS Protection
**File**: `.github/CODEOWNERS`
```
# Quality suite requires maintainer approval
tests/quality/ @your-team/maintainers
tests/quality/baselines.json @your-team/maintainers
```
### Phase 5: Documentation and Rollout
#### Task 5.1: Update QUALITY_STANDARDS.md
Add section explaining baseline enforcement:
````markdown
## Baseline-Based Quality Gates
Quality tests use baseline enforcement instead of fixed caps:
- **No new debt**: Adding any new violation fails immediately
- **Baseline file**: `tests/quality/baselines.json` freezes existing violations
- **Reducing debt**: Fix violations, then update baseline to remove entries
- **Protected file**: Baseline changes require maintainer approval
### Updating the Baseline
When you've fixed violations and want to update the baseline:
```bash
# Regenerate baseline with current violations
python -m pytest tests/quality/ --generate-baseline
# Review and commit
git diff tests/quality/baselines.json
git add tests/quality/baselines.json
git commit -m "chore: reduce quality baseline (N violations fixed)"
```
### Adding Exceptions (Rare)
If a new violation is intentional:
1. Document why in a comment near the code
2. Add the stable ID to `baselines.json`
3. Get maintainer approval for the change
````
## Migration Strategy
### Step 1: Add Infrastructure (No Behavior Change)
1. Add `_baseline.py` and `_helpers.py`
2. Add `test_baseline_self.py`
3. Verify self-tests pass
### Step 2: Fix Detection Bugs
1. Fix skipif detection
2. Fix hardcoded path comment handling
3. Fix silent SyntaxError handling
4. Verify with self-tests
### Step 3: Parallel Run Period
1. Add baseline checks alongside existing caps
2. Both must pass (cap AND baseline)
3. Monitor for issues
### Step 4: Remove Caps
1. Remove `max_allowed` assertions
2. Baseline becomes sole enforcement
3. Generate and commit initial baseline
### Step 5: Reduce Baseline Over Time
1. Sprint goals include "reduce N violations"
2. Update baseline when fixes land
3. Celebrate progress!
## Success Metrics
| Metric | Current | Target |
|--------|---------|--------|
| Parse error handling | Silent skip | Fail loudly |
| Enforcement mechanism | Threshold caps | Baseline comparison |
| Detection holes | 2+ known | 0 known |
| Self-test coverage | 0% | 10+ detector tests |
| Baseline violations | N/A | Tracked and decreasing |
## Risks and Mitigations
| Risk | Mitigation |
|------|------------|
| Baseline file conflicts | Small file, clear ownership |
| Too strict initially | Start with current counts frozen |
| Self-tests incomplete | Add tests as holes are found |
| Agent edits baseline | CODEOWNERS + branch protection |
## References
- [Test Smells](https://testsmells.org/)
- [xUnit Test Patterns](http://xunitpatterns.com/)
- [Quality Debt](https://martinfowler.com/bliki/TechnicalDebt.html)

View File

@@ -0,0 +1,155 @@
# Sprint Plan: Quality Suite Hardening
## Summary
Transform quality tests from threshold-based (`max_allowed = N`) to baseline-based enforcement, fix detection holes, and add self-tests.
## Execution Checklist
### Phase 1: Foundation Infrastructure
- [ ] Create `tests/quality/_baseline.py` with `Violation`, `assert_no_new_violations()`, `content_hash()`
- [ ] Create `tests/quality/_helpers.py` with centralized `find_source_files()`, `parse_file_safe()`
- [ ] Create `tests/quality/baselines.json` (empty initially, schema v1)
- [ ] Create `tests/quality/test_baseline_self.py` with infrastructure self-tests
### Phase 2: Fix Detection Holes
- [ ] Fix `test_no_ignored_tests_without_reason`: Add missing `skipif_pattern` variable and usage
- [ ] Fix `test_no_hardcoded_paths`: Replace `line.split(pattern)` with `match.start()` comparison
- [ ] Replace all `except SyntaxError: continue` with `parse_file_safe()` + error collection
- [ ] Add self-tests for each fixed detector
### Phase 3: Migrate to Baseline Enforcement
Priority order (highest gaming risk):
1. [ ] `test_no_stale_todos` (cap: 10)
2. [ ] `test_no_trivial_wrapper_functions` (cap: 42)
3. [ ] `test_no_high_complexity_functions` (cap: 2)
4. [ ] `test_no_long_parameter_lists` (cap: 35)
5. [ ] `test_no_repeated_code_patterns` (cap: 177)
6. [ ] `test_no_god_classes` (cap: 1)
7. [ ] `test_no_deep_nesting` (cap: 2)
8. [ ] `test_no_long_methods` (cap: 7)
9. [ ] `test_no_feature_envy` (cap: 5)
10. [ ] `test_no_orphaned_imports` (cap: 5)
11. [ ] `test_no_deprecated_patterns` (cap: 5)
12. [ ] `test_no_assertion_roulette` (cap: 50)
13. [ ] `test_no_conditional_test_logic` (cap: 40)
14. [ ] `test_no_sleepy_tests` (cap: 3)
15. [ ] `test_no_unknown_tests` (cap: 5)
16. [ ] `test_no_redundant_prints` (cap: 5)
17. [ ] `test_no_exception_handling_in_tests` (cap: 3)
18. [ ] `test_no_magic_numbers_in_assertions` (cap: 50)
19. [ ] `test_no_sensitive_equality` (cap: 10)
20. [ ] `test_no_eager_tests` (cap: 10)
21. [ ] `test_no_duplicate_test_names` (cap: 15)
22. [ ] `test_no_long_test_methods` (cap: 3)
23. [ ] `test_fixtures_have_type_hints` (cap: 10)
24. [ ] `test_no_unused_fixture_parameters` (cap: 5)
25. [ ] `test_fixture_scope_appropriate` (cap: 5)
26. [ ] `test_no_pytest_raises_without_match` (cap: 50)
27. [ ] `test_no_magic_numbers` (cap: 10)
28. [ ] `test_no_repeated_string_literals` (cap: 30)
29. [ ] `test_no_alias_imports` (cap: 10)
30. [ ] `test_no_redundant_type_aliases` (cap: 2)
31. [ ] `test_no_passthrough_classes` (cap: 1)
32. [ ] `test_no_duplicate_function_bodies` (cap: 1)
33. [ ] `test_helpers_not_scattered` (cap: 15)
34. [ ] `test_no_duplicate_helper_implementations` (cap: 25)
35. [ ] `test_module_size_limits` (soft cap: 5, hard: 0)
### Phase 4: Generate Initial Baseline
- [ ] Run all quality tests to collect current violations
- [ ] Generate `baselines.json` with frozen violation IDs
- [ ] Verify all tests pass with baseline enforcement
- [ ] Remove `max_allowed` assertions from all tests
### Phase 5: Advanced Improvements (Optional)
- [ ] Replace magic value allowlists with "must be named" rule
- [ ] Add `.github/CODEOWNERS` for `tests/quality/` protection
- [ ] Update `docs/sprints/QUALITY_STANDARDS.md` with baseline workflow
## Files to Create
| File | Purpose |
|------|---------|
| `tests/quality/_baseline.py` | Baseline loading, comparison, violation types |
| `tests/quality/_helpers.py` | Centralized file discovery, safe parsing |
| `tests/quality/baselines.json` | Frozen violation snapshot |
| `tests/quality/test_baseline_self.py` | Self-tests for infrastructure |
## Files to Modify
| File | Changes |
|------|---------|
| `tests/quality/test_code_smells.py` | Use `_helpers`, baseline enforcement |
| `tests/quality/test_stale_code.py` | Use `_helpers`, baseline enforcement |
| `tests/quality/test_test_smells.py` | Fix skipif bug, use baseline |
| `tests/quality/test_magic_values.py` | Fix path bug, use baseline |
| `tests/quality/test_unnecessary_wrappers.py` | Use `_helpers`, baseline |
| `tests/quality/test_duplicate_code.py` | Use `_helpers`, baseline |
| `tests/quality/test_decentralized_helpers.py` | Use `_helpers`, baseline |
## Key Design Decisions
### Stable Violation IDs
```
{rule}|{relative_path}|{identifier}[|{detail}]
Examples:
- high_complexity|src/noteflow/grpc/service.py|StreamTranscription
- thin_wrapper|src/noteflow/config/settings.py|get_settings|_load_settings
- stale_todo|src/noteflow/cli/main.py|hash:a1b2c3d4
```
### Baseline JSON Schema
```json
{
"schema_version": 1,
"generated_at": "ISO8601",
"rules": {
"rule_name": ["stable_id_1", "stable_id_2"]
}
}
```
### Parse Error Handling
```python
# Never silently skip
tree, error = parse_file_safe(file_path)
if error:
parse_errors.append(error)
continue
# Fail at end if any errors
assert not parse_errors, "\n".join(parse_errors)
```
## Verification Commands
```bash
# Run quality tests
pytest tests/quality/ -v
# Generate baseline (after migration)
QUALITY_GENERATE_BASELINE=1 pytest tests/quality/ -v
# Check for new violations only
pytest tests/quality/ -v --tb=short
```
## Success Criteria
1. All quality tests pass with baseline enforcement
2. No `max_allowed` caps remain in test code
3. Self-tests cover all detection mechanisms
4. Parse errors fail loudly instead of silently
5. Detection holes (skipif, hardcoded paths) are fixed
6. `baselines.json` tracks all existing violations

View File

@@ -0,0 +1,265 @@
# Sprint: Spec Validation Fixes
> **Source**: `docs/spec.md` (2025-12-31 validation)
> **Quality Gates**: `docs/sprints/QUALITY_STANDARDS.md`
> **Status**: Planning
---
## Executive Summary
This sprint addresses 12 findings from the spec validation document, ranging from gRPC schema inconsistencies to performance issues and security gaps. Each finding has been validated against the current codebase with exact file locations and evidence.
---
## Priority Classification
### P0 - Critical (Security/Data Integrity)
| ID | Finding | Risk | Effort |
|----|---------|------|--------|
| #6 | ChunkedAssetReader lacks bounds checks | Data corruption, decryption failures | Medium |
| #10 | OTEL exporter uses `insecure=True` | Telemetry data exposed in transit | Low |
### P1 - High (API Contract/Correctness)
| ID | Finding | Risk | Effort |
|----|---------|------|--------|
| #1 | Timestamp representations inconsistent | Client/server mismatch, conversion errors | High |
| #2 | UpdateAnnotation sentinel defaults | Cannot clear fields intentionally | Medium |
| #3 | TranscriptUpdate ambiguous without `oneof` | Clients must defensively branch | Medium |
| #11 | Stringly-typed statuses | Typos, unsupported values at runtime | Medium |
### P2 - Medium (Reliability/Performance)
| ID | Finding | Risk | Effort |
|----|---------|------|--------|
| #4 | Background task tracking inconsistent | Sync tasks not cancelled on shutdown | Medium |
| #5 | Segmenter O(n) `pop(0)` in hot path | Performance degradation under load | Low |
| #7 | gRPC size limits in multiple places | Configuration drift | Low |
| #8 | Outlook adapter lacks timeouts/pagination | Hangs, incomplete data | Medium |
| #9 | Webhook delivery ID not recorded | Correlation impossible | Low |
### P3 - Low (Test Coverage)
| ID | Finding | Risk | Effort |
|----|---------|------|--------|
| #12 | Test targets for high-risk changes | Regression risk | Medium |
---
## Detailed Findings
### #1 Timestamp Representations Inconsistent
**Status**: Confirmed
**Locations**:
- `src/noteflow/grpc/proto/noteflow.proto:217` - `double created_at`
- `src/noteflow/grpc/proto/noteflow.proto:745` - `int64 start_time`
- `src/noteflow/grpc/proto/noteflow.proto:1203` - `string started_at` (ISO 8601)
- `src/noteflow/grpc/proto/noteflow.proto:149` - `double server_timestamp`
**Impact**: Multiple time encodings force per-field conversions and increase mismatch risk.
**Solution**:
1. Add `google.protobuf.Timestamp` fields in new/v2 messages
2. Deprecate legacy fields with comments
3. Add helper conversions in `src/noteflow/grpc/_mixins/converters.py`
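A minimal sketch of the step-3 helpers, assuming `google.protobuf.Timestamp` on the v2 fields; the function names are illustrative, not existing converters:
```python
from datetime import UTC, datetime

from google.protobuf.timestamp_pb2 import Timestamp

NANOS_PER_SECOND = 1_000_000_000


def datetime_to_proto(dt: datetime) -> Timestamp:
    """Convert an aware datetime to a google.protobuf.Timestamp."""
    ts = Timestamp()
    ts.FromDatetime(dt.astimezone(UTC))
    return ts


def proto_to_datetime(ts: Timestamp) -> datetime:
    """Convert a google.protobuf.Timestamp back to a UTC datetime."""
    return datetime.fromtimestamp(ts.seconds + ts.nanos / NANOS_PER_SECOND, tz=UTC)
```
The deprecated `double`/`int64`/ISO-string fields can be populated from the same values during the migration window.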
---
### #2 UpdateAnnotation Sentinel Defaults
**Status**: Confirmed
**Locations**:
- `src/noteflow/grpc/proto/noteflow.proto:502` - Message definition
- `src/noteflow/grpc/_mixins/annotation.py:127` - Handler logic
**Impact**: Cannot clear text to empty string, set time to 0, or clear segment_ids.
**Solution**:
1. Add `optional` keyword to fields (proto3 presence tracking)
2. Use `HasField()` checks in handler instead of sentinel comparisons
3. Add `clear_*` flags for backward compatibility
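Sketch of the resulting patch semantics, assuming the revised request uses proto3 `optional` fields plus a `clear_segment_ids` flag; the entity stand-in and field names are illustrative:
```python
from dataclasses import dataclass, field


@dataclass
class AnnotationEntity:  # stand-in for the domain entity, illustrative only
    text: str = ""
    start_time: float = 0.0
    segment_ids: list[str] = field(default_factory=list)


def apply_annotation_patch(request, annotation: AnnotationEntity) -> AnnotationEntity:
    """Apply only the fields the client explicitly set (proto3 presence tracking)."""
    if request.HasField("text"):
        annotation.text = request.text  # empty string now means "clear the text"
    if request.HasField("start_time"):
        annotation.start_time = request.start_time  # 0.0 becomes a legitimate value
    if request.clear_segment_ids:  # explicit flag, since repeated fields lack presence
        annotation.segment_ids = []
    elif request.segment_ids:
        annotation.segment_ids = list(request.segment_ids)
    return annotation
```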
---
### #3 TranscriptUpdate Ambiguous Without `oneof`
**Status**: Confirmed
**Location**: `src/noteflow/grpc/proto/noteflow.proto:136`
**Impact**: Schema allows both `partial_text` and `segment` or neither.
**Solution**:
1. Create `TranscriptUpdateV2` with `oneof payload`
2. Add new RPC `StreamTranscriptionV2`
3. Use `google.protobuf.Timestamp` for `server_timestamp`
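How a consumer would branch once the payload is a `oneof` — a sketch against the proposed `TranscriptUpdateV2`, whose field names mirror the existing message but are not yet part of the schema:
```python
def describe_update(update) -> str:
    """Branch on exactly one payload variant instead of probing both fields."""
    kind = update.WhichOneof("payload")
    if kind == "partial_text":
        return f"partial: {update.partial_text}"
    if kind == "segment":
        return f"final segment ending at {update.segment.end_time}"
    return "heartbeat (no payload set)"  # an unset oneof is explicit, not ambiguous
```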
---
### #4 Background Task Tracking Inconsistent
**Status**: Partially confirmed
**Locations**:
- `src/noteflow/grpc/_mixins/diarization/_jobs.py:130` - Tracked tasks
- `src/noteflow/grpc/_mixins/sync.py:109` - Untracked sync tasks
**Impact**: Sync tasks not cancelled on shutdown, exceptions not observed.
**Solution**:
1. Add shared `BackgroundTaskRegistry` in servicer
2. Register sync tasks for cancellation
3. Add done-callback for exception logging
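A lightweight sketch of the registry, using plain `asyncio` task tracking; the class name comes from the solution above, and the stdlib logger stands in for the project's structlog logger:
```python
import asyncio
import logging
from collections.abc import Coroutine

logger = logging.getLogger(__name__)


class BackgroundTaskRegistry:
    """Track fire-and-forget tasks so shutdown can cancel and await them."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task[None]] = set()

    def spawn(self, coro: Coroutine[object, object, None], *, name: str) -> asyncio.Task[None]:
        task = asyncio.create_task(coro, name=name)
        self._tasks.add(task)
        task.add_done_callback(self._on_done)
        return task

    def _on_done(self, task: asyncio.Task[None]) -> None:
        self._tasks.discard(task)
        if not task.cancelled() and task.exception() is not None:
            # Surface exceptions that would otherwise disappear with the task.
            logger.error("Background task %s failed: %s", task.get_name(), task.exception())

    async def shutdown(self) -> None:
        for task in list(self._tasks):
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
```
Sync task creation in `sync.py` would then go through `registry.spawn(...)` instead of a bare `asyncio.create_task(...)`.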
---
### #5 Segmenter O(n) `pop(0)` in Hot Path
**Status**: Confirmed
**Location**: `src/noteflow/infrastructure/asr/segmenter.py:233`
**Impact**: O(n) behavior under sustained audio streaming.
**Solution**:
1. Replace `list` with `collections.deque`
2. Use `popleft()` for O(1) removals
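The change in miniature — a sketch assuming the segmenter trims its frame buffer from the front; names are illustrative:
```python
from collections import deque

# Before: the buffer was a list, and pop(0) shifted every remaining frame (O(n)).
# After: a deque removes from the left in O(1).
frame_buffer: deque[bytes] = deque()


def trim_buffer(max_frames: int) -> None:
    while len(frame_buffer) > max_frames:
        frame_buffer.popleft()  # was frame_buffer.pop(0)
```
If the cap is fixed at construction time, `deque(maxlen=max_frames)` drops the oldest frames automatically.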
---
### #6 ChunkedAssetReader Lacks Bounds Checks
**Status**: Partially confirmed
**Location**: `src/noteflow/infrastructure/security/crypto.py:279`
**Impact**: No guard for `chunk_length < NONCE_SIZE + TAG_SIZE`, invalid slices possible.
**Solution**:
1. Add `read_exact()` helper
2. Validate `chunk_length >= NONCE_SIZE + TAG_SIZE`
3. Treat partial length headers as errors
4. Consider optional AAD for chunk index
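A sketch of steps 1-3, assuming a 4-byte length header; the constant values and helper names are illustrative rather than the module's actual layout:
```python
from typing import BinaryIO

NONCE_SIZE = 12
TAG_SIZE = 16
LENGTH_HEADER_SIZE = 4


def read_exact(stream: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes or fail; never hand back a short buffer."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError(f"Truncated chunk: expected {size} bytes, got {len(data)}")
    return data


def read_chunk(stream: BinaryIO) -> bytes | None:
    """Return the next ciphertext chunk, or None at a clean end of file."""
    header = stream.read(LENGTH_HEADER_SIZE)
    if not header:
        return None
    if len(header) != LENGTH_HEADER_SIZE:
        raise ValueError("Truncated length header")
    chunk_length = int.from_bytes(header, "big")
    if chunk_length < NONCE_SIZE + TAG_SIZE:
        raise ValueError(f"Chunk too short to hold nonce and tag: {chunk_length}")
    return read_exact(stream, chunk_length)
```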
---
### #7 gRPC Size Limits in Multiple Places
**Status**: Confirmed
**Locations**:
- `src/noteflow/grpc/service.py:86` - `MAX_CHUNK_SIZE = 1MB`
- `src/noteflow/config/constants.py:27` - `MAX_GRPC_MESSAGE_SIZE = 100MB`
- `src/noteflow/grpc/server.py:158` - Hardcoded in options
**Impact**: Multiple sources of truth can drift.
**Solution**:
1. Move to `Settings` class
2. Use consistently in `server.py` and `service.py`
3. Enforce chunk size in streaming handlers
4. Surface in `ServerInfo`
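One possible shape for the single source of truth — a sketch with illustrative names, not the actual `Settings` fields:
```python
from dataclasses import dataclass

MEBIBYTE = 1024 * 1024


@dataclass(frozen=True)
class GrpcLimits:
    """Single source of truth for message and chunk caps."""

    max_message_bytes: int = 100 * MEBIBYTE
    max_chunk_bytes: int = 1 * MEBIBYTE


def server_options(limits: GrpcLimits) -> list[tuple[str, int]]:
    """Derive the gRPC server options from the one configured value."""
    return [
        ("grpc.max_send_message_length", limits.max_message_bytes),
        ("grpc.max_receive_message_length", limits.max_message_bytes),
    ]
```
`service.py` would read the same `limits.max_chunk_bytes` for its streaming checks, and `ServerInfo` can echo both values.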
---
### #8 Outlook Adapter Lacks Timeouts/Pagination
**Status**: Confirmed
**Location**: `src/noteflow/infrastructure/calendar/outlook_adapter.py:81`
**Impact**: No timeouts, no pagination via `@odata.nextLink`, unbounded error logging.
**Solution**:
1. Configure `httpx.AsyncClient(timeout=..., limits=...)`
2. Implement pagination with `@odata.nextLink`
3. Truncate error bodies before logging
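A sketch of the paginated fetch with explicit timeout and connection limits; the URL handling, error type, and truncation size are illustrative, not the adapter's actual signatures:
```python
import httpx

GRAPH_TIMEOUT = httpx.Timeout(10.0, connect=5.0)
GRAPH_LIMITS = httpx.Limits(max_connections=10, max_keepalive_connections=5)
MAX_ERROR_BODY_CHARS = 512


async def fetch_all_events(client: httpx.AsyncClient, url: str) -> list[object]:
    """Follow @odata.nextLink pages until the result set is exhausted."""
    events: list[object] = []
    next_url: str | None = url
    while next_url:
        response = await client.get(next_url)
        if response.status_code != httpx.codes.OK:
            body = response.text[:MAX_ERROR_BODY_CHARS]  # bound what ends up in logs
            raise RuntimeError(f"Graph request failed ({response.status_code}): {body}")
        payload = response.json()
        events.extend(payload.get("value", []))
        next_link = payload.get("@odata.nextLink")
        next_url = next_link if isinstance(next_link, str) else None
    return events
```
The client itself would be constructed once with `httpx.AsyncClient(timeout=GRAPH_TIMEOUT, limits=GRAPH_LIMITS)`.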
---
### #9 Webhook Delivery ID Not Recorded
**Status**: Partially confirmed
**Locations**:
- `src/noteflow/infrastructure/webhooks/executor.py:255` - ID generated
- `src/noteflow/infrastructure/webhooks/executor.py:306` - Different ID in record
- `src/noteflow/infrastructure/webhooks/executor.py:103` - No client limits
**Impact**: Delivery ID sent to recipients not stored, correlation impossible.
**Solution**:
1. Reuse `delivery_id` as `WebhookDelivery.id`
2. Add `httpx.Limits` configuration
3. Include `delivery_id` in logs
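The gist of the correlation fix, sketched; the header name and record stand-in are illustrative (whatever ID the executor already sends should simply be reused as the stored ID):
```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class DeliveryRecord:  # stand-in for WebhookDelivery, illustrative only
    id: UUID
    status_code: int


def prepare_delivery() -> tuple[UUID, dict[str, str]]:
    """Generate the delivery ID once and reuse it for the header, the record, and logs."""
    delivery_id = uuid4()
    headers = {"X-Webhook-Delivery": str(delivery_id)}  # header name illustrative
    return delivery_id, headers


def record_delivery(delivery_id: UUID, status_code: int) -> DeliveryRecord:
    # Persist under the same ID the recipient saw, so their reports can be correlated.
    return DeliveryRecord(id=delivery_id, status_code=status_code)
```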
---
### #10 OTEL Exporter Uses `insecure=True`
**Status**: Confirmed
**Location**: `src/noteflow/infrastructure/observability/otel.py:99`
**Impact**: TLS disabled unconditionally, even in production.
**Solution**:
1. Add `NOTEFLOW_OTEL_INSECURE` setting
2. Infer from endpoint scheme (`http://` vs `https://`)
3. Default to secure
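A sketch of the resolution order (explicit setting wins, otherwise infer from the scheme); the setting name comes from the solution above, the helper is illustrative:
```python
from urllib.parse import urlparse


def resolve_otel_insecure(endpoint: str, configured: bool | None) -> bool:
    """Allow plaintext export only when explicitly configured or clearly requested."""
    if configured is not None:  # NOTEFLOW_OTEL_INSECURE takes precedence
        return configured
    return urlparse(endpoint).scheme == "http"  # https (and anything else) stays secure
```
The resolved value then feeds the exporter's `insecure=` argument instead of a hardcoded `True`.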
---
### #11 Stringly-Typed Statuses
**Status**: Confirmed
**Locations**:
- `src/noteflow/grpc/proto/noteflow.proto:1191` - `string status` for sync
- `src/noteflow/grpc/proto/noteflow.proto:856` - `string status` for OAuth
**Impact**: Clients must match string literals, risk typos.
**Solution**:
1. Add `SyncRunStatus` enum
2. Add `OAuthConnectionStatus` enum
3. Migrate via new fields or v2 messages
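While the proto enums land, the Python side can map the legacy strings defensively — a sketch with illustrative members and values:
```python
from enum import Enum


class SyncRunStatus(Enum):  # mirrors the proposed proto enum; values illustrative
    UNSPECIFIED = 0
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3


_LEGACY_SYNC_STATUS = {
    "running": SyncRunStatus.RUNNING,
    "completed": SyncRunStatus.COMPLETED,
    "failed": SyncRunStatus.FAILED,
}


def parse_sync_status(value: str) -> SyncRunStatus:
    """Map legacy string statuses onto the enum; unknown strings surface as UNSPECIFIED."""
    return _LEGACY_SYNC_STATUS.get(value.lower(), SyncRunStatus.UNSPECIFIED)
```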
---
### #12 Test Targets for High-Risk Changes
**Status**: Recommendation
**Existing Coverage**:
- `tests/stress/test_segmenter_fuzz.py`
- `tests/stress/test_audio_integrity.py`
**Suggested Additions**:
1. gRPC proto-level test for patch semantics on `UpdateAnnotation`
2. Sync task lifecycle test for shutdown cancellation
3. Outlook adapter test for `@odata.nextLink` pagination
---
## Success Criteria
- [ ] All P0 findings resolved
- [ ] All P1 findings resolved or have v2 migration path
- [ ] All P2 findings resolved
- [ ] Test coverage for high-risk changes
- [ ] Zero new quality threshold violations
- [ ] All type checks pass (`basedpyright`)
- [ ] Documentation updated
---
## Dependencies
- Proto regeneration affects Rust/TS clients
- Backward compatibility required for existing API consumers
- Feature flags for v2 migrations where applicable
---
## Risk Assessment
| Risk | Mitigation |
|------|------------|
| Proto changes break clients | Deprecate + add new fields (no removal) |
| Performance regression from deque | Benchmark before/after |
| OTEL secure default breaks dev | Make configurable with sane defaults |
| Task registry overhead | Lightweight set-based tracking |

File diff suppressed because it is too large

View File

@@ -29,6 +29,8 @@ dependencies = [
"authlib>=1.6.6",
"rich>=14.2.0",
"types-psutil>=7.2.0.20251228",
# Structured logging
"structlog>=24.0",
]
[project.optional-dependencies]
@@ -203,6 +205,7 @@ disable_error_code = ["import-untyped"]
[tool.basedpyright]
pythonVersion = "3.12"
typeCheckingMode = "standard"
extraPaths = ["scripts"]
reportMissingTypeStubs = false
reportUnknownMemberType = false
reportUnknownArgumentType = false

View File

@@ -12,9 +12,9 @@
"files": true,
"removeComments": true,
"removeEmptyLines": true,
"compress": true,
"compress": false,
"topFilesLength": 5,
"showLineNumbers": false,
"showLineNumbers": true,
"truncateBase64": false,
"copyToClipboard": false,
"tokenCountTree": false,
@@ -26,11 +26,67 @@
"includeLogsCount": 50
}
},
"include": ["src/"],
"include": [
"tests/quality"
],
"ignore": {
"useGitignore": true,
"useDefaultPatterns": true,
"customPatterns": []
"customPatterns": [
"**/*_pb2.py",
"**/*_pb2_grpc.py",
"**/*.pb2.py",
"**/*.pb2_grpc.py",
"**/*.pyi",
"**/noteflow.rs",
"**/noteflow_pb2.py",
"src/noteflow_pb2.py",
"client/src-tauri/src/grpc/noteflow.rs",
"src/noteflow/grpc/proto/noteflow_pb2.py",
"src/noteflow/grpc/proto/noteflow_pb2_grpc.py",
"src/noteflow/grpc/proto/noteflow_pb2.pyi",
"**/node_modules/**",
"**/target/**",
"**/gen/**",
"**/__pycache__/**",
"**/*.pyc",
"**/.pytest_cache/**",
"**/.mypy_cache/**",
"**/.ruff_cache/**",
"**/dist/**",
"**/build/**",
"**/.vite/**",
"**/coverage/**",
"**/htmlcov/**",
"**/playwright-report/**",
"**/test-results/**",
"uv.lock",
"**/Cargo.lock",
"**/package-lock.json",
"**/bun.lockb",
"**/yarn.lock",
"**/*.lock",
"**/*.lockb",
"**/*.png",
"**/*.jpg",
"**/*.jpeg",
"**/*.gif",
"**/*.ico",
"**/*.svg",
"**/*.icns",
"**/*.webp",
"**/*.xml",
"**/icons/**",
"**/public/**",
"client/app-icon.png",
"**/*.md",
".benchmarks/**",
"noteflow-api-spec.json",
"scratch.md",
"repomix-output.md",
"**/logs/**",
"**/status_line.json"
]
},
"security": {
"enableSecurityCheck": true

View File

@@ -6,7 +6,6 @@ Uses existing Integration entity and IntegrationRepository for persistence.
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from uuid import UUID
@@ -22,6 +21,7 @@ from noteflow.infrastructure.calendar import (
from noteflow.infrastructure.calendar.google_adapter import GoogleCalendarError
from noteflow.infrastructure.calendar.oauth_manager import OAuthError
from noteflow.infrastructure.calendar.outlook_adapter import OutlookCalendarError
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Callable
@@ -29,7 +29,7 @@ if TYPE_CHECKING:
from noteflow.config.settings import CalendarIntegrationSettings
from noteflow.domain.ports.unit_of_work import UnitOfWork
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class CalendarServiceError(Exception):

View File

@@ -15,12 +15,15 @@ from noteflow.infrastructure.export import (
PdfExporter,
TranscriptExporter,
)
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.domain.entities import Meeting, Segment
from noteflow.domain.ports.unit_of_work import UnitOfWork
from noteflow.domain.value_objects import MeetingId
logger = get_logger(__name__)
class ExportFormat(Enum):
"""Supported export formats."""
@@ -83,17 +86,43 @@ class ExportService:
Raises:
ValueError: If meeting not found.
"""
logger.info(
"Starting transcript export",
meeting_id=str(meeting_id),
format=fmt.value,
)
async with self._uow:
found_meeting = await self._uow.meetings.get(meeting_id)
if not found_meeting:
from noteflow.config.constants import ERROR_MSG_MEETING_PREFIX
msg = f"{ERROR_MSG_MEETING_PREFIX}{meeting_id} not found"
logger.warning(
"Export failed: meeting not found",
meeting_id=str(meeting_id),
)
raise ValueError(msg)
segments = await self._uow.segments.get_by_meeting(meeting_id)
segment_count = len(segments)
logger.debug(
"Retrieved segments for export",
meeting_id=str(meeting_id),
segment_count=segment_count,
)
exporter = self._get_exporter(fmt)
return exporter.export(found_meeting, segments)
result = exporter.export(found_meeting, segments)
content_size = len(result) if isinstance(result, bytes) else len(result.encode("utf-8"))
logger.info(
"Transcript export completed",
meeting_id=str(meeting_id),
format=fmt.value,
segment_count=segment_count,
content_size_bytes=content_size,
)
return result
async def export_to_file(
self,
@@ -114,22 +143,60 @@ class ExportService:
Raises:
ValueError: If meeting not found or format cannot be determined.
"""
logger.info(
"Starting file export",
meeting_id=str(meeting_id),
output_path=str(output_path),
format=fmt.value if fmt else "inferred",
)
# Determine format from extension if not provided
if fmt is None:
fmt = self._infer_format_from_extension(output_path.suffix)
logger.debug(
"Format inferred from extension",
extension=output_path.suffix,
inferred_format=fmt.value,
)
content = await self.export_transcript(meeting_id, fmt)
# Ensure correct extension
exporter = self._get_exporter(fmt)
original_path = output_path
if output_path.suffix != exporter.file_extension:
output_path = output_path.with_suffix(exporter.file_extension)
logger.debug(
"Adjusted file extension",
original_path=str(original_path),
adjusted_path=str(output_path),
expected_extension=exporter.file_extension,
)
output_path.parent.mkdir(parents=True, exist_ok=True)
if isinstance(content, bytes):
output_path.write_bytes(content)
else:
output_path.write_text(content, encoding="utf-8")
try:
if isinstance(content, bytes):
output_path.write_bytes(content)
else:
output_path.write_text(content, encoding="utf-8")
file_size = output_path.stat().st_size
logger.info(
"File export completed",
meeting_id=str(meeting_id),
output_path=str(output_path),
format=fmt.value,
file_size_bytes=file_size,
)
except OSError as exc:
logger.error(
"File write failed",
meeting_id=str(meeting_id),
output_path=str(output_path),
error=str(exc),
)
raise
return output_path
def _infer_format_from_extension(self, extension: str) -> ExportFormat:
@@ -153,12 +220,23 @@ class ExportService:
".htm": ExportFormat.HTML,
EXPORT_EXT_PDF: ExportFormat.PDF,
}
fmt = extension_map.get(extension.lower())
normalized_ext = extension.lower()
fmt = extension_map.get(normalized_ext)
if fmt is None:
logger.warning(
"Unrecognized file extension for format inference",
extension=extension,
supported_extensions=list(extension_map.keys()),
)
raise ValueError(
f"Cannot infer format from extension '{extension}'. "
f"Supported: {', '.join(extension_map.keys())}"
)
logger.debug(
"Format inference successful",
extension=normalized_ext,
inferred_format=fmt.value,
)
return fmt
def get_supported_formats(self) -> list[tuple[str, str]]:

View File

@@ -7,7 +7,6 @@ Following hexagonal architecture:
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from uuid import UUID, uuid4
@@ -23,6 +22,7 @@ from noteflow.domain.identity import (
WorkspaceContext,
WorkspaceRole,
)
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.models import (
DEFAULT_USER_ID,
DEFAULT_WORKSPACE_ID,
@@ -33,7 +33,7 @@ if TYPE_CHECKING:
from noteflow.domain.ports.unit_of_work import UnitOfWork
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class IdentityService:
@@ -64,6 +64,7 @@ class IdentityService:
"""
if not uow.supports_users:
# Return a synthetic context for memory mode
logger.debug("Memory mode: returning synthetic default user context")
return UserContext(
user_id=UUID(DEFAULT_USER_ID),
display_name=DEFAULT_USER_DISPLAY_NAME,
@@ -71,6 +72,7 @@ class IdentityService:
user = await uow.users.get_default()
if user:
logger.debug("Found existing default user: %s", user.id)
return UserContext(
user_id=user.id,
display_name=user.display_name,
@@ -110,6 +112,7 @@ class IdentityService:
"""
if not uow.supports_workspaces:
# Return a synthetic context for memory mode
logger.debug("Memory mode: returning synthetic default workspace context")
return WorkspaceContext(
workspace_id=UUID(DEFAULT_WORKSPACE_ID),
workspace_name=DEFAULT_WORKSPACE_NAME,
@@ -118,6 +121,11 @@ class IdentityService:
workspace = await uow.workspaces.get_default_for_user(user_id)
if workspace:
logger.debug(
"Found existing default workspace for user %s: %s",
user_id,
workspace.id,
)
membership = await uow.workspaces.get_membership(workspace.id, user_id)
role = WorkspaceRole(membership.role.value) if membership else WorkspaceRole.OWNER
return WorkspaceContext(
@@ -169,10 +177,22 @@ class IdentityService:
user = await self.get_or_create_default_user(uow)
if workspace_id:
logger.info(
"Resolving context for explicit workspace_id=%s, user_id=%s",
workspace_id,
user.user_id,
)
ws_context = await self._get_workspace_context(uow, workspace_id, user.user_id)
else:
logger.debug("No workspace_id provided, using default workspace")
ws_context = await self.get_or_create_default_workspace(uow, user.user_id)
logger.debug(
"Resolved operation context: user=%s, workspace=%s, request_id=%s",
user.user_id,
ws_context.workspace_id,
request_id,
)
return OperationContext(
user=user,
workspace=ws_context,
@@ -200,24 +220,38 @@ class IdentityService:
PermissionError: If user not a member.
"""
if not uow.supports_workspaces:
logger.debug("Memory mode: returning synthetic workspace context for %s", workspace_id)
return WorkspaceContext(
workspace_id=workspace_id,
workspace_name=DEFAULT_WORKSPACE_NAME,
role=WorkspaceRole.OWNER,
)
logger.debug("Looking up workspace %s for user %s", workspace_id, user_id)
workspace = await uow.workspaces.get(workspace_id)
if not workspace:
from noteflow.config.constants import ERROR_MSG_WORKSPACE_PREFIX
logger.warning("Workspace not found: %s", workspace_id)
msg = f"{ERROR_MSG_WORKSPACE_PREFIX}{workspace_id} not found"
raise ValueError(msg)
membership = await uow.workspaces.get_membership(workspace_id, user_id)
if not membership:
logger.warning(
"Permission denied: user %s is not a member of workspace %s",
user_id,
workspace_id,
)
msg = f"User not a member of workspace {workspace_id}"
raise PermissionError(msg)
logger.debug(
"Workspace access granted: user=%s, workspace=%s, role=%s",
user_id,
workspace_id,
membership.role,
)
return WorkspaceContext(
workspace_id=workspace.id,
workspace_name=workspace.name,
@@ -243,9 +277,18 @@ class IdentityService:
List of workspaces.
"""
if not uow.supports_workspaces:
logger.debug("Memory mode: returning empty workspace list")
return []
return await uow.workspaces.list_for_user(user_id, limit, offset)
workspaces = await uow.workspaces.list_for_user(user_id, limit, offset)
logger.debug(
"Listed %d workspaces for user %s (limit=%d, offset=%d)",
len(workspaces),
user_id,
limit,
offset,
)
return workspaces
async def create_workspace(
self,
@@ -316,9 +359,15 @@ class IdentityService:
User if found, None otherwise.
"""
if not uow.supports_users:
logger.debug("Memory mode: users not supported, returning None")
return None
return await uow.users.get(user_id)
user = await uow.users.get(user_id)
if user:
logger.debug("Found user: %s", user_id)
else:
logger.debug("User not found: %s", user_id)
return user
async def update_user_profile(
self,
@@ -347,15 +396,27 @@ class IdentityService:
user = await uow.users.get(user_id)
if not user:
logger.warning("User not found for profile update: %s", user_id)
return None
updated_fields: list[str] = []
if display_name:
user.display_name = display_name
updated_fields.append("display_name")
if email is not None:
user.email = email
updated_fields.append("email")
if not updated_fields:
logger.debug("No fields to update for user %s", user_id)
return user
updated = await uow.users.update(user)
await uow.commit()
logger.info("Updated user profile: %s", user_id)
logger.info(
"Updated user profile: user_id=%s, fields=%s",
user_id,
", ".join(updated_fields),
)
return updated

View File

@@ -5,7 +5,6 @@ Orchestrates meeting-related use cases with persistence.
from __future__ import annotations
import logging
from collections.abc import Sequence
from datetime import UTC, datetime
from typing import TYPE_CHECKING
@@ -20,6 +19,7 @@ from noteflow.domain.entities import (
WordTiming,
)
from noteflow.domain.value_objects import AnnotationId, AnnotationType
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Sequence as SequenceType
@@ -27,7 +27,7 @@ if TYPE_CHECKING:
from noteflow.domain.ports.unit_of_work import UnitOfWork
from noteflow.domain.value_objects import MeetingId, MeetingState
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class MeetingService:
@@ -64,6 +64,7 @@ class MeetingService:
async with self._uow:
saved = await self._uow.meetings.create(meeting)
await self._uow.commit()
logger.info("Created meeting", meeting_id=str(saved.id), title=title, state=saved.state.value)
return saved
async def get_meeting(self, meeting_id: MeetingId) -> Meeting | None:
@@ -76,7 +77,12 @@ class MeetingService:
Meeting if found, None otherwise.
"""
async with self._uow:
return await self._uow.meetings.get(meeting_id)
meeting = await self._uow.meetings.get(meeting_id)
if meeting is None:
logger.debug("Meeting not found", meeting_id=str(meeting_id))
else:
logger.debug("Retrieved meeting", meeting_id=str(meeting_id), state=meeting.state.value)
return meeting
async def list_meetings(
self,
@@ -97,12 +103,14 @@ class MeetingService:
Tuple of (meeting sequence, total matching count).
"""
async with self._uow:
return await self._uow.meetings.list_all(
meetings, total = await self._uow.meetings.list_all(
states=states,
limit=limit,
offset=offset,
sort_desc=sort_desc,
)
logger.debug("Listed meetings", count=len(meetings), total=total, limit=limit, offset=offset)
return meetings, total
async def start_recording(self, meeting_id: MeetingId) -> Meeting | None:
"""Start recording a meeting.
@@ -116,11 +124,14 @@ class MeetingService:
async with self._uow:
meeting = await self._uow.meetings.get(meeting_id)
if meeting is None:
logger.warning("Cannot start recording: meeting not found", meeting_id=str(meeting_id))
return None
previous_state = meeting.state.value
meeting.start_recording()
await self._uow.meetings.update(meeting)
await self._uow.commit()
logger.info("Started recording", meeting_id=str(meeting_id), from_state=previous_state, to_state=meeting.state.value)
return meeting
async def stop_meeting(self, meeting_id: MeetingId) -> Meeting | None:
@@ -137,13 +148,15 @@ class MeetingService:
async with self._uow:
meeting = await self._uow.meetings.get(meeting_id)
if meeting is None:
logger.warning("Cannot stop meeting: not found", meeting_id=str(meeting_id))
return None
# Graceful shutdown: RECORDING -> STOPPING -> STOPPED
meeting.begin_stopping()
previous_state = meeting.state.value
meeting.begin_stopping() # RECORDING -> STOPPING -> STOPPED
meeting.stop_recording()
await self._uow.meetings.update(meeting)
await self._uow.commit()
logger.info("Stopped meeting", meeting_id=str(meeting_id), from_state=previous_state, to_state=meeting.state.value)
return meeting
async def complete_meeting(self, meeting_id: MeetingId) -> Meeting | None:
@@ -158,11 +171,14 @@ class MeetingService:
async with self._uow:
meeting = await self._uow.meetings.get(meeting_id)
if meeting is None:
logger.warning("Cannot complete meeting: not found", meeting_id=str(meeting_id))
return None
previous_state = meeting.state.value
meeting.complete()
await self._uow.meetings.update(meeting)
await self._uow.commit()
logger.info("Completed meeting", meeting_id=str(meeting_id), from_state=previous_state, to_state=meeting.state.value)
return meeting
async def delete_meeting(self, meeting_id: MeetingId) -> bool:
@@ -181,16 +197,14 @@ class MeetingService:
async with self._uow:
meeting = await self._uow.meetings.get(meeting_id)
if meeting is None:
logger.warning("Cannot delete meeting: not found", meeting_id=str(meeting_id))
return False
# Delete filesystem assets (use stored asset_path if different from meeting_id)
await self._uow.assets.delete_meeting_assets(meeting_id, meeting.asset_path)
# Delete DB record (cascade handles children)
success = await self._uow.meetings.delete(meeting_id)
if success:
await self._uow.commit()
logger.info("Deleted meeting %s", meeting_id)
logger.info("Deleted meeting", meeting_id=str(meeting_id), title=meeting.title)
return success
@@ -240,25 +254,15 @@ class MeetingService:
async with self._uow:
saved = await self._uow.segments.add(meeting_id, segment)
await self._uow.commit()
logger.debug("Added segment", meeting_id=str(meeting_id), segment_id=segment_id, start=start_time, end=end_time)
return saved
async def add_segments_batch(
self,
meeting_id: MeetingId,
segments: Sequence[Segment],
) -> Sequence[Segment]:
"""Add multiple segments in batch.
Args:
meeting_id: Meeting identifier.
segments: Segments to add.
Returns:
Added segments.
"""
async def add_segments_batch(self, meeting_id: MeetingId, segments: Sequence[Segment]) -> Sequence[Segment]:
"""Add multiple segments in batch."""
async with self._uow:
saved = await self._uow.segments.add_batch(meeting_id, segments)
await self._uow.commit()
logger.debug("Added segments batch", meeting_id=str(meeting_id), count=len(segments))
return saved
async def get_segments(
@@ -339,19 +343,18 @@ class MeetingService:
async with self._uow:
saved = await self._uow.summaries.save(summary)
await self._uow.commit()
logger.info("Saved summary", meeting_id=str(meeting_id), provider=provider_name or "unknown", model=model_name or "unknown")
return saved
async def get_summary(self, meeting_id: MeetingId) -> Summary | None:
"""Get summary for a meeting.
Args:
meeting_id: Meeting identifier.
Returns:
Summary if exists, None otherwise.
"""
"""Get summary for a meeting."""
async with self._uow:
return await self._uow.summaries.get_by_meeting(meeting_id)
summary = await self._uow.summaries.get_by_meeting(meeting_id)
if summary is None:
logger.debug("Summary not found", meeting_id=str(meeting_id))
else:
logger.debug("Retrieved summary", meeting_id=str(meeting_id), provider=summary.provider_name or "unknown")
return summary
# Annotation methods
@@ -392,6 +395,14 @@ class MeetingService:
async with self._uow:
saved = await self._uow.annotations.add(annotation)
await self._uow.commit()
logger.info(
"Added annotation",
meeting_id=str(meeting_id),
annotation_id=str(annotation.id),
annotation_type=annotation_type.value,
start_time=start_time,
end_time=end_time,
)
return saved
async def get_annotation(self, annotation_id: AnnotationId) -> Annotation | None:
@@ -455,6 +466,12 @@ class MeetingService:
async with self._uow:
updated = await self._uow.annotations.update(annotation)
await self._uow.commit()
logger.info(
"Updated annotation",
annotation_id=str(annotation.id),
meeting_id=str(annotation.meeting_id),
annotation_type=annotation.annotation_type.value,
)
return updated
async def delete_annotation(self, annotation_id: AnnotationId) -> bool:
@@ -470,4 +487,10 @@ class MeetingService:
success = await self._uow.annotations.delete(annotation_id)
if success:
await self._uow.commit()
logger.info("Deleted annotation", annotation_id=str(annotation_id))
else:
logger.warning(
"Cannot delete annotation: not found",
annotation_id=str(annotation_id),
)
return success

View File

@@ -7,12 +7,12 @@ Orchestrates NER extraction, caching, and persistence following hexagonal archit
from __future__ import annotations
import asyncio
import logging
from dataclasses import dataclass
from typing import TYPE_CHECKING
from noteflow.config.settings import get_feature_flags
from noteflow.domain.entities.named_entity import NamedEntity
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Callable, Sequence
@@ -24,7 +24,7 @@ if TYPE_CHECKING:
UoWFactory = Callable[[], SqlAlchemyUnitOfWork]
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
@dataclass
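
The hunks above and below repeat one pattern: the stdlib `logging.getLogger(__name__)` module logger is swapped for the project's `get_logger`, and call sites pass keyword fields instead of %-format arguments. A minimal sketch of that pattern (illustrative, not part of the diff), assuming `get_logger` returns a structlog-style logger that accepts keyword fields, as the call sites in this commit imply:

```python
from noteflow.infrastructure.logging import get_logger  # import path taken from the hunks above

logger = get_logger(__name__)


def delete_example(meeting_id: str) -> None:
    # Keyword arguments become structured log fields rather than %-interpolated text.
    logger.info("Deleted meeting", meeting_id=meeting_id)
```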

View File

@@ -4,7 +4,7 @@ from __future__ import annotations
from uuid import UUID
from noteflow.config.constants import ERROR_MSG_PROJECT_PREFIX
from noteflow.config.constants import ERROR_MSG_PROJECT_PREFIX, ERROR_MSG_WORKSPACE_PREFIX
from noteflow.domain.entities.project import Project
from noteflow.domain.ports.unit_of_work import UnitOfWork
@@ -43,7 +43,7 @@ class ActiveProjectMixin:
workspace = await uow.workspaces.get(workspace_id)
if workspace is None:
msg = f"Workspace {workspace_id} not found"
msg = f"{ERROR_MSG_WORKSPACE_PREFIX}{workspace_id} not found"
raise ValueError(msg)
if project_id is not None:
@@ -92,7 +92,7 @@ class ActiveProjectMixin:
workspace = await uow.workspaces.get(workspace_id)
if workspace is None:
msg = f"Workspace {workspace_id} not found"
msg = f"{ERROR_MSG_WORKSPACE_PREFIX}{workspace_id} not found"
raise ValueError(msg)
active_project_id: UUID | None = None

View File

@@ -2,17 +2,17 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from uuid import UUID, uuid4
from noteflow.domain.entities.project import Project, ProjectSettings, slugify
from noteflow.domain.ports.unit_of_work import UnitOfWork
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Sequence
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class ProjectCrudMixin:

View File

@@ -2,17 +2,17 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from uuid import UUID
from noteflow.domain.identity import ProjectMembership, ProjectRole
from noteflow.domain.ports.unit_of_work import UnitOfWork
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Sequence
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class ProjectMembershipMixin:

View File

@@ -6,7 +6,6 @@ Optionally validate audio file integrity for crashed meetings.
from __future__ import annotations
import logging
from dataclasses import dataclass
from datetime import UTC, datetime
from pathlib import Path
@@ -15,12 +14,13 @@ from typing import TYPE_CHECKING, ClassVar
import sqlalchemy.exc
from noteflow.domain.value_objects import MeetingState
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.domain.entities import Meeting
from noteflow.domain.ports.unit_of_work import UnitOfWork
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
@dataclass(frozen=True)

View File

@@ -2,17 +2,18 @@
from __future__ import annotations
import logging
from collections.abc import Callable
from dataclasses import dataclass
from datetime import UTC, datetime, timedelta
from typing import TYPE_CHECKING
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.domain.entities import Meeting
from noteflow.domain.ports.unit_of_work import UnitOfWork
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
@dataclass(frozen=True)

View File

@@ -5,7 +5,6 @@ Coordinate provider selection, consent handling, and citation verification.
from __future__ import annotations
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import TYPE_CHECKING
@@ -19,6 +18,7 @@ from noteflow.domain.summarization import (
SummarizationRequest,
SummarizationResult,
)
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Awaitable, Callable, Sequence
@@ -33,7 +33,7 @@ if TYPE_CHECKING:
# Type alias for consent persistence callback
ConsentPersistCallback = Callable[[bool], Awaitable[None]]
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class SummarizationMode(Enum):

View File

@@ -5,17 +5,17 @@ Orchestrate trigger detection with rate limiting and snooze support.
from __future__ import annotations
import logging
import time
from dataclasses import dataclass
from typing import TYPE_CHECKING
from noteflow.domain.triggers.entities import TriggerAction, TriggerDecision, TriggerSignal
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.domain.triggers.ports import SignalProvider
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
@dataclass

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from noteflow.config.constants import DEFAULT_MEETING_TITLE
@@ -14,13 +13,15 @@ from noteflow.domain.webhooks import (
WebhookConfig,
WebhookDelivery,
WebhookEventType,
payload_to_dict,
)
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.webhooks import WebhookExecutor
if TYPE_CHECKING:
from noteflow.domain.entities.meeting import Meeting
_logger = logging.getLogger(__name__)
_logger = get_logger(__name__)
class WebhookService:
@@ -95,7 +96,7 @@ class WebhookService:
return await self._deliver_to_all(
WebhookEventType.MEETING_COMPLETED,
payload.to_dict(),
payload_to_dict(payload),
)
async def trigger_summary_generated(
@@ -123,7 +124,7 @@ class WebhookService:
return await self._deliver_to_all(
WebhookEventType.SUMMARY_GENERATED,
payload.to_dict(),
payload_to_dict(payload),
)
async def trigger_recording_started(
@@ -149,7 +150,7 @@ class WebhookService:
return await self._deliver_to_all(
WebhookEventType.RECORDING_STARTED,
payload.to_dict(),
payload_to_dict(payload),
)
async def trigger_recording_stopped(
@@ -178,7 +179,7 @@ class WebhookService:
return await self._deliver_to_all(
WebhookEventType.RECORDING_STOPPED,
payload.to_dict(),
payload_to_dict(payload),
)
async def _deliver_to_all(

View File

@@ -9,12 +9,18 @@ import sys
from rich.console import Console
from noteflow.infrastructure.logging import get_logger
console = Console()
logger = get_logger(__name__)
def main() -> None:
"""Dispatch to appropriate subcommand CLI."""
logger.info("cli_invoked", argv=sys.argv)
if len(sys.argv) < 2:
logger.debug("cli_no_command", message="No command provided, showing help")
console.print("[bold]NoteFlow CLI[/bold]")
console.print()
console.print("Available commands:")
@@ -32,19 +38,31 @@ def main() -> None:
sys.exit(1)
command = sys.argv[1]
subcommand_args = sys.argv[2:]
# Remove the command from argv so submodule parsers work correctly
sys.argv = [sys.argv[0], *sys.argv[2:]]
sys.argv = [sys.argv[0], *subcommand_args]
if command == "retention":
from noteflow.cli.retention import main as retention_main
logger.debug("cli_dispatch", command=command, subcommand_args=subcommand_args)
try:
from noteflow.cli.retention import main as retention_main
retention_main()
retention_main()
except Exception:
logger.exception("cli_command_failed", command=command)
raise
elif command == "models":
from noteflow.cli.models import main as models_main
logger.debug("cli_dispatch", command=command, subcommand_args=subcommand_args)
try:
from noteflow.cli.models import main as models_main
models_main()
models_main()
except Exception:
logger.exception("cli_command_failed", command=command)
raise
else:
logger.warning("cli_unknown_command", command=command)
console.print(f"[red]Unknown command:[/red] {command}")
console.print("Available commands: retention, models")
sys.exit(1)

View File

@@ -7,7 +7,6 @@ Usage:
"""
import argparse
import logging
import subprocess
import sys
from dataclasses import dataclass, field
@@ -15,12 +14,10 @@ from dataclasses import dataclass, field
from rich.console import Console
from noteflow.config.constants import SPACY_MODEL_LG, SPACY_MODEL_SM
from noteflow.infrastructure.logging import configure_logging, get_logger
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
configure_logging()
logger = get_logger(__name__)
console = Console()
# Constants to avoid magic strings

View File

@@ -7,20 +7,17 @@ Usage:
import argparse
import asyncio
import logging
import sys
from rich.console import Console
from noteflow.application.services import RetentionService
from noteflow.config.settings import get_settings
from noteflow.infrastructure.logging import configure_logging, get_logger
from noteflow.infrastructure.persistence.unit_of_work import create_uow_factory
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
configure_logging()
logger = get_logger(__name__)
console = Console()

View File

@@ -222,3 +222,28 @@ SCHEMA_TYPE_BOOLEAN: Final[str] = "boolean"
SCHEMA_TYPE_ARRAY_ITEMS: Final[str] = "items"
"""JSON schema type name for array items."""
# Log event names - centralized to avoid repeated strings
LOG_EVENT_DATABASE_REQUIRED_FOR_ANNOTATIONS: Final[str] = "database_required_for_annotations"
"""Log event when annotations require database persistence."""
LOG_EVENT_ANNOTATION_NOT_FOUND: Final[str] = "annotation_not_found"
"""Log event when annotation lookup fails."""
LOG_EVENT_INVALID_ANNOTATION_ID: Final[str] = "invalid_annotation_id"
"""Log event when annotation ID is invalid."""
LOG_EVENT_SERVICE_NOT_ENABLED: Final[str] = "service_not_enabled"
"""Log event when a service feature is not enabled."""
LOG_EVENT_WEBHOOK_REGISTRATION_FAILED: Final[str] = "webhook_registration_failed"
"""Log event when webhook registration fails."""
LOG_EVENT_WEBHOOK_UPDATE_FAILED: Final[str] = "webhook_update_failed"
"""Log event when webhook update fails."""
LOG_EVENT_WEBHOOK_DELETE_FAILED: Final[str] = "webhook_delete_failed"
"""Log event when webhook deletion fails."""
LOG_EVENT_INVALID_WEBHOOK_ID: Final[str] = "invalid_webhook_id"
"""Log event when webhook ID is invalid."""

View File

@@ -485,6 +485,23 @@ class Settings(TriggerSettings):
Field(default=120.0, ge=10.0, le=600.0, description="Timeout for Ollama requests"),
]
# OpenTelemetry settings
otel_endpoint: Annotated[
str | None,
Field(default=None, description="OTLP endpoint for telemetry export"),
]
otel_insecure: Annotated[
bool | None,
Field(
default=None,
description="Use insecure (non-TLS) connection. If None, inferred from endpoint scheme",
),
]
otel_service_name: Annotated[
str,
Field(default="noteflow", description="Service name for OpenTelemetry resource"),
]
@property
def database_url_str(self) -> str:
"""Return database URL as string."""

View File

@@ -23,6 +23,8 @@ from .events import (
WebhookDelivery,
WebhookEventType,
WebhookPayload,
WebhookPayloadDict,
payload_to_dict,
)
__all__ = [
@@ -48,4 +50,7 @@ __all__ = [
"WebhookDelivery",
"WebhookEventType",
"WebhookPayload",
"WebhookPayloadDict",
# Helpers
"payload_to_dict",
]

View File

@@ -6,10 +6,10 @@ infrastructure/persistence/models/integrations/webhook.py for seamless conversio
from __future__ import annotations
from dataclasses import dataclass, field
from dataclasses import asdict, dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any
from typing import TYPE_CHECKING
from uuid import UUID, uuid4
from noteflow.domain.utils.time import utc_now
@@ -18,6 +18,31 @@ from noteflow.domain.webhooks.constants import (
DEFAULT_WEBHOOK_TIMEOUT_MS,
)
# Type alias for JSON-serializable webhook payload values
# Webhook payloads use flat structures with primitive types
type WebhookPayloadValue = str | int | float | bool | None
type WebhookPayloadDict = dict[str, WebhookPayloadValue]
if TYPE_CHECKING:
from typing import TypeVar
_PayloadT = TypeVar("_PayloadT", bound="WebhookPayload")
def payload_to_dict(payload: WebhookPayload) -> WebhookPayloadDict:
"""Convert webhook payload dataclass to typed dictionary.
Uses dataclasses.asdict() for conversion, filtering out None values
to keep payloads compact.
Args:
payload: Any WebhookPayload subclass instance.
Returns:
Dictionary with non-None field values.
"""
return {k: v for k, v in asdict(payload).items() if v is not None}
class WebhookEventType(Enum):
"""Types of webhook trigger events."""
@@ -134,7 +159,7 @@ class WebhookDelivery:
id: UUID
webhook_id: UUID
event_type: WebhookEventType
payload: dict[str, Any]
payload: WebhookPayloadDict
status_code: int | None
response_body: str | None
error_message: str | None
@@ -147,7 +172,7 @@ class WebhookDelivery:
cls,
webhook_id: UUID,
event_type: WebhookEventType,
payload: dict[str, Any],
payload: WebhookPayloadDict,
*,
status_code: int | None = None,
response_body: str | None = None,
@@ -197,6 +222,8 @@ class WebhookDelivery:
class WebhookPayload:
"""Base webhook event payload.
Use payload_to_dict() helper for JSON serialization.
Attributes:
event: Event type identifier string.
timestamp: ISO 8601 formatted event timestamp.
@@ -207,18 +234,6 @@ class WebhookPayload:
timestamp: str
meeting_id: str
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for JSON serialization.
Returns:
Dictionary representation of the payload.
"""
return {
"event": self.event,
"timestamp": self.timestamp,
"meeting_id": self.meeting_id,
}
@dataclass(frozen=True, slots=True)
class MeetingCompletedPayload(WebhookPayload):
@@ -236,21 +251,6 @@ class MeetingCompletedPayload(WebhookPayload):
segment_count: int
has_summary: bool
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for JSON serialization.
Returns:
Dictionary representation including meeting details.
"""
base = WebhookPayload.to_dict(self)
return {
**base,
"title": self.title,
"duration_seconds": self.duration_seconds,
"segment_count": self.segment_count,
"has_summary": self.has_summary,
}
@dataclass(frozen=True, slots=True)
class SummaryGeneratedPayload(WebhookPayload):
@@ -268,21 +268,6 @@ class SummaryGeneratedPayload(WebhookPayload):
key_points_count: int
action_items_count: int
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for JSON serialization.
Returns:
Dictionary representation including summary details.
"""
base = WebhookPayload.to_dict(self)
return {
**base,
"title": self.title,
"executive_summary": self.executive_summary,
"key_points_count": self.key_points_count,
"action_items_count": self.action_items_count,
}
@dataclass(frozen=True, slots=True)
class RecordingPayload(WebhookPayload):
@@ -295,14 +280,3 @@ class RecordingPayload(WebhookPayload):
title: str
duration_seconds: float | None = None
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for JSON serialization.
Returns:
Dictionary representation including recording details.
"""
result = {**WebhookPayload.to_dict(self), "title": self.title}
if self.duration_seconds is not None:
result["duration_seconds"] = self.duration_seconds
return result
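
Illustrative usage of the new module-level helper that replaces the removed per-class `to_dict()` methods; the field values below are invented for the example:

```python
from noteflow.domain.webhooks.events import RecordingPayload, payload_to_dict

payload = RecordingPayload(
    event="recording.stopped",
    timestamp="2025-12-31T15:23:57+00:00",
    meeting_id="00000000-0000-0000-0000-000000000000",  # placeholder UUID string
    title="Weekly sync",
    duration_seconds=None,
)

# dataclasses.asdict() plus None-filtering: duration_seconds is dropped entirely.
assert payload_to_dict(payload) == {
    "event": "recording.stopped",
    "timestamp": "2025-12-31T15:23:57+00:00",
    "meeting_id": "00000000-0000-0000-0000-000000000000",
    "title": "Weekly sync",
}
```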

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
import grpc
@@ -13,11 +12,12 @@ from noteflow.grpc._client_mixins.converters import (
)
from noteflow.grpc._types import AnnotationInfo
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class AnnotationClientMixin:

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
import grpc
@@ -10,11 +9,12 @@ import grpc
from noteflow.grpc._client_mixins.converters import job_status_to_str
from noteflow.grpc._types import DiarizationResult, RenameSpeakerResult
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class DiarizationClientMixin:

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
import grpc
@@ -10,11 +9,12 @@ import grpc
from noteflow.grpc._client_mixins.converters import export_format_to_proto
from noteflow.grpc._types import ExportResult
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class ExportClientMixin:

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
import grpc
@@ -10,11 +9,12 @@ import grpc
from noteflow.grpc._client_mixins.converters import proto_to_meeting_info
from noteflow.grpc._types import MeetingInfo, TranscriptSegment
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class MeetingClientMixin:

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
import queue
import threading
import time
@@ -15,6 +14,7 @@ from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.grpc._config import STREAMING_CONFIG
from noteflow.grpc._types import ConnectionCallback, TranscriptCallback, TranscriptSegment
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
import numpy as np
@@ -22,7 +22,7 @@ if TYPE_CHECKING:
from noteflow.grpc._client_mixins.protocols import ClientHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class StreamingClientMixin:

View File

@@ -6,13 +6,14 @@ These are pure functions that operate on audio data without state.
from __future__ import annotations
import logging
import struct
import numpy as np
from numpy.typing import NDArray
logger = logging.getLogger(__name__)
from noteflow.infrastructure.logging import get_logger
logger = get_logger(__name__)
def resample_audio(

View File

@@ -7,8 +7,14 @@ from uuid import uuid4
import grpc.aio
from noteflow.config.constants import (
LOG_EVENT_ANNOTATION_NOT_FOUND,
LOG_EVENT_DATABASE_REQUIRED_FOR_ANNOTATIONS,
LOG_EVENT_INVALID_ANNOTATION_ID,
)
from noteflow.domain.entities import Annotation
from noteflow.domain.value_objects import AnnotationId
from noteflow.infrastructure.logging import get_logger
from ..proto import noteflow_pb2
from .converters import (
@@ -22,6 +28,8 @@ from .errors import abort_database_required, abort_invalid_argument, abort_not_f
if TYPE_CHECKING:
from .protocols import ServicerHost
logger = get_logger(__name__)
# Entity type names for error messages
_ENTITY_ANNOTATION = "Annotation"
_ENTITY_ANNOTATIONS = "Annotations"
@@ -42,6 +50,10 @@ class AnnotationMixin:
"""Add an annotation to a meeting."""
async with self._create_repository_provider() as repo:
if not repo.supports_annotations:
logger.error(
LOG_EVENT_DATABASE_REQUIRED_FOR_ANNOTATIONS,
meeting_id=request.meeting_id,
)
await abort_database_required(context, _ENTITY_ANNOTATIONS)
meeting_id = await parse_meeting_id_or_abort(request.meeting_id, context)
@@ -58,6 +70,14 @@ class AnnotationMixin:
saved = await repo.annotations.add(annotation)
await repo.commit()
logger.info(
"annotation_added",
annotation_id=str(saved.id),
meeting_id=str(meeting_id),
annotation_type=annotation_type.value,
start_time=saved.start_time,
end_time=saved.end_time,
)
return annotation_to_proto(saved)
async def GetAnnotation(
@@ -68,16 +88,34 @@ class AnnotationMixin:
"""Get an annotation by ID."""
async with self._create_repository_provider() as repo:
if not repo.supports_annotations:
logger.error(
LOG_EVENT_DATABASE_REQUIRED_FOR_ANNOTATIONS,
annotation_id=request.annotation_id,
)
await abort_database_required(context, _ENTITY_ANNOTATIONS)
try:
annotation_id = parse_annotation_id(request.annotation_id)
except ValueError:
logger.error(
LOG_EVENT_INVALID_ANNOTATION_ID,
annotation_id=request.annotation_id,
)
await abort_invalid_argument(context, "Invalid annotation_id")
annotation = await repo.annotations.get(annotation_id)
if annotation is None:
logger.error(
LOG_EVENT_ANNOTATION_NOT_FOUND,
annotation_id=request.annotation_id,
)
await abort_not_found(context, _ENTITY_ANNOTATION, request.annotation_id)
logger.debug(
"annotation_retrieved",
annotation_id=str(annotation_id),
meeting_id=str(annotation.meeting_id),
annotation_type=annotation.annotation_type.value,
)
return annotation_to_proto(annotation)
async def ListAnnotations(
@@ -88,11 +126,16 @@ class AnnotationMixin:
"""List annotations for a meeting."""
async with self._create_repository_provider() as repo:
if not repo.supports_annotations:
logger.error(
LOG_EVENT_DATABASE_REQUIRED_FOR_ANNOTATIONS,
meeting_id=request.meeting_id,
)
await abort_database_required(context, _ENTITY_ANNOTATIONS)
meeting_id = await parse_meeting_id_or_abort(request.meeting_id, context)
# Check if time range filter is specified
if request.start_time > 0 or request.end_time > 0:
has_time_filter = request.start_time > 0 or request.end_time > 0
if has_time_filter:
annotations = await repo.annotations.get_by_time_range(
meeting_id,
request.start_time,
@@ -101,6 +144,14 @@ class AnnotationMixin:
else:
annotations = await repo.annotations.get_by_meeting(meeting_id)
logger.debug(
"annotations_listed",
meeting_id=str(meeting_id),
count=len(annotations),
has_time_filter=has_time_filter,
start_time=request.start_time if has_time_filter else None,
end_time=request.end_time if has_time_filter else None,
)
return noteflow_pb2.ListAnnotationsResponse(
annotations=[annotation_to_proto(a) for a in annotations]
)
@@ -113,15 +164,27 @@ class AnnotationMixin:
"""Update an existing annotation."""
async with self._create_repository_provider() as repo:
if not repo.supports_annotations:
logger.error(
LOG_EVENT_DATABASE_REQUIRED_FOR_ANNOTATIONS,
annotation_id=request.annotation_id,
)
await abort_database_required(context, _ENTITY_ANNOTATIONS)
try:
annotation_id = parse_annotation_id(request.annotation_id)
except ValueError:
logger.error(
LOG_EVENT_INVALID_ANNOTATION_ID,
annotation_id=request.annotation_id,
)
await abort_invalid_argument(context, "Invalid annotation_id")
annotation = await repo.annotations.get(annotation_id)
if annotation is None:
logger.error(
LOG_EVENT_ANNOTATION_NOT_FOUND,
annotation_id=request.annotation_id,
)
await abort_not_found(context, _ENTITY_ANNOTATION, request.annotation_id)
# Update fields if provided
@@ -138,6 +201,12 @@ class AnnotationMixin:
updated = await repo.annotations.update(annotation)
await repo.commit()
logger.info(
"annotation_updated",
annotation_id=str(annotation_id),
meeting_id=str(updated.meeting_id),
annotation_type=updated.annotation_type.value,
)
return annotation_to_proto(updated)
async def DeleteAnnotation(
@@ -148,15 +217,31 @@ class AnnotationMixin:
"""Delete an annotation."""
async with self._create_repository_provider() as repo:
if not repo.supports_annotations:
logger.error(
LOG_EVENT_DATABASE_REQUIRED_FOR_ANNOTATIONS,
annotation_id=request.annotation_id,
)
await abort_database_required(context, _ENTITY_ANNOTATIONS)
try:
annotation_id = parse_annotation_id(request.annotation_id)
except ValueError:
logger.error(
LOG_EVENT_INVALID_ANNOTATION_ID,
annotation_id=request.annotation_id,
)
await abort_invalid_argument(context, "Invalid annotation_id")
success = await repo.annotations.delete(annotation_id)
if success:
await repo.commit()
logger.info(
"annotation_deleted",
annotation_id=str(annotation_id),
)
return noteflow_pb2.DeleteAnnotationResponse(success=True)
logger.error(
LOG_EVENT_ANNOTATION_NOT_FOUND,
annotation_id=request.annotation_id,
)
await abort_not_found(context, _ENTITY_ANNOTATION, request.annotation_id)

View File

@@ -9,10 +9,13 @@ import grpc.aio
from noteflow.application.services.calendar_service import CalendarServiceError
from noteflow.domain.entities.integration import IntegrationStatus
from noteflow.domain.value_objects import OAuthProvider
from noteflow.infrastructure.logging import get_logger
from ..proto import noteflow_pb2
from .errors import abort_internal, abort_invalid_argument, abort_unavailable
logger = get_logger(__name__)
_ERR_CALENDAR_NOT_ENABLED = "Calendar integration not enabled"
if TYPE_CHECKING:
@@ -50,12 +53,20 @@ class CalendarMixin:
) -> noteflow_pb2.ListCalendarEventsResponse:
"""List upcoming calendar events from connected providers."""
if self._calendar_service is None:
logger.warning("calendar_list_events_unavailable", reason="service_not_enabled")
await abort_unavailable(context, _ERR_CALENDAR_NOT_ENABLED)
provider = request.provider if request.provider else None
hours_ahead = request.hours_ahead if request.hours_ahead > 0 else None
limit = request.limit if request.limit > 0 else None
logger.debug(
"calendar_list_events_request",
provider=provider,
hours_ahead=hours_ahead,
limit=limit,
)
try:
events = await self._calendar_service.list_calendar_events(
provider=provider,
@@ -63,6 +74,7 @@ class CalendarMixin:
limit=limit,
)
except CalendarServiceError as e:
logger.error("calendar_list_events_failed", error=str(e), provider=provider)
await abort_internal(context, str(e))
proto_events = [
@@ -80,6 +92,12 @@ class CalendarMixin:
for event in events
]
logger.info(
"calendar_list_events_success",
provider=provider,
event_count=len(proto_events),
)
return noteflow_pb2.ListCalendarEventsResponse(
events=proto_events,
total_count=len(proto_events),
@@ -92,21 +110,38 @@ class CalendarMixin:
) -> noteflow_pb2.GetCalendarProvidersResponse:
"""Get available calendar providers with authentication status."""
if self._calendar_service is None:
logger.warning("calendar_providers_unavailable", reason="service_not_enabled")
await abort_unavailable(context, _ERR_CALENDAR_NOT_ENABLED)
logger.debug("calendar_get_providers_request")
providers = []
for provider_name, display_name in [
(OAuthProvider.GOOGLE.value, "Google Calendar"),
(OAuthProvider.OUTLOOK.value, "Microsoft Outlook"),
]:
status = await self._calendar_service.get_connection_status(provider_name)
is_authenticated = status.status == IntegrationStatus.CONNECTED.value
providers.append(
noteflow_pb2.CalendarProvider(
name=provider_name,
is_authenticated=status.status == IntegrationStatus.CONNECTED.value,
is_authenticated=is_authenticated,
display_name=display_name,
)
)
logger.debug(
"calendar_provider_status",
provider=provider_name,
is_authenticated=is_authenticated,
status=status.status,
)
authenticated_count = sum(1 for p in providers if p.is_authenticated)
logger.info(
"calendar_get_providers_success",
total_providers=len(providers),
authenticated_count=authenticated_count,
)
return noteflow_pb2.GetCalendarProvidersResponse(providers=providers)
@@ -117,16 +152,34 @@ class CalendarMixin:
) -> noteflow_pb2.InitiateOAuthResponse:
"""Start OAuth flow for a calendar provider."""
if self._calendar_service is None:
logger.warning("oauth_initiate_unavailable", reason="service_not_enabled")
await abort_unavailable(context, _ERR_CALENDAR_NOT_ENABLED)
logger.debug(
"oauth_initiate_request",
provider=request.provider,
has_redirect_uri=bool(request.redirect_uri),
)
try:
auth_url, state = await self._calendar_service.initiate_oauth(
provider=request.provider,
redirect_uri=request.redirect_uri if request.redirect_uri else None,
)
except CalendarServiceError as e:
logger.error(
"oauth_initiate_failed",
provider=request.provider,
error=str(e),
)
await abort_invalid_argument(context, str(e))
logger.info(
"oauth_initiate_success",
provider=request.provider,
state=state,
)
return noteflow_pb2.InitiateOAuthResponse(
auth_url=auth_url,
state=state,
@@ -139,8 +192,15 @@ class CalendarMixin:
) -> noteflow_pb2.CompleteOAuthResponse:
"""Complete OAuth flow with authorization code."""
if self._calendar_service is None:
logger.warning("oauth_complete_unavailable", reason="service_not_enabled")
await abort_unavailable(context, _ERR_CALENDAR_NOT_ENABLED)
logger.debug(
"oauth_complete_request",
provider=request.provider,
state=request.state,
)
try:
success = await self._calendar_service.complete_oauth(
provider=request.provider,
@@ -148,6 +208,11 @@ class CalendarMixin:
state=request.state,
)
except CalendarServiceError as e:
logger.warning(
"oauth_complete_failed",
provider=request.provider,
error=str(e),
)
return noteflow_pb2.CompleteOAuthResponse(
success=False,
error_message=str(e),
@@ -156,6 +221,12 @@ class CalendarMixin:
# Get the provider email after successful connection
status = await self._calendar_service.get_connection_status(request.provider)
logger.info(
"oauth_complete_success",
provider=request.provider,
email=status.email,
)
return noteflow_pb2.CompleteOAuthResponse(
success=success,
provider_email=status.email or "",
@@ -168,10 +239,25 @@ class CalendarMixin:
) -> noteflow_pb2.GetOAuthConnectionStatusResponse:
"""Get OAuth connection status for a provider."""
if self._calendar_service is None:
logger.warning("oauth_status_unavailable", reason="service_not_enabled")
await abort_unavailable(context, _ERR_CALENDAR_NOT_ENABLED)
logger.debug(
"oauth_status_request",
provider=request.provider,
integration_type=request.integration_type or "calendar",
)
info = await self._calendar_service.get_connection_status(request.provider)
logger.info(
"oauth_status_retrieved",
provider=request.provider,
status=info.status,
has_email=bool(info.email),
has_error=bool(info.error_message),
)
return noteflow_pb2.GetOAuthConnectionStatusResponse(
connection=_build_oauth_connection(info, request.integration_type or "calendar")
)
@@ -183,8 +269,16 @@ class CalendarMixin:
) -> noteflow_pb2.DisconnectOAuthResponse:
"""Disconnect OAuth integration and revoke tokens."""
if self._calendar_service is None:
logger.warning("oauth_disconnect_unavailable", reason="service_not_enabled")
await abort_unavailable(context, _ERR_CALENDAR_NOT_ENABLED)
logger.debug("oauth_disconnect_request", provider=request.provider)
success = await self._calendar_service.disconnect(request.provider)
if success:
logger.info("oauth_disconnect_success", provider=request.provider)
else:
logger.warning("oauth_disconnect_failed", provider=request.provider)
return noteflow_pb2.DisconnectOAuthResponse(success=success)

View File

@@ -3,15 +3,19 @@
from __future__ import annotations
import time
from datetime import UTC, datetime
from typing import TYPE_CHECKING
from uuid import UUID
from google.protobuf.timestamp_pb2 import Timestamp
from noteflow.application.services.export_service import ExportFormat
from noteflow.domain.entities import Annotation, Meeting, Segment, Summary, WordTiming
from noteflow.domain.value_objects import AnnotationId, AnnotationType, MeetingId
from noteflow.infrastructure.converters import AsrConverter
from ..proto import noteflow_pb2
from .errors import _AbortableContext
if TYPE_CHECKING:
from noteflow.infrastructure.asr.dto import AsrResult
@@ -38,7 +42,7 @@ def parse_meeting_id(meeting_id_str: str) -> MeetingId:
async def parse_meeting_id_or_abort(
meeting_id_str: str,
context: object,
context: _AbortableContext,
) -> MeetingId:
"""Parse meeting ID or abort with INVALID_ARGUMENT.
@@ -46,7 +50,7 @@ async def parse_meeting_id_or_abort(
Args:
meeting_id_str: Meeting ID as string (UUID format).
context: gRPC servicer context.
context: gRPC servicer context with abort capability.
Returns:
MeetingId value object.
@@ -317,3 +321,85 @@ def proto_to_export_format(proto_format: int) -> ExportFormat:
if proto_format == noteflow_pb2.EXPORT_FORMAT_PDF:
return ExportFormat.PDF
return ExportFormat.MARKDOWN # Default to Markdown
# -----------------------------------------------------------------------------
# Timestamp Conversion Helpers
# -----------------------------------------------------------------------------
def datetime_to_proto_timestamp(dt: datetime) -> Timestamp:
"""Convert datetime to protobuf Timestamp.
Args:
dt: Datetime to convert (should be timezone-aware).
Returns:
Protobuf Timestamp message.
"""
ts = Timestamp()
ts.FromDatetime(dt)
return ts
def proto_timestamp_to_datetime(ts: Timestamp) -> datetime:
"""Convert protobuf Timestamp to datetime.
Args:
ts: Protobuf Timestamp message.
Returns:
Timezone-aware datetime (UTC).
"""
return ts.ToDatetime().replace(tzinfo=UTC)
def epoch_seconds_to_datetime(seconds: float) -> datetime:
"""Convert Unix epoch seconds to datetime.
Args:
seconds: Unix epoch seconds (float for sub-second precision).
Returns:
Timezone-aware datetime (UTC).
"""
return datetime.fromtimestamp(seconds, tz=UTC)
def datetime_to_epoch_seconds(dt: datetime) -> float:
"""Convert datetime to Unix epoch seconds.
Args:
dt: Datetime to convert.
Returns:
Unix epoch seconds as float.
"""
return dt.timestamp()
def iso_string_to_datetime(iso_str: str) -> datetime:
"""Parse ISO 8601 string to datetime.
Args:
iso_str: ISO 8601 formatted string.
Returns:
Timezone-aware datetime (UTC if no timezone in string).
"""
dt = datetime.fromisoformat(iso_str.replace("Z", "+00:00"))
if dt.tzinfo is None:
dt = dt.replace(tzinfo=UTC)
return dt
def datetime_to_iso_string(dt: datetime) -> str:
"""Format datetime as ISO 8601 string.
Args:
dt: Datetime to format.
Returns:
ISO 8601 formatted string with timezone.
"""
return dt.isoformat()
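
A quick round-trip through the new helpers (illustrative, assuming the functions defined above are in scope; the values are arbitrary):

```python
from datetime import UTC, datetime

dt = datetime(2025, 12, 31, 15, 23, 57, tzinfo=UTC)

# Protobuf Timestamp round-trip preserves the instant and returns a UTC-aware datetime.
assert proto_timestamp_to_datetime(datetime_to_proto_timestamp(dt)) == dt

# Epoch-seconds round-trip likewise stays in UTC.
assert epoch_seconds_to_datetime(datetime_to_epoch_seconds(dt)) == dt

# "Z" suffixes are normalised to "+00:00" before parsing; naive strings default to UTC.
assert iso_string_to_datetime("2025-12-31T15:23:57Z") == dt
assert datetime_to_iso_string(dt) == "2025-12-31T15:23:57+00:00"
```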

View File

@@ -3,7 +3,6 @@
from __future__ import annotations
import asyncio
import logging
from typing import TYPE_CHECKING
from uuid import UUID, uuid4
@@ -11,6 +10,7 @@ import grpc
from noteflow.domain.utils import utc_now
from noteflow.domain.value_objects import MeetingState
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.repositories import DiarizationJob
from ...proto import noteflow_pb2
@@ -21,7 +21,7 @@ from ._types import DIARIZATION_TIMEOUT_SECONDS, GrpcContext
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
def create_diarization_error_response(

View File

@@ -3,9 +3,10 @@
from __future__ import annotations
import asyncio
import logging
from typing import TYPE_CHECKING
from noteflow.infrastructure.logging import get_logger
from ...proto import noteflow_pb2
from ._jobs import JobsMixin
from ._refinement import RefinementMixin
@@ -16,7 +17,7 @@ from ._types import GrpcContext
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class DiarizationMixin(

View File

@@ -2,13 +2,13 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
import numpy as np
from noteflow.infrastructure.audio.reader import MeetingAudioReader
from noteflow.infrastructure.diarization import SpeakerTurn
from noteflow.infrastructure.logging import get_logger
from ..converters import parse_meeting_id_or_none
from ._speaker import apply_speaker_to_segment
@@ -16,7 +16,7 @@ from ._speaker import apply_speaker_to_segment
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class RefinementMixin:

View File

@@ -85,10 +85,10 @@ class SpeakerMixin:
"""
if not request.old_speaker_id or not request.new_speaker_name:
await abort_invalid_argument(
context, "old_speaker_id and new_speaker_name are required" # type: ignore[arg-type]
context, "old_speaker_id and new_speaker_name are required"
)
meeting_id = await parse_meeting_id_or_abort(request.meeting_id, context) # type: ignore[arg-type]
meeting_id = await parse_meeting_id_or_abort(request.meeting_id, context)
updated_count = 0

View File

@@ -2,10 +2,10 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from noteflow.domain.utils import utc_now
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.repositories import DiarizationJob
from ...proto import noteflow_pb2
@@ -15,7 +15,7 @@ from ._types import DIARIZATION_TIMEOUT_SECONDS
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class JobStatusMixin:

View File

@@ -3,19 +3,19 @@
from __future__ import annotations
import asyncio
import logging
from functools import partial
from typing import TYPE_CHECKING
import numpy as np
from numpy.typing import NDArray
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.repositories import StreamingTurn
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class StreamingDiarizationMixin:

View File

@@ -4,13 +4,13 @@ from __future__ import annotations
import asyncio
import contextlib
import logging
from datetime import timedelta
from typing import TYPE_CHECKING, Protocol
import grpc
from noteflow.domain.utils.time import utc_now
from noteflow.infrastructure.logging import get_logger
from ..proto import noteflow_pb2
from .errors import ERR_CANCELLED_BY_USER, abort_not_found
@@ -18,7 +18,7 @@ from .errors import ERR_CANCELLED_BY_USER, abort_not_found
if TYPE_CHECKING:
from .protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# Diarization job TTL default (1 hour in seconds)

View File

@@ -2,12 +2,13 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from uuid import UUID
import grpc.aio
from noteflow.infrastructure.logging import get_logger
from ..proto import noteflow_pb2
from .converters import parse_meeting_id_or_abort
from .errors import (
@@ -25,7 +26,7 @@ if TYPE_CHECKING:
from .protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class EntitiesMixin:

View File

@@ -9,6 +9,7 @@ import grpc.aio
from noteflow.application.services.export_service import ExportFormat, ExportService
from noteflow.config.constants import EXPORT_EXT_HTML, EXPORT_EXT_PDF, EXPORT_FORMAT_HTML
from noteflow.infrastructure.logging import get_logger
from ..proto import noteflow_pb2
from .converters import parse_meeting_id_or_abort, proto_to_export_format
@@ -17,6 +18,8 @@ from .errors import ENTITY_MEETING, abort_not_found
if TYPE_CHECKING:
from .protocols import ServicerHost
logger = get_logger(__name__)
# Format metadata lookup
_FORMAT_METADATA: dict[ExportFormat, tuple[str, str]] = {
ExportFormat.MARKDOWN: ("Markdown", ".md"),
@@ -40,6 +43,13 @@ class ExportMixin:
"""Export meeting transcript to specified format."""
# Map proto format to ExportFormat
fmt = proto_to_export_format(request.format)
fmt_name, fmt_ext = _FORMAT_METADATA.get(fmt, ("Unknown", ""))
logger.info(
"Export requested: meeting_id=%s format=%s",
request.meeting_id,
fmt_name,
)
# Use unified repository provider - works with both DB and memory
meeting_id = await parse_meeting_id_or_abort(request.meeting_id, context)
@@ -55,16 +65,28 @@ class ExportMixin:
# PDF returns bytes which must be base64-encoded for gRPC string transport
if isinstance(result, bytes):
content = base64.b64encode(result).decode("ascii")
content_size = len(result)
else:
content = result
content_size = len(content)
# Get format metadata
fmt_name, fmt_ext = _FORMAT_METADATA.get(fmt, ("Unknown", ""))
logger.info(
"Export completed: meeting_id=%s format=%s bytes=%d",
request.meeting_id,
fmt_name,
content_size,
)
return noteflow_pb2.ExportTranscriptResponse(
content=content,
format_name=fmt_name,
file_extension=fmt_ext,
)
except ValueError:
except ValueError as exc:
logger.error(
"Export failed: meeting_id=%s format=%s error=%s",
request.meeting_id,
fmt_name,
str(exc),
)
await abort_not_found(context, ENTITY_MEETING, request.meeting_id)

View File

@@ -3,7 +3,6 @@
from __future__ import annotations
import asyncio
import logging
from typing import TYPE_CHECKING
from uuid import UUID
@@ -16,7 +15,7 @@ from noteflow.config.constants import (
from noteflow.domain.entities import Meeting
from noteflow.domain.ports.unit_of_work import UnitOfWork
from noteflow.domain.value_objects import MeetingState
from noteflow.infrastructure.logging import get_workspace_id
from noteflow.infrastructure.logging import get_logger, get_workspace_id
from ..proto import noteflow_pb2
from .converters import meeting_to_proto, parse_meeting_id_or_abort
@@ -25,7 +24,7 @@ from .errors import ENTITY_MEETING, abort_invalid_argument, abort_not_found
if TYPE_CHECKING:
from .protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# Timeout for waiting for stream to exit gracefully
STOP_WAIT_TIMEOUT_SECONDS: float = 2.0
@@ -91,6 +90,10 @@ class MeetingMixin:
try:
project_id = UUID(request.project_id)
except ValueError:
logger.warning(
"CreateMeeting: invalid project_id format",
project_id=request.project_id,
)
await abort_invalid_argument(context, f"{ERROR_INVALID_PROJECT_ID_PREFIX}{request.project_id}")
async with self._create_repository_provider() as repo:
@@ -104,6 +107,12 @@ class MeetingMixin:
)
saved = await repo.meetings.create(meeting)
await repo.commit()
logger.info(
"Meeting created",
meeting_id=str(saved.id),
title=saved.title or DEFAULT_MEETING_TITLE,
project_id=str(project_id) if project_id else None,
)
return meeting_to_proto(saved)
async def StopMeeting(
@@ -117,6 +126,7 @@ class MeetingMixin:
and waits briefly for it to exit before closing resources.
"""
meeting_id = request.meeting_id
logger.info("StopMeeting requested", meeting_id=meeting_id)
# Signal stop to active stream and wait for graceful exit
if meeting_id in self._active_streams:
@@ -138,50 +148,49 @@ class MeetingMixin:
async with self._create_repository_provider() as repo:
meeting = await repo.meetings.get(parsed_meeting_id)
if meeting is None:
logger.warning("StopMeeting: meeting not found", meeting_id=meeting_id)
await abort_not_found(context, ENTITY_MEETING, meeting_id)
# Idempotency guard: return success if already stopped/stopping/completed
if meeting.state in (
MeetingState.STOPPED,
MeetingState.STOPPING,
MeetingState.COMPLETED,
):
previous_state = meeting.state.value
# Idempotency: return success if already stopped/stopping/completed
terminal_states = (MeetingState.STOPPED, MeetingState.STOPPING, MeetingState.COMPLETED)
if meeting.state in terminal_states:
logger.debug("StopMeeting: already terminal", meeting_id=meeting_id, state=meeting.state.value)
return meeting_to_proto(meeting)
try:
# Graceful shutdown: RECORDING -> STOPPING -> STOPPED
meeting.begin_stopping()
meeting.begin_stopping() # RECORDING -> STOPPING -> STOPPED
meeting.stop_recording()
except ValueError as e:
logger.error("StopMeeting: invalid transition", meeting_id=meeting_id, state=previous_state, error=str(e))
await abort_invalid_argument(context, str(e))
await repo.meetings.update(meeting)
# Clean up streaming diarization turns if DB supports it
if repo.supports_diarization_jobs:
await repo.diarization_jobs.clear_streaming_turns(meeting_id)
await repo.commit()
# Trigger webhooks (fire-and-forget)
if self._webhook_service is not None:
try:
await self._webhook_service.trigger_recording_stopped(
meeting_id=meeting_id,
title=meeting.title or DEFAULT_MEETING_TITLE,
duration_seconds=meeting.duration_seconds or 0.0,
)
# INTENTIONAL BROAD HANDLER: Fire-and-forget webhook
# - Webhook failures must never block StopMeeting RPC
except Exception:
logger.exception("Failed to trigger recording.stopped webhooks")
try:
await self._webhook_service.trigger_meeting_completed(meeting)
# INTENTIONAL BROAD HANDLER: Fire-and-forget webhook
# - Webhook failures must never block StopMeeting RPC
except Exception:
logger.exception("Failed to trigger meeting.completed webhooks")
logger.info("Meeting stopped", meeting_id=meeting_id, from_state=previous_state, to_state=meeting.state.value)
await self._fire_stop_webhooks(meeting)
return meeting_to_proto(meeting)
async def _fire_stop_webhooks(self: ServicerHost, meeting: Meeting) -> None:
"""Trigger webhooks for meeting stop (fire-and-forget)."""
if self._webhook_service is None:
return
try:
await self._webhook_service.trigger_recording_stopped(
meeting_id=str(meeting.id),
title=meeting.title or DEFAULT_MEETING_TITLE,
duration_seconds=meeting.duration_seconds or 0.0,
)
except Exception:
logger.exception("Failed to trigger recording.stopped webhooks")
try:
await self._webhook_service.trigger_meeting_completed(meeting)
except Exception:
logger.exception("Failed to trigger meeting.completed webhooks")
async def ListMeetings(
self: ServicerHost,
request: noteflow_pb2.ListMeetingsRequest,
@@ -211,6 +220,14 @@ class MeetingMixin:
sort_desc=sort_desc,
project_id=project_id,
)
logger.debug(
"ListMeetings returned",
count=len(meetings),
total=total,
limit=limit,
offset=offset,
project_id=str(project_id) if project_id else None,
)
return noteflow_pb2.ListMeetingsResponse(
meetings=[meeting_to_proto(m, include_segments=False) for m in meetings],
total_count=total,
@@ -222,10 +239,17 @@ class MeetingMixin:
context: grpc.aio.ServicerContext,
) -> noteflow_pb2.Meeting:
"""Get meeting details."""
logger.debug(
"GetMeeting requested",
meeting_id=request.meeting_id,
include_segments=request.include_segments,
include_summary=request.include_summary,
)
meeting_id = await parse_meeting_id_or_abort(request.meeting_id, context)
async with self._create_repository_provider() as repo:
meeting = await repo.meetings.get(meeting_id)
if meeting is None:
logger.warning("GetMeeting: meeting not found", meeting_id=request.meeting_id)
await abort_not_found(context, ENTITY_MEETING, request.meeting_id)
# Load segments if requested
if request.include_segments:
@@ -247,10 +271,13 @@ class MeetingMixin:
context: grpc.aio.ServicerContext,
) -> noteflow_pb2.DeleteMeetingResponse:
"""Delete a meeting."""
logger.info("DeleteMeeting requested", meeting_id=request.meeting_id)
meeting_id = await parse_meeting_id_or_abort(request.meeting_id, context)
async with self._create_repository_provider() as repo:
success = await repo.meetings.delete(meeting_id)
if success:
await repo.commit()
logger.info("Meeting deleted", meeting_id=request.meeting_id)
return noteflow_pb2.DeleteMeetingResponse(success=True)
logger.warning("DeleteMeeting: meeting not found", meeting_id=request.meeting_id)
await abort_not_found(context, ENTITY_MEETING, request.meeting_id)

View File

@@ -4,11 +4,12 @@ from __future__ import annotations
import hashlib
import json
import logging
from typing import TYPE_CHECKING
import grpc.aio
from noteflow.infrastructure.logging import get_logger
from ..proto import noteflow_pb2
from .errors import abort_database_required, abort_failed_precondition
@@ -16,7 +17,7 @@ if TYPE_CHECKING:
from .protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# Entity type names for error messages
_ENTITY_PREFERENCES = "Preferences"

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from uuid import UUID
@@ -10,6 +9,7 @@ import grpc.aio
from noteflow.config.constants import ERROR_INVALID_PROJECT_ID_PREFIX
from noteflow.domain.errors import CannotArchiveDefaultProjectError
from noteflow.infrastructure.logging import get_logger
from ...proto import noteflow_pb2
from ..errors import (
@@ -29,7 +29,7 @@ if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
async def _require_project_service(

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from collections.abc import AsyncIterator
from typing import TYPE_CHECKING
@@ -10,6 +9,7 @@ import numpy as np
from numpy.typing import NDArray
from noteflow.domain.entities import Segment
from noteflow.infrastructure.logging import get_logger
from ...proto import noteflow_pb2
from ..converters import (
@@ -21,7 +21,7 @@ from ..converters import (
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
async def process_audio_segment(

View File

@@ -2,13 +2,14 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
def cleanup_stream_resources(host: ServicerHost, meeting_id: str) -> None:

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from collections.abc import AsyncIterator
from typing import TYPE_CHECKING
@@ -10,6 +9,8 @@ import grpc.aio
import numpy as np
from numpy.typing import NDArray
from noteflow.infrastructure.logging import get_logger
from ...proto import noteflow_pb2
from .._audio_helpers import convert_audio_format
from ..errors import abort_failed_precondition, abort_invalid_argument
@@ -29,7 +30,7 @@ from ._types import StreamSessionInit
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class StreamingMixin:

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from collections.abc import AsyncIterator
from typing import TYPE_CHECKING
@@ -10,6 +9,8 @@ import grpc.aio
import numpy as np
from numpy.typing import NDArray
from noteflow.infrastructure.logging import get_logger
from ...proto import noteflow_pb2
from .._audio_helpers import convert_audio_format, decode_audio_chunk, validate_stream_format
from ..converters import create_vad_update
@@ -20,7 +21,7 @@ from ._partials import clear_partial_buffer, maybe_emit_partial
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
async def process_stream_chunk(

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
import grpc
@@ -10,6 +9,7 @@ import grpc.aio
from noteflow.config.constants import DEFAULT_MEETING_TITLE, ERROR_MSG_MEETING_PREFIX
from noteflow.infrastructure.diarization import SpeakerTurn
from noteflow.infrastructure.logging import get_logger
from ..converters import parse_meeting_id_or_none
from ..errors import abort_failed_precondition
@@ -18,7 +18,7 @@ from ._types import StreamSessionInit
if TYPE_CHECKING:
from ..protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class StreamSessionManager:
@@ -87,7 +87,7 @@ class StreamSessionManager:
return StreamSessionInit(
next_segment_id=0,
error_code=grpc.StatusCode.NOT_FOUND,
error_message=f"Meeting {meeting_id} not found",
error_message=f"{ERROR_MSG_MEETING_PREFIX}{meeting_id} not found",
)
dek, wrapped_dek, dek_updated = host._ensure_meeting_dek(meeting)

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
import grpc.aio
@@ -10,6 +9,7 @@ import grpc.aio
from noteflow.domain.entities import Segment, Summary
from noteflow.domain.summarization import ProviderUnavailableError
from noteflow.domain.value_objects import MeetingId
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.summarization._parsing import build_style_prompt
from ..proto import noteflow_pb2
@@ -21,7 +21,7 @@ if TYPE_CHECKING:
from .protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class SummarizationMixin:

View File

@@ -3,13 +3,13 @@
from __future__ import annotations
import asyncio
import logging
from typing import TYPE_CHECKING
from uuid import UUID
import grpc.aio
from noteflow.domain.entities import SyncRun
from noteflow.infrastructure.logging import get_logger
from ..proto import noteflow_pb2
from .errors import (
@@ -24,7 +24,7 @@ from .errors import (
if TYPE_CHECKING:
from .protocols import ServicerHost
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
_ERR_CALENDAR_NOT_ENABLED = "Calendar integration not enabled"
@@ -49,77 +49,64 @@ class SyncMixin:
request: noteflow_pb2.StartIntegrationSyncRequest,
context: grpc.aio.ServicerContext,
) -> noteflow_pb2.StartIntegrationSyncResponse:
"""Start a sync operation for an integration.
Creates a sync run record and kicks off the actual sync asynchronously.
"""
"""Start a sync operation for an integration."""
if self._calendar_service is None:
await abort_unavailable(context, _ERR_CALENDAR_NOT_ENABLED)
try:
integration_id = UUID(request.integration_id)
except ValueError:
await abort_invalid_argument(
context,
f"Invalid integration_id format: {request.integration_id}",
)
await abort_invalid_argument(context, f"Invalid integration_id format: {request.integration_id}")
return noteflow_pb2.StartIntegrationSyncResponse()
# Verify integration exists
async with self._create_repository_provider() as uow:
integration = await uow.integrations.get(integration_id)
# Fallback: if integration not found by ID, try looking up by provider name
# This handles cases where frontend uses local IDs that don't match backend
integration, integration_id = await self._resolve_integration(uow, integration_id, context, request)
if integration is None:
# Try to find connected calendar integration by provider (google/outlook)
from noteflow.domain.value_objects import OAuthProvider
for provider_name in [OAuthProvider.GOOGLE, OAuthProvider.OUTLOOK]:
candidate = await uow.integrations.get_by_provider(
provider=provider_name,
integration_type="calendar",
)
if candidate is not None and candidate.is_connected:
integration = candidate
integration_id = integration.id
break
if integration is None:
await abort_not_found(context, ENTITY_INTEGRATION, request.integration_id)
return noteflow_pb2.StartIntegrationSyncResponse()
return noteflow_pb2.StartIntegrationSyncResponse()
provider = integration.config.get("provider") if integration.config else None
if not provider:
await abort_failed_precondition(
context,
"Integration provider not configured",
)
await abort_failed_precondition(context, "Integration provider not configured")
return noteflow_pb2.StartIntegrationSyncResponse()
# Create sync run
sync_run = SyncRun.start(integration_id)
sync_run = await uow.integrations.create_sync_run(sync_run)
await uow.commit()
# Cache the sync run for quick status lookups
cache = self._ensure_sync_runs_cache()
cache[sync_run.id] = sync_run
# Fire off async sync task (store reference to prevent GC)
sync_task = asyncio.create_task(
asyncio.create_task(
self._perform_sync(integration_id, sync_run.id, str(provider)),
name=f"sync-{sync_run.id}",
)
# Add callback to clean up on completion
sync_task.add_done_callback(lambda _: None)
).add_done_callback(lambda _: None)
logger.info("Started sync run %s for integration %s", sync_run.id, integration_id)
return noteflow_pb2.StartIntegrationSyncResponse(sync_run_id=str(sync_run.id), status="running")
return noteflow_pb2.StartIntegrationSyncResponse(
sync_run_id=str(sync_run.id),
status="running",
)
async def _resolve_integration(
self: ServicerHost,
uow: object,
integration_id: UUID,
context: grpc.aio.ServicerContext,
request: noteflow_pb2.StartIntegrationSyncRequest,
) -> tuple[object | None, UUID]:
"""Resolve integration by ID with provider fallback.
Returns (integration, resolved_id) tuple. Returns (None, id) if not found after aborting.
"""
from noteflow.domain.value_objects import OAuthProvider
integration = await uow.integrations.get(integration_id)
if integration is not None:
return integration, integration_id
# Fallback: try connected calendar integrations by provider
for provider_name in [OAuthProvider.GOOGLE, OAuthProvider.OUTLOOK]:
candidate = await uow.integrations.get_by_provider(provider=provider_name, integration_type="calendar")
if candidate is not None and candidate.is_connected:
return candidate, candidate.id
await abort_not_found(context, ENTITY_INTEGRATION, request.integration_id)
return None, integration_id
async def _perform_sync(
self: ServicerHost,

View File

@@ -8,16 +8,26 @@ from uuid import UUID
import grpc.aio
from noteflow.config.constants import (
LOG_EVENT_INVALID_WEBHOOK_ID,
LOG_EVENT_WEBHOOK_DELETE_FAILED,
LOG_EVENT_WEBHOOK_REGISTRATION_FAILED,
LOG_EVENT_WEBHOOK_UPDATE_FAILED,
)
from noteflow.domain.errors import ErrorCode
from noteflow.domain.utils.time import utc_now
from noteflow.domain.webhooks.events import (
WebhookConfig,
WebhookDelivery,
WebhookEventType,
)
from noteflow.infrastructure.logging import get_logger
from ..proto import noteflow_pb2
from .errors import abort_database_required, abort_invalid_argument, abort_not_found
logger = get_logger(__name__)
if TYPE_CHECKING:
from .protocols import ServicerHost
@@ -85,28 +95,30 @@ class WebhooksMixin:
"""Register a new webhook configuration."""
# Validate URL
if not request.url or not request.url.startswith(("http://", "https://")):
await abort_invalid_argument(
context, "URL must start with http:// or https://"
)
logger.error(LOG_EVENT_WEBHOOK_REGISTRATION_FAILED, reason="invalid_url", url=request.url)
await abort_invalid_argument(context, "URL must start with http:// or https://")
# Validate events
if not request.events:
logger.error(LOG_EVENT_WEBHOOK_REGISTRATION_FAILED, reason="no_events", url=request.url)
await abort_invalid_argument(context, "At least one event type required")
try:
events = _parse_events(list(request.events))
except ValueError as exc:
logger.error(LOG_EVENT_WEBHOOK_REGISTRATION_FAILED, reason="invalid_event_type", url=request.url, error=str(exc))
await abort_invalid_argument(context, f"Invalid event type: {exc}")
try:
workspace_id = UUID(request.workspace_id)
except ValueError:
from noteflow.config.constants import ERROR_INVALID_WORKSPACE_ID_FORMAT
logger.error(LOG_EVENT_WEBHOOK_REGISTRATION_FAILED, reason="invalid_workspace_id", workspace_id=request.workspace_id)
await abort_invalid_argument(context, ERROR_INVALID_WORKSPACE_ID_FORMAT)
async with self._create_repository_provider() as uow:
if not uow.supports_webhooks:
logger.error(LOG_EVENT_WEBHOOK_REGISTRATION_FAILED, reason=ErrorCode.DATABASE_REQUIRED.code)
await abort_database_required(context, _ENTITY_WEBHOOKS)
config = WebhookConfig.create(
@@ -120,6 +132,7 @@ class WebhooksMixin:
)
saved = await uow.webhooks.create(config)
await uow.commit()
logger.info("webhook_registered", webhook_id=str(saved.id), workspace_id=str(workspace_id), url=request.url, name=saved.name)
return _webhook_config_to_proto(saved)
async def ListWebhooks(
@@ -130,6 +143,10 @@ class WebhooksMixin:
"""List registered webhooks."""
async with self._create_repository_provider() as uow:
if not uow.supports_webhooks:
logger.error(
"webhook_list_failed",
reason=ErrorCode.DATABASE_REQUIRED.code,
)
await abort_database_required(context, _ENTITY_WEBHOOKS)
if request.enabled_only:
@@ -137,6 +154,11 @@ class WebhooksMixin:
else:
webhooks = await uow.webhooks.get_all()
logger.debug(
"webhooks_listed",
count=len(webhooks),
enabled_only=request.enabled_only,
)
return noteflow_pb2.ListWebhooksResponse(
webhooks=[_webhook_config_to_proto(w) for w in webhooks],
total_count=len(webhooks),
@@ -151,14 +173,29 @@ class WebhooksMixin:
try:
webhook_id = _parse_webhook_id(request.webhook_id)
except ValueError:
logger.error(
LOG_EVENT_WEBHOOK_UPDATE_FAILED,
reason=LOG_EVENT_INVALID_WEBHOOK_ID,
webhook_id=request.webhook_id,
)
await abort_invalid_argument(context, _ERR_INVALID_WEBHOOK_ID)
async with self._create_repository_provider() as uow:
if not uow.supports_webhooks:
logger.error(
LOG_EVENT_WEBHOOK_UPDATE_FAILED,
reason=ErrorCode.DATABASE_REQUIRED.code,
webhook_id=str(webhook_id),
)
await abort_database_required(context, _ENTITY_WEBHOOKS)
config = await uow.webhooks.get_by_id(webhook_id)
if config is None:
logger.error(
LOG_EVENT_WEBHOOK_UPDATE_FAILED,
reason="not_found",
webhook_id=str(webhook_id),
)
await abort_not_found(context, _ENTITY_WEBHOOK, request.webhook_id)
# Build updates dict with proper typing
@@ -184,6 +221,12 @@ class WebhooksMixin:
updated = replace(config, **updates, updated_at=utc_now())
saved = await uow.webhooks.update(updated)
await uow.commit()
logger.info(
"webhook_updated",
webhook_id=str(webhook_id),
updated_fields=list(updates.keys()),
)
return _webhook_config_to_proto(saved)
async def DeleteWebhook(
@@ -195,14 +238,36 @@ class WebhooksMixin:
try:
webhook_id = _parse_webhook_id(request.webhook_id)
except ValueError:
logger.error(
LOG_EVENT_WEBHOOK_DELETE_FAILED,
reason=LOG_EVENT_INVALID_WEBHOOK_ID,
webhook_id=request.webhook_id,
)
await abort_invalid_argument(context, _ERR_INVALID_WEBHOOK_ID)
async with self._create_repository_provider() as uow:
if not uow.supports_webhooks:
logger.error(
LOG_EVENT_WEBHOOK_DELETE_FAILED,
reason=ErrorCode.DATABASE_REQUIRED.code,
webhook_id=str(webhook_id),
)
await abort_database_required(context, _ENTITY_WEBHOOKS)
deleted = await uow.webhooks.delete(webhook_id)
await uow.commit()
if deleted:
logger.info(
"webhook_deleted",
webhook_id=str(webhook_id),
)
else:
logger.error(
LOG_EVENT_WEBHOOK_DELETE_FAILED,
reason="not_found",
webhook_id=str(webhook_id),
)
return noteflow_pb2.DeleteWebhookResponse(success=deleted)
async def GetWebhookDeliveries(
@@ -214,15 +279,32 @@ class WebhooksMixin:
try:
webhook_id = _parse_webhook_id(request.webhook_id)
except ValueError:
logger.error(
"webhook_deliveries_query_failed",
reason=LOG_EVENT_INVALID_WEBHOOK_ID,
webhook_id=request.webhook_id,
)
await abort_invalid_argument(context, _ERR_INVALID_WEBHOOK_ID)
limit = min(request.limit or 50, 500)
async with self._create_repository_provider() as uow:
if not uow.supports_webhooks:
logger.error(
"webhook_deliveries_query_failed",
reason=ErrorCode.DATABASE_REQUIRED.code,
webhook_id=str(webhook_id),
)
await abort_database_required(context, _ENTITY_WEBHOOKS)
deliveries = await uow.webhooks.get_deliveries(webhook_id, limit=limit)
logger.debug(
"webhook_deliveries_queried",
webhook_id=str(webhook_id),
count=len(deliveries),
limit=limit,
)
return noteflow_pb2.GetWebhookDeliveriesResponse(
deliveries=[_webhook_delivery_to_proto(d) for d in deliveries],
total_count=len(deliveries),

View File

@@ -6,7 +6,6 @@ clean separation of concerns for server initialization.
from __future__ import annotations
import logging
import sys
from typing import TypedDict
@@ -30,6 +29,7 @@ from noteflow.config.settings import (
from noteflow.domain.entities.integration import IntegrationStatus
from noteflow.grpc._config import DiarizationConfig, GrpcServerConfig
from noteflow.infrastructure.diarization import DiarizationEngine
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.ner import NerEngine
from noteflow.infrastructure.persistence.database import (
create_engine_and_session_factory,
@@ -49,7 +49,7 @@ class DiarizationEngineKwargs(TypedDict, total=False):
min_speakers: int
max_speakers: int
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
async def _auto_enable_cloud_llm(

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
import queue
import threading
import time
@@ -16,6 +15,7 @@ from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.grpc._config import STREAMING_CONFIG
from noteflow.grpc.client import TranscriptSegment
from noteflow.grpc.proto import noteflow_pb2
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
import numpy as np
@@ -24,7 +24,7 @@ if TYPE_CHECKING:
from noteflow.grpc.client import ConnectionCallback, TranscriptCallback
from noteflow.grpc.proto import noteflow_pb2_grpc
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
@dataclass

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import logging
import queue
import threading
from typing import TYPE_CHECKING, Final
@@ -29,6 +28,7 @@ from noteflow.grpc._types import (
TranscriptSegment,
)
from noteflow.grpc.proto import noteflow_pb2, noteflow_pb2_grpc
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
import numpy as np
@@ -48,7 +48,7 @@ __all__ = [
"TranscriptSegment",
]
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
DEFAULT_SERVER: Final[str] = "localhost:50051"
CHUNK_TIMEOUT: Final[float] = 0.1 # Timeout for getting chunks from queue

View File

@@ -6,7 +6,6 @@ by extracting from metadata and setting context variables.
from __future__ import annotations
import logging
from collections.abc import Awaitable, Callable
from typing import TypeVar
@@ -15,12 +14,13 @@ from grpc import aio
from noteflow.infrastructure.logging import (
generate_request_id,
get_logger,
request_id_var,
user_id_var,
workspace_id_var,
)
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# Metadata keys for identity context
METADATA_REQUEST_ID = "x-request-id"
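For context, a hedged sketch of how a metadata-driven identity interceptor of this shape typically works with context variables: the metadata key matches the one above, while the interceptor class and the local `request_id_var` are simplified stand-ins for the project's own.

```python
# Illustrative sketch only: read an incoming request id from gRPC metadata
# and bind it to a contextvar so later log calls pick it up automatically.
import contextvars
import uuid

from grpc import aio

request_id_var: contextvars.ContextVar[str | None] = contextvars.ContextVar(
    "request_id", default=None
)
METADATA_REQUEST_ID = "x-request-id"


class RequestContextInterceptor(aio.ServerInterceptor):
    async def intercept_service(self, continuation, handler_call_details):
        metadata = dict(handler_call_details.invocation_metadata or ())
        # Fall back to a generated id when the client did not send one.
        request_id = metadata.get(METADATA_REQUEST_ID) or uuid.uuid4().hex
        request_id_var.set(request_id)
        return await continuation(handler_call_details)
```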

View File

@@ -4,7 +4,6 @@ from __future__ import annotations
import argparse
import asyncio
import logging
import signal
import time
from typing import TYPE_CHECKING
@@ -15,7 +14,7 @@ from pydantic import ValidationError
from noteflow.config.settings import get_feature_flags, get_settings
from noteflow.infrastructure.asr import FasterWhisperEngine
from noteflow.infrastructure.asr.engine import VALID_MODEL_SIZES
from noteflow.infrastructure.logging import LogBufferHandler
from noteflow.infrastructure.logging import LoggingConfig, configure_logging, get_logger
from noteflow.infrastructure.persistence.unit_of_work import SqlAlchemyUnitOfWork
from noteflow.infrastructure.summarization import create_summarization_service
@@ -49,7 +48,7 @@ if TYPE_CHECKING:
from noteflow.config.settings import Settings
from noteflow.infrastructure.diarization import DiarizationEngine
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class NoteFlowServer:
@@ -515,15 +514,9 @@ def main() -> None:
"""Entry point for NoteFlow gRPC server."""
args = _parse_args()
# Configure logging
log_level = logging.DEBUG if args.verbose else logging.INFO
logging.basicConfig(
level=log_level,
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
root_logger = logging.getLogger()
if not any(isinstance(handler, LogBufferHandler) for handler in root_logger.handlers):
root_logger.addHandler(LogBufferHandler(level=log_level))
# Configure centralized logging with structlog
log_level = "DEBUG" if args.verbose else "INFO"
configure_logging(LoggingConfig(level=log_level))
# Load settings from environment
try:

View File

@@ -4,7 +4,6 @@ from __future__ import annotations
import asyncio
import contextlib
import logging
import time
from pathlib import Path
from typing import TYPE_CHECKING, ClassVar, Final
@@ -20,6 +19,7 @@ from noteflow.infrastructure.asr import Segmenter, SegmenterConfig, StreamingVad
from noteflow.infrastructure.audio.partial_buffer import PartialAudioBuffer
from noteflow.infrastructure.audio.writer import MeetingAudioWriter
from noteflow.infrastructure.diarization import DiarizationSession
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.persistence.memory import MemoryUnitOfWork
from noteflow.infrastructure.persistence.repositories import DiarizationJob
from noteflow.infrastructure.persistence.unit_of_work import SqlAlchemyUnitOfWork
@@ -59,7 +59,7 @@ if TYPE_CHECKING:
from noteflow.infrastructure.asr import FasterWhisperEngine
from noteflow.infrastructure.diarization import DiarizationEngine, SpeakerTurn
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class NoteFlowServicer(
@@ -140,11 +140,6 @@ class NoteFlowServicer(
self._crypto = AesGcmCryptoBox(self._keystore)
self._audio_writers: dict[str, MeetingAudioWriter] = {}
# Initialize all state dictionaries
self._init_streaming_state_dicts()
def _init_streaming_state_dicts(self) -> None:
"""Initialize all streaming state dictionaries."""
# VAD and segmentation state per meeting
self._vad_instances: dict[str, StreamingVad] = {}
self._segmenters: dict[str, Segmenter] = {}

View File

@@ -6,18 +6,19 @@ Provides Whisper-based transcription with word-level timestamps.
from __future__ import annotations
import asyncio
import logging
from collections.abc import Iterator
from functools import partial
from typing import TYPE_CHECKING, Final
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
import numpy as np
from numpy.typing import NDArray
from noteflow.infrastructure.asr.dto import AsrResult, WordTiming
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# Available model sizes
VALID_MODEL_SIZES: Final[tuple[str, ...]] = (

View File

@@ -5,6 +5,7 @@ Manages speech segment boundaries using Voice Activity Detection.
from __future__ import annotations
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import TYPE_CHECKING
@@ -75,7 +76,8 @@ class Segmenter:
_leading_duration: float = field(default=0.0, init=False)
# Audio buffers with cached sample counts for O(1) length lookups
_leading_buffer: list[NDArray[np.float32]] = field(default_factory=list, init=False)
# Using deque for _leading_buffer enables O(1) popleft() vs O(n) pop(0)
_leading_buffer: deque[NDArray[np.float32]] = field(default_factory=deque, init=False)
_leading_buffer_samples: int = field(default=0, init=False)
_speech_buffer: list[NDArray[np.float32]] = field(default_factory=list, init=False)
_speech_buffer_samples: int = field(default=0, init=False)
@@ -238,9 +240,9 @@ class Segmenter:
# Calculate total buffer duration using cached sample count
total_duration = self._leading_buffer_samples / self.config.sample_rate
# Trim to configured leading buffer size
# Trim to configured leading buffer size (O(1) with deque.popleft)
while total_duration > self.config.leading_buffer and self._leading_buffer:
removed = self._leading_buffer.pop(0)
removed = self._leading_buffer.popleft()
self._leading_buffer_samples -= len(removed)
total_duration = self._leading_buffer_samples / self.config.sample_rate
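A standalone illustration of the trade-off the comments above describe: `deque.popleft()` is O(1) while `list.pop(0)` shifts every remaining element. The class and values below are made up for the example and are not the project's segmenter.

```python
# Toy version of the leading-buffer trimming logic above (illustrative values).
from collections import deque

import numpy as np


class LeadingBuffer:
    def __init__(self, sample_rate: int = 16_000, max_seconds: float = 1.5) -> None:
        self._chunks: deque[np.ndarray] = deque()
        self._samples = 0  # cached sample count for O(1) duration checks
        self._sample_rate = sample_rate
        self._max_seconds = max_seconds

    def append(self, chunk: np.ndarray) -> None:
        self._chunks.append(chunk)
        self._samples += len(chunk)
        # popleft() is O(1); list.pop(0) would shift every element per trim.
        while self._samples / self._sample_rate > self._max_seconds and self._chunks:
            self._samples -= len(self._chunks.popleft())
```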

View File

@@ -5,7 +5,6 @@ Provide cross-platform audio input capture with device handling.
from __future__ import annotations
import logging
import time
from typing import TYPE_CHECKING
@@ -14,11 +13,12 @@ import sounddevice as sd
from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.infrastructure.audio.dto import AudioDeviceInfo, AudioFrameCallback
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from numpy.typing import NDArray
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class SoundDeviceCapture:

View File

@@ -5,7 +5,6 @@ Provide cross-platform audio output playback from ring buffer audio.
from __future__ import annotations
import logging
import threading
from collections.abc import Callable
from enum import Enum, auto
@@ -16,11 +15,12 @@ import sounddevice as sd
from numpy.typing import NDArray
from noteflow.config.constants import DEFAULT_SAMPLE_RATE, POSITION_UPDATE_INTERVAL
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.infrastructure.audio.dto import TimestampedAudio
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class PlaybackState(Enum):

View File

@@ -7,7 +7,6 @@ Reuses ChunkedAssetReader from security/crypto.py for decryption.
from __future__ import annotations
import json
import logging
from pathlib import Path
from typing import TYPE_CHECKING
@@ -15,12 +14,13 @@ import numpy as np
from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.infrastructure.audio.dto import TimestampedAudio
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.security.crypto import ChunkedAssetReader
if TYPE_CHECKING:
from noteflow.infrastructure.security.crypto import AesGcmCryptoBox
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class MeetingAudioReader:

View File

@@ -4,7 +4,6 @@ from __future__ import annotations
import io
import json
import logging
import threading
from datetime import UTC, datetime
from pathlib import Path
@@ -17,6 +16,7 @@ from noteflow.config.constants import (
DEFAULT_SAMPLE_RATE,
PERIODIC_FLUSH_INTERVAL_SECONDS,
)
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.security.crypto import ChunkedAssetWriter
if TYPE_CHECKING:
@@ -24,7 +24,7 @@ if TYPE_CHECKING:
from noteflow.infrastructure.security.crypto import AesGcmCryptoBox
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class MeetingAudioWriter:

View File

@@ -6,17 +6,17 @@ Fetches and parses OIDC provider configuration from
from __future__ import annotations
import logging
from typing import TYPE_CHECKING
import httpx
from noteflow.domain.auth.oidc import OidcDiscoveryConfig
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.domain.auth.oidc import OidcProviderConfig
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class OidcDiscoveryError(Exception):

View File

@@ -6,7 +6,6 @@ like Authentik, Authelia, Keycloak, Auth0, Okta, and Azure AD.
from __future__ import annotations
import logging
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
from uuid import UUID
@@ -20,11 +19,12 @@ from noteflow.infrastructure.auth.oidc_discovery import (
OidcDiscoveryClient,
OidcDiscoveryError,
)
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.domain.ports.unit_of_work import UnitOfWork
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
@dataclass(frozen=True, slots=True)

View File

@@ -5,7 +5,6 @@ Implements CalendarPort for Google Calendar using the Google Calendar API v3.
from __future__ import annotations
import logging
from datetime import UTC, datetime, timedelta
import httpx
@@ -21,8 +20,9 @@ from noteflow.config.constants import (
)
from noteflow.domain.ports.calendar import CalendarEventInfo, CalendarPort
from noteflow.domain.value_objects import OAuthProvider
from noteflow.infrastructure.logging import get_logger
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class GoogleCalendarError(Exception):

View File

@@ -8,7 +8,6 @@ from __future__ import annotations
import base64
import hashlib
import logging
import secrets
from datetime import UTC, datetime, timedelta
from typing import TYPE_CHECKING, ClassVar
@@ -26,11 +25,12 @@ from noteflow.config.constants import (
)
from noteflow.domain.ports.calendar import OAuthPort
from noteflow.domain.value_objects import OAuthProvider, OAuthState, OAuthTokens
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.config.settings import CalendarIntegrationSettings
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class OAuthError(Exception):

View File

@@ -5,8 +5,8 @@ Implements CalendarPort for Outlook using Microsoft Graph API.
from __future__ import annotations
import logging
from datetime import UTC, datetime, timedelta
from typing import Final
import httpx
@@ -21,14 +21,35 @@ from noteflow.config.constants import (
)
from noteflow.domain.ports.calendar import CalendarEventInfo, CalendarPort
from noteflow.domain.value_objects import OAuthProvider
from noteflow.infrastructure.logging import get_logger
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# HTTP client configuration
GRAPH_API_TIMEOUT: Final[float] = 30.0 # seconds
MAX_CONNECTIONS: Final[int] = 10
MAX_ERROR_BODY_LENGTH: Final[int] = 500
GRAPH_API_MAX_PAGE_SIZE: Final[int] = 100 # Graph API maximum
class OutlookCalendarError(Exception):
"""Outlook Calendar API error."""
def _truncate_error_body(body: str) -> str:
"""Truncate error body to prevent log bloat.
Args:
body: Raw error response body.
Returns:
Truncated body with indicator if truncation occurred.
"""
if len(body) <= MAX_ERROR_BODY_LENGTH:
return body
return body[:MAX_ERROR_BODY_LENGTH] + "... (truncated)"
class OutlookCalendarAdapter(CalendarPort):
"""Microsoft Graph Calendar API adapter.
@@ -46,10 +67,13 @@ class OutlookCalendarAdapter(CalendarPort):
) -> list[CalendarEventInfo]:
"""Fetch upcoming calendar events from Outlook Calendar.
Implements pagination via @odata.nextLink to ensure all events
within the limit are retrieved.
Args:
access_token: Microsoft Graph OAuth token with Calendars.Read scope.
hours_ahead: Hours to look ahead from current time.
limit: Maximum events to return (capped by Graph API).
limit: Maximum events to return.
Returns:
List of Outlook calendar events ordered by start datetime.
@@ -61,11 +85,18 @@ class OutlookCalendarAdapter(CalendarPort):
start_time = now.strftime("%Y-%m-%dT%H:%M:%SZ")
end_time = (now + timedelta(hours=hours_ahead)).strftime("%Y-%m-%dT%H:%M:%SZ")
url = f"{self.GRAPH_API_BASE}/me/calendarView"
params: dict[str, str | int] = {
headers = {
HTTP_AUTHORIZATION: f"{HTTP_BEARER_PREFIX}{access_token}",
"Prefer": 'outlook.timezone="UTC"',
}
# Initial page request
page_size = min(limit, GRAPH_API_MAX_PAGE_SIZE)
url: str | None = f"{self.GRAPH_API_BASE}/me/calendarView"
params: dict[str, str | int] | None = {
"startDateTime": start_time,
"endDateTime": end_time,
"$top": limit,
"$top": page_size,
"$orderby": "start/dateTime",
"$select": (
"id,subject,start,end,location,bodyPreview,"
@@ -73,26 +104,36 @@ class OutlookCalendarAdapter(CalendarPort):
),
}
headers = {
HTTP_AUTHORIZATION: f"{HTTP_BEARER_PREFIX}{access_token}",
"Prefer": 'outlook.timezone="UTC"',
}
all_events: list[CalendarEventInfo] = []
async with httpx.AsyncClient() as client:
response = await client.get(url, params=params, headers=headers)
async with httpx.AsyncClient(
timeout=httpx.Timeout(GRAPH_API_TIMEOUT),
limits=httpx.Limits(max_connections=MAX_CONNECTIONS),
) as client:
while url is not None:
response = await client.get(url, params=params, headers=headers)
if response.status_code == HTTP_STATUS_UNAUTHORIZED:
raise OutlookCalendarError(ERR_TOKEN_EXPIRED)
if response.status_code == HTTP_STATUS_UNAUTHORIZED:
raise OutlookCalendarError(ERR_TOKEN_EXPIRED)
if response.status_code != HTTP_STATUS_OK:
error_msg = response.text
logger.error("Microsoft Graph API error: %s", error_msg)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_msg}")
if response.status_code != HTTP_STATUS_OK:
error_body = _truncate_error_body(response.text)
logger.error("Microsoft Graph API error: %s", error_body)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_body}")
data = response.json()
items = data.get("value", [])
data = response.json()
items = data.get("value", [])
return [self._parse_event(item) for item in items]
for item in items:
all_events.append(self._parse_event(item))
if len(all_events) >= limit:
return all_events
# Check for next page
url = data.get("@odata.nextLink")
params = None # nextLink includes query params
return all_events
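A condensed, self-contained sketch of the pagination loop above: follow `@odata.nextLink` until the limit is reached, with an explicit httpx timeout. The endpoint handling and `$top` cap mirror the diff; everything else (function name, error handling via `raise_for_status`) is an assumption for illustration.

```python
# Simplified sketch of @odata.nextLink pagination (not the adapter itself).
import httpx

GRAPH_API_TIMEOUT = 30.0  # seconds, matching the constant introduced above


async def fetch_events(access_token: str, first_url: str, limit: int) -> list[dict]:
    events: list[dict] = []
    headers = {"Authorization": f"Bearer {access_token}"}
    url: str | None = first_url
    params: dict[str, str | int] | None = {"$top": min(limit, 100)}
    async with httpx.AsyncClient(timeout=httpx.Timeout(GRAPH_API_TIMEOUT)) as client:
        while url is not None and len(events) < limit:
            response = await client.get(url, params=params, headers=headers)
            response.raise_for_status()
            data = response.json()
            events.extend(data.get("value", []))
            url = data.get("@odata.nextLink")  # absolute URL for the next page
            params = None  # nextLink already carries the query string
    return events[:limit]
```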
async def get_user_email(self, access_token: str) -> str:
"""Get authenticated user's email address.
@@ -110,16 +151,19 @@ class OutlookCalendarAdapter(CalendarPort):
params = {"$select": "mail,userPrincipalName"}
headers = {HTTP_AUTHORIZATION: f"{HTTP_BEARER_PREFIX}{access_token}"}
async with httpx.AsyncClient() as client:
async with httpx.AsyncClient(
timeout=httpx.Timeout(GRAPH_API_TIMEOUT),
limits=httpx.Limits(max_connections=MAX_CONNECTIONS),
) as client:
response = await client.get(url, params=params, headers=headers)
if response.status_code == HTTP_STATUS_UNAUTHORIZED:
raise OutlookCalendarError(ERR_TOKEN_EXPIRED)
if response.status_code != HTTP_STATUS_OK:
error_msg = response.text
logger.error("Microsoft Graph API error: %s", error_msg)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_msg}")
error_body = _truncate_error_body(response.text)
logger.error("Microsoft Graph API error: %s", error_body)
raise OutlookCalendarError(f"{ERR_API_PREFIX}{error_body}")
data = response.json()
# Prefer mail, fall back to userPrincipalName

View File

@@ -8,6 +8,7 @@ from __future__ import annotations
from typing import TYPE_CHECKING
from uuid import UUID
from noteflow.config.constants import RULE_FIELD_DESCRIPTION
from noteflow.domain.ports.calendar import CalendarEventInfo
from noteflow.infrastructure.triggers.calendar import CalendarEvent
@@ -70,7 +71,7 @@ class CalendarEventConverter:
"calendar_id": calendar_id,
"calendar_name": calendar_name,
"title": event.title,
"description": event.description,
RULE_FIELD_DESCRIPTION: event.description,
"start_time": event.start_time,
"end_time": event.end_time,
"location": event.location,

View File

@@ -8,7 +8,6 @@ Requires optional dependencies: pip install noteflow[diarization]
from __future__ import annotations
import logging
import os
import warnings
from typing import TYPE_CHECKING
@@ -16,6 +15,7 @@ from typing import TYPE_CHECKING
from noteflow.config.constants import DEFAULT_SAMPLE_RATE, ERR_HF_TOKEN_REQUIRED
from noteflow.infrastructure.diarization.dto import SpeakerTurn
from noteflow.infrastructure.diarization.session import DiarizationSession
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Sequence
@@ -24,7 +24,7 @@ if TYPE_CHECKING:
from numpy.typing import NDArray
from pyannote.core import Annotation
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class DiarizationEngine:

View File

@@ -6,20 +6,20 @@ without cross-talk. Shared models are loaded once and reused across sessions.
from __future__ import annotations
import logging
from collections.abc import Sequence
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.infrastructure.diarization.dto import SpeakerTurn
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
import numpy as np
from diart import SpeakerDiarization
from numpy.typing import NDArray
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
@dataclass

View File

@@ -1,6 +1,15 @@
"""Logging infrastructure for NoteFlow."""
"""Logging infrastructure for NoteFlow.
This module provides centralized logging with structlog, supporting:
- Dual output (Rich console for development, JSON for observability)
- Automatic context injection (request_id, user_id, workspace_id)
- OpenTelemetry trace correlation
- In-memory log buffer for UI streaming
"""
from .config import LoggingConfig, configure_logging, get_logger
from .log_buffer import LogBuffer, LogBufferHandler, LogEntry, get_log_buffer
from .processors import add_noteflow_context, add_otel_trace_context
from .structured import (
generate_request_id,
get_logging_context,
@@ -16,8 +25,13 @@ __all__ = [
"LogBuffer",
"LogBufferHandler",
"LogEntry",
"LoggingConfig",
"add_noteflow_context",
"add_otel_trace_context",
"configure_logging",
"generate_request_id",
"get_log_buffer",
"get_logger",
"get_logging_context",
"get_request_id",
"get_user_id",

View File

@@ -0,0 +1,185 @@
"""Centralized logging configuration with dual output.
Configure structlog with Rich console + JSON file output for both development
and observability use cases.
"""
from __future__ import annotations
import logging
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import TYPE_CHECKING
import structlog
from .processors import build_processor_chain
if TYPE_CHECKING:
from collections.abc import Sequence
from structlog.typing import Processor
# Default log level constant
_DEFAULT_LEVEL = "INFO"
# Rich console width for traceback formatting
_RICH_CONSOLE_WIDTH = 120
@dataclass(frozen=True, slots=True)
class LoggingConfig:
"""Configuration for centralized logging.
Attributes:
level: Minimum log level (DEBUG, INFO, WARNING, ERROR).
json_file: Optional path for JSON log file output.
enable_console: Enable Rich console output.
enable_json_console: Force JSON output to console (for production).
enable_log_buffer: Feed logs to in-memory LogBuffer for UI streaming.
enable_otel_context: Include OpenTelemetry trace/span IDs.
enable_noteflow_context: Include request_id, user_id, workspace_id.
console_colors: Enable Rich colors (auto-detect TTY if not set).
"""
level: str = _DEFAULT_LEVEL
json_file: Path | None = None
enable_console: bool = True
enable_json_console: bool = False
enable_log_buffer: bool = True
enable_otel_context: bool = True
enable_noteflow_context: bool = True
console_colors: bool = True
# Log level name to constant mapping
_LEVEL_MAP: dict[str, int] = {
"DEBUG": logging.DEBUG,
"INFO": logging.INFO,
"WARNING": logging.WARNING,
"ERROR": logging.ERROR,
"CRITICAL": logging.CRITICAL,
}
def _get_log_level(level_name: str) -> int:
"""Convert level name to logging constant."""
return _LEVEL_MAP.get(level_name.upper(), logging.INFO)
def _create_renderer(config: LoggingConfig) -> Processor:
"""Create the appropriate renderer based on configuration.
Uses Rich console rendering for TTY output with colors and formatting,
JSON for non-TTY or production environments.
"""
if config.enable_json_console or not sys.stderr.isatty():
return structlog.processors.JSONRenderer()
# Use Rich console renderer for beautiful TTY output
from rich.console import Console
from rich.traceback import install as install_rich_traceback
# Install Rich traceback handler for better exception formatting
install_rich_traceback(show_locals=False, width=_RICH_CONSOLE_WIDTH, suppress=[structlog])
Console(stderr=True, force_terminal=config.console_colors)
return structlog.dev.ConsoleRenderer(
colors=config.console_colors,
exception_formatter=structlog.dev.rich_traceback,
)
def _configure_structlog(processors: Sequence[Processor]) -> None:
"""Configure structlog with the processor chain."""
structlog.configure(
processors=[*processors, structlog.stdlib.ProcessorFormatter.wrap_for_formatter],
wrapper_class=structlog.stdlib.BoundLogger,
logger_factory=structlog.stdlib.LoggerFactory(),
cache_logger_on_first_use=True,
)
def _setup_handlers(
config: LoggingConfig,
log_level: int,
processors: Sequence[Processor],
renderer: Processor,
) -> None:
"""Configure and attach handlers to the root logger."""
formatter = structlog.stdlib.ProcessorFormatter(
foreign_pre_chain=processors,
processors=[structlog.stdlib.ProcessorFormatter.remove_processors_meta, renderer],
)
root_logger = logging.getLogger()
root_logger.setLevel(log_level)
# Clear existing handlers
for handler in root_logger.handlers[:]:
root_logger.removeHandler(handler)
if config.enable_console:
console_handler = logging.StreamHandler(sys.stderr)
console_handler.setFormatter(formatter)
console_handler.setLevel(log_level)
root_logger.addHandler(console_handler)
if config.json_file is not None:
json_formatter = structlog.stdlib.ProcessorFormatter(
foreign_pre_chain=processors,
processors=[
structlog.stdlib.ProcessorFormatter.remove_processors_meta,
structlog.processors.JSONRenderer(),
],
)
file_handler = logging.FileHandler(config.json_file)
file_handler.setFormatter(json_formatter)
file_handler.setLevel(log_level)
root_logger.addHandler(file_handler)
if config.enable_log_buffer:
from .log_buffer import LogBufferHandler, get_log_buffer
buffer_handler = LogBufferHandler(buffer=get_log_buffer(), level=log_level)
root_logger.addHandler(buffer_handler)
def configure_logging(
config: LoggingConfig | None = None,
*,
level: str = _DEFAULT_LEVEL,
json_file: Path | None = None,
) -> None:
"""Configure centralized logging with dual output.
Call once at application startup. Configures both structlog and stdlib
logging for seamless integration.
Args:
config: Full configuration object, or use keyword args.
level: Log level (DEBUG, INFO, WARNING, ERROR).
json_file: Optional path for JSON log file.
"""
if config is None:
config = LoggingConfig(level=level, json_file=json_file)
log_level = _get_log_level(config.level)
processors = build_processor_chain(config)
renderer = _create_renderer(config)
_configure_structlog(processors)
_setup_handlers(config, log_level, processors, renderer)
def get_logger(name: str | None = None) -> structlog.stdlib.BoundLogger:
"""Get a structlog logger instance.
Args:
name: Optional logger name (defaults to calling module).
Returns:
Configured structlog BoundLogger.
"""
return structlog.get_logger(name)
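A short usage sketch of the API defined above; the level, file path, and event fields are illustrative (the JSON file's parent directory is assumed to exist).

```python
# Usage sketch of configure_logging/get_logger as defined in this module.
from pathlib import Path

from noteflow.infrastructure.logging import LoggingConfig, configure_logging, get_logger

configure_logging(LoggingConfig(level="DEBUG", json_file=Path("logs/noteflow.jsonl")))

logger = get_logger(__name__)
logger.info("server_started", port=50051, asr_model="base")
# Rich console output on a TTY, JSON lines appended to logs/noteflow.jsonl.
```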

View File

@@ -0,0 +1,139 @@
"""Custom structlog processors for NoteFlow logging.
Provide context injection, OpenTelemetry integration, and LogBuffer feeding.
"""
from __future__ import annotations
from typing import TYPE_CHECKING, Final
import structlog
# Log field name constants (avoid repeated string literals)
_TRACE_ID: Final = "trace_id"
_SPAN_ID: Final = "span_id"
_PARENT_SPAN_ID: Final = "parent_span_id"
_HEX_32: Final = "032x"
_HEX_16: Final = "016x"
if TYPE_CHECKING:
from collections.abc import Sequence
from structlog.typing import EventDict, Processor, WrappedLogger
from .config import LoggingConfig
def add_noteflow_context(
logger: WrappedLogger,
method_name: str,
event_dict: EventDict,
) -> EventDict:
"""Inject request_id, user_id, workspace_id from context vars.
Only adds values that are set and not already present in the event.
Args:
logger: The wrapped logger instance.
method_name: Name of the log method called.
event_dict: Current event dictionary.
Returns:
Updated event dictionary with context values.
"""
from .structured import get_logging_context
ctx = get_logging_context()
for key, value in ctx.items():
if value is not None and key not in event_dict:
event_dict[key] = value
return event_dict
def add_otel_trace_context(
logger: WrappedLogger,
method_name: str,
event_dict: EventDict,
) -> EventDict:
"""Inject OpenTelemetry trace/span IDs if available.
Gracefully handles missing OpenTelemetry installation.
Args:
logger: The wrapped logger instance.
method_name: Name of the log method called.
event_dict: Current event dictionary.
Returns:
Updated event dictionary with trace context.
"""
try:
from opentelemetry import trace
span = trace.get_current_span()
if span is not None and span.is_recording():
ctx = span.get_span_context()
if ctx is not None and ctx.is_valid:
event_dict[_TRACE_ID] = format(ctx.trace_id, _HEX_32)
event_dict[_SPAN_ID] = format(ctx.span_id, _HEX_16)
# Parent span ID if available
parent = getattr(span, "parent", None)
if parent is not None:
parent_ctx = getattr(parent, _SPAN_ID, None)
if parent_ctx is not None:
event_dict[_PARENT_SPAN_ID] = format(parent_ctx, _HEX_16)
except ImportError:
pass
except (AttributeError, TypeError):
# Graceful degradation for edge cases
pass
return event_dict
def build_processor_chain(config: LoggingConfig) -> Sequence[Processor]:
"""Build the structlog processor chain based on configuration.
Args:
config: Logging configuration.
Returns:
Sequence of processors in execution order.
"""
processors: list[Processor] = [
# Filter by level early
structlog.stdlib.filter_by_level,
# Add standard fields
structlog.stdlib.add_logger_name,
structlog.stdlib.add_log_level,
# Handle %-style formatting from legacy code
structlog.stdlib.PositionalArgumentsFormatter(),
# ISO 8601 timestamp
structlog.processors.TimeStamper(fmt="iso"),
]
# Context injection (optional based on config)
if config.enable_noteflow_context:
processors.append(add_noteflow_context)
if config.enable_otel_context:
processors.append(add_otel_trace_context)
# Additional standard processors
processors.extend([
# Add callsite information (file, function, line)
structlog.processors.CallsiteParameterAdder(
parameters=[
structlog.processors.CallsiteParameter.FILENAME,
structlog.processors.CallsiteParameter.FUNC_NAME,
structlog.processors.CallsiteParameter.LINENO,
]
),
# Stack traces if requested
structlog.processors.StackInfoRenderer(),
# Exception formatting
structlog.processors.format_exc_info,
# Decode bytes to strings
structlog.processors.UnicodeDecoder(),
])
return processors
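For reference, a minimal processor with the same `(logger, method_name, event_dict)` shape as those above, wired into a tiny standalone chain. It is purely illustrative and not part of the commit; the service name and event fields are made up.

```python
# Minimal structlog processor example: add a static field to every event.
import structlog
from structlog.typing import EventDict, WrappedLogger


def add_service_name(
    logger: WrappedLogger, method_name: str, event_dict: EventDict
) -> EventDict:
    event_dict.setdefault("service", "noteflow")
    return event_dict


structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        add_service_name,
        structlog.processors.JSONRenderer(),
    ]
)
structlog.get_logger(__name__).info("processor_demo", component="example")
```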

View File

@@ -3,14 +3,15 @@
from __future__ import annotations
import asyncio
import logging
import time
from collections import deque
from dataclasses import dataclass
import psutil
logger = logging.getLogger(__name__)
from noteflow.infrastructure.logging import get_logger
logger = get_logger(__name__)
@dataclass(frozen=True, slots=True)

View File

@@ -6,7 +6,6 @@ Provides named entity extraction with lazy model loading and segment tracking.
from __future__ import annotations
import asyncio
import logging
from functools import partial
from typing import TYPE_CHECKING, Final
@@ -17,11 +16,12 @@ from noteflow.config.constants import (
SPACY_MODEL_TRF,
)
from noteflow.domain.entities.named_entity import EntityCategory, NamedEntity
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from spacy.language import Language
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# Map spaCy entity types to our categories
_SPACY_CATEGORY_MAP: Final[dict[str, EntityCategory]] = {

View File

@@ -7,12 +7,13 @@ this module gracefully degrades to no-op behavior.
from __future__ import annotations
import logging
from contextlib import AbstractContextManager
from functools import cache
from typing import Protocol, cast
logger = logging.getLogger(__name__)
from noteflow.infrastructure.logging import get_logger
logger = get_logger(__name__)
# Track whether OpenTelemetry is available and configured
_otel_configured: bool = False
@@ -47,6 +48,7 @@ def configure_observability(
*,
enable_grpc_instrumentation: bool = True,
otlp_endpoint: str | None = None,
otlp_insecure: bool | None = None,
) -> bool:
"""Initialize OpenTelemetry trace and metrics providers.
@@ -58,6 +60,8 @@ def configure_observability(
service_name: Service name for resource identification.
enable_grpc_instrumentation: Whether to auto-instrument gRPC.
otlp_endpoint: Optional OTLP endpoint for exporting telemetry.
otlp_insecure: Use insecure connection. If None, infers from endpoint
scheme (http:// = insecure, https:// = secure).
Returns:
True if configuration succeeded, False if OTel is not available.
@@ -96,9 +100,20 @@ def configure_observability(
)
from opentelemetry.sdk.trace.export import BatchSpanProcessor
otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)
# Determine insecure mode: explicit setting or infer from scheme
if otlp_insecure is not None:
use_insecure = otlp_insecure
else:
# Infer from endpoint scheme: http:// = insecure, https:// = secure
use_insecure = otlp_endpoint.startswith("http://")
otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=use_insecure)
tracer_provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
logger.info("OTLP trace exporter configured: %s", otlp_endpoint)
logger.info(
"OTLP trace exporter configured: %s (insecure=%s)",
otlp_endpoint,
use_insecure,
)
except ImportError:
logger.warning("OTLP exporter not available, traces will not be exported")

View File

@@ -16,6 +16,7 @@ from noteflow.application.observability.ports import (
UsageEvent,
UsageEventSink,
)
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.observability.otel import _check_otel_available
if TYPE_CHECKING:
@@ -25,7 +26,7 @@ if TYPE_CHECKING:
SqlAlchemyUsageEventRepository,
)
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class LoggingUsageEventSink:

View File

@@ -3,7 +3,6 @@
from __future__ import annotations
import asyncio
import logging
from pathlib import Path
from typing import TYPE_CHECKING
@@ -18,10 +17,12 @@ from sqlalchemy.ext.asyncio import (
create_async_engine as sa_create_async_engine,
)
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from noteflow.config import Settings
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
def create_async_engine(settings: Settings) -> AsyncEngine:

View File

@@ -1,13 +1,13 @@
"""File system asset repository."""
import logging
import shutil
from pathlib import Path
from noteflow.domain.ports.repositories import AssetRepository
from noteflow.domain.value_objects import MeetingId
from noteflow.infrastructure.logging import get_logger
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
class FileSystemAssetRepository(AssetRepository):

View File

@@ -50,15 +50,15 @@ class SqlAlchemyProjectRepository(BaseRepository):
if settings.export_rules is not None:
export_data: dict[str, object] = {}
if settings.export_rules.default_format is not None:
export_data["default_format"] = settings.export_rules.default_format.value
export_data[RULE_FIELD_DEFAULT_FORMAT] = settings.export_rules.default_format.value
if settings.export_rules.include_audio is not None:
export_data["include_audio"] = settings.export_rules.include_audio
export_data[RULE_FIELD_INCLUDE_AUDIO] = settings.export_rules.include_audio
if settings.export_rules.include_timestamps is not None:
export_data["include_timestamps"] = settings.export_rules.include_timestamps
export_data[RULE_FIELD_INCLUDE_TIMESTAMPS] = settings.export_rules.include_timestamps
if settings.export_rules.template_id is not None:
export_data["template_id"] = str(settings.export_rules.template_id)
export_data[RULE_FIELD_TEMPLATE_ID] = str(settings.export_rules.template_id)
if export_data:
data["export_rules"] = export_data
data[RULE_FIELD_EXPORT_RULES] = export_data
if settings.trigger_rules is not None:
trigger_data: dict[str, object] = {}

View File

@@ -71,11 +71,18 @@ class SqlAlchemyWorkspaceRepository(BaseRepository):
app_match_patterns=trigger_data.get(RULE_FIELD_APP_MATCH_PATTERNS),
)
# Extract and validate optional settings with type narrowing
rag_enabled_raw = data.get("rag_enabled")
rag_enabled = rag_enabled_raw if isinstance(rag_enabled_raw, bool) else None
template_raw = data.get("default_summarization_template")
template = template_raw if isinstance(template_raw, str) else None
return WorkspaceSettings(
export_rules=export_rules,
trigger_rules=trigger_rules,
rag_enabled=data.get("rag_enabled"), # type: ignore[arg-type]
default_summarization_template=data.get("default_summarization_template"), # type: ignore[arg-type]
rag_enabled=rag_enabled,
default_summarization_template=template,
)
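A generic restatement of the narrowing pattern that replaces the `type: ignore` comments above: pull values out of a `dict[str, object]` and keep them only when they have the expected type. The helper names and sample data are hypothetical.

```python
# Illustrative isinstance narrowing for values of unknown type.
def narrow_bool(data: dict[str, object], key: str) -> bool | None:
    value = data.get(key)
    return value if isinstance(value, bool) else None


def narrow_str(data: dict[str, object], key: str) -> str | None:
    value = data.get(key)
    return value if isinstance(value, str) else None


settings: dict[str, object] = {"rag_enabled": True, "default_summarization_template": 42}
assert narrow_bool(settings, "rag_enabled") is True
assert narrow_str(settings, "default_summarization_template") is None  # wrong type
```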
@staticmethod

View File

@@ -5,7 +5,6 @@ Provides AES-GCM encryption for audio data with envelope encryption.
from __future__ import annotations
import logging
import secrets
import struct
from collections.abc import Iterator
@@ -15,23 +14,45 @@ from typing import TYPE_CHECKING, BinaryIO, Final
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.security.protocols import EncryptedChunk
if TYPE_CHECKING:
from noteflow.infrastructure.security.keystore import InMemoryKeyStore, KeyringKeyStore
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# Constants
KEY_SIZE: Final[int] = 32 # 256-bit key
NONCE_SIZE: Final[int] = 12 # 96-bit nonce for AES-GCM
TAG_SIZE: Final[int] = 16 # 128-bit authentication tag
MIN_CHUNK_LENGTH: Final[int] = NONCE_SIZE + TAG_SIZE # Minimum valid encrypted chunk
# File format magic number and version
FILE_MAGIC: Final[bytes] = b"NFAE" # NoteFlow Audio Encrypted
FILE_VERSION: Final[int] = 1
def _read_exact(handle: BinaryIO, size: int, description: str) -> bytes:
"""Read exactly size bytes or raise ValueError.
Args:
handle: File handle to read from.
size: Number of bytes to read.
description: Description for error message.
Returns:
Exactly size bytes.
Raises:
ValueError: If fewer than size bytes available.
"""
data = handle.read(size)
if len(data) < size:
raise ValueError(f"Truncated {description}: expected {size}, got {len(data)}")
return data
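A hedged sketch of consuming the length-prefixed chunk framing this file describes (4-byte big-endian length, then nonce + ciphertext + tag), using a `_read_exact`-style helper. Constants match those shown above; the iterator itself is a simplification and omits decryption.

```python
# Sketch: iterate length-prefixed encrypted chunks from a binary stream.
import struct
from collections.abc import Iterator
from typing import BinaryIO

NONCE_SIZE = 12
TAG_SIZE = 16
MIN_CHUNK_LENGTH = NONCE_SIZE + TAG_SIZE


def read_exact(handle: BinaryIO, size: int, description: str) -> bytes:
    data = handle.read(size)
    if len(data) < size:
        raise ValueError(f"Truncated {description}: expected {size}, got {len(data)}")
    return data


def iter_chunks(handle: BinaryIO) -> Iterator[tuple[bytes, bytes]]:
    while True:
        length_bytes = handle.read(4)
        if not length_bytes:
            return  # clean end of file
        if len(length_bytes) < 4:
            raise ValueError("Truncated chunk length header")
        chunk_length = struct.unpack(">I", length_bytes)[0]
        if chunk_length < MIN_CHUNK_LENGTH:
            raise ValueError(f"Invalid chunk length {chunk_length}")
        chunk = read_exact(handle, chunk_length, "chunk data")
        yield chunk[:NONCE_SIZE], chunk[NONCE_SIZE:]  # (nonce, ciphertext + tag)
```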
class AesGcmCryptoBox:
"""AES-GCM based encryption with envelope encryption.
@@ -263,7 +284,14 @@ class ChunkedAssetReader:
self._handle = None
raise ValueError(f"Invalid file format: expected {FILE_MAGIC!r}, got {magic!r}")
version = struct.unpack("B", self._handle.read(1))[0]
try:
version_bytes = _read_exact(self._handle, 1, "version header")
except ValueError as e:
self._handle.close()
self._handle = None
raise ValueError(f"Invalid file format: {e}") from e
version = struct.unpack("B", version_bytes)[0]
if version != FILE_VERSION:
self._handle.close()
self._handle = None
@@ -279,15 +307,22 @@ class ChunkedAssetReader:
while True:
# Read chunk length
length_bytes = self._handle.read(4)
if len(length_bytes) == 0:
break # Clean end of file
if len(length_bytes) < 4:
break # End of file
raise ValueError("Truncated chunk length header")
chunk_length = struct.unpack(">I", length_bytes)[0]
# Validate minimum chunk size (nonce + tag at minimum)
if chunk_length < MIN_CHUNK_LENGTH:
raise ValueError(
f"Invalid chunk length {chunk_length}: "
f"minimum is {MIN_CHUNK_LENGTH} (nonce + tag)"
)
# Read chunk data
chunk_data = self._handle.read(chunk_length)
if len(chunk_data) < chunk_length:
raise ValueError("Truncated chunk")
chunk_data = _read_exact(self._handle, chunk_length, "chunk data")
# Parse chunk (nonce + ciphertext + tag)
nonce = chunk_data[:NONCE_SIZE]

View File

@@ -5,7 +5,6 @@ Provides secure master key storage using OS credential stores.
import base64
import binascii
import logging
import os
import secrets
import stat
@@ -15,8 +14,9 @@ from typing import Final
import keyring
from noteflow.config.constants import APP_DIR_NAME
from noteflow.infrastructure.logging import get_logger
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
# Constants
KEY_SIZE: Final[int] = 32 # 256-bit key

View File

@@ -3,12 +3,12 @@
from __future__ import annotations
import json
import logging
from datetime import UTC, datetime
from typing import TYPE_CHECKING, TypedDict, cast
from noteflow.domain.entities import ActionItem, KeyPoint, Summary
from noteflow.domain.summarization import InvalidResponseError
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
from collections.abc import Sequence
@@ -16,7 +16,7 @@ if TYPE_CHECKING:
from noteflow.domain.entities import Segment
from noteflow.domain.summarization import SummarizationRequest
_logger = logging.getLogger(__name__)
_logger = get_logger(__name__)
class _KeyPointData(TypedDict, total=False):

View File

@@ -1,17 +1,16 @@
"""Factory for creating configured SummarizationService instances."""
import logging
from noteflow.application.services.summarization_service import (
SummarizationMode,
SummarizationService,
SummarizationServiceSettings,
)
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.summarization.citation_verifier import SegmentCitationVerifier
from noteflow.infrastructure.summarization.mock_provider import MockSummarizer
from noteflow.infrastructure.summarization.ollama_provider import OllamaSummarizer
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
def create_summarization_service(

View File

@@ -3,15 +3,12 @@
from __future__ import annotations
import asyncio
import logging
import os
import time
from datetime import UTC, datetime
from typing import TYPE_CHECKING
from noteflow.domain.entities import Summary
logger = logging.getLogger(__name__)
from noteflow.domain.summarization import (
InvalidResponseError,
ProviderUnavailableError,
@@ -19,6 +16,7 @@ from noteflow.domain.summarization import (
SummarizationResult,
SummarizationTimeoutError,
)
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.summarization._parsing import (
SYSTEM_PROMPT,
build_transcript_prompt,
@@ -28,6 +26,8 @@ from noteflow.infrastructure.summarization._parsing import (
if TYPE_CHECKING:
import ollama
logger = get_logger(__name__)
def _get_ollama_settings() -> tuple[str, float, float]:
"""Get Ollama settings with fallback defaults.

View File

@@ -7,7 +7,6 @@ This is a best-effort heuristic: it combines (a) system output activity and
from __future__ import annotations
import logging
import time
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
@@ -15,6 +14,7 @@ from typing import TYPE_CHECKING
from noteflow.config.constants import DEFAULT_SAMPLE_RATE
from noteflow.domain.triggers.entities import TriggerSignal, TriggerSource
from noteflow.infrastructure.audio.levels import RmsLevelProvider
from noteflow.infrastructure.logging import get_logger
from noteflow.infrastructure.triggers.audio_activity import (
AudioActivityProvider,
AudioActivitySettings,
@@ -24,7 +24,7 @@ if TYPE_CHECKING:
import numpy as np
from numpy.typing import NDArray
logger = logging.getLogger(__name__)
logger = get_logger(__name__)
@dataclass

View File

@@ -12,6 +12,7 @@ from dataclasses import dataclass
from typing import TYPE_CHECKING
from noteflow.domain.triggers.entities import TriggerSignal, TriggerSource
from noteflow.infrastructure.logging import get_logger
if TYPE_CHECKING:
import numpy as np
@@ -19,6 +20,8 @@ if TYPE_CHECKING:
from noteflow.infrastructure.audio import RmsLevelProvider
logger = get_logger(__name__)
@dataclass
class AudioActivitySettings:
@@ -71,6 +74,20 @@ class AudioActivityProvider:
self._settings = settings
self._history: deque[tuple[float, bool]] = deque(maxlen=self._settings.max_history)
self._lock = threading.Lock()
self._last_signal_state: bool = False
self._frame_count: int = 0
self._active_frame_count: int = 0
logger.info(
"Audio activity provider initialized",
enabled=settings.enabled,
threshold_db=settings.threshold_db,
window_seconds=settings.window_seconds,
min_active_ratio=settings.min_active_ratio,
min_samples=settings.min_samples,
max_history=settings.max_history,
weight=settings.weight,
)
@property
def source(self) -> TriggerSource:
@@ -98,6 +115,20 @@ class AudioActivityProvider:
is_active = db >= self._settings.threshold_db
with self._lock:
self._history.append((timestamp, is_active))
self._frame_count += 1
if is_active:
self._active_frame_count += 1
# Log summary every 100 frames to avoid spam
if self._frame_count % 100 == 0:
logger.debug(
"Audio activity update summary",
frame_count=self._frame_count,
active_frames=self._active_frame_count,
history_size=len(self._history),
last_db=round(db, 1),
last_active=is_active,
)
def get_signal(self) -> TriggerSignal | None:
"""Get current signal if sustained activity detected.
@@ -113,21 +144,62 @@ class AudioActivityProvider:
history = list(self._history)
if len(history) < self._settings.min_samples:
logger.debug(
"Insufficient samples for signal evaluation",
history_size=len(history),
min_samples=self._settings.min_samples,
)
return None
# Prune old samples outside window
now = time.monotonic()
cutoff = now - self._settings.window_seconds
recent = [(ts, active) for ts, active in history if ts >= cutoff]
pruned_count = len(history) - len(recent)
if pruned_count > 0:
logger.debug(
"Pruned old samples from history",
pruned_count=pruned_count,
remaining_count=len(recent),
window_seconds=self._settings.window_seconds,
)
if len(recent) < self._settings.min_samples:
logger.debug(
"Insufficient recent samples after pruning",
recent_count=len(recent),
min_samples=self._settings.min_samples,
)
return None
# Calculate activity ratio
active_count = sum(bool(active) for _, active in recent)
ratio = active_count / len(recent)
signal_detected = ratio >= self._settings.min_active_ratio
if ratio < self._settings.min_active_ratio:
# Log state transitions (signal detected vs not)
if signal_detected != self._last_signal_state:
if signal_detected:
logger.info(
"Audio activity signal detected",
activity_ratio=round(ratio, 3),
min_active_ratio=self._settings.min_active_ratio,
active_count=active_count,
sample_count=len(recent),
weight=self.max_weight,
)
else:
logger.info(
"Audio activity signal cleared",
activity_ratio=round(ratio, 3),
min_active_ratio=self._settings.min_active_ratio,
active_count=active_count,
sample_count=len(recent),
)
self._last_signal_state = signal_detected
if not signal_detected:
return None
return TriggerSignal(source=self.source, weight=self.max_weight)
@@ -139,4 +211,13 @@ class AudioActivityProvider:
def clear_history(self) -> None:
"""Clear activity history. Useful when recording starts."""
with self._lock:
previous_size = len(self._history)
self._history.clear()
self._frame_count = 0
self._active_frame_count = 0
self._last_signal_state = False
logger.debug(
"Audio activity history cleared",
previous_size=previous_size,
)
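A condensed restatement of the signal rule implemented above: keep `(timestamp, active)` samples, drop those outside the window, and fire when the active ratio crosses the threshold. The class below is an illustration with assumed default values, not the project's provider.

```python
# Sliding-window activity-ratio sketch (illustrative defaults, no locking).
import time
from collections import deque


class ActivityWindow:
    def __init__(
        self,
        window_seconds: float = 10.0,
        min_active_ratio: float = 0.6,
        min_samples: int = 20,
    ) -> None:
        self._history: deque[tuple[float, bool]] = deque(maxlen=1024)
        self._window_seconds = window_seconds
        self._min_active_ratio = min_active_ratio
        self._min_samples = min_samples

    def update(self, is_active: bool) -> None:
        self._history.append((time.monotonic(), is_active))

    def signal(self) -> bool:
        cutoff = time.monotonic() - self._window_seconds
        recent = [(ts, active) for ts, active in self._history if ts >= cutoff]
        if len(recent) < self._min_samples:
            return False
        ratio = sum(active for _, active in recent) / len(recent)
        return ratio >= self._min_active_ratio
```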

Some files were not shown because too many files have changed in this diff.