Add summarization and trigger services

- Introduced `SummarizationService` and `TriggerService` to orchestrate summarization and trigger detection.
- Added new modules for summarization, including citation verification and cloud-based summarization providers.
- Implemented trigger detection based on audio activity and foreground application status.
- Updated project configuration with the new dependencies required for summarization and trigger detection.
- Added tests for the summarization and trigger services.
2025-12-18 00:08:51 +00:00
parent b36ee5c211
commit 4eef1b3be6
49 changed files with 15909 additions and 4256 deletions

CLAUDE.md Normal file

@@ -0,0 +1,103 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
NoteFlow is an intelligent meeting notetaker: local-first audio capture + navigable recall + evidence-linked summaries. Client-server architecture using gRPC for bidirectional audio streaming and transcription.
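As a rough illustration of the bidirectional flow described above, the sketch below models the server side as an async generator that consumes a stream of audio chunks and yields transcript updates as enough audio accumulates. It uses plain `asyncio` rather than gRPC, and every name in it (`AudioChunk`, `TranscriptUpdate`, `stream_transcription`, `min_bytes`) is hypothetical and not taken from the NoteFlow proto or server code.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioChunk:
    """Hypothetical stand-in for a streamed audio message."""
    samples: bytes


@dataclass(frozen=True)
class TranscriptUpdate:
    """Hypothetical stand-in for a streamed transcript message."""
    text: str


async def stream_transcription(
    chunks: AsyncIterator[AudioChunk], min_bytes: int = 4
) -> AsyncIterator[TranscriptUpdate]:
    """Consume audio as it arrives and emit updates without waiting for the stream to end."""
    buffered = bytearray()
    async for chunk in chunks:
        buffered.extend(chunk.samples)
        if len(buffered) >= min_bytes:
            # A real server would run ASR here; this placeholder only reports the size.
            yield TranscriptUpdate(text=f"<{len(buffered)} bytes transcribed>")
            buffered.clear()


async def demo() -> list[str]:
    async def microphone() -> AsyncIterator[AudioChunk]:
        # Three small chunks standing in for captured audio.
        for payload in (b"ab", b"cd", b"ef"):
            yield AudioChunk(payload)

    return [update.text async for update in stream_transcription(microphone())]
```

Both directions stay open at once: the server yields an update as soon as the buffer threshold is reached, while the client may still be sending audio.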
## Build and Development Commands
```bash
# Install (editable with dev dependencies)
python -m pip install -e ".[dev]"
# Run gRPC server
python -m noteflow.grpc.server --help
# Run Flet client UI
python -m noteflow.client.app --help
# Tests
pytest # Full suite
pytest -m "not integration" # Skip external-service tests
pytest tests/domain/ # Run specific test directory
pytest -k "test_segment" # Run by pattern
# Linting and type checking
ruff check . # Lint
ruff check --fix . # Autofix
mypy src/noteflow # Strict type checks
basedpyright # Additional type checks
```
## Architecture
```
src/noteflow/
├── domain/ # Entities (meeting, segment, annotation, summary) + ports (repository interfaces)
├── application/ # Use-cases/services (MeetingService, RecoveryService, ExportService)
├── infrastructure/ # Implementations
│ ├── audio/ # sounddevice capture, ring buffer, VU levels, playback
│ ├── asr/ # faster-whisper engine, VAD segmenter, streaming
│ ├── persistence/ # SQLAlchemy + asyncpg + pgvector, Alembic migrations
│ ├── security/ # keyring keystore, AES-GCM encryption
│ ├── export/ # Markdown/HTML export
│ └── converters/ # ORM ↔ domain entity converters
├── grpc/ # Proto definitions, server, client, meeting store
├── client/ # Flet UI app + components (transcript, VU meter, playback)
└── config/ # Pydantic settings (NOTEFLOW_ env vars)
```
**Key patterns:**
- Hexagonal architecture: domain → application → infrastructure
- Repository pattern with Unit of Work (`SQLAlchemyUnitOfWork`)
- gRPC bidirectional streaming for audio → transcript flow
- Protocol-based DI (see `domain/ports/` and infrastructure `protocols.py` files)
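
The repository, Unit of Work, and Protocol-based DI patterns listed above can be sketched together as follows. This is a minimal in-memory illustration under assumptions, not the project's actual code: `Meeting`, `MeetingRepository`, `UnitOfWork`, `InMemoryUnitOfWork`, and `create_meeting` are hypothetical names, and the real `SQLAlchemyUnitOfWork` may expose a different interface.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Meeting:
    """Hypothetical, simplified domain entity."""
    id: str
    title: str


class MeetingRepository(Protocol):
    """Port: the application layer depends on this shape, not on a database."""
    async def add(self, meeting: Meeting) -> None: ...
    async def get(self, meeting_id: str) -> Meeting | None: ...


class UnitOfWork(Protocol):
    """Port: groups repository calls into one commit-or-discard scope."""
    meetings: MeetingRepository
    async def __aenter__(self) -> UnitOfWork: ...
    async def __aexit__(self, *exc_info: object) -> None: ...
    async def commit(self) -> None: ...


class InMemoryMeetingRepository:
    def __init__(self, rows: dict[str, Meeting]) -> None:
        self._rows = rows

    async def add(self, meeting: Meeting) -> None:
        self._rows[meeting.id] = meeting

    async def get(self, meeting_id: str) -> Meeting | None:
        return self._rows.get(meeting_id)


class InMemoryUnitOfWork:
    """Adapter used here in place of a database-backed unit of work."""

    def __init__(self) -> None:
        self._committed: dict[str, Meeting] = {}
        self._pending: dict[str, Meeting] = {}
        self.meetings: MeetingRepository = InMemoryMeetingRepository(self._pending)

    def _reset_pending(self) -> None:
        # Mutate in place so the repository keeps pointing at the same dict.
        self._pending.clear()
        self._pending.update(self._committed)

    async def __aenter__(self) -> InMemoryUnitOfWork:
        self._reset_pending()
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # Anything not committed inside the scope is discarded.
        self._reset_pending()

    async def commit(self) -> None:
        self._committed = dict(self._pending)


async def create_meeting(uow: UnitOfWork, meeting_id: str, title: str) -> Meeting:
    """Application-layer use case: it only sees the UnitOfWork protocol."""
    async with uow:
        meeting = Meeting(id=meeting_id, title=title)
        await uow.meetings.add(meeting)
        await uow.commit()
    return meeting
```

Because `create_meeting` is typed against the protocol, tests can pass `InMemoryUnitOfWork()` while production code passes a database-backed implementation, with no inheritance required.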
## Database
PostgreSQL with pgvector extension. Async SQLAlchemy with asyncpg driver.
```bash
# Alembic migrations
alembic upgrade head
alembic revision --autogenerate -m "description"
```
Connection via `NOTEFLOW_DATABASE_URL` env var or settings.
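
A minimal sketch of how the URL might be resolved from the environment is shown below. It uses only the standard library as a stand-in for the project's Pydantic settings, and `DatabaseSettings`, `load_database_settings`, and the default URL are hypothetical. The `postgresql+asyncpg://` scheme check reflects how SQLAlchemy selects the asyncpg driver; whether NoteFlow enforces it is an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

ENV_PREFIX = "NOTEFLOW_"
ASYNC_SCHEME = "postgresql+asyncpg://"
# Hypothetical placeholder default, not taken from the project.
DEFAULT_DATABASE_URL = "postgresql+asyncpg://localhost/noteflow"


@dataclass(frozen=True)
class DatabaseSettings:
    database_url: str


def load_database_settings(env: Mapping[str, str] | None = None) -> DatabaseSettings:
    """Read NOTEFLOW_DATABASE_URL, falling back to the placeholder default."""
    source = os.environ if env is None else env
    url = source.get(f"{ENV_PREFIX}DATABASE_URL", DEFAULT_DATABASE_URL)
    if not url.startswith(ASYNC_SCHEME):
        # The async engine needs the asyncpg dialect named in the URL.
        raise ValueError(f"expected a URL starting with {ASYNC_SCHEME!r}, got {url!r}")
    return DatabaseSettings(database_url=url)
```

Passing an explicit mapping, for example `load_database_settings({"NOTEFLOW_DATABASE_URL": "postgresql+asyncpg://user@db/noteflow"})`, keeps the function easy to test without touching the real environment.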
## Testing Conventions
- Test files: `test_*.py`, functions: `test_*`
- Markers: `@pytest.mark.slow` (model loading), `@pytest.mark.integration` (external services)
- Integration tests use testcontainers for PostgreSQL
- Asyncio auto-mode enabled
## Proto/gRPC
Proto definitions: `src/noteflow/grpc/proto/noteflow.proto`
Generated files excluded from lint: `*_pb2.py`, `*_pb2_grpc.py`
Regenerate after proto changes:
```bash
python -m grpc_tools.protoc -I src/noteflow/grpc/proto \
    --python_out=src/noteflow/grpc/proto \
    --grpc_python_out=src/noteflow/grpc/proto \
    src/noteflow/grpc/proto/noteflow.proto
```
## Code Style
- Python 3.12+, 100-char line length
- Strict mypy (allow `type: ignore[code]` only with a comment explaining why)
- Ruff for linting (E, W, F, I, B, C4, UP, SIM, RUF)
- Module soft limit 500 LoC, hard limit 750 LoC
## Spikes (De-risking Experiments)
`spikes/` contains validated platform experiments with `FINDINGS.md`:
- `spike_01_ui_tray_hotkeys/` - Flet + pystray + pynput (requires X11)
- `spike_02_audio_capture/` - sounddevice + PortAudio
- `spike_03_asr_latency/` - faster-whisper benchmarks (0.05x real-time)
- `spike_04_encryption/` - keyring + AES-GCM (826 MB/s throughput)