Add summarization and trigger services

- Introduced `SummarizationService` and `TriggerService` to orchestrate summarization and trigger detection.
- Added new modules for summarization, including citation verification and cloud-based summarization providers.
- Implemented trigger detection based on audio activity and foreground application status.
- Updated project configuration with the new dependencies for summarization and triggers.
- Added tests for the summarization and trigger services.
2025-12-18 00:08:51 +00:00
parent b36ee5c211
commit 4eef1b3be6
49 changed files with 15909 additions and 4256 deletions

@@ -94,6 +94,32 @@ I’m writing this so engineering can start building without reinterpreting p
* Final segments persisted to DB
* Post-meeting transcript view
**Current status:**
* Final segments are emitted and persisted; partial updates are not yet produced.
**Implementation plan (add partials end-to-end):**
* ASR layer:
* Extend ASR engine interface to surface partial hypotheses at a fixed cadence
(e.g., every N seconds or on each VAD speech chunk).
* Add a lightweight streaming mode for faster-whisper (or a buffering strategy
that returns interim text from recent audio while finalization waits for
silence).
* Ensure partial outputs include a stable `segment_id=0` (or a temporary ID)
and are never persisted to the DB.
* Server:
* Emit `UPDATE_TYPE_PARTIAL` messages from the ASR loop at that cadence.
* Debounce partial updates to avoid UI churn and bandwidth spikes.
* Keep final segment emission unchanged; partials must be overwritten by finals.
* Client/UI:
* Render a single “live partial” row at the bottom of the transcript list
(grey text), replaced in-place on each partial update.
* Drop the partial row on stop or when the first final segment after it arrives.
* Tests:
* Unit tests for partial cadence and suppression of partial persistence.
* Integration test that partials appear before finals and are cleared on final.
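A minimal Python sketch of the partial/final contract above, assuming hypothetical names (`UpdateType`, `TranscriptUpdate`, `SegmentSink`, `LiveTranscript`) rather than the real message types. The server side rate-limits partials with a simple time-based throttle (one way to implement the debounce) and persists only finals; the client keeps at most one live partial row and clears it when a final arrives or capture stops.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class UpdateType(Enum):
    PARTIAL = "partial"  # stands in for UPDATE_TYPE_PARTIAL
    FINAL = "final"


PARTIAL_SEGMENT_ID = 0  # placeholder ID shared by every partial; never persisted


@dataclass(frozen=True)
class TranscriptUpdate:
    update_type: UpdateType
    segment_id: int
    text: str


class SegmentSink:
    """Server side: throttles partials and persists only finals."""

    def __init__(self, min_interval: float = 0.5) -> None:
        self.min_interval = min_interval  # seconds between emitted partials
        self._last_partial_at: float | None = None
        self.persisted: list[TranscriptUpdate] = []  # stand-in for DB writes

    def on_partial(self, text: str, now: float) -> TranscriptUpdate | None:
        # Drop partials that arrive too soon after the last one emitted.
        if self._last_partial_at is not None and now - self._last_partial_at < self.min_interval:
            return None
        self._last_partial_at = now
        return TranscriptUpdate(UpdateType.PARTIAL, PARTIAL_SEGMENT_ID, text)

    def on_final(self, segment_id: int, text: str) -> TranscriptUpdate:
        update = TranscriptUpdate(UpdateType.FINAL, segment_id, text)
        self.persisted.append(update)  # only finals reach storage
        return update


@dataclass
class LiveTranscript:
    """Client side: finalized rows plus at most one grey live-partial row."""

    finals: list[TranscriptUpdate] = field(default_factory=list)
    partial: TranscriptUpdate | None = None

    def apply(self, update: TranscriptUpdate) -> None:
        if update.update_type is UpdateType.PARTIAL:
            self.partial = update  # replace the live row in place
        else:
            self.finals.append(update)
            self.partial = None  # a final supersedes any pending partial

    def stop(self) -> None:
        self.partial = None  # drop partials when capture stops
```

A throttle rather than a trailing-edge debounce keeps the live row moving during long utterances, since a pure debounce only fires after a pause.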
**Exit criteria:**
* Live view shows partial text that settles into final segments.
@@ -112,6 +138,12 @@ I’m writing this so engineering can start building without reinterpreting p
* Export: Markdown + HTML
* Meeting library list + per-meeting search
**Gaps to close in this milestone:**
* Wire meeting library into the main UI and selection flow.
* Add per-meeting transcript search (client-side filter is acceptable for V1).
* Add `risk` annotation type end-to-end (domain enum, UI, persistence).
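Client-side per-meeting search can be as small as a case-insensitive substring filter, and the `risk` type only needs a stable string value to round-trip through persistence. In this Python sketch, `Segment`, `search_segments`, and every `AnnotationType` member other than `RISK` are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class AnnotationType(Enum):
    # Only RISK comes from the plan; the other members are placeholders.
    ACTION_ITEM = "action_item"
    DECISION = "decision"
    NOTE = "note"
    RISK = "risk"  # persisted as the string "risk"


@dataclass(frozen=True)
class Segment:
    segment_id: int
    start_time: float  # seconds from meeting start
    text: str


def search_segments(segments: Iterable[Segment], query: str) -> list[Segment]:
    """Case-insensitive substring filter over one meeting's segments."""
    needle = query.strip().casefold()
    if not needle:
        return list(segments)  # an empty query shows the full transcript
    return [s for s in segments if needle in s.text.casefold()]
```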
**Exit criteria:**
* Clicking a segment seeks audio playback to that time.
@@ -132,6 +164,12 @@ I’m writing this so engineering can start building without reinterpreting p
* Prompt notification + snooze + suppress per-app
* Settings for sensitivity and auto-start opt-in
**Deferred to a later, tray/hotkey-focused milestone:**
* Trigger prompts that include per-app suppression, calendar stubs, and
snooze presets integrated with tray/menubar UX.
* Persistent “recording/monitoring” indicator when background capture is active.
**Exit criteria:**
* Trigger prompts happen when expected and can be snoozed.
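One way to gate trigger prompts, sketched in Python under stated assumptions: the two signals come from audio activity and the foreground application, and the weights, the threshold, and all names (`TriggerSignals`, `TriggerGate`) are hypothetical rather than the actual confidence model. Snooze and per-app suppression are checked before the confidence score.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights; a real confidence model would be tuned on recordings.
AUDIO_WEIGHT = 0.5
MEETING_APP_WEIGHT = 0.4


@dataclass(frozen=True)
class TriggerSignals:
    audio_active: bool  # sustained audio activity on the capture device
    foreground_app: str  # name of the focused application
    is_meeting_app: bool  # foreground app matches a known meeting-app list


@dataclass
class TriggerGate:
    threshold: float = 0.6  # maps to the user's sensitivity setting
    suppressed_apps: set[str] = field(default_factory=set)
    snoozed_until: float = 0.0  # timestamp in seconds

    def confidence(self, s: TriggerSignals) -> float:
        score = 0.0
        if s.audio_active:
            score += AUDIO_WEIGHT
        if s.is_meeting_app:
            score += MEETING_APP_WEIGHT
        return score

    def should_prompt(self, s: TriggerSignals, now: float) -> bool:
        if now < self.snoozed_until:
            return False  # user snoozed prompts
        if s.foreground_app in self.suppressed_apps:
            return False  # user suppressed prompts for this app
        return self.confidence(s) >= self.threshold

    def snooze(self, now: float, minutes: float) -> None:
        self.snoozed_until = now + minutes * 60

    def suppress_app(self, app: str) -> None:
        self.suppressed_apps.add(app)
```

Checking snooze and suppression first means an explicit user decision always wins over the model, whatever the sensitivity setting.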
@@ -153,6 +191,27 @@ I’m writing this so engineering can start building without reinterpreting p
* Citation verifier + “uncited drafts” handling
* Summary UI panel with clickable citations
**Implementation plan (citations enforced):**
* Summarizer provider interface:
* Define `Summarizer` protocol with `extract()` and `synthesize()` phases.
* Provide `MockSummarizer` for tests and a cloud-backed provider behind opt-in.
* Extraction stage:
* Segment-aware chunking (~500 tokens) with stable `segment_ids` in each chunk.
* Extraction prompt returns structured items: quote, segment_ids, category.
* Synthesis stage:
* Rewrite extracted items into bullets; each bullet must end with
`[...]` containing segment IDs.
* Verification stage:
* Parse bullets; suppress any uncited bullets by default.
* Store uncited drafts separately for optional user review.
* UI:
* Summary panel lists key points + action items with clickable citations.
* Clicking a bullet scrolls the transcript and seeks audio to the first cited segment.
* Tests:
* Unit tests for citation parsing, uncited suppression, and click→segment mapping.
* Integration test for summary generation request and persisted citations.
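The provider interface and extraction chunking could look roughly like this Python sketch. `Summarizer` is modeled as a `typing.Protocol` with the two phases named in the plan; `Segment`, `ExtractedItem`, `chunk_segments`, and the category labels are hypothetical, and whitespace word counts stand in for a real tokenizer. Chunks always hold whole segments, so every extracted item keeps its `segment_ids`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Segment:
    segment_id: int
    text: str


@dataclass(frozen=True)
class ExtractedItem:
    quote: str
    segment_ids: tuple[int, ...]
    category: str  # e.g. "key_point" or "action_item" (placeholder labels)


class Summarizer(Protocol):
    def extract(self, chunk: Sequence[Segment]) -> list[ExtractedItem]: ...

    def synthesize(self, items: Sequence[ExtractedItem]) -> list[str]: ...


def chunk_segments(segments: Sequence[Segment], max_tokens: int = 500) -> list[list[Segment]]:
    """Pack whole segments into chunks of roughly max_tokens; a segment is never split."""
    chunks: list[list[Segment]] = []
    current: list[Segment] = []
    used = 0
    for seg in segments:
        cost = len(seg.text.split())  # crude token estimate: whitespace words
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(seg)
        used += cost
    if current:
        chunks.append(current)
    return chunks


class MockSummarizer:
    """Deterministic test double: one item per segment, cited by its own ID."""

    def extract(self, chunk: Sequence[Segment]) -> list[ExtractedItem]:
        return [ExtractedItem(s.text, (s.segment_id,), "key_point") for s in chunk]

    def synthesize(self, items: Sequence[ExtractedItem]) -> list[str]:
        # Every bullet ends with "[...]" listing its segment IDs.
        return [f"{i.quote} [{', '.join(map(str, i.segment_ids))}]" for i in items]
```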
**Exit criteria:**
* Every displayed bullet has citations.
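To enforce this criterion, the verification stage can split bullets into a displayable set and uncited drafts. This Python sketch assumes bullets end with a bracketed list of integer IDs such as `[3, 5]`, and it also treats citations to unknown segment IDs as uncited, which goes beyond the plan as written; `verify_bullets`, `VerifiedSummary`, and `seek_time_for` are hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Iterable, Mapping

# A trailing "[3]" or "[3, 5]": one or more integer segment IDs at the end of the bullet.
CITATION_RE = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]\s*$")


@dataclass
class VerifiedSummary:
    cited: list[tuple[str, list[int]]] = field(default_factory=list)
    uncited_drafts: list[str] = field(default_factory=list)  # hidden unless the user opts in


def verify_bullets(bullets: Iterable[str], known_ids: set[int]) -> VerifiedSummary:
    result = VerifiedSummary()
    for bullet in bullets:
        match = CITATION_RE.search(bullet)
        ids = [int(x) for x in re.split(r"\s*,\s*", match.group(1))] if match else []
        if match and all(i in known_ids for i in ids):
            # Keep the bullet text without its citation suffix, plus the parsed IDs.
            result.cited.append((bullet[: match.start()].rstrip(), ids))
        else:
            result.uncited_drafts.append(bullet)
    return result


def seek_time_for(ids: list[int], start_times: Mapping[int, float]) -> float:
    """Clicking a bullet seeks audio to the first cited segment's start time."""
    return start_times[ids[0]]
```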
@@ -173,6 +232,19 @@ I’m writing this so engineering can start building without reinterpreting p
* “Check for updates” flow (manual link + version display)
* Release checklist & troubleshooting docs
**Implementation plan (delete/retention correctness):**
* Meeting deletion:
* Extend delete flow to remove encrypted audio assets on disk.
* Delete wrapped DEK and master key references so audio cannot be decrypted.
* Add best-effort cleanup for orphaned files on next startup.
* Retention:
* Scheduled job that deletes meetings older than the configured retention period (in days).
* Include DB rows, summaries, and audio assets in the purge.
* Tests:
* Integration test that deletion removes the DB rows and the audio file on disk.
* Integration test that retention job removes expired meetings and assets.
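A sketch of the delete and retention paths in Python, using plain dictionaries as hypothetical stand-ins for the DB tables and the wrapped-DEK store; master key references are not modeled. Directories follow the `meetings/<meeting_id>/` layout, and the startup sweep removes meeting directories that no longer have a DB row.

```python
from __future__ import annotations

import shutil
from datetime import datetime, timedelta
from pathlib import Path


def delete_meeting(meeting_id: str, meetings_db: dict[str, datetime],
                   wrapped_deks: dict[str, bytes], meetings_root: Path) -> None:
    meetings_db.pop(meeting_id, None)  # stand-in for DB rows: meeting, segments, summaries
    wrapped_deks.pop(meeting_id, None)  # without its wrapped DEK the audio cannot be decrypted
    # Best effort: a leftover directory is caught by cleanup_orphans on next startup.
    shutil.rmtree(meetings_root / meeting_id, ignore_errors=True)


def purge_expired(meetings_db: dict[str, datetime], wrapped_deks: dict[str, bytes],
                  meetings_root: Path, retention_days: int, now: datetime) -> list[str]:
    cutoff = now - timedelta(days=retention_days)
    expired = [mid for mid, created_at in meetings_db.items() if created_at < cutoff]
    for mid in expired:  # list built first so the dict is not mutated while iterating
        delete_meeting(mid, meetings_db, wrapped_deks, meetings_root)
    return expired


def cleanup_orphans(meetings_db: dict[str, datetime], meetings_root: Path) -> list[str]:
    """Startup sweep: remove meeting directories that have no DB row."""
    if not meetings_root.exists():
        return []
    removed = []
    for child in meetings_root.iterdir():
        if child.is_dir() and child.name not in meetings_db:
            shutil.rmtree(child, ignore_errors=True)
            removed.append(child.name)
    return sorted(removed)
```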
**Exit criteria:**
* A signed installer (or an unsigned one for internal builds) that installs and runs on both OSs.

@@ -1,6 +1,6 @@
Below is a rewritten, end-to-end **Product Specification + Engineering Design Document** for **NoteFlow V1 (Minimum Lovable Product)** that merges:
-* your **revised V1 draft** (confidence-model triggers, single-process, partial/final UX, extract-then-synthesize citations, pragmatic typing, packaging constraints, risks table), and
* your **revised V1 draft** (confidence-model triggers, client/server architecture, partial/final UX, extract-then-synthesize citations, pragmatic typing, packaging constraints, risks table), and
* the **de-risking feedback** I gave earlier (audio capture reality, diarization scope, citation enforcement, OS permissions, shipping concerns, storage/retention, update strategy, and “don’t promise what you can’t reliably ship”).
I’ve kept it “shipping-ready” by being explicit about decisions, failure modes, acceptance criteria, and what is deferred.
@@ -292,7 +292,9 @@ The system is split into two components that can run on the same machine or sepa
**Server (Headless Backend)**
* **ASR Engine:** faster-whisper for transcription
* **Meeting Store:** in-memory meeting management
-* **Storage:** LanceDB for persistence + encrypted audio assets
* **Storage:** PostgreSQL + pgvector for persistence + encrypted audio assets
(current implementation). LanceDB is supported as an optional adapter for
local-only deployments in single-process mode.
* **gRPC Service:** bidirectional streaming for real-time transcription
**Client (GUI Application)**
@@ -310,6 +312,8 @@ The system is split into two components that can run on the same machine or sepa
**Deployment modes:**
1. **Local:** Server + Client on same machine (default)
2. **Split:** Server on headless machine, Client on workstation with audio
3. **Local-only adapter:** Optional LanceDB-backed, single-process mode
for development or constrained environments (feature-parity not guaranteed).
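The three deployment modes could be resolved into runtime settings along these lines. This is a hypothetical Python sketch: the enum names, the `resolve_config` helper, and port 50051 (a port commonly used in gRPC examples) are assumptions, and it assumes the local-only mode runs the server in-process with no gRPC hop.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

DEFAULT_PORT = 50051  # assumption: a port often used in gRPC examples, not from this spec


class DeploymentMode(Enum):
    LOCAL = "local"  # server + client on one machine (default)
    SPLIT = "split"  # headless server, client on the workstation with audio
    LOCAL_ONLY = "local_only"  # optional LanceDB-backed single-process mode


class StorageBackend(Enum):
    POSTGRES = "postgres"  # PostgreSQL + pgvector
    LANCEDB = "lancedb"


@dataclass(frozen=True)
class RuntimeConfig:
    mode: DeploymentMode
    storage: StorageBackend
    server_address: str | None  # None: no gRPC hop, the server runs in-process


def resolve_config(mode: DeploymentMode, remote_host: str | None = None) -> RuntimeConfig:
    if mode is DeploymentMode.LOCAL:
        return RuntimeConfig(mode, StorageBackend.POSTGRES, f"localhost:{DEFAULT_PORT}")
    if mode is DeploymentMode.SPLIT:
        if not remote_host:
            raise ValueError("split mode requires the headless server's host")
        return RuntimeConfig(mode, StorageBackend.POSTGRES, f"{remote_host}:{DEFAULT_PORT}")
    return RuntimeConfig(mode, StorageBackend.LANCEDB, None)
```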
---
@@ -427,11 +431,17 @@ Supported provider modes:
## 9. Storage & Data Model
**Backend support:** The reference implementation uses PostgreSQL + pgvector.
LanceDB is supported as an optional adapter for local-only, single-process
deployments. The schema below describes the logical model and should be mapped
to either backend.
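Keeping the schema logical means service code can depend on a small repository interface that each backend adapter implements. The `Meeting` fields, `MeetingRepository`, `rename`, and the in-memory adapter in this Python sketch are hypothetical; a PostgreSQL or LanceDB adapter would expose the same methods.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Protocol


@dataclass(frozen=True)
class Meeting:
    meeting_id: str
    title: str
    created_at: datetime


class MeetingRepository(Protocol):
    """Logical operations every backend adapter would implement."""

    def save(self, meeting: Meeting) -> None: ...

    def get(self, meeting_id: str) -> Meeting | None: ...

    def delete(self, meeting_id: str) -> bool: ...


class InMemoryMeetingRepository:
    """Test double standing in for a PostgreSQL or LanceDB adapter."""

    def __init__(self) -> None:
        self._rows: dict[str, Meeting] = {}

    def save(self, meeting: Meeting) -> None:
        self._rows[meeting.meeting_id] = meeting

    def get(self, meeting_id: str) -> Meeting | None:
        return self._rows.get(meeting_id)

    def delete(self, meeting_id: str) -> bool:
        return self._rows.pop(meeting_id, None) is not None


def rename(repo: MeetingRepository, meeting_id: str, title: str) -> bool:
    """Service code depends only on the protocol, not on a specific backend."""
    meeting = repo.get(meeting_id)
    if meeting is None:
        return False
    repo.save(Meeting(meeting.meeting_id, title, meeting.created_at))
    return True
```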
### 9.1 On-Disk Layout (Per User)
* App data directory (OS standard)
-* `db/` (LanceDB)
* `db/` (PostgreSQL + pgvector)
* `lancedb/` (optional local-only adapter)
* `meetings/<meeting_id>/`
* `audio.<ext>` (encrypted container)
@@ -439,7 +449,7 @@ Supported provider modes:
* `logs/` (rotating; content-free)
* `settings.json`
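A small path helper keeps the layout above in one place. This Python sketch assumes the caller supplies the OS-standard app data directory as `root`; `AppPaths` and its method names are hypothetical, and the meeting-ID check assumes IDs are opaque tokens (for example UUIDs) that never contain path separators.

```python
from __future__ import annotations

from pathlib import Path


class AppPaths:
    """Resolves the per-user on-disk layout under an app data root."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.db = root / "db"
        self.lancedb = root / "lancedb"  # optional local-only adapter
        self.meetings = root / "meetings"
        self.logs = root / "logs"
        self.settings = root / "settings.json"

    def meeting_dir(self, meeting_id: str) -> Path:
        # Reject IDs that could escape meetings/ (e.g. "../x" or "a/b").
        if not meeting_id or meeting_id in {".", ".."} or Path(meeting_id).name != meeting_id:
            raise ValueError(f"invalid meeting id: {meeting_id!r}")
        return self.meetings / meeting_id

    def audio_path(self, meeting_id: str, ext: str) -> Path:
        return self.meeting_dir(meeting_id) / f"audio.{ext}"  # encrypted container
```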
-### 9.2 Database Schema (LanceDB)
### 9.2 Database Schema (PostgreSQL baseline)
Core tables: