Add summarization and trigger services

- Introduced `SummarizationService` and `TriggerService` to orchestrate summarization and trigger detection functionalities.
- Added new modules for summarization, including citation verification and cloud-based summarization providers.
- Implemented trigger detection based on audio activity and foreground application status.
- Updated project configuration to include new dependencies for summarization and trigger functionalities.
- Created tests for summarization and trigger services to ensure functionality and reliability.
2025-12-18 00:08:51 +00:00
parent b36ee5c211
commit 4eef1b3be6
49 changed files with 15909 additions and 4256 deletions
--- a/docs/milestones.md
+++ b/docs/milestones.md
@@ -94,6 +94,32 @@ I’m writing this so engineering can start building without re‑interpreting p
 * Final segments persisted to DB
 * Post-meeting transcript view

+**Current status:**
+
+* Final segments are emitted and persisted; partial updates are not yet produced.
+
+**Implementation plan (add partials end-to-end):**
+
+* ASR layer:
+  * Extend ASR engine interface to surface partial hypotheses at a fixed cadence
+    (e.g., every N seconds or on each VAD speech chunk).
+  * Add a lightweight streaming mode for faster-whisper (or a buffering strategy
+    that returns interim text from recent audio while finalization waits for
+    silence).
+  * Ensure partial outputs include a stable `segment_id=0` (or temporary ID)
+    and do not persist to DB.
+* Server:
+  * Emit `UPDATE_TYPE_PARTIAL` messages from the ASR loop on cadence.
+  * Debounce partial updates to avoid UI churn and bandwidth spikes.
+  * Keep final segment emission unchanged; partials must be overwritten by finals.
+* Client/UI:
+  * Render a single “live partial” row at the bottom of the transcript list
+    (grey text), replaced in-place on each partial update.
+  * Drop partials on stop or on first final segment after a partial.
+* Tests:
+  * Unit tests for partial cadence and suppression of partial persistence.
+  * Integration test that partials appear before finals and are cleared on final.
+
 **Exit criteria:**

 * Live view shows partial text that settles into final segments.
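
Reviewer sketch for the hunk above: a minimal illustration of the server-side debounce and the client-side "single live partial row" behaviour described in the plan. `PartialDebouncer`, `LiveTranscript`, and all method names are hypothetical and not taken from the NoteFlow codebase; the clock is passed in as a plain float so the logic can be tested without real time.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PartialDebouncer:
    """Allow a partial update only if min_interval seconds have passed
    since the last emitted one (hypothetical helper)."""

    min_interval: float
    _last_emit: Optional[float] = None

    def should_emit(self, now: float) -> bool:
        if self._last_emit is None or now - self._last_emit >= self.min_interval:
            self._last_emit = now
            return True
        return False


@dataclass
class LiveTranscript:
    """Client-side model: persisted finals plus at most one live partial."""

    finals: List[Tuple[int, str]] = field(default_factory=list)
    partial: Optional[str] = None

    def on_partial(self, text: str) -> None:
        # Replace the live row in place; partials are never stored as finals.
        self.partial = text

    def on_final(self, segment_id: int, text: str) -> None:
        # A final segment overwrites whatever partial was being shown.
        self.finals.append((segment_id, text))
        self.partial = None

    def on_stop(self) -> None:
        # Drop any dangling partial when recording stops.
        self.partial = None
```

In this sketch the debouncer would sit in the ASR loop before an `UPDATE_TYPE_PARTIAL` message is sent, while `LiveTranscript` mirrors the rule that partials are replaced in place and cleared by the first final.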
@@ -112,6 +138,12 @@ I’m writing this so engineering can start building without re‑interpreting p
 * Export: Markdown + HTML
 * Meeting library list + per-meeting search

+**Gaps to close in this milestone:**
+
+* Wire meeting library into the main UI and selection flow.
+* Add per-meeting transcript search (client-side filter is acceptable for V1).
+* Add `risk` annotation type end-to-end (domain enum, UI, persistence).
+
 **Exit criteria:**

 * Clicking a segment seeks audio playback to that time.
@@ -132,6 +164,12 @@ I’m writing this so engineering can start building without re‑interpreting p
 * Prompt notification + snooze + suppress per-app
 * Settings for sensitivity and auto-start opt-in

+**Deferred to a later, tray/hotkey-focused milestone:**
+
+* Trigger prompts that include per-app suppression, calendar stubs, and
+  snooze presets integrated with tray/menubar UX.
+* Persistent “recording/monitoring” indicator when background capture is active.
+
 **Exit criteria:**

 * Trigger prompts happen when expected and can be snoozed.
@@ -153,6 +191,27 @@ I’m writing this so engineering can start building without re‑interpreting p
 * Citation verifier + “uncited drafts” handling
 * Summary UI panel with clickable citations

+**Implementation plan (citations enforced):**
+
+* Summarizer provider interface:
+  * Define `Summarizer` protocol with `extract()` and `synthesize()` phases.
+  * Provide `MockSummarizer` for tests and a cloud-backed provider behind opt-in.
+* Extraction stage:
+  * Segment-aware chunking (~500 tokens) with stable `segment_ids` in each chunk.
+  * Extraction prompt returns structured items: quote, segment_ids, category.
+* Synthesis stage:
+  * Rewrite extracted items into bullets; each bullet must end with
+    `[...]` containing segment IDs.
+* Verification stage:
+  * Parse bullets; suppress any uncited bullets by default.
+  * Store uncited drafts separately for optional user review.
+* UI:
+  * Summary panel lists key points + action items with clickable citations.
+  * Clicking a bullet scrolls transcript and seeks audio to the first segment.
+* Tests:
+  * Unit tests for citation parsing, uncited suppression, and click→segment mapping.
+  * Integration test for summary generation request and persisted citations.
+
 **Exit criteria:**

 * Every displayed bullet has citations.
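
Reviewer sketch for the hunk above: one way the verification stage could parse bullets and separate cited from uncited ones. It assumes the trailing `[...]` holds comma-separated integer segment IDs (the plan does not fix the exact syntax), and it additionally treats citations to unknown segment IDs as uncited, which is an assumption of this sketch. `Bullet`, `parse_bullet`, and `verify` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Assumed citation syntax: a trailing "[1, 2, 3]" of integer segment IDs.
_CITATION_RE = re.compile(r"\[\s*(\d+(?:\s*,\s*\d+)*)\s*\]\s*$")


@dataclass(frozen=True)
class Bullet:
    text: str
    segment_ids: Tuple[int, ...]


def parse_bullet(line: str) -> Bullet:
    """Split a bullet line into its text and trailing segment IDs."""
    body = line.strip().lstrip("-*").strip()
    match = _CITATION_RE.search(body)
    if match is None:
        return Bullet(body, ())
    ids = tuple(int(part) for part in match.group(1).split(","))
    return Bullet(body[: match.start()].rstrip(), ids)


def verify(
    lines: Iterable[str], valid_segment_ids: Set[int]
) -> Tuple[List[Bullet], List[Bullet]]:
    """Return (cited, uncited). Uncited bullets are kept separately so they
    can be stored as drafts instead of being displayed."""
    cited: List[Bullet] = []
    uncited: List[Bullet] = []
    for line in lines:
        if not line.strip():
            continue
        bullet = parse_bullet(line)
        known = all(s in valid_segment_ids for s in bullet.segment_ids)
        if bullet.segment_ids and known:
            cited.append(bullet)
        else:
            uncited.append(bullet)
    return cited, uncited
```

Under these assumptions, the click-to-seek target for a displayed bullet would simply be `bullet.segment_ids[0]`.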
@@ -173,6 +232,19 @@ I’m writing this so engineering can start building without re‑interpreting p
 * “Check for updates” flow (manual link + version display)
 * Release checklist & troubleshooting docs

+**Implementation plan (delete/retention correctness):**
+
+* Meeting deletion:
+  * Extend delete flow to remove encrypted audio assets on disk.
+  * Delete wrapped DEK and master key references so audio cannot be decrypted.
+  * Add best-effort cleanup for orphaned files on next startup.
+* Retention:
+  * Scheduled job that deletes meetings older than retention days.
+  * Include DB rows, summaries, and audio assets in the purge.
+* Tests:
+  * Integration test that delete removes DB rows + audio file path.
+  * Integration test that retention job removes expired meetings and assets.
+
 **Exit criteria:**

 * A signed installer (or unsigned for internal) that installs and runs on both OSs.
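
Reviewer sketch for the hunk above: a minimal shape for the retention job, assuming each meeting has a timezone-aware end timestamp and that row and asset deletion are supplied as callables. `MeetingRecord`, `expired_meetings`, `purge_expired`, and the `ended_at` field are hypothetical; the real job would call the project's own repository and asset store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class MeetingRecord:
    meeting_id: str
    ended_at: datetime  # assumed field: timezone-aware end time


def expired_meetings(
    meetings: Iterable[MeetingRecord], retention_days: int, now: datetime
) -> List[MeetingRecord]:
    """Meetings that ended strictly before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [m for m in meetings if m.ended_at < cutoff]


def purge_expired(
    meetings: Iterable[MeetingRecord],
    retention_days: int,
    now: datetime,
    delete_rows: Callable[[str], None],
    delete_assets: Callable[[str], None],
) -> List[str]:
    """Delete assets, then DB rows, for every expired meeting.

    Assets go first in this sketch so that a failed asset delete leaves the
    DB row in place and the next run can retry it.
    """
    purged: List[str] = []
    for meeting in expired_meetings(meetings, retention_days, now):
        delete_assets(meeting.meeting_id)
        delete_rows(meeting.meeting_id)
        purged.append(meeting.meeting_id)
    return purged
```

Passing `now` explicitly keeps the cutoff logic deterministic, which suits the integration test named in the plan.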
--- a/docs/spec.md
+++ b/docs/spec.md
@@ -1,6 +1,6 @@
 Below is a rewritten, end‑to‑end **Product Specification + Engineering Design Document** for **NoteFlow V1 (Minimum Lovable Product)** that merges:

-* your **revised V1 draft** (confidence-model triggers, single-process, partial/final UX, extract‑then‑synthesize citations, pragmatic typing, packaging constraints, risks table), and
+* your **revised V1 draft** (confidence-model triggers, client/server architecture, partial/final UX, extract‑then‑synthesize citations, pragmatic typing, packaging constraints, risks table), and
 * the **de-risking feedback** I gave earlier (audio capture reality, diarization scope, citation enforcement, OS permissions, shipping concerns, storage/retention, update strategy, and “don’t promise what you can’t reliably ship”).

 I’ve kept it “shipping-ready” by being explicit about decisions, failure modes, acceptance criteria, and what is deferred.
@@ -292,7 +292,9 @@ The system is split into two components that can run on the same machine or sepa
 **Server (Headless Backend)**
 * **ASR Engine:** faster-whisper for transcription
 * **Meeting Store:** in-memory meeting management
-* **Storage:** LanceDB for persistence + encrypted audio assets
+* **Storage:** PostgreSQL + pgvector for persistence + encrypted audio assets
+  (current implementation). LanceDB is supported as an optional adapter for
+  local-only deployments in single-process mode.
 * **gRPC Service:** bidirectional streaming for real-time transcription

 **Client (GUI Application)**
@@ -310,6 +312,8 @@ The system is split into two components that can run on the same machine or sepa
 **Deployment modes:**
 1. **Local:** Server + Client on same machine (default)
 2. **Split:** Server on headless machine, Client on workstation with audio
+3. **Local-only adapter:** Optional LanceDB-backed, single-process mode
+   for development or constrained environments (feature-parity not guaranteed).

 ---

@@ -427,11 +431,17 @@ Supported provider modes:

 ## 9. Storage & Data Model

+**Backend support:** The reference implementation uses PostgreSQL + pgvector.
+LanceDB is supported as an optional adapter for local-only, single-process
+deployments. The schema below describes the logical model and should be mapped
+to either backend.
+
 ### 9.1 On-Disk Layout (Per User)

 * App data directory (OS standard)

-  * `db/` (LanceDB)
+  * `db/` (PostgreSQL + pgvector)
+  * `lancedb/` (optional local-only adapter)
  * `meetings/<meeting_id>/`

    * `audio.<ext>` (encrypted container)
@@ -439,7 +449,7 @@ Supported provider modes:
  * `logs/` (rotating; content-free)
  * `settings.json`

-### 9.2 Database Schema (LanceDB)
+### 9.2 Database Schema (PostgreSQL baseline)

 Core tables: