Add summarization and trigger services

- Introduced `SummarizationService` and `TriggerService` to orchestrate summarization and trigger detection.
- Added summarization modules, including citation verification and cloud-based summarization providers.
- Implemented trigger detection based on audio activity and foreground application status.
- Updated project configuration with the new summarization and trigger dependencies.
- Added tests for the summarization and trigger services.

* Final segments persisted to DB
* Post-meeting transcript view

**Current status:**

* Final segments are emitted and persisted; partial updates are not yet produced.

**Implementation plan (add partials end-to-end):**

* ASR layer:
  * Extend the ASR engine interface to surface partial hypotheses at a fixed cadence (e.g., every N seconds or on each VAD speech chunk).
  * Add a lightweight streaming mode for faster-whisper (or a buffering strategy that returns interim text from recent audio while finalization waits for silence).
  * Ensure partial outputs carry a fixed placeholder `segment_id=0` (or a temporary ID) and are never persisted to the DB.
* Server:
  * Emit `UPDATE_TYPE_PARTIAL` messages from the ASR loop at that cadence.
  * Debounce partial updates to avoid UI churn and bandwidth spikes.
  * Keep final segment emission unchanged; a final always overwrites the partial it supersedes.
* Client/UI:
  * Render a single “live partial” row (grey text) at the bottom of the transcript list, replaced in place on each partial update.
  * Clear the partial row on stop, or when the first final segment arrives after a partial.
* Tests:
  * Unit tests for partial cadence and for suppression of partial persistence.
  * Integration test that partials appear before finals and are cleared when the final arrives.
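
A minimal sketch of the server-side rate limiting for partials, assuming every update from the ASR loop passes through one object before it is sent to clients. `UpdateType`, `Update`, `PartialThrottle`, and the 0.5 s default for `min_interval_s` are hypothetical; only `UPDATE_TYPE_PARTIAL` and the placeholder `segment_id=0` come from the plan. Strictly, this is a leading-edge throttle rather than a trailing debounce: partials that arrive too soon are dropped, which is acceptable because the next cadence tick carries newer text anyway.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class UpdateType(Enum):
    PARTIAL = "partial"  # mirrors UPDATE_TYPE_PARTIAL on the wire
    FINAL = "final"


@dataclass(frozen=True)
class Update:
    update_type: UpdateType
    text: str
    segment_id: int  # 0 = placeholder for partials; never persisted


class PartialThrottle:
    """Rate-limits partial updates; finals always pass through."""

    def __init__(self, min_interval_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent_at: Optional[float] = None
        self._last_text: Optional[str] = None

    def offer(self, update: Update) -> Optional[Update]:
        """Return the update if it should be sent to clients, else None."""
        if update.update_type is UpdateType.FINAL:
            # A final supersedes the partial; reset so the next partial goes out at once.
            self._last_sent_at = None
            self._last_text = None
            return update
        if update.text == self._last_text:
            return None  # unchanged hypothesis, nothing new to show
        now = self._clock()
        if self._last_sent_at is not None and now - self._last_sent_at < self._min_interval_s:
            return None  # too soon after the previous partial
        self._last_sent_at = now
        self._last_text = update.text
        return update
```

Finals bypass the throttle and reset it, so the first partial of the next utterance is never held back behind a stale one.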

**Exit criteria:**
* Live view shows partial text that settles into final segments.
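
On the client, the single “live partial” row rule reduces to a small state model. The sketch below is hypothetical (`LiveTranscriptView` and its methods are illustrative names, not an existing API) and leaves rendering out of scope; it only shows that partials replace each other in place and disappear when a final arrives or recording stops.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LiveTranscriptView:
    """Final rows plus at most one in-place partial row."""

    finals: List[str] = field(default_factory=list)
    partial: Optional[str] = None  # rendered as a grey row at the bottom

    def on_partial(self, text: str) -> None:
        # Replace the live row in place rather than appending a new one.
        self.partial = text

    def on_final(self, text: str) -> None:
        # The first final after a partial settles it: drop the grey row.
        self.finals.append(text)
        self.partial = None

    def on_stop(self) -> None:
        self.partial = None

    def rows(self) -> List[str]:
        """Rows in display order; the partial (if any) is always last."""
        return self.finals + ([self.partial] if self.partial is not None else [])
```

A UI layer would re-render from `rows()` after each call, styling the last row grey while `partial` is set.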
* Export: Markdown + HTML
* Meeting library list + per-meeting search

**Gaps to close in this milestone:**

* Wire the meeting library into the main UI and selection flow.
* Add per-meeting transcript search (a client-side filter is acceptable for V1).
* Add the `risk` annotation type end-to-end (domain enum, UI, persistence).
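
Since a client-side filter is acceptable for V1, per-meeting search can be as simple as a case-insensitive substring match over the loaded segments. `Segment` and `filter_segments` are hypothetical, and `start_s` is an assumed field name for a segment's start offset in seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    segment_id: int
    start_s: float  # offset into the meeting audio, in seconds
    text: str


def filter_segments(segments: List[Segment], query: str) -> List[Segment]:
    """Case-insensitive substring match over one meeting's segments.

    An empty or whitespace-only query returns every segment unchanged.
    """
    needle = query.strip().casefold()
    if not needle:
        return list(segments)
    return [s for s in segments if needle in s.text.casefold()]
```

Because each match keeps its start offset, clicking a search result can reuse the same seek-to-segment behavior as clicking a transcript row.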

**Exit criteria:**
* Clicking a segment seeks audio playback to that time.
* Prompt notification + snooze + suppress per-app
* Settings for sensitivity and auto-start opt-in

**Deferred to a later, tray/hotkey-focused milestone:**

* Trigger prompts that include per-app suppression, calendar stubs, and snooze presets integrated with the tray/menubar UX.
* A persistent “recording/monitoring” indicator while background capture is active.

**Exit criteria:**
* Trigger prompts happen when expected and can be snoozed.
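
The in-scope prompt rules (sensitivity threshold, snooze, per-app suppression) can be sketched as a gate between trigger detection and the notification. This assumes detection produces a confidence score in [0, 1] and reports the foreground app by some identifier; `TriggerGate`, `should_prompt`, and the app IDs used with it are hypothetical.

```python
import time
from typing import Callable, Optional, Set


class TriggerGate:
    """Hypothetical gate between trigger detection and the prompt notification."""

    def __init__(self, sensitivity: float = 0.7,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sensitivity = sensitivity  # minimum confidence needed to prompt
        self._clock = clock
        self._suppressed_apps: Set[str] = set()
        self._snoozed_until: Optional[float] = None

    def snooze(self, seconds: float) -> None:
        """Silence all prompts for the given duration."""
        self._snoozed_until = self._clock() + seconds

    def suppress_app(self, app_id: str) -> None:
        """Never prompt while this app is in the foreground."""
        self._suppressed_apps.add(app_id)

    def should_prompt(self, confidence: float, foreground_app: Optional[str]) -> bool:
        if self._snoozed_until is not None and self._clock() < self._snoozed_until:
            return False
        if foreground_app is not None and foreground_app in self._suppressed_apps:
            return False
        return confidence >= self.sensitivity
```

Snooze and suppression are checked before the confidence threshold, so a snoozed user is never prompted, even for a high-confidence trigger.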
* Citation verifier + “uncited drafts” handling
* Summary UI panel with clickable citations

**Implementation plan (citations enforced):**

* Summarizer provider interface:
  * Define a `Summarizer` protocol with `extract()` and `synthesize()` phases.
  * Provide a `MockSummarizer` for tests and an opt-in cloud-backed provider.
* Extraction stage:
  * Segment-aware chunking (~500 tokens) that keeps stable `segment_ids` in each chunk.
  * The extraction prompt returns structured items: quote, segment_ids, category.
* Synthesis stage:
  * Rewrite extracted items into bullets; each bullet must end with `[...]` containing its segment IDs.
* Verification stage:
  * Parse bullets and suppress any uncited bullet by default.
  * Store uncited drafts separately for optional user review.
* UI:
  * The summary panel lists key points and action items with clickable citations.
  * Clicking a bullet scrolls the transcript and seeks audio to the first cited segment.
* Tests:
  * Unit tests for citation parsing, uncited suppression, and click→segment mapping.
  * Integration test for the summary generation request and persisted citations.
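
A minimal sketch of the provider interface and segment-aware chunking, using `typing.Protocol` for the `Summarizer` contract. `Segment`, `ExtractedItem`, `chunk_segments`, and the `"key_point"` category label are hypothetical, and token counts are approximated by whitespace-separated words (a real provider would use its own tokenizer). The mock emits citations as `[id, id]`, which is an assumed concrete form of the `[...]` suffix.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    segment_id: int
    text: str


@dataclass(frozen=True)
class ExtractedItem:
    quote: str
    segment_ids: Tuple[int, ...]
    category: str  # e.g. "key_point" or "action_item" (assumed labels)


class Summarizer(Protocol):
    def extract(self, chunk: Sequence[Segment]) -> List[ExtractedItem]: ...
    def synthesize(self, items: Sequence[ExtractedItem]) -> List[str]: ...


def chunk_segments(segments: Sequence[Segment], max_tokens: int = 500) -> List[List[Segment]]:
    """Group whole segments into chunks of roughly max_tokens words."""
    chunks: List[List[Segment]] = []
    current: List[Segment] = []
    used = 0
    for seg in segments:
        cost = len(seg.text.split())
        if current and used + cost > max_tokens:
            chunks.append(current)  # close the chunk before it overflows
            current, used = [], 0
        current.append(seg)
        used += cost
    if current:
        chunks.append(current)
    return chunks


class MockSummarizer:
    """Deterministic test double: quotes each segment verbatim."""

    def extract(self, chunk: Sequence[Segment]) -> List[ExtractedItem]:
        return [ExtractedItem(s.text, (s.segment_id,), "key_point") for s in chunk]

    def synthesize(self, items: Sequence[ExtractedItem]) -> List[str]:
        return [f"{item.quote} [{', '.join(map(str, item.segment_ids))}]" for item in items]
```

Chunks never split a segment, so every extracted item can cite whole segment IDs; a single segment longer than `max_tokens` simply becomes its own oversized chunk.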

**Exit criteria:**
* Every displayed bullet has citations.
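
The verification stage and this exit criterion can be enforced with a parser that lets only fully cited bullets through. The sketch assumes citations are a trailing bracketed list of integer segment IDs such as `[3, 7]`; `verify`, `parse_citations`, and `VerifiedBullet` are hypothetical names. It also treats a citation to a segment ID that does not exist in the meeting as uncited, an extra check the plan does not spell out.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Assumed citation format: a trailing "[3]" or "[3, 7]" at the end of a bullet.
_CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]\s*$")


@dataclass(frozen=True)
class VerifiedBullet:
    text: str                     # bullet text with the citation stripped
    segment_ids: Tuple[int, ...]  # cited segments, in citation order

    @property
    def first_segment_id(self) -> int:
        # Click target: scroll the transcript and seek audio to this segment.
        return self.segment_ids[0]


def parse_citations(bullet: str) -> Optional[Tuple[int, ...]]:
    """Return the trailing segment IDs, or None if the bullet is uncited."""
    match = _CITATION.search(bullet)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split(","))


def verify(bullets: Iterable[str],
           known_ids: Set[int]) -> Tuple[List[VerifiedBullet], List[str]]:
    """Split bullets into displayable ones and uncited drafts.

    A bullet is treated as uncited if it has no trailing citation or if it
    cites a segment ID that is not part of this meeting.
    """
    shown: List[VerifiedBullet] = []
    drafts: List[str] = []
    for bullet in bullets:
        ids = parse_citations(bullet)
        if ids is not None and all(i in known_ids for i in ids):
            shown.append(VerifiedBullet(_CITATION.sub("", bullet).rstrip(), ids))
        else:
            drafts.append(bullet)
    return shown, drafts
```

`first_segment_id` provides the click target, and the second return value is the uncited-drafts list kept for optional review.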
* “Check for updates” flow (manual link + version display)
* Release checklist & troubleshooting docs

**Implementation plan (delete/retention correctness):**

* Meeting deletion:
  * Extend the delete flow to remove encrypted audio assets on disk.
  * Delete the wrapped DEK and master-key references so the audio can no longer be decrypted.
  * Add best-effort cleanup of orphaned files on next startup.
* Retention:
  * A scheduled job that deletes meetings older than the configured retention period (in days).
  * Include DB rows, summaries, and audio assets in the purge.
* Tests:
  * Integration test that delete removes the DB rows and the audio file on disk.
  * Integration test that the retention job removes expired meetings and their assets.
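
A sketch of the retention purge and the startup orphan sweep, assuming the `meetings/<meeting_id>/` asset layout from the spec and using an in-memory dict as a stand-in for the database. `MeetingRecord`, `purge_expired`, and `remove_orphans` are hypothetical; it also assumes the wrapped DEK lives in rows tied to the meeting, so a real store would delete the meeting, its segments, summaries, and key rows in one transaction.

```python
import shutil
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List


@dataclass
class MeetingRecord:
    meeting_id: str
    created_at: datetime  # same timezone as `now` (e.g. UTC)


def purge_expired(meetings: Dict[str, MeetingRecord], meetings_dir: Path,
                  retention_days: int, now: datetime) -> List[str]:
    """Delete meetings older than the retention period: rows first, then assets."""
    cutoff = now - timedelta(days=retention_days)
    expired = sorted(m.meeting_id for m in meetings.values() if m.created_at < cutoff)
    for meeting_id in expired:
        # Stand-in for one DB transaction removing the meeting, segments,
        # summaries, and wrapped-key rows.
        del meetings[meeting_id]
        # Then remove meetings/<meeting_id>/ (encrypted audio and other assets).
        shutil.rmtree(meetings_dir / meeting_id, ignore_errors=True)
    return expired


def remove_orphans(meetings: Dict[str, MeetingRecord], meetings_dir: Path) -> List[str]:
    """Best-effort startup sweep: delete asset dirs that have no DB row."""
    removed: List[str] = []
    if not meetings_dir.is_dir():
        return removed
    for child in meetings_dir.iterdir():
        if child.is_dir() and child.name not in meetings:
            shutil.rmtree(child, ignore_errors=True)
            removed.append(child.name)
    return sorted(removed)
```

Deleting the rows (including the wrapped DEK) before the files means a crash between the two steps leaves only undecryptable bytes on disk, which the next startup sweep removes.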

**Exit criteria:**

* A signed installer (or an unsigned one for internal builds) that installs and runs on both OSs.

docs/spec.md

Below is a rewritten, end‑to‑end **Product Specification + Engineering Design Document** for **NoteFlow V1 (Minimum Lovable Product)** that merges:

* your **revised V1 draft** (confidence-model triggers, client/server architecture, partial/final UX, extract‑then‑synthesize citations, pragmatic typing, packaging constraints, risks table), and
* the **de-risking feedback** I gave earlier (audio capture reality, diarization scope, citation enforcement, OS permissions, shipping concerns, storage/retention, update strategy, and “don’t promise what you can’t reliably ship”).

I’ve kept it “shipping-ready” by being explicit about decisions, failure modes, acceptance criteria, and what is deferred.

**Server (Headless Backend)**

* **ASR Engine:** faster-whisper for transcription
* **Meeting Store:** in-memory meeting management
* **Storage:** PostgreSQL + pgvector for persistence, plus encrypted audio assets (current implementation). LanceDB is supported as an optional adapter for local-only deployments in single-process mode.
* **gRPC Service:** bidirectional streaming for real-time transcription

**Client (GUI Application)**

**Deployment modes:**

1. **Local:** Server + Client on the same machine (default)
2. **Split:** Server on a headless machine, Client on the workstation that captures audio
3. **Local-only adapter:** Optional LanceDB-backed, single-process mode for development or constrained environments (feature parity not guaranteed).
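
One way to keep the three modes explicit in code is a single mapping from deployment mode to storage backend. This is a hypothetical sketch: `DeploymentMode`, `StorageConfig`, and `storage_for` are illustrative names, and the backend strings are labels rather than real connection settings.

```python
from dataclasses import dataclass
from enum import Enum


class DeploymentMode(Enum):
    LOCAL = "local"            # server + client on one machine (default)
    SPLIT = "split"            # headless server, client on the audio workstation
    LOCAL_ONLY = "local_only"  # optional single-process, LanceDB-backed adapter


@dataclass(frozen=True)
class StorageConfig:
    backend: str
    single_process: bool


def storage_for(mode: DeploymentMode) -> StorageConfig:
    """Pick the persistence backend for a deployment mode."""
    if mode is DeploymentMode.LOCAL_ONLY:
        # Development/constrained environments; feature parity not guaranteed.
        return StorageConfig(backend="lancedb", single_process=True)
    # Both client/server modes use the reference backend on the server.
    return StorageConfig(backend="postgresql+pgvector", single_process=False)
```

Centralizing the choice keeps the feature-parity caveat in one place instead of scattering LanceDB checks across the server.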

---

## 9. Storage & Data Model

**Backend support:** The reference implementation uses PostgreSQL + pgvector. LanceDB is supported as an optional adapter for local-only, single-process deployments. The schema below describes the logical model and should be mapped to either backend.

### 9.1 On-Disk Layout (Per User)

* App data directory (OS standard)

  * `db/` (PostgreSQL + pgvector)
  * `lancedb/` (optional local-only adapter)
  * `meetings/<meeting_id>/`

    * `audio.<ext>` (encrypted container)
  * `logs/` (rotating; content-free)
  * `settings.json`
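
The layout above can be resolved with `pathlib`. The platform base directories follow common conventions (`%APPDATA%` on Windows, `~/Library/Application Support` on macOS, `$XDG_DATA_HOME` or `~/.local/share` elsewhere) and are an assumption, since the spec only says “OS standard”. `app_data_dir`, `layout`, and `meeting_audio_path` are hypothetical helpers, and the audio extension is left to the caller because the spec does not fix it.

```python
import os
import sys
from pathlib import Path
from typing import Dict


def app_data_dir(app_name: str = "NoteFlow") -> Path:
    """Per-user data directory using common per-OS conventions (assumed)."""
    if sys.platform == "win32":
        base = Path(os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming")))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        base = Path(os.environ.get("XDG_DATA_HOME", str(Path.home() / ".local" / "share")))
    return base / app_name


def layout(root: Path) -> Dict[str, Path]:
    """Top-level entries of the per-user layout."""
    return {
        "db": root / "db",            # PostgreSQL + pgvector data
        "lancedb": root / "lancedb",  # optional local-only adapter
        "meetings": root / "meetings",
        "logs": root / "logs",        # rotating, content-free
        "settings": root / "settings.json",
    }


def meeting_audio_path(root: Path, meeting_id: str, ext: str) -> Path:
    """meetings/<meeting_id>/audio.<ext> (encrypted container)."""
    return layout(root)["meetings"] / meeting_id / f"audio.{ext}"
```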
### 9.2 Database Schema (PostgreSQL baseline)
Core tables: