noteflow/docs/milestones.md
Travis Vasceannie b333ea5b23 Add initial Docker and development environment setup
- Created .dockerignore to exclude unnecessary files from Docker builds.
- Added .repomixignore for managing ignored patterns in Repomix.
- Introduced Dockerfile.dev for development environment setup with Python 3.12.
- Configured docker-compose.yaml to define services, including a PostgreSQL database.
- Established a devcontainer.json for Visual Studio Code integration.
- Implemented postCreate.sh for automatic dependency installation in the dev container.
- Added constants.py to centralize configuration constants for the project.
- Updated pyproject.toml to include new development dependencies.
- Created initial documentation files for project overview and style conventions.
- Added tests for new functionalities to ensure reliability and correctness.
2025-12-19 05:02:16 +00:00

NoteFlow V1 Implementation Plan

Architecture: Client-Server with gRPC (evolved from original single-process design)

Core principles: Local-first, mic capture baseline, partial→final transcripts, evidence-linked summaries with strict citation enforcement.

Last updated: December 2025


1) Milestones and Gates

Milestone 0 — Spikes to de-risk platform & pipeline COMPLETE

Goal: validate the 4 biggest "desktop app cliffs" before committing to architecture.

Spikes (each ends with a tiny working prototype + written findings):

  1. UI + Tray + Hotkeys feasibility

    • Verified: system tray/menubar icon, notification prompt, global hotkey start/stop
    • Flet works for main UI; pystray/pynput validated for tray + hotkeys
    • Location: spikes/spike_01_ui_tray_hotkeys/
  2. Audio capture robustness

    • Validated sounddevice.InputStream with PortAudio:
      • default mic capture works
      • device unplug / device switch handling
      • stable VU meter feed
    • Location: spikes/spike_02_audio_capture/
  3. ASR latency feasibility

    • faster-whisper benchmarked at 0.05x real-time, i.e. processing takes roughly 5% of the audio's duration (excellent)
    • Model download/cache strategy validated
    • Location: spikes/spike_03_asr_latency/
  4. Key storage + encryption approach

    • OS keystore integration works (Keychain/Credential Manager via keyring)
    • Encrypted streaming audio file validated (chunked AES-GCM, 826 MB/s throughput)
    • Location: spikes/spike_04_encryption/

Exit criteria (M0): ALL MET

  • Start recording → see VU meter → stop → playback file on both OSs
  • Run ASR over captured audio and display text in UI
  • Store/read an encrypted blob using a stored master key

Milestone 1 — Repo foundation + CI + core contracts COMPLETE

Goal: establish maintainable structure, typing, test harness, logging.

Deliverables: ALL COMPLETE

  • Repository layout with hexagonal architecture (domain → application → infrastructure)
  • pyproject.toml + uv lockfile
  • Quality gates: ruff, basedpyright, pytest
  • Structured logging (structlog) with content-safe defaults
  • Settings system (Pydantic settings + NOTEFLOW_ env vars)
  • Minimal "app shell" (Flet UI opens, logs write)

Implementation locations:

  • Domain: src/noteflow/domain/ (entities, ports, value objects)
  • Application: src/noteflow/application/services/
  • Infrastructure: src/noteflow/infrastructure/
  • Config: src/noteflow/config/

Exit criteria: ALL MET

  • CI passes lint/type/tests
  • Running app opens a window (tray integration deferred to M5)

Milestone 2 — Meeting lifecycle + mic capture + crash-safe persistence COMPLETE

Goal: reliable recording as the foundation.

Deliverables: ALL COMPLETE

  • MeetingService state machine (CREATED → RECORDING → STOPPING → STOPPED → COMPLETED)
    • Location: src/noteflow/application/services/meeting_service.py
  • Audio capture via SoundDeviceCapture
    • Location: src/noteflow/infrastructure/audio/capture.py
  • Encrypted streaming asset writer (NFAE format, AES-256-GCM)
    • Location: src/noteflow/infrastructure/audio/writer.py
    • Crypto: src/noteflow/infrastructure/security/crypto.py
  • Meeting folder layout + manifest.json
    • Format: ~/.noteflow/meetings/<meeting-uuid>/audio.enc + manifest.json
  • Active Meeting UI: timer + VU meter + start/stop
    • Components: recording_timer.py, vu_meter.py in client/components/
  • Crash recovery via RecoveryService
    • Location: src/noteflow/application/services/recovery_service.py
    • Detects meetings left in RECORDING/STOPPING state, marks as ERROR
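
The recovery rule above can be sketched as follows. This is a minimal illustration and not the project's RecoveryService: the state names come from this plan, while `Meeting` and `recover_incomplete` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class MeetingState(Enum):
    CREATED = "created"
    RECORDING = "recording"
    STOPPING = "stopping"
    STOPPED = "stopped"
    COMPLETED = "completed"
    ERROR = "error"


@dataclass
class Meeting:  # hypothetical stand-in for the domain entity
    meeting_id: str
    state: MeetingState


# States that can only be seen at startup if the previous run was interrupted.
_INTERRUPTED = {MeetingState.RECORDING, MeetingState.STOPPING}


def recover_incomplete(meetings: list[Meeting]) -> list[str]:
    """Mark interrupted meetings as ERROR and return their IDs."""
    recovered = []
    for meeting in meetings:
        if meeting.state in _INTERRUPTED:
            meeting.state = MeetingState.ERROR
            recovered.append(meeting.meeting_id)
    return recovered
```

Running this once at startup, before any new recording begins, keeps a half-written meeting from being mistaken for an active one.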

Exit criteria: ALL MET

  • Record 30 minutes without UI freezing
  • App restart after forced kill recovers incomplete meetings

Milestone 3 — Partial→Final transcription + transcript persistence COMPLETE

Goal: near real-time transcription with stability rules.

Deliverables: ALL COMPLETE

  • ASR wrapper service (faster-whisper with word timestamps)
    • Location: src/noteflow/infrastructure/asr/engine.py
    • Supports 13 model sizes, CPU/GPU, word-level timestamps
  • VAD + segment finalization logic
    • EnergyVad: src/noteflow/infrastructure/asr/streaming_vad.py
    • Segmenter: src/noteflow/infrastructure/asr/segmenter.py
  • Partial transcript feed to UI
    • Server: _maybe_emit_partial() called during streaming (service.py:601)
    • 2-second cadence with text deduplication
    • Client: Handles is_final=False in client.py:458-467
    • UI: [LIVE] row with blue styling (transcript.py:182-219)
  • Final segments persisted to PostgreSQL + pgvector
    • Repository: src/noteflow/infrastructure/persistence/repositories/segment.py
  • Post-meeting transcript view
    • Component: src/noteflow/client/components/transcript.py

Implementation details:

  • Server emits UPDATE_TYPE_PARTIAL every 2 seconds during speech activity
  • Minimum 0.5 seconds of audio before partial inference
  • Partial text deduplicated (only emitted when changed)
  • Client renders partials with is_final=False flag
  • UI displays [LIVE] indicator with blue background, grey italic text
  • Partial row cleared when final segment arrives

Exit criteria: ALL MET

  • Live view shows partial text that settles into final segments
  • After restart, final segments are still present and searchable within the meeting

Milestone 4 — Review UX: playback, annotations, export MOSTLY COMPLETE

Goal: navigable recall loop.

Deliverables:

  • Audio playback synced to segment timestamps
    • PlaybackControls: src/noteflow/client/components/playback_controls.py
    • PlaybackSyncController: src/noteflow/client/components/playback_sync.py
    • SoundDevicePlayback: src/noteflow/infrastructure/audio/playback.py
  • Add annotations in live view + review view
    • AnnotationToolbar: src/noteflow/client/components/annotation_toolbar.py
    • AnnotationDisplay: src/noteflow/client/components/annotation_display.py
    • All 4 types: ACTION_ITEM, DECISION, NOTE, RISK
  • Export: Markdown + HTML
    • ExportService: src/noteflow/application/services/export_service.py
    • Markdown exporter: src/noteflow/infrastructure/export/markdown.py
    • HTML exporter: src/noteflow/infrastructure/export/html.py
  • Meeting library list + per-meeting search
    • MeetingLibrary: src/noteflow/client/components/meeting_library.py
    • TranscriptComponent with search: src/noteflow/client/components/transcript.py

Previous gaps — now closed:

  • Wire meeting library into the main UI and selection flow
  • Add per-meeting transcript search (client-side filter)
  • Add risk annotation type end-to-end (domain enum, UI, persistence)

Exit criteria: ALL MET

  • Clicking a segment seeks audio playback to that time
  • Export produces correct Markdown/HTML for at least one meeting

Milestone 5 — Smart triggers (confidence model) + snooze/suppression ⚠️ PARTIALLY INTEGRATED

Goal: prompts that are helpful, not annoying.

Deliverables:

  • Trigger engine + scoring
    • TriggerService: src/noteflow/application/services/trigger_service.py
    • Domain entities: src/noteflow/domain/triggers/entities.py
    • TriggerSignal, TriggerDecision, TriggerAction (IGNORE, NOTIFY, AUTO_START)
  • SignalProvider protocol defined
    • Location: src/noteflow/domain/triggers/ports.py
  • Foreground app detector integration
    • Infrastructure: src/noteflow/infrastructure/triggers/foreground_app.py
    • Wired via TriggerMixin: src/noteflow/client/_trigger_mixin.py
  • Audio activity detector integration
    • Infrastructure: src/noteflow/infrastructure/triggers/audio_activity.py
    • Wired via TriggerMixin: src/noteflow/client/_trigger_mixin.py
  • Optional calendar connector stub (disabled by default)
  • Trigger prompts + snooze (AlertDialog, not system notifications)
    • TriggerMixin._show_trigger_prompt() displays AlertDialog
    • Snooze button integrated
    • Rate limiting active
  • System tray integration ← GAP
  • Global hotkeys ← GAP
  • Settings for sensitivity and auto-start opt-in (in TriggerService)

Current integration status:

  • Client app inherits from TriggerMixin (app.py:65)
  • Signal providers initialized in _initialize_triggers() method
  • Background trigger check loop runs via _trigger_check_loop()
  • Handles NOTIFY and AUTO_START actions
  • Prompts shown via Flet AlertDialog (not system notifications)

What works:

  • Confidence scoring with configurable thresholds (0.40 notify, 0.80 auto-start)
  • Rate limiting between triggers
  • Snooze functionality with remaining time tracking
  • Per-app suppression config
  • Foreground app detection (PyWinCtl)
  • Audio activity detection (RMS sliding window)
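
As an illustration of the threshold behavior, the mapping from a confidence score to an action might look like this. The 0.40 and 0.80 values and the action names come from this plan; the function name `decide`, the inclusive comparisons, and the fallback to NOTIFY when auto-start is not enabled are assumptions made for the sketch.

```python
from enum import Enum


class TriggerAction(Enum):
    IGNORE = "ignore"
    NOTIFY = "notify"
    AUTO_START = "auto_start"


def decide(
    confidence: float,
    *,
    notify_threshold: float = 0.40,
    auto_start_threshold: float = 0.80,
    auto_start_enabled: bool = False,
) -> TriggerAction:
    """Map a confidence score in [0, 1] to an action."""
    if confidence >= auto_start_threshold and auto_start_enabled:
        return TriggerAction.AUTO_START
    if confidence >= notify_threshold:
        return TriggerAction.NOTIFY  # also covers high scores without opt-in
    return TriggerAction.IGNORE
```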

Remaining work:

  1. System Tray Integration (New file: src/noteflow/client/tray.py)

    • Integrate pystray for minimize-to-tray
    • Show trigger prompts as system notifications
    • Recording indicator icon
    • Complexity: Medium (spike validated in spikes/spike_01_ui_tray_hotkeys/)
  2. Global Hotkeys (New file: src/noteflow/client/hotkeys.py)

    • Integrate pynput for start/stop/annotation hotkeys
    • Complexity: Medium (spike validated)

Exit criteria:

  • Trigger prompts happen when expected and can be snoozed
  • Prompt rate-limited to prevent spam
  • System tray notifications (currently AlertDialog only)
  • Global hotkeys for quick actions

Milestone 6 — Evidence-linked summaries (extract → synthesize → verify) COMPLETE

Goal: no uncited claims.

Deliverables:

  • Summarizer provider interface
    • Protocol: src/noteflow/domain/summarization/ports.py
    • DTOs: SummarizationRequest, SummarizationResult, CitationVerificationResult
  • Provider implementations (3 complete):
    • MockSummarizer: src/noteflow/infrastructure/summarization/mock_provider.py
    • OllamaSummarizer (local): src/noteflow/infrastructure/summarization/ollama_provider.py
    • CloudSummarizer (OpenAI/Anthropic): src/noteflow/infrastructure/summarization/cloud_provider.py
  • Citation verifier + "uncited drafts" handling
    • CitationVerifier: src/noteflow/infrastructure/summarization/citation_verifier.py
    • Validates segment_ids, filters invalid citations
  • Summary UI panel with clickable citations
    • SummaryPanel: src/noteflow/client/components/summary_panel.py
    • Shows key points + action items with evidence links
    • "Uncited drafts hidden" toggle
  • Factory function for service creation
    • create_summarization_service(): src/noteflow/infrastructure/summarization/factory.py
    • Shared by client app and gRPC server

Application service complete:

  • SummarizationService: src/noteflow/application/services/summarization_service.py
  • Multi-provider with consent management
  • Fallback chain: CLOUD → LOCAL → MOCK
  • Citation verification and filtering
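
The fallback chain can be pictured as a simple ordered lookup. The sketch below is illustrative only: `pick_provider` and the string provider names are hypothetical, and skipping the cloud provider when consent is missing is an assumption about how consent management interacts with the chain.

```python
# Order taken from the plan: CLOUD, then LOCAL, then MOCK.
FALLBACK_ORDER = ("cloud", "local", "mock")


def pick_provider(available: dict[str, bool], cloud_consent: bool) -> str:
    """Return the first usable provider name in fallback order."""
    for name in FALLBACK_ORDER:
        if name == "cloud" and not cloud_consent:
            continue  # assumption: cloud is skipped without user consent
        if available.get(name, False):
            return name
    raise RuntimeError("no summarization provider available")
```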

gRPC integration complete:

  • GenerateSummary RPC calls SummarizationService.summarize()
  • Auto-detects provider availability (tries LOCAL, falls back to MOCK)
  • Placeholder fallback if service unavailable

Exit criteria: ALL MET

  • Every displayed bullet has citations (RPC wired to real service)
  • Clicking bullet jumps to cited transcript segment and audio timestamp

Milestone 7 — Retention, deletion, telemetry (opt-in), packaging ⚠️ RETENTION COMPLETE

Goal: ship safely.

Deliverables:

  • Retention job
    • RetentionService: src/noteflow/application/services/retention_service.py
    • Configurable retention days, dry-run support
    • Runs at startup and periodically
  • Delete meeting (cryptographic delete)
    • MeetingService.delete_meeting() removes:
      • Database rows (meeting, segments, summary, annotations)
      • Encrypted audio file (audio.enc)
      • Wrapped DEK from manifest (renders audio unrecoverable)
      • Meeting directory
  • Optional telemetry (content-free) ← GAP
  • PyInstaller build ← GAP
  • "Check for updates" flow ← GAP
  • Release checklist & troubleshooting docs

What's implemented:

  • Meeting deletion cascade is complete:

    • DB cascade: meeting → segments → summary → annotations
    • Filesystem: ~/.noteflow/meetings/<meeting-uuid>/ removed
    • Crypto: DEK deleted from manifest, audio unrecoverable
  • Retention service is complete:

    • RetentionService.run_cleanup() with dry-run
    • Finds meetings older than retention cutoff
    • Generates RetentionReport with counts
    • Integration tests validate cascade
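
A dry-run-capable cleanup pass might be structured as in the following sketch. It is not the actual `RetentionService.run_cleanup()`: the in-memory `meetings` mapping stands in for the repository, and the `RetentionReport` field names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class RetentionReport:  # field names here are illustrative
    checked: int = 0
    deleted: int = 0
    expired_ids: list[str] = field(default_factory=list)


def run_cleanup(
    meetings: dict[str, datetime],   # meeting_id -> end time (UTC)
    retention_days: int,
    now: datetime,
    dry_run: bool = True,
) -> RetentionReport:
    """Find meetings older than the cutoff; delete them unless dry_run."""
    cutoff = now - timedelta(days=retention_days)
    report = RetentionReport(checked=len(meetings))
    for meeting_id, ended_at in list(meetings.items()):
        if ended_at < cutoff:
            report.expired_ids.append(meeting_id)
            if not dry_run:
                del meetings[meeting_id]  # stands in for the real delete cascade
                report.deleted += 1
    return report
```

Defaulting `dry_run` to `True` in this sketch means a caller has to opt in explicitly before anything is removed.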

Remaining work:

  1. PyInstaller Packaging (New: build scripts)

    • Create distributable for macOS + Windows
    • Complexity: High (cross-platform, native deps)
  2. Code Signing

    • macOS notarization, Windows signing
    • Complexity: Medium
  3. Update Check Flow

    • Version display + "Check for Updates" → release page
    • Complexity: Low
  4. Telemetry (Opt-in)

    • Content-free metrics: crash stack traces, latency, feature flags
    • Complexity: Medium

Exit criteria:

  • A signed installer that installs and runs on both OSs
  • Deleting a meeting removes DB rows + assets; audio cannot be decrypted after key deletion

Milestone 8 (Optional prerelease) — Post-meeting anonymous diarization COMPLETE

Goal: "Speaker A/B/C" best-effort labeling.

Deliverables:

  • Diarization engine with streaming + offline modes
    • Location: src/noteflow/infrastructure/diarization/engine.py (315 lines)
    • Streaming: diart library for real-time processing
    • Offline: pyannote.audio for post-meeting refinement
    • Device support: auto, cpu, cuda, mps
  • Speaker assignment logic
    • Location: src/noteflow/infrastructure/diarization/assigner.py
    • assign_speaker() maps time ranges via maximum overlap
    • assign_speakers_batch() for bulk assignment
    • Confidence scoring based on overlap duration
  • Data transfer objects
    • Location: src/noteflow/infrastructure/diarization/dto.py
    • SpeakerTurn with validation and overlap methods
  • Domain entity updates
    • Segment.speaker_id: str | None and speaker_confidence: float
  • Proto/gRPC definitions
    • FinalSegment.speaker_id and speaker_confidence fields
    • ServerInfo.diarization_enabled and diarization_ready flags
    • RefineSpeakerDiarization and RenameSpeaker RPCs
  • gRPC refinement RPC
    • refine_speaker_diarization() in service.py for post-meeting processing
    • rename_speaker() for user-friendly speaker labels
  • Configuration/settings
    • diarization_enabled, diarization_hf_token, diarization_device
    • diarization_streaming_latency, diarization_min/max_speakers
  • Dependencies added
    • Optional extra [diarization]: pyannote.audio, diart, torch
  • UI display
    • Speaker labels with color coding in transcript.py
    • "Analyze Speakers" and "Rename Speakers" buttons in meeting_library.py
  • Server initialization
    • DiarizationEngine wired in server.py with CLI args
    • --diarization, --diarization-hf-token, --diarization-device flags
  • Client integration
    • refine_speaker_diarization() and rename_speaker() methods in client.py
    • DiarizationResult and RenameSpeakerResult DTOs
  • Tests
    • 24 unit tests in tests/infrastructure/test_diarization.py
    • Covers SpeakerTurn, assign_speaker(), assign_speakers_batch()

Deferred (optional future enhancement):

  • Streaming integration - Real-time speaker labels during recording
    • Feed audio chunks to diarization during StreamTranscription
    • Emit speaker changes in real-time
    • Complexity: High (requires significant latency tuning)

Exit criteria: ALL MET

  • If diarization fails, app degrades gracefully to "Unknown."
  • Post-meeting diarization refinement works end-to-end
  • (Optional) Streaming diarization shows live speaker labels — deferred

2) Proposed Repository Layout

This layout is designed to:

  • separate server and client concerns,
  • isolate platform-specific code,
  • keep modules < 500 LoC,
  • make DI clean,
  • keep writing to disk centralized.

noteflow/
├─ pyproject.toml
├─ src/noteflow/
│  ├─ core/
│  │  ├─ config.py            # Settings (Pydantic) + load/save
│  │  ├─ logging.py           # structlog config, redaction helpers
│  │  ├─ types.py             # common NewTypes / Protocols
│  │  └─ errors.py            # domain error types
│  │
│  ├─ grpc/                   # gRPC server components
│  │  ├─ proto/
│  │  │  ├─ noteflow.proto    # Service definitions
│  │  │  ├─ noteflow_pb2.py   # Generated protobuf
│  │  │  └─ noteflow_pb2_grpc.py
│  │  ├─ server.py            # Server entry point
│  │  ├─ service.py           # NoteFlowServicer implementation
│  │  ├─ meeting_store.py     # In-memory meeting management
│  │  └─ client.py            # gRPC client wrapper
│  │
│  ├─ client/                 # GUI client application
│  │  ├─ app.py               # Flet app entry point
│  │  ├─ state.py             # App state store
│  │  └─ components/
│  │     ├─ transcript.py
│  │     ├─ vu_meter.py
│  │     └─ summary_panel.py
│  │
│  ├─ audio/                  # Audio capture (client-side)
│  │  ├─ capture.py           # sounddevice InputStream wrapper
│  │  ├─ levels.py            # RMS/VU meter computation
│  │  ├─ ring_buffer.py       # timestamped audio buffer
│  │  └─ playback.py          # audio playback synced to timestamp
│  │
│  ├─ asr/                    # ASR engine (server-side)
│  │  ├─ engine.py            # faster-whisper wrapper + model cache
│  │  ├─ segmenter.py         # partial/final logic, silence boundaries
│  │  └─ dto.py               # ASR outputs (words optional)
│  │
│  ├─ data/                   # Persistence (server-side)
│  │  ├─ db.py                # LanceDB connection + table handles
│  │  ├─ schema.py            # table schemas + version
│  │  └─ repos/
│  │     ├─ meetings.py
│  │     ├─ segments.py
│  │     └─ summaries.py
│  │
│  ├─ platform/               # Platform-specific (client-side)
│  │  ├─ tray/                # tray/menubar (pystray)
│  │  ├─ hotkeys/             # global hotkeys (pynput)
│  │  └─ notifications/       # toast notifications
│  │
│  └─ summarization/          # Summary generation (server-side)
│     ├─ providers/
│     │  ├─ base.py
│     │  └─ cloud.py
│     ├─ prompts.py
│     └─ verifier.py
│
├─ spikes/                    # De-risking spikes (M0)
│  ├─ spike_01_ui_tray_hotkeys/
│  ├─ spike_02_audio_capture/
│  ├─ spike_03_asr_latency/
│  └─ spike_04_encryption/
│
└─ tests/
   ├─ unit/
   ├─ integration/
   └─ e2e/

3) Core Runtime Design

3.1 State Machine (Meeting Lifecycle)

Define explicitly so UI + services remain consistent.

IDLE
  ├─ start(manual/trigger) → RECORDING
  └─ prompt(trigger) → PROMPTED

PROMPTED
  ├─ accept → RECORDING
  └─ dismiss/snooze → IDLE

RECORDING
  ├─ stop → STOPPING
  ├─ error(audio) → ERROR (with recover attempt)
  └─ crash → RECOVERABLE_INCOMPLETE on restart

STOPPING
  ├─ flush assets/segments → REVIEW_READY
  └─ failure → REVIEW_READY (marked incomplete)

REVIEW_READY
  ├─ summarize → REVIEW_READY (summary updated)
  └─ delete → IDLE

Invariant: segments are only “final” when persisted. Partial text is never persisted.
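
One way to make the lifecycle above explicit in code is a transition table keyed by (state, event). This is a sketch: the event names (`flushed`, `failure`, and so on) are shorthand invented for the example, and the crash path is omitted because RECOVERABLE_INCOMPLETE is determined on restart rather than reached by an event.

```python
TRANSITIONS: dict[tuple[str, str], str] = {
    ("IDLE", "start"): "RECORDING",
    ("IDLE", "prompt"): "PROMPTED",
    ("PROMPTED", "accept"): "RECORDING",
    ("PROMPTED", "dismiss"): "IDLE",
    ("PROMPTED", "snooze"): "IDLE",
    ("RECORDING", "stop"): "STOPPING",
    ("RECORDING", "error"): "ERROR",
    ("STOPPING", "flushed"): "REVIEW_READY",
    ("STOPPING", "failure"): "REVIEW_READY",  # marked incomplete elsewhere
    ("REVIEW_READY", "summarize"): "REVIEW_READY",
    ("REVIEW_READY", "delete"): "IDLE",
}


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


def next_state(state: str, event: str) -> str:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(f"{event!r} is not allowed in {state}") from None
```

Keeping the table as data means the UI and the services can both consult the same source when deciding which actions to offer.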


3.2 Threading + Queue Model (Client-Server)

Server Threads:

  • gRPC thread pool: handles incoming RPC requests
  • ASR worker thread: processes audio buffers through faster-whisper
  • IO worker thread: only place that writes DB + manifest updates
  • Background jobs: summarization, diarization, retention

Client Threads:

  • Main/UI thread: Flet rendering + user actions
  • Audio callback thread: receives frames, does minimal work:
    • compute lightweight RMS for VU meter
    • enqueue frames to gRPC stream queue
  • gRPC stream thread: sends audio chunks, receives transcript updates
  • Event dispatch: updates UI from transcript callbacks

Rules:

  • Anything that may block for more than 5 ms must not run in the audio callback
  • Only the server's IO worker writes to the database
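
The "minimal work in the audio callback" rule can be illustrated with a bounded queue and a non-blocking put. The class below is a hypothetical sketch (`AudioCallbackBridge` is not a name from the codebase), uses plain Python lists instead of NumPy arrays to stay dependency-free, and assumes that dropping frames is preferable to stalling the audio thread.

```python
import math
import queue


class AudioCallbackBridge:
    """Keeps the audio callback cheap: compute RMS, enqueue, return."""

    def __init__(self, max_pending: int = 256) -> None:
        self.frames: queue.Queue[list[float]] = queue.Queue(maxsize=max_pending)
        self.level = 0.0      # latest RMS, read by the UI for the VU meter
        self.dropped = 0      # frames discarded because the consumer lagged

    def on_frames(self, samples: list[float]) -> None:
        """Called on the audio thread; must never block."""
        if samples:
            self.level = math.sqrt(sum(s * s for s in samples) / len(samples))
        try:
            self.frames.put_nowait(samples)
        except queue.Full:
            self.dropped += 1  # assumption: drop rather than stall the callback
```

The gRPC stream thread would then drain `frames` with a blocking `get()`, which is safe there because it is not the audio thread.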

4) Dependency Injection and Service Wiring

Use a small container (manual DI) rather than a framework.

# core/types.py
from typing import Protocol

class Clock(Protocol):
    def monotonic(self) -> float: ...
    def now(self): ...

class Notifier(Protocol):
    def prompt_recording(self, title: str, body: str) -> None: ...
    def toast(self, title: str, body: str) -> None: ...

class ForegroundAppProvider(Protocol):
    def current_app(self) -> str | None: ...

class KeyStore(Protocol):
    def get_or_create_master_key(self) -> bytes: ...

# app.py (wiring idea)
def build_container() -> AppContainer:
    settings = load_settings()
    logger = configure_logging(settings)
    keystore = build_keystore()
    crypt = CryptoBox(keystore)
    db = LanceDatabase(settings.paths.db_dir)
    repos = Repositories(db)
    jobs = JobQueue(...)
    audio = AudioCapture(...)
    asr = AsrEngine(...)
    meeting = MeetingService(...)
    triggers = TriggerService(...)
    ui = UiController(...)
    return AppContainer(...)

5) Detailed Subsystem Plans

5.1 Audio Capture + Assets

AudioCapture

Responsibilities:

  • open/close stream
  • handle device change / reconnect
  • feed ring buffer
  • expose current level for VU meter

Key APIs:

class AudioCapture:
    def start(self, on_frames: Callable[[np.ndarray, float], None]) -> None: ...
    def stop(self) -> None: ...
    def current_device(self) -> AudioDeviceInfo: ...

RingBuffer (timestamped)

  • store (timestamp, frames) so segment times are stable even if UI thread lags
  • provide “last N seconds” view for ASR worker
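
A timestamped ring buffer along these lines could look like the sketch below. The class and method names are hypothetical, frames are plain lists rather than NumPy arrays, and eviction by age (instead of by sample count) is an assumption.

```python
from collections import deque


class TimestampedRingBuffer:
    """Keeps (timestamp, frames) pairs covering the most recent window."""

    def __init__(self, max_seconds: float) -> None:
        self.max_seconds = max_seconds
        self._chunks: deque[tuple[float, list[float]]] = deque()

    def push(self, timestamp: float, frames: list[float]) -> None:
        self._chunks.append((timestamp, frames))
        # Evict chunks that fell out of the retention window.
        while self._chunks and timestamp - self._chunks[0][0] > self.max_seconds:
            self._chunks.popleft()

    def last(self, seconds: float) -> list[tuple[float, list[float]]]:
        """Return chunks whose timestamp lies within the last `seconds`."""
        if not self._chunks:
            return []
        newest = self._chunks[-1][0]
        return [c for c in self._chunks if newest - c[0] <= seconds]
```

Because each chunk carries its own capture timestamp, segment times derived from the buffer do not depend on when the consumer happens to read it.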

VAD

Define an interface so you can swap implementations (webrtcvad vs silero) without rewriting the pipeline.

class Vad:
    def is_speech(self, pcm16: bytes, sample_rate: int) -> bool: ...
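
As a dependency-free example of something that satisfies this interface, an energy-based detector could be written as below. It is a sketch, not the project's EnergyVad: the threshold value is an arbitrary assumption, and `sample_rate` is accepted only to match the interface.

```python
import array
import math


class EnergyVad:
    """Flags a frame as speech when its RMS exceeds a fixed threshold."""

    def __init__(self, threshold: float = 500.0) -> None:
        self.threshold = threshold  # in int16 amplitude units; tune per mic

    def is_speech(self, pcm16: bytes, sample_rate: int) -> bool:
        samples = array.array("h")
        samples.frombytes(pcm16)  # native-endian 16-bit signed samples
        if not samples:
            return False
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        return rms >= self.threshold
```

A webrtcvad- or silero-backed class exposing the same `is_speech` method could replace it without changes to the callers.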

Encrypted Audio Container (streaming)

Implementation approach (V1-safe): encrypted chunk format (AES-GCM) storing PCM16 frames. Optional: later add “compress after meeting” job (Opus) once stable.

Writer contract:

  • write header once
  • write chunks frequently (every ~200-500 ms)
  • flush frequently (crash-safe)

Deletion contract:

  • delete per-meeting DEK record first (crypto delete)
  • delete meeting folder

5.2 ASR and Segment Finalization

ASR Engine Wrapper (faster-whisper)

Responsibilities:

  • model download/cache
  • run inference
  • return tokens/segments with timestamps (word timestamps optional)

class AsrEngine:
    def transcribe(self, audio_f32_16k: np.ndarray) -> AsrResult: ...

Segmenter (partial/final)

Responsibilities:

  • build current “active utterance” from VAD-speech frames
  • run partial inference every N seconds
  • finalize when silence boundary detected

Data contract:

  • PartialUpdate: {text, start_offset, end_offset, stable=False}
  • FinalSegment: {segment_id, text, start_offset, end_offset, stable=True}

Important: final segments get their IDs at commit time (IO worker), not earlier.


5.3 Persistence (LanceDB + repositories)

DB access policy

  • One DB connection managed centrally
  • IO worker serializes all writes

Repositories:

  • MeetingsRepo: create/update meeting status, store DEK metadata reference
  • SegmentsRepo: append segments, query by meeting, basic search
  • AnnotationsRepo: add/list annotations
  • SummariesRepo: store summary + verification report

Also store:

  • schema version
  • app version
  • migration logic (even if minimal)

5.4 MeetingService (Orchestration)

Responsibilities:

  • create meeting directory + metadata
  • start/stop audio capture
  • start/stop ASR segmenter
  • handle UI events (annotation hotkeys, stop, etc.)
  • coordinate with TriggerService
  • ensure crash-safe flush and marking incomplete

Key public API:

class MeetingService:
    def start(self, source: TriggerSource) -> MeetingID: ...
    def stop(self) -> None: ...
    def add_annotation(self, type: AnnotationType, text: str | None = None) -> None: ...
    def current_meeting_id(self) -> MeetingID | None: ...

5.5 TriggerService (Confidence Model + throttling)

Inputs (each independently optional):

  • calendar (optional connector)
  • foreground app provider
  • audio activity provider

Outputs:

  • prompt notification
  • optional auto-start (if user enabled)
  • snooze & suppression state

Policies:

  • rate limit prompts (e.g., max 1 prompt / 10 min)
  • cooldown after dismiss
  • per-app suppression config
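
The rate-limit and snooze policies could be combined in a small helper such as the one below. This is a hypothetical sketch (`PromptThrottle` is not a name from the codebase); the 600-second default simply mirrors the "1 prompt / 10 min" example above.

```python
class PromptThrottle:
    """Allows at most one prompt per interval and honors snooze."""

    def __init__(self, min_interval_s: float = 600.0) -> None:
        self.min_interval_s = min_interval_s   # e.g. 1 prompt / 10 min
        self._last_prompt_at: float | None = None
        self._snoozed_until: float = 0.0

    def snooze(self, now: float, duration_s: float) -> None:
        self._snoozed_until = now + duration_s

    def snooze_remaining(self, now: float) -> float:
        return max(0.0, self._snoozed_until - now)

    def allow(self, now: float) -> bool:
        """Return True and record the prompt if one may be shown now."""
        if now < self._snoozed_until:
            return False
        if (
            self._last_prompt_at is not None
            and now - self._last_prompt_at < self.min_interval_s
        ):
            return False
        self._last_prompt_at = now
        return True
```

Feeding `now` from a monotonic clock keeps the throttle unaffected by wall-clock adjustments.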

Implementation detail:

  • TriggerService publishes events via signals:

    • trigger_prompted
    • trigger_snoozed
    • trigger_accepted

5.6 Summarization Service (Extract → Synthesize → Verify)

Provider interface:

class SummarizerProvider(Protocol):
    def extract(self, transcript: str) -> ExtractionResult: ...
    def synthesize(self, extraction: ExtractionResult) -> DraftSummary: ...

Verifier:

  • parse bullets
  • ensure each displayed bullet contains [...] with at least one Segment ID
  • uncited bullets go into uncited_points and are hidden by default
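
The verifier steps above might be implemented roughly as follows. The citation syntax is an assumption (integer segment IDs in square brackets, e.g. `[3, 7]`), as is the function name `verify_bullets`; the real CitationVerifier may parse a different format.

```python
import re

# Assumed citation syntax: "[12]" or "[3, 7]" with integer segment IDs.
_CITATION = re.compile(r"\[([^\[\]]*)\]")


def verify_bullets(
    bullets: list[str], valid_segment_ids: set[int]
) -> tuple[list[str], list[str]]:
    """Split bullets into (cited, uncited_points)."""
    cited, uncited = [], []
    for bullet in bullets:
        ids = set()
        for group in _CITATION.findall(bullet):
            for token in group.split(","):
                token = token.strip()
                if token.isdecimal():
                    ids.add(int(token))
        # A bullet counts as cited only if at least one ID really exists.
        if ids & valid_segment_ids:
            cited.append(bullet)
        else:
            uncited.append(bullet)
    return cited, uncited
```

The adversarial cases named in the testing requirement (missing brackets, invalid IDs, empty citations) all land in the second list with this approach.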

UI behavior:

  • Summary panel shows “X uncited drafts hidden” toggle
  • Clicking bullet scrolls transcript and seeks audio

Testing requirement:

  • Summary verifier must be unit-tested with adversarial outputs (missing brackets, invalid IDs, empty citations).

5.7 UI Implementation Approach (Flet)

State management

Treat UI as a thin layer over a single state store:

  • AppState

    • current meeting status
    • live transcript partial
    • list of finalized segments
    • playback state
    • summary state
    • settings state
    • prompt/snooze state

Changes flow:

  • Services emit signals (blinker)
  • UI controller converts signal payload → state update → re-render

This avoids UI code reaching into services and creating race conditions.


6) Testing Plan (Practical and CI-friendly)

Unit tests (fast)

  • Trigger scoring + thresholds
  • Summarization verifier
  • Segment model validation (end >= start)
  • Retention policy logic
  • Encryption chunk read/write roundtrip

Integration tests

  • DB CRUD roundtrip for each repo
  • Meeting create → segments append → summary store
  • Delete meeting removes all rows and assets

E2E tests (required)

Audio injection harness

  • Feed prerecorded WAV into AudioCapture abstraction (mock capture)

  • Run through VAD + ASR pipeline

  • Assert:

    • segments are produced
    • partial updates happen
    • final segments persist
    • seeking works (timestamp consistency)

Note: CI should never require a live microphone.


7) Release Checklist (V1)

  • Recording indicator always visible when capturing
  • Permission errors show actionable instructions
  • Crash recovery works for incomplete meetings
  • Summary bullets displayed are always cited
  • Delete meeting removes keys + assets + DB rows
  • Telemetry default off; no content ever logged
  • Build artifacts install/run on macOS + Windows

8) "First Implementation Targets" (what to build first)

Build server-side first, then client, to ensure a reliable foundation:

Server (build first):

  1. gRPC service skeleton - proto definitions + basic server startup
  2. Meeting store - in-memory meeting lifecycle management
  3. ASR integration - faster-whisper wrapper with streaming output
  4. Bidirectional streaming - audio in, transcripts out
  5. Persistence - LanceDB storage for meetings/segments
  6. Summarization - evidence-linked summary generation

Client (build second):

  7. gRPC client wrapper - connection management + streaming
  8. Audio capture - sounddevice integration + VU meter
  9. Live UI - Flet app with transcript display
  10. Tray + hotkeys - pystray/pynput integration
  11. Review view - playback synced to transcript
  12. Packaging - PyInstaller for both server and client

This ordering ensures the server is stable before building client features on top.


9) Minimal API Skeletons (so devs can start coding)

gRPC Service Definition (proto)

service NoteFlowService {
  // Bidirectional streaming: audio → transcripts
  rpc StreamTranscription(stream AudioChunk) returns (stream TranscriptUpdate);

  // Meeting lifecycle
  rpc CreateMeeting(CreateMeetingRequest) returns (Meeting);
  rpc StopMeeting(StopMeetingRequest) returns (Meeting);
  rpc ListMeetings(ListMeetingsRequest) returns (ListMeetingsResponse);
  rpc GetMeeting(GetMeetingRequest) returns (Meeting);

  // Summary generation
  rpc GenerateSummary(GenerateSummaryRequest) returns (Summary);

  // Server health
  rpc GetServerInfo(ServerInfoRequest) returns (ServerInfo);
}

Client Callback Types

from dataclasses import dataclass
from typing import Callable

# Client receives these from server via gRPC stream
@dataclass
class TranscriptSegment:
    segment_id: int
    text: str
    start_time: float
    end_time: float
    language: str
    is_final: bool

# Callback signatures
TranscriptCallback = Callable[[TranscriptSegment], None]
ConnectionCallback = Callable[[bool, str], None]  # connected, message

Client-Side Signals (UI updates)

# client/signals.py - for UI thread dispatch
from blinker import signal

audio_level_updated = signal("audio_level_updated")     # rms: float
transcript_received = signal("transcript_received")     # TranscriptSegment
connection_changed = signal("connection_changed")       # connected: bool, message: str

And a “job queue” minimal contract:

from typing import Protocol

class JobQueue:
    def submit(self, job: "Job") -> None: ...
    def cancel(self, job_id: str) -> None: ...

class Job(Protocol):
    id: str
    def run(self) -> None: ...

10) Current Implementation Status

Summary by Milestone

| Milestone | Status | Completeness |
|---|---|---|
| M0 Spikes | Complete | 100% |
| M1 Repo Foundation | Complete | 100% |
| M2 Meeting Lifecycle | Complete | 100% |
| M3 Transcription | Complete | 100% |
| M4 Review UX | Complete | 100% |
| M5 Triggers | ⚠️ Partial | 70% (integrated via mixin, tray/hotkeys not) |
| M6 Summarization | Complete | 100% |
| M7 Packaging | ⚠️ Partial | 40% (retention done, packaging not) |
| M8 Diarization | ⚠️ Partial | 55% (infrastructure done, wiring not) |

Layer-by-Layer Status

Domain Layer 100%

  • Meeting entity with state machine
  • Segment entity with word-level timing
  • Annotation entity (4 types)
  • Summary entity with evidence links (KeyPoint, ActionItem)
  • Repository ports (Protocol-based DI)
  • Unit of Work port
  • Trigger domain (TriggerSignal, TriggerDecision)
  • Summarization ports

Application Layer 100%

  • MeetingService - full CRUD + lifecycle
  • SummarizationService - multi-provider, consent, verification
  • TriggerService - scoring, rate limiting, snooze
  • RetentionService - cleanup, dry-run
  • ExportService - Markdown, HTML
  • RecoveryService - crash recovery

Infrastructure Layer 98%

  • Audio: capture, ring buffer, levels, playback, encrypted writer/reader
  • ASR: faster-whisper engine, VAD, segmenter
  • Persistence: SQLAlchemy + pgvector, Alembic migrations
  • Security: AES-256-GCM, keyring keystore
  • Summarization: Mock, Ollama, Cloud providers + citation verifier
  • Export: Markdown, HTML formatters
  • Triggers: signal providers wired via TriggerMixin
  • Diarization: engine, assigner, DTOs (not wired to server)

gRPC Layer 100%

  • Proto definitions with bidirectional streaming
  • Server: StreamTranscription, CreateMeeting, StopMeeting, etc.
  • Client wrapper with connection management
  • Meeting store (in-memory + DB modes)
  • GenerateSummary RPC wired to SummarizationService
  • Partial transcript streaming (2-second cadence, deduplication)

Client Layer 85%

  • Flet app with state management
  • VU meter, recording timer, transcript
  • Playback controls + sync controller
  • Annotation toolbar + display
  • Meeting library
  • Summary panel with clickable citations
  • Connection panel with auto-reconnect
  • Trigger detection via TriggerMixin (AlertDialog prompts)
  • System tray integration (spike validated, not integrated)
  • Global hotkeys (spike validated, not integrated)

11) Remaining Work Summary

Medium Priority (Platform Features)

| # | Task | Files | Complexity | Blocker For |
|---|---|---|---|---|
| 1 | System Tray Integration: integrate pystray for minimize-to-tray, system notifications, recording indicator | New: src/noteflow/client/tray.py | Medium | M5 completion |
| 2 | Global Hotkeys: integrate pynput for start/stop/annotation hotkeys | New: src/noteflow/client/hotkeys.py | Medium | M5 completion |

Medium Priority (Diarization Wiring)

| # | Task | Files | Complexity | Blocker For |
|---|---|---|---|---|
| 3 | Diarization Application Service: orchestrate diarization workflow, model management | New: application/services/diarization_service.py | Medium | M8 completion |
| 4 | Diarization Server Wiring: initialize DiarizationEngine on startup when enabled | src/noteflow/grpc/server.py | Low | M8 completion |
| 5 | Diarization Tests: unit tests for engine, assigner, DTOs | New: tests/infrastructure/diarization/ | Medium | M8 stability |

Lower Priority (Shipping)

| # | Task | Files | Complexity | Blocker For |
|---|---|---|---|---|
| 6 | PyInstaller Packaging: create distributable for macOS + Windows | New: build scripts | High | M7 release |
| 7 | Code Signing: macOS notarization, Windows signing | Build config | Medium | M7 release |
| 8 | Update Check Flow: version display + "Check for Updates" link | New: src/noteflow/client/update.py | Low | M7 release |
| 9 | Telemetry (Opt-in): content-free metrics collection | New: telemetry module | Medium | M7 release |

  1. System Tray + Hotkeys (Can be done in parallel, completes M5)
  2. Diarization Wiring (Server init + tests, completes M8 core)
  3. PyInstaller Packaging (Enables distribution)
  4. Remaining M7 items (Polish for release)

12) Architecture Reference

Key File Locations

| Component | Location |
|---|---|
| Domain Entities | src/noteflow/domain/entities/ |
| Repository Ports | src/noteflow/domain/ports/repositories.py |
| Application Services | src/noteflow/application/services/ |
| gRPC Server | src/noteflow/grpc/server.py, service.py |
| gRPC Client | src/noteflow/grpc/client.py |
| Audio Capture | src/noteflow/infrastructure/audio/ |
| ASR Engine | src/noteflow/infrastructure/asr/ |
| Persistence | src/noteflow/infrastructure/persistence/ |
| Security | src/noteflow/infrastructure/security/ |
| Summarization | src/noteflow/infrastructure/summarization/ |
| Client App | src/noteflow/client/app.py |
| UI Components | src/noteflow/client/components/ |

Data Flow

┌─────────────────────────────────────────────────────────────┐
│                         CLIENT                               │
├─────────────────────────────────────────────────────────────┤
│  Audio Capture → VU Meter → gRPC Stream → UI Components     │
│       ↓                         ↑                            │
│  sounddevice               Transcript Updates                │
└─────────────────────────────────────────────────────────────┘
                              │ gRPC
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                         SERVER                               │
├─────────────────────────────────────────────────────────────┤
│  Audio Buffer → VAD → Segmenter → ASR Engine                │
│       ↓                              ↓                       │
│  Encrypted Writer              Final Segments                │
│       ↓                              ↓                       │
│  audio.enc                    PostgreSQL + pgvector          │
└─────────────────────────────────────────────────────────────┘

Meeting Lifecycle States

CREATED → RECORDING → STOPPING → STOPPED → COMPLETED
    ↓          ↓           ↓
  ERROR ←──────┴───────────┘ (crash recovery)