Travis Vasceannie b333ea5b23 Add initial Docker and development environment setup

- Created .dockerignore to exclude unnecessary files from Docker builds.
- Added .repomixignore for managing ignored patterns in Repomix.
- Introduced Dockerfile.dev for development environment setup with Python 3.12.
- Configured docker-compose.yaml to define services, including a PostgreSQL database.
- Established a devcontainer.json for Visual Studio Code integration.
- Implemented postCreate.sh for automatic dependency installation in the dev container.
- Added constants.py to centralize configuration constants for the project.
- Updated pyproject.toml to include new development dependencies.
- Created initial documentation files for project overview and style conventions.
- Added tests for new functionalities to ensure reliability and correctness.

2025-12-19 05:02:16 +00:00

NoteFlow V1 Implementation Plan

Architecture: Client-Server with gRPC (evolved from original single-process design)
Core principles: Local-first, mic capture baseline, partial→final transcripts, evidence-linked summaries with strict citation enforcement.

Last updated: December 2025

1) Milestones and Gates

Milestone 0 — Spikes to de-risk platform & pipeline ✅ COMPLETE

Goal: validate the 4 biggest "desktop app cliffs" before committing to architecture.

Spikes (each ends with a tiny working prototype + written findings):

UI + Tray + Hotkeys feasibility ✅
- Verified: system tray/menubar icon, notification prompt, global hotkey start/stop
- Flet works for main UI; pystray/pynput validated for tray + hotkeys
- Location: spikes/spike_01_ui_tray_hotkeys/
Audio capture robustness ✅
- Validated sounddevice.InputStream with PortAudio:
  - default mic capture works
  - device unplug / device switch handling
  - stable VU meter feed
- Location: spikes/spike_02_audio_capture/
ASR latency feasibility ✅
- faster-whisper benchmarked at 0.05x real-time, i.e. processing takes about 5% of the audio duration (excellent)
- Model download/cache strategy validated
- Location: spikes/spike_03_asr_latency/
Key storage + encryption approach ✅
- OS keystore integration works (Keychain/Credential Manager via keyring)
- Encrypted streaming audio file validated (chunked AES-GCM, 826 MB/s throughput)
- Location: spikes/spike_04_encryption/

Exit criteria (M0): ✅ ALL MET

Start recording → see VU meter → stop → playback file on both OSs
Run ASR over captured audio and display text in UI
Store/read an encrypted blob using a stored master key

Milestone 1 — Repo foundation + CI + core contracts ✅ COMPLETE

Goal: establish maintainable structure, typing, test harness, logging.

Deliverables: ✅ ALL COMPLETE

Repository layout with hexagonal architecture (domain → application → infrastructure)
pyproject.toml + uv lockfile
Quality gates: ruff, basedpyright, pytest
Structured logging (structlog) with content-safe defaults
Settings system (Pydantic settings + NOTEFLOW_ env vars)
Minimal "app shell" (Flet UI opens, logs write)

Implementation locations:

Domain: src/noteflow/domain/ (entities, ports, value objects)
Application: src/noteflow/application/services/
Infrastructure: src/noteflow/infrastructure/
Config: src/noteflow/config/

Exit criteria: ✅ ALL MET

CI passes lint/type/tests
Running app opens a window (tray integration deferred to M5)

Milestone 2 — Meeting lifecycle + mic capture + crash-safe persistence ✅ COMPLETE

Goal: reliable recording as the foundation.

Deliverables: ✅ ALL COMPLETE

MeetingService state machine (CREATED → RECORDING → STOPPING → STOPPED → COMPLETED)
- Location: src/noteflow/application/services/meeting_service.py
Audio capture via SoundDeviceCapture
- Location: src/noteflow/infrastructure/audio/capture.py
Encrypted streaming asset writer (NFAE format, AES-256-GCM)
- Location: src/noteflow/infrastructure/audio/writer.py
- Crypto: src/noteflow/infrastructure/security/crypto.py
Meeting folder layout + manifest.json
- Format: ~/.noteflow/meetings/<meeting-uuid>/audio.enc + manifest.json
Active Meeting UI: timer + VU meter + start/stop
- Components: recording_timer.py, vu_meter.py in client/components/
Crash recovery via RecoveryService
- Location: src/noteflow/application/services/recovery_service.py
- Detects meetings left in RECORDING/STOPPING state, marks as ERROR

Exit criteria: ✅ ALL MET

Record 30 minutes without UI freezing
App restart after forced kill recovers incomplete meetings

Milestone 3 — Partial→Final transcription + transcript persistence ✅ COMPLETE

Goal: near real-time transcription with stability rules.

Deliverables: ✅ ALL COMPLETE

ASR wrapper service (faster-whisper with word timestamps)
- Location: src/noteflow/infrastructure/asr/engine.py
- Supports 13 model sizes, CPU/GPU, word-level timestamps
VAD + segment finalization logic
- EnergyVad: src/noteflow/infrastructure/asr/streaming_vad.py
- Segmenter: src/noteflow/infrastructure/asr/segmenter.py
Partial transcript feed to UI
- Server: _maybe_emit_partial() called during streaming (service.py:601)
- 2-second cadence with text deduplication
- Client: Handles is_final=False in client.py:458-467
- UI: [LIVE] row with blue styling (transcript.py:182-219)
Final segments persisted to PostgreSQL + pgvector
- Repository: src/noteflow/infrastructure/persistence/repositories/segment.py
Post-meeting transcript view
- Component: src/noteflow/client/components/transcript.py

Implementation details:

Server emits UPDATE_TYPE_PARTIAL every 2 seconds during speech activity
Minimum 0.5 seconds of audio before partial inference
Partial text deduplicated (only emitted when changed)
Client renders partials with is_final=False flag
UI displays [LIVE] indicator with blue background, grey italic text
Partial row cleared when final segment arrives

Exit criteria: ✅ ALL MET

Live view shows partial text that settles into final segments
After restart, final segments are still present and searchable within the meeting

Milestone 4 — Review UX: playback, annotations, export ✅ COMPLETE

Goal: navigable recall loop.

Deliverables:

Audio playback synced to segment timestamps
- PlaybackControls: src/noteflow/client/components/playback_controls.py
- PlaybackSyncController: src/noteflow/client/components/playback_sync.py
- SoundDevicePlayback: src/noteflow/infrastructure/audio/playback.py
Add annotations in live view + review view
- AnnotationToolbar: src/noteflow/client/components/annotation_toolbar.py
- AnnotationDisplay: src/noteflow/client/components/annotation_display.py
- All 4 types: ACTION_ITEM, DECISION, NOTE, RISK
Export: Markdown + HTML
- ExportService: src/noteflow/application/services/export_service.py
- Markdown exporter: src/noteflow/infrastructure/export/markdown.py
- HTML exporter: src/noteflow/infrastructure/export/html.py
Meeting library list + per-meeting search
- MeetingLibrary: src/noteflow/client/components/meeting_library.py
- TranscriptComponent with search: src/noteflow/client/components/transcript.py

Previous gaps — now closed:

Wire meeting library into the main UI and selection flow
Add per-meeting transcript search (client-side filter)
Add risk annotation type end-to-end (domain enum, UI, persistence)

Exit criteria: ✅ ALL MET

Clicking a segment seeks audio playback to that time
Export produces correct Markdown/HTML for at least one meeting

Milestone 5 — Smart triggers (confidence model) + snooze/suppression ⚠️ PARTIALLY INTEGRATED

Goal: prompts that are helpful, not annoying.

Deliverables:

Trigger engine + scoring
- TriggerService: src/noteflow/application/services/trigger_service.py
- Domain entities: src/noteflow/domain/triggers/entities.py
- TriggerSignal, TriggerDecision, TriggerAction (IGNORE, NOTIFY, AUTO_START)
SignalProvider protocol defined
- Location: src/noteflow/domain/triggers/ports.py
Foreground app detector integration
- Infrastructure: src/noteflow/infrastructure/triggers/foreground_app.py
- Wired via TriggerMixin: src/noteflow/client/_trigger_mixin.py
Audio activity detector integration
- Infrastructure: src/noteflow/infrastructure/triggers/audio_activity.py
- Wired via TriggerMixin: src/noteflow/client/_trigger_mixin.py
Optional calendar connector stub (disabled by default)
Trigger prompts + snooze (AlertDialog, not system notifications)
- TriggerMixin._show_trigger_prompt() displays AlertDialog
- Snooze button integrated
- Rate limiting active
System tray integration ← GAP
Global hotkeys ← GAP
Settings for sensitivity and auto-start opt-in (in TriggerService)

Current integration status:

Client app inherits from TriggerMixin (app.py:65)
Signal providers initialized in _initialize_triggers() method
Background trigger check loop runs via _trigger_check_loop()
Handles NOTIFY and AUTO_START actions
Prompts shown via Flet AlertDialog (not system notifications)

What works:

Confidence scoring with configurable thresholds (0.40 notify, 0.80 auto-start)
Rate limiting between triggers
Snooze functionality with remaining time tracking
Per-app suppression config
Foreground app detection (PyWinCtl)
Audio activity detection (RMS sliding window)

Remaining work:

System Tray Integration (New file: src/noteflow/client/tray.py)
- Integrate pystray for minimize-to-tray
- Show trigger prompts as system notifications
- Recording indicator icon
- Complexity: Medium (spike validated in spikes/spike_01_ui_tray_hotkeys/)
Global Hotkeys (New file: src/noteflow/client/hotkeys.py)
- Integrate pynput for start/stop/annotation hotkeys
- Complexity: Medium (spike validated)

Exit criteria:

Trigger prompts happen when expected and can be snoozed
Prompt rate-limited to prevent spam
System tray notifications (currently AlertDialog only)
Global hotkeys for quick actions

Milestone 6 — Evidence-linked summaries (extract → synthesize → verify) ✅ COMPLETE

Goal: no uncited claims.

Deliverables:

Summarizer provider interface
- Protocol: src/noteflow/domain/summarization/ports.py
- DTOs: SummarizationRequest, SummarizationResult, CitationVerificationResult
Provider implementations (3 complete):
- MockSummarizer: src/noteflow/infrastructure/summarization/mock_provider.py
- OllamaSummarizer (local): src/noteflow/infrastructure/summarization/ollama_provider.py
- CloudSummarizer (OpenAI/Anthropic): src/noteflow/infrastructure/summarization/cloud_provider.py
Citation verifier + "uncited drafts" handling
- CitationVerifier: src/noteflow/infrastructure/summarization/citation_verifier.py
- Validates segment_ids, filters invalid citations
Summary UI panel with clickable citations
- SummaryPanel: src/noteflow/client/components/summary_panel.py
- Shows key points + action items with evidence links
- "Uncited drafts hidden" toggle
Factory function for service creation
- create_summarization_service(): src/noteflow/infrastructure/summarization/factory.py
- Shared by client app and gRPC server

Application service complete:

SummarizationService: src/noteflow/application/services/summarization_service.py
Multi-provider with consent management
Fallback chain: CLOUD → LOCAL → MOCK
Citation verification and filtering
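
The CLOUD → LOCAL → MOCK fallback could be expressed as a walk over an ordered provider list. The sketch below is illustrative only: `Summarizer`, `StaticSummarizer`, and `pick_provider` are hypothetical names, and it assumes consent is required only for the cloud provider.

```python
from typing import Protocol


class Summarizer(Protocol):
    name: str

    def is_available(self) -> bool: ...
    def summarize(self, transcript: str) -> str: ...


class StaticSummarizer:
    """Hypothetical stand-in provider, used only for this illustration."""

    def __init__(self, name: str, available: bool) -> None:
        self.name = name
        self._available = available

    def is_available(self) -> bool:
        return self._available

    def summarize(self, transcript: str) -> str:
        return f"[{self.name}] {transcript[:40]}"


def pick_provider(chain: list[Summarizer], cloud_consent: bool) -> Summarizer:
    """Return the first usable provider, walking the chain in order."""
    for provider in chain:
        if provider.name == "cloud" and not cloud_consent:
            continue  # never send transcript text off-device without consent
        if provider.is_available():
            return provider
    raise RuntimeError("no summarization provider available")
```

Keeping the mock provider last means the call always has something to return in development, while the consent check runs before availability is even probed.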

gRPC integration complete:

GenerateSummary RPC calls SummarizationService.summarize()
Auto-detects provider availability (tries LOCAL, falls back to MOCK)
Placeholder fallback if service unavailable

Exit criteria: ✅ ALL MET

Every displayed bullet has citations (RPC wired to real service)
Clicking bullet jumps to cited transcript segment and audio timestamp

Milestone 7 — Retention, deletion, telemetry (opt-in), packaging ⚠️ RETENTION COMPLETE

Goal: ship safely.

Deliverables:

Retention job
- RetentionService: src/noteflow/application/services/retention_service.py
- Configurable retention days, dry-run support
- Runs at startup and periodically
Delete meeting (cryptographic delete)
- MeetingService.delete_meeting() removes:
  - Database rows (meeting, segments, summary, annotations)
  - Encrypted audio file (audio.enc)
  - Wrapped DEK from manifest (renders audio unrecoverable)
  - Meeting directory
Optional telemetry (content-free) ← GAP
PyInstaller build ← GAP
"Check for updates" flow ← GAP
Release checklist & troubleshooting docs

What's implemented:

Meeting deletion cascade is complete:
- DB cascade: meeting → segments → summary → annotations
- Filesystem: ~/.noteflow/meetings/<meeting-uuid>/ removed
- Crypto: DEK deleted from manifest, audio unrecoverable
Retention service is complete:
- RetentionService.run_cleanup() with dry-run
- Finds meetings older than retention cutoff
- Generates RetentionReport with counts
- Integration tests validate cascade
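
The retention pass described above (cutoff, dry-run, report with counts) might look roughly like the following. This is a sketch, not RetentionService itself: `cleanup_expired`, the report fields, and the dictionary input are assumptions chosen to keep the example self-contained.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class RetentionReport:
    checked: int = 0
    expired_ids: list[str] = field(default_factory=list)
    deleted: int = 0


def cleanup_expired(
    meetings: dict[str, datetime],      # meeting id -> time the meeting ended
    retention_days: int,
    now: datetime,
    delete: Callable[[str], None],      # performs the real deletion cascade
    dry_run: bool = True,
) -> RetentionReport:
    cutoff = now - timedelta(days=retention_days)
    report = RetentionReport()
    for meeting_id, ended_at in meetings.items():
        report.checked += 1
        if ended_at < cutoff:
            report.expired_ids.append(meeting_id)
            if not dry_run:
                delete(meeting_id)
                report.deleted += 1
    return report
```

Defaulting `dry_run` to True is a deliberate choice in this sketch: a caller has to opt in before anything is removed.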

Remaining work:

PyInstaller Packaging (New: build scripts)
- Create distributable for macOS + Windows
- Complexity: High (cross-platform, native deps)
Code Signing
- macOS notarization, Windows signing
- Complexity: Medium
Update Check Flow
- Version display + "Check for Updates" → release page
- Complexity: Low
Telemetry (Opt-in)
- Content-free metrics: crash stacktrace, latency, feature flags
- Complexity: Medium

Exit criteria:

A signed installer that installs and runs on both OSs
Deleting a meeting removes DB rows + assets; audio cannot be decrypted after key deletion

Milestone 8 (Optional pre‑release) — Post-meeting anonymous diarization ✅ COMPLETE

Goal: "Speaker A/B/C" best-effort labeling.

Deliverables:

Diarization engine with streaming + offline modes
- Location: src/noteflow/infrastructure/diarization/engine.py (315 lines)
- Streaming: diart library for real-time processing
- Offline: pyannote.audio for post-meeting refinement
- Device support: auto, cpu, cuda, mps
Speaker assignment logic
- Location: src/noteflow/infrastructure/diarization/assigner.py
- assign_speaker() maps time ranges via maximum overlap
- assign_speakers_batch() for bulk assignment
- Confidence scoring based on overlap duration
Data transfer objects
- Location: src/noteflow/infrastructure/diarization/dto.py
- SpeakerTurn with validation and overlap methods
Domain entity updates
- Segment.speaker_id: str | None and speaker_confidence: float
Proto/gRPC definitions
- FinalSegment.speaker_id and speaker_confidence fields
- ServerInfo.diarization_enabled and diarization_ready flags
- RefineSpeakerDiarization and RenameSpeaker RPCs
gRPC refinement RPC
- refine_speaker_diarization() in service.py for post-meeting processing
- rename_speaker() for user-friendly speaker labels
Configuration/settings
- diarization_enabled, diarization_hf_token, diarization_device
- diarization_streaming_latency, diarization_min/max_speakers
Dependencies added
- Optional extra [diarization]: pyannote.audio, diart, torch
UI display
- Speaker labels with color coding in transcript.py
- "Analyze Speakers" and "Rename Speakers" buttons in meeting_library.py
Server initialization
- DiarizationEngine wired in server.py with CLI args
- --diarization, --diarization-hf-token, --diarization-device flags
Client integration
- refine_speaker_diarization() and rename_speaker() methods in client.py
- DiarizationResult and RenameSpeakerResult DTOs
Tests
- 24 unit tests in tests/infrastructure/test_diarization.py
- Covers SpeakerTurn, assign_speaker(), assign_speakers_batch()

Deferred (optional future enhancement):

Streaming integration - Real-time speaker labels during recording
- Feed audio chunks to diarization during StreamTranscription
- Emit speaker changes in real-time
- Complexity: High (requires significant latency tuning)

Exit criteria: ✅ ALL MET

If diarization fails, app degrades gracefully to "Unknown."
Post-meeting diarization refinement works end-to-end
(Optional) Streaming diarization shows live speaker labels — deferred

2) Proposed Repository Layout

This layout is designed to:

separate server and client concerns,
isolate platform-specific code,
keep modules < 500 LoC,
make DI clean,
keep writing to disk centralized.

noteflow/
├─ pyproject.toml
├─ src/noteflow/
│  ├─ core/
│  │  ├─ config.py            # Settings (Pydantic) + load/save
│  │  ├─ logging.py           # structlog config, redaction helpers
│  │  ├─ types.py             # common NewTypes / Protocols
│  │  └─ errors.py            # domain error types
│  │
│  ├─ grpc/                   # gRPC server components
│  │  ├─ proto/
│  │  │  ├─ noteflow.proto    # Service definitions
│  │  │  ├─ noteflow_pb2.py   # Generated protobuf
│  │  │  └─ noteflow_pb2_grpc.py
│  │  ├─ server.py            # Server entry point
│  │  ├─ service.py           # NoteFlowServicer implementation
│  │  ├─ meeting_store.py     # In-memory meeting management
│  │  └─ client.py            # gRPC client wrapper
│  │
│  ├─ client/                 # GUI client application
│  │  ├─ app.py               # Flet app entry point
│  │  ├─ state.py             # App state store
│  │  └─ components/
│  │     ├─ transcript.py
│  │     ├─ vu_meter.py
│  │     └─ summary_panel.py
│  │
│  ├─ audio/                  # Audio capture (client-side)
│  │  ├─ capture.py           # sounddevice InputStream wrapper
│  │  ├─ levels.py            # RMS/VU meter computation
│  │  ├─ ring_buffer.py       # timestamped audio buffer
│  │  └─ playback.py          # audio playback synced to timestamp
│  │
│  ├─ asr/                    # ASR engine (server-side)
│  │  ├─ engine.py            # faster-whisper wrapper + model cache
│  │  ├─ segmenter.py         # partial/final logic, silence boundaries
│  │  └─ dto.py               # ASR outputs (words optional)
│  │
│  ├─ data/                   # Persistence (server-side)
│  │  ├─ db.py                # LanceDB connection + table handles
│  │  ├─ schema.py            # table schemas + version
│  │  └─ repos/
│  │     ├─ meetings.py
│  │     ├─ segments.py
│  │     └─ summaries.py
│  │
│  ├─ platform/               # Platform-specific (client-side)
│  │  ├─ tray/                # tray/menubar (pystray)
│  │  ├─ hotkeys/             # global hotkeys (pynput)
│  │  └─ notifications/       # toast notifications
│  │
│  └─ summarization/          # Summary generation (server-side)
│     ├─ providers/
│     │  ├─ base.py
│     │  └─ cloud.py
│     ├─ prompts.py
│     └─ verifier.py
│
├─ spikes/                    # De-risking spikes (M0)
│  ├─ spike_01_ui_tray_hotkeys/
│  ├─ spike_02_audio_capture/
│  ├─ spike_03_asr_latency/
│  └─ spike_04_encryption/
│
└─ tests/
   ├─ unit/
   ├─ integration/
   └─ e2e/

3) Core Runtime Design

3.1 State Machine (Meeting Lifecycle)

Define the lifecycle explicitly so the UI and services stay consistent.

IDLE
  ├─ start(manual/trigger) → RECORDING
  └─ prompt(trigger) → PROMPTED

PROMPTED
  ├─ accept → RECORDING
  └─ dismiss/snooze → IDLE

RECORDING
  ├─ stop → STOPPING
  ├─ error(audio) → ERROR (with recover attempt)
  └─ crash → RECOVERABLE_INCOMPLETE on restart

STOPPING
  ├─ flush assets/segments → REVIEW_READY
  └─ failure → REVIEW_READY (marked incomplete)

REVIEW_READY
  ├─ summarize → REVIEW_READY (summary updated)
  └─ delete → IDLE

Invariant: segments are only “final” when persisted. Partial text is never persisted.
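
One way to keep the diagram above enforceable is a transition table that rejects anything not listed. The sketch below assumes event names such as "flushed" and "failure" for illustration. The states are taken from the diagram, and ERROR is modeled as a plain state without the recovery attempt.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PROMPTED = auto()
    RECORDING = auto()
    STOPPING = auto()
    REVIEW_READY = auto()
    ERROR = auto()
    RECOVERABLE_INCOMPLETE = auto()


# (current state, event) -> next state; anything absent is illegal.
TRANSITIONS: dict[tuple[State, str], State] = {
    (State.IDLE, "start"): State.RECORDING,
    (State.IDLE, "prompt"): State.PROMPTED,
    (State.PROMPTED, "accept"): State.RECORDING,
    (State.PROMPTED, "dismiss"): State.IDLE,
    (State.PROMPTED, "snooze"): State.IDLE,
    (State.RECORDING, "stop"): State.STOPPING,
    (State.RECORDING, "error"): State.ERROR,
    (State.RECORDING, "crash"): State.RECOVERABLE_INCOMPLETE,
    (State.STOPPING, "flushed"): State.REVIEW_READY,
    (State.STOPPING, "failure"): State.REVIEW_READY,  # marked incomplete elsewhere
    (State.REVIEW_READY, "summarize"): State.REVIEW_READY,
    (State.REVIEW_READY, "delete"): State.IDLE,
}


def advance(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an unlisted transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in {state.name}") from None
```

Because the table is data, the UI and the services can both import it, which is the consistency this section asks for.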

3.2 Threading + Queue Model (Client-Server)

Server Threads:

gRPC thread pool: handles incoming RPC requests
ASR worker thread: processes audio buffers through faster-whisper
IO worker thread: only place that writes DB + manifest updates
Background jobs: summarization, diarization, retention

Client Threads:

Main/UI thread: Flet rendering + user actions
Audio callback thread: receives frames, does minimal work:
- compute lightweight RMS for VU meter
- enqueue frames to gRPC stream queue
gRPC stream thread: sends audio chunks, receives transcript updates
Event dispatch: updates UI from transcript callbacks

Rules:

Anything that may block for more than 5 ms must not run in the audio callback
Only the server's IO worker writes to the database
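
The audio-callback rule can be sketched with a bounded queue and a sender thread. `FramePump` and its members are hypothetical names. Plain Python lists stand in for NumPy arrays so the example has no dependencies, and `send` represents whatever blocking call writes to the gRPC stream.

```python
import math
import queue
import threading
from collections.abc import Callable


class FramePump:
    """Hands audio frames from the callback thread to a sender thread."""

    def __init__(self, send: Callable[[list[float]], None], maxsize: int = 256) -> None:
        self._send = send                # blocking work lives on the sender thread
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self.level = 0.0                 # latest RMS, polled by the VU meter
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_frames(self, samples: list[float]) -> None:
        """Audio callback: cheap math and a non-blocking enqueue only."""
        if samples:
            self.level = math.sqrt(sum(s * s for s in samples) / len(samples))
        try:
            self._queue.put_nowait(samples)
        except queue.Full:
            self.dropped += 1            # drop rather than stall the callback

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:             # sentinel posted by close()
                return
            self._send(item)

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
```

Counting dropped frames instead of blocking is the trade-off assumed here: a brief gap in the stream is preferred over an audio underrun.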

4) Dependency Injection and Service Wiring

Use a small container (manual DI) rather than a framework.

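# core/types.py
from datetime import datetime
from typing import Protocol

class Clock(Protocol):
    def monotonic(self) -> float: ...   # seconds, for measuring intervals
    def now(self) -> datetime: ...      # wall-clock time, for timestamps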

class Notifier(Protocol):
    def prompt_recording(self, title: str, body: str) -> None: ...
    def toast(self, title: str, body: str) -> None: ...

class ForegroundAppProvider(Protocol):
    def current_app(self) -> str | None: ...

class KeyStore(Protocol):
    def get_or_create_master_key(self) -> bytes: ...

# app.py (wiring idea)
def build_container() -> AppContainer:
    settings = load_settings()
    logger = configure_logging(settings)
    keystore = build_keystore()
    crypt = CryptoBox(keystore)
    db = LanceDatabase(settings.paths.db_dir)
    repos = Repositories(db)
    jobs = JobQueue(...)
    audio = AudioCapture(...)
    asr = AsrEngine(...)
    meeting = MeetingService(...)
    triggers = TriggerService(...)
    ui = UiController(...)
    return AppContainer(...)

5) Detailed Subsystem Plans

5.1 Audio Capture + Assets

AudioCapture

Responsibilities:

open/close stream
handle device change / reconnect
feed ring buffer
expose current level for VU meter

Key APIs:

from collections.abc import Callable

import numpy as np

class AudioCapture:
    def start(self, on_frames: Callable[[np.ndarray, float], None]) -> None: ...
    def stop(self) -> None: ...
    def current_device(self) -> AudioDeviceInfo: ...

RingBuffer (timestamped)

store (timestamp, frames) so segment times are stable even if UI thread lags
provide “last N seconds” view for ASR worker
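
A minimal sketch of such a buffer follows. `TimestampedRingBuffer` and its methods are hypothetical names, and frames are kept as opaque bytes so the example does not depend on NumPy.

```python
from collections import deque


class TimestampedRingBuffer:
    """Keeps roughly the most recent max_seconds of audio, chunk by chunk."""

    def __init__(self, max_seconds: float) -> None:
        self._max_seconds = max_seconds
        # Each entry is (start_time, duration, frames).
        self._chunks: deque[tuple[float, float, bytes]] = deque()

    def push(self, timestamp: float, duration: float, frames: bytes) -> None:
        self._chunks.append((timestamp, duration, frames))
        horizon = timestamp + duration - self._max_seconds
        # Evict chunks that ended at or before the retention horizon.
        while self._chunks and self._chunks[0][0] + self._chunks[0][1] <= horizon:
            self._chunks.popleft()

    def last(self, seconds: float) -> list[tuple[float, bytes]]:
        """Return (start_time, frames) for chunks overlapping the last N seconds."""
        if not self._chunks:
            return []
        newest_start, newest_duration, _ = self._chunks[-1]
        cutoff = newest_start + newest_duration - seconds
        return [(ts, frames) for ts, dur, frames in self._chunks if ts + dur > cutoff]
```

Timestamps come from the capture side, so the ASR worker sees stable segment times even when the consumer falls behind.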

VAD

Define an interface so you can swap implementations (webrtcvad vs silero) without rewriting the pipeline.

class Vad:
    def is_speech(self, pcm16: bytes, sample_rate: int) -> bool: ...
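
As an illustration of the interface, a simple energy-based implementation is sketched below. It is neither webrtcvad nor silero: `RmsVad` is a hypothetical name, the 0.01 threshold is an arbitrary starting point, and the input is assumed to be little-endian mono PCM16.

```python
import array
import math
import sys


class RmsVad:
    """Treats a frame as speech when its RMS level exceeds a fixed threshold."""

    def __init__(self, threshold: float = 0.01) -> None:
        self.threshold = threshold  # RMS on a 0..1 scale; tune per microphone

    def is_speech(self, pcm16: bytes, sample_rate: int) -> bool:
        # sample_rate is unused here but kept so the class satisfies the interface.
        samples = array.array("h")
        samples.frombytes(pcm16)
        if sys.byteorder == "big":
            samples.byteswap()      # input is assumed little-endian
        if not samples:
            return False
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
        return rms >= self.threshold
```

An energy gate like this reacts to any loud sound, not only speech, which is one reason to keep the interface swappable.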

Encrypted Audio Container (streaming)

Implementation approach (V1-safe): encrypted chunk format (AES-GCM) storing PCM16 frames. Optional: later add “compress after meeting” job (Opus) once stable.

Writer contract:

write header once
write chunks frequently (every ~200–500ms)
flush frequently (crash-safe)

Deletion contract:

delete per-meeting DEK record first (crypto delete)
delete meeting folder
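
The ordering in the deletion contract can be sketched as two steps. The function names and the `wrapped_dek` manifest field are assumptions, since the manifest schema is not shown in this plan.

```python
import json
import shutil
from pathlib import Path


def destroy_key(manifest_path: Path, key_field: str = "wrapped_dek") -> bool:
    """Remove the wrapped DEK from the manifest; True if a key was present."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    had_key = manifest.pop(key_field, None) is not None
    tmp_path = manifest_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(manifest), encoding="utf-8")
    tmp_path.replace(manifest_path)  # swap in the key-free manifest
    return had_key


def delete_meeting_assets(meeting_dir: Path) -> None:
    """Crypto delete first, then remove the folder and everything in it."""
    manifest_path = meeting_dir / "manifest.json"
    if manifest_path.exists():
        destroy_key(manifest_path)
    shutil.rmtree(meeting_dir, ignore_errors=True)
```

Destroying the key first means that if folder removal is interrupted, whatever ciphertext remains on disk is already undecryptable.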

5.2 ASR and Segment Finalization

ASR Engine Wrapper (faster-whisper)

Responsibilities:

model download/cache
run inference
return tokens/segments with timestamps (word timestamps optional)

class AsrEngine:
    def transcribe(self, audio_f32_16k: np.ndarray) -> AsrResult: ...

Segmenter (partial/final)

Responsibilities:

build current “active utterance” from VAD-speech frames
run partial inference every N seconds
finalize when silence boundary detected

Data contract:

PartialUpdate: {text, start_offset, end_offset, stable=False}
FinalSegment: {segment_id, text, start_offset, end_offset, stable=True}

Important: final segments get their IDs at commit time (IO worker), not earlier.

5.3 Persistence (LanceDB + repositories)

DB access policy

One DB connection managed centrally
IO worker serializes all writes

Repositories:

MeetingsRepo: create/update meeting status, store DEK metadata reference
SegmentsRepo: append segments, query by meeting, basic search
AnnotationsRepo: add/list annotations
SummariesRepo: store summary + verification report

Also store:

schema version
app version
migration logic (even if minimal)

5.4 MeetingService (Orchestration)

Responsibilities:

create meeting directory + metadata
start/stop audio capture
start/stop ASR segmenter
handle UI events (annotation hotkeys, stop, etc.)
coordinate with TriggerService
ensure crash-safe flush and marking incomplete

Key public API:

class MeetingService:
    def start(self, source: TriggerSource) -> MeetingID: ...
    def stop(self) -> None: ...
    def add_annotation(self, type: AnnotationType, text: str | None = None) -> None: ...
    def current_meeting_id(self) -> MeetingID | None: ...

5.5 TriggerService (Confidence Model + throttling)

Inputs (each independently optional):

calendar (optional connector)
foreground app provider
audio activity provider

Outputs:

prompt notification
optional auto-start (if user enabled)
snooze & suppression state

Policies:

rate limit prompts (e.g., max 1 prompt / 10 min)
cooldown after dismiss
per-app suppression config

Implementation detail:

TriggerService publishes events via signals:
- trigger_prompted
- trigger_snoozed
- trigger_accepted

5.6 Summarization Service (Extract → Synthesize → Verify)

Provider interface:

from typing import Protocol

class SummarizerProvider(Protocol):
    def extract(self, transcript: str) -> ExtractionResult: ...
    def synthesize(self, extraction: ExtractionResult) -> DraftSummary: ...

Verifier:

parse bullets
ensure each displayed bullet contains [...] with at least one Segment ID
uncited bullets go into uncited_points and are hidden by default
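
The verifier rules above can be sketched with a regular expression. This assumes citations look like `[1, 3]` with integer segment IDs; the real CitationVerifier may use a different format, and `verify_bullets` and `VerificationResult` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

CITATION = re.compile(r"\[([^\[\]]*)\]")  # contents of one [...] group


@dataclass
class VerificationResult:
    cited_points: list[tuple[str, list[int]]] = field(default_factory=list)
    uncited_points: list[str] = field(default_factory=list)


def verify_bullets(bullets: list[str], valid_segment_ids: set[int]) -> VerificationResult:
    result = VerificationResult()
    for bullet in bullets:
        ids: list[int] = []
        for group in CITATION.findall(bullet):
            for token in group.split(","):
                token = token.strip()
                # Keep only well-formed IDs that refer to a real segment.
                if token.isascii() and token.isdigit() and int(token) in valid_segment_ids:
                    ids.append(int(token))
        if ids:
            result.cited_points.append((bullet, ids))
        else:
            result.uncited_points.append(bullet)  # hidden by default in the UI
    return result
```

A bullet with at least one valid ID is shown with only its valid citations, so missing brackets, unknown IDs, and empty brackets all land in `uncited_points`.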

UI behavior:

Summary panel shows “X uncited drafts hidden” toggle
Clicking bullet scrolls transcript and seeks audio

Testing requirement:

Summary verifier must be unit-tested with adversarial outputs (missing brackets, invalid IDs, empty citations).

5.7 UI Implementation Approach (Flet)

State management

Treat UI as a thin layer over a single state store:

AppState
- current meeting status
- live transcript partial
- list of finalized segments
- playback state
- summary state
- settings state
- prompt/snooze state

Changes flow:

Services emit signals (blinker)
UI controller converts signal payload → state update → re-render

This avoids UI code reaching into services and creating race conditions.

6) Testing Plan (Practical and CI-friendly)

Unit tests (fast)

Trigger scoring + thresholds
Summarization verifier
Segment model validation (end >= start)
Retention policy logic
Encryption chunk read/write roundtrip

Integration tests

DB CRUD roundtrip for each repo
Meeting create → segments append → summary store
Delete meeting removes all rows and assets

E2E tests (required)

Audio injection harness

Feed prerecorded WAV into AudioCapture abstraction (mock capture)
Run through VAD + ASR pipeline
Assert:
- segments are produced
- partial updates happen
- final segments persist
- seeking works (timestamp consistency)

Note: CI should never require a live microphone.
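
A mock capture for this harness could replay a WAV file through the same callback the real capture uses. In the sketch below, `WavCapture` and `make_test_wav` are hypothetical names, and frames are delivered as raw PCM16 bytes rather than NumPy arrays to keep the example dependency-free.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds: float = 1.0, rate: int = 16000, freq: float = 440.0) -> bytes:
    """Synthesize a mono PCM16 sine tone so tests need no fixture file."""
    count = int(seconds * rate)
    samples = (int(12000 * math.sin(2 * math.pi * freq * i / rate)) for i in range(count))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return buf.getvalue()


class WavCapture:
    """Mock capture: replays WAV data through the on_frames callback."""

    def __init__(self, wav_bytes: bytes, chunk_ms: int = 100) -> None:
        self._wav_bytes = wav_bytes
        self._chunk_ms = chunk_ms

    def start(self, on_frames) -> None:
        with wave.open(io.BytesIO(self._wav_bytes), "rb") as wav:
            rate = wav.getframerate()
            frame_bytes = wav.getsampwidth() * wav.getnchannels()
            frames_per_chunk = rate * self._chunk_ms // 1000
            position = 0  # frames delivered so far
            while True:
                chunk = wav.readframes(frames_per_chunk)
                if not chunk:
                    break
                on_frames(chunk, position / rate)  # timestamp derived from the file
                position += len(chunk) // frame_bytes
```

Deriving timestamps from the file position rather than the wall clock keeps the run deterministic, so seeking assertions can compare exact offsets.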

7) Release Checklist (V1)

Recording indicator always visible when capturing
Permission errors show actionable instructions
Crash recovery works for incomplete meetings
Summary bullets displayed are always cited
Delete meeting removes keys + assets + DB rows
Telemetry default off; no content ever logged
Build artifacts install/run on macOS + Windows

8) "First Implementation Targets" (what to build first)

Build server-side first, then client, to ensure a reliable foundation:

Server (build first):

gRPC service skeleton - proto definitions + basic server startup
Meeting store - in-memory meeting lifecycle management
ASR integration - faster-whisper wrapper with streaming output
Bidirectional streaming - audio in, transcripts out
Persistence - LanceDB storage for meetings/segments
Summarization - evidence-linked summary generation

Client (build second):

gRPC client wrapper - connection management + streaming
Audio capture - sounddevice integration + VU meter
Live UI - Flet app with transcript display
Tray + hotkeys - pystray/pynput integration
Review view - playback synced to transcript
Packaging - PyInstaller for both server and client

This ordering ensures the server is stable before building client features on top.

9) Minimal API Skeletons (so devs can start coding)

gRPC Service Definition (proto)

service NoteFlowService {
  // Bidirectional streaming: audio → transcripts
  rpc StreamTranscription(stream AudioChunk) returns (stream TranscriptUpdate);

  // Meeting lifecycle
  rpc CreateMeeting(CreateMeetingRequest) returns (Meeting);
  rpc StopMeeting(StopMeetingRequest) returns (Meeting);
  rpc ListMeetings(ListMeetingsRequest) returns (ListMeetingsResponse);
  rpc GetMeeting(GetMeetingRequest) returns (Meeting);

  // Summary generation
  rpc GenerateSummary(GenerateSummaryRequest) returns (Summary);

  // Server health
  rpc GetServerInfo(ServerInfoRequest) returns (ServerInfo);
}

Client Callback Types

# Client receives these from server via gRPC stream
from collections.abc import Callable
from dataclasses import dataclass

@dataclass
class TranscriptSegment:
    segment_id: int
    text: str
    start_time: float
    end_time: float
    language: str
    is_final: bool

# Callback signatures
TranscriptCallback = Callable[[TranscriptSegment], None]
ConnectionCallback = Callable[[bool, str], None]  # connected, message

Client-Side Signals (UI updates)

# client/signals.py - for UI thread dispatch
from blinker import signal

audio_level_updated = signal("audio_level_updated")     # rms: float
transcript_received = signal("transcript_received")     # TranscriptSegment
connection_changed = signal("connection_changed")       # connected: bool, message: str

A minimal “job queue” contract:

from typing import Protocol

class Job(Protocol):
    id: str
    def run(self) -> None: ...

class JobQueue:
    def submit(self, job: Job) -> None: ...
    def cancel(self, job_id: str) -> None: ...

10) Current Implementation Status

Summary by Milestone

Milestone	Status	Completeness
M0 Spikes	✅ Complete	100%
M1 Repo Foundation	✅ Complete	100%
M2 Meeting Lifecycle	✅ Complete	100%
M3 Transcription	✅ Complete	100%
M4 Review UX	✅ Complete	100%
M5 Triggers	⚠️ Partial	70% (integrated via mixin, tray/hotkeys not)
M6 Summarization	✅ Complete	100%
M7 Packaging	⚠️ Partial	40% (retention done, packaging not)
M8 Diarization	⚠️ Partial	55% (infrastructure done, wiring not)

Layer-by-Layer Status

Domain Layer ✅ 100%

Meeting entity with state machine
Segment entity with word-level timing
Annotation entity (4 types)
Summary entity with evidence links (KeyPoint, ActionItem)
Repository ports (Protocol-based DI)
Unit of Work port
Trigger domain (TriggerSignal, TriggerDecision)
Summarization ports

Application Layer ✅ 100%

MeetingService - full CRUD + lifecycle
SummarizationService - multi-provider, consent, verification
TriggerService - scoring, rate limiting, snooze
RetentionService - cleanup, dry-run
ExportService - Markdown, HTML
RecoveryService - crash recovery

Infrastructure Layer ✅ 98%

Audio: capture, ring buffer, levels, playback, encrypted writer/reader
ASR: faster-whisper engine, VAD, segmenter
Persistence: SQLAlchemy + pgvector, Alembic migrations
Security: AES-256-GCM, keyring keystore
Summarization: Mock, Ollama, Cloud providers + citation verifier
Export: Markdown, HTML formatters
Triggers: signal providers wired via TriggerMixin
Diarization: engine, assigner, DTOs (not wired to server)

gRPC Layer ✅ 100%

Proto definitions with bidirectional streaming
Server: StreamTranscription, CreateMeeting, StopMeeting, etc.
Client wrapper with connection management
Meeting store (in-memory + DB modes)
GenerateSummary RPC wired to SummarizationService
Partial transcript streaming (2-second cadence, deduplication)

Client Layer ✅ 85%

Flet app with state management
VU meter, recording timer, transcript
Playback controls + sync controller
Annotation toolbar + display
Meeting library
Summary panel with clickable citations
Connection panel with auto-reconnect
Trigger detection via TriggerMixin (AlertDialog prompts)
System tray integration (spike validated, not integrated)
Global hotkeys (spike validated, not integrated)

11) Remaining Work Summary

Medium Priority (Platform Features)

#	Task	Files	Complexity	Blocker For
1	System Tray Integration	New: `src/noteflow/client/tray.py`	Medium	M5 completion
	Integrate pystray for minimize-to-tray, system notifications, recording indicator
2	Global Hotkeys	New: `src/noteflow/client/hotkeys.py`	Medium	M5 completion
	Integrate pynput for start/stop/annotation hotkeys

Medium Priority (Diarization Wiring)

#	Task	Files	Complexity	Blocker For
3	Diarization Application Service	New: `application/services/diarization_service.py`	Medium	M8 completion
	Orchestrate diarization workflow, model management
4	Diarization Server Wiring	`src/noteflow/grpc/server.py`	Low	M8 completion
	Initialize DiarizationEngine on startup when enabled
5	Diarization Tests	New: `tests/infrastructure/diarization/`	Medium	M8 stability
	Unit tests for engine, assigner, DTOs

Lower Priority (Shipping)

#	Task	Files	Complexity	Blocker For
6	PyInstaller Packaging	New: build scripts	High	M7 release
	Create distributable for macOS + Windows
7	Code Signing	Build config	Medium	M7 release
	macOS notarization, Windows signing
8	Update Check Flow	New: `src/noteflow/client/update.py`	Low	M7 release
	Version display + "Check for Updates" link
9	Telemetry (Opt-in)	New: telemetry module	Medium	M7 release
	Content-free metrics collection

Recommended Implementation Order

1. System Tray + Hotkeys (can be done in parallel; completes M5)
2. Diarization Wiring (server init + tests; completes M8 core)
3. PyInstaller Packaging (enables distribution)
4. Remaining M7 items (polish for release)

12) Architecture Reference

Key File Locations

Component	Location
Domain Entities	`src/noteflow/domain/entities/`
Repository Ports	`src/noteflow/domain/ports/repositories.py`
Application Services	`src/noteflow/application/services/`
gRPC Server	`src/noteflow/grpc/server.py`, `service.py`
gRPC Client	`src/noteflow/grpc/client.py`
Audio Capture	`src/noteflow/infrastructure/audio/`
ASR Engine	`src/noteflow/infrastructure/asr/`
Persistence	`src/noteflow/infrastructure/persistence/`
Security	`src/noteflow/infrastructure/security/`
Summarization	`src/noteflow/infrastructure/summarization/`
Client App	`src/noteflow/client/app.py`
UI Components	`src/noteflow/client/components/`

Data Flow

┌─────────────────────────────────────────────────────────────┐
│                         CLIENT                               │
├─────────────────────────────────────────────────────────────┤
│  Audio Capture → VU Meter → gRPC Stream → UI Components     │
│       ↓                         ↑                            │
│  sounddevice               Transcript Updates                │
└─────────────────────────────────────────────────────────────┘
                              │ gRPC
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                         SERVER                               │
├─────────────────────────────────────────────────────────────┤
│  Audio Buffer → VAD → Segmenter → ASR Engine                │
│       ↓                              ↓                       │
│  Encrypted Writer              Final Segments                │
│       ↓                              ↓                       │
│  audio.enc                    PostgreSQL + pgvector          │
└─────────────────────────────────────────────────────────────┘
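
The server-side path from audio buffer through VAD to the segmenter can be sketched roughly as below. The energy-threshold `is_speech` function is a stand-in for whatever VAD the server actually uses, and `Segment`, `segment_frames`, and the default parameters are hypothetical; the ASR call and encrypted writer are omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Segment:
    start_s: float
    end_s: float
    samples: List[float]


def is_speech(frame: List[float], threshold: float = 0.01) -> bool:
    """Stand-in VAD: mean energy above a fixed threshold (an assumption)."""
    if not frame:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    return energy >= threshold


def segment_frames(
    frames: Iterable[List[float]],
    frame_s: float = 0.02,
    max_silence_frames: int = 3,
) -> Iterator[Segment]:
    """Group speech frames into segments, closing one after enough trailing silence."""
    buf: List[float] = []
    start = 0.0
    last_speech_end = 0.0
    silence = 0
    for i, frame in enumerate(frames):
        t = i * frame_s
        if is_speech(frame):
            if not buf:
                start = t  # first speech frame opens a new segment
            buf.extend(frame)
            last_speech_end = t + frame_s
            silence = 0
        elif buf:
            # Silent frames are counted but, for brevity, not kept in the buffer.
            silence += 1
            if silence >= max_silence_frames:
                yield Segment(start, last_speech_end, buf)
                buf, silence = [], 0
    if buf:  # flush whatever is left when the stream ends
        yield Segment(start, last_speech_end, buf)
```

Each yielded `Segment` would then be handed to the ASR engine to produce a final transcript segment.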

Meeting Lifecycle States

CREATED → RECORDING → STOPPING → STOPPED → COMPLETED
    ↓          ↓           ↓
  ERROR ←──────┴───────────┘ (crash recovery)
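
The lifecycle diagram above can be encoded as a small transition table. This is a minimal sketch: `MeetingState`, `TRANSITIONS`, `InvalidTransition`, and `transition` are illustrative names, and treating ERROR as terminal is an assumption, since the diagram does not show the recovery path out of it.

```python
from enum import Enum
from typing import Dict, Set


class MeetingState(Enum):
    CREATED = "created"
    RECORDING = "recording"
    STOPPING = "stopping"
    STOPPED = "stopped"
    COMPLETED = "completed"
    ERROR = "error"


# Allowed transitions, read off the diagram above.
TRANSITIONS: Dict[MeetingState, Set[MeetingState]] = {
    MeetingState.CREATED: {MeetingState.RECORDING, MeetingState.ERROR},
    MeetingState.RECORDING: {MeetingState.STOPPING, MeetingState.ERROR},
    MeetingState.STOPPING: {MeetingState.STOPPED, MeetingState.ERROR},
    MeetingState.STOPPED: {MeetingState.COMPLETED},
    MeetingState.COMPLETED: set(),
    MeetingState.ERROR: set(),  # assumption: recovery out of ERROR is handled elsewhere
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the table."""


def transition(current: MeetingState, target: MeetingState) -> MeetingState:
    """Return `target` if the move is allowed, otherwise raise InvalidTransition."""
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.name} -> {target.name} is not allowed")
    return target
```

Keeping the table as data makes it straightforward to assert in tests that every state has an entry and that no unintended edge exists.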
