NoteFlow V1 Implementation Plan
Architecture: Client-Server with gRPC (evolved from the original single-process design).
Core principles: local-first, mic capture baseline, partial→final transcripts, evidence-linked summaries with strict citation enforcement.
Last updated: December 2025
1) Milestones and Gates
Milestone 0 — Spikes to de-risk platform & pipeline ✅ COMPLETE
Goal: validate the 4 biggest "desktop app cliffs" before committing to architecture.
Spikes (each ends with a tiny working prototype + written findings):
- UI + Tray + Hotkeys feasibility ✅
  - Verified: system tray/menubar icon, notification prompt, global hotkey start/stop
  - Flet works for main UI; pystray/pynput validated for tray + hotkeys
  - Location: spikes/spike_01_ui_tray_hotkeys/
- Audio capture robustness ✅
  - Validated sounddevice.InputStream with PortAudio:
    - default mic capture works
    - device unplug / device switch handling
    - stable VU meter feed
  - Location: spikes/spike_02_audio_capture/
- ASR latency feasibility ✅
  - faster-whisper benchmarked at 0.05x real-time (about 3 seconds to transcribe 1 minute of audio)
  - Model download/cache strategy validated
  - Location: spikes/spike_03_asr_latency/
- Key storage + encryption approach ✅
  - OS keystore integration works (Keychain/Credential Manager via keyring)
  - Encrypted streaming audio file validated (chunked AES-GCM, 826 MB/s throughput)
  - Location: spikes/spike_04_encryption/
Exit criteria (M0): ✅ ALL MET
- Start recording → see VU meter → stop → play back the file on both OSs
- Run ASR over captured audio and display text in UI
- Store/read an encrypted blob using a stored master key
Milestone 1 — Repo foundation + CI + core contracts ✅ COMPLETE
Goal: establish maintainable structure, typing, test harness, logging.
Deliverables: ✅ ALL COMPLETE
- Repository layout with hexagonal architecture (domain → application → infrastructure)
- pyproject.toml + uv lockfile
- Quality gates: ruff, basedpyright, pytest
- Structured logging (structlog) with content-safe defaults
- Settings system (Pydantic settings + NOTEFLOW_-prefixed env vars)
- Minimal "app shell" (Flet UI opens, logs write)
Implementation locations:
- Domain: src/noteflow/domain/ (entities, ports, value objects)
- Application: src/noteflow/application/services/
- Infrastructure: src/noteflow/infrastructure/
- Config: src/noteflow/config/
Exit criteria: ✅ ALL MET
- CI passes lint/type/tests
- Running app opens a window (tray integration deferred to M5)
Milestone 2 — Meeting lifecycle + mic capture + crash-safe persistence ✅ COMPLETE
Goal: reliable recording as the foundation.
Deliverables: ✅ ALL COMPLETE
- MeetingService state machine (CREATED → RECORDING → STOPPING → STOPPED → COMPLETED)
  - Location: src/noteflow/application/services/meeting_service.py
- Audio capture via SoundDeviceCapture
  - Location: src/noteflow/infrastructure/audio/capture.py
- Encrypted streaming asset writer (NFAE format, AES-256-GCM)
  - Location: src/noteflow/infrastructure/audio/writer.py
  - Crypto: src/noteflow/infrastructure/security/crypto.py
- Meeting folder layout + manifest.json
  - Format: ~/.noteflow/meetings/<meeting-uuid>/audio.enc + manifest.json
- Active Meeting UI: timer + VU meter + start/stop
  - Components: recording_timer.py, vu_meter.py in client/components/
- Crash recovery via RecoveryService
  - Location: src/noteflow/application/services/recovery_service.py
  - Detects meetings left in RECORDING/STOPPING state, marks them as ERROR
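The crash-recovery rule above reduces to a small pure function. This is a sketch, not RecoveryService itself: MeetingRecord and recover_interrupted are hypothetical names, and the real service loads and saves meetings through the repositories.

```python
from dataclasses import dataclass
from enum import Enum, auto


class MeetingState(Enum):
    CREATED = auto()
    RECORDING = auto()
    STOPPING = auto()
    STOPPED = auto()
    COMPLETED = auto()
    ERROR = auto()


# States that mean the process died while capture was still in progress.
INTERRUPTED_STATES = frozenset({MeetingState.RECORDING, MeetingState.STOPPING})


@dataclass
class MeetingRecord:  # hypothetical stand-in for a persisted meeting row
    meeting_id: str
    state: MeetingState


def recover_interrupted(meetings: list[MeetingRecord]) -> list[str]:
    """Mark meetings stuck in RECORDING/STOPPING as ERROR and return their IDs."""
    recovered: list[str] = []
    for meeting in meetings:
        if meeting.state in INTERRUPTED_STATES:
            meeting.state = MeetingState.ERROR
            recovered.append(meeting.meeting_id)
    return recovered
```

Running it twice is harmless: the second pass finds nothing left in an interrupted state.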
Exit criteria: ✅ ALL MET
- Record 30 minutes without UI freezing
- App restart after forced kill recovers incomplete meetings
Milestone 3 — Partial→Final transcription + transcript persistence ✅ COMPLETE
Goal: near real-time transcription with stability rules.
Deliverables: ✅ ALL COMPLETE
- ASR wrapper service (faster-whisper with word timestamps)
  - Location: src/noteflow/infrastructure/asr/engine.py
  - Supports 13 model sizes, CPU/GPU, word-level timestamps
- VAD + segment finalization logic
  - EnergyVad: src/noteflow/infrastructure/asr/streaming_vad.py
  - Segmenter: src/noteflow/infrastructure/asr/segmenter.py
- Partial transcript feed to UI
  - Server: _maybe_emit_partial() called during streaming (service.py:601)
  - 2-second cadence with text deduplication
  - Client: handles is_final=False in client.py:458-467
  - UI: [LIVE] row with blue styling (transcript.py:182-219)
- Final segments persisted to PostgreSQL + pgvector
  - Repository: src/noteflow/infrastructure/persistence/repositories/segment.py
- Post-meeting transcript view
  - Component: src/noteflow/client/components/transcript.py
Implementation details:
- Server emits UPDATE_TYPE_PARTIAL every 2 seconds during speech activity
- Minimum 0.5 seconds of audio before partial inference
- Partial text deduplicated (only emitted when changed)
- Client renders partials with the is_final=False flag
- UI displays a [LIVE] indicator with blue background, grey italic text
- Partial row cleared when the final segment arrives
Exit criteria: ✅ ALL MET
- Live view shows partial text that settles into final segments
- After restart, final segments are still present and searchable within the meeting
Milestone 4 — Review UX: playback, annotations, export ✅ COMPLETE
Goal: navigable recall loop.
Deliverables:
- Audio playback synced to segment timestamps
  - PlaybackControls: src/noteflow/client/components/playback_controls.py
  - PlaybackSyncController: src/noteflow/client/components/playback_sync.py
  - SoundDevicePlayback: src/noteflow/infrastructure/audio/playback.py
- Add annotations in live view + review view
  - AnnotationToolbar: src/noteflow/client/components/annotation_toolbar.py
  - AnnotationDisplay: src/noteflow/client/components/annotation_display.py
  - All 4 types: ACTION_ITEM, DECISION, NOTE, RISK
- Export: Markdown + HTML
  - ExportService: src/noteflow/application/services/export_service.py
  - Markdown exporter: src/noteflow/infrastructure/export/markdown.py
  - HTML exporter: src/noteflow/infrastructure/export/html.py
- Meeting library list + per-meeting search
  - MeetingLibrary: src/noteflow/client/components/meeting_library.py
  - TranscriptComponent with search: src/noteflow/client/components/transcript.py
Previous gaps (now closed):
- Wire meeting library into the main UI and selection flow
- Add per-meeting transcript search (client-side filter)
- Add risk annotation type end-to-end (domain enum, UI, persistence)
Exit criteria: ✅ ALL MET
- Clicking a segment seeks audio playback to that time
- Export produces correct Markdown/HTML for at least one meeting
Milestone 5 — Smart triggers (confidence model) + snooze/suppression ⚠️ PARTIALLY INTEGRATED
Goal: prompts that are helpful, not annoying.
Deliverables:
- Trigger engine + scoring
  - TriggerService: src/noteflow/application/services/trigger_service.py
  - Domain entities: src/noteflow/domain/triggers/entities.py
    - TriggerSignal, TriggerDecision, TriggerAction (IGNORE, NOTIFY, AUTO_START)
  - SignalProvider protocol defined
    - Location: src/noteflow/domain/triggers/ports.py
- Foreground app detector integration
  - Infrastructure: src/noteflow/infrastructure/triggers/foreground_app.py
  - Wired via TriggerMixin: src/noteflow/client/_trigger_mixin.py
- Audio activity detector integration
  - Infrastructure: src/noteflow/infrastructure/triggers/audio_activity.py
  - Wired via TriggerMixin: src/noteflow/client/_trigger_mixin.py
- Optional calendar connector stub (disabled by default)
- Trigger prompts + snooze (AlertDialog, not system notifications)
  - TriggerMixin._show_trigger_prompt() displays AlertDialog
  - Snooze button integrated
  - Rate limiting active
- System tray integration ← GAP
- Global hotkeys ← GAP
- Settings for sensitivity and auto-start opt-in (in TriggerService)
Current integration status:
- Client app inherits from TriggerMixin (app.py:65)
- Signal providers initialized in the _initialize_triggers() method
- Background trigger check loop runs via _trigger_check_loop()
- Handles NOTIFY and AUTO_START actions
- Prompts shown via Flet AlertDialog (not system notifications)
What works:
- Confidence scoring with configurable thresholds (0.40 notify, 0.80 auto-start)
- Rate limiting between triggers
- Snooze functionality with remaining time tracking
- Per-app suppression config
- Foreground app detection (PyWinCtl)
- Audio activity detection (RMS sliding window)
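A sketch of how the two thresholds could map a confidence score to a TriggerAction. It assumes signal weights are summed and clamped to 1.0, and that auto-start also requires the user's opt-in; Signal and decide are hypothetical, and the actual combination rule inside TriggerService is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class TriggerAction(Enum):
    IGNORE = "ignore"
    NOTIFY = "notify"
    AUTO_START = "auto_start"


@dataclass(frozen=True)
class Signal:  # hypothetical stand-in for TriggerSignal
    source: str
    weight: float  # contribution in [0, 1]


NOTIFY_THRESHOLD = 0.40
AUTO_START_THRESHOLD = 0.80


def decide(signals: list[Signal], auto_start_enabled: bool) -> tuple[float, TriggerAction]:
    # Sum non-negative contributions and clamp so many weak signals cannot exceed 1.0.
    confidence = min(1.0, sum(max(0.0, s.weight) for s in signals))
    if confidence >= AUTO_START_THRESHOLD and auto_start_enabled:
        return confidence, TriggerAction.AUTO_START
    if confidence >= NOTIFY_THRESHOLD:
        return confidence, TriggerAction.NOTIFY  # high score without opt-in still only prompts
    return confidence, TriggerAction.IGNORE
```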
Remaining work:
- System Tray Integration (new file: src/noteflow/client/tray.py)
  - Integrate pystray for minimize-to-tray
  - Show trigger prompts as system notifications
  - Recording indicator icon
  - Complexity: Medium (spike validated in spikes/spike_01_ui_tray_hotkeys/)
- Global Hotkeys (new file: src/noteflow/client/hotkeys.py)
  - Integrate pynput for start/stop/annotation hotkeys
  - Complexity: Medium (spike validated)
Exit criteria:
- Trigger prompts happen when expected and can be snoozed
- Prompt rate-limited to prevent spam
- System tray notifications (currently AlertDialog only)
- Global hotkeys for quick actions
Milestone 6 — Evidence-linked summaries (extract → synthesize → verify) ✅ COMPLETE
Goal: no uncited claims.
Deliverables:
- Summarizer provider interface
  - Protocol: src/noteflow/domain/summarization/ports.py
  - DTOs: SummarizationRequest, SummarizationResult, CitationVerificationResult
- Provider implementations (3 complete):
  - MockSummarizer: src/noteflow/infrastructure/summarization/mock_provider.py
  - OllamaSummarizer (local): src/noteflow/infrastructure/summarization/ollama_provider.py
  - CloudSummarizer (OpenAI/Anthropic): src/noteflow/infrastructure/summarization/cloud_provider.py
- Citation verifier + "uncited drafts" handling
  - CitationVerifier: src/noteflow/infrastructure/summarization/citation_verifier.py
  - Validates segment_ids, filters invalid citations
- Summary UI panel with clickable citations
  - SummaryPanel: src/noteflow/client/components/summary_panel.py
  - Shows key points + action items with evidence links
  - "Uncited drafts hidden" toggle
- Factory function for service creation
  - create_summarization_service(): src/noteflow/infrastructure/summarization/factory.py
  - Shared by client app and gRPC server
Application service complete:
- SummarizationService: src/noteflow/application/services/summarization_service.py
  - Multi-provider with consent management
  - Fallback chain: CLOUD → LOCAL → MOCK
  - Citation verification and filtering
gRPC integration complete:
- GenerateSummary RPC calls SummarizationService.summarize()
- Auto-detects provider availability (tries LOCAL, falls back to MOCK)
- Placeholder fallback if the service is unavailable
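The CLOUD → LOCAL → MOCK chain can be expressed as an ordered scan. This sketch assumes that consent management means the cloud provider is skipped entirely without user consent, and that the mock provider is always usable; Provider and pick_provider are hypothetical names.

```python
from collections.abc import Callable
from enum import Enum


class Provider(Enum):
    CLOUD = "cloud"
    LOCAL = "local"
    MOCK = "mock"


def pick_provider(is_available: Callable[[Provider], bool], cloud_consent: bool) -> Provider:
    """Return the first usable provider in CLOUD → LOCAL → MOCK order."""
    for provider in (Provider.CLOUD, Provider.LOCAL):
        if provider is Provider.CLOUD and not cloud_consent:
            continue  # never send transcript text off-device without consent
        if is_available(provider):
            return provider
    return Provider.MOCK  # last resort, always available
```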
Exit criteria: ✅ ALL MET
- Every displayed bullet has citations (RPC wired to real service)
- Clicking bullet jumps to cited transcript segment and audio timestamp
Milestone 7 — Retention, deletion, telemetry (opt-in), packaging ⚠️ RETENTION COMPLETE
Goal: ship safely.
Deliverables:
- Retention job
  - RetentionService: src/noteflow/application/services/retention_service.py
  - Configurable retention days, dry-run support
  - Runs at startup and periodically
- Delete meeting (cryptographic delete)
  - MeetingService.delete_meeting() removes:
    - Database rows (meeting, segments, summary, annotations)
    - Encrypted audio file (audio.enc)
    - Wrapped DEK from manifest (renders audio unrecoverable)
    - Meeting directory
- Optional telemetry (content-free) ← GAP
- PyInstaller build ← GAP
- "Check for updates" flow ← GAP
- Release checklist & troubleshooting docs
What's implemented:
- Meeting deletion cascade is complete:
  - DB cascade: meeting → segments → summary → annotations
  - Filesystem: ~/.noteflow/meetings/<meeting-uuid>/ removed
  - Crypto: DEK deleted from manifest, audio unrecoverable
- Retention service is complete:
  - RetentionService.run_cleanup() with dry-run
  - Finds meetings older than the retention cutoff
  - Generates a RetentionReport with counts
  - Integration tests validate the cascade
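The retention pass can be sketched as a cutoff comparison with a dry-run switch. RetentionReport's fields, this run_cleanup signature and the delete_meeting callable are assumptions for illustration; in the real service the callable would correspond to the cascading MeetingService.delete_meeting().

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class RetentionReport:  # hypothetical shape; the real report may carry more counts
    candidates: list[str] = field(default_factory=list)
    deleted: int = 0
    dry_run: bool = True


def run_cleanup(
    meetings: dict[str, datetime],  # meeting_id -> ended_at (all naive or all tz-aware)
    retention_days: int,
    now: datetime,
    delete_meeting: Callable[[str], None],  # performs the cascading delete
    dry_run: bool = True,
) -> RetentionReport:
    cutoff = now - timedelta(days=retention_days)
    report = RetentionReport(dry_run=dry_run)
    for meeting_id, ended_at in sorted(meetings.items()):
        if ended_at < cutoff:
            report.candidates.append(meeting_id)
            if not dry_run:  # dry-run only reports what would be deleted
                delete_meeting(meeting_id)
                report.deleted += 1
    return report
```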
Remaining work:
- PyInstaller Packaging (new: build scripts)
  - Create distributable for macOS + Windows
  - Complexity: High (cross-platform, native deps)
- Code Signing
  - macOS notarization, Windows signing
  - Complexity: Medium
- Update Check Flow
  - Version display + "Check for Updates" → release page
  - Complexity: Low
- Telemetry (Opt-in)
  - Content-free metrics: crash stacktrace, latency, feature flags
  - Complexity: Medium
Exit criteria:
- A signed installer that installs and runs on both OSs
- Deleting a meeting removes DB rows + assets; audio cannot be decrypted after key deletion
Milestone 8 (Optional pre‑release) — Post-meeting anonymous diarization ✅ COMPLETE
Goal: "Speaker A/B/C" best-effort labeling.
Deliverables:
- Diarization engine with streaming + offline modes
  - Location: src/noteflow/infrastructure/diarization/engine.py (315 lines)
  - Streaming: diart library for real-time processing
  - Offline: pyannote.audio for post-meeting refinement
  - Device support: auto, cpu, cuda, mps
- Speaker assignment logic
  - Location: src/noteflow/infrastructure/diarization/assigner.py
  - assign_speaker() maps time ranges via maximum overlap
  - assign_speakers_batch() for bulk assignment
  - Confidence scoring based on overlap duration
- Data transfer objects
  - Location: src/noteflow/infrastructure/diarization/dto.py
  - SpeakerTurn with validation and overlap methods
- Domain entity updates
  - Segment.speaker_id: str | None and speaker_confidence: float
- Proto/gRPC definitions
  - FinalSegment.speaker_id and speaker_confidence fields
  - ServerInfo.diarization_enabled and diarization_ready flags
  - RefineSpeakerDiarization and RenameSpeaker RPCs
- gRPC refinement RPC
  - refine_speaker_diarization() in service.py for post-meeting processing
  - rename_speaker() for user-friendly speaker labels
- Configuration/settings
  - diarization_enabled, diarization_hf_token, diarization_device
  - diarization_streaming_latency, diarization_min/max_speakers
- Dependencies added
  - Optional extra [diarization]: pyannote.audio, diart, torch
- UI display
  - Speaker labels with color coding in transcript.py
  - "Analyze Speakers" and "Rename Speakers" buttons in meeting_library.py
- Server initialization
  - DiarizationEngine wired in server.py with CLI flags --diarization, --diarization-hf-token, --diarization-device
- Client integration
  - refine_speaker_diarization() and rename_speaker() methods in client.py
  - DiarizationResult and RenameSpeakerResult DTOs
- Tests
  - 24 unit tests in tests/infrastructure/test_diarization.py
  - Covers SpeakerTurn, assign_speaker(), assign_speakers_batch()
Deferred (optional future enhancement):
- Streaming integration: real-time speaker labels during recording
  - Feed audio chunks to diarization during StreamTranscription
  - Emit speaker changes in real time
  - Complexity: High (requires significant latency tuning)
Exit criteria: ✅ ALL MET
- If diarization fails, the app degrades gracefully to "Unknown" speaker labels
- Post-meeting diarization refinement works end-to-end
- (Optional) Streaming diarization shows live speaker labels — deferred
2) Proposed Repository Layout
This layout is designed to:
- separate server and client concerns,
- isolate platform-specific code,
- keep modules < 500 LoC,
- make DI clean,
- keep writing to disk centralized.
noteflow/
├─ pyproject.toml
├─ src/noteflow/
│ ├─ core/
│ │ ├─ config.py # Settings (Pydantic) + load/save
│ │ ├─ logging.py # structlog config, redaction helpers
│ │ ├─ types.py # common NewTypes / Protocols
│ │ └─ errors.py # domain error types
│ │
│ ├─ grpc/ # gRPC server components
│ │ ├─ proto/
│ │ │ ├─ noteflow.proto # Service definitions
│ │ │ ├─ noteflow_pb2.py # Generated protobuf
│ │ │ └─ noteflow_pb2_grpc.py
│ │ ├─ server.py # Server entry point
│ │ ├─ service.py # NoteFlowServicer implementation
│ │ ├─ meeting_store.py # In-memory meeting management
│ │ └─ client.py # gRPC client wrapper
│ │
│ ├─ client/ # GUI client application
│ │ ├─ app.py # Flet app entry point
│ │ ├─ state.py # App state store
│ │ └─ components/
│ │ ├─ transcript.py
│ │ ├─ vu_meter.py
│ │ └─ summary_panel.py
│ │
│ ├─ audio/ # Audio capture (client-side)
│ │ ├─ capture.py # sounddevice InputStream wrapper
│ │ ├─ levels.py # RMS/VU meter computation
│ │ ├─ ring_buffer.py # timestamped audio buffer
│ │ └─ playback.py # audio playback synced to timestamp
│ │
│ ├─ asr/ # ASR engine (server-side)
│ │ ├─ engine.py # faster-whisper wrapper + model cache
│ │ ├─ segmenter.py # partial/final logic, silence boundaries
│ │ └─ dto.py # ASR outputs (words optional)
│ │
│ ├─ data/ # Persistence (server-side)
│ │ ├─ db.py # LanceDB connection + table handles
│ │ ├─ schema.py # table schemas + version
│ │ └─ repos/
│ │ ├─ meetings.py
│ │ ├─ segments.py
│ │ └─ summaries.py
│ │
│ ├─ platform/ # Platform-specific (client-side)
│ │ ├─ tray/ # tray/menubar (pystray)
│ │ ├─ hotkeys/ # global hotkeys (pynput)
│ │ └─ notifications/ # toast notifications
│ │
│ └─ summarization/ # Summary generation (server-side)
│ ├─ providers/
│ │ ├─ base.py
│ │ └─ cloud.py
│ ├─ prompts.py
│ └─ verifier.py
│
├─ spikes/ # De-risking spikes (M0)
│ ├─ spike_01_ui_tray_hotkeys/
│ ├─ spike_02_audio_capture/
│ ├─ spike_03_asr_latency/
│ └─ spike_04_encryption/
│
└─ tests/
├─ unit/
├─ integration/
└─ e2e/
3) Core Runtime Design
3.1 State Machine (Meeting Lifecycle)
Define explicitly so UI + services remain consistent.
IDLE
├─ start(manual/trigger) → RECORDING
└─ prompt(trigger) → PROMPTED
PROMPTED
├─ accept → RECORDING
└─ dismiss/snooze → IDLE
RECORDING
├─ stop → STOPPING
├─ error(audio) → ERROR (with recovery attempt)
└─ crash → RECOVERABLE_INCOMPLETE on restart
STOPPING
├─ flush assets/segments → REVIEW_READY
└─ failure → REVIEW_READY (marked incomplete)
REVIEW_READY
├─ summarize → REVIEW_READY (summary updated)
└─ delete → IDLE
Invariant: segments are only “final” when persisted. Partial text is never persisted.
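The state diagram can be encoded as a transition table so UI and services share one source of truth. A minimal sketch with hypothetical event names; RECOVERABLE_INCOMPLETE is omitted because it is assigned at restart rather than reached through an event.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PROMPTED = auto()
    RECORDING = auto()
    STOPPING = auto()
    REVIEW_READY = auto()
    ERROR = auto()


# (state, event) -> next state, one entry per arrow in the diagram.
TRANSITIONS: dict[tuple[State, str], State] = {
    (State.IDLE, "start"): State.RECORDING,
    (State.IDLE, "prompt"): State.PROMPTED,
    (State.PROMPTED, "accept"): State.RECORDING,
    (State.PROMPTED, "dismiss"): State.IDLE,
    (State.PROMPTED, "snooze"): State.IDLE,
    (State.RECORDING, "stop"): State.STOPPING,
    (State.RECORDING, "audio_error"): State.ERROR,
    (State.STOPPING, "flushed"): State.REVIEW_READY,
    (State.STOPPING, "flush_failed"): State.REVIEW_READY,  # meeting marked incomplete
    (State.REVIEW_READY, "summarize"): State.REVIEW_READY,
    (State.REVIEW_READY, "delete"): State.IDLE,
}


def transition(state: State, event: str) -> State:
    """Return the next state, rejecting events the diagram does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal event {event!r} in state {state.name}") from None
```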
3.2 Threading + Queue Model (Client-Server)
Server Threads:
- gRPC thread pool: handles incoming RPC requests
- ASR worker thread: processes audio buffers through faster-whisper
- IO worker thread: only place that writes DB + manifest updates
- Background jobs: summarization, diarization, retention
Client Threads:
- Main/UI thread: Flet rendering + user actions
- Audio callback thread: receives frames, does minimal work:
  - compute lightweight RMS for VU meter
  - enqueue frames to gRPC stream queue
- gRPC stream thread: sends audio chunks, receives transcript updates
- Event dispatch: updates UI from transcript callbacks
Rules:
- Anything blocking > 5ms does not run in the audio callback
- Only the server's IO worker writes to the database
4) Dependency Injection and Service Wiring
Use a small container (manual DI) rather than a framework.
# core/types.py
from datetime import datetime
from typing import Protocol
class Clock(Protocol):
    def monotonic(self) -> float: ...
    def now(self) -> datetime: ...
class Notifier(Protocol):
    def prompt_recording(self, title: str, body: str) -> None: ...
    def toast(self, title: str, body: str) -> None: ...
class ForegroundAppProvider(Protocol):
    def current_app(self) -> str | None: ...
class KeyStore(Protocol):
    def get_or_create_master_key(self) -> bytes: ...
# app.py (wiring idea)
def build_container() -> AppContainer:
    settings = load_settings()
    logger = configure_logging(settings)
    keystore = build_keystore()
    crypt = CryptoBox(keystore)
    db = LanceDatabase(settings.paths.db_dir)
    repos = Repositories(db)
    jobs = JobQueue(...)
    audio = AudioCapture(...)
    asr = AsrEngine(...)
    meeting = MeetingService(...)
    triggers = TriggerService(...)
    ui = UiController(...)
    return AppContainer(...)
5) Detailed Subsystem Plans
5.1 Audio Capture + Assets
AudioCapture
Responsibilities:
- open/close stream
- handle device change / reconnect
- feed ring buffer
- expose current level for VU meter
Key APIs:
from collections.abc import Callable
import numpy as np
class AudioCapture:
    def start(self, on_frames: Callable[[np.ndarray, float], None]) -> None: ...
    def stop(self) -> None: ...
    def current_device(self) -> AudioDeviceInfo: ...
RingBuffer (timestamped)
- store (timestamp, frames) so segment times are stable even if the UI thread lags
- provide a "last N seconds" view for the ASR worker
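A minimal timestamped ring buffer, assuming mono float32 chunks and eviction at whole-chunk granularity (so both the capacity and the "last N seconds" window are approximate). TimestampedRingBuffer and its methods are hypothetical names.

```python
from collections import deque

import numpy as np


class TimestampedRingBuffer:
    """Keep roughly the last `max_seconds` of audio as (timestamp, frames) pairs."""

    def __init__(self, max_seconds: float, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        self._limit = int(max_seconds * sample_rate)
        self._chunks: deque[tuple[float, np.ndarray]] = deque()
        self._samples = 0

    def push(self, timestamp: float, frames: np.ndarray) -> None:
        self._chunks.append((timestamp, frames))
        self._samples += len(frames)
        # Evict whole chunks from the front once over capacity (always keep one).
        while self._samples > self._limit and len(self._chunks) > 1:
            _, old = self._chunks.popleft()
            self._samples -= len(old)

    def last(self, seconds: float) -> tuple[float, np.ndarray]:
        """Return (start_timestamp, samples) covering at least the last `seconds` if buffered."""
        wanted = int(seconds * self.sample_rate)
        picked: list[tuple[float, np.ndarray]] = []
        total = 0
        for ts, frames in reversed(self._chunks):
            picked.append((ts, frames))
            total += len(frames)
            if total >= wanted:
                break
        if not picked:
            return 0.0, np.zeros(0, dtype=np.float32)
        picked.reverse()  # back to chronological order
        return picked[0][0], np.concatenate([f for _, f in picked])
```

Because the first chunk's capture timestamp travels with the samples, the ASR worker can compute segment offsets without consulting the (possibly lagging) UI clock.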
VAD
Define an interface so you can swap implementations (webrtcvad vs silero) without rewriting the pipeline.
from typing import Protocol
class Vad(Protocol):
    def is_speech(self, pcm16: bytes, sample_rate: int) -> bool: ...
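One possible implementation of the interface is a plain energy threshold over little-endian PCM16. EnergyVadSketch and the -40 dBFS default are assumptions for illustration, not the project's EnergyVad; webrtcvad or silero would slot in behind the same is_speech() signature.

```python
import math

import numpy as np


class EnergyVadSketch:
    """Energy-threshold VAD satisfying the Vad interface (illustrative only)."""

    def __init__(self, threshold_dbfs: float = -40.0) -> None:
        self.threshold_dbfs = threshold_dbfs

    def is_speech(self, pcm16: bytes, sample_rate: int) -> bool:
        # sample_rate is unused by a pure energy detector; kept for interface parity.
        samples = np.frombuffer(pcm16, dtype="<i2").astype(np.float64) / 32768.0
        if samples.size == 0:
            return False
        rms = float(np.sqrt(np.mean(samples * samples)))
        dbfs = 20.0 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)
        return dbfs >= self.threshold_dbfs
```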
Encrypted Audio Container (streaming)
Implementation approach (V1-safe): encrypted chunk format (AES-GCM) storing PCM16 frames. Optional: later add “compress after meeting” job (Opus) once stable.
Writer contract:
- write header once
- write chunks frequently (every ~200–500ms)
- flush frequently (crash-safe)
Deletion contract:
- delete per-meeting DEK record first (crypto delete)
- delete meeting folder
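The writer contract implies a simple framing: a header, then length-prefixed chunks flushed as they are written. The sketch shows only that framing. Encryption is injected as a seal/open callable (in practice, something like AES-GCM under the per-meeting DEK with a per-chunk nonce), and MAGIC, VERSION and the byte layout are hypothetical, not the NFAE format.

```python
import io
import struct
from collections.abc import Callable
from typing import BinaryIO

MAGIC = b"DEMO"  # hypothetical 4-byte magic; not the real NFAE header
VERSION = 1

# (chunk_index, data) -> transformed bytes; the index lets a real cipher derive nonces.
Transform = Callable[[int, bytes], bytes]


class ChunkedWriter:
    """Header once, then [u32 little-endian length][sealed bytes] per chunk."""

    def __init__(self, fh: BinaryIO, seal: Transform) -> None:
        self._fh = fh
        self._seal = seal
        self._index = 0
        fh.write(MAGIC + struct.pack("<H", VERSION))
        fh.flush()

    def write_chunk(self, pcm16: bytes) -> None:
        sealed = self._seal(self._index, pcm16)
        self._fh.write(struct.pack("<I", len(sealed)) + sealed)
        self._fh.flush()  # hand to the OS after every chunk (add os.fsync for power loss)
        self._index += 1


def read_chunks(fh: BinaryIO, open_: Transform) -> list[bytes]:
    """Read chunks back, stopping cleanly at a torn tail left by a crash."""
    if fh.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a chunked audio file")
    (version,) = struct.unpack("<H", fh.read(2))
    if version != VERSION:
        raise ValueError(f"unsupported version {version}")
    chunks: list[bytes] = []
    while len(prefix := fh.read(4)) == 4:
        (length,) = struct.unpack("<I", prefix)
        body = fh.read(length)
        if len(body) < length:
            break  # torn final chunk
        chunks.append(open_(len(chunks), body))
    return chunks


if __name__ == "__main__":
    # Round-trip with an identity transform (no encryption) to show the framing only.
    buf = io.BytesIO()
    writer = ChunkedWriter(buf, lambda i, b: b)
    writer.write_chunk(b"\x00\x01" * 4)
    buf.seek(0)
    print(read_chunks(buf, lambda i, b: b))
```

With this layout, a crash mid-write costs at most the chunk being written, and the reader recovers everything before it.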
5.2 ASR and Segment Finalization
ASR Engine Wrapper (faster-whisper)
Responsibilities:
- model download/cache
- run inference
- return tokens/segments with timestamps (word timestamps optional)
class AsrEngine:
    def transcribe(self, audio_f32_16k: np.ndarray) -> AsrResult: ...
Segmenter (partial/final)
Responsibilities:
- build current “active utterance” from VAD-speech frames
- run partial inference every N seconds
- finalize when silence boundary detected
Data contract:
- PartialUpdate: {text, start_offset, end_offset, stable=False}
- FinalSegment: {segment_id, text, start_offset, end_offset, stable=True}
Important: final segments get their IDs at commit time (IO worker), not earlier.
5.3 Persistence (LanceDB + repositories)
DB access policy
- One DB connection managed centrally
- IO worker serializes all writes
Repositories:
- MeetingsRepo: create/update meeting status, store DEK metadata reference
- SegmentsRepo: append segments, query by meeting, basic search
- AnnotationsRepo: add/list annotations
- SummariesRepo: store summary + verification report
Also store:
- schema version
- app version
- migration logic (even if minimal)
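The single-writer policy can be sketched with the standard library alone: a ThreadPoolExecutor with max_workers=1 is a FIFO queue drained by exactly one thread. IoWorker is a hypothetical name; each submitted callable would perform one repository write.

```python
from collections.abc import Callable
from concurrent.futures import Future, ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")


class IoWorker:
    """Funnel every write through one thread so writes never interleave."""

    def __init__(self) -> None:
        # max_workers=1: submissions run one at a time, in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="io-writer")

    def submit(self, write: Callable[[], T]) -> Future[T]:
        """Queue a write; callers may wait on the Future for its result or error."""
        return self._executor.submit(write)

    def close(self) -> None:
        self._executor.shutdown(wait=True)  # drain pending writes before exit
```

Returning a Future lets the segmenter fire and forget, while a caller that needs the committed segment ID can call result().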
5.4 MeetingService (Orchestration)
Responsibilities:
- create meeting directory + metadata
- start/stop audio capture
- start/stop ASR segmenter
- handle UI events (annotation hotkeys, stop, etc.)
- coordinate with TriggerService
- ensure crash-safe flush and marking incomplete
Key public API:
class MeetingService:
    def start(self, source: TriggerSource) -> MeetingID: ...
    def stop(self) -> None: ...
    def add_annotation(self, annotation_type: AnnotationType, text: str | None = None) -> None: ...
    def current_meeting_id(self) -> MeetingID | None: ...
5.5 TriggerService (Confidence Model + throttling)
Inputs (each independently optional):
- calendar (optional connector)
- foreground app provider
- audio activity provider
Outputs:
- prompt notification
- optional auto-start (if user enabled)
- snooze & suppression state
Policies:
- rate limit prompts (e.g., max 1 prompt / 10 min)
- cooldown after dismiss
- per-app suppression config
Implementation detail:
- TriggerService publishes events via signals:
  - trigger_prompted
  - trigger_snoozed
  - trigger_accepted
5.6 Summarization Service (Extract → Synthesize → Verify)
Provider interface:
class SummarizerProvider(Protocol):
    def extract(self, transcript: str) -> ExtractionResult: ...
    def synthesize(self, extraction: ExtractionResult) -> DraftSummary: ...
Verifier:
- parse bullets
- ensure each displayed bullet contains [...] with at least one segment ID
- uncited bullets go into uncited_points and are hidden by default
UI behavior:
- Summary panel shows “X uncited drafts hidden” toggle
- Clicking bullet scrolls transcript and seeks audio
Testing requirement:
- Summary verifier must be unit-tested with adversarial outputs (missing brackets, invalid IDs, empty citations).
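A sketch of the verifier, assuming citations look like [3, 7] with integer segment IDs; the real CitationVerifier's citation syntax and result type may differ, and verify_bullets and VerifiedSummary are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Assumed citation syntax: a bracketed, comma-separated list of integer segment IDs.
CITATION = re.compile(r"\[([^\[\]]*)\]")


@dataclass
class VerifiedSummary:
    cited_points: list[tuple[str, list[int]]] = field(default_factory=list)
    uncited_points: list[str] = field(default_factory=list)


def verify_bullets(bullets: list[str], valid_ids: set[int]) -> VerifiedSummary:
    result = VerifiedSummary()
    for bullet in bullets:
        ids: list[int] = []
        for group in CITATION.findall(bullet):
            for token in group.split(","):
                token = token.strip()
                # Keep only well-formed IDs that exist in this meeting's transcript.
                if token.isdigit() and int(token) in valid_ids:
                    ids.append(int(token))
        text = CITATION.sub("", bullet).strip()
        if ids:
            result.cited_points.append((text, sorted(set(ids))))
        else:
            result.uncited_points.append(text)  # hidden by default in the UI
    return result
```

Fed the adversarial cases named above (a bullet with no brackets, an unknown ID such as [99], or an empty []), the function routes the bullet to uncited_points; a mixed citation like [x, 4] keeps only the valid ID 4.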
5.7 UI Implementation Approach (Flet)
State management
Treat UI as a thin layer over a single state store:
- AppState
  - current meeting status
  - live transcript partial
  - list of finalized segments
  - playback state
  - summary state
  - settings state
  - prompt/snooze state
Changes flow:
- Services emit signals (blinker)
- UI controller converts signal payload → state update → re-render
This avoids UI code reaching into services and creating race conditions.
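The signal → state update → re-render flow can be sketched as a pure function over an immutable state. AppState here is trimmed to three of the fields listed above, and the event names and payload keys are hypothetical; a blinker receiver would call apply_event(), store the result, and then ask Flet to redraw (for example via page.update()).

```python
from collections.abc import Mapping
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class AppState:  # trimmed to three of the fields listed above
    meeting_status: str = "idle"
    live_partial: str = ""
    final_segments: tuple[str, ...] = ()


def apply_event(state: AppState, event: str, payload: Mapping[str, Any]) -> AppState:
    """Pure transition: the UI re-renders from the returned state only."""
    if event == "transcript_received":
        if payload["is_final"]:
            # A final segment clears the [LIVE] partial row.
            return replace(
                state,
                live_partial="",
                final_segments=state.final_segments + (payload["text"],),
            )
        return replace(state, live_partial=payload["text"])
    if event == "meeting_status_changed":
        return replace(state, meeting_status=payload["status"])
    return state  # unknown events leave state untouched
```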
6) Testing Plan (Practical and CI-friendly)
Unit tests (fast)
- Trigger scoring + thresholds
- Summarization verifier
- Segment model validation (end >= start)
- Retention policy logic
- Encryption chunk read/write roundtrip
Integration tests
- DB CRUD roundtrip for each repo
- Meeting create → segments append → summary store
- Delete meeting removes all rows and assets
E2E tests (required)
Audio injection harness
- Feed prerecorded WAV into the AudioCapture abstraction (mock capture)
- Run through VAD + ASR pipeline
- Assert:
  - segments are produced
  - partial updates happen
  - final segments persist
  - seeking works (timestamp consistency)
Note: CI should never require a live microphone.
7) Release Checklist (V1)
- Recording indicator always visible when capturing
- Permission errors show actionable instructions
- Crash recovery works for incomplete meetings
- Summary bullets displayed are always cited
- Delete meeting removes keys + assets + DB rows
- Telemetry default off; no content ever logged
- Build artifacts install/run on macOS + Windows
8) "First Implementation Targets" (what to build first)
Build server-side first, then client, to ensure reliable foundation:
Server (build first):
1. gRPC service skeleton: proto definitions + basic server startup
2. Meeting store: in-memory meeting lifecycle management
3. ASR integration: faster-whisper wrapper with streaming output
4. Bidirectional streaming: audio in, transcripts out
5. Persistence: LanceDB storage for meetings/segments
6. Summarization: evidence-linked summary generation
Client (build second):
7. gRPC client wrapper: connection management + streaming
8. Audio capture: sounddevice integration + VU meter
9. Live UI: Flet app with transcript display
10. Tray + hotkeys: pystray/pynput integration
11. Review view: playback synced to transcript
12. Packaging: PyInstaller for both server and client
This ordering ensures the server is stable before building client features on top.
9) Minimal API Skeletons (so devs can start coding)
gRPC Service Definition (proto)
service NoteFlowService {
  // Bidirectional streaming: audio → transcripts
  rpc StreamTranscription(stream AudioChunk) returns (stream TranscriptUpdate);
  // Meeting lifecycle
  rpc CreateMeeting(CreateMeetingRequest) returns (Meeting);
  rpc StopMeeting(StopMeetingRequest) returns (Meeting);
  rpc ListMeetings(ListMeetingsRequest) returns (ListMeetingsResponse);
  rpc GetMeeting(GetMeetingRequest) returns (Meeting);
  // Summary generation
  rpc GenerateSummary(GenerateSummaryRequest) returns (Summary);
  // Server health
  rpc GetServerInfo(ServerInfoRequest) returns (ServerInfo);
}
Client Callback Types
# Client receives these from server via gRPC stream
from collections.abc import Callable
from dataclasses import dataclass
@dataclass
class TranscriptSegment:
    segment_id: int
    text: str
    start_time: float
    end_time: float
    language: str
    is_final: bool
# Callback signatures
TranscriptCallback = Callable[[TranscriptSegment], None]
ConnectionCallback = Callable[[bool, str], None]  # connected, message
Client-Side Signals (UI updates)
# client/signals.py - for UI thread dispatch
from blinker import signal
audio_level_updated = signal("audio_level_updated") # rms: float
transcript_received = signal("transcript_received") # TranscriptSegment
connection_changed = signal("connection_changed") # connected: bool, message: str
And a minimal “job queue” contract:
from typing import Protocol
class Job(Protocol):
    id: str
    def run(self) -> None: ...
class JobQueue:
    def submit(self, job: Job) -> None: ...
    def cancel(self, job_id: str) -> None: ...
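One way to satisfy the JobQueue contract with a single background thread. ThreadJobQueue and shutdown() are hypothetical additions; cancel() here only prevents jobs that have not started yet, since a running Python thread cannot be interrupted safely.

```python
import logging
import queue
import threading
from typing import Protocol

log = logging.getLogger(__name__)


class Job(Protocol):
    id: str

    def run(self) -> None: ...


class ThreadJobQueue:
    """One background worker; cancel() skips jobs that have not started."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Job | None]" = queue.Queue()
        self._cancelled: set[str] = set()
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._loop, name="jobs", daemon=True)
        self._worker.start()

    def submit(self, job: Job) -> None:
        self._queue.put(job)

    def cancel(self, job_id: str) -> None:
        with self._lock:
            self._cancelled.add(job_id)

    def shutdown(self) -> None:
        self._queue.put(None)  # sentinel: finish queued jobs, then exit
        self._worker.join()

    def _loop(self) -> None:
        while (job := self._queue.get()) is not None:
            with self._lock:
                skip = job.id in self._cancelled
                self._cancelled.discard(job.id)
            if skip:
                continue
            try:
                job.run()
            except Exception:  # a failing job must not kill the worker
                log.exception("job %s failed", job.id)
```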
10) Current Implementation Status
Summary by Milestone
| Milestone | Status | Completeness |
|---|---|---|
| M0 Spikes | ✅ Complete | 100% |
| M1 Repo Foundation | ✅ Complete | 100% |
| M2 Meeting Lifecycle | ✅ Complete | 100% |
| M3 Transcription | ✅ Complete | 100% |
| M4 Review UX | ✅ Complete | 100% |
| M5 Triggers | ⚠️ Partial | 70% (integrated via mixin, tray/hotkeys not) |
| M6 Summarization | ✅ Complete | 100% |
| M7 Packaging | ⚠️ Partial | 40% (retention done, packaging not) |
| M8 Diarization | ⚠️ Partial | 55% (infrastructure done, wiring not) |
Layer-by-Layer Status
Domain Layer ✅ 100%
- Meeting entity with state machine
- Segment entity with word-level timing
- Annotation entity (4 types)
- Summary entity with evidence links (KeyPoint, ActionItem)
- Repository ports (Protocol-based DI)
- Unit of Work port
- Trigger domain (TriggerSignal, TriggerDecision)
- Summarization ports
Application Layer ✅ 100%
- MeetingService: full CRUD + lifecycle
- SummarizationService: multi-provider, consent, verification
- TriggerService: scoring, rate limiting, snooze
- RetentionService: cleanup, dry-run
- ExportService: Markdown, HTML
- RecoveryService: crash recovery
Infrastructure Layer ✅ 98%
- Audio: capture, ring buffer, levels, playback, encrypted writer/reader
- ASR: faster-whisper engine, VAD, segmenter
- Persistence: SQLAlchemy + pgvector, Alembic migrations
- Security: AES-256-GCM, keyring keystore
- Summarization: Mock, Ollama, Cloud providers + citation verifier
- Export: Markdown, HTML formatters
- Triggers: signal providers wired via TriggerMixin
- Diarization: engine, assigner, DTOs (not wired to server)
gRPC Layer ✅ 100%
- Proto definitions with bidirectional streaming
- Server: StreamTranscription, CreateMeeting, StopMeeting, etc.
- Client wrapper with connection management
- Meeting store (in-memory + DB modes)
- GenerateSummary RPC wired to SummarizationService
- Partial transcript streaming (2-second cadence, deduplication)
Client Layer ✅ 85%
- Flet app with state management
- VU meter, recording timer, transcript
- Playback controls + sync controller
- Annotation toolbar + display
- Meeting library
- Summary panel with clickable citations
- Connection panel with auto-reconnect
- Trigger detection via TriggerMixin (AlertDialog prompts)
- System tray integration (spike validated, not integrated)
- Global hotkeys (spike validated, not integrated)
11) Remaining Work Summary
Medium Priority (Platform Features)
| # | Task | Files | Complexity | Blocker For |
|---|---|---|---|---|
| 1 | System Tray Integration: integrate pystray for minimize-to-tray, system notifications, recording indicator | New: src/noteflow/client/tray.py | Medium | M5 completion |
| 2 | Global Hotkeys: integrate pynput for start/stop/annotation hotkeys | New: src/noteflow/client/hotkeys.py | Medium | M5 completion |
Medium Priority (Diarization Wiring)
| # | Task | Files | Complexity | Blocker For |
|---|---|---|---|---|
| 3 | Diarization Application Service: orchestrate diarization workflow, model management | New: application/services/diarization_service.py | Medium | M8 completion |
| 4 | Diarization Server Wiring: initialize DiarizationEngine on startup when enabled | src/noteflow/grpc/server.py | Low | M8 completion |
| 5 | Diarization Tests: unit tests for engine, assigner, DTOs | New: tests/infrastructure/diarization/ | Medium | M8 stability |
Lower Priority (Shipping)
| # | Task | Files | Complexity | Blocker For |
|---|---|---|---|---|
| 6 | PyInstaller Packaging: create distributable for macOS + Windows | New: build scripts | High | M7 release |
| 7 | Code Signing: macOS notarization, Windows signing | Build config | Medium | M7 release |
| 8 | Update Check Flow: version display + "Check for Updates" link | New: src/noteflow/client/update.py | Low | M7 release |
| 9 | Telemetry (Opt-in): content-free metrics collection | New: telemetry module | Medium | M7 release |
Recommended Implementation Order
- System Tray + Hotkeys (Can be done in parallel, completes M5)
- Diarization Wiring (Server init + tests, completes M8 core)
- PyInstaller Packaging (Enables distribution)
- Remaining M7 items (Polish for release)
12) Architecture Reference
Key File Locations
| Component | Location |
|---|---|
| Domain Entities | src/noteflow/domain/entities/ |
| Repository Ports | src/noteflow/domain/ports/repositories.py |
| Application Services | src/noteflow/application/services/ |
| gRPC Server | src/noteflow/grpc/server.py, service.py |
| gRPC Client | src/noteflow/grpc/client.py |
| Audio Capture | src/noteflow/infrastructure/audio/ |
| ASR Engine | src/noteflow/infrastructure/asr/ |
| Persistence | src/noteflow/infrastructure/persistence/ |
| Security | src/noteflow/infrastructure/security/ |
| Summarization | src/noteflow/infrastructure/summarization/ |
| Client App | src/noteflow/client/app.py |
| UI Components | src/noteflow/client/components/ |
Data Flow
┌─────────────────────────────────────────────────────────────┐
│ CLIENT │
├─────────────────────────────────────────────────────────────┤
│ Audio Capture → VU Meter → gRPC Stream → UI Components │
│ ↓ ↑ │
│ sounddevice Transcript Updates │
└─────────────────────────────────────────────────────────────┘
│ gRPC
▼
┌─────────────────────────────────────────────────────────────┐
│ SERVER │
├─────────────────────────────────────────────────────────────┤
│ Audio Buffer → VAD → Segmenter → ASR Engine │
│ ↓ ↓ │
│ Encrypted Writer Final Segments │
│ ↓ ↓ │
│ audio.enc PostgreSQL + pgvector │
└─────────────────────────────────────────────────────────────┘
Meeting Lifecycle States
CREATED → RECORDING → STOPPING → STOPPED → COMPLETED
↓ ↓ ↓
ERROR ←──────┴───────────┘ (crash recovery)