SPRINT-GAP-011: Post-Processing Pipeline Gaps
| Attribute | Value |
|---|---|
| Sprint | GAP-011 |
| Size | L (Large) |
| Owner | TBD |
| Phase | Hardening |
| Prerequisites | GAP-003 (Error Handling), GAP-004 (Diarization Lifecycle) |
Executive Summary
After a meeting recording completes, the system does not automatically trigger post-processing workflows (summarization, entity extraction, diarization refinement). Users see only the streamed transcript, with no summary, extracted entities, or refined speaker labels. All the components exist, but nothing orchestrates them.
Open Issues
- Define post-processing trigger strategy (server-side vs client-side orchestration)
- Determine parallel vs sequential execution of post-processing steps
- Decide on failure handling for individual processing steps
- Define retry policy for failed processing
- Establish processing completion signals
Validation Status
| Component | Exists | Status |
|---|---|---|
| Streaming transcription | Yes | Working |
| GenerateSummary RPC | Yes | Manual trigger only |
| ExtractEntities RPC | Yes | Manual trigger only |
| RefineSpeakerDiarization RPC | Yes | Manual trigger only |
| Auto-trigger on meeting stop | No | Gap - needs implementation |
| Processing completion signals | No | Gap - needs implementation |
| Progress tracking UI | Partial | Summary has events, others missing |
| Client-side orchestration | No | Gap - needs implementation |
Objective
Implement automatic post-processing orchestration that triggers summarization, entity extraction, and diarization refinement after a meeting recording stops, with proper progress tracking, error handling, and completion signals.
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Orchestration location | Client-side | Server should remain stateless; client knows user preferences |
| Processing order | Parallel where possible | Summarization and NER can run concurrently; diarization can start immediately |
| Failure handling | Continue on failure | One failed step shouldn't block others |
| Retry policy | User-initiated | Auto-retry risks resource exhaustion |
| Completion tracking | Per-step status | Enable partial success states |
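The decisions above (parallel execution, continue on failure, per-step status) can be sketched as one function built on `Promise.allSettled`. This is a minimal illustration, not the planned hook: `StepName`, `StepResult`, `StepRunners`, and `runPostProcessing` are hypothetical names, and the runners stand in for whatever wraps the real RPC calls.

```typescript
// Hypothetical types and function names; none of these exist in the codebase.
type StepName = 'summary' | 'entities' | 'diarization';

interface StepResult {
  name: StepName;
  status: 'completed' | 'failed';
  error?: string;
}

type StepRunners = Record<StepName, () => Promise<unknown>>;

// Starts every step at once and waits for all of them to settle.
// A rejected step is recorded as 'failed' without cancelling the others.
async function runPostProcessing(runners: StepRunners): Promise<StepResult[]> {
  const names = Object.keys(runners) as StepName[];
  // The async wrapper also turns a synchronous throw into a rejection.
  const settled = await Promise.allSettled(
    names.map(async (name) => runners[name]()),
  );
  return settled.map((outcome, index): StepResult =>
    outcome.status === 'fulfilled'
      ? { name: names[index], status: 'completed' }
      : { name: names[index], status: 'failed', error: String(outcome.reason) },
  );
}
```

Because every step settles independently, the caller always gets one result per step, which is what makes partial success states representable.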
What Already Exists
Backend Infrastructure
Summarization Service (src/noteflow/grpc/_mixins/summarization.py)
- GenerateSummary RPC fully functional
- Returns cached summary if exists (unless force_regenerate=True)
- Fires SUMMARY_GENERATED webhook on completion
- Supports multiple providers (Cloud, Ollama, Mock)
Entity Extraction (src/noteflow/grpc/_mixins/entities.py)
- ExtractEntities RPC fully functional
- Feature flag gated (NOTEFLOW_FEATURE_NER_ENABLED)
- Returns cached entities if they exist (unless force_refresh=True)
- Uses spaCy NER engine
Diarization Refinement (src/noteflow/grpc/_mixins/diarization/_mixin.py)
- RefineSpeakerDiarization RPC launches background job
- GetDiarizationJobStatus for polling
- Persisted to database for recovery
Webhook Events (src/noteflow/domain/webhooks/events.py)
class WebhookEventType(str, Enum):
    RECORDING_STARTED = "recording.started"
    RECORDING_STOPPED = "recording.stopped"
    MEETING_COMPLETED = "meeting.completed"
    SUMMARY_GENERATED = "summary.generated"
    # Missing: ENTITIES_EXTRACTED, DIARIZATION_COMPLETED
Client Infrastructure
Tauri Adapter (client/src/api/tauri-adapter.ts)
// All RPCs available but require manual invocation
generateSummary(meetingId: string, forceRegenerate?: boolean): Promise<Summary>
extractEntities(meetingId: string, forceRefresh?: boolean): Promise<ExtractedEntity[]>
refineSpeakers(meetingId: string, numSpeakers?: number): Promise<{ jobId: string }>
getDiarizationJobStatus(jobId: string): Promise<DiarizationJobStatus>
Diarization Hook (client/src/hooks/use-diarization.ts)
- Polling pattern with exponential backoff
- Max duration aligned to server timeout
- Auto-recovery on mount
- Progress event handling
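As an illustration of the first two points, the sketch below computes a polling schedule with exponential backoff that stops once an overall time budget is used up. It is not the hook's actual code: `BackoffOptions`, `pollSchedule`, and all numeric values are assumptions chosen for the example.

```typescript
// Hypothetical helper; the real hook's constants and names are not shown in this document.
interface BackoffOptions {
  initialMs: number;  // delay before the first poll
  factor: number;     // multiplier applied after each poll
  maxDelayMs: number; // cap for a single delay
  maxTotalMs: number; // overall budget, e.g. the server-side job timeout
}

// Returns the delays to wait between polls. Stops as soon as the next delay
// would push the total waiting time past the budget.
function pollSchedule(opts: BackoffOptions): number[] {
  if (opts.initialMs <= 0 || opts.factor < 1) {
    throw new RangeError('initialMs must be positive and factor must be >= 1');
  }
  const delays: number[] = [];
  let delay = opts.initialMs;
  let total = 0;
  while (total + delay <= opts.maxTotalMs) {
    delays.push(delay);
    total += delay;
    delay = Math.min(delay * opts.factor, opts.maxDelayMs);
  }
  return delays;
}

// Example: 1s, 2s, 4s, 4s fits into a 12s budget.
const schedule = pollSchedule({
  initialMs: 1000,
  factor: 2,
  maxDelayMs: 4000,
  maxTotalMs: 12000,
});
```

Tying `maxTotalMs` to the server timeout keeps the client from polling a job the server has already abandoned.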
Entity Extraction Hook (client/src/hooks/use-entity-extraction.ts)
// Auto-extract feature EXISTS but is DISABLED
useEffect(() => {
if (autoExtract && meetingId && meetingState === 'completed') {
extract(false); // This would work!
}
}, [autoExtract, meetingId, meetingState, extract]);
Summary Progress Events (client/src-tauri/src/commands/summary.rs)
// The Tauri backend (Rust side of the client) emits progress events every second
emit_summary_progress(app.clone(), meeting_id.clone(), elapsed_s);
// But no React component listens for these
Identified Issues
Gap 1: No Automatic Post-Meeting Workflows (Critical)
Severity: Critical Impact: Core feature missing - users get no processed output
Location: src/noteflow/grpc/_mixins/meeting.py:246-285
async def StopMeeting(self, request, context):
    # ... state transition logic ...
    # Webhooks fire, but no processing triggers
    await self._fire_webhooks_safely(
        lambda: self._webhook_service.trigger_recording_stopped(...)
    )
    await self._fire_webhooks_safely(
        lambda: self._webhook_service.trigger_meeting_completed(...)
    )
    # GAP: No call to:
    # - GenerateSummary
    # - ExtractEntities
    # - RefineSpeakerDiarization
    return response
Problem: After StopMeeting() completes:
- Webhooks fire for external integrations
- But no internal processing is triggered
- All processing requires explicit client RPC calls
Gap 2: Client Immediately Navigates Away (Critical)
Severity: Critical Impact: No opportunity for client-side orchestration
Location: client/src/pages/Recording.tsx:313-346
const stopRecording = async () => {
setIsRecording(false);
streamRef.current?.close();
const stoppedMeeting = await api.stopMeeting(meeting.id);
setMeeting(stoppedMeeting);
// GAP: Immediately navigates away - no processing triggered
navigate(
projectId
? `/projects/${projectId}/meetings/${meeting.id}`
: '/projects'
);
};
Problem:
- Stop button triggers navigation immediately
- No post-processing orchestration before navigation
- User lands on detail page with empty summary/entities
Gap 3: MeetingDetail Only Fetches, Doesn't Process (Critical)
Severity: Critical Impact: Viewing a meeting doesn't trigger missing processing
Location: client/src/pages/MeetingDetail.tsx:79-98
useEffect(() => {
const loadMeeting = async () => {
const data = await getAPI().getMeeting({
meeting_id: id,
include_segments: true,
include_summary: true, // Fetches existing, doesn't generate
});
setMeeting(data.meeting);
setSegments(data.segments || []);
setSummary(data.summary); // null if not generated
};
loadMeeting();
}, [id]);
Problem:
- include_summary: true only includes an existing summary
- Does not trigger generation if missing
- User must manually click "Generate Summary" button
Gap 4: Auto-Extract Feature Disabled (Medium)
Severity: Medium Impact: Working feature not utilized
Location: client/src/hooks/use-entity-extraction.ts:116-121
// Feature exists but is never enabled
useEffect(() => {
if (autoExtract && meetingId && meetingState === 'completed') {
extract(false);
}
}, [autoExtract, meetingId, meetingState, extract]);
Location: client/src/pages/MeetingDetail.tsx:72-76
const { extract: extractEntities } = useEntityExtraction({
meetingId: id,
meetingTitle: meeting?.title,
meetingState: meeting?.state,
// autoExtract: true <-- MISSING
});
Problem:
- Hook has auto-extract capability
- Never enabled in consuming components
- Simple fix: pass autoExtract: true
Gap 5: No Processing Completion Signals (High)
Severity: High Impact: Client cannot track what processing is done
Location: src/noteflow/domain/entities/meeting.py
class Meeting:
    id: MeetingId
    title: str
    state: MeetingState
    # ... other fields ...
    # GAP: No processing status fields
    # Missing:
    # - transcription_complete: bool
    # - summary_generated: bool
    # - entities_extracted: bool
    # - diarization_refined: bool
Problem:
- Meeting entity has no processing status tracking
- Client must query each subsystem separately
- No way to show "processing complete" badge
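To show what a single status object would buy the client, here is a small sketch that derives the outstanding steps and a "processing complete" flag from one value instead of three separate queries. The field names follow three of the missing flags listed above; the `ProcessingFlags` interface and both helper functions are hypothetical, and the real wire format may differ.

```typescript
// Hypothetical client-side shape; field names follow the missing flags listed above.
interface ProcessingFlags {
  summary_generated: boolean;
  entities_extracted: boolean;
  diarization_refined: boolean;
}

type PendingStep = 'summary' | 'entities' | 'diarization';

// Lists the steps that still need to run for this meeting.
function pendingSteps(flags: ProcessingFlags): PendingStep[] {
  const pending: PendingStep[] = [];
  if (!flags.summary_generated) pending.push('summary');
  if (!flags.entities_extracted) pending.push('entities');
  if (!flags.diarization_refined) pending.push('diarization');
  return pending;
}

// True when nothing is left to do, i.e. the "processing complete" badge can show.
function isProcessingComplete(flags: ProcessingFlags): boolean {
  return pendingSteps(flags).length === 0;
}
```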
Gap 6: Summary Progress Events Ignored (Medium)
Severity: Medium Impact: No progress UI during summarization
Location: client/src-tauri/src/commands/summary.rs:96-134
tauri::async_runtime::spawn(async move {
let mut interval = tokio::time::interval(Duration::from_secs(1));
loop {
interval.tick().await;
let elapsed_s = start.elapsed().as_secs();
// Emits event but nothing listens
emit_summary_progress(app.clone(), meeting_id.clone(), elapsed_s);
if elapsed_s >= 300 { break; }
}
});
Problem:
- The Tauri backend (Rust side of the client) emits summary_progress events every second
- No React component subscribes to these events
- User sees no progress during potentially long summarization
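A listener for these events would need to ignore events for other meetings and turn the elapsed seconds into something displayable. The sketch below covers only that pure logic, without the Tauri subscription itself. The payload field names (`meetingId`, `elapsedSeconds`) are assumptions, as are the function names.

```typescript
// Assumed payload shape: the event name comes from this document,
// but the field names are a guess.
interface SummaryProgressEvent {
  meetingId: string;
  elapsedSeconds: number;
}

// Keeps the latest elapsed time for one meeting and ignores events that
// belong to a different meeting or arrive out of order.
function nextElapsed(
  current: number,
  event: SummaryProgressEvent,
  meetingId: string,
): number {
  return event.meetingId === meetingId
    ? Math.max(current, event.elapsedSeconds)
    : current;
}

// Formats whole seconds as m:ss for display.
function formatElapsed(totalSeconds: number): string {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${minutes}:${String(seconds).padStart(2, '0')}`;
}
```

Since summarization reports elapsed time rather than a percentage, an elapsed-time label is a more honest indicator than a progress bar.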
Gap 7: No Diarization Auto-Trigger (Medium)
Severity: Medium Impact: Speaker labels never refined automatically
Location: client/src/hooks/use-diarization.ts
// Hook provides start() function but nothing calls it automatically
export function useDiarization(options: UseDiarizationOptions = {}) {
const start = useCallback(async (meetingId: string, numSpeakers?: number) => {
// ... implementation
}, [api]);
// No auto-start logic exists
return { state, start, cancel, poll, recover };
}
Location: client/src/pages/MeetingDetail.tsx
- No call to start(meetingId) on mount or meeting completion
- Diarization button exists but requires manual click
Problem:
- Diarization hook is well-designed with polling
- But never triggered automatically after recording
- User must manually click "Refine Speakers" button
Gap 8: Silent ASR Error Handling (Medium)
Severity: Medium Impact: Transcription failures invisible to user
Location: src/noteflow/grpc/_mixins/streaming/_asr.py:45-60
async def process_audio_segment(
    host: "ServicerHost",
    audio: np.ndarray,
    # ...
) -> AsyncIterator[TranscriptUpdate]:
    if audio.size == 0:
        return  # Silent return - no logging
    if not host.asr_engine:
        return  # Silent return - user not informed
    # ... processing
Problem:
- Empty audio segments silently ignored
- ASR engine unavailability silently skipped
- No user notification of processing failures
- Only visible in server logs
Architecture
Current State (Broken)
Recording.tsx Server MeetingDetail.tsx
│ │ │
│ stopRecording() │ │
├────────────────────────────>│ │
│ │ StopMeeting() │
│ │ ├─ State → STOPPED │
│ │ ├─ Webhooks fired │
│ │ └─ [NO PROCESSING] │
│<────────────────────────────┤ │
│ │ │
│ navigate(/meeting/id) ──────┼──────────────────────────────────>
│ │ │
│ │ getMeeting() │
│ │<─────────────────────────────────┤
│ │ include_summary: true │
│ │ (fetches null - not generated) │
│ ├─────────────────────────────────>│
│ │ │
│ │ Shows empty summary │
Target State (Fixed)
Recording.tsx Server MeetingDetail.tsx
│ │ │
│ stopRecording() │ │
├────────────────────────────>│ │
│ │ StopMeeting() │
│ │ ├─ State → STOPPED │
│ │ └─ Webhooks fired │
│<────────────────────────────┤ │
│ │ │
│ navigate(/meeting/id) ──────┼──────────────────────────────────>
│ │ │
│ │ getMeeting() │
│ │<─────────────────────────────────┤
│ ├─────────────────────────────────>│
│ │ │
│ │ ┌─────────────────────────────┐│
│ │ │ usePostProcessing() hook ││
│ │ │ ├─ Check processing status ││
│ │ │ ├─ Trigger if needed: ││
│ │ │ │ ├─ generateSummary() ││
│ │ │ │ ├─ extractEntities() ││
│ │ │ │ └─ refineSpeakers() ││
│ │ │ └─ Track progress ││
│ │ └─────────────────────────────┘│
│ │ │
│ │<── generateSummary() ────────────┤
│ │────────────────────────────────>│
│ │ Progress events... │
│ │<── extractEntities() ────────────┤
│ │────────────────────────────────>│
│ │<── refineSpeakers() ─────────────┤
│ │────────────────────────────────>│
│ │ Poll status... │
│ │ │
│ │ All complete │
│ │ Show full meeting │
Scope
Task Breakdown
| Task | Effort | Priority | Description |
|---|---|---|---|
| Create usePostProcessing hook | M | P0 | Orchestrates all post-processing with progress tracking |
| Add processing status to Meeting entity | S | P0 | Track what processing has been done |
| Enable auto-extract in MeetingDetail | S | P0 | Pass autoExtract: true to entity hook |
| Add summary progress listener | S | P1 | Subscribe to summary_progress events |
| Create auto-diarization trigger | S | P1 | Trigger refinement on meeting completion |
| Add ProcessingStatus component | M | P1 | Shows progress for all processing steps |
| Add missing webhook events | S | P2 | ENTITIES_EXTRACTED, DIARIZATION_COMPLETED |
| Add ASR error surfacing | S | P2 | Emit events for ASR failures |
| Add processing status to proto | S | P2 | Include in GetMeeting response |
Files to Create
Client:
- client/src/hooks/use-post-processing.ts - Orchestration hook
- client/src/components/meeting/processing-status.tsx - Progress UI component
Files to Modify
Client:
- client/src/pages/MeetingDetail.tsx - Add usePostProcessing hook, enable auto-extract
- client/src/hooks/use-entity-extraction.ts - Minor fixes if needed
- client/src/hooks/use-diarization.ts - Add auto-start capability
- client/src/api/tauri-adapter.ts - Add event listeners for summary progress
Backend:
- src/noteflow/domain/entities/meeting.py - Add processing status fields
- src/noteflow/domain/webhooks/events.py - Add new event types
- src/noteflow/grpc/_mixins/streaming/_asr.py - Add error event emission
- src/noteflow/grpc/proto/noteflow.proto - Add ProcessingStatus message
Implementation Plan
Phase 1: Quick Wins (Low Risk)
Goal: Enable existing but disabled functionality
- Enable auto-extract in MeetingDetail

    // client/src/pages/MeetingDetail.tsx
    const { extract: extractEntities } = useEntityExtraction({
      meetingId: id,
      meetingTitle: meeting?.title,
      meetingState: meeting?.state,
      autoExtract: true, // ADD THIS
    });

- Add summary progress listener

    // In MeetingDetail or a new useSummaryProgress hook
    import { listen } from '@tauri-apps/api/event';

    useEffect(() => {
      // listen() returns a promise that resolves to an unlisten function
      const unlisten = listen<{ meetingId: string; elapsed: number }>(
        'summary_progress',
        (event) => setSummaryProgress(event.payload)
      );
      // Unsubscribe on unmount
      return () => {
        unlisten.then((fn) => fn());
      };
    }, []);
Phase 2: Processing Status (Low Risk)
Goal: Track what processing has been done
- Add processing status to Meeting entity

    # src/noteflow/domain/entities/meeting.py
    from dataclasses import dataclass, field

    @dataclass
    class ProcessingStatus:
        summary_generated: bool = False
        entities_extracted: bool = False
        diarization_refined: bool = False

    @dataclass
    class Meeting:
        # ... existing fields ...
        processing_status: ProcessingStatus = field(default_factory=ProcessingStatus)

- Add ProcessingStatus to proto

    message ProcessingStatus {
      bool summary_generated = 1;
      bool entities_extracted = 2;
      bool diarization_refined = 3;
    }

    message Meeting {
      // ... existing fields ...
      ProcessingStatus processing_status = 20;
    }
Phase 3: Orchestration Hook (Medium Risk)
Goal: Create unified post-processing orchestration
- Create usePostProcessing hook

    // client/src/hooks/use-post-processing.ts
    interface UsePostProcessingOptions {
      meetingId: string;
      meetingState: MeetingState;
      autoStart?: boolean;
      onComplete?: () => void;
    }

    interface ProcessingStep {
      name: 'summary' | 'entities' | 'diarization';
      status: 'pending' | 'running' | 'completed' | 'failed' | 'skipped';
      progress?: number;
      error?: string;
    }

    export function usePostProcessing(options: UsePostProcessingOptions) {
      const { meetingId, meetingState, autoStart = false } = options;
      const [steps, setSteps] = useState<ProcessingStep[]>([
        { name: 'summary', status: 'pending' },
        { name: 'entities', status: 'pending' },
        { name: 'diarization', status: 'pending' },
      ]);
      const api = useAPI();

      // runSummary, runEntities, and runDiarization are per-step runners (not shown).
      // Each is expected to call the matching API method and update its entry in steps.
      const startProcessing = useCallback(async () => {
        // Run summary and entities in parallel
        const summaryPromise = runSummary();
        const entitiesPromise = runEntities();
        // Start diarization (polling-based)
        const diarizationPromise = runDiarization();
        // allSettled: one failed step does not block the others
        await Promise.allSettled([
          summaryPromise,
          entitiesPromise,
          diarizationPromise,
        ]);
      }, [meetingId]);

      // Auto-start when meeting becomes completed
      useEffect(() => {
        if (autoStart && meetingState === 'completed') {
          startProcessing();
        }
      }, [autoStart, meetingState, startProcessing]);

      // Derived flags: running if any step runs, complete once no step is pending or running
      const isProcessing = steps.some((step) => step.status === 'running');
      const isComplete = steps.every(
        (step) => step.status !== 'pending' && step.status !== 'running'
      );

      return { steps, startProcessing, isProcessing, isComplete };
    }

- Create ProcessingStatus component

    // client/src/components/meeting/processing-status.tsx
    export function ProcessingStatus({ steps }: { steps: ProcessingStep[] }) {
      return (
        <Card>
          <CardHeader>
            <CardTitle>Processing</CardTitle>
          </CardHeader>
          <CardContent>
            {steps.map((step) => (
              <div key={step.name} className="flex items-center gap-2">
                <StatusIcon status={step.status} />
                <span>{stepLabels[step.name]}</span>
                {/* Compare against undefined so a progress of 0 still renders the bar */}
                {step.status === 'running' && step.progress !== undefined && (
                  <Progress value={step.progress} />
                )}
              </div>
            ))}
          </CardContent>
        </Card>
      );
    }
Phase 4: Webhook Events (Low Risk)
Goal: Add missing webhook events for external integrations
- Add new event types

    # src/noteflow/domain/webhooks/events.py
    class WebhookEventType(str, Enum):
        RECORDING_STARTED = "recording.started"
        RECORDING_STOPPED = "recording.stopped"
        MEETING_COMPLETED = "meeting.completed"
        SUMMARY_GENERATED = "summary.generated"
        ENTITIES_EXTRACTED = "entities.extracted"  # NEW
        DIARIZATION_COMPLETED = "diarization.completed"  # NEW

- Fire events on completion

    # In entities mixin
    await self._webhook_service.trigger_entities_extracted(meeting_id, entities)

    # In diarization mixin
    await self._webhook_service.trigger_diarization_completed(meeting_id, job_id)
Phase 5: Error Surfacing (Low Risk)
Goal: Make ASR and processing errors visible
- Add ASR error events

    # src/noteflow/grpc/_mixins/streaming/_asr.py
    if audio.size == 0:
        logger.warning("Empty audio segment received", meeting_id=meeting_id)
        # Optionally emit metric
        return
    if not host.asr_engine:
        logger.error("ASR engine not available", meeting_id=meeting_id)
        # Could emit error event to client
        return
Deliverables
Backend
- Add ProcessingStatus to Meeting entity
- Add ProcessingStatus to proto schema
- Add ENTITIES_EXTRACTED webhook event type
- Add DIARIZATION_COMPLETED webhook event type
- Fire webhook events on processing completion
- Add ASR error logging with meeting context
- Update GetMeeting to include processing status
Client
- Create usePostProcessing orchestration hook
- Create ProcessingStatus UI component
- Enable autoExtract: true in MeetingDetail
- Add summary progress event listener
- Add auto-diarization trigger
- Wire usePostProcessing into MeetingDetail
Tests
- Unit test: usePostProcessing hook state transitions
- Unit test: ProcessingStatus component rendering
- Integration test: Full post-processing flow
- E2E test: Recording → Processing → Complete flow
Test Strategy
Fixtures
- Mock meeting with state: 'completed'
- Mock API responses for each processing step
- Mock Tauri events for progress
Test Cases
| Case | Input | Expected |
|---|---|---|
| Auto-trigger on mount | Meeting with state=completed, no summary | Summary generation starts |
| Parallel processing | Meeting ready for processing | Summary + entities run in parallel |
| Individual failure | Summary fails, others succeed | Shows summary as failed, others complete |
| Progress tracking | Summary in progress | Shows progress percentage |
| Skip if done | Meeting with existing summary | Skips summary, runs others |
| Processing complete | All steps done | Shows complete badge, fires callback |
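Several of these cases (individual failure, skip if done, processing complete) come down to per-step state transitions, which can be tested without rendering anything if they live in a pure function. The following is a sketch under that assumption. `Step`, `StepEvent`, `applyStepEvent`, and `isFinished` are hypothetical names, not part of the planned hook's API.

```typescript
// Hypothetical names; this mirrors the step statuses used in the plan.
type StepStatus = 'pending' | 'running' | 'completed' | 'failed' | 'skipped';
type StepKey = 'summary' | 'entities' | 'diarization';

interface Step {
  name: StepKey;
  status: StepStatus;
  error?: string;
}

type StepEvent =
  | { type: 'start'; name: StepKey }
  | { type: 'succeed'; name: StepKey }
  | { type: 'fail'; name: StepKey; error: string }
  | { type: 'skip'; name: StepKey };

// Returns a new array with the matching step updated; other steps are untouched.
function applyStepEvent(steps: readonly Step[], event: StepEvent): Step[] {
  return steps.map((step): Step => {
    if (step.name !== event.name) return step;
    switch (event.type) {
      case 'start':
        return { name: step.name, status: 'running' };
      case 'succeed':
        return { name: step.name, status: 'completed' };
      case 'fail':
        return { name: step.name, status: 'failed', error: event.error };
      case 'skip':
        return { name: step.name, status: 'skipped' };
    }
  });
}

// A run is finished once no step is pending or running,
// so failed and skipped steps still count as finished.
function isFinished(steps: readonly Step[]): boolean {
  return steps.every(
    (step) => step.status !== 'pending' && step.status !== 'running',
  );
}
```

A hook could feed these events from its runners, while the unit tests exercise the transitions directly.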
E2E Test Plan
// client/e2e/post-processing.spec.ts
import { expect, test } from '@playwright/test';
test('automatically processes meeting after recording stops', async ({ page }) => {
// 1. Start recording
await page.click('[data-testid="start-recording"]');
// 2. Record for a few seconds
await page.waitForTimeout(3000);
// 3. Stop recording
await page.click('[data-testid="stop-recording"]');
// 4. Should navigate to detail page
await expect(page).toHaveURL(/\/meetings\/.+/);
// 5. Should show processing status
await expect(page.locator('[data-testid="processing-status"]')).toBeVisible();
// 6. Wait for processing to complete
await expect(page.locator('[data-testid="summary-status"]')).toHaveAttribute(
'data-status',
'completed',
{ timeout: 60000 }
);
// 7. Summary should be visible
await expect(page.locator('[data-testid="summary-content"]')).toBeVisible();
});
Quality Gates
- All processing triggers automatically on meeting completion
- Progress visible for all processing steps
- Failures don't block other processing steps
- Processing status persisted and queryable
- Webhook events fire for all processing completions
- E2E test validates full flow
- No regression in manual trigger paths
Migration Strategy
Rollout Order
- Phase 1 (Day 1): Enable auto-extract - immediate improvement, no new code
- Phase 2 (Day 2-3): Add processing status tracking - backend changes
- Phase 3 (Day 4-5): Create orchestration hook - main feature
- Phase 4 (Day 6): Add webhook events - external integrations
- Phase 5 (Day 7): Add error surfacing - observability
Rollback Plan
Each phase is independently reversible:
- Phase 1: Set autoExtract: false
- Phase 2: Processing status optional, can ignore
- Phase 3: Hook is additive, can disable
- Phase 4: Webhook events are additive
- Phase 5: Error logging is additive
Dependencies
External Dependencies
- GAP-003 (Error Handling) - Proper error classification for processing failures
- GAP-004 (Diarization Lifecycle) - Diarization polling pattern
Feature Flags
- NOTEFLOW_FEATURE_NER_ENABLED - Controls entity extraction availability
- Consider: NOTEFLOW_FEATURE_AUTO_PROCESSING - Enable/disable auto-processing
Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Processing overload on server | Medium | High | Rate limit concurrent processing per user |
| Long summarization blocks UI | Medium | Medium | Progress events + async processing |
| Failed processing not visible | High | Medium | ProcessingStatus component |
| Diarization timeout | Low | Low | Already handled by GAP-004 |
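The first mitigation, capping concurrent processing per user, can be illustrated with a small promise-based limiter. The sketch is in TypeScript to show the technique only; a server-side version would live in the Python backend and look different. `createLimiter`, `limiterFor`, and the default cap of 2 are all assumptions.

```typescript
// Hypothetical helper; nothing here exists in the codebase.
function createLimiter(maxConcurrent: number) {
  if (maxConcurrent < 1) throw new RangeError('maxConcurrent must be >= 1');
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = (): void => {
    const next = waiting.shift();
    if (next) {
      next(); // hand the slot directly to the next waiter; active stays the same
    } else {
      active -= 1;
    }
  };

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active += 1;
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// One limiter per user, created on first use.
const limiters = new Map<string, ReturnType<typeof createLimiter>>();

function limiterFor(userId: string, maxConcurrent = 2) {
  let limiter = limiters.get(userId);
  if (!limiter) {
    limiter = createLimiter(maxConcurrent);
    limiters.set(userId, limiter);
  }
  return limiter;
}
```

Handing the slot straight to the next waiter, rather than decrementing and re-incrementing the counter, prevents a newly arriving task from slipping in ahead of queued ones and exceeding the cap.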
References
- src/noteflow/grpc/_mixins/summarization.py - Summary generation RPC
- src/noteflow/grpc/_mixins/entities.py - Entity extraction RPC
- src/noteflow/grpc/_mixins/diarization/ - Diarization refinement
- client/src/hooks/use-diarization.ts - Polling pattern to reuse
- client/src/hooks/use-entity-extraction.ts - Auto-extract pattern
- GAP-003 - Error handling patterns
- GAP-004 - Diarization lifecycle patterns