noteflow/docs/sprints/phase-ongoing/.archive/sprint-gap-011-post-processing-pipeline/README.md
Travis Vasceannie 0a18f2d23d chore: update linting artifacts
- Updated basedpyright linting results (705 files analyzed, analysis time reduced from 22.928s to 13.105s).
- Updated biome linting artifact with warning about unnecessary hook dependency (preferencesVersion) in MeetingDetail.tsx.
2026-01-08 21:45:05 -05:00


SPRINT-GAP-011: Post-Processing Pipeline Gaps

| Attribute | Value |
|---|---|
| Sprint | GAP-011 |
| Size | L (Large) |
| Owner | TBD |
| Phase | Hardening |
| Prerequisites | GAP-003 (Error Handling), GAP-004 (Diarization Lifecycle) |

Executive Summary

After a meeting recording completes, the system does not automatically trigger post-processing workflows (summarization, entity extraction, diarization refinement). Users see only the raw recording and its streamed transcript, without summaries, extracted entities, or refined speaker labels. The architecture has all the required components but lacks the orchestration to connect them.

Open Issues

  • Define post-processing trigger strategy (server-side vs client-side orchestration)
  • Determine parallel vs sequential execution of post-processing steps
  • Decide on failure handling for individual processing steps
  • Define retry policy for failed processing
  • Establish processing completion signals

Validation Status

| Component | Exists | Status |
|---|---|---|
| Streaming transcription | Yes | Working |
| GenerateSummary RPC | Yes | Manual trigger only |
| ExtractEntities RPC | Yes | Manual trigger only |
| RefineSpeakerDiarization RPC | Yes | Manual trigger only |
| Auto-trigger on meeting stop | No | Gap - needs implementation |
| Processing completion signals | No | Gap - needs implementation |
| Progress tracking UI | Partial | Summary has events, others missing |
| Client-side orchestration | No | Gap - needs implementation |

Objective

Implement automatic post-processing orchestration that triggers summarization, entity extraction, and diarization refinement after a meeting recording stops, with proper progress tracking, error handling, and completion signals.

Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Orchestration location | Client-side | Server should remain stateless; client knows user preferences |
| Processing order | Parallel where possible | Summarization and NER can run concurrently; diarization can start immediately |
| Failure handling | Continue on failure | One failed step shouldn't block others |
| Retry policy | User-initiated | Auto-retry risks resource exhaustion |
| Completion tracking | Per-step status | Enable partial success states |
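The "continue on failure" and "per-step status" decisions can be sketched as a small helper. This is a minimal sketch that assumes each step is exposed as a promise-returning function. `runPostProcessing`, `StepName`, and `StepOutcome` are hypothetical names, not existing NoteFlow code.

```typescript
// Hypothetical names: StepName, StepOutcome, runPostProcessing.
type StepName = 'summary' | 'entities' | 'diarization';

type StepOutcome =
  | { status: 'completed' }
  | { status: 'failed'; error: string };

// Runs all steps concurrently. Promise.allSettled never rejects, so a failing
// step cannot hide or cancel the results of the other steps.
async function runPostProcessing(
  tasks: Record<StepName, () => Promise<unknown>>,
): Promise<Record<StepName, StepOutcome>> {
  const names = Object.keys(tasks) as StepName[];
  // The async wrapper turns a synchronous throw into a rejected promise.
  const settled = await Promise.allSettled(names.map(async (name) => tasks[name]()));

  const outcomes = {} as Record<StepName, StepOutcome>;
  settled.forEach((result, i) => {
    outcomes[names[i]] =
      result.status === 'fulfilled'
        ? { status: 'completed' }
        : { status: 'failed', error: String(result.reason) };
  });
  return outcomes;
}
```

A partial success, such as a completed summary with failed entity extraction, is then a record with mixed statuses. That record is what the UI needs to render.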

What Already Exists

Backend Infrastructure

Summarization Service (src/noteflow/grpc/_mixins/summarization.py)

  • GenerateSummary RPC fully functional
  • Returns cached summary if exists (unless force_regenerate=True)
  • Fires SUMMARY_GENERATED webhook on completion
  • Supports multiple providers (Cloud, Ollama, Mock)

Entity Extraction (src/noteflow/grpc/_mixins/entities.py)

  • ExtractEntities RPC fully functional
  • Feature flag gated (NOTEFLOW_FEATURE_NER_ENABLED)
  • Returns cached entities if exist (unless force_refresh=True)
  • Uses spaCy NER engine

Diarization Refinement (src/noteflow/grpc/_mixins/diarization/_mixin.py)

  • RefineSpeakerDiarization RPC launches background job
  • GetDiarizationJobStatus for polling
  • Persisted to database for recovery

Webhook Events (src/noteflow/domain/webhooks/events.py)

class WebhookEventType(str, Enum):
    RECORDING_STARTED = "recording.started"
    RECORDING_STOPPED = "recording.stopped"
    MEETING_COMPLETED = "meeting.completed"
    SUMMARY_GENERATED = "summary.generated"
    # Missing: ENTITIES_EXTRACTED, DIARIZATION_COMPLETED

Client Infrastructure

Tauri Adapter (client/src/api/tauri-adapter.ts)

// All RPCs available but require manual invocation
generateSummary(meetingId: string, forceRegenerate?: boolean): Promise<Summary>
extractEntities(meetingId: string, forceRefresh?: boolean): Promise<ExtractedEntity[]>
refineSpeakers(meetingId: string, numSpeakers?: number): Promise<{ jobId: string }>
getDiarizationJobStatus(jobId: string): Promise<DiarizationJobStatus>

Diarization Hook (client/src/hooks/use-diarization.ts)

  • Polling pattern with exponential backoff
  • Max duration aligned to server timeout
  • Auto-recovery on mount
  • Progress event handling
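use-diarization.ts already implements this pattern. The following is a generic sketch of polling with exponential backoff under an overall time budget; it is not the hook's actual code. `pollWithBackoff` and its option names are hypothetical. The delay values would need to be tuned to the server's diarization timeout.

```typescript
// Hypothetical options for a generic backoff poller.
interface PollOptions {
  initialMs: number;     // first delay between polls
  factor: number;        // multiplier applied after each poll
  maxDelayMs: number;    // cap for a single delay
  maxDurationMs: number; // overall budget; keep aligned with the server timeout
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls fetchStatus until isDone returns true, or throws when the next wait
// would exceed the overall time budget.
async function pollWithBackoff<T>(
  fetchStatus: () => Promise<T>,
  isDone: (value: T) => boolean,
  opts: PollOptions,
): Promise<T> {
  const deadline = Date.now() + opts.maxDurationMs;
  let delay = opts.initialMs;
  for (;;) {
    const value = await fetchStatus();
    if (isDone(value)) return value;
    if (Date.now() + delay > deadline) {
      throw new Error('polling timed out');
    }
    await sleep(delay);
    delay = Math.min(delay * opts.factor, opts.maxDelayMs);
  }
}
```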

Entity Extraction Hook (client/src/hooks/use-entity-extraction.ts)

// Auto-extract feature EXISTS but is DISABLED
useEffect(() => {
  if (autoExtract && meetingId && meetingState === 'completed') {
    extract(false);  // This would work!
  }
}, [autoExtract, meetingId, meetingState, extract]);

Summary Progress Events (client/src-tauri/src/commands/summary.rs)

// The Tauri backend emits progress events every second
emit_summary_progress(app.clone(), meeting_id.clone(), elapsed_s);
// But no React component listens for these

Identified Issues

Gap 1: No Automatic Post-Meeting Workflows (Critical)

Severity: Critical
Impact: Core feature missing - users get no processed output

Location: src/noteflow/grpc/_mixins/meeting.py:246-285

async def StopMeeting(self, request, context):
    # ... state transition logic ...

    # Webhooks fire, but no processing triggers
    await self._fire_webhooks_safely(
        lambda: self._webhook_service.trigger_recording_stopped(...)
    )
    await self._fire_webhooks_safely(
        lambda: self._webhook_service.trigger_meeting_completed(...)
    )

    # GAP: No call to:
    # - GenerateSummary
    # - ExtractEntities
    # - RefineSpeakerDiarization

    return response

Problem: After StopMeeting() completes:

  • Webhooks fire for external integrations
  • But no internal processing is triggered
  • All processing requires explicit client RPC calls

Gap 2: Client Immediately Navigates Away (Critical)

Severity: Critical
Impact: No opportunity for client-side orchestration

Location: client/src/pages/Recording.tsx:313-346

const stopRecording = async () => {
  setIsRecording(false);
  streamRef.current?.close();

  const stoppedMeeting = await api.stopMeeting(meeting.id);
  setMeeting(stoppedMeeting);

  // GAP: Immediately navigates away - no processing triggered
  navigate(
    projectId
      ? `/projects/${projectId}/meetings/${meeting.id}`
      : '/projects'
  );
};

Problem:

  • Stop button triggers navigation immediately
  • No post-processing orchestration before navigation
  • User lands on detail page with empty summary/entities

Gap 3: MeetingDetail Only Fetches, Doesn't Process (Critical)

Severity: Critical
Impact: Viewing a meeting doesn't trigger missing processing

Location: client/src/pages/MeetingDetail.tsx:79-98

useEffect(() => {
  const loadMeeting = async () => {
    const data = await getAPI().getMeeting({
      meeting_id: id,
      include_segments: true,
      include_summary: true,  // Fetches existing, doesn't generate
    });
    setMeeting(data.meeting);
    setSegments(data.segments || []);
    setSummary(data.summary);  // null if not generated
  };

  loadMeeting();
}, [id]);

Problem:

  • include_summary: true only returns a summary that already exists
  • Does not trigger generation if missing
  • User must manually click "Generate Summary" button

Gap 4: Auto-Extract Feature Disabled (Medium)

Severity: Medium
Impact: Working feature not utilized

Location: client/src/hooks/use-entity-extraction.ts:116-121

// Feature exists but is never enabled
useEffect(() => {
  if (autoExtract && meetingId && meetingState === 'completed') {
    extract(false);
  }
}, [autoExtract, meetingId, meetingState, extract]);

Location: client/src/pages/MeetingDetail.tsx:72-76

const { extract: extractEntities } = useEntityExtraction({
  meetingId: id,
  meetingTitle: meeting?.title,
  meetingState: meeting?.state,
  // autoExtract: true  <-- MISSING
});

Problem:

  • Hook has auto-extract capability
  • Never enabled in consuming components
  • Simple fix: pass autoExtract: true

Gap 5: No Processing Completion Signals (High)

Severity: High
Impact: Client cannot track what processing is done

Location: src/noteflow/domain/entities/meeting.py

class Meeting:
    id: MeetingId
    title: str
    state: MeetingState
    # ... other fields ...

    # GAP: No processing status fields
    # Missing:
    # - transcription_complete: bool
    # - summary_generated: bool
    # - entities_extracted: bool
    # - diarization_refined: bool

Problem:

  • Meeting entity has no processing status tracking
  • Client must query each subsystem separately
  • No way to show "processing complete" badge

Gap 6: Summary Progress Events Ignored (Medium)

Severity: Medium
Impact: No progress UI during summarization

Location: client/src-tauri/src/commands/summary.rs:96-134

tauri::async_runtime::spawn(async move {
    let mut interval = tokio::time::interval(Duration::from_secs(1));
    loop {
        interval.tick().await;
        let elapsed_s = start.elapsed().as_secs();

        // Emits event but nothing listens
        emit_summary_progress(app.clone(), meeting_id.clone(), elapsed_s);

        if elapsed_s >= 300 { break; }
    }
});

Problem:

  • The Tauri backend (not the gRPC server) emits summary_progress events every second
  • No React component subscribes to these events
  • User sees no progress during potentially long summarization

Gap 7: No Diarization Auto-Trigger (Medium)

Severity: Medium
Impact: Speaker labels never refined automatically

Location: client/src/hooks/use-diarization.ts

// Hook provides start() function but nothing calls it automatically
export function useDiarization(options: UseDiarizationOptions = {}) {
  const start = useCallback(async (meetingId: string, numSpeakers?: number) => {
    // ... implementation
  }, [api]);

  // No auto-start logic exists

  return { state, start, cancel, poll, recover };
}

Location: client/src/pages/MeetingDetail.tsx

  • No call to start(meetingId) on mount or meeting completion
  • Diarization button exists but requires manual click

Problem:

  • Diarization hook is well-designed with polling
  • But never triggered automatically after recording
  • User must manually click "Refine Speakers" button
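An automatic trigger needs a guard so that re-renders, remounts, or React StrictMode's double-invoked effects in development do not start duplicate refinement jobs. The following is a minimal sketch. `createAutoStartGuard` is a hypothetical helper, and it only deduplicates within one page session. After a reload, the persisted-job recovery in use-diarization.ts or a stored processing status would have to take over.

```typescript
// Hypothetical guard: remembers which meetings were already auto-started.
function createAutoStartGuard() {
  const started = new Set<string>();
  return {
    // Returns true exactly once per meetingId, and only after the meeting
    // has reached the 'completed' state.
    shouldStart(meetingId: string, meetingState: string): boolean {
      if (meetingState !== 'completed' || started.has(meetingId)) return false;
      started.add(meetingId);
      return true;
    },
  };
}
```

In MeetingDetail, an effect would call `guard.shouldStart(meeting.id, meeting.state)` and call `start(meeting.id)` only when it returns true.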

Gap 8: Silent ASR Error Handling (Medium)

Severity: Medium
Impact: Transcription failures invisible to user

Location: src/noteflow/grpc/_mixins/streaming/_asr.py:45-60

async def process_audio_segment(
    host: "ServicerHost",
    audio: np.ndarray,
    # ...
) -> AsyncIterator[TranscriptUpdate]:
    if audio.size == 0:
        return  # Silent return - no logging

    if not host.asr_engine:
        return  # Silent return - user not informed

    # ... processing

Problem:

  • Empty audio segments silently ignored
  • ASR engine unavailability silently skipped
  • No user notification of processing failures
  • Only visible in server logs

Architecture

Current State (Broken)

Recording.tsx                    Server                         MeetingDetail.tsx
     │                             │                                  │
     │ stopRecording()             │                                  │
     ├────────────────────────────>│                                  │
     │                             │ StopMeeting()                    │
     │                             │  ├─ State → STOPPED              │
     │                             │  ├─ Webhooks fired               │
     │                             │  └─ [NO PROCESSING]              │
     │<────────────────────────────┤                                  │
     │                             │                                  │
     │ navigate(/meeting/id) ──────┼──────────────────────────────────>
     │                             │                                  │
     │                             │                  getMeeting()    │
     │                             │<─────────────────────────────────┤
     │                             │  include_summary: true           │
     │                             │  (fetches null - not generated)  │
     │                             ├─────────────────────────────────>│
     │                             │                                  │
     │                             │              Shows empty summary │

Target State (Fixed)

Recording.tsx                    Server                         MeetingDetail.tsx
     │                             │                                  │
     │ stopRecording()             │                                  │
     ├────────────────────────────>│                                  │
     │                             │ StopMeeting()                    │
     │                             │  ├─ State → STOPPED              │
     │                             │  └─ Webhooks fired               │
     │<────────────────────────────┤                                  │
     │                             │                                  │
     │ navigate(/meeting/id) ──────┼──────────────────────────────────>
     │                             │                                  │
     │                             │                  getMeeting()    │
     │                             │<─────────────────────────────────┤
     │                             ├─────────────────────────────────>│
     │                             │                                  │
     │                             │   ┌─────────────────────────────┐│
     │                             │   │ usePostProcessing() hook    ││
     │                             │   │  ├─ Check processing status ││
     │                             │   │  ├─ Trigger if needed:      ││
     │                             │   │  │   ├─ generateSummary()   ││
     │                             │   │  │   ├─ extractEntities()   ││
     │                             │   │  │   └─ refineSpeakers()    ││
     │                             │   │  └─ Track progress          ││
     │                             │   └─────────────────────────────┘│
     │                             │                                  │
     │                             │<── generateSummary() ────────────┤
     │                             ├─────────────────────────────────>│
     │                             │        Progress events...        │
     │                             │<── extractEntities() ────────────┤
     │                             ├─────────────────────────────────>│
     │                             │<── refineSpeakers() ─────────────┤
     │                             ├─────────────────────────────────>│
     │                             │        Poll status...            │
     │                             │                                  │
     │                             │              All complete        │
     │                             │              Show full meeting   │

Scope

Task Breakdown

| Task | Effort | Priority | Description |
|---|---|---|---|
| Create usePostProcessing hook | M | P0 | Orchestrates all post-processing with progress tracking |
| Add processing status to Meeting entity | S | P0 | Track what processing has been done |
| Enable auto-extract in MeetingDetail | S | P0 | Pass autoExtract: true to entity hook |
| Add summary progress listener | S | P1 | Subscribe to summary_progress events |
| Create auto-diarization trigger | S | P1 | Trigger refinement on meeting completion |
| Add ProcessingStatus component | M | P1 | Shows progress for all processing steps |
| Add missing webhook events | S | P2 | ENTITIES_EXTRACTED, DIARIZATION_COMPLETED |
| Add ASR error surfacing | S | P2 | Emit events for ASR failures |
| Add processing status to proto | S | P2 | Include in GetMeeting response |

Files to Create

Client:

  • client/src/hooks/use-post-processing.ts - Orchestration hook
  • client/src/components/meeting/processing-status.tsx - Progress UI component

Files to Modify

Client:

  • client/src/pages/MeetingDetail.tsx - Add usePostProcessing hook, enable auto-extract
  • client/src/hooks/use-entity-extraction.ts - Minor fixes if needed
  • client/src/hooks/use-diarization.ts - Add auto-start capability
  • client/src/api/tauri-adapter.ts - Add event listeners for summary progress

Backend:

  • src/noteflow/domain/entities/meeting.py - Add processing status fields
  • src/noteflow/domain/webhooks/events.py - Add new event types
  • src/noteflow/grpc/_mixins/streaming/_asr.py - Add error event emission
  • src/noteflow/grpc/proto/noteflow.proto - Add ProcessingStatus message

Implementation Plan

Phase 1: Quick Wins (Low Risk)

Goal: Enable existing but disabled functionality

  1. Enable auto-extract in MeetingDetail

    // client/src/pages/MeetingDetail.tsx
    const { extract: extractEntities } = useEntityExtraction({
      meetingId: id,
      meetingTitle: meeting?.title,
      meetingState: meeting?.state,
      autoExtract: true,  // ADD THIS
    });
    
  2. Add summary progress listener

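    // In MeetingDetail or a new useSummaryProgress hook
    import { listen } from '@tauri-apps/api/event';

    useEffect(() => {
      // Payload shape is assumed; match it to what emit_summary_progress serializes.
      const unlisten = listen<{ meetingId: string; elapsed: number }>(
        'summary_progress',
        (event) => {
          // Progress events for other meetings can arrive too; ignore them.
          if (event.payload.meetingId === id) setSummaryProgress(event.payload);
        }
      );
      return () => { unlisten.then((fn) => fn()); };
    }, [id]);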
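The emit call passes only elapsed seconds (`elapsed_s`), and the emitter loop stops at 300 s. A progress indicator can therefore show elapsed time but not a true percentage. This is a minimal formatting sketch. `formatSummaryElapsed` is a hypothetical helper, and the label wording is illustrative.

```typescript
// Mirrors `if elapsed_s >= 300 { break; }` in summary.rs: no events arrive after this.
const SUMMARY_EMIT_CAP_S = 300;

// Hypothetical helper that turns the elapsed seconds from summary_progress into a label.
function formatSummaryElapsed(elapsedS: number): string {
  const total = Math.max(0, Math.floor(elapsedS));
  // Once the emitter stops, the UI can only say that work is still ongoing.
  if (total >= SUMMARY_EMIT_CAP_S) return 'Summarizing (still working)';
  const minutes = Math.floor(total / 60);
  const seconds = total % 60;
  return minutes > 0
    ? `Summarizing (${minutes}m ${String(seconds).padStart(2, '0')}s)`
    : `Summarizing (${seconds}s)`;
}
```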
    

Phase 2: Processing Status (Low Risk)

Goal: Track what processing has been done

  1. Add processing status to Meeting entity

    # src/noteflow/domain/entities/meeting.py
    from dataclasses import dataclass, field

    @dataclass
    class ProcessingStatus:
        summary_generated: bool = False
        entities_extracted: bool = False
        diarization_refined: bool = False
    
    @dataclass
    class Meeting:
        # ... existing fields ...
        processing_status: ProcessingStatus = field(default_factory=ProcessingStatus)
    
  2. Add ProcessingStatus to proto

    message ProcessingStatus {
      bool summary_generated = 1;
      bool entities_extracted = 2;
      bool diarization_refined = 3;
    }
    
    message Meeting {
      // ... existing fields ...
      ProcessingStatus processing_status = 20;  // use the next unused field number in Meeting
    }
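On the client, the proposed ProcessingStatus can decide which steps start as "skipped". This covers the "Skip if done" test case. The sketch assumes the generated client types expose the message with camelCase field names; `ProcessingStatusDto` and `planSteps` are hypothetical. GenerateSummary and ExtractEntities return cached results unless forced, so treating a missing status (for example, from an older server) as "run everything" is cheap for those two steps. Diarization has no such cache, so re-running it does real work.

```typescript
// Hypothetical client mirror of the proposed ProcessingStatus proto message.
interface ProcessingStatusDto {
  summaryGenerated: boolean;
  entitiesExtracted: boolean;
  diarizationRefined: boolean;
}

type StepName = 'summary' | 'entities' | 'diarization';
type InitialStatus = 'pending' | 'skipped';

// Steps already done on the server start as 'skipped'; a missing status
// (e.g. an older server without the field) leaves every step 'pending'.
function planSteps(status?: ProcessingStatusDto): Record<StepName, InitialStatus> {
  return {
    summary: status?.summaryGenerated ? 'skipped' : 'pending',
    entities: status?.entitiesExtracted ? 'skipped' : 'pending',
    diarization: status?.diarizationRefined ? 'skipped' : 'pending',
  };
}
```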
    

Phase 3: Orchestration Hook (Medium Risk)

Goal: Create unified post-processing orchestration

  1. Create usePostProcessing hook

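    // client/src/hooks/use-post-processing.ts
    import { useCallback, useEffect, useRef, useState } from 'react';
    // useAPI and MeetingState come from the existing client code (imports not shown).

    type StepName = 'summary' | 'entities' | 'diarization';

    interface UsePostProcessingOptions {
      meetingId: string;
      meetingState: MeetingState;
      autoStart?: boolean;
      onComplete?: () => void;
    }

    export interface ProcessingStep {
      name: StepName;
      status: 'pending' | 'running' | 'completed' | 'failed' | 'skipped';
      progress?: number;
      error?: string;
    }

    const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

    export function usePostProcessing({
      meetingId,
      meetingState,
      autoStart = false,
      onComplete,
    }: UsePostProcessingOptions) {
      const [steps, setSteps] = useState<ProcessingStep[]>([
        { name: 'summary', status: 'pending' },
        { name: 'entities', status: 'pending' },
        { name: 'diarization', status: 'pending' },
      ]);
      const startedRef = useRef(false); // prevents re-running when callbacks change identity
      const api = useAPI();

      const updateStep = useCallback((name: StepName, patch: Partial<ProcessingStep>) => {
        setSteps((prev) => prev.map((s) => (s.name === name ? { ...s, ...patch } : s)));
      }, []);

      // Records success or failure for one step; never rejects, so other steps continue.
      const runStep = useCallback(
        async (name: StepName, task: () => Promise<unknown>) => {
          updateStep(name, { status: 'running', error: undefined });
          try {
            await task();
            updateStep(name, { status: 'completed' });
          } catch (err) {
            updateStep(name, { status: 'failed', error: String(err) });
          }
        },
        [updateStep],
      );

      // Starts refinement, then polls until the job ends. The status values checked
      // here are assumptions; in practice reuse the backoff and max-duration logic
      // from use-diarization.ts.
      const runDiarization = useCallback(async () => {
        const { jobId } = await api.refineSpeakers(meetingId);
        for (;;) {
          const job = await api.getDiarizationJobStatus(jobId);
          if (job.status === 'completed') return;
          if (job.status === 'failed' || job.status === 'cancelled') {
            throw new Error(`Diarization ${job.status}`);
          }
          await sleep(2000);
        }
      }, [api, meetingId]);

      const startProcessing = useCallback(async () => {
        // Summary, entities, and diarization run concurrently; a failure is
        // recorded on its own step and does not block the others.
        await Promise.allSettled([
          runStep('summary', () => api.generateSummary(meetingId)),
          runStep('entities', () => api.extractEntities(meetingId)),
          runStep('diarization', runDiarization),
        ]);
        onComplete?.();
      }, [api, meetingId, runStep, runDiarization, onComplete]);

      // Auto-start once when the meeting becomes completed
      useEffect(() => {
        if (autoStart && meetingState === 'completed' && !startedRef.current) {
          startedRef.current = true;
          void startProcessing();
        }
      }, [autoStart, meetingState, startProcessing]);

      const isProcessing = steps.some((s) => s.status === 'running');
      const isComplete = steps.every((s) => s.status !== 'pending' && s.status !== 'running');

      return { steps, startProcessing, isProcessing, isComplete };
    }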
    
  2. Create ProcessingStatus component

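    // client/src/components/meeting/processing-status.tsx
    // UI import paths assume the project's shadcn/ui layout; StatusIcon is a small
    // per-status icon component that is not shown here.
    import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
    import { Progress } from '@/components/ui/progress';
    import type { ProcessingStep } from '@/hooks/use-post-processing';

    const stepLabels: Record<ProcessingStep['name'], string> = {
      summary: 'Summary',
      entities: 'Entities',
      diarization: 'Speaker refinement',
    };

    export function ProcessingStatus({ steps }: { steps: ProcessingStep[] }) {
      return (
        <Card data-testid="processing-status">
          <CardHeader>
            <CardTitle>Processing</CardTitle>
          </CardHeader>
          <CardContent>
            {steps.map((step) => (
              // data-testid and data-status are what the E2E test queries
              <div
                key={step.name}
                data-testid={`${step.name}-status`}
                data-status={step.status}
                className="flex items-center gap-2"
              >
                <StatusIcon status={step.status} />
                <span>{stepLabels[step.name]}</span>
                {/* Check for undefined so a progress of 0 still renders */}
                {step.status === 'running' && step.progress !== undefined && (
                  <Progress value={step.progress} />
                )}
              </div>
            ))}
          </CardContent>
        </Card>
      );
    }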
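The "Unit test: usePostProcessing hook state transitions" deliverable is easier to meet if the step transitions live in a pure reducer that the hook calls. The sketch below assumes that split; `stepsReducer`, `StepAction`, and `isComplete` are hypothetical names. Allowing `start` from `failed` reflects the user-initiated retry decision.

```typescript
type StepName = 'summary' | 'entities' | 'diarization';
type StepStatus = 'pending' | 'running' | 'completed' | 'failed' | 'skipped';

interface ProcessingStep {
  name: StepName;
  status: StepStatus;
  progress?: number;
  error?: string;
}

// Hypothetical actions the hook would dispatch.
type StepAction =
  | { type: 'start'; name: StepName }
  | { type: 'progress'; name: StepName; progress: number }
  | { type: 'complete'; name: StepName }
  | { type: 'fail'; name: StepName; error: string };

const TERMINAL: ReadonlySet<StepStatus> = new Set<StepStatus>(['completed', 'failed', 'skipped']);

// Applies one action to the matching step; invalid transitions are ignored.
function stepsReducer(steps: ProcessingStep[], action: StepAction): ProcessingStep[] {
  return steps.map((step): ProcessingStep => {
    if (step.name !== action.name) return step;
    switch (action.type) {
      case 'start':
        // Pending steps start normally; failed steps may be retried by the user.
        return step.status === 'pending' || step.status === 'failed'
          ? { name: step.name, status: 'running' }
          : step;
      case 'progress':
        return step.status === 'running' ? { ...step, progress: action.progress } : step;
      case 'complete':
        return step.status === 'running' ? { name: step.name, status: 'completed' } : step;
      case 'fail':
        return step.status === 'running'
          ? { name: step.name, status: 'failed', error: action.error }
          : step;
    }
  });
}

const isComplete = (steps: ProcessingStep[]) => steps.every((s) => TERMINAL.has(s.status));
```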
    

Phase 4: Webhook Events (Low Risk)

Goal: Add missing webhook events for external integrations

  1. Add new event types

    # src/noteflow/domain/webhooks/events.py
    from enum import Enum

    class WebhookEventType(str, Enum):
        RECORDING_STARTED = "recording.started"
        RECORDING_STOPPED = "recording.stopped"
        MEETING_COMPLETED = "meeting.completed"
        SUMMARY_GENERATED = "summary.generated"
        ENTITIES_EXTRACTED = "entities.extracted"  # NEW
        DIARIZATION_COMPLETED = "diarization.completed"  # NEW
    
  2. Fire events on completion

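    # In entities mixin (trigger_entities_extracted is a new WebhookService method).
    # Wrap it like StopMeeting does so a webhook failure cannot fail the RPC.
    await self._fire_webhooks_safely(
        lambda: self._webhook_service.trigger_entities_extracted(meeting_id, entities)
    )

    # In diarization mixin (trigger_diarization_completed is a new WebhookService method)
    await self._fire_webhooks_safely(
        lambda: self._webhook_service.trigger_diarization_completed(meeting_id, job_id)
    )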
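Once both new events exist, an external integration that wants a single "processing finished" signal could track the three completion events per meeting. This is a consumer-side sketch under several assumptions. The tracker class is hypothetical, and the caller is assumed to extract the meeting ID from each webhook payload. Meetings with NER disabled would never receive entities.extracted, so a real consumer would have to account for that case.

```typescript
// Event names match the WebhookEventType values proposed above.
const COMPLETION_EVENTS = ['summary.generated', 'entities.extracted', 'diarization.completed'] as const;
type CompletionEvent = (typeof COMPLETION_EVENTS)[number];

// Hypothetical consumer-side tracker.
class ProcessingCompletionTracker {
  private readonly seen = new Map<string, Set<CompletionEvent>>();

  // Records an event and returns true only the first time every completion
  // event has been seen for the meeting. Duplicate deliveries and unrelated
  // events return false.
  record(meetingId: string, event: string): boolean {
    if (!(COMPLETION_EVENTS as readonly string[]).includes(event)) return false;
    const events = this.seen.get(meetingId) ?? new Set<CompletionEvent>();
    const before = events.size;
    events.add(event as CompletionEvent);
    this.seen.set(meetingId, events);
    return before < COMPLETION_EVENTS.length && events.size === COMPLETION_EVENTS.length;
  }
}
```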
    

Phase 5: Error Surfacing (Low Risk)

Goal: Make ASR and processing errors visible

  1. Add ASR error events

    # src/noteflow/grpc/_mixins/streaming/_asr.py
    # Keyword context (meeting_id=...) assumes a structlog-style logger;
    # with stdlib logging, pass it via extra={"meeting_id": meeting_id}.
    if audio.size == 0:
        logger.warning("Empty audio segment received", meeting_id=meeting_id)
        # Optionally emit a metric
        return

    if not host.asr_engine:
        logger.error("ASR engine not available", meeting_id=meeting_id)
        # Could also emit an error event to the client
        return
    

Deliverables

Backend

  • Add ProcessingStatus to Meeting entity
  • Add ProcessingStatus to proto schema
  • Add ENTITIES_EXTRACTED webhook event type
  • Add DIARIZATION_COMPLETED webhook event type
  • Fire webhook events on processing completion
  • Add ASR error logging with meeting context
  • Update GetMeeting to include processing status

Client

  • Create usePostProcessing orchestration hook
  • Create ProcessingStatus UI component
  • Enable autoExtract: true in MeetingDetail
  • Add summary progress event listener
  • Add auto-diarization trigger
  • Wire usePostProcessing into MeetingDetail

Tests

  • Unit test: usePostProcessing hook state transitions
  • Unit test: ProcessingStatus component rendering
  • Integration test: Full post-processing flow
  • E2E test: Recording → Processing → Complete flow

Test Strategy

Fixtures

  • Mock meeting with state: 'completed'
  • Mock API responses for each processing step
  • Mock Tauri events for progress

Test Cases

| Case | Input | Expected |
|---|---|---|
| Auto-trigger on mount | Meeting with state=completed, no summary | Summary generation starts |
| Parallel processing | Meeting ready for processing | Summary + entities run in parallel |
| Individual failure | Summary fails, others succeed | Shows summary as failed, others complete |
| Progress tracking | Summary in progress | Shows elapsed time (summary_progress reports seconds, not a percentage) |
| Skip if done | Meeting with existing summary | Skips summary, runs others |
| Processing complete | All steps done | Shows complete badge, fires callback |

E2E Test Plan

// client/e2e/post-processing.spec.ts
import { test, expect } from '@playwright/test';

test('automatically processes meeting after recording stops', async ({ page }) => {
  // 1. Start recording
  await page.click('[data-testid="start-recording"]');

  // 2. Record for a few seconds
  await page.waitForTimeout(3000);

  // 3. Stop recording
  await page.click('[data-testid="stop-recording"]');

  // 4. Should navigate to detail page
  await expect(page).toHaveURL(/\/meetings\/.+/);

  // 5. Should show processing status
  await expect(page.locator('[data-testid="processing-status"]')).toBeVisible();

  // 6. Wait for processing to complete
  await expect(page.locator('[data-testid="summary-status"]')).toHaveAttribute(
    'data-status',
    'completed',
    { timeout: 60000 }
  );

  // 7. Summary should be visible
  await expect(page.locator('[data-testid="summary-content"]')).toBeVisible();
});

Quality Gates

  • All processing triggers automatically on meeting completion
  • Progress visible for all processing steps
  • Failures don't block other processing steps
  • Processing status persisted and queryable
  • Webhook events fire for all processing completions
  • E2E test validates full flow
  • No regression in manual trigger paths

Migration Strategy

Rollout Order

  1. Phase 1 (Day 1): Enable auto-extract and add the summary progress listener - immediate improvement, minimal new code
  2. Phase 2 (Day 2-3): Add processing status tracking - backend changes
  3. Phase 3 (Day 4-5): Create orchestration hook - main feature
  4. Phase 4 (Day 6): Add webhook events - external integrations
  5. Phase 5 (Day 7): Add error surfacing - observability

Rollback Plan

Each phase is independently reversible:

  • Phase 1: Set autoExtract: false
  • Phase 2: Processing status fields are optional; clients can ignore them
  • Phase 3: Hook is additive, can disable
  • Phase 4: Webhook events are additive
  • Phase 5: Error logging is additive

Dependencies

External Dependencies

  • GAP-003 (Error Handling) - Proper error classification for processing failures
  • GAP-004 (Diarization Lifecycle) - Diarization polling pattern

Feature Flags

  • NOTEFLOW_FEATURE_NER_ENABLED - Controls entity extraction availability
  • Consider: NOTEFLOW_FEATURE_AUTO_PROCESSING - Enable/disable auto-processing
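If the proposed auto-processing flag is adopted, the client has to combine it with the existing NER flag when deciding what to run. This minimal sketch assumes both flags reach the client as booleans; how they get there is not specified here. `autoRunSteps` is hypothetical. Skipping entity extraction when NER is off avoids recording a predictable failure under the continue-on-failure policy.

```typescript
// Hypothetical client view of the two feature flags.
interface FeatureFlags {
  nerEnabled: boolean;            // NOTEFLOW_FEATURE_NER_ENABLED
  autoProcessingEnabled: boolean; // proposed NOTEFLOW_FEATURE_AUTO_PROCESSING
}

type StepName = 'summary' | 'entities' | 'diarization';

const ALL_STEPS: readonly StepName[] = ['summary', 'entities', 'diarization'];

// Returns the steps to auto-run after a meeting completes.
function autoRunSteps(flags: FeatureFlags): StepName[] {
  // Kill switch: with auto-processing off, every step stays a manual action.
  if (!flags.autoProcessingEnabled) return [];
  // Entity extraction is gated server-side, so leave it out rather than call it.
  return ALL_STEPS.filter((step) => step !== 'entities' || flags.nerEnabled);
}
```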

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Processing overload on server | Medium | High | Rate limit concurrent processing per user |
| Long summarization blocks UI | Medium | Medium | Progress events + async processing |
| Failed processing not visible | High | Medium | ProcessingStatus component |
| Diarization timeout | Low | Low | Already handled by GAP-004 |
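The rate-limit mitigation is server-side. The client can also cap how many post-processing calls it has in flight, for example when several unprocessed meetings are opened in quick succession. The following semaphore-style limiter is a hypothetical sketch, not an existing NoteFlow utility, and it does not replace the server-side limit. A finishing task hands its slot directly to the next waiter, so the cap holds even when new calls arrive in between.

```typescript
// Hypothetical client-side concurrency limiter.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active < maxConcurrent) {
      active += 1; // a slot is free: take it immediately
    } else {
      // Wait until a finishing task hands its slot directly to this caller.
      await new Promise<void>((resolve) => waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // transfer the slot; `active` is unchanged
      } else {
        active -= 1;
      }
    }
  }

  return { run };
}
```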

References

  • src/noteflow/grpc/_mixins/summarization.py - Summary generation RPC
  • src/noteflow/grpc/_mixins/entities.py - Entity extraction RPC
  • src/noteflow/grpc/_mixins/diarization/ - Diarization refinement
  • client/src/hooks/use-diarization.ts - Polling pattern to reuse
  • client/src/hooks/use-entity-extraction.ts - Auto-extract pattern
  • GAP-003 - Error handling patterns
  • GAP-004 - Diarization lifecycle patterns