noteflow/docs/sprints/phase-gaps/sprint-gap-003-error-handling.md
2026-01-02 04:22:40 +00:00


# SPRINT-GAP-003: Error Handling Mismatches

| Attribute     | Value      |
| ------------- | ---------- |
| Sprint        | GAP-003    |
| Size          | M (Medium) |
| Owner         | TBD        |
| Phase         | Hardening  |
| Prerequisites | None       |

## Open Issues

- Define error severity levels for user-facing vs. silent errors
- Decide on an error aggregation strategy (toast vs. panel vs. inline)
- Determine retry eligibility by error type

## Validation Status

| Component             | Exists  | Needs Work   |
| --------------------- | ------- | ------------ |
| gRPC error helpers    | Yes     | Complete     |
| Domain error mapping  | Yes     | Complete     |
| Client error emission | Partial | Inconsistent |
| Error recovery UI     | No      | Required     |

## Objective

Establish consistent error handling across the backend-client boundary, ensuring all errors are appropriately surfaced, logged, and actionable while maintaining system stability.

## Key Decisions

| Decision             | Choice                       | Rationale                                      |
| -------------------- | ---------------------------- | ---------------------------------------------- |
| Silenced errors      | Eliminate `.catch(() => {})` | Hidden failures cause debugging nightmares     |
| Error classification | Retry/Fatal/Transient        | Different handling for each type               |
| User notification    | Severity-based               | Critical → toast, Warning → inline, Info → log |
| Webhook failures     | Log + metrics only           | Fire-and-forget by design, but with visibility |

## What Already Exists

### Backend Error Infrastructure (`src/noteflow/grpc/_mixins/errors.py`)

- `abort_not_found()`, `abort_invalid_argument()`, etc.
- `DomainError` → gRPC status mapping
- `domain_error_handler` decorator
- Feature requirement helpers (`require_feature_*`)

### Client Error Patterns

- `emit_error()` in Tauri commands
- `ErrorEvent` type for Tauri events
- Connection error state tracking

## Identified Issues

### 1. Silenced Errors Pattern (Critical)

Multiple locations use `.catch(() => {})` to ignore errors:

Location 1: `client/src/api/tauri-adapter.ts:107`

```typescript
this.invoke(TauriCommands.SEND_AUDIO_CHUNK, args).catch((_err) => {});
```

Location 2: `client/src/api/tauri-adapter.ts:124`

```typescript
this.invoke(TauriCommands.STOP_RECORDING, { meeting_id: this.meetingId }).catch((_err) => {});
```

Location 3: `client/src/api/index.ts:51`

```typescript
await startTauriEventBridge().catch((_error) => {});
```

Location 4: `client/src/lib/preferences.ts:234`

```typescript
validateCachedIntegrations().catch(() => {});
```

Location 5: `client/src/contexts/project-context.tsx:134`

```typescript
.catch(() => {});
```

Location 6: `client/src/api/cached-adapter.ts:134`

```typescript
await startTauriEventBridge().catch(() => {});
```

Problem: errors are silently discarded:

- No logging for debugging
- No user notification for actionable errors
- No metrics for monitoring
- Failures appear as successes to calling code
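The document's Phase 1 plan is to replace these silent catches with a logging handler. A minimal sketch of the shape such a helper could take (the `logError` name matches the migration plan below, but this body is illustrative, not existing code):

```typescript
// Hypothetical logError helper: returns a catch-handler that records the
// failure with context instead of discarding it. The promise chain still
// resolves, so callers keep the fire-and-forget behavior they rely on.
function logError(context: string) {
  return (err: unknown): void => {
    const message = err instanceof Error ? err.message : String(err);
    // A real implementation would route this to the app logger / metrics
    // pipeline rather than the console.
    console.error(`[${context}]`, message);
  };
}

// Usage: the silent catch becomes a logged, still non-throwing one.
Promise.reject(new Error("bridge failed")).catch(logError("startTauriEventBridge"));
```

Because the handler never rethrows, this change is behavior-preserving for callers while making every previously hidden failure visible in logs.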

### 2. Inconsistent Error Emission (Medium)

Location: `client/src-tauri/src/commands/diarization.rs:51-54`

```rust
Err(err) => {
    emit_error(&app, "diarization_error", &err);
    Err(err)
}
```

Problem: some commands emit errors while others don't:

- `refine_speaker_diarization` emits errors
- `rename_speaker` doesn't emit (lines 81-86)
- Inconsistent user experience

### 3. Webhook Failures Invisible (Low - By Design)

Location: `src/noteflow/grpc/_mixins/meeting.py:187-192`

```python
try:
    await self._webhook_service.trigger_recording_stopped(...)
except Exception:
    logger.exception("Failed to trigger recording.stopped webhooks")
```

Problem: fire-and-forget is the right design for webhooks, but:

- No metrics are emitted for the failure rate
- No visibility into webhook health
- Failures are only visible in logs

### 4. Error Type Information Lost (Medium)

Problem: gRPC errors are well-structured on the backend, but the client often reduces them to a plain string:

Backend:

```python
await context.abort(grpc.StatusCode.NOT_FOUND, f"Meeting {meeting_id} not found")
```

Client:

```rust
Err(err) => {
    emit_error(&app, "diarization_error", &err);  // Just the error string; no status code
    ...
}
```

Lost information:

- gRPC status code (`NOT_FOUND`, `INVALID_ARGUMENT`, etc.)
- Whether the error is retryable
- Error category for appropriate UI handling

## Scope

### Task Breakdown

| Task                                    | Effort | Description                                |
| --------------------------------------- | ------ | ------------------------------------------ |
| Replace `.catch(() => {})` with logging | S      | Log all caught errors with context         |
| Add error classification                | M      | Categorize errors as Retry/Fatal/Transient |
| Preserve gRPC status in client          | M      | Include status code in error events        |
| Add error metrics                       | S      | Emit metrics for error rates by type       |
| Consistent error emission               | S      | All Tauri commands emit errors             |
| Webhook failure metrics                 | S      | Track webhook delivery success/failure     |

## Files to Modify

Client:

- `client/src/api/tauri-adapter.ts` - Replace silent catches
- `client/src/api/index.ts` - Log event bridge errors
- `client/src/api/cached-adapter.ts` - Log event bridge errors
- `client/src/lib/preferences.ts` - Surface validation errors
- `client/src/contexts/project-context.tsx` - Handle errors
- `client/src-tauri/src/commands/*.rs` - Consistent error emission
- `client/src/types/errors.ts` - Add error classification

Backend:

- `src/noteflow/grpc/_mixins/meeting.py` - Add webhook metrics
- `src/noteflow/application/services/webhook_service.py` - Add metrics

## Error Classification Schema

```typescript
enum ErrorSeverity {
  Fatal = 'fatal',         // Unrecoverable; show dialog
  Retryable = 'retryable', // Can retry; show with retry button
  Transient = 'transient', // Temporary; auto-retry
  Warning = 'warning',     // Show inline; don't block
  Info = 'info',           // Log only; no UI
}

enum ErrorCategory {
  Network = 'network',
  Auth = 'auth',
  Validation = 'validation',
  NotFound = 'not_found',
  Server = 'server',
  Client = 'client',
}

interface ClassifiedError {
  message: string;
  code: string;
  severity: ErrorSeverity;
  category: ErrorCategory;
  retryable: boolean;
  grpcStatus?: number;
  context?: Record<string, unknown>;
}
```
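The schema implies a mapping from gRPC status codes to severity and category (the "gRPC status → severity mapping" deliverable). A hedged sketch of what that classifier could look like; the `classifyGrpcStatus` helper, its table, and the chosen severities are illustrative assumptions, though the status numbers follow the gRPC specification (`INVALID_ARGUMENT` = 3, `NOT_FOUND` = 5, `UNAVAILABLE` = 14, etc.):

```typescript
type Severity = 'fatal' | 'retryable' | 'transient' | 'warning' | 'info';
type Category = 'network' | 'auth' | 'validation' | 'not_found' | 'server' | 'client';

interface Classification {
  severity: Severity;
  category: Category;
  retryable: boolean;
}

// Hypothetical status → classification table, keyed by numeric gRPC code.
const GRPC_CLASSIFICATION: Record<number, Classification> = {
  3:  { severity: 'warning',   category: 'validation', retryable: false }, // INVALID_ARGUMENT
  4:  { severity: 'retryable', category: 'network',    retryable: true  }, // DEADLINE_EXCEEDED
  5:  { severity: 'warning',   category: 'not_found',  retryable: false }, // NOT_FOUND
  7:  { severity: 'fatal',     category: 'auth',       retryable: false }, // PERMISSION_DENIED
  13: { severity: 'fatal',     category: 'server',     retryable: false }, // INTERNAL
  14: { severity: 'retryable', category: 'network',    retryable: true  }, // UNAVAILABLE
  16: { severity: 'fatal',     category: 'auth',       retryable: false }, // UNAUTHENTICATED
};

function classifyGrpcStatus(status: number): Classification {
  // Unknown codes default to a fatal server error so nothing is silently dropped.
  return GRPC_CLASSIFICATION[status] ?? { severity: 'fatal', category: 'server', retryable: false };
}
```

Keeping the table in one place means the Tauri commands only need to forward the numeric status; severity and retryability are derived on the client side.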

## Migration Strategy

### Phase 1: Add Logging (Low Risk)

- Replace `.catch(() => {})` with `.catch(logError)`
- No user-facing changes
- Immediate debugging improvement

### Phase 2: Add Classification (Low Risk)

- Classify errors at the emission point
- Preserve the gRPC status code
- Backward compatible

### Phase 3: Consistent Emission (Medium Risk)

- All Tauri commands emit errors
- UI handles the new error events
- May surface previously hidden errors

### Phase 4: User-Facing Improvements (Medium Risk)

- Add retry buttons for retryable errors
- Add an error aggregation panel
- May change the user experience

## Deliverables

### Backend

- Webhook delivery metrics (success/failure/retry counts)
- Structured error events for observability

### Client

- Zero `.catch(() => {})` patterns
- Error classification utility
- Consistent error emission from all Tauri commands
- Error logging with context
- Metrics emission for error rates

### Shared

- Error classification schema documentation
- gRPC status → severity mapping

### Tests

- Unit test: error classification logic
- Integration test: end-to-end error propagation
- E2E test: error UI rendering

## Test Strategy

### Fixtures

- Mock server that returns various error codes
- Network failure simulation
- Invalid request payloads

### Test Cases

| Case             | Input                    | Expected                                         |
| ---------------- | ------------------------ | ------------------------------------------------ |
| gRPC NOT_FOUND   | Get non-existent meeting | Error with severity=Warning, category=NotFound   |
| gRPC UNAVAILABLE | Server down              | Error with severity=Retryable, category=Network  |
| gRPC INTERNAL    | Server bug               | Error with severity=Fatal, category=Server       |
| Network timeout  | Slow response            | Error with severity=Retryable, category=Network  |
| Validation error | Invalid UUID             | Error with severity=Warning, category=Validation |
| Webhook failure  | HTTP 500 from webhook    | Logged, metric emitted, no user error            |
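The gRPC rows of the table above translate directly into unit tests for the classification logic. A sketch of the assertion shape, with the classifier stubbed inline for illustration (a real test would import the actual utility):

```typescript
// Inline stub of the classifier under test, keyed by numeric gRPC status
// code (NOT_FOUND = 5, INTERNAL = 13, UNAVAILABLE = 14). Illustrative only.
const table: Record<number, { severity: string; category: string }> = {
  5:  { severity: 'warning',   category: 'not_found' },
  13: { severity: 'fatal',     category: 'server' },
  14: { severity: 'retryable', category: 'network' },
};
const classify = (status: number) => table[status];

// Assertions mirror the test-case table.
const notFound = classify(5);
if (notFound?.severity !== 'warning' || notFound.category !== 'not_found')
  throw new Error('NOT_FOUND case failed');

const unavailable = classify(14);
if (unavailable?.severity !== 'retryable' || unavailable.category !== 'network')
  throw new Error('UNAVAILABLE case failed');

const internal = classify(13);
if (internal?.severity !== 'fatal' || internal.category !== 'server')
  throw new Error('INTERNAL case failed');
```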

## Quality Gates

- Zero `.catch(() => {})` in production code
- All errors have a classification
- All Tauri commands emit errors consistently
- Error metrics visible in the observability dashboard
- gRPC status preserved in client errors
- Tests cover all error categories
- Documentation for error classification