# SPRINT-GAP-003: Error Handling Mismatches
| Attribute | Value |
|---|---|
| Sprint | GAP-003 |
| Size | M (Medium) |
| Owner | TBD |
| Phase | Hardening |
| Prerequisites | None |
## Open Issues
- Define error severity levels for user-facing vs silent errors
- Decide on error aggregation strategy (toast vs panel vs inline)
- Determine retry eligibility by error type
## Validation Status
| Component | Exists | Needs Work |
|---|---|---|
| gRPC error helpers | Yes | Complete |
| Domain error mapping | Yes | Complete |
| Client error emission | Partial | Inconsistent |
| Error recovery UI | No | Required |
## Objective
Establish consistent error handling across the backend-client boundary, ensuring all errors are appropriately surfaced, logged, and actionable while maintaining system stability.
## Key Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Silenced errors | Eliminate `.catch(() => {})` | Hidden failures cause debugging nightmares |
| Error classification | Retry/Fatal/Transient | Different handling for each type |
| User notification | Severity-based | Critical → toast, Warning → inline, Info → log |
| Webhook failures | Log + metrics only | Fire-and-forget by design, but with visibility |
## What Already Exists

### Backend Error Infrastructure (`src/noteflow/grpc/_mixins/errors.py`)

- `abort_not_found()`, `abort_invalid_argument()`, etc.
- `DomainError` → gRPC status mapping
- `domain_error_handler` decorator
- Feature requirement helpers (`require_feature_*`)

### Client Error Patterns

- `emit_error()` in Tauri commands
- `ErrorEvent` type for Tauri events
- Connection error state tracking
## Identified Issues

### 1. Silenced Errors Pattern (Critical)

Multiple locations use `.catch(() => {})` to ignore errors:
Location 1: `client/src/api/tauri-adapter.ts:107`

```typescript
this.invoke(TauriCommands.SEND_AUDIO_CHUNK, args).catch((_err) => {});
```

Location 2: `client/src/api/tauri-adapter.ts:124`

```typescript
this.invoke(TauriCommands.STOP_RECORDING, { meeting_id: this.meetingId }).catch((_err) => {});
```

Location 3: `client/src/api/index.ts:51`

```typescript
await startTauriEventBridge().catch((_error) => {});
```

Location 4: `client/src/lib/preferences.ts:234`

```typescript
validateCachedIntegrations().catch(() => {});
```

Location 5: `client/src/contexts/project-context.tsx:134`

```typescript
.catch(() => {});
```

Location 6: `client/src/api/cached-adapter.ts:134`

```typescript
await startTauriEventBridge().catch(() => {});
```
Problem: Errors are silently discarded:
- No logging for debugging
- No user notification for actionable errors
- No metrics for monitoring
- Failures appear as success to calling code
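The silent catches above could be replaced with a small shared helper. A minimal sketch, assuming a console-backed logger (the helper name `logError` and the context labels are illustrative, not an existing API):

```typescript
// Hypothetical helper: log a caught error with context instead of discarding it.
// The logging backend (console.error here) is an assumption; a real implementation
// might forward to a structured logger or metrics sink.
function logError(context: string): (err: unknown) => void {
  return (err: unknown) => {
    const message = err instanceof Error ? err.message : String(err);
    console.error(`[${context}] ${message}`);
  };
}

// Usage sketch: replace `.catch(() => {})` with a contextual logger, e.g.
// this.invoke(TauriCommands.SEND_AUDIO_CHUNK, args).catch(logError("send_audio_chunk"));
```

This preserves fire-and-forget semantics (the promise rejection is still handled) while making every failure visible for debugging.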
### 2. Inconsistent Error Emission (Medium)

Location: `client/src-tauri/src/commands/diarization.rs:51-54`

```rust
Err(err) => {
    emit_error(&app, "diarization_error", &err);
    Err(err)
}
```

Problem: Some commands emit errors, others don't:

- `refine_speaker_diarization` emits errors
- `rename_speaker` doesn't emit (lines 81-86)
- Inconsistent user experience
### 3. Webhook Failures Invisible (Low - By Design)

Location: `src/noteflow/grpc/_mixins/meeting.py:187-192`

```python
try:
    await self._webhook_service.trigger_recording_stopped(...)
except Exception:
    logger.exception("Failed to trigger recording.stopped webhooks")
```
Problem: Fire-and-forget is correct for webhooks, but:
- No metrics emitted for failure rate
- No visibility into webhook health
- Failures only visible in logs
### 4. Error Type Information Lost (Medium)

Problem: gRPC errors are well-structured on the backend, but the client often reduces them to a bare string:

Backend:

```python
await context.abort(grpc.StatusCode.NOT_FOUND, f"Meeting {meeting_id} not found")
```

Client:

```rust
Err(err) => {
    emit_error(&app, "diarization_error", &err); // Just the error message, no status
    ...
}
```
Lost information:
- gRPC status code (NOT_FOUND, INVALID_ARGUMENT, etc.)
- Whether error is retryable
- Error category for appropriate UI handling
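Retry eligibility, in particular, can be derived mechanically once the numeric gRPC status survives the boundary. A sketch, assuming the standard gRPC status numbering (the specific set of retryable codes is an assumption based on common gRPC retry guidance, not a project decision):

```typescript
// Assumed-retryable gRPC status codes (standard gRPC numbering).
const RETRYABLE_GRPC_STATUSES: ReadonlySet<number> = new Set([
  4,  // DEADLINE_EXCEEDED
  8,  // RESOURCE_EXHAUSTED
  10, // ABORTED
  14, // UNAVAILABLE
]);

// Hypothetical helper: decide retry eligibility from a preserved status code.
function isRetryableStatus(grpcStatus: number): boolean {
  return RETRYABLE_GRPC_STATUSES.has(grpcStatus);
}
```

Without the status code, this decision must fall back to fragile string matching on error messages.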
## Scope

### Task Breakdown

| Task | Effort | Description |
|---|---|---|
| Replace `.catch(() => {})` with logging | S | Log all caught errors with context |
| Add error classification | M | Categorize errors as Retry/Fatal/Transient |
| Preserve gRPC status in client | M | Include status code in error events |
| Add error metrics | S | Emit metrics for error rates by type |
| Consistent error emission | S | All Tauri commands emit errors |
| Webhook failure metrics | S | Track webhook delivery success/failure |
### Files to Modify

Client:

- `client/src/api/tauri-adapter.ts` - Replace silent catches
- `client/src/api/index.ts` - Log event bridge errors
- `client/src/api/cached-adapter.ts` - Log event bridge errors
- `client/src/lib/preferences.ts` - Surface validation errors
- `client/src/contexts/project-context.tsx` - Handle errors
- `client/src-tauri/src/commands/*.rs` - Consistent error emission
- `client/src/types/errors.ts` - Add error classification

Backend:

- `src/noteflow/grpc/_mixins/meeting.py` - Add webhook metrics
- `src/noteflow/application/services/webhook_service.py` - Add metrics
## Error Classification Schema

```typescript
enum ErrorSeverity {
  Fatal = 'fatal',         // Unrecoverable, show dialog
  Retryable = 'retryable', // Can retry, show with retry button
  Transient = 'transient', // Temporary, auto-retry
  Warning = 'warning',     // Show inline, don't block
  Info = 'info',           // Log only, no UI
}

enum ErrorCategory {
  Network = 'network',
  Auth = 'auth',
  Validation = 'validation',
  NotFound = 'not_found',
  Server = 'server',
  Client = 'client',
}

interface ClassifiedError {
  message: string;
  code: string;
  severity: ErrorSeverity;
  category: ErrorCategory;
  retryable: boolean;
  grpcStatus?: number;
  context?: Record<string, unknown>;
}
```
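A classifier from gRPC status code to this schema might look like the following sketch. The mappings mirror the expectations in the test-strategy table below; the function name and the default-to-fatal fallback are assumptions (self-contained string-literal types stand in for the enums above):

```typescript
// Minimal stand-ins for the schema types so the sketch is self-contained.
type Severity = 'fatal' | 'retryable' | 'transient' | 'warning' | 'info';
type Category = 'network' | 'auth' | 'validation' | 'not_found' | 'server' | 'client';

interface Classified {
  severity: Severity;
  category: Category;
  retryable: boolean;
  grpcStatus: number;
}

// Hypothetical classifier: map a numeric gRPC status to severity/category.
// Unknown statuses default to fatal/server as a conservative fallback.
function classifyGrpcStatus(status: number): Classified {
  switch (status) {
    case 3:  // INVALID_ARGUMENT
      return { severity: 'warning', category: 'validation', retryable: false, grpcStatus: status };
    case 5:  // NOT_FOUND
      return { severity: 'warning', category: 'not_found', retryable: false, grpcStatus: status };
    case 14: // UNAVAILABLE
      return { severity: 'retryable', category: 'network', retryable: true, grpcStatus: status };
    case 16: // UNAUTHENTICATED
      return { severity: 'fatal', category: 'auth', retryable: false, grpcStatus: status };
    case 13: // INTERNAL
    default:
      return { severity: 'fatal', category: 'server', retryable: false, grpcStatus: status };
  }
}
```

Centralizing this mapping in one utility keeps the UI decision (toast vs inline vs log) out of individual call sites.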
## Migration Strategy

### Phase 1: Add Logging (Low Risk)

- Replace `.catch(() => {})` with `.catch(logError)`
- No user-facing changes
- Immediate debugging improvement
### Phase 2: Add Classification (Low Risk)
- Classify errors at emission point
- Add gRPC status preservation
- Backward compatible
### Phase 3: Consistent Emission (Medium Risk)
- All Tauri commands emit errors
- UI handles new error events
- May surface previously hidden errors
### Phase 4: User-Facing Improvements (Medium Risk)
- Add retry buttons for retryable errors
- Add error aggregation panel
- May change user experience
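The auto-retry behavior planned for transient errors could be wrapped in one helper. A sketch under stated assumptions: the attempt count, base delay, and exponential backoff policy are illustrative defaults, not project decisions:

```typescript
// Hypothetical retry wrapper for transient errors with exponential backoff.
// maxAttempts and baseDelayMs are assumed defaults for illustration.
async function withRetry<T>(
  op: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      // Fatal errors and exhausted budgets propagate immediately.
      if (!isTransient(err) || attempt === maxAttempts - 1) throw err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastErr;
}
```

Retryable (user-initiated) errors would instead surface a retry button that re-invokes `op` once per click.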
## Deliverables

### Backend
- Add webhook delivery metrics (success/failure/retry counts)
- Emit structured error events for observability
### Client

- Zero `.catch(() => {})` patterns
- Error classification utility
- Consistent error emission from all Tauri commands
- Error logging with context
- Metrics emission for error rates
### Shared
- Error classification schema documentation
- gRPC status → severity mapping
### Tests
- Unit test: error classification logic
- Integration test: error propagation end-to-end
- E2E test: error UI rendering
## Test Strategy

### Fixtures
- Mock server that returns various error codes
- Network failure simulation
- Invalid request payloads
### Test Cases
| Case | Input | Expected |
|---|---|---|
| gRPC NOT_FOUND | Get non-existent meeting | Error with severity=Warning, category=NotFound |
| gRPC UNAVAILABLE | Server down | Error with severity=Retryable, category=Network |
| gRPC INTERNAL | Server bug | Error with severity=Fatal, category=Server |
| Network timeout | Slow response | Error with severity=Retryable, category=Network |
| Validation error | Invalid UUID | Error with severity=Warning, category=Validation |
| Webhook failure | HTTP 500 from webhook | Logged, metric emitted, no user error |
## Quality Gates

- Zero `.catch(() => {})` in production code
- All errors have classification
- All Tauri commands emit errors consistently
- Error metrics visible in observability dashboard
- gRPC status preserved in client errors
- Tests cover all error categories
- Documentation for error classification