noteflow/docs/sprints/phase-gaps/sprint-gap-002-state-sync-gaps.md
2026-01-02 04:22:40 +00:00

# SPRINT-GAP-002: State Synchronization Gaps
| Attribute | Value |
|-----------|-------|
| **Sprint** | GAP-002 |
| **Size** | M (Medium) |
| **Owner** | TBD |
| **Phase** | Hardening |
| **Prerequisites** | None |
## Open Issues
- [ ] Determine cache invalidation strategy (push vs poll vs hybrid)
- [ ] Define acceptable staleness window for meeting data
- [ ] Decide on WebSocket/SSE vs polling for real-time updates
## Validation Status
| Component | Exists | Needs Work |
|-----------|--------|------------|
| Meeting cache | Yes | Needs invalidation |
| Sync run cache (backend) | Yes | Client unaware of TTL |
| Active project resolution | Yes | Client unaware of implicit selection |
| Integration ID validation | Yes | Partial implementation |
## Objective
Ensure consistent state between backend and client by implementing proper cache invalidation, explicit state communication, and recovery mechanisms for stale data.
## Key Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Cache invalidation | Event-driven + polling fallback | Real-time when connected, polling for recovery |
| Staleness window | 30 seconds | Balance freshness vs server load |
| Active project sync | Explicit API response | Server should return resolved project_id |
| Sync run status | Polling with backoff | Already implemented, needs resilience |
## What Already Exists
### Backend State Management
- `_sync_runs` in-memory cache with 60-second TTL (`sync.py:146`)
- Active project resolution at meeting creation time
- Diarization job status tracking (DB + memory fallback)
### Client State Management
- `meetingCache` in `lib/cache/meeting-cache.ts`
- Connection state machine in `connection-state.ts`
- Reconnection logic in `reconnection.ts`
- Integration ID caching in preferences
## Identified Issues
### 1. Meeting Cache Never Invalidates (High)
**Location**: `client/src/api/tauri-adapter.ts:257-286`
```typescript
async createMeeting(request: CreateMeetingRequest): Promise<Meeting> {
  const meeting = await invoke<Meeting>(TauriCommands.CREATE_MEETING, {...});
  meetingCache.cacheMeeting(meeting); // Cached, never invalidated afterwards
  return meeting;
}
```
**Problem**: Meetings are cached on create and fetch but never invalidated:
- Server-side state changes (stop, complete) are not reflected
- Modifications made by another client are invisible
- Stale data is shown after a server restart

**Impact**: Users see outdated meeting states, segments, and summaries.
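
One possible shape for the missing invalidation path is sketched below, assuming the 30-second staleness window from Key Decisions. `StalenessAwareMeetingCache`, `getFresh`, `invalidate`, and `invalidateAll` are hypothetical names, and `Meeting` is a simplified stand-in; the real `meetingCache` API in `lib/cache/meeting-cache.ts` may differ.

```typescript
// Simplified stand-in for the real Meeting type (assumption).
interface Meeting {
  id: string;
  state: string;
}

interface CacheEntry {
  meeting: Meeting;
  cachedAt: number; // epoch milliseconds
}

class StalenessAwareMeetingCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  // The clock is injectable so tests can control time.
  constructor(maxAgeMs: number = 30_000, now: () => number = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  cacheMeeting(meeting: Meeting): void {
    this.entries.set(meeting.id, { meeting, cachedAt: this.now() });
  }

  // Returns the meeting only while it is inside the staleness window.
  // A stale entry is dropped so the caller falls back to a server fetch.
  getFresh(id: string): Meeting | undefined {
    const entry = this.entries.get(id);
    if (entry === undefined) return undefined;
    if (this.now() - entry.cachedAt > this.maxAgeMs) {
      this.entries.delete(id);
      return undefined;
    }
    return entry.meeting;
  }

  // Explicit invalidation hooks, e.g. for reconnects or server events.
  invalidate(id: string): void {
    this.entries.delete(id);
  }

  invalidateAll(): void {
    this.entries.clear();
  }
}
```

Treating a stale entry as a cache miss keeps the read path simple: callers that already handle "not cached" need no extra branch for "cached but too old".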
### 2. Sync Run Cache TTL Invisible to Client (Medium)
**Location**: `src/noteflow/grpc/_mixins/sync.py:143-146`
```python
finally:
    # Clean up cache after a delay (keep for status queries)
    await asyncio.sleep(60)
    cache.pop(sync_run_id, None)
```
**Problem**: The backend removes a sync run from its cache 60 seconds after the run finishes, but the client:
- Continues to poll `GetSyncStatus` expecting data
- Receives NOT_FOUND once the TTL expires
- Cannot distinguish "completed and expired" from "never existed"
### 3. Active Project Silently Resolved (Medium)
**Location**: `src/noteflow/grpc/_mixins/meeting.py:100-101`
```python
if project_id is None:
    project_id = await _resolve_active_project_id(self, repo)
```
**Problem**: When the client omits `project_id`:
- The server resolves it from the workspace context
- The client never learns which project was used
- The UI may show the meeting in the wrong project context
### 4. Integration ID Validation Fire-and-Forget (Low)
**Location**: `client/src/lib/preferences.ts:234`
```typescript
validateCachedIntegrations().catch(() => {});
```
**Problem**: Integration validation errors are silently ignored:
- Stale integration IDs remain in cache
- Operations fail with confusing errors later
- No user notification of invalid cached data
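
A minimal sketch of surfacing these failures instead of swallowing them is shown below. All names here (`onIntegrationValidationFailure`, `reportValidationFailure`, `IntegrationValidationFailure`) are hypothetical and do not exist in the codebase; the signature of `validateCachedIntegrations` is assumed to return a promise, as the `.catch` call above suggests.

```typescript
// Hypothetical event plumbing for validation failures (assumption).
interface IntegrationValidationFailure {
  message: string;
  occurredAt: number; // epoch milliseconds
}

type FailureListener = (failure: IntegrationValidationFailure) => void;

const failureListeners = new Set<FailureListener>();

// Registers a listener (e.g. a toast or banner) and returns an unsubscribe function.
function onIntegrationValidationFailure(listener: FailureListener): () => void {
  failureListeners.add(listener);
  return () => {
    failureListeners.delete(listener);
  };
}

// Normalizes whatever was thrown and notifies every registered listener.
function reportValidationFailure(error: unknown): void {
  const failure: IntegrationValidationFailure = {
    message: error instanceof Error ? error.message : String(error),
    occurredAt: Date.now(),
  };
  failureListeners.forEach((listener) => listener(failure));
}

// Intended call site, replacing the empty handler:
//   validateCachedIntegrations().catch(reportValidationFailure);
```

The call stays fire-and-forget, so startup is not blocked, but a failure now reaches whichever UI component subscribed.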
### 5. Reconnection Doesn't Sync State (Medium)
**Location**: `client/src/api/reconnection.ts:49-53`
```typescript
try {
  await getAPI().connect();
  resetReconnectAttempts();
  setConnectionMode('connected');
  setConnectionError(null);
} catch (error) { ... }
```
**Problem**: After reconnection:
- Active streams are not recovered
- Meeting states may be stale
- In-flight operations are not synchronized
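
The sketch below shows one way the reconnect path could refresh state once the connection is back. It is a sketch under assumptions, not the existing `reconnection.ts` logic: `reconnectAndSync`, `ReconnectDeps`, and the three callbacks are hypothetical, and stream recovery is out of scope here.

```typescript
// Dependencies are injected so the flow can be exercised without a real backend.
// All three callbacks are hypothetical names (assumption).
interface ReconnectDeps {
  connect: () => Promise<void>;
  invalidateMeetingCache: () => void;
  refreshActiveMeeting: () => Promise<void>;
}

type ReconnectOutcome = "synced" | "connected_stale" | "failed";

async function reconnectAndSync(deps: ReconnectDeps): Promise<ReconnectOutcome> {
  try {
    await deps.connect();
  } catch {
    // Still offline: keep cached data so the UI has something to show.
    return "failed";
  }

  // Connected: cached meetings may predate the outage, so drop them first.
  deps.invalidateMeetingCache();

  try {
    await deps.refreshActiveMeeting();
    return "synced";
  } catch {
    // Connection is up but the refresh failed; the caller can retry or warn.
    return "connected_stale";
  }
}
```

Invalidating only after a successful connect avoids emptying the cache while the client is still offline.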
## Scope
### Task Breakdown
| Task | Effort | Description |
|------|--------|-------------|
| Add meeting cache invalidation | M | Invalidate on reconnect, periodic refresh |
| Return resolved project_id in responses | S | Backend returns actual project_id used |
| Add sync run expiry to response | S | Include `expires_at` field |
| Add cache version header | S | Server sends version, client invalidates on mismatch |
| Implement state sync on reconnect | M | Refresh critical state after connection restored |
| Surface validation errors | S | Emit events for integration validation failures |
### Files to Modify
**Backend:**
- `src/noteflow/grpc/_mixins/meeting.py` - Return resolved project_id
- `src/noteflow/grpc/_mixins/sync.py` - Add expiry info to response
- `src/noteflow/grpc/proto/noteflow.proto` - Add `resolved_project_id`, `expires_at`, `not_found_reason`, and `state_version` fields

**Client:**
- `client/src/lib/cache/meeting-cache.ts` - Add invalidation
- `client/src/api/reconnection.ts` - Sync state on reconnect
- `client/src/lib/preferences.ts` - Surface validation errors
- `client/src/hooks/use-sync-status.ts` - Handle expiry
## API Schema Changes
### Meeting Response Enhancement
```protobuf
message CreateMeetingResponse {
  Meeting meeting = 1;
  // New: Explicit resolved project context
  optional string resolved_project_id = 2;
}
```
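
On the client, the new field could be consumed roughly as follows. This is a sketch under assumptions: `applyResolvedProject` and `ProjectContext` are hypothetical, and the camelCase `resolvedProjectId` assumes the generated bindings rename `resolved_project_id`, which depends on the code generator in use.

```typescript
// Simplified client-side view of the proposed response (assumption).
interface CreateMeetingResponse {
  meeting: { id: string };
  resolvedProjectId?: string; // absent when talking to an older server
}

// Hypothetical holder for the project the UI is currently showing.
interface ProjectContext {
  activeProjectId: string | null;
}

// Prefers the server-resolved project, then the requested one.
// Returns the same object when nothing changes, so callers can skip re-rendering.
function applyResolvedProject(
  requestedProjectId: string | undefined,
  response: CreateMeetingResponse,
  context: ProjectContext,
): ProjectContext {
  const effective = response.resolvedProjectId ?? requestedProjectId ?? null;
  if (effective === null || effective === context.activeProjectId) {
    return context;
  }
  return { ...context, activeProjectId: effective };
}
```

Because the field is optional, a response from an older server leaves the context untouched, which matches the backward-compatibility goal of this change.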
### Sync Status Response Enhancement
```protobuf
message GetSyncStatusResponse {
  string status = 1;
  // Existing fields...
  // New: When this sync run expires from cache
  optional string expires_at = 10;
  // New: Distinguish "not found" reasons
  optional string not_found_reason = 11; // "expired" | "never_existed"
}
```
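
With `not_found_reason` available, the polling client can stop cleanly instead of raising an error. The sketch below is illustrative only: `decideNextPoll`, `SyncStatusResult`, and the terminal status values `"completed"` and `"failed"` are assumptions, since the actual status strings are not listed here.

```typescript
type NotFoundReason = "expired" | "never_existed";

// Simplified result of one GetSyncStatus call (assumption).
interface SyncStatusResult {
  found: boolean;
  status?: string;
  notFoundReason?: NotFoundReason;
}

type PollDecision =
  | { action: "continue" }
  | { action: "stop"; message: string };

function decideNextPoll(result: SyncStatusResult): PollDecision {
  if (result.found) {
    // Assumed terminal states; adjust to the real status values.
    const terminal = result.status === "completed" || result.status === "failed";
    return terminal
      ? { action: "stop", message: `Sync ${result.status}` }
      : { action: "continue" };
  }
  if (result.notFoundReason === "expired") {
    // The run finished and aged out of the backend cache: informational, not an error.
    return { action: "stop", message: "Sync info expired" };
  }
  // No reason given, or "never_existed": treat as an unknown run.
  return { action: "stop", message: "Sync run not found" };
}
```

Every not-found case ends polling, so the client no longer keeps hitting `GetSyncStatus` after the TTL has passed.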
### Cache Versioning
```protobuf
message ServerInfo {
  // Existing fields...
  // New: Increment on breaking state changes
  int64 state_version = 10;
}
```
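
A client-side check against `state_version` might look like the sketch below. `StateVersionTracker` and `observe` are hypothetical names, and the version is modeled as a `number` for brevity; generated bindings often expose `int64` as a string or `bigint`, so the comparison would need adapting.

```typescript
// Tracks the last state_version seen from the server (hypothetical helper).
class StateVersionTracker {
  private lastSeen: number | null = null;

  // Returns true when the version changed since the previous observation,
  // meaning local caches should be invalidated. The first observation only
  // records a baseline, on the assumption that nothing is cached yet.
  observe(serverVersion: number): boolean {
    const changed = this.lastSeen !== null && this.lastSeen !== serverVersion;
    this.lastSeen = serverVersion;
    return changed;
  }
}
```

Comparing for inequality rather than "greater than" also catches a server that restarts with a reset counter.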
## Migration Strategy
### Phase 1: Add Expiry Information (Low Risk)
- Add `expires_at` to sync run responses
- Client shows "Sync info expired" instead of error
- No breaking changes
### Phase 2: Add Resolved IDs (Low Risk)
- Return resolved `project_id` in meeting responses
- Client updates UI context accordingly
- Backward compatible (optional field)
### Phase 3: Implement Cache Invalidation (Medium Risk)
- Add cache version to server info
- Client invalidates on version mismatch
- Add event-driven invalidation for critical updates
### Phase 4: Reconnection Sync (Medium Risk)
- Refresh active meeting state on reconnect
- Notify user of any state changes
- Handle conflicts gracefully
## Deliverables
### Backend
- [ ] Return resolved `project_id` in `CreateMeeting` response
- [ ] Add `expires_at` to sync status responses
- [ ] Add `state_version` to server info
- [ ] Emit events for state changes (future: WebSocket)
### Client
- [ ] Meeting cache invalidation on reconnect
- [ ] Meeting cache periodic refresh (30s for active meeting)
- [ ] Handle sync run expiry gracefully
- [ ] Update context with resolved project_id
- [ ] Surface integration validation errors
- [ ] State synchronization on reconnect
### Tests
- [ ] Integration test: meeting state sync after disconnect
- [ ] Integration test: sync run expiry handling
- [ ] Unit test: cache invalidation triggers
- [ ] E2E test: multi-client state consistency
## Test Strategy
### Fixtures
- Mock server with controllable state version
- Multi-client simulation
- Network partition simulation
### Test Cases
| Case | Input | Expected |
|------|-------|----------|
| Meeting modified by server | Create, modify via API, refresh | Client shows updated state |
| Sync run expires | Start sync, wait 70s, check status | Graceful "expired" message |
| Reconnection | Disconnect, modify, reconnect | State synchronized |
| Active project | Create meeting without project_id | Response includes resolved project_id |
| Cache version bump | Server restart with new version | Client invalidates caches |
## Quality Gates
- [ ] No stale meeting states shown after reconnection
- [ ] Sync run expiry handled gracefully (no error dialogs)
- [ ] Active project always known to client
- [ ] Integration validation errors surface to user
- [ ] All cache operations have invalidation path
- [ ] Tests cover multi-client scenarios