noteflow/docs/sprints/phase-ongoing/sprint-gap-012-state-sync-and-observability/README.md

SPRINT-GAP-012: State Synchronization & Observability Gaps

| Attribute | Value |
|---|---|
| Sprint | GAP-012 |
| Size | L (Large) |
| Owner | TBD |
| Phase | Hardening |
| Prerequisites | None |

Executive Summary

Investigation revealed multiple gaps in state synchronization, observability, and test coverage that cause confusing user experiences and difficult debugging:

  1. Server Address Timing Bug: After saving a new server address and navigating away, the displayed address resets to the previous value despite successful persistence
  2. OAuth State Consistency: Integrations display "connected" status with blank OAuth fields, confusing users and causing failures when OAuth operations are attempted
  3. Server Logging Gaps: Silent failures throughout the codebase make debugging difficult
  4. E2E Test Coverage: Tests validate structure but not functional round-trips

Current Status (reviewed January 9, 2026)

  • Still open: Server address timing/synchronization bug (effective URL not refreshed after connect).
  • Still open: Integration toggle can mark "connected" without required credentials; OIDC validation is missing in hasRequiredIntegrationFields().
  • Partially addressed: Logging gaps — ASR and webhook paths now log more failures, but standards/audit work remains.
  • Partially addressed: E2E coverage improved (connection + settings UI), but no persistence or lifecycle tests.

Open Issues

  • Define canonical source of truth for server address (local override vs preferences vs Rust state vs gRPC client)
  • Determine OAuth credential loading strategy (eager vs lazy)
  • Establish logging standards for failure modes (doc location + conventions)
  • Define e2e test functional coverage requirements (persistence, lifecycle, error recovery)

Issue 1: Server Address Timing/Synchronization Bug

Symptom

User saves server address 127.0.0.1:50051, connection succeeds, but navigating to another page and back shows 192.168.50.151:50051 (previous value).

Root Cause Analysis

The server address exists in multiple locations with different update timings:

| Location | Update Trigger | Read On |
|---|---|---|
| TypeScript localStorage (noteflow_preferences) | preferences.setServerConnection() | Page mount via preferences.get() |
| Local override (noteflow_server_address_override_value) | handleConnect() | preferences.get() hydration |
| Rust AppState.preferences | save_preferences IPC command | get_effective_server_url IPC command |
| gRPC client internal endpoint | connect() IPC command | Connection health checks |
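
Because the four locations update at different times, a small diagnostic that compares them can make divergence visible while debugging. The sketch below is hypothetical: AddressSnapshot, findAddressMismatches, and the field names are illustrative and do not exist in the codebase. It assumes each location's value has already been read into a plain host:port string.

```typescript
// Hypothetical snapshot of the server address as seen by each location.
// Field names are illustrative; null means the location had no value.
interface AddressSnapshot {
  preferences: string | null;   // localStorage noteflow_preferences
  localOverride: string | null; // noteflow_server_address_override_value
  rustState: string | null;     // Rust AppState.preferences
  grpcEndpoint: string | null;  // gRPC client internal endpoint
}

type AddressLocation = keyof AddressSnapshot;

// Returns the locations whose value differs from the reference location.
// Locations with no value (null) are skipped rather than reported.
function findAddressMismatches(
  snapshot: AddressSnapshot,
  reference: AddressLocation = 'grpcEndpoint',
): AddressLocation[] {
  const expected = snapshot[reference];
  if (expected === null) return [];
  const locations = Object.keys(snapshot) as AddressLocation[];
  return locations.filter(
    (loc) => loc !== reference && snapshot[loc] !== null && snapshot[loc] !== expected,
  );
}
```

Using the gRPC endpoint as the default reference is only a choice made for this sketch; which location is canonical remains one of the open issues listed above.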

Timing Gap in handleConnect()

File: client/src/pages/Settings.tsx (lines 260-289)

const handleConnect = async () => {
  setIsConnecting(true);
  try {
    const normalized = normalizeServerInput(serverHost, serverPort);
    if (normalized.host !== serverHost) setServerHost(normalized.host);
    if (normalized.port !== serverPort) setServerPort(normalized.port);

    // Local override + preferences
    localStorage.setItem('noteflow_server_address_override', 'true');
    localStorage.setItem(
      'noteflow_server_address_override_value',
      JSON.stringify({ host: normalized.host, port: normalized.port, updated_at: Date.now() })
    );
    preferences.setServerConnection(normalized.host, normalized.port);

    // Persist to Rust state (Tauri) and connect via gRPC
    const api = isTauriEnvironment() ? await initializeTauriAPI() : getAPI();
    await api.savePreferences(preferences.get());
    const info = await api.connect(buildServerUrl(normalized.host, normalized.port));

    setIsConnected(true);
    setServerInfo(info);
    // BUG: effectiveServerUrl is NOT re-fetched here
  } catch (error) {
    // ...
  }
};

Critical Bug

After handleConnect() completes:

  • effectiveServerUrl state still holds the old value from mount
  • When user navigates away and the component unmounts, then returns:
    • checkConnection() fetches effectiveServerUrl from Rust state
    • Rust state may prefer server_address_customized (from preferences) over env/default

File: client/src-tauri/src/commands/connection.rs (lines 100-121)

pub fn get_effective_server_url(state: State<'_, Arc<AppState>>) -> EffectiveServerUrl {
    let prefs = state.preferences.read();
    let cfg = config();
    let prefs_url = format!("{}:{}", prefs.server_host, prefs.server_port);

    // If preferences explicitly customized, use them
    if prefs.server_address_customized && !prefs.server_host.is_empty() {
        return EffectiveServerUrl {
            url: prefs_url,
            source: ServerAddressSource::Preferences,
        };
    }

    EffectiveServerUrl {
        url: cfg.server.default_address.clone(),
        source: cfg.server.address_source,
    }
}
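
The resolution rule in the Rust command above can be mirrored as a pure function, which makes the precedence easy to unit test from the client side. This is a sketch, not existing code: resolveEffectiveServerUrl, PrefsSlice, and ServerConfigSlice are hypothetical names, the 'environment' and 'default' source values are assumed (only the Preferences variant appears in the Rust excerpt), and server_port is assumed to be a string.

```typescript
// Hypothetical TypeScript mirror of the Rust resolution rule, for illustration only.
// Only 'preferences' is confirmed by the Rust excerpt; the other values are assumed.
type ServerAddressSource = 'preferences' | 'environment' | 'default';

interface PrefsSlice {
  server_host: string;
  server_port: string; // assumed to be a string; the Rust field type is not shown
  server_address_customized: boolean;
}

interface ServerConfigSlice {
  default_address: string;
  address_source: ServerAddressSource;
}

interface EffectiveServerUrl {
  url: string;
  source: ServerAddressSource;
}

function resolveEffectiveServerUrl(
  prefs: PrefsSlice,
  cfg: ServerConfigSlice,
): EffectiveServerUrl {
  // Customized preferences win, but only when a host is actually set.
  if (prefs.server_address_customized && prefs.server_host !== '') {
    return { url: `${prefs.server_host}:${prefs.server_port}`, source: 'preferences' };
  }
  // Otherwise fall back to the configured address and its recorded source.
  return { url: cfg.default_address, source: cfg.address_source };
}
```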

Proposed Fix

After successful connect, re-fetch effectiveServerUrl:

// In handleConnect(), after setServerInfo(info):
const urlInfo = await api.getEffectiveServerUrl();
setEffectiveServerUrl(urlInfo);

This ensures the displayed URL reflects the actual connected state after all async operations complete.
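
The ordering can be exercised without Tauri by extracting it into a helper that takes the API as a parameter. The following is a minimal sketch under stated assumptions: connectAndRefresh and the ConnectionApi interface are hypothetical, and only the connect() and getEffectiveServerUrl() method names come from the snippets above.

```typescript
interface EffectiveServerUrl {
  url: string;
  source: string;
}

// Hypothetical slice of the API surface used by handleConnect().
// TInfo stands in for the server info type, whose shape is not shown here.
interface ConnectionApi<TInfo> {
  connect(url: string): Promise<TInfo>;
  getEffectiveServerUrl(): Promise<EffectiveServerUrl>;
}

// Connect first, then re-read the effective URL, so the caller never keeps
// a value that was fetched before the connection changed.
async function connectAndRefresh<TInfo>(
  api: ConnectionApi<TInfo>,
  url: string,
): Promise<{ info: TInfo; effectiveUrl: EffectiveServerUrl }> {
  const info = await api.connect(url);
  const effectiveUrl = await api.getEffectiveServerUrl();
  return { info, effectiveUrl };
}
```

In the component, the returned info and effectiveUrl would be passed to setServerInfo() and setEffectiveServerUrl() respectively.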

Validation Test (UI)

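// e2e test to add (assumes it lives in client/e2e/ next to fixtures.ts)
import { test, expect } from './fixtures';

test('server address persists across navigation', async ({ page }) => {
  await page.goto('/settings?tab=status');
  await page.fill('#host', '127.0.0.1');
  await page.fill('#port', '50051');

  // Save and connect before navigating; filling the inputs alone persists nothing.
  // The button selector is an assumption; adjust it to the actual Settings markup.
  await page.getByRole('button', { name: 'Connect', exact: true }).click();

  // Navigate away and back
  await page.goto('/');
  await page.goto('/settings?tab=status');

  // Verify persisted values
  await expect(page.locator('#host')).toHaveValue('127.0.0.1');
  await expect(page.locator('#port')).toHaveValue('50051');
});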

Issue 2: OAuth/Integration State Consistency

Symptom

Integrations display "connected" status without valid credentials, causing:

  1. UI confusion (users see "connected" but credentials are missing)
  2. Potential failures when OAuth operations are attempted

Note: The sync scheduler only processes calendar and pkm integration types, so auth/oidc integrations with blank credentials don't cause sync failures directly.

Root Cause Analysis

Primary vulnerability and contributing factors:

2.0 Direct Status Toggle Without Validation (PRIMARY)

File: client/src/components/settings/integrations-section.tsx (lines 122-131)

const handleIntegrationToggle = (integration: Integration) => {
  const newStatus =
    integration.status === ConnStatus.CONNECTED ? ConnStatus.DISCONNECTED : ConnStatus.CONNECTED;
  preferences.updateIntegration(integration.id, { status: newStatus });
  // NO VALIDATION - allows "connected" without credentials
};

This is the primary entry point for invalid state. Users can toggle integrations to "connected" without any credential validation.

2.1 Empty OAuth Defaults

File: client/src/lib/default-integrations.ts (lines 21, 41-67)

const emptyOAuth = { client_id: '', client_secret: '', redirect_uri: '', scopes: [] as string[] };

// Used in:
createIntegration('Google SSO', 'auth', {
  oauth_config: { ...emptyOAuth, scopes: ['openid', 'email', 'profile'] },
}),

Integrations start with empty credentials by design, but status can become "connected" through various paths.

2.2 Status Updated Without Credential Validation

File: client/src/lib/preferences.ts (updateIntegration function)

The updateIntegration() function merges partial updates without validating that required fields for "connected" status are present:

updateIntegration(id: string, updates: Partial<Integration>): void {
  withPreferences((prefs) => {
    const idx = prefs.integrations.findIndex((i) => i.id === id);
    if (idx !== -1) {
      prefs.integrations[idx] = { ...prefs.integrations[idx], ...updates };
      // No validation that connected status has valid oauth_config
    }
  });
}

2.3 Type System Allows Invalid States

File: client/src/api/types/requests/integrations.ts

The Integration type allows status: 'connected' without requiring any config to be populated:

interface Integration {
  id: string;
  name: string;
  type: IntegrationType;
  status: IntegrationStatus;  // Can be 'connected'
  oauth_config?: OAuthConfig; // Optional - allows empty
  // ...
}
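
One way to close this gap at the type level is a discriminated union in which the "connected" variant requires oauth_config. The sketch below is illustrative only: StrictIntegration and toConnected are hypothetical names, the 'disconnected' string value is assumed, and treating client_id plus client_secret as the required fields is an assumption. The OAuthConfig fields are taken from the emptyOAuth default shown in 2.1.

```typescript
interface OAuthConfig {
  client_id: string;
  client_secret: string;
  redirect_uri: string;
  scopes: string[];
}

interface IntegrationBase {
  id: string;
  name: string;
}

// 'connected' requires an oauth_config; 'disconnected' may omit it.
type ConnectedIntegration = IntegrationBase & {
  status: 'connected';
  oauth_config: OAuthConfig;
};
type DisconnectedIntegration = IntegrationBase & {
  status: 'disconnected';
  oauth_config?: OAuthConfig;
};
type StrictIntegration = ConnectedIntegration | DisconnectedIntegration;

// The type system cannot reject empty strings, so a runtime guard is still
// needed. Returns null when the assumed required fields are blank.
function toConnected(integration: StrictIntegration): ConnectedIntegration | null {
  const cfg = integration.oauth_config;
  if (!cfg || cfg.client_id === '' || cfg.client_secret === '') return null;
  return {
    id: integration.id,
    name: integration.name,
    status: 'connected',
    oauth_config: cfg,
  };
}
```

With this shape, code that constructs a "connected" integration without an oauth_config fails to compile, and the guard covers the empty-string defaults that the types alone cannot catch.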

2.4 Secrets Stored Separately from Integration Objects

OAuth secrets are encrypted and stored separately via useSecureIntegrationSecrets, but the integration object may be loaded without secrets:

// In Settings.tsx loadEncryptedApiKeys():
const integrationsWithSecrets = await loadAllSecrets(preferences.getIntegrations());
setIntegrations(integrationsWithSecrets);
// If loadAllSecrets fails silently, integrations remain without credentials

2.5 Backend Separation of Secrets

The backend stores OAuth secrets separately from Integration records (IntegrationSecretModel). When fetching integrations, secrets may not be joined, leading to "connected" integrations without credentials.

2.6 Sync Trigger Doesn't Validate Credentials

File: client/src/hooks/use-integration-sync.ts

The scheduler only runs for calendar and pkm integrations with integration_id, but status can still be set to connected without required credentials for other integration types, leading to UI confusion and broken flows.

// In triggerSync():
if (integration.status !== 'connected') {
  return { success: false, error: 'Integration not connected' };
}
// No check for oauth_config.client_id presence
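
The missing check can be expressed as a small precheck that runs before any sync work starts. This is a sketch under assumptions: precheckSync, SyncableIntegration, and the second error message are hypothetical, while the first branch reproduces the existing status check from triggerSync().

```typescript
// Hypothetical minimal shape; the real Integration type has more fields.
interface SyncableIntegration {
  status: string;
  oauth_config?: { client_id: string };
}

type SyncPrecheck = { success: true } | { success: false; error: string };

function precheckSync(integration: SyncableIntegration): SyncPrecheck {
  // Existing check from triggerSync().
  if (integration.status !== 'connected') {
    return { success: false, error: 'Integration not connected' };
  }
  // Added check: a connected integration must also carry a non-empty client_id.
  if (!integration.oauth_config?.client_id) {
    return { success: false, error: 'Integration is missing credentials' };
  }
  return { success: true };
}
```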

2.7 Graceful Error Handling Masks Root Cause

Sync failures are caught and displayed as toast errors, but the underlying "connected without credentials" state persists, causing repeated failures.

Proposed Fixes

  1. Extend existing validation helper (client/src/lib/integration-utils.ts):

    The file already has hasRequiredIntegrationFields() - extend it to add OIDC support and a convenience wrapper:

    // Add OIDC case to existing switch statement
    case 'oidc':
      return !!(integration.oidc_config?.issuer_url && integration.oidc_config?.client_id);
    
    // Add convenience wrapper below existing function
    export const isEffectivelyConnected = (integration: Integration): boolean =>
      integration.status === 'connected' && hasRequiredIntegrationFields(integration);
    
  2. Validate in handleIntegrationToggle (integrations-section.tsx):

    import { hasRequiredIntegrationFields } from '@/lib/integration-utils';
    
    const handleIntegrationToggle = (integration: Integration) => {
      if (integration.status === ConnStatus.DISCONNECTED) {
        // Trying to connect - validate first
        if (!hasRequiredIntegrationFields(integration)) {
          toast({
            title: 'Missing credentials',
            description: `Configure ${integration.name} credentials before connecting`,
            variant: 'destructive',
          });
          return;
        }
      }
      const newStatus =
        integration.status === ConnStatus.CONNECTED ? ConnStatus.DISCONNECTED : ConnStatus.CONNECTED;
      preferences.updateIntegration(integration.id, { status: newStatus });
      // ... rest unchanged
    };
    
  3. UI indicator for missing credentials: Display warning badge on integrations that are "connected" but fail hasRequiredIntegrationFields().
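
Fixes 2 and 3 can be unit tested without rendering the component if the decision logic is pulled out into pure functions. The sketch below assumes that refactor: decideToggle, needsCredentialWarning, and the ToggleTarget shape are hypothetical, and the validator is injected so the example does not depend on the real hasRequiredIntegrationFields().

```typescript
type Status = 'connected' | 'disconnected';

// Hypothetical minimal shape; the real Integration type has more fields.
interface ToggleTarget {
  name: string;
  status: Status;
}

type ToggleDecision =
  | { ok: true; status: Status }
  | { ok: false; message: string };

// Fix 2: refuse to move to 'connected' when required fields are missing.
function decideToggle(
  integration: ToggleTarget,
  hasRequiredFields: (i: ToggleTarget) => boolean,
): ToggleDecision {
  if (integration.status === 'disconnected' && !hasRequiredFields(integration)) {
    return {
      ok: false,
      message: `Configure ${integration.name} credentials before connecting`,
    };
  }
  const next: Status = integration.status === 'connected' ? 'disconnected' : 'connected';
  return { ok: true, status: next };
}

// Fix 3: flag integrations already stored as 'connected' without credentials.
function needsCredentialWarning(
  integration: ToggleTarget,
  hasRequiredFields: (i: ToggleTarget) => boolean,
): boolean {
  return integration.status === 'connected' && !hasRequiredFields(integration);
}
```

Disconnecting is always allowed in this sketch, so users can still clear an invalid "connected" state that predates the fix.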

Affected Integrations

| Integration | OAuth Required | Credential Source |
|---|---|---|
| Google SSO | Yes | User-provided |
| Microsoft 365 | Yes | User-provided |
| Google Calendar | Yes | OAuth flow |
| Outlook Calendar | Yes | OAuth flow |
| Authentik (OIDC) | Yes | User-provided |
| Keycloak (OIDC) | Yes | User-provided |

Issue 3: Server Logging Gaps

Symptom

Logging coverage has improved, but some failure paths still lack consistent, structured logging standards.

Identified Logging Gaps

| Location | Pattern | Impact |
|---|---|---|
| Broad exception handlers | Missing context fields or inconsistent levels | Root cause analysis harder |
| Early returns in hot paths | No debug logs on skipped work | Operational flow unclear |
| Background tasks | Missing start/finish logs | Difficult to trace async failures |

Examples (Updated)

3.1 ASR Processing (now logs)

# src/noteflow/grpc/_mixins/streaming/_asr.py (approximate)
async def _process_audio_chunk(self, chunk: bytes) -> None:
    if not self._asr_engine:
        logger.error("ASR engine unavailable during segment processing", ...)
    # ...

Fix:

async def _process_audio_chunk(self, chunk: bytes) -> None:
    if not self._asr_engine:
        logger.debug("ASR engine unavailable, skipping chunk processing")
        return
    # ...

3.2 contextlib.suppress (now scoped)

# Remaining usages are limited to task cancellation paths (acceptable).

Fix: For any usage outside task cancellation paths, replace it with explicit exception handling:

try:
    await potentially_failing_operation()
except SpecificException as e:
    logger.warning("Operation failed", error=str(e), exc_info=True)
except Exception as e:
    logger.error("Unexpected failure in operation", error=str(e), exc_info=True)

3.3 Webhook Delivery (now logs)

# Background webhook delivery
try:
    delivery = await self._executor.deliver(...)
except Exception:
    _logger.exception("Unexpected error delivering webhook ...")

Fix:

async def _deliver_with_logging(self, config, payload):
    try:
        await self._deliver_webhook(config, payload)
    except Exception as e:
        logger.error("Webhook delivery failed",
                    webhook_id=config.id,
                    error=str(e),
                    exc_info=True)

asyncio.create_task(self._deliver_with_logging(config, payload))

Proposed Logging Standards (Still Needed)

  1. Always log on early returns: Any function that returns early due to missing prerequisites should log at DEBUG level
  2. Never suppress without logging: Replace contextlib.suppress() with try/except that logs (task cancellation paths excepted)
  3. Log background task lifecycle: Log when background tasks start and when they complete/fail
  4. Structured logging for all failures: Include context (IDs, operation names, parameters)
  5. Log at appropriate levels:
    • ERROR: Unexpected failures that need investigation
    • WARNING: Expected failures (validation, user errors)
    • INFO: Significant state changes
    • DEBUG: Detailed operational flow

Validation

Add log assertion tests:

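import logging
import pytest

@pytest.mark.asyncio  # assumes pytest-asyncio; `await` is invalid inside a plain `def`
async def test_asr_unavailable_logs_debug(caplog):
    with caplog.at_level(logging.DEBUG):
        await process_chunk_without_asr()
    assert "ASR engine unavailable" in caplog.text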

Issue 4: E2E Test Coverage Gaps

Current State

  • Playwright tests exist in client/e2e/ (connection, settings UI, OAuth/OIDC, etc.)
  • connection.spec.ts and settings-ui.spec.ts validate basic API round-trips and UI structure
  • No persistence or lifecycle tests for server address, connection states, or integration validation

Coverage Gaps

| Gap | Description | Risk |
|---|---|---|
| No state persistence tests | Navigation doesn't verify local override + prefs sync | State bugs undetected |
| No connection lifecycle tests | Connect/disconnect/reconnect not tested | Connection bugs undetected |
| No integration validation tests | Toggle/connect without creds not tested | Credential bugs undetected |
| Limited error recovery tests | Mostly happy path | Error handling untested |
| Limited sync operation tests | Calendar/integration sync untested | Sync failures undetected |

4.1 Connection Persistence Test

// client/e2e/connection-roundtrip.spec.ts
import { expect, navigateTo, test, waitForAPI } from './fixtures';

test.describe('Connection Round-Trip', () => {
  test.beforeEach(async ({ page }) => {
    await navigateTo(page, '/settings?tab=status');
    await waitForAPI(page);
  });

  test('persists connection across navigation', async ({ page }) => {
    await page.fill('#host', '127.0.0.1');
    await page.fill('#port', '50051');

    // Save and connect before navigating; filling the inputs alone persists nothing.
    // The button selector is an assumption; adjust it to the actual Settings markup.
    await page.getByRole('button', { name: 'Connect', exact: true }).click();

    // Navigate away
    await page.goto('/meetings');
    await page.waitForLoadState('networkidle');

    // Navigate back
    await page.goto('/settings?tab=status');

    // Should still show saved address
    await expect(page.locator('#host')).toHaveValue('127.0.0.1');
    await expect(page.locator('#port')).toHaveValue('50051');
  });
});

4.2 Integration Validation Test

// client/e2e/integration-state.spec.ts
import { expect, navigateTo, test, waitForAPI } from './fixtures';

test.describe('Integration State Consistency', () => {
  test.beforeEach(async ({ page }) => {
    await navigateTo(page, '/settings?tab=integrations');
    await waitForAPI(page);
  });

  test('integrations tab loads without errors', async ({ page }) => {
    // Verify tab content is visible
    await expect(page.locator('.settings-tab-content')).toBeVisible();
    // No error toasts should appear on load. Do not chain .catch() here:
    // swallowing the rejection would make the assertion pass unconditionally.
    const errorToast = page.locator('[role="alert"]').filter({ hasText: /error|failed/i });
    await expect(errorToast).not.toBeVisible({ timeout: 3000 });
  });

  test('integration without credentials shows warning when toggled', async ({ page }) => {
    // Find any integration toggle button
    const toggleButton = page.locator('button').filter({ hasText: /connect/i }).first();

    if (await toggleButton.isVisible()) {
      await toggleButton.click();

      // Should show validation toast if credentials missing
      // (after fix is implemented)
      const toast = page.locator('[role="alert"]');
      // Either success or "missing credentials" warning
      await expect(toast).toBeVisible({ timeout: 5000 });
    }
  });
});

4.3 Recording Round-Trip Test (Optional)

// client/e2e/recording-roundtrip.spec.ts
import { test, expect } from './fixtures';
import { callAPI, navigateTo, waitForAPI } from './fixtures';

test.describe('Recording Round-Trip', () => {
  test.skip(true, 'Requires running backend with audio capture');

  test('recording page loads', async ({ page }) => {
    await navigateTo(page, '/recording');
    await waitForAPI(page);

    // Verify recording UI is present
    const recordButton = page.locator('button').filter({ hasText: /start|record/i });
    await expect(recordButton).toBeVisible();
  });

  test('can list meetings via API', async ({ page }) => {
    await navigateTo(page, '/');
    await waitForAPI(page);

    // Verify API round-trip works
    const meetings = await callAPI(page, 'listMeetings');
    expect(Array.isArray(meetings)).toBe(true);
  });
});

Test Infrastructure

The existing client/e2e/fixtures.ts already provides:

  • callAPI<T>(page, method, ...args) - Call API methods and return typed results
  • navigateTo(page, path) - Navigate with proper waiting
  • waitForAPI(page) - Wait for API to be available
  • getConnectionState(page) - Get current connection state

These fixtures should be used consistently across all new tests.


Implementation Priority

| Issue | Priority | Effort | Impact |
|---|---|---|---|
| Server Address Timing | P1 | Small | High - User-facing bug |
| OAuth State Consistency | P1 | Medium | High - Silent failures |
| E2E Persistence + Validation Tests | P2 | Medium | Medium - Regression prevention |
| Logging Standards + Audit | P3 | Medium | Medium - Debugging capability |

Success Criteria

  1. Server address persists correctly across navigation, and the effective URL tooltip updates after connect
  2. Integrations cannot reach "connected" status without valid credentials (including OIDC)
  3. E2E tests cover persistence and integration validation
  4. Logging standards documented and applied consistently
  5. make quality passes when code changes are made

References

  • GAP-003: Error Handling Patterns
  • GAP-004: Diarization Lifecycle
  • GAP-011: Post-Processing Pipeline