- Web: Don't set fullscreen state synchronously; rely on event listeners
- Web: Add fullscreenerror event handler to catch failed fullscreen requests
- iOS: Add connection ID to prevent stale WebSocket callbacks from corrupting
new connection state when reconnecting
- Set connectionState to .disconnected on view disappear (was incorrectly .connected)
- Only reset exponential backoff on successful (non-error) events to maintain proper
backoff behavior when server is unavailable
- Add onerror handler for image loading to prevent memory leaks
- Reset isPaused on disconnect to avoid UI desync
- Fix data race on backoff variable using nonisolated(unsafe)
- Fix WebSocket reconnection on slider changes by using initial values for URL params
- Fix iOS connected status set before WebSocket actually connects
- Fix mission state mapping to properly handle waiting_for_tool state
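
The connection ID guard described above can be sketched roughly as follows. This is a minimal illustration in Rust rather than the actual Swift code; `ConnectionTracker` and its method names are hypothetical. Each connection attempt gets a fresh ID, and callbacks carrying an older ID are dropped.

```rust
/// Hypothetical tracker: remembers which connection attempt is current so
/// that callbacks from an older socket cannot change the new connection's state.
#[derive(Debug, Default)]
pub struct ConnectionTracker {
    current_id: u64,
    pub connected: bool,
}

impl ConnectionTracker {
    /// Start a new connection attempt and return its ID.
    /// The state stays "not connected" until the socket reports that it opened.
    pub fn begin_connection(&mut self) -> u64 {
        self.current_id += 1;
        self.connected = false;
        self.current_id
    }

    /// Handle an "opened" callback. Returns false if the callback is stale.
    pub fn on_open(&mut self, id: u64) -> bool {
        if id != self.current_id {
            return false; // callback belongs to an older connection
        }
        self.connected = true;
        true
    }

    /// Handle a "closed" callback. Returns false if the callback is stale.
    pub fn on_close(&mut self, id: u64) -> bool {
        if id != self.current_id {
            return false;
        }
        self.connected = false;
        true
    }
}
```

With this shape, a late close event from the previous socket returns `false` and leaves the state of the new connection untouched.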
Implements desktop streaming feature to watch the AI agent work in real-time:
- Backend: WebSocket endpoint at /api/desktop/stream using MJPEG frames
- iOS: Bottom sheet UI with play/pause, FPS and quality controls
- Web: Side-by-side split view with toggleable desktop panel
- Better OpenCode error messages for debugging
- Fix missions showing "Default" label by using mission ID instead when no model override
- Add ConnectionState enum to track SSE stream health with reconnecting/disconnected states
- Implement automatic reconnection with exponential backoff (1s→30s)
- Show connection status in toolbar when disconnecting, hide error bubbles for connection issues
- Fix status event filtering to only apply to currently viewed mission
- Reset run state when creating new mission or switching missions
- Redesign input field to ChatGPT style: clean outline, no background fill, integrated send button
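
A minimal sketch of the 1s→30s reconnection backoff, assuming the delay doubles after each attempt (the doubling factor and the `Backoff` type are assumptions, not taken from the actual code). The caller would invoke `reset` only after a successful, non-error event.

```rust
use std::time::Duration;

/// Hypothetical backoff state for stream reconnection.
pub struct Backoff {
    current: Duration,
}

impl Backoff {
    const INITIAL: Duration = Duration::from_secs(1);
    const MAX: Duration = Duration::from_secs(30);

    pub fn new() -> Self {
        Self { current: Self::INITIAL }
    }

    /// Return the delay to wait before the next attempt, then double the
    /// stored delay, capped at MAX.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = (self.current * 2).min(Self::MAX);
        delay
    }

    /// Go back to the initial delay. Intended to be called only after a
    /// successful event, so repeated failures keep backing off.
    pub fn reset(&mut self) {
        self.current = Self::INITIAL;
    }
}
```

Under these assumptions, repeated failures wait 1, 2, 4, 8, 16, 30, 30, ... seconds.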
This refactor simplifies the architecture:
## Backend Changes
- Remove AgentBackend enum and dual-backend logic
- Make OpenCode the sole execution backend
- Change opencode_base_url from Option<String> to String with default
- Update default model to claude-sonnet-4-20250514
## Provider System
- Add GET /api/providers endpoint for model discovery
- Create .open_agent/providers.json config file
- Support grouped models by provider with billing type metadata
## Code Cleanup
- Delete SimpleAgent (src/agents/simple.rs)
- Delete TaskExecutor (src/agents/leaf/)
- Delete orchestrator module (src/agents/orchestrator/)
- Keep LLM client (needed for memory embeddings)
- Keep budget system (useful for cost tracking)
- Keep tools module (for MCP API listing)
## Dashboard Updates
- Add listProviders() API function
- Update model selector to group by provider
- Show billing type (subscription vs pay-per-token)
## Documentation
- Update CLAUDE.md to reflect OpenCode-only architecture
* Fix missions staying Active after completion with OpenCode backend
- Add TerminalReason::Completed variant for successful task completion
- Set terminal_reason in OpenCodeAgent on success to trigger auto-complete
- Update control.rs to explicitly handle Completed terminal reason
- Update CLAUDE.md with OpenCode backend documentation
* Improve iOS dashboard UI polish
- Remove harsh input field border, use ultraThinMaterial background with subtle focus glow
- Clean up model selector pills: remove ugly truncated mission IDs, increase padding
- Remove agent working indicator border for cleaner look
- Increase input area bottom padding for better thumb reach
* Add real-time event streaming for OpenCode backend
- Add SSE streaming support to OpenCodeClient via /event endpoint
- Parse and forward OpenCode events (thinking, tool_call, tool_result)
- Update OpenCodeAgent to consume stream and forward to control channel
- Add fallback to blocking mode if SSE connection fails
This enables live UI updates in the dashboard when using OpenCode backend.
* Fix running mission tracking to use actual executing mission ID
Track the mission ID that the main `running` task is actually working on
separately from `current_mission`, which can change when the user creates
a new mission. This ensures ListRunning and GracefulShutdown correctly
identify which mission is being executed.
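
The separation described above might look roughly like this. It is a sketch only: `ControlState`, its methods, and the use of `String` IDs are assumptions; only the names `current_mission` and `running_mission_id` come from these commits.

```rust
/// Hypothetical control state with separate "selected" and "executing" missions.
#[derive(Debug, Default)]
pub struct ControlState {
    /// Mission the user currently has selected; changes when a new mission is created.
    pub current_mission: Option<String>,
    /// Mission the main running task is actually working on.
    pub running_mission_id: Option<String>,
}

impl ControlState {
    /// The user creates (and switches to) a new mission.
    pub fn create_mission(&mut self, id: &str) {
        self.current_mission = Some(id.to_string());
    }

    /// A task starts on the currently selected mission.
    pub fn start_task(&mut self) {
        self.running_mission_id = self.current_mission.clone();
    }

    /// Report the mission that is executing, not the one that is selected.
    pub fn list_running(&self) -> Vec<String> {
        self.running_mission_id.iter().cloned().collect()
    }

    /// On completion, take the running ID before clearing it so that
    /// persistence and auto-complete target the mission that actually ran.
    pub fn finish_task(&mut self) -> Option<String> {
        self.running_mission_id.take()
    }
}
```

If the user creates mission "b" while "a" is executing, `list_running` still reports "a".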
* Add MCP server for desktop tools and Playwright integration
- Create desktop-mcp binary that exposes i3/Xvfb desktop automation tools
as an MCP server for use with OpenCode backend
- Add opencode.json with both desktop and Playwright MCP configurations
- Update deployment command to include desktop-mcp binary
- Document available MCP tools in CLAUDE.md
Desktop tools: start_session, stop_session, screenshot, type, click,
mouse_move, scroll, i3_command, get_text
* Document SSH key and desktop-mcp binary in production section
- Add ~/.ssh/cursor as the SSH key for production access
- Add desktop-mcp binary location to production table
* Emphasize bun usage and add gitignore entries
- Add clear instructions to ALWAYS use bun, never npm for dashboard
- Gitignore .playwright-mcp/ directory (local MCP data)
- Gitignore dashboard/package-lock.json (we use bun.lockb)
* Add mission delete and cleanup features to web and iOS dashboards
Backend (Rust):
- Add delete_mission() and delete_empty_untitled_missions() to supabase.rs
- Add DELETE /api/control/missions/:id endpoint with running mission guard
- Add POST /api/control/missions/cleanup endpoint for bulk cleanup
Web Dashboard (Next.js):
- Add deleteMission() and cleanupEmptyMissions() API functions
- Add delete button (trash icon) on hover for each mission row
- Add "Cleanup Empty" button with sparkles icon in filters area
- Fix analytics to compute stats from missions/runs data instead of broken /api/stats
iOS Dashboard (Swift):
- Add deleteMission() and cleanupEmptyMissions() to APIService
- Add delete() HTTP helper method
- Add swipe-to-delete on mission rows (disabled for active missions)
- Add "Cleanup" button with sparkles icon and progress indicator
- Add success banner with auto-dismiss after cleanup
* Fix CancelMission and MCP notification parsing bugs
- CancelMission now uses running_mission_id instead of current_mission
to correctly identify the executing mission (fixes race condition
when user creates new mission while another is running)
- MCP server JsonRpcRequest.id field now has #[serde(default)] to
handle JSON-RPC 2.0 notifications which don't have an id field
* Fix running mission tracking bugs
- delete_mission: Query control actor for actual running missions
instead of using always-empty running_missions list
- cleanup_empty_missions: Exclude running missions from cleanup to
prevent deleting missions mid-execution
- get_parallel_config: Query control actor for accurate running count
- Task completion: Save running_mission_id before clearing and use it
for persist and auto-complete (fixes race when user creates new
mission while task is running)
All endpoints now use ControlCommand::ListRunning to get accurate
running state from the control actor loop.
* Fix bugbot issues: analytics cost, browser cleanup, title truncation, history append
- Add get_total_cost_cents() to supabase.rs for aggregating all run costs
- Update /api/stats endpoint to return actual total cost from database
- Fix analytics page to use stats endpoint for total cost (not limited to 100 runs)
- Fix desktop_mcp.rs to save browser_pid to session file after launch
- Fix mission title truncation to use safe_truncate_index and append "..."
- Fix mission history to append to existing DB history instead of replacing
(prevents data loss when CreateMission is called during task execution)
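
For reference, a UTF-8-safe title truncation could be sketched as below. The name `safe_truncate_index` appears in this commit, but the signature and body here are assumptions, and `truncate_title` is a hypothetical wrapper.

```rust
/// Largest byte index <= `max_bytes` that falls on a UTF-8 character boundary.
/// (Assumed signature; the real helper may differ.)
pub fn safe_truncate_index(s: &str, max_bytes: usize) -> usize {
    if max_bytes >= s.len() {
        return s.len();
    }
    let mut idx = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    idx
}

/// Hypothetical wrapper: truncate and append "..." only if something was cut.
pub fn truncate_title(s: &str, max_bytes: usize) -> String {
    let idx = safe_truncate_index(s, max_bytes);
    if idx == s.len() {
        s.to_string()
    } else {
        format!("{}...", &s[..idx])
    }
}
```

Slicing at an arbitrary byte offset would panic if it landed inside a multi-byte character, which is why the index is walked back to a boundary first.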
* Fix history context contamination and cumulative thinking content
- Only push to local history if completed mission matches current mission,
preventing old mission exchanges from contaminating new mission context
- Accumulate thinking content across iterations so frontend replacement
shows all thinking, matching OpenCode backend behavior
* Fix MCP notifications, orphaned processes, and shutdown persistence
- MCP server no longer sends responses to JSON-RPC notifications (per spec)
- Clean up Xvfb/i3/Chromium processes on partial session startup failure
- Graceful shutdown only persists history if running mission matches current
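
The notification rule can be illustrated with a simplified, std-only sketch. The struct fields and the `handle` function are hypothetical, and IDs are reduced to `u64` even though JSON-RPC 2.0 also allows string IDs. A request without an `id` is a notification and gets no response.

```rust
/// Simplified request: `id` is None for JSON-RPC 2.0 notifications.
pub struct JsonRpcRequest {
    pub id: Option<u64>,
    pub method: String,
}

/// Simplified response carrying the ID of the request it answers.
#[derive(Debug, PartialEq)]
pub struct JsonRpcResponse {
    pub id: u64,
    pub result: String,
}

/// Process the request and build a response only when an ID is present.
pub fn handle(req: &JsonRpcRequest) -> Option<JsonRpcResponse> {
    // The method is still executed for notifications; only the reply is skipped.
    let result = format!("handled {}", req.method);
    req.id.map(|id| JsonRpcResponse { id, result })
}
```

With serde, an `Option` field marked `#[serde(default)]` deserializes to `None` when the key is absent, which is presumably what lets notifications parse at all.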
* Fix partial field selection deserialization in cleanup endpoint
Use PartialMission struct for partial field queries to avoid
deserialization failure when DbMission's required fields are missing.
* Clarify analytics success rate measures missions not tasks
Update labels to "Mission Success Rate" and "X missions completed"
to make it clear the metric is mission-level, not task-level.
The img tag cannot send custom HTTP headers, causing authenticated image requests to fail. Fetch the image as a blob with proper Bearer token authentication, then use a blob URL for the src attribute.
Add timestamps to all messages, syntax-highlighted code blocks with copy buttons, file preview modal with syntax highlighting, analytics dashboard, quick action templates, and extended iOS ToolUI support for progress bars, alerts, and code blocks.
The field was defined and configurable via SUMMARIZE_LARGE_RESULTS env var,
but never actually used in any code path. LLM-based summarization of large
tool results was not implemented. Remove to avoid misleading configuration.
MAX_TOOL_RESULT_CHARS env var was loaded into ExecutionThresholds but
the truncation logic used ctx.config.context.max_tool_result_chars
directly. Now thresholds properly override config default when set.
- benchmarks: Fix test_normalize_id to expect '/' to be preserved
(provider prefix is needed for matching, only removes :, -, _, .)
- learned: Fix test_select_model_prefers_high_success_low_cost to use
values that correctly trigger the scoring formula behavior
- retry: Fix test_budget_exhausted_with_progress by using 85% budget
(condition is > 0.8, not >= 0.8)
- error: Fix exponential_backoff to cap total delay (including jitter)
at 60 seconds, not just the base delay before jitter is added
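
The backoff fix in the last item can be sketched as follows. Only the function name `exponential_backoff` and the 60-second cap come from this commit; the signature is assumed, and the jitter fraction is passed in by the caller so the example stays deterministic.

```rust
use std::time::Duration;

const MAX_DELAY: Duration = Duration::from_secs(60);

/// Delay for the given attempt: base * 2^attempt plus jitter, with the
/// total (not just the pre-jitter delay) capped at MAX_DELAY.
/// `jitter_fraction` is expected in [0, 1]; non-finite values count as 0.
pub fn exponential_backoff(attempt: u32, base: Duration, jitter_fraction: f64) -> Duration {
    let fraction = if jitter_fraction.is_finite() {
        jitter_fraction.clamp(0.0, 1.0)
    } else {
        0.0
    };
    // Cap the exponential part first so the jitter math cannot overflow.
    let base_delay = base
        .saturating_mul(2u32.saturating_pow(attempt))
        .min(MAX_DELAY);
    let jitter = base_delay.mul_f64(fraction);
    // Cap again after adding jitter, so the total never exceeds MAX_DELAY.
    (base_delay + jitter).min(MAX_DELAY)
}
```

Without the second `min`, a capped 60-second base delay plus jitter could reach up to 120 seconds under these assumptions.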
- Implement smart tool result handling with UTF-8-safe truncation
- Add category-aware pivot prompts when agent gets stuck in loops
- Wire up benchmark-based model routing for optimal task-type matching
- Create 4 new composite tools (analyze_codebase, deep_search, prepare_project, debug_error)
- Implement configurable execution thresholds via environment variables
- Add blocker detection for early termination of impossible tasks
- Improve tool failure tracking with cross-category fallback suggestions
These improvements reduce iteration count, provide better guidance when stuck,
and automatically select the right model for each task type.
The agent system now uses SimpleAgent → TaskExecutor, not the old
hierarchical orchestrator. Remove "Adding a New Leaf Agent" sections
from both CLAUDE.md and cursor rules as they reference deprecated
RootAgent/LeafAgent patterns.
This adds tracking of execution termination reasons (cancellation, budget exhaustion, LLM errors, stalling, infinite loops, max iterations) to properly distinguish between different failure modes in agent execution.
Add .claude/CLAUDE.md with project documentation (architecture, commands, conventions, env vars) and .claude/settings.json with tool permissions for streamlined agent development.
1. Upload progress bar - shows real-time progress with bytes/percentage
2. URL download - paste any URL, server downloads directly (faster for large files)
3. Chunked uploads - files >10MB split into 5MB chunks with retry (3 attempts)
Dashboard changes:
- Progress bar UI with bytes transferred
- Link icon button to paste URLs
- Uses chunked upload for large files automatically
Backend changes:
- /api/fs/upload-chunk - receives file chunks
- /api/fs/upload-finalize - assembles chunks into final file
- /api/fs/download-url - server downloads from URL to filesystem
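
A rough sketch of the chunking and retry rules above. The dashboard client is not written in Rust, so this only illustrates the arithmetic; `plan_chunks` and `with_retry` are hypothetical names, and "MB" is assumed to mean MiB.

```rust
pub const CHUNK_THRESHOLD: u64 = 10 * 1024 * 1024; // files above this are chunked
pub const CHUNK_SIZE: u64 = 5 * 1024 * 1024;
pub const MAX_ATTEMPTS: u32 = 3;

/// Return (offset, length) pairs covering the file. Files at or below the
/// threshold are sent as a single range.
pub fn plan_chunks(file_size: u64) -> Vec<(u64, u64)> {
    if file_size <= CHUNK_THRESHOLD {
        return vec![(0, file_size)];
    }
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < file_size {
        let len = CHUNK_SIZE.min(file_size - offset);
        chunks.push((offset, len));
        offset += len;
    }
    chunks
}

/// Run `op` up to MAX_ATTEMPTS times, passing the 1-based attempt number,
/// and return the first success or the last error.
pub fn with_retry<T, E>(mut op: impl FnMut(u32) -> Result<T, E>) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= MAX_ATTEMPTS => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```

Each planned range would then be sent to the chunk endpoint inside `with_retry`, followed by a single finalize call once every chunk has succeeded.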
When CONSOLE_SSH_HOST is 127.0.0.1/localhost, use direct file
operations instead of SSH/SFTP to itself. This makes uploads instant
instead of going through the full SFTP overhead.
Optimizes: upload, download, list, mkdir, rm
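
The host check could be as simple as the sketch below. `FsBackend` and `select_backend` are hypothetical names, and treating the comparison as case-insensitive is an assumption.

```rust
/// Hypothetical choice between direct file operations and SSH/SFTP.
#[derive(Debug, PartialEq)]
pub enum FsBackend {
    Local,
    Sftp,
}

/// Pick the backend from the value of CONSOLE_SSH_HOST.
pub fn select_backend(console_ssh_host: &str) -> FsBackend {
    let host = console_ssh_host.trim();
    if host == "127.0.0.1" || host.eq_ignore_ascii_case("localhost") {
        FsBackend::Local
    } else {
        FsBackend::Sftp
    }
}
```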
1. Dropdown now shows "Continue Mission" for blocked status (was only
showing "Reactivate" which doesn't provide resume context)
2. Loop warning message now accurately shows remaining attempts before
termination instead of always saying "next call will terminate"
LLMs sometimes generate plausible-looking image URLs without actually
uploading images. This adds validation to detect URLs that weren't
in the pending_uploads list and warns the model to use actual tools.
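
The validation amounts to a set-membership check, sketched here under the assumption that `pending_uploads` holds the exact URLs returned by upload tools. The function name and signature are hypothetical.

```rust
use std::collections::HashSet;

/// Return the referenced image URLs that no upload tool actually produced.
pub fn find_unverified_urls<'a>(
    referenced: &[&'a str],
    pending_uploads: &HashSet<String>,
) -> Vec<&'a str> {
    referenced
        .iter()
        .copied()
        .filter(|url| !pending_uploads.contains(*url))
        .collect()
}
```

A non-empty result would trigger the warning that tells the model to use the actual upload tools.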
- Lower warning threshold from 3 to 2 repetitions
- Lower force-complete threshold from 5 to 4 repetitions
- Include last tool result in warning message so model sees WHY it's failing
- Make warning message more actionable with specific suggestions
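
A sketch of how the new thresholds could drive the decision, assuming both are inclusive. `LoopAction` and `check_loop` are hypothetical names, not the actual implementation.

```rust
pub const WARN_THRESHOLD: u32 = 2;
pub const FORCE_COMPLETE_THRESHOLD: u32 = 4;

/// Hypothetical outcome of the loop check for one repeated tool call.
#[derive(Debug, PartialEq)]
pub enum LoopAction {
    Continue,
    /// Warn the model, showing how many repetitions remain before
    /// termination and the last tool result so it can see why it is failing.
    Warn { remaining: u32, last_result: String },
    ForceComplete,
}

pub fn check_loop(repetitions: u32, last_result: &str) -> LoopAction {
    if repetitions >= FORCE_COMPLETE_THRESHOLD {
        LoopAction::ForceComplete
    } else if repetitions >= WARN_THRESHOLD {
        LoopAction::Warn {
            remaining: FORCE_COMPLETE_THRESHOLD - repetitions,
            last_result: last_result.to_string(),
        }
    } else {
        LoopAction::Continue
    }
}
```

Carrying `remaining` in the warning also matches the earlier fix that reports the real number of attempts left instead of always saying the next call will terminate.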