356 Commits

Author SHA1 Message Date
Thomas Marchand
3e38d2716e Merge master into ios-mission-loading branch
Resolve merge conflicts in ControlView.swift:
- Keep improved SSE reconnection logic with successful event tracking
- Keep better error filtering for SSE-specific errors
- Keep initial disconnected state for connection indicator
- Keep switch-based state mapping for running missions
2026-01-03 06:05:24 +00:00
Thomas Marchand
3fbe3cc662 Fix fullscreen state sync and stale WebSocket callbacks
- Web: Don't set fullscreen state synchronously; rely on event listeners
- Web: Add fullscreenerror event handler to catch failed fullscreen requests
- iOS: Add connection ID to prevent stale WebSocket callbacks from corrupting
  new connection state when reconnecting
2026-01-02 23:32:38 +00:00
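The connection-ID guard described in the iOS item above can be pictured as a generation counter: each new connection attempt gets a fresh ID, and a callback is ignored unless it carries the current one. This is a minimal sketch in Rust rather than Swift; `ConnectionTracker`, `begin_connection`, and `apply_event` are hypothetical names, not identifiers from the repository.

```rust
/// Tracks which connection attempt is current so that callbacks from an
/// older socket cannot overwrite the state of a newer one.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Default)]
pub struct ConnectionTracker {
    current_id: u64,
    connected: bool,
}

impl ConnectionTracker {
    /// Start a new connection attempt and return its ID.
    /// Any callback still holding an older ID becomes stale.
    pub fn begin_connection(&mut self) -> u64 {
        self.current_id += 1;
        self.connected = false;
        self.current_id
    }

    /// Apply a state change reported by a callback.
    /// Returns false (and changes nothing) if the callback is stale.
    pub fn apply_event(&mut self, connection_id: u64, connected: bool) -> bool {
        if connection_id != self.current_id {
            return false;
        }
        self.connected = connected;
        true
    }

    pub fn is_connected(&self) -> bool {
        self.connected
    }
}
```

A late "closed" callback from the previous socket then returns `false` and leaves the new connection's state untouched.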
Thomas Marchand
82573f2587 Fix connection state and backoff logic in iOS ControlView
- Set connectionState to .disconnected on view disappear (was incorrectly .connected)
- Only reset exponential backoff on successful (non-error) events to maintain proper
  backoff behavior when server is unavailable
2026-01-02 23:18:13 +00:00
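The backoff rule in this entry (reset only after a successful, non-error event) can be sketched as below. The 1s and 30s bounds are taken from the "1s→30s" range in entry 3398dbe271; the `Backoff` type and its method names are hypothetical, and the sketch is in Rust rather than the Swift of ControlView.

```rust
use std::time::Duration;

/// Reconnect delay that doubles on every failure and resets only after a
/// successful (non-error) event. Hypothetical type, for illustration only.
#[derive(Debug)]
pub struct Backoff {
    current: Duration,
}

impl Backoff {
    pub const INITIAL: Duration = Duration::from_secs(1);
    pub const MAX: Duration = Duration::from_secs(30);

    pub fn new() -> Self {
        Backoff { current: Self::INITIAL }
    }

    /// Return the delay to wait before the next reconnect, then double it
    /// (capped at MAX) for the attempt after that.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = (self.current * 2).min(Self::MAX);
        delay
    }

    /// Call only when a real, non-error event arrives. Merely opening the
    /// stream does not count, so an unavailable server keeps backing off.
    pub fn on_successful_event(&mut self) {
        self.current = Self::INITIAL;
    }
}
```

Resetting on connection open instead of on the first good event would pin the delay at 1s whenever the server accepts the socket and then fails immediately.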
Thomas Marchand
3fd40035e3 Fix initial connection state and task cleanup
- Start iOS connection state as disconnected until first event
- Abort spawned tasks when WebSocket handler exits to prevent resource waste
2026-01-02 23:03:54 +00:00
Thomas Marchand
c898f68026 Address remaining Bugbot review issues
- Make error filtering more specific to SSE reconnection errors only
- Use refs for FPS/quality values to preserve current settings on reconnect
2026-01-02 22:44:14 +00:00
Thomas Marchand
d7007acc20 Fix additional Bugbot review issues
- Add onerror handler for image loading to prevent memory leaks
- Reset isPaused on disconnect to avoid UI desync
- Fix data race on backoff variable using nonisolated(unsafe)
2026-01-02 22:32:03 +00:00
Thomas Marchand
0480828aa2 Change default model from Sonnet 4 to Opus 4.5
Update DEFAULT_MODEL default value to claude-opus-4-5-20251101,
the most capable model in the Claude family.
2026-01-02 22:19:24 +00:00
Thomas Marchand
69e5c4d915 Fix Bugbot review issues
- Fix WebSocket reconnection on slider changes by using initial values for URL params
- Fix iOS connected status set before WebSocket actually connects
- Fix mission state mapping to properly handle waiting_for_tool state
2026-01-02 22:12:39 +00:00
Thomas Marchand
c9a2ef45c7 Add real-time desktop streaming with WebSocket MJPEG
Implements desktop streaming feature to watch the AI agent work in real-time:

- Backend: WebSocket endpoint at /api/desktop/stream using MJPEG frames
- iOS: Bottom sheet UI with play/pause, FPS and quality controls
- Web: Side-by-side split view with toggleable desktop panel
- Better OpenCode error messages for debugging
2026-01-02 21:40:00 +00:00
Thomas Marchand
3398dbe271 iOS: Improve mission UI, add auto-reconnect, and refine input field (#16)
- Fix missions showing "Default" label by using mission ID instead when no model override
- Add ConnectionState enum to track SSE stream health with reconnecting/disconnected states
- Implement automatic reconnection with exponential backoff (1s→30s)
- Show connection status in toolbar when disconnecting, hide error bubbles for connection issues
- Fix status event filtering to only apply to currently viewed mission
- Reset run state when creating new mission or switching missions
- Redesign input field to ChatGPT style: clean outline, no background fill, integrated send button
2026-01-02 13:00:16 -08:00
Thomas Marchand
0a960e6381 iOS: Improve mission UI, add auto-reconnect, and refine input field
- Fix missions showing "Default" label by using mission ID instead when no model override
- Add ConnectionState enum to track SSE stream health with reconnecting/disconnected states
- Implement automatic reconnection with exponential backoff (1s→30s)
- Show connection status in toolbar when disconnecting, hide error bubbles for connection issues
- Fix status event filtering to only apply to currently viewed mission
- Reset run state when creating new mission or switching missions
- Redesign input field to ChatGPT style: clean outline, no background fill, integrated send button
2026-01-02 20:59:36 +00:00
Thomas Marchand
48fbdfdc60 Remove Local Backend, make OpenCode the only execution path (#15)
This refactor simplifies the architecture:

## Backend Changes
- Remove AgentBackend enum and dual-backend logic
- Make OpenCode the sole execution backend
- Change opencode_base_url from Option<String> to String with default
- Update default model to claude-sonnet-4-20250514

## Provider System
- Add GET /api/providers endpoint for model discovery
- Create .open_agent/providers.json config file
- Support grouped models by provider with billing type metadata

## Code Cleanup
- Delete SimpleAgent (src/agents/simple.rs)
- Delete TaskExecutor (src/agents/leaf/)
- Delete orchestrator module (src/agents/orchestrator/)
- Keep LLM client (needed for memory embeddings)
- Keep budget system (useful for cost tracking)
- Keep tools module (for MCP API listing)

## Dashboard Updates
- Add listProviders() API function
- Update model selector to group by provider
- Show billing type (subscription vs pay-per-token)

## Documentation
- Update CLAUDE.md to reflect OpenCode-only architecture
2026-01-02 12:32:27 -08:00
Thomas Marchand
ce33838968 OpenCode refactor and mission tracking fixes (#14)
* Fix missions staying Active after completion with OpenCode backend

- Add TerminalReason::Completed variant for successful task completion
- Set terminal_reason in OpenCodeAgent on success to trigger auto-complete
- Update control.rs to explicitly handle Completed terminal reason
- Update CLAUDE.md with OpenCode backend documentation

* Improve iOS dashboard UI polish

- Remove harsh input field border, use ultraThinMaterial background with subtle focus glow
- Clean up model selector pills: remove ugly truncated mission IDs, increase padding
- Remove agent working indicator border for cleaner look
- Increase input area bottom padding for better thumb reach

* Add real-time event streaming for OpenCode backend

- Add SSE streaming support to OpenCodeClient via /event endpoint
- Parse and forward OpenCode events (thinking, tool_call, tool_result)
- Update OpenCodeAgent to consume stream and forward to control channel
- Add fallback to blocking mode if SSE connection fails

This enables live UI updates in the dashboard when using OpenCode backend.

* Fix running mission tracking to use actual executing mission ID

Track the mission ID that the main `running` task is actually working on
separately from `current_mission`, which can change when the user creates
a new mission. This ensures ListRunning and GracefulShutdown correctly
identify which mission is being executed.

* Add MCP server for desktop tools and Playwright integration

- Create desktop-mcp binary that exposes i3/Xvfb desktop automation tools
  as an MCP server for use with OpenCode backend
- Add opencode.json with both desktop and Playwright MCP configurations
- Update deployment command to include desktop-mcp binary
- Document available MCP tools in CLAUDE.md

Desktop tools: start_session, stop_session, screenshot, type, click,
mouse_move, scroll, i3_command, get_text

* Document SSH key and desktop-mcp binary in production section

- Add ~/.ssh/cursor as the SSH key for production access
- Add desktop-mcp binary location to production table

* Emphasize bun usage and add gitignore entries

- Add clear instructions to ALWAYS use bun, never npm for dashboard
- Gitignore .playwright-mcp/ directory (local MCP data)
- Gitignore dashboard/package-lock.json (we use bun.lockb)

* Add mission delete and cleanup features to web and iOS dashboards

Backend (Rust):
- Add delete_mission() and delete_empty_untitled_missions() to supabase.rs
- Add DELETE /api/control/missions/:id endpoint with running mission guard
- Add POST /api/control/missions/cleanup endpoint for bulk cleanup

Web Dashboard (Next.js):
- Add deleteMission() and cleanupEmptyMissions() API functions
- Add delete button (trash icon) on hover for each mission row
- Add "Cleanup Empty" button with sparkles icon in filters area
- Fix analytics to compute stats from missions/runs data instead of broken /api/stats

iOS Dashboard (Swift):
- Add deleteMission() and cleanupEmptyMissions() to APIService
- Add delete() HTTP helper method
- Add swipe-to-delete on mission rows (disabled for active missions)
- Add "Cleanup" button with sparkles icon and progress indicator
- Add success banner with auto-dismiss after cleanup

* Fix CancelMission and MCP notification parsing bugs

- CancelMission now uses running_mission_id instead of current_mission
  to correctly identify the executing mission (fixes race condition
  when user creates new mission while another is running)
- MCP server JsonRpcRequest.id field now has #[serde(default)] to
  handle JSON-RPC 2.0 notifications which don't have an id field

* Fix running mission tracking bugs

- delete_mission: Query control actor for actual running missions
  instead of using always-empty running_missions list
- cleanup_empty_missions: Exclude running missions from cleanup to
  prevent deleting missions mid-execution
- get_parallel_config: Query control actor for accurate running count
- Task completion: Save running_mission_id before clearing and use it
  for persist and auto-complete (fixes race when user creates new
  mission while task is running)

All endpoints now use ControlCommand::ListRunning to get accurate
running state from the control actor loop.

* Fix bugbot issues: analytics cost, browser cleanup, title truncation, history append

- Add get_total_cost_cents() to supabase.rs for aggregating all run costs
- Update /api/stats endpoint to return actual total cost from database
- Fix analytics page to use stats endpoint for total cost (not limited to 100 runs)
- Fix desktop_mcp.rs to save browser_pid to session file after launch
- Fix mission title truncation to use safe_truncate_index and append "..."
- Fix mission history to append to existing DB history instead of replacing
  (prevents data loss when CreateMission is called during task execution)

* Fix history context contamination and cumulative thinking content

- Only push to local history if completed mission matches current mission,
  preventing old mission exchanges from contaminating new mission context
- Accumulate thinking content across iterations so frontend replacement
  shows all thinking, matching OpenCode backend behavior

* Fix MCP notifications, orphaned processes, and shutdown persistence

- MCP server no longer sends responses to JSON-RPC notifications (per spec)
- Clean up Xvfb/i3/Chromium processes on partial session startup failure
- Graceful shutdown only persists history if running mission matches current

* Fix partial field selection deserialization in cleanup endpoint

Use PartialMission struct for partial field queries to avoid
deserialization failure when DbMission's required fields are missing.

* Clarify analytics success rate measures missions not tasks

Update labels to "Mission Success Rate" and "X missions completed"
to make it clear the metric is mission-level, not task-level.
2026-01-02 09:45:01 -08:00
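Two of the MCP fixes in this entry concern JSON-RPC 2.0 notifications, which carry no `id` and must not be answered. A minimal sketch of that rule follows; it uses no external crates, so the optional `id` is modelled as a plain `Option<u64>` (real JSON-RPC ids may also be strings), and `Request` and `respond` are hypothetical names rather than the server's actual types.

```rust
/// Minimal stand-in for a parsed JSON-RPC 2.0 message. The log mentions
/// `#[serde(default)]` on the real `id` field; here it is simply an Option
/// so the sketch stays std-only. Hypothetical type.
pub struct Request {
    pub id: Option<u64>,
    pub method: String,
}

/// Build a response only for requests. Notifications (no `id`) are
/// processed but never answered. The method name is not JSON-escaped,
/// which is acceptable only for this illustration.
pub fn respond(req: &Request) -> Option<String> {
    let id = req.id?;
    Some(format!(
        r#"{{"jsonrpc":"2.0","id":{},"result":"handled {}"}}"#,
        id, req.method
    ))
}
```

Returning `Option<String>` makes "no response" an explicit outcome, so the write loop cannot accidentally emit a reply for a notification.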
Thomas Marchand
3164febd57 OpenCode integration with real-time streaming (#13)
* Fix missions staying Active after completion with OpenCode backend

- Add TerminalReason::Completed variant for successful task completion
- Set terminal_reason in OpenCodeAgent on success to trigger auto-complete
- Update control.rs to explicitly handle Completed terminal reason
- Update CLAUDE.md with OpenCode backend documentation

* Improve iOS dashboard UI polish

- Remove harsh input field border, use ultraThinMaterial background with subtle focus glow
- Clean up model selector pills: remove ugly truncated mission IDs, increase padding
- Remove agent working indicator border for cleaner look
- Increase input area bottom padding for better thumb reach

* Add real-time event streaming for OpenCode backend

- Add SSE streaming support to OpenCodeClient via /event endpoint
- Parse and forward OpenCode events (thinking, tool_call, tool_result)
- Update OpenCodeAgent to consume stream and forward to control channel
- Add fallback to blocking mode if SSE connection fails

This enables live UI updates in the dashboard when using OpenCode backend.

* Fix running mission tracking to use actual executing mission ID

Track the mission ID that the main `running` task is actually working on
separately from `current_mission`, which can change when the user creates
a new mission. This ensures ListRunning and GracefulShutdown correctly
identify which mission is being executed.
2026-01-02 08:20:02 +00:00
Thomas Marchand
6acab1da5c Fix missions showing as Active after OpenCode completion (#12)
* Fix missions staying Active after completion with OpenCode backend

- Add TerminalReason::Completed variant for successful task completion
- Set terminal_reason in OpenCodeAgent on success to trigger auto-complete
- Update control.rs to explicitly handle Completed terminal reason
- Update CLAUDE.md with OpenCode backend documentation

* Improve iOS dashboard UI polish

- Remove harsh input field border, use ultraThinMaterial background with subtle focus glow
- Clean up model selector pills: remove ugly truncated mission IDs, increase padding
- Remove agent working indicator border for cleaner look
- Increase input area bottom padding for better thumb reach

* Add real-time event streaming for OpenCode backend

- Add SSE streaming support to OpenCodeClient via /event endpoint
- Parse and forward OpenCode events (thinking, tool_call, tool_result)
- Update OpenCodeAgent to consume stream and forward to control channel
- Add fallback to blocking mode if SSE connection fails

This enables live UI updates in the dashboard when using OpenCode backend.

* Fix running mission tracking to use actual executing mission ID

Track the mission ID that the main `running` task is actually working on
separately from `current_mission`, which can change when the user creates
a new mission. This ensures ListRunning and GracefulShutdown correctly
identify which mission is being executed.
2026-01-02 08:16:50 +00:00
Thomas Marchand
640d2b39fd Merge pull request #11 from lfglabs-dev/Th0rgal/open-code-refactor
Add OpenCode integration for backend execution
2026-01-02 07:49:00 +00:00
Thomas Marchand
50e0b0df26 Add .env*.local to dashboard gitignore
Vercel CLI automatically added this entry when pulling env vars.
2026-01-02 07:48:33 +00:00
Thomas Marchand
610b9366f2 Add OpenCode integration for backend execution
- Add OpenCode HTTP client module (src/opencode/mod.rs)
- Add OpenCodeAgent for delegating task execution (src/agents/opencode.rs)
- Update config to support AGENT_BACKEND selection (opencode/local)
- Fix path canonicalization for OpenCode directory requirement
- Update routes to use OpenCodeAgent when backend=opencode
2026-01-02 07:39:24 +00:00
Thomas Marchand
4fa25b9d70 Merge pull request #10 from lfglabs-dev/Th0rgal/fix-image-auth
Fix image preview authentication in FilePreviewModal
2025-12-26 19:49:52 +03:00
Thomas Marchand
17b021b313 Fix race condition causing blob URL memory leak
Add staleness check to prevent updating state/refs after cleanup runs when path changes during an in-flight fetch.
2025-12-26 17:39:31 +01:00
Thomas Marchand
3f487829ea Fix image preview authentication in FilePreviewModal
The img tag cannot send custom HTTP headers, causing authenticated image requests to fail. Fetch the image as a blob with proper Bearer token authentication, then use a blob URL for the src attribute.
2025-12-26 11:07:12 +01:00
Thomas Marchand
0b46c4e91f Merge pull request #9 from lfglabs-dev/Th0rgal/client-improvements
Improve web and iOS clients with enhanced UX features
2025-12-26 12:50:30 +03:00
Thomas Marchand
7356aade92 Improve web and iOS clients with enhanced UX features
Add timestamps to all messages, syntax-highlighted code blocks with copy buttons, file preview modal with syntax highlighting, analytics dashboard, quick action templates, and extended iOS ToolUI support for progress bars, alerts, and code blocks.
2025-12-26 10:09:45 +01:00
Thomas Marchand
37fede3105 Merge pull request #8 from lfglabs-dev/Th0rgal/agent-improvements
Enhance agent capabilities with smart pivoting and model routing
2025-12-26 11:38:16 +03:00
Thomas Marchand
1289be8d44 Remove unused summarize_large_results config option
The field was defined and configurable via SUMMARIZE_LARGE_RESULTS env var,
but never actually used in any code path. LLM-based summarization of large
tool results was not implemented. Remove to avoid misleading configuration.
2025-12-26 09:24:43 +01:00
Thomas Marchand
04e15e34cd Fix blocker false positives and truncation char/byte mismatch
- Remove generic "audit" keyword from Solidity task detection to avoid
  false positive TypeMismatch blockers on non-Solidity audit tasks
- Add Solidity-specific keywords: .sol, evm, foundry, hardhat
- Fix DeepSearch truncation check to compare char count (not byte count)
  to match the chars().take(10000) truncation logic
- Add test for generic audit not triggering false positive
2025-12-26 09:10:23 +01:00
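The char/byte mismatch fixed here comes from Rust's `str::len()` returning bytes while `chars().take(n)` counts Unicode scalar values. A minimal sketch of a consistent check follows; `truncate_chars` is a hypothetical helper, not the DeepSearch code itself.

```rust
/// Truncate to at most `max_chars` Unicode scalar values.
/// Returns the (possibly shortened) text and whether anything was cut.
/// Hypothetical helper, for illustration only.
pub fn truncate_chars(text: &str, max_chars: usize) -> (String, bool) {
    // Compare like with like: `text.len()` is a byte count and would
    // report truncation for multi-byte text that actually fits.
    let was_truncated = text.chars().count() > max_chars;
    let kept: String = text.chars().take(max_chars).collect();
    (kept, was_truncated)
}
```

For example, "héllo" is 5 chars but 6 bytes, so a byte-based check against a limit of 5 would wrongly flag it as truncated.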
Thomas Marchand
a6346051c4 Fix env var thresholds for truncation not being applied
MAX_TOOL_RESULT_CHARS env var was loaded into ExecutionThresholds but
the truncation logic used ctx.config.context.max_tool_result_chars
directly. Now thresholds properly override config default when set.
2025-12-26 08:58:47 +01:00
Thomas Marchand
8d2806fe29 Fix pre-existing test failures in budget and llm modules
- benchmarks: Fix test_normalize_id to expect '/' to be preserved
  (provider prefix is needed for matching, only removes :, -, _, .)

- learned: Fix test_select_model_prefers_high_success_low_cost to use
  values that correctly trigger the scoring formula behavior

- retry: Fix test_budget_exhausted_with_progress by using 85% budget
  (condition is > 0.8, not >= 0.8)

- error: Fix exponential_backoff to cap total delay (including jitter)
  at 60 seconds, not just the base delay before jitter is added
2025-12-26 08:44:08 +01:00
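The `exponential_backoff` fix in this entry is about where the 60-second cap is applied. The sketch below assumes a 1s starting delay and takes the jitter as a parameter so the function stays deterministic; both choices are assumptions for illustration, not the repository's signature.

```rust
use std::time::Duration;

const MAX_DELAY: Duration = Duration::from_secs(60);

/// Exponential backoff where the cap applies to base + jitter.
/// `jitter` is passed in so the function stays pure and testable; a real
/// caller would draw it from a random source. Hypothetical signature.
pub fn exponential_backoff(attempt: u32, jitter: Duration) -> Duration {
    // 1s, 2s, 4s, ... with the shift bounded to avoid overflow.
    let base = Duration::from_secs(1u64 << attempt.min(16));
    // Capping `base` alone would let the jitter push the total past 60s.
    (base + jitter).min(MAX_DELAY)
}
```

With the cap applied last, attempt 6 (a 64s base) plus 5s of jitter yields exactly 60s rather than 65s.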
Thomas Marchand
620f35991f Enhance agent capabilities with smart pivoting and adaptive model selection
- Implement smart tool result handling with UTF-8-safe truncation
- Add category-aware pivot prompts when agent gets stuck in loops
- Wire up benchmark-based model routing for optimal task-type matching
- Create 4 new composite tools (analyze_codebase, deep_search, prepare_project, debug_error)
- Implement configurable execution thresholds via environment variables
- Add blocker detection for early termination of impossible tasks
- Improve tool failure tracking with cross-category fallback suggestions

These improvements reduce iteration count, provide better guidance when stuck,
and automatically select the right model for each task type.
2025-12-26 08:39:59 +01:00
Thomas Marchand
747b455a4f Merge pull request #7 from lfglabs-dev/Th0rgal/fix-build
Add TerminalReason enum to track execution failure modes
2025-12-25 22:53:03 +03:00
Thomas Marchand
f1f86d787e Add missing RunningMissionsBar.swift to iOS Xcode project
The file existed but wasn't included in the project.pbxproj, causing the build to fail with 'cannot find RunningMissionsBar in scope'.
2025-12-25 20:48:38 +01:00
Thomas Marchand
52ca4b00e9 Merge pull request #6 from lfglabs-dev/Th0rgal/claude-context-setup
Add Claude context configuration files
2025-12-25 22:45:54 +03:00
Thomas Marchand
37dfac1472 Remove outdated leaf agent docs, reflect SimpleAgent architecture
The agent system now uses SimpleAgent → TaskExecutor, not the old
hierarchical orchestrator. Remove "Adding a New Leaf Agent" sections
from both CLAUDE.md and cursor rules as they reference deprecated
RootAgent/LeafAgent patterns.
2025-12-25 20:45:12 +01:00
Thomas Marchand
4a76da16f6 Expand Rust conventions with provability-first design principles
Add pure functions, algebraic types, error handling examples, leaf agent creation guide, and enhanced design system notes.
2025-12-25 20:42:54 +01:00
Thomas Marchand
c8724621ca Add TerminalReason enum and terminal_reason field to AgentResult
This adds tracking of execution termination reasons (cancellation, budget exhaustion, LLM errors, stalling, infinite loops, max iterations) to properly distinguish between different failure modes in agent execution.
2025-12-25 20:42:42 +01:00
Thomas Marchand
fb2f3407b4 Add Claude context configuration files
Add .claude/CLAUDE.md with project documentation (architecture, commands, conventions, env vars) and .claude/settings.json with tool permissions for streamlined agent development.
2025-12-25 20:41:20 +01:00
Thomas Marchand
812bc4dc08 Merge pull request #5 from lfglabs-dev/fixes
ios fix
2025-12-25 12:04:57 +03:00
Thomas Marchand
96f0dad563 ios fix
2025-12-25 09:49:58 +01:00
Thomas Marchand
067afb28d0 Merge pull request #4 from lfglabs-dev/fixes
Fixes
2025-12-25 11:32:56 +03:00
Thomas Marchand
950c238d6b fix: interruption
2025-12-25 07:20:09 +01:00
Thomas Marchand
121cb2b7b9 fix: tool duration
2025-12-24 20:55:19 +01:00
Thomas Marchand
9abb646699 fix: open agent reports
2025-12-24 18:47:34 +01:00
Thomas Marchand
4c5e355640 fix: bugbot reported
2025-12-23 21:21:13 +01:00
Thomas Marchand
76fa7ebe89 feat: upload progress bar, URL download, and chunked uploads
1. Upload progress bar - shows real-time progress with bytes/percentage
2. URL download - paste any URL, server downloads directly (faster for large files)
3. Chunked uploads - files >10MB split into 5MB chunks with retry (3 attempts)

Dashboard changes:
- Progress bar UI with bytes transferred
- Link icon button to paste URLs
- Uses chunked upload for large files automatically

Backend changes:
- /api/fs/upload-chunk - receives file chunks
- /api/fs/upload-finalize - assembles chunks into final file
- /api/fs/download-url - server downloads from URL to filesystem
2025-12-23 19:30:06 +01:00
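The chunking rule in this entry (files over 10MB, 5MB chunks, 3 attempts) can be sketched as a range planner plus a retry wrapper. The sketch reads "MB" as MiB and uses hypothetical names (`plan_chunks`, `with_retry`); the actual client is part of the dashboard, so this Rust version only illustrates the arithmetic.

```rust
const CHUNK_THRESHOLD: u64 = 10 * 1024 * 1024; // files above this are chunked
const CHUNK_SIZE: u64 = 5 * 1024 * 1024;
const MAX_ATTEMPTS: u32 = 3;

/// Byte ranges (start, end-exclusive) to upload for a file of `size` bytes.
/// Files at or below the threshold go up as a single range.
pub fn plan_chunks(size: u64) -> Vec<(u64, u64)> {
    if size <= CHUNK_THRESHOLD {
        return vec![(0, size)];
    }
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < size {
        let end = (start + CHUNK_SIZE).min(size);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

/// Run `send` up to MAX_ATTEMPTS times, passing the 1-based attempt number,
/// and return the last error if every attempt fails.
pub fn with_retry<E>(mut send: impl FnMut(u32) -> Result<(), E>) -> Result<(), E> {
    let mut attempt = 1;
    loop {
        match send(attempt) {
            Ok(()) => return Ok(()),
            Err(e) if attempt >= MAX_ATTEMPTS => return Err(e),
            Err(_) => attempt += 1,
        }
    }
}
```

Each planned range would be sent through `with_retry` before a final call assembles the file, matching the upload-chunk / upload-finalize split listed above.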
Thomas Marchand
ce6f552d4a perf: skip SFTP for localhost file operations
When CONSOLE_SSH_HOST is 127.0.0.1/localhost, use direct file
operations instead of SSH/SFTP to itself. This makes uploads instant
instead of going through the full SFTP overhead.

Optimizes: upload, download, list, mkdir, rm
2025-12-23 19:09:21 +01:00
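The localhost shortcut described in this entry amounts to a host check made before choosing a transport. A minimal sketch, assuming the host string is compared literally against the two values named in the message; `FileBackend` and `choose_backend` are hypothetical names.

```rust
/// Which transport to use for a file operation. Hypothetical type.
#[derive(Debug, PartialEq)]
pub enum FileBackend {
    /// Plain filesystem calls on the local machine.
    Direct,
    /// SSH/SFTP to a remote host.
    Sftp,
}

/// True when the configured SSH host points back at this machine.
pub fn is_local_host(host: &str) -> bool {
    matches!(host.trim(), "127.0.0.1" | "localhost")
}

/// Pick the transport for upload, download, list, mkdir, and rm.
pub fn choose_backend(ssh_host: &str) -> FileBackend {
    if is_local_host(ssh_host) {
        FileBackend::Direct
    } else {
        FileBackend::Sftp
    }
}
```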
Thomas Marchand
4217dbe038 fix: dropdown resume option for blocked missions + accurate loop warning
1. Dropdown now shows "Continue Mission" for blocked status (was only
   showing "Reactivate" which doesn't provide resume context)
2. Loop warning message now accurately shows remaining attempts before
   termination instead of always saying "next call will terminate"
2025-12-23 17:37:17 +01:00
Thomas Marchand
71bcc57bf6 fix: add missing blocked/not_feasible status mappings in load_mission_from_db
The helper was missing these cases, causing blocked missions to be
loaded as active and breaking the resume check.
2025-12-23 13:20:03 +01:00
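The bug in this entry is the classic catch-all match arm: any status string without its own arm silently becomes the default. The sketch below shows the shape of the fix; the enum, the status strings other than `blocked` and `not_feasible`, and the `can_resume` rule are all assumptions for illustration.

```rust
/// Mission status as loaded from the database. Hypothetical subset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MissionStatus {
    Active,
    Completed,
    Blocked,
    NotFeasible,
}

/// Map the stored status string to the enum.
pub fn status_from_db(raw: &str) -> MissionStatus {
    match raw {
        "completed" => MissionStatus::Completed,
        "blocked" => MissionStatus::Blocked,
        "not_feasible" => MissionStatus::NotFeasible,
        // Without the two arms above, blocked missions fall through to
        // here and load as Active.
        _ => MissionStatus::Active,
    }
}

/// Assumed resume rule: only a blocked mission can be resumed.
pub fn can_resume(status: MissionStatus) -> bool {
    status == MissionStatus::Blocked
}
```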
Thomas Marchand
cabb6f926d fix: detect hallucinated Supabase image URLs in responses
LLMs sometimes generate plausible-looking image URLs without actually
uploading images. This adds validation to detect URLs that weren't
in the pending_uploads list and warns the model to use actual tools.
2025-12-23 12:02:58 +01:00
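The validation described in this entry can be sketched as a scan for storage URLs that are absent from the set of real uploads. The function name, the prefix parameter, and the URL-boundary rule are assumptions; only the idea of checking against `pending_uploads` comes from the message.

```rust
use std::collections::HashSet;

/// Return every URL in `response` that starts with `storage_prefix` but was
/// not produced by an actual upload. Hypothetical helper; `storage_prefix`
/// is assumed to be non-empty.
pub fn find_unverified_urls<'a>(
    response: &'a str,
    storage_prefix: &str,
    pending_uploads: &HashSet<String>,
) -> Vec<&'a str> {
    let mut found = Vec::new();
    for (start, _) in response.match_indices(storage_prefix) {
        let rest = &response[start..];
        // Assume a URL ends at whitespace or at a closing delimiter such as
        // `)` in a markdown image, `"` in an attribute, or `>`.
        let end = rest
            .find(|c: char| c.is_whitespace() || matches!(c, ')' | '"' | '>'))
            .unwrap_or(rest.len());
        let url = &rest[..end];
        if !pending_uploads.contains(url) {
            found.push(url);
        }
    }
    found
}
```

A non-empty result would trigger the warning that tells the model to upload through the actual tools.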
Thomas Marchand
3578c5bb40 fix: improve infinite loop detection with earlier warning and context
- Lower warning threshold from 3 to 2 repetitions
- Lower force-complete threshold from 5 to 4 repetitions
- Include last tool result in warning message so model sees WHY it's failing
- Make warning message more actionable with specific suggestions
2025-12-23 10:54:02 +01:00
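The thresholds in this entry (warn at 2 repetitions, force-complete at 4) can be sketched as a counter over consecutive identical tool calls. How the real detector defines a "repetition" is not stated, so counting the first call as repetition 1 is an assumption, as are the `LoopDetector` and `LoopAction` names.

```rust
/// What the executor should do after recording a tool call. Hypothetical type.
#[derive(Debug, PartialEq)]
pub enum LoopAction {
    Continue,
    /// Warn the model; `remaining` is how many more identical calls are
    /// allowed before the run is force-completed.
    Warn { remaining: u32 },
    ForceComplete,
}

const WARN_AT: u32 = 2;
const FORCE_COMPLETE_AT: u32 = 4;

/// Counts consecutive identical tool calls. Hypothetical type.
#[derive(Default)]
pub struct LoopDetector {
    last_call: Option<String>,
    repetitions: u32,
}

impl LoopDetector {
    /// Record one tool call, identified by a signature such as name + args.
    pub fn record(&mut self, call_signature: &str) -> LoopAction {
        if self.last_call.as_deref() == Some(call_signature) {
            self.repetitions += 1;
        } else {
            self.last_call = Some(call_signature.to_string());
            self.repetitions = 1;
        }
        if self.repetitions >= FORCE_COMPLETE_AT {
            LoopAction::ForceComplete
        } else if self.repetitions >= WARN_AT {
            LoopAction::Warn { remaining: FORCE_COMPLETE_AT - self.repetitions }
        } else {
            LoopAction::Continue
        }
    }
}
```

Carrying `remaining` in the warning is what lets the message state the real number of attempts left instead of a fixed "next call will terminate".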
Thomas Marchand
261794ffe7 feat: filter and group models in dropdown
- Remove llama models from selection
- Remove OpenAI o-series models (o1, o3, etc.)
- Group models by provider (Google, DeepSeek, Qwen, Anthropic, Mistral, OpenAI)
- Sort within each category alphabetically
2025-12-23 08:54:38 +01:00
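The filtering and grouping in this entry can be sketched as below. The sketch assumes model IDs of the form `provider/name` and detects o-series models by an `o` followed by a digit; both are assumptions, and a `BTreeMap` orders providers alphabetically, which may differ from the dropdown's actual provider order.

```rust
use std::collections::BTreeMap;

/// Group model IDs by provider, dropping llama and OpenAI o-series models,
/// and sort each group alphabetically. Hypothetical helper.
pub fn group_models(model_ids: &[&str]) -> BTreeMap<String, Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for id in model_ids {
        // Skip IDs that do not follow the assumed "provider/name" shape.
        let Some((provider, name)) = id.split_once('/') else { continue };
        if name.contains("llama") || is_o_series(name) {
            continue;
        }
        groups.entry(provider.to_string()).or_default().push(id.to_string());
    }
    for models in groups.values_mut() {
        models.sort();
    }
    groups
}

/// "o1", "o3-mini", ...: the letter 'o' followed directly by a digit.
fn is_o_series(name: &str) -> bool {
    let mut chars = name.chars();
    chars.next() == Some('o') && chars.next().map_or(false, |c| c.is_ascii_digit())
}
```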