- Mark missions as "blocked" instead of "failed" when max iterations reached
- Allow resuming blocked missions (same as interrupted)
- Add "Continue Mission" button in UI for blocked missions
- Show iteration limit banner in conversation view
- Add GPT-5.2 and GPT-5.2-pro to model families
- Add qwen3-next-80b-a3b-thinking to model families
- Add friendly display names in dashboard dropdown (e.g., "4.5-sonnet", "gpt-5.2-pro")
- Update CAPABLE_MODEL_BASES allowlist
Key fixes for reliability issues identified during Merlin audit:
P0: Handle reasoning-only LLM responses
- Agent stalled when the LLM returned only reasoning with no action
- Now detects empty/reasoning-only responses and prompts model to act
- Force completes after 4 consecutive empty responses
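The detection logic above can be sketched as a small counter. This is a minimal illustration, not the project's code: `LlmTurn`, `NextStep`, and `EmptyResponseGuard` are hypothetical names, and only the limit of 4 comes from the note above.

```rust
/// Limit taken from the note above: give up after 4 empty turns in a row.
const MAX_CONSECUTIVE_EMPTY: u32 = 4;

/// Hypothetical shape of one model turn; the real response type is not shown here.
pub struct LlmTurn {
    pub content: Option<String>,
    pub reasoning: Option<String>,
    pub tool_calls: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum NextStep {
    /// The model produced text or a tool call; continue normally.
    Proceed,
    /// Reasoning-only or empty turn; prompt the model to act.
    Nudge,
    /// Too many empty turns in a row; force completion.
    ForceComplete,
}

#[derive(Default)]
pub struct EmptyResponseGuard {
    consecutive_empty: u32,
}

impl EmptyResponseGuard {
    pub fn observe(&mut self, turn: &LlmTurn) -> NextStep {
        let has_text = turn
            .content
            .as_deref()
            .map_or(false, |c| !c.trim().is_empty());
        // Reasoning alone does not count as acting.
        if has_text || !turn.tool_calls.is_empty() {
            self.consecutive_empty = 0;
            return NextStep::Proceed;
        }
        self.consecutive_empty += 1;
        if self.consecutive_empty >= MAX_CONSECUTIVE_EMPTY {
            NextStep::ForceComplete
        } else {
            NextStep::Nudge
        }
    }
}
```

Any turn with content or a tool call resets the counter, so only an unbroken run of empty turns triggers forced completion.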
P1: Auto-complete missions on terminal states
- Missions stayed "active" forever after the agent loop ended
- Now auto-marks missions as failed when detecting terminal output
(max iterations, stall, budget exhausted, cancelled)
P2: Add LLM call timeout
- 5-minute timeout per LLM call to detect hangs
- Returns with partial results instead of hanging forever
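A per-call timeout of this kind can be illustrated with the standard library alone. The sketch below is an assumption about shape, not the actual implementation (which likely uses the async runtime's own timeout facility); `call_with_timeout` and `LLM_CALL_TIMEOUT` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// The 5-minute limit mentioned above.
pub const LLM_CALL_TIMEOUT: Duration = Duration::from_secs(5 * 60);

/// Runs `call` on a worker thread and waits at most `limit` for its result.
/// Returns `None` on timeout; the worker is left to finish in the background.
pub fn call_with_timeout<T, F>(limit: Duration, call: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails only if the receiver already gave up; ignore that case.
        let _ = tx.send(call());
    });
    rx.recv_timeout(limit).ok()
}
```

On `None`, the caller can return whatever partial results it has gathered so far instead of blocking indefinitely.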
P3: Track failed tool approaches
- Detects when the same tool category fails repeatedly
- Injects warnings suggesting pivot to different approach
- Helps break out of "try slither 20 times" loops
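One way to picture the failure tracking is a per-category counter that yields a warning once a threshold is crossed. This is a sketch under assumptions: `FailureTracker`, the threshold of 3, and the warning text are all hypothetical, since the notes above do not give the real values.

```rust
use std::collections::HashMap;

/// Hypothetical threshold; the real value is not stated in the notes.
const FAILURE_THRESHOLD: u32 = 3;

#[derive(Default)]
pub struct FailureTracker {
    failures: HashMap<String, u32>,
}

impl FailureTracker {
    /// Records one tool result. Returns a warning to inject into the
    /// conversation once a category has failed `FAILURE_THRESHOLD` times in a row.
    pub fn record(&mut self, category: &str, succeeded: bool) -> Option<String> {
        if succeeded {
            // A success clears the streak for that category.
            self.failures.remove(category);
            return None;
        }
        let count = self.failures.entry(category.to_string()).or_insert(0);
        *count += 1;
        let n = *count;
        if n >= FAILURE_THRESHOLD {
            Some(format!(
                "Tool category '{}' has failed {} times in a row. Consider a different approach.",
                category, n
            ))
        } else {
            None
        }
    }
}
```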
Events (Thinking, AgentPhase, AgentTree, Progress, ToolCall, ToolResult)
were being sent with mission_id: None even for parallel missions.
This caused the dashboard to show events from all missions when
selecting a specific mission. Now all events include the correct
mission_id from the execution context.
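The effect of tagging every event can be shown with a small filter. The `Event` struct and the `u64` id below are stand-ins for illustration; the real event types and id type are not shown in these notes.

```rust
/// Illustrative event shape; the real events carry more data per variant.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub kind: &'static str,
    /// `u64` is a stand-in for whatever id type the project uses.
    pub mission_id: Option<u64>,
}

/// Keeps only the events tagged with the mission being viewed.
pub fn events_for_mission(events: &[Event], viewing: u64) -> Vec<&Event> {
    events
        .iter()
        .filter(|e| e.mission_id == Some(viewing))
        .collect()
}
```

An event whose `mission_id` is `None` cannot be attributed to any mission, which is why a filter like this only becomes reliable once the execution context supplies the id.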
Chrome extensions don't work reliably in headless mode for proxy auth.
Instead, we now start gost (Go Simple Tunnel) as a local proxy forwarder
that handles authentication with the upstream proxy. Chrome connects to
the local gost proxy without needing any auth.
This approach works in both headless and GUI modes.
Chrome headless mode doesn't support extensions properly, so when
proxy authentication is required, we now automatically start an
Xvfb virtual display and run Chrome there instead.
- Add BROWSER_PROXY env var to configure proxy (format: user:pass@host:port)
- Add BROWSER_LAUNCH env var to auto-launch Chrome with proxy settings
- Add BROWSER_HEADLESS env var to control headless mode
- Create Chrome extension dynamically for proxy auth (HTTP/HTTPS)
- Support socks5:// and http:// scheme prefixes
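Parsing the `BROWSER_PROXY` format described above might look like the following. This is a sketch, not the project's parser: `ProxyConfig` and `parse_proxy` are hypothetical names, and defaulting to `http` when no scheme is given is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub struct ProxyConfig {
    pub scheme: String,
    pub username: Option<String>,
    pub password: Option<String>,
    pub host: String,
    pub port: u16,
}

/// Parses `[scheme://][user:pass@]host:port`. Returns `None` on malformed input.
pub fn parse_proxy(raw: &str) -> Option<ProxyConfig> {
    // Assumption: a missing scheme means plain HTTP.
    let (scheme, rest) = match raw.split_once("://") {
        Some((s, r)) => (s, r),
        None => ("http", raw),
    };
    // Split at the last '@' so a password may itself contain '@'.
    let (creds, addr) = match rest.rsplit_once('@') {
        Some((c, a)) => (Some(c), a),
        None => (None, rest),
    };
    let (host, port) = addr.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    let port: u16 = port.parse().ok()?;
    let (username, password) = match creds {
        Some(c) => match c.split_once(':') {
            Some((u, p)) => (Some(u.to_string()), Some(p.to_string())),
            None => (Some(c.to_string()), None),
        },
        None => (None, None),
    };
    Some(ProxyConfig {
        scheme: scheme.to_string(),
        username,
        password,
        host: host.to_string(),
        port,
    })
}
```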
- Add final_tree JSONB column to missions table for storing tree on completion
- Save tree snapshot when mission completes (via complete_mission tool or manual status change)
- Add GET /api/control/missions/:id/tree endpoint to fetch mission-specific tree
- Update dashboard to fetch and display the saved tree for finished missions instead of a fallback
- "interesting" was matching "test" due to substring check
- Now uses word boundaries to avoid false positives
- Added browser/screenshot to ToolCalling task type
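The difference between a substring check and a word-boundary check can be sketched without a regex crate. `contains_word` is a hypothetical helper, not the project's classifier; it treats any character that is not alphanumeric or an underscore as a boundary.

```rust
/// Returns true only if `keyword` appears as a whole word in `text`,
/// compared case-insensitively.
pub fn contains_word(text: &str, keyword: &str) -> bool {
    let keyword = keyword.to_lowercase();
    text.to_lowercase()
        // Split on anything that cannot be part of a word.
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .any(|word| word == keyword)
}
```

A plain `text.contains("test")` would also match inside longer words such as "latest" or "contest"; the whole-word comparison does not.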
- return_image: Agent can SEE the screenshot (via VISION_IMAGE marker)
- upload: Upload for sharing with user (default: true)
This matches the pattern used by desktop_screenshot.
browser_screenshot now automatically uploads to Supabase Storage when
configured and returns markdown directly. This is a one-step workflow
instead of requiring upload_image as a separate step.
- Default model changed from Claude to qwen/qwen3-next-80b-a3b-thinking
- Improved upload_image tool description with explicit instructions
- Added upload tracking: executor now tracks pending uploads
- Validation on complete_mission: warns the agent if uploads are not included in the response
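The upload tracking and completion check can be pictured roughly as below. `UploadTracker` and its methods are hypothetical, and matching by URL substring is an assumption about how "included in the response" is decided.

```rust
#[derive(Default)]
pub struct UploadTracker {
    pending: Vec<String>,
}

impl UploadTracker {
    /// Remembers a URL returned by an upload tool.
    pub fn record_upload(&mut self, url: &str) {
        self.pending.push(url.to_string());
    }

    /// Returns the uploaded URLs that do not appear in the final response.
    /// A non-empty result is the cue to warn the agent before completing.
    pub fn missing_from<'a>(&'a self, response: &str) -> Vec<&'a str> {
        self.pending
            .iter()
            .map(String::as_str)
            .filter(|url| !response.contains(*url))
            .collect()
    }
}
```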
- Add tool selection guide to system prompts (browser vs fetch, built-in vs MCP)
- Filter out MCP tools that overlap with built-in tools (filesystem, puppeteer)
- Add validation in complete_mission to require deliverables or explanation
- Improve prompts for tool priority (CLI > browser > desktop)
- Add MARKETING_VERSION and CURRENT_PROJECT_VERSION to project
- Update Info.plist to use build setting variables
- Required for Xcode Cloud App Store distribution
The iOS dashboard was showing events from ALL running missions,
not just the currently viewed mission. Now it filters SSE events
by mission_id to match the web dashboard behavior.
This prevents confusion when parallel missions are running.
P0: Prefix MCP tools with server name
- MCP tools now prefixed (e.g., filesystem_read_file)
- Prevents duplicate function name errors with xAI/Google
- strip_prefix() used when calling actual MCP server
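The prefixing scheme can be illustrated with two small helpers. The signatures here are assumptions: the notes name `strip_prefix()` but do not show its parameters, and `prefixed_name` is a hypothetical counterpart.

```rust
/// Builds the name exposed to the model, e.g. "filesystem" + "read_file"
/// becomes "filesystem_read_file".
pub fn prefixed_name(server: &str, tool: &str) -> String {
    format!("{}_{}", server, tool)
}

/// Recovers the tool name the MCP server actually knows.
/// Falls back to the input unchanged if it does not carry the expected prefix.
pub fn strip_prefix<'a>(server: &str, prefixed: &'a str) -> &'a str {
    prefixed
        .strip_prefix(server)
        .and_then(|rest| rest.strip_prefix('_'))
        .unwrap_or(prefixed)
}
```

Because two servers may both expose a tool called `read_file`, the prefix keeps the function names sent to the provider unique, while the server still receives its original tool name.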
P1: Filter mission summaries by current mission
- In mission mode, only inject summaries from THIS mission
- Prevents cross-mission contamination from past task summaries
- User facts still shared (intentional)
P2: xAI/Google now work with MCP (no duplicate function names)
- Pass McpRegistry to mission runner and agent context
- Route tool calls to MCP servers when built-in tool not found
- Include MCP tool descriptions and schemas in system prompt
- Add has_tool() method to ToolRegistry for routing
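The routing rule above (built-in first, MCP as fallback) can be sketched as follows. Only the `has_tool()` name comes from the notes; the registry internals, the `Route` enum, and the use of a second `ToolRegistry` as a stand-in for `McpRegistry` are assumptions.

```rust
use std::collections::HashSet;

pub struct ToolRegistry {
    names: HashSet<String>,
}

impl ToolRegistry {
    pub fn new(names: &[&str]) -> Self {
        Self {
            names: names.iter().map(|n| n.to_string()).collect(),
        }
    }

    /// True if a tool with this exact name is registered.
    pub fn has_tool(&self, name: &str) -> bool {
        self.names.contains(name)
    }
}

#[derive(Debug, PartialEq)]
pub enum Route {
    BuiltIn,
    Mcp,
    Unknown,
}

/// Built-in tools take priority; MCP servers are consulted only when
/// no built-in tool has the requested name.
pub fn route_tool_call(builtin: &ToolRegistry, mcp: &ToolRegistry, name: &str) -> Route {
    if builtin.has_tool(name) {
        Route::BuiltIn
    } else if mcp.has_tool(name) {
        Route::Mcp
    } else {
        Route::Unknown
    }
}
```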
- Add shared resolve_path() and PathResolution utilities to tools/mod.rs
- Update all file/directory/search/terminal tools to use shared utilities
- Emphasize relative paths in tool descriptions (e.g., "output/report.md")
- Show resolved paths in tool outputs for clarity
- Remove duplicated resolve_path functions from individual tool files
This encourages agents to naturally work within their workspace while
preserving the ability to access system files via absolute paths.
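A shared resolver along these lines would give that behavior. The names `resolve_path` and `PathResolution` come from the notes, but the fields and signature below are assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical shape; the real struct's fields are not shown in the notes.
#[derive(Debug, PartialEq)]
pub struct PathResolution {
    pub resolved: PathBuf,
    pub was_relative: bool,
}

/// Relative inputs are joined onto the working directory;
/// absolute inputs are passed through untouched.
pub fn resolve_path(working_dir: &Path, input: &str) -> PathResolution {
    let candidate = Path::new(input);
    if candidate.is_absolute() {
        PathResolution {
            resolved: candidate.to_path_buf(),
            was_relative: false,
        }
    } else {
        PathResolution {
            resolved: working_dir.join(candidate),
            was_relative: true,
        }
    }
}
```

Keeping `was_relative` alongside the resolved path lets each tool report in its output how the path was interpreted.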
- Set ctx.mission_id in mission runner (was always None)
- Add build_memory_context_with_options() with skip flag
- Skip cross-mission chunk retrieval when mission_id is set
- User facts and mission summaries still injected (shared intentionally)
- Split system prompt into mission mode vs regular mode
- Mission mode: all paths use {working_dir} variable, no hardcoded /root/work/
- Examples now use mission's assigned directory
- Simplified session context mission indicator
- Removes conflicting guidance about creating new folders in /root/work/
- Fix event filtering: properly filter events by mission_id when viewing parallel missions
- Fix new mission visibility: update viewingMissionId when creating a mission
- Add model override input: new mission dialog with optional model override field
Mistral sends prose as function names (e.g., "### Step 2: Explore...")
causing HTTP 400 errors: "Function name must be a-z, A-Z, 0-9, underscores, dashes"
Mistral models are now on the blocklists in both compatibility.rs and pricing.rs.
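For reference, the rule quoted in the error message can be expressed as a one-line check. This is only an illustration of that rule, not part of the fix described above (which blocklists the models); `is_valid_function_name` is a hypothetical helper.

```rust
/// Mirrors the rule in the error message: a-z, A-Z, 0-9, underscores, dashes.
pub fn is_valid_function_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}
```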