356 Commits

Author SHA1 Message Date
Thomas Marchand
5f62e7a1c6 fix: import TerminalReason in control.rs 2025-12-23 08:51:26 +01:00
Thomas Marchand
486db50be3 feat: make max iterations resumable with Continue button
- Mark missions as "blocked" instead of "failed" when max iterations reached
- Allow resuming blocked missions (same as interrupted)
- Add "Continue Mission" button in UI for blocked missions
- Show iteration limit banner in conversation view
2025-12-23 08:50:38 +01:00
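A minimal sketch of the status mapping this commit describes, assuming a mission status enum and a separate "why did the loop stop" enum. The names `MissionStatus`, `StopReason`, `status_for` and `can_resume` are hypothetical and are not the codebase's `TerminalReason` or its real types.

```rust
/// Hypothetical mission status; variant names mirror the commit text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MissionStatus {
    Active,
    Completed,
    Failed,
    Interrupted,
    Blocked,
}

/// Hypothetical reason an agent loop stopped.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason {
    Finished,
    Error,
    MaxIterations,
    Cancelled,
}

/// Map a stop reason to the status persisted on the mission.
pub fn status_for(reason: StopReason) -> MissionStatus {
    match reason {
        StopReason::Finished => MissionStatus::Completed,
        StopReason::Error => MissionStatus::Failed,
        // Hitting the iteration cap is not a failure: the user can continue.
        StopReason::MaxIterations => MissionStatus::Blocked,
        StopReason::Cancelled => MissionStatus::Interrupted,
    }
}

/// Blocked missions are resumable, the same as interrupted ones.
pub fn can_resume(status: MissionStatus) -> bool {
    matches!(status, MissionStatus::Interrupted | MissionStatus::Blocked)
}
```

In this sketch a "Continue Mission" button would only be shown when `can_resume` returns true.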
Thomas Marchand
045b704712 Merge pull request #3 from lfglabs-dev/fixes
Fixes
2025-12-23 00:40:42 +03:00
Thomas Marchand
f20b84f172 feat: add GPT-5.2 and qwen3-thinking models, friendlier display names
- Add GPT-5.2 and GPT-5.2-pro to model families
- Add qwen3-next-80b-a3b-thinking to model families
- Add friendly display names in dashboard dropdown (e.g., "4.5-sonnet", "gpt-5.2-pro")
- Update CAPABLE_MODEL_BASES allowlist
2025-12-22 21:39:25 +01:00
Thomas Marchand
331fd5025c fix: skip MCP tools that conflict with built-in tools 2025-12-22 18:01:40 +01:00
Thomas Marchand
c1f2e4c2c7 fix: add hint when command exits non-zero but produces output 2025-12-22 17:56:14 +01:00
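A sketch of the idea behind this fix, assuming the terminal tool formats exit code, stdout and stderr into one string. Tools such as `diff` exit with status 1 when files differ yet still print useful output, so a bare non-zero code can mislead the agent. The function name and hint wording are hypothetical.

```rust
/// Hypothetical formatter for a terminal tool result. When the command exits
/// non-zero but still wrote to stdout, append a hint so the agent reads the
/// output instead of assuming the command produced nothing.
pub fn format_command_result(exit_code: i32, stdout: &str, stderr: &str) -> String {
    let mut out = format!("exit code: {exit_code}\n{stdout}{stderr}");
    if exit_code != 0 && !stdout.trim().is_empty() {
        out.push_str(
            "\nHint: the command exited non-zero but produced output; inspect it before retrying.",
        );
    }
    out
}
```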
Thomas Marchand
73b29d45d8 wip: fixes 2025-12-22 17:04:31 +01:00
Thomas Marchand
8daa252c20 fix: binary data 2025-12-22 16:36:41 +01:00
Thomas Marchand
255440fb1f fix: reported issues 2025-12-22 16:25:32 +01:00
Thomas Marchand
68a06807e0 refactor: cleanup 2025-12-22 09:13:47 +00:00
Thomas Marchand
ae19f125b6 refactor: navigation 2025-12-22 09:01:07 +00:00
Thomas Marchand
ba8bf6a4b0 cleanup 2025-12-22 08:49:48 +00:00
Thomas Marchand
2f62842ecd fix 2025-12-21 21:37:54 +00:00
Thomas Marchand
93205cfb59 fix: open agent 2025-12-21 21:16:48 +00:00
Thomas Marchand
e24f3f01c2 fix: prevent agent stalls and auto-complete stuck missions
Key fixes for reliability issues identified during Merlin audit:

P0: Handle reasoning-only LLM responses
- Agent was stalling when LLM returned only thinking without action
- Now detects empty/reasoning-only responses and prompts model to act
- Force completes after 4 consecutive empty responses

P1: Auto-complete missions on terminal states
- Missions were staying "active" forever after their loops ended
- Now auto-marks missions as failed when detecting terminal output
  (max iterations, stall, budget exhausted, cancelled)

P2: Add LLM call timeout
- 5-minute timeout per LLM call to detect hangs
- Returns with partial results instead of hanging forever

P3: Track failed tool approaches
- Detects when same tool category fails repeatedly
- Injects warnings suggesting pivot to different approach
- Helps break out of "try slither 20 times" loops
2025-12-21 20:24:06 +00:00
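A minimal sketch of P0 and P3, assuming a response counts as "reasoning-only" when it has no visible content and no tool calls. Only the threshold of 4 consecutive empty responses comes from the commit; `StallGuard`, `Next` and the failure threshold parameter are hypothetical.

```rust
use std::collections::HashMap;

/// From the commit: force completion after 4 consecutive empty responses.
const MAX_EMPTY_RESPONSES: u32 = 4;

#[derive(Debug, PartialEq, Eq)]
pub enum Next {
    Continue,
    PromptToAct,
    ForceComplete,
}

#[derive(Default)]
pub struct StallGuard {
    empty_streak: u32,
    failures_by_category: HashMap<String, u32>,
}

impl StallGuard {
    /// Inspect one LLM response: reasoning-only replies (no content, no tool
    /// calls) bump a streak counter; any real action resets it.
    pub fn on_response(&mut self, content: &str, has_tool_calls: bool) -> Next {
        if has_tool_calls || !content.trim().is_empty() {
            self.empty_streak = 0;
            return Next::Continue;
        }
        self.empty_streak += 1;
        if self.empty_streak >= MAX_EMPTY_RESPONSES {
            Next::ForceComplete
        } else {
            Next::PromptToAct
        }
    }

    /// Record a failed tool call by category; once a category reaches
    /// `threshold` failures, return a warning to inject into the conversation.
    pub fn on_tool_failure(&mut self, category: &str, threshold: u32) -> Option<String> {
        let count = self
            .failures_by_category
            .entry(category.to_string())
            .or_insert(0);
        *count += 1;
        if *count >= threshold {
            Some(format!(
                "'{category}' tools have failed {count} times; try a different approach."
            ))
        } else {
            None
        }
    }
}
```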
Thomas Marchand
88745345ab refactor: Replace complex agent hierarchy with SimpleAgent
Remove over-engineered multi-agent orchestration that added overhead
without improving reliability:

- Delete ComplexityEstimator (LLM-based, unreliable)
- Delete ModelSelector (U-curve optimization, over-engineered)
- Delete NodeAgent (recursive splitting lost context)
- Delete Verifier (rubber-stamped everything)
- Delete RootAgent (complex orchestration)
- Delete calibrate binary (no longer needed)

Add SimpleAgent that:
- Directly executes tasks via TaskExecutor
- Simple model selection (user override or config default)
- No automatic task splitting (user controls granularity)

Also update TaskExecutor prompts with anti-fabrication rules to
prevent fake content generation when blocked.

Preserves: parallel missions, SSE events, message history, agent tree.
2025-12-21 19:21:59 +00:00
Thomas Marchand
8896410830 wip: checkpoint 2025-12-21 17:35:05 +00:00
Thomas Marchand
58a4dc995c Fix mission_id missing from SSE events for parallel missions
Events (Thinking, AgentPhase, AgentTree, Progress, ToolCall, ToolResult)
were being sent with mission_id: None even for parallel missions.
This caused the dashboard to show events from all missions when
selecting a specific mission. Now all events include the correct
mission_id from the execution context.
2025-12-21 15:33:48 +00:00
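A sketch of the fix, assuming events are plain structs with an optional `mission_id`. The producer stamps each event from the execution context, and a viewer keeps only events for the selected mission. `AgentEvent`, `tag` and `events_for` are hypothetical names, and hiding untagged events is an assumption.

```rust
/// Hypothetical SSE event: every event carries the mission it belongs to.
#[derive(Debug, Clone)]
pub struct AgentEvent {
    pub mission_id: Option<String>,
    pub kind: String, // e.g. "Thinking", "ToolCall"
}

/// Stamp an event with the mission id from the execution context
/// (previously left as None for parallel missions).
pub fn tag(mut event: AgentEvent, ctx_mission_id: Option<&str>) -> AgentEvent {
    event.mission_id = ctx_mission_id.map(str::to_owned);
    event
}

/// Keep only events for the mission being viewed; untagged events are hidden.
pub fn events_for<'a>(events: &'a [AgentEvent], viewing: &str) -> Vec<&'a AgentEvent> {
    events
        .iter()
        .filter(|e| e.mission_id.as_deref() == Some(viewing))
        .collect()
}
```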
Thomas Marchand
38525b504f Remove unused proxy extension files (now using gost) 2025-12-21 14:49:15 +00:00
Thomas Marchand
f7f6e54e63 Add browser proxy env vars to documentation 2025-12-21 14:48:50 +00:00
Thomas Marchand
8e9d04729e Use gost proxy forwarder for browser proxy auth
Chrome extensions don't work reliably in headless mode for proxy auth.
Instead, we now start gost (Go Simple Tunnel) as a local proxy forwarder
that handles authentication with the upstream proxy. Chrome connects to
the local gost proxy without needing any auth.

This approach works in both headless and GUI modes.
2025-12-21 14:47:12 +00:00
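A sketch of the forwarder setup, assuming gost v2's `-L` (listen) and `-F` (forward) options and Chrome's `--proxy-server` switch. The local port and the bare `gost` binary name are assumptions, and the command is only built here, not spawned.

```rust
use std::process::Command;

/// Build a gost forwarder that listens locally without auth and forwards
/// to the authenticated upstream, e.g. "socks5://user:pass@host:1080".
pub fn gost_command(upstream: &str, local_port: u16) -> Command {
    let mut cmd = Command::new("gost");
    cmd.arg("-L")
        .arg(format!("http://127.0.0.1:{local_port}"))
        .arg("-F")
        .arg(upstream);
    cmd
}

/// Chrome then points at the local, credential-free proxy.
pub fn chrome_proxy_flag(local_port: u16) -> String {
    format!("--proxy-server=http://127.0.0.1:{local_port}")
}
```

Because Chrome never sees credentials, no extension is needed, which is why the same setup works headless and with a GUI.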
Thomas Marchand
7acd51551d Use virtual display (Xvfb) for browser when proxy auth is needed
Chrome headless mode doesn't support extensions properly, so when
proxy authentication is required, we now automatically start an
Xvfb virtual display and run Chrome there instead.
2025-12-21 14:34:47 +00:00
Thomas Marchand
c42f83d347 Add SOCKS5/HTTP proxy support for browser tools
- Add BROWSER_PROXY env var to configure proxy (format: user:pass@host:port)
- Add BROWSER_LAUNCH env var to auto-launch Chrome with proxy settings
- Add BROWSER_HEADLESS env var to control headless mode
- Create Chrome extension dynamically for proxy auth (HTTP/HTTPS)
- Support socks5:// and http:// scheme prefixes
2025-12-21 14:24:18 +00:00
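A sketch of parsing a `BROWSER_PROXY` value in the documented `user:pass@host:port` format with an optional `socks5://` or `http://` prefix. Defaulting to `http` when no scheme is given, and the `ProxyConfig` / `parse_browser_proxy` names, are assumptions.

```rust
/// Hypothetical parsed form of BROWSER_PROXY.
#[derive(Debug, PartialEq, Eq)]
pub struct ProxyConfig {
    pub scheme: String,
    pub username: Option<String>,
    pub password: Option<String>,
    pub host: String,
    pub port: u16,
}

pub fn parse_browser_proxy(raw: &str) -> Result<ProxyConfig, String> {
    // Optional scheme prefix; only the two documented schemes are accepted.
    let (scheme, rest) = match raw.split_once("://") {
        Some((s, r)) if s == "socks5" || s == "http" => (s.to_string(), r),
        Some((s, _)) => return Err(format!("unsupported scheme: {s}")),
        None => ("http".to_string(), raw),
    };
    // Split on the LAST '@' so a password containing '@' still parses.
    let (creds, addr) = match rest.rsplit_once('@') {
        Some((c, a)) => (Some(c), a),
        None => (None, rest),
    };
    let (host, port) = addr.rsplit_once(':').ok_or("missing host:port")?;
    let port: u16 = port.parse().map_err(|_| format!("invalid port: {port}"))?;
    let (username, password) = match creds {
        Some(c) => {
            let (u, p) = c.split_once(':').ok_or("credentials must be user:pass")?;
            (Some(u.to_string()), Some(p.to_string()))
        }
        None => (None, None),
    };
    Ok(ProxyConfig { scheme, username, password, host: host.to_string(), port })
}
```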
Thomas Marchand
8377e7539f fix: UI/UX 2025-12-21 09:30:01 +00:00
Thomas Marchand
2aed7240c0 feat: improved missions ux 2025-12-21 09:03:08 +00:00
Thomas Marchand
cb4b89ec12 feat: Save and display agent tree for finished missions
- Add final_tree JSONB column to missions table for storing tree on completion
- Save tree snapshot when mission completes (via complete_mission tool or manual status change)
- Add GET /api/control/missions/:id/tree endpoint to fetch mission-specific tree
- Update dashboard to fetch and display saved tree for finished missions instead of fallback
2025-12-21 08:51:51 +00:00
Thomas Marchand
f85ea14b3b Fix UTF-8 truncation panic in tool results
- Add safe_truncate_index helper to find valid char boundaries
- Fix truncation in executor.rs, browser.rs, web.rs, git.rs, memory.rs
- Fix truncation in context.rs, retriever.rs, control.rs, routes.rs
- Prevents panic when truncating strings with multi-byte chars (e.g. Chinese)
2025-12-20 21:39:38 +00:00
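A sketch of what a `safe_truncate_index` helper can look like in stable Rust. The name comes from the commit, but its real signature is not shown, so this one is assumed. Slicing `&s[..n]` panics when `n` falls inside a multi-byte character (each Chinese character below is 3 bytes in UTF-8), so the index is walked back to the nearest char boundary.

```rust
/// Largest index <= `max_bytes` that lies on a UTF-8 char boundary,
/// so `&s[..idx]` never panics.
pub fn safe_truncate_index(s: &str, max_bytes: usize) -> usize {
    if max_bytes >= s.len() {
        return s.len();
    }
    let mut idx = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    idx
}

/// Hypothetical caller: truncate a tool result for display.
pub fn truncate_for_output(s: &str, max_bytes: usize) -> String {
    let idx = safe_truncate_index(s, max_bytes);
    if idx < s.len() {
        format!("{}... [truncated]", &s[..idx])
    } else {
        s.to_string()
    }
}
```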
Thomas Marchand
c6fd67664b Fix task type inference using word boundaries
- "interesting" was matching "test" due to substring check
- Now uses word boundaries to avoid false positives
- Added browser/screenshot to ToolCalling task type
2025-12-20 21:25:27 +00:00
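A sketch of word-boundary keyword matching. A plain substring check fires on words that merely contain the keyword (for example "latest" contains "test"); splitting on non-alphanumeric characters and comparing whole words avoids that. The `TaskType::Testing` variant and keyword lists are hypothetical; only `ToolCalling` and the browser/screenshot keywords come from the commit.

```rust
/// True if `keyword` appears as a whole word (ASCII case-insensitive).
pub fn contains_word(text: &str, keyword: &str) -> bool {
    text.split(|c: char| !c.is_alphanumeric())
        .any(|w| w.eq_ignore_ascii_case(keyword))
}

#[derive(Debug, PartialEq, Eq)]
pub enum TaskType {
    Testing,
    ToolCalling,
    General,
}

pub fn infer_task_type(prompt: &str) -> TaskType {
    const TOOL_WORDS: [&str; 2] = ["browser", "screenshot"];
    if contains_word(prompt, "test") || contains_word(prompt, "tests") {
        TaskType::Testing
    } else if TOOL_WORDS.iter().any(|k| contains_word(prompt, k)) {
        TaskType::ToolCalling
    } else {
        TaskType::General
    }
}
```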
Thomas Marchand
ce92992fa5 Add qwen3-next and qwen3-235b to model allowlist
Also improve browser_screenshot to encourage page verification before
screenshotting.
2025-12-20 20:37:39 +00:00
Thomas Marchand
d6d69413c0 Add return_image and upload params to browser_screenshot
- return_image: Agent can SEE the screenshot (via VISION_IMAGE marker)
- upload: Upload for sharing with user (default: true)

This matches the pattern used by desktop_screenshot.
2025-12-20 20:08:58 +00:00
Thomas Marchand
a83202105f Auto-upload screenshots to Supabase
browser_screenshot now automatically uploads to Supabase Storage when
configured and returns markdown directly. This is a one-step workflow
instead of requiring upload_image as a separate step.
2025-12-20 20:03:50 +00:00
Thomas Marchand
8f0aa36b54 Change default model to qwen3-next and add upload image validation
- Default model changed from Claude to qwen/qwen3-next-80b-a3b-thinking
- Improved upload_image tool description with explicit instructions
- Added upload tracking: executor now tracks pending uploads
- Validation on complete_mission: warns agent if uploads not included in response
2025-12-20 19:56:20 +00:00
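A sketch of the upload tracking and `complete_mission` check, assuming uploads are tracked as URLs and "included" means the URL appears verbatim in the final response. `UploadTracker` and its methods are hypothetical.

```rust
/// Hypothetical tracker for URLs returned by upload_image.
#[derive(Default)]
pub struct UploadTracker {
    pending: Vec<String>,
}

impl UploadTracker {
    pub fn record(&mut self, url: impl Into<String>) {
        self.pending.push(url.into());
    }

    /// Called on complete_mission: warn about uploads the response omits.
    pub fn check_response(&self, response: &str) -> Option<String> {
        let missing: Vec<&str> = self
            .pending
            .iter()
            .map(String::as_str)
            .filter(|url| !response.contains(*url))
            .collect();
        if missing.is_empty() {
            None
        } else {
            Some(format!(
                "Uploaded images not included in your response: {}",
                missing.join(", ")
            ))
        }
    }
}
```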
Thomas Marchand
56d7421c3a Add tool selection guidance, MCP overlap filtering, and completion validation
- Add tool selection guide to system prompts (browser vs fetch, built-in vs MCP)
- Filter out MCP tools that overlap with built-in tools (filesystem, puppeteer)
- Add validation in complete_mission to require deliverables or explanation
- Improve prompts for tool priority (CLI > browser > desktop)
2025-12-20 18:26:53 +00:00
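A sketch of overlap filtering, assuming MCP tools are dropped either because their whole server duplicates built-in functionality (the commit names filesystem and puppeteer) or because a tool name collides with a built-in. `McpTool` and `filter_overlapping` are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical MCP tool descriptor.
#[derive(Debug, Clone)]
pub struct McpTool {
    pub server: String,
    pub name: String,
}

/// Remove tools from overlapping servers and tools whose names shadow built-ins.
pub fn filter_overlapping(
    tools: Vec<McpTool>,
    builtin_names: &HashSet<&str>,
    overlapping_servers: &[&str],
) -> Vec<McpTool> {
    tools
        .into_iter()
        .filter(|t| !overlapping_servers.iter().any(|s| *s == t.server))
        .filter(|t| !builtin_names.contains(t.name.as_str()))
        .collect()
}
```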
Thomas Marchand
4fc75b6a4c Add browser tools verification logging 2025-12-20 16:32:33 +00:00
Thomas Marchand
8f8aaae707 Add final registry count 2025-12-20 16:28:26 +00:00
Thomas Marchand
f5c7baa09f Add registry tracing 2025-12-20 16:24:39 +00:00
Thomas Marchand
ecabcc6947 Debug browser enabled env var 2025-12-20 16:21:07 +00:00
Thomas Marchand
328fdedaeb Add tool discovery logging 2025-12-20 16:18:57 +00:00
Thomas Marchand
d76a387344 Add browser tool registration logging 2025-12-20 16:09:38 +00:00
Thomas Marchand
8b834507ab fix(ios): add version settings for App Store Connect
- Add MARKETING_VERSION and CURRENT_PROJECT_VERSION to project
- Update Info.plist to use build setting variables
- Required for Xcode Cloud App Store distribution
2025-12-20 14:01:00 +00:00
Thomas Marchand
f7fe119e94 fix(ios): add per-mission event filtering to prevent cross-contamination
The iOS dashboard was showing events from ALL running missions,
not just the currently viewed mission. Now it filters SSE events
by mission_id to match the web dashboard behavior.

This prevents confusion when parallel missions are running.
2025-12-20 13:18:22 +00:00
Thomas Marchand
f52be88224 fix: prevent MCP tool conflicts and mission summary contamination
P0: Prefix MCP tools with server name
- MCP tools now prefixed (e.g., filesystem_read_file)
- Prevents duplicate function name errors with xAI/Google
- strip_prefix() removes the prefix before calling the actual MCP server

P1: Filter mission summaries by current mission
- In mission mode, only inject summaries from THIS mission
- Prevents cross-mission contamination from past task summaries
- User facts still shared (intentional)

P2: xAI/Google now work with MCP (no duplicate function names)
2025-12-20 13:01:09 +00:00
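A sketch of the prefixing scheme from P0, using the `filesystem_read_file` example. Both helper names are hypothetical; the real code may store the server/tool mapping instead of reparsing names.

```rust
/// Exposed name: "<server>_<tool>", unique across servers and built-ins.
pub fn prefixed_name(server: &str, tool: &str) -> String {
    format!("{server}_{tool}")
}

/// Recover the server-side tool name, or None if `exposed` is not from `server`.
pub fn original_name<'a>(server: &str, exposed: &'a str) -> Option<&'a str> {
    exposed.strip_prefix(server)?.strip_prefix('_')
}
```

Reparsing names is ambiguous if one server name is a prefix of another (for example `file` and `file_sys`), so keeping a map from exposed name to (server, tool) is the safer lookup.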
Thomas Marchand
c2c505a2f8 feat: MCP tool integration in missions
- Pass McpRegistry to mission runner and agent context
- Route tool calls to MCP servers when built-in tool not found
- Include MCP tool descriptions and schemas in system prompt
- Add has_tool() method to ToolRegistry for routing
2025-12-20 12:11:42 +00:00
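A sketch of the routing rule: built-in tools take precedence, and only names the built-in registry lacks are sent to an MCP server. `ToolRouter` and `Route` are hypothetical stand-ins; only the `has_tool()` method name comes from the commit.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Route<'a> {
    BuiltIn,
    Mcp { server: &'a str },
    Unknown,
}

/// Hypothetical router combining built-ins with MCP tool -> server mapping.
pub struct ToolRouter {
    pub builtins: Vec<&'static str>,
    pub mcp_tools: HashMap<String, String>,
}

impl ToolRouter {
    pub fn has_tool(&self, name: &str) -> bool {
        self.builtins.iter().any(|b| *b == name)
    }

    /// Built-ins win; otherwise fall back to the MCP server exposing the tool.
    pub fn route(&self, name: &str) -> Route<'_> {
        if self.has_tool(name) {
            Route::BuiltIn
        } else if let Some(server) = self.mcp_tools.get(name) {
            Route::Mcp { server: server.as_str() }
        } else {
            Route::Unknown
        }
    }
}
```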
Thomas Marchand
334711b2fe feat: workspace-first tool design with relative path defaults
- Add shared resolve_path() and PathResolution utilities to tools/mod.rs
- Update all file/directory/search/terminal tools to use shared utilities
- Emphasize relative paths in tool descriptions (e.g., "output/report.md")
- Show resolved paths in tool outputs for clarity
- Remove duplicated resolve_path functions from individual tool files

This encourages agents to naturally work within their workspace while
preserving the ability to access system files via absolute paths.
2025-12-20 11:52:35 +00:00
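A sketch of workspace-first resolution: relative paths join onto the workspace, absolute paths pass through. `resolve_path` and `PathResolution` are named in the commit, but their fields and signature here are assumptions, and this sketch does not guard against `..` escaping the workspace.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical shape of PathResolution.
#[derive(Debug, PartialEq, Eq)]
pub struct PathResolution {
    pub resolved: PathBuf,
    /// True when the input was relative and resolved inside the workspace.
    pub was_relative: bool,
}

pub fn resolve_path(input: &str, workspace: &Path) -> PathResolution {
    let p = Path::new(input);
    if p.is_absolute() {
        // Keep system-file access via absolute paths.
        PathResolution { resolved: p.to_path_buf(), was_relative: false }
    } else {
        PathResolution { resolved: workspace.join(p), was_relative: true }
    }
}
```

Tool output can then print `resolved` so the agent sees where "output/report.md" actually landed.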
Thomas Marchand
4117abaf96 chore: add audit reports 2025-12-20 11:45:34 +00:00
Thomas Marchand
e598d01946 fix: prevent memory contamination between parallel missions
- Set ctx.mission_id in mission runner (was always None)
- Add build_memory_context_with_options() with skip flag
- Skip cross-mission chunk retrieval when mission_id is set
- User facts and mission summaries still injected (shared intentionally)
2025-12-20 11:31:09 +00:00
Thomas Marchand
81c73b69d1 fix: dynamic system prompt for mission isolation
- Split system prompt into mission mode vs regular mode
- Mission mode: all paths use {working_dir} variable, no hardcoded /root/work/
- Examples now use mission's assigned directory
- Simplified session context mission indicator
- Removes conflicting guidance about creating new folders in /root/work/
2025-12-20 09:17:56 +00:00
Thomas Marchand
77039361fb feat: mission isolation + dashboard improvements
Backend:
- Add mission-specific working directories (/root/work/mission-{id}/)
- Auto-create mission folders with output/ and temp/ subdirs
- Inject mission directory rules into system prompt

Dashboard:
- Fix event filtering: properly filter events when viewing specific missions
- Make Running Missions panel compact (horizontal scrollable)
- Simplify mission card UI with smaller footprint
2025-12-20 08:30:39 +00:00
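A sketch of the mission folder layout (`mission-{id}/` with `output/` and `temp/`). Taking the root as a parameter instead of hardcoding `/root/work` is an assumption made so the sketch can run anywhere; `create_mission_dir` is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Create `<root>/mission-<id>/output` and `<root>/mission-<id>/temp`.
pub fn create_mission_dir(root: &Path, mission_id: &str) -> io::Result<PathBuf> {
    let dir = root.join(format!("mission-{mission_id}"));
    for sub in ["output", "temp"] {
        // create_dir_all is idempotent, so resuming a mission is harmless.
        fs::create_dir_all(dir.join(sub))?;
    }
    Ok(dir)
}
```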
Thomas Marchand
b1a30a1a05 feat(dashboard): improve parallel mission UI
- Fix event filtering: properly filter events by mission_id when viewing parallel missions
- Fix new mission visibility: update viewingMissionId when creating a mission
- Add model override input: new mission dialog with optional model override field
2025-12-20 08:19:02 +00:00
Thomas Marchand
6cdc40ba3b fix: blocklist Mistral model for malformed function names
Mistral sends prose as function names (e.g., "### Step 2: Explore...")
causing HTTP 400 errors: "Function name must be a-z, A-Z, 0-9, underscores, dashes"

Added to both compatibility.rs and pricing.rs blocklists.
2025-12-20 08:06:02 +00:00
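A sketch of the name rule quoted in the HTTP 400 error (a-z, A-Z, 0-9, underscores, dashes) plus a prefix-based blocklist check. Rejecting empty names, the helper names, and the model id in the usage are assumptions; the real blocklists live in compatibility.rs and pricing.rs.

```rust
/// True if `name` uses only ASCII letters, digits, '_' or '-'.
pub fn is_valid_function_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Hypothetical blocklist lookup by model-id prefix.
pub fn is_blocklisted(model_id: &str, blocklist: &[&str]) -> bool {
    blocklist.iter().any(|prefix| model_id.starts_with(*prefix))
}
```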