356 Commits

Author SHA1 Message Date
Thomas Marchand
2bb7fa5611 style: format file_ops.rs 2025-12-20 04:05:19 +00:00
Thomas Marchand
8e0614c9ce fix: prevent panic in read_file with out-of-range line numbers
Added bounds checking to prevent slice index panic when start_line
is greater than the file length.
2025-12-20 04:03:49 +00:00
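The bounds check described in 8e0614c9ce can be sketched as below. This is a minimal illustration, not the code in file_ops.rs: the function name `select_lines` and the 1-based, inclusive line convention are assumptions.

```rust
/// Hypothetical helper: returns the lines in the 1-based, inclusive range
/// `start_line..=end_line`, clamped to the file length instead of panicking.
fn select_lines(content: &str, start_line: usize, end_line: usize) -> Vec<&str> {
    let lines: Vec<&str> = content.lines().collect();
    // Convert the 1-based start to a 0-based index; a start of 0 is treated like 1.
    let start = start_line.saturating_sub(1);
    // Clamp the exclusive end index to the number of lines available.
    let end = end_line.min(lines.len());
    if start >= end {
        // Covers a start_line past the end of the file and inverted ranges,
        // both of which would otherwise make the slice below panic.
        return Vec::new();
    }
    lines[start..end].to_vec()
}
```

Returning an empty result for an out-of-range request is one possible policy; returning an error to the caller would work equally well.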
Thomas Marchand
cebded0037 fix: propagate model override to parallel missions
The StartParallel API now accepts an optional 'model' parameter
that takes priority over the DB-stored model_override. This fixes
the issue where parallel missions ignored the requested model.
2025-12-19 22:41:24 +00:00
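The priority order described in cebded0037 (explicit request parameter first, then the stored model_override) could look roughly like this. The function name `resolve_model` and the final fallback to a default model are assumptions for illustration.

```rust
/// Hypothetical helper: picks the model for a mission.
/// Priority: model passed in the request, then the stored override,
/// then a default (the default fallback is an assumption).
fn resolve_model<'a>(
    request_model: Option<&'a str>,
    stored_override: Option<&'a str>,
    default_model: &'a str,
) -> &'a str {
    request_model.or(stored_override).unwrap_or(default_model)
}
```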
Thomas Marchand
f08853a275 Improve mission reliability and add architectural safeguards
Major improvements based on analysis of Oraxen/Folia mission failures:

1. Deliverable tracking (src/task/deliverables.rs)
   - Extract expected deliverables from user prompts
   - Track file paths that should be created
   - Detect research tasks and report requirements

2. Mission health monitoring (src/api/mission_runner.rs)
   - Track last activity timestamp
   - Detect stalled missions (>60s idle)
   - Warn when missions end without expected deliverables
   - Add multi-step task instructions to prevent premature completion

3. Write file verification (src/tools/file_ops.rs)
   - Verify written content matches expected length
   - Detect potential truncation (unclosed code blocks, mid-sentence endings)
   - Warn users about incomplete writes

4. Model compatibility (src/budget/compatibility.rs, src/llm/error.rs)
   - Add IncompatibleModel error type for models with broken tool calling
   - Detect non-standard tool call formats in responses
   - Registry of known incompatible model prefixes

5. Dashboard improvements (dashboard/src/app/control/control-client.tsx)
   - Show stall warnings for missions idle >60s
   - Display expected deliverable count
   - Visual indicators for stalled/severely stalled missions

6. System prompt improvements (src/agents/leaf/executor.rs)
   - Emphasize using provided information (don't ask for given URLs)
   - Add multi-step task completion rules
   - Require verification of deliverables before completing
2025-12-19 21:30:59 +00:00
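The write verification in item 3 of f08853a275 can be illustrated with a small pure function. This is a sketch under assumptions: the names `verify_write` and `ends_mid_sentence` are hypothetical, and the mid-sentence heuristic here (last character is alphanumeric or a comma) is only one plausible rule, not necessarily the one in file_ops.rs.

```rust
/// Hypothetical heuristic: content that ends on a letter, digit, or comma
/// probably stopped in the middle of a sentence.
fn ends_mid_sentence(content: &str) -> bool {
    matches!(
        content.trim_end().chars().last(),
        Some(c) if c.is_alphanumeric() || c == ','
    )
}

/// Compares the intended content with what was read back after the write
/// and returns human-readable warnings (empty when nothing looks wrong).
fn verify_write(expected: &str, read_back: &str) -> Vec<String> {
    let mut warnings = Vec::new();
    if read_back.len() != expected.len() {
        warnings.push(format!(
            "length mismatch: expected {} bytes, found {}",
            expected.len(),
            read_back.len()
        ));
    }
    if ends_mid_sentence(read_back) {
        warnings.push("content may be truncated: it ends mid-sentence".to_string());
    }
    warnings
}
```

The heuristic can produce false positives (for example, a file that legitimately ends with an identifier), which is why it only yields warnings rather than failing the write.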
Thomas Marchand
9216aebb6b Block deepseek-r1-distill models with broken tool calling format 2025-12-19 20:45:48 +00:00
Thomas Marchand
a8acb3bbe2 fix: always load fresh mission history when switching
- Always fetch from API instead of using cached items
- Prevents stale events from previous tasks being shown
- Remove auto-caching of SSE events (they lack mission_id)
2025-12-19 20:01:12 +00:00
Thomas Marchand
835fc6a4ed feat: improve parallel mission UX with proper filtering
- Add viewingMissionId state to track which mission's events to show
- Filter SSE events by viewing mission ID
- Store items per-mission to preserve context when switching
- Click mission card to switch view (not just View button)
- Load mission history from API when switching missions
- Show checkmark on currently viewing mission
2025-12-19 19:51:04 +00:00
Thomas Marchand
0ad2366a7e feat: add loop detection to prevent agents from getting stuck
- Track repetition count when same tool call is repeated
- After 3 repetitions: inject warning message to break the loop
- After 5 repetitions: force-complete with error message
- Log warnings when loop is detected for debugging
2025-12-19 19:10:06 +00:00
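The thresholds in 0ad2366a7e (warn after 3 repetitions, force-complete after 5) can be sketched as a small state machine. The type names are hypothetical, and so is the counting rule: here a "repetition" means a tool call identical to the one immediately before it, so the first call counts as zero.

```rust
#[derive(Debug, PartialEq)]
enum LoopAction {
    Continue,
    Warn,
    ForceComplete,
}

const WARN_AFTER: u32 = 3;
const ABORT_AFTER: u32 = 5;

/// Hypothetical detector: remembers the previous tool call and how many
/// times in a row it has been repeated.
#[derive(Default)]
struct LoopDetector {
    last_call: Option<(String, String)>,
    repetitions: u32,
}

impl LoopDetector {
    /// Records one tool call and says what the agent loop should do next.
    fn observe(&mut self, tool: &str, args: &str) -> LoopAction {
        let call = (tool.to_string(), args.to_string());
        if self.last_call.as_ref() == Some(&call) {
            self.repetitions += 1;
        } else {
            // A different call resets the streak.
            self.last_call = Some(call);
            self.repetitions = 0;
        }
        if self.repetitions >= ABORT_AFTER {
            LoopAction::ForceComplete
        } else if self.repetitions == WARN_AFTER {
            // Warn once, when the threshold is first reached.
            LoopAction::Warn
        } else {
            LoopAction::Continue
        }
    }
}
```

Comparing only against the immediately preceding call keeps the state small; it would not catch an agent alternating between two calls, which would need a longer history window.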
Thomas Marchand
594a401888 feat: add parallel missions panel to dashboard
- Add mission_id to all event types for filtering
- Add Running Missions panel showing all parallel missions
- Filter SSE events by current mission to avoid mixing
- Add API functions for getRunningMissions, startMissionParallel, cancelMission
- Auto-show panel when multiple missions are running
2025-12-19 18:50:52 +00:00
Thomas Marchand
7ed522861f fix: serialize reasoning as reasoning_details for OpenRouter 2025-12-19 18:40:57 +00:00
Thomas Marchand
a309d32f12 fix: copy Gemini reasoning_details.data to tool_call thought_signature
OpenRouter returns Gemini 3 thought signatures in reasoning_details.data
linked by id to tool calls. We now copy this to the tool_call.thought_signature
so it gets serialized back in subsequent requests.
2025-12-19 18:38:48 +00:00
Thomas Marchand
d156db7cb6 debug: add logging to debug Gemini thought_signature 2025-12-19 18:34:40 +00:00
Thomas Marchand
6b987058f4 fix: add camelCase alias for thoughtSignature to support Gemini 2025-12-19 18:31:54 +00:00
Thomas Marchand
d3133d68cd feat: implement true parallel mission execution
- StartParallel now spawns independent MissionRunner instances
- Each parallel mission has its own history, queue, and cancellation token
- Events from parallel missions tagged with mission_id
- ListRunning returns all parallel runners
- CancelMission works for parallel runners
- Polling loop cleans up completed parallel missions
2025-12-19 18:16:37 +00:00
Thomas Marchand
75be1cd23c fix: use mission's model_override when no explicit model in message 2025-12-19 17:59:41 +00:00
Thomas Marchand
f8c40a176f fix: create_mission API now accepts title and model_override 2025-12-19 17:56:18 +00:00
Thomas Marchand
3f0545cc83 feat: parallel missions, UI improvements, context isolation prompt
Backend:
- Add MissionRunner abstraction for parallel execution
- Add mission_id field to AgentEvent for routing
- Add MAX_PARALLEL_MISSIONS config option
- New API endpoints for parallel mission management

Dashboard:
- Fix tree not updating when switching missions
- Add BrainLogo component for consistent branding
- Improve status panel UI with glass styling

Prompts:
- Add security_audit_v2.md with mandatory workspace setup
- Enforce cloning sources INTO work folder (not /root/context/)
- Add source manifest requirement

Docs:
- Add Context Isolation proposal (Section 7)
- Update testing checklist
2025-12-19 16:55:11 +00:00
Thomas Marchand
f62f034bd4 Add auto-close for stale missions after 24 hours inactivity
- Add STALE_MISSION_HOURS config option (default: 24, 0 to disable)
- Add get_stale_active_missions() query in supabase.rs
- Add background cleanup task that runs hourly
- Auto-marks inactive missions as completed with summary
2025-12-19 15:37:45 +00:00
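The staleness rule in f62f034bd4 (24 hours by default, 0 disables the cleanup) reduces to a single comparison. A minimal sketch, assuming the idle time has already been computed from the mission's last activity; the function name `is_stale` is hypothetical.

```rust
use std::time::Duration;

/// Hypothetical check: a mission is stale once it has been idle for at least
/// `stale_mission_hours`. A setting of 0 disables auto-closing entirely.
fn is_stale(idle: Duration, stale_mission_hours: u64) -> bool {
    if stale_mission_hours == 0 {
        return false;
    }
    // saturating_mul avoids overflow for absurdly large settings.
    idle >= Duration::from_secs(stale_mission_hours.saturating_mul(3600))
}
```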
Thomas Marchand
7607fdb2a3 fix: ios app 2025-12-19 15:33:16 +00:00
Thomas Marchand
b28b6684b5 fix: Swift concurrency warning for ScrollOffsetPreferenceKey
Mark defaultValue as nonisolated(unsafe) to fix Xcode Cloud build error:
"Static property 'defaultValue' is not concurrency-safe because it is
nonisolated global shared mutable state"
2025-12-19 15:31:36 +00:00
Thomas Marchand
9c22c6dc2d feat: model override fixes and command safety guards
- Fix model override to bypass allowlist for user-requested models
- Add command pattern blacklist to block dangerous commands (find /, grep -r /, etc.)
- Persist model_override to missions table in Supabase
- Improve system prompt with explicit deliverable requirements
- Add proposals documentation for future improvements
2025-12-19 15:24:40 +00:00
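A command pattern blacklist like the one in 9c22c6dc2d might be matched on whitespace-separated tokens, so that `find /` is blocked while `find /root/work` is not. This is a sketch only: the two patterns come from the commit message, but the matching rule, `BLOCKED_PATTERNS`, and both function names are assumptions.

```rust
/// Patterns named in the commit message; the real list is longer ("etc.").
const BLOCKED_PATTERNS: &[&str] = &["find /", "grep -r /"];

/// Hypothetical rule: the first pattern token must be the command name, and
/// the remaining pattern tokens must appear, in order, among the arguments.
fn matches_pattern(tokens: &[&str], pattern: &str) -> bool {
    let mut remaining = tokens.iter();
    pattern.split_whitespace().enumerate().all(|(i, want)| {
        if i == 0 {
            remaining.next() == Some(&want)
        } else {
            // `any` consumes the iterator up to the match, so later pattern
            // tokens can only match later command tokens.
            remaining.any(|tok| *tok == want)
        }
    })
}

/// Returns the first blocked pattern the command matches, if any.
fn blocked_pattern(command: &str) -> Option<&'static str> {
    let tokens: Vec<&str> = command.split_whitespace().collect();
    BLOCKED_PATTERNS
        .iter()
        .copied()
        .find(|pattern| matches_pattern(&tokens, pattern))
}
```

This only inspects a single simple command; pipelines, `sh -c` wrappers, and quoting would need real shell parsing.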
Thomas Marchand
de8721e09f fix: handle reasoning field as string or array
Some models (Kimi) return `reasoning` as a plain string and
`reasoning_details` as an array. Others may return just one.

- Add flexible get_reasoning() method to handle both formats
- Parse reasoning field as serde_json::Value to avoid type errors
- Convert string reasoning to ReasoningContent when needed
2025-12-19 12:17:15 +00:00
Thomas Marchand
89e8dabe6f docs: update cursor rules to prefer gemini/qwen over Claude
- Change default model ladder to gemini-3-flash → qwen → deepseek
- Add cost warning: never use Claude models (10-50x more expensive)
- Update model tier examples to exclude Claude
- Update DEFAULT_MODEL documentation
2025-12-19 11:59:11 +00:00
Thomas Marchand
7c9f280d65 fix: preserve Gemini thought_signature for tool call continuations
- Add thought_signature field to ToolCall and FunctionCall structs
- Add alias for reasoning_details in OpenRouterMessage
- Expand ReasoningContent with format/index fields
- Add debug logging for reasoning block tracking

Fixes Gemini 3 "Function call is missing a thought_signature" errors
2025-12-19 11:56:42 +00:00
Thomas Marchand
c9734cc746 feat: improve ios 2025-12-19 10:07:21 +00:00
Thomas Marchand
b2c2d05725 Show model override immediately in Agent Tree visualization
When a model override is specified via the control session, the Model
Selector node now displays the requested model immediately instead of
showing "Selecting optimal model..." until completion. This gives better
visibility into which model is being used for each mission.
2025-12-19 09:12:50 +00:00
Thomas Marchand
24809fd0bf Fix MissionStatus type check in recent-tasks component 2025-12-19 08:38:55 +00:00
Thomas Marchand
1fd2ad0702 Add model override support to control session API
The /api/control/message endpoint now accepts an optional `model` field
to specify which model to use for a particular message. This enables:
- Model comparison tests from the dashboard
- Per-message model selection in the control session

The model override is passed through to the task's requested_model field,
which the ModelSelector respects when choosing the execution model.
2025-12-19 08:35:27 +00:00
Thomas Marchand
1f155bd3b5 Add reasoning token support for Gemini 3 thinking models
Gemini 3 and other "thinking" models require reasoning blocks with
thought_signature to be preserved in subsequent requests when using
tool calls. This enables the model to resume its chain of thought.

Changes:
- Add ReasoningContent struct for reasoning blocks
- Add reasoning field to ChatMessage and ChatResponse
- Parse reasoning from OpenRouter responses
- Preserve reasoning when building assistant messages with tool calls

Reference: https://openrouter.ai/docs/use-cases/reasoning-tokens
2025-12-19 08:19:44 +00:00
Thomas Marchand
6177ac3a5f Add model comparison test framework and analysis report
- scripts/check_results.py: Python script to check task results
- scripts/run_security_test.sh: Interactive security test runner
- test_results/MODEL_ANALYSIS_REPORT.md: Comprehensive analysis of model selection

Key findings:
- 8/10 requested models work with the agent
- Gemini 3 thinking models require special reasoning token handling
- Price-based capability estimation underestimates cheap models
- Benchmark data integration not working properly
2025-12-19 07:54:40 +00:00
Thomas Marchand
14241f92c1 Add test models to CAPABLE_MODEL_BASES and create model comparison scripts
- Add moonshotai/kimi-k2-thinking, x-ai/grok-4.1-fast, google/gemini-3-flash-preview,
  deepseek/deepseek-v3.2-speciale, qwen/qwen3-vl-235b-a22b-thinking, amazon/nova-pro-v1,
  z-ai/glm-4.6v and related model variants to the model allowlist
- Create test_model_comparison.sh for full security research task comparison
- Create quick_model_test.sh for rapid model capability verification
2025-12-19 07:33:55 +00:00
Thomas Marchand
2b38422c7d feat: Add model resolver and fix remaining build issues
- Add resolver.rs for model name resolution
- Update budget/mod.rs exports
- Fix remaining compilation errors
2025-12-19 04:32:17 +00:00
Thomas Marchand
0e4588516a feat: Add refresh resilience + progress indicator for dashboard
Backend:
- Add tree_snapshot and progress_snapshot to ControlState for state persistence
- Add GET /api/control/tree and GET /api/control/progress endpoints
- Emit progress events after each subtask wave completes
- Store tree snapshot when emitting tree events

Frontend:
- Agents page fetches tree snapshot on mount before subscribing to SSE
- Control page fetches progress on mount and shows "Subtask X/Y" indicator
- Both pages handle progress SSE events for real-time updates
- Clear state when agent goes idle

Documentation:
- Update dashboard.mdc with refresh resilience pattern
- Update project.mdc with new control session endpoints
2025-12-19 04:31:24 +00:00
Thomas Marchand
c49801466e Fix: Dashboard history and recent tasks visibility
- Recent Tasks widget now shows missions instead of tasks (fixing broken links)
- History page now shows Recent Runs at the top, before Missions
- This ensures completed runs are visible without scrolling past 50+ missions
2025-12-18 22:53:30 +00:00
Thomas Marchand
2c1b280240 Fix: Pass executor output to verifier for accurate verification
The verifier was making decisions without seeing what the executor actually
produced. Now the last_output from the executor is stored on the Task and
passed to the LLM verifier prompt, enabling it to make informed verification
decisions based on actual results rather than just the task description.
2025-12-18 21:43:53 +00:00
Thomas Marchand
e2a06ed393 Improve task planning with CLI guidance and model propagation
Changes:
- Add CLI-preferred approach guidance to task splitting prompt
- Propagate requested_model from parent task to all subtasks
- Use user-requested model directly if available instead of optimizing
- Fix lifetime issues in model selector

These changes should improve Chrome extension extraction tasks by:
1. Preferring curl/wget over browser automation in subtask planning
2. Respecting the user's model choice for all subtasks
2025-12-18 17:34:41 +00:00
Thomas Marchand
c2cbf70f10 Fix task splitting to use dependencies for sequential execution
Key improvements:
- Add dependencies field to task splitting prompt so LLM specifies execution order
- Parse dependencies from LLM response and use them for wave-based execution
- Respect user-requested model as minimum capability floor in model selector
- Add guidance to prefer CLI tools over desktop automation in executor prompt
- Include Chrome extension download URL pattern in system prompt

This fixes the issue where all subtasks ran in parallel even when they had
implicit dependencies (e.g., can't analyze code before downloading it).
2025-12-18 17:24:16 +00:00
Thomas Marchand
e405b74ca0 Fix agent tree node positioning bug
The motion.g wrapper was animating y position while child elements
also used absolute y coordinates. This caused the y offset to be
applied twice, making nodes appear far below where edges ended.

Fix: Remove y animation from the motion.g wrapper, keep only
opacity and scale animations. Child elements already use the
correct absolute x,y coordinates.
2025-12-18 16:53:46 +00:00
Thomas Marchand
9b75a874a1 Fix agent tree layout - reduce gaps and add minimum zoom
Changes:
- Reduced horizontal gap from 60 to 30 for more compact layout
- Reduced vertical gap from 120 to 100 for better visibility
- Added minimum zoom of 0.4 for auto-fit (nodes stay readable)
- Fixed vertical centering - tall trees start from top instead of center
- Increased minimum zoom for scroll from 0.2 to 0.3
- Reduced zoom button increment from 1.2 to 1.15 for finer control
2025-12-18 16:49:32 +00:00
Thomas Marchand
5c7f82da9e Fix agent tree layout algorithm and zoom sensitivity
Issues fixed:
- Layout algorithm was creating edges with wrong positions - rewrote
  to use proper two-pass approach (compute positions first, then edges)
- Zoom sensitivity was too high (factors 0.9/1.1) - reduced to 0.97/1.03
- Tree now auto-fits to viewport on initial render and when switching demos
- Renamed "Reset" button to "Fit" for clearer purpose
- Added padding around tree for better visibility
2025-12-18 16:42:36 +00:00
Thomas Marchand
adf1427ec2 Add dynamic animated agent tree visualization
Features:
- SVG-based tree visualization with framer-motion animations
- Curved glowing connections with status-based colors
- Each node displays: name, model, status icon, budget spent/allocated
- Interactive pan/zoom with mouse controls
- Details panel on node click showing full info
- Demo mode with 3 tree generators (simple, complex, deep)
- Live simulation updates for testing without API

Components:
- AgentTreeCanvas: Main visualization with SVG rendering
- AnimatedEdge: Curved paths with glow effects and pulse animation
- AnimatedNode: Spring-animated nodes with status indicators
- NodeDetailsPanel: Slide-in panel with agent details

Demo modes:
- Simple: basic orchestrator with 5 nodes
- Complex: 10-15 nodes with subtask decomposition
- Deep: 50+ nodes with recursive nesting, for stress testing

Documentation added to .cursor/rules/dashboard.mdc
2025-12-18 16:26:54 +00:00
Thomas Marchand
cc19303b8c Fix Control view not showing active streaming state
When navigating to the Control view while the agent is running,
users would only see the completed history without any indication
that the agent was actively working.

Changes:
- Add "Agent is working..." indicator when runState is running but
  no active thinking/phase item is visible
- Show animated loader in empty state when agent is running
- Update status handling to properly track reconnection state

This ensures users see that the agent is busy even if they navigate
to Control mid-execution.
2025-12-18 15:37:13 +00:00
Thomas Marchand
0319ea2647 Refactor context handling into centralized ContextConfig and ContextBuilder
- Add ContextConfig to config.rs with all context-related limits
  - Conversation history: max messages, chars per message, total chars
  - Memory retrieval: chunk limit, threshold, facts limit, summaries limit
  - Tool results: truncation limit
  - Directory names: context, work, tools

- Create ContextBuilder (src/memory/context.rs) with:
  - SessionContext: time, working_dir, context_files, mission_title
  - MemoryContext: past_experience, user_facts, mission_summaries
  - Structured context assembly with format() methods
  - Unit tests for all context types

- Refactor executor.rs to use ContextBuilder:
  - Remove hardcoded magic numbers
  - Use config-driven limits for truncation
  - Delegate context building to ContextBuilder

- Refactor control.rs to use ContextBuilder:
  - Remove duplicate build_conversation_context function
  - Use shared ContextConfig for history limits

- Update cursor rules with context architecture docs:
  - project.mdc: Context system section with ContextConfig table
  - secrets.mdc: Environment variables for context limits

All limits now configurable via CONTEXT_* environment variables.
2025-12-18 15:04:33 +00:00
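Reading a limit from a CONTEXT_* environment variable with a fallback, as 0319ea2647 describes, could be done along these lines. The helper names are hypothetical, and the commit message does not list the individual variable names, so none are assumed here.

```rust
/// Hypothetical helper: parses a raw setting, falling back to `default`
/// when the value is missing or not a valid non-negative integer.
fn parse_limit(raw: Option<&str>, default: usize) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(default)
}

/// Reads the limit from the environment variable `key`.
fn limit_from_env(key: &str, default: usize) -> usize {
    parse_limit(std::env::var(key).ok().as_deref(), default)
}
```

Splitting the parsing from the environment lookup keeps the fallback logic testable without touching process-wide state.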
Thomas Marchand
58ec2c19ab Add memory system: session metadata, user facts, mission summaries
- Layer 1: Session metadata injection (time, working dir, context files)
- Layer 2: Memory tools (search_memory, store_fact)
- Layer 3: Auto-inject relevant memory chunks into TaskExecutor
- Layer 4: User facts system with embeddings
- Layer 5: Mission summaries generated on completion

New tables: user_facts, mission_summaries (see docs/MEMORY_TABLES.sql)
2025-12-18 14:27:18 +00:00
Thomas Marchand
2f76ce0dfe Fix module panel close button not working
- Add stopPropagation to panel div to prevent click bubbling
- Add stopPropagation to close buttons
- Allow clicking selected card to toggle/close panel
2025-12-18 13:38:43 +00:00
Thomas Marchand
f42200c998 Add parallel task execution and fix agent panel overlay
UI Fixes:
- Agent details panel now uses fixed overlay positioning (like modules page)
- Added close button (X) and back chevron to panel header
- Panel no longer resizes the main content area

Parallel Task Execution:
- Add execution_waves() method to SubtaskPlan for parallel grouping
- Tasks with no dependencies execute in parallel within each wave
- Wave-based execution respects dependency ordering
- Tree updates are thread-safe using Arc<Mutex>
- Logs show parallel wave progress

This enables faster task execution when subtasks are independent.
2025-12-18 13:35:00 +00:00
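The wave grouping described in f42200c998 can be sketched as repeated selection of subtasks whose dependencies are already done. This is an illustration rather than the actual `execution_waves()` method on SubtaskPlan: the `Subtask` struct, its fields, and the error handling are assumptions, and subtask ids are assumed to be unique.

```rust
use std::collections::HashSet;

/// Hypothetical subtask: an id plus the ids it depends on.
struct Subtask {
    id: usize,
    dependencies: Vec<usize>,
}

/// Groups subtasks into waves. Every subtask in a wave has all of its
/// dependencies in earlier waves, so a wave can run in parallel.
fn execution_waves(subtasks: &[Subtask]) -> Result<Vec<Vec<usize>>, String> {
    let mut done: HashSet<usize> = HashSet::new();
    let mut waves = Vec::new();
    while done.len() < subtasks.len() {
        let wave: Vec<usize> = subtasks
            .iter()
            .filter(|t| !done.contains(&t.id))
            .filter(|t| t.dependencies.iter().all(|d| done.contains(d)))
            .map(|t| t.id)
            .collect();
        if wave.is_empty() {
            // Nothing is runnable but work remains: the plan is not a DAG.
            return Err("dependency cycle or unknown dependency".to_string());
        }
        done.extend(wave.iter().copied());
        waves.push(wave);
    }
    Ok(waves)
}
```

For example, two independent download subtasks followed by an analysis that depends on both would yield two waves: the downloads together, then the analysis.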
Thomas Marchand
3a91db0fe2 Add real-time agent tree visualization with streaming updates
Backend:
- Add AgentTreeNode struct for tree visualization data
- Add AgentTree event type for streaming tree updates
- RootAgent now builds and emits tree structure as it executes
- NodeAgent has execute_with_tree for recursive tree updates
- Tree updates emitted at each step: complexity estimation, model selection,
  task splitting, execution, and verification

Frontend:
- Add realTree state that receives live tree updates via SSE
- Convert backend tree format to frontend AgentNode format
- Auto-expand new nodes as they appear
- Fall back to mock tree when no real data available

This provides real-time visibility into the agent execution hierarchy,
showing multi-level task splitting and progress as it happens.
2025-12-18 11:39:21 +00:00
Thomas Marchand
361f764685 Add agent phase events for preparation feedback
- Add AgentPhase event type showing what agent is doing during prep:
  - estimating_complexity, selecting_model, splitting_task, executing, verifying
- Emit phase events from RootAgent and NodeAgent during their planning phases
- Add emit_phase helper to AgentContext for easy event emission
- Update control-client.tsx to display phase indicator with agent name
- Phase indicator shows current step with subtle animation before thinking starts

This addresses the delay before "thinking" appears by providing immediate
visual feedback when complexity estimation and model selection are happening.
2025-12-18 11:29:03 +00:00
Thomas Marchand
086e3ad3ed Improve Agent Tree page visualization and add recursive task splitting
Dashboard:
- Redesigned Agent Tree page with hierarchical tree visualization
- Added Tree Overview mini-map showing agent counts by status
- Added collapsible nodes with smooth animations
- Added color-coded status indicators (running/completed/failed/pending)
- Added details panel showing agent info, budget, complexity, and logs
- Added Expand All / Collapse All controls

Backend:
- Enhanced NodeAgent with recursive complexity estimation and splitting
- Added extract_json helper for robust LLM response parsing
- RootAgent now delegates to NodeAgent for recursive subtask execution
2025-12-18 11:03:18 +00:00
Thomas Marchand
45f5675d51 Add reasoning_details support for Gemini 3 and Claude 3.7+ models
Models with extended thinking (Gemini 3, Claude 3.7+) require
reasoning_details to be preserved from responses and passed back
in subsequent requests for tool calling to work properly.

- Add reasoning_details field to ChatMessage and ChatResponse
- Parse reasoning_details from OpenRouter responses
- Preserve reasoning_details when creating assistant messages
2025-12-17 21:18:52 +00:00