The StartParallel API now accepts an optional 'model' parameter
that takes priority over the DB-stored model_override. This fixes
the issue where parallel missions ignored the requested model.
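A minimal sketch of the intended precedence, assuming the request-level value wins over the stored override (struct and field names here are illustrative, not the actual identifiers):

```rust
// Hypothetical request/mission types; the real structs live in the API layer.
#[derive(serde::Deserialize)]
struct StartParallelRequest {
    prompt: String,
    /// Optional per-request model; overrides any DB-stored value when present.
    model: Option<String>,
}

struct MissionRecord {
    model_override: Option<String>,
}

fn resolve_model(req: &StartParallelRequest, mission: &MissionRecord, default_model: &str) -> String {
    // Request-level model wins, then the DB-stored override, then the default.
    req.model
        .clone()
        .or_else(|| mission.model_override.clone())
        .unwrap_or_else(|| default_model.to_string())
}
```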
Major improvements based on analysis of Oraxen/Folia mission failures:
1. Deliverable tracking (src/task/deliverables.rs)
- Extract expected deliverables from user prompts
- Track file paths that should be created
- Detect research tasks and report requirements
2. Mission health monitoring (src/api/mission_runner.rs)
- Track last activity timestamp
- Detect stalled missions (>60s idle); see the sketch after this list
- Warn when missions end without expected deliverables
- Add multi-step task instructions to prevent premature completion
3. Write file verification (src/tools/file_ops.rs)
- Verify written content matches expected length
- Detect potential truncation (unclosed code blocks, mid-sentence endings)
- Warn users about incomplete writes
4. Model compatibility (src/budget/compatibility.rs, src/llm/error.rs)
- Add IncompatibleModel error type for models with broken tool calling
- Detect non-standard tool call formats in responses
- Registry of known incompatible model prefixes
5. Dashboard improvements (dashboard/src/app/control/control-client.tsx)
- Show stall warnings for missions idle >60s
- Display expected deliverable count
- Visual indicators for stalled/severely stalled missions
6. System prompt improvements (src/agents/leaf/executor.rs)
- Emphasize using provided information (don't ask for given URLs)
- Add multi-step task completion rules
- Require verification of deliverables before completing
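A rough sketch of the stall check from item 2, assuming the runner records an `Instant` on every event; the 60s threshold comes from the description above, while the "severely stalled" threshold and type names are assumptions:

```rust
use std::time::{Duration, Instant};

const STALL_THRESHOLD: Duration = Duration::from_secs(60);
const SEVERE_STALL_THRESHOLD: Duration = Duration::from_secs(180); // assumed value

struct MissionHealth {
    last_activity: Instant,
}

impl MissionHealth {
    fn touch(&mut self) {
        // Call this whenever the mission emits an event or tool result.
        self.last_activity = Instant::now();
    }

    fn stall_state(&self) -> Option<&'static str> {
        let idle = self.last_activity.elapsed();
        if idle >= SEVERE_STALL_THRESHOLD {
            Some("severely stalled")
        } else if idle >= STALL_THRESHOLD {
            Some("stalled")
        } else {
            None
        }
    }
}
```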
- Always fetch from API instead of using cached items
- Prevents stale events from previous tasks being shown
- Remove auto-caching of SSE events (they lack mission_id)
- Add viewingMissionId state to track which mission's events to show
- Filter SSE events by viewing mission ID
- Store items per-mission to preserve context when switching
- Click mission card to switch view (not just View button)
- Load mission history from API when switching missions
- Show checkmark on currently viewing mission
- Track repetition count when the same tool call is repeated (see the sketch after this list)
- After 3 repetitions: inject warning message to break the loop
- After 5 repetitions: force-complete with error message
- Log warnings when loop is detected for debugging
- Add mission_id to all event types for filtering
- Add Running Missions panel showing all parallel missions
- Filter SSE events by current mission to avoid mixing
- Add API functions for getRunningMissions, startMissionParallel, cancelMission
- Auto-show panel when multiple missions are running
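One way to implement the repetition guard described in the loop-detection items above, assuming repeated calls are compared by a fingerprint of tool name plus serialized arguments (the type names and thresholds follow the bullets, everything else is illustrative):

```rust
const WARN_AFTER: u32 = 3;
const FORCE_COMPLETE_AFTER: u32 = 5;

#[derive(Default)]
struct LoopGuard {
    last_call: Option<String>,
    repeats: u32,
}

enum LoopAction {
    Continue,
    InjectWarning,
    ForceComplete,
}

impl LoopGuard {
    /// `fingerprint` is e.g. "tool_name:{serialized_args}".
    fn observe(&mut self, fingerprint: &str) -> LoopAction {
        if self.last_call.as_deref() == Some(fingerprint) {
            self.repeats += 1;
        } else {
            self.last_call = Some(fingerprint.to_string());
            self.repeats = 1;
        }
        match self.repeats {
            r if r >= FORCE_COMPLETE_AFTER => LoopAction::ForceComplete,
            r if r >= WARN_AFTER => LoopAction::InjectWarning,
            _ => LoopAction::Continue,
        }
    }
}
```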
OpenRouter returns Gemini 3 thought signatures in reasoning_details.data,
linked to tool calls by id. We now copy the signature to
tool_call.thought_signature so it is serialized back in subsequent requests.
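In sketch form, the linking step looks roughly like this, assuming the reasoning detail's id matches the tool call's id (field names follow the description above, not necessarily the exact structs):

```rust
struct ReasoningDetail {
    id: Option<String>,
    data: Option<String>, // Gemini 3 thought signature, per OpenRouter
}

struct ToolCall {
    id: String,
    thought_signature: Option<String>,
}

fn attach_thought_signatures(tool_calls: &mut [ToolCall], details: &[ReasoningDetail]) {
    for call in tool_calls.iter_mut() {
        if let Some(detail) = details
            .iter()
            .find(|d| d.id.as_deref() == Some(call.id.as_str()))
        {
            // Copy the signature so it is serialized back on the next request.
            call.thought_signature = detail.data.clone();
        }
    }
}
```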
- StartParallel now spawns independent MissionRunner instances
- Each parallel mission has its own history, queue, and cancellation token
- Events from parallel missions tagged with mission_id
- ListRunning returns all parallel runners
- CancelMission works for parallel runners
- Polling loop cleans up completed parallel missions
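A simplified sketch of the spawning pattern, assuming tokio tasks, tokio-util cancellation tokens, and a registry keyed by mission id (names and the runner body are approximate):

```rust
use std::{collections::HashMap, sync::Arc};
use tokio::sync::Mutex;
use tokio_util::sync::CancellationToken;

struct ParallelRunner {
    cancel: CancellationToken,
    handle: tokio::task::JoinHandle<()>,
}

#[derive(Default)]
struct RunnerRegistry {
    runners: Arc<Mutex<HashMap<String, ParallelRunner>>>,
}

impl RunnerRegistry {
    async fn start_parallel(&self, mission_id: String) {
        let cancel = CancellationToken::new();
        let child = cancel.clone();
        let id = mission_id.clone();
        let handle = tokio::spawn(async move {
            // Each mission gets its own history/queue inside run_mission (not shown).
            tokio::select! {
                _ = child.cancelled() => { /* cancelled */ }
                _ = run_mission(&id) => { /* finished */ }
            }
        });
        self.runners
            .lock()
            .await
            .insert(mission_id, ParallelRunner { cancel, handle });
    }

    async fn cancel_mission(&self, mission_id: &str) {
        if let Some(runner) = self.runners.lock().await.get(mission_id) {
            runner.cancel.cancel();
        }
    }
}

async fn run_mission(_mission_id: &str) { /* placeholder for the real runner loop */ }
```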
Mark defaultValue as nonisolated(unsafe) to fix Xcode Cloud build error:
"Static property 'defaultValue' is not concurrency-safe because it is
nonisolated global shared mutable state"
Some models (Kimi) return `reasoning` as a plain string and
`reasoning_details` as an array. Others may return just one.
- Add flexible get_reasoning() method to handle both formats
- Parse reasoning field as serde_json::Value to avoid type errors
- Convert string reasoning to ReasoningContent when needed
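A hedged sketch of the flexible accessor, treating `reasoning` as an untyped `serde_json::Value` so both the plain-string and array shapes deserialize cleanly (the struct layout and the "text" field inside reasoning blocks are assumptions):

```rust
use serde::Deserialize;
use serde_json::Value;

#[derive(Deserialize)]
struct OpenRouterMessage {
    /// Kimi returns a plain string here; other models may omit it entirely.
    #[serde(default)]
    reasoning: Option<Value>,
    /// Gemini-style models return an array of reasoning blocks here.
    #[serde(default)]
    reasoning_details: Option<Vec<Value>>,
}

impl OpenRouterMessage {
    /// Normalizes either shape into a list of reasoning strings.
    fn get_reasoning(&self) -> Vec<String> {
        if let Some(details) = &self.reasoning_details {
            return details
                .iter()
                // "text" is an assumed key for the block's content.
                .filter_map(|d| d.get("text").and_then(Value::as_str).map(str::to_owned))
                .collect();
        }
        match &self.reasoning {
            Some(Value::String(s)) => vec![s.clone()],
            _ => Vec::new(),
        }
    }
}
```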
- Change default model ladder to gemini-3-flash → qwen → deepseek
- Add cost warning: never use Claude models (10-50x more expensive)
- Update model tier examples to exclude Claude
- Update DEFAULT_MODEL documentation
- Add thought_signature field to ToolCall and FunctionCall structs
- Add alias for reasoning_details in OpenRouterMessage
- Expand ReasoningContent with format/index fields
- Add debug logging for reasoning block tracking
Fixes Gemini 3 "Function call is missing a thought_signature" errors
When a model override is specified via the control session, the Model
Selector node now displays the requested model immediately instead of
showing "Selecting optimal model..." until completion. This gives better
visibility into which model is being used for each mission.
The /api/control/message endpoint now accepts an optional `model` field
to specify which model to use for a particular message. This enables:
- Model comparison tests from the dashboard
- Per-message model selection in the control session
The model override is passed through to the task's requested_model field,
which the ModelSelector respects when choosing the execution model.
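The request shape and the hand-off into the task look roughly like this (names are illustrative; only the optional `model` field and `requested_model` are taken from the description above):

```rust
#[derive(serde::Deserialize)]
struct ControlMessageRequest {
    message: String,
    /// Optional per-message model override, e.g. "google/gemini-3-flash-preview".
    model: Option<String>,
}

struct Task {
    prompt: String,
    /// The ModelSelector treats this as the user's explicit choice when set.
    requested_model: Option<String>,
}

fn task_from_request(req: ControlMessageRequest) -> Task {
    Task {
        prompt: req.message,
        requested_model: req.model,
    }
}
```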
Gemini 3 and other "thinking" models require reasoning blocks with
thought_signature to be preserved in subsequent requests when using
tool calls. This enables the model to resume its chain of thought.
Changes:
- Add ReasoningContent struct for reasoning blocks
- Add reasoning field to ChatMessage and ChatResponse
- Parse reasoning from OpenRouter responses
- Preserve reasoning when building assistant messages with tool calls
Reference: https://openrouter.ai/docs/use-cases/reasoning-tokens
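A compact sketch of the shapes involved; the struct names come from the list above, while the fields inside ReasoningContent are assumptions:

```rust
use serde::{Deserialize, Serialize};

/// One reasoning block returned by OpenRouter for thinking models.
#[derive(Clone, Serialize, Deserialize)]
struct ReasoningContent {
    #[serde(skip_serializing_if = "Option::is_none")]
    text: Option<String>, // assumed field name
    #[serde(skip_serializing_if = "Option::is_none")]
    signature: Option<String>, // assumed field name for the thought signature
}

#[derive(Clone, Serialize, Deserialize)]
struct ChatMessage {
    role: String,
    content: String,
    /// Preserved verbatim and sent back so the model can resume its chain of thought.
    #[serde(skip_serializing_if = "Option::is_none")]
    reasoning: Option<Vec<ReasoningContent>>,
}
```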
- scripts/check_results.py: Python script to check task results
- scripts/run_security_test.sh: Interactive security test runner
- test_results/MODEL_ANALYSIS_REPORT.md: Comprehensive analysis of model selection
Key findings:
- 8/10 requested models work with the agent
- Gemini 3 thinking models require special reasoning token handling
- Price-based capability estimation underestimates cheap models
- Benchmark data integration not working properly
- Add moonshotai/kimi-k2-thinking, x-ai/grok-4.1-fast, google/gemini-3-flash-preview,
deepseek/deepseek-v3.2-speciale, qwen/qwen3-vl-235b-a22b-thinking, amazon/nova-pro-v1,
z-ai/glm-4.6v and related model variants to the model allowlist
- Create test_model_comparison.sh for full security research task comparison
- Create quick_model_test.sh for rapid model capability verification
Backend:
- Add tree_snapshot and progress_snapshot to ControlState for state persistence
- Add GET /api/control/tree and GET /api/control/progress endpoints
- Emit progress events after each subtask wave completes
- Store tree snapshot when emitting tree events
Frontend:
- Agents page fetches tree snapshot on mount before subscribing to SSE
- Control page fetches progress on mount and shows "Subtask X/Y" indicator
- Both pages handle progress SSE events for real-time updates
- Clear state when agent goes idle
Documentation:
- Update dashboard.mdc with refresh resilience pattern
- Update project.mdc with new control session endpoints
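A sketch of the backend half of this refresh-resilience pattern: the last tree/progress event is stored alongside the control state so clients that mount after the stream started can fetch it before subscribing to SSE (handler plumbing omitted; field and method names are assumptions):

```rust
use std::sync::{Arc, Mutex};
use serde_json::Value;

#[derive(Default)]
struct ControlState {
    /// Last emitted agent-tree event, kept so GET /api/control/tree can replay it.
    tree_snapshot: Mutex<Option<Value>>,
    /// Last emitted progress event ("Subtask X/Y"), served by GET /api/control/progress.
    progress_snapshot: Mutex<Option<Value>>,
}

impl ControlState {
    fn store_tree(&self, event: Value) {
        *self.tree_snapshot.lock().unwrap() = Some(event);
    }

    /// Handler body for GET /api/control/tree (framework wiring not shown).
    fn tree_handler(state: &Arc<ControlState>) -> Option<Value> {
        state.tree_snapshot.lock().unwrap().clone()
    }
}
```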
- Recent Tasks widget now shows missions instead of tasks (fixing broken links)
- History page now shows Recent Runs at the top, before Missions
- This ensures completed runs are visible without scrolling past 50+ missions
The verifier was making decisions without seeing what the executor actually
produced. Now the last_output from the executor is stored on the Task and
passed to the LLM verifier prompt, enabling it to make informed verification
decisions based on actual results rather than just the task description.
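Roughly, the wiring looks like this (field and function names are illustrative):

```rust
struct Task {
    description: String,
    /// Raw output from the executor's final step, stored after execution.
    last_output: Option<String>,
}

fn build_verifier_prompt(task: &Task) -> String {
    let output = task
        .last_output
        .as_deref()
        .unwrap_or("(no executor output captured)");
    format!(
        "Task: {}\n\nExecutor output:\n{}\n\nDecide whether the task is complete.",
        task.description, output
    )
}
```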
Changes:
- Add CLI-preferred approach guidance to task splitting prompt
- Propagate requested_model from parent task to all subtasks
- Use user-requested model directly if available instead of optimizing
- Fix lifetime issues in model selector
These changes should improve Chrome extension extraction tasks by:
1. Preferring curl/wget over browser automation in subtask planning
2. Respecting the user's model choice for all subtasks
Key improvements:
- Add dependencies field to task splitting prompt so LLM specifies execution order
- Parse dependencies from LLM response and use them for wave-based execution
- Respect user-requested model as minimum capability floor in model selector
- Add guidance to prefer CLI tools over desktop automation in executor prompt
- Include Chrome extension download URL pattern in system prompt
This fixes the issue where all subtasks ran in parallel even when they had
implicit dependencies (e.g., can't analyze code before downloading it).
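A sketch of the capability-floor rule from the list above, assuming each model carries a numeric capability score (the scoring, lookup, and tie-breaking are illustrative):

```rust
struct ModelInfo {
    id: String,
    capability: f32, // assumed 0.0..=1.0 score
}

/// Pick the least capable (typically cheapest) candidate that still meets the
/// floor set by the user-requested model; fall back to the requested model itself.
fn select_model<'a>(
    candidates: &'a [ModelInfo],
    requested: Option<&'a ModelInfo>,
) -> Option<&'a ModelInfo> {
    let floor = requested.map(|m| m.capability).unwrap_or(0.0);
    candidates
        .iter()
        .filter(|m| m.capability >= floor)
        .min_by(|a, b| a.capability.partial_cmp(&b.capability).unwrap())
        .or(requested)
}
```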
The motion.g wrapper was animating y position while child elements
also used absolute y coordinates. This caused the y offset to be
applied twice, making nodes appear far below where edges ended.
Fix: Remove y animation from the motion.g wrapper, keep only
opacity and scale animations. Child elements already use the
correct absolute x,y coordinates.
Changes:
- Reduced horizontal gap from 60 to 30 for more compact layout
- Reduced vertical gap from 120 to 100 for better visibility
- Added minimum zoom of 0.4 for auto-fit (nodes stay readable)
- Fixed vertical positioning - tall trees now start from the top instead of being centered
- Increased minimum zoom for scroll from 0.2 to 0.3
- Reduced zoom button increment from 1.2 to 1.15 for finer control
Issues fixed:
- Layout algorithm was creating edges with wrong positions - rewrote
to use proper two-pass approach (compute positions first, then edges)
- Zoom sensitivity was too strong (0.9/1.1) - reduced to 0.97/1.03
- Tree now auto-fits to viewport on initial render and when switching demos
- Renamed "Reset" button to "Fit" for clearer purpose
- Added padding around tree for better visibility
Features:
- SVG-based tree visualization with framer-motion animations
- Curved glowing connections with status-based colors
- Each node displays: name, model, status icon, budget spent/allocated
- Interactive pan/zoom with mouse controls
- Details panel on node click showing full info
- Demo mode with 3 tree generators (simple, complex, deep)
- Live simulation updates for testing without API
Components:
- AgentTreeCanvas: Main visualization with SVG rendering
- AnimatedEdge: Curved paths with glow effects and pulse animation
- AnimatedNode: Spring-animated nodes with status indicators
- NodeDetailsPanel: Slide-in panel with agent details
Demo modes:
- Simple: 5 nodes basic orchestrator
- Complex: 10-15 nodes with subtask decomposition
- Deep: 50+ nodes recursive nesting for stress testing
Documentation added to .cursor/rules/dashboard.mdc
When navigating to the Control view while the agent is running,
users would only see the completed history without any indication
that the agent was actively working.
Changes:
- Add "Agent is working..." indicator when runState is running but
no active thinking/phase item is visible
- Show animated loader in empty state when agent is running
- Update status handling to properly track reconnection state
This ensures users see that the agent is busy even if they navigate
to Control mid-execution.
- Add ContextConfig to config.rs with all context-related limits:
  - Conversation history: max messages, chars per message, total chars
  - Memory retrieval: chunk limit, threshold, facts limit, summaries limit
  - Tool results: truncation limit
  - Directory names: context, work, tools
- Create ContextBuilder (src/memory/context.rs) with:
  - SessionContext: time, working_dir, context_files, mission_title
  - MemoryContext: past_experience, user_facts, mission_summaries
  - Structured context assembly with format() methods
  - Unit tests for all context types
- Refactor executor.rs to use ContextBuilder:
  - Remove hardcoded magic numbers
  - Use config-driven limits for truncation
  - Delegate context building to ContextBuilder
- Refactor control.rs to use ContextBuilder:
  - Remove duplicate build_conversation_context function
  - Use shared ContextConfig for history limits
- Update cursor rules with context architecture docs:
  - project.mdc: Context system section with ContextConfig table
  - secrets.mdc: Environment variables for context limits
All limits now configurable via CONTEXT_* environment variables.
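A trimmed-down sketch of the env-driven config; the variable names follow the CONTEXT_* convention mentioned above, but the exact names and defaults are assumptions:

```rust
fn env_usize(key: &str, default: usize) -> usize {
    std::env::var(key)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

#[derive(Debug, Clone)]
struct ContextConfig {
    max_history_messages: usize,
    max_chars_per_message: usize,
    max_total_chars: usize,
    tool_result_truncate: usize,
}

impl ContextConfig {
    fn from_env() -> Self {
        Self {
            max_history_messages: env_usize("CONTEXT_MAX_HISTORY_MESSAGES", 50),
            max_chars_per_message: env_usize("CONTEXT_MAX_CHARS_PER_MESSAGE", 4_000),
            max_total_chars: env_usize("CONTEXT_MAX_TOTAL_CHARS", 60_000),
            tool_result_truncate: env_usize("CONTEXT_TOOL_RESULT_TRUNCATE", 8_000),
        }
    }
}
```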
- Add stopPropagation to panel div to prevent click bubbling
- Add stopPropagation to close buttons
- Allow clicking selected card to toggle/close panel
UI Fixes:
- Agent details panel now uses fixed overlay positioning (like modules page)
- Added close button (X) and back chevron to panel header
- Panel no longer resizes the main content area
Parallel Task Execution:
- Add execution_waves() method to SubtaskPlan for parallel grouping
- Tasks with no dependencies execute in parallel within each wave
- Wave-based execution respects dependency ordering
- Tree updates are thread-safe using Arc<Mutex>
- Logs show parallel wave progress
This enables faster task execution when subtasks are independent.
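A sketch of how execution_waves() can group subtasks, assuming each subtask lists the indices of the subtasks it depends on (a simple layering over the dependency graph; the struct layout is illustrative):

```rust
struct Subtask {
    name: String,
    /// Indices into the plan's subtask list that must finish first.
    dependencies: Vec<usize>,
}

struct SubtaskPlan {
    subtasks: Vec<Subtask>,
}

impl SubtaskPlan {
    /// Groups subtask indices into waves; everything in a wave can run in parallel.
    fn execution_waves(&self) -> Vec<Vec<usize>> {
        let mut done = vec![false; self.subtasks.len()];
        let mut waves = Vec::new();
        while done.iter().any(|d| !d) {
            let wave: Vec<usize> = self
                .subtasks
                .iter()
                .enumerate()
                .filter(|(i, t)| !done[*i] && t.dependencies.iter().all(|d| done[*d]))
                .map(|(i, _)| i)
                .collect();
            if wave.is_empty() {
                break; // dependency cycle: bail out rather than loop forever
            }
            for &i in &wave {
                done[i] = true;
            }
            waves.push(wave);
        }
        waves
    }
}
```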
Backend:
- Add AgentTreeNode struct for tree visualization data
- Add AgentTree event type for streaming tree updates
- RootAgent now builds and emits tree structure as it executes
- NodeAgent has execute_with_tree for recursive tree updates
- Tree updates emitted at each step: complexity estimation, model selection,
task splitting, execution, and verification
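A sketch of the tree payload emitted by the backend above; the field names are inferred from what the dashboard displays (name, model, status, budget spent/allocated) and are not guaranteed to match the real struct:

```rust
use serde::Serialize;

#[derive(Serialize, Clone)]
struct AgentTreeNode {
    id: String,
    name: String,
    model: Option<String>,
    status: String, // e.g. "pending" | "running" | "completed" | "failed"
    budget_allocated: f64,
    budget_spent: f64,
    children: Vec<AgentTreeNode>,
}
```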
Frontend:
- Add realTree state that receives live tree updates via SSE
- Convert backend tree format to frontend AgentNode format
- Auto-expand new nodes as they appear
- Fall back to mock tree when no real data available
This provides real-time visibility into the agent execution hierarchy,
showing multi-level task splitting and progress as it happens.
- Add AgentPhase event type showing what the agent is doing during prep:
- estimating_complexity, selecting_model, splitting_task, executing, verifying
- Emit phase events from RootAgent and NodeAgent during their planning phases
- Add emit_phase helper to AgentContext for easy event emission
- Update control-client.tsx to display phase indicator with agent name
- Phase indicator shows current step with subtle animation before thinking starts
This addresses the delay before "thinking" appears by providing immediate
visual feedback when complexity estimation and model selection are happening.
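The phase enum and emit_phase helper, sketched; the phase names come from the list above, while the event shape and emitter wiring are assumptions:

```rust
use serde::Serialize;

#[derive(Serialize, Clone, Copy)]
#[serde(rename_all = "snake_case")]
enum AgentPhase {
    EstimatingComplexity,
    SelectingModel,
    SplittingTask,
    Executing,
    Verifying,
}

#[derive(Serialize)]
struct AgentPhaseEvent {
    agent: String,
    phase: AgentPhase,
}

struct AgentContext {
    // In the real code this holds the SSE event sender; a closure stands in here.
    emit: Box<dyn Fn(AgentPhaseEvent)>,
}

impl AgentContext {
    /// Emits a phase event so the dashboard can show the indicator immediately.
    fn emit_phase(&self, agent_name: &str, phase: AgentPhase) {
        (self.emit)(AgentPhaseEvent {
            agent: agent_name.to_string(),
            phase,
        });
    }
}
```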
Dashboard:
- Redesigned Agent Tree page with hierarchical tree visualization
- Added Tree Overview mini-map showing agent counts by status
- Added collapsible nodes with smooth animations
- Added color-coded status indicators (running/completed/failed/pending)
- Added details panel showing agent info, budget, complexity, and logs
- Added Expand All / Collapse All controls
Backend:
- Enhanced NodeAgent with recursive complexity estimation and splitting
- Added extract_json helper for robust LLM response parsing
- RootAgent now delegates to NodeAgent for recursive subtask execution
Models with extended thinking (Gemini 3, Claude 3.7+) require
reasoning_details to be preserved from responses and passed back
in subsequent requests for tool calling to work properly.
- Add reasoning_details field to ChatMessage and ChatResponse
- Parse reasoning_details from OpenRouter responses
- Preserve reasoning_details when creating assistant messages