The StartParallel API now accepts an optional 'model' parameter
that takes priority over the DB-stored model_override. This fixes
the issue where parallel missions ignored the requested model.
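A minimal sketch of the intended precedence, assuming the request-level value wins over the stored override (struct and field names here are illustrative, not the actual identifiers):

```rust
// Hypothetical request/mission types; the real structs live in the API layer.
#[derive(serde::Deserialize)]
struct StartParallelRequest {
    prompt: String,
    /// Optional per-request model; overrides any DB-stored value when present.
    model: Option<String>,
}

struct MissionRecord {
    model_override: Option<String>,
}

fn resolve_model(req: &StartParallelRequest, mission: &MissionRecord, default_model: &str) -> String {
    // Request-level model wins, then the DB-stored override, then the default.
    req.model
        .clone()
        .or_else(|| mission.model_override.clone())
        .unwrap_or_else(|| default_model.to_string())
}
```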
Major improvements based on analysis of Oraxen/Folia mission failures:
1. Deliverable tracking (src/task/deliverables.rs)
- Extract expected deliverables from user prompts
- Track file paths that should be created
- Detect research tasks and report requirements
2. Mission health monitoring (src/api/mission_runner.rs)
- Track last activity timestamp
- Detect stalled missions (>60s idle); see the sketch after this list
- Warn when missions end without expected deliverables
- Add multi-step task instructions to prevent premature completion
3. Write file verification (src/tools/file_ops.rs)
- Verify written content matches expected length
- Detect potential truncation (unclosed code blocks, mid-sentence endings)
- Warn users about incomplete writes
4. Model compatibility (src/budget/compatibility.rs, src/llm/error.rs)
- Add IncompatibleModel error type for models with broken tool calling
- Detect non-standard tool call formats in responses
- Registry of known incompatible model prefixes
5. Dashboard improvements (dashboard/src/app/control/control-client.tsx)
- Show stall warnings for missions idle >60s
- Display expected deliverable count
- Visual indicators for stalled/severely stalled missions
6. System prompt improvements (src/agents/leaf/executor.rs)
- Emphasize using provided information (don't ask for given URLs)
- Add multi-step task completion rules
- Require verification of deliverables before completing
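A rough sketch of the stall check from item 2, assuming the runner records an `Instant` on every event; the 60s threshold comes from the description above, while the "severely stalled" threshold and type names are assumptions:

```rust
use std::time::{Duration, Instant};

const STALL_THRESHOLD: Duration = Duration::from_secs(60);
const SEVERE_STALL_THRESHOLD: Duration = Duration::from_secs(180); // assumed value

struct MissionHealth {
    last_activity: Instant,
}

impl MissionHealth {
    fn touch(&mut self) {
        // Call this whenever the mission emits an event or tool result.
        self.last_activity = Instant::now();
    }

    fn stall_state(&self) -> Option<&'static str> {
        let idle = self.last_activity.elapsed();
        if idle >= SEVERE_STALL_THRESHOLD {
            Some("severely stalled")
        } else if idle >= STALL_THRESHOLD {
            Some("stalled")
        } else {
            None
        }
    }
}
```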
- Always fetch from API instead of using cached items
- Prevents stale events from previous tasks being shown
- Remove auto-caching of SSE events (they lack mission_id)
- Add viewingMissionId state to track which mission's events to show
- Filter SSE events by viewing mission ID
- Store items per-mission to preserve context when switching
- Click mission card to switch view (not just View button)
- Load mission history from API when switching missions
- Show checkmark on currently viewing mission
- Track repetition count when the same tool call is repeated (see the sketch after this list)
- After 3 repetitions: inject warning message to break the loop
- After 5 repetitions: force-complete with error message
- Log warnings when loop is detected for debugging
- Add mission_id to all event types for filtering
- Add Running Missions panel showing all parallel missions
- Filter SSE events by current mission to avoid mixing
- Add API functions for getRunningMissions, startMissionParallel, cancelMission
- Auto-show panel when multiple missions are running
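One way to implement the repetition guard described in the loop-detection items above, assuming repeated calls are compared by a fingerprint of tool name plus serialized arguments (the type names and thresholds follow the bullets, everything else is illustrative):

```rust
const WARN_AFTER: u32 = 3;
const FORCE_COMPLETE_AFTER: u32 = 5;

#[derive(Default)]
struct LoopGuard {
    last_call: Option<String>,
    repeats: u32,
}

enum LoopAction {
    Continue,
    InjectWarning,
    ForceComplete,
}

impl LoopGuard {
    /// `fingerprint` is e.g. "tool_name:{serialized_args}".
    fn observe(&mut self, fingerprint: &str) -> LoopAction {
        if self.last_call.as_deref() == Some(fingerprint) {
            self.repeats += 1;
        } else {
            self.last_call = Some(fingerprint.to_string());
            self.repeats = 1;
        }
        match self.repeats {
            r if r >= FORCE_COMPLETE_AFTER => LoopAction::ForceComplete,
            r if r >= WARN_AFTER => LoopAction::InjectWarning,
            _ => LoopAction::Continue,
        }
    }
}
```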
OpenRouter returns Gemini 3 thought signatures in reasoning_details.data,
linked to tool calls by id. We now copy the signature to
tool_call.thought_signature so it is serialized back in subsequent requests.
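In sketch form, the linking step looks roughly like this, assuming the reasoning detail's id matches the tool call's id (field names follow the description above, not necessarily the exact structs):

```rust
struct ReasoningDetail {
    id: Option<String>,
    data: Option<String>, // Gemini 3 thought signature, per OpenRouter
}

struct ToolCall {
    id: String,
    thought_signature: Option<String>,
}

fn attach_thought_signatures(tool_calls: &mut [ToolCall], details: &[ReasoningDetail]) {
    for call in tool_calls.iter_mut() {
        if let Some(detail) = details
            .iter()
            .find(|d| d.id.as_deref() == Some(call.id.as_str()))
        {
            // Copy the signature so it is serialized back on the next request.
            call.thought_signature = detail.data.clone();
        }
    }
}
```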
- StartParallel now spawns independent MissionRunner instances
- Each parallel mission has its own history, queue, and cancellation token
- Events from parallel missions tagged with mission_id
- ListRunning returns all parallel runners
- CancelMission works for parallel runners
- Polling loop cleans up completed parallel missions
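A simplified sketch of the spawning pattern, assuming tokio tasks, tokio-util cancellation tokens, and a registry keyed by mission id (names and the runner body are approximate):

```rust
use std::{collections::HashMap, sync::Arc};
use tokio::sync::Mutex;
use tokio_util::sync::CancellationToken;

struct ParallelRunner {
    cancel: CancellationToken,
    handle: tokio::task::JoinHandle<()>,
}

#[derive(Default)]
struct RunnerRegistry {
    runners: Arc<Mutex<HashMap<String, ParallelRunner>>>,
}

impl RunnerRegistry {
    async fn start_parallel(&self, mission_id: String) {
        let cancel = CancellationToken::new();
        let child = cancel.clone();
        let id = mission_id.clone();
        let handle = tokio::spawn(async move {
            // Each mission gets its own history/queue inside run_mission (not shown).
            tokio::select! {
                _ = child.cancelled() => { /* cancelled */ }
                _ = run_mission(&id) => { /* finished */ }
            }
        });
        self.runners
            .lock()
            .await
            .insert(mission_id, ParallelRunner { cancel, handle });
    }

    async fn cancel_mission(&self, mission_id: &str) {
        if let Some(runner) = self.runners.lock().await.get(mission_id) {
            runner.cancel.cancel();
        }
    }
}

async fn run_mission(_mission_id: &str) { /* placeholder for the real runner loop */ }
```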
Mark defaultValue as nonisolated(unsafe) to fix Xcode Cloud build error:
"Static property 'defaultValue' is not concurrency-safe because it is
nonisolated global shared mutable state"
Some models (Kimi) return `reasoning` as a plain string and
`reasoning_details` as an array. Others may return just one.
- Add flexible get_reasoning() method to handle both formats
- Parse reasoning field as serde_json::Value to avoid type errors
- Convert string reasoning to ReasoningContent when needed
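A hedged sketch of the flexible accessor, treating `reasoning` as an untyped `serde_json::Value` so both the plain-string and array shapes deserialize cleanly (the struct layout and the "text" field inside reasoning blocks are assumptions):

```rust
use serde::Deserialize;
use serde_json::Value;

#[derive(Deserialize)]
struct OpenRouterMessage {
    /// Kimi returns a plain string here; other models may omit it entirely.
    #[serde(default)]
    reasoning: Option<Value>,
    /// Gemini-style models return an array of reasoning blocks here.
    #[serde(default)]
    reasoning_details: Option<Vec<Value>>,
}

impl OpenRouterMessage {
    /// Normalizes either shape into a list of reasoning strings.
    fn get_reasoning(&self) -> Vec<String> {
        if let Some(details) = &self.reasoning_details {
            return details
                .iter()
                // "text" is an assumed key for the block's content.
                .filter_map(|d| d.get("text").and_then(Value::as_str).map(str::to_owned))
                .collect();
        }
        match &self.reasoning {
            Some(Value::String(s)) => vec![s.clone()],
            _ => Vec::new(),
        }
    }
}
```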
- Change default model ladder to gemini-3-flash → qwen → deepseek
- Add cost warning: never use Claude models (10-50x more expensive)
- Update model tier examples to exclude Claude
- Update DEFAULT_MODEL documentation
- Add thought_signature field to ToolCall and FunctionCall structs
- Add alias for reasoning_details in OpenRouterMessage
- Expand ReasoningContent with format/index fields
- Add debug logging for reasoning block tracking
Fixes Gemini 3 "Function call is missing a thought_signature" errors
When a model override is specified via the control session, the Model
Selector node now displays the requested model immediately instead of
showing "Selecting optimal model..." until completion. This gives better
visibility into which model is being used for each mission.
The /api/control/message endpoint now accepts an optional `model` field
to specify which model to use for a particular message. This enables:
- Model comparison tests from the dashboard
- Per-message model selection in the control session
The model override is passed through to the task's requested_model field,
which the ModelSelector respects when choosing the execution model.
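The request shape and the hand-off into the task look roughly like this (names are illustrative; only the optional `model` field and `requested_model` are taken from the description above):

```rust
#[derive(serde::Deserialize)]
struct ControlMessageRequest {
    message: String,
    /// Optional per-message model override, e.g. "google/gemini-3-flash-preview".
    model: Option<String>,
}

struct Task {
    prompt: String,
    /// The ModelSelector treats this as the user's explicit choice when set.
    requested_model: Option<String>,
}

fn task_from_request(req: ControlMessageRequest) -> Task {
    Task {
        prompt: req.message,
        requested_model: req.model,
    }
}
```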
Gemini 3 and other "thinking" models require reasoning blocks with
thought_signature to be preserved in subsequent requests when using
tool calls. This enables the model to resume its chain of thought.
Changes:
- Add ReasoningContent struct for reasoning blocks
- Add reasoning field to ChatMessage and ChatResponse
- Parse reasoning from OpenRouter responses
- Preserve reasoning when building assistant messages with tool calls
Reference: https://openrouter.ai/docs/use-cases/reasoning-tokens
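A compact sketch of the shapes involved; the struct names come from the list above, while the fields inside ReasoningContent are assumptions:

```rust
use serde::{Deserialize, Serialize};

/// One reasoning block returned by OpenRouter for thinking models.
#[derive(Clone, Serialize, Deserialize)]
struct ReasoningContent {
    #[serde(skip_serializing_if = "Option::is_none")]
    text: Option<String>, // assumed field name
    #[serde(skip_serializing_if = "Option::is_none")]
    signature: Option<String>, // assumed field name for the thought signature
}

#[derive(Clone, Serialize, Deserialize)]
struct ChatMessage {
    role: String,
    content: String,
    /// Preserved verbatim and sent back so the model can resume its chain of thought.
    #[serde(skip_serializing_if = "Option::is_none")]
    reasoning: Option<Vec<ReasoningContent>>,
}
```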
- scripts/check_results.py: Python script to check task results
- scripts/run_security_test.sh: Interactive security test runner
- test_results/MODEL_ANALYSIS_REPORT.md: Comprehensive analysis of model selection
Key findings:
- 8/10 requested models work with the agent
- Gemini 3 thinking models require special reasoning token handling
- Price-based capability estimation underestimates cheap models
- Benchmark data integration not working properly
- Add moonshotai/kimi-k2-thinking, x-ai/grok-4.1-fast, google/gemini-3-flash-preview,
deepseek/deepseek-v3.2-speciale, qwen/qwen3-vl-235b-a22b-thinking, amazon/nova-pro-v1,
z-ai/glm-4.6v and related model variants to the model allowlist
- Create test_model_comparison.sh for full security research task comparison
- Create quick_model_test.sh for rapid model capability verification
Backend:
- Add tree_snapshot and progress_snapshot to ControlState for state persistence
- Add GET /api/control/tree and GET /api/control/progress endpoints
- Emit progress events after each subtask wave completes
- Store tree snapshot when emitting tree events
Frontend:
- Agents page fetches tree snapshot on mount before subscribing to SSE
- Control page fetches progress on mount and shows "Subtask X/Y" indicator
- Both pages handle progress SSE events for real-time updates
- Clear state when agent goes idle
Documentation:
- Update dashboard.mdc with refresh resilience pattern
- Update project.mdc with new control session endpoints
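A sketch of the backend half of this refresh-resilience pattern: the last tree/progress event is stored alongside the control state so clients that mount after the stream started can fetch it before subscribing to SSE (handler plumbing omitted; field and method names are assumptions):

```rust
use std::sync::{Arc, Mutex};
use serde_json::Value;

#[derive(Default)]
struct ControlState {
    /// Last emitted agent-tree event, kept so GET /api/control/tree can replay it.
    tree_snapshot: Mutex<Option<Value>>,
    /// Last emitted progress event ("Subtask X/Y"), served by GET /api/control/progress.
    progress_snapshot: Mutex<Option<Value>>,
}

impl ControlState {
    fn store_tree(&self, event: Value) {
        *self.tree_snapshot.lock().unwrap() = Some(event);
    }

    /// Handler body for GET /api/control/tree (framework wiring not shown).
    fn tree_handler(state: &Arc<ControlState>) -> Option<Value> {
        state.tree_snapshot.lock().unwrap().clone()
    }
}
```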
- Recent Tasks widget now shows missions instead of tasks (fixing broken links)
- History page now shows Recent Runs at the top, before Missions
- This ensures completed runs are visible without scrolling past 50+ missions
The verifier was making decisions without seeing what the executor actually
produced. Now the last_output from the executor is stored on the Task and
passed to the LLM verifier prompt, enabling it to make informed verification
decisions based on actual results rather than just the task description.
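Roughly, the wiring looks like this (field and function names are illustrative):

```rust
struct Task {
    description: String,
    /// Raw output from the executor's final step, stored after execution.
    last_output: Option<String>,
}

fn build_verifier_prompt(task: &Task) -> String {
    let output = task
        .last_output
        .as_deref()
        .unwrap_or("(no executor output captured)");
    format!(
        "Task: {}\n\nExecutor output:\n{}\n\nDecide whether the task is complete.",
        task.description, output
    )
}
```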
Changes:
- Add CLI-preferred approach guidance to task splitting prompt
- Propagate requested_model from parent task to all subtasks
- Use user-requested model directly if available instead of optimizing
- Fix lifetime issues in model selector
These changes should improve Chrome extension extraction tasks by:
1. Preferring curl/wget over browser automation in subtask planning
2. Respecting the user's model choice for all subtasks
Key improvements:
- Add dependencies field to task splitting prompt so LLM specifies execution order
- Parse dependencies from LLM response and use them for wave-based execution
- Respect user-requested model as minimum capability floor in model selector
- Add guidance to prefer CLI tools over desktop automation in executor prompt
- Include Chrome extension download URL pattern in system prompt
This fixes the issue where all subtasks ran in parallel even when they had
implicit dependencies (e.g., can't analyze code before downloading it).
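A sketch of the capability-floor rule from the list above, assuming each model carries a numeric capability score (the scoring, lookup, and tie-breaking are illustrative):

```rust
struct ModelInfo {
    id: String,
    capability: f32, // assumed 0.0..=1.0 score
}

/// Pick the least capable (typically cheapest) candidate that still meets the
/// floor set by the user-requested model; fall back to the requested model itself.
fn select_model<'a>(
    candidates: &'a [ModelInfo],
    requested: Option<&'a ModelInfo>,
) -> Option<&'a ModelInfo> {
    let floor = requested.map(|m| m.capability).unwrap_or(0.0);
    candidates
        .iter()
        .filter(|m| m.capability >= floor)
        .min_by(|a, b| a.capability.partial_cmp(&b.capability).unwrap())
        .or(requested)
}
```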
The motion.g wrapper was animating y position while child elements
also used absolute y coordinates. This caused the y offset to be
applied twice, making nodes appear far below where edges ended.
Fix: Remove y animation from the motion.g wrapper, keep only
opacity and scale animations. Child elements already use the
correct absolute x,y coordinates.
Changes:
- Reduced horizontal gap from 60 to 30 for more compact layout
- Reduced vertical gap from 120 to 100 for better visibility
- Added minimum zoom of 0.4 for auto-fit (nodes stay readable)
- Fixed vertical positioning - tall trees now start from the top instead of being centered
- Increased minimum zoom for scroll from 0.2 to 0.3
- Reduced zoom button increment from 1.2 to 1.15 for finer control
Issues fixed:
- Layout algorithm was creating edges with wrong positions - rewrote
to use proper two-pass approach (compute positions first, then edges)
- Zoom sensitivity was too strong (0.9/1.1) - reduced to 0.97/1.03
- Tree now auto-fits to viewport on initial render and when switching demos
- Renamed "Reset" button to "Fit" for clearer purpose
- Added padding around tree for better visibility
Features:
- SVG-based tree visualization with framer-motion animations
- Curved glowing connections with status-based colors
- Each node displays: name, model, status icon, budget spent/allocated
- Interactive pan/zoom with mouse controls
- Details panel on node click showing full info
- Demo mode with 3 tree generators (simple, complex, deep)
- Live simulation updates for testing without API
Components:
- AgentTreeCanvas: Main visualization with SVG rendering
- AnimatedEdge: Curved paths with glow effects and pulse animation
- AnimatedNode: Spring-animated nodes with status indicators
- NodeDetailsPanel: Slide-in panel with agent details
Demo modes:
- Simple: 5 nodes basic orchestrator
- Complex: 10-15 nodes with subtask decomposition
- Deep: 50+ nodes recursive nesting for stress testing
Documentation added to .cursor/rules/dashboard.mdc
When navigating to the Control view while the agent is running,
users would only see the completed history without any indication
that the agent was actively working.
Changes:
- Add "Agent is working..." indicator when runState is running but
no active thinking/phase item is visible
- Show animated loader in empty state when agent is running
- Update status handling to properly track reconnection state
This ensures users see that the agent is busy even if they navigate
to Control mid-execution.
- Add ContextConfig to config.rs with all context-related limits:
  - Conversation history: max messages, chars per message, total chars
  - Memory retrieval: chunk limit, threshold, facts limit, summaries limit
  - Tool results: truncation limit
  - Directory names: context, work, tools
- Create ContextBuilder (src/memory/context.rs) with:
  - SessionContext: time, working_dir, context_files, mission_title
  - MemoryContext: past_experience, user_facts, mission_summaries
  - Structured context assembly with format() methods
  - Unit tests for all context types
- Refactor executor.rs to use ContextBuilder:
  - Remove hardcoded magic numbers
  - Use config-driven limits for truncation
  - Delegate context building to ContextBuilder
- Refactor control.rs to use ContextBuilder:
  - Remove duplicate build_conversation_context function
  - Use shared ContextConfig for history limits
- Update cursor rules with context architecture docs:
  - project.mdc: Context system section with ContextConfig table
  - secrets.mdc: Environment variables for context limits
All limits now configurable via CONTEXT_* environment variables.
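A trimmed-down sketch of the env-driven config; the variable names follow the CONTEXT_* convention mentioned above, but the exact names and defaults are assumptions:

```rust
fn env_usize(key: &str, default: usize) -> usize {
    std::env::var(key)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}

#[derive(Debug, Clone)]
struct ContextConfig {
    max_history_messages: usize,
    max_chars_per_message: usize,
    max_total_chars: usize,
    tool_result_truncate: usize,
}

impl ContextConfig {
    fn from_env() -> Self {
        Self {
            max_history_messages: env_usize("CONTEXT_MAX_HISTORY_MESSAGES", 50),
            max_chars_per_message: env_usize("CONTEXT_MAX_CHARS_PER_MESSAGE", 4_000),
            max_total_chars: env_usize("CONTEXT_MAX_TOTAL_CHARS", 60_000),
            tool_result_truncate: env_usize("CONTEXT_TOOL_RESULT_TRUNCATE", 8_000),
        }
    }
}
```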
- Add stopPropagation to panel div to prevent click bubbling
- Add stopPropagation to close buttons
- Allow clicking selected card to toggle/close panel
UI Fixes:
- Agent details panel now uses fixed overlay positioning (like modules page)
- Added close button (X) and back chevron to panel header
- Panel no longer resizes the main content area
Parallel Task Execution:
- Add execution_waves() method to SubtaskPlan for parallel grouping
- Tasks with no dependencies execute in parallel within each wave
- Wave-based execution respects dependency ordering
- Tree updates are thread-safe using Arc<Mutex>
- Logs show parallel wave progress
This enables faster task execution when subtasks are independent.
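A sketch of how execution_waves() can group subtasks, assuming each subtask lists the indices of the subtasks it depends on (a simple layering over the dependency graph; the struct layout is illustrative):

```rust
struct Subtask {
    name: String,
    /// Indices into the plan's subtask list that must finish first.
    dependencies: Vec<usize>,
}

struct SubtaskPlan {
    subtasks: Vec<Subtask>,
}

impl SubtaskPlan {
    /// Groups subtask indices into waves; everything in a wave can run in parallel.
    fn execution_waves(&self) -> Vec<Vec<usize>> {
        let mut done = vec![false; self.subtasks.len()];
        let mut waves = Vec::new();
        while done.iter().any(|d| !d) {
            let wave: Vec<usize> = self
                .subtasks
                .iter()
                .enumerate()
                .filter(|(i, t)| !done[*i] && t.dependencies.iter().all(|d| done[*d]))
                .map(|(i, _)| i)
                .collect();
            if wave.is_empty() {
                break; // dependency cycle: bail out rather than loop forever
            }
            for &i in &wave {
                done[i] = true;
            }
            waves.push(wave);
        }
        waves
    }
}
```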
Backend:
- Add AgentTreeNode struct for tree visualization data
- Add AgentTree event type for streaming tree updates
- RootAgent now builds and emits tree structure as it executes
- NodeAgent has execute_with_tree for recursive tree updates
- Tree updates emitted at each step: complexity estimation, model selection,
task splitting, execution, and verification
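A sketch of the tree payload emitted by the backend above; the field names are inferred from what the dashboard displays (name, model, status, budget spent/allocated) and are not guaranteed to match the real struct:

```rust
use serde::Serialize;

#[derive(Serialize, Clone)]
struct AgentTreeNode {
    id: String,
    name: String,
    model: Option<String>,
    status: String, // e.g. "pending" | "running" | "completed" | "failed"
    budget_allocated: f64,
    budget_spent: f64,
    children: Vec<AgentTreeNode>,
}
```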
Frontend:
- Add realTree state that receives live tree updates via SSE
- Convert backend tree format to frontend AgentNode format
- Auto-expand new nodes as they appear
- Fall back to mock tree when no real data available
This provides real-time visibility into the agent execution hierarchy,
showing multi-level task splitting and progress as it happens.
- Add AgentPhase event type showing what the agent is doing during prep:
- estimating_complexity, selecting_model, splitting_task, executing, verifying
- Emit phase events from RootAgent and NodeAgent during their planning phases
- Add emit_phase helper to AgentContext for easy event emission
- Update control-client.tsx to display phase indicator with agent name
- Phase indicator shows current step with subtle animation before thinking starts
This addresses the delay before "thinking" appears by providing immediate
visual feedback when complexity estimation and model selection are happening.
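The phase enum and emit_phase helper, sketched; the phase names come from the list above, while the event shape and emitter wiring are assumptions:

```rust
use serde::Serialize;

#[derive(Serialize, Clone, Copy)]
#[serde(rename_all = "snake_case")]
enum AgentPhase {
    EstimatingComplexity,
    SelectingModel,
    SplittingTask,
    Executing,
    Verifying,
}

#[derive(Serialize)]
struct AgentPhaseEvent {
    agent: String,
    phase: AgentPhase,
}

struct AgentContext {
    // In the real code this holds the SSE event sender; a closure stands in here.
    emit: Box<dyn Fn(AgentPhaseEvent)>,
}

impl AgentContext {
    /// Emits a phase event so the dashboard can show the indicator immediately.
    fn emit_phase(&self, agent_name: &str, phase: AgentPhase) {
        (self.emit)(AgentPhaseEvent {
            agent: agent_name.to_string(),
            phase,
        });
    }
}
```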
Dashboard:
- Redesigned Agent Tree page with hierarchical tree visualization
- Added Tree Overview mini-map showing agent counts by status
- Added collapsible nodes with smooth animations
- Added color-coded status indicators (running/completed/failed/pending)
- Added details panel showing agent info, budget, complexity, and logs
- Added Expand All / Collapse All controls
Backend:
- Enhanced NodeAgent with recursive complexity estimation and splitting
- Added extract_json helper for robust LLM response parsing
- RootAgent now delegates to NodeAgent for recursive subtask execution
Models with extended thinking (Gemini 3, Claude 3.7+) require
reasoning_details to be preserved from responses and passed back
in subsequent requests for tool calling to work properly.
- Add reasoning_details field to ChatMessage and ChatResponse
- Parse reasoning_details from OpenRouter responses
- Preserve reasoning_details when creating assistant messages