Add stall detection warning for stuck agent operations (#28)

* Add multi-user auth and per-user control sessions

* Add mission store abstraction and auth UX polish

* Fix unused warnings in tooling

* Fix Bugbot review issues

- Prevent username enumeration by using generic error message
- Add pagination support to InMemoryMissionStore::list_missions
- Improve config error when JWT_SECRET missing but DASHBOARD_PASSWORD set
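
The pagination change could look roughly like the sketch below. The
`Mission` struct and the `limit`/`offset` parameters are assumptions for
illustration; the actual signature of InMemoryMissionStore::list_missions
is not shown in this commit.

```rust
/// Hypothetical mission record; the real store's type is not shown here.
#[derive(Clone, Debug, PartialEq)]
struct Mission {
    id: u32,
    title: String,
}

/// Return at most `limit` missions, skipping the first `offset`.
/// An `offset` past the end yields an empty page instead of panicking.
fn list_missions(all: &[Mission], limit: usize, offset: usize) -> Vec<Mission> {
    all.iter().skip(offset).take(limit).cloned().collect()
}
```

Using iterator `skip`/`take` rather than slice indexing keeps out-of-range
offsets safe without extra bounds checks.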

* Trim stored username in comparison for consistency

* Fix mission cleanup to also remove orphaned tree data

* Refactor Open Agent as OpenCode workspace host

* Remove chromiumoxide and pin @types/react

* Pin idna_adapter for MSRV compatibility

* Add host-mcp bin target

* Use isolated Playwright MCP sessions

* Allow Playwright MCP as root

* Fix iOS dashboard warnings

* Add autoFocus to username field in multi-user login mode

Mirrors the iOS implementation, which focuses the username field
when multi-user auth mode is active.

* Fix Bugbot review issues

- Add conditional ellipsis for tool descriptions (only when > 32 chars)
- Add serde(default) to JWT usr field for backward compatibility
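
A minimal sketch of the conditional-ellipsis rule, written in Rust for
illustration. The function name is hypothetical and the real change may
live in UI code; only the "append an ellipsis when longer than 32 chars"
rule comes from this commit.

```rust
/// Shorten `text` to at most `max_chars` characters, appending an ellipsis
/// only when something was actually cut off.
fn truncate_description(text: &str, max_chars: usize) -> String {
    // Count characters, not bytes, so multi-byte text is never split mid-char.
    if text.chars().count() > max_chars {
        let head: String = text.chars().take(max_chars).collect();
        format!("{}...", head)
    } else {
        text.to_string()
    }
}
```

With `max_chars` set to 32, a description of exactly 32 characters is
returned unchanged, and a 33-character one is cut to 32 plus the ellipsis.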

* Fix empty user ID fallback in multi-user auth

Add effective_user_id helper that falls back to username when id is empty,
preventing session sharing and token verification issues.
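
The fallback might be sketched as follows. The `AuthUser` struct and its
field names are assumptions, as is treating a whitespace-only id as empty;
only the helper name and the fallback rule come from this commit.

```rust
/// Hypothetical user record; field names are assumptions.
struct AuthUser {
    id: String,
    username: String,
}

impl AuthUser {
    /// Use the configured id when present, otherwise fall back to the
    /// username so that two users never share the empty id "".
    fn effective_user_id(&self) -> &str {
        if self.id.trim().is_empty() {
            self.username.trim()
        } else {
            self.id.trim()
        }
    }
}
```

Keying sessions and tokens on this value rather than on the raw id is what
keeps users with an unset id from landing in the same session.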

* Fix parallel mission history preservation

Load existing mission history into runner before starting parallel
execution to prevent losing conversation context.

* Fix desktop stream controls layout overflow on iPad

- Add frame(maxWidth: .infinity) constraints to ensure controls stay
  within bounds on wide displays
- Add alignment: .leading to VStacks for consistent layout
- Add Spacer() to buttons row to prevent spreading
- Increase label width to 55 for consistent FPS/Quality alignment
- Add alignment: .trailing to value text frames

* Fix queued user messages not persisted to mission history

When a user message was queued (sent while another task was running),
it was not being added to the history or persisted to the database.
This caused queued messages to be lost from mission history.

Added the same persistence logic used for initial messages to the
queued message handling code path.

* Add stall detection warning for stuck agent operations

When an agent hasn't reported activity for 60+ seconds, show a warning
banner in the chat UI with a Stop button. After 120+ seconds, the banner
turns red and the button label changes to Force Stop.

Changes:
- Dashboard: Add viewingMissionStallSeconds tracking and stall warning banner
- Backend: Update parallel runner last_activity when receiving events

This helps users identify and cancel stuck missions (e.g., when OpenCode
tool execution hangs indefinitely).
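
The threshold logic can be summarized with the sketch below. It is a Rust
rendering for illustration only: the dashboard implements this in
TypeScript, and the enum, constants, and function names here are
hypothetical. The 60s/120s thresholds, the "running" state check, and the
button labels come from this commit.

```rust
#[derive(Debug, PartialEq)]
enum StallLevel {
    Healthy,
    Warning,
    Severe,
}

const STALL_WARNING_SECS: u64 = 60;
const STALL_SEVERE_SECS: u64 = 120;

/// Classify a mission by how long it has been silent.
/// Only missions in the "running" state can be considered stalled.
fn stall_level(state: &str, seconds_since_activity: u64) -> StallLevel {
    if state != "running" {
        return StallLevel::Healthy;
    }
    if seconds_since_activity >= STALL_SEVERE_SECS {
        StallLevel::Severe
    } else if seconds_since_activity >= STALL_WARNING_SECS {
        StallLevel::Warning
    } else {
        StallLevel::Healthy
    }
}

/// Label for the banner's stop button, or `None` when no banner is shown.
fn stop_button_label(level: &StallLevel) -> Option<&'static str> {
    match level {
        StallLevel::Healthy => None,
        StallLevel::Warning => Some("Stop"),
        StallLevel::Severe => Some("Force Stop"),
    }
}
```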

* Fix main mission stall detection always reporting zero

Track main_runner_last_activity separately from parallel runners.
Update activity timestamp when events match the running main mission.

Resolves Bugbot review finding.
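
The routing of events to the main or a parallel runner can be sketched as
below. This is a simplified, std-only model: the `AgentEvent` variants are
reduced to two, `u32` stands in for the real mission id type, and the
`Activity` and `Touched` types are hypothetical.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Simplified stand-in for the real event type; only the mission id matters.
enum AgentEvent {
    ToolCall { mission_id: Option<u32> },
    Progress { mission_id: Option<u32> },
    Other,
}

/// Extract the mission id from events that carry one.
fn event_mission_id(event: &AgentEvent) -> Option<u32> {
    match event {
        AgentEvent::ToolCall { mission_id } | AgentEvent::Progress { mission_id } => *mission_id,
        AgentEvent::Other => None,
    }
}

#[derive(Debug, PartialEq)]
enum Touched {
    Main,
    Parallel(u32),
    Nobody,
}

struct Activity {
    running_mission_id: Option<u32>,
    main_last_activity: Instant,
    parallel_last_activity: HashMap<u32, Instant>,
}

impl Activity {
    /// Refresh the timestamp of whichever runner owns the event's mission.
    fn record(&mut self, event: &AgentEvent, now: Instant) -> Touched {
        let mid = match event_mission_id(event) {
            Some(mid) => mid,
            None => return Touched::Nobody,
        };
        if self.running_mission_id == Some(mid) {
            self.main_last_activity = now;
            Touched::Main
        } else if let Some(last) = self.parallel_last_activity.get_mut(&mid) {
            *last = now;
            Touched::Parallel(mid)
        } else {
            Touched::Nobody
        }
    }
}
```

Checking the main runner first matches the order used in the actor loop,
so a mission is never counted against both trackers.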

* Reset stall timer when new task starts

Reset main_runner_last_activity when spawning a new task to prevent
false stall warnings from idle time between tasks.

Resolves Bugbot review finding.
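
Why the reset matters can be seen in a small sketch. `ActivityTracker` is
a hypothetical wrapper (the actual code uses a bare `std::time::Instant`
variable), and `now` is passed in explicitly so the behavior is easy to
check without sleeping.

```rust
use std::time::Instant;

/// Tracks when a runner last showed signs of life.
struct ActivityTracker {
    last_activity: Instant,
}

impl ActivityTracker {
    fn starting_at(now: Instant) -> Self {
        ActivityTracker { last_activity: now }
    }

    /// Call on every event, and also when a new task is spawned.
    fn touch(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// Whole seconds elapsed since the last recorded activity.
    fn seconds_since_activity(&self, now: Instant) -> u64 {
        now.saturating_duration_since(self.last_activity).as_secs()
    }
}
```

If the session sits idle for 90 seconds and a task then starts without a
`touch`, the first status report would already show 90 seconds of
"inactivity" and trigger the warning banner. Touching at spawn time starts
the count from zero.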

* Update CLAUDE.md to prefer debug builds by default

- Debug builds compile 5-10x faster than release builds
- Only use --release for production deployment or when explicitly requested
- Add a Build Mode Policy section documenting this preference

Author: Thomas Marchand (committed by GitHub)
Date: 2026-01-04 23:55:27 -08:00
Parent: a3d3437b1d
Commit: aa65c4a1ef
3 changed files with 113 additions and 8 deletions

@@ -15,14 +15,18 @@ Minimal autonomous coding agent in Rust with **full machine access** (not sandbo
## Commands
```bash
-# Backend
-cargo build --release # Build
-cargo run --release # Run server (port 3000)
-RUST_LOG=debug cargo run # Debug mode
+# Backend - ALWAYS use debug builds by default (faster compilation)
+cargo build # Build (debug mode - use this by default)
+cargo run # Run server (port 3000)
+RUST_LOG=debug cargo run # Run with debug logging
cargo test # Run tests
cargo fmt # Format code
cargo clippy # Lint
+# Release builds - ONLY use when explicitly requested or for production deployment
+cargo build --release # Release build (slower compilation, faster binary)
+cargo run --release # Run in release mode
# Dashboard (uses Bun, NOT npm/yarn/pnpm)
cd dashboard
bun install # Install deps (NEVER use npm install)
@@ -34,10 +38,18 @@ bun run build # Production build
# - bun add <pkg> (not npm install <pkg>)
# - bun run <script> (not npm run <script>)
-# Deployment
+# Deployment (release build required for production)
ssh root@95.216.112.253 'cd /root/open_agent && git pull && cargo build --release && cp target/release/open_agent /usr/local/bin/ && cp target/release/desktop-mcp /usr/local/bin/ && cp target/release/host-mcp /usr/local/bin/ && systemctl restart open_agent'
```
+## Build Mode Policy
+**Always prefer debug builds** unless explicitly requested otherwise:
+- Debug builds compile much faster (~5-10x)
+- Use `cargo build` and `cargo run` (no `--release` flag)
+- Only use `--release` for production deployment or when user explicitly requests it
+- Performance difference is negligible for development/testing
## Architecture
Open Agent uses OpenCode as its execution backend, enabling Claude Max subscription usage.
@@ -144,7 +156,7 @@ OPENCODE_PERMISSIVE=true
**Desktop Tools with OpenCode:**
To enable desktop tools (i3, Xvfb, screenshots):
-1. Build the MCP servers: `cargo build --release --bin desktop-mcp --bin host-mcp`
+1. Build the MCP servers: `cargo build --bin desktop-mcp --bin host-mcp` (use `--release` only for production)
2. Workspace `opencode.json` files are generated automatically under `workspaces/`
from `.openagent/mcp/config.json` (override by editing MCP configs via the UI).
3. OpenCode will automatically load the tools from the MCP server

@@ -70,6 +70,7 @@ import {
PanelRight,
Wifi,
WifiOff,
+AlertTriangle,
} from "lucide-react";
import {
OptionList,
@@ -664,6 +665,18 @@ export default function ControlClient() {
return mission.state === "running" || mission.state === "waiting_for_tool";
}, [viewingMissionId, runningMissions, runState]);
+// Check if the mission we're viewing appears stalled (no activity for 60+ seconds)
+const viewingMissionStallSeconds = useMemo(() => {
+if (!viewingMissionId) return 0;
+const mission = runningMissions.find((m) => m.mission_id === viewingMissionId);
+if (!mission) return 0;
+if (mission.state !== "running") return 0;
+return mission.seconds_since_activity;
+}, [viewingMissionId, runningMissions]);
+const isViewingMissionStalled = viewingMissionStallSeconds >= 60;
+const isViewingMissionSeverelyStalled = viewingMissionStallSeconds >= 120;
const isBusy = viewingMissionIsRunning;
const streamCleanupRef = useRef<null | (() => void)>(null);
@@ -2535,6 +2548,53 @@ export default function ControlClient() {
</div>
)}
+{/* Stall warning banner when agent hasn't reported activity for 60+ seconds */}
+{isViewingMissionStalled && viewingMissionId && (
+<div className="flex justify-center py-4 animate-fade-in">
+<div className={cn(
+"flex flex-col sm:flex-row items-start sm:items-center gap-3 rounded-xl px-5 py-4",
+isViewingMissionSeverelyStalled
+? "bg-red-500/10 border border-red-500/20"
+: "bg-amber-500/10 border border-amber-500/20"
+)}>
+<div className="flex items-center gap-3">
+<AlertTriangle className={cn(
+"h-5 w-5 shrink-0",
+isViewingMissionSeverelyStalled ? "text-red-400" : "text-amber-400"
+)} />
+<div className="text-sm">
+<span className={cn(
+"font-medium",
+isViewingMissionSeverelyStalled ? "text-red-400" : "text-amber-400"
+)}>
+Agent may be stuck
+</span>
+<span className="text-white/50 ml-1">
+No activity for {Math.floor(viewingMissionStallSeconds)}s
+</span>
+<p className="text-white/40 text-xs mt-1">
+{isViewingMissionSeverelyStalled
+? "The agent appears to be stuck on a long-running operation. Consider stopping it."
+: "A tool or external operation may be taking longer than expected."}
+</p>
+</div>
+</div>
+<button
+onClick={() => handleCancelMission(viewingMissionId)}
+className={cn(
+"shrink-0 inline-flex items-center gap-1.5 rounded-lg px-3 py-1.5 text-sm font-medium transition-colors",
+isViewingMissionSeverelyStalled
+? "bg-red-500 text-white hover:bg-red-400"
+: "bg-amber-500/20 text-amber-400 hover:bg-amber-500/30 border border-amber-500/30"
+)}
+>
+<Square className="h-3.5 w-3.5" />
+{isViewingMissionSeverelyStalled ? "Force Stop" : "Stop"}
+</button>
+</div>
+</div>
+)}
{/* Continue banner for blocked missions */}
{activeMission?.status === "blocked" && items.length > 0 && (
<div className="flex justify-center py-4">

@@ -1624,7 +1624,7 @@ fn spawn_control_session(
mission_store: Arc<dyn MissionStore>,
) -> ControlState {
let (cmd_tx, cmd_rx) = mpsc::channel::<ControlCommand>(256);
-let (events_tx, _events_rx) = broadcast::channel::<AgentEvent>(1024);
+let (events_tx, events_rx) = broadcast::channel::<AgentEvent>(1024);
let tool_hub = Arc::new(FrontendToolHub::new());
let status = Arc::new(RwLock::new(ControlStatus {
state: ControlRunState::Idle,
@@ -1666,6 +1666,7 @@ fn spawn_control_session(
mission_cmd_rx,
mission_cmd_tx,
events_tx.clone(),
+events_rx,
tool_hub,
status,
current_mission,
@@ -1749,6 +1750,7 @@ async fn control_actor_loop(
mut mission_cmd_rx: mpsc::Receiver<crate::tools::mission::MissionControlCommand>,
mission_cmd_tx: mpsc::Sender<crate::tools::mission::MissionControlCommand>,
events_tx: broadcast::Sender<AgentEvent>,
+mut events_rx: broadcast::Receiver<AgentEvent>,
tool_hub: Arc<FrontendToolHub>,
status: Arc<RwLock<ControlStatus>>,
current_mission: Arc<RwLock<Option<Uuid>>>,
@@ -1767,6 +1769,8 @@ async fn control_actor_loop(
// Track which mission the main `running` task is actually working on.
// This is different from `current_mission` which can change when user creates a new mission.
let mut running_mission_id: Option<Uuid> = None;
+// Track last activity for the main runner (for stall detection)
+let mut main_runner_last_activity: std::time::Instant = std::time::Instant::now();
// Parallel mission runners - each runs independently
let mut parallel_runners: std::collections::HashMap<
@@ -2101,6 +2105,8 @@ async fn control_actor_loop(
let mission_id = current_mission.read().await.clone();
running_cancel = Some(cancel.clone());
running_mission_id = mission_id;
+// Reset activity timer when new task starts to avoid false stall warnings
+main_runner_last_activity = std::time::Instant::now();
running = Some(tokio::spawn(async move {
let result = run_single_control_turn(
cfg,
@@ -2341,7 +2347,7 @@ async fn control_actor_loop(
state: "running".to_string(),
queue_len: queue.len(),
history_len: history.len(),
-seconds_since_activity: 0, // Main runner doesn't track this yet
+seconds_since_activity: main_runner_last_activity.elapsed().as_secs(),
expected_deliverables: 0,
});
}
@@ -2788,6 +2794,8 @@ async fn control_actor_loop(
// Capture which mission this task is working on
let mission_id = current_mission.read().await.clone();
running_mission_id = mission_id;
+// Reset activity timer when new task starts to avoid false stall warnings
+main_runner_last_activity = std::time::Instant::now();
running = Some(tokio::spawn(async move {
let result = run_single_control_turn(
cfg,
@@ -2871,6 +2879,31 @@ async fn control_actor_loop(
tracing::info!("Parallel mission {} removed from runners", mid);
}
}
+// Update last_activity for runners when we receive events for them
+event = events_rx.recv() => {
+if let Ok(event) = event {
+// Extract mission_id from event if present
+let mission_id = match &event {
+AgentEvent::ToolCall { mission_id, .. } => *mission_id,
+AgentEvent::ToolResult { mission_id, .. } => *mission_id,
+AgentEvent::Thinking { mission_id, .. } => *mission_id,
+AgentEvent::AgentPhase { mission_id, .. } => *mission_id,
+AgentEvent::AgentTree { mission_id, .. } => *mission_id,
+AgentEvent::Progress { mission_id, .. } => *mission_id,
+_ => None,
+};
+// Update last_activity for matching runner (main or parallel)
+if let Some(mid) = mission_id {
+if running_mission_id == Some(mid) {
+// Update main runner activity
+main_runner_last_activity = std::time::Instant::now();
+} else if let Some(runner) = parallel_runners.get_mut(&mid) {
+// Update parallel runner activity
+runner.touch();
+}
+}
+}
+}
}
}
}