
Thomas Marchand b42ed192cf Add real-time desktop streaming for watching AI agent work (#17)

* iOS: Improve mission UI, add auto-reconnect, and refine input field

- Fix missions showing a "Default" label: show the mission ID instead when there is no model override
- Add ConnectionState enum to track SSE stream health with reconnecting/disconnected states
- Implement automatic reconnection with exponential backoff (1s→30s)
- Show connection status in toolbar when disconnecting, hide error bubbles for connection issues
- Fix status event filtering to only apply to currently viewed mission
- Reset run state when creating new mission or switching missions
- Redesign input field to ChatGPT style: clean outline, no background fill, integrated send button

* Add real-time desktop streaming with WebSocket MJPEG

Implements desktop streaming feature to watch the AI agent work in real-time:

- Backend: WebSocket endpoint at /api/desktop/stream using MJPEG frames
- iOS: Bottom sheet UI with play/pause, FPS and quality controls
- Web: Side-by-side split view with toggleable desktop panel
- Better OpenCode error messages for debugging

* Fix Bugbot review issues

- Fix WebSocket reconnection on slider changes by using initial values for URL params
- Fix iOS connected status set before WebSocket actually connects
- Fix mission state mapping to properly handle waiting_for_tool state

* Change default model from Sonnet 4 to Opus 4.5

Update DEFAULT_MODEL default value to claude-opus-4-5-20251101,
the most capable model in the Claude family.

* Fix additional Bugbot review issues

- Add onerror handler for image loading to prevent memory leaks
- Reset isPaused on disconnect to avoid UI desync
- Fix data race on backoff variable using nonisolated(unsafe)

* Address remaining Bugbot review issues

- Make error filtering more specific to SSE reconnection errors only
- Use refs for FPS/quality values to preserve current settings on reconnect

* Fix initial connection state and task cleanup

- Start iOS connection state as disconnected until first event
- Abort spawned tasks when WebSocket handler exits to prevent resource waste

* Fix connection state and backoff logic in iOS ControlView

- Set connectionState to .disconnected on view disappear (was incorrectly .connected)
- Only reset exponential backoff on successful (non-error) events to maintain proper
  backoff behavior when server is unavailable
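The reconnect policy in these notes (a delay that starts at 1s, doubles up to 30s, and is reset only after a successful event) can be sketched in Rust as below. This is an illustration under assumptions, not the iOS Swift implementation; the `Backoff` type and its methods are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical exponential backoff: starts at 1s, doubles after each
/// failed attempt, and is capped at 30s.
struct Backoff {
    current: Duration,
    max: Duration,
}

impl Backoff {
    fn new() -> Self {
        Backoff { current: Duration::from_secs(1), max: Duration::from_secs(30) }
    }

    /// Returns the delay before the next reconnect attempt and doubles the
    /// stored delay (up to the cap) for the attempt after that.
    fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = (self.current * 2).min(self.max);
        delay
    }

    /// Call only after a successful (non-error) event, so repeated failures
    /// while the server is down keep growing the delay.
    fn reset(&mut self) {
        self.current = Duration::from_secs(1);
    }
}

fn main() {
    let mut backoff = Backoff::new();
    let delays: Vec<u64> = (0..7).map(|_| backoff.next_delay().as_secs()).collect();
    println!("{:?}", delays); // [1, 2, 4, 8, 16, 30, 30]
    backoff.reset();
    println!("{}", backoff.next_delay().as_secs()); // 1
}
```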

* Fix fullscreen state sync and stale WebSocket callbacks

- Web: Don't set fullscreen state synchronously; rely on event listeners
- Web: Add fullscreenerror event handler to catch failed fullscreen requests
- iOS: Add connection ID to prevent stale WebSocket callbacks from corrupting
  new connection state when reconnecting
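The connection-ID guard can be illustrated with a small Rust sketch. `StreamState` and its methods are hypothetical names. Each reconnect bumps an ID, callbacks remember the ID they were created with, and a callback whose ID is no longer current is ignored. It also keeps `connected` false until the socket's open callback fires, in line with the earlier fix for setting the connected status too early.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical connection state guarded by a connection ID.
/// Assumes callbacks are delivered one at a time (e.g. on a UI event loop),
/// so the check-then-store in the callbacks does not race.
struct StreamState {
    current_id: AtomicU64,
    connected: AtomicBool,
}

impl StreamState {
    fn new() -> Self {
        StreamState { current_id: AtomicU64::new(0), connected: AtomicBool::new(false) }
    }

    /// Starts a new connection: it is not connected until its onopen fires.
    /// The returned ID is captured by that connection's callbacks.
    fn begin_connection(&self) -> u64 {
        self.connected.store(false, Ordering::SeqCst);
        self.current_id.fetch_add(1, Ordering::SeqCst) + 1
    }

    fn is_current(&self, id: u64) -> bool {
        self.current_id.load(Ordering::SeqCst) == id
    }

    /// onopen: mark connected only if this is still the live connection.
    fn on_open(&self, id: u64) {
        if self.is_current(id) {
            self.connected.store(true, Ordering::SeqCst);
        }
    }

    /// onclose: a stale socket closing must not flip the new one to disconnected.
    fn on_close(&self, id: u64) {
        if self.is_current(id) {
            self.connected.store(false, Ordering::SeqCst);
        }
    }

    fn is_connected(&self) -> bool {
        self.connected.load(Ordering::SeqCst)
    }
}

fn main() {
    let state = StreamState::new();
    let first = state.begin_connection();
    state.on_open(first);
    let second = state.begin_connection(); // reconnect
    state.on_open(second);
    state.on_close(first); // late callback from the old socket: ignored
    println!("connected after stale close: {}", state.is_connected()); // true
}
```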

* Fix user message not appearing when viewing parallel missions

When switching to a parallel mission, currentMission was not being
updated, causing viewingId != currentId. This made the event filter
skip user_message events (which have mission_id: None from main session).

Now always update currentMission when switching, ensuring the filter
passes events correctly.

* Fix web dashboard showing "Agent is working..." for idle missions

Two fixes:
1. Set viewingMissionId immediately when loading mission from URL param
   - Previously viewingMissionId was null, falling back to global runState
   - Now it's set immediately so viewingMissionIsRunning checks runningMissions

2. Add status event filtering by mission_id
   - Status events now only update runState if they match the viewing mission
   - Similar to iOS fix for cross-mission status contamination

* Fix mission not loading when accessed via URL before authentication

When loading a mission via URL param (?mission=...), the initial API
fetch would fail with 401 before the user authenticated. After login,
nothing triggered a re-fetch of the mission data.

Added auth retry mechanism:
- Add signalAuthSuccess() to dispatch event after successful login
- Add authRetryTrigger state and listener in control-client
- Re-fetch mission and providers when auth succeeds

* Fix user message not appearing when viewing a specific mission

The user_message SSE event was being sent with mission_id: None, causing
it to be filtered out by the frontend when viewing a specific mission.
Now we read the current_mission before emitting the event and include
its ID, so the frontend correctly displays the user's message.

* Separate viewed mission from main mission to prevent event leaking

- Thread mission_id through main control runs so assistant/thinking/tool
  events are tagged with the correct mission ID
- Web: Track viewingMission separately from currentMission; filter SSE
  events by mission_id; revert to previous view on load failures
- iOS: Track viewingMission separately from currentMission; filter SSE
  events by mission_id; restore previous view on load failures; parse
  depth from both 'depth' and 'current_depth' SSE fields
- Update "Auto uses" label to Opus 4.5 on web

This prevents mission switching from leaking messages or status updates
across different missions when running parallel missions.
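The filtering rule these commits converge on fits in a one-line predicate: an event is shown only if its `mission_id` equals the mission being viewed, so untagged events (`mission_id: None`) are dropped. The Rust sketch below assumes that rule; apart from `user_message`, the event type names in `main` are illustrative.

```rust
/// Returns true if an SSE event belongs to the mission the user is viewing.
/// Events without a mission_id are never shown, which is why the server
/// must tag user_message, assistant, thinking, and tool events.
fn should_display(event_mission_id: Option<&str>, viewing_mission_id: &str) -> bool {
    event_mission_id == Some(viewing_mission_id)
}

fn main() {
    let viewing = "mission-b";
    // (event type, mission_id) pairs as a client might receive them.
    let events = [
        ("user_message", Some("mission-b")),
        ("assistant", Some("mission-a")), // from a parallel mission
        ("user_message", None),           // untagged: filtered out
        ("tool", Some("mission-b")),
    ];
    for (kind, mission_id) in events {
        let verdict = if should_display(mission_id, viewing) { "show" } else { "skip" };
        println!("{verdict}: {kind} ({mission_id:?})");
    }
}
```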

* Fix Bugbot review issues

- Use getValidJwt() and getRuntimeApiBase() in desktop-stream.tsx
  instead of incorrect storage keys
- Show error toast for mission load failures (except 401 auth errors)
  to fix silent failures for already-authenticated users

* Fix additional Bugbot review issues

- Add connectionId guard to desktop stream WebSocket to prevent race
  conditions where stale onclose callbacks incorrectly set disconnected
  state after reconnection
- Fix sync effect in control-client to only update viewingMission when
  viewingMissionId matches currentMission.id, preventing state corruption
- Restore runState, queueLength, progress on iOS mission switch failure
  to avoid mismatched status indicators

* Add race condition guard to URL-based mission loading

* Fix data race in iOS reconnection backoff using OSAllocatedUnfairLock

Replace nonisolated(unsafe) with proper thread-safe synchronization
using OSAllocatedUnfairLock for the receivedSuccessfulEvent boolean
that is written from the stream callback and read after completion.

2026-01-03 04:21:35 -08:00

.claude

Remove Local Backend, make OpenCode the only execution path (#15)

2026-01-02 12:32:27 -08:00

.cursor/rules

Remove outdated leaf agent docs, reflect SimpleAgent architecture

2025-12-25 20:45:12 +01:00

dashboard

Add real-time desktop streaming for watching AI agent work (#17)

2026-01-03 04:21:35 -08:00

docs

feat: improved missions ux

2025-12-21 09:03:08 +00:00

ios_dashboard

Add real-time desktop streaming for watching AI agent work (#17)

2026-01-03 04:21:35 -08:00

scripts

feat: add GPT-5.2 and qwen3-thinking models, friendlier display names

2025-12-22 21:39:25 +01:00

src

Add real-time desktop streaming for watching AI agent work (#17)

2026-01-03 04:21:35 -08:00

.cursorignore

Initial implementation: core agent with HTTP API and full toolset

2025-12-14 21:15:05 +00:00

.env.example

Add OpenCode integration for backend execution

2026-01-02 07:39:24 +00:00

.gitignore

OpenCode refactor and mission tracking fixes (#14)

2026-01-02 09:45:01 -08:00

Cargo.toml

OpenCode refactor and mission tracking fixes (#14)

2026-01-02 09:45:01 -08:00

models_with_benchmarks.json

feat: add GPT-5.2 and qwen3-thinking models, friendlier display names

2025-12-22 21:39:25 +01:00

opencode.json

OpenCode refactor and mission tracking fixes (#14)

2026-01-02 09:45:01 -08:00

README.md

Add OpenCode integration for backend execution

2026-01-02 07:39:24 +00:00

secrets.json.example

wip: ios app

2025-12-17 08:55:04 +00:00

test_improvements.md

Enhance agent capabilities with smart pivoting and adaptive model selection

2025-12-26 08:39:59 +01:00


Open Agent

A minimal autonomous coding agent with full machine access, implemented in Rust.

Features

HTTP API for task submission and monitoring
Tool-based agent loop following the "tools in a loop" pattern
Full toolset: file operations, terminal, machine-wide search, web access, git
OpenRouter integration for LLM access (supports any model)
SSE streaming for real-time task progress
AI-maintainable Rust codebase with strong typing

Quick Start

Prerequisites

Rust 1.70+ (install via rustup)
An OpenRouter API key (create one in your OpenRouter account)

Installation

git clone <repo-url>
cd open_agent
cargo build --release

Running

# Set your API key
export OPENROUTER_API_KEY="sk-or-v1-..."

# Optional: configure model (default: anthropic/claude-sonnet-4.5)
export DEFAULT_MODEL="anthropic/claude-sonnet-4.5"

# Optional: default working directory for relative paths (absolute paths work everywhere)
# In production this is typically /root
export WORKING_DIR="."

# Start the server
cargo run --release

The server starts on http://127.0.0.1:3000 by default.

OpenCode Backend (External Agent)

Open Agent can delegate execution to an OpenCode server instead of using its built-in agent loop.

# Point to a running OpenCode server
export AGENT_BACKEND="opencode"
export OPENCODE_BASE_URL="http://127.0.0.1:4096"

# Optional: choose the OpenCode agent (e.g. build or plan)
export OPENCODE_AGENT="build"

# Optional: auto-allow all permissions for OpenCode sessions (default: true)
export OPENCODE_PERMISSIVE="true"
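A minimal sketch of how these variables might be read, assuming `AGENT_BACKEND=opencode` selects the OpenCode path and that any `OPENCODE_PERMISSIVE` value other than `false`, `0`, or `no` counts as true. `OpenCodeSettings` and `parse_opencode_settings` are hypothetical; the README documents no defaults for the URL or agent, so they stay optional here.

```rust
/// Hypothetical view of the OpenCode settings documented above.
#[derive(Debug, PartialEq)]
struct OpenCodeSettings {
    use_opencode: bool,
    base_url: Option<String>,
    agent: Option<String>,
    permissive: bool,
}

/// `lookup` abstracts over environment access so the parsing is testable.
fn parse_opencode_settings(lookup: impl Fn(&str) -> Option<String>) -> OpenCodeSettings {
    let permissive = match lookup("OPENCODE_PERMISSIVE").as_deref() {
        None => true, // README: defaults to true when unset
        // Assumption: only explicit "false"-like values turn it off.
        Some(v) => !matches!(v.trim().to_ascii_lowercase().as_str(), "false" | "0" | "no"),
    };
    OpenCodeSettings {
        use_opencode: lookup("AGENT_BACKEND").as_deref() == Some("opencode"),
        base_url: lookup("OPENCODE_BASE_URL"),
        agent: lookup("OPENCODE_AGENT"),
        permissive,
    }
}

fn main() {
    let settings = parse_opencode_settings(|key| std::env::var(key).ok());
    println!("{:?}", settings);
}
```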

API Reference

Submit a Task

curl -X POST http://localhost:3000/api/task \
  -H "Content-Type: application/json" \
  -d '{"task": "Create a Python script that prints Hello World"}'

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending"
}
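For clients without curl, the same request can be sent with only the Rust standard library. This sketch assumes the server accepts plain HTTP/1.1 on 127.0.0.1:3000 and closes the connection after replying to a `Connection: close` request; `json_escape`, `build_submit_request`, and `submit_task` are hypothetical helpers, not part of Open Agent.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the raw HTTP/1.1 request equivalent to the curl example.
fn build_submit_request(host: &str, task: &str) -> String {
    let body = format!("{{\"task\": \"{}\"}}", json_escape(task));
    format!(
        "POST /api/task HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host,
        body.len(), // byte length of the UTF-8 body
        body
    )
}

/// Sends the request and returns the raw response (status line, headers, JSON body).
fn submit_task(addr: &str, task: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_submit_request(addr, task).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?; // relies on the server closing the connection
    Ok(response)
}

fn main() {
    match submit_task("127.0.0.1:3000", "Create a Python script that prints Hello World") {
        Ok(response) => println!("{response}"),
        Err(e) => eprintln!("request failed (is the server running?): {e}"),
    }
}
```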

Get Task Status

curl http://localhost:3000/api/task/{id}

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "task": "Create a Python script that prints Hello World",
  "model": "openai/gpt-4.1-mini",
  "iterations": 3,
  "result": "I've created hello.py with a simple Hello World script...",
  "log": [...]
}
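When polling this endpoint from a quick script, the status can be pulled out of the response and checked. The Rust sketch below uses a deliberately naive field extractor (no escape or nesting handling; a real client should use a JSON library) and only knows the `pending` and `completed` values shown in this README; any other status is passed through as unknown. All names are hypothetical.

```rust
/// Naively returns the string value of the first `"key": "value"` pair.
/// Not a JSON parser: it ignores escapes and nesting.
fn extract_string_field<'a>(json: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\"", key);
    let start = json.find(needle.as_str())? + needle.len();
    let rest = json[start..].trim_start().strip_prefix(':')?.trim_start();
    let value = rest.strip_prefix('"')?; // non-string values yield None
    let end = value.find('"')?;
    Some(&value[..end])
}

#[derive(Debug, PartialEq)]
enum TaskState<'a> {
    Pending,
    Completed,
    Other(&'a str), // any status not shown in the README examples
}

fn classify(status: &str) -> TaskState<'_> {
    match status {
        "pending" => TaskState::Pending,
        "completed" => TaskState::Completed,
        other => TaskState::Other(other),
    }
}

fn main() {
    let response = r#"{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "iterations": 3
}"#;
    match extract_string_field(response, "status").map(classify) {
        Some(TaskState::Completed) => println!("done, read the result"),
        Some(TaskState::Pending) => println!("still pending, poll again later"),
        Some(TaskState::Other(s)) => println!("status {s:?} is not covered by this sketch"),
        None => println!("no status field found"),
    }
}
```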

Stream Task Progress (SSE)

curl http://localhost:3000/api/task/{id}/stream

Events:

log - Execution log entries (tool calls, results)
done - Task completion with final status
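The stream uses the standard Server-Sent Events text format, so a client has to split it into events at blank lines. The following std-only Rust parser is a sketch that assumes the server names events through the SSE `event:` field (`log`, `done`) and puts each payload in `data:` lines. It handles chunks that end mid-line, `\r\n` line endings, and `:` comment lines, and ignores `id` and `retry`.

```rust
/// One dispatched Server-Sent Event.
#[derive(Debug, PartialEq)]
struct SseEvent {
    event: String,
    data: String,
}

/// Incremental parser for the text/event-stream format.
#[derive(Default)]
struct SseParser {
    buffer: String,        // text received but not yet terminated by '\n'
    event: Option<String>, // pending `event:` name
    data: Vec<String>,     // pending `data:` lines
}

impl SseParser {
    /// Feeds a chunk (which may end mid-line) and returns completed events.
    fn feed(&mut self, chunk: &str) -> Vec<SseEvent> {
        self.buffer.push_str(chunk);
        let mut events = Vec::new();
        while let Some(pos) = self.buffer.find('\n') {
            let raw: String = self.buffer.drain(..=pos).collect();
            let line = raw.trim_end_matches(|c: char| c == '\n' || c == '\r');
            if line.is_empty() {
                // Blank line: dispatch if any data was collected, then reset.
                if !self.data.is_empty() {
                    events.push(SseEvent {
                        // SSE default event type when no `event:` line was sent.
                        event: self.event.take().unwrap_or_else(|| "message".to_string()),
                        data: self.data.join("\n"),
                    });
                }
                self.event = None;
                self.data.clear();
            } else if !line.starts_with(':') {
                // Lines starting with ':' are comments (keep-alives) and are skipped.
                let (field, value) = match line.split_once(':') {
                    Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
                    None => (line, ""),
                };
                match field {
                    "event" => self.event = Some(value.to_string()),
                    "data" => self.data.push(value.to_string()),
                    _ => {} // `id`, `retry`, and unknown fields are ignored here
                }
            }
        }
        events
    }
}

fn main() {
    let mut parser = SseParser::default();
    let stream = "event: log\ndata: {\"tool\":\"git_status\"}\n\nevent: done\ndata: {\"status\":\"completed\"}\n\n";
    for event in parser.feed(stream) {
        println!("{} -> {}", event.event, event.data);
    }
}
```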

Health Check

curl http://localhost:3000/api/health

Available Tools

Tool	Description
`read_file`	Read file contents (any path on the machine) with optional line range
`write_file`	Write/create files anywhere on the machine
`delete_file`	Delete files anywhere on the machine
`list_directory`	List directory contents anywhere on the machine
`search_files`	Search for files by name pattern (machine-wide; scope with `path`)
`run_command`	Execute shell commands (optionally in a specified `cwd`)
`grep_search`	Search file contents with regex (machine-wide; scope with `path`)
`web_search`	Search the web (DuckDuckGo)
`fetch_url`	Fetch URL contents
`git_status`	Get git status for any repo path
`git_diff`	Show git diff for any repo path
`git_commit`	Create git commits for any repo path
`git_log`	Show git log for any repo path
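Dispatching a model's tool call by name can be sketched as a small registry. The `Tool` trait, `Registry`, the decoded `Args` map, and the `start_line`/`end_line` argument names are assumptions for illustration, not the project's actual types. Only `read_file` is implemented, following its description in the table (here with a 1-based, inclusive line range).

```rust
use std::collections::HashMap;
use std::fs;

/// Tool-call arguments, assumed to be already decoded from the model's JSON.
type Args = HashMap<String, String>;

/// Hypothetical tool interface.
trait Tool {
    fn name(&self) -> &'static str;
    fn description(&self) -> &'static str;
    fn execute(&self, args: &Args) -> Result<String, String>;
}

/// Sketch of `read_file`: reads any path, optionally limited to lines
/// `start_line..=end_line` (1-based).
struct ReadFile;

impl Tool for ReadFile {
    fn name(&self) -> &'static str {
        "read_file"
    }
    fn description(&self) -> &'static str {
        "Read file contents (any path on the machine) with optional line range"
    }
    fn execute(&self, args: &Args) -> Result<String, String> {
        let path = args.get("path").ok_or("missing `path`")?;
        let contents = fs::read_to_string(path).map_err(|e| format!("{path}: {e}"))?;
        let line_arg = |key: &str| -> Result<Option<usize>, String> {
            args.get(key)
                .map(|v| v.parse::<usize>().map_err(|e| format!("bad `{key}`: {e}")))
                .transpose()
        };
        let start = line_arg("start_line")?.unwrap_or(1).max(1);
        let end = line_arg("end_line")?.unwrap_or(usize::MAX);
        let selected: Vec<&str> = contents
            .lines()
            .skip(start - 1)
            .take(end.saturating_sub(start - 1))
            .collect();
        Ok(selected.join("\n"))
    }
}

/// Maps tool names (as the model emits them) to implementations.
struct Registry {
    tools: HashMap<&'static str, Box<dyn Tool>>,
}

impl Registry {
    fn new(tools: Vec<Box<dyn Tool>>) -> Self {
        Registry { tools: tools.into_iter().map(|t| (t.name(), t)).collect() }
    }

    /// Dispatches one tool call; an unknown name becomes an error string
    /// that an agent loop could feed back to the model.
    fn call(&self, name: &str, args: &Args) -> Result<String, String> {
        let tool = self.tools.get(name).ok_or_else(|| format!("unknown tool `{name}`"))?;
        tool.execute(args)
    }
}

fn main() {
    let tools: Vec<Box<dyn Tool>> = vec![Box::new(ReadFile)];
    let registry = Registry::new(tools);
    for tool in registry.tools.values() {
        println!("{}: {}", tool.name(), tool.description());
    }
    let path = std::env::temp_dir().join("open_agent_read_file_demo.txt");
    fs::write(&path, "one\ntwo\nthree\n").expect("write demo file");
    let mut args = Args::new();
    args.insert("path".to_string(), path.to_string_lossy().into_owned());
    args.insert("start_line".to_string(), "2".to_string());
    println!("{:?}", registry.call("read_file", &args)); // Ok("two\nthree")
    println!("{:?}", registry.call("rm_rf", &args));
}
```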

Configuration

Variable	Default	Description
`OPENROUTER_API_KEY`	(required)	Your OpenRouter API key
`DEFAULT_MODEL`	`anthropic/claude-sonnet-4.5`	Default LLM model
`WORKING_DIR`	`.` (dev) / `/root` (prod)	Default working directory for relative paths (agent still has full machine access)
`HOST`	`127.0.0.1`	Server bind address
`PORT`	`3000`	Server port
`MAX_ITERATIONS`	`50`	Max agent loop iterations
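The table maps naturally onto a typed config. The sketch below is hypothetical (the `Config` struct and its field names are not the project's): it takes its defaults from the table, uses the development default `.` for `WORKING_DIR`, and reads values through a `lookup` closure so the parsing can be tested without touching the real environment.

```rust
/// Hypothetical typed view of the variables in the table above.
#[derive(Debug)]
struct Config {
    openrouter_api_key: String,
    default_model: String,
    working_dir: String,
    host: String,
    port: u16,
    max_iterations: u32,
}

impl Config {
    /// Builds the config from a key lookup (e.g. the process environment),
    /// applying the table's defaults and failing if the API key is missing.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
        let get_or = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());
        let port = get_or("PORT", "3000")
            .parse::<u16>()
            .map_err(|e| format!("invalid PORT: {e}"))?;
        let max_iterations = get_or("MAX_ITERATIONS", "50")
            .parse::<u32>()
            .map_err(|e| format!("invalid MAX_ITERATIONS: {e}"))?;
        Ok(Config {
            openrouter_api_key: lookup("OPENROUTER_API_KEY").ok_or("OPENROUTER_API_KEY is required")?,
            default_model: get_or("DEFAULT_MODEL", "anthropic/claude-sonnet-4.5"),
            // `.` is the development default; production typically sets /root.
            working_dir: get_or("WORKING_DIR", "."),
            host: get_or("HOST", "127.0.0.1"),
            port,
            max_iterations,
        })
    }

    /// Address string the HTTP server would bind to.
    fn bind_addr(&self) -> String {
        format!("{}:{}", self.host, self.port)
    }
}

fn main() {
    match Config::from_lookup(|key| std::env::var(key).ok()) {
        Ok(config) => println!(
            "would bind {} using {} (max {} iterations)",
            config.bind_addr(),
            config.default_model,
            config.max_iterations
        ),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```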

Architecture

┌─────────────────┐     ┌─────────────────┐
│   HTTP Client   │────▶│   HTTP API      │
└─────────────────┘     │   (axum)        │
                        └────────┬────────┘
                                 │
                        ┌────────▼────────┐
                        │   Agent Loop    │◀───────────────────┐
                        │                 │                    │
                        └────────┬────────┘                    │
                                 │                             │
               ┌─────────────────┼─────────────────┐           │
               ▼                 ▼                 ▼           │
        ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
        │     LLM      │  │    Tools     │  │    Tools     │   │
        │ (OpenRouter) │  │  (file,git)  │  │  (term,web)  │   │
        └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
               │                 │                 │           │
               └─────────────────┴─────────────────┴───────────┘
                              (results fed back)
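The loop in the diagram (ask the LLM, run any requested tool, feed the result back, repeat) can be sketched in a few lines. The `Model` trait, `ModelReply`, `run_agent`, and the `ScriptedModel` stand-in are hypothetical; in the diagram the LLM is reached through OpenRouter. The iteration cap mirrors the `MAX_ITERATIONS` setting.

```rust
/// What the model returns on each turn in this sketch.
enum ModelReply {
    ToolCall { name: String, args: String },
    Final(String),
}

/// Hypothetical model interface standing in for the LLM call.
trait Model {
    fn respond(&mut self, history: &[String]) -> ModelReply;
}

/// Runs the "tools in a loop" cycle until the model answers or the cap is hit.
/// Returns the final answer and the iteration on which it arrived.
fn run_agent(
    model: &mut dyn Model,
    run_tool: &dyn Fn(&str, &str) -> String,
    task: &str,
    max_iterations: u32,
) -> Result<(String, u32), String> {
    let mut history = vec![format!("user: {task}")];
    for iteration in 1..=max_iterations {
        match model.respond(&history) {
            ModelReply::Final(answer) => return Ok((answer, iteration)),
            ModelReply::ToolCall { name, args } => {
                // Execute the tool and feed its result back into the history.
                let result = run_tool(&name, &args);
                history.push(format!("tool_call: {name}({args})"));
                history.push(format!("tool_result: {result}"));
            }
        }
    }
    Err(format!("stopped after {max_iterations} iterations without a final answer"))
}

/// Scripted stand-in for the LLM: requests one tool, then finishes.
struct ScriptedModel;

impl Model for ScriptedModel {
    fn respond(&mut self, history: &[String]) -> ModelReply {
        if history.iter().any(|h| h.starts_with("tool_result:")) {
            ModelReply::Final("I've created hello.py".to_string())
        } else {
            ModelReply::ToolCall { name: "write_file".to_string(), args: "hello.py".to_string() }
        }
    }
}

fn main() {
    let fake_tool = |name: &str, args: &str| format!("{name} ok: {args}");
    match run_agent(&mut ScriptedModel, &fake_tool, "Create a Python script that prints Hello World", 50) {
        Ok((answer, iterations)) => println!("{answer} (after {iterations} iterations)"),
        Err(e) => eprintln!("{e}"),
    }
}
```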

Development

# Run with debug logging
RUST_LOG=debug cargo run

# Run tests
cargo test

# Format code
cargo fmt

# Check for issues
cargo clippy

Dashboard (Bun)

The dashboard lives in dashboard/ and uses Bun as the package manager.

cd dashboard
bun install
PORT=3001 bun dev

Calibration (Trial-and-Error Tuning)

Open Agent supports empirical tuning of its difficulty (complexity) and cost estimation via a calibration harness.

Run calibrator

export OPENROUTER_API_KEY="sk-or-v1-..."
cargo run --release --bin calibrate -- --workspace ./.open_agent_calibration --model openai/gpt-4.1-mini --write-tuning

This writes a tuning file to ./.open_agent_calibration/.open_agent/tuning.json. Copy or move it into your real workspace as ./.open_agent/tuning.json to enable it.

License

MIT

Languages

Rust 38.9%, TypeScript 35.2%, HTML 13.7%, Swift 9.6%, CSS 1.3%, Other 1.3%