
Thomas Marchand 269958f0a9 Fix SSE event filtering and add Picture-in-Picture support (#25)

* Fix SSE event filtering race condition in mission views

Events were being filtered out during mission load due to a race condition where
viewingMissionId was set before currentMission finished loading. Now events only
get filtered when both IDs are set and different, allowing streaming updates to
display while missions are loading.

* Improve desktop stream UX with auto-open and auto-close

- Auto-extract display ID from desktop_start_session tool result
- Auto-open desktop stream when agent starts a desktop session
- Auto-close desktop stream when agent finishes (status becomes idle)
- Apply same improvements to both web and iOS dashboards

* Fix desktop display extraction from JSON string results

Tool results may be returned as JSON strings rather than parsed objects.
Handle both cases when extracting the display ID from desktop_start_session.

* Fix desktop stream staying open when status=idle during loading

The event filtering was updated to allow events through when currentMissionId
is null (during initial load), but the status application logic wasn't updated
to match. This created a window where tool_result could open the desktop stream
but status=idle wouldn't close it because shouldApplyStatus was false.

Now both the event filter and status application logic use consistent conditions:
allow when currentMissionId hasn't loaded yet.

* Fix desktop auto-open and add Picture-in-Picture support

- Use tool_result event's name field directly for desktop_start_session detection
  (fixes auto-open when tool_call event was filtered or missed)
- Add native Picture-in-Picture button to desktop stream
  - Converts canvas to video stream for OS-level floating window
  - Works outside the browser tab
  - Shows PiP button only when browser supports it

* Add iOS Picture-in-Picture support for desktop stream

- Implement AVSampleBufferDisplayLayer-based PiP for iOS
- Convert JPEG frames to CMSampleBuffer for PiP playback
- Add PiP buttons to desktop stream header and controls
- Fix web dashboard auto-open to use tool name from event data directly
- Add audio background mode to Info.plist for PiP support

* Fix React anti-patterns flagged by Bugbot

- Use itemsRef for synchronous read instead of calling state setters
  inside setItems updater callback (React strict mode safe)
- Attach PiP event listeners directly to video element instead of
  document, since these events don't bubble

* Fix PiP issues flagged by Bugbot

- iOS: Only disconnect stream onDisappear if PiP is not active,
  allowing stream to continue in PiP mode after sheet is dismissed
- Web: Stop existing stream tracks before creating new ones to
  prevent resource leaks on repeated PiP toggle

* Fix iOS PiP cleanup when stopped after view dismissal

- Add shouldDisconnectAfterPip flag to track deferred cleanup
- Set flag in onDisappear when PiP is active
- Clean up WebSocket and PiP resources when PiP stops if flag is set

* Fix additional PiP issues flagged by Bugbot

- iOS: Return actual isPaused state in PiP delegate using MainActor.assumeIsolated
- iOS: Add isPipReady flag and disable PiP button until setup completes
- Web: Don't forcibly exit PiP on unmount to match iOS behavior
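
The filtering rule and the JSON-string handling described in this commit message can be sketched as below. This is an illustration only, not the dashboard's actual code: it assumes the two IDs being compared are `viewingMissionId` and `currentMissionId`, and the `display` field name on the tool result is hypothetical.

```typescript
// Sketch only: names mirror the commit message, not the dashboard source.
type MissionId = string | null;

// An event is dropped only when both IDs are known and they differ.
// While either ID is still null (for example during mission load),
// the event passes through so streaming updates keep rendering.
function shouldDropEvent(viewingMissionId: MissionId, currentMissionId: MissionId): boolean {
  return (
    viewingMissionId !== null &&
    currentMissionId !== null &&
    viewingMissionId !== currentMissionId
  );
}

// Status updates reuse the same condition, so a status=idle event can
// close a desktop stream that a tool_result opened during loading.
function shouldApplyStatus(viewingMissionId: MissionId, currentMissionId: MissionId): boolean {
  return !shouldDropEvent(viewingMissionId, currentMissionId);
}

// Tool results may arrive as a parsed object or as a JSON string.
// The `display` field name is hypothetical.
function extractDisplayId(result: unknown): string | null {
  let value: unknown = result;
  if (typeof value === "string") {
    try {
      value = JSON.parse(value);
    } catch {
      return null; // not JSON, so there is no display ID to read
    }
  }
  if (typeof value === "object" && value !== null) {
    const display = (value as Record<string, unknown>)["display"];
    return typeof display === "string" ? display : null;
  }
  return null;
}
```

Deriving `shouldApplyStatus` from the same predicate as the event filter is one way to keep the two conditions from drifting apart again.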

2026-01-03 14:16:02 -08:00

Name	Last commit	Date
.claude	Remove Local Backend, make OpenCode the only execution path (#15)	2026-01-02 12:32:27 -08:00
.cursor/rules	Remove outdated leaf agent docs, reflect SimpleAgent architecture	2025-12-25 20:45:12 +01:00
dashboard	Fix SSE event filtering and add Picture-in-Picture support (#25)	2026-01-03 14:16:02 -08:00
docs	feat: improved missions ux	2025-12-21 09:03:08 +00:00
ios_dashboard	Fix SSE event filtering and add Picture-in-Picture support (#25)	2026-01-03 14:16:02 -08:00
scripts	feat: add GPT-5.2 and qwen3-thinking models, friendlier display names	2025-12-22 21:39:25 +01:00
src	Fix user message not showing when mission is loading	2026-01-03 13:17:27 +00:00
.cursorignore	Initial implementation: core agent with HTTP API and full toolset	2025-12-14 21:15:05 +00:00
.env.example	Add OpenCode integration for backend execution	2026-01-02 07:39:24 +00:00
.gitignore	OpenCode refactor and mission tracking fixes (#14)	2026-01-02 09:45:01 -08:00
Cargo.toml	OpenCode refactor and mission tracking fixes (#14)	2026-01-02 09:45:01 -08:00
models_with_benchmarks.json	feat: add GPT-5.2 and qwen3-thinking models, friendlier display names	2025-12-22 21:39:25 +01:00
opencode.json	OpenCode refactor and mission tracking fixes (#14)	2026-01-02 09:45:01 -08:00
README.md	Add OpenCode integration for backend execution	2026-01-02 07:39:24 +00:00
secrets.json.example	wip: ios app	2025-12-17 08:55:04 +00:00
test_improvements.md	Enhance agent capabilities with smart pivoting and adaptive model selection	2025-12-26 08:39:59 +01:00

README.md

Open Agent

A minimal autonomous coding agent with full machine access, implemented in Rust.

Features

- HTTP API for task submission and monitoring
- Tool-based agent loop following the "tools in a loop" pattern
- Full toolset: file operations, terminal, machine-wide search, web access, git
- OpenRouter integration for LLM access (supports any model)
- SSE streaming for real-time task progress
- AI-maintainable Rust codebase with strong typing

Quick Start

Prerequisites

- Rust 1.70+ (install via rustup)
- An OpenRouter API key (create one at openrouter.ai)

Installation

git clone <repo-url>
cd open_agent
cargo build --release

Running

# Set your API key
export OPENROUTER_API_KEY="sk-or-v1-..."

# Optional: configure model (default: anthropic/claude-sonnet-4.5)
export DEFAULT_MODEL="anthropic/claude-sonnet-4.5"

# Optional: default working directory for relative paths (absolute paths work everywhere)
# In production this is typically /root
export WORKING_DIR="."

# Start the server
cargo run --release

The server starts on http://127.0.0.1:3000 by default.

OpenCode Backend (External Agent)

Open Agent can delegate execution to an OpenCode server instead of using its built-in agent loop.

# Point to a running OpenCode server
export AGENT_BACKEND="opencode"
export OPENCODE_BASE_URL="http://127.0.0.1:4096"

# Optional: choose OpenCode agent (build/plan/etc)
export OPENCODE_AGENT="build"

# Optional: auto-allow all permissions for OpenCode sessions (default: true)
export OPENCODE_PERMISSIVE="true"

API Reference

Submit a Task

curl -X POST http://localhost:3000/api/task \
  -H "Content-Type: application/json" \
  -d '{"task": "Create a Python script that prints Hello World"}'

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending"
}

Get Task Status

curl http://localhost:3000/api/task/{id}

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "task": "Create a Python script that prints Hello World",
  "model": "openai/gpt-4.1-mini",
  "iterations": 3,
  "result": "I've created hello.py with a simple Hello World script...",
  "log": [...]
}
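
For a client that polls this endpoint, a small helper layer might look like the sketch below. The field names and the `pending` and `completed` statuses come from the sample responses above; `running` is an assumed in-progress status that this README does not list, and all function names are hypothetical.

```typescript
// Field names follow the sample responses; "running" is an assumed
// status value that the README does not list.
interface TaskSummary {
  id: string;
  status: string;
  result?: string;
}

// Build the status URL for a task; the path comes from the curl examples.
function taskUrl(baseUrl: string, id: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/task/${encodeURIComponent(id)}`;
}

// Parse a status response body, checking the fields this sketch relies on.
function parseTaskSummary(body: string): TaskSummary {
  const data = JSON.parse(body) as Record<string, unknown>;
  const id = data["id"];
  const status = data["status"];
  const result = data["result"];
  if (typeof id !== "string" || typeof status !== "string") {
    throw new Error("unexpected task response shape");
  }
  return { id, status, result: typeof result === "string" ? result : undefined };
}

// A client can stop polling once the task has left the in-progress states.
function isFinished(task: TaskSummary): boolean {
  return task.status !== "pending" && task.status !== "running";
}
```

A polling loop would fetch `taskUrl(base, id)` at an interval, pass the body to `parseTaskSummary`, and stop when `isFinished` returns true.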

Stream Task Progress (SSE)

curl http://localhost:3000/api/task/{id}/stream

Events:

log - Execution log entries (tool calls, results)
done - Task completion with final status
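
To consume this stream outside curl, the response text has to be split into events. The sketch below parses a complete chunk of SSE text using the standard `event:` and `data:` field syntax. The payload schema of `log` and `done` events is not documented here, so `data` is kept as a raw string, and the function name is hypothetical. A production client would also need to buffer partial events across network chunks, which this sketch does not do.

```typescript
interface SseEvent {
  event: string; // "log" or "done" for this endpoint, per the list above
  data: string; // raw payload; its schema is not documented here
}

function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line; tolerate CRLF line endings.
  for (const block of text.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message"; // default event type in the SSE format
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.charAt(0) === ":") continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // Blocks without any data line (for example keep-alive comments) are skipped.
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```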

Health Check

curl http://localhost:3000/api/health

Available Tools

Tool	Description
`read_file`	Read file contents (any path on the machine) with optional line range
`write_file`	Write/create files anywhere on the machine
`delete_file`	Delete files anywhere on the machine
`list_directory`	List directory contents anywhere on the machine
`search_files`	Search for files by name pattern (machine-wide; scope with `path`)
`run_command`	Execute shell commands (optionally in a specified `cwd`)
`grep_search`	Search file contents with regex (machine-wide; scope with `path`)
`web_search`	Search the web (DuckDuckGo)
`fetch_url`	Fetch URL contents
`git_status`	Get git status for any repo path
`git_diff`	Show git diff for any repo path
`git_commit`	Create git commits for any repo path
`git_log`	Show git log for any repo path

Configuration

Variable	Default	Description
`OPENROUTER_API_KEY`	(required)	Your OpenRouter API key
`DEFAULT_MODEL`	`anthropic/claude-sonnet-4.5`	Default LLM model
`WORKING_DIR`	`.` (dev) / `/root` (prod)	Default working directory for relative paths (agent still has full machine access)
`HOST`	`127.0.0.1`	Server bind address
`PORT`	`3000`	Server port
`MAX_ITERATIONS`	`50`	Max agent loop iterations
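
The defaults in this table can be read as a simple resolution rule: use the environment value if present, otherwise fall back. The sketch below illustrates that rule only. The server itself is written in Rust, so this is not its implementation. Treating an empty string as unset, validating numeric values, and using `.` as the `WORKING_DIR` fallback are assumptions of the sketch, and all names are hypothetical.

```typescript
interface ServerConfig {
  openrouterApiKey: string;
  defaultModel: string;
  workingDir: string;
  host: string;
  port: number;
  maxIterations: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer setting, falling back when it is unset or empty.
function parsePositiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!isFinite(n) || Math.floor(n) !== n || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function resolveConfig(env: Env): ServerConfig {
  const key = env["OPENROUTER_API_KEY"];
  if (!key) throw new Error("OPENROUTER_API_KEY is required");
  return {
    openrouterApiKey: key,
    defaultModel: env["DEFAULT_MODEL"] || "anthropic/claude-sonnet-4.5",
    workingDir: env["WORKING_DIR"] || ".", // dev default; the table lists /root for prod
    host: env["HOST"] || "127.0.0.1",
    port: parsePositiveInt("PORT", env["PORT"], 3000),
    maxIterations: parsePositiveInt("MAX_ITERATIONS", env["MAX_ITERATIONS"], 50),
  };
}
```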

Architecture

┌─────────────────┐     ┌─────────────────┐
│   HTTP Client   │────▶│   HTTP API      │
└─────────────────┘     │   (axum)        │
                        └────────┬────────┘
                                 │
                        ┌────────▼────────┐
                        │   Agent Loop    │◀──────┐
                        │                 │       │
                        └────────┬────────┘       │
                                 │                │
                   ┌─────────────┼─────────────┐  │
                   ▼             ▼             ▼  │
            ┌──────────┐  ┌──────────┐  ┌──────────┐
            │   LLM    │  │  Tools   │  │  Tools   │
            │OpenRouter│  │(file,git)│  │(term,web)│
            └──────────┘  └──────────┘  └──────────┘
                   │
                   └──────────────────────────────┘
                            (results fed back)
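
The loop in the diagram can be sketched in a few lines. This is a simplified illustration of the "tools in a loop" pattern with an iteration cap like `MAX_ITERATIONS`, not the Rust implementation. All type and function names are hypothetical, and the LLM is modeled as a synchronous function for brevity, whereas a real call would be asynchronous.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type LlmReply =
  | { kind: "tool_calls"; calls: ToolCall[] }
  | { kind: "final"; text: string };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Llm = (history: Message[]) => LlmReply;
type Tool = (args: Record<string, unknown>) => string;

function runAgentLoop(
  task: string,
  llm: Llm,
  tools: Record<string, Tool>,
  maxIterations = 50,
): { result: string | null; iterations: number } {
  const history: Message[] = [{ role: "user", content: task }];
  for (let i = 1; i <= maxIterations; i++) {
    const reply = llm(history);
    // A final answer ends the loop.
    if (reply.kind === "final") return { result: reply.text, iterations: i };
    // Otherwise record the request, run each tool, and feed the results back.
    history.push({ role: "assistant", content: JSON.stringify(reply.calls) });
    for (const call of reply.calls) {
      const tool = tools[call.name];
      const output = tool ? tool(call.args) : `unknown tool: ${call.name}`;
      history.push({ role: "tool", content: `${call.name}: ${output}` });
    }
  }
  // The cap was reached without a final answer.
  return { result: null, iterations: maxIterations };
}
```

Reporting an unknown tool back to the model as a tool message, instead of throwing, is a design choice of this sketch: it gives the model a chance to recover on the next iteration.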

Development

# Run with debug logging
RUST_LOG=debug cargo run

# Run tests
cargo test

# Format code
cargo fmt

# Check for issues
cargo clippy

Dashboard (Bun)

The dashboard lives in dashboard/ and uses Bun as the package manager.

cd dashboard
bun install
PORT=3001 bun dev

Calibration (Trial-and-Error Tuning)

Open Agent's task difficulty (complexity) and cost estimates can be tuned empirically with a calibration harness.

Run calibrator

export OPENROUTER_API_KEY="sk-or-v1-..."
cargo run --release --bin calibrate -- --workspace ./.open_agent_calibration --model openai/gpt-4.1-mini --write-tuning

This writes a tuning file at ./.open_agent_calibration/.open_agent/tuning.json. Move/copy it to your real workspace as ./.open_agent/tuning.json to enable it.

License

MIT

Languages

Rust 38.9%

TypeScript 35.2%

HTML 13.7%

Swift 9.6%

CSS 1.3%

Other 1.3%