Thomas Marchand b519f02b62 Th0rgal/ios compat review (#37 )

2026-01-16 01:41:11 -08:00

Desktop Environment Setup

This guide covers setting up a headless desktop environment for the Open Agent to control browsers and graphical applications.

Overview

The desktop automation stack consists of:

Xvfb: Virtual framebuffer for headless X11
i3: Minimal, deterministic window manager
xdotool: Keyboard and mouse automation
scrot: Screenshot capture
Chromium: Web browser
AT-SPI2: Accessibility tree extraction
Tesseract: OCR fallback for text extraction

Installation (Ubuntu/Debian)

# Update package list
apt update

# Install core X11 and window manager
apt install -y xvfb i3 x11-utils

# Install automation tools
apt install -y xdotool scrot imagemagick

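# Install Chromium browser (Debian package names; on recent Ubuntu releases
# Chromium is typically packaged as chromium-browser, a wrapper around the snap)
apt install -y chromium chromium-sandbox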

# Install accessibility tools (AT-SPI2)
apt install -y at-spi2-core libatspi2.0-0 python3-gi python3-gi-cairo gir1.2-atspi-2.0

# Install OCR
apt install -y tesseract-ocr

# Install fonts for proper rendering
apt install -y fonts-liberation fonts-dejavu-core

i3 Configuration

Create a minimal, deterministic i3 config at /root/.config/i3/config:

mkdir -p /root/.config/i3
cat > /root/.config/i3/config << 'EOF'
# Open Agent i3 Config - Minimal and Deterministic
# No decorations, no animations, simple layout

# Use Super (Mod4) as modifier
set $mod Mod4

# Font for window titles (not shown due to no decorations)
font pango:DejaVu Sans Mono 10

# Remove window decorations
default_border none
default_floating_border none

# No gaps (the gaps directives need i3 4.22 or newer; remove them on older versions)
gaps inner 0
gaps outer 0

# Disable focus-follows-mouse so focus only changes on explicit commands (predictable behavior)
focus_follows_mouse no

# Disable window titlebars completely
for_window [class=".*"] border pixel 0

# Make all windows float by default for easier positioning
# (comment out if you prefer tiling)
# for_window [class=".*"] floating enable

# Chromium-specific: no border (the window class is "Chromium" or "chromium" depending on the build)
for_window [class="Chromium"] border pixel 0
for_window [class="chromium"] border pixel 0

# Keybindings (minimal set)
bindsym $mod+Return exec chromium --no-sandbox --disable-gpu
bindsym $mod+Shift+q kill
bindsym $mod+d exec dmenu_run

# Focus movement
bindsym $mod+h focus left
bindsym $mod+j focus down
bindsym $mod+k focus up
bindsym $mod+l focus right

# Exit i3
bindsym $mod+Shift+e exit

# Reload config
bindsym $mod+Shift+r reload

# Workspace setup (just workspace 1)
workspace 1 output primary
EOF

Environment Variables

Add these to /etc/open_agent/open_agent.env:

# Enable desktop automation tools
DESKTOP_ENABLED=true

# Xvfb resolution (width x height)
DESKTOP_RESOLUTION=1920x1080

# Starting display number (will increment for concurrent sessions)
DESKTOP_DISPLAY_START=99
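
As a rough illustration of how these settings could map onto an Xvfb invocation, the Python sketch below parses DESKTOP_RESOLUTION and builds the same argument list used in the manual test in this guide. The helper names (parse_resolution, xvfb_argv) and the 24-bit default depth are assumptions made for this example; they are not taken from the Open Agent code.

```python
import os


def parse_resolution(value: str) -> tuple[int, int]:
    """Parse a 'WIDTHxHEIGHT' string such as '1920x1080'."""
    width_text, sep, height_text = value.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"resolution must be positive, got {value!r}")
    return width, height


def xvfb_argv(display: int, resolution: str, depth: int = 24) -> list[str]:
    """Build the Xvfb command line for one virtual display (hypothetical helper)."""
    width, height = parse_resolution(resolution)
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


if __name__ == "__main__":
    # Fall back to the values documented above when the variables are unset.
    resolution = os.environ.get("DESKTOP_RESOLUTION", "1920x1080")
    display = int(os.environ.get("DESKTOP_DISPLAY_START", "99"))
    print(" ".join(xvfb_argv(display, resolution)))
```

Validating the resolution up front means a typo in the env file fails with a clear message instead of an Xvfb startup error.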

Manual Testing

Test the setup manually before enabling for the agent:

# Start Xvfb on display :99
Xvfb :99 -screen 0 1920x1080x24 &
export DISPLAY=:99

# Start i3 window manager
i3 &

# Launch Chromium
chromium --no-sandbox --disable-gpu &

# Take a screenshot
sleep 2
scrot /tmp/test_screenshot.png

# Verify screenshot exists
ls -la /tmp/test_screenshot.png

# Test xdotool
xdotool getactivewindow

# Clean up
pkill -f "Xvfb :99"
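
The manual test starts i3 immediately after backgrounding Xvfb, which can race on a slow machine. A script could instead wait until the display's Unix socket appears. The sketch below assumes the conventional socket location /tmp/.X11-unix/X<display>; wait_for_display is a hypothetical helper, and socket existence is only a heuristic for readiness.

```python
import time
from pathlib import Path


def wait_for_display(display: int, timeout: float = 5.0, interval: float = 0.1,
                     socket_dir: str = "/tmp/.X11-unix") -> bool:
    """Poll until the X socket for `display` exists; return False on timeout."""
    socket_path = Path(socket_dir) / f"X{display}"
    deadline = time.monotonic() + timeout
    while True:
        if socket_path.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, calling wait_for_display(99) between launching Xvfb and launching i3 replaces a fixed sleep with a bounded wait.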

AT-SPI Accessibility Tree

Test accessibility tree extraction:

export DISPLAY=:99
export DBUS_SESSION_BUS_ADDRESS=unix:path=/tmp/dbus-session-$$

# Start dbus session (required for AT-SPI)
dbus-daemon --session --fork --address=$DBUS_SESSION_BUS_ADDRESS

# Python script to dump accessibility tree
python3 << 'EOF'
import gi
gi.require_version('Atspi', '2.0')
from gi.repository import Atspi

def print_tree(obj, indent=0):
    try:
        name = obj.get_name() or ""
        role = obj.get_role_name()
        if name or role != "unknown":
            print("  " * indent + f"[{role}] {name}")
        for i in range(obj.get_child_count()):
            child = obj.get_child_at_index(i)
            if child:
                print_tree(child, indent + 1)
    except Exception:
        # Skip nodes that disappear or fail to respond while the tree is walked
        pass

desktop = Atspi.get_desktop(0)
for i in range(desktop.get_child_count()):
    app = desktop.get_child_at_index(i)
    if app:
        print_tree(app)
EOF

OCR with Tesseract

Test OCR on a screenshot:

# Take screenshot and run OCR
DISPLAY=:99 scrot /tmp/screen.png
tesseract /tmp/screen.png stdout

# With language hint
tesseract /tmp/screen.png stdout -l eng

Troubleshooting

Xvfb won't start

# Check if display is already in use
ls -la /tmp/.X*-lock
# Remove stale lock files
rm -f /tmp/.X99-lock /tmp/.X11-unix/X99
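
The same lock-file check can be used to pick a display number that is not in use, for example when several sessions run concurrently. This is a minimal sketch: next_free_display is a hypothetical helper, not the way Open Agent allocates displays. A stale lock file makes a display look busy, so this errs on the side of skipping numbers.

```python
from pathlib import Path


def next_free_display(start: int = 99, limit: int = 100, tmp_dir: str = "/tmp") -> int:
    """Return the first display number >= start with no X lock file or socket."""
    tmp = Path(tmp_dir)
    for number in range(start, start + limit):
        lock = tmp / f".X{number}-lock"
        socket = tmp / ".X11-unix" / f"X{number}"
        if not lock.exists() and not socket.exists():
            return number
    raise RuntimeError(f"no free display in :{start}..:{start + limit - 1}")
```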

Chromium sandbox issues

Always use the --no-sandbox flag when running as root:

chromium --no-sandbox --disable-gpu

xdotool can't find windows

# List all windows
xdotool search --name ""

# Ensure DISPLAY is set
echo $DISPLAY

AT-SPI not working

# Ensure dbus is running (dbus-launch is provided by the dbus-x11 package on Debian/Ubuntu)
export $(dbus-launch)

# Enable AT-SPI for Chromium
chromium --force-renderer-accessibility --no-sandbox

No fonts rendering

# Install additional fonts
apt install -y fonts-noto fonts-freefont-ttf

# Rebuild font cache
fc-cache -fv

Security Considerations

The agent runs with full system access
Xvfb sessions are isolated per-task
Sessions are cleaned up when tasks complete
Chromium runs with --no-sandbox (required for root, but limits isolation)
Consider running in a container for additional isolation

Window Layout with i3-msg

The desktop_i3_command tool allows the agent to control window positioning using i3-msg.

Creating a Multi-Window Layout

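Example: Chromium on the left, a terminal running fastfetch at top-right, and a calculator at bottom-right. Issue each command after the previous window has appeared. Commands that carry their own flags are quoted so that i3-msg passes them to i3 as a single command instead of interpreting the flags itself:

# Start session
desktop_start_session

# Launch Chromium (fills the workspace until a second window opens)
i3-msg 'exec chromium --no-sandbox'

# Make the next window open to the right of Chromium
i3-msg split h

# Launch terminal with fastfetch (right half); -hold keeps xterm open after fastfetch exits
i3-msg 'exec xterm -hold -e fastfetch'

# With the terminal focused, make the next window open below it
i3-msg split v

# Launch calculator (bottom-right)
i3-msg exec xcalc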
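
One way to script a layout like this from Python is to send each i3 command as a single argument to i3-msg and pause after every exec so the new window exists before the next split. This is a sketch under assumptions: send_i3_commands, LAYOUT_COMMANDS and the fixed settle delay are illustrative and are not how the desktop_i3_command tool is implemented.

```python
import os
import subprocess
import time
from typing import Callable, Sequence

# Chromium left, terminal top-right, calculator bottom-right.
LAYOUT_COMMANDS = (
    "exec chromium --no-sandbox",
    "split h",                        # next window opens to the right
    "exec xterm -hold -e fastfetch",  # -hold keeps xterm open afterwards
    "split v",                        # next window opens below the terminal
    "exec xcalc",
)


def send_i3_commands(commands: Sequence[str], display: str = ":99",
                     settle_seconds: float = 1.0,
                     run: Callable[..., object] = subprocess.run) -> None:
    """Send each command to i3 via i3-msg, pausing after every exec."""
    env = dict(os.environ, DISPLAY=display)
    for command in commands:
        # One argv element per i3 command, so flags such as --no-sandbox
        # are not parsed by i3-msg itself.
        run(["i3-msg", command], env=env, check=True)
        if command.startswith("exec "):
            time.sleep(settle_seconds)  # crude wait for the window to map
```

Calling send_i3_commands(LAYOUT_COMMANDS) against a running session applies the layout. The run parameter exists only so the function can be exercised without an X server.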

Common i3-msg Commands

Command	Description
`exec <app>`	Launch an application
`split h`	Next window opens horizontally adjacent
`split v`	Next window opens vertically adjacent
`focus left/right/up/down`	Move focus to adjacent window
`move left/right/up/down`	Move focused window
`resize grow width 100 px`	Make window wider
`resize grow height 100 px`	Make window taller
`layout splitv/splith`	Change container layout
`fullscreen toggle`	Toggle fullscreen
`kill`	Close focused window

Pre-installed Applications

These are installed on the production server:

chromium --no-sandbox - Web browser
xterm - Terminal emulator
xcalc - Calculator
fastfetch - System info display

Session Lifecycle

Task starts: Agent calls desktop_start_session
Xvfb starts: Virtual display created at :99 (or next available)
i3 starts: Window manager provides predictable layout
Browser launches: Chromium opens (if requested)
Agent works: Screenshots, clicks, typing via desktop_* tools
Task ends: desktop_stop_session kills Xvfb and children
Cleanup: Any orphaned sessions killed on task failure
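
The lifecycle above, in particular the guarantee that processes are cleaned up even when a task fails, can be approximated with a context manager. This is a minimal sketch, not the actual desktop_start_session / desktop_stop_session implementation; managed_processes is a hypothetical name.

```python
import subprocess
from contextlib import contextmanager
from typing import Iterator, Optional, Sequence


@contextmanager
def managed_processes(commands: Sequence[Sequence[str]],
                      env: Optional[dict] = None) -> Iterator[list]:
    """Start each command in order; stop all of them in reverse order on exit."""
    started: list = []
    try:
        for argv in commands:
            started.append(subprocess.Popen(list(argv), env=env))
        yield started
    finally:
        # Runs on normal exit and on exceptions, so no process is orphaned.
        for proc in reversed(started):
            if proc.poll() is None:
                proc.terminate()
                try:
                    proc.wait(timeout=5)
                except subprocess.TimeoutExpired:
                    proc.kill()
                    proc.wait()
```

A session could then be wrapped as `with managed_processes([["Xvfb", ":99", "-screen", "0", "1920x1080x24"], ["i3"]], env=session_env):`, where session_env is an environment with DISPLAY set to :99. Stopping in reverse order shuts down the window manager before the X server it depends on.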

Available Desktop Tools

Tool	Description
`desktop_start_session`	Start Xvfb + i3 + optional Chromium
`desktop_stop_session`	Stop the desktop session
`desktop_screenshot`	Take screenshot (saves locally)
`desktop_type`	Send keyboard input (text or keys)
`desktop_click`	Mouse click at coordinates
`desktop_mouse_move`	Move mouse cursor
`desktop_scroll`	Scroll mouse wheel
`desktop_get_text`	Extract visible text (AT-SPI or OCR)
`desktop_i3_command`	Execute i3-msg commands for window control
