// your AIs already have hands. who's flying?
Claude, Codex, Gemini — they already have hands. ATO is the seat you fly them from. Set the rules. Watch every tool call. Kill the runaway. Compare what each one actually did to your code. Multi-LLM war-rooms, code review, replay, regression alerts. Local-first. MIT.
War-rooms, sessions, code review, replay, receipts, MCP — all free. Pro ($29/mo) replays your prompts overnight and tells you what to switch.
You paste the same question into Claude, GPT, Gemini one tab at a time. Each starts from zero. None of them see what the others said. The disagreement that should be the signal is buried in your clipboard history.
Most multi-LLM debate tools can’t read your repo, can’t grep, can’t verify a single claim before stitching the answers together. They’re vibes-as-a-service — clever, but unverifiable.
You get an answer, you read it, you move on. No record of which LLM made which claim, no way to cite “confirmed by GPT, disputed by Claude,” no markdown you can paste into a PR. The receipt is the artifact, and it’s missing.
Every multi-LLM dispatch lands in your local SQLite as a session you can scroll through later. Each row carries an auto-generated summary, the runtimes that spoke, the personas (when you used --agent), tags, and a session id you can pass to ato sessions get from your terminal. No accounts, no cloud round-trip: everything stays on your machine.
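A session row like the one described above maps naturally onto a single SQLite table. Here is a minimal sketch using Python's built-in sqlite3 module. The `sessions` table, its columns, and the `record_session` / `get_session` helpers are hypothetical, modeled on the listed fields; ATO's actual schema may differ. It uses an in-memory database so it runs anywhere.

```python
import json
import sqlite3
import uuid

# Hypothetical schema: one row per multi-LLM dispatch, mirroring the fields
# listed above. This is an illustration, not ATO's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id       TEXT PRIMARY KEY,
    summary  TEXT NOT NULL,
    runtimes TEXT NOT NULL,  -- JSON array, e.g. ["claude", "codex"]
    personas TEXT,           -- JSON array, NULL when --agent was not used
    tags     TEXT NOT NULL   -- JSON array
)
"""


def record_session(conn, summary, runtimes, personas=None, tags=()):
    """Insert one session row and return its generated id."""
    session_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO sessions (id, summary, runtimes, personas, tags) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            session_id,
            summary,
            json.dumps(list(runtimes)),
            json.dumps(list(personas)) if personas is not None else None,
            json.dumps(list(tags)),
        ),
    )
    conn.commit()
    return session_id


def get_session(conn, session_id):
    """Fetch one session by id with JSON columns decoded; None if it does not exist."""
    row = conn.execute(
        "SELECT id, summary, runtimes, personas, tags FROM sessions WHERE id = ?",
        (session_id,),
    ).fetchone()
    if row is None:
        return None
    sid, summary, runtimes, personas, tags = row
    return {
        "id": sid,
        "summary": summary,
        "runtimes": json.loads(runtimes),
        "personas": json.loads(personas) if personas is not None else None,
        "tags": json.loads(tags),
    }


conn = sqlite3.connect(":memory:")  # a real setup would use an on-disk file
conn.execute(SCHEMA)
sid = record_session(conn, "Race in cache eviction", ["claude", "codex", "gemini"], tags=["bug"])
print(get_session(conn, sid)["runtimes"])  # ['claude', 'codex', 'gemini']
```

Storing list-valued fields as JSON text keeps the sketch to one table; a real store might normalize runtimes and tags into join tables for faster filtering.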
Real-time team workspaces (GUI + CLI) • Browser ↔ Desktop tether (X25519 + AEAD) • Create + manage teams from any browser • ato war-rooms sweep auto-closes idle war-rooms from cron • ato subagent log brings Claude Code’s Task tool into the same execution log
ato war-rooms · sessions · chats share and append-event, so your agents collaborate too. Browse from any browser; the UI is mobile-responsive.
browser_pubkey_fp. X25519 DH + HKDF-SHA256 + XChaCha20-Poly1305 over a cloud relay that never sees plaintext. Defense in depth: “Allow always” approvals live in a local tether_approvals table on the desktop, not in the cloud.
ato war-rooms sweep: auto-closes idle war-rooms with a coordinator summary. Single-JSON envelope output, clap-layer validators. Wire it to launchd / cron, and one-shot R1 multi-LLM reviews self-close once seats land. No more invisible war-rooms in the Sessions feed.
ato subagent log: Claude Code’s Agent (Task) tool dispatches now show up in execution_logs alongside outer-session work. Canonical auth_mode + billing_surface vocabulary. Git commit SHA per receipt. UTF-8-safe truncation on multibyte prompts. run_agent.
@reviewer from Sonnet 4.6 to Opus 4.7 and the dashboard flags “success rate dropped 17pp across 412 conversations.” It joins the configuration-change ledger with trace windows automatically. Severity-tagged: regressions first, improvements second, neutral hidden by default.
{user_name}, {project_root}, {recent_orders} in your system prompt. Resolvers: static, env, project path, file, database query, MCP call, computed JS.
Pick any past trace. Click Replay. Re-run the original prompt against a different runtime. See source vs. replay side by side with duration + estimated cost delta. Would Codex have answered correctly on those failing prompts? Now you can find out.
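The regression flag described above comes down to comparing a success rate before and after a configuration change and reporting the difference in percentage points (pp). A minimal sketch, assuming a hypothetical `ConfigChange` record that holds OK counts for the trace windows on either side of the change; the 5pp neutral band and the within-group ordering (largest move first) are illustrative choices, not ATO's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ConfigChange:
    """Hypothetical ledger entry: OK counts in the trace windows around one config change."""
    agent: str
    ok_before: int
    n_before: int
    ok_after: int
    n_after: int

    def delta_pp(self) -> float:
        """Success-rate change in percentage points (after minus before)."""
        before = 100.0 * self.ok_before / self.n_before
        after = 100.0 * self.ok_after / self.n_after
        return after - before


def classify(change: ConfigChange, neutral_band_pp: float = 5.0) -> str:
    """Label a change; the 5pp neutral band is an illustrative assumption."""
    d = change.delta_pp()
    if d <= -neutral_band_pp:
        return "regression"
    if d >= neutral_band_pp:
        return "improvement"
    return "neutral"


# Lower number = shown earlier; neutral sorts last when it is shown at all.
SEVERITY_ORDER = {"regression": 0, "improvement": 1, "neutral": 2}


def report(changes, show_neutral: bool = False):
    """Regressions first, then improvements, biggest moves first; neutral hidden by default."""
    tagged = [(classify(c), c) for c in changes]
    if not show_neutral:
        tagged = [(tag, c) for tag, c in tagged if tag != "neutral"]
    tagged.sort(key=lambda tc: (SEVERITY_ORDER[tc[0]], -abs(tc[1].delta_pp())))
    return [(tag, c.agent, round(c.delta_pp(), 1)) for tag, c in tagged]


# Illustrative numbers only: a model swap on @reviewer drops the OK rate from 90% to 73%.
changes = [
    ConfigChange("@reviewer", ok_before=90, n_before=100, ok_after=73, n_after=100),
    ConfigChange("@code-writer", ok_before=70, n_before=100, ok_after=82, n_after=100),
    ConfigChange("@triage", ok_before=80, n_before=100, ok_after=81, n_after=100),
]
print(report(changes))
# [('regression', '@reviewer', -17.0), ('improvement', '@code-writer', 12.0)]
```

A production version would also want a significance check on the two rates, since a few-point swing over a handful of conversations is mostly noise.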
prompt_agent_inner so the replay is itself killable and appears in Live runs. The status pill ticks pending → running → done; the result panel renders both responses + the duration delta. Source prompts come from your local execution log: ATO never sends prompt content to a server you don’t already use.
@code-writer · claude → codex · −59% per call · projected $1.01/mo at this volume. Surfaces concrete swaps when you have multi-runtime history on the same agent and the alternative is meaningfully cheaper at preserved quality. Quality guards: ≥30% cheaper, ok-rate within 10pp, eval-score within 5pp. Renders nothing if no recommendation qualifies; better than fake confidence.
parent_run_id. One row per pipeline; click into the per-stage flow with handoff arrows, per-stage timing, and files touched per stage.
Per-runtime context breakdown. Switch between Claude, Codex, OpenClaw, and Hermes to see what each agent has loaded. Skills are shown as on-demand and are not counted in the total.
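The three quality guards quoted above read naturally as a filter over per-runtime history. A minimal sketch: the `RuntimeStats` fields, the sample costs, and the monthly projection (per-call cost difference times call volume) are assumptions; only the thresholds (at least 30% cheaper, ok-rate within 10pp, eval score within 5pp) come from the text. It reads "within" one-sidedly, so a candidate that scores better than the current runtime is never rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeStats:
    """Hypothetical per-runtime history for one agent."""
    runtime: str
    cost_per_call: float   # average USD per call
    ok_rate_pct: float     # share of runs that finished OK, 0-100
    eval_score_pct: float  # mean eval score, 0-100


def recommend(current: RuntimeStats, candidates, calls_per_month: int) -> Optional[dict]:
    """Cheapest candidate passing all three guards, or None when nothing qualifies."""
    qualifying = []
    for alt in candidates:
        savings = 1.0 - alt.cost_per_call / current.cost_per_call
        if savings < 0.30:                                     # guard 1: at least 30% cheaper
            continue
        if current.ok_rate_pct - alt.ok_rate_pct > 10.0:       # guard 2: ok-rate within 10pp
            continue
        if current.eval_score_pct - alt.eval_score_pct > 5.0:  # guard 3: eval score within 5pp
            continue
        qualifying.append((alt, savings))
    if not qualifying:
        return None  # show nothing rather than a low-confidence suggestion
    alt, savings = min(qualifying, key=lambda pair: pair[0].cost_per_call)
    return {
        "swap": f"{current.runtime} -> {alt.runtime}",
        "per_call_change_pct": round(-100 * savings),
        "projected_monthly_savings_usd": round(
            (current.cost_per_call - alt.cost_per_call) * calls_per_month, 2
        ),
    }


current = RuntimeStats("claude", cost_per_call=0.017, ok_rate_pct=92.0, eval_score_pct=81.0)
candidates = [
    RuntimeStats("codex", cost_per_call=0.007, ok_rate_pct=88.0, eval_score_pct=78.0),
    RuntimeStats("gemini", cost_per_call=0.016, ok_rate_pct=95.0, eval_score_pct=85.0),
]
print(recommend(current, candidates, calls_per_month=100))
# {'swap': 'claude -> codex', 'per_call_change_pct': -59, 'projected_monthly_savings_usd': 1.0}
```

Returning `None` instead of the "least bad" option mirrors the stated design choice: no recommendation beats a confident-looking one that fails the quality bar.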
Manage skills across all runtimes with per-runtime tabs. Browse the marketplace, install community skills, or ask AI to create one for you.
Visual workflow editor that auto-detects flows from your installed skills. Any skill with Step or Phase headers becomes a visual automation.
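Auto-detecting a flow from a skill amounts to scanning its markdown for Step or Phase headings. A minimal sketch, assuming headings such as `## Step 1: ...` or `### Phase 2 - ...`; the regex and the `detect_flow` helper are hypothetical, and ATO's real parser may accept other forms.

```python
import re

# Hypothetical rule: a markdown heading (# to ######) whose text starts with "Step" or
# "Phase", optionally followed by a number, a separator (":", ".", "-") and a title.
STEP_HEADING = re.compile(
    r"^#{1,6}[ \t]+(Step|Phase)\b[ \t]*(\d+)?[ \t]*[:.\-]?[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def detect_flow(skill_markdown: str):
    """Return ordered (kind, number, title) tuples; an empty list means 'not a flow'."""
    steps = []
    for match in STEP_HEADING.finditer(skill_markdown):
        kind, number, title = match.groups()
        steps.append((kind.capitalize(), int(number) if number else None, title.strip()))
    return steps


skill = """# Fix a flaky test
## Step 1: Reproduce the failure
Run the test in a loop.
## Step 2 - Bisect the change
### Phase 3. Verify the fix
## Notes
"""
for step in detect_flow(skill):
    print(step)
```

Using `[ \t]` rather than `\s` keeps each match on a single line, so a bare `## Step` heading never swallows the paragraph below it as its title.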
Pick an agent (or a routed/sequential group) and a schedule. The agent’s system prompt, variables, hooks, memory, and skills all fire on every run — not just a raw prompt.
systemd --user timers on Linux, Task Scheduler on Windows. Jobs fire even when ATO is closed.
Centralized dashboard to store, rotate, and scope API keys for every major LLM provider. Keys are encrypted locally and never sent to any server.
Live dashboard showing active agent sessions, token consumption rates, runtime health, and smart alerts — across all your AI coding tools at once.
Complete audit trail of every action across your agentic systems. Filter by action type, resource, and time range. Export to JSON for compliance.
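Filtering an audit trail by action type, resource, and time range and then exporting it to JSON can be sketched with the standard library alone. The `AuditEvent` fields, the sample events, and the half-open time window are illustrative assumptions, not ATO's export format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class AuditEvent:
    """Hypothetical audit record; field names are illustrative."""
    timestamp: datetime
    action: str    # e.g. "dispatch", "key.rotate"
    resource: str  # e.g. "agent:@reviewer"
    actor: str


def filter_events(
    events: Iterable[AuditEvent],
    action: Optional[str] = None,
    resource: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> List[AuditEvent]:
    """Keep events matching every supplied filter; the time range is [since, until)."""
    selected = []
    for e in events:
        if action is not None and e.action != action:
            continue
        if resource is not None and e.resource != resource:
            continue
        if since is not None and e.timestamp < since:
            continue
        if until is not None and e.timestamp >= until:
            continue
        selected.append(e)
    return selected


def export_json(events: Iterable[AuditEvent]) -> str:
    """Serialize events as a JSON array with ISO-8601 timestamps."""
    rows = []
    for e in events:
        row = asdict(e)
        row["timestamp"] = e.timestamp.isoformat()
        rows.append(row)
    return json.dumps(rows, indent=2)


utc = timezone.utc
log = [
    AuditEvent(datetime(2025, 1, 1, 9, tzinfo=utc), "dispatch", "agent:@reviewer", "alice"),
    AuditEvent(datetime(2025, 1, 2, 9, tzinfo=utc), "key.rotate", "provider:openai", "bob"),
    AuditEvent(datetime(2025, 1, 3, 9, tzinfo=utc), "dispatch", "agent:@reviewer", "bob"),
]
print(export_json(filter_events(log, action="dispatch", since=datetime(2025, 1, 2, tzinfo=utc))))
```

Timezone-aware timestamps keep range filters unambiguous when events come from machines in different zones.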
Connect your company's identity provider. Google Workspace, Okta, Microsoft Entra, or any OIDC provider — with domain restriction and auto-provisioning.
Every ATO agent is exposed as an MCP tool. Any MCP-aware runtime — Claude Code, Codex, Cursor, others — can dispatch to any ATO agent regardless of which runtime owns it.
~/.ato/local.db · AES-256 at rest
ato review with tool calls
The principle: you can run every primitive yourself for free. We charge for the codified automation we package on top. Same model as GitLab, Sentry, Supabase.
| Feature | Free | Pro $29/mo | Team $49/mo |
|---|---|---|---|
| Dispatch + compare across runtimes | ✓ | ✓ | ✓ |
| Agent creation + war rooms + sessions + replay | ✓ | ✓ | ✓ |
| Methodology runner — reusable test recipes with Welch t + 95% CI | ✓ | ✓ | ✓ |
| MCP server (27 tools incl. methodology) + Tauri desktop + Insights panel | ✓ | ✓ | ✓ |
| Quality checks (regex, structural, your own LLM-judge with your key) | ✓ | ✓ | ✓ |
| methodology schedule create (auto-rerun on cron) | DIY w/ crontab | ✓ | ✓ |
| methodology diagnose: codified learning loop (failure → agent change → A/B) | DIY w/ ato dispatch | ✓ | ✓ |
| Cloud traces + cross-device regression detection | — | 30-day | 30-day |
| Cloud sync of methodology runs + scheduled evaluators | — | ✓ | ✓ |
| Auto-revert watch + auto-PR after A/B wins | — | ✓ | ✓ |
| Team workspaces (multi-user shared agents + skills + methodologies; real-time shared war-rooms/sessions/chats with live append) | — | — | ✓ |
| Encrypted provider key store + cron usage-poller | — | — | ✓ |
Every Pro row is automation we built on top of the free primitives. You can build the same loop yourself with ato dispatch + bash + your own LLM prompts; you just don’t get our button.
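The free methodology runner in the table above compares recipes with Welch's t-test and a 95% CI. For reference, here is a minimal standard-library sketch of the statistic, the Welch-Satterthwaite degrees of freedom, and a confidence interval for the difference in means. How ATO computes these is not documented here; the `welch` helper is hypothetical, and because Python's standard library has no t-distribution quantile, the critical value is passed in (1.96 is only the large-sample value).

```python
import math
from statistics import mean, variance


def welch(a, b, t_crit=1.96):
    """Welch's t-test pieces for mean(a) - mean(b), without assuming equal variances.

    t_crit is the two-sided critical value for the confidence level. 1.96 is the
    large-sample (normal) value for 95%; with small samples, use the t quantile at
    the returned df instead, e.g. scipy.stats.t.ppf(0.975, df).
    """
    se2_a = variance(a) / len(a)  # squared standard error of each mean
    se2_b = variance(b) / len(b)
    se = math.sqrt(se2_a + se2_b)
    diff = mean(a) - mean(b)
    t = diff / se
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return {"diff": diff, "t": t, "df": df, "ci": (diff - t_crit * se, diff + t_crit * se)}


# Illustrative per-run eval scores for two recipes.
recipe_a = [0.82, 0.79, 0.88, 0.85, 0.81, 0.84]
recipe_b = [0.74, 0.80, 0.71, 0.77, 0.79, 0.73]
result = welch(recipe_a, recipe_b)
print(round(result["df"], 1), round(result["t"], 2))
```

With only six runs per recipe the Welch df is small, so pass the matching t quantile rather than 1.96 when you want an honest 95% interval.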
Free, open source, and ready for your platform.
> Free forever: war-rooms, sessions, ato review with tool calls, replay, file attribution, live runs, receipts, MCP server, cost optimizer, Tauri desktop, embedded terminal, skills marketplace. Local-first. MIT. Pro ($29/mo) adds cloud trace retention, regression alerts, scheduled evaluators, and LLM-judge quality scoring — the things that automate what you’d do manually.
Complementary, not competing. ATO is your local war room for humans and LLMs: the developer side of multi-runtime AI work. For SDK-based production observability across your deployed app stack, use Langfuse, Helicone, or LangSmith. Most production teams run one from each camp; they cover different sides of the same agent.