If you've ever set a cron job to re-run your eval suite and Slack you when scores drop, you've built half of ATO Pro by hand. Here's what we ship.
🕑
Scheduled methodology re-runs
methodology schedule create: pick a methodology, pick a cron schedule, and get a fresh Welch's t-test + 95% CI report on every scheduled run. No bash, no crontab -e, no parsing JSON.
Free DIY: crontab + ato methodology run + a wrapper script.
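If you're curious what a Welch-t + 95% CI report boils down to, here's a minimal sketch in plain Python. It is not ATO's implementation: `welch_summary` and the sample scores are made up for illustration, and the interval uses a normal approximation because the standard library has no t-distribution (with small samples a proper t critical value gives a wider interval).

```python
import math
from statistics import NormalDist, mean, variance


def welch_summary(baseline, candidate, confidence=0.95):
    """Compare two lists of eval scores without assuming equal variances.

    Returns the mean difference (candidate - baseline), the Welch t
    statistic, the Welch-Satterthwaite degrees of freedom, and an
    approximate confidence interval for the difference.
    """
    n1, n2 = len(baseline), len(candidate)
    # Per-group variance of the sample mean.
    v1 = variance(baseline) / n1
    v2 = variance(candidate) / n2
    diff = mean(candidate) - mean(baseline)
    se = math.sqrt(v1 + v2)
    t_stat = diff / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # ASSUMPTION: normal critical value instead of a t critical value.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "diff": diff,
        "t": t_stat,
        "df": df,
        "ci": (diff - z * se, diff + z * se),
    }


# Hypothetical scores from two runs of the same methodology.
baseline_scores = [0.80, 0.82, 0.78, 0.81, 0.79]
candidate_scores = [0.70, 0.72, 0.69, 0.71, 0.73]
report = welch_summary(baseline_scores, candidate_scores)
print(report)
```

If the whole interval sits below zero, the candidate run scored lower than baseline by more than noise would explain under these assumptions.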
🔁
Codified learning loop (methodology diagnose)
Failed regression on Friday? Pro spawns a 3-LLM diagnose war-room over the weekend, proposes a concrete agent change (prompt, model, tool list), runs the A/B, and tells you which version won. Monday morning you have a verdict instead of a backlog.
Free DIY: 3 ato dispatch calls + your own merging prompts + a follow-up methodology run.
📊
Cloud trace retention + regression alerts
Every dispatch from every machine you use streams to ato-cloud (30-day retention). When tool success rate drops 17 percentage points after a model swap, you get an alert, not a realization six weeks too late. Cross-device by design: your laptop's evals and the CI box's land in one ledger.
Free DIY: none. This needs a server you'd have to run yourself.
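The alert rule itself is simple arithmetic; the hard part is having every trace in one place. A rough sketch of the check, assuming each tool call is recorded as a success/failure boolean. The function names and the 10pp threshold are hypothetical, not ATO's actual rule:

```python
def success_rate_drop_pp(before, after):
    """Drop in tool success rate, in percentage points (positive = worse)."""
    rate_before = 100.0 * sum(before) / len(before)
    rate_after = 100.0 * sum(after) / len(after)
    return rate_before - rate_after


def should_alert(before, after, threshold_pp=10.0):
    """True when the success rate fell by at least threshold_pp points."""
    return success_rate_drop_pp(before, after) >= threshold_pp


# Hypothetical traces: 92/100 successes before a model swap, 75/100 after.
before_swap = [True] * 92 + [False] * 8
after_swap = [True] * 75 + [False] * 25
print(success_rate_drop_pp(before_swap, after_swap))
print(should_alert(before_swap, after_swap))
```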
↶
Auto-revert watch + auto-PR after A/B wins
When methodology diagnose finds the new version beats baseline at p<0.05, Pro opens the PR for you — agent file change, methodology results in the PR body, before/after numbers. When it loses, you get a revert PR instead.
Free DIY: GitHub API + parsing methodology JSON + writing the PR body yourself.
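For the DIY route, the decision and the PR body are the easy part. Here's a sketch of that logic, assuming you already have the two mean scores and a p-value from your methodology results. Everything here is hypothetical: the function names, the field layout, and especially the "no_action" branch for inconclusive results, which is an assumption rather than documented Pro behavior.

```python
def next_action(baseline_mean, candidate_mean, p_value, alpha=0.05):
    """Decide what to do after an A/B run, given a significance level."""
    significant = p_value < alpha
    if significant and candidate_mean > baseline_mean:
        return "open_pr"      # new version beats baseline
    if significant and candidate_mean < baseline_mean:
        return "revert_pr"    # new version loses
    return "no_action"        # ASSUMPTION: inconclusive results change nothing


def pr_body(methodology, baseline_mean, candidate_mean, p_value):
    """Build a plain-text PR body with before/after numbers."""
    return "\n".join([
        f"Methodology: {methodology}",
        f"Baseline mean score: {baseline_mean:.3f}",
        f"Candidate mean score: {candidate_mean:.3f}",
        f"p-value: {p_value:.4f}",
    ])


# Hypothetical numbers for illustration.
action = next_action(0.71, 0.80, 0.01)
print(action)
print(pr_body("nightly-regression", 0.71, 0.80, 0.01))
```

You'd still need to wire `action` to the GitHub API yourself, which is the tedious bit.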
☁
Cloud sync of methodology runs + scheduled evaluators
Run a methodology on your laptop at 3pm. Open ATO on the desktop at home, and the run's there. Schedule an evaluator to fire nightly, and Pro runs it in the cloud even when your laptop's closed. The free tier still does everything locally; Pro just keeps the lights on while you're asleep.
Free DIY: sync a SQLite file across machines + run a self-hosted scheduler.
⚖
Hosted LLM-as-judge quality scoring
You can already use any provider as a judge with your own API key (free tier). Pro adds a hosted judge endpoint with retry, cache, rate-limit handling, and a managed prompt library — so your overnight evals don't crash on a transient 429.
Free DIY: ato dispatch with your own judge prompt + retry wrapper.
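A DIY retry wrapper might look something like this: exponential backoff with jitter around whatever function calls your judge. This is a generic sketch, not the hosted endpoint's logic. `RateLimited`, `with_retry`, and `flaky_judge` are hypothetical names, and you'd raise `RateLimited` yourself wherever your provider client reports a 429.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error your judge call raises on an HTTP 429."""


def with_retry(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


# Usage with a stand-in judge that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_judge():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited("429 Too Many Requests")
    return "pass"


verdict = with_retry(flaky_judge, sleep=lambda seconds: None)
print(verdict, attempts["count"])
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting.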