2026-05-13 · Multi-LLM · 8 min read

I rebuilt Karpathy’s LLM Council with tool calls and an audit trail

Last November, Andrej Karpathy shipped a small repo called llm-council. The idea: instead of asking one LLM a hard question, you ask a council of them. Each model answers independently, they review each other anonymously, and a “Chairman LLM” compiles the final response.

18,695 stars. 3,616 forks. 124 open issues. Last commit: November 22, 2025.

Karpathy put this note in the README:

“I’m not going to support it in any way, it’s provided here as is for other people’s inspiration and I don’t intend to improve it. Code is ephemeral now and libraries are over, ask your LLM to change it in whatever way you like.”

That’s a compact validation of a real pattern — and an explicit invitation for someone else to make it actually useful.

We’ve been building ato in the open for the last 60 days. Originally we positioned it as “the GUI for multi-runtime AI agents.” It wasn’t landing. Then a friend (call him Thiago: a $200/mo Claude Max + $200/mo Codex power user, plugged into the OpenClaw community) sent us Karpathy’s repo and said, paraphrased: “I don’t see myself using ato for code review, but my actual pain is exactly this thing Karpathy just shipped. I vibe-coded a version of it with LangChain last year. It worked. Then LangChain rotted out from under it.”

So we repositioned around the council pattern. And in doing so, we noticed that ato had quietly become the maintained, tool-equipped, multi-provider version of Karpathy’s repo. Same primitive. Different shape.

This post is what’s different, why, and how to try it.

What’s in Karpathy’s repo

A FastAPI backend, a React + Vite frontend, OpenRouter as the model gateway. The 3-stage flow:

  1. First opinions — each LLM answers the user’s query independently.
  2. Review — each LLM ranks the others’ answers (anonymized so no model can play favorites).
  3. Chairman — a designated LLM compiles a final synthesized answer.

JSON files in data/conversations/. About 600 lines of Python. Clever, fast, lovely as a Saturday hack. Karpathy was clear about its scope: a tool for exploring multiple models side by side, especially in the context of reading books together with LLMs.
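If you haven’t read the code, the whole pattern fits in a short sketch. Here’s a minimal, illustrative version of the 3-stage loop in Python; chat() is a stand-in for a single gateway call (llm-council routes everything through OpenRouter), and none of this is Karpathy’s actual code:

# Illustrative sketch of the council loop, not llm-council's real code.
def chat(model: str, prompt: str) -> str:
    """One completion against one model; wire this to your provider."""
    raise NotImplementedError

def run_council(models: list[str], chairman: str, query: str) -> str:
    # Stage 1: first opinions, gathered independently.
    opinions = [chat(m, query) for m in models]

    # Stage 2: anonymized cross-review. Reviewers see "Response 1..N",
    # never which model wrote which, so nobody can play favorites.
    bundle = "\n\n".join(
        f"Response {i + 1}:\n{text}" for i, text in enumerate(opinions)
    )
    reviews = [
        chat(m, f"Rank these answers to {query!r}, best to worst:\n\n{bundle}")
        for m in models
    ]

    # Stage 3: a designated chairman compiles the final synthesized answer.
    return chat(
        chairman,
        f"Question: {query}\n\nAnswers:\n{bundle}\n\nReviews:\n"
        + "\n\n".join(reviews)
        + "\n\nCompile the single best final answer.",
    )

That’s the entire primitive: fan out, cross-review, synthesize.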

What’s NOT in Karpathy’s repo

None of this is criticism; Karpathy was explicit that it was a hack. These are the gaps that turn the primitive into something usable in a developer’s day-to-day.

Tool calls. The LLMs shuffle text. They can’t read your repo, can’t grep, can’t check whether a function they’re claiming exists actually exists. If you ask “review my PR,” they’re guessing from the diff you pasted.

Multi-provider auth. OpenRouter only. If you have a Claude Max subscription, a Codex CLI subscription, or local Ollama, you pay a third time to use them through llm-council. The $400/mo power user is a third-class citizen.

A persistent audit log. The council answers your question, you read the output, the session disappears. There’s no record of which LLM made which claim, no way to cite “this finding was confirmed by Reviewer A and disputed by Reviewer B,” no way to paste a signed transcript into a PR.

Persistent specialists. Same models every time, configured in config.py. You can’t say “use @security-specialist on Gemini and @perf-reviewer on MiniMax for this one.”

Active maintenance. Karpathy said it himself.

What ato adds

The core command:

ato review --against main \
  --reviewer @security-specialist \
  --reviewer @perf-reviewer \
  --reviewer claude \
  --reviewer minimax \
  --lean \
  --out review.md

Each reviewer runs in the same session: reviewer #2 sees reviewer #1’s findings via history replay, with no re-pasting of context. Function-calling tools (read_file, grep, git_log) let every LLM walk your actual repo and cite files at the line level. The audit log records every tool call by every reviewer, so the GUI can badge a reply as “verified via 3 tool calls” versus “prompt-only.”
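To make “audit log” concrete, here’s the rough shape of a per-turn record and the badge logic over it. This is an illustrative sketch only; the field names and the placeholder session id are hypothetical, not ato’s actual SQLite schema:

# Hypothetical shape of a per-turn audit record (not ato's real schema).
turn = {
    "session": "example-session-id",   # hypothetical placeholder id
    "reviewer": "@security-specialist",
    "tool_calls": [
        {"tool": "read_file", "args": {"path": "src/auth/session.py"}},
        {"tool": "grep", "args": {"pattern": "verify_token"}},
        {"tool": "git_log", "args": {"path": "src/auth/"}},
    ],
    "claim": "Token expiry is never checked on the refresh path.",
}

# The GUI badge is then just a predicate over the record:
def badge(t: dict) -> str:
    n = len(t["tool_calls"])
    return f"verified via {n} tool calls" if n else "prompt-only"

assert badge(turn) == "verified via 3 tool calls"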

Agents are specialists you define once and compose:

ato agents create --slug security-specialist --runtime claude \
  --system-prompt "You are a senior security reviewer. Prioritize auth changes,
input validation, secrets in logs, crypto correctness. Cite file:line."

ato agents create --slug perf-reviewer --runtime minimax \
  --system-prompt "You're a perf reviewer. Flag N+1 patterns, hot-path allocations,
sync I/O in async contexts. Cite file:line."

Then ato review --reviewer @security-specialist --reviewer @perf-reviewer runs both in one session against your PR. They see each other’s findings. You moderate. The audit log captures who did what.

We tested this on ato’s own positioning

When it came time to pick a new headline for ato, we did the dogfooding move: opened a session, dropped Gemini and MiniMax into it, and made them argue about ato’s positioning for five rounds with the human as moderator. They disagreed in interesting ways — Gemini argued use-case-first (citing Crossing the Chasm), MiniMax argued primitive-first (citing category creation). We pushed back on both. Round 5 they converged on a hybrid.

Session id: 1379b231-9d2b-4e06-a974-e9eb9217fbb6. Recorded as a live demo.

The headline currently shipping:

ato is your local war room for humans and LLMs: decide together, call real tools, walk out with a signed audit trail.

That sentence was produced by the product, on camera, in 45 minutes of structured debate. That’s hard to fake.

Honest tradeoffs vs. llm-council

|                         | llm-council            | ato                                    |
|-------------------------|------------------------|----------------------------------------|
| Install                 | uv sync && npm install | Tauri desktop + CLI (heavier)          |
| Vendor lock             | OpenRouter only        | 20+ runtimes, CLI subs supported       |
| Tool calls              | None                   | read_file / grep / git_log             |
| Audit trail             | JSON dump              | SQLite session, per-turn tool-call log |
| Anonymized cross-review | Yes                    | No — identity is the receipt           |
| Chairman synthesizes    | Yes                    | Human moderates (CLI flag coming)      |
| ChatGPT-like web UI     | Yes                    | Desktop GUI + CLI + MCP                |
| License                 | Unlicensed             | MIT                                    |
| Maintained              | Explicitly no          | Active, daily ships                    |

A few of those are real losses to call out:

Higher install friction. A docker-compose for the “just let me try it” persona is on the way.

No Chairman pattern (yet). Karpathy’s design synthesizes the council’s verdict for you. ato puts the human in the chairman seat. That’s intentional: the high-touch part is the feature for power users who want input between turns. An opt-in --chairman <runtime> flag is on the roadmap for the audience that wants a pre-chewed verdict.
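Once that flag ships, the same invocation would presumably grow one line (speculative; this is roadmap, not a shipped flag):

ato review --against main \
  --reviewer @security-specialist \
  --reviewer @perf-reviewer \
  --chairman claude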

Less polished web UI. We’re not competing on the ChatGPT-clone vector. The CLI + GUI + MCP triad is the wedge.

How to try it

brew tap WillNigri/ato
brew install --cask ato

Or grab the DMG / AppImage / Windows installer directly from Releases.

Bring your own API keys, or piggyback on a CLI subscription you already have (Claude Code, Codex CLI, Gemini CLI, OpenClaw, Hermes, Ollama). OpenRouter is supported, not required.

A thank-you and an open invitation

@karpathy — thanks for shipping the primitive. Two of the things ato does differently were directly motivated by Thiago’s first read of your repo on WhatsApp last weekend (“multi-provider so my Claude Max isn’t wasted,” “tool calls so the council can actually read my code”). Happy to remove or restructure the framing if you’d prefer a different reference; otherwise consider this the maintained-fork-in-spirit. Either way, the council pattern was the unlock.

If you starred llm-council and have been waiting for someone to actually finish it — this is us trying. Bug reports and feedback are very welcome; the issues tab is wide open.


Try the council pattern with tool calls

ato is free, open source (MIT), local-first. Bring your own LLM keys or use the CLI subscriptions you already have. Multi-LLM sessions with read_file / grep / git_log tool access, persistent specialist agents, and an audit trail that lands as markdown in your PR description.


— Beatriz Nigri