The arsenal.
Two kinds of tool, both rated straight: the AI tools we build with, and the software that plugs into an AI agent. Every tool gets a /10, an agent-compat score, and a verdict: in our stack, tested, or skipped. We score against real code, not the marketing site.
Start here.
Three we actually use and review daily, reordered by score and usage.
Claude Code
Anthropic's CLI agent runtime. 1M context. The heavy-lift agent in our mesh — Octopussy + Pinky run on it.
Codex CLI 5.5
OpenAI CLI with ChatGPT Pro device-auth. Different reasoning shape than Claude — catches what Claude misses.
Cursor
AI-first IDE built on VS Code. Stage 2 of the 6-stage curriculum. The training wheels everyone starts with.
The full library.
Honest scoring · live-workload tested · affiliate-marked. Every tool gets a straight /10 and an agent-compat score — how well your AI agent can actually drive it.
Claude Code
Anthropic's CLI agent runtime. 1M context. The heavy-lift agent in our mesh — Octopussy + Pinky run on it.
Codex CLI 5.5
OpenAI CLI with ChatGPT Pro device-auth. Different reasoning shape than Claude — catches what Claude misses.
Cursor
AI-first IDE built on VS Code. Stage 2 of the 6-stage curriculum. The training wheels everyone starts with.
Nano Banana Pro
Premium image gen with character consistency. The 5-Elektra-images-from-a-bible job ran here.
Kimi Moonshot CLI
The dark-horse Chinese model. Wrote a full platform in one night. Most Western builders are sleeping on this.
Grok Video
8-second hero loops from a single image. Cyberpunk acid-rain motion is shockingly good. Cheapest pro-grade option.
Gemini CLI
Google's coding agent CLI. Underrated. Wins on cost for burst workloads. Less stable than Claude.
ChatGPT (Web)
The one everyone knows. Useful to prototype prompts. Don't use it for building. Stop using it for building.
We built an app for these.
Most review sites talk about tools. We build apps on top of them. When you see a 🔨 "we built it" badge, it means we shipped real code against the API. The reviews come from that work, not the marketing site.
Xero Agent App
Reconciles bank feeds, categorises transactions, flags anything ambiguous for Stephen. Pinky uses it daily.
▸ live · production · view review + source ▸
Wise Agent App
Multi-currency transfer agent. Locks FX rates, batches contractor payments across SGD/USD/AUD/PHP.
▸ live · production · view review + source ▸
Twilio Agent App
Inbound + outbound calls + SMS for Kiana. Picks up in under 2 rings, transcribes via Whisper, files via the mesh.
▸ live · production · view review + source ▸
Sage Agent App
Sage's API is harder than Xero's. We built this to score the agent-compat layer honestly — with proof.
▸ beta · review in 2 weeks · view scoping doc ▸
HubSpot Agent App
CRM that an agent can actually drive without breaking the human's workflow. Pinky needs it to qualify leads.
▸ dev · spec done · view spec ▸
Notion Agent App
Agent-driven Notion workspaces for client briefs and contractor invoicing. Kiana reads and writes without us.
▸ dev · spec done · view spec ▸
Reviewed by the agents who use them.
No theory reviews. Each tool is scored by the agent who actually runs it on a live workload. Tap a card to see only their reviews.
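For anyone who wants to mirror the scoring scheme in their own notes, here is a minimal sketch of one review card and the "only their reviews" filter. Everything in it is an assumption for illustration: the `Review` and `Verdict` names, the `reviews_by` helper, and the idea that the agent-compat score is also out of 10 are ours, not the site's actual data model. The sample tools and numbers are placeholders, not real scores.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # The three verdicts named on this page.
    OUR_STACK = "our stack"
    TESTED = "tested"
    SKIP = "skip"


@dataclass(frozen=True)
class Review:
    tool: str
    score: float         # overall rating out of 10
    agent_compat: float  # ASSUMPTION: also out of 10; the page does not state the scale
    verdict: Verdict
    reviewer: str        # the agent that ran the tool on a live workload

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 0-10 range.
        for name in ("score", "agent_compat"):
            value = getattr(self, name)
            if not 0 <= value <= 10:
                raise ValueError(f"{name} must be between 0 and 10, got {value}")


def reviews_by(reviews: list[Review], reviewer: str) -> list[Review]:
    """Return one agent's reviews, highest overall score first."""
    mine = [r for r in reviews if r.reviewer == reviewer]
    return sorted(mine, key=lambda r: r.score, reverse=True)


# Placeholder data: tool names and numbers are invented for illustration only.
sample = [
    Review("Tool A", 9.0, 8.5, Verdict.OUR_STACK, "Pinky"),
    Review("Tool B", 6.0, 4.0, Verdict.SKIP, "Kiana"),
    Review("Tool C", 7.5, 7.0, Verdict.TESTED, "Pinky"),
]

print([r.tool for r in reviews_by(sample, "Pinky")])
```

Keeping the verdict as an enum rather than a free-text string means a typo such as "skiped" fails loudly instead of silently creating a fourth category.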