Coming Soon

CLI Arena:
A Comprehensive Framework for Evaluating AI Coding Agents

By AfterQuery

Overview

CLI Arena is a comprehensive framework for evaluating AI coding agents on real-world software engineering tasks across diverse technology stacks. It tests coding agents such as Anthropic's Claude Code, OpenAI's Codex CLI, Google's Gemini CLI, and Anysphere's Cursor CLI. Although adoption of CLI agents is growing rapidly, no existing benchmark rigorously tests how developers actually use these tools in practice. Current evals fall short of reflecting the real use cases of CLI agents: complex, iterative workflows across full codebases.

We are currently collecting design input and feedback on CLI Arena from labs with CLI agents and from the developer community. Our goal is to refine CLI Arena to be as useful as possible for both developers and researchers.

Please reach out to spencer [at] afterquery [dot] com and ethan [at] afterquery [dot] com to get access to the private CLI Arena repo, share feedback, and/or discuss collaboration opportunities.
