Imagine having a single leaderboard that compares the performance of different AI code editors like Cursor, Windsurf, and others. Sounds too good to be true? Well, I’m on a mission to make it a reality.
As a developer, I’ve struggled to find a comprehensive, apples-to-apples comparison of these editors. SWE-bench Verified is a great resource, but it’s harness-based: results come from models driven by standalone agent scaffolds, not from the editors’ own inline, chat, and agent workflows. That’s why I want to create a community-driven spreadsheet that crowdsources small, testable tasks to evaluate these editors as people actually use them.
I propose we track core fields like editor version, model provider, mode (inline, chat, or agent), task type, evaluation metrics (tests, rubric), end-to-end time, retries, human assistance level, and cost or tokens (if visible). Optional fields could include temperature, top-p, or max-tokens if the UI exposes them.
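To make those core fields concrete, here’s a minimal sketch of what one leaderboard entry could look like as a Python dataclass. All of the field names and allowed values are placeholders I’m proposing, not an existing schema, so treat this as a starting point for discussion:

```python
# Sketch of one leaderboard row. Field names are proposals, not a standard.
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class EditorBenchRow:
    editor: str                  # e.g. "Cursor", "Windsurf"
    editor_version: str
    model_provider: str          # e.g. "Anthropic", "OpenAI"
    model: str
    mode: str                    # "inline" | "chat" | "agent"
    task_type: str               # e.g. "bugfix", "refactor", "greenfield"
    eval_metric: str             # "tests" or "rubric"
    passed: bool
    end_to_end_seconds: float
    retries: int
    human_assist_level: str      # e.g. "none", "hints", "heavy"
    cost_usd: Optional[float] = None     # only if the UI exposes it
    tokens: Optional[int] = None         # only if the UI exposes it
    temperature: Optional[float] = None  # optional, if the UI exposes it
    top_p: Optional[float] = None
    max_tokens: Optional[int] = None
```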
I’ve seen Windsurf community comparisons and Aider’s editor-specific leaderboards, but what I’m looking for is a single, cross-editor leaderboard. If it doesn’t exist, I’m happy to start a minimal, editor-agnostic sheet with the core fields and let the community contribute.
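Continuing that sketch, appending a submission to a shared, editor-agnostic CSV could be as simple as the snippet below. The row values are made-up placeholders, and the file name is just an example:

```python
import csv
import os

# Hypothetical example submission; values are placeholders, not real results.
row = EditorBenchRow(
    editor="Cursor", editor_version="0.45", model_provider="Anthropic",
    model="claude-sonnet", mode="agent", task_type="bugfix",
    eval_metric="tests", passed=True, end_to_end_seconds=312.0,
    retries=1, human_assist_level="none",
)

path = "leaderboard.csv"
write_header = not os.path.exists(path)  # only write the header for a fresh file
with open(path, "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(asdict(row).keys()))
    if write_header:
        writer.writeheader()
    writer.writerow(asdict(row))
```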
So, who’s with me? Share your thoughts in the comments, and let’s create a resource that helps developers make informed decisions about the best AI code editor for their needs.