Launch of SWE-Bench-Arena
SWE-Bench-Arena, a platform for blind evaluation of AI-generated code patches, is now live. Unlike benchmarks that only measure test-pass rates, SWE-Bench-Arena evaluates patches across five production-relevant dimensions: correctness, maintainability, readability, performance, and simplicity.
Related resources: