SWE-Bench_Pro
an archive of posts with this tag
| Apr 05, 2026 | Hidden Naming Contracts in SWE-Agent Benchmarks | A programmatic scan across six SWE-bench-style benchmarks finds that tests sometimes encode hidden naming requirements, penalizing behaviorally correct fixes that choose different identifiers. |
|---|---|---|