2026
an archive of posts from this year
| Apr 05, 2026 | Hidden Naming Contracts in SWE-Agent Benchmarks A programmatic scan across six SWE-bench-style benchmarks finds that tests sometimes encode hidden naming requirements, penalizing behaviorally correct fixes that choose different identifiers. |
|---|