SWE-PolyBench
an archive of posts with this tag
| Apr 05, 2026 | Hidden Naming Contracts in SWE-Agent Benchmarks: A programmatic scan of six SWE-bench-style benchmarks, including SWE-bench Verified, SWE-bench Pro and SWE-PolyBench, finds tests that encode hidden naming contracts, penalizing behaviorally correct fixes that choose different identifiers. |
|---|---|