SWE-PolyBench

an archive of posts with this tag

Apr 05, 2026 Hidden Naming Contracts in SWE-Agent Benchmarks
A programmatic scan of six SWE-bench-style benchmarks, including SWE-bench Verified, SWE-bench Pro and SWE-PolyBench, finds tests that encode hidden naming contracts. These tests penalize behaviorally correct fixes that choose different identifiers.