2026

an archive of posts from this year

Apr 05, 2026 Hidden Naming Contracts in SWE-Agent Benchmarks
A programmatic scan across six SWE-bench-style benchmarks finds that tests sometimes encode hidden naming requirements, penalizing behaviorally correct fixes that choose different identifiers.