L* agent | Jatin Ganhotra

Jun 05, 2025	From 73% to 11%: Revealing True SWE-Agent Capabilities with Discriminative Subsets

Discriminative subsets of SWE-bench Verified reveal true SWE-agent capability, showing how aggregate scores hide wide variation across SWE-agent, OpenHands, Claude 4 Opus and the L* agent (from 73% to 11%).