Amazon Q

an archive of posts with this tag

Mar 30, 2025 The Multi-File Frontier: Why SWE-Bench Verified Doesn't Reflect Real-World Programming Challenges
Why SWE-bench Verified's focus on single-file changes misses real-world multi-file programming — analyzed across SWE-agent, Agentless, Claude 3 Opus, Claude 3.5 Sonnet, OpenAI o1 and Amazon Q.
Dec 26, 2024 SWE-Bench Verified ⊊ real-world SWE tasks
Why SWE-bench Verified is only a subset of real-world software engineering tasks: a comparison of software engineering agents such as OpenHands CodeAct v2.1, Amazon Q, SWE-agent, Agentless and AutoCodeRover, with Claude 3.5 Sonnet.