SWE-Bench
an archive of posts with this tag
| Date | Post |
|---|---|
| Dec 31, 2024 | **OpenHands CodeAct v2.1 v/s Tools + Claude 3.5 Sonnet**: A comprehensive comparison of OpenHands CodeAct v2.1 and Claude 3.5 Sonnet on SWE-Bench tasks, analyzing the performance differences and capabilities of these leading SWE-Agent approaches. |
| Dec 26, 2024 | **SWE-Bench Verified ⊊ real-world SWE tasks**: An analysis of how SWE-Bench Verified relates to real-world software engineering tasks, exploring why the benchmark covers only a strict subset of practical development challenges. |