blog
an archive of posts in this category
| Date | Title | Description |
|---|---|---|
| Jul 26, 2025 | The Visual Complexity Penalty in Code Understanding - SWE-bench Multimodal Analysis | Analyzing how visual content dramatically impacts AI agents' performance on SWE tasks |
| Jun 05, 2025 | From 73% to 11%: Revealing True SWE-Agent Capabilities with Discriminative Subsets | Uncovering the real performance of SWE-Agents by analyzing discriminative subsets of SWE-Bench Verified, showing how aggregate scores can mask significant performance variations across task types. |
| Apr 15, 2025 | Cracking the Code: How Difficult Are SWE-Bench-Verified Tasks Really? | Analysis of task difficulty distribution in SWE-Bench-Verified using human annotations, revealing the true complexity spectrum and what it means for AI coding performance evaluation. |
| Mar 30, 2025 | The Multi-File Frontier: Why SWE-Bench Verified Doesn't Reflect Real-World Programming Challenges | Deep analysis of why SWE-Bench Verified's focus on single-file changes doesn't represent real-world programming challenges that typically involve multi-file modifications and complex codebase interactions. |
| Jan 05, 2025 | Do SWE-Agents Solve Multi-File Issues Like Humans? A Deep Dive into SWE-Bench Verified | Exploring how SWE-Agents handle multi-file software engineering tasks compared to human developers, with detailed analysis of patterns and performance on SWE-Bench Verified benchmark. |
| Dec 31, 2024 | OpenHands CodeAct v2.1 v/s Tools + Claude 3.5 Sonnet | Comprehensive comparison between OpenHands CodeAct v2.1 and Claude 3.5 Sonnet on SWE-Bench tasks, analyzing the performance differences and capabilities of these leading SWE-Agent approaches. |
| Dec 26, 2024 | SWE-Bench Verified ⊊ real-world SWE tasks | Analysis of how SWE-Bench Verified relates to real-world software engineering tasks, exploring the subset relationship between benchmark evaluation and practical development challenges. |
| Jan 21, 2014 | Installing Octave on OS X 10.9 Mavericks | |
| Aug 30, 2013 | Comparison is always false due to limited range of data type | |
| Aug 03, 2013 | Keyboard Review - Microsoft Natural Ergonomic Keyboard 4000 | Comprehensive review of the Microsoft Natural Ergonomic Keyboard 4000 from a software developer's perspective, focusing on ergonomics, comfort, and RSI prevention for long coding sessions. |
| Feb 24, 2013 | Concurrent and Sequential statements in Verilog | A beginner's guide to understanding concurrent and sequential statements in Verilog HDL, explaining how Verilog differs from conventional programming languages. |
| Feb 09, 2013 | C++ - Variable Declaration in 'if' expression | |
| Nov 25, 2012 | Forward Class Declaration in C++ | |
| Nov 03, 2012 | Why use GIT and hang CVS? | Presentation and discussion on the advantages of Git over CVS for version control, covering key benefits and reasons why developers should migrate from legacy systems to modern Git workflows. |
| Aug 25, 2012 | Integer Limits and Types In C/C++ | Understanding platform-dependent primitive data types in C/C++, including integer sizes, limits, and portability considerations across different architectures. |