blog

an archive of posts in this category

Jul 26, 2025 The Visual Complexity Penalty in Code Understanding - SWE-bench Multimodal Analysis
Analyzing how visual content dramatically impacts AI agents' performance on SWE tasks
Jun 05, 2025 From 73% to 11%: Revealing True SWE-Agent Capabilities with Discriminative Subsets
Uncovering the real performance of SWE-Agents by analyzing discriminative subsets of SWE-Bench Verified, showing how aggregate scores can mask significant performance variations across task types.
Apr 15, 2025 Cracking the Code: How Difficult Are SWE-Bench-Verified Tasks Really?
Analysis of task difficulty distribution in SWE-Bench-Verified using human annotations, revealing the true complexity spectrum and what it means for AI coding performance evaluation.
Mar 30, 2025 The Multi-File Frontier: Why SWE-Bench Verified Doesn't Reflect Real-World Programming Challenges
Deep analysis of why SWE-Bench Verified's focus on single-file changes doesn't represent real-world programming challenges, which typically involve multi-file modifications and complex codebase interactions.
Jan 05, 2025 Do SWE-Agents Solve Multi-File Issues Like Humans? A Deep Dive into SWE-Bench Verified
Exploring how SWE-Agents handle multi-file software engineering tasks compared to human developers, with detailed analysis of patterns and performance on the SWE-Bench Verified benchmark.
Dec 31, 2024 OpenHands CodeAct v2.1 v/s Tools + Claude 3.5 Sonnet
Comprehensive comparison between OpenHands CodeAct v2.1 and Claude 3.5 Sonnet on SWE-Bench tasks, analyzing the performance differences and capabilities of these leading SWE-Agent approaches.
Dec 26, 2024 SWE-Bench Verified ⊊ real-world SWE tasks
Analysis of how SWE-Bench Verified relates to real-world software engineering tasks, exploring the subset relationship between benchmark evaluation and practical development challenges.
Jan 21, 2014 Installing Octave on OS X 10.9 Mavericks
Aug 30, 2013 Comparison is always false due to limited range of data type
Aug 03, 2013 Keyboard Review - Microsoft Natural Ergonomic Keyboard 4000
Comprehensive review of the Microsoft Natural Ergonomic Keyboard 4000 from a software developer's perspective, focusing on ergonomics, comfort, and RSI prevention for long coding sessions.
Feb 24, 2013 Concurrent and Sequential statements in Verilog
A beginner's guide to understanding concurrent and sequential statements in Verilog HDL, explaining how Verilog differs from conventional programming languages.
Feb 09, 2013 C++ - Variable Declaration in 'if' expression
Nov 25, 2012 Forward Class Declaration in C++
Nov 03, 2012 Why use GIT and hang CVS?
Presentation and discussion on the advantages of Git over CVS for version control, covering key benefits and reasons why developers should migrate from legacy systems to modern Git workflows.
Aug 25, 2012 Integer Limits and Types In C/C++
Understanding platform-dependent primitive data types in C/C++, including integer sizes, limits, and portability considerations across different architectures.