-
The Visual Complexity Penalty in Code Understanding - SWE-bench Multimodal Analysis
Analyzing how visual content degrades AI agents' performance on software engineering tasks
-
From 73% to 11%: Revealing True SWE-Agent Capabilities with Discriminative Subsets
Uncovering the real capabilities of SWE-Agents by analyzing discriminative subsets of SWE-Bench Verified, showing how aggregate scores can mask large performance variations across task types.
-
Cracking the Code: How Difficult Are SWE-Bench Verified Tasks Really?
An analysis of the task difficulty distribution in SWE-Bench Verified using human annotations, revealing the true complexity spectrum and what it means for evaluating AI coding performance.
-
The Multi-File Frontier: Why SWE-Bench Verified Doesn't Reflect Real-World Programming Challenges
A deep analysis of why SWE-Bench Verified's focus on single-file changes fails to capture real-world programming work, which typically involves multi-file modifications and complex codebase interactions.
-
Do SWE-Agents Solve Multi-File Issues Like Humans? A Deep Dive into SWE-Bench Verified
Exploring how SWE-Agents handle multi-file software engineering tasks compared to human developers, with a detailed analysis of patterns and performance on the SWE-Bench Verified benchmark.