Bug localization continues to be a challenge for both bug triage and root-cause analysis. Agentic approaches suggest a way forward. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation
This month’s pick is LocAgent: Graph-Guided LLM Agents for Code Localization. The authors are from Yale, USC, Stanford and All-Hands AI. The paper was posted in April 2025 and has 25 citations so far.
It’s been a while since we last looked at localization. Growing agentic activity around debug prompts another look. The paper’s method builds a graph view of a code base in which nodes are code entities (directories, files, classes and functions) and edges are relations between nodes such as contain, import, invoke and inherit.
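To make the graph idea concrete, here is a minimal sketch (my own illustration, not code from the paper) of such a heterogeneous graph built with networkx; all node names are invented for the example.

```python
# Minimal sketch of a heterogeneous code graph of the kind described in the
# paper: typed nodes (directory, file, class, function) and typed edges
# (contain, import, invoke, inherit). Node names below are illustrative only.
import networkx as nx

g = nx.MultiDiGraph()

# Nodes carry a "kind" attribute.
g.add_node("src/", kind="directory")
g.add_node("src/parser.py", kind="file")
g.add_node("src/lexer.py", kind="file")
g.add_node("src/parser.py::Parser", kind="class")
g.add_node("src/parser.py::Parser.parse", kind="function")
g.add_node("src/lexer.py::tokenize", kind="function")

# Edges carry a "rel" attribute naming the relation type.
g.add_edge("src/", "src/parser.py", rel="contain")
g.add_edge("src/parser.py", "src/parser.py::Parser", rel="contain")
g.add_edge("src/parser.py::Parser", "src/parser.py::Parser.parse", rel="contain")
g.add_edge("src/parser.py", "src/lexer.py", rel="import")
g.add_edge("src/parser.py::Parser.parse", "src/lexer.py::tokenize", rel="invoke")

# Multi-hop questions then become plain graph traversal, e.g. which
# functions does Parser.parse invoke?
callees = [v for _, v, d in g.out_edges("src/parser.py::Parser.parse", data=True)
           if d["rel"] == "invoke"]
print(callees)  # ['src/lexer.py::tokenize']
```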
One point I find interesting is that this is a purely static analysis of the codebase supported by training against (GitHub) bug and resolution reports. There is no attempt that I saw to probe dynamic behavior to test hypotheses. The authors claim high levels of accuracy at file, module and function levels.
Paul’s view
We look again this month at LLM-based code localization – finding the relevant file/class/function to fix a bug or make an enhancement. Much of the focus for ongoing research, both in academia and industry, is on the retrieval part of the prompt engineering process, where additional information is added to the prompt to help the LLM perform its localization task.
This month’s paper, out of Stanford, Yale, and USC, presents a retrieval system, LocAgent, based on a knowledge graph that is essentially the union of the file/directory hierarchy on disk (contain, import edges) and class/function relationships from the elaborated syntax tree of the code (call, inherit edges). Traversing edges in this graph can follow files and directories as well as function call paths in the code.
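As a hedged sketch of what building such a union statically might involve, the snippet below derives contain edges from a directory walk and import/invoke/inherit edges from Python’s standard ast module. It is a deliberate simplification (call and base-class names are left unresolved), not LocAgent’s actual indexer.

```python
# Sketch only: index a Python repo into (source, target, relation) edges.
import ast
import os

def index_repo(root):
    edges = []  # (src, dst, relation)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            edges.append((dirpath, path, "contain"))      # directory -> file
            with open(path, encoding="utf-8") as fh:
                tree = ast.parse(fh.read())
            for node in ast.walk(tree):
                if isinstance(node, ast.ClassDef):
                    cls = f"{path}::{node.name}"
                    edges.append((path, cls, "contain"))
                    for base in node.bases:                # inherit edges
                        if isinstance(base, ast.Name):
                            edges.append((cls, base.id, "inherit"))
                elif isinstance(node, ast.FunctionDef):
                    fn = f"{path}::{node.name}"
                    edges.append((path, fn, "contain"))
                    for sub in ast.walk(node):             # invoke edges (names unresolved)
                        if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                            edges.append((fn, sub.func.id, "invoke"))
                elif isinstance(node, (ast.Import, ast.ImportFrom)):
                    mod = getattr(node, "module", None) or node.names[0].name
                    edges.append((path, mod, "import"))
    return edges
```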
The authors also contribute a new benchmark suite for code localization, LocBench. This suite is based on code changes to Python repositories in GitHub dated October 2024 or later, which is after the training cutoff for all the LLMs benchmarked in the paper. It also has a balance of code changes between bug fixes, new feature requests, and performance issues (around 240, 150, and 140 cases respectively).
The authors benchmark LocAgent vs. other leading code localizers using LocBench, measuring each localizer’s ability to report the correct set of files/modules/functions. Using Claude-3.5-Sonnet, and compared to OpenHands, the best alternative tool benchmarked, LocAgent scores 6% higher in its ability to report the correct files in its top-10, and about 2% better at reporting the correct modules and functions in its top-10. A solid contribution in probably one of the most actively researched topics in agentic AI today. It’s worth noting, however, that the absolute top-10 score on function localization is around 60%, which is still low, so there is still huge room for improvement.
As a final contribution, the authors also present a fine-tuned mini-LLM (a 7B-parameter Qwen model) based on around 800 training cases taken from GitHub prior to October 2024 and show that this works reasonably well with LocAgent on LocBench, with top-10 scores about 8% lower than with Claude-3.5-Sonnet. Its inference cost is 15x lower than using Claude-3.5-Sonnet (5 cents vs. 79 cents). An interesting datapoint on the cost-outcome trade-off.
Raúl’s view
Where in a code base do you fix a bug, add a feature or address security or performance improvements? This month’s paper proposes LOCAGENT, a framework for code localization — the task of identifying the precise files, classes, or functions in a codebase that need to be modified for a fix. LOCAGENT combines:
- A unified graph representation of the entire code structure and its dependencies, with four node types (directory, file, class and function) and four relation types (contain, import, invoke and inherit). It is lightweight (takes seconds to build), enables powerful multi-hop reasoning, unifies content search with structural traversal, and is explicitly optimized for LLM consumption.
- Only three highly optimized tools for search and traversal: SearchEntity, TraverseGraph and RetrieveEntity (a hedged sketch follows this list).
- Fine-tuned open-source LLMs (Qwen-2.5-Coder 7B and 32B) for localization.
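The three tool names come from the paper, but the signatures and behavior below are my assumptions: a minimal sketch of how such operations could sit on top of a typed code graph like the one in the earlier snippet, not LocAgent’s actual API.

```python
# Hedged sketch of the three tools as plain functions over a typed
# networkx graph (node attribute "kind", edge attribute "rel").
import networkx as nx

def search_entity(g: nx.MultiDiGraph, keyword: str):
    """Keyword match over node identifiers; stands in for name/content search."""
    return [n for n in g.nodes if keyword.lower() in str(n).lower()]

def traverse_graph(g: nx.MultiDiGraph, start: str, rels=("invoke",), hops=2):
    """Multi-hop expansion from a start entity, following only the given relation types."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        nxt = set()
        for u in frontier:
            for _, v, d in g.out_edges(u, data=True):
                if d.get("rel") in rels and v not in seen:
                    nxt.add(v)
        seen |= nxt
        frontier = nxt
    return seen - {start}

def retrieve_entity(g: nx.MultiDiGraph, node: str):
    """Return the stored attributes (kind, source snippet, etc.) for one node."""
    return g.nodes[node]
```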
The authors claim that existing benchmarks have two key problems: potential training contamination (LLMs may have seen these repos/issues during pretraining), and an overfocus on bug reports, lacking coverage of feature requests, security issues and performance problems. So they created a new benchmark, LOC-BENCH, collected from post-2024 repositories to avoid contamination; it includes 560 issues across 4 categories (bugs, features, performance, security), curated to ensure the issues require realistic localization. The authors also evaluate their approach on SWE-Bench-Lite (300 cases), which was created primarily to evaluate end-to-end bug-fixing capabilities, with localization being only an intermediate step.
The experimental evaluation of LOCAGENT covers several dimensions; below is my attempt to summarize them:
- Using fine-tuned open source LLMs (Qwen 2.5) achieves performance comparable to Claude-3.5 at ~86% lower cost.
- LOCAGENT is correct within its top-1 to top-10 predictions at locating the right file, module and function 72-94% of the time, outperforming all other methods: embedding-based (40-85%), procedure-based (55-80%) and agent-based (46-90%). (A sketch of the accuracy@k metric follows this list.)
- Cost efficiency is evaluated only against agent-based methods (the best of the rest); LOCAGENT using Qwen beats the best of them (Moatless using Claude) at $0.09 vs. $0.46 for accuracy at 10 predictions.
- Evaluating the tools on LOC-BENCH shows LOCAGENT to be superior to other agent-based methods by small margins (5-10%), the exception being security enhancements, where the margin is up to 20%.
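For reference, the sketch below shows one common reading of the accuracy-at-k metric behind these numbers. This scoring rule is my assumption; the paper may instead require every ground-truth location to appear in the top-k.

```python
# Sketch of accuracy@k: an issue counts as a hit if any of the top-k
# predicted locations matches one of its ground-truth locations.
def accuracy_at_k(ranked_predictions, gold_locations, k=10):
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_locations)
        if any(p in gold for p in preds[:k])
    )
    return hits / len(gold_locations)

# Tiny example with invented locations: one hit out of two issues -> 0.5.
preds = [["a.py::f", "b.py::g"], ["c.py::h"]]
gold = [{"b.py::g"}, {"d.py::k"}]
print(accuracy_at_k(preds, gold, k=10))  # 0.5
```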
Code localization is an important but underemphasized task relative to bug fixing or code generation. The paper convincingly argues that localization is distinct from retrieval, more complex, and often the bottleneck in program repair. The approach combining a heterogeneous graph, agent tools and fine-tuned LLMs is somewhat novel and empirically effective. LOC-BENCH is well motivated, and the breadth of the empirical analysis and the results are strong. Evaluation is limited to Python; no evidence is provided for non-Python languages, and general applicability is asserted but not demonstrated. Some baselines (SWE-Agent, OpenHands) are not primarily localization systems, so comparisons may overstate LOCAGENT’s advantages.
This is a strong paper with contributions to code localization and agentic software engineering. It is technically sound, well-evaluated, and practically useful. The combination of a unified dependency graph, carefully designed agent tools, and efficient fine-tuning forms an approach that advances the state of the art.
Also Read:
WEBINAR: Is Agentic AI the Future of EDA?
Emulator-Like Simulation Acceleration on GPUs. Innovation in Verification
Neurosymbolic code generation. Innovation in Verification