Hierarchical Approach to Code Summarization Using Large Language Models

Friday 07 March 2025


Source code summarization remains a persistent challenge in software development: developers and researchers alike want reliable ways to distill complex codebases into concise, understandable summaries. In a recent paper, a team of researchers proposes a hierarchical approach to the problem that leverages large language models (LLMs) trained on vast amounts of text data.


The researchers’ approach involves breaking down large code repositories into smaller segments, such as functions and variables, and then using LLMs to generate summaries for each segment. These summaries are then combined to create high-level summaries of the entire repository. This hierarchical approach allows the model to focus on specific aspects of the code while still capturing its broader context.
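
The segment-then-aggregate idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python source files, uses the standard `ast` module for syntax analysis, and stands in for the LLM with a hypothetical `stub_summarizer` that simply truncates its input. All function names here are invented for illustration.

```python
import ast
from typing import Callable, Dict

# Hypothetical stand-in for an LLM call: any function mapping a prompt to text.
Summarizer = Callable[[str], str]


def stub_summarizer(prompt: str) -> str:
    """Placeholder 'model': returns the first line of the prompt, truncated."""
    stripped = prompt.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line[:80]


def extract_segments(source: str) -> Dict[str, str]:
    """Split one Python file into top-level functions and variable assignments."""
    tree = ast.parse(source)
    segments: Dict[str, str] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segments[node.name] = ast.get_source_segment(source, node) or ""
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    segments[target.id] = ast.get_source_segment(source, node) or ""
    return segments


def summarize_file(source: str, summarize: Summarizer) -> str:
    """Level 1: summarize each segment. Level 2: summarize those summaries."""
    segment_summaries = [
        f"{name}: {summarize(code)}"
        for name, code in extract_segments(source).items()
    ]
    return summarize("\n".join(segment_summaries))


def summarize_repository(files: Dict[str, str], summarize: Summarizer) -> str:
    """Level 3: combine the file summaries into one repository summary."""
    file_summaries = [
        f"{path}: {summarize_file(src, summarize)}" for path, src in files.items()
    ]
    return summarize("\n".join(file_summaries))
```

In practice `stub_summarizer` would be replaced by a call to whichever model is available. The point of the structure is that each call sees only one segment, or a short list of summaries, rather than the whole repository.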


To improve the accuracy of these summaries, the researchers grounded them in domain-specific knowledge. By supplying the model with information about the business application and its problem context, they were able to generate summaries that are not only concise but also relevant and domain-aware.
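
One plausible way to pass business context to a model is through the prompt itself. The sketch below assumes that route; the function name `build_grounded_prompt` and the template wording are hypothetical and are not taken from the paper.

```python
def build_grounded_prompt(code: str, domain: str, problem_context: str) -> str:
    """Wrap a code segment in a prompt that states the business context.

    The template text is an illustrative assumption, not the authors' prompt.
    """
    return (
        f"You are summarizing code from a {domain} application.\n"
        f"Problem context: {problem_context}\n"
        "Describe the business purpose of the following code "
        "in one or two sentences.\n\n"
        f"{code}"
    )


# Example usage with made-up inputs.
prompt = build_grounded_prompt(
    code="def apply_discount(plan, pct): ...",
    domain="telecommunications",
    problem_context="billing and subscription management",
)
```

Keeping the context in a separate function like this makes it easy to reuse the same domain description at every level of the hierarchy.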


The team tested their approach on a publicly available repository for a business support system in the telecommunications domain, evaluating its performance using traditional metrics such as ROUGE-L and BLEU. The results showed significant improvements over existing methods, with the hierarchical approach generating summaries that were both accurate and informative.
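
For readers unfamiliar with ROUGE-L, the metric scores a generated summary by the longest common subsequence (LCS) of words it shares with a reference summary. The following is a simplified sketch, assuming lowercase whitespace tokenization and an equally weighted F-score; established evaluation packages apply further normalization, so scores from this sketch are illustrative only.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-L: harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate words inside the LCS
    recall = lcs / len(ref)      # share of reference words inside the LCS
    return 2 * precision * recall / (precision + recall)
```

Because the LCS preserves word order without requiring the words to be adjacent, a summary that covers the same points in the same order scores well even if it phrases things more briefly.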


One of the key advantages of this approach is that it scales to large codebases. Because the code is decomposed into smaller segments, the model only ever processes one manageable piece at a time, so even complex repositories can be summarized step by step. This makes it an attractive option for developers working on large-scale software projects.


The researchers also explored the potential applications of their approach beyond source code summarization. They demonstrated its effectiveness in generating commit messages and bug reports, highlighting its versatility and potential for broader impact.


As the software development landscape continues to evolve, the need for effective code summarization tools will only grow more pressing. The hierarchical LLM approach proposed by this team offers a promising solution to this challenge, and its potential applications extend far beyond the realm of source code analysis.


Cite this article: “Hierarchical Approach to Code Summarization Using Large Language Models”, The Science Archive, 2025.


Large Language Models, Software Development, Code Summarization, Hierarchical Approach, Domain-Specific Knowledge, Business Applications, Telecommunications, ROUGE-L, BLEU, Commit Messages, Bug Reports


Reference: Nilesh Dhulshette, Sapan Shah, Vinay Kulkarni, “Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs” (2025).

