Pre-Trained Language Models for Abstractive Text Summarization: A Comparative Study

Sunday 30 March 2025


Text summarization remains an open challenge in natural language processing. Researchers have long sought algorithms that condense lengthy texts into concise, informative summaries while preserving their essential content. A recent study by researchers from Jadavpur University and Techno India University takes up this problem, comparing four pre-trained models for abstractive text summarization.


The researchers fine-tune four pre-trained language models (BART, FLAN-T5, LLaMA-3-8B, and Gemma-7B) on five diverse datasets. Each dataset consists of articles paired with reference summaries. The models are trained to generate concise summaries that closely match those references.


The results show that each model exhibits distinct strengths and weaknesses when evaluated using various metrics, including ROUGE (Recall-Oriented Understudy for Gisting Evaluation), METEOR, and BERTScore. FLAN-T5 emerges as a top performer on the CNN/DM dataset, while Gemma-7B takes the lead on XSum. In contrast, LLaMA-3-8B performs well on BBC News, and BART excels on the News Summary dataset.
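
To make the overlap-based metrics concrete, here is a minimal sketch of how ROUGE-N and ROUGE-L can be computed for a single reference/candidate pair. It assumes plain lowercased whitespace tokenization with no stemming, and the function names `rouge_n` and `rouge_l` are hypothetical; it is not the scoring code used in the study, and established ROUGE packages may produce slightly different numbers.

```python
from collections import Counter


def _prf(overlap, ref_len, cand_len):
    """Turn an overlap count into precision, recall, and F1."""
    precision = overlap / cand_len if cand_len else 0.0
    recall = overlap / ref_len if ref_len else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap (assumed tokenization: lowercase + split)."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((ref & cand).values())
    return _prf(overlap, sum(ref.values()), sum(cand.values()))


def rouge_l(reference, candidate):
    """Simplified ROUGE-L: based on the longest common subsequence of tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    prev = [0] * (len(cand) + 1)
    for r in ref:
        curr = [0]
        for j, c in enumerate(cand, start=1):
            curr.append(prev[j - 1] + 1 if r == c else max(prev[j], curr[j - 1]))
        prev = curr
    return _prf(prev[-1], len(ref), len(cand))


# Toy sentences invented for illustration; not taken from any of the datasets.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(reference, candidate, n=1))
print(rouge_n(reference, candidate, n=2))
print(rouge_l(reference, candidate))
```

Recall rewards covering the reference, precision penalizes extra material, and F1 balances the two. METEOR and BERTScore go beyond exact token overlap (synonym matching and embedding similarity, respectively), which is one reason the metrics can rank models differently.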


The researchers also conducted a preference-based evaluation, described as a human evaluation, in which ChatGPT was asked to select the preferred machine-generated summary for each example. The results show some discrepancies with the automatic metrics but generally agree with the top-performing models identified earlier.
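
A preference evaluation of this kind reduces to counting which model's summary was chosen for each example. The sketch below shows one way to tally such choices; the function name `preference_shares`, the model labels, and the votes are all hypothetical placeholders, not data or code from the study.

```python
from collections import Counter


def preference_shares(choices):
    """Return each model's share of the preference votes, most preferred first."""
    counts = Counter(choices)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: count / total for model, count in counts.most_common()}


# Placeholder votes invented for illustration; NOT results from the study.
example_choices = ["model_a", "model_b", "model_a", "model_c", "model_a"]
shares = preference_shares(example_choices)
top_model = next(iter(shares))  # first key is the most frequently chosen model
print(shares, top_model)
```

Comparing the most preferred model per dataset with the model that scores highest on ROUGE, METEOR, or BERTScore is a simple way to see where the two kinds of evaluation disagree.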


Upon examining the summaries generated by each model, it becomes clear that they exhibit varying levels of coherence, accuracy, and relevance. Some models tend to repeat information or introduce unnecessary details, while others struggle to accurately capture the essence of the original text.


Despite these limitations, the study highlights the potential of pre-trained language models for abstractive text summarization. By fine-tuning these models on specific datasets, researchers can tailor their performance to particular domains and applications. The findings also underscore the importance of human evaluation in assessing the quality of machine-generated summaries.


The development of more sophisticated summarization algorithms will likely involve continued advances in natural language processing and machine learning. As the field continues to evolve, it is essential to prioritize both automated evaluations and human assessments to ensure that these models can effectively condense complex texts into informative and accurate summaries.


Cite this article: “Pre-Trained Language Models for Abstractive Text Summarization: A Comparative Study”, The Science Archive, 2025.


Text Summarization, Natural Language Processing, Abstractive Text Summarization, Pre-Trained Models, Fine-Tuning, Datasets, ROUGE, METEOR, BERTScore, Human Evaluation


Reference: Tohida Rehman, Soumabha Ghosh, Kuntal Das, Souvik Bhattacharjee, Debarshi Kumar Sanyal, Samiran Chattopadhyay, “Evaluating LLMs and Pre-trained Models for Text Summarization Across Diverse Datasets” (2025).

