A Review of Automated Text Summarization Models on Diverse Datasets: An Evaluation Perspective
Keywords:
Automatic Text Summarization (ATS), Natural Language Processing (NLP), Extractive and Abstractive Summarization, Transformer-Based Models (BERT, BART, T5, PEGASUS), Deep Learning for Summarization, Large Language Models (LLMs)

Abstract
This paper reviews Automatic Text Summarization (ATS), a core task in Natural Language Processing (NLP) driven by the rapid growth of textual data across domains. The review systematically examines recent advances in extractive, abstractive, and hybrid text summarization models published between 2019 and 2025, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Relevant papers were retrieved from Elsevier, Google Scholar, IEEE Xplore, the ACM Digital Library, and Springer. After removing duplicates (n=96), 174 irrelevant records were excluded to satisfy the inclusion criteria, leaving studies covering models such as BERT (Bidirectional Encoder Representations from Transformers), BART (Bidirectional and Auto-Regressive Transformers), T5 (Text-to-Text Transfer Transformer), TextRank, LSA (Latent Semantic Analysis), and PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-Sequence models) across diverse datasets, including news, scholarly, and technical corpora. Extractive approaches exhibit strong lexical accuracy and computational efficiency, whereas transformer-based abstractive models achieve superior semantic coherence at higher computational cost. The review also highlights persistent gaps, including dataset bias, long-document summarization, hallucination in generative models, and over-reliance on traditional metrics such as ROUGE. The results underscore the need for cross-domain evaluation, hybrid model integration, and adoption of advanced semantic metrics such as BERTScore and MoverScore. Future work should prioritize cross-domain benchmarks, standardized multi-metric evaluation, exploration of hybrid approaches, and testing on long and multilingual documents. Furthermore, reproducible reporting of computational cost (e.g., GPU-hours) and of failure modes (e.g., hallucination) will support more practical comparisons.
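To make the metric discussion concrete, the sketch below (an illustration added for this review, not a method from any surveyed paper) implements ROUGE-1 F1 as simple unigram overlap in pure Python. It shows why such lexical metrics, which the abstract flags as over-relied upon, can under-score valid paraphrases: a summary that reuses none of the reference's words scores zero regardless of meaning, which is the gap semantic metrics like BERTScore aim to close.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Minimal ROUGE-1 F1: unigram-overlap precision/recall,
    case-insensitive, whitespace tokenization (no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each shared word counts at most min(freq) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# A lexically similar pair scores high; a pure paraphrase scores 0,
# even if its meaning matches the reference.
print(rouge1_f1("the cat sat on the mat", "a cat was on the mat"))
print(rouge1_f1("the feline rested atop a rug", "the cat sat on the mat"))
```

Production studies would instead use maintained packages (e.g., Google's `rouge-score` with stemming, or `bert-score` for semantic similarity); this stripped-down version only illustrates the mechanism.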