Combining Pre-Trained Language Models with Lexical Semantic Features for Vietnamese Sentiment Analysis

Saturday 08 March 2025

A team of researchers has proposed a new approach to sentiment analysis, a core task in natural language processing. Their recent study combines two techniques: pre-trained language models and lexical semantic features.

Sentiment analysis is the task of determining whether a piece of text expresses positive or negative sentiment. It is widely used in applications such as product reviews, social media monitoring, and opinion mining. Traditional approaches pair machine learning algorithms with feature extraction techniques, but they depend on hand-crafted features and may not generalize well to new domains.

The researchers’ approach uses a pre-trained language model called PhoBERT-V2, which is designed specifically for Vietnamese text. This model has achieved state-of-the-art performance in various natural language processing tasks, including sentiment analysis. However, the researchers found that PhoBERT-V2 alone was not sufficient to achieve high accuracy on their dataset.

To address this limitation, they added lexical semantic features drawn from a Vietnamese SentiWordNet, which, as the title of the underlying paper indicates, they also expanded. SentiWordNet is a sentiment lexicon: it assigns sentiment scores to word senses, so the sentiment of a text can be estimated from the words and phrases it contains. The researchers used it to extract sentiment features from the text, which were then combined with the output of PhoBERT-V2.
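The feature-combination step described above can be illustrated with a minimal sketch. The article does not say how the lexicon features are computed or how they are merged with the PhoBERT-V2 output, so everything here is an assumption: the toy lexicon and its scores are invented, the function names `lexicon_features` and `combine` are hypothetical, the sentence embedding is a short placeholder vector rather than a real model output, and plain concatenation stands in for whatever fusion CombViSA actually uses.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy lexicon: word -> (positive score, negative score).
# Entries and scores are invented for illustration; they are NOT taken
# from the Vietnamese SentiWordNet used in the paper.
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "tốt": (0.75, 0.0),   # "good"
    "đẹp": (0.5, 0.0),    # "beautiful"
    "tệ": (0.0, 0.75),    # "bad"
    "chậm": (0.0, 0.5),   # "slow"
}


def lexicon_features(
    tokens: Sequence[str], lexicon: Dict[str, Tuple[float, float]]
) -> List[float]:
    """Return [mean positive score, mean negative score, lexicon coverage]."""
    if not tokens:
        return [0.0, 0.0, 0.0]
    pos = neg = 0.0
    hits = 0
    for token in tokens:
        scores = lexicon.get(token.lower())
        if scores is not None:
            pos += scores[0]
            neg += scores[1]
            hits += 1
    n = len(tokens)
    return [pos / n, neg / n, hits / n]


def combine(sentence_embedding: Sequence[float], lex_feats: Sequence[float]) -> List[float]:
    """Concatenate a sentence embedding with the lexicon features (assumed fusion)."""
    return list(sentence_embedding) + list(lex_feats)


# "Slow delivery but beautiful": one negative and one positive lexicon hit.
tokens = ["giao_hàng", "chậm", "nhưng", "đẹp"]
features = lexicon_features(tokens, TOY_LEXICON)

# Placeholder for the language model's sentence vector (real ones are much longer).
placeholder_embedding = [0.1, -0.2, 0.3]
combined = combine(placeholder_embedding, features)
print(features, len(combined))
```

In a setup like this, the combined vector would be passed to a classifier layer. The lexicon features give the classifier an explicit polarity signal even for words the language model handles poorly.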

The resulting model, called CombViSA, performed well on two datasets: VLSP 2016 and AIVIVN 2019. These datasets consist of user reviews from e-commerce websites, which are challenging to classify because of the variability in language style and tone.

One of the key advantages of CombViSA is its ability to handle long sentences and complex text structures, which are common in Vietnamese. The model’s performance was evaluated using three metrics: precision, recall, and F1 score. The results showed that CombViSA outperformed the baseline models, including those that used traditional machine learning algorithms.
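For readers unfamiliar with the three metrics, the sketch below computes precision, recall, and F1 for one class from gold and predicted labels. The labels are made up for illustration and are not results from the study, the function name `precision_recall_f1` is hypothetical, and the article does not say whether the authors report per-class or averaged scores.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "positive"
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the class named by `positive`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 2 false positives, 0 false negatives.
gold = ["positive", "negative", "positive", "negative"]
pred = ["positive", "positive", "positive", "positive"]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(precision, recall, round(f1, 3))
```

F1 is the harmonic mean of the other two, so a model that labels everything positive, as in this toy example, gets perfect recall but is penalized for its low precision.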

The researchers’ approach has clear implications for the development of sentiment analysis systems for Vietnamese text. By combining pre-trained language models with lexical semantic features, they have built a robust and accurate model that can be used in a range of applications.

Furthermore, the study highlights the importance of developing domain-specific resources, such as SentiWordNet, which can improve the performance of machine learning models. The researchers’ approach demonstrates the potential benefits of integrating complementary techniques and resources to achieve better results in natural language processing.

Cite this article: “Combining Pre-Trained Language Models with Lexical Semantic Features for Vietnamese Sentiment Analysis”, The Science Archive, 2025.

Keywords: Sentiment Analysis, Natural Language Processing, Vietnamese Text, PhoBERT-V2, SentiWordNet, Lexical Semantic Features, CombViSA, Machine Learning Algorithms, Feature Extraction, Domain-Specific Resources

Reference: Hong-Viet Tran, Van-Tan Bui, Lam-Quan Tran, “Expanding Vietnamese SentiWordNet to Improve Performance of Vietnamese Sentiment Analysis Models” (2025).
