Deciphering Online Personas: A Study on Author Profiling using Machine Learning

Saturday 01 February 2025


The quest for a more nuanced understanding of online personas has led researchers to develop machine learning models that predict an author’s gender and age from their writing style. In a recent study, a team of researchers turned to the Bangla language, building a dataset of 300 authors and over 30,000 labeled Facebook status updates.


The researchers employed a range of classical machine learning algorithms, including Support Vector Machines (SVM) and Naive Bayes, to classify the authors’ gender and age. They also explored the use of deep learning techniques, such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), to analyze the linguistic patterns in the text.
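To make the classical approach concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier written from scratch. It is not the study's implementation: the whitespace tokenizer, the smoothing value `alpha`, the class labels "A" and "B", and the four English training sentences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value, not from the study)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)     # token counts per class
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.split())
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for token in text.split():
                if token in self.vocab:              # skip tokens never seen in training
                    score += math.log((counts[token] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up training data with two hypothetical classes.
texts = ["good match today", "great match tonight",
         "lovely recipe today", "new recipe tonight"]
labels = ["A", "A", "B", "B"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("match today"))
print(model.predict("recipe tonight"))
```

The classifier has a single hyperparameter here, the smoothing constant, and training amounts to counting tokens per class.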


The study found that classical machine learning models generally outperformed their deep learning counterparts in predicting an author’s gender and age. SVM and Naive Bayes emerged as top performers, with accuracy scores of over 80% for gender classification and nearly 92% for age classification.


One notable finding was the consistent performance of Naive Bayes across different configurations and hyperparameter settings. This algorithm’s simplicity and robustness make it an attractive choice for author profiling tasks. SVM, on the other hand, performed strongly only in specific configurations, particularly when paired with word n-gram features and polynomial kernels.
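As a rough illustration of what pairing word n-grams with a polynomial kernel means, the sketch below computes the standard polynomial kernel, (gamma * <u, v> + coef0) ** degree, between two sparse word-bigram count vectors. The parameter values and the example sentences are assumptions chosen for illustration; the article does not report the kernel settings used in the study.

```python
from collections import Counter


def word_bigram_counts(text):
    """Count adjacent word pairs in a whitespace-tokenized text."""
    tokens = text.split()
    return Counter(zip(tokens, tokens[1:]))


def dot(u, v):
    """Dot product of two sparse count vectors stored as Counters."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(count * v[feat] for feat, count in u.items())


def polynomial_kernel(u, v, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel; degree, gamma and coef0 are illustrative defaults."""
    return (gamma * dot(u, v) + coef0) ** degree


a = word_bigram_counts("the match was great")
b = word_bigram_counts("the match was dull")
c = word_bigram_counts("new recipe tonight")

print(polynomial_kernel(a, b))  # two shared bigrams: (1.0 * 2 + 1.0) ** 2 = 9.0
print(polynomial_kernel(a, c))  # no shared bigrams:  (1.0 * 0 + 1.0) ** 2 = 1.0
```

An SVM with this kernel scores a new text by its similarity to the support vectors, so texts that share more n-grams pull more strongly toward the same class.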


The researchers also analyzed the performance of deep learning models, which tended to struggle with age classification. CNNs fared better on gender classification, reaching an accuracy of 71%.


In exploring the linguistic patterns that contribute to author profiling, the study highlighted the importance of word n-grams and character n-grams in capturing an author’s writing style. These features can be used to identify subtle differences between male and female writers or different age groups.
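The two feature types can be sketched in a few lines. The function names, the n-gram sizes, and the sample sentence below are illustrative choices, not details taken from the study.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all runs of n consecutive whitespace-separated tokens."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return all substrings of n consecutive code points, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_profile(text, word_n=2, char_n=3):
    """Combine word and character n-gram counts into one feature Counter."""
    profile = Counter()
    profile.update(("w",) + gram for gram in word_ngrams(text, word_n))
    profile.update(("c", gram) for gram in char_ngrams(text, char_n))
    return profile


sample = "আমি ভালো আছি"  # "I am fine", a made-up example sentence
print(word_ngrams(sample, 2))
print(char_ngrams("abcd", 3))  # ['abc', 'bcd']
```

One caveat for Bangla: Python slices strings by Unicode code point, so a character n-gram can split a consonant from its attached vowel sign. Whether that matters for a given model is an empirical question this sketch does not address.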


The findings of this study have significant implications for the field of natural language processing and social media analysis. By developing more accurate models for author profiling, researchers can better understand online behavior, detect disinformation campaigns, and improve content recommendation algorithms.


As researchers continue to explore human communication on social media, they will need to weigh model complexity against how easily the results can be interpreted. Robust author profiling techniques will require careful consideration of linguistic patterns, cultural context, and individual variation.


Cite this article: “Deciphering Online Personas: A Study on Author Profiling using Machine Learning”, The Science Archive, 2025.


Machine Learning, Author Profiling, Natural Language Processing, Social Media, Gender Classification, Age Classification, Word N-Grams, Character N-Grams, SVM, Naive Bayes


Reference: Raisa Tasnim, Mehanaz Chowdhury, Md Ataur Rahman, “BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts” (2024).

