AI Language Models' Limited Ability to Capture Cultural and Moral Norms

Friday 31 January 2025


As AI language models continue to advance, researchers are increasingly interested in understanding their ability to capture cultural and moral norms. A recent study set out to investigate whether four pre-trained language models – BLOOMZ-560M, OPT-125M, GPT-2 base model, and Qwen2 – can accurately reflect human moral judgments across different cultures.


The researchers used two datasets: the World Values Survey (WVS) and the PEW Research Center’s survey data. The WVS dataset includes responses from over 100 countries on various moral topics, while the PEW dataset focuses on a smaller set of countries with more nuanced moral questions.


The results show that all four models tend to simplify complex moral judgments, often characterizing most topics as generally acceptable. This is concerning, as human morality is inherently complex and context-dependent. The study found that the BLOOMZ-560M model, which is trained on 46 different languages, performed slightly better than the other three models in mirroring societal views.


One of the key findings was that the choice of moral tokens had a greater impact on the model scores than the choice of prompt types. This highlights the importance of carefully selecting the language and tone used to elicit model responses when evaluating AI models’ ability to capture cultural and moral norms.
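The study's exact prompting setup isn't reproduced here, but a common way to score a model's moral stance is to compare the probability it assigns to opposing judgment tokens after a prompt. The sketch below is a hedged illustration of why token choice can matter so much: the prompt template, token pairs, and probability values are invented for the example, not taken from the paper.

```python
import math

def moral_score(token_logprobs, pos_token, neg_token):
    """Normalized score in [-1, 1]: +1 means fully acceptable, -1 fully unacceptable.

    token_logprobs maps a candidate next token to the log-probability a model
    assigns it after a prompt such as "In {country}, {topic} is ...".
    """
    p_pos = math.exp(token_logprobs[pos_token])
    p_neg = math.exp(token_logprobs[neg_token])
    return (p_pos - p_neg) / (p_pos + p_neg)

# Illustrative log-probabilities for a single prompt (made up, not model output).
logprobs = {
    "acceptable": math.log(0.30),
    "unacceptable": math.log(0.10),
    "right": math.log(0.05),
    "wrong": math.log(0.15),
}

# The same underlying distribution yields opposite verdicts for different
# moral-token pairs, which is how token choice can outweigh prompt choice.
print(moral_score(logprobs, "acceptable", "unacceptable"))  # 0.5
print(moral_score(logprobs, "right", "wrong"))              # -0.5
```

Because the score depends only on the two tokens compared, swapping "acceptable/unacceptable" for "right/wrong" can flip the apparent judgment even when the prompt is unchanged.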


The study also found significant negative correlations between the model scores and actual survey responses, indicating that the models tend to diverge from human judgments on many topics. Notably, the GPT-2 base model showed a higher incidence of negative scores compared to the other three models.
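Divergence of this kind is typically quantified with a correlation coefficient between model scores and mean survey ratings: a negative Pearson coefficient means the model ranks topics roughly opposite to how the surveyed population does. A minimal self-contained sketch, with invented numbers rather than the study's data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-topic values: human acceptability ratings vs. model scores
# that rank the topics in roughly the opposite order.
survey = [0.8, 0.6, 0.3, 0.1]
model = [0.1, 0.2, 0.6, 0.7]

print(round(pearson(survey, model), 2))  # strongly negative, close to -1
```

A coefficient near -1 here would correspond to the kind of systematic divergence the study reports, whereas a model that mirrored societal views would score close to +1.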


The researchers acknowledge several limitations of their study, including the use of publicly available datasets that may not fully represent moral norms across all cultures and languages. They also note that averaging moral ratings for each culture collapses a diverse range of moral views into a single number.
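The averaging limitation is easy to see concretely: two cultures can have similar mean ratings while disagreeing internally to very different degrees. The ratings below are hypothetical, chosen only to illustrate the point:

```python
from statistics import mean, stdev

# Hypothetical per-respondent acceptability ratings (scale -1 to 1) for one topic.
ratings = {
    "Country A": [0.9, 0.8, 0.85],          # broad agreement
    "Country B": [0.9, -0.8, 0.7, -0.6],    # deep internal split
}

for country, r in ratings.items():
    # Each country reduces to a single mean, but the spread tells a
    # different story that the averaged target value cannot capture.
    print(country, "mean:", round(mean(r), 2), "spread:", round(stdev(r), 2))
```

Evaluating a model against the mean alone would treat Country B's polarized responses as if they were a mild consensus.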


Despite these limitations, the study provides valuable insights into the capabilities and limitations of AI language models in capturing cultural and moral norms. As AI continues to play an increasingly important role in our lives, it is essential to understand its potential biases and limitations to ensure that it can be used responsibly and ethically.


The results of this study have significant implications for the development of AI systems that interact with humans on a daily basis. By understanding how language models process moral information, researchers can work towards creating more accurate and culturally sensitive AI systems that better reflect human values and norms.


Cite this article: “AI Language Models Limited Ability to Capture Cultural and Moral Norms”, The Science Archive, 2025.


AI Language Models, Cultural Norms, Moral Judgments, World Values Survey, PEW Research Center, GPT-2, BLOOMZ-560M, OPT-125M, Qwen2, Moral Tokens


Reference: Evi Papadopoulou, Hadi Mohammadi, Ayoub Bagheri, “Large Language Models as Mirrors of Societal Moral Standards” (2024).

