AI Models Fall Short on Cultural Moral Values

Friday 31 January 2025


A recent study in AI ethics has shed new light on the limitations of large language models (LLMs) when it comes to understanding and reflecting cultural diversity in moral judgments. Despite their impressive capabilities, the study finds, LLMs are prone to propagating homogenized views of cross-cultural moral values.


Researchers tested several popular LLMs, including GPT-2 Medium, GPT-2 Large, OPT-125, Qwen, and BLOOM, using a variety of methods to evaluate their ability to capture the nuances of cultural moral attitudes. The results were striking: while some models performed better than others in certain areas, none demonstrated a truly accurate understanding of the complexities of moral judgment across cultures.
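The article does not spell out the probing setup, but one common way to elicit a moral judgment from a language model is to compare the log-probabilities it assigns to opposing completions of a prompt such as "Cheating on taxes is ...". The sketch below illustrates only that final scoring step; the function name and example numbers are hypothetical, not taken from the study:

```python
import math

def moral_score(logp_acceptable: float, logp_unacceptable: float) -> float:
    """Turn the log-probabilities of two opposing completions
    (e.g. 'justifiable' vs. 'never justifiable') into a score in
    [0, 1], where values near 1 mean the model leans 'acceptable'."""
    p_a = math.exp(logp_acceptable)
    p_u = math.exp(logp_unacceptable)
    return p_a / (p_a + p_u)  # softmax over the two options

# Equal log-probabilities yield a neutral score
print(moral_score(-1.0, -1.0))  # 0.5
```

A score like this can be computed per topic and per country prompt, then compared against how people in that country actually answered the surveys.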


One of the primary issues with LLMs is that they are trained on large datasets that are often biased towards Western values and perspectives. This can result in models that reflect a narrow and culturally limited view of morality, which may not align with the diverse values and beliefs held by people from other cultures.


The study used two datasets, the World Values Survey (WVS) and the Pew Research Center’s Global Attitudes Survey (PEW), to evaluate the LLMs’ performance. The WVS dataset provides a comprehensive overview of moral attitudes across 97 countries, while the PEW survey focuses on specific topics related to morality, such as divorce, homosexuality, and contraception.


When tested against these datasets, the LLMs proved surprisingly inaccurate in their predictions. For example, one model, GPT-2 Large, correctly identified some of the most controversial WVS topics, such as suicide and political violence, but struggled to accurately categorize others, like cheating on taxes.


Similarly, when evaluated against PEW survey data, the LLMs were inconsistent in their judgments. For instance, one model, Qwen, correctly identified having an abortion as a highly controversial topic, while another model, BLOOM, incorrectly categorized it as a widely accepted practice.


The study’s findings have significant implications for the development and deployment of AI technologies that rely on LLMs. As these models become increasingly integrated into our daily lives, it is essential to recognize their limitations and biases in order to ensure that they do not perpetuate harmful stereotypes or reinforce cultural norms that are damaging to marginalized communities.


Cite this article: “AI Models Fall Short on Cultural Moral Values”, The Science Archive, 2025.


AI Ethics, Large Language Models, Moral Judgments, Cross-Cultural Values, Bias, Western Values, Cultural Diversity, World Values Survey, Pew Research Center, Global Attitudes Survey


Reference: Mijntje Meijer, Hadi Mohammadi, Ayoub Bagheri, “LLMs as mirrors of societal moral standards: reflection of cultural divergence and agreement across ethical topics” (2024).
