Wednesday 16 April 2025
Defining complex concepts such as media bias has long been a challenge for researchers and academics, and extracting precise definitions from academic texts is particularly laborious. A new study tackles this problem with a framework that uses large language models (LLMs) to automate definition extraction.
The researchers behind the project have designed a system called TaxoMatic, which combines relevance classification with definition extraction to identify and pull definitions out of academic texts. The team tested the approach on a dataset of 2,398 articles related to media bias, comparing five different prompting strategies to see how well the LLMs performed.
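The article does not publish TaxoMatic's code or prompts, but the general idea of pairing a relevance classifier with a definition extractor can be sketched in Python, assuming the two steps run in sequence (relevant articles first, then extraction). This is a minimal illustration under stated assumptions: the function names `is_relevant`, `extract_definition` and `run_pipeline`, the prompt wording, and the `fake_llm` stand-in are all hypothetical. A real system would pass a function that calls an actual LLM in place of `fake_llm`.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that takes a prompt and returns the model's reply.
LLM = Callable[[str], str]

def is_relevant(llm: LLM, text: str, concept: str) -> bool:
    """Stage 1 (sketch): yes/no relevance classification for one article."""
    prompt = f"Does the following text discuss {concept}? Answer 'yes' or 'no'.\n\n{text}"
    return llm(prompt).strip().lower().startswith("yes")

def extract_definition(llm: LLM, text: str, concept: str) -> str:
    """Stage 2 (sketch): ask for the definition, or 'NONE' if there is none."""
    prompt = (
        f"Extract the definition of {concept} given in the following text. "
        f"If there is none, answer 'NONE'.\n\n{text}"
    )
    return llm(prompt).strip()

def run_pipeline(llm: LLM, articles: List[str], concept: str) -> List[Tuple[int, str]]:
    """Return (article index, definition) pairs for relevant articles that define the concept."""
    results = []
    for i, text in enumerate(articles):
        if not is_relevant(llm, text, concept):
            continue  # irrelevant articles never reach the extraction step
        definition = extract_definition(llm, text, concept)
        if definition.upper() != "NONE":
            results.append((i, definition))
    return results

def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model call, used only to make the sketch runnable."""
    body = prompt.split("\n\n", 1)[1]  # the article text follows the first blank line
    if prompt.startswith("Does the following text"):
        return "yes" if "bias" in body.lower() else "no"
    for sentence in body.split(". "):
        if "is defined as" in sentence:
            return sentence.strip()
    return "NONE"

articles = [
    "Media bias is defined as a slanted presentation of news. We study it.",
    "This paper analyses football transfer markets.",
    "We measure bias in headlines without defining it.",
]
print(run_pipeline(fake_llm, articles, "media bias"))
```

Running it prints `[(0, 'Media bias is defined as a slanted presentation of news')]`: the football article is filtered out at the relevance stage, and the third article is judged relevant but yields no definition.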
The results are promising. With Chain-of-Thought (CoT) prompting, which asks the model to reason step by step before giving its answer, the Claude-3-sonnet model achieved a median cosine similarity of 0.557 between its extracted definitions and those assigned by human annotators, indicating a moderate degree of semantic agreement. This is noteworthy given that media bias is a concept often shrouded in ambiguity and controversy.
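Cosine similarity compares two vectors by the angle between them, so the definitions must first be converted into numeric vectors (embeddings). The article does not say which embedding model was used; the sketch below simply assumes the vectors already exist and uses toy three-dimensional values to show how per-article scores are computed and then summarised by their median, a statistic that is less sensitive than the mean to a handful of unusually good or poor extractions.

```python
import math
from statistics import median
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (assumed); a real evaluation would use an embedding model.
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # identical direction -> 1.0
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal -> 0.0
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),  # 45 degrees apart -> about 0.707
]
scores = [cosine_similarity(extracted, human) for extracted, human in pairs]
print(round(median(scores), 3))  # median of [1.0, 0.0, 0.707...] -> 0.707
```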
One of the key challenges in definition extraction is class imbalance, where some concepts or categories appear far more often in the data than others. To address this, the researchers used role prompting, which instructs the LLM to assume the role of an expert in the field. According to the study, this helped the model capture the nuances of media bias and extract definitions that better reflected the concept's complexity.
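Both prompting strategies mentioned in the article work entirely through the text sent to the model; no retraining is involved. The hypothetical `build_prompt` helper below shows how a role instruction and a Chain-of-Thought instruction might be layered onto a basic extraction prompt. The wording, the role string and the "Answer:" convention are illustrative assumptions, not the prompts used in the study.

```python
from typing import Optional

def build_prompt(text: str, concept: str, role: Optional[str] = None,
                 chain_of_thought: bool = False) -> str:
    """Assemble a definition-extraction prompt (hypothetical template)."""
    parts = []
    if role:
        # Role prompting: tell the model which persona to adopt; its weights are unchanged.
        parts.append(f"You are {role}.")
    # Base task instruction shared by every strategy.
    parts.append(
        f"Extract the definition of {concept} from the text below. "
        "If the text gives no definition, answer 'NONE'."
    )
    if chain_of_thought:
        # Chain-of-Thought: request intermediate reasoning steps before the final answer.
        parts.append(
            f"Think step by step: first find the sentences that discuss {concept}, "
            "then decide whether any of them defines it, and finally give the "
            "definition on a line starting with 'Answer:'."
        )
    parts.append(f"Text:\n{text}")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Media bias is defined as a slanted presentation of news.",
    "media bias",
    role="an expert in media studies and communication science",
    chain_of_thought=True,
)
print(prompt)
```

Keeping the role and CoT instructions as independent switches makes it straightforward to compare prompting strategies side by side.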
The study’s findings have significant implications for definition extraction in academic research. By automating the process with LLMs, researchers can save time and potentially increase the accuracy of their results. Moreover, TaxoMatic’s ability to handle class imbalance and extract definitions from diverse texts opens up new possibilities for interdisciplinary research.
The authors acknowledge that their system is not without its limitations. For instance, the use of LLMs may reinforce existing biases present in source materials, which could perpetuate problematic concepts. Additionally, relying on automated definition extraction may lead to overconfidence in the accuracy of the results.
Despite these caveats, TaxoMatic represents a significant step forward in the quest for clarity in academic research. By using LLMs to carry out the extraction, researchers can gain a clearer view of how complex concepts are defined across a large body of literature.
Cite this article: “Unlocking the Power of Large Language Models for Automated Definition Extraction: A Study on Media Bias Detection”, The Science Archive, 2025.
Media Bias, Definition Extraction, Large Language Models, TaxoMatic, Chain-of-Thought Prompting, Role Prompting, Class Imbalance, Academic Research, Interdisciplinary Research, Clarity







