Thursday 27 March 2025
Researchers have made significant progress in developing small, open-source language models that can analyze and assess the argumentation in student essays. These models are compact large language models (LLMs), and they evaluate student writing by identifying the components of an argument, such as claims, evidence, and conclusions.
The team behind this research has fine-tuned several LLMs, including Qwen 2.5 7B, Llama 3.1 8B, and Gemma 2 9B, to perform three key tasks: segmenting essays into distinct argument components, classifying the type of each component, and assessing the quality of the argument.
The segmentation task involves breaking an essay down into its individual argument components, which is challenging because it requires identifying the logical structure of the text. The fine-tuned LLMs performed well on this task, with precision and recall scores ranging from 66% to 99%. Accurate segmentation matters for assessing student writing because the later steps, including targeted feedback on clarity, coherence, and effectiveness, depend on the components being located correctly.
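To make the segmentation scores concrete, here is a minimal Python sketch of how precision and recall could be computed for predicted segments. It assumes each segment is a (start, end) character span and that a prediction counts only when it matches a gold span exactly. The article does not say which matching rule the researchers used, and the function name and sample spans are hypothetical.

```python
def span_precision_recall(predicted, gold):
    """Exact-match precision and recall over (start, end) spans.

    Assumption: a predicted span is correct only if it is identical
    to a gold span. Real evaluations may allow partial overlap.
    """
    predicted_set = set(predicted)
    gold_set = set(gold)
    true_positives = len(predicted_set & gold_set)
    # Precision: share of predicted spans that are correct.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    # Recall: share of gold spans that were found.
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical spans for one essay.
gold_spans = [(0, 12), (13, 40), (41, 58)]
predicted_spans = [(0, 12), (13, 38), (41, 58), (59, 70)]

precision, recall = span_precision_recall(predicted_spans, gold_spans)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four predicted spans match, and two of the three gold spans are recovered, so the model is penalized both for an extra segment and for a boundary that is slightly off.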
The classification task involves labeling each argument component by type, such as claim, counterclaim, evidence, or conclusion. The fine-tuned LLMs achieved macro-averaged F1 scores ranging from 27% to 47%, with Qwen 2.5 7B performing best on this task. Macro-averaging gives every class the same weight, so rare component types count as much as common ones.
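For readers unfamiliar with the metric, the following Python sketch shows how a macro-averaged F1 score is computed: an F1 score is calculated for each class separately and the per-class scores are then averaged without weighting. The labels and predictions are invented for illustration and are not data from the study.

```python
def macro_f1(gold_labels, predicted_labels):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(gold_labels) | set(predicted_labels))
    pairs = list(zip(gold_labels, predicted_labels))
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical component labels for six essay segments.
gold = ["claim", "claim", "claim", "evidence", "evidence", "conclusion"]
pred = ["claim", "claim", "claim", "claim", "evidence", "evidence"]

score = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
```

Here four of the six predictions are correct, yet the macro-averaged F1 is well below that accuracy because the single conclusion is missed entirely and its class contributes a score of zero. This is one reason macro-averaged F1 scores can look low on tasks with uneven class frequencies.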
Finally, the quality assessment task involves evaluating the effectiveness of an argument by considering factors such as clarity, coherence, and evidence. The fine-tuned LLMs achieved macro-averaged F1 scores ranging from 34% to 47%, with Gemma 2 9B performing best on this task.
These results demonstrate the potential of small, open-source LLMs for evaluating the argumentation skills shown in student writing. By using these models, educators can provide more effective feedback and support students in developing their critical thinking and writing abilities.
The use of small, open-source LLMs also offers several benefits, including reduced computational cost and increased accessibility. These models can be deployed locally without the need for expensive hardware or cloud computing infrastructure, making them an attractive option for educational institutions with limited resources.
In addition to their potential applications in education, these fine-tuned LLMs could also be used in other fields such as law, medicine, and business, where effective argumentation is essential.
Cite this article: “Evaluating Argumentation Skills with Small, Open-Source Language Models”, The Science Archive, 2025.
Language Models, Argumentation Skills, Essays, Large Language Models, Fine-Tuning, Segmentation, Classification, Quality Assessment, Critical Thinking, Writing Abilities







