AI-Powered Feedback Fails to Impress: A Critical Evaluation of Llama 3.2's Programming Exercise Assistance

Wednesday 16 April 2025


Researchers have been exploring the potential of artificial intelligence (AI) for generating feedback for novice programmers, aiming to improve their learning experience. In a recent study, researchers evaluated the performance of Llama 3.2 (3B), a small, openly available language model, when used to provide personalized feedback on programming exercises.


The researchers fed authentic student solutions to introductory Java programming tasks to the model and analyzed the feedback it generated. They found that while Llama 3.2 was able to identify some errors, its overall performance was limited: many errors went undetected, and the corrections it provided were often incorrect or incomplete.
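To illustrate the kind of input involved, consider a small Java program of the sort typically submitted in introductory courses (this example is constructed for illustration and is not drawn from the study's data set). A useful feedback message would point out the off-by-one error in the loop bound and suggest a fix; according to the study's findings, exactly this kind of correction was often missing, wrong, or incomplete in the model's output.

    // Hypothetical novice solution to a "sum of the first n numbers" exercise.
    public class SumUpToN {
        // Intended to return 1 + 2 + ... + n, but the loop stops one step early.
        public static int sum(int n) {
            int total = 0;
            for (int i = 1; i < n; i++) {   // bug: should be i <= n
                total += i;
            }
            return total;
        }

        public static void main(String[] args) {
            System.out.println(sum(5));     // prints 10 instead of the expected 15
        }
    }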


Moreover, the researchers found that Llama 3.2's feedback was inconsistent and often redundant, making it difficult for students to understand and act on. Grammar and spelling mistakes further reduced the clarity of the output. These findings suggest that relying solely on AI-generated feedback may not be sufficient to support novice programmers' learning.


Despite these limitations, the researchers acknowledge that Llama 3.2 still has potential for generating feedback, particularly when combined with human oversight or used as a supplement to other teaching methods. The study highlights the need for further research into developing more effective and accurate AI-powered feedback systems tailored to the specific needs of programming learners.


The results also underscore the importance of considering the cognitive demands and opportunities presented by generative AI in educational settings. As AI technologies continue to evolve, educators must carefully evaluate their potential benefits and limitations to ensure that they are used effectively to enhance student learning outcomes.


In their analysis, the researchers employed a combination of quantitative and qualitative methods to assess Llama 3.2’s performance. They evaluated the model’s output against a set of predefined criteria, examining both the accuracy of its feedback and its overall quality.


The study’s findings have significant implications for computer science education, highlighting the need for educators to consider alternative approaches to providing feedback that are more effective and supportive of novice programmers’ learning. As AI technologies continue to advance, it is essential to prioritize research into developing more accurate and helpful feedback systems that can be integrated into educational settings.


The researchers’ work contributes to a growing body of literature on AI-generated feedback in programming education, shedding light on the potential benefits and limitations of this technology. By exploring the intersection of AI and education, scientists can better understand how to harness these tools to support student learning and improve teaching practices.


Cite this article: “AI-Powered Feedback Fails to Impress: A Critical Evaluation of Llama 3.2's Programming Exercise Assistance”, The Science Archive, 2025.


Artificial Intelligence, Programming Education, Language Model, Feedback, Novice Programmers, Machine Learning, Java, Cognitive Demands, Educational Technology, Computer Science.


Reference: Imen Azaiz, Natalie Kiesler, Sven Strickroth, Anni Zhang, “Open, Small, Rigmarole — Evaluating Llama 3.2 3B’s Feedback for Programming Exercises” (2025).

