Machine Learning Models Improve Ability to Understand Short Answers in Less Common Languages

Sunday 09 March 2025


For a long time, researchers have been trying to get machines to understand human language. They’ve made progress, but there’s still a big gap between what humans can do and what computers can do. One of the biggest challenges is understanding short answers, like those you might give in an exam or on a survey.


Recently, a team of researchers came up with a new way to get machines to understand short answers. They used large language models (LLMs), which are computer programs that learn from huge amounts of text.


The team tested these LLMs on two languages: Latvian and Lithuanian. Both belong to the Baltic branch of the Indo-European language family and are not as well represented in AI research as many other languages. So, this was a chance to see how well the machines would do with languages that are less common.


The team asked the LLMs to generate answers to questions in both Latvian and Lithuanian. They then compared these answers to correct answers provided by native speakers of each language. This way, they could see if the machines were able to produce answers that a human would consider correct or not.


The results were interesting. In Latvian, one of the LLMs (called GPT-4o) did surprisingly well. It generated answers that matched the correct answers most of the time, even when the two were worded differently. The other LLMs didn’t do as well, but still managed to get some answers right.


In Lithuanian, things were a bit different. GPT-4o again did well, but another LLM (called LLaMa3:7b) also performed well. This one generated answers that matched the correct answers even when the wording was very different.


The team then asked two native speakers of each language to evaluate the machine-generated answers. They had to decide whether each answer was correct or not, and why. This gave the researchers a better idea of what the machines were getting right and wrong.
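
When two people label the same answers, a common first check is how often they agree. The article does not say whether or how the researchers measured this, so the following is only a minimal sketch in Python under that assumption. The function name `percent_agreement` and the toy labels are hypothetical and not taken from the study.

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(labels_a, labels_b))
    return agreements / len(labels_a)


# Hypothetical toy judgments from two annotators (not data from the study).
annotator_1 = ["correct", "correct", "incorrect", "correct"]
annotator_2 = ["correct", "incorrect", "incorrect", "correct"]

agreement = percent_agreement(annotator_1, annotator_2)
print(f"Agreement: {agreement:.2f}")
```

Raw agreement is easy to read but does not correct for agreement that would happen by chance, which is why chance-corrected measures are often reported alongside it.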


The results of this evaluation showed that GPT-4o was generally good at generating matched answers in both languages. However, it struggled with non-matched answers, meaning those where the machine-generated answer didn’t match the correct answer. The other LLMs did even worse on these types of answers.
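
One way to see a gap like this is to score the two kinds of answers separately instead of reporting a single overall number. The sketch below, in Python, assumes each answer carries a reference label ("matched" or "non-matched") and a model decision. The function name `per_class_accuracy` and the toy data are hypothetical illustrations, not the researchers' code or results.

```python
from collections import defaultdict


def per_class_accuracy(gold_labels, predicted_labels):
    """Return accuracy computed separately for each reference label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, predicted in zip(gold_labels, predicted_labels):
        total[gold] += 1
        if gold == predicted:
            correct[gold] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical toy data (not results from the study).
gold = ["matched", "matched", "matched",
        "non-matched", "non-matched", "non-matched"]
pred = ["matched", "matched", "matched",
        "matched", "non-matched", "matched"]

scores = per_class_accuracy(gold, pred)
for label, accuracy in scores.items():
    print(f"{label}: {accuracy:.2f}")
```

In this toy example the overall accuracy would look moderate, while the per-class view shows that all of the errors fall on the non-matched answers.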


Cite this article: “Machine Learning Models Improve Ability to Understand Short Answers in Less Common Languages”, The Science Archive, 2025.


Keywords: Machine Learning, Language Models, Latvian, Lithuanian, Baltic Languages, AI Research, Natural Language Processing, Computer Programs, Text Analysis, Language Understanding.


Reference: Yevhen Kostiuk, Oxana Vitman, Łukasz Gagała, Artur Kiulian, “The Veln(ia)s is in the Details: Evaluating LLM Judgment on Latvian and Lithuanian Short Answer Matching” (2025).

